cluefish
A specialised workflow designed to enhance the biological interpretation of transcriptomic data series 🎣
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 2 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.7%) to scientific vocabulary
Keywords
interpretation
r
transcriptomics
workflow
Last synced: 6 months ago
Repository
Basic Info
- Host: GitHub
- Owner: ellfran-7
- License: other
- Language: R
- Default Branch: main
- Homepage: https://ellfran-7.github.io/cluefish/
- Size: 31.1 MB
Statistics
- Stars: 3
- Watchers: 2
- Forks: 1
- Open Issues: 2
- Releases: 2
Topics
interpretation
r
transcriptomics
workflow
Created over 1 year ago
· Last pushed 6 months ago
Metadata Files
Readme
License
Codemeta
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# cluefish
[Contributors](https://github.com/ellfran-7/cluefish/graphs/contributors) [Forks](https://github.com/ellfran-7/cluefish/network/members) [Stargazers](https://github.com/ellfran-7/cluefish/stargazers) [Issues](https://github.com/ellfran-7/cluefish/issues) [License](https://github.com/ellfran-7/cluefish/blob/main/LICENSE) [LinkedIn](https://linkedin.com/in/ellis-franklin-6188831ba)
## Overview
Cluefish is a free and open-source, semi-automated R workflow designed for comprehensive and untargeted exploration of transcriptomic data series. Its name reflects the three key concepts driving the workflow: **Clustering**, **Enrichment**, and **Fishing**—metaphorically aligned with "*fishing for clues*"🎣 in complex biological data.
When used alongside the [DRomics](https://lbbe-software.github.io/DRomics/) (Dose-Response for Omics) R package, Cluefish provides a more comprehensive analysis of dose-response transcriptomic data. In toxicology and ecotoxicology, this helps to understand and highlight a contaminant's mode of action.
This workflow addresses the limitations of the standard Over-Representation Analysis (ORA) by applying ORA to pre-clustered networks. These clusters serve as anchors for ORA, enhancing enrichment detection sensitivity and thus enabling the identification of smaller, more specific biological processes while simultaneously forming exploratory gene groups.
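To illustrate why anchoring ORA on clusters can increase sensitivity, here is a minimal base R sketch of a one-sided hypergeometric test for a single annotation term. It is not the Cluefish implementation (the package lists `gprofiler2` among its dependencies); the helper `ora_pvalue()`, the gene identifiers, and the term are all hypothetical.

```r
# Minimal ORA sketch: one-sided hypergeometric test for one annotation term.
# ora_pvalue() and every identifier below are made up for illustration.
ora_pvalue <- function(query, term_genes, background) {
  background <- unique(background)
  query      <- intersect(unique(query), background)
  term_genes <- intersect(unique(term_genes), background)
  k <- length(intersect(query, term_genes))  # query genes annotated to the term
  m <- length(term_genes)                    # annotated genes in the background
  n <- length(background) - m                # background genes without the annotation
  # P(X >= k) when drawing length(query) genes without replacement
  phyper(k - 1, m, n, length(query), lower.tail = FALSE)
}

background <- paste0("gene", 1:1000)
term_genes <- paste0("gene", 1:20)             # hypothetical term with 20 genes

whole_list <- paste0("gene", c(1:5, 101:295))  # 200 deregulated genes, 5 in the term
cluster    <- paste0("gene", c(1:5, 101:105))  # 10-gene cluster holding the same 5

p_whole   <- ora_pvalue(whole_list, term_genes, background)
p_cluster <- ora_pvalue(cluster, term_genes, background)

# The same five annotated genes stand out far more within the small cluster
p_cluster < p_whole
```

In this toy setting the five annotated genes are diluted in the full deregulated list but dominate the small cluster, which is the intuition behind applying ORA cluster by cluster.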
Cluefish is designed to be adaptable to a wide range of organisms, both model and non-model, ensuring broad applicability across various biological contexts.
------------------------------------------------------------------------
If you're ready to dive straight into using Cluefish, check out the [Introduction to Cluefish](https://ellfran-7.github.io/cluefish/articles/cluefish.html) vignette.
------------------------------------------------------------------------
Graphical abstract of the Cluefish workflow.
## Installation
The Cluefish tool is developed in **R**, so having **R** installed is a prerequisite. You can download it [here](https://posit.co/download/rstudio-desktop/).
For an enhanced experience, we recommend using the **RStudio** integrated development environment (IDE), which is available for download at the same link, [here](https://posit.co/download/rstudio-desktop/).
You can use Cluefish locally in one of two ways:
1. Clone the repository via a terminal:
``` sh
git clone https://github.com/ellfran-7/cluefish.git
```
2. Install the development version of Cluefish from GitHub in R (requires the `remotes` package):
``` r
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ellfran-7/cluefish")
```
## Additional Requirements
Cluefish relies on external open source software for an intermediate step within its workflow. Please ensure the following tools are installed:
1. **Cytoscape**:
Cluefish uses Cytoscape to visualize protein-protein interaction (PPI) networks. Install Cytoscape from their [download page](https://cytoscape.org/download.html).
2. **Required Cytoscape Apps**:
Within Cytoscape, install the **StringApp** and **clusterMaker2** apps. To do this:
- Open Cytoscape
- Navigate to `Apps` \> `App Store` \> `Show App Store`
- Search for and install "StringApp" (for retrieving STRING protein interactions) and "clusterMaker2" (for clustering network data).
*You can also view more about these apps on the [Cytoscape App Store](https://apps.cytoscape.org/).*
## Usage
To run the Cluefish workflow, you can use the `make.R` script, which serves as the 'master' script for the entire process. We recommend using this script as a template to ensure smooth and sequential execution of the workflow steps.
### Required R packages
A key feature of Cluefish is the integration of `renv` to create reproducible environments. This allows you to install the required R packages in two ways:
- Run `renv::install()` to install the most recent versions of the packages the project requires, without pinning them to the versions recorded in `renv.lock`.
- For full reproducibility, run `renv::restore()` to install the exact package versions specified in the `renv.lock` file. Note that this process may take longer.
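The two options above can be wrapped in a small helper, sketched below. The function `setup_packages()` and its `dry_run` argument are hypothetical and not part of Cluefish; only `renv::install()` and `renv::restore()` are real `renv` calls. The sketch assumes it is run from the project root, where `renv.lock` lives.

```r
# Hypothetical helper (not part of Cluefish): choose between the two renv modes.
setup_packages <- function(exact = TRUE, lockfile = "renv.lock", dry_run = FALSE) {
  action <- if (exact) "restore" else "install"

  # Restoring exact versions is only possible when the lockfile is present
  if (exact && !file.exists(lockfile)) {
    stop("Lockfile not found: ", lockfile, call. = FALSE)
  }
  if (dry_run) {
    return(action)  # report what would be done without touching the library
  }
  if (!requireNamespace("renv", quietly = TRUE)) {
    stop("Please install the 'renv' package first.", call. = FALSE)
  }

  if (exact) {
    renv::restore(lockfile = lockfile, prompt = FALSE)  # exact recorded versions
  } else {
    renv::install()                                     # latest available versions
  }
  invisible(action)
}

# Preview the non-pinned mode without installing anything
setup_packages(exact = FALSE, dry_run = TRUE)
```

Calling `setup_packages()` with the defaults from the project root would run the fully reproducible restore.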
### Required inputs
Cluefish requires two key inputs:
1. **A background transcript list**: Typically, this includes the identifiers for all detected transcripts in the experiment.
2. **A deregulated transcript list**: A subset of the background list, containing the identifiers of significantly deregulated transcripts. This list can be derived using any selection method.
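As a quick sanity check before starting the workflow, the relationship between the two inputs can be verified in base R. This is only a sketch: `check_inputs()` and the identifiers are hypothetical, and Cluefish may perform its own validation.

```r
# Hypothetical identifiers; in practice these come from your own experiment.
background_ids  <- c("ENSDART0001", "ENSDART0002", "ENSDART0003",
                     "ENSDART0004", "ENSDART0005")
deregulated_ids <- c("ENSDART0002", "ENSDART0004")

# Hypothetical helper: confirm the deregulated list is a subset of the background.
check_inputs <- function(deregulated, background) {
  stopifnot(is.character(deregulated), is.character(background))
  if (anyDuplicated(background) > 0) {
    stop("The background list contains duplicated identifiers.", call. = FALSE)
  }
  missing_ids <- setdiff(deregulated, background)
  if (length(missing_ids) > 0) {
    stop("Deregulated identifiers absent from the background: ",
         paste(missing_ids, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_inputs(deregulated_ids, background_ids)
```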
### Recommended Selection Method
While the inputs can be derived from any selection method, Cluefish was optimised to work seamlessly with the results from `DRomics`, a tool tailored for dose-response modelling of omics data.
Although using `DRomics` is optional, Cluefish leverages some of its visualization functions and modelling metrics to provide deeper insights into the biological interpretation of the data.
*For more information on DRomics, please refer to their [documentation](https://lbbe-software.github.io/DRomics/)*.
## Workflow
A schematic overview of the Cluefish workflow is shown below. For a full, step-by-step guide, refer to the vignette, [Introduction to Cluefish](https://ellfran-7.github.io/cluefish/articles/cluefish.html), which provides instructions using the *ZebrafishDBP* example dataset. The raw count data are publicly available on NCBI GEO under accession **GSE283957**.
Schematic of the Cluefish workflow.
## Citation
If you use Cluefish, please cite the associated paper as follows:
> Ellis Franklin, Elise Billoir, Philippe Veber, Jérémie Ohanessian, Marie Laure Delignette-Muller, Sophie Martine Prud’homme, Cluefish: mining the dark matter of transcriptional data series with over-representation analysis enhanced by aggregated biological prior knowledge, *NAR Genomics and Bioinformatics*, Volume 7, Issue 3, September 2025, lqaf103,
## Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are **greatly appreciated**.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
1. Fork the Project
2. Create your Feature Branch (`git checkout -b feature/AmazingIdea`)
3. Commit your Changes (`git commit -m 'Add some AmazingIdea'`)
4. Push to the Branch (`git push origin feature/AmazingIdea`)
5. Open a Pull Request
## License
This project is distributed under the CeCILL Free Software License Agreement v2.1 (CECILL-2.1). See `LICENSE.txt` for more information.
CECILL-2.1 is compatible with GNU GPL. See the [official CeCILL site](http://www.cecill.eu/index.en.html) for more information.
Please note that the creative assets, such as the logos and schematics associated with Cluefish, are distributed under the [CC-BY-SA-4.0 license](https://choosealicense.com/licenses/cc-by-sa-4.0/).
## Contact
If you have a need that is not yet covered, feedback on Cluefish, or any other question, feel free to contact me!
Ellis Franklin - [Website](https://ellfranklin.com/) - [LinkedIn](https://www.linkedin.com/in/ellis-franklin-6188831ba/) - [Bluesky](https://bsky.app/profile/elfrank7.bsky.social) - [ellis.franklin\@univ-lorraine.fr](mailto:ellis.franklin@univ-lorraine.fr){.email}
Project Link: <https://github.com/ellfran-7/cluefish>
## Acknowledgments
- [Othneil Drew's README template](https://github.com/othneildrew/Best-README-Template)
- [Malven's Flexbox Cheatsheet](https://flexbox.malven.co/)
- [Malven's Grid Cheatsheet](https://grid.malven.co/)
- [Img Shields](https://shields.io/)
Owner
- Name: Ellis
- Login: ellfran-7
- Kind: user
- Location: Metz
- Repositories: 1
- Profile: https://github.com/ellfran-7
New to Git but I'm with it
CodeMeta (codemeta.json)
{
"@context": "https://w3id.org/codemeta/3.0",
"type": "SoftwareSourceCode",
"applicationCategory": "Ecotoxicology, Toxicology, Pharmacology, Ecology ",
"author": [
{
"id": "https://orcid.org/0000-0002-6614-4109",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "Université de Lorraine, CNRS, LIEC, F-57000 Metz, France"
},
"email": "ellisfranklin5@gmail.com",
"familyName": "FRANKLIN",
"givenName": "Ellis"
},
{
"type": "Role",
"schema:author": "https://orcid.org/0000-0002-6614-4109",
"roleName": "Maintainer"
},
{
"type": "Role",
"schema:author": "https://orcid.org/0000-0002-6614-4109",
"roleName": "Developer"
},
{
"id": "https://orcid.org/0000-0001-9012-3298",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "Université de Lorraine, CNRS, LIEC, F-57000 Metz, France"
},
"email": "elise.billoir@univ-lorraine.fr",
"familyName": "Billoir",
"givenName": "Elise"
},
{
"type": "Role",
"schema:author": "https://orcid.org/0000-0001-9012-3298",
"roleName": "Reviewer"
},
{
"id": "https://orcid.org/0000-0001-5453-3994",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "Université Claude Bernard Lyon 1, LBBE, UMR 5558, CNRS, VAS, F-69622 Villeurbanne, France"
},
"email": "marielaure.delignettemuller@vetagro-sup.fr",
"familyName": "Delignette-Muller",
"givenName": "Marie Laure"
},
{
"type": "Role",
"schema:author": "https://orcid.org/0000-0001-5453-3994",
"roleName": "Reviewer"
},
{
"id": "https://orcid.org/0000-0002-7199-1839",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "Université de Lorraine, CNRS, LIEC, F-57000 Metz, France"
},
"email": "sophie.prud-homme@univ-lorraine.fr",
"familyName": "Prud'homme",
"givenName": "Sophie M."
},
{
"type": "Role",
"schema:author": "https://orcid.org/0000-0002-7199-1839",
"roleName": "Reviewer"
}
],
"codeRepository": "git+https://github.com/ellfran-7/cluefish",
"contributor": {
"id": "https://orcid.org/0009-0007-3909-3477",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "Université de Lorraine, CNRS, LIEC, F-57000 Metz, France"
},
"email": "jeremie.ohanessian@univ-lorraine.fr",
"familyName": "Ohanessian",
"givenName": "Jérémie"
},
"dateCreated": "2023-12-11",
"dateModified": "2024-12-17",
"datePublished": "2024-12-17",
"description": "Cluefish is a free and open-source, semi-automated R workflow designed for comprehensive and untargeted exploration of transcriptomic data series. Its name reflects the three key concepts driving the workflow: Clustering, Enrichment, and Fishing, metaphorically aligned with \"fishing for clues\" in complex biological data.",
"downloadUrl": "https://github.com/ellfran-7/cluefish/archive/refs/tags/v1.0.0.tar.gz",
"funder": {
"type": "Organization",
"name": "ANR"
},
"keywords": [
"rnaseq",
"data series",
"dose-response modelling",
"functional enrichment",
"protein-protein interaction networking"
],
"license": "https://spdx.org/licenses/CECILL-2.1",
"name": "Cluefish",
"operatingSystem": [
"Windows",
"macOS",
"Linux"
],
"programmingLanguage": "R",
"releaseNotes": "Main updates:\n- Included analysis of two external transcriptomic dose-response datasets in the analyses/ directory to provide further validation and testing of the Cluefish workflow\n- Included additional scripts to reproduce the figures and tables for both external datasets from the associated paper\n- Re-structured the analyses/ folder tree in general\n- Updated clusterenrich() and simplenrich() functions to now compute enrichment ratios for functionally enriched biological functions\n- Added download method choice for the dl_regulation_data() function to provide more flexibility\n- Fixed issue in dataframe merging for both clusterfusion() and lonelyfishing()\n\nSide-quests:\n- Enhanced documentation in scripts for reproducing results (figures and tables) from the associated paper\n- Corrected subtle issues in table generation (e.g., selecting only GO:BP terms for the enrichment results, although KEGG and WP pathways were enriched)\n- Added graphical abstract illustration from the associated paper to both the root README.md and vignette\n- Applied minor fixes and text adjustments throughout the project",
"runtimePlatform": "R version 4.4.1 (2024-12-04)",
"softwareRequirements": "R (>=4.4.1), biomaRt (>=2.60.1), gprofiler2 (>=0.2.3), g:profiler (e111_eg58_p18_f463989d) , Cytoscape (>=3.10.2), STRING (>=12.0), StringApp (>= 2.1.1), clustermaker2 (>=2.3.4)",
"version": "1.0.1",
"codemeta:contIntegration": {
"id": "https://github.com/ellfran-7/cluefish"
},
"continuousIntegration": "https://github.com/ellfran-7/cluefish",
"developmentStatus": "active",
"funding": "ANR JCJC CHROCO [ANR-21-CE34-0003]",
"issueTracker": "https://github.com/ellfran-7/cluefish/issues"
}
GitHub Events
Total
- Create event: 6
- Issues event: 3
- Release event: 3
- Watch event: 2
- Delete event: 1
- Issue comment event: 1
- Push event: 48
- Public event: 1
- Pull request review event: 3
- Pull request event: 7
- Fork event: 1
Last Year
- Create event: 6
- Issues event: 3
- Release event: 3
- Watch event: 2
- Delete event: 1
- Issue comment event: 1
- Push event: 48
- Public event: 1
- Pull request review event: 3
- Pull request event: 7
- Fork event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ellis Franklin | e****n@u****r | 296 |
| Marie-Laure DELIGNETTE-MULLER | m****e | 3 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 2
- Total pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 2
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- simon-thiry (2)
Pull Request Authors
- simon-thiry (8)
Top Labels
Issue Labels
enhancement (1)
bug (1)
Pull Request Labels
Dependencies
DESCRIPTION
cran
- DRomics * imports
- biomaRt * imports
- data.table * imports
- dplyr * imports
- fs * imports
- gprofiler2 * imports
- quarto * imports
- rlang * imports
- stats * imports
- stringr * imports
- tidyr * imports
- utils * imports