fair_genome_indexer
Download and index Ensembl sequences and annotations, remove non-canonical chromosomes, remove transcripts with a low transcript support level (TSL), and index with multiple tools.
Snakemake workflow used to deploy genome sequences and annotations and to build their basic indexes.
It was written for teaching purposes, as an example of FAIR principles applied with Snakemake.
Usage
The usage of this workflow is described in the Snakemake workflow catalog. It is also available locally on a single page.
Results
The expected results of this pipeline are described here.
Material and methods
The tools used in this pipeline are described in plain text here.
Step by step
Get DNA sequences
| Step                             | Commands         |
| -------------------------------- | ---------------- |
| Download DNA Fasta from Ensembl  | ensembl-sequence |
| Remove non-canonical chromosomes | pyfaidx          |
| Index DNA sequence               | samtools         |
| Create sequence dictionary       | picard           |
┌────────────────────────────────────────┐
│Download Ensembl Sequence (wget + gzip) │
└──────────────────┬─────────────────────┘
                   │
                   │
┌──────────────────▼────────────────────────┐
│Remove non-canonical chromosomes (pyfaidx) │
└──────────────────┬──────────────────────┬─┘
                   │                      │
                   │                      │
┌──────────────────▼──────────┐         ┌─▼──────────────────────────────────┐
│Index DNA Sequence (samtools)│         │Create sequence dictionary (Picard) │
└─────────────────────────────┘         └────────────────────────────────────┘
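
To make the "remove non-canonical chromosomes" step concrete, here is a minimal pure-Python sketch of the idea. It is not the pyfaidx command the workflow runs. The `CANONICAL` set (human Ensembl names 1 to 22, X, Y, MT) and the helper name `keep_canonical` are assumptions made for illustration only.

```python
import io
from typing import Iterable, Iterator, Set

# Assumption: Ensembl-style human chromosome names. The pattern actually
# used by the workflow may differ (other species, other naming schemes).
CANONICAL: Set[str] = {str(i) for i in range(1, 23)} | {"X", "Y", "MT"}


def keep_canonical(lines: Iterable[str], keep: Set[str] = CANONICAL) -> Iterator[str]:
    """Yield the lines of FASTA records whose sequence name is in `keep`."""
    emit = False
    for line in lines:
        if line.startswith(">"):
            # The sequence name is the first word after '>'.
            words = line[1:].split()
            emit = bool(words) and words[0] in keep
        if emit:
            yield line


# Toy input: two chromosomes and one unplaced scaffold.
fasta = io.StringIO(
    ">1 dna:chromosome\nACGT\n"
    ">KI270728.1 dna:scaffold\nGGCC\n"
    ">MT dna:chromosome\nTTAA\n"
)
filtered = "".join(keep_canonical(fasta))
print(filtered, end="")
```

In this toy input the scaffold record is dropped while the records named `1` and `MT` pass through unchanged.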
Get genome annotation (GTF)
| Step | Commands |
| ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------- |
| Download GTF annotation | ensembl-annotation |
| Fix format errors | Agat |
| Remove non-canonical chromosomes, based on above DNA Fasta | Agat |
| Remove <NA> Transcript support levels | Agat |
| Convert GTF to GenePred format | gtf2genepred |
┌─────────────────────────────────────────┐
│Download Ensembl Annotation (wget + gzip)│
└─────────────┬───────────────────────────┘
              │
              │
┌─────────────▼─────────┐
│Fix format Error (Agat)│
└─────────────┬─────────┘
              │
              │
┌─────────────▼─────────────────────────┐           ┌────────────────────────────────────────┐
│Remove non-canonical chromosomes (Agat)◄───────────┤Fasta sequence index (see Get DNA Fasta)│
└─────────────┬─────────────────────────┘           └────────────────────────────────────────┘
              │
              │
┌─────────────▼───────────────────────┐
│Remove <NA> transcript levels (Agat) │
└─────────────┬───────────────────────┘
              │
              │
┌─────────────▼────────────────┐
│Convert GTF to GenePred (UCSC)│
└──────────────────────────────┘
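
The transcript support level filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the Agat call used by the workflow. It assumes Ensembl-style GTF attributes of the form `transcript_support_level "NA"`, and it assumes that lines without that attribute (gene lines, for example) are kept. The helper name `drop_na_tsl` is hypothetical.

```python
import re
from typing import Iterable, Iterator

# Assumption: Ensembl-style attribute, e.g. transcript_support_level "1";
TSL = re.compile(r'transcript_support_level "([^"]*)"')


def drop_na_tsl(lines: Iterable[str]) -> Iterator[str]:
    """Yield GTF lines, skipping features whose support level is NA."""
    for line in lines:
        if line.startswith("#"):
            # Header and comment lines are passed through untouched.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        attributes = fields[8] if len(fields) == 9 else ""
        match = TSL.search(attributes)
        if match and match.group(1).startswith("NA"):
            continue
        yield line


# Toy annotation: one gene, one supported and one unsupported transcript.
gtf = [
    "#!genome-build GRCh38\n",
    '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G1";\n',
    '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; transcript_support_level "1";\n',
    '1\thavana\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; transcript_support_level "NA";\n',
]
kept = list(drop_na_tsl(gtf))
```

Whether features without any support level should be kept or dropped is a design choice; the sketch keeps them so that gene lines survive the filter.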
Get transcripts sequence
| Step                                                      | Commands |
| --------------------------------------------------------- | -------- |
| Extract transcript sequences from above DNA Fasta and GTF | gffread  |
| Index DNA sequence                                        | samtools |
| Create sequence dictionary                                | picard   |
┌───────────────────────────────┐       ┌─────────────────────────────┐
│GTF (see get genome annotation)│       │DNA Fasta (See get dna fasta)│
└────────────────────┬──────────┘       └────────┬────────────────────┘
                     │                           │
                     │                           │
              ┌──────▼───────────────────────────▼─────┐
              │Extract transcripts sequences (gffread) │
              └──────┬───────────────────────────┬─────┘
                     │                           │
                     │                           │
┌────────────────────▼────┐             ┌────────▼───────────────────────────┐
│Index sequence (samtools)│             │Create sequence dictionary (Picard) │
└─────────────────────────┘             └────────────────────────────────────┘
Get cDNA sequences
| Step                                                  | Commands |
| ----------------------------------------------------- | -------- |
| Extract coding transcripts from above GTF             | Agat     |
| Extract coding sequences from above DNA Fasta and GTF | gffread  |
| Index DNA sequence                                    | samtools |
| Create sequence dictionary                            | picard   |
┌───────────────────────────────┐       ┌─────────────────────────────┐
│GTF (see get genome annotation)│       │DNA Fasta (See get dna fasta)│
└────────────────────┬──────────┘       └────────┬────────────────────┘
                     │                           │
                     │                           │
              ┌──────▼───────────────────────────▼─────┐
              │Extract cDNA sequences (gffread)        │
              └──────┬───────────────────────────┬─────┘
                     │                           │
                     │                           │
┌────────────────────▼────┐             ┌────────▼───────────────────────────┐
│Index sequence (samtools)│             │Create sequence dictionary (Picard) │
└─────────────────────────┘             └────────────────────────────────────┘
Get dbSNP variants
| Step                             | Commands           |
| -------------------------------- | ------------------ |
| Download dbSNP variants          | ensembl-variation  |
| Filter non-canonical chromosomes | pyfaidx + BCFTools |
| Index variants                   | tabix              |
```
┌──────────────────────────────────────────┐
│Download dbSNP variants (wget + bcftools) │
└──────────┬───────────────────────────────┘
           │
           │
┌──────────▼───────────────────────────────────────────┐
│Remove non-canonical chromosomes (bcftools + bedtools)│
└──────────┬───────────────────────────────────────────┘
           │
           │
┌──────────▼─────────────┐
│Index variants (tabix) │
└────────────────────────┘
```
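
One plausible way to restrict variants to the chromosomes kept in the filtered Fasta is to turn its `.fai` index into a BED file of whole-chromosome intervals, which region-aware tools can then consume. Whether the workflow does exactly this is an assumption; the sketch below only shows the conversion, and `fai_to_bed` is a hypothetical helper name.

```python
from typing import Iterable, List


def fai_to_bed(fai_lines: Iterable[str]) -> List[str]:
    """Convert .fai lines (name, length, ...) to 0-based BED intervals."""
    bed = []
    for line in fai_lines:
        if not line.strip():
            continue
        # Only the first two .fai columns are needed: name and length.
        name, length = line.rstrip("\n").split("\t")[:2]
        bed.append(f"{name}\t0\t{int(length)}")
    return bed


# Toy index lines; the offset and line-width columns are placeholder values.
fai = [
    "1\t248956422\t56\t60\t61\n",
    "MT\t16569\t3151849000\t60\t61\n",
]
bed = fai_to_bed(fai)
print("\n".join(bed))
```

Because BED intervals are 0-based and half-open, an interval from 0 to the sequence length covers the whole chromosome.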
Get transcript_id, gene_id, and gene_name correspondence
| Step                                                 | Commands   |
| ---------------------------------------------------- | ---------- |
| Extract gene_id <-> gene_name correspondence         | pyroe      |
| Extract transcript_id <-> gene_id <-> gene_name      | Agat + XSV |
┌────────────────────────────────┐
│Genome annotation (see get GTF) ├──────────────────┐
└──────┬─────────────────────────┘                  │
       │                                            │
       │                                            │
┌──────▼──────────────────────────────┐    ┌────────▼─────────────────────────────────────────────┐
│Extract gene_id <-> gene_name (pyroe)│    │Extract gene_id <-> gene_name <-> transcript_id (Agat)│
└──────┬──────────────────────────────┘    └────────┬─────────────────────────────────────────────┘
       │                                            │
       │                                            │
┌──────▼─────┐                             ┌────────▼────┐
│Format (XSV)│                             │Format (XSV) │
└────────────┘                             └─────────────┘
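
As an illustration of what the identifier table contains, the sketch below builds transcript_id, gene_id, gene_name rows directly from GTF transcript lines. It does not reproduce the pyroe, Agat, or XSV commands of the workflow; the regular expression and the helper name `tx2gene` are assumptions for this example.

```python
import re
from typing import Iterable, List, Tuple

# Matches GTF attributes written as: key "value";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def tx2gene(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return unique (transcript_id, gene_id, gene_name) rows, in file order."""
    rows: List[Tuple[str, str, str]] = []
    seen = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9 or fields[2] != "transcript":
            continue
        attrs = dict(ATTRIBUTE.findall(fields[8]))
        row = (
            attrs.get("transcript_id", ""),
            attrs.get("gene_id", ""),
            attrs.get("gene_name", ""),
        )
        if row not in seen:
            seen.add(row)
            rows.append(row)
    return rows


# Toy annotation with one gene line and two transcript lines.
gtf = [
    '1\thavana\tgene\t100\t900\t.\t+\t.\tgene_id "G1"; gene_name "ABC";\n',
    '1\thavana\ttranscript\t100\t900\t.\t+\t.\tgene_id "G1"; transcript_id "T1"; gene_name "ABC";\n',
    '1\thavana\ttranscript\t100\t500\t.\t+\t.\tgene_id "G1"; transcript_id "T2"; gene_name "ABC";\n',
]
table = tx2gene(gtf)
for row in table:
    print("\t".join(row))
```

A missing attribute is written as an empty string here, so a transcript without a gene name still produces a three-column row.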
Get blacklisted regions
| Step                         | Commands      |
| ---------------------------- | ------------- |
| Download blacklisted regions | GitHub source |
| Merge overlapping intervals  | bedtools      |
```
┌────────────────────────────────┐
│Download known blacklists (wget)│
└────────────┬───────────────────┘
             │
             │
┌────────────▼──────────────────────────┐
│Merge overlapping intervals (bedtools) │
└───────────────────────────────────────┘
```
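
The interval-merging step can be illustrated with a short Python sketch. It mimics the default behaviour of an interval merge on sorted BED records, where overlapping and directly adjacent intervals on the same chromosome are collapsed into one. It is not the bedtools invocation used by the workflow, and `merge_intervals` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), 0-based half-open


def merge_intervals(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent intervals, chromosome by chromosome."""
    merged: List[Interval] = []
    # Sorting by (chromosome, start, end) puts mergeable intervals side by side.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            previous = merged[-1]
            merged[-1] = (chrom, previous[1], max(previous[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


# Toy blacklist with overlapping, adjacent, and isolated regions.
regions = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr2", 50, 60),
    ("chr1", 500, 600),
]
merged = merge_intervals(regions)
```

Merging matters when several blacklists are concatenated, since the same region may then be listed more than once with slightly different boundaries.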
GenePred format
| Step            | Commands   |
| --------------- | ---------- |
| GTF to GenePred | UCSC-tools |
```
┌────────────────────────────────┐
│Genome annotation (see get GTF) │
└────────────┬───────────────────┘
             │
             │
┌────────────▼──────────────┐
│GTFtoGenePred (UCSC-tools) │
└───────────────────────────┘
```
2bit format
| Step          | Commands   |
| ------------- | ---------- |
| Fasta to 2bit | UCSC-tools |
```
┌────────────────────────────────┐
│Genome sequence (see get Fasta) │
└────────────┬───────────────────┘
             │
             │
┌────────────▼──────────────┐
│FaToTwoBit (UCSC-tools) │
└───────────────────────────┘
```
STAR index
| Step       | Commands |
| ---------- | -------- |
| STAR index | STAR     |
```
┌────────────────────────────────┐
│Genome sequence (see get DNA)   │
└────────────┬───────────────────┘
             │
             │
     ┌───────▼────┐
     │ STAR index │
     └────────────┘
```
Bowtie2 index
| Step          | Commands      |
| ------------- | ------------- |
| Bowtie2 build | Bowtie2 build |
```
┌────────────────────────────────┐
│Genome sequence (see get DNA)   │
└────────────┬───────────────────┘
             │
             │
     ┌───────▼───────┐
     │ Bowtie2 index │
     └───────────────┘
```
Salmon decoy aware gentrome index
| Step           | Commands |
| -------------- | -------- |
| Generate decoy | Bash     |
| Salmon index   | Salmon   |
┌─────────────────────────────┐        ┌─────────────────────────────────────┐
│Genome sequence (see get DNA)│        │Transcriptome sequence (see get cDNA)│
└──────────────────────────┬──┘        └─────┬───────────────────────────────┘
                           │                 │
                           │                 │
                           │                 │
                      ┌────▼─────────────────▼────┐
                      │Generate decoy and gentrome│
                      └─────────────┬─────────────┘
                                    │
 ┌─────────────────┐                │     ┌───────────────┐
 │Gentrome sequence◄────────────────┴─────►Decoy sequences│
 └────────────┬────┘                      └────┬──────────┘
              │                                │
              │                                │
              │       ┌──────────────┐         │
              └───────► Salmon index ◄─────────┘
                      └──────────────┘
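
The "generate decoy and gentrome" step follows a simple recipe: the decoy list is the set of genome sequence names, and the gentrome is the transcriptome Fasta followed by the genome Fasta. The workflow does this in Bash; the Python sketch below only mirrors the logic, and `build_gentrome` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple


def build_gentrome(
    transcriptome: Iterable[str], genome: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (decoy names, gentrome lines) from two FASTA line streams."""
    genome_lines = list(genome)
    # Decoys are the genome sequence names: first word of each header line.
    decoys = [
        line[1:].split()[0]
        for line in genome_lines
        if line.startswith(">") and line[1:].split()
    ]
    # Transcripts must come first, followed by the decoy (genome) sequences.
    gentrome = list(transcriptome) + genome_lines
    return decoys, gentrome


# Toy sequences standing in for the cDNA and genome Fasta files.
transcripts = [">T1 cdna\n", "ATGC\n"]
genome = [">1 dna:chromosome\n", "ACGTACGT\n", ">MT dna:chromosome\n", "TTAA\n"]
decoys, gentrome = build_gentrome(transcripts, genome)
```

The decoy names are then written one per line to a text file and passed to the Salmon indexing step together with the gentrome.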
Owner
- Name: tdayris
- Login: tdayris
- Kind: user
- Company: Institut Gustave Roussy
- Website: https://gustaveroussy.github.io/STRonGR/
- Repositories: 2
- Profile: https://github.com/tdayris
Bioinformatician
Citation (CITATION.cff)
authors:
  - family-names: Dayris
    given-names: Thibault
    orcid: https://orcid.org/0009-0009-2758-8450
cff-version: 1.2.0
date-released: '2025-06-13'
message: If you use this software, please cite it as below.
title: fair-genome-indexer
url: https://github.com/tdayris/fair_genome_indexer
version: 3.10.0