Recent Releases of ensemblgenedownload
ensemblgenedownload - v2.0.2 -- Vicious Uruk-hai (patch 2)
Enhancements & fixes
- Remove defaults from lib/Utils.groovy
- Groovy
Published by tkchafin over 1 year ago
ensemblgenedownload - v2.0.1 - Vicious Uruk-hai (patch 1)
Enhancements & fixes
- Update module versions
- Remove reference to Anaconda repositories
Software dependencies
Note: since the pipeline uses Nextflow DSL2, each process runs in its own Biocontainer, so the pipeline may occasionally use different versions of the same tool. The overall software dependency changes since the last release are listed below for reference. Only Docker and Singularity containers are supported; conda is not.
| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| Python | 3.8.3,3.9.1 | 3.9.1 |
| samtools | 1.17 | 1.21 |
| tabix | 1.11 | 1.20 |
Published by tkchafin over 1 year ago
ensemblgenedownload - v2.0.0 – Vicious Uruk-hai
This version supports the new FTP structure of Ensembl
Enhancements & fixes
- Support for the updated directory structure of the Ensembl FTP
- Relative paths in the sample-sheet are now evaluated from the `--outdir` parameter
- Memory usage rules for `samtools dict`
- Appropriate use of `tabix`'s TBI and CSI indexing, depending on the sequence lengths
- New command-line parameter (`--annotation_method`): required for accessing the files on the Ensembl FTP
- `--outdir` is a mandatory parameter
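The choice between TBI and CSI indexing mentioned in the list above can be illustrated with a minimal sketch. It assumes the decision is driven by the TBI format's limit of 2^29 bases (512 Mbp) per sequence; the exact rule used by the pipeline is not stated in these notes, and the function names here are hypothetical.

```python
# Largest sequence length a TBI index can address: 2^29 bases (512 Mbp).
# Assumption: the pipeline switches to CSI above this limit.
TBI_MAX_SEQ_LENGTH = 2 ** 29


def choose_tabix_index(sequence_lengths):
    """Return "tbi" when every sequence fits in a TBI index, else "csi"."""
    longest = max(sequence_lengths, default=0)
    return "tbi" if longest <= TBI_MAX_SEQ_LENGTH else "csi"


def tabix_command(gff_path, sequence_lengths):
    """Build (but do not run) the tabix command line for a bgzipped GFF3 file."""
    command = ["tabix", "--preset", "gff"]
    if choose_tabix_index(sequence_lengths) == "csi":
        # --csi asks tabix to write a CSI index instead of the default TBI.
        command.append("--csi")
    command.append(gff_path)
    return command
```

Building the command as a list rather than running it keeps the sketch independent of whether `tabix` is installed.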
Parameters
| Old parameter | New parameter |
| ------------- | ------------------- |
| | --annotation_method |
In the samplesheet
| Old parameter | New parameter |
| ------------- | ----------------- |
| species_dir | outdir |
| | annotation_method |
| assembly_name | |
NB: Parameter has been updated if both old and new parameter information is present.
NB: Parameter has been added if just the new parameter information is present.
NB: Parameter has been removed if new parameter information isn't present.
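As an illustration of how relative sample-sheet paths might be evaluated from `--outdir` in this release, here is a minimal sketch. The function name is hypothetical, and leaving absolute paths unchanged is an assumption rather than documented behaviour.

```python
from pathlib import Path


def resolve_outdir(samplesheet_outdir, base_outdir):
    """Resolve a sample-sheet output directory against the --outdir value."""
    path = Path(samplesheet_outdir)
    if path.is_absolute():
        # Assumption: absolute paths in the sample-sheet are used as-is.
        return path
    # Relative paths are interpreted from the --outdir directory.
    return Path(base_outdir) / path
```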
Software dependencies
Note: since the pipeline uses Nextflow DSL2, each process runs in its own Biocontainer, so the pipeline may occasionally use different versions of the same tool. The overall software dependency changes since the last release are listed below for reference. Only Docker and Singularity containers are supported; conda is not.
| Dependency | Old version | New version |
| ---------- | ----------- | ----------- |
| multiqc | 1.13 | 1.14 |
Published by muffato about 2 years ago
ensemblgenedownload - v1.0.1 – Hefty mûmakil (patch 1)
Overview
The pipeline takes a CSV file that contains assembly accession numbers, Ensembl species names (as they may differ from Tree of Life ones!), output directories, and geneset versions.
Assembly accession numbers are optional. If one is missing, the pipeline assumes it can be retrieved from a file named ACCESSION in the standard location on disk.
The pipeline downloads the Fasta files of the genes (cdna, cds, and protein sequences) as well as the GFF3 file.
All files are compressed with bgzip, and indexed with samtools faidx or tabix.
Steps involved:
- Download from Ensembl the GFF3 file, and the sequences of the genes in Fasta format.
- Compress and index all Fasta files with `bgzip`, `samtools faidx`, and `samtools dict`.
- Compress and index the GFF3 file with `bgzip` and `tabix`.
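The compression and indexing steps above can be sketched as the command lines they correspond to. This is an illustration only: the function names and the `.dict` output file name are hypothetical, the commands are built but not executed, and the GFF3 file is assumed to be already sorted by sequence name and start position, as `tabix` requires.

```python
def fasta_commands(fasta_path):
    """Commands to compress and index one Fasta file (built, not executed)."""
    compressed = fasta_path + ".gz"
    return [
        # bgzip replaces the input file with <fasta_path>.gz
        ["bgzip", fasta_path],
        # faidx on a bgzipped Fasta writes the .fai and .gzi indexes
        ["samtools", "faidx", compressed],
        # dict writes a sequence dictionary; the output name is an assumption
        ["samtools", "dict", "-o", compressed + ".dict", compressed],
    ]


def gff3_commands(gff3_path):
    """Commands to compress and index one GFF3 file (built, not executed)."""
    compressed = gff3_path + ".gz"
    return [
        ["bgzip", gff3_path],
        # the gff preset tells tabix which columns hold sequence name and coordinates
        ["tabix", "--preset", "gff", compressed],
    ]
```

Each inner list could be passed to `subprocess.run(..., check=True)` on a machine where the tools are installed.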
Fixed since v1.0.0
When a samplesheet is provided, the individual command-line parameters are no longer processed.
Dependencies
All dependencies are automatically fetched by Singularity.
- bgzip
- samtools
- tabix
- python3
- wget
- awk
- gzip
Published by muffato over 3 years ago
ensemblgenedownload - v1.0.0 – Hefty mûmakil
Overview
The pipeline takes a CSV file that contains assembly accession numbers, Ensembl species names (as they may differ from Tree of Life ones!), output directories, and geneset versions.
Assembly accession numbers are optional. If one is missing, the pipeline assumes it can be retrieved from a file named ACCESSION in the standard location on disk.
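A minimal sketch of reading such a CSV file with an optional accession column is shown below. The column names (`assembly_accession`, `ensembl_species_name`, `outdir`, `geneset_version`) and the example values are placeholders, not the pipeline's actual schema, and the location of the ACCESSION file relative to the output directory is an assumption, since the notes only refer to "the standard location on disk".

```python
import csv
import io
from pathlib import Path


def accession_from_disk(outdir):
    """Read the accession from a file named ACCESSION (location is an assumption)."""
    return (Path(outdir) / "ACCESSION").read_text().strip()


def read_samplesheet(handle, lookup_accession=accession_from_disk):
    """Parse the CSV and fill in any missing accession via lookup_accession."""
    rows = []
    for row in csv.DictReader(handle):
        accession = (row.get("assembly_accession") or "").strip()
        if not accession:
            # Fall back to the on-disk lookup when the column is empty.
            accession = lookup_accession(row["outdir"])
        rows.append({**row, "assembly_accession": accession})
    return rows


# Usage with placeholder values; the lookup is stubbed so no files are read.
example = io.StringIO(
    "assembly_accession,ensembl_species_name,outdir,geneset_version\n"
    "ACC_A,Species_a,out/a,version_a\n"
    ",Species_b,out/b,version_b\n"
)
rows = read_samplesheet(example, lookup_accession=lambda outdir: "FROM_" + outdir)
```

Passing the lookup as a parameter keeps the file-system convention in one place, so it can be swapped for whatever the real layout is.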
The pipeline downloads the Fasta files of the genes (cdna, cds, and protein sequences) as well as the GFF3 file.
All files are compressed with bgzip, and indexed with samtools faidx or tabix.
Steps involved:
- Download from Ensembl the GFF3 file, and the sequences of the genes in Fasta format.
- Compress and index all Fasta files with `bgzip`, `samtools faidx`, and `samtools dict`.
- Compress and index the GFF3 file with `bgzip` and `tabix`.
Dependencies
All dependencies are automatically fetched by Singularity.
- bgzip
- samtools
- tabix
- python3
- wget
- awk
- gzip
Published by muffato over 3 years ago