Recent Releases of genomepy
genomepy - [0.16.2] - 2025-05-12
Fixed
- Ensembl release versions no longer include unreleased versions
- unit tests
- upgraded formatters (and fixed the marked grammar & spelling errors)
Scientific Software - Peer-reviewed
- Python
Published by siebrenf 10 months ago
genomepy - [0.16.1] - 2023-06-14
Fixed
- fix for NCBI's assembly report header "asm_submitter" instead of "submitter"
Published by siebrenf over 2 years ago
genomepy - [0.16.0] - 2023-05-31
Added
- genomepy search now accepts the --exact flag
- genomepy.Annotation.attributes() returns a list of all attributes from the GTF attributes column.
- e.g. gene_name, gene_version
- nice to use with genomepy.Annotation.from_attributes() or genomepy.Annotation.gtf_dict()
- When installing assemblies from older Ensembl release versions, a clearer error message is given if the assembly cannot be found:
- if the release does not exist, options will be given
- if the assembly does not exist on the release version, all available options are given
- if the URL to the genome or annotation files is incorrect, the error message stays the same
- new config option: ucsc_mirror, options: eu or us.
- the mirror should only affect download speed
- can be nice if the other mirror is down!
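To illustrate what listing the attributes of a GTF file involves, here is a minimal pure-Python sketch. It is not genomepy's implementation: gtf_attribute_keys is a hypothetical helper, and it assumes attribute values never contain semicolons.

```python
def gtf_attribute_keys(gtf_lines):
    """Hypothetical helper: list the attribute keys used in column 9 of a GTF.

    Assumes 'key "value";' pairs whose values contain no semicolons.
    """
    keys = []
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        for item in fields[8].split(";"):
            item = item.strip()
            if item:
                key = item.split(None, 1)[0]
                if key not in keys:
                    keys.append(key)  # first-seen order, no duplicates
    return keys
```

Keeping the keys in first-seen order makes the output stable, which is convenient when choosing a key to pass on to another function.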
Changed
- function get_division is now a class method of EnsemblProvider
- EnsemblProvider class methods get_division and get_version now require an assembly name.
- UCSC data is now downloaded over HTTPS instead of HTTP
Fixed
- genomepy.install() now returns a Genome instance with updated annotation attributes.
- now ignoring ~1600 assemblies from the Ensembl database with incorrect metadata
- no easy way to retrieve this data
Published by siebrenf over 2 years ago
genomepy - [0.15.0] - 2023-02-28
Added
- you can now tune the cache expiration time in the config
- create a config with genomepy config generate, then tweak the values as desired.
- support for biopython >=1.80 with pyfaidx update
- raise an informative error when UCSC tools are missing
- this should only happen in Pip installations
Fixed
- disabling already disabled plugins no longer throws an error
- bgzipping fixes:
- bgzip works again with python>3.7 (openssl shenanigans; tabix was deprecated for htslib)
- genome index works with genomepy install --bgzip (a 2nd index is created with the correct naming format)
- export file works with genomepy install --bgzip
- genomepy.install_genome(bgzip=True) returns a Genome class instance with correct paths
Published by siebrenf almost 3 years ago
genomepy - [0.14.0] - 2022-08-01
Added
- now using filelock for improved thread safety
- now checking if every API/FTP/HTTP(S) is accessible before proceeding
- genomepy search improvements:
- text search now accepts regex, and multiple substrings (space separated) are unordered.
- taxonomy search now returns all hits that start with the given number.
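The search behaviour described above can be approximated as follows. This is a hypothetical sketch: text_match and taxonomy_match are illustrative names, not genomepy functions, and case-insensitive matching is an assumption.

```python
import re


def text_match(term, description):
    """Hypothetical: every space-separated part of the term is a regex that
    must match somewhere in the description, in any order (case-insensitive)."""
    return all(re.search(part, description, re.IGNORECASE) for part in term.split())


def taxonomy_match(term, taxid):
    """Hypothetical: a numeric term matches every taxonomy ID starting with it."""
    return str(taxid).startswith(str(term))
```

Because each part is searched independently, "sapiens homo" and "homo sapiens" select the same genomes.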
Changed
- switched to pyproject.toml + hatchling for packaging
Fixed
- updated the README and CLI documentation to mention the Local provider
Published by siebrenf over 3 years ago
genomepy - [0.13.1] - 2022-06-21
Changed
- removed unused keys from Ensembl and UCSC databases to reduce their size
Fixed
- added a retry for initializing the diskcache (seq2science/issues/887)
- can now find Ensembl URLs for genomes not using url_names properly (#205)
Published by siebrenf over 3 years ago
genomepy - [0.13.0] - 2022-06-02
Added
- genomepy search and genomepy genomes can now return the (unfiltered) absolute genome size with argument --size
Changed
- changed caching backend to diskcache (thread safe)
- reduced the local cache size of NCBI (by about half)
- by only storing assembly summary columns actually used by genomepy
Published by siebrenf over 3 years ago
genomepy - [0.12.0] - 2022-03-28
Added
- genomepy.Annotation.lengths() to retrieve the gene/transcript lengths.
- genomepy.Annotation.from_attributes() can extract any sub-column from that pesky attributes column
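Below is a minimal sketch of pulling one sub-column out of the GTF attributes column and computing feature lengths. It assumes quoted attribute values and GTF's 1-based, inclusive coordinates. attribute_value and feature_lengths are hypothetical helpers, and genomepy's own lengths() may define length differently.

```python
import re


def attribute_value(attributes, key):
    """Hypothetical: return the quoted value of `key` in a GTF attributes
    string, or None if the key is absent."""
    match = re.search(rf'(?:^|;)\s*{re.escape(key)}\s+"([^"]*)"', attributes)
    return match.group(1) if match else None


def feature_lengths(gtf_lines, feature="gene", key="gene_name"):
    """Hypothetical: map each feature's `key` attribute to its length.

    GTF coordinates are 1-based and inclusive, so length = end - start + 1.
    """
    lengths = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9 or fields[2] != feature:
            continue
        name = attribute_value(fields[8], key)
        if name is not None:
            lengths[name] = int(fields[4]) - int(fields[3]) + 1
    return lengths
```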
Changed
- updated Boyle-lab blacklists
- genomepy.Annotation.genes() default changed from bed (commonly containing transcript names) to gtf (gene names)
Fixed
- blacklists now work with GENCODE
- query_mygene no longer filters input.
- genomepy install with the local provider now understands you want the annotation if you pass a path to an annotation
Published by siebrenf almost 4 years ago
genomepy - [0.11.1] - 2022-01-06
Added
- quiet flag for genomepy.Annotation
- genomepy -v flag
Changed
- genomepy.Annotation raises a FileNotFoundError instead of a ValueError where appropriate.
- download_assembly_report refactored. Now downloads the report for the exact same assembly accession (and not the nearest NCBI assembly).
- broader unit tests for UCSC assembly accession scraping
Fixed
- inconsistent behaviour with assembly reports (#193 + #194)
Published by siebrenf about 4 years ago
genomepy - [0.11.0] - 2021-11-18
Added
- extended docstrings
- GENCODE support (GENCODE gene annotations with UCSC genomes)
- only contains the main chromosomes, no scaffolds or alternate haplotypes.
- only contains 4 assemblies (2 mouse, 2 human)
- excellent annotations for these regions & species though!
- Ensembl's GRCh37 can now be downloaded through genomepy
- Local fasta/gtf/gff(3)/bed file support
- you can install a local genome and/or annotation by providing local path(s) to genomepy install
- if annotation downloading is requested, but no annotation path is provided, a gtf/gff(3) annotation will be sought in the genome's source directory.
- Annotation.gtf_dict creates a dictionary for any key-value pair in the GTF columns or attribute fields!
- e.g. Annotation.gtf_dict("seqname", "gene_name")
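As a rough picture of what such a key-value dictionary contains, here is a pure-Python sketch. gtf_dict_sketch is hypothetical and not genomepy's Annotation.gtf_dict. It assumes each key maps to a list of its unique values, and that a name is looked up first among the eight fixed GTF columns and otherwise in the attributes.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end", "score", "strand", "frame"]


def _field(fields, name):
    """Look up a fixed GTF column by name, else an attribute in column 9."""
    if name in GTF_COLUMNS:
        return fields[GTF_COLUMNS.index(name)]
    for item in fields[8].split(";"):
        parts = item.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == name:
            return parts[1].strip('"')
    return None


def gtf_dict_sketch(gtf_lines, key, value):
    """Hypothetical: map each `key` value to the unique `value` values seen with it."""
    result = {}
    for line in gtf_lines:
        fields = line.rstrip("\n").split("\t")
        if line.startswith("#") or len(fields) < 9:
            continue
        k, v = _field(fields, key), _field(fields, value)
        if k is None or v is None:
            continue  # record lacks one of the requested fields
        values = result.setdefault(k, [])
        if v not in values:
            values.append(v)
    return result
```

With the arguments from the example above, ("seqname", "gene_name"), this sketch lists the gene names found on each contig.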
Changed
- Genome.track2fasta can now ignore comment lines (starting with #)
- Genome.track2fasta will skip header lines (a warning will be printed)
- Genome.track2fasta will ignore regions that cannot be parsed (a warning will be printed)
- these fixes should improve gimme scan performance and feedback
- UCSC annotation conversion tool settings tweaked. Better results with source gff files.
- Ensembl now uses HTTP instead of FTP (in some cases). This improves stability on some servers.
- tweaked search result alignment for clarity
- explained UCSC annotations in the README
- better file path handling (relative paths, user home and variables are expanded)
- Annotation now accepts a file/directory/genomepy name as first argument.
- this merges 2 arguments into one.
- Annotation.map_genes now works without a README file
- you can now set Annotation.tax_id manually.
Fixed
- Ensembl annotations from previous releases can now be downloaded as intended.
- Genome.track2fasta will skip regions that clearly don't make sense (start > end, or start < 0)
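The tolerant region parsing described in the Changed and Fixed entries for track2fasta can be sketched as follows. This is a hypothetical illustration, not genomepy's code. It assumes two input styles ("chrom:start-end" or tab-separated BED) and treats lines starting with "track " or "browser " as headers.

```python
import re
import warnings

REGION = re.compile(r"^(\S+):(-?\d+)-(-?\d+)$")


def parse_regions(lines):
    """Hypothetical: return (chrom, start, end) tuples, skipping bad lines."""
    regions = []
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank or comment line: skip silently
        if line.startswith(("track ", "browser ")):
            warnings.warn(f"line {n}: skipping header line")
            continue
        match = REGION.match(line)
        fields = line.split("\t")
        try:
            if match:
                chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
            else:
                chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        except (IndexError, ValueError):
            warnings.warn(f"line {n}: cannot parse region, skipping")
            continue
        if start < 0 or start > end:
            warnings.warn(f"line {n}: invalid coordinates, skipping")
            continue
        regions.append((chrom, start, end))
    return regions
```

Warning and moving on, instead of raising, lets one malformed line cost a single region rather than the whole run.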
Published by siebrenf over 4 years ago
genomepy - Version 0.10.0
[0.10.0] - 2021-07-30
Added
- Annotation class, containing
- regex filter (genomepy.Annotation.filter_regex())
- sanitize functions (genomepy.Annotation.sanitize())
- option to skip filtering and/or matching the annotation to the genome (also on CLI)
- gene name remapping to various formats (genomepy.Annotation.map_genes())
- using MyGene.info. Can be queried separately (genomepy.annotation.query_mygene())
- contig name remapping to other provider formats (genomepy.Annotation.map_locations())
- get the annotations, or gene locations, as dataframes (genomepy.Annotation.gtf, bed or gene_coords() respectively)
- get the gene names as a list (genomepy.Annotation.genes("gtf") or genomepy.Annotation.genes("bed"))
- genomepy install now attempts to install the NCBI assembly report
- NCBI provider also indexes the NCBI genbank_historical summary
- genomepy search now shows if the genome has an annotation
- this slows down the results a bit
- to compensate, results are now shown as soon as they are found
- for UCSC, availability of any of the 4 annotations is shown
- genomepy annotation shows the first line(s) of each gene annotation.gtf
- for developers:
- pre-commit-hooks for linting
- formatting/linting script tests/format.sh (optional argument lint)
- isort & autoflake formatters
Changed
- provider module split per provider
- ProviderBase overhauled, now called Provider
- regex filtering separated from Provider.download_genome
- utils module split into utils, files and online
- now using loguru for pretty logging
- accession search improved
- now finds GCA and GCF accessions
- now ignores patch levels
- genomepy install automatic provider selection refactored
- Provider.online_providers returns a generator (faster!)
- genomepy install uses a combined filter function (faster!)
- genomepy install only zips annotation files if the genome is zipped (with the bgzip flag) (faster!)
- NCBI provider should be parsed faster (faster!)
- new dependency: pandas
- tests no longer format code
Fixed
- broken URLs should keep genomepy occupied for less time (check_url will immediately return on "Not Found" errors 404/450) (faster!)
- the Genome class now passes arguments to the parent Fasta class
- the Genome class now regenerates the sizes and gaps files similarly to the Fasta class and its index (when the genome is younger) (faster!)
- somewhat more pythonic tests
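Regenerating a derived file "when the genome is younger" amounts to a modification-time check. Here is a minimal sketch, assuming an uncompressed FASTA and a tab-separated sizes file. The function names are hypothetical and do not correspond to genomepy's Genome class methods.

```python
import os


def sizes_are_stale(genome_path, sizes_path):
    """Hypothetical: True if the sizes file is missing or older than the FASTA."""
    if not os.path.exists(sizes_path):
        return True
    return os.path.getmtime(genome_path) > os.path.getmtime(sizes_path)


def write_sizes(genome_path, sizes_path):
    """Write '<contig>\\t<length>' lines (like a UCSC chrom.sizes file)."""
    sizes, name = {}, None
    with open(genome_path) as fasta:
        for line in fasta:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]  # contig name = first word of header
                sizes[name] = 0
            elif name is not None:
                sizes[name] += len(line)
    with open(sizes_path, "w") as out:
        for contig, length in sizes.items():
            out.write(f"{contig}\t{length}\n")
    return sizes


def ensure_sizes(genome_path, sizes_path):
    """Only pay for a rescan of the FASTA when the genome changed."""
    if sizes_are_stale(genome_path, sizes_path):
        write_sizes(genome_path, sizes_path)
```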
Published by siebrenf over 4 years ago
genomepy - Version 0.9.3
[0.9.3] - 2021-02-03
Changed
- URL provider got better at searching for annotation files
- NCBI provider will fall back on FTP if HTTPS is offline
Fixed
- genomes from ftp locations not working
Published by siebrenf about 5 years ago
genomepy - Version 0.9.2
[0.9.2] - 2021-01-28
Added
- progress bars for downloading and bgzipping (the slow stuff)
- spinner to indexing plugins (the slowest stuff)
Changed
- removed dependency on psutils
- added dependency on tqdm
Fixed
- an oopsie in the regex filter functions slowing down install.
- rmrf and mkdirp now behave more like their namesakes.
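"Behaving like their namesakes" means matching rm -rf and mkdir -p: no error when the target is already gone or already exists. Here is a minimal standard-library sketch of that contract. These rmrf and mkdirp definitions are hypothetical stand-ins, and genomepy's own helpers may differ in signature and detail.

```python
import os
import shutil


def mkdirp(*path_parts):
    """Like `mkdir -p`: create the directory and any missing parents,
    without complaining if it already exists. Returns the path."""
    path = os.path.join(*path_parts)
    os.makedirs(path, exist_ok=True)
    return path


def rmrf(path):
    """Like `rm -rf`: remove a file, symlink or directory tree,
    and stay silent if the path does not exist."""
    if os.path.isdir(path) and not os.path.islink(path):
        shutil.rmtree(path)
    elif os.path.lexists(path):
        os.unlink(path)
```

The symlink check keeps rmrf from descending into a linked directory, just as rm -rf removes the link itself.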
Published by siebrenf about 5 years ago
genomepy - Version 0.9.1
[0.9.1] - 2020-10-26
Added
- genomepy install flag -k/--keep-alt to keep alternative regions
- argparse custom type for a genome command line argument
Changed
- added retries to UCSC and NCBI
- added retries to Travis tests
- Bucketcache improvements
- genomepy search keeps searching after an exact match is found
- genomepy install removes alternative regions by default
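Removing alternative regions by default, with -k/--keep-alt as an opt-out, can be pictured as a contig-name filter. This is a hypothetical sketch. It assumes alternative haplotype contigs carry "alt" as a separate name token (e.g. chr6_GL000250v2_alt), and genomepy's actual pattern may differ.

```python
import re

# Assumption: "alt" appears as its own token, delimited by _ . - or the name ends.
ALT_PATTERN = re.compile(r"(^|[_.-])alt($|[_.-])", re.IGNORECASE)


def select_contigs(names, keep_alt=False):
    """Hypothetical: drop alternative-region contigs unless keep_alt is set."""
    if keep_alt:
        return list(names)
    return [name for name in names if not ALT_PATTERN.search(name)]
```

Requiring a delimiter around "alt" avoids discarding contigs that merely contain those three letters inside a longer word.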
Fixed
- genomepy clean won't complain when there is nothing to clean
- properly gzip the annotation.gtf if it was unzipped during sanitizing
- genomepy install can use the URL provider again
- genomepy install with -f/--force will overwrite previous sizes and gaps files
Published by simonvh over 5 years ago
genomepy - Version 0.9.0
[0.9.0] - 2020-09-01
Added
- check to see if providers are online + error message if not
- automatic provider selection for genomepy install
- optional provider flag for genomepy install (-p/--provider)
- if no provider is passed to genomepy install, the first provider with the genome is used (order: Ensembl > UCSC > NCBI).
- genomepy clean removes local caches. Will be reloaded when required.
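The provider selection rule above (an explicit -p/--provider wins, otherwise the first provider in the order Ensembl > UCSC > NCBI that has the genome) can be sketched as follows. The catalogues mapping is a hypothetical stand-in for what each online provider offers. The online checks and the real genomepy Provider classes are not modelled.

```python
PROVIDER_ORDER = ("Ensembl", "UCSC", "NCBI")


def select_provider(genome, catalogues, provider=None):
    """Hypothetical: pick a provider for `genome`.

    catalogues maps provider name -> set of genome names it offers.
    """
    candidates = [provider] if provider else PROVIDER_ORDER
    for name in candidates:
        if genome in catalogues.get(name, set()):
            return name
    raise ValueError(f"{genome!r} not found with provider(s): {', '.join(candidates)}")
```

Iterating in a fixed priority order makes the choice deterministic when several providers host the same assembly name.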
Changed
- Ensembl genomes always download over ftp (http was too unstable)
- Ensembl release versions obtained via REST API (http was too unstable)
- genomepy search and genomepy providers only check online providers
- online functions now have a timeout and a retry system
- API changes to download_genome and download_annotation for consistency
Fixed
- Ensembl status check uses lighter url (more stable)
- search and install now consistently use safe search terms (no spaces)
- search now uses UTF-8, no longer crashing on \u2019 (a right single quotation mark).
- search case insensitivity fixed for assembly names.
- Bucketcache now stores less data, increasing responsiveness.
Published by simonvh over 5 years ago
genomepy - Version 0.8.4
[0.8.4] - 2020-07-29
- Fix bug where Genome.sizes dict contains str instead of int (#110).
- Fix bug with UTF-8 in README (#109).
- Fix bug where BED files with chr:start-end in 4th column are not recognized as BED files.
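The BED fix above comes down to which column decides the track type. In this hypothetical sketch (guess_track_type and its return labels are illustrative, not genomepy's get_track_type), only the first column is tested for the "chrom:start-end" form. A BED file whose 4th (name) column happens to hold "chr1:100-200" is therefore still recognised as BED.

```python
import re

REGION_RE = re.compile(r"^\S+:\d+-\d+$")


def guess_track_type(first_line):
    """Hypothetical: classify a track by its first data line."""
    fields = first_line.rstrip("\n").split("\t")
    if REGION_RE.match(fields[0]):
        return "region"  # a list of chrom:start-end regions
    if len(fields) >= 3 and fields[1].isdigit() and fields[2].isdigit():
        return "bed"  # chrom, start, end (+ optional extra columns)
    raise ValueError(f"unrecognised track line: {first_line!r}")
```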
Published by simonvh over 5 years ago
genomepy - Version 0.8.3
[0.8.3] - 2020-06-03
Fixed
- Fixed bug introduced by fixing a bug: Provider-specific options for genomepy install on command line work again
- UCSC annotations can now once again be obtained from knownGene.txt
Added
- UCSC gene annotations will now be downloaded in GTF format where possible
- Desired UCSC gene annotation type can now be specified in the genomepy install command using --ucsc-annotation
Changed
- Added the NCBI RefSeq gene annotation to the list of potential UCSC gene annotations for download
Published by simonvh over 5 years ago
genomepy - Version 0.8.2
[0.8.2] - 2020-05-25
Fixed
- Genome.sizes and Genome.gaps are now populated automatically.
- backwards compatibility with old configuration files (with genome_dir instead of genomes_dir)
- updating the README.txt will only happen if you have write permission
- after gzipping files the original unzipped file is now properly removed
- providers will only download genome summaries when specifically queried
Changed
- updated blacklist for hg38/GRCh38 based on work by Anshul Kundaje, see ENCODE README.txt
Published by simonvh over 5 years ago
genomepy - Release 0.8.1
[0.8.1] - 2020-05-11
Added
- Now using the UCSC REST API
- genomepy search now accepts taxonomy IDs
- genomepy search will now return taxonomy IDs and Accession numbers
- The README.txt will now store taxonomy IDs and Accession numbers
- Gene annotations:
- Downloading of annotation file (BED/GTF/GFF3) from URL
- Automatic search for annotation file (GTF/GFF3) in genome directory when downloading from URL
- Option for URL provider to link to annotation file (to process it similarly to other providers)
- Automatic annotation sanitizing (and skip sanitizing flag -s for genomepy install)
- Option to only download annotation with genomepy install -o
- Plugins:
- Blacklists are automatically unzipped.
- Multithreading support for plugins, thanks to alienzj!
- STAR now uses the annotation file for a (one-pass) splice-aware index
Changed
- sizes no longer a plugin, but always gets executed
- genomepy FUNCTION --help texts expanded
- all genomepy classes exported when imported into Python
- all providers now let you know when they are downloading assembly information.
- more descriptive feedback to installing & many errors
Removed
- Sizes plugin
- Old tests
- Removed outdated dependency xmltodict
Fixed
- genomepy config options made more robust
- README.txt will no longer:
- update 3x for each command
- drop regex info
- have duplicate lines
Refactoring
- Genome class moved to genome.py
- Many functions moved to utils.py
- Many other functions made static methods of a class
- Genome.track2fasta and Genome.get_random_sequence optimized
- All Provider classes now store their genomes as a dict-in-dict, with the assembly name as key.
- Many Provider class functions now standardized. Many functions moved from the daughter classes to the ProviderBase class.
- README.txt file generation and updating standardized
- Unit tests! All functions now have an individual test. Almost all tests use functions already tested prior to them.
- Old tests incorporated in several extra tests (e01, e02, e03).
- Raise statements now use more fitting errors
- All instances of os.remove exchanged for os.unlink
- Almost all warnings fixed
- Extensive, COVID19-enabled, and somewhat pointless alphabetizing, optimizing and/or organizing changes to
- imports everywhere
- .gitignore, .travis.yml, release_checklist.md, cli.py
- strings (many strings with .format() replaced with f-strings)
Published by simonvh almost 6 years ago
genomepy - Release 0.7.2
[0.7.2] - 2019-03-31
Fixes
- Fix minor issue with hg19 wrong blacklist url
- Ensembl downloads over http instead of https (release 99 no longer has https)
Published by simonvh almost 6 years ago
genomepy - Release 0.7.0
[0.7.0] - 2019-11-18
Added
- Direct downloading from url through url provider.
- Added --force flag. Files will no longer be overwritten by default.
- Provider specific options:
- --ensembl-version: specify release version.
- --ensembl-toplevel: by default, genomepy install will search for primary assemblies. This flag will only download toplevel assemblies.
- Added STAR index plugin
Changed
- Providers are now case-insensitive.
- Extended testing.
- Increased minimum Python version to 3.6.
- Removed gaps from plugins, added gaps to core functionality.
Fixed
- bugfix: NCBI will show all versions of an assembly (will no longer filter on BioSample ID, instead filters on asm_name).
- fix: gaps file will be generated when needed.
Published by simonvh over 6 years ago
genomepy - Release 0.6.1
[0.6.1] - 2019-10-10
Bugfix release
Fixed
- Fixed bug with get_track_type().
Published by simonvh over 6 years ago
genomepy - Release 0.6.0.
Added
- Support for storing bgzip-compressed genomes (#41).
Changed
- Removed support for Python 2 (2020 is close!).
Fixed
- Ensembl annotation for non-vertebrate genomes should work again.
- Fixed bug where a deleted or empty config file would result in an error.
Published by simonvh over 6 years ago
genomepy - Release 0.5.5
[0.5.5] - 2019-03-19
Added
- Plugin for downloading genome blacklists (from Kundaje lab).
Fixed
- Fix for new Ensembl REST API and FTP layout.
- Genomes from Ensembl with a space in their name can be downloaded.
- Plugin imports use relative parts to prevent conflicts with other imports.
Published by simonvh almost 7 years ago
genomepy - Release 0.5.4
Added
- Downloading annotation from NCBI now implemented.
- Genbank assemblies at NCBI can be searched and downloaded
Fixed
- Fixed #23.
- Fixed #26.
- Fixed Ensembl downloads (#30)
- Fixed FTP tests for CI
Published by simonvh almost 7 years ago
genomepy - Release 0.5.2
- Fixed genome_dir argument to genomepy install
- Fixed msgpack dependency
- Fixed issue with config generate where config directory does not exist.
Published by simonvh over 7 years ago
genomepy - Release 0.5.1
Release notes:
- Fixed installation issue with msgpack
Published by simonvh over 7 years ago
genomepy - Release 0.5.0
Release notes:
- Added support for indexing genomes (bwa, bowtie2, gmap, hisat2, minimap2)
- Added configuration management
- Switched tests to py.test
Published by simonvh over 8 years ago
genomepy - Release 0.4.0
Changes:
- Optionally download annotation in BED and GTF format.
- Create BED file with gaps.
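A gaps BED file lists the runs of N in each sequence. Here is a minimal sketch, assuming gaps are runs of N/n and BED's 0-based, half-open coordinates. gap_intervals and gaps_bed are hypothetical helpers, not genomepy's implementation.

```python
import re


def gap_intervals(sequences):
    """Hypothetical: yield (contig, start, end) for every run of N/n.

    sequences maps contig name -> sequence string; coordinates are
    0-based and half-open, as in BED.
    """
    for contig, seq in sequences.items():
        for match in re.finditer(r"[Nn]+", seq):
            yield contig, match.start(), match.end()


def gaps_bed(sequences):
    """Render the gap intervals as BED text."""
    return "".join(f"{c}\t{s}\t{e}\n" for c, s, e in gap_intervals(sequences))
```

Since re.finditer reports match.start() and match.end() as a half-open span, these values can be written out as BED coordinates without any conversion.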
Published by simonvh over 8 years ago
genomepy - Version 0.3.1
Bugfix release. Changes:
- Added requests dependency
- Removed dependency on xdg, as it didn't support OSX
- Fixed string decoding bug
Published by simonvh almost 9 years ago
genomepy - Version 0.3.0
Version 0.3.0
- Started CHANGELOG.
- Genome listings are cached locally.
- Added -m hard option to install to hard-mask sequences.
- Added -l option to install for a custom name.
- Added -r and --match/--no-match options to select sequences by regex.
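Selecting sequences by regex and hard-masking can both be pictured with a few lines of Python. This is a hypothetical sketch, not genomepy's code. It assumes the regex is searched anywhere in the contig name, and that soft-masked (lower-case) bases become N when hard-masking.

```python
import re


def filter_contigs(records, regex, match=True):
    """Hypothetical: keep contigs whose name matches `regex` (match=True)
    or does not match it (match=False). records maps name -> sequence."""
    pattern = re.compile(regex)
    return {name: seq for name, seq in records.items() if bool(pattern.search(name)) == match}


def hard_mask(seq):
    """Hypothetical: turn soft-masked (lower-case) bases into N."""
    return re.sub(r"[a-z]", "N", seq)
```

For example, filter_contigs(records, "_", match=False) keeps only contigs without an underscore in their name, such as the main chromosomes in many UCSC assemblies.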
Published by simonvh almost 9 years ago
genomepy - Version 0.2.1
Published by simonvh almost 9 years ago