Releases | Open Source Science

Augur - 31.4.0

These release notes are automatically extracted from the full changelog.

Features

schema: Allow parentheses (()) in gene names. #1819 (@kimandrews)
geolocation rules: Add rules to define region per country to ensure that regions are labelled for all countries. This is especially useful for data sources that do not include region in the metadata. #1844 (@joverlee521)
Support NumPy v2 in addition to v1. #1855 (@corneliusroemer)
Support Python 3.13. #1857 (@corneliusroemer)
tree: Prefer iqtree3 binary over iqtree2 and iqtree when available. #1875 (@joverlee521)
export v2: URLs encoded in metadata (both TSV and node-data JSONs) will be associated with the value in the exported JSON. Given a column/key <X> then a valid URL in a column/key named <X>__url will be automatically used. This allows values to be a clickable link when viewed in Auspice. #1852 (@jameshadfield)
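As a rough illustration of the <X>__url convention described above, the following Python sketch pairs each metadata value with a URL found under a sibling <X>__url key. The helper names (is_valid_url, attach_urls), the loose http(s) validity check, and the output shape are assumptions made for illustration; they are not Augur's implementation or its exact exported JSON structure.

```python
from urllib.parse import urlparse

URL_SUFFIX = "__url"


def is_valid_url(text):
    """Loose check (an assumption): http(s) strings with a host count as valid."""
    parsed = urlparse(text)
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def attach_urls(record):
    """Pair each key <X> with the URL stored under <X>__url, if it is valid.

    Hypothetical helper: returns {key: {"value": ..., "url": ...}} and
    leaves out "url" when there is no valid companion URL.
    """
    result = {}
    for key, value in record.items():
        if key.endswith(URL_SUFFIX):
            # Assumption: companion keys are only used as link targets.
            continue
        entry = {"value": value}
        url = record.get(key + URL_SUFFIX, "")
        if is_valid_url(url):
            entry["url"] = url
        result[key] = entry
    return result
```

For example, a record with accession and accession__url keys yields a single accession entry carrying both the value and its link.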

Bug fixes

filter: Improved speed of using --group-by month on large datasets. #1845 (@victorlin)
merge: Added validation to require at least two sequence inputs for merging, consistent with metadata merging behavior. #1865 (@victorlin)
validate: Send all log messages to stderr. #1869 (@victorlin)
validate: Only print the entire merged Auspice config to stderr when there's a validation error. #1878 (@joverlee521)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] 5 months ago

Augur - 31.3.0

Features

traits: Added new options --branch-labels and --branch-confidence to export branch labels for nodes which have a corresponding state change. These are useful for creating streamtrees which convey geographic jumps. #1814 (@jameshadfield)
filter, merge: Added a new option --nthreads to configure parallelism. Right now, it is only passed to SeqKit, but it may be used for other internal optimizations in the future. #1833 (@victorlin)
filter: Added a new option --skip-checks to bypass checks for duplicates in sequences and whether ids in metadata have a sequence entry. Mainly useful when working with larger files. #1833 (@victorlin)
Added a new AUGUR_PROFILE environment variable. If set, Augur will run with Python's cProfile profiler and save the results to the file path given as the variable's value. This may result in slightly slower run times, and should only be used for debugging purposes. #1835 (@victorlin)
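The AUGUR_PROFILE behaviour described above can be approximated with the standard library. This is a minimal sketch, assuming a hypothetical wrapper named run_with_optional_profile around some entry point; how Augur wires this internally is not stated in these notes.

```python
import cProfile
import os


def run_with_optional_profile(main):
    """Run main(); if AUGUR_PROFILE is set, profile the call and save stats there.

    Hypothetical wrapper for illustration only.
    """
    profile_path = os.environ.get("AUGUR_PROFILE")
    if not profile_path:
        return main()
    profiler = cProfile.Profile()
    try:
        return profiler.runcall(main)
    finally:
        # Binary stats file, readable with the standard pstats module.
        profiler.dump_stats(profile_path)
```

A file saved this way can be inspected with Python's pstats module.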

Bug fixes

filter, merge: Improved run time of sequence I/O operations, especially in the common use case of having a workflow manager run multiple invocations simultaneously. #1833 (@victorlin)
filter, merge: Previously, SeqKit was hardcoded to use its default of 4 threads per command, which could have resulted in oversubscription of resources in the common use case of having a workflow manager run multiple invocations simultaneously. The default behavior has been updated to use 1 thread per command to discourage oversubscription of resources. It is configurable with the new --nthreads option described above. #1833 (@victorlin)
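The oversubscription concern above is simple arithmetic. The sketch below uses hypothetical numbers (8 concurrent invocations on an 8-core machine) and hypothetical function names to compare the old default of 4 threads per command with the new default of 1.

```python
def total_threads(concurrent_commands, threads_per_command):
    """Threads requested when a workflow manager runs commands side by side."""
    return concurrent_commands * threads_per_command


def is_oversubscribed(concurrent_commands, threads_per_command, available_cores):
    """True when more threads are requested than there are cores."""
    return total_threads(concurrent_commands, threads_per_command) > available_cores


# Hypothetical machine: 8 cores, 8 simultaneous invocations.
old_default = total_threads(8, 4)  # 32 threads competing for 8 cores
new_default = total_threads(8, 1)  # 8 threads, one per core
```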

Published by github-actions[bot] 6 months ago

Augur - 31.2.1

Bug fixes

curate format-dates: Removed redundant warning messages that were previously displayed when using --failure-reporting "warn". #1816 (@victorlin)
filter: Improved performance of --output-sequences by using SeqKit internally. #1794 (@victorlin)
filter: Improved performance when using --sequences without --sequence-index by skipping indexing of --sequences when no sequence-based filters are used. #1827 (@victorlin)
filter: Fixed a bug that prevented proper checking of duplicates and sequence index mismatches on VCF inputs. #1826 (@victorlin)
merge: Fixed a performance bug where input sequence file validation unnecessarily loaded file contents into device memory. #1820 (@victorlin)
refine: Fixed a bug where inferred dates were being wrongly marked as not inferred. #1829 (@victorlin)

Published by github-actions[bot] 7 months ago

Augur - 31.2.0

Features

merge: Support merging of sequence files with --sequences. #1579 (@victorlin)
read-file: Multiple files are now accepted. #1815 (@victorlin)
schema: Added fields for streamtrees and default zoom branch label. #1813

Bug fixes

Added a missing redirect for the environment variables documentation page from its previous location. #1812 (@tsibley)

Published by github-actions[bot] 7 months ago

Augur - 31.1.0

Features

schema: Allow full stop character (.) in gene names. #955 (@jameshadfield)

Bug fixes

filter: Improved speed of using --group-by, --min-date, and --max-date on large datasets. #1792, #1811 (@victorlin)

Published by github-actions[bot] 7 months ago

Augur - 31.0.0

Major Changes

augur mask --mask, augur tree --exclude-sites: BED files with inconsistent CHROM values (i.e., values in the first column of data lines) will throw an error, as Augur (implicitly) expects to be working on a single piece of DNA (chromosome, segment, etc), and multiple CHROM values in a BED file indicate a violation of this expectation. This is a breaking change. #945 (@genehack)
filter: Empty values in the metadata id column will result in an error that can only be resolved by editing the metadata file or by specifying a different id column with --metadata-id-columns. #1807 (@joverlee521)
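The BED consistency rule for augur mask --mask and augur tree --exclude-sites noted above can be sketched as follows. This is not augur.utils.read_bed_file(); the function name read_bed_chrom and the header detection (lines starting with "#", "browser", or "track") are assumptions for illustration. An input with no data lines is accepted, matching the empty-file behaviour listed in this release.

```python
def read_bed_chrom(lines):
    """Return the single CHROM value used by the BED data lines, or None.

    Hypothetical helper: raises ValueError when more than one CHROM value
    appears, and returns None when there are no data lines at all.
    """
    chroms = set()
    for line in lines:
        stripped = line.strip()
        # Assumed header/comment detection; the real rules may differ.
        if not stripped or stripped.startswith(("#", "browser", "track")):
            continue
        # CHROM is the first whitespace-delimited column of a data line.
        chroms.add(stripped.split()[0])
    if len(chroms) > 1:
        raise ValueError(
            f"BED file has inconsistent CHROM values: {sorted(chroms)}"
        )
    return next(iter(chroms), None)
```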

Bug fixes

augur mask --mask, augur tree --exclude-sites: Providing an empty BED file, or one with only header lines and no data lines, will no longer cause an error to be thrown. #945 (@genehack)
augur.utils.read_bed_file() was rewritten for increased compliance with the BED file specification. In particular, header line detection is improved and multiple header lines are now supported. #945 (@genehack)
export v2: Improved the error message that is displayed when the metadata index column has duplicated values. #1791 (@genehack)
tree: Improved help text for --tree-builder-args to explain that some IQ-TREE options won't work because of defline rewriting. #875 (@genehack)
export v2: Automatically rename fields within the filters and colorings configs of the provided auspice config file to match the renamed fields in the exported nodes. #1804 (@joverlee521)
export v2: Divergence values are now exported with increased precision, showing up to 6 significant digits instead of 3. #1801 (@rneher)
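To make the change in divergence precision noted above concrete, this sketch rounds a value to a given number of significant digits. The function round_sig is a hypothetical helper, not the code used by augur export v2.

```python
def round_sig(value, digits=6):
    """Round value to the given number of significant digits."""
    # The "g" format keeps `digits` significant digits; float() parses it back.
    return float(f"{value:.{digits}g}")


divergence = 0.000123456789
three_digits = round_sig(divergence, 3)  # 0.000123
six_digits = round_sig(divergence, 6)    # 0.000123457
```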

Published by github-actions[bot] 7 months ago

Augur - 30.0.1

Bug fixes

filter: Removed the note that appeared in output when running with --sequences and without --sequence-index. The help text of both options has been updated to clarify the relationship between the two. #1797 (@victorlin)

Published by github-actions[bot] 8 months ago

Augur - 30.0.0

Major Changes

Note: The following breaking changes were effective as of version 29.1.0.

filter: Date values in <year>-<month> format with more than 4 digits in the year (e.g. 02025-04) or more than 2 digits in the month (e.g. 2025-004) are no longer supported. Support for these was unintentional, but it worked in practice. #1786 (@victorlin)
filter: Date values in <year>-<month>-<day> format that fall outside of valid date boundaries now fail with an error. For example, 2025-00-01 is invalid. Previously, all date parts were treated categorically without date validation so month=0 was its own category. #1786 (@victorlin)
filter: Date values in <year>-<month> format that fall outside of valid date boundaries are now auto-converted to the closest date. For example, 2025-00 will be auto-converted to 2025-01. Previously, all date parts were treated categorically without date validation so month=0 was its own category. It will now be treated as month=1. This is a side-effect of the change in 29.1.0 that switched to the same internal date parsing function that is used by other commands. A future major version may change behavior to fail with an error to better align with handling of <year>-<month>-<day>. [#1774]
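The three date rules above can be summarised in a short sketch. The function normalize_date is hypothetical and is not Augur's internal date parser. Two details are assumptions: years of 1 to 4 digits are accepted, and an out-of-range month above 12 is clamped to 12 (the notes only give the 2025-00 to 2025-01 example).

```python
import datetime
import re

# At most 4 digits in the year and 2 digits in month/day (see rules above).
YEAR_MONTH = re.compile(r"(\d{1,4})-(\d{1,2})")
YEAR_MONTH_DAY = re.compile(r"(\d{1,4})-(\d{1,2})-(\d{1,2})")


def normalize_date(value):
    """Apply the rules above to a <year>-<month>[-<day>] string (sketch only)."""
    match = YEAR_MONTH_DAY.fullmatch(value)
    if match:
        year, month, day = map(int, match.groups())
        # Out-of-bounds parts fail with an error (datetime raises ValueError).
        datetime.date(year, month, day)
        return f"{year:04d}-{month:02d}-{day:02d}"
    match = YEAR_MONTH.fullmatch(value)
    if match:
        year, month = map(int, match.groups())
        # Out-of-bounds months are converted to the closest valid month.
        month = min(max(month, 1), 12)
        return f"{year:04d}-{month:02d}"
    raise ValueError(f"Unsupported date value: {value!r}")
```

With this sketch, "2025-00" becomes "2025-01", while "2025-00-01", "02025-04", and "2025-004" all raise an error.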

Bug fixes

filter: version 29.1.0 inadvertently dropped support for date values in <year>-<month> or <year>-<month>-<day> format that are not in YYYY-MM or YYYY-MM-DD format. Support for some values has been restored. See the "Major Changes" section for details on which values are explicitly no longer supported. #1785 (@victorlin)

Published by github-actions[bot] 9 months ago

Augur - 29.1.0

Features

export v2: Allow multiple auspice config files (--auspice-config) which are merged together. Note that the merging of lists extends the original list, although elements representing the same data are overwritten instead. You can optionally write out this merged config via --output-auspice-config for debugging purposes. #1756 (@jameshadfield)
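The list-merging rule described above (extend the list, but overwrite elements that represent the same data) might look like the following. This is a sketch under a stated assumption: two elements "represent the same data" when they share a "key" field. The function merge_lists is hypothetical, and how augur export v2 actually identifies matching elements is not described in these notes.

```python
def merge_lists(base, extra, key="key"):
    """Extend base with extra, overwriting elements that share the same key.

    Returns a new list; the input lists are left unchanged.
    """
    merged = list(base)
    positions = {item[key]: index for index, item in enumerate(merged)}
    for item in extra:
        if item[key] in positions:
            # Same data: the later config wins.
            merged[positions[item[key]]] = item
        else:
            positions[item[key]] = len(merged)
            merged.append(item)
    return merged
```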

Bug fixes

titers: Improve error messages when titer models do not have enough data. #1769 (@huddlej)
align: Remove extra logs for insertions since the coordinates are output to the *.insertions.csv. #1772 (@joverlee521)
filter: Fixed an error with weighted sampling by year. #1776 (@victorlin)
filter: Previously, subsampling with --group-by year or month would crash on numeric dates. This has been fixed by switching to the same internal date parsing function that is used by other commands. #1774 (@victorlin)
filter: Made a small adjustment to use pandas's "string" dtype alias when processing values in metadata. #1782 (@victorlin)
filter: Options --output and -o have been deprecated and will now show a warning message. See DEPRECATED.md for details. #1622 (@victorlin)
Updated outdated documentation on supported date formats in metadata. #882 (@victorlin)

Published by github-actions[bot] 9 months ago

Augur - 29.0.0

Major Changes

Updated default latitudes and longitudes for geography traits, including location name changes. See the pull request for more details. #1744 (@joverlee521)
curate apply-geolocation-rules: Augur's standard geolocation rules are used by default and rules provided via --geolocation-rules are considered custom rules that have precedence over the default rules. The --no-default-rules flag can be used to ignore the default rules. See the pull request for more details. #1745 (@joverlee521)
augur.utils.read_strains has been removed as it's been deprecated since January 2024. The same function is available through the public API as augur.io.read_strains. #1749 (@joverlee521)
Bumped minimum Python version to 3.9 as support for 3.8 was dropped in Augur v27.0.0. #1763 (@joverlee521)

Features

refine: Added a --remove-outgroup flag which can be used when rooting a tree on a single taxon. Rooting and removal of outgroup will be performed before any temporal inference, if applicable. #1751 (@jameshadfield)
Added standard geolocation rules in "augur/data/geolocation_rules.tsv" that can be used with augur curate apply-geolocation-rules. #1744 (@joverlee521)
[refine, export] Ambiguous dates (e.g. those with "XX" in the date string) are now exported in the Auspice JSON, and all tips now have an additional "inferred" boolean property. These changes only apply to temporal trees. #1760 (@jameshadfield)

Bug fixes

Certain strain names would be silently renamed by augur tree [--method iqtree]. We now avoid such renaming wherever possible and in cases where there are backslashes or single quotes we now raise a fatal error. Note that names with spaces in the FASTA header (description line) continue to be modified such that everything after the first space is not used in the resulting tree. #1750 (@jameshadfield)
Fixed the error that occurred when running augur curate --help. #1755 (@joverlee521)

Published by github-actions[bot] 10 months ago

Augur - 28.0.1

Bug Fixes

schema: document node property values support url. This feature has been supported in Auspice since v2.25.0. #1743 (@joverlee521)
augur.io.read_metadata: Ensure that the index column's dtype is always "string" so that numeric ids don't get converted to numeric dtypes. #1746 (@joverlee521)

Published by github-actions[bot] 11 months ago

Augur - 28.0.0

Major Changes

export v2: The string "none" is now an invalid value for --color-by-metadata and --metadata-columns options and will be ignored to prevent clashes with Auspice's internal use of "none". #1113 (@joverlee521)
schema: The string "none" is now an invalid branch label, node_attr key, and coloring key. #1113 (@joverlee521)
curate apply-geolocation-rules: The geolocation rule matching has been updated to be case-insensitive. Use the new --case-sensitive flag if you want to revert to the previous behavior of case-sensitive matching. #1740, #1741 (@joverlee521)
augur.io.read_sequences: Only accept the values "fasta" and "genbank" for format, instead of allowing any value supported by Biopython. #1731 (@victorlin)
- This also applies to augur.io.sequences.read_single_sequence, which is not in the public API.

Features

All commands: Support compressed formats for input sequence files. This was already the case for most commands. Internal standardization extends the support to all other commands. #1730 (@victorlin)

Bug Fixes

When using Biopython ≥1.85: properly detect augur ancestral --root-sequence file format and, for all commands, support FASTA files with comments. #1731 (@victorlin)

Internal changes

Added a new function augur.io.sequences.read_single_sequence as a wrapper around Bio.SeqIO.read with support for compressed formats, similar to the augur.io.sequences.read_sequences wrapper around Bio.SeqIO.parse. #1730 (@victorlin)

Published by github-actions[bot] 11 months ago

Augur - 27.2.0

Features

export: Added a new option --warning to display a warning banner in Auspice, supported as of Auspice version 2.62.0. #1722 (@victorlin)

Published by github-actions[bot] 11 months ago

Augur - 27.1.0

Features

ancestral: Add --seed argument to enable deterministic inference of root states by TreeTime. #1690 (@huddlej)

Bug Fixes

ancestral, refine: Explicitly specify how the root and ambiguous states are handled during sequence reconstruction and mutation counting. #1690 (@rneher)
titers: Fix type errors in code associated with cross-validation of models. #1688 (@huddlej)
export: The help text for --lat-longs has been improved with a link to the defaults and specifics around the overriding behavior. #1715 (@victorlin)
augur.io.read_metadata: Pandas versions <1.4.0 prevented this function from properly setting the index column's data type. Support for those older versions has been dropped. #1716 (@victorlin)
In version 24.4.0, one of the new features was that all options that take multiple values could be repeated. A few options were overlooked at the time; the following have been fixed in this version. #1707 (@victorlin)
- augur curate rename --field-map
- augur curate transform-strain-name --backup-fields
augur curate format-dates --expected-date-formats help text has been improved with clarifications regarding how values provided interact with builtin formats and how to match masked date parts. #1707, #1718 (@victorlin)
parse: Transform strain names the same way in both metadata and sequences instead of only transforming sequences. #1712 (@huddlej)

Published by github-actions[bot] 12 months ago

Augur - 27.0.0

Major Changes

Drop support for Python 3.8. #1693
Drop support for older versions of jsonschema (<4.18.0). #1691
Drop support for xopen <2.0.0. #1692

Bug fixes

export: validation will no longer crash with KeyError: 'tree' when newer versions of jsonschema (≥4.18.0) are installed. #1358

Published by github-actions[bot] about 1 year ago

Augur - 26.2.0

Features

This is the first version to officially support Python 3.12 and Pandas v2. #1671 (@corneliusroemer, @victorlin)
curate: change output metadata to RFC 4180 CSV-like TSVs to match the TSV format output by other Augur subcommands and the Nextstrain ecosystem as discussed in #1566. #1565 (@joverlee521)

Published by github-actions[bot] about 1 year ago

Augur - 26.1.0

Features

ancestral, translate: Add --skip-validation as an alias to --validation-mode=skip. #1656 (@victorlin)
clades: Allow customizing the validation of input node data JSON files with --validation-mode and --skip-validation. #1656 (@victorlin)
tree: When using iqtree, check for all synonyms of default args when detecting potential conflicts, e.g. --threads-max is equivalent to -ntmax. Previously, we were only checking for the latter. Also use new, preferred IQtree2 option names (e.g. --polytomy instead of -czb etc.). #1547 (@corneliusroemer)

Bug Fixes

index: Previously specifying a directory that does not exist in the path to --output would result in an incorrect error stating that the input file does not exist. It now shows the correct path responsible for the error. #1644 (@victorlin)
curate format-dates: Update help docs and improve failure messages to show use of --expected-date-formats. #1653 (@joverlee521)

Published by github-actions[bot] about 1 year ago

Augur - 26.0.0

Major Changes

filter: Duplicate header names in the FASTA file (--sequences) will now result in an error. #1613
parse: When both strain and name fields are present, the strain field will now be used as the sequence ID field. #1629 (@victorlin)
merge: Generated source columns (e.g. __source_metadata_{NAME}) are now omitted by default. They may be explicitly included with --source-columns=TEMPLATE or explicitly omitted with --no-source-columns. This may be a breaking change for any existing uses of augur merge relying on the generated columns, though as augur merge is relatively new we believe usage to be scant if extant at all. #1625 #1632 (@tsibley)
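To illustrate the source-column template mentioned above, the sketch below expands a template such as __source_metadata_{NAME} for each input table. It assumes that {NAME} is replaced by the table name and that no columns are generated when no template is given; source_column_names is a hypothetical helper, not part of Augur.

```python
def source_column_names(table_names, template=None):
    """Return the generated source column names, or [] when omitted by default."""
    if template is None:
        return []
    # Assumption: the {NAME} placeholder is filled with each table's name.
    return [template.format(NAME=name) for name in table_names]
```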

Bug Fixes

filter: Previously, when --subsample-max-sequences was slightly lower than the number of groups, it was possible to fail with an uncaught AssertionError. Internal calculations have been adjusted to prevent this from happening. #1588 #1598 (@victorlin)

Published by github-actions[bot] over 1 year ago

Augur - 25.4.0

Features

merge: Table-specific id columns and delimiters may now be specified, e.g. --metadata-id-columns X=id Y=strain and --metadata-delimiters X=, Y=';', to allow more precise behaviour and avoid ordering issues. #1594 (@tsibley)
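The table-specific TABLE=VALUE form shown above can be parsed along these lines. This is a sketch only: parse_table_options is a hypothetical function, and it assumes the shell has already removed any quoting (so Y=';' arrives as Y=;).

```python
def parse_table_options(values):
    """Turn ["X=id", "Y=strain"] into {"X": "id", "Y": "strain"}."""
    options = {}
    for value in values:
        # Split on the first "=" only, so the setting itself may contain "=".
        table, separator, setting = value.partition("=")
        if not separator or not table or not setting:
            raise ValueError(f"Expected TABLE=VALUE, got {value!r}")
        options[table] = setting
    return options
```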

Bug Fixes

filter: Improved warning and error messages in the case of missing columns. #1604
merge: Any user-customized ~/.sqliterc file is now ignored so it doesn't break augur merge's internal use of SQLite. #1608 (@tsibley)
merge: Non-id columns in metadata inputs that would conflict with the output id column are now forbidden and will cause an error if present. Previously they would overwrite values in the output id column, causing incorrect output. #1593 (@tsibley)
import: Spaces in BEAST MCC tree annotations (for example, from a discrete state reconstruction) no longer break augur import beast's parsing. #1610 (@watronfire)

Published by github-actions[bot] over 1 year ago

Augur - 25.3.0

Features

A new command, augur merge, now allows for generalized merging of two or more metadata tables. #1563 (@tsibley)
Two new commands, augur read-file and augur write-file, now allow external programs to do i/o like Augur by piping from/to these new commands. They provide handling of compression formats and newlines consistent with the rest of Augur. #1562 (@tsibley)
A new debugging mode can be enabled by setting the AUGUR_DEBUG environment variable to 1 (or any non-empty value). Currently the only effect is to print more information about handled (i.e. anticipated) errors. For example, stack traces and parent exceptions in an exception chain are normally omitted for handled errors, but setting this env var includes them. Future debugging and troubleshooting features, like verbose operation logging, will likely also condition on this new debugging mode. #1577 (@tsibley)
filter: Added the ability to use weights in subsampling. See help text of --group-by-weights and the updated Filtering and Subsampling guide for more information. #1454 (@victorlin)
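The AUGUR_DEBUG switch described above ("1" or any non-empty value) can be sketched with the standard library. The function names debug_enabled and report_handled_error, and the "ERROR:" message format, are assumptions for illustration rather than Augur's actual error handling.

```python
import os
import sys
import traceback


def debug_enabled(environ=os.environ):
    """True when AUGUR_DEBUG is set to any non-empty value."""
    return bool(environ.get("AUGUR_DEBUG"))


def report_handled_error(error, environ=os.environ, stream=sys.stderr):
    """Print a short message, or the full exception chain in debug mode."""
    if debug_enabled(environ):
        # Includes the stack trace and any chained parent exceptions.
        traceback.print_exception(type(error), error, error.__traceback__, file=stream)
    else:
        print(f"ERROR: {error}", file=stream)
```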

Bug Fixes

Embedded newlines in quoted field values of metadata files read/written by many commands, annotation files read by augur curate apply-record-annotations, and index files written by augur index are now properly handled. #1561 #1564 (@tsibley)
Output written to stderr (e.g. informational messages, warnings, errors, etc.) is now always line-buffered regardless of the Python version in use. This helps with interleaved stderr and stdout. Previously, stderr was block-buffered on Python 3.8 and line-buffered on 3.9 and higher. #1563 (@tsibley)

Published by github-actions[bot] over 1 year ago

Augur - 25.2.0

Features

export v2: we now limit numerical precision on floats in the JSON. This should not change how a dataset is displayed / interpreted in Auspice but allows the gzipped & minimised JSON filesize to be reduced by around 30% (dataset-dependent). #1512 (@jameshadfield)
traits, export v2: augur traits now reports all confidence values above 0.1% rather than limiting them to the top 4 results. There is no change in the eventual Auspice dataset as augur export v2 will still only consider the top 4. #1512 (@jameshadfield)
curate: Excel (.xlsx and .xls) and OpenOffice (.ods) spreadsheet files are now also supported as metadata inputs (--metadata). The first sheet in the workbook is read as tabular data. #1550 (@tsibley)

Bug Fixes

titers sub: Fixes a bug where antigenic weights were assigned to branches for substitutions in the incorrect order of <derived allele><position><ancestral allele> instead of <ancestral allele><position><derived allele>. #1555 (@huddlej)
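The corrected ordering above, <ancestral allele><position><derived allele>, is shown in the sketch below. The helpers format_substitution and parse_substitution and the single-character allele pattern are assumptions for illustration; they are not the code used by augur titers sub, and any gene prefix is left out.

```python
import re

# Assumed pattern: one allele character, a position, one allele character.
SUBSTITUTION = re.compile(r"([A-Z*-])(\d+)([A-Z*-])")


def format_substitution(ancestral, position, derived):
    """Build a label in <ancestral allele><position><derived allele> order."""
    return f"{ancestral}{position}{derived}"


def parse_substitution(label):
    """Split a label into (ancestral, position, derived)."""
    match = SUBSTITUTION.fullmatch(label)
    if match is None:
        raise ValueError(f"Not a substitution label: {label!r}")
    ancestral, position, derived = match.groups()
    return ancestral, int(position), derived
```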

Published by github-actions[bot] over 1 year ago

Augur - 25.1.1

Bug Fixes

curate parse-genbank-location: Fix a bug where a mix of empty and populated location-field values would result in inconsistent fields in the output NDJSON. #1531 (@genehack)

Published by github-actions[bot] over 1 year ago

Augur - 25.1.0

Features

Support xopen major version 2. Version 1 is deprecated and scheduled for removal around November 2024. #1532 (@corneliusroemer)
Support networkx major version 3. #1534 (@corneliusroemer)

Published by github-actions[bot] over 1 year ago

Augur - 25.0.0

Major changes

curate format-dates: Raises an error if provided date field does not exist in records. #1509 (@joverlee521)
All curate subcommands: Verifies all input records have the same fields and raises an error if a record does not have matching fields. #1518 (@joverlee521)

Features

Added a new sub-command augur curate apply-geolocation-rules to apply user curated geolocation rules to the geolocation fields in a metadata file. Previously, this was available as a script within the nextstrain/ingest repo. #1491 (@victorlin)
Added a default color for the "Asia" region that will be used in augur export if no custom colors are provided. #1490 (@joverlee521)
Added a new sub-command augur curate apply-record-annotations to apply user curated annotations to existing fields in a metadata file. Previously, this was available as the merge-user-metadata script in the nextstrain/ingest repo. #1495 (@joverlee521)
Added a new sub-command augur curate abbreviate-authors to abbreviate lists of authors to " et al." Previously, this was available as the transform-authors script within the nextstrain/ingest repo. #1483
Added a new sub-command augur curate parse-genbank-location to parse the geo_loc_name field from GenBank records. Previously, this was available as the translate-genbank-location script within the nextstrain/ingest repo. #1485
curate format-dates: Added defaults to --expected-date-formats so that ISO 8601 dates (%Y-%m-%d) and its various masked forms (e.g. %Y-XX-XX) are automatically parsed by the command. #1501 (@joverlee521)
Added a new sub-command augur curate transform-strain-name to filter strain names based on matching a regular expression. Previously, this was available as the transform-strain-names script within the nextstrain/ingest repo. #1514 (@genehack)
Added a new sub-command augur curate rename to rename field / column names. Previously, a similar version was available as the transform-field-names script within the nextstrain/ingest repo; however, the behaviour is slightly changed here. #1506 (@jameshadfield)

Bug Fixes

filter: Improve speed of checking duplicates in metadata, especially for large files. #1466 (@victorlin)
curate: Stop adding double quotes to the metadata TSV output when field values have internal quotes. #1493 (@joverlee521)
curate format-dates: Mask empty date values as XXXX-XX-XX to represent unknown dates. #1509 (@joverlee521)
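The curate format-dates behaviour in this release (default ISO 8601 formats with masked forms, and masking of empty values) can be approximated as below. This is a minimal sketch: format_date and the DEFAULT_FORMATS mapping are hypothetical, only three formats are listed, and the real command's parsing of masked parts may work differently.

```python
from datetime import datetime

# Input format -> output template. Masked parts stay as the literal "XX".
DEFAULT_FORMATS = {
    "%Y-%m-%d": "%Y-%m-%d",
    "%Y-%m-XX": "%Y-%m-XX",
    "%Y-XX-XX": "%Y-XX-XX",
}


def format_date(value, formats=DEFAULT_FORMATS):
    """Return an ISO 8601-style date string, masking unknown parts with X."""
    text = value.strip()
    if not text:
        # Empty values represent unknown dates.
        return "XXXX-XX-XX"
    for input_format, output_template in formats.items():
        try:
            parsed = datetime.strptime(text, input_format)
        except ValueError:
            continue  # try the next expected format
        return parsed.strftime(output_template)
    raise ValueError(f"Unable to format date {value!r} with the expected formats")
```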

Published by github-actions[bot] over 1 year ago

Augur - 24.4.0

Features

All commands: Allow repeating an option that takes multiple values. Previously, if multiple option flags were specified (e.g. --exclude-where 'region=A' --exclude-where 'region=B'), only the last one was used. Now, all values are used. #1445 (@victorlin)
ancestral, translate: output node data files are now validated. The argument --validation-mode is added which controls this behaviour (default: error). This argument also controls validation of the input node-data file (ancestral only). #1440 (@jameshadfield)
export: Updated default latitudes and longitudes for geography traits. This only applies if you are not using --lat-longs to override the built in mappings. #1449 (@trvrb)
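The difference between "only the last one was used" and "all values are used" for repeated options can be reproduced with Python's argparse. This sketch contrasts the default store action with action="extend"; it does not claim that Augur is implemented this way, and the --last-wins option is a made-up name for the comparison.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Default "store" action: a repeated flag replaces earlier values.
    parser.add_argument("--last-wins", nargs="+", default=[])
    # "extend" action: values from every occurrence are accumulated.
    parser.add_argument("--exclude-where", nargs="+", action="extend", default=[])
    return parser


args = build_parser().parse_args([
    "--exclude-where", "region=A",
    "--exclude-where", "region=B",
    "--last-wins", "first",
    "--last-wins", "second",
])
```

Here args.exclude_where holds both region=A and region=B, while args.last_wins holds only the final value.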

Bug Fixes

validation: We no longer exit with a non-zero exit code when the requested validation mode is "warn". #1440 (@jameshadfield)
validation: We no longer perform any validation when the requested validation mode is "skip". #1440 (@jameshadfield)
filter: Send all log messages to stderr. This allows output to be written to stdout (e.g. --output-strains /dev/stdout). #1459 (@victorlin)

Published by github-actions[bot] over 1 year ago

Augur - 24.3.0

Features

filter: Added a new option --max-length to filter out sequences that are longer than a given number of base pairs. #1429 (@victorlin)
parse: Added support for environments that use pandas 2.x. #1436 (@emollier, @victorlin)

Bug Fixes

filter: Updated docs with an example of tiered subsampling. #1425 (@victorlin)
export: Fixes bug #1433 introduced in v23.1.0, that causes validation to fail when gene names start with nuc, e.g. nucleocapsid. #1434 (@corneliusroemer)
import: Fixes bug introduced in v24.2.0 that prevented import beast from running. #1439 (@tomkinsc)
translate, ancestral: Compound CDS are now exported as segmented CDS and are now viewable in Auspice. #1438 (@jameshadfield)

Published by github-actions[bot] almost 2 years ago

Augur - 24.2.3

Bug Fixes

filter: Updated the help and report text of --min-length to explicitly state that the minimum length filter only counts standard nucleotide characters A, C, G, or T (case-insensitive). This has been the behavior since version 3.0.3.dev1, but has never been explicitly documented. #1422 (@joverlee521)
frequencies: Fixed a bug introduced in 24.2.0 and 24.1.0 that prevented --regions from working when providing regions other than the default "global" region. #1424
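The --min-length counting rule described above (only A, C, G, or T, case-insensitive) can be sketched as follows. The function names are hypothetical and this is not the code used by augur filter.

```python
def count_standard_nucleotides(sequence):
    """Count A, C, G, and T characters, ignoring case."""
    return sum(1 for base in sequence.upper() if base in "ACGT")


def passes_min_length(sequence, min_length):
    """True when the sequence has at least min_length standard nucleotides."""
    return count_standard_nucleotides(sequence) >= min_length
```

Under this rule a sequence such as "acgtNN--RY" counts as length 4, because ambiguous bases and gaps are ignored.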

Published by github-actions[bot] almost 2 years ago

Augur - 24.2.2

Bug Fixes

filter: In versions 24.2.0 and 24.2.1, --query stopped working in cases where internal optimizations added in version 24.2.0 failed to parse the columns from the query. It now falls back to non-optimized behavior that allows queries to work. #1418 (@victorlin)
filter: Handle backtick quoting in internal optimizations of --query. #1417 (@victorlin)

Published by github-actions[bot] almost 2 years ago

Augur - 24.2.1

Bug Fixes

frequencies: Fixed a bug introduced in 24.2.0 that prevented --method diffusion from working alongside --tree. #1412 (@victorlin)

Published by github-actions[bot] almost 2 years ago

Augur - 24.2.0

Features

filter: Added a new option --query-columns that allows specifying what columns are used in --query along with the expected data types. If unspecified, automatic detection of columns and types is attempted. #1294 (@victorlin)
augur.io.read_metadata: A new optional columns argument allows specifying a subset of columns to load. The default behavior still loads all columns, so this is not a breaking change. #1294 (@victorlin)
augur parse: A new optional --output-id-field argument allows the user to select any ID field for the produced FASTA file (e.g. 'accession' instead of 'name' or 'strain'). #1403 (@j23414)
- When no --output-id-field is given and the data has both name and strain fields, continue to preferentially use name over strain as the sequence ID field; but, throw a deprecation warning that the order will be switched to prefer strain over name in the future to be consistent with the rest of Augur.
- Added entry to DEPRECATED.md.
Compression should now be supported for all input and output files. Please open an issue if you find one that doesn't! #1381 (@victorlin)

Bug Fixes

filter: In version 24.1.0, automatic conversion of boolean columns was accidentally removed. It has been restored with additional support for empty values evaluated as None. #1410 (@victorlin)
filter: The order of rows in --output-metadata and --output-strains now reflects the order in the original --metadata. #1294 (@victorlin)
filter, frequencies, refine: Performance improvements to reading the input metadata file. #1294 (@victorlin)
- For filter, this comes with increased writing times for --output-metadata and --output-strains. However, net I/O time still decreased during testing of this change.
filter: Updated the help text of --include and --include-where to explicitly state that this can add strains that are missing an entry from --sequences. #1389 (@victorlin)
filter: Fixed the summary messages to properly reflect force-inclusion of strains that are missing an entry from --sequences. #1389 (@victorlin)
filter: Updated wording of summary messages. #1389 (@victorlin)
Enforce UTF-8 encoding when reading and writing files. Improve error messages when a non-UTF-8 file is used. #1381 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] almost 2 years ago

Augur - 24.1.0

These release notes are automatically extracted from the full changelog.

Features

augur.io.read_metadata: A new optional dtype argument allows custom data types for all columns. Automatic type inference still happens by default, so this is not a breaking change. #1252 (@victorlin)
augur.io.read_vcf has been removed and usage replaced with TreeTime's function of the same name which has improved validation of the VCF file. #1366 (@jameshadfield)

Bug Fixes

filter, frequencies, refine: Speed up reading of the metadata file. #1252 (@victorlin)
traits: Previously, columns with only numeric values were treated as numerical data. These are now treated as categorical data for discrete trait analysis. #1252 (@victorlin)
Support Biopython ≥1.82 by requiring bcbio-gff ≥0.7.1. #1400 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] almost 2 years ago

Augur - 24.0.0

These release notes are automatically extracted from the full changelog.

Major Changes

ancestral, translate: For VCF inputs please ensure you are using TreeTime 0.11.2 or later. A large number of bugfixes and improvements have been added in both Augur and TreeTime. #1355 and TreeTime #263 (@jameshadfield)
ancestral, translate: GenBank files now require the (GFF mandatory) source feature to be present. #1351 (@jameshadfield)
ancestral, translate: For GFF files, we extract the genome/sequence coordinates by inspecting the sequence-region pragma, region type and/or source type. This information is now required. #1351 (@jameshadfield)

Features

ancestral, translate: Improvements to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
- Output VCF will better match the input VCF, including CHROM name and ploidy encoding.
- VCF inputs now require --vcf-reference-output
- AA sequences are now exported for the tree root
- VCF writing is now 3 orders of magnitude faster (dataset dependent)
ancestral, translate: A range of improvements to how we parse GFF and GenBank reference files. #1351 (@jameshadfield)
- translate will now always export a 'nuc' annotation in the output JSON, allowing it to pass validation
- Gene/CDS names of 'nuc' are now forbidden.
- If a Gene/CDS in the GFF/GenBank file is unparsed we now print a warning.
ancestral: For VCF alignments, a VCF output file is now only created when requested via --output-vcf. #1344 (@jameshadfield)
ancestral: Improvements to command line arguments. #1344 (@jameshadfield)
- Incompatible arguments are now checked, especially related to VCF vs FASTA inputs.
- --vcf-reference and --root-sequence are now mutually exclusive.
translate: Tree nodes are checked against the node-data JSON input to ensure sequences are present. #1348 (@jameshadfield)
utils::load_features: This function may now raise AugurError. #1351 (@jameshadfield)
export v2: Automatically minify large outputs. Use --no-minify-json to disable this default behavior. #1352 (@victorlin)
Added a new file DEPRECATED.md to document timelines and progress of deprecated features in the Augur CLI and Python API. #1371 (@victorlin)

Bug Fixes

ancestral, translate: Various fixes to VCF inputs / outputs. #1355 and TreeTime #263 (@jameshadfield)
- Fix incorrect (but passing) tests
- Fix case-sensitive sequence comparisons between the root and reference sequences.
- Fix a bug where ambiguous alleles are not inferred (see #1380 for full details).
- Fix a bug where positions with no sequence information were assigned a base because the mask was not being computed (see #1382 for full details).
- More than one ALT allele is now correctly parsed
- Mutations followed by an insertion are now parsed
- Unchanged ref genotypes are now encoded as '0' rather than '.'
- ALT alleles "*" are now valid (introduced in VCF spec 4.2, but observed in VCF 4.1 files)
- Positions with no variation are no longer exported
ancestral, translate: Fixes for JSON (non-VCF) inputs. #1355 (@jameshadfield)
- The "reference" translations are now from the provided reference sequence, not from the root of the tree. #1355 (@jameshadfield)
- Fix a bug where positions with no sequence information were assigned a base because the mask was not applied (see #1382 for full details)
ancestral, translate: Avoid incompatibilities with Biopython >=1.82. #1374, #1387 (@victorlin)
ancestral, translate: Address Biopython deprecation warnings. #1379 (@victorlin)
ancestral: Previously, the help text for --genes falsely claimed that it could accept a file. Now, it can truly claim that. #1353 (@victorlin)
translate: The 'source' ID for GFF files is now ignored as a potential gene feature (it is still used for overall nuc coords). #1348 (@jameshadfield)
translate: Improvements to command line arguments. #1348 (@jameshadfield)
- --tree and --ancestral-sequences are now required arguments.
- separate VCF-only arguments into their own group
translate: Fixes a bug in the parsing behaviour of GFF files whereby the presence of the --genes command line argument would change how we read individual GFF lines. Issue #1349, PR #1351 (@jameshadfield)
If TreeTimeError is encountered Augur now exits with code 2 rather than 0. (This restores the original behaviour.) #1367 (@jameshadfield)
Deprecate read_strains from augur.utils and add it to the public API under augur.io. #1353 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] almost 2 years ago

Augur - 23.1.1

These release notes are automatically extracted from the full changelog.

Bug Fixes

Fix Python 3.11 installation for Conda environments. #1334 (@victorlin)
Bump pyfastx dependency to major versions 1 and 2. #1335 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] about 2 years ago

Augur - 23.1.0

These release notes are automatically extracted from the full changelog.

Features

Support treetime 0.11.* #1310 (@corneliusroemer)
export: Allow minimal export using only a (newick) tree in augur export v2. #1299 (@jameshadfield)
A number of schema updates and improvements #1299 (@jameshadfield)
- We now require all nodes to have node_attrs on them with one of div or num_date present
- Some never-used properties are removed from the schemas, including a pattern for defining nucleotide INDELs which was never used by augur or auspice.
- Tip label defaults are now settable within the auspice-config JSON
- Empty colorings definitions are allowed (the tree will be grey in Auspice)

Bug fixes

ancestral: Export amino acid sequences inferred for the root node of the tree in the node data JSON output for compatibility with augur translate output. #1317 (@huddlej)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 23.0.0

These release notes are automatically extracted from the full changelog.

Major Changes

Drop support for Python 3.7. #1296 (@victorlin)

Features

export v2: Allow the root-sequence data to be included (inlined) in the main dataset JSON file, avoiding the need for a sidecar _root-sequence.json file. #1295 (@jameshadfield)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.4.0

These release notes are automatically extracted from the full changelog.

Features

refine: Export covariance matrix and standard deviation for clock rate regression in the node data JSON output when these values are calculated by TreeTime. These new values appear in the clock data structure of the JSON output as cov and rate_std keys, respectively. #1284 (@huddlej)

Bug fixes

clades: Fix outputs for genes named NA (previously the value was replaced by nan). #1293 (@rneher)
distance: Improve documentation by describing how gaps get treated as indels and how users can ignore specific characters in distance calculations. #1285 (@huddlej)
Fix help output compatibility with non-Unicode streams. #1290 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.3.0

These release notes are automatically extracted from the full changelog.

Features

ancestral: add functionality to reconstruct ancestral amino acid sequences and add inferred mutations to the node_data_json with output equivalent to augur translate. ancestral now takes an annotation (--annotation), a list of genes (--genes), and a file name pattern for amino acid alignments (--translations). Mutations for each of these genes will be inferred and added to the output JSON to each node as a list at ['aa_muts'][gene]. The annotations will be added to the annotation field in the output JSON. Inferred amino acids sequences can be saved with the new --output-translations argument. #1258 (@rneher, @huddlej)
ancestral: add the ability to report mutations relative to a sequence other than the inferred root of the tree. This sequence can be specified via --root-sequence, and all differences between it and the inferred sequence of the root node will be added as mutations to the root node for both nucleotides and amino acids. This was previously already possible for VCF input via --vcf-reference. #1258 (@rneher)
refine: add mid_point as rooting option to refine. #1257 (@rneher)

Bug fixes

filter: In version 22.2.0, --query would fail when the .str accessor was used on a column. This has been fixed. #1277 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.2.0

These release notes are automatically extracted from the full changelog.

Features

Adds a new sub-command augur curate titlecase. The titlecase command is intended to apply titlecase to string fields in a metadata record (e.g. BRAINE-LE-COMTE, FRANCE -> Braine-le-Comte, France). Previously, this was available in the transform-string-fields script within the monkeypox repo. #1197 (@j23414 and @joverlee521)
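
The example in the titlecase note can be reproduced with a rough sketch like the one below. The function name, the list of lowercase articles, and the splitting rule are assumptions made for illustration; the actual command may handle articles and abbreviations differently.

```python
import re


def titlecase(text, articles=("le", "la", "de")):
    """Title-case each word, keeping the (assumed) articles in lowercase."""
    # Split on runs of whitespace, commas, and hyphens while keeping the separators.
    parts = re.split(r"([\s,\-]+)", text)
    result = []
    for index, part in enumerate(parts):
        lowered = part.lower()
        if index > 0 and lowered in articles:
            # Articles stay lowercase unless they start the string.
            result.append(lowered)
        else:
            # Separators are unchanged by capitalize(); words get an initial capital.
            result.append(lowered.capitalize())
    return "".join(result)
```

With these assumptions, `titlecase("BRAINE-LE-COMTE, FRANCE")` returns `"Braine-le-Comte, France"`, matching the example in the release note.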

Bug fixes

export v2: Previously, when strain was not used as the metadata ID column, node attributes might have gone missing from the final Auspice JSON. This has been fixed. #1260, #1262 (@victorlin, @joverlee521)
export v1: Added a deprecation warning for this command. #1265 (@victorlin)
export v1: The recently introduced flag --metadata-id-columns did not work properly due to the same export v2 bug that was fixed in this release. Instead of fixing it in export v1, drop the broken feature since this command is no longer being maintained. #1265 (@victorlin)
filter: Expose internal Pandas errors from --query which may be useful to users. #1267 (@victorlin)
filter: Previously, --query would fail when numerical comparisons were used on columns with missing values. This has been fixed. #1269 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.1.0

These release notes are automatically extracted from the full changelog.

Features

export, frequencies, refine, traits: Add a new flag --metadata-id-columns to customize the possible metadata ID columns. Previously, this was only available in augur filter. #1240 (@victorlin)
Add new sub-subcommand augur curate format-dates. The format-dates command is intended to be used to format date fields to ISO 8601 date format (YYYY-MM-DD), where incomplete dates are masked with XX (e.g. 2023 -> 2023-XX-XX). #1146 (@joverlee521)
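
The masking rule in the format-dates note can be sketched as follows. This is a minimal illustration that assumes the input is already in year-month-day order; `mask_incomplete_date` is a hypothetical name, and the real command supports configurable input formats that are not modelled here.

```python
def mask_incomplete_date(value):
    """Pad a partial date (YYYY or YYYY-MM) to YYYY-MM-DD using XX placeholders."""
    parts = value.split("-")
    if not 1 <= len(parts) <= 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"Unsupported date: {value!r}")
    year = parts[0].zfill(4)
    # Missing month or day fields are masked with XX.
    month = parts[1].zfill(2) if len(parts) > 1 else "XX"
    day = parts[2].zfill(2) if len(parts) > 2 else "XX"
    return f"{year}-{month}-{day}"
```

For example, `mask_incomplete_date("2023")` returns `"2023-XX-XX"`, as in the note.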

Bug fixes

parse: Fix a bug where --fix-dates was always applied, with a default of --fix-dates=monthfirst. Now, running without --fix-dates will leave dates as-is. #1247 (@victorlin)
augur.io.open_file: Previously, the docs described a type restriction on path_or_buffer but it was not enforced. It has been updated to allow all I/O classes, and is enforced at run-time. #1250 (@victorlin)
filter: Fix a bug where data files consisting of only numerical strain names would not work when both --metadata and --sequences are passed. #1256 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.0.3

These release notes are automatically extracted from the full changelog.

Bug fixes

utils: Serialize pandas Series in write_json. #1213 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.0.2

These release notes are automatically extracted from the full changelog.

Bug fixes

CI: Add a GitHub action to test augur on 8 Nextstrain pathogen workflows using example data. #1217 (@corneliusroemer)
parse: Denote required arguments including --fields, --output-sequences, and --output-metadata. #1228 (@huddlej)
Fix export of the strand attribute of gene annotations. Previously, features on the negative strand were not annotated as such since the code assumed that the strand attribute was boolean instead of [-1, +1]. #1211 (@rneher and @j23414)
augur.io.read_metadata: explicitly set date column as string type to prevent year only dates from being inferred as integers. #1235 (@joverlee521)
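
The strand fix above comes down to the fact that -1 is truthy in Python. The sketch below contrasts a boolean-style check with an explicit comparison; both functions are hypothetical reconstructions of the kind of logic described, not code taken from Augur.

```python
def buggy_strand_symbol(strand):
    # Treats the strand as a boolean: -1 is truthy, so negative strands look positive.
    return "+" if strand else "-"


def strand_symbol(strand):
    """Map a strand value of +1 or -1 to '+' or '-'."""
    if strand == 1:
        return "+"
    if strand == -1:
        return "-"
    raise ValueError(f"Unexpected strand value: {strand!r}")
```

Here `buggy_strand_symbol(-1)` returns `"+"`, while `strand_symbol(-1)` returns `"-"`.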

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.0.1

These release notes are automatically extracted from the full changelog.

Bug fixes

export: No longer export duplicate entries in the colorings array, a bug which has been present in Augur since at least v12 #719. #1218 (@jameshadfield)
export: In version 22.0.0, some configurations of export may have resulted in the clade coloring appearing last in the Auspice dropdown rather than first. This is now fixed. #1218
export: In version 22.0.0, validation of augur.utils.read_node_data was changed to error when a node data JSON did not contain any actual data. This caused export to error when an empty node data JSON was passed, as for example in ncov's pathogen-ci. This is now fixed by warning instead. The bug was originally introduced in PR #728. #1214 (@corneliusroemer)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 22.0.0

These release notes are automatically extracted from the full changelog.

Major Changes

export, filter, frequencies, refine, traits: From versions 10.0.0 through 21.1.0, arbitrary delimiters for --metadata were supported due to internal implementation differences from the advertised CSV and TSV support. Starting with this version, non-CSV/TSV files will no longer be supported by default. To adjust for this breaking change, specify custom delimiters with the new --metadata-delimiters flag. #1196 (@victorlin)
augur.io.read_metadata: Previously, this supported any arbitrary delimiters for the metadata. Now, it only supports a list of possible delimiters represented by the new delimiters keyword argument, which defaults to , and \t. #812 (@victorlin)
refine: The seeding method for --seed has been updated. This affects usages that rely on the reproducibility of outputs with the same --seed value prior to this version. Outputs from this version onwards should be reproducible until the next implementation change, which we don't expect to happen any time soon. #1207 (@rneher)
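
To make the delimiter change above concrete, here is a minimal sketch of choosing a delimiter from a fixed list of candidates. The function `detect_delimiter` and its counting rule are assumptions for illustration; Augur's own detection may work differently.

```python
def detect_delimiter(header_line, delimiters=(",", "\t")):
    """Return the candidate delimiter that occurs most often in the header line."""
    counts = {delimiter: header_line.count(delimiter) for delimiter in delimiters}
    # max() keeps the first candidate on ties, so the order of `delimiters` matters.
    best = max(delimiters, key=lambda delimiter: counts[delimiter])
    if counts[best] == 0:
        raise ValueError("None of the allowed delimiters were found in the header.")
    return best
```

Under this sketch a semicolon-separated header is rejected by default and accepted only when `";"` is passed explicitly, which mirrors the role the notes give to --metadata-delimiters and the delimiters keyword argument.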

Features

Constrain bcbio-gff to >=0.7.0 and allow Biopython >=1.81 again. We had to introduce the Biopython constraint in v21.0.1 (see #1152) due to bcbio-gff <0.7.0 relying on the removed Biopython feature UnknownSeq. #1178 (@corneliusroemer)
augur.io.read_metadata (used by export, filter, frequencies, refine, and traits): Previously, this used the Python parser engine for pandas.read_csv(). Updated to use the C engine for faster reading of metadata. #812 (@victorlin)
curate: Allow custom metadata delimiters with the new --metadata-delimiters flag. #1196 (@victorlin)
Bump the default recursion limit to 10,000. Users can continue to override this limit with the environment variable AUGUR_RECURSION_LIMIT. #1200 (@joverlee521)
clades, export v2: Clade labels + coloring keys are now definable via arguments to augur clades allowing pipelines to use multiple invocations of augur clades resulting in multiple sets of colors and branch labels. How labels are stored in the (intermediate) node-data JSON files has changed. This should be fully backwards compatible for pipelines using augur commands, however custom scripts may need updating. PR #728 (@jameshadfield)
refine: add flag --max-iter to control the maximal number of iterations TreeTime uses to infer time trees. This was previously hard-coded to 2, which is now the default. #1203 (@rneher)
refine: add flags --greedy-resolve and --stochastic-resolve to customize polytomy resolution. #1203, #1207 (@rneher)
- --greedy-resolve: resolve polytomies by greedily minimizing tree length (default behavior, unchanged).
- --stochastic-resolve: resolve polytomies as random coalescent trees.
- These are mutually exclusive with the pre-existing --keep-polytomies flag.
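
The recursion-limit entry in the list above can be pictured with the following sketch. Only the default of 10,000 and the AUGUR_RECURSION_LIMIT variable come from the note; the function name and the way the environment is read are assumptions.

```python
import os
import sys

DEFAULT_RECURSION_LIMIT = 10000  # default stated in the release note


def configure_recursion_limit(environ=None):
    """Apply the recursion limit, letting AUGUR_RECURSION_LIMIT override the default."""
    environ = os.environ if environ is None else environ
    # An unset or empty variable falls back to the default.
    limit = int(environ.get("AUGUR_RECURSION_LIMIT") or DEFAULT_RECURSION_LIMIT)
    sys.setrecursionlimit(limit)
    return limit
```

Passing a plain dictionary as `environ` makes the behaviour easy to check without touching the real environment.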

Bug fixes

filter, frequencies, refine, parse: Previously, ambiguous dates in the future had a limit of today's date imposed on the upper value but not the lower value. It is now imposed on the lower value as well. #1171 (@victorlin)
refine: --year-bounds was ignored in versions 9.0.0 through 20.0.0. It now works. #1136 (@victorlin)
tree: Input alignment filenames which do not end in .fasta are now properly handled when using IQ-TREE. Previously their contents were overwritten first by augur tree itself (resulting in truncation) and then by the log output of IQ-TREE (resulting in an error). Thanks to Jon Bråte for reporting this bug. #1206 (@tsibley)
clades: A number of small bug fixes, improvements to documentation, tests and improved error detection. #1199 (@jameshadfield)
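
The ambiguous-date fix in the list above can be illustrated with a small sketch that caps both ends of a date range at today's date. The function name and the XX-based parsing are assumptions; this is not Augur's date-handling code.

```python
import calendar
from datetime import date


def ambiguous_date_range(value, today):
    """Return (lower, upper) bounds for a YYYY-MM-DD date where MM or DD may be XX."""
    year_text, month_text, day_text = value.split("-")
    year = int(year_text)
    if month_text == "XX":
        first_month, last_month = 1, 12
    else:
        first_month = last_month = int(month_text)
    if day_text == "XX":
        first_day = 1
        last_day = calendar.monthrange(year, last_month)[1]  # days in the last month
    else:
        first_day = last_day = int(day_text)
    lower = date(year, first_month, first_day)
    upper = date(year, last_month, last_day)
    # Cap both bounds at today, not only the upper one.
    return min(lower, today), min(upper, today)
```

With `today = date(2023, 6, 15)`, the value `"2030-XX-XX"` yields the same date for both bounds, whereas capping only the upper bound would leave the lower bound after the upper one.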

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] over 2 years ago

Augur - 21.1.0

These release notes are automatically extracted from the full changelog.

Features

filter: Add --empty-output-reporting={error,warn,silent} option to allow filter to produce empty outputs without raising an error. The default behavior is still to raise an error when filter produces an empty output, so users will have to explicitly pass the "warn" or "silent" value to bypass the error. #1175 (@joverlee521)
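
A rough sketch of the three reporting modes is shown below. The exception class, function name, and message text are hypothetical; only the mode names `error`, `warn`, and `silent` come from the note.

```python
import sys


class EmptyOutputError(Exception):
    """Hypothetical error raised when filtering leaves nothing to write."""


def report_empty_output(mode, stream=sys.stderr):
    """React to an empty output according to the chosen reporting mode."""
    message = "All samples have been dropped."  # placeholder wording
    if mode == "error":
        raise EmptyOutputError(message)
    if mode == "warn":
        print(f"WARNING: {message}", file=stream)
    elif mode != "silent":
        raise ValueError(f"Unknown mode: {mode!r}")
```

Because `error` remains the default in the real command, existing workflows keep failing on empty outputs unless they opt in to `warn` or `silent`.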

Bug fixes

translate: Fix error handling when features cannot be read from reference sequence file. #1168 (@victorlin)
translate: Remove an unnecessary check which allowed for inaccurate error messages to be shown. #1169 (@victorlin)
frequencies: Previously, monthly pivot points calculated from the end of a month may have been shifted by 1-3 days. This is now fixed. #1150 (@victorlin)
Update development status on PyPI from "3 - Alpha" to "5 - Production/Stable". This should have been done since the beginning of this changelog, but now it is official. #1160 (@corneliusroemer)

Scientific Software - Peer-reviewed - Python
Published by github-actions[bot] almost 3 years ago

Bug fixes

Constrain Biopython version to <=1.80 so that augur translate is not broken by a deprecation of UnknownSeq in 1.81. When running augur translate with Biopython 1.81, the user will receive an error starting with ERROR: Package BCBio.GFF not found! and ending with TypeError: object of type 'NoneType' has no len(). #1152 (@corneliusroemer)

Scientific Software - Peer-reviewed - Python
Published by huddlej almost 3 years ago

Major Changes

measurements export: Supports exporting multiple thresholds per collection via the measurements config and the --thresholds option. This change is backwards compatible with previous uses of the --threshold option. However, due to the updates to the JSON schema, users will need to update to Auspice v2.43.0 for thresholds to be displayed properly in the measurements panel. #1148 (@joverlee521)

Features

export v2: Add --validation-mode={error,warn,skip} option for more nuanced control of validation. The new "warn" mode performs validation and emits messages about potential problems, but it does not cause the export command to fail even if there are problems. #1135 (@tsibley)

Bug Fixes

filter, frequencies, refine, parse: Properly handle invalid date errors and output the bad date. #1140 (@victorlin)
export, validate: Validation errors are now much more human-readable and actually pinpoint the problems. #1134 (@tsibley)

Scientific Software - Peer-reviewed - Python
Published by joverlee521 almost 3 years ago

Major Changes

frequencies: Changes the logic for calculating the time points when frequencies are estimated to ensure that the user-provided "end date" is always included. This change in the behavior of the frequencies command fixes a bug where large intervals between time points (e.g., 3 months) could cause recent data to be omitted from frequency calculations. See the pull request for more details, including the scientific implications of this bug. #1121 (@huddlej)

Scientific Software - Peer-reviewed - Python
Published by huddlej almost 3 years ago

Features

titers: Support parsing of thresholded values (e.g., "<80" or ">2560"). #1118 (@huddlej)
tree: Support bootstrapped trees generated with RAxML via user-provided --tree-builder-args. #1127 (@tsibley)
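
For the thresholded titer values mentioned in the list above, a minimal parsing sketch might look like this. The function name and the returned `(operator, value)` pair are assumptions; the note does not describe how Augur represents or uses these values internally.

```python
def parse_titer(value):
    """Split a titer string such as '<80' or '>2560' into (operator, number)."""
    text = value.strip()
    if text[:1] in ("<", ">"):
        # Thresholded measurement: keep the direction of the bound.
        return text[0], float(text[1:])
    # Plain measurement with no threshold marker.
    return "=", float(text)
```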

Bug Fixes

utils: Serialize common numpy data types in write_json. #1119 (@victorlin)
filter: Standardize exit codes from internal error handling. #931 (@victorlin)
tree: Suppress the Cannot specify --substitution-model unless using IQTree warning when --substitution-model is left at its default. #1127 (@tsibley)
tree: Print the underlying error message when tree building fails. #1127 (@tsibley)
Previously, numpy and scipy were installed as dependencies of dependencies. Mark them as direct dependencies since they are used directly within Augur. #1120 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin almost 3 years ago

Features

titers: Allow users to specify a custom prefix for attributes in the JSON output (e.g., cTiter can be changed to custom_prefix_cTiter). #1106 (@huddlej)

Scientific Software - Peer-reviewed - Python
Published by huddlej about 3 years ago

Features

io: Add open_file and write_sequences to the Python Public API. #1114 (@joverlee521)

Scientific Software - Peer-reviewed - Python
Published by joverlee521 about 3 years ago

Major Changes

io: Only read_metadata and read_sequences are available as part of the Python Public API. Other Python API functions of the augur.io module are no longer directly available. This is a breaking change, although we suspect few users to be impacted. If you still need to use other imports in your scripts, they can be imported from the Developer API but note that they are no longer part of the Public API. #1087 (@victorlin)

Bug Fixes

docs: Update the API documentation to reflect the latest state of things in the codebase. #1087 (@victorlin)
Fix support for Biopython version 1.80 which deprecated Bio.Seq.Seq.ungap(). #1102 (@victorlin)
export v2: Fixed a bug where colorings for zero values via --colors would not get applied to the exported Auspice JSON. #1100 (@joverlee521)
curate: Fixed a bug where metadata TSVs failed to parse if data within a column included comma separated values #1110 (@joverlee521)

Scientific Software - Peer-reviewed - Python
Published by joverlee521 about 3 years ago

Features

Add the curate subcommand with two sub-subcommands, passthru and normalize-strings. The curate subcommand is intended to be a suite of commands to help users with data curation prior to running Nextstrain analyses. We will continue to add more subcommands as we identify other common data curation tasks. Please see the usage docs for details. #1039 (@joverlee521)

Scientific Software - Peer-reviewed - Python
Published by joverlee521 about 3 years ago

Bug Fixes

traits: Fix trait inference when tips have missing values. #1081 (@huddlej)

Scientific Software - Peer-reviewed - Python
Published by victorlin about 3 years ago

Bug Fixes

filter: Fixed a bug where --group-by week would fail when all samples in a chunk have been dropped due to ambiguous dates. #1080 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin about 3 years ago

Features

filter: Add support to group by ISO week (--group-by week) during subsampling. #1067 (@victorlin)
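
Grouping by ISO week differs from grouping by calendar month because the first days of January can belong to the last ISO week of the previous year. The sketch below uses Python's standard `date.isocalendar()`; the helper names are hypothetical and the grouping is only an illustration, not the subsampling logic in augur filter.

```python
from collections import defaultdict
from datetime import date


def iso_week_key(day):
    """Return (ISO year, ISO week number) for a date."""
    iso_year, iso_week = day.isocalendar()[:2]
    return (iso_year, iso_week)


def group_by_week(dated_strains):
    """Group (strain, date) pairs by ISO week."""
    groups = defaultdict(list)
    for strain, day in dated_strains:
        groups[iso_week_key(day)].append(strain)
    return dict(groups)
```

For example, 1 January 2023 was a Sunday, so `iso_week_key(date(2023, 1, 1))` is `(2022, 52)`, while 2 January 2023 falls in `(2023, 1)`.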

Bug Fixes

filter: Fixed unintended behavior in which grouping by day would "work" when used with month and/or year. Updated so it will be ignored. #1070 (@victorlin)
filter: Fixed unintended behavior in which grouping by month with ambiguous years would "work". Updated so date ambiguity is checked properly for all generated columns. #1072 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin about 3 years ago

Major Changes

export: The --node-data option may now be given multiple times to provide additional .json files. Previously, subsequent occurrences of the option overrode prior occurrences. This is a breaking change, although we expect few usages to be impacted. Each occurrence of the option may still specify multiple files at a time. #1010 (@tsibley)
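
The accumulate-instead-of-override behaviour can be demonstrated with the standard library's argparse `extend` action. This is a sketch of the behaviour only; the parser below is hypothetical and the note does not say how Augur implements the option.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="export-sketch")
    # nargs="+" accepts several files per occurrence;
    # action="extend" appends later occurrences instead of replacing earlier ones.
    parser.add_argument("--node-data", nargs="+", action="extend", default=[], metavar="JSON")
    return parser


args = build_parser().parse_args(
    ["--node-data", "a.json", "b.json", "--node-data", "c.json"]
)
```

After parsing, `args.node_data` contains all three files in the order given.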

Bug Fixes

refine: 17.1.0 updated TreeTime to version 0.9.2 and introduced the refine flag --use-fft. This makes previously costly marginal date inference cheaper. This update adjusts when refine runs marginal date inference during its iterative optimization. Without the --use-fft flag, it will now behave as it did before 17.1.0 (marginal inference only during final iterations). With the --use-fft flag, marginal date inference will be used at every step during the iteration if refine is run with --date-inference marginal. #1034 (@rneher)
tree: When using IQ-TREE as the tree builder, --nthreads now sets the maximum number of threads (IQ-TREE argument -ntmax). The actual number of threads to use can be specified by the user through the tree-builder-arg -nt, which defaults to -nt AUTO, causing IQ-TREE to automatically choose the best number of threads to use. #1042 (@corneliusroemer)
Make cvxopt a required dependency, since it is required for titer models to work. #1035 (@victorlin)
filter: Fix compatibility with Pandas 1.5.0 which could cause an unexpected AttributeError with an invalid --query given to augur filter. #1050 (@tsibley)
refine: Add --verbosity argument that is passed down to TreeTime to facilitate monitoring and debugging. #1033 (@anna-parker)
Improve handling of errors from TreeTime. #1033 (@anna-parker)

Scientific Software - Peer-reviewed - Python
Published by victorlin over 3 years ago

Features

refine: Upgrade TreeTime from 0.8.6 to >= 0.9.2 which enables a speedup of timetree inference in marginal mode due to the use of Fast Fourier Transforms #1018. (@rneher and @anna-parker)

Bug Fixes

refine, export v1: Use pandas.DataFrame.at instead of .loc for single values #979. (@victorlin)
refine: Gracefully handle all exceptions from TreeTime #1023. (@anna-parker)
refine: Document branch length units treetime expects #1024. (@anna-parker)
dates: Raise an error when metadata to get_numerical_dates() is not a pandas DataFrame #1026. (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin over 3 years ago

17.0.0 (9 August 2022)

Major Changes

Moved the following modules to subpackages #1002. (@joverlee521) These are technically breaking changes for the API, but they do not change the Augur CLI commands.
- import.py -> import_/__init__.py
- import_beast.py -> import_/beast.py
- measurements.py -> measurements/__init__.py + measurements/concat.py + measurements/export.py
Move the following internal functions/classes #1002. (@joverlee521)
- augur.add_default_command -> argparse_.add_default_command
- utils.HideAsFalseAction -> argparse_.HideAsFalseAction
Subcommands must include a register_parser function to add their own parser instead of a register_arguments function #1002. (@joverlee521)
utils: Remove internal function utils.read_metadata() #978. (@victorlin)
- Use io.read_metadata() going forwards.
- To switch to using metadata as a pandas DataFrame (recommended):
  - Iterate through strains: metadata.items() -> metadata.iterrows()
  - Check strain presence: strain in metadata -> strain in metadata.index
  - Check field presence: field in metadata[strain] -> field in metadata.columns
  - Get metadata for a strain: metadata[strain] -> metadata.loc[strain]
  - Get field for a strain: metadata[strain][field] -> metadata.at[strain, field]
- To keep using metadata in a dictionary:
    metadata = read_metadata(args.metadata)
    metadata.insert(0, "strain", metadata.index.values)
    columns = metadata.columns
    metadata = metadata.to_dict(orient="index")
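
The register_parser requirement in the list above could look roughly like the following for a subcommand module. The subcommand name, its argument, and the exact signature (taking the parent subparsers object and returning the new parser) are assumptions for illustration.

```python
import argparse


def register_parser(parent_subparsers):
    """Create this subcommand's own parser and return it (assumed signature)."""
    parser = parent_subparsers.add_parser("example", help="A hypothetical subcommand.")
    parser.add_argument("--input", required=True, help="Input file.")
    return parser


# A stand-in for the top-level CLI that collects subcommand parsers.
top_level = argparse.ArgumentParser(prog="augur-sketch")
subparsers = top_level.add_subparsers(dest="command")
register_parser(subparsers)
args = top_level.parse_args(["example", "--input", "data.tsv"])
```

The design point is that each subcommand owns the creation of its parser, rather than only adding arguments to a parser created elsewhere.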

Features

export: --skip-validation now also skips version compatibility checks #902. (@corneliusroemer)
filter: Report names of duplicate strains found during metadata parsing #1008 (@huddlej)
translate: Add support for Nextclade gene map GFFs #1017 (@huddlej)

Bug Fixes

filter: Rename internal force inclusion filtering functions #1006 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin over 3 years ago

16.0.3 (6 July 2022)

Bug Fixes

filter: Move register_arguments to the top of the module for better readability #995. (@victorlin)
filter: Fix a regression introduced in 16.0.2 that caused grouping with subsampled max sequences and force-included strains to fail in a data-specific way #1000. (@huddlej)

Scientific Software - Peer-reviewed - Python
Published by corneliusroemer over 3 years ago

16.0.2 (30 June 2022)

Bug Fixes

The entropy panel was unavailable if mutations were not translated #881. This has been fixed by creating an additional annotations block in augur ancestral containing (nucleotide) genome annotations in the node-data #961 (@jameshadfield)
ancestral: WARNINGs to stdout have been updated to print to stderr #961 (@jameshadfield)
filter: Explicitly drop date/year/month columns from metadata during grouping. #967 (@victorlin)
- This fixes a bug #871 where augur filter would crash with a cryptic ValueError if year and/or month is a custom column in the input metadata and also included in --group-by.
filter: Fix duplicates that may appear in metadata when using --include/--include-where with subsampling #986 (@victorlin)

Scientific Software - Peer-reviewed - Python
Published by victorlin over 3 years ago

Features

Augur refine, ancestral and traits now use the upgraded TreeTime v0.7. This should have a number of under-the-hood improvements. See PR 431
ancestral: New options to either --keep-ambiguous or --infer-ambiguous. If using --infer-ambiguous the previous behavior will be maintained in which tips with N will have their nucleotide state inferred. If using --keep-ambiguous, these tips will be left as N. With this upgrade, we are still defaulting to --infer-ambiguous, however, we plan to swap default to --keep-ambiguous in the future. If this distinction matters to you, we would suggest that you explicitly record --keep-ambiguous / --infer-ambiguous in your build process. Also part of PR 431
traits: Allow input of --weights which references a .tsv file in the following format:

    division    Hubei        10.0
    division    Jiangxi      1.0
    division    Chongqing    1.0

where these weights represent equilibrium frequencies in the CTMC transition model. We imagine the primary use of user-specified weights is to correct for strong sampling biases in available data. See PR 443
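To make the --weights format concrete, here is a minimal Python sketch that reads rows of (trait column, trait value, weight) and rescales them to sum to 1. It is not Augur's or TreeTime's implementation: the function names read_weights and normalize are hypothetical, the tab delimiter is inferred from the .tsv extension, and the rescaling step is an assumption about how weights might be turned into equilibrium frequencies.

```python
import csv
import io
from collections import defaultdict


def read_weights(handle):
    """Parse tab-delimited rows of (column, value, weight) into nested dicts."""
    weights = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        column, value, weight = row
        weights[column][value] = float(weight)
    return dict(weights)


def normalize(weights_for_column):
    """Rescale weights to sum to 1 (assumed interpretation, see lead-in)."""
    total = sum(weights_for_column.values())
    return {value: w / total for value, w in weights_for_column.items()}


# The example rows from the release note, written as a tab-delimited string.
example = (
    "division\tHubei\t10.0\n"
    "division\tJiangxi\t1.0\n"
    "division\tChongqing\t1.0\n"
)
weights = read_weights(io.StringIO(example))
frequencies = normalize(weights["division"])
print(frequencies)
```

With these example rows, Hubei carries ten twelfths of the total weight, which is the kind of correction one might apply when one division is heavily over- or under-sampled.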

Bug fixes

Improvements to make shell scripts run more easily on Windows. See PR 437

Scientific Software - Peer-reviewed - Python
Published by trvrb almost 6 years ago

Features

refine: Include --divergence-units option to distinguish between mutations and mutations-per-site. Keep mutations-per-site as default behavior. See PR 435
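The relationship between the two divergence units can be sketched as below. This assumes the conversion is a plain multiplication of per-site divergence by alignment length; the function name to_mutations and the example numbers are hypothetical and not taken from Augur.

```python
def to_mutations(per_site_divergence, alignment_length):
    """Convert divergence in mutations-per-site to an expected mutation count.

    Assumption: the count is per-site divergence times the number of sites.
    """
    return per_site_divergence * alignment_length


# Hypothetical branch: 0.0005 mutations per site on a 10,000-site alignment.
mutations = to_mutations(0.0005, 10_000)
print(mutations)
```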

Bug fixes

utils: Support v2 auspice JSONs in jsontotree utility function. See PR 432

Scientific Software - Peer-reviewed - Python
Published by trvrb almost 6 years ago

6.1.1 (17 December 2019)

Bug fixes

frequencies: Fix bug in string matching for weighted frequencies introduced in v6.1.0. See PR 426.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

6.1.0 (13 December 2019)

Features

export: Include --description option to pass in a Markdown file with dataset description. This is displayed in the Auspice footer. For rationale, see Auspice issue 707 and for Augur changes see PR 423.

Bug fixes

frequencies: Fix weighted frequencies when weight keys are unrepresented. See PR 420.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.0.5.dev1 (26 November 2018)

Bug fixes

translate: Nucleotide ("nuc") annotation for non-bacterial builds starts at 0 again, not 1, fixing a regression.

Documentation

Schemas: Correct coordinate system description for genome start/end annotations.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.0 (18 December 2018)

Features

reconstruct-sequences: Include augur reconstruct-sequences module that reconstructs alignments from mutations inferred on the tree
distance: Include augur distance module that calculates the distance between amino acid sequences across entire genes or at a predefined subset of sites
lbi: Include augur lbi module that calculates local branching index (LBI) for a given tree and one or more sets of parameters.
frequencies: Include --method kde as option to augur frequencies, separate from the existing --method diffusion logic. KDE frequencies are faster and better for smaller clades but don't extrapolate as well as diffusion frequencies.
titers: Enable annotation of nodes in a tree from the substitution model
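As a rough illustration of what a KDE-based frequency estimate looks like, the sketch below places a Gaussian kernel on each sample date and, at every pivot, divides each clade's summed density by the total across clades. This is a generic kernel density sketch under stated assumptions, not the code behind --method kde: the function names, the fixed bandwidth, and the example dates are all hypothetical.

```python
import math


def gaussian_kernel(x, center, bandwidth):
    """Gaussian density centered on one sample date, evaluated at x."""
    z = (x - center) / bandwidth
    return math.exp(-0.5 * z * z) / (bandwidth * math.sqrt(2.0 * math.pi))


def kde_frequencies(dates_by_clade, pivots, bandwidth=0.1):
    """Estimate clade frequencies at each pivot from summed kernel densities."""
    densities = {
        clade: [sum(gaussian_kernel(p, d, bandwidth) for d in dates) for p in pivots]
        for clade, dates in dates_by_clade.items()
    }
    frequencies = {clade: [] for clade in densities}
    for i in range(len(pivots)):
        total = sum(density[i] for density in densities.values())
        for clade, density in densities.items():
            # Normalize so frequencies across clades sum to 1 at each pivot.
            frequencies[clade].append(density[i] / total if total > 0 else 0.0)
    return frequencies


# Hypothetical sample dates (decimal years) for two clades.
dates_by_clade = {
    "A": [2018.0, 2018.1, 2018.2],
    "B": [2018.4, 2018.5],
}
pivots = [2018.0, 2018.25, 2018.5]
frequencies = kde_frequencies(dates_by_clade, pivots)
print(frequencies)
```

Because each pivot only sums kernels that are already placed, there is no model to fit, which is consistent with the note that KDE frequencies are faster. The same property means the estimate says little beyond the last sample date, in line with the caveat about extrapolation.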

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.1 (21 December 2018)

Bug fixes

filter: Fix --include-where. Adds an all_seq variable needed by the logic to include records by value. This was previously working for VCF but threw an exception for sequences in FASTA format.
Update flu reference viruses and lat longs.
Update dependencies

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.2 (21 December 2018)

Bug fixes

Update dependencies

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.3 (29 December 2018)

Features

filter: Add --non-nucleotide option to remove sequences with non-conforming nucleotide characters.

Bug fixes

Revise treatment of - and whitespace in augur parse to leave - as is and remove whitespace. Also replace [ and ] with _.
Fix bug in naming of temp IQTREE files to prevent conflicts from simultaneous builds.

Data

Include additional country lat/longs in base data

Development

Remove non-modular measles build in favor of nextstrain/measles repo.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.4 (1 January 2019)

Bug fixes

frequencies: Include counts in augur frequencies output JSON to support downstream plotting.

Data

Include additional country lat/longs in base data

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.5 (13 January 2019)

Features

frequencies: Add --ignore-char and --minimal-clade-size as options.
frequencies: Include --stiffness and --inertia as options.
titers: Allow multiple titer data files in --titers import.

Bug fixes

filter: Fix --non-nucleotide call to include ? as allowed character.
tree: Fix --method raxml to properly delimit interim RAxML output so that simultaneous builds don't conflict.
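The --non-nucleotide check touched by the filter fix above can be pictured as a membership test against a set of allowed characters. The sketch below is an assumption, not Augur's code: the allowed set (IUPAC nucleotide codes, the gap character -, and ?) and the function name has_non_nucleotide are chosen for illustration.

```python
# Assumed allowed alphabet: IUPAC nucleotide codes, gap (-), and unknown (?).
ALLOWED = set("ACGTUNRYSWKMBVDH-?")


def has_non_nucleotide(sequence):
    """Return True if the sequence contains any character outside ALLOWED."""
    return any(character not in ALLOWED for character in sequence.upper())


# Hypothetical sequences: the second one contains an invalid 'X'.
sequences = {"strain_a": "ACGT-N?", "strain_b": "ACGTX"}
kept = [name for name, seq in sequences.items() if not has_non_nucleotide(seq)]
print(kept)
```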

Data

Include additional country lat/longs in base data

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.6 (29 January 2019)

Features

filter: Allow negative matches to --exclude-where. For example, --exclude-where country!=usa would exclude all samples where metadata country does not equal usa.
tree: Allow --exclude-sites to work with FASTA input. Ensure that indexing of input sites is one-based.
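The negative match for --exclude-where can be sketched as follows. This is a minimal illustration rather than Augur's implementation: the helper names parse_condition and is_excluded are hypothetical, and the case-insensitive comparison is an assumption.

```python
def parse_condition(condition):
    """Split 'column=value' or 'column!=value' into (column, value, negate)."""
    if "!=" in condition:
        column, value = condition.split("!=", 1)
        return column, value, True
    column, value = condition.split("=", 1)
    return column, value, False


def is_excluded(record, condition):
    """Decide whether a metadata record is excluded by one condition."""
    column, value, negate = parse_condition(condition)
    # Assumption: values are compared case-insensitively.
    matches = record.get(column, "").lower() == value.lower()
    return not matches if negate else matches


# Hypothetical metadata records.
records = [
    {"strain": "s1", "country": "usa"},
    {"strain": "s2", "country": "canada"},
]
kept = [r["strain"] for r in records if not is_excluded(r, "country!=usa")]
print(kept)
```

Checking for != before = matters here, because every negative condition also contains an equals sign.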

Bug fixes

Fix loading of strains when loading titers from file. Previously, strains had not been filtered to match the tree appropriately.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.7 (5 February 2019)

Bug fixes

Update to TreeTime 0.5.3
tree: Fix bug in printing causing errors in Python versions <3.6
tree: Alter site masking to not be so memory intensive

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

3.1.8 (13 February 2019)

Bug fixes

titers: fix calculation of mean_potentency for model export

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

4.0.0 (24 April 2019)

Features

distance: New interface for specifying distances between sequences. This is a backwards-incompatible change. Refer to augur distance --help for all the details.
export: Add a --minify-json flag to omit indentation in Auspice JSONs.

Bug fixes

frequencies: Emit one-based coordinates (instead of zero-based) for KDE-based mutation frequencies

Data

Include additional country lat/longs in base data

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.0.0 (26 May 2019)

Features

ancestral: New option to --keep-ambiguous, which will not infer nucleotides at ambiguous (N) sites on tip sequences and instead leave as 'N'. See PR 280.
ancestral: New option to --keep-overhangs, which will not infer nucleotides for gaps on either side of the alignment and instead leave as '-'. See PR 286.
clades: This module has been reconfigured to identify clade defining mutations on top of a reference rather than identifying mutations along the tree. The command line arguments are the same except for the addition of --reference, which explicitly passes in a reference sequence. If --reference is not defined, then reference will be drawn from the root node of the phylogeny by looking for sequence attribute attached to root node of --tree. See PR 288.
refine: Revise rooting behavior. Previously --root took 'best', 'residual', 'rsq' and 'mindev' as options. In this update --root takes 'best', 'least-squares', 'mindev' and 'oldest' as rooting options. This eliminates 'residual' and 'rsq' as options. This is a backwards-incompatible change. This requires updating TreeTime to version 0.5.4 or above. See PR 263.
refine: Add --keep-root option that overrides --root specification to preserve tree rooting. See PR 263.
refine: Add --covariance and --no-covariance options that specify TreeTime behavior. See PR 263.
titers: This command now throws an InsufficientDataException if there are not sufficient titers to infer a model. This is paired with a new --allow-empty-model flag that proceeds past the InsufficientDataException and writes out a model JSON corresponding to an 'empty' model. See PR 281.
By default, JSONs are written with indent=1 to give a pretty-printed JSON. However, this adds significant file size to large tree JSONs. If the environment variable AUGUR_MINIFY_JSON is set, then minified JSONs are printed instead. This mirrors the explicit --minify-json argument available to augur export. See PR 278.
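The trade-off between pretty-printed and minified JSON described above can be sketched with the standard library. The function name dump_json is hypothetical and this is not Augur's writer; it assumes that the mere presence of AUGUR_MINIFY_JSON in the environment switches minification on.

```python
import json
import os


def dump_json(data, minify=None):
    """Serialize data, minified if requested or if AUGUR_MINIFY_JSON is set."""
    if minify is None:
        # Assumption: any value of the variable, even empty, enables minifying.
        minify = "AUGUR_MINIFY_JSON" in os.environ
    if minify:
        # No indentation and no spaces after separators.
        return json.dumps(data, indent=None, separators=(",", ":"))
    return json.dumps(data, indent=1)


tree = {"name": "root", "children": [{"name": "tip_1"}, {"name": "tip_2"}]}
pretty = dump_json(tree, minify=False)
minified = dump_json(tree, minify=True)
print(len(pretty), len(minified))
```

The size difference grows with nesting depth, since every nested value adds a newline and leading spaces in the indented form.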

Bug fixes

export: Cast numeric values to strings for export. See issue 287.
export: Legend order preserves ordering passed in by user for traits that have default colorings ('country' and 'region'). See PR 284.
refine: Previously, the --root argument was silently ignored when no timetree was inferred. Re-rooting with an outgroup is sensible even without a timetree. See PR 282.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.1.0 (29 May 2019)

Documentation

Documentation is now available online for the augur CLI and Python API via Read The Docs: https://nextstrain-augur.readthedocs.io. The latest version on RTD points to the git master branch, and the stable version to the most recent tagged release. Instructions for building the docs locally are in the README.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.1.1 (1 July 2019)

Features

tree: Add support for the GTR+R10 substitution model.
tree: Support parentheses in node names when using IQ-TREE.

Bug fixes

Use the center of the UK for its coordinates instead of London.
filter: Mark --output required, which it always was but wasn't marked.
filter: Avoid error when no excluded strains file is provided.
export: Fix for preliminary version 2 schema support.
refine: Correct error handling when the tree file is missing or empty.

Documentation

Add examples of Augur usage in the wild.
Rename and reorganize CLI and Python API pages a little bit to make "where do I start learning to use Augur?" clearer to non-devs.

Development

Relax version requirements of pandas and seaborn. The hope is this will make installation smoother (particularly alongside other packages which require newer pandas versions) while not encountering breaking changes in newer versions ourselves.

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.2.0 (23 July 2019)

Features

ancestral: Adds a new flag --output-sequences and logic to support saving ancestral sequences and leaves from the given tree to a FASTA file. Also adds a redundant, more specific flag --output-node-data that will replace the current --output flag in the next major version release of augur. For now, we issue a deprecation warning when the --output flag is used. Note that FASTA output is only allowed for FASTA inputs and not for VCFs. We don't allow FASTA output for VCFs anywhere else and, if we did here, the output files would be very large. See PR 293
frequencies: Allow --method kde flag to compute frequencies via KDE kernels. This complements existing method of --method diffusion. Generally, KDE frequencies should be more robust and faster to run, but will not project as well when forecasting frequencies into the future. See PR 271

Bug fixes

ancestral, traits, translate: Print warning if supplied tree is missing internal node names (normally provided by running augur refine). See PR 283
Include pip in Conda environment file. See PR 309

Documentation

Document environment variables respected by Augur

Development

Remove matplotlib and seaborn from setup.py install. These are still used in a few places in augur (like titers.validate()), but this was deemed rare enough that removing them from setup.py would ease general install for most users. Additionally, the ipdb debugger has been moved to dev dependencies. See PR 291
Refactor logic to read trees from multiple formats into a function. Adds a new function read_tree to the utils module that tries to safely handle reading trees in multiple input formats. See PR 310

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.2.1 (4 August 2019)

Bug fixes

Print more useful error message if Python recursion limit is reached. See issue 328
Print more useful error message if vcftools is missing. See PR 312

Development

Significantly relax version requirements specified in setup.py for biopython, pandas, etc... Additionally, move lesser used packages (cvxopt, matplotlib, seaborn) into an "extras_require" field. This should reduce conflicts with other pip installed packages. See PR 323

Data

Include additional country lat/longs in base data

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.3.0 (9 September 2019)

Features

export: Improve printing of error messages with missing or conflicting author data. See issue 274
filter: Improve printing of dropped strains to include reasons why strains were dropped. See PR 367
refine: Add support for command line flag --keep-polytomies to not resolve polytomies when producing a time tree. See PR 345

Bug fixes

Catch and throw error when there are duplicate strain names. See PR 356
Fix missing annotation of "parent" attribute for the root node
Run shell commands with more robust error checking. See PR 350
Better handling of rerooting options for trees without temporal information. See issue 348
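The duplicate strain name check in the list above amounts to counting names and failing early. The following is a sketch under stated assumptions, not Augur's code: the function names and the choice of ValueError are hypothetical.

```python
from collections import Counter


def find_duplicates(names):
    """Return the sorted strain names that occur more than once."""
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


def check_unique(names):
    """Raise an error naming every duplicated strain."""
    duplicates = find_duplicates(names)
    if duplicates:
        raise ValueError("Duplicate strain names: " + ", ".join(duplicates))


# Hypothetical strain names with one repeat.
strains = ["A/Perth/1", "A/Texas/2", "A/Perth/1"]
try:
    check_unique(strains)
    error_message = None
except ValueError as error:
    error_message = str(error)
print(error_message)
```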

Data

Small fixes in geographic coordinate file

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.4.0 (7 November 2019)

Features

frequencies: Include --minimal-clade-size-to-estimate command line option. See PR 383
lbi: Include --no-normalization command line option. See PR 380

Compatibility fixes

export: Include v1 subcommand to allow forwards compatibility with Augur v6 builds. See PR 398

Bug fixes

export: Include warning if using a mismatched v6 translate file. See PR 392
frequencies: Fix determination of interval for clipping of non-informative pivots

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

5.4.1 (12 November 2019)

Bug fixes

export v1: Include --minify-json option that was mistakenly not included in PR 398. See PR 409

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago

6.0.0 (10 December 2019)

Overview

Version 6 is a major release of augur affecting many augur commands. The format of the exported JSON (v2) has changed and now merges the previously separate files containing tree and meta information. To maintain backward compatibility, the export command was split into export v1 (old) and export v2 (new). Detailed release notes are provided in the augur documentation on read-the-docs. For a migration guide, consult migrating-v5-v6.

Major features / changes

export: Swap from a separate _tree.json and _meta.json to a single "unified" dataset.json output file
export: Include additional command line options to alleviate need for Auspice config
export: Include option for reference sequence output
export: Move to GFF-style annotations
export: Validate exported JSONs against schema
ancestral: Allow output of FASTA and JSON files
import: Include import beast command to import labeled BEAST MCC tree
parse: Include --prettify-fields option to cleanup metadata fields
Documentation improvements

Minor features / changes

colors.tsv: Allow whitespace, but insist on tab delimiting
lat_longs.tsv: Allow whitespace, but insist on tab delimiting
Remove code for old "non-modular" augur, old "non-modular" builds and Python tests
Improve test builds
filter: More interpretable output of how many sequences have been filtered
filter: Additional flag --subsample-seed to seed the random number generator and thereby make subsampling reproducible
sequence-traits: Numerical output as originally intended, but required an Auspice bugfix
traits: Explanation of what is considered missing data & how it is interpreted
traits: GTR models are exported in the output JSON for better accountability & reproducibility
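To show why seeding makes subsampling reproducible, as with the --subsample-seed flag listed above, here is a small sketch using Python's random module. The function name subsample is hypothetical and the sketch does not reflect how augur filter selects sequences internally.

```python
import random


def subsample(strains, n, seed=None):
    """Pick n strains at random; the same seed gives the same selection."""
    rng = random.Random(seed)  # private generator, leaves global state alone
    return rng.sample(strains, n)


# Hypothetical strain names.
strains = [f"strain_{i}" for i in range(20)]
first = subsample(strains, 5, seed=42)
second = subsample(strains, 5, seed=42)
print(first == second)
```

Using a dedicated random.Random instance rather than the module-level functions keeps the result independent of any other code that draws random numbers in the same process.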

Scientific Software - Peer-reviewed - Python
Published by trvrb about 6 years ago
