Recent Releases of thapbi-pict

thapbi-pict - THAPBI PICT v1.0.20

Released on PyPI 2024-12-31:

https://pypi.org/project/thapbi-pict/1.0.20/

Script sample_filter.py now accepts multiple (classified) tally files. This can be used to pool multiple separate runs of the pipeline (e.g. one per plate).

Promoted MT065831.1 Phytophthora nicotianae in the default database to species level.

The helper scripts will now use rich-argparse if installed for more colorful command line help.

Fixed a scaling issue in the summary reports for large datasets.

Added SBATCH headers to the worked example run.sh scripts for use on a SLURM compute cluster.

Fixed some minor cross-platform and initial setup issues in the worked examples.

- Python
Published by peterjc about 1 year ago

thapbi-pict - THAPBI PICT v1.0.19

Released on PyPI 2024-12-05:

https://pypi.org/project/thapbi-pict/1.0.19/

Updated curated entries and NCBI bulk genus-only import in the default Phytophthora focused ITS1 database.

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.18

Released on PyPI 2024-11-25:

https://pypi.org/project/thapbi-pict/1.0.18/

The ena-submit command was updated for the current ENA expectations of 12 mandatory fields for uploading paired FASTQ files. This now requires the study accession, but no longer requires an insert size etc.

The default ITS1 DB was updated to use the November 2024 NCBI taxonomy, which now has a taxid for Phytophthora heteromorpha.

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.17

Released on PyPI 2024-11-08:

https://pypi.org/project/thapbi-pict/1.0.17/

This is mostly a maintenance release bumping the minimum version to Python 3.10, and updating the versions of the dependencies to their current known working releases.

This also adds cosmetic functionality if the Python library rich-argparse is installed: The command line help will now display in colour if your terminal supports this.
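
As an illustration of this kind of optional dependency, here is a minimal sketch assuming the RichHelpFormatter class from the rich_argparse package. The build_parser function, the program name and its single option are hypothetical and are not taken from the THAPBI PICT source:

```python
import argparse

try:
    # Optional dependency: only used when it is installed.
    from rich_argparse import RichHelpFormatter as HelpFormatter
except ImportError:
    # Fall back to the plain formatter from the standard library.
    from argparse import HelpFormatter


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical parser whose help is coloured when possible."""
    parser = argparse.ArgumentParser(
        prog="example-tool",
        description="Illustrative parser only.",
        formatter_class=HelpFormatter,
    )
    parser.add_argument("-i", "--input", help="Hypothetical input file.")
    return parser
```

With this pattern the tool behaves identically without the extra library, and only the appearance of the help text changes when it is present.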

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.16

Released on PyPI 2024-09-09:

https://pypi.org/project/thapbi-pict/1.0.16/

Drops empty rows/columns (all zero read counts) in the reports when using --requiremeta or -q to hide samples without metadata.

This release also updates the default ITS1 database slightly, adopting the September 2024 NCBI taxonomy and adding a few more reference sequences affecting non-Phytophthora only.

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.15

Released on PyPI 2024-09-05:

https://pypi.org/project/thapbi-pict/1.0.15/

Updated the default ITS1 database with additional curated entries, many from Jung et al. (2024), and the NCBI taxonomy as of August 2024. See:

Jung et al. (2024) Worldwide forest surveys reveal forty-three new species in Phytophthora major Clade 2 with fundamental implications for the evolution and biogeography of the genus and global plant biosecurity https://doi.org/10.3114/sim.2024.107.04

This included renaming our placeholder Phytophthora taxon ad7988a to Phytophthora sp. taxon castitis as per KC478759:

Bertier et al. (2013) The expansion of Phytophthora clade 8b: three new species associated with winter grown vegetable crops https://doi.org/10.3767%2F003158513X668554

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.14

Released on PyPI 2024-08-26:

https://pypi.org/project/thapbi-pict/1.0.14/

Added a fifth synthetic control to the default DB, now used in the revised protocol at the Hutton Institute to construct mock-communities with magnitude steps in concentration, for use with the Illumina NextSeq and potentially higher read depth.

The pipeline and prepare-reads commands now support an optional -k or --markers argument to restrict the analysis to a subset of the markers defined in a multiple-marker database. An example use-case would be processing our historic MiSeq plates using ITS1 only when using a DB containing ITS1 and rps10 marker definitions. The default remains looking for all defined markers.

- Python
Published by peterjc over 1 year ago

thapbi-pict - THAPBI PICT v1.0.13

Released on PyPI 2024-05-22:

https://pypi.org/project/thapbi-pict/1.0.13/

Updated the default Phytophthora focused ITS1 database with a more recent NCBI search, taxonomy, etc.

Stops using freeze-panes in Excel output which was no longer practical with small screens and/or lots of metadata.

- Python
Published by peterjc almost 2 years ago

thapbi-pict - THAPBI PICT v1.0.12

Released on PyPI on 2024-03-11:

https://pypi.org/project/thapbi-pict/1.0.12/

Fixes the inadvertent use of type-annotation syntax which had required Python 3.9 or later since THAPBI PICT v1.0.9. This release is tested on and requires at least Python 3.8.

Graceful edit-graph failure if missing graphviz fdp command line tool.

Heuristics for importing sequences in SINTAX style when missing genus in the species field.

Metadata argument -x now accepts multiple columns.

- Python
Published by peterjc almost 2 years ago

thapbi-pict - THAPBI PICT v1.0.11

Released on PyPI on 2024-03-05:

https://pypi.org/project/thapbi-pict/1.0.11/

Harmonised the ASV naming used in the optional FASTA and BIOM output files to match the main TSV and Excel usage. The sample-tally command and stage of the pipeline can now optionally output a BIOM file.

Importing the (pre-classifier) BIOM file and matching FASTA file of ASV sequences into Qiime2 has been tested.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.10

Released on PyPI on 2024-02-26:

https://pypi.org/project/thapbi-pict/1.0.10/

Changed sample report 'Unique' column to be the number of unique ASVs as per the expectation in the examples. Previously this showed the number of unique species or species complex classifications (a smaller number, although often close with good database coverage and a low diversity sample).

Adjusted the metadata counts in the read report to show 'Accepted' and 'Unique' in read report column headers, and replaced the old TOTAL row values (equal to the 'Accepted' values) with MAX values instead.

Use thousands separators in Excel for read counts etc. This should respect regional settings.

The import command now rejects species names with semi-colon in them, which would cause issues downstream as the semi-colon is used to separate multiple taxonomic matches.
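
A minimal sketch of why such a check matters, assuming multiple matches are simply joined with a semi-colon as described above. The function names are hypothetical and this is not the import command's actual code:

```python
def check_species_name(name: str) -> str:
    """Return the name unchanged, or raise ValueError if it contains ';'."""
    if ";" in name:
        raise ValueError(f"Species name contains a semi-colon: {name!r}")
    return name


def join_matches(names: list[str]) -> str:
    """Join multiple taxonomic matches using ';' as the separator."""
    return ";".join(check_species_name(name) for name in names)
```

If a name containing a semi-colon were accepted, splitting the joined string downstream would yield more entries than there were matches.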

Updated the gg_to_sintax.py helper script to accept FASTA and TSV input files from Qiime2 archives.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.9

Released on PyPI on 2024-02-12:

https://pypi.org/project/thapbi-pict/1.0.9/

Using Python type annotations (internal code change).

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.8

Released on PyPI on 2024-02-06:

https://pypi.org/project/thapbi-pict/1.0.8/

Updated the default database with the February 2024 NCBI taxonomy (e.g. Phytophthora glovera is now P. gloveri).

Added several additional curated Phytophthora to the default ITS1 database, including 15 novel taxa which we have observed in multiple samples from the environment or tree nurseries, and KP691408.1 as Phytophthora taxon Catala2017sp4 from Català et al. (2017) https://doi.org/10.1111/ppa.12541

Corrected the year in the novel species entries like Phytophthora taxon Catala2015sp1 to match the citation Català et al. (2015) https://doi.org/10.1371/journal.pone.0119311

Also added a 1s6g classifier following the existing naming pattern.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.7

Released on PyPI on 2024-01-29:

https://pypi.org/project/thapbi-pict/1.0.7/

Updated the default database to treat Phytophthora cambivora as a synonym of the more recent Phytophthora x cambivora description as a hybrid.

Also fixed the documentation builds on Read-The-Docs.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.6

Released on PyPI on 2024-01-24:

https://pypi.org/project/thapbi-pict/1.0.6/

Updated the NCBI import and curated entries in the default database.

Added a minimum sample count option to edit-graph command.

Added basic Python type annotation in helper scripts (internal change).

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.5

Released on PyPI on 2023-11-22:

https://pypi.org/project/thapbi-pict/1.0.5/

Updated the NCBI import in the default database, and scripted most of what was a semi-manual process to do this.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.4

Released on PyPI on 2023-11-20:

https://pypi.org/project/thapbi-pict/1.0.4/

Dropped unused -m / --method argument to the edit-graph command.

- Python
Published by peterjc about 2 years ago

thapbi-pict - THAPBI PICT v1.0.3

Released on PyPI on 2023-09-04:

https://pypi.org/project/thapbi-pict/1.0.3/

Updated the NCBI taxonomy and bulk imported entries at genus level, along with adding two curated entries for Phytophthora condilina.

Belated update to the Batovska et al. (2021) pest insects worked example for the pooled marker report changes in v1.0.2.

- Python
Published by peterjc over 2 years ago

thapbi-pict - THAPBI PICT v1.0.2

Released on PyPI on 2023-08-18:

https://pypi.org/project/thapbi-pict/1.0.2/

Documentation updated to request citation of the now published paper:

Cock et al. (2023) "THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline" PeerJ 11:e15648 https://doi.org/10.7717/peerj.15648

The summary stage now preserves the cutadapt, singletons, etc columns in the pooled reports for multiple markers. Will take the sum or maximum as appropriate. This allows the plot_reduction.py script to be used on pooled reports.

The plot_reduction.py script has been enhanced to offer raw counts and percentages in addition to the original stacked counts mode, and the ability to pool sample groups by column(s) of the metadata.

Also a belated update to the soil_nematodes/ example for the v1.0.1 change to unoise-l read correction.

- Python
Published by peterjc over 2 years ago

thapbi-pict - THAPBI PICT v1.0.1

Released on PyPI on 2023-07-26:

https://pypi.org/project/thapbi-pict/1.0.1/

Now requires at least Python 3.7 (since Python 3.6 is no longer maintained). Fixed some rare corner-case read-corrections in unoise-l mode. Improved memory usage using the sample-tally step. Updated the tests for a slight change in chimera detection in VSEARCH 2.23.0. Adjustments to the logging in verbose mode.

Added a new script producing a data-reduction stacked plot as used for the accepted manuscript, based on the figure in the preprint.

Minor documentation changes including noting the paper has been accepted, but until it is published continue to suggest citing the preprint:

Cock et al. (2023) "THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline" bioRxiv https://doi.org/10.1101/2023.03.24.534090

- Python
Published by peterjc over 2 years ago

thapbi-pict - THAPBI PICT v1.0.0

Released on PyPI on 2023-05-19:

https://pypi.org/project/thapbi-pict/1.0.0/

Minor documentation changes since v0.14.1, including adding links to our preprint:

Cock et al. (2023) "THAPBI PICT - a fast, cautious, and accurate metabarcoding analysis pipeline" bioRxiv https://doi.org/10.1101/2023.03.24.534090

- Python
Published by peterjc almost 3 years ago

thapbi-pict - THAPBI PICT v0.14.1

Released on PyPI on 2023-03-13:

https://pypi.org/project/thapbi-pict/0.14.1/

The tool now offers optional BIOM format output (requested at the command line with the --biom switch), which requires the Python biom-format library to be installed. See:

McDonald et al. (2012) The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome https://doi.org/10.1186/2047-217X-1-7

This will hopefully facilitate interoperability and downstream analysis of our tool's output.

- Python
Published by peterjc almost 3 years ago

thapbi-pict - THAPBI PICT v0.14.0

Released on PyPI on 2023-02-03:

https://pypi.org/project/thapbi-pict/0.14.0/

The tool now offers UNOISE style read-correction (off by default), either a built-in implementation of the published algorithm, or by invoking the command line tools USEARCH or VSEARCH. This algorithm requires access to all the reads prior to abundance level thresholds, and thus required some restructuring of the pipeline. The read-preparation step therefore only discards singletons, with the new sample-tally step combining all the unique sequence variants (ASVs). This can optionally apply read-correction before applying the abundance thresholds (which can still be set dynamically using control samples). This is output as a sequence tally table which can be converted to BIOM format. Furthermore, the classifier output now extends the sequence tally table with additional columns containing the taxid and genus-species.
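
To give a feel for UNOISE style read-correction, here is a simplified greedy sketch. It assumes the skew threshold from the published UNOISE2 description (a read with abundance n is merged into a centroid with abundance N at edit distance d when n / N <= 1 / 2**(alpha * d + 1), with alpha = 2). It is not the tool's built-in implementation, and the function names are hypothetical:

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,  # deletion
                current[j - 1] + 1,  # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def unoise_like(counts: dict[str, int], alpha: float = 2.0) -> dict[str, int]:
    """Merge likely-erroneous sequences into more abundant centroids."""
    centroids: dict[str, int] = {}
    # Visit sequences from most to least abundant (ties broken by sequence).
    for seq, n in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        for centroid in centroids:
            d = levenshtein(seq, centroid)
            # Skew test against the centroid's original abundance.
            if n * 2 ** (alpha * d + 1) <= counts[centroid]:
                centroids[centroid] += n
                break
        else:
            # No sufficiently abundant close neighbour: a new centroid.
            centroids[seq] = n
    return centroids
```

Because the test compares each sequence with more abundant ones, it needs the full set of reads, which is why the correction has to run before any abundance threshold is applied.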

- Python
Published by peterjc about 3 years ago

thapbi-pict - THAPBI PICT v0.13.6

Released on PyPI on 2022-12-28:

https://pypi.org/project/thapbi-pict/0.13.6/

Miscellaneous small fixes and documentation updates, including fixing the fractional abundance threshold in sample-tally which was not quite strict enough (only relevant if used outside the pipeline).

- Python
Published by peterjc about 3 years ago

thapbi-pict - THAPBI PICT v0.13.5

Released on PyPI on 2022-12-21:

https://pypi.org/project/thapbi-pict/0.13.5/

Miscellaneous small fixes and documentation updates, including fixing excessive memory usage in the new sample-tally command with larger datasets.

- Python
Published by peterjc about 3 years ago

thapbi-pict - THAPBI PICT v0.13.4

Released on PyPI on 2022-12-07:

https://pypi.org/project/thapbi-pict/0.13.4/

Extends the use of the sample-tally command added in v0.13.3, which can now also perform the abundance filtering including control-driven abundance thresholds. The species versus sample tally table header now also includes an entry for which of the samples are controls.

Again, the motivation behind this change is mostly as a stepping stone towards planned future functionality.

- Python
Published by peterjc about 3 years ago

thapbi-pict - THAPBI PICT v0.13.3

Released on PyPI on 2022-11-25:

https://pypi.org/project/thapbi-pict/0.13.3/

Introduces a new sample-tally command which is now used in the pipeline in place of the older fasta-nr command. This outputs a BIOM style TSV with one row per ASV giving the sample abundances as columns, with the ASV sequence as the final column. This TSV file is now used as input to the summary command, rather than scanning the per-sample intermediate FASTA files.
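
A minimal sketch of reading a table laid out this way, with one row per ASV, one column per sample, and the sequence last. The column names and example data are hypothetical, and the real file may carry extra header lines which this sketch does not handle:

```python
import csv
import io

# Hypothetical example data in the layout described above.
EXAMPLE_TSV = (
    "#ASV\tsample1\tsample2\tSequence\n"
    "asv_a\t120\t0\tACGTACGT\n"
    "asv_b\t5\t42\tACGTTCGT\n"
)


def load_tally(handle):
    """Return (sample names, {ASV id: (counts per sample, sequence)})."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:-1]  # columns between the ASV id and the sequence
    tally = {}
    for row in reader:
        counts = [int(value) for value in row[1:-1]]
        tally[row[0]] = (counts, row[-1])
    return samples, tally


samples, tally = load_tally(io.StringIO(EXAMPLE_TSV))
# Total reads per sample, summed over all ASVs.
totals = {
    sample: sum(counts[i] for counts, _ in tally.values())
    for i, sample in enumerate(samples)
}
```

A single table like this can be scanned once, rather than opening one intermediate FASTA file per sample.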

The motivation behind this change is mostly as a stepping stone towards planned future functionality.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.13.2

Released on PyPI on 2022-11-11:

https://pypi.org/project/thapbi-pict/0.13.2/

Sped up substr classifier, especially with larger databases. Various small documentation updates, and some minor code cleanup and refactoring.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.13.1

Released on PyPI on 2022-09-21:

https://pypi.org/project/thapbi-pict/0.13.1/

Dramatically improves the time taken to import large FASTA files into the database, at the expense of higher memory usage.

Updated the default database with latest genus-level NCBI results, added a new Plasmopara sequence (OP326699.1), and multiple additional accessions to an existing Phytopythium sequence.

Caps the --cpu argument by the number of available CPUs under Linux, or number of CPUs if that information is not available.
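
A minimal sketch of this kind of cap using only the Python standard library. The function names and the minimum of one CPU are assumptions for illustration, not the tool's actual code:

```python
import os


def available_cpus() -> int:
    """CPUs this process may use, or the machine total as a fallback."""
    if hasattr(os, "sched_getaffinity"):
        # Linux: respects CPU affinity, e.g. as set by a cluster scheduler.
        return len(os.sched_getaffinity(0))
    # Elsewhere only the total number of CPUs is known (may be None).
    return os.cpu_count() or 1


def cap_cpu(requested: int) -> int:
    """Cap a requested CPU count to what is available (assumed minimum 1)."""
    return max(1, min(requested, available_cpus()))
```

On a shared cluster node the affinity-based count can be much smaller than the machine total, which is why it is preferred where available.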

The pipeline and prepare-reads commands now accept .fq or .fq.gz extensions in addition to .fastq or .fastq.gz for FASTQ input files.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.13.0

Released on PyPI on 2022-09-14:

https://pypi.org/project/thapbi-pict/0.13.0/

The main change is faster distance based classifiers by efficient use of the RapidFuzz library. This requires at least RapidFuzz version 2.4.0 to work.

This release drops the human readable plain text sample report which was not useful with large datasets, and now always includes the threshold columns (previously hidden when their values were the same for all samples).

Finally importing sequences into a database is now faster, and the NCBI taxonomy used in the default DB has been updated adding taxids to recently added species.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.9

Released on PyPI on 2022-08-19:

https://pypi.org/project/thapbi-pict/0.12.9/

Changes to the default Phytophthora-centric ITS1 database, with further refinement of the left-trimming for the bulk import of NCBI search results, and the addition of type strains from Jung et al. (2022) to the curated Phytophthora set.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.8

Released on PyPI on 2022-08-08:

https://pypi.org/project/thapbi-pict/0.12.8/

Now treats NCBI taxonomy 'equivalent name' as a synonym, resulting in minor changes to the non-Phytophthora in the default ITS1 database (including 11 cases of Pythium now recorded as Phytopythium).

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.7

Released on PyPI on 2022-07-26:

https://pypi.org/project/thapbi-pict/0.12.7/

Fixes the missing NCBI taxid in the genus-only fallback classifier output (when using the NCBI taxonomy), previously would report zero.

Also minor updates to the genus-only NCBI imports in the default database, adding another ten non-Phytophthora ITS1 sequences which had been excluded in error as part of resolving conflicting genus information. These are generally associated with species which have changed genus.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.6

Released on PyPI on 2022-07-25:

https://pypi.org/project/thapbi-pict/0.12.6/

Reworked how we import NCBI sequences (at genus level) into the default Phytophthora centric database. Now import extended published sequences missing (part of) the conserved starting 32bp leader where the extended version is observed in our own environmental data in at least 5 samples and at least 1000 reads total abundance. Currently this adds 59 unique sequences, around half of which are Phytophthora. Also, have relaxed the stringency on the right primer matching, but now insist on this being present. This added over 100 non-Phytophthora entries, mostly Globisporangium.

Internally we now track the history of the default database as a FASTA export (including genus/species and NCBI taxid), rather than a raw SQL dump. This is much easier to interpret by eye, and has less spurious changes (e.g. renumbering of entries from taxonomy additions). However, this does require rebuilding the database from the source files (which are already under version control).

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.5

Released on PyPI on 2022-07-08:

https://pypi.org/project/thapbi-pict/0.12.5/

Now records synonym entries for NCBI taxonomy identifiers for sub-nodes (e.g. varietas nodes for parent species, or clade nodes for parent genus), in addition to their names, which improves importing references into the database via the taxid.

The existing FASTA reference import support for the ObiTools format now prioritises any species information already in the database for the NCBI taxid over that in the FASTA file. This is particularly helpful when there have been changes (e.g. the genus of a species has been changed) since the FASTA file was created.

Added a new FASTA reference importing convention which uses the NCBI taxid only (which therefore requires pre-loading the taxonomy).

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.4

Released on PyPI on 2022-07-07:

https://pypi.org/project/thapbi-pict/0.12.4/

This is a minor release to support RapidFuzz v2.0.0 or later, used only for the edit-graph functionality.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.3

Released on PyPI on 2022-07-06:

https://pypi.org/project/thapbi-pict/0.12.3/

This release focused on additions to the default database, updating the NCBI taxonomy and refreshing the bulk import of genus-only entries from an ITS1 search on the NCBI.

Also includes some documentation updates, and fixed an integer overflow bug when outputting a distance matrix.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.2

Released on PyPI on 2022-06-15:

https://pypi.org/project/thapbi-pict/0.12.2/

This release focused on additions to the default database, with additional curated sequences for Phytophthora panamensis, sp. Kunnunara, transitoria, variabilis, and 13 candidate species from Catala et al. (2018). Additionally the right-trimming of the entry for Phytophthora rhizophorae was corrected.

- Python
Published by peterjc over 3 years ago

thapbi-pict - THAPBI PICT v0.12.1

Released on PyPI on 2022-05-18:

https://pypi.org/project/thapbi-pict/0.12.1/

Fixes a regression on sample reports including unsequenced samples missing a blank value field. This was only partially addressed in v0.11.6, and would block the pooling script from completing.

- Python
Published by peterjc almost 4 years ago

thapbi-pict - THAPBI PICT v0.12.0

Released on PyPI on 2022-04-19:

https://pypi.org/project/thapbi-pict/0.12.0/

Can now use synthetic spike-in controls to automatically raise the fractional abundance threshold. For example, if 99% of a control is recognised as synthetic spike-in sequence, then the fractional abundance threshold can be raised to 1%.
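
The arithmetic in that example can be sketched as follows. This is a simplified reading which treats everything in the control not recognised as spike-in as unwanted signal; the function name is hypothetical and the real rule may be more refined:

```python
def fraction_threshold_from_control(
    synthetic_reads: int, total_reads: int, default_fraction: float = 0.001
) -> float:
    """Return the raised fractional threshold implied by one control."""
    if total_reads <= 0:
        # An empty control gives no evidence for raising the threshold.
        return default_fraction
    non_synthetic = (total_reads - synthetic_reads) / total_reads
    # Never lower the threshold below the starting value.
    return max(default_fraction, non_synthetic)
```

For a control with 10,000 reads of which 9,900 are recognised as spike-in, the remaining 100 reads are 1% of the total, so the threshold becomes 0.01.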

The sample reports were updated to include the number of singletons and number of accepted unique sequences, in addition to the total number of accepted reads. This required updates to the metadata stored in the intermediate files.

Computation of the edit-graph has been optimised, significantly improving its runtime. Additionally a regression in the XGMML output was fixed.

Dependencies, tests, and examples updated to use FLASH v1.2.11 and Cutadapt v4.0, a new release with minor changes in our accepted read counts compared to cutadapt v3.7.

- Python
Published by peterjc almost 4 years ago

thapbi-pict - THAPBI PICT v0.11.6

Released on PyPI on 2022-03-09:

https://pypi.org/project/thapbi-pict/0.11.6/

Fixed a regression building summary reports when some samples have not been sequenced.

- Python
Published by peterjc almost 4 years ago

thapbi-pict - THAPBI PICT v0.11.5

Released on PyPI on 2022-02-18:

https://pypi.org/project/thapbi-pict/0.11.5/

Reporting enhancements when using spike-in (synthetic) controls.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.6.1

This is a belated release on GitHub, with v0.6.1 published on PyPI 2020-01-08

https://pypi.org/project/thapbi-pict/0.6.1/

The marker sequences in the default curated Phytophthora ITS1 database were extended to include the leading normally conserved 32bp region which had previously been discarded. This point release required Python 3.6 onwards due to adopting new features of the Python language. The later v0.6.x releases focused on improved reporting.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.11.4

Released on PyPI on 2022-02-08:

https://pypi.org/project/thapbi-pict/0.11.4/

Updates the default DB with a further six Phytophthora species.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.11.3

Released on PyPI on 2022-02-01:

https://pypi.org/project/thapbi-pict/0.11.3/

Fixes the dynamic k-mer threshold for detecting synthetic spike-in control sequences.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.11.2

Released on PyPI on 2022-01-20:

https://pypi.org/project/thapbi-pict/0.11.2/

Small fixes for use on Windows, and automated continuous integration testing on Windows using AppVeyor.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.11.1

Released on PyPI on 2022-01-18:

https://pypi.org/project/thapbi-pict/0.11.1/

Switched from the python-Levenshtein to rapidfuzz Python library for Levenshtein edit-distance, which is more easily installed on Windows, and should be faster too.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.11.0

Released on PyPI on 2022-01-13:

https://pypi.org/project/thapbi-pict/0.11.0/

When used with multiple markers the pipeline now also produces combined reports by pooling the predictions from each marker.

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.10.6

Released on PyPI on 2022-01-12:

https://pypi.org/project/thapbi-pict/0.10.6/

This fixes a slow-down in v0.10.0 when using a small database, most noticeable when exploring a large new dataset with a minimal (often ad-hoc) database. It restores the earlier approach of building a cloud of all 1bp edits of the DB entries in memory to speed up comparison to the sample sequences (as long as the DB is not too large as this becomes memory intensive).
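
A minimal sketch of what such a cloud of 1bp edits could look like, assuming a plain DNA alphabet. The function names are hypothetical and the tool's own data structures may differ:

```python
def one_bp_edits(seq: str, alphabet: str = "ACGT") -> set[str]:
    """All sequences one substitution, insertion or deletion from seq."""
    edits = set()
    for i in range(len(seq)):
        edits.add(seq[:i] + seq[i + 1:])  # deletion
        for base in alphabet:
            edits.add(seq[:i] + base + seq[i + 1:])  # substitution
    for i in range(len(seq) + 1):
        for base in alphabet:
            edits.add(seq[:i] + base + seq[i:])  # insertion
    edits.discard(seq)  # substituting a base with itself gives seq back
    return edits


def build_cloud(db_seqs: list[str]) -> dict[str, set[str]]:
    """Map each 1bp variant to the DB entries it is one edit away from."""
    cloud: dict[str, set[str]] = {}
    for ref in db_seqs:
        for variant in one_bp_edits(ref):
            cloud.setdefault(variant, set()).add(ref)
    return cloud
```

Each sample sequence then needs only a dictionary lookup rather than a distance calculation against every DB entry. The cloud grows with both the number and length of the entries, which is the memory cost noted above.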

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.10.5

Released on PyPI on 2021-12-23:

https://pypi.org/project/thapbi-pict/0.10.5/

Default for -f / --abundance-fraction is now 0.001, meaning 0.1%. The percentage based abundance threshold was previously off by default.
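
A minimal sketch of a fractional abundance filter with this default. Using the sum of the sample's own counts as the denominator and an inclusive comparison are assumptions made for illustration, and the function name is hypothetical:

```python
def apply_fraction_threshold(
    counts: dict[str, int], fraction: float = 0.001
) -> dict[str, int]:
    """Keep sequences whose count is at least fraction of the sample total."""
    total = sum(counts.values())
    minimum = total * fraction
    return {seq: n for seq, n in counts.items() if n >= minimum}
```

With 10,000 reads in a sample, the default of 0.001 corresponds to a minimum of 10 reads, so the cut-off scales with sequencing depth rather than being fixed.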

- Python
Published by peterjc about 4 years ago

thapbi-pict - THAPBI PICT v0.10.4

Released on PyPI on 2021-11-24:

https://pypi.org/project/thapbi-pict/0.10.4/

Updates to default curated DB, including newer NCBI taxonomy.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.10.3

Released on PyPI on 2021-11-19:

https://pypi.org/project/thapbi-pict/0.10.3/

New -f / --abundance-fraction setting, off by default. Particularly useful for experiments where the sequence depth varies dramatically between samples.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.10.2

Released on PyPI on 2021-11-05:

https://pypi.org/project/thapbi-pict/0.10.2/

Simplifies how the NCBI taxonomy is loaded. Also (unless in lax mode), when importing curated FASTA files, sequences whose species is not in our DB are now retained, but at genus level only.

The default database was rebuilt with these changes, the October 2021 NCBI taxonomy, and some additional newly curated entries.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.10.1

Released on PyPI on 2021-07-28:

https://pypi.org/project/thapbi-pict/0.10.1/

Fixes the classifier code to run under SQLAlchemy v1.3. The previous release was accidentally using a new alias only available in SQLAlchemy v1.4 onwards.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.10.0

Released on PyPI on 2021-07-28:

https://pypi.org/project/thapbi-pict/0.10.0/

This update changes the database schema in a backward compatibility breaking way in order to support multiple primer amplicons, which will be separated based on the primers when calling cutadapt.

It also reworks the distance based classifiers to reduce their memory usage, taking advantage of the changes in v0.9.9 to make a non-redundant FASTA file so that the classifier does not have to repeatedly re-classify the same sequences. This makes it possible to use larger databases.

This has required adding -k / --marker to some of the commands. There were therefore additional minor changes to the command line options including renaming -k / --spike to -y / --synthetic, and dropping -k as a shorthand for --known. Also the -o / --output and -r / --report arguments were combined into a single output folder or stem setting.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.9.9

Released on PyPI on 2021-07-08:

https://pypi.org/project/thapbi-pict/0.9.9/

Dropped the SWARM based classifiers. This allowed optimising the pipeline to run the classifier on a non-redundant FASTA file containing all the observed sequences - with associated changes to the summary report code etc to match.

Also optimised the memory footprint of the load-tax command to more than halve the peak RAM requirement in building our default database.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.9.8

Released on PyPI on 2021-06-17:

https://pypi.org/project/thapbi-pict/0.9.8/

Dropped edit-graph in pipeline. Rarely required, and computationally expensive.

Require full length primers in merged reads. Avoids some problematic reads, but does discard a small number of usable sequences.

Fixed an issue with the pooling script when some entries were flagged as pending.

- Python
Published by peterjc over 4 years ago

thapbi-pict - THAPBI PICT v0.9.7

Released on PyPI on 2021-06-04:

https://pypi.org/project/thapbi-pict/0.9.7/

Now supports the USEARCH SINTAX and OBITools FASTA conventions in the import command.

Dropped support for in-situ intermediate FASTA output. Mixing raw data and intermediate is not a good idea, and this complicated future plans.

Drop prepare-reads command option -p / --primers for reporting reads which failed primer matching. Again, this complicated future plans.

Updates to the test suite.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.6

Released on PyPI on 2021-05-21:

https://pypi.org/project/thapbi-pict/0.9.6/

Updated the default database, focused on curation to remove uninformative entries:

  • Now using the May 2021 NCBI taxonomy, which moved Phytophthora versiformis out of the unclassified Phytophthora (so we now accept it). However, the curated sequence is shared with Phytophthora castanetorum & Phytophthora quercina.
  • Narrowed the genus level NCBI import to the Peronosporales & Pythiales only, having not seen any matches outside these groups.
  • Left primer trimmed the NCBI import before looking for the 32bp leader. This removes a leading G in many sequences where the ~32bp leader started TT rather than TTT after the left primer site.
  • Limited the import to sequences up to 450bp only, which dropped some untrimmed sequences lacking a match to the right primer, and the curated Nothophytophthora caduca entry as too long.
  • Ignoring GQ149496/JF916542 as probably not Phytophthora.

Also added a simple distance matrix output format to the edit-graph command.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.5

Released on PyPI on 2021-05-10:

https://pypi.org/project/thapbi-pict/0.9.5/

Simplified to just one import command, taking pre-trimmed FASTA input. This means the end user must apply any primer-trimming to their reference set before importing the trimmed sequences into a database.

Also dropped an unused field in the database, which was originally any pre-trimmed sequence. This version will still work with databases created with older versions, but not the other way round.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.4

Released on PyPI on 2021-05-05:

https://pypi.org/project/thapbi-pict/0.9.4/

Dropped unused metadata fields from databases schema. This version will still work with databases created with older versions, but not the other way round.

Fixed output of GML format edit graphs.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.3

Released on PyPI on 2021-05-04:

https://pypi.org/project/thapbi-pict/0.9.3/

Replaced use of a simple HMM for spike-in control detection, now done via synthetic controls in the database, and k-mer counting.
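
A minimal sketch of k-mer based detection along these lines. The value of k, the minimum number of shared k-mers and the function names are placeholders, not the values or code used by the tool:

```python
def kmers(seq: str, k: int) -> set[str]:
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def looks_synthetic(
    read: str, control_kmers: set[str], k: int, min_shared: int
) -> bool:
    """True if the read shares at least min_shared k-mers with the controls."""
    return len(kmers(read, k) & control_kmers) >= min_shared
```

The k-mers of the synthetic control sequences in the database would be computed once, after which each read needs only set operations and no external tool.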

This allows us to drop the dependency on hmmer3, making use on Windows significantly easier in principle.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.2

Released on PyPI on 2021-04-28:

https://pypi.org/project/thapbi-pict/0.9.2/

Improved test coverage of automatically raising the minimum abundance threshold based on negative controls.

Fixed an obscure problem using relative versions of absolute paths.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.1

Released on PyPI on 2021-04-20:

https://pypi.org/project/thapbi-pict/0.9.1/

Can now specify the encoding of the metadata TSV file (e.g. "latin1" or "macintosh", for when this does not match the system default).
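
To show why the encoding matters, here is a minimal sketch using only the Python standard library. The helper name load_metadata is hypothetical and this is not the tool's own loader; it simply passes the chosen encoding through to open().

```python
import csv


def load_metadata(path: str, encoding: str = "utf-8") -> list[list[str]]:
    """Read a tab-separated metadata table into a list of rows.

    Hypothetical helper for illustration; the encoding is passed to open().
    """
    with open(path, encoding=encoding, newline="") as handle:
        return list(csv.reader(handle, delimiter="\t"))
```

A file saved as "latin1" that contains accented characters would typically fail to decode, or decode to the wrong characters, if read with a different encoding, hence the value of being able to state it explicitly.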

Adds explicit warnings when using a low abundance threshold.

Also fixed some oversights in the manifest which had left some files out of the recent tar-ball source releases.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.9.0

Released on PyPI on 2021-04-19:

https://pypi.org/project/thapbi-pict/0.9.0/

Dropped use of Trimmomatic, which made the read preparation step slightly faster, and yields slightly higher read counts.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.8.4

Released on PyPI on 2021-04-13:

https://pypi.org/project/thapbi-pict/0.8.4/

Sped up re-running the classifier by delaying method setup until, and only if, it is actually required.

Includes recent work adding the abundance threshold to the reports (unless constant), and the addition of a Python script for pooling the sample report.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.8.1

Released on PyPI on 2021-04-09:

https://pypi.org/project/thapbi-pict/0.8.1/

Simplified the intermediate classifier TSV file by dropping the species list embedded in the header. The assess command now requires a database to provide the list of possible species. There is no change to use of the pipeline command.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.8.0

Released on PyPI on 2021-04-06:

https://pypi.org/project/thapbi-pict/0.8.0/

Revised genus/species columns in sample report. Dropped the genus sub-totals, and instead show 'Genus (unknown species)' as needed. Changed the column sort order to move unknowns to the end. Added a human readable comma separated classification string as a new column.

Shortened (uncertain/ambiguous) to (*) in text report.

Added scripts/ folder with a few helper Python scripts.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.7.11

Released on PyPI on 2021-03-30:

https://pypi.org/project/thapbi-pict/0.7.11/

The thapbi_pict assess command was simplified to only operate at sample/species level.

The thapbi_pict classify command now has an optional minimum abundance argument, useful when you have been exploring lower abundance thresholds but wish to run a slower classifier only on the more abundant sequences.

Some documentation updates too, reflecting recent changes.

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.7.10

Released on PyPI on 2021-03-24:

https://pypi.org/project/thapbi-pict/0.7.10/

The pipeline command now includes calling the fasta-nr command, making a non-redundant FASTA file of all the sequences being analysed.
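
The idea of a non-redundant FASTA file can be sketched as pooling the per-sample sequence counts and writing each unique sequence once. This is an illustration under stated assumptions rather than the fasta-nr implementation: the function name, the input structure, and the MD5-plus-total-count record naming are all choices made for this example.

```python
import hashlib
from collections import Counter


def pool_unique(samples: dict[str, dict[str, int]]) -> str:
    """Pool per-sample {sequence: count} tables into one non-redundant FASTA string.

    Record names (MD5 of the sequence, then the pooled count) are an
    assumption for this sketch.
    """
    totals: Counter[str] = Counter()
    for counts in samples.values():
        for seq, n in counts.items():
            totals[seq.upper()] += n
    records = []
    # Most abundant first, ties broken alphabetically for stable output.
    for seq, n in sorted(totals.items(), key=lambda kv: (-kv[1], kv[0])):
        name = hashlib.md5(seq.encode("ascii")).hexdigest()
        records.append(f">{name}_{n}\n{seq}\n")
    return "".join(records)
```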

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.7.9

Released on PyPI on 2021-03-15:

https://pypi.org/project/thapbi-pict/0.7.9/

Option to show unsequenced entries in summary sample report (-u or --unsequenced).

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.7.8

Released on PyPI on 2021-03-11:

https://pypi.org/project/thapbi-pict/0.7.8/

Fixed Nothophytophthora valdiviana entry in the default DB, added in the last release but with gap characters present.

Database importing now blocks gaps or anything other than IUPAC DNA letters.
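
A check of this kind can be sketched in a few lines. The helper below is hypothetical and is not the check used by the import code; it assumes the accepted alphabet is the standard IUPAC DNA letters (A, C, G, T plus the ambiguity codes) and that case is ignored.

```python
# Standard IUPAC DNA letters, including ambiguity codes (assumed alphabet).
IUPAC_DNA = set("ACGTRYSWKMBDHVN")


def is_valid_marker(seq: str) -> bool:
    """True if seq is non-empty and uses only IUPAC DNA letters (no gaps)."""
    seq = seq.upper()
    return bool(seq) and set(seq) <= IUPAC_DNA


print(is_valid_marker("ACGTNRY"))  # True
print(is_valid_marker("ACG-TTA"))  # False, contains a gap character
```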

- Python
Published by peterjc almost 5 years ago

thapbi-pict - THAPBI PICT v0.7.7

Released on PyPI on 2021-02-24:

https://pypi.org/project/thapbi-pict/0.7.7/

This release focused on improving the coverage of the default Phytophthora centric ITS1 database. It added 12 novel species from Català et al. (2018), hand curated Nothophytophthora sequences, additional single isolate controls, and expanded the NCBI import of published sequences (at genus level only) to include all Oomycota and refined this to avoid importing partial sequences.

This work markedly reduced the number of unknown sequences in our tree nursery and environmental survey work.

- Python
Published by peterjc about 5 years ago

thapbi-pict - THAPBI PICT v0.7.6

Released on PyPI on 2021-02-17:

https://pypi.org/project/thapbi-pict/0.7.6/

Replaced thapbi_pict seq-import command for adding single isolate FASTQ files to a DB, with thapbi_pict curated-seq for turning them into small FASTA files for use with thapbi_pict curated-import. This allows us to keep our single isolates sequenced July 2018 under version control, for use when building default DB.

There is no change to the sequences in the provided default database with this release.

- Python
Published by peterjc about 5 years ago

thapbi-pict - THAPBI PICT v0.7.5

Released on PyPI on 2021-02-16:

https://pypi.org/project/thapbi-pict/0.7.5/

Refined the default DB by adjusting how the genus-level NCBI import is trimmed. There are now fewer entries in the provided database, but we have removed poorly trimmed entries which would be useless with most of the classifier methods.

- Python
Published by peterjc about 5 years ago

thapbi-pict - THAPBI PICT v0.7.4

Released on PyPI on 2021-02-15:

https://pypi.org/project/thapbi-pict/0.7.4/

Edit-graph now includes genus-only labels.

New 1s2g, 1s4g & 1s5g classifiers - like the recently added 1s3g but with different edit-distance thresholds for assigning a genus level match.
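
The naming can be read as "up to 1 edit for a species level match, up to N edits for a genus level match". The sketch below illustrates that reading only; it is a simplification and not the classifier code. It compares against the single closest reference, and the function names, the input structure and the genus_max parameter are assumptions.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a rolling row of the dynamic programming table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def classify(query: str, references: dict[str, tuple[str, str]], genus_max: int = 3) -> str:
    """references maps sequence -> (genus, species); all names are hypothetical.

    Returns "Genus species" within 1 edit, "Genus" within genus_max edits,
    or an empty string when nothing is close enough.
    """
    best = min(references, key=lambda ref: edit_distance(query, ref))
    dist = edit_distance(query, best)
    genus, species = references[best]
    if dist <= 1:
        return f"{genus} {species}"
    if dist <= genus_max:
        return genus
    return ""
```

Under this reading, genus_max=2, 3, 4 or 5 would correspond to 1s2g, 1s3g, 1s4g and 1s5g respectively.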

- Python
Published by peterjc about 5 years ago

thapbi-pict - v0.7.3 - Updated NCBI import & taxonomy. New 1s3g classifier. Use cutadapt v3.0+.

This is a belated release on GitHub, with v0.7.3 published on PyPI 2021-01-29

https://pypi.org/project/thapbi-pict/0.7.3/

Updated NCBI import & taxonomy. New 1s3g classifier. Use cutadapt v3.0+.

- Python
Published by peterjc about 5 years ago

thapbi-pict - v0.7.2 - Added command for use with interactive ENA read submission.

This is a belated release on GitHub, with v0.7.2 published on PyPI 2020-10-06

https://pypi.org/project/thapbi-pict/0.7.2/

Added ena-submit command for use with interactive ENA read submission.

- Python
Published by peterjc about 5 years ago

thapbi-pict - v0.7.1 - Curated Phytophthora DB minor updates. Classifier output in edit-graph.

This is a belated release on GitHub, with v0.7.1 published on PyPI 2020-09-29

https://pypi.org/project/thapbi-pict/0.7.1/

Curated Phytophthora DB minor updates. Classifier output in edit-graph.

- Python
Published by peterjc about 5 years ago

thapbi-pict - v0.7.0 - Reports now include FASTQ statistics

This is a belated release on GitHub, with v0.7.0 published on PyPI 2020-04-02

https://pypi.org/project/thapbi-pict/0.7.0/

Read counts etc now stored as a header in intermediate FASTA files, and shown in reports.

- Python
Published by peterjc about 5 years ago

thapbi-pict - Final release from the v0.6.x series

This is a belated release on GitHub, with v0.6.15 published on PyPI 2020-03-12

https://pypi.org/project/thapbi-pict/0.6.15/

The v0.6.x series focuses on improved reporting. Additionally, the marker sequences in the default curated Phytophthora ITS1 database were extended to include the leading normally conserved 32bp region which had previously been discarded.

- Python
Published by peterjc about 5 years ago

thapbi-pict - Final release from the v0.5.x series

This is a belated release on GitHub, with v0.5.8 published on PyPI 2019-12-13

https://pypi.org/project/thapbi-pict/0.5.8/

The v0.5.x series focussed on the use of HMMs, which are now only used to describe control sequences. Specifically, a marker specific HMM is no longer used, making the tool easier to apply to other marker sequences.

- Python
Published by peterjc about 5 years ago

thapbi-pict - Final release from the v0.4.x series

This is a belated release on GitHub, with v0.4.19 published on PyPI 2019-11-19

https://pypi.org/project/thapbi-pict/0.4.19/

The v0.4.x series focussed on the curated Phytophthora ITS1 database, including adding NCBI taxonomy synonym support.

The read preparation step now applies the control based minimum abundance threshold at folder level (each 96-well plate or other batch should be in a separate folder).
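
The folder-level behaviour can be sketched as follows. This is an illustration under assumptions, not the tool's code: it supposes that each folder's threshold is raised to the highest sequence count seen in any of that folder's negative controls, and the function name, input structure and default of 100 are hypothetical.

```python
def folder_thresholds(
    control_counts: dict[str, list[int]], default: int = 100
) -> dict[str, int]:
    """Map each folder to its minimum abundance threshold.

    control_counts maps folder name -> highest sequence counts observed in
    that folder's negative controls (may be an empty list). The threshold is
    never lowered below the default.
    """
    return {
        folder: max([default, *counts]) for folder, counts in control_counts.items()
    }


# plate1 has a contaminated control, so only its threshold is raised.
print(folder_thresholds({"plate1": [12, 250], "plate2": [3], "plate3": []}))
# {'plate1': 250, 'plate2': 100, 'plate3': 100}
```

Keeping each plate or batch in its own folder means a noisy control on one plate does not raise the threshold for the others.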

- Python
Published by peterjc about 5 years ago