Recent Releases of amptk

amptk - amptk v1.6.0

  • The ITS database has grown too large to be hosted as a single file on OSF, so it is now split into two files. All users need to upgrade to v1.6.0 for the database to be downloaded and used properly.

- Python
Published by nextgenusfs over 2 years ago

amptk - AMPtk v1.5.5

  • bug fix for amptk unoise3 #96 #99

- Python
Published by nextgenusfs over 3 years ago

amptk - AMPtk v1.5.4

  • bug fix for amptk database that was stripping species epithet names #81
  • bug fix for linux systems related to platform, fix was to use distro #91
  • added an action to build docker images and push them to docker hub. A fully functional version can be run with the amptk-docker shell script; for usage see https://amptk.readthedocs.io/en/latest/index.html#run-from-docker
  • Updated fungal ITS database based off of UNITE v8.3, upgrade with amptk install
  • Added new PR2 database for universal SSU amplicons, https://github.com/pr2database/pr2database, add with amptk install
  • Re-wrote amptk funguild instead of running the original Guilds.py script as this stopped working at some point

- Python
Published by nextgenusfs over 4 years ago

amptk - AMPtk v1.5.3

  • typo/bug fixes #84 #85
  • fix to multiprocessing so now py>3.7 working #83

- Python
Published by nextgenusfs almost 5 years ago

amptk - amptk v1.5.2

Bug fix release

  • fix for merging PE reads with vsearch
  • fix dereplicate function in database #77
  • fix for filter if --negatives passed but no mock community barcode
  • fix for taxonomy where custom databases not working with the -d flag #80
  • add -p illumina3 to SRA-submit

- Python
Published by nextgenusfs about 5 years ago

amptk - amptk v1.5.1

  • to support OSX Catalina dropping 32-bit applications, I've removed usearch9 as a strict dependency. vsearch will be the default for all processing steps, including taxonomy assignment. Along these lines, I've dropped UTAX classifier as "default" in the hybrid taxonomy method.
  • dropping usearch required the addition of a few new dependencies: mafft, fasttree, and the python package pyfastx (for speed and simplicity).
  • added support for PacBio CCS reads. Reads can be processed with amptk pacbio and then clustered with amptk pb-dada2
  • several bug fixes
  • fix embarrassing typos in v1.5.0 -- do not use v1.5.0, it's broken.
  • apparently it does not work in python 3.8 -- investigating that for future release/support.
  • working on ONT (Oxford Nanopore) support as well -- stay tuned for this in near future.

- Python
Published by nextgenusfs over 5 years ago

amptk - amptk v1.4.2

  • add --pseudopool option to amptk dada2

- Python
Published by nextgenusfs over 6 years ago

amptk - amptk v1.4.1

  • bug fix for amptk summarize if taxonomy not present #57
  • add FASTA extraction step to install #59
  • add chimera detection options to dada2 #60
  • fix DADA2 for ion torrent data, in newer versions of dada2 the quality score issue has been fixed, so can now use quality scores during sample inference.

- Python
Published by nextgenusfs almost 7 years ago

amptk - amptk v1.4.0

  • fix for amptk summarize --> rewrote script
  • large update to taxonomy assignment, switched to vsearch for all global alignment steps
  • update to all taxonomy databases, the downloads are larger but the data should be more robust
  • update to the COI bold2utax method in docs
  • update to amptk database and the command line options
  • bug fixes along the way

- Python
Published by nextgenusfs over 7 years ago

amptk - AMPtk v1.3.0

This is a major reorganization of the code so it is "properly packaged" and can be installed with pip and hopefully conda. After v1.2.4 there were apparently changes to conda that would not allow the scripts to build with the previous code organization. I've then also fixed a few of the bugs that have shown up more recently, including:

  • the py2 error in the bold2amptk.py accessory script https://github.com/nextgenusfs/amptk/issues/40
  • the edlib version bug reported several times after edlib updated how it stores the version info https://github.com/nextgenusfs/amptk/issues/46

- Python
Published by nextgenusfs over 7 years ago

amptk - amptk v1.2.5

  • bug fix where edlib version was not properly parsed due to changes upstream; the fix is backwards compatible and hopefully future version compatible as well. This was the error in #40 and several other "offline" emails.
  • update to amptk stats to generate interactive html output for NMDS output, allowing users to identify which samples correspond to which point on the graph. This requires r-plotly, r-htmltools, and r-dt; however, these should all be installed through bioconda
  • bug fixes for amptk SRA-submit
  • improve error/logging in amptk taxonomy
  • fix R scripts to be compatible with R>3.4.1 -- somehow different way of parsing the command line arguments. Also removed the "auto-install" function in the R scripts -- this was meant to be a convenience but didn't always work.
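
The edlib version parsing problem above is a common one: comparing version strings as text goes wrong once a component reaches two digits. A minimal sketch of a more tolerant check, assuming the version is available as a plain string (how edlib actually exposes its version is not shown here, and the function names are hypothetical):

```python
import re

def version_tuple(text, width=3):
    """Turn a version string such as '1.2.1' into a tuple of ints, padded with zeros."""
    numbers = [int(n) for n in re.findall(r"\d+", text)[:width]]
    return tuple(numbers + [0] * (width - len(numbers)))

def meets_minimum(installed, minimum):
    """True if the installed version is at least the minimum version."""
    return version_tuple(installed) >= version_tuple(minimum)

print(meets_minimum("1.10.0", "1.2.1"))  # numeric comparison, not string comparison
```

Padding to a fixed width means "1.2" and "1.2.0" compare as equal, and any suffix after the third numeric component is ignored.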

- Python
Published by nextgenusfs over 7 years ago

amptk - amptk v1.2.4

  • trying to fix tab error that bioconda didn't like, version bump accordingly.

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.2.3

  • fix auto-detect of base name output files in clustering scripts
  • bump version
  • update citation as now published in PeerJ https://peerj.com/articles/4925/

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.2.2

  • fix tab indentation bug for py3
  • add minimum length and trimming length to log file and terminal output.

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.2.1

  • numerous bug fixes: including a fix for --require_primer off in amptk illumina. Several bug fixes in amptk illumina2 and amptk illumina3 pre-processing steps.
  • thread (processes) control added for clustering steps

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.2.0

  • for all Illumina pre-processing methods the order of steps has been changed to now 1) search for primers/trim and then 2) merge PE reads. This ensures that all primer/adapter sequences are trimmed from the dataset, whereas previous AMPtk releases merged PE reads first; because of how usearch/vsearch merge PE reads, occasional primer/adapter sequences were slipping through the pre-processing steps. Also reads with multiple primer hits (i.e. two forward primers) are now discarded. While this results in the pre-processing steps being slightly slower in runtime, it increases the data quality downstream.
  • read orientation is tested/fixed on the fly for the amptk illumina2 workflow (barcodes/primers in reads). Some datasets have a 50/50 mix of read orientations.
  • added a check for "inverted OTUs" for all denoising/clustering steps. This was largely a check to validate that the changed illumina pre-processing steps were working correctly (an unintended result was a small number of OTUs that were on the "crick" strand)
  • Default mapping file now has a 'RevBarcodeSequence' column. Mapping files used for amptk illumina2 enforce the paired barcode sequences in the mapping file. If barcode fasta files are given, then all combinations of barcodes (5' and 3') are saved.
  • a few py27/36 bug fixes
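
For the barcode fasta case described above, where all combinations of 5' and 3' barcodes are kept, the idea can be sketched as a Cartesian product. This is only an illustration: the "F-R" naming scheme and the function name are assumptions, not what amptk illumina2 writes out.

```python
from itertools import product

def all_barcode_pairs(forward, reverse):
    """Pair every forward (5') barcode with every reverse (3') barcode.

    forward, reverse: dicts of barcode name -> sequence.
    Returns a dict of combined name -> (forward_seq, reverse_seq).
    """
    pairs = {}
    for (f_name, f_seq), (r_name, r_seq) in product(forward.items(), reverse.items()):
        pairs[f"{f_name}-{r_name}"] = (f_seq, r_seq)
    return pairs

pairs = all_barcode_pairs({"F1": "ACGT", "F2": "TTGA"}, {"R1": "GGCA", "R2": "CATG"})
print(sorted(pairs))
```

With a mapping file the pairs are enforced instead, so only the listed forward/reverse combinations would be kept.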

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.1.3

  • update to amptk taxonomy to alert user if sample names are duplicated
  • many fixes to unify the "base name" output for many scripts
  • remove colored date/time stamp for all platforms except Mac
  • many fixes for py2/py3 compatibility
  • amptk now moved to bin directory - will slightly change install

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.1.2

  • update compatibility for py2/3

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.1.1

  • bug fixes for amptk lulu, bug fixes for amptk taxonomy to better deal with 16S 8 level taxonomy
  • added some minor functions to amptk filter
  • improvement of amptk illumina3 as well as allowing --barcode_rev_comp to reverse complement barcode sequences, i.e. if indexing was done on the reverse primer. A mapping file is no longer required; you can also pass primers and a barcode fasta file for demuxing.
  • clean up repo slightly, trying to get conda recipe working.
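
Reverse complementing a barcode, as --barcode_rev_comp does conceptually, is a small operation. A minimal sketch (the function name is hypothetical and this is not the amptk source):

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

print(reverse_complement("AACG"))
```

Characters outside A/C/G/T are passed through unchanged by this sketch, so ambiguous bases would need an extended table.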

- Python
Published by nextgenusfs about 8 years ago

amptk - amptk v1.1.0

  • bug fix for amptk filter and the subtract feature https://github.com/nextgenusfs/amptk/issues/31
  • fix menu option https://github.com/nextgenusfs/amptk/issues/33
  • added LULU module for OTU curation, see more here: https://www.nature.com/articles/s41467-017-01312-x
  • LULU usage: amptk cluster --> amptk filter --> amptk lulu --> amptk taxonomy --> amptk stats
  • added indicator species analysis to amptk stats
  • several bug fixes for amptk stats
  • added option to drop specific OTUs prior to running amptk stats

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v1.0.3

  • update the ITS database to use newest version of UNITE database version 01.12.2017.

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v1.0.2

  • bug fix for amptk show where a non-gzipped input file was removed after being passed to the script
  • added gzip support for amptk sample
  • bug fix for amptk filter if OTUs and OTU table did not overlap 100% then script would die, added sanity check
  • changed --col_order in amptk filter to be a space separated list (was previously comma separated with no space).
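
A space separated list option like the new --col_order behaviour is typically handled in Python with argparse and nargs="+". The parser below is a stand-alone illustration, not the actual amptk filter parser; the prog name and sample names are made up.

```python
import argparse

# Hypothetical stand-alone parser; only the --col_order option is modelled.
parser = argparse.ArgumentParser(prog="filter-sketch")
parser.add_argument("--col_order", nargs="+", default=[],
                    help="space separated list of sample names")

args = parser.parse_args(["--col_order", "sampleB", "sampleA", "mock"])
print(args.col_order)
```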

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v1.0.1

  • minor bug fixes for amptk database
  • update to amptk filter to only label potential chimera OTUs if --calculate all is passed, i.e. you are using a synthetic mock that won't be in the rest of your samples

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v1.0.0

  • now requires edlib v1.2.1 -> thanks to Edlib developer (Martin Šošić) for finding the bug that was preventing full usage of edlib in amptk. primer and barcode searches now very fast.
  • added a new phyloseq module, amptk stats which will run some preliminary community ecology stats on your BIOM output file. requires R and phyloseq
  • added support for UNOISE3 via the amptk unoise3 command, note you will need to have USEARCHv10 for this script to work.
  • updated global alignment taxonomy search to better deal with multiple hits in the reference database
  • updated ITS reference database as well as COI database.
  • finally wrote some more comprehensive documents, located at http://amptk.readthedocs.io
  • several minor bug fixes

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v0.10.3

  • bug fixes for amptk database. Rewrote the dereplication function and added --lca or last common ancestor function for building the UTAX databases.
  • bug fixes for amptk SRA-submit and update to edlib for searching.
  • bug fix for amptk filter where samples passed to --drop are not used for index-bleed calculation
  • bug fix for pre-processing and --mult_samples argument

- Python
Published by nextgenusfs over 8 years ago

amptk - amptk v0.10.2

  • bug fix for -t, --threshold option of amptk select and amptk remove. Bug was introduced during last update when support for compressed files was introduced
  • AMPtk now supports degenerate nucleotide primer matching, thanks to the very fast edlib v1.2 library. You will need to have at least edlib v1.2.0, can upgrade with pip, i.e. pip install -U edlib. The scripts will check your edlib version during runtime and let you know if you need to upgrade.
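
To illustrate what degenerate primer matching means, here is a minimal pure-Python sketch. It is not how AMPtk or edlib implement it; the function names and the max_mismatch default are assumptions for illustration. Each IUPAC code in the primer is treated as the set of bases it may stand for, so a degenerate position only counts as a mismatch when the read base falls outside that set.

```python
# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def count_mismatches(primer, read_prefix):
    """Count positions where the read base is not allowed by the primer code."""
    return sum(1 for p, r in zip(primer.upper(), read_prefix.upper())
               if r not in IUPAC.get(p, ""))

def primer_matches(primer, read, max_mismatch=2):
    """True if the start of the read matches the primer within max_mismatch."""
    if len(read) < len(primer):
        return False
    return count_mismatches(primer, read[:len(primer)]) <= max_mismatch

# The R in the primer accepts either A or G at that position.
print(primer_matches("GTGARTCATCGAATCTTTG", "GTGAGTCATCGAATCTTTGAACG"))
```

This sketch only checks an ungapped match at the start of the read; a real aligner such as edlib also handles insertions and deletions.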

- Python
Published by nextgenusfs almost 9 years ago

amptk - amptk v0.10.1

  • bug fix for amptk filter where some mock sequences were not being annotated correctly
  • upgrade amptk SRA-submit to use edlib alignment
  • fix menu in several places

- Python
Published by nextgenusfs almost 9 years ago

amptk - amptk v0.10.0

  • Major update is to use edlib library for alignment, this is a dramatic increase in speed, however downside is that degenerate nucleotides are not supported in edlib currently (hoping to get this fixed soon). You can increase --primer_mismatch to allow for degenerate matches, keep in mind that currently any degenerate nucleotide will be counted as a mismatch. v0.9.3 still supports degenerate nucleotides, although the alignment is much less accurate and is 10X slower.
  • edlib alignment now supports barcode_mismatches as well without a loss in speed.
  • update to MergePE function, which allows user to select either vsearch or usearch for merging paired end fastq files, controlled via --merge_method. Update to phiX filtering to split files if >3GB to avoid memory problem in USEARCH 32 bit.
  • add amptk illumina3 method for pre-processing, this will demultiplex Illumina PE files along with index read files
  • support for gzipped input files; demuxed files are now output as fq.gz by default, which saves space during processing.
  • updated docker container and install instructions

- Python
Published by nextgenusfs almost 9 years ago

amptk - amptk v0.9.3

  • remove bedtools as dependency for converting BAM -> FASTQ. Now AMPtk will first try to use samtools if it exists, then bedtools if it exists, and will default to pybam native python parser to convert. Pybam is 10X slower than samtools, but is written in python thus no extra dependencies needed.
  • added threshold filtering to amptk remove and amptk select, so you could remove all samples with reads less than 5000 by running, amptk remove -i input.demux.fq -t 5000 -o output.demux.fq
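
The BAM conversion fallback described above (samtools first, then bedtools, then the pure-python parser) can be sketched with shutil.which. The function name and return values are hypothetical; this only shows the selection order, not the conversion itself.

```python
import shutil

def pick_bam_converter(which=shutil.which):
    """Return the first available converter, falling back to the python parser.

    `which` is injectable so the selection logic can be tested without
    the external tools being installed.
    """
    for tool in ("samtools", "bedtools"):
        if which(tool):
            return tool
    return "pybam"

print(pick_bam_converter())
```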

- Python
Published by nextgenusfs about 9 years ago

amptk - amptk v0.9.2

  • bug fix for amptk dada2 denoising where reads were getting ignored if they contained any ambiguous nucleotides. The filter for ambiguous nucleotides is still maintained prior to DADA2, note that terminal N's from padding will be properly removed, only internal ambiguous nucleotides are not allowed in DADA2
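
The rule above (terminal N padding is removed, internal ambiguous bases are not allowed) can be sketched as follows. This is an illustration under the assumption that padding is only ever added at the 3' end; the function name is hypothetical.

```python
def prepare_for_dada2(seq):
    """Strip trailing N padding; return None if ambiguous bases remain."""
    trimmed = seq.upper().rstrip("N")
    if not trimmed or set(trimmed) - set("ACGT"):
        return None  # empty after trimming, or internal ambiguous nucleotide
    return trimmed

for read in ("ACGTACNNNN", "ACNGTAC", "NNNN"):
    print(read, prepare_for_dada2(read))
```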

- Python
Published by nextgenusfs about 9 years ago

amptk - amptk v0.9.1

  • Add phix filtering for Illumina data. As part of the PE merging function in amptk illumina and amptk illumina2, scripts will also now run phix removal.
  • Workaround for DADA2 error where samples that only have 1 read post filtering trigger a derep$quals matrix error. amptk dada2 now has a -m, --min_reads option to drop samples that have fewer than -m reads. By default this is set to 10; in practice it should probably be much higher, but this avoids the above error.
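
The --min_reads behaviour amounts to a simple filter on per-sample read counts. A minimal sketch, assuming the counts are already available as a dict (the function name and sample names are made up):

```python
def drop_low_read_samples(read_counts, min_reads=10):
    """Split samples into those kept (>= min_reads) and those dropped."""
    kept = {name: n for name, n in read_counts.items() if n >= min_reads}
    dropped = sorted(set(read_counts) - set(kept))
    return kept, dropped

kept, dropped = drop_low_read_samples({"soil1": 15000, "soil2": 1, "blank": 9})
print(kept, dropped)
```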

- Python
Published by nextgenusfs about 9 years ago

amptk - amptk v0.9.0

  • added better support for amptk SRA-submit
  • added ability to normalize heat map
  • added amptk SRA which can be used to process reads downloaded from the SRA, where they are in a single FASTQ file, i.e. from ION or 454 data that has been demultiplexed into samples and then submitted.
  • created Dockerfile for using amptk with the scipy-notebook jupyter notebook server.

- Python
Published by nextgenusfs about 9 years ago

amptk - amptk v0.8.8

  • unify the output naming files from UNOISE2 and DADA2 "clustering" output.

- Python
Published by nextgenusfs over 9 years ago

amptk - amptk v0.8.7

  • support for new DADA2 algorithm allowing variable length reads, must have > v1.3.3.

- Python
Published by nextgenusfs over 9 years ago

amptk - amptk v0.8.6

  • add amptk drop to remove OTUs from a dataset and then create an updated OTU table
  • fix for amptk illumina where empty files would cause script to terminate
  • fix for biom output to explicitly be json
  • fix in amptk remove to allow fasta output

- Python
Published by nextgenusfs over 9 years ago

amptk - amptk v0.8.5

  • bug fixes for pre-processing steps where short primer-dimers could make it through filtering, get padded with N's and get incorporated as OTUs in clustering
  • update to amptk filter to output the final OTU table to have real read counts as opposed to "pseudo" counts from normalization. Filtering is done with normalization, but now read counts are restored to original read numbers. Important for downstream stats like beta diversity
  • improved read summary reporting in pre-processing steps
  • update to amptk unoise2 to output both inferred or denoised sequences/tables as well as biological OTU sequences (clustered at 97%).
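
The amptk filter change above (filter on normalized values, report raw counts) can be illustrated with a small sketch. The normalization used here, a per-sample fraction compared against a single cutoff, is an assumption made for the example and is not taken from the amptk source.

```python
def filter_keep_raw_counts(table, min_fraction):
    """Zero out counts whose per-sample fraction is below min_fraction.

    table: dict of OTU -> dict of sample -> raw read count.
    Counts that pass are returned as the original raw numbers.
    """
    samples = {s for counts in table.values() for s in counts}
    totals = {s: sum(c.get(s, 0) for c in table.values()) for s in samples}
    filtered = {}
    for otu, counts in table.items():
        filtered[otu] = {}
        for sample, raw in counts.items():
            fraction = raw / totals[sample] if totals[sample] else 0.0
            filtered[otu][sample] = raw if fraction >= min_fraction else 0
    return filtered

table = {"OTU1": {"A": 990, "B": 5}, "OTU2": {"A": 10, "B": 995}}
print(filter_keep_raw_counts(table, 0.02))
```

Because the surviving cells keep their raw counts, downstream tools that expect integer read numbers (for example for beta diversity) see real data rather than normalized values.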

- Python
Published by nextgenusfs over 9 years ago

amptk - amptk v0.8.0

  • package has undergone a name change to reflect changes in the scripts. Originally the project started as essentially a wrapper for UPARSE and thus relied heavily on USEARCH. Coupled with originally supporting fungal ITS sequences, it was named UFITS (usearch fungal ITS). However, the current implementation of AMPtk relies very little on USEARCH and can support any amplicon based NGS dataset. Out of the box the following DB are packaged: fungal ITS, fungal LSU, 16S, insect/animal COI. Thus I feel that amptk is a better name that describes what the scripts do.
  • option -p, --pad was added for amptk ion, amptk illumina, amptk illumina2, and amptk 454 to allow user to turn off the padding with Ns to the --trim_len
  • option -c,--calculate was added to amptk filter to control how the script calculates index-bleed. By default it calculates index-bleed into the mock community sample (-b) as well as out of the mock community into the rest of the samples. However, if members of the mock community are found in your samples, this calculated number is wrong, so if any members of your mock community are plausibly found in samples that you are sequencing, then you should use the --calculate in option.
  • packaged databases had to be moved to a different sharing location (USDA now prevents use of dropbox), so they are now on Box; however, the download speed seems to be quite a bit slower. If anybody has recommendations for a free place to host these databases, let me know. I need about 1 GB of space and need to be able to access the files with a direct link from the command line.
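
The -p, --pad option described above controls whether short reads are padded with Ns up to --trim_len. A minimal sketch of that trim-or-pad step (the function name is hypothetical; only the behaviour described in the notes is modelled):

```python
def trim_or_pad(seq, trim_len, pad=True):
    """Trim reads longer than trim_len; optionally pad shorter reads with Ns."""
    if len(seq) >= trim_len:
        return seq[:trim_len]
    if pad:
        return seq + "N" * (trim_len - len(seq))
    return seq

print(trim_or_pad("ACGTACGT", 4))
print(trim_or_pad("ACG", 5))
print(trim_or_pad("ACG", 5, pad=False))
```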

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.7.4

  • move the mergereads function to general library
  • better reporting for merging illumina reads in both ufits illumina and ufits illumina2
  • fix for ufits illumina to only require primer if amplicons are longer than the read length. This is to prevent amplicons that are shorter than the read length to be discarded as they are automatically trimmed/merged via usearch -fastq_mergepairs tool (and I can't change this). So the default behavior now is to require a forward primer via --require_primer on setting only if the amplicon length is longer than the read length. Read length is calculated automatically via sampling the first 50 reads, the automatic detection is overruled by the --read_length option
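
The read length auto-detection can be sketched as below. The notes only say that the first 50 reads are sampled; taking the longest read in that sample is an assumption, as are the function names. The sketch also assumes unwrapped FASTQ (4 lines per record).

```python
import io
from itertools import islice

def estimate_read_length(handle, n_reads=50):
    """Longest sequence among the first n_reads FASTQ records (4 lines each)."""
    lengths = [len(line.strip())
               for i, line in enumerate(islice(handle, n_reads * 4))
               if i % 4 == 1]
    return max(lengths, default=0)

def effective_read_length(handle, read_length=None):
    """A user-supplied value (like --read_length) overrules auto-detection."""
    if read_length is not None:
        return read_length
    return estimate_read_length(handle)

fastq = io.StringIO("@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACGT\n+\nIIII\n")
print(effective_read_length(fastq))
```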

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.7.3

  • fix critical bug in ufits illumina processing of reads where if reverse primer was not found read would be discarded

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.7.2

  • update to ufits taxonomy to allow for taxonomy to be calculated elsewhere, pass the -t, --taxonomy option and a 2 column tsv file, OTUTaxonomy
  • update to progress/multiprocessing steps
  • re-write demultiplexing steps for faster processing
  • support gzip files in ufits illumina2
  • options in ufits filter for how the threshold is calculated for index-bleed filtering

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.7.1

  • bug fix in ufits illumina where R2 reads were not getting trimmed correctly.

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.7.0

I bumped versions here to illustrate that UFITS has changed a little under the hood, now requires at least USEARCH v9.1.13 and requires at least VSEARCH v2.20. These changes were made to maximize speed and simplify the code. The scripts will terminate if they detect lower versions of both of these software tools. BIOM, RDP, Blast are still "soft dependencies".

  • bug fix in ufits taxonomy where RDP taxonomy was not processed correctly for BIOM conversion
  • fix for https://github.com/nextgenusfs/ufits/issues/14
  • fix for https://github.com/nextgenusfs/ufits/issues/13 by moving to VSEARCH for this task
  • fix for https://github.com/nextgenusfs/ufits/issues/12, now ufits filter requires you to add a mock community fasta file --mc if you specify a -b, --barcode to filter your data on
  • fix for ufits filter to deal with OTU tables that have taxonomy already appended
  • fix for ufits cluster_ref where script would die after conversion to VSEARCH as hard dependency
  • re-write of ufits heatmap to have a few more options and more flexibility.
  • update to docs as well as a section showing how to get your data into downstream statistical tools

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.6.1

  • bug fix in ufits taxonomy if --tax_filter was used the filtered OTU table would not be correct in the BIOM output file
  • fix for ufits ion if using --mult_samples now creates mapping_file correctly.
  • updates to the docs on new usage

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.6.0

Several major changes in this version:

  • taxonomy for ITS is updated to newest release from UNITE v7.1 11-20-2016.
  • USEARCH9 is now supported throughout and defaults have been changed to use usearch9
  • UNOISE2 algorithm is employed in a 'clustering' module
  • SINTAX algorithm is supported in ufits taxonomy. default hybrid method now uses SINTAX, UTAX, and global alignment to infer the best taxonomy assignment.
  • QIIME-like mapping files can now be used during demultiplexing/pre-processing. If you do not use a mapping file, the scripts will create one for you. The mapping file can be used to add metadata to it and then passed to ufits taxonomy to create a BIOM output file containing all metadata, OTUs, and taxonomy
  • BIOM output of ufits taxonomy is compatible with QIIME, PHINCH, MetaCoMET, PhyloSeq, etc.
  • ufits filter now alerts user if passing a barcode name via -b is not found in OTU table

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.6

  • updated UFITS with better logging for external programs, so now log file should be more informative if you run into any errors. this will help me diagnose the problem.
  • bug fix for ufits dada2 where script would die if --uchime_ref option not passed

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.5

  • update to COI database. Previous version had some mistakes in re-formatting the BOLD database. Scripts and workflow on how this database was constructed are available here
  • updates to ufits dada2 pipeline. Script will now also create bOTUs (biological OTUs) as the DADA2 output is sensitive to 1 bp, thus a single "species" may be spread out over several iSeqs. Therefore, to accommodate downstream community ecology statistics, these iSeqs are clustered at a set threshold (-p, --pct_otu) to collapse "species" into OTUs.
  • updates to ufits dada2 so that it builds an OTU table in the same manner as ufits cluster, i.e. original reads are mapped to iSeqs (as opposed to DADA2 only using quality filtered data for OTU table generation).

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.4

  • added support for reference chimera filtering in the ufits dada2 OTU picking method

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.3

  • improve the terminal output of ufits dada2 as well as the Rscript logging

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.2

  • update to Rscript running DADA2 to auto install the required R package if missing, this will only be done if package is missing
  • minor updates to output to terminal in ufits dada2

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.1

  • new module to support DADA2 inferred sequences "clustering" method. Reads must be the same length, so may not be ideal for fungal ITS sequences (or other variable length amplicons). The script is run with ufits dada2 and uses the output from any of the ufits pre-processing commands, i.e. ufits ion, ufits illumina, ufits 454, etc.

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.5.0

  • update to ufits cluster to correctly output chimeras detected during clustering and reference filtering
  • update to automatically detect delimiters in parsing OTU tables
  • enhancement to allow for barcode mismatches in the ufits illumina2 script - note the current implementation is slow and for 99% of uses I don't recommend setting barcode mismatches > 0.

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.9

  • remove requirement of otu table for ufits taxonomy
  • add support for dual barcodes for ufits illumina2 and ufits 454
  • default merge PE reads for ufits illumina2 now rescues the forward reads if the pair cannot be merged

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.8

  • fix bug in ufits illumina where periods (.) were not processed correctly in sample names
  • fix bug in ufits filter when passing the --cleanup option. Totally dumb mistake....
  • Add support for creating BIOM v2.1 OTU tables if you have the biom package installed. Also added a unified taxonomy 2-column output for each OTU.
  • added --cleanup option to ufits illumina

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.7

  • fix for pre-processing reads from Illumina platform where custom sequencing primers are used, ufits illumina command now handles those datasets better. Note I would not recommend using custom primers, but that is of course up to you...
  • minor update to docs

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.6

  • some bug fixes related to use of USEARCH9, UFITS now supports v9.0.2124 (note: do not use 9.0.2123, which has an error in the -cluster_otus command)
  • updates to how UFITS outputs the system info, now cleaner with more info on the OS
  • fix bug on failed import for ufits keep and ufits remove

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.5

  • bug fix in ufits taxonomy where it would output a csv file; it now only outputs tsv

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.4

  • bug fix related to downloading databases

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.3

  • update to Databases: UNITE v7.1, LSU (based on RDP data), 16S (based on RDP gold), COI (based on BOLD database of arthropods and chordates).
  • databases are installed now with ufits install -i ITS LSU 16S COI
  • Robert Edgar now says that chimera reference filtering should use largest database (as opposed to UCHIME paper that says a small curated database is better), thus chimera reference filtering is now configured to do exactly that, options during clustering are: ufits cluster --uchime_ref [ITS,16S,LSU,COI, custom_path]
  • ufits cluster_ref has been updated with the above database information - note this script is still experimental and I would not recommend using it for any environmental data at this point - there may be some targeted usage where it is appropriate
  • re-write of ufits taxonomy to coincide with updates to databases, now you can pass one of the pre-installed databases to the -d, --db flag or you can specify manually a database using --fasta_db, --usearch_db, and/or --utax_db

- Python
Published by nextgenusfs over 9 years ago

amptk - ufits v0.4.2

  • bug fix to ufits keep and ufits remove
  • bug fix to ufits taxonomy where passing a --fasta_db failed
  • update to README

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.4.1

  • update to menu of ufits taxonomy to be more consistent with rest of the scripts, -i for input OTU table, -f for input fasta
  • improve flexibility of ufits taxonomy to work with other groups, removed --only_fungi and replace with --tax_filter, i.e. --tax_filter Fungi.
  • fix menu in wrapper scripts to reflect changes

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.4.0

  • improvement to ufits filter script to handle some of the filtering more gracefully and added a few more options
  • added ability to supply a list of samples/barcodes to ufits remove and ufits select to make more flexible
  • add new clustering module based on reference based clustering called ufits cluster_ref, it functions by quality trimming, dereplicating, chimera filtering, mapping to reference database. It can then also rescue unmapped reads and run de novo clustering on them followed by UTAX reference based clustering. I think that standard de novo clustering is superior to this approach but in some fringe cases it may be useful.

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.3.16

  • slight modification to ufits show that also shows read lengths
  • minor bug fix in ufits taxonomy that now properly exits if databases not installed
  • changed the way ufits filter calculates index bleed from a sum of all counts per OTU, to now using the maximum value per sample.
  • upgrade to ufits meta to also allow for splitting up data by taxonomy classification (if taxonomy from UFITS default method)

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.3.14

  • added ufits show to count barcodes from demuxed data
  • fixed bug in ufits filter if try to use -s auto option without providing a --mock_barcode

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.3.13

  • bug fix where clustering log file was getting overwritten

- Python
Published by nextgenusfs almost 10 years ago

amptk - ufits v0.3.12

  • bug fix to ufits filter where the --col_order option now checks for samples before sorting to avoid error if sample passed but not in dataset, now it is ignored
  • bug fix to ufits filter if using -s and --keep_mock options - script would die

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.11

  • minor update to annotate mock community mapping to OTUs in ufits filter

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.10

  • bug fix where ufits filter did not rename fasta headers for OTUs that were mapped from the mock community
  • changed filename of output OTUs from ufits cluster to basename.cluster.otus.fa

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.8

  • added a check for vsearch version, need > v1.9.1

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.7

  • updates to support looking at full length amplicons instead of trimming and padding. Only recommended to use if you know what you are doing

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.6

  • minor fix of ufits cluster using vsearch where if read length --length was longer than actual length of reads then vsearch crashes. Now scripts calculate read length and adjust appropriately
  • added some output to stdout for filter command to let users know where files are located

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.5

  • minor fix to allow vsearch to work with older Ion Torrent Data where sometimes quality scores are greater than 41.

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.4

  • added support for vsearch for pre-processing reads as well as mapping reads to OTUs. If vsearch is installed, scripts will use it automatically, otherwise default to Python and/or usearch
  • fix bug in ufits filter that resulted in error if no index bleed filter or spike in barcode passed on argument line
  • also added a homebrew formula to install ufits. brew tap nextgenusfs/tap followed by brew install ufits will install the package as well as bedtools and vsearch, two optional dependencies. usearch must still be installed manually.

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.3.3

  • some bug fixes and updated docs
  • fixed the local blast search for ufits taxonomy
  • fixed ufits filter to work with any mock community fasta file
  • update --uchime_ref to be able to use custom database
  • fix logging in ufits database command

- Python
Published by nextgenusfs about 10 years ago

amptk - UFITS v0.3.2

A somewhat major bug fix: the de-multiplexing script for Ion Torrent data was not removing the reverse primer correctly. This has been fixed.

  • complete re-write of the ufits filter script to now normalize counts per number of reads in each sample prior to running the --index_bleed filter to remove noise from the dataset.
  • ufits filter script can now deal with a synthetic mock spike-in control (set as default). Calculates index-bleed in both directions and smartly filters the OTU table (email me for details on the synthetic mock if you are interested).
  • Taxonomy database was updated to the most recent UNITE release
  • Taxonomy database installation method is changed, now downloads pre-formatted databases for ITS1, ITS2, and FULL length ITS sets.
  • Updated UCHIME reference sequences
  • New functionality for ufits taxonomy that allows for removal of non-fungal OTUs with the --only_fungi argument
  • Added script for sub-sampling or rarefaction of data prior to clustering
  • Added script for selecting or removing samples from a de-multiplexed data set
  • Added script to append an OTU table to a meta data file
  • Added support for FUNGuild functional annotation of taxonomy in the OTU table.

- Python
Published by nextgenusfs about 10 years ago

amptk - ufits v0.2.8

  • improvement of ufits database command for processing the taxonomy information for training UTAX
  • added support for Illumina data that is in a single file, i.e. with a read setup similar to 454/Ion data, via ufits illumina2. I've seen this type of data from the MrDNA service
  • updated the ufits install command to train UTAX by default for full length, ITS1, and ITS2 for ufits taxonomy
  • minor bug fixes

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits v0.2.6

  • update to support Roche 454 reads (SFF or FASTA/QUAL as input); a FASTA file of the barcodes used is also required
  • minor bug fixes
  • ufits install now skips primer trimming for the UNITE+INSD database. Trimming takes a lot of time without any added benefit, because the global search uses the full length of the OTU, so it doesn't matter whether the db is trimmed or not

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits v0.2.5

  • Several changes to syntax for commands to make easier to use.
  • Added an install option to download, format, and create taxonomy databases from UNITE
  • by default ufits taxonomy now uses a hybrid approach (UTAX and USEARCH) to get the most out of taxonomy from legitimate hits. If the USEARCH hit is < 97% identical, then UTAX is used. If the hit is > 97%, then the taxonomy from UTAX and USEARCH is compared and whichever result has more levels of taxonomy is used.
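
The hybrid decision rule described above can be sketched roughly as below. This is an illustration, not the ufits implementation: the function name, the representation of taxonomy as a list of rank names, and the sample lineages are all hypothetical. The note does not say what happens at exactly 97% identity or when both results have the same number of levels; this sketch assumes the comparison branch at 97% and prefers the USEARCH result on ties.

```python
def hybrid_taxonomy(usearch_identity, usearch_tax, utax_tax, cutoff=97.0):
    """Pick a taxonomy following the hybrid rule from the release note.

    usearch_identity: percent identity of the best USEARCH hit
    usearch_tax, utax_tax: lists of taxonomic levels, most general first
    """
    # Low-identity hit: fall back to the UTAX classification
    if usearch_identity < cutoff:
        return utax_tax
    # High-identity hit: keep whichever result resolves more levels
    # (ties go to USEARCH, which is an assumption of this sketch)
    if len(utax_tax) > len(usearch_tax):
        return utax_tax
    return usearch_tax


# Hypothetical lineages used only for illustration
usearch_hit = ["Fungi", "Ascomycota", "Sordariomycetes"]
utax_hit = ["Fungi", "Ascomycota"]
low_identity_choice = hybrid_taxonomy(92.5, usearch_hit, utax_hit)
high_identity_choice = hybrid_taxonomy(99.1, usearch_hit, utax_hit)
```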

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits v0.2.4

  • minor bug fixes
  • update to include the ufits summarize command, which generates taxonomy level OTU tables and makes a stacked bar graph of each level of taxonomy in the dataset.

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits v0.2.3

  • update to allow for changing the UTAX confidence threshold for taxonomy (0 to 0.99), default is 0.8 or 80%

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits.v0.2.1

  • Major update that now supports taxonomy. You can assign taxonomy using the UTAX Classifier or using a more classical "blast" like search using USEARCH and a compatible database
  • introduce ufits taxonomy, ufits download, and ufits database commands for assigning taxonomy
  • several minor bug fixes

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits.v0.1.2

  • added support for using only the forward (R1) reads from Illumina data
  • updated documentation to support changes and version number.

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits.v0.1.1

  • Some minor bug fixes and updated documentation
  • Added support for drawing heatmap from OTU table, will require matplotlib, numpy, and pandas

```
# install the dependencies with pip
pip install matplotlib numpy pandas
```

- Python
Published by nextgenusfs over 10 years ago

amptk - ufits.v0.1.0

Initial release of UFITS package.

- Python
Published by nextgenusfs over 10 years ago