Recent Releases of amptk
amptk - amptk v1.60
- The ITS database has grown too large to be hosted as a single file on OSF, so it is now split into two. All users need to upgrade to v1.6.0 for the database to be downloaded and used properly.
- Python
Published by nextgenusfs over 2 years ago
amptk - AMPtk v1.5.5
- bug fix for `amptk unoise3` #96 #99
- Python
Published by nextgenusfs over 3 years ago
amptk - AMPtk v1.5.4
- bug fix for `amptk database` that was stripping species epithet names #81
- bug fix for linux systems related to platform, fix was to use `distro` #91
- added action to build docker images and push to docker hub. Fully functional version can be run with shell `amptk-docker` script, usage see https://amptk.readthedocs.io/en/latest/index.html#run-from-docker
- Updated fungal ITS database based off of UNITE v8.3, upgrade with `amptk install`
- Added new PR2 database for universal SSU amplicons, https://github.com/pr2database/pr2database, add with `amptk install`
- Re-wrote `amptk funguild` instead of running the original `Guilds.py` script as this stopped working at some point
- Python
Published by nextgenusfs over 4 years ago
amptk - AMPtk v1.5.3
- typo/bug fixes #84 #85
- fix to multiprocessing so now py>3.7 working #83
- Python
Published by nextgenusfs almost 5 years ago
amptk - amptk v1.5.2
Bug fix release
- fix for merging PE reads with `vsearch`
- fix dereplicate function in `database` #77
- fix for `filter` if `--negatives` passed but no mock community barcode
- fix for `taxonomy` where custom databases not working with the `-d` flag #80
- add `-p illumina3` to `SRA-submit`
- Python
Published by nextgenusfs about 5 years ago
amptk - amptk v1.5.1
- to support OSX Catalina dropping 32-bit applications, I've removed `usearch9` as a strict dependency. `vsearch` will be the default for all processing steps, including taxonomy assignment. Along these lines, I've dropped UTAX classifier as "default" in the hybrid taxonomy method.
- dropping usearch required the addition of a few new dependencies: `mafft`, `fasttree`, and the python package `pyfastx` (for speed and simplicity).
- added support for PacBio CCS reads. Reads can be processed with `amptk pacbio` and then clustered with `amptk pb-dada2`
- several bug fixes
- fix embarrassing typos in v1.5.0 -- do not use v1.5.0, it's broken.
- apparently it does not work in python 3.8 -- investigating that for future release/support.
- working on ONT (Oxford Nanopore) support as well -- stay tuned for this in near future.
- Python
Published by nextgenusfs over 5 years ago
amptk - amptk v1.4.2
- add `--pseudopool` option to `amptk dada2`
- Python
Published by nextgenusfs over 6 years ago
amptk - amptk v1.4.1
- bug fix for `amptk summarize` if taxonomy not present #57
- add FASTA extraction step to install #59
- add chimera detection options to dada2 #60
- fix DADA2 for ion torrent data, in newer versions of dada2 the quality score issue has been fixed, so can now use quality scores during sample inference.
- Python
Published by nextgenusfs almost 7 years ago
amptk - amptk v1.4.0
- fix for `amptk summarize` --> rewrote script
- large update to taxonomy assignment, switched to vsearch for all global alignment steps
- update to all taxonomy databases, the downloads are larger but the data should be more robust
- update to the COI bold2utax method in docs
- update to `amptk database` and the command line options
- bug fixes along the way
- Python
Published by nextgenusfs over 7 years ago
amptk - AMPtk v1.3.0
This is a major reorganization of the code so it is "properly packaged" and can be installed with pip and hopefully conda. After v1.2.4 there were apparently changes to conda that would not allow the scripts to build with the previous code organization. I've then also fixed a few of the bugs that have shown up more recently, including:
* The py2 error in the bold2amptk.py accessory script https://github.com/nextgenusfs/amptk/issues/40
* the edlib version bug reported several times after edlib updated how it stores the version info https://github.com/nextgenusfs/amptk/issues/46
- Python
Published by nextgenusfs over 7 years ago
amptk - amptk v1.2.5
- bug fix where `edlib` version was not properly parsed due to changes upstream, fix is backwards compatible and hopefully future version compatible as well. This was the error in #40 and several other "offline" emails.
- update to `amptk stats` to generate interactive `html` output for NMDS output, allowing users to identify which samples correspond to which point on the graph. This requires `r-plotly`, `r-htmltools`, and `r-dt` -- however these should all be installed through bioconda
- bug fixes for `amptk SRA-submit`
- improve error/logging in `amptk taxonomy`
- fix R scripts to be compatible with R>3.4.1 -- somehow different way of parsing the command line arguments. Also removed the "auto-install" function in the R scripts -- this was meant to be a convenience but didn't always work.
- Python
Published by nextgenusfs over 7 years ago
amptk - amptk v1.2.4
- trying to fix tab error that bioconda didn't like, version bump accordingly.
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.2.3
- fix auto-detect of base name output files in clustering scripts
- bump version
- update citation as now published in PeerJ https://peerj.com/articles/4925/
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.2.2
- fix tab indentation bug for py3
- add minimum length and trimming length to log file and terminal output.
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.2.1
- numerous bug fixes: including a fix for `--require_primer off` in `amptk illumina`. Several bug fixes in `amptk illumina2` and `amptk illumina3` pre-processing steps.
- thread (processes) control added for clustering steps
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.2.0
- for all Illumina pre-processing methods the order of steps has been changed to now 1) search for primers/trim and then 2) merge PE reads. This ensures that all primer/adapter sequences are trimmed from the dataset, whereas previous AMPtk releases merged PE reads first - because of how usearch/vsearch merge PE reads, occasional primer/adapter sequences were slipping through the pre-processing steps. Also reads with multiple primer hits (i.e. two forward primers) are now discarded. While this makes the pre-processing steps slightly slower, it increases the data quality downstream.
- read orientation is tested/fixed on the fly for `amptk illumina2` workflow (barcodes/primers in reads). Some datasets have a 50/50 read orientation.
- added a check for "inverted OTUs" for all denoising/clustering steps. This was largely a check to validate that the changed illumina pre-processing steps were working correctly (as an unintended result was a small number of OTUs that were on the "crick" strand)
- Default mapping file now has a 'RevBarcodeSequence' column. Mapping files used for `amptk illumina2` enforce the paired barcode sequences in the mapping file. If barcode fasta files are given, then all combinations of barcodes (5' and 3') are saved.
- a few py27/36 bug fixes
- Python
Published by nextgenusfs about 8 years ago
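The v1.2.0 note that "all combinations of barcodes (5' and 3') are saved" when barcode fasta files are given can be illustrated with a small sketch. This is not AMPtk's code: the function name, the dictionary inputs, and the `forward-reverse` naming scheme are assumptions made for illustration only.

```python
from itertools import product


def barcode_pairs(forward, reverse):
    """Return every 5'/3' barcode combination.

    forward, reverse: dicts mapping a barcode name to its sequence.
    The "Fname-Rname" key format is hypothetical, not AMPtk's naming.
    """
    pairs = {}
    for (f_name, f_seq), (r_name, r_seq) in product(forward.items(), reverse.items()):
        pairs[f"{f_name}-{r_name}"] = (f_seq, r_seq)
    return pairs


fwd = {"F1": "ACGTAC", "F2": "TGCATG"}
rev = {"R1": "GGATCC", "R2": "CCTAGG"}
combos = barcode_pairs(fwd, rev)
print(len(combos))
```

With a mapping file, by contrast, the notes say only the paired barcodes listed in the file are enforced, so no cross product would be needed.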
amptk - amptk v1.1.3
- update to `amptk taxonomy` to alert user if sample names are duplicated
- many fixes to unify the "base name" output for many scripts
- remove colored date/time stamp for all platforms except Mac
- many fixes for py2/py3 compatibility
- `amptk` now moved to `bin` directory - will slightly change install
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.1.2
- update compatibility for py2/3
- Python
Published by nextgenusfs about 8 years ago
amptk - amptk v1.1.1
- bug fixes for `amptk lulu`, bug fixes for `amptk taxonomy` to better deal with 16S 8 level taxonomy
- added some minor functions to `amptk filter`
- improvement of `amptk illumina3` as well as allowing `--barcode_rev_comp` to reverse complement barcode sequences, i.e. if indexing was done on reverse primer. A mapping file is no longer required; you can also pass primers and a barcode fasta file for demuxing.
- clean up repo slightly, trying to get conda recipe working.
- Python
Published by nextgenusfs about 8 years ago
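As a rough illustration of what reverse complementing barcode sequences (the `--barcode_rev_comp` option in v1.1.1) involves, here is a minimal sketch. The helper names and the dictionary input are assumptions, and AMPtk's own implementation may differ.

```python
# Translation table for the four standard bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def revcomp_barcodes(barcodes):
    """Reverse complement every barcode in a {name: sequence} dict."""
    return {name: reverse_complement(seq) for name, seq in barcodes.items()}


flipped = revcomp_barcodes({"BC1": "AACG", "BC2": "GATTACA"})
print(flipped)
```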
amptk - amptk v1.1.0
- bug fix for `amptk filter` and the subtract feature https://github.com/nextgenusfs/amptk/issues/31
- fix menu option https://github.com/nextgenusfs/amptk/issues/33
- added LULU module for OTU curation, see more here: https://www.nature.com/articles/s41467-017-01312-x
- LULU usage: `amptk cluster` --> `amptk filter` --> `amptk lulu` --> `amptk taxonomy` --> `amptk stats`
- added indicator species analysis to `amptk stats`
- several bug fixes for `amptk stats`
- added option to drop specific OTUs prior to running `amptk stats`
- Python
Published by nextgenusfs over 8 years ago
amptk - amptk v1.0.3
- update the ITS database to use newest version of UNITE database version 01.12.2017.
- Python
Published by nextgenusfs over 8 years ago
amptk - amptk v1.0.2
- bug fix for `amptk show` if non gzipped file passed it was removed
- added gzip support for `amptk sample`
- bug fix for `amptk filter` if OTUs and OTU table did not overlap 100% then script would die, added sanity check
- changed `--col_order` in `amptk filter` to be a space separated list (was previously comma separated with no space).
- Python
Published by nextgenusfs over 8 years ago
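The v1.0.2 change of `--col_order` from a comma separated string to a space separated list corresponds to a common `argparse` pattern. The sketch below only shows that pattern; the parser, its `prog` name, and the help text are hypothetical and are not taken from AMPtk's source.

```python
import argparse


def build_parser():
    """Build a toy parser with a space separated --col_order option."""
    parser = argparse.ArgumentParser(prog="filter-sketch")
    # nargs="+" collects one or more space separated values into a list.
    parser.add_argument("--col_order", nargs="+", default=[],
                        help="sample names in the desired column order")
    return parser


args = build_parser().parse_args(["--col_order", "Sample1", "Sample2", "Mock"])
print(args.col_order)
```

On the command line this would be written as `--col_order Sample1 Sample2 Mock` rather than `--col_order Sample1,Sample2,Mock`.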
amptk - amptk v1.0.1
- minor bug fixes for `amptk database`
- update to `amptk filter` to only label potential chimera OTUs if `--calculate all` is passed, i.e. you are using a synthetic mock that won't be in the rest of your samples
- Python
Published by nextgenusfs over 8 years ago
amptk - amptk v1.0.0
- now requires edlib v1.2.1 -> thanks to Edlib developer (Martin Šošić) for finding the bug that was preventing full usage of edlib in `amptk`. primer and barcode searches now very fast.
- added a new phyloseq module, `amptk stats` which will run some preliminary community ecology stats on your BIOM output file. requires R and phyloseq
- added support for UNOISE3 via the `amptk unoise3` command, note you will need to have USEARCHv10 for this script to work.
- updated global alignment taxonomy search to better deal with multiple hits in the reference database
- updated ITS reference database as well as COI database.
- finally wrote some more comprehensive documents, located at http://amptk.readthedocs.io
- several minor bug fixes
- Python
Published by nextgenusfs over 8 years ago
amptk - amptk v0.10.3
- bug fixes for `amptk database`. Rewrote the dereplication function and added `--lca` or last common ancestor function for building the UTAX databases.
- bug fixes for `amptk SRA-submit` and update to edlib for searching.
- bug fix for `amptk filter` where samples passed to `--drop` are not used for index-bleed calculation
- bug fix for pre-processing and `--mult_samples` argument
- Python
Published by nextgenusfs over 8 years ago
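The last common ancestor idea behind the `--lca` option in v0.10.3 can be sketched as follows. This assumes, for illustration, that identical sequences carrying different taxonomy strings are collapsed to the ranks they share; the comma separated `rank:name` format and the function name are hypothetical and may not match what `amptk database` does internally.

```python
def last_common_ancestor(taxonomies):
    """Return the shared leading ranks of several taxonomy strings.

    Each taxonomy is assumed to be a comma separated list of ranks,
    ordered from the highest rank to the lowest.
    """
    split = [tax.split(",") for tax in taxonomies]
    shared = []
    # Walk the ranks in parallel and stop at the first disagreement.
    for ranks in zip(*split):
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return ",".join(shared)


lca = last_common_ancestor([
    "k:Fungi,p:Ascomycota,g:Aspergillus",
    "k:Fungi,p:Ascomycota,g:Penicillium",
])
print(lca)
```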
amptk - amptk v0.10.2
- bug fix for `-t, --threshold` option of `amptk select` and `amptk remove`. Bug was introduced during last update when support for compressed files was introduced
- AMPtk now supports degenerate nucleotide primer matching, thanks to the very fast edlib v1.2 library. You will need to have at least edlib v1.2.0, can upgrade with pip, i.e. `pip install -U edlib`. The scripts will check your edlib version during runtime and let you know if you need to upgrade.
- Python
Published by nextgenusfs almost 9 years ago
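To make the difference between degenerate-aware matching (v0.10.2) and counting every degenerate base as a mismatch (v0.10.0) concrete, here is a plain Python sketch. It does not use edlib and is not AMPtk's implementation: it compares position by position without indels, and the function name and `degenerate` switch are illustrative assumptions.

```python
# IUPAC nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(primer, read, degenerate=True):
    """Count mismatches between a primer and the start of a read.

    With degenerate=False a degenerate primer base only matches itself,
    so it is counted as a mismatch against any A/C/G/T in the read.
    """
    mismatches = 0
    for p, r in zip(primer, read):
        allowed = IUPAC.get(p, p) if degenerate else p
        if r not in allowed:
            mismatches += 1
    return mismatches


print(count_mismatches("GTGARTC", "GTGAATC"))
print(count_mismatches("GTGARTC", "GTGAATC", degenerate=False))
```

Under the older behavior a user would have had to raise `--primer_mismatch` by the number of degenerate positions in the primer to get the same hits.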
amptk - amptk v0.10.1
- bug fix for `amptk filter` where some mock sequences were not being annotated correctly
- upgrade `amptk SRA-submit` to use edlib alignment
- fix menu in several places
- Python
Published by nextgenusfs almost 9 years ago
amptk - amptk v0.10.0
- Major update is to use edlib library for alignment, this is a dramatic increase in speed, however downside is that degenerate nucleotides are not supported in edlib currently (hoping to get this fixed soon). You can increase `--primer_mismatch` to allow for degenerate matches, keep in mind that currently any degenerate nucleotide will be counted as a mismatch. v0.9.3 still supports degenerate nucleotides, although the alignment is much less accurate and is 10X slower.
- edlib alignment now supports barcode_mismatches as well without a loss in speed.
- update to MergePE function, which allows user to select either vsearch or usearch for merging paired end fastq files, controlled via `--merge_method`. Update to phiX filtering to split files if >3GB to avoid memory problem in USEARCH 32 bit.
- add `amptk illumina3` method for pre-processing, this will demultiplex Illumina PE files along with index read files
- support for gzipped input files, as well as now default will output fq.gz demuxed files. Save space during processing.
- updated docker container and install instructions
- Python
Published by nextgenusfs almost 9 years ago
amptk - amptk v0.9.3
- remove bedtools as dependency for converting BAM -> FASTQ. Now AMPtk will first try to use samtools if it exists, then bedtools if it exists, and will default to pybam native python parser to convert. Pybam is 10X slower than samtools, but is written in python thus no extra dependencies needed.
- added threshold filtering to `amptk remove` and `amptk select`, so you could remove all samples with reads less than 5000 by running, `amptk remove -i input.demux.fq -t 5000 -o output.demux.fq`
- Python
Published by nextgenusfs about 9 years ago
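The BAM to FASTQ fallback order described for v0.9.3 (samtools first, then bedtools, then the pure Python pybam parser) can be sketched with `shutil.which`. The function name and the injectable `which` argument are assumptions made so the logic can be exercised without the tools installed; this is not AMPtk's code.

```python
import shutil


def pick_bam_converter(which=shutil.which):
    """Return the first available converter in the documented order."""
    for tool in ("samtools", "bedtools"):
        # shutil.which returns the executable path, or None if not on PATH.
        if which(tool):
            return tool
    # No external tool found: fall back to the slower pure Python parser.
    return "pybam"


print(pick_bam_converter())
```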
amptk - amptk v0.9.2
- bug fix for `amptk dada2` denoising where reads were getting ignored if they contained any ambiguous nucleotides. The filter for ambiguous nucleotides is still maintained prior to DADA2, note that terminal N's from padding will be properly removed, only internal ambiguous nucleotides are not allowed in DADA2
- Python
Published by nextgenusfs about 9 years ago
amptk - amptk v0.9.1
- Add phix filtering for Illumina data. As part of the PE merging function in `amptk illumina` and `amptk illumina2`, scripts will also now run phix removal.
- Workaround for DADA2 error where samples that only have 1 read post filtering trigger a `derep$quals matrix` error. `amptk dada2` now has `-m, --min_reads` option to drop samples that have fewer than `-m` reads. Default this is set to 10, however, in practice probably this should be much higher, but this should avoid the above error.
- Python
Published by nextgenusfs about 9 years ago
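The `-m, --min_reads` behavior from v0.9.1 amounts to dropping samples whose read count falls below a cutoff. A minimal sketch, assuming per-sample read counts are already available as a dictionary (the function name and input shape are hypothetical):

```python
def drop_small_samples(read_counts, min_reads=10):
    """Split samples into those kept and those dropped.

    Samples with fewer than min_reads reads are dropped; 10 mirrors the
    default mentioned in the release notes.
    """
    kept = {s: n for s, n in read_counts.items() if n >= min_reads}
    dropped = sorted(set(read_counts) - set(kept))
    return kept, dropped


kept, dropped = drop_small_samples({"soil1": 5234, "blank": 1, "soil2": 10})
print(kept, dropped)
```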
amptk - amptk v0.9.0
- added better support for `amptk SRA-submit`
- added ability to normalize heat map
- added `amptk SRA` which can be used to process reads downloaded from the SRA, where they are in a single FASTQ file, i.e. from ION or 454 data that has been demultiplexed into samples and then submitted.
- created Dockerfile for using `amptk` with the `scipy-notebook` jupyter notebook server.
- Python
Published by nextgenusfs about 9 years ago
amptk - amptk v0.8.8
- unify the output naming files from UNOISE2 and DADA2 "clustering" output.
- Python
Published by nextgenusfs over 9 years ago
amptk - amptk v0.8.7
- support for new DADA2 algorithm allowing variable length reads, must have > v1.3.3.
- Python
Published by nextgenusfs over 9 years ago
amptk - amptk v0.8.6
- add `amptk drop` to remove OTUs from a dataset and then create an updated OTU table
- fix for `amptk illumina` where empty files would cause script to terminate
- fix for biom output to explicitly be json
- fix in `amptk remove` to allow fasta output
- Python
Published by nextgenusfs over 9 years ago
amptk - amptk v0.8.5
- bug fixes for pre-processing steps where short primer-dimers could make it through filtering, get padded with N's and get incorporated as OTUs in clustering
- update to `amptk filter` to output the final OTU table to have real read counts as opposed to "pseudo" counts from normalization. Filtering is done with normalization, but now read counts are restored to original read numbers. Important for downstream stats like beta diversity
- improved read summary reporting in pre-processing steps
- update to `amptk unoise2` to output both inferred or denoised sequences/tables as well as biological OTU sequences (clustered at 97%).
- Python
Published by nextgenusfs over 9 years ago
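The v0.8.5 idea of filtering on normalized counts while reporting the original read counts can be sketched as below. This is only an illustration: the nested dictionary layout, the function name, and the 0.005 threshold are assumptions, and the real `amptk filter` calculation of index-bleed is more involved.

```python
def filter_on_normalized(table, threshold=0.005):
    """Zero out low-abundance cells, keeping raw counts elsewhere.

    table: {otu: {sample: raw_count}}
    A cell survives if its count, as a fraction of that sample's total
    reads, is at least `threshold` (a hypothetical value).
    """
    # Total reads per sample, used only to normalize.
    totals = {}
    for counts in table.values():
        for sample, n in counts.items():
            totals[sample] = totals.get(sample, 0) + n

    filtered = {}
    for otu, counts in table.items():
        kept = {}
        for sample, n in counts.items():
            fraction = n / totals[sample] if totals[sample] else 0.0
            # The decision uses the fraction; the stored value is the raw count.
            kept[sample] = n if fraction >= threshold else 0
        filtered[otu] = kept
    return filtered


raw = {"OTU1": {"A": 990, "B": 2}, "OTU2": {"A": 10, "B": 998}}
result = filter_on_normalized(raw)
print(result)
```

Keeping raw counts in the output matters because downstream statistics such as beta diversity expect real read numbers rather than normalized pseudo counts.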
amptk - amptk v0.8.0
- package has undergone a name change to reflect changes in the scripts. Originally the project started as essentially a wrapper for UPARSE and thus relied heavily on USEARCH. Coupled with originally supporting fungal ITS sequences, it was named UFITS (usearch fungal ITS). However, the current implementation of AMPtk relies very little on USEARCH and can support any amplicon based NGS dataset. Out of the box the following DB are packaged: fungal ITS, fungal LSU, 16S, insect/animal COI. Thus I feel that `amptk` is a better name that describes what the scripts do.
- option `-p, --pad` was added for `amptk ion`, `amptk illumina`, `amptk illumina2`, and `amptk 454` to allow user to turn off the padding with Ns to the `--trim_len`
- option `-c,--calculate` was added to `amptk filter` to control how the script calculates index-bleed. By default it calculates index-bleed into the mock community sample (-b) as well as out of the mock community into the rest of the samples. However, if members of the mock community are found in your samples, this calculated number is wrong, so if any members of your mock community are plausibly found in samples that you are sequencing, then you should use the `--calculate in` option.
- packaged databases had to be moved to a different sharing location (USDA now prevents use of dropbox), so they are now on Box, however it seems like the download speed is quite a bit slower. If anybody has recommendations for a free place to host these databases let me know; I need about 1 GB of space and need to be able to access it with a direct link from the command line.
- Python
Published by nextgenusfs over 9 years ago
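The trim and pad behavior controlled by `--trim_len` and the `-p, --pad` switch in v0.8.0 can be pictured with a short sketch. The function name and the default length of 250 are assumptions for illustration, not values taken from AMPtk.

```python
def trim_or_pad(seq, trim_len=250, pad=True):
    """Trim long reads to trim_len; optionally pad short reads with Ns."""
    if len(seq) >= trim_len:
        return seq[:trim_len]
    if pad:
        # Right-pad with N so every read has the same length.
        return seq + "N" * (trim_len - len(seq))
    # Padding turned off: short reads keep their original length.
    return seq


print(trim_or_pad("ACGT", trim_len=6))
print(trim_or_pad("ACGTACGT", trim_len=6))
print(trim_or_pad("ACGT", trim_len=6, pad=False))
```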
amptk - ufits v0.7.4
- move the mergereads function to general library
- better reporting for merge illumina reads for both `ufits illumina` and `ufits illumina2`
- fix for `ufits illumina` to only require primer if amplicons are longer than the read length. This is to prevent amplicons that are shorter than the read length from being discarded, as they are automatically trimmed/merged via the `usearch -fastq_mergepairs` tool (and I can't change this). So the default behavior now is to require a forward primer via the `--require_primer on` setting only if the amplicon length is longer than the read length. Read length is calculated automatically via sampling the first 50 reads, the automatic detection is overruled by the `--read_length` option
- Python
Published by nextgenusfs over 9 years ago
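The automatic read length detection from v0.7.4 ("sampling the first 50 reads") and the resulting primer requirement can be sketched as follows. How the sampled lengths are summarized is not stated in the notes, so taking the maximum here is an assumption, as are the function names.

```python
from itertools import islice


def estimate_read_length(fastq_lines, n_reads=50):
    """Estimate read length from the first n_reads FASTQ records.

    Assumes standard 4-line FASTQ records; returns the longest sequence
    seen in the sample (an assumption, see the lead-in).
    """
    lines = list(islice(fastq_lines, n_reads * 4))
    sequences = lines[1::4]  # the sequence is the 2nd line of each record
    return max((len(seq.strip()) for seq in sequences), default=0)


def require_primer(amplicon_length, read_length):
    """Require the forward primer only when amplicons exceed the read length."""
    return amplicon_length > read_length


records = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGTAC", "+", "IIIIII"]
print(estimate_read_length(iter(records)))
```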
amptk - ufits v0.7.3
- fix critical bug in `ufits illumina` processing of reads where if reverse primer was not found read would be discarded
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.7.2
- update to `ufits taxonomy` to allow for taxonomy to be calculated elsewhere, pass the `-t, --taxonomy` option and a 2 column tsv file, OTUTaxonomy
- update to progress/multiprocessing steps
- re-write demultiplexing steps for faster processing
- support gzip files in `ufits illumina2`
- options in `ufits filter` for how the threshold is calculated for index-bleed filtering
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.7.1
- bug fix in `ufits illumina` where R2 reads were not getting trimmed correctly.
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.7.0
I bumped versions here to illustrate that UFITS has changed a little under the hood, now requires at least USEARCH v9.1.13 and requires at least VSEARCH v2.20. These changes were made to maximize speed and simplify the code. The scripts will terminate if they detect lower versions of both of these software tools. BIOM, RDP, Blast are still "soft dependencies".
- bug fix in ufits taxonomy where RDP taxonomy was not processed correctly for BIOM conversion
- fix for https://github.com/nextgenusfs/ufits/issues/14
- fix for https://github.com/nextgenusfs/ufits/issues/13 by moving to VSEARCH for this task
- fix for https://github.com/nextgenusfs/ufits/issues/12, now ufits filter requires you to add a mock community fasta file --mc if you specify a -b, --barcode to filter your data on
- fix for ufits filter to deal with OTU tables that have taxonomy already appended
- fix for ufits cluster_ref where script would die after conversion to VSEARCH as hard dependency
- re-write of ufits heatmap to have a few more options and more flexibility.
- update to docs as well as a section showing how to get your data into downstream statistical tools
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.6.1
- bug fix in `ufits taxonomy` if `--tax_filter` was used the filtered OTU table would not be correct in the BIOM output file
- fix for `ufits ion` if using `--mult_samples` now creates mapping_file correctly.
- updates to the docs on new usage
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.6.0
Several major changes in this version:
- taxonomy for ITS is updated to newest release from UNITE v7.1 11-20-2016.
- USEARCH9 is now supported throughout and defaults have been changed to use usearch9
- UNOISE2 algorithm is employed in a 'clustering' module
- SINTAX algorithm is supported in ufits taxonomy. default hybrid method now uses SINTAX, UTAX, and global alignment to infer the best taxonomy assignment.
- QIIME-like mapping files can now be used during demultiplexing/pre-processing. If you do not use a mapping file, the scripts will create one for you. The mapping file can be used to add metadata to it and then passed to ufits taxonomy to create a BIOM output file containing all metadata, OTUs, and taxonomy
- BIOM output of ufits taxonomy is compatible with QIIME, PHINCH, MetaCoMET, PhyloSeq, etc.
- ufits filter now alerts user if passing a barcode name via -b is not found in OTU table
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.6
- updated UFITS with better logging for external programs, so now log file should be more informative if you run into any errors. this will help me diagnose the problem.
- bug fix for `ufits dada2` where script would die if `--uchime_ref` option not passed
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.5
- update to COI database. Previous version had some mistakes in re-formatting the BOLD database. Scripts and workflow on how this database was constructed is available here
- updates to `ufits dada2` pipeline. Script will now also create bOTUs (biological OTUs) as the DADA2 output is sensitive to 1 bp, thus a single "species" may be spread out over several iSeqs. Therefore, to accommodate downstream community ecology statistics, these iSeqs are clustered at a set threshold (`-p, --pct_otu`) to collapse "species" into OTUs.
- updates to `ufits dada2` so that it builds an OTU table in same manner as `ufits cluster`, i.e. original reads are mapped to iSeqs (as opposed to DADA2 only using quality filtered data for OTU table generation).
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.4
- added support for reference chimera filtering in the `ufits dada2` OTU picking method
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.3
- improve the terminal output of `ufits dada2` as well as the Rscript logging
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.2
- update to Rscript running DADA2 to auto install the required R package if missing, this will only be done if package is missing
- minor updates to output to terminal in `ufits dada2`
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.1
- new module to support DADA2 inferred sequences "clustering" method. Reads must be the same length, so may not be ideal for fungal ITS sequences (or other variable length amplicons). The script is run with `ufits dada2` and uses the output from any of the ufits pre-processing commands, i.e. `ufits ion`, `ufits illumina`, `ufits 454`, etc.
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.5.0
- update to `ufits cluster` to correctly output chimeras detected during clustering and reference filtering
- update to automatically detect delimiters in parsing OTU tables
- enhancement to allow for barcode mismatches in the `ufits illumina2` script - note the current implementation is slow and for 99% of uses I don't recommend setting barcode mismatches > 0.
- Python
Published by nextgenusfs over 9 years ago
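Automatic delimiter detection for OTU tables, mentioned in v0.5.0, can be done in several ways. The sketch below uses the standard library `csv.Sniffer` restricted to a few candidate delimiters; this is a swapped-in technique for illustration, and the notes do not say how UFITS itself detects the delimiter. The function name and candidate set are assumptions.

```python
import csv


def detect_delimiter(text, candidates="\t,;"):
    """Guess the column delimiter of a small table given as text."""
    # Sniffer inspects the sample and picks the most consistent candidate.
    dialect = csv.Sniffer().sniff(text, delimiters=candidates)
    return dialect.delimiter


tsv_table = "OTUId\tS1\tS2\nOTU1\t10\t0\nOTU2\t3\t7\n"
csv_table = "OTUId,S1,S2\nOTU1,10,0\nOTU2,3,7\n"
print(repr(detect_delimiter(tsv_table)), repr(detect_delimiter(csv_table)))
```

`csv.Sniffer` raises `csv.Error` when it cannot decide, so a caller would typically catch that and fall back to a default such as a tab.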
amptk - ufits v0.4.9
- remove requirement of otu table for `ufits taxonomy`
- add support for dual barcodes for `ufits illumina2` and `ufits 454`
- default merge PE reads for `ufits illumina2` now rescues the forward reads if the pair cannot be merged
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.8
- fix bug in `ufits illumina` where periods (.) were not processed correctly in sample names
- fix bug in `ufits filter` when passing the `--cleanup` option. Totally dumb mistake....
- Add support for creating BIOM v2.1 OTU tables if you have the `biom` package installed. Also added a unified taxonomy 2-column output for each OTU.
- added `--cleanup` option to `ufits illumina`
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.7
- fix for pre-processing reads from Illumina platform where custom sequencing primers are used, `ufits illumina` command now handles those datasets better. Note I would not recommend using custom primers, but that is of course up to you...
- minor update to docs
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.6
- some bug fixes related to use of USEARCH9, UFITS now supports v9.0.2124 (note: do not use 9.0.2123, it has an error in the -cluster_otus command)
- updates to how UFITS outputs the system info, now cleaner with more info on the OS
- fix bug on failed import for `ufits keep` and `ufits remove`
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.5
- bug fix in ufits taxonomy where it would output a csv file - it now only outputs tsv
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.4
- bug fix related to downloading databases
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.3
- update to Databases: UNITE v7.1, LSU (based on RDP data), 16S (based on RDP gold), COI (based on BOLD database of arthropods and chordates).
- databases are installed now with `ufits install -i ITS LSU 16S COI`
- Robert Edgar now says that chimera reference filtering should use largest database (as opposed to UCHIME paper that says a small curated database is better), thus chimera reference filtering is now configured to do exactly that, options during clustering are: `ufits cluster --uchime_ref [ITS,16S,LSU,COI, custom_path]`
- `ufits cluster_ref` has been updated with the above database information - note this script is still experimental and I would not recommend using it for any environmental data at this point - there may be some targeted usage where it is appropriate
- re-write of `ufits taxonomy` to coincide with updates to databases, now you can pass one of the pre-installed databases to the `-d, --db` flag or you can specify manually a database using `--fasta_db`, `--usearch_db`, and/or `--utax_db`
- Python
Published by nextgenusfs over 9 years ago
amptk - ufits v0.4.2
- bug fix to `ufits keep` and `ufits remove`
- bug fix to `ufits taxonomy` where passing a `--fasta_db` failed
- update to README
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.4.1
- update to menu of `ufits taxonomy` to be more consistent with rest of the scripts, `-i` for input OTU table, `-f` for input fasta
- improve flexibility of `ufits taxonomy` to work with other groups, removed `--only_fungi` and replaced it with `--tax_filter`, i.e. `--tax_filter Fungi`.
- fix menu in wrapper scripts to reflect changes
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.4.0
- improvement to `ufits filter` script to handle some of the filtering more gracefully and added a few more options
- added ability to supply a list of samples/barcodes to `ufits remove` and `ufits select` to make more flexible
- add new clustering module based on reference based clustering called `ufits cluster_ref`, it functions by quality trimming, dereplicating, chimera filtering, mapping to reference database. It can then also rescue unmapped reads and run de novo clustering on them followed by UTAX reference based clustering. I think that standard de novo clustering is superior to this approach but in some fringe cases it may be useful.
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.3.16
- slight modification to `ufits show` that also shows read lengths
- minor bug fix in `ufits taxonomy` that now properly exits if databases not installed
- changed the way `ufits filter` calculates index bleed from a sum of all counts per OTU, to now using the maximum value per sample.
- upgrade to `ufits meta` to also allow for splitting up data by taxonomy classification (if taxonomy from UFITS default method)
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.3.14
- added `ufits show` to count barcodes from demuxed data
- fixed bug in `ufits filter` if try to use `-s auto` option without providing a `--mock_barcode`
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.3.13
- bug fix where clustering log file was getting overwritten
- Python
Published by nextgenusfs almost 10 years ago
amptk - ufits v0.3.12
- bug fix to `ufits filter` where the `--col_order` option now checks for samples before sorting to avoid error if sample passed but not in dataset, now it is ignored
- bug fix to `ufits filter` if using `-s` and `--keep_mock` options - script would die
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.11
- minor update to annotate mock community mapping to OTUs in `ufits filter`
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.10
- bug fix where `ufits filter` did not rename fasta headers for OTUs that were mapped from the mock community
- changed filename of output OTUs from ufits cluster to `basename.cluster.otus.fa`
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.8
- added a check for `vsearch` version, need > v1.9.1
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.7
- updates to support looking at full length amplicons instead of trimming and padding. Only recommended to use if you know what you are doing
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.6
- minor fix of `ufits cluster` using `vsearch` where if read length `--length` was longer than actual length of reads then `vsearch` crashes. Now scripts calculate read length and adjust appropriately
- added some output to stdout for filter command to let users know where files are located
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.5
- minor fix to allow `vsearch` to work with older Ion Torrent Data where sometimes quality scores are greater than 41.
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.4
- added support for `vsearch` for pre-processing reads as well as mapping reads to OTUs. If `vsearch` is installed, scripts will use it automatically, otherwise default to Python and/or usearch
- fix bug in `ufits filter` that resulted in error if no index bleed filter or spike in barcode passed on argument line
- also a home-brew formula to install ufits. `brew tap nextgenusfs/tap` followed by `brew install ufits` will install the package as well as `bedtools` and `vsearch`, two optional dependencies. `usearch` must still be installed manually.
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.3.3
- some bug fixes and updated docs
- fixed the local blast search for `ufits taxonomy`
- fixed `ufits filter` to work with any mock community fasta file
- update `--uchime_ref` to be able to use custom database
- fix logging in `ufits database` command
- Python
Published by nextgenusfs about 10 years ago
amptk - UFITS v0.3.2
A somewhat major bug fix: the de-multiplexing script for Ion Torrent data was not removing reverse primer correctly. This has been fixed.
- complete re-write of the ufits filter script to now normalize counts per number of reads in each sample prior to running the --index_bleed filter to remove noise from the dataset.
- ufits filter script now can deal with synthetic mock spike-in control (set as default). Calculates index-bleed in both directions and smartly filters OTU table. (email me for details on synthetic mock if you are interested).
- Taxonomy database was updated to most recent UNITE release
- Taxonomy database installation method is changed, now downloads pre-formatted databases for ITS1, ITS2, and FULL length ITS sets.
- Updated UCHIME reference sequences
- New functionality for ufits taxonomy that allows for removal of non-fungal OTUs with the --only_fungi argument
- Added script for sub-sampling or rarefaction of data prior to clustering
- Added script for selecting or removing samples from a de-multiplexed data set
- Added script to append an OTU table to meta data file
- Added support for FUNGuilds functional annotation of taxonomy in OTU table.
- Python
Published by nextgenusfs about 10 years ago
amptk - ufits v0.2.8
- improvement of `ufits database` command for processing the taxonomy information for training UTAX
- added support for illumina data that is in a single file, i.e. with similar read setup as 454/ion `ufits illumina2`. I've seen this type of data from MrDNA service
- updated `ufits install` command to by default train UTAX for full length, ITS1, and ITS2 for `ufits taxonomy`
- minor bug improvements
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits v0.2.6
- update to support Roche 454 reads (SFF or FASTA/QUAL as input) and need a fasta file of barcodes used
- minor bug fixes
- `ufits install` now skips primer trimming for UNITE+INSD database as it takes a lot of time without any added benefit because global search uses full length of the OTU and db doesn't matter if it is trimmed or not
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits v0.2.5
- Several changes to syntax for commands to make easier to use.
- Added an `install` option to download, format, and create taxonomy databases from UNITE
- by default `ufits taxonomy` now uses a hybrid approach (UTAX and USEARCH) to get most out of taxonomy from legitimate hits. If USEARCH hit is < 97% identical, then UTAX is used. If hit is > 97% then taxonomy from UTAX vs USEARCH is compared and whichever result has more levels of taxonomy is used.
- Python
Published by nextgenusfs over 10 years ago
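The hybrid taxonomy rule described in the v0.2.5 notes can be written out as a small decision function. This is a sketch of the rule as stated, not the UFITS source: the function name, the comma separated taxonomy format, and the handling of a hit at exactly 97% or of a tie in depth (both go to the USEARCH result here) are assumptions.

```python
def choose_taxonomy(usearch_identity, usearch_tax, utax_tax, cutoff=97.0):
    """Pick between a USEARCH global alignment hit and a UTAX call.

    Taxonomy strings are assumed to be comma separated ranks, for
    example "k:Fungi,p:Ascomycota" (a hypothetical format).
    """
    def depth(tax):
        # Number of non-empty taxonomic levels in the string.
        return len([rank for rank in tax.split(",") if rank])

    # Below the identity cutoff, the alignment hit is not used.
    if usearch_identity < cutoff:
        return utax_tax
    # Otherwise keep whichever result resolves more taxonomic levels.
    if depth(utax_tax) > depth(usearch_tax):
        return utax_tax
    return usearch_tax


print(choose_taxonomy(99.2, "k:Fungi,p:Ascomycota", "k:Fungi"))
```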
amptk - ufits v0.2.4
- minor bug fixes
- update to include `ufits summarize` command that generates taxonomy level OTU tables as well as makes a stacked Bar graph of each level of taxonomy in the dataset.
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits v0.2.3
- update to allow for changing the UTAX confidence threshold for taxonomy (0 to 0.99), default is 0.8 or 80%
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits.v0.2.1
- Major update that now supports taxonomy. You can assign taxonomy using the UTAX Classifier or using a more classical "blast" like search using USEARCH and a compatible database
- introduce `ufits taxonomy`, `ufits download`, and `ufits database` commands for assigning taxonomy
- several minor bug fixes
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits.v0.1.2
- added support for using only the forward (R1) reads from Illumina data
- updated documentation to support changes and version number.
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits.v0.1.1
- Some minor bug fixes and updated documentation
- Added support for drawing heatmap from OTU table, will require matplotlib, numpy, and pandas
```
# install the dependencies with pip
pip install matplotlib numpy pandas
```
- Python
Published by nextgenusfs over 10 years ago
amptk - ufits.v0.1.0
Initial release of UFITS package.
- Python
Published by nextgenusfs over 10 years ago