aaftf

Automatic Assembly For The Fungi

https://github.com/stajichlab/aaftf

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: pubmed.ncbi, ncbi.nlm.nih.gov, plos.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Automatic Assembly For The Fungi

Basic Info
  • Host: GitHub
  • Owner: stajichlab
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 2.46 MB
Statistics
  • Stars: 25
  • Watchers: 3
  • Forks: 5
  • Open Issues: 16
  • Releases: 15
Created over 7 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog License Citation Zenodo

README.md

AAFTF - Automatic Assembly For The Fungi

  • Authors: Jason Stajich and Jon Palmer*


Requirements

Most of these can be installed via conda packages. Note that some tools expect different samtools versions, which can cause problems. In particular, the bioconda install of samtools is v0.2, while the version expected by most other tools is v1.17.

Read aligners supported

  • bwa - https://github.com/lh3/bwa
  • minimap2 - https://github.com/lh3/minimap2
  • bowtie2 - http://bowtie-bio.sourceforge.net/bowtie2/index.shtml (Optional)
  • BBTools - bbmap

QC and trimming

  • BBTools - https://jgi.doe.gov/data-and-tools/bbtools/ - supports read-level filtering for contamination and vector/primer
  • Trimmomatic - http://www.usadellab.org/cms/?page=trimmomatic (Optional)
  • fastp - alternative (preferred) read trimming and quality control https://github.com/OpenGene/fastp

Assemblers

  • SPAdes - http://cab.spbu.ru/software/spades/
  • megahit - https://github.com/voutcn/megahit
  • dipspades - (SPAdes 3.11.1 - note it is not part of later SPAdes packages) http://cab.spbu.ru/files/release3.11.1/dipspades_manual.html
  • NOVOplasty - https://github.com/ndierckx/NOVOPlasty for MT genome assembly
  • unicycler - https://github.com/rrwick/Unicycler (which runs spades)

Assembly Contamination screening support

  • sourmash (>=v3.5)- https://sourmash.readthedocs.io/ (install via conda/pip)
  • NCBI BLAST+ - ftp://ftp.ncbi.nlm.nih.gov/blast/executables/LATEST
  • ncbi-fcs (for vector screening) - https://github.com/ncbi/fcs/
  • ncbi-fcs-gx (for contaminant filtering, alternative to sourmash, requires large memory or SSD drive) https://github.com/ncbi/fcs/

Assembly polishing

  • polca (from MaSuRCA) - https://github.com/alekseyzimin/masurca (polca.sh polishing)
    • note that polca's use of samtools targets an old version and will not work with the version of samtools installed by default. To fix this, apply the patch in patches/polca.patch to your local version, or copy patches/polca.sh over the version installed in your environment or system.
  • Pilon - https://github.com/broadinstitute/pilon/wiki
  • NextPolish - https://github.com/Nextomics/NextPolish
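
Because the requirements above span several categories, a quick PATH check before a run can save time. The following is a minimal sketch, not part of AAFTF itself; the executable names (`bbduk.sh`, `spades.py`, `blastn`, and so on) are assumptions about how these packages are typically installed and may differ on your system.

```shell
# Print each executable from the argument list that is NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Hypothetical selection: one or two executables per category listed above.
REQUIRED="bwa minimap2 bbduk.sh fastp spades.py megahit sourmash blastn pilon"

# Word splitting on $REQUIRED is intentional here.
MISSING=$(missing_tools $REQUIRED)
if [ -n "$MISSING" ]; then
    echo "Missing tools:" $MISSING
else
    echo "All listed tools were found on PATH"
fi
```

Optional tools such as bowtie2 or Trimmomatic can be added to `REQUIRED` if your workflow uses them.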

Authors

Citation

Palmer JM and Stajich JE. (2023). Automatic assembly for the fungi (AAFTF): genome assembly pipeline (v0.5.0). Zenodo. doi: 10.5281/zenodo.1620526

Install

We are working on simplifying the install, i.e. getting on PyPI and bioconda. Currently you can create a conda environment and install like this:

    conda create -n aaftf -c bioconda "python>=3.6" bbmap trimmomatic bowtie2 bwa pilon sourmash \
        blast minimap2 spades megahit novoplasty biopython fastp masurca unicycler

A challenge has been the older version of samtools tied to some of the dependencies, while AAFTF prefers samtools >= 1.0. If you can install a newer samtools (>1.22.1, for example) after installing these dependencies, or in a separate env, that helps ensure the markduplicates step can still be run. However, this is relatively minor.

There is a slight performance improvement if you can run a later samtools, as it does not require writing temporary unsorted BAM files to disk.
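
To see which samtools the active environment provides, a version check along these lines may help. This is a sketch under two assumptions: that `sort -V` (version sort, available in GNU coreutils) exists on your system, and that your samtools answers `samtools --version` (very old 0.1.x releases do not, in which case the script reports the version as unknown). The 1.0 threshold comes from the note above.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

if command -v samtools >/dev/null 2>&1; then
    # The first line of `samtools --version` has the form "samtools <version>".
    found=$(samtools --version 2>/dev/null | head -n 1 | awk '{print $2}')
    if [ -n "$found" ] && version_ge "$found" "1.0"; then
        echo "samtools $found looks recent enough"
    else
        echo "samtools ${found:-unknown} may be too old"
    fi
else
    echo "samtools not found on PATH"
fi
```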

And then install this repo with git/pip:

```
$ conda activate aaftf
$ pip install AAFTF
# or install latest from github
$ python -m pip install git+https://github.com/stajichlab/AAFTF.git
```

To install the sourmash database, you first need to set a place to store your AAFTF databases:
```
$ mkdir -p ~/lib/AAFTF_DB # or make a place that is systemwide
$ export AAFTF_DB=~/lib/AAFTF_DB
# fill in download procedure / add to AAFTF
```
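
Before running the contamination steps, it can be worth confirming that the database location is usable. The sketch below assumes the environment variable is named `AAFTF_DB`; the helper `check_db_dir` is hypothetical and only inspects the directory, it does not download anything.

```shell
# check_db_dir DIR: succeed only if DIR is set, exists, and is writable.
check_db_dir() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "AAFTF_DB is not set"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "AAFTF_DB directory does not exist: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "AAFTF_DB directory is not writable: $dir"
        return 1
    fi
    echo "AAFTF_DB is ready: $dir"
}

check_db_dir "${AAFTF_DB:-}" || echo "Create the directory and export AAFTF_DB first"
```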

To run ncbi-fcs or ncbi-fcs-gx in AAFTF through singularity, you will need to have singularity installed in your system or environment. The fcs-gx database will need to be downloaded and requires a large-memory machine. More instructions are coming to simplify install and testing.

Notes

This is partially a Python rewrite of JAAWS, a Unix shell based cleanup and assembly tool written by Jon.

Steps / Procedures

  1. trim Trim FASTQ input reads - with BBMap
  2. mito De novo assemble mitochondrial genome
  3. filter Filter contaminating reads - with BBMap
  4. assemble Assemble reads - with SPAdes
  5. vecscreen Vector and Contaminant Screening of assembled contigs - with BlastN based method to replicate NCBI screening
  6a. sourpurge Purge contigs based on sourmash results - with sourmash
  6b. fcsgxpurge Purge contigs based on NCBI fcs-gx tool. Note this runs MUCH faster with large memory.
  7. rmdup Remove duplicate contigs - using minimap2 to find duplicates
  8. pilon Polish contig sequences - with Pilon
  9. sort Sort contigs by length and rename FASTA headers
  10. assess Assess completeness of genome assembly
  11. pipeline Run AAFTF pipeline all in one go.
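
The step order above can be written down once and reused, for example to look up the options of each subcommand. The sketch below only prints the help commands rather than running them. It assumes every subcommand accepts `--help`, which the usage output in this README shows for `trim` and `assemble` but not for the others; `fcsgxpurge` can stand in for `sourpurge` if you use the NCBI fcs-gx route.

```shell
# Subcommand names from the list above, in the order they are described.
STEPS="trim mito filter assemble vecscreen sourpurge rmdup pilon sort assess"

# Print (do not run) the help command for each step.
print_help_commands() {
    for step in $STEPS; do
        echo "AAFTF $step --help"
    done
}

print_help_commands
```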

Typical runs

Trimming and Filtering

Trimming options spelled out:
```
usage: AAFTF trim [-h] [-q] [-o BASENAME] [-c cpus] [-ml MINLEN] -l LEFT
                  [-r RIGHT] [-v] [--pipe] [--method {bbduk,trimmomatic}]
                  [-m MEMORY] [--trimmomatic trimmomatic_jar]
                  [--trimmomaticadaptors TRIMMOMATICADAPTORS]
                  [--trimmomaticclip TRIMMOMATICCLIP]
                  [--trimmomaticleadingwindow TRIMMOMATICLEADINGWINDOW]
                  [--trimmomatictrailingwindow TRIMMOMATICTRAILINGWINDOW]
                  [--trimmomaticslidingwindow TRIMMOMATICSLIDINGWINDOW]
                  [--trimmomaticquality TRIMMOMATICQUALITY]

This command trims reads in FASTQ format to remove low quality reads and trim adaptor sequences

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Do not output warnings to stderr
  -o BASENAME, --out BASENAME
                        Output basename, default to base name of --left reads
  -c cpus, --cpus cpus  Number of CPUs/threads to use.
  -ml MINLEN, --minlen MINLEN
                        Minimum read length after trimming, default: 75
  -l LEFT, --left LEFT  left/forward reads of paired-end FASTQ or single-end FASTQ.
  -r RIGHT, --right RIGHT
                        right/reverse reads of paired-end FASTQ.
  -v, --debug           Provide debugging messages
  --pipe                AAFTF is running in pipeline mode
  --method {bbduk,trimmomatic}
                        Program to use for adapter trimming
  -m MEMORY, --memory MEMORY
                        Max Memory (in GB)
  --trimmomatic trimmomatic_jar, --jar trimmomatic_jar
                        Trimmomatic JAR path

Trimmomatic options:
  Trimmomatic trimming options

  --trimmomaticadaptors TRIMMOMATICADAPTORS
                        Trimmomatic adaptor file, default: TruSeq3-PE.fa
  --trimmomaticclip TRIMMOMATICCLIP
                        Trimmomatic clipping, default: ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
  --trimmomaticleadingwindow TRIMMOMATICLEADINGWINDOW
                        Trimmomatic window processing arguments, default: LEADING:3
  --trimmomatictrailingwindow TRIMMOMATICTRAILINGWINDOW
                        Trimmomatic window processing arguments, default: TRAILING:3
  --trimmomaticslidingwindow TRIMMOMATICSLIDINGWINDOW
                        Trimmomatic window processing arguments, default: SLIDINGWINDOW:4:15
  --trimmomaticquality TRIMMOMATICQUALITY
                        Trimmomatic quality encoding -phred33 or phred64
```

Example usage:
```
MEM=128 # 128gb
BASE=STRAINX
READSDIR=reads
TRIMREAD=reads_trimmed
CPU=8
AAFTF trim --method bbduk --memory $MEM -c $CPU \
  --left $READSDIR/${BASE}_R1.fq.gz --right $READSDIR/${BASE}_R2.fq.gz \
  -o $TRIMREAD/${BASE}

# this step may take a lot of memory depending on how many filtering libraries you use
AAFTF filter -c $CPU --memory $MEM --aligner bbduk \
  -o $TRIMREAD/${BASE} --left $TRIMREAD/${BASE}_1P.fastq.gz --right $TRIMREAD/${BASE}_2P.fastq.gz
```
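
Since trimming and filtering can run for a long time, it may be useful to confirm the input reads exist before launching them. This is a minimal sketch; `require_reads` is a hypothetical helper, and the `_R1`/`_R2` file naming is an assumption that should be adapted to your data.

```shell
# require_reads FILE...: fail if any listed file is missing or empty.
require_reads() {
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty read file: $f" >&2
            return 1
        fi
    done
}

BASE=STRAINX
READSDIR=reads
LEFT=$READSDIR/${BASE}_R1.fq.gz
RIGHT=$READSDIR/${BASE}_R2.fq.gz

if require_reads "$LEFT" "$RIGHT"; then
    echo "Inputs look fine; ready to run AAFTF trim"
else
    echo "Fix the inputs before running AAFTF trim"
fi
```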

Assembly

The assembler is selected with the --method option. The full set of options is below.

```
usage: AAFTF assemble [-h] [-q] [--method METHOD] -o OUT [-w WORKDIR]
                      [-c cpus] [-m MEMORY] [-l LEFT] [-r RIGHT] [-v]
                      [--tmpdir TMPDIR] [--assemblerargs ASSEMBLERARGS]
                      [--haplocontigs] [--pipe]

Run assembler on cleaned reads

optional arguments:
  -h, --help            show this help message and exit
  -q, --quiet           Do not output warnings to stderr
  --method METHOD       Assembly method: spades, dipspades, megahit
  -o OUT, --out OUT     Output assembly FASTA
  -w WORKDIR, --workdir WORKDIR
                        assembly output directory
  -c cpus, --cpus cpus  Number of CPUs/threads to use.
  -m MEMORY, --memory MEMORY
                        Memory (in GB) setting for SPAdes. Default is 32
  -l LEFT, --left LEFT  Left (Forward) reads
  -r RIGHT, --right RIGHT
                        Right (Reverse) reads
  -v, --debug           Print Spades stdout to terminal
  --tmpdir TMPDIR       Assembler temporary dir
  --assemblerargs ASSEMBLERARGS
                        Additional SPAdes/Megahit arguments
  --haplocontigs        For dipSPAdes take the haplocontigs file
  --pipe                AAFTF is running in pipeline mode
```

    CPU=24
    MEM=96
    LEFT=$TRIMREAD/${BASE}_filtered_1.fastq.gz
    RIGHT=$TRIMREAD/${BASE}_filtered_2.fastq.gz
    WORKDIR=working_AAFTF
    OUTDIR=genomes
    ASMFILE=$OUTDIR/${BASE}.spades.fasta
    mkdir -p $WORKDIR $OUTDIR
    AAFTF assemble -c $CPU --mem $MEM \
      --left $LEFT --right $RIGHT \
      -o $ASMFILE -w $WORKDIR/spades_$BASE
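
To compare assemblers, the same call can be parameterized on `--method`. The sketch below only prints the command lines so they can be reviewed before anything is run. The helper name `assemble_cmd`, the fallback CPU count, and the output naming pattern (`BASE.METHOD.fasta`) are assumptions modeled on the example above; the method names and the 32 GB memory default come from the help text.

```shell
# assemble_cmd METHOD BASE LEFT RIGHT: print an AAFTF assemble command line.
assemble_cmd() {
    case "$1" in
        spades|dipspades|megahit) ;;
        *) echo "unsupported method: $1" >&2; return 1 ;;
    esac
    echo "AAFTF assemble --method $1 -c ${CPU:-8} --memory ${MEM:-32}" \
         "--left $3 --right $4 -o ${OUTDIR:-genomes}/${2}.${1}.fasta" \
         "-w ${WORKDIR:-working_AAFTF}/${1}_${2}"
}

# Dry run: show the commands for two methods without executing them.
for method in spades megahit; do
    assemble_cmd "$method" STRAINX \
        reads_trimmed/STRAINX_filtered_1.fastq.gz \
        reads_trimmed/STRAINX_filtered_2.fastq.gz
done
```

Printing first and executing later keeps a mistyped path from starting a long assembly run.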

vectrim

    CPU=16
    MEM=16
    LEFT=$TRIMREAD/${BASE}_filtered_1.fastq.gz
    RIGHT=$TRIMREAD/${BASE}_filtered_2.fastq.gz
    WORKDIR=working_AAFTF
    OUTDIR=genomes
    ASMFILE=$OUTDIR/${BASE}.spades.fasta
    VECTRIM=$OUTDIR/${BASE}.vecscreen.fasta
    mkdir -p $WORKDIR $OUTDIR
    AAFTF vecscreen -c $CPU -i $ASMFILE -o $VECTRIM
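
A simple sanity check after screening is to compare the number of FASTA records before and after. The sketch below assumes both files are plain, uncompressed FASTA; `count_contigs` is a hypothetical helper and the default paths mirror the example above.

```shell
# count_contigs FILE: print the number of FASTA header lines in FILE.
count_contigs() {
    # grep exits non-zero when the count is 0, so mask that status.
    grep -c '^>' "$1" || true
}

ASMFILE=${ASMFILE:-genomes/STRAINX.spades.fasta}
VECTRIM=${VECTRIM:-genomes/STRAINX.vecscreen.fasta}

if [ -f "$ASMFILE" ] && [ -f "$VECTRIM" ]; then
    echo "contigs before vecscreen: $(count_contigs "$ASMFILE")"
    echo "contigs after vecscreen:  $(count_contigs "$VECTRIM")"
else
    echo "Assembly files not found; run AAFTF assemble and vecscreen first"
fi
```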

Owner

  • Name: Jason Stajich Lab at UC Riverside
  • Login: stajichlab
  • Kind: organization
  • Email: jason.stajich@ucr.edu
  • Location: Riverside, CA

Evolutionary Genomics of Fungi projects, data and software

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Automatic Assembly For The Fungi
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jason E.
    family-names: Stajich
    affiliation: University of California-Riverside
    email: jason.stajich@ucr.edu
    orcid: 'https://orcid.org/0000-0002-7591-0020'
  - given-names: Jonathan M.
    family-names: Palmer
    affiliation: USDA Forest Service
    orcid: 'https://orcid.org/0000-0003-0929-3658'
    email: nextgenusfs@gmail.com
license: MIT
repository-code: https://github.com/stajichlab/AAFTF
abstract: This is a tool for running automated genome assembly from short-reads to polished, cleaned, sorted assembly files.
version: 0.6.0
identifiers:
  - type: doi
    value: 10.5281/zenodo.1620526
date-released: 2025-08-01

GitHub Events

Total
  • Create event: 3
  • Release event: 1
  • Issues event: 10
  • Watch event: 4
  • Issue comment event: 19
  • Push event: 10
  • Pull request event: 3
Last Year
  • Create event: 3
  • Release event: 1
  • Issues event: 10
  • Watch event: 4
  • Issue comment event: 19
  • Push event: 10
  • Pull request event: 3

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 354
  • Total Committers: 6
  • Avg Commits per committer: 59.0
  • Development Distribution Score (DDS): 0.223
Past Year
  • Commits: 81
  • Committers: 2
  • Avg Commits per committer: 40.5
  • Development Distribution Score (DDS): 0.025
Top Committers
Name Email Commits
Jason Stajich j****d@g****m 275
Jason Stajich j****h@g****m 33
Jon Palmer n****s@g****m 32
Jason Stajich j****h@u****u 9
Cameron Gilchrist c****t@g****m 4
Jon Palmer n****s@g****m 1
Committer Domains (Top 20 + Academic)
ucr.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 29
  • Total pull requests: 10
  • Average time to close issues: 7 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 13
  • Total pull request authors: 3
  • Average comments per issue: 2.48
  • Average comments per pull request: 0.1
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 2
  • Average time to close issues: 6 months
  • Average time to close pull requests: less than a minute
  • Issue authors: 3
  • Pull request authors: 1
  • Average comments per issue: 0.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hyphaltip (16)
  • termithorbor (2)
  • llk578496 (1)
  • Okompath (1)
  • maruiqi0710 (1)
  • nextgenusfs (1)
  • nickolasmenezes (1)
  • alejorojas2 (1)
  • njliangdong (1)
  • rfitak (1)
  • Bambi3024 (1)
  • Gian77 (1)
  • bpeacock44 (1)
Pull Request Authors
  • hyphaltip (7)
  • nextgenusfs (2)
  • gamcil (2)
Top Labels
Issue Labels
enhancement (5) bug (2)
Pull Request Labels

Dependencies

environment.yml conda
  • bbmap
  • biopython
  • blast
  • bowtie2
  • bwa
  • diamond
  • fastp
  • minimap2
  • novoplasty
  • pilon
  • samtools
  • sourmash
  • spades
  • trimmomatic
requirements.txt pypi
  • biopython *
setup.py pypi
  • x.strip *