https://github.com/bacpop/unitig-counter
Uses cDBG to count unitigs in bacterial populations
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: bacpop
- License: agpl-3.0
- Language: C++
- Default Branch: master
- Size: 149 KB
Statistics
- Stars: 11
- Watchers: 5
- Forks: 2
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
unitig-counter
Uses a compressed de Bruijn graph (implemented in GATB) to count unitigs in bacterial populations.
Details
This is a slightly modified version of the unitig and graph steps in DBGWAS software, repurposed for input into pyseer.
NB: We cannot offer support for unitig-counter; it is provided 'as-is'. Please consider using unitig-caller instead, which offers the same functionality.
Citation
If you use this, please cite the DBGWAS paper:
Jaillard M., Lima L. et al. A fast and agnostic method for bacterial genome-wide association studies: Bridging the gap between k-mers and genetic events. PLOS Genetics. 14, e1007758 (2018). doi:10.1371/journal.pgen.1007758.
List of changes
- Changes the format of the output from step1 from bugwas matrix to pyseer input (Rtab or kmers).
- Removes all code for step2 and step3 in DBGWAS.
- Removes unused dependencies.
- Changes the installation procedure ready for bioconda.
Install
Recommended installation is through conda:
conda install unitig-counter
If the package cannot be found, ensure your channels are set up correctly for bioconda.
For compilation from source, see INSTALL.md.
Usage
Run:
unitig-counter -strains strain_list.txt -output output -nb-cores 4
Where strain_list.txt is a list of input files (assemblies) with a header, for example:
ID Path
6925_1_49 assemblies/6925_1#49.contigs_velvet.fa
6925_1_50 assemblies/6925_1#50.contigs_velvet.fa
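For a large collection, the strain list can be generated rather than typed by hand. The following is a minimal sketch, not part of unitig-counter: it assumes the two columns are tab-separated and that a sample ID can be derived from the file name by taking everything before the first `.` and replacing `#` with `_`, which matches the example rows above. The function name `strain_list_lines` is hypothetical.

```python
from pathlib import Path


def strain_list_lines(assembly_paths):
    """Return the lines of a strain list: a header, then one 'ID<TAB>Path' row per assembly."""
    lines = ["ID\tPath"]  # assumption: columns are tab-separated
    for path in sorted(str(p) for p in assembly_paths):
        # Assumed ID rule: file name up to the first '.', with '#' replaced by '_'
        sample_id = Path(path).name.split(".")[0].replace("#", "_")
        lines.append(f"{sample_id}\t{path}")
    return lines


example = strain_list_lines([
    "assemblies/6925_1#49.contigs_velvet.fa",
    "assemblies/6925_1#50.contigs_velvet.fa",
])
print("\n".join(example))
```

To write the file, pass the result of `Path("assemblies").glob("*.fa")` to the function and save the joined lines as strain_list.txt.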
Output is in output/unitigs.txt and can be used with --kmers in pyseer. You can also test just the unique patterns in output/unitigs.unique_rows.txt with the --Rtab option.
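To illustrate what "unique patterns" means here: several unitigs often share the same presence/absence pattern across samples, so each distinct pattern only needs to be tested once. The sketch below works on a small in-memory dictionary with made-up data; it does not read unitig-counter's output files, whose exact layout is not described here, and the function name `unique_patterns` is hypothetical.

```python
from collections import defaultdict


def unique_patterns(presence, samples):
    """Group unitigs by their presence/absence pattern across samples.

    presence: dict mapping unitig sequence -> set of sample IDs containing it.
    Returns a dict mapping each pattern (tuple of 0/1, one entry per sample)
    to the list of unitigs that share it.
    """
    groups = defaultdict(list)
    for unitig, carriers in presence.items():
        pattern = tuple(int(s in carriers) for s in samples)
        groups[pattern].append(unitig)
    return dict(groups)


# Toy data (made up for illustration)
samples = ["6925_1_49", "6925_1_50"]
presence = {
    "GTAATAAACAAA": {"6925_1_49"},
    "AAAAAAAAAAGTTAAAAAT": {"6925_1_49"},
    "ACGTACGTACGT": {"6925_1_49", "6925_1_50"},
}
patterns = unique_patterns(presence, samples)
print(len(presence), "unitigs,", len(patterns), "unique patterns")
```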
Cleaning up output
Some unitigs in the output may span multiple input contigs. If you wish to restrict your unitig calls to those appearing in assembled contigs, you can either:
- Run unitig-caller on the input genomes, using the unitig calls from your run.
- Run the script in the gatb/bcalm package, which will cut unitigs that span multiple contigs.
Thanks to @rchikhi and @apredeus for discovering and fixing this.
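As a quick way to see which unitigs are affected, one can check whether each unitig occurs in full inside a single contig. This is a naive sketch of that check, not what either of the tools above does: it assumes unitigs may appear on either strand, uses plain substring search, and runs on made-up toy contigs. The function names are hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def in_single_contig(unitig, contigs):
    """True if the unitig, or its reverse complement, lies wholly inside one contig."""
    rc = reverse_complement(unitig)
    return any(unitig in contig or rc in contig for contig in contigs)


# Toy contigs (made up): "GATCC" only exists across the boundary between them
contigs = ["ACGTGGAT", "CCAATTGA"]
print(in_single_contig("GTGGA", contigs))  # lies inside the first contig
print(in_single_contig("TCCAC", contigs))  # reverse complement of GTGGA
print(in_single_contig("GATCC", contigs))  # spans the contig boundary
```

Substring search is slow for whole populations; the sketch is meant only to make the definition of a contig-spanning unitig concrete.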
Extracting distances
To get the shortest sequence distance between two unitigs:
cdbg-ops dist --graph test_data/graph --source GTAATAAACAAA --target AAAAAAAAAAGTTAAAAAT
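Conceptually, this is a shortest-path query on the graph. The sketch below shows the general idea with Dijkstra's algorithm on a made-up toy graph; it is not the cdbg-ops implementation, and it assumes each edge carries a cost representing the number of bases traversed. The function name `shortest_distance` and the node labels are hypothetical.

```python
import heapq


def shortest_distance(graph, source, target):
    """Dijkstra's algorithm over a dict: node -> {neighbour: edge cost}.

    Returns the minimum total cost from source to target, or None if the
    target cannot be reached.
    """
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbour, cost in graph.get(node, {}).items():
            new_dist = dist + cost
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return None


# Toy graph (made up): node names stand in for unitigs, costs for bases traversed
graph = {
    "A": {"B": 5, "C": 2},
    "B": {"D": 1},
    "C": {"B": 1, "D": 7},
    "D": {},
}
print(shortest_distance(graph, "A", "D"))  # cheapest route is A -> C -> B -> D
```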
Extending unitigs
Short unitigs can be extended by following paths in the graph to neighbouring nodes. This can help map sequences which on their own are difficult to align in a specific manner.
Create a file unitigs.txt with the unitigs to extend (probably your significantly associated hits)
and run:
cdbg-ops extend --graph output/graph --unitigs unitigs.txt > extended.txt
The output extended.txt will contain possible extensions, comma separated, with lines corresponding to unitigs
in the input. See the help for more options.
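The idea behind extension can be sketched on a toy graph. In a de Bruijn graph built with k-mer length k, adjacent nodes overlap by k-1 bases, so a unitig is extended by appending the non-overlapping tail of each neighbour. The code below is an illustration under simplifying assumptions, not the cdbg-ops algorithm: it only extends to the right, ignores reverse-complement strands, and uses made-up sequences with k = 4. The function name `extend_right` is hypothetical.

```python
def extend_right(unitig, successors, k, max_steps=1):
    """Extend a unitig by walking to successor nodes in a toy de Bruijn graph.

    successors: dict mapping a unitig to the unitigs that can follow it, where
    each successor starts with the last k-1 bases of its predecessor.
    Returns one extended sequence per path of up to max_steps steps.
    """
    paths = [(unitig, unitig)]  # (extended sequence so far, current node)
    for _ in range(max_steps):
        next_paths = []
        for seq, node in paths:
            following = successors.get(node, [])
            if not following:
                next_paths.append((seq, node))  # dead end: keep the path as is
            for nxt in following:
                # Append only the bases not shared with the previous node
                next_paths.append((seq + nxt[k - 1:], nxt))
        paths = next_paths
    return [seq for seq, _ in paths]


# Toy graph (made up), k = 4, so neighbouring nodes overlap by 3 bases
successors = {
    "ACGTA": ["GTAC", "GTAGG"],
    "GTAC": ["TACT"],
}
print(",".join(extend_right("ACGTA", successors, k=4)))
print(",".join(extend_right("ACGTA", successors, k=4, max_steps=2)))
```

Bounding the walk with `max_steps` keeps the number of candidate extensions manageable and guarantees termination if the graph contains cycles.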
Python
A similar Python script can be found in unitig-graph:
python unitig-graph/extend_hits.py --prefix output/graph --unitigs unitigs.txt > extended.txt
Owner
- Name: Bacterial population genetics
- Login: bacpop
- Kind: organization
- Email: contact@bacpop.org
- Location: United Kingdom
- Website: www.bacpop.org
- Repositories: 20
- Profile: https://github.com/bacpop
Pathogen Informatics and Modelling @ EMBL-EBI / Bacterial Evolutionary Epidemiology Group @ Imperial College London
GitHub Events
Total
- Issue comment event: 1
Last Year
- Issue comment event: 1