https://github.com/danforthcenter/falcon2fastg
Falcon2Fastg is a tool for converting a FALCON assembly to FASTG format to visualize with Bandage
Falcon2Fastg
This software converts the results of a PacBio assembly produced by FALCON into a FASTG graph that can be visualized using Bandage.
Usage
python Falcon2Fastg.py [--only-output=reads|contigs]
Run this in the output directory of the FALCON assembly (2-asm-falcon). Before running, copy the preads4falcon.fasta file from the intermediate directory (1-preads_ovl) to the output directory (2-asm-falcon).
Falcon2Fastg needs the following 6 input files:
- preads4falcon.fasta
- sg_edges_list
- utg_data (if --only-output is unset, or set to contigs)
- ctg_paths (if --only-output is unset, or set to contigs)
- p_ctg.fa (if --only-output is unset, or set to contigs)
- p_ctg_tiling_path (if --only-output is unset, or set to contigs)
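Before a long run, it can be useful to confirm that the files listed above are present. The snippet below is a minimal sketch of such a check, not part of Falcon2Fastg itself: the function name `missing_inputs` and its `only_output` argument are hypothetical, and the file names are written as FALCON produces them.

```python
import os

# Files needed in every mode, and files needed only when contigs are output.
# The grouping follows the input file list in this README.
ALWAYS_REQUIRED = ["preads4falcon.fasta", "sg_edges_list"]
CONTIGS_REQUIRED = ["utg_data", "ctg_paths", "p_ctg.fa", "p_ctg_tiling_path"]


def missing_inputs(directory, only_output=None):
    """Return the required input files that are absent from `directory`.

    `only_output` mirrors the --only-output option: None (unset),
    "reads", or "contigs". This helper is hypothetical.
    """
    if only_output not in (None, "reads", "contigs"):
        raise ValueError("only_output must be None, 'reads' or 'contigs'")
    required = list(ALWAYS_REQUIRED)
    if only_output != "reads":
        required += CONTIGS_REQUIRED
    return [name for name in required
            if not os.path.isfile(os.path.join(directory, name))]
```

For example, `missing_inputs("2-asm-falcon", only_output="reads")` returns an empty list when preads4falcon.fasta and sg_edges_list are both in place.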
Dependencies :
Biopython (available at http://biopython.org/wiki/Download)
pyfaidx (available at https://github.com/mdshw5/pyfaidx)
Quick installation of dependencies:
pip install biopython pyfaidx # add --user if you don't have root
Output :
The tool outputs two FASTG files (reads.fastg and contigs.fastg) that can be opened with Bandage.
Additionally, the tool produces a CSV file, ReadsInContigs.csv, that can be loaded into Bandage. It labels each read with the contig it belongs to, along with its mapping position within that contig.

Above is a sample Bandage visualization of a reads.fastg file generated by
Falcon2Fastg from a FALCON assembly (a plant mitochondrial genome).
- Each node is a read, and each node is represented as a colored strip (colors are random)
- Edges represent the overlaps between reads found by FALCON (better viewed in the zoomed-in image below)
- Only the edges used in the string graph ("G" flagged in sg_edges_list) are used by Falcon2Fastg to produce the output file.
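The edge filtering described in the last point can be sketched as follows. This is an illustration under stated assumptions, not the code used by Falcon2Fastg: it assumes each line of sg_edges_list is whitespace-separated, with the two node names in the first two fields and the flag in the last field. The function name `string_graph_edges` is hypothetical.

```python
def string_graph_edges(lines, flag="G"):
    """Yield (source, target) node names for edges carrying `flag`.

    Assumed layout (not confirmed by this README): whitespace-separated
    fields, node names in the first two columns, flag in the last column.
    """
    for line in lines:
        fields = line.split()
        # Skip blank or malformed lines and edges with any other flag.
        if len(fields) < 3 or fields[-1] != flag:
            continue
        yield fields[0], fields[1]
```

Because the function accepts any iterable of lines, it can be used directly on an open file handle, for example `with open("sg_edges_list") as handle: edges = list(string_graph_edges(handle))`.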
Zooming in on a smaller set of nodes shows the edges in black, connecting the colored nodes :

For benchmarking, Falcon2Fastg was run on the preads4falcon.fasta and sg_edges_list files produced from the E. coli test dataset provided with the Falcon install. Instructions on obtaining the dataset are here : https://github.com/PacificBiosciences/FALCON/wiki/Setup:-Complete-example
Execution of Falcon2Fastg took 2 minutes on a desktop computer (size of preads4falcon.fasta: 449 MB).
The figure below represents a visualization of this E. coli data.

Contigs visualization
Falcon2Fastg can also be used to visualize the contigs produced by FALCON, and the overlaps between them. The contig graph is written to contigs.fastg, which Falcon2Fastg outputs by default. To output only the reads graph, use the --only-output=reads parameter.
To test this visualization mode, we assembled Drosophila melanogaster reads available at:
https://github.com/PacificBiosciences/DevNet/wiki/Drosophila-sequence-and-assembly
The input file was 2.2G in size (dmelFALCONpreassembled_reads.fasta).
FALCON assembly parameters were not optimized, and were as follows :
length_cutoff = 3000, length_cutoff_pr = 6000, overlap_filtering_setting = --max_diff 100 --max_cov 100 --min_cov 20
The final p_ctg.fa file had 642 contigs with a total length of ~27 Mbp.
Execution of Falcon2Fastg took 5 minutes on a desktop computer (size of preads4falcon.fasta: 2.2 GB).
The figure below is the visualization of these D. mel. contigs (colors are random)

Read density (approximate read coverage)
Bandage provides a way to visualize k-mer coverage, as reported by the assembler. As Falcon is a string graph assembler, it does not report such information. Ideally, to compute the coverage of a contig, one would need to re-map the reads back to the assembled contigs. Here, we report a simpler metric that is easy to compute from the output of Falcon.
Read density is calculated as (sum of the lengths of all reads used by FALCON to construct the contig / length of the contig). We believe that variation in read density reflects variation in coverage.
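As a minimal sketch, the formula above can be written as a small function. The name `read_density` and its arguments are illustrative and do not refer to the Falcon2Fastg source.

```python
def read_density(read_lengths, contig_length):
    """Sum of the lengths of the reads in a contig, divided by the contig length."""
    if contig_length <= 0:
        raise ValueError("contig_length must be positive")
    # float() keeps the division fractional under Python 2 as well.
    return sum(read_lengths) / float(contig_length)


# Five reads of 10,000 bp used to build a 25,000 bp contig.
print(read_density([10000] * 5, 25000))  # prints 2.0
```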
The figure below is a schematic of read density. The blue arrows represent reads that were used by Falcon to create each contig. The upper contig (black) has fewer reads within it; its read density is around 2.0. The lower contig (red) has more reads within it; its read density is around 5.0.

The figure below is the visualization of the same D. mel. contigs, colored by read density.

Zooming in shows that bright red represents higher density (6.0x). Contigs colored black have a lower read density (2.0x)

Memory Warning
The pyfaidx module is used to read an entire FASTA file into memory. If the size of your preads4falcon.fasta is greater than the amount of available RAM, it is advisable to run this computation on a server with greater available memory.
Caveats :
- Reads within "contained" unitigs are not used in the calculation of read density.
- Read density is calculated by dividing the total length of all reads in the contig by the length of the contig (obtained from ctg_paths). Depending on the orientation, Falcon ignores either the first read or the last read while reporting a contig. As a result, in the contigs.fastg file, the forward and revcomp entries might have different read densities and different lengths. Any large differences are mostly restricted to short contigs, where one very long read at either extremity can affect the length of the contig.
- Read density is set to "1" for entries in reads.fastg, as this measure is only relevant for contigs.fastg.
Testing :
Please see the test/ directory for a small example dataset and output
FALCON can be installed following the instructions here : https://github.com/PacificBiosciences/FALCON/wiki/Setup:-Complete-example
Other tools
Additional tools for visualizing read overlap can be found in the utils directory. Please consult utils/README.md for details
License
This content is released under MIT License. Please see LICENSE.md for details.
Authors
Primary author : Samarth Rangavittal, The Pennsylvania State University (szr165@psu.edu)
Rayan Chikhi, University of Lille 1
Jean-Stéphane Varré, University of Lille 1