Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 5 committers (40.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Repository
Filter of Pairwise Alignment
Basic Info
Statistics
- Stars: 44
- Watchers: 3
- Forks: 5
- Open Issues: 1
- Releases: 6
Metadata Files
Readme.md
fpa Filter Pairwise Alignment 🧬 💻
Filter the output of an all-against-all read mapping. You can drop or keep overlaps based on:
- internal match
- containment
- dovetails
- self matching
- read name match against regex
- length of overlap
- length of read in overlap
For the definitions of internal match, containment, and dovetail, see Algorithm 5 in the minimap article.
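The classification referred to above can be sketched in a few lines of Python. This is a minimal sketch that follows the mapping classification as described in the minimap/miniasm article, not fpa's own code: the `Overlap` tuple, the function names, and the threshold values `max_overhang=1000` and `max_ratio=0.8` are assumptions made for illustration.

```python
from typing import NamedTuple


class Overlap(NamedTuple):
    """Subset of the PAF columns needed for classification (0-based, end-exclusive)."""
    q_len: int
    q_start: int
    q_end: int
    strand: str
    t_len: int
    t_start: int
    t_end: int


def parse_paf(line: str) -> Overlap:
    """Extract lengths, coordinates and strand from one tab-separated PAF record."""
    f = line.rstrip("\n").split("\t")
    return Overlap(int(f[1]), int(f[2]), int(f[3]), f[4],
                   int(f[6]), int(f[7]), int(f[8]))


def classify(ov: Overlap, max_overhang: int = 1000, max_ratio: float = 0.8) -> str:
    """Return 'internal', 'containment' or 'dovetail' (thresholds are assumed values)."""
    b1, e1, l1 = ov.q_start, ov.q_end, ov.q_len
    b2, e2, l2 = ov.t_start, ov.t_end, ov.t_len
    if ov.strand == "-":
        # Express target coordinates on the strand that is co-linear with the query.
        b2, e2 = l2 - e2, l2 - b2
    overhang = min(b1, b2) + min(l1 - e1, l2 - e2)
    maplen = max(e1 - b1, e2 - b2)
    if overhang > min(max_overhang, maplen * max_ratio):
        return "internal"
    if b1 <= b2 and l1 - e1 <= l2 - e2:
        return "containment"  # query contained in target
    if b1 >= b2 and l1 - e1 >= l2 - e2:
        return "containment"  # target contained in query
    return "dovetail"
```

For example, `classify(parse_paf(record))` returns `"dovetail"` for a record in which the end of the query overlaps the start of the target with little or no overhang.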
Rationale
Long-read mapping tools report every match they find in a read dataset. For many uses some of these matches are not useful, and this program provides filters to remove them. This software can be replaced by a simple script in awk, bash, python, ~perl~, {your favorite language}.
More details and some experiments are presented in this blog post. We have evaluated the effects of some fpa filters on miniasm assemblies; the scripts and the instructions for obtaining the real data sets are in this repository.
Usage
fpa -i <input> -o <output> <option> <subcommand: drop | keep | index | rename | gfa>
Subcommands fall into two groups:
- filters (drop, keep): select which overlaps are written to the output
- generators (index, rename, gfa): generate new data from the overlaps
By default, input and output are stdin and stdout, so you can use fpa in a pipeline like this:
minimap2 long_read.fasta long_read.fasta | fpa keep -d | gzip - > only_dovetail.paf.gz
minimap2 long_read.fasta long_read.fasta | fpa drop -l 500 -L 2000 > only_between_500_2000.paf
minimap2 long_read.fasta long_read.fasta | fpa drop -m -n read_1 > no_self_no_match_read_1.paf
minimap2 long_read.fasta long_read.fasta | fpa drop -m rename -o rename.csv > no_self_match_renamed.paf
minimap2 long_read.fasta long_read.fasta | fpa drop -m rename -o rename.csv gfa -o no_self_match_renamed.gfa > no_self_match_renamed.paf
minimap2 long_read.fasta long_read.fasta | fpa drop -l 500 index -t query -f match_upper_500.paf.idx query > match_upper_500.paf
minimap2 long_read.fasta long_read.fasta | fpa -o match_upper_500.paf.bz2 -z bzip2 drop -l 500 index -f match_upper_500.paf.idx -t target
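To make the behaviour of the drop filters in these pipelines more concrete, here is a rough Python equivalent. It is a sketch under stated assumptions, not a description of fpa's internals: the function `drop_filter` and its parameters are hypothetical, overlap length is assumed to be the longer of the query and target spans, the length bounds are assumed inclusive, and the name regex is assumed to apply to both read names.

```python
import re


def drop_filter(lines, min_len=None, max_len=None, drop_self=False, name_regex=None):
    """Yield PAF records that survive the requested drop rules.

    min_len / max_len: drop overlaps shorter than min_len or longer than max_len.
    drop_self:         drop records where query and target are the same read.
    name_regex:        drop records where either read name matches the regex.
    """
    pattern = re.compile(name_regex) if name_regex else None
    for line in lines:
        f = line.rstrip("\n").split("\t")
        q_name, t_name = f[0], f[5]
        # Assumption: overlap length is the longer of the two aligned spans.
        length = max(int(f[3]) - int(f[2]), int(f[8]) - int(f[7]))
        if drop_self and q_name == t_name:
            continue
        if pattern and (pattern.search(q_name) or pattern.search(t_name)):
            continue
        if min_len is not None and length < min_len:
            continue
        if max_len is not None and length > max_len:
            continue
        yield line
```

Because the sketch uses `re.search`, a pattern such as `read_1` also matches `read_10`; anchor the pattern (for example `^read_1$`) if an exact name is intended.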
Generators
Generators only analyse the mappings that pass the filters.
Rename
The rename subcommand replaces the name of the read with another one.
If you use the -i option, the file is read as a two-column CSV: the first column is the original name and the second is the new name:
original name1, new name1
original name2, new name2
If the name of the read does not exist in the file it will not be replaced.
If you use -o, names are automatically replaced by numbers, and a file like the example above is created.
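The two renaming modes can be illustrated with a short Python sketch. The helper names are hypothetical, and the numbering scheme (consecutive integers starting at 1, in order of first appearance) is an assumption; fpa may number reads differently.

```python
import csv
import io


def load_rename_table(handle):
    """Read a two-column CSV (original name, new name) into a dict."""
    table = {}
    for row in csv.reader(handle, skipinitialspace=True):
        if len(row) >= 2:
            table[row[0].strip()] = row[1].strip()
    return table


def rename_record(line, table):
    """Replace query and target names found in the table; leave others unchanged."""
    f = line.rstrip("\n").split("\t")
    f[0] = table.get(f[0], f[0])
    f[5] = table.get(f[5], f[5])
    return "\t".join(f)


def auto_rename(lines):
    """Replace each read name by a number and return the records plus the mapping."""
    table = {}
    renamed = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        for col in (0, 5):
            # Assumption: numbers are assigned from 1 in order of first appearance.
            f[col] = table.setdefault(f[col], str(len(table) + 1))
        renamed.append("\t".join(f))
    return renamed, table


# Small in-memory demonstration of the CSV layout described above.
demo_table = load_rename_table(io.StringIO("read_1, r1\nread_2, r2\n"))
```

The mapping returned by `auto_rename` can be written back out as a two-column CSV, which mirrors the file that the -o mode is described as creating.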
Index
fpa can build an index of the offsets of the records in which each read appears.
The index file looks like this:
read_id1, start_of_range_1:end_of_range_1; start_of_range_2:end_of_range_2;…
read_id2, start_of_range_1:end_of_range_1; start_of_range_2:end_of_range_2;…
fpa can index a read only when it is the query (first read in the record), only when it is the target (second read in the record), or in both cases.
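A sketch of how such an index could be built is shown below. It assumes that offsets are byte positions in the uncompressed output, that ranges are end-exclusive, and that adjacent records for the same read are merged into one range; none of these details is confirmed by the text above, and the function names are hypothetical.

```python
from collections import defaultdict


def build_index(lines, index_on="both"):
    """Map each read name to a list of (start, end) byte ranges of its records.

    lines must keep their trailing newline so that offsets add up correctly.
    index_on is one of 'query', 'target' or 'both'.
    """
    index = defaultdict(list)
    offset = 0
    for line in lines:
        start, end = offset, offset + len(line.encode("utf-8"))
        offset = end
        f = line.split("\t")
        names = []
        if index_on in ("query", "both"):
            names.append(f[0])
        if index_on in ("target", "both") and f[5] not in names:
            names.append(f[5])
        for name in names:
            ranges = index[name]
            if ranges and ranges[-1][1] == start:
                # Assumption: contiguous records are merged into a single range.
                ranges[-1] = (ranges[-1][0], end)
            else:
                ranges.append((start, end))
    return dict(index)


def format_index(index):
    """Render the index in the 'read_id, start:end; start:end' layout."""
    rows = []
    for name, ranges in index.items():
        spans = "; ".join(f"{start}:{end}" for start, end in ranges)
        rows.append(f"{name}, {spans}")
    return "\n".join(rows) + "\n"
```

With such an index, a reader can seek directly to the listed ranges instead of scanning the whole PAF file for one read.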
Gfa
fpa can generate an overlap graph from the overlaps that pass the filters.
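As an illustration of what an overlap graph in GFA1 looks like, the sketch below turns dovetail PAF records into segment (`S`) and link (`L`) lines. It assumes the input contains only dovetail overlaps, leaves sequences and overlap CIGARs as `*`, and derives link orientation from the relative overhangs; the function name is hypothetical and fpa's actual output may differ.

```python
def paf_to_gfa(lines):
    """Build a GFA1 string from PAF records that are assumed to be dovetails."""
    segments = {}  # read name -> length, in order of first appearance
    links = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        q, l1, b1 = f[0], int(f[1]), int(f[2])
        strand = f[4]
        t, l2, b2, e2 = f[5], int(f[6]), int(f[7]), int(f[8])
        segments.setdefault(q, l1)
        segments.setdefault(t, l2)
        if strand == "-":
            # Put the target on the strand that is co-linear with the query.
            b2 = l2 - e2
        t_orient = strand  # target is used reverse-complemented on '-' records
        if b1 > b2:
            # The end of the query overlaps the start of the (oriented) target.
            links.append((q, "+", t, t_orient))
        else:
            # The end of the (oriented) target overlaps the start of the query.
            links.append((t, t_orient, q, "+"))
    out = ["H\tVN:Z:1.0"]
    out += [f"S\t{name}\t*\tLN:i:{length}" for name, length in segments.items()]
    out += [f"L\t{a}\t{ao}\t{b}\t{bo}\t*" for a, ao, b, bo in links]
    return "\n".join(out) + "\n"
```

Running a dovetail-only filter first (for instance the `keep -d` pipeline shown in the usage examples) would give input that satisfies the assumption made here.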
Requirements
- Rust
- libgz
- libbzip2
- liblzma
Installation
With cargo
If you have a Rust environment set up, you can run:
cargo install fpa_lr
With conda
fpa is available in the bioconda channel.
If the bioconda channel is set up, you can run:
conda install fpa
From source
```
git clone https://github.com/natir/fpa.git
cd fpa
git checkout v0.5.1

cargo build
cargo test
# Recent cargo versions require an explicit path to install the current package
cargo install --path .
```
Minimum supported Rust version
Currently the minimum supported Rust version is 1.56.0.
Citation
If you use fpa in your research, please cite the following publication:
Pierre Marijon, Rayan Chikhi, Jean-Stéphane Varré, yacrd and fpa: upstream tools for long-read genome assembly, Bioinformatics, btaa262, https://doi.org/10.1093/bioinformatics/btaa262
bibtex format:
@article{Marijon_2020,
doi = {10.1093/bioinformatics/btaa262},
url = {https://doi.org/10.1093%2Fbioinformatics%2Fbtaa262},
year = 2020,
month = {apr},
publisher = {Oxford University Press ({OUP})},
author = {Pierre Marijon and Rayan Chikhi and Jean-St{\'{e}}phane Varr{\'{e}}},
editor = {Inanc Birol},
title = {yacrd and fpa: upstream tools for long-read genome assembly},
journal = {Bioinformatics}
}
Owner
- Name: Pierre Marijon
- Login: natir
- Kind: user
- Location: Paris
- Company: Seqoia
- Website: https://pierre.marijon.fr/link.html
- Twitter: pierre_marijon
- Repositories: 105
- Profile: https://github.com/natir
Citation (CITATION.cff)
# YAML 1.2
---
abstract: "Genome assembly is increasingly performed on long, uncorrected reads. Assembly quality may be degraded due to unfiltered chimeric reads; also, the storage of all read overlaps can take up to terabytes of disk space. We introduce two tools: yacrd for chimera removal and read scrubbing, and fpa for filtering out spurious overlaps. We show that yacrd results in higher-quality assemblies and is one hundred times faster than the best available alternative. https://github.com/natir/yacrd and https://github.com/natir/fpa. Supplementary data are available at Bioinformatics online."
authors:
  -
    affiliation: "Department of Computer Science, Inria, Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL, Lille F-59000, France"
    family-names: Marijon
    given-names: Pierre
    orcid: "https://orcid.org/0000-0002-6694-6873"
  -
    affiliation: "Department of Computational Biology, Institut Pasteur, C3BI USR 3756 IP CNRS, Paris, France"
    family-names: Chikhi
    given-names: Rayan
    orcid: "https://orcid.org/0000-0003-1099-8735"
  -
    affiliation: "Univ. Lille, CNRS, Centrale Lille, UMR 9189 - CRIStAL - Centre de Recherche en Informatique Signal et Automatique de Lille, F-59000 Lille, France"
    family-names: "Varré"
    given-names: "Jean-Stéphane"
    orcid: "https://orcid.org/0000-0001-6322-0519"
cff-version: "1.1.0"
date-released: 2020-04-21
doi: "10.1093/bioinformatics/btaa262"
message: "If you use this software, please cite it using these metadata."
repository-code: "https://github.com/natir/fpa"
title: "yacrd and fpa: upstream tools for long-read genome assembly"
...
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Committers
Last synced: almost 3 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Marijon Pierre | p****n@i****r | 29 |
| Pierre Marijon | p****n@m****e | 7 |
| Pierre Marijon | p****e@m****r | 5 |
| Pierre Marijon | p****t@a****r | 4 |
| Pierre Marijon | p****n@h****e | 3 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 16
- Total pull requests: 0
- Average time to close issues: 3 months
- Average time to close pull requests: N/A
- Total issue authors: 11
- Total pull request authors: 0
- Average comments per issue: 1.38
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- natir (6)
- rwhetten (1)
- ardy20 (1)
- hwalinga (1)
- tseemann (1)
- ilnamkang (1)
- JieZhouFighting (1)
- lfaino (1)
- ms-gx (1)
- RolandFaure (1)
- ekg (1)
Packages
- Total packages: 1
- Total downloads: 10,274 total (cargo)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 7
- Total maintainers: 1
crates.io: fpa_lr
fpa filter long read mapping information to save disk space
- Homepage: https://github.com/natir/fpa
- Documentation: https://docs.rs/fpa_lr/
- License: MIT
- Latest release: 0.5.1, published almost 6 years ago
Dependencies
- adler 1.0.2
- aho-corasick 0.7.18
- atty 0.2.14
- autocfg 1.0.1
- bgzip 0.2.1
- bitflags 1.3.2
- bstr 0.2.17
- bzip2 0.4.3
- bzip2-sys 0.1.11+1.0.8
- cc 1.0.71
- cfg-if 1.0.0
- clap 3.0.0-beta.4
- clap_derive 3.0.0-beta.4
- crc32fast 1.2.1
- csv 1.1.6
- csv-core 0.1.10
- fixedbitset 0.4.0
- flate2 1.0.22
- hashbrown 0.11.2
- heck 0.3.3
- hermit-abi 0.1.19
- indexmap 1.7.0
- itoa 0.4.8
- lazy_static 1.4.0
- libc 0.2.103
- lzma-sys 0.1.17
- memchr 2.4.1
- miniz_oxide 0.4.4
- niffler 2.3.2
- os_str_bytes 3.1.0
- petgraph 0.6.0
- pkg-config 0.3.20
- proc-macro-error 1.0.4
- proc-macro-error-attr 1.0.4
- proc-macro2 1.0.29
- quote 1.0.10
- regex 1.5.4
- regex-automata 0.1.10
- regex-syntax 0.6.25
- ryu 1.0.5
- serde 1.0.130
- serde_derive 1.0.130
- strsim 0.10.0
- syn 1.0.80
- termcolor 1.1.2
- textwrap 0.14.2
- thiserror 1.0.29
- thiserror-impl 1.0.29
- unicode-segmentation 1.8.0
- unicode-width 0.1.9
- unicode-xid 0.2.2
- vec_map 0.8.2
- version_check 0.9.3
- winapi 0.3.9
- winapi-i686-pc-windows-gnu 0.4.0
- winapi-util 0.1.5
- winapi-x86_64-pc-windows-gnu 0.4.0
- xz2 0.1.6