Repository
Features Arrangement Description Miniformat
Basic Info
- Host: GitHub
- Owner: ggonnella
- License: other
- Language: Python
- Default Branch: main
- Size: 39.1 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Fardes: Features Arrangement Description Miniformat
The miniformat described here makes it possible to describe the relative arrangement of named sequence features on one or multiple molecules, in terms of their order, the length of the interval between them, the possible presence of further features between them, their strand, and their position on the same or on different molecules.
It was developed for an application in which the expected content of prokaryotic genomes is expressed as a set of rules, some of which concern the relative arrangement of features.
Specification
The miniformat is described in the Markdown document SPECIFICATION.md in
this repository.
Examples
Here are examples of how the format can be used to express different arrangements:
A,B,C,D: this is a list without any interval specifications, thus the
features (whose IDs are given) will just follow each other without any
other relevant feature in between.
A,B?,C: A is optionally followed by B and always followed by C.
A,1,B,C: in this case, between A and B, there is exactly one further feature.
A,1(gene),C: in this case, between A and C, there is exactly one feature,
which is of type gene.
A,3(rRNA;tRNA),C: in this case, between A and C, there are 3 features,
of type rRNA or tRNA.
A,8:10,B: in this case, between A and B, there are 8 to 10 other features.
A,1:*,B or A,>=1,B: these are two equivalent ways to express the fact
that between A and B there is at least one other feature.
A,<10,B: between A and B there are fewer than 10 features (max 9).
A,0[1000:3000],B: there are no features between A and B,
but there are between 1000 and 3000 bases.
A,0:1[1kb:3kb],B or A,<2[1kb:3kb],B: there are between
1000 and 3000 bases and possibly one feature in this interval.
A,[>30kbp],B or A,>=0[>30kbp],B: there are more than 30000 bases between
A and B, including any number of features.
A,><,B,<>,C: A and B are close to each other (and thus also on the same
molecule) and distant from C (which can be on the same molecule or another)
A,><,B,<.>,C: A and B are close to each other and distant from C,
but all three are on the same molecule
A,><,B,<|>,C: A and B are close to each other on the same molecule,
while C is on another molecule
A,&,^B: A and B overlap each other, on different strands
A,B,>C,^D: the order of the features is A, B, C and D, with no other
feature in between them; C and D are on opposite strands, while
A and B can be on any strand.
A,B,=C: the feature C is on the same strand as A, but B can be on the
same or on the opposite strand.
A,><,=B,><,=C: all three features are on the same strand and close
to each other, with no features in between.
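The count and length specifications in the examples above all reduce to a minimum and a maximum. The following is a minimal sketch of that reading, not the parser shipped with the package: the function names `to_number` and `to_range` and the `UNIT_MULTIPLIERS` table are hypothetical, and the unit suffixes are only those that appear in the examples. Approximate lengths (`~`) and feature type lists are left out.

```python
import re

# Unit suffixes seen in the examples (kb, kbp, Mb); this table is an
# assumption of the sketch, not taken from the specification.
UNIT_MULTIPLIERS = {"": 1, "kb": 1_000, "kbp": 1_000, "Mb": 1_000_000}


def to_number(token):
    """Convert a token such as '3kb' or '10' to an int; '*' means unbounded."""
    if token == "*":
        return None
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", token)
    if match is None or match.group(2) not in UNIT_MULTIPLIERS:
        raise ValueError(f"cannot interpret {token!r}")
    return int(match.group(1)) * UNIT_MULTIPLIERS[match.group(2)]


def to_range(spec):
    """Return (min, max) for a count or length spec; max None means unbounded."""
    if ":" in spec:
        low, high = spec.split(":", 1)
        return to_number(low), to_number(high)
    # '>=' must be tested before '>' because both share the first character.
    for operator in (">=", ">", "<"):
        if spec.startswith(operator):
            value = to_number(spec[len(operator):])
            if value is None:
                raise ValueError(f"{operator!r} needs a number, got '*'")
            if operator == ">=":
                return value, None
            if operator == ">":
                return value + 1, None
            return 0, value - 1
    value = to_number(spec)
    return value, value


for spec in ["8:10", "1:*", ">=1", "<10", "1kb:3kb", ">30kbp"]:
    print(spec, to_range(spec))
```

Under these assumptions, `<10` yields a maximum of 9 and `>30kbp` a minimum of 30001, which matches how the parser output further below normalizes `<4` and `>2Mb`.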
Limitations
The format is designed to be as simple as possible. There is no way to express a branched graph structure; instead, each possible path must be spelled out linearly.
A possible way (not yet implemented) to introduce branching could be
using a syntax such as: A,1,{,B,C,|,C,{,B,|,D,E},},F
for expressing the set of paths:
A,1,B,C,F, A,1,C,B,F and A,1,C,D,E,F.
This would require implementing additional validations, to check
that the branch opening { and closing } are balanced, and that the branch
separator | is used properly.
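The kind of validation mentioned here could look like the sketch below. It is purely illustrative, since branching is not implemented: the function name `check_branching` is hypothetical, and the only rules checked are that braces are balanced and that `|` appears inside a branch. The existing `<|>` interval is matched first so that its `|` is not mistaken for a branch separator.

```python
import re

# Match the existing '<|>' interval as one token, so that its '|' is not
# counted, and otherwise single branch characters.
BRANCH_TOKEN = re.compile(r"<\|>|[{}|]")


def check_branching(spec):
    """Return a list of problems found in the branch syntax (empty if none)."""
    problems = []
    depth = 0
    for match in BRANCH_TOKEN.finditer(spec):
        token = match.group()
        if token == "<|>":
            continue  # 'other molecule' interval, unrelated to branching
        if token == "{":
            depth += 1
        elif token == "}":
            if depth == 0:
                problems.append(f"unmatched '}}' at position {match.start()}")
            else:
                depth -= 1
        elif depth == 0:
            # token is '|', but no branch is currently open
            problems.append(f"'|' outside of a branch at position {match.start()}")
    if depth > 0:
        problems.append(f"{depth} unclosed '{{'")
    return problems


print(check_branching("A,1,{,B,C,|,C,{,B,|,D,E},},F"))
print(check_branching("A,{,B"))
```

A real implementation would probably also check that every branch contains at least one alternative and that no branch is empty; those rules are not stated in the text above, so they are omitted.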
Implementation as a Python package
The miniformat has been implemented as a TextFormats specification
(fardes.tf.yaml).
This has been included in a Python module fardes, which additionally includes
cross-checks that cannot be expressed in TextFormats and normalizes the elements
while parsing a string
(e.g. by including implicit values and applying multipliers).
The module can be installed using pip install fardes.
Example usage of the Python parser
Here is an example of usage of the module:
import fardes
elements = fardes.parse("A,1:10[1kb:3kb],>B,1(rRNA;tRNA),>C,1[2],>D,=E,[3:*],F,1:*[>2Mb],G,<>,H,>0,I,<4,J,[~3kb],K,<|>,L,><,M,&,N")
will result in the following:
[{'type': 'unit', 'unit': 'A', 'prefix': ''},
{'type': 'interval', 'length': {'min': 1000, 'max': 3000}, 'n_features': {'min': 1, 'max': 10}},
{'type': 'unit', 'unit': 'B', 'prefix': '>'},
{'type': 'interval', 'length': {'min': 0, 'max': None}, 'n_features': {'min': 1, 'max': 1, 'type_spec': {'types': ['rRNA', 'tRNA']}}},
{'type': 'unit', 'unit': 'C', 'prefix': '>'},
{'type': 'interval', 'length': {'min': 2, 'max': 2}, 'n_features': {'min': 1, 'max': 1}},
{'type': 'unit', 'unit': 'D', 'prefix': '>'},
{'type': 'interval', 'length': {'min': 0, 'max': None}, 'n_features': {'min': 0, 'max': 0}},
{'type': 'unit', 'unit': 'E', 'prefix': '='},
{'type': 'interval', 'length': {'min': 3, 'max': None}, 'n_features': {'min': 0, 'max': None}},
{'type': 'unit', 'unit': 'F', 'prefix': ''},
{'type': 'interval', 'length': {'min': 2000001, 'max': None}, 'n_features': {'min': 1, 'max': None}},
{'type': 'unit', 'unit': 'G', 'prefix': ''},
{'type': 'interval', 'special': 'distant'},
{'type': 'unit', 'unit': 'H', 'prefix': ''},
{'type': 'interval', 'length': {'min': 0, 'max': None}, 'n_features': {'min': 1, 'max': None}},
{'type': 'unit', 'unit': 'I', 'prefix': ''},
{'type': 'interval', 'length': {'min': 0, 'max': None}, 'n_features': {'min': 0, 'max': 3}},
{'type': 'unit', 'unit': 'J', 'prefix': ''},
{'type': 'interval', 'length': {'approx': 3000}, 'n_features': {'min': 0, 'max': None}},
{'type': 'unit', 'unit': 'K', 'prefix': ''},
{'type': 'interval', 'special': 'other_molecule'},
{'type': 'unit', 'unit': 'L', 'prefix': ''},
{'type': 'interval', 'special': 'near'},
{'type': 'unit', 'unit': 'M', 'prefix': ''},
{'type': 'interval', 'special': 'overlap'},
{'type': 'unit', 'unit': 'N', 'prefix': ''}]
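The returned list alternates between feature units and intervals, so it can be walked with a simple loop. The sketch below assumes the dictionary layout shown in the output above, with `min` and `max` keys in each range. The helper names `describe_range` and `describe` are hypothetical and not part of the package, and the sample list is hand-written so that the snippet runs without fardes installed.

```python
def describe_range(bounds, noun):
    """Turn a {'min': ..., 'max': ...} or {'approx': ...} dict into text."""
    if "approx" in bounds:
        return f"about {bounds['approx']} {noun}"
    low, high = bounds["min"], bounds["max"]
    if high is None:
        return f"at least {low} {noun}"
    if low == high:
        return f"exactly {low} {noun}"
    return f"{low} to {high} {noun}"


def describe(elements):
    """Return one line of text per parsed element."""
    lines = []
    for element in elements:
        if element["type"] == "unit":
            lines.append(f"feature {element['unit']}")
        elif "special" in element:
            # special intervals: 'near', 'distant', 'overlap', 'other_molecule'
            lines.append(f"  ({element['special']})")
        else:
            counts = describe_range(element["n_features"], "features")
            length = describe_range(element["length"], "bases")
            lines.append(f"  ({counts}, {length})")
    return lines


# Hand-written sample in the same layout as the parser output.
sample = [
    {'type': 'unit', 'unit': 'A', 'prefix': ''},
    {'type': 'interval', 'length': {'min': 1000, 'max': 3000},
     'n_features': {'min': 1, 'max': 10}},
    {'type': 'unit', 'unit': 'B', 'prefix': '>'},
    {'type': 'interval', 'special': 'near'},
    {'type': 'unit', 'unit': 'C', 'prefix': ''},
]
print("\n".join(describe(sample)))
```

With the package installed, the result of `fardes.parse(...)` could be passed to `describe` in place of `sample`.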
Acknowledgements
This specification has been created in the context of the DFG project GO 3192/1-1 “Automated characterization of microbial genomes and metagenomes by collection and verification of association rules”. The funders had no role in study design, data collection and analysis.
Name
The name Fardes is an acronym for "feature arrangement description". After naming the project, I noticed that, according to Wiktionary, in Belgian French a "farde" (plural: fardes) is a file, in the sense of an item of stationery used to keep documents together. This suits the purpose of the format well.
Owner
- Name: Giorgio Gonnella
- Login: ggonnella
- Kind: user
- Location: Goettingen, Germany
- Company: Bioinformatics, University of Goettingen
- Repositories: 36
- Profile: https://github.com/ggonnella
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gonnella"
  given-names: "Giorgio"
  orcid: "https://orcid.org/0000-0003-3900-5397"
title: "The EGC format and GenExpect: representation and storage of rules about prokaryotic genome contents"
version: 1.0
date-released: 2023-02-24
url: "https://github.com/ggonnella/fardes/"
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Giorgio Gonnella | g****a@z****e | 23 |
| Giorgio Gonnella | g****a@u****e | 4 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 1
- Total downloads:
  - pypi: 21 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 3
- Total maintainers: 1
pypi.org: fardes
A miniformat for expressing arrangements of sequence features
- Homepage: https://github.com/ggonnella/fardes
- Documentation: https://fardes.readthedocs.io/
- License: ISC
- Latest release: 1.2 (published about 3 years ago)