egc-spec

Specification for the Expected Genome Content format

https://github.com/ggonnella/egc-spec

Repository

Basic Info
  • Host: GitHub
  • Owner: ggonnella
  • License: other
  • Default Branch: main
  • Size: 21.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed about 3 years ago

README.md

Expected Genome Content (EGC) Format Specification

Expected Genome Content (EGC) is a format for expressing rules that describe expectations about the content of genomes. The current version is dedicated to prokaryotic genomes, although it could also be used for, and possibly adapted to, eukaryotic genomes.

Structure of the format

The format is very similar in structure to GFA: it is a multi-record tabular format, with tabs as field separators, in which each record is introduced by a record type (a one-letter code) in the first field.

Each record has a predetermined number of mandatory positional fields. The number, format and semantics of these fields depend on the record type.

After the positional fields, additional information can be given in the form of tags.

Unlike GFA, each line can also include a comment, i.e. a last field in the line that is introduced by '#' immediately after the tab. Everything from this character to the end of the line is considered a comment. For simplicity, comments cannot contain tabs.
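The line structure described above can be sketched in a few lines of Python. This is only an illustration, not the reference parser: the names `EgcRecord` and `split_egc_line` are hypothetical, the field values in the usage example are made up, and the sketch assumes that a final field starting with '#' is always a comment (the specification files may be stricter). Positional fields and tags are returned together, because their number depends on the record type.

```python
from typing import List, NamedTuple, Optional


class EgcRecord(NamedTuple):
    """Hypothetical container for one tab-separated EGC line."""
    record_type: str          # one-letter code from the first field
    fields: List[str]         # positional fields and tags, not yet separated
    comment: Optional[str]    # text after '#', or None if there is no comment


def split_egc_line(line: str) -> EgcRecord:
    """Split a line into record type, remaining fields and optional comment."""
    parts = line.rstrip("\n").split("\t")
    comment = None
    # Assumption: a last field starting with '#' is the comment field.
    if len(parts) > 1 and parts[-1].startswith("#"):
        comment = parts.pop()[1:].strip()
    record_type = parts[0]
    if len(record_type) != 1:
        raise ValueError(f"record type must be a single letter: {record_type!r}")
    return EgcRecord(record_type, parts[1:], comment)


# Usage with a made-up line (the field contents are illustrative only).
record = split_egc_line("G\tg1\texample group\t# a note")
print(record.record_type, record.fields, record.comment)
```

Because comments cannot contain tabs, splitting on tabs first and then inspecting only the last field is enough to separate the comment from the data fields.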

Line types

The following line types have been defined.

Groups of organisms:
  • 'G' (Group): a group of organisms

Genome contents:
  • 'U' (Units): an element observed, predicted or computed in the genome sequence and/or annotation when measuring an attribute
  • 'M' (Model): a model in an external database identifying a unit
  • 'A' (Attribute): a quantity which can be measured for a genome

Rules of expectation:
  • 'C' (Comparison): a rule of expectation comparing a genome attribute in two groups of organisms
  • 'V' (Value): a rule of expectation comparing a genome attribute to a value or a set of values

Sources of rules:
  • 'D' (Document): describes an external textual document
  • 'S' (Snippet): reports a snippet of text from a document
  • 'T' (Tables): refers to a table in a document

Metadata (these lines do not support tags):
  • 'X' (eXternal resources definitions): describes external resources, their usage contexts, homepage, citation and item URLs
  • 'Y' (tag definitions): describes user-specific tags, their usage contexts, format and semantics
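As a quick reference, the line types listed above can be collected in a small lookup table. The sketch below is illustrative only: the names `LINE_TYPES`, `NO_TAG_TYPES` and `supports_tags` are hypothetical and are not part of the specification files. The only rule it encodes beyond the names is the one stated above, namely that metadata lines ('X' and 'Y') do not support tags.

```python
# Record-type code -> (name, category), as listed in the README.
LINE_TYPES = {
    "G": ("Group", "Groups of organisms"),
    "U": ("Units", "Genome contents"),
    "M": ("Model", "Genome contents"),
    "A": ("Attribute", "Genome contents"),
    "C": ("Comparison", "Rules of expectation"),
    "V": ("Value", "Rules of expectation"),
    "D": ("Document", "Sources of rules"),
    "S": ("Snippet", "Sources of rules"),
    "T": ("Tables", "Sources of rules"),
    "X": ("eXternal resources definitions", "Metadata"),
    "Y": ("tag definitions", "Metadata"),
}

# Metadata lines are the only ones that do not support tags.
NO_TAG_TYPES = {code for code, (_, category) in LINE_TYPES.items()
                if category == "Metadata"}


def supports_tags(code: str) -> bool:
    """Return True if lines of this record type may carry tags."""
    if code not in LINE_TYPES:
        raise KeyError(f"unknown record type: {code!r}")
    return code not in NO_TAG_TYPES


# Usage: print each record type with its category and tag support.
for code, (name, category) in LINE_TYPES.items():
    print(code, name, category, supports_tags(code), sep="\t")
```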

Implementation of the specification

The specification is given as a set of TextFormats specification files. The main specification file is "egc.tf.yaml", in which the "line" datatype is defined. This datatype represents a single line (i.e. a record) of an EGC file.

Acknowledgements

This specification has been created in the context of the DFG project GO 3192/1-1 “Automated characterization of microbial genomes and metagenomes by collection and verification of association rules”. The funders had no role in study design, data collection and analysis.

Owner

  • Name: Giorgio Gonnella
  • Login: ggonnella
  • Kind: user
  • Location: Goettingen, Germany
  • Company: Bioinformatics, University of Goettingen

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 19
  • Total Committers: 1
  • Avg Commits per committer: 19.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Giorgio Gonnella g****a@z****e 19
