fhr-file-converter

A file converter and validator for the FHR header and its serializations.

https://github.com/fair-bioheaders/fhr-file-converter

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 15 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: FAIR-bioHeaders
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 119 KB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 2
Created almost 4 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

FHR-File-Converter


This is the FHR file converter. It converts FHR headers between JSON, YAML, FASTA header, and microdata (HTML) serializations. For a detailed specification of FHR, see FHR-Specification.

Installation

You can install the FHR file converter from PyPI:

pip install fhr

You can also install the FHR file converter and its dependencies with Poetry, after first cloning the repository or downloading a release:

poetry install

Usage

Command Line Usage

Using FHR on the command line:

fhr-convert <input>.<yaml|json|fasta|html> <output>.<yaml|json|fasta|html>

Detailed Usage:

```
usage: fhr-convert [-h] [--version]

Convert from one FHR supported file type to another

positional arguments:
  input followed by output

optional arguments:
  -h, --help  show this help message and exit
  --version   show program's version number and exit

positional input and output files
input files can be one of:
    .yml
    .fasta - fasta contining a fhr header
    .html  - html containing microdata

output files can be one of:
    <output>.yml
    <output>.fasta - fasta output type will be made as a fasta header without sequences
    <output>.html  - microdata output type will be made into generic html output
```
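For batch conversions, `fhr-convert` can be driven from Python. The sketch below relies only on the command-line form shown above (`fhr-convert <input> <output>`) and names each output file by swapping the extension. The helpers `output_path`, `convert_command`, and `convert_all` are hypothetical and are not part of the `fhr` package.

```python
import subprocess
from pathlib import Path
from typing import List

# File extensions that fhr-convert accepts, per the usage text above
SUPPORTED = {".yml", ".yaml", ".json", ".fasta", ".html"}


def output_path(source: Path, target_ext: str) -> Path:
    """Swap the extension of `source` for `target_ext` (hypothetical helper)."""
    if source.suffix not in SUPPORTED or target_ext not in SUPPORTED:
        raise ValueError(f"unsupported conversion: {source.suffix} -> {target_ext}")
    return source.with_suffix(target_ext)


def convert_command(source: Path, target_ext: str) -> List[str]:
    """Build the fhr-convert argument list for a single file."""
    return ["fhr-convert", str(source), str(output_path(source, target_ext))]


def convert_all(directory: Path, source_ext: str, target_ext: str) -> None:
    """Convert every *source_ext file in `directory`; stop at the first failure."""
    for source in sorted(directory.glob(f"*{source_ext}")):
        subprocess.run(convert_command(source, target_ext), check=True)
```

For example, `convert_all(Path("headers"), ".yml", ".fasta")` would turn `headers/important_genome.fhr.yml` into `headers/important_genome.fhr.fasta`. Passing `check=True` makes a failed conversion raise `subprocess.CalledProcessError` rather than being skipped silently.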

Validating an FHR file on the command line

fhr-validate <input>.<yaml|json|fasta|html>

Detailed Usage:

```
usage: fhr-validate [-h] [--version]

Validate a fhr containing file

positional input and output files
input files can be one of:
    .yml
    .fasta - fasta contining a fhr header
    .html  - html containing microdata
```

For example, validating a YAML file named "important_genome.fhr.yml" would be:

fhr-validate important_genome.fhr.yml
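To validate several files at once, a small Python wrapper can call `fhr-validate` once per file. This README does not document the tool's exit codes, so the sketch assumes that a non-zero exit status means the file failed validation. `validate_files` is a hypothetical helper, and the validator command is a parameter so it can be swapped out.

```python
import subprocess
import sys
from typing import Dict, Iterable, Sequence


def validate_files(
    paths: Iterable[str], validator: Sequence[str] = ("fhr-validate",)
) -> Dict[str, bool]:
    """Run the validator on each path and map path -> True if it exited with status 0.

    Assumption: a non-zero exit status means the file did not validate.
    """
    results: Dict[str, bool] = {}
    for path in paths:
        completed = subprocess.run([*validator, path], capture_output=True, text=True)
        results[path] = completed.returncode == 0
        if not results[path]:
            # Show whatever the validator printed, to help diagnose the failure
            print(f"{path}: {completed.stderr or completed.stdout}", file=sys.stderr)
    return results
```

For example, `validate_files(["important_genome.fhr.yml"])` returns `{"important_genome.fhr.yml": True}` under that assumption when the file validates. If `fhr-validate` is not on the `PATH`, `subprocess.run` raises `FileNotFoundError`.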

Other FHR Tools

FHR has several other command line tools:

  • fhr_fasta_combine - combine an FHR header in any serialization with an existing FASTA file
  • fhr_fasta_strip - remove an FHR header from a FASTA file
  • fhr_fasta_validate - check an FHR-containing FASTA file against its checksum
  • fhr_gfa_combine - combine an FHR header in any serialization with an existing GFA file
  • fhr_gfa_strip - remove an FHR header from a GFA file
  • fhr_gfa_validate - check an FHR-containing GFA file against its checksum
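In the FASTA serialization every FHR line starts with `;~`, as the example output in the Python section shows. Under that observation, stripping and combining reduce to simple line filtering. The following is an illustrative pure-Python sketch, not the implementation of `fhr_fasta_strip` or `fhr_fasta_combine`; the function names are hypothetical.

```python
FHR_PREFIX = ";~"  # every FHR line in the FASTA serialization starts with this


def strip_fhr_header(fasta_text: str) -> str:
    """Return the FASTA text with all FHR (";~") lines removed."""
    kept = [
        line
        for line in fasta_text.splitlines(keepends=True)
        if not line.startswith(FHR_PREFIX)
    ]
    return "".join(kept)


def combine_fhr_header(fhr_fasta_header: str, fasta_text: str) -> str:
    """Prepend FHR header lines to FASTA text, replacing any FHR lines already present."""
    header = fhr_fasta_header if fhr_fasta_header.endswith("\n") else fhr_fasta_header + "\n"
    return header + strip_fhr_header(fasta_text)
```

Removing existing `;~` lines before prepending keeps a repeated combine from stacking two headers on the same file.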

Using FHR in Python

To use the FHR library in Python:

```python
from fhr import fhr

with open("example.yaml") as file:  # an FHR header serialized as YAML
    data = fhr()
    data.inputyaml(file.read())

# outputfasta() returns the header as FASTA ";~" lines; for this example.yaml:
fasta_header = data.outputfasta()
# ";~schema: https://raw.githubusercontent.com/FFRGS/FFRGS-Specification/main/fhr.json\n;~schemaVersion: 1\n;~genome: Bombas huntii\n;~version: 0.0.1\n;~author:;~ name:Adam Wright\n;~ url:https://wormbase.org/resource/person/WBPerson30813\n;~assembler:;~ name:David Molik\n;~ url:https:/david.molik.co/person\n;~place:;~ name:PBARC\n;~ url:https://www.ars.usda.gov/pacific-west-area/hilo-hi/daniel-k-inouye-us-pacific-basin-agricultural-research-center/\n;~taxa: Bombas huntii\n;~assemblySoftware: HiFiASM\n;~physicalSample: Located in Freezer 33, Drawer 137\n;~dateCreated: 2022-03-21\n;~instrument: ['Sequel IIe', 'Nanopore']\n;~scholarlyArticle: https://doi.org/10.1371/journal.pntd.0008755\n;~documentation: Built assembly from... \n;~identifier: ['gkx10242566416842']\n;~relatedLink: ['https/david.molik.co/genome']\n;~funding: some\n;~reuseConditions: public domain\n"
```

Checksums

FHR stores a checksum, allowing the FASTA header of a reference genome to contain the checksum of the FASTA file without the header.

To use the checksum, first remove the checksum line from the FASTA file:

grep -E -v '^;~\s?checksum' example.fasta > example.check.fasta

To extract the checksum value:

grep -E '^;~\s?checksum' example.fasta | sed -E 's/^;~\s?checksum:\s*//' | sed "s/'//g"
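The same two steps can be done in Python. This sketch follows the shell pipelines: it separates the `;~checksum` line from the rest of the file, drops the `;~checksum:` prefix and any single quotes, and trims whitespace. The hash algorithm behind the checksum is not stated here, so computing and comparing a digest is left out; `split_checksum` is a hypothetical helper.

```python
import re
from typing import List, Optional, Tuple

CHECKSUM_LINE = re.compile(r"^;~\s?checksum")


def split_checksum(fasta_text: str) -> Tuple[Optional[str], str]:
    """Return (checksum value or None, FASTA text without the checksum line)."""
    checksum: Optional[str] = None
    kept: List[str] = []
    for line in fasta_text.splitlines(keepends=True):
        if CHECKSUM_LINE.match(line):
            # Drop the ";~checksum:" prefix and quotes (as the sed calls do), then trim
            value = re.sub(r"^;~\s?checksum:", "", line.rstrip("\n"))
            checksum = value.replace("'", "").strip()
        else:
            kept.append(line)
    return checksum, "".join(kept)
```

The second element of the returned tuple corresponds to `example.check.fasta`, and the first to the value printed by the extraction pipeline.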

Docker Support

You can also run the FHR file converter in a Docker container. To build the Docker image:

bash docker build -t fhr-file-converter .

And then run the Docker container:

bash docker run -it --rm fhr-file-converter

Running Code Quality Checks

Ensuring code quality is crucial for maintaining a healthy and sustainable codebase. The following tools help enforce coding standards and best practices:

isort

isort is a tool that sorts Python imports alphabetically and groups them into sections separated by blank lines. It ensures a consistent import style across your project.

To run isort, use the following command:

poetry run isort .

ruff

ruff is a lightweight linter for Python that aims to detect common programming errors, stylistic issues, and code smells. It provides quick feedback on potential issues in your code.

To run ruff, use the following command:

poetry run ruff check .

black

black is an uncompromising Python code formatter. It reformats entire files in place to ensure a consistent and readable code style. It's opinionated and strives for the smallest diffs possible.

To run black, use the following command:

poetry run black .

Running these code quality checks regularly helps maintain a clean and consistent codebase, making it easier to collaborate with others and keeping the code readable and maintainable. These checks must pass before changes can be merged into the main branch.

pytest

Make sure you install dependencies first, then run the tests with Poetry:

poetry install
poetry run pytest

Citing FHR

Information on Citations of FHR

Citing the Validation Tool

Cite the validation tool when directly interacting with the tool or library. The APA citation for the FHR validation/converter software is:

Molik, D., & Wright, A. FHR File Converster [Computer software]. https://github.com/FAIR-bioHeaders/FHR-File-Converter

Or in BibTeX:

% Citation For FHR Validation/Converter Software
@software{FHR_File_Converter,
  author = {Molik, David and Wright, Adam},
  year = {2023},
  license = {PDDL-1.0},
  title = {{FHR File Converster}},
  url = {https://github.com/FAIR-bioHeaders/FHR-File-Converter},
  doi = {10.5281/zenodo.6762547}
}

Citing the Specification

Cite the specification when directly interacting with the specification (pull requests, comments on the schema). The APA citation for the FHR specification is:

Molik, D., & Wright, A. FHR Specification [Data set]. https://github.com/FAIR-bioHeaders/FHR-Specification

Or in BibTeX:

% Citation For FHR Specification
@misc{FHR_Specification,
  author = {Molik, David and Wright, Adam},
  year = {2023},
  title = {{FHR Specification}},
  url = {https://github.com/FAIR-bioHeaders/FHR-Specification},
  doi = {10.5281/zenodo.6762549}
}

Citing FHR

The citation for the FHR article in Briefings in Bioinformatics is:

Adam Wright, Mark D Wilkinson, Christopher Mungall, Scott Cain, Stephen Richards, Paul Sternberg, Ellen Provin, Jonathan L Jacobs, Scott Geib, Daniela Raciti, Karen Yook, Lincoln Stein, David C Molik, FAIR Header Reference genome: a TRUSTworthy standard, Briefings in Bioinformatics, Volume 25, Issue 3, May 2024, bbae122, https://doi.org/10.1093/bib/bbae122

Or in BibTeX:

% Citation For FHR
@article{10.1093/bib/bbae122,
  author = {Wright, Adam and Wilkinson, Mark D and Mungall, Christopher and Cain, Scott and Richards, Stephen and Sternberg, Paul and Provin, Ellen and Jacobs, Jonathan L and Geib, Scott and Raciti, Daniela and Yook, Karen and Stein, Lincoln and Molik, David C},
  title = "{FAIR Header Reference genome: a TRUSTworthy standard}",
  journal = {Briefings in Bioinformatics},
  volume = {25},
  number = {3},
  pages = {bbae122},
  year = {2024},
  month = {03},
  abstract = "{The lack of interoperable data standards among reference genome data-sharing platforms inhibits cross-platform analysis while increasing the risk of data provenance loss. Here, we describe the FAIR bioHeaders Reference genome (FHR), a metadata standard guided by the principles of Findability, Accessibility, Interoperability and Reuse (FAIR) in addition to the principles of Transparency, Responsibility, User focus, Sustainability and Technology. The objective of FHR is to provide an extensive set of data serialisation methods and minimum data field requirements while still maintaining extensibility, flexibility and expressivity in an increasingly decentralised genomic data ecosystem. The effort needed to implement FHR is low; FHR’s design philosophy ensures easy implementation while retaining the benefits gained from recording both machine and human-readable provenance.}",
  issn = {1477-4054},
  doi = {10.1093/bib/bbae122},
  url = {https://doi.org/10.1093/bib/bbae122},
  eprint = {https://academic.oup.com/bib/article-pdf/25/3/bbae122/57108923/bbae122.pdf},
}

Owner

  • Name: FAIR bioHeaders
  • Login: FAIR-bioHeaders
  • Kind: organization
  • Location: Canada

Multiple data serialized metadata for your favorite biology data.

Citation (CITATION.cff)

cff-version: 1.2.0
title: FHR File Converster
message: 'If you use this software, please cite it as below.'
type: software
authors:
  - given-names: David
    family-names: Molik
    email: david.molik@usda.gov
    affiliation: USDA ARS ABADRU
    orcid: 'https://orcid.org/0000-0003-3192-6538'
  - given-names: Adam
    family-names: Wright
    email: AWright@oicr.on.ca
    affiliation: OICR
    orcid: 'https://orcid.org/0000-0002-5719-4024'
identifiers:
  - type: doi
    value: 10.5281/zenodo.6762547
    description: Zenodo DOI
  - type: doi
    value: 10.1101/2023.11.29.569306
    description: Preprint DOI
repository-code: 'https://github.com/FAIR-bioHeaders/FHR-File-Converter'
url: 'https://github.com/FAIR-bioHeaders'
repository-artifact: 'https://github.com/FAIR-bioHeaders/FHR-Specification'
abstract: >-
  The lack of interoperable data standards among reference
  genome data-sharing platforms inhibits cross-platform
  analysis while increasing the risk of data provenance
  loss. Here, we describe the FAIR-bioHeaders Reference
  genome (FHR), a metadata standard guided by the principles
  of Findability, Accessibility, Interoperability, and Reuse
  (FAIR) in addition to the principles of Transparency,
  Responsibility, User focus, Sustainability, and Technology
  (TRUST). The objective of FHR is to provide an extensive
  set of data serialisation methods and minimum data field
  requirements while still maintaining extensibility,
  flexibility, and expressivity in an increasingly
  decentralised genomic data ecosystem. The effort needed to
  implement FHR is low; FHR’s design philosophy ensures easy
  implementation while retaining the benefits gained from
  recording both machine and human-readable provenance.
keywords:
  - FAIR
  - TRUST
  - fasta
  - yaml
  - json
  - gfa
  - microdata
  - genome assembly
license: PDDL-1.0


Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 15
  • Total pull requests: 6
  • Average time to close issues: 9 months
  • Average time to close pull requests: about 22 hours
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 1.6
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • molikd (13)
  • adamjohnwright (1)
  • Woolly-at-EBI (1)
Pull Request Authors
  • adamjohnwright (4)
  • molikd (2)
Top Labels
Issue Labels
no-issue-activity (2) enhancement (1) help wanted (1)
Pull Request Labels

Dependencies

pyproject.toml pypi
Dockerfile docker
  • python 3.9 build
poetry.lock pypi
  • argparse 1.4.0
  • attrs 23.2.0
  • flake8 7.0.0
  • html5lib 1.1
  • jsonschema 4.21.1
  • jsonschema-specifications 2023.12.1
  • mccabe 0.7.0
  • microdata 0.8.0
  • pycodestyle 2.11.1
  • pyflakes 3.2.0
  • referencing 0.33.0
  • rpds-py 0.17.1
  • six 1.16.0
  • webencodings 0.5.1
.github/workflows/codacy.yml actions
  • actions/checkout v3 composite
  • codacy/codacy-analysis-cli-action d840f886c4bd4edc059706d09c6a1586111c540b composite
  • github/codeql-action/upload-sarif v2 composite
.github/workflows/pytest.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/stale.yml actions
  • actions/stale v5 composite