https://github.com/databio/bedms

Tool for standardization of genomics/epigenomics metadata

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.6%) to scientific vocabulary

Keywords

genetics genomic-intervals metadata
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: databio
  • License: bsd-2-clause
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 13.9 MB
Statistics
  • Stars: 3
  • Watchers: 17
  • Forks: 0
  • Open Issues: 2
  • Releases: 2
Topics
genetics genomic-intervals metadata
Created almost 2 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.md

BEDMS

BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as ENCODE, FAIRTRACKS, and BEDBASE. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model with a custom schema (CUSTOM), so that attributes are standardized to fit their specific research requirements.

Installation

To install bedms, run:

    pip install bedms

To install the latest version from the GitHub repository instead, run:

    pip install git+https://github.com/databio/bedms.git
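To confirm the installation from Python, the standard library can report the installed version. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes the distribution is registered under the name `bedms`, matching the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Assumption: the distribution name is "bedms", as used with pip.
print(installed_version("bedms"))
```

A result of `None` means the package is not visible to the current interpreter, which often points to a different virtual environment being active.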

Usage

Standardizing based on available schemas

To choose the schema to standardize against, refer to the HuggingFace repository. Based on the schema design .yaml files, you can select the schema that best represents your attributes. The example below uses the encode schema.

```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")

assert results
```

Training custom schemas

Training a custom schema with BEDMS is straightforward. You need two things to get started:

1. Training sets
2. training_config.yaml

To instantiate the AttrStandardizerTrainer class:

```python
from bedms.train import AttrStandardizerTrainer

trainer = AttrStandardizerTrainer("training_config.yaml")
```

To load the datasets and encode them:

```python
train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()
```

To train the custom model:

```python
trainer.train()
```

To test the custom model:

```python
test_results_dict = trainer.test()
```

To generate visualizations such as learning curves, confusion matrices, and ROC curves:

```python
acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()
```

Here, acc_fig is the accuracy curve figure object, loss_fig is the loss curve figure object, conf_fig is the confusion matrix figure object, and roc_fig is the ROC curve figure object.
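If you want to keep the plots, the figure objects can be written to disk. The sketch below is not part of BEDMS: the helper `save_figures` and the output directory name are hypothetical, and it assumes each figure object exposes a matplotlib-style `savefig(path)` method, which the README does not state explicitly.

```python
from pathlib import Path
from typing import Dict, List


def save_figures(figures: Dict[str, object], out_dir: str = "plots") -> List[Path]:
    """Write each figure to <out_dir>/<name>.png and return the paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        path = target / f"{name}.png"
        # Assumption: `fig` behaves like a matplotlib Figure (has savefig).
        fig.savefig(path)
        written.append(path)
    return written


# Example call with the four figures returned by plot_visualizations():
# save_figures({"accuracy": acc_fig, "loss": loss_fig,
#               "confusion_matrix": conf_fig, "roc": roc_fig})
```

Keying the figures by name keeps the file names predictable, which helps when comparing runs of different training configurations.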

Standardizing based on custom schema

To standardize against a custom schema, your model must be hosted on HuggingFace. The repository's directory structure should follow the instructions on HuggingFace.

```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")

# Dictionary of suggested predictions with their confidence:
# {'attr1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
print(results)
```
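A common next step is to pick one suggestion per attribute. The sketch below assumes the result has the nested-dictionary shape shown in the comment above (attribute name mapped to a dictionary of predictions and confidences). The helper `best_predictions`, the `min_confidence` threshold, and the sample values are hypothetical and not part of BEDMS.

```python
from typing import Dict, Optional, Tuple

Results = Dict[str, Dict[str, float]]


def best_predictions(
    results: Results, min_confidence: float = 0.5
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Return the highest-confidence suggestion per attribute.

    Attributes with no suggestions, or whose best suggestion scores
    below `min_confidence`, map to None.
    """
    best = {}
    for attribute, suggestions in results.items():
        if not suggestions:
            best[attribute] = None
            continue
        label, score = max(suggestions.items(), key=lambda item: item[1])
        best[attribute] = (label, score) if score >= min_confidence else None
    return best


# Hypothetical sample mimicking the documented output shape.
sample = {
    "attr1": {"prediction_1": 0.70, "prediction_2": 0.30},
    "attr2": {"prediction_3": 0.40, "prediction_4": 0.35},
}
chosen = best_predictions(sample)
print(chosen)
```

Attributes that map to `None` are good candidates for manual review, since no suggestion cleared the threshold.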

Owner

  • Name: Databio
  • Login: databio
  • Kind: organization
  • Location: University of Virginia

Solving problems in computational biology

GitHub Events

Total
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Push event: 5
  • Pull request event: 2
  • Pull request review event: 3
  • Pull request review comment event: 1
Last Year
  • Create event: 1
  • Issues event: 1
  • Release event: 1
  • Push event: 5
  • Pull request event: 2
  • Pull request review event: 3
  • Pull request review comment event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 17
  • Total pull requests: 19
  • Average time to close issues: 22 days
  • Average time to close pull requests: 8 days
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 1.65
  • Average comments per pull request: 0.89
  • Merged pull requests: 16
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 16
  • Pull requests: 16
  • Average time to close issues: 19 days
  • Average time to close pull requests: 9 days
  • Issue authors: 4
  • Pull request authors: 2
  • Average comments per issue: 1.69
  • Average comments per pull request: 1.06
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nleroy917 (4)
  • saanikat (4)
  • nsheff (4)
  • sanghoonio (1)
Pull Request Authors
  • saanikat (16)
  • khoroshevskyi (4)
Top Labels
Issue Labels
enhancement (1) question (1)
Pull Request Labels

Dependencies

requirements/requirements-all.txt pypi
  • numpy *
  • pandas *
  • pephubclient *
  • sentence-transformers *
  • torch *
requirements/requirements-dev.txt pypi
  • black * development
  • isort * development
  • pytest * development
setup.py pypi
.github/workflows/black.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • psf/black stable composite
.github/workflows/python-publish.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • pypa/gh-action-pypi-publish release/v1 composite
.github/workflows/run-pytest.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite