https://github.com/databio/bedms

Tool for standardization of genomics/epigenomics metadata

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.6%) to scientific vocabulary

Keywords

genetics genomic-intervals metadata

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: databio
License: bsd-2-clause
Language: Python
Default Branch: master
Homepage:
Size: 13.9 MB

Statistics

Stars: 3
Watchers: 17
Forks: 0
Open Issues: 2
Releases: 2

Topics

genetics genomic-intervals metadata

Created almost 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License

BEDMS

BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as ENCODE, FAIRTRACKS, and BEDBASE. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model on a custom schema (CUSTOM), so that attributes are standardized to fit their specific research requirements.

Installation

To install bedms, run `pip install bedms`. To install the latest version from the GitHub repository instead, run `pip install git+https://github.com/databio/bedms.git`.
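To confirm that the installation succeeded, a quick importability check can help. The sketch below uses only the Python standard library; the helper name `is_installed` is hypothetical and not part of bedms, and it assumes the distribution name matches the pip name `bedms`.

```python
from importlib import metadata, util


def is_installed(package: str) -> bool:
    """Return True if the import system can locate the top-level `package`."""
    return util.find_spec(package) is not None


if is_installed("bedms"):
    try:
        # Assumes the installed distribution is named "bedms", as in the pip command.
        print("bedms version:", metadata.version("bedms"))
    except metadata.PackageNotFoundError:
        print("bedms is importable, but no distribution metadata was found")
else:
    print("bedms is not installed in this environment")
```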

Usage

Standardizing based on available schemas

To choose a schema to standardize against, refer to the HuggingFace repository. The schema design .yaml files there show which schema best represents your attributes. The example below uses the encode schema.

```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")

assert results
```

Training custom schemas

Training a custom schema with BEDMS requires two things:

1. Training sets
2. training_config.yaml

To instantiate the AttrStandardizerTrainer class:

```python
from bedms.train import AttrStandardizerTrainer

trainer = AttrStandardizerTrainer("training_config.yaml")
```

To load the datasets and encode them:

```python
train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()
```

To train the custom model:

```python
trainer.train()
```

To test the custom model:

```python
test_results_dict = trainer.test()
```

To generate visualizations such as Learning Curves, Confusion Matrices, and ROC Curve:

```python
acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()
```

Here, acc_fig is the Accuracy Curve figure object, loss_fig is the Loss Curve figure object, conf_fig is the Confusion Matrix figure object, and roc_fig is the ROC Curve figure object.
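The returned figure objects can be written to disk for later inspection. The sketch below assumes they behave like matplotlib figures and expose a `savefig(path)` method, which the README does not state explicitly. The helper `save_figures` and the `plots` directory name are hypothetical.

```python
from pathlib import Path


def save_figures(figures: dict, out_dir: str = "plots") -> list:
    """Save each figure as <out_dir>/<name>.png and return the written paths.

    Assumes every value exposes a matplotlib-style `savefig(path)` method.
    """
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = target / f"{name}.png"
        fig.savefig(path)  # assumption: matplotlib-like figure API
        paths.append(path)
    return paths
```

Under that assumption, a call such as `save_figures({"accuracy": acc_fig, "loss": loss_fig, "confusion": conf_fig, "roc": roc_fig})` would write one PNG per figure.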

Standardizing based on custom schema

To standardize based on a custom schema, your model must be hosted on HuggingFace. Its directory structure should follow the instructions given on HuggingFace.

```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")

# Dictionary of suggested predictions with their confidence:
# {'attr1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
print(results)
```
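Because the result maps each attribute to several candidate predictions with confidences, a common next step is to keep only the best candidate per attribute. The sketch below shows one way to do that on a dictionary of the shape described above. The function `top_predictions`, the `min_confidence` threshold, and the example values are all hypothetical and not part of BEDMS.

```python
def top_predictions(results: dict, min_confidence: float = 0.5) -> dict:
    """Map each attribute to its highest-confidence prediction.

    Attributes whose best score is below `min_confidence` map to None.
    """
    chosen = {}
    for attribute, candidates in results.items():
        if not candidates:
            chosen[attribute] = None
            continue
        label, score = max(candidates.items(), key=lambda item: item[1])
        chosen[attribute] = label if score >= min_confidence else None
    return chosen


# Made-up data in the documented shape, not real BEDMS output.
example = {
    "attr1": {"prediction_1": 0.70, "prediction_2": 0.30},
    "attr2": {"prediction_1": 0.40, "prediction_2": 0.35},
}
print(top_predictions(example))
```

Returning None for low-confidence attributes keeps uncertain suggestions visible for manual review rather than silently accepting them.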

Owner

Name: Databio
Login: databio
Kind: organization
Location: University of Virginia

Website: https://databio.org
Repositories: 88
Profile: https://github.com/databio

Solving problems in computational biology

GitHub Events

Total

Create event: 1
Issues event: 1
Release event: 1
Push event: 5
Pull request event: 2
Pull request review event: 3
Pull request review comment event: 1

Last Year

Create event: 1
Issues event: 1
Release event: 1
Push event: 5
Pull request event: 2
Pull request review event: 3
Pull request review comment event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 17
Total pull requests: 19
Average time to close issues: 22 days
Average time to close pull requests: 8 days
Total issue authors: 4
Total pull request authors: 2
Average comments per issue: 1.65
Average comments per pull request: 0.89
Merged pull requests: 16
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 16
Pull requests: 16
Average time to close issues: 19 days
Average time to close pull requests: 9 days
Issue authors: 4
Pull request authors: 2
Average comments per issue: 1.69
Average comments per pull request: 1.06
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

nleroy917 (4)
saanikat (4)
nsheff (4)
sanghoonio (1)

Pull Request Authors

saanikat (16)
khoroshevskyi (4)

Top Labels

Issue Labels

enhancement (1) question (1)

Pull Request Labels

Dependencies

requirements/requirements-all.txt pypi

numpy *
pandas *
pephubclient *
sentence-transformers *
torch *

requirements/requirements-dev.txt pypi

black * development
isort * development
pytest * development

setup.py pypi

.github/workflows/black.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite
psf/black stable composite

.github/workflows/python-publish.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite
pypa/gh-action-pypi-publish release/v1 composite

.github/workflows/run-pytest.yml actions

actions/checkout v4 composite
actions/setup-python v5 composite
