https://github.com/databio/bedms
Tool for standardization of genomics/epigenomics metadata
Statistics
- Stars: 3
- Watchers: 17
- Forks: 0
- Open Issues: 2
- Releases: 2
README.md
BEDMS
BEDMS (BED Metadata Standardizer) is a tool designed to standardize genomics and epigenomics metadata attributes according to user-selected schemas such as ENCODE, FAIRTRACKS and BEDBASE. BEDMS ensures consistency and FAIRness of metadata across different platforms. Users can also train their own standardizer model using a custom schema (CUSTOM), so that attributes are standardized according to their specific research requirements.
Installation
To install bedms, use this command:
```bash
pip install bedms
```
or install the latest version from the GitHub repository:
```bash
pip install git+https://github.com/databio/bedms.git
```
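After installing, it can be useful to confirm that the package is visible to the Python environment you intend to use. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and not part of BEDMS.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of the BEDMS API.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


bedms_version = installed_version("bedms")
if bedms_version is None:
    print("bedms is not installed in this environment")
else:
    print(f"bedms {bedms_version} is installed")
```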
Usage
Standardizing based on available schemas
To choose a schema to standardize against, please refer to the HuggingFace repository. Based on the schema design .yaml files, you can select which schema best represents your attributes. The example below uses the encode schema.
```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="databio/attribute-standardizer-model6", model_name="encode"
)
results = model.standardize(pep="geo/gse228634:default")

assert results
```
Training custom schemas
Training a model on your custom schema is straightforward with BEDMS. You need two things to get started:
1. Training Sets
2. training_config.yaml
To instantiate the AttrStandardizerTrainer class:
```python
from bedms.train import AttrStandardizerTrainer

trainer = AttrStandardizerTrainer("training_config.yaml")
```
To load the datasets and encode them:
```python
train_data, val_data, test_data, label_encoder, vectorizer = trainer.load_data()
```
To train the custom model:
```python
trainer.train()
```
To test the custom model:
```python
test_results_dict = trainer.test()
```
To generate visualizations such as learning curves, the confusion matrix, and the ROC curve:
```python
acc_fig, loss_fig, conf_fig, roc_fig = trainer.plot_visualizations()
```
Here, acc_fig is the accuracy curve figure object, loss_fig is the loss curve figure object, conf_fig is the confusion matrix figure object, and roc_fig is the ROC curve figure object.
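The four figure objects can be written to disk for later inspection. The sketch below assumes each object exposes a `savefig(path)` method, as matplotlib figures do; the README does not state which plotting library BEDMS uses, so treat that as an assumption. The helper `save_figures` and the stand-in class are hypothetical and exist only to keep the example self-contained.

```python
import tempfile
from pathlib import Path
from typing import Any, Dict, List


def save_figures(figures: Dict[str, Any], out_dir: str, fmt: str = "png") -> List[Path]:
    """Save each named figure to out_dir and return the written paths.

    Assumes every figure has a savefig(path) method (true for matplotlib
    figures; not confirmed for the objects BEDMS returns).
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, fig in figures.items():
        path = out / f"{name}.{fmt}"
        fig.savefig(path)
        paths.append(path)
    return paths


class _StandInFigure:
    """Placeholder that mimics savefig() so the sketch runs without a trainer."""

    def savefig(self, path: Path) -> None:
        Path(path).write_bytes(b"")


# In real use these would be acc_fig, loss_fig, conf_fig, and roc_fig.
demo_figures = {
    "accuracy": _StandInFigure(),
    "loss": _StandInFigure(),
    "confusion_matrix": _StandInFigure(),
    "roc": _StandInFigure(),
}
saved = save_figures(demo_figures, tempfile.mkdtemp())
```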
Standardizing based on custom schema
To standardize based on a custom schema, your model must be hosted on HuggingFace. The directory structure should follow the instructions given on HuggingFace.
```python
from bedms import AttrStandardizer

model = AttrStandardizer(
    repo_id="name/of/your/hf/repo", model_name="model/name"
)
results = model.standardize(pep="geo/gse228634:default")

# Dictionary of suggested predictions with their confidence:
# {'attr1': {'prediction_1': 0.70, 'prediction_2': 0.30}}
print(results)
```
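Since the result maps each attribute to several suggestions with confidences, a common next step is to keep only the strongest suggestion per attribute. The following is a minimal sketch that works on a dictionary of the shape shown above. The helper `top_predictions`, the `min_confidence` threshold, and the sample data are all hypothetical and not part of BEDMS.

```python
from typing import Dict, Optional, Tuple

Predictions = Dict[str, Dict[str, float]]


def top_predictions(
    results: Predictions, min_confidence: float = 0.5
) -> Dict[str, Optional[Tuple[str, float]]]:
    """Pick the highest-confidence suggestion for each attribute.

    Attributes whose best suggestion falls below min_confidence (or that
    have no suggestions at all) map to None so they can be reviewed by hand.
    """
    best: Dict[str, Optional[Tuple[str, float]]] = {}
    for attr, candidates in results.items():
        if not candidates:
            best[attr] = None
            continue
        label, score = max(candidates.items(), key=lambda item: item[1])
        best[attr] = (label, score) if score >= min_confidence else None
    return best


# Made-up data in the documented shape; real values come from model.standardize().
example_results = {
    "attr1": {"prediction_1": 0.70, "prediction_2": 0.30},
    "attr2": {"prediction_1": 0.40, "prediction_2": 0.35},
}
picked = top_predictions(example_results, min_confidence=0.5)
```

Returning None for low-confidence attributes, rather than dropping them, keeps every input attribute visible in the output so that nothing is silently lost.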
Owner
- Name: Databio
- Login: databio
- Kind: organization
- Location: University of Virginia
- Website: https://databio.org
- Repositories: 88
- Profile: https://github.com/databio
Solving problems in computational biology
GitHub Events
Total
- Create event: 1
- Issues event: 1
- Release event: 1
- Push event: 5
- Pull request event: 2
- Pull request review event: 3
- Pull request review comment event: 1
Last Year
- Create event: 1
- Issues event: 1
- Release event: 1
- Push event: 5
- Pull request event: 2
- Pull request review event: 3
- Pull request review comment event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 17
- Total pull requests: 19
- Average time to close issues: 22 days
- Average time to close pull requests: 8 days
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 1.65
- Average comments per pull request: 0.89
- Merged pull requests: 16
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 16
- Pull requests: 16
- Average time to close issues: 19 days
- Average time to close pull requests: 9 days
- Issue authors: 4
- Pull request authors: 2
- Average comments per issue: 1.69
- Average comments per pull request: 1.06
- Merged pull requests: 14
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- nleroy917 (4)
- saanikat (4)
- nsheff (4)
- sanghoonio (1)
Pull Request Authors
- saanikat (16)
- khoroshevskyi (4)
Dependencies
- numpy *
- pandas *
- pephubclient *
- sentence-transformers *
- torch *
- black * development
- isort * development
- pytest * development
- actions/checkout v4 composite
- actions/setup-python v5 composite
- psf/black stable composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- pypa/gh-action-pypi-publish release/v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite