https://github.com/giorginolab/mdcath

This repository houses all the scripts and notebooks utilized for generating, analyzing, and validating the mdCATH dataset. Some user examples are also available.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links (links to: arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 12.7%, to scientific vocabulary)

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: giorginolab
License: mit
Language: Jupyter Notebook
Default Branch: main
Homepage:
Size: 3.02 MB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Fork of compsciencelab/mdCATH

Created almost 2 years ago · Last pushed over 1 year ago

https://github.com/giorginolab/mdCATH/blob/main/

# mdCATH Dataset Repository

Welcome to the mdCATH dataset repository! This repository contains the scripts and notebooks used to generate, analyze, and validate the mdCATH dataset. The dataset is available on Hugging Face. All mdCATH trajectories can be visualized directly on PlayMolecule without downloading them, or downloaded from PlayMolecule in XTC format.

## Useful Links
- PlayMolecule: https://open.playmolecule.org/mdcath

- Hugging Face: https://huggingface.co/datasets/compsciencelab/mdCATH
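
As a minimal sketch of fetching a single file from the Hugging Face dataset listed above, the snippet below builds a direct download URL and, optionally, downloads through the third-party `huggingface_hub` package. The file layout (`data/mdcath_dataset_{domain}.h5`) and the domain ID `1a02F00` are assumptions used for illustration and should be checked against the dataset's file listing; the helper names are hypothetical.

```python
from urllib.parse import quote

# Dataset repository named in the Useful Links section.
REPO_ID = "compsciencelab/mdCATH"

# ASSUMPTION: one HDF5 file per CATH domain, stored under data/.
# Verify this template against the dataset's file listing before relying on it.
FILE_TEMPLATE = "data/mdcath_dataset_{domain}.h5"


def dataset_file_path(domain: str, template: str = FILE_TEMPLATE) -> str:
    """Return the (assumed) in-repository path of the file for one domain."""
    return template.format(domain=domain)


def dataset_file_url(domain: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Build a direct download URL using the Hub's datasets/<repo>/resolve/<rev>/<path> scheme."""
    path = dataset_file_path(domain)
    return f"https://huggingface.co/datasets/{repo_id}/resolve/{quote(revision)}/{quote(path)}"


def download_domain(domain: str, repo_id: str = REPO_ID) -> str:
    """Download one file into the local Hub cache and return its local path.

    Requires the third-party package `huggingface_hub`, imported lazily so
    the URL helpers above work without it.
    """
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=repo_id,
        filename=dataset_file_path(domain),
        repo_type="dataset",
    )


if __name__ == "__main__":
    # "1a02F00" is a placeholder domain ID used only for illustration.
    print(dataset_file_url("1a02F00"))
```

Building the URL separately from the download keeps the path assumption in one place, so only `FILE_TEMPLATE` needs changing if the actual layout differs.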

## Repository Structure

- #### `user`
    - Step-by-step tutorials that guide new users through common tasks with the dataset.
    - Example scripts that demonstrate practical applications of the dataset in research scenarios.

- #### `user-utils`
    - TCL code to load mdCATH's HDF5 files in VMD (for end users).
    - Python code to convert files to XTC format (for end users).

- #### `generator`
    - Scripts used to generate the dataset.
    - `builder/generator.py`: the main script responsible for dataset creation. It processes a list of CATH domains and their molecular dynamics outputs to produce H5 files for the mdCATH dataset, using multiprocessing to speed up generation. For each domain, it creates an H5 file and a log file that records the progress.

- #### `analysis`
    - Tools for analyzing the dataset.
    - Scripts and functions used to perform the analyses and generate the plots presented in the paper.
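
The generation pattern described for `builder/generator.py` (a list of domains processed in parallel, with one output file and one log file per domain) can be sketched roughly as follows. This is not the repository's code: the function names, file naming, and the empty placeholder payload are all assumptions, and only the standard library is used, so no actual HDF5 content is written.

```python
import logging
from multiprocessing import Pool
from pathlib import Path


def process_domain(args):
    """Write one output file and one log file for a single domain.

    `args` is a (domain, out_dir) tuple so the function can be used with Pool.map.
    """
    domain, out_dir = args
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_file = out_dir / f"{domain}.h5"  # hypothetical naming scheme

    # One logger and one log file per domain, so parallel workers never share a file.
    logger = logging.getLogger(f"sketch.{domain}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(out_dir / f"{domain}.log", mode="w")
    logger.addHandler(handler)
    try:
        logger.info("start %s", domain)
        # Placeholder payload: a real generator would write HDF5 content here.
        out_file.write_bytes(b"")
        logger.info("done %s", domain)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return str(out_file)


def build_all(domains, out_dir, workers=2):
    """Process all domains in a pool of worker processes; return the output paths."""
    with Pool(processes=workers) as pool:
        return pool.map(process_domain, [(d, str(out_dir)) for d in domains])
```

A call such as `build_all(["domainA", "domainB"], "out")` should be placed under an `if __name__ == "__main__":` guard, which `multiprocessing` requires on platforms that spawn worker processes. Keeping a separate log per domain makes it easy to see which domains finished and which failed without interleaved messages.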

## Citation

> Antonio Mirarchi, Toni Giorgino and Gianni De Fabritiis. *mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics*. https://arxiv.org/abs/2407.14794

```
@misc{mirarchi2024mdcathlargescalemddataset,
  title={mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics},
  author={Antonio Mirarchi and Toni Giorgino and Gianni De Fabritiis},
  year={2024},
  eprint={2407.14794},
  archivePrefix={arXiv},
  primaryClass={q-bio.BM},
  url={https://arxiv.org/abs/2407.14794},
}
```

Owner

Name: Giorgino Laboratory
Login: giorginolab
Kind: organization
Location: Milan, Italy

Website: www.giorginolab.it
Repositories: 63
Profile: https://github.com/giorginolab

Computational Biophysics

GitHub Events

Total

Push event: 1

Last Year

Push event: 1
