https://github.com/giorginolab/mdcath

This repository houses all the scripts and notebooks utilized for generating, analyzing, and validating the mdCATH dataset. Some user examples are also available.

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: giorginolab
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 3.02 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Fork of compsciencelab/mdCATH
Created over 1 year ago · Last pushed over 1 year ago

https://github.com/giorginolab/mdCATH/blob/main/

# mdCATH Dataset Repository

Welcome to the mdCATH dataset repository! This repository houses all the scripts and notebooks used for generating, analyzing, and validating the mdCATH dataset. The dataset is available on the Hugging Face platform. All mdCATH trajectories can be visualized directly on PlayMolecule without downloading them, or downloaded from PlayMolecule in XTC format if needed.

## Useful Links
- PlayMolecule: https://open.playmolecule.org/mdcath
- Hugging Face: https://huggingface.co/datasets/compsciencelab/mdCATH

## Repository Structure

- #### `user`
  - Provides tutorials and example scripts to help new users familiarize themselves with the dataset.
  - Step-by-step tutorials to guide users through common tasks and procedures using the dataset.
  - Example scripts that demonstrate practical applications of the dataset in research scenarios.

- #### `user-utils`
  - TCL code to load mdCATH's HDF5 files in VMD (for end-users)
  - Python code to convert files to XTC format (for end-users)

- #### `generator`
  - Directory with the scripts used to generate the dataset.
  - `builder/generator.py`: is the main script responsible for dataset creation. It processes a list of CATH domains and their molecular dynamics outputs to produce H5 files for the mdCATH dataset. It features multiprocessing to accelerate the dataset generation process. For each domain, an H5 file is created accompanied by a log file that records the progress.

- #### `analysis`
  - Houses tools required for analyzing the dataset.
  - This directory includes various scripts and functions used to perform the analyses and generate the plots presented in the paper.

## Citation

> Antonio Mirarchi, Toni Giorgino and Gianni De Fabritiis. *mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics*. https://arxiv.org/abs/2407.14794

```
@misc{mirarchi2024mdcathlargescalemddataset,
  title={mdCATH: A Large-Scale MD Dataset for Data-Driven Computational Biophysics},
  author={Antonio Mirarchi and Toni Giorgino and Gianni De Fabritiis},
  year={2024},
  eprint={2407.14794},
  archivePrefix={arXiv},
  primaryClass={q-bio.BM},
  url={https://arxiv.org/abs/2407.14794},
}
```
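The `generator` entry under Repository Structure describes a per-domain workflow: a list of CATH domains is processed in parallel, and each domain gets one H5 file plus one log file. The following is a minimal sketch of that pattern only, not the code in `builder/generator.py`. The function names `process_domain` and `build_dataset`, the `<domain>.h5` / `<domain>.log` naming scheme, and the domain ids are all hypothetical, and the output file is an empty placeholder rather than real HDF5 so the example runs with the standard library alone.

```python
import logging
import multiprocessing as mp
import tempfile
from pathlib import Path


def process_domain(args):
    """Write one placeholder output file and one log file for a single domain."""
    domain_id, out_dir = args
    out_dir = Path(out_dir)
    log_path = out_dir / f"{domain_id}.log"  # hypothetical naming scheme
    h5_path = out_dir / f"{domain_id}.h5"    # hypothetical naming scheme

    # One logger per domain, so each log file records only its own progress.
    logger = logging.getLogger(f"sketch.{domain_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.FileHandler(log_path, mode="w")
    logger.addHandler(handler)
    try:
        logger.info("started %s", domain_id)
        # Placeholder: a real builder would write simulation data here
        # (for example with h5py); this sketch only creates an empty file.
        h5_path.write_bytes(b"")
        logger.info("finished %s", domain_id)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return domain_id


def build_dataset(domain_ids, out_dir, workers=2):
    """Process domains in parallel and return the sorted ids that completed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    tasks = [(domain_id, str(out_dir)) for domain_id in domain_ids]
    with mp.Pool(processes=workers) as pool:
        return sorted(pool.imap_unordered(process_domain, tasks))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical domain ids, used only for illustration.
        print(build_dataset(["domainA", "domainB", "domainC"], tmp))
```

Keeping one log file per domain means a failed or interrupted domain can be identified and rerun without reading through interleaved output from the other worker processes.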

Owner

  • Name: Giorgino Laboratory
  • Login: giorginolab
  • Kind: organization
  • Location: Milan, Italy

Computational Biophysics

GitHub Events

Total
  • Push event: 1
Last Year
  • Push event: 1