mpds-aiida

Automated computational workflows based on the MPDS data platform using the CRYSTAL first-principles engine

https://github.com/mpds-io/mpds-aiida

Keywords

ab-initio materials-informatics materials-science mpds-platform

README.md

Cloud factory for accurate materials data

using the MPDS data platform, AiiDA workflows, and the CRYSTAL simulation engine.

Rationale

  • get accurate encyclopedic, reference, and benchmarking scientific data
  • get vast systematic training data for machine learning
  • use the cheap commodity cloud environment (not necessarily the HPC cluster)
  • ensure provenance tracking and reproducibility of simulations with AiiDA

Installation

The code in this repo requires the aiida-crystal-dft, yascheduler, and mpds-ml-labs Python packages to be installed. These in turn depend on aiida, mpds_client, and other Python packages.

Thus, installation is as follows (replace pip with pip3 if needed, and mind your virtual environment):

    pip install git+https://github.com/tilde-lab/aiida-crystal-dft
    pip install git+https://github.com/tilde-lab/yascheduler
    pip install git+https://github.com/mpds-io/mpds-ml-labs
    git clone https://github.com/mpds-io/mpds-aiida
    pip install mpds-aiida/
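After installation, it may help to confirm that the packages are visible to the active interpreter. The following is a minimal sketch using only the standard library. The distribution names are assumed to match the repository names above and may differ in practice, and `missing_distributions` is a hypothetical helper, not part of any of these packages.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: distribution names mirror the repository names used above.
REQUIRED = ["aiida-crystal-dft", "yascheduler", "mpds-ml-labs", "mpds-aiida"]


def missing_distributions(names):
    """Return the names for which no installed distribution metadata is found."""
    missing = []
    for name in names:
        try:
            version(name)
        except PackageNotFoundError:
            missing.append(name)
    return missing


print("Missing:", missing_distributions(REQUIRED) or "none")
```

An empty result only shows that the metadata is present in the current environment; it does not verify that the packages are configured.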

Some AiiDA experience is assumed here. Since AiiDA does not support cloud environments, the custom cloud scheduler engine yascheduler should be used. It manages the CRYSTAL simulation engine on cloud VPS instances and encapsulates all the details of remote task submission, queueing, and results retrieval, as well as VPS management. The scheduler runs its own daemon and lives on the same machine as AiiDA. However, AiiDA treats it as a remote service accessible via the ssh transport, so the command ssh $USER@localhost should succeed. To achieve that, one might run e.g.:

    ssh-keygen -t rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    ssh $USER@localhost

(Note that AiiDA should be made aware of the ~/.ssh/id_rsa.pub key file during the SSH setup!)
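The requirement that ssh $USER@localhost should pass can also be checked non-interactively. The sketch below assumes the OpenSSH client is installed; `ssh_check_command` and `can_ssh` are hypothetical helper names introduced here for illustration.

```python
import getpass
import subprocess


def ssh_check_command(user, host="localhost"):
    """Build an ssh command that exits with 0 only if key-based login works."""
    return [
        "ssh",
        "-o", "BatchMode=yes",      # never prompt for a password or passphrase
        "-o", "ConnectTimeout=5",   # fail fast if sshd is unreachable
        f"{user}@{host}",
        "true",                     # remote no-op command
    ]


def can_ssh(user=None, host="localhost"):
    """Return True if the passwordless login succeeds, False otherwise."""
    user = user or getpass.getuser()
    try:
        result = subprocess.run(ssh_check_command(user, host), capture_output=True)
    except FileNotFoundError:
        return False  # no ssh client on PATH
    return result.returncode == 0
```

Calling `can_ssh()` returns a boolean. Because batch mode cannot answer the host key prompt, the first connection to localhost should be made interactively, as in the commands above.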

For simplicity, yascheduler can share its database with AiiDA. Setting up yascheduler looks like this:

    vi /etc/yascheduler/yascheduler.conf
    yainit
    service yascheduler start

AiiDA should be set up as usual, and a stub remote computer (e.g. cluster: yascheduler) as well as a stub CRYSTAL code (e.g. codes: Pcrystal) should be added:

    reentry scan
    verdi setup
    verdi computer setup
    verdi computer configure ssh $COMPUTER
    verdi computer test $COMPUTER --print-traceback
    verdi code setup

Why stub? Because computer and code management is delegated to yascheduler, which takes care of on-demand cloud resource management.

The Gaussian basis sets used by the CRYSTAL engine should be added to the AiiDA database. We download the entire basis set library from the CRYSTAL website and save selected basis sets as *.basis files using the script scripts/bs_unito_download.py. Then, in a subfolder with the *.basis files, one runs:

    verdi data crystal_dft uploadfamily --name=$BASIS_FAMILY

or, to add the internal basis sets predefined in CRYSTAL:

    verdi data crystal_dft createpredefined

Then the desired name ($BASIS_FAMILY) should be used in the calculation settings inside mpds_aiida/calc_templates (see below).

Usage

The MPDS platform is the main data source for generating the simulation inputs and checking the simulation results. Access to the binary compounds data subset is free; one should log in at the MPDS and get the MPDS API key:

    export MPDS_KEY=...

(Please do not forget to withdraw, i.e. invalidate, the API key after finishing the work.)
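A script that depends on the key can fail early with a clear message when the variable is missing. This is a minimal sketch: `get_mpds_key` is a hypothetical helper, and how mpds-aiida itself reads the variable is not stated here.

```python
import os


def get_mpds_key(env=None):
    """Return the MPDS API key from the environment, or raise a clear error."""
    env = os.environ if env is None else env
    key = env.get("MPDS_KEY", "").strip()
    if not key:
        raise RuntimeError("MPDS_KEY is not set; export it before running the workflows")
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.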

A template system is used to control the calculation parameters; see the mpds_aiida/calc_templates subfolder. Note that the options: resources template directive makes no sense with our custom cloud scheduler. The cluster, codes, and basis_family template directives have to be specified exactly as defined above.

The following on-demand cloud providers are currently supported (resp. yascheduler directives given in brackets):

  • Hetzner (hetzner_token, hetzner_max_nodes), API token must be issued for a project
  • Upcloud (upcloud_login, upcloud_pass, upcloud_max_nodes), API permissions are set in account settings
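The directives listed above can be checked before starting the scheduler. The sketch below parses an INI-style text with the standard library and reports which providers have all their credential directives filled in. The section name `clouds` is an assumption, as are the helper name `configured_providers` and the placeholder values; only the directive names come from the list above.

```python
import configparser

# Directive names come from the list above; the section name is an assumption.
PROVIDER_DIRECTIVES = {
    "hetzner": ["hetzner_token"],
    "upcloud": ["upcloud_login", "upcloud_pass"],
}


def configured_providers(config_text, section="clouds"):
    """Return providers whose credential directives are all present and non-empty."""
    parser = configparser.ConfigParser(interpolation=None)  # tokens may contain '%'
    parser.read_string(config_text)
    if not parser.has_section(section):
        return []
    found = []
    for provider, keys in PROVIDER_DIRECTIVES.items():
        if all(parser.get(section, key, fallback="").strip() for key in keys):
            found.append(provider)
    return found


# Hypothetical config fragment with placeholder values.
sample = """
[clouds]
hetzner_token = PLACEHOLDER
hetzner_max_nodes = 2
"""
print(configured_providers(sample))
```

For the sample fragment only Hetzner is reported, since the Upcloud login and password are absent.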

At the time of writing, the default Hetzner configuration (CX51) runs a test task in 2-2.5 hours on average and costs EUR 35.88 per month; the default Upcloud configuration (8 cores, 4 GB memory) runs a test task in 1.5 hours on average and costs $89 per month.
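These figures translate into a rough cost per test task. The sketch below assumes a 30-day month of continuous uptime and uses the midpoint of the 2-2.5 hour range for Hetzner; both are assumptions made for illustration, and the two results are in different currencies, so they are not directly comparable.

```python
HOURS_PER_MONTH = 30 * 24  # assumption: a 30-day month of continuous uptime


def cost_per_task(monthly_price, task_hours, hours_per_month=HOURS_PER_MONTH):
    """Share of the monthly price consumed by one task of the given duration."""
    return monthly_price * task_hours / hours_per_month


hetzner = cost_per_task(35.88, 2.25)  # EUR, midpoint of the 2-2.5 h range
upcloud = cost_per_task(89.0, 1.5)    # USD

print(f"Hetzner: EUR {hetzner:.3f} per task")
print(f"Upcloud: USD {upcloud:.3f} per task")
```

Under these assumptions a test task costs about EUR 0.11 on Hetzner and about USD 0.19 on Upcloud.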

More examples are given in the scripts subfolder.

The operating principle is briefly illustrated below.

[Figure: General workflow]

Note: this repo is subject to change and is a work in progress.

Licensing

The resulting data are available at the MPDS platform under the CC BY 4.0 license.

Issues and troubleshooting

Please report any issues in the respective repositories: aiida-crystal-dft, yascheduler, mpds-ml-labs, aiida, mpds_client, etc.

Google Cloud machines must first be prepared via the web-browser SSH console (note sudo -i). The file /etc/ssh/sshd_config should be changed to allow the root user to log in.

Amazon EC2 machines must first be accessed as the admin user (note sudo -i). Then the file /root/.ssh/authorized_keys needs to be cleaned to allow the root user to log in.

Owner

  • Name: MPDS
  • Login: mpds-io
  • Kind: organization
  • Email: support@mpds.io
  • Location: The Internet

MPDS stands for Materials Platform for Data Science. It provides curated high-quality materials data, manually extracted from about 400k scientific publications.

Citation (CITATION.cff)

cff-version: 1.2.0
title: mpds-aiida
type: software
license: MIT
authors:
  - given-names: Andrey
    family-names: Sobolev
    orcid: 'https://orcid.org/0000-0001-5086-6601'
  - given-names: Evgeny
    family-names: Blokhin
    orcid: 'https://orcid.org/0000-0002-5333-3947'
doi: 10.5281/zenodo.7693214
url: 'https://github.com/mpds-io/mpds-aiida'
keywords:
  - AiiDA
  - materials science
  - ab initio
  - first-principles
  - materials informatics
  - MPDS platform
