Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science

Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science - Published in JOSS (2024)

https://github.com/mlmi2-cssi/foundry

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
  • Committers with academic emails
    7 of 24 committers (29.2%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

chemistry data-science datasets machine-learning materials-science

Keywords from Contributors

meshing wavelets standardization

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 30% confidence
Last synced: 4 months ago

Repository

Simplifying the discovery and usage of machine-learning ready datasets in materials science and chemistry

Basic Info
  • Host: GitHub
  • Owner: MLMI2-CSSI
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 46.4 MB
Statistics
  • Stars: 84
  • Watchers: 5
  • Forks: 18
  • Open Issues: 35
  • Releases: 41
Topics
chemistry data-science datasets machine-learning materials-science
Created almost 6 years ago · Last pushed 8 months ago
Metadata Files
Readme License Support

README.md

Foundry-ML simplifies the discovery and usage of ML-ready datasets in materials science and chemistry, providing a simple API to access even complex datasets.

* Load ML-ready data with just a few lines of code.
* Work with datasets in local or cloud environments.
* Publish your own datasets with Foundry to promote community usage.
* (In progress) Run published ML models without hassle.

Learn more and see our available datasets on Foundry-ML.org.

Documentation

Information on how to install and use Foundry is available in our documentation.

Information on publishing and running models is available in the DLHub documentation.

Quick Start

Install Foundry-ML via command line with: pip install foundry_ml
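After installing, one quick way to confirm which version pip picked up is the standard-library `importlib.metadata` module. This is a minimal sketch: the helper name `installed_version` is hypothetical, and it assumes the installed distribution is named `foundry_ml`, matching the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "foundry_ml") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper; the default name 'foundry_ml' is an assumption
    taken from the pip command above.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    v = installed_version()
    print(f"foundry_ml {v}" if v else "foundry_ml is not installed")
```

If this reports that the package is missing, the `pip install` step most likely ran in a different Python environment from the one you are using.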

You can use the following code to import and instantiate Foundry-ML, then load a dataset.

```python
from foundry import Foundry

f = Foundry(index="mdf")
f = f.load("10.18126/e73h-3w6n", globus=True)
```

*NOTE*: If you run locally and don't want to install the [Globus Connect Personal endpoint](https://www.globus.org/globus-connect-personal), just set `globus=False`.

If running this code in a notebook, a table of metadata for the dataset will appear:

[Figure: dataset metadata table]

We can load the data with `f.load_data()`, select a split such as `train` to access a particular segment of the dataset, and then use matplotlib to visualize it.

```python
import matplotlib.pyplot as plt

res = f.load_data()

# Inputs and targets for the training split
imgs = res['train']['input']['imgs']
desc = res['train']['input']['metadata']
coords = res['train']['target']['coords']

# Show n_images consecutive samples, starting at position `offset`
n_images = 3
offset = 150
key_list = list(res['train']['input']['imgs'].keys())[0+offset:n_images+offset]

fig, axs = plt.subplots(1, n_images, figsize=(20,20))
for i in range(n_images):
    axs[i].imshow(imgs[key_list[i]])
    # Overlay the target (x, y) coordinates as semi-transparent red points
    axs[i].scatter(coords[key_list[i]][:,0], coords[key_list[i]][:,1], s=20, c='r', alpha=0.5)
plt.show()
```
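The indexing in the example above suggests that `load_data()` returns nested dictionaries addressed as `res[split][role][field]`, where each field maps sample keys to values. Assuming that layout (inferred from this example only, not from Foundry's documentation), the hypothetical helpers `summarize_splits` and `select_window` below summarize each split and pick the same window of samples that the plotting loop uses, without needing matplotlib.

```python
from typing import Any, Dict, List, Tuple

# Assumed layout: res[split][role][field][sample_key] -> value
Result = Dict[str, Dict[str, Dict[str, Any]]]


def summarize_splits(res: Result) -> Dict[str, Dict[str, int]]:
    """Count entries per split for every 'role/field' path (hypothetical helper)."""
    summary: Dict[str, Dict[str, int]] = {}
    for split, roles in res.items():
        counts: Dict[str, int] = {}
        for role, fields in roles.items():
            for field, values in fields.items():
                counts[f"{role}/{field}"] = len(values)
        summary[split] = counts
    return summary


def select_window(
    res: Result,
    split: str = "train",
    offset: int = 0,
    n_images: int = 3,
    image_field: str = "imgs",
    coord_field: str = "coords",
) -> List[Tuple[Any, Any, Any]]:
    """Return (key, image, coords) triples for a contiguous window of sample keys.

    Mirrors the README's slice list(keys)[offset:offset + n_images].
    """
    imgs = res[split]["input"][image_field]
    coords = res[split]["target"][coord_field]
    keys = list(imgs.keys())[offset:offset + n_images]
    return [(k, imgs[k], coords[k]) for k in keys]


if __name__ == "__main__":
    # Tiny stand-in for a real result, using the field names from the example
    toy = {
        "train": {
            "input": {"imgs": {"a": [[0]], "b": [[1]]}, "metadata": {"a": {}, "b": {}}},
            "target": {"coords": {"a": [[0, 0]], "b": [[1, 1]]}},
        }
    }
    print(summarize_splits(toy))
    print(select_window(toy, offset=1, n_images=1))
```

Checking the summary before plotting is a cheap way to confirm that a split and field exist and that `offset + n_images` does not run past the number of samples.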

See the full examples in the examples/ directory of the repository.

How to Cite

If you find Foundry-ML useful, please cite the following paper:

@article{Schmidt2024,
  doi = {10.21105/joss.05467},
  url = {https://doi.org/10.21105/joss.05467},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {93},
  pages = {5467},
  author = {Kj Schmidt and Aristana Scourtas and Logan Ward and Steve Wangen and Marcus Schwarting and Isaac Darling and Ethan Truelove and Aadit Ambadkar and Ribhav Bose and Zoa Katok and Jingrui Wei and Xiangguo Li and Ryan Jacobs and Lane Schultz and Doyeon Kim and Michael Ferris and Paul M. Voyles and Dane Morgan and Ian Foster and Ben Blaiszik},
  title = {Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science},
  journal = {Journal of Open Source Software}
}

Contributing

Foundry is an open-source project, and we encourage contributions from the community. To contribute, please fork the repository from the main branch and open a pull request against the main branch. A member of our team will review your PR shortly.

Developer notes

To enforce consistency with external schemas for the metadata and DataCite structures (contained in the MDF data schema repository), the dc_model.py and project_model.py pydantic data models (found in the foundry/jsonschema_models folder) were generated with the datamodel-code-generator tool. To comply with flake8 linting, the --use-annotated flag was passed so that regex patterns in dc_model.py are specified with the Annotated type (from Python's typing module) rather than pydantic's soon-to-be-deprecated constr type. The command used to run datamodel-code-generator looks like: datamodel-codegen --input dc.json --output dc_model.py --use-annotated
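The idea behind --use-annotated is that a constraint travels as metadata attached to a plain type, instead of being baked into a special constrained type such as constr. The stdlib-only toy below illustrates that mechanism; it does not use pydantic and is not how pydantic validates internally. The `Pattern` marker, the simplified `DOI` regex, `DatasetRef`, and `validate` are all hypothetical names chosen for illustration. (In pydantic v2 itself, the analogous declaration would be along the lines of `Annotated[str, Field(pattern=...)]`.)

```python
import re
from dataclasses import dataclass, fields
from typing import Annotated, get_type_hints


@dataclass(frozen=True)
class Pattern:
    """Hypothetical marker carrying a regex, attached via Annotated metadata."""
    regex: str


# Hypothetical, simplified DOI constraint for illustration only
DOI = Annotated[str, Pattern(r"^10\.\d{4,9}/\S+$")]


@dataclass
class DatasetRef:
    identifier: DOI


def validate(obj) -> None:
    """Raise ValueError if any field's value fails a Pattern in its Annotated metadata."""
    hints = get_type_hints(type(obj), include_extras=True)
    for f in fields(obj):
        # Plain annotations have no __metadata__, so they are skipped
        for meta in getattr(hints[f.name], "__metadata__", ()):
            if isinstance(meta, Pattern):
                value = getattr(obj, f.name)
                if not re.fullmatch(meta.regex, value):
                    raise ValueError(f"{f.name}={value!r} does not match {meta.regex}")


if __name__ == "__main__":
    validate(DatasetRef("10.18126/e73h-3w6n"))  # passes silently
    try:
        validate(DatasetRef("not-a-doi"))
    except ValueError as err:
        print(err)
```

Because the constraint lives in the annotation's metadata, the field is still typed as a plain `str` for linters and type checkers, which is the property the flake8 note above relies on.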

Primary Support

This work was supported by the National Science Foundation under NSF Award Number: 1931306 "Collaborative Research: Framework: Machine Learning Materials Innovation Infrastructure".

Other Support

Foundry-ML brings together many components in the materials data ecosystem, including MAST-ML, the Data and Learning Hub for Science (DLHub), and the Materials Data Facility (MDF).

MAST-ML

This work was supported by the National Science Foundation (NSF) SI2 award No. 1148011 and DMREF award No. DMR-1332851.

The Data and Learning Hub for Science (DLHub)

This material is based upon work supported by Laboratory Directed Research and Development (LDRD) funding from Argonne National Laboratory, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357. https://www.dlhub.org

The Materials Data Facility

This work was performed under financial assistance award 70NANB14H012 from the U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD). This work was also performed under financial assistance award 70NANB19H005 from the U.S. Department of Commerce, National Institute of Standards and Technology as part of the Center for Hierarchical Materials Design (CHiMaD). Additionally, this work was supported by the National Science Foundation as part of the Midwest Big Data Hub under NSF Award Number: 1636950 "BD Spokes: SPOKE: MIDWEST: Collaborative: Integrative Materials Design (IMaD): Leverage, Innovate, and Disseminate". https://www.materialsdatafacility.org

Owner

  • Name: Machine Learning Materials Innovation Infrastructure - NSF CSSI Project
  • Login: MLMI2-CSSI
  • Kind: organization

JOSS Publication

Foundry-ML - Software and Services to Simplify Access to Machine Learning Datasets in Materials Science
Published
January 23, 2024
Volume 9, Issue 93, Page 5467
Authors
Kj Schmidt ORCID
Globus, University of Chicago, Chicago, IL, United States of America, Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America
Aristana Scourtas ORCID
Globus, University of Chicago, Chicago, IL, United States of America, Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America
Logan Ward ORCID
Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America
Steve Wangen
Data Science Institute, University of Wisconsin-Madison, Madison, WI, United States of America
Marcus Schwarting ORCID
Department of Computer Science, University of Chicago, Chicago, IL, United States of America
Isaac Darling
Department of Computer Science, University of Chicago, Chicago, IL, United States of America
Ethan Truelove
Department of Computer Science, University of Chicago, Chicago, IL, United States of America
Aadit Ambadkar
Globus, University of Chicago, Chicago, IL, United States of America
Ribhav Bose
Globus, University of Chicago, Chicago, IL, United States of America
Zoa Katok
Globus, University of Chicago, Chicago, IL, United States of America
Jingrui Wei
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Xiangguo Li
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Ryan Jacobs ORCID
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Lane Schultz
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Doyeon Kim
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Michael Ferris ORCID
Department of Computer Science, University of Wisconsin-Madison, Madison, WI, United States of America
Paul M. Voyles ORCID
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Dane Morgan ORCID
Department of Materials Science and Engineering, University of Wisconsin-Madison, Madison, WI, United States of America
Ian Foster ORCID
Globus, University of Chicago, Chicago, IL, United States of America, Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America, Department of Computer Science, University of Chicago, Chicago, IL, United States of America
Ben Blaiszik ORCID
Globus, University of Chicago, Chicago, IL, United States of America, Data Science and Learning Division, Argonne National Laboratory, Lemont, IL, United States of America
Editor
Fei Tao ORCID
Tags
Machine Learning Artificial Intelligence Materials Science Data

GitHub Events

Total
  • Watch event: 2
  • Issue comment event: 3
  • Push event: 13
  • Pull request event: 2
  • Fork event: 3
  • Create event: 3
Last Year
  • Watch event: 2
  • Issue comment event: 3
  • Push event: 13
  • Pull request event: 2
  • Fork event: 3
  • Create event: 3

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 1,290
  • Total Committers: 24
  • Avg Commits per committer: 53.75
  • Development Distribution Score (DDS): 0.775
Past Year
  • Commits: 6
  • Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Ben Blaiszik b****k@u****u 290
Aristana Scourtas a****s@g****m 242
Ribhav Bose b****v@g****m 130
KJ k****3@g****m 112
zk794 z****k@g****m 108
Ethan Truelove e****e@u****u 108
Aadit-Ambadkar a****r@g****m 62
Marcus Schwarting m****s@M****l 54
repo-visualizer r****r 40
Isaac Darling 6****g 34
Aristana Scourtas a****a@u****u 30
Steven Wangen i****2@g****m 23
ZKatok z****k@s****g 14
github-actions[bot] 4****] 9
Ribhav Bose b****r@u****u 6
Marcus Schwarting m****t@M****l 6
BraedenCu b****0@g****m 4
Logan Ward W****T 4
Nathaniel Martinez n****3@u****u 4
C. Y. Schneck 2****k 2
Ian Foster f****r@a****v 2
NathanPruyne n****e@g****m 2
Ryan r****d@a****v 2
Sterling G. Baird 4****d 2
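The summary statistics above can be reproduced from the per-identity commit counts in this table. This is a sketch under an assumption: we take the Development Distribution Score to be 1 minus the top committer's share of all commits. We have not confirmed the platform's exact definition, but this formula reproduces both reported values (0.775 all time, 0.0 for the past year). Each listed identity is counted separately, as in the table.

```python
# Commit counts per listed committer identity, copied from the table above
ALL_TIME_COMMITS = [290, 242, 130, 112, 108, 108, 62, 54, 40, 34, 30, 23,
                    14, 9, 6, 6, 4, 4, 4, 2, 2, 2, 2, 2]
PAST_YEAR_COMMITS = [6]  # one committer, six commits


def average_commits(counts):
    """Mean commits per committer."""
    return sum(counts) / len(counts)


def development_distribution_score(counts):
    """Assumed definition: 1 - (commits by top committer / total commits)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1 - max(counts) / total


print(sum(ALL_TIME_COMMITS), len(ALL_TIME_COMMITS))                 # 1290 24
print(average_commits(ALL_TIME_COMMITS))                            # 53.75
print(round(development_distribution_score(ALL_TIME_COMMITS), 3))   # 0.775
print(development_distribution_score(PAST_YEAR_COMMITS))            # 0.0
```

Under this definition, a single-committer period always scores 0.0, which is why the past-year DDS drops to zero even though the all-time history is well distributed.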

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 155
  • Total pull requests: 93
  • Average time to close issues: 8 months
  • Average time to close pull requests: 11 days
  • Total issue authors: 10
  • Total pull request authors: 16
  • Average comments per issue: 1.89
  • Average comments per pull request: 1.48
  • Merged pull requests: 68
  • Bot issues: 0
  • Bot pull requests: 10
Past Year
  • Issues: 0
  • Pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.75
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kjschmidt913 (53)
  • ascourtas (51)
  • blaiszik (22)
  • WardLT (9)
  • BenGalewsky (5)
  • marshallmcdonnell (3)
  • blue442 (1)
  • vsoch (1)
  • rabernat (1)
  • leschultz (1)
Pull Request Authors
  • blaiszik (43)
  • ascourtas (18)
  • allcontributors[bot] (16)
  • blue442 (13)
  • kjschmidt913 (10)
  • Aadit-Ambadkar (5)
  • WardLT (3)
  • dependabot[bot] (2)
  • wdwzyyg (2)
  • isaac-darling (2)
  • cyschneck (1)
  • marshallmcdonnell (1)
  • kurtmckee (1)
  • ianfoster (1)
  • rjacobs914 (1)
Top Labels
Issue Labels
enhancement (17) refactor (16) documentation and examples (15) bug (9) testing and deployment (3) good first issue (2) dataset (2) data (2) Summer-2023 (2) v-Automate (2) v-UI-overhaul (2) OSS-prep (1) planning (1) publish data (1) outreach (1) higher-priority (1)
Pull Request Labels
DO NOT MERGE (3) dependencies (2)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 348 last-month
  • Total docker downloads: 66
  • Total dependent packages: 1
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 44
  • Total maintainers: 3
pypi.org: foundry-ml

Package to support simplified application of machine learning models to datasets in materials science

  • Versions: 40
  • Dependent Packages: 1
  • Dependent Repositories: 1
  • Downloads: 348 Last month
  • Docker Downloads: 66
Rankings
Docker downloads count: 4.6%
Dependent packages count: 4.8%
Stargazers count: 8.5%
Forks count: 9.3%
Downloads: 9.7%
Average: 9.7%
Dependent repos count: 21.6%
Maintainers (3)
Last synced: 4 months ago
conda-forge.org: foundry_ml
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 43.0%
Average: 43.6%
Forks count: 46.0%
Dependent packages count: 51.2%
Last synced: 4 months ago

Dependencies

.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/tests.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
examples/atom-position-finding/requirements.txt pypi
  • foundry_ml *
  • matplotlib *
examples/bandgap/requirements.txt pypi
  • foundry_ml *
  • matminer *
  • matplotlib *
  • pandas *
  • pymatgen *
  • scikit-learn *
examples/dendrite-segmentation/requirements.txt pypi
  • foundry_ml *
  • keras-unet *
  • opencv-python *
  • scikit-image *
  • scikit-learn *
  • tensorflow *
examples/oqmd/requirements.txt pypi
  • foundry_ml *
  • pandas *
examples/zeolite/requirements.txt pypi
  • foundry_ml *
  • matplotlib *
  • seaborn *
requirements.txt pypi
  • dlhub_sdk >=1.0.0
  • globus-sdk >=3,<4
  • h5py >=2.10.0
  • json2table >=1.1.5
  • mdf-connect-client >=0.4.0
  • mdf_forge >=0.8.0
  • numpy >=1.15.4
  • pandas >=0.23.4
  • pydantic >=1.6.1
  • requests >=2.18.4
  • scikit-learn >=1.0
  • six >=1.11.0
  • tensorflow >=2
  • torch >=1.8.0
  • tqdm >=4.19.4
  • tqdm >=4.64
setup.py pypi
  • dlhub_sdk >=1.0.0
  • globus-sdk >=3,<4
  • h5py >=2.10.0
  • json2table *
  • mdf_connect_client >=0.4.0
  • mdf_forge >=0.8.0
  • numpy >=1.15.4
  • pandas >=0.23.4
  • pydantic >=1.4
test-requirements.txt pypi
  • flake8 * test
  • jsonschema * test
  • pytest >=7 test
  • pytest-cov >=2.12 test