https://github.com/juliamatsci/cbfv.jl
A simple composition-based feature vectorization utility in Julia
Repository
Basic Info
- Host: GitHub
- Owner: JuliaMatSci
- License: other
- Language: Julia
- Default Branch: master
- Homepage: https://juliamatsci.github.io/CBFV.jl/
- Size: 761 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
CBFV.jl : A simple composition-based feature vectorization utility in Julia
This is a Julia rewrite of the Python tool for creating composition-based feature vector (CBFV) representations for machine learning with materials science data. The ideas and methodology are discussed in the following article:
Wang, Anthony Yu-Tung; Murdock, Ryan J.; Kauwe, Steven K.; Oliynyk, Anton O.; Gurlo, Aleksander; Brgoch, Jakoah; Persson, Kristin A.; Sparks, Taylor D., Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices, Chemistry of Materials 2020, 32 (12): 4954–4965. DOI: 10.1021/acs.chemmater.0c01907.
The original Python source code can be found here:
- https://github.com/anthony-wang/BestPractices/tree/master/notebooks/CBFV
- https://github.com/kaaiian/CBFV
Example Use
The input data set should have at least two columns, with the header names `formula` and `target`.
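```julia
using DataFrames
using CBFV
# Example data: chemical formulas and a property to predict (band gap in eV)
data = DataFrame("name"=>["Rb2Te","CdCl2","LaN"],"bandgap_eV"=>[1.88,3.51,1.12])
# CBFV expects the columns to be named "formula" and "target"
rename!(data,Dict("name"=>"formula","bandgap_eV"=>"target"))
# Build the composition-based features (default element database)
features = generatefeatures(data)
```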
Note that you will most likely still want to post-process the generated features with a transformation that scales the data. The StatsBase.jl package provides some basic features for this, although its input needs to be an `AbstractMatrix{<:Real}` rather than a `DataFrame`. This can be achieved using `generatefeatures(data, returndataframe=false)`.
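As a rough illustration of the scaling step, the sketch below standardizes each feature column to zero mean and unit standard deviation using only the `Statistics` standard library. The matrix `X` is a stand-in for a generated feature matrix, and `zscore_columns` is a hypothetical helper, not part of CBFV.jl. StatsBase.jl's `fit(ZScoreTransform, X; dims=1)` serves the same purpose.

```julia
using Statistics

# Stand-in for a feature matrix (rows = compounds, columns = features).
# In practice this would come from the featurization step.
X = [1.0 10.0 5.0;
     2.0 20.0 5.0;
     3.0 60.0 5.0]

# Hypothetical helper: z-score each column of a real-valued matrix.
function zscore_columns(X::AbstractMatrix{<:Real})
    μ = mean(X, dims=1)
    σ = std(X, dims=1)
    # Constant columns have zero spread; divide by 1 instead to avoid 0/0 = NaN.
    σsafe = map(s -> s > 0 ? s : one(s), σ)
    return (X .- μ) ./ σsafe
end

Xs = zscore_columns(X)
```

Constant columns (common with sparse schemes such as one-hot encoding) are mapped to zeros here rather than `NaN`; whether to keep or drop such columns is a modeling choice.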
Supported Featurization Schemes
As with the original CBFV Python package, the following element databases are available:
- `oliynyk` (default): Database from A. Oliynyk.
- `magpie`: Materials Agnostic Platform for Informatics and Exploration.
- `mat2vec`: Word embeddings capture latent knowledge from materials science.
- `jarvis`: Joint Automated Repository for Various Integrated Simulations, provided by the U.S. National Institute of Standards and Technology.
- `onehot`: Simple one-hot encoding scheme, i.e., a diagonal elemental matrix.
- `random_200`: 200 random elemental properties (I'm assuming).
However, CBFV.jl will allow you to provide your own element database to featurize with. The current implementation reads the saved .csv files in `databases`, which is prone to issues (e.g., out-of-date files). To alleviate this, I will change the implementation to use Pkg.Artifacts with an Artifacts.toml file, which enables fetching the needed data files from a server if they don't already exist locally.
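To make the role of an element database concrete, here is a minimal, self-contained sketch of composition-based featurization. It is not CBFV.jl's implementation: `ELEMENTS`, `parseformula`, and `featurize` are hypothetical names, the table holds only atomic number and approximate atomic mass for six elements, and the statistics chosen (weighted average, minimum, maximum, range) are illustrative. The parser handles only flat formulas without parentheses.

```julia
# Hypothetical mini element table: symbol => (atomic number, atomic mass in u).
const ELEMENTS = Dict(
    "Rb" => (37.0, 85.468),
    "Te" => (52.0, 127.60),
    "Cd" => (48.0, 112.41),
    "Cl" => (17.0, 35.45),
    "La" => (57.0, 138.905),
    "N"  => (7.0, 14.007),
)

# Parse a flat formula such as "Rb2Te" into symbol => amount pairs.
function parseformula(formula::AbstractString)
    amounts = Dict{String,Float64}()
    for m in eachmatch(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula)
        sym = String(m.captures[1])
        n = m.captures[2]                      # `nothing` when no number follows
        amount = n === nothing ? 1.0 : parse(Float64, n)
        amounts[sym] = get(amounts, sym, 0.0) + amount
    end
    return amounts
end

# For each elemental property, compute the composition-weighted average,
# then the minimum, maximum, and range over the elements present.
function featurize(formula::AbstractString)
    amounts = parseformula(formula)
    total = sum(values(amounts))
    syms = collect(keys(amounts))
    fracs = [amounts[s] / total for s in syms]
    feats = Float64[]
    for p in 1:2                               # two properties in the mini table
        vals = [ELEMENTS[s][p] for s in syms]
        lo, hi = extrema(vals)
        push!(feats, sum(fracs .* vals), lo, hi, hi - lo)
    end
    return feats
end

for f in ["Rb2Te", "CdCl2", "LaN"]
    println(f, " => ", featurize(f))
end
```

Swapping in a different element table changes the feature vector without touching the parsing or aggregation logic, which is the idea behind offering several element databases.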
Julia Dependencies
This is a relatively small package so there aren't a lot of dependencies. The required packages are:
- CSV
- DataFrames
- ProgressBars
Citations
Please cite the following if you use this package in your work:
```bibtex
@misc{CBFV.jl,
  author = {Bringuier, Stefan},
  year = {2021},
  title = {CBFV.jl - A simple composition based feature vectorization Julia utility},
  url = {https://github.com/JuliaMatSci/CBFV.jl},
}
```
In addition, please consider citing the original Python implementation and the tutorial paper.
```bibtex
@misc{CBFV,
  author = {Kauwe, Steven and Wang, Anthony Yu-Tung and Falkowski, Andrew},
  title = {CBFV: Composition-based feature vectors},
  url = {https://github.com/kaaiian/CBFV}
}
```
```bibtex
@article{Wang2020bestpractices,
  author = {Wang, Anthony Yu-Tung and Murdock, Ryan J. and Kauwe, Steven K. and Oliynyk, Anton O. and Gurlo, Aleksander and Brgoch, Jakoah and Persson, Kristin A. and Sparks, Taylor D.},
  year = {2020},
  title = {Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices},
  url = {https://doi.org/10.1021/acs.chemmater.0c01907},
  pages = {4954--4965},
  volume = {32},
  number = {12},
  issn = {0897-4756},
  journal = {Chemistry of Materials},
  doi = {10.1021/acs.chemmater.0c01907}
}
```
Owner
- Name: JuliaMatSci
- Login: JuliaMatSci
- Kind: organization
- Repositories: 3
- Profile: https://github.com/JuliaMatSci
Materials Science Computing in Julia
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 3
- Total pull requests: 3
- Average time to close issues: about 17 hours
- Average time to close pull requests: 3 days
- Total issue authors: 2
- Total pull request authors: 1
- Average comments per issue: 0.67
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 3
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- stefanbringuier (2)
- JuliaTagBot (1)
Pull Request Authors
- github-actions[bot] (3)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 1
juliahub.com: CBFV
A simple composition-based feature vectorization utility in Julia
- Homepage: https://juliamatsci.github.io/CBFV.jl/
- Documentation: https://docs.juliahub.com/General/CBFV/stable/
- License: MIT
- Latest release: 0.1.0 (published almost 4 years ago)