https://github.com/juliamatsci/cbfv.jl

A simple composition-based feature vectorization utility in Julia

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary

Keywords

data-science machine-learning materials-informatics materials-science
Last synced: 6 months ago · JSON representation

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created almost 4 years ago · Last pushed almost 4 years ago
Metadata Files
Readme License

README.md

CBFV.jl : A simple composition-based feature vectorization utility in Julia

This is a Julia rewrite of the Python tool for creating a composition-based feature vector representation for machine learning with materials science data. The ideas and methodology are discussed in the article:

Wang, Anthony Yu-Tung; Murdock, Ryan J.; Kauwe, Steven K.; Oliynyk, Anton O.; Gurlo, Aleksander; Brgoch, Jakoah; Persson, Kristin A.; Sparks, Taylor D., Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices, Chemistry of Materials 2020, 32 (12): 4954–4965. DOI: 10.1021/acs.chemmater.0c01907.

The original Python source code can be found here:

Example Use

The input data set should have at least two columns, with the header names `formula` and `target`.

```julia
using DataFrames
using CBFV
data = DataFrame("name"=>["Rb2Te","CdCl2","LaN"],"bandgap_eV"=>[1.88,3.51,1.12])
rename!(data,Dict("name"=>"formula","bandgap_eV"=>"target"))
features = generatefeatures(data)
```

Note that you will most likely still want to post-process the generated feature data with a transformation that scales it. The StatsBase.jl package provides some basic features for this, although its input needs to be an `AbstractMatrix{<:Real}` rather than a `DataFrame`. To get a matrix instead, call `generatefeatures(data,returndataframe=false)`.
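As a rough illustration of the scaling step, the sketch below standardizes each feature column to zero mean and unit variance using only Julia's `Statistics` standard library. The matrix `X` is a made-up stand-in for the output of `generatefeatures(data,returndataframe=false)`, and `standardize` is a hypothetical helper, not part of CBFV.jl. StatsBase.jl offers `ZScoreTransform` for a similar purpose.

```julia
using Statistics

# Stand-in for the feature matrix (assumption: rows are samples,
# columns are features; the values are invented for illustration).
X = [1.0 10.0 5.0;
     2.0 20.0 5.0;
     3.0 30.0 5.0]

# Hypothetical helper: z-score every column of a real-valued matrix.
function standardize(X::AbstractMatrix{<:Real})
    μ = mean(X, dims=1)                       # 1×n row of column means
    σ = std(X, dims=1)                        # 1×n row of column standard deviations
    σ = map(s -> iszero(s) ? one(s) : s, σ)   # leave constant columns at zero instead of NaN
    return (X .- μ) ./ σ
end

Xscaled = standardize(X)
```

Constant columns are guarded explicitly because dividing by a zero standard deviation would otherwise fill them with `NaN`.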

Supported Featurization Schemes

As with the original CBFV Python package, the following element databases are available:

However, CBFV.jl also lets you provide your own element database to featurize with. The current implementation reads the saved .csv files in `databases`, which is prone to issues (e.g. out-of-date files). To alleviate this, I will change the implementation to use Pkg.Artifacts with an Artifacts.toml file, which enables fetching the required data files from a server if they do not already exist locally.

Julia Dependencies

This is a relatively small package so there aren't a lot of dependencies. The required packages are:

  • CSV
  • DataFrames
  • ProgressBars

Citations

Please cite the following if you use this package in your work:

```bibtex
@misc{CBFV.jl,
  author = {Bringuier, Stefan},
  year = {2021},
  title = {CBFV.jl - A simple composition based feature vectorization Julia utility},
  url = {https://github.com/JuliaMatSci/CBFV.jl},
}
```

In addition, please also consider citing the original Python implementation and tutorial paper.

```bibtex
@misc{CBFV,
  author = {Kauwe, Steven and Wang, Anthony Yu-Tung and Falkowski, Andrew},
  title = {CBFV: Composition-based feature vectors},
  url = {https://github.com/kaaiian/CBFV}
}
```

```bibtex
@article{Wang2020bestpractices,
  author = {Wang, Anthony Yu-Tung and Murdock, Ryan J. and Kauwe, Steven K. and Oliynyk, Anton O. and Gurlo, Aleksander and Brgoch, Jakoah and Persson, Kristin A. and Sparks, Taylor D.},
  year = {2020},
  title = {Machine Learning for Materials Scientists: An Introductory Guide toward Best Practices},
  url = {https://doi.org/10.1021/acs.chemmater.0c01907},
  pages = {4954--4965},
  volume = {32},
  number = {12},
  issn = {0897-4756},
  journal = {Chemistry of Materials},
  doi = {10.1021/acs.chemmater.0c01907}
}
```

Owner

  • Name: JuliaMatSci
  • Login: JuliaMatSci
  • Kind: organization

Materials Science Computing in Julia

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 3
  • Total pull requests: 3
  • Average time to close issues: about 17 hours
  • Average time to close pull requests: 3 days
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.67
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • stefanbringuier (2)
  • JuliaTagBot (1)
Pull Request Authors
  • github-actions[bot] (3)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
juliahub.com: CBFV

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 9.9%
Dependent packages count: 38.9%
Average: 43.9%
Forks count: 53.5%
Stargazers count: 73.2%
Last synced: 7 months ago