catalysisie_knowledge_graph_generator
https://github.com/alesteb/catalysisie_knowledge_graph_generator
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: AleSteB
- Language: Jupyter Notebook
- Default Branch: main
- Size: 143 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
Readme.md
CatalysisIE based Knowledge Graph Generator
Repository for the publication "Generating knowledge graphs through text mining of catalysis research related literature". The two Excel files listing the query output described in the publication are in the output folder.
The tool consists of the following modules: preprocess_onto.py, txt_extract.py, text_mining.py, and onto_extension.py. There is also a Jupyter notebook with example SPARQL queries and functions for querying the ontology, depending on the information of interest.
Preparations
Before running the code, some preparations are required:
- The folder structure must be as follows:
main_folder
    import
    ontologies
    ontology_snipet
    CatalysisIE
    PDFDataExtractor
    robot
    output
    classlist
- The ontology to be extended must be stored in the ontologies folder
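The layout above can be created by hand, or with a short script such as the following minimal sketch. It assumes that all listed folders sit directly under the main folder; the function name `create_layout` is hypothetical and not part of this repository.

```python
from pathlib import Path

# Folder names are taken from the layout listed above. Placing all of them
# directly under the main folder is an assumption.
SUBFOLDERS = [
    "import",
    "ontologies",
    "ontology_snipet",
    "CatalysisIE",
    "PDFDataExtractor",
    "robot",
    "output",
    "classlist",
]


def create_layout(main_folder):
    """Create the expected subfolders and return their paths."""
    root = Path(main_folder)
    created = []
    for name in SUBFOLDERS:
        path = root / name
        # exist_ok=True leaves folders that already exist untouched
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_layout("main_folder")` only creates empty folders. The ontology, the CatalysisIE clone, and the other tools still have to be placed into them as described in this section.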
The following modules need to be installed/placed here:
- PyTorch version 1.8.0 and CUDA Toolkit version 11.1
- Clone the CatalysisIE (https://github.com/nsndimt/CatalysisIE) repository and download its checkpoints if needed
- ROBOT command line tool (http://robot.obolibrary.org/)
- PDFDataExtractor (https://pdfdataextractor.readthedocs.io/en/latest/getting_started/installation.html)
- More details on the required modules are listed in cat_environment.yml and cat_environment.txt
The global variables listed in config.json must be adjusted before running the process.
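Before a run, it can help to check that no setting in config.json was left empty. The sketch below is one possible way to do that. The key names of config.json are not listed here, so the helper works on any nested JSON object; `load_config` and `find_unset_values` are hypothetical names, not functions of this repository.

```python
import json
from pathlib import Path


def load_config(path="config.json"):
    """Read the JSON configuration file into a dictionary."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def find_unset_values(config, prefix=""):
    """Return dotted key paths whose value is None or a blank string."""
    unset = []
    for key, value in config.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # descend into nested sections of the configuration
            unset.extend(find_unset_values(value, prefix=f"{name}."))
        elif value is None or (isinstance(value, str) and not value.strip()):
            unset.append(name)
    return unset
```

For example, `find_unset_values(load_config())` returns an empty list when every setting has a value, and otherwise the names of the settings that still need to be filled in.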
CatalysisIE Checkpoint
The checkpoint of the extended CatalysisIE model is found here:
Usage
- Execute create_ChEBIdict.py to create a dictionary of all ChEBI classes for later entity recognition (might take some time)
- Place PDFs in folder import
- Make sure a model for
- Insert your Scopus API key in config.json and adjust other settings where necessary
- Execute run_pdfs.py (this uses modules txt_extract.py, text_mining.py, preprocess_onto.py, and onto_extension.py and stores the resulting knowledge graph in ontologies)
- Execute the Jupyter notebook user_queries.ipynb for predefined queries on the resulting knowledge graph
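The two script steps above can also be chained from Python. The following is a minimal sketch, assuming that both scripts are started without arguments from the repository root and that the PDFs lie directly in the import folder. The helper names `list_pdfs`, `build_steps`, and `run_steps` are hypothetical.

```python
import subprocess
import sys
from pathlib import Path


def list_pdfs(import_dir="import"):
    """Return the PDF files found directly in the import folder, sorted."""
    # Note: the pattern is case-sensitive on most Linux file systems
    return sorted(Path(import_dir).glob("*.pdf"))


def build_steps(python=sys.executable):
    """Commands for the two script steps, in the order given above."""
    return [
        [python, "create_ChEBIdict.py"],
        [python, "run_pdfs.py"],
    ]


def run_steps(steps, cwd="."):
    """Run each command in turn and stop at the first failure."""
    for command in steps:
        # check=True raises CalledProcessError if a script exits with an error
        subprocess.run(command, cwd=cwd, check=True)
```

A typical call would first confirm that `list_pdfs()` is not empty and then call `run_steps(build_steps())`. The notebook user_queries.ipynb is still opened separately afterwards.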
Remarks
The directory labeling contains JSON files exported from Label Studio with the labeled abstracts of both the methanation and hydroformylation datasets. This directory also contains the labels produced by the models and the performance results of the models.
Owner
- Name: AlexB
- Login: AleSteB
- Kind: user
- Repositories: 1
- Profile: https://github.com/AleSteB
GitHub Events
Total
- Push event: 11
Last Year
- Push event: 11