catalysisie_knowledge_graph_generator

https://github.com/alesteb/catalysisie_knowledge_graph_generator

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 3 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: AleSteB
Language: Jupyter Notebook
Default Branch: main
Size: 143 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme Citation

CatalysisIE based Knowledge Graph Generator

Repository for the publication "Generating knowledge graphs through text mining of catalysis research related literature". The two Excel files listing the output of the queries described in the publication are contained in the output folder.

The tool consists of the following modules: preprocess_onto.py, txt_extract.py, text_mining.py, and onto_extension.py. There is also a Jupyter notebook with example SPARQL queries and functions for querying the ontology, depending on the information of interest.

Preparations

Before running the code, some preparations are required. The folder structure must be the following:

main_folder
    import
    ontologies
    ontology_snipet
    CatalysisIE
    PDFDataExtractor
    robot
    output
    classlist

The ontology to be extended must be stored in the ontologies folder
The following modules need to be installed/placed here:
- PyTorch version 1.8.0 and CUDA toolkit version 11.1
- Clone the CatalysisIE (https://github.com/nsndimt/CatalysisIE) repository and download their checkpoints if needed
- Robot command line tool (http://robot.obolibrary.org/)
- PDFDataExtractor (https://pdfdataextractor.readthedocs.io/en/latest/getting_started/installation.html)
- Further details on the required modules are listed in cat_environment.yml and cat_environment.txt
Global variables listed in config.json must be adjusted for the process
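
The folder layout listed above can be created with a short script. This is a minimal sketch and not part of the repository: the helper name create_layout is hypothetical, and it assumes that every listed folder is a direct child of main_folder. The CatalysisIE, PDFDataExtractor, and robot folders still need to be filled by cloning or installing the respective tools.

```python
from pathlib import Path

# Folder names taken from the layout listed in this README.
# Assumption: all of them are direct children of the main folder.
SUBFOLDERS = [
    "import",
    "ontologies",
    "ontology_snipet",
    "CatalysisIE",
    "PDFDataExtractor",
    "robot",
    "output",
    "classlist",
]


def create_layout(main_folder):
    """Create the main folder and its subfolders; return the created paths."""
    root = Path(main_folder)
    created = []
    for name in SUBFOLDERS:
        path = root / name
        # exist_ok=True keeps the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling create_layout("main_folder") from the directory that should hold the project creates any missing folders and leaves existing ones untouched.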

CatalysisIE Checkpoint

The checkpoint of the extended CatalysisIE model is found here:

Usage

Execute create_ChEBIdict.py to create a dictionary of all ChEBI classes for later entity recognition (might take some time)
Place PDFs in folder import
Make sure a model for
Insert your Scopus API key in config.json and adjust other settings where necessary
Execute run_pdfs.py (this uses the modules txt_extract.py, text_mining.py, preprocess_onto.py, and onto_extension.py and stores the resulting knowledge graph in ontologies)
Execute the Jupyter notebook user_queries.ipynb for predefined queries on the resulting knowledge graph
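
Before executing run_pdfs.py, it can help to check that the inputs named in the steps above are in place. The following is a minimal sketch and not part of the repository. The function name preflight, the location of config.json directly inside the main folder, and the config key "scopus_api_key" are all assumptions, since the actual key names and paths are not shown here; adjust them to match your setup.

```python
import json
from pathlib import Path


def preflight(main_folder, api_key_field="scopus_api_key"):
    """Return a list of human-readable problems; an empty list means ready.

    api_key_field is a hypothetical key name; check config.json for the
    name the repository actually uses.
    """
    root = Path(main_folder)
    problems = []

    # PDFs must be placed in the import folder.
    import_dir = root / "import"
    pdfs = sorted(import_dir.glob("*.pdf")) if import_dir.is_dir() else []
    if not pdfs:
        problems.append("no PDF files found in " + str(import_dir))

    # The ontology to be extended must be stored in the ontologies folder.
    onto_dir = root / "ontologies"
    if not onto_dir.is_dir() or not any(onto_dir.iterdir()):
        problems.append("no ontology file found in " + str(onto_dir))

    # config.json must exist and hold a non-empty API key.
    # Assumption: config.json is a JSON object stored in the main folder.
    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing " + str(config_path))
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        if not config.get(api_key_field):
            problems.append(
                "empty or missing '" + api_key_field + "' in config.json"
            )

    return problems
```

A call such as preflight("main_folder") returns an empty list when all three checks pass, so the result can be printed or used to abort before the longer text-mining run starts.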

Remarks

The directory labeling contains JSON files exported from Label Studio with the labeled abstracts of both the methanation and hydroformylation datasets. It also contains the labels predicted by the models and the model performance results.
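
To get a quick overview of the labeled data, the entity labels in an exported file can be counted. This is a minimal sketch and not part of the repository: the function names are hypothetical, and it assumes the files follow the standard Label Studio JSON export, in which each task carries an "annotations" list whose "result" items store span labels under value["labels"]. Exports in another format (for example JSON-MIN, or older exports that use "completions") would need a different traversal.

```python
import json
from collections import Counter


def count_labels(tasks):
    """Count span labels across Label Studio tasks (standard JSON export)."""
    counts = Counter()
    for task in tasks:
        for annotation in task.get("annotations", []):
            for item in annotation.get("result", []):
                # Span annotations store their labels under value["labels"].
                for label in item.get("value", {}).get("labels", []):
                    counts[label] += 1
    return counts


def count_labels_in_file(path):
    """Load one exported JSON file and count its labels."""
    with open(path, encoding="utf-8") as handle:
        return count_labels(json.load(handle))
```

Passing the path of one exported file to count_labels_in_file returns a Counter that maps each label name to the number of annotated spans.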

Owner

Name: AlexB
Login: AleSteB
Kind: user

Repositories: 1
Profile: https://github.com/AleSteB

GitHub Events

Total

Push event: 11

Last Year

Push event: 11
