Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: AleSteB
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 143 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

Readme.md

CatalysisIE based Knowledge Graph Generator

Repository for the publication "Generating knowledge graphs through text mining of catalysis research related literature". The two Excel files listing the output of the queries described in the publication are contained in the output folder.

The tool consists of the following modules: preprocess_onto.py, txt_extract.py, text_mining.py and onto_extension.py. In addition, Jupyter notebooks are provided with example SPARQL queries and functions for querying the ontology according to the information of interest.

Preparations

Before running the code, the following preparations are required:

  • The folder structure must be as follows:

      main_folder/
        import/
        ontologies/
        ontology_snipet/
        CatalysisIE/
        PDFDataExtractor/
        robot/
        output/
        classlist/

  • The ontology to be extended must be stored in the ontologies folder
  • The following dependencies must be installed or placed in the corresponding folders:

    • PyTorch 1.8.0 with CUDA Toolkit 11.1
    • CatalysisIE (https://github.com/nsndimt/CatalysisIE): clone the repository and download its checkpoints if needed
    • ROBOT command-line tool (http://robot.obolibrary.org/)
    • PDFDataExtractor (https://pdfdataextractor.readthedocs.io/en/latest/getting_started/installation.html)
    • Further details on the required packages are given in cat_environment.yml and cat_environment.txt
  • The global variables in config.json must be adjusted to your setup
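The preparation steps above can be partly scripted. The following is a minimal sketch, not part of the repository: it creates the folder layout, assuming all listed folders sit directly under main_folder, and loads config.json while failing early on missing settings. The key name `scopus_api_key` is a hypothetical placeholder, since the README does not list the actual variables in config.json.

```python
import json
from pathlib import Path

# Folder names as listed in the README; treating them all as direct
# children of main_folder is an assumption.
REQUIRED_DIRS = (
    "import",
    "ontologies",
    "ontology_snipet",
    "CatalysisIE",
    "PDFDataExtractor",
    "robot",
    "output",
    "classlist",
)

# Hypothetical key name; replace it with the variable(s) actually used in config.json.
REQUIRED_KEYS = ("scopus_api_key",)


def create_folder_structure(main_folder):
    """Create main_folder and its sub-folders; existing folders are left untouched."""
    root = Path(main_folder)
    paths = [root / name for name in REQUIRED_DIRS]
    for path in paths:
        path.mkdir(parents=True, exist_ok=True)
    return paths


def load_config(path, required_keys=REQUIRED_KEYS):
    """Load config.json and raise KeyError if a required setting is missing or empty."""
    config = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in required_keys if not config.get(key)]
    if missing:
        raise KeyError(f"config.json is missing values for: {', '.join(missing)}")
    return config
```

Because `mkdir(exist_ok=True)` is used, rerunning `create_folder_structure("main_folder")` is harmless. Git also accepts an empty target directory, so the CatalysisIE repository can still be cloned into the created CatalysisIE folder afterwards.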

CatalysisIE Checkpoint

The checkpoint of the extended CatalysisIE model is available here: DOI

Usage

  1. Execute create_ChEBIdict.py to create a dictionary of all ChEBI classes for later entity recognition (this may take some time)
  2. Place the PDFs to be processed in the import folder
  3. Make sure a model for
  4. Insert your Scopus API key in config.json and adjust other settings where necessary
  5. Execute run_pdfs.py (this uses the modules txt_extract.py, text_mining.py, preprocess_onto.py and onto_extension.py and stores the resulting knowledge graph in the ontologies folder)
  6. Run the Jupyter notebook user_queries.ipynb for predefined queries on the resulting knowledge graph
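The script-based usage steps can be chained in a small driver. This is a minimal sketch, assuming both scripts live in main_folder and take no command-line arguments (the README documents none). The `skip_chebi` switch for reruns once the ChEBI dictionary exists and the `dry_run` option are hypothetical conveniences, not features of the repository.

```python
import subprocess
import sys
from pathlib import Path

# Script order follows the usage steps; running them without arguments is an assumption.
SCRIPTS = ["create_ChEBIdict.py", "run_pdfs.py"]


def find_pdfs(import_dir):
    """Return the PDF files waiting in the import folder (step 2), sorted by path."""
    return sorted(
        p for p in Path(import_dir).iterdir() if p.is_file() and p.suffix.lower() == ".pdf"
    )


def run_pipeline(main_folder, skip_chebi=False, dry_run=False):
    """Run the pipeline scripts from main_folder; with dry_run=True only return the commands."""
    root = Path(main_folder)
    if not find_pdfs(root / "import"):
        raise FileNotFoundError(f"No PDFs found in {root / 'import'}")
    scripts = SCRIPTS[1:] if skip_chebi else SCRIPTS
    commands = [[sys.executable, script] for script in scripts]
    if not dry_run:
        for cmd in commands:
            # check=True stops the pipeline as soon as one script exits with an error.
            subprocess.run(cmd, cwd=root, check=True)
    return commands
```

Step 6 can then be run non-interactively with `jupyter nbconvert --to notebook --execute user_queries.ipynb`, or the notebook can simply be opened in Jupyter.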

Remarks

The labeling directory contains the JSON files exported from Label Studio for the labeling of abstracts from both the methanation and the hydroformylation datasets. It also contains the labels produced by the models and the performance results of the models.
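The exported label files can be inspected with the standard library alone. This is a minimal sketch, assuming the files follow Label Studio's usual JSON export for span labeling: a list of tasks, each with a `data` dictionary and `annotations[*].result[*].value` entries holding `start`, `end`, optionally `text`, and `labels`. The `text` key under `data` depends on the labeling configuration and is an assumption.

```python
import json
from collections import Counter
from pathlib import Path


def load_spans(export_path, text_key="text"):
    """Yield (abstract, span_text, label) triples from a Label Studio JSON export."""
    tasks = json.loads(Path(export_path).read_text(encoding="utf-8"))
    for task in tasks:
        abstract = task.get("data", {}).get(text_key, "")
        for annotation in task.get("annotations", []):
            for result in annotation.get("result", []):
                value = result.get("value", {})
                # Span results carry character offsets plus one or more labels;
                # fall back to slicing the abstract when no span text is stored.
                for label in value.get("labels", []):
                    span = value.get("text") or abstract[value["start"]:value["end"]]
                    yield abstract, span, label


def label_counts(export_path):
    """Count how often each entity label occurs in one export file."""
    return Counter(label for _, _, label in load_spans(export_path))
```

Calling `label_counts` on each export gives a quick overview of how the entity labels are distributed across the methanation and hydroformylation abstracts.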

Owner

  • Name: AlexB
  • Login: AleSteB
  • Kind: user

GitHub Events

Total
  • Push event: 11
Last Year
  • Push event: 11