Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

Basic Info
  • Host: GitHub
  • Owner: adrijmz
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 114 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation Codemeta

README.md

DOI

Repository Overview

https://os-group-project.readthedocs.io/en/latest/

The project aims to create a knowledge graph by extracting information from articles. It provides functionalities to extract metadata, process data from Wikidata and OpenAlex, perform topic modeling, and create a knowledge graph. The project can be installed either using Docker or from the source. To extract all information the script use the service GROBID (2008-2022) https://github.com/kermitt2/grobid, Wikidata https://www.wikidata.org/wiki/Wikidata:Main_Page and OpenAlex https://openalex.org.

Features

  • Extraction of metadata from articles using GROBID.
  • Processing data from Wikidata and OpenAlex.
  • Topic modeling functionality.
  • Creation of a knowledge graph.
  • API for querying the knowledge graph.

Install

First of all, clone the repository bash git clone https://github.com/adrijmz/os_group_project.git

Using Docker

To install the GROBID image, execute the following command bash docker pull lfoppiano/grobid:0.7.2

To build the extractor image, execute the followint command from the root directory of the repository bash cd /root/directory/of/os_group_project docker build -t paper_kg .

From Source

To install the GROBID image, execute the following command bash docker pull lfoppiano/grobid:0.7.2

Install Python Environment

This project requires Python 3.8

Step 1

Create a virtual environment to isolate the project dependencies bash conda create -n myenv python=3.8 Init the environment created if it is necessary bash conda init myenv Activate the new environment bash conda activate myenv

Step 2

Install dependencies bash cd /path/to/root/directory/of/os_group_project pip install -r requirements.txt

Usage

Using Docker

Create a Docker network to communicate both containers bash docker network create kg_red

To run the GROBID container, execute the following command bash docker run --name server --network kg_red -p 8070:8070 lfoppiano/grobid:0.7.2 Before running the app, check in src/functionalities/grobid.py that url has this value

bash url = "http://server:8070/api/processFulltextDocument

To run the app container, open a new terminal window and execute the following command bash docker run --name paper_kg --network kg_red paper_kg

When all scripts have finished executing, access this URL to make queries to the knowledge graph: - http://127.0.0.1:8050/

If you want to see the files generated and you have used Docker to run extractor, execute the following command

To check container ID bash docker ps -a

To copy all files to a desire directory bash docker cp container_id:/app /path/to/your/directory

From Source

To run the GROBID container, execute the following command bash docker run --name server -p 8070:8070 lfoppiano/grobid:0.7.2 Before running the app, check in src/functionalities/grobid.py that url has this value bash url = "http://localhost:8070/api/processFulltextDocument"

To run all scripts (from the root directory) follow this order. You need to have activated the previous conda env. bash python src/functionalities/grobid.py python src/functionalities/wikidataProcess.py python src/functionalities/openalex.py python src/functionalities/abstract_lda.py python src/functionalities/ner.py python src/functionalities/knowledge_graph.py python src/api/app.py

When app.py script have finished executing, access this URL to make queries to the knowledge graph: - http://127.0.0.1:8050/

To access the GROBID service, go to the following URL - http://localhost:8070/

Examples to query

To obtain all titles: bash PREFIX schema: <http://schema.org/> SELECT ?title WHERE { ?paper a schema:paper ; schema:title ?title . }

To obtain all possible topics: bash PREFIX schema: <http://schema.org/> SELECT ?topic WHERE { ?paper a schema:topic ; schema:name ?topic . }

To obtain a specific paper: ```bash PREFIX schema: http://schema.org/

SELECT ?title ?topic ?author
WHERE {
?paper a schema:paper ;
    schema:doi "10.26735/TLYG7256" ;
    schema:title ?title ;
    schema:topic ?topic ;
    schema:author ?author .
}

```

Owner

  • Name: Adrián Jiménez
  • Login: adrijmz
  • Kind: user
  • Location: Madrid, Spain
  • Company: Stratebi Business Solutions

Computer Engineering

Citation (CITATION.cff)

title: "Extractor: Extract data from a PDF file"
license: "MIT"
authors:
  - family-names: "Jiménez Cano"
    given-names: "Adrián"
  - family-names: "Guerra Pantojo"
    given-names: "Daniel"
  - family-names: "Turégano Ramos"
    given-names: "Adrián"
cff-version: "1.0.0"
preferred-citation:
  authors:
  - family-names: "Jiménez Cano"
    given-names: "Adrián"
  - family-names: "Guerra Pantojo"
    given-names: "Daniel"
  - family-names: "Turégano Ramos"
    given-names: "Adrián"
  title: "Extractor: Extract data from a PDF file"
  type: "software"
  year: 2024
  doi: "10.5281/zenodo.11200165"

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/MIT",
  "codeRepository": "https://github.com/adrijmz/os_group_project",
  "dateCreated": "2024-04-07",
  "datePublished": "2024-04-07",
  "dateModified": "2024-04-20",
  "name": "Paper KG",
  "version": "1.1.0",
  "identifier": "10.5281/zenodo.11200165",
  "description": "The project aims to create a knowledge graph by extracting information from articles.",
  "applicationCategory": "Software",
  "releaseNotes": "Final release",
  "developmentStatus": "active",
  "referencePublication": "https://zenodo.org/records/11200166",
  "keywords": [
    "extract",
    "analyze",
    "knowledge graph"
  ],
  "programmingLanguage": [
    "Python 3"
  ],
  "contributor": [
    {
      "@type": "Person",
      "givenName": "Adrian",
      "familyName": "Jimenez Cano"
    },
    {
      "@type": "Person",
      "givenName": "Daniel",
      "familyName": "Guerra Pantojo"
    },
    {
      "@type": "Person",
      "givenName": "Adrian",
      "familyName": "Turégano Ramos"
    }
  ]
}

GitHub Events

Total
Last Year