issa-pipeline

Description and generation of the ISSA RDF knowledge graph

https://github.com/issa-project/issa-pipeline

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: issa-project
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 53.3 MB

Statistics

Stars: 3
Watchers: 4
Forks: 4
Open Issues: 5
Releases: 5

Created about 5 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License Citation Codemeta

ISSA Processing Pipeline

This repository contains the pipeline developed by the ISSA project. It orchestrates the automatic indexing of a scientific archive by extracting thematic descriptors and named entities from the full text of the articles, and linking them with terminological resources in a Semantic Web format.

The repository consists of the tools, scripts and configuration files involved in each step of the pipeline:
- retrieve the articles' metadata from the archive's API;
- download and pre-process the PDF files of the articles;
- process the output to extract thematic descriptors and named entities;
- translate the output of each processing step into a unified, consistent RDF dataset;
- retrieve additional metadata from OpenAlex: topics, Sustainable Development Goals (SDG), authorship with institutions;
- upload the resulting dataset to a triple store equipped with a SPARQL endpoint.

These steps are summarized in the following diagram.

[Diagram: overview of the pipeline steps]
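To make the RDF translation step listed above more concrete, here is a minimal sketch, not the pipeline's actual code, of how the thematic descriptors extracted for one article could be serialized as N-Triples using only the Python standard library. The article namespace (`http://example.org/article/`), the choice of `dcterms:title` and `dcterms:subject` as properties, and the concept identifiers in the usage example are assumptions made for illustration; the real ISSA knowledge model may use different URIs and vocabularies.

```python
# Illustrative sketch only: the namespace, properties and identifiers below
# are assumptions, not taken from the ISSA pipeline.
ARTICLE_BASE = "http://example.org/article/"        # hypothetical article namespace
AGROVOC_BASE = "http://aims.fao.org/aos/agrovoc/"   # AGROVOC concept namespace
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_SUBJECT = "http://purl.org/dc/terms/subject"


def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def descriptors_to_ntriples(article_id, title, concept_ids):
    """Return N-Triples linking one article to its title and descriptor concepts."""
    subject = f"<{ARTICLE_BASE}{article_id}>"
    lines = [f'{subject} <{DCT_TITLE}> "{escape_literal(title)}" .']
    # Deduplicate and sort so that the output is stable across runs.
    for concept_id in sorted(set(concept_ids)):
        lines.append(f"{subject} <{DCT_SUBJECT}> <{AGROVOC_BASE}{concept_id}> .")
    return "\n".join(lines)


# Usage with made-up values: an article id, a title and two placeholder concept ids.
print(descriptors_to_ntriples("42", "A sample article", ["c_0002", "c_0001"]))
```

A file produced this way could then be loaded into a triple store through its bulk loader; a library such as rdflib, which appears in the repository's Python requirements, would be the more robust choice for anything beyond a sketch.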

License

See the LICENSE file.

Cite this work

Reference article

Anne Toulet, Franck Michel, Anna Bobasheva, Aline Menin, Sébastien Dupré, Marie-Claude Deboin, Marco Winckler, and Andon Tchechmedjiev. ISSA: generic pipeline, knowledge model and visualization tools to help scientists search and make sense of a scientific archive. In The Semantic Web–ISWC 2022: 21st International Semantic Web Conference, October 23–27, 2022, Proceedings, pp. 660-677. Cham: Springer International Publishing, 2022. https://doi.org/10.1007/978-3-031-19433-7_38

BibTeX:

@inproceedings{toulet2022issa,
  title={ISSA: generic pipeline, knowledge model and visualization tools to help scientists search and make sense of a scientific archive},
  author={Toulet, Anne and Michel, Franck and Bobasheva, Anna and Menin, Aline and Dupr{\'e}, S{\'e}bastien and Deboin, Marie-Claude and Winckler, Marco and Tchechmedjiev, Andon},
  booktitle={The Semantic Web--ISWC 2022: 21st International Semantic Web Conference, Virtual Event, October 23--27, 2022, Proceedings},
  pages={660--677},
  year={2022},
  organization={Springer}
}

Cite this software

Anna BOBASHEVA, Franck MICHEL, Andon TCHECHMEDJIEV, Anne TOULET, Quentin SCORDO (2024). ISSA Processing Pipeline. https://github.com/issa-project/issa-pipeline.

BibTeX:

@software{BOBASHEVA_issa-pipeline_2024,
  author = {BOBASHEVA, Anna and MICHEL, Franck and TCHECHMEDJIEV, Andon and TOULET, Anne and SCORDO, Quentin},
  title = {{issa-pipeline}},
  url = {https://github.com/issa-project/issa-pipeline},
  version = {2.1.0},
  year = {2024}
}

Owner

Name: ISSA Project
Login: issa-project
Kind: organization

Repositories: 2
Profile: https://github.com/issa-project

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "BOBASHEVA"
  given-names: "Anna"
  orcid: "https://orcid.org/0000-0003-0395-2069"
- family-names: "MICHEL"
  given-names: "Franck"
  orcid: "https://orcid.org/0000-0001-9064-0463"
- family-names: "TCHECHMEDJIEV"
  given-names: "Andon"
  orcid: "https://orcid.org/0000-0003-3749-5521"
- family-names: "TOULET"
  given-names: "Anne"
  orcid: "https://orcid.org/0000-0003-0463-0854"
- family-names: "SCORDO"
  given-names: "Quentin"
title: "issa-pipeline"
version: 2.1.0
date-released: 2024-11-27
url: "https://github.com/issa-project/issa-pipeline"

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/Apache-2.0",
  "codeRepository": "https://github.com/issa-project/issa-pipeline/",
  "dateCreated": "2022-05-03",
  "datePublished": "2022-05-03",
  "dateModified": "2024-11-27",
  "issueTracker": "https://github.com/issa-project/issa-pipeline/issues",
  "name": "ISSA Pipeline",
  "version": "2.1.0",
  "description": "This pipeline, developed by the ISSA project, orchestrates the automatic indexing of a scientific archive by extracting from the articles full-text thematic descriptors and named entities, and linking them with terminological resources in the Semantic Web format.",
  "developmentStatus": "active",
  "referencePublication": "https://hal.science/hal-03807744",
  "keywords": [
    "open science",
    "knowledge graph",
    "scientific literature",
    "indexing"
  ],
  "programmingLanguage": [
    "python",
    "bash"
  ],
  "softwareRequirements": [
    "https://github.com/issa-project/issa-pipeline/tree/main/environment"
  ],
  "operatingSystem": [
    "Linux",
    "Docker"
  ],
  "author": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0001-9064-0463",
      "givenName": "Franck",
      "familyName": "Michel",
      "email": "franck.michel@inria.fr",
      "affiliation": {
        "@type": "Organization",
        "name": "University Côte d'Azur, CNRS, Inria"
      }
    },
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0003-0395-2069",
      "givenName": "Anna",
      "familyName": "Bobasheva",
      "affiliation": {
        "@type": "Organization",
        "name": "University Côte d'Azur, Inria, CNRS"
      }
    },
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0003-3749-5521",
      "givenName": "Andon",
      "familyName": "Tchechmedjiev",
      "affiliation": {
        "@type": "Organization",
        "name": "Euromov Digital Health in Motion, Univ Montpellier, IMT Mines Ales, Ales, France"
      }
    },
    {
      "@type": "Person",
      "givenName": "Quentin",
      "familyName": "Scordo",
      "affiliation": {
        "@type": "Organization",
        "name": "University Côte d'Azur, Inria"
      }
    }
  ]
}

GitHub Events

Total

Release event: 1
Push event: 13
Create event: 1

Last Year

Release event: 1
Push event: 13
Create event: 1

Dependencies

environment/containers/agrovoc-pyclinrec/requirements.txt pypi

Metafone *
SPARQLWrapper *
flask ==2.0.2
jellyfish *
nltk *
pandas ==1.1.5
regex ==2020.10.15
requests *
spacy *
torch *
tqdm *
transformers *

environment/python/requirements.txt pypi

SPARQLWrapper ==1.8.5
certifi ==2021.5.30
charset-normalizer ==2.0.4
cord-19-tools ==0.3.3
idna ==3.2
isodate ==0.6.0
joblib ==1.1.0
lxml ==4.6.3
numpy ==1.19.5
pandas ==1.1.5
pycld2 ==0.41
pyparsing ==2.4.7
python-dateutil ==2.8.2
pytz ==2021.1
rdflib ==5.0.0
requests ==2.26.0
retrying ==1.3.3
scikit-learn ==0.24.2
scipy ==1.5.4
six ==1.16.0
threadpoolctl ==3.1.0
tqdm ==4.62.3
urllib3 ==1.26.6
xmltodict ==0.12.0

environment/containers/agrovoc-pyclinrec/Dockerfile docker

python 3.8-slim-bullseye build

environment/containers/entity-fishing/Dockerfile docker

openjdk 8u275-jdk build
