Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

analysis paper papers research
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: anastmur
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 48.6 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 3
Topics
analysis paper papers research
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Advanced Paper Analyzer



Introduction

Advanced Paper Analyzer takes a set of research papers and extracts their metadata. It queries Wikidata and ROR to enrich this information, compares the similarity of the abstracts taken from the papers, and analyzes which topics each paper is likely to be about.

Installation

You can run the application either in a Docker container (this requires a VNC client; see the DOCKER section) or directly on your computer, as follows:

  1. Clone the repository:

    git clone https://gitlab.utc.fr/royhucheradorni/ia04.git

  2. Python

The code targets Python 3.10 (pyproject.toml requires python ^3.10), so a compatible Python version must be installed on your system to use Advanced Paper Analyzer.

  3. Dependencies

Install the dependencies with Poetry by running the following command from the root directory of the repository:

    poetry install

Alternatively, install all dependencies with pip from requirements.txt in the root directory of the repository:

    pip install -r requirements.txt

  4. Grobid

Grobid extracts the metadata from the papers, which is then used for further analysis. Install either the full or the light version of the Grobid 0.8.0 Docker image by pulling one of the following:

Full image (https://hub.docker.com/r/grobid/grobid):

    docker pull grobid/grobid:0.8.0

Light image (https://hub.docker.com/r/lfoppiano/grobid/):

    docker pull lfoppiano/grobid:0.8.0

  5. Apache Jena Fuseki

Jena Fuseki provides the triple store and the SPARQL endpoint. Pull its image with the command below, then run it as described in How to use, which also creates the dataset:

    docker pull stain/jena-fuseki
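Once Fuseki is running, RDF can be loaded into the dataset over plain HTTP. The following is a minimal sketch, not the project's own code: it assumes the KG_dataset dataset published on host port 3035 (as in the docker run command of How to use), a SPARQL 1.1 Graph Store Protocol endpoint at /KG_dataset/data, and no authentication on that endpoint. `build_upload_request` and `upload_turtle` are hypothetical helpers.

```python
import urllib.request

# Assumptions: Fuseki publishes the KG_dataset dataset on host port 3035 and
# exposes the Graph Store Protocol at /KG_dataset/data without authentication.
DATA_ENDPOINT = "http://localhost:3035/KG_dataset/data"


def build_upload_request(turtle: str, endpoint: str = DATA_ENDPOINT) -> urllib.request.Request:
    """POST Turtle triples to the dataset's default graph (SPARQL 1.1 Graph Store Protocol)."""
    return urllib.request.Request(
        endpoint + "?default",
        data=turtle.encode("utf-8"),
        method="POST",  # POST adds triples to the graph; PUT would replace its content.
        headers={"Content-Type": "text/turtle; charset=utf-8"},
    )


def upload_turtle(turtle: str, endpoint: str = DATA_ENDPOINT) -> int:
    """Send the triples to a running Fuseki server and return the HTTP status code."""
    with urllib.request.urlopen(build_upload_request(turtle, endpoint), timeout=60) as response:
        return response.status
```

POST is used rather than PUT so that repeated uploads accumulate triples instead of overwriting the graph.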

How to use

  1. Run Grobid and Jena Fuseki with the following commands (the second one also creates the KG_dataset dataset):

    docker run -p 8070:8070 lfoppiano/grobid:latest-develop
    docker run -p 3035:3030 -e ADMIN_PASSWORD=pw123 -e FUSEKI_DATASET_1=KG_dataset stain/jena-fuseki

  2. Run the script interface.py with the argument 0:

    poetry run python interface.py 0
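Before launching the interface, it can help to confirm that both containers answer. A minimal standard-library sketch, assuming the port mappings from the commands above and the services' health endpoints (/api/isalive for Grobid, /$/ping for Fuseki); `is_up` and `check_services` are hypothetical helpers, not part of the project:

```python
import urllib.request

# Assumption: ports match the docker run commands above
# (Grobid on 8070, Fuseki's 3030 published on host port 3035).
SERVICES = {
    "grobid": "http://localhost:8070/api/isalive",
    "fuseki": "http://localhost:3035/$/ping",
}


def is_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError and timeouts all derive from OSError: treat as not ready.
        return False


def check_services(services: dict[str, str] = SERVICES) -> dict[str, bool]:
    """Map each service name to whether its health endpoint responded."""
    return {name: is_up(url) for name, url in services.items()}
```

For example, `check_services()` returns `{"grobid": True, "fuseki": True}` once both containers have finished starting.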

The interface then lets you:

  • PROCESS PDF WITH GROBID: process all the PDFs in the Corpus_pdf directory and convert their data and metadata to XML.

  • EXTRACT DATA: extract the title, date and authors from the processed PDFs, run topic modeling, and compute the similarity between the abstracts of each pair of papers.

  • Enrich DATA: add information from ROR and Wikidata (name, authors, organizations_founder of referenced papers).

  • INSERT DATA FROM RDF: load all this data into the Jena Fuseki knowledge-graph server.

  • SUBMIT QUERY: write a SPARQL query in the input box and submit it.
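The PROCESS PDF WITH GROBID step sends each PDF to Grobid's REST service. A hedged sketch of one such call, assuming Grobid on localhost:8070 and its /api/processFulltextDocument endpoint, which takes the PDF in a multipart form field named input and returns TEI XML. The helper names are hypothetical, and the project may use a Grobid client library instead.

```python
import os
import urllib.request
import uuid

# Assumption: Grobid listens on localhost:8070, as in the docker run command above.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def build_grobid_request(pdf_bytes: bytes, filename: str = "paper.pdf") -> urllib.request.Request:
    """Wrap one PDF in a multipart/form-data POST for Grobid's full-text service."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        # Grobid reads the uploaded PDF from the form field named "input".
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        GROBID_URL,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def process_pdf(path: str, timeout: float = 120.0) -> str:
    """Send the PDF at `path` to a running Grobid server and return the TEI XML."""
    with open(path, "rb") as f:
        request = build_grobid_request(f.read(), filename=os.path.basename(path))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Looping `process_pdf` over the files in Corpus_pdf would cover the whole corpus; the request is built separately so it can be inspected without a running server.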

Example queries:

  1. Select each topic together with the papers that have a probability above 0.90 of belonging to it.

  2. Request all pairs of articles whose similarity is above 70%.
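The README does not say how abstract similarity is computed. As an illustration only, the sketch below swaps in TF-IDF weighting with cosine similarity and keeps the pairs above 0.70, mirroring the second example query; the tokenizer and all function names are assumptions, not the project's implementation.

```python
import math
import re
from collections import Counter
from itertools import combinations


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Compute a TF-IDF weight for every term of every document."""
    token_lists = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many abstracts each term appears.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        # Smoothed IDF: terms present in every document keep a nonzero weight.
        vectors.append({t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def similar_pairs(abstracts: dict[str, str], threshold: float = 0.70) -> list[tuple[str, str, float]]:
    """Return every pair of papers whose abstracts are more similar than threshold."""
    ids = list(abstracts)
    vectors = dict(zip(ids, tfidf_vectors([abstracts[i] for i in ids])))
    pairs = []
    for a, b in combinations(ids, 2):
        score = cosine(vectors[a], vectors[b])
        if score > threshold:
            pairs.append((a, b, round(score, 3)))
    return pairs
```

Comparing every pair is quadratic in the number of papers, which is acceptable for a small corpus such as the one in Corpus_pdf.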

Figure: our RDF diagram.
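The first example query could be sent to Fuseki over the SPARQL protocol roughly as follows. The endpoint path (/KG_dataset/query on host port 3035) follows the docker run command, but the ex: prefix and the ex:hasTopic, ex:topic and ex:probability predicates are hypothetical placeholders for the terms in the RDF diagram, and `build_query_request` and `run_query` are illustrative helpers.

```python
import json
import urllib.parse
import urllib.request

# Assumption: dataset KG_dataset served by Fuseki on host port 3035.
ENDPOINT = "http://localhost:3035/KG_dataset/query"

# Hypothetical vocabulary: replace ex: and its predicates with the project's terms.
TOPIC_QUERY = """
PREFIX ex: <http://example.org/paper-analyzer#>
SELECT ?paper ?topic ?probability WHERE {
  ?paper ex:hasTopic ?assignment .
  ?assignment ex:topic ?topic ;
              ex:probability ?probability .
  FILTER (?probability > 0.90)
}
"""


def build_query_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query: str, endpoint: str = ENDPOINT) -> list[dict[str, str]]:
    """Execute the query and flatten each result row to {variable: value}."""
    with urllib.request.urlopen(build_query_request(query, endpoint), timeout=30) as response:
        results = json.load(response)
    return [{var: cell["value"] for var, cell in row.items()}
            for row in results["results"]["bindings"]]
```

The same query text can also be pasted into the SUBMIT QUERY input box.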

DOCKER

To display the graphical user interface running in the Docker container, the container runs a VNC server, so you need VNC client software (such as RealVNC Viewer).

How to install and run

  1. Go to the directory containing docker-compose.yml and run:

    docker-compose build
    docker-compose up -d

  2. Connect to the container with your VNC client at localhost:5901. The password is pw123.

  3. Open a terminal and execute:

    poetry run python interface.py 1

Owner

  • Name: AMT
  • Login: anastmur
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Muran Trus
    given-names: Anastasia
  - family-names: Gonzalez Mendez
    given-names: Alvaro
  - family-names: Hucher
    given-names: Tristan
title: "Advanced Paper Analizer"
version: 0.1.0-alpha
identifiers:
  - type: doi
    value: x
date-released: 2024-04-17


Dependencies

poetry.lock pypi
  • lxml 5.2.1
pyproject.toml pypi
  • lxml ^5.2.1
  • python ^3.10
docker-compose.yml docker
  • lfoppiano/grobid latest-develop
  • stain/jena-fuseki latest
vnc_ubuntu/Dockerfile docker
  • ubuntu 20.04 build
requirements.txt pypi