iaoptativa

https://github.com/miineloc0/iaoptativa

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 1 DOI reference(s) in README
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (16.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: MiiNeLoC0
License: apache-2.0
Language: Python
Default Branch: main
Size: 79.6 MB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 4

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme License Citation Codemeta

README - Grobid PDF Processing Project

Description

This repository offers two ways to set up a pipeline that extracts information from scientific papers in PDF format using Grobid. The first is manual: you install the dependencies yourself and run Grobid with Docker. The second uses Docker to do everything automatically. The project processes documents to extract abstracts, figure counts, and links, and visualizes the results as a word cloud and a bar chart.

Requirements

We will be using Docker for both installation options. You only need to download Conda if you choose the manual option.

Docker: You can download it from https://www.docker.com/
Conda: You can download it from https://docs.conda.io/en/latest/miniconda.html

Installation Instructions

Clone the repository

    git clone https://github.com/MiiNeLoC0/IAoptativa.git
    cd IAoptativa

Open the Docker application.

🚀 Execution Instructions

Option 1: Run with Docker (Recommended)

This method automates everything using Docker. Just run the following command:

    docker-compose up --build

This will:

Start the Grobid server.
Process all PDFs inside the papers/ folder.
Save extracted data into grobid_output/.
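
The three steps above could be sketched roughly as follows. This is an illustration, not the repository's actual script.py: the helper names (`build_multipart`, `process_pdf`, `process_folder`) and the `.tei.xml` output naming are hypothetical. It assumes Grobid is reachable on port 8070, as in the `docker run` command in this README, and uses Grobid's `/api/processFulltextDocument` endpoint, which takes the PDF in a multipart form field named `input`.

```python
import uuid
import urllib.request
from pathlib import Path

GROBID_URL = "http://localhost:8070"  # assumption: port taken from this README


def build_multipart(field: str, filename: str, payload: bytes) -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def process_pdf(pdf_path: Path, base_url: str = GROBID_URL) -> str:
    """Send one PDF to Grobid and return the TEI XML it answers with."""
    body, content_type = build_multipart("input", pdf_path.name, pdf_path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        return response.read().decode("utf-8")


def process_folder(papers: Path, output: Path, process=process_pdf) -> list[Path]:
    """Process every *.pdf in `papers` and write one TEI file per paper."""
    output.mkdir(parents=True, exist_ok=True)
    written = []
    for pdf_path in sorted(papers.glob("*.pdf")):
        tei_path = output / f"{pdf_path.stem}.tei.xml"  # hypothetical naming
        tei_path.write_text(process(pdf_path), encoding="utf-8")
        written.append(tei_path)
    return written
```

With a Grobid server running, `process_folder(Path("papers"), Path("grobid_output"))` would write one TEI file per PDF. The `process` parameter exists only so the folder loop can be tried without a server.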

Option 2: Run Locally with Conda

If you prefer to run the project manually, using Docker only for the Grobid server, follow these steps:

Add your PDFs to papers/

Create a Conda environment and install dependencies:

    conda env create -f environment.yml
    conda activate grobid_env

Open another terminal and start the Grobid server manually:

    docker run -t --rm -p 8070:8070 lfoppiano/grobid:latest-full

Wait until Grobid is up and running, then run the script locally:

    python script.py
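
Waiting for Grobid by hand can be replaced with a small polling loop. The sketch below is an assumption about how that might look, not part of the repository: the function names are hypothetical, and it relies on Grobid's `/api/isalive` health endpoint, which answers `true` once the service is ready.

```python
import time
import urllib.error
import urllib.request


def grobid_is_alive(base_url: str = "http://localhost:8070") -> bool:
    """Return True if Grobid's /api/isalive endpoint answers with 'true'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=5) as response:
            return response.read().decode().strip().lower() == "true"
    except (urllib.error.URLError, OSError):
        # Connection refused or reset: the server is not ready yet.
        return False


def wait_for_grobid(probe=grobid_is_alive, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Call `probe` until it succeeds or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if probe():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_grobid()` before the processing step returns `True` as soon as the server responds, or `False` if it has not come up within the timeout.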

Running Example

In papers/ there is one example paper that you can use to try the program.

Extracted abstracts are saved in grobid_output/summaries.txt
The figure count per paper is saved in grobid_output/figure_data.csv
Extracted links are stored in grobid_output/extracted_links.csv
A word cloud is generated in grobid_output/word_cloud_output.png
A bar chart of figure counts is saved in grobid_output/figure_chart.png
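
To give an idea of where the abstract, figure count, and links come from, here is a minimal sketch that reads them out of the TEI XML Grobid returns. It is not the repository's code: `summarize_tei` is a hypothetical helper, skipping `figure` elements marked `type="table"` is an assumption about what counts as a figure, and treating any `target` attribute that starts with `http` as a link is a simplification.

```python
import xml.etree.ElementTree as ET

# Grobid's TEI output uses the standard TEI namespace.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def summarize_tei(tei_xml: str) -> dict:
    """Extract the abstract text, a figure count, and external links from TEI XML."""
    root = ET.fromstring(tei_xml)

    # The abstract sits in the header; collapse whitespace across its paragraphs.
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI)
    abstract_text = " ".join("".join(abstract.itertext()).split()) if abstract is not None else ""

    # Assumption: tables also appear as <figure type="table"> and are not counted.
    figures = [f for f in root.iterfind(".//tei:figure", TEI) if f.get("type") != "table"]

    # Collect unique external targets, keeping first-seen order.
    links = []
    for element in root.iter():
        target = element.get("target", "")
        if target.startswith(("http://", "https://")) and target not in links:
            links.append(target)

    return {"abstract": abstract_text, "figure_count": len(figures), "links": links}
```

The returned dictionary holds the three pieces of data that the output files listed above are built from; writing the CSV files and drawing the charts are separate steps not shown here.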

Preferred Citation

If using this project in research, cite Xiaolei as the main contributor. For example:

    cff-version: 1.2.0
    message: "If you use this software, please cite it as below."
    authors:
    - family-names: "Xiaolei"
      given-names: "Zhu"
    title: "IAoptativa"
    version: 1.0.0
    doi: 10.5281/zenodo.14882318
    date-released: 2025-02-17
    url: "https://github.com/MiiNeLoC0/IAoptativa"

Where to Get Help

You can contact the author through the following method:

Email: xiaolei.zhu@alumnos.upm.es

Acknowledgements

Grobid for text extraction.
Docker for containerized execution.
Conda for environment management.

Owner

Login: MiiNeLoC0
Kind: user

Repositories: 1
Profile: https://github.com/MiiNeLoC0

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Xiaolei"
  given-names: "Zhu"
title: "IAoptativa"
version: 1.0.0
doi: 10.5281/zenodo.14882318
date-released: 2025-02-17
url: "https://github.com/MiiNeLoC0/IAoptativa"

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "type": "SoftwareSourceCode",
  "author": [
    {
      "id": "_:author_1",
      "type": "Person",
      "affiliation": {
        "type": "Organization",
        "name": "UPM"
      },
      "email": "xiaolei.zhu@alumnos.upm.es",
      "familyName": "Zhu",
      "givenName": "Xiaolei"
    },
    {
      "type": "schema:Role",
      "schema:author": "_:author_1",
      "schema:roleName": "Student"
    }
  ],
  "dateModified": "2025-03-04",
  "description": "The program extracts information from scientific papers in PDF format using Grobid. The objective of the project is to process documents to extract abstracts, figure counts, and links in order to visualize the results through a word cloud and bar chart.",
  "license": "https://spdx.org/licenses/Apache-2.0",
  "name": "IAoptativa",
  "operatingSystem": [
    "Windows",
    "Linux"
  ],
  "programmingLanguage": "Python",
  "version": "1.2.0"
}

GitHub Events

Total

Release event: 2
Delete event: 1
Public event: 1
Push event: 6
Create event: 3

Last Year

Release event: 2
Delete event: 1
Public event: 1
Push event: 6
Create event: 3

Dependencies

Dockerfile docker

continuumio/miniconda3 latest build

docker-compose.yml docker

lfoppiano/grobid latest-full

environment.yml pypi

beautifulsoup4 *
certifi *
pillow *
pyparsing *
python-dateutil *
pytz *
tqdm *
