iaoptativa
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: MiiNeLoC0
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 79.6 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 4
Metadata Files
README.md
README - Grobid PDF Processing Project
Description
This repository offers two ways to extract information from scientific papers in PDF format using Grobid. The first is to install the dependencies manually and run Grobid with Docker. The second uses Docker to do everything automatically. The project processes documents to extract abstracts, figure counts, and links, and visualizes the results as a word cloud and a bar chart.
Requirements
Both installation options use Docker. You only need Conda if you choose the manual option.
- Docker: download it from https://www.docker.com/.
- Conda: download it from https://docs.conda.io/en/latest/miniconda.html.
Installation Instructions
Clone the repository
bash
git clone https://github.com/MiiNeLoC0/IAoptativa.git
cd IAoptativa
Open the Docker application
🚀 Execution Instructions
Option 1: Run with Docker (Recommended)
This method automates everything using Docker. Just run the following command.
bash
docker-compose up --build
This will:
- Start the Grobid server.
- Process all PDFs inside the papers/ folder.
- Save extracted data into grobid_output/.
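As a rough illustration of the processing step, the sketch below sends every PDF in papers/ to a running Grobid server and stores the returned TEI XML. It is not the project's script.py. The endpoint /api/processFulltextDocument and the form field input come from Grobid's REST API; the helper names (build_multipart, pdf_to_tei, process_folder) and the .tei.xml output file names are hypothetical and chosen only for this example.

```python
import uuid
import urllib.request
from pathlib import Path

# Grobid's full-text endpoint; host and port match the docker run command in this README.
GROBID_URL = "http://localhost:8070/api/processFulltextDocument"


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body. Returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def pdf_to_tei(pdf_path, url=GROBID_URL, timeout=300):
    """POST a single PDF to Grobid and return the TEI XML response as text."""
    body, content_type = build_multipart("input", pdf_path.name, pdf_path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


def process_folder(papers_dir="papers", output_dir="grobid_output"):
    """Convert every PDF in papers_dir and write one TEI file per paper (hypothetical naming)."""
    out = Path(output_dir)
    out.mkdir(exist_ok=True)
    written = []
    for pdf in sorted(Path(papers_dir).glob("*.pdf")):
        target = out / f"{pdf.stem}.tei.xml"
        target.write_text(pdf_to_tei(pdf), encoding="utf-8")
        written.append(target)
    return written
```

Calling process_folder() with the Grobid container running would fill grobid_output/ with one XML file per paper. The standard library is used here only to keep the sketch dependency-free; an HTTP client library would shorten the upload code.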
Option 2: Run Locally with Conda
If you prefer to run the script manually and use Docker only for the Grobid server, follow these steps:
Add your PDFs to papers/
Create a Conda environment and install dependencies:
bash
conda env create -f environment.yml
conda activate grobid_env
Open another terminal and start the Grobid server manually:
bash
docker run -t --rm -p 8070:8070 lfoppiano/grobid:latest-full
Wait until the Grobid server is up and accepting connections. Then run the script locally:
bash
python script.py
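Instead of watching the container log, the wait can be automated. The sketch below polls Grobid's /api/isalive health endpoint until it answers. The function names and the default values for attempts and delay are assumptions made for this example, not part of the project.

```python
import time
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5):
    """Return True if Grobid's /api/isalive endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and urllib's URLError/HTTPError.
        return False


def wait_for_grobid(check=grobid_is_alive, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until check() succeeds. Returns the attempt number, or None if it never did."""
    for attempt in range(1, attempts + 1):
        if check():
            return attempt
        if attempt < attempts:
            sleep(delay)  # pause between polls, but not after the final failure
    return None
```

A script could call wait_for_grobid() at startup and exit with an error message when it returns None. The check and sleep parameters are injectable so the retry logic can be exercised without a running server.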
Running Example
The papers/ folder contains one example paper that you can use to try the program.
- Extracted abstracts are saved in grobid_output/summaries.txt
- Figure counts per paper are saved in grobid_output/figure_data.csv
- Extracted links are stored in grobid_output/extracted_links.csv
- A word cloud is generated in grobid_output/word_cloud_output.png
- A bar chart of figure counts is saved in grobid_output/figure_chart.png
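To give an idea of how the abstract, figure count, and links could be derived from Grobid's TEI XML output, here is a minimal sketch using only the Python standard library. It is an illustration under assumptions, not the logic in script.py: the function name summarize_tei, the URL regular expression, and the choice to skip tables (which Grobid encodes as figure elements with type="table") are all choices made for this example.

```python
import re
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}  # namespace used by TEI documents
URL_PATTERN = re.compile(r"https?://[^\s<>)\]]+")


def summarize_tei(tei_xml):
    """Return the abstract, figure count, and sorted links found in one TEI document."""
    root = ET.fromstring(tei_xml)

    # Abstract: join all text under <abstract> and collapse runs of whitespace.
    node = root.find(".//tei:profileDesc/tei:abstract", TEI)
    abstract = " ".join("".join(node.itertext()).split()) if node is not None else ""

    # Figures: count <figure> elements, skipping those marked type="table".
    figures = root.findall(".//tei:figure", TEI)
    figure_count = sum(1 for fig in figures if fig.get("type") != "table")

    # Links: external "target" attributes plus URLs that appear in running text.
    links = {
        el.get("target") for el in root.iter()
        if (el.get("target") or "").startswith("http")
    }
    text = " ".join(root.itertext())
    links.update(match.rstrip(".,;") for match in URL_PATTERN.findall(text))

    return {"abstract": abstract, "figures": figure_count, "links": sorted(links)}
```

The returned dictionary maps naturally onto the output files listed above: the abstract text for the summaries and word cloud, the figure count for the CSV and bar chart, and the links for the links CSV.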
Preferred Citation
If using this project in research, cite Xiaolei as the main contributor. For example:
yaml
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Xiaolei"
given-names: "Zhu"
title: "IAoptativa"
version: 1.0.0
doi: 10.5281/zenodo.14882318
date-released: 2025-02-17
url: "https://github.com/MiiNeLoC0/IAoptativa"
Where to Get Help
You can contact the author through the following method:
- Email: xiaolei.zhu@alumnos.upm.es
Acknowledgements
- Grobid for text extraction.
- Docker for containerized execution.
- Conda for environment management.
Owner
- Login: MiiNeLoC0
- Kind: user
- Repositories: 1
- Profile: https://github.com/MiiNeLoC0
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Xiaolei"
    given-names: "Zhu"
title: "IAoptativa"
version: 1.0.0
doi: 10.5281/zenodo.14882318
date-released: 2025-02-17
url: "https://github.com/MiiNeLoC0/IAoptativa"
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"type": "SoftwareSourceCode",
"author": [
{
"id": "_:author_1",
"type": "Person",
"affiliation": {
"type": "Organization",
"name": "UPM"
},
"email": "xiaolei.zhu@alumnos.upm.es",
"familyName": "Zhu",
"givenName": "Xiaolei"
},
{
"type": "schema:Role",
"schema:author": "_:author_1",
"schema:roleName": "Student"
}
],
"dateModified": "2025-03-04",
"description": "The program extracts information from scientific papers in PDF format using Grobid. The objective of the project is to process documents to extract abstracts, figure counts, and links in order to visualize the results through a word cloud and bar chart.",
"license": "https://spdx.org/licenses/Apache-2.0",
"name": "IAoptativa",
"operatingSystem": [
"Windows",
"Linux"
],
"programmingLanguage": "Python",
"version": "1.2.0"
}
GitHub Events
Total
- Release event: 2
- Delete event: 1
- Public event: 1
- Push event: 6
- Create event: 3
Last Year
- Release event: 2
- Delete event: 1
- Public event: 1
- Push event: 6
- Create event: 3
Dependencies
- continuumio/miniconda3 latest build
- lfoppiano/grobid latest-full
- beautifulsoup4 *
- certifi *
- pillow *
- pyparsing *
- python-dateutil *
- pytz *
- tqdm *