openscienceai
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (16.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: andriumon
- License: mit
- Language: Python
- Default Branch: main
- Size: 439 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
PDF Processing Software
Description
Software that processes papers in PDF format by calling Grobid's web service. For each paper it produces a wordcloud, and across all papers it produces a graph of the number of figures per article and a list of the links found in them.
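To make the figure count and link list more concrete, here is a minimal sketch of how both could be derived from the TEI XML that Grobid returns. It is not taken from `pdfProcessing.py`: the function names, the sample document, and the choice to skip `<figure type="table">` elements are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Grobid's TEI output uses this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
URL_RE = re.compile(r"https?://[^\s<>)\]]+")


def count_figures(tei_xml: str) -> int:
    """Count <figure> elements, skipping tables (assumed to be type="table")."""
    root = ET.fromstring(tei_xml)
    figures = root.findall(".//tei:figure", TEI_NS)
    return sum(1 for fig in figures if fig.get("type") != "table")


def extract_links(tei_xml: str) -> List[str]:
    """Collect URLs from `target` attributes and from plain text."""
    root = ET.fromstring(tei_xml)
    found = set()
    for elem in root.iter():
        target = elem.get("target", "")
        if target.startswith(("http://", "https://")):
            found.add(target)
        for text in (elem.text, elem.tail):
            if text:
                # Drop punctuation that trails a URL at the end of a clause.
                found.update(u.rstrip(".,;:") for u in URL_RE.findall(text))
    return sorted(found)


# Hypothetical, heavily trimmed TEI document used only for this example.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p>Code is at https://github.com/example/repo, see
       <ref type="url" target="https://example.org/data">the data</ref>.</p>
    <figure xml:id="fig_0"><head>Figure 1</head></figure>
    <figure xml:id="fig_1"><head>Figure 2</head></figure>
    <figure type="table" xml:id="tab_0"><head>Table 1</head></figure>
  </body></text>
</TEI>"""

print(count_figures(SAMPLE_TEI))
print(extract_links(SAMPLE_TEI))
```

The per-paper figure counts could then be passed to a matplotlib bar chart, and the collected links written to a text file.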
Requirements
- Papers used for input must have an abstract section or the software will fail.
- Docker must be installed
- Download the Grobid Docker image with
  ```console
  docker pull lfoppiano/grobid:0.7.2
  ```
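Because a paper without an abstract makes the software fail, it may be worth checking Grobid's output for one before processing. The sketch below assumes the abstract sits under `profileDesc/abstract` in the TEI header and that a missing abstract shows up as an empty or absent element. `get_abstract` and the sample documents are hypothetical and not part of this repository.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def get_abstract(tei_xml: str) -> Optional[str]:
    """Return the abstract text, or None if it is missing or empty."""
    root = ET.fromstring(tei_xml)
    abstract = root.find(".//tei:profileDesc/tei:abstract", TEI_NS)
    if abstract is None:
        return None
    # Join nested text nodes and collapse runs of whitespace.
    text = " ".join("".join(abstract.itertext()).split())
    return text or None


# Hypothetical TEI headers: one with an abstract, one without.
WITH_ABSTRACT = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc>
    <abstract><div><p>We study   PDFs.</p></div></abstract>
  </profileDesc></teiHeader>
</TEI>"""

WITHOUT_ABSTRACT = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><abstract/></profileDesc></teiHeader>
</TEI>"""

for name, doc in [("with", WITH_ABSTRACT), ("without", WITHOUT_ABSTRACT)]:
    abstract = get_abstract(doc)
    print(name, "->", abstract if abstract else "no abstract, skip this paper")
```

Skipping such papers with a warning, rather than letting the run abort, is one possible design choice; the current script does not necessarily behave this way.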
Dependencies
This build has been developed on Python 3.10 and should work with higher versions.
The Python libraries matplotlib and wordcloud must be installed beforehand.
The dependencies are listed in the dependencies.txt file, which can be used to build the environment with Conda.
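A wordcloud sizes each word by how often it occurs. As a rough illustration of that idea, the sketch below counts word frequencies with the standard library only. The stopword set and the `word_frequencies` helper are hypothetical; the wordcloud library applies its own tokenization and stopwords.

```python
import re
from collections import Counter

# Tiny illustrative stopword set; an assumption, not the wordcloud default.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "we"}


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent non-stopword words with their counts."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return dict(counts.most_common(top_n))


sample = ("Grobid parses PDF papers. The papers contain figures "
          "and the figures contain links.")
print(word_frequencies(sample, top_n=3))
```

A dictionary like this could, for example, be handed to `WordCloud().generate_from_frequencies(...)` from the wordcloud library to render the image.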
Conda
You can install Conda to easily install all the needed dependencies in an environment (recommended).
If you don't want to use Conda, skip step 3 of the Instructions section.
Instructions
1. Copy this repo
   ```console
   git clone https://github.com/andriumon/OpenScienceAI.git
   ```
2. Go to the repo and then to the src directory
   ```console
   cd OpenScienceAI/src
   ```
3. Install dependencies or copy the dependencies file to the src directory and use Conda to do it with
   ```console
   conda create -n newenv
   conda activate newenv
   python3 -m pip install --upgrade pip
   pip install -r dependencies.txt
   ```
   Note: If python3 doesn't work, try py
4. Create a folder called "pdfs" in the src directory and put inside all the papers you want to process
5. Install Grobid's Python Client there
6. Run Grobid with Docker
   ```console
   docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2
   ```
7. Run the script
   ```console
   python3 pdfProcessing.py
   ```
You can check the results in the folders "wordclouds", "figures" and "links", which will be created in the directory after you run the script.
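If the script fails right away, one thing to check is whether the Grobid container is reachable. The sketch below queries Grobid's `/api/isalive` health endpoint on port 8070, the port published by the `docker run` command above. `grobid_is_alive` is a hypothetical helper, not part of this repository, and the default base URL assumes Grobid runs on the local machine.

```python
import http.client
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if the Grobid service answers its health check."""
    url = base_url.rstrip("/") + "/api/isalive"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, malformed reply, etc.
        return False


if __name__ == "__main__":
    if grobid_is_alive():
        print("Grobid is up, you can run pdfProcessing.py")
    else:
        print("Grobid is not reachable, start the Docker container first")
```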
Workflow

Contact
Main author and contact: andres.montero.martin@alumnos.upm.es
Owner
- Name: Andrés Montero
- Login: andriumon
- Kind: user
- Repositories: 1
- Profile: https://github.com/andriumon
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: 'PDF Processing '
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Andrés
    family-names: Montero Martín
    email: andres.montero.martin@alumnos.upm.es
    affiliation: UPM Student
  - {}
repository-code: 'https://github.com/andriumon/OpenScienceAI.git'
abstract: >-
  Software that processes papers in PDF format by calling
  Grobid's web service and makes a wordcloud for each of
  them as well as giving a graph indicating the number of
  figures per article and a list of links found in all of
  them.
keywords:
  - pdf
  - processing
  - grobid
license: MIT
version: 1.0.0
date-released: '2023-03-08'
CodeMeta (codemeta.json)
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/MIT",
  "codeRepository": "git+https://github.com/andriumon/OpenScienceAI.git",
  "dateCreated": "2023-02-15",
  "datePublished": "2023-03-08",
  "dateModified": "2023-03-08",
  "name": "PDF Processing Software",
  "version": "1.0.0",
  "description": "Software that processes papers in PDF format by calling Grobid's web service and makes a wordcloud for each of them as well as giving a graph indicating the number of figures per article and a list of links found in all of them.",
  "applicationCategory": "Processing",
  "releaseNotes": "Initial release",
  "keywords": [
    "pdf",
    "processing",
    "grobid"
  ],
  "programmingLanguage": [
    "Python 3"
  ],
  "runtimePlatform": [
    "Visual Studio Code"
  ],
  "operatingSystem": [
    "Windows 10"
  ],
  "softwareRequirements": [
    "Python 3.10 or higher"
  ],
  "author": [
    {
      "@type": "Person",
      "givenName": "Andrés",
      "familyName": "Montero Martín",
      "email": "andres.montero.martin@alumnos.upm.es",
      "affiliation": {
        "@type": "Organization",
        "name": "Student"
      }
    }
  ]
}