Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.0%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: andriumon
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 439 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed almost 3 years ago
Metadata Files
Readme License Citation Codemeta

README.md

PDF Processing Software


Description

Software that processes papers in PDF format by calling Grobid's web service. It generates a word cloud for each paper, a graph showing the number of figures per article, and a list of the links found across all papers.
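As a rough illustration of the post-processing described above, the sketch below pulls the abstract, the figure count, and the links out of a TEI XML document of the kind Grobid returns. It is not the code in pdfProcessing.py: the function names and the sample document are hypothetical, and it assumes Grobid's usual TEI layout (abstract under the header's profileDesc, figure elements in the body, URLs stored in target attributes).

```python
import xml.etree.ElementTree as ET

# TEI namespace used by Grobid's XML output.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def extract_abstract(root):
    """Return the abstract text, or None if the paper has no abstract."""
    node = root.find(".//tei:profileDesc/tei:abstract", TEI)
    if node is None:
        return None
    # Collapse whitespace across nested elements into single spaces.
    text = " ".join("".join(node.itertext()).split())
    return text or None


def count_figures(root):
    """Count <figure> elements, skipping those marked as tables."""
    figures = root.findall(".//tei:figure", TEI)
    return sum(1 for fig in figures if fig.get("type") != "table")


def extract_links(root):
    """Collect unique http(s) URLs from 'target' attributes, in document order."""
    links = []
    for elem in root.iter():
        target = elem.get("target", "")
        if target.startswith(("http://", "https://")) and target not in links:
            links.append(target)
    return links


# Hypothetical, heavily shortened TEI document for demonstration only.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><abstract><p>We study open science.</p></abstract></profileDesc></teiHeader>
  <text><body>
    <p>Code at <ref type="url" target="https://example.org/code">our site</ref>.</p>
    <figure><head>Figure 1</head></figure>
    <figure type="table"><head>Table 1</head></figure>
  </body></text>
</TEI>"""

root = ET.fromstring(SAMPLE_TEI)
abstract = extract_abstract(root)
figure_count = count_figures(root)
links = extract_links(root)
```

Returning None for a missing abstract lets a caller skip such papers with a clear message instead of failing partway through.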

Requirements

  • Input papers must have an abstract section; otherwise the software will fail.
  • Docker must be installed.
  • Download the Grobid Docker image: docker pull lfoppiano/grobid:0.7.2

Dependencies

This build was developed on Python 3.10 and should work with later versions.

The Python libraries matplotlib and wordcloud must be installed beforehand.
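A quick way to confirm these prerequisites before running anything is sketched below. The helper names are hypothetical and not part of this repo; the check only looks for the two libraries and the Python version mentioned above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names from 'names' that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_recent_enough(minimum=(3, 10)):
    """Return True if the running interpreter is at least version 'minimum'."""
    return sys.version_info >= minimum


missing = missing_modules(["matplotlib", "wordcloud"])
if missing:
    print("Missing libraries:", ", ".join(missing))
if not python_is_recent_enough():
    print("Python 3.10 or higher is expected.")
```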

The dependencies are listed in the dependencies file (dependencies.txt), which can be used to build the environment with Conda.

Conda

You can install Conda to easily install all the required dependencies in an environment (recommended).

If you don't want to use Conda, skip step 3 of the Instructions section.

Instructions

  1. Clone this repo: git clone https://github.com/andriumon/OpenScienceAI.git
  2. Go to the src directory of the repo: cd OpenScienceAI/src
  3. Install the dependencies manually, or copy the dependencies file to the src directory and use Conda to install them:
       conda create -n newenv
       conda activate newenv
       python3 -m pip install --upgrade pip
       pip install -r dependencies.txt
     Note: If python3 doesn't work, try py.

  4. Create a folder called "pdfs" in the src directory and put all the papers you want to process inside it.

  5. Install Grobid's Python client in the src directory.

  6. Run Grobid with Docker: docker run -t --rm -p 8070:8070 lfoppiano/grobid:0.7.2

  7. Run the script: python3 pdfProcessing.py
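If the script fails right away, it can help to confirm that the Grobid container from step 6 is reachable before step 7. The sketch below is a hypothetical helper, not part of this repo; it assumes Grobid's /api/isalive health-check endpoint and the 8070 port mapping used in the docker run command.

```python
import urllib.error
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if Grobid answers its health check with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


if __name__ == "__main__":
    if grobid_is_alive():
        print("Grobid is up; the script can be run.")
    else:
        print("Grobid is not reachable on port 8070; is the container running?")
```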

You can check the results in the folders "wordclouds", "figures" and "links", which will be created in the directory after you run the script.
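To give an idea of what ends up in these folders, here is a minimal sketch of two of the underlying steps: counting word frequencies for a word cloud and writing the collected links to a file. It is an illustration under assumptions, not the repo's implementation; the stop-word list, the function names, and the links.txt file name are hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Small illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "for", "we"}


def word_frequencies(text):
    """Count lowercase alphabetic words in 'text', ignoring stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


def write_links(links, out_dir="links", name="links.txt"):
    """Write one link per line to out_dir/name, creating the folder if needed."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_text("\n".join(links) + "\n", encoding="utf-8")
    return path


freqs = word_frequencies("The model and the data: the model wins.")
```

A frequency mapping like this can be passed to the wordcloud library's WordCloud.generate_from_frequencies method to render the image.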

Workflow

This is a total mess

Contact

Main author and contact: andres.montero.martin@alumnos.upm.es

Owner

  • Name: Andrés Montero
  • Login: andriumon
  • Kind: user

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: 'PDF Processing '
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Andrés
    family-names: Montero Martín
    email: andres.montero.martin@alumnos.upm.es
    affiliation: UPM Student
  - {}
repository-code: 'https://github.com/andriumon/OpenScienceAI.git'
abstract: >-
  Software that processes papers in PDF format by calling
  Grobid's web service and makes a wordcloud for each of
  them as well as giving a graph indicating the number of
  figures per article and a list of links found in all of
  them.
keywords:
  - pdf
  - processing
  - grobid
license: MIT
version: 1.0.0
date-released: '2023-03-08'

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/MIT",
  "codeRepository": "git+https://github.com/andriumon/OpenScienceAI.git",
  "dateCreated": "2023-02-15",
  "datePublished": "2023-03-08",
  "dateModified": "2023-03-08",
  "name": "PDF Processing Software",
  "version": "1.0.0",
  "description": "Software that processes papers in PDF format by calling Grobid's web service and makes a wordcloud for each of them as well as giving a graph indicating the number of figures per article and a list of links found in all of them.",
  "applicationCategory": "Processing",
  "releaseNotes": "Initial release",
  "keywords": [
    "pdf",
    "processing",
    "grobid"
  ],
  "programmingLanguage": [
    "Python 3"
  ],
  "runtimePlatform": [
    "Visual Studio Code"
  ],
  "operatingSystem": [
    "Windows 10"
  ],
  "softwareRequirements": [
    "Python 3.10 or higher"
  ],
  "author": [
    {
      "@type": "Person",
      "givenName": "Andrés",
      "familyName": "Montero Martín",
      "email": "andres.montero.martin@alumnos.upm.es",
      "affiliation": {
        "@type": "Organization",
        "name": "Student"
      }
    }
  ]
}
