task1-aiaosirse

This is the first repository for task 1 in Artificial Intelligence and Open Science in Research Software Engineering.

https://github.com/javizhangg/task1-aiaosirse

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: javizhangg
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Size: 63.7 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created about 1 year ago · Last pushed 12 months ago
Metadata Files
Readme License Citation Codemeta

README.md

Project: Scientific Articles Analysis with Grobid

Description

This project analyzes 10 open-access articles related to Bitcoin using Grobid. It extracts key information and visualizes it in different formats (CSV, PNG).

The main objectives of this project are:

  • Extract keywords and generate a word cloud from abstracts.
  • Visualize the number of figures in each article.
  • List all the links found in each paper.
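The three objectives can be sketched against the TEI XML that Grobid returns. This is a minimal illustration, not the repository's `main.py`: the function name `analyze_tei`, the small stop-word list, and the rule that `figure` elements with `type="table"` are not counted as figures are all assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
URL_PATTERN = re.compile(r"https?://[^\s<>\"')]+")
# Tiny illustrative stop-word list; a real run would use a fuller one.
STOP_WORDS = {"and", "are", "for", "from", "that", "the", "this", "with"}


def analyze_tei(tei_xml, top_n=5):
    """Return keyword counts, figure count, and links for one TEI document."""
    root = ET.fromstring(tei_xml)

    # Objective 1: keyword frequencies from the abstract (word-cloud input).
    abstract_node = root.find(".//tei:abstract", TEI_NS)
    abstract = "" if abstract_node is None else " ".join(abstract_node.itertext())
    words = re.findall(r"[a-z]{3,}", abstract.lower())
    keywords = Counter(w for w in words if w not in STOP_WORDS).most_common(top_n)

    # Objective 2: number of figures (assumption: tables are excluded).
    figures = [
        fig for fig in root.iterfind(".//tei:figure", TEI_NS)
        if fig.get("type") != "table"
    ]

    # Objective 3: links, taken from `target` attributes and from plain text.
    links = []
    for element in root.iter():
        target = element.get("target", "")
        candidates = [target] if target.startswith("http") else []
        text = (element.text or "") + " " + (element.tail or "")
        candidates += [url.rstrip(".,;") for url in URL_PATTERN.findall(text)]
        for url in candidates:
            if url not in links:
                links.append(url)

    return {"keywords": keywords, "figure_count": len(figures), "links": links}
```

The keyword counts could then be passed to a word-cloud library, and the figure counts and links written to CSV or plotted as PNG.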

Requirements

To run the project, install the following dependencies:

Option 1: Running with Docker Compose (Recommended)

This method does not require installing Python or Conda.

  • Install Docker from the official Docker website.

Option 2: Running with Python and Conda (Manual Setup)

Installation Instructions

  1. Clone the repository:

         git clone https://github.com/javizhangg/task1-AIAOSIRSE.git
         cd task1-AIAOSIRSE

Execution Instructions

There are two ways to execute the program:

1. Using Docker Compose (Recommended)

This method sets up the full environment without requiring Python or Conda.

  1. Ensure Docker is installed.
  2. Open Docker Desktop.
  3. Navigate to the project directory and run:

         docker-compose up --build

     This automatically starts all required services, including Grobid, and executes the pipeline.
  4. Once the image has been built, you can start the containers in the background with:

         docker-compose up -d
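When the containers run in the background, Grobid may need some time before it accepts requests. The sketch below polls Grobid's `/api/isalive` endpoint until it answers. It assumes the Compose setup publishes Grobid on `localhost:8070`, as the manual `docker run` command in this README does; `grobid_is_alive` and `wait_for_grobid` are illustrative names, not functions from the repository.

```python
import time
import urllib.request

# Assumption: Grobid is published on the same port as in the manual setup.
GROBID_URL = "http://localhost:8070"


def grobid_is_alive(base_url=GROBID_URL, timeout=5.0):
    """Return True if Grobid's /api/isalive endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, refused connections, and socket timeouts are all OSErrors.
        return False


def wait_for_grobid(probe=grobid_is_alive, attempts=30, delay=2.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times; return the attempt that succeeded."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    raise TimeoutError(f"Grobid did not respond after {attempts} attempts")
```

Calling `wait_for_grobid()` before sending any PDF keeps a script from failing just because the service is still starting.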

2. Using the Manual Python Setup

  1. Start Grobid using Docker:

         docker run --rm --init -p 8070:8070 -p 8071:8071 grobid/grobid:0.8.1

  2. Activate the Conda environment:

         conda init
         conda activate mi_entorno

  3. Run the main script to process the articles:

         python main.py
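The README does not show what `main.py` does internally. As a hedged illustration of how a script can talk to the Grobid container started in step 1, the sketch below posts one PDF to Grobid's `/api/processFulltextDocument` endpoint, which expects the file in a form field named `input`, and returns the TEI XML. It uses only the standard library; `encode_pdf_form` and `pdf_to_tei` are hypothetical helper names.

```python
import os
import urllib.request
import uuid

GROBID_URL = "http://localhost:8070"  # port from the docker run command


def encode_pdf_form(pdf_bytes, filename, boundary=None):
    """Build a multipart/form-data body carrying the PDF in the `input` field."""
    boundary = boundary or uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="input"; filename="{filename}"\r\n'
        "Content-Type: application/pdf\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + pdf_bytes + tail, f"multipart/form-data; boundary={boundary}"


def pdf_to_tei(pdf_path, base_url=GROBID_URL, timeout=120):
    """POST one PDF to Grobid and return the TEI XML response as text."""
    with open(pdf_path, "rb") as handle:
        body, content_type = encode_pdf_form(handle.read(), os.path.basename(pdf_path))
    request = urllib.request.Request(
        f"{base_url}/api/processFulltextDocument",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

With Grobid running, `pdf_to_tei("paper.pdf")` would return the TEI document for a local file named `paper.pdf` (a placeholder path).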

Installation Instructions

  1. Clone the repository:

         git clone https://github.com/javizhangg/task1-AIAOSIRSE.git
         cd task1-AIAOSIRSE

  2. Start Grobid using Docker:

         docker run --rm --init -p 8070:8070 -p 8071:8071 grobid/grobid:0.8.1

  3. Activate the Conda environment:

         conda init
         conda activate <environment_name>

Execution Instructions

1. Using the Current Method (Python Script)

Run the main script to process the articles:

    python main.py

2. Using Docker Compose

Alternatively, you can use Docker Compose to run the project in a containerized environment. To do so:

  1. Ensure Docker and Docker Compose are installed.
  2. Navigate to the project directory and run:

         docker-compose up --build

     This automatically starts all required services, including Grobid, and executes the pipeline.

Automated Testing and CI/CD

This project uses GitHub Actions for continuous integration. To manually trigger tests:

    git push origin main

The CI/CD workflows validate the installation and the execution of the tests.

📄 Documentation

Complete documentation is available on ReadTheDocs.

It includes:

  • Introduction
  • Execution Methods
  • Citation

Preferred Citation

If you use this work, please cite it as:

    @misc{ScientificArticlesAnalysis,
      title = {Scientific Articles Analysis with Grobid},
      howpublished = {\url{https://github.com/javizhangg/task1-AIAOSIRSE}},
      author = {Zhiwei Zhang},
      publisher = {GitHub},
      year = {2025},
    }

License

This project is licensed under the Apache License 2.0.

Where to Get Help

For questions or issues, please use the forum or contact:

  • Author: Zhiwei Zhang
  • Email: Zhiwei.zha@alumnos.upm.es
  • GitHub: https://github.com/javizhangg

Acknowledgments

This project follows the best practices taught in the Open Science and AI course by Daniel Garijo, including reproducibility, metadata structuring, and documentation standards.

📄 Structured Metadata

This project includes metadata in CodeMeta format for easier discovery and reuse.

📌 The codemeta.json file can be found in the repository root: 🔗 codemeta.json

Owner

  • Name: Zhiwei Zhang
  • Login: javizhangg
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Zhang"
  given-names: "Zhiwei"
title: "task1-AIAOSIRSE"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2025-02-26
url: "https://github.com/javizhangg/task1-AIAOSIRSE"

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "type": "SoftwareSourceCode",
  "applicationCategory": "Artificial Intelligence, Research Software Engineering",
  "author": [
    {
      "id": "https://github.com/javizhangg",
      "type": "Person",
      "email": "zhiwei.zha@alumnos.upm.es",
      "familyName": "Zhang",
      "givenName": "Zhiwei"
    }
  ],
  "codeRepository": "git+https://github.com/javizhangg/task1-AIAOSIRSE.git",
  "dateCreated": "2025-02-05",
  "dateModified": "2025-03-04",
  "datePublished": "2025-02-21",
  "description": "This repository contains Python scripts designed to process PDF documents with Grobid.\nIt includes several research PDF files that are analyzed and processed automatically.\nThe main script (`main.py`) initializes and runs Grobid, enabling information to be extracted from the documents.\nRelevant documentation is also provided in README.md and rationale.md.\n",
  "downloadUrl": "https://github.com/javizhangg/task1-AIAOSIRSE.git",
  "identifier": "10.5281/zenodo.14905817",
  "keywords": [
    "Grobid",
    "PDF Processing",
    "Open Science",
    "Research Software",
    "Python"
  ],
  "license": "https://spdx.org/licenses/Apache-2.0",
  "name": "task1-AIAOSIRSE",
  "operatingSystem": [
    "Linux",
    "Windows",
    "macOS"
  ],
  "programmingLanguage": "Python",
  "schema:releaseNotes": "-Automatic processing of PDFs with Grobid.\n-Python scripts ready for easy execution.\n-Lacks deep PDF analysis.",
  "schema:review": {
    "type": "schema:Review",
    "schema:reviewAspect": "Object facet",
    "schema:reviewBody": "Software for automatically extracting data from research PDF documents."
  },
  "runtimePlatform": "Docker",
  "softwareRequirements": [
    "Python 3.x",
    "Grobid (run via Docker)"
  ],
  "version": "2.0.0",
  "developmentStatus": "active",
  "codemeta:isSourceCodeOf": {
    "id": "Research Software"
  },
  "issueTracker": "https://github.com/javizhangg/task1-AIAOSIRSE/issues"
}
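
The CITATION.cff and codemeta.json files listed above report different versions and DOIs. A small check like the following can flag such differences before a release. It is a sketch under stated assumptions: `read_cff_scalars` is a deliberately minimal parser for flat top-level `key: value` lines only (a full YAML library would be the robust choice), and the field pairing in `metadata_mismatches` is chosen for this example. It reports differences without deciding which file is correct.

```python
import json


def read_cff_scalars(cff_text):
    """Read top-level `key: value` pairs from CITATION.cff text."""
    fields = {}
    for line in cff_text.splitlines():
        if not line or line[0] in " -#" or ":" not in line:
            continue  # skip nested entries, list items, and comments
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def metadata_mismatches(cff_text, codemeta_text):
    """Return {label: (cff_value, codemeta_value)} for fields that differ."""
    cff = read_cff_scalars(cff_text)
    codemeta = json.loads(codemeta_text)
    # Assumed pairing of CITATION.cff keys with codemeta.json keys.
    pairs = {
        "version": ("version", "version"),
        "doi": ("doi", "identifier"),
        "title": ("title", "name"),
    }
    return {
        label: (cff.get(cff_key), codemeta.get(meta_key))
        for label, (cff_key, meta_key) in pairs.items()
        if cff.get(cff_key) != codemeta.get(meta_key)
    }
```

An empty result means the compared fields agree; any entry shows the two values side by side.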

GitHub Events

Total
  • Release event: 1
  • Push event: 11
  • Public event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 11
  • Public event: 1
  • Create event: 1