task1-aiaosirse
This is the first repository for Task 1 of Artificial Intelligence And Open Science In Research Software Engineering
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: javizhangg
- License: apache-2.0
- Language: Python
- Default Branch: main
- Size: 63.7 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Project: Scientific Articles Analysis with Grobid
Description
This project analyzes 10 open-access articles related to Bitcoin using Grobid. It extracts key information and visualizes it in different formats (CSV, PNG).
The main objectives of this project are:
- Extract keywords and generate a word cloud from abstracts.
- Visualize the number of figures in each article.
- List all the links found in each paper.
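The three objectives above can be sketched against Grobid's TEI XML output. This is a minimal sketch, not the project's actual `main.py`: the function names, the hand-written sample document, and the choice to skip figures marked `type="table"` are assumptions made for illustration, and a plain word count stands in for the word cloud.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# TEI namespace used in Grobid's XML output.
TEI = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hand-written sample standing in for a real Grobid result (assumption).
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><abstract>
    <p>Bitcoin is a peer-to-peer currency. Bitcoin mining secures the network.</p>
  </abstract></profileDesc></teiHeader>
  <text><body>
    <p>See <ref type="url" target="https://bitcoin.org">the site</ref> and https://example.org/data.</p>
    <figure><head>Figure 1</head></figure>
    <figure type="table"><head>Table 1</head></figure>
  </body></text>
</TEI>"""


def abstract_word_counts(root, min_length=4):
    """Count lowercase words of at least min_length letters in the abstract."""
    abstract = root.find(".//tei:abstract", TEI)
    text = "".join(abstract.itertext()) if abstract is not None else ""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(word for word in words if len(word) >= min_length)


def count_figures(root):
    """Count <figure> elements, skipping those marked as tables."""
    figures = root.iterfind(".//tei:figure", TEI)
    return sum(1 for fig in figures if fig.get("type") != "table")


def extract_links(root):
    """Collect URLs from ref targets and from plain text, without duplicates."""
    links = []
    for ref in root.iterfind(".//tei:ref[@target]", TEI):
        target = ref.get("target")
        if target.startswith("http"):
            links.append(target)
    # Also pick up URLs that appear as plain text in the document body.
    links += re.findall(r"https?://[^\s<>\"]+", "".join(root.itertext()))
    # Trim trailing punctuation, then deduplicate while keeping order.
    cleaned = [link.rstrip(".,;)") for link in links]
    return list(dict.fromkeys(cleaned))


root = ET.fromstring(SAMPLE_TEI)
print(abstract_word_counts(root).most_common(3))
print(count_figures(root))
print(extract_links(root))
```

For real documents, `ET.parse(path).getroot()` on a saved TEI file would replace the inline sample; the counts could then feed a word-cloud or plotting library of your choice.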
Requirements
To run the project, install the following dependencies:
Option 1: Running with Docker Compose (Recommended)
This method does not require installing Python or Conda.
- Install Docker from the official Docker website.
Option 2: Running with Python and Conda (Manual Setup)
- Install Docker from the official Docker website.
- Pull the Grobid image with the command:
```bash
docker pull grobid/grobid:0.8.1
```
- Install Anaconda from the official Anaconda website.
- Create a new environment using the environment.yml file:
```bash
cd (project-directory)
conda env create -f environment.yml
```
- Install Python from the official Python website.
Installation Instructions
- Clone the repository:
```bash
git clone https://github.com/javizhangg/task1-AIAOSIRSE.git
cd task1-AIAOSIRSE
```
Execution Instructions
There are two ways to execute the program:
1. Using Docker Compose (Recommended)
This method sets up the full environment without requiring Python or Conda.
1. Ensure Docker is installed.
2. Open Docker Desktop.
3. Navigate to the project directory and run:
```bash
docker-compose up --build
```
This will automatically start all required services, including Grobid, and execute the pipeline.
4. Once the image has been built, you can start it in the background with the command:
```bash
docker-compose up -d
```
2. Using the Manual Python Setup
Start Grobid using Docker:
```bash
docker run --rm --init -p 8070:8070 -p 8071:8071 grobid/grobid:0.8.1
```
Activate the Conda environment:
```bash
conda init
conda activate mi_entorno
```
Run the main script to process the articles:
```bash
python main.py
```
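Before running `main.py`, it can help to confirm that the Grobid container started above is reachable. The following is a minimal sketch, assuming Grobid listens on `http://localhost:8070` (the port mapped in the `docker run` command) and using Grobid's `/api/isalive` health endpoint. The helper name `grobid_is_alive` is hypothetical and is not part of this repository.

```python
import urllib.request


def grobid_is_alive(base_url="http://localhost:8070", timeout=5.0):
    """Return True if Grobid's /api/isalive endpoint answers with 'true'."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/isalive", timeout=timeout) as resp:
            body = resp.read().decode("utf-8").strip()
            return resp.status == 200 and body == "true"
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError and HTTPError are subclasses of OSError).
        return False


if __name__ == "__main__":
    print("Grobid reachable:", grobid_is_alive())
```

If the check prints `False`, the container may still be starting; Grobid can take some time to load its models before it accepts requests.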
Installation Instructions
Clone the repository:
```bash
git clone https://github.com/javizhangg/task1-AIAOSIRSE.git
cd task1-AIAOSIRSE
```
Start Grobid using Docker:
```bash
docker run --rm --init -p 8070:8070 -p 8071:8071 grobid/grobid:0.8.1
```
Activate the Conda environment:
```bash
conda init
conda activate <environment_name>
```
Execution Instructions
1. Using the Current Method (Python Script)
Run the main script to process the articles:
```bash
python main.py
```
2. Using Docker Compose
Alternatively, you can use Docker Compose to run the project in a containerized environment. To do so:
- Ensure Docker and Docker Compose are installed.
- Navigate to the project directory and run:
```bash
docker-compose up --build
```
This will automatically start all required services, including Grobid, and execute the pipeline.
Automated Testing and CI/CD
This project uses GitHub Actions for continuous integration. To trigger the tests manually, push to the main branch:
```bash
git push origin main
```
CI/CD workflows validate the installation and execution of tests.
📄 Documentation
Complete documentation is available on ReadTheDocs:
It includes:
- Introduction
- Execution Methods
- Citation
Preferred Citation
If you use this work, please cite it as:
```bibtex
@misc{ScientificArticlesAnalysis,
  title = {Scientific Articles Analysis with Grobid},
  howpublished = {\url{https://github.com/javizhangg/task1-AIAOSIRSE}},
  author = {Zhiwei Zhang},
  publisher = {GitHub},
  year = {2025},
}
```
License
This project is licensed under the Apache License 2.0.
Where to Get Help
For questions or issues, please use the forum or contact:
- Author: Zhiwei Zhang
- Email: Zhiwei.zha@alumnos.upm.es
- GitHub: https://github.com/javizhangg
Acknowledgments
This project follows the best practices taught in the Open Science and AI course by Daniel Garijo, including reproducibility, metadata structuring, and documentation standards.
📄 Structured Metadata
This project includes metadata in CodeMeta format for easier discovery and reuse.
📌 The codemeta.json file can be found in the repository root:
🔗 codemeta.json
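As an illustration of how this metadata can be reused, the sketch below reads a few CodeMeta fields with Python's standard `json` module. It is a minimal sketch under stated assumptions: the inline excerpt copies a handful of values from the repository's codemeta.json so the example runs on its own, and the helper name `summarize_codemeta` is hypothetical.

```python
import json

# Inline excerpt of codemeta.json so the sketch is self-contained (assumption:
# in practice you would load the file from the repository root instead).
CODEMETA_EXCERPT = """{
  "name": "task1-AIAOSIRSE",
  "version": "2.0.0",
  "identifier": "10.5281/zenodo.14905817",
  "license": "https://spdx.org/licenses/Apache-2.0",
  "keywords": ["Grobid", "PDF Processing", "Open Science", "Research Software", "Python"]
}"""


def summarize_codemeta(metadata):
    """Build a one-line summary from a parsed CodeMeta dictionary."""
    name = metadata.get("name", "unknown")
    version = metadata.get("version", "unknown")
    identifier = metadata.get("identifier", "no identifier")
    return f"{name} v{version} ({identifier})"


metadata = json.loads(CODEMETA_EXCERPT)
print(summarize_codemeta(metadata))
print(", ".join(metadata.get("keywords", [])))
```

To read the real file, replace the `json.loads` call with `json.load` on `open("codemeta.json", encoding="utf-8")`.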
Owner
- Name: Zhiwei Zhang
- Login: javizhangg
- Kind: user
- Repositories: 1
- Profile: https://github.com/javizhangg
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Zhang"
    given-names: "Zhiwei"
title: "task1-AIAOSIRSE"
version: 1.0.0
doi: 10.5281/zenodo.1234
date-released: 2025-02-26
url: "https://github.com/javizhangg/task1-AIAOSIRSE"
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"type": "SoftwareSourceCode",
"applicationCategory": "Artificial Intelligence, Research Software Engineering",
"author": [
{
"id": "https://github.com/javizhangg",
"type": "Person",
"email": "zhiwei.zha@alumnos.upm.es",
"familyName": "Zhang",
"givenName": "Zhiwei"
}
],
"codeRepository": "git+https://github.com/javizhangg/task1-AIAOSIRSE.git",
"dateCreated": "2025-02-05",
"dateModified": "2025-03-04",
"datePublished": "2025-02-21",
"description": "This repository contains Python scripts designed to process PDF documents using Grobid.\nIt includes several research PDF files that are analyzed and processed automatically.\nThe main script (`main.py`) initializes and runs Grobid, enabling the extraction of information from the documents.\nRelevant documentation is also provided in README.md and rationale.md.\n",
"downloadUrl": "https://github.com/javizhangg/task1-AIAOSIRSE.git",
"identifier": "10.5281/zenodo.14905817",
"keywords": [
"Grobid",
"PDF Processing",
"Open Science",
"Research Software",
"Python"
],
"license": "https://spdx.org/licenses/Apache-2.0",
"name": "task1-AIAOSIRSE",
"operatingSystem": [
"Linux",
"Windows",
"macOS"
],
"programmingLanguage": "Python",
"schema:releaseNotes": "-Automatic processing of PDFs with Grobid.\n-Python scripts ready for easy execution.\n-Lacks deep PDF analysis.",
"schema:review": {
"type": "schema:Review",
"schema:reviewAspect": "Object facet",
"schema:reviewBody": "Software for the automatic extraction of data from research PDF documents."
},
"runtimePlatform": "Docker",
"softwareRequirements": [
"Python 3.x",
"Grobid (run via Docker)"
],
"version": "2.0.0",
"developmentStatus": "active",
"codemeta:isSourceCodeOf": {
"id": "Research Software"
},
"issueTracker": "https://github.com/javizhangg/task1-AIAOSIRSE/issues"
}
GitHub Events
Total
- Release event: 1
- Push event: 11
- Public event: 1
- Create event: 1
Last Year
- Release event: 1
- Push event: 11
- Public event: 1
- Create event: 1