tidy_bib

Tidy_bib automates the preprocessing of .bib files exported from databases like Web of Science and Scopus, preparing the data for analysis in biblioshiny or for enrichment via APIs (Crossref, OpenAlex, ROR).

https://github.com/danielbrazil303/tidy_bib

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: danielbrazil303
  • License: MIT
  • Language: R
  • Default Branch: main
  • Size: 76.2 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 8 months ago · Last pushed 8 months ago
Metadata Files
Readme License Citation

README.Rmd

---
output: github_document
---

# 📚 Tidy\_bib

> A modular and reproducible pipeline for cleaning and organizing bibliographic data using R.

---

## 🎯 Purpose

`Tidy_bib` automates the preprocessing of `.bib` files exported from databases like **Web of Science** and **Scopus**, preparing the data for analysis in `biblioshiny` or for enrichment via APIs (Crossref, OpenAlex, ROR).

---

## 🗂️ Project Structure

```text
Tidy_bib/
├── R/               # Function scripts (e.g., safe_convert2df.R)
├── data/            # Raw files (.bib)
├── output/          # Cleaned data, spreadsheets, logs
├── tests/           # Unit tests with testthat
├── docs/            # Additional documentation or generated reports
├── run_pipeline.R   # Main execution script
├── config.yaml      # Configuration file
├── README.Rmd       # This file
├── .gitignore       # Files to ignore in version control
└── renv/            # Dependency management
```

---

## ⚙️ How to Run the Pipeline

### 1. Install required packages:

```r
# Packages listed under Imports in DESCRIPTION
install.packages(c("bibliometrix", "dplyr", "fs", "glue", "here", "stringr", "yaml"))
```

### 2. Run the pipeline:

```r
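# Run from the project root so the relative paths resolve
source("R/convert_bib_files_to_df.R")
source("R/safe_convert2df.R")
source("run_pipeline.R")

# Start the pipeline with the settings in config.yaml
initialize_pipeline("config.yaml")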
```
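For readers who want a feel for what a config-driven conversion step can look like, here is a minimal sketch. It is not the repository's implementation: the function names `safe_convert_sketch` and `run_sketch` and the config keys `input_dir` and `dbsource` are assumptions made for illustration. The only package calls relied on are `yaml::read_yaml()` and `bibliometrix::convert2df()` with its `file`, `dbsource`, and `format` arguments.

```r
# Illustrative sketch only; names and config keys are hypothetical.

# Convert one .bib file. On error, warn and return NULL instead of stopping,
# so a single malformed export does not abort the whole batch.
# `converter` defaults to bibliometrix::convert2df; any function accepting
# (file, dbsource, format) can be swapped in.
safe_convert_sketch <- function(file, dbsource,
                                converter = bibliometrix::convert2df) {
  tryCatch(
    converter(file = file, dbsource = dbsource, format = "bibtex"),
    error = function(e) {
      warning(sprintf("Skipping %s: %s", basename(file), conditionMessage(e)),
              call. = FALSE)
      NULL
    }
  )
}

# Read the config, find every .bib file in the input directory, and convert
# each one. Returns a named list holding only the successful conversions.
run_sketch <- function(config_path, converter = bibliometrix::convert2df) {
  cfg <- yaml::read_yaml(config_path)  # assumed keys: input_dir, dbsource
  bib_files <- list.files(cfg$input_dir, pattern = "\\.bib$", full.names = TRUE)
  results <- lapply(bib_files, safe_convert_sketch,
                    dbsource = cfg$dbsource, converter = converter)
  names(results) <- basename(bib_files)
  Filter(Negate(is.null), results)
}
```

Under these assumptions, a `config.yaml` containing `input_dir: data` and `dbsource: wos` would let `run_sketch("config.yaml")` return one data frame per readable file in `data/`. Passing the converter as an argument keeps the error handling testable without real database exports.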

---

## 🧪 Testing

This project uses `testthat`. To run the tests (this requires the `devtools` package, which is not in the install list above):

```r
devtools::test()
```
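As a rough idea of what a test for this kind of pipeline can look like, the sketch below checks that only `.bib` files are picked up from an input directory. The helper `list_bib_files` is hypothetical and defined inline so the example is self-contained; it is not a function from this repository. With `testthat`, a file like this is conventionally saved under `tests/testthat/` with a name starting with `test-`.

```r
library(testthat)

# Hypothetical helper, defined here only to make the example runnable.
list_bib_files <- function(dir) {
  list.files(dir, pattern = "\\.bib$", full.names = TRUE, ignore.case = TRUE)
}

test_that("only .bib files are picked up from the input directory", {
  # Build a throwaway directory with two .bib files and one unrelated file
  tmp <- tempfile("bib_test_")
  dir.create(tmp)
  file.create(file.path(tmp, c("wos.bib", "scopus.BIB", "notes.txt")))

  found <- basename(list_bib_files(tmp))
  unlink(tmp, recursive = TRUE)  # clean up before asserting

  expect_setequal(found, c("wos.bib", "scopus.BIB"))
})
```

Using a temporary directory keeps the test independent of whatever happens to be in `data/` on a given machine.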

---

## 👥 Contributing

1. Fork the repository
2. Create a branch: `git checkout -b new-feature`
3. Commit your changes: `git commit -m "feat: add new feature"`
4. Push to your branch: `git push origin new-feature`
5. Open a pull request

---

## 🔒 License

This project is licensed under the MIT License. See the `LICENSE` file for details.

---

## 📌 Citation

If you use this project, please cite as:

```
Arraes, D. (2025). Tidy_bib: A modular bibliographic cleaning pipeline in R. https://github.com/danielbrazil303/Tidy_bib
```

Owner

  • Login: danielbrazil303
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this project, please cite it as below."
title: "Tidy_bib: A modular bibliographic cleaning pipeline in R"
version: "0.1.0"
authors:
  - family-names: Arraes
    given-names: Daniel
    email: dan.arraes@uece.br
    affiliation: Universidade Estadual do Ceará (UECE)
    orcid: "https://orcid.org/0000-0003-0697-2268"
date-released: 2025-07-30
license: MIT
repository-code: https://github.com/danielbrazil303/Tidy_bib

GitHub Events

Total
  • Push event: 2
  • Create event: 1
Last Year
  • Push event: 2
  • Create event: 1

Dependencies

DESCRIPTION cran
  • R >= 4.2.0 depends
  • bibliometrix * imports
  • dplyr * imports
  • fs * imports
  • glue * imports
  • here * imports
  • stringr * imports
  • yaml * imports
  • openxlsx * suggests
  • renv * suggests
  • testthat >= 3.0.0 suggests