tidy_bib

Tidy_bib automates the preprocessing of .bib files exported from databases like Web of Science and Scopus, preparing the data for analysis in biblioshiny or for enrichment via APIs (Crossref, OpenAlex, ROR).

https://github.com/danielbrazil303/tidy_bib

Science Score: 44.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

✓ CITATION.cff file: found
✓ codemeta.json file: found
✓ .zenodo.json file: found
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: danielbrazil303
License: mit
Language: R
Default Branch: main
Homepage:
Size: 76.2 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 10 months ago · Last pushed 10 months ago

Metadata Files

Readme License Citation

README.Rmd

---
output: github_document
---

# 📚 Tidy\_bib

> A modular and reproducible pipeline for cleaning and organizing bibliographic data using R.

---

## 🎯 Purpose

`Tidy_bib` automates the preprocessing of `.bib` files exported from databases like **Web of Science** and **Scopus**, preparing the data for analysis in `biblioshiny` or for enrichment via APIs (Crossref, OpenAlex, ROR).
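One example of this kind of preprocessing: records from Web of Science and Scopus usually have to be matched on DOI before they are merged or sent to Crossref or OpenAlex. The sketch below is hypothetical and is not taken from Tidy_bib. `normalize_doi()` and `dedupe_by_doi()` are illustrative names, and the sketch assumes the DOI is stored in a `DI` column, which is the Web of Science field tag that bibliometrix data frames typically use.

```r
# Hypothetical helpers, not part of Tidy_bib.
# Normalize DOIs: trim whitespace, lowercase, drop resolver prefixes,
# and turn empty strings into NA.
normalize_doi <- function(doi) {
  doi <- tolower(trimws(doi))
  doi <- sub("^(https?://(dx\\.)?doi\\.org/|doi:[[:space:]]*)", "", doi)
  doi[which(doi == "")] <- NA_character_
  doi
}

# Keep the first record for each normalized DOI.
# Records without a DOI cannot be matched this way, so all of them are kept.
dedupe_by_doi <- function(df, doi_col = "DI") {
  key <- normalize_doi(df[[doi_col]])
  df[is.na(key) | !duplicated(key), , drop = FALSE]
}
```

A real pipeline would probably also need a fallback for records without a DOI, such as matching on title and publication year.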

---

## 🗂️ Project Structure

```text
Tidy_bib/
├── R/               # Function scripts (e.g., safe_convert2df.R)
├── data/            # Raw files (.bib)
├── output/          # Cleaned data, spreadsheets, logs
├── tests/           # Unit tests with testthat
├── docs/            # Additional documentation or generated reports
├── run_pipeline.R   # Main execution script
├── config.yaml      # Configuration file
├── README.Rmd       # This file
├── .gitignore       # Files to ignore in version control
└── renv/            # Dependency management
```
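The file name `safe_convert2df.R` suggests a defensive wrapper around `bibliometrix::convert2df()`. Its actual contents are not shown here. The following is a minimal sketch of that idea, assuming a converter that takes the file path as its first argument. `try_convert()` is a hypothetical name. The converter is passed in as an argument so that the error handling can be exercised without bibliometrix installed.

```r
# Hypothetical sketch, not the project's safe_convert2df(): call a converter
# (bibliometrix::convert2df by default) and turn a failure into a warning,
# an optional log line, and a NULL result instead of an error.
try_convert <- function(path, converter = bibliometrix::convert2df, ...,
                        log_file = NULL) {
  tryCatch(
    converter(path, ...),
    error = function(e) {
      msg <- sprintf("[%s] failed to convert %s: %s",
                     format(Sys.time(), "%Y-%m-%d %H:%M:%S"),
                     path, conditionMessage(e))
      # Append one line per failure, e.g. to a log file under output/
      if (!is.null(log_file)) {
        cat(msg, "\n", file = log_file, append = TRUE, sep = "")
      }
      warning(msg, call. = FALSE)
      NULL
    }
  )
}
```

For example, `try_convert("data/example.bib", dbsource = "wos", format = "bibtex")` would forward `dbsource` and `format` to `convert2df()`. The file name here is only a placeholder.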

---

## ⚙️ How to Run the Pipeline

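### 1. Install the required packages

```r
# Packages listed under Imports in DESCRIPTION
install.packages(c("bibliometrix", "yaml", "here", "fs", "dplyr", "glue", "stringr"))

# Alternatively, if the project's renv lockfile is present:
# renv::restore()
```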
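If some of these packages are already installed, a base-R check can list only the missing ones before anything is installed. This is a sketch and is not part of the project. `missing_pkgs()` is a hypothetical helper, and the package vector mirrors the Imports listed in the project's DESCRIPTION file.

```r
# Packages listed under Imports in the project's DESCRIPTION file
required <- c("bibliometrix", "dplyr", "fs", "glue", "here", "stringr", "yaml")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded
missing_pkgs <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Then install only what is missing, e.g.:
# to_install <- missing_pkgs(required)
# if (length(to_install) > 0) install.packages(to_install)
```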

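### 2. Run the pipeline

```r
# Run from the project root so the relative paths below resolve
source("R/convert_bib_files_to_df.R")
source("R/safe_convert2df.R")
source("run_pipeline.R")

# Run the pipeline with the settings in config.yaml
initialize_pipeline("config.yaml")
```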
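The contents of `convert_bib_files_to_df.R` are not shown here. As a hypothetical sketch of batch conversion over the `data/` folder, the function below lists the `.bib` files, converts each one, and skips any file that fails. It then row-binds the results and fills in columns that exist in only one source, since Web of Science and Scopus exports rarely share exactly the same fields. `convert_dir()` and `bind_fill()` are illustrative names. In the project itself, `dplyr::bind_rows()` would be a natural replacement for `bind_fill()`, because dplyr is already a dependency.

```r
# Hypothetical sketch, not the project's convert_bib_files_to_df():
# row-bind data frames, adding NA columns where a source lacks a field.
bind_fill <- function(dfs) {
  cols <- unique(unlist(lapply(dfs, names)))
  dfs <- lapply(dfs, function(d) {
    for (m in setdiff(cols, names(d))) d[[m]] <- rep(NA, nrow(d))
    d[cols]
  })
  do.call(rbind, dfs)
}

# Convert every .bib file in `dir`. Files whose conversion fails are skipped
# with a warning instead of aborting the whole batch.
convert_dir <- function(dir, converter = bibliometrix::convert2df, ...) {
  files <- list.files(dir, pattern = "\\.bib$", full.names = TRUE,
                      ignore.case = TRUE)
  results <- lapply(files, function(f, ...) {
    tryCatch(converter(f, ...), error = function(e) {
      warning(sprintf("skipping %s: %s", basename(f), conditionMessage(e)),
              call. = FALSE)
      NULL
    })
  }, ...)
  results <- Filter(Negate(is.null), results)
  if (length(results) == 0) return(data.frame())
  bind_fill(results)
}
```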

---

## 🧪 Testing

This project uses `testthat`. To run the tests:

```r
# Requires devtools: install.packages("devtools")
devtools::test()
```

---

## 👥 Contributing

1. Fork the repository
2. Create a branch: `git checkout -b new-feature`
3. Commit your changes: `git commit -m "feat: add new feature"`
4. Push to your branch: `git push origin new-feature`
5. Open a pull request

---

## 🔒 License

This project is licensed under the MIT License. See the `LICENSE` file for details.

---

## 📌 Citation

If you use this project, please cite it as:

```
Arraes, D. (2025). Tidy_bib: A modular bibliographic cleaning pipeline in R. https://github.com/danielbrazil303/Tidy_bib
```

Owner

Login: danielbrazil303
Kind: user

Repositories: 1
Profile: https://github.com/danielbrazil303

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this project, please cite it as below."
title: "Tidy_bib: A modular bibliographic cleaning pipeline in R"
version: "0.1.0"
authors:
  - family-names: Arraes
    given-names: Daniel
    email: dan.arraes@uece.br
    affiliation: Universidade Estadual do Ceará (UECE)
    orcid: "https://orcid.org/0000-0003-0697-2268" 
date-released: 2025-07-30
license: MIT
repository-code: https://github.com/danielbrazil303/Tidy_bib

GitHub Events

Total

Push event: 2
Create event: 1

Last Year

Push event: 2
Create event: 1

Dependencies

DESCRIPTION cran

R >= 4.2.0 depends
bibliometrix * imports
dplyr * imports
fs * imports
glue * imports
here * imports
stringr * imports
yaml * imports
openxlsx * suggests
renv * suggests
testthat >= 3.0.0 suggests
