doi_scraper
Digital Object Identifier scraper written in Python
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
Statistics
- Stars: 5
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 5
Metadata Files
README.md
DOI Scraper
The DOI Scraper is a Python script that reads a `.bib` file, finds entries that are missing required fields (such as a DOI), retrieves the missing information from the Crossref API, and rewrites the file with consistent indentation. Each supported entry type (e.g., articles, books, inproceedings, and tech reports) defines its own set of required fields.
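Per-type required fields can be represented as a lookup table from entry type to field names. The sketch below is illustrative only: `REQUIRED_FIELDS` and `missing_fields` are hypothetical names, and the field lists are assumptions, not necessarily the ones `doi_scraper.py` uses.

```python
from __future__ import annotations

# Hypothetical table: the real required fields in doi_scraper.py may differ.
REQUIRED_FIELDS: dict[str, tuple[str, ...]] = {
    "article": ("title", "author", "journal", "year", "volume", "pages", "doi"),
    "book": ("title", "author", "publisher", "year", "doi"),
    "inproceedings": ("title", "author", "booktitle", "year", "doi"),
    "techreport": ("title", "author", "institution", "year"),
}


def missing_fields(entry_type: str, fields: dict[str, str]) -> list[str]:
    """Return the required fields that are absent or blank in an entry."""
    required = REQUIRED_FIELDS.get(entry_type.lower(), ())
    return [name for name in required if not fields.get(name, "").strip()]
```

Applied to the `Cuadra2020` entry in the Example section, which has only `title`, `author`, and `pages`, this returns `['journal', 'year', 'volume', 'doi']`, the same fields that appear in the completed entry. Unknown entry types yield an empty list, so they are left untouched.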
Prerequisites
- Python 3.x
- `requests` library
- `tqdm` library
Installation
1. Clone the repository or download the `doi_scraper.py` file.
2. Install the required dependencies by running the following command:

```shell
pip install -r requirements.txt
```
Usage
1. Place your input `.bib` file in the same directory as the `doi_scraper.py` script.
2. Open `doi_scraper.py` and set the following variables according to your needs:

```python
input_file = 'input.bib'    # Name of the input .bib file
output_file = 'output.bib'  # Name of the output .bib file
INDENT_PRE = 4              # Number of spaces before the field name
INDENT_POST = 16            # Number of spaces after the field name
```

3. Run the script using the following command:

```shell
python doi_scraper.py
```
The script searches for entries that are missing required fields (such as a DOI) and retrieves the missing information from the Crossref API. It then writes the completed entries to the output `.bib` file.
Once the script completes, the updated `.bib` file is written to the same directory.
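A Crossref lookup of this kind boils down to a search request against the public works endpoint followed by a mapping from the returned record to BibTeX fields. The sketch below is a hedged illustration, not the script's actual code: the helper names `crossref_query_url`, `fields_from_crossref`, and `lookup_fields` are hypothetical, and it uses the standard library's `urllib` for self-containment, whereas the script itself depends on `requests`.

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"


def crossref_query_url(title: str, author: str = "", rows: int = 1) -> str:
    """Build a Crossref search URL from an entry's title and author."""
    query = f"{title} {author}".strip()
    params = urlencode({"query.bibliographic": query, "rows": rows})
    return f"{CROSSREF_WORKS}?{params}"


def fields_from_crossref(item: dict) -> dict[str, str]:
    """Map one Crossref work record to BibTeX field names."""
    fields: dict[str, str] = {}
    if item.get("DOI"):
        fields["doi"] = item["DOI"]
    if item.get("container-title"):  # Crossref returns a list of titles
        fields["journal"] = item["container-title"][0]
    if item.get("volume"):
        fields["volume"] = item["volume"]
    # Publication date is nested as {"date-parts": [[year, month, day]]}
    date_parts = item.get("issued", {}).get("date-parts", [[]])
    if date_parts and date_parts[0] and date_parts[0][0]:
        fields["year"] = str(date_parts[0][0])
    return fields


def lookup_fields(title: str, author: str = "", timeout: float = 10.0) -> dict[str, str]:
    """Query Crossref (network call) and return fields from the top-ranked match."""
    with urlopen(crossref_query_url(title, author), timeout=timeout) as response:
        items = json.load(response)["message"]["items"]
    return fields_from_crossref(items[0]) if items else {}
```

Accepting the top-ranked result blindly is the weak point of this sketch; a more careful version would compare the returned title with the entry's title before merging any fields.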
Optional Arguments
- `--format-only`: Reformat the file without performing any Crossref lookups.
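A flag like this is naturally a boolean switch. The following is a minimal `argparse` sketch, assuming the script parses its command line in a similar way; the `parse_args` helper is hypothetical.

```python
from __future__ import annotations

import argparse


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    """Parse the command line; --format-only disables Crossref lookups."""
    parser = argparse.ArgumentParser(description="Complete and reformat a .bib file.")
    parser.add_argument(
        "--format-only",
        action="store_true",  # False unless the flag is given
        help="reformat the file without performing any Crossref lookups",
    )
    return parser.parse_args(argv)
```

`argparse` exposes the flag as `args.format_only`, so the main routine can skip the lookup step when it is set and still run the reformatting pass, e.g. `python doi_scraper.py --format-only`.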
Example
Before

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
pages= {A30 1--39}
}
```

After

```bibtex
@article{Cuadra2020,
title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
pages = {A30 1--39},
year = {2020},
journal = {Journal of Fluid Mechanics},
volume = {903},
doi = {10.1017/jfm.2020.651},
}
```
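The reformatting step can be sketched as a function that renders an entry's fields with a fixed indent and a trailing comma after every field, as in the "After" entry. This is an assumption-laden illustration: `format_entry` is a hypothetical name, and it reads `INDENT_POST` as the width the field name is padded to (so the `=` signs line up), whereas the script may instead insert a fixed number of spaces.

```python
from __future__ import annotations

INDENT_PRE = 4    # spaces before each field name
INDENT_POST = 16  # assumed here: width the field name is padded to


def format_entry(entry_type: str, key: str, fields: dict[str, str]) -> str:
    """Render one BibTeX entry with aligned '=' signs and trailing commas."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Left-justify the name to INDENT_POST characters, then wrap the value in braces
        lines.append(f"{' ' * INDENT_PRE}{name:<{INDENT_POST}}= {{{value}}},")
    lines.append("}")
    return "\n".join(lines)
```

With these settings every `=` lands in column 20. Field names longer than `INDENT_POST` are not truncated, so they simply push their `=` further right.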
License
This project is licensed under the MIT License.
Owner
- Name: Alberto Cuadra-Lara
- Login: AlbertoCuadra
- Kind: user
- Location: Madrid, Spain
- Company: Universidad Carlos III de Madrid
- Website: https://acuadralara.com
- Repositories: 9
- Profile: https://github.com/AlbertoCuadra
Pre-doctoral researcher in Fluid Mechanics
Citation (CITATION.cff)
```yaml
# YAML 1.2
---
cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
type: misc
license: "MIT"
title: "DOI Scraper"
version: 1.2.0
doi: 10.5281/zenodo.7932535
date-released: 2025-03-20
url: "https://github.com/AlbertoCuadra/doi_scraper"
abstract:
  "The DOI Scraper is a Python script that reads a `.bib` file, searches for entries missing required fields (such as a DOI), retrieves the missing information using the Crossref API, and reformats the file with consistent indentation. The refactored design supports different entry types (e.g., articles, books, inproceedings, tech reports), with each type defining its own required fields."
authors:
  -
    family-names: "Cuadra"
    given-names: A
    orcid: "https://orcid.org/0000-0001-8280-2426"
keywords:
  - scraper
  - latex
  - bibtex
  - doi
  - crossref
  - "crossref-api"
  - python
  - "open-source"
```
GitHub Events
Total
- Release event: 1
- Watch event: 3
- Push event: 3
- Pull request event: 4
- Create event: 1
Last Year
- Release event: 1
- Watch event: 3
- Push event: 3
- Pull request event: 4
- Create event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alberto Cuadra Lara | a****a@i****s | 13 |
Issues and Pull Requests
Last synced: 8 months ago
All Time
- Total issues: 0
- Total pull requests: 10
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 10
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: 1 minute
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- AlbertoCuadra (8)