doi_scraper

Digital Object Identifier scraper written in Python

https://github.com/albertocuadra/doi_scraper

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary

Keywords

bibtex crossref crossref-api doi latex python research scraper

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: AlbertoCuadra
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 34.2 KB

Statistics

Stars: 5
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 5

Topics

bibtex crossref crossref-api doi latex python research scraper

Created about 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

DOI Scraper

The DOI Scraper is a Python script that reads a .bib file, searches for entries missing required fields (such as a DOI), retrieves the missing information using the Crossref API, and reformats the file with consistent indentation. The refactored design supports different entry types (e.g., articles, books, inproceedings, tech reports), with each type defining its own required fields.
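As a rough illustration of the "each type defines its own required fields" idea, the check could look something like the sketch below. The `REQUIRED_FIELDS` mapping and the `missing_fields` helper are hypothetical names chosen for this example, and the field lists are assumptions; the actual lists in doi_scraper.py may differ.

```python
# Hypothetical mapping from entry type to required fields.
# The real lists used by doi_scraper.py may differ.
REQUIRED_FIELDS = {
    "article": ("author", "title", "journal", "year", "doi"),
    "book": ("author", "title", "publisher", "year"),
    "inproceedings": ("author", "title", "booktitle", "year"),
    "techreport": ("author", "title", "institution", "year"),
}


def missing_fields(entry_type, fields):
    """Return the required fields that are absent or empty in `fields`."""
    required = REQUIRED_FIELDS.get(entry_type.lower(), ())
    return [name for name in required if not fields.get(name, "").strip()]


entry = {
    "title": "Effect of equivalence ratio fluctuations on planar detonation discontinuities",
    "author": "Cuadra, Alberto and Huete, C{\\'e}sar and Vera, Marcos",
    "journal": "Journal of Fluid Mechanics",
    "year": "2020",
}
print(missing_fields("article", entry))  # → ['doi']
```

Entry types that are not in the mapping are treated as having no required fields, so they would only be reformatted.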

Prerequisites

Python 3.x
requests library
tqdm library

Installation

Clone the repository or download the doi_scraper.py file.
Install the required dependencies by running the following command:

```shell
pip install -r requirements.txt
```

Usage

Place your input .bib file in the same directory as the doi_scraper.py script.

Open the doi_scraper.py file and modify the following variables according to your needs:

```python
input_file = 'input.bib'    # Name of the input .bib file
output_file = 'output.bib'  # Name of the output .bib file
INDENT_PRE = 4              # Number of spaces before the field name
INDENT_POST = 16            # Number of spaces after the field name
```

Run the script using the following command:

```shell
python doi_scraper.py
```

The script searches for entries that lack a DOI, retrieves the missing DOIs from the Crossref API, and writes the updated entries to the output .bib file.

Once the script completes, you will find the updated .bib file with the retrieved DOIs in the same directory.
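For readers curious about the lookup step, the following is a minimal sketch of how a DOI might be retrieved from the public Crossref REST API (`/works` endpoint with the `query.bibliographic` parameter). It is not the code in doi_scraper.py: the function names are hypothetical, and the standard-library `urllib` is used here in place of `requests` only to keep the sketch free of dependencies.

```python
import json
import urllib.parse
import urllib.request

CROSSREF_WORKS_URL = "https://api.crossref.org/works"


def build_query_url(title, author=None, rows=1):
    """Build a Crossref /works search URL for a bibliographic query."""
    params = {"query.bibliographic": title, "rows": rows}
    if author:
        params["query.author"] = author
    return CROSSREF_WORKS_URL + "?" + urllib.parse.urlencode(params)


def first_doi(response):
    """Extract the DOI of the top-ranked item from a decoded Crossref response."""
    items = response.get("message", {}).get("items", [])
    return items[0].get("DOI") if items else None


def lookup_doi(title, author=None, timeout=10):
    """Query Crossref over the network and return the best-matching DOI, or None."""
    url = build_query_url(title, author)
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return first_doi(json.load(reply))
```

The top-ranked hit is not guaranteed to be the intended work, so a careful implementation would probably compare the returned title against the entry's title before accepting the DOI.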

Optional Arguments

--format-only: Reformat the file without performing any Crossref lookups.
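A boolean flag like this is typically declared with `argparse`. The sketch below shows one plausible way to do it; the `parse_args` wrapper and the description and help strings are assumptions, not taken from doi_scraper.py.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options; `argv=None` falls back to sys.argv."""
    parser = argparse.ArgumentParser(
        description="Fill in missing .bib fields and reformat the file."
    )
    parser.add_argument(
        "--format-only",
        action="store_true",
        help="reformat the file without performing any Crossref lookups",
    )
    return parser.parse_args(argv)


args = parse_args(["--format-only"])
print(args.format_only)  # → True
```

With `action="store_true"` the option defaults to `False`, so running the script without the flag would keep the Crossref lookups enabled.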

Example

Before

After

```bibtex
@article{Cuadra2020,
    title = {Effect of equivalence ratio fluctuations on planar detonation discontinuities},
    author = {Cuadra, Alberto and Huete, C{\'e}sar and Vera, Marcos},
    pages = {A30 1--39},
    year = {2020},
    journal = {Journal of Fluid Mechanics},
    volume = {903},
    doi = {10.1017/jfm.2020.651},
}
```
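The consistent indentation controlled by `INDENT_PRE` and `INDENT_POST` could be produced along the lines of the sketch below. It assumes that `INDENT_POST` is the width to which each field name is padded so that the `=` signs line up; the README only describes it as the number of spaces after the field name, so the real behavior may differ. `format_entry` is a hypothetical helper.

```python
INDENT_PRE = 4    # spaces before the field name
INDENT_POST = 16  # assumed here: width the field name is padded to


def format_entry(entry_type, key, fields):
    """Render one BibTeX entry with aligned '=' signs."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        # Left-justify the field name in a column of INDENT_POST characters.
        lines.append(f"{' ' * INDENT_PRE}{name:<{INDENT_POST}}= {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


print(format_entry("article", "Cuadra2020", {
    "year": "2020",
    "journal": "Journal of Fluid Mechanics",
    "doi": "10.1017/jfm.2020.651",
}))
```

Under this assumption every `=` lands in the same column (column 20, counting from zero), whatever the length of the field name, as long as names are shorter than `INDENT_POST`.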

License

This project is licensed under the MIT License.

Owner

Name: Alberto Cuadra-Lara
Login: AlbertoCuadra
Kind: user
Location: Madrid, Spain
Company: Universidad Carlos III de Madrid

Website: https://acuadralara.com
Repositories: 9
Profile: https://github.com/AlbertoCuadra

Pre-doctoral researcher in Fluid Mechanics

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
type: misc
license: "MIT"
title: "DOI Scraper"
version: 1.2.0
doi: 10.5281/zenodo.7932535
date-released: 2025-03-20
url: "https://github.com/AlbertoCuadra/doi_scraper"
abstract:
    "The DOI Scraper is a Python script that reads a `.bib` file, searches for entries missing required fields (such as a DOI), retrieves the missing information using the Crossref API, and reformats the file with consistent indentation. The refactored design supports different entry types (e.g., articles, books, inproceedings, tech reports), with each type defining its own required fields."
authors: 
  -
    family-names: "Cuadra"
    given-names: A
    orcid: "https://orcid.org/0000-0001-8280-2426"
keywords: 
  - scraper
  - latex
  - bibtex
  - doi
  - crossref
  - "crossref-api"
  - python
  - "open-source"

GitHub Events

Total

Release event: 1
Watch event: 3
Push event: 3
Pull request event: 4
Create event: 1

Last Year

Release event: 1
Watch event: 3
Push event: 3
Pull request event: 4
Create event: 1

Committers

Last synced: about 1 year ago

All Time

Total Commits: 13
Total Committers: 1
Avg Commits per committer: 13.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 3
Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0
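
The page does not state how the Development Distribution Score is computed. Assuming the commonly used definition, DDS = 1 - (commits by the most active committer / total commits), a single-committer repository scores 0.0, which matches the figures above. The function name and input shape below are hypothetical.

```python
def development_distribution_score(commits_by_author):
    """Return 1 - (top committer's share of commits); 0.0 if there are no commits.

    Assumed definition; the page itself does not give the formula.
    """
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    return 1 - max(commits_by_author.values()) / total


# One committer with all 13 commits, as in the "All Time" figures.
print(development_distribution_score({"Alberto Cuadra Lara": 13}))  # → 0.0
```

A score closer to 1 would indicate that commits are spread across many contributors.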

Top Committers

Name	Email	Commits
Alberto Cuadra Lara	a**a@i**s	13

Committer Domains (Top 20 + Academic)

ing.uc3m.es: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 0
Total pull requests: 10
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 10
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 2
Average time to close issues: N/A
Average time to close pull requests: 1 minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0
