citation
Generate bibtex entries from Digital Object Identifiers (DOIs)
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.7%) to scientific vocabulary
Keywords
Repository
Generate bibtex entries from Digital Object Identifiers (DOIs)
Basic Info
- Host: GitHub
- Owner: foucault
- Language: Python
- Default Branch: master
- Size: 6.84 KB
Statistics
- Stars: 4
- Watchers: 1
- Forks: 3
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
- README.md
citation
About
citation is a dead simple Python script that downloads a readily formatted
bibtex citation for an article just by providing its Digital Object Identifier
(DOI). Cut and paste the output into your .bib file and you are ready to go!
$ citation 10.1007/bf00883088
@article{Foti_1977,
author = {Foti, G. and Rimini, E. and Vitali, G. and Bertolotti, M.},
doi = {10.1007/bf00883088},
issn = {1432-0630},
journal = {Applied Physics},
month = oct,
number = {2},
pages = {189–191},
publisher = {Springer Nature},
shortjournal = {Appl. Phys.},
title = {Amorphous-polycrystal transition induced by laser pulse in self-ion implanted silicon},
url = {http://dx.doi.org/10.1007/bf00883088},
volume = {14},
year = {1977}
}
If you are using vim you can do this directly from your editor with the following command
:r !citation 10.1007/bf00883088
and the bibtex entry will be inserted into your current buffer.
Features
- Download bibtex entries with just the DOI of the article.
- Automatically generate the abbreviated journal name into the shortjournal
bibtex field. If you use biblatex you can use this field instead of journal to
create a more compact bibliography.
- Automatically strip the curly braces from month specifications ({jan} → jan).
Enclosing the month abbreviation in curly braces turns it into a plain string
instead of the corresponding bibtex month macro, so it should be avoided if you
want your citations to be sorted correctly in chronological order; a short
sketch of this clean-up follows the list.
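As a rough illustration of this month clean-up (not part of citation itself), the Python sketch below applies a regular expression similar to the script's MONTH_RE to a dumped bibtex line; normalise_month is a hypothetical helper used only for this example.

import re

# Matches a month field whose value is wrapped in braces, e.g. "month = {oct},"
BRACED_MONTH = re.compile(r"\s*month\s*=\s*\{\s*([a-z]+)\s*\}(,?)", re.IGNORECASE)

def normalise_month(line):
    # Rewrite " month = {oct}," as " month = oct," and leave other lines untouched
    match = BRACED_MONTH.match(line)
    if match:
        return " month = %s%s" % (match.group(1).lower(), match.group(2))
    return line

print(normalise_month(" month = {oct},"))  # prints: month = oct,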
Caveats
citation should work fairly well at least for most western languages. It is completely untested with anything other than the Latin and Greek alphabets, so expect things to break. Although citation will probably get your citations right the first time, there is always the chance of typos or invalid characters. These errors are propagated from CrossRef and are very hard to catch; however, this should not happen very often. During my PhD I only had to edit 3 or 4 citations out of 400+ references.
Dependencies
- Python ≥ 3.2
- Requests
- BibtexParser
Owner
- Name: Spyros Stathopoulos
- Login: foucault
- Kind: user
- Location: UK
- Repositories: 8
- Profile: https://github.com/foucault
Citation (citation)
#!/usr/bin/python
from os import environ, makedirs
import os.path
import re
import sys
import gzip
import datetime
import requests
import bibtexparser
IGNORELIST = [
    "of", "and", "in", "at", "on", "the", "&",
    "für", "ab", "um"
]

MONTH_RE = re.compile(r"\s*month\s*=\s*\{\s*?(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|sep|september|oct|october|nov|november|dec|december)\s*\},?")

# Location and release date of the ISSN LTWA journal abbreviation list
LATEST_ISSN = "http://www.issn.org/wp-content/uploads/2013/09/LTWA_20160915.txt"
ISSN_UPD = datetime.date(2016, 9, 15)


def key_from_phrase(title):
    # Initials of a phrase, eg "Appl. Phys." -> "ap"; used as the citation key suffix
    return "".join([x[0] for x in title.split()]).strip().lower()


def unix_data_home():
    try:
        return environ['XDG_DATA_HOME']
    except KeyError:
        return os.path.join(environ['HOME'], '.local', 'share')


def windows_data_home():
    return environ['APPDATA']


def darwin_data_home():
    return os.path.join(environ['HOME'], 'Library', 'Application Support')


def data_home(folder=None):
    # Per-platform user data directory, optionally with a subfolder appended
    platform = sys.platform
    if platform == 'win32':
        data_dir = windows_data_home()
    elif platform == 'darwin':
        data_dir = darwin_data_home()
    else:
        data_dir = unix_data_home()
    if folder is None:
        return data_dir
    else:
        return os.path.join(data_dir, folder)


def dl_abbrev(fname='abbrev.txt.gz'):
    # Download the LTWA abbreviation list and store it gzipped in the data directory
    url = LATEST_ISSN
    r = requests.get(url, allow_redirects=True)
    data = r.content
    directory = data_home('citation')
    makedirs(directory, exist_ok=True)
    with gzip.open(os.path.join(directory, fname), 'wb') as f:
        f.write(data)


def load_abbrev(fname):
    """
    Loads the abbreviation database
    """
    # Check if we have the abbreviations list
    target = os.path.join(data_home('citation'), fname)
    if not os.path.isfile(target):
        print("%s not found; downloading..." % target, file=sys.stderr)
        dl_abbrev(fname)
    # Check for outdated abbreviation list
    mtime = datetime.date.fromtimestamp(os.path.getmtime(target))
    if not mtime > ISSN_UPD:
        print("%s is out of date; redownloading..." % target, file=sys.stderr)
        dl_abbrev(fname)
    # Load the abbreviations database into memory
    data = {}
    with gzip.open(target, 'rt', encoding="utf-16") as f:
        for line in f:
            # usually the first line starts with WORD
            if line.startswith('WORD'):
                continue
            parts = line.split("\t")
            langs = parts[2].split(", ")  # language column (currently unused)
            jname = parts[0]
            jabbrev = parts[1]
            data[jname.lower()] = jabbrev.lower()
    return data


def journal_abbrev(name):
    """
    Abbreviates a journal title
    """
    #data = load_abbrev(os.path.join(sys.path[0], "abbrev.txt.gz"))
    data = load_abbrev("abbrev.txt.gz")
    n_abbrev = []
    (name, _, _) = name.partition(":")
    parts = re.split(r"\s+", name)
    if len(parts) == 1 and len(parts[0]) < 12:
        return name
    for word in parts:
        # Do not abbreviate words in the IGNORELIST
        if word.lower() in IGNORELIST:
            continue
        for (k, v) in data.items():
            found = False
            # If the key ends with - it means we are checking for a prefix
            if k.endswith("-"):
                if word.lower().startswith(k[:-1]):
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break
            # Else we are checking for a whole match
            else:
                if word.lower() == k:
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break
        if not found:
            # If all characters are uppercase leave as is
            if not word.isupper():
                n_abbrev.append(word.capitalize())
            else:
                n_abbrev.append(word)
    return " ".join(n_abbrev)


def get_entry(doi):
    # Fetch a bibtex entry for the DOI via content negotiation and post-process it
    url = 'https://dx.doi.org/%s' % doi
    raw = requests.get(url,
                       headers={'Accept': 'text/x-bibliography;style=bibtex'},
                       timeout=2)
    if raw.ok and raw.status_code == 200:
        db = bibtexparser.loads(raw.content.decode('utf-8'))
        entry = db.entries[0]
        if 'journal' in entry.keys():
            jabbr = journal_abbrev(entry['journal'])
            if jabbr != entry['journal']:
                entry['shortjournal'] = jabbr
        if 'month' in entry.keys():
            month = entry['month'].lower()[0:3]
            entry['month'] = month
        try:
            # Build the citation key: first author's surname + year + journal initials
            authors = entry['author'].split(' and ')
            first_author = authors[0].split(',')
            if 'shortjournal' in entry.keys():
                suffix = key_from_phrase(entry['shortjournal'])
            elif 'journal' in entry.keys():
                suffix = key_from_phrase(entry['journal'])
            elif 'publisher' in entry.keys():
                suffix = key_from_phrase(entry['publisher'])
            else:
                suffix = ''
            authorkey = '%s%s%s' % (first_author[0], entry['year'], suffix)
            entry['ID'] = authorkey
        except (IndexError, KeyError):
            pass
        raw_result = bibtexparser.dumps(db).strip()
        lines = []
        for line in raw_result.splitlines():
            # Replace a braced month value with the bare bibtex month macro
            match = MONTH_RE.match(line)
            if match:
                if line.strip().endswith(","):
                    line = " month = %s," % match.group(1)
                else:
                    line = " month = %s" % match.group(1)
            lines.append(line)
        return "\n".join(lines)
    else:
        raise Exception("Could not get data for \"%s\" from CrossRef (status code %d)" %
                        (url, raw.status_code))


if __name__ == "__main__":
    try:
        doi = sys.argv[1]
        data = get_entry(doi)
        print(data)
    except IndexError as ie:
        print("Usage: %s DOI" % os.path.basename(sys.argv[0]), file=sys.stderr)
        print("No DOI provided", file=sys.stderr)
        sys.exit(1)
    except Exception as exc:
        print(exc, file=sys.stderr)
        sys.exit(1)
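For readers following the get_entry logic above, the standalone sketch below reproduces how the citation key is assembled (first author's surname, then the year, then the journal initials from key_from_phrase); make_key is a hypothetical helper added only for this illustration, fed with field values from the example entry earlier in the README.

def key_from_phrase(title):
    # Initials of the (abbreviated) journal title, e.g. "Appl. Phys." -> "ap"
    return "".join([x[0] for x in title.split()]).strip().lower()

def make_key(author, year, shortjournal):
    # Illustration only: surname of the first author + year + journal initials
    first_author = author.split(' and ')[0].split(',')[0]
    return '%s%s%s' % (first_author, year, key_from_phrase(shortjournal))

print(make_key("Foti, G. and Rimini, E. and Vitali, G. and Bertolotti, M.",
               "1977", "Appl. Phys."))  # prints: Foti1977ap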
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2