citation

Generate bibtex entries from Document Object Identifiers (DOI)

https://github.com/foucault/citation

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.7%) to scientific vocabulary

Keywords

biblatex bibtex citations crossref doi issn latex python python3 tex
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: foucault
  • Language: Python
  • Default Branch: master
  • Size: 6.84 KB
Statistics
  • Stars: 4
  • Watchers: 1
  • Forks: 3
  • Open Issues: 0
  • Releases: 0
Topics
biblatex bibtex citations crossref doi issn latex python python3 tex
Created about 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme Citation

README.md

citation

About

citation is a dead simple Python script that downloads ready-formatted citations for use in bibtex; all you provide is the article's Document Object Identifier (DOI). Cut and paste the output into your .bib file and you are ready to go!

$ citation 10.1007/bf00883088
@article{Foti_1977,
 author = {Foti, G. and Rimini, E. and Vitali, G. and Bertolotti, M.},
 doi = {10.1007/bf00883088},
 issn = {1432-0630},
 journal = {Applied Physics},
 month = oct,
 number = {2},
 pages = {189–191},
 publisher = {Springer Nature},
 shortjournal = {Appl. Phys.},
 title = {Amorphous-polycrystal transition induced by laser pulse in self-ion implanted silicon},
 url = {http://dx.doi.org/10.1007/bf00883088},
 volume = {14},
 year = {1977}
}

If you are using vim, you can do this directly from your editor with the following command:

:r !citation 10.1007/bf00883088

and the bibtex entry will be inserted into your current buffer, below the cursor line.
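
Behind the scenes the script asks the DOI resolver for a bibtex rendering of the record by sending a special `Accept` header. The sketch below rebuilds that request with the standard library only, as an illustration rather than the script's actual code path (the script itself uses `requests`). The helper names `build_request` and `fetch_bibtex` are hypothetical; the URL form and header value are the ones found in the script.

```python
import urllib.request

# Header value and resolver URL as used by the script
BIBTEX_ACCEPT = "text/x-bibliography;style=bibtex"


def build_request(doi):
    """Prepare (but do not send) a content-negotiated request for a DOI."""
    url = "https://dx.doi.org/%s" % doi
    return urllib.request.Request(url, headers={"Accept": BIBTEX_ACCEPT})


def fetch_bibtex(doi, timeout=10):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network
req = build_request("10.1007/bf00883088")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling `fetch_bibtex("10.1007/bf00883088")` should return the raw entry, before any of the clean-up steps listed under Features.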

Features

  • Download bibtex entries with just the DOI of the article.
  • Automatically generate the abbreviated journal name into the shortjournal bibtex field. If you use biblatex you can use this field instead of the journal to create a more compact bibliography.
  • Automatically strip curly braces from month specifications ({jan} → jan). A month abbreviation enclosed in curly braces is treated as a literal string instead of the predefined month macro, and should be avoided if you want your citations to be sorted correctly in chronological order.
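
The month clean-up can be pictured as a line-by-line regular expression rewrite. The following is a minimal sketch and not the script's exact pattern: it assumes three-letter abbreviations only, and the function name `strip_month_braces` is made up for this example.

```python
import re

# Simplified pattern: three-letter abbreviations only (an assumption made
# for brevity; the script also accepts full month names)
MONTH_RE = re.compile(
    r"^(\s*month\s*=\s*)\{\s*"
    r"(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)"
    r"\s*\}(,?)\s*$",
    re.IGNORECASE,
)


def strip_month_braces(bibtex):
    """Rewrite 'month = {oct}' lines as 'month = oct'; leave the rest alone."""
    out = []
    for line in bibtex.splitlines():
        m = MONTH_RE.match(line)
        if m:
            # Keep the original indentation and the trailing comma, if any
            line = "%s%s%s" % (m.group(1), m.group(2).lower(), m.group(3))
        out.append(line)
    return "\n".join(out)


entry = "@article{key,\n month = {oct},\n year = {1977}\n}"
print(strip_month_braces(entry))
```

Only the `month` line changes; fields such as `year = {1977}` keep their braces.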

Caveats

citation should work fairly well for most western languages. It is completely untested with anything other than the Latin and Greek alphabets, so expect things to break. Although citation will probably get your citations right the first time, there is always a chance of typos or invalid characters. These errors are propagated from CrossRef and are very hard to catch. However, this should not happen very often: in my PhD I only had to edit 3 or 4 citations out of 400+ references.
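
One way to make such characters easier to spot is to list everything outside plain ASCII for manual review. This is a hypothetical helper, not part of citation, and a hit is not necessarily an error: accented author names are perfectly legitimate.

```python
import unicodedata


def suspicious_chars(bibtex):
    """Return (line number, character, Unicode name) for each non-ASCII character."""
    hits = []
    for lineno, line in enumerate(bibtex.splitlines(), start=1):
        for ch in line:
            if ord(ch) > 127:
                hits.append((lineno, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


# The page range below contains an en dash (U+2013) instead of a hyphen
entry = "@article{key,\n pages = {189\u2013191},\n year = {1977}\n}"
for lineno, ch, name in suspicious_chars(entry):
    print("line %d: %r (%s)" % (lineno, ch, name))
```

Here the en dash in the `pages` field is reported, which is a common candidate for replacement with `--` in a .bib file.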

Dependencies

  • Python 3
  • requests
  • bibtexparser

Owner

  • Name: Spyros Stathopoulos
  • Login: foucault
  • Kind: user
  • Location: UK

Citation (citation)

#!/usr/bin/env python3

from os import environ, makedirs
import os.path
import re
import sys
import gzip
import datetime

import requests
import bibtexparser

IGNORELIST = [
    "of", "and", "in", "at", "on", "the", "&",
    "für", "ab", "um"
]

MONTH_RE = re.compile(r"\s*month\s*=\s*\{\s*?(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|sep|september|oct|october|nov|november|dec|december)\s*\},?")
LATEST_ISSN = "http://www.issn.org/wp-content/uploads/2013/09/LTWA_20160915.txt"
ISSN_UPD = datetime.date(2016, 9, 15)


def key_from_phrase(title):
    return "".join([x[0] for x in title.split()]).strip().lower()


def unix_data_home():
    try:
        return environ['XDG_DATA_HOME']
    except KeyError:
        return os.path.join(environ['HOME'], '.local', 'share')


def windows_data_home():
    return environ['APPDATA']


def darwin_data_home():
    return os.path.join(environ['HOME'], 'Library', 'Application Support')


def data_home(folder=None):
    platform = sys.platform

    if platform == 'win32':
        data_dir = windows_data_home()
    elif platform == 'darwin':
        data_dir = darwin_data_home()
    else:
        data_dir = unix_data_home()

    if folder is None:
        return data_dir
    else:
        return os.path.join(data_dir, folder)


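def dl_abbrev(fname='abbrev.txt.gz'):
    """
    Downloads the abbreviation list and stores it gzip-compressed
    """
    url = LATEST_ISSN
    r = requests.get(url, allow_redirects=True)
    # Fail early instead of caching an HTTP error page
    r.raise_for_status()
    data = r.content
    directory = data_home('citation')
    # The directory already exists when an outdated list is refreshed
    makedirs(directory, exist_ok=True)

    with gzip.open(os.path.join(directory, fname), 'wb') as f:
        f.write(data)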


def load_abbrev(fname):
    """
    Loads the abbreviation database
    """
    # Check if we have the abbreviations list
    target = os.path.join(data_home('citation'), fname)

    if not os.path.isfile(target):
        print("%s not found; downloading..." % target, file=sys.stderr)
        dl_abbrev(fname)

    # Check for outdated abbreviation list
    mtime = datetime.date.fromtimestamp(os.path.getmtime(target))
    if not mtime > ISSN_UPD:
        print("%s is out of date; redownloading..." % target, file=sys.stderr)
        dl_abbrev(fname)

    # Load the abbreviations database into memory
    data = {}
    with gzip.open(target, 'rt', encoding="utf-16") as f:
        for line in f:
            # usually the first line starts with WORD
            if line.startswith('WORD'):
                continue
            parts = line.split("\t")
            # Skip blank or malformed lines
            if len(parts) < 2:
                continue
            jname = parts[0]
            jabbrev = parts[1].strip()
            data[jname.lower()] = jabbrev.lower()
    return data


def journal_abbrev(name):
    """
    Abbreviates a journal title
    """
    #data = load_abbrev(os.path.join(sys.path[0], "abbrev.txt.gz"))
    data = load_abbrev("abbrev.txt.gz")
    n_abbrev = []

    (name, _, _) = name.partition(":")
    parts = re.split(r"\s+", name.strip())

    if len(parts) == 1 and len(parts[0]) < 12:
        return name
    for word in parts:
        # Do not abbreviate words in the IGNORELIST
        if word.lower() in IGNORELIST:
            continue
        # Initialise before the search so that it is defined even if the
        # abbreviation database is empty
        found = False
        for (k,v) in data.items():

            # If the key ends with - it means we are checking for a prefix
            if k.endswith("-"):
                if word.lower().startswith(k[:-1]):
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break
            # Else we are checking for a whole match
            else:
                if word.lower() == k:
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break

        if not found:
            # If all characters are uppercase leave as is
            if not word.isupper():
                n_abbrev.append(word.capitalize())
            else:
                n_abbrev.append(word)
    return " ".join(n_abbrev)


def get_entry(doi):
    url = 'https://dx.doi.org/%s' % doi

    raw = requests.get(url, \
        headers={'Accept':'text/x-bibliography;style=bibtex'},
        timeout=2)
    if raw.ok and raw.status_code == 200:
        db = bibtexparser.loads(raw.content.decode('utf-8'))
        entry = db.entries[0]
        if 'journal' in entry.keys():
            jabbr = journal_abbrev(entry['journal'])
            if jabbr != entry['journal']:
                entry['shortjournal'] = jabbr
        if 'month' in entry.keys():
            month = entry['month'].lower()[0:3]
            entry['month'] = month

        try:
            authors = entry['author'].split(' and ')
            first_author = authors[0].split(',')
            if 'shortjournal' in entry.keys():
                suffix = key_from_phrase(entry['shortjournal'])
            elif 'journal' in entry.keys():
                suffix = key_from_phrase(entry['journal'])
            elif 'publisher' in entry.keys():
                suffix = key_from_phrase(entry['publisher'])
            else:
                suffix = ''

            authorkey = '%s%s%s' % (first_author[0], entry['year'], suffix)
            entry['ID'] = authorkey
        except (IndexError, KeyError):
            pass

        raw_result = bibtexparser.dumps(db).strip()
        lines = []
        for line in raw_result.splitlines():
            match = MONTH_RE.match(line)
            if match:
                if line.strip().endswith(","):
                    line = " month = %s," % match.group(1)
                else:
                    line = " month = %s" % match.group(1)
            lines.append(line)
        return "\n".join(lines)
    else:
        raise Exception("Could not get data for \"%s\" from CrossRef (status code %d)" %
                (url, raw.status_code))


if __name__ == "__main__":
    # Check the arguments up front so that an IndexError raised while
    # processing the entry is not misreported as a missing DOI
    if len(sys.argv) < 2:
        print("Usage: %s DOI" % os.path.basename(sys.argv[0]), file=sys.stderr)
        print("No DOI provided", file=sys.stderr)
        sys.exit(1)
    try:
        data = get_entry(sys.argv[1])
        print(data)
    except Exception as exc:
        print(exc, file=sys.stderr)
        sys.exit(1)
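
The lookup in `journal_abbrev` treats database keys ending in `-` as prefixes and all other keys as whole words. The sketch below replays that rule on a tiny, hypothetical table (not the real abbreviation list) and leaves out the `n.a.` entries and the short single-word shortcut handled by the script. The names `TOY_TABLE`, `abbreviate_word` and `abbreviate_title` are made up for illustration.

```python
# Hypothetical miniature table; the real script loads the downloaded list
TOY_TABLE = {
    "applied": "appl.",
    "physic-": "phys.",
    "journal": "j.",
}
IGNORE = {"of", "and", "in", "the"}


def abbreviate_word(word, table):
    """Return the abbreviation for one word, or the word itself if none matches."""
    lower = word.lower()
    for key, abbrev in table.items():
        if key.endswith("-"):
            matched = lower.startswith(key[:-1])  # prefix rule
        else:
            matched = lower == key                # whole-word rule
        if matched:
            return abbrev.capitalize()
    # Keep acronyms as they are, capitalise everything else
    return word if word.isupper() else word.capitalize()


def abbreviate_title(title, table=TOY_TABLE):
    """Abbreviate each word of a journal title, dropping ignored words."""
    words = [w for w in title.split() if w.lower() not in IGNORE]
    return " ".join(abbreviate_word(w, table) for w in words)


print(abbreviate_title("Applied Physics"))
print(abbreviate_title("Journal of Applied Physics"))
```

With this toy table, "Applied Physics" becomes "Appl. Phys.", the same form as the `shortjournal` field in the README example.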

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2