citation

Generate bibtex entries from Document Object Identifiers (DOI)

https://github.com/foucault/citation

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 6 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (9.7%) to scientific vocabulary

Keywords

biblatex bibtex citations crossref doi issn latex python python3 tex

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: foucault
Language: Python
Default Branch: master
Size: 6.84 KB

Statistics

Stars: 4
Watchers: 1
Forks: 3
Open Issues: 0
Releases: 0

Created about 8 years ago · Last pushed over 1 year ago

Metadata Files

Readme Citation

citation

About

citation is a dead simple Python script that downloads a ready-formatted bibtex citation for an article, given just the article's Document Object Identifier (DOI). Cut and paste the output into your .bib file and you are ready to go!

$ citation 10.1007/bf00883088
@article{Foti_1977,
 author = {Foti, G. and Rimini, E. and Vitali, G. and Bertolotti, M.},
 doi = {10.1007/bf00883088},
 issn = {1432-0630},
 journal = {Applied Physics},
 month = oct,
 number = {2},
 pages = {189–191},
 publisher = {Springer Nature},
 shortjournal = {Appl. Phys.},
 title = {Amorphous-polycrystal transition induced by laser pulse in self-ion implanted silicon},
 url = {http://dx.doi.org/10.1007/bf00883088},
 volume = {14},
 year = {1977}
}

If you are using vim, you can do this directly from your editor with the following command

:r !citation 10.1007/bf00883088

and the bibtex entry will be inserted below the cursor line of your current buffer.
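
The download itself relies on DOI content negotiation: the script asks the DOI resolver for a bibtex rendering through an Accept header. The following is a minimal sketch of that request, not the script's own code. It uses the standard library instead of Requests, and the names build_request and fetch_bibtex are hypothetical; the URL and the Accept value are the ones that appear in the script.

```python
import urllib.request

# Accept header taken from the script; it asks the DOI resolver for a
# bibtex-formatted bibliography entry instead of the article landing page.
BIBTEX_ACCEPT = "text/x-bibliography;style=bibtex"


def build_request(doi):
    """Build (but do not send) the content-negotiation request for a DOI."""
    url = "https://dx.doi.org/%s" % doi
    return urllib.request.Request(url, headers={"Accept": BIBTEX_ACCEPT})


def fetch_bibtex(doi, timeout=10):
    """Send the request and return the response body as text (needs network)."""
    with urllib.request.urlopen(build_request(doi), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Inspect the request without touching the network.
req = build_request("10.1007/bf00883088")
print(req.full_url)
print(req.get_header("Accept"))
```

Calling fetch_bibtex("10.1007/bf00883088") should return the raw entry, which citation then post-processes (short journal name, month, citation key).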

Features

Download bibtex entries with just the DOI of the article.
Automatically store the abbreviated journal name in the shortjournal bibtex field. If you use biblatex you can use this field instead of journal to create a more compact bibliography.
Automatically strip curly braces from month specifications ({jan} → jan). A month abbreviation enclosed in curly braces is treated as a literal string rather than as a month macro, so it should be avoided if you want your citations to be sorted correctly in chronological order.
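
The month clean-up from the last item can be illustrated in isolation. This is a simplified sketch under stated assumptions, not the script's exact logic: it handles only the three-letter abbreviations, and the names BRACED_MONTH and strip_month_braces are made up for the example.

```python
import re

# Three-letter month abbreviations that bibtex styles define as macros.
MONTHS = "jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec"

# Matches a whole line such as ' month = {oct},' and captures the prefix,
# the abbreviation and the optional trailing comma.
BRACED_MONTH = re.compile(
    r"^([ \t]*month[ \t]*=[ \t]*)\{[ \t]*(" + MONTHS + r")[ \t]*\}(,?)[ \t]*$",
    re.IGNORECASE | re.MULTILINE,
)


def strip_month_braces(bibtex):
    """Rewrite 'month = {oct}' as 'month = oct', leaving other fields alone."""
    return BRACED_MONTH.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), bibtex
    )


entry = "@article{Foti_1977,\n month = {oct},\n year = {1977}\n}"
print(strip_month_braces(entry))
```

Only the month line changes; braced values in other fields, such as year = {1977}, are left untouched.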

Caveats

citation should work fairly well for most western languages. It is completely untested with anything other than the Latin and Greek alphabets, so expect things to break. Although citation will probably get your citations correct the first time, there is always the chance of typos or invalid characters. These errors are propagated from CrossRef and are very hard to catch. However, this should not happen very often: in my PhD I only had to edit 3 or 4 citations out of 400+ references.

Dependencies

Python ≥ 3.2
Requests
BibtexParser

Owner

Name: Spyros Stathopoulos
Login: foucault
Kind: user
Location: UK

Repositories: 8
Profile: https://github.com/foucault

Citation (citation)

#!/usr/bin/python

from os import environ, makedirs
import os.path
import re
import sys
import gzip
import datetime

import requests
import bibtexparser

IGNORELIST = [
    "of", "and", "in", "at", "on", "the", "&",
    "für", "ab", "um"
]

MONTH_RE = re.compile(r"\s*month\s*=\s*\{\s*?(jan|january|feb|february|mar|march|apr|april|may|jun|june|jul|july|aug|august|sep|september|oct|october|nov|november|dec|december)\s*\},?")
LATEST_ISSN = "http://www.issn.org/wp-content/uploads/2013/09/LTWA_20160915.txt"
ISSN_UPD = datetime.date(2016, 9, 15)


def key_from_phrase(title):
    return "".join([x[0] for x in title.split()]).strip().lower()


def unix_data_home():
    try:
        return environ['XDG_DATA_HOME']
    except KeyError:
        return os.path.join(environ['HOME'], '.local', 'share')


def windows_data_home():
    return environ['APPDATA']


def darwin_data_home():
    return os.path.join(environ['HOME'], 'Library', 'Application Support')


def data_home(folder=None):
    platform = sys.platform

    if platform == 'win32':
        data_dir = windows_data_home()
    elif platform == 'darwin':
        data_dir = darwin_data_home()
    else:
        data_dir = unix_data_home()

    if folder is None:
        return data_dir
    else:
        return os.path.join(data_dir, folder)


def dl_abbrev(fname='abbrev.txt.gz'):
    url = LATEST_ISSN
    r = requests.get(url, allow_redirects=True)
    data = r.content
    directory = data_home('citation')
    # The directory already exists when the list is being re-downloaded
    makedirs(directory, exist_ok=True)

    with gzip.open(os.path.join(directory, fname), 'wb') as f:
        f.write(data)


def load_abbrev(fname):
    """
    Loads the abbreviation database
    """
    # Check if we have the abbreviations list
    target = os.path.join(data_home('citation'), fname)

    if not os.path.isfile(target):
        print("%s not found; downloading..." % target, file=sys.stderr)
        dl_abbrev(fname)

    # Check for outdated abbreviation list
    mtime = datetime.date.fromtimestamp(os.path.getmtime(target))
    if not mtime > ISSN_UPD:
        print("%s is out of date; redownloading..." % target, file=sys.stderr)
        dl_abbrev(fname)

    # Load the abbreviations database into memory
    data = {}
    with gzip.open(target, 'rt', encoding="utf-16") as f:
        for line in f:
            # usually the first line starts with WORD
            if line.startswith('WORD'):
                continue
            parts = line.rstrip("\n").split("\t")
            # Skip malformed lines that lack a word or an abbreviation
            if len(parts) < 2:
                continue
            jname = parts[0]
            jabbrev = parts[1]
            data[jname.lower()] = jabbrev.lower()
    return data


def journal_abbrev(name):
    """
    Abbreviates a journal title
    """
    #data = load_abbrev(os.path.join(sys.path[0], "abbrev.txt.gz"))
    data = load_abbrev("abbrev.txt.gz")
    n_abbrev = []

    (name, _, _) = name.partition(":")
    parts = re.split(r"\s+", name)

    if len(parts) == 1 and len(parts[0]) < 12:
        return name
    for word in parts:
        # Do not abbreviate words in the IGNORELIST
        if word.lower() in IGNORELIST:
            continue
        # Initialise before the loop so it is defined even if data is empty
        found = False
        for (k, v) in data.items():

            # If the key ends with - it means we are checking for a prefix
            if k.endswith("-"):
                if word.lower().startswith(k[:-1]):
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break
            # Else we are checking for a whole match
            else:
                if word.lower() == k:
                    if v != "n.a.":
                        n_abbrev.append(v.capitalize())
                    else:
                        n_abbrev.append(word.lower().capitalize())
                    found = True
                    break

        if not found:
            # If all characters are uppercase leave as is
            if not word.isupper():
                n_abbrev.append(word.capitalize())
            else:
                n_abbrev.append(word)
    return " ".join(n_abbrev)


def get_entry(doi):
    url = 'https://dx.doi.org/%s' % doi

    raw = requests.get(url,
        headers={'Accept':'text/x-bibliography;style=bibtex'},
        timeout=2)
    if raw.ok and raw.status_code == 200:
        db = bibtexparser.loads(raw.content.decode('utf-8'))
        entry = db.entries[0]
        if 'journal' in entry.keys():
            jabbr = journal_abbrev(entry['journal'])
            if jabbr != entry['journal']:
                entry['shortjournal'] = jabbr
        if 'month' in entry.keys():
            month = entry['month'].lower()[0:3]
            entry['month'] = month

        try:
            authors = entry['author'].split(' and ')
            first_author = authors[0].split(',')
            if 'shortjournal' in entry.keys():
                suffix = key_from_phrase(entry['shortjournal'])
            elif 'journal' in entry.keys():
                suffix = key_from_phrase(entry['journal'])
            elif 'publisher' in entry.keys():
                suffix = key_from_phrase(entry['publisher'])
            else:
                suffix = ''

            authorkey = '%s%s%s' % (first_author[0], entry['year'], suffix)
            entry['ID'] = authorkey
        except (IndexError, KeyError):
            pass

        raw_result = bibtexparser.dumps(db).strip()
        lines = []
        for line in raw_result.splitlines():
            match = MONTH_RE.match(line)
            if match:
                if line.strip().endswith(","):
                    line = " month = %s," % match.group(1)
                else:
                    line = " month = %s" % match.group(1)
            lines.append(line)
        return "\n".join(lines)
    else:
        raise Exception("Could not get data for \"%s\" from CrossRef (status code %d)" %
                (url, raw.status_code))


if __name__ == "__main__":
    try:
        doi = sys.argv[1]
        data = get_entry(doi)
        print(data)
    except IndexError as ie:
        print("Usage: %s DOI" % os.path.basename(sys.argv[0]), file=sys.stderr)
        print("No DOI provided", file=sys.stderr)
        sys.exit(1)
    except Exception as exc:
        print(exc, file=sys.stderr)
        sys.exit(1)
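
The matching rule used by journal_abbrev above (keys ending in "-" are prefixes, all other keys must match the whole word, and "n.a." means the word is kept as is) is easier to follow on a toy table. The sketch below is illustrative only: the table entries are made up rather than taken from the real abbreviation list, the names TABLE, SKIP, abbreviate_word and abbreviate_title are hypothetical, and the short single-word shortcut of the script is left out.

```python
# Made-up miniature table for illustration; the script loads the full
# abbreviation list instead. Keys ending in "-" are prefixes.
TABLE = {
    "journal": "j.",
    "appl-": "appl.",
    "physic-": "phys.",
    "nature": "n.a.",  # "n.a." marks a word that is not abbreviated
}
SKIP = {"of", "and", "in", "the"}


def abbreviate_word(word, table=TABLE):
    """Return the abbreviation for one word, or the word itself if none applies."""
    lower = word.lower()
    for key, abbrev in table.items():
        if key.endswith("-"):
            matched = lower.startswith(key[:-1])  # prefix match
        else:
            matched = lower == key  # whole-word match
        if matched:
            return word.capitalize() if abbrev == "n.a." else abbrev.capitalize()
    # Unknown word: keep acronyms untouched, capitalize everything else
    return word if word.isupper() else word.capitalize()


def abbreviate_title(title, table=TABLE):
    """Abbreviate every word of a journal title, dropping words in SKIP."""
    words = [w for w in title.split() if w.lower() not in SKIP]
    return " ".join(abbreviate_word(w, table) for w in words)


print(abbreviate_title("Journal of Applied Physics"))
```

With this toy table, "Journal of Applied Physics" becomes "J. Appl. Phys.", while "Nature" and all-uppercase words such as "IEEE" pass through unchanged.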
