citation-extractor

A tool to extract canonical references from text.

https://github.com/mromanello/citationextractor

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    3 of 6 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.5%) to scientific vocabulary

Keywords

canonical-references classics digital-classics extraction python
Last synced: 6 months ago

Repository

Statistics
  • Stars: 20
  • Watchers: 4
  • Forks: 3
  • Open Issues: 4
  • Releases: 0
Created about 14 years ago · Last pushed over 4 years ago
Metadata Files
Readme Changelog License Citation

README.md

(Canonical) Citation Extractor

Installation

This software supports Python 2.7 and was tested only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).
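
The two requirements above can be checked before installing anything. The following is a minimal sketch using only the standard library; the function name `check_environment` is hypothetical and not part of the CitationExtractor.

```python
import os
import sys


def check_environment(version_info=None, os_name=None):
    """Return a list of human-readable problems (empty if none were found).

    Hypothetical helper: it only mirrors the README's stated requirements
    (Python 2.7, POSIX-compliant operating system).
    """
    version_info = sys.version_info if version_info is None else version_info
    os_name = os.name if os_name is None else os_name
    major_minor = tuple(version_info[:2])

    problems = []
    if major_minor != (2, 7):
        problems.append("Python 2.7 is required, found %d.%d" % major_minor)
    # os.name is "posix" on Linux, Mac OS X and FreeBSD.
    if os_name != "posix":
        problems.append("untested on non-POSIX platform %r" % os_name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING: " + problem)
```

Both arguments default to the running interpreter and platform; passing them explicitly is only meant to make the check easy to try out.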

Installing TreeTagger

The CitationExtractor relies on TreeTagger for the PoS tagging of input texts.

There is a handy script to install it.

To run it without having to clone this repo:

$ wget -O install_treetagger.sh https://raw.githubusercontent.com/mromanello/CitationExtractor/master/install_treetagger.sh
$ chmod a+x install_treetagger.sh
$ ./install_treetagger.sh
$ rm install_treetagger.sh

otherwise:

$ git clone https://github.com/mromanello/CitationExtractor.git
$ cd CitationExtractor
$ chmod a+x install_treetagger.sh
$ ./install_treetagger.sh

With pip

To install the CitationExtractor, first run:

$ pip install http://www.antlr3.org/download/Python/antlr_python_runtime-3.1.3.tar.gz#egg=antlr_python_runtime-3.1.3
$ pip install https://github.com/mromanello/treetagger-python/archive/master.zip#egg=treetagger-1.0.1

followed by:

$ pip install citation-extractor

NB: the installation of all other dependencies is handled by setup.py but for some reason (that I'm still trying to figure out) it does not pick up these two.

Verify installation

To double check that everything was installed correctly, try running the following lines (it should take ~20s):

from citation_extractor.settings import crfsuite
from citation_extractor.pipeline import get_extractor
extractor = get_extractor(crfsuite)
assert extractor is not None

If the code above runs without throwing exceptions, you managed to install the library!
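
For scripted setups it can be handy to get a status message instead of a traceback. The sketch below wraps the same two imports and the same `get_extractor(crfsuite)` call shown above; the wrapper name `verify_installation` is hypothetical, and the error messages are illustrative only.

```python
def verify_installation():
    """Return (ok, message); ok is True only if an extractor was built.

    Hypothetical wrapper around the README's verification snippet.
    """
    try:
        # Same imports and call as in the verification lines above.
        from citation_extractor.settings import crfsuite
        from citation_extractor.pipeline import get_extractor
        extractor = get_extractor(crfsuite)
    except ImportError as error:
        # The library or one of its dependencies is not importable.
        return False, "import failed: %s" % error
    except Exception as error:
        # Any other failure raised while building the extractor.
        return False, "could not build extractor: %s" % error
    if extractor is None:
        return False, "get_extractor returned None"
    return True, "citation_extractor is installed and usable"


if __name__ == "__main__":
    ok, message = verify_installation()
    print(message)
```

As with the original lines, a successful run should take roughly as long as building the extractor does (~20s according to the note above).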

Documentation

I'm working on it ;-)

For the time being, you can find a concrete example of how to use the library in this notebook.

Owner

  • Name: Matteo Romanello
  • Login: mromanello
  • Kind: user
  • Location: Lausanne (CH)
  • Company: @dhlab-epfl (previously @dains and @kcl-ddh)

Researcher in Computational Humanities.

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 438
  • Total Committers: 6
  • Avg Commits per committer: 73.0
  • Development Distribution Score (DDS): 0.1
Top Committers
Name              Email          Commits
mromanello        m****o@g****m  394
Matteo Filipponi  m****i@e****h  32
Matteo Filipponi  m****i@d****m  7
Matteo Romanello  m****o@e****h  3
Matteo            m****o@t****h  1
skruse            s****k@g****m  1
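
The summary figures under "All Time" can be reproduced from the per-committer counts listed above. This is a sketch that assumes the Development Distribution Score is defined as 1 minus the top committer's share of all commits; the page itself does not state the formula.

```python
# Per-committer commit counts, copied from the Top Committers list.
commits = [394, 32, 7, 3, 1, 1]

total_commits = sum(commits)
average = float(total_commits) / len(commits)

# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - float(max(commits)) / total_commits

print(total_commits)      # 438
print(round(average, 1))  # 73.0
print(round(dds, 1))      # 0.1
```

Note that the counts are per listed identity: some rows appear to belong to the same person under different email addresses, so a per-person calculation could give slightly different values.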

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 19
  • Total pull requests: 6
  • Average time to close issues: almost 3 years
  • Average time to close pull requests: about 9 hours
  • Total issue authors: 2
  • Total pull request authors: 3
  • Average comments per issue: 0.32
  • Average comments per pull request: 0.67
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mromanello (18)
  • Jmuccigr (1)
Pull Request Authors
  • mromanello (3)
  • mfilippo (2)
  • skruse (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 17 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 3
  • Total maintainers: 1
pypi.org: citation-extractor
  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 17 Last month
Rankings
Dependent packages count: 10.0%
Stargazers count: 13.6%
Forks count: 16.8%
Dependent repos count: 21.7%
Average: 22.3%
Downloads: 49.2%
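
The "Average" entry is consistent with an unweighted mean of the five other rankings. The sketch below checks that arithmetic; the unweighted mean is an assumption about how the page computes it, and the dictionary keys are illustrative names.

```python
# Ranking percentages copied from the list above (keys are illustrative).
rankings = {
    "dependent_packages_count": 10.0,
    "stargazers_count": 13.6,
    "forks_count": 16.8,
    "dependent_repos_count": 21.7,
    "downloads": 49.2,
}

# Assumed: the average is the plain mean of the five values.
average = sum(rankings.values()) / len(rankings)

print(round(average, 1))  # 22.3
```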
Maintainers (1)
Last synced: 7 months ago

Dependencies

requirements.txt pypi
  • antlr-python-runtime *
  • dask *
  • docopt *
  • hucitlib *
  • jellyfish >=0.5.6
  • langid *
  • numpy >=1.9.2
  • pandas *
  • rdflib ==4.2.1
  • scikit-learn >=0.16.1
  • scipy >=0.17.0
  • sklearn-crfsuite *
  • stop_words >=2015.2.23.1
  • treetagger *
requirements_dev.txt pypi
  • codecov * development
  • flake8 * development
  • ipdb * development
  • isort * development
  • parmap * development
  • pytest * development
  • pytest-cov <2.6.0 development
  • tabulate * development
  • tox * development
  • twine * development
setup.py pypi
  • citation_parser >=0.4.1
  • docopt *
  • hucitlib *
  • jellyfish >=0.5.6
  • langid *
  • pandas *
  • pycas *
  • scikit-learn >=0.16.1
  • scipy *
  • sklearn-crfsuite *
  • stop_words >=2015.2.23.1
  • treetagger *
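
The requirements.txt and setup.py lists above overlap but are not identical. A small sketch to show the differences, using only the package names listed on this page (version constraints are ignored):

```python
# Package names copied from the requirements.txt list above.
requirements_txt = {
    "antlr-python-runtime", "dask", "docopt", "hucitlib", "jellyfish",
    "langid", "numpy", "pandas", "rdflib", "scikit-learn", "scipy",
    "sklearn-crfsuite", "stop_words", "treetagger",
}

# Package names copied from the setup.py list above.
setup_py = {
    "citation_parser", "docopt", "hucitlib", "jellyfish", "langid",
    "pandas", "pycas", "scikit-learn", "scipy", "sklearn-crfsuite",
    "stop_words", "treetagger",
}

only_in_requirements = sorted(requirements_txt - setup_py)
only_in_setup = sorted(setup_py - requirements_txt)

print(only_in_requirements)  # ['antlr-python-runtime', 'dask', 'numpy', 'rdflib']
print(only_in_setup)         # ['citation_parser', 'pycas']
```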