citation-extractor
A tool to extract canonical references from text.
Science Score: 33.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ✓ Academic publication links: links to zenodo.org
- ✓ Committers with academic emails: 3 of 6 committers (50.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: mromanello
- License: gpl-3.0
- Language: HTML
- Default Branch: master
- Homepage: https://citationextractor.readthedocs.io
- Size: 26.5 MB
Statistics
- Stars: 20
- Watchers: 4
- Forks: 3
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
(Canonical) Citation Extractor
Installation
This software supports Python 2.7 and has been tested only on POSIX-compliant operating systems (Linux, Mac OS X, FreeBSD, etc.).
Installing TreeTagger
The CitationExtractor relies on TreeTagger for the PoS tagging of input texts.
There is a handy script to install it.
To run it without having to clone this repo:
```bash
wget -O install_treetagger.sh https://raw.githubusercontent.com/mromanello/CitationExtractor/master/install_treetagger.sh
chmod a+x install_treetagger.sh
./install_treetagger.sh
rm install_treetagger.sh
```
Otherwise, clone the repo and run the script from there:
```bash
git clone https://github.com/mromanello/CitationExtractor.git
cd CitationExtractor
chmod a+x install_treetagger.sh
./install_treetagger.sh
```
With pip
To install the CitationExtractor, first run:
```bash
pip install http://www.antlr3.org/download/Python/antlr_python_runtime-3.1.3.tar.gz#egg=antlr_python_runtime-3.1.3
pip install https://github.com/mromanello/treetagger-python/archive/master.zip#egg=treetagger-1.0.1
```
followed by:
```bash
pip install citation-extractor
```
NB: setup.py handles the installation of all other dependencies, but for some reason
(that I'm still trying to figure out) it does not pick up these two.
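Since these two packages have to be installed by hand, it can help to check that they are importable before installing the rest. The snippet below is a minimal sketch, not part of the library: the helper `missing_modules` is hypothetical, and the import names `antlr3` and `treetagger` are assumptions about what the two packages expose, so adjust them if your installation differs.

```python
import importlib

# Assumed import names for the two manually installed packages
# (not stated in this README; adjust if they differ on your system).
MANUAL_DEPENDENCIES = ("antlr3", "treetagger")


def missing_modules(names=MANUAL_DEPENDENCIES):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is not installed (or not on the import path).
            missing.append(name)
    return missing


if __name__ == "__main__":
    not_found = missing_modules()
    if not_found:
        print("Missing modules: " + ", ".join(not_found))
    else:
        print("Both manual dependencies are importable.")
```

An empty list means both modules were found; any name in the returned list needs to be (re)installed with the `pip install` commands above.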
Verify installation
To double check that everything was installed correctly, try running the following lines (it should take ~20s):
```python
from citation_extractor.settings import crfsuite
from citation_extractor.pipeline import get_extractor
extractor = get_extractor(crfsuite)
assert extractor is not None
```
If the code above runs without throwing exceptions, you managed to install the library!
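If you prefer a readable message over a traceback, the same check can be wrapped in a small function. This is only a sketch: `check_installation` is a hypothetical helper, not part of the library, and it uses nothing beyond the two imports and the `get_extractor(crfsuite)` call shown above.

```python
def check_installation():
    """Return (ok, message) for the verification steps described above."""
    try:
        # Same imports as in the verification snippet.
        from citation_extractor.settings import crfsuite
        from citation_extractor.pipeline import get_extractor
    except ImportError as exc:
        return False, "import failed: {0}".format(exc)
    try:
        # Building the extractor can take a while (roughly 20s per the README).
        extractor = get_extractor(crfsuite)
    except Exception as exc:
        return False, "could not build the extractor: {0}".format(exc)
    if extractor is None:
        return False, "get_extractor returned None"
    return True, "installation looks fine"


if __name__ == "__main__":
    ok, message = check_installation()
    print(("OK: " if ok else "FAILED: ") + message)
```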
Documentation
I'm working on it ;-)
For the time being, you can find a concrete example of how to use the library in this notebook.
Owner
- Name: Matteo Romanello
- Login: mromanello
- Kind: user
- Location: Lausanne (CH)
- Company: @dhlab-epfl (previously @dains and @kcl-ddh)
- Website: http://orcid.org/0000-0002-7406-6286
- Repositories: 60
- Profile: https://github.com/mromanello
Researcher in Computational Humanities.
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 438
- Total Committers: 6
- Avg Commits per committer: 73.0
- Development Distribution Score (DDS): 0.1
Top Committers
| Name | Email | Commits |
|---|---|---|
| mromanello | m****o@g****m | 394 |
| Matteo Filipponi | m****i@e****h | 32 |
| Matteo Filipponi | m****i@d****m | 7 |
| Matteo Romanello | m****o@e****h | 3 |
| Matteo | m****o@t****h | 1 |
| skruse | s****k@g****m | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 19
- Total pull requests: 6
- Average time to close issues: almost 3 years
- Average time to close pull requests: about 9 hours
- Total issue authors: 2
- Total pull request authors: 3
- Average comments per issue: 0.32
- Average comments per pull request: 0.67
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mromanello (18)
- Jmuccigr (1)
Pull Request Authors
- mromanello (3)
- mfilippo (2)
- skruse (1)
Packages
- Total packages: 1
- Total downloads: 17 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 3
- Total maintainers: 1
pypi.org: citation-extractor
- Homepage: http://github.com/mromanello/CitationExtractor/
- Documentation: https://citation-extractor.readthedocs.io/
- License: gpl-3.0
- Latest release: 1.6.3 (published about 8 years ago)
Dependencies
- antlr-python-runtime *
- dask *
- docopt *
- hucitlib *
- jellyfish >=0.5.6
- langid *
- numpy >=1.9.2
- pandas *
- rdflib ==4.2.1
- scikit-learn >=0.16.1
- scipy >=0.17.0
- sklearn-crfsuite *
- stop_words >=2015.2.23.1
- treetagger *
- codecov * development
- flake8 * development
- ipdb * development
- isort * development
- parmap * development
- pytest * development
- pytest-cov <2.6.0 development
- tabulate * development
- tox * development
- twine * development
- citation_parser >=0.4.1
- docopt *
- hucitlib *
- jellyfish >=0.5.6
- langid *
- pandas *
- pycas *
- scikit-learn >=0.16.1
- scipy *
- sklearn-crfsuite *
- stop_words >=2015.2.23.1
- treetagger *