epitator

EpiTator annotates epidemiological information in text documents. It is the natural language processing framework that powers GRITS and EIDR Connect.

https://github.com/ecohealthalliance/epitator

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 15 committers (6.7%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (9.5%) to scientific vocabulary

Keywords

disease-surveillance epidemiology geonames nlp spacy toponym-resolution

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: ecohealthalliance
License: apache-2.0
Language: Python
Default Branch: master
Homepage: https://epitator.readthedocs.io/en/latest/index.html
Size: 824 KB

Statistics

Stars: 41
Watchers: 8
Forks: 9
Open Issues: 4
Releases: 0

Topics

disease-surveillance epidemiology geonames nlp spacy toponym-resolution

Created about 9 years ago · Last pushed almost 4 years ago

Metadata Files

Readme Contributing License

README.rst

EpiTator
********

Annotators for extracting epidemiological information from text.

Installation
============

.. code:: bash

    pip install epitator
    python -m spacy download en_core_web_md


Annotators
==========

Geoname Annotator
-----------------

The geoname annotator uses the geonames.org dataset to resolve place-name mentions
to geonames.org entries. A classifier disambiguates candidate geonames and rules out false positives.

Before using the geoname annotator, review the geonames.org license, then run the
following command to import the geonames.org data into EpiTator's embedded sqlite3 database:

.. code:: bash

    python -m epitator.importers.import_geonames
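
The classifier mentioned above is not documented in this README. As a rough illustration only,
the sketch below assumes each candidate geoname comes with a few numeric features, scores the
candidates with a logistic function, and treats the mention as a false positive when even the
best candidate scores below a threshold. The helper names, feature names, weights and threshold
are all hypothetical; this is not EpiTator's classifier.

.. code:: python

    import math

    # Hypothetical scoring sketch; not EpiTator's classifier.


    def candidate_score(features, weights, bias):
        """Logistic score for one candidate from hypothetical numeric features."""
        z = bias + sum(weights[name] * features.get(name, 0.0) for name in weights)
        return 1.0 / (1.0 + math.exp(-z))


    def resolve_mention(candidates, weights, bias, threshold=0.5):
        """Return the best-scoring candidate, or None if the mention looks like a false positive."""
        scored = [(candidate_score(c["features"], weights, bias), c) for c in candidates]
        if not scored:
            return None
        best_score, best = max(scored, key=lambda pair: pair[0])
        return best if best_score >= threshold else None


    # Hypothetical candidates and features for one ambiguous mention
    candidates = [
        {"name": "Chiang Mai (city)", "features": {"log_population": 12.0, "is_populated_place": 1.0}},
        {"name": "Chiang Mai (province)", "features": {"log_population": 14.0, "is_populated_place": 0.0}},
    ]
    weights = {"log_population": 0.5, "is_populated_place": 3.0}
    print(resolve_mention(candidates, weights, bias=-8.0)["name"])

Returning ``None`` instead of the top candidate is what lets a classifier of this kind reject
false positives rather than always picking some geoname.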


Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.geoname_annotator import GeonameAnnotator
    doc = AnnoDoc("Where is Chiang Mai?")
    doc.add_tiers(GeonameAnnotator())
    annotations = doc.tiers["geonames"].spans
    geoname = annotations[0].geoname
    geoname['name']
    # = 'Chiang Mai'
    geoname['geonameid']
    # = '1153671'
    geoname['latitude']
    # = 18.79038
    geoname['longitude']
    # = 98.98468


Resolved Keyword Annotator
--------------------------

The resolved keyword annotator uses an sqlite database of entities to resolve
mentions of multiple synonyms for an entity to a single id.
This project includes scripts for importing infectious diseases and animal species into
that database. The scripts import data from the Disease Ontology, Wikidata and ITIS.
You should review their licenses and terms of use before using this data.
At the time of writing, the Disease Ontology is in the public domain and ITIS requests citation.
The following commands invoke the import scripts:

.. code:: bash

    python -m epitator.importers.import_species
    # By default entities under the disease by infectious agent class will be
    # imported from the disease ontology, but this can be altered by supplying
    # a --root-uri parameter.
    python -m epitator.importers.import_disease_ontology
    python -m epitator.importers.import_wikidata
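
The idea of resolving several synonyms to one entity id can be illustrated with Python's
built-in ``sqlite3`` module. The single ``synonyms`` table below (``synonym``, ``entity_id``
and ``weight`` columns), the illustrative rows and the ``resolve`` helper are hypothetical
simplifications, not the schema EpiTator's importers create; only the DOID URI is taken from
the usage example.

.. code:: python

    import sqlite3

    # Hypothetical, simplified schema: one row per (synonym, entity) pair
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE synonyms (synonym TEXT, entity_id TEXT, weight INTEGER)")
    smallpox = "http://purl.obolibrary.org/obo/DOID_8736"
    # Illustrative rows only: two synonyms that map to the same entity id
    conn.executemany(
        "INSERT INTO synonyms VALUES (?, ?, ?)",
        [("smallpox", smallpox, 3), ("variola", smallpox, 2)],
    )


    def resolve(keyword):
        """Return candidate entity ids for a keyword, highest weight first."""
        rows = conn.execute(
            "SELECT entity_id, weight FROM synonyms WHERE synonym = ? ORDER BY weight DESC",
            (keyword.lower(),),
        )
        return [{"entity_id": entity_id, "weight": weight} for entity_id, weight in rows]


    print(resolve("Variola"))

Both "smallpox" and "variola" resolve to the same ``entity_id``, which is the behavior the
paragraph above describes.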


Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.resolved_keyword_annotator import ResolvedKeywordAnnotator
    doc = AnnoDoc("5 cases of smallpox")
    doc.add_tiers(ResolvedKeywordAnnotator())
    annotations = doc.tiers["resolved_keywords"].spans
    annotations[0].metadata["resolutions"]
    # = [{'entity': ..., 'entity_id': u'http://purl.obolibrary.org/obo/DOID_8736', 'weight': 3}]


Count Annotator
---------------

The count annotator identifies counts, and case counts in particular.
The count's value is extracted and parsed. Attributes such as whether the count
refers to cases or deaths, or whether the value is approximate are also extracted.
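
To make the idea concrete, here is a deliberately small regular-expression sketch that pulls a
numeric value out of phrases such as "about 1,200 deaths" and tags it with attributes. The
pattern, the hedge words and the ``extract_counts`` helper are assumptions for illustration; it
is not EpiTator's implementation, and real text needs far more patterns than this.

.. code:: python

    import re

    # Hypothetical pattern: an optional hedge word, a number, then "case(s)" or "death(s)"
    COUNT_PATTERN = re.compile(
        r"(?:(?P<hedge>about|approximately|around)\s+)?"
        r"(?P<value>\d[\d,]*)\s+"
        r"(?P<noun>cases?|deaths?)\b",
        re.IGNORECASE,
    )


    def extract_counts(text):
        """Return count dicts shaped like the usage output below (hypothetical helper)."""
        results = []
        for match in COUNT_PATTERN.finditer(text):
            attributes = []
            if match.group("hedge"):
                attributes.append("approximate")
            noun = match.group("noun").lower()
            attributes.append("death" if noun.startswith("death") else "case")
            results.append({
                "count": int(match.group("value").replace(",", "")),
                "text": match.group(0),
                "attributes": attributes,
            })
        return results


    print(extract_counts("5 cases of smallpox and about 1,200 deaths"))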

Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.count_annotator import CountAnnotator
    doc = AnnoDoc("5 cases of smallpox")
    doc.add_tiers(CountAnnotator())
    annotations = doc.tiers["counts"].spans
    annotations[0].metadata
    # = {'count': 5, 'text': '5 cases', 'attributes': ['case']}


Date Annotator
--------------

The date annotator identifies and parses dates and date ranges.
All dates are parsed into datetime ranges. For instance, a date like "11-6-87"
would be parsed as a range from the start of the day to the start of the next day,
while a month like "December 2011" would be parsed as a range from the start
of December 1st to the start of the next month.
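
The range convention can be sketched with the standard ``datetime`` module. The helper names
``day_range`` and ``month_range`` are hypothetical, and the sketch only covers days and months
that have already been parsed; it does not parse date strings the way EpiTator does. Reading
"11-6-87" as 6 November 1987 is itself an assumption about month/day order.

.. code:: python

    import datetime


    def day_range(year, month, day):
        """Range from the start of a day to the start of the next day."""
        start = datetime.datetime(year, month, day)
        return [start, start + datetime.timedelta(days=1)]


    def month_range(year, month):
        """Range from the start of the 1st of a month to the start of the next month."""
        start = datetime.datetime(year, month, 1)
        if month == 12:
            end = datetime.datetime(year + 1, 1, 1)
        else:
            end = datetime.datetime(year, month + 1, 1)
        return [start, end]


    print(day_range(1987, 11, 6))
    print(month_range(2011, 12))

Each range includes its start and excludes its end, so consecutive days or months share a
boundary without overlapping.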

Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.date_annotator import DateAnnotator
    doc = AnnoDoc("From March 5 until April 7 1988")
    doc.add_tiers(DateAnnotator())
    annotations = doc.tiers["dates"].spans
    annotations[0].metadata["datetime_range"]
    # = [datetime.datetime(1988, 3, 5, 0, 0), datetime.datetime(1988, 4, 7, 0, 0)]


Structured Data Annotator
-------------------------

The structured data annotator identifies and parses embedded tables.
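
For pipe-delimited text, the core step of splitting rows into cells while keeping character
offsets can be sketched as follows. The ``parse_pipe_table`` helper is hypothetical: it assumes
the delimiter is already known and does not attempt delimiter or table-type detection. On the
document from the usage example below, it yields the same start and end offsets shown there.

.. code:: python

    def parse_pipe_table(text, delimiter="|"):
        """Split delimiter-containing lines into cells of (start, end, text) tuples."""
        rows = []
        offset = 0  # character offset of the current line within text
        for line in text.splitlines(keepends=True):
            if delimiter in line:
                row = []
                position = 0  # offset of the current cell within the line
                for cell in line.rstrip("\r\n").split(delimiter):
                    value = cell.strip()
                    leading = len(cell) - len(cell.lstrip())
                    start = offset + position + leading
                    row.append((start, start + len(value), value))
                    position += len(cell) + len(delimiter)
                rows.append(row)
            offset += len(line)
        return rows


    table = "\nspecies | cases | deaths\nCattle  | 0     | 0\nDogs    | 2     | 1\n"
    for row in parse_pipe_table(table):
        print(row)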

Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.structured_data_annotator import StructuredDataAnnotator
    doc = AnnoDoc("""
    species | cases | deaths
    Cattle  | 0     | 0
    Dogs    | 2     | 1
    """)
    doc.add_tiers(StructuredDataAnnotator())
    annotations = doc.tiers["structured_data"].spans
    annotations[0].metadata
    # = {'data': [
    #       [AnnoSpan(1-8, species), AnnoSpan(11-16, cases), AnnoSpan(19-25, deaths)],
    #       [AnnoSpan(26-32, Cattle), AnnoSpan(36-37, 0), AnnoSpan(44-45, 0)],
    #       [AnnoSpan(46-50, Dogs), AnnoSpan(56-57, 2), AnnoSpan(64-65, 1)]],
    #    'delimiter': '|',
    #    'type': 'table'}


Structured Incident Annotator
-----------------------------

The structured incident annotator identifies and parses embedded tables that
describe case counts paired with location, date, disease and species metadata.
Metadata is also extracted from the text around the table.
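
One way to picture the pairing of table rows with surrounding metadata is to emit one record per
row and count column, copying in context found outside the table. In the sketch below the
context dictionary is supplied by hand, and the ``COUNT_COLUMNS`` mapping and
``table_to_incidents`` helper are hypothetical. It mirrors the ``type`` and ``value`` keys in the
usage output but does none of the location, date, disease or species resolution shown there.

.. code:: python

    # Hypothetical mapping from column headers to incident types
    COUNT_COLUMNS = {"cases": "caseCount", "deaths": "deathCount"}


    def table_to_incidents(header, rows, context):
        """Emit one incident per (row, count column), merged with document-level context."""
        incidents = []
        for row in rows:
            record = dict(zip(header, row))
            for column, incident_type in COUNT_COLUMNS.items():
                if column in record:
                    incident = dict(context)  # copy so incidents do not share state
                    incident.update({
                        "species": record.get("species"),
                        "type": incident_type,
                        "value": int(record[column]),
                    })
                    incidents.append(incident)
        return incidents


    header = ["species", "cases", "deaths"]
    rows = [["Cattle", "0", "0"], ["Dogs", "4", "1"]]
    context = {"location": "Svalbard", "disease": "rabies", "date": "October 2015"}
    print(table_to_incidents(header, rows, context)[-1])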

Usage
~~~~~

.. code:: python

    from epitator.annotator import AnnoDoc
    from epitator.structured_incident_annotator import StructuredIncidentAnnotator
    doc = AnnoDoc("""
    Fictional October 2015 rabies cases in Svalbard
    
    species | cases | deaths
    Cattle  | 0     | 0
    Dogs    | 4     | 1
    """)
    doc.add_tiers(StructuredIncidentAnnotator())
    annotations = doc.tiers["structured_incidents"].spans
    annotations[-1].metadata
    # = {'location': {'name': u'Svalbard', ...},
    #    'species': {'label': u'Canidae', ...},
    #    'attributes': [],
    #    'dateRange': [datetime.datetime(2015, 10, 1, 0, 0), datetime.datetime(2015, 11, 1, 0, 0)],
    #    'type': 'deathCount',
    #    'value': 1,
    #    'resolvedDisease': {'label': u'rabies', ...}}


Architecture
============

EpiTator provides the following classes for organizing annotations.

AnnoDoc - The document being annotated. The AnnoDoc links to the tiers of annotations applied to it.

AnnoTier - A group of AnnoSpans. Each annotator creates one or more tiers of annotations.

AnnoSpan - A span of text with an annotation applied to it.
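
The relationship between the three classes can be sketched with plain dataclasses. ``Doc``,
``Tier`` and ``Span`` are hypothetical stand-ins written for illustration, not EpiTator's
classes; their attribute names (``tiers``, ``spans``, ``metadata``) are loosely modeled on the
usage examples above.

.. code:: python

    from dataclasses import dataclass, field
    from typing import Dict, List


    @dataclass
    class Span:
        """A span of text with an annotation applied to it (stand-in for AnnoSpan)."""
        start: int
        end: int
        doc: "Doc" = field(repr=False, compare=False)  # back-reference; excluded to avoid recursion
        metadata: dict = field(default_factory=dict)

        @property
        def text(self) -> str:
            return self.doc.text[self.start:self.end]


    @dataclass
    class Tier:
        """A group of spans produced by one annotator (stand-in for AnnoTier)."""
        spans: List[Span] = field(default_factory=list)


    @dataclass
    class Doc:
        """The annotated document and its named tiers (stand-in for AnnoDoc)."""
        text: str
        tiers: Dict[str, Tier] = field(default_factory=dict)


    doc = Doc("5 cases of smallpox")
    doc.tiers["counts"] = Tier([Span(0, 7, doc, metadata={"count": 5})])
    print(doc.tiers["counts"].spans[0].text)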

License
=======

Copyright 2016 EcoHealth Alliance

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

Owner

Name: EcoHealth Alliance
Login: ecohealthalliance
Kind: organization
Email: tech@ecohealthalliance.org
Location: New York, NY

Website: http://ecohealthalliance.org/
Repositories: 199
Profile: https://github.com/ecohealthalliance

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 459
Total Committers: 15
Avg Commits per committer: 30.6
Development Distribution Score (DDS): 0.542

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Nathan Breit	n**h@n**m	210
Russell Horton	r**n@e**o	99
Nathan Breit	n**t@g**m	76
Toph Allen	t**n@g**m	48
Auss Abbood	A**A@r**e	7
Amy Slagle	a**e@e**o	4
Léo Bouscarrat	l**t@e**u	3
Abe Miessler	m**r@e**g	2
Russell Horton	r**s@l**l	2
Freddie Rosario	r**o@e**g	2
Toph Allen	t****n	2
Dr Tom August	t**g@c**k	1
Stephen Matta	s**a@g**m	1
Freddie Rosario	f****o	1
aauss	a**d@l**e	1

Committer Domains (Top 20 + Academic)

ecohealthalliance.org: 2 ecohealth.io: 2 live.de: 1 ceh.ac.uk: 1 euranova.eu: 1 rki.de: 1 nathanathan.com: 1

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 8
Total pull requests: 51
Average time to close issues: 19 days
Average time to close pull requests: 9 days
Total issue authors: 5
Total pull request authors: 7
Average comments per issue: 1.88
Average comments per pull request: 0.65
Merged pull requests: 44
Bot issues: 0
Bot pull requests: 1

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

aauss (4)
leobouscarrat (1)
nickynicolson (1)
AugustT (1)
alam121 (1)

Pull Request Authors

nathanathan (37)
aauss (8)
toph-allen (2)
leobouscarrat (1)
AugustT (1)
dependabot[bot] (1)
Broham (1)

Top Labels

Issue Labels

Pull Request Labels

dependencies (1)

Packages

Total packages: 1
Total downloads:
- pypi 14 last-month

Total dependent packages: 0
Total dependent repositories: 2
Total versions: 22
Total maintainers: 1

pypi.org: epitator

Annotators for extracting epidemiological information from text.

Homepage: https://github.com/ecohealthalliance/EpiTator
Documentation: https://epitator.readthedocs.io/
License: Apache Software License
Latest release: 1.3.5
published over 6 years ago

Versions: 22
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 14 Last month

Rankings

Downloads: 6.1%

Dependent packages count: 7.3%

Average: 9.3%

Stargazers count: 10.1%

Forks count: 11.5%

Dependent repos count: 11.8%

Maintainers (1)

ehatech

Last synced: 9 months ago

Dependencies

requirements.readthedocs.txt pypi

six *

requirements.txt pypi

dateparser ==0.7.1
geopy *
numpy ==1.16.1
pyparsing ==2.2.0
python-dateutil *
rdflib *
regex ==2018.01.10
six *
spacy ==2.1.8
unicodecsv *

setup.py pypi

dateparser ==0.7.1
geopy >=1.11.0
numpy >=1.16.1
pyparsing ==2.2.0
python-dateutil >=2.6.0
rdflib >=4.2.2
regex ==2018.01.10
six *
spacy ==2.1.8
unicodecsv >=0.14.1
