news-entity-server
Pull text out of a URL and return a list of all the entities in it. Supports multiple languages. Used by the Highlighter tool and others.
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.0%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: counterdata-network
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://hub.docker.com/r/rahulbot/news-entity-server
- Size: 247 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 4
- Releases: 0
Metadata Files
README.md
News Entity Server
A small API server to return entities and other metadata for online news articles. Originally built to support the Data Against Feminicide project. Technically, this exposes API endpoints that accept URLs and return entities in JSON. Uses spaCy under the hood for entity extraction.
Install from Docker: The easiest way to start using this is to install the pre-built image from DockerHub. Set a WEB_CONCURRENCY env var if you want more than one worker.
docker pull rahulbot/news-entity-server:latest
docker run -p 8000:8000 -e MODEL_MODE=small -m 8G rahulbot/news-entity-server:latest
Developing
Installation
pip install -r requirements.txt
./install.sh
Running Locally
Run locally with Gunicorn: ./run.sh
Or via docker:
docker image build -t news-entity-server .
docker container run --rm -it -p 8000:8000 -e MODEL_MODE=small news-entity-server
Testing
Just run pytest to run a small set of tests on the API endpoints.
Usage
API documentation is available at http://localhost:5000/redoc. See the code in test/test_server.py for examples.
API Endpoints
Every endpoint returns a dict like this:
{
  "duration": 123,
  "status": "ok",
  "version": "0.0.1",
  "results": { ... }
}
- duration: the number of milliseconds the request took to complete on the server
- status: "ok" if it worked, "error" if it did not work
- version: a semantically versioned number indicating the server version
- results: a dict of the results you requested (potentially different for different endpoints)
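As a rough illustration of how a client might consume the envelope described above, here is a minimal sketch that checks status and hands back results. The four field names come from the list above; the `ServerError` class, the `unwrap_envelope` helper, and the `entities` key inside the sample results are hypothetical names used only for this example.

```python
import json


class ServerError(Exception):
    """Hypothetical exception raised when the envelope reports an error."""


def unwrap_envelope(raw_body: str) -> dict:
    """Parse a response body and return its "results" dict.

    Assumes the envelope fields described above (duration, status,
    version, results). Raising on a non-"ok" status is an illustrative
    choice, not necessarily how the project's own clients behave.
    """
    envelope = json.loads(raw_body)
    if envelope.get("status") != "ok":
        raise ServerError(f"server reported status {envelope.get('status')!r}")
    return envelope.get("results", {})


# Sample body shaped like the envelope above; the "entities" key is made up.
sample = '{"duration": 123, "status": "ok", "version": "0.0.1", "results": {"entities": []}}'
results = unwrap_envelope(sample)
```

Checking status before touching results keeps error handling in one place, since every endpoint shares the same envelope.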
/entities/from-url
POST a url and language to this endpoint and it returns JSON with all the entities it finds.
Add a title argument, set to 1 or 0, to optionally include the article title in the entity extraction.
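A request to this endpoint could be prepared along the following lines. This is a sketch under assumptions: the parameter names `url`, `language`, and `title` are taken from the description above, but the form-encoded body, the `http://localhost:8000` base address (matching the Docker port mapping), and the `build_from_url_request` helper are guesses for illustration. Check the server's /redoc page for the actual request format.

```python
from urllib.parse import urlencode, urljoin
from urllib.request import Request


def build_from_url_request(base_url: str, article_url: str, language: str,
                           include_title: bool = True) -> Request:
    """Build (but do not send) a POST request for /entities/from-url.

    Assumption: the server accepts form-encoded fields named
    url, language, and title (1 or 0).
    """
    payload = {
        "url": article_url,
        "language": language,
        "title": 1 if include_title else 0,
    }
    body = urlencode(payload).encode("utf-8")
    return Request(
        urljoin(base_url, "/entities/from-url"),
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# The base address and article URL below are placeholders.
request = build_from_url_request(
    "http://localhost:8000", "https://example.com/news/story", "en"
)
```

Passing the returned object to `urllib.request.urlopen` would send it to a running server.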
/entities/from-content
POST text content and its language to this endpoint, and it returns JSON with all the entities it finds.
/content/from-url
POST a url to this endpoint, and it returns just the extracted content from the HTML.
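The inputs of the three endpoints above can be summarized in a small lookup table, which a client might use to validate a payload before sending it. This is only a sketch: the field names `url`, `language`, and `text` are read off the prose descriptions and may not match the server's exact parameter names, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical helpers.

```python
# Endpoint paths come from the headings above; the field names are
# assumptions inferred from the prose, not confirmed parameter names.
REQUIRED_FIELDS = {
    "/entities/from-url": ("url", "language"),
    "/entities/from-content": ("text", "language"),
    "/content/from-url": ("url",),
}


def missing_fields(endpoint: str, payload: dict) -> list:
    """Return the required field names that are absent from payload."""
    return [name for name in REQUIRED_FIELDS[endpoint] if name not in payload]


# A from-content payload that lacks its language.
problems = missing_fields("/entities/from-content", {"text": "Some article text"})
```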
Releasing to DockerHub
I build and release this to DockerHub for easier deployment on your server. To release the latest code:
docker build -t rahulbot/news-entity-server .
docker push rahulbot/news-entity-server
To release a tagged version:
docker build -t rahulbot/news-entity-server:2.4.2 .
docker push rahulbot/news-entity-server:2.4.2
Contributors
- Rahul Bhargava
- Sybille Légitime
- Karabo Kakopo
Owner
- Name: counterdata-network
- Login: counterdata-network
- Kind: organization
- Repositories: 1
- Profile: https://github.com/counterdata-network
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: News Entity Server
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Rahul
    family-names: Bhargava
    email: r.bhargava@northeastern.edu
    affiliation: Northeastern University
    orcid: 'https://orcid.org/0000-0003-3904-4302'
  - given-names: Harini
    family-names: Suresh
    affiliation: Brown University
repository-code: 'https://github.com/dataculturegroup/news-entity-server'
repository-artifact: 'https://hub.docker.com/r/rahulbot/news-entity-server'
abstract: >-
  A small API server to return entities and other metadata
  for online news articles. Originally built to support the
  Data Against Feminicide project. Technically, this
  exposes API endpoints that accepts URLs and returns
  entities in JSON. Uses spaCy under the hood for entity
  extraction.
keywords:
  - computational journalism
  - entity extraction
  - api
license: MIT
version: v2.4.3
date-released: '2024-07-02'
GitHub Events
Total
- Issues event: 4
- Delete event: 1
- Issue comment event: 7
- Push event: 19
- Pull request event: 4
- Pull request review event: 10
- Pull request review comment event: 15
- Create event: 5
Last Year
- Issues event: 4
- Delete event: 1
- Issue comment event: 7
- Push event: 19
- Pull request event: 4
- Pull request review event: 10
- Pull request review comment event: 15
- Create event: 5
Dependencies
- fastapi ==0.78.
- gunicorn ==20.1.0
- mediacloud-metadata ==0.5.
- pytest ==7.1.2
- python-dotenv ==0.20.
- requests *
- sentry-sdk ==1.7.
- spacy ==3.4.
- uvicorn *
- actions/checkout v2 composite
- docker/build-push-action v2.10.0 composite
- docker/login-action v1.14.1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- python 3.10 build