news-entity-server

Pull text out of a URL and return a list of all the entities in it. Supports multiple languages. Used by the Highlighter tool and others.

https://github.com/counterdata-network/news-entity-server

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 4
  • Releases: 0
Created almost 5 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Citation

README.md

News Entity Server

A small API server that returns entities and other metadata for online news articles. Originally built to support the Data Against Feminicide project. Technically, it exposes API endpoints that accept URLs and return entities as JSON. Uses spaCy under the hood for entity extraction.

Install from Docker: The easiest approach to just start using this is to install the pre-built image from DockerHub. Set a WEB_CONCURRENCY env var if you want more than one worker.

docker pull rahulbot/news-entity-server:latest
docker run -p 8000:8000 -e MODEL_MODE=small -m 8G rahulbot/news-entity-server:latest

Developing

Installation

pip install -r requirements.txt
./install.sh

Running Locally

Run locally with Gunicorn: ./run.sh

Or via Docker:

docker image build -t news-entity-server .
docker container run --rm -it -p 8000:8000 -e MODEL_MODE=small news-entity-server

Testing

Run pytest to execute a small set of tests on the API endpoints.

Usage

API documentation is available at http://localhost:5000/redoc. See the code in test/test_server.py for examples.

API Endpoints

Every endpoint returns a dict like this:

{
  "duration": 123,
  "status": "ok",
  "version": "0.0.1",
  "results": { ... }
}

  • duration: the number of milliseconds the request took to complete on the server
  • status: "ok" if it worked, "error" if it did not work
  • version: a semantically versioned number indicating the server version
  • results: a dict of the results you requested (potentially different for different endpoints)
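A client will usually want to check status before reading results. The sketch below shows one way to do that; the helper name unwrap_results and the sample body are illustrative assumptions, not part of the server's code.

```python
import json


def unwrap_results(raw_body: str) -> dict:
    """Parse a response body and return its "results" dict.

    Raises RuntimeError when "status" is anything other than "ok".
    """
    envelope = json.loads(raw_body)
    if envelope.get("status") != "ok":
        raise RuntimeError(f"server returned status {envelope.get('status')!r}")
    return envelope["results"]


# Hypothetical body; the keys inside "results" vary by endpoint.
sample_body = '{"duration": 123, "status": "ok", "version": "0.0.1", "results": {}}'
print(unwrap_results(sample_body))
```

Raising on a non-"ok" status keeps error handling in one place, so calling code only ever sees a results dict.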

/entities/from-url

POST a url and language to this endpoint and it returns JSON with all the entities it finds. Add a title argument, set to 1 or 0, to optionally include the article title in the entity extraction.
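As a rough illustration, the request could be built with the Python standard library as shown below. This sketch assumes the arguments are sent as form-encoded fields and that the server listens on port 8000 (the port published in the Docker example); the function name is hypothetical. Check /redoc on your server for the authoritative request format.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_entities_from_url_request(
    article_url: str,
    language: str,
    include_title: bool = True,
    base_url: str = "http://localhost:8000",  # assumption: port from the Docker example
) -> Request:
    """Build (but do not send) a POST request for /entities/from-url."""
    # Assumption: the server reads these as form-encoded fields.
    fields = {
        "url": article_url,
        "language": language,
        "title": 1 if include_title else 0,
    }
    return Request(
        f"{base_url}/entities/from-url",
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_entities_from_url_request("https://example.com/story", "en")
print(request.get_method(), request.full_url)
```

With a server running, passing the request to urllib.request.urlopen should return the JSON envelope described above.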

/entities/from-content

POST text and language content to this endpoint, and it returns JSON with all the entities it finds.

/content/from-url

POST a url to this endpoint, and it returns just the extracted content from the HTML.
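The two endpoints above, /entities/from-content and /content/from-url, can be called the same way. The following is a minimal sketch that assumes form-encoded fields named text, language, and url, and a server on port 8000; BASE_URL, form_post, and send are hypothetical names introduced for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8000"  # assumption: adjust to wherever the server listens


def form_post(path: str, **fields: str) -> Request:
    """Build a form-encoded POST request for the given endpoint path."""
    return Request(
        BASE_URL + path,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send(request: Request) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urlopen(request) as response:
        return json.load(response)


# Building the requests does not touch the network; call send() to execute them.
content_request = form_post("/content/from-url", url="https://example.com/story")
entities_request = form_post(
    "/entities/from-content", text="Maria Silva spoke in Lisbon.", language="en"
)
```

Keeping request construction separate from sending makes it possible to inspect or log the exact payload before any network call is made.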

Releasing to DockerHub

I build and release this to DockerHub for easier deployment on your server. To release the latest code:

docker build -t rahulbot/news-entity-server .
docker push rahulbot/news-entity-server

To release a tagged version:

docker build -t rahulbot/news-entity-server:2.4.2 .
docker push rahulbot/news-entity-server:2.4.2

Contributors

  • Rahul Bhargava
  • Sybille Légitime
  • Karabo Kakopo

Owner

  • Name: counterdata-network
  • Login: counterdata-network
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: News Entity Server
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Rahul
    family-names: Bhargava
    email: r.bhargava@northeastern.edu
    affiliation: Northeastern University
    orcid: 'https://orcid.org/0000-0003-3904-4302'
  - given-names: Harini
    family-names: Suresh
    affiliation: Brown University
repository-code: 'https://github.com/dataculturegroup/news-entity-server'
repository-artifact: 'https://hub.docker.com/r/rahulbot/news-entity-server'
abstract: >-
  A small API server to return entities and other metadata
  for online news articles. Originally built to support the
  Data Against Feminicide project. Technically, this
  exposes API endpoints that accepts URLs and returns
  entities in JSON. Uses spaCy under the hood for entity
  extraction.
keywords:
  - computational journalism
  - entity extraction
  - api
license: MIT
version: v2.4.3
date-released: '2024-07-02'

GitHub Events

Total
  • Issues event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 19
  • Pull request event: 4
  • Pull request review event: 10
  • Pull request review comment event: 15
  • Create event: 5
Last Year
  • Issues event: 4
  • Delete event: 1
  • Issue comment event: 7
  • Push event: 19
  • Pull request event: 4
  • Pull request review event: 10
  • Pull request review comment event: 15
  • Create event: 5

Dependencies

requirements.txt pypi
  • fastapi ==0.78.
  • gunicorn ==20.1.0
  • mediacloud-metadata ==0.5.
  • pytest ==7.1.2
  • python-dotenv ==0.20.
  • requests *
  • sentry-sdk ==1.7.
  • spacy ==3.4.
  • uvicorn *
.github/workflows/docker-image.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v2.10.0 composite
  • docker/login-action v1.14.1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
Dockerfile docker
  • python 3.10 build