https://github.com/copyleftdev/table_extractor_api

A FastAPI-based service for extracting tables from PDF files, with rate limiting and retrieval of previously processed results.

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: copyleftdev
  • License: gpl-2.0
  • Language: Python
  • Default Branch: main
  • Size: 62.5 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

table_extractor_api

Table Extractor API

A FastAPI-based service for extracting tables from PDF files. It applies rate limiting to incoming requests and can return previously processed results by ID.

Features

  • Extract tables from PDF files.
  • Rate limiting to prevent abuse.
  • Retrieve previously processed results.
  • Support for uploading multiple files with pagination (TODO).
  • OAuth2 authentication (TODO).
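The rate-limiting feature can be illustrated with a fixed-window counter. The dependency lists include slowapi, which the service presumably relies on, but the class below is a hypothetical standalone sketch rather than the project's code; `FixedWindowLimiter`, its parameters, and the use of a per-client string key are all assumptions.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Hypothetical in-memory fixed-window rate limiter (not the project's code).

    Allows at most `limit` requests per `window` seconds for each client key.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested deterministically
        # client key -> (index of the window it was last counted in, requests in it)
        self._counts: Dict[str, Tuple[int, int]] = {}

    def allow(self, key: str) -> bool:
        """Count one request for `key`; return False if it exceeds the limit."""
        current = int(self.clock() // self.window)
        window_index, hits = self._counts.get(key, (current, 0))
        if window_index != current:
            # A new window has begun, so the old count no longer applies.
            hits = 0
        if hits >= self.limit:
            return False
        self._counts[key] = (current, hits + 1)
        return True


now = [0.0]
limiter = FixedWindowLimiter(limit=2, window=60.0, clock=lambda: now[0])
print([limiter.allow("203.0.113.7") for _ in range(3)])  # [True, True, False]
now[0] = 60.0
print(limiter.allow("203.0.113.7"))  # True: a new window has started
```

In a multi-worker deployment the counters would need to live in a shared store (Redis is already part of this stack) so that every worker enforces the same limit.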

Getting Started

Prerequisites

  • Docker
  • Docker Compose

Installation

  1. Clone the repository:

       git clone https://github.com/copyleftdev/table_extractor_api.git
       cd table_extractor_api

  2. Build and start the Docker containers:

       docker-compose up --build

API Endpoints

Extract Tables from PDF

    curl -X 'POST' \
      'http://localhost:8000/extract_tables' \
      -H 'accept: application/json' \
      -H 'Content-Type: multipart/form-data' \
      -F 'file=@/path/to/your/file.pdf'
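For scripted use, the same upload can be sent from Python with only the standard library. This is a minimal sketch: the helper names `build_multipart` and `extract_tables` are hypothetical, the URL and the form field name `file` come from the curl command above, and the function returns whatever JSON the service sends, because the README does not document the response format.

```python
import json
import os
import urllib.request
import uuid
from typing import Any, Tuple


def build_multipart(field: str, filename: str, data: bytes,
                    content_type: str = "application/pdf") -> Tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type value)."""
    boundary = uuid.uuid4().hex  # random boundary that will not occur in the header text
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def extract_tables(pdf_path: str, base_url: str = "http://localhost:8000") -> Any:
    """POST a PDF to /extract_tables (mirroring the curl call) and decode the JSON reply."""
    with open(pdf_path, "rb") as fh:
        data = fh.read()
    body, content_type = build_multipart("file", os.path.basename(pdf_path), data)
    request = urllib.request.Request(
        f"{base_url}/extract_tables",
        data=body,
        headers={"Content-Type": content_type, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Unlike the curl command, this sets the boundary explicitly in the `Content-Type` header, which a multipart parser needs in order to split the body.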

Retrieve Extraction Result by ID

    curl -X 'GET' \
      'http://localhost:8000/result/{result_id}' \
      -H 'accept: application/json'
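A matching standard-library sketch for fetching a stored result. `build_result_request` and `get_result` are hypothetical names; the ID is percent-encoded before it goes into the path, and the body is decoded as JSON without assuming its structure.

```python
import json
import urllib.parse
import urllib.request
from typing import Any


def build_result_request(result_id: str,
                         base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the GET request for /result/{result_id}, percent-encoding the ID."""
    url = f"{base_url}/result/{urllib.parse.quote(result_id, safe='')}"
    return urllib.request.Request(url, headers={"Accept": "application/json"}, method="GET")


def get_result(result_id: str, base_url: str = "http://localhost:8000") -> Any:
    """Fetch a previously processed result and return the decoded JSON body."""
    with urllib.request.urlopen(build_result_request(result_id, base_url)) as response:
        return json.load(response)
```

For an unknown ID the server presumably answers with an error status; `urlopen` raises such responses as `urllib.error.HTTPError`, so callers may want to catch it.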

Development

  1. Install dependencies:

       pip install -r requirements.txt

  2. Run the development server:

       uvicorn app.api:app --reload

Technology Stack

  • Python
  • FastAPI
  • Docker
  • Redis

Owner

  • Name: Donald Johnson
  • Login: copyleftdev
  • Kind: user
  • Location: Los Angeles

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 3
  • Total Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  Name                   Email           Commits
  Don Johnson            dj@c****o       2
  L337[69988bc5]SIGMA    d****n@c****o   1
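The All Time figures can be checked against this table. Assuming DDS is defined as one minus the top committer's share of all commits, 2 of 3 commits by the top committer gives 1 - 2/3 ≈ 0.333, and 3 commits over 2 committers gives an average of 1.5. The hypothetical helper below reproduces both numbers.

```python
from typing import Dict, Tuple


def committer_stats(commits_by_author: Dict[str, int]) -> Tuple[float, float]:
    """Return (average commits per committer, DDS), assuming
    DDS = 1 - (top committer's commits / total commits)."""
    total = sum(commits_by_author.values())
    if total == 0:
        # No commits: report zeros, matching the "Past Year" block.
        return 0.0, 0.0
    average = total / len(commits_by_author)
    dds = 1 - max(commits_by_author.values()) / total
    return average, dds


avg, dds = committer_stats({"Don Johnson": 2, "L337[69988bc5]SIGMA": 1})
print(avg, round(dds, 3))  # 1.5 0.333
```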
Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

Dockerfile docker
  • python 3.12-slim build
docker-compose.yml docker
  • redis alpine
  • rediscommander/redis-commander latest
Pipfile pypi
  • deepdiff *
  • fastapi *
  • fpdf2 *
  • install *
  • pandas *
  • pdfplumber *
  • pymupdf *
  • pytest *
  • python-jose *
  • redis *
  • reportlab *
  • requests *
  • slowapi *
  • uvicorn *
Pipfile.lock pypi
  • annotated-types ==0.7.0
  • anyio ==4.4.0
  • certifi ==2024.6.2
  • cffi ==1.16.0
  • chardet ==5.2.0
  • charset-normalizer ==3.3.2
  • click ==8.1.7
  • colorama ==0.4.6
  • cryptography ==42.0.8
  • deepdiff ==7.0.1
  • defusedxml ==0.7.1
  • deprecated ==1.2.14
  • dnspython ==2.6.1
  • ecdsa ==0.19.0
  • email-validator ==2.1.1
  • fastapi ==0.111.0
  • fastapi-cli ==0.0.4
  • fonttools ==4.53.0
  • fpdf2 ==2.7.9
  • h11 ==0.14.0
  • httpcore ==1.0.5
  • httptools ==0.6.1
  • httpx ==0.27.0
  • idna ==3.7
  • importlib-resources ==6.4.0
  • iniconfig ==2.0.0
  • install ==1.3.5
  • itsdangerous ==2.2.0
  • jinja2 ==3.1.4
  • limits ==3.12.0
  • markdown-it-py ==3.0.0
  • markupsafe ==2.1.5
  • mdurl ==0.1.2
  • numpy ==1.26.4
  • ordered-set ==4.1.0
  • orjson ==3.10.5
  • packaging ==24.1
  • pandas ==2.2.2
  • pdfminer.six ==20231228
  • pdfplumber ==0.11.1
  • pillow ==10.3.0
  • pluggy ==1.5.0
  • pyasn1 ==0.6.0
  • pycparser ==2.22
  • pydantic ==2.7.4
  • pydantic-core ==2.18.4
  • pydantic-extra-types ==2.8.1
  • pydantic-settings ==2.3.3
  • pygments ==2.18.0
  • pymupdf ==1.24.5
  • pymupdfb ==1.24.3
  • pypdfium2 ==4.30.0
  • pytest ==8.2.2
  • python-dateutil ==2.9.0.post0
  • python-dotenv ==1.0.1
  • python-jose ==3.3.0
  • python-multipart ==0.0.9
  • pytz ==2024.1
  • pyyaml ==6.0.1
  • redis ==5.0.6
  • reportlab ==4.2.0
  • requests ==2.32.3
  • rich ==13.7.1
  • rsa ==4.9
  • shellingham ==1.5.4
  • six ==1.16.0
  • slowapi ==0.1.9
  • sniffio ==1.3.1
  • starlette ==0.37.2
  • typer ==0.12.3
  • typing-extensions ==4.12.2
  • tzdata ==2024.1
  • ujson ==5.10.0
  • urllib3 ==2.2.1
  • uvicorn ==0.30.1
  • watchfiles ==0.22.0
  • websockets ==12.0
  • wrapt ==1.16.0
requirements.txt pypi
  • fastapi *
  • pandas *
  • pdfplumber *
  • python-jose *
  • redis *
  • slowapi *
  • uvicorn *