https://github.com/copyleftdev/table_extractor_api
A FastAPI-based service for extracting tables from PDF files. The service supports extracting tables, rate limiting, and retrieving previously processed results.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: copyleftdev
- License: gpl-2.0
- Language: Python
- Default Branch: main
- Size: 62.5 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
tableextractorapi
Table Extractor API
A FastAPI-based service for extracting tables from PDF files. The service supports extracting tables, rate limiting, and retrieving previously processed results.
Features
- Extract tables from PDF files.
- Rate limiting to prevent abuse.
- Retrieve previously processed results.
- Support for uploading multiple files with pagination (TODO).
- OAuth2 authentication (TODO).
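Because the service rate-limits requests, a client may want to retry when it is throttled. The following is a minimal sketch, not part of the project: it assumes the service answers throttled requests with HTTP 429 (the conventional status code; the README does not state which one is used), and `call_with_backoff` and its parameters are hypothetical names chosen for illustration.

```python
import time
import urllib.error


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send(); on HTTP 429, wait and retry with exponentially growing delays.

    `send` is any zero-argument callable that performs one request and raises
    urllib.error.HTTPError on a non-success status (as urllib.request.urlopen does).
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except urllib.error.HTTPError as error:
            # Assumption: 429 signals rate limiting. Any other status, or a 429
            # on the final attempt, is re-raised to the caller unchanged.
            if error.code != 429 or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter is injectable so the delays can be observed or skipped in tests without actually waiting.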
Getting Started
Prerequisites
- Docker
- Docker Compose
Installation
Clone the repository:
bash
git clone https://github.com/copyleftdev/table_extractor_api.git
cd table_extractor_api
Build and start the Docker containers:
bash
docker-compose up --build
API Endpoints
Extract Tables from PDF
bash
curl -X 'POST' \
'http://localhost:8000/extract_tables' \
-H 'accept: application/json' \
-H 'Content-Type: multipart/form-data' \
-F 'file=@/path/to/your/file.pdf'
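The same upload can be issued from Python. This is a rough sketch using only the standard library; it mirrors the curl call above (form field `file`, endpoint `/extract_tables`, port 8000). The helper names `build_multipart` and `extract_tables` are hypothetical, and since the response schema is not documented here, the decoded JSON is returned without interpretation.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, content, content_type="application/pdf"):
    """Encode one file as multipart/form-data; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract_tables(pdf_path, base_url="http://localhost:8000"):
    """POST a PDF to /extract_tables and return the decoded JSON response."""
    path = Path(pdf_path)
    body, header = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        f"{base_url}/extract_tables",
        data=body,
        headers={"Content-Type": header, "Accept": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        # The response structure is not specified in the README, so return it as-is.
        return json.load(response)
```

With the containers running, `extract_tables("/path/to/your/file.pdf")` would send the request. If the `requests` package (listed among the dependencies) is available, passing `files={"file": open(path, "rb")}` to `requests.post` is a shorter alternative.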
Retrieve Extraction Result by ID
bash
curl -X 'GET' \
'http://localhost:8000/result/{result_id}' \
-H 'accept: application/json'
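A Python equivalent of the lookup above might look like the following sketch. The function names are hypothetical, and treating HTTP 404 as "no such result" is an assumption; the README does not say how the service responds to an unknown ID.

```python
import json
import urllib.error
import urllib.parse
import urllib.request


def result_url(result_id, base_url="http://localhost:8000"):
    """Build the /result/{result_id} URL, percent-encoding the identifier."""
    encoded = urllib.parse.quote(str(result_id), safe="")
    return f"{base_url.rstrip('/')}/result/{encoded}"


def get_result(result_id, base_url="http://localhost:8000"):
    """GET a stored extraction result; return None if the service replies 404."""
    request = urllib.request.Request(
        result_url(result_id, base_url),
        headers={"Accept": "application/json"},
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code == 404:
            # Assumption: an unknown result ID is reported as 404 Not Found.
            return None
        raise
```

The `result_id` value would come from an earlier extraction response; which field carries it is not documented in this section.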
Development
Install dependencies:
bash
pip install -r requirements.txt
Run the development server:
bash
uvicorn app.api:app --reload
Technology Stack
Owner
- Name: Donald Johnson
- Login: copyleftdev
- Kind: user
- Location: Los Angeles
- Repositories: 39
- Profile: https://github.com/copyleftdev
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Don Johnson | dj@c****o | 2 |
| L337[69988bc5]SIGMA | d****n@c****o | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- python 3.12-slim build
- redis alpine
- rediscommander/redis-commander latest
- deepdiff *
- fastapi *
- fpdf2 *
- install *
- pandas *
- pdfplumber *
- pymupdf *
- pytest *
- python-jose *
- redis *
- reportlab *
- requests *
- slowapi *
- uvicorn *
- annotated-types ==0.7.0
- anyio ==4.4.0
- certifi ==2024.6.2
- cffi ==1.16.0
- chardet ==5.2.0
- charset-normalizer ==3.3.2
- click ==8.1.7
- colorama ==0.4.6
- cryptography ==42.0.8
- deepdiff ==7.0.1
- defusedxml ==0.7.1
- deprecated ==1.2.14
- dnspython ==2.6.1
- ecdsa ==0.19.0
- email-validator ==2.1.1
- fastapi ==0.111.0
- fastapi-cli ==0.0.4
- fonttools ==4.53.0
- fpdf2 ==2.7.9
- h11 ==0.14.0
- httpcore ==1.0.5
- httptools ==0.6.1
- httpx ==0.27.0
- idna ==3.7
- importlib-resources ==6.4.0
- iniconfig ==2.0.0
- install ==1.3.5
- itsdangerous ==2.2.0
- jinja2 ==3.1.4
- limits ==3.12.0
- markdown-it-py ==3.0.0
- markupsafe ==2.1.5
- mdurl ==0.1.2
- numpy ==1.26.4
- ordered-set ==4.1.0
- orjson ==3.10.5
- packaging ==24.1
- pandas ==2.2.2
- pdfminer.six ==20231228
- pdfplumber ==0.11.1
- pillow ==10.3.0
- pluggy ==1.5.0
- pyasn1 ==0.6.0
- pycparser ==2.22
- pydantic ==2.7.4
- pydantic-core ==2.18.4
- pydantic-extra-types ==2.8.1
- pydantic-settings ==2.3.3
- pygments ==2.18.0
- pymupdf ==1.24.5
- pymupdfb ==1.24.3
- pypdfium2 ==4.30.0
- pytest ==8.2.2
- python-dateutil ==2.9.0.post0
- python-dotenv ==1.0.1
- python-jose ==3.3.0
- python-multipart ==0.0.9
- pytz ==2024.1
- pyyaml ==6.0.1
- redis ==5.0.6
- reportlab ==4.2.0
- requests ==2.32.3
- rich ==13.7.1
- rsa ==4.9
- shellingham ==1.5.4
- six ==1.16.0
- slowapi ==0.1.9
- sniffio ==1.3.1
- starlette ==0.37.2
- typer ==0.12.3
- typing-extensions ==4.12.2
- tzdata ==2024.1
- ujson ==5.10.0
- urllib3 ==2.2.1
- uvicorn ==0.30.1
- watchfiles ==0.22.0
- websockets ==12.0
- wrapt ==1.16.0
- fastapi *
- pandas *
- pdfplumber *
- python-jose *
- redis *
- slowapi *
- uvicorn *