pdftotext

Simple PDF text extraction

https://github.com/jalan/pdftotext

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.8%) to scientific vocabulary

Keywords

pdf python
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: jalan
  • License: MIT
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 227 KB
Statistics
  • Stars: 922
  • Watchers: 16
  • Forks: 102
  • Open Issues: 15
  • Releases: 15
Created almost 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License

README.md

pdftotext

Simple PDF text extraction

```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
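
Because the PDF object can be iterated page by page, ordinary Python tools work on it directly. The sketch below shows one way to find which pages mention a term. The helper `pages_containing` is hypothetical and not part of pdftotext, and a plain list of strings stands in for a loaded PDF so the example runs without a file.

```python
def pages_containing(pages, term):
    """Return zero-based indices of pages whose text contains term.

    `pages` is any iterable of page strings, for example a pdftotext.PDF
    object. This helper is illustrative and not part of pdftotext.
    The match is case-insensitive.
    """
    needle = term.lower()
    return [i for i, text in enumerate(pages) if needle in text.lower()]


# A plain list stands in for a loaded PDF (assumption for the demo).
sample_pages = [
    "Lorem ipsum dolor sit amet",
    "consectetur adipiscing elit",
    "Ipsum again",
]
hits = pages_containing(sample_pages, "ipsum")
print(hits)  # [0, 2]
```

With a real document, passing `pdf` in place of `sample_pages` should behave the same way, since the helper only relies on iteration.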

OS Dependencies

These instructions assume you're on a recent OS. Package names may differ for an older OS.

Debian, Ubuntu, and friends

sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev

Fedora, Red Hat, and friends

sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel

macOS

brew install pkg-config poppler python

Windows

Currently tested only when using conda:

  • Install the Microsoft Visual C++ Build Tools
  • Install poppler through conda: conda install -c conda-forge poppler

Install

pip install pdftotext
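
After installing, it can help to confirm that Python can locate the module before running real extraction code. The following is a minimal sketch using only the standard library; the helper name `is_installed` is an assumption made for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("pdftotext"):
    print("pdftotext is available")
else:
    print("pdftotext not found; install the OS dependencies, then run: pip install pdftotext")
```

This only checks that the module can be found. A build that fails to link against poppler would surface at import time instead.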

Owner

  • Name: Jason Alan Palmer
  • Login: jalan
  • Kind: user

Born under a bad sign

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 4
  • Watch event: 73
  • Delete event: 2
  • Issue comment event: 14
  • Push event: 9
  • Fork event: 4
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 4
  • Watch event: 73
  • Delete event: 2
  • Issue comment event: 14
  • Push event: 9
  • Fork event: 4

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 140
  • Total Committers: 4
  • Avg Commits per committer: 35.0
  • Development Distribution Score (DDS): 0.029
Past Year
  • Commits: 4
  • Committers: 1
  • Avg Commits per committer: 4.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|---|---|---|
| jalan | j****r@g****m | 136 |
| William Sackfield | w****d@g****m | 2 |
| Jason Woods | j****s@d****e | 1 |
| Karthikeyan Singaravelan | t****i@g****m | 1 |
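
The all-time figures above can be reproduced from the per-committer commit counts. The sketch below assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is an assumption, not stated on this page, though it matches the reported values.

```python
def development_distribution_score(commit_counts):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1 - max(commit_counts) / total


# Commit counts from the Top Committers table.
all_time = [136, 2, 1, 1]

dds = development_distribution_score(all_time)
average = sum(all_time) / len(all_time)

print(round(dds, 3))  # 0.029
print(average)        # 35.0
```

Under the same assumption, a single committer with 4 commits in the past year gives a score of 0.0, in line with the Past Year figure.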

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 102
  • Total pull requests: 13
  • Average time to close issues: 4 months
  • Average time to close pull requests: 6 months
  • Total issue authors: 93
  • Total pull request authors: 12
  • Average comments per issue: 3.93
  • Average comments per pull request: 4.38
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jalan (7)
  • ripspin5 (2)
  • eladbitton (2)
  • stefan6419846 (2)
  • Mubarak0888 (2)
  • svaidyans (1)
  • mdaeron (1)
  • MartinThoma (1)
  • lohithn4 (1)
  • fladi (1)
  • caseydm (1)
  • mrooding (1)
  • MiaHuang97 (1)
  • A-Gulati (1)
  • sheikgit (1)
Pull Request Authors
  • mstackhouse (2)
  • smancill (1)
  • Shorotshishir (1)
  • bauerj (1)
  • jalan (1)
  • owen9825 (1)
  • asif-mahmud (1)
  • 8W9aG (1)
  • tirkarthi (1)
  • sunn-e (1)
  • woodsjs (1)
  • wileykestner (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 40,052 last-month
  • Total docker downloads: 1,348
  • Total dependent packages: 17
    (may contain duplicates)
  • Total dependent repositories: 258
    (may contain duplicates)
  • Total versions: 21
  • Total maintainers: 1
pypi.org: pdftotext

Simple PDF text extraction

  • Versions: 16
  • Dependent Packages: 17
  • Dependent Repositories: 257
  • Downloads: 40,052 Last month
  • Docker Downloads: 1,348
Rankings
Dependent packages count: 0.7%
Dependent repos count: 1.0%
Downloads: 1.2%
Average: 2.0%
Docker downloads count: 2.2%
Stargazers count: 2.3%
Forks count: 4.6%
Maintainers (1)
Last synced: about 1 year ago
conda-forge.org: pdftotext

Simple PDF text extraction

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 1
Rankings
Stargazers count: 15.2%
Forks count: 18.6%
Dependent repos count: 24.1%
Average: 27.3%
Dependent packages count: 51.5%
Last synced: 7 months ago

Dependencies

.github/workflows/format.yml actions
  • actions/checkout v3 composite
.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • conda-incubator/setup-miniconda v2 composite