Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (7.8%) to scientific vocabulary
Repository
Simple PDF text extraction
Basic Info
Statistics
- Stars: 922
- Watchers: 16
- Forks: 102
- Open Issues: 15
- Releases: 15
Metadata Files
README.md
pdftotext
Simple PDF text extraction
```python
import pdftotext

# Load your PDF
with open("lorem_ipsum.pdf", "rb") as f:
    pdf = pdftotext.PDF(f)

# If it's password-protected
with open("secure.pdf", "rb") as f:
    pdf = pdftotext.PDF(f, "secret")

# How many pages?
print(len(pdf))

# Iterate over all the pages
for page in pdf:
    print(page)

# Read some individual pages
print(pdf[0])
print(pdf[1])

# Read all the text into one string
print("\n\n".join(pdf))
```
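The README example treats a `pdftotext.PDF` object as a sequence of page strings: it supports `len()`, indexing, and iteration. A helper written against that interface should therefore work on a plain list too. This is a minimal sketch only: `find_pages` is a hypothetical helper, not part of the pdftotext API, and a list of strings stands in for a loaded PDF so the example runs without Poppler installed.

```python
from typing import Iterable, List


def find_pages(pages: Iterable[str], term: str) -> List[int]:
    """Return the 0-based indexes of pages whose text contains `term`.

    Hypothetical helper: it accepts any iterable of page strings, such as
    a list or (assumed here) a pdftotext.PDF object.
    """
    needle = term.casefold()  # case-insensitive comparison
    return [i for i, text in enumerate(pages) if needle in text.casefold()]


if __name__ == "__main__":
    # Stand-in for a loaded PDF: one string per page.
    pages = ["Lorem ipsum dolor", "sit amet", "LOREM again"]
    print(find_pages(pages, "lorem"))  # → [0, 2]
```

With pdftotext installed, the same call would be `find_pages(pdf, "lorem")` after loading `pdf` as in the README.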
OS Dependencies
These instructions assume you're on a recent OS. Package names may differ for an older OS.
Debian, Ubuntu, and friends
sudo apt install build-essential libpoppler-cpp-dev pkg-config python3-dev
Fedora, Red Hat, and friends
sudo yum install gcc-c++ pkgconfig poppler-cpp-devel python3-devel
macOS
brew install pkg-config poppler python
Windows
Currently tested only when using conda:
- Install the Microsoft Visual C++ Build Tools
- Install poppler through conda:
conda install -c conda-forge poppler
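The Linux and macOS commands above install `pkg-config` alongside Poppler's C++ bindings. Before running `pip install`, one way to confirm the bindings are discoverable is to ask `pkg-config` for the version of the `poppler-cpp` module. This is a minimal sketch, assuming the bindings register a `poppler-cpp` pkg-config module (as `libpoppler-cpp-dev` and `poppler-cpp-devel` do); `poppler_cpp_version` is a hypothetical helper, not part of pdftotext's build. A conda setup on Windows may not provide `pkg-config` at all, in which case the check just reports nothing found.

```python
import shutil
import subprocess
from typing import Optional


def poppler_cpp_version() -> Optional[str]:
    """Return the poppler-cpp version reported by pkg-config, or None.

    None means pkg-config is not on PATH, or it cannot find the
    poppler-cpp module (for example, the dev/devel package is missing).
    """
    if shutil.which("pkg-config") is None:
        return None
    result = subprocess.run(
        ["pkg-config", "--modversion", "poppler-cpp"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if __name__ == "__main__":
    version = poppler_cpp_version()
    if version is None:
        print("poppler-cpp not found via pkg-config")
    else:
        print(f"poppler-cpp {version}")
```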
Install
pip install pdftotext
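After installing, a small wrapper can check for the module and read a whole file into one string, following the README's `"\n\n".join(pdf)` pattern. A sketch only: `pdftotext_available` and `extract_text` are hypothetical helpers, not part of the library. The lazy import keeps the module loadable even when pdftotext is missing, and the password is passed positionally exactly as in the README example.

```python
import importlib.util
from typing import Optional


def pdftotext_available() -> bool:
    """Return True if the pdftotext module can be found on the import path."""
    # find_spec locates a top-level module without importing it, so a
    # missing Poppler shared library would only surface on the real import.
    return importlib.util.find_spec("pdftotext") is not None


def extract_text(path: str, password: Optional[str] = None) -> str:
    """Return the text of every page in `path`, separated by blank lines."""
    import pdftotext  # imported lazily; raises ImportError if not installed

    with open(path, "rb") as f:
        if password is None:
            pdf = pdftotext.PDF(f)
        else:
            pdf = pdftotext.PDF(f, password)
    return "\n\n".join(pdf)


if __name__ == "__main__":
    print("pdftotext installed" if pdftotext_available() else "pdftotext missing")
```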
Owner
- Name: Jason Alan Palmer
- Login: jalan
- Kind: user
- Repositories: 11
- Profile: https://github.com/jalan
Born under a bad sign
GitHub Events
Total
- Create event: 2
- Release event: 1
- Issues event: 4
- Watch event: 73
- Delete event: 2
- Issue comment event: 14
- Push event: 9
- Fork event: 4
Last Year
- Create event: 2
- Release event: 1
- Issues event: 4
- Watch event: 73
- Delete event: 2
- Issue comment event: 14
- Push event: 9
- Fork event: 4
Committers
Last synced: over 2 years ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| jalan | j****r@g****m | 136 |
| William Sackfield | w****d@g****m | 2 |
| Jason Woods | j****s@d****e | 1 |
| Karthikeyan Singaravelan | t****i@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 102
- Total pull requests: 13
- Average time to close issues: 4 months
- Average time to close pull requests: 6 months
- Total issue authors: 93
- Total pull request authors: 12
- Average comments per issue: 3.93
- Average comments per pull request: 4.38
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 4
- Pull request authors: 0
- Average comments per issue: 1.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jalan (7)
- ripspin5 (2)
- eladbitton (2)
- stefan6419846 (2)
- Mubarak0888 (2)
- svaidyans (1)
- mdaeron (1)
- MartinThoma (1)
- lohithn4 (1)
- fladi (1)
- caseydm (1)
- mrooding (1)
- MiaHuang97 (1)
- A-Gulati (1)
- sheikgit (1)
Pull Request Authors
- mstackhouse (2)
- smancill (1)
- Shorotshishir (1)
- bauerj (1)
- jalan (1)
- owen9825 (1)
- asif-mahmud (1)
- 8W9aG (1)
- tirkarthi (1)
- sunn-e (1)
- woodsjs (1)
- wileykestner (1)
Packages
- Total packages: 2
- Total downloads:
  - pypi: 40,052 last month
- Total docker downloads: 1,348
- Total dependent packages: 17 (may contain duplicates)
- Total dependent repositories: 258 (may contain duplicates)
- Total versions: 21
- Total maintainers: 1
pypi.org: pdftotext
Simple PDF text extraction
- Homepage: https://github.com/jalan/pdftotext
- Documentation: https://pdftotext.readthedocs.io/
- License: MIT
- Latest release: 3.0.0 (published about 1 year ago)
Maintainers (1)
conda-forge.org: pdftotext
Simple PDF text extraction
- Homepage: https://github.com/jalan/pdftotext
- License: MIT
- Latest release: 2.2.2 (published over 4 years ago)
Dependencies
- actions/checkout v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- conda-incubator/setup-miniconda v2 composite