pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 7 of 265 committers (2.6%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.3%) to scientific vocabulary
Repository
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Basic Info
- Host: GitHub
- Owner: py-pdf
- License: other
- Language: Python
- Default Branch: main
- Homepage: https://pypdf.readthedocs.io/en/latest/
- Size: 23.3 MB
Statistics
- Stars: 9,391
- Watchers: 148
- Forks: 1,493
- Open Issues: 127
- Releases: 112
Topics
Metadata Files
README.md
pypdf
pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.
See pdfly for a CLI application that uses pypdf to interact with PDFs.
Installation
Install pypdf using pip:
pip install pypdf
For using pypdf with AES encryption or decryption, install extra dependencies:
pip install pypdf[crypto]
NOTE:
pypdf 3.1.0 and above include significant improvements compared to previous versions. Please refer to the migration guide for more information.
Usage
```python
from pypdf import PdfReader

reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
```
pypdf can do a lot more, e.g. splitting, merging, reading and creating annotations, decrypting and encrypting. Check out the documentation for additional usage examples!
For questions and answers, visit StackOverflow (tagged with pypdf).
Contributions
Maintaining pypdf is a collaborative effort. You can support the project by writing documentation, helping to narrow down issues, and submitting code. See the CONTRIBUTING.md file for more information.
Q&A
The experience pypdf users have covers the whole range from beginner to expert. You can contribute to the pypdf community by answering questions on StackOverflow, helping in discussions, and asking users who report issues for MCVEs (Code + example PDF!).
Issues
A good bug ticket includes an MCVE (a minimal complete verifiable example). For pypdf, this means that you must upload a PDF that causes the bug to occur, as well as the code you're executing with all of the output. Use `print(pypdf.__version__)` to tell us which version you're using.
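If you want to attach a little more context than the version number alone, something like the following sketch can collect it. It uses only the standard library. The function name `environment_report` and the choice of extra packages to list (`pillow`, `cryptography`, `pycryptodome`) are assumptions made for this example, not a format the project requires.

```python
import platform
from importlib import metadata


def environment_report(packages=("pypdf", "pillow", "cryptography", "pycryptodome")):
    """Return a short text block with Python, platform, and package versions."""
    lines = [
        f"Python: {platform.python_version()}",
        f"Platform: {platform.platform()}",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Optional dependencies are often absent; report that explicitly.
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


print(environment_report())
```

Paste the printed block into the ticket next to your code and the PDF that triggers the bug.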
Code
All code contributions are welcome, but smaller ones have a better chance of being included in a timely manner. Adding unit tests for new features or test cases for bugs you've fixed helps us ensure that the Pull Request (PR) is fine.
pypdf includes a test suite which can be executed with pytest:
```bash
$ pytest
===================== test session starts =====================
platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/moose/GitHub/Martin/pypdf
plugins: cov-3.0.0
collected 233 items

tests/test_basic_features.py .. [ 0%]
tests/test_constants.py . [ 1%]
tests/test_filters.py .................x..... [ 11%]
tests/test_generic.py ................................. [ 25%]
............. [ 30%]
tests/test_javascript.py .. [ 31%]
tests/test_merger.py . [ 32%]
tests/test_page.py ......................... [ 42%]
tests/test_pagerange.py ................ [ 49%]
tests/test_papersizes.py .................. [ 57%]
tests/test_reader.py .................................. [ 72%]
............... [ 78%]
tests/test_utils.py .................... [ 87%]
tests/test_workflows.py .......... [ 91%]
tests/test_writer.py ................. [ 98%]
tests/test_xmp.py ... [100%]

========== 232 passed, 1 xfailed, 1 warning in 4.52s ==========
```
Owner
- Name: py-pdf
- Login: py-pdf
- Kind: organization
- Email: info@martin-thoma.de
- Twitter: py_pdf
- Repositories: 11
- Profile: https://github.com/py-pdf
The py-pdf organization maintains Python packages that deal with the PDF file format
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martin Thoma | i****o@m****e | 719 |
| pubpub-zz | 4****z | 238 |
| j-t-1 | 1****1 | 203 |
| Stefan | 9****6 | 146 |
| Matthew Stamy | m****4@g****m | 96 |
| Noah Kessler | n****2@g****m | 38 |
| Matthew Peveler | m****r@g****m | 33 |
| exiledkingcc | e****c@g****m | 17 |
| switham | g****b@m****m | 13 |
| dependabot[bot] | 4****] | 10 |
| mozbugbox | m****x@y****u | 10 |
| Sylvain Pelissier | s****r@g****m | 8 |
| Pierre-Alain Mignot | me@p****g | 7 |
| dkg | d****g@f****t | 7 |
| mtd91429 | m****9 | 7 |
| speedplane | m****5@c****u | 7 |
| Ryo Kamei | 4****i | 6 |
| Rob Oakes | r****s@g****m | 6 |
| Harry Karvonen | h****n@g****m | 5 |
| Noah Jackowitz | n****z@u****l | 5 |
| Kushal Kumaran | k****b@g****m | 4 |
| marcstober | m****r@g****m | 4 |
| Henry Keiter | h****r@g****m | 3 |
| Christian Clauss | c****s@m****m | 3 |
| TWAC | | 3 |
| Lucas Cimon | 9****C | 3 |
| Maxim Kamenkov | m****v@g****m | 3 |
| Moshe Kaplan | m****n@g****m | 3 |
| Rob1080 | r****s@g****m | 3 |
| Sascha Rogmann | 5****n | 3 |
| and 235 more... | ||
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 651
- Total pull requests: 1,346
- Average time to close issues: 6 months
- Average time to close pull requests: 9 days
- Total issue authors: 430
- Total pull request authors: 140
- Average comments per issue: 4.0
- Average comments per pull request: 2.65
- Merged pull requests: 1,061
- Bot issues: 0
- Bot pull requests: 23
Past Year
- Issues: 177
- Pull requests: 659
- Average time to close issues: 8 days
- Average time to close pull requests: 4 days
- Issue authors: 106
- Pull request authors: 53
- Average comments per issue: 1.82
- Average comments per pull request: 1.91
- Merged pull requests: 512
- Bot issues: 0
- Bot pull requests: 8
Top Authors
Issue Authors
- stefan6419846 (77)
- MartinThoma (37)
- pubpub-zz (15)
- j-t-1 (10)
- michelcrypt4d4mus (10)
- Avgor46 (7)
- neeraj9 (6)
- Piloudev (4)
- kitterma (4)
- KanorUbu (4)
- larsga (4)
- mnmtz (4)
- thusharagokulnath (4)
- kaos-ocs (3)
- hackowitz-af (3)
Pull Request Authors
- j-t-1 (433)
- stefan6419846 (231)
- pubpub-zz (209)
- MartinThoma (131)
- dependabot[bot] (23)
- exiledkingcc (13)
- hackowitz-af (12)
- ssjkamei (11)
- syanng (10)
- m32 (10)
- rsinger417 (8)
- larsga (7)
- henningkoertelgmg (6)
- shartzog (5)
- Lucas-C (5)
Packages
- Total packages: 19
- Total downloads:
  - pypi: 17,212,658 last-month
- Total docker downloads: 1,138,342,757
- Total dependent packages: 397 (may contain duplicates)
- Total dependent repositories: 3,809 (may contain duplicates)
- Total versions: 152
- Total maintainers: 3
- Total advisories: 3
pypi.org: pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files
- Documentation: https://pypdf.readthedocs.io/en/latest/
- License: other
- Latest release: 6.0.0 (published 6 months ago)
Rankings
Maintainers (2)
Advisories (3)
alpine-v3.18: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.8.1-r0 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-v3.18: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.8.1-r0 (published almost 3 years ago)
Rankings
Maintainers (1)
alpine-edge: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 6.0.0-r0 (published 6 months ago)
Rankings
Maintainers (1)
alpine-edge: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 6.0.0-r0 (published 6 months ago)
Rankings
Maintainers (1)
proxy.golang.org: github.com/py-pdf/pypdf
- Documentation: https://pkg.go.dev/github.com/py-pdf/pypdf#section-documentation
- License: other
- Latest release: v1.25.1 (published over 10 years ago)
Rankings
spack.io: py-pypdf
A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files
- Homepage: https://github.com/py-pdf/pypdf
- License: []
- Latest release: 4.3.1 (published over 1 year ago)
Rankings
anaconda.org: pypdf-with-image
pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.
- Homepage: https://pypi.org/project/pypdf
- License: BSD-3-Clause
- Latest release: 4.2.0 (published almost 2 years ago)
Rankings
anaconda.org: pypdf-with-full
pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.
- Homepage: https://pypi.org/project/pypdf
- License: BSD-3-Clause
- Latest release: 4.2.0 (published almost 2 years ago)
Rankings
anaconda.org: pypdf
pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.
- Homepage: https://pypi.org/project/pypdf
- License: BSD-3-Clause
- Latest release: 4.2.0 (published almost 2 years ago)
Rankings
anaconda.org: pypdf-with-crypto
pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.
- Homepage: https://pypi.org/project/pypdf
- License: BSD-3-Clause
- Latest release: 4.2.0 (published almost 2 years ago)
Rankings
alpine-v3.19: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.1-r0 (published over 2 years ago)
Rankings
Maintainers (1)
alpine-v3.20: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.4-r1 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.20: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.4-r1 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.19: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.1-r0 (published over 2 years ago)
Rankings
Maintainers (1)
alpine-v3.21: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.4-r1 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.22: py3-pypdf
Pure-Python library built as a PDF toolkit
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 5.3.1-r0 (published 12 months ago)
Rankings
Maintainers (1)
alpine-v3.21: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 3.17.4-r1 (published almost 2 years ago)
Rankings
Maintainers (1)
alpine-v3.22: py3-pypdf-pyc
Precompiled Python bytecode for py3-pypdf
- Homepage: https://github.com/py-pdf/pypdf
- License: BSD-3-Clause
- Latest release: 5.3.1-r0 (published 12 months ago)
Rankings
Maintainers (1)
Dependencies
- coverage *
- flake8 *
- flake8-bugbear *
- flake8-print *
- flake8_implicit_str_concat *
- mypy *
- pillow *
- pycryptodome *
- pytest *
- pytest-benchmark *
- typeguard *
- types-Pillow *
- attrs ==20.3.0
- coverage ==6.2
- flake8 ==5.0.4
- flake8-bugbear ==22.7.1
- flake8-implicit-str-concat ==0.2.0
- flake8-print ==4.0.1
- importlib-metadata ==4.2.0
- iniconfig ==1.1.1
- mccabe ==0.7.0
- more-itertools ==8.13.0
- mypy ==0.971
- mypy-extensions ==0.4.3
- packaging ==21.3
- pillow ==8.4.0
- pluggy ==1.0.0
- py ==1.11.0
- py-cpuinfo ==8.0.0
- pycodestyle ==2.9.1
- pycryptodome ==3.15.0
- pyflakes ==2.5.0
- pyparsing ==3.0.9
- pytest ==7.0.1
- pytest-benchmark ==3.4.1
- six ==1.16.0
- tomli ==1.2.3
- typed-ast ==1.5.4
- typeguard ==2.13.3
- types-pillow ==9.2.1
- typing-extensions ==4.1.1
- zipp ==3.6.0
- black * development
- pip-tools * development
- pre-commit <2.18.0 development
- pytest-cov * development
- twine * development
- wheel * development
- attrs ==21.4.0 development
- black ==22.6.0 development
- bleach ==4.1.0 development
- certifi ==2022.6.15 development
- cffi ==1.15.1 development
- cfgv ==3.3.1 development
- charset-normalizer ==2.0.12 development
- click ==8.0.4 development
- colorama ==0.4.5 development
- coverage ==6.2 development
- cryptography ==37.0.4 development
- dataclasses ==0.8 development
- distlib ==0.3.5 development
- docutils ==0.18.1 development
- filelock ==3.4.1 development
- identify ==2.4.4 development
- idna ==3.3 development
- importlib-metadata ==4.8.3 development
- importlib-resources ==5.2.3 development
- iniconfig ==1.1.1 development
- jeepney ==0.7.1 development
- keyring ==23.4.1 development
- mypy-extensions ==0.4.3 development
- nodeenv ==1.6.0 development
- packaging ==21.3 development
- pathspec ==0.9.0 development
- pep517 ==0.12.0 development
- pip-tools ==6.4.0 development
- pkginfo ==1.8.3 development
- platformdirs ==2.4.0 development
- pluggy ==1.0.0 development
- pre-commit ==2.17.0 development
- py ==1.11.0 development
- pycparser ==2.21 development
- pygments ==2.12.0 development
- pyparsing ==3.0.9 development
- pytest ==7.0.1 development
- pytest-cov ==3.0.0 development
- pyyaml ==6.0 development
- readme-renderer ==34.0 development
- requests ==2.27.1 development
- requests-toolbelt ==0.9.1 development
- rfc3986 ==1.5.0 development
- secretstorage ==3.3.2 development
- six ==1.16.0 development
- toml ==0.10.2 development
- tomli ==1.2.3 development
- tqdm ==4.64.0 development
- twine ==3.8.0 development
- typed-ast ==1.5.4 development
- typing-extensions ==4.1.1 development
- urllib3 ==1.26.10 development
- virtualenv ==20.15.1 development
- webencodings ==0.5.1 development
- wheel ==0.37.1 development
- zipp ==3.6.0 development
- myst_parser *
- sphinx *
- sphinx_rtd_theme *
- alabaster ==0.7.12
- attrs ==21.4.0
- babel ==2.10.3
- certifi ==2022.6.15
- charset-normalizer ==2.0.12
- docutils ==0.17.1
- idna ==3.3
- imagesize ==1.4.1
- importlib-metadata ==4.8.3
- jinja2 ==3.0.3
- markdown-it-py ==2.0.1
- markupsafe ==2.0.1
- mdit-py-plugins ==0.3.0
- mdurl ==0.1.0
- myst-parser ==0.16.1
- packaging ==21.3
- pygments ==2.12.0
- pyparsing ==3.0.9
- pytz ==2022.1
- pyyaml ==6.0
- requests ==2.27.1
- snowballstemmer ==2.2.0
- sphinx ==4.5.0
- sphinx-rtd-theme ==1.0.0
- sphinxcontrib-applehelp ==1.0.2
- sphinxcontrib-devhelp ==1.0.2
- sphinxcontrib-htmlhelp ==2.0.0
- sphinxcontrib-jsmath ==1.0.1
- sphinxcontrib-qthelp ==1.0.3
- sphinxcontrib-serializinghtml ==1.1.5
- typing-extensions ==4.1.1
- urllib3 ==1.26.10
- zipp ==3.6.0
- actions/checkout v3 composite
- actions/setup-python v3 composite
- benchmark-action/github-action-benchmark v1 composite
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- codecov/codecov-action v2 composite
- softprops/action-gh-release v1 composite
- attrs ==22.1.0
- coverage ==6.5.0
- flake8 ==5.0.4
- flake8-bugbear ==22.10.27
- flake8-implicit-str-concat ==0.3.0
- flake8-print ==5.0.0
- iniconfig ==1.1.1
- mccabe ==0.7.0
- more-itertools ==8.14.0
- mypy ==0.982
- mypy-extensions ==0.4.3
- packaging ==21.3
- pillow ==9.3.0
- pluggy ==1.0.0
- py-cpuinfo ==9.0.0
- pycodestyle ==2.9.1
- pycryptodome ==3.15.0
- pyflakes ==2.5.0
- pyparsing ==3.0.9
- pytest ==7.2.0
- pytest-benchmark ==4.0.0
- typeguard ==2.13.3
- types-dataclasses ==0.6.6
- types-pillow ==9.2.2.2
- typing-extensions ==4.4.0
- dataclasses python_version < '3.7'
- typing_extensions >= 3.10.0.0; python_version < '3.10'
- actions/checkout v4 composite
- actions/setup-python v4 composite
- softprops/action-gh-release v1 composite