pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files

https://github.com/py-pdf/pypdf

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    7 of 265 committers (2.6%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary

Keywords

help-wanted pdf pdf-documents pdf-manipulation pdf-parser pdf-parsing pypdf2 python

Keywords from Contributors

templates apps tensor views transformation closember test-data-generator test-data faker-generator faker
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 9,391
  • Watchers: 148
  • Forks: 1,493
  • Open Issues: 127
  • Releases: 112
Topics
help-wanted pdf pdf-documents pdf-manipulation pdf-parser pdf-parsing pypdf2 python
Created about 14 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Security

README.md


pypdf

pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

See pdfly for a CLI application that uses pypdf to interact with PDFs.

Installation

Install pypdf using pip:

pip install pypdf

To use pypdf with AES encryption or decryption, install the extra dependencies:

pip install pypdf[crypto]

NOTE: pypdf 3.1.0 and above include significant improvements compared to previous versions. Please refer to the migration guide for more information.

Usage

```python
from pypdf import PdfReader

# Open an existing PDF file
reader = PdfReader("example.pdf")
number_of_pages = len(reader.pages)

# Extract the text of the first page
page = reader.pages[0]
text = page.extract_text()
```

pypdf can do a lot more, e.g. splitting, merging, reading and creating annotations, decrypting and encrypting. Check out the documentation for additional usage examples!

For questions and answers, visit StackOverflow (tagged with pypdf).

Contributions

Maintaining pypdf is a collaborative effort. You can support the project by writing documentation, helping to narrow down issues, and submitting code. See the CONTRIBUTING.md file for more information.

Q&A

The experience pypdf users have covers the whole range from beginner to expert. You can contribute to the pypdf community by answering questions on StackOverflow, helping in discussions, and asking users who report issues for MCVEs (code + example PDF!).

Issues

A good bug ticket includes an MCVE: a minimal, complete, verifiable example. For pypdf, this means that you must upload a PDF that triggers the bug, together with the code you are running and its complete output. Use print(pypdf.__version__) to tell us which version you're using.

Code

All code contributions are welcome, but smaller ones have a better chance of being included in a timely manner. Adding unit tests for new features, or test cases for bugs you've fixed, helps us ensure that the Pull Request (PR) is fine.

pypdf includes a test suite which can be executed with pytest:

```bash
$ pytest
===================== test session starts =====================
platform linux -- Python 3.6.15, pytest-7.0.1, pluggy-1.0.0
rootdir: /home/moose/GitHub/Martin/pypdf
plugins: cov-3.0.0
collected 233 items

tests/test_basic_features.py ..                               [  0%]
tests/test_constants.py .                                     [  1%]
tests/test_filters.py .................x.....                 [ 11%]
tests/test_generic.py .................................       [ 25%]
.............                                                 [ 30%]
tests/test_javascript.py ..                                   [ 31%]
tests/test_merger.py .                                        [ 32%]
tests/test_page.py .........................                  [ 42%]
tests/test_pagerange.py ................                      [ 49%]
tests/test_papersizes.py ..................                   [ 57%]
tests/test_reader.py ..................................       [ 72%]
...............                                               [ 78%]
tests/test_utils.py ....................                      [ 87%]
tests/test_workflows.py ..........                            [ 91%]
tests/test_writer.py .................                        [ 98%]
tests/test_xmp.py ...                                         [100%]

========== 232 passed, 1 xfailed, 1 warning in 4.52s ==========
```

Owner

  • Name: py-pdf
  • Login: py-pdf
  • Kind: organization
  • Email: info@martin-thoma.de

The py-pdf organization maintains Python packages that deal with the PDF file format

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 1,899
  • Total Committers: 265
  • Avg Commits per committer: 7.166
  • Development Distribution Score (DDS): 0.621
Past Year
  • Commits: 356
  • Committers: 42
  • Avg Commits per committer: 8.476
  • Development Distribution Score (DDS): 0.534
Top Committers
Name Email Commits
Martin Thoma i****o@m****e 719
pubpub-zz 4****z 238
j-t-1 1****1 203
Stefan 9****6 146
Matthew Stamy m****4@g****m 96
Noah Kessler n****2@g****m 38
Matthew Peveler m****r@g****m 33
exiledkingcc e****c@g****m 17
switham g****b@m****m 13
dependabot[bot] 4****] 10
mozbugbox m****x@y****u 10
Sylvain Pelissier s****r@g****m 8
Pierre-Alain Mignot me@p****g 7
dkg d****g@f****t 7
mtd91429 m****9 7
speedplane m****5@c****u 7
Ryo Kamei 4****i 6
Rob Oakes r****s@g****m 6
Harry Karvonen h****n@g****m 5
Noah Jackowitz n****z@u****l 5
Kushal Kumaran k****b@g****m 4
marcstober m****r@g****m 4
Henry Keiter h****r@g****m 3
Christian Clauss c****s@m****m 3
TWAC 3
Lucas Cimon 9****C 3
Maxim Kamenkov m****v@g****m 3
Moshe Kaplan m****n@g****m 3
Rob1080 r****s@g****m 3
Sascha Rogmann 5****n 3
and 235 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 651
  • Total pull requests: 1,346
  • Average time to close issues: 6 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 430
  • Total pull request authors: 140
  • Average comments per issue: 4.0
  • Average comments per pull request: 2.65
  • Merged pull requests: 1,061
  • Bot issues: 0
  • Bot pull requests: 23
Past Year
  • Issues: 177
  • Pull requests: 659
  • Average time to close issues: 8 days
  • Average time to close pull requests: 4 days
  • Issue authors: 106
  • Pull request authors: 53
  • Average comments per issue: 1.82
  • Average comments per pull request: 1.91
  • Merged pull requests: 512
  • Bot issues: 0
  • Bot pull requests: 8
Top Authors
Issue Authors
  • stefan6419846 (77)
  • MartinThoma (37)
  • pubpub-zz (15)
  • j-t-1 (10)
  • michelcrypt4d4mus (10)
  • Avgor46 (7)
  • neeraj9 (6)
  • Piloudev (4)
  • kitterma (4)
  • KanorUbu (4)
  • larsga (4)
  • mnmtz (4)
  • thusharagokulnath (4)
  • kaos-ocs (3)
  • hackowitz-af (3)
Pull Request Authors
  • j-t-1 (433)
  • stefan6419846 (231)
  • pubpub-zz (209)
  • MartinThoma (131)
  • dependabot[bot] (23)
  • exiledkingcc (13)
  • hackowitz-af (12)
  • ssjkamei (11)
  • syanng (10)
  • m32 (10)
  • rsinger417 (8)
  • larsga (7)
  • henningkoertelgmg (6)
  • shartzog (5)
  • Lucas-C (5)
Top Labels
Issue Labels
is-bug (98) workflow-text-extraction (54) workflow-forms (39) is-feature (37) is-robustness-issue (35) Has MCVE (31) nf-documentation (28) workflow-images (26) needs-pdf (25) PdfWriter (21) PdfReader (17) key-error (17) workflow-annotation (16) whitespace (12) is-cjk-issue (11) is-maintenance (11) nf-security (11) Meta (10) is-regression (10) help wanted (9) is-question (8) needs-example-code (8) needs-discussion (6) Easy (6) workflow-merge (6) PdfMerger (5) nf-performance (5) workflow-encryption (5) breaking-change (5) cannot-reproduce (4)
Pull Request Labels
nf-documentation (34) dependencies (23) needs-test (17) github_actions (16) on-hold (16) soon (15) is-bug (13) nf-security (12) needs-discussion (12) nf-packaging (10) needs-change (10) nf-ci (10) nf-testing (9) is-feature (9) python (7) PdfWriter (5) breaking-change (5) workflow-images (5) help wanted (4) workflow-encryption (4) is-maintenance (4) nf-performance (3) PdfReader (3) is-robustness-issue (3) is-regression (2) needs-pdf (2) workflow-text-extraction (2) workflow-annotation (2) workflow-forms (2) workflow-advanced-text-extraction (2)

Packages

  • Total packages: 19
  • Total downloads:
    • pypi 17,212,658 last-month
  • Total docker downloads: 1,138,342,757
  • Total dependent packages: 397
    (may contain duplicates)
  • Total dependent repositories: 3,809
    (may contain duplicates)
  • Total versions: 152
  • Total maintainers: 3
  • Total advisories: 3
pypi.org: pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files

  • Versions: 75
  • Dependent Packages: 390
  • Dependent Repositories: 3,809
  • Downloads: 17,212,658 Last month
  • Docker Downloads: 1,138,342,757
Rankings
Dependent repos count: 0.2%
Downloads: 0.2%
Dependent packages count: 0.2%
Stargazers count: 0.4%
Docker downloads count: 0.4%
Average: 0.4%
Forks count: 1.2%
Maintainers (2)
Last synced: 6 months ago
alpine-v3.18: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 1.1%
Forks count: 1.5%
Stargazers count: 2.7%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.18: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 1
  • Dependent Packages: 2
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 1.1%
Forks count: 1.5%
Stargazers count: 2.7%
Maintainers (1)
Last synced: 6 months ago
alpine-edge: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 27
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Forks count: 1.6%
Stargazers count: 2.9%
Average: 4.7%
Dependent packages count: 14.3%
Maintainers (1)
Last synced: 6 months ago
alpine-edge: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 34
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Forks count: 1.5%
Stargazers count: 2.7%
Average: 4.7%
Dependent packages count: 14.6%
Maintainers (1)
Last synced: 6 months ago
proxy.golang.org: github.com/py-pdf/pypdf
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 0.7%
Stargazers count: 0.8%
Average: 5.5%
Dependent packages count: 9.6%
Dependent repos count: 10.8%
Last synced: 6 months ago
spack.io: py-pypdf

A pure-python PDF library capable of splitting, merging, cropping, and transforming PDF files

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Average: 28.7%
Dependent packages count: 57.5%
Last synced: 6 months ago
anaconda.org: pypdf-with-image

pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Stargazers count: 9.4%
Forks count: 9.7%
Average: 31.3%
Dependent packages count: 48.8%
Dependent repos count: 57.4%
Last synced: 6 months ago
anaconda.org: pypdf-with-full

pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 49.1%
Average: 53.4%
Dependent repos count: 57.7%
Last synced: 6 months ago
anaconda.org: pypdf

pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

  • Versions: 1
  • Dependent Packages: 3
  • Dependent Repositories: 0
Rankings
Dependent packages count: 49.1%
Average: 53.4%
Dependent repos count: 57.7%
Last synced: 6 months ago
anaconda.org: pypdf-with-crypto

pypdf is a free and open-source pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. It can also add custom data, viewing options, and passwords to PDF files. pypdf can retrieve text and metadata from PDFs as well.

  • Versions: 1
  • Dependent Packages: 1
  • Dependent Repositories: 0
Rankings
Dependent packages count: 49.1%
Average: 53.4%
Dependent repos count: 57.7%
Last synced: 6 months ago
alpine-v3.19: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.20: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.20: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.19: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.21: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.22: py3-pypdf

Pure-Python library built as a PDF toolkit

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.21: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago
alpine-v3.22: py3-pypdf-pyc

Precompiled Python bytecode for py3-pypdf

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 100%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements/ci.in pypi
  • coverage *
  • flake8 *
  • flake8-bugbear *
  • flake8-print *
  • flake8_implicit_str_concat *
  • mypy *
  • pillow *
  • pycryptodome *
  • pytest *
  • pytest-benchmark *
  • typeguard *
  • types-Pillow *
requirements/ci.txt pypi
  • attrs ==20.3.0
  • coverage ==6.2
  • flake8 ==5.0.4
  • flake8-bugbear ==22.7.1
  • flake8-implicit-str-concat ==0.2.0
  • flake8-print ==4.0.1
  • importlib-metadata ==4.2.0
  • iniconfig ==1.1.1
  • mccabe ==0.7.0
  • more-itertools ==8.13.0
  • mypy ==0.971
  • mypy-extensions ==0.4.3
  • packaging ==21.3
  • pillow ==8.4.0
  • pluggy ==1.0.0
  • py ==1.11.0
  • py-cpuinfo ==8.0.0
  • pycodestyle ==2.9.1
  • pycryptodome ==3.15.0
  • pyflakes ==2.5.0
  • pyparsing ==3.0.9
  • pytest ==7.0.1
  • pytest-benchmark ==3.4.1
  • six ==1.16.0
  • tomli ==1.2.3
  • typed-ast ==1.5.4
  • typeguard ==2.13.3
  • types-pillow ==9.2.1
  • typing-extensions ==4.1.1
  • zipp ==3.6.0
requirements/dev.in pypi
  • black * development
  • pip-tools * development
  • pre-commit <2.18.0 development
  • pytest-cov * development
  • twine * development
  • wheel * development
requirements/dev.txt pypi
  • attrs ==21.4.0 development
  • black ==22.6.0 development
  • bleach ==4.1.0 development
  • certifi ==2022.6.15 development
  • cffi ==1.15.1 development
  • cfgv ==3.3.1 development
  • charset-normalizer ==2.0.12 development
  • click ==8.0.4 development
  • colorama ==0.4.5 development
  • coverage ==6.2 development
  • cryptography ==37.0.4 development
  • dataclasses ==0.8 development
  • distlib ==0.3.5 development
  • docutils ==0.18.1 development
  • filelock ==3.4.1 development
  • identify ==2.4.4 development
  • idna ==3.3 development
  • importlib-metadata ==4.8.3 development
  • importlib-resources ==5.2.3 development
  • iniconfig ==1.1.1 development
  • jeepney ==0.7.1 development
  • keyring ==23.4.1 development
  • mypy-extensions ==0.4.3 development
  • nodeenv ==1.6.0 development
  • packaging ==21.3 development
  • pathspec ==0.9.0 development
  • pep517 ==0.12.0 development
  • pip-tools ==6.4.0 development
  • pkginfo ==1.8.3 development
  • platformdirs ==2.4.0 development
  • pluggy ==1.0.0 development
  • pre-commit ==2.17.0 development
  • py ==1.11.0 development
  • pycparser ==2.21 development
  • pygments ==2.12.0 development
  • pyparsing ==3.0.9 development
  • pytest ==7.0.1 development
  • pytest-cov ==3.0.0 development
  • pyyaml ==6.0 development
  • readme-renderer ==34.0 development
  • requests ==2.27.1 development
  • requests-toolbelt ==0.9.1 development
  • rfc3986 ==1.5.0 development
  • secretstorage ==3.3.2 development
  • six ==1.16.0 development
  • toml ==0.10.2 development
  • tomli ==1.2.3 development
  • tqdm ==4.64.0 development
  • twine ==3.8.0 development
  • typed-ast ==1.5.4 development
  • typing-extensions ==4.1.1 development
  • urllib3 ==1.26.10 development
  • virtualenv ==20.15.1 development
  • webencodings ==0.5.1 development
  • wheel ==0.37.1 development
  • zipp ==3.6.0 development
requirements/docs.in pypi
  • myst_parser *
  • sphinx *
  • sphinx_rtd_theme *
requirements/docs.txt pypi
  • alabaster ==0.7.12
  • attrs ==21.4.0
  • babel ==2.10.3
  • certifi ==2022.6.15
  • charset-normalizer ==2.0.12
  • docutils ==0.17.1
  • idna ==3.3
  • imagesize ==1.4.1
  • importlib-metadata ==4.8.3
  • jinja2 ==3.0.3
  • markdown-it-py ==2.0.1
  • markupsafe ==2.0.1
  • mdit-py-plugins ==0.3.0
  • mdurl ==0.1.0
  • myst-parser ==0.16.1
  • packaging ==21.3
  • pygments ==2.12.0
  • pyparsing ==3.0.9
  • pytz ==2022.1
  • pyyaml ==6.0
  • requests ==2.27.1
  • snowballstemmer ==2.2.0
  • sphinx ==4.5.0
  • sphinx-rtd-theme ==1.0.0
  • sphinxcontrib-applehelp ==1.0.2
  • sphinxcontrib-devhelp ==1.0.2
  • sphinxcontrib-htmlhelp ==2.0.0
  • sphinxcontrib-jsmath ==1.0.1
  • sphinxcontrib-qthelp ==1.0.3
  • sphinxcontrib-serializinghtml ==1.1.5
  • typing-extensions ==4.1.1
  • urllib3 ==1.26.10
  • zipp ==3.6.0
.github/workflows/benchmark.yaml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • benchmark-action/github-action-benchmark v1 composite
.github/workflows/github-ci.yaml actions
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • codecov/codecov-action v2 composite
  • softprops/action-gh-release v1 composite
requirements/ci-3.11.txt pypi
  • attrs ==22.1.0
  • coverage ==6.5.0
  • flake8 ==5.0.4
  • flake8-bugbear ==22.10.27
  • flake8-implicit-str-concat ==0.3.0
  • flake8-print ==5.0.0
  • iniconfig ==1.1.1
  • mccabe ==0.7.0
  • more-itertools ==8.14.0
  • mypy ==0.982
  • mypy-extensions ==0.4.3
  • packaging ==21.3
  • pillow ==9.3.0
  • pluggy ==1.0.0
  • py-cpuinfo ==9.0.0
  • pycodestyle ==2.9.1
  • pycryptodome ==3.15.0
  • pyflakes ==2.5.0
  • pyparsing ==3.0.9
  • pytest ==7.2.0
  • pytest-benchmark ==4.0.0
  • typeguard ==2.13.3
  • types-dataclasses ==0.6.6
  • types-pillow ==9.2.2.2
  • typing-extensions ==4.4.0
pyproject.toml pypi
  • dataclasses python_version < '3.7'
  • typing_extensions >= 3.10.0.0; python_version < '3.10'
.github/workflows/release.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • softprops/action-gh-release v1 composite