pdf_summarization_demo
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 5 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: semanticClimate
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Size: 41 KB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Demonstration of PDF Summarization
Citation:
Barbhuiya, S., S, A., Jawed, M., Kumari, R., Simon, W., Yadav, G., & Murray-Rust, P. (2025). Demonstration of PDF Summarization (0.1a). Zenodo. https://doi.org/10.5281/zenodo.16526790
Description:
This Jupyter notebook provides an end-to-end pipeline for summarizing scientific PDFs using Natural Language Processing (NLP) techniques. It extracts text from uploaded PDFs and generates concise summaries using transformer-based models.
Features
- Upload and parse PDF documents
- Extract meaningful text content
- Generate summaries using Hugging Face Transformers (e.g., BART, T5)
- Optionally view original and summarized text side-by-side
- Includes visualization support with PyMuPDF and IPython.display
Requirements
Install the following packages:
- pip install transformers
- pip install PyPDF2
- pip install PyMuPDF (provides the fitz module; the separate fitz package on PyPI is unrelated and should not be installed alongside it)
- pip install nltk
- pip install torch
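Before running the notebook, it can help to confirm that the packages are importable. The sketch below is not part of the notebook; `REQUIRED_MODULES` and `missing_packages` are hypothetical names. The mapping assumes the import names shown, in particular that PyMuPDF is imported as `fitz`.

```python
from importlib.util import find_spec

# Import names (not pip names) the notebook is assumed to rely on.
# PyMuPDF is installed with `pip install PyMuPDF` but imported as `fitz`.
REQUIRED_MODULES = {
    "transformers": "transformers",
    "PyPDF2": "PyPDF2",
    "fitz": "PyMuPDF",
    "nltk": "nltk",
    "torch": "torch",
}


def missing_packages(modules=REQUIRED_MODULES):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for import_name, pip_name in modules.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```

`find_spec` only locates each top-level module without importing it, so the check stays fast even for large packages such as torch.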
How to Use
- Clone this repository or download the notebook.
- Launch Jupyter Notebook or Google Colab.
- Upload your scientific or research-based PDF.
- Run all cells to:
  - Extract the full text
  - Preprocess and chunk the content
  - Generate a summary using a transformer model
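The chunking step exists because summarization models accept only a limited input length, so long documents have to be split before they are summarized. The following is a minimal sketch of one way to do this, not the notebook's actual code. `chunk_text` and `max_words` are hypothetical names, the regex sentence splitter is a simple stand-in (nltk's `sent_tokenize` could be used instead), and the word count is only a rough proxy for the model's token limit.

```python
import re


def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words words, on sentence boundaries.

    A single sentence longer than max_words is kept whole as its own chunk.
    """
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk if adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character offset avoids handing the model a chunk that starts or ends mid-sentence.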
Structure
- upload_pdf() – Upload and read PDF files
- extract_text() – Extract text from all pages
- summarize_text() – Use pre-trained summarization models
- visualize() – Display original vs. summarized content
Applications
- Research paper summarization
- Literature review automation
- Information extraction for large documents
Notes
- Pretrained models like facebook/bart-large-cnn or t5-base are used.
- Results depend on PDF formatting quality.
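Because results depend on PDF formatting quality, a light cleanup pass on the extracted text before summarization may help. The sketch below is an assumption about what such preprocessing could look like, not the notebook's own code. `clean_pdf_text` is a hypothetical helper, and its dehyphenation rule will also join genuine hyphenated compounds that happen to break at a line end.

```python
import re


def clean_pdf_text(text):
    """Apply light cleanup to text extracted from a PDF."""
    # Rejoin words hyphenated across a line break: "summar-\nization" -> "summarization".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Keep paragraph breaks (blank lines) but turn single newlines into spaces.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    # Normalize three or more newlines to a single blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Multi-column layouts, headers, footers, and tables can still end up interleaved in the extracted text, and a regex pass like this does not repair them.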
Reviewers & review process: <Add reviewers and review process link>
Software citation information: CITATION.cff
License: Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ | License information: LICENSE
Owner
- Name: semanticClimate
- Login: semanticClimate
- Kind: organization
- Repositories: 3
- Profile: https://github.com/semanticClimate
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Barbhuiya"
    given-names: "Shabnam"
    orcid: "https://orcid.org/0009-0004-0729-7385"
  - family-names: "S"
    given-names: "Anudev"
    orcid: "https://orcid.org/0009-0006-5487-4741"
  - family-names: "Jawed"
    given-names: "Moobashara"
    orcid: "https://orcid.org/0009-0009-7488-4834"
  - family-names: "Kumari"
    given-names: "Renu"
    orcid: "https://orcid.org/0000-0002-9451-7814"
  - family-names: "Simon"
    given-names: "Worthington"
    orcid: "https://orcid.org/0000-0002-8579-9717"
  - family-names: "Yadav"
    given-names: "Gitanjali"
    orcid: "https://orcid.org/0000-0001-6591-9964"
  - family-names: "Murray-Rust"
    given-names: "Peter"
    orcid: "https://orcid.org/0000-0003-3386-3972"
title: "Demonstration of PDF Summarization"
version: 0.0.1
doi: 10.5281/zenodo.
date-released: 2025-07-28
url: "https://github.com/semanticClimate/"
GitHub Events
Total
- Push event: 2
- Create event: 2
Last Year
- Push event: 2
- Create event: 2