pdf_summarization_demo

https://github.com/semanticclimate/pdf_summarization_demo

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README
✓ Academic publication links: Links to zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: semanticClimate
License: apache-2.0
Language: Jupyter Notebook
Default Branch: main
Size: 41 KB

Statistics

Stars: 0
Watchers: 0
Forks: 0
Open Issues: 0
Releases: 0

Created 11 months ago · Last pushed 11 months ago

Metadata Files

Readme License Citation

Demonstration of PDF Summarization

Citation:

Barbhuiya, S., S, A., Jawed, M., Kumari, R., Simon, W., Yadav, G., & Murray-Rust, P. (2025). Demonstration of PDF Summarization (0.1a). Zenodo. https://doi.org/10.5281/zenodo.16526790

Description:

This Jupyter notebook provides an end-to-end pipeline for summarizing scientific PDFs using Natural Language Processing (NLP) techniques. It extracts text from uploaded PDFs and generates concise summaries using transformer-based models.

Features

Upload and parse PDF documents
Extract meaningful text content
Generate summaries using Hugging Face Transformers (e.g., BART, T5)
Optionally view original and summarized text side-by-side
Includes visualization support with PyMuPDF and IPython.display

Requirements

Install the following packages:
pip install transformers
pip install PyPDF2
pip install PyMuPDF
pip install nltk
pip install torch

Note: PyMuPDF provides the fitz module, so no separate fitz package is needed; the unrelated fitz package on PyPI can conflict with it.

How to Use

Clone this repository or download the notebook.
Launch Jupyter Notebook or Google Colab.
Upload your scientific or research-based PDF.
Run all cells to:
- Extract the full text
- Preprocess and chunk the content
- Generate a summary using a transformer model
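
The chunking step matters because summarization models such as BART and T5 accept only a limited input length. The notebook's own chunking code is not shown here, so the following is a minimal sketch under stated assumptions: the function name `chunk_text`, the `max_words` parameter, and the regex-based sentence split (a stand-in for an nltk tokenizer) are all illustrative.

```python
import re


def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words words.

    Hypothetical helper: chunks break on sentence boundaries, so a single
    sentence longer than max_words is kept whole in its own chunk.
    """
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current, count = [], [], 0
    for sentence in sentences:
        if not sentence:
            continue
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Each returned chunk can then be passed to the summarization model separately. A word budget is only a rough proxy for the model's token limit, so a conservative `max_words` is the safer choice.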

Structure

upload_pdf() – Upload and read PDF files
extract_text() – Extract text from all pages
summarize_text() – Use pre-trained summarization models
visualize() – Display original vs. summarized content
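
The listing above names the notebook's functions but not their signatures. As a rough sketch of how the extraction and summarization steps could fit together, the version below assumes `extract_text` takes a file path and `summarize_text` takes a list of text chunks plus a summarizer callable; both signatures and the `max_length`/`min_length` defaults are assumptions, not the notebook's actual code.

```python
def extract_text(pdf_path):
    """Return the text of every page, joined by blank lines."""
    # PyMuPDF is imported lazily so the rest of this sketch works without it.
    import fitz  # module provided by the PyMuPDF package

    with fitz.open(pdf_path) as doc:
        return "\n\n".join(page.get_text() for page in doc)


def summarize_text(chunks, summarizer, max_length=130, min_length=30):
    """Summarize each chunk and join the partial summaries into one string.

    `summarizer` is any callable that returns a list of dicts with a
    "summary_text" key, which is the output format of the Hugging Face
    summarization pipeline.
    """
    parts = []
    for chunk in chunks:
        result = summarizer(chunk, max_length=max_length, min_length=min_length)
        parts.append(result[0]["summary_text"].strip())
    return " ".join(parts)
```

In a notebook, the summarizer would typically be created with `pipeline("summarization", model="facebook/bart-large-cnn")` from `transformers`. Passing it in as an argument keeps the function independent of a specific model and makes it easy to swap in t5-base or a stub for testing.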

Applications

Research paper summarization
Literature review automation
Information extraction for large documents

Notes

The notebook uses pretrained models such as facebook/bart-large-cnn or t5-base.
Summary quality depends on how cleanly text can be extracted from the PDF.
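
Because extracted PDF text often contains hard line breaks and words hyphenated across lines, a light cleanup pass before summarization may help. The sketch below is one possible approach and is not taken from the notebook; the function name `clean_pdf_text` is hypothetical.

```python
import re


def clean_pdf_text(text):
    """Normalize common PDF extraction artifacts (illustrative only)."""
    # Rejoin words split by a hyphen at a line break. This also merges
    # genuine compounds that happen to break at the hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Treat a single newline as a soft wrap; keep blank lines as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs.
    text = re.sub(r"[ \t]+", " ", text)
    # Reduce three or more newlines to one paragraph break.
    text = re.sub(r"\n{2,}", "\n\n", text)
    return text.strip()
```

Scanned PDFs without a text layer will still yield little or no text, and no amount of cleanup can recover content that the extraction step did not return.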

Reviewers & review process: <Add reviewers and review process link>

Software citation information: CITATION.cff

License: Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ | License information: LICENSE

Owner

Name: semanticClimate
Login: semanticClimate
Kind: organization

Repositories: 3
Profile: https://github.com/semanticClimate

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Barbhuiya"
  given-names: "Shabnam"
  orcid: "https://orcid.org/0009-0004-0729-7385"
- family-names: "S"
  given-names: "Anudev"
  orcid: "https://orcid.org/0009-0006-5487-4741"
- family-names: "Jawed"
  given-names: "Moobashara"
  orcid: "https://orcid.org/0009-0009-7488-4834"
- family-names: "Kumari"
  given-names: "Renu"
  orcid: "https://orcid.org/0000-0002-9451-7814"
- family-names: "Simon"
  given-names: "Worthington"
  orcid: "https://orcid.org/0000-0002-8579-9717"
- family-names: "Yadav"
  given-names: "Gitanjali"
  orcid: "https://orcid.org/0000-0001-6591-9964"
- family-names: "Murray-Rust"
  given-names: "Peter"
  orcid: "https://orcid.org/0000-0003-3386-3972"
title: "Demonstration of PDF Summarization"
version: 0.0.1
doi: 10.5281/zenodo.
date-released: 2025-07-28
url: "https://github.com/semanticClimate/"
