Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: semanticClimate
  • License: apache-2.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 41 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago

README.md

Demonstration of PDF Summarization

Citation:

Barbhuiya, S., S, A., Jawed, M., Kumari, R., Simon, W., Yadav, G., & Murray-Rust, P. (2025). Demonstration of PDF Summarization (0.1a). Zenodo. https://doi.org/10.5281/zenodo.16526790

Description:

This Jupyter notebook provides an end-to-end pipeline for summarizing scientific PDFs using Natural Language Processing (NLP) techniques. It extracts text from uploaded PDFs and generates concise summaries using transformer-based models.

Features

  • Upload and parse PDF documents
  • Extract meaningful text content
  • Generate summaries using Hugging Face Transformers (e.g., BART, T5)
  • Optionally view original and summarized text side-by-side
  • Includes visualization support with PyMuPDF and IPython.display

Requirements

Install the following packages:

  • pip install transformers
  • pip install PyPDF2
  • pip install PyMuPDF (this provides the fitz module; do not install the separate fitz package from PyPI, which is an unrelated project and conflicts with PyMuPDF)
  • pip install nltk
  • pip install torch

How to Use

  1. Clone this repository or download the notebook.
  2. Launch Jupyter Notebook or Google Colab.
  3. Upload your scientific or research-based PDF.
  4. Run all cells to:
     • Extract the full text
     • Preprocess and chunk the content
     • Generate a summary using a transformer model
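
The chunk-then-summarize step above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names chunk_words, summarize_chunks, and make_hf_summarizer, the 400-word chunk size, and the length limits are all assumptions chosen for the example. It also assumes an installed transformers version that provides the "summarization" pipeline task. Chunking matters because models such as facebook/bart-large-cnn accept only a limited number of input tokens, so a full paper has to be summarized piece by piece.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int = 400) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_chunks(chunks: List[str], summarize_one: Callable[[str], str]) -> str:
    """Summarize each chunk independently and join the partial summaries."""
    return " ".join(summarize_one(chunk).strip() for chunk in chunks)


def make_hf_summarizer(model_name: str = "facebook/bart-large-cnn") -> Callable[[str], str]:
    """Wrap a Hugging Face summarization pipeline as a str -> str callable.

    Hypothetical helper: the length limits below are illustrative defaults.
    """
    # Imported lazily so the chunking helpers work without transformers installed.
    from transformers import pipeline

    summarizer = pipeline("summarization", model=model_name)

    def summarize_one(chunk: str) -> str:
        result = summarizer(chunk, max_length=130, min_length=30, truncation=True)
        return result[0]["summary_text"]

    return summarize_one
```

A call might look like summarize_chunks(chunk_words(full_text), make_hf_summarizer()), where full_text is the text extracted from the PDF. Passing the summarizer in as a callable keeps the chunking logic easy to check with a stand-in function before downloading a model.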

Structure

  • upload_pdf() – Upload and read PDF files
  • extract_text() – Extract text from all pages
  • summarize_text() – Use pre-trained summarization models
  • visualize() – Display original vs. summarized content
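
As one possible shape for the side-by-side display, the sketch below builds a two-column HTML table from the original text and its summary. The function name side_by_side_html and the inline styling are assumptions for illustration; the notebook's own visualize() may work differently.

```python
import html


def side_by_side_html(original: str, summary: str) -> str:
    """Return an HTML table with the original text on the left and the summary on the right."""
    cell_style = "vertical-align: top; width: 50%; padding: 8px; white-space: pre-wrap;"
    return (
        "<table>"
        "<tr><th>Original</th><th>Summary</th></tr>"
        # Escape both texts so characters such as < and & from the PDF render literally.
        f'<tr><td style="{cell_style}">{html.escape(original)}</td>'
        f'<td style="{cell_style}">{html.escape(summary)}</td></tr>'
        "</table>"
    )
```

In Jupyter or Colab the result can be rendered with display(HTML(side_by_side_html(text, summary))) after from IPython.display import HTML, display.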

Applications

  • Research paper summarization
  • Literature review automation
  • Information extraction for large documents

Notes

  • Pretrained models like facebook/bart-large-cnn or t5-base are used.
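  • Summary quality depends on how cleanly text can be extracted from the PDF. Scanned pages without a text layer, multi-column layouts, and tables often extract poorly.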
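
Some extraction noise can be reduced before summarization. The sketch below is a hypothetical cleanup step (clean_extracted_text is not a function from the notebook) that rejoins words hyphenated across line breaks and unwraps hard line breaks while keeping paragraph breaks. It assumes the extractor emits one newline per visual line and a blank line between paragraphs, which is not true of every PDF.

```python
import re


def clean_extracted_text(raw: str) -> str:
    """Normalize line breaks and spacing in text extracted from a PDF."""
    # Rejoin words hyphenated across a line break: "summar-\nization" -> "summarization".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single newlines into spaces but keep blank lines as paragraph breaks.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()
```

Note that the de-hyphenation rule also merges genuine compounds that happen to break at a line end (for example "end-\nto-end"), so it trades a few such errors for fewer split words.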

Reviewers & review process: <Add reviewers and review process link>


Software citation information: CITATION.cff

License: Apache License Version 2.0, January 2004 http://www.apache.org/licenses/ | License information: LICENSE

Owner

  • Name: semanticClimate
  • Login: semanticClimate
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Barbhuiya"
  given-names: "Shabnam"
  orcid: "https://orcid.org/0009-0004-0729-7385"
- family-names: "S"
  given-names: "Anudev"
  orcid: "https://orcid.org/0009-0006-5487-4741"
- family-names: "Jawed"
  given-names: "Moobashara"
  orcid: "https://orcid.org/0009-0009-7488-4834"
- family-names: "Kumari"
  given-names: "Renu"
  orcid: "https://orcid.org/0000-0002-9451-7814"
- family-names: "Simon"
  given-names: "Worthington"
  orcid: "https://orcid.org/0000-0002-8579-9717"
- family-names: "Yadav"
  given-names: "Gitanjali"
  orcid: "https://orcid.org/0000-0001-6591-9964"
- family-names: "Murray-Rust"
  given-names: "Peter"
  orcid: "https://orcid.org/0000-0003-3386-3972"
title: "Demonstration of PDF Summarization"
version: 0.0.1
doi: 10.5281/zenodo.
date-released: 2025-07-28
url: "https://github.com/semanticClimate/"

GitHub Events

Total
  • Push event: 2
  • Create event: 2
Last Year
  • Push event: 2
  • Create event: 2