textsummerizer-nlp

Text summarization is a core Natural Language Processing task that involves generating concise and coherent summaries of longer pieces of text. It enables quick information retrieval and comprehension, making it invaluable for dealing with large volumes of textual data.

https://github.com/parham075/textsummerizer-nlp

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.8%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: parham075
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 7.12 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Codemeta

README.md

TextSummerizer-NLP

Text summarization is a core Natural Language Processing task that involves generating concise and coherent summaries of longer pieces of text. It enables quick information retrieval and comprehension, making it invaluable for dealing with large volumes of textual data.

Objective

This project aims to develop an abstractive or extractive text summarization model capable of creating informative and concise summaries from lengthy text documents. Previously, we saw that LangChain can provide text summarization using the OpenAI API through NTA-LLM. Now I am going to train a transformer model using transfer learning and fine-tune it to tackle the problem.

Usecase:

The model produced by this project can be useful for academic researchers, students, and anyone who doesn't have much time to read large amounts of text and is looking for a way to summarize articles, course slides, etc.

Dataset Overview and Data Preprocessing

This project requires a dataset containing articles or documents with human-generated summaries. Data preprocessing involves tokenizing the text, handling punctuation, and creating input-target pairs for training. For this project I used a well-known dataset from Hugging Face: SAMSum.
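The preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the repository's pipeline: it assumes SAMSum-style records with `dialogue` and `summary` fields, uses a regex tokenizer as a stand-in for the model's subword tokenizer, and the function names, length limits, and sample records are all hypothetical.

```python
import re


def toy_tokenize(text):
    """Stand-in tokenizer: words and punctuation marks become separate tokens.

    A real training run would presumably use the pre-trained model's own
    subword tokenizer instead of this regex.
    """
    return re.findall(r"\w+|[^\w\s]", text)


def build_pairs(records, max_input_tokens=1024, max_target_tokens=128):
    """Turn raw records into truncated input-target token pairs."""
    pairs = []
    for rec in records:
        source = toy_tokenize(rec["dialogue"])[:max_input_tokens]
        target = toy_tokenize(rec["summary"])[:max_target_tokens]
        if not source or not target:
            continue  # skip records where either side is empty
        pairs.append({"input_tokens": source, "target_tokens": target})
    return pairs


# Made-up records in the assumed SAMSum layout (not taken from the dataset).
sample = [
    {
        "id": "0",
        "dialogue": "Ana: Lunch at noon?\nBen: Sure, see you there.",
        "summary": "Ana and Ben will meet for lunch at noon.",
    },
    {"id": "1", "dialogue": "", "summary": "Nothing to summarize."},
]

pairs = build_pairs(sample, max_input_tokens=6, max_target_tokens=4)
print(pairs)
```

The small limits in the usage line only make the truncation visible; the second record is dropped because its dialogue is empty.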

Model(s)

| Model | Weights |
| --- | --- |
| pre-trained model | pegasus-cnn_dailymail |
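As a rough sketch of how the pre-trained weights could be loaded for inference with the `transformers` library: the Hub id `google/pegasus-cnn_dailymail` is an assumption based on the weights name above, and the `summarize` helper and its length and beam settings are hypothetical, not taken from the repository.

```python
# Assumed Hugging Face Hub id for the weights named in the table.
MODEL_CKPT = "google/pegasus-cnn_dailymail"


def summarize(text, ckpt=MODEL_CKPT, max_input_tokens=1024, max_summary_tokens=128):
    """Load the checkpoint and generate one summary.

    The first call downloads the weights. The Pegasus tokenizer also needs
    the `sentencepiece` package to be installed.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(ckpt)
    model = AutoModelForSeq2SeqLM.from_pretrained(ckpt)

    # Truncate long inputs to the length limit and return PyTorch tensors.
    inputs = tokenizer(
        text, truncation=True, max_length=max_input_tokens, return_tensors="pt"
    )
    output_ids = model.generate(**inputs, max_length=max_summary_tokens, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize("...")` on a long text returns a single summary string. Fine-tuning would start from the same checkpoint and continue training on the input-target pairs.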

Queries for Analysis

  • Generate summaries for long articles or documents.
  • Evaluate the quality of generated summaries using ROUGE and BLEU metrics.

Key Insights and Findings

The text summarization model is expected to generate concise and coherent summaries, improving the efficiency of information retrieval and enhancing the user experience when dealing with extensive textual content.
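To make the ROUGE evaluation named under Queries for Analysis concrete, here is a simplified ROUGE-1 (unigram overlap) computation in plain Python. It is only an illustration of the idea: the project lists the `rouge_score` and `sacrebleu` packages as dependencies, and those would normally do the scoring. The `rouge1` function and the example sentences are hypothetical, and no stemming is applied.

```python
import re
from collections import Counter


def unigrams(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def rouge1(reference, candidate):
    """Simplified ROUGE-1: clipped unigram overlap between reference and candidate."""
    ref_counts = Counter(unigrams(reference))
    cand_counts = Counter(unigrams(candidate))
    # Counter intersection keeps the minimum count of each shared token.
    overlap = sum((ref_counts & cand_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up reference and candidate summaries.
scores = rouge1(
    "Ana and Ben will meet for lunch at noon",
    "Ana and Ben meet at noon",
)
print(scores)
```

Every candidate word appears in the reference, so precision is 1.0, while recall is 6/9 because three reference words are missing from the candidate.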

Owner

  • Name: parham
  • Login: parham075
  • Kind: user
  • Location: Rome,Italy

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/CC-BY-NC-SA-4.0",
  "codeRepository": "https://github.com/parham075/TextSummerizer-NLP",
  "dateCreated": "2023-04-27",
  "datePublished": "2023-",
  "name": "TextSummerizer-NLP",
  "version": "1.0.0",
  "developmentStatus": "active",
  "relatedLink": [
    ""
  ],
  "keywords": [
    "NLP",
    "Text Summerizer",
    "mlflow",
    "Deep Learning"
  ],
  "programmingLanguage": [
    "Python",
    "CWL"
  ],
  "softwareRequirements": [
    "container runtime",
    "cwl runner",
    "mlflow"
  ],
  "author": [
    {
      "@type": "Person",
      "givenName": "Parham",
      "familyName": "Membari",
      "username": "parham075",
      "email": "p.membari96@gmail.com"
    }
  ]
}

Dependencies

Dockerfile docker
setup.py pypi
environment.yml pypi
  • Jinja2 *
  • boto3 *
  • datasets *
  • ensure *
  • fastapi *
  • matplotlib *
  • mypy-boto3-s3 *
  • nltk *
  • notebook *
  • numpy *
  • pandas *
  • py7zr *
  • pyYAML *
  • python-box *
  • rouge_score *
  • sacrebleu *
  • torch *
  • torchvision *
  • tqdm *
  • transformers *
  • uvicorn *