textsummerizer-nlp
Text summarization is a key Natural Language Processing task that involves generating concise and coherent summaries of longer pieces of text. It enables quick information retrieval and comprehension, which makes it valuable for handling large volumes of textual data.
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ✓ .zenodo.json file (found .zenodo.json file)
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: parham075
- Language: Jupyter Notebook
- Default Branch: main
- Size: 7.12 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TextSummerizer-NLP
Text summarization is a key Natural Language Processing task that involves generating concise and coherent summaries of longer pieces of text. It enables quick information retrieval and comprehension, which makes it valuable for handling large volumes of textual data.
Objective
This project aims to develop an abstractive or extractive text summarization model capable of creating informative and concise summaries from lengthy text documents. Previously, we saw that LangChain can provide text summarization using the OpenAI API through NTA-LLM. Now I am going to train a transformer model using transfer learning and fine-tune it to tackle the problem.
Use case:
The model produced by this project can be useful for academic researchers, students, and anybody who doesn't have much time to read large amounts of text and is looking for a way to summarize their articles, course slides, etc.
Dataset Overview and Data Preprocessing
This project requires a dataset containing articles or documents with human-generated summaries. Data preprocessing involves tokenizing the text, handling punctuation, and creating input-target pairs for training.
For this project I used a well-known dataset from Hugging Face:
- Samsum
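
As a rough illustration of the input-target pairing described above, the sketch below tokenizes a batch of dialogues and their reference summaries. It is not the notebook's actual code. The function name `build_features` and the length limits are hypothetical, the field names `dialogue` and `summary` are assumed to match the SAMSum dataset on Hugging Face, and the `text_target` argument assumes a reasonably recent `transformers` release.

```python
def build_features(batch, tokenizer, max_input_tokens=1024, max_target_tokens=128):
    """Turn a batch of {dialogue, summary} records into model inputs with labels.

    `tokenizer` is expected to behave like a Hugging Face tokenizer.
    The length limits are illustrative assumptions, not values from the project.
    """
    # Tokenize the source text (the conversation to be summarized).
    model_inputs = tokenizer(
        batch["dialogue"], max_length=max_input_tokens, truncation=True
    )
    # Tokenize the human-written summaries as the training targets.
    labels = tokenizer(
        text_target=batch["summary"], max_length=max_target_tokens, truncation=True
    )
    # Seq2seq models in transformers read the target ids from the "labels" key.
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


# Possible usage (assumed names, requires network access, not executed here):
#   from datasets import load_dataset
#   from transformers import AutoTokenizer
#   tokenizer = AutoTokenizer.from_pretrained("google/pegasus-cnn_dailymail")
#   dataset = load_dataset("samsum")
#   encoded = dataset.map(lambda b: build_features(b, tokenizer), batched=True)
```

Passing the tokenizer as an argument keeps the function independent of any particular checkpoint, so the same preprocessing can be reused if a different pre-trained model is tried.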
Model(s)
| Model | Weights |
| --- | --- |
| pre-trained model | pegasus-cnn_dailymail |
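
A minimal sketch of how a pre-trained checkpoint like the one listed above might be used to produce a summary, before or after fine-tuning. The helper name `summarize`, the beam width, and the length limits are hypothetical choices for illustration, and the full Hub identifier `google/pegasus-cnn_dailymail` in the usage comment is an assumption about which checkpoint the table refers to.

```python
def summarize(text, tokenizer, model, max_input_tokens=1024,
              max_summary_tokens=128, num_beams=4):
    """Generate one summary with a seq2seq model and its tokenizer.

    `tokenizer` and `model` are expected to follow the Hugging Face
    transformers interface; all default values are illustrative assumptions.
    """
    # Encode the document, truncating it to the model's input budget.
    inputs = tokenizer(
        text, max_length=max_input_tokens, truncation=True, return_tensors="pt"
    )
    # Beam search decoding, capped at max_summary_tokens.
    output_ids = model.generate(
        **inputs, num_beams=num_beams, max_length=max_summary_tokens
    )
    # Convert the generated ids back to text and return the first sequence.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]


# Possible usage (assumed checkpoint name, requires a model download):
#   from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
#   name = "google/pegasus-cnn_dailymail"
#   tokenizer = AutoTokenizer.from_pretrained(name)
#   model = AutoModelForSeq2SeqLM.from_pretrained(name)
#   print(summarize("Amanda: I baked cookies. Jerry: Great, save me some!", tokenizer, model))
```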
Queries for Analysis
- Generate summaries for long articles or documents.
- Evaluate the quality of generated summaries using ROUGE and BLEU metrics.
Key Insights and Findings
The text summarization model is expected to generate concise and coherent summaries, improving the efficiency of information retrieval and enhancing the user experience when dealing with extensive textual content.
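
To make the ROUGE evaluation named under Queries for Analysis more concrete, here is a simplified, pure-Python sketch of ROUGE-1 and ROUGE-L F1 scores. It uses lowercasing and whitespace tokenization only, so its numbers can differ from those of the `rouge_score` package listed in the dependencies, which is the more likely tool for real evaluation. All function names here are hypothetical.

```python
from collections import Counter


def _f1(overlap, candidate_len, reference_len):
    """Harmonic mean of precision and recall, or 0.0 when nothing overlaps."""
    if overlap == 0:
        return 0.0
    precision = overlap / candidate_len
    recall = overlap / reference_len
    return 2 * precision * recall / (precision + recall)


def rouge1_f1(candidate, reference):
    """Unigram overlap F1 with clipped counts (simplified ROUGE-1)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    # Counter intersection keeps the minimum count of each shared word.
    overlap = sum((Counter(cand) & Counter(ref)).values())
    return _f1(overlap, len(cand), len(ref))


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Longest-common-subsequence F1 (simplified sentence-level ROUGE-L)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    return _f1(lcs_length(cand, ref), len(cand), len(ref))


if __name__ == "__main__":
    generated = "the cat lay on the mat"
    human = "the cat sat on the mat"
    print(f"ROUGE-1 F1: {rouge1_f1(generated, human):.3f}")
    print(f"ROUGE-L F1: {rouge_l_f1(generated, human):.3f}")
```

ROUGE-1 rewards shared vocabulary regardless of order, while ROUGE-L also rewards words that appear in the same order, which is why the two are often reported together.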
Owner
- Name: parham
- Login: parham075
- Kind: user
- Location: Rome,Italy
- Repositories: 1
- Profile: https://github.com/parham075
CodeMeta (codemeta.json)
{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/CC-BY-NC-SA-4.0",
  "codeRepository": "https://github.com/parham075/TextSummerizer-NLP",
  "dateCreated": "2023-04-27",
  "datePublished": "2023-",
  "name": "TextSummerizer-NLP",
  "version": "1.0.0",
  "developmentStatus": "active",
  "relatedLink": [
    ""
  ],
  "keywords": [
    "NLP",
    "Text Summerizer",
    "mlflow",
    "Deep Learning"
  ],
  "programmingLanguage": [
    "Python",
    "CWL"
  ],
  "softwareRequirements": [
    "container runtime",
    "cwl runner",
    "mlflow"
  ],
  "author": [
    {
      "@type": "Person",
      "givenName": "Parham",
      "familyName": "Membari",
      "username": "parham075",
      "email": "p.membari96@gmail.com"
    }
  ]
}
Dependencies
- Jinja2 *
- boto3 *
- datasets *
- ensure *
- fastapi *
- matplotlib *
- mypy-boto3-s3 *
- nltk *
- notebook *
- numpy *
- pandas *
- py7zr *
- pyYAML *
- python-box *
- rouge_score *
- sacrebleu *
- torch *
- torchvision *
- tqdm *
- transformers *
- uvicorn *