manga-reader

Generate a video recap of any manga volume PDF with GPT Vision and Elevenlabs narration. Discord: https://discord.gg/MMqcuDe2WZ

https://github.com/pashpashpash/manga-reader

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary

Keywords

accessibility comic comics comics-reader elevenlabs manga manga-reader manhwa manhwa-reader openai summarization video vision
Last synced: 6 months ago · JSON representation

Repository

Generate a video recap of any manga volume PDF with GPT Vision and Elevenlabs narration. Discord: https://discord.gg/MMqcuDe2WZ

Basic Info
  • Host: GitHub
  • Owner: pashpashpash
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://mangarecap.ai
  • Size: 173 MB
Statistics
  • Stars: 37
  • Watchers: 3
  • Forks: 24
  • Open Issues: 9
  • Releases: 0
Topics
accessibility comic comics comics-reader elevenlabs manga manga-reader manhwa manhwa-reader openai summarization video vision
Created about 2 years ago · Last pushed 9 months ago
Metadata Files
Readme Contributing License Citation Security

README.md

Manga Recap with GPT-4 Vision

Project Overview

This project aims to generate summaries of manga volumes by analyzing images extracted from PDF files of the manga. It uses the GPT-4 Vision API to understand the content of manga pages and produce compelling, story-telling tone summaries. The project processes PDFs to extract images, scales them to a specific size, encodes them in base64, and then uses these images as input for the GPT-4 Vision API alongside custom prompts to generate summaries. Once a summary is generated, it is sent to ElevenLabs API for narration. The resulting narration and relevant panel images are then combined to create a video recap summarizing the volume.

Join the Discord: https://discord.gg/MMqcuDe2WZ

https://github.com/pashpashpash/manga-reader/assets/20898225/debb0c15-3579-477c-813d-2ed878b0e6ea

Features

  • PDF processing to extract manga pages as images as well as panel extraction from within pages.
  • Image scaling to fit the requirements of the GPT-4 Vision API.
  • Base64 encoding of images for API submission.
  • Generating text summaries of manga volumes in a story-telling tone.
  • Narration of the generated summaries using the ElevenLabs API.
  • Video creation from the narration and relevant panel images/pages.

Prerequisites

Before you begin, ensure you have met the following requirements:

  • Python 3.7+
  • Pip3 (Python package manager)
  • Virtual environment (recommended)

Installation Steps

  1. Create a virtual environment to manage your project's dependencies separately.

python3 -m venv venv

  1. Activate the virtual environment

source venv/bin/activate

  1. Install Required Python Packages

pip3 install -r requirements.txt

  1. Set Up Environment Variables

Create a .env file in the root directory of your project. Add your OpenAI API key to this file:

OPENAI_API_KEY=your_openai_api_key_here ELEVENLABS_API_KEY=your_elevenlabs_api_key_here

  1. Prepare Your Manga PDFs

Place your manga volume PDF files in a directory structure as expected by the script, for example, naruto/v10/v10.pdf. Additionally, you should have a chapter-reference.pdf and a profile-reference.pdf in each manga directory. For example, naruto/chapter-reference.pdf and naruto/profile-reference.pdf. These files are used by GPT vision to identify the chapter pages and character introductions, respectively, so that jobs can be split up by chapter and for characters to be identified correctly by GPT Vision.

Running the Project

To run the project, execute the app.py script from the root directory of your project:

python3 app.py --manga naruto --volume-number 10

This script processes the specified PDF files, extracts and scales images, encodes them in base64, and sends them to the GPT-4 Vision API for analysis. The summaries generated by the API are printed to the console, including the total tokens used. The script then sends the summaries to the ElevenLabs API for narration. The resulting narration and relevant panel images are then combined to create a video recap summarizing the volume. The video is saved inside the relevant volume directory, i.e. naruto/v10/recap.mp4.

Optional/Recommended running instructions

I personally recommend running this in a Jupiter notebook (anime_recap.ipynb), as it allows you run the script one cell at a time, which is useful for debugging and understanding the process.

Owner

  • Login: pashpashpash
  • Kind: user
  • Company: redeemable.app

GitHub Events

Total
  • Issues event: 4
  • Watch event: 28
  • Issue comment event: 2
  • Push event: 1
  • Pull request event: 4
  • Fork event: 21
Last Year
  • Issues event: 4
  • Watch event: 28
  • Issue comment event: 2
  • Push event: 1
  • Pull request event: 4
  • Fork event: 21

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 13
  • Total pull requests: 5
  • Average time to close issues: 4 months
  • Average time to close pull requests: 3 months
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 0.31
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 4
  • Average time to close issues: 4 months
  • Average time to close pull requests: 4 months
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • pashpashpash (10)
  • JackyYu510yt (1)
  • vedanthkadam555 (1)
  • Tophatven (1)
Pull Request Authors
  • 1ashtray (4)
  • Hiabst2 (2)
Top Labels
Issue Labels
bug (1) enhancement (1) good first issue (1)
Pull Request Labels