https://github.com/aglaonemacommutatum/agentpoet

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: AglaonemaCommutatum
  • Language: Python
  • Default Branch: master
  • Size: 14.6 KB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme

README.md

AgentCrossTalk

This project uses Google Gemini to build a simple chatbot application that simulates two crosstalk performers (Dougen and Penggen) performing on user-provided topics. Topics can be supplied as text or images (audio input is coming soon).

The project was completed by Yue Su, Shunyuan Mao, Ting Wang, Yingying Li, and Haonan Shi.

You are welcome to visit our project page.

Project Details

This project consists of the following Python files:

  • main.py: The main program file, responsible for handling user interactions with multimodal input.
  • crosstalk.py: Contains the logic for the crosstalk performance, including interactions with Gemini and dialogue generation for Dougen and Penggen.
  • config.py: Contains the API key and other configuration details.
  • crosstalk_utils.py: Utilities that use the BLIP model to extract a topic from an image, plus audio helpers.
  • dianatalk.py: A sample VTuber talk. You can try interacting with Diana.
  • tts_speech.py: Standard audio output.
  • ui_elements.py: UI window design.
  • Vtuber_speech.py: VTuber audio implementation. You can switch to your preferred VTuber on Hugging Face through the link.

How to Run

  1. Install required libraries:

    ```bash
    conda create -n poet python==3.11
    conda activate poet   # activate the new environment so pip installs into it
    pip install -r requirements.txt
    ```

  2. Obtain the Google Gemini API key:

    You need to register a Google account, enable the Gemini API, and obtain an API key. Refer to the official Google AI documentation for details: Google AI Studio API docs

    Paste your API key on the first line of the api_key.txt file. Do not commit api_key.txt to version control!

  3. Run the application:

    ```bash
    conda activate poet   # the environment created in step 1
    python main.py
    ```

    This opens a GUI window where you can enter a topic and click the "Start Performance" button. The program simulates two crosstalk performers discussing and performing on your topic, and displays the conversation in the chat area. You can also add image input or launch the VTuber voice via the GUI buttons.
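Step 2 above stores the key on the first line of api_key.txt. A minimal sketch of reading it back is shown here; it is not the project's actual config.py. The helper name `load_api_key` is hypothetical, and the commented-out `genai.configure` call only indicates where the key would typically be handed to the google-generativeai package.

```python
from pathlib import Path


def load_api_key(path="api_key.txt"):
    """Return the first line of the key file, stripped of whitespace.

    Hypothetical helper for illustration; config.py may do this differently.
    """
    key_file = Path(path)
    if not key_file.is_file():
        raise FileNotFoundError(
            f"{path} not found; create it and put your Gemini API key on the first line"
        )
    lines = key_file.read_text(encoding="utf-8").splitlines()
    key = lines[0].strip() if lines else ""
    if not key:
        raise ValueError(f"the first line of {path} is empty")
    return key


# Typical use with the google-generativeai package (not executed here):
#   import google.generativeai as genai
#   genai.configure(api_key=load_api_key())
```

Failing early with a clear message when the file is missing or empty is a design choice of this sketch; it avoids a less obvious authentication error later, when the first request is sent.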

Example

Enter "Artificial Intelligence" in the input box and click the "Start Performance" button. You will see two crosstalk performers discussing and performing around the topic of artificial intelligence.
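The back-and-forth between the two performers can be pictured as a simple turn-taking loop. The sketch below is an assumption about the general shape, not the code in crosstalk.py: `perform`, `canned_generate`, and the `generate` callback signature are all hypothetical, and the callback is a stand-in for wherever the Gemini request would be made.

```python
from typing import Callable, List, Tuple

PERFORMERS = ("Dougen", "Penggen")  # the two roles named in this README

Turn = Tuple[str, str]  # (speaker, line)


def perform(
    topic: str,
    generate: Callable[[str, str, List[Turn]], str],
    turns: int = 4,
) -> List[Turn]:
    """Alternate between the two performers for a fixed number of turns.

    `generate(speaker, topic, history)` is a hypothetical callback that
    returns the next line; a real version would call the model here.
    """
    history: List[Turn] = []
    for turn in range(turns):
        speaker = PERFORMERS[turn % 2]  # Dougen speaks on even turns
        line = generate(speaker, topic, history)
        history.append((speaker, line))
    return history


def canned_generate(speaker: str, topic: str, history: List[Turn]) -> str:
    """Stand-in for a model call so the sketch runs offline."""
    return f"{speaker} says line {len(history) + 1} about {topic}"


if __name__ == "__main__":
    for speaker, line in perform("Artificial Intelligence", canned_generate):
        print(f"{speaker}: {line}")
```

Passing the running history to the callback lets each new line respond to what was said before, which is what makes the exchange read as a dialogue rather than two separate monologues.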

Notes

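  • Ensure that you have Python 3.9 or later installed; the pinned dependencies (for example torch 2.5.1 and Pillow 11.0.0) do not support older versions. The setup steps above use Python 3.11.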
  • The quality and coherence of the crosstalk performance may vary due to limitations of the Gemini API.
  • This project is for demonstration and educational purposes only. Please adhere to the Google Cloud Platform's terms of service and usage limitations.

File Structure

```plaintext
├── main.py        # Main program, handles GUI logic
├── crosstalk.py   # Logic for crosstalk performance
├── config.py      # Configuration (API Key and model initialization)
├── api_key.txt    # File containing the Gemini API key (handle securely)
└── ...            (Core files are above)
```

Owner

  • Name: msy
  • Login: AglaonemaCommutatum
  • Kind: user
  • Company: 西安电子科技大学

GitHub Events

Total
  • Watch event: 3
  • Member event: 1
  • Push event: 5
  • Create event: 3
Last Year
  • Watch event: 3
  • Member event: 1
  • Push event: 5
  • Create event: 3

Dependencies

requirements.txt pypi
  • Pillow ==11.0.0
  • SpeechRecognition ==3.12.0
  • google-generativeai *
  • gradio_client ==1.5.2
  • playsound ==1.3.0
  • protobuf ==5.29.2
  • pydub ==0.25.1
  • pygame ==2.6.1
  • pyttsx3 ==2.98
  • torch ==2.5.1
  • transformers ==4.47.1