https://github.com/azurelotus06/ai-executive-assistant
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: azurelotus06
- Language: Python
- Default Branch: main
- Size: 35.9 MB
Statistics
- Stars: 16
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Real-Time AI Executive Assistant for Zoom Meetings
An intelligent AI assistant that actively participates in Zoom meetings by listening, analyzing conversations in real-time using GPT-4/GPT-5, and providing strategic insights via natural text-to-speech.
🎯 Overview
This project creates an AI-powered executive assistant that:
- Listens to Zoom meetings in real-time
- Analyzes conversations using GPT-4/GPT-5 for strategic insights
- Speaks naturally using ElevenLabs TTS when appropriate
- Provides executive-level insights and recommendations
🚀 Key Features
✅ Real-Time Audio Processing
- Captures Zoom audio via MeetingBaaS integration
- Bot automatically joins meetings from the provided URL
- Converts speech to text using OpenAI Whisper
- Implements Voice Activity Detection (VAD) for efficient processing
- Speaker identification and tracking
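The Voice Activity Detection step above decides which audio frames are worth sending to transcription. The sketch below illustrates the idea with a plain RMS-energy threshold on 16-bit PCM frames. It is a dependency-free stand-in, not the WebRTC VAD the project lists, and the function names, the threshold of 500, and the synthetic tone helper are all assumptions made for illustration.

```python
import math
import struct


def frame_rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def is_speech(frame: bytes, threshold: float = 500.0) -> bool:
    """Treat a frame as speech when its energy exceeds a hypothetical threshold."""
    return frame_rms(frame) > threshold


def make_tone(amplitude: int, samples: int = 480) -> bytes:
    """Build a synthetic 440 Hz frame (30 ms at 16 kHz) for demonstration only."""
    values = (
        int(amplitude * math.sin(2 * math.pi * 440 * n / 16000))
        for n in range(samples)
    )
    return struct.pack(f"<{samples}h", *values)
```

A loud frame such as `make_tone(8000)` passes the threshold while a near-silent one such as `make_tone(50)` does not. A real deployment would likely need a model-based detector, since raw energy cannot tell speech from other loud sounds.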
✅ AI-Powered Insights
- Context-aware analysis with GPT-4
- Executive-style prompt engineering for professional responses
- Maintains conversation history for contextual understanding
- Identifies key topics, risks, and opportunities
✅ Natural Voice Interaction
- Human-like speech synthesis using ElevenLabs
- Speaks only during natural pauses (non-intrusive)
- Configurable voice settings for different scenarios
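One way to read "speaks only during natural pauses" is to count how long the meeting has been silent and allow speech only after a minimum gap. The sketch below shows that idea. The class name `PauseGate`, the 30 ms frame length, and the 900 ms pause threshold are assumptions, not values taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class PauseGate:
    """Tracks consecutive silent frames to decide when speaking is non-intrusive."""

    frame_ms: int = 30        # assumed duration of one audio frame
    min_pause_ms: int = 900   # assumed silence required before the bot may speak
    silent_ms: int = field(default=0, init=False)

    def update(self, frame_has_speech: bool) -> bool:
        """Feed one speech/silence decision; return True once the pause is long enough."""
        if frame_has_speech:
            self.silent_ms = 0  # any speech resets the pause timer
        else:
            self.silent_ms += self.frame_ms
        return self.silent_ms >= self.min_pause_ms
```

Each call to `update` takes the per-frame decision of whatever detector is in use, so the gate stays independent of the VAD implementation.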
✅ User Control
- Real-time mute/unmute functionality
- Pause/resume AI analysis
- Manual prompt injection
- Emergency stop feature
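The controls above can be modeled as a small set of flags that the audio and analysis loops check before acting. This is a minimal sketch under assumptions: the class and method names are hypothetical, and the rule that an emergency stop sets every flag is a design guess rather than documented behavior.

```python
from dataclasses import dataclass


@dataclass
class AssistantControls:
    """User-facing switches consulted before the assistant analyzes or speaks."""

    muted: bool = False    # TTS output suppressed
    paused: bool = False   # AI analysis suspended
    stopped: bool = False  # emergency stop engaged

    def emergency_stop(self) -> None:
        """Assumed behavior: silence and halt everything at once."""
        self.muted = True
        self.paused = True
        self.stopped = True

    def may_analyze(self) -> bool:
        return not (self.paused or self.stopped)

    def may_speak(self) -> bool:
        # Speaking requires analysis to be active and output to be unmuted.
        return self.may_analyze() and not self.muted
```

Keeping the checks in one place means the mute, pause, and stop buttons only flip state, and every consumer asks the same two questions.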
🛠️ Tech Stack
- Frontend: React with TypeScript
- Backend: Python with FastAPI
- Database: PostgreSQL
- Package Management: UV (Python), npm (JavaScript)
- APIs:
- OpenAI (Whisper + GPT-4)
- ElevenLabs (Text-to-Speech)
- MeetingBaaS (Zoom integration)
- Real-time Communication: WebSockets
- Audio Processing: WebRTC VAD, librosa, sounddevice
📋 Prerequisites
- Node.js 16+ and npm
- Python 3.9+
- PostgreSQL 12+
- FFmpeg
- Valid API keys for:
- OpenAI (with GPT-4 access)
- ElevenLabs
- MeetingBaaS (for Zoom integration)
- Devtunnel or ngrok for webhook endpoints
🔧 Installation
1. Clone the Repository
```bash
git clone https://github.com/yourusername/zoom-ai-assistant.git
cd zoom-ai-assistant
```
2. Backend Setup
```bash
# Install UV package manager
pip install uv
# Navigate to backend
cd backend
# Install dependencies
uv sync
# Copy and configure environment variables
cp .env.example .env
# Edit .env with your API keys
# Set up database
uv run python scripts/startup.py
# Start the backend server
uv run uvicorn main:app --reload
```
3. Frontend Setup
```bash
# Navigate to frontend
cd frontend
# Install dependencies
npm install
# Start the development server
npm start
```
🔑 Configuration
Environment Variables
Create a .env file in the backend directory:
```env
# Required
OPENAI_API_KEY=your_openai_api_key
ELEVENLABS_API_KEY=your_elevenlabs_api_key

# MeetingBaaS Configuration (Required for Zoom integration)
MEETINGBAAS_API_KEY=your_meetingbaas_api_key
DEVTUNNEL_HOST=your-devtunnel-host.devtunnels.ms

# Optional
ZOOM_MEETING_ID=your_meeting_id
ZOOM_PASSCODE=your_passcode
DATABASE_URL=postgresql://user:password@localhost:5432/zoomassistant

# AI Settings
AI_MODEL=gpt-4
AI_TEMPERATURE=0.7
AI_MAX_TOKENS=150

# TTS Settings
TTS_VOICE_ID=21m00Tcm4TlvDq8ikWAM
```
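To show how these settings might be consumed, here is a minimal standard-library sketch that reads them from the environment and falls back to the defaults listed above. It is not the project's actual loader (the dependency list suggests pydantic-settings is used for that). The `Settings` class and `load_settings` function are hypothetical, and the underscore-separated variable names are an assumption based on the `DEVTUNNEL_HOST` spelling used elsewhere in this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    elevenlabs_api_key: str
    ai_model: str
    ai_temperature: float
    ai_max_tokens: int
    tts_voice_id: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), failing fast on missing keys."""
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")  # assumed variable names
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        ai_model=env.get("AI_MODEL", "gpt-4"),
        ai_temperature=float(env.get("AI_TEMPERATURE", "0.7")),
        ai_max_tokens=int(env.get("AI_MAX_TOKENS", "150")),
        tts_voice_id=env.get("TTS_VOICE_ID", "21m00Tcm4TlvDq8ikWAM"),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.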
📱 Usage
Start Both Servers
- Backend: http://localhost:8000
- Frontend: http://localhost:3000
Set up Devtunnel (for MeetingBaaS webhooks)
```bash
# Install devtunnel
winget install Microsoft.devtunnel
# Log in and create a tunnel
devtunnel user login
devtunnel create --allow-anonymous
devtunnel port create -p 8000
devtunnel host
# Copy the URL and set DEVTUNNEL_HOST in .env
```
Connect to a Meeting
- Enter the Zoom meeting URL in the Meeting Integration panel
- Click "Join Meeting" to send the AI bot
- Admit the bot when it appears in the waiting room
- The AI will begin listening and analyzing
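Before sending the bot, it can help to check that the pasted link looks like a Zoom meeting URL. The sketch below assumes the common `https://<subdomain>.zoom.us/j/<meeting id>?pwd=<passcode>` form. The function name `parse_zoom_url` is hypothetical, other Zoom link shapes are not handled, and nothing here reflects how the project or MeetingBaaS validates links.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_zoom_url(url: str) -> Tuple[str, Optional[str]]:
    """Return (meeting_id, passcode) for a /j/<id> style Zoom link, else raise ValueError."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    if parsed.scheme != "https" or not (host == "zoom.us" or host.endswith(".zoom.us")):
        raise ValueError("not a Zoom meeting URL")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) != 2 or parts[0] != "j" or not parts[1].isdigit():
        raise ValueError("expected a path of the form /j/<meeting id>")
    # The passcode is optional and carried in the pwd query parameter.
    passcode = parse_qs(parsed.query).get("pwd", [None])[0]
    return parts[1], passcode
```

Rejecting malformed links early gives the user a clear error instead of a bot that silently fails to join.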
During the Meeting
- The AI listens continuously
- Transcripts appear in real-time
- AI speaks when it has valuable insights
- Use controls to mute/pause as needed
Manual Interaction
- Type custom prompts in the input field
- AI will respond based on conversation context
🏗️ Architecture
project/
├── frontend/ # React TypeScript application
│ ├── src/
│ │ ├── components/ # UI components
│ │ ├── hooks/ # Custom React hooks
│ │ ├── services/ # API and WebSocket services
│ │ └── types/ # TypeScript definitions
│ └── public/ # Static assets
│
├── backend/ # FastAPI Python application
│ ├── app/
│ │ ├── api/ # REST API endpoints
│ │ ├── core/ # Core configuration
│ │ ├── models/ # Database models
│ │ ├── schemas/ # Pydantic schemas
│ │ └── services/ # Business logic
│ ├── alembic/ # Database migrations
│ └── scripts/ # Utility scripts
│
└── docs/ # Additional documentation
🔄 Data Flow
- Meeting Join → Bot joins via MeetingBaaS API
- Audio Stream → Real-time WebSocket from meeting
- Audio Processing → VAD + Speaker identification
- Speech-to-Text → OpenAI Whisper (local)
- AI Analysis → GPT-4 Context Analysis
- Decision Making → Should AI speak?
- Text-to-Speech → ElevenLabs API
- Audio Output → Through meeting bot
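The flow above can be sketched as a chain of pluggable stages. In this illustration the transcription, analysis, and synthesis steps are injected as plain callables so the control flow is visible without any external service. The `Pipeline` class and its method are hypothetical and do not mirror the project's service layer.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Pipeline:
    transcribe: Callable[[bytes], str]               # stands in for Whisper
    analyze: Callable[[List[str]], Optional[str]]    # stands in for GPT-4; None means "stay silent"
    synthesize: Callable[[str], bytes]               # stands in for ElevenLabs
    history: List[str] = field(default_factory=list)

    def process_chunk(self, audio: bytes, has_speech: bool) -> Optional[bytes]:
        """Run one audio chunk through the stages; return audio to play or None."""
        if not has_speech:
            return None  # VAD filtered the chunk out
        text = self.transcribe(audio)
        self.history.append(text)
        insight = self.analyze(self.history)
        if insight is None:
            return None  # decision step: nothing worth saying
        return self.synthesize(insight)
```

Because each stage is a callable, the pipeline can be driven with stubs in tests and with real clients in production.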
🧪 Testing
Backend Tests
```bash
cd backend
uv run pytest
```
Frontend Tests
```bash
cd frontend
npm test
```
End-to-End Pipeline Test
```bash
# With backend running
curl -X POST http://localhost:8000/api/control/test-pipeline
```
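The same request can be issued from Python when curl is not available. This sketch uses only the standard library. The helper names are hypothetical, and the shape of the response body is not documented here, so it is returned as raw text.

```python
import urllib.request
from typing import Tuple


def build_pipeline_test_request(
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Build the POST request for the pipeline test endpoint named above."""
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/control/test-pipeline",
        method="POST",
    )


def run_pipeline_test(
    base_url: str = "http://localhost:8000", timeout: float = 30.0
) -> Tuple[int, str]:
    """Send the request and return (HTTP status, response text). Requires a running backend."""
    request = build_pipeline_test_request(base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

Calling `run_pipeline_test()` with the backend running returns the status code and body, which can then be checked in a script or CI step.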
📊 Performance Considerations
- Audio Chunk Duration: 1 second (configurable)
- AI Response Cooldown: 30 seconds minimum
- Context Window: Last 10 messages
- WebSocket Reconnection: Automatic with 5s interval
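The cooldown and context-window values above can be sketched with a bounded deque and a monotonic clock. The class name `ResponseThrottle` and its methods are hypothetical. The defaults of 30 seconds and 10 messages come from the list above, but how the project enforces them is not shown in this README.

```python
import time
from collections import deque
from typing import Callable, Deque, List, Optional


class ResponseThrottle:
    """Keeps the last N messages and enforces a minimum gap between AI responses."""

    def __init__(
        self,
        cooldown_s: float = 30.0,
        context_size: int = 10,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._cooldown_s = cooldown_s
        self._clock = clock
        self._last_spoke: Optional[float] = None
        self._context: Deque[str] = deque(maxlen=context_size)

    def add_message(self, message: str) -> None:
        # The deque drops the oldest entry automatically once it is full.
        self._context.append(message)

    def context(self) -> List[str]:
        return list(self._context)

    def try_speak(self) -> bool:
        """Return True and start a new cooldown if enough time has passed."""
        now = self._clock()
        if self._last_spoke is not None and now - self._last_spoke < self._cooldown_s:
            return False
        self._last_spoke = now
        return True
```

Injecting the clock keeps the cooldown testable without sleeping for 30 seconds.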
🔒 Security
- API keys stored in environment variables
- CORS configured for local development
- WebSocket connections validated
- Database connections use SSL in production
🐛 Troubleshooting
Common Issues
Audio Not Capturing
- Check microphone permissions
- Verify FFmpeg installation
- Test with api/audio/test-recording
AI Not Responding
- Verify OpenAI API key and GPT-4 access
- Check conversation context window
- Review AI pause status
No Speech Output
- Confirm ElevenLabs API key
- Check TTS mute status
- Verify voice ID configuration
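Several of the checks above can be automated before starting a meeting. The following is a minimal sketch that looks for FFmpeg on the PATH and confirms that the relevant variables are set. The function name `run_diagnostics` and the underscore-separated variable names are assumptions, and the sketch does not verify that the keys are actually valid.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def run_diagnostics(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Return a pass/fail map for the local prerequisites listed above."""
    env = os.environ if env is None else env
    return {
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "openai_key_set": bool(env.get("OPENAI_API_KEY")),
        "elevenlabs_key_set": bool(env.get("ELEVENLABS_API_KEY")),
        "tts_voice_id_set": bool(env.get("TTS_VOICE_ID")),
    }


if __name__ == "__main__":
    for check, ok in run_diagnostics().items():
        print(f"{'OK  ' if ok else 'FAIL'} {check}")
```

Running the script prints one line per check, which narrows down whether a failure is local setup or something on the API side.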
🚀 Deployment
Ngrok Deployment
To expose your local development environment using ngrok:
- Install ngrok (if not already installed):
```bash
# Download from https://ngrok.com/download
# Or use the included ngrok.exe on Windows
```
- Backend Deployment:
```bash
cd backend
# Start backend server first
python -m uvicorn main:app --host 0.0.0.0 --port 8000 --reload
# In another terminal, expose backend
ngrok http 8000
```
- Note the ngrok URL (e.g., https://abc123.ngrok-free.app) and update:
  - Frontend's API_BASE_URL in src/services/api.ts
  - Any webhook URLs in your Zoom integration
- Frontend Deployment:
```bash
cd frontend
# Build production version
npm run build
# Serve frontend
npx serve -s build -l 3000
# In another terminal, expose frontend
ngrok http 3000
```
- Access the Application:
- Frontend: Use the ngrok URL for port 3000
- Backend: Use the ngrok URL for port 8000 for API calls
Production Considerations
- Use environment-specific configurations
- Enable SSL/TLS for WebSocket connections
- Implement proper authentication
- Set up monitoring and logging
- Configure auto-scaling for high load
📝 License
MIT License - see LICENSE file for details
🤝 Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Open a Pull Request
📧 Support
For issues and questions:
- Open an issue on GitHub
- Check existing documentation
- Review API logs for debugging
Built with ❤️ for enhancing meeting productivity with AI
Owner
- Login: azurelotus06
- Kind: user
- Repositories: 1
- Profile: https://github.com/azurelotus06
GitHub Events
Total
- Issues event: 3
- Watch event: 9
- Issue comment event: 2
- Member event: 1
- Push event: 13
- Pull request event: 4
- Create event: 3
Last Year
- Issues event: 3
- Watch event: 9
- Issue comment event: 2
- Member event: 1
- Push event: 13
- Pull request event: 4
- Create event: 3
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 3
- Total pull requests: 2
- Average time to close issues: 1 minute
- Average time to close pull requests: 2 minutes
- Total issue authors: 3
- Total pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 2
- Average time to close issues: 1 minute
- Average time to close pull requests: 2 minutes
- Issue authors: 3
- Pull request authors: 2
- Average comments per issue: 1.0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- azurelotus06 (1)
- skylotus6 (1)
- ashlotus6 (1)
Pull Request Authors
- skylotus6 (1)
- ashlotus6 (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- 1322 dependencies
- @testing-library/dom ^10.4.0
- @testing-library/jest-dom ^6.6.3
- @testing-library/react ^16.3.0
- @testing-library/user-event ^14.5.2
- @types/jest ^29.5.12
- @types/node ^20.11.24
- @types/react ^18.2.61
- @types/react-dom ^18.2.19
- @types/uuid ^9.0.8
- autoprefixer ^10.4.17
- axios ^1.6.7
- date-fns ^3.3.1
- framer-motion ^11.0.5
- lucide-react ^0.515.0
- postcss ^8.4.35
- react ^18.2.0
- react-dom ^18.2.0
- react-hot-toast ^2.4.1
- react-icons ^5.0.1
- react-scripts 5.0.1
- recharts ^2.12.0
- tailwindcss ^3.4.1
- typescript ^5.3.3
- use-sound ^4.0.1
- uuid ^9.0.1
- wavesurfer.js ^7.6.2
- web-vitals ^3.5.2
- aiofiles >=23.2.1
- aiohttp >=3.9.0
- alembic >=1.12.1
- asyncio-mqtt >=0.16.1
- elevenlabs >=0.2.26
- fastapi >=0.104.1
- httpx >=0.25.2
- librosa >=0.10.1
- numpy >=1.24.0
- openai >=1.3.0
- openai-whisper >=20231117
- passlib [bcrypt]>=1.7.4
- psycopg2-binary >=2.9.9
- pyaudio >=0.2.11
- pydantic >=2.5.0
- pydantic-settings >=2.9.1
- python-dotenv >=1.0.0
- python-jose [cryptography]>=3.3.0
- python-multipart >=0.0.6
- requests >=2.31.0
- scipy >=1.11.0
- sounddevice >=0.4.6
- sqlalchemy >=2.0.23
- uvicorn [standard]>=0.24.0
- webrtcvad-wheels >=2.0.10
- websockets >=12.0
- 136 dependencies