https://github.com/atlarge-research/fails
A Framework for Automated Collection and Analysis of Incidents on LLM Services
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity: 16.4%)
Repository
A Framework for Automated Collection and Analysis of Incidents on LLM Services
Basic Info
- Host: GitHub
- Owner: atlarge-research
- Language: Python
- Default Branch: main
- Size: 5.83 MB
Statistics
- Stars: 3
- Watchers: 9
- Forks: 1
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
FAILS: A Framework for Automated Collection and Analysis of Incidents on LLM Services
This repository contains the web application for the FAILS project. It is built using React for the frontend and Flask for the backend.
Large Language Model (LLM) services such as ChatGPT, DALL·E, and Cursor have quickly become essential for society, businesses, and individuals, powering applications such as chatbots, image generation, and code assistance. The complexity of LLM systems makes them prone to failures, which affect their reliability and availability, yet their failure patterns are not well understood. Datasets and studies in this area remain limited, and in particular there is no open-access tool for analyzing LLM service failures based on incident reports. To address these problems, we propose FAILS, the first open-source framework for collecting and analyzing incident reports across different LLM services and providers. FAILS provides comprehensive data collection, analysis, and visualization capabilities:
1. It automatically collects, cleans, and updates incident data through its data scraper and processing components.
2. It provides 17 types of failure analysis, allowing users to explore temporal trends of incidents and to analyze service reliability metrics such as Mean Time to Recovery (MTTR) and Mean Time Between Failures (MTBF).
3. It leverages advanced LLM tools to assist in data analysis and interpretation, enabling users to gain observations and insights efficiently.
All functions are integrated in the backend, allowing users to easily access them through a web-based frontend interface.
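As a concrete illustration of the reliability metrics mentioned above, MTTR and MTBF can be computed from (start, resolved) timestamp pairs roughly as follows. The data layout here is a simplification for illustration, not the actual FAILS schema:

```python
from datetime import datetime

def mttr_hours(incidents):
    """Mean Time to Recovery: average (resolved - started) duration."""
    durations = [(end - start).total_seconds() / 3600 for start, end in incidents]
    return sum(durations) / len(durations)

def mtbf_hours(incidents):
    """Mean Time Between Failures: average gap between consecutive incident starts."""
    starts = sorted(start for start, _ in incidents)
    gaps = [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)

# Three toy incidents as (started, resolved) pairs
incidents = [
    (datetime(2024, 1, 1, 10), datetime(2024, 1, 1, 12)),  # 2 h outage
    (datetime(2024, 1, 3, 10), datetime(2024, 1, 3, 11)),  # 1 h outage
    (datetime(2024, 1, 7, 10), datetime(2024, 1, 7, 13)),  # 3 h outage
]
print(mttr_hours(incidents))  # 2.0
print(mtbf_hours(incidents))  # 72.0
```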
Getting it running
Prerequisites
- Node.js and npm
- Python 3.11 (3.12 and 3.13 were tested and did not work)
- OpenAI API key (optional, but required for the AI plot analysis feature)
Installation
- Install Node.js and npm:
If you haven't installed Node.js and npm, download and install them from the official Node.js website. This will also install npm, which is the package manager for Node.js.
- Install frontend dependencies:
Navigate to the client directory and install the dependencies:
```bash
cd client
npm install
```
- Set up Python virtual environment:
Navigate to the llm_analysis directory and create a virtual environment:
```bash
python -m venv venv
```
Activate the virtual environment:
On macOS and Linux:
```bash
source venv/bin/activate
```
On Windows:
```bash
.\venv\Scripts\activate
```
- Install backend dependencies:
With the virtual environment activated, install the dependencies using pip:
```bash
pip install -r requirements.txt
```
- Configure Environment Variables:
Create a .env file in the server/scripts directory with your API keys:
```env
OPENAI_API_KEY=your_openai_api_key_here
```
Replace your_openai_api_key_here with your actual OpenAI API key.
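Once the key is configured, a quick sanity check can confirm it is visible to the backend process. This sketch assumes python-dotenv (pinned in requirements.txt) has already loaded the `.env` file into the environment; the function name is illustrative:

```python
import os

def get_openai_key():
    """Return the configured OpenAI API key, or fail loudly if it is missing."""
    key = os.environ.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; the AI plot analysis feature will be unavailable")
    return key
```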
Running the Application
- Start the backend server:
#### Development Mode
For development with auto-reload:
In the server directory, ensure the virtual environment is activated, then run:
```bash
python app.py
```
This will start the Flask server on http://localhost:5000.
#### Production Mode
For production deployment using Gunicorn:
```bash
cd server
chmod +x start.sh stop.sh  # Make scripts executable (first time only)
./start.sh                 # Start the server
./stop.sh                  # Stop the server when needed
```
The server will be available at http://localhost:5000.
- Start the frontend development server:
In the client directory, run:
```bash
npm start
```
This will start the React development server on http://localhost:3000.
Features
Dashboard
The main dashboard provides visualization and analysis of LLM service incidents through various plots and metrics.
Failure Analysis Chat
An interactive chat interface that allows users to analyze incident patterns and get AI-powered insights about service reliability. The chat interface:
- Maintains conversation context for follow-up questions
- Provides markdown-formatted responses
- Supports natural language queries about:
- Common failure patterns
- Service reliability trends
- Impact analysis
- Recovery time patterns
- Root cause categorization
Example queries:
- "Sort the service providers by number of incidents in total in the entire dataset and give the timeframe!"
- "Tell me more about the impact levels of incidents"
The analysis is powered by GPT-4o-mini and uses the historical incident data to provide data-backed insights.
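The context-keeping behaviour described above could be sketched with the openai v1 Python client (openai==1.60.1 is pinned in requirements.txt). The function names, system prompt, and history format below are illustrative assumptions, not the actual FAILS internals:

```python
SYSTEM_PROMPT = "You answer questions about LLM service incident data."

def build_messages(history, user_query):
    """System prompt first, prior turns next, new user turn last."""
    return ([{"role": "system", "content": SYSTEM_PROMPT}]
            + list(history)
            + [{"role": "user", "content": user_query}])

def ask(history, user_query):
    from openai import OpenAI  # lazy import: requires OPENAI_API_KEY to be set
    client = OpenAI()
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=build_messages(history, user_query),
    ).choices[0].message.content
    # Record both turns so follow-up questions keep their context
    history.append({"role": "user", "content": user_query})
    history.append({"role": "assistant", "content": reply})
    return reply
```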
AI Plot Analysis
The application includes an AI-powered plot analysis feature that can analyze visualizations and provide insights. To use this feature:
Setup Requirements:
- Ensure you have a valid OpenAI API key
- Add the API key to your `.env` file as described above
- Make sure you're running the application in production mode using the `start.sh` script
Using the Feature:
- Generate plots by selecting services and date range
- Once plots are displayed, find the "AI Plot Analysis" section below the plots
- Choose either:
- A single plot to analyze specific visualizations
- "Analyze All Plots" for a comprehensive summary
- Click "Analyze Plot" to generate AI insights
Analysis Types:
- Single Plot Analysis: Provides detailed insights about specific visualizations
- All Plots Analysis: Generates a comprehensive summary of all plots, highlighting key patterns and insights
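One plausible way to pass a rendered plot to the model is as a base64 data URL inside vision-style message content, which the v1 chat completions API accepts. The function names and prompt below are an illustrative sketch, not the FAILS implementation:

```python
import base64

def image_part(png_bytes):
    """Wrap raw PNG bytes as an image_url content part with a base64 data URL."""
    b64 = base64.b64encode(png_bytes).decode("ascii")
    return {"type": "image_url",
            "image_url": {"url": f"data:image/png;base64,{b64}"}}

def analysis_messages(png_bytes, prompt="Summarize the key patterns in this plot."):
    """Build a single user message combining the text prompt and the plot image."""
    return [{"role": "user",
             "content": [{"type": "text", "text": prompt}, image_part(png_bytes)]}]
```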
Troubleshooting:
- If you see "Please use production server" message, ensure you're running the server using start.sh
- Verify your API key is correctly set in the .env file
- Check the server logs for any API-related errors
Data Collection and Updates
The application includes scripts to collect and update incident data from various LLM providers. There are two main data collection scripts:
- Regular Data Updates - Collects recent incidents:
```bash
cd server/scripts
python run_incident_scrapers.py
```
This script:
- Collects new incidents from OpenAI, Anthropic, Character.AI, and StabilityAI
- Updates the existing incident database with new data
- Runs both the StabilityAI.py and incident_scraper_oac.py scripts
- Historical Data Collection - One-time collection of all historical incidents:
```bash
cd server/scripts/data_gen_modules
python incident_scraper_oac_historical.py
```
This script:
- Collects all available historical incidents
- Creates a complete historical database
- Should be run only once when setting up a new instance
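As a rough illustration of the "update the existing incident database" step, a merge keyed on an incident identifier might look like the sketch below. The field names (`id`, `started_at`) are assumptions for illustration, not the actual FAILS schema:

```python
def merge_incidents(existing, scraped):
    """Merge freshly scraped incidents into stored ones, deduplicating by id."""
    by_id = {inc["id"]: inc for inc in existing}
    for inc in scraped:
        by_id[inc["id"]] = inc  # the newer scrape wins, so updated reports are kept
    return sorted(by_id.values(), key=lambda inc: inc["started_at"])
```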
Troubleshooting Data Collection
If you encounter issues during data collection:
Check the Logs:
- View server/logs/incident_scrapers.log for detailed error messages
- Common issues include network timeouts and parsing errors
Browser Issues:
- If you see WebDriver errors, ensure Chrome is properly installed
- Try running without headless mode for debugging by removing the '--headless=new' option
Data Validation Failures:
- Check that the source websites haven't changed their structure
- Verify network connectivity to all provider status pages
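A lightweight validation pass like the following can flag scraped records with missing fields, which is usually the first symptom of a provider changing its page structure. The required field names here are assumptions, not the actual FAILS schema:

```python
# Fields every scraped incident record is assumed to carry
REQUIRED_FIELDS = ("id", "provider", "title", "started_at")

def validate_record(record):
    """Return the list of required fields that are missing or empty."""
    return [f for f in REQUIRED_FIELDS if not record.get(f)]
```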
Learn More
Some screenshots of the interface:
Code by Nishanthi Srinivasan, Bálint László Szarvas and Sándor Battaglini-Fischer.
Many thanks to Xiaoyu Chu and Prof. Dr. Ir. Alexandru Iosup for the support!
Owner
- Name: @Large Research
- Login: atlarge-research
- Kind: organization
- Email: info@atlarge-research.com
- Website: http://atlarge-research.com/
- Twitter: LargeResearch
- Repositories: 24
- Profile: https://github.com/atlarge-research
Massivizing Computer Systems
GitHub Events
Total
- Issues event: 4
- Watch event: 4
- Issue comment event: 2
- Push event: 1
- Public event: 1
- Fork event: 1
Last Year
- Issues event: 4
- Watch event: 4
- Issue comment event: 2
- Push event: 1
- Public event: 1
- Fork event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 0
- Average time to close issues: 12 days
- Average time to close pull requests: N/A
- Total issue authors: 3
- Total pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 0
- Average time to close issues: 12 days
- Average time to close pull requests: N/A
- Issue authors: 3
- Pull request authors: 0
- Average comments per issue: 0.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- NielsRogge (1)
- chuxiaoyu (1)
- Radu-Nicolae (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- 1422 dependencies
- @babel/plugin-proposal-private-property-in-object ^7.21.11 development
- gh-pages ^6.3.0 development
- @emotion/react ^11.11.1
- @emotion/styled ^11.11.0
- @mui/icons-material ^5.15.11
- @mui/material ^5.15.11
- @mui/x-data-grid ^7.23.0
- @testing-library/jest-dom ^5.17.0
- @testing-library/react ^13.4.0
- @testing-library/user-event ^13.5.0
- body-parser ^1.20.3
- dompurify ^3.0.9
- framer-motion ^11.0.8
- http-proxy-middleware ^2.0.6
- jszip ^3.10.1
- notistack ^3.0.1
- react ^18.2.0
- react-dom ^18.2.0
- react-markdown ^9.0.1
- react-router-dom ^6.22.2
- react-scripts 5.0.1
- remark-gfm ^4.0.0
- web-vitals ^2.1.4
- clsx 1.2.1
- csstype 3.1.3
- goober 2.1.16
- js-tokens 4.0.0
- loose-envify 1.4.0
- notistack 3.0.1
- react 18.3.1
- react-dom 18.3.1
- scheduler 0.23.2
- notistack ^3.0.1
- Flask ==3.0.2
- Flask-CORS ==4.0.0
- Werkzeug ==3.0.1
- anthropic ==0.8.1
- base64io ==1.0.3
- certifi ==2024.2.2
- charset-normalizer ==3.3.2
- gunicorn ==21.2.0
- idna ==3.6
- matplotlib ==3.8.3
- numpy ==1.24.3
- openai ==1.60.1
- pandas ==2.2.1
- pillow ==10.2.0
- python-dateutil ==2.8.2
- python-dotenv ==1.0.0
- pytz ==2024.1
- requests ==2.31.0
- scikit-learn ==1.2.2
- seaborn ==0.13.2
- selenium ==4.27.1
- statsmodels *
- tenacity ==8.2.3
- tensorflow *
- urllib3 ==2.2.1