mer-factory
Workflow Agent for constructing Affect Computing (e.g., Multimodal Emotion Recognition and Reasoning, Sentiment Analysis) datasets.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 1 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.9%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: Lum1104
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://lum1104.github.io/MER-Factory/
- Size: 2.14 MB
Statistics
- Stars: 28
- Watchers: 2
- Forks: 4
- Open Issues: 1
- Releases: 1
Topics
Metadata Files
README.md
👉🏻 MER-Factory 👈🏻
中文  |   English  
Your automated factory for constructing Multimodal Emotion Recognition and Reasoning (MERR) datasets.
```mermaid
graph TD;
    __start__([__start__]):::first
    setup_paths(setup_paths)
    handle_error(handle_error)
    run_au_extraction(run_au_extraction)
    save_au_results(save_au_results)
    generate_audio_description(generate_audio_description)
    save_audio_results(save_audio_results)
    generate_video_description(generate_video_description)
    save_video_results(save_video_results)
    extract_full_features(extract_full_features)
    filter_by_emotion(filter_by_emotion)
    find_peak_frame(find_peak_frame)
    generate_peak_frame_visual_description(generate_peak_frame_visual_description)
    generate_peak_frame_au_description(generate_peak_frame_au_description)
    synthesize_summary(synthesize_summary)
    save_mer_results(save_mer_results)
    run_image_analysis(run_image_analysis)
    synthesize_image_summary(synthesize_image_summary)
    save_image_results(save_image_results)
    __end__([__end__]):::last
    __start__ --> setup_paths;
    extract_full_features --> filter_by_emotion;
    filter_by_emotion -.-> find_peak_frame;
    filter_by_emotion -.-> handle_error;
    filter_by_emotion -.-> save_au_results;
    find_peak_frame --> generate_audio_description;
    generate_audio_description -.-> generate_video_description;
    generate_audio_description -.-> handle_error;
    generate_audio_description -.-> save_audio_results;
    generate_peak_frame_au_description --> synthesize_summary;
    generate_peak_frame_visual_description --> generate_peak_frame_au_description;
    generate_video_description -.-> generate_peak_frame_visual_description;
    generate_video_description -.-> handle_error;
    generate_video_description -.-> save_video_results;
    run_au_extraction --> filter_by_emotion;
    run_image_analysis --> synthesize_image_summary;
    setup_paths -. full_pipeline .-> extract_full_features;
    setup_paths -. audio_pipeline .-> generate_audio_description;
    setup_paths -. video_pipeline .-> generate_video_description;
    setup_paths -.-> handle_error;
    setup_paths -. au_pipeline .-> run_au_extraction;
    setup_paths -. image_pipeline .-> run_image_analysis;
    synthesize_image_summary --> save_image_results;
    synthesize_summary --> save_mer_results;
    handle_error --> __end__;
    save_au_results --> __end__;
    save_audio_results --> __end__;
    save_image_results --> __end__;
    save_mer_results --> __end__;
    save_video_results --> __end__;
    classDef default fill:#f2f0ff,line-height:1.2
    classDef first fill-opacity:0
    classDef last fill:#bfb6fc
```
📚 Please visit the project documentation for detailed installation and usage instructions.
Usage
Basic Command Structure
```bash
python main.py [INPUT_PATH] [OUTPUT_DIR] [OPTIONS]
```
Examples
```bash
# Show all supported args.
python main.py --help

# Full MER pipeline with Gemini (default)
python main.py path/to/video/ output/ --type MER --silent --threshold 0.8

# Using Sentiment Analysis task instead of MERR
python main.py path/to/video/ output/ --type MER --task "Sentiment Analysis" --silent

# Using ChatGPT models
python main.py path/to/video/ output/ --type MER --chatgpt-model gpt-4o --silent

# Using local Ollama models
python main.py path/to/video/ output/ --type MER --ollama-vision-model llava-llama3:latest --ollama-text-model llama3.2 --silent

# Using Hugging Face model
python main.py path/to/video/ output/ --type MER --huggingface-model google/gemma-3n-E4B-it --silent

# Process images instead of videos
python main.py ./images ./output --type MER
```
Note: Run `ollama pull llama3.2` (and likewise for any other model) if an Ollama model is needed. Ollama does not currently support video analysis.
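For example, a minimal sketch of pulling the two Ollama models used in the commands above (model names are those shown in this README):

```bash
# Pull the Ollama models referenced in the examples above
ollama pull llama3.2
ollama pull llava-llama3:latest
```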
Dashboard for Data Curation and Hyperparameter Tuning
We provide an interactive dashboard webpage to facilitate data curation and hyperparameter tuning. The dashboard allows you to test different prompts, save and run configurations, and rate the generated data.
To launch the dashboard, use the following command:
```bash
python dashboard.py
```
Command Line Options
| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| --type | -t | Processing type (AU, audio, video, image, MER) | MER |
| --task | -tk | Analysis task type (MERR, Sentiment Analysis) | MERR |
| --label-file | -l | Path to a CSV file with 'name' and 'label' columns. Optional, for ground truth labels (see the sketch after this table). | None |
| --threshold | -th | Emotion detection threshold (0.0-5.0) | 0.8 |
| --peak_dis | -pd | Steps between peak frame detection (min 8) | 15 |
| --silent | -s | Run with minimal output | False |
| --cache | -ca | Reuse existing audio/video/AU results from previous pipeline runs | False |
| --concurrency | -c | Concurrent files for async processing (min 1) | 4 |
| --ollama-vision-model | -ovm | Ollama vision model name | None |
| --ollama-text-model | -otm | Ollama text model name | None |
| --chatgpt-model | -cgm | ChatGPT model name (e.g., gpt-4o) | None |
| --huggingface-model | -hfm | Hugging Face model ID | None |
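As a sketch of the --label-file format: the 'name' and 'label' columns come from the table above, while the file name and rows below are hypothetical examples.

```bash
# Create a hypothetical ground-truth label file with the required columns
cat > labels.csv <<'EOF'
name,label
video_001,happy
video_002,sad
EOF

# Pass it to the pipeline (paths are illustrative)
python main.py path/to/video/ output/ --type MER --label-file labels.csv
```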
Processing Types
1. Action Unit (AU) Extraction
Extracts facial Action Units and generates natural language descriptions:
```bash
python main.py video.mp4 output/ --type AU
```
2. Audio Analysis
Extracts audio, transcribes speech, and analyzes tone:
```bash
python main.py video.mp4 output/ --type audio
```
3. Video Analysis
Generates comprehensive video content descriptions:
```bash
python main.py video.mp4 output/ --type video
```
4. Image Analysis
Runs the pipeline with image input:
```bash
python main.py ./images ./output --type image
# Note: Image files will automatically use the image pipeline regardless of --type setting
```
5. Full MER Pipeline (Default)
Runs the complete multimodal emotion recognition pipeline:
```bash
python main.py video.mp4 output/ --type MER
# or simply:
python main.py video.mp4 output/
```
Task Types
The --task option allows you to choose between different analysis tasks:
1. Emotion Recognition (Default)
Performs detailed emotion analysis with granular emotion categories:
```bash
python main.py video.mp4 output/ --task "Emotion Recognition"
# or simply omit the --task option since it's the default
python main.py video.mp4 output/
```
2. Sentiment Analysis
Performs sentiment-focused analysis (positive, negative, neutral):
```bash
python main.py video.mp4 output/ --task "Sentiment Analysis"
```
Export the Dataset
To export datasets for curation or training, use the following commands:
For Dataset Curation
```bash
python export.py --output_folder "{output_folder}" --file_type {file_type.lower()} --export_path "{export_path}" --export_csv
```
For Training
```bash
python export.py --input_csv path/to/csv_file.csv --export_format sharegpt
```
Model Support
The tool supports four types of models:
- Google Gemini (default): Requires `GOOGLE_API_KEY` in `.env`
- OpenAI ChatGPT: Requires `OPENAI_API_KEY` in `.env`, specify with `--chatgpt-model`
- Ollama: Local models, specify with `--ollama-vision-model` and `--ollama-text-model`
- Hugging Face: Currently supports multimodal models like `google/gemma-3n-E4B-it`
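For instance, a minimal `.env` sketch with placeholder values (only the key names above come from the project; the values are dummies):

```bash
# .env — replace the placeholder values with your own API keys
GOOGLE_API_KEY=your-gemini-api-key
OPENAI_API_KEY=your-openai-api-key
```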
Note: If using Hugging Face models, concurrency is automatically set to 1 for synchronous processing.
Model Recommendations
When to Use Ollama
Recommended for: Image analysis, Action Unit analysis, text processing, and simple audio transcription tasks.
Benefits:
- ✅ Async support: Ollama supports asynchronous calling, making it ideal for processing large datasets efficiently
- ✅ Local processing: No API costs or rate limits
- ✅ Wide model selection: Visit ollama.com to explore available models
- ✅ Privacy: All processing happens locally
Example usage:
```bash
# Process images with Ollama
python main.py ./images ./output --type image --ollama-vision-model llava-llama3:latest --ollama-text-model llama3.2 --silent

# AU extraction with Ollama
python main.py video.mp4 output/ --type AU --ollama-text-model llama3.2 --silent
```
When to Use ChatGPT/Gemini
Recommended for: Advanced video analysis, complex multimodal reasoning, and high-quality content generation.
Benefits:
- ✅ State-of-the-art performance: Latest GPT-4o and Gemini models offer superior reasoning capabilities
- ✅ Advanced video understanding: Better support for complex video analysis and temporal reasoning
- ✅ High-quality outputs: More nuanced and detailed emotion recognition and reasoning
- ✅ Robust multimodal integration: Excellent performance across text, image, and video modalities
Example usage:
```bash
# ChatGPT
python main.py video.mp4 output/ --type MER --chatgpt-model gpt-4o --silent

# Gemini (default)
python main.py video.mp4 output/ --type MER --silent
```
Trade-offs: API costs and rate limits, but typically provides the highest quality results for complex emotion reasoning tasks.
When to Use Hugging Face Models
Recommended for: cases where you need the latest state-of-the-art models or specific features not available in Ollama.
Custom Model Integration: If you want to use the latest HF models or features that Ollama doesn't support:
- Option 1 - Implement yourself: Navigate to `mer_factory/models/hf_models/__init__.py` to register your own model and implement the needed functions following our existing patterns.
- Option 2 - Request support: Open an issue on our repository to let us know which model you'd like us to support, and we'll consider adding it.
Currently supported models: `google/gemma-3n-E4B-it` and others listed in the HF models directory.
Citation
If you find MER-Factory useful in your research or project, please consider giving us a ⭐! Your support helps us grow and continue improving.
Additionally, if you use MER-Factory in your work, please consider citing us using the following BibTeX entries:
```bibtex
@software{LinMER-Factory2025,
  author = {Lin, Yuxiang and Zheng, Shunchao},
  doi = {10.5281/zenodo.15847351},
  license = {MIT},
  month = {7},
  title = {{MER-Factory}},
  url = {https://github.com/Lum1104/MER-Factory},
  version = {0.1.0},
  year = {2025}
}

@inproceedings{NEURIPS2024_c7f43ada,
  author = {Cheng, Zebang and Cheng, Zhi-Qi and He, Jun-Yan and Wang, Kai and Lin, Yuxiang and Lian, Zheng and Peng, Xiaojiang and Hauptmann, Alexander},
  booktitle = {Advances in Neural Information Processing Systems},
  editor = {A. Globerson and L. Mackey and D. Belgrave and A. Fan and U. Paquet and J. Tomczak and C. Zhang},
  pages = {110805--110853},
  publisher = {Curran Associates, Inc.},
  title = {Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning},
  url = {https://proceedings.neurips.cc/paper_files/paper/2024/file/c7f43ada17acc234f568dc66da527418-Paper-Conference.pdf},
  volume = {37},
  year = {2024}
}
```
Owner
- Name: Lum
- Login: Lum1104
- Kind: user
- Repositories: 3
- Profile: https://github.com/Lum1104
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: MER-Factory
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Yuxiang
    family-names: Lin
    email: lin.yuxiang.contact@gmail.com
    affiliation: Georgia Institute of Technology
    orcid: 'https://orcid.org/0009-0004-7835-9352'
  - given-names: Shunchao
    family-names: Zheng
    email: zsc01036@gmail.com
    affiliation: Georgia Institute of Technology
identifiers:
  - type: doi
    value: 10.5281/zenodo.15847351
repository-code: 'https://github.com/Lum1104/MER-Factory'
url: 'https://lum1104.github.io/MER-Factory/'
license: MIT
date-released: '2025-07-09'
version: 0.1.0
GitHub Events
Total
- Watch event: 21
- Delete event: 1
- Push event: 27
- Pull request event: 1
- Gollum event: 1
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 21
- Delete event: 1
- Push event: 27
- Pull request event: 1
- Gollum event: 1
- Fork event: 2
- Create event: 1
Dependencies
- langchain ==0.3.25
- langchain-community ==0.3.25
- langchain-core ==0.3.65
- langchain-google-genai ==2.1.3
- langchain-ollama ==0.3.3
- langgraph ==0.4.8
- librosa *
- numpy *
- ollama ==0.5.1
- pandas *
- python-dotenv *
- rich *
- timm ==1.0.16
- torch >=2.4.0
- transformers >=4.53.0
- typer ==0.16.0