Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.5%) to scientific vocabulary
Repository
Mixture of Lora Experts
Basic Info
Statistics
- Stars: 2
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
MoLE (Mixture of LoRA Experts)
MoLE is a novel approach to fine-tuning large language models (LLMs) for multiple tasks simultaneously, leveraging the concept of specialized LoRA adapters that dynamically adapt the base model's behavior based on task requirements. MoLE is designed to enhance the versatility and performance of pre-trained LLMs by incorporating task-specific/language-specific adapters. These adapters are automatically selected and merged with the base model during inference, guided by a task classifier trained during the fine-tuning process.
Key Components
Base Model
The MoLE architecture starts with a pre-trained base LLM (such as Llama, Mistral, or Gemma) that serves as the foundation for all tasks. This base model captures general language understanding and can be fine-tuned for specific tasks.
LoRA Adapters
LoRA adapters are specialized modules tailored to individual tasks. Each adapter encapsulates task-specific knowledge and integrates with the base model to enhance its capabilities for that task.
Task Classifier
During fine-tuning, MoLE trains a task classifier that determines the most appropriate LoRA adapter for a given input. The classifier learns to identify task categories and selects the corresponding adapter to apply at runtime.
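To make the idea of merging an adapter into the base model concrete, here is a minimal sketch of the standard LoRA update, W' = W + (alpha / r) * B * A, written in plain Python. It assumes MoLE merges adapters in this standard way, which the README does not confirm. The matrices, the helper names `matmul` and `merge_lora`, and the values are all made up for illustration.

```python
# Illustrative sketch of the standard LoRA merge; not taken from the MoLE code base.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, lora_a, lora_b, alpha, r):
    """Return W + (alpha / r) * (B @ A) without modifying the input matrices."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)  # low-rank update with the same shape as W
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Hypothetical 2x2 base weight with a rank-1 adapter (r = 1).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]    # shape (r, in_features)
B = [[0.5], [0.0]]  # shape (out_features, r)

merged = merge_lora(W, A, B, alpha=2, r=1)
print(merged)
```

Because the update is a product of two thin matrices, each adapter stores far fewer parameters than the weight it modifies, which is what makes keeping one adapter per task practical.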
Workflow
- Fine-tuning: The base model is fine-tuned on a diverse set of tasks.
- LoRA Selection: A task classifier is trained concurrently to predict which LoRA adapter to use for each task.
- Inference: During inference, the task classifier identifies the task category of the input, and MoLE applies the corresponding LoRA adapter to the base model to generate task-specific outputs.
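The inference step above can be sketched as a small router that maps a prompt to a task label and then to an adapter. This is only an illustration: `AdapterRouter`, `keyword_classifier`, and the adapter paths are hypothetical names, and the keyword rule is a toy stand-in for the trained task classifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class AdapterRouter:
    """Selects an adapter identifier for a prompt (hypothetical helper)."""

    classify: Callable[[str], str]  # maps a prompt to a task label
    adapters: Dict[str, str]        # task label -> adapter identifier
    default_task: str               # fallback when the label is unknown

    def select(self, prompt: str) -> str:
        task = self.classify(prompt)
        if task not in self.adapters:
            task = self.default_task
        return self.adapters[task]


def keyword_classifier(prompt: str) -> str:
    """Toy stand-in for the trained task classifier."""
    text = prompt.lower()
    if "translate" in text:
        return "translation"
    if "summarize" in text:
        return "summarization"
    return "general"


# Adapter paths are placeholders, not files shipped with MoLE.
router = AdapterRouter(
    classify=keyword_classifier,
    adapters={
        "translation": "adapters/translation",
        "summarization": "adapters/summarization",
        "general": "adapters/general",
    },
    default_task="general",
)

print(router.select("Translate this sentence into Kannada."))
```

In a real deployment the selected identifier would be used to load or activate the matching adapter on the base model before generation.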
Benefits
- Task Specialization: MoLE enables the base model to adapt dynamically to diverse tasks without catastrophic forgetting.
- Improved Performance: By leveraging task-specific LoRA adapters, MoLE achieves enhanced performance across multiple tasks.
- Scalability: The modular design of MoLE allows for easy integration of new tasks through the addition of custom LoRA adapters.
Getting Started
To use MoLE for your own tasks, follow these steps:
- Prepare Data: Organize your dataset with labeled examples for each task.
- Fine-tuning: Fine-tune the base LLM using MoLE, specifying the tasks of interest.
- Integration: Implement task-specific LoRA adapters and train the task classifier.
- Inference: Deploy MoLE for inference, where it automatically selects and applies the appropriate LoRA adapter based on the task of each input.
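As a rough illustration of the data preparation step, the sketch below flattens per-task prompt lists into (text, label) pairs that a task classifier could be trained on. The function name `build_classifier_examples`, the task names, and the prompts are assumptions made for this example and do not come from the MoLE repository.

```python
from collections import Counter


def build_classifier_examples(task_datasets):
    """Flatten {task_name: [prompts]} into a list of (text, label) pairs."""
    examples = []
    for task_name, prompts in task_datasets.items():
        for prompt in prompts:
            examples.append((prompt, task_name))
    return examples


# Hypothetical per-task prompts; real data would come from the configured datasets.
task_datasets = {
    "translation": ["Translate 'hello' into Hindi.", "Translate 'thank you' into Tamil."],
    "summarization": ["Summarize the following paragraph."],
}

examples = build_classifier_examples(task_datasets)
label_counts = Counter(label for _, label in examples)
print(label_counts)
```

Checking the label counts before training is a cheap way to spot heavily imbalanced tasks, which can bias the classifier toward the largest one.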
Installation
Clone the repo:
git clone https://github.com/adithya-s-k/MoLE
cd MoLE
Create a virtual environment using virtualenv or conda depending on your preferences. We require Python 3.10 or above:
conda create -n mole-venv python=3.10 && conda activate mole-venv
Install the dependencies. For the default installation, you just need:
pip install .
If you want to push your results to the Hugging Face Hub, don't forget to add your access token to the environment variable HUGGING_FACE_HUB_TOKEN. You can do this by running:
huggingface-cli login
Training Parameters
```yaml
# model arguments
base_model:
tokenise_model:
model_type:

# classifier arguments
bert_classifier: true
embedding_classifier: true

# tasks arguments
tasks:
  name_of_task_1:
    dataset_name:
    dataset_subset:
    dataset_split:
    prompt_formate:
    num_epochs: 1
    steps:
  name_of_task_2:
    dataset_name:
    dataset_subset:
    dataset_split:
    prompt_formate:
    num_epochs: 1
    steps:

# lora arguments
adapter: qlora  # either lora or qlora
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

# training arguments
gradient_accumulation_steps: 4
micro_batch_size: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

# wandb arguments to track training
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
```
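Two quantities implied by these parameters are worth working out by hand: the LoRA scaling factor (alpha divided by rank) and the effective batch size per optimizer step. The sketch below assumes underscore-separated key names, the standard LoRA scaling rule, and that the micro batch size is counted per device. The helper names and the `num_devices` parameter are illustrative and not part of MoLE.

```python
# Values copied from the example configuration; key spellings are an assumption.
config = {
    "lora_r": 32,
    "lora_alpha": 16,
    "gradient_accumulation_steps": 4,
    "micro_batch_size": 2,
}


def lora_scale(cfg):
    """Standard LoRA scaling factor applied to the low-rank update."""
    return cfg["lora_alpha"] / cfg["lora_r"]


def effective_batch_size(cfg, num_devices=1):
    """Samples contributing to one optimizer step across all devices."""
    return cfg["micro_batch_size"] * cfg["gradient_accumulation_steps"] * num_devices


print(lora_scale(config))
print(effective_batch_size(config))
```

With the values shown, the adapter update is scaled by 16 / 32 = 0.5, and each optimizer step on a single device sees 2 * 4 = 8 samples.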
Contributing
We welcome contributions to MoLE! If you have ideas for improvements or would like to extend MoLE with new features, please open an issue or submit a pull request on our GitHub repository.
Owner
- Name: Adithya S K
- Login: adithya-s-k
- Kind: user
- Location: Indian
- Company: Cognitivelab
- Website: https://adithyask.com/
- Twitter: adithya_s_k
- Repositories: 60
- Profile: https://github.com/adithya-s-k
Exploring Generative AI • Google DSC Lead'23 • Cloud & Full Stack Engineer • Drones & IoT • FOSS Contributor
GitHub Events
Total
- Watch event: 9
Last Year
- Watch event: 9
Issues and Pull Requests
Last synced: about 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- GitPython >=3.1.41
- aenum ==3.1.15
- colorama *
- datasets >=2.14.0
- huggingface_hub >=0.22.0
- nltk ==3.8.1
- protobuf ==3.20.*
- pycountry *
- pytablewriter *
- rouge_score ==0.1.2
- sacrebleu *
- scikit-learn *
- sentencepiece >=0.1.99
- spacy ==3.7.2
- termcolor ==2.3.0
- torch >=2.0
- transformers >=4.38.0