https://github.com/adithya-s-k/llm-inferencenet
LLM InferenceNet is a C++ project designed to facilitate fast and efficient inference from Large Language Models (LLMs) using a client-server architecture. It enables optimized interactions with pre-trained language models, making deployment on edge devices easier.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.3%) to scientific vocabulary
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
LLM InferenceNet
LLM InferenceNet is a C++-based project designed to achieve fast inference from Large Language Models (LLMs) by leveraging a client-server architecture. The project aims to simplify deployment and to run optimized LLMs on edge devices, ensuring efficient interaction with pre-trained language models.
Introduction
Language models such as LLaMA 2 have shown exceptional capabilities in natural language processing tasks. However, running inference on these large models can be computationally intensive. LLM InferenceNet addresses this challenge by providing a C++ implementation that facilitates fast and efficient inference from pre-trained language models.
Project Structure
The project is organized as follows:
- /server: This directory contains the source code for the C++ implementation of the inference engine and HTTP server.
- /models: In this directory, you can find pre-trained language models used for inference. (Note: Due to model size limitations, you will need to download and place the models in this directory before running the project.)
- /docs: This folder contains documentation related to the project.
- /examples: Explore this directory to find examples demonstrating how to interact with the inference engine and perform inference through the client-server architecture.
To-Do List
The following are some of the key tasks that need to be addressed:
- [ ] Implement the C++ inference engine to load and run pre-trained language models efficiently.
- [ ] Design an API for the client-server communication to send input data to the server for inference.
- [ ] Implement the HTTP server in C++ to handle client requests and responses (a minimal sketch of what this could look like follows this list).
- [ ] Handle concurrent requests efficiently for improved performance.
- [ ] Benchmark and optimize the inference process for faster execution.
- [ ] Support edge devices and optimize LLMs for deployment on resource-constrained environments.
- [ ] Explore model optimization techniques for better performance on edge devices.
Contributions to any of the above tasks or other improvements are highly welcome!
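Since the server and API items above are still open, here is a minimal sketch of what the inference endpoint could look like using Boost.Beast (listed as a dependency under Installation). The `/infer` route and the `run_inference()` stub are illustrative assumptions, not part of the current codebase; port 8080 matches the quick start below.

```cpp
// Minimal sketch of a synchronous HTTP inference server using Boost.Beast.
// NOTE: the /infer route and run_inference() stub are illustrative
// assumptions; the project has not implemented this yet.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
#include <string>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace asio  = boost::asio;
using tcp = asio::ip::tcp;

// Placeholder for the real inference engine: echo the prompt back.
std::string run_inference(const std::string& prompt) {
    return "echo: " + prompt;
}

int main() {
    asio::io_context ioc;
    tcp::acceptor acceptor{ioc, {tcp::v4(), 8080}};
    std::cout << "Listening on :8080\n";

    for (;;) {
        tcp::socket socket{ioc};
        acceptor.accept(socket);  // one request at a time; no concurrency yet

        // Read the HTTP request from the client.
        beast::flat_buffer buffer;
        http::request<http::string_body> req;
        http::read(socket, buffer, req);

        // Route: POST /infer runs the (stubbed) inference engine.
        http::response<http::string_body> res{http::status::ok, req.version()};
        res.set(http::field::content_type, "text/plain");
        if (req.method() == http::verb::post && req.target() == "/infer") {
            res.body() = run_inference(req.body());
        } else {
            res.result(http::status::not_found);
            res.body() = "unknown route";
        }
        res.prepare_payload();
        http::write(socket, res);

        beast::error_code ec;
        socket.shutdown(tcp::socket::shutdown_send, ec);
    }
}
```

A real implementation would replace `run_inference()` with calls into the inference engine and add the concurrent request handling called out in the to-do list, for example by running the `io_context` on a thread pool.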
Installation
To get started with LLM InferenceNet, follow these steps:
- Clone the repository:
  git clone https://github.com/adithya-s-k/LLM-InferenceNet.git
- Navigate to the project directory:
  cd LLM-InferenceNet
- Install the required dependencies (TorchScript, Boost.Beast, etc.).
- Download the pre-trained language models (e.g., LLaMA 2, Vicuna, MPT) and place them in the models/ directory.
- Build the project using the provided CMakeLists.txt file.
- Run the HTTP server executable.
Detailed installation instructions and usage guidelines can be found in the docs/ directory.
Quick start
```bash
mkdir build && cd build
cmake ..
make
./test_SimpleHttpServer   # Run unit tests
./SimpleHttpServer        # Start the HTTP server on port 8080
```
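With the server running, a client can exercise the endpoint. The sketch below assumes the hypothetical `/infer` POST route from the server sketch above; the actual request format is still to be designed (see the to-do list).

```cpp
// Minimal sketch of a client POSTing a prompt to the hypothetical /infer
// endpoint sketched above. The route and request format are assumptions.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace asio  = boost::asio;
using tcp = asio::ip::tcp;

int main() {
    asio::io_context ioc;
    tcp::resolver resolver{ioc};
    beast::tcp_stream stream{ioc};
    stream.connect(resolver.resolve("127.0.0.1", "8080"));

    // Send the prompt as the request body.
    http::request<http::string_body> req{http::verb::post, "/infer", 11};
    req.set(http::field::host, "127.0.0.1");
    req.body() = "Explain client-server inference in one sentence.";
    req.prepare_payload();
    http::write(stream, req);

    // Read and print the model's response.
    beast::flat_buffer buffer;
    http::response<http::string_body> res;
    http::read(stream, buffer, res);
    std::cout << res.body() << std::endl;

    beast::error_code ec;
    stream.socket().shutdown(tcp::socket::shutdown_both, ec);
}
```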
Contribution
Contributions to LLM InferenceNet are highly appreciated. If you have any ideas, bug fixes, or enhancements, please feel free to open an issue or submit a pull request. Together, we can make this project even more powerful and efficient.
Let's work together to bring fast and optimized inference capabilities to large language models using C++ and the client-server architecture, enabling easier deployment on edge devices!
Owner
- Name: Adithya S K
- Login: adithya-s-k
- Kind: user
- Location: India
- Company: Cognitivelab
- Website: https://adithyask.com/
- Twitter: adithya_s_k
- Repositories: 60
- Profile: https://github.com/adithya-s-k
Exploring Generative AI • Google DSC Lead'23 • Cloud & Full Stack Engineer • Drones & IoT • FOSS Contributor
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| raunak kodwani | 6****r | 18 |
| Adithya S K | a****i@g****m | 11 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- kanuar (5)