https://github.com/adithya-s-k/llm-inferencenet
LLM InferenceNet is a C++ project designed to facilitate fast and efficient inference from Large Language Models (LLMs) using a client-server architecture. It enables optimized interactions with pre-trained language models, making deployment on edge devices easier.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.3%) to scientific vocabulary
Basic Info
Statistics
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
LLM InferenceNet
LLM InferenceNet is a C++-based project designed to achieve fast inference from Large Language Models (LLMs) by leveraging a client-server architecture. The project aims to simplify deployment and to run optimized LLMs on edge devices, ensuring efficient interaction with pre-trained language models.
Introduction
Language models such as LLaMA 2 have shown exceptional capabilities in natural language processing tasks. However, running inference on these large models can be computationally intensive. LLM InferenceNet addresses this challenge by providing a C++ implementation that facilitates fast and efficient inference from pre-trained language models.
Project Structure
The project is organized as follows:
- /server: This directory contains the source code for the C++ implementation of the inference engine and HTTP server.
- /models: In this directory, you can find pre-trained language models used for inference. (Note: Due to model size limitations, you will need to download and place the models in this directory before running the project.)
- /docs: This folder contains documentation related to the project.
- /examples: Explore this directory to find examples demonstrating how to interact with the inference engine and perform inference through the client-server architecture.
To-Do List
The following are some of the key tasks that need to be addressed:
- [ ] Implement the C++ inference engine to load and run pre-trained language models efficiently.
- [ ] Design an API for the client-server communication to send input data to the server for inference.
- [ ] Implement the HTTP server in C++ to handle client requests and responses (a minimal sketch of what this could look like follows this list).
- [ ] Handle concurrent requests efficiently for improved performance.
- [ ] Benchmark and optimize the inference process for faster execution.
- [ ] Support edge devices and optimize LLMs for deployment on resource-constrained environments.
- [ ] Explore model optimization techniques for better performance on edge devices.
Contributions to any of the above tasks or other improvements are highly welcome!
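Since the server and API items above are still open, here is a minimal sketch of what the inference endpoint could look like using Boost.Beast (listed as a dependency under Installation). The `/infer` route and the `run_inference()` stub are illustrative assumptions, not part of the current codebase; port 8080 matches the quick start below.

```cpp
// Minimal sketch of a synchronous HTTP inference server using Boost.Beast.
// NOTE: the /infer route and run_inference() stub are illustrative
// assumptions; the project has not implemented this yet.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>
#include <string>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace asio  = boost::asio;
using tcp = asio::ip::tcp;

// Placeholder for the real inference engine: echo the prompt back.
std::string run_inference(const std::string& prompt) {
    return "echo: " + prompt;
}

int main() {
    asio::io_context ioc;
    tcp::acceptor acceptor{ioc, {tcp::v4(), 8080}};
    std::cout << "Listening on :8080\n";

    for (;;) {
        tcp::socket socket{ioc};
        acceptor.accept(socket);  // one request at a time; no concurrency yet

        // Read the HTTP request from the client.
        beast::flat_buffer buffer;
        http::request<http::string_body> req;
        http::read(socket, buffer, req);

        // Route: POST /infer runs the (stubbed) inference engine.
        http::response<http::string_body> res{http::status::ok, req.version()};
        res.set(http::field::content_type, "text/plain");
        if (req.method() == http::verb::post && req.target() == "/infer") {
            res.body() = run_inference(req.body());
        } else {
            res.result(http::status::not_found);
            res.body() = "unknown route";
        }
        res.prepare_payload();
        http::write(socket, res);

        beast::error_code ec;
        socket.shutdown(tcp::socket::shutdown_send, ec);
    }
}
```

A real implementation would replace `run_inference()` with calls into the inference engine and add the concurrent request handling called out in the to-do list, for example by running the `io_context` on a thread pool.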
Installation
To get started with LLM InferenceNet, follow these steps:
- Clone the repository:
  git clone https://github.com/adithya-s-k/LLM-InferenceNet.git
- Navigate to the project directory:
  cd LLM-InferenceNet
- Install the required dependencies (TorchScript, Boost.Beast, etc.).
- Download the pre-trained language models (e.g., LLaMA 2, Vicuna, MPT) and place them in the models/ directory.
- Build the project using the provided CMakeLists.txt file.
- Run the HTTP server executable.
Detailed installation instructions and usage guidelines can be found in the docs/ directory.
Quick start
```bash
mkdir build && cd build
cmake ..
make
./test_SimpleHttpServer   # Run unit tests
./SimpleHttpServer        # Start the HTTP server on port 8080
```
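With the server running, a client can exercise the endpoint. The sketch below assumes the hypothetical `/infer` POST route from the server sketch above; the actual request format is still to be designed (see the to-do list).

```cpp
// Minimal sketch of a client POSTing a prompt to the hypothetical /infer
// endpoint sketched above. The route and request format are assumptions.
#include <boost/asio.hpp>
#include <boost/beast.hpp>
#include <iostream>

namespace beast = boost::beast;
namespace http  = beast::http;
namespace asio  = boost::asio;
using tcp = asio::ip::tcp;

int main() {
    asio::io_context ioc;
    tcp::resolver resolver{ioc};
    beast::tcp_stream stream{ioc};
    stream.connect(resolver.resolve("127.0.0.1", "8080"));

    // Send the prompt as the request body.
    http::request<http::string_body> req{http::verb::post, "/infer", 11};
    req.set(http::field::host, "127.0.0.1");
    req.body() = "Explain client-server inference in one sentence.";
    req.prepare_payload();
    http::write(stream, req);

    // Read and print the model's response.
    beast::flat_buffer buffer;
    http::response<http::string_body> res;
    http::read(stream, buffer, res);
    std::cout << res.body() << std::endl;

    beast::error_code ec;
    stream.socket().shutdown(tcp::socket::shutdown_both, ec);
}
```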
Contribution
Contributions to LLM InferenceNet are highly appreciated. If you have any ideas, bug fixes, or enhancements, please feel free to open an issue or submit a pull request. Together, we can make this project even more powerful and efficient.
Let's work together to bring fast and optimized inference capabilities to large language models using C++ and the client-server architecture, enabling easier deployment on edge devices!
Owner
- Name: Adithya S K
- Login: adithya-s-k
- Kind: user
- Location: India
- Company: Cognitivelab
- Website: https://adithyask.com/
- Twitter: adithya_s_k
- Repositories: 60
- Profile: https://github.com/adithya-s-k
Exploring Generative AI • Google DSC Lead'23 • Cloud & Full Stack Engineer • Drones & IoT • FOSS Contributor
Committers
Last synced: 6 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| raunak kodwani | 6****r | 18 |
| Adithya S K | a****i@g****m | 11 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: less than a minute
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 4
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- kanuar (5)