https://github.com/adithya-s-k/llm-inferencenet

LLM InferenceNet is a C++ project designed to facilitate fast and efficient inference from Large Language Models (LLMs) using a client-server architecture. It enables optimized interactions with pre-trained language models, making deployment on edge devices easier.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 16.3%)

Keywords

cpp llama llamacpp llm
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: adithya-s-k
  • Language: C++
  • Default Branch: main
  • Homepage:
  • Size: 57.6 KB
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
cpp llama llamacpp llm
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme

README.md

LLM InferenceNet

LLM InferenceNet is a C++-based project designed to achieve fast inference from Large Language Models (LLMs) by leveraging a client-server architecture. The project aims to simplify deployment and to run optimized LLMs on edge devices, ensuring efficient interaction with pre-trained language models.

Introduction

Language models such as LLaMA 2 have shown exceptional capabilities in natural language processing tasks. However, running inference on these large models can be computationally intensive. LLM InferenceNet addresses this challenge by providing a C++ implementation that facilitates fast and efficient inference from pre-trained language models.

Project Structure

The project is organized as follows:

  • /server: This directory contains the source code for the C++ implementation of the inference engine and HTTP server.
  • /models: In this directory, you can find pre-trained language models used for inference. (Note: Due to model size limitations, you will need to download and place the models in this directory before running the project.)
  • /docs: This folder contains documentation related to the project.
  • /examples: Explore this directory to find examples demonstrating how to interact with the inference engine and perform inference through the client-server architecture.

To-Do List

The following are some of the key tasks that need to be addressed:

  • [ ] Implement the C++ inference engine to load and run pre-trained language models efficiently.
  • [ ] Design an API for the client-server communication to send input data to the server for inference.
  • [ ] Implement the HTTP server in C++ to handle client requests and responses.
  • [ ] Handle concurrent requests efficiently for improved performance.
  • [ ] Benchmark and optimize the inference process for faster execution.
  • [ ] Support edge devices and optimize LLMs for deployment on resource-constrained environments.
  • [ ] Explore model optimization techniques for better performance on edge devices.

Contributions to any of the above tasks or other improvements are highly welcome!
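The client-server API in the to-do list is not yet specified. As a hedged sketch of what that contract could look like, the request and response might be modeled as plain structs with a minimal serializer. All names below (`InferenceRequest`, `to_json`, the field set) are hypothetical illustrations, not code from this repository:

```cpp
#include <sstream>
#include <string>

// Hypothetical request/response types for the planned client-server API.
// None of these names exist in the repository; they only illustrate the
// kind of contract the to-do items describe.
struct InferenceRequest {
    std::string prompt;
    int max_tokens = 128;
    float temperature = 0.8f;
};

struct InferenceResponse {
    std::string completion;
    int tokens_generated = 0;
};

// Serialize a request as a minimal JSON string (no string escaping; a
// real implementation would use a proper JSON library).
std::string to_json(const InferenceRequest& req) {
    std::ostringstream out;
    out << "{\"prompt\":\"" << req.prompt << "\","
        << "\"max_tokens\":" << req.max_tokens << ","
        << "\"temperature\":" << req.temperature << "}";
    return out.str();
}
```

A client would POST such a payload to the HTTP server, which would run the inference engine and return an `InferenceResponse`-shaped body.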

Installation

To get started with LLM InferenceNet, follow these steps:

  1. Clone the repository: git clone https://github.com/adithya-s-k/LLM-InferenceNet.git
  2. Navigate to the project directory: cd LLM-InferenceNet
  3. Install the required dependencies (TorchScript, Boost.Beast, etc.).
  4. Download the pre-trained language models (e.g., LLAMA 2, Vicuna, MPT) and place them in the models/ directory.
  5. Build the project using the provided CMakeLists.txt file.
  6. Run the HTTP server executable.

Detailed installation instructions and usage guidelines can be found in the docs/ directory.
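The repository's CMakeLists.txt is referenced but not reproduced on this page. A minimal sketch consistent with the quick-start target names (`SimpleHttpServer` and `test_SimpleHttpServer`; the source file paths and the Boost component list are assumptions) might look like:

```cmake
cmake_minimum_required(VERSION 3.16)
project(LLM_InferenceNet CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Boost.Beast is header-only but depends on Boost.System.
find_package(Boost REQUIRED COMPONENTS system)

# Target names match the quick-start commands; source paths are assumed.
add_executable(SimpleHttpServer server/main.cpp)
target_link_libraries(SimpleHttpServer PRIVATE Boost::system)

add_executable(test_SimpleHttpServer server/test_main.cpp)
target_link_libraries(test_SimpleHttpServer PRIVATE Boost::system)
```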

Quick start

```bash
mkdir build && cd build
cmake ..
make
./test_SimpleHttpServer   # Run unit tests
./SimpleHttpServer        # Start the HTTP server on port 8080
```

Contribution

Contributions to LLM InferenceNet are highly appreciated. If you have any ideas, bug fixes, or enhancements, please feel free to open an issue or submit a pull request. Together, we can make this project even more powerful and efficient.

Let's work together to bring fast and optimized inference capabilities to large language models using C++ and the client-server architecture, enabling easier deployment on edge devices!

Owner

  • Name: Adithya S K
  • Login: adithya-s-k
  • Kind: user
  • Location: India
  • Company: Cognitivelab

Exploring Generative AI • Google DSC Lead'23 • Cloud & Full Stack Engineer • Drones & IoT • FOSS Contributor

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 29
  • Total Committers: 2
  • Avg Commits per committer: 14.5
  • Development Distribution Score (DDS): 0.379
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • raunak kodwani (6****r): 18 commits
  • Adithya S K (a****i@g****m): 11 commits

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
  • Issue authors: none
  • Pull request authors: kanuar (5)