instructionspipe

Instructions MapReduce

https://github.com/innernull/instructionspipe

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Instructions MapReduce

Basic Info
  • Host: GitHub
  • Owner: innerNULL
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 354 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Instructions MapReduce

Background

Nowadays LLMs are widely used for text generation tasks (such as QA and summarization), but several potential blockers still prevent users from getting high-quality results:

  • Customized Domain Knowledge: Built-in domain knowledge tends to be generic; it is better to let users define how domain knowledge should be applied.
  • Omission: LLMs cannot always capture the user's intention precisely, so they may casually miss something important.
  • Hallucination
  • Long Input: Input text can sometimes be very long, which increases the chance of omission and hallucination and also causes high latency.

To solve the above problems, I propose the InstructionsMR framework. It is similar to Hadoop MapReduce, but here we map "instructions" into LLM responses, and then reduce those responses into the final results or into the next Map/Reduce's inputs.

Quick Start

(Start an LLM Server Locally)

Start SGLang:

```
CUDA_VISIBLE_DEVICES=2,3 python3 -m sglang.launch_server \
    --model-path hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --port 8765 --host 127.0.0.1 --quantization awq \
    --tensor-parallel-size 2 --device cuda --dtype auto
```

Start vLLM:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --dtype bfloat16 --max_model_len 50000 --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.5 --enable-prefix-caching --port 8765
```

Architecture

Single MapReduce Flow

```mermaid
graph TD
    Input[Source Inputs or Last Map/Reduce Outputs]
    Input --fields scope 1--> InputSubet1[Mapping Inputs 1]
    Input --fields scope 2--> InputSubet2[Mapping Inputs 2]
    Input --fields scope n--> InputSubetN[Mapping Inputs n]
    InputSubet1 --> Instruction1(Map Instruction 1)
    InputSubet2 --> Instruction2(Map Instruction 2)
    InputSubetN --> InstructionN(Map Instruction n)
    Instruction1 --> LlmMapper(LLM Based Mapper)
    Instruction2 --> LlmMapper
    InstructionN --> LlmMapper
    LlmMapper --> MappingOutput1[Mapping Output 1]
    LlmMapper --> MappingOutput2[Mapping Output 2]
    LlmMapper --> MappingOutputN[Mapping Output n]
    MappingOutput1 --> MappingOutputs[Structured Map Outputs]
    MappingOutput2 --> MappingOutputs
    MappingOutputN --> MappingOutputs
    MappingOutputs --fields scope 1--> ReduceInputs1(Reduce Inputs 1)
    MappingOutputs --fields scope 2--> ReduceInputs2(Reduce Inputs 2)
    MappingOutputs --fields scope m--> ReduceInputsM(Reduce Inputs m)
    ReduceInputs1 --> ReduceInstruction1(Reduce Instruction 1)
    ReduceInputs2 --> ReduceInstruction2(Reduce Instruction 2)
    ReduceInputsM --> ReduceInstructionM(Reduce Instruction m)
    ReduceInstruction1 --> LlmReducer(LLM Based Reducer)
    ReduceInstruction2 --> LlmReducer
    ReduceInstructionM --> LlmReducer
    LlmReducer --> ReduceOutput1(Reduce Output 1)
    LlmReducer --> ReduceOutput2(Reduce Output 2)
    LlmReducer --> ReduceOutputM(Reduce Output m)
    ReduceOutput1 --> Outputs[Final Outputs or Next Map/Reduce Inputs]
    ReduceOutput2 --> Outputs
    ReduceOutputM --> Outputs
```

MapReduces Flow

```mermaid
graph TD
    OriginInputs[Original Inputs]
    OriginInputs --> Mapper1(Mapper 1)
    Mapper1 --> MappingOutputs1[Mapping Outputs 1]
    MappingOutputs1 --> Reducer1(Reducer 1)
    Reducer1 --> ReducerOutput1[Reducing Outputs 1]
    ReducerOutput1 --> Mapper2(Mapper 2)
    Mapper2 --> MappingOutputs2[Mapping Outputs 2]
    MappingOutputs2 --> Reducer2(Reducer 2)
    Reducer2 --> ReducerOutput2[Reducing Outputs 2]
```
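The chained flow above can be sketched in a few lines of Python. This is only a schematic, with hypothetical names (`run_pipeline`, `stages`) rather than the project's actual API: each mapper/reducer is a callable from a dict of named fields to a dict of named fields, and each stage's reduce outputs become the next stage's map inputs.

```python
def run_pipeline(inputs: dict, stages: list) -> dict:
    """Run a list of (mapper, reducer) stages in sequence, feeding
    each stage's reduce outputs into the next stage's mapper."""
    data = inputs
    for mapper, reducer in stages:
        data = reducer(mapper(data))
    return data

# Toy stage standing in for LLM calls: the mapper processes every
# field independently (one "instruction" per field), the reducer
# merges the per-field results into one output.
mapper = lambda d: {k: v.upper() for k, v in d.items()}
reducer = lambda d: {"summary": " ".join(d.values())}

out = run_pipeline({"a": "foo", "b": "bar"}, [(mapper, reducer)])
```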

Advantages

Customization

  • Via the configuration of map instructions, users can set:
    • The key information they want to focus on.
    • The fields of data needed to complete the instruction.
  • Via the configuration of reduce instructions, users can set:
    • How to group multiple mapping results into a single reduce result.
    • The format of the final output.

Parallelization

All LLM calls in this implementation are made asynchronously in Python, which means both map and reduce can be parallelized, with an Instruction as the minimum unit of parallelism.

Especially for mapping, this amounts to splitting a single prompt into several small units and running them at the same time. Otherwise, with a long input you may have to generate a long output token by token, sequentially. With mapping, multiple inferences run at the same time, each generating a much shorter output. Of course, this also means the prefix-conditioning (prefill) inference runs multiple times. But since that stage is naturally faster than decoding, and prefix-caching mechanisms can reduce the cost further, the overall latency should still be lower.
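The instruction-level parallelism described above can be illustrated with `asyncio.gather`, which is the standard way to fan out async calls in Python. Here `llm_call` is a toy stand-in for a real async client call to an LLM server, not the project's actual client:

```python
import asyncio

async def llm_call(instruction: str, inputs: dict) -> str:
    """Stand-in for one async LLM request."""
    await asyncio.sleep(0)  # placeholder for network / inference I/O
    return f"{instruction}: {len(inputs)} fields"

async def run_stage(instructions, inputs):
    # Launch one call per instruction; all run concurrently, so the
    # stage's latency is roughly that of one short call rather than
    # one long sequential generation.
    tasks = [llm_call(i, inputs) for i in instructions]
    return await asyncio.gather(*tasks)

results = asyncio.run(
    run_stage(["summarize", "extract meds"], {"notes": [], "labs": []})
)
```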

(Customized) Information Retrieval

When the input is too long, there is a higher probability of hallucination and missing information. So for each instruction, instead of feeding all inputs, we can feed only the relevant information the LLM needs to follow that instruction.

To do the above, each mapper/reducer takes a JSON input in which each key corresponds to one piece of data. In the definition of the Instruction struct, there is a member variable called scope, which defines which input fields will be used when following this instruction.
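A minimal sketch of such an Instruction struct and its scope-based filtering might look like this (hypothetical field and method names; the project's actual definition may differ):

```python
from dataclasses import dataclass, field

@dataclass
class Instruction:
    name: str
    prompt: str
    # Names of the input-JSON fields this instruction is allowed
    # to see; everything else is filtered out before the LLM call.
    scope: list = field(default_factory=list)

    def select_inputs(self, inputs: dict) -> dict:
        """Keep only the fields named in this instruction's scope."""
        return {k: v for k, v in inputs.items() if k in self.scope}

ins = Instruction(
    name="meds",
    prompt="Summarize the patient's medications.",
    scope=["medications", "allergies"],
)
subset = ins.select_inputs({"patient": {}, "medications": [], "labs": []})
```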

Omission & Hallucination Checking

As each map/reduce is an independent LLM call, you can fit any prompt-engineering-based text generation technique into the concrete mapper/reducer implementation to mitigate omission and hallucination.
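For example, one common pattern is to wrap a generation call with a second verification call and regenerate when the check fails. The sketch below uses toy stand-ins for both LLM calls and a hypothetical `with_check` helper; it only illustrates the hook point, not anything the project ships:

```python
def with_check(generate, verify):
    """Wrap a mapper/reducer call with a post-hoc verification call;
    regenerate once if the draft fails verification."""
    def run(inputs):
        draft = generate(inputs)
        return draft if verify(inputs, draft) else generate(inputs)
    return run

# Toy stand-ins for the two LLM calls: a generator and a checker
# that flags omission when an input field is missing from the draft.
gen = lambda inputs: "summary of " + ",".join(sorted(inputs))
ok = lambda inputs, draft: all(k in draft for k in inputs)

checked = with_check(gen, ok)
result = checked({"labs", "medications"})
```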

Drawbacks

Can Not Stream E2E

As each map/reduce (except the initial ones) depends on the previous map/reduce's outputs, we can only stream the final stage's output, not the intermediate ones.

Can Not 100% Eliminate Hallucinations and Omissions

This is true of all LLM-based solutions.

Q&A

  • For long documents, why not use RAG to retrieve the most relevant parts?
    • For a POC, key-based retrieval is enough, but yes, RAG is necessary in the long term.
  • Why not use LangChain or some other prompt-engineering framework?
    • No need to include unnecessary abstractions.

POC

Here I build a POC for EHR document summarization. The input is a semi-structured JSON EHR generated by ChatGPT with a prompt.

We can use a single MR to solve this problem.

Source Inputs

json { "patient": {...}, "allergies": [...], "diagnosis": [...], "encounters": [...], "labs": [...], "medications": [...], "procedures": [...], "visits": [...], "notes": [...] }

Reducer

The reducer here just runs multiple "re-writing" instructions at the same time on the outputs of specific instructions from the mapping stage.

json { "Demography": "...", "Personal Histories": "...", "(Historical) Subjectives": "...", "(Historical) Objectives": "...", "(Historical) Assessments": "...", "(Historical) Plans": "..." }

Final Outputs

```

Demography

...

Personal Histories

...

(Historical) Subjectives

...

(Historical) Objectives

...

(Historical) Assessments

...

(Historical) Plans

... ```

Cite This Work

```
@software{Liu_Instructions-MapReduce_2024,
  author = {Liu, Yutong},
  month = nov,
  title = {{Instructions-MapReduce}},
  url = {https://github.com/innerNULL/instructions-mr/tree/main},
  version = {0.0.1},
  year = {2024}
}
```

References

Owner

  • Name: inull
  • Login: innerNULL
  • Kind: user

All inner NULL will in NULL

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liu"
  given-names: "Yutong"
  orcid: "https://orcid.org/0009-0005-7038-5801"
title: "Instructions-MapReduce"
version: 2.0.4
date-released: 2024-11-13
url: "https://github.com/innerNULL/instructions-mr/tree/main"

GitHub Events

Total
  • Issues event: 1
  • Push event: 162
  • Pull request event: 76
  • Create event: 2
Last Year
  • Issues event: 1
  • Push event: 162
  • Pull request event: 76
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 37
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 19 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 37
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 19 hours
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • HichamINOX (1)
  • innerNULL (1)
Pull Request Authors
  • innerNULL (40)
Top Labels
Issue Labels
Pull Request Labels