instructionspipe

Instructions MapReduce

https://github.com/innernull/instructionspipe

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (6.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Instructions MapReduce

Basic Info
  • Host: GitHub
  • Owner: innerNULL
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 354 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

Instructions MapReduce

Background

Nowadays LLMs are widely used for text generation tasks (such as QA and summarization), but several potential blockers still prevent users from getting high-quality results:

  • Customized Domain Knowledge: Built-in domain knowledge tends to be generic; it is better to let users define how domain knowledge should be applied.
  • Omission: LLMs cannot always capture the user's intention precisely, so they may casually miss something important.
  • Hallucination
  • Long Input: Input text can sometimes be very long, which increases the chance of omission and hallucination and also causes high latency.

To solve the above problems, I propose the InstructionsMR framework. It is similar to Hadoop MapReduce, but here we map "instructions" into LLM responses, and then reduce those responses into the final results or into the next Map/Reduce's inputs.

Quick Start

(Start an LLM Server Locally)

Start SGLang:

```
CUDA_VISIBLE_DEVICES=2,3 python3 -m sglang.launch_server \
    --model-path hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --port 8765 --host 127.0.0.1 --quantization awq \
    --tensor-parallel-size 2 --device cuda --dtype auto
```

Start vLLM:

```
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve hugging-quants/Meta-Llama-3.1-8B-Instruct-AWQ-INT4 \
    --dtype bfloat16 --max_model_len 50000 --tensor-parallel-size 4 \
    --gpu-memory-utilization 0.5 --enable-prefix-caching --port 8765
```

Architecture

Single MapReduce Flow

```mermaid
graph TD
    Input[Source Inputs or Last Map/Reduce Outputs]
    Input --fields scope 1--> InputSubet1[Mapping Inputs 1]
    Input --fields scope 2--> InputSubet2[Mapping Inputs 2]
    Input --fields scope n--> InputSubetN[Mapping Inputs n]
    InputSubet1 --> Instruction1(Map Instruction 1)
    InputSubet2 --> Instruction2(Map Instruction 2)
    InputSubetN --> InstructionN(Map Instruction n)
    Instruction1 --> LlmMapper(LLM Based Mapper)
    Instruction2 --> LlmMapper
    InstructionN --> LlmMapper
    LlmMapper --> MappingOutput1[Mapping Output 1]
    LlmMapper --> MappingOutput2[Mapping Output 2]
    LlmMapper --> MappingOutputN[Mapping Output n]
    MappingOutput1 --> MappingOutputs[Structured Map Outputs]
    MappingOutput2 --> MappingOutputs
    MappingOutputN --> MappingOutputs
    MappingOutputs --fields scope 1--> ReduceInputs1(Reduce Inputs 1)
    MappingOutputs --fields scope 2--> ReduceInputs2(Reduce Inputs 2)
    MappingOutputs --fields scope m--> ReduceInputsM(Reduce Inputs m)
    ReduceInputs1 --> ReduceInstruction1(Reduce Instruction 1)
    ReduceInputs2 --> ReduceInstruction2(Reduce Instruction 2)
    ReduceInputsM --> ReduceInstructionM(Reduce Instruction m)
    ReduceInstruction1 --> LlmReducer(LLM Based Reducer)
    ReduceInstruction2 --> LlmReducer
    ReduceInstructionM --> LlmReducer
    LlmReducer --> ReduceOutput1(Reduce Output 1)
    LlmReducer --> ReduceOutput2(Reduce Output 2)
    LlmReducer --> ReduceOutputM(Reduce Output m)
    ReduceOutput1 --> Outputs[Final Outputs or Next Map/Reduce Inputs]
    ReduceOutput2 --> Outputs
    ReduceOutputM --> Outputs
```

MapReduces Flow

```mermaid
graph TD
    OriginInputs[Original Inputs]
    OriginInputs --> Mapper1(Mapper 1)
    Mapper1 --> MappingOutputs1[Mapping Outputs 1]
    MappingOutputs1 --> Reducer1(Reducer 1)
    Reducer1 --> ReducerOutput1[Reducing Outputs 1]
    ReducerOutput1 --> Mapper2(Mapper 2)
    Mapper2 --> MappingOutputs2[Mapping Outputs 2]
    MappingOutputs2 --> Reducer2(Reducer 2)
    Reducer2 --> ReducerOutput2[Reducing Outputs 2]
```
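The chained flow above can be sketched in a few lines of Python. This is only a schematic, with hypothetical names (`run_pipeline`, `stages`) rather than the project's actual API: each mapper/reducer is a callable from a dict of named fields to a dict of named fields, and each stage's reduce outputs become the next stage's map inputs.

```python
def run_pipeline(inputs: dict, stages: list) -> dict:
    """Run a list of (mapper, reducer) stages in sequence, feeding
    each stage's reduce outputs into the next stage's mapper."""
    data = inputs
    for mapper, reducer in stages:
        data = reducer(mapper(data))
    return data

# Toy stage standing in for LLM calls: the mapper processes every
# field independently (one "instruction" per field), the reducer
# merges the per-field results into one output.
mapper = lambda d: {k: v.upper() for k, v in d.items()}
reducer = lambda d: {"summary": " ".join(d.values())}

out = run_pipeline({"a": "foo", "b": "bar"}, [(mapper, reducer)])
```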

Advantages

Customization

  • Via the configuration of map instructions, users can set:
    • The key information they want to focus on.
    • The fields of data needed to complete the instruction.
  • Via the configuration of reduce instructions, users can set:
    • How to group multiple mapping results into a single reduce result.
    • The format of the final output.

Parallelization

All LLM calls in this implementation are made asynchronously in Python, which means both map and reduce can be parallelized, with an Instruction as the minimum unit of parallelism.

Especially for mapping, this amounts to splitting a single prompt into several small units and running them at the same time. Otherwise, with a long input you may have to generate a long output token by token, sequentially. With mapping, multiple inferences run at the same time, each generating a much shorter output. Of course, this also means the prefix-conditioning (prefill) inference runs multiple times. But since that stage is naturally faster than decoding, and prefix-caching mechanisms can reduce the cost further, the overall latency should still be lower.
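The instruction-level parallelism described above can be illustrated with `asyncio.gather`, which is the standard way to fan out async calls in Python. Here `llm_call` is a toy stand-in for a real async client call to an LLM server, not the project's actual client:

```python
import asyncio

async def llm_call(instruction: str, inputs: dict) -> str:
    """Stand-in for one async LLM request."""
    await asyncio.sleep(0)  # placeholder for network / inference I/O
    return f"{instruction}: {len(inputs)} fields"

async def run_stage(instructions, inputs):
    # Launch one call per instruction; all run concurrently, so the
    # stage's latency is roughly that of one short call rather than
    # one long sequential generation.
    tasks = [llm_call(i, inputs) for i in instructions]
    return await asyncio.gather(*tasks)

results = asyncio.run(
    run_stage(["summarize", "extract meds"], {"notes": [], "labs": []})
)
```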

(Customized) Information Retrieval

When the input is too long, there is a higher probability of hallucination and missing information. So for each instruction, instead of feeding all inputs, we can feed only the relevant information the LLM needs to follow that instruction.

To do the above, each mapper/reducer takes a JSON input in which each key corresponds to one piece of data. In the definition of the Instruction struct, there is a member variable called scope, which defines which input fields will be used when following this instruction.
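A minimal sketch of such an Instruction struct and its scope-based filtering might look like this (hypothetical field and method names; the project's actual definition may differ):

```python
from dataclasses import dataclass, field

@dataclass
class Instruction:
    name: str
    prompt: str
    # Names of the input-JSON fields this instruction is allowed
    # to see; everything else is filtered out before the LLM call.
    scope: list = field(default_factory=list)

    def select_inputs(self, inputs: dict) -> dict:
        """Keep only the fields named in this instruction's scope."""
        return {k: v for k, v in inputs.items() if k in self.scope}

ins = Instruction(
    name="meds",
    prompt="Summarize the patient's medications.",
    scope=["medications", "allergies"],
)
subset = ins.select_inputs({"patient": {}, "medications": [], "labs": []})
```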

Omission & Hallucination Checking

As each map/reduce is an independent LLM call, you can fit any prompt-engineering-based text generation technique into the concrete mapper/reducer implementation to mitigate omission and hallucination.
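For example, one common pattern is to wrap a generation call with a second verification call and regenerate when the check fails. The sketch below uses toy stand-ins for both LLM calls and a hypothetical `with_check` helper; it only illustrates the hook point, not anything the project ships:

```python
def with_check(generate, verify):
    """Wrap a mapper/reducer call with a post-hoc verification call;
    regenerate once if the draft fails verification."""
    def run(inputs):
        draft = generate(inputs)
        return draft if verify(inputs, draft) else generate(inputs)
    return run

# Toy stand-ins for the two LLM calls: a generator and a checker
# that flags omission when an input field is missing from the draft.
gen = lambda inputs: "summary of " + ",".join(sorted(inputs))
ok = lambda inputs, draft: all(k in draft for k in inputs)

checked = with_check(gen, ok)
result = checked({"labs", "medications"})
```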

Drawbacks

Can Not Stream E2E

As each map/reduce (except the initial ones) depends on the previous map/reduce's outputs, we can only stream the final stage's output, not the intermediate ones.

Can Not 100% Eliminate Hallucinations and Omissions

This is true of all LLM-based solutions.

Q&A

  • For long documents, why not use RAG to retrieve the most relevant parts?
    • For a POC, key-based retrieval is enough, but yes, RAG is necessary in the long term.
  • Why not use LangChain or some other prompt-engineering framework?
    • No need to include unnecessary abstractions.

POC

Here I build a POC for EHR document summarization. The input is a semi-structured JSON EHR generated by ChatGPT with a prompt.

We can use a single MR to solve this problem.

Source Inputs

json { "patient": {...}, "allergies": [...], "diagnosis": [...], "encounters": [...], "labs": [...], "medications": [...], "procedures": [...], "visits": [...], "notes": [...] }

Reducer

The reducer here just runs multiple "re-writing" instructions at the same time on the outputs of specific instructions from the mapping stage.

json { "Demography": "...", "Personal Histories": "...", "(Historical) Subjectives": "...", "(Historical) Objectives": "...", "(Historical) Assessments": "...", "(Historical) Plans": "..." }

Final Outputs

```

Demography

...

Personal Histories

...

(Historical) Subjectives

...

(Historical) Objectives

...

(Historical) Assessments

...

(Historical) Plans

... ```

Cite This Work

```
@software{Liu_Instructions-MapReduce_2024,
  author = {Liu, Yutong},
  month = nov,
  title = {{Instructions-MapReduce}},
  url = {https://github.com/innerNULL/instructions-mr/tree/main},
  version = {0.0.1},
  year = {2024}
}
```

References

Owner

  • Name: inull
  • Login: innerNULL
  • Kind: user

All inner NULL will in NULL

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Liu"
  given-names: "Yutong"
  orcid: "https://orcid.org/0009-0005-7038-5801"
title: "Instructions-MapReduce"
version: 2.0.4
date-released: 2024-11-13
url: "https://github.com/innerNULL/instructions-mr/tree/main"

GitHub Events

Total
  • Issues event: 1
  • Push event: 162
  • Pull request event: 76
  • Create event: 2
Last Year
  • Issues event: 1
  • Push event: 162
  • Pull request event: 76
  • Create event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 37
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 19 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 37
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 19 hours
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 29
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • HichamINOX (1)
  • innerNULL (1)
Pull Request Authors
  • innerNULL (40)
Top Labels
Issue Labels
Pull Request Labels