https://github.com/charlespikachu/toolbridge
ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.0%) to scientific vocabulary
Repository
ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
Basic Info
- Host: GitHub
- Owner: CharlesPikachu
- License: apache-2.0
- Default Branch: main
- Homepage: https://arxiv.org/abs/2410.10872
- Size: 259 KB
Statistics
- Stars: 9
- Watchers: 1
- Forks: 2
- Open Issues: 1
- Releases: 1
Metadata Files
README.md
ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities
Through the incorporation of external tools, large language models (LLMs), exemplified by GPT-4o and Llama 3.1, materially expand their functional capabilities, evolving from rudimentary dialogue systems to versatile general-purpose AI assistants. We propose that the foremost contributors to these advancements are the richness and heterogeneity of the training data. However, the existing LLMs with external tool integration provide only limited transparency pertaining to their datasets and data curation methodologies, motivating the initiation of this study. Specifically, this paper endeavors to present a detailed exposition of the methodology for constructing data entries that facilitate LLMs in effectively learning how to utilize external tools and disseminate this process to the public through the introduction of ToolBridge. ToolBridge seeks to harness a collection of open-source datasets as its raw dataset pool and then incorporates a series of strategies to determine appropriate data entries, upon which external tool insertions are executed. By supervised fine-tuning (SFT) on these curated entries, LLMs can develop proficiency in invoking external tools in appropriate contexts to improve their predictive accuracy, particularly for essential functions including numerical computation, data processing and factual retrieval. Our experiments rigorously isolate model architectures and training configurations, zeroing in exclusively on the role of data. The experimental results demonstrate that LLMs trained on ToolBridge consistently achieve superior performance across standard and custom evaluation datasets. To reinforce transparency, we will publicly release all associated code and data as open source, facilitating research on equipping LLMs with external tool capabilities.
Overview of Construction Pipeline

Data Examples from ToolBridge

ToolBridge Construction Pipeline
Coming Soon
Download Dataset
Download from HuggingFace
Load with Python,
```python
from datasets import load_dataset

# Download the dataset from the HuggingFace Hub and print the first training entry.
dataset = load_dataset("BLWX/ToolBridge")
print(dataset['train'][0])
```
Then, you can see,
```python
{'original': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}, 'inserted': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n<python>total_apples = 24\nnum_deer = 3\napples_per_deer = total_apples // num_deer\nprint(apples_per_deer)</python>\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}, 'insertedwithresults': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n<python>total_apples = 24\nnum_deer = 3\napples_per_deer = total_apples // num_deer\nprint(apples_per_deer)</python><result>8</result>\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}}
```
where,
- original: Data entries from public sources.
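- inserted: Data entries in which tool-call code has been inserted between the special tags `<python>` and `</python>`.
- insertedwithresults: Data entries with the special tags `<python>`, `</python>`, `<result>` and `</result>`, where the output of the inserted code appears between `<result>` and `</result>`.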
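To work with these fields, it can help to pull the tagged code and its result out of an assistant message. The following is a minimal sketch, assuming the tags are never nested and that a `<result>` block, when present, directly follows its `</python>` tag, as in the entry above. The helper name `extract_tool_calls` and the shortened sample string are hypothetical and not part of the ToolBridge repository.

```python
import re

# Shortened sample modeled on the 'insertedwithresults' assistant message above.
content = (
    "<python>total_apples = 24\n"
    "num_deer = 3\n"
    "apples_per_deer = total_apples // num_deer\n"
    "print(apples_per_deer)</python><result>8</result>\n"
    "24÷3=8"
)

# Match a <python> block and, optionally, the <result> block right after it.
TOOL_CALL = re.compile(
    r"<python>(.*?)</python>(?:<result>(.*?)</result>)?", re.DOTALL
)


def extract_tool_calls(text):
    """Return (code, result) pairs; result is None if no <result> tag follows."""
    return [(m.group(1), m.group(2)) for m in TOOL_CALL.finditer(text)]


calls = extract_tool_calls(content)
for code, result in calls:
    print(code)
    print("result:", result)
```

Applied to an `inserted` message, which has no `<result>` tags, the same helper returns `None` as the result of each call.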
Download from GitHub
Download ToolBridge datasets,
```sh
wget https://github.com/CharlesPikachu/ToolBridge/releases/download/datasets/toolbridge_178k.pkl
```
Load with Python,
```python
import pickle

# Only unpickle files from sources you trust; pickle can execute arbitrary code.
with open('toolbridge_178k.pkl', 'rb') as f:
    data = pickle.load(f)
print(data[0])
```
Then, you can see,
```python
{'original': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}, 'inserted': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n<python>total_apples = 24\nnum_deer = 3\napples_per_deer = total_apples // num_deer\nprint(apples_per_deer)</python>\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}, 'insertedwithresults': {'messages': [{'role': 'user', 'content': '题目:\n鹿妈妈买了24个苹果,她想平均分给她的3只小鹿吃,每只小鹿可以分到几个苹果?'}, {'role': 'assistant', 'content': '鹿妈妈买了24个苹果,平均分给3只小鹿吃,那么每只小鹿可以分到的苹果数就是总苹果数除以小鹿的只数。\n<python>total_apples = 24\nnum_deer = 3\napples_per_deer = total_apples // num_deer\nprint(apples_per_deer)</python><result>8</result>\n24÷3=8\n每只小鹿可以分到8个苹果。所以,答案是每只小鹿可以分到8个苹果。'}]}}
```
where,
- original: Data entries from public sources.
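- inserted: Data entries in which tool-call code has been inserted between the special tags `<python>` and `</python>`.
- insertedwithresults: Data entries with the special tags `<python>`, `</python>`, `<result>` and `</result>`, where the output of the inserted code appears between `<result>` and `</result>`.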
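The relationship between the inserted and insertedwithresults fields can be illustrated by running each tagged snippet and appending what it prints. The following is a minimal sketch, assuming the code is trusted and that the result is the snippet's standard output. The helper names `run_snippet` and `add_results` are hypothetical. This is not the authors' construction pipeline, which the README lists as "Coming Soon".

```python
import contextlib
import io
import re

PYTHON_BLOCK = re.compile(r"<python>(.*?)</python>", re.DOTALL)


def run_snippet(code):
    """Execute a code string and return its printed output, stripped."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Fresh namespace per snippet. exec is unsafe on untrusted input;
        # a real pipeline would need sandboxing and timeouts.
        exec(code, {})
    return buffer.getvalue().strip()


def add_results(text):
    """Append <result>...</result> after every <python>...</python> block."""
    def _with_result(match):
        output = run_snippet(match.group(1))
        return f"{match.group(0)}<result>{output}</result>"
    return PYTHON_BLOCK.sub(_with_result, text)


# Shortened sample modeled on the 'inserted' assistant message above.
inserted = (
    "<python>total_apples = 24\n"
    "num_deer = 3\n"
    "apples_per_deer = total_apples // num_deer\n"
    "print(apples_per_deer)</python>\n"
    "24÷3=8"
)
with_results = add_results(inserted)
print(with_results)
```

Each snippet runs in its own empty namespace, so variables do not leak between tool calls. Whether the released dataset was built this way is not stated in the README.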
Citation
If you use ToolBridge in your research, please consider citing this project,
```latex
@article{jin2024toolbridge,
  title={ToolBridge: An Open-Source Dataset to Equip LLMs with External Tool Capabilities},
  author={Jin, Zhenchao and Liu, Mengchen and Chen, Dongdong and Zhu, Lingting and Li, Yunsheng and Yu, Lequan},
  journal={arXiv preprint arXiv:2410.10872},
  year={2024}
}
```
Owner
- Login: CharlesPikachu
- Kind: user
- Location: HongKong
- Company: HKU
- Website: https://charlespikachu.github.io/
- Repositories: 27
- Profile: https://github.com/CharlesPikachu
GitHub Events
Total
- Create event: 1
- Release event: 1
- Issues event: 1
- Watch event: 7
- Issue comment event: 1
- Push event: 4
- Fork event: 2
Last Year
- Create event: 1
- Release event: 1
- Issues event: 1
- Watch event: 7
- Issue comment event: 1
- Push event: 4
- Fork event: 2