@superagent-ai/poker-eval

A comprehensive tool for assessing AI Agents performance in simulated poker environments

https://github.com/superagent-ai/poker-eval

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary

Keywords

agents ai evaluation llm llmops

Last synced: 10 months ago

Repository

A comprehensive tool for assessing AI Agents performance in simulated poker environments

Basic Info

Host: GitHub
Owner: superagent-ai
Language: TypeScript
Default Branch: main
Homepage: https://www.npmjs.com/package/@superagent-ai/poker-eval
Size: 202 KB

Statistics

Stars: 19
Watchers: 3
Forks: 3
Open Issues: 0
Releases: 1

Topics

agents ai evaluation llm llmops

Created over 1 year ago · Last pushed over 1 year ago

Metadata Files

Readme Citation

# PokerEval

A comprehensive tool for assessing AI agents' performance in simulated poker environments. Written in TypeScript.

[Leaderboard](#leaderboard-nlth) | [Getting Started](#getting-started) | [Why Poker?](#why-poker) | [Examples](#examples)

Leaderboard NLTH

Each LLM is benchmarked over 1000 hands of a No Limit Texas Holdem ($1/$2) $300 cash game against 2 vanilla gpt-4o models. We will continuously release benchmarks for new models and agents; feel free to open PRs with your own benchmarks.

| Rank | Agent                    | BB/100  |
|------|--------------------------|---------|
| 1    | mistral-large-latest     | +11.26  |
| 2    | gpt-4o                   | -14.78  |
| 3    | claude-3-5-sonnet-latest | -19.95  |
| 4    | gpt-4o-mini              | -45.09  |
| 5    | gemini-1.5-pro-latest    | -166.85 |

Getting started

Install the package

npm i @superagent-ai/poker-eval

Create a game

```ts
// index.ts
import { PokerGame } from "@superagent-ai/poker-eval";
import { Player, PlayerAction } from "@superagent-ai/poker-eval/dist/types";

// See example agent: https://github.com/superagent-ai/poker-eval/blob/main/examples/ai-sdk/agent.ts
import { generateAction } from "./agent";

async function executeGameSimulation(numHands: number): Promise<void> {
  // Set up the AI players
  const players: Player[] = [
    {
      name: "GPT 1",
      action: async (state): Promise<PlayerAction> => {
        // Use any model, framework or code to generate a response
        const action = await generateAction(state);
        return action;
      },
    },
    {
      name: "GPT 2",
      action: async (state): Promise<PlayerAction> => {
        // Use any model, framework or code to generate a response
        const action = await generateAction(state);
        return action;
      },
    },
  ];

  // Set up a game
  const game = new PokerGame(players, {
    defaultChipSize: 1000,
    smallBlind: 1,
    bigBlind: 2,
  });

  // Set the output directory for stats collection
  const results = await game.runSimulation(numHands, { outputPath: "./stats" });

  console.log(`Simulation completed for ${numHands} hands.`);
  console.log("Results:", results);
}

// Execute the function with: ts-node index.ts
executeGameSimulation(5).catch(console.error);
```

Evaluate the agent

After the hands are completed, you can find the dataset in the outputPath you specified above.

| position | holecards | communitycards | bb_profit |
|----------|-----------|----------------|-----------|
| UTG      | Ah Kh     | 2d 7c 9h 3s 5d | 3.5       |
| CO       | Qs Qd     | 2d 7c 9h 3s 5d | -1.0      |
| BTN      | 9c 9s     | 2d 7c 9h 3s 5d | 2.0       |
| SB       | 7h 8h     | 2d 7c 9h 3s 5d | -0.5      |
| BB       | 5c 6c     | 2d 7c 9h 3s 5d | 1.0       |

In this example, the dataset shows the position of the player, their hole cards, the community cards, and the big blind profit (bb_profit) for each hand. The positions are labeled according to standard poker terminology (e.g., UTG for Under the Gun, CO for Cutoff, BTN for Button). The hole cards and community cards use standard card notation (rank followed by suit), and bb_profit is the player's profit or loss for that hand, measured in big blinds.
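For analysis, the card strings in the dataset can be split into rank and suit. The following is a minimal sketch, assuming each card is two characters (rank, then suit, as in `Ah` or `9c`) and that a ten would be written `T`, which the sample rows do not confirm. `Card` and `parseCards` are hypothetical names and are not part of the package.

```ts
// Hypothetical helper types; not exported by @superagent-ai/poker-eval.
type Suit = "h" | "d" | "c" | "s";

interface Card {
  rank: string; // one of "2".."9", "T" (assumed), "J", "Q", "K", "A"
  suit: Suit;
}

const RANKS = "23456789TJQKA";
const SUITS = "hdcs";

// Parse a space-separated card string such as "Ah Kh" into Card objects.
function parseCards(text: string): Card[] {
  return text
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const rank = token.charAt(0);
      const suit = token.charAt(1);
      const valid =
        token.length === 2 && RANKS.includes(rank) && SUITS.includes(suit);
      if (!valid) {
        throw new Error(`Unrecognized card: "${token}"`);
      }
      return { rank, suit: suit as Suit };
    });
}

// Hole cards and community cards from the first sample row.
const holeCards = parseCards("Ah Kh");
const communityCards = parseCards("2d 7c 9h 3s 5d");
console.log(holeCards.length, communityCards.length);
```

Rejecting unknown tokens with an error, rather than skipping them, makes a notation mismatch visible immediately if the real output differs from these assumptions.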

BB/100, or Big Blinds per 100 hands, is a common metric used in poker to measure a player's win rate. It represents the average number of big blinds a player wins or loses over 100 hands. To calculate BB/100, use the formula:

BB/100 = (Total bb_profit / Number of hands) * 100

This formula provides a standardized measure of performance, allowing for comparison across different sessions or players by normalizing the win rate to a per-100-hands basis.
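The formula above can be sketched in TypeScript as follows. This is a minimal sketch: `HandResult` and `bbPer100` are hypothetical names, not part of the package, and only the `bb_profit` field is taken from the dataset description. Purely for the arithmetic, the five sample values are treated as five hands played by a single agent.

```ts
// Hypothetical row shape; only `bb_profit` comes from the dataset description.
interface HandResult {
  bb_profit: number;
}

// BB/100 = (total bb_profit / number of hands) * 100
function bbPer100(hands: HandResult[]): number {
  if (hands.length === 0) {
    throw new Error("Cannot compute BB/100 for zero hands");
  }
  const total = hands.reduce((sum, hand) => sum + hand.bb_profit, 0);
  return (total / hands.length) * 100;
}

// The five bb_profit values from the sample dataset.
const sample: HandResult[] = [
  { bb_profit: 3.5 },
  { bb_profit: -1.0 },
  { bb_profit: 2.0 },
  { bb_profit: -0.5 },
  { bb_profit: 1.0 },
];

console.log(bbPer100(sample).toFixed(2)); // prints "100.00"
```

Here the total is 5.0 big blinds over 5 hands, so the win rate is 1.0 big blind per hand, or 100 BB/100. Over so few hands the number is dominated by variance, which is why the leaderboard uses 1000 hands per model.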

Why Poker?

Poker combines elements of strategy, psychology, risk assessment, and partial information - perfect for testing an Agent's decision-making skills in complex, uncertain environments. Poker provides measurable KPIs like EV, BB/100, All-In adj BB/100 and VPIP. These KPIs are widely recognized standards, not created by a single company, making them ideal for objectively evaluating an Agent's decision-making skills.

We've specifically chosen No Limit Texas Holdem cash games and are officially calling the eval NLTH.

Examples

We've created some examples using popular agent frameworks that you can use as inspiration (feel free to contribute). They live in the repository's examples directory.

Citations

{
  "cff-version": "1.2.0",
  "message": "If you use this software, please cite it as below.",
  "authors": [
    {
      "family-names": "Ismail",
      "given-names": "Pelaseyed"
    }
  ],
  "title": "Superagent PokerEval",
  "date-released": "2024-11-25",
  "url": "https://github.com/superagent-ai/poker-eval"
}

Owner

Name: Superagent
Login: superagent-ai
Kind: organization
Email: ismail@superagent.sh
Location: United States of America

Website: https://superagent.sh
Twitter: superagent_ai
Repositories: 10
Profile: https://github.com/superagent-ai

The open framework for building AI Assistants

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Ismail"
  given-names: "Pelaseyed"
title: "Superagent PokerEval"
date-released: 2024-11-25
url: "https://github.com/superagent-ai/poker-eval"

GitHub Events

Total

Watch event: 17
Public event: 1
Fork event: 2

Last Year

Watch event: 17
Public event: 1
Fork event: 2

Committers

Last synced: about 1 year ago

All Time

Total Commits: 16
Total Committers: 1
Avg Commits per committer: 16.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 16
Committers: 1
Avg Commits per committer: 16.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Ismail Pelaseyed	h**p@g**m	16

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 4
Total pull requests: 15
Average time to close issues: about 11 hours
Average time to close pull requests: 8 minutes
Total issue authors: 1
Total pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 15
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 4
Pull requests: 15
Average time to close issues: about 11 hours
Average time to close pull requests: 8 minutes
Issue authors: 1
Pull request authors: 1
Average comments per issue: 0.0
Average comments per pull request: 0.0
Merged pull requests: 15
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

homanp (4)

Pull Request Authors

homanp (15)

Top Labels

Issue Labels

chore (4)

Pull Request Labels

enhancement (8) chore (2)

Packages

Total packages: 1
Total downloads:
- npm 2 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2
Total maintainers: 1

npmjs.org: @superagent-ai/poker-eval

A poker game simulation library for Node.js

Homepage: https://github.com/superagent-ai/poker-eval#readme
License: MIT
Latest release: 1.0.1
published over 1 year ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 2 Last month

Rankings

Dependent repos count: 25.5%

Average: 31.2%

Dependent packages count: 36.9%

Maintainers (1)

homanp

Last synced: 11 months ago

Dependencies

.github/workflows/build.yaml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-node v3 composite
actions/upload-artifact v3 composite

examples/ai-sdk/package-lock.json npm

@ai-sdk/openai 1.0.4
@ai-sdk/provider 1.0.1
@ai-sdk/provider-utils 2.0.2
@ai-sdk/react 1.0.2
@ai-sdk/ui-utils 1.0.2
@opentelemetry/api 1.9.0
@superagent-ai/poker-eval 1.0.1
@types/diff-match-patch 1.0.36
ai 4.0.3
chalk 5.3.0
client-only 0.0.1
diff-match-patch 1.0.5
dotenv 16.4.5
eventsource-parser 3.0.0
js-tokens 4.0.0
json-schema 0.4.0
jsondiffpatch 0.6.0
loose-envify 1.4.0
nanoid 3.3.7
poker-ts 1.4.0
react 18.3.1
secure-json-parse 2.7.0
swr 2.2.5
throttleit 2.1.0
use-sync-external-store 1.2.2
zod 3.23.8
zod-to-json-schema 3.23.5

examples/ai-sdk/package.json npm

@ai-sdk/openai ^1.0.4
@superagent-ai/poker-eval ^1.0.1
ai ^4.0.3
dotenv ^16.4.5
zod ^3.23.8

examples/langchain/package-lock.json npm

@langchain/core 0.3.19
@langchain/openai 0.3.14
@superagent-ai/poker-eval 1.0.1
@types/node 18.19.66
@types/node-fetch 2.6.12
@types/retry 0.12.0
@types/uuid 10.0.0
abort-controller 3.0.0
agentkeepalive 4.5.0
ansi-styles 5.2.0
asynckit 0.4.0
base64-js 1.5.1
camelcase 6.3.0
combined-stream 1.0.8
commander 10.0.1
decamelize 1.2.0
delayed-stream 1.0.0
dotenv 16.4.5
event-target-shim 5.0.1
eventemitter3 4.0.7
form-data 4.0.1
form-data-encoder 1.7.2
formdata-node 4.4.1
humanize-ms 1.2.1
js-tiktoken 1.0.15
langsmith 0.2.7
mime-db 1.52.0
mime-types 2.1.35
ms 2.1.3
mustache 4.2.0
node-domexception 1.0.0
node-fetch 2.7.0
openai 4.73.1
p-finally 1.0.0
p-queue 6.6.2
p-retry 4.6.2
p-timeout 3.2.0
poker-ts 1.4.0
retry 0.13.1
semver 7.6.3
tr46 0.0.3
undici-types 5.26.5
uuid 10.0.0
web-streams-polyfill 4.0.0-beta.3
webidl-conversions 3.0.1
whatwg-url 5.0.0
zod 3.23.8
zod-to-json-schema 3.23.5

examples/langchain/package.json npm

@langchain/core ^0.3.19
@langchain/openai ^0.3.14
@superagent-ai/poker-eval ^1.0.1
dotenv ^16.4.5
zod ^3.23.8

examples/llama-index/package-lock.json npm

602 dependencies

examples/llama-index/package.json npm

@superagent-ai/poker-eval ^1.0.1
dotenv ^16.4.5
llamaindex ^0.8.22
zod ^3.23.8

examples/mastra/package-lock.json npm

205 dependencies

examples/mastra/package.json npm

@types/node ^22.10.0 development
tsx ^4.19.2 development
typescript ^5.7.2 development
@mastra/core ^0.1.27-alpha.12
@superagent-ai/poker-eval ^1.0.1
dotenv ^16.4.5

examples/openai/package-lock.json npm

@superagent-ai/poker-eval 1.0.1
@types/node 18.19.66
@types/node-fetch 2.6.12
abort-controller 3.0.0
agentkeepalive 4.5.0
asynckit 0.4.0
combined-stream 1.0.8
delayed-stream 1.0.0
dotenv 16.4.5
event-target-shim 5.0.1
form-data 4.0.1
form-data-encoder 1.7.2
formdata-node 4.4.1
humanize-ms 1.2.1
mime-db 1.52.0
mime-types 2.1.35
ms 2.1.3
node-domexception 1.0.0
node-fetch 2.7.0
openai 4.73.1
poker-ts 1.4.0
tr46 0.0.3
undici-types 5.26.5
web-streams-polyfill 4.0.0-beta.3
webidl-conversions 3.0.1
whatwg-url 5.0.0
zod 3.23.8

examples/openai/package.json npm

@superagent-ai/poker-eval ^1.0.1
dotenv ^16.4.5
openai ^4.73.1
zod ^3.23.8

package-lock.json npm

282 dependencies

package.json npm

@types/jest ^29.5.14 development
@types/node ^18.0.0 development
jest ^29.7.0 development
ts-jest ^29.2.5 development
typescript ^4.9.0 development
poker-ts ^1.0.0
