@superagent-ai/poker-eval
A comprehensive tool for assessing AI Agents performance in simulated poker environments
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: superagent-ai
- Language: TypeScript
- Default Branch: main
- Homepage: https://www.npmjs.com/package/@superagent-ai/poker-eval
- Size: 202 KB
Statistics
- Stars: 19
- Watchers: 3
- Forks: 3
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Leaderboard NLTH
Each LLM is benchmarked over 1,000 hands of No Limit Texas Holdem ($1/$2, $300 cash game) against two vanilla gpt-4o models. We will continuously release benchmarks for new models and agents; feel free to open PRs with your own benchmarks.
| Rank | Agent                    | BB/100  |
|------|--------------------------|---------|
| 1    | mistral-large-latest     | +11.26  |
| 2    | gpt-4o                   | -14.78  |
| 3    | claude-3-5-sonnet-latest | -19.95  |
| 4    | gpt-4o-mini              | -45.09  |
| 5    | gemini-1.5-pro-latest    | -166.85 |
Getting started
Install the package
npm i @superagent-ai/poker-eval
Create a game
```ts
// index.ts
import { PokerGame } from "@superagent-ai/poker-eval";
import { Player, PlayerAction } from "@superagent-ai/poker-eval/dist/types";

// See example agent: https://github.com/superagent-ai/poker-eval/blob/main/examples/ai-sdk/agent.ts
import { generateAction } from "./agent";

async function executeGameSimulation(numHands: number): Promise<void> {
  // Define your players here. The player definitions are not included in this
  // listing; see the example agent linked above for how generateAction can
  // supply each player's PlayerAction.
  const players: Player[] = [];

  // Set up a game
  const game = new PokerGame(players, {
    defaultChipSize: 1000,
    smallBlind: 1,
    bigBlind: 2,
  });

  // Set the output directory for stats collection
  const results = await game.runSimulation(numHands, { outputPath: "./stats" });

  console.log(`Simulation completed for ${numHands} hands.`);
  console.log("Results:", results);
}

// Execute the function with: ts-node index.ts
executeGameSimulation(5).catch(console.error);
```
Evaluate the agent
After the hands are completed, you can find the dataset in the outputPath you specified above.
| position | holecards | communitycards | bb_profit |
|----------|-----------|----------------|-----------|
| UTG      | Ah Kh     | 2d 7c 9h 3s 5d | 3.5       |
| CO       | Qs Qd     | 2d 7c 9h 3s 5d | -1.0      |
| BTN      | 9c 9s     | 2d 7c 9h 3s 5d | 2.0       |
| SB       | 7h 8h     | 2d 7c 9h 3s 5d | -0.5      |
| BB       | 5c 6c     | 2d 7c 9h 3s 5d | 1.0       |
In this example, the dataset shows the position of the player, their hole cards, the community cards, and the big blind profit (bb_profit) for each hand. The positions are labeled according to standard poker terminology (e.g., UTG for Under the Gun, CO for Cutoff, BTN for Button, SB for Small Blind, BB for Big Blind). The hole cards and community cards are written in standard card notation (rank followed by suit), and bb_profit is the player's profit or loss in that hand, measured in big blinds.
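The card notation in the table can be turned into structured values with a few lines of code. The following is a minimal sketch, not part of the library: the `Card` interface and `parseCards` helper are hypothetical names, and the rank alphabet (including `T` for ten) is an assumption based on common poker notation rather than something the dataset documents.

```typescript
// Hypothetical helper types; not part of @superagent-ai/poker-eval.
interface Card {
  rank: string; // one of "2".."9", "T", "J", "Q", "K", "A" (assumed alphabet)
  suit: string; // "c", "d", "h" or "s"
}

const RANKS = "23456789TJQKA";
const SUITS = "cdhs";

// Parse a space-separated card string such as "Ah Kh" into Card objects.
function parseCards(text: string): Card[] {
  return text
    .trim()
    .split(/\s+/)
    .filter((token) => token.length > 0)
    .map((token) => {
      const valid =
        token.length === 2 &&
        RANKS.includes(token[0]) &&
        SUITS.includes(token[1]);
      if (!valid) {
        throw new Error(`Invalid card: ${token}`);
      }
      return { rank: token[0], suit: token[1] };
    });
}

const holeCards = parseCards("Ah Kh");
const board = parseCards("2d 7c 9h 3s 5d");
console.log(holeCards.length, board.length);
```

Validating each token up front keeps malformed rows from silently skewing later analysis.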
BB/100, or Big Blinds per 100 hands, is a common metric used in poker to measure a player's win rate. It represents the average number of big blinds a player wins or loses over 100 hands. To calculate BB/100, use the formula:
BB/100 = (Total bb_profit / Number of hands) * 100
This formula provides a standardized measure of performance, allowing for comparison across different sessions or players by normalizing the win rate to a per-100-hands basis.
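As a worked example of the formula, the sketch below computes BB/100 from rows shaped like the sample table. The `HandRow` interface and `bbPer100` function are hypothetical names introduced here for illustration, and the five sample rows are treated as five hands played by one agent, which is a simplification.

```typescript
// Hypothetical row shape mirroring two columns of the example dataset.
interface HandRow {
  position: string;
  bb_profit: number;
}

// BB/100 = (total bb_profit / number of hands) * 100
function bbPer100(rows: HandRow[]): number {
  if (rows.length === 0) {
    throw new Error("Cannot compute BB/100 for zero hands");
  }
  const total = rows.reduce((sum, row) => sum + row.bb_profit, 0);
  return (total / rows.length) * 100;
}

// Values copied from the example table above.
const sample: HandRow[] = [
  { position: "UTG", bb_profit: 3.5 },
  { position: "CO", bb_profit: -1.0 },
  { position: "BTN", bb_profit: 2.0 },
  { position: "SB", bb_profit: -0.5 },
  { position: "BB", bb_profit: 1.0 },
];

// Total profit is 5.0 big blinds over 5 hands, so BB/100 is 100.
console.log(bbPer100(sample).toFixed(2)); // "100.00"
```

With only five hands the figure is dominated by variance, which is why the leaderboard relies on a much larger sample.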
Why Poker?
Poker combines strategy, psychology, risk assessment, and partial information, which makes it well suited to testing an agent's decision-making in complex, uncertain environments. It also provides measurable KPIs such as EV, BB/100, all-in adjusted BB/100, and VPIP. These KPIs are widely recognized standards rather than metrics created by a single company, so they offer an objective basis for evaluating agents.
We've specifically chosen No Limit Texas Holdem cash games and are officially calling the eval NLTH.
Examples
We've created some examples using popular agent frameworks that you can use as inspiration (feel free to contribute).
Citations
json
{
"cff-version": "1.2.0",
"message": "If you use this software, please cite it as below.",
"authors": [
{
"family-names": "Ismail",
"given-names": "Pelaseyed"
}
],
"title": "Superagent PokerEval",
"date-released": "2024-11-25",
"url": "https://github.com/superagent-ai/poker-eval"
}
Owner
- Name: Superagent
- Login: superagent-ai
- Kind: organization
- Email: ismail@superagent.sh
- Location: United States of America
- Website: https://superagent.sh
- Twitter: superagent_ai
- Repositories: 10
- Profile: https://github.com/superagent-ai
The open framework for building AI Assistants
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Ismail"
    given-names: "Pelaseyd"
title: "Superagent PokerEval"
date-released: 2024-11-25
url: "https://github.com/superagent-ai/poker-eval"
GitHub Events
Total
- Watch event: 17
- Public event: 1
- Fork event: 2
Last Year
- Watch event: 17
- Public event: 1
- Fork event: 2
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Ismail Pelaseyed | h****p@g****m | 16 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 15
- Average time to close issues: about 11 hours
- Average time to close pull requests: 8 minutes
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 15
- Average time to close issues: about 11 hours
- Average time to close pull requests: 8 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 15
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- homanp (4)
Pull Request Authors
- homanp (15)
Packages
- Total packages: 1
- Total downloads: 2 last month (npm)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
npmjs.org: @superagent-ai/poker-eval
A poker game simulation library for Node.js
- Homepage: https://github.com/superagent-ai/poker-eval#readme
- License: MIT
- Latest release: 1.0.1 (published about 1 year ago)
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-node v3 composite
- actions/upload-artifact v3 composite
- @ai-sdk/openai 1.0.4
- @ai-sdk/provider 1.0.1
- @ai-sdk/provider-utils 2.0.2
- @ai-sdk/react 1.0.2
- @ai-sdk/ui-utils 1.0.2
- @opentelemetry/api 1.9.0
- @superagent-ai/poker-eval 1.0.1
- @types/diff-match-patch 1.0.36
- ai 4.0.3
- chalk 5.3.0
- client-only 0.0.1
- diff-match-patch 1.0.5
- dotenv 16.4.5
- eventsource-parser 3.0.0
- js-tokens 4.0.0
- json-schema 0.4.0
- jsondiffpatch 0.6.0
- loose-envify 1.4.0
- nanoid 3.3.7
- poker-ts 1.4.0
- react 18.3.1
- secure-json-parse 2.7.0
- swr 2.2.5
- throttleit 2.1.0
- use-sync-external-store 1.2.2
- zod 3.23.8
- zod-to-json-schema 3.23.5
- @ai-sdk/openai ^1.0.4
- @superagent-ai/poker-eval ^1.0.1
- ai ^4.0.3
- dotenv ^16.4.5
- zod ^3.23.8
- @langchain/core 0.3.19
- @langchain/openai 0.3.14
- @superagent-ai/poker-eval 1.0.1
- @types/node 18.19.66
- @types/node-fetch 2.6.12
- @types/retry 0.12.0
- @types/uuid 10.0.0
- abort-controller 3.0.0
- agentkeepalive 4.5.0
- ansi-styles 5.2.0
- asynckit 0.4.0
- base64-js 1.5.1
- camelcase 6.3.0
- combined-stream 1.0.8
- commander 10.0.1
- decamelize 1.2.0
- delayed-stream 1.0.0
- dotenv 16.4.5
- event-target-shim 5.0.1
- eventemitter3 4.0.7
- form-data 4.0.1
- form-data-encoder 1.7.2
- formdata-node 4.4.1
- humanize-ms 1.2.1
- js-tiktoken 1.0.15
- langsmith 0.2.7
- mime-db 1.52.0
- mime-types 2.1.35
- ms 2.1.3
- mustache 4.2.0
- node-domexception 1.0.0
- node-fetch 2.7.0
- openai 4.73.1
- p-finally 1.0.0
- p-queue 6.6.2
- p-retry 4.6.2
- p-timeout 3.2.0
- poker-ts 1.4.0
- retry 0.13.1
- semver 7.6.3
- tr46 0.0.3
- undici-types 5.26.5
- uuid 10.0.0
- web-streams-polyfill 4.0.0-beta.3
- webidl-conversions 3.0.1
- whatwg-url 5.0.0
- zod 3.23.8
- zod-to-json-schema 3.23.5
- @langchain/core ^0.3.19
- @langchain/openai ^0.3.14
- @superagent-ai/poker-eval ^1.0.1
- dotenv ^16.4.5
- zod ^3.23.8
- 602 dependencies
- @superagent-ai/poker-eval ^1.0.1
- dotenv ^16.4.5
- llamaindex ^0.8.22
- zod ^3.23.8
- 205 dependencies
- @types/node ^22.10.0 development
- tsx ^4.19.2 development
- typescript ^5.7.2 development
- @mastra/core ^0.1.27-alpha.12
- @superagent-ai/poker-eval ^1.0.1
- dotenv ^16.4.5
- @superagent-ai/poker-eval 1.0.1
- @types/node 18.19.66
- @types/node-fetch 2.6.12
- abort-controller 3.0.0
- agentkeepalive 4.5.0
- asynckit 0.4.0
- combined-stream 1.0.8
- delayed-stream 1.0.0
- dotenv 16.4.5
- event-target-shim 5.0.1
- form-data 4.0.1
- form-data-encoder 1.7.2
- formdata-node 4.4.1
- humanize-ms 1.2.1
- mime-db 1.52.0
- mime-types 2.1.35
- ms 2.1.3
- node-domexception 1.0.0
- node-fetch 2.7.0
- openai 4.73.1
- poker-ts 1.4.0
- tr46 0.0.3
- undici-types 5.26.5
- web-streams-polyfill 4.0.0-beta.3
- webidl-conversions 3.0.1
- whatwg-url 5.0.0
- zod 3.23.8
- @superagent-ai/poker-eval ^1.0.1
- dotenv ^16.4.5
- openai ^4.73.1
- zod ^3.23.8
- 282 dependencies
- @types/jest ^29.5.14 development
- @types/node ^18.0.0 development
- jest ^29.7.0 development
- ts-jest ^29.2.5 development
- typescript ^4.9.0 development
- poker-ts ^1.0.0