https://github.com/aaronjs99/intelligent-agents

Comparative analysis of Markov decision processes & intelligent agents

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 11.0%)

Keywords

bandit-algorithms linear-programming mdp policy-iteration reinforcement-learning value-iteration
Last synced: 6 months ago

Repository

Comparative analysis of Markov decision processes & intelligent agents

Basic Info
  • Host: GitHub
  • Owner: aaronjs99
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.68 MB
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

Intelligent Agents

A collection of reinforcement learning and intelligent agents projects showcasing implementations of key algorithms and their comparative analysis. These projects were developed as part of the CS747: Foundations of Intelligent and Learning Agents course at IIT Bombay.

Projects Included

Bandits (src/bandits)

Implements and compares classical multi-armed bandit algorithms (ε-greedy, UCB, KL-UCB, and Thompson Sampling), along with a custom variant: Thompson Sampling with a hint. Each algorithm is evaluated on cumulative regret across different horizons and random seeds.

Key Findings:
- Thompson Sampling generally achieves the lowest regret.
- KL-UCB improves over UCB by computing a tighter confidence bound via binary search.
- ε-Greedy performs best at ε ≈ 0.02, balancing exploration and exploitation.
- The "hinted" Thompson Sampling variant leverages knowledge of the true means, improving early performance through a custom Beta-distribution-based selector.

Includes regret plots over multiple seeds and horizons, as well as parameter studies.
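As a rough illustration of the Thompson Sampling approach described above, here is a minimal Bernoulli-bandit sketch. The `true_means` argument and the regret bookkeeping are illustrative assumptions, not the repository's actual `src/bandits` interface:

```python
import random

def thompson_sampling_regret(true_means, horizon, seed=0):
    """Minimal Thompson Sampling sketch for Bernoulli bandits.

    Maintains a Beta(successes+1, failures+1) posterior per arm and
    returns the cumulative (pseudo-)regret over the horizon."""
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n
    failures = [0] * n
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior,
        # then play the arm with the highest sample.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret
```

Because the posterior concentrates on the best arm, the cumulative regret grows only logarithmically with the horizon, which is why Thompson Sampling tends to dominate the other algorithms in the plots.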

MDP Maze Solver (src/mdp)

Solves mazes by modeling them as Markov Decision Processes using:
- Value Iteration
- Linear Programming (via PuLP)
- Howard's Policy Iteration

Pipeline:
1. encoder.py transforms grid mazes into MDPs.
2. solver.py computes the optimal policy.
3. decoder.py reconstructs the shortest path using the policy.

Insights:
- LP is consistently fastest for large mazes.
- Howard's Policy Iteration performs well on small problems but becomes costly as maze complexity grows.
- Visual comparisons confirm that solved mazes follow intuitive paths with minimal steps.

Benchmarks for runtime across methods and visualizations for grid navigation are included.
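To make the Value Iteration step concrete, here is a small tabular sketch. The transition format (`T[s][a]` as a list of `(next_state, prob)` pairs, `R[s][a]` a scalar) is an assumption for illustration and does not mirror the repository's encoder output:

```python
def value_iteration(num_states, num_actions, T, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration sketch.

    T[s][a]: list of (next_state, prob) pairs; R[s][a]: scalar reward.
    Iterates the Bellman optimality backup until the value function
    stops changing, then extracts a greedy policy."""
    V = [0.0] * num_states
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                     for a in range(num_actions))
                 for s in range(num_states)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            break
        V = V_new
    policy = [max(range(num_actions),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                      for s2, p in T[s][a]))
              for s in range(num_states)]
    return V, policy
```

Each sweep is O(states × actions × transitions), which is why the README's benchmarks find LP competitive on large mazes where many such sweeps are needed.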

Windy Gridworld (src/windy_gridworld)

Adapts the Sutton & Barto Windy Gridworld challenge with multiple RL approaches:
- Sarsa (standard and King's moves)
- Sarsa with stochastic wind
- Q-Learning
- Expected Sarsa

Key Results:
- Sarsa with King's moves converges fastest due to shorter episodes.
- Q-Learning and Expected Sarsa outperform standard Sarsa in stability and convergence.
- The stochastic-wind variant adds realistic randomness but slows convergence.
- Paths from all agents are visualized for both deterministic and windy environments.

The gridworld is defined as an episodic MDP with reward shaping, and stepwise convergence plots are produced for each agent.
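The core of the Sarsa agents above is the on-policy TD update paired with ε-greedy action selection. The sketch below uses a dict-based Q-table with hypothetical names; it is not the repository's implementation:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.
    Q is a dict keyed by (state, action); missing entries default to 0."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def sarsa_step(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.99):
    """One Sarsa update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)).
    On-policy: the TD target uses the action a2 actually taken next,
    which is what distinguishes Sarsa from Q-Learning's max over actions."""
    td_target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```

Swapping the target for `max_a Q(s', a)` gives Q-Learning, and averaging over the ε-greedy policy gives Expected Sarsa, which explains the stability differences noted in the results.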

🔧 Running Experiments with run.py

Use the run.py script to run all experiments. It acts as a unified launcher for bandits, MDP solving, verification, visualization, and Windy Gridworld tasks.

Enable --verbose to view subprocess outputs and logs in real time.

Bandits

```bash
python run.py --verbose bandits \
  --instance data/bandits/instances/i-1.txt \
  --algorithm thompson-sampling \
  --rseed 42 \
  --epsilon 0.1 \
  --horizon 1000
```

MDP Maze Solver

```bash
python run.py --verbose solve_mdp \
  --grid data/mdp/grids/grid10.txt \
  --algorithm pi
```

To create a synthetic MDP file:

```bash
python run.py --verbose generate_mdp \
  --num_states 10 \
  --num_actions 5 \
  --gamma 0.95 \
  --mdptype episodic \
  --rseed 42 \
  --output_file src/mdp/tmp/generated_mdp.txt
```

To verify all default grids (10 through 100):

```bash
python run.py --verbose verify_mdp --algorithm vi
```

To verify specific grids:

```bash
python run.py --verbose verify_mdp \
  --algorithm lp \
  --grid data/mdp/grids/grid40.txt data/mdp/grids/grid50.txt
```

To visualize a grid:

```bash
python run.py --verbose visualize_mdp \
  --grid_file data/mdp/grids/grid10.txt \
  --output_file plots/mdp/grid10_unsolved.png
```

To visualize a solved grid:

```bash
python run.py --verbose visualize_mdp \
  --grid_file data/mdp/grids/grid10.txt \
  --path_file data/mdp/paths/path10.txt \
  --output_file plots/mdp/grid10_solved.png
```

Windy Gridworld

```bash
python run.py --verbose windy \
  --episodes 200 \
  --epsilon 0.15 \
  --discount 0.99 \
  --learning-rate 0.5
```

Command Reference (Summary)

| Command          | Description                              |
|------------------|------------------------------------------|
| bandits          | Run multi-armed bandit experiments       |
| windy            | Run Windy Gridworld RL agents            |
| generate_mdp     | Generate synthetic MDP instance files    |
| solve_mdp        | Solve a maze-based MDP using vi/pi/lp    |
| verify_mdp       | Verify path optimality for maze solvers  |
| visualize_mdp    | Create visual output of MDP grid/paths   |

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner

  • Name: Aaron John Sabu
  • Login: aaronjs99
  • Kind: user
  • Location: Los Angeles, California
  • Company: University of California Los Angeles

Mechanical and Aerospace Engineering PhD Candidate | Class of 2027 (hopefully)

Dependencies

requirements.txt pypi
  • black *
  • matplotlib *
  • numpy *
  • pulp *