https://github.com/aaronjs99/intelligent-agents

Comparative analysis of Markov decision processes & intelligent agents

Science Score: 26.0%

This score indicates how likely this project is to be science-related, based on the following indicators:

  • CITATION.cff file
  • codemeta.json file (found)
  • .zenodo.json file (found)
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity (low: 11.0%)

Keywords

bandit-algorithms linear-programming mdp policy-iteration reinforcement-learning value-iteration
Last synced: 6 months ago

Repository

Comparative analysis of Markov decision processes & intelligent agents

Basic Info
  • Host: GitHub
  • Owner: aaronjs99
  • License: MIT
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 1.68 MB
Statistics
  • Stars: 3
  • Watchers: 3
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

Intelligent Agents

A collection of reinforcement learning and intelligent agents projects showcasing implementations of key algorithms and their comparative analysis. These projects were developed as part of the CS747: Foundations of Intelligent and Learning Agents course at IIT Bombay.

Projects Included

Bandits (src/bandits)

Implements and compares classical multi-armed bandit algorithms (ε-greedy, UCB, KL-UCB, and Thompson Sampling), along with a custom variant: Thompson Sampling with a hint. Each algorithm is evaluated on cumulative regret across different horizons and random seeds.

Key Findings:
- Thompson Sampling generally achieves the lowest regret.
- KL-UCB improves over UCB by computing a tighter confidence bound via binary search.
- ε-Greedy performs best at ε ≈ 0.02, balancing exploration and exploitation.
- The "hinted" Thompson Sampling variant leverages knowledge of the true means, improving early performance through a custom Beta-distribution-based selector.

Includes regret plots over multiple seeds and horizons, as well as parameter studies.
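As a rough illustration of the Thompson Sampling approach described above, here is a minimal Bernoulli-bandit sketch. The `true_means` argument and the regret bookkeeping are illustrative assumptions, not the repository's actual `src/bandits` interface:

```python
import random

def thompson_sampling_regret(true_means, horizon, seed=0):
    """Minimal Thompson Sampling sketch for Bernoulli bandits.

    Maintains a Beta(successes+1, failures+1) posterior per arm and
    returns the cumulative (pseudo-)regret over the horizon."""
    rng = random.Random(seed)
    n = len(true_means)
    successes = [0] * n
    failures = [0] * n
    best = max(true_means)
    regret = 0.0
    for _ in range(horizon):
        # Sample a mean estimate for each arm from its Beta posterior,
        # then play the arm with the highest sample.
        samples = [rng.betavariate(successes[a] + 1, failures[a] + 1)
                   for a in range(n)]
        arm = samples.index(max(samples))
        reward = 1 if rng.random() < true_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret
```

Because the posterior concentrates on the best arm, the cumulative regret grows only logarithmically with the horizon, which is why Thompson Sampling tends to dominate the other algorithms in the plots.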

MDP Maze Solver (src/mdp)

Solves mazes by modeling them as Markov Decision Processes using:
- Value Iteration
- Linear Programming (via PuLP)
- Howard's Policy Iteration

Pipeline:
1. encoder.py transforms grid mazes into MDPs.
2. solver.py computes the optimal policy.
3. decoder.py reconstructs the shortest path using the policy.

Insights:
- LP is consistently fastest for large mazes.
- Howard's Policy Iteration performs well on small problems but becomes costly as maze complexity grows.
- Visual comparisons confirm that solved mazes follow intuitive paths with minimal steps.

Benchmarks for runtime across methods and visualizations for grid navigation are included.
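To make the Value Iteration step concrete, here is a small tabular sketch. The transition format (`T[s][a]` as a list of `(next_state, prob)` pairs, `R[s][a]` a scalar) is an assumption for illustration and does not mirror the repository's encoder output:

```python
def value_iteration(num_states, num_actions, T, R, gamma=0.95, tol=1e-8):
    """Tabular value iteration sketch.

    T[s][a]: list of (next_state, prob) pairs; R[s][a]: scalar reward.
    Iterates the Bellman optimality backup until the value function
    stops changing, then extracts a greedy policy."""
    V = [0.0] * num_states
    while True:
        V_new = [max(R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a])
                     for a in range(num_actions))
                 for s in range(num_states)]
        if max(abs(x - y) for x, y in zip(V, V_new)) < tol:
            break
        V = V_new
    policy = [max(range(num_actions),
                  key=lambda a: R[s][a] + gamma * sum(p * V[s2]
                                                      for s2, p in T[s][a]))
              for s in range(num_states)]
    return V, policy
```

Each sweep is O(states × actions × transitions), which is why the README's benchmarks find LP competitive on large mazes where many such sweeps are needed.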

Windy Gridworld (src/windy_gridworld)

Adapts the Sutton & Barto Windy Gridworld challenge with multiple RL approaches:
- Sarsa (standard and King's moves)
- Sarsa with stochastic wind
- Q-Learning
- Expected Sarsa

Key Results:
- Sarsa with King's moves converges fastest due to shorter episodes.
- Q-Learning and Expected Sarsa outperform standard Sarsa in stability and convergence.
- The stochastic-wind variant adds realistic randomness but slows convergence.
- Paths from all agents are visualized for both deterministic and windy environments.

The gridworld is defined as an episodic MDP with reward shaping, and stepwise convergence plots are produced for each agent.
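The core of the Sarsa agents above is the on-policy TD update paired with ε-greedy action selection. The sketch below uses a dict-based Q-table with hypothetical names; it is not the repository's implementation:

```python
import random

def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one.
    Q is a dict keyed by (state, action); missing entries default to 0."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((state, a), 0.0))

def sarsa_step(Q, s, a, r, s2, a2, alpha=0.5, gamma=0.99):
    """One Sarsa update: Q(s,a) += alpha * (r + gamma*Q(s',a') - Q(s,a)).
    On-policy: the TD target uses the action a2 actually taken next,
    which is what distinguishes Sarsa from Q-Learning's max over actions."""
    td_target = r + gamma * Q.get((s2, a2), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (td_target - Q.get((s, a), 0.0))
```

Swapping the target for `max_a Q(s', a)` gives Q-Learning, and averaging over the ε-greedy policy gives Expected Sarsa, which explains the stability differences noted in the results.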

🔧 Running Experiments with run.py

Use the run.py script to run all experiments. It acts as a unified launcher for bandits, MDP solving, verification, visualization, and Windy Gridworld tasks.

Enable --verbose to view subprocess outputs and logs in real time.

Bandits

```bash
python run.py --verbose bandits \
  --instance data/bandits/instances/i-1.txt \
  --algorithm thompson-sampling \
  --rseed 42 \
  --epsilon 0.1 \
  --horizon 1000
```

MDP Maze Solver

```bash
python run.py --verbose solve_mdp \
  --grid data/mdp/grids/grid10.txt \
  --algorithm pi
```

To create a synthetic MDP file:

```bash
python run.py --verbose generate_mdp \
  --num_states 10 \
  --num_actions 5 \
  --gamma 0.95 \
  --mdptype episodic \
  --rseed 42 \
  --output_file src/mdp/tmp/generated_mdp.txt
```

To verify all default grids (10 through 100):

```bash
python run.py --verbose verify_mdp --algorithm vi
```

To verify specific grids:

```bash
python run.py --verbose verify_mdp \
  --algorithm lp \
  --grid data/mdp/grids/grid40.txt data/mdp/grids/grid50.txt
```

To visualize a grid:

```bash
python run.py --verbose visualize_mdp \
  --grid_file data/mdp/grids/grid10.txt \
  --output_file plots/mdp/grid10_unsolved.png
```

To visualize a solved grid:

```bash
python run.py --verbose visualize_mdp \
  --grid_file data/mdp/grids/grid10.txt \
  --path_file data/mdp/paths/path10.txt \
  --output_file plots/mdp/grid10_solved.png
```

Windy Gridworld

```bash
python run.py --verbose windy \
  --episodes 200 \
  --epsilon 0.15 \
  --discount 0.99 \
  --learning-rate 0.5
```

Command Reference (Summary)

| Command          | Description                              |
|------------------|------------------------------------------|
| bandits          | Run multi-armed bandit experiments       |
| windy            | Run Windy Gridworld RL agents            |
| generate_mdp     | Generate synthetic MDP instance files    |
| solve_mdp        | Solve a maze-based MDP using vi/pi/lp    |
| verify_mdp       | Verify path optimality for maze solvers  |
| visualize_mdp    | Create visual output of MDP grid/paths   |

License

This project is licensed under the MIT License. See the LICENSE file for details.

Owner

  • Name: Aaron John Sabu
  • Login: aaronjs99
  • Kind: user
  • Location: Los Angeles, California
  • Company: University of California Los Angeles

Mechanical and Aerospace Engineering PhD Candidate | Class of 2027 (hopefully)

Dependencies

requirements.txt pypi
  • black *
  • matplotlib *
  • numpy *
  • pulp *