https://github.com/cvxgrp/vgi
Value-gradient iteration for convex stochastic control
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.4%) to scientific vocabulary
Keywords
Repository
Value-gradient iteration for convex stochastic control
Basic Info
- Host: GitHub
- Owner: cvxgrp
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://stanford.edu/~boyd/papers/vgi.html
- Size: 675 KB
Statistics
- Stars: 9
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Releases: 1
Topics
Metadata Files
README.md
VGI - A method for convex stochastic control
Value-gradient iteration (VGI) is a method for designing policies for convex stochastic control problems characterized by random linear dynamics and convex stage cost. We consider policies that employ quadratic approximate value functions as a substitute for the true value function. Evaluating the associated control policy involves solving a convex problem, typically a quadratic program, which can be carried out reliably in real-time. VGI fits the gradient of the value function with regularization that can include constraints reflecting known bounds on the true value function. Our value-gradient iteration method can yield a good approximate value function with few samples, and little hyperparameter tuning.
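To make the fitting step concrete, here is a minimal conceptual sketch, not the implementation used in this package, of fitting a quadratic $\hat V(x) = (1/2)x^TPx + p^Tx$ so that its gradient $Px + p$ matches sampled value-gradient estimates, with a ridge penalty for regularization and a positive semidefinite constraint on $P$ as an example of a known-structure constraint. The sampled states `xs`, the gradient estimates `gs`, and the penalty weight are placeholder assumptions.

```python
# Conceptual sketch only: regularized least-squares fit of a quadratic value
# function's gradient P x + p to sampled value-gradient estimates.
import cvxpy as cp
import numpy as np

n, N = 4, 50                       # state dimension and number of samples (assumed)
rng = np.random.default_rng(0)
xs = rng.standard_normal((N, n))   # sampled states (placeholder data)
gs = rng.standard_normal((N, n))   # value-gradient estimates at those states (placeholder data)

P = cp.Variable((n, n), PSD=True)  # PSD constraint reflecting convexity of the value function
p = cp.Variable(n)

fit = sum(cp.sum_squares(P @ xs[i] + p - gs[i]) for i in range(N))
reg = 1e-3 * (cp.sum_squares(P) + cp.sum_squares(p))  # simple ridge regularization
cp.Problem(cp.Minimize(fit + reg)).solve()
```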
For more details, see our manuscript.
To install locally, clone the repository and run `pip install -e .` in the repo directory. Optionally, create a pyenv or conda environment first. Note that the examples require additional dependencies, `torch` and `cvxpylayers`.
Convex stochastic control
We consider convex stochastic control problems, which have dynamics $$x_{t+1} = A_t x_t + B_t u_t + c_t,$$ where $x_t$ is the state, $u_t$ is the input, and $(A_t, B_t, c_t)$ may be random (but independent in time).
The goal is to minimize the average cost $$J = \lim_{T\to\infty}\frac{1}{T} \sum_{t=0}^{T-1} g(x_t, u_t),$$ where $g$ is a convex stage cost. The stage cost can take on infinite values, to represent constraints on $(x_t, u_t)$.
We consider approximate dynamic programming (ADP) control policies of the form $$\phi(x_t) = \text{argmin}_u \left(g(x_t, u) + \mathbf{E}\, \hat V(A_t x_t + B_t u + c_t)\right),$$ where $\hat V$ is a quadratic approximate value function of the form $\hat V(x) = (1/2)x^T P x + p^T x$. If $\hat V$ is an optimal value function, then the ADP policy is also optimal.
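Concretely, for a given quadratic $\hat V$, evaluating the ADP policy at a state amounts to solving one small convex problem. The plain-CVXPY sketch below is only illustrative and is not this package's COCP class; all problem data ($A$, $B$, $c$, $Q$, $R$, $P$, $p$, $u_{\max}$, and the state $x$) are placeholders, and the expectation over random dynamics is dropped by taking $(A, B, c)$ fixed.

```python
# Illustrative sketch: evaluate the ADP policy phi(x) for fixed (A, B, c) and
# a quadratic V hat given by (P, p), with a quadratic stage cost and a box on u.
import cvxpy as cp
import numpy as np

n, m = 4, 2                     # placeholder dimensions
rng = np.random.default_rng(0)
A, B, c = rng.standard_normal((n, n)), rng.standard_normal((n, m)), np.zeros(n)
Q, R = np.eye(n), np.eye(m)     # stage cost weights (placeholders)
P, p = np.eye(n), np.zeros(n)   # parameters of the quadratic V hat (placeholders)
u_max = 1.0
x = rng.standard_normal(n)      # current state

u = cp.Variable(m)
x_next = A @ x + B @ u + c
stage = x @ Q @ x + cp.quad_form(u, R)
# (1/2) x_next^T P x_next written via a Cholesky factor of P so convexity is explicit
L = np.linalg.cholesky(P)
V_hat = 0.5 * cp.sum_squares(L.T @ x_next) + p @ x_next
cp.Problem(cp.Minimize(stage + V_hat), [cp.norm(u, "inf") <= u_max]).solve()
print(u.value)                  # the input phi(x) chosen by the ADP policy
```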
Example
In this example, we have a box-constrained linear quadratic regulator (LQR) problem, with dynamics $$x_{t+1} = Ax_t + Bu_t + c_t,$$ where $A$ and $B$ are fixed and $c_t$ is a zero-mean Gaussian random variable. The stage cost is $$g(x_t,u_t) = x_t^T Q x_t + u_t^T R u_t + I(\|u_t\|_{\infty} \le u_{\max}),$$ where $Q$ and $R$ are positive semidefinite matrices and the last term is an indicator function that encodes the constraint that the entries of $u_t$ all have magnitude at most $u_{\max}$.
We can initialize a ControlProblem instance with state $x_t\in\mathbf{R}^{12}$ and input $u_t\in\mathbf{R}^{3}$ with
```python
n = 12
m = 3
problem = BoxLQRProblem.create_problem_instance(n, m, seed=0, processes=5)
```
Adding the extra argument processes=5 lets us run simulations in parallel using 5 processes. By default, the cost is evaluated by simulating eval_trajectories=5 trajectories, each for eval_horizon=2*10**4 steps.
We can get a quadratic lower bound on the optimal value function with
```python
V_lb = problem.V_lb()
```
To create an ADP policy and an MPC policy with 30-step lookahead, we call
```python
policy = problem.create_policy(compile=True, name="box_lqr_policy", V=V_lb)
mpc = problem.create_policy(lookahead=30, compile=True, name="box_lqr_policy")
```
Setting the argument compile=True generates a custom solver implementation in C using CVXPYgen.
To find an ADP policy using VGI, we run
```python
# initialize VGI method
vgi_method = vgi.VGI(
    problem,
    policy,
    vgi.QuadGradReg(),
    trajectory_len=50,
    num_trajectories=1,
    damping=0.5,
)

# find ADP policy by running VGI for 20 iterations
adp_policy = vgi_method(20)
```
To simulate the policy for 100 steps and plot the state trajectories, we can run
```python
simulation = problem.simulate(adp_policy, 100)

import matplotlib.pyplot as plt

plt.plot(simulation.states_matrix)
plt.show()
```
To evaluate the average cost of the policy via simulation, we can run
```python
adp_cost = problem.cost(adp_policy)
```
Defining your own control problems
Examples of control problems can be found in the examples folder. To set up a new control problem, we can inherit from the ControlProblem class. For example, to create a linear quadratic regulator (LQR) problem, we might write
```python
import numpy as np

from vgi import ControlProblem


class LQRProblem(ControlProblem):
    def __init__(self, A, B, Q, R, **kwargs):
        """Constructor for LQR problem"""
        self.A = A
        self.B = B
        self.Q = Q
        self.R = R
        n, m = B.shape
        super().__init__(n, m, **kwargs)

    def step(self, x, u):
        """Dynamics for simulation. Returns next state, noise/observation/measurements, and stage cost."""
        c = np.random.randn(self.n)
        x_next = self.A @ x + self.B @ u + c
        stage_cost = x.T @ self.Q @ x + u.T @ self.R @ u
        return x_next, c, stage_cost

    def sample_initial_condition(self):
        return np.random.randn(self.n)
```
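As a quick, illustrative check of the step contract (next state, noise, stage cost), one might exercise the class with placeholder identity matrices:

```python
import numpy as np

# placeholder problem data, just to exercise the interface
A, B, Q, R = np.eye(3), np.eye(3), np.eye(3), np.eye(3)

problem = LQRProblem(A, B, Q, R)
x0 = problem.sample_initial_condition()
x_next, noise, stage_cost = problem.step(x0, np.zeros(3))
print(x_next.shape, stage_cost)
```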
To create a corresponding policy for the LQRProblem, we can create an LQRPolicy, which inherits from COCP, the class for convex optimization control policies (COCPs):
```python
import cvxpy as cp

from vgi import COCP


class LQRPolicy(COCP):
    def stage_cost(self, x, u):
        constraints = []
        return cp.quad_form(x, self.Q) + cp.quad_form(u, self.R), constraints
```
The stage cost function takes in CVXPY variables x and u, and returns an expression for the stage cost, along with any constraints on x and u. The COCP constructor takes the state and control dimensions n and m as arguments, as well as any additional named parameters, such as the positive semidefinite cost matrices Q and R and the dynamics matrices A and B.
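For instance, a box constraint on the input could be expressed by returning it from the stage cost. The BoxedLQRPolicy class below and its u_max parameter are hypothetical, written only by analogy with the LQRPolicy above; the exact parameters available on the policy depend on what is passed to the COCP constructor.

```python
import cvxpy as cp

from vgi import COCP


class BoxedLQRPolicy(COCP):  # hypothetical, by analogy with LQRPolicy above
    def stage_cost(self, x, u):
        # quadratic stage cost, plus a box constraint on the input returned
        # as a CVXPY constraint (u_max assumed passed as a named parameter)
        constraints = [cp.norm(u, "inf") <= self.u_max]
        return cp.quad_form(x, self.Q) + cp.quad_form(u, self.R), constraints
```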
For example, suppose we have an LQR problem with state dimension 3, input dimension 2, and randomly generated dynamics:
```python
import numpy as np

# problem dimensions
n = 3
m = 2

# generate random dynamics matrices
np.random.seed(0)
A = np.random.randn(n, n)
A /= np.max(np.abs(np.linalg.eigvals(A)))
B = np.random.randn(n, m)

# mean of c
c = np.zeros(n)

# cost parameters
Q = np.eye(n)
R = np.eye(m)

control_problem = LQRProblem(A, B, Q, R)
```
To create an ADP policy with a randomly generated quadratic approximate value function, we can run
```python
from vgi import QuadForm

V_hat = QuadForm.random(n)
adp_policy = LQRPolicy(n, m, Q=Q, R=R, A=A, B=B, c=c, V=V_hat)
```
To compile the policy to a custom solver implementation in C using CVXPYgen, add the argument compile=True as well as a directory name for the generated code, e.g. name="lqr_policy".
To simulate the policy for T steps, run
```python
T = 100
sim = control_problem.simulate(adp_policy, T, seed=0)
```
This yields a Simulation object. Calling sim.states_matrix gives a (T, n) matrix of the visited states.
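For instance, one way to inspect the closed-loop behavior is to plot the norm of the state over the horizon; this sketch relies only on the (T, n) states_matrix described above.

```python
import numpy as np
import matplotlib.pyplot as plt

# plot ||x_t|| over the simulated horizon; states_matrix has shape (T, n)
state_norms = np.linalg.norm(sim.states_matrix, axis=1)
plt.plot(state_norms)
plt.xlabel("t")
plt.ylabel("state norm")
plt.show()
```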
To evaluate the average cost of the policy via simulation, we can run
```python
adp_cost = control_problem.cost(adp_policy, seed=0)
```
This runs eval_trajectories simulations starting from different randomly sampled initial conditions, each for eval_horizon steps, and returns the average cost. The simulations may optionally be run in parallel.
Those parameters may be set explicitly in the constructor for the control problem. For example, if we construct the LQRProblem as
```python
control_problem = LQRProblem(A, B, Q, R, eval_horizon=1000, eval_trajectories=5, processes=5)
```
then running
```python
adp_cost = control_problem.cost(adp_policy, seed=0)
```
will run 5 simulations in parallel, each for 1000 steps, and return the average cost on those trajectories.
Owner
- Name: Stanford University Convex Optimization Group
- Login: cvxgrp
- Kind: organization
- Location: Stanford, CA
- Website: www.stanford.edu/~boyd
- Repositories: 102
- Profile: https://github.com/cvxgrp
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 18 hours
- Total issue authors: 0
- Total pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- tschm (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- actions/checkout v3 composite
- actions/setup-python v3 composite
- conda-incubator/setup-miniconda v2 composite
- actions/checkout v3 composite
- actions/setup-python v3 composite
- conda-incubator/setup-miniconda v2 composite
- cvxpy *
- cvxpygen *
- numpy >=1.17.5
- pathos *
- scikit-learn *
- scipy *
- cvxpy *
- cvxpygen *
- numpy *
- pathos *
- scikit-learn *
- scipy *