Inverse LQR

Date:

Problem Statement

Given a trajectory of states and actions ${\mathbf{x}^*_{0:T},\mathbf{u}^*_{0:T-1}}$ that is solution of Linear Quadratic Regulator optimization where $\mathbf{R}=\mathbf{I}$

from the following system of linear discrete dynamics:
$\mathbf{x}_{t+1}= \mathbf{A}_t\mathbf{x}_t + \mathbf{B}_t\mathbf{u}_t$

where:
$\mathbf{x}_t\in\mathbb{R}^{n}$ is the state,
$\mathbf{u}_t\in\mathbb{R}^{m}$, is the action,
$\mathbf{A}_t\in\mathbb{R}^{n\times n}$ is the state-transition matrix,
$\mathbf{B}_t\in\mathbb{R}^{n\times m}$ is the input-transition matrix

the problem is to find $\mathbf{Q}^*$ that generates the trajectory

Finite Horizon Cost

For the cost parameters $\mathbf{Q}, \mathbf{R}$, the following can be stated:

Instantaneous Cost

$C(\mathbf{x}_t, \mathbf{u}_t)= \mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t$

Cumulative cost

$J(\mathbf{x}_{0:T},\mathbf{u}_{0:T-1})=\mathbf{x}_T^T\mathbf{Q}\mathbf{x}_T+\sum_{t=0}^{T-1}\big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t\big]$ where $\mathbf{Q}\geq\mathbf{0}; \mathbf{R}>0$

Cost-to-go

$V^*(\mathbf{x}_t)=\min_{\mathbf{u}_{t:T}}\sum_t^T \Big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t\Big]$

$\qquad\quad=\min_{\mathbf{u}_t}\Big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t+V^*(\mathbf{x}_{t+1})\Big]$

Feedback Law

For the cost parameters $\mathbf{Q}, \mathbf{R}$, the feedback law is given as:

$\mathbf{u}^*_t=-(\mathbf{R} + \mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{B}_t)^{-1}\mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{A}_t\mathbf{x}_t$

where
$\mathbf{P}_t = \mathbf{Q}+ \mathbf{A}^T\mathbf{P}_{t+1}\mathbf{A} -\mathbf{A}^T_{t}\mathbf{P}_{t+1}\mathbf{B}_t(\mathbf{R} + \mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{B}_t)^{-1}\mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{A}_t \qquad\dots$ Algebraic Ricatti Equation

Applying Pontryagin’s Minimum Principle (PMP)

Let $H_t(\mathbf{x}_t, \mathbf{u}_t, \mathbf{\lambda}_{t+1})=\frac{1}{2}\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t+\frac{1}{2}\mathbf{u}_t^T\mathbf{R}\mathbf{u}_t + \mathbf{\lambda}_{t+1}^T(\mathbf{A}\mathbf{x}_t+\mathbf{B}\mathbf{u}_t)$

Given that the trajectory ${\mathbf{x}^*_t}_{t=0}^T$ minimizes the cumulative cost $J$, the following are satisfied:

  1. $\mathbf{x}^*_{t+1}=\mathbf{A}\mathbf{x}^*_t + \mathbf{B}\mathbf{u}^*_t\quad\quad\quad\dots\quad (1)$
  2. $\nabla_{\mathbf{u}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t, \mathbf{\lambda}_{t+1})=0$
    $\implies \mathbf{R}\mathbf{u}^*_t + \mathbf{B}^T\mathbf{\lambda}_{t+1}=0\quad\dots\quad (2)$
  3. $\nabla_{\mathbf{x}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t, \mathbf{\lambda}_{t+1})=\mathbf{\lambda}_t$
    $\implies \mathbf{Q}\mathbf{x}^*_t + \mathbf{A}^T\mathbf{\lambda}_{t+1}=\mathbf{\lambda}_{t}\quad\dots\quad (3)$

Assuming $\mathbf{R}=\mathbf{I}$, we have:
$\mathbf{u}^*_t = -\mathbf{B}^T\mathbf{\lambda}_{t+1}\quad\qquad\qquad\qquad\qquad\quad\dots\quad(4)$
(4) into (1) gives us:
$\mathbf{x}^*_{t+1}=\mathbf{A}\mathbf{x}^*_t - \mathbf{B}\mathbf{B}^T\mathbf{\lambda}_{t+1}\quad\qquad\qquad\quad\dots\quad(5)$
$\implies\mathbf{B}\mathbf{B}^T\mathbf{\lambda}_{t+1}=\mathbf{A}\mathbf{x}^*_{t}-\mathbf{x}^*_{t+1}$
$\implies\mathbf{\lambda}_{t+1}=(\mathbf{B}\mathbf{B}^T)^{-1}(\mathbf{A}\mathbf{x}^*_{t}-\mathbf{x}^*_{t+1})\quad\dots\quad(6)$

Solution

Algorithm for exact solution

  1. Find $\mathbf{\lambda}_t=(\mathbf{B}\mathbf{B}^T)^{-1}(\mathbf{A}\mathbf{x}^*_{t-1}-\mathbf{x}^*_{t}); \qquad t=1, \dots, T$
  2. Calculate $\mathbf{\gamma}_t=\mathbf{\lambda}_{t}-\mathbf{A}^T\mathbf{\lambda}_{t+1}; \qquad\qquad t=1, \dots, T$
  3. $\mathbf{Q}\mathbf{x}^*_t=\mathbf{\gamma}_t$
    $\implies\mathbf{Q}\mathbf{X}^*=\mathbf{\Gamma}$
    $\implies\mathbf{Q}=\mathbf{\Gamma}{\mathbf{X}^*}^{T}(\mathbf{X}^*{\mathbf{X}^*}^{T})^{-1}$,
    where $\mathbf{X}^*=[\mathbf{x}^*_1, \dots, \mathbf{x}^*_T]$,
    $\mathbf{\Gamma}=[\mathbf{\gamma}_1, \dots, \mathbf{\gamma}_T]$

Soft IOC Solution

Assuming $\mathcal{H}\equiv{0\leq t<T: \mathbf{u}_t \in \text{int}\ \mathcal{U}}$.
the soft IOC solution is given by:

$\inf_{\mathbf{\theta},\mathbf{\lambda}_{0,T}} \sum_{t_k\in\mathcal{H}}\Vert\nabla_{\mathbf{u}_t}H_t(\mathbf{x}^*_t,\mathbf{u}^*_t,\mathbf{\lambda}_{t+1},\mathbf{\theta})\Vert^2 + \sum_0^{T-1}\Vert\mathbf{\lambda}_t-\nabla_{\mathbf{x}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t,\mathbf{\lambda}_{t+1},\mathbf{\theta})\Vert^2+\Vert\mathbf{\lambda}_T-\nabla_{\mathbf{x}_T}F(\mathbf{x}^*_T)\Vert^2$
$=\inf_{\mathbf{Q},\mathbf{\lambda}_{0,T}} \sum_{t_k\in\mathcal{H}}\Vert\mathbf{u}^*_t + \mathbf{B}^T\mathbf{\lambda}_{t+1}\Vert^2 + \sum_0^{T-1}\Vert\mathbf{\lambda}_t-\mathbf{Q}\mathbf{x}^*_t - \mathbf{A}^T\mathbf{\lambda}_{t+1}\Vert^2+\Vert\mathbf{\lambda}_T-\mathbf{Q}\mathbf{x}^*_T\Vert^2$