Inverse LQR
Date:
Problem Statement
Given a trajectory of states and actions ${\mathbf{x}^*_{0:T},\mathbf{u}^*_{0:T-1}}$ that is solution of Linear Quadratic Regulator optimization where $\mathbf{R}=\mathbf{I}$
from the following system of linear discrete dynamics:
$\mathbf{x}_{t+1}= \mathbf{A}_t\mathbf{x}_t + \mathbf{B}_t\mathbf{u}_t$
where:
$\mathbf{x}_t\in\mathbb{R}^{n}$ is the state,
$\mathbf{u}_t\in\mathbb{R}^{m}$, is the action,
$\mathbf{A}_t\in\mathbb{R}^{n\times n}$ is the state-transition matrix,
$\mathbf{B}_t\in\mathbb{R}^{n\times m}$ is the input-transition matrix
the problem is to find $\mathbf{Q}^*$ that generates the trajectory
Finite Horizon Cost
For the cost parameters $\mathbf{Q}, \mathbf{R}$, the following can be stated:
Instantaneous Cost
$C(\mathbf{x}_t, \mathbf{u}_t)= \mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t$
Cumulative cost
$J(\mathbf{x}_{0:T},\mathbf{u}_{0:T-1})=\mathbf{x}_T^T\mathbf{Q}\mathbf{x}_T+\sum_{t=0}^{T-1}\big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t\big]$ where $\mathbf{Q}\geq\mathbf{0}; \mathbf{R}>0$
Cost-to-go
$V^*(\mathbf{x}_t)=\min_{\mathbf{u}_{t:T}}\sum_t^T \Big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t\Big]$
$\qquad\quad=\min_{\mathbf{u}_t}\Big[\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t + \mathbf{u}_t^T\mathbf{R}\mathbf{u}_t+V^*(\mathbf{x}_{t+1})\Big]$
Feedback Law
For the cost parameters $\mathbf{Q}, \mathbf{R}$, the feedback law is given as:
$\mathbf{u}^*_t=-(\mathbf{R} + \mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{B}_t)^{-1}\mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{A}_t\mathbf{x}_t$
where
$\mathbf{P}_t = \mathbf{Q}+ \mathbf{A}^T\mathbf{P}_{t+1}\mathbf{A} -\mathbf{A}^T_{t}\mathbf{P}_{t+1}\mathbf{B}_t(\mathbf{R} + \mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{B}_t)^{-1}\mathbf{B}^T_t\mathbf{P}_{t+1}\mathbf{A}_t \qquad\dots$ Algebraic Ricatti Equation
Applying Pontryagin’s Minimum Principle (PMP)
Let $H_t(\mathbf{x}_t, \mathbf{u}_t, \mathbf{\lambda}_{t+1})=\frac{1}{2}\mathbf{x}_t^T\mathbf{Q}\mathbf{x}_t+\frac{1}{2}\mathbf{u}_t^T\mathbf{R}\mathbf{u}_t + \mathbf{\lambda}_{t+1}^T(\mathbf{A}\mathbf{x}_t+\mathbf{B}\mathbf{u}_t)$
Given that the trajectory ${\mathbf{x}^*_t}_{t=0}^T$ minimizes the cumulative cost $J$, the following are satisfied:
- $\mathbf{x}^*_{t+1}=\mathbf{A}\mathbf{x}^*_t + \mathbf{B}\mathbf{u}^*_t\quad\quad\quad\dots\quad (1)$
- $\nabla_{\mathbf{u}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t, \mathbf{\lambda}_{t+1})=0$
$\implies \mathbf{R}\mathbf{u}^*_t + \mathbf{B}^T\mathbf{\lambda}_{t+1}=0\quad\dots\quad (2)$ - $\nabla_{\mathbf{x}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t, \mathbf{\lambda}_{t+1})=\mathbf{\lambda}_t$
$\implies \mathbf{Q}\mathbf{x}^*_t + \mathbf{A}^T\mathbf{\lambda}_{t+1}=\mathbf{\lambda}_{t}\quad\dots\quad (3)$
Assuming $\mathbf{R}=\mathbf{I}$, we have:
$\mathbf{u}^*_t = -\mathbf{B}^T\mathbf{\lambda}_{t+1}\quad\qquad\qquad\qquad\qquad\quad\dots\quad(4)$
(4) into (1) gives us:
$\mathbf{x}^*_{t+1}=\mathbf{A}\mathbf{x}^*_t - \mathbf{B}\mathbf{B}^T\mathbf{\lambda}_{t+1}\quad\qquad\qquad\quad\dots\quad(5)$
$\implies\mathbf{B}\mathbf{B}^T\mathbf{\lambda}_{t+1}=\mathbf{A}\mathbf{x}^*_{t}-\mathbf{x}^*_{t+1}$
$\implies\mathbf{\lambda}_{t+1}=(\mathbf{B}\mathbf{B}^T)^{-1}(\mathbf{A}\mathbf{x}^*_{t}-\mathbf{x}^*_{t+1})\quad\dots\quad(6)$
Solution
Algorithm for exact solution
- Find $\mathbf{\lambda}_t=(\mathbf{B}\mathbf{B}^T)^{-1}(\mathbf{A}\mathbf{x}^*_{t-1}-\mathbf{x}^*_{t}); \qquad t=1, \dots, T$
- Calculate $\mathbf{\gamma}_t=\mathbf{\lambda}_{t}-\mathbf{A}^T\mathbf{\lambda}_{t+1}; \qquad\qquad t=1, \dots, T$
- $\mathbf{Q}\mathbf{x}^*_t=\mathbf{\gamma}_t$
$\implies\mathbf{Q}\mathbf{X}^*=\mathbf{\Gamma}$
$\implies\mathbf{Q}=\mathbf{\Gamma}{\mathbf{X}^*}^{T}(\mathbf{X}^*{\mathbf{X}^*}^{T})^{-1}$,
where $\mathbf{X}^*=[\mathbf{x}^*_1, \dots, \mathbf{x}^*_T]$,
$\mathbf{\Gamma}=[\mathbf{\gamma}_1, \dots, \mathbf{\gamma}_T]$
Soft IOC Solution
Assuming $\mathcal{H}\equiv{0\leq t<T: \mathbf{u}_t \in \text{int}\ \mathcal{U}}$.
the soft IOC solution is given by:
$\inf_{\mathbf{\theta},\mathbf{\lambda}_{0,T}} \sum_{t_k\in\mathcal{H}}\Vert\nabla_{\mathbf{u}_t}H_t(\mathbf{x}^*_t,\mathbf{u}^*_t,\mathbf{\lambda}_{t+1},\mathbf{\theta})\Vert^2 + \sum_0^{T-1}\Vert\mathbf{\lambda}_t-\nabla_{\mathbf{x}_t}H_t(\mathbf{x}^*_t, \mathbf{u}^*_t,\mathbf{\lambda}_{t+1},\mathbf{\theta})\Vert^2+\Vert\mathbf{\lambda}_T-\nabla_{\mathbf{x}_T}F(\mathbf{x}^*_T)\Vert^2$
$=\inf_{\mathbf{Q},\mathbf{\lambda}_{0,T}} \sum_{t_k\in\mathcal{H}}\Vert\mathbf{u}^*_t + \mathbf{B}^T\mathbf{\lambda}_{t+1}\Vert^2 + \sum_0^{T-1}\Vert\mathbf{\lambda}_t-\mathbf{Q}\mathbf{x}^*_t - \mathbf{A}^T\mathbf{\lambda}_{t+1}\Vert^2+\Vert\mathbf{\lambda}_T-\mathbf{Q}\mathbf{x}^*_T\Vert^2$