\documentclass[12pt]{diary}
\title{Active set solver for quadratic programming}
\author{Alec Jacobson}
\date{18 September 2013}

\renewcommand{\A}{\mat{A}}
\renewcommand{\Q}{\mat{Q}}
\newcommand{\RR}{\mat{R}}
\newcommand{\Aeq}{\mat{A}_\textrm{eq}}
\newcommand{\Aieq}{\mat{A}_\textrm{ieq}}
\newcommand{\beq}{\vc{b}_\textrm{eq}}
\newcommand{\bieq}{\vc{b}_\textrm{ieq}}
\newcommand{\lx}{\Bell}
\newcommand{\ux}{\vc{u}}
\newcommand{\lameq} {\lambda_\textrm{eq}}
\newcommand{\lamieq}{\lambda_\textrm{ieq}}
\newcommand{\lamlu}  {\lambda_\textrm{lu}}

\begin{document}

\begin{pullout}
\footnotesize
\emph{Disclaimer: This document rewrites and paraphrases the ideas in
\url{http://www.math.uh.edu/~rohop/fall_06/Chapter3.pdf},
\url{http://www.math.uh.edu/~rohop/fall_06/Chapter2.pdf}, and
\url{http://www.cs.cornell.edu/courses/cs322/2007sp/notes/qr.pdf}. Mostly this
is to put everything in the same place and use notation compatible with the
libigl implementation. The ideas and descriptions, however, are not novel.}
\end{pullout}

Quadratic programming problems (QPs) can be written in general as:
\begin{align}
\argmin \limits_\z &
  \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
\text{subject to } & \Aieq \z ≤ \bieq,
\end{align}
where $\z \in \R^n$ is a vector of unknowns,  $\A \in \R^{n \times n}$ is a (in
our case sparse) matrix of quadratic coefficients, $\b \in \R^n$ is a vector of
linear coefficients, $\Aieq \in \R^{m_\text{ieq} \times n}$ is a matrix (also
sparse) linear inequality coefficients and $\bieq \in \R^{m_\text{ieq}}$ is a
vector of corresponding right-hand sides. Each row in $\Aieq \z ≤ \bieq$
corresponds to a single linear inequality constraint.

Though representable by the linear inequality constraints above---linear
\emph{equality} constraints, constant bounds, and constant fixed values appear
so often that we can write a more practical form
\begin{align}
\argmin \limits_\z &
  \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
\text{subject to } & \z_\text{known} = \y,\\
                   & \Aeq \z = \beq,\\
                   & \Aieq \z ≤ \bieq,\\
                   & \z ≥ \lx,\\
                   & \z ≤ \ux,
\end{align}
where $\z_\text{known} \in \R^{n_\text{known}}$ is a subvector of our unknowns $\z$ which
are known or fixed to obtain corresponding values $\y \in \R^{n_\text{known}}$,
$\Aeq \in \R^{m_\text{eq} \times n}$ and $\beq \in \R^{m_\text{eq} \times n}$
are linear \emph{equality} coefficients and right-hand sides respectively, and
$\lx, \ux \in \R^n$ are vectors of constant lower and upper bound constraints.

\todo{This notation is unfortunate. Too many bold A's and too many capitals.}

This description exactly matches the prototype used by the
\texttt{igl::active\_set()} function.

The active set method works by iteratively treating a subset (some rows) of the
inequality constraints as equality constraints. These are called the ``active
set'' of constraints. So at any given iterations $i$ we might have a new
problem:
\begin{align}
\argmin \limits_\z &
  \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
\text{subject to } & \z_\text{known}^i = \y^i,\\
                   & \Aeq^i \z = \beq^i,
\end{align}
where the active rows from 
$\lx ≤ \z ≤ \ux$ and $\Aieq \z ≤ \bieq$  have been appended into
$\z_\text{known}^i = \y^i$ and $\Aeq^i \z = \beq^i$ respectively.

This may be optimized by solving a sparse linear system, resulting in the
current solution $\z^i$. For equality constraint we can also find a
corresponding Lagrange multiplier value. The active set method works by adding
to the active set all linear inequality constraints which are violated by the
previous solution $\z^{i-1}$ before this solve and then after the solve
removing from the active set any constraints with negative Lagrange multiplier
values. Let's declare that $\lameq \in \R^{m_\text{eq}}$, $\lamieq \in
\R^{m_\text{ieq}}$, and $\lamlu \in \R^{n}$ are the Lagrange multipliers
corresponding to the linear equality, linear inequality and constant bound
constraints respectively. Then the abridged active set method proceeds as
follows:

\begin{lstlisting}[keywordstyle=,mathescape]
while not converged
  add to active set all rows where $\Aieq \z > \bieq$, $\z < \lx$ or $\z > \ux$
    $\Aeq^i,\beq^i \leftarrow \Aeq,\beq + \text{active rows of} \Aieq,\bieq$
    $\z_\text{known}^i,\y^i \leftarrow \z_\text{known},\y + \text{active indices and values of} \lx,\ux$
  solve problem treating active constraints as equality constraints $\rightarrow \z,\lamieq,\lamlu$
  remove from active set all rows with $\lamieq < 0$ or $\lamlu < 0$
end
\end{lstlisting}

The fixed values constraints of $\z_\text{known}^i = \y^i$ may be enforced by
substituting $\z_\text{known}^i$ for $\y^i$ in the energy directly during the
solve.  Corresponding Lagrange multiplier values $\lambda_\text{known}^i$ can
be recovered after we've found the rest of $\z$.

The linear equality constraints $\Aeq^i \z = \beq^i$ are a little trickier. If
the rows of 
$\Aeq^i \in \R^{(<m_\text{eq}+ m_\text{ieq}) \times n}$ are linearly
independent then it is straightforward how to build a Lagrangian which enforces
each constraint. This results in solving a system roughly of the form:
\begin{equation}
\left(
\begin{array}{cc}
\A      & {\Aeq^i}^\transpose \\
\Aeq^i  & \mat{0}
\end{array}
\right)
\left(
\begin{array}{l}
\z\\\lambda^i_\text{eq}
\end{array}
\right)
=
\left(
\begin{array}{l}
-\onehalf\b\\
-\beq^i
\end{array}
\right)
\end{equation}
\todo{Double check. Could be missing a sign or factor of 2 here.}

If the rows of $\Aeq^i$ are linearly dependent then the system matrix above
will be singular. Because we may always assume that the constraints are not
contradictory, this system can still be solved. But it will take some care.
Some linear solvers, e.g.\ \textsc{MATLAB}'s, seem to deal with these OK.
\textsc{Eigen}'s does not.

Without loss of generality, let us assume that there are no inequality
constraints and it's the rows of $\Aeq$ which might be linearly dependent.
Let's also assume we have no fixed or known values. Then the linear system
above corresponds to the following optimization problem:
\begin{align}
\argmin \limits_\z &
  \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
\text{subject to } & \Aeq \z = \beq.
\end{align}
For the sake of cleaner notation, let $m = m_\text{eq}$ so that $\Aeq \in \R^{m
\times n}$ and $\beq \in \R^m$.

We can construct the null space of the constraints by computing a QR
decomposition of $\Aeq^\transpose$:
\begin{equation}
\Aeq^\transpose \P = \Q \RR =
\left(\begin{array}{cc}
\Q_1 & \Q_2
\end{array}\right)
\left(\begin{array}{c}
\RR\\
\mat{0}
\end{array}\right)=
\Q_1 \RR
\end  {equation}
where $\P \in \R^{m \times m}$ is a sparse permutation matrix, $\Q \in \R^{n
\times n}$ is orthonormal, $\RR \in \R^{n \times m}$ is upper triangular. Let
$r$ be the row rank of $\Aeq$---the number of linearly independent rows---then
we split $\Q$ and $\RR$ into $\Q_1 \in \R^{n \times r}$, $\Q_2 \in \R^{n \times
n-r}$, and $\R_1 \in \RR^{r \times m}$.

Notice that 
\begin{align}
\Aeq \z &= \beq \\
\P \P^\transpose \Aeq \z &= \\
\P \left(\Aeq^\transpose \P\right)^\transpose \z &= \\
\P\left(\Q\RR\right)^\transpose \z &= \\
\P \RR^\transpose \Q^\transpose \z &= \\
\P \RR^\transpose \Q \z &= \\
\P \left(\RR_1^\transpose \mat{0}\right)
  \left(\begin{array}{c}
  \Q_1^\transpose\\
  \Q_2^\transpose
  \end{array}\right) \z &=\\
\P \RR_1^\transpose \Q_1^\transpose \z + \P \0 \Q_2^\transpose \z &= .
\end{align}
Here we see that $\Q_1^\transpose \z$ affects this equality but
$\Q_2^\transpose \z$ does not. Thus
we say that $\Q_2$ forms a basis that spans the null space of $\Aeq$. That is,
$\Aeq \Q_2 \w_2 =
\vc{0},
\forall \w_2 \in \R^{n-r}$. Let's write 
\begin{equation}
\z = \Q_1 \w_1 + \Q_2 \w_2.
\end{equation}
We're only interested in solutions $\z$ which satisfy our equality constraints
$\Aeq \z = \beq$. If we plug in the above then we get
\begin{align}
\P \Aeq \left(\Q_1 \w_1 + \Q_2 \w_2\right) &= \beq \\
\P \Aeq \Q_1 \w_1 &= \beq \\
\P \left(\RR^\transpose \quad
\mat{0}\right)\left(\begin{array}{c}\Q_1^\transpose\\
\Q_2^\transpose\end{array}\right) \Q_1 \w_1 &= \beq \\
\P \RR^\transpose \Q_1^\transpose \Q_1 \w_1 &= \beq \\
\P \RR^\transpose \w_1 &= \beq \\
\w_1 &= {\RR^\transpose}^{-1} \P^\transpose \beq.
\end{align}
% P RT Q1T z = Beq
% RT Q1T z = PT Beq
% Q1T z = RT \ PT Beq
% w1 = RT \ PT * Beq
% z = Q2 w2 + Q1 * w1 
% z = Q2 w2 + Q1 * RT \ P * Beq
% z = Q2 w2 + lambda0
So then
\begin{align}
z &= \Q_1 {\RR^\transpose}^{-1} \P^\transpose \beq  + \Q_2 \w_2 \\
z &= \Q_2 \w_2 + \overline{\w}_1\\
\end{align}
where $\overline{\w}_1$ is just collecting the known terms that do not depend on the
unknowns $\w_2$.

Now, by construction, $\Q_2 \w_2 + \overline{\w}_1$ spans (all) possible solutions
for $\z$  that satisfy $\Aeq \z = \beq$
% Let:
% z = Q2 w2 + Q1 * w1 
% z = Q2 w2 + Q1 * RT \ P * Beq
% z = Q2 w2 + lambda0
% Q2 w2 + lambda0 spans all z such that Aeq z = Beq, we have essentially
% "factored out" the constraints. We can then simply make this substitution
% into our original optimization problem, leaving an \emph{unconstrained}
% quadratic energy optimization
\begin{align}
\argmin \limits_{\w_2} & (\Q_2 \w_2 + \overline{\w}_1)^\transpose \A (\Q_2 \w_2 + \overline{\w}_1)  +
(\Q_2 \w_2 + \overline{\w}_1)^\transpose \b + \textrm{ constant}\\
&\text{\emph{or equivalently}} \notag\\
\argmin \limits_{\w_2} & 
\w_2^\transpose \Q_2^\transpose \A \Q_2 + \w_2^\transpose \left(2\Q_2^\transpose \A
\Q_2 \overline{\w}_1 + \Q_2^\transpose \b\right) + \textrm{ constant}
\end{align}
which is solved with a linear system solve
\begin{equation}
\left(\Q_2^\transpose \A \Q_2\right) \w_2 = 
  -2\Q_2^\transpose \A \Q_2 \overline{\w}_1 + \Q_2^\transpose \b
\end{equation}
\todo{I've almost surely lost a sign or factor of 2 here.}
If $\A$ is semi-positive definite then $\Q_2^\transpose\A\Q_2$ will also be
semi-positive definite. Thus Cholesky factorization can be used. If $\A$ is
sparse and $\Aeq$ is sparse and a good column pivoting sparse QR library is
used then $\Q_2$ will be sparse and so will $\Q_2^\transpose\A\Q_2$.


After $\w_2$ is known we can compute the solution $\z$ as 
\begin{equation}
\z = \Q_2 \w_2 + \overline{\w}_1.
\end{equation}

Recovering the Lagrange multipliers $\lameq \in \R^{m}$ corresponding to $\Aeq
\z = \b$ is a bit trickier. The equation is:
\begin{align}
\left(\A\quad \Aeq^\transpose\right)
\left(\begin{array}{c}
\z\\
\lameq
\end{array}\right) &= -\onehalf\b \\
\Aeq^\transpose \lameq &= - \A \z - \onehalf\b \\
\Q_1^\transpose \Aeq^\transpose \lameq &= - \Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b \\
% AT P = Q_1 R
% AT = Q_1 R PT
%
% Q1T AT
% Q1T Q_1 R PT
% R PT
& \textrm{\emph{Recall that $\Aeq^\transpose\P = \Q_1\RR$ so then $
\Q_1^\transpose \Aeq^\transpose = \RR \P^\transpose$}}\notag\\
\RR \P^\transpose \lameq &= - \Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b \\
\lameq &= \P
\RR^{-1}
\left(-\Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b\right).
\end{align}

\alec{If $r<m$ then I think it is enough to define the permutation matrix as
rectangular $\P \in \R^{m \times r}$.}

We assumes that there were now known or fixed values. If there are, special
care only needs to be taken to adjust $\Aeq \z = \beq$ to account for
substituting $\z_\text{known}$ with $\y$. 

\hr
\clearpage

The following table describes the translation between the entities described in
this document and those used in \texttt{igl/active\_set} and
\texttt{igl/min\_quad\_with\_fixed}.
\begin{lstlisting}[keywordstyle=,mathescape]
$\A$: A
$\b$: B
$\z$: Z
$\Aeq$: Aeq
$\Aieq$: Aieq
$\beq$: Beq
$\bieq$: Bieq
$\lx$: lx
$\ux$: ux
$\lamieq$: ~Lambda_Aieq_i
$\lamlu$: ~Lambda_known_i
$\P$: AeqTE
$\Q$: AeqTQ
$\RR$: AeqTR1
$\RR^\transpose$: AeqTR1T
$\Q_1$: AeqTQ1
$\Q_1^\transpose$: AeqTQ1T
$\Q_2$: AeqTQ2
$\Q_2^\transpose$: AeqTQ2T
$\w_2$: lambda
$\overline{\w}_1$: lambda_0
\end{lstlisting}


% Resources
%  http://www.cs.cornell.edu/courses/cs322/2007sp/notes/qr.pdf
%  http://www.math.uh.edu/~rohop/fall_06/Chapter2.pdf
%  http://www.math.uh.edu/~rohop/fall_06/Chapter3.pdf


\end{document}