active-set.tex 12 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326
  1. \documentclass[12pt]{diary}
  2. \title{Active set solver for quadratic programming}
  3. \author{Alec Jacobson}
  4. \date{18 September 2013}
  5. \renewcommand{\A}{\mat{A}}
  6. \renewcommand{\Q}{\mat{Q}}
  7. \newcommand{\RR}{\mat{R}}
  8. \newcommand{\Aeq}{\mat{A}_\textrm{eq}}
  9. \newcommand{\Aieq}{\mat{A}_\textrm{ieq}}
  10. \newcommand{\beq}{\vc{b}_\textrm{eq}}
  11. \newcommand{\bieq}{\vc{b}_\textrm{ieq}}
  12. \newcommand{\lx}{\Bell}
  13. \newcommand{\ux}{\vc{u}}
  14. \newcommand{\lameq} {\lambda_\textrm{eq}}
  15. \newcommand{\lamieq}{\lambda_\textrm{ieq}}
  16. \newcommand{\lamlu} {\lambda_\textrm{lu}}
  17. \begin{document}
  18. \begin{pullout}
  19. \footnotesize
  20. \emph{Disclaimer: This document rewrites and paraphrases the ideas in
  21. \url{http://www.math.uh.edu/~rohop/fall_06/Chapter3.pdf},
  22. \url{http://www.math.uh.edu/~rohop/fall_06/Chapter2.pdf}, and
  23. \url{http://www.cs.cornell.edu/courses/cs322/2007sp/notes/qr.pdf}. Mostly this
  24. is to put everything in the same place and use notation compatible with the
  25. \textsc{libigl} implementation. The ideas and descriptions, however, are not novel.}
  26. \end{pullout}
  27. Quadratic programming problems (QPs) can be written in general as:
  28. \begin{align}
  29. \argmin \limits_\z &
  30. \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
  31. \text{subject to } & \Aieq \z ≤ \bieq,
  32. \end{align}
  33. where $\z \in \R^n$ is a vector of unknowns, $\A \in \R^{n \times n}$ is a (in
  34. our case sparse) matrix of quadratic coefficients, $\b \in \R^n$ is a vector of
  35. linear coefficients, $\Aieq \in \R^{m_\text{ieq} \times n}$ is a matrix (also
  36. sparse) linear inequality coefficients and $\bieq \in \R^{m_\text{ieq}}$ is a
  37. vector of corresponding right-hand sides. Each row in $\Aieq \z ≤ \bieq$
  38. corresponds to a single linear inequality constraint.
  39. Though representable by the linear inequality constraints above---linear
  40. \emph{equality} constraints, constant bounds, and constant fixed values appear
  41. so often that we can write a more practical form
  42. \begin{align}
  43. \argmin \limits_\z &
  44. \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
  45. \text{subject to } & \z_\text{known} = \y,\\
  46. & \Aeq \z = \beq,\\
  47. & \Aieq \z ≤ \bieq,\\
  48. & \z ≥ \lx,\\
  49. & \z ≤ \ux,
  50. \end{align}
  51. where $\z_\text{known} \in \R^{n_\text{known}}$ is a subvector of our unknowns $\z$ which
  52. are known or fixed to obtain corresponding values $\y \in \R^{n_\text{known}}$,
  53. $\Aeq \in \R^{m_\text{eq} \times n}$ and $\beq \in \R^{m_\text{eq} \times n}$
  54. are linear \emph{equality} coefficients and right-hand sides respectively, and
  55. $\lx, \ux \in \R^n$ are vectors of constant lower and upper bound constraints.
  56. \todo{This notation is unfortunate. Too many bold A's and too many capitals.}
  57. This description exactly matches the prototype used by the
  58. \texttt{igl::active\_set()} function.
  59. The active set method works by iteratively treating a subset (some rows) of the
  60. inequality constraints as equality constraints. These are called the ``active
  61. set'' of constraints. So at any given iterations $i$ we might have a new
  62. problem:
  63. \begin{align}
  64. \argmin \limits_\z &
  65. \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
  66. \text{subject to } & \z_\text{known}^i = \y^i,\\
  67. & \Aeq^i \z = \beq^i,
  68. \end{align}
  69. where the active rows from
  70. $\lx ≤ \z ≤ \ux$ and $\Aieq \z ≤ \bieq$ have been appended into
  71. $\z_\text{known}^i = \y^i$ and $\Aeq^i \z = \beq^i$ respectively.
  72. This may be optimized by solving a sparse linear system, resulting in the
  73. current solution $\z^i$. For equality constraint we can also find a
  74. corresponding Lagrange multiplier value. The active set method works by adding
  75. to the active set all linear inequality constraints which are violated by the
  76. previous solution $\z^{i-1}$ before this solve and then after the solve
  77. removing from the active set any constraints with negative Lagrange multiplier
  78. values. Let's declare that $\lameq \in \R^{m_\text{eq}}$, $\lamieq \in
  79. \R^{m_\text{ieq}}$, and $\lamlu \in \R^{n}$ are the Lagrange multipliers
  80. corresponding to the linear equality, linear inequality and constant bound
  81. constraints respectively. Then the abridged active set method proceeds as
  82. follows:
  83. \begin{lstlisting}[keywordstyle=,mathescape]
  84. while not converged
  85. add to active set all rows where $\Aieq \z > \bieq$, $\z < \lx$ or $\z > \ux$
  86. $\Aeq^i,\beq^i \leftarrow \Aeq,\beq + \text{active rows of} \Aieq,\bieq$
  87. $\z_\text{known}^i,\y^i \leftarrow \z_\text{known},\y + \text{active indices and values of} \lx,\ux$
  88. solve problem treating active constraints as equality constraints $\rightarrow \z,\lamieq,\lamlu$
  89. remove from active set all rows with $\lamieq < 0$ or $\lamlu < 0$
  90. end
  91. \end{lstlisting}
  92. The fixed values constraints of $\z_\text{known}^i = \y^i$ may be enforced by
  93. substituting $\z_\text{known}^i$ for $\y^i$ in the energy directly during the
  94. solve. Corresponding Lagrange multiplier values $\lambda_\text{known}^i$ can
  95. be recovered after we've found the rest of $\z$.
  96. The linear equality constraints $\Aeq^i \z = \beq^i$ are a little trickier. If
  97. the rows of
  98. $\Aeq^i \in \R^{(<m_\text{eq}+ m_\text{ieq}) \times n}$ are linearly
  99. independent then it is straightforward how to build a Lagrangian which enforces
  100. each constraint. This results in solving a system roughly of the form:
  101. \begin{equation}
  102. \left(
  103. \begin{array}{cc}
  104. \A & {\Aeq^i}^\transpose \\
  105. \Aeq^i & \mat{0}
  106. \end{array}
  107. \right)
  108. \left(
  109. \begin{array}{l}
  110. \z\\\lambda^i_\text{eq}
  111. \end{array}
  112. \right)
  113. =
  114. \left(
  115. \begin{array}{l}
  116. -\onehalf\b\\
  117. -\beq^i
  118. \end{array}
  119. \right)
  120. \end{equation}
  121. \todo{Double check. Could be missing a sign or factor of 2 here.}
  122. If the rows of $\Aeq^i$ are linearly dependent then the system matrix above
  123. will be singular. Because we may always assume that the constraints are not
  124. contradictory, this system can still be solved. But it will take some care.
  125. Some linear solvers, e.g.\ \textsc{MATLAB}'s, seem to deal with these OK.
  126. \textsc{Eigen}'s does not.
  127. Without loss of generality, let us assume that there are no inequality
  128. constraints and it's the rows of $\Aeq$ which might be linearly dependent.
  129. Let's also assume we have no fixed or known values. Then the linear system
  130. above corresponds to the following optimization problem:
  131. \begin{align}
  132. \argmin \limits_\z &
  133. \z^\transpose \A \z + \z^\transpose \b + \text{ constant}\\
  134. \text{subject to } & \Aeq \z = \beq.
  135. \end{align}
  136. For the sake of cleaner notation, let $m = m_\text{eq}$ so that $\Aeq \in \R^{m
  137. \times n}$ and $\beq \in \R^m$.
  138. We can construct the null space of the constraints by computing a QR
  139. decomposition of $\Aeq^\transpose$:
  140. \begin{equation}
  141. \Aeq^\transpose \P = \Q \RR =
  142. \left(\begin{array}{cc}
  143. \Q_1 & \Q_2
  144. \end{array}\right)
  145. \left(\begin{array}{c}
  146. \RR\\
  147. \mat{0}
  148. \end{array}\right)=
  149. \Q_1 \RR
  150. \end {equation}
  151. where $\P \in \R^{m \times m}$ is a sparse permutation matrix, $\Q \in \R^{n
  152. \times n}$ is orthonormal, $\RR \in \R^{n \times m}$ is upper triangular. Let
  153. $r$ be the row rank of $\Aeq$---the number of linearly independent rows---then
  154. we split $\Q$ and $\RR$ into $\Q_1 \in \R^{n \times r}$, $\Q_2 \in \R^{n \times
  155. n-r}$, and $\R_1 \in \RR^{r \times m}$.
  156. Notice that
  157. \begin{align}
  158. \Aeq \z &= \beq \\
  159. \P \P^\transpose \Aeq \z &= \\
  160. \P \left(\Aeq^\transpose \P\right)^\transpose \z &= \\
  161. \P\left(\Q\RR\right)^\transpose \z &= \\
  162. \P \RR^\transpose \Q^\transpose \z &= \\
  163. \P \RR^\transpose \Q \z &= \\
  164. \P \left(\RR_1^\transpose \mat{0}\right)
  165. \left(\begin{array}{c}
  166. \Q_1^\transpose\\
  167. \Q_2^\transpose
  168. \end{array}\right) \z &=\\
  169. \P \RR_1^\transpose \Q_1^\transpose \z + \P \0 \Q_2^\transpose \z &= .
  170. \end{align}
  171. Here we see that $\Q_1^\transpose \z$ affects this equality but
  172. $\Q_2^\transpose \z$ does not. Thus
  173. we say that $\Q_2$ forms a basis that spans the null space of $\Aeq$. That is,
  174. $\Aeq \Q_2 \w_2 =
  175. \vc{0},
  176. \forall \w_2 \in \R^{n-r}$. Let's write
  177. \begin{equation}
  178. \z = \Q_1 \w_1 + \Q_2 \w_2.
  179. \end{equation}
  180. We're only interested in solutions $\z$ which satisfy our equality constraints
  181. $\Aeq \z = \beq$. If we plug in the above then we get
  182. \begin{align}
  183. \P \Aeq \left(\Q_1 \w_1 + \Q_2 \w_2\right) &= \beq \\
  184. \P \Aeq \Q_1 \w_1 &= \beq \\
  185. \P \left(\RR^\transpose \quad
  186. \mat{0}\right)\left(\begin{array}{c}\Q_1^\transpose\\
  187. \Q_2^\transpose\end{array}\right) \Q_1 \w_1 &= \beq \\
  188. \P \RR^\transpose \Q_1^\transpose \Q_1 \w_1 &= \beq \\
  189. \P \RR^\transpose \w_1 &= \beq \\
  190. \w_1 &= {\RR^\transpose}^{-1} \P^\transpose \beq.
  191. \end{align}
  192. % P RT Q1T z = Beq
  193. % RT Q1T z = PT Beq
  194. % Q1T z = RT \ PT Beq
  195. % w1 = RT \ PT * Beq
  196. % z = Q2 w2 + Q1 * w1
  197. % z = Q2 w2 + Q1 * RT \ P * Beq
  198. % z = Q2 w2 + lambda0
  199. So then
  200. \begin{align}
  201. z &= \Q_1 {\RR^\transpose}^{-1} \P^\transpose \beq + \Q_2 \w_2 \\
  202. z &= \Q_2 \w_2 + \overline{\w}_1\\
  203. \end{align}
  204. where $\overline{\w}_1$ is just collecting the known terms that do not depend on the
  205. unknowns $\w_2$.
  206. Now, by construction, $\Q_2 \w_2 + \overline{\w}_1$ spans (all) possible solutions
  207. for $\z$ that satisfy $\Aeq \z = \beq$
  208. % Let:
  209. % z = Q2 w2 + Q1 * w1
  210. % z = Q2 w2 + Q1 * RT \ P * Beq
  211. % z = Q2 w2 + lambda0
  212. % Q2 w2 + lambda0 spans all z such that Aeq z = Beq, we have essentially
  213. % "factored out" the constraints. We can then simply make this substitution
  214. % into our original optimization problem, leaving an \emph{unconstrained}
  215. % quadratic energy optimization
  216. \begin{align}
  217. \argmin \limits_{\w_2} & (\Q_2 \w_2 + \overline{\w}_1)^\transpose \A (\Q_2 \w_2 + \overline{\w}_1) +
  218. (\Q_2 \w_2 + \overline{\w}_1)^\transpose \b + \textrm{ constant}\\
  219. &\text{\emph{or equivalently}} \notag\\
  220. \argmin \limits_{\w_2} &
  221. \w_2^\transpose \Q_2^\transpose \A \Q_2 + \w_2^\transpose \left(2\Q_2^\transpose \A
  222. \Q_2 \overline{\w}_1 + \Q_2^\transpose \b\right) + \textrm{ constant}
  223. \end{align}
  224. which is solved with a linear system solve
  225. \begin{equation}
  226. \left(\Q_2^\transpose \A \Q_2\right) \w_2 =
  227. -2\Q_2^\transpose \A \Q_2 \overline{\w}_1 + \Q_2^\transpose \b
  228. \end{equation}
  229. \todo{I've almost surely lost a sign or factor of 2 here.}
  230. If $\A$ is semi-positive definite then $\Q_2^\transpose\A\Q_2$ will also be
  231. semi-positive definite. Thus Cholesky factorization can be used. If $\A$ is
  232. sparse and $\Aeq$ is sparse and a good column pivoting sparse QR library is
  233. used then $\Q_2$ will be sparse and so will $\Q_2^\transpose\A\Q_2$.
  234. After $\w_2$ is known we can compute the solution $\z$ as
  235. \begin{equation}
  236. \z = \Q_2 \w_2 + \overline{\w}_1.
  237. \end{equation}
  238. Recovering the Lagrange multipliers $\lameq \in \R^{m}$ corresponding to $\Aeq
  239. \z = \b$ is a bit trickier. The equation is:
  240. \begin{align}
  241. \left(\A\quad \Aeq^\transpose\right)
  242. \left(\begin{array}{c}
  243. \z\\
  244. \lameq
  245. \end{array}\right) &= -\onehalf\b \\
  246. \Aeq^\transpose \lameq &= - \A \z - \onehalf\b \\
  247. \Q_1^\transpose \Aeq^\transpose \lameq &= - \Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b \\
  248. % AT P = Q_1 R
  249. % AT = Q_1 R PT
  250. %
  251. % Q1T AT
  252. % Q1T Q_1 R PT
  253. % R PT
  254. & \textrm{\emph{Recall that $\Aeq^\transpose\P = \Q_1\RR$ so then $
  255. \Q_1^\transpose \Aeq^\transpose = \RR \P^\transpose$}}\notag\\
  256. \RR \P^\transpose \lameq &= - \Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b \\
  257. \lameq &= \P
  258. \RR^{-1}
  259. \left(-\Q_1^\transpose \A \z - \onehalf\Q_1^\transpose \b\right).
  260. \end{align}
  261. \alec{If $r<m$ then I think it is enough to define the permutation matrix as
  262. rectangular $\P \in \R^{m \times r}$.}
  263. We assumes that there were now known or fixed values. If there are, special
  264. care only needs to be taken to adjust $\Aeq \z = \beq$ to account for
  265. substituting $\z_\text{known}$ with $\y$.
  266. \hr
  267. \clearpage
  268. The following table describes the translation between the entities described in
  269. this document and those used in \texttt{igl/active\_set} and
  270. \texttt{igl/min\_quad\_with\_fixed}.
  271. \begin{lstlisting}[keywordstyle=,mathescape]
  272. $\A$: A
  273. $\b$: B
  274. $\z$: Z
  275. $\Aeq$: Aeq
  276. $\Aieq$: Aieq
  277. $\beq$: Beq
  278. $\bieq$: Bieq
  279. $\lx$: lx
  280. $\ux$: ux
  281. $\lamieq$: ~Lambda_Aieq_i
  282. $\lamlu$: ~Lambda_known_i
  283. $\P$: AeqTE
  284. $\Q$: AeqTQ
  285. $\RR$: AeqTR1
  286. $\RR^\transpose$: AeqTR1T
  287. $\Q_1$: AeqTQ1
  288. $\Q_1^\transpose$: AeqTQ1T
  289. $\Q_2$: AeqTQ2
  290. $\Q_2^\transpose$: AeqTQ2T
  291. $\w_2$: lambda
  292. $\overline{\w}_1$: lambda_0
  293. \end{lstlisting}
  294. % Resources
  295. % http://www.cs.cornell.edu/courses/cs322/2007sp/notes/qr.pdf
  296. % http://www.math.uh.edu/~rohop/fall_06/Chapter2.pdf
  297. % http://www.math.uh.edu/~rohop/fall_06/Chapter3.pdf
  298. \end{document}