@@ -474,7 +474,7 @@ metric for evaluating the extent to which the model deviates from the correct
answer. One of the most common loss functions is the \emph{mean square error}
(MSE) loss function. Let $N$ be the number of points in the set of training
data. Then,
-\begin{equation}
+\begin{equation} \label{eq:mse}
\Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
calculates the squared difference between each model prediction and true value
@@ -497,7 +497,7 @@ $\hat{\boldsymbol{y}} = f(w; \theta)$ be the model prediction with
$w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$.
$\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2\subset\theta$.
Then,
-\begin{equation}
+\begin{equation}\label{eq:backprop}
\nabla_{\theta_1} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\theta_1}f^{(1)},
\end{equation}
is the gradient of $\Loss{ }$ with respect to the parameters $\theta_1$. The name
@@ -522,23 +522,110 @@ systems.
\label{sec:pinn}

In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
-recognized tools for approximating any kind of function. This section, we
-show that this capability can be applied to create a solver for ODE's and PDE's
-as Legaris \etal~\cite{Lagaris1997} describe in their paper. In this method, the
-model learns to approximate a function using the given data points and employs
-knowledge that is available about the problem such as a system of differential
-system. The physics-informed neural network (PINN) learns system of differential
-equations during training, as it tries to optimize its output to fit the
-equations.\\
-
-In contrast to standard MLP's PINN's have a modified Loss term. Ultimately, the
-loss includes the above-mentioned prior knowledge to the problem. While still
-containing the loss term, that seeks to minimize the distance between the model
-predictions and the solutions, which is the observation loss $\Loss{obs} =
- \Loss{MSE}$, a PINN adds a term that includes the residuals of the differential
-equations, which is the physics loss $\mathcal{L}_{physics}(\boldsymbol{x},
- \hat{\boldsymbol{y}})$ of the PINN and tries to optimize the prediction to fit
-the differential equations.
+widely recognized tools for approximating a wide range of functions. In this
+section, we apply this capability to create a solver for ODEs and PDEs,
+as Lagaris \etal~\cite{Lagaris1997} describe in their paper. In this approach,
+the model learns to approximate a function using provided data points while
+leveraging the available knowledge about the problem in the form of a system of
+differential equations. The \emph{physics-informed neural network} (PINN)
+learns the system of differential equations during training, as it optimizes
+its output to align with the equations.\\
+
+In contrast to standard MLPs, the loss term of a PINN comprises two
+components. The first term incorporates the aforementioned prior knowledge
+pertinent to the problem. As Raissi
+\etal~\cite{Raissi2017} propose, the residual of each differential equation in
+the system must be minimized in order for the model to optimize its output in
+accordance with the theory.
+We obtain the residual $r_i$, with $i\in\{1, ..., N_d\}$, by rearranging the
+differential equation and
+calculating the difference between the left-hand side and the right-hand side
+of the equation. $N_d$ is the number of differential equations in the system.
+Following Raissi \etal~\cite{Raissi2017}, the \emph{physics
+  loss} of a PINN,\todo{check source again}
+\begin{equation}
+  \mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
+\end{equation}
+takes the input data and the model prediction to calculate the mean square
+error of the residuals. The second term, the \emph{observation loss}
+$\Loss{obs}$, employs the mean square error of the distances between the
+predicted and the true values for each training point. Additionally, the
+observation loss may incorporate extra terms for initial and boundary
+conditions. Let $N_t$ denote the number of training points. Then,
+\begin{equation}
+  \mathcal{L}_{PINN}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+\end{equation}
+represents the comprehensive loss function of a physics-informed neural network.\\
+
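+To illustrate how such a residual is obtained, consider as a purely
+illustrative example (not taken from~\cite{Raissi2017}) the single equation
+$\frac{du}{dt} = -\lambda u$ with a constant $\lambda$. Rearranging all terms
+to one side yields the residual
+\begin{equation}
+  r(t, \hat{u}) = \frac{d\hat{u}}{dt} + \lambda \hat{u},
+\end{equation}
+which vanishes exactly when the prediction $\hat{u}$ satisfies the
+differential equation.\\
+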
+\todo{check for correctness}
+Given the nature of residuals, calculating the loss term
+$\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
+calculation of the derivative of the output with respect to the input of
+the neural network. As we outline in~\Cref{sec:mlp}, during the process of
+back-propagation we calculate the gradients of the loss term with respect to a
+layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
+the index of the \todo{check for consistency} respective layer. By employing
+the chain rule of calculus, the algorithm progresses from the output layer
+through each hidden layer, ultimately reaching the first layer in order to
+compute the respective gradients. The term,
+\begin{equation}
+  \nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
+\end{equation}
+illustrates that, in contrast to the procedure described in~\cref{eq:backprop},
+\emph{automatic differentiation} goes one step further and calculates the
+gradient of the output with respect to the input $\boldsymbol{x}$. In order to
+calculate the second derivative
+$\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
+this procedure must be repeated.\\
+
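+The following minimal sketch illustrates this procedure in code. It is not
+part of the cited works and assumes a hypothetical PyTorch implementation with
+a scalar-output network; the architecture is chosen purely for illustration.
+\begin{verbatim}
+import torch
+
+# Illustrative network mapping a one-dimensional input to a scalar output.
+model = torch.nn.Sequential(
+    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
+
+x = torch.linspace(0.0, 1.0, 100).reshape(-1, 1)
+x.requires_grad_(True)          # track operations on the input
+u = model(x)                    # model prediction u_hat(x)
+
+# First derivative du/dx; create_graph=True keeps the computational graph
+# so that the result can be differentiated again.
+du_dx = torch.autograd.grad(u, x, grad_outputs=torch.ones_like(u),
+                            create_graph=True)[0]
+
+# Second derivative d^2u/dx^2: the same procedure repeated on du/dx.
+d2u_dx2 = torch.autograd.grad(du_dx, x, grad_outputs=torch.ones_like(du_dx),
+                              create_graph=True)[0]
+\end{verbatim}
+Both derivatives can then be inserted into the residuals of the physics loss.\\
+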
+Above, we present a method for approximating functions through the use of
+systems of differential equations. As previously stated, we want to find a
+solver for systems of differential equations. In problems where we must solve
+an ODE or PDE, we have to find a set of parameters that satisfies the system
+for any input $\boldsymbol{x}$. In the context of PINNs, this is the
+inverse problem, where a set of training data, for example from measurements,
+is available along with the respective differential equations, but
+information about the parameters of the equations is lacking. To address this
+challenge, we set these parameters as distinct learnable parameters within the
+neural network. This enables the network to utilize a specific value that
+actively influences the physics loss $\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$.
+During the training phase, the optimizer aims to minimize the physics loss,
+which should ultimately yield an approximation of the true parameter value.\\
+
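+A minimal sketch of this idea, again assuming a hypothetical PyTorch
+implementation rather than the method of the cited works, registers the
+unknown coefficient as an additional learnable parameter next to the network
+weights:
+\begin{verbatim}
+import torch
+
+# Hypothetical unknown coefficient of the differential equation; as an
+# nn.Parameter it is updated by the optimizer alongside the network weights.
+mu = torch.nn.Parameter(torch.tensor(1.0))
+
+model = torch.nn.Sequential(
+    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
+optimizer = torch.optim.Adam(list(model.parameters()) + [mu], lr=1e-3)
+\end{verbatim}
+Because \texttt{mu} enters only the physics loss, minimizing that loss is what
+drives it towards the true value of the coefficient.\\
+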
+\begin{figure}[h]
+  \centering
+  \includegraphics[scale=0.87]{oscilator.pdf}
+  \caption{Illustration of the movement of an oscillating body in the
+    underdamped case, with $m=1kg$, $\mu=4\frac{Ns}{m}$ and $k=200\frac{N}{m}$.}
+  \label{fig:spring}
+\end{figure}
+One illustrative example of a potential application for PINNs is the
+\emph{damped harmonic oscillator}~\cite{Tenenbaum1985}. In this problem, we \todo{check source for wording}
+displace a body, which is attached to a spring, from its resting position. The
+body is subject to three forces: firstly, the inertia exerted by the
+displacement $u$, which points in the direction of the displacement $u$; secondly,
+the restoring force of the spring, which attempts to return the body to its
+original position; and thirdly, the friction force, which points in the opposite
+direction of the movement. In accordance with Newton's second law and the
+combined influence of these forces, the body exhibits oscillatory motion around
+its position of rest. The system is influenced by $m$, the mass of the body,
+$\mu$, the coefficient of friction, and $k$, the spring constant, indicating the
+stiffness of the spring. The differential equation, \todo{check in book}
+\begin{equation}
+  m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
+\end{equation}
+shows the relation of these parameters in the problem at hand. As
+Tenenbaum and Pollard show, there are three potential solutions to this
+equation. However, only the \emph{underdamped case} results in an oscillating
+movement of the body, as illustrated in~\Cref{fig:spring}. In order to apply a
+PINN to this problem, we require a set of training data. This consists of
+pairs of timepoints and corresponding displacement measurements
+$(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In this hypothetical case,
+we know the mass $m=1kg$, the spring constant $k=200\frac{N}{m}$, and the
+initial conditions $u(0) = 1$ and $\frac{du(0)}{dt} = 0$. However, we do
+not know the value of the friction coefficient $\mu$. In this case, the loss function,
+\begin{equation}
+  \mathcal{L}_{osc}(\boldsymbol{t}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = (\hat{u}(0)-1)^2+\left(\frac{d\hat{u}(0)}{dt}\right)^2+||m\frac{d^2\hat{u}}{dt^2}+\mu\frac{d\hat{u}}{dt}+k\hat{u}||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{u}}^{(i)}-\boldsymbol{u}^{(i)}||^2,
+\end{equation}
+includes the initial conditions, the residual, in which $\mu$ is a learnable
+parameter, and the observation loss.

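+The sketch below outlines how this loss could be assembled in code. It is a
+hypothetical PyTorch implementation, not taken from the cited works, and it
+reuses the automatic differentiation and learnable parameter ideas from the
+sketches above; \texttt{t\_obs} and \texttt{u\_obs} denote the measured
+training pairs and \texttt{t\_col} the points at which the residual is
+evaluated.
+\begin{verbatim}
+import torch
+
+model = torch.nn.Sequential(
+    torch.nn.Linear(1, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
+mu = torch.nn.Parameter(torch.tensor(1.0))  # unknown friction coefficient
+m, k = 1.0, 200.0                           # known mass and spring constant
+
+def oscillator_loss(t_obs, u_obs, t_col):
+    # Observation loss: MSE between predictions and measured displacements.
+    loss_obs = torch.mean((model(t_obs) - u_obs) ** 2)
+
+    # Initial conditions u(0) = 1 and du/dt(0) = 0.
+    t0 = torch.zeros(1, 1, requires_grad=True)
+    u0 = model(t0)
+    du0 = torch.autograd.grad(u0, t0, create_graph=True)[0]
+    loss_ic = (u0 - 1.0) ** 2 + du0 ** 2
+
+    # Physics loss: residual of m*u'' + mu*u' + k*u = 0 at t_col.
+    t = t_col.clone().requires_grad_(True)
+    u = model(t)
+    du = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
+                             create_graph=True)[0]
+    d2u = torch.autograd.grad(du, t, grad_outputs=torch.ones_like(du),
+                              create_graph=True)[0]
+    residual = m * d2u + mu * du + k * u
+    loss_phys = torch.mean(residual ** 2)
+
+    return loss_obs + loss_ic.squeeze() + loss_phys
+\end{verbatim}
+Minimizing this loss with the optimizer then adjusts the network weights and
+the learnable parameter \texttt{mu} simultaneously.\\
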
% -------------------------------------------------------------------
|