@@ -28,12 +28,12 @@ in~\Cref{sec:pinn}.
\label{sec:domain}
To model a physical problem using mathematical tools, it is necessary to define
-a set of fundamental numbers or quantities upon which the subsequent calculations
-will be based. These sets may represent, for instance, a specific time interval
-or a distance. The term \emph{domain} describes these fundamental sets of
-numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
-living in a certain domain. In this thesis, we will focus on domains of real
-numbers in $\mathbb{R}$.\\
+a set of fundamental numbers or quantities upon which the subsequent
+calculations will be based. These sets may represent, for instance, a specific
+time interval or a distance. The term \emph{domain} describes these fundamental
+sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
+entity living in a certain domain. In this thesis, we will focus on domains of
+real numbers in $\mathbb{R}$.\\
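+As a simple illustration (the symbols are chosen only for this example), a time
+domain could be the interval $T = [0, 10] \subset \mathbb{R}$, and a variable
+$t \in T$ then denotes a single point in time within this domain.
+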
The mapping between variables enables the modeling of the process and depicts
the semantics. We use functions in order to facilitate this mapping. Let
@@ -408,49 +408,52 @@ systems, as we describe in~\Cref{sec:mlp}.
\section{Multilayer Perceptron 2}
\label{sec:mlp}
-In~\Cref{sec:differentialEq} we show the importance of differential equations to
-systems, being able to show the change of it dependent on a certain parameter of
-the parameter. In~\Cref{sec:epidemModel} we show specific applications for
-differential equations in an epidemiological context. Now, the last point is to
-solve these equations. For this problem, there are multiple methods to reach
-this goal one of them is the \emph{Multilayer Perceptron}
-(MLP)~\cite{Hornik1989}. In the following we briefly tackle the structure,
-training and usage of these \emph{neural networks} using, for which we use the book
-\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
-for our explanations.\\
-
-The goal is to be able to approximate any function $f^{*}$ that is for instance
-mathematical function or a mapping of an input vector to a class or category.
-Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
-or result, then,
-\begin{equation}
- \boldsymbol{y} = f^{*}(\boldsymbol{x}),
-\end{equation}
+In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
+equations in systems, illustrating how they can be utilized to elucidate the
+impact of a specific parameter on the system's behavior.
+In~\Cref{sec:epidemModel}, we show specific applications of differential
+equations in an epidemiological context. The final objective is to solve these
+equations. For this problem, there are multiple methods to achieve this goal.
+One such method is the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In
+the following section, we provide a brief overview of the structure and training
+of these \emph{neural networks}. For reference, we use the book \emph{Deep
+Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for
+our explanations.\\
+
+The objective is to develop an approximation method for any function $f^{*}$,
+which could be a mathematical function or a mapping of an input vector to a
+class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
+the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
is the function to approximate. In the year 1958,
Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs anoperation and produces a scalar result. This model
+$\boldsymbol{x}$ performs an operation and produces a scalar result. This model
optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
- f(\boldsymbol{x}; \theta)$ as correct as possible. As Minsky and
-Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
-only a class of functions. Thus, the need for an expansion of the perceptron.\\
-
-As Goodfellow \etal go on, the solution for this is to split $f$ into a chain
-structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
-This transforms a perceptron, which has an input and output layer into a
-multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
-structure of an MLP as a \emph{layer}, which are each build of a multitude of
-\emph{units} (also \emph{neurons}) each of which are doing the same
-vector-to-scalar calculation as the perceptron does. Each scalar, is then given
-to a nonlinear activation function. The layers are staggered in the neural
-network, with each being connected to its neighbor, in the way as illustrated
-in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
-unit of the first layer $f^{(1)}$, which results are then given to the units of
-the second layer $f^{(2)}$, and so on. The last layer is called the
-\emph{output layer}. All layers in between the first and the output layers are
-called \emph{hidden layers}. Through the alternating structure of linear and
-nonlinear calculation MLP's are able to approximate any kind of function. As
-Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
+ f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
+approximating a limited class of functions. Consequently, there is a necessity
+for an expansion of the perceptron.\\
+
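+As a minimal sketch of the computation a single perceptron performs (the weights
+$\boldsymbol{w}$, the bias $b$, and the activation function $\sigma$ together
+constitute $\theta$ and are notation we introduce only for this example), the
+perceptron can be written as
+\begin{equation}
+ \hat{y} = \sigma\left(\boldsymbol{w}^{\top}\boldsymbol{x} + b\right).
+\end{equation}
+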
+As Goodfellow \etal proceed, the solution to this issue is to decompose $f$ into
+a chain structure of the form,
+\begin{equation} \label{eq:mlp_char}
+ f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
+\end{equation}
+This converts a perceptron, which has only two layers (an input and an output
+layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
+is represented in the structure of an MLP as a \emph{layer}. A multitude of
+\emph{units} (also \emph{neurons}) compose each layer. A neuron performs the
+same vector-to-scalar calculation as the perceptron does. Subsequently, a
+nonlinear activation function transforms the scalar output into the activation
+of the unit. The layers are staggered in the neural network, with each layer
+being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+$f^{(1)}$, which then gives the results to the units of the second layer
+$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
+intervening layers, situated between the first and the output layers, are the
+\emph{hidden layers}. The alternating structure of linear and nonlinear
+calculation enables MLPs to approximate any continuous function. As Hornik
+\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
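+As an illustrative sketch (the weight matrix $W^{(i)}$, the bias vector
+$\boldsymbol{b}^{(i)}$, and the activation function $\sigma$ are notation we
+introduce only for this example), a single layer can be written as
+\begin{equation}
+ f^{(i)}(\boldsymbol{z}) = \sigma\left(W^{(i)}\boldsymbol{z} + \boldsymbol{b}^{(i)}\right),
+\end{equation}
+where $\boldsymbol{z}$ denotes the output vector of the preceding layer (or the
+input $\boldsymbol{x}$ for the first layer).
+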
\begin{figure}[h]
\centering
@@ -460,34 +463,58 @@ Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
\label{fig:mlp_example}
\end{figure}
-The process of optimizing the parameters $\theta$ is called \emph{training}.
-For trainning we will have to have a set of \emph{training data}, which
-is a set of pairs (also called training points) of the input data
-$\boldsymbol{x}$ and its corresponding true solution $\boldsymbol{y}$ of the
-function $f^{*}$. For the training process we must define the
-\emph{loss function} $\Loss{ }$, using the model prediction
+The term \emph{training} describes the process of optimizing the parameters
+$\theta$. In order to undertake training, it is necessary to have a set of
+\emph{training data}, which is a set of pairs (also called training points) of
+the input data $\boldsymbol{x}$ and its corresponding true solution
+$\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
+define the \emph{loss function} $\Loss{ }$, using the model prediction
$\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
-metric of how far the model is away from the correct answer. One of the most
-common loss function is the \emph{mean square error} (MSE) loss function. Let
-$N$ be the number of points in the set of training data, then
+metric for evaluating the extent to which the model deviates from the correct
+answer. One of the most common loss functions is the \emph{mean square error}
+(MSE) loss function. Let $N$ be the number of points in the set of training
+data. Then,
\begin{equation}
- \Loss{MSE} = \frac{1}{N}\sum_{n=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+ \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
-calculates the squared differnce between each model prediction and true value
-of a training and takes the mean across the whole training data. Now, we only
-need a way to change the parameters using our loss. For this gradient descent
-is used in gradient-based learning.
-- gradient descent/gradient-based learning
-- backpropagation
-- optimizers
-- learning rate
-- scheduler
-
+calculates the squared difference between each model prediction and true value
+of a training point and takes the mean across the whole training data.\\
+
+In the context of neural networks, \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. Ultimately, the objective is to
+utilize this information to optimize the parameters in order to minimize the
+loss. One of the most fundamental optimization strategies is \emph{gradient
+descent}. In this process, the derivatives are employed to identify the location
+of local or global minima within a function. Since the gradient points in the
+direction of steepest ascent, we must move the parameters by a constant
+\emph{learning rate} (step size) in the opposite direction of the gradient.
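+A single gradient descent step can thus be written as the update rule (a
+standard formulation; $\eta$ denotes the learning rate and $k$ the iteration
+index),
+\begin{equation}
+ \theta_{k+1} = \theta_{k} - \eta \nabla_{\theta}\Loss{ }(\theta_{k}).
+\end{equation}
+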
+The calculation of the derivatives with respect to the parameters is a complex
+task, since our function is a composition of many functions (one for each
+layer). The algorithm of \emph{back propagation} \todo{Insert source} takes
+advantage of~\Cref{eq:mlp_char} and addresses this issue by employing the chain
+rule of calculus.\\
+
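+For the three-layer composition in~\Cref{eq:mlp_char}, for instance, the chain
+rule yields (with $\boldsymbol{h}^{(i)}$ denoting the output of layer $f^{(i)}$
+and $\theta^{(1)}$ the parameters of the first layer, a notation used only for
+this sketch)
+\begin{equation}
+ \frac{\partial \Loss{ }}{\partial \theta^{(1)}} =
+ \frac{\partial \Loss{ }}{\partial \boldsymbol{h}^{(3)}}
+ \frac{\partial \boldsymbol{h}^{(3)}}{\partial \boldsymbol{h}^{(2)}}
+ \frac{\partial \boldsymbol{h}^{(2)}}{\partial \boldsymbol{h}^{(1)}}
+ \frac{\partial \boldsymbol{h}^{(1)}}{\partial \theta^{(1)}},
+\end{equation}
+where back propagation evaluates these factors from the output layer backwards
+and reuses intermediate results.
+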
+In practical applications, an optimizer often accomplishes the optimization task
+by executing gradient descent in the background. Furthermore, modifying the
+learning rate during training can be advantageous, for instance, by making
+larger steps at the beginning and only minor adjustments at the end. Schedulers
+are therefore algorithms that implement diverse strategies for altering the
+learning rate over the course of training.\\
+
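+For illustration only, the interplay of forward propagation, back propagation,
+an optimizer, and a scheduler could look like the following minimal sketch. It
+uses the PyTorch library purely as an example; the model architecture, the
+training data, and all hyperparameter values are placeholders chosen for this
+sketch.
+\begin{verbatim}
+import torch
+
+# Placeholder training data: 32 points of a known target function.
+x_train = torch.linspace(0.0, 1.0, 32).unsqueeze(1)
+y_train = torch.sin(x_train)
+
+# Small MLP with one hidden layer (layer sizes are placeholders).
+model = torch.nn.Sequential(
+    torch.nn.Linear(1, 16),
+    torch.nn.Tanh(),
+    torch.nn.Linear(16, 1),
+)
+loss_fn = torch.nn.MSELoss()
+optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
+scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.99)
+
+for epoch in range(200):
+    prediction = model(x_train)          # forward propagation
+    loss = loss_fn(prediction, y_train)  # scalar MSE loss
+    optimizer.zero_grad()
+    loss.backward()                      # back propagation
+    optimizer.step()                     # one gradient descent step
+    scheduler.step()                     # decay the learning rate
+\end{verbatim}
+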
+This section provides an overview of basic concepts of neural networks. For a
+deeper understanding, we direct the reader to the book \emph{Deep Learning} by
+Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
+the application of neural networks in approximating solutions to differential
+systems.
% -------------------------------------------------------------------
\section{Physics Informed Neural Networks 5}
\label{sec:pinn}
+In~\Cref{sec:mlp} we described the structure and training of MLPs, which are
+recognized tools for approximating any kind of function. In this section, we
+want to make use of this ability and employ neural networks as approximators
+for ODEs.
+
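+To sketch the basic idea (the notation $u_{\theta}$, $g$, and the points $t_{i}$
+are chosen only for this illustration), consider a first-order ODE
+$\frac{\mathrm{d}u}{\mathrm{d}t}(t) = g(u(t), t)$. A neural network
+$u_{\theta}(t)$ takes the role of the solution, and its parameters are trained
+by penalizing the residual of the ODE at a set of points $t_{1}, \dots, t_{N}$,
+\begin{equation}
+ \Loss{ODE} = \frac{1}{N}\sum_{i=1}^{N} ||\frac{\mathrm{d}u_{\theta}}{\mathrm{d}t}(t_{i}) - g(u_{\theta}(t_{i}), t_{i})||^2.
+\end{equation}
+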
% -------------------------------------------------------------------