
finish mlp

Phillip Rothenbeck 1 year ago
parent commit 3e7fa08192
2 changed files with 92 additions and 65 deletions
  1. chapters/chap02/chap02.tex (+92 -65)
  2. thesis.pdf (BIN)

+ 92 - 65
chapters/chap02/chap02.tex

@@ -28,12 +28,12 @@ in~\Cref{sec:pinn}.
 \label{sec:domain}

 To model a physical problem using mathematical tools, it is necessary to define
-a set of fundamental numbers or quantities upon which the subsequent calculations
-will be based. These sets may represent, for instance, a specific time interval
-or a distance. The term \emph{domain} describes these fundamental sets of
-numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
-living in a certain domain. In this thesis, we will focus on domains of real
-numbers in $\mathbb{R}$.\\
+a set of fundamental numbers or quantities upon which the subsequent
+calculations will be based. These sets may represent, for instance, a specific
+time interval or a distance. The term \emph{domain} describes these fundamental
+sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
+entity living in a certain domain. In this thesis, we will focus on domains of
+real numbers in $\mathbb{R}$.\\
 
 
 The mapping between variables enables the modeling of the process and depicts
 the semantics. We use functions in order to facilitate this mapping. Let
@@ -408,49 +408,52 @@ systems, as we describe in~\Cref{sec:mlp}.
 
 
 \section{Multilayer Perceptron   2}
 \label{sec:mlp}
-In~\Cref{sec:differentialEq} we show the importance of differential equations to
-systems, being able to show the change of it dependent on a certain parameter of
-the parameter. In~\Cref{sec:epidemModel} we show specific applications for
-differential equations in an epidemiological context. Now, the last point is to
-solve these equations. For this problem, there are multiple methods to reach
-this goal one of them is the \emph{Multilayer Perceptron}
-(MLP)~\cite{Hornik1989}. In the following we briefly tackle the structure,
-training and usage of these \emph{neural networks} using, for which we use the book
-\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
-for our explanations.\\
-
-The goal is to be able to approximate any function $f^{*}$ that is for instance
-mathematical function or a mapping of an input vector to a class or category.
-Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
-or result, then,
-\begin{equation}
-  \boldsymbol{y} = f^{*}(\boldsymbol{x}),
-\end{equation}
+In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
+equations for describing systems, illustrating how they can be utilized to
+elucidate the impact of a specific parameter on the system's behavior.
+In~\Cref{sec:epidemModel}, we show specific applications of differential
+equations in an epidemiological context. The final objective is to solve these
+equations. There are multiple methods to achieve this goal; one such method is
+the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
+following section, we provide a brief overview of the structure and training of
+these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
+by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
+explanations.\\
+
+The objective is to develop a method for approximating any function $f^{*}$,
+which could be, for instance, a classical mathematical function or a mapping of
+an input vector to a class or category. Let $\boldsymbol{x}$ be the input
+vector and $\boldsymbol{y}$ the label, class, or result. Then,
+$\boldsymbol{y} = f^{*}(\boldsymbol{x})$
 is the function to approximate. In the year 1958,
 Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
 a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs anoperation and produces a scalar result. This model
+$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
 optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
-  f(\boldsymbol{x}; \theta)$ as correct as possible. As Minsky and
-Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
-only a class of functions. Thus, the need for an expansion of the perceptron.\\
-
-As Goodfellow \etal go on, the solution for this is to split $f$ into a chain
-structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
-This transforms a perceptron, which has an input and output layer into a
-multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
-structure of an MLP as a \emph{layer}, which are each build of a multitude of
-\emph{units} (also \emph{neurons}) each of which are doing the same
-vector-to-scalar calculation as the perceptron does. Each scalar, is then given
-to a nonlinear activation function. The layers are staggered in the neural
-network, with each being connected to its neighbor, in the way as illustrated
-in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
-unit of the first layer $f^{(1)}$, which results are then given to the units of
-the second layer $f^{(2)}$, and so on. The last layer is called the
-\emph{output layer}. All layers in between the first and the output layers are
-called \emph{hidden layers}. Through the alternating structure of linear and
-nonlinear calculation MLP's are able to approximate any kind of function. As
-Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
+  f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
+approximating a specific class of functions. Consequently, the perceptron needs
+to be expanded.\\
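+
+For illustration, the vector-to-scalar computation of a single perceptron can
+be sketched as follows, where the weight vector $\boldsymbol{w}$, the bias $b$,
+and the activation function $\varphi$ (a threshold function in Rosenblatt's
+original model) are notation we introduce here only for this sketch:
+\begin{equation}
+  \hat{y} = \varphi\left(\boldsymbol{w}^{\top}\boldsymbol{x} + b\right),
+\end{equation}
+so that the trainable parameters are $\theta = \{\boldsymbol{w}, b\}$.\\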
+
+As Goodfellow \etal further explain, the solution to this issue is to decompose
+$f$ into
+a chain structure of the form,
+\begin{equation} \label{eq:mlp_char}
+  f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
+\end{equation}
+This converts a perceptron, which has only two layers (an input and an output
+layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
+is represented in the structure of an MLP as a \emph{layer}. Each layer is
+composed of a multitude of \emph{units} (also called \emph{neurons}), each of
+which performs the same vector-to-scalar calculation as the perceptron does.
+Subsequently, a
+nonlinear activation function transforms the scalar output into the activation
+of the unit. The layers are staggered in the neural network, with each layer
+being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+$f^{(1)}$, which then gives the results to the units of the second layer
+$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
+intervening layers, situated between the first and the output layer, are the
+\emph{hidden layers}. The alternating structure of linear and nonlinear
+calculation enables MLPs to approximate any function. As Hornik
+\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
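+
+To sketch this layer structure more concretely, each layer
+in~\Cref{eq:mlp_char} can be written in the following form, where the weight
+matrices $W^{(i)}$, the bias vectors $\boldsymbol{b}^{(i)}$, and the activation
+function $\sigma$ are notation introduced here only for illustration:
+\begin{equation}
+  f^{(i)}(\boldsymbol{z}) = \sigma\left(W^{(i)}\boldsymbol{z} + \boldsymbol{b}^{(i)}\right),
+\end{equation}
+where $\boldsymbol{z}$ denotes the output of the previous layer (or the input
+$\boldsymbol{x}$ for the first layer), and the entries of all $W^{(i)}$ and
+$\boldsymbol{b}^{(i)}$ together form the parameter set $\theta$.\\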
 
 
 \begin{figure}[h]
   \centering
@@ -460,34 +463,58 @@ Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
   \label{fig:mlp_example}
 \end{figure}
 
 
-The process of optimizing the parameters $\theta$ is called \emph{training}.
-For trainning we will have to have a set of \emph{training data}, which
-is a set of pairs (also called training points) of the input data
-$\boldsymbol{x}$ and its corresponding true solution $\boldsymbol{y}$ of the
-function $f^{*}$. For the training process we must define the
-\emph{loss function} $\Loss{ }$, using the model prediction
+The term \emph{training} describes the process of optimizing the parameters
+$\theta$. Training requires a set of \emph{training data}, consisting of pairs
+(also called training points) of the input data $\boldsymbol{x}$ and the
+corresponding true solution $\boldsymbol{y}$ of the function $f^{*}$. For the
+training process, we must define the \emph{loss function} $\Loss{ }$, using the
+model prediction
 $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
-metric of how far the model is away from the correct answer. One of the most
-common loss function is the \emph{mean square error} (MSE) loss function. Let
-$N$ be the number of points in the set of training data, then
+metric for evaluating the extent to which the model deviates from the correct
+answer. One of the most common loss functions is the \emph{mean square error}
+(MSE) loss function. Let $N$ be the number of points in the set of training
+data. Then,
 \begin{equation}
-  \Loss{MSE} = \frac{1}{N}\sum_{n=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+  \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
 \end{equation}
-calculates the squared differnce between each model prediction and true value
-of a training and takes the mean across the whole training data. Now, we only
-need a way to change the parameters using our loss. For this gradient descent
-is used in gradient-based learning.
-- gradient descent/gradient-based learning
-- backpropagation
-- optimizers
-- learning rate
-- scheduler
-
+calculates the squared difference between each model prediction and the true
+value of a training point and takes the mean across the whole training data. \\
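+
+As a purely illustrative numerical example with made-up values, consider $N=2$
+scalar training points with predictions $\hat{y}^{(1)} = 0.8$,
+$\hat{y}^{(2)} = 0.1$ and true values $y^{(1)} = 1$, $y^{(2)} = 0$. Then,
+\begin{equation}
+  \Loss{MSE} = \frac{1}{2}\left((0.8 - 1)^2 + (0.1 - 0)^2\right) = \frac{0.04 + 0.01}{2} = 0.025.
+\end{equation}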
+
+In the context of neural networks, \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. Ultimately, the objective is to
+utilize this information to optimize the parameters, in order to minimize the
+loss. One of the most fundamental optimization strategies is \emph{gradient
+  descent}. In this process, the derivatives are employed to identify the
+location of local or global minima of a function. Since the gradient points in
+the direction of steepest ascent, we must move the parameters by a constant
+\emph{learning rate} (step size) in the direction opposite to the gradient. The
+calculation of the derivatives with respect to the parameters is a complex
+task, since our function is a composition of many functions (one for each
+layer). The algorithm of \emph{backpropagation} \todo{Insert source} takes
+advantage of~\Cref{eq:mlp_char} and addresses this issue by employing the chain
+rule of calculus.\\
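+
+As a brief sketch, with the learning rate $\eta$ and the iteration index $k$
+being notation we introduce here, one gradient descent step updates the current
+parameters $\theta_{k}$ according to
+\begin{equation}
+  \theta_{k+1} = \theta_{k} - \eta \nabla_{\theta} \Loss{ },
+\end{equation}
+and, schematically, backpropagation obtains the required gradient for the
+composition in~\Cref{eq:mlp_char} via the chain rule,
+\begin{equation}
+  \frac{\partial \Loss{ }}{\partial f^{(1)}} =
+  \frac{\partial \Loss{ }}{\partial f^{(3)}} \,
+  \frac{\partial f^{(3)}}{\partial f^{(2)}} \,
+  \frac{\partial f^{(2)}}{\partial f^{(1)}}.
+\end{equation}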
+
+In practical applications, an optimizer often accomplishes the optimization
+task by executing gradient descent in the background. Furthermore, modifying
+the learning rate during training can be advantageous, for instance, taking
+larger steps at the beginning and making only minor adjustments towards the
+end. Schedulers are algorithms that implement such learning rate adjustment
+strategies.\\
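+
+One simple example of such a strategy, given here only as an illustration, is
+an exponential decay schedule with an initial learning rate $\eta_{0}$ and a
+decay factor $\gamma \in (0, 1)$,
+\begin{equation}
+  \eta_{k} = \eta_{0}\,\gamma^{k},
+\end{equation}
+which yields comparatively large steps at the beginning of training and
+gradually smaller ones in later epochs $k$.\\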
+
+This section provides an overview of basic concepts of neural networks. For a
+deeper understanding, we direct the reader to the book \emph{Deep Learning} by
+Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
+the application of neural networks in approximating solutions to differential
+systems.
 
 
 % -------------------------------------------------------------------

 \section{Physics Informed Neural Networks   5}
 \label{sec:pinn}
+In~\Cref{sec:mlp}, we described the structure and training of MLPs, which are
+recognized tools for approximating any kind of function. In this section, we
+want to make use of this ability and use neural networks as approximators for
+ODEs.
 
 
 % -------------------------------------------------------------------
 
 

BIN
thesis.pdf