
finish mlp

Phillip Rothenbeck 1 year ago
parent commit 3e7fa08192
2 changed files with 92 additions and 65 deletions
  1. chapters/chap02/chap02.tex (+92 -65)
  2. thesis.pdf (BIN)

+ 92 - 65
chapters/chap02/chap02.tex

@@ -28,12 +28,12 @@ in~\Cref{sec:pinn}.
 \label{sec:domain}

 To model a physical problem using mathematical tools, it is necessary to define
-a set of fundamental numbers or quantities upon which the subsequent calculations
-will be based. These sets may represent, for instance, a specific time interval
-or a distance. The term \emph{domain} describes these fundamental sets of
-numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
-living in a certain domain. In this thesis, we will focus on domains of real
-numbers in $\mathbb{R}$.\\
+a set of fundamental numbers or quantities upon which the subsequent
+calculations will be based. These sets may represent, for instance, a specific
+time interval or a distance. The term \emph{domain} describes these fundamental
+sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
+entity living in a certain domain. In this thesis, we will focus on domains of
+real numbers in $\mathbb{R}$.\\
 
 
 The mapping between variables enables the modeling of the process and depicts
 the semantics. We use functions in order to facilitate this mapping. Let
@@ -408,49 +408,52 @@ systems, as we describe in~\Cref{sec:mlp}.
 
 
 \section{Multilayer Perceptron   2}
 \label{sec:mlp}
-In~\Cref{sec:differentialEq} we show the importance of differential equations to
-systems, being able to show the change of it dependent on a certain parameter of
-the parameter. In~\Cref{sec:epidemModel} we show specific applications for
-differential equations in an epidemiological context. Now, the last point is to
-solve these equations. For this problem, there are multiple methods to reach
-this goal one of them is the \emph{Multilayer Perceptron}
-(MLP)~\cite{Hornik1989}. In the following we briefly tackle the structure,
-training and usage of these \emph{neural networks} using, for which we use the book
-\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
-for our explanations.\\
-
-The goal is to be able to approximate any function $f^{*}$ that is for instance
-mathematical function or a mapping of an input vector to a class or category.
-Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
-or result, then,
-\begin{equation}
-  \boldsymbol{y} = f^{*}(\boldsymbol{x}),
-\end{equation}
+In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
+equations for describing systems, illustrating how they can be utilized to
+elucidate the impact of a specific parameter on the system's behavior.
+In~\Cref{sec:epidemModel}, we show specific applications of differential
+equations in an epidemiological context. The final objective is to solve these
+equations. There are multiple methods to achieve this goal; one such method is
+the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
+following section, we provide a brief overview of the structure and training of
+these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
+by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
+explanations.\\
+
+The objective is to develop a method for approximating any function $f^{*}$,
+which could be, for instance, a classical mathematical function or a mapping of
+an input vector to a class or category. Let $\boldsymbol{x}$ be the input
+vector and $\boldsymbol{y}$ the label, class, or result. Then,
+$\boldsymbol{y} = f^{*}(\boldsymbol{x})$
 is the function to approximate. In the year 1958,
 Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
 a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs anoperation and produces a scalar result. This model
+$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
 optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
-  f(\boldsymbol{x}; \theta)$ as correct as possible. As Minsky and
-Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
-only a class of functions. Thus, the need for an expansion of the perceptron.\\
-
-As Goodfellow \etal go on, the solution for this is to split $f$ into a chain
-structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
-This transforms a perceptron, which has an input and output layer into a
-multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
-structure of an MLP as a \emph{layer}, which are each build of a multitude of
-\emph{units} (also \emph{neurons}) each of which are doing the same
-vector-to-scalar calculation as the perceptron does. Each scalar, is then given
-to a nonlinear activation function. The layers are staggered in the neural
-network, with each being connected to its neighbor, in the way as illustrated
-in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
-unit of the first layer $f^{(1)}$, which results are then given to the units of
-the second layer $f^{(2)}$, and so on. The last layer is called the
-\emph{output layer}. All layers in between the first and the output layers are
-called \emph{hidden layers}. Through the alternating structure of linear and
-nonlinear calculation MLP's are able to approximate any kind of function. As
-Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
+  f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
+approximating a specific class of functions. Consequently, the perceptron needs
+to be expanded.\\
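+
+For illustration, the vector-to-scalar computation of a single perceptron can
+be sketched as follows, where the weight vector $\boldsymbol{w}$, the bias $b$,
+and the activation function $\varphi$ (a threshold function in Rosenblatt's
+original model) are notation we introduce here only for this sketch:
+\begin{equation}
+  \hat{y} = \varphi\left(\boldsymbol{w}^{\top}\boldsymbol{x} + b\right),
+\end{equation}
+so that the trainable parameters are $\theta = \{\boldsymbol{w}, b\}$.\\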
+
+As Goodfellow \etal further explain, the solution to this issue is to decompose
+$f$ into
+a chain structure of the form,
+\begin{equation} \label{eq:mlp_char}
+  f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
+\end{equation}
+This converts a perceptron, which has only two layers (an input and an output
+layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
+is represented in the structure of an MLP as a \emph{layer}. Each layer is
+composed of a multitude of \emph{units} (also called \emph{neurons}), each of
+which performs the same vector-to-scalar calculation as the perceptron does.
+Subsequently, a
+nonlinear activation function transforms the scalar output into the activation
+of the unit. The layers are staggered in the neural network, with each layer
+being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+$f^{(1)}$, which then gives the results to the units of the second layer
+$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
+intervening layers, situated between the first and the output layer, are the
+\emph{hidden layers}. The alternating structure of linear and nonlinear
+calculation enables MLPs to approximate any function. As Hornik
+\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
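+
+To sketch this layer structure more concretely, each layer
+in~\Cref{eq:mlp_char} can be written in the following form, where the weight
+matrices $W^{(i)}$, the bias vectors $\boldsymbol{b}^{(i)}$, and the activation
+function $\sigma$ are notation introduced here only for illustration:
+\begin{equation}
+  f^{(i)}(\boldsymbol{z}) = \sigma\left(W^{(i)}\boldsymbol{z} + \boldsymbol{b}^{(i)}\right),
+\end{equation}
+where $\boldsymbol{z}$ denotes the output of the previous layer (or the input
+$\boldsymbol{x}$ for the first layer), and the entries of all $W^{(i)}$ and
+$\boldsymbol{b}^{(i)}$ together form the parameter set $\theta$.\\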
 
 
 \begin{figure}[h]
   \centering
@@ -460,34 +463,58 @@ Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
   \label{fig:mlp_example}
 \end{figure}
 
 
-The process of optimizing the parameters $\theta$ is called \emph{training}.
-For trainning we will have to have a set of \emph{training data}, which
-is a set of pairs (also called training points) of the input data
-$\boldsymbol{x}$ and its corresponding true solution $\boldsymbol{y}$ of the
-function $f^{*}$. For the training process we must define the
-\emph{loss function} $\Loss{ }$, using the model prediction
+The term \emph{training} describes the process of optimizing the parameters
+$\theta$. Training requires a set of \emph{training data}, consisting of pairs
+(also called training points) of the input data $\boldsymbol{x}$ and the
+corresponding true solution $\boldsymbol{y}$ of the function $f^{*}$. For the
+training process, we must define the \emph{loss function} $\Loss{ }$, using the
+model prediction
 $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
-metric of how far the model is away from the correct answer. One of the most
-common loss function is the \emph{mean square error} (MSE) loss function. Let
-$N$ be the number of points in the set of training data, then
+metric for evaluating the extent to which the model deviates from the correct
+answer. One of the most common loss functions is the \emph{mean square error}
+(MSE) loss function. Let $N$ be the number of points in the set of training
+data. Then,
 \begin{equation}
-  \Loss{MSE} = \frac{1}{N}\sum_{n=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+  \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
 \end{equation}
-calculates the squared differnce between each model prediction and true value
-of a training and takes the mean across the whole training data. Now, we only
-need a way to change the parameters using our loss. For this gradient descent
-is used in gradient-based learning.
-- gradient descent/gradient-based learning
-- backpropagation
-- optimizers
-- learning rate
-- scheduler
-
+calculates the squared difference between each model prediction and the true
+value of a training point and takes the mean across the whole training data. \\
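+
+As a purely illustrative numerical example with made-up values, consider $N=2$
+scalar training points with predictions $\hat{y}^{(1)} = 0.8$,
+$\hat{y}^{(2)} = 0.1$ and true values $y^{(1)} = 1$, $y^{(2)} = 0$. Then,
+\begin{equation}
+  \Loss{MSE} = \frac{1}{2}\left((0.8 - 1)^2 + (0.1 - 0)^2\right) = \frac{0.04 + 0.01}{2} = 0.025.
+\end{equation}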
+
+In the context of neural networks, \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. Ultimately, the objective is to
+utilize this information to optimize the parameters, in order to minimize the
+loss. One of the most fundamental optimization strategies is \emph{gradient
+  descent}. In this process, the derivatives are employed to identify the
+location of local or global minima of a function. Since the gradient points in
+the direction of steepest ascent, we must move the parameters by a constant
+\emph{learning rate} (step size) in the direction opposite to the gradient. The
+calculation of the derivatives with respect to the parameters is a complex
+task, since our function is a composition of many functions (one for each
+layer). The algorithm of \emph{backpropagation} \todo{Insert source} takes
+advantage of~\Cref{eq:mlp_char} and addresses this issue by employing the chain
+rule of calculus.\\
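+
+As a brief sketch, with the learning rate $\eta$ and the iteration index $k$
+being notation we introduce here, one gradient descent step updates the current
+parameters $\theta_{k}$ according to
+\begin{equation}
+  \theta_{k+1} = \theta_{k} - \eta \nabla_{\theta} \Loss{ },
+\end{equation}
+and, schematically, backpropagation obtains the required gradient for the
+composition in~\Cref{eq:mlp_char} via the chain rule,
+\begin{equation}
+  \frac{\partial \Loss{ }}{\partial f^{(1)}} =
+  \frac{\partial \Loss{ }}{\partial f^{(3)}} \,
+  \frac{\partial f^{(3)}}{\partial f^{(2)}} \,
+  \frac{\partial f^{(2)}}{\partial f^{(1)}}.
+\end{equation}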
+
+In practical applications, an optimizer often accomplishes the optimization
+task by executing gradient descent in the background. Furthermore, modifying
+the learning rate during training can be advantageous, for instance, taking
+larger steps at the beginning and making only minor adjustments towards the
+end. Schedulers are algorithms that implement such learning rate adjustment
+strategies.\\
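+
+One simple example of such a strategy, given here only as an illustration, is
+an exponential decay schedule with an initial learning rate $\eta_{0}$ and a
+decay factor $\gamma \in (0, 1)$,
+\begin{equation}
+  \eta_{k} = \eta_{0}\,\gamma^{k},
+\end{equation}
+which yields comparatively large steps at the beginning of training and
+gradually smaller ones in later epochs $k$.\\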
+
+This section provides an overview of basic concepts of neural networks. For a
+deeper understanding, we direct the reader to the book \emph{Deep Learning} by
+Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
+the application of neural networks in approximating solutions to differential
+systems.
 
 
 % -------------------------------------------------------------------

 \section{Physics Informed Neural Networks   5}
 \label{sec:pinn}
+In~\Cref{sec:mlp}, we described the structure and training of MLPs, which are
+recognized tools for approximating any kind of function. In this section, we
+want to make use of this ability and use neural networks as approximators for
+ODEs.
 
 
 % -------------------------------------------------------------------
 
 

BIN
thesis.pdf