Phillip Rothenbeck 1 year ago
parent
commit
3e7fa08192
2 changed files with 92 additions and 65 deletions
  1. 92 65
      chapters/chap02/chap02.tex
  2. BIN
      thesis.pdf

+ 92 - 65
chapters/chap02/chap02.tex

@@ -28,12 +28,12 @@ in~\Cref{sec:pinn}.
 \label{sec:domain}
 
 To model a physical problem using mathematical tools, it is necessary to define
-a set of fundamental numbers or quantities upon which the subsequent calculations
-will be based. These sets may represent, for instance, a specific time interval
-or a distance. The term \emph{domain} describes these fundamental sets of
-numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
-living in a certain domain. In this thesis, we will focus on domains of real
-numbers in $\mathbb{R}$.\\
+a set of fundamental numbers or quantities upon which the subsequent
+calculations will be based. These sets may represent, for instance, a specific
+time interval or a distance. The term \emph{domain} describes these fundamental
+sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
+entity living in a certain domain. In this thesis, we will focus on domains of
+real numbers in $\mathbb{R}$.\\
 
 The mapping between variables enables the modeling of the process and depicts
 the semantics. We use functions in order to facilitate this mapping. Let
@@ -408,49 +408,52 @@ systems, as we describe in~\Cref{sec:mlp}.
 
 \section{Multilayer Perceptron   2}
 \label{sec:mlp}
-In~\Cref{sec:differentialEq} we show the importance of differential equations to
-systems, being able to show the change of it dependent on a certain parameter of
-the parameter. In~\Cref{sec:epidemModel} we show specific applications for
-differential equations in an epidemiological context. Now, the last point is to
-solve these equations. For this problem, there are multiple methods to reach
-this goal one of them is the \emph{Multilayer Perceptron}
-(MLP)~\cite{Hornik1989}. In the following we briefly tackle the structure,
-training and usage of these \emph{neural networks} using, for which we use the book
-\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
-for our explanations.\\
-
-The goal is to be able to approximate any function $f^{*}$ that is for instance
-mathematical function or a mapping of an input vector to a class or category.
-Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
-or result, then,
-\begin{equation}
-  \boldsymbol{y} = f^{*}(\boldsymbol{x}),
-\end{equation}
+In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
+equations in systems, illustrating how they can be utilized to elucidate the
+impact of a specific parameter on the system's behavior.
+In~\Cref{sec:epidemModel}, we show specific applications of differential
+equations in an epidemiological context. The final objective is to solve these
+equations. There are multiple methods to achieve this goal; one of them is the
+\emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
+following section, we provide a brief overview of the structure and training of
+these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
+by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
+explanations.\\
+
+The objective is to develop an approximation method for any function $f^{*}$,
+which could be a mathematical function or a mapping of an input vector to a
+class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
+the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
 is the function to approximate. In the year 1958,
 Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
 a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs anoperation and produces a scalar result. This model
+$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
 optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
-  f(\boldsymbol{x}; \theta)$ as correct as possible. As Minsky and
-Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
-only a class of functions. Thus, the need for an expansion of the perceptron.\\
-
-As Goodfellow \etal go on, the solution for this is to split $f$ into a chain
-structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
-This transforms a perceptron, which has an input and output layer into a
-multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
-structure of an MLP as a \emph{layer}, which are each build of a multitude of
-\emph{units} (also \emph{neurons}) each of which are doing the same
-vector-to-scalar calculation as the perceptron does. Each scalar, is then given
-to a nonlinear activation function. The layers are staggered in the neural
-network, with each being connected to its neighbor, in the way as illustrated
-in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
-unit of the first layer $f^{(1)}$, which results are then given to the units of
-the second layer $f^{(2)}$, and so on. The last layer is called the
-\emph{output layer}. All layers in between the first and the output layers are
-called \emph{hidden layers}. Through the alternating structure of linear and
-nonlinear calculation MLP's are able to approximate any kind of function. As
-Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
+  f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
+approximating a specific class of functions. Consequently, there is a necessity
+for an expansion of the perceptron.\\
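+Written out, the perceptron's vector-to-scalar operation is commonly a weighted
+sum of the inputs followed by an activation; as a sketch, assuming a weight
+vector $\boldsymbol{w}$, a bias $b$ as the parameters $\theta$, and an
+activation function $\sigma$ (symbols used here for illustration only),
+\begin{equation}
+  f(\boldsymbol{x}; \theta) = \sigma\left(\boldsymbol{w}^{\top}\boldsymbol{x} + b\right).
+\end{equation}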
+
+As Goodfellow \etal continue, the solution to this issue is to decompose $f$ into
+a chain structure of the form,
+\begin{equation} \label{eq:mlp_char}
+  f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
+\end{equation}
+This converts a perceptron, which has only two layers (an input and an output
+layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
+is represented in the structure of an MLP as a \emph{layer}. Each layer is
+composed of a multitude of \emph{units} (also \emph{neurons}). A unit performs
+the same vector-to-scalar calculation as the perceptron does (sketched below).
+Subsequently, a
+nonlinear activation function transforms the scalar output into the activation
+of the unit. The layers are staggered in the neural network, with each layer
+being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+$f^{(1)}$, which then gives the results to the units of the second layer
+$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
+intervening layers, situated between the first and the output layer, are the
+\emph{hidden layers}. The alternating structure of linear and nonlinear
+calculation enables MLPs to approximate any kind of function. As Hornik
+\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
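+In this notation, a single layer can be sketched, assuming a weight matrix
+$\boldsymbol{W}^{(i)}$, a bias vector $\boldsymbol{b}^{(i)}$, and an elementwise
+activation function $\sigma$ (symbols chosen here for illustration only), as
+\begin{equation}
+  f^{(i)}(\boldsymbol{z}) = \sigma\left(\boldsymbol{W}^{(i)}\boldsymbol{z} + \boldsymbol{b}^{(i)}\right),
+\end{equation}
+where each row of $\boldsymbol{W}^{(i)}$ holds the weights of one unit of the
+layer and $\boldsymbol{z}$ is the output of the previous layer.\\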
 
 \begin{figure}[h]
   \centering
@@ -460,34 +463,58 @@ Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
   \label{fig:mlp_example}
 \end{figure}
 
-The process of optimizing the parameters $\theta$ is called \emph{training}.
-For trainning we will have to have a set of \emph{training data}, which
-is a set of pairs (also called training points) of the input data
-$\boldsymbol{x}$ and its corresponding true solution $\boldsymbol{y}$ of the
-function $f^{*}$. For the training process we must define the
-\emph{loss function} $\Loss{ }$, using the model prediction
+The term \emph{training} describes the process of optimizing the parameters
+$\theta$. In order to undertake training, it is necessary to have
+\emph{training data}: a set of pairs (also called training points) of
+the input data $\boldsymbol{x}$ and its corresponding true solution
+$\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
+define the \emph{loss function} $\Loss{ }$, using the model prediction
 $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
-metric of how far the model is away from the correct answer. One of the most
-common loss function is the \emph{mean square error} (MSE) loss function. Let
-$N$ be the number of points in the set of training data, then
+metric for evaluating the extent to which the model deviates from the correct
+answer. One of the most common loss functions is the \emph{mean square error}
+(MSE) loss function. Let $N$ be the number of points in the set of training
+data. Then,
 \begin{equation}
-  \Loss{MSE} = \frac{1}{N}\sum_{n=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+  \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
 \end{equation}
-calculates the squared differnce between each model prediction and true value
-of a training and takes the mean across the whole training data. Now, we only
-need a way to change the parameters using our loss. For this gradient descent
-is used in gradient-based learning.
-- gradient descent/gradient-based learning
-- backpropagation
-- optimizers
-- learning rate
-- scheduler
-
+calculates the squared difference between each model prediction and the true
+value of a training point and takes the mean across the whole training data. \\
+
+In the context of neural networks, \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. Ultimately, the objective is to
+utilize this information to optimize the parameters, in order to minimize the
+loss. One of the most fundamental optimization strategies is \emph{gradient
+  descent}. In this process, the derivatives are employed to identify the location
+of local or global minima within a function. Given that a positive gradient
+signifies ascent and a negative gradient indicates descent, we must move the
+parameters by a constant \emph{learning rate} (step size) in the direction
+opposite to the gradient. The calculation of the derivatives with respect
+to the parameters is a complex task, since our function is a composition of
+many functions (one for each layer). The algorithm of \emph{back propagation} \todo{Insert source}
+takes advantage of~\Cref{eq:mlp_char} and addresses this issue by employing
+the chain rule of calculus.\\
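+As a sketch of a single gradient descent step, assuming a constant learning
+rate $\eta$ (symbol introduced here only for illustration), the parameters are
+updated via
+\begin{equation}
+  \theta \leftarrow \theta - \eta \nabla_{\theta} \Loss{ }.
+\end{equation}
+For a composition as in~\Cref{eq:mlp_char}, back propagation obtains the
+required gradient by multiplying the local derivatives of the individual layers
+according to the chain rule.\\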
+
+In practical applications, an optimizer often accomplishes the optimization task
+by executing gradient descent in the background. Furthermore, modifying the
+learning rate during training can be advantageous, for instance, taking larger
+steps at the beginning and making minor adjustments at the end. Schedulers are
+implementations of algorithms that employ diverse learning rate alteration
+strategies.\\
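+One simple example of such a strategy, assuming an initial learning rate
+$\eta_0$ and a decay factor $\gamma \in (0, 1)$ (both chosen here only for
+illustration), is an exponential schedule
+\begin{equation}
+  \eta_t = \eta_0 \gamma^{t},
+\end{equation}
+in which $t$ counts the training epochs, so that early steps are large and
+later adjustments become small.\\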
+
+This section provides an overview of basic concepts of neural networks. For a
+deeper understanding, we direct the reader to the book \emph{Deep Learning} by
+Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
+the application of neural networks in approximating solutions to differential
+systems.
 
 % -------------------------------------------------------------------
 
 \section{Physics Informed Neural Networks   5}
 \label{sec:pinn}
+In~\Cref{sec:mlp}, we described the structure and training of MLPs, which are
+recognized tools for approximating any kind of function. In this section, we
+want to make use of this ability and use neural networks as approximators for
+ODEs.
 
 % -------------------------------------------------------------------
 

BIN
thesis.pdf