Phillip Rothenbeck 1 year ago
parent
commit
3e7fa08192
2 changed files with 92 additions and 65 deletions
  1. 92 65
      chapters/chap02/chap02.tex
  2. BIN
      thesis.pdf

+ 92 - 65
chapters/chap02/chap02.tex

@@ -28,12 +28,12 @@ in~\Cref{sec:pinn}.
 \label{sec:domain}
 
 To model a physical problem using mathematical tools, it is necessary to define
-a set of fundamental numbers or quantities upon which the subsequent calculations
-will be based. These sets may represent, for instance, a specific time interval
-or a distance. The term \emph{domain} describes these fundamental sets of
-numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
-living in a certain domain. In this thesis, we will focus on domains of real
-numbers in $\mathbb{R}$.\\
+a set of fundamental numbers or quantities upon which the subsequent
+calculations will be based. These sets may represent, for instance, a specific
+time interval or a distance. The term \emph{domain} describes these fundamental
+sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
+entity living in a certain domain. In this thesis, we will focus on domains of
+real numbers in $\mathbb{R}$.\\
 
 The mapping between variables enables the modeling of the process and depicts
 the semantics. We use functions in order to facilitate this mapping. Let
@@ -408,49 +408,52 @@ systems, as we describe in~\Cref{sec:mlp}.
 
 \section{Multilayer Perceptron   2}
 \label{sec:mlp}
-In~\Cref{sec:differentialEq} we show the importance of differential equations to
-systems, being able to show the change of it dependent on a certain parameter of
-the parameter. In~\Cref{sec:epidemModel} we show specific applications for
-differential equations in an epidemiological context. Now, the last point is to
-solve these equations. For this problem, there are multiple methods to reach
-this goal one of them is the \emph{Multilayer Perceptron}
-(MLP)~\cite{Hornik1989}. In the following we briefly tackle the structure,
-training and usage of these \emph{neural networks} using, for which we use the book
-\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
-for our explanations.\\
-
-The goal is to be able to approximate any function $f^{*}$ that is for instance
-mathematical function or a mapping of an input vector to a class or category.
-Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
-or result, then,
-\begin{equation}
-  \boldsymbol{y} = f^{*}(\boldsymbol{x}),
-\end{equation}
+In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
+equations in systems, illustrating how they can be utilized to elucidate the
+impact of a specific parameter on the system's behavior.
+In~\Cref{sec:epidemModel}, we show specific applications of differential
+equations in an epidemiological context. The final objective is to solve these
+equations. There are multiple methods to achieve this goal; one of them is the
+\emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
+following section, we provide a brief overview of the structure and training of
+these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
+by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
+explanations.\\
+
+The objective is to develop an approximation method for any function $f^{*}$,
+which could be a mathematical function or a mapping of an input vector to a
+class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
+the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
 is the function to approximate. In the year 1958,
 Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
 a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs anoperation and produces a scalar result. This model
+$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
 optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
-  f(\boldsymbol{x}; \theta)$ as correct as possible. As Minsky and
-Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
-only a class of functions. Thus, the need for an expansion of the perceptron.\\
-
-As Goodfellow \etal go on, the solution for this is to split $f$ into a chain
-structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
-This transforms a perceptron, which has an input and output layer into a
-multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
-structure of an MLP as a \emph{layer}, which are each build of a multitude of
-\emph{units} (also \emph{neurons}) each of which are doing the same
-vector-to-scalar calculation as the perceptron does. Each scalar, is then given
-to a nonlinear activation function. The layers are staggered in the neural
-network, with each being connected to its neighbor, in the way as illustrated
-in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
-unit of the first layer $f^{(1)}$, which results are then given to the units of
-the second layer $f^{(2)}$, and so on. The last layer is called the
-\emph{output layer}. All layers in between the first and the output layers are
-called \emph{hidden layers}. Through the alternating structure of linear and
-nonlinear calculation MLP's are able to approximate any kind of function. As
-Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
+  f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
+approximating a specific class of functions. Consequently, there is a necessity
+for an expansion of the perceptron.\\
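+Written out, the perceptron's vector-to-scalar operation is commonly a weighted
+sum of the inputs followed by an activation; as a sketch, assuming a weight
+vector $\boldsymbol{w}$, a bias $b$ as the parameters $\theta$, and an
+activation function $\sigma$ (symbols used here for illustration only),
+\begin{equation}
+  f(\boldsymbol{x}; \theta) = \sigma\left(\boldsymbol{w}^{\top}\boldsymbol{x} + b\right).
+\end{equation}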
+
+As Goodfellow \etal continue, the solution to this issue is to decompose $f$ into
+a chain structure of the form,
+\begin{equation} \label{eq:mlp_char}
+  f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
+\end{equation}
+This converts a perceptron, which has only two layers (an input and an output
+layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
+is represented in the structure of an MLP as a \emph{layer}. Each layer is
+composed of a multitude of \emph{units} (also \emph{neurons}). A unit performs
+the same vector-to-scalar calculation as the perceptron does (sketched below).
+Subsequently, a
+nonlinear activation function transforms the scalar output into the activation
+of the unit. The layers are staggered in the neural network, with each layer
+being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+$f^{(1)}$, which then gives the results to the units of the second layer
+$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
+intervening layers, situated between the first and the output layer, are the
+\emph{hidden layers}. The alternating structure of linear and nonlinear
+calculation enables MLPs to approximate any kind of function. As Hornik
+\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
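+In this notation, a single layer can be sketched, assuming a weight matrix
+$\boldsymbol{W}^{(i)}$, a bias vector $\boldsymbol{b}^{(i)}$, and an elementwise
+activation function $\sigma$ (symbols chosen here for illustration only), as
+\begin{equation}
+  f^{(i)}(\boldsymbol{z}) = \sigma\left(\boldsymbol{W}^{(i)}\boldsymbol{z} + \boldsymbol{b}^{(i)}\right),
+\end{equation}
+where each row of $\boldsymbol{W}^{(i)}$ holds the weights of one unit of the
+layer and $\boldsymbol{z}$ is the output of the previous layer.\\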
 
 \begin{figure}[h]
   \centering
@@ -460,34 +463,58 @@ Hornik \etal~\cite{Hornik1989} shows, MLP's are universal approximators.\\
   \label{fig:mlp_example}
 \end{figure}
 
-The process of optimizing the parameters $\theta$ is called \emph{training}.
-For trainning we will have to have a set of \emph{training data}, which
-is a set of pairs (also called training points) of the input data
-$\boldsymbol{x}$ and its corresponding true solution $\boldsymbol{y}$ of the
-function $f^{*}$. For the training process we must define the
-\emph{loss function} $\Loss{ }$, using the model prediction
+The term \emph{training} describes the process of optimizing the parameters
+$\theta$. In order to undertake training, it is necessary to have
+\emph{training data}: a set of pairs (also called training points) of
+the input data $\boldsymbol{x}$ and its corresponding true solution
+$\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
+define the \emph{loss function} $\Loss{ }$, using the model prediction
 $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
-metric of how far the model is away from the correct answer. One of the most
-common loss function is the \emph{mean square error} (MSE) loss function. Let
-$N$ be the number of points in the set of training data, then
+metric for evaluating the extent to which the model deviates from the correct
+answer. One of the most common loss functions is the \emph{mean square error}
+(MSE) loss function. Let $N$ be the number of points in the set of training
+data. Then,
 \begin{equation}
-  \Loss{MSE} = \frac{1}{N}\sum_{n=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
+  \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
 \end{equation}
-calculates the squared differnce between each model prediction and true value
-of a training and takes the mean across the whole training data. Now, we only
-need a way to change the parameters using our loss. For this gradient descent
-is used in gradient-based learning.
-- gradient descent/gradient-based learning
-- backpropagation
-- optimizers
-- learning rate
-- scheduler
-
+calculates the squared difference between each model prediction and the true
+value of a training point and takes the mean across the whole training data. \\
+
+In the context of neural networks, \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. Ultimately, the objective is to
+utilize this information to optimize the parameters, in order to minimize the
+loss. One of the most fundamental optimization strategies is \emph{gradient
+  descent}. In this process, the derivatives are employed to identify the location
+of local or global minima within a function. Given that a positive gradient
+signifies ascent and a negative gradient indicates descent, we must move the
+parameters by a constant \emph{learning rate} (step size) in the direction
+opposite to the gradient. The calculation of the derivatives with respect
+to the parameters is a complex task, since our function is a composition of
+many functions (one for each layer). The algorithm of \emph{back propagation} \todo{Insert source}
+takes advantage of~\Cref{eq:mlp_char} and addresses this issue by employing
+the chain rule of calculus.\\
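+As a sketch of a single gradient descent step, assuming a constant learning
+rate $\eta$ (symbol introduced here only for illustration), the parameters are
+updated via
+\begin{equation}
+  \theta \leftarrow \theta - \eta \nabla_{\theta} \Loss{ }.
+\end{equation}
+For a composition as in~\Cref{eq:mlp_char}, back propagation obtains the
+required gradient by multiplying the local derivatives of the individual layers
+according to the chain rule.\\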
+
+In practical applications, an optimizer often accomplishes the optimization task
+by executing gradient descent in the background. Furthermore, modifying the
+learning rate during training can be advantageous, for instance, taking larger
+steps at the beginning and making minor adjustments at the end. Schedulers are
+implementations of algorithms that employ diverse learning rate alteration
+strategies.\\
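+One simple example of such a strategy, assuming an initial learning rate
+$\eta_0$ and a decay factor $\gamma \in (0, 1)$ (both chosen here only for
+illustration), is an exponential schedule
+\begin{equation}
+  \eta_t = \eta_0 \gamma^{t},
+\end{equation}
+in which $t$ counts the training epochs, so that early steps are large and
+later adjustments become small.\\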
+
+This section provides an overview of basic concepts of neural networks. For a
+deeper understanding, we direct the reader to the book \emph{Deep Learning} by
+Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
+the application of neural networks in approximating solutions to differential
+systems.
 
 % -------------------------------------------------------------------
 
 \section{Physics Informed Neural Networks   5}
 \label{sec:pinn}
+In~\Cref{sec:mlp}, we described the structure and training of MLPs, which are
+recognized tools for approximating any kind of function. In this section, we
+want to make use of this ability and use neural networks as approximators for
+ODEs.
 
 % -------------------------------------------------------------------
 

BIN
thesis.pdf