add half mlp section

Phillip Rothenbeck 1 year ago
parent
commit
de7104f104
2 changed files with 64 additions and 14 deletions
  1. 64 14
      chapters/chap02/chap02.tex
  2. BIN
      thesis.pdf

+ 64 - 14
chapters/chap02/chap02.tex

@@ -81,6 +81,8 @@ semantics of a problem. This method is useful if no basic function exists for a
 system. Differential equations find application in several areas such as
 engineering, physics, economics, epidemiology, and beyond.\\
 
+\todo{Here insert definition of differential equations (take from books)}
+
 In the context of functions, it is possible to have multiple domains, meaning
 that the function has more than one parameter. To illustrate, consider a function
 operating in two-dimensional space, wherein each parameter represents one axis
@@ -163,13 +165,13 @@ contact or proximity of an individual carrying the illness and a healthy
 individual. This is possible due to the distinction between infected beings
 who are carriers of the disease and the part of the population, which is
 susceptible to infection. In the model, the mentioned groups are able to
-change, e.g.,  healthy individuals becoming infected.  The model assumes the
+change, \eg,  healthy individuals becoming infected.  The model assumes the
 size $N$ of the population remains constant throughout the duration of the
 pandemic. The population $N$ comprises three distinct groups: the
 \emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
 \emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
   \mathbb{R}_{\geq0}$ be the time span of the pandemic, then,
-\begin{equation} \label{eq:N_char}
+\begin{equation}
   S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
 \end{equation}
 give the values of $S$, $I$ and $R$ at a certain point of time
@@ -183,7 +185,7 @@ the $R$ group are either recovered or deceased, and thus unable to transmit or
 carry the disease.
 \begin{figure}[h]
   \centering
-  \includegraphics[scale=0.3]{sir_graph.png}
+  \includegraphics[scale=0.87]{sir_graph.pdf}
   \caption{A visualization of the SIR model, illustrating $N$ being split in the
     three groups $S$, $I$ and $R$.}
   \label{fig:sir_model}
@@ -192,7 +194,7 @@ As visualized in the~\Cref{fig:sir_model} the
 individuals may transition between groups based on transition rates. The
 transmission rate $\beta$ is responsible for individuals becoming infected,
 while the rate of removal or recovery rate $\alpha$ (also referred to as
-$\delta$ or $\nu$, e.g.,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
+$\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
 individuals from $I$ to $R$.\\
 
 We can describe this problem mathematically using a system of differential
@@ -244,8 +246,8 @@ emerged.\\
   \setlength{\unitlength}{1cm} % Set the unit length for coordinates
   \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
     % reference
-    \put(0, 2.5){
-      \begin{subfigure}{0.3\textwidth}
+    \put(0, 1.75){
+      \begin{subfigure}{0.4\textwidth}
         \centering
         \includegraphics[width=\textwidth]{reference_params_synth.png}
         \caption{$\alpha=0.35$, $\beta=0.5$}
@@ -253,7 +255,7 @@ emerged.\\
       \end{subfigure}
     }
     % 1. row, 1.image (low beta)
-    \put(5, 5){
+    \put(5.5, 5){
       \begin{subfigure}{0.3\textwidth}
         \centering
         \includegraphics[width=\textwidth]{low_beta_synth.png}
@@ -262,7 +264,7 @@ emerged.\\
       \end{subfigure}
     }
     % 1. row, 2.image (high beta)
-    \put(9, 5){
+    \put(9.5, 5){
       \begin{subfigure}{0.3\textwidth}
         \centering
         \includegraphics[width=\textwidth]{high_beta_synth.png}
@@ -271,7 +273,7 @@ emerged.\\
       \end{subfigure}
     }
     % 2. row, 1.image (low alpha)
-    \put(5, 0){
+    \put(5.5, 0){
       \begin{subfigure}{0.3\textwidth}
         \centering
         \includegraphics[width=\textwidth]{low_alpha_synth.png}
@@ -280,7 +282,7 @@ emerged.\\
       \end{subfigure}
     }
     % 2. row, 2.image (high alpha)
-    \put(9, 0){
+    \put(9.5, 0){
       \begin{subfigure}{0.3\textwidth}
         \centering
         \includegraphics[width=\textwidth]{high_alpha_synth.png}
@@ -338,7 +340,7 @@ that reduce the contact between the infectious and susceptible individuals, the
 emergence of a new variant of the disease that increases its infectivity or
 deadliness, or the administration of a vaccination that provides previously
 susceptible individuals with immunity without ever being infectious. To address
-this Millevoi et al.~\cite{Millevoi2023} introduce a model that simultaneously
+this, Millevoi \etal~\cite{Millevoi2023} introduce a model that simultaneously
 reduces the size of the system of differential equations and solves the problem
 of time scaling at hand.\\
 
@@ -360,7 +362,7 @@ pandemic is emerging. In this scenario $\alpha$ is relatively low due to the
 limited number of infections resulting from $I(t_0) \ll S(t_0)$. When $\RO > 1$,
 the disease is spreading rapidly across the population, with an increase in $I$
 occurring at a high rate. Nevertheless, $\RO$ does not cover the entire time
-span. For this reason, Millevoi et al.~\cite{Millevoi2023} introduce $\Rt$
+span. For this reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$
 which has the same interpretation as $\RO$, with the exception that $\Rt$ is
 dependent on time. The definition of the time-dependent reproduction number on
 the time interval $\mathcal{T}$ with the population size $N$,
@@ -383,7 +385,7 @@ $S$ and $I$, with the term $R(t)=N-S(t)-I(t)$. Thus,
 \end{equation}
 is the reduction of~\Cref{eq:sir} on the time interval $\mathcal{T}$ using this
 characteristic and the reproduction number $\Rt$ (see~\Cref{eq:repr_num}).
-Another issue that Millevoi et al.~\cite{Millevoi2023} seek to address is the
+Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
 extensive range of values that the SIR groups can assume, spanning from $0$ to
 $10^7$. Accordingly, they initially scale the time interval $\mathcal{T}$ using
 its borders to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
@@ -413,7 +415,55 @@ differential equations in an epidemiological context. Now, the last point is to
 solve these equations. There are multiple methods to reach this goal,
 one of which is the \emph{Multilayer Perceptron}
 (MLP)~\cite{Hornik1989}. In the following, we briefly cover the structure,
-training and usage of these neural networks.
+training and usage of these \emph{neural networks}, for which we use the book
+\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a base
+for our explanations.\\
+
+The goal is to be able to approximate any function $f^{*}$ that is, for instance,
+a mathematical function or a mapping of an input vector to a class or category.
+Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$ the label, class
+or result, then,
+\begin{equation}
+  \boldsymbol{y} = f^{*}(\boldsymbol{x}),
+\end{equation}
+is the function to approximate. In the year 1958,
+Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, modeling the concept of
+a neuron in a neuroscientific sense. The perceptron takes in the input vector
+$\boldsymbol{x}$, performs an operation and produces a scalar result. This model
+optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
+  f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
+Papert~\cite{Minsky1972} show, the perceptron on its own is able to approximate
+only a limited class of functions. Thus, an extension of the perceptron is needed.\\
+
+As Goodfellow \etal go on to explain, the solution is to split $f$ into a chain
+structure of $f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x})))$.
+This transforms a perceptron, which has an input and an output layer, into a
+multilayer perceptron. Each sub-function $f^{(n)}$ is represented in the
+structure of an MLP as a \emph{layer}. Each layer is built of a multitude of
+\emph{units} (also \emph{neurons}), each of which performs the same
+vector-to-scalar calculation as the perceptron. Each scalar is then passed
+to a nonlinear activation function. The layers are staggered in the neural
+network, with each being connected to its neighbor, as illustrated
+in~\Cref{fig:mlp_example}. The input vector $\boldsymbol{x}$ is given to each
+unit of the first layer $f^{(1)}$, whose results are then given to the units of
+the second layer $f^{(2)}$, and so on. The last layer is called the
+\emph{output layer}. All layers in between the first and the output layers are
+called \emph{hidden layers}. Through the alternating structure of linear and
+nonlinear calculations, MLPs are able to approximate a broad class of functions. As
+Hornik \etal~\cite{Hornik1989} show, MLPs are universal approximators.\\
+
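+As a minimal sketch of a single layer, assume that each unit computes a
+weighted sum of its inputs plus a bias. The symbols $\boldsymbol{W}^{(n)}$,
+$\boldsymbol{b}^{(n)}$, $\boldsymbol{h}^{(n)}$ and $g$ are introduced here only
+for illustration and are not defined above. With $\boldsymbol{h}^{(0)} =
+  \boldsymbol{x}$ and $\boldsymbol{h}^{(n-1)}$ the output of the previous layer,
+one common form of the layer function is
+\begin{equation}
+  \boldsymbol{h}^{(n)} = f^{(n)}(\boldsymbol{h}^{(n-1)}) =
+  g\left(\boldsymbol{W}^{(n)}\boldsymbol{h}^{(n-1)} + \boldsymbol{b}^{(n)}\right),
+\end{equation}
+where row $i$ of the weight matrix $\boldsymbol{W}^{(n)}$ and entry $i$ of the
+bias vector $\boldsymbol{b}^{(n)}$ belong to unit $i$ of the layer, and the
+nonlinear activation function $g$ is applied element-wise.
+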
+\begin{figure}[h]
+  \centering
+  \includegraphics[scale=0.87]{MLP.pdf}
+  \caption{A visualization of a multilayer perceptron, illustrating how the units
+    of each layer are connected to those of the neighboring layers.}
+  \label{fig:mlp_example}
+\end{figure}
+
+The process of optimizing the parameters $\theta$ is called \emph{learning}.
+Here, we define a metric for the quality of the results of our neural network.
+This metric is called a \emph{loss function}.
+
 
 % -------------------------------------------------------------------
 

BIN
thesis.pdf