corr no 4

FlipediFlop, 9 months ago
parent commit bd065b68e7
2 changed files with 39 additions and 37 deletions
  1. chapters/chap02/chap02.tex (+39 −37)
  2. thesis.pdf (BIN)

chapters/chap02/chap02.tex (+39 −37)

@@ -448,7 +448,7 @@ the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$,
 is the function to approximate. In the year 1958,
 Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, modeling the concept of
 a neuron in a neuroscientific sense. The perceptron takes in the input vector
-$\boldsymbol{x}$ performs an operation and produces a scalar result. This model
+$\boldsymbol{x}$, performs an operation and produces a scalar result. This model
 optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
   f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
 Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
@@ -460,19 +460,19 @@ a chain structure of the form,
 \begin{equation} \label{eq:mlp_char}
   f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
 \end{equation}
-This nested version of a perceptron is a multilayer perceptron. Each
+This nested version of a perceptron is called a multilayer perceptron. Each
 sub-function, designated as $f^{(i)}$, is represented in the structure of an
 MLP as a \emph{layer}, which contains a linear mapping and a nonlinear mapping
 in the form of an \emph{activation function}. A multitude of
-\emph{Units} (also \emph{neurons}) compose each layer. A neuron performs the
+\emph{units} (also \emph{neurons}) compose each layer. A neuron performs the
 same vector-to-scalar calculation as the perceptron does. Subsequently, a
 nonlinear activation function transforms the scalar output into the activation
 of the unit. The layers are staggered in the neural network, with each layer
-being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
-input vector $\boldsymbol{x}$ is provided to each unit of the first layer
+being connected to its neighboring layers, as illustrated in~\Cref{fig:mlp_example}. The
+input vector $\boldsymbol{x}$ is provided to each unit of the first layer (input layer)
 $f^{(1)}$, which then passes its results to the units of the second layer
 $f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
-intervening layers, situated between the first and the output layers are the
+intervening layers, situated between the input and the output layers are the
 \emph{hidden layers}. The term \emph{forward propagation} describes the
 process of information flowing through the network from the input layer to the
 output layer, resulting in a scalar loss. The alternating structure of linear
@@ -503,7 +503,7 @@ data. Then,
   \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
 \end{equation}
 calculates the squared difference between each model prediction and true value
-of a training and takes the mean across the whole training data. \\
+of a training data point and takes the mean across the whole training data. \\
 
 
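As an editorial sketch (not the thesis code), forward propagation through a two-layer MLP followed by the MSE loss can be written in a few lines of Python; all weights and training values below are arbitrary illustrative assumptions:

```python
import math

def layer(x, weights, biases):
    # One MLP layer: linear map followed by a tanh activation per unit.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def forward(x):
    # f(x) = f3(f2(f1(x))): two hidden layers and a linear output unit.
    h1 = layer(x, [[0.5, -0.2], [0.1, 0.4]], [0.0, 0.1])
    h2 = layer(h1, [[0.3, 0.7], [-0.6, 0.2]], [0.05, -0.05])
    return sum(w * h for w, h in zip([1.0, -1.0], h2))  # output layer

def mse(preds, targets):
    # Mean squared error across the whole training data.
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)

xs = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]   # made-up training inputs
ys = [0.2, -0.1, 0.3]                       # made-up true values
loss = mse([forward(x) for x in xs], ys)
```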
 Ultimately, the objective is to utilize this information to optimize the parameters in order to minimize the
 loss. One of the most fundamental and seminal optimization strategies is \emph{gradient
@@ -513,8 +513,8 @@ zero. Given that a positive gradient
 signifies ascent and a negative gradient indicates descent, we must move the
 variable by a \emph{learning rate} (step size) in the opposite
 direction to that of the gradient. The calculation of the derivatives with respect
-to the parameters is a complex task, since our functions is a composition of
-many functions (one for each layer). We can address this issue taking advantage
+to the parameters is a complex task, since our function is a composition of
+many functions (one for each layer). We can address this issue by taking advantage
 of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
 $\hat{\boldsymbol{y}} = f(\boldsymbol{x}; \theta)$ be the model prediction with the
 decomposed version $f(\boldsymbol{x}; \theta) = f^{(3)}(w; \theta_3)$ with
@@ -527,14 +527,14 @@ Then,
 is the gradient of $\Loss{ }$ with respect to the parameters $\theta_3$. To obtain
 $\nabla_{\theta_2} \Loss{ }$, we have to derive $\nabla_{\theta_3} \Loss{ }$ with
 respect to $\theta_2$. The name of this method in the context of neural
-networks is \emph{back propagation}~\cite{Rumelhart1986}, as it propagates the
+networks is \emph{backpropagation}~\cite{Rumelhart1986}, as it propagates the
 error backwards through the neural network.\\
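The chain-rule updates described here can be sketched for a toy model $f(x;\theta)=\theta_2\tanh(\theta_1 x)$ with a squared-error loss; the initial values and learning rate are illustrative assumptions, not the thesis setup:

```python
import math

# Toy model f(x; theta) = theta2 * tanh(theta1 * x) with squared-error loss.
# Backpropagation applies the chain rule from the output back to the input.

def loss_and_grads(theta1, theta2, x, y):
    a = theta1 * x                   # linear part of the first layer
    h = math.tanh(a)                 # activation of the first layer
    y_hat = theta2 * h               # linear output layer
    loss = (y_hat - y) ** 2
    dL_dyhat = 2 * (y_hat - y)       # dL/d y_hat
    g2 = dL_dyhat * h                # dL/d theta2
    dL_dh = dL_dyhat * theta2        # error propagated backwards
    g1 = dL_dh * (1 - h ** 2) * x    # tanh'(a) = 1 - tanh(a)**2
    return loss, g1, g2

theta1, theta2, lr = 0.5, 0.3, 0.1   # lr is the learning rate (step size)
x, y = 1.2, 0.7
losses = []
for _ in range(200):
    loss, g1, g2 = loss_and_grads(theta1, theta2, x, y)
    losses.append(loss)
    theta1 -= lr * g1                # move against the gradient direction
    theta2 -= lr * g2
```

Each update moves both parameters a small step opposite to their gradients, so the loss shrinks toward zero.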

 In practical applications, an optimizer often accomplishes the optimization task
-by executing back propagation in the background. Furthermore, modifying the
+by executing backpropagation in the background. Furthermore, modifying the
 learning rate during training can be advantageous, for instance making larger
 steps at the beginning and minor adjustments at the end. Therefore, schedulers
-are implementations algorithms that employ diverse learning rate alteration
+are implementations of algorithms that employ diverse learning rate alteration
 strategies.\\
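A scheduler of this kind can be as simple as an exponential decay of the learning rate; the constants below are illustrative assumptions:

```python
def exponential_decay(lr0, decay_rate, step):
    # Larger steps at the beginning of training, smaller adjustments later.
    return lr0 * decay_rate ** step

# Learning rate over the first five steps.
lrs = [exponential_decay(0.1, 0.9, s) for s in range(5)]
```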

 For a more in-depth discussion of practical considerations and additional
@@ -549,7 +549,7 @@ solutions to differential systems.
 \label{sec:pinn}

 In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
-wildely recognized tools for approximating any kind of function. In 1997
+widely recognized tools for approximating any kind of function. In 1997,
 Lagaris \etal~\cite{Lagaris1998} provided a method that utilizes gradient
 descent to solve ODEs and PDEs. Building on this approach, Raissi
 \etal~\cite{Raissi2019} introduced the methodology with the name
@@ -577,14 +577,14 @@ fitted to the data through the mean square error data loss $\mathcal{L}_{\text{d
 Moreover, the data loss function may include additional terms for initial and boundary
 conditions. Furthermore, the physics is incorporated through an additional loss
 term, the physics loss $\mathcal{L}_{\text{physics}}$, which includes the
-differential equation through its residual $r=\boldsymbol{y} - \mathcal{D}(\boldsymbol{x})$.
+differential equation through its residual $r=\nicefrac{d\boldsymbol{y}}{d\boldsymbol{x}} - \mathcal{D}(\boldsymbol{x})$.
 This leads to the PINN loss function,
 \begin{align}\label{eq:PINN_loss}
   \mathcal{L}_{\text{PINN}}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) & = &  & \mathcal{L}_{\text{data}}         (\boldsymbol{y},\hat{\boldsymbol{y}})               & + & \quad \mathcal{L}_{\text{physics}}     (\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) &   \\
                                                                                  & = &  & \frac{1}{N_t}\sum_{i=1}^{N_t} ||  \hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2 & + & \quad\frac{1}{N_d}\sum_{i=1}^{N_d} ||  r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2          & ,
 \end{align}
 with $N_d$ the number of differential equations in a system and $N_t$ the
-number of training samples used for training. Utilizing~\Cref{eq:PINN_loss}, the
+number of training samples used for training. Utilizing $\mathcal{L}_{\text{PINN}}$, the
 PINN simultaneously optimizes its parameters $\theta$ to minimize both the data
 loss and the physics loss. This makes it a multi-objective optimization problem.\\
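A minimal sketch of this combined loss for an assumed toy ODE $\nicefrac{dy}{dx}=\cos(x)$, whose exact solution is $y=\sin(x)$; for self-containedness the derivative in the residual is approximated by a central finite difference rather than the automatic differentiation used in practice:

```python
import math

# Hypothetical one-parameter model y_hat(x) = a * sin(x); the true
# solution of dy/dx = cos(x) corresponds to a = 1.

def y_hat(a, x):
    return a * math.sin(x)

def pinn_loss(a, xs, ys, h=1e-4):
    # L_data: mean squared error against the observations.
    data = sum((y_hat(a, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # L_physics: squared ODE residual r = d y_hat/dx - cos(x), with the
    # derivative taken by a central finite difference (stand-in for autodiff).
    phys = sum(((y_hat(a, x + h) - y_hat(a, x - h)) / (2 * h)
                - math.cos(x)) ** 2 for x in xs) / len(xs)
    return data + phys

xs = [0.1 * i for i in range(10)]
ys = [math.sin(x) for x in xs]          # noise-free synthetic training data
```

For the correct parameter $a=1$ both terms vanish; any other value is penalized by data and physics simultaneously, which is exactly the multi-objective character described above.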
@@ -592,7 +592,7 @@ Given the nature of differential equations, calculating the loss term of
 $\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
 calculation of the derivative of the output with respect to the input of
 the neural network. As we outline in~\Cref{sec:mlp}, during the process of
-back-propagation we calculate the gradients of the loss term in respect to a
+backpropagation we calculate the gradients of the loss term in respect to a
 layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
 the index of the respective layer. By employing
 the chain rule of calculus, the algorithm progresses from the output layer
@@ -602,7 +602,7 @@ compute the respective gradients. The term,
   \nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
 \end{equation}
 illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
-this procedure the \emph{automatic differentiation} goes one step further and
+this procedure, the \emph{automatic differentiation}, goes one step further and
 calculates the gradient of the output with respect to the input
 $\boldsymbol{x}$. In order to calculate the second derivative
 $\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
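The principle of differentiating the output with respect to the input can be sketched with dual numbers, a forward-mode variant of automatic differentiation chosen here only for brevity (the text describes the reverse-mode view); the one-unit "network" is an illustrative assumption:

```python
import math

# Minimal forward-mode automatic differentiation: each Dual value carries
# its derivative with respect to the network input x alongside its value.

class Dual:
    def __init__(self, val, dot):
        self.val, self.dot = val, dot
    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)
    def __mul__(self, other):
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

def tanh(d):
    t = math.tanh(d.val)
    return Dual(t, (1 - t * t) * d.dot)   # chain rule for tanh

def net(x):
    # y_hat = 0.8 * tanh(1.5 * x): a hypothetical one-unit "network".
    return Dual(0.8, 0.0) * tanh(Dual(1.5, 0.0) * x)

x = Dual(0.3, 1.0)   # seed dx/dx = 1
y = net(x)           # y.val = y_hat(x), y.dot = d y_hat / d x
```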
@@ -621,16 +621,9 @@ parameters within the neural network. This enables the network to utilize a
 specific value that actively influences the physics loss
 $\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$. During the
 training phase, the optimizer aims to minimize the physics loss, which should
-ultimately yield an approximation of the true parameter value fitting the
+ultimately yield an approximation of the true parameter value fitting the
 observations.\\
 
 
-\begin{figure}[t]
-  \centering
-  \includegraphics[width=\textwidth]{oscilator.pdf}
-  \caption{Illustration of of the movement of an oscillating body in the
-    underdamped case. With $m=1kg$, $\mu=4\frac{Ns}{m}$ and $k=200\frac{N}{m}$.}
-  \label{fig:spring}
-\end{figure}
 In order to illustrate the working of a PINN, we use the example of a
 \emph{damped harmonic oscillator} taken from~\cite{Moseley}. In this problem, we
 displace a body, which is attached to a spring, from its resting position. The
@@ -646,7 +639,16 @@ stiffness of the spring. The residual of the differential equation,
 \begin{equation}
   m\frac{d^2u}{dx^2}+\mu\frac{du}{dx}+ku=0,
 \end{equation}
-shows relation of these parameters in reference to the problem at hand. As
+
+\begin{figure}[t]
+  \centering
+  \includegraphics[width=\textwidth]{oscilator.pdf}
+  \caption{Illustration of the movement of an oscillating body in the
+    underdamped case. With $m=1kg$, $\mu=4\frac{Ns}{m}$ and $k=200\frac{N}{m}$.}
+  \label{fig:spring}
+\end{figure}
+
+shows the relation of these parameters in reference to the problem at hand. As
 Tenenbaum and Morris~\cite{Tenenbaum1985} show, there are three potential solutions to this
 issue. However, only the \emph{underdamped case} results in an oscillating
 movement of the body, as illustrated in~\Cref{fig:spring}. In order to apply a
@@ -664,7 +666,7 @@ not know the value of the friction $\mu$. In this case the loss function,
 \end{equation}
 includes the boundary conditions, the residual, in which $\hat{\mu}$ is a learnable
 parameter, and the data loss. By minimizing $\mathcal{L}_{\text{osc}}$ and
-solving the inverse problem the PINN is able to find the missing parameter
+solving the inverse problem, the PINN is able to find the missing parameter
 $\mu$. This shows the methodology by which PINNs are capable of learning the
 parameters of physical systems, such as the damped harmonic oscillator. In the
 following section, we present the approach of Shaier \etal~\cite{Shaier2021} to
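The underdamped case can also be checked numerically: assuming the standard analytic solution with the initial conditions $u(0)=1$, $u'(0)=0$ (an editorial assumption) and the parameter values from the figure caption, the residual of the oscillator ODE should vanish:

```python
import math

# Underdamped oscillator from the example: m*u'' + mu*u' + k*u = 0,
# with m = 1 kg, mu = 4 Ns/m and k = 200 N/m as in the figure caption.
m, mu, k = 1.0, 4.0, 200.0
delta = mu / (2 * m)                    # damping coefficient: 2
omega = math.sqrt(k / m - delta ** 2)   # damped angular frequency: 14

def u(t):
    # Analytic underdamped solution for u(0) = 1 and u'(0) = 0.
    return math.exp(-delta * t) * (math.cos(omega * t)
                                   + (delta / omega) * math.sin(omega * t))

def residual(t, h=1e-4):
    # m*u'' + mu*u' + k*u, derivatives via central finite differences.
    d1 = (u(t + h) - u(t - h)) / (2 * h)
    d2 = (u(t + h) - 2 * u(t) + u(t - h)) / h ** 2
    return m * d2 + mu * d1 + k * u(t)
```

The residual staying near zero for arbitrary times is exactly the quantity a PINN would drive to zero through its physics loss.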
@@ -674,8 +676,8 @@ find the transmission rate and recovery rate of the SIR model using PINNs.

 \subsection{Disease-Informed Neural Networks}
 \label{sec:pinn:dinn}
-In the preceding section, we present a data-driven methodology, as described by Lagaris
-\etal~\cite{Lagaris1998}, for solving systems of differential equations by employing
+In the preceding section, we present a data-driven methodology, as described by Raissi
+\etal~\cite{Raissi2019}, for solving systems of differential equations by employing
 PINNs. In~\Cref{sec:pandemicModel:sir}, we describe the SIR model, which models
 the relations of susceptible, infectious and removed individuals and simulates
 the progress of a disease in a population with a constant size. A system of
@@ -695,24 +697,24 @@ would calculate the initial transmission rate using the initial size of the
 susceptible group $S_0$ and the infectious group $I_0$. The recovery rate, then,
 could be defined using the number of days $d$ a person spends between the point of
 infection and the start of isolation, $\alpha = \frac{1}{d}$. The analytical
-solutions to the SIR models often use heuristic methods and require knowledge
+solutions to the SIR models often use heuristic methods and require prior knowledge
 like the sizes $S_0$ and $I_0$. A data-driven approach such as the one that
 Shaier \etal~\cite{Shaier2021} propose does not suffer from these problems, since the
 model learns the parameters $\alpha$ and $\beta$ while learning the training
 data consisting of the time points $\boldsymbol{t}$ and the corresponding
-measured sizes of the groups $\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}$.
-Let $\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}$ be the
+measured sizes of the groups $\Psi=(\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R})$.
+Let $\hat{\Psi}=(\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}})$ be the
 model predictions of the groups and
-$r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}},
-  r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\alpha \hat{\boldsymbol{I}}$
-and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \alpha \hat{\boldsymbol{I}}$ the
+$r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\hat{\beta} \hat{\boldsymbol{S}}\hat{\boldsymbol{I}},
+  r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\hat{\beta} \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\hat{\alpha} \hat{\boldsymbol{I}}$
+and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \hat{\alpha} \hat{\boldsymbol{I}}$ the
 residuals of each differential equation using the model predictions. Then,
 \begin{equation}
   \begin{split}
-    \mathcal{L}_{SIR}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = &||r_S||^2 + ||r_I||^2 + ||r_R||^2\\
+    \mathcal{L}_{\text{SIR}}(\boldsymbol{t}, \Psi, \hat{\Psi}) = &||r_S||^2 + ||r_I||^2 + ||r_R||^2\\
     + &\frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}||^2 + ||\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}||^2 + ||\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}||^2,
   \end{split}
 \end{equation}
-is the loss function of a DINN, with $\alpha$ and $\beta$ being learnable
+is the loss function of a DINN, with $\hat{\alpha}$ and $\hat{\beta}$ being learnable
 parameters. These are represented in the residuals of the ODEs.
 % -------------------------------------------------------------------
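The residuals and the DINN loss above can be sketched as follows; the finite-difference time derivatives and the toy values are editorial stand-ins, not the implementation of Shaier \etal:

```python
# Sketch of the DINN loss for the SIR model. alpha_hat and beta_hat play
# the role of the learnable parameters; derivatives of the predicted group
# sizes are approximated by central finite differences over time.

def sir_residuals(t, S, I, R, alpha_hat, beta_hat):
    def ddt(y):
        # Central finite-difference derivative at the interior time points.
        return [(y[i + 1] - y[i - 1]) / (t[i + 1] - t[i - 1])
                for i in range(1, len(y) - 1)]
    dS, dI, dR = ddt(S), ddt(I), ddt(R)
    # r_S = dS/dt + beta*S*I, r_I = dI/dt - beta*S*I + alpha*I,
    # r_R = dR/dt - alpha*I, evaluated on the model predictions.
    rS = [dS[i] + beta_hat * S[i + 1] * I[i + 1] for i in range(len(dS))]
    rI = [dI[i] - beta_hat * S[i + 1] * I[i + 1] + alpha_hat * I[i + 1]
          for i in range(len(dI))]
    rR = [dR[i] - alpha_hat * I[i + 1] for i in range(len(dR))]
    return rS, rI, rR

def dinn_loss(t, obs, pred, alpha_hat, beta_hat):
    S, I, R = obs
    Sh, Ih, Rh = pred
    rS, rI, rR = sir_residuals(t, Sh, Ih, Rh, alpha_hat, beta_hat)
    physics = sum(r ** 2 for r in rS + rI + rR)
    data = sum((sh - s) ** 2 + (ih - i) ** 2 + (rh - r) ** 2
               for sh, s, ih, i, rh, r in zip(Sh, S, Ih, I, Rh, R)) / len(t)
    return physics + data
```

Minimizing this sum over alpha_hat, beta_hat and the network parameters fits the predictions to the observed group sizes while forcing them to obey the SIR dynamics.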

thesis.pdf (BIN)