@@ -323,7 +323,7 @@ The SIR model makes a number of assumptions that are intended to reduce the
model's overall complexity while simultaneously increasing its divergence from
actual reality. One such assumption is that the size of the population, $N$,
remains constant, as the daily change is negligible relative to the total population.
-This depiction is not an accurate representation of the actual relations
+This depiction is not an accurate representation of the actual relations \todo{other assumptions in a bad light?}
observed in the real world, as the size of a population is subject to a number
of factors that can contribute to change. The population is increased by the
occurrence of births and decreased by the occurrence of deaths. Other examples
@@ -365,19 +365,20 @@ represents the number of susceptible individuals, that one infectious individual
infects at the onset of the pandemic. In light of the effects of $\beta$ and
$\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
pandemic is emerging. In this scenario $\alpha$ is relatively low due to the
-limited number of infections resulting from $I(t_0) << S(t_0)$. When $\RO < 1$,
-the disease is spreading rapidly across the population, with an increase in $I$
-occurring at a high rate. Nevertheless, $\RO$ does not cover the entire time
-span. For this reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$
-which has the same interpretation as $\RO$, with the exception that $\Rt$ is
-dependent on time. The definition of the time-dependent reproduction number on
-the time interval $\mathcal{T}$ with the population size $N$,
+limited number of infections resulting from $I(t_0) \ll S(t_0)$.\\ Conversely,
+$\RO < 1$ indicates that the disease is receding, as each infectious individual
+infects fewer than one other person on average, so $I$ decreases over time.
+Nevertheless, $\RO$ does not cover the entire time span. For this reason,
+Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$, which has the same
+interpretation as $\RO$, with the exception that $\Rt$ is time-dependent. The
+time-dependent reproduction number is defined as
\begin{equation}\label{eq:repr_num}
- \Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N}
+ \Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N},
\end{equation}
-includes the rates of change for information about the spread of the disease and
-information of the decrease of the ratio of susceptible individuals in the
-population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
+on the time interval $\mathcal{T}$. This definition combines the transition
+rates, which carry information about the spread of the disease, with the
+declining ratio of susceptible individuals in the population. In contrast
+to $\beta$ and $\alpha$, $\Rt$ is not a parameter but \todo{Sai comment - earlier?}
a state variable in the model, which enables the following reduction of the SIR
model.\\
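+For illustration, a minimal Python sketch of~\Cref{eq:repr_num} (with purely
+hypothetical rate and population values, not taken from any data set) could
+look as follows:
+\begin{verbatim}
+def reproduction_number(beta_t, alpha_t, S_t, N):
+    # R(t) = beta(t) / alpha(t) * S(t) / N
+    return (beta_t / alpha_t) * (S_t / N)
+
+# hypothetical values: transmission rate 0.3, recovery rate 0.1,
+# 9000 of 10000 individuals still susceptible
+print(reproduction_number(0.3, 0.1, 9_000, 10_000))  # -> 2.7
+\end{verbatim}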
@@ -390,12 +391,12 @@ $S$ and $I$, with the term $R(t)=N-S(t)-I(t)$. Thus,
\end{split}
\end{equation}
is the reduction of~\Cref{eq:sir} on the time interval $\mathcal{T}$ using this
-characteristic and the reproduction number \Rt (see ~\Cref{eq:repr_num}).
+characteristic and the reproduction number $\Rt$ (see~\Cref{eq:repr_num}).
Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
-extensive range of values that the SIR groups can assume, spanning from $0$ to
-$10^7$. Accordingly, they initially scale the time interval $\mathcal{T}$ using
-its borders to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
- [0, 1]$. Subsequently, they calculate the scaled groups,
+extensive range of values that the SIR groups can assume. Accordingly, they
+initially scale the time interval $\mathcal{T}$ using its endpoints to calculate
+the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in[0, 1]$. Subsequently, they
+calculate the scaled groups,
\begin{equation}
S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
\end{equation}
@@ -404,11 +405,11 @@ variable $I$, results in,
\begin{equation}
\frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
\end{equation}
-a further reduced version of~\Cref{eq:sir} results in a more streamlined and
-efficient process, as it entails the elimination of a parameter($\beta$) and two
-state variables ($S$ and $R$), while adding the state variable $\Rt$. This is a
-crucial aspect for the automated resolution of such differential equation
-systems, as we describe in~\Cref{sec:mlp}.
+which is a further reduced version of~\Cref{eq:sir}. This simpler differential
+equation admits a simpler solution process, as it entails the elimination of a
+parameter ($\beta$) and the two state variables ($S$ and $R$). Due to its fewer
+input variables, the reduced SIR model is also more precise in applications
+where data are scarce.
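+As a purely illustrative sketch (the values of $\alpha$, $\Rt$, the time
+horizon, and the initial condition are hypothetical, and a simple explicit
+Euler scheme stands in for the actual solution method), the reduced equation
+can be integrated numerically as follows:
+\begin{verbatim}
+def integrate_reduced_sir(alpha, t0, tf, R_t, I0, steps=1000):
+    # explicit Euler for dI_s/dt_s = alpha*(tf - t0)*(R_t(t_s) - 1)*I_s
+    dt = 1.0 / steps
+    I = I0
+    for k in range(steps):
+        t_s = k * dt
+        I += dt * alpha * (tf - t0) * (R_t(t_s) - 1.0) * I
+    return I
+
+# hypothetical setup: constant R_t = 1.5, alpha = 0.1, 90-day horizon
+print(integrate_reduced_sir(0.1, 0.0, 90.0, lambda t: 1.5, I0=1e-4))
+\end{verbatim}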
% -------------------------------------------------------------------
@@ -419,16 +420,17 @@ equations in systems, illustrating how they can be utilized to elucidate the
impact of a specific parameter on the system's behavior.
In~\Cref{sec:epidemModel}, we show specific applications of differential
equations in an epidemiological context. The final objective is to solve these
-equations. For this problem, there are multiple methods to achieve this goal. On
-such method is the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
-following section, we provide a brief overview of the structure and training of
-these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
-by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
-explanations.\\
+equations by finding a function that satisfies them. Fitting measured data
+points to approximate such a function is one of several methods to achieve this
+goal. The \emph{Multilayer Perceptron} (MLP)~\cite{Rumelhart1986} is one such
+data-driven function approximator. In the following section, we provide a brief
+overview of the structure and training of these \emph{neural networks}. For
+reference, we use the book \emph{Deep Learning} by Goodfellow
+\etal~\cite{Goodfellow-et-al-2016} as a foundation for our explanations.\\
The objective is to develop an approximation method for any function $f^{*}$,
-which could be a mathematical function or a mapping of an input vector to a
-class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
+which could be a mathematical function or a mapping of an input vector to the
+desired output. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
is the function to approximate. In the year 1958,
Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
@@ -440,14 +442,15 @@ Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
approximating a specific class of functions. Consequently, there is a necessity
for an expansion of the perceptron.\\
-As Goodfellow \etal proceed, the solution to this issue is to decompose $f$ into
+As Goodfellow \etal~\cite{Goodfellow-et-al-2016} proceed, the solution to this issue is to decompose $f$ into
a chain structure of the form,
\begin{equation} \label{eq:mlp_char}
f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
\end{equation}
-This converts a perceptron, which has only two layers (an input and an output
-layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
-is represented in the structure of an MLP as a \emph{layer}. A multitude of
+This nested version of a perceptron is a multilayer perceptron. Each
+sub-function, designated as $f^{(i)}$, is represented in the structure of an
+MLP as a \emph{layer}, which contains a linear mapping and a nonlinear mapping
+in the form of an \emph{activation function}. A multitude of
\emph{Units} (also \emph{neurons}) compose each layer. A neuron performs the
same vector-to-scalar calculation as the perceptron does. Subsequently, a
nonlinear activation function transforms the scalar output into the activation
@@ -457,27 +460,30 @@ input vector $\boldsymbol{x}$ is provided to each unit of the first layer
$f^{(1)}$, which then gives the results to the units of the second layer
$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
intervening layers, situated between the first and the output layers, are the
-\emph{hidden layers}. The alternating structure of linear and nonlinear
-calculation enables MLP's to approximate any function. As Hornik
-\etal~\cite{Hornik1989} demonstrate, MLP's are universal approximators.\\
+\emph{hidden layers}. The term \emph{forward propagation} describes the
+process of information flowing through the network from the input layer to the
+output layer, resulting in a scalar loss. The alternating structure of linear
+and nonlinear calculation enables MLPs to approximate a wide class of
+functions. As Hornik \etal~\cite{Hornik1989} prove, MLPs are universal
+approximators.\\
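+As a minimal illustration of the layer structure in~\Cref{eq:mlp_char} (the
+layer sizes, the random weights, and the choice of $\tanh$ as activation are
+hypothetical), a forward pass through an MLP with two hidden layers can be
+sketched in Python as follows:
+\begin{verbatim}
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+def layer(x, W, b, activation=np.tanh):
+    # one layer: linear mapping followed by a nonlinear activation
+    return activation(W @ x + b)
+
+# hypothetical sizes: 3 inputs, two hidden layers of 5 units, 1 output
+shapes = [(5, 3), (5, 5), (1, 5)]
+params = [(rng.standard_normal(s), rng.standard_normal(s[0])) for s in shapes]
+
+x = np.array([0.1, 0.2, 0.3])
+for W, b in params:          # f(x) = f3(f2(f1(x)))
+    x = layer(x, W, b)
+print(x)                     # network output
+\end{verbatim}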
\begin{figure}[h]
\centering
\includegraphics[scale=0.87]{MLP.pdf}
- \caption{A visualization of the SIR model, illustrating $N$ being split in the
- three groups $S$, $I$ and $R$.}
+ \caption{An illustration of an MLP with two hidden layers. Each neuron of a layer
+ is connected to every neuron of the neighboring layers. The arrow indicates
+ the direction of forward propagation.}
\label{fig:mlp_example}
\end{figure}
-\todo{caption}
+
The term \emph{training} describes the process of optimizing the parameters
$\theta$. In order to undertake training, it is necessary to have a set of
\emph{training data}, which is a set of pairs (also called training points) of
the input data $\boldsymbol{x}$ and its corresponding true solution
$\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
-define the \emph{loss function} $\Loss{ }$, using the model prediction
+define a \emph{loss function} $\Loss{ }$, using the model prediction
$\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
metric for evaluating the extent to which the model deviates from the correct
-answer. One of the most common loss function is the \emph{mean square error}
+answer. One common loss function is the \emph{mean square error}
(MSE) loss function. Let $N$ be the number of points in the set of training
data. Then,
\begin{equation} \label{eq:mse}
@@ -486,41 +492,43 @@ data. Then,
calculates the squared difference between each model prediction and true value
of a training point and takes the mean across the whole training data. \\
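+For illustration, the MSE loss of~\Cref{eq:mse} can be written as a short
+(purely illustrative) NumPy function:
+\begin{verbatim}
+import numpy as np
+
+def mse(y_hat, y):
+    # mean of the squared differences between predictions and true values
+    return np.mean((y_hat - y) ** 2)
+
+print(mse(np.array([0.9, 2.1]), np.array([1.0, 2.0])))  # -> 0.01
+\end{verbatim}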
-In the context of neural networks, \emph{forward propagation} describes the
-process of information flowing through the network from the input layer to the
-output layer, resulting in a scalar loss. Ultimately, the objective is to
-utilize this information to optimize the parameters, in order to minimize the
+Ultimately, the objective is to utilize this loss information to optimize the parameters, in order to minimize the
loss. One of the most fundamental optimization strategies is \emph{gradient
descent}. In this process, the derivatives are employed to identify the location
-of local or global minima within a function. Given that a positive gradient
+of local or global minima within a function, which lie where the gradient is
+zero. Given that a positive gradient
signifies ascent and a negative gradient indicates descent, we must move the
-variable by a constant \emph{learning rate} (step size) in the opposite
+variable by a \emph{learning rate} (step size) in the opposite
direction to that of the gradient. The calculation of the derivatives with respect
to the parameters is a complex task, since our function is a composition of
many functions (one for each layer). We can address this issue by taking advantage
of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
-$\hat{\boldsymbol{y}} = f(w; \theta)$ be the model prediction with
+$\hat{\boldsymbol{y}} = f(\boldsymbol{x}; \theta)$ be the model prediction with the
+decomposed version $f(\boldsymbol{x}; \theta) = f^{(3)}(w; \theta_3)$, where
$w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$.
-$\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2\subset\theta$.
+$\boldsymbol{x}$ is the input vector and $\theta_3, \theta_2, \theta_1\subset\theta$.
Then,
\begin{equation}\label{eq:backprop}
- \nabla_{\theta_1} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\theta_1}f^{(1)},
+ \nabla_{\theta_3} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(3)}}\nabla_{\theta_3}f^{(3)},
\end{equation}
-is the gradient of $\Loss{ }$ in respect of the parameters $\theta_1$. The name
-of this method in the context of neural networks is \emph{back propagation}. \todo{Insert source}\\
+is the gradient of $\Loss{ }$ with respect to the parameters $\theta_3$. To obtain
+$\nabla_{\theta_2} \Loss{ }$, the chain rule is continued through $f^{(3)}$: the
+factor $\nabla_{\theta_3}f^{(3)}$ is replaced by $\frac{df^{(3)}}{dw}\nabla_{\theta_2}f^{(2)}$,
+and $\nabla_{\theta_1} \Loss{ }$ follows analogously. The name of this method in
+the context of neural networks is \emph{back propagation}~\cite{Rumelhart1986},
+as it propagates the error backwards through the neural network.\\
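+As a purely illustrative sketch of a gradient descent update in Python (the
+toy loss, its gradient, the starting point, and the learning rate are
+hypothetical):
+\begin{verbatim}
+def gradient_descent(grad, theta, learning_rate=0.1, steps=100):
+    # move against the gradient, scaled by the learning rate
+    for _ in range(steps):
+        theta = theta - learning_rate * grad(theta)
+    return theta
+
+# toy loss L(theta) = (theta - 3)^2 with gradient 2*(theta - 3)
+print(gradient_descent(lambda t: 2.0 * (t - 3.0), theta=0.0))  # approx. 3.0
+\end{verbatim}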
In practical applications, an optimizer often accomplishes the optimization task
-by executing gradient descent in the background. Furthermore, modifying the
-learning rate during training can be advantageous. For instance, making larger
+by executing gradient descent steps with the gradients obtained via back
+propagation. Furthermore, modifying the
+learning rate during training can be advantageous, for instance, making larger \todo{leave whole paragraph out? - Niklas}
steps at the beginning and minor adjustments at the end. To this end, schedulers
are algorithms that employ diverse learning rate alteration
strategies.\\
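+A minimal sketch of this interplay (the model, the toy data, and the optimizer
+and scheduler settings are hypothetical; PyTorch is used only as an example):
+\begin{verbatim}
+import torch
+
+model = torch.nn.Sequential(torch.nn.Linear(1, 16), torch.nn.Tanh(),
+                            torch.nn.Linear(16, 1))
+optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
+scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)
+
+x = torch.linspace(0, 1, 32).unsqueeze(1)
+y = torch.sin(x)                            # toy training data
+for epoch in range(300):
+    optimizer.zero_grad()
+    loss = torch.mean((model(x) - y) ** 2)  # MSE loss
+    loss.backward()                         # back propagation
+    optimizer.step()                        # gradient descent update
+    scheduler.step()                        # adjust the learning rate
+\end{verbatim}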
-This section provides an overview of basic concepts of neural networks. For a
-deeper understanding, we direct the reader to the book \emph{Deep Learning} by
-Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
-the application of neural networks in approximating solutions to differential
-systems.
+For a more in-depth discussion of practical considerations and additional
+details, such as regularization, we direct the reader to the book
+\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next
+section will demonstrate the application of neural networks in approximating
+solutions to systems of differential equations.
% -------------------------------------------------------------------
@@ -537,15 +545,15 @@ differential equations. The \emph{physics-informed neural network} (PINN)
learns the system of differential equations during training, as it optimizes
its output to align with the equations.\\
-In contrast to standard MLP's, the loss term of a PINN comprises two
-components. The first term incorporates the aforementioned prior knowledge to pertinent the problem. As Raissi
+In contrast to standard MLPs, PINNs are not only data-driven. The loss term of a PINN comprises two
+components. The first term incorporates the equations of the aforementioned prior knowledge pertinent to the problem. As Raissi
\etal~\cite{Raissi2017} propose, the residual of each differential equation in
the system must be minimized in order for the model to optimize its output in accordance with the theory.
We obtain the residual $r_i$, with $i\in\{1, \dots, N_d\}$, by rearranging the differential equation and
calculating the difference between the left-hand side and the right-hand side
of the equation. $N_d$ is the number of differential equations in a system. As
Raissi \etal~\cite{Raissi2017} propose the \emph{physics
- loss} of a PINN,\todo{check source again}
+ loss} of a PINN,
\begin{equation}
\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
\end{equation}
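+A minimal sketch of how such a physics loss could be assembled from the
+residuals (the residual functions themselves are problem-specific and the
+dummy values below are hypothetical):
+\begin{verbatim}
+import torch
+
+def physics_loss(residuals):
+    # 1/N_d * sum of the squared residual norms of the N_d equations
+    return sum(torch.sum(r ** 2) for r in residuals) / len(residuals)
+
+# two dummy residual vectors standing in for r_1 and r_2
+print(physics_loss([torch.tensor([0.1, -0.2]), torch.tensor([0.05, 0.0])]))
+\end{verbatim}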
@@ -560,14 +568,13 @@ denote the number of training points. Then,
\end{equation}\\
represents the comprehensive loss function of a physics-informed neural network. \\
-\todo{check for correctness}
Given the nature of residuals, calculating the loss term of
$\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
calculation of the derivative of the output with respect to the input of
the neural network. As we outline in~\Cref{sec:mlp}, during the process of
back-propagation we calculate the gradients of the loss term with respect to a
layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
-the index of the \todo{check for consistency} respective layer. By employing
+the index of the respective layer. By employing
the chain rule of calculus, the algorithm progresses from the output layer
through each hidden layer, ultimately reaching the first layer in order to
compute the respective gradients. The term,
@@ -603,7 +610,7 @@ which should ultimately yield an approximation of the true value.\\
\label{fig:spring}
\end{figure}
One illustrative example of a potential application for PINNs is the
-\emph{damped harmonic oscillator}~\cite{Tenenbaum1985}. In this problem, we \todo{check source for wording}
+\emph{damped harmonic oscillator}~\cite{Demtroeder2021}. In this problem, we
displace a body, which is attached to a spring, from its resting position. The
body is subject to three forces: firstly, the inertia exerted by the
displacement $u$, which points in the direction of the displacement $u$; secondly
@@ -613,7 +620,7 @@ direction of the movement. In accordance with Newton's second law and the
combined influence of these forces, the body exhibits oscillatory motion around
its position of rest. The system is influenced by $m$, the mass of the body,
$\mu$, the coefficient of friction, and $k$, the spring constant, indicating the
-stiffness of the spring. The residual of the differential equation, \todo{check in book}
+stiffness of the spring. The residual of the differential equation,
\begin{equation}
m\frac{d^2u}{dx^2}+\mu\frac{du}{dx}+ku=0,
\end{equation}
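+A sketch of how this residual could be evaluated for a network output $u(x)$
+with automatic differentiation (the network architecture and the values of
+$m$, $\mu$, and $k$ are hypothetical):
+\begin{verbatim}
+import torch
+
+m, mu, k = 1.0, 0.4, 4.0                  # hypothetical physical constants
+net = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
+                          torch.nn.Linear(32, 1))
+
+x = torch.linspace(0, 1, 50).unsqueeze(1).requires_grad_(True)
+u = net(x)
+# first and second derivative of the output with respect to the input
+du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
+d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]
+residual = m * d2u + mu * du + k * u      # left-hand side of the equation
+physics_loss = torch.mean(residual ** 2)  # to be minimized during training
+\end{verbatim}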