- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- % Author: Phillip Rothenbeck
- % Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
- % File: chap02/chap02.tex
- % Part: theoretical background
- % Description:
- % summary of the content in this chapter
- % Version: 05.08.2024
- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- \chapter{Theoretical Background}
- \label{chap:background}
- This chapter introduces the theoretical foundations for the work presented in
- this thesis. In~\Cref{sec:domain} and~\Cref{sec:differentialEq}, we describe
- differential equations and the underlying theory. In these sections, both the
- explanations and the approach are based on a book on analysis by
- Rudin~\cite{Rudin2007} and a book about ordinary differential equations by
- Tenenbaum and Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this
- knowledge to examine various pandemic models in~\Cref{sec:epidemModel}. Finally,
- we address the topic of neural networks with a focus on the multilayer
- perceptron in~\Cref{sec:mlp} and physics-informed neural networks
- in~\Cref{sec:pinn}.
- % -------------------------------------------------------------------
- \section{Mathematical Modelling using Functions}
- \label{sec:domain}
- To model a physical problem mathematically, it is necessary to define a set of
- fundamental numbers or quantities upon which the subsequent calculations will be
- based. These sets may represent, for instance, a specific time interval or a
- distance. The term \emph{domain} describes these fundamental sets of numbers or
- quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity living in a
- certain domain. In this thesis, we will focus on domains of real numbers in
- $\mathbb{R}$.\\
- The mapping between variables enables the modeling of a physical process and may
- depict semantics. We use functions in order to facilitate this mapping. Let
- $A, B\subset\mathbb{R}$ be two subsets of the real numbers, then we define a
- function as the mapping
- \begin{equation}
- f: A\rightarrow B.
- \end{equation}
- In other words, the function $f$ maps elements $x\in A$ to values
- $f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
- of $f$. Functions are capable of representing the state of a system as a value
- based on an input value from their domain. One illustrative example is a
- function that maps a time step to the distance covered since a starting point.
- In this case, time serves as the domain, while the distance is the codomain.
- % -------------------------------------------------------------------
- \section{Mathematical Modelling using Differential Equations}
- \label{sec:differentialEq}
- Often, the behavior of a variable or a quantity across a domain is more
- interesting than its current state. Functions are able to give us the latter,
- but only passively give information about the change of a system. The objective
- is to determine an effective method for calculating the change of a function
- across its domain. Let $f$ be a function and $[a, b]\subset \mathbb{R}$ an
- interval of real numbers. The expression
- \begin{equation}
- m = \frac{f(b) - f(a)}{b-a}
- \end{equation}
- gives the average rate of change. While the average rate of change is useful in
- many cases, the momentary rate of change is more accurate. To calculate this,
- we need to narrow down the interval to an infinitesimal. For each $x\in[a, b]$
- we calculate
- \begin{equation} \label{eqn:differential}
- \frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
- \end{equation}
- if it exists. As Tenenbaum and Pollard~\cite{Tenenbaum1985} define,
- $\nicefrac{df}{dx}$ is the \emph{derivative}, which is ``the rate of change of a
- variable with respect to another''. The relation between a variable and its
- derivative is modeled in a \emph{differential equation}. The derivative of
- $\nicefrac{df}{dx}$ yields $\nicefrac{d^2f}{dx^2}$, the function that
- calculates the rate of change of the rate of change and is called the
- \emph{second-order derivative}. Iterating this $n$ times results in
- $\nicefrac{d^nf}{dx^n}$, the derivative of the $n$th order. A method for
- obtaining a differential equation is to derive it from the semantics of a
- problem. For example, in physics a differential equation can be derived from the
- law of the conservation of energy~\cite{Demtroeder2021}. Differential equations
- find application in several areas, such as engineering, \eg, Chua's
- circuit~\cite{Matsumoto1984}, physics, \eg, the Schrödinger
- equation~\cite{Schroedinger1926}, economics, \eg, the Black-Scholes
- equation~\cite{Oksendal2000}, epidemiology, and beyond.\\
- In the context of functions, it is possible to have multiple domains, meaning
- that a function has more than one parameter. To illustrate, consider a function
- operating in two-dimensional space, wherein each parameter represents one axis.
- Another example would be a function that maps its inputs of a location variable
- and a time variable to a height. The term \emph{partial differential equations}
- (\emph{PDEs}) describes differential equations of such functions, which contain
- partial derivatives with respect to each individual domain. In contrast, \emph{ordinary differential
- equations} (\emph{ODEs}) are the single derivatives for a function having only
- one domain~\cite{Tenenbaum1985}. In this thesis, we restrict ourselves to ODEs.\\
- A \emph{system of differential equations} is the name for a set of differential
- equations. The derivatives in a system of differential equations each have their
- own codomain, which is part of the problem, while they all share the same
- domain.\\
- Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
- including the \emph{Motion of a Particle Along a Straight Line}. Further,
- Newton's second law states that ``the rate of change of the momentum of a body
- ($momentum = mass \cdot velocity$) is proportional to the resultant external
- force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
- in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
- Then, Newton's second law translates mathematically to
- \begin{equation} \label{eq:newtonSecLaw}
- F = m\frac{dv}{dt}.
- \end{equation}
- It is evident that the acceleration, $a=\frac{dv}{dt}$, appears in the equation
- as the rate of change of the velocity. Additionally, the velocity of a body is
- the derivative of the distance traveled by that body. Based on these findings,
- we can rewrite~\Cref{eq:newtonSecLaw} as
- \begin{equation}
- F=ma=m\frac{d^2s}{dt^2}.
- \end{equation}\\
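- To illustrate how such an ODE can be treated numerically, the following minimal
- sketch integrates $\nicefrac{dv}{dt} = F/m$ with the explicit Euler method; the
- constant force, mass, and step size are illustrative assumptions, not values
- from the cited literature.
- \begin{verbatim}
- # Minimal sketch: explicit Euler integration of dv/dt = F/m.
- # All values (F, m, dt) are illustrative assumptions.
- F = 2.0          # constant external force in Newtons
- m = 1.5          # mass of the body in kilograms
- dt = 0.01        # step size in seconds
- v, s = 0.0, 0.0  # initial velocity and distance
- 
- for _ in range(1000):  # simulate 10 seconds
-     a = F / m          # acceleration from Newton's second law
-     v += a * dt        # integrate velocity
-     s += v * dt        # integrate distance
- 
- print(f"v = {v:.2f} m/s, s = {s:.2f} m")
- \end{verbatim}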
- To conclude, note that this explanation of differential equations focuses on the
- aspects deemed crucial for this thesis and is not intended to be a complete
- explanation of the subject. To gain a better understanding of it, we recommend
- the books mentioned above~\cite{Rudin2007,Tenenbaum1985}. In the following
- section we describe the application of these principles in epidemiological
- models.
- % -------------------------------------------------------------------
- \section{Epidemiological Models}
- \label{sec:epidemModel}
- Pandemics like \emph{COVID-19} have resulted in a significant
- number of fatalities. Hence, the question arises: How should we analyze a
- pandemic effectively? It is essential to study whether the employed
- countermeasures are efficacious in combating the pandemic. Given the unfavorable
- public response to measures such as lockdowns, it is imperative to verify
- that their efficacy remains commensurate with the costs incurred by those
- affected. In the event that alternative and novel technologies were in use, such
- as the mRNA vaccines in the context of COVID-19, it is necessary to test the
- effect and find the optimal variant. In order to shed light on the
- aforementioned events, we need a method to quantify the pandemic along with its
- course of progression.\\
- The real world is a highly complex system, which makes describing it fully in a
- mathematical model a significant challenge. Therefore,
- the model must reduce the complexity while retaining the essential information.
- Furthermore, it must address the issue of limited data availability. For
- instance, during COVID-19 institutions such as the Robert Koch Institute
- (RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
- able to collect data on infections and mortality cases. Consequently, we require
- a model that employs an abstraction of the real world to illustrate the events
- and relations that are pivotal to understanding the problem.
- % -------------------------------------------------------------------
- \subsection{SIR Model}
- \label{sec:pandemicModel:sir}
- In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR Model},
- which subsequently became one of the most influential epidemiological models.
- It enables the modeling of infections during epidemiological events such as pandemics.
- The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
- reiterates the model and serves as the foundation for the following explanation
- of SIR models.\\
- The SIR model is capable of describing diseases that are transmitted through
- contact or proximity between an individual carrying the illness and a healthy
- individual. This is possible due to the distinction between infected individuals,
- who are carriers of the disease, and the part of the population which is
- susceptible to infection. In the model, individuals may move between the
- mentioned groups, \eg, healthy individuals becoming infected. The model assumes that the
- size $N$ of the population remains constant throughout the duration of the
- pandemic. The population $N$ comprises three distinct compartments: the
- \emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
- \emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
- \mathbb{R}_{\geq0}$ be the time span of the pandemic, then,
- \begin{equation}
- S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
- \end{equation}
- give the values of $S$, $I$ and $R$ at a given point in time
- $t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, the relation
- \begin{equation} \label{eq:N_char}
- N = S + I + R
- \end{equation}
- holds at every point in time. The model makes another assumption by stating that recovered people are immune
- to the illness and infectious individuals cannot infect them. The individuals
- in the $R$ group are either recovered or deceased, and thus unable to transmit
- or carry the disease.
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{sir_graph.pdf}
- \caption{A visualization of the SIR model, illustrating $N$ being split into the
- three groups $S$, $I$ and $R$.}
- \label{fig:sir_model}
- \end{figure}
- As visualized in~\Cref{fig:sir_model}, individuals
- may transition between groups based on transition rates. The
- transmission rate $\beta$ is responsible for individuals becoming infected,
- while the rate of removal or recovery rate $\alpha$ (also referred to as
- $\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
- individuals from $I$ to $R$.\\
- We can describe this problem mathematically using a system of differential
- equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
- McKendrick~\cite{1927} propose the following set of differential equations:
- \begin{equation}\label{eq:sir}
- \begin{split}
- \frac{dS}{dt} &= -\beta S I,\\
- \frac{dI}{dt} &= \beta S I - \alpha I,\\
- \frac{dR}{dt} &= \alpha I.
- \end{split}
- \end{equation}
- This set of differential equations is based on the following assumption:
- ``The rate of transmission of a microparasitic disease is proportional to the
- rate of encounter of susceptible and infective individuals modelled by the
- product ($\beta S I$)'', according to Edelstein-Keshet~\cite{EdelsteinKeshet2005}.
- The system shows the change in size of the groups per time unit due to
- infections, recoveries, and deaths.\\
- The term $\beta SI$ describes the rate of encounters of susceptible and infected
- individuals. This term is dependent on the sizes of $S$ and $I$; thus, Anderson
- and May~\cite{Anderson1991} propose a modified model:
- \begin{equation}\label{eq:modSIR}
- \begin{split}
- \frac{dS}{dt} &= -\beta \frac{SI}{N},\\
- \frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
- \frac{dR}{dt} &= \alpha I.
- \end{split}
- \end{equation}
- In~\Cref{eq:modSIR}, $\beta SI$ is normalized by $N$, which reflects real-world
- contact behavior more accurately~\cite{Anderson1991}.\\
- The initial phase of a pandemic is characterized by the infection of a small
- number of individuals, while the majority of the population remains susceptible.
- At this point, no individuals have yet recovered or died, so the removed group
- is empty. Let $I_0\in\mathbb{N}$ be
- the number of infected individuals at the beginning of the disease. Then,
- \begin{equation}
- \begin{split}
- S(0) &= N - I_{0},\\
- I(0) &= I_{0},\\
- R(0) &= 0,
- \end{split}
- \end{equation}
- describes the initial configuration of a system in which a disease has just
- emerged.\\
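- To make the dynamics tangible, the following minimal sketch simulates the
- normalized model of~\Cref{eq:modSIR} in Python; the parameter values follow the
- reference case of~\Cref{fig:synth_sir}, while the solver settings and the time
- horizon are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: simulating the normalized SIR model.
- import numpy as np
- from scipy.integrate import solve_ivp
- 
- N, I0 = 7.9e6, 10        # population size and initial infections
- alpha, beta = 0.35, 0.5  # recovery and transmission rates
- 
- def sir(t, y):
-     S, I, R = y
-     dS = -beta * S * I / N
-     dI = beta * S * I / N - alpha * I
-     dR = alpha * I
-     return [dS, dI, dR]
- 
- # integrate the system over an assumed horizon of 150 days
- sol = solve_ivp(sir, (0, 150), [N - I0, I0, 0.0],
-                 t_eval=np.linspace(0, 150, 151))
- S, I, R = sol.y
- \end{verbatim}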
- \begin{figure}[h]
- %\centering
- \setlength{\unitlength}{1cm} % Set the unit length for coordinates
- \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
- % reference
- \put(0, 1.75){
- \begin{subfigure}{0.4\textwidth}
- \centering
- \includegraphics[width=\textwidth]{reference_params_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.5$}
- \label{fig:synth_norm}
- \end{subfigure}
- }
- % 1. row, 1.image (low beta)
- \put(5.5, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{low_beta_synth.png}
- \caption{$\alpha=0.25$, $\beta=0.5$}
- \label{fig:synth_low_beta}
- \end{subfigure}
- }
- % 1. row, 2.image (high beta)
- \put(9.5, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{high_beta_synth.png}
- \caption{$\alpha=0.45$, $\beta=0.5$}
- \label{fig:synth_high_beta}
- \end{subfigure}
- }
- % 2. row, 1.image (low alpha)
- \put(5.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{low_alpha_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.4$}
- \label{fig:synth_low_alpha}
- \end{subfigure}
- }
- % 2. row, 2.image (high alpha)
- \put(9.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{high_alpha_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.6$}
- \label{fig:synth_high_alpha}
- \end{subfigure}
- }
- \end{picture}
- \caption{Synthetic data, generated using~\Cref{eq:modSIR} with $N=7.9\cdot 10^6$, $I_0=10$ and different sets of parameters.
- We visualize the case with the reference parameters in (a). In (b) and (c) we keep $\beta$ constant, while varying
- the value of $\alpha$. In contrast, (d) and (e) keep $\alpha$ constant and vary the value of $\beta$.}
- \label{fig:synth_sir}
- \end{figure}
- In the SIR model the temporal occurrence and the height of the peak (or peaks)
- of the infectious group are of paramount importance for understanding the
- dynamics of a pandemic. A low peak occurring at a late point in time indicates
- that the disease is unable to keep pace with the rate of recovery, resulting
- in its demise before it can exert a significant influence on the population. In
- contrast, an early and high peak means that the disease is rapidly transmitted
- through the population, with a significant proportion of individuals having been
- infected.~\Cref{fig:synth_sir} illustrates this effect by varying the values of
- $\beta$ or $\alpha$ while simulating a pandemic using a model such
- as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
- and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
- When the number of infections exceeds the number of recoveries, the peak of $I$
- will occur early and will be high. On the other hand, if recoveries occur at a
- faster rate than new infections the peak will occur later and will be low. Thus,
- it is crucial to know both $\beta$ and $\alpha$, as these parameters
- characterize how the pandemic evolves.\\
- The SIR model makes a number of assumptions that are intended to reduce the
- model's overall complexity, at the cost of increasing its divergence from
- reality. One such assumption is that the size of the population, $N$,
- remains constant, as the daily change is negligible relative to the total population.
- This depiction is not an accurate representation of the actual relations
- observed in the real world, as the size of a population is subject to a number
- of factors that can contribute to change. The population is increased by the
- occurrence of births and decreased by the occurrence of deaths. Other examples
- are the impossibility for individuals to become susceptible again after having
- recovered, or the possibility for the transition rates to change due to new
- variants or the implementation of new countermeasures. We address the latter
- issue in~\Cref{sec:pandemicModel:rsir}.
- % -------------------------------------------------------------------
- \subsection{Reduced SIR Model and the Reproduction Number}
- \label{sec:pandemicModel:rsir}
- \Cref{sec:pandemicModel:sir} presents the classical SIR model. This model
- contains two scalar parameters $\beta$ and $\alpha$, which describe the course
- of a pandemic over its duration. This is beneficial when examining the overall
- pandemic; however, in the real world, disease behavior is dynamic, and the
- values of the parameters $\beta$ and $\alpha$ change throughout the course of
- the disease. The reason for this lies in events such as the implementation of
- countermeasures that reduce the contact between the infectious and susceptible
- individuals, the emergence of a new variant of the disease that increases its
- infectivity or deadliness, or the administration of a vaccination that provides
- previously susceptible individuals with immunity without ever being infected.
- To address this, Millevoi \etal~\cite{Millevoi2023} introduce a model that
- simultaneously reduces the size of the system of differential equations and
- solves the problem of time scaling at hand.\\
- First, they alter the definition of $\beta$ and $\alpha$ to be dependent on the time interval
- $\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
- \begin{equation}
- \beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
- \end{equation}
- Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the time
- span an infected individual requires to recuperate. Subsequently, at the initial time point
- $t_0$, the \emph{reproduction number},
- \begin{equation}
- \RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
- \end{equation}
- represents the number of susceptible individuals that one infectious individual
- infects at the onset of the pandemic. In light of the effects of $\beta$ and
- $\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
- pandemic is emerging: since $I(t_0) \ll S(t_0)$, the disease spreads rapidly
- across the population, with an increase in $I$ occurring at a high rate. When
- $\RO < 1$, each infectious individual infects fewer than one other person on
- average, and the disease dies out. Nevertheless, $\RO$ does not cover the entire time
- span. For this reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$,
- which has the same interpretation as $\RO$, with the exception that $\Rt$ is
- dependent on time. The definition of the time-dependent reproduction number on
- the time interval $\mathcal{T}$ with the population size $N$,
- \begin{equation}\label{eq:repr_num}
- \Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N}
- \end{equation}
- combines the rates of change, carrying information about the spread of the disease,
- with the decreasing ratio of susceptible individuals in the
- population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
- a state variable in the model, enabling the following reduction of the SIR
- model.\\
- \Cref{eq:N_char} allows for the calculation of the value of the group $R$ using
- $S$ and $I$, with the term $R(t)=N-S(t)-I(t)$. Thus,
- \begin{equation}
- \begin{split}
- \frac{dS}{dt} &= -\alpha\Rt I(t),\\
- \frac{dI}{dt} &= \alpha(\Rt-1)I(t),
- \end{split}
- \end{equation}
- is the reduction of~\Cref{eq:modSIR} on the time interval $\mathcal{T}$ using this
- characteristic and the reproduction number \Rt (see~\Cref{eq:repr_num}).
- Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
- extensive range of values that the SIR groups can assume, spanning from $0$ to
- $10^7$. Accordingly, they initially scale the time interval $\mathcal{T}$ using
- its borders to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
- [0, 1]$. Subsequently, they calculate the scaled groups,
- \begin{equation}
- S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
- \end{equation}
- using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
- variable $I$ results in
- \begin{equation}
- \frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
- \end{equation}
- a further reduced version of~\Cref{eq:modSIR}. This version is more streamlined and
- efficient, as it eliminates one parameter ($\beta$) and two state
- variables ($S$ and $R$), while adding the state variable $\Rt$. This is a
- crucial aspect for the automated solution of such systems of differential
- equations, as we describe in~\Cref{sec:mlp}.
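- As a brief illustration, the following minimal sketch integrates the scaled
- $I$-equation with the explicit Euler method; the constant reproduction number
- and all parameter values are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: Euler integration of the reduced, time-scaled
- # I-equation dI_s/dt_s = alpha * (t_f - t_0) * (R_t - 1) * I_s.
- import numpy as np
- 
- t0, tf = 0.0, 150.0  # borders of the time interval T
- alpha = 0.35         # recovery rate
- C = 1e6              # constant scaling factor
- I_s = 10 / C         # scaled initial infections
- R_t = 1.4            # assumed constant reproduction number
- 
- ts = np.linspace(0.0, 1.0, 1001)  # scaled time in [0, 1]
- dt_s = ts[1] - ts[0]
- for _ in ts[1:]:
-     I_s += alpha * (tf - t0) * (R_t - 1) * I_s * dt_s
- \end{verbatim}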
- % -------------------------------------------------------------------
- \section{Multilayer Perceptron}
- \label{sec:mlp}
- In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
- equations in systems, illustrating how they can be utilized to elucidate the
- impact of a specific parameter on the system's behavior.
- In~\Cref{sec:epidemModel}, we show specific applications of differential
- equations in an epidemiological context. The final objective is to solve these
- equations. There are multiple methods to achieve this goal; one
- such method is the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
- following section, we provide a brief overview of the structure and training of
- these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
- by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
- explanations.\\
- The objective is to develop an approximation method for any function $f^{*}$,
- which could be a mathematical function or a mapping of an input vector to a
- class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
- the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$,
- is the function to approximate. In 1958,
- Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, modeling the concept of
- a neuron in a neuroscientific sense. The perceptron takes in the input vector
- $\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
- optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
- f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
- Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
- approximating a specific class of functions. Consequently, there is a necessity
- for an expansion of the perceptron.\\
- As Goodfellow \etal describe, the solution to this issue is to decompose $f$ into
- a chain structure of the form,
- \begin{equation} \label{eq:mlp_char}
- f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
- \end{equation}
- This converts a perceptron, which has only two layers (an input and an output
- layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
- is represented in the structure of an MLP as a \emph{layer}. Each layer is composed
- of a multitude of \emph{units} (also called \emph{neurons}). A neuron performs the
- same vector-to-scalar calculation as the perceptron does. Subsequently, a
- nonlinear activation function transforms the scalar output into the activation
- of the unit. The layers are staggered in the neural network, with each layer
- being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
- input vector $\boldsymbol{x}$ is provided to each unit of the first layer
- $f^{(1)}$, which then gives the results to the units of the second layer
- $f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
- intervening layers, situated between the first and the output layers, are the
- \emph{hidden layers}. The alternating structure of linear and nonlinear
- calculation enables MLPs to approximate any function. As Hornik
- \etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{MLP.pdf}
- \caption{A visualization of an exemplary multilayer perceptron, illustrating the
- input layer, the hidden layers and the output layer, with each layer connected to its neighbors.}
- \label{fig:mlp_example}
- \end{figure}
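- A minimal sketch of such an architecture in PyTorch may look as follows; the
- layer sizes and the choice of activation function are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch of an MLP: linear layers alternating with a
- # nonlinear activation function.
- import torch
- import torch.nn as nn
- 
- model = nn.Sequential(
-     nn.Linear(1, 32),   # f^(1): input layer
-     nn.Tanh(),
-     nn.Linear(32, 32),  # f^(2): hidden layer
-     nn.Tanh(),
-     nn.Linear(32, 1),   # f^(3): output layer
- )
- y_hat = model(torch.rand(16, 1))  # forward pass on a random batch
- \end{verbatim}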
- The term \emph{training} describes the process of optimizing the parameters
- $\theta$. In order to undertake training, it is necessary to have a set of
- \emph{training data}, which is a set of pairs (also called training points) of
- the input data $\boldsymbol{x}$ and its corresponding true solution
- $\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
- define the \emph{loss function} $\Loss{ }$, using the model prediction
- $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
- metric for evaluating the extent to which the model deviates from the correct
- answer. One of the most common loss functions is the \emph{mean square error}
- (MSE) loss function. Let $N$ be the number of points in the set of training
- data. Then,
- \begin{equation} \label{eq:mse}
- \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
- \end{equation}
- calculates the squared difference between each model prediction and the true value
- of a training point and takes the mean across the whole training set. \\
- In the context of neural networks, \emph{forward propagation} describes the
- process of information flowing through the network from the input layer to the
- output layer, resulting in a scalar loss. Ultimately, the objective is to
- utilize this information to optimize the parameters, in order to minimize the
- loss. One of the most fundamental optimization strategies is \emph{gradient
- descent}. In this process, the derivatives are employed to identify the location
- of local or global minima within a function. Given that a positive gradient
- signifies ascent and a negative gradient indicates descent, we must move the
- variable by a constant \emph{learning rate} (step size) in the opposite
- direction to that of the gradient. The calculation of the derivatives with respect
- to the parameters is a complex task, since our function is a composition of
- many functions (one for each layer). We can address this issue by taking advantage
- of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
- $\hat{\boldsymbol{y}} = f(w; \theta)$ be the model prediction with
- $w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$.
- $\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2\subset\theta$.
- Then,
- \begin{equation}\label{eq:backprop}
- \nabla_{\theta_1} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\theta_1}f^{(1)},
- \end{equation}
- is the gradient of $\Loss{ }$ with respect to the parameters $\theta_1$. The name
- of this method in the context of neural networks is
- \emph{backpropagation}~\cite{Goodfellow-et-al-2016}.\\
- In practical applications, an optimizer often accomplishes the optimization task
- by executing gradient descent in the background. Furthermore, modifying the
- learning rate during training can be advantageous, for instance, making larger
- steps at the beginning and minor adjustments at the end. Therefore, schedulers
- are algorithms that implement diverse learning rate adjustment
- strategies.\\
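- As a brief illustration, the following minimal sketch shows one possible
- PyTorch training loop with MSE loss, a gradient descent optimizer and a
- scheduler; the tensors \texttt{x} and \texttt{y} as well as the hyperparameter
- values are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: forward propagation, MSE loss, backpropagation,
- # gradient descent step and learning rate scheduling.
- import torch
- 
- optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
- scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
-                                             step_size=100, gamma=0.5)
- loss_fn = torch.nn.MSELoss()
- 
- for epoch in range(1000):
-     optimizer.zero_grad()     # reset accumulated gradients
-     y_hat = model(x)          # forward propagation
-     loss = loss_fn(y_hat, y)  # scalar loss
-     loss.backward()           # backpropagation of the gradients
-     optimizer.step()          # gradient descent update
-     scheduler.step()          # adjust the learning rate
- \end{verbatim}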
- This section provides an overview of basic concepts of neural networks. For a
- deeper understanding, we direct the reader to the book \emph{Deep Learning} by
- Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section will demonstrate
- the application of neural networks in approximating solutions to differential
- systems.
- % -------------------------------------------------------------------
- \section{Physics-Informed Neural Networks}
- \label{sec:pinn}
- In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
- widely recognized tools for approximating any kind of function. In this
- section, we apply this capability to create a solver for ODEs and PDEs,
- as Lagaris \etal~\cite{Lagaris1997} describe in their paper. In this approach,
- the model learns to approximate a function using provided data points while
- leveraging the available knowledge about the problem in the form of a system of
- differential equations. The \emph{physics-informed neural network} (PINN)
- learns the system of differential equations during training, as it optimizes
- its output to align with the equations.\\
- In contrast to standard MLPs, the loss term of a PINN comprises two
- components. The first term incorporates the aforementioned prior knowledge pertinent to the problem. As Raissi
- \etal~\cite{Raissi2017} propose, the residual of each differential equation in
- the system must be minimized in order for the model to optimize its output in accordance with the theory.
- We obtain the residual $r_i$, with $i\in\{1, ...,N_d\}$, by rearranging the differential equation and
- calculating the difference between the left-hand side and the right-hand side
- of the equation. $N_d$ is the number of differential equations in the system. Following
- Raissi \etal~\cite{Raissi2017}, the \emph{physics
- loss} of a PINN,
- \begin{equation}
- \mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
- \end{equation}
- takes the input data and the model prediction to calculate the mean square
- error of the residuals. The second term, the \emph{observation loss}
- $\Loss{obs}$, employs the mean square error of the distances between the
- predicted and the true values for each training point. Additionally, the
- observation loss may incorporate extra terms for initial and boundary conditions. Let $N_t$
- denote the number of training points. Then,
- \begin{equation}
- \mathcal{L}_{PINN}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
- \end{equation}
- represents the comprehensive loss function of a physics-informed neural network.\\
- Given the nature of residuals, calculating the loss term
- $\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
- calculation of the derivative of the output with respect to the input of
- the neural network. As we outline in~\Cref{sec:mlp}, during the process of
- backpropagation we calculate the gradients of the loss term with respect to a
- layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
- the index of the respective layer. By employing
- the chain rule of calculus, the algorithm progresses from the output layer
- through each hidden layer, ultimately reaching the first layer in order to
- compute the respective gradients. The term,
- \begin{equation}
- \nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
- \end{equation}
- illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
- this procedure, called \emph{automatic differentiation}, goes one step further and
- calculates the gradient of the output with respect to the input
- $\boldsymbol{x}$. In order to calculate the second derivative,
- $\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
- this procedure must be repeated.\\
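- In PyTorch, for example, these input gradients can be obtained as in the
- following minimal sketch; the input range and the existence of a trained
- \texttt{model} are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: automatic differentiation of the network output
- # with respect to its input, as required for the physics loss.
- import torch
- 
- x = torch.linspace(0, 1, 100).reshape(-1, 1).requires_grad_(True)
- y_hat = model(x)
- 
- # first derivative dy/dx
- dy_dx = torch.autograd.grad(y_hat, x, torch.ones_like(y_hat),
-                             create_graph=True)[0]
- # second derivative: repeat the procedure on dy/dx
- d2y_dx2 = torch.autograd.grad(dy_dx, x, torch.ones_like(dy_dx),
-                               create_graph=True)[0]
- \end{verbatim}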
- Above, we present a method for approximating functions through the use of
- systems of differential equations. As previously stated, we want to find a
- solver for systems of differential equations. In problems where we must solve
- an ODE or PDE, we have to find a set of parameters that satisfies the system
- for any input $\boldsymbol{x}$. In the context of PINNs, this is the
- \emph{inverse problem}: a set of training data from measurements, for
- example, is available along with the respective differential equations, but
- information about the parameters of the equations is lacking. To address this
- challenge, we set these parameters as distinct learnable parameters within the
- neural network. This enables the network to utilize a specific value that
- actively influences the physics loss $\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$.
- During the training phase, the optimizer aims to minimize the physics loss,
- which should ultimately yield an approximation of the true value.\\
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{oscilator.pdf}
- \caption{Illustration of the movement of an oscillating body in the
- underdamped case, with $m=1\,kg$, $\mu=4\,\frac{Ns}{m}$ and $k=200\,\frac{N}{m}$.}
- \label{fig:spring}
- \end{figure}
- One illustrative example of a potential application for PINNs is the
- \emph{damped harmonic oscillator}~\cite{Tenenbaum1985}. In this problem, we
- displace a body, which is attached to a spring, from its resting position. The
- body is subject to three forces: firstly, the inertia exerted by the
- displacement $u$, which points in the direction of the displacement $u$; secondly,
- the restoring force of the spring, which attempts to return the body to its
- original position; and thirdly, the friction force, which points in the opposite
- direction of the movement. In accordance with Newton's second law and the
- combined influence of these forces, the body exhibits oscillatory motion around
- its position of rest. The system is influenced by the mass $m$ of the body,
- the coefficient of friction $\mu$ and the spring constant $k$, indicating the
- stiffness of the spring. The governing differential equation,
- \begin{equation}
- m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
- \end{equation}
- shows the relation of these parameters in reference to the problem at hand. As
- Tenenbaum and Pollard provide, there are three potential solutions to this
- problem. However, only the \emph{underdamped case} results in an oscillating
- movement of the body, as illustrated in~\Cref{fig:spring}. In order to apply a
- PINN to this problem, we require a set of training data. This consists of
- pairs of time points and corresponding displacement measurements
- $(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In this hypothetical case,
- we know the mass $m=1\,kg$, the spring constant $k=200\,\frac{N}{m}$, the
- initial displacement $u^{(1)} = 1$, and $\frac{du(0)}{dt} = 0$. However, we do
- not know the value of the friction coefficient $\mu$. In this case, the loss function,
- \begin{equation}
- \mathcal{L}_{osc}(\boldsymbol{x}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = (u^{(1)}-1)^2+\left(\frac{du(0)}{dt}\right)^2+||m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{u}}^{(i)}-\boldsymbol{u}^{(i)}||^2,
- \end{equation}
- includes the initial conditions, the residual, in which $\mu$ is a learnable
- parameter, and the observation loss.
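- A minimal sketch of this loss in PyTorch could look as follows; the tensors
- \texttt{t} and \texttt{u} (the measurements) as well as the existence of a
- \texttt{model} are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: oscillator loss with a learnable friction
- # coefficient mu; m and k are the known constants from the text.
- import torch
- 
- m, k = 1.0, 200.0
- mu = torch.nn.Parameter(torch.tensor(1.0))  # learnable parameter
- 
- t = t.requires_grad_(True)
- u_hat = model(t)
- du = torch.autograd.grad(u_hat, t, torch.ones_like(u_hat),
-                          create_graph=True)[0]
- d2u = torch.autograd.grad(du, t, torch.ones_like(du),
-                           create_graph=True)[0]
- 
- residual = m * d2u + mu * du + k * u_hat
- loss = ((u_hat[0] - 1.0) ** 2 + du[0] ** 2  # initial conditions
-         + (residual ** 2).mean()            # physics loss
-         + ((u_hat - u) ** 2).mean())        # observation loss
- \end{verbatim}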
- % -------------------------------------------------------------------
- \subsection{Disease-Informed Neural Networks}
- \label{sec:pinn:dinn}
- In~\Cref{sec:pinn}, we describe the capability of MLPs to solve systems of
- differential equations. In~\Cref{sec:pandemicModel:sir}, we describe the SIR
- model, which models the relations of susceptible, infectious and removed
- individuals and simulates the progress of a disease in a population with a
- constant size. A system of differential equations models these relations. Shaier
- \etal~\cite{Shaier2021} propose a method to solve the equations of the SIR model
- using a PINN, which they call a \emph{disease-informed neural network} (DINN).\\
- To solve~\Cref{eq:sir}, we need to find the transmission rate $\beta$ and the
- recovery rate $\alpha$. As Shaier \etal~\cite{Shaier2021} point out, there are
- different approaches to solving this set of equations. For instance, one can build on
- the assumption that at the beginning one infected individual infects $n$ other
- people, so that $\frac{dS(0)}{dt} = -n$. Then,
- \begin{equation}
- \beta=-\frac{\frac{dS}{dt}}{S_0I_0}
- \end{equation}
- would calculate the initial transmission rate using the initial size of the
- susceptible group $S_0$ and the infectious group $I_0$. The recovery rate could
- then be defined using the number of days $d$ between the point of
- infection and the start of isolation, $\alpha = \frac{1}{d}$. Such analytical
- solutions to the SIR model often use heuristic methods and require knowledge
- such as the sizes $S_0$ and $I_0$. A data-driven approach such as the one that
- Shaier \etal~\cite{Shaier2021} propose does not have these problems, since the
- model learns the parameters $\beta$ and $\alpha$ while fitting the training
- data consisting of the time points $\boldsymbol{t}$ and the corresponding
- measured sizes of the groups $\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}$.
- Let $\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}$ be the
- model predictions of the groups and
- $r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}},
- r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\alpha \hat{\boldsymbol{I}}$
- and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \alpha \hat{\boldsymbol{I}}$ the
- residuals of each differential equation using the model predictions. Then,
- \begin{equation}
- \begin{split}
- \mathcal{L}_{SIR} = ||r_S||^2 + ||r_I||^2 + ||r_R||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} \Big(||\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}||^2 &+\\
- ||\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}||^2 &+\\
- ||\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}||^2\Big)&,
- \end{split}
- \end{equation}
- is the loss function of a DINN, with $\alpha$ and $\beta$ being learnable
- parameters.
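- A minimal sketch of this loss in PyTorch could look as follows; the training
- tensors \texttt{t}, \texttt{S}, \texttt{I}, \texttt{R} and the existence of a
- \texttt{model} with three outputs are illustrative assumptions.
- \begin{verbatim}
- # Minimal sketch: DINN loss with learnable alpha and beta.
- import torch
- 
- alpha = torch.nn.Parameter(torch.tensor(0.1))
- beta = torch.nn.Parameter(torch.tensor(0.1))
- 
- t = t.requires_grad_(True)
- S_hat, I_hat, R_hat = model(t).unbind(dim=1)
- ones = torch.ones_like(S_hat)
- dS = torch.autograd.grad(S_hat, t, ones, create_graph=True)[0]
- dI = torch.autograd.grad(I_hat, t, ones, create_graph=True)[0]
- dR = torch.autograd.grad(R_hat, t, ones, create_graph=True)[0]
- 
- r_S = dS.squeeze() + beta * S_hat * I_hat
- r_I = dI.squeeze() - beta * S_hat * I_hat + alpha * I_hat
- r_R = dR.squeeze() - alpha * I_hat
- 
- loss = ((r_S**2).mean() + (r_I**2).mean() + (r_R**2).mean()
-         + ((S_hat - S)**2).mean() + ((I_hat - I)**2).mean()
-         + ((R_hat - R)**2).mean())
- \end{verbatim}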
- % -------------------------------------------------------------------
|