% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap02/chap02.tex
% Part: theoretical background
% Description:
% summary of the content in this chapter
% Version: 20.08.2024
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Theoretical Background}
\label{chap:background}
This chapter introduces the theoretical foundations for the work presented in
this thesis. In~\Cref{sec:domain} and~\Cref{sec:differentialEq}, we describe
differential equations and the underlying theory. In these sections, both the
explanations and the approach are based on a book on analysis by
Rudin~\cite{Rudin2007} and a book on ordinary differential equations by
Tenenbaum and Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this
knowledge to examine various pandemic models in~\Cref{sec:epidemModel}. Finally,
we address the topic of neural networks, with a focus on the multilayer
perceptron in~\Cref{sec:mlp} and physics-informed neural networks
in~\Cref{sec:pinn}.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Functions}
\label{sec:domain}
To model a physical problem mathematically, it is necessary to define a
fundamental set of numbers or quantities upon which the subsequent calculations
will be based. These sets may represent, for instance, a specific time interval
or a distance. The term \emph{domain} describes these fundamental sets of
numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
that takes values in a certain domain. In this thesis, we focus on domains that
are subsets of the real numbers $\mathbb{R}$.\\
The mapping between variables enables the modeling of a physical process and
may capture its semantics. We use functions to facilitate this mapping. Let
$A, B\subset\mathbb{R}$ be two subsets of the real numbers. Then we define a
function as the mapping
\begin{equation}
f: A\rightarrow B.
\end{equation}
In other words, the function $f$ maps elements $x\in A$ to values
$f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
of $f$. Functions are capable of representing the state of a system as a value
based on an input value from their domain. One illustrative example is a
function that maps a time step to the distance covered since a starting point.
In this case, time serves as the domain, while the distance is the codomain.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Differential Equations}
\label{sec:differentialEq}
Often, the behavior of a variable or a quantity across a domain is more
interesting than its current state. Functions are able to give us the latter,
but do not contain information about the change of a system. The objective
is to determine an effective method for calculating the change of a function
across its domain. Let $f$ be a function and $[a, b]\subset \mathbb{R}$ an
interval of real numbers. The expression
\begin{equation}
m = \frac{f(b) - f(a)}{b-a}
\end{equation}
gives the average rate of change. While the average rate of change is useful in
many cases, the momentary rate of change is more accurate. To calculate the
momentary rate of change at $x$, we let a value $t$ approach $x$, thereby
narrowing the interval down to an infinitesimal. For each $x\in[a, b]$ we
calculate
\begin{equation} \label{eqn:differential}
\frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
\end{equation}
if it exists. As Tenenbaum and Pollard~\cite{Tenenbaum1985} define,
$\nicefrac{df}{dx}$ is the \emph{derivative}, which is ``the rate of change of a
variable with respect to another''. The relation between a variable and its
derivative is modeled in a \emph{differential equation}. Differentiating
$\nicefrac{df}{dx}$ again yields $\nicefrac{d^2f}{dx^2}$, the
\emph{second-order derivative}, which measures the rate of change of the rate
of change. Iterating this process $n$ times results in $\nicefrac{d^nf}{dx^n}$,
the derivative of the $n$th order. One method for obtaining a differential
equation is to derive it from the semantics of a problem. For example, in
physics a differential equation can be derived from the law of the conservation
of energy~\cite{Demtroeder2021}. Differential equations find application in
several areas: engineering, \eg, Kirchhoff's circuit laws~\cite{Kirchhoff1845},
which describe the relation between voltage and current in systems with
resistors, inductors, and capacitors; physics, \eg, the Schrödinger equation,
which predicts the probability of finding particles like electrons in specific
places or states in a quantum system; economics, \eg, the Black-Scholes
equation~\cite{Oksendal2000}, which predicts the price of financial
derivatives, such as options, over time; epidemiology, \eg, the SIR
model~\cite{1927}; and beyond.\\
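To make~\Cref{eqn:differential} concrete, the following short Python sketch
(our own illustration; the function $f(x)=x^2$ and the step size are arbitrary
choices) approximates the derivative by evaluating the difference quotient on
a shrinking interval:
\begin{verbatim}
# Numerically approximate df/dx by shrinking the interval [x, t].
def derivative(f, x, h=1.0e-6):
    t = x + h                       # t approaches x as h becomes small
    return (f(t) - f(x)) / (t - x)  # difference quotient

f = lambda x: x ** 2
print(derivative(f, 3.0))           # close to the true value 2x = 6
\end{verbatim}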
In the context of functions, it is possible to have multiple domains, meaning
that a function has more than one parameter. To illustrate, consider a function
operating in two-dimensional space, wherein each parameter represents one axis.
Another example would be a function that maps its inputs, a location variable
and a time variable, to a height. The term \emph{partial differential equation}
(PDE) describes a differential equation of such functions, which contains
partial derivatives with respect to each individual domain. In contrast, an
\emph{ordinary differential equation} (ODE) contains only derivatives of a
function of a single domain~\cite{Tenenbaum1985}. In this thesis, we restrict
ourselves to ODEs. Furthermore, a \emph{system of differential equations} is a
set of differential equations. The derivatives in a system of differential
equations each have their own codomain, which is part of the problem, while
they all share the same domain.\\
Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
including the \emph{Motion of a Particle Along a Straight Line}. Newton's
second law states that ``the rate of change of the momentum of a body
($momentum = mass \cdot velocity$) is proportional to the resultant external
force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
Then, Newton's second law translates mathematically to
\begin{equation} \label{eq:newtonSecLaw}
F = m\frac{dv}{dt}.
\end{equation}
It is evident that the acceleration, $a=\frac{dv}{dt}$, the rate of change of
the velocity, is part of the equation. Additionally, the velocity of a body is
the derivative of the distance $s$ traveled by that body. Based on these
findings, we can rewrite~\Cref{eq:newtonSecLaw} as
\begin{equation}
F=ma=m\frac{d^2s}{dt^2},
\end{equation}
showing that the force $F$ influences the change of the body's position over
time.\\
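As a brief illustration (our own sketch, not part of the cited material), the
following Python code integrates this equation of motion numerically for a
constant force, using the common reformulation of a second-order ODE as a
first-order system; the values of $m$ and $F$ are arbitrary:
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

m, F = 2.0, 10.0          # mass in kg, constant force in N

def motion(t, y):
    s, v = y              # position and velocity
    return [v, F / m]     # ds/dt = v, dv/dt = F/m

sol = solve_ivp(motion, t_span=(0.0, 5.0), y0=[0.0, 0.0],
                t_eval=np.linspace(0.0, 5.0, 11))
print(sol.y[0])           # follows the analytical solution F/(2m) t^2
\end{verbatim}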
To conclude, note that this explanation of differential equations focuses on the
aspects deemed crucial for this thesis and is not intended to be a complete
explanation of the subject. To gain a better understanding of it, we recommend
the books mentioned above~\cite{Rudin2007,Tenenbaum1985}. In the following
section we describe the application of these principles in epidemiological
models.
% -------------------------------------------------------------------
\section{Epidemiological Models}
\label{sec:epidemModel}
Pandemics, like \emph{COVID-19}, have resulted in a significant
number of fatalities. Hence, the question arises: How should we analyze a
pandemic effectively? It is essential to study whether the employed
countermeasures are efficacious in combating the pandemic. Given the
unfavorable public response~\cite{Jaschke2023} to measures such as lockdowns,
it is imperative to investigate whether their efficacy remains commensurate
with the costs incurred by those affected. In the event that alternative and
novel technologies are in use, such as the mRNA vaccines in the context of
COVID-19, it is necessary to test their effect and find the optimal variant.
In order to shed light on the aforementioned events, we need a method to
quantify the pandemic along with its course of progression.\\
The real world is a highly complex system, which makes describing it fully in
a mathematical model a significant challenge. Therefore, the model must reduce
the complexity while retaining the essential information. Furthermore, it must
address the issue of limited data availability. For instance, during COVID-19,
institutions such as the Robert Koch Institute
(RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
able to collect data on infections and mortality cases. Consequently, we require
a model that employs an abstraction of the real world to illustrate the events
and relations that are pivotal to understanding the problem.
% -------------------------------------------------------------------
\subsection{SIR Model}
\label{sec:pandemicModel:sir}
In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR model},
which subsequently became one of the most influential epidemiological models.
This model enables the modeling of infections during epidemiological events
such as pandemics. The book \emph{Mathematical Models in
Biology}~\cite{EdelsteinKeshet2005} reiterates the model and serves as the
foundation for the following explanation of SIR models.\\
The SIR model is capable of describing diseases that are transmitted through
contact or proximity between an individual carrying the illness and a healthy
individual. This is possible due to the distinction between infected
individuals, who are carriers of the disease, and the part of the population
that is susceptible to infection. In the model, individuals may move between
the groups, \eg, healthy individuals becoming infected. The model assumes that
the size $N$ of the population remains constant throughout the duration of the
pandemic. The population $N$ comprises three distinct compartments: the
\emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
\emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
\mathbb{R}_{\geq0}$ be the time span of the pandemic. Then,
\begin{equation}
S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
\end{equation}
give the values of $S$, $I$ and $R$ at a certain point in time
$t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, the following holds:
\begin{equation} \label{eq:N_char}
N = S + I + R.
\end{equation}
The model makes another assumption by stating that recovered people are immune
to the illness and infectious individuals cannot infect them. The individuals
in the $R$ group are either recovered or deceased, and thus unable to transmit
or carry the disease.
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{sir_graph.pdf}
\caption{A visualization of the SIR model~\cite{1927}, illustrating $N$ being split into the
three groups $S$, $I$ and $R$.}
\label{fig:sir_model}
\end{figure}
As visualized in~\Cref{fig:sir_model}, individuals may transition between
groups based on epidemiological parameters. The transmission rate $\beta$ is
responsible for individuals becoming infected, while the rate of removal or
recovery rate $\alpha$ (also referred to as $\delta$ or $\nu$,
\eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves individuals from $I$
to $R$.\\
We can describe this problem mathematically using a system of differential
equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
McKendrick~\cite{1927} propose the following set of differential equations:
\begin{equation}\label{eq:sir}
\begin{split}
\frac{dS}{dt} &= -\beta S I,\\
\frac{dI}{dt} &= \beta S I - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
This set of differential equations is based on the following assumption:
``The rate of transmission of a microparasitic disease is proportional to the
rate of encounter of susceptible and infective individuals modelled by the
product ($\beta S I$)'', according to Edelstein-Keshet~\cite{EdelsteinKeshet2005}.
The system shows the change in size of the groups per time unit due to
infections, recoveries, and deaths.\\
The term $\beta SI$ describes the rate of encounters of susceptible and infected
individuals. Since this term depends on the absolute sizes of $S$ and $I$,
Anderson and May~\cite{Anderson1991} propose a modified model:
\begin{equation}\label{eq:modSIR}
\begin{split}
\frac{dS}{dt} &= -\beta \frac{SI}{N},\\
\frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
By normalizing $\beta SI$ by $N$,~\Cref{eq:modSIR} describes real-world
transmission more accurately~\cite{Anderson1991}.\\
The initial phase of a pandemic is characterized by the infection of a small
number of individuals, while the majority of the population remains
susceptible. At this point, no individuals have yet recovered or died. Let
$I_0\in\mathbb{N}$ be the number of infected individuals at the beginning of
the disease. Then,
\begin{equation}\label{eq:startCond}
\begin{split}
S(0) &= N - I_{0},\\
I(0) &= I_{0},\\
R(0) &= 0,
\end{split}
\end{equation}
describes the initial configuration of a system in which a disease has just
emerged.\\
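To give a concrete impression of these dynamics, the following Python sketch
(our own illustration, using SciPy) integrates~\Cref{eq:modSIR} with the
initial conditions of~\Cref{eq:startCond}; the parameter values mirror the
reference case of~\Cref{fig:synth_sir}:
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

N, I0 = 7.9e6, 10.0        # population size, initial infections
alpha, beta = 0.35, 0.5    # recovery and transmission rate

def sir(t, y):
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - alpha * I
    dR = alpha * I
    return [dS, dI, dR]

days = np.linspace(0.0, 150.0, 151)
sol = solve_ivp(sir, (days[0], days[-1]), y0=[N - I0, I0, 0.0],
                t_eval=days)
S, I, R = sol.y
print(f"peak of I: {I.max():.0f} on day {days[I.argmax()]:.0f}")
\end{verbatim}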
\begin{figure}[t]
%\centering
\setlength{\unitlength}{1cm} % Set the unit length for coordinates
\begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
% reference
\put(0, 1.75){
\begin{subfigure}{0.35\textwidth}
\centering
\includegraphics[width=\textwidth]{reference_params_synth.pdf}
\caption{$\alpha=0.35$, $\beta=0.5$}
\label{fig:synth_norm}
\end{subfigure}
}
% 1. row, 1. image (low beta)
\put(5.25, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_beta_synth.pdf}
\caption{$\alpha=0.35$, $\beta=0.4$}
\label{fig:synth_low_beta}
\end{subfigure}
}
% 1. row, 2. image (high beta)
\put(9.5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_beta_synth.pdf}
\caption{$\alpha=0.35$, $\beta=0.6$}
\label{fig:synth_high_beta}
\end{subfigure}
}
% 2. row, 1. image (low alpha)
\put(5.25, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_alpha_synth.pdf}
\caption{$\alpha=0.25$, $\beta=0.5$}
\label{fig:synth_low_alpha}
\end{subfigure}
}
% 2. row, 2. image (high alpha)
\put(9.5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_alpha_synth.pdf}
\caption{$\alpha=0.45$, $\beta=0.5$}
\label{fig:synth_high_alpha}
\end{subfigure}
}
\end{picture}
\caption{Synthetic data generated using~\Cref{eq:modSIR} with $N=7.9\cdot 10^6$,
$I_0=10$ and different sets of parameters. We visualize the case with the
reference parameters in (a). In (b) and (c) we keep $\alpha$ constant, while
varying the value of $\beta$. In contrast, (d) and (e) have varying values
of $\alpha$.}
\label{fig:synth_sir}
\end{figure}
In the SIR model, the temporal occurrence and the height of the peak (or peaks)
of the infectious group are of great importance for understanding the
dynamics of a pandemic. A low peak occurring at a late point in time indicates
that the disease is unable to keep pace with the rate of recovery, resulting
in its demise before it can exert a significant influence on the population. In
contrast, an early and high peak means that the disease is rapidly transmitted
through the population, with a significant proportion of individuals having been
infected.~\Cref{fig:synth_sir} illustrates this effect by varying the values of
$\beta$ or $\alpha$ while simulating a pandemic using a model such
as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
When the number of infections exceeds the number of recoveries, the peak of $I$
will occur early and will be high. On the other hand, if recoveries occur at a
faster rate than new infections, the peak will occur later and will be low. Thus,
it is crucial to know both $\beta$ and $\alpha$, as these parameters
characterize how the pandemic evolves.\\
The SIR model is based on a number of assumptions that are intended to reduce
the overall complexity of the model while still representing the processes
observed in the real world. For example, the size of a population in the real
world is subject to a number of factors that can contribute to change: it is
increased by births and decreased by deaths. One assumption stated in the SIR
model is that the size of the population, $N$, remains constant, as the daily
change is negligible compared to the total population. Other assumptions are
that recovered individuals cannot become susceptible again and that the
epidemiological parameters do not change, \eg, due to new variants or the
implementation of new countermeasures. We address the latter
in~\Cref{sec:pandemicModel:rsir}.
% -------------------------------------------------------------------
\subsection{Reduced SIR Model and the Reproduction Number}
\label{sec:pandemicModel:rsir}
\Cref{sec:pandemicModel:sir} presents the classical SIR model. This model
contains two scalar parameters $\beta$ and $\alpha$, which describe the course
of a pandemic over its entire duration. This is beneficial when examining the
overall pandemic; however, in the real world, disease behavior is dynamic, and
the values of the parameters $\beta$ and $\alpha$ change throughout the course
of the disease. The reason for this lies in events such as the implementation
of countermeasures that reduce the contact between the infectious and
susceptible individuals, the emergence of a new variant of the disease that
increases its infectivity or deadliness, or the administration of a
vaccination that provides previously susceptible individuals with immunity
without ever being infected. As these fine details of disease progression are
missed by the constant rates, Liu and Stechlinski~\cite{Liu2012}, and Setianto
and Hidayat~\cite{Setianto2023}, introduce time-dependent epidemiological
parameters and the time-dependent reproduction number to address this issue.
Millevoi \etal~\cite{Millevoi2023} present a reduced version of the SIR
model.\\
For the time interval $\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
they alter the definition of $\beta$ and $\alpha$ to be time-dependent,
\begin{equation}
\beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
\end{equation}
Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the
time span an infected individual requires to recover. Subsequently, at the
initial time point $t_0$, the \emph{reproduction number},
\begin{equation}
\RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
\end{equation}
represents the number of susceptible individuals that one infectious individual
infects at the onset of the pandemic. In light of the effects of $\beta$ and
$\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO < 1$ indicates that the
pandemic is waning, as each infectious individual infects less than one other
person on average, with the number of infections remaining limited since
$I(t_0) \ll S(t_0)$. Conversely, $\RO > 1$ leads to the disease spreading
rapidly across the population, with an increase in $I$ occurring at a high
rate. Nevertheless, $\RO$ does not cover the entire time span. For this
reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$, which has the same
interpretation as $\RO$, with the exception that $\Rt$ is dependent on time.
The time-dependent reproduction number is defined as,
\begin{equation}\label{eq:repr_num}
\Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N},
\end{equation}
on the time interval $\mathcal{T}$ with the population size $N$. This
definition includes the epidemiological parameters, carrying information about
the spread of the disease, as well as the decrease of the ratio of susceptible
individuals in the population. In contrast to $\beta$ and $\alpha$, $\Rt$ is
not a parameter but a state variable in the model, which provides information
about the reproduction of the disease for each day. As Millevoi
\etal~\cite{Millevoi2023} show, $\Rt$ enables the following reduction of the
SIR model.\\
\Cref{eq:N_char} allows for the calculation of the value of the group $R$ from
$S$ and $I$, via $R(t)=N-S(t)-I(t)$. Thus,
\begin{equation}
\begin{split}
\frac{dS}{dt} &= -\alpha\Rt I(t),\\
\frac{dI}{dt} &= \alpha(\Rt-1)I(t),
\end{split}
\end{equation}
is the reduction of~\Cref{eq:sir} on the time interval $\mathcal{T}$ using this
characteristic and the reproduction number $\Rt$ (see~\Cref{eq:repr_num}).
Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
extensive range of values that the SIR groups can assume. Accordingly, they
first scale the time interval $\mathcal{T}$ using its endpoints to obtain the
scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in[0, 1]$. Subsequently, they
calculate the scaled groups,
\begin{equation}
S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
\end{equation}
using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
variable $I$ results in,
\begin{equation}\label{eq:reduced_sir_ODE}
\frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
\end{equation}
which is a further reduced version of~\Cref{eq:sir}. This less complex
differential equation results in a less complex solution, as it entails the
elimination of a parameter ($\beta$) and two state variables ($S$ and $R$).
The reduced SIR model requires fewer input variables, making it advantageous
in situations with limited data, such as when recovery data is missing.
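The following Python sketch (our own addition; the course of $\Rt$ is a
hypothetical decaying curve, not taken from~\cite{Millevoi2023}) integrates
the reduced model of~\Cref{eq:reduced_sir_ODE} on the scaled time axis:
\begin{verbatim}
import numpy as np
from scipy.integrate import solve_ivp

t0, tf = 0.0, 150.0       # time interval in days
C = 1.0e6                 # large constant scaling factor
alpha, I0 = 0.35, 10.0    # recovery rate, initial infections

def R_t(t_s):
    # hypothetical reproduction number, decaying from 2.0 to 0.7
    return 0.7 + 1.3 * np.exp(-5.0 * t_s)

def reduced_sir(t_s, y):
    I_s = y[0]
    return [alpha * (tf - t0) * (R_t(t_s) - 1.0) * I_s]

sol = solve_ivp(reduced_sir, (0.0, 1.0), y0=[I0 / C],
                t_eval=np.linspace(0.0, 1.0, 101))
I = sol.y[0] * C          # undo the scaling to obtain I(t)
\end{verbatim}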
% -------------------------------------------------------------------
\section{Multilayer Perceptron}
\label{sec:mlp}
In~\Cref{sec:differentialEq}, we discuss the modeling of systems using
differential equations, illustrating how they can be utilized to elucidate the
impact of a specific parameter on a system's behavior.
In~\Cref{sec:epidemModel}, we show specific applications of differential
equations in an epidemiological context. Solving such systems is crucial and
involves finding a function that best fits the data. Approximating such a
function by fitting it to measured data points is one of several methods to
achieve this goal. The \emph{Multilayer Perceptron} (MLP)~\cite{Rumelhart1986}
is a data-driven function approximator. In the following section, we provide a
brief overview of the structure and training of these \emph{neural networks}.
For reference, we use the book \emph{Deep Learning} by Goodfellow
\etal~\cite{Goodfellow-et-al-2016} as a foundation for our explanations.\\
The objective is to develop an approximation method for any function $f^{*}$,
which could be a mathematical function or a mapping of an input vector to a
desired output. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
is the function to approximate. In 1958,
Rosenblatt~\cite{Rosenblatt1958} proposed the \emph{perceptron}, modeling the
concept of a neuron in a neuroscientific sense. The perceptron takes in the
input vector $\boldsymbol{x}$, performs an operation and produces a scalar
result. This model optimizes its parameters $\theta$ to be able to calculate
$\boldsymbol{y} = f(\boldsymbol{x}; \theta)$ as accurately as possible. As
Minsky and Papert~\cite{Minsky1972} demonstrate, the perceptron is only
capable of approximating a specific class of functions. Consequently, an
expansion of the perceptron is necessary.\\
As Goodfellow \etal~\cite{Goodfellow-et-al-2016} describe, the solution to
this issue is to decompose $f$ into a chain structure of the form,
\begin{equation} \label{eq:mlp_char}
f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
\end{equation}
This nested version of a perceptron is a multilayer perceptron. Each
sub-function, designated as $f^{(i)}$, is represented in the structure of an
MLP as a \emph{layer}, which contains a linear mapping and a nonlinear mapping
in the form of an \emph{activation function}. Each layer is composed of
multiple \emph{units} (also called \emph{neurons}). A neuron performs the
same vector-to-scalar calculation as the perceptron does. Subsequently, a
nonlinear activation function transforms the scalar output into the activation
of the unit. The layers are staggered in the neural network, with each layer
being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
input vector $\boldsymbol{x}$ is provided to each unit of the first layer
$f^{(1)}$, which then passes its results to the units of the second layer
$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
intervening layers, situated between the first and the output layer, are the
\emph{hidden layers}. The term \emph{forward propagation} describes the
process of information flowing through the network from the input layer to the
output layer, resulting in the network output. The alternating structure of
linear and nonlinear calculations enables MLPs to approximate a wide range of
functions. As Hornik \etal~\cite{Hornik1989} prove, MLPs are universal
approximators.\\
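To make this structure tangible, the following sketch (our own illustration;
the layer sizes and the choice of the tanh activation are arbitrary)
implements an MLP with two hidden layers, as in~\Cref{fig:mlp_example} and
following~\Cref{eq:mlp_char}, in PyTorch:
\begin{verbatim}
import torch
import torch.nn as nn

# f(x) = f3(f2(f1(x))): linear maps followed by activations.
model = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),   # f1: input layer -> hidden layer
    nn.Linear(32, 32), nn.Tanh(),  # f2: hidden layer -> hidden layer
    nn.Linear(32, 1),              # f3: hidden layer -> output layer
)

x = torch.rand(16, 1)   # batch of 16 scalar inputs
y_hat = model(x)        # forward propagation yields the predictions
\end{verbatim}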
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{MLP.pdf}
\caption{An illustration of an MLP~\cite{Rumelhart1986} with two hidden layers.
Each neuron of a layer is connected to every neuron of the neighboring
layers. The arrow indicates the direction of the forward propagation.}
\label{fig:mlp_example}
\end{figure}
The term \emph{training} describes the process of optimizing the parameters
$\theta$. In order to undertake training, it is necessary to have a set of
\emph{training data}, which is a set of pairs (also called training points) of
the input data $\boldsymbol{x}$ and the corresponding true solution
$\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
define a \emph{loss function} $\Loss{ }$, using the model prediction
$\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which acts as a
metric for evaluating the extent to which the model deviates from the correct
answer. One common loss function is the \emph{mean square error}
(MSE) loss function. Let $N$ be the number of points in the set of training
data. Then,
\begin{equation} \label{eq:mse}
\Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
calculates the squared difference between the model prediction and the true
value of each training point and takes the mean across the whole training
data set.\\
Ultimately, the objective is to utilize this information to optimize the
parameters such that the loss is minimized. One of the most fundamental and
seminal optimization strategies is \emph{gradient descent}. In this process,
the derivatives are employed to identify the location of local or global
minima of a function, which lie where the gradient is zero. Given that a
positive gradient signifies ascent and a negative gradient signifies descent,
we must move the parameters by a step, scaled by the \emph{learning rate}
(step size), in the direction opposite to the gradient. The calculation of the
derivatives with respect to the parameters is a complex task, since our
function is a composition of many functions (one for each layer). We can
address this issue by taking advantage of~\Cref{eq:mlp_char} and employing the
chain rule of calculus. Let
$\hat{\boldsymbol{y}} = f(\boldsymbol{x}; \theta)$ be the model prediction with the
decomposed version $f(\boldsymbol{x}; \theta) = f^{(3)}(w; \theta_3)$ with
$w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$, where
$\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2, \theta_3\subset\theta$.
Then,
\begin{equation}\label{eq:backprop}
\nabla_{\theta_3} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(3)}}\nabla_{\theta_3}f^{(3)},
\end{equation}
is the gradient of $\Loss{ }$ with respect to the parameters $\theta_3$. To
obtain $\nabla_{\theta_2} \Loss{ }$, we extend this chain of derivatives
through $f^{(3)}$ down to the layer $f^{(2)}$ and its parameters $\theta_2$.
In the context of neural networks, this method is called
\emph{backpropagation}~\cite{Rumelhart1986}, as it propagates the error
backwards through the neural network.\\
In practical applications, an optimizer often accomplishes the optimization
task by executing backpropagation in the background. Furthermore, modifying
the learning rate during training can be advantageous, for instance, making
larger steps at the beginning and minor adjustments at the end. Therefore,
\emph{schedulers} implement algorithms that employ diverse strategies for
altering the learning rate.\\
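In PyTorch, for example, these components could be combined as in the
following sketch (continuing the MLP example above; the optimizer, learning
rate, schedule, and the toy targets are arbitrary choices):
\begin{verbatim}
import torch

# Assumes `model` and `x` from the previous sketch.
y = torch.sin(3.0 * x)            # toy targets for demonstration
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=1000, gamma=0.5)
loss_fn = torch.nn.MSELoss()      # mean square error loss

for epoch in range(5000):
    optimizer.zero_grad()         # reset the gradients
    loss = loss_fn(model(x), y)   # forward propagation and data loss
    loss.backward()               # backpropagation of the error
    optimizer.step()              # gradient descent update
    scheduler.step()              # adjust the learning rate
\end{verbatim}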
For a more in-depth discussion of practical considerations and additional
details like regularization, we direct the reader to the book
\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next
section demonstrates the application of neural networks in approximating
solutions to systems of differential equations.
% -------------------------------------------------------------------
\section{Physics-Informed Neural Networks}
\label{sec:pinn}
In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
widely recognized tools for approximating any kind of function. In 1998,
Lagaris \etal~\cite{Lagaris1998} provided a method that utilizes gradient
descent to solve ODEs and PDEs. Building on this approach, Raissi
\etal~\cite{Raissi2019} introduced the methodology under the name
\emph{Physics-Informed Neural Network} (PINN) in 2017. In this approach, the
model learns to approximate a function using provided data points while
leveraging the available knowledge about the problem in the form of a system
of differential equations.\\
In contrast to standard MLP models, PINNs are not solely data-driven. The
differential equation,
\begin{equation}
\boldsymbol{y}=\mathcal{D}(\boldsymbol{x}),
\end{equation}
includes both the solution $\boldsymbol{y}$ and the operator $\mathcal{D}$,
which incorporates all derivatives with respect to the input $\boldsymbol{x}$.
This equation contains the information about the physical properties and
dynamics of $\boldsymbol{y}$. In order to find the solution $\boldsymbol{y}$,
we must solve the differential equation with respect to data related to the
problem at hand. As Raissi \etal~\cite{Raissi2019} propose, we employ a neural
network with the parameters $\theta$. The MLP is then supposed to optimize its
parameters such that its output $\hat{\boldsymbol{y}}$ approximates the
solution $\boldsymbol{y}$. In order to achieve this, we train the model on
data containing input-output pairs with measurements of $\boldsymbol{y}$. The
output $\hat{\boldsymbol{y}}$ is fitted to the data through the mean square
error data loss $\mathcal{L}_{\text{data}}$, which may include additional
terms for initial and boundary conditions. Furthermore, the physics is
incorporated through an additional loss term, the physics loss
$\mathcal{L}_{\text{physics}}$, which includes the differential equation
through its residual $r=\hat{\boldsymbol{y}} - \mathcal{D}(\boldsymbol{x})$.
This leads to the PINN loss function,
\begin{equation}\label{eq:PINN_loss}
\begin{split}
\mathcal{L}_{\text{PINN}}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) &= \mathcal{L}_{\text{data}}(\boldsymbol{y},\hat{\boldsymbol{y}}) + \mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})\\
&= \frac{1}{N_t}\sum_{i=1}^{N_t} || \hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2 + \frac{1}{N_d}\sum_{i=1}^{N_d} || r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
\end{split}
\end{equation}
with $N_d$ the number of differential equations in the system and $N_t$ the
number of training samples. Utilizing~\Cref{eq:PINN_loss}, the PINN
simultaneously optimizes its parameters $\theta$ to minimize both the data
loss and the physics loss, which makes this a multi-objective optimization
problem.\\
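Expressed in code, the composite loss could be assembled as in the following
sketch (our own illustration; \texttt{residuals} stands for the evaluated
residuals $r_i$, whose computation we discuss next):
\begin{verbatim}
import torch

def pinn_loss(y_hat, y, residuals):
    # data loss plus physics loss
    data_loss = ((y_hat - y) ** 2).mean()
    physics_loss = sum((r ** 2).mean() for r in residuals)
    return data_loss + physics_loss / len(residuals)
\end{verbatim}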
Given the nature of differential equations, calculating the loss term
$\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires
the calculation of the derivative of the output with respect to the input of
the neural network. As we outline in~\Cref{sec:mlp}, during the process of
backpropagation we calculate the gradients of the loss term with respect to a
layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
the index of the respective layer. By employing
the chain rule of calculus, the algorithm progresses from the output layer
through each hidden layer, ultimately reaching the first layer in order to
compute the respective gradients. The term,
\begin{equation}
\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
\end{equation}
illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
this procedure, called \emph{automatic differentiation}, goes one step further
and calculates the gradient of the output with respect to the input
$\boldsymbol{x}$. In order to calculate the second derivative,
$\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
this procedure must be repeated.\\
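In PyTorch, these derivatives of the network output with respect to its input
are available through the autograd module; the following sketch (our own
illustration, reusing the \texttt{model} from~\Cref{sec:mlp}) computes the
first and second derivative:
\begin{verbatim}
import torch

# The input must track gradients so that we can differentiate
# the output with respect to it.
t = torch.linspace(0.0, 1.0, 100, requires_grad=True).reshape(-1, 1)
u = model(t)   # network output

# First derivative du/dt via automatic differentiation.
du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]
# Repeating the procedure yields the second derivative.
d2u_dt2 = torch.autograd.grad(du_dt, t,
                              grad_outputs=torch.ones_like(du_dt),
                              create_graph=True)[0]
\end{verbatim}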
Above, we present the method of Raissi \etal~\cite{Raissi2019} for
approximating functions using systems of differential equations. In problems
where we must solve an ODE or PDE, we often have to find a set of parameters
of the system that fits the observations. In the context of PINNs, this is an
inverse problem: we have training data from measurements and the corresponding
differential equations, but the parameters of these equations are unknown. To
address this challenge, we implement these parameters as distinct learnable
parameters within the neural network. This enables the network to utilize a
specific value that actively influences the physics loss
$\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$. During the
training phase, the optimizer aims to minimize the physics loss, which should
ultimately yield an approximation of the true parameter value fitting the
observations.\\
\begin{figure}[t]
\centering
\includegraphics[width=\textwidth]{oscilator.pdf}
\caption{Illustration of the movement of an oscillating body in the
underdamped case, with $m=1\,kg$, $\mu=4\,\frac{Ns}{m}$ and $k=200\,\frac{N}{m}$.}
\label{fig:spring}
\end{figure}
In order to illustrate the workings of a PINN, we use the example of a
\emph{damped harmonic oscillator} taken from~\cite{Moseley}. In this problem,
we displace a body, which is attached to a spring, from its resting position.
The body is subject to three forces: firstly, the inertial force resulting
from the displacement $u$, which points in the direction of the displacement;
secondly, the restoring force of the spring, which attempts to return the body
to its original position; and thirdly, the friction force, which points in the
opposite direction of the movement. In accordance with Newton's second law and
the combined influence of these forces, the body exhibits oscillatory motion
around its position of rest. The system is influenced by $m$, the mass of the
body, $\mu$, the coefficient of friction, and $k$, the spring constant,
indicating the stiffness of the spring. The differential equation,
\begin{equation}
m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
\end{equation}
shows the relation of these parameters in reference to the problem at hand. As
Tenenbaum and Pollard~\cite{Tenenbaum1985} show, there are three potential
cases for the solution of this equation. However, only the \emph{underdamped
case} results in an oscillating movement of the body, as illustrated
in~\Cref{fig:spring}. In order to apply a PINN to this problem, we require a
set of training data. This consists of pairs of time points and corresponding
displacement measurements $(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$.
In this hypothetical case, we know the mass $m=1\,kg$, the spring constant
$k=200\,\frac{N}{m}$, the initial displacement $u^{(1)} = 1$ and the initial
velocity $\frac{du(0)}{dt} = 0$. However, we do not know the value of the
friction coefficient $\mu$. In this case, the loss function,
\begin{equation}
\begin{split}
\mathcal{L}_{\text{osc}}(\boldsymbol{t}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = & \left(\hat{u}^{(1)}-1\right)^2+\left(\frac{d\hat{u}(0)}{dt}\right)^2+ \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{u}}^{(i)}-\boldsymbol{u}^{(i)}||^2 \\
+ & \left\|m\frac{d^2\hat{u}}{dt^2}+\hat{\mu}\frac{d\hat{u}}{dt}+k\hat{u}\right\|^2,
\end{split}
\end{equation}
includes the initial conditions, the data loss, and the residual, in which
$\hat{\mu}$ is a learnable parameter. By minimizing $\mathcal{L}_{\text{osc}}$
and solving the inverse problem, the PINN is able to find the missing
parameter $\mu$. This shows the methodology by which PINNs are capable of
learning the parameters of physical systems, such as the damped harmonic
oscillator. In the following section, we present the approach of Shaier
\etal~\cite{Shaier2021} to find the transmission rate and recovery rate of the
SIR model using PINNs.
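As a sketch of how the residual term and the learnable parameter $\hat{\mu}$
could be realized in PyTorch (our own illustrative code, not taken
from~\cite{Moseley}):
\begin{verbatim}
import torch

m, k = 1.0, 200.0                               # known constants
mu_hat = torch.nn.Parameter(torch.tensor(1.0))  # learnable friction

def oscillator_residual(model, t):
    # t: 1-D tensor of time points without gradient tracking
    t = t.reshape(-1, 1).requires_grad_(True)
    u = model(t)
    grad = lambda out: torch.autograd.grad(
        out, t, grad_outputs=torch.ones_like(out), create_graph=True)[0]
    du = grad(u)
    d2u = grad(du)
    return m * d2u + mu_hat * du + k * u        # should approach zero

# The friction coefficient is optimized jointly with the weights:
optimizer = torch.optim.Adam(list(model.parameters()) + [mu_hat],
                             lr=1e-3)
\end{verbatim}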
% -------------------------------------------------------------------
\subsection{Disease-Informed Neural Networks}
\label{sec:pinn:dinn}
In the preceding section, we present the methodology of Raissi
\etal~\cite{Raissi2019}, building on the work of Lagaris
\etal~\cite{Lagaris1998}, for solving systems of differential equations by
employing PINNs. In~\Cref{sec:pandemicModel:sir}, we describe the SIR model,
which models the relations of susceptible, infectious and removed individuals
and simulates the progress of a disease in a population of constant size. A
system of differential equations models these relations. Shaier
\etal~\cite{Shaier2021} propose a method to solve the equations of the SIR
model using a PINN, which they call a \emph{Disease-Informed Neural Network}
(DINN).\\
To solve~\Cref{eq:sir}, we need to find the transmission rate $\beta$ and the
recovery rate $\alpha$. As Shaier \etal~\cite{Shaier2021} point out, there are
different approaches to solving this set of equations. For instance, building
on the assumption that at the beginning one infected individual infects $n$
other people, such that $\frac{dS(0)}{dt} = -n$,
\begin{equation}
\beta=-\frac{\frac{dS}{dt}}{S_0I_0}
\end{equation}
would calculate the initial transmission rate using the initial size of the
susceptible group $S_0$ and of the infectious group $I_0$. The recovery rate
could then be defined using the number of days $d$ between the point of
infection and the start of isolation, $\alpha = \frac{1}{d}$. Such analytical
solutions of SIR models often rely on heuristic methods and require knowledge
like the sizes $S_0$ and $I_0$. A data-driven approach such as the one that
Shaier \etal~\cite{Shaier2021} propose does not suffer from these problems,
since the model learns the parameters $\beta$ and $\alpha$ while fitting the
training data, which consists of the time points $\boldsymbol{t}$ and the
corresponding measured sizes of the groups
$\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}$.
Let $\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}$ be the
model predictions of the groups and
$r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}$,
$r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\alpha \hat{\boldsymbol{I}}$
and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \alpha \hat{\boldsymbol{I}}$ the
residuals of each differential equation using the model predictions. Then,
\begin{equation}
\begin{split}
\mathcal{L}_{SIR}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = &||r_S||^2 + ||r_I||^2 + ||r_R||^2\\
+ &\frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}||^2 + ||\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}||^2 + ||\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}||^2,
\end{split}
\end{equation}
is the loss function of a DINN, with $\alpha$ and $\beta$ being learnable
parameters, which appear in the residuals of the ODEs.
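A compact PyTorch sketch of this loss (our own illustration, following the
approach of Shaier \etal~\cite{Shaier2021} in spirit; the network maps a time
point to the three group sizes, and keeping the learnable rates positive via
an exponential parametrization is one possible design choice):
\begin{verbatim}
import torch

# Learnable epidemiological parameters; exp(.) keeps them positive.
log_alpha = torch.nn.Parameter(torch.zeros(()))
log_beta = torch.nn.Parameter(torch.zeros(()))

def dinn_loss(model, t, S, I, R):
    # model maps a time point to (S_hat, I_hat, R_hat)
    t = t.reshape(-1, 1).requires_grad_(True)
    S_hat, I_hat, R_hat = model(t).split(1, dim=1)
    grad = lambda out: torch.autograd.grad(
        out, t, grad_outputs=torch.ones_like(out), create_graph=True)[0]
    alpha, beta = log_alpha.exp(), log_beta.exp()
    r_S = grad(S_hat) + beta * S_hat * I_hat      # SIR residuals
    r_I = grad(I_hat) - beta * S_hat * I_hat + alpha * I_hat
    r_R = grad(R_hat) - alpha * I_hat
    physics = (r_S**2).mean() + (r_I**2).mean() + (r_R**2).mean()
    data = ((S_hat.squeeze() - S)**2 + (I_hat.squeeze() - I)**2
            + (R_hat.squeeze() - R)**2).mean()
    return physics + data
\end{verbatim}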
% -------------------------------------------------------------------
|