- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- % Author: Phillip Rothenbeck
- % Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
- % File: chap02/chap02.tex
- % Part: theoretical background
- % Description:
- % summary of the content in this chapter
- % Version: 05.08.2024
- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- \chapter{Theoretical Background}
- \label{chap:background}
- This chapter introduces the theoretical knowledge that forms the foundation of
- the work presented in this thesis. In~\Cref{sec:domain}
- and~\Cref{sec:differentialEq}, we discuss differential equations and their
- underlying theory. In these sections, both the explanations and the approach are
- strongly based on the book on analysis by Rudin~\cite{Rudin2007} and the book
- on ordinary differential equations by Tenenbaum and
- Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this knowledge to examine
- various pandemic models in~\Cref{sec:epidemModel}.
- Finally, we address the topic of neural networks with a focus on the multilayer
- perceptron in~\Cref{sec:mlp} and physics-informed neural networks
- in~\Cref{sec:pinn}.
- % -------------------------------------------------------------------
- \section{Mathematical Modelling using Functions}
- \label{sec:domain}
- To model a physical problem using mathematical tools, it is necessary to define
- a set of fundamental numbers or quantities upon which the subsequent
- calculations will be based. These sets may represent, for instance, a specific
- time interval or a distance. The term \emph{domain} describes these fundamental
- sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
- entity living in a certain domain. In this thesis, we will focus on domains of
- real numbers in $\mathbb{R}$.\\
- The mapping between variables enables the modeling of the process and depicts
- the semantics. We use functions in order to facilitate this mapping. Let
- $A, B\subset\mathbb{R}$ be two subsets of the real numbers. Then we define a
- function as the mapping
- \begin{equation}
- f: A\rightarrow B.
- \end{equation}
- In other words, the function $f$ maps elements $x\in A$ to values
- $f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
- of $f$. Functions are capable of representing the state of a system as a value
- based on an input value from their domain. One illustrative example is a
- function that maps a time point to the distance covered since a starting point.
- In this case, time serves as the domain, while the distance is the codomain.
- % -------------------------------------------------------------------
- \section{Basics of Differential Equations}
- \label{sec:differentialEq}
- Often, the change of a system is more interesting than its current state.
- Functions are able to give us the latter, but only passively give information
- about the change of a system. The objective is to determine an effective method
- for calculating the change of a function across its domain. Let $f$ be a
- function and $[a, b]\subset \mathbb{R}$ an interval of real numbers. Then the
- expression
- \begin{equation}
- m = \frac{f(b) - f(a)}{b-a}
- \end{equation}
- gives the average rate of change. While the average rate of change is useful in
- many cases, the momentary rate of change is often more accurate. To calculate
- this, we need to narrow the interval down to an infinitesimal one. For each
- $x\in[a, b]$ we calculate
- \begin{equation} \label{eqn:differential}
- \frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
- \end{equation}
- if it exists. $\frac{df}{dx}$ is the \emph{derivative} of $f$; it returns the
- momentary rate of change of $f$ for each value $x$ of $f$'s domain. Repeating
- this process on $\frac{df}{dx}$ yields $\frac{d^2f}{dx^2}$, the function that
- calculates the rate of change of the rate of change, called the second-order
- derivative. Iterating this $n$ times results in $\frac{d^nf}{dx^n}$, the
- derivative of $n$-th order.
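- As a brief illustration of~\Cref{eqn:differential}, consider $f(x) = x^2$. For
- every $x$,
- \begin{equation}
- \frac{df}{dx} = \lim_{t\to x} \frac{t^2 - x^2}{t-x} = \lim_{t\to x}\,(t + x) = 2x.
- \end{equation}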
- An equation that relates a function to one or more of its derivatives is called
- a \emph{differential equation}~\cite{Tenenbaum1985}. Besides differentiating a
- known function, another method for obtaining a differential equation is to
- derive it from the semantics of a problem. This method is useful if no explicit
- function is known for a system. Differential equations find application in
- several areas such as engineering, physics, economics, epidemiology, and
- beyond.\\
- In the context of functions, it is possible to have multiple domains, meaning
- that a function has more than one parameter. To illustrate, consider a function
- operating in two-dimensional space, wherein each parameter represents one axis,
- or a function that takes time and location as inputs. The term
- \emph{partial differential equation} (\emph{PDE}) describes a differential
- equation of such a function, which requires a derivative for each of its
- domains. In contrast, \emph{ordinary differential equations} (\emph{ODEs}) are
- the derivatives of functions having only one domain. In this thesis, we
- only need ODEs.\\
- A \emph{system of differential equations} is the name for a set of differential
- equations. The derivatives in a system of differential equations each have their
- own codomain, determined by the problem, while they all share the same
- domain.\\
- Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
- including the \emph{Motion of a Particle Along a Straight Line}. Further,
- Newton's second law states that ``the rate of change of the momentum of a body
- ($momentum = mass \cdot velocity$) is proportional to the resultant external
- force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
- in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
- Then, Newton's second law translates mathematically to
- \begin{equation} \label{eq:newtonSecLaw}
- F = m\frac{dv}{dt}.
- \end{equation}
- It is evident that the acceleration, $a=\frac{dv}{dt}$, as the rate of change of
- the velocity, is part of the equation. Additionally, the velocity of a body is
- the derivative of the distance traveled by that body; let $s$ denote this
- distance in meters. Based on these findings, we can rewrite~\Cref{eq:newtonSecLaw} as
- \begin{equation}
- F=ma=m\frac{d^2s}{dt^2}.
- \end{equation}\\
- This explanation of differential equations focuses on the aspects deemed crucial
- for this thesis and is not intended to be a complete explanation of the subject.
- To gain a better understanding of it, we recommend the books mentioned
- above~\cite{Rudin2007,Tenenbaum1985}. In the following section we
- describe the application of these principles in epidemiological models.
- % -------------------------------------------------------------------
- \section{Epidemiological Models}
- \label{sec:epidemModel}
- Pandemics such as \emph{COVID-19} result in a significant
- number of fatalities. The question arises: how should we fight a pandemic
- correctly? Also, it is essential to study whether the employed countermeasures
- are efficacious in combating the pandemic. Given the unfavorable public response
- to measures such as lockdowns, it is imperative to investigate whether their
- efficacy remains commensurate with the costs incurred by those affected. In the
- event that alternative and novel technologies are in use, such as the mRNA
- vaccines in the context of COVID-19, it is necessary to test their effect and
- find the optimal variant. In order to shed light on the aforementioned events,
- we need to develop a method to quantify the pandemic along with its course of
- progression.\\
- The real world is a highly complex system, which presents a significant
- challenge when attempting to describe it fully in a model. Therefore, the model
- must reduce the complexity while retaining the essential information.
- Furthermore, it must address the issue of limited data availability. For
- instance, during COVID-19, institutions such as the Robert Koch Institute
- (RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
- able to collect data on infections and mortality cases. Consequently, we require
- a model that employs an abstraction of the real world to illustrate the events
- and relations that are pivotal to understanding the problem.
- % -------------------------------------------------------------------
- \subsection{SIR Model}
- \label{sec:pandemicModel:sir}
- In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR Model},
- which subsequently became one of the most influential epidemiological models.
- It enables the modeling of infection dynamics during epidemiological events such as pandemics.
- The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
- reiterates the model and serves as the foundation for the following explanation
- of SIR models.\\
- The SIR model is capable of describing diseases that are transmitted through
- contact or proximity between individuals carrying the illness and healthy
- individuals. This is possible due to the distinction between infected
- individuals who are carriers of the disease and the part of the population that
- is susceptible to infection. In the model, individuals may change groups, \eg,
- healthy individuals becoming infected. The model assumes that the
- size $N$ of the population remains constant throughout the duration of the
- pandemic. The population $N$ comprises three distinct groups: the
- \emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
- \emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
- \mathbb{R}_{\geq0}$ be the time span of the pandemic. Then,
- \begin{equation}
- S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
- \end{equation}
- give the values of $S$, $I$ and $R$ at a certain point of time
- $t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, it holds that
- \begin{equation} \label{eq:N_char}
- N = S + I + R.
- \end{equation}
- The model makes another assumption by stating that recovered people are immune
- to the illness and infectious individuals cannot infect them. The individuals in
- the $R$ group are either recovered or deceased, and thus unable to transmit or
- carry the disease.
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{sir_graph.pdf}
- \caption{A visualization of the SIR model, illustrating $N$ being split into
- the three groups $S$, $I$ and $R$.}
- \label{fig:sir_model}
- \end{figure}
- As visualized in~\Cref{fig:sir_model}, individuals
- may transition between groups based on transition rates. The
- transmission rate $\beta$ is responsible for individuals becoming infected,
- while the removal or recovery rate $\alpha$ (also referred to as
- $\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
- individuals from $I$ to $R$.\\
- We can describe this problem mathematically using a system of differential
- equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
- McKendrick~\cite{1927} propose the following set of differential equations:
- \begin{equation}\label{eq:sir}
- \begin{split}
- \frac{dS}{dt} &= -\beta S I,\\
- \frac{dI}{dt} &= \beta S I - \alpha I,\\
- \frac{dR}{dt} &= \alpha I.
- \end{split}
- \end{equation}
- This, according to Edelstein-Keshet, is based on the following assumption:
- ``The rate of transmission of a microparasitic disease is proportional to the
- rate of encounter of susceptible and infective individuals modelled by the
- product ($\beta S I$)''~\cite{EdelsteinKeshet2005}. The system shows the change
- in size of the groups per time unit due to infections, recoveries, and deaths.\\
- The term $\beta SI$ describes the rate of encounters of susceptible and infected
- individuals. This term is dependent on the size of $S$ and $I$, thus Anderson
- and May~\cite{Anderson1991} propose a modified model:
- \begin{equation}\label{eq:modSIR}
- \begin{split}
- \frac{dS}{dt} &= -\beta \frac{SI}{N},\\
- \frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
- \frac{dR}{dt} &= \alpha I.
- \end{split}
- \end{equation}
- Here, $\beta SI$ is normalized by the population size $N$, which is a more
- accurate representation of real-world contact processes~\cite{Anderson1991}.\\
- The initial phase of a pandemic is characterized by the infection of a small
- number of individuals, while the majority of the population remains susceptible.
- At this early stage, no individuals have yet recovered or died from the
- disease. Let $I_0\in\mathbb{N}$ be
- the number of infected individuals at the beginning of the disease. Then,
- \begin{equation}
- \begin{split}
- S(0) &= N - I_{0},\\
- I(0) &= I_{0},\\
- R(0) &= 0,
- \end{split}
- \end{equation}
- describes the initial configuration of a system in which a disease has just
- emerged.\\
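- To illustrate how~\Cref{eq:modSIR}, together with this initial configuration,
- can be integrated numerically, consider the following minimal Python sketch.
- It uses SciPy's \texttt{solve\_ivp}; the parameter values follow the reference
- configuration of~\Cref{fig:synth_sir}, while the 150-day time span is an
- assumption for illustration:
- \begin{verbatim}
- import numpy as np
- from scipy.integrate import solve_ivp
- 
- N = 7.9e6     # population size
- beta = 0.5    # transmission rate
- alpha = 0.35  # removal (recovery) rate
- I0 = 10       # initially infected individuals
- 
- def sir_rhs(t, y):
-     # Right-hand side of the normalized SIR model.
-     S, I, R = y
-     dS = -beta * S * I / N
-     dI = beta * S * I / N - alpha * I
-     dR = alpha * I
-     return [dS, dI, dR]
- 
- solution = solve_ivp(sir_rhs, (0, 150), [N - I0, I0, 0],
-                      t_eval=np.linspace(0, 150, 151))
- S, I, R = solution.y
- \end{verbatim}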
- \begin{figure}[h]
- %\centering
- \setlength{\unitlength}{1cm} % Set the unit length for coordinates
- \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
- % reference
- \put(0, 1.75){
- \begin{subfigure}{0.4\textwidth}
- \centering
- \includegraphics[width=\textwidth]{reference_params_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.5$}
- \label{fig:synth_norm}
- \end{subfigure}
- }
- % 1. row, 1.image (low beta)
- \put(5.5, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{low_beta_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.4$}
- \label{fig:synth_low_beta}
- \end{subfigure}
- }
- % 1. row, 2.image (high beta)
- \put(9.5, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{high_beta_synth.png}
- \caption{$\alpha=0.35$, $\beta=0.6$}
- \label{fig:synth_high_beta}
- \end{subfigure}
- }
- % 2. row, 1.image (low alpha)
- \put(5.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{low_alpha_synth.png}
- \caption{$\alpha=0.25$, $\beta=0.5$}
- \label{fig:synth_low_alpha}
- \end{subfigure}
- }
- % 2. row, 2.image (high alpha)
- \put(9.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{high_alpha_synth.png}
- \caption{$\alpha=0.45$, $\beta=0.5$}
- \label{fig:synth_high_alpha}
- \end{subfigure}
- }
- \end{picture}
- \caption{Synthetic data, using~\Cref{eq:modSIR} and $N=7.9\cdot 10^6$, $I_0=10$ with different sets of parameters.}
- \label{fig:synth_sir}
- \end{figure}
- In the SIR model the temporal occurrence and the height of the peak (or peaks)
- of the infectious group are of paramount importance for understanding the
- dynamics of a pandemic. A low peak occurring at a late point in time indicates
- that the disease is unable to keep pace with the rate of recovery, resulting
- in its demise before it can exert a significant influence on the population. In
- contrast, an early and high peak means that the disease is rapidly transmitted
- through the population, with a significant proportion of individuals having been
- infected.~\Cref{fig:synth_sir} illustrates the impact of modifying either
- $\beta$ or $\alpha$ while simulating a pandemic using a model such
- as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
- and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
- When the number of infections exceeds the number of recoveries, the peak of $I$
- will occur early and will be high. On the other hand, if recoveries occur at a
- faster rate than new infections the peak will occur later and will be low. This
- means, that it is crucial to know both $\beta$ and $\alpha$ to be able to
- simulate a pandemic using the SIR model.\\
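- Continuing the numerical sketch above, the time and height of the infection
- peak can be read off the solver output directly (assuming the arrays
- \texttt{I} and \texttt{solution.t} from that sketch):
- \begin{verbatim}
- t_peak = solution.t[np.argmax(I)]  # time of the peak of I
- I_peak = I.max()                   # height of the peak of I
- \end{verbatim}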
- The SIR model makes a number of assumptions that are intended to reduce the
- model's overall complexity at the cost of increasing its divergence from
- actual reality. One such assumption is that the size of the population, $N$,
- remains constant. This depiction is not an accurate representation of the actual
- relations observed in the real world, as the size of a population is subject to
- a number of factors that can contribute to change. The population is increased
- by the occurrence of births and decreased by the occurrence of deaths. There are
- different reasons for mortality, including the natural aging process or the
- development of other diseases. Other examples are the impossibility for
- individuals to become susceptible again after having recovered, or the
- possibility for the transition rates to change due to new variants or the
- implementation of new countermeasures. We address the latter option
- in~\Cref{sec:pandemicModel:rsir}.
- % -------------------------------------------------------------------
- \subsection{Reduced SIR Model and the Reproduction Number}
- \label{sec:pandemicModel:rsir}
- \Cref{sec:pandemicModel:sir} presents the classical SIR model. The model
- comprises two parameters, $\beta$ and $\alpha$, which describe the course of a
- pandemic over its duration. This is beneficial when examining the overall
- pandemic; however, in the real world, disease behavior is dynamic, and the
- values of the parameters $\beta$ and $\alpha$ change over time. The
- reason for this lies in events such as the implementation of countermeasures
- that reduce the contact between the infectious and susceptible individuals, the
- emergence of a new variant of the disease that increases its infectivity or
- deadliness, or the administration of a vaccination that provides previously
- susceptible individuals with immunity without them ever being infectious. To
- address this, Millevoi \etal~\cite{Millevoi2023} introduce a model that
- simultaneously reduces the size of the system of differential equations and
- solves the problem of time scaling at hand.\\
- First, they redefine $\beta$ and $\alpha$ as functions of time on the
- interval $\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
- \begin{equation}
- \beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
- \end{equation}
- Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the time
- span an infected individual requires to recover. Subsequently, at the initial time point
- $t_0$, the \emph{reproduction number},
- \begin{equation}
- \RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
- \end{equation}
- represents the number of susceptible individuals that one infectious individual
- infects at the onset of the pandemic. In light of the effects of $\beta$ and
- $\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
- pandemic is emerging: the rate of removal is comparatively low, and since
- $I(t_0) \ll S(t_0)$, the disease spreads rapidly across the population, with an
- increase in $I$ occurring at a high rate. When $\RO < 1$, each infectious
- individual infects fewer than one other individual on average, and the disease
- dies out. Nevertheless, $\RO$ does not cover the entire time
- span. For this reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$,
- which has the same interpretation as $\RO$, with the exception that $\Rt$ is
- dependent on time. The definition of the time-dependent reproduction number on
- the time interval $\mathcal{T}$ with the population size $N$,
- \begin{equation}\label{eq:repr_num}
- \Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N}
- \end{equation}
- combines the rates of change, which carry information about the spread of the
- disease, with the decreasing ratio of susceptible individuals in the
- population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
- a state variable of the model, enabling the following reduction of the SIR
- model.\\
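- For constant rates, the time-dependent reproduction number
- of~\Cref{eq:repr_num} can be evaluated directly on the solver output of the
- earlier sketch (assuming its array \texttt{S} and constants \texttt{beta},
- \texttt{alpha}, \texttt{N}):
- \begin{verbatim}
- # R(t) = beta/alpha * S(t)/N on the numerical solution
- R_t = (beta / alpha) * S / N
- \end{verbatim}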
- \Cref{eq:N_char} allows for the calculation of the value of the group $R$ from
- $S$ and $I$, via $R(t)=N-S(t)-I(t)$. Moreover, by~\Cref{eq:repr_num},
- $\beta(t)S(t)I(t)/N = \alpha(t)\Rt I(t)$. Thus,
- \begin{equation}
- \begin{split}
- \frac{dS}{dt} &= -\alpha\Rt I(t),\\
- \frac{dI}{dt} &= \alpha(\Rt-1)I(t),
- \end{split}
- \end{equation}
- is the reduction of~\Cref{eq:modSIR} on the time interval $\mathcal{T}$ using
- this characteristic and the reproduction number \Rt{} of~\Cref{eq:repr_num}.
- Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
- extensive range of values that the SIR groups can assume, spanning from $0$ to
- $10^7$. Accordingly, they first scale the time interval $\mathcal{T}$ using
- its endpoints to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
- [0, 1]$. Subsequently, they calculate the scaled groups,
- \begin{equation}
- S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
- \end{equation}
- using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
- variable $I$ results in
- \begin{equation}
- \frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
- \end{equation}
- a further reduced version of~\Cref{eq:modSIR}, since by the chain rule
- $\frac{dI_s}{dt_s} = \frac{t_f-t_0}{C}\frac{dI}{dt}$. This results in a more
- streamlined and efficient process, as it entails the elimination of one
- parameter ($\beta$) and two state variables ($S$ and $R$), while adding the
- state variable $\Rt$. This is a crucial aspect for the automated solution of
- such differential equation systems, as we describe in~\Cref{sec:mlp}.
- % -------------------------------------------------------------------
- \section{Multilayer Perceptron}
- \label{sec:mlp}
- In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
- equations in systems, illustrating how they can be utilized to elucidate the
- impact of a specific parameter on the system's behavior.
- In~\Cref{sec:epidemModel}, we show specific applications of differential
- equations in an epidemiological context. The final objective is to solve these
- equations, and there are multiple methods to achieve this goal. One
- such method is the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
- following section, we provide a brief overview of the structure and training of
- these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
- by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
- explanations.\\
- The objective is to develop an approximation method for any function $f^{*}$,
- which could be a mathematical function or a mapping of an input vector to a
- class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
- the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$,
- is the function to approximate. In 1958,
- Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, which models the
- concept of a neuron in a neuroscientific sense. The perceptron takes in the
- input vector $\boldsymbol{x}$, performs an operation, and produces a scalar
- result. This model optimizes its parameters $\theta$ to be able to calculate
- $\boldsymbol{y} = f(\boldsymbol{x}; \theta)$ as accurately as possible. As
- Minsky and Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable
- of approximating a specific class of functions. Consequently, there is a
- necessity for an expansion of the perceptron.\\
- As Goodfellow \etal describe, the solution to this issue is to decompose $f$ into
- a chain structure of the form,
- \begin{equation} \label{eq:mlp_char}
- f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
- \end{equation}
- This converts a perceptron, which has only two layers (an input and an output
- layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
- is represented in the structure of an MLP as a \emph{layer}. Each layer is
- composed of multiple \emph{units} (also called \emph{neurons}). A neuron performs
- the same vector-to-scalar calculation as the perceptron does. Subsequently, a
- nonlinear activation function transforms the scalar output into the activation
- of the unit. The layers are staggered in the neural network, with each layer
- being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
- input vector $\boldsymbol{x}$ is provided to each unit of the first layer
- $f^{(1)}$, which then gives its results to the units of the second layer
- $f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
- intervening layers, situated between the first and the output layers, are the
- \emph{hidden layers}. The alternating structure of linear and nonlinear
- calculation enables MLPs to approximate any function. As Hornik
- \etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{MLP.pdf}
- \caption{A visualization of an exemplary multilayer perceptron, illustrating
- the input layer, the hidden layers, and the output layer.}
- \label{fig:mlp_example}
- \end{figure}
- The term \emph{training} describes the process of optimizing the parameters
- $\theta$. In order to undertake training, it is necessary to have a set of
- \emph{training data}, which is a set of pairs (also called training points) of
- the input data $\boldsymbol{x}$ and its corresponding true solution
- $\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
- define the \emph{loss function} $\Loss{ }$, using the model prediction
- $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
- metric for evaluating the extent to which the model deviates from the correct
- answer. One of the most common loss functions is the \emph{mean square error}
- (MSE) loss function. Let $N$ be the number of points in the set of training
- data. Then,
- \begin{equation} \label{eq:mse}
- \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
- \end{equation}
- calculates the squared difference between each model prediction and the true
- value of a training point and takes the mean across the whole training data.\\
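- To make this concrete, the following minimal PyTorch sketch (the layer sizes,
- activation function, and toy target are illustrative assumptions, not the
- architecture used in this thesis) builds an MLP as a chain of linear layers
- and nonlinear activations and evaluates the MSE loss of~\Cref{eq:mse}:
- \begin{verbatim}
- import torch
- import torch.nn as nn
- 
- # A small MLP: 1 input unit, two hidden layers, 1 output unit.
- model = nn.Sequential(
-     nn.Linear(1, 32), nn.Tanh(),
-     nn.Linear(32, 32), nn.Tanh(),
-     nn.Linear(32, 1),
- )
- 
- x = torch.rand(100, 1)                   # input vectors
- y = torch.sin(2 * torch.pi * x)          # toy target f*(x)
- y_hat = model(x)                         # forward propagation
- loss = nn.functional.mse_loss(y_hat, y)  # MSE loss
- \end{verbatim}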
- In the context of neural networks, \emph{forward propagation} describes the
- process of information flowing through the network from the input layer to the
- output layer, resulting in a scalar loss. Ultimately, the objective is to
- utilize this information to optimize the parameters, in order to minimize the
- loss. One of the most fundamental optimization strategies is \emph{gradient
- descent}. In this process, the derivatives are employed to identify the location
- of local or global minima within a function. Given that a positive gradient
- signifies ascent and a negative gradient indicates descent, we must move the
- variable by a constant \emph{learning rate} (step size) in the opposite
- direction to that of the gradient. The calculation of the derivatives with
- respect to the parameters is a complex task, since our function is a composition
- of many functions (one for each layer). We can address this issue by taking
- advantage of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
- $\hat{\boldsymbol{y}} = f^{(3)}(w; \theta_3)$ be the model prediction with
- $w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$, where
- $\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2, \theta_3\subset\theta$.
- Then,
- \begin{equation}\label{eq:backprop}
- \nabla_{\theta_1} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\theta_1}f^{(1)},
- \end{equation}
- is the gradient of $\Loss{ }$ with respect to the parameters $\theta_1$. The
- name of this method in the context of neural networks is
- \emph{backpropagation}~\cite{Goodfellow-et-al-2016}.\\
- In practical applications, an optimizer often accomplishes the optimization task
- by executing gradient descent in the background. Furthermore, modifying the
- learning rate during training can be advantageous, for instance, making larger
- steps at the beginning and minor adjustments at the end. Therefore,
- \emph{schedulers} implement algorithms that adjust the learning rate during
- training according to diverse strategies.\\
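- As a brief sketch of how these components interact in practice (continuing the
- hypothetical PyTorch example above; the optimizer choice, epoch count, and
- scheduler parameters are illustrative assumptions):
- \begin{verbatim}
- optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
- scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
-                                             step_size=500, gamma=0.5)
- 
- for epoch in range(2000):
-     optimizer.zero_grad()                       # reset accumulated gradients
-     loss = nn.functional.mse_loss(model(x), y)  # forward propagation
-     loss.backward()                             # backpropagation
-     optimizer.step()                            # gradient descent step
-     scheduler.step()                            # adjust the learning rate
- \end{verbatim}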
- This section provides an overview of basic concepts of neural networks. For a
- deeper understanding, we direct the reader to the book \emph{Deep Learning} by
- Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section demonstrates
- the application of neural networks in approximating solutions to systems of
- differential equations.
- % -------------------------------------------------------------------
- \section{Physics-Informed Neural Networks}
- \label{sec:pinn}
- In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
- widely recognized tools for approximating any kind of function. In this
- section, we apply this capability to create a solver for ODEs and PDEs,
- as Lagaris \etal~\cite{Lagaris1997} describe in their paper. In this approach,
- the model learns to approximate a function using provided data points while
- leveraging the available knowledge about the problem in the form of a system of
- differential equations. The \emph{physics-informed neural network} (PINN)
- learns the system of differential equations during training, as it optimizes
- its output to align with the equations.\\
- In contrast to standard MLPs, the loss term of a PINN comprises two
- components. The first term incorporates the aforementioned prior knowledge
- pertinent to the problem. As Raissi
- \etal~\cite{Raissi2017} propose, the residual of each differential equation in
- the system must be minimized in order for the model to optimize its output in
- accordance with the theory. We obtain the residual $r_i$, with
- $i\in\{1, ..., N_d\}$, by rearranging the differential equation and calculating
- the difference between its left-hand side and its right-hand side. $N_d$ is the
- number of differential equations in the system. Following Raissi
- \etal~\cite{Raissi2017}, the \emph{physics loss} of a PINN,
- \begin{equation}
- \Loss{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
- \end{equation}
- takes the input data and the model prediction to calculate the mean square
- error of the residuals. The second term, the \emph{observation loss}
- $\Loss{obs}$, employs the mean square error of the distances between the
- predicted and the true values for each training point. Additionally, the
- observation loss may incorporate extra terms for initial and boundary
- conditions. Let $N_t$ denote the number of training points. Then,
- \begin{equation}
- \Loss{PINN}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
- \end{equation}
- represents the comprehensive loss function of a physics-informed neural
- network.\\
- Given the nature of residuals, calculating the loss term
- $\Loss{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
- calculation of the derivative of the output with respect to the input of
- the neural network. As we outline in~\Cref{sec:mlp}, during the process of
- backpropagation we calculate the gradients of the loss term with respect to a
- layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
- the index of the respective layer. By employing
- the chain rule of calculus, the algorithm progresses from the output layer
- through each hidden layer, ultimately reaching the first layer in order to
- compute the respective gradients. The term,
- \begin{equation}
- \nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
- \end{equation}
- illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
- this procedure, called \emph{automatic differentiation}, goes one step further
- and calculates the gradient of the output with respect to the input
- $\boldsymbol{x}$. In order to calculate the second derivative,
- $\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
- this procedure must be repeated.\\
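- The following PyTorch fragment sketches this computation, reusing the
- hypothetical \texttt{model} from the example in~\Cref{sec:mlp} and an assumed
- input grid:
- \begin{verbatim}
- x = torch.linspace(0, 1, 100).reshape(-1, 1).requires_grad_(True)
- y_hat = model(x)
- 
- # Gradient of the output w.r.t. the input; create_graph=True keeps
- # the graph so the result can be differentiated a second time.
- dy_dx, = torch.autograd.grad(y_hat, x, torch.ones_like(y_hat),
-                              create_graph=True)
- d2y_dx2, = torch.autograd.grad(dy_dx, x, torch.ones_like(dy_dx),
-                                create_graph=True)
- \end{verbatim}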
- Above, we present a method for approximating functions with the help of
- systems of differential equations. As previously stated, we want to find a
- solver for systems of differential equations. In problems where we must solve
- an ODE or PDE, we have to find a set of parameters that satisfies the system
- for any input $\boldsymbol{x}$. In the context of PINNs, this is the
- \emph{inverse problem}: a set of training data from measurements is available,
- along with the respective differential equations, but information about the
- parameters of the equations is lacking. To address this
- challenge, we set these parameters as distinct learnable parameters within the
- neural network. This enables the network to utilize a specific value that
- actively influences the physics loss
- $\Loss{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$.
- During the training phase, the optimizer aims to minimize the physics loss,
- which should ultimately yield an approximation of the true value.\\
- \begin{figure}[h]
- \centering
- \includegraphics[scale=0.87]{oscilator.pdf}
- \caption{Illustration of the movement of an oscillating body in the
- underdamped case, with $m=1\,kg$, $\mu=4\,\frac{Ns}{m}$ and $k=200\,\frac{N}{m}$.}
- \label{fig:spring}
- \end{figure}
- One illustrative example of a potential application for PINNs is the
- \emph{damped harmonic oscillator}~\cite{Tenenbaum1985}. In this problem, we
- displace a body, which is attached to a spring, from its resting position. The
- body is subject to three forces: firstly, the inertia exerted by the
- displacement $u$, which points in the direction of the displacement $u$;
- secondly, the restoring force of the spring, which attempts to return the body
- to its original position; and thirdly, the friction force, which points in the
- opposite direction of the movement. In accordance with Newton's second law and
- the combined influence of these forces, the body exhibits oscillatory motion
- around its position of rest. The system is influenced by the mass $m$ of the
- body, the coefficient of friction $\mu$, and the spring constant $k$,
- indicating the stiffness of the spring. The differential equation,
- \begin{equation}
- m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
- \end{equation}
- shows the relation of these parameters in reference to the problem at hand. As
- Tenenbaum and Pollard~\cite{Tenenbaum1985} show, there are three potential
- solution cases for this equation. However, only the \emph{underdamped case}
- results in an oscillating movement of the body, as illustrated
- in~\Cref{fig:spring}. In order to apply a PINN to this problem, we require a
- set of training data. This consists of pairs of time points and corresponding
- displacement measurements $(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In
- this hypothetical case, we know the mass $m=1\,kg$, the spring constant
- $k=200\,\frac{N}{m}$, and the initial conditions $u(0) = 1$ and
- $\frac{du}{dt}(0) = 0$. However, we do not know the value of the friction
- coefficient $\mu$. In this case, the loss function,
- \begin{equation}
- \Loss{osc}(\boldsymbol{t}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = \left(\hat{u}(0)-1\right)^2+\left(\frac{d\hat{u}}{dt}(0)\right)^2+\left|\left|m\frac{d^2\hat{u}}{dt^2}+\mu\frac{d\hat{u}}{dt}+k\hat{u}\right|\right|^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{u}^{(i)}-u^{(i)}||^2,
- \end{equation}
- includes the initial conditions, the residual, in which $\mu$ is a learnable
- parameter, and the observation loss.
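- A minimal PyTorch sketch of this setup (assuming hypothetical measurement
- tensors \texttt{t\_data} and \texttt{u\_data} of shape $(N_t, 1)$ and the
- \texttt{model} from~\Cref{sec:mlp}; the initial-condition terms and the epoch
- count are omitted or chosen for brevity) could look as follows:
- \begin{verbatim}
- mu = torch.nn.Parameter(torch.tensor(1.0))  # unknown friction, learnable
- m, k = 1.0, 200.0                           # known mass and spring constant
- optimizer = torch.optim.Adam(list(model.parameters()) + [mu], lr=1e-3)
- 
- for epoch in range(5000):
-     optimizer.zero_grad()
-     t = t_data.clone().requires_grad_(True)
-     u_hat = model(t)
-     du, = torch.autograd.grad(u_hat, t, torch.ones_like(u_hat),
-                               create_graph=True)
-     d2u, = torch.autograd.grad(du, t, torch.ones_like(du),
-                                create_graph=True)
-     residual = m * d2u + mu * du + k * u_hat     # physics residual
-     loss = (residual.pow(2).mean()               # physics loss
-             + (u_hat - u_data).pow(2).mean())    # observation loss
-     loss.backward()
-     optimizer.step()
- \end{verbatim}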
- % -------------------------------------------------------------------
- \subsection{Disease Informed Neural Networks}
- \label{sec:pinn:dinn}
- % -------------------------------------------------------------------
|