% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap02/chap02.tex
% Part: theoretical background
% Description:
% summary of the content in this chapter
% Version: 05.08.2024
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Theoretical Background}
\label{chap:background}
This chapter introduces the theoretical knowledge that forms the foundation of
the work presented in this thesis. In~\Cref{sec:domain}
and~\Cref{sec:differentialEq}, we discuss differential equations and the
underlying theory. In these sections, both the explanations and the approach are
strongly based on the book on analysis by Rudin~\cite{Rudin2007} and the book
about ordinary differential equations by Tenenbaum and
Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this knowledge to examine
various pandemic models in~\Cref{sec:epidemModel}.
Finally, we address the topic of neural networks with a focus on the multilayer
perceptron in~\Cref{sec:mlp} and physics-informed neural networks
in~\Cref{sec:pinn}.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Functions}
\label{sec:domain}
To model a physical problem using mathematical tools, it is necessary to define
a set of fundamental numbers or quantities upon which the subsequent calculations
will be based. These sets may represent, for instance, a specific time interval
or a distance. The term \emph{domain} describes these fundamental sets of
numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
living in a certain domain. In this thesis, we will focus on domains of real
numbers in $\mathbb{R}$.\\
The mapping between variables enables the modeling of the process and captures
its semantics. We use functions to express this mapping. Let
$A, B\subset\mathbb{R}$ be two subsets of the real numbers, then we define a
function as the mapping
\begin{equation}
f: A\rightarrow B.
\end{equation}
In other words, the function $f$ maps elements $x\in A$ to values
$f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
of $f$. Functions are capable of representing the state of a system as a value
based on an input value from their domain. One illustrative example is a
function that maps a time point to the distance covered since a starting point.
In this case, time serves as the domain, while the distance is the codomain.
% -------------------------------------------------------------------
\section{Basics of Differential Equations}
\label{sec:differentialEq}
Often, the change of a system is more interesting than its current state.
Functions give us the latter, but provide information about the change of a
system only indirectly. The objective is to determine an effective method
for calculating the change of a function across its domain. Let $f$ be a
function and $[a, b]\subset \mathbb{R}$ an interval of real numbers, then the
expression
\begin{equation}
m = \frac{f(b) - f(a)}{b-a}
\end{equation}
gives the average rate of change. While the average rate of change is useful in
many cases, the momentary rate of change is more informative. To calculate it,
we shrink the interval until it becomes infinitesimally small. For each $x\in[a, b]$
we calculate
\begin{equation} \label{eqn:differential}
\frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
\end{equation}
if the limit exists. $\frac{df}{dx}$ is the \emph{derivative} of $f$; it returns
the momentary rate of change of $f$ for each value $x$ of $f$'s domain. An
equation that relates a function to its derivatives is called a
\emph{differential equation}. Repeating this process on $\frac{df}{dx}$ yields
$\frac{d^2f}{dx^2}$, which is the function that calculates the rate of change of
the rate of change and is called the second-order derivative. Iterating this $n$
times results in $\frac{d^nf}{dx^n}$, the derivative of the $n$-th order.
Besides differentiating a known function, a differential equation can also be
derived directly from the semantics of a problem. This method is useful if no
explicit function describing the system is known. Differential equations find
application in several areas such as engineering, physics, economics,
epidemiology, and beyond.\\
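As a brief worked illustration of~\Cref{eqn:differential} (an example of our
own, not taken from the cited books), consider $f(x)=x^2$. Then
\begin{equation}
\frac{df}{dx} = \lim_{t\to x}\frac{t^2-x^2}{t-x} = \lim_{t\to x}\frac{(t-x)(t+x)}{t-x} = \lim_{t\to x}(t+x) = 2x,
\end{equation}
and repeating the process gives $\frac{d^2f}{dx^2}=2$. On the interval $[1,3]$,
the average rate of change is $\frac{f(3)-f(1)}{3-1}=4$, which coincides with
the momentary rate of change $2x$ only at $x=2$.\\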
In the context of functions, it is possible to have multiple domains, meaning
that the function has more than one parameter. To illustrate, consider a function
operating in two-dimensional space, wherein each parameter represents one axis,
or one that takes both time and location as inputs. The term
\emph{partial differential equations} (\emph{PDEs}) describes differential
equations of such functions, which involve a derivative with respect to each of
their domains. In contrast, \emph{ordinary differential equations} (\emph{ODEs})
involve only derivatives of a function that has a single domain. In this thesis, we
only need ODEs.\\
A \emph{system of differential equations} is the name for a set of differential
equations. The derivatives in a system of differential equations each have their
own codomain, which is part of the problem, while they all share the same
domain.\\
Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
including the \emph{Motion of a Particle Along a Straight Line}. This example
builds on Newton's second law, which states that ``the rate of change of the
momentum of a body ($\text{momentum} = \text{mass} \cdot \text{velocity}$) is
proportional to the resultant external
force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
Then, Newton's second law translates mathematically to
\begin{equation} \label{eq:newtonSecLaw}
F = m\frac{dv}{dt}.
\end{equation}
The acceleration, $a=\frac{dv}{dt}$, is the rate of change of the velocity and
therefore part of the equation. Additionally, the velocity of a body is
the derivative of the distance $s$ traveled by that body. Based on these findings,
we can rewrite~\Cref{eq:newtonSecLaw} as
\begin{equation}
F=ma=m\frac{d^2s}{dt^2}.
\end{equation}\\
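To sketch how such an equation is solved, consider an illustrative special case
of our own: a constant force $F$, with hypothetical initial values $v_0$ and
$s_0$ for the velocity and position at $t=0$. Integrating twice with respect to
$t$ gives
\begin{equation}
v(t) = v_0 + \frac{F}{m}t, \qquad s(t) = s_0 + v_0 t + \frac{F}{2m}t^2.
\end{equation}
Differentiating $s(t)$ twice recovers $\frac{d^2s}{dt^2}=\frac{F}{m}$, so the
result is consistent with~\Cref{eq:newtonSecLaw}.\\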
This explanation of differential equations focuses on the aspects deemed crucial
for this thesis and is not intended to be a complete explanation of the subject.
To gain a better understanding of it, we recommend the books mentioned
above~\cite{Rudin2007,Tenenbaum1985}. In the following section, we
describe the application of these principles in epidemiological models.
% -------------------------------------------------------------------
\section{Epidemiological Models}
\label{sec:epidemModel}
Pandemics like \emph{COVID-19} have resulted in a significant
number of fatalities. The question arises: How should we fight a pandemic
correctly? It is also essential to study whether the employed countermeasures
are efficacious in combating the pandemic. Given the unfavorable public response to
measures such as lockdowns, it is imperative to verify that their efficacy
remains commensurate with the costs incurred by those affected. Where
novel technologies are in use, such as the mRNA vaccines
in the context of COVID-19, it is necessary to test their effect and find the
optimal variant. In order to shed light on these questions, we need to
develop a method to quantify the pandemic along with its course of
progression.\\
The real world is a highly complex system, which makes it a significant
challenge to describe it fully in a model. Therefore, the model must
reduce the complexity while retaining the essential information. Furthermore, it
must address the issue of limited data availability. For instance, during
COVID-19 institutions such as the Robert Koch Institute
(RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
able to collect data on infections and mortality cases. Consequently, we require
a model that employs an abstraction of the real world to illustrate the events
and relations that are pivotal to understanding the problem.
% -------------------------------------------------------------------
\subsection{SIR Model}
\label{sec:pandemicModel:sir}
In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR Model},
which subsequently became one of the most influential epidemiological models.
This model enables the modeling of infections during epidemiological events such as pandemics.
The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
reiterates the model and serves as the foundation for the following explanation
of SIR models.\\
The SIR model is capable of describing diseases that are transmitted through
contact or proximity between an individual carrying the illness and a healthy
individual. This is possible due to the distinction between infected individuals,
who are carriers of the disease, and the part of the population that is
susceptible to infection. In the model, individuals can move between these
groups, e.g., healthy individuals becoming infected. The model assumes that the
size $N$ of the population remains constant throughout the duration of the
pandemic. The population $N$ comprises three distinct groups: the
\emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
\emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
\mathbb{R}_{\geq0}$ be the time span of the pandemic, then
\begin{equation} \label{eq:sir_groups}
S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
\end{equation}
give the values of $S$, $I$ and $R$ at a certain point in time
$t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, the following holds:
\begin{equation} \label{eq:N_char}
N = S + I + R.
\end{equation}
The model makes another assumption by stating that recovered people are immune
to the illness, so infectious individuals cannot infect them. The individuals in
the $R$ group are either recovered or deceased, and thus unable to transmit or
carry the disease.
\begin{figure}[h]
\centering
\includegraphics[scale=0.3]{sir_graph.png}
\caption{A visualization of the SIR model, illustrating $N$ being split in the
three groups $S$, $I$ and $R$.}
\label{fig:sir_model}
\end{figure}
As visualized in~\Cref{fig:sir_model}, individuals may transition between
groups based on transition rates. The
transmission rate $\beta$ is responsible for individuals becoming infected,
while the rate of removal or recovery rate $\alpha$ (also referred to as
$\delta$ or $\nu$, e.g.,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
individuals from $I$ to $R$.\\
We can describe this problem mathematically using a system of differential
equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
McKendrick~\cite{1927} propose the following set of differential equations:
\begin{equation}\label{eq:sir}
\begin{split}
\frac{dS}{dt} &= -\beta S I,\\
\frac{dI}{dt} &= \beta S I - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
This, according to Edelstein-Keshet, is based on the following assumption:
``The rate of transmission of a microparasitic disease is proportional to the
rate of encounter of susceptible and infective individuals modelled by the
product ($\beta S I$)''~\cite{EdelsteinKeshet2005}. The system shows the change
in size of the groups per time unit due to infections, recoveries, and deaths.\\
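A quick consistency check on~\Cref{eq:sir} (a short derivation added here for
illustration) is to sum the three equations:
\begin{equation}
\frac{dS}{dt} + \frac{dI}{dt} + \frac{dR}{dt} = -\beta S I + (\beta S I - \alpha I) + \alpha I = 0.
\end{equation}
The sum $S + I + R$ therefore does not change over time, which matches the
assumption of a constant population size $N$ in~\Cref{eq:N_char}.\\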
The term $\beta SI$ describes the rate of encounters of susceptible and infected
individuals. This term depends on the absolute sizes of $S$ and $I$; thus, Anderson
and May~\cite{Anderson1991} propose a modified model:
\begin{equation}\label{eq:modSIR}
\begin{split}
\frac{dS}{dt} &= -\beta \frac{SI}{N},\\
\frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
Here, $\beta SI$ is normalized by $N$, which reflects real-world transmission
more accurately~\cite{Anderson1991}.\\
The initial phase of a pandemic is characterized by the infection of a small
number of individuals, while the majority of the population remains susceptible.
Since the disease has only just emerged, no individual has recovered or died
yet, so the removed group is still empty. Let $I_0\in\mathbb{N}$ be
the number of infected individuals at the beginning of the outbreak. Then,
\begin{equation}
\begin{split}
S(t_0) &= N - I_{0},\\
I(t_0) &= I_{0},\\
R(t_0) &= 0,
\end{split}
\end{equation}
describes the initial configuration of a system in which a disease has just
emerged.\\
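To illustrate what this initial configuration implies, consider the following
rough approximation (added for illustration, assuming constant $\beta$ and
$\alpha$ and the normalized model in~\Cref{eq:modSIR}). As long as $I \ll N$, we
have $S \approx N$, and the equation for $I$ reduces to
\begin{equation}
\frac{dI}{dt} \approx (\beta - \alpha) I \quad\Longrightarrow\quad I(t) \approx I_0\, e^{(\beta-\alpha)(t - t_0)}.
\end{equation}
The infectious group therefore initially grows exponentially if $\beta > \alpha$
and shrinks if $\beta < \alpha$. For example, $\beta = 0.5$ and $\alpha = 0.35$
give an early growth rate of $\beta - \alpha = 0.15$ per time unit, which
corresponds to a doubling time of about $\ln 2 / 0.15 \approx 4.6$ time units.\\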
\begin{figure}[h]
%\centering
\setlength{\unitlength}{1cm} % Set the unit length for coordinates
\begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
% reference
\put(0, 2.5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{reference_params_synth.png}
\caption{$\alpha=0.35$, $\beta=0.5$}
\label{fig:synth_norm}
\end{subfigure}
}
% 1. row, 1.image (low beta)
\put(5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_beta_synth.png}
\caption{$\alpha=0.25$, $\beta=0.5$}
\label{fig:synth_low_beta}
\end{subfigure}
}
% 1. row, 2.image (high beta)
\put(9, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_beta_synth.png}
\caption{$\alpha=0.45$, $\beta=0.5$}
\label{fig:synth_high_beta}
\end{subfigure}
}
% 2. row, 1.image (low alpha)
\put(5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_alpha_synth.png}
\caption{$\alpha=0.35$, $\beta=0.4$}
\label{fig:synth_low_alpha}
\end{subfigure}
}
% 2. row, 2.image (high alpha)
\put(9, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_alpha_synth.png}
\caption{$\alpha=0.35$, $\beta=0.6$}
\label{fig:synth_high_alpha}
\end{subfigure}
}
\end{picture}
\caption{Synthetic data, using~\Cref{eq:modSIR} and $N=7.9\cdot 10^6$, $I_0=10$ with different sets of parameters.}
\label{fig:synth_sir}
\end{figure}
In the SIR model, the timing and the height of the peak (or peaks)
of the infectious group are of paramount importance for understanding the
dynamics of a pandemic. A low peak occurring at a late point in time indicates
that the disease is unable to keep pace with the rate of recovery, resulting
in its demise before it can exert a significant influence on the population. In
contrast, an early and high peak means that the disease is rapidly transmitted
through the population, with a significant proportion of individuals having been
infected. \Cref{fig:synth_sir} illustrates the impact of modifying either
$\beta$ or $\alpha$ while simulating a pandemic using a model such
as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
When new infections outpace recoveries, the peak of $I$
will occur early and will be high. On the other hand, if recoveries occur at a
faster rate than new infections, the peak will occur later and will be low. This
means that it is crucial to know both $\beta$ and $\alpha$ to be able to
simulate a pandemic using the SIR model.\\
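The location of the peak can be made precise with a short calculation (added
here as an illustration, assuming constant $\beta$ and $\alpha$
in~\Cref{eq:modSIR}). The infectious group peaks where $\frac{dI}{dt} = 0$ with
$I > 0$:
\begin{equation}
\beta\frac{SI}{N} - \alpha I = 0 \quad\Longleftrightarrow\quad S = \frac{\alpha}{\beta}N.
\end{equation}
For the reference parameters $\alpha = 0.35$, $\beta = 0.5$ and
$N = 7.9\cdot 10^6$ used in~\Cref{fig:synth_sir}, this gives
$S = 0.7\,N = 5.53\cdot 10^6$. In this setting, $I$ grows while more than
$70\,\%$ of the population is still susceptible and declines afterwards.\\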
The SIR model makes a number of assumptions that reduce the
model's overall complexity at the cost of increasing its divergence from
reality. One such assumption is that the size of the population, $N$,
remains constant. This is not an accurate representation of the
real world, as the size of a population is subject to
a number of factors that can contribute to change. The population is increased
by births and decreased by deaths. There are
different reasons for mortality, including the natural aging process or the
development of other diseases. Further simplifications are that recovered
individuals cannot become susceptible again, and that the transition rates
cannot change, for example due to new variants or the
implementation of new countermeasures. We address the latter point
in~\Cref{sec:pandemicModel:rsir}.
% -------------------------------------------------------------------
\subsection{Reduced SIR Model and the Reproduction Number}
\label{sec:pandemicModel:rsir}
\Cref{sec:pandemicModel:sir} presents the classical SIR model. The model
comprises two parameters $\beta$ and $\alpha$, which describe the course of a
pandemic over its entire duration. This is beneficial when examining the overall
pandemic; however, in the real world, disease behavior is dynamic, and the
values of the parameters $\beta$ and $\alpha$ can change over time.
Reasons for this include the implementation of countermeasures
that reduce the contact between the infectious and susceptible individuals, the
emergence of a new variant of the disease that increases its infectivity or
deadliness, or the administration of a vaccine that provides previously
susceptible individuals with immunity without them ever being infectious. To address
this, Millevoi et al.~\cite{Millevoi2023} introduce a model that simultaneously
reduces the size of the system of differential equations and solves the problem
of time scaling.\\
First, they alter the definition of $\beta$ and $\alpha$ to be functions on the time interval
$\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
\begin{equation}
\beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
\end{equation}
Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the
time span an infected individual requires to recuperate. At the initial time point
$t_0$, the \emph{reproduction number},
\begin{equation}
\RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
\end{equation}
represents the number of susceptible individuals that one infectious individual
infects at the onset of the pandemic. In light of the effects of $\beta$ and
$\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
pandemic is emerging: one infectious individual infects more than one other
individual on average, so $I$ increases. In this scenario, $\alpha$ is low
relative to $\beta$, and since $I(t_0) \ll S(t_0)$, nearly every contact of an
infectious individual is with a susceptible one. When $\RO < 1$,
infectious individuals recover faster than they infect others, and the disease
recedes. Nevertheless, $\RO$ does not cover the entire time
span. For this reason, Millevoi et al.~\cite{Millevoi2023} introduce $\Rt$,
which has the same interpretation as $\RO$, with the exception that $\Rt$ is
dependent on time. The definition of the time-dependent reproduction number on
the time interval $\mathcal{T}$ with the population size $N$,
\begin{equation}\label{eq:repr_num}
\Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N}
\end{equation}
combines the transition rates, which carry information about the spread of the
disease, with the decreasing share of susceptible individuals in the
population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
a state variable in the model, which enables the following reduction of the SIR
model.\\
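As a numerical illustration (our own arithmetic, assuming the constant parameter
values listed in the captions of~\Cref{fig:synth_sir}), the reference setting
$\alpha = 0.35$, $\beta = 0.5$ yields
\begin{equation}
D = \frac{1}{0.35} \approx 2.86, \qquad \RO = \frac{0.5}{0.35} \approx 1.43,
\end{equation}
that is, a recovery time of about $2.86$ time units, during which one infectious
individual infects about $1.43$ susceptible individuals at the onset. The
remaining settings give $\RO = 2.0$ for $(\alpha,\beta) = (0.25, 0.5)$,
$\RO\approx 1.11$ for $(0.45, 0.5)$, $\RO\approx 1.14$ for $(0.35, 0.4)$ and
$\RO\approx 1.71$ for $(0.35, 0.6)$. All of these values exceed $1$, so each
setting describes an emerging outbreak.\\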
\Cref{eq:N_char} allows for the calculation of the value of the group $R$ using
$S$ and $I$, with the term $R(t)=N-S(t)-I(t)$, so the equation for $R$ can be
dropped. Moreover, \Cref{eq:repr_num} gives $\beta\frac{S}{N} = \alpha\Rt$. Thus,
\begin{equation}
\begin{split}
\frac{dS}{dt} &= -\alpha\Rt I(t),\\
\frac{dI}{dt} &= \alpha(\Rt-1)I(t),
\end{split}
\end{equation}
is the reduction of~\Cref{eq:modSIR} on the time interval $\mathcal{T}$ using this
characteristic and the reproduction number $\Rt$ (see~\Cref{eq:repr_num}).
Another issue that Millevoi et al.~\cite{Millevoi2023} seek to address is the
extensive range of values that the SIR groups can assume, spanning from $0$ to
$10^7$. Accordingly, they first scale the time interval $\mathcal{T}$ using
its boundaries to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
[0, 1]$. Subsequently, they calculate the scaled groups,
\begin{equation}
S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
\end{equation}
using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
variable $I$ results in
\begin{equation}
\frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
\end{equation}
a further reduced version of~\Cref{eq:modSIR}. This reduction makes the process
more streamlined and efficient, as it eliminates a parameter ($\beta$) and two
state variables ($S$ and $R$), while adding the state variable $\Rt$. This is a
crucial aspect for the automated solution of such differential equation
systems, as we describe in~\Cref{sec:mlp}.
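\\
The scaled equation for $I_s$ follows from the chain rule (a short derivation
added for illustration, using only the definitions of $t_s$ and $I_s$ above and
$\frac{dI}{dt} = \beta\frac{SI}{N} - \alpha I = \alpha(\Rt - 1)I$
from~\Cref{eq:modSIR,eq:repr_num}). Since $t = t_0 + (t_f - t_0)\,t_s$, we have
$\frac{dt}{dt_s} = t_f - t_0$, and therefore
\begin{equation}
\frac{dI_s}{dt_s} = \frac{1}{C}\frac{dI}{dt}\frac{dt}{dt_s} = \frac{t_f - t_0}{C}\,\alpha(\Rt - 1)I(t) = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s).
\end{equation}
The scaling factor $C$ cancels, so the scaled equation has the same form as the
unscaled one up to the factor $t_f - t_0$.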
% -------------------------------------------------------------------
\section{Multilayer Perceptron}
\label{sec:mlp}
In~\Cref{sec:differentialEq} we show the importance of differential equations
for describing how a system changes with respect to one of its
parameters. In~\Cref{sec:epidemModel} we show specific applications of
differential equations in an epidemiological context. The remaining step is to
solve these equations. There are multiple methods to reach
this goal; one of them is the \emph{Multilayer Perceptron}
(MLP)~\cite{Hornik1989}. In the following, we briefly cover the structure,
training and usage of these neural networks.
% -------------------------------------------------------------------
\section{Physics Informed Neural Networks}
\label{sec:pinn}
% -------------------------------------------------------------------
\subsection{Disease Informed Neural Networks}
\label{sec:pinn:dinn}
% -------------------------------------------------------------------