
  1. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  2. % Author: Phillip Rothenbeck
  3. % Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
  4. % File: chap02/chap02.tex
  5. % Part: theoretical background
  6. % Description:
  7. % summary of the content in this chapter
  8. % Version: 20.08.2024
  9. % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  10. \chapter{Theoretical Background}
  11. \label{chap:background}
  12. This chapter introduces the theoretical foundations for the work presented in
this thesis. In~\Cref{sec:domain} and~\Cref{sec:differentialEq}, we describe
functions and differential equations together with the underlying theory. In these
sections, both the explanations and the approach are based on a book on analysis by
Rudin~\cite{Rudin2007} and a book on ordinary differential equations by
Tenenbaum and Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this
  18. knowledge to examine various pandemic models in~\Cref{sec:epidemModel}. Finally,
  19. we address the topic of neural networks with a focus on the multilayer
  20. perceptron in~\Cref{sec:mlp} and physics-informed neural networks
  21. in~\Cref{sec:pinn}.
  22. % -------------------------------------------------------------------
  23. \section{Mathematical Modelling using Functions}
  24. \label{sec:domain}
  25. To model a physical problem mathematically, it is necessary to define a
  26. fundamental set of numbers or quantities upon which the subsequent calculations
  27. will be based. These sets may represent, for instance, a specific time interval
  28. or a distance. The term \emph{domain} describes these fundamental sets of
  29. numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity
  30. living in a certain domain. In this thesis, we will focus on domains of real
  31. numbers in $\mathbb{R}$.\\
  32. The mapping between variables enables the modeling of a physical process and may
  33. depict semantics. We use functions in order to facilitate this mapping. Let
$A, B\subset\mathbb{R}$ be two subsets of the real numbers, then we define a
  35. function as the mapping
  36. \begin{equation}
  37. f: A\rightarrow B.
  38. \end{equation}
  39. In other words, the function $f$ maps elements $x\in A$ to values
  40. $f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
  41. of $f$. Functions are capable of representing the state of a system as a value
  42. based on an input value from their domain. One illustrative example is a
  43. function that maps a time step to the distance covered since a starting point.
  44. In this case, time serves as the domain, while the distance is the codomain.
  45. % -------------------------------------------------------------------
  46. \section{Mathematical Modelling using Differential Equations}
  47. \label{sec:differentialEq}
  48. Often, the behavior of a variable or a quantity across a domain is more
  49. interesting than its current state. Functions are able to give us the latter,
  50. but do not contain information about the change of a system. The objective
  51. is to determine an effective method for calculating the change of a function
  52. across its domain. Let $f$ be a function and $[a, b]\subset \mathbb{R}$ an
  53. interval of real numbers. The expression
  54. \begin{equation}
m = \frac{f(b) - f(a)}{b-a}
  56. \end{equation}
  57. gives the average rate of change. While the average rate of change is useful in
  58. many cases, the momentary rate of change is more accurate. To calculate the
  59. momentary rate of change at $x$, we let the value $t$ approach $x$ thereby
  60. narrowing down the interval to an infinitesimal. For each $x\in[a, b]$ we
  61. calculate
  62. \begin{equation} \label{eqn:differential}
  63. \frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
  64. \end{equation}
if it exists. As Tenenbaum and Pollard~\cite{Tenenbaum1985} define,
  66. $\nicefrac{df}{dx}$ is the \emph{derivative}, which is ``the rate of change of a
  67. variable with respect to another''. The relation between a variable and its
  68. derivative is modeled in a \emph{differential equation}. The derivative of
  69. $\nicefrac{df}{dx}$ yields $\nicefrac{d^2f}{dx^2}$, which is the function that
  70. calculates the rate of change of the rate of change and is called the
  71. \emph{second order derivative}. Iterating this $n$ times results in
  72. $\nicefrac{d^nf}{dx^n}$, the derivative of the $n$'th order. A method for
  73. obtaining a differential equation is to derive it from the semantics of a
  74. problem. For example, in physics a differential equation can be derived from the
  75. law of the conservation of energy~\cite{Demtroeder2021}. Differential equations
find application in several areas: in engineering, \eg, Kirchhoff's
circuit laws~\cite{Kirchhoff1845} describe the relation between the voltage
and current in systems with resistors, inductors, and capacitors; in physics,
\eg, the Schrödinger equation predicts the probability of finding
particles like electrons in specific places or states in a quantum system;
in economics, \eg, the Black-Scholes equation~\cite{Oksendal2000} predicts the price
of financial derivatives, such as options, over time; in epidemiology, \eg, the SIR
model~\cite{1927} describes the spread of diseases; and beyond.\\
  84. In the context of functions, it is possible to have multiple domains, meaning
  85. that a function has more than one parameter. To illustrate, consider a function
  86. operating in two-dimensional space, wherein each parameter represents one axis.
Another example would be a function that maps a location variable
and a time variable to a height. The term \emph{partial differential equations}
(PDE) describes differential equations of such functions, which contain
partial derivatives with respect to each individual domain. In contrast,
\emph{ordinary differential equations} (ODE) are differential equations of
functions having only one domain~\cite{Tenenbaum1985}. In this thesis, we
restrict ourselves to ODEs. Furthermore, a
  94. \emph{system of differential equations} is the name for a set of differential
  95. equations. The derivatives in a system of differential equations each have their
  96. own codomain, which is part of the problem, while they all share the same
  97. domain.\\
Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples for ODEs,
  99. including the \emph{Motion of a Particle Along a Straight Line}. Further,
  100. Newton's second law states that ``the rate of change of the momentum of a body
  101. ($momentum = mass \cdot velocity$) is proportional to the resultant external
  102. force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
  103. in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
  104. Then, Newton's second law translates mathematically to
  105. \begin{equation} \label{eq:newtonSecLaw}
  106. F = m\frac{dv}{dt}.
  107. \end{equation}
It is evident that the acceleration $a=\frac{dv}{dt}$, the rate of change of
the velocity, is part of the equation. Additionally, the velocity of a body is
the derivative of the distance $s$ traveled by that body. Based on these findings,
we can rewrite~\Cref{eq:newtonSecLaw} as
  112. \begin{equation}
  113. F=ma=m\frac{d^2s}{dt^2},
  114. \end{equation}
  115. showing that the force $F$ influences the changes of the body's position over
  116. time.\\
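To give a computational impression of such an equation of motion, the following
minimal Python sketch integrates it with the explicit Euler method; the force,
mass, and step size are illustrative choices and not taken from the literature
cited above.
\begin{verbatim}
# Minimal sketch: explicit Euler integration of m * d^2s/dt^2 = F.
# The constants below are illustrative examples.
m  = 2.0       # mass in kg
F  = 10.0      # constant external force in N
dt = 0.01      # step size in seconds

s, v = 0.0, 0.0                # initial position and velocity
for _ in range(1000):          # simulate 10 seconds
    a = F / m                  # acceleration from Newton's second law
    v += a * dt                # dv/dt = a
    s += v * dt                # ds/dt = v

print(f"position after 10 s: {s:.2f} m")   # analytic value: F/(2m) * t^2 = 250 m
\end{verbatim}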
  117. To conclude, note that this explanation of differential equations focuses on the
  118. aspects deemed crucial for this thesis and is not intended to be a complete
  119. explanation of the subject. To gain a better understanding of it, we recommend
  120. the books mentioned above~\cite{Rudin2007,Tenenbaum1985}. In the following
  121. section we describe the application of these principles in epidemiological
  122. models.
  123. % -------------------------------------------------------------------
  124. \section{Epidemiological Models}
  125. \label{sec:epidemModel}
  126. Pandemics, like \emph{COVID-19}, have resulted in a significant
  127. number of fatalities. Hence, the question arises: How should we analyze a
  128. pandemic effectively? It is essential to study whether the employed
  129. countermeasures are efficacious in combating the pandemic. Given the unfavorable
  130. public response~\cite{Jaschke2023} to measures such as lockdowns, it is
  131. imperative to investigate if their efficacy remains commensurate with the costs
incurred by those affected. In the event that alternative and novel technologies
were in use, such as the mRNA vaccines in the context of COVID-19, it is necessary
to test their effect and find the optimal variant. In order to shed light on the
  135. aforementioned events, we need a method to quantify the pandemic along with its
  136. course of progression.\\
The real world is a highly complex system, which presents a significant
challenge when attempting to describe it fully in a mathematical model. Therefore,
  139. the model must reduce the complexity while retaining the essential information.
  140. Furthermore, it must address the issue of limited data availability. For
  141. instance, during COVID-19 institutions such as the Robert Koch Institute
  142. (RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
  143. able to collect data on infections and mortality cases. Consequently, we require
  144. a model that employs an abstraction of the real world to illustrate the events
  145. and relations that are pivotal to understanding the problem.
  146. % -------------------------------------------------------------------
  147. \subsection{SIR Model}
  148. \label{sec:pandemicModel:sir}
  149. In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR Model},
  150. which subsequently became one of the most influential epidemiological models.
This model enables the description of infection dynamics during epidemiological events such as pandemics.
  152. The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
  153. reiterates the model and serves as the foundation for the following explanation
  154. of SIR models.\\
The SIR model is capable of describing diseases that are transmitted through
contact or proximity between an individual carrying the illness and a healthy
individual. This is possible due to the distinction between infected individuals,
who are carriers of the disease, and the part of the population that is
susceptible to infection. In the model, individuals are able to move between
the mentioned groups, \eg, healthy individuals becoming infected. The model assumes the
  161. size $N$ of the population remains constant throughout the duration of the
  162. pandemic. The population $N$ comprises three distinct compartments: the
  163. \emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
  164. \emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
  165. \mathbb{R}_{\geq0}$ be the time span of the pandemic, then,
  166. \begin{equation}
  167. S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
  168. \end{equation}
  169. give the values of $S$, $I$ and $R$ at a certain point of time
$t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, it holds that
  171. \begin{equation} \label{eq:N_char}
  172. N = S + I + R.
  173. \end{equation}
  174. The model makes another assumption by stating that recovered people are immune
to the illness and infectious individuals cannot infect them. The individuals
  176. in the $R$ group are either recovered or deceased, and thus unable to transmit
  177. or carry the disease.
  178. \begin{figure}[t]
  179. \centering
  180. \includegraphics[width=\textwidth]{sir_graph.pdf}
\caption{A visualization of the SIR model~\cite{1927}, illustrating $N$ being split into the
three groups $S$, $I$ and $R$.}
  183. \label{fig:sir_model}
  184. \end{figure}
As visualized in~\Cref{fig:sir_model}, individuals
may transition between groups based on epidemiological parameters. The
  187. transmission rate $\beta$ is responsible for individuals becoming infected,
  188. while the rate of removal or recovery rate $\alpha$ (also referred to as
  189. $\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
  190. individuals from $I$ to $R$.\\
  191. We can describe this problem mathematically using a system of differential
  192. equations (see ~\Cref{sec:differentialEq}). Thus, Kermack and
  193. McKendrick~\cite{1927} propose the following set of differential equations:
  194. \begin{equation}\label{eq:sir}
  195. \begin{split}
  196. \frac{dS}{dt} &= -\beta S I,\\
  197. \frac{dI}{dt} &= \beta S I - \alpha I,\\
  198. \frac{dR}{dt} &= \alpha I.
  199. \end{split}
  200. \end{equation}
This set of differential equations is based on the following assumption:
  202. ``The rate of transmission of a microparasitic disease is proportional to the
  203. rate of encounter of susceptible and infective individuals modelled by the
  204. product ($\beta S I$)'', according to Edelstein-Keshet~\cite{EdelsteinKeshet2005}.
  205. The system shows the change in size of the groups per time unit due to
  206. infections, recoveries, and deaths.\\
  207. The term $\beta SI$ describes the rate of encounters of susceptible and infected
  208. individuals. This term is dependent on the size of $S$ and $I$, thus Anderson
  209. and May~\cite{Anderson1991} propose a modified model:
  210. \begin{equation}\label{eq:modSIR}
  211. \begin{split}
  212. \frac{dS}{dt} &= -\beta \frac{SI}{N},\\
  213. \frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
  214. \frac{dR}{dt} &= \alpha I.
  215. \end{split}
  216. \end{equation}
By normalizing the term $\beta SI$ by the population size $N$,~\Cref{eq:modSIR}
describes the transmission process more realistically~\cite{Anderson1991}.\\
  219. The initial phase of a pandemic is characterized by the infection of a small
  220. number of individuals, while the majority of the population remains susceptible.
At this point, no individual has yet recovered from or died of the disease, so
the removed group is empty. Let $I_0\in\mathbb{N}$ be
  223. the number of infected individuals at the beginning of the disease. Then,
  224. \begin{equation}\label{eq:startCond}
  225. \begin{split}
  226. S(0) &= N - I_{0},\\
  227. I(0) &= I_{0},\\
  228. R(0) &= 0,
  229. \end{split}
  230. \end{equation}
  231. describes the initial configuration of a system in which a disease has just
  232. emerged.\\
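To illustrate how such a system of differential equations can be evaluated
numerically, the following Python sketch integrates~\Cref{eq:modSIR} from the
initial configuration in~\Cref{eq:startCond} with the explicit Euler method. The
parameter values match the reference case of the synthetic data
in~\Cref{fig:synth_sir}; the step size and simulated time span are illustrative
choices.
\begin{verbatim}
import numpy as np

# Illustrative forward-Euler simulation of the normalized SIR model.
N_pop, I0   = 7.9e6, 10.0      # population size and initial infections
alpha, beta = 0.35, 0.5        # recovery and transmission rate
dt, days    = 0.1, 150         # step size and simulated time span in days

S, I, R = N_pop - I0, I0, 0.0  # initial configuration
history = []
for _ in range(int(days / dt)):
    dS = -beta * S * I / N_pop
    dI =  beta * S * I / N_pop - alpha * I
    dR =  alpha * I
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR
    history.append(I)

peak_day = np.argmax(history) * dt
print(f"peak of I after ~{peak_day:.0f} days with {max(history):.0f} infectious")
\end{verbatim}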
  233. \begin{figure}[t]
  234. %\centering
  235. \setlength{\unitlength}{1cm} % Set the unit length for coordinates
  236. \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
  237. % reference
  238. \put(0, 1.75){
  239. \begin{subfigure}{0.35\textwidth}
  240. \centering
  241. \includegraphics[width=\textwidth]{reference_params_synth.pdf}
  242. \caption{$\alpha=0.35$, $\beta=0.5$}
  243. \label{fig:synth_norm}
  244. \end{subfigure}
  245. }
  246. % 1. row, 1.image (low beta)
  247. \put(5.25, 5){
  248. \begin{subfigure}{0.3\textwidth}
  249. \centering
  250. \includegraphics[width=\textwidth]{low_beta_synth.pdf}
\caption{$\alpha=0.35$, $\beta=0.4$}
  252. \label{fig:synth_low_beta}
  253. \end{subfigure}
  254. }
  255. % 1. row, 2.image (high beta)
  256. \put(9.5, 5){
  257. \begin{subfigure}{0.3\textwidth}
  258. \centering
  259. \includegraphics[width=\textwidth]{high_beta_synth.pdf}
\caption{$\alpha=0.35$, $\beta=0.6$}
  261. \label{fig:synth_high_beta}
  262. \end{subfigure}
  263. }
  264. % 2. row, 1.image (low alpha)
  265. \put(5.25, 0){
  266. \begin{subfigure}{0.3\textwidth}
  267. \centering
  268. \includegraphics[width=\textwidth]{low_alpha_synth.pdf}
\caption{$\alpha=0.25$, $\beta=0.5$}
  270. \label{fig:synth_low_alpha}
  271. \end{subfigure}
  272. }
  273. % 2. row, 2.image (high alpha)
  274. \put(9.5, 0){
  275. \begin{subfigure}{0.3\textwidth}
  276. \centering
  277. \includegraphics[width=\textwidth]{high_alpha_synth.pdf}
\caption{$\alpha=0.45$, $\beta=0.5$}
  279. \label{fig:synth_high_alpha}
  280. \end{subfigure}
  281. }
  282. \end{picture}
  283. \caption{Synthetic data, using~\Cref{eq:modSIR} and $N=7.9\cdot 10^6$,
  284. $I_0=10$ with different sets of parameters. We visualize the case with the
  285. reference parameters in (a). In (b) and (c) we keep $\alpha$ constant, while
  286. varying the value of $\beta$. In contrast, (d) and (e) have varying values
  287. of $\alpha$.}
  288. \label{fig:synth_sir}
  289. \end{figure}
  290. In the SIR model the temporal occurrence and the height of the peak (or peaks)
  291. of the infectious group are of great importance for understanding the
  292. dynamics of a pandemic. A low peak occurring at a late point in time indicates
  293. that the disease is unable to keep pace with the rate of recovery, resulting
  294. in its demise before it can exert a significant influence on the population. In
  295. contrast, an early and high peak means that the disease is rapidly transmitted
  296. through the population, with a significant proportion of individuals having been
infected.~\Cref{fig:synth_sir} illustrates this effect by varying the values of
  298. $\beta$ or $\alpha$ while simulating a pandemic using a model such
  299. as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
  300. and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
  301. When the number of infections exceeds the number of recoveries, the peak of $I$
  302. will occur early and will be high. On the other hand, if recoveries occur at a
  303. faster rate than new infections the peak will occur later and will be low. Thus,
  304. it is crucial to know both $\beta$ and $\alpha$, as these parameters
  305. characterize how the pandemic evolves.\\
  306. The SIR model is based on a number of assumptions that are intended to reduce
  307. the overall complexity of the model while still representing the processes
  308. observed in the real-world. For example, the size of a population
  309. in the real-world is subject to a number of factors that can contribute to
  310. change. The population is increased by the occurrence of births and decreased
by the occurrence of deaths. One assumption stated in the SIR model is that
the size of the population, $N$, remains constant, as the daily change is
negligible compared to the total population. Other examples are the assumptions
that individuals cannot become susceptible again after having recovered, and
that the epidemiological parameters do not change, \eg, due to new variants or
the implementation of new countermeasures. We address the latter restriction
in~\Cref{sec:pandemicModel:rsir}.
  318. % -------------------------------------------------------------------
  319. \subsection{Reduced SIR Model and the Reproduction Number}
  320. \label{sec:pandemicModel:rsir}
\Cref{sec:pandemicModel:sir} presents the classical SIR model. This model
  322. contains two scalar parameters $\beta$ and $\alpha$, which describe the course
  323. of a pandemic over its duration. This is beneficial when examining the overall
  324. pandemic; however, in the real world, disease behavior is dynamic, and the
  325. values of the parameters $\beta$ and $\alpha$ change throughout the course of
  326. the disease. The reason for this is due to events such as the implementation of
  327. countermeasures that reduce the contact between the infectious and susceptible
  328. individuals, the emergence of a new variant of the disease that increases its
  329. infectivity or deadliness, or the administration of a vaccination that provides
  330. previously susceptible individuals with immunity without ever being infected.
  331. As these fine details of disease progression are missed in the constant rates,
  332. Liu and Stechlinski~\cite{Liu2012}, and Setianto and Hidayat~\cite{Setianto2023},
  333. introduce time-dependent epidemiological parameters and the time-dependent reproduction
  334. number to address this issue. Millevoi \etal~\cite{Millevoi2023} present a
  335. reduced version of the SIR model.\\
  336. For the time interval, $\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
  337. they alter the definition of $\beta$ and $\alpha$ to be time-dependent,
  338. \begin{equation}
  339. \beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
  340. \end{equation}
Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the time
span an infected individual requires to recover. At the initial time point
$t_0$, the \emph{reproduction number},
  344. \begin{equation}
  345. \RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
  346. \end{equation}
represents the number of susceptible individuals that one infectious individual
infects at the onset of the pandemic, where $I(t_0) \ll S(t_0)$. In light of the
effects of $\beta$ and $\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO < 1$
indicates that the disease dies out, as infected individuals recover faster than
new infections occur. In contrast, $\RO > 1$ leads to the disease spreading
rapidly across the population, with an increase in $I$ occurring at a high rate.
Nevertheless, $\RO$ does not cover
  354. the entire time span. For this reason, Millevoi \etal~\cite{Millevoi2023}
  355. introduce $\Rt$ which has the same interpretation as $\RO$, with the exception
  356. that $\Rt$ is dependent on time. The time-dependent reproduction number is
  357. defined as,
  358. \begin{equation}\label{eq:repr_num}
  359. \Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N},
  360. \end{equation}
on the time interval $\mathcal{T}$ with the population size $N$. This definition
combines the epidemiological parameters, which carry information about the spread
of the disease, with the decreasing ratio of susceptible individuals in the
population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
a state variable of the model, which describes the reproduction of the disease
for each day. As Millevoi \etal~\cite{Millevoi2023} show, $\Rt$ enables the
  367. following reduction of the SIR model.\\
  368. \Cref{eq:N_char} allows for the calculation of the value of the group $R$ using
  369. $S$ and $I$, with the term $R(t)=N-S(t)-I(t)$. Thus,
  370. \begin{equation}
  371. \begin{split}
\frac{dS}{dt} &= -\alpha\Rt I(t),\\
\frac{dI}{dt} &= \alpha(\Rt-1)I(t),
  374. \end{split}
  375. \end{equation}
  376. is the reduction of~\Cref{eq:sir} on the time interval $\mathcal{T}$ using this
  377. characteristic and the reproduction number $\Rt$ (see ~\Cref{eq:repr_num}).
  378. Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
  379. extensive range of values that the SIR groups can assume. Accordingly, they
  380. initially scale the time interval $\mathcal{T}$ using its borders to calculate
  381. the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in[0, 1]$. Subsequently, they
  382. calculate the scaled groups,
  383. \begin{equation}
  384. S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
  385. \end{equation}
  386. using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
  387. variable $I$, results in,
  388. \begin{equation}\label{eq:reduced_sir_ODE}
  389. \frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
  390. \end{equation}
  391. which is a further reduced version of~\Cref{eq:sir}. This less complex
  392. differential equation results in a less complex solution, as it entails the
  393. elimination of a parameter ($\beta$) and the two state variables ($S$ and $R$).
The reduced SIR model thus requires fewer input variables, making it
advantageous in situations with limited data, such as when recovery data is
missing.
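To illustrate the behavior of~\Cref{eq:reduced_sir_ODE}, the following Python
sketch integrates the scaled equation for a constant reproduction number; the
parameter values and the assumption of a constant $\Rt$ are illustrative
simplifications and are not part of~\cite{Millevoi2023}.
\begin{verbatim}
# Illustrative Euler integration of the reduced SIR ODE for a constant R_t.
alpha  = 0.35          # recovery rate
t0, tf = 0.0, 20.0     # original time interval in days
C      = 1.0e6         # constant scaling factor
I_s    = 10.0 / C      # scaled initial infectious group
R_t    = 1.5           # assumed constant reproduction number

dts = 1e-4             # step in the scaled time t_s in [0, 1]
for _ in range(int(1.0 / dts)):
    dI_s = alpha * (tf - t0) * (R_t - 1.0) * I_s   # reduced SIR ODE
    I_s += dts * dI_s

# analytic check: I_s(1) = I_s(0) * exp(alpha * (tf - t0) * (R_t - 1))
print(f"scaled infectious group at t_f: {I_s:.6f}")
\end{verbatim}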
  397. % -------------------------------------------------------------------
  398. \section{Multilayer Perceptron}
  399. \label{sec:mlp}
In~\Cref{sec:differentialEq}, we discuss the modeling of systems using differential
equations, illustrating how they can be utilized to elucidate the
impact of a specific parameter on the system's behavior.
  403. In~\Cref{sec:epidemModel}, we show specific applications of differential
equations in an epidemiological context. Solving such systems is crucial and
involves finding a function that satisfies them. Fitting measured data points to
approximate such a function is one of several methods to achieve this
goal. The \emph{Multilayer Perceptron} (MLP)~\cite{Rumelhart1986} is a
  408. data-driven function approximator. In the following section, we provide a brief
  409. overview of the structure and training of these \emph{neural networks}. For
  410. reference, we use the book \emph{Deep Learning} by Goodfellow
  411. \etal~\cite{Goodfellow-et-al-2016} as a foundation for our explanations.\\
  412. The objective is to develop an approximation method for any function $f^{*}$,
  413. which could be a mathematical function or a mapping of an input vector to the
  414. desired output. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
  415. the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$,
  416. is the function to approximate. In the year 1958,
  417. Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron modeling the concept of
a neuron in a neuroscientific sense. The perceptron takes the input vector
$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
  420. optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
  421. f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
  422. Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
  423. approximating a specific class of functions. Consequently, there is a necessity
  424. for an expansion of the perceptron.\\
As Goodfellow \etal~\cite{Goodfellow-et-al-2016} describe, the solution to this issue is to decompose $f$ into
  426. a chain structure of the form,
  427. \begin{equation} \label{eq:mlp_char}
  428. f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
  429. \end{equation}
  430. This nested version of a perceptron is a multilayer perceptron. Each
  431. sub-function, designated as $f^{(i)}$, is represented in the structure of an
  432. MLP as a \emph{layer}, which contains a linear mapping and a nonlinear mapping
in form of an \emph{activation function}. Each layer is composed of a multitude
of \emph{units} (also called \emph{neurons}). A neuron performs the
same vector-to-scalar calculation as the perceptron does. Subsequently, a
  436. nonlinear activation function transforms the scalar output into the activation
  437. of the unit. The layers are staggered in the neural network, with each layer
  438. being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
  439. input vector $\boldsymbol{x}$ is provided to each unit of the first layer
  440. $f^{(1)}$, which then gives the results to the units of the second layer
  441. $f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
  442. intervening layers, situated between the first and the output layers are the
  443. \emph{hidden layers}. The term \emph{forward propagation} describes the
  444. process of information flowing through the network from the input layer to the
output layer, resulting in the model output. The alternating structure of linear
and nonlinear calculations enables MLPs to approximate a wide range of functions. As Hornik
\etal~\cite{Hornik1989} prove, MLPs are universal approximators.\\
  448. \begin{figure}[t]
  449. \centering
  450. \includegraphics[width=\textwidth]{MLP.pdf}
\caption{An illustration of an MLP~\cite{Rumelhart1986} with two hidden layers.
  452. Each neuron of a layer is connected to every neuron of the neighboring
  453. layers. The arrow indicates the direction of the forward propagation.}
  454. \label{fig:mlp_example}
  455. \end{figure}
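The following PyTorch sketch shows how an MLP with two hidden layers, as
depicted in~\Cref{fig:mlp_example}, could be defined; the layer sizes and the
choice of the tanh activation function are illustrative assumptions.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative MLP with two hidden layers; all sizes are arbitrary choices.
class MLP(nn.Module):
    def __init__(self, dim_in=1, dim_hidden=32, dim_out=1):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(dim_in, dim_hidden), nn.Tanh(),      # f^(1)
            nn.Linear(dim_hidden, dim_hidden), nn.Tanh(),  # f^(2)
            nn.Linear(dim_hidden, dim_out),                # output layer f^(3)
        )

    def forward(self, x):
        return self.layers(x)       # forward propagation

model = MLP()
x = torch.rand(8, 1)                # batch of eight input vectors
y_hat = model(x)                    # model predictions
print(y_hat.shape)                  # torch.Size([8, 1])
\end{verbatim}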
  456. The term \emph{training} describes the process of optimizing the parameters
  457. $\theta$. In order to undertake training, it is necessary to have a set of
  458. \emph{training data}, which is a set of pairs (also called training points) of
  459. the input data $\boldsymbol{x}$ and its corresponding true solution
  460. $\boldsymbol{y}$ of the function $f^{*}$. For the training process we must
  461. define a \emph{loss function} $\Loss{ }$, using the model prediction
  462. $\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which will act as a
  463. metric for evaluating the extent to which the model deviates from the correct
  464. answer. One common loss function is the \emph{mean square error}
  465. (MSE) loss function. Let $N$ be the number of points in the set of training
  466. data. Then,
  467. \begin{equation} \label{eq:mse}
  468. \Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
  469. \end{equation}
calculates the squared difference between the model prediction and the true value
of each training point and takes the mean across the whole training data set.\\
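As a minimal illustration,~\Cref{eq:mse} translates directly into code; the
prediction and target values below are placeholders.
\begin{verbatim}
import torch

# Illustrative MSE computation over N = 3 training points.
y_hat = torch.tensor([[0.9], [2.1], [2.8]])   # model predictions
y     = torch.tensor([[1.0], [2.0], [3.0]])   # true values
mse = ((y_hat - y) ** 2).sum(dim=1).mean()    # squared error, averaged over N
print(mse)                                    # tensor(0.0200)
\end{verbatim}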
  472. Ultimately, the objective is to utilize this information to optimize the parameters, in order to minimize the
loss. One of the most fundamental and seminal optimization strategies is \emph{gradient
descent}. In this process, the derivatives are employed to identify the location
of local or global minima within a function, which lie where the gradient is
zero. Since the gradient points in the direction of the steepest ascent, we must
move the parameters by a \emph{learning rate} (step size) in the opposite
direction to that of the gradient. The calculation of the derivatives with respect
to the parameters is a complex task, since our function is a composition of
many functions (one for each layer). We can address this issue by taking advantage
  482. of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
  483. $\hat{\boldsymbol{y}} = f(\boldsymbol{x}; \theta)$ be the model prediction with the
  484. decomposed version $f(\boldsymbol{x}; \theta) = f^{(3)}(w; \theta_3)$ with
  485. $w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$.
  486. $\boldsymbol{x}$ is the input vector and $\theta_3, \theta_2, \theta_1\subset\theta$.
  487. Then,
  488. \begin{equation}\label{eq:backprop}
  489. \nabla_{\theta_3} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(3)}}\nabla_{\theta_3}f^{(3)},
  490. \end{equation}
is the gradient of $\Loss{ }$ with respect to the parameters $\theta_3$. To obtain
$\nabla_{\theta_2} \Loss{ }$, we extend the chain rule by one further factor
through $f^{(2)}$, and proceed analogously for $\nabla_{\theta_1} \Loss{ }$. The
name of this method in the context of neural networks is
\emph{back propagation}~\cite{Rumelhart1986}, as it propagates the
error backwards through the neural network.\\
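The following sketch illustrates a single gradient descent step in PyTorch, in
which back propagation provides the gradients with respect to the parameters;
the network, the training data, and the learning rate are illustrative
placeholders.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative single gradient descent step using back propagation.
model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
x = torch.rand(32, 1)                  # training inputs
y = torch.sin(x)                       # stand-in for the true values f*(x)

lr = 1e-2                              # learning rate (step size)
loss = ((model(x) - y) ** 2).mean()    # MSE loss
loss.backward()                        # back propagation: gradients w.r.t. theta

with torch.no_grad():
    for p in model.parameters():
        p -= lr * p.grad               # move against the gradient
        p.grad = None                  # reset the gradient for the next step
\end{verbatim}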
  496. In practical applications, an optimizer often accomplishes the optimization task
  497. by executing back propagation in the background. Furthermore, modifying the
learning rate during training can be advantageous, for instance, making larger
steps at the beginning and minor adjustments at the end. Therefore, schedulers
are algorithms that implement diverse learning rate adaptation
strategies.\\
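In PyTorch, for example, such a setup could look as follows; the choice of the
Adam optimizer, the step scheduler, and all hyperparameter values are
illustrative assumptions.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative training loop with an optimizer and a learning rate scheduler.
model = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))
x, y = torch.rand(64, 1), torch.rand(64, 1)   # placeholder training data

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500, gamma=0.5)

for epoch in range(2000):
    optimizer.zero_grad()
    loss = ((model(x) - y) ** 2).mean()       # data loss
    loss.backward()                           # back propagation
    optimizer.step()                          # parameter update
    scheduler.step()                          # halve the learning rate every 500 epochs
\end{verbatim}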
  502. For a more in-depth discussion of practical considerations and additional
  503. details like regularization, we direct the reader to the book
  504. \emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next
  505. section will demonstrate the application of neural networks in approximating
  506. solutions to differential systems.
  507. % -------------------------------------------------------------------
  508. \section{Physics-Informed Neural Networks}
  509. \label{sec:pinn}
In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
widely recognized tools for approximating functions. In 1997,
Lagaris \etal~\cite{Lagaris1998} provided a method that utilizes gradient
  513. descent to solve ODEs and PDEs. Building on this approach, Raissi
  514. \etal~\cite{Raissi2019} introduced the methodology with the name
  515. \emph{Physics-Informed Neural Network} (PINN) in 2017. In this approach, the
  516. model learns to approximate a function using provided data points while
  517. leveraging the available knowledge about the problem in the form of a system of
  518. differential equations.\\
  519. In contrast to standard MLP models, PINNs are not solely data-driven. The differential
  520. equation,
  521. \begin{equation}
  522. \boldsymbol{y}=\mathcal{D}(\boldsymbol{x}),
  523. \end{equation}
includes both the solution $\boldsymbol{y}$ and the differential operator $\mathcal{D}$,
  525. which incorporates all derivatives with respect to the input $\boldsymbol{x}$.
  526. This equation contains the information of the physical properties and dynamics of
  527. $\boldsymbol{y}$. In order to find the solution $\boldsymbol{y}$, we must solve the
  528. differential equation with respect to data which is related to the problem at hand. As
  529. Raissi \etal~\cite{Raissi2019} propose, we employ a neural network with the
  530. parameters $\theta$. The MLP then is supposed to optimize its parameters for
  531. its output $\hat{\boldsymbol{y}}$ to approximate the solution $\boldsymbol{y}$.
  532. In order to achieve this, we train the model on data containing input-output
  533. pairs with measures of $\boldsymbol{y}$. The output $\hat{\boldsymbol{y}}$ is
  534. fitted to the data through the mean square error data loss $\mathcal{L}_{\text{data}}$.
  535. Moreover, the data loss function may include additional terms for initial and boundary
conditions. Furthermore, the physics is incorporated through an additional loss
term, the physics loss $\mathcal{L}_{\text{physics}}$, which includes the
differential equation through its residual $r=\boldsymbol{y} - \mathcal{D}(\boldsymbol{x})$.
  539. This leads to the PINN loss function,
  540. \begin{align}\label{eq:PINN_loss}
  541. \mathcal{L}_{\text{PINN}}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) & = & & \mathcal{L}_{\text{data}} (\boldsymbol{y},\hat{\boldsymbol{y}}) & + & \quad \mathcal{L}_{\text{physics}} (\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) & \\
  542. & = & & \frac{1}{N_t}\sum_{i=1}^{N_t} || \hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2 & + & \quad\frac{1}{N_d}\sum_{i=1}^{N_d} || r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 & ,
  543. \end{align}
  544. with $N_d$ the number of differential equations in a system and $N_t$ the
  545. number of training samples used for training. Utilizing~\Cref{eq:PINN_loss}, the
  546. PINN simultaneously optimizes its parameters $\theta$ to minimize both the data
  547. loss and the physics loss. This makes it a multi-objective optimization problem.\\
  548. Given the nature of differential equations, calculating the loss term of
  549. $\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
  550. calculation of the derivative of the output with respect to the input of
  551. the neural network. As we outline in~\Cref{sec:mlp}, during the process of
back propagation we calculate the gradients of the loss term with respect to a
layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
  554. the index of the respective layer. By employing
  555. the chain rule of calculus, the algorithm progresses from the output layer
  556. through each hidden layer, ultimately reaching the first layer in order to
  557. compute the respective gradients. The term,
  558. \begin{equation}
  559. \nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
  560. \end{equation}
illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
this procedure, called \emph{automatic differentiation}, goes one step further and
calculates the gradient of the output with respect to the input
$\boldsymbol{x}$. In order to calculate the second derivative
$\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
this procedure must be repeated.\\
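The following PyTorch sketch illustrates both ingredients for the toy ODE
$\frac{dy}{dx} = -y$: automatic differentiation provides the derivative of the
network output with respect to its input, from which the residual and the loss
of~\Cref{eq:PINN_loss} are assembled. The toy equation, the synthetic data, and
the network are illustrative assumptions and not the implementation used in this
thesis.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative PINN loss for the toy ODE dy/dx = -y (an assumed example).
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

x = torch.linspace(0, 2, 50).reshape(-1, 1).requires_grad_(True)
y = torch.exp(-x.detach())                 # synthetic measurements of the solution

y_hat = model(x)
# automatic differentiation: derivative of the output w.r.t. the input x
dy_dx = torch.autograd.grad(y_hat, x, torch.ones_like(y_hat), create_graph=True)[0]
# a second derivative would repeat this call on dy_dx

residual     = dy_dx + y_hat               # residual of dy/dx = -y
loss_data    = ((y_hat - y) ** 2).mean()
loss_physics = (residual ** 2).mean()
loss_pinn    = loss_data + loss_physics    # multi-objective PINN loss
loss_pinn.backward()                       # gradients w.r.t. the parameters theta
\end{verbatim}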
Above we present a method by Raissi \etal~\cite{Raissi2019} for approximating
  568. functions through the use of systems of differential equations. As previously
  569. stated, we want to find a
  570. solution for systems of differential equations. In problems where we must solve
  571. an ODE or PDE, we have to find a set of parameters that satisfies the system
  572. for any input $\boldsymbol{x}$. In the context of PINNs, this is an inverse
  573. problem. We have training data from measurements and the corresponding
  574. differential equations, but the parameters of these equations are unknown. To
  575. address this challenge, we implement these parameters as distinct learnable
  576. parameters within the neural network. This enables the network to utilize a
  577. specific value, that actively influences the physics loss
  578. $\mathcal{L}_{\text{physics}}(\boldsymbol{x},\hat{\boldsymbol{y}})$. During the
  579. training phase the optimizer aims to minimize the physics loss, which should
  580. ultimately yield an approximation of the true parameter value fitting the
  581. observations.\\
  582. \begin{figure}[t]
  583. \centering
  584. \includegraphics[width=\textwidth]{oscilator.pdf}
\caption{Illustration of the movement of an oscillating body in the
underdamped case, with $m=1\,kg$, $\mu=4\,\frac{Ns}{m}$ and $k=200\,\frac{N}{m}$.}
  587. \label{fig:spring}
  588. \end{figure}
  589. In order to illustrate the working of a PINN, we use the example of a
  590. \emph{damped harmonic oscillator} taken from~\cite{Moseley}. In this problem, we
  591. displace a body, which is attached to a spring, from its resting position. The
  592. body is subject to three forces: firstly, the inertia exerted by the
  593. displacement $u$, which points in the direction of the displacement; secondly,
  594. the restoring force of the spring, which attempts to return the body to its
  595. original position and thirdly, the friction force, which points in the opposite
  596. direction of the movement. In accordance with Newton's second law and the
  597. combined influence of these forces, the body exhibits oscillatory motion around
  598. its position of rest. The system is influenced by $m$ the mass of the body,
  599. $\mu$ the coefficient of friction and $k$ the spring constant, indicating the
stiffness of the spring. The differential equation,
\begin{equation}
m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
\end{equation}
shows the relation of these parameters in reference to the problem at hand. As
Tenenbaum and Pollard~\cite{Tenenbaum1985} show, there are three possible cases of
solutions to this equation. However, only the \emph{underdamped case} results in an
oscillating movement of the body, as illustrated in~\Cref{fig:spring}. In order to
apply a PINN to this problem, we require a set of training data. This consists of
pairs of time points and corresponding displacement measurements
$(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In this hypothetical case,
we know the mass $m=1\,kg$ and the spring constant $k=200\,\frac{N}{m}$, as well as
the initial conditions $u(0) = 1$ and $\frac{du}{dt}(0) = 0$. However, we do
not know the value of the friction coefficient $\mu$. In this case the loss function,
  614. \begin{equation}
  615. \begin{split}
\mathcal{L}_{\text{osc}}(\boldsymbol{t}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = & (\hat{u}(0)-1)^2+\Big(\frac{d\hat{u}}{dt}(0)\Big)^2+ \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{u}}^{(i)}-\boldsymbol{u}^{(i)}||^2 \\
+ & ||m\frac{d^2\hat{u}}{dt^2}+\hat{\mu}\frac{d\hat{u}}{dt}+k\hat{u}||^2,
  618. \end{split}
  619. \end{equation}
includes the initial conditions, the residual, in which $\hat{\mu}$ is a learnable
parameter, and the data loss. By minimizing $\mathcal{L}_{\text{osc}}$ and
  622. solving the inverse problem the PINN is able to find the missing parameter
  623. $\mu$. This shows the methodology by which PINNs are capable of learning the
  624. parameters of physical systems, such as the damped harmonic oscillator. In the
  625. following section, we present the approach of Shaier \etal~\cite{Shaier2021} to
  626. find the transmission rate and recovery rate of the SIR model using PINNs.
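Before turning to that approach, the following sketch outlines how the
oscillator example could be implemented; the network architecture, the
synthetically generated measurements, and the training settings are illustrative
assumptions, and the initial-condition terms of $\mathcal{L}_{\text{osc}}$ are
omitted for brevity.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative inverse problem for the damped oscillator (assumed setup).
m, k = 1.0, 200.0                              # known mass and spring constant
model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
mu_hat = nn.Parameter(torch.tensor(0.0))       # learnable friction coefficient

t = torch.linspace(0, 1, 200).reshape(-1, 1).requires_grad_(True)
# synthetic measurements of the underdamped solution for mu = 4 (placeholder data)
t_d = t.detach()
u = torch.exp(-2.0 * t_d) * (torch.cos(14.0 * t_d) + torch.sin(14.0 * t_d) / 7.0)

optimizer = torch.optim.Adam(list(model.parameters()) + [mu_hat], lr=1e-3)
for epoch in range(5000):
    optimizer.zero_grad()
    u_hat = model(t)
    du  = torch.autograd.grad(u_hat, t, torch.ones_like(u_hat), create_graph=True)[0]
    d2u = torch.autograd.grad(du, t, torch.ones_like(du), create_graph=True)[0]
    residual = m * d2u + mu_hat * du + k * u_hat   # physics residual, learnable mu
    loss = ((u_hat - u) ** 2).mean() + (residual ** 2).mean()
    loss.backward()
    optimizer.step()

print(f"estimated friction coefficient: {mu_hat.item():.2f}")
\end{verbatim}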
  627. % -------------------------------------------------------------------
  628. \subsection{Disease-Informed Neural Networks}
  629. \label{sec:pinn:dinn}
In the preceding section, we present a methodology, based on the works of Lagaris
\etal~\cite{Lagaris1998} and Raissi \etal~\cite{Raissi2019}, for solving systems of
differential equations by employing PINNs. In~\Cref{sec:pandemicModel:sir}, we
describe the SIR model, which models
  633. the relations of susceptible, infectious and removed individuals and simulates
  634. the progress of a disease in a population with a constant size. A system of
  635. differential equations models these relations. Shaier \etal~\cite{Shaier2021}
  636. propose a method to solve the equations of the SIR model using a PINN, which
  637. they call a \emph{Disease-Informed Neural Network} (DINN).\\
  638. To solve~\Cref{eq:sir} we need to find the transmission rate $\beta$ and the
  639. recovery rate $\alpha$. As Shaier \etal~\cite{Shaier2021} point out, there are
different approaches to solve this set of equations. For instance, building on
the assumption that at the beginning one infected individual infects $n$ other
people, it follows that $\frac{dS}{dt}(0) = -n$. Then,
  643. \begin{equation}
\beta=-\frac{\frac{dS}{dt}(0)}{S_0I_0}
  645. \end{equation}
would calculate the initial transmission rate using the initial size of the
susceptible group $S_0$ and the infectious group $I_0$. The recovery rate could
then be defined using the number of days $d$ a person remains infectious between
the point of infection and the start of isolation, $\alpha = \frac{1}{d}$. Such
analytical solutions of the SIR model often rely on heuristics and require
knowledge like the sizes $S_0$ and $I_0$. A data-driven approach such as the one
Shaier \etal~\cite{Shaier2021} propose does not suffer from these problems, since
the model learns the parameters $\beta$ and $\alpha$ while fitting the training
data, which consists of the time points $\boldsymbol{t}$ and the corresponding
measured sizes of the groups $\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}$.
  656. Let $\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}$ be the
  657. model predictions of the groups and
  658. $r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}},
  659. r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\alpha \hat{\boldsymbol{I}}$
  660. and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \alpha \hat{\boldsymbol{I}}$ the
  661. residuals of each differential equation using the model predictions. Then,
  662. \begin{equation}
  663. \begin{split}
  664. \mathcal{L}_{SIR}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = &||r_S||^2 + ||r_I||^2 + ||r_R||^2\\
  665. + &\frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}||^2 + ||\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}||^2 + ||\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}||^2,
  666. \end{split}
  667. \end{equation}
  668. is the loss function of a DINN, with $\alpha$ and $\beta$ being learnable
  669. parameters. These are represented in the residuals of the ODEs.
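A compact sketch of this loss could look as follows; the network, the
placeholder data, and the optimizer settings are illustrative assumptions and do
not reproduce the implementation of Shaier \etal~\cite{Shaier2021}.
\begin{verbatim}
import torch
import torch.nn as nn

# Illustrative DINN loss with learnable transmission and recovery rates.
model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 3))  # S, I, R
beta_hat  = nn.Parameter(torch.tensor(0.1))     # learnable transmission rate
alpha_hat = nn.Parameter(torch.tensor(0.1))     # learnable recovery rate

t = torch.linspace(0, 1, 100).reshape(-1, 1).requires_grad_(True)
# placeholders for the measured group sizes
S_true, I_true, R_true = torch.rand(100, 1), torch.rand(100, 1), torch.rand(100, 1)

def grad(y):
    return torch.autograd.grad(y, t, torch.ones_like(y), create_graph=True)[0]

out = model(t)
S_hat, I_hat, R_hat = out[:, 0:1], out[:, 1:2], out[:, 2:3]
r_S = grad(S_hat) + beta_hat * S_hat * I_hat                       # residual of dS/dt
r_I = grad(I_hat) - beta_hat * S_hat * I_hat + alpha_hat * I_hat   # residual of dI/dt
r_R = grad(R_hat) - alpha_hat * I_hat                              # residual of dR/dt

loss_physics = (r_S ** 2).mean() + (r_I ** 2).mean() + (r_R ** 2).mean()
loss_data    = ((S_hat - S_true) ** 2 + (I_hat - I_true) ** 2
                + (R_hat - R_true) ** 2).mean()
loss_dinn    = loss_physics + loss_data         # DINN loss

optimizer = torch.optim.Adam(list(model.parameters()) + [beta_hat, alpha_hat], lr=1e-3)
loss_dinn.backward()                            # one optimization step
optimizer.step()
\end{verbatim}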
  670. % -------------------------------------------------------------------