% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap02/chap02.tex
% Part: theoretical background
% Description:
%    summary of the content in this chapter
% Version: 20.08.2024
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Theoretical Background}
\label{chap:background}
This chapter introduces the theoretical foundations for the work presented in
this thesis. In~\Cref{sec:domain} and~\Cref{sec:differentialEq}, we describe
differential equations and the underlying theory. In these sections, both the
explanations and the approach are based on a book on analysis by
Rudin~\cite{Rudin2007} and a book on ordinary differential equations by
Tenenbaum and Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this
knowledge to examine various pandemic models in~\Cref{sec:epidemModel}. Finally,
we address the topic of neural networks with a focus on the multilayer
perceptron in~\Cref{sec:mlp} and physics-informed neural networks
in~\Cref{sec:pinn}.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Functions}
\label{sec:domain}
To model a physical problem mathematically, it is necessary to define a set of
fundamental numbers or quantities upon which the subsequent calculations will be
based. These sets may represent, for instance, a specific time interval or a
distance. The term \emph{domain} describes these fundamental sets of numbers or
quantities~\cite{Rudin2007}. A \emph{variable} is a changing entity that takes
its values in a certain domain. In this thesis, we focus on domains of real
numbers in $\mathbb{R}$.\\
The mapping between variables enables the modeling of a physical process and may
capture its semantics. We use functions to facilitate this mapping. Let
$A, B\subset\mathbb{R}$ be two subsets of the real numbers. Then we define a
function as the mapping
\begin{equation}
f: A\rightarrow B.
\end{equation}
In other words, the function $f$ maps elements $x\in A$ to values
$f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
of $f$. Functions are capable of representing the state of a system as a value
based on an input value from their domain. One illustrative example is a
function that maps a time step to the distance covered since a starting point.
In this case, time serves as the domain, while the distance is the codomain.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Differential Equations}
\label{sec:differentialEq}
Often, the behavior of a variable or a quantity across a domain is more
interesting than its current state. Functions are able to give us the latter,
but only passively give information about the change of a system. The objective
is to determine an effective method for calculating the change of a function
across its domain. Let $f$ be a function and $[a, b]\subset \mathbb{R}$ an
interval of real numbers. The expression
\begin{equation}
m = \frac{f(b) - f(a)}{b-a}
\end{equation}
gives the average rate of change. While the average rate of change is useful in
many cases, the momentary rate of change is more accurate. To calculate it,
we need to narrow the interval down to an infinitesimal. For each $x\in[a, b]$
we calculate
\begin{equation} \label{eqn:differential}
\frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
\end{equation}
if it exists. As Tenenbaum and Pollard~\cite{Tenenbaum1985} define,
$\nicefrac{df}{dx}$ is the \emph{derivative}, which is ``the rate of change of a
variable with respect to another''. The relation between a variable and its
derivative is modeled in a \emph{differential equation}. Differentiating
$\nicefrac{df}{dx}$ yields $\nicefrac{d^2f}{dx^2}$, the function that
calculates the rate of change of the rate of change and is called the
\emph{second order derivative}. Iterating this $n$ times results in
$\nicefrac{d^nf}{dx^n}$, the derivative of $n$-th order. A method for
obtaining a differential equation is to derive it from the semantics of a
problem. For example, in physics a differential equation can be derived from the
law of the conservation of energy~\cite{Demtroeder2021}. Differential equations
find application in several areas, such as engineering, \eg, Chua's
circuit~\cite{Matsumoto1984}, physics, \eg, the Schrödinger
equation~\cite{Schroedinger1926}, economics, \eg, the Black-Scholes
equation~\cite{Oksendal2000}, epidemiology, and beyond.\\
In the context of functions, it is possible to have multiple domains, meaning
that a function has more than one parameter. To illustrate, consider a function
operating in two-dimensional space, wherein each parameter represents one axis.
Another example would be a function that maps a location variable
and a time variable to a height. The term \emph{partial differential equations}
(\emph{PDEs}) describes differential equations of such functions, which contain
partial derivatives with respect to each individual domain. In contrast, \emph{ordinary differential
equations} (\emph{ODEs}) are the single derivatives of a function having only
one domain~\cite{Tenenbaum1985}. In this thesis, we restrict ourselves to ODEs.\\
A \emph{system of differential equations} is the name for a set of differential
equations. The derivatives in a system of differential equations each have their
own codomain, which is part of the problem, while they all share the same
domain.\\
Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
including the \emph{Motion of a Particle Along a Straight Line}. Newton's
second law states that ``the rate of change of the momentum of a body
($\text{momentum} = \text{mass} \cdot \text{velocity}$) is proportional to the resultant external
force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
Then, Newton's second law translates mathematically to
\begin{equation} \label{eq:newtonSecLaw}
F = m\frac{dv}{dt}.
\end{equation}
It is evident that the acceleration $a=\frac{dv}{dt}$, being the rate of change of
the velocity, is part of the equation. Additionally, the velocity of a body is
the derivative of the distance $s$ traveled by that body. Based on these findings,
we can rewrite~\Cref{eq:newtonSecLaw} as
\begin{equation}
F=ma=m\frac{d^2s}{dt^2}.
\end{equation}
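As a brief illustration, such an ODE can also be solved symbolically. The
following minimal sketch assumes the SymPy library is available and solves the
equation above for a constant force; all names are illustrative:
\begin{verbatim}
# Minimal sketch (assumption: SymPy is available); solves
# m * d^2s/dt^2 = F symbolically for a constant force F.
import sympy as sp

t, m, F = sp.symbols('t m F', positive=True)
s = sp.Function('s')
ode = sp.Eq(m * s(t).diff(t, 2), F)
sol = sp.dsolve(ode, s(t))
# sol: Eq(s(t), C1 + C2*t + F*t**2/(2*m))
\end{verbatim}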
To conclude, note that this explanation of differential equations focuses on the
aspects deemed crucial for this thesis and is not intended to be a complete
treatment of the subject. For a deeper understanding, we recommend
the books mentioned above~\cite{Rudin2007,Tenenbaum1985}. In the following
section, we describe the application of these principles in epidemiological
models.
% -------------------------------------------------------------------
\section{Epidemiological Models}
\label{sec:epidemModel}
Pandemics like \emph{COVID-19} have resulted in a significant
number of fatalities. Hence, the question arises: How should we analyze a
pandemic effectively? It is essential to study whether the employed
countermeasures are efficacious in combating the pandemic. Given the unfavorable
public response to measures such as lockdowns, it is imperative to verify
that their efficacy remains commensurate with the costs incurred by those
affected. In the event that alternative and novel technologies were in use, such
as the mRNA vaccines in the context of COVID-19, it is necessary to test the
effect and find the optimal variant. In order to shed light on the
aforementioned events, we need a method to quantify the pandemic along with its
course of progression.\\
The real world is a highly complex system, which presents a significant
challenge when attempting to describe it fully in a mathematical model. Therefore,
the model must reduce the complexity while retaining the essential information.
Furthermore, it must address the issue of limited data availability. For
instance, during COVID-19, institutions such as the Robert Koch Institute
(RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
able to collect data on infections and mortality cases. Consequently, we require
a model that employs an abstraction of the real world to illustrate the events
and relations that are pivotal to understanding the problem.
% -------------------------------------------------------------------
\subsection{SIR Model}
\label{sec:pandemicModel:sir}
In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR model},
which subsequently became one of the most influential epidemiological models.
This model enables the modeling of infections during epidemiological events such as pandemics.
The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
reiterates the model and serves as the foundation for the following explanation
of SIR models.\\
The SIR model is capable of describing diseases that are transmitted through
contact or proximity between an individual carrying the illness and a healthy
individual. This is possible due to the distinction between infected individuals
who are carriers of the disease and the part of the population which is
susceptible to infection. In the model, individuals may move between the
mentioned groups, \eg, healthy individuals becoming infected. The model assumes that the
size $N$ of the population remains constant throughout the duration of the
pandemic. The population $N$ comprises three distinct compartments: the
\emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
\emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
\mathbb{R}_{\geq0}$ be the time span of the pandemic. Then,
\begin{equation}
S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
\end{equation}
give the values of $S$, $I$ and $R$ at a certain point of time
$t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, it holds that
\begin{equation} \label{eq:N_char}
N = S + I + R.
\end{equation}
The model makes another assumption by stating that recovered people are immune
to the illness and infectious individuals cannot infect them. The individuals
in the $R$ group are either recovered or deceased, and thus unable to transmit
or carry the disease.
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{sir_graph.pdf}
\caption{A visualization of the SIR model, illustrating $N$ being split into the
three groups $S$, $I$ and $R$.}
\label{fig:sir_model}
\end{figure}
As visualized in~\Cref{fig:sir_model}, individuals may transition between
groups based on transition rates. The
transmission rate $\beta$ is responsible for individuals becoming infected,
while the rate of removal or recovery rate $\alpha$ (also referred to as
$\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
individuals from $I$ to $R$.\\
We can describe this problem mathematically using a system of differential
equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
McKendrick~\cite{1927} propose the following set of differential equations:
\begin{equation}\label{eq:sir}
\begin{split}
\frac{dS}{dt} &= -\beta S I,\\
\frac{dI}{dt} &= \beta S I - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
This set of differential equations is based on the following assumption:
``The rate of transmission of a microparasitic disease is proportional to the
rate of encounter of susceptible and infective individuals modelled by the
product ($\beta S I$)'', according to Edelstein-Keshet~\cite{EdelsteinKeshet2005}.
The system shows the change in size of the groups per time unit due to
infections, recoveries, and deaths.\\
The term $\beta SI$ describes the rate of encounters of susceptible and infected
individuals. This term is dependent on the size of $S$ and $I$; thus Anderson
and May~\cite{Anderson1991} propose a modified model:
\begin{equation}\label{eq:modSIR}
\begin{split}
\frac{dS}{dt} &= -\beta \frac{SI}{N},\\
\frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
In~\Cref{eq:modSIR}, the term $\beta SI$ is normalized by $N$, which reflects
real-world contact processes more accurately~\cite{Anderson1991}.\\
The initial phase of a pandemic is characterized by the infection of a small
number of individuals, while the majority of the population remains susceptible.
At this stage, no individuals have yet recovered or died, so the removed group
is empty. Let $I_0\in\mathbb{N}$ be
the number of infected individuals at the beginning of the disease. Then,
\begin{equation}\label{eq:startCond}
\begin{split}
S(0) &= N - I_{0},\\
I(0) &= I_{0},\\
R(0) &= 0,
\end{split}
\end{equation}
describes the initial configuration of a system in which a disease has just
emerged.\\
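To make the dynamics concrete, the following minimal sketch integrates
\Cref{eq:modSIR} numerically from the initial configuration in
\Cref{eq:startCond}. It assumes NumPy and SciPy are available and reuses the
parameter values of the synthetic-data experiment in~\Cref{fig:synth_sir}; all
names are illustrative:
\begin{verbatim}
# Minimal sketch (assumptions: NumPy/SciPy; values taken from the
# synthetic-data setup): integrate the normalized SIR model.
import numpy as np
from scipy.integrate import solve_ivp

N = 7.9e6                  # constant population size
I0 = 10                    # initially infected individuals
alpha, beta = 0.35, 0.5    # recovery and transmission rates

def sir_rhs(t, y):
    S, I, R = y
    dS = -beta * S * I / N
    dI = beta * S * I / N - alpha * I
    dR = alpha * I
    return [dS, dI, dR]

sol = solve_ivp(sir_rhs, (0.0, 150.0), [N - I0, I0, 0.0],
                t_eval=np.linspace(0.0, 150.0, 151))
S, I, R = sol.y            # e.g., I.argmax() locates the peak of I
\end{verbatim}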
\begin{figure}[h]
%\centering
\setlength{\unitlength}{1cm} % Set the unit length for coordinates
\begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
% reference
\put(0, 1.75){
\begin{subfigure}{0.4\textwidth}
\centering
\includegraphics[width=\textwidth]{reference_params_synth.png}
\caption{$\alpha=0.35$, $\beta=0.5$}
\label{fig:synth_norm}
\end{subfigure}
}
% 1. row, 1. image (low beta)
\put(5.5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_beta_synth.png}
\caption{$\alpha=0.35$, $\beta=0.4$}
\label{fig:synth_low_beta}
\end{subfigure}
}
% 1. row, 2. image (high beta)
\put(9.5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_beta_synth.png}
\caption{$\alpha=0.35$, $\beta=0.6$}
\label{fig:synth_high_beta}
\end{subfigure}
}
% 2. row, 1. image (low alpha)
\put(5.5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_alpha_synth.png}
\caption{$\alpha=0.25$, $\beta=0.5$}
\label{fig:synth_low_alpha}
\end{subfigure}
}
% 2. row, 2. image (high alpha)
\put(9.5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_alpha_synth.png}
\caption{$\alpha=0.45$, $\beta=0.5$}
\label{fig:synth_high_alpha}
\end{subfigure}
}
\end{picture}
\caption{Synthetic data, generated using~\Cref{eq:modSIR} with $N=7.9\cdot 10^6$ and $I_0=10$ for different sets of parameters.
We visualize the case with the reference parameters in (a). In (b) and (c) we keep $\alpha$ constant, while varying
the value of $\beta$. In contrast, (d) and (e) have varying values of $\alpha$.}
\label{fig:synth_sir}
\end{figure}
In the SIR model, the temporal occurrence and the height of the peak (or peaks)
of the infectious group are of paramount importance for understanding the
dynamics of a pandemic. A low peak occurring at a late point in time indicates
that the disease is unable to keep pace with the rate of recovery, resulting
in its demise before it can exert a significant influence on the population. In
contrast, an early and high peak means that the disease is rapidly transmitted
through the population, with a significant proportion of individuals having been
infected.~\Cref{fig:synth_sir} illustrates this effect by varying the values of
$\beta$ or $\alpha$ while simulating a pandemic using a model such
as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
When the number of infections exceeds the number of recoveries, the peak of $I$
will occur early and will be high. On the other hand, if recoveries occur at a
faster rate than new infections, the peak will occur later and will be low. Thus,
it is crucial to know both $\beta$ and $\alpha$, as these parameters
characterize how the pandemic evolves.\\
The SIR model makes a number of assumptions that are intended to reduce the
model's overall complexity at the cost of increasing its divergence from
actual reality. One such assumption is that the size of the population, $N$,
remains constant, as the daily change is negligible relative to the total population.
This depiction is not an accurate representation of the actual relations
observed in the real world, as the size of a population is subject to a number
of factors that can contribute to change. The population is increased by the
occurrence of births and decreased by the occurrence of deaths. Other examples
are the impossibility for individuals to be susceptible again after having
recovered, or the possibility for the transition rates to change due to new
variants or the implementation of new countermeasures. We address the latter
issue in~\Cref{sec:pandemicModel:rsir}.
% -------------------------------------------------------------------
\subsection{Reduced SIR Model and the Reproduction Number}
\label{sec:pandemicModel:rsir}
\Cref{sec:pandemicModel:sir} presents the classical SIR model. This model
contains two scalar parameters $\beta$ and $\alpha$, which describe the course
of a pandemic over its duration. This is beneficial when examining the overall
pandemic; however, in the real world, disease behavior is dynamic, and the
values of the parameters $\beta$ and $\alpha$ change throughout the course of
the disease. The reason for this is events such as the implementation of
countermeasures that reduce the contact between the infectious and susceptible
individuals, the emergence of a new variant of the disease that increases its
infectivity or deadliness, or the administration of a vaccination that provides
previously susceptible individuals with immunity without ever being infected.
To address this, Millevoi \etal~\cite{Millevoi2023} introduce a model that
simultaneously reduces the size of the system of differential equations and
solves the problem of time scaling at hand.\\
First, they alter the definition of $\beta$ and $\alpha$ to be dependent on the time interval
$\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
\begin{equation}
\beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
\end{equation}
Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the time
span an infected individual requires to recover. Subsequently, at the initial time point
$t_0$, the \emph{reproduction number},
\begin{equation}
\RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
\end{equation}
represents the number of susceptible individuals that one infectious individual
infects at the onset of the pandemic. In light of the effects of $\beta$ and
$\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
pandemic is emerging and the disease spreads rapidly across the population,
with an increase in $I$ occurring at a high rate. In this scenario, the number
of removals is relatively low due to the limited number of infections
resulting from $I(t_0) \ll S(t_0)$. Conversely, $\RO < 1$ indicates that the
disease dies out before it can exert a significant influence on the population.
For instance, the reference parameters of~\Cref{fig:synth_sir},
$\beta(t_0)=0.5$ and $\alpha(t_0)=0.35$, yield $\RO\approx 1.43$, matching the
emerging pandemic visible in the figure. Nevertheless, $\RO$ does not cover
the entire time span. For this reason, Millevoi \etal~\cite{Millevoi2023}
introduce $\Rt$, which has the same interpretation as $\RO$, with the exception
that $\Rt$ is dependent on time. The time-dependent reproduction number is
defined as
\begin{equation}\label{eq:repr_num}
\Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N},
\end{equation}
on the time interval $\mathcal{T}$. This definition includes the transition
rates carrying information about the spread of the disease, as well as the
decrease of the ratio of susceptible individuals in the population. In contrast
to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
a state variable of the model, enabling the following reduction of the SIR
model.\\
\Cref{eq:N_char} allows for the calculation of the value of the group $R$ using
$S$ and $I$, via $R(t)=N-S(t)-I(t)$. Thus,
\begin{equation}
\begin{split}
\frac{dS}{dt} &= -\alpha\Rt I(t),\\
\frac{dI}{dt} &= \alpha(\Rt-1)I(t),
\end{split}
\end{equation}
is the reduction of~\Cref{eq:sir} on the time interval $\mathcal{T}$ using this
characteristic and the reproduction number $\Rt$ (see~\Cref{eq:repr_num}).
Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
extensive range of values that the SIR groups can assume. Accordingly, they
first scale the time interval $\mathcal{T}$ using its borders to calculate
the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in[0, 1]$. Subsequently, they
calculate the scaled groups,
\begin{equation}
S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
\end{equation}
using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
variable $I$ results in
\begin{equation}
\frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
\end{equation}
which is a further reduced version of~\Cref{eq:sir}. This less complex
differential equation results in a less complex solution, as it entails the
elimination of a parameter ($\beta$) and two state variables ($S$ and $R$).
The reduced SIR model is better suited to applications with scarce data,
as it requires fewer input variables.
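As a brief illustration, the following sketch integrates the reduced,
time-scaled equation for $I_s$ alone. It assumes NumPy and SciPy are
available; the course of $\Rt$ below is a hypothetical placeholder, not a
fitted quantity:
\begin{verbatim}
# Minimal sketch (assumptions: NumPy/SciPy; r_t is a hypothetical
# placeholder for R(t_s)): integrate the reduced SIR equation.
import numpy as np
from scipy.integrate import solve_ivp

t0, tf = 0.0, 150.0        # borders of the original time interval
alpha = 0.35               # recovery rate
C = 7.9e6                  # constant scaling factor (here: C = N)

def r_t(ts):
    return 1.4 * np.exp(-2.0 * ts)   # placeholder reproduction number

def reduced_rhs(ts, y):
    return [alpha * (tf - t0) * (r_t(ts) - 1.0) * y[0]]

sol = solve_ivp(reduced_rhs, (0.0, 1.0), [10 / C],
                t_eval=np.linspace(0.0, 1.0, 101))
I_s = sol.y[0]             # scaled infectious group over scaled time
\end{verbatim}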
% -------------------------------------------------------------------
\section{Multilayer Perceptron}
\label{sec:mlp}
In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
equations in systems, illustrating how they can be utilized to elucidate the
impact of a specific parameter on the system's behavior.
In~\Cref{sec:epidemModel}, we show specific applications of differential
equations in an epidemiological context. The final objective is to solve these
equations by finding a function that fits. Fitting measured data points to
approximate such a function is one of several methods to achieve this
goal. The \emph{Multilayer Perceptron} (MLP)~\cite{Rumelhart1986} is a
data-driven function approximator. In the following section, we provide a brief
overview of the structure and training of these \emph{neural networks}. For
reference, we use the book \emph{Deep Learning} by Goodfellow
\etal~\cite{Goodfellow-et-al-2016} as a foundation for our explanations.\\
The objective is to develop an approximation method for any function $f^{*}$,
which could be a mathematical function or a mapping of an input vector to the
desired output. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
is the function to approximate. In 1958,
Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, which models the concept of
a neuron in a neuroscientific sense. The perceptron takes in the input vector
$\boldsymbol{x}$, performs an operation and produces a scalar result. This model
optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
approximating a specific class of functions. Consequently, there is a necessity
for an expansion of the perceptron.\\
As Goodfellow \etal~\cite{Goodfellow-et-al-2016} describe, the solution to this issue is to decompose $f$ into
a chain structure of the form
\begin{equation} \label{eq:mlp_char}
f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
\end{equation}
This nested version of a perceptron is a multilayer perceptron. Each
sub-function, designated as $f^{(i)}$, is represented in the structure of an
MLP as a \emph{layer}, which contains a linear mapping and a nonlinear mapping
in the form of an \emph{activation function}. A multitude of
\emph{units} (also \emph{neurons}) compose each layer. A neuron performs the
same vector-to-scalar calculation as the perceptron does. Subsequently, a
nonlinear activation function transforms the scalar output into the activation
of the unit. The layers are staggered in the neural network, with each layer
being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
input vector $\boldsymbol{x}$ is provided to each unit of the first layer
$f^{(1)}$, which then gives the results to the units of the second layer
$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
intervening layers, situated between the first and the output layers, are the
\emph{hidden layers}. The term \emph{forward propagation} describes the
process of information flowing through the network from the input layer to the
output layer. The alternating structure of linear
and nonlinear calculations enables MLPs to approximate any function. As Hornik
\etal~\cite{Hornik1989} prove, MLPs are universal approximators.\\
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{MLP.pdf}
\caption{An illustration of an MLP with two hidden layers. Each neuron of a layer
is connected to every neuron of the neighboring layers. The arrow indicates
the direction of the forward propagation.}
\label{fig:mlp_example}
\end{figure}
The term \emph{training} describes the process of optimizing the parameters
$\theta$. In order to undertake training, it is necessary to have a set of
\emph{training data}, which is a set of pairs (also called training points) of
the input data $\boldsymbol{x}$ and the corresponding true solution
$\boldsymbol{y}$ of the function $f^{*}$. For the training process, we must
define a \emph{loss function} $\Loss{ }$, using the model prediction
$\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which acts as a
metric for evaluating the extent to which the model deviates from the correct
answer. One common loss function is the \emph{mean square error}
(MSE) loss function. Let $N$ be the number of points in the set of training
data. Then,
\begin{equation} \label{eq:mse}
\Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
calculates the squared difference between the model prediction and the true value
of each training point and takes the mean across the whole training data. \\
Ultimately, the objective is to utilize this information to optimize the parameters in order to minimize the
loss. One of the most fundamental optimization strategies is \emph{gradient
descent}. In this process, the derivatives are employed to identify the location
of local or global minima of a function, which lie where the gradient is
zero. Given that a positive gradient
signifies ascent and a negative gradient indicates descent, we must move the
parameters by a step, scaled by the \emph{learning rate} (step size), in the
direction opposite to the gradient. The calculation of the derivatives with respect
to the parameters is a complex task, since our function is a composition of
many functions (one for each layer). We can address this issue by taking advantage
of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
$\hat{\boldsymbol{y}} = f(\boldsymbol{x}; \theta)$ be the model prediction with the
decomposed version $f(\boldsymbol{x}; \theta) = f^{(3)}(w; \theta_3)$ with
$w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$, where
$\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2, \theta_3\subset\theta$.
Then,
\begin{equation}\label{eq:backprop}
\nabla_{\theta_3} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(3)}}\nabla_{\theta_3}f^{(3)},
\end{equation}
is the gradient of $\Loss{ }$ with respect to the parameters $\theta_3$. To obtain
$\nabla_{\theta_2} \Loss{ }$, we extend the chain of derivatives one layer
further, differentiating with respect to $\theta_2$. The name of this method in the context of neural
networks is \emph{backpropagation}~\cite{Rumelhart1986}, as it propagates the
error backwards through the neural network.\\
In practical applications, an optimizer often accomplishes the optimization task
by executing backpropagation in the background. Furthermore, modifying the
learning rate during training can be advantageous, for instance, making larger
steps at the beginning and minor adjustments at the end. Therefore, schedulers
are algorithms that implement diverse learning-rate adjustment
strategies.\\
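The following minimal sketch illustrates this training procedure, assuming the
PyTorch library; the architecture, the target function and all hyperparameters
are illustrative placeholders, not choices made in this thesis:
\begin{verbatim}
# Minimal sketch (assumption: PyTorch available): forward propagation,
# MSE loss and backpropagation for an MLP with two hidden layers.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(1, 32), nn.Tanh(),
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.linspace(0.0, 1.0, 100).unsqueeze(-1)  # training inputs
y = torch.sin(2.0 * torch.pi * x)                # stand-in for f*(x)

for epoch in range(1000):
    optimizer.zero_grad()
    y_hat = model(x)                  # forward propagation
    loss = ((y_hat - y) ** 2).mean()  # mean square error loss
    loss.backward()                   # backpropagation
    optimizer.step()                  # gradient-based update
\end{verbatim}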
For a more in-depth discussion of practical considerations and additional
details like regularization, we direct the reader to the book
\emph{Deep Learning} by Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next
section demonstrates the application of neural networks in approximating
solutions to systems of differential equations.
% -------------------------------------------------------------------
\section{Physics-Informed Neural Networks}
\label{sec:pinn}
In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
widely recognized tools for approximating any kind of function. In this
section, we apply this capability to create a solver for ODEs and PDEs,
as Lagaris \etal~\cite{Lagaris1997} describe in their paper. In this approach,
the model learns to approximate a function using provided data points while
leveraging the available knowledge about the problem in the form of a system of
differential equations. The \emph{physics-informed neural network} (PINN)
learns the system of differential equations during training, as it optimizes
its output to align with the equations.\\
In contrast to standard MLPs, PINNs are not only data-driven. The loss term of a PINN comprises two
components. The first term incorporates the aforementioned prior knowledge in the form of the equations pertinent to the problem. As Raissi
\etal~\cite{Raissi2017} propose, the residual of each differential equation in
the system must be minimized in order for the model to optimize its output in accordance with the theory.
We obtain the residual $r_i$, with $i\in\{1, ...,N_d\}$, by rearranging the differential equation and
calculating the difference between the left-hand side and the right-hand side
of the equation. $N_d$ is the number of differential equations in the system. As
Raissi \etal~\cite{Raissi2017} propose, the \emph{physics
loss} of a PINN,
\begin{equation}
\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
\end{equation}
takes the input data and the model prediction to calculate the mean square
error of the residuals. The second term, the \emph{observation loss}
$\Loss{obs}$, employs the mean square error of the distances between the
predicted and the true values for each training point. Additionally, the
observation loss may incorporate extra terms for initial and boundary conditions. Let $N_t$
denote the number of training points. Then,
\begin{equation}
\mathcal{L}_{PINN}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
represents the comprehensive loss function of a physics-informed neural network. \\
Given the nature of residuals, calculating the loss term
$\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
calculation of the derivative of the output with respect to the input of
the neural network. As we outline in~\Cref{sec:mlp}, during the process of
backpropagation we calculate the gradients of the loss term with respect to a
layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
the index of the respective layer. By employing
the chain rule of calculus, the algorithm progresses from the output layer
through each hidden layer, ultimately reaching the first layer in order to
compute the respective gradients. The term,
\begin{equation}
\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
\end{equation}
illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
this procedure, called \emph{automatic differentiation}, goes one step further and
calculates the gradient of the output with respect to the input
$\boldsymbol{x}$. In order to calculate the second derivative,
$\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
this procedure must be repeated.\\
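In a framework such as PyTorch, these input derivatives can be obtained via
automatic differentiation. A minimal sketch, reusing the hypothetical
\texttt{model} from the previous sketch, could look as follows:
\begin{verbatim}
# Minimal sketch (assumptions: PyTorch; 'model' as in the previous
# sketch): derivatives of the network output w.r.t. its input.
import torch

t = torch.linspace(0.0, 1.0, 50, requires_grad=True).unsqueeze(-1)
u = model(t)

du_dt = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                            create_graph=True)[0]
# repeating the procedure yields the second derivative
d2u_dt2 = torch.autograd.grad(du_dt, t,
                              grad_outputs=torch.ones_like(du_dt),
                              create_graph=True)[0]
\end{verbatim}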
Above, we present a method for approximating functions through the use of
systems of differential equations. As previously stated, we want to find a
solver for systems of differential equations. In problems where we must solve
an ODE or PDE, we have to find a set of parameters that satisfies the system
for any input $\boldsymbol{x}$. In the context of PINNs, this is the
inverse problem, in which a set of training data from measurements, for
example, is available along with the respective differential equations, but
information about the parameters of the equations is lacking. To address this
challenge, we set these parameters as distinct learnable parameters within the
neural network. This enables the network to utilize a specific value that
actively influences the physics loss $\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$.
During the training phase, the optimizer aims to minimize the physics loss,
which should ultimately yield an approximation of the true value.\\
\begin{figure}[h]
\centering
\includegraphics[width=\textwidth]{oscilator.pdf}
\caption{Illustration of the movement of an oscillating body in the
underdamped case, with $m=1\,\mathrm{kg}$, $\mu=4\,\frac{\mathrm{Ns}}{\mathrm{m}}$ and $k=200\,\frac{\mathrm{N}}{\mathrm{m}}$.}
\label{fig:spring}
\end{figure}
One illustrative example of a potential application for PINNs is the
\emph{damped harmonic oscillator}~\cite{Demtroeder2021}. In this problem, we
displace a body, which is attached to a spring, from its resting position. The
body is subject to three forces: firstly, the inertial force exerted by the
displacement $u$, which points in the direction of the displacement $u$; secondly,
the restoring force of the spring, which attempts to return the body to its
original position; and thirdly, the friction force, which points in the opposite
direction of the movement. In accordance with Newton's second law and the
combined influence of these forces, the body exhibits oscillatory motion around
its position of rest. The system is influenced by $m$, the mass of the body,
$\mu$, the coefficient of friction, and $k$, the spring constant, indicating the
stiffness of the spring. The differential equation,
\begin{equation}
m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
\end{equation}
shows the relation of these parameters in reference to the problem at hand. As
Tenenbaum and Pollard~\cite{Tenenbaum1985} show, there are three potential solutions to this
problem. However, only the \emph{underdamped case} results in an oscillating
movement of the body, as illustrated in~\Cref{fig:spring}. In order to apply a
PINN to this problem, we require a set of training data. This consists of
pairs of time points and corresponding displacement measurements
$(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In this hypothetical case,
we know the mass $m=1\,\mathrm{kg}$, the spring constant $k=200\,\frac{\mathrm{N}}{\mathrm{m}}$, and the
initial conditions $u(0) = 1$ and $\frac{du(0)}{dt} = 0$. However, we do
not know the value of the friction coefficient $\mu$. In this case, the loss function,
\begin{equation}
\mathcal{L}_{osc}(\boldsymbol{t}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = (\hat{u}(0)-1)^2+\left(\frac{d\hat{u}(0)}{dt}\right)^2+\left|\left|m\frac{d^2\hat{u}}{dt^2}+\mu\frac{d\hat{u}}{dt}+k\hat{u}\right|\right|^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{u}^{(i)}-u^{(i)}||^2,
\end{equation}
includes the initial conditions, the residual, in which $\mu$ is a learnable
parameter, and the observation loss.
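A possible implementation of this loss is sketched below, assuming PyTorch and
the hypothetical \texttt{model} from the previous sketches; $\mu$ is registered
as an additional learnable parameter, and the initial-condition terms are
omitted for brevity:
\begin{verbatim}
# Minimal sketch (assumptions: PyTorch; 'model' maps t -> u_hat as in
# the previous sketches): oscillator loss with learnable friction mu.
import torch

m, k = 1.0, 200.0
mu = torch.nn.Parameter(torch.tensor(1.0))   # unknown friction
optimizer = torch.optim.Adam(list(model.parameters()) + [mu], lr=1e-3)

def oscillator_loss(t_col, t_data, u_data):
    # physics loss on collocation points (t_col requires gradients)
    u = model(t_col)
    du = torch.autograd.grad(u, t_col, torch.ones_like(u),
                             create_graph=True)[0]
    d2u = torch.autograd.grad(du, t_col, torch.ones_like(du),
                              create_graph=True)[0]
    residual = m * d2u + mu * du + k * u
    loss_physics = (residual ** 2).mean()
    # observation loss on the measured pairs (t_data, u_data)
    loss_obs = ((model(t_data) - u_data) ** 2).mean()
    return loss_physics + loss_obs
\end{verbatim}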
% -------------------------------------------------------------------
\subsection{Disease-Informed Neural Networks}
\label{sec:pinn:dinn}
In~\Cref{sec:pinn}, we describe the capability of MLPs to solve systems of
differential equations. In~\Cref{sec:pandemicModel:sir}, we describe the SIR
model, which models the relations of susceptible, infectious and removed
individuals and simulates the progress of a disease in a population with a
constant size. A system of differential equations models these relations. Shaier
\etal~\cite{Shaier2021} propose a method to solve the equations of the SIR model
using a PINN, which they call a \emph{disease-informed neural network} (DINN).\\
To solve~\Cref{eq:sir}, we need to find the transmission rate $\beta$ and the
recovery rate $\alpha$. As Shaier \etal~\cite{Shaier2021} point out, there are
different approaches to solving this set of equations. For instance, building on
the assumption that at the beginning one infected individual infects $n$ other
people, resulting in $\frac{dS(0)}{dt} = -n$,
\begin{equation}
\beta=-\frac{\frac{dS(0)}{dt}}{S_0I_0}
\end{equation}
would calculate the initial transmission rate using the initial size of the
susceptible group $S_0$ and the infectious group $I_0$. The recovery rate could
then be defined using the number of days $d$ a person remains infectious between
the point of infection and the start of isolation, $\alpha = \frac{1}{d}$. Such analytical
solutions to the SIR models often use heuristic methods and require knowledge
such as the sizes $S_0$ and $I_0$. A data-driven approach such as the one that
Shaier \etal~\cite{Shaier2021} propose does not have these problems, since the
model learns the parameters $\beta$ and $\alpha$ while learning the training
data consisting of the time points $\boldsymbol{t}$ and the corresponding
measured sizes of the groups $\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}$.
Let $\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}$ be the
model predictions of the groups and
$r_S=\frac{d\hat{\boldsymbol{S}}}{dt}+\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}$,
$r_I=\frac{d\hat{\boldsymbol{I}}}{dt}-\beta \hat{\boldsymbol{S}}\hat{\boldsymbol{I}}+\alpha \hat{\boldsymbol{I}}$
and $r_R=\frac{d \hat{\boldsymbol{R}}}{dt} - \alpha \hat{\boldsymbol{I}}$ the
residuals of each differential equation using the model predictions. Then,
\begin{equation}
\mathcal{L}_{SIR}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = ||r_S||^2 + ||r_I||^2 + ||r_R||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} \left( ||\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}||^2 + ||\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}||^2 + ||\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}||^2 \right),
\end{equation}
is the loss function of a DINN, with $\alpha$ and $\beta$ being learnable
parameters.
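A sketch of this loss in code, under the same assumptions as the previous
sketches (PyTorch; \texttt{sir\_net} is a hypothetical MLP mapping a time point
to the three group predictions), might read:
\begin{verbatim}
# Minimal sketch (assumptions: PyTorch; 'sir_net' maps
# t -> (S_hat, I_hat, R_hat)): DINN loss with learnable rates.
import torch

alpha = torch.nn.Parameter(torch.tensor(0.1))
beta = torch.nn.Parameter(torch.tensor(0.1))

def dinn_loss(t, S_true, I_true, R_true):
    out = sir_net(t)                 # t has requires_grad=True
    S_hat, I_hat, R_hat = out[:, 0:1], out[:, 1:2], out[:, 2:3]
    ones = torch.ones_like(S_hat)
    dS = torch.autograd.grad(S_hat, t, ones, create_graph=True)[0]
    dI = torch.autograd.grad(I_hat, t, ones, create_graph=True)[0]
    dR = torch.autograd.grad(R_hat, t, ones, create_graph=True)[0]
    r_S = dS + beta * S_hat * I_hat        # residuals of the SIR system
    r_I = dI - beta * S_hat * I_hat + alpha * I_hat
    r_R = dR - alpha * I_hat
    loss_res = (r_S**2 + r_I**2 + r_R**2).mean()
    loss_obs = ((S_hat - S_true)**2 + (I_hat - I_true)**2
                + (R_hat - R_true)**2).mean()
    return loss_res + loss_obs
\end{verbatim}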
% -------------------------------------------------------------------