
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap02/chap02.tex
% Part: theoretical background
% Description:
%	summary of the content in this chapter
% Version: 05.08.2024
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Theoretical Background 12}
\label{chap:background}
This chapter introduces the theoretical knowledge that forms the foundation of
the work presented in this thesis. In~\Cref{sec:domain}
and~\Cref{sec:differentialEq}, we discuss differential equations and the
underlying theory. In these sections, both the explanations and the approach are
strongly based on the book on analysis by Rudin~\cite{Rudin2007} and the book
on ordinary differential equations by Tenenbaum and
Pollard~\cite{Tenenbaum1985}. Subsequently, we employ this knowledge to examine
various pandemic models in~\Cref{sec:epidemModel}.
Finally, we address the topic of neural networks with a focus on the multilayer
perceptron in~\Cref{sec:mlp} and physics-informed neural networks
in~\Cref{sec:pinn}.
% -------------------------------------------------------------------
\section{Mathematical Modelling using Functions 1}
\label{sec:domain}
To model a physical problem using mathematical tools, it is necessary to define
a set of fundamental numbers or quantities upon which the subsequent
calculations will be based. These sets may represent, for instance, a specific
time interval or a distance. The term \emph{domain} describes these fundamental
sets of numbers or quantities~\cite{Rudin2007}. A \emph{variable} is a changing
entity that lives in a certain domain. In this thesis, we focus on domains of
real numbers in $\mathbb{R}$.\\
The mapping between variables enables the modeling of a process and captures
its semantics. We use functions to facilitate this mapping. Let
$A, B\subset\mathbb{R}$ be two subsets of the real numbers; then we define a
function as the mapping
\begin{equation}
f: A\rightarrow B.
\end{equation}
In other words, the function $f$ maps elements $x\in A$ to values
$f(x)\in B$. $A$ is the \emph{domain} of $f$, while $B$ is the \emph{codomain}
of $f$. Functions are capable of representing the state of a system as a value
based on an input value from their domain. One illustrative example is a
function that maps a time point to the distance covered since a starting point.
In this case, time serves as the domain, while the distance is the codomain.
% -------------------------------------------------------------------
\section{Basics of Differential Equations 1}
\label{sec:differentialEq}
Often, the change of a system is more interesting than its current state.
Functions are able to give us the latter, but only passively provide information
about the change of a system. The objective is to determine an effective method
for calculating the change of a function across its domain. Let $f$ be a
function and $[a, b]\subset \mathbb{R}$ an interval of real numbers; then the
expression
\begin{equation}
m = \frac{f(b) - f(a)}{b-a}
\end{equation}
gives the average rate of change. While the average rate of change is useful in
many cases, the momentary rate of change is more accurate. To calculate it,
we need to narrow the interval down to an infinitesimally small one. For each $x\in[a, b]$
we calculate
\begin{equation} \label{eqn:differential}
\frac{df}{dx} = \lim_{t\to x} \frac{f(t) - f(x)}{t-x},
\end{equation}
if it exists. $\frac{df}{dx}$ is the \emph{derivative} of $f$; it returns the
momentary rate of change of $f$ for each value $x$ of $f$'s domain. An equation
that relates a function to its derivatives is called a \emph{differential
equation}. Repeating this process on $\frac{df}{dx}$ yields
$\frac{d^2f}{dx^2}$, the function that calculates the rate of change of
the rate of change, called the second-order derivative. Iterating this $n$
times results in $\frac{d^nf}{dx^n}$, the derivative of the $n$-th order.
Another method for obtaining a differential equation is to construct it from the
semantics of a problem. This method is useful if no closed-form function is known
for a system. Differential equations find application in several areas such as
engineering, physics, economics, epidemiology, and beyond.\\
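To illustrate the limit in~\Cref{eqn:differential} numerically, the following short sketch (our own illustrative example, not taken from the cited literature) approximates the derivative of $f(x)=x^2$ at $x=2$ by shrinking the interval around $x$:
\begin{verbatim}
# Minimal sketch: approximating df/dx at a point by shrinking the
# interval, mirroring the limit in the definition of the derivative.
def f(x):
    return x ** 2

def rate_of_change(f, x, h):
    # average rate of change over the interval [x, x + h]
    return (f(x + h) - f(x)) / h

for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, rate_of_change(f, 2.0, h))
# the printed values approach 4, the exact derivative of x^2 at x = 2
\end{verbatim}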
\todo{Here insert definition of differential equations (take from books)}
A function may have multiple domains, meaning
that the function has more than one parameter. To illustrate, consider a function
operating in two-dimensional space, wherein each parameter represents one axis,
or one that takes time and location as inputs. The term
\emph{partial differential equations} (\emph{PDEs}) describes differential
equations of such functions, which require a derivative for each of their
domains. In contrast, \emph{ordinary differential equations} (\emph{ODEs})
involve the derivatives of a function having only one domain. In this thesis, we
only need ODEs.\\
The term \emph{system of differential equations} refers to a set of differential
equations. The derivatives in a system of differential equations each have their
own codomain, which is part of the problem, while they all share the same
domain.\\
Tenenbaum and Pollard~\cite{Tenenbaum1985} provide many examples of ODEs,
including the \emph{Motion of a Particle Along a Straight Line}. Further,
Newton's second law states that ``the rate of change of the momentum of a body
($momentum = mass \cdot velocity$) is proportional to the resultant external
force $F$ acted upon it''~\cite{Tenenbaum1985}. Let $m$ be the mass of the body
in kilograms, $v$ its velocity in meters per second and $t$ the time in seconds.
Then, Newton's second law translates mathematically to
\begin{equation} \label{eq:newtonSecLaw}
F = m\frac{dv}{dt}.
\end{equation}
It is evident that the acceleration $a=\frac{dv}{dt}$, being the rate of change of
the velocity, is part of the equation. Additionally, the velocity of a body is
the derivative of the distance traveled by that body. Based on these findings,
we can rewrite~\Cref{eq:newtonSecLaw} as
\begin{equation}
F=ma=m\frac{d^2s}{dt^2}.
\end{equation}\\
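As a brief illustrative example of our own (not taken from~\cite{Tenenbaum1985}), assume the force $F$ is constant and the body starts at rest at the origin. Integrating twice with respect to time yields
\begin{equation}
v(t) = \frac{F}{m}t, \qquad s(t) = \frac{F}{2m}t^2,
\end{equation}
showing that the covered distance grows quadratically in time.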
This explanation of differential equations focuses on the aspects deemed crucial
for this thesis and is not intended to be a complete treatment of the subject.
For a deeper understanding, we recommend the books mentioned
above~\cite{Rudin2007,Tenenbaum1985}. In the following section, we
describe the application of these principles in epidemiological models.
% -------------------------------------------------------------------
\section{Epidemiological Models 3}
\label{sec:epidemModel}
Pandemics, like \emph{COVID-19}, result in a significant
number of fatalities. The question arises: how should we fight a pandemic
correctly? It is also essential to study whether the employed countermeasures
are efficacious in combating the pandemic. Given the unfavorable public response to
measures such as lockdowns, it is imperative to investigate whether their efficacy
remains commensurate with the costs incurred by those affected. In the event
that alternative and novel technologies are in use, such as the mRNA vaccines
in the context of COVID-19, it is necessary to test their effect and find the
optimal variant. In order to shed light on the aforementioned events, we need to
develop a method to quantify the pandemic along with its course of
progression.\\
The real world is a highly complex system, which presents a significant
challenge when attempting to describe it fully in a model. Therefore, the model must
reduce the complexity while retaining the essential information. Furthermore, it
must address the issue of limited data availability. For instance, during
COVID-19, institutions such as the Robert Koch Institute
(RKI)\footnote[1]{\url{https://www.rki.de/EN/Home/homepage_node.html}} were only
able to collect data on infections and mortality cases. Consequently, we require
a model that employs an abstraction of the real world to illustrate the events
and relations that are pivotal to understanding the problem.
% -------------------------------------------------------------------
\subsection{SIR Model 2}
\label{sec:pandemicModel:sir}
In 1927, Kermack and McKendrick~\cite{1927} introduced the \emph{SIR Model},
which subsequently became one of the most influential epidemiological models.
This model enables the modeling of the spread of infections during epidemiological events such as pandemics.
The book \emph{Mathematical Models in Biology}~\cite{EdelsteinKeshet2005}
revisits the model and serves as the foundation for the following explanation
of SIR models.\\
The SIR model is capable of describing diseases that are transmitted through
contact or proximity between an individual carrying the illness and a healthy
individual. This is possible due to the distinction between infected individuals,
who are carriers of the disease, and the part of the population that is
susceptible to infection. In the model, individuals may move between these
groups, \eg, healthy individuals becoming infected. The model assumes that the
size $N$ of the population remains constant throughout the duration of the
pandemic. The population $N$ comprises three distinct groups: the
\emph{susceptible} group $S$, the \emph{infectious} group $I$ and the
\emph{removed} group $R$ (hence SIR model). Let $\mathcal{T} = [t_0, t_f]\subseteq
\mathbb{R}_{\geq0}$ be the time span of the pandemic; then,
\begin{equation}
S: \mathcal{T}\rightarrow\mathbb{N}, \quad I: \mathcal{T}\rightarrow\mathbb{N}, \quad R: \mathcal{T}\rightarrow\mathbb{N},
\end{equation}
give the values of $S$, $I$ and $R$ at a certain point of time
$t\in\mathcal{T}$. For $S$, $I$, $R$ and $N$, the relation
\begin{equation} \label{eq:N_char}
N = S + I + R
\end{equation}
holds. The model makes another assumption by stating that recovered individuals are immune
to the illness and infectious individuals cannot infect them. The individuals in
the $R$ group are either recovered or deceased, and thus unable to transmit or
carry the disease.
\begin{figure}[h]
\centering
\includegraphics[scale=0.87]{sir_graph.pdf}
\caption{A visualization of the SIR model, illustrating $N$ being split into the
three groups $S$, $I$ and $R$.}
\label{fig:sir_model}
\end{figure}
As visualized in~\Cref{fig:sir_model}, the
individuals may transition between groups based on transition rates. The
transmission rate $\beta$ is responsible for individuals becoming infected,
while the rate of removal or recovery rate $\alpha$ (also referred to as
$\delta$ or $\nu$, \eg,~\cite{EdelsteinKeshet2005,Millevoi2023}) moves
individuals from $I$ to $R$.\\
We can describe this problem mathematically using a system of differential
equations (see~\Cref{sec:differentialEq}). Thus, Kermack and
McKendrick~\cite{1927} propose the following set of differential equations:
\begin{equation}\label{eq:sir}
\begin{split}
\frac{dS}{dt} &= -\beta S I,\\
\frac{dI}{dt} &= \beta S I - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
This, according to Edelstein-Keshet, is based on the following assumption:
``The rate of transmission of a microparasitic disease is proportional to the
rate of encounter of susceptible and infective individuals modelled by the
product ($\beta S I$)''~\cite{EdelsteinKeshet2005}. The system shows the change
in size of the groups per time unit due to infections, recoveries, and deaths.\\
The term $\beta SI$ describes the rate of encounters of susceptible and infected
individuals. This term grows with the absolute sizes of $S$ and $I$; thus, Anderson
and May~\cite{Anderson1991} propose a modified model:
\begin{equation}\label{eq:modSIR}
\begin{split}
\frac{dS}{dt} &= -\beta \frac{SI}{N},\\
\frac{dI}{dt} &= \beta \frac{SI}{N} - \alpha I,\\
\frac{dR}{dt} &= \alpha I.
\end{split}
\end{equation}
Here, $\beta SI$ is normalized by $N$, which reflects the dynamics of the
real world more accurately~\cite{Anderson1991}.\\
The initial phase of a pandemic is characterized by the infection of a small
number of individuals, while the majority of the population remains susceptible.
Since no individuals have yet recovered or died, the removed group is empty. Let $I_0\in\mathbb{N}$ be
the number of infected individuals at the beginning of the disease. Then,
\begin{equation}
\begin{split}
S(0) &= N - I_{0},\\
I(0) &= I_{0},\\
R(0) &= 0,
\end{split}
\end{equation}
describes the initial configuration of a system in which a disease has just
emerged.\\
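The following sketch (our own illustration; the parameters mirror the reference configuration of~\Cref{fig:synth_sir}, while the step size and duration are arbitrary choices) integrates~\Cref{eq:modSIR} with the forward Euler method, starting from this initial configuration:
\begin{verbatim}
# Minimal sketch: forward-Euler integration of the normalized SIR model.
N = 7.9e6              # population size
I0 = 10.0              # initially infected individuals
alpha, beta = 0.35, 0.5
dt, days = 0.1, 150    # step size and simulated time span (arbitrary)

S, I, R = N - I0, I0, 0.0
for _ in range(int(days / dt)):
    dS = -beta * S * I / N
    dI = beta * S * I / N - alpha * I
    dR = alpha * I
    S, I, R = S + dt * dS, I + dt * dI, R + dt * dR

print(S, I, R)         # group sizes at the end of the simulation
\end{verbatim}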
\begin{figure}[h]
%\centering
\setlength{\unitlength}{1cm} % Set the unit length for coordinates
\begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
% reference
\put(0, 1.75){
\begin{subfigure}{0.4\textwidth}
\centering
\includegraphics[width=\textwidth]{reference_params_synth.png}
\caption{$\alpha=0.35$, $\beta=0.5$}
\label{fig:synth_norm}
\end{subfigure}
}
% 1. row, 1. image (low beta)
\put(5.5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_beta_synth.png}
\caption{$\alpha=0.25$, $\beta=0.5$}
\label{fig:synth_low_beta}
\end{subfigure}
}
% 1. row, 2. image (high beta)
\put(9.5, 5){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_beta_synth.png}
\caption{$\alpha=0.45$, $\beta=0.5$}
\label{fig:synth_high_beta}
\end{subfigure}
}
% 2. row, 1. image (low alpha)
\put(5.5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{low_alpha_synth.png}
\caption{$\alpha=0.35$, $\beta=0.4$}
\label{fig:synth_low_alpha}
\end{subfigure}
}
% 2. row, 2. image (high alpha)
\put(9.5, 0){
\begin{subfigure}{0.3\textwidth}
\centering
\includegraphics[width=\textwidth]{high_alpha_synth.png}
\caption{$\alpha=0.35$, $\beta=0.6$}
\label{fig:synth_high_alpha}
\end{subfigure}
}
\end{picture}
\caption{Synthetic data, using~\Cref{eq:modSIR} and $N=7.9\cdot 10^6$, $I_0=10$ with different sets of parameters.}
\label{fig:synth_sir}
\end{figure}
In the SIR model, the temporal occurrence and the height of the peak (or peaks)
of the infectious group are of paramount importance for understanding the
dynamics of a pandemic. A low peak occurring at a late point in time indicates
that the disease is unable to keep pace with the rate of recovery, resulting
in its demise before it can exert a significant influence on the population. In
contrast, an early and high peak means that the disease is rapidly transmitted
through the population, with a significant proportion of individuals having been
infected.~\Cref{fig:synth_sir} illustrates the impact of modifying either
$\beta$ or $\alpha$ while simulating a pandemic using a model such
as~\Cref{eq:modSIR}. It is evident that both the transmission rate $\beta$
and the recovery rate $\alpha$ influence the height and time of the peak of $I$.
When the number of infections exceeds the number of recoveries, the peak of $I$
occurs early and is high. On the other hand, if recoveries occur at a
faster rate than new infections, the peak occurs later and is low. This
means that it is crucial to know both $\beta$ and $\alpha$ to be able to
simulate a pandemic using the SIR model.\\
The SIR model makes a number of assumptions that are intended to reduce the
model's overall complexity, while simultaneously increasing its divergence from
reality. One such assumption is that the size of the population, $N$,
remains constant. This is not an accurate representation of the
relations observed in the real world, as the size of a population is subject to
a number of factors that can contribute to change. The population is increased
by births and decreased by deaths. There are
different reasons for mortality, including the natural aging process or the
development of other diseases. Other examples are the absence of the possibility
for individuals to become susceptible again after having recovered, or the
possibility for the transition rates to change due to new variants or the
implementation of new countermeasures. We address the latter option in
the following~\Cref{sec:pandemicModel:rsir}.
% -------------------------------------------------------------------
\subsection{Reduced SIR Model and the Reproduction Number 1}
\label{sec:pandemicModel:rsir}
\Cref{sec:pandemicModel:sir} presents the classical SIR model. The model
comprises two parameters, $\beta$ and $\alpha$, which describe the course of a
pandemic over its duration. This is beneficial when examining the overall
pandemic; however, in the real world, disease behavior is dynamic, and the
values of the parameters $\beta$ and $\alpha$ change over time. The
reason for this lies in events such as the implementation of countermeasures
that reduce the contact between the infectious and susceptible individuals, the
emergence of a new variant of the disease that increases its infectivity or
deadliness, or the administration of a vaccination that provides previously
susceptible individuals with immunity without them ever being infectious. To address
this, Millevoi \etal~\cite{Millevoi2023} introduce a model that simultaneously
reduces the size of the system of differential equations and solves the problem
of time scaling at hand.\\
First, they alter the definition of $\beta$ and $\alpha$ to be dependent on the time interval
$\mathcal{T} = [t_0, t_f]\subseteq \mathbb{R}_{\geq0}$,
\begin{equation}
\beta: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}, \quad\alpha: \mathcal{T}\rightarrow\mathbb{R}_{\geq0}.
\end{equation}
Another crucial element is $D(t) = \frac{1}{\alpha(t)}$, which represents the time
span an infected individual requires to recover. Subsequently, at the initial time point
$t_0$, the \emph{reproduction number},
\begin{equation}
\RO = \beta(t_0)D(t_0) = \frac{\beta(t_0)}{\alpha(t_0)},
\end{equation}
represents the number of susceptible individuals that one infectious individual
infects at the onset of the pandemic. In light of the effects of $\beta$ and
$\alpha$ (see~\Cref{sec:pandemicModel:sir}), $\RO > 1$ indicates that the
pandemic is emerging: the disease spreads through the population and $I$
increases at a high rate, since initially $I(t_0) \ll S(t_0)$ and almost the
entire population is susceptible. When $\RO < 1$, each infectious individual
infects fewer than one other individual on average, and the disease dies out.
Nevertheless, $\RO$ does not cover the entire time
span. For this reason, Millevoi \etal~\cite{Millevoi2023} introduce $\Rt$,
which has the same interpretation as $\RO$, with the exception that $\Rt$ is
dependent on time. The definition of the time-dependent reproduction number on
the time interval $\mathcal{T}$ with the population size $N$,
\begin{equation}\label{eq:repr_num}
\Rt=\frac{\beta(t)}{\alpha(t)}\cdot\frac{S(t)}{N}
\end{equation}
combines the ratio of the rates, which carries information about the spread of
the disease, with the decreasing ratio of susceptible individuals in the
population. In contrast to $\beta$ and $\alpha$, $\Rt$ is not a parameter but
a state variable of the model, which enables the following reduction of the SIR
model.\\
\Cref{eq:N_char} allows for the calculation of the value of the group $R$ using
$S$ and $I$, with the term $R(t)=N-S(t)-I(t)$. Thus,
\begin{equation}
\begin{split}
\frac{dS}{dt} &= -\alpha\Rt I(t),\\
\frac{dI}{dt} &= \alpha(\Rt-1)I(t),
\end{split}
\end{equation}
is the reduction of~\Cref{eq:modSIR} on the time interval $\mathcal{T}$ using this
characteristic and the reproduction number \Rt (see~\Cref{eq:repr_num}).
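For clarity, the reduced equation for $I$ follows directly from substituting~\Cref{eq:repr_num} into~\Cref{eq:modSIR}:
\begin{equation}
\beta(t)\frac{S(t)I(t)}{N} = \alpha(t)\Rt I(t)
\quad\Rightarrow\quad
\frac{dI}{dt} = \alpha(t)\left(\Rt - 1\right)I(t).
\end{equation}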
Another issue that Millevoi \etal~\cite{Millevoi2023} seek to address is the
extensive range of values that the SIR groups can assume, spanning from $0$ to
$10^7$. Accordingly, they first scale the time interval $\mathcal{T}$ using
its endpoints to calculate the scaled time $t_s = \frac{t - t_0}{t_f - t_0}\in
[0, 1]$. Subsequently, they calculate the scaled groups,
\begin{equation}
S_s(t_s) = \frac{S(t)}{C},\quad I_s(t_s) = \frac{I(t)}{C},\quad R_s(t_s) = \frac{R(t)}{C},
\end{equation}
using a large constant scaling factor $C\in\mathbb{N}$. Applying this to the
variable $I$ results in
\begin{equation}
\frac{dI_s}{dt_s} = \alpha(t_f - t_0)(\Rt - 1)I_s(t_s),
\end{equation}
a further reduced version of~\Cref{eq:modSIR}. This yields a more streamlined and
efficient process, as it entails the elimination of one parameter ($\beta$) and two
state variables ($S$ and $R$), while adding the state variable $\Rt$. This is a
crucial aspect for the automated solution of such systems of differential
equations, which we describe in~\Cref{sec:mlp}.
% -------------------------------------------------------------------
\section{Multilayer Perceptron 2}
\label{sec:mlp}
In~\Cref{sec:differentialEq}, we demonstrate the significance of differential
equations in systems, illustrating how they can be utilized to elucidate the
impact of a specific parameter on the system's behavior.
In~\Cref{sec:epidemModel}, we show specific applications of differential
equations in an epidemiological context. The final objective is to solve these
equations. There are multiple methods to achieve this goal; one
such method is the \emph{Multilayer Perceptron} (MLP)~\cite{Hornik1989}. In the
following section, we provide a brief overview of the structure and training of
these \emph{neural networks}. For reference, we use the book \emph{Deep Learning}
by Goodfellow \etal~\cite{Goodfellow-et-al-2016} as a foundation for our
explanations.\\
The objective is to develop an approximation method for any function $f^{*}$,
which could be a mathematical function or a mapping of an input vector to a
class or category. Let $\boldsymbol{x}$ be the input vector and $\boldsymbol{y}$
the label, class, or result. Then, $\boldsymbol{y} = f^{*}(\boldsymbol{x})$
is the function to approximate. In 1958,
Rosenblatt~\cite{Rosenblatt1958} proposed the perceptron, which models the concept of
a neuron in a neuroscientific sense. The perceptron takes the input vector
$\boldsymbol{x}$, performs an operation, and produces a scalar result. This model
optimizes its parameters $\theta$ to be able to calculate $\boldsymbol{y} =
f(\boldsymbol{x}; \theta)$ as accurately as possible. As Minsky and
Papert~\cite{Minsky1972} demonstrate, the perceptron is only capable of
approximating a specific class of functions. Consequently, there is a necessity
for an expansion of the perceptron.\\
As Goodfellow \etal proceed, the solution to this issue is to decompose $f$ into
a chain structure of the form,
\begin{equation} \label{eq:mlp_char}
f(\boldsymbol{x}) = f^{(3)}(f^{(2)}(f^{(1)}(\boldsymbol{x}))).
\end{equation}
This converts a perceptron, which has only two layers (an input and an output
layer), into a multilayer perceptron. Each sub-function, designated as $f^{(i)}$,
is represented in the structure of an MLP as a \emph{layer}. Each layer is composed
of multiple \emph{units} (also called \emph{neurons}). A neuron performs the
same vector-to-scalar calculation as the perceptron does. Subsequently, a
nonlinear activation function transforms the scalar output into the activation
of the unit. The layers are staggered in the neural network, with each layer
being connected to its neighbors, as illustrated in~\Cref{fig:mlp_example}. The
input vector $\boldsymbol{x}$ is provided to each unit of the first layer
$f^{(1)}$, which then passes its results to the units of the second layer
$f^{(2)}$, and so forth. The final layer is the \emph{output layer}. The
intervening layers, situated between the first and the output layer, are the
\emph{hidden layers}. The alternating structure of linear and nonlinear
calculations enables MLPs to approximate any function. As Hornik
\etal~\cite{Hornik1989} demonstrate, MLPs are universal approximators.\\
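As an illustration of the chain structure in~\Cref{eq:mlp_char}, the following sketch (our own minimal example with arbitrarily chosen layer sizes) composes two hidden layers and a linear output layer, each hidden layer being an affine map followed by a nonlinear activation:
\begin{verbatim}
# Minimal sketch: forward pass of a small MLP, i.e. the composition
# f(x) = f3(f2(f1(x))) with tanh activations in the hidden layers.
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 16, 16, 1]   # input, two hidden layers, output (arbitrary)
weights = [rng.normal(size=(m, n)) for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

def forward(x):
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.tanh(W @ x + b)              # affine map + activation
    return weights[-1] @ x + biases[-1]     # linear output layer

print(forward(np.array([0.1, -0.5, 2.0])))
\end{verbatim}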
\begin{figure}[h]
\centering
\includegraphics[scale=0.87]{MLP.pdf}
\caption{A visualization of the structure of an MLP, consisting of an input
layer, hidden layers, and an output layer, with each layer connected to its
neighbors.}
\label{fig:mlp_example}
\end{figure}
The term \emph{training} describes the process of optimizing the parameters
$\theta$. In order to undertake training, it is necessary to have a set of
\emph{training data}, which is a set of pairs (also called training points) of
the input data $\boldsymbol{x}$ and the corresponding true solution
$\boldsymbol{y}$ of the function $f^{*}$. For the training process, we must
define the \emph{loss function} $\Loss{ }$, using the model prediction
$\hat{\boldsymbol{y}}$ and the true value $\boldsymbol{y}$, which acts as a
metric for evaluating the extent to which the model deviates from the correct
answer. One of the most common loss functions is the \emph{mean square error}
(MSE) loss function. Let $N$ be the number of points in the set of training
data. Then,
\begin{equation} \label{eq:mse}
\Loss{MSE} = \frac{1}{N}\sum_{i=1}^{N} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}
calculates the squared difference between the model prediction and the true value
of each training point and takes the mean across the whole training data set. \\
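As a small numerical illustration of~\Cref{eq:mse} (the values are chosen arbitrarily):
\begin{verbatim}
# Minimal sketch: mean square error between predictions and true values.
import numpy as np

y_hat = np.array([0.9, 2.1, 2.8])   # model predictions
y_true = np.array([1.0, 2.0, 3.0])  # true values
mse = np.mean((y_hat - y_true) ** 2)
print(mse)                          # (0.01 + 0.01 + 0.04) / 3 = 0.02
\end{verbatim}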
In the context of neural networks, \emph{forward propagation} describes the
process of information flowing through the network from the input layer to the
output layer, resulting in a scalar loss. Ultimately, the objective is to
utilize this information to optimize the parameters in order to minimize the
loss. One of the most fundamental optimization strategies is \emph{gradient
descent}. In this process, the derivatives are employed to identify the location
of local or global minima of a function. Given that a positive gradient
signifies ascent and a negative gradient indicates descent, we must move the
variable by a constant \emph{learning rate} (step size) in the opposite
direction to that of the gradient. The calculation of the derivatives with respect
to the parameters is a complex task, since our function is a composition of
many functions (one for each layer). We can address this issue by taking advantage
of~\Cref{eq:mlp_char} and employing the chain rule of calculus. Let
$\hat{\boldsymbol{y}} = f(w; \theta)$ be the model prediction with
$w = f^{(2)}(z; \theta_2)$ and $z = f^{(1)}(\boldsymbol{x}; \theta_1)$, where
$\boldsymbol{x}$ is the input vector and $\theta_1, \theta_2\subset\theta$.
Then,
\begin{equation}\label{eq:backprop}
\nabla_{\theta_1} \Loss{ } = \frac{d\mathcal{L}}{d\hat{\boldsymbol{y}}}\frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\theta_1}f^{(1)},
\end{equation}
is the gradient of $\Loss{ }$ with respect to the parameters $\theta_1$. In the
context of neural networks, this method is called \emph{backpropagation}. \todo{Insert source}\\
In practical applications, an optimizer often accomplishes the optimization task
by executing gradient descent in the background. Furthermore, modifying the
learning rate during training can be advantageous, for instance, making larger
steps at the beginning and minor adjustments at the end. Therefore, \emph{schedulers}
implement algorithms that employ diverse learning rate adaptation
strategies.\\
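The following sketch (our own minimal example on a one-dimensional quadratic loss, not a description of any specific library) illustrates plain gradient descent combined with a simple schedule that shrinks the learning rate over time:
\begin{verbatim}
# Minimal sketch: gradient descent on L(theta) = (theta - 3)^2 with a
# decaying learning rate, illustrating the role of a scheduler.
theta = 0.0
lr = 0.1
for epoch in range(100):
    grad = 2.0 * (theta - 3.0)    # dL/dtheta
    theta = theta - lr * grad     # step against the gradient
    lr = lr * 0.99                # scheduler: shrink the step size
print(theta)                      # approaches the minimum at theta = 3
\end{verbatim}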
This section provides an overview of the basic concepts of neural networks. For a
deeper understanding, we direct the reader to the book \emph{Deep Learning} by
Goodfellow \etal~\cite{Goodfellow-et-al-2016}. The next section demonstrates
the application of neural networks in approximating solutions to systems of
differential equations.
% -------------------------------------------------------------------
\section{Physics Informed Neural Networks 5}
\label{sec:pinn}
In~\Cref{sec:mlp}, we describe the structure and training of MLPs, which are
widely recognized tools for approximating any kind of function. In this
section, we apply this capability to create a solver for ODEs and PDEs,
as Lagaris \etal~\cite{Lagaris1997} describe in their paper. In this approach,
the model learns to approximate a function using provided data points while
leveraging the available knowledge about the problem in the form of a system of
differential equations. The \emph{physics-informed neural network} (PINN)
learns the system of differential equations during training, as it optimizes
its output to align with the equations.\\
In contrast to standard MLPs, the loss term of a PINN comprises two
components. The first term incorporates the aforementioned prior knowledge pertinent to the problem. As Raissi
\etal~\cite{Raissi2017} propose, the residual of each differential equation in
the system must be minimized in order for the model to optimize its output in accordance with the theory.
We obtain the residual $r_i$, with $i\in\{1, ..., N_d\}$, by rearranging the differential equation and
calculating the difference between the left-hand side and the right-hand side
of the equation, where $N_d$ is the number of differential equations in the system. As
Raissi \etal~\cite{Raissi2017} propose, the \emph{physics
loss} of a PINN,\todo{check source again}
\begin{equation}
\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2,
\end{equation}
takes the input data and the model prediction to calculate the mean square
error of the residuals. The second term, the \emph{observation loss}
$\Loss{obs}$, employs the mean square error of the distances between the
predicted and the true values for each training point. Additionally, the
observation loss may incorporate extra terms for initial and boundary conditions. Let $N_t$
denote the number of training points. Then,
\begin{equation}
\mathcal{L}_{PINN}(\boldsymbol{x}, \boldsymbol{y},\hat{\boldsymbol{y}}) = \frac{1}{N_d}\sum_{i=1}^{N_d} ||r_i(\boldsymbol{x},\hat{\boldsymbol{y}})||^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{y}}^{(i)}-\boldsymbol{y}^{(i)}||^2,
\end{equation}\\
represents the comprehensive loss function of a physics-informed neural network. \\
\todo{check for correctness}
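To make the interplay of the two loss terms concrete, the following sketch (a minimal PyTorch illustration of our own; the toy equation $\frac{du}{dt}=-u$, the network size, and the data are placeholder choices, not the configuration used later in this thesis) evaluates a combined PINN loss for a single ODE:
\begin{verbatim}
# Minimal sketch (PyTorch): combined PINN loss for the toy ODE
# du/dt = -u, whose exact solution is u(t) = exp(-t).
import torch

u_net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

# observation data: a few measurements of the true solution
t_data = torch.tensor([[0.0], [0.5], [1.0]])
u_data = torch.exp(-t_data)

# collocation points at which the residual is evaluated
t_col = torch.linspace(0.0, 1.0, 20).reshape(-1, 1).requires_grad_(True)

u_pred = u_net(t_col)
du_dt = torch.autograd.grad(u_pred, t_col,
                            grad_outputs=torch.ones_like(u_pred),
                            create_graph=True)[0]
loss_physics = torch.mean((du_dt + u_pred) ** 2)        # residual MSE
loss_obs = torch.mean((u_net(t_data) - u_data) ** 2)    # observation MSE
loss = loss_physics + loss_obs
loss.backward()            # gradients for all network parameters
print(loss.item())
\end{verbatim}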
Given the nature of residuals, calculating the loss term
$\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$ requires the
calculation of the derivative of the output with respect to the input of
the neural network. As we outline in~\Cref{sec:mlp}, during the process of
backpropagation we calculate the gradients of the loss term with respect to a
layer-specific set of parameters denoted by $\theta_l$, where $l$ represents
the index of the \todo{check for consistency} respective layer. By employing
the chain rule of calculus, the algorithm progresses from the output layer
through each hidden layer, ultimately reaching the first layer in order to
compute the respective gradients. The term,
\begin{equation}
\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} = \frac{d\hat{\boldsymbol{y}}}{df^{(2)}}\frac{df^{(2)}}{df^{(1)}}\nabla_{\boldsymbol{x}}f^{(1)},
\end{equation}
illustrates that, in contrast to the procedure described in~\Cref{eq:backprop},
this procedure, called \emph{automatic differentiation}, goes one step further and
calculates the gradient of the output with respect to the input
$\boldsymbol{x}$. In order to calculate the second derivative,
$\frac{d^2\hat{\boldsymbol{y}}}{d\boldsymbol{x}^2}=\nabla_{\boldsymbol{x}} (\nabla_{\boldsymbol{x}} \hat{\boldsymbol{y}} ),$
this procedure must be repeated.\\
Above, we present a method for approximating functions through the use of
systems of differential equations. As previously stated, we want to find a
solver for systems of differential equations. In problems where we must solve
an ODE or PDE, we have to find a set of parameters that satisfies the system
for any input $\boldsymbol{x}$. In the context of PINNs, this is the
\emph{inverse problem}: a set of training data, from measurements for
example, is available along with the respective differential equations, but
information about the parameters of the equations is lacking. To address this
challenge, we set these parameters as distinct learnable parameters within the
neural network. This enables the network to utilize a specific value that
actively influences the physics loss $\mathcal{L}_{physics}(\boldsymbol{x},\hat{\boldsymbol{y}})$.
During the training phase, the optimizer aims to minimize the physics loss,
which should ultimately yield an approximation of the true value.\\
\begin{figure}[h]
\centering
\includegraphics[scale=0.87]{oscilator.pdf}
\caption{Illustration of the movement of an oscillating body in the
underdamped case, with $m=1\,kg$, $\mu=4\,\frac{Ns}{m}$ and $k=200\,\frac{N}{m}$.}
\label{fig:spring}
\end{figure}
One illustrative example of a potential application for PINNs is the
\emph{damped harmonic oscillator}~\cite{Tenenbaum1985}. In this problem, we \todo{check source for wording}
displace a body, which is attached to a spring, from its resting position. The
body is subject to three forces: firstly, the inertial force resulting from the
displacement $u$, which points in the direction of the displacement; secondly,
the restoring force of the spring, which attempts to return the body to its
original position; and thirdly, the friction force, which points in the opposite
direction of the movement. In accordance with Newton's second law and the
combined influence of these forces, the body exhibits oscillatory motion around
its position of rest. The system is influenced by the mass of the body $m$,
the coefficient of friction $\mu$, and the spring constant $k$, which indicates the
stiffness of the spring. The differential equation, \todo{check in book}
\begin{equation}
m\frac{d^2u}{dt^2}+\mu\frac{du}{dt}+ku=0,
\end{equation}
shows the relation of these parameters in reference to the problem at hand. As
Tenenbaum and Pollard~\cite{Tenenbaum1985} show, there are three potential solutions to this
problem. However, only the \emph{underdamped case} results in an oscillating
movement of the body, as illustrated in~\Cref{fig:spring}. In order to apply a
PINN to this problem, we require a set of training data. This consists of
pairs of time points and corresponding displacement measurements
$(t^{(i)}, u^{(i)})$, where $i\in\{1, ..., N_t\}$. In this hypothetical case,
we know the mass $m=1\,kg$, the spring constant $k=200\,\frac{N}{m}$, and the
initial conditions $u(0) = 1$ and $\frac{du}{dt}(0) = 0$. However, we do
not know the value of the friction coefficient $\mu$. In this case, the loss function,
\begin{equation}
\mathcal{L}_{osc}(\boldsymbol{x}, \boldsymbol{u}, \hat{\boldsymbol{u}}) = \left(\hat{u}(0)-1\right)^2+\left(\frac{d\hat{u}}{dt}(0)\right)^2+\left|\left|m\frac{d^2\hat{u}}{dt^2}+\mu\frac{d\hat{u}}{dt}+k\hat{u}\right|\right|^2 + \frac{1}{N_t}\sum_{i=1}^{N_t} ||\hat{\boldsymbol{u}}^{(i)}-\boldsymbol{u}^{(i)}||^2,
\end{equation}
includes the initial conditions, the residual, in which $\mu$ is a learnable
parameter, and the observation loss.
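A sketch of how the unknown friction coefficient can enter the training as a learnable parameter (again a minimal PyTorch illustration of our own with placeholder values; only the residual part of the loss is shown for brevity):
\begin{verbatim}
# Minimal sketch (PyTorch): the unknown friction coefficient mu is a
# learnable parameter in the residual of m*u'' + mu*u' + k*u = 0.
import torch

m, k = 1.0, 200.0
mu = torch.nn.Parameter(torch.tensor(1.0))   # unknown, to be learned
u_net = torch.nn.Sequential(
    torch.nn.Linear(1, 32), torch.nn.Tanh(),
    torch.nn.Linear(32, 1),
)

t = torch.linspace(0.0, 1.0, 50).reshape(-1, 1).requires_grad_(True)
u = u_net(t)
du = torch.autograd.grad(u, t, grad_outputs=torch.ones_like(u),
                         create_graph=True)[0]
d2u = torch.autograd.grad(du, t, grad_outputs=torch.ones_like(du),
                          create_graph=True)[0]
residual = m * d2u + mu * du + k * u
loss_physics = torch.mean(residual ** 2)

# the network weights and mu are optimized jointly
optimizer = torch.optim.Adam(list(u_net.parameters()) + [mu], lr=1e-3)
loss_physics.backward()
optimizer.step()
print(mu.item())
\end{verbatim}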
% -------------------------------------------------------------------
\subsection{Disease Informed Neural Networks 2}
\label{sec:pinn:dinn}
% -------------------------------------------------------------------