- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- % Author: Phillip Rothenbeck
- % Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
- % File: chap01-introduction/chap01-introduction.tex
- % Part: introduction
- % Description:
- % summary of the content in this chapter
- % Version: 01.01.2012
- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- \chapter{Introduction}
- \label{chap:introduction}
- In the early months of 2020, Germany, like many other countries, was struck by
- the novel \emph{Coronavirus Disease} (COVID-19)~\cite{WHO}. The pandemic, which
- originated in Wuhan, China, had a profound impact on the global community,
- paralyzing it for over two years. In response to the pandemic, the German
- government employed a multifaceted approach~\cite{RKI}, encompassing the
- introduction of vaccines and non-pharmaceutical mitigation policies such as
- lockdowns. Amid these mitigation policies and successive strains of COVID-19,
- which exhibited varying degrees of infectiousness and lethality~\cite{RKIa},
- Germany had recorded over 38,400,000 infection cases and 174,000 deaths by
- the end of June 2023~\cite{SRD}. In light of these figures, the need for an
- analysis of the course of the pandemic arises.\\
- The dynamics of disease transmission in the real world are complex. A multitude
- of factors influence the course of a disease, and it is
- challenging to gain a comprehensive understanding of these factors and develop
- tools that allow for the comparison of disease courses across different
- diseases and time points. A common approach in epidemiology is to use
- epidemiological models that approximate the dynamics by
- focusing on specific factors and describing them with mathematical tools. These
- models provide epidemiological parameters that determine the behavior of a
- disease within the boundaries of the model. A seminal epidemiological model is
- the \emph{SIR model}, which was first proposed by Kermack and McKendrick~\cite{1927}
- in 1927. The SIR model is a compartmental model that divides the entire
- population into three distinct groups: the \emph{susceptible} compartment, $S$;
- the \emph{infectious} compartment, $I$; and the \emph{removed} compartment, $R$.
- In the context of the SIR model, the constant transmission
- rate $\beta$ and the constant recovery rate $\alpha$ quantify and determine the
- course of a pandemic. However, a pandemic is not a static entity; therefore, Liu
- and Stechlinski~\cite{Liu2012}, and Setianto and Hidayat~\cite{Setianto2023}
- propose an SIR model with time-dependent epidemiological parameters and
- reproduction numbers $\Rt$. The SIR model is defined by a system of differential
- equations that incorporates the parameters $\alpha$ and $\beta$, thereby
- describing the flow between the three compartments. For a given set of
- data, the epidemiological parameters can be identified by solving the system of
- differential equations. Recently, the data-driven approach of \emph{Physics-Informed Neural Networks}
- (PINNs) has gained attention due to its capability of finding solutions to
- differential equations by fitting its predictions to both given data and the
- governing system of differential equations. By employing this methodology,
- Shaier \etal~\cite{Shaier2021} were able to estimate the epidemiological parameters
- from data on different diseases. Additionally, Millevoi \etal~\cite{Millevoi2023}
- were able to identify the reproduction number $\Rt$ for both synthetic and
- Italian COVID-19 data using an approach based on a reduced version of the SIR
- model.\\
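- For orientation, the SIR dynamics described above can be sketched as follows.
- The sketch assumes a constant total population $N = S + I + R$; the symbol $N$
- and the normalization by it are assumptions of this sketch and are not fixed
- by the text above:
- \[
-   \frac{dS}{dt} = -\beta \frac{S I}{N}, \qquad
-   \frac{dI}{dt} = \beta \frac{S I}{N} - \alpha I, \qquad
-   \frac{dR}{dt} = \alpha I .
- \]
- The three right-hand sides sum to zero, so $N$ remains constant over time. A
- PINN in this setting is typically trained to minimize the sum of a data
- misfit and the residuals of these equations.\\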
- The objective of this thesis is to identify the epidemiological parameters
- $\alpha$ and $\beta$, as well as the reproduction number $\Rt$ of COVID-19 over
- the first 1200 days of recorded data in Germany and its federal states. The
- Robert Koch Institute (RKI)\footnote{\url{https://www.rki.de/EN/Home/homepage_node.html}} has compiled data on both reported cases and
- associated mortalities from the beginning of the outbreak in Germany to the
- present. We preprocess this data into the format required by
- our approaches. As the raw data lacks information on recovery incidence, we
- introduce a recovery queue that simulates a recovery period. To estimate the
- epidemiological parameters, we adopt the approach of Shaier
- \etal~\cite{Shaier2021}, in which a PINN learns from data consisting
- of time points with their respective sizes of the $S$, $I$, and $R$ compartments
- and predicts the epidemiological parameters based on the data and the governing
- system of differential equations. Additionally, we apply the methodology of
- Millevoi \etal~\cite{Millevoi2023} to estimate the time-dependent reproduction
- number, $\Rt$, over a 1200-day period for each German federal state and Germany
- as a whole in the reduced SIR model, which requires only the size of the $I$
- group for each time step. To validate the effectiveness of these methods, we
- first conduct experiments on a small synthetic dataset before applying the
- techniques to real-world data. We then analyze the plausibility of our results
- by comparing them to real-world events and data such as vaccination ratios of
- each region or the peaks of impactful variants. This analysis demonstrates the
- relevance of our findings and reveals a correlation between our results and
- real-world developments, thus supporting the effectiveness of our approach.\\
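- As a sketch of why the reduced model requires only $I$, assume that the
- infectious compartment follows $\frac{dI}{dt} = \beta \frac{S I}{N} - \alpha I$
- with total population $N$, and define $\Rt = \frac{\beta}{\alpha} \frac{S}{N}$.
- Both the normalization by $N$ and this definition of $\Rt$ are assumptions of
- this sketch that follow common usage; the formulation in the later chapters
- may differ in detail. Then
- \[
-   \frac{dI}{dt}
-   = \alpha \left( \frac{\beta}{\alpha} \frac{S}{N} - 1 \right) I
-   = \alpha \, (\Rt - 1) \, I ,
- \]
- so that $I$ grows while $\Rt > 1$ and declines while $\Rt < 1$, and $S$ no
- longer appears explicitly.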
- % -------------------------------------------------------------------
- \section{Related work}
- \label{sec:relatedWork}
- In this section, we place our work in the context of existing literature
- on solving epidemiological models for real-world data. The
- first work, by Smirnova \etal~\cite{Smirnova2017}, seeks a
- stochastic methodology for estimating the time-dependent transmission rate
- $\beta(t)$. They achieve this by projecting the time-dependent transmission
- rate onto a finite subspace that is defined by Legendre polynomials.
- Subsequently, they compare the three regularization techniques of variational
- (Tikhonov's) regularization, truncated singular value decomposition (TSVD), and
- modified TSVD to ascertain the most reliable method for forecasting with
- limited data. Their findings indicate that modified TSVD provides the most
- stable forecasts, as demonstrated on both simulated data and real-world data
- from the 1918 influenza pandemic and the Ebola epidemic. In contrast, we
- utilize PINNs to find the constant epidemiological parameters and the
- reproduction number for Germany and its states.\\
- Several related works apply PINN approaches similar to ours to COVID-19 and
- other real-world disease examples~\cite{Shaier2021,Millevoi2023,Berkhahn2022,Olumoyin2021}.
- Specifically, Shaier \etal~\cite{Shaier2021} put forth a data-driven method
- which they refer to as \emph{Disease-Informed Neural Networks} (DINN). In their
- work, they demonstrate the capacity of PINNs to forecast the trajectory of
- epidemics and pandemics. They underpin the efficacy of their approach by
- applying it to 11 diseases that have previously been modeled. In their
- experiments, they employ the SIDR (susceptible, infectious, dead, recovered)
- model. Finally, they show that this method is a robust and effective means
- of identifying the parameters of an SIR model.\\
- Similarly, Berkhahn and Ehrhard~\cite{Berkhahn2022} employ the susceptible,
- vaccinated, infectious, hospitalized, and removed (SVIHR) model. The proposed
- PINN methodology initially estimates the SVIHR model parameters for German
- COVID-19 data, covering the time span from the inception of the outbreak to
- the end of 2021. For comparative purposes, Berkhahn and Ehrhard employ the
- method of non-standard finite differences (NSFD) as well. The authors utilize
- both forecasting methods to project the trajectory of COVID-19 from mid-April
- 2023 onwards. Berkhahn and Ehrhard find that PINNs are able to adapt to varying
- vaccination rates and emerging variants.\\
- Furthermore, Olumoyin \etal~\cite{Olumoyin2021} employ an alternative
- methodology for identifying the time-dependent transmission rate of an
- asymptomatic-SIR model accounting for unreported infectious cases. The PINN
- approach they introduce utilizes the cumulative and daily reported infection
- cases and symptomatic recovered cases to demonstrate the effect of different
- mitigation measures and to ascertain the proportion of non-symptomatic
- individuals and asymptomatic recovered individuals. With this, they
- illustrate the influence of vaccinations and a set of non-pharmaceutical
- mitigation methods on the transmission of COVID-19 using data from Italy, South
- Korea, the United Kingdom, and the United States.\\
- Finally, Millevoi \etal~\cite{Millevoi2023} address the issue of the changes in
- the transmission rate due to the dynamics of a pandemic. The authors employ the
- reproduction number $\Rt$ to reduce the system of differential equations to a
- single equation and introduce a reduced-split version of the PINN, which
- initially trains on the data and then trains to minimize the residual of the
- ordinary differential equation. They test their approach on five synthetic and
- two real-world scenarios from the early stages of the COVID-19 pandemic in
- Italy. This method yields an increase in both accuracy and training speed. In
- contrast to these works, we estimate the epidemiological parameters $\alpha$ and
- $\beta$ and the reproduction number $\Rt$ for Germany over the entire
- span from early March 2020 to late June 2023.
- % -------------------------------------------------------------------
- \section{Overview}
- The remainder of this thesis is organized into four chapters. \Cref{chap:background}
- starts with a theoretical overview of mathematical modeling in epidemiology,
- with a particular focus on the SIR model. Subsequently, it shifts its focus to
- neural networks, specifically the background of PINNs and their use in
- solving ordinary differential equations. \Cref{chap:methods} outlines the
- methodology employed in this thesis. First, we present the data that was
- collected by the RKI and our preprocessing. Then, we present the PINN
- approaches, which are inspired by the work of Shaier \etal~\cite{Shaier2021}
- and Millevoi \etal~\cite{Millevoi2023}. \Cref{chap:evaluation} provides the
- setups and results of the experiments that we conduct. This chapter is divided
- into two sections. The first section shows and discusses the results concerning
- the epidemiological parameters $\alpha$ and $\beta$. The subsequent section
- presents the results concerning the reproduction number $\Rt$. Finally, in
- \Cref{chap:conclusions}, we give a conclusion of our work and provide an overview
- of potential further work.
- % -------------------------------------------------------------------