% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap01-introduction/chap01-introduction.tex
% Part: introduction
% Description:
% summary of the content in this chapter
% Version: 01.01.2012
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction}
\label{chap:introduction}
In the early months of 2020, Germany, like many other countries, was struck by
the novel \emph{Coronavirus Disease} (COVID-19)~\cite{WHO}. The pandemic, which
originated in Wuhan, China, had a profound impact on the global community,
paralyzing it for over two years. In response, the German government employed a
multifaceted approach~\cite{RKI} that encompassed the introduction of vaccines
and non-pharmaceutical mitigation policies such as lockdowns. Amid these
mitigation policies and successive strains of COVID-19 with varying degrees of
infectiousness and lethality~\cite{RKIa}, Germany had recorded over 38,400,000
infection cases and 174,000 deaths by the end of June 2023~\cite{SRD}. In light
of these figures, a systematic analysis of the course of the pandemic is
warranted.\\
The real-world dynamics of disease transmission are complex. A multitude of
factors influence the course of a disease, and it is challenging both to gain a
comprehensive understanding of these factors and to develop tools that allow
disease courses to be compared across different diseases and points in time.
The common approach in epidemiology is to use epidemiological models, which
approximate the dynamics by focusing on specific factors and modeling them with
mathematical tools. These models provide epidemiological parameters that
determine the behavior of a disease within the boundaries of the model. A
seminal epidemiological model is the \emph{SIR model}, first proposed by Kermack
and McKendrick~\cite{1927} in 1927. The SIR model is a compartmental model that
divides the entire population into three distinct groups: the
\emph{susceptible} compartment, $S$; the \emph{infectious} compartment, $I$; and
the \emph{removed} compartment, $R$. In the SIR model, the constant transmission
rate $\beta$ and recovery rate $\alpha$ quantify and determine the course of a
pandemic. However, a pandemic is not static; therefore, Liu and
Stechlinski~\cite{Liu2012} as well as Setianto and Hidayat~\cite{Setianto2023}
propose SIR models with time-dependent epidemiological parameters and a
time-dependent reproduction number $\Rt$. The SIR model is defined by a system
of differential equations that incorporates the parameters $\alpha$ and $\beta$
and thereby describes the flow of individuals between the three compartments.
For a given set of data, the epidemiological parameters can be identified by
solving the corresponding inverse problem for this system of differential
equations. Recently, the data-driven approach of \emph{Physics-Informed Neural
Networks} (PINN) has gained attention due to its capability of finding solutions
to differential equations by fitting its predictions to both the given data and
the governing system of differential equations. Using this methodology, Shaier
\etal~\cite{Shaier2021} identified the epidemiological parameters from data for
several diseases. Additionally, Millevoi \etal~\cite{Millevoi2023} identified
the reproduction number $\Rt$ for both synthetic and Italian COVID-19 data using
an approach based on a reduced version of the SIR model.\\
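To make the roles of $\beta$ and $\alpha$ concrete, the following Python sketch integrates the SIR system with a classical fourth-order Runge--Kutta scheme. It assumes the common normalized form $\frac{dS}{dt} = -\beta\frac{SI}{N}$, $\frac{dI}{dt} = \beta\frac{SI}{N} - \alpha I$, $\frac{dR}{dt} = \alpha I$ with total population $N$. It also reports the reproduction number as $\Rt = \frac{\beta}{\alpha}\frac{S(t)}{N}$. The function names \texttt{sir\_rhs} and \texttt{simulate\_sir} and all parameter values are hypothetical and chosen only for illustration. The formulation used in this thesis is given in \Cref{chap:background} and may differ in normalization.

```python
import numpy as np


def sir_rhs(y, beta, alpha, n):
    """Right-hand side of the SIR system (assumed normalized form)."""
    s, i, _ = y
    new_infections = beta * s * i / n  # flow S -> I
    recoveries = alpha * i             # flow I -> R
    return np.array([-new_infections, new_infections - recoveries, recoveries])


def simulate_sir(s0, i0, r0, beta, alpha, days, steps_per_day=10):
    """Integrate the SIR system with classical RK4 and return daily values."""
    n = s0 + i0 + r0
    h = 1.0 / steps_per_day
    y = np.array([s0, i0, r0], dtype=float)
    out = [y.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            k1 = sir_rhs(y, beta, alpha, n)
            k2 = sir_rhs(y + 0.5 * h * k1, beta, alpha, n)
            k3 = sir_rhs(y + 0.5 * h * k2, beta, alpha, n)
            k4 = sir_rhs(y + h * k3, beta, alpha, n)
            y = y + h / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
        out.append(y.copy())
    return np.array(out)  # shape (days + 1, 3): columns S, I, R


# Hypothetical example: N = 1000, one initial infection, beta = 0.3, alpha = 0.1.
beta, alpha = 0.3, 0.1
traj = simulate_sir(999.0, 1.0, 0.0, beta, alpha, days=150)
S, I, R = traj.T
N = traj[0].sum()
r_t = beta / alpha * S / N  # reproduction number over time
print(f"R_0 = {beta / alpha:.1f}, peak infections on day {I.argmax()}")
print(f"R_t at the peak: {r_t[I.argmax()]:.2f}")
```

Because $\frac{dI}{dt} = \alpha(\Rt - 1)I$ in this form, the infectious curve peaks where $\Rt$ crosses 1. At daily resolution, the printed value is therefore close to 1.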
The objective of this thesis is to identify the epidemiological parameters
$\beta$ and $\alpha$, as well as the reproduction number $\Rt$, of COVID-19 over
the first 1200 days of recorded data in Germany and its federal states. The
Robert Koch Institute (RKI) has compiled data on both reported cases and
associated mortalities from the beginning of the outbreak in Germany to the
present. We preprocess this data into the format required by our approaches. As
the raw data lacks information on recovery incidence, we introduce a recovery
queue that simulates a recovery period. To estimate the epidemiological
parameters, we adopt the approach of Shaier \etal~\cite{Shaier2021}, in which a
PINN is trained on data consisting of time points and the respective sizes of
the $S$, $I$, and $R$ compartments, and predicts the epidemiological parameters
based on both the data and the governing system of differential equations.
Moreover, we apply the methodology proposed by Millevoi
\etal~\cite{Millevoi2023}, which operates on the reduced SIR model and thus
requires only the size of the $I$ compartment at each time step. We use it to
estimate the reproduction number for each day of the 1200-day span, for each
German state and for Germany as a whole. To validate the effectiveness of these
methods, we first conduct experiments on a small synthetic dataset before
applying them to real-world data. We then assess the plausibility of our results
by comparing them to real-world events and data, such as the vaccination rates
of each region or the peaks of impactful variants. This analysis reveals a
correlation between our results and real-world developments, thus supporting
the effectiveness of our approach.\\
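Because the RKI data report new cases but no recoveries, the $R$ compartment has to be reconstructed. The following Python sketch illustrates one possible form of such a recovery queue. It rests on the loud assumption that every reported case remains infectious for exactly $D$ days and is then moved to the removed compartment. Under this assumption, recovered and deceased individuals are not separated and both end up in $R$. The function name \texttt{build\_sir\_series}, the fixed period $D$ (\texttt{recovery\_days}), and the example numbers are hypothetical and do not necessarily match the preprocessing described in \Cref{chap:methods}.

```python
from collections import deque


def build_sir_series(new_cases, population, recovery_days):
    """Turn daily reported case counts into S, I, R series.

    Hypothetical recovery queue: each day's cases stay infectious for exactly
    `recovery_days` days and are then moved to the removed compartment.
    """
    queue = deque()  # case counts of the last `recovery_days` days
    infectious = 0
    removed = 0
    s_series, i_series, r_series = [], [], []
    for cases in new_cases:
        queue.append(cases)
        infectious += cases
        if len(queue) > recovery_days:
            released = queue.popleft()  # cases reported `recovery_days` ago
            infectious -= released
            removed += released
        s_series.append(population - infectious - removed)
        i_series.append(infectious)
        r_series.append(removed)
    return s_series, i_series, r_series


s, i, r = build_sir_series([5, 3, 0, 2], population=100, recovery_days=2)
print(s, i, r)  # [95, 92, 92, 90] [5, 8, 3, 2] [0, 0, 5, 8]
```

A deque makes both enqueueing a new day and releasing the oldest day $O(1)$, and $S + I + R = N$ holds on every day by construction.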
% -------------------------------------------------------------------
\section{Related work}
\label{sec:relatedWork}
In this section, we situate our work within the existing literature on solving
epidemiological models for real-world data. The first work, by Smirnova
\etal~\cite{Smirnova2017}, proposes a stochastic methodology for estimating the
time-dependent transmission rate $\beta(t)$. They achieve this by projecting the
time-dependent transmission rate onto a finite-dimensional subspace spanned by
Legendre polynomials. Subsequently, they compare three regularization
techniques, namely variational (Tikhonov) regularization, truncated singular
value decomposition (TSVD), and modified TSVD, to determine the most reliable
method for forecasting with limited data. Their findings indicate that modified
TSVD provides the most stable forecasts on limited data, as demonstrated on both
simulated data and real-world data from the 1918 influenza pandemic and the
Ebola epidemic. In contrast, we utilize PINNs to find the constant
epidemiological parameters and the reproduction number for Germany and its
states.\\
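The projection-and-regularization idea can be illustrated with a small least-squares problem. Samples of a transmission rate are expanded in Legendre polynomials, and the coefficients are recovered with either Tikhonov regularization or plain TSVD. The Python sketch below is not the algorithm of Smirnova \etal, which estimates $\beta(t)$ from case data through the model equations, and it does not implement modified TSVD. It only contrasts the two basic regularizers on direct, noisy observations of a hypothetical $\beta(t)$. The basis size, noise level, regularization parameter, and truncation rank are arbitrary choices.

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

# Hypothetical smooth transmission rate on [0, 100] days, observed with noise.
t = np.linspace(0.0, 100.0, 101)
beta_true = 0.3 - 0.1 * np.tanh((t - 50.0) / 25.0)
beta_obs = beta_true + rng.normal(scale=0.01, size=t.size)

# Map time to [-1, 1] and build the Legendre design matrix (degrees 0..9).
x = 2.0 * (t - t[0]) / (t[-1] - t[0]) - 1.0
A = legendre.legvander(x, 9)  # shape (101, 10)


def tikhonov(A, y, lam):
    """Solve min ||A c - y||^2 + lam * ||c||^2 via the normal equations."""
    k = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(k), A.T @ y)


def tsvd(A, y, rank):
    """Least-squares solution keeping only the `rank` largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :rank].T @ y) / s[:rank]
    return Vt[:rank].T @ coeffs


for name, c in [("Tikhonov", tikhonov(A, beta_obs, 1e-2)),
                ("TSVD", tsvd(A, beta_obs, 8))]:
    beta_fit = legendre.legval(x, c)
    rmse = np.sqrt(np.mean((beta_fit - beta_true) ** 2))
    print(f"{name}: RMSE = {rmse:.4f}")
```

Tikhonov regularization damps all coefficient directions smoothly. TSVD instead discards the directions belonging to the smallest singular values entirely, and with the full rank it reduces to ordinary least squares.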
Several related works apply PINN-based approaches to COVID-19 and other
real-world diseases~\cite{Shaier2021,Millevoi2023,Berkhahn2022,Olumoyin2021}.
Specifically, Shaier \etal~\cite{Shaier2021} put forth a data-driven approach
that they refer to as \emph{Disease-Informed Neural Networks} (DINN). In their
work, they demonstrate the capacity of DINNs to forecast the trajectory of
epidemics and pandemics. They underpin the efficacy of their approach by
applying it to 11 diseases that have previously been modeled. In their
experiments, they employ the SIDR (susceptible, infectious, dead, recovered)
model. Finally, they show that this method is a robust and effective means of
identifying the parameters of an SIR model.\\
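What a DINN or PINN drives to zero during training is the residual of the governing equations. Because the SIR right-hand side is linear in $\beta$ and $\alpha$, this idea can be illustrated without a neural network. In the Python sketch below, central finite differences of synthetic compartment data stand in for the derivatives that a PINN would obtain from its network by automatic differentiation. $\beta$ and $\alpha$ are then found by linear least squares on the residuals of all three equations. This is a simplified collocation-style substitute, not the method of Shaier \etal. The normalized SIR form, the RK4-generated data, and all parameter values are assumptions.

```python
import numpy as np


def sir_rhs(y, beta, alpha, n):
    """Right-hand side of the SIR system (assumed normalized form)."""
    s, i, _ = y
    flow_si = beta * s * i / n
    return np.array([-flow_si, flow_si - alpha * i, alpha * i])


# Synthetic daily data from an RK4 solution (hypothetical true parameters).
beta_true, alpha_true, n = 0.3, 0.1, 1000.0
y, h, data = np.array([999.0, 1.0, 0.0]), 0.1, []
for step in range(100 * 10 + 1):
    if step % 10 == 0:
        data.append(y.copy())  # keep one sample per day
    k1 = sir_rhs(y, beta_true, alpha_true, n)
    k2 = sir_rhs(y + h / 2 * k1, beta_true, alpha_true, n)
    k3 = sir_rhs(y + h / 2 * k2, beta_true, alpha_true, n)
    k4 = sir_rhs(y + h * k3, beta_true, alpha_true, n)
    y = y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
S, I, R = np.array(data).T  # 101 daily samples each

# Derivatives: a PINN would differentiate its network output; here we use
# second-order central differences with a spacing of one day.
dS, dI, dR = (np.gradient(c, 1.0) for c in (S, I, R))

# The residuals of all three equations are linear in (beta, alpha):
#   dS = -beta*S*I/N,  dI = beta*S*I/N - alpha*I,  dR = alpha*I
si = S * I / n
zeros = np.zeros_like(I)
A = np.vstack([np.column_stack([-si, zeros]),
               np.column_stack([si, -I]),
               np.column_stack([zeros, I])])
b = np.concatenate([dS, dI, dR])
(beta_hat, alpha_hat), *_ = np.linalg.lstsq(A, b, rcond=None)
print(f"beta = {beta_hat:.3f}, alpha = {alpha_hat:.3f}")
```

A PINN replaces the fixed finite differences with a trainable surrogate whose fit to the data and to these residuals is optimized jointly. This makes the approach applicable to noisy data, where differencing the raw observations would amplify the noise.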
Similarly, Berkhahn and Ehrhardt~\cite{Berkhahn2022} employ the susceptible,
vaccinated, infectious, hospitalized, and removed (SVIHR) model. Their PINN
methodology first estimates the SVIHR model parameters for German COVID-19 data
covering the time span from the inception of the outbreak to the end of 2021.
For comparison, Berkhahn and Ehrhardt also employ the method of non-standard
finite differences (NSFD). The authors use both methods to project the
trajectory of COVID-19 from mid-April 2023 onwards. Berkhahn and Ehrhardt find
that the PINN is able to adapt to varying vaccination rates and emerging
variants.\\
Furthermore, Olumoyin \etal~\cite{Olumoyin2021} employ an alternative
methodology for identifying the time-dependent transmission rate of an
asymptomatic-SIR model that accounts for unreported infectious cases. Their
PINN approach utilizes the cumulative and daily reported infection cases and
symptomatic recovered cases to demonstrate the effect of different mitigation
measures and to ascertain the proportions of asymptomatic infectious and
asymptomatic recovered individuals. With this, they illustrate the influence of
vaccination and a set of non-pharmaceutical mitigation measures on the
transmission of COVID-19, using data from Italy, South Korea, the United
Kingdom, and the United States.\\
Finally, Millevoi \etal~\cite{Millevoi2023} address changes in the transmission
rate caused by the dynamics of a pandemic. The authors employ the reproduction
number to reduce the system of differential equations to a single equation and
introduce a reduced-split version of the PINN, which first trains on the data
and then trains to minimize the residual of the ordinary differential equation.
They test their approach on five synthetic and two real-world scenarios from the
early stages of the COVID-19 pandemic in Italy. This method yields an increase
in both accuracy and training speed. In contrast to these works, we estimate the
rates and the reproduction number for Germany and its federal states over the
entire span from early March 2020 to late June 2023.
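The reduction to a single equation rests on rewriting the infectious equation in terms of the reproduction number. Assuming the normalized SIR form with constant $\alpha$,
\[
  \frac{dI}{dt} = \beta\frac{SI}{N} - \alpha I = \alpha(\Rt - 1)I,
  \qquad \Rt = \frac{\beta S}{\alpha N},
\]
so $\Rt$ can be read off from $I$ alone once $\alpha$ is fixed. The Python sketch below checks this relation numerically. It integrates the reduced equation with a hypothetical piecewise-constant $\Rt$ and recovers $\Rt$ from the $I$ curve via $\Rt = 1 + \frac{1}{\alpha}\frac{d}{dt}\ln I$. Differences of $\ln I$ take the place of the PINN that Millevoi \etal actually train. The value of $\alpha$, the $\Rt$ profile, and the function name \texttt{r\_t\_true} are assumptions.

```python
import numpy as np

alpha = 0.1  # hypothetical recovery rate (1/days)
days = 120
t = np.arange(days + 1, dtype=float)


def r_t_true(time):
    """Hypothetical reproduction number: 2.5 before day 40, 0.8 afterwards."""
    return np.where(time < 40.0, 2.5, 0.8)


# Integrate dI/dt = alpha * (R_t - 1) * I exactly day by day: R_t is constant
# on [k, k+1), so I(k+1) = I(k) * exp(alpha * (R_t(k) - 1)).
growth = np.exp(alpha * (r_t_true(t[:-1]) - 1.0))
I = np.concatenate([[10.0], 10.0 * np.cumprod(growth)])

# Recover R_t from I alone: R_t = 1 + (d/dt ln I) / alpha, using forward
# differences of ln I, which are exact for this piecewise-exponential curve.
r_t_est = 1.0 + np.diff(np.log(I)) / alpha
print(np.round(r_t_est[[0, 39, 40, 119]], 3))
```

Working with $\ln I$ turns the multiplicative growth into a linear slope, which is why a single infectious time series suffices. On real, noisy data this differencing is unstable, which motivates fitting a smooth surrogate such as a PINN instead.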
% -------------------------------------------------------------------
\section{Overview}
The remainder of this thesis comprises four chapters. \Cref{chap:background}
presents the theoretical background of mathematical modeling in epidemiology,
with a particular focus on the SIR model. Subsequently, it shifts its focus to
neural networks, specifically to the background of PINNs and their use in
solving ordinary differential equations. \Cref{chap:methods} outlines the
methodology employed in this thesis. First, we present the data collected by
the RKI. Then, we present the PINN approaches, which are inspired by the work of
Shaier \etal~\cite{Shaier2021} and Millevoi \etal~\cite{Millevoi2023}.
\Cref{chap:evaluation} presents the setups and results of the experiments that
we conduct. This chapter is divided into two sections. The first section
presents and discusses the results concerning the epidemiological parameters
$\beta$ and $\alpha$. The subsequent section presents the results concerning
the reproduction number $\Rt$. Finally, in \Cref{chap:conclusions}, we connect
our results with real-world events and give an overview of potential further
work.
% -------------------------------------------------------------------