|
@@ -8,60 +8,39 @@
|
|
|
% Version: 01.01.2012
|
|
|
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
|
|
-\chapter{Introduction 5}
|
|
|
+\chapter{Introduction}
|
|
|
\label{chap:introduction}
|
|
|
|
|
|
In the early months of 2020, Germany, like many other countries, was struck by the novel
|
|
|
-\emph{Coronavirus Disease} (COVID-19). The pandemic, which originates in
|
|
|
+\emph{Coronavirus Disease} (COVID-19)~\cite{WHO}. The pandemic, which originated in
|
|
|
Wuhan, China, had a profound impact on the global community, paralyzing it for
|
|
|
over two years. In response to the pandemic, the German government employed a
|
|
|
-multifaceted approach, encompassing the introduction of vaccines and
|
|
|
+multifaceted approach~\cite{RKI}, encompassing the introduction of vaccines and
|
|
|
non-pharmaceutical mitigation policies such as lockdowns. Between mitigation
|
|
|
policies and varying strains of COVID-19, which have exhibited varying degrees
|
|
|
-of infectiousness and lethality, Germany had recorded over 38,400,000 infection
|
|
|
-cases and 174,000 deaths, as of the end of June in 2023. In light of these
|
|
|
+of infectiousness and lethality~\cite{RKIa}, Germany had recorded over 38,400,000 infection
|
|
|
+cases and 174,000 deaths, as of the end of June in 2023~\cite{SRD}. In light of these
|
|
|
figures the need for an analysis arises.\\
|
|
|
|
|
|
The dynamics of the spread of disease transmission in the real-world are
|
|
|
complex. A multitude of factors influence the course of a disease, and it is
|
|
|
-challenging to gain a comprehensive understanding of these factors and develop a
|
|
|
-tool that allows for the comparison of disease courses across different diseases
|
|
|
+challenging to gain a comprehensive understanding of these factors and develop
|
|
|
+tools that allow for the comparison of disease courses across different diseases
|
|
|
and time points. The common approach in epidemiology to address this is the
|
|
|
utilization of epidemiological models that approximate the dynamics by focusing
|
|
|
-on specific factors and modeling these using differential equations and other
|
|
|
-mathematical tools for modeling. These models provide transition rates and
|
|
|
-parameters that determine the behavior of a disease within the boundaries of the
|
|
|
-model. A fundamental epidemiological model, is the \emph{SIR model}, which was
|
|
|
-first proposed by Kermack and McKendrick~\cite{1927} in 1927. The SIR model is a
|
|
|
-compartmentalized model that divides the entire population into three distinct
|
|
|
-compartments. The first compartment is the \emph{susceptible} compartment, $S$,
|
|
|
-which contains all individuals of the population who are susceptible to
|
|
|
-infection. The second group, is the \emph{infectious} compartment, $I$, which
|
|
|
-comprises all individuals currently infected and capable of infecting
|
|
|
-susceptible individuals. Lastly, the \emph{removed} compartment, $R$, contains
|
|
|
-all individuals, who have succumbed to the disease or recovered from it and are
|
|
|
-therefore no longer susceptible to infection. The model is characterized by two
|
|
|
-transition rates: the transmission rate $\beta$, which controls the rate of
|
|
|
-individuals becoming infected and consequently transitioning from $S$ to $I$;
|
|
|
-and the recovery rate $\alpha$, which determines the rate at which individuals
|
|
|
-either recover or succumb to the disease, thereby transitioning from $I$ to $R$.
|
|
|
-In the context of the SIR model, the values of $\beta$ and $\alpha$ serve to
|
|
|
-quantify and determine the course of a pandemic.\\
|
|
|
-
|
|
|
-The transition rates of $\beta$ and $\alpha$ serve to quantify a pandemic across
|
|
|
-its entire duration. However, it is important to recognize that a pandemic is
|
|
|
-not a static entity; rather, it evolves, and the infectiousness, deadliness and
|
|
|
-time to recovery associated with it change with each of its numerous variants.
|
|
|
-To address this issue, Liu and Stechlinski, and Setianto and Hidayat~\cite{Liu2012, Setianto2023},
|
|
|
-propose an SIR model with time-dependent transition rates $\beta(t)$ and
|
|
|
-$\alpha(t)$. From these rates, they derive the time-dependent reproductive
|
|
|
-number $\Rt$, which represents the average number of individuals, that are
|
|
|
-infected by one infectious person. A high value for $\Rt$ indicates a rapid
|
|
|
-spread of the disease, while a low value either suggests either an outbreak or
|
|
|
-the disease is declining. This qualifies the time-dependent reproduction number
|
|
|
-$\Rt$ as an indicator of the pandemic's progression.\\
|
|
|
-
|
|
|
-The SIR model is defined by a system of differential equations, that incorporate
|
|
|
+on specific factors and modeling these using mathematical tools. These models
|
|
|
+provide transition rates and parameters that determine the behavior of a disease
|
|
|
+within the boundaries of the model. A fundamental epidemiological model is the
|
|
|
+\emph{SIR model}, which was first proposed by Kermack and McKendrick~\cite{1927}
|
|
|
+in 1927. The SIR model is a compartmentalized model that divides the entire
|
|
|
+population into three distinct groups: the \emph{susceptible} compartment, $S$; the
|
|
|
+\emph{infectious} compartment, $I$; and the \emph{removed} compartment, $R$.
|
|
|
+In the context of the SIR model, the constant parameters of the transmission
|
|
|
+rate $\beta$ and the recovery rate $\alpha$ serve to quantify and determine the
|
|
|
+course of a pandemic. However, a pandemic is not a static entity; therefore, Liu
|
|
|
+and Stechlinski~\cite{Liu2012} as well as Setianto and Hidayat~\cite{Setianto2023}
|
|
|
+propose an SIR model with time-dependent transition rates and reproduction number $\Rt$. The SIR model
|
|
|
+is defined by a system of differential equations that incorporate
|
|
|
the transition rates, thereby depicting the fluctuation between the three
|
|
|
compartments. For a given set of data, the transition rate can be identified by
|
|
|
solving the set of differential systems. Recently, the data-driven approach of
|
|
@@ -69,92 +48,98 @@ solving the set of differential systems. Recently, the data-driven approach of
|
|
|
capability of finding solutions to differential equations by fitting its
|
|
|
predictions to both given data and the governing system of differential
|
|
|
equations. By employing this methodology, Shaier \etal~\cite{Shaier2021} were
|
|
|
-able to find the transition rate on synthetic data. Additionally, Millevoi
|
|
|
-\etal~\cite{Millevoi2023} were able to identify the reproduction number $\Rt$
|
|
|
-for both synthetic and Italian COVID-19 data using an approach based on a
|
|
|
+able to find the transition rate on data for different diseases. Additionally,
|
|
|
+Millevoi \etal~\cite{Millevoi2023} were able to identify the reproduction number
|
|
|
+$\Rt$ for both synthetic and Italian COVID-19 data using an approach based on a
|
|
|
reduced version of the SIR model.\\
|
|
|
|
|
|
-The Robert Koch Institute has collected incident and death case data from the
|
|
|
-beginning of the outbreak in Germany to the present. This data will be utilitzed
|
|
|
-in this bachelor thesis to investigate the transition rates and reproduction
|
|
|
-number for each German state and the country as a whole, employing the
|
|
|
-methodologies proposed by Shaier \etal and Millevoi \etal. Additionally, the
|
|
|
-findings will be contextualized and correlated with the events of the real
|
|
|
-world.\\
|
|
|
+The objective of this thesis is to identify the transition rates $\beta$ and
|
|
|
+$\alpha$, as well as the reproduction number $\Rt$ of COVID-19 over the first
|
|
|
+1200 days of recorded data in Germany and its federal states. The Robert Koch
|
|
|
+Institute (RKI) has compiled data on both reported cases and associated
|
|
|
+mortalities from the beginning of the outbreak in Germany to the present. We
|
|
|
+utilize and preprocess this data according to the required format of our
|
|
|
+approaches. As the raw data lacks information on recovery incidence, we
|
|
|
+introduce the recovery queue that simulates a recovery period. To estimate the
|
|
|
+transition rates we adopt the approach of Shaier \etal~\cite{Shaier2021}, which
|
|
|
+utilizes a physics-informed neural network that learns the data, consisting of
|
|
|
+time points with the respective sizes of the $S$, $I$ and $R$ compartments, to
|
|
|
+predict the transition rates based on the data and the governing system of
|
|
|
+differential equations. Moreover, we utilize the methodology proposed by
|
|
|
+Millevoi \etal~\cite{Millevoi2023} that estimates the reproduction number for
|
|
|
+each day across the 1200-day span for each German state and Germany as a whole,
|
|
|
+in a reduced SIR model, thus needing only the size of the $I$ group for each time
|
|
|
+step. To validate the effectiveness of these methods, we first conduct
|
|
|
+experiments on a small synthetic dataset before applying the techniques to
|
|
|
+real-world data. We then analyze the plausibility of our results by comparing
|
|
|
+them to real-world events and data such as vaccination ratios of each region or
|
|
|
+the peaks of impactful variants.
|
|
|
+This analysis demonstrates the relevance of our findings and reveals a
|
|
|
+correlation between our results and real-world developments, thus supporting the
|
|
|
+effectiveness of our approach.\\
|
|
|
+
|
|
|
|
|
|
% -------------------------------------------------------------------
|
|
|
|
|
|
-\section{Related work 2}
|
|
|
+\section{Related work}
|
|
|
\label{sec:relatedWork}
|
|
|
-In \emph{Forecasting Epidemics Through Nonparametric Estimation of
|
|
|
- Time-Dependent Transmission Rates Using the SEIR Model}~\cite{Smirnova2017},
|
|
|
-Smirnova \etal endeavor to identify a stochastic methodology for estimating the
|
|
|
-time-dependent transmission rate $\beta(t)$. This is in response to the
|
|
|
-limitations of earlier parametric estimation methods, which are prone
|
|
|
-instability due to the difficulty in identifying parameter finding and a low
|
|
|
-amount of available data. They achieve this by projecting the time-dependent
|
|
|
-transmission rate onto a finite subspace, that is defined by Legendre
|
|
|
-polynomials. Subsequently, they compare the three regularization techniques of
|
|
|
-variational (Tikhonov’s) regularization, truncated singular value decomposition
|
|
|
-(TSVD), and modified TSVD to ascertain the most reliable method for forecasting
|
|
|
-with limited data. Their findings indicate that modified TSVD provides the most
|
|
|
-stable forecasts on limited data, as demonstrated on both simulated data and
|
|
|
-real-world data from the 1918 influenza pandemic and the 2014-2015 Ebola
|
|
|
-epidemic.\\
|
|
|
-
|
|
|
-In their publication, entitled \emph{Data-driven approaches for predicting
|
|
|
- spread of infectious diseases through DINNs: Disease Informed Neural Networks},
|
|
|
-Shaier \etal~\cite{Shaier2021} put forth a data-driven approach for identifying
|
|
|
-the parameters of epidemiological models. The authors apply physics-informed
|
|
|
-neural networks to the compartmental SIR models, and refer to their method as
|
|
|
-disease informed neural networks (DINN). In their work, they demonstrate the
|
|
|
-capacity of DINNs to forecast the trajectory of epidemics and pandemics. They
|
|
|
-underpin the efficacy of their approach by applying it to 11 diseases, that have
|
|
|
-previously been modeled, including examples such as COVID, HIV, Tuberculosis and
|
|
|
-Ebola. In their experiments they employ the SIDR (susceptible, infectious, dead,
|
|
|
-recovered) model. Finally, they present that this method is a robust and
|
|
|
-effective means of identifying the parameters of a SIR model.\\
|
|
|
-
|
|
|
-In their article \emph{A physics-informed neural network to model COVID-19
|
|
|
- infection and hospitalization scenarios}, Berkhahn and Ehrhard~\cite{Berkhahn2022}
|
|
|
-employ the susceptible, vaccinated, infectious, hospitalized and removed (SVIHR)
|
|
|
-model. They solve the system of differential equations inherent to the SVIHR
|
|
|
-model by the means of PINNs. The authors utilize a dataset of German COVID-19
|
|
|
-data, covering the time span from the inceptions of the outbreak to the end of
|
|
|
-2021. The proposed PINN methodology initially estimates the SVIHR model
|
|
|
-parameters and subsequently forecasts the data. For comparative purposes,
|
|
|
-Berkhahn and Ehrhard employ the method of non-standard finite differences (NSFD)
|
|
|
-as well. In the validation process, the two forecasting methods project the
|
|
|
-trajectory of COVID-19 from mid-April onwards. Berkhahn and Ehrhard find that
|
|
|
-the PINN is able to adapt to varying vaccination rates and emerging variants.\\
|
|
|
-
|
|
|
-In their work, \emph{Data-Driven Deep-Learning Algorithm for Asymptomatic
|
|
|
- COVID-19 Model with Varying Mitigation Measures and Transmission Rate},
|
|
|
-Olumoyin \etal~\cite{Olumoyin2021} employ an alternative methodology for
|
|
|
-identifying the time-dependent transmission rate of an asymptomatic-SIR model.
|
|
|
-On the premise that not all the infectious individuals are reported and included
|
|
|
-in the data available. The algorithm they introduce, utilizes the cumulative and
|
|
|
-daily reported infection cases and symptomatic recovered cases, to demonstrate
|
|
|
-the effect of different mitigation measures and to ascertain the size of the
|
|
|
-part of non-symptomatic individuals in the total number of infective individuals
|
|
|
-and the proportion of asymptomatic recovered individuals. With this they can
|
|
|
+In this section, we place our work in the context of existing literature
|
|
|
+on solving epidemiological models for real-world data. The
|
|
|
+first work, by Smirnova \etal~\cite{Smirnova2017}, endeavors to identify a
|
|
|
+stochastic methodology for estimating the time-dependent transmission rate
|
|
|
+$\beta(t)$. They achieve this by projecting the time-dependent transmission rate
|
|
|
+onto a finite subspace that is defined by Legendre polynomials. Subsequently,
|
|
|
+they compare the three regularization techniques of variational (Tikhonov’s)
|
|
|
+regularization, truncated singular value decomposition (TSVD), and modified TSVD
|
|
|
+to ascertain the most reliable method for forecasting with limited data. Their
|
|
|
+findings indicate that modified TSVD provides the most stable forecasts on
|
|
|
+limited data, as demonstrated on both simulated data and real-world data from
|
|
|
+the 1918 influenza pandemic and the Ebola epidemic. In contrast, we
|
|
|
+utilize physics-informed neural networks (PINN) to find the constant transition rates
|
|
|
+and the reproduction number for Germany and its states.\\
|
|
|
+
|
|
|
+Several related works, like ours, apply PINN approaches to COVID-19 and other
|
|
|
+real-world disease data~\cite{Shaier2021,Berkhahn2022,Olumoyin2021,Millevoi2023}.
|
|
|
+Specifically in~\cite{Shaier2021}, Shaier \etal put forth a data-driven
|
|
|
+approach which they refer to as disease informed neural networks (DINN). In their work,
|
|
|
+they demonstrate the capacity of DINNs to forecast the trajectory of epidemics
|
|
|
+and pandemics. They underpin the efficacy of their approach by applying it to 11
|
|
|
+diseases that have previously been modeled. In their experiments, they employ
|
|
|
+the SIDR (susceptible, infectious, dead, recovered) model. Finally, they show
|
|
|
+that this method is a robust and effective means of identifying the parameters
|
|
|
+of an SIR model.\\
|
|
|
+
|
|
|
+Similarly in~\cite{Berkhahn2022}, Berkhahn and Ehrhardt employ the susceptible,
|
|
|
+vaccinated, infectious, hospitalized and removed (SVIHR) model. The proposed
|
|
|
+PINN methodology initially estimates the SVIHR model parameters for German
|
|
|
+COVID-19 data, covering the time span from the inception of the outbreak to the
|
|
|
+end of 2021. For comparative purposes, Berkhahn and Ehrhardt employ the method of
|
|
|
+non-standard finite differences (NSFD) as well. The authors employ both methods
|
|
|
+to project the trajectory of COVID-19 from mid-April
|
|
|
+onwards. Berkhahn and Ehrhardt find that the PINN is able to adapt to
|
|
|
+varying vaccination rates and emerging variants.\\
|
|
|
+
|
|
|
+Furthermore, Olumoyin \etal~\cite{Olumoyin2021} employ an alternative
|
|
|
+methodology for identifying the time-dependent transmission rate of an
|
|
|
+asymptomatic-SIR model accounting for unreported infectious cases. The PINN
|
|
|
+approach they introduce utilizes the cumulative and daily reported infection
|
|
|
+cases and symptomatic recovered cases to demonstrate the effect of different
|
|
|
+mitigation measures and to ascertain the proportion of non-symptomatic
|
|
|
+individuals and asymptomatic recovered individuals. With this they can
|
|
|
illustrate the influence of vaccination and a set non-pharmaceutical mitigation
|
|
|
methods on the transmission of COVID-19 on data from Italy, South Korea, the
|
|
|
United Kingdom, and the United States.\\
|
|
|
|
|
|
-In \emph{A Physics-Informed Neural Network approach for compartmental
|
|
|
- epidemiological models} Millevoi \etal~\cite{Millevoi2023} address the issue
|
|
|
-of describing the dynamically changing transmission rate, which is influenced by
|
|
|
-the emergence of new variants or the implementation of non-pharmaceutical
|
|
|
-measures. They employ a PINN to maintain an account of the changes of the
|
|
|
-transmission rate included in the reproduction number and to approximate the
|
|
|
-model state variables. To this end, Millevoi \etal employ the reproduction
|
|
|
-number to reduce the system of differential equations to a single equation and
|
|
|
-introduce a reduced-split version of the PINN, which initially trains on the
|
|
|
-data and then trains to minimize the residual of the ODE. They test their
|
|
|
-approach on five synthetic and two real-world scenarios from the early stages of
|
|
|
-the COVID-19 pandemic in Italy. This method yields an increase in both accuracy
|
|
|
-and training speed.
|
|
|
+Finally, Millevoi \etal~\cite{Millevoi2023} address the issue of the changes in
|
|
|
+the transmission rate due to the dynamics of a pandemic. The authors employ the
|
|
|
+reproduction number to reduce the system of differential equations to a single
|
|
|
+equation and introduce a reduced-split version of the PINN, which initially
|
|
|
+trains on the data and then trains to minimize the residual of the ODE. They
|
|
|
+test their approach on five synthetic and two real-world scenarios from the
|
|
|
+early stages of the COVID-19 pandemic in Italy. This method yields an increase
|
|
|
+in both accuracy and training speed. In contrast to these works, we estimate
|
|
|
+the rates and the reproduction number for Germany for the entirety of the span
|
|
|
+from early March in 2020 to late June in 2023.
|
|
|
|
|
|
% -------------------------------------------------------------------
|
|
|
|
|
@@ -164,11 +149,11 @@ This thesis is comprised of four chapters. \Cref{chap:background}
|
|
|
presents with the theoretical overview of mathematical modeling in epidemiology,
|
|
|
with a particular focus on the SIR model. Subsequently, it shifts its focus to
|
|
|
neural networks, specifically on the background of physics-informed neural
|
|
|
-networks (PINN) and their use in solving ordinary differential equations.
|
|
|
-In~\Cref{chap:methods} outlines the methodology employed in this thesis. First
|
|
|
+networks (PINN) and their use in solving ordinary differential equations.~\Cref{chap:methods}
|
|
|
+outlines the methodology employed in this thesis. First
|
|
|
we present the data, that was collected by the Robert Koch Institute (RKI). Then
|
|
|
-we present the PINN approaches, which are inspired by the work of Shaier \etal
|
|
|
-and Millevoi \etal~\cite{Shaier2021,Millevoi2023}.~\Cref{chap:evaluation}
|
|
|
+we present the PINN approaches, which are inspired by the work of Shaier \etal~\cite{Shaier2021}
|
|
|
+and Millevoi \etal~\cite{Millevoi2023}.~\Cref{chap:evaluation}
|
|
|
presents the setups and results of the experiments that we conduct. This chapter
|
|
|
is divided into two sections. The first section presents and discusses the
|
|
|
results concerning the transition rates of $\beta$ and $\alpha$. The subsequent
|