% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author:   Phillip Rothenbeck
% Title:    Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File:     chap01-introduction/chap01-introduction.tex
% Part:     introduction
% Description:
%         summary of the content in this chapter
% Version:  01.01.2012
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\chapter{Introduction}
\label{chap:introduction}

In the early months of 2020, Germany, like many other countries, was struck by the novel
\emph{Coronavirus Disease} (COVID-19). The pandemic, which originated in
Wuhan, China, had a profound impact on the global community, paralyzing it for
more than two years. In response to the pandemic, the German government employed a
multifaceted approach, encompassing the introduction of vaccines and
non-pharmaceutical mitigation policies such as lockdowns. Amid these mitigation
policies and the emergence of successive virus variants, which exhibited varying
degrees of infectiousness and lethality, Germany had recorded more than
38,400,000 infection cases and 174,000 deaths by the end of June 2023. These
figures underline the need for a systematic analysis of the course of the
pandemic.\\

The real-world dynamics of disease transmission are complex. A multitude of
factors influence the course of a disease, and it is challenging to gain a
comprehensive understanding of these factors and to develop a tool that allows
for the comparison of disease courses across different diseases and time points.
A common approach in epidemiology is to use epidemiological models that
approximate these dynamics by focusing on specific factors and describing them
with differential equations and other mathematical tools. These models provide
transition rates and parameters that determine the behavior of a disease within
the boundaries of the model. A fundamental epidemiological model is the
\emph{SIR model}, which was first proposed by Kermack and
McKendrick~\cite{1927} in 1927. The SIR model is a compartmental model that
divides the entire population into three distinct
compartments. The first compartment is the \emph{susceptible} compartment, $S$,
which contains all individuals of the population who are susceptible to
infection. The second compartment is the \emph{infectious} compartment, $I$, which
comprises all individuals currently infected and capable of infecting
susceptible individuals. Lastly, the \emph{removed} compartment, $R$, contains
all individuals who have succumbed to the disease or recovered from it and are
therefore no longer susceptible to infection. The model is characterized by two
transition rates: the transmission rate $\beta$, which controls the rate of
individuals becoming infected and consequently transitioning from $S$ to $I$;
and the recovery rate $\alpha$, which determines the rate at which individuals
either recover or succumb to the disease, thereby transitioning from $I$ to $R$.
In the context of the SIR model, the values of $\beta$ and $\alpha$ serve to
quantify and determine the course of a pandemic.\\
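
For orientation, a common form of the SIR system is given below. It assumes
that the total population $N = S + I + R$ is constant; the normalization and
notation used in later chapters may differ slightly from this sketch.
% Standard (unnormalized) SIR system; assumes constant N = S + I + R.
\begin{align}
  \frac{dS}{dt} &= -\beta \frac{S I}{N}, \\
  \frac{dI}{dt} &= \beta \frac{S I}{N} - \alpha I, \\
  \frac{dR}{dt} &= \alpha I.
\end{align}
Summing the three equations gives $\frac{d}{dt}(S + I + R) = 0$, so the model
neither gains nor loses individuals and $N$ indeed remains constant over time.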

In the classical SIR model, the transition rates $\beta$ and $\alpha$ are
constant and therefore characterize a pandemic over its entire duration.
However, it is important to recognize that a pandemic is not a static entity;
rather, it evolves, and the infectiousness, lethality, and time to recovery
associated with it change with each of its numerous variants. To address this
issue, Liu and Stechlinski~\cite{Liu2012} and Setianto and
Hidayat~\cite{Setianto2023} propose an SIR model with time-dependent transition
rates $\beta(t)$ and $\alpha(t)$. From these rates, they derive the
time-dependent reproduction number $\Rt$, which represents the average number of
individuals infected by a single infectious person at time $t$. A high value of
$\Rt$ indicates a rapid spread of the disease: values above one imply a growing
number of infections, whereas values below one indicate that the disease is
declining. This makes the time-dependent reproduction number $\Rt$ a suitable
indicator of the pandemic's progression.\\
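
One common way to express this quantity for the SIR model with time-dependent
rates is sketched below, again under the assumption of a constant total
population $N = S + I + R$; other definitions, for instance without the
susceptible fraction, are also found in the literature.
% Effective (time-dependent) reproduction number; assumes constant N = S + I + R.
\begin{equation}
  \Rt = \frac{\beta(t)}{\alpha(t)} \, \frac{S(t)}{N}.
\end{equation}
Here, $\beta(t)/\alpha(t)$ is the number of transmissions caused by one
infectious individual over the mean infectious period $1/\alpha(t)$, scaled by
the fraction $S(t)/N$ of the population that is still susceptible. For
instance, $\beta = 0.3$, $\alpha = 0.1$, and $S/N = 0.5$ yield $\Rt = 1.5$.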

The SIR model is defined by a system of ordinary differential equations (ODEs)
that incorporates the transition rates and thereby describes the flow of
individuals between the three compartments. For a given set of data, the
transition rates can be identified by solving the corresponding inverse problem,
that is, by finding the rates for which the solution of the system best matches
the data. Recently, the data-driven approach of \emph{physics-informed neural
networks} (PINN) has gained attention due to its capability of finding solutions
to differential equations by fitting its predictions to both the given data and
the governing system of differential equations. By employing this methodology,
Shaier \etal~\cite{Shaier2021} were able to identify the transition rates from
synthetic data. Additionally, Millevoi \etal~\cite{Millevoi2023} were able to
identify the reproduction number $\Rt$ for both synthetic and Italian COVID-19
data using an approach based on a reduced version of the SIR model.\\
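
To make this combined fitting concrete, a generic PINN objective for the SIR
system can be sketched as follows. The symbols $u_\theta$, $\hat{u}_i$, $t_i$,
$\tau_j$, $M_d$, and $M_c$ are introduced here only for illustration:
$u_\theta(t) = (S_\theta(t), I_\theta(t), R_\theta(t))$ denotes the network
output with weights $\theta$, $\hat{u}_i$ are observations at the $M_d$ data
points $t_i$, and $\tau_j$ are $M_c$ collocation points at which the ODE
residual is evaluated. The exact terms and weights used in later chapters may
differ from this sketch.
% Generic composite PINN loss (illustrative sketch, not the exact loss used later).
\begin{equation}
  \mathcal{L}(\theta, \beta, \alpha)
  = \frac{1}{M_d} \sum_{i=1}^{M_d} \bigl\lVert u_\theta(t_i) - \hat{u}_i \bigr\rVert^2
  + \frac{1}{M_c} \sum_{j=1}^{M_c} \Bigl\lVert \frac{d u_\theta}{dt}(\tau_j)
  - f\bigl(u_\theta(\tau_j); \beta, \alpha\bigr) \Bigr\rVert^2,
\end{equation}
where $f(u; \beta, \alpha) = \bigl(-\beta S I/N,\; \beta S I/N - \alpha I,\;
\alpha I\bigr)$ is the right-hand side of the SIR system with $N = S + I + R$.
Because $\beta$ and $\alpha$ enter the loss as trainable quantities alongside
$\theta$, minimizing $\mathcal{L}$ fits the data and identifies the transition
rates at the same time; the derivative $du_\theta/dt$ is obtained by automatic
differentiation of the network.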

The Robert Koch Institute has collected data on incident infection cases and
deaths from the beginning of the outbreak in Germany to the present. This data
will be utilized in this bachelor thesis to investigate the transition rates and
the reproduction number for each German state and for the country as a whole,
employing the methodologies proposed by Shaier \etal~\cite{Shaier2021} and
Millevoi \etal~\cite{Millevoi2023}. Additionally, the findings will be
contextualized and correlated with real-world events.\\

% -------------------------------------------------------------------

\section{Related work}
\label{sec:relatedWork}
In \emph{Forecasting Epidemics Through Nonparametric Estimation of
    Time-Dependent Transmission Rates Using the SEIR Model}~\cite{Smirnova2017},
Smirnova \etal develop a nonparametric methodology for estimating the
time-dependent transmission rate $\beta(t)$. This is in response to the
limitations of earlier parametric estimation methods, which are prone to
instability due to the difficulty of parameter identification and the small
amount of available data. They achieve this by projecting the time-dependent
transmission rate onto a finite-dimensional subspace spanned by Legendre
polynomials. Subsequently, they compare three regularization techniques,
namely variational (Tikhonov's) regularization, truncated singular value
decomposition (TSVD), and modified TSVD, to ascertain the most reliable method
for forecasting with limited data. Their findings indicate that modified TSVD
provides the most stable forecasts on limited data, as demonstrated on both
simulated data and real-world data from the 1918 influenza pandemic and the
2014--2015 Ebola epidemic.\\
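
Schematically, such a projection represents the transmission rate by a truncated
Legendre expansion. The following is a generic sketch rather than the exact
construction of Smirnova \etal: the number of basis functions $m$, the
coefficients $c_k$, and the affine map $\tau$ from the observation window
$[t_0, T]$ to the interval $[-1, 1]$ are illustrative assumptions.
% Truncated Legendre expansion of beta(t) (illustrative; m, c_k, tau are assumptions).
\begin{equation}
  \beta(t) \approx \sum_{k=0}^{m-1} c_k \, P_k\bigl(\tau(t)\bigr),
  \qquad
  \tau(t) = \frac{2(t - t_0)}{T - t_0} - 1,
\end{equation}
where $P_k$ denotes the Legendre polynomial of degree $k$. Estimating the
function $\beta(t)$ thus reduces to estimating the $m$ coefficients
$c_0, \dots, c_{m-1}$.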

In their publication, entitled \emph{Data-driven approaches for predicting
    spread of infectious diseases through DINNs: Disease Informed Neural Networks},
Shaier \etal~\cite{Shaier2021} put forth a data-driven approach for identifying
the parameters of epidemiological models. The authors apply physics-informed
neural networks to compartmental models of the SIR family and refer to their
method as disease-informed neural networks (DINNs). In their work, they
demonstrate the capacity of DINNs to forecast the trajectory of epidemics and
pandemics. They support the efficacy of their approach by applying it to 11
previously modeled diseases, including COVID-19, HIV, tuberculosis, and Ebola.
In their experiments, they employ the SIDR (susceptible, infectious, dead,
recovered) model. Finally, they conclude that this method is a robust and
effective means of identifying the parameters of an SIR model.\\

In their article \emph{A physics-informed neural network to model COVID-19
    infection and hospitalization scenarios}, Berkhahn and Ehrhard~\cite{Berkhahn2022}
employ the susceptible, vaccinated, infectious, hospitalized and removed (SVIHR)
model. They solve the system of differential equations inherent to the SVIHR
model by means of PINNs. The authors utilize a German COVID-19 dataset that
covers the time span from the inception of the outbreak to the end of
2021. The proposed PINN methodology initially estimates the SVIHR model
parameters and subsequently forecasts the data. For comparative purposes,
Berkhahn and Ehrhard employ the method of non-standard finite differences (NSFD)
as well. In the validation process, the two forecasting methods project the
trajectory of COVID-19 from mid-April onwards. Berkhahn and Ehrhard find that
the PINN is able to adapt to varying vaccination rates and emerging variants.\\

In their work, \emph{Data-Driven Deep-Learning Algorithm for Asymptomatic
    COVID-19 Model with Varying Mitigation Measures and Transmission Rate},
Olumoyin \etal~\cite{Olumoyin2021} employ an alternative methodology for
identifying the time-dependent transmission rate of an asymptomatic-SIR model.
Their approach rests on the premise that not all infectious individuals are
reported and thus included in the available data. The algorithm they introduce
utilizes the cumulative and daily reported infection cases and symptomatic
recovered cases to demonstrate the effect of different mitigation measures and
to estimate the share of asymptomatic individuals among all infectious
individuals as well as the proportion of asymptomatic recovered individuals.
With this, they illustrate the influence of vaccination and a set of
non-pharmaceutical mitigation measures on the transmission of COVID-19 using
data from Italy, South Korea, the United Kingdom, and the United States.\\

In \emph{A Physics-Informed Neural Network approach for compartmental
    epidemiological models} Millevoi \etal~\cite{Millevoi2023} address the issue
of describing the dynamically changing transmission rate, which is influenced by
the emergence of new variants or the implementation of non-pharmaceutical
measures. They employ a PINN to track the changes in the transmission rate, as
captured by the reproduction number, and to approximate the state variables of
the model. To this end, Millevoi \etal employ the reproduction
number to reduce the system of differential equations to a single equation and
introduce a reduced-split version of the PINN, which first trains on the
data and then trains to minimize the residual of the ODE. They test their
approach on five synthetic and two real-world scenarios from the early stages of
the COVID-19 pandemic in Italy. According to the authors, this method improves
both accuracy and training speed.
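
A reduction of this kind can be sketched as follows. This is an illustrative
derivation that assumes a constant total population $N = S + I + R$ and the
definition $\Rt = \beta(t) S(t) / (\alpha(t) N)$; it does not necessarily
reproduce the exact formulation of Millevoi \etal. Factoring $\alpha(t) I$ out
of the equation for the infectious compartment of the SIR system gives
% Reduced SIR dynamics expressed through R_t (illustrative derivation).
\begin{equation}
  \frac{dI}{dt}
  = \beta(t) \frac{S I}{N} - \alpha(t) I
  = \alpha(t) \left( \frac{\beta(t)}{\alpha(t)} \frac{S}{N} - 1 \right) I
  = \alpha(t) \bigl( \Rt - 1 \bigr) I.
\end{equation}
The sign of $\Rt - 1$ thus directly determines whether the number of infectious
individuals grows or shrinks, and, for a given recovery rate, this single
equation in $I$ and $\Rt$ takes the place of the coupled system.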

% -------------------------------------------------------------------

\section{Overview}

The remainder of this thesis is organized into four chapters.
\Cref{chap:background} presents a theoretical overview of mathematical modeling
in epidemiology, with a particular focus on the SIR model. Subsequently, it
turns to neural networks, specifically to the background of physics-informed
neural networks (PINN) and their use in solving ordinary differential equations.
\Cref{chap:methods} outlines the methodology employed in this thesis. First, we
present the data collected by the Robert Koch Institute (RKI). Then, we present
the PINN approaches, which are inspired by the work of Shaier \etal and
Millevoi \etal~\cite{Shaier2021,Millevoi2023}. \Cref{chap:evaluation}
presents the setups and results of the experiments that we conduct. This chapter
is divided into two sections. The first section presents and discusses the
results concerning the transition rates $\beta$ and $\alpha$. The subsequent
section presents the results concerning the reproduction number $\Rt$. Finally,
in \Cref{chap:conclusions}, we connect our results with real-world events and
give an overview of potential future work.

% -------------------------------------------------------------------