% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap01-introduction/chap01-introduction.tex
% Part: introduction
% Description:
% summary of the content in this chapter
% Version: 01.01.2012
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Introduction 5}
\label{chap:introduction}
In the early months of 2020, Germany, like many other countries, was struck by the novel \emph{Coronavirus Disease} (COVID-19). The pandemic, which originated in Wuhan, China, had a profound impact on the global community, paralyzing it for over two years. In response to the pandemic, the German government employed a multifaceted approach, encompassing the introduction of vaccines and non-pharmaceutical mitigation policies such as lockdowns. Despite these mitigation policies, and across several variants of COVID-19 with differing degrees of infectiousness and lethality, Germany had recorded over 38,400,000 infection cases and 174,000 deaths by the end of June 2023. In light of these figures, the need for an analysis arises.\\
The dynamics of disease transmission in the real world are complex. A multitude of factors influence the course of a disease, and it is challenging to gain a comprehensive understanding of these factors and to develop a tool that allows for the comparison of disease courses across different diseases and points in time. The common approach in epidemiology is to use epidemiological models that approximate the dynamics by focusing on specific factors and modeling these with differential equations and other mathematical tools. These models provide transition rates and parameters that determine the behavior of a disease within the boundaries of the model.
A fundamental epidemiological model is the \emph{SIR model}, which was first proposed by Kermack and McKendrick~\cite{1927} in 1927. The SIR model is a compartmental model that divides the entire population into three distinct compartments. The first is the \emph{susceptible} compartment, $S$, which contains all individuals of the population who are susceptible to infection. The second is the \emph{infectious} compartment, $I$, which comprises all individuals who are currently infected and capable of infecting susceptible individuals. Lastly, the \emph{removed} compartment, $R$, contains all individuals who have succumbed to the disease or recovered from it and are therefore no longer susceptible to infection. The model is characterized by two transition rates: the transmission rate $\beta$, which controls the rate at which individuals become infected and consequently transition from $S$ to $I$; and the recovery rate $\alpha$, which determines the rate at which individuals either recover or succumb to the disease, thereby transitioning from $I$ to $R$. In the context of the SIR model, the values of $\beta$ and $\alpha$ quantify and determine the course of a pandemic.\\
As constants, the transition rates $\beta$ and $\alpha$ describe a pandemic across its entire duration. However, a pandemic is not a static entity; rather, it evolves, and its infectiousness, deadliness and time to recovery change with each of its numerous variants. To address this issue, Liu and Stechlinski, and Setianto and Hidayat~\cite{Liu2012, Setianto2023}, propose an SIR model with time-dependent transition rates $\beta(t)$ and $\alpha(t)$. From these rates, they derive the time-dependent reproduction number $\Rt$, which represents the average number of individuals infected by one infectious person.
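As a minimal sketch in standard notation, the SIR system with time-dependent rates can be written as follows. The total population size $N$ and the exact form of $\Rt$ used here are assumptions made for illustration and may differ from the definitions in the cited works.
% Assumption: N = S + I + R is the constant total population size (symbol introduced for this sketch).
\[
  \frac{dS}{dt} = -\beta(t)\,\frac{S I}{N}, \qquad
  \frac{dI}{dt} = \beta(t)\,\frac{S I}{N} - \alpha(t)\,I, \qquad
  \frac{dR}{dt} = \alpha(t)\,I .
\]
% Assumption: a commonly used form of the time-dependent reproduction number.
\[
  \Rt = \frac{\beta(t)}{\alpha(t)} \cdot \frac{S(t)}{N},
  \qquad\text{so that}\qquad
  \frac{dI}{dt} = \alpha(t)\,\bigl(\Rt - 1\bigr)\,I .
\]
Under these assumptions, the number of infectious individuals grows exactly when $\Rt > 1$ and shrinks when $\Rt < 1$.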
A high value for $\Rt$ indicates a rapid spread of the disease, while a low value suggests that the outbreak is declining. This qualifies the time-dependent reproduction number $\Rt$ as an indicator of the pandemic's progression.\\
The SIR model is defined by a system of differential equations that incorporates the transition rates and thereby describes the flow between the three compartments. For a given set of data, the transition rates can be identified by solving this system of differential equations. Recently, the data-driven approach of \emph{physics-informed neural networks} (PINNs) has gained attention due to its capability of finding solutions to differential equations by fitting its predictions to both the given data and the governing system of differential equations. By employing this methodology, Shaier \etal~\cite{Shaier2021} were able to find the transition rates on synthetic data. Additionally, Millevoi \etal~\cite{Millevoi2023} were able to identify the reproduction number $\Rt$ for both synthetic and Italian COVID-19 data using an approach based on a reduced version of the SIR model.\\
The Robert Koch Institute has collected incidence and death case data from the beginning of the outbreak in Germany to the present. This data will be used in this bachelor thesis to investigate the transition rates and the reproduction number for each German state and the country as a whole, employing the methodologies proposed by Shaier \etal and Millevoi \etal. Additionally, the findings will be contextualized and correlated with real-world events.\\
% -------------------------------------------------------------------
\section{Related work 2}
\label{sec:relatedWork}
In \emph{Forecasting Epidemics Through Nonparametric Estimation of Time-Dependent Transmission Rates Using the SEIR Model}~\cite{Smirnova2017}, Smirnova \etal seek a stochastic methodology for estimating the time-dependent transmission rate $\beta(t)$.
This is in response to the limitations of earlier parametric estimation methods, which are prone to instability owing to the difficulty of parameter identification and the limited amount of available data. They achieve this by projecting the time-dependent transmission rate onto a finite-dimensional subspace defined by Legendre polynomials. Subsequently, they compare three regularization techniques, namely variational (Tikhonov's) regularization, truncated singular value decomposition (TSVD), and modified TSVD, to ascertain the most reliable method for forecasting with limited data. Their findings indicate that modified TSVD provides the most stable forecasts on limited data, as demonstrated on both simulated data and real-world data from the 1918 influenza pandemic and the 2014--2015 Ebola epidemic.\\
In their publication \emph{Data-driven approaches for predicting spread of infectious diseases through DINNs: Disease Informed Neural Networks}, Shaier \etal~\cite{Shaier2021} put forth a data-driven approach for identifying the parameters of epidemiological models. The authors apply physics-informed neural networks to compartmental SIR models and refer to their method as disease-informed neural networks (DINNs). In their work, they demonstrate the capacity of DINNs to forecast the trajectory of epidemics and pandemics. They support the efficacy of their approach by applying it to 11 previously modeled diseases, including COVID-19, HIV, tuberculosis and Ebola. In their experiments, they employ the SIDR (susceptible, infectious, dead, recovered) model. Finally, they show that this method is a robust and effective means of identifying the parameters of an SIR model.\\
In their article \emph{A physics-informed neural network to model COVID-19 infection and hospitalization scenarios}, Berkhahn and Ehrhard~\cite{Berkhahn2022} employ the susceptible, vaccinated, infectious, hospitalized and removed (SVIHR) model.
They solve the system of differential equations inherent to the SVIHR model by means of PINNs. The authors use a dataset of German COVID-19 data, covering the time span from the inception of the outbreak to the end of 2021. The proposed PINN methodology first estimates the SVIHR model parameters and subsequently forecasts the data. For comparative purposes, Berkhahn and Ehrhard employ the method of non-standard finite differences (NSFD) as well. In the validation process, the two forecasting methods project the trajectory of COVID-19 from mid-April onwards. Berkhahn and Ehrhard find that the PINN is able to adapt to varying vaccination rates and emerging variants.\\
In their work \emph{Data-Driven Deep-Learning Algorithm for Asymptomatic COVID-19 Model with Varying Mitigation Measures and Transmission Rate}, Olumoyin \etal~\cite{Olumoyin2021} employ an alternative methodology for identifying the time-dependent transmission rate of an asymptomatic-SIR model, on the premise that not all infectious individuals are reported and included in the available data. The algorithm they introduce uses the cumulative and daily reported infection cases and the symptomatic recovered cases to demonstrate the effect of different mitigation measures and to ascertain both the share of asymptomatic individuals among all infectious individuals and the proportion of asymptomatic recovered individuals. With this, they illustrate the influence of vaccination and a set of non-pharmaceutical mitigation measures on the transmission of COVID-19, using data from Italy, South Korea, the United Kingdom, and the United States.\\
In \emph{A Physics-Informed Neural Network approach for compartmental epidemiological models}, Millevoi \etal~\cite{Millevoi2023} address the issue of describing the dynamically changing transmission rate, which is influenced by the emergence of new variants or the implementation of non-pharmaceutical measures.
They employ a PINN to track the changes in the transmission rate, which are captured by the reproduction number, and to approximate the model state variables. To this end, Millevoi \etal use the reproduction number to reduce the system of differential equations to a single equation and introduce a reduced-split version of the PINN, which first trains on the data and then trains to minimize the residual of the ODE. They test their approach on five synthetic and two real-world scenarios from the early stages of the COVID-19 pandemic in Italy. This method yields an increase in both accuracy and training speed.
% -------------------------------------------------------------------
\section{Overview}
The remainder of this thesis comprises four chapters. \Cref{chap:background} presents a theoretical overview of mathematical modeling in epidemiology, with a particular focus on the SIR model. Subsequently, it shifts its focus to neural networks, specifically to the background of physics-informed neural networks (PINNs) and their use in solving ordinary differential equations. \Cref{chap:methods} outlines the methodology employed in this thesis. First, we present the data collected by the Robert Koch Institute (RKI). Then we present the PINN approaches, which are inspired by the work of Shaier \etal and Millevoi \etal~\cite{Shaier2021,Millevoi2023}. \Cref{chap:evaluation} presents the setups and results of the experiments that we conduct. This chapter is divided into two sections. The first section presents and discusses the results concerning the transition rates $\beta$ and $\alpha$. The subsequent section presents the results concerning the reproduction number $\Rt$. Finally, in \Cref{chap:conclusions}, we connect our results with real-world events and give an overview of potential further work.
% -------------------------------------------------------------------