% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Author: Phillip Rothenbeck
% Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
% File: chap04/chap04.tex
% Part: Experiments
% Description:
% summary of the content in this chapter
% Version: 01.01.2012
% %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Experiments}
\label{chap:evaluation}
In the previous chapters, we explained the methods (see~\Cref{chap:methods}) based on the theoretical background that we established in~\Cref{chap:background}. In this chapter, we present the setups and results of the experiments and simulations we ran. First, we address the experiments dedicated to finding the epidemiological parameters $\beta$ and $\alpha$ in synthetic and real-world data. Second, we identify the reproduction number in synthetic data and in real-world data of Germany. Each section is divided into the setup and the results of the experiments.
% -------------------------------------------------------------------
\section{Identifying the Transition Rates on Real-World and Synthetic Data}
\label{sec:sir}
In this section, we seek to find the transmission rate $\beta$ and the recovery rate $\alpha$ from either synthetic or preprocessed real-world data. The methodology that we employ to identify the transition rates is described in~\Cref{sec:pinn:sir}. The methods we use to preprocess the real-world data can be found in~\Cref{sec:preprocessing:rq}.
% -------------------------------------------------------------------
\subsection{Setup}
\label{sec:sir:setup}
In this section, we show the setups for the training of our PINNs, which are supposed to find the transition parameters. This includes the specific parameters for the preprocessing and the configuration of the PINNs themselves.\\
In order to validate our method, we first generate a synthetic dataset.
We do this by solving~\Cref{eq:modSIR} for a given set of parameters. The parameters are set to $\alpha = \nicefrac{1}{3}$ and $\beta = \nicefrac{1}{2}$. The size of the population is $N = \expnumber{7.6}{6}$ and the initial number of infectious individuals is $I_0 = 10$. We simulate over 150 days and obtain a dataset of the form shown in~\Cref{fig:synthetic_SIR}.\\
For the real-world RKI data, we preprocess the raw data of each state and of Germany separately using a recovery queue with a recovery period of 14 days. We set the population size of each state to the respective value counted at the end of 2019\footnote{\url{https://de.statista.com/statistik/kategorien/kategorie/8/themen/63/branche/demographie/\#overview}}. The initial number of infectious individuals is set to the number of infected people on March 9, 2020, as given in the dataset. The data we extract spans from March 9, 2020 to June 22, 2023, which is a span of 1200 days and covers the time in which COVID-19 was most active and severe.
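As a quick plausibility check of the synthetic setup described above, the chosen rates can be translated into more familiar quantities. The following is only a sketch under the assumption that~\Cref{eq:modSIR} behaves like the classical SIR model in the early phase, in which almost the whole population is susceptible ($S \approx N$). The symbols $\mathcal{R}_0$ (basic reproduction number), $r$ (initial growth rate) and $T_2$ (initial doubling time) are introduced here for illustration only.
\[
    \mathcal{R}_0 = \frac{\beta}{\alpha} = \frac{1/2}{1/3} = 1.5,
    \qquad
    r = \beta - \alpha = \frac{1}{6}\ \mathrm{day}^{-1},
    \qquad
    T_2 = \frac{\ln 2}{r} = 6 \ln 2 \approx 4.2\ \mathrm{days}.
\]
Under these assumptions, the number of infectious individuals initially doubles roughly every four days, and the mean infectious period is $\nicefrac{1}{\alpha} = 3$ days.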
\begin{figure}[h]
    %\centering
    \setlength{\unitlength}{1cm} % Set the unit length for coordinates
    \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
        \put(1.5, 4.5){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{SIR_synth.pdf}
                \label{fig:synthetic_SIR}
            \end{subfigure}
        }
        \put(8, 4.5){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Germany_SIR_14.pdf}
                \label{fig:germany_sir}
            \end{subfigure}
        }
        \put(0, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Schleswig_Holstein_SIR_14.pdf}
                \label{fig:schleswig_holstein_sir}
            \end{subfigure}
        }
        \put(4.75, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Berlin_SIR_14.pdf}
                \label{fig:berlin_sir}
            \end{subfigure}
        }
        \put(9.5, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Thueringen_SIR_14.pdf}
                \label{fig:thüringen_sir}
            \end{subfigure}
        }
    \end{picture}
    \caption{Synthetic and real-world training data. The synthetic data is generated by solving~\Cref{eq:modSIR} with $\alpha=\nicefrac{1}{3}$ and $\beta=\nicefrac{1}{2}$. The data for Germany is taken from the death case dataset. As examples, we show the datasets of Schleswig-Holstein, Berlin, and Thuringia. For the other states, see~\Cref{chap:appendix}.}
    \label{fig:datasets_sir}
\end{figure}
The PINN that we employ consists of seven hidden layers with twenty neurons each and uses ReLU as its activation function. For training, we use the Adam optimizer and the polynomial scheduler of the PyTorch library with a base learning rate of $\expnumber{1}{-3}$. We train the model for 10000 epochs to extract the parameters. For each set of parameters, we run 5 iterations to show the stability of the values.
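For reference, the polynomial learning rate schedule mentioned above can be written in closed form. The expression below follows the documented behaviour of PyTorch's \texttt{PolynomialLR} scheduler. The decay horizon $T$ (\texttt{total\_iters}) and the exponent $p$ (\texttt{power}) are not stated in this chapter, so the numerical values in the example are assumptions for illustration only.
\[
    \eta_t = \eta_0 \left(1 - \frac{\min(t, T)}{T}\right)^{p},
    \qquad
    \eta_0 = 10^{-3}.
\]
For instance, assuming $T = 10000$ and $p = 1$, the learning rate decays linearly and reaches $\eta_{5000} = 5 \times 10^{-4}$ halfway through training.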
Our configuration is similar to the one that Shaier \etal.~\cite{Shaier2021} use in their work, aside from the learning rate and the choice of scheduler.\\
In the next section, we present the results of the simulations conducted with the setups described in this section.
% -------------------------------------------------------------------
\subsection{Results}
\label{sec:sir:results}
\begin{figure}[t]
    \centering
    \includegraphics[width=0.7\textwidth]{reproducability.pdf}
    \caption{Visualization of all 5 predictions for the synthetic dataset, compared to the true values of $\alpha = \nicefrac{1}{3}$ and $\beta = \nicefrac{1}{2}$.}
    \label{fig:reprod}
\end{figure}
In this section, we describe the results that we obtain from the experiments described in the preceding section. First, we show the results for the synthetic dataset and examine their accuracy and reproducibility. Then, we present and discuss the results for the German states and Germany.\\
The results of the experiment on the synthetic data can be seen in~\Cref{table:alpha_beta_synth} and in~\Cref{fig:reprod}. \Cref{fig:reprod} shows the values of $\beta$ and $\alpha$ of each iteration compared to the true values of $\beta=\nicefrac{1}{2}$ and $\alpha=\nicefrac{1}{3}$. In~\Cref{table:alpha_beta_synth}, we present the mean $\mu$ and standard deviation $\sigma$ of both values across all 5 iterations.\\
\begin{table}[h]
    \begin{center}
        \begin{tabular}{ccc|ccc}
            true $\alpha$ & $\mu(\alpha)$ & $\sigma(\alpha)$ & true $\beta$ & $\mu(\beta)$ & $\sigma(\beta)$ \\
            \hline
            0.3333 & 0.3334 & 0.0011 & 0.5000 & 0.5000 & 0.0017 \\
        \end{tabular}
        \caption{The mean $\mu$ and standard deviation $\sigma$ across the 5 independent iterations of training our PINNs on the synthetic dataset.}
        \label{table:alpha_beta_synth}
    \end{center}
\end{table}
From the results, we can see that the model is able to approximate the correct parameters for the small synthetic dataset in each of the 5 iterations.
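To put the spread reported in~\Cref{table:alpha_beta_synth} into perspective, the standard deviation can be related to the mean. The relative spread $\sigma/\mu$ used below is not part of the table and is introduced here only for illustration:
\[
    \frac{\sigma(\alpha)}{\mu(\alpha)} = \frac{0.0011}{0.3334} \approx 0.33\,\%,
    \qquad
    \frac{\sigma(\beta)}{\mu(\beta)} = \frac{0.0017}{0.5000} = 0.34\,\%.
\]
Both estimates thus vary by roughly a third of a percent across the 5 iterations.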
Even though the predicted value is never exactly correct, the standard deviation is negligibly small, and taking the mean of multiple iterations yields an almost perfect result.\\
In~\Cref{table:alpha_beta}, we present the results of the training on the real-world data. The entries are listed from top to bottom in the order of the community identification number, with the last entry being Germany. Both $\mu$ and $\sigma$ are calculated across all 5 iterations of our experiment. We can see that the values of \emph{Hamburg} have the highest standard deviation, while \emph{Mecklenburg-Vorpommern} has the smallest $\sigma$.\\
\begin{table}[h]
    \begin{center}
        \begin{tabular}{c|cc|cc}
                                   & $\mu(\alpha)$ & $\sigma(\alpha)$ & $\mu(\beta)$ & $\sigma(\beta)$ \\
            \hline
            Schleswig-Holstein     & 0.0771 & 0.0010 & 0.0966 & 0.0013 \\
            Hamburg                & 0.0847 & 0.0035 & 0.1077 & 0.0037 \\
            Niedersachsen          & 0.0735 & 0.0014 & 0.0962 & 0.0018 \\
            Bremen                 & 0.0588 & 0.0018 & 0.0795 & 0.0025 \\
            Nordrhein-Westfalen    & 0.0780 & 0.0009 & 0.1001 & 0.0011 \\
            Hessen                 & 0.0653 & 0.0016 & 0.0854 & 0.0020 \\
            Rheinland-Pfalz        & 0.0808 & 0.0016 & 0.1036 & 0.0018 \\
            Baden-Württemberg      & 0.0862 & 0.0014 & 0.1132 & 0.0016 \\
            Bayern                 & 0.0809 & 0.0021 & 0.1106 & 0.0027 \\
            Saarland               & 0.0746 & 0.0021 & 0.0996 & 0.0024 \\
            Berlin                 & 0.0901 & 0.0008 & 0.1125 & 0.0008 \\
            Brandenburg            & 0.0861 & 0.0008 & 0.1091 & 0.0010 \\
            Mecklenburg-Vorpommern & 0.0910 & 0.0007 & 0.1167 & 0.0008 \\
            Sachsen                & 0.0797 & 0.0017 & 0.1073 & 0.0022 \\
            Sachsen-Anhalt         & 0.0932 & 0.0019 & 0.1207 & 0.0027 \\
            Thüringen              & 0.0952 & 0.0011 & 0.1248 & 0.0016 \\
            Germany                & 0.0803 & 0.0012 & 0.1044 & 0.0014 \\
        \end{tabular}
        \caption{Mean and standard deviation across the 5 iterations that we conducted for each German state and for Germany as a whole.}
        \label{table:alpha_beta}
    \end{center}
\end{table}
In~\Cref{fig:alpha_beta_mean_std}, we visualize the means and standard deviations in contrast to the national values.
The states with the highest transmission rates are Thuringia, Saxony-Anhalt and Mecklenburg-Western Pomerania. It is also visible that all six of the eastern states have a higher transmission rate than Germany. These results may be explained by the ratio of vaccinated individuals\footnote{\url{https://impfdashboard.de/}}. The eastern states have a comparatively low complete vaccination ratio, except for Berlin. While Berlin has a moderate vaccination ratio, it is also a hub of mobility, which means that contact between individuals happens much more often. This may also be a reason why Hamburg has a transmission rate above the national value.\\
We visualize these numbers in~\Cref{fig:alpha_beta_mean_std}, where all means and standard deviations are plotted as points, while the values for Germany are additionally plotted as lines to make the comparison easier. It is visible that Hamburg, Baden-Württemberg, Bayern and all six of the states that lie in the eastern part of Germany have a higher transmission rate $\beta$ than Germany overall. Furthermore, it can be observed that the values of the recovery rate $\alpha$ appear to be correlated with the values of $\beta$. This can be explained by the assumption that we make when we preprocess the data using the recovery queue, namely setting the recovery time to 14 days.
\begin{figure}[h]
    \centering
    \includegraphics[width=\textwidth]{mean_std_alpha_beta_res.pdf}
    \label{fig:alpha_beta_mean_std}
\end{figure}
% -------------------------------------------------------------------
\section{Reduced SIR Model}
\label{sec:rsir}
In this section, we describe the experiments we conduct to identify the time-dependent reproduction number for both synthetic and real-world data. Similar to the previous section, we first describe the setup of our experiments and afterwards present the results.
The methods we employ for the preprocessing are described in~\Cref{sec:preprocessing:rq}, and the PINN that we use is described in~\Cref{sec:pinn:rsir}.
% -------------------------------------------------------------------
\subsection{Setup}
\label{sec:rsir:setup}
In this section, we describe the choice of parameters and configuration for data generation, preprocessing and the neural networks. We use these setups to train the PINNs to find the reproduction number on both synthetic and real-world data.\\
For validation, we create a synthetic dataset by setting the parameters $\alpha$ and $\beta$ each to a specific value and solving~\Cref{eq:modSIR} for a given time interval. We set $\alpha=\nicefrac{1}{3}$ and $\beta=\nicefrac{1}{2}$, the population size to $N=\expnumber{7.6}{6}$ and the initial number of infected people to $I_0=10$. Furthermore, we set our simulated time span to 150 days. We use this dataset to show that our method works on a simple and minimal dataset.\\
For the real-world data, we process the dataset \emph{COVID-19-Todesfälle in Deutschland} to extract the number of infections in the whole of Germany, while we use the dataset \emph{SARS-CoV-2 Infektionen in Deutschland} for the German states. For the preprocessing, we use a constant rate $\alpha$ to move individuals into the removed compartment. First, we choose $\alpha = \nicefrac{1}{14}$, as this covers the time of recovery\footnote{\url{https://github.com/robert-koch-institut/SARS-CoV-2-Infektionen_in_Deutschland.git}}. Second, we use $\alpha=\nicefrac{1}{5}$, since the peak of infectiousness is reached shortly before or at 5 days into the infection\footnote{\url{https://www.infektionsschutz.de/coronavirus/fragen-und-antworten/ansteckung-uebertragung-und-krankheitsverlauf/}}. Just as in~\Cref{sec:sir}, we set the population size $N$ of each state and of Germany to the corresponding size at the end of 2019.
For the same reason as in~\Cref{sec:sir:setup}, we also restrict the data points to an interval of 1200 days starting from March 9, 2020.
\begin{figure}[h]
    %\centering
    \setlength{\unitlength}{1cm} % Set the unit length for coordinates
    \begin{picture}(12, 14.5) % Specify the size of the picture environment (width, height)
        \put(0, 10){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{I_synth.pdf}
                \caption{Synthetic data}
                \label{fig:synthetic_I}
            \end{subfigure}
        }
        \put(4.75, 10){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Germany_I_14.pdf}
                \caption{Germany with $\alpha=\nicefrac{1}{14}$}
                \label{fig:germany_I_14}
            \end{subfigure}
        }
        \put(9.5, 10){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Germany_I_5.pdf}
                \caption{Germany with $\alpha=\nicefrac{1}{5}$}
                \label{fig:germany_I_5}
            \end{subfigure}
        }
        \put(0, 5){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Nordrhein_Westfalen_I_14.pdf}
                \caption{NRW with $\alpha=\nicefrac{1}{14}$}
                \label{fig:schleswig_holstein_I_14}
            \end{subfigure}
        }
        \put(4.75, 5){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Hessen_I_14.pdf}
                \caption{Hessen with $\alpha=\nicefrac{1}{14}$}
                \label{fig:berlin_I_14}
            \end{subfigure}
        }
        \put(9.5, 5){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Thueringen_I_14.pdf}
                \caption{Thüringen with $\alpha=\nicefrac{1}{14}$}
                \label{fig:thüringen_I_14}
            \end{subfigure}
        }
        \put(0, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Nordrhein_Westfalen_I_5.pdf}
                \caption{NRW with $\alpha=\nicefrac{1}{5}$}
                \label{fig:schleswig_holstein_I_5}
            \end{subfigure}
        }
        \put(4.75, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Hessen_I_5.pdf}
                \caption{Hessen with $\alpha=\nicefrac{1}{5}$}
                \label{fig:berlin_I_5}
            \end{subfigure}
        }
        \put(9.5, 0){
            \begin{subfigure}{0.3\textwidth}
                \centering
                \includegraphics[width=\textwidth]{datasets_states/Thueringen_I_5.pdf}
                \caption{Thüringen with $\alpha=\nicefrac{1}{5}$}
                \label{fig:thüringen_I_5}
            \end{subfigure}
        }
    \end{picture}
    \caption{Visualization of the datasets for the training process. Panel (a) shows the synthetic data. For the real-world data, we use one dataset with $\alpha=\nicefrac{1}{14}$ and one with $\alpha=\nicefrac{1}{5}$: (b) and (c) for Germany, (d) and (g) for Nordrhein-Westfalen (NRW), (e) and (h) for Hessen, and (f) and (i) for Thüringen.}
    \label{fig:i_datasets}
\end{figure}
For this task, the chosen architecture of the neural network consists of 4 hidden layers with 100 neurons each. The activation function is the hyperbolic tangent $\tanh$. In the total loss, we weight the data loss with a factor of $\expnumber{1}{6}$. The model is trained using a base learning rate of $\expnumber{1}{-3}$ with the same scheduler and optimizer as in~\Cref{sec:sir:setup}. We train the model for 20000 epochs. In addition, we conduct each experiment 15 times to reduce the variance of the averaged result.
% -------------------------------------------------------------------
\subsection{Results}
\label{sec:rsir:results}
% -------------------------------------------------------------------