- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- % Author: Phillip Rothenbeck
- % Title: Investigating the Evolution of the COVID-19 Pandemic in Germany Using Physics-Informed Neural Networks
- % File: chap04/chap04.tex
- % Part: Experiments
- % Description:
- % summary of the content in this chapter
- % Version: 01.01.2012
- % %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
- \chapter{Experiments}
- \label{chap:evaluation}
- In the previous chapters, we explained our methods (see~\Cref{chap:methods}),
- which build on the theoretical background established in~\Cref{chap:background}.
- In this chapter, we present the setups and results of the experiments and
- simulations that we ran. First, we cover the experiments dedicated to finding the
- epidemiological parameters $\beta$ and $\alpha$ in synthetic and real-world
- data. Second, we identify the reproduction number in synthetic data and in
- real-world data from Germany. Each section is divided into the setup and the
- results of the experiments.
- % -------------------------------------------------------------------
- \section{Identifying the Transition Rates on Real-World and Synthetic Data}
- \label{sec:sir}
- In this section, we seek to find the transmission rate $\beta$ and the recovery
- rate $\alpha$ from either synthetic or preprocessed real-world data. The
- methodology that we employ to identify the transition rates is described
- in~\Cref{sec:pinn:sir}. The methods we use to preprocess the
- real-world data are described in~\Cref{sec:preprocessing:rq}.
- % -------------------------------------------------------------------
- \subsection{Setup}
- \label{sec:sir:setup}
- In this section, we describe the setups for training our PINNs, which are
- supposed to find the transition parameters. This includes the specific
- parameters for the preprocessing and the configuration of the PINNs
- themselves.\\
- To validate our method, we first generate a synthetic dataset.
- We do this by solving~\Cref{eq:modSIR} for a given set of parameters.
- The parameters are set to $\alpha = \nicefrac{1}{3}$ and $\beta = \nicefrac{1}{2}$.
- The size of the population is $N = \expnumber{7.6}{6}$ and the initial number of
- infectious individuals is $I_0 = 10$. We simulate over 150 days and obtain a
- dataset of the form shown in~\Cref{fig:synthetic_SIR}.\\
- For the real-world RKI data, we
- preprocess the raw data of each state and of Germany separately using a
- recovery queue with a recovery period of 14 days. We set the population size of
- each state to the respective value counted at the end of 2019\footnote{\url{https://de.statista.com/statistik/kategorien/kategorie/8/themen/63/branche/demographie/\#overview}}.
- The initial number of infectious individuals is set to the number of infected
- people on March 9, 2020, taken from the dataset. The data we extract spans from
- March 9, 2020 to June 22, 2023, which is a span of 1200 days and covers the time
- in which COVID-19 was the most active and severe.
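The synthetic dataset described above can be reproduced in outline with a few lines of Python. The following is a minimal sketch, not the code used for this thesis. It assumes that \Cref{eq:modSIR} has the form of the classical SIR system, $\dot{S} = -\beta S I / N$, $\dot{I} = \beta S I / N - \alpha I$, $\dot{R} = \alpha I$, and it integrates this system with a hand-written fourth-order Runge-Kutta scheme. The function names and the step size are illustrative choices.

```python
def sir_rhs(state, alpha, beta, n):
    """Right-hand side of the classical SIR system (assumed form of the model)."""
    s, i, _ = state
    new_infections = beta * s * i / n
    recoveries = alpha * i
    return (-new_infections, new_infections - recoveries, recoveries)


def rk4_step(state, dt, alpha, beta, n):
    """Advance the state (S, I, R) by one Runge-Kutta step of size dt."""
    def shifted(k, h):
        return tuple(x + h * dx for x, dx in zip(state, k))

    k1 = sir_rhs(state, alpha, beta, n)
    k2 = sir_rhs(shifted(k1, dt / 2), alpha, beta, n)
    k3 = sir_rhs(shifted(k2, dt / 2), alpha, beta, n)
    k4 = sir_rhs(shifted(k3, dt), alpha, beta, n)
    return tuple(
        x + dt / 6 * (a + 2 * b + 2 * c + d)
        for x, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def simulate_sir(alpha=1 / 3, beta=1 / 2, n=7.6e6, i0=10.0, days=150,
                 steps_per_day=10):
    """Return one (S, I, R) tuple per day, including day 0."""
    state = (n - i0, i0, 0.0)
    trajectory = [state]
    dt = 1.0 / steps_per_day  # illustrative step size, not from the text
    for _ in range(days):
        for _ in range(steps_per_day):
            state = rk4_step(state, dt, alpha, beta, n)
        trajectory.append(state)
    return trajectory


if __name__ == "__main__":
    data = simulate_sir()
    peak_day = max(range(len(data)), key=lambda d: data[d][1])
    print("days simulated:", len(data) - 1)
    print("day of peak infections:", peak_day)
```

With the default arguments, the function uses the parameters stated above ($\alpha = \nicefrac{1}{3}$, $\beta = \nicefrac{1}{2}$, $N = \expnumber{7.6}{6}$, $I_0 = 10$, 150 days) and returns 151 daily samples.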
- \begin{figure}[h]
- %\centering
- \setlength{\unitlength}{1cm} % Set the unit length for coordinates
- \begin{picture}(12, 9.5) % Specify the size of the picture environment (width, height)
- \put(1.5, 4.5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{SIR_synth.pdf}
- \label{fig:synthetic_SIR}
- \end{subfigure}
- }
- \put(8, 4.5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Germany_SIR_14.pdf}
- \label{fig:germany_sir}
- \end{subfigure}
- }
- \put(0, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Schleswig_Holstein_SIR_14.pdf}
- \label{fig:schleswig_holstein_sir}
- \end{subfigure}
- }
- \put(4.75, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Berlin_SIR_14.pdf}
- \label{fig:berlin_sir}
- \end{subfigure}
- }
- \put(9.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Thueringen_SIR_14.pdf}
- \label{fig:thüringen_sir}
- \end{subfigure}
- }
- \end{picture}
- \caption{Synthetic and real-world training data. The synthetic data is
- generated with $\alpha=\nicefrac{1}{3}$ and $\beta=\nicefrac{1}{2}$
- using~\Cref{eq:modSIR}. The Germany data is taken from the death case
- dataset. As examples, we show the datasets of Schleswig-Holstein,
- Berlin, and Thuringia. For the other states, see~\Cref{chap:appendix}.}
- \label{fig:datasets_sir}
- \end{figure}
- The PINN that we employ consists of seven hidden layers with twenty neurons
- each and uses the ReLU activation function. For training, we use the Adam optimizer
- and the polynomial scheduler of the PyTorch library with a base learning rate
- of $\expnumber{1}{-3}$. We train the model for 10000 epochs to extract the
- parameters. For each set of parameters, we run 5 iterations to show the stability of
- the values. Our configuration is similar to the one that Shaier
- \etal.~\cite{Shaier2021} use in their work, aside from the learning rate and the
- choice of scheduler.\\
- In the next section, we present the results of the simulations conducted with the
- setups described in this section.
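To make the learning-rate schedule concrete, the sketch below evaluates a polynomial decay in plain Python. It follows the closed form $\eta_t = \eta_0 \, (1 - t/T)^p$ that PyTorch documents for its \texttt{PolynomialLR} scheduler. The exponent $p = 1$ and the choice $T = 10000$ (decay over the whole training run) are assumptions made for illustration; the setup above only fixes the base learning rate and the number of epochs.

```python
def polynomial_lr(epoch, base_lr=1e-3, total_epochs=10_000, power=1.0):
    """Learning rate after `epoch` epochs of polynomial decay.

    `power` and `total_epochs` are illustrative assumptions, not values
    taken from the experiments.
    """
    progress = min(epoch, total_epochs) / total_epochs
    return base_lr * (1.0 - progress) ** power


if __name__ == "__main__":
    for epoch in (0, 2_500, 5_000, 10_000):
        print(epoch, polynomial_lr(epoch))
```

With $p = 1$ the decay is linear, so the learning rate is halved at the midpoint of training and reaches zero at the final epoch.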
- % -------------------------------------------------------------------
- \subsection{Results}
- \label{sec:sir:results}
- \begin{figure}[t]
- \centering
- \includegraphics[width=0.7\textwidth]{reproducability.pdf}
- \caption{Visualization of all 5 predictions for the synthetic dataset,
- compared to the true values of $\alpha = \nicefrac{1}{3}$ and $\beta = \nicefrac{1}{2}$}
- \label{fig:reprod}
- \end{figure}
- In this section, we describe the results that we obtain from the
- experiments described in the preceding section. First, we show the
- results for the synthetic dataset and look at the accuracy and reproducibility.
- Then we present and discuss the results for the German states and Germany.\\
- The results of the experiment on the synthetic data can be seen
- in~\Cref{table:alpha_beta_synth} and in~\Cref{fig:reprod}. \Cref{fig:reprod}
- shows the values of $\beta$ and $\alpha$ of each iteration compared to the true
- values of $\beta=\nicefrac{1}{2}$ and $\alpha=\nicefrac{1}{3}$. In~\Cref{table:alpha_beta_synth},
- we present the mean $\mu$ and standard deviation $\sigma$ of both values across
- all 5 iterations.\\
- \begin{table}[h]
- \begin{center}
- \begin{tabular}{ccc|ccc}
- true $\alpha$ & $\mu(\alpha)$ & $\sigma(\alpha)$ & true $\beta$ & $\mu(\beta)$ & $\sigma(\beta)$ \\
- \hline
- 0.3333 & 0.3334 & 0.0011 & 0.5000 & 0.5000 & 0.0017 \\
- \end{tabular}
- \caption{The mean $\mu$ and standard deviation $\sigma$ across the 5
- independent iterations of training our PINNs with the synthetic dataset.}
- \label{table:alpha_beta_synth}
- \end{center}
- \end{table}
- From the results, we can see that the model is able to approximate the correct
- parameters for the small synthetic dataset in each of the 5 iterations. Even
- though the predicted value is never exactly correct, the standard deviation is
- negligibly small, and the mean over multiple iterations deviates from the true
- values by at most about $10^{-4}$.\\
- In~\Cref{table:alpha_beta}, we present the results of the training on the
- real-world data. The states are listed from top to bottom in the order of their
- community identification number, with the last entry being Germany. $\mu$ and
- $\sigma$ are both calculated across all 5 iterations of our experiment. We can
- see that the values of \emph{Hamburg} have the highest standard deviation, while
- \emph{Mecklenburg-Vorpommern} has the smallest $\sigma$.\\
- \begin{table}[h]
- \begin{center}
- \begin{tabular}{c|cc|cc}
- & $\mu(\alpha)$ & $\sigma(\alpha)$ & $\mu(\beta)$ & $\sigma(\beta)$ \\
- \hline
- Schleswig-Holstein & 0.0771 & 0.0010 & 0.0966 & 0.0013 \\
- Hamburg & 0.0847 & 0.0035 & 0.1077 & 0.0037 \\
- Niedersachsen & 0.0735 & 0.0014 & 0.0962 & 0.0018 \\
- Bremen & 0.0588 & 0.0018 & 0.0795 & 0.0025 \\
- Nordrhein-Westfalen & 0.0780 & 0.0009 & 0.1001 & 0.0011 \\
- Hessen & 0.0653 & 0.0016 & 0.0854 & 0.0020 \\
- Rheinland-Pfalz & 0.0808 & 0.0016 & 0.1036 & 0.0018 \\
- Baden-Württemberg & 0.0862 & 0.0014 & 0.1132 & 0.0016 \\
- Bayern & 0.0809 & 0.0021 & 0.1106 & 0.0027 \\
- Saarland & 0.0746 & 0.0021 & 0.0996 & 0.0024 \\
- Berlin & 0.0901 & 0.0008 & 0.1125 & 0.0008 \\
- Brandenburg & 0.0861 & 0.0008 & 0.1091 & 0.0010 \\
- Mecklenburg-Vorpommern & 0.0910 & 0.0007 & 0.1167 & 0.0008 \\
- Sachsen & 0.0797 & 0.0017 & 0.1073 & 0.0022 \\
- Sachsen-Anhalt & 0.0932 & 0.0019 & 0.1207 & 0.0027 \\
- Thüringen & 0.0952 & 0.0011 & 0.1248 & 0.0016 \\
- Germany & 0.0803 & 0.0012 & 0.1044 & 0.0014 \\
- \end{tabular}
- \caption{Mean and standard deviation across the 5 iterations that we
- conducted for each German state and for Germany as a whole.}
- \label{table:alpha_beta}
- \end{center}
- \end{table}
- In~\Cref{fig:alpha_beta_mean_std}, we visualize the means and standard deviations
- in contrast to the national values. The states with the highest transmission
- rates are Thuringia, Saxony-Anhalt, and Mecklenburg-Western Pomerania. It is also
- visible that all six eastern states have a higher transmission rate than
- Germany as a whole. These results may be explained by the ratio of vaccinated individuals\footnote{\url{https://impfdashboard.de/}}.
- The eastern states have a comparably low complete vaccination ratio, except for
- Berlin. While Berlin has a moderate vaccination ratio, it is also a hub of
- mobility, which means that contact between individuals happens much more often. This may also be a reason why Hamburg has a transmission rate above the national value.
- \\
- In~\Cref{fig:alpha_beta_mean_std}, all means and standard deviations are
- plotted as points, while the values for Germany are additionally plotted as
- lines to make the comparison easier. It is
- visible that Hamburg, Baden-Württemberg, Bayern, and all six of the states that
- lie in the eastern part of Germany have a higher transmission rate $\beta$ than
- Germany overall. Furthermore, the values of the
- recovery rate $\alpha$ appear to correlate with the values of $\beta$, which can be
- explained by the assumption that we make when we preprocess the data using the
- recovery queue, namely setting the recovery time to 14 days.
- \begin{figure}[h]
- \centering
- \includegraphics[width=\textwidth]{mean_std_alpha_beta_res.pdf}
- \label{fig:alpha_beta_mean_std}
- \end{figure}
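The correlation between the recovery and transmission rates noted above can be checked directly from the state-level means in \Cref{table:alpha_beta}. The sketch below computes the Pearson correlation coefficient of the two columns in plain Python. The values are copied from the table; the dictionary and the helper function are illustrative names and not part of the experiment code.

```python
from math import sqrt

# (mean alpha, mean beta) per state, copied from the table of real-world results.
RATES = {
    "Schleswig-Holstein": (0.0771, 0.0966),
    "Hamburg": (0.0847, 0.1077),
    "Niedersachsen": (0.0735, 0.0962),
    "Bremen": (0.0588, 0.0795),
    "Nordrhein-Westfalen": (0.0780, 0.1001),
    "Hessen": (0.0653, 0.0854),
    "Rheinland-Pfalz": (0.0808, 0.1036),
    "Baden-Württemberg": (0.0862, 0.1132),
    "Bayern": (0.0809, 0.1106),
    "Saarland": (0.0746, 0.0996),
    "Berlin": (0.0901, 0.1125),
    "Brandenburg": (0.0861, 0.1091),
    "Mecklenburg-Vorpommern": (0.0910, 0.1167),
    "Sachsen": (0.0797, 0.1073),
    "Sachsen-Anhalt": (0.0932, 0.1207),
    "Thüringen": (0.0952, 0.1248),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


alphas = [alpha for alpha, _ in RATES.values()]
betas = [beta for _, beta in RATES.values()]
r = pearson(alphas, betas)

if __name__ == "__main__":
    print(f"Pearson r between alpha and beta: {r:.3f}")
```

A coefficient close to one would support the observation that states with a higher recovery rate also have a higher transmission rate.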
- % -------------------------------------------------------------------
- \section{Reduced SIR Model}
- \label{sec:rsir}
- In this section, we describe the experiments we conduct to identify the
- time-dependent reproduction number for both synthetic and real-world data.
- Similar to the previous section, we first describe the setup of our experiments
- and afterwards present the results. The preprocessing methods
- are described in~\Cref{sec:preprocessing:rq}, and the PINN that we use
- is described in~\Cref{sec:pinn:rsir}.
- % -------------------------------------------------------------------
- \subsection{Setup}
- \label{sec:rsir:setup}
- In this section, we describe the choice of parameters and configuration for data
- generation, preprocessing, and the neural networks. We use these setups to train
- the PINNs to find the reproduction number on both synthetic and real-world data.\\
- For validation, we create a synthetic dataset by setting the parameters
- $\alpha$ and $\beta$ each to a specific value and solving~\Cref{eq:modSIR}
- for a given time interval. We set $\alpha=\nicefrac{1}{3}$ and
- $\beta=\nicefrac{1}{2}$, the population size to $N=\expnumber{7.6}{6}$,
- and the initial number of infected people to $I_0=10$. Furthermore, we set our
- simulated time span to 150 days. We use this dataset to show that our
- method works on a simple and minimal dataset.\\
- For the real-world data, we
- process the dataset \emph{COVID-19-Todesfälle in Deutschland}
- to extract the number of infections in the whole of Germany, while we use the
- dataset \emph{SARS-CoV-2 Infektionen in Deutschland} for the German states. For
- the preprocessing, we use a constant rate $\alpha$ to move individuals into
- the removed compartment. First, we choose $\alpha = \nicefrac{1}{14}$, as this
- covers the time of recovery\footnote{\url{https://github.com/robert-koch-institut/SARS-CoV-2-Infektionen_in_Deutschland.git}}.
- Second, we use $\alpha=\nicefrac{1}{5}$, since the peak of infectiousness is
- reached shortly before or at 5 days into the infection\footnote{\url{https://www.infektionsschutz.de/coronavirus/fragen-und-antworten/ansteckung-uebertragung-und-krankheitsverlauf/}}.
- Just as in~\Cref{sec:sir}, we set the population size $N$ of each state and of
- Germany to the corresponding size at the end of 2019. For the same reason as
- in~\Cref{sec:sir}, we restrict the data points to an interval of 1200 days
- starting from March 9, 2020.
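One way to read the constant-rate preprocessing described above is as a discrete daily update: each day, a fraction $\alpha$ of the currently infectious individuals moves to the removed compartment, and the newly reported cases are added to the infectious compartment. The following sketch implements this reading. It is an assumption about the procedure rather than the actual preprocessing code, and the function name and the toy input are hypothetical.

```python
def infectious_from_cases(new_cases, alpha, i0=0.0):
    """Build infectious and removed time series from daily new cases.

    Assumed update rule (illustrative, not taken from the thesis code):
        recovered_t = alpha * I_{t-1}
        I_t = I_{t-1} + new_cases_t - recovered_t
        R_t = R_{t-1} + recovered_t
    """
    infectious, removed = [], []
    current_i, current_r = i0, 0.0
    for cases in new_cases:
        recovered = alpha * current_i
        current_i = current_i + cases - recovered
        current_r = current_r + recovered
        infectious.append(current_i)
        removed.append(current_r)
    return infectious, removed


if __name__ == "__main__":
    # Hypothetical toy input: 14 reported cases on the first day, none afterwards.
    i_series, r_series = infectious_from_cases([14.0, 0.0, 0.0], alpha=1 / 14)
    print(i_series)
    print(r_series)
```

Under this rule, the sum of the infectious and removed compartments always equals the cumulative number of reported cases, so a larger $\alpha$ such as $\nicefrac{1}{5}$ only changes how quickly individuals leave the infectious compartment.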
- \begin{figure}[h]
- %\centering
- \setlength{\unitlength}{1cm} % Set the unit length for coordinates
- \begin{picture}(12, 14.5) % Specify the size of the picture environment (width, height)
- \put(0, 10){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{I_synth.pdf}
- \caption{Synthetic data}
- \label{fig:synthetic_I}
- \end{subfigure}
- }
- \put(4.75, 10){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Germany_I_14.pdf}
- \caption{Germany with $\alpha=\nicefrac{1}{14}$}
- \label{fig:germany_I_14}
- \end{subfigure}
- }
- \put(9.5, 10){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Germany_I_5.pdf}
- \caption{Germany with $\alpha=\nicefrac{1}{5}$}
- \label{fig:germany_I_5}
- \end{subfigure}
- }
- \put(0, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Nordrhein_Westfalen_I_14.pdf}
- \caption{NRW with $\alpha=\nicefrac{1}{14}$}
- \label{fig:schleswig_holstein_I_14}
- \end{subfigure}
- }
- \put(4.75, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Hessen_I_14.pdf}
- \caption{Hessen with $\alpha=\nicefrac{1}{14}$}
- \label{fig:berlin_I_14}
- \end{subfigure}
- }
- \put(9.5, 5){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Thueringen_I_14.pdf}
- \caption{Thüringen with $\alpha=\nicefrac{1}{14}$}
- \label{fig:thüringen_I_14}
- \end{subfigure}
- }
- \put(0, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Nordrhein_Westfalen_I_5.pdf}
- \caption{NRW with $\alpha=\nicefrac{1}{5}$}
- \label{fig:schleswig_holstein_I_5}
- \end{subfigure}
- }
- \put(4.75, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Hessen_I_5.pdf}
- \caption{Hessen with $\alpha=\nicefrac{1}{5}$}
- \label{fig:berlin_I_5}
- \end{subfigure}
- }
- \put(9.5, 0){
- \begin{subfigure}{0.3\textwidth}
- \centering
- \includegraphics[width=\textwidth]{datasets_states/Thueringen_I_5.pdf}
- \caption{Thüringen with $\alpha=\nicefrac{1}{5}$}
- \label{fig:thüringen_I_5}
- \end{subfigure}
- }
- \end{picture}
- \caption{Visualization of the datasets for the training process.
- Illustration (a) is the synthetic data. For the real-world data we use a
- dataset with $\alpha=\nicefrac{1}{14}$ and $\alpha=\nicefrac{1}{5}$ each.
- (b) and (c) for Germany, (d) and (g) for Nordrhein-Westfalen (NRW), (e) and (h)
- for Hessen, and (f) and (i) for Thüringen.}
- \label{fig:i_datasets}
- \end{figure}
- For this task, the chosen architecture of the neural network consists of 4 hidden
- layers with 100 neurons each. The activation function is the hyperbolic
- tangent (tanh). We weight the data loss with a factor of
- $\expnumber{1}{6}$ in the total loss. The model is trained using a base
- learning rate of $\expnumber{1}{-3}$ with the same scheduler and optimizer as
- in~\Cref{sec:sir:setup}. We train the model for 20000 epochs. We also
- repeat each experiment 15 times to reduce the standard deviation of the averaged result.
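The weighting of the data term can be written out explicitly. The sketch below combines a mean-squared data loss and a mean-squared physics residual into one scalar, using the weight $\expnumber{1}{6}$ from the setup above. The function names, the plain-list inputs, and the unweighted physics term are illustrative assumptions; the residual of the reduced SIR model itself is defined in~\Cref{sec:pinn:rsir} and is not reproduced here.

```python
def mean_square(values):
    """Mean of the squared entries of a sequence."""
    return sum(v * v for v in values) / len(values)


def total_loss(predicted, observed, residuals, data_weight=1e6):
    """Weighted sum of data loss and physics loss.

    `residuals` stands for the ODE residual evaluated at the collocation
    points; how it is computed is outside the scope of this sketch.
    """
    data_loss = mean_square([p - o for p, o in zip(predicted, observed)])
    physics_loss = mean_square(residuals)
    return data_weight * data_loss + physics_loss


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the effect of the weight.
    print(total_loss([1.001], [1.0], [0.0]))
```

With this weighting, a data error of $10^{-3}$ contributes about as much to the total loss as a physics residual of magnitude one.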
- % -------------------------------------------------------------------
- \subsection{Results}
- \label{sec:rsir:results}
- % -------------------------------------------------------------------