|
@@ -23,65 +23,71 @@ implementations described in~\Cref{sec:sir:setup} and~\Cref{sec:rsir:setup}.
|
|
|
|
|
|
\section{Data Preprocessing 3}
|
|
|
\label{sec:preprocessing}
|
|
|
-For the PINN's to work with the data available to us, it must be in the form
|
|
|
-that is dictated by the epidemiological models, that are supposed to be solved.
|
|
|
-The PINN that solves the SIR model (see~\Cref{sec:pinn:dinn}), needs data points
|
|
|
-containing a time point $t^{(i)}$ and the corresponding true values of the
|
|
|
-compartments $\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}$ and
|
|
|
-$\boldsymbol{R}^{(i)}$ with $i\in\{1, ..., N_t\}$ and the number of training
|
|
|
-points $N_t$. For the reduced SIR model (see~\Cref{sec:pandemicModel:rsir}), we
|
|
|
-need pairs of $(t^{(i)}, \boldsymbol{I}^{(i)})$. This section, concentrates on
|
|
|
-the structure of the available data and the methods we employ to convert it to
|
|
|
+In order for the PINNs to be effective with the data available to us, it is
|
|
|
+necessary for the data to be in the format required by the epidemiological
|
|
|
+models, which the PINNs will solve. Let $N_t$ be the number of training points,
|
|
|
+and let $i\in\{1, \ldots, N_t\}$ be the index of the training points. The data
|
|
|
+required by the PINN for solving the SIR model (see~\Cref{sec:pinn:dinn})
|
|
|
+consists of pairs $(\boldsymbol{t}^{(i)}, (\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}, \boldsymbol{R}^{(i)}))$.
|
|
|
+Given that the system of differential equations representing the reduced SIR
|
|
|
+model (see~\Cref{sec:pandemicModel:rsir}) consists of a single differential
|
|
|
+equation for $I$, it is necessary to obtain pairs of the form
|
|
|
+$(\boldsymbol{t}^{(i)}, \boldsymbol{I}^{(i)})$. This section focuses on the
|
|
|
+structure of the available data and the methods we employ to transform it into
|
|
|
the correct structure.
|
|
|
|
|
|
% -------------------------------------------------------------------
|
|
|
|
|
|
\subsection{RKI Data 2}
|
|
|
\label{sec:preprocessing:rki}
|
|
|
-The Robert Koch Institute works on monitoring and preventing diseases. As the
|
|
|
-central institution of the German government in the field of biomedicine, one of
|
|
|
-its tasks during the COVID-19 pandemic was it to track the number of infections
|
|
|
-and death cases in Germany. The data was collected by university hospitals,
|
|
|
-research facilities and laboratories, by conducting tests. Each new case must be
|
|
|
-reported after 24 hours at the latest to the respective state authority. Each
|
|
|
-state authority collects the cases for a day and must report them to the next
|
|
|
-working day to the RKI. The RKI refurbishes the data and releases statistics or
|
|
|
-update repositories holding the information for the public to access. For the
|
|
|
-work of this thesis we concentrate on two of these repositories.\\
|
|
|
-
|
|
|
-The first repository is called \emph{COVID-19-Todesfälle in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/COVID-19-Todesfaelle_in_Deutschland.git}}. It
|
|
|
-holds data points with each having the date on which the respective data was
|
|
|
-collected. The dates reach from the ninth of march in 2020 to the present day.
|
|
|
-For every date, the total number of infection and death cases, the new death
|
|
|
-cases, and the case-fatality ratio. The total number of infection and death
|
|
|
-cases is the number of all cases that were reported until that day including the
|
|
|
-newly reported data. The dataset includes two more datasets, that hold the death
|
|
|
-case information for either the age groups or the individual states of Germany.\\
|
|
|
+The Robert Koch Institute (RKI) is responsible for the monitoring and prevention of
|
|
|
+diseases. As the central institution of the German government in the field of
|
|
|
+biomedicine, one of its tasks during the COVID-19 pandemic was to track the
|
|
|
+number of infections and death cases in Germany. The data was collected by
|
|
|
+university hospitals, research facilities and laboratories by conducting
|
|
|
+tests. Each new case must be reported to the respective state authority within
|
|
|
+24 hours. Each state authority collects
|
|
|
+the cases for a day and must report them to the RKI by the following working
|
|
|
+day. The RKI then refines the data and releases statistics and updates its
|
|
|
+repositories holding the information for the public to access. For the purposes
|
|
|
+of this thesis, we concentrate on two of these repositories.\\
|
|
|
+
|
|
|
+The first repository is called \emph{COVID-19-Todesfälle in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/COVID-19-Todesfaelle_in_Deutschland.git}}.
|
|
|
+The dataset comprises discrete data points, each with a date indicating the
|
|
|
+point in time at which the respective data was collected. The dates span from
|
|
|
+March 9, 2020, to the present day. For each date, the dataset provides the total
|
|
|
+number of infection and death cases, the number of new deaths, and the
|
|
|
+case-fatality ratio. The total number of infection and death cases represents
|
|
|
+the sum of all cases reported up to that date, including the newly reported
|
|
|
+data. The repository includes two additional datasets that contain the death case
|
|
|
+information organized by age group or by the individual states within Germany on
|
|
|
+a weekly basis.\\
|
|
|
|
|
|
\begin{figure}[h]
|
|
|
\centering
|
|
|
\includegraphics[width=\textwidth]{dataset_visualization.pdf}
|
|
|
- \caption{A visualization the total death case and infection case data for
|
|
|
+ \caption{A visualization of the total death case and infection case data for
|
|
|
each day from the data set \emph{COVID-19-Todesfälle in Deutschland}. Status
|
|
|
 as of August 20, 2024.}
|
|
|
\label{fig:rki_data}
|
|
|
\end{figure}
|
|
|
|
|
|
-The second repository is called \emph{SARS-CoV-2 Infektionen in Deutschland}
|
|
|
-This dataset holds detailed information about the infections of each county per
|
|
|
-day. The counties are encoded using the \emph{Community Identification Number}\footnote{\url{https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/_inhalt.html}},
|
|
|
-wherein the first two digits denounce the state, the third number stand for the
|
|
|
-government district and the last two digits show the county. Each data point
|
|
|
-shows the gender, the age group, the death, infection and recovery cases and the
|
|
|
-reference and report date. The reference date marks the start of the individual
|
|
|
-feeling sick. If this is unknown, the reference date equals the report date.\\
|
|
|
+The second repository is entitled \emph{SARS-CoV-2 Infektionen in Deutschland}.
|
|
|
+This dataset contains comprehensive data regarding the infections of each county
|
|
|
+on a daily basis. The counties are encoded using the \emph{Community Identification Number}\footnote{\url{https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/_inhalt.html}},
|
|
|
+wherein the first two digits denote the state, the third digit represents the
|
|
|
+government district, and the last two digits indicate the county. Each data
|
|
|
+point contains the gender, the age group, the number of death, infection and recovery
|
|
|
+cases and the reference and report date. The reference date marks the onset of
|
|
|
+illness in the individual. In the absence of this information, the reference
|
|
|
+date is equivalent to the report date.\\
|
|
|
|
|
|
-The RKI assumes the duration of the illness under normal conditions to 14 days,
|
|
|
+The RKI assumes that the duration of the illness under normal conditions is 14 days,
|
|
|
while the duration of severe cases is assumed to be 28 days. The recovery cases
|
|
|
 in the dataset are calculated using these assumptions, by adding the duration to
|
|
|
-the reference date if it is given. As written in the ReadMe, the recovery data
|
|
|
-should be used with caution. Since we need the recovery data for further
|
|
|
-calculations, the next section presents the solutions we employed to address
|
|
|
+the reference date if it is given. As stated in the ReadMe, the recovery data
|
|
|
+should be used with caution. Since we require the recovery data for further
|
|
|
+calculations, the following section presents the solutions we employed to address
|
|
|
this issue.
|
|
|
|
|
|
% -------------------------------------------------------------------
|
|
@@ -89,11 +95,46 @@ this issue.
|
|
|
\subsection{Recovery Queue and Recovery Rate 1}
|
|
|
\label{sec:preprocessing:rq}
|
|
|
|
|
|
-In~\Cref{sec:preprocessing:rki} we present the data which the RKI provides.
|
|
|
-While containing the data of infections and death cases in form of accumulated
|
|
|
-or daily case. The data for the susceptible and removed compartments are absent.
|
|
|
-While this is not an issue for the reduced SIR model
|
|
|
-(see~\Cref{sec:pandemicModel:rsir}), we need the data. For com
|
|
|
+At the outset of this section, we established the format of the data that is
|
|
|
+necessary for training the PINNs. In this subsection, we present the method that we
|
|
|
+employ to preprocess and transform the RKI data (see~\Cref{sec:preprocessing:rki})
|
|
|
+into the training data. \\
|
|
|
+
|
|
|
+In order to obtain the SIR data we require the size of each SIR compartment for
|
|
|
+each time point. The infection case data for the German states is available on
|
|
|
+a daily basis. To obtain the daily cases for the entire country we need to
|
|
|
+take the day-to-day difference of the total number of cases. The size of the population is defined
|
|
|
+as the respective size at the beginning of 2020. Using the starting conditions
|
|
|
+of~\Cref{eq:startCond}, we iterate through each day, modifying the sizes of the
|
|
|
+groups in a consecutive manner. For each iteration we subtract the new infection
|
|
|
+cases from $\boldsymbol{S}^{(i-1)}$ to obtain $\boldsymbol{S}^{(i)}$; for
|
|
|
+$\boldsymbol{I}^{(i)}$, we add the new cases and subtract deaths and recoveries,
|
|
|
+and the size of $\boldsymbol{R}^{(i)}$ is obtained by adding the new deaths and
|
|
|
+recoveries as they occur.\\
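The day-by-day update described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis implementation: the function names `daily_from_cumulative` and `build_sir_series` are hypothetical, the cumulative count before the first recorded day is assumed to be zero, and the starting conditions are assumed to be $S = N$, $I = 0$, $R = 0$, which may differ from those in~\Cref{eq:startCond}.

```python
def daily_from_cumulative(cumulative):
    """Day-to-day differences of a cumulative series.

    Assumption: the cumulative count before the first recorded day is 0.
    """
    cumulative = list(cumulative)
    return [today - before for before, today in zip([0] + cumulative, cumulative)]


def build_sir_series(new_cases, new_deaths, new_recoveries, population):
    """Iterate through the days and update S, I and R consecutively.

    Assumed starting conditions (hypothetical): S = population, I = 0, R = 0.
    """
    s, i, r = float(population), 0.0, 0.0
    S, I, R = [], [], []
    for cases, deaths, recovered in zip(new_cases, new_deaths, new_recoveries):
        removed = deaths + recovered  # everyone leaving the infectious group today
        s -= cases                    # new infections leave the susceptible group
        i += cases - removed          # infected gain new cases, lose removed ones
        r += removed                  # deaths and recoveries enter the removed group
        S.append(s)
        I.append(i)
        R.append(r)
    return S, I, R


if __name__ == "__main__":
    cases = daily_from_cumulative([10, 30, 30])
    S, I, R = build_sir_series(cases, [0, 1, 2], [0, 4, 6], population=1000)
    print(S, I, R)
```

Because every individual only moves between compartments, the sum of the three series stays equal to the population size on every day, which is a convenient sanity check for the preprocessed data.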
|
|
|
+
|
|
|
+As previously stated in~\Cref{sec:preprocessing:rki} the data on recoveries may
|
|
|
+either be unreliable or entirely absent. To address this, we propose a method
|
|
|
+for computing the number of recovered individuals per day. Under the assumption
|
|
|
+that recovery takes $D$ days, we present the recovery queue, a data structure
|
|
|
+that holds the number of infections for a given day, retains them for $D$ days,
|
|
|
+and releases them into the removed group $D$ days later.\\
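One possible way to realize such a queue is a fixed-length first-in, first-out buffer. The sketch below is an illustration under stated assumptions: the class name `RecoveryQueue` and its `push` method are hypothetical, the queue is assumed to start empty (filled with zeros), and deaths are assumed to be handled separately by the caller, so individuals who die within the $D$ days are not subtracted here.

```python
from collections import deque


class RecoveryQueue:
    """Holds daily infection counts and releases each one D days later."""

    def __init__(self, duration):
        if duration < 1:
            raise ValueError("duration must be at least one day")
        # Assumption: nobody is infected before the first day, so the queue
        # starts with D zeros and releases nothing during the first D days.
        self._queue = deque([0] * duration)

    def push(self, new_infections):
        """Store today's infections and return those stored D days ago."""
        self._queue.append(new_infections)
        return self._queue.popleft()


if __name__ == "__main__":
    queue = RecoveryQueue(duration=2)
    for day, infections in enumerate([5, 3, 1, 0]):
        recovered = queue.push(infections)
        print(f"day {day}: {infections} infected, {recovered} recovered")
```

The value returned by `push` is the number of recoveries for the current day and can be used in place of the unreliable recovery column when updating the compartments.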
|
|
|
+
|
|
|
+\begin{figure}[h]
|
|
|
+ \centering
|
|
|
+ \includegraphics[width=\textwidth]{recovery_queue.pdf}
|
|
|
+  \caption{The recovery queue takes in the infected individuals for the $k$-th
|
|
|
+ day and releases them $D$ days later into the removed group.}
|
|
|
+  \label{fig:recovery_queue}
|
|
|
+\end{figure}
|
|
|
+
|
|
|
+In order to solve the reduced SIR model, we employ a similar algorithm to that
|
|
|
+used for the SIR model. However, in contrast to the recovery queue, we utilize
|
|
|
+the set recovery rate $\alpha$ to transfer a portion $\alpha\boldsymbol{I}^{(i)}$
|
|
|
+of infections, which have recovered on the $i$-th day, and put them into the
|
|
|
+$\boldsymbol{R}^{(i)}$ compartment, which is irrelevant to our purposes. \\
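The rate-based variant can be sketched as follows. The function name `build_infected_series` is hypothetical, and the order of operations is an assumption: on each day the fraction $\alpha$ of the individuals infected at the end of the previous day is removed before the new cases are added. Only the infected series is returned, since the removed compartment is not needed for the reduced model.

```python
def build_infected_series(new_cases, alpha, initial_infected=0.0):
    """Size of the infected group per day, using a fixed recovery rate alpha.

    Assumption: recoveries on a day are alpha times the infected count at the
    end of the previous day; the removed group itself is not tracked.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    infected = float(initial_infected)
    series = []
    for cases in new_cases:
        recovered = alpha * infected            # portion leaving the infected group
        infected = infected + cases - recovered
        series.append(infected)
    return series


if __name__ == "__main__":
    print(build_infected_series([10, 10, 0], alpha=0.5))
```

Paired with the corresponding time points, the returned values form the $(\boldsymbol{t}^{(i)}, \boldsymbol{I}^{(i)})$ training pairs.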
|
|
|
+
|
|
|
+The transformed data for both the SIR model and the reduced SIR model are then
|
|
|
+employed by the PINN models, which we describe in the subsequent section.
|
|
|
|
|
|
% -------------------------------------------------------------------
|
|
|
|