
add recovery queue

Phillip Rothenbeck 9 months ago
parent
commit
af900d9a7e

+ 1 - 0
.vscode/ltex.dictionary.en-US.txt

@@ -1,2 +1,3 @@
 PINN
 codomain
+PINNs

+ 5 - 0
.vscode/settings.json

@@ -0,0 +1,5 @@
+{
+    "cSpell.words": [
+        "PINN"
+    ]
+}

+ 1 - 1
chapters/chap02/chap02.tex

@@ -236,7 +236,7 @@ number of individuals, while the majority of the population remains susceptible.
 The infectious group has not yet infected any individuals thus
 neither recovery nor mortality is possible. Let $I_0\in\mathbb{N}$ be
 the number of infected individuals at the beginning of the disease. Then,
-\begin{equation}
+\begin{equation}\label{eq:startCond}
   \begin{split}
     S(0) &= N - I_{0},\\
     I(0) &= I_{0},\\

+ 87 - 46
chapters/chap03/chap03.tex

@@ -23,65 +23,71 @@ implementations described in~\Cref{sec:sir:setup} and~\Cref{sec:rsir:setup}.
 
 \section{Data Preprocessing   3}
 \label{sec:preprocessing}
-For the PINN's to work with the data available to us, it must be in the form
-that is dictated by the epidemiological models, that are supposed to be solved.
-The PINN that solves the SIR model (see~\Cref{sec:pinn:dinn}), needs data points
-containing a time point $t^{(i)}$ and the corresponding true values of the
-compartments $\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}$ and
-$\boldsymbol{R}^{(i)}$ with $i\in\{1, ..., N_t\}$ and the number of training
-points $N_t$. For the reduced SIR model (see~\Cref{sec:pandemicModel:rsir}), we
-need pairs of $(t^{(i)}, \boldsymbol{I}^{(i)})$. This section, concentrates on
-the structure of the available data and the methods we employ to convert it to
+In order for the PINNs to be effective with the data available to us, it is
+necessary for the data to be in the format required by the epidemiological
+models, which the PINNs will solve. Let $N_t$ be the number of training points
+and let $i\in\{1, \dots, N_t\}$ be the index of a training point. The data
+required by the PINN for solving the SIR model (see~\Cref{sec:pinn:dinn})
+consists of pairs $(\boldsymbol{t}^{(i)}, (\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}, \boldsymbol{R}^{(i)}))$.
+Given that the system of differential equations representing the reduced SIR
+model (see~\Cref{sec:pandemicModel:rsir}) consists of a single differential
+equation for $I$, it is necessary to obtain pairs of the form
+$(\boldsymbol{t}^{(i)}, \boldsymbol{I}^{(i)})$. This section focuses on the
+structure of the available data and the methods we employ to transform it into
 the correct structure.
 
 % -------------------------------------------------------------------
 
 \subsection{RKI Data   2}
 \label{sec:preprocessing:rki}
-The Robert Koch Institute works on monitoring and preventing diseases. As the
-central institution of the German government in the field of biomedicine, one of
-its tasks during the COVID-19 pandemic was it to track the number of infections
-and death cases in Germany. The data was collected by university hospitals,
-research facilities and laboratories, by conducting tests. Each new case must be
-reported after 24 hours at the latest to the respective state authority. Each
-state authority collects the cases for a day and must report them to the next
-working day to the RKI. The RKI refurbishes the data and releases statistics or
-update repositories holding the information for the public to access. For the
-work of this thesis we concentrate on two of these repositories.\\
-
-The first repository is called \emph{COVID-19-Todesfälle in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/COVID-19-Todesfaelle_in_Deutschland.git}}. It
-holds data points with each having the date on which the respective data was
-collected. The dates reach from the ninth of march in 2020 to the present day.
-For every date, the total number of infection and death cases, the new death
-cases, and the case-fatality ratio. The total number of infection and death
-cases is the number of all cases that were reported until that day including the
-newly reported data. The dataset includes two more datasets, that hold the death
-case information for either the age groups or the individual states of Germany.\\
+The Robert Koch Institute is responsible for the monitoring and prevention of
+diseases. As the central institution of the German government in the field of
+biomedicine, one of its tasks during the COVID-19 pandemic was to track the
+number of infections and death cases in Germany. The data was collected by
+university hospitals, research facilities and laboratories by conducting
+tests. Each new case must be reported to the respective state authority
+within 24 hours. Each state authority collects
+the cases for a day and must report them to the RKI by the following working
+day. The RKI then refines the data and releases statistics and updates its
+repositories holding the information for the public to access. For the purposes
+of this thesis we concentrate on two of these repositories.\\
+
+The first repository is called \emph{COVID-19-Todesfälle in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/COVID-19-Todesfaelle_in_Deutschland.git}}.
+The dataset comprises discrete data points, each with a date indicating the
+point in time at which the respective data was collected. The dates span from
+March 9, 2020, to the present day. For each date, the dataset provides the total
+number of infection and death cases, the number of new deaths, and the
+case-fatality ratio. The total number of infection and death cases represents
+the sum of all cases reported up to that date, including the newly reported
+data. The repository includes two additional datasets that contain the death
+case information organized by age group or by the individual states within
+Germany on a weekly basis.\\
 
 \begin{figure}[h]
     \centering
     \includegraphics[width=\textwidth]{dataset_visualization.pdf}
-    \caption{A visualization the total death case and infection case data for
+    \caption{A visualization of the total death case and infection case data for
         each day from the data set \emph{COVID-19-Todesfälle in Deutschland}. Status
         of the 20'th of August 2024.}
     \label{fig:rki_data}
 \end{figure}
 
-The second repository is called \emph{SARS-CoV-2 Infektionen in Deutschland}
-This dataset holds detailed information about the infections of each county per
-day. The counties are encoded using the \emph{Community Identification Number}\footnote{\url{https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/_inhalt.html}},
-wherein the first two digits denounce the state, the third number stand for the
-government district and the last two digits show the county. Each data point
-shows the gender, the age group, the death, infection and recovery cases and the
-reference and report date. The reference date marks the start of the individual
-feeling sick. If this is unknown, the reference date equals the report date.\\
+The second repository is entitled \emph{SARS-CoV-2 Infektionen in Deutschland}.
+This dataset contains comprehensive data regarding the infections of each county
+on a daily basis. The counties are encoded using the \emph{Community Identification Number}\footnote{\url{https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/_inhalt.html}},
+wherein the first two digits denote the state, the third digit represents the
+government district, and the last two digits indicate the county. Each data
+point lists the gender, the age group, the numbers of death, infection and
+recovery cases, and the reference and report dates. The reference date marks the onset of
+illness in the individual. In the absence of this information, the reference
+date is equivalent to the report date.\\
 
-The RKI assumes the duration of the illness under normal conditions to 14 days,
+The RKI assumes that the duration of the illness under normal conditions is 14 days,
 while the duration of severe cases is assumed to be 28 days. The recovery cases
 in the dataset are calculated using these assumptions, by adding the duration on
-the reference date if it is given. As written in the ReadMe, the recovery data
-should be used with caution. Since we need the recovery data for further
-calculations, the next section presents the solutions we employed to address
+the reference date if it is given. As stated in the ReadMe, the recovery data
+should be used with caution. Since we require the recovery data for further
+calculations, the following section presents the solutions we employed to address
 this issue.
 
 % -------------------------------------------------------------------
@@ -89,11 +95,46 @@ this issue.
 \subsection{Recovery Queue and Recovery Rate   1}
 \label{sec:preprocessing:rq}
 
-In~\Cref{sec:preprocessing:rki} we present the data which the RKI provides.
-While containing the data of infections and death cases in form of accumulated
-or daily case. The data for the susceptible and removed compartments are absent.
-While this is not an issue for the reduced SIR model
-(see~\Cref{sec:pandemicModel:rsir}), we need the data. For com
+At the outset of this section, we established the format of the data that is
+necessary for training the PINNs. In this subsection, we present the method that we
+employ to preprocess and transform the RKI data (see~\Cref{sec:preprocessing:rki})
+into the training data.\\
+
+In order to obtain the SIR data, we require the size of each SIR compartment for
+each time point. The infection case data for the German states is available on
+a daily basis. To obtain the daily cases for the entire country, we take the
+day-to-day difference of the total number of cases. The size of the population is defined
+as the respective size at the beginning of 2020. Using the starting conditions
+of~\Cref{eq:startCond}, we iterate through each day, updating the sizes of the
+groups consecutively. In each iteration, we subtract the new infection
+cases from $\boldsymbol{S}^{(i-1)}$ to obtain $\boldsymbol{S}^{(i)}$. To obtain
+$\boldsymbol{I}^{(i)}$, we add the new cases to $\boldsymbol{I}^{(i-1)}$ and subtract deaths and recoveries.
+Finally, $\boldsymbol{R}^{(i)}$ is obtained by adding the new deaths and
+recoveries to $\boldsymbol{R}^{(i-1)}$.\\
+
+As previously stated in~\Cref{sec:preprocessing:rki}, the data on recoveries may
+be either unreliable or entirely absent. To address this, we propose a method
+for computing the number of recovered individuals per day. Under the assumption
+that recovery takes $D$ days, we present the recovery queue, a data structure
+that holds the number of infections for a given day, retains them for $D$ days,
+and releases them into the removed group $D$ days later.\\
+
+\begin{figure}[h]
+    \centering
+    \includegraphics[width=\textwidth]{recovery_queue.pdf}
+    \caption{The recovery queue takes in the infected individuals for the $k$'th
+        day and releases them $D$ days later into the removed group.}
+    \label{fig:recovery_queue}
+\end{figure}
+
+In order to solve the reduced SIR model, we employ a similar algorithm to that
+used for the SIR model. However, instead of the recovery queue, we use
+the fixed recovery rate $\alpha$ to transfer the portion $\alpha\boldsymbol{I}^{(i)}$
+of infected individuals that recover on the $i$'th day into the
+$\boldsymbol{R}^{(i)}$ compartment, which is irrelevant to our purposes. \\
+
+The transformed data for both the SIR model and the reduced SIR model are then
+employed by the PINN models, which we describe in the subsequent section.
 
 % -------------------------------------------------------------------
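Reviewer note: the preprocessing described in the new subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis implementation: the function names are hypothetical, the first day's new cases play the role of $I_0$ from the starting conditions, the removed group starts at zero, deaths are not tracked separately (everyone leaves the infectious group through the queue after $D$ days), and the reduced variant applies the rate $\alpha$ to the previous day's value.

```python
from collections import deque


def daily_new_cases(cumulative):
    """Difference cumulative counts into daily new cases.

    Assumption: the count before the first entry is zero.
    """
    return [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]


def sir_with_recovery_queue(new_cases, population, duration):
    """Build (S, I, R) per day; each day's cases leave I after `duration` days."""
    queue = deque([0] * duration)  # one slot per day of waiting
    s, i, r = population, 0, 0
    series = []
    for cases in new_cases:
        queue.append(cases)          # today's infections enter the queue
        released = queue.popleft()   # infections from `duration` days ago leave
        s -= cases
        i += cases - released
        r += released
        series.append((s, i, r))
    return series


def infected_with_recovery_rate(new_cases, alpha):
    """Build I per day for the reduced model using a fixed recovery rate."""
    i = 0.0
    series = []
    for cases in new_cases:
        i += cases - alpha * i       # add new cases, remove the share alpha * I
        series.append(i)
    return series


if __name__ == "__main__":
    new = daily_new_cases([5, 8, 10, 10])  # made-up cumulative counts
    for day, (s, i, r) in enumerate(sir_with_recovery_queue(new, 100, 2)):
        print(day, s, i, r)
    print(infected_with_recovery_rate(new, 0.5))
```

In this sketch the sum $S + I + R$ stays equal to the population size on every day, which is a quick sanity check for the transformed data.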
 

BIN
images/oscilator.pdf


BIN
images/recovery_queue.pdf


BIN
thesis.pdf