
add comments to chap 3

FlipediFlop 9 months ago
parent
commit
dba2b14fce
2 changed files with 47 additions and 45 deletions
  1. chapters/chap03/chap03.tex (+47 -45)
  2. thesis.pdf (binary)

+ 47 - 45
chapters/chap03/chap03.tex

@@ -23,12 +23,13 @@ implementations described in~\Cref{sec:sir:setup} and~\Cref{sec:rsir:setup}.
 
 \section{Epidemiological Data}
 \label{sec:preprocessing}
-In order for the PINNs to be effective with the data available to us, it is
-necessary for the data to be in the format required by the epidemiological
-models, which the PINNs will solve. Let $N_t$ be the number of training points,
-then let $i\in\{1, ..., N_t\}$ be the index of the training points. The data
-required by the PINN for solving the SIR model (see~\Cref{sec:pinn:dinn}),
-consists of pairs $(\boldsymbol{t}^{(i)}, (\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}, \boldsymbol{R}^{(i)}))$.
+In this thesis, we analyze the COVID-19 pandemic in Germany using the SIR
+model and PINNs. For a PINN to learn the parameters of the SIR model, the
+pandemic data must be in the format that this approach requires. Let $N_t$ be
+the number of training points and let $i\in\{1, \dots, N_t\}$ be the index of
+a training point. The data required by the PINN for solving the SIR model
+(see~\Cref{sec:pinn:dinn}) consists of pairs
+$(\boldsymbol{t}^{(i)}, (\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}, \boldsymbol{R}^{(i)}))$.
 Given that the system of differential equations representing the reduced SIR
 model (see~\Cref{sec:pandemicModel:rsir}) consists of a single differential
 equation for $I$, it is necessary to obtain pairs of the form
@@ -40,19 +41,20 @@ the correct structure.
 
 \subsection{RKI Data}
 \label{sec:preprocessing:rki}
-The Robert Koch Institute is responsible for the on monitoring and prevention of
-diseases. As the central institution of the German government in the field of
-biomedicine, one of its tasks during the COVID-19 pandemic was it to track the
-number of infections and death cases in Germany. The data was collected by
-university hospitals, research facilities and laboratories through the
-conduction of tests. Each new case must be reported within a period of 24 hours
-at the latest to the respective state authority. Each state authority collects
-the cases for a day and must report them to the RKI by the following working
-day. The RKI then refines the data and releases statistics and updates its
-repositories holding the information for the public to access. For the purposes
-of this thesis we concentrate on two of these repositories.\\
+The Robert Koch Institute (RKI) is a biomedical institute in Germany that is
+responsible for the monitoring and prevention of diseases. As the central
+institution of the German government in the field of biomedicine, the RKI
+was tasked during the COVID-19 pandemic with tracking the number of infections
+and deaths in Germany. The data was collected by university hospitals,
+research facilities, and laboratories through testing. Each new case must be
+reported to the respective state authority within 24 hours at the latest.
+Each state authority collects the cases for a day and must report them to the
+RKI by the following working day. The RKI then refines the data, releases
+statistics, and updates its repositories, which make the information
+accessible to the public. For the purposes of this thesis, we concentrate on
+two of these repositories.\\
 
-The first repository is called \emph{COVID-19-Todesfälle in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/COVID-19-Todesfaelle_in_Deutschland.git}}.
+The first repository is called \emph{COVID-19-Todesfälle in Deutschland}~\cite{GHDead}.
 The dataset comprises discrete data points, each with a date indicating the
 point in time at which the respective data was collected. The dates span from
 March 9, 2020, to the present day. For each date, the dataset provides the total
@@ -72,7 +74,7 @@ a weekly basis.\\
     \label{fig:rki_data}
 \end{figure}
 
-The second repository is entitled \emph{SARS-CoV-2 Infektionen in Deutschland}\footnote{\url{https://github.com/robert-koch-institut/SARS-CoV-2-Infektionen_in_Deutschland.git}}.
+The second repository is entitled \emph{SARS-CoV-2 Infektionen in Deutschland}~\cite{GHInf}.
 This dataset contains comprehensive data regarding the infections of each county
 on a daily basis. The counties are encoded using the \emph{Community Identification Number}\footnote{\url{https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/Gemeindeverzeichnis/_inhalt.html}},
 wherein the first two digits denote the state, the third digit represents the
@@ -85,10 +87,9 @@ date is equivalent to the report date.\\
 The RKI assumes that the duration of the illness under normal conditions is 14 days,
 while the duration of severe cases is assumed to be 28 days. The recovered cases
 in the dataset are calculated using these assumptions by adding the duration to
-the reference date if it is given. As stated in the ReadMe, the recovery data
-should be used with caution. Since we require the recovery data for further
-calculations, the following section presents the solutions we employed to address
-this issue.
+the reference date, if one is given. As the RKI notes~\cite{GHInf}, the recovery
+data should be used with caution. Since we require the recovery data for further
+calculations, the following section presents how we address this issue.
 
 % -------------------------------------------------------------------
 
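The recovery rule described above can be illustrated with a minimal sketch in Python. It only mirrors the RKI description (recovery 14 days after the reference date, 28 days for severe cases) and is not the RKI's processing code. The following are all assumptions made for illustration: the case-record layout, the severe flag, the helper names recovery_date and cumulative_recovered, and the fallback to the report date when no reference date is given.

```python
from bisect import bisect_right
from datetime import date, timedelta
from typing import Iterable, List, Optional, Tuple

# Illness durations assumed by the RKI, as described in the text above.
NORMAL_DURATION = timedelta(days=14)
SEVERE_DURATION = timedelta(days=28)

# Hypothetical case record: (report_date, reference_date or None, severe).
Case = Tuple[date, Optional[date], bool]


def recovery_date(report_date: date,
                  reference_date: Optional[date] = None,
                  severe: bool = False) -> date:
    """Estimate the date from which a single case counts as recovered.

    The duration is added to the reference date if it is given; falling
    back to the report date otherwise is an assumption of this sketch.
    """
    start = reference_date if reference_date is not None else report_date
    return start + (SEVERE_DURATION if severe else NORMAL_DURATION)


def cumulative_recovered(cases: Iterable[Case], days: Iterable[date]) -> List[int]:
    """Number of cases estimated to have recovered on or before each day."""
    recovered = sorted(recovery_date(*case) for case in cases)
    # bisect_right returns how many sorted recovery dates are <= day.
    return [bisect_right(recovered, day) for day in days]
```

For example, under these assumptions a non-severe case reported on March 10, 2020 without a reference date counts as recovered from March 24, 2020 onwards.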
@@ -144,7 +145,7 @@ employed by the PINN models, which we describe in the subsequent section.
 In the preceding section, we presented the methods we employ to preprocess and
 format the data from the RKI in accordance with the specifications required for
 the work of this thesis. In this section, we present the method we employ
-to identify the non-time-dependent SIR parameters $\beta$ and $\alpha$ for the
+to identify the SIR parameters $\beta$ and $\alpha$ for the
 data. As a foundation, we draw upon the work of Shaier et
 al.~\cite{Shaier2021} to solve the SIR system of ODEs using PINNs.\\
 
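To make the approach of Shaier et al. more concrete, the following minimal sketch computes a DINN-style composite loss for the SIR model in plain Python. The loss is a data loss over the training points plus the mean squared residuals of the ODEs, and both transition rates are passed through tanh. This is not the thesis implementation.

The sketch makes the following assumptions:
- It uses the standard SIR equations dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - alpha*I and dR/dt = alpha*I, not the modified system the thesis uses.
- It takes the time derivatives of the network outputs as inputs. A real PINN would obtain them by automatic differentiation.
- The function and argument names are hypothetical.

```python
import math
from typing import Sequence


def mse(values: Sequence[float]) -> float:
    """Mean of the squared entries."""
    return sum(v * v for v in values) / len(values)


def dinn_sir_loss(S_hat, I_hat, R_hat, dS_hat, dI_hat, dR_hat,
                  S, I, R, raw_beta: float, raw_alpha: float, N: float) -> float:
    """Composite DINN-style loss for the standard SIR model (sketch).

    S_hat, I_hat, R_hat:    network predictions at the N_t training times
    dS_hat, dI_hat, dR_hat: their time derivatives (autodiff in a real PINN)
    S, I, R:                measured values at the same times
    raw_beta, raw_alpha:    unconstrained trainable values
    N:                      total population
    """
    # Bound both transition rates to the open interval (-1, 1) with tanh.
    beta, alpha = math.tanh(raw_beta), math.tanh(raw_alpha)

    # Data loss: (1/N_t) * sum over i of the three squared errors.
    data = (mse([p - m for p, m in zip(S_hat, S)])
            + mse([p - m for p, m in zip(I_hat, I)])
            + mse([p - m for p, m in zip(R_hat, R)]))

    # Physics loss: mean squared residuals of the three SIR equations.
    res_S = [ds + beta * s * i / N for ds, s, i in zip(dS_hat, S_hat, I_hat)]
    res_I = [di - beta * s * i / N + alpha * i
             for di, s, i in zip(dI_hat, S_hat, I_hat)]
    res_R = [dr - alpha * i for dr, i in zip(dR_hat, I_hat)]
    physics = mse(res_S) + mse(res_I) + mse(res_R)

    return physics + data
```

In a full PINN, the predictions and derivatives would be tensors produced by the network and the autodiff framework. The returned loss would then be back-propagated to update both the network weights and raw_beta and raw_alpha.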
@@ -182,7 +183,7 @@ Their approach, which they refer to as the \emph{disease-informed neural network
 the two transition rates $\alpha$ and $\beta$. The method does so by finding an
 approximate solution to the inverse problem using physics-informed neural
 networks (see~\Cref{sec:pinn}). In terms of
-the SIR model, a PINN addresses the inverse problemin two ways. First, it minimizes the mean of~\Cref{eq:SIR_obs_term}
+the SIR model, a PINN addresses the inverse problem in two ways. First, it minimizes the mean of~\Cref{eq:SIR_obs_term}
 by bringing the model predictions $(\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}})$
 closer to the measured values $(\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R})$
 for each time point. Second, it reduces the residuals of the ODEs that
@@ -191,8 +192,8 @@ inverse problem presets that a parameter is unknown. Thus, we designate the para
 $\beta$ and $\alpha$ as free, learnable parameters, $\widehat{\beta}$ and
 $\widehat{\alpha}$. These separate trainable parameters are values that are
 optimized during the training process and must fit the equations of the set of
-ODEs. Furthermore, we know, that the transition rates
-do not surpass the value of $1$. Consequently, we force the value of both rates to be in a
+ODEs. Assuming that the values of the transition rates stay below
+$1$~\cite{Shaier2021}, we constrain the value of both rates to the
 open interval $(-1, 1)$. To this end, we bound the parameters using the
 hyperbolic tangent ($\tanh$). This results in the terms,
 \begin{equation}
@@ -205,31 +206,30 @@ The input data must include the time point $\boldsymbol{t}^{(i)}$ and its
 corresponding measured true values of $(\boldsymbol{S}^{(i)}, \boldsymbol{I}^{(i)}, \boldsymbol{R}^{(i)})$.
 In its forward pass, the PINN receives the time point $\boldsymbol{t}^{(i)}$ as its input, from which it
 calculates its model prediction $(\hat{\boldsymbol{S}}^{(i)}, \hat{\boldsymbol{I}}^{(i)}, \hat{\boldsymbol{R}}^{(i)})$
-based on its model parameters $\theta$. Subsequently, the model computes the loss function. It calculates the observation loss by taking the
+based on its model parameters $\theta$. Subsequently, the model computes the loss function. It calculates the data loss by taking the
 mean squared error of~\Cref{eq:SIR_obs_term} over all $N_t$ training samples.
-Therefore, the term for the observation loss is,
+Therefore, the term for the data loss is,
 \begin{equation}
-    \mathcal{L}_{\text{obs}}(\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = \frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}\Big\|^2  + \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2 + \Big\|\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}\Big\|^2,
+    \mathcal{L}_{\text{data}}(\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = \frac{1}{N_t}\sum_{i=1}^{N_t} \Big( \big\|\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}\big\|^2 + \big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\big\|^2 + \big\|\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}\big\|^2 \Big),
 \end{equation}
-is the term for the observation loss. Given superior performance in practical applications
+which averages the squared errors of $\boldsymbol{S}$, $\boldsymbol{I}$, and $\boldsymbol{R}$ over the training points.
 Because the ODEs of~\Cref{eq:modSIR} perform better in practical applications
 than those of~\Cref{eq:sir}, we use them in our physics loss. To learn the system
 of differential equations, the model needs the residual of each ODE. The mean squared error of the residuals constitutes
-the physics loss $\mathcal{L}_{\text{physiks}}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}})$.
+the physics loss $\mathcal{L}_{\text{physics}}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}})$.
 The residuals are calculated using the model predictions $(\hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}})$ and the $\tanh$-bounded parameter estimates $\widehat{\beta}$ and $\widehat{\alpha}$. The residuals are given by,
 \begin{equation}
     0=\frac{d\hat{\boldsymbol{S}}}{d\boldsymbol{t}}+ \widehat{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N}, \quad 0=\frac{d\hat{\boldsymbol{I}}}{d\boldsymbol{t}} - \widehat{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N} + \widehat{\alpha}\hat{\boldsymbol{I}}, \quad 0=\frac{d\hat{\boldsymbol{R}}}{d\boldsymbol{t}} - \widehat{\alpha}\hat{\boldsymbol{I}}.
 \end{equation}
 Thus,
 \begin{equation}
-    \begin{split}
-        \mathcal{L}_{\text{SIR}}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = &\bigg\|\frac{d\hat{\boldsymbol{S}}}{d\boldsymbol{t}}+ \widehat{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N}\bigg\|^2 + \bigg\|\frac{d\hat{\boldsymbol{I}}}{d\boldsymbol{t}} - \widehat{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N} + \widehat{\alpha}\hat{\boldsymbol{I}}\bigg\|^2 + \bigg\|\frac{d\hat{\boldsymbol{R}}}{d\boldsymbol{t}} + \widehat{\alpha}\hat{\boldsymbol{I}}\bigg\|^2\\
-        + &\frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}\Big\|^2  + \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2 + \Big\|\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}\Big\|^2,
-    \end{split}
+    \mathcal{L}_{\text{SIR}}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) = \mathcal{L}_{\text{physics}}(\boldsymbol{t}, \boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}) + \mathcal{L}_{\text{data}}(\boldsymbol{S}, \boldsymbol{I}, \boldsymbol{R}, \hat{\boldsymbol{S}}, \hat{\boldsymbol{I}}, \hat{\boldsymbol{R}}),
 \end{equation}
-is the equation of the total loss for our approach. This loss value is then
-back-propagated through our network, while the model predictions of the
-parameters $\beta$ and $\alpha$ are optimized using the loss as well.\\
+is the multi-objective loss of our approach, which combines the physics loss and
+the data loss. By minimizing this loss, our model learns not only the given
+training data but also the physics of the system. This enables the model to
+learn the values of the parameters $\beta$ and $\alpha$ simultaneously
+during training.\\
 
 As this section concentrates on finding the time-constant parameters
 $\beta$ and $\alpha$, the next section presents our approach to finding the
@@ -257,7 +257,7 @@ time-dependent and constant across the entire duration of the pandemic may not
 accurately reflect the dynamics of the spread of a real-world disease.
 Although we set the transmission rate to be time-dependent, the recovery time
 is assumed to be relatively constant over time. The Robert Koch
-Institute\footnote{\url{https://github.com/robert-koch-institut/SARS-CoV-2-Infektionen_in_Deutschland.git}}
+Institute~\cite{GHInf}
 posits that the typical recovery period for the illness under normal conditions
 is 14 days, while those individuals with severe cases require approximately 28
 days to recover. In the light of the negligible number of severe cases in
@@ -292,20 +292,22 @@ The PINN receives the input of $\boldsymbol{t}^{(i)}$ and generates a prediction
 ($\hat{\boldsymbol{I}}^{(i)}$, $\Rt^{(i)}$). As previously stated, the PINN reduces
 the distance between the true values of $\boldsymbol{I}$ and the model predictions
 $\hat{\boldsymbol{I}}$ by minimizing the mean squared error. Consequently, the
-observation loss function is defined by,
+data loss function is defined by,
 \begin{equation}
-    \mathcal{L}_{\text{rSIR}}(\boldsymbol{I}, \hat{\boldsymbol{I}}) = \frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2.
+    \mathcal{L}_{\text{data}}(\boldsymbol{I}, \hat{\boldsymbol{I}}) = \frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2.
 \end{equation}
 The physics loss function is defined as the squared error of the residual of the
 ODE. The residual of the reduced SIR model is given by,
 \begin{equation}
     0 = \frac{dI_s}{dt_s} - \alpha(t_f - t_0)(\Rt - 1)I_s(t_s).
 \end{equation}
-By combining the observation loss with the physics loss, we arrive at the total loss for
-the PINN that solves the reduced SIR model, which is given by,
+During training, we first fit the data without considering the physics, using
+only the data loss $\mathcal{L}_{\text{data}}(\boldsymbol{I}, \hat{\boldsymbol{I}})$.
+Afterwards, we train on the composite loss function given by
 \begin{equation}
-    \mathcal{L}_{\text{rSIR}}(\boldsymbol{t}, \boldsymbol{I}, \hat{\boldsymbol{I}}) = \bigg\|\frac{dI_s}{dt_s} - \alpha(t_f - t_0)(\Rt - 1)I_s(t_s)\bigg\|^2+ \frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2.
+    \mathcal{L}_{\text{rSIR}}(\boldsymbol{t}, \boldsymbol{I}, \hat{\boldsymbol{I}}) = \bigg\|\frac{dI_s}{dt_s} - \alpha(t_f - t_0)(\Rt - 1)I_s(t_s)\bigg\|^2+ \frac{1}{N_t}\sum_{i=1}^{N_t} \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2,
 \end{equation}
+to refine the solution obtained in the first stage.\\
 
 The process of determining the reproduction number, along with the other
 techniques that this chapter presents, is applied in the following chapter.
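The two-stage training described above can be sketched as a loss that switches the physics term on after a data-only warm-up. This is an illustration under stated assumptions, not the thesis implementation.

The sketch assumes the following:
- The derivative dI_s/dt_s is passed in; a PINN would obtain it by automatic differentiation.
- The squared residual is averaged over the training points.
- The epoch-based switch use_physics_at and its data_only_epochs parameter are hypothetical.

```python
from typing import List, Sequence


def mse(values: Sequence[float]) -> float:
    """Mean of the squared entries."""
    return sum(v * v for v in values) / len(values)


def rsir_residuals(dI_s, I_s, Rt, alpha: float, t0: float, tf: float) -> List[float]:
    """Residual dI_s/dt_s - alpha*(t_f - t_0)*(R_t - 1)*I_s at each point."""
    return [d - alpha * (tf - t0) * (r - 1.0) * i
            for d, i, r in zip(dI_s, I_s, Rt)]


def rsir_loss(I_hat, I, dI_s, Rt, alpha, t0, tf, use_physics: bool) -> float:
    """Data loss alone (first stage) or physics loss plus data loss (second stage)."""
    data = mse([p - m for p, m in zip(I_hat, I)])
    if not use_physics:
        return data
    # The derivative dI_s belongs to the predicted I_hat, so the residual uses I_hat.
    return mse(rsir_residuals(dI_s, I_hat, Rt, alpha, t0, tf)) + data


def use_physics_at(epoch: int, data_only_epochs: int) -> bool:
    """Hypothetical schedule: enable the physics term after the warm-up epochs."""
    return epoch >= data_only_epochs
```

In a training loop, one would compute rsir_loss(..., use_physics=use_physics_at(epoch, data_only_epochs)) in every epoch and back-propagate the result. The first stage therefore fits the data only, and the second stage also penalizes the ODE residual.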

BIN
thesis.pdf