10 months ago · 6ec05e32b6
--- a/chapters/chap03/chap03.tex
+++ b/chapters/chap03/chap03.tex
@@ -17,13 +17,13 @@ the \emph{Robert Koch Institute} (RKI). The second section outlines the
 
				 techniques we use to process this data to fit our project's requirements.
			
 
				 Subsequently, we give a theoretical overview of the PINNs that we employ.
			
 
				 These latter sections establish the foundation for the implementations
			
 
				-described in~\Cref{sec:sir:setup} and~\Cref{sec:rsir:setup}.
			
 
				+used in~\Cref{sec:sir:setup} and~\Cref{sec:rsir:setup}.
			
 
				 
			
 
				 % -------------------------------------------------------------------
			
 
				 
			
 
				 \section{Epidemiological Data}
			
 
				 \label{sec:preprocessing}
			
 
				-In this thesis we want to analyze the COVID-19 pandemic in Germany utilizing
			
 
				+In this thesis, we want to analyze the COVID-19 pandemic in Germany utilizing
			
 
				 the SIR model and PINNs. For a PINN to learn the parameters of the SIR model,
			
 
				 we need pandemic data in the correct format for the approach. Let $N_t$ be the
			
 
				 number of training points, then let $i\in\{1, \dots, N_t\}$
			
@@ -44,7 +44,7 @@ employ to transform it into the correct structure.
 
				 \subsection{RKI Data}
			
 
				 \label{sec:preprocessing:rki}
			
 
				 The RKI is a biomedical institute in Germany responsible for
			
 
				-the on monitoring and prevention of diseases. As the central institution of the
			
 
				+the monitoring and prevention of diseases. As the central institution of the
			
 
				 German government in the field of biomedicine, one of its tasks during the
			
 
				 COVID-19 pandemic was to track the number of infections and deaths in
			
 
				 Germany. The data was collected by university hospitals, research facilities
			
@@ -89,7 +89,7 @@ date is equivalent to the report date.\\
 
				 The RKI assumes that the duration of the illness under normal conditions is 14 days,
			
 
				 while the duration of severe cases is assumed to be 28 days. The recovery cases
			
 
				 in the dataset are calculated using these assumptions, by adding the duration to
			
 
				-the reference date if it is given. As stated, the recovery data should be used
			
 
				+the reference date if it is given. As the RKI states, the recovery data should be used
			
 
				 with caution. Since we require the recovery data for further calculations, the
			
 
				 following section presents the solutions we employed to address this issue.
			
 
				 
			
@@ -99,7 +99,7 @@ following section presents the solutions we employed to address this issue.
 
				 \label{sec:preprocessing:rq}
			
 
				 
			
 
				 At the outset of this section, we establish the format of the data that is
			
 
				-necessary for training the PINNs. In this subsection, we present the method, that we
			
 
				+necessary for training the PINNs. In this section, we present the method that we
			
 
				 employ to preprocess and transform the RKI data (see~\Cref{sec:preprocessing:rki})
			
 
				 into the training data. \\
			
 
				 
			
@@ -132,7 +132,7 @@ and releases them into the removed group $D$ days later.\\
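The recovery queue mentioned above can be illustrated with a minimal sketch. It assumes that the input is a list of daily new infections and that every cohort is released into the removed group exactly $D$ days after it is reported; the function name `recovery_queue` and its arguments are hypothetical and not taken from the thesis code.

```python
from collections import deque


def recovery_queue(new_cases, D=14):
    """Return (infected, removed) per day for a fixed recovery time of D days.

    Assumption: new_cases[i] is the number of infections reported on day i,
    and D is a positive integer.
    """
    if D < 1:
        raise ValueError("D must be a positive number of days")
    queue = deque()              # daily cohorts that are still infectious
    infected, removed = [], []
    active, total_removed = 0, 0
    for cases in new_cases:
        # The cohort reported D days ago leaves the infectious group.
        if len(queue) == D:
            released = queue.popleft()
            active -= released
            total_removed += released
        queue.append(cases)
        active += cases
        infected.append(active)
        removed.append(total_removed)
    return infected, removed


# Toy input with D = 2, so that the release is visible after two days.
I, R = recovery_queue([5, 3, 0, 2], D=2)
```

In this toy run the five cases of the first day are moved to the removed group on the third day, which is the behaviour the recovery queue is meant to capture.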
 
				 
			
 
				 In order to solve the reduced SIR model, we employ a similar algorithm to that
			
 
				 used for the SIR model. However, in contrast to the recovery queue, we utilize
			
 
				-a set recovery rate $\alpha$ to transfer a portion $\alpha\boldsymbol{I}^{(i)}$
			
 
				+a constant recovery rate $\alpha$ to transfer a portion $\alpha\boldsymbol{I}^{(i)}$
			
 
				 of infected individuals, who have recovered or died on the $i$-th day, into
			
 
				 the $\boldsymbol{R}^{(i+1)}$ compartment of the next day, which is irrelevant to
			
 
				 our purposes. The transformed data for both the SIR model and the reduced SIR
			
@@ -172,7 +172,7 @@ for a set of time points, with the aim of minimizing the term,
 
				 \begin{equation}\label{eq:SIR_obs_term}
			
 
				     \Big\|\hat{\boldsymbol{S}}^{(i)}-\boldsymbol{S}^{(i)}\Big\|^2  + \Big\|\hat{\boldsymbol{I}}^{(i)}-\boldsymbol{I}^{(i)}\Big\|^2 + \Big\|\hat{\boldsymbol{R}}^{(i)}-\boldsymbol{R}^{(i)}\Big\|^2,
			
 
				 \end{equation}
			
 
				-for each data point in the set of training dataset of a cardinality $N_t$ and with
			
 
				+for each data point in the training set of cardinality $N_t$, with
			
 
				 $i\in\{1, \dots, N_t\}$. Moreover, the aforementioned parameters must satisfy the system
			
 
				 of differential equations that govern the SIR model. For this reason, Shaier
			
 
				 \etal~\cite{Shaier2021} utilize a PINN framework to satisfy both requirements.
			
@@ -199,7 +199,7 @@ range of $[-1, 1]$. Therefore, we regularize the parameters using the
 
				 \begin{equation}
			
 
				     \tilde{\beta} = \tanh(\hat{\beta}),\quad \tilde{\alpha} = \tanh(\hat{\alpha}),
			
 
				 \end{equation}
			
 
				-where $\tilde{\alpha}$ are regularized model predictions.\\
			
 
				+where $\tilde{\beta}$ and $\tilde{\alpha}$ are regularized model predictions.\\
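As a minimal sketch of this regularization, the snippet below applies the hyperbolic tangent to two raw parameter predictions. The numerical values and the helper name `regularize` are hypothetical and serve only as an illustration; they are not results of the thesis.

```python
import math


def regularize(raw_value: float) -> float:
    """Squash an unbounded raw prediction into the open interval (-1, 1)."""
    return math.tanh(raw_value)


# Hypothetical raw network outputs for the transmission and recovery rate.
beta_hat, alpha_hat = 2.5, 0.07
beta_tilde = regularize(beta_hat)
alpha_tilde = regularize(alpha_hat)
```

Small raw values pass through almost unchanged, while large raw values saturate below 1, which keeps both rates inside the stated range.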
			
 
				 
			
 
				 The input data must include the time point $\boldsymbol{t}^{(i)}$ and its
			
 
				 corresponding measured true values of $\Psi^{(i)}$.
			
@@ -218,7 +218,7 @@ residual of each ODE. The mean square error of the residuals constitutes the
 
				 physics loss
			
 
				 $\mathcal{L}_{\text{physics}}(\boldsymbol{t}, \Psi, \hat{\Psi})$.
			
 
				 The residuals are calculated using the model predictions $\hat{\Psi}$
			
 
				-and the regularized model predictions of the parameters, $\tilde{\beta}$ and $\tilde{\alpha}$.
			
 
				+and the regularized model predictions of the parameters, $\tilde{\alpha}$ and $\tilde{\beta}$.
			
 
				 The residuals are given by,
			
 
				 \begin{equation}
			
 
				     0=\frac{d\hat{\boldsymbol{S}}}{d\boldsymbol{t}}+ \tilde{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N}, \quad 0=\frac{d\hat{\boldsymbol{I}}}{d\boldsymbol{t}} - \tilde{\beta}\frac{\hat{\boldsymbol{S}}\hat{\boldsymbol{I}}}{N} + \tilde{\alpha}\hat{\boldsymbol{I}}, \quad 0=\frac{d\hat{\boldsymbol{R}}}{d\boldsymbol{t}} - \tilde{\alpha}\hat{\boldsymbol{I}}.
			
@@ -243,7 +243,7 @@ reproduction number $\Rt$ on the German data of the RKI.
 
				 \label{sec:pinn:rsir}
			
 
				 
			
 
				 The previous section illustrates the methodology we employ to determine the
			
 
				-constant transmission and recovery rates from a data set obtained from
			
 
				+constant transmission and recovery rates from a data set obtained during
			
 
				 the COVID-19 pandemic in Germany. In this section, we utilize PINNs to identify
			
 
				 the time-dependent reproduction number, $\Rt$, while reducing the number of
			
 
				 state variables and the reliance on assumptions, by decreasing the number of ODEs
			
@@ -277,8 +277,8 @@ minimizing the term,
 
				 \end{equation}
			
 
				 for each $i\in\{1,\dots,N_t\}$. In order to identify the reproduction number, the
			
 
				 PINN minimizes the residuals of the ODE during the training process. The
			
 
				-training process is analogous to the PINN, which identifies $\beta$
			
 
				-and $\alpha$ (see~\Cref{sec:pinn:sir}). However, there are two key differences. Firstly, the absence of
			
 
				+training process is analogous to that of the PINN that identifies $\alpha$
			
 
				+and $\beta$ (see~\Cref{sec:pinn:sir}). However, there are two key differences: first, the absence of
			
 
				 free, trainable parameters; second, the inclusion of an additional state variable that
			
 
				 fluctuates in response to the input. While the state variable $\boldsymbol{I}$
			
 
				 is approximated using the error between the training data and the predicted
			
@@ -307,10 +307,10 @@ Then we train on composite loss function given by,
 
				 to achieve a better solution.\\
			
 
				 
			
 
				 Although we set the transmission rate to be time-dependent, we define the
			
 
				-recovery time constant over time to reduce the complexity of the problem. The
			
 
				+recovery time to be constant over time to reduce the complexity of the problem. The
			
 
				 RKI~\cite{GHInf} posits that the typical recovery period for the illness under
			
 
				 normal conditions is 14 days, while those individuals with severe cases require
			
 
				-approximately 28 days to recover. As we assume the case with normal condition,
			
 
				+approximately 28 days to recover. As we assume normal conditions,
			
 
				 we can set the recovery time to $D=14$, which yields $\alpha = \nicefrac{1}{14}$.\\
			
 
				 
			
 
				 We perform extensive empirical evaluations of the methodology employed to
			
--- a/thesis.pdf
+++ b/thesis.pdf