  • Liang Shize (A0178178M)
  • Nandana Murthy (A0178563R)
  • Tan Zhi Wen David (A0056418A)
  • Zhao Ziyuan (A0178184U)
  • Zhu Huiying (A0178222H)

 

1. Background

In rapidly ageing Singapore, demographic trends are worrying. While the birth rate fell to a seven-year low, the number of deaths recorded was the highest in at least two decades. The number of deaths rose by about 4 per cent, from 20,017 in 2016 to 20,905 in 2017, according to the Report on Registration of Births and Deaths 2017 released by the Immigration and Checkpoints Authority (ICA). This study therefore analyzes factors related to deaths, to give the government more insight into the current situation. Our analysis indicates that admissions to Accident & Emergency Departments are one factor correlated with deaths.

 

2. Hypothesis

We hypothesize that the monthly number of deaths is correlated with the monthly number of admissions to Accident & Emergency Departments. In other words, an increase in the number of admissions is expected to raise the death count. The dependent variable is the number of deaths aggregated by month, while the independent variable is the number of admissions to Accident & Emergency Departments aggregated by month.

 

3. Dataset

3.1 Data Source

The two datasets were collected from the Department of Statistics Singapore. The admissions to Accident & Emergency Departments data was extracted from the dataset ‘Admissions to Public Sector Hospitals, Monthly’, whereas the monthly death data was extracted from ‘Deaths by Ethnic Group and Sex, Monthly’. (Links: http://www.tablebuilder.singstat.gov.sg/publicfacing/createDataTable.action?refId=15193 and http://www.tablebuilder.singstat.gov.sg/publicfacing/createDataTable.action?refId=15167)

 

3.2 Data Transformation

Originally, the data was structured as a multi-dimensional table, which was not suitable for analytics. We therefore reshaped it into one record per month. In addition, we selected only the data from Jan 2012 to Dec 2017, because we wanted to focus on the recent trend.
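The reshaping step can be sketched as follows. This is a minimal illustration, not the procedure actually used: it assumes the exported table has one column per month labelled like "2012 Jan" (a hypothetical label format), and the function name `wide_to_long` is introduced here only for the example.

```python
from datetime import date

MONTHS = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
          "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12}


def wide_to_long(wide_row, start=date(2012, 1, 1), end=date(2017, 12, 1)):
    """Turn one wide row {"2012 Jan": value, ...} into sorted (month, value) records.

    Only months inside [start, end] are kept, mirroring the Jan 2012 to
    Dec 2017 window chosen for the study. The "YYYY Mon" label format is
    an assumption about the export, not something stated in the report.
    """
    records = []
    for label, value in wide_row.items():
        year_text, month_text = label.split()
        month = date(int(year_text), MONTHS[month_text], 1)
        if start <= month <= end:
            records.append((month, value))
    return sorted(records)
```

Applied to the deaths row and the admissions row separately, this would give two aligned monthly series of 72 records each.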

 

3.3 Data Records

The extracted data covers Jan 2012 to Dec 2017, giving 72 monthly records in total. This is enough for a transfer function model, for which about 60 records is generally regarded as the minimum.

 

4. ARIMA Model

4.1 Observing plots

As shown in Figures 1 and 2, the monthly deaths appear non-stationary with a trend, while the admissions to Accident & Emergency Departments appear stationary. To test their stationarity, the KPSS test was employed. The null hypothesis of the KPSS test is that the data is level or trend stationary. Therefore, if the data is level or trend stationary, the p-value should be larger than 0.05.

According to the output of the KPSS test shown in Figure 3, the monthly deaths are non-stationary, while the admissions to Accident & Emergency Departments are stationary.

Figure 1 Line chart of monthly deaths

 

Figure 2 Line chart of admissions to Accident & Emergency Departments

 

Figure 3 Output of KPSS test
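To make the KPSS decision rule concrete, the sketch below computes the level-stationarity KPSS statistic by hand. It is an illustration under stated assumptions rather than the software output in Figure 3: it covers only the level case (not the trend case), uses a common short-lag rule for the Bartlett window, and compares the statistic with the tabulated 5% critical value of 0.463 instead of reporting a p-value. A statistic below the critical value corresponds to a p-value above 0.05.

```python
KPSS_LEVEL_CRITICAL_5PCT = 0.463  # tabulated 5% critical value, level case


def kpss_level_statistic(series, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(series)
    if lags is None:
        # Short-lag rule for the Bartlett window (an assumed default).
        lags = int(4 * (n / 100) ** 0.25)
    mean = sum(series) / n
    resid = [value - mean for value in series]

    # Partial sums of the residuals from the level (mean-only) regression.
    partial_sums = []
    running = 0.0
    for e in resid:
        running += e
        partial_sums.append(running)

    # Long-run variance estimate with Bartlett weights.
    long_run_var = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        weight = 1 - s / (lags + 1)
        autocov = sum(resid[t] * resid[t - s] for t in range(s, n)) / n
        long_run_var += 2 * weight * autocov

    return sum(p * p for p in partial_sums) / (n * n * long_run_var)


def looks_level_stationary(series):
    """True when the null of level stationarity is not rejected at 5%."""
    return kpss_level_statistic(series) < KPSS_LEVEL_CRITICAL_5PCT
```

A series with a steady upward trend gives a large statistic and is flagged as non-stationary, whereas a purely seasonal series around a constant mean is not.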

 

4.2 Determination of parameters

Based on the ACF and PACF graphs in Figure 4, the parameters of the ARIMA model for the admissions series can be determined.

  • p – Non-seasonal Autoregression Order

According to the PACF graph, there is a spike at lag 2. Therefore, p is equal to 2.

  • d – Non-seasonal Differencing Order

Because the original data is stationary, the differencing order d is equal to 0.

  • q – Non-seasonal Moving Average Order

According to the ACF graph, there is a spike at lag 2. Therefore, q is equal to 2.

  • P – Seasonal Autoregression Order

According to the PACF graph, there are spikes at lags 12 and 24. Therefore, P is equal to 2.

  • D – Seasonal Differencing Order

Because the original admissions data is stationary, D is equal to 0.

  • Q – Seasonal Moving Average Order

According to the ACF graph, there is a spike at lag 12. Therefore, Q is equal to 1.

Figure 4 Time Series Basic Diagnostics
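The spikes read off the ACF and PACF graphs can also be found numerically. The sketch below computes the sample ACF, derives the PACF with the Durbin-Levinson recursion, and lists the lags that exceed the usual approximate 95% bound of 1.96/sqrt(n). The function names are introduced here for illustration and are not part of the tool used in the report.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [value - mean for value in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
            for k in range(max_lag + 1)]


def sample_pacf(series, max_lag):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = sample_acf(series, max_lag)
    pacf = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append phi_kk.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)]
        prev.append(phi_kk)
        pacf.append(phi_kk)
    return pacf


def significant_lags(values, n_obs):
    """Lags (excluding lag 0) outside the approximate 95% bound."""
    bound = 1.96 / math.sqrt(n_obs)
    return [lag for lag, v in enumerate(values) if lag > 0 and abs(v) > bound]
```

For 72 monthly observations the bound is about 0.23, so a seasonal spike at lag 12 or 24 would appear in the list returned by `significant_lags`.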

 

4.3 Result of Seasonal ARIMA (2,0,2) (2,0,1)

The result of the Seasonal ARIMA (2,0,2) (2,0,1) model is shown below. Some coefficients are not significant, so the insignificant parameters have to be removed one by one to find the best ARIMA model.

Figure 5 Output of Seasonal ARIMA (2,0,2) (2,0,1)

 

4.4 Output of ARIMA Model Group

To find the best parameters, ARIMA Model Group was employed. Its output is shown below. According to the AIC criterion (lower is better), the best model is (2,0,2) (1,0,1).

Figure 6 Output of ARIMA Model Group
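The selection logic can be sketched as follows, assuming the model group fits every combination of orders up to the initial (2,0,2) (2,0,1) with no differencing. That search range is an assumption made for this example; the report does not state which range was configured. The AIC formula itself is the standard one, 2k - 2 ln(L).

```python
from itertools import product


def candidate_orders(max_p=2, max_q=2, max_P=2, max_Q=1):
    """All ((p, 0, q), (P, 0, Q)) combinations up to the given maxima."""
    return [((p, 0, q), (P, 0, Q))
            for p, q, P, Q in product(range(max_p + 1), range(max_q + 1),
                                      range(max_P + 1), range(max_Q + 1))]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(aic_by_order):
    """Return the order whose fitted model has the lowest AIC."""
    return min(aic_by_order, key=aic_by_order.get)
```

Here `aic_by_order` would be a dictionary mapping each fitted order to its AIC; with the values in Figure 6 it would return (2,0,2) (1,0,1).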

 

4.5 Output of Seasonal ARIMA (2,0,2) (1,0,1)

The output of Seasonal ARIMA (2,0,2) (1,0,1) is shown below. The model is regarded as the best model for the following reasons:

  1. According to the Forecast in Figure 8, the predicted trend and range are reasonable.
  2. According to the Parameter Estimates in Figure 9, all parameters are significant.
  3. According to the Residuals in Figure 10, the residuals are randomly distributed, and there are no spikes in the ACF and PACF graphs.

Therefore, this model can be used for further analysis.

Figure 7 Model Summary

 

Figure 8 Forecast

 

Figure 9 Parameter Estimates

 

Figure 10 Residuals

 

5. Transfer Function

5.1 Prewhitening

After finding a suitable ARIMA model for X (admissions), its parameters were used to pre-whiten both the input and output series. The cross-correlation of the pre-whitened series is shown below.

Figure 11 Prewhitening Plot
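The idea behind prewhitening can be sketched as below: the same filter is applied to both series, and the cross-correlation of the filtered series is then inspected for the lag at which X starts to influence Y. This is a simplified illustration that uses a pure autoregressive filter with caller-supplied coefficients; the model in this report also has moving-average and seasonal terms, which the sketch leaves out.

```python
import math


def ar_filter(series, phi):
    """Residuals e_t = z_t - sum_i phi_i * z_{t-i} of the mean-centred series."""
    mean = sum(series) / len(series)
    z = [value - mean for value in series]
    p = len(phi)
    return [z[t] - sum(phi[i] * z[t - 1 - i] for i in range(p))
            for t in range(p, len(z))]


def cross_correlation(alpha, beta, max_lag):
    """Correlation between alpha_t and beta_{t+k} for k = 0 .. max_lag.

    alpha is the prewhitened input and beta the filtered output; both
    must have the same length.
    """
    n = len(alpha)
    mean_a = sum(alpha) / n
    mean_b = sum(beta) / n
    a = [value - mean_a for value in alpha]
    b = [value - mean_b for value in beta]
    scale = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    return [sum(a[t] * b[t + k] for t in range(n - k)) / scale
            for k in range(max_lag + 1)]
```

If Y responds to X with a delay of ten months, the largest value returned by `cross_correlation` appears at k = 10, which is how the delay is read off a prewhitening plot.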

 

5.2 Identifying Parameters and Fitting Transfer Function Noise Model

According to Figure 11, there are two spikes, at lags 10 and 11. This means the first non-zero cross-correlation occurs at lag 10 and the values decay after lag 11. Therefore, b = 10 and s = 11 - 10 = 1. In addition, r is equal to 2. The parameters we used are shown in the figure below:

Figure 12 Transfer Function Model
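For reference, the textbook Box-Jenkins form of a transfer function noise model with delay b, numerator order s and denominator order r is given below. The symbols c, ω, δ and N_t are notation assumed here rather than taken from the figures, and the sign convention for ω differs between software packages.

```latex
Y_t = c + \frac{\omega(B)}{\delta(B)}\, B^{b} X_t + N_t, \qquad
\omega(B) = \omega_0 - \omega_1 B - \dots - \omega_s B^{s}, \qquad
\delta(B) = 1 - \delta_1 B - \dots - \delta_r B^{r}
```

With b = 10, s = 1 and r = 2 this becomes:

```latex
Y_t = c + \frac{\omega_0 - \omega_1 B}{1 - \delta_1 B - \delta_2 B^{2}}\, X_{t-10} + N_t
```

Here Y is the monthly deaths series, X is the admissions series and N_t is the noise term.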

 

5.3 Diagnostic Checks

  • Check Residuals

According to Figure 13, the residuals are randomly distributed.

Figure 13 Residuals of Transfer Function

 

  • Check Significance of Parameters

According to Figure 14, all parameters are significant.

Figure 14 Parameter Estimates of Transfer Function

 

5.4 Model Comparison

Although the above model performs well, other parameter settings were tried for comparison. According to the model comparison output in Figure 15, the model described above is the best one.

Figure 15 Model Comparison

 

5.5 Expanded Formula

Figure 16 Formula of Transfer Function

The formula of the transfer function is shown above. To make it easier to interpret, we expanded the model in full using the backshift operator, as shown below. The expanded formula includes Y (deaths), X (admissions to Accident & Emergency Departments) and e (the error term). The subscript of each term gives its lag.

Figure 17 Expanded Formula
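The way an expanded transfer function turns lagged admissions into a contribution to deaths can be sketched as a simple recursion. The sketch covers only the input-driven part and omits the constant and the error terms. It assumes the Box-Jenkins sign convention v_t = δ1·v_{t-1} + δ2·v_{t-2} + ω0·x_{t-b} - ω1·x_{t-b-1}, treats values before the start of the sample as zero, and takes the coefficients as arguments, because the fitted values are given only in the figures.

```python
def transfer_response(x, omega, delta, delay):
    """Input-driven part of a transfer function model.

    v_t = sum_j delta_j * v_{t-j} + omega_0 * x_{t-b} - sum_{i>=1} omega_i * x_{t-b-i}
    where b is `delay`. Pre-sample values of x and v are taken as zero
    (a simplifying assumption for this sketch).
    """
    v = []
    for t in range(len(x)):
        value = 0.0
        # Denominator (delta) terms feed back earlier responses.
        for j, d in enumerate(delta, start=1):
            if t - j >= 0:
                value += d * v[t - j]
        # Numerator (omega) terms act on the delayed input.
        for i, w in enumerate(omega):
            lag = t - delay - i
            if lag >= 0:
                value += (w if i == 0 else -w) * x[lag]
        v.append(value)
    return v
```

Feeding in a single unit impulse shows the shape of the response: nothing happens for the first b = 10 months, the first effect appears at lag 10, and the later values decay according to the δ coefficients.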

 

5.6 Model Summary

As shown in the Model Summary in Figure 18, the MAPE is 2.77 and the MAE is 46.92 on the training data, which is acceptable. The forecast points and confidence interval are shown in Figure 19.

Figure 18 Model Summary

 

Figure 19 Forecasting graph

 

5.7 Output of test data

As shown in Figure 20, the MAE on the test data is 100.016, while the MAPE is 5.55. Both evaluation metrics are larger than those on the training data. However, the result can still serve as a reference for the government.

Time        Predicted Value    Actual Value
Jan 2018    1696.946           1924
Feb 2018    1565.284           1662
Mar 2018    1666.980           1776
Apr 2018    1654.818           1624
May 2018    1684.973           1803
Jun 2018    1710.541           1729

MAE     100.0157
MAPE    5.545054604

Figure 20 Output of test data
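The two evaluation metrics can be recomputed directly from the six rows of the table. The sketch below uses the standard definitions of MAE and MAPE (with MAPE expressed in per cent); the function names are introduced only for this example.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in per cent."""
    return 100 * sum(abs(a - p) / a
                     for a, p in zip(actual, predicted)) / len(actual)


# Jan 2018 to Jun 2018, copied from the table above.
predicted = [1696.946, 1565.284, 1666.980, 1654.818, 1684.973, 1710.541]
actual = [1924, 1662, 1776, 1624, 1803, 1729]

print(f"MAE  = {mae(actual, predicted):.4f}")
print(f"MAPE = {mape(actual, predicted):.4f}")
```

The largest single error is in Jan 2018, where the model under-predicts by about 227 deaths, so that month contributes most to both metrics.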

 

6. Conclusion

This transfer function describes the relationship between deaths and admissions to Accident & Emergency Departments. It supports the initial hypothesis that admissions to Accident & Emergency Departments lead the monthly number of deaths in Singapore. In addition, the performance of the model is acceptable. Therefore, the government can use this model to predict deaths in advance and take action to lower the death count. For example, if the government sees an increase in the number of admissions, it can put more effort into providing medical assistance and conduct research to understand the underlying reasons.