- Liang Shize (A0178178M)
- Nandana Murthy (A0178563R)
- Tan Zhi Wen David (A0056418A)
- Zhao Ziyuan (A0178184U)
- Zhu Huiying (A0178222H)
In rapidly ageing Singapore, demographic trends are worrying. While the birth rate fell to a seven-year low, the number of deaths recorded was the highest in at least two decades. The number of deaths rose by about 4 per cent, from 20,017 in 2016 to 20,905 in 2017, according to the Report on Registration of Births and Deaths 2017 released by the Immigration and Checkpoints Authority (ICA). This study therefore aims to analyze factors related to deaths in order to give the government insights beyond the current situation. According to our analysis, admissions to Accident & Emergency Departments is one of the factors that is correlated with deaths.
We hypothesize that the monthly number of deaths is correlated with the monthly number of admissions to Accident & Emergency Departments; in other words, an increase in the number of admissions is followed by an increase in the death count. The dependent variable is therefore the number of deaths aggregated by month, while the independent variable is the number of admissions to Accident & Emergency Departments aggregated by month.
3.1 Data Source
The two datasets were collected from the Department of Statistics Singapore. The admissions to Accident & Emergency Departments data was extracted from the dataset ‘Admissions to Public Sector Hospitals, Monthly’, whereas the monthly death data was extracted from ‘Deaths by Ethnic Group and Sex, Monthly’. (Links: http://www.tablebuilder.singstat.gov.sg/publicfacing/createDataTable.action?refId=15193 and http://www.tablebuilder.singstat.gov.sg/publicfacing/createDataTable.action?refId=15167)
3.2 Data Transformation
Originally, the data was a multi-dimensional table, which was not suitable for analytics, so we restructured it into a flat table with one record per month. We kept only the data from Jan 2012 to Dec 2017 because we wanted to focus on the recent trend.
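The restructuring step can be sketched as follows. This is only an illustration: the wide layout (one row per variable, one column per month labelled like "2012 Jan"), the names `wide`, `parse_month`, `to_long` and `month_count`, and the tiny placeholder values are all assumptions, not the actual export format or data.

```python
from datetime import date

MONTH_NUMBERS = {name: i for i, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

def parse_month(label):
    """Turn a label such as '2012 Jan' into the first day of that month."""
    year, month_name = label.split()
    return date(int(year), MONTH_NUMBERS[month_name], 1)

def to_long(wide, start, end):
    """Reshape {variable: {month_label: value}} into one record per month,
    keeping only months between start and end (inclusive)."""
    records = {}
    for variable, row in wide.items():
        for label, value in row.items():
            month = parse_month(label)
            if start <= month <= end:
                records.setdefault(month, {"month": month})[variable] = value
    return [records[m] for m in sorted(records)]

def month_count(start, end):
    """Number of monthly records between two month starts, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1

# Placeholder values only; they are not the real figures.
wide = {
    "deaths": {"2011 Dec": 1, "2012 Jan": 2, "2012 Feb": 3},
    "admissions": {"2011 Dec": 10, "2012 Jan": 20, "2012 Feb": 30},
}
start, end = date(2012, 1, 1), date(2017, 12, 1)
long_table = to_long(wide, start, end)
print(long_table)
print(month_count(start, end))
```

With the full export, the same window (Jan 2012 to Dec 2017) would contain 72 monthly records.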
3.3 Data Records
The extracted data covers Jan 2012 to Dec 2017 and has 72 monthly records in total. This is enough for a transfer function model because, as a general rule of thumb, such a model needs at least 60 records.
4. ARIMA Model
4.1 Observing plots
As shown in Figures 1 and 2, the monthly deaths appear non-stationary with a trend, while the admissions to Accident & Emergency Departments appear stationary. To test stationarity, the KPSS test was employed. The null hypothesis of the KPSS test is that the data is level or trend stationary. Therefore, if the p-value is larger than 0.05, the null hypothesis is not rejected and the data is treated as stationary.
According to the output of the KPSS test shown in Figure 3, the monthly deaths are non-stationary, while the admissions to Accident & Emergency Departments are stationary.
Figure 1 Line chart of monthly deaths
Figure 2 Line chart of admissions to Accident & Emergency Departments
Figure 3 Output of KPSS test
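To make the test concrete, the sketch below computes the KPSS statistic for the level-stationarity null by hand, using only the standard library. It is not the tool that produced Figure 3. The function name, the lag truncation of 4 and the two synthetic series are assumptions chosen for illustration. Instead of a p-value, the statistic is compared with the asymptotic 5% critical value of 0.463: a larger statistic rejects stationarity.

```python
import math

def kpss_level_stat(series, lags):
    """KPSS statistic for the null of level stationarity (Bartlett weights)."""
    n = len(series)
    mean = sum(series) / n
    resid = [v - mean for v in series]
    # Partial sums of the demeaned series.
    partial, running = [], 0.0
    for e in resid:
        running += e
        partial.append(running)
    eta = sum(s * s for s in partial) / n ** 2
    # Newey-West estimate of the long-run variance.
    lrv = sum(e * e for e in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1)
        gamma = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        lrv += 2.0 * weight * gamma
    return eta / lrv

CRIT_5PCT_LEVEL = 0.463  # asymptotic 5% critical value for the level null

n = 72  # same length as the study period, but synthetic values
trending = [float(t) for t in range(n)]                         # steady upward trend
seasonal = [math.sin(2 * math.pi * t / 12) for t in range(n)]   # stable yearly cycle

for name, data in [("trending", trending), ("seasonal", seasonal)]:
    stat = kpss_level_stat(data, lags=4)
    verdict = "non-stationary" if stat > CRIT_5PCT_LEVEL else "stationary"
    print(name, round(stat, 3), verdict)
```

The trending series plays the role of the deaths series and the seasonal one the role of the admissions series, but neither uses the real data.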
4.2 Determination of parameters
Based on the ACF and PACF graphs of the admissions series in Figure 4, the parameters of the ARIMA model can be determined.
- p – Non-seasonal Autoregression Order
According to the PACF graph, there is a spike at lag 2. Therefore, p is equal to 2.
- d – Non-seasonal Differencing Order
Because the original admissions data is stationary, the differencing order d is equal to 0.
- q – Non-seasonal Moving Average Order
According to the ACF graph, there is a spike at lag 2. Therefore, q is equal to 2.
- P – Seasonal Autoregression Order
According to the PACF graph, there are spikes at lags 12 and 24. Therefore, P is equal to 2.
- D – Seasonal Differencing Order
Because the original admissions data is stationary, D is equal to 0.
- Q – Seasonal Moving Average Order
According to the ACF graph, there is a spike at lag 12. Therefore, Q is equal to 1.
Figure 4 Time Series Basic Diagnostics
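The spikes read off the ACF and PACF graphs can also be checked numerically. The sketch below is a minimal standard-library illustration, not the procedure behind Figure 4: it computes the sample ACF, derives the PACF with the Durbin-Levinson recursion, and uses the usual approximate 95% bound of 1.96/sqrt(n) to decide what counts as a spike. The function names are hypothetical.

```python
import math

def acf(series, nlags):
    """Sample autocorrelations r_0..r_nlags."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(nlags + 1)]

def pacf(series, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, nlags)
    result = [1.0]
    prev = []  # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    for k in range(1, nlags + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result

def significance_bound(n):
    """Approximate 95% bound for a sample (partial) autocorrelation."""
    return 1.96 / math.sqrt(n)

def spikes(values, n):
    """Lags (k >= 1) whose value lies outside the approximate 95% bound."""
    bound = significance_bound(n)
    return [k for k, v in enumerate(values) if k >= 1 and abs(v) > bound]

print(round(significance_bound(72), 3))  # bound for 72 monthly records
```

Calling `spikes(pacf(x, 24), len(x))` on the admissions series would list the candidate lags for p and P, and `spikes(acf(x, 24), len(x))` those for q and Q.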
4.3 Result of Seasonal ARIMA (2,0,2) (2,0,1)
The result of the ARIMA model is shown below. Some coefficients are not significant, so the insignificant parameters have to be removed one by one to find the best ARIMA model.
Figure 5 Output of Seasonal ARIMA (2,0,2) (2,0,1)
4.4 Output of ARIMA Model Group
To find the best parameters, ARIMA Model Group was employed, and its output is shown below. According to the AIC criterion, the best solution is (2,0,2) (1,0,1).
Figure 6 Output of ARIMA Model Group
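The selection rule is simple to state in code: AIC = 2k - 2 ln L, where k is the number of estimated parameters and L the maximized likelihood, and the candidate with the smallest AIC wins. In the sketch below the log-likelihood values are made-up placeholders, not the values in Figure 6, and counting only the AR and MA terms as parameters is an assumption.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * log_likelihood

def best_by_aic(candidates):
    """candidates maps a model label to (log_likelihood, n_params)."""
    return min(candidates, key=lambda label: aic(*candidates[label]))

# Placeholder log-likelihoods for illustration only.
candidates = {
    "(2,0,2)(2,0,1)": (-500.0, 7),  # 2 AR + 2 MA + 2 seasonal AR + 1 seasonal MA
    "(2,0,2)(1,0,1)": (-500.5, 6),  # one seasonal AR term fewer
}
for label, (loglik, k) in candidates.items():
    print(label, aic(loglik, k))
print("selected:", best_by_aic(candidates))
```

The example shows why a slightly worse fit can still be preferred: dropping a parameter lowers the penalty term by 2, which outweighs a small loss in log-likelihood.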
4.5 Output of Seasonal ARIMA (2,0,2) (1,0,1)
The output of Seasonal ARIMA (2,0,2) (1,0,1) is shown below. The model is regarded as the best model for the following reasons:
- Firstly, according to Forecast in Figure 8, the predictive trend and range are reasonable.
- Secondly, according to Parameter Estimates in Figure 9, all parameters are significant.
- Thirdly, according to Residuals in Figure 10, the residuals are randomly distributed, and there are no significant spikes in the ACF and PACF graphs. Therefore, this model can be used for further analysis.
Figure 7 Model Summary
Figure 8 Forecast
Figure 9 Parameter Estimates
Figure 10 Residuals
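The residual check above is visual. One common numeric complement is the Ljung-Box portmanteau statistic, sketched below with the standard library only; the report does not state that this test was run. The statistic Q = n(n+2) Σ r_k²/(n-k) is compared with a chi-square critical value whose degrees of freedom are the number of lags minus the number of fitted ARMA terms. The choice of 12 lags and 6 fitted terms in the usage note is an assumption.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [v - mean for v in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total

CHI2_95_DF6 = 12.592  # 5% critical value of the chi-square distribution, 6 df

def looks_like_white_noise(residuals, max_lag, critical_value):
    """True when Q stays below the supplied chi-square critical value."""
    return ljung_box_q(residuals, max_lag) < critical_value
```

For example, with residuals from a model that has 6 fitted AR and MA terms, `looks_like_white_noise(residuals, 12, CHI2_95_DF6)` would test 12 lags against 12 - 6 = 6 degrees of freedom.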
5. Transfer Function
After finding a suitable ARIMA model, the parameters of the ARIMA model for X (admissions) were used to pre-whiten the input and output series in order to obtain their cross-correlation graph, which is shown below.
Figure 11 Prewhitening Plot
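The idea behind pre-whitening can be illustrated on synthetic data. In the sketch below, the input is a simple AR(1) series and the output responds to it with a 3-step delay; both are filtered with the input's AR operator, and the cross-correlation of the filtered series then peaks at the true delay. Everything here is an assumption made for illustration: the pure AR filter (the report's model also has MA and seasonal terms), the coefficient 0.6, the delay of 3 and the function names.

```python
import math
import random

def ar_filter(series, phi):
    """Apply (1 - phi_1 B - ... - phi_p B^p); drops the first p points."""
    p = len(phi)
    return [series[t] - sum(phi[i] * series[t - 1 - i] for i in range(p))
            for t in range(p, len(series))]

def ccf(alpha, beta, max_lag):
    """Sample cross-correlation between alpha[t-k] and beta[t], k = 0..max_lag."""
    n = len(alpha)
    mean_a, mean_b = sum(alpha) / n, sum(beta) / n
    da = [a - mean_a for a in alpha]
    db = [b - mean_b for b in beta]
    scale = math.sqrt(sum(a * a for a in da) * sum(b * b for b in db))
    return [sum(da[t - k] * db[t] for t in range(k, n)) / scale
            for k in range(max_lag + 1)]

# Synthetic demo data, not the admissions or deaths series.
random.seed(0)
phi, delay, n = [0.6], 3, 500
x = [0.0]
for _ in range(n - 1):
    x.append(phi[0] * x[-1] + random.gauss(0, 1))
y = [0.0] * delay + [0.8 * v for v in x[:n - delay]]

alpha = ar_filter(x, phi)  # pre-whitened input
beta = ar_filter(y, phi)   # output filtered with the same operator
correlations = ccf(alpha, beta, 12)
peak_lag = max(range(len(correlations)), key=lambda k: abs(correlations[k]))
print("largest cross-correlation at lag", peak_lag)
```

Without the filtering step, the autocorrelation in x would spread the cross-correlation over neighbouring lags and blur the delay.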
5.2 Identifying Parameters and Fitting Transfer Function Noise Model
According to Figure 11, there are two spikes at lags 10 and 11. This means the first non-zero cross-correlation occurs at lag 10 and the values decay after lag 11. Therefore, b = 10 and s = 11 - 10 = 1. In addition, r is equal to 2. The parameters we used are shown in Figure 12:
Figure 12 Transfer Function Model
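For reference, the orders identified above (b = 10, s = 1, r = 2) correspond to the following general form in standard Box-Jenkins notation. The symbols ω, δ and N_t are the textbook ones and are an assumption here; the software behind Figure 12 may use a different sign convention, and the estimated coefficient values are not reproduced.

```latex
Y_t = \frac{\omega(B)}{\delta(B)}\, B^{b} X_t + N_t,
\qquad \omega(B) = \omega_0 - \omega_1 B,
\qquad \delta(B) = 1 - \delta_1 B - \delta_2 B^2,
\qquad b = 10
```

Here Y_t is monthly deaths, X_t is monthly admissions, B is the backshift operator and N_t is the noise series, which is itself modelled as an ARIMA process.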
5.3 Diagnostic Checks
- Check Residuals
According to Figure 13, the residuals are randomly distributed.
Figure 13 Residuals of Transfer Function
- Check Significance of Parameters
According to Figure 14, all parameters are significant.
Figure 14 Parameter Estimates of Transfer Function
5.4 Model Comparison
Although the above solution is good enough, other parameter settings were tried for comparison. According to the model comparison output in Figure 15, the solution described above is the best one.
Figure 15 Model Comparison
5.5 Expanded Formula
Figure 16 Formula of Transfer Function
The formula of the transfer function is shown above. To understand the formula better, we expanded the model in full with the backshift operator, as shown below. The expanded formula includes Y (deaths), X (admissions to Accident & Emergency Departments) and e (the error term). The subscript at the bottom right of each term gives its lag.
Figure 17 Expanded Formula
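How lagged terms in an expanded transfer function act on the input can be shown with a small recursion. The sketch below computes only the deterministic part v_t of δ(B) v_t = ω(B) x_{t-b}, with ω(B) = ω0 - ω1 B and δ(B) = 1 - δ1 B - δ2 B². The sign convention and all coefficient values are hypothetical and are not the estimates in Figures 16 and 17; only the orders b = 10, s = 1, r = 2 come from the report.

```python
def transfer_response(x, omega, delta, b):
    """Deterministic output v_t of delta(B) v_t = omega(B) x_{t-b}.

    omega = [w0, w1, ...] means omega(B) = w0 - w1*B - ...
    delta = [d1, d2, ...] means delta(B) = 1 - d1*B - d2*B^2 - ...
    """
    v = []
    for t in range(len(x)):
        value = 0.0
        # Autoregressive part: earlier outputs feed back into v_t.
        for i, d in enumerate(delta, start=1):
            if t - i >= 0:
                value += d * v[t - i]
        # Input part: x enters with a delay of b, b+1, ... periods.
        for j, w in enumerate(omega):
            lag = t - b - j
            if lag >= 0:
                value += (w if j == 0 else -w) * x[lag]
        v.append(value)
    return v

# A one-off unit increase in the input at t = 0 (hypothetical coefficients).
impulse = [1.0] + [0.0] * 14
response = transfer_response(impulse, omega=[2.0, 0.5], delta=[0.5, 0.2], b=10)
print(response)
```

The response stays at zero for the first ten periods, which is what a dead time of b = 10 means: a change in admissions only starts to show up in deaths ten months later, and then decays through the δ terms.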
5.6 Model Summary
As shown in Model Summary in Figure 18, the MAPE is equal to 2.77, while the MAE is equal to 46.92. The performance is acceptable. The forecasting points and confidence interval are shown in Figure 19.
Figure 18 Model Summary
Figure 19 Forecasting graph
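The two evaluation metrics are defined as follows: MAE is the mean of the absolute errors, and MAPE is the mean of the absolute errors divided by the actual values, expressed here as a percentage (which is how the reported values are presumably scaled). The sketch uses placeholder numbers, not the model's forecasts.

```python
def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be non-zero."""
    return 100.0 * sum(abs(a - p) / abs(a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Placeholder values for illustration only.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(mae(actual, predicted), mape(actual, predicted))
```

Because MAPE is relative to the actual value, it is easier to compare across series of different scale, while MAE stays in the original unit (deaths per month).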
5.7 Output of test data
As shown in Figure 20, the MAE on the test data is 100.157, while the MAPE is 5.55. Both evaluation metrics are larger than those on the training data. However, the result can still serve as a reference for the government.
| Time | Predicted Value | Actual Value |
Figure 20 Output of test data
This transfer function describes the relationship between deaths and admissions to Accident & Emergency Departments. It supports the initial hypothesis that admissions to Accident & Emergency Departments lead the monthly number of deaths in Singapore. In addition, the performance of the model is acceptable. Therefore, the government can use this model to predict deaths in advance and take action to lower the death count. For example, if the government sees an increase in the number of admissions, it can put more effort into providing medical assistance and conduct research to understand the underlying reasons.