Submitted by: Aastha Arora (A0178188L), Aishwarya Bose (A0178277M), Chetna Gupta (A0178260A), Madeleine Dy (A0178427U), Misha Singh (A0178309W), Zaira Hossain (A0178331E)

 

Problem/Hypothesis of the Two Time Series:

Is there a relationship between air passenger arrivals and the value of food & beverage sales in Singapore?

Data Exploration and Pre-Processing:

SERIES 1: AIR PASSENGER ARRIVALS IN SINGAPORE

Picture1

Source: https://data.gov.sg/dataset/air-passenger-arrivals-total-by-region-and-selected-country-of-embarkation

Notes:

  • The raw dataset contains monthly estimates of air passenger arrivals by country and region.
  • Information has been obtained from third party sources.
  • Refers to Changi Airport only.
  • Data excludes transit passengers who continued their journey on the same flight.

 

SERIES 2: FOOD & BEVERAGE SALES IN SINGAPORE

Picture2

Source: https://data.gov.sg/dataset/value-of-food-beverage-sales-based-on-2014-100-index-estimated-monthly

Notes:

  • The raw dataset contains value of food and beverage sales index estimated monthly.
  • The Food & Beverage Sales Value is compiled from the results of the Monthly Survey of Food & Beverage Services.
  • The objective of the survey is to measure the short-term performance of food & beverage (F&B) services industries based on the sales records of F&B services establishments.
  • The Food & Beverage Sales Value is based on the Food & Beverage Services Index (FSI).

Number of Records:

The F&B sales value and total air passenger arrivals series were combined into a single dataset of 200 monthly records, from Jan 1997 to Aug 2013 (16 years and 8 months of data), which was used for the analysis.
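As a minimal sketch of how the two monthly series could be combined, the snippet below joins two small placeholder tables on their month column and checks the expected record count. The column names (`month`, `arrival`, `food_value`) and all values are hypothetical and not taken from the data.gov.sg files.

```python
import pandas as pd

# Hypothetical monthly extracts; real column names and values will differ.
arrivals = pd.DataFrame({
    "month": ["1997-01", "1997-02", "1997-03"],
    "arrival": [1000, 1100, 1050],        # placeholder values, not real data
})
food = pd.DataFrame({
    "month": ["1997-01", "1997-02", "1997-03"],
    "food_value": [80.0, 82.5, 81.0],     # placeholder values, not real data
})

# An inner join keeps only the months present in both series.
combined = arrivals.merge(food, on="month", how="inner")
combined["month"] = pd.to_datetime(combined["month"], format="%Y-%m")
combined = combined.set_index("month").sort_index()

# Sanity check on the span used in the report: Jan 1997 to Aug 2013 inclusive.
expected_months = pd.period_range("1997-01", "2013-08", freq="M")
print(len(expected_months))  # 200
```

An inner join is used here so that a month missing from either source cannot introduce a gap in only one of the two series.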

Outliers:

Both arrivals and F&B sales value dipped in April, May and June 2003, which can be attributed to the SARS outbreak in Singapore. We treated these months as outliers and replaced their values with a 3-month moving average.
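One possible reading of this replacement step is sketched below. It assumes that each flagged month is replaced by the mean of the three months immediately before the flagged window. The report does not state which three months were averaged, so this is an illustration only, and the values are placeholders.

```python
import pandas as pd

# Placeholder monthly values around the 2003 dip (not the real data).
idx = pd.period_range("2003-01", "2003-08", freq="M")
series = pd.Series([100.0, 102.0, 101.0, 60.0, 40.0, 70.0, 99.0, 103.0], index=idx)

# Months treated as outliers.
flagged = pd.period_range("2003-04", "2003-06", freq="M")
mask = series.index.isin(flagged)

# Assumption: use the mean of the three months just before the flagged window.
before = series[series.index < flagged[0]].tail(3)
cleaned = series.copy()
cleaned[mask] = before.mean()

print(cleaned)
```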

 

Initial Analysis using Python:

We plotted both variables for visual inspection. Both time series trend upward, which suggests that they are non-stationary and may have a unit root.

Picture4

We tested for unit roots using the augmented Dickey-Fuller (ADF) test.

The ADF test produced the following output for Arrival:

(1.5485691015907959, 0.9976947744107255, 15, 184, 
{'1%': -3.466398230774071, '5%': -2.8773796387256514, 
'10%': -2.575213838610586}, 4548.903850689798)

The ADF test produced the following output for Food Value:

(1.5508415283231438, 0.9977023393139994, 13, 186, 
{'1%': -3.466005071659723, '5%': -2.8772078537639385, 
'10%': -2.5751221620996647}, 1478.034269903061)

With p-values close to 1 (about 0.998 for both series), we cannot reject the null hypothesis that each series has a unit root. As a first step, both series should therefore be differenced.

To test whether ‘Food Value’ Granger-causes ‘Arrival’, we applied the Granger causality test.

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=50.6180 , p=0.0000  , df_denom=196, df_num=1
ssr based chi2 test:   chi2=51.3927 , p=0.0000  , df=1
likelihood ratio test: chi2=45.7154 , p=0.0000  , df=1
parameter F test:         F=50.6180 , p=0.0000  , df_denom=196, df_num=1
{1: ({'ssr_ftest': (50.61796959284111, 2.059081651775858e-11, 196.0, 1), 
'ssr_chi2test': (51.39273443354787, 7.56178806704204e-13, 1), 
'lrtest': (45.71543384976394, 1.3674112550498519e-11, 1), 
'params_ftest': (50.617969592841, 2.0590816517759726e-11, 196.0, 1)}, 
[<statsmodels.regression.linear_model.RegressionResultsWrapper object at 
0x000001A53F406F28>, 
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 
0x000001A540F8D518>, array([[0., 1., 0.]])])}

Since the p-value is well below 0.05, we can say that Food Value Granger-causes Arrival. We then reversed the test to see whether Arrival Granger-causes Food Value. The reversed test gives p-values above 0.05 (p = 0.1615 for the F test), so we found no evidence that Arrival Granger-causes Food Value.

Granger Causality
number of lags (no zero) 1
ssr based F test:         F=1.9754  , p=0.1615  , df_denom=196, df_num=1
ssr based chi2 test:   chi2=2.0056  , p=0.1567  , df=1
likelihood ratio test: chi2=1.9956  , p=0.1578  , df=1
parameter F test:         F=1.9754  , p=0.1615  , df_denom=196, df_num=1
{1:({'ssr_ftest': (1.9754074370715489, 0.16145817984806168, 196.0, 1),
'ssr_chi2test': (2.005643265189991, 0.15671480384086686, 1), 
'lrtest': (1.9956036184489676, 0.15775620297932386, 1), 
'params_ftest': (1.9754074371779515, 0.16145817983682556, 196.0, 1)}, 
[<statsmodels.regression.linear_model.RegressionResultsWrapper object at 
0x000001A545511BE0>, 
<statsmodels.regression.linear_model.RegressionResultsWrapper object at 
0x000001A545489CC0>, array([[0., 1., 0.]])])}

 

Analysis using JMP:

First, we checked the PACF plot of the input series (Food Value). It is clear that the data needs to be differenced before fitting the input-series ARIMA model used for pre-whitening:

Picture5

We then used the ARIMA Group Model function in JMP to find the best combination of orders, which turned out to be Seasonal ARIMA (2,1,0)(0,1,1)12 for the input series. We then pre-whitened the input series and got the following result:

Picture6

Since we did not get any significant lags, we decided to log-transform both series. After the log transformation, we fitted an ARIMA model following the same steps, pre-whitened the series, and got the following output:

Picture8

We found significant lags on both sides of the plot. In this case a transfer function model will not give the right coefficient estimates, so we concluded that this is a cointegration problem and moved on to GRETL.
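The pre-whitening idea can be sketched in a few lines. This simplified illustration fits a plain AR(2) model by least squares rather than the seasonal ARIMA model selected in JMP. The same filter is applied to the input and the output, and the cross-correlations of the two filtered series are then inspected. All function names and the synthetic data are assumptions made for the example.

```python
import numpy as np

def lag_matrix(x, p):
    """Columns are x lagged by 1..p, aligned with x[p:]."""
    return np.column_stack([x[p - j : len(x) - j] for j in range(1, p + 1)])

def fit_ar(x, p):
    """Least-squares AR(p) coefficients for the mean-centred series."""
    xc = x - x.mean()
    coef, *_ = np.linalg.lstsq(lag_matrix(xc, p), xc[p:], rcond=None)
    return coef

def ar_filter(x, coef):
    """Residuals left after applying the AR filter to the mean-centred series."""
    p = len(coef)
    xc = x - x.mean()
    return xc[p:] - lag_matrix(xc, p) @ coef

def cross_corr(a, b, lag):
    """Correlation between a[t] and b[t + lag]."""
    if lag >= 0:
        a_part, b_part = a[: len(a) - lag], b[lag:]
    else:
        a_part, b_part = a[-lag:], b[: len(b) + lag]
    return np.corrcoef(a_part, b_part)[0, 1]

rng = np.random.default_rng(2)
n = 300

# Synthetic input: an AR(1) process.
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

# Synthetic output: responds to the input with a delay of three periods.
y = np.zeros(n)
y[3:] = 2.0 * x[:-3]
y += rng.normal(scale=0.5, size=n)

coef = fit_ar(x, p=2)
alpha = ar_filter(x, coef)   # pre-whitened input
beta = ar_filter(y, coef)    # output passed through the same filter

ccf = {lag: cross_corr(alpha, beta, lag) for lag in range(-5, 6)}
best_lag = max(ccf, key=lambda k: abs(ccf[k]))
print(best_lag)
```

In this synthetic case the only large cross-correlation sits at a positive lag, which is the one-directional pattern a transfer function model assumes. Significant spikes on both sides indicate feedback between the series.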

 

Analysis using GRETL:

We again plotted both variables for visual inspection. Both time series trend upward, which suggests that they are non-stationary and may have a unit root.

Picture8

Both series trend upward. It is possible that they follow a long-run equilibrium relationship to which they tend to return over time. We performed the Engle-Granger test for cointegration to find out.

Granger Causality:
Given two time series, x and y, Granger causality is a method that attempts to determine whether one series is likely to influence changes in the other. This is done by taking different lags of one series and using them to model the changes in the second series. We create two models that predict y: one with only past values of y (Ω), and one with past values of both y and x (π). The models are given below, where k is the number of lags:

Ω: y_t = β_0 + β_1 y_{t-1} + … + β_k y_{t-k} + e_t
π: y_t = β_0 + β_1 y_{t-1} + … + β_k y_{t-k} + α_1 x_{t-1} + … + α_k x_{t-k} + e_t

The residual sums of squares of the two models are then compared, and a test determines whether the nested model (Ω) adequately explains future values of y or whether the full model (π) is better. An F-test, t-test or Wald test (used in R) is calculated for the following null and alternative hypotheses:

H0: α_i = 0 for every i in {1, …, k}
H1: α_i ≠ 0 for at least one i in {1, …, k}

Essentially, we are trying to determine whether, statistically, x provides more information about future values of y than past values of y alone. Under this definition we are not trying to prove actual causation, only that the two series are related by some phenomenon. Along those lines, we must also run the model in reverse to verify that y does not provide information about future values of x. If y does turn out to help predict x as well, it is likely that there is some exogenous variable, z, which needs to be controlled for or could be a better candidate for Granger causation.
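The comparison of the nested model (Ω) and the full model (π) can be written out directly. The sketch below fits both models by ordinary least squares with NumPy and forms the F statistic from their residual sums of squares. The function name `granger_f` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def granger_f(y, x, k):
    """F statistic for H0: all coefficients on lagged x are zero."""
    n = len(y)
    target = y[k:]
    # Lagged regressors aligned with target: lags 1..k of y and of x.
    y_lags = np.column_stack([y[k - j : n - j] for j in range(1, k + 1)])
    x_lags = np.column_stack([x[k - j : n - j] for j in range(1, k + 1)])
    const = np.ones((n - k, 1))

    def ssr(design):
        beta, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ beta
        return float(resid @ resid)

    ssr_restricted = ssr(np.hstack([const, y_lags]))       # model Ω
    ssr_full = ssr(np.hstack([const, y_lags, x_lags]))     # model π

    df_num = k                              # number of restrictions
    df_denom = (n - k) - (2 * k + 1)        # observations minus parameters in π
    f_stat = ((ssr_restricted - ssr_full) / df_num) / (ssr_full / df_denom)
    return f_stat, df_num, df_denom

rng = np.random.default_rng(1)
n = 200
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.8 * x[t - 1] + rng.normal(scale=0.5)

f_stat, df_num, df_denom = granger_f(y, x, k=1)
print(f_stat, df_num, df_denom)
```

With 200 observations and one lag, the degrees of freedom are 1 and 196, the same values that appear in the statsmodels output earlier in this report.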

Steps in Engle Granger Test:

Step 1: Determine ‘d’ in I(d) for ‘Log of Arrival’ using ADF Unit Root Test.
H0: Level series contains a unit root.
HA: Level series does not contain a unit root.

We used a maximum lag order of 6, obtained by taking the cube root of the number of data points (200^(1/3) ≈ 5.85) and rounding up.

We select ‘constant and trend’ because we have seen from the plot that the series has an upward trend.

Picture9

The p-value is large, so we fail to reject the null hypothesis. This means the series needs to be differenced to make it stationary, so we take d = 1.

Step 2: Determine ‘d’ in I(d) for ‘Log of Food Value’ using ADF Unit Root Test.
H0: Level series contains a unit root.
HA: Level series does not contain a unit root.

We again used a maximum lag order of 6, obtained by taking the cube root of the number of data points (200^(1/3) ≈ 5.85) and rounding up.

We select ‘constant and trend’ because we have seen from the plot that the series has an upward trend.

Picture10

The p-value is large, so we fail to reject the null hypothesis. This means the series needs to be differenced to make it stationary, so we take d = 1.

Step 3: Estimate the cointegrating regression: Y_t = β_1 + β_2 X_t + ε_t
We estimated the cointegrating regression using ‘Log of Arrival’ as the dependent variable. Both variables are integrated of the same order, I(1). We set the lag order to 6, as in Step 1 and Step 2.

Picture11

Step 4: Determine ‘d’ in I(d) for ε_t
H0: Unit root (i.e., not cointegrated)
HA: No unit root (i.e., cointegrated)

Since the p-value is small, we reject the null hypothesis at the 5% level of significance and conclude that Food Value is cointegrated with Arrival. This in turn means that the relationship between the series can be written as an error correction model.

 

Error Correction model using GRETL:

Equation:
ΔY_t = φ_1 ΔX_{t-1} + φ_2 ΔY_{t-1} − γ{Y_{t-1} − \hat{β}_1 − \hat{β}_2 X_{t-1}} + ω_t

The lagged residual from the cointegrating regression is the term inside the curly braces above. The coefficient γ is the speed of adjustment. If γ is not statistically significant, the dependent variable is weakly exogenous.

Before estimating the error correction model, we estimated the cointegrating regression and saved the residuals under the name e.

Picture12

Then we created two new series: the first differences of ‘log of arrival’ and ‘log of food value’.

We then estimated the error correction model with the following lags selected:

Picture13

The variable we are interested in is e. If the series are cointegrated, at least one of the variables must not be weakly exogenous. We can see that Log of Arrival is not weakly exogenous. This means that arrivals move to restore the equilibrium when the system is out of balance, whereas Log of Food Value does not.
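The two-stage estimation described in this section can be sketched with ordinary least squares. In the illustration below, synthetic data are generated so that y closes 30% of its gap to the long-run relationship each period. The cointegrating regression is estimated first, and the change in y is then regressed on the lagged changes and the lagged residual e. The data, coefficients and variable names are assumptions for the example and do not reproduce the GRETL results.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300

# x is a random walk; y adjusts towards the long-run level 2 + 0.5 * x.
x = np.cumsum(rng.normal(size=n))
y = np.zeros(n)
for t in range(1, n):
    gap = y[t - 1] - (2.0 + 0.5 * x[t - 1])
    y[t] = y[t - 1] - 0.3 * gap + rng.normal(scale=0.2)

# Stage 1: cointegrating regression y_t = b1 + b2 * x_t + e_t.
X1 = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X1, y, rcond=None)
e = y - X1 @ b

# Stage 2: regress dy_t on a constant, dx_{t-1}, dy_{t-1} and e_{t-1}.
dy, dx = np.diff(y), np.diff(x)
target = dy[1:]
design = np.column_stack([
    np.ones(n - 2),
    dx[:-1],      # dx_{t-1}
    dy[:-1],      # dy_{t-1}
    e[1:-1],      # e_{t-1}
])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)

# The ECM enters the residual with a minus sign, so gamma is the negated coefficient.
gamma = -coef[3]
print(gamma)
```

To judge weak exogeneity for both variables, the second stage would be repeated with the change in x as the dependent variable, and the significance of the coefficient on e compared across the two equations.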

Picture14

Conclusion:

We conclude that food and beverage sales value Granger-causes air passenger arrivals, contrary to our initial assumption that food and beverage sales value depends on arrivals of air passengers. In other words, past Food Value helps predict Arrival, but not the other way around.