Challenge

Question:

Will Apple’s iPhone net sales be higher in FY 2021 than FY 2019?

Description:

Apple’s iPhone net sales have fluctuated over the years, and as of October 2020 Apple has begun selling the new iPhone 12 and its variants (Apple). Industry analysts speculate whether 5G capability will bolster declining iPhone sales or whether the pandemic’s global impact on supply chains and disposable income will hinder them (9 to 5 Mac, ABS-CBN). Apple Inc.’s Fiscal Year (FY) 2021 annual report is expected by early November 2021 at https://investor.apple.com/investor-relations/default.aspx under the “Annual Reports on Form 10-K” section. For FY 2020, Apple reported iPhone net sales of $137.781 billion, compared with $142.381 billion in FY 2019 (Apple 10-K (2020), see page 21).

Methodology

Data Sources

  1. Apple’s Total Revenue (2008 Q2 – 2020 Q4, source: WRDS)
  2. iPhone sales share of Apple’s total revenue (2009 Q1 – 2020 Q4, source: Statista)
  3. China GDP (1992 Q1 – 2020 Q3, source: FRED)
  4. US GDP (1947 Q1 – 2020 Q3, source: FRED)
  5. Apple Stock Price (1980 Q4 – 2020 Q4, source: Macrotrends)

Process:

  1. Creating the iPhone sales data: To build an iPhone sales time series, I multiplied Apple’s total revenue by the iPhone share of total revenue.

  2. Considering China GDP and other exogenous variables: According to Statista, the US and China accounted for the largest shares of iPhone sales in 2020, so I included GDP for both countries as exogenous variables. Moreover, the stock market often reflects consumer and investor sentiment toward a business, so Apple’s historical stock prices may also carry useful information for forecasting.

  3. Before building models: I visualized the iPhone sales series to check for trend and seasonality, and plotted the exogenous variables alongside it to look for possible cointegration. I then applied the Dickey–Fuller test to determine whether each series is stationary, and ran cointegration tests to verify whether cointegration exists between iPhone sales and the exogenous variables.

  4. During the model-building stage, I fitted ARIMA and ARIMAX models with different exogenous variables and numbers of lags. After several trials, I found that China GDP at lag 3 reduced RMSE on the validation period most effectively and significantly. The results of the trials are shown below.

Performance of Various Model Forecasts

| Model  | Order             | Exogenous Variable   | RMSE   | DM Test (p-value) | DM Test (5%) |
|--------|-------------------|----------------------|--------|-------------------|--------------|
| Naive  | NA                | None                 | 0.3839 | NA                | NA           |
| ARIMA  | (0,1,1)(0,1,1)[4] | None                 | 0.0916 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | US GDP at lag 1      | 0.3184 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | US GDP at lag 2      | 0.0675 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | US GDP at lag 3      | 0.0846 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | China GDP at lag 1   | 0.1557 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | China GDP at lag 2   | 0.1578 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | China GDP at lag 3   | 0.0555 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | Stock Price at lag 1 | 0.1372 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | Stock Price at lag 2 | 0.2121 | 0                 | Reject       |
| ARIMAX | (0,1,1)(0,1,1)[4] | Stock Price at lag 3 | 0.1408 | 0                 | Reject       |



  5. I used the residuals from the best forecasting model to build a GARCH model and predict the standard deviations of 2021’s sales. Below are the 2021 forecasts.



Sales and Standard Deviation Forecasts (USD millions)

| Quarter    | 2019 Actual Sales | 2021 Sales Forecast | 2021 Forecast SD |
|------------|-------------------|---------------------|------------------|
| Q1         | 51,986            | 49,125              | 7,025            |
| Q2         | 31,050            | 27,678              | 3,974            |
| Q3         | 25,984            | 24,169              | 3,424            |
| Q4         | 33,365            | 24,754              | 3,520            |
| Whole Year | 142,384           | 125,725             | 9,448            |

*Note: \(\sigma_{2021} = \sqrt{7{,}025^2+3{,}974^2+3{,}424^2+3{,}520^2}=9{,}448\), assuming independent quarterly forecast errors.

  6. Since the question asks for the probability that 2021 sales exceed 2019 sales, I used this standard deviation to compute the z-score of 2019’s sales within the distribution of the 2021 forecast, and then converted the z-score into a probability under the assumption that the forecast errors are normally distributed:

  • \(z = \frac{142{,}384 - 125{,}725}{9{,}448} = 1.7632\)
  • \(P(\text{2021 sales} > \text{2019 sales}) = P(Z > 1.7632) = 3.89\%\)
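These two steps can be checked numerically in R (figures taken from the tables above; the annual SD assumes independent quarterly forecast errors):

sd_q = c(7025, 3974, 3424, 3520)       # quarterly GARCH forecast SDs (USD millions)
sigma_2021 = sqrt(sum(sd_q^2))         # aggregate annual SD: ~9,448
z = (142384 - 125725) / sigma_2021     # z-score of 2019 sales: ~1.7632
pnorm(z, lower.tail=FALSE)             # upper-tail probability: ~0.0389, i.e. 3.89%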

In sum, the probability is 3.89%. It is therefore very unlikely that iPhone net sales will be higher in FY 2021 than in FY 2019.

Code:
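The code below assumes the following packages are attached; this list is inferred from the function calls in this section.

# Packages used in this section (inferred from the calls below)
library(forecast)    # auto.arima, Arima, forecast, accuracy, dm.test, autoplot
library(ggplot2)     # labs(), used with autoplot
library(urca)        # ur.df (Dickey-Fuller test)
library(rugarch)     # ugarchspec, ugarchfit, ugarchforecast, sigma
library(lubridate)   # year(), quarter()
library(dplyr)       # %>%, group_by, summarize
library(zoo)         # zoo(), index()
library(glue)        # glue()
# cowplot and fishmethods are called with :: and only need to be installed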

Import other Data
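The construction of iphone_ts and log_iphone_ts is not shown here; per step 1 of the Process, it multiplies Apple’s total revenue by the iPhone share of revenue. A minimal sketch, with hypothetical file and column names:

# Hypothetical file and column names; both series assumed aligned to 2009 Q1 onward
revenue = read.csv('apple_total_revenue.csv')    # quarterly total revenue (WRDS)
share   = read.csv('iphone_revenue_share.csv')   # iPhone share of total revenue (Statista)
iphone_ts     = ts(revenue$total_revenue * share$iphone_share,
                   start=c(2009,1), freq=4)
log_iphone_ts = log(iphone_ts)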

# China GDP data 
china_gdp = read.csv('CHNGDPNQDSMEI.csv')
china_gdp_ts = log(ts(china_gdp$CHNGDPNQDSMEI, start=c(1992,1), freq=4))


# US GDP data
us_gdp = read.csv('USGDP.csv')
us_gdp_ts = ts(us_gdp$GDP, start=c(1947,1), freq=4)
us_gdp_ts = log(us_gdp_ts)


# Apple Stock Price
apple_price = read.csv('MacroTrends_Data_Download_AAPL.csv')
apple_price$date = as.Date(apple_price$date, format='%m/%d/%Y')
apple_price['year'] = year(apple_price$date)
apple_price['quarter'] = quarter(apple_price$date)

apple_price2 = apple_price %>%
  group_by(year, quarter) %>%
  summarize(price=mean(close))

apple_price_ts = ts(apple_price2$price, start=c(1980,4), freq=4)
apple_price_ts = log(apple_price_ts)

Visualization

p1 = autoplot(iphone_ts, size=1) + 
     labs(title='iPhone Sales (Original)', y='sales', x='')

p2 = autoplot(diff(log_iphone_ts, 4), color='royalblue', size=1) + 
     labs(title='iPhone Sales (Seasonally Differenced Log)', y='log_sales', x='')

p3 = autoplot(diff(diff(log_iphone_ts, 4), 1), color='orangered', size=1) + 
     labs(title='iPhone Sales (Seasonally and First Differenced Log)', y='log_sales', x='')

cowplot::plot_grid(p1, p2, p3, align = "v", nrow = 3)

create_dual_axis_plot = function(ts1, ts2){
  par(mar=c(5, 4, 4, 6) + 0.1)
  plot(ts1, 
       main='iPhone Sales vs. China GDP', 
       pch=16, xlab="", ylab="", yaxt="n")
  axis(2,col="black",las=1)
  mtext("log_Sales",side=2,line=2.5)
  box()
  
  par(new=TRUE)
  plot(ts2, pch=15,  axes=FALSE, col="red",  xlab="", ylab="")
  mtext("log_GDP", side=4,col="red",line=4) 
  axis(4, col="red",col.axis="red",las=1)
}
par(mfrow = c(3, 1))

# Graph 1
create_dual_axis_plot(log_iphone_ts, china_gdp_ts)

# Graph 2
create_dual_axis_plot(log_iphone_ts, us_gdp_ts)

# Graph 3
create_dual_axis_plot(log_iphone_ts, apple_price_ts)

DF Test

df_test = function(ts, type){
  df_test = ur.df(ts, type=type, selectlags="BIC")
  sumStats = summary(df_test)
  teststat = round(sumStats@teststat[1], 2)
  critical = sumStats@cval[1,2]   # 5% critical value
  reject = ifelse(teststat < critical, 
                  "CAN reject the null hypothesis, so the series is stationary",
                  "CANNOT reject the null hypothesis, so the series is non-stationary")
  glue("DF Test ({type}) \nTest Statistic: {teststat} \n5% Critical Value (tau): {critical}\n-> {reject}\n ")
}
cat('iPhone Sales\n')
## iPhone Sales
df_test(log_iphone_ts , type='trend')
## DF Test (trend) 
## Test Statistic: -2.94 
## 5% Critical Value (tau): -3.5
## -> CANNOT reject the null hypothesis, so the series is non-stationary
cat('\n\nChina GDP\n')
## 
## 
## China GDP
df_test(window(china_gdp_ts, start=c(2009,1)), type='trend')
## DF Test (trend) 
## Test Statistic: -4.94 
## 5% Critical Value (tau): -3.5
## -> CAN reject the null hypothesis, so the series is stationary
cat('\n\nUS GDP\n')
## 
## 
## US GDP
df_test(window(us_gdp_ts,start=c(2009,1)) , type='trend')
## DF Test (trend) 
## Test Statistic: -2.39 
## 5% Critical Value (tau): -3.5
## -> CANNOT reject the null hypothesis, so the series is non-stationary
cat('\n\nApple Stock Price\n')
## 
## 
## Apple Stock Price
df_test(window(apple_price_ts,start=c(2009,1)), type='trend')
## DF Test (trend) 
## Test Statistic: -2.89 
## 5% Critical Value (tau): -3.5
## -> CANNOT reject the null hypothesis, so the series is non-stationary

Examine whether cointegration exists between iPhone sales, US GDP, and Apple Stock Price

According to Dickey–Fuller tests on the regression residuals, there is no cointegration between iPhone sales and US GDP, nor between iPhone sales and Apple Stock Price.

creg.model1 = lm(window(us_gdp_ts, start=c(2009,1)) ~ window(log_iphone_ts, end=c(2020,3)))
resid1 = residuals(creg.model1)
resid.ts1 = ts(resid1, start=c(2009,1), freq=4)  # quarterly residual series

cat('Regression Residuals (regressing US GDP on iPhone sales)')
## Regression Residuals (regressing US GDP on iPhone sales)
df_test(resid.ts1, type='none')
## DF Test (none) 
## Test Statistic: -1.91 
## 5% Critical Value (tau): -1.95
## -> CANNOT reject the null hypothesis, so the series is non-stationary
creg.model2 = lm(window(apple_price_ts, start=c(2009,1), end=c(2020,4)) ~ log_iphone_ts)
resid2 = residuals(creg.model2)
resid.ts2 = ts(resid2, start=c(2009,1), freq=4)  # quarterly residual series
cat('\n\nRegression Residuals (regressing Apple stock price on iPhone sales)')
## 
## 
## Regression Residuals (regressing Apple stock price on iPhone sales)
df_test(resid.ts2, type='none')
## DF Test (none) 
## Test Statistic: -1.49 
## 5% Critical Value (tau): -1.95
## -> CANNOT reject the null hypothesis, so the series is non-stationary

Use ARIMA models to forecast US GDP, China GDP, and Apple Stock Price, extending each series through 2021 Q4 so they align with the iPhone sales series

arima_us_gdp = auto.arima(us_gdp_ts)
us_gdp_forecast = forecast(arima_us_gdp, h=5)   # GDP data end in 2020 Q3: 5 quarters to reach 2021 Q4
us_gdp_ts = ts(c(us_gdp_ts, 
                 us_gdp_forecast$mean), 
               end=c(2021,4), freq=4)

arima_china_gdp = auto.arima(china_gdp_ts)
china_gdp_forecast = forecast(arima_china_gdp, h=5)
china_gdp_ts = ts(c(china_gdp_ts, 
                    china_gdp_forecast$mean), 
                  end=c(2021,4), freq=4)

arima_apple_price = auto.arima(apple_price_ts)
apple_price_forecast = forecast(arima_apple_price, h=4)   # price data end in 2020 Q4: 4 quarters
apple_price_ts = ts(c(apple_price_ts,
                      apple_price_forecast$mean), 
                    end=c(2021,4), freq=4)

Build Models

Create Lagged Series and Split Data

# Create lagged series for US GDP

for (lag in 1:3){
  lag_series = window(stats::lag(us_gdp_ts, -lag), start=c(2009,1), end=c(2020,4))
  log_iphone_ts = cbind(log_iphone_ts, lag_series)
}

# Create lagged series for China GDP
for (lag in 1:3){
  lag_series = window(stats::lag(china_gdp_ts, -lag), start=c(2009,1), end=c(2020,4))
  log_iphone_ts = cbind(log_iphone_ts, lag_series)
}

# Create lagged series for Apple stock price
for (lag in 1:3){
  lag_series = window(stats::lag(apple_price_ts, -lag), start=c(2009,1), end=c(2020,4))
  log_iphone_ts = cbind(log_iphone_ts, lag_series)
}


colnames(log_iphone_ts) = c('sales', 
                            paste('us_gdp_lag', seq(1,3), sep=''),
                            paste('china_gdp_lag', seq(1,3), sep=''),
                            paste('apple_price_lag', seq(1,3), sep=''))


n_valid = 8  # hold out the last 8 quarters (2019 Q1 - 2020 Q4) for validation
train_ts = window(log_iphone_ts, end=c(2018,4))
valid_ts = window(log_iphone_ts, start=c(2019,1))

Create a naive forecast as the baseline model

log_iphone_ts_lag = stats::lag(log_iphone_ts[, 'sales'], -1)  # naive forecast: last quarter's sales
naive_train = window(log_iphone_ts_lag, end=c(2018,4))
naive_valid = window(log_iphone_ts_lag, start=c(2019,1))
naive_train_residuals = train_ts[, 'sales'] - naive_train
naive_valid_residuals = valid_ts[, 'sales'] - naive_valid
rmse_naive = round(accuracy(naive_valid, valid_ts[, 'sales'])[2], 4)

glue('Naive Model RMSE: {rmse_naive}\n')
## Naive Model RMSE: 0.3839

Attempt 1: ARIMA, Seasonal, without exogenous variables

model1 = auto.arima(train_ts[, 'sales'], seasonal = TRUE, ic='bic')

summary(model1)
## Series: train_ts[, "sales"] 
## ARIMA(0,1,1)(0,1,1)[4] 
## 
## Coefficients:
##           ma1     sma1
##       -0.3532  -0.3479
## s.e.   0.1502   0.1679
## 
## sigma^2 estimated as 0.03953:  log likelihood=7.58
## AIC=-9.15   AICc=-8.38   BIC=-4.48
## 
## Training set error measures:
##                       ME      RMSE       MAE        MPE    MAPE      MASE
## Training set -0.03050251 0.1805996 0.1223888 -0.3223638 1.23442 0.3876714
##                     ACF1
## Training set -0.03445454
model1_pred = Arima(valid_ts[, 'sales'], model=model1)
rmse1 = round(accuracy(model1_pred)[2], 4)
dm_test1 = dm.test(residuals(model1_pred), naive_valid_residuals, alternative = 'less')
dm_test1 = round(dm_test1$p.value,4)

cat(glue('Naive Model RMSE: {rmse_naive}\nARIMA Model RMSE: {rmse1}, DM Test P-value: {dm_test1}'))
## Naive Model RMSE: 0.3839
## ARIMA Model RMSE: 0.0916, DM Test P-value: 0
plot(valid_ts[, 'sales'])
lines(fitted(model1_pred), col='blue')
lines(naive_valid, col='green')
grid()

Attempt 2: ARIMA, Seasonal, with various lagged US GDP series

rmse_us_gdp_lag = rep(0,3)
dm_us_gdp_lag = rep(0,3)
colors = c('red', 'blue', 'green')

cat(glue('Naive Model RMSE: {rmse_naive}\nARIMA Model RMSE: {rmse1}'), '\n')
## Naive Model RMSE: 0.3839
## ARIMA Model RMSE: 0.0916
plot(valid_ts[, 'sales'], ylim=c(9.3, 11), ylab='log_sales', xlab='')
grid()

for (i in 1:3){
  lag_var_name = paste('us_gdp_lag', i, sep='')

  model2 = auto.arima(train_ts[, 'sales'], seasonal = TRUE, xreg=train_ts[, lag_var_name], ic='bic')
  
 # summary(model2)
  
  model2_pred = Arima(valid_ts[, 'sales'], model=model2, xreg=valid_ts[, lag_var_name])
  rmse2 = round(accuracy(model2_pred)[2], 4)
  rmse_us_gdp_lag[i] = rmse2
  dm_test2 = dm.test(residuals(model2_pred), naive_valid_residuals, alternative = 'less')
  dm_test2 = round(dm_test2$p.value,4)
  dm_us_gdp_lag[i] = dm_test2
  lines(fitted(model2_pred), col=colors[i])
  cat(glue('ARIMAX Model RMSE: {rmse2} (with us_gdp_lag{i}), DM Test P-value: {dm_test2}'), '\n')
}

## ARIMAX Model RMSE: 0.3184 (with us_gdp_lag1), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.0675 (with us_gdp_lag2), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.0846 (with us_gdp_lag3), DM Test P-value: 0

Attempt 3: ARIMA, Seasonal, with various lagged China GDP series

rmse_china_gdp_lag = rep(0,3)
dm_china_gdp_lag = rep(0,3)
colors = c('red', 'blue', 'green')

cat(glue('Naive Model RMSE: {rmse_naive}\nARIMA Model RMSE: {rmse1}'), '\n')
## Naive Model RMSE: 0.3839
## ARIMA Model RMSE: 0.0916
plot(valid_ts[, 'sales'], ylim=c(9.3, 11), ylab='log_sales', xlab='')
grid()

for (i in 1:3){
  lag_var_name = paste('china_gdp_lag', i, sep='')

  model2 = auto.arima(train_ts[, 'sales'], seasonal = TRUE, xreg=train_ts[, lag_var_name], ic='bic')
  
 # summary(model2)
  
  model2_pred = Arima(valid_ts[, 'sales'], model=model2, xreg=valid_ts[, lag_var_name])
  rmse2 = round(accuracy(model2_pred)[2], 4)
  rmse_china_gdp_lag[i] = rmse2
  lines(fitted(model2_pred), col=colors[i])
  dm_test2 = dm.test(residuals(model2_pred), naive_valid_residuals, alternative = 'less')
  dm_test2 = round(dm_test2$p.value,4)
  dm_china_gdp_lag[i] = dm_test2
  cat(glue('ARIMAX Model RMSE: {rmse2} (with china_gdp_lag{i}), DM Test P-value: {dm_test2}'), '\n')
}

## ARIMAX Model RMSE: 0.1557 (with china_gdp_lag1), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.1578 (with china_gdp_lag2), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.0555 (with china_gdp_lag3), DM Test P-value: 0

Attempt 4: ARIMA, Seasonal, with various lagged Apple Stock Price series

rmse_apple_price_lag = rep(0,3)
dm_apple_price_lag = rep(0,3)
colors = c('red', 'blue', 'green')

cat(glue('Naive Model RMSE: {rmse_naive}\nARIMA Model RMSE: {rmse1}, DM Test P-value: {dm_test1}'), '\n')
## Naive Model RMSE: 0.3839
## ARIMA Model RMSE: 0.0916, DM Test P-value: 0
plot(valid_ts[, 'sales'], ylim=c(9.3, 11), ylab='log_sales', xlab='')
grid()

for (i in 1:3){
  lag_var_name = paste('apple_price_lag', i, sep='')

  model2 = auto.arima(train_ts[, 'sales'], seasonal = TRUE, xreg=train_ts[, lag_var_name], ic='bic')
  
 # summary(model2)
  
  model2_pred = Arima(valid_ts[, 'sales'], model=model2, xreg=valid_ts[, lag_var_name])
  rmse2 = round(accuracy(model2_pred)[2], 4)
  rmse_apple_price_lag[i] = rmse2
  dm_test2 = dm.test(residuals(model2_pred), naive_valid_residuals, alternative = 'less')
  dm_test2 = round(dm_test2$p.value,4)
  dm_apple_price_lag[i] = dm_test2
  lines(fitted(model2_pred), col=colors[i])
  cat(glue('ARIMAX Model RMSE: {rmse2} (with apple_price_lag{i}), DM Test P-value: {dm_test2}'), '\n')
}

## ARIMAX Model RMSE: 0.1372 (with apple_price_lag1), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.2121 (with apple_price_lag2), DM Test P-value: 0 
## ARIMAX Model RMSE: 0.1408 (with apple_price_lag3), DM Test P-value: 0

Summary of Model Performances

model_performances = data.frame(c(rmse_naive, rmse1, rmse_us_gdp_lag,rmse_china_gdp_lag, rmse_apple_price_lag))
colnames(model_performances) = 'RMSE'
model_performances['Model'] = c('Naive', 'ARIMA', rep('ARIMAX', 9))
model_performances['Order'] = '(0,1,1)(0,1,1)[4]'
model_performances['Exogenous Variable'] = c('None', 'None', 
                                             paste('US GDP at lag ', seq(3), sep=''),
                                             paste('China GDP at lag ', seq(3), sep=''),
                                             paste('Stock Price at lag ', seq(3), sep=''))

model_performances = model_performances[, c(2,3,4,1)]
model_performances[1, 'Order'] = NA
model_performances['DM Test (p-value)'] = c(NA, dm_test1, dm_us_gdp_lag, dm_china_gdp_lag, dm_apple_price_lag)
model_performances
##     Model             Order   Exogenous Variable   RMSE DM Test (p-value)
## 1   Naive              <NA>                 None 0.3839                NA
## 2   ARIMA (0,1,1)(0,1,1)[4]                 None 0.0916                 0
## 3  ARIMAX (0,1,1)(0,1,1)[4]      US GDP at lag 1 0.3184                 0
## 4  ARIMAX (0,1,1)(0,1,1)[4]      US GDP at lag 2 0.0675                 0
## 5  ARIMAX (0,1,1)(0,1,1)[4]      US GDP at lag 3 0.0846                 0
## 6  ARIMAX (0,1,1)(0,1,1)[4]   China GDP at lag 1 0.1557                 0
## 7  ARIMAX (0,1,1)(0,1,1)[4]   China GDP at lag 2 0.1578                 0
## 8  ARIMAX (0,1,1)(0,1,1)[4]   China GDP at lag 3 0.0555                 0
## 9  ARIMAX (0,1,1)(0,1,1)[4] Stock Price at lag 1 0.1372                 0
## 10 ARIMAX (0,1,1)(0,1,1)[4] Stock Price at lag 2 0.2121                 0
## 11 ARIMAX (0,1,1)(0,1,1)[4] Stock Price at lag 3 0.1408                 0
#write.table(model_performances, "model_performances.csv", row.names = F, quote = F, sep=';')

Refit the Best Model on the Entire Dataset

best_model = Arima(log_iphone_ts[, 'sales'], 
                   order=c(0,1,1), 
                   seasonal=c(0,1,1),
                   xreg=log_iphone_ts[, 'china_gdp_lag3'])
  
summary(best_model)
## Series: log_iphone_ts[, "sales"] 
## Regression with ARIMA(0,1,1)(0,1,1)[4] errors 
## 
## Coefficients:
##           ma1     sma1    xreg
##       -0.4097  -0.3656  1.2836
## s.e.   0.1477   0.1476  1.0697
## 
## sigma^2 estimated as 0.03679:  log likelihood=11.22
## AIC=-14.45   AICc=-13.4   BIC=-7.4
## 
## Training set error measures:
##                      ME      RMSE       MAE        MPE     MAPE      MASE
## Training set -0.0380466 0.1750944 0.1223876 -0.3945825 1.227925 0.4352205
##                     ACF1
## Training set -0.01725971
plot(log_iphone_ts[, 'sales'])
lines(fitted(best_model), col='blue')
grid()

GARCH Model

residuals = best_model$residuals
rs_train_ts = window(residuals, end=c(2018,4))
rs_valid_ts = window(residuals, start=c(2019,1))

spec = ugarchspec(variance.model = list(garchOrder=c(1,2)),  # GARCH(1,2)
                  mean.model = list(include.mean=FALSE,      # zero mean
                                    armaOrder=c(0,0)))       # no ARMA terms

fit = ugarchfit(spec=spec, data=rs_train_ts)
## Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, : 
## ugarchfit-->waring: using less than 100 data
##  points for estimation
format(coef(fit), scientific=F)
##                     omega                    alpha1                     beta1 
## "0.000000000000000746719" "0.009885590675400865099" "0.000201819869163014185" 
##                     beta2 
## "0.968381931893958380897"
# n_valid quarters ahead volatility forecast
valid_volfcast = ugarchforecast(fit, n.ahead=n_valid)
valid_volfcast = zoo(sigma(valid_volfcast), index(valid_ts))

# plot data, and future forecast lined up
plot(residuals, xaxt="n", xlab="Date", ylab="Residuals", xlim=c(2009, 2021))
lines(valid_volfcast,col='red',lwd=3)
lines(-valid_volfcast,col='red',lwd=3)
lines(rs_valid_ts,col='blue',lwd=1)
abline(0, 0)

residuals = best_model$residuals
fit = ugarchfit(spec=spec, data=residuals)
## Warning in .sgarchfit(spec = spec, data = data, out.sample = out.sample, : 
## ugarchfit-->waring: using less than 100 data
##  points for estimation
# Plot the results
volfcast <- ts(sigma(fit), start=c(2009,1), freq=4)

# plot the model residuals with the estimated conditional SD bands
plot(residuals,
     type = "l", 
     col = "steelblue",
     ylab = "Residual (log sales)", 
     xlab = "Date",
     main = "Estimated Bands of +- One Conditional Standard Deviation",
     lwd = 0.2)

# add horizontal line at y = 0
abline(0, 0)

# add GARCH(1,2) confidence bands (one standard deviation) to the plot
lines(mean(residuals) + volfcast, 
      col = "darkred", 
      lwd = 0.5)

lines(mean(residuals) - volfcast, 
      col = "darkred", 
      lwd = 0.5)

Make Forecasts

# Prepare the exogenous variable
china_gdp_lag3 = window(stats::lag(china_gdp_ts,-3),
                        start=c(2021,1), 
                        end=c(2021,4))

# Forecast 2021's Sales
sales_forecast = forecast(best_model, h=4, 
                          xreg=china_gdp_lag3,
                          level=.95)

# Forecast standard deviation of 2021's Sales
volfcast = ugarchforecast(fit, n.ahead=4)

Summary of Forecasting Results

# results_table
results = data.frame(sales_forecast)
colnames(results)[1] = 'log_sales_forecast_2021'
results['log_sales_2019'] = window(log_iphone_ts[, 'sales'], 
                        start=c(2019,1), 
                        end=c(2019,4))
results['garch_log_sigma'] = as.vector(sigma(volfcast))
results['sales_forecast_2021'] = exp(sales_forecast$mean)
results['sales_2019'] = exp(results['log_sales_2019'] )
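# Exploratory check only; its result is not used downstream. (A delta-method
# variance of the back-transformed forecast would be sigma^2 * mean^2; the
# expression below uses sigma * mean^2.)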
results['garch_log_sigma'] * results['sales_forecast_2021']^2
##         garch_log_sigma
## 2021 Q1       344526790
## 2021 Q2       109815909
## 2021 Q3        82619846
## 2021 Q4        86997773
results['btsd'] = NaN

# Back-Transformation Of Log-Transformed Variance
for (i in 1:4){
  bt = fishmethods::bt.log(meanlog=results[i, 'log_sales_forecast_2021'], 
              sdlog=results[i, 'garch_log_sigma'],
              n=2)
  results[i, 'btsd'] = bt['sd']
}
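The back-transformation rests on the lognormal moments: if \(X \sim N(\mu, \sigma^2)\) on the log scale, then \(E[e^X] = e^{\mu + \sigma^2/2}\) and \(\mathrm{SD}[e^X] = e^{\mu + \sigma^2/2}\sqrt{e^{\sigma^2} - 1}\); fishmethods::bt.log applies bias-corrected estimators of these quantities.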


btsd_all_2021 = sqrt(sum(results['btsd']^2))  # aggregate annual SD from the quarterly SDs

results['Time'] = as.vector(row.names(results))
row.names(results) = NULL
final_results = results[, c('Time', 'sales_forecast_2021', 'sales_2019', 'btsd')]


for (col in c('sales_forecast_2021', 'sales_2019', 'btsd')){
  final_results[col] = as.numeric(unlist(final_results[col]))
}


sum_df = data.frame(t(c('2021 All', as.numeric((colSums(final_results[, -1]))))))
colnames(sum_df) =  colnames(final_results)
final_results = rbind(final_results,sum_df)
final_results[5, 'btsd'] = btsd_all_2021
final_results
##       Time sales_forecast_2021  sales_2019             btsd
## 1  2021 Q1    49124.5665646916   51985.546  7025.2481855835
## 2  2021 Q2     27678.110981174   31049.628 3974.40612974232
## 3  2021 Q3    24169.1103017797  25984.3661 3424.10805686719
## 4  2021 Q4    24753.5131578707    33364.84 3520.46980305005
## 5 2021 All    125725.301005516 142384.3801 9448.18711565704
#write.csv(final_results, "final_results.csv", row.names = F, quote = F)