Unilever is a British-Dutch global consumer goods company headquartered in London, UK. It was established on September 2, 1929, through the merger of a Dutch margarine producer (Margarine Unie) and a British soapmaker (Lever Brothers). In the 20th century, the company expanded its operations worldwide and diversified its product portfolio. Unilever currently owns many large brands such as Dove, Axe, Magnum, Lipton, and Vaseline, and its products span food and beverages, cleaning products, and personal care. The company is geographically well diversified, with most of its sales coming from Europe, North America, and Asia.
Unilever went through major restructuring in the 1980s so that it could concentrate on businesses it believed had a strong and competitive future. The company faces intense competition from Procter & Gamble (P&G), especially in the detergent segment, and competition with major players in the consumer goods market can lead to slow growth in the future. In addition, consumer buying habits have been shifting from brick-and-mortar stores to online shopping, especially with the rise of Amazon. Although consumer spending in the US and Europe has shown some growth over the years, these changing habits may create challenges for large brick-and-mortar companies.
In this report, I investigate how Unilever's sales have been affected by various factors, including economic conditions in its markets and the growing trend of e-commerce. We will analyze Unilever's strategic plan of focusing on top brands within its core market sectors as well as its emphasis on growth in developing countries. Throughout the report, we can also assess the performance of the previous CEO, Paul Polman, who took over on January 1, 2009 and launched the 2020 Sustainable Living Plan.
This report is broken down into five stages: after the introduction, we move through data description, data analysis, forecasting, and finally the conclusion. The report concentrates on forecasting sales growth under the leadership of the new CEO (Paul Polman) and observing how the actual results relate to the forecasted results.
# Load forecasting, reporting, and VAR packages
library(fpp2)
library(knitr)
library(vars)
# Read the quarterly data and convert it to a time series starting in 2003 Q1
unilever_sales = read.csv("C:/ECON 4210/unilever_sales.csv")
df = ts(unilever_sales, start = c(2003, 1), frequency = 4)
y = df[,"UNILEVER"]
As mentioned in the introduction, we will dive into an analysis of Unilever's sales. To make the analysis more meaningful, we will also look at variables that are related to those sales: the sales of one of Unilever's largest competitors, Procter & Gamble (P&G), and total US retail and food services sales. All three variables are in billions of US dollars (USD). All of the data in this report was collected from the Bloomberg Terminal, a software platform that provides access to a large amount of financial market data.
y %>%
stl(t.window=13, s.window="periodic", robust=TRUE) %>%
autoplot()
By decomposing the series with STL (seasonal-trend decomposition using Loess), we can gain a clearer view of the trend, seasonal, and remainder components.
The data appears to be non-stationary, as the values tend to increase over the years apart from a few anomalies, with a general upward trend after 2010. The data also shows strong seasonality. The remainder component stays relatively stable (alternating between negative and positive) with a few outliers, such as the period prior to 2005.
tsdisplay(y, main="Quarterly Sales of Unilever")
The first graph shows the quarterly sales of Unilever in billions of USD. It illustrates a general increasing trend with significant seasonality. The ACF declines slowly, showing that the observations are correlated: Unilever's sales are influenced by previous sales. The partial ACF shows a few anomalies within the first five lags that warrant further examination.
# Quarter-over-quarter percentage change in sales
yr = 100*diff(log(y))
Acf(yr, main = "Unilever Q-o-Q Sales Growth", col = "purple", lwd = 3)
Based on this ACF, we can clearly see the seasonal nature of the sales data. In general, sales peak in Q4 and trough in Q2. Q4 marks the start of the holiday season, when shopping for consumer goods is at its highest, while sales tend to be sluggish in Q2, after the holiday-season spending has passed.
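To double-check this seasonal pattern, a subseries plot can be drawn with the ggsubseriesplot() function from the fpp2/forecast packages. This is a small illustrative sketch (not part of the original analysis) that plots the observations for each quarter separately, so the Q4 peaks and Q2 troughs described above should stand out.
# Sketch: subseries plot of quarterly sales; the horizontal lines mark quarterly means
ggsubseriesplot(y) +
  ylab("Sales") +
  ggtitle("Unilever Sales by Quarter")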
# Histogram of quarterly sales with a fitted normal curve overlaid for comparison
h <- hist(y, col = "red", xlab = "Intervals", main = "Unilever Quarterly Sales")
xfit <- seq(min(y), max(y), length = 40)
yfit <- dnorm(xfit, mean = mean(y), sd = sd(y))
yfit <- yfit * diff(h$mids[1:2]) * length(y)
lines(xfit, yfit, col = "blue", lwd = 4)
The histogram shows some evidence of a normal distribution, although the 11-12 billion interval does not follow the pattern, so we cannot claim the data are perfectly normal. This is neither bad nor ideal for Unilever: a growing company would show a distribution shifted toward the higher intervals, with most of the observations at higher sales levels. Unilever's sales are instead fairly symmetric, suggesting little growth, which could be due to intense competition or the shift of purchases toward e-commerce. Either way, Unilever should look into strategies to enhance growth and adapt to changing customer preferences (e-commerce).
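The impression of rough normality from the histogram can be checked more formally. As an illustrative sketch (not part of the original analysis), a normal Q-Q plot and a Shapiro-Wilk test from base R would serve; a small p-value would indicate a departure from normality.
# Sketch: formal normality checks for the quarterly sales series
qqnorm(as.numeric(y), main = "Normal Q-Q Plot of Quarterly Sales")
qqline(as.numeric(y), col = "blue", lwd = 2)
shapiro.test(as.numeric(y))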
summary(y)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8136 10234 11862 11567 12750 14406
The statistics show a large difference between the maximum and minimum quarterly sales over the years: the largest quarterly sales figure was approximately $14.4 billion and the smallest approximately $8.1 billion. This difference of more than $6 billion shows that consumer spending is always changing and that seasonality is very significant in this industry.
We will look at a range of models of varying complexity and analyze each of them in depth. We begin with simpler models such as benchmarks and linear regressions and then move on to more advanced models such as VAR and ARIMA.
We will use 80% of the data for training and 20% for testing. Since there are 67 data points, 52 of them will be used for training (through the end of 2015).
train <- window(y,start=c(2003, 1),end=c(2015, 4))
test <- window(y,start=2016)
both <- window(y,start=c(2003, 1))
h=length(test)
We first analyze the basic benchmark models: the mean, naive, seasonal naive, and drift methods.
Since the data show strong seasonality, we can expect the seasonal naive model to perform well.
fit1 <- meanf(train, h=h)            # mean forecast
fit2 <- naive(train, h=h)            # naive (last observation)
fit3 <- snaive(train, h=h)           # seasonal naive
fit4 <- rwf(train, h=h, drift=TRUE)  # random walk with drift
p1 = autoplot(fit1)
p2 = autoplot(fit2)
p3 = autoplot(fit3)
p4 = autoplot(fit4)
gridExtra::grid.arrange(p1, p2, p3, p4, nrow=2)
As expected, the only method that succeeded in capturing the seasonality was the seasonal naive approach. As a result, the seasonal naive model will serve as the main benchmark against which the other models are compared.
# Multiple linear model
tps <- tslm(train ~ trend + season)
fit5 = forecast(tps,h=h)
# Linear model
lm = tslm(train ~ trend, data=train)
fit6 = forecast(lm,h=h)
p5 = autoplot(fit5, main = 'Forecasting using Multiple Linear Model')
p6 = autoplot(fit6, main = 'Forecasting using Linear Model')
gridExtra::grid.arrange(p5, p6, nrow=2)
The multiple linear model appears to be a good forecasting model, as it captures both the trend and the seasonality well. The simple linear model, as expected, performs poorly and fails to capture the seasonality, since it includes only a trend term.
STL <- stl(train, t.window=15, s.window="periodic", robust=TRUE)
fit7 <- forecast(STL, method="rwdrift",h=h)
p7 = autoplot(fit7, main = "STL Model")
p7
As expected, the STL model is effective at portraying the seasonality and trend in the data. Loess fits many localized regressions and smooths them, so it is able to capture the jumps and spikes that appear in the data.
# Simple Exponential Moving Averages
fit8 <- ses(train, h = h)
p8 = autoplot(fit8)
# Holt's Linear Trend Method
fit9 <- holt(train, h=h)
p9 = autoplot(fit9)
# Holt's Damped Trend Method
fit10 <- holt(train, damped=TRUE, h=h)
p10 = autoplot(fit10)
# Holt Winter's Method
fit11 <- hw(train, seasonal="multiplicative", h=h)
p11 = autoplot(fit11)
# ETS method
ETS <- ets(train, model="ZZZ")
fit12 <- forecast(ETS, h=h)
p12 = autoplot(fit12)
gridExtra::grid.arrange(p8, p9, p10, p11, p12, nrow=3)
The first model is not effective at all: the forecast is completely flat and neither trend nor seasonality is captured. Simple exponential smoothing, Holt's linear method, and damped Holt's are all poor forecasting models here because they ignore seasonality entirely.
The Holt-Winters and ETS models are the only two that capture the seasonality in the data. Their prediction intervals are also relatively wide, reflecting the forecast uncertainty. We will need to look at accuracy measures to gain deeper insight into which models are most appropriate.
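Since ets() was called with model = "ZZZ" (fully automatic selection), it is worth checking which error/trend/seasonal combination it actually chose. The sketch below simply prints the fitted specification; the exact label depends on the data, so it is not reproduced here.
# Sketch: inspect the automatically selected ETS specification
ETS$method
summary(ETS)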
a1 = accuracy(fit1, test)
a2 = accuracy(fit2, test)
a3 = accuracy(fit3, test)
a4 = accuracy(fit4, test)
a5 = accuracy(fit5, test)
a6 = accuracy(fit6, test)
a7 = accuracy(fit7, test)
a8 = accuracy(fit8, test)
a9 = accuracy(fit9, test)
a10 = accuracy(fit10, test)
a11 = accuracy(fit11, test)
a12 = accuracy(fit12, test)
a.table<-rbind(a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12)
row.names(a.table)<-c('Mean training','Mean test',
'Naive training', 'Naive test',
'S.Naive training', 'S.Naive test',
'Random walk with drift training','Random walk with drift test',
'Multiple Linear trend training','Multiple Linear trend test',
'Linear trend training','Linear trend test',
'STL Training', 'STL Test',
'SES training','SES test',
'Holt linear training', 'Holt linear test',
'Holt damped training','Holt damped test',
'Holt-Winters training', 'Holt-Winters test',
'ETS training', 'ETS test')
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
kable(a.table)
 | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
---|---|---|---|---|---|---|---|---|
ETS test | -157.361725 | 437.1021 | 338.7715 | -1.2824561 | 2.607431 | 0.4775353 | 0.4466539 | 0.5427155 |
S.Naive test | -224.066667 | 416.1725 | 344.0667 | -1.7609073 | 2.648759 | 0.4849994 | 0.6678776 | 0.5268820 |
Holt-Winters training | 12.718116 | 575.1009 | 379.9517 | -0.1359438 | 3.629344 | 0.5355833 | 0.0088093 | NA |
STL Training | 0.000000 | 608.8270 | 383.3204 | -0.2471526 | 3.733277 | 0.5403319 | -0.3066030 | NA |
ETS training | 69.925073 | 588.3705 | 397.4980 | 0.3816460 | 3.799414 | 0.5603167 | -0.0313572 | NA |
Holt-Winters test | -263.884426 | 530.0366 | 427.5582 | -2.0757382 | 3.281823 | 0.6026898 | 0.4490094 | 0.6599296 |
Multiple Linear trend test | -111.962821 | 579.1214 | 490.6976 | -0.9560615 | 3.766382 | 0.6916917 | 0.6726679 | 0.7258133 |
SES test | 1.294691 | 596.2616 | 495.9796 | -0.1946907 | 3.777557 | 0.6991373 | 0.0898509 | 0.7459843 |
Holt damped test | -162.232946 | 632.4680 | 520.6365 | -1.4474487 | 4.012944 | 0.7338938 | 0.1531912 | 0.7949803 |
Random walk with drift test | -50.341177 | 648.4026 | 527.9412 | -0.6003274 | 4.039313 | 0.7441905 | 0.2465691 | 0.8291559 |
STL Test | -417.604155 | 640.0290 | 529.2452 | -3.2957167 | 4.102292 | 0.7460288 | 0.5769317 | 0.8050945 |
Naive test | 238.600000 | 642.2275 | 533.9333 | 1.6173800 | 3.999412 | 0.7526371 | 0.0898509 | 0.8169222 |
Linear trend test | -78.089874 | 706.5688 | 571.2643 | -0.8204025 | 4.379322 | 0.8052592 | 0.3620423 | 0.9109225 |
SES training | 68.761994 | 845.8969 | 623.3783 | 0.0684671 | 5.781106 | 0.8787195 | 0.0966068 | NA |
Holt damped training | 143.125667 | 826.2525 | 647.9698 | 0.8535116 | 5.932880 | 0.9133840 | 0.0766493 | NA |
Holt linear training | 105.318822 | 859.6790 | 671.1556 | 0.6196894 | 6.202467 | 0.9460669 | 0.0783842 | NA |
Multiple Linear trend training | 0.000000 | 899.5068 | 699.2912 | -0.6725628 | 6.370121 | 0.9857270 | 0.7248053 | NA |
S.Naive training | 121.875000 | 984.3011 | 709.4167 | 0.5342737 | 6.527860 | 1.0000000 | 0.6055581 | NA |
Naive training | 36.117647 | 923.1701 | 716.1961 | -0.0934340 | 6.683591 | 1.0095563 | -0.1727067 | NA |
Random walk with drift training | 0.000000 | 922.4633 | 721.0750 | -0.4234805 | 6.737776 | 1.0164336 | -0.1727067 | NA |
Linear trend training | 0.000000 | 1019.4539 | 805.1757 | -0.8703640 | 7.405344 | 1.1349828 | 0.5767723 | NA |
Holt linear test | -839.856603 | 1145.3256 | 965.8609 | -6.6463658 | 7.535467 | 1.3614861 | 0.4718621 | 1.4570954 |
Mean training | 0.000000 | 1382.0836 | 1203.5533 | -1.5770401 | 11.022144 | 1.6965393 | 0.7654743 | NA |
Mean test | 2004.907692 | 2091.6934 | 2004.9077 | 15.1049598 | 15.104960 | 2.8261356 | 0.0898509 | 2.6129355 |
Seasonal naive was the best-performing benchmark model, with the lowest MAPE and MASE, making it the most important benchmark in this analysis. We will compare the other models against these benchmarks.
The multiple linear model's graph appeared to capture the seasonality and trend well, so we expected it to perform well, and it did beat both the mean and naive methods. However, seasonal naive outperformed it in all categories. The linear trend model performed worse than the naive method but better than the mean method.
The STL model performed better than the Naive and Mean methods in all categories. However, it did not outperform Seasonal Naive.
The Holt linear model performed very poorly; in fact, it was the second-worst model (ahead of only the mean method), because it ignores the seasonality in the data. The Holt-Winters model, on the other hand, performed very well, although the seasonal naive method beat it in all categories except ME and MPE. The damped Holt method was inferior to Holt-Winters in every category, as expected, but still better than the other two benchmarks. Simple exponential smoothing was similar to the damped Holt method, underperforming both Holt-Winters and seasonal naive. The ETS model was one of the best performers: it beat seasonal naive on MAE, MAPE, and MASE, while seasonal naive had a better ME, RMSE, and MPE.
Among the basic models examined above, the ETS model outperformed all the others in all categories, except that seasonal naive had a better ME, RMSE, and MPE, and Holt-Winters a better ME and MPE. Therefore, we can conclude that seasonal naive, ETS, and Holt-Winters are the top three basic models.
I will now conduct a time series cross-validation on the top four basic models: seasonal naive, ETS, Holt-Winters, and the multiple linear trend.
# Seasonal naive
snaive_function <- function(x, h) {
  snaive(x, h = h)
}
# ETS method
ets_function <- function(x, h) {
ETS <- ets(x, model="ZZZ")
forecast(ETS, h=h)
}
# Holt-Winters method (hw takes seasonal=, and already returns a forecast object)
holtwinter_function <- function(x, h) {
  hw(x, seasonal = "multiplicative", h = h)
}
# Multiple linear model (fit to the rolling sample x, not the fixed training set)
tslm_function <- function(x, h) {
  tps <- tslm(x ~ trend + season)
  forecast(tps, h = h)
}
Instead of windowing the data, I will use the entire series so that the cross-validation draws on as many forecast origins as possible. The forecast horizon is 6 periods.
# Calling functions and generating tsCV
e1 <- tsCV(y, snaive_function, h=6)
e2 <- tsCV(y, ets_function, h=6)
e3 <- tsCV(y, holtwinter_function, h=6)
e4 <- tsCV(y, tslm_function, h=6)
mse1 = mean(e1^2, na.rm=TRUE)
mse2 = mean(e2^2, na.rm=TRUE)
mse3 = mean(e3^2, na.rm=TRUE)
mse4 = mean(e4^2, na.rm=TRUE)
a.table<-rbind(mse1, mse2, mse3, mse4)
row.names(a.table)<-c('S.NAIVE MSE', 'ETS MSE', 'HW MSE', 'TSLM MSE')
a.table<-as.data.frame(a.table)
a.table
## V1
## S.NAIVE MSE 993709.5
## ETS MSE 865410.1
## HW MSE 1324054.5
## TSLM MSE 4483819.8
We now repeat the cross-validation for forecast horizons of 1 through 6 using a loop, plotting the MSE against the horizon for each model.
models = c("snaive_function", "ets_function", "holtwinter_function", "tslm_function")
for (mdl in models) {
  mse = c()
  for (i in 1:6) {
    e <- tsCV(y, get(mdl), h = i)
    mse[i] = mean(e^2, na.rm = TRUE)
  }
  plot(mse, ann = FALSE)
  title(main = mdl, col.main = "red", font.main = 4)
  title(xlab = "h")
  title(ylab = "MSE")
}
For the majority of the models, MSE increases as the forecast horizon lengthens, which confirms the results from the accuracy measures above. ETS is the best model: it achieved the smallest MSE, and its MSE rises at a stable rate across the forecast horizons. Seasonal naive came close behind ETS, but its MSE fluctuated considerably across horizons. Holt-Winters and the multiple linear trend came third and fourth respectively, with the multiple linear trend far behind the other models due to its very high MSE.
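As an optional sketch (it re-runs the cross-validation, so it is slow), the horizon-by-horizon MSEs can be collected into a single matrix and drawn on one set of axes, which makes the comparison across models easier than four separate panels.
# Sketch: collect tsCV MSE by horizon for all four models and overlay the curves
mse_mat <- sapply(models, function(mdl) {
  sapply(1:6, function(i) {
    e <- tsCV(y, get(mdl), h = i)
    mean(e^2, na.rm = TRUE)
  })
})
matplot(1:6, mse_mat, type = "b", pch = 1:4, col = 1:4, lty = 1,
        xlab = "h", ylab = "MSE", main = "tsCV MSE by forecast horizon")
legend("topleft", legend = colnames(mse_mat), pch = 1:4, col = 1:4, lty = 1, cex = 0.8)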
We will be looking at some of the more advanced models in this section.
ndiffs(y)
## [1] 1
This shows that only one difference is required to make the series stationary.
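Since the series is strongly seasonal, it is also worth checking whether a seasonal difference is needed: ndiffs() only tests ordinary differencing, while nsdiffs() performs the analogous test for seasonal differencing (a sketch; this output is not part of the original analysis).
# Sketch: test for the number of seasonal differences required
nsdiffs(y)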
fit13 = forecast(auto.arima(train, d=1), h=h)
p13 = autoplot(fit13)
autoplot(y, series = "Actual") +
ggtitle("Unilever Quarterly Sales") +
autolayer(fit13, PI = FALSE, series = "ARIMA") +
ylab("$ Billions of USD")+
theme(plot.title = element_text(size=15, face = "bold",hjust = 0.5) ,
axis.text.x = element_text(size=10),
axis.text.y = element_text(size=10))
The ARIMA model captured the beginning of the test period reasonably well but not the end: it overestimated the sales in 2019 and 2020, so it may not be the best model for this data.
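A quick way to judge whether the ARIMA specification is adequate is to examine its residuals. The sketch below uses checkresiduals() from the forecast package, which plots the residuals, their ACF, and a histogram, and reports a Ljung-Box test.
# Sketch: residual diagnostics for the fitted ARIMA model
checkresiduals(fit13)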
cor(df[,2:4])
## UNILEVER P.G R.F.SALES
## UNILEVER 1.00000000 -0.03605259 0.7363960
## P.G -0.03605259 1.00000000 0.1447701
## R.F.SALES 0.73639602 0.14477014 1.0000000
The table above shows the correlations between Unilever sales, P&G sales, and US retail and food services sales (R&F sales). There is a weak negative relationship (-0.04) between the sales of Unilever and P&G. I expected a positive relationship between these two variables, since they compete in the same industry and markets; one potential explanation is that they pursue very different sales strategies. There is a moderately strong positive relationship (0.74) between Unilever sales and US R&F sales, but only a weak positive relationship (0.14) between P&G sales and US R&F sales. This could mean that P&G is less dependent on US retail and food services sales than Unilever is, and possibly that P&G has more room to grow independently of that market.
# Log-transform the three series (ordered R&F sales, P&G, Unilever) for the VAR analysis
vardata = log(df[,c(4,3,2)])
plot(vardata, main = "VAR Data", xlab = "Year")
Looking at the series side by side gives a better sense of their relationships.
US retail and food services sales have grown steadily over the years, with a significant dip during the financial crisis of 2008-2009. Unilever sales and P&G sales follow broadly similar patterns, although, as seen above, the correlation between them is essentially zero.
Since we want the lag order search to cover at least two years of quarterly data, we use a maximum lag of 9.
VARselect(vardata, lag.max = 9, type = "both", season=4)
## $selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 2 1 1 2
##
## $criteria
## 1 2 3 4
## AIC(n) -2.145111e+01 -2.154652e+01 -2.131241e+01 -2.151242e+01
## HQ(n) -2.111901e+01 -2.108988e+01 -2.073122e+01 -2.080670e+01
## SC(n) -2.059852e+01 -2.037420e+01 -1.982036e+01 -1.970065e+01
## FPE(n) 4.855182e-10 4.451654e-10 5.712172e-10 4.790129e-10
## 5 6 7 8
## AIC(n) -2.147134e+01 -2.147316e+01 -2.141045e+01 -2.125112e+01
## HQ(n) -2.064108e+01 -2.051836e+01 -2.033111e+01 -2.004724e+01
## SC(n) -1.933984e+01 -1.902194e+01 -1.863951e+01 -1.816046e+01
## FPE(n) 5.170233e-10 5.423184e-10 6.177961e-10 7.929327e-10
## 9
## AIC(n) -2.112460e+01
## HQ(n) -1.979619e+01
## SC(n) -1.771421e+01
## FPE(n) 1.013551e-09
The output above shows the lag length selected by each criterion in the vars package: VAR(1) was selected by SC and HQ, while VAR(2) was selected by AIC and FPE. Since there is a discrepancy, we will fit VAR(1) first.
var1 <- VAR(vardata, p=1, type="both", season=4)
serial.test(var1, lags.pt=16, type="PT.asymptotic")
##
## Portmanteau Test (asymptotic)
##
## data: Residuals of VAR object var1
## Chi-squared = 136.59, df = 135, p-value = 0.4455
With this model, the p-value is 0.45, so the null hypothesis of no autocorrelation in the residuals is not rejected at the 5% level. This suggests the model is adequate, since there is not enough evidence of remaining autocorrelation, and we will therefore use it.
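Because AIC and FPE preferred two lags, a reasonable robustness check (a sketch, not part of the original analysis; the object name var1_p2 is arbitrary) is to fit VAR(2) as well and run the same Portmanteau test. A comparable p-value would support keeping the more parsimonious VAR(1).
# Sketch: fit VAR(2) and re-run the residual autocorrelation test for comparison
var1_p2 <- VAR(vardata, p = 2, type = "both", season = 4)
serial.test(var1_p2, lags.pt = 16, type = "PT.asymptotic")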
summary(var1)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: R.F.SALES, P.G, UNILEVER
## Deterministic variables: both
## Sample size: 66
## Log Likelihood: 418.104
## Roots of the characteristic polynomial:
## 0.9133 0.9133 0.6856
## Call:
## VAR(y = vardata, p = 1, type = "both", season = 4L)
##
##
## Estimation results for equation R.F.SALES:
## ==========================================
## R.F.SALES = R.F.SALES.l1 + P.G.l1 + UNILEVER.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## R.F.SALES.l1 0.9081267 0.0531956 17.071 <2e-16 ***
## P.G.l1 -0.0230997 0.0151171 -1.528 0.132
## UNILEVER.l1 0.0126274 0.0314560 0.401 0.690
## const 0.6412181 0.5294744 1.211 0.231
## trend 0.0006954 0.0004921 1.413 0.163
## sd1 0.0101822 0.0064426 1.580 0.119
## sd2 0.0053863 0.0065582 0.821 0.415
## sd3 0.0034124 0.0061121 0.558 0.579
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.01735 on 58 degrees of freedom
## Multiple R-Squared: 0.9884, Adjusted R-squared: 0.987
## F-statistic: 704.3 on 7 and 58 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation P.G:
## ====================================
## P.G = R.F.SALES.l1 + P.G.l1 + UNILEVER.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## R.F.SALES.l1 0.0092725 0.1131848 0.082 0.9350
## P.G.l1 0.9214422 0.0321647 28.648 < 2e-16 ***
## UNILEVER.l1 -0.0191145 0.0669293 -0.286 0.7762
## const 0.9137007 1.1265675 0.811 0.4207
## trend -0.0004345 0.0010471 -0.415 0.6797
## sd1 -0.0997854 0.0137079 -7.279 9.94e-10 ***
## sd2 -0.0263625 0.0139540 -1.889 0.0639 .
## sd3 -0.0002958 0.0130047 -0.023 0.9819
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.03693 on 58 degrees of freedom
## Multiple R-Squared: 0.9512, Adjusted R-squared: 0.9453
## F-statistic: 161.6 on 7 and 58 DF, p-value: < 2.2e-16
##
##
## Estimation results for equation UNILEVER:
## =========================================
## UNILEVER = R.F.SALES.l1 + P.G.l1 + UNILEVER.l1 + const + trend + sd1 + sd2 + sd3
##
## Estimate Std. Error t value Pr(>|t|)
## R.F.SALES.l1 -0.078283 0.163908 -0.478 0.6347
## P.G.l1 -0.005756 0.046579 -0.124 0.9021
## UNILEVER.l1 0.682277 0.096924 7.039 2.52e-09 ***
## const 3.409296 1.631437 2.090 0.0410 *
## trend 0.002470 0.001516 1.629 0.1088
## sd1 0.034993 0.019851 1.763 0.0832 .
## sd2 0.140552 0.020207 6.955 3.48e-09 ***
## sd3 0.042086 0.018833 2.235 0.0293 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.05347 on 58 degrees of freedom
## Multiple R-Squared: 0.8607, Adjusted R-squared: 0.8439
## F-statistic: 51.18 on 7 and 58 DF, p-value: < 2.2e-16
##
##
##
## Covariance matrix of residuals:
## R.F.SALES P.G UNILEVER
## R.F.SALES 0.0003012 0.0002479 -0.0001207
## P.G 0.0002479 0.0013635 -0.0005127
## UNILEVER -0.0001207 -0.0005127 0.0028594
##
## Correlation matrix of residuals:
## R.F.SALES P.G UNILEVER
## R.F.SALES 1.0000 0.3869 -0.1301
## P.G 0.3869 1.0000 -0.2597
## UNILEVER -0.1301 -0.2597 1.0000
We can look at the adjusted R-squared values to judge how well each equation fits.
The adjusted R-squared for the US retail and food services sales equation is about 0.99, for the P&G sales equation about 0.95, and for the Unilever sales equation about 0.84, so all three equations fit the data well.
roots(var1)
## [1] 0.9132988 0.9132988 0.6855673
Looking at the roots, all of them are less than 1 in modulus (the largest is about 0.91), which indicates that the estimated VAR is stable.
causality(var1, cause= c("UNILEVER","R.F.SALES"))
## $Granger
##
## Granger causality H0: R.F.SALES UNILEVER do not Granger-cause P.G
##
## data: VAR object var1
## F-Test = 0.050309, df1 = 2, df2 = 174, p-value = 0.9509
##
##
## $Instant
##
## H0: No instantaneous causality between: R.F.SALES UNILEVER and
## P.G
##
## data: VAR object var1
## Chi-squared = 10.737, df = 2, p-value = 0.004661
causality(var1, cause= c("UNILEVER","P.G"))
## $Granger
##
## Granger causality H0: P.G UNILEVER do not Granger-cause R.F.SALES
##
## data: VAR object var1
## F-Test = 1.853, df1 = 2, df2 = 174, p-value = 0.1598
##
##
## $Instant
##
## H0: No instantaneous causality between: P.G UNILEVER and
## R.F.SALES
##
## data: VAR object var1
## Chi-squared = 8.6411, df = 2, p-value = 0.01329
Based on the above results, the Granger-causality null hypothesis is not rejected in either test (p-values of 0.95 and 0.16), so there is no evidence that Unilever and retail and food services sales jointly Granger-cause P&G sales, or that Unilever and P&G sales jointly Granger-cause retail and food services sales. The instantaneous-causality nulls, however, are rejected (p-values of 0.005 and 0.013), indicating contemporaneous correlation among the series' innovations. Note that neither test directly examines whether the other two variables Granger-cause Unilever sales.
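The hypothesis the report is most interested in, whether P&G sales and retail and food services sales jointly Granger-cause Unilever sales, can be tested directly by putting those two series in the cause argument. This is a sketch of that additional test; its output is not shown here.
# Sketch: test whether R&F sales and P&G sales jointly Granger-cause Unilever sales
causality(var1, cause = c("R.F.SALES", "P.G"))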
Impulse response (IR) is an effective technique for performing what-if analysis (e.g. what happens if A increases to B by C percent?)
par(mfrow=c(2,2))
var1a.irf <- irf(var1, n.ahead = 16, boot = TRUE, runs=500, seed=99, cumulative=FALSE)
plot(irf(var1, response = "UNILEVER", n.ahead = 24, boot = TRUE, runs=500) , plot.type = "single")
The IRF describes how a variable responds to a shock in another variable, in the periods following the shock.
The first graph shows how Unilever's sales respond to a shock in US retail and food services sales. In this relationship there is a wide confidence interval, but the dotted red lines do not cross zero, which suggests little effect on Unilever sales.
The second graph examines the response of Unilever's sales to a shock in P&G's sales. The confidence interval is narrower and the dotted red lines do not cross zero, so P&G's sales appear to have little to no effect on Unilever's sales.
The third graph has the smallest confidence interval of the three. The dotted red line crosses over at the fourth lag, indicating that a shock to Unilever's own sales affects its subsequent sales, which is intuitive.
par(mfrow=c(2,2))
plot(irf(var1, response = "P.G", n.ahead = 24, boot = TRUE, runs=500) , plot.type = "single")
The first graph shows the response of P&G sales to a shock in US retail and food services sales. The dotted red line crosses over at the second lag, which indicates that retail and food services sales affect P&G's performance.
The second graph also shows a crossover after the 12th lag, indicating that P&G's sales have an effect on themselves.
The third graph shows no crossover, indicating that Unilever's sales do not impact P&G's sales.
par(mfrow=c(2,2))
plot(irf(var1, response = "R.F.SALES", n.ahead = 24, boot = TRUE, runs=500) , plot.type = "single")
The first graph shows the response of R&F sales to its own shock; the crossover of the dotted lines reflects the expected impact of a series on itself.
The second and third graphs indicate that neither P&G sales nor Unilever sales affect retail and food services sales, since the dotted red lines do not cross zero. It is interesting to note that P&G sales have a much wider confidence interval than Unilever sales.
Forecast error variance decomposition will give us further insights on the VAR model.
fevd(var1, n.ahead = 16)
## $R.F.SALES
## R.F.SALES P.G UNILEVER
## [1,] 1.0000000 0.000000000 0.0000000000
## [2,] 0.9975682 0.001642151 0.0007896288
## [3,] 0.9927086 0.005158734 0.0021326220
## [4,] 0.9860442 0.010251973 0.0037038112
## [5,] 0.9780505 0.016633115 0.0053163476
## [6,] 0.9691069 0.024025910 0.0068672284
## [7,] 0.9595258 0.032170500 0.0083036613
## [8,] 0.9495701 0.040827375 0.0096025251
## [9,] 0.9394612 0.049780857 0.0107579117
## [10,] 0.9293847 0.058841648 0.0117736038
## [11,] 0.9194933 0.067848193 0.0126585509
## [12,] 0.9099090 0.076666823 0.0134241633
## [13,] 0.9007265 0.085190797 0.0140827094
## [14,] 0.8920151 0.093338470 0.0146463831
## [15,] 0.8838224 0.101050841 0.0151267805
## [16,] 0.8761766 0.108288742 0.0155346255
##
## $P.G
## R.F.SALES P.G UNILEVER
## [1,] 0.1497082 0.8502918 0.0000000000
## [2,] 0.1514059 0.8482117 0.0003824584
## [3,] 0.1530610 0.8459692 0.0009698569
## [4,] 0.1546700 0.8437387 0.0015913632
## [5,] 0.1562240 0.8416063 0.0021696571
## [6,] 0.1577136 0.8396112 0.0026752017
## [7,] 0.1591305 0.8377675 0.0031019592
## [8,] 0.1604685 0.8360769 0.0034546623
## [9,] 0.1617234 0.8345343 0.0037422807
## [10,] 0.1628931 0.8331321 0.0039747845
## [11,] 0.1639774 0.8318610 0.0041616445
## [12,] 0.1649770 0.8307117 0.0043112264
## [13,] 0.1658944 0.8296750 0.0044306329
## [14,] 0.1667324 0.8287419 0.0045257536
## [15,] 0.1674947 0.8279039 0.0046014021
## [16,] 0.1681855 0.8271530 0.0046614760
##
## $UNILEVER
## R.F.SALES P.G UNILEVER
## [1,] 0.01692887 0.05153899 0.9315321
## [2,] 0.02059491 0.05208621 0.9273189
## [3,] 0.02411528 0.05237358 0.9235111
## [4,] 0.02731632 0.05246156 0.9202221
## [5,] 0.03010107 0.05242004 0.9174789
## [6,] 0.03244225 0.05231093 0.9152468
## [7,] 0.03436096 0.05218015 0.9134589
## [8,] 0.03590445 0.05205693 0.9120386
## [9,] 0.03712934 0.05195707 0.9109136
## [10,] 0.03809166 0.05188700 0.9100213
## [11,] 0.03884181 0.05184733 0.9093109
## [12,] 0.03942278 0.05183550 0.9087417
## [13,] 0.03987014 0.05184745 0.9082824
## [14,] 0.04021271 0.05187867 0.9079086
## [15,] 0.04047356 0.05192480 0.9076016
## [16,] 0.04067097 0.05198187 0.9073472
The first table decomposes the forecast error variance of R&F sales. Almost all of it is explained by R&F sales' own shocks in the early quarters, with the shares attributed to P&G and Unilever shocks growing slowly over time (to roughly 11% and 2% by quarter 16).
The second table decomposes the forecast error variance of P&G sales. As suggested by the IRFs, P&G's own shocks account for about 83% of it, shocks to retail and food services sales for roughly 15-17%, and Unilever shocks for essentially none.
The third table decomposes the forecast error variance of Unilever sales. As with the other variables, Unilever's own shocks dominate (over 90%), while retail and food services shocks explain only around 2-4% and P&G shocks about 5%.
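The same decomposition can be shown graphically, which makes the gradual shift in each variable's share easier to see than in the raw tables; a minimal sketch using the plot method from the vars package:
# Sketch: visualize the forecast error variance decomposition
plot(fevd(var1, n.ahead = 16))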
train_var <- window(vardata,start=c(2003, 1),end=c(2015, 4))
test_var <- window(vardata, start=c(2016,1))
both_var <- window(vardata, start=c(2003,1))
h_var=dim(test_var)[1]
#Forecasting data
var2 <- VAR(train_var, p=1, type="const", season=4)
var.fc2 = forecast(var2, h=h_var)
autoplot(var.fc2) + xlab("Year")
We can see that both Unilever sales and US retail and food services sales are forecast to increase, while P&G sales are forecast to drop dramatically. Retail and food services sales also have the narrowest prediction interval, which makes sense given their low volatility.
train <- window(y,start=c(2003, 1),end=c(2015, 4))
test <- window(y, start=c(2016,1))
both <- window(y, start=c(2003, 1))
h=length(test)
fit.nnet <- forecast(nnetar(train), h=h)
autoplot(fit.nnet) +
autolayer(y, series="Actual Data") +
autolayer(fit.nnet, PI = FALSE, series = "NNETAR") +
ylab("Unilever Sales ($ Billions)")
The neural network did not capture the seasonality of Unilever's sales well, nor its trend, so we cannot rely on this model for forecasting Unilever's sales.
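One possible variant (a sketch, not part of the original analysis; fit.nnet2 is an arbitrary name) is to combine the neural network with a Box-Cox transformation via the lambda argument of nnetar(), which can sometimes stabilise the seasonal variation before the network is fitted; lambda = 0 corresponds to a log transformation.
# Sketch: NNAR fitted to the log-transformed series
fit.nnet2 <- forecast(nnetar(train, lambda = 0), h = h)
autoplot(fit.nnet2) +
  autolayer(y, series = "Actual Data") +
  ylab("Unilever Sales ($ Billions)")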
fit.tbats <- forecast(tbats(train), h=h)
autoplot(fit.tbats) +
autolayer(y, series="Actual Data") +
ylab("Unilever Sales ($ Billions)")
This model seems to have captured the seasonality of the data, although the trend could have been captured somewhat better, and the prediction interval is very wide.
a13 = accuracy(fit13, test)
a15 = accuracy(fit.nnet, test)
a16 = accuracy(fit.tbats, test)
a.table<-rbind(a13, a15, a16)
row.names(a.table)<-c('ARIMA Training','ARIMA Test',
'ANN Training', 'ANN Test',
'BATS Training', 'BATS Test')
a.table<-as.data.frame(a.table)
a.table<-a.table[order(a.table$MASE),]
kable(a.table, caption="Accuracy Measure Table")
 | ME | RMSE | MAE | MPE | MAPE | MASE | ACF1 | Theil’s U |
---|---|---|---|---|---|---|---|---|
ARIMA Training | 174.0429899 | 597.4257 | 375.9920 | 1.6131719 | 3.590891 | 0.5300016 | -0.1415879 | NA |
BATS Test | -207.2778865 | 487.5932 | 385.3603 | -1.6797640 | 2.975865 | 0.5432073 | 0.2375712 | 0.5960236 |
BATS Training | 74.3730417 | 599.2646 | 420.2261 | 0.3966054 | 3.962866 | 0.5923544 | -0.0504317 | NA |
ANN Training | -0.0472237 | 636.7286 | 430.1037 | -0.4292754 | 4.011947 | 0.6062780 | 0.1910786 | NA |
ANN Test | 186.9139361 | 613.1337 | 506.1402 | 1.2278593 | 3.801605 | 0.7134598 | 0.1273265 | 0.7863846 |
ARIMA Test | -680.2187254 | 873.5545 | 724.0570 | -5.2928380 | 5.613282 | 1.0206371 | 0.7114370 | 1.1076962 |
# Back-transform the VAR forecasts from logs before computing accuracy
a14 = accuracy(exp(var.fc2$forecast$UNILEVER$mean), exp(test_var[,3]))
kable(a14)
 | ME | RMSE | MAE | MPE | MAPE | ACF1 | Theil’s U |
---|---|---|---|---|---|---|---|
Test set | -1459.487 | 1786.468 | 1462.364 | -11.23822 | 11.25982 | 0.7244436 | 2.274318 |
The best model here appears to be TBATS: it achieved a test MASE of about 0.54, and its forecasts were close to the actual data. I am not surprised that ARIMA did poorly, since it clearly failed to capture the fluctuating pattern of Unilever's sales. The VAR model performed poorly on every measure, with a MAPE above 11%. As a result, I would conclude that TBATS is the superior model in this section.
A key theme throughout this report is e-commerce and the shift in consumer preferences from brick-and-mortar retailers to e-commerce companies such as Amazon. Amazon operates one of the largest e-commerce platforms and sells products across many industries, including consumer products. Brick-and-mortar-oriented companies such as Unilever are struggling to maintain high growth in the face of competition from e-commerce companies.
E-commerce makes up 5% of Unilever's $10 billion in North American sales, which include the company's food, beverage, and personal care brands. Unilever is investing in improving its digital/e-commerce business model for consumer packaged goods; in fact, it is building its business through online channels such as Amazon.
It is very important for Unilever to continue to invest in its e-commerce capabilities, since we found that its sales are moderately correlated with retail and food services sales, a market that is itself turning to e-commerce. Following this trend should also help Unilever stay competitive with P&G, even though our analysis found only a weak statistical relationship between the two companies' sales.
Overall, it is clear that Unilever must continue to differentiate itself since the intense competition in the industry may affect its ability to be profitable.
Furthermore, most of the forecasts of Unilever's sales showed positive growth even though its sales have remained relatively stagnant in recent years. This is exactly why Unilever should continue to work with technology companies to prepare for changes in the consumer products industry.
https://www.foodbusinessnews.net/articles/14913-inside-unilevers-e-commerce-strategy
https://www.digitalcommerce360.com/2019/05/01/unilever-grew-global-ecommerce-nearly-50-in-2018/
https://www.referenceforbusiness.com/history2/19/Unilever.html
https://business.financialpost.com/entrepreneur/unilevers-corporate-challenge-find-purpose-or-perish