library(ggplot2)
library(forecast)
library(tseries)
library(zoo)
library(lubridate)
library(TSA)
library(xts)
library(dplyr)
library(scales)
library(vars)
library(readxl)
library(car)
library(glmnet)
library(gridExtra)
Time series analysis is a statistical technique that involves analyzing data points collected or recorded at specific time intervals. This type of analysis is used to identify patterns, trends, cycles, and seasonal variations in the data over time. The primary goal of time series analysis is to forecast future values based on historical data, making it particularly useful in fields such as finance, economics, weather forecasting, and many other areas where data evolves over time (Chatfield, 2003).
Stationarity is a fundamental concept in time series analysis, which implies that the statistical properties of the series, such as mean and variance, remain constant over time. Stationary time series are easier to model and predict because their behavior is consistent over time. Non-stationary data, on the other hand, may exhibit trends, seasonality, or varying variance, complicating the analysis and potentially leading to misleading results. Ensuring stationarity often involves transforming or differencing the data to remove trends and seasonality, allowing for more accurate modeling and forecasting (Box et al., 2015).
Time series analysis is crucial for business intelligence because it enables organizations to make data-driven decisions by forecasting future trends, demand, or performance based on historical data. It can be applied to various business problems, such as predicting sales, stock prices, or customer behavior. This analysis helps businesses anticipate changes, manage inventory, optimize marketing strategies, and improve operational efficiency by understanding patterns over time and responding proactively.
Sources Chatfield, C. (2003). The Analysis of Time Series: An Introduction. Chapman and Hall/CRC. Box, G.E.P., Jenkins, G.M., Reinsel, G.C., & Ljung, G.M. (2015). Time Series Analysis: Forecasting and Control. Wiley.
For this problem situation, our objective is to build a predictive model for Arca Continental, mainly for their biggest and most popular branch in Latin America, Coca-Cola. We want to create a model that can successfully forecast the a designated target variable, with which we can predict the company’s box sales (referred to as “grids”) in the metropolitan area of Guadalajara, Jalisco.
str(cocacola)
## tibble [48 × 15] (S3: tbl_df/tbl/data.frame)
## $ tperiod : POSIXct[1:48], format: "2021-01-15" "2021-02-15" ...
## $ sales_unitboxes : num [1:48] 5516689 5387496 5886747 6389182 6448275 ...
## $ consumer_sentiment: num [1:48] 38.1 37.5 38.5 37.8 38 ...
## $ CPI : num [1:48] 87.1 87.3 87.6 87.4 87 ...
## $ inflation_rate : num [1:48] -0.09 0.19 0.41 -0.26 -0.5 0.17 0.15 0.21 0.37 0.51 ...
## $ unemp_rate : num [1:48] 0.0523 0.0531 0.0461 0.051 0.0552 ...
## $ gdp_percapita : num [1:48] 11660 11660 11660 11626 11626 ...
## $ itaee : num [1:48] 104 104 104 108 108 ...
## $ itaee_growth : num [1:48] 0.0497 0.0497 0.0497 0.0318 0.0318 ...
## $ pop_density : num [1:48] 98.5 98.5 98.5 98.8 98.8 ...
## $ job_density : num [1:48] 18.3 18.5 18.6 18.7 18.7 ...
## $ pop_minwage : num [1:48] 9.66 9.66 9.66 9.59 9.59 ...
## $ exchange_rate : num [1:48] 14.7 14.9 15.2 15.2 15.3 ...
## $ max_temperature : num [1:48] 28 31 29 32 34 32 29 29 29 29 ...
## $ holiday_month : num [1:48] 0 0 0 1 0 0 0 0 1 0 ...
cocacola$holiday_month <- as.factor(cocacola$holiday_month)
cocacola$tperiod <- as.Date(cocacola$tperiod, format = "%d-%m-%Y")
day_month <- format(cocacola$tperiod, "%d-%m")
corrected_year <- as.numeric(substr(day_month, 1, 2)) + 2000
corrected_month_day <- paste0(corrected_year, "-", substr(day_month, 4, 5))
cocacola$tperiod <- as.Date(paste0(corrected_month_day, "-01"), format = "%Y-%m-%d")
head(cocacola$tperiod)
## [1] "2015-01-01" "2015-02-01" "2015-03-01" "2015-04-01" "2015-05-01"
## [6] "2015-06-01"
str(cocacola)
## tibble [48 × 15] (S3: tbl_df/tbl/data.frame)
## $ tperiod : Date[1:48], format: "2015-01-01" "2015-02-01" ...
## $ sales_unitboxes : num [1:48] 5516689 5387496 5886747 6389182 6448275 ...
## $ consumer_sentiment: num [1:48] 38.1 37.5 38.5 37.8 38 ...
## $ CPI : num [1:48] 87.1 87.3 87.6 87.4 87 ...
## $ inflation_rate : num [1:48] -0.09 0.19 0.41 -0.26 -0.5 0.17 0.15 0.21 0.37 0.51 ...
## $ unemp_rate : num [1:48] 0.0523 0.0531 0.0461 0.051 0.0552 ...
## $ gdp_percapita : num [1:48] 11660 11660 11660 11626 11626 ...
## $ itaee : num [1:48] 104 104 104 108 108 ...
## $ itaee_growth : num [1:48] 0.0497 0.0497 0.0497 0.0318 0.0318 ...
## $ pop_density : num [1:48] 98.5 98.5 98.5 98.8 98.8 ...
## $ job_density : num [1:48] 18.3 18.5 18.6 18.7 18.7 ...
## $ pop_minwage : num [1:48] 9.66 9.66 9.66 9.59 9.59 ...
## $ exchange_rate : num [1:48] 14.7 14.9 15.2 15.2 15.3 ...
## $ max_temperature : num [1:48] 28 31 29 32 34 32 29 29 29 29 ...
## $ holiday_month : Factor w/ 2 levels "0","1": 1 1 1 2 1 1 1 1 2 1 ...
ggplot(cocacola, aes(x=tperiod, y=sales_unitboxes)) +
geom_line() +
labs(title='Box Sales Over Time', x='Time', y='sales_unitboxes') +
theme(axis.text.x = element_text(angle=45, hjust=1))
cocacola$Moving_Avg <- rollmean(cocacola$sales_unitboxes, 12, fill = NA)
ggplot(cocacola, aes(x=tperiod)) +
geom_line(aes(y=sales_unitboxes, colour="Original")) +
geom_line(aes(y=Moving_Avg, colour="4-Periods Moving Average")) +
labs(title='Moving Average Plot', x='Date', y='Sales (Unit Boxes)') +
scale_colour_manual("", breaks = c("Original", "12-Periods Moving Average"), values = c("blue", "red"))
cocacola_ts <- ts(cocacola$sales_unitboxes, frequency=12, start=c(2015, 1))
decomposed <- decompose(cocacola_ts, type="additive")
plot(decomposed)
adf_test <- adf.test(cocacola$sales_unitboxes, alternative="stationary")
print(paste("ADF Statistic: ", adf_test$statistic))
## [1] "ADF Statistic: -4.42816565181968"
print(paste("p-value: ", adf_test$p.value))
## [1] "p-value: 0.01"
sales_ts <- ts(cocacola$sales_unitboxes, frequency=12)
acf(sales_ts, main="Autocorrelation of Sales (Unit Boxes)")
pacf(sales_ts, main="Partial Autocorrelation of Sales (Unit Boxes)")
The dependent variable “sales_unitboxes” has a trendline that moves up, which indicates increasing over time. It also displays seasonality, which is likely due to temperature. Finally, we can also conclude it’s stationary, due to the p-value (0.01) and the autocorrelation demonstrated in the ACF plot.
p_values <- 0:5
q_values <- 0:5
best_aic_arma <- Inf
best_order_arma <- c(0, 0)
best_model_arma <- NULL
for (p in p_values) {
for (q in q_values) {
model <- tryCatch({
arima(sales_ts, order = c(p, 0, q))
}, error = function(e) NULL)
if (!is.null(model)) {
current_aic <- AIC(model)
if (current_aic < best_aic_arma) {
best_aic_arma <- current_aic
best_order_arma <- c(p, q)
best_model_arma <- model
}
}
}
}
p_values <- 0:5
d_values <- 1:2
q_values <- 0:5
best_aic_arima <- Inf
best_order_arima <- c(0, 1, 0)
best_model_arima <- NULL
for (p in p_values) {
for (d in d_values) {
for (q in q_values) {
model <- tryCatch({
arima(sales_ts, order = c(p, d, q))
}, error = function(e) NULL)
if (!is.null(model)) {
current_aic <- AIC(model)
if (current_aic < best_aic_arima) {
best_aic_arima <- current_aic
best_order_arima <- c(p, d, q)
best_model_arima <- model
}
}
}
}
}
p_values <- 0:5
best_aic_ar <- Inf
best_order_ar <- c(0, 0)
best_model_ar <- NULL
for (p in p_values) {
model <- tryCatch({
arima(sales_ts, order = c(p, 0, 0))
}, error = function(e) NULL)
if (!is.null(model)) {
current_aic <- AIC(model)
if (current_aic < best_aic_ar) {
best_aic_ar <- current_aic
best_order_ar <- c(p, 0)
best_model_ar <- model
}
}
}
q_values <- 0:5
best_aic_ma <- Inf
best_order_ma <- c(0, 0)
best_model_ma <- NULL
for (q in q_values) {
model <- tryCatch({
arima(sales_ts, order = c(0, 0, q))
}, error = function(e) NULL)
if (!is.null(model)) {
current_aic <- AIC(model)
if (current_aic < best_aic_ma) {
best_aic_ma <- current_aic
best_order_ma <- c(0, q)
best_model_ma <- model
}
}
}
modelos <- c("ARMA", "ARIMA", "AR", "MA")
mejores_aic <- c(best_aic_arma, best_aic_arima, best_aic_ar, best_aic_ma)
comparacion_modelos <- data.frame(
Modelo = modelos,
AIC = mejores_aic
)
print("Comparación de modelos y sus AIC:")
## [1] "Comparación de modelos y sus AIC:"
print(comparacion_modelos)
## Modelo AIC
## 1 ARMA 1398.387
## 2 ARIMA 1354.534
## 3 AR 1401.696
## 4 MA 1402.149
mejor_modelo <- modelos[which.min(mejores_aic)]
mejor_aic <- min(mejores_aic)
print(paste("El mejor modelo es:", mejor_modelo))
## [1] "El mejor modelo es: ARIMA"
print(paste("El mejor AIC es:", mejor_aic))
## [1] "El mejor AIC es: 1354.53371359874"
if (mejor_modelo == "ARMA") {
summary(best_model_arma)
} else if (mejor_modelo == "ARIMA") {
summary(best_model_arima)
} else if (mejor_modelo == "AR") {
summary(best_model_ar)
} else if (mejor_modelo == "MA") {
summary(best_model_ma)
}
##
## Call:
## arima(x = sales_ts, order = c(p, d, q))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 ma4 ma5
## -0.7718 -0.5521 -0.5689 -0.3367 -0.8268 0.0062 0.7522
## s.e. 0.1690 0.1435 0.1924 0.2089 0.1744 0.1985 0.1731
##
## sigma^2 estimated as 1.885e+11: log likelihood = -669.27, aic = 1352.53
##
## Training set error measures:
## ME RMSE MAE MPE MAPE
## Training set NaN NaN NaN NaN NaN
modelo_arima <- Arima(sales_ts, order = c(2, 2, 5))
forecast_arima <- forecast(modelo_arima, h = 5)
autoplot(forecast_arima) +
labs(title = "Pronóstico a 5 períodos con ARIMA(2,2,5)",
x = "Tiempo",
y = "Ventas (Cajas Unitarias)") +
theme_minimal()
Instructions: From the time series dataset, select 2 explanatory variables that might affect the sales unitboxes.
correlation_matrix <- cor(cocacola[, c("sales_unitboxes",
"consumer_sentiment",
"gdp_percapita",
"CPI",
"inflation_rate",
"unemp_rate",
"exchange_rate",
"max_temperature",
"Moving_Avg")],
use = "complete.obs")
print(correlation_matrix)
## sales_unitboxes consumer_sentiment gdp_percapita CPI
## sales_unitboxes 1.00000000 0.35524547 0.05786549 0.08807625
## consumer_sentiment 0.35524547 1.00000000 -0.52631149 -0.40446243
## gdp_percapita 0.05786549 -0.52631149 1.00000000 0.86618848
## CPI 0.08807625 -0.40446243 0.86618848 1.00000000
## inflation_rate -0.55212057 -0.52620078 -0.01727899 0.10332937
## unemp_rate 0.04231135 0.51952301 -0.76038173 -0.80824729
## exchange_rate -0.06014764 -0.73031564 0.66028999 0.54067761
## max_temperature 0.68348780 0.03848755 0.37713494 0.17746354
## Moving_Avg 0.19129530 -0.20416077 0.73121504 0.65558384
## inflation_rate unemp_rate exchange_rate max_temperature
## sales_unitboxes -0.55212057 0.04231135 -0.06014764 0.68348780
## consumer_sentiment -0.52620078 0.51952301 -0.73031564 0.03848755
## gdp_percapita -0.01727899 -0.76038173 0.66028999 0.37713494
## CPI 0.10332937 -0.80824729 0.54067761 0.17746354
## inflation_rate 1.00000000 -0.19810803 0.41019512 -0.55467320
## unemp_rate -0.19810803 1.00000000 -0.63120125 -0.11731604
## exchange_rate 0.41019512 -0.63120125 1.00000000 0.13404524
## max_temperature -0.55467320 -0.11731604 0.13404524 1.00000000
## Moving_Avg -0.10970141 -0.51257546 0.52480755 0.25792091
## Moving_Avg
## sales_unitboxes 0.1912953
## consumer_sentiment -0.2041608
## gdp_percapita 0.7312150
## CPI 0.6555838
## inflation_rate -0.1097014
## unemp_rate -0.5125755
## exchange_rate 0.5248076
## max_temperature 0.2579209
## Moving_Avg 1.0000000
sales_correlations <- correlation_matrix[1, -1]
print(sales_correlations[order(abs(sales_correlations), decreasing = TRUE)])
## max_temperature inflation_rate consumer_sentiment Moving_Avg
## 0.68348780 -0.55212057 0.35524547 0.19129530
## CPI exchange_rate gdp_percapita unemp_rate
## 0.08807625 -0.06014764 0.05786549 0.04231135
max_temperature_diff <- diff(cocacola$max_temperature)
inflation_rate_diff <- diff(cocacola$inflation_rate)
There is highly likely a positive relationship between “max_temperature” and sales unit boxes of Coca-Cola. As the maximum temperature increases, particularly in warmer months or regions, the demand for cold beverages like Coca-Cola tends to rise. People are more inclined to purchase refreshing drinks when the weather is hot to stay hydrated and cool down.On the other hand, lower temperatures may reduce demand as people may prefer hot drinks or other beverages better suited for colder weather.
There is likely a negative relationship between “inflation_rate” and sales unit boxes of Coca-Cola. An increase in the inflation rate generally leads to a rise in the cost of goods and services, which can result in higher prices for Coca-Cola. As prices go up, consumers’ purchasing power declines, potentially reducing the overall sales volume. During periods of high inflation, consumers may prioritize essential items over discretionary spending. Coca-Cola, being a non-essential item, might experience reduced sales volume as consumers cut back on such expenditures.
aligned_data <- data.frame(
tperiod = cocacola$tperiod[-1],
sales_unitboxes = cocacola$sales_unitboxes[-1],
max_temperature_diff = max_temperature_diff,
inflation_rate_diff = inflation_rate_diff
)
ggplot() +
geom_line(data = aligned_data, aes(x = tperiod, y = sales_unitboxes, color = "Ventas (Cajas Unitarias)"), size = 1) +
geom_line(data = aligned_data, aes(x = tperiod, y = max_temperature_diff * 200000, color = "Temperatura Máxima (Diferenciada)"), size = 1, linetype = "dashed") +
geom_line(data = aligned_data, aes(x = tperiod, y = inflation_rate_diff * 2000000, color = "Tasa de Inflación (Diferenciada)"), size = 1, linetype = "dotted") +
scale_y_continuous(
name = "Ventas (Cajas Unitarias)",
sec.axis = sec_axis(~ . / 100000, name = "Temperatura (°C) y Tasa de Inflación Diferenciada (%)")
) +
labs(title = "Ventas, Temperatura Máxima y Tasa de Inflación Diferenciada con Ejes Secundarios",
x = "Tiempo", color = "") +
theme_minimal() +
theme(legend.position = "bottom")
acf_max_temperature_diff <- Acf(max_temperature_diff, main = "ACF de max_temperature (Diferenciada)")
acf_inflation_rate_diff <- Acf(inflation_rate_diff, main = "ACF de inflation_rate (Diferenciada)")
adf_test_max_temp_diff <- adf.test(max_temperature_diff, alternative = "stationary")
print(adf_test_max_temp_diff)
##
## Augmented Dickey-Fuller Test
##
## data: max_temperature_diff
## Dickey-Fuller = -4.0323, Lag order = 3, p-value = 0.01639
## alternative hypothesis: stationary
adf_test_inflation_diff <- adf.test(inflation_rate_diff, alternative = "stationary")
print(adf_test_inflation_diff)
##
## Augmented Dickey-Fuller Test
##
## data: inflation_rate_diff
## Dickey-Fuller = -3.895, Lag order = 3, p-value = 0.02207
## alternative hypothesis: stationary
Max Temperature: The ADF test statistic of -4.0323 is significantly lower than the critical values at common confidence levels (like -3.43 for 5% significance level). The p-value is less than 0.05, indicating that we can reject the null hypothesis in favor of the alternative hypothesis that the series is in fact, stationary.
Inflation Rate: The ADF test statistic of -3.895 is also below the critical values for stationarity (e.g., -3.43 at 5% significance level).The p-value (0.02207) is less than 0.05, allowing us to reject the null hypothesis of non-stationarity, which in turn tells us that this series is also stationary.
ADF Conclusion: These tests suggest that the statistical properties of both series remain constant over time when differenced, making them suitable for time series analysis techniques that assume stationarity.
Autocorrelation analysis after differentiation: Both series show autocorrelation at lag 1. The significant spike at lag 1 indicates that the differenced data is not completely free of autocorrelation. However, most of the other lags do not show significant autocorrelation, implying that the differencing has reduced much of the autocorrelation in the original series. In summary, while there is some remaining autocorrelation, particularly at lag 1 for both series, the data does not exhibit strong autocorrelation at other lags.
var_data <- data.frame(
sales_unitboxes = cocacola$sales_unitboxes[-1],
inflation_rate_diff = inflation_rate_diff
)
lag_selection <- VARselect(var_data, lag.max = 4, type = "const")
optimal_lag <- lag_selection$selection["AIC(n)"]
VAR_model <- VAR(var_data, p = optimal_lag, type = "const")
summary(VAR_model)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: sales_unitboxes, inflation_rate_diff
## Deterministic variables: const
## Sample size: 44
## Log Likelihood: -636.938
## Roots of the characteristic polynomial:
## 0.7906 0.7906 0.7632 0.7632 0.7044 0.7044
## Call:
## VAR(y = var_data, p = optimal_lag, type = "const")
##
##
## Estimation results for equation sales_unitboxes:
## ================================================
## sales_unitboxes = sales_unitboxes.l1 + inflation_rate_diff.l1 + sales_unitboxes.l2 + inflation_rate_diff.l2 + sales_unitboxes.l3 + inflation_rate_diff.l3 + const
##
## Estimate Std. Error t value Pr(>|t|)
## sales_unitboxes.l1 4.718e-01 1.514e-01 3.117 0.00353 **
## inflation_rate_diff.l1 -5.059e+05 2.238e+05 -2.260 0.02978 *
## sales_unitboxes.l2 -1.179e-02 1.726e-01 -0.068 0.94590
## inflation_rate_diff.l2 -2.666e+05 2.408e+05 -1.107 0.27529
## sales_unitboxes.l3 6.755e-02 1.649e-01 0.410 0.68445
## inflation_rate_diff.l3 -5.693e+05 2.055e+05 -2.771 0.00870 **
## const 3.105e+06 1.136e+06 2.733 0.00956 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 463100 on 37 degrees of freedom
## Multiple R-Squared: 0.4425, Adjusted R-squared: 0.3521
## F-statistic: 4.895 on 6 and 37 DF, p-value: 0.0008983
##
##
## Estimation results for equation inflation_rate_diff:
## ====================================================
## inflation_rate_diff = sales_unitboxes.l1 + inflation_rate_diff.l1 + sales_unitboxes.l2 + inflation_rate_diff.l2 + sales_unitboxes.l3 + inflation_rate_diff.l3 + const
##
## Estimate Std. Error t value Pr(>|t|)
## sales_unitboxes.l1 6.975e-08 9.946e-08 0.701 0.48752
## inflation_rate_diff.l1 -6.279e-01 1.470e-01 -4.271 0.00013 ***
## sales_unitboxes.l2 1.302e-07 1.134e-07 1.148 0.25825
## inflation_rate_diff.l2 -4.187e-01 1.582e-01 -2.647 0.01186 *
## sales_unitboxes.l3 3.463e-07 1.083e-07 3.196 0.00285 **
## inflation_rate_diff.l3 -2.437e-01 1.350e-01 -1.805 0.07922 .
## const -3.523e+00 7.463e-01 -4.721 3.33e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.3042 on 37 degrees of freedom
## Multiple R-Squared: 0.4472, Adjusted R-squared: 0.3576
## F-statistic: 4.989 on 6 and 37 DF, p-value: 0.0007832
##
##
##
## Covariance matrix of residuals:
## sales_unitboxes inflation_rate_diff
## sales_unitboxes 2.145e+11 -4.109e+04
## inflation_rate_diff -4.109e+04 9.256e-02
##
## Correlation matrix of residuals:
## sales_unitboxes inflation_rate_diff
## sales_unitboxes 1.0000 -0.2917
## inflation_rate_diff -0.2917 1.0000
forecast_steps <- 5
var_forecast <- predict(VAR_model, n.ahead = forecast_steps)
forecast_index <- seq(as.Date(tail(cocacola$tperiod, 1)), by = "month", length.out = forecast_steps)
forecast_df <- data.frame(
Period = forecast_index,
sales_unitboxes_pred = var_forecast$fcst$sales_unitboxes[, 1],
inflation_rate_pred = var_forecast$fcst$inflation_rate_diff[, 1]
)
print(forecast_df)
## Period sales_unitboxes_pred inflation_rate_pred
## 1 2018-12-01 6485870 -0.13676581
## 2 2019-01-01 6423873 -0.02262784
## 3 2019-02-01 6635569 0.14764207
## 4 2019-03-01 6607301 -0.02813486
## 5 2019-04-01 6565818 -0.01274115
There is a significant relationship between sales and both its own previous values (lag 1) and the inflation rate (lags 1 and 3). The inflation rate generally has a negative impact on sales, while past sales positively influence current sales.
The inflation rate is mainly affected by its own previous values, which have a strong negative impact. There is also some evidence that past sales influence inflation rate changes at lag 3.
Overall, the dependent variable is significantly influenced by its own past values and the inflation rate. Inflation tends to reduce sales, while past sales help to increase them. The inflation rate itself is influenced by its past values and past sales.
granger_test_1 <- causality(VAR_model, cause = "inflation_rate_diff")
print(granger_test_1$Granger)
##
## Granger causality H0: inflation_rate_diff do not Granger-cause
## sales_unitboxes
##
## data: VAR object VAR_model
## F-Test = 3.7763, df1 = 3, df2 = 74, p-value = 0.01403
granger_test_2 <- causality(VAR_model, cause = "sales_unitboxes")
print(granger_test_2$Granger)
##
## Granger causality H0: sales_unitboxes do not Granger-cause
## inflation_rate_diff
##
## data: VAR object VAR_model
## F-Test = 7.9594, df1 = 3, df2 = 74, p-value = 0.0001142
combined_data <- data.frame(
Period = c(as.Date(cocacola$tperiod[-1]), forecast_df$Period),
sales_unitboxes = c(cocacola$sales_unitboxes[-1], rep(NA, forecast_steps)),
sales_unitboxes_pred = c(rep(NA, length(cocacola$sales_unitboxes[-1])), forecast_df$sales_unitboxes_pred)
)
ggplot() +
geom_line(data = combined_data, aes(x = Period, y = sales_unitboxes, color = "Sales (Real)"), size = 1) +
geom_line(data = combined_data, aes(x = Period, y = sales_unitboxes_pred, color = "Sales (Forecast)"), size = 1, linetype = "dashed") +
geom_vline(xintercept = as.numeric(forecast_df$Period[1]), color = "yellow", linetype = "dashed", size = 1) +
labs(title = "Monthly Sales and VAR forecast of Unit Boxes",
x = "Date", y = "Sales (Unit Boxes)") +
scale_color_manual(values = c("Sales (Real)" = "blue", "Sales (Forecast)" = "green")) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
theme(legend.position = "bottom") +
theme(legend.title = element_blank())
While sales_unitboxes Granger-cause inflation_rate_diff, the reverse is not true. This suggests that past values of sales_unitboxes provide significant information for predicting changes in inflation rate.
The rejection of the null hypothesis in the second Granger-causality test indicates that sales unit boxes have a predictive influence on inflation rates after differencing. This might suggest that changes in sales activity could be an indicator of inflationary trends.
Since the first Granger-causality test fails to reject the null hypothesis , inflation (after differencing) does not appear to Granger-cause sales unit boxes. This could imply that fluctuations in inflation may not be directly influencing sales activity
The covariance matrix of residuals shows a weak negative correlation (-0.2917) between sales_unitboxes and inflation_rate_diff. This suggests that, while the residuals of both variables are interconnected, they do not exhibit a strong linear relationship, implying that other factors may influence the dynamics between sales and inflation beyond what is captured by their past values.
The VAR forecast begins in late 2018, just after a moderate upward trend in real sales. The forecast shows a relatively stable prediction for future sales, with values slightly above the final real sales point. This indicates that the model expects a stabilization of sales based on recent patterns, projecting moderate growth or consistency in the near future.
The forecasted section of the graph maintains the clear seasonal patterns seen in historical data, which suggests that these patterns will persist. Coca-Cola can use this information to prepare for peak demand periods, ensuring inventory and marketing efforts are aligned with these predictable sales surges.
Adjust prices for Coca-Cola products based on the weather. For example, offer discounts or promotions on hotter days to encourage more sales. In colder months, you might focus on bundling with other products.
Implement a pricing strategy that adjusts to inflationary pressures. For example, offering the product in smaller containers, might make it easier for customers to buy it, mainly in countries like Mexico, where the income is low in many families.
We could also consider using machine learning models like neural networks, which can handle time series data and complex non-linear relationships. By feeding these models with relevant time series features, we can improve the accuracy of the forecasts we made.