Corporación Favorita is a large grocery retailer based in Ecuador. In recent years, the company has experienced both growth and difficult periods. On April 16, 2016, a magnitude 7.8 earthquake struck Ecuador and greatly affected sales for several weeks. Corporación Favorita now wants to analyze data from the last few years to better understand the company’s growth and where it can improve. This analysis uses data collected from 54 Corporación Favorita stores. To gauge how the company is performing overall, total sales per day, summed across all stores, are used to train a forecasting model. Individual stores are not analyzed here; that is left for a future study. The types of goods sold by the company and oil prices over the same period are also analyzed to gain more insight into the company’s progress.
Corporación Favorita desires an accurate forecast of future sales to better understand inventory needs.
The data for this analysis can be accessed by following this link. The following data sets were used during this project.
* train.csv: Used to train the forecasting model. This data set compiles each store’s performance since 2013.
* oil.csv: Tracks oil prices over the same period as train.csv.
* holidays_events.csv: Tracks when each holiday falls and how it is categorized.
These data sets were uploaded and cleaned using SQL.
SQL queries were used to create the following tables.
* training_clean_v02: Contains each day from January 1, 2013 through August 15, 2017. For each date, there is a corresponding “daily_sales” value, which is the total sales across all stores combined. Duplicates and null entries were removed.
* holidays_events_clean: Duplicates and null entries were removed.
* oil_prices_clean: Duplicates and null entries were removed.
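The cleaning and aggregation were done in SQL, and the queries are not shown here. As a rough illustration of the same idea in R, the sketch below removes nulls and duplicate rows from a tiny made-up table and then totals sales per date. The data frame and its column names (`date`, `store_nbr`, `sales`) are assumptions for illustration only, not the project's actual schema or queries.

```r
# Toy stand-in for the raw training data (hypothetical columns and values)
train <- data.frame(
  date      = c("2013-01-01", "2013-01-01", "2013-01-02", "2013-01-02", "2013-01-02"),
  store_nbr = c(1, 2, 1, 2, 2),
  sales     = c(10.5, 4.5, 7, NA, 3)
)

# Drop rows with missing values, then drop exact duplicate rows
train <- unique(na.omit(train))

# Total sales across all stores for each date
daily <- aggregate(sales ~ date, data = train, FUN = sum)
names(daily)[2] <- "daily_sales"
daily
```

The result has one row per date, which is the shape training_clean_v02 is described as having.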
These data sets were then uploaded to RStudio for further analysis. To begin, a time-series graph will be created to visualize the total sales for Corporación Favorita on a given day.
# Ensure that the date column is viewed in date format
training_clean_v02$date <- as.Date(training_clean_v02$date)
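# Create a daily time series that starts on the first date in the data.
# `end` is omitted so that every observation is kept; supplying end = 2017
# would stop the series at the start of 2017 and drop that year's data.
first_date <- min(training_clean_v02$date)
traints <- ts(training_clean_v02$daily_sales,
              start = c(as.numeric(format(first_date, "%Y")),
                        as.numeric(format(first_date, "%j"))),
              frequency = 365)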
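How `ts()` combines `start`, `end`, and `frequency` determines how many observations end up in the series, so it is worth checking on a small example. The sketch below uses a made-up vector of 1000 values (not the sales data) to show that giving `start` and `end` as whole years keeps only the observations between those two points, while omitting `end` keeps them all.

```r
# Hypothetical daily series: 1000 consecutive observations
x <- seq_len(1000)

# With start and end given as whole years, ts() keeps
# (end - start) * frequency + 1 observations and drops the rest
truncated <- ts(x, start = 2013, end = 2015, frequency = 365)
length(truncated)

# Omitting `end` keeps every observation
full <- ts(x, start = c(2013, 1), frequency = 365)
length(full)
```

Comparing `length()` of the resulting series with the number of rows in the source table is a quick way to confirm that no days were dropped.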
Next, the time series built from training_clean_v02 is examined to prepare for fitting a forecasting model.
# The acf function plots the autocorrelation of the series at each lag. Bars that extend beyond the dashed blue lines are significantly different from zero, which indicates that past values of the time series are correlated with the current values.
acf(traints)
# The PACF helps us understand the direct influence of a past observation on the current observation, removing the influence of the intervening observations.
pacf(traints)
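To make the reading of these plots concrete, the sketch below computes the same quantities numerically on a simulated AR(1) series. The simulated series, the seed, and the choice of 10 lags are assumptions for illustration; this is not the sales data. The bound used is the usual approximate 95% limit, which is what the dashed blue lines in the default plots represent.

```r
set.seed(42)

# Simulated AR(1) series as a stand-in for real data
n <- 500
x <- arima.sim(model = list(ar = 0.7), n = n)

# plot = FALSE returns the estimates instead of drawing them
a <- acf(x, lag.max = 10, plot = FALSE)
p <- pacf(x, lag.max = 10, plot = FALSE)

# Approximate 95% bounds under the hypothesis of no correlation
bound <- qnorm(0.975) / sqrt(n)

# Lags whose estimates fall outside the bounds
which(abs(a$acf[-1]) > bound)   # acf output includes lag 0, so drop it
which(abs(p$acf) > bound)       # pacf output starts at lag 1
```

For an AR(1) process, the ACF typically decays gradually over many lags while the PACF is large only at lag 1, which is the kind of pattern these two plots are used to distinguish.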
# The ADF test checks for the presence of a unit root in a time series. The null hypothesis is that a unit root is present, meaning the series is non-stationary and its statistical properties change over time.
library(tseries)  # provides adf.test()
adf.test(traints)
## Warning in adf.test(traints): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: traints
## Dickey-Fuller = -6.7665, Lag order = 11, p-value = 0.01
## alternative hypothesis: stationary
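In the output above, the p-value of 0.01 is the smallest value the test reports (hence the warning), so the null hypothesis of a unit root is rejected at the usual 5% level and the series is treated as stationary. As a hedged sketch of how the test behaves, the example below runs `adf.test()` from the tseries package on two simulated series: white noise, which is stationary by construction, and a random walk, which has a unit root. The simulated data and seed are assumptions for illustration only.

```r
library(tseries)  # provides adf.test()

set.seed(1)
noise <- rnorm(500)          # stationary by construction
walk  <- cumsum(rnorm(500))  # random walk, contains a unit root

# The white-noise call may warn that the p-value is smaller than printed
res_noise <- adf.test(noise)
res_walk  <- adf.test(walk)

# Small p-value: reject the unit root (stationary)
# Large p-value: cannot reject the unit root (non-stationary)
c(noise = res_noise$p.value, walk = res_walk$p.value)
```

When a series fails this test, differencing it and re-running the test is a common next step before fitting a model.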