Walmart Inc. (formerly Wal-Mart Stores, Inc.) is an American multinational retail corporation that operates a chain of hypermarkets, discount department stores, and grocery stores. Has the main headquartered in Bentonville, Arkansas, the company was founded by Sam Walton in 1962 and incorporated on October 31, 1969. As of October 31, 2018, Walmart has 11,200 stores and clubs in 27 countries, operating under 55 different names, and eCommerce websites in 10 countries.
Walmart employ approximately 2.2 million associates around the world – 1.5 million in the U.S. alone. Which means, the sales in the U.S have the majority influence on the Walmart Inc. At 2010 where people slowly change over their preference on shopping to the chain market, Walmart has favorable condition. Evidently, their stocks jack up 28,72% from 53,33 US dollars in January 2010 to 68,65 US dollars in December 2012 despite the global economy’s unstable in the past 2 years.
This dashboard shown selling data on Walmart in the U.S between 5 February 2010 to 26 October 2012.
Preparing few packages for analysis and visualize the data. #### Package
# Load packages
library(shiny)
library(shinydashboard)
library(plotly)
library(readxl)
library(dplyr)
library(reshape2)
library(lubridate)
library(ggplot2)
library(colourpicker)
library(datasets)
library(forecast)
library(randomForest)
library(prophet)
library(tidyverse)
library(data.table)
In terms of how vast the sales distribution on each stores and department, correlation between variable, and so on. the visualization is a must to do before anlysis data, therefor we would know the pattern of the data and decided how we proceed to get best anlysis.
summary(data)
Store Dept Date Weekly_Sales IsHoliday
Min. : 1.0 Min. : 1.00 2011-12-23: 3027 Min. : -4989 Mode :logical
1st Qu.:11.0 1st Qu.:18.00 2011-11-25: 3021 1st Qu.: 2080 FALSE:391909
Median :22.0 Median :37.00 2011-12-16: 3013 Median : 7612 TRUE :29661
Mean :22.2 Mean :44.26 2011-12-09: 3010 Mean : 15981
3rd Qu.:33.0 3rd Qu.:74.00 2012-02-17: 3007 3rd Qu.: 20206
Max. :45.0 Max. :99.00 2011-12-30: 3003 Max. :693099
(Other) :403489
Summary of Walmart Sales data 2010-2012 shows, there are 45 stores in the data, 99 department, and surely there are less holliday weeks than non-holliday. (even though we don’t know specificaly like its country state or the detail department because the data are confidental).
To support the arguments about holliday, there is the graph.
hldy=c(391909,29661)
pie(hldy , labels = c("Holiday","Not Holiday") , col = c("cornsilk", "purple"), edges = 16)
my_theme <- theme_bw() +
theme(axis.title=element_text(size=24),
plot.title=element_text(size=36),
axis.text =element_text(size=16))
data %>%
group_by(Store) %>%
summarise(mean.sales = mean(Weekly_Sales)) %>%
arrange(mean.sales) %>%
mutate(store=factor(Store,levels=Store)) %>%
ggplot(aes(x=store,y=mean.sales)) +
geom_point(color="#c60b1e") +
guides(color=FALSE) +
xlab("Store") +
ylab("Mean Sales") +
my_theme +
theme(axis.text.x=element_blank(), axis.ticks = element_blank()) +
geom_text(aes(x=c(1:45),y=mean.sales,label=Store), hjust=-.25, vjust=-2, size=2.5) +
theme(plot.background=element_rect(fill="#c60b1e"),
panel.background=element_rect(fill="#ffc400"),
panel.grid =element_blank(),
axis.title =element_text(color="#ffc400"),
axis.text =element_text(color="#ffc400"),
plot.title =element_text(color="#ffc400",size=32)) +
ylim(c(5000,30000)) +
ggtitle("Sales Distribution by Stores")
package 㤼㸱bindrcpp㤼㸲 was built under R version 3.5.1
Picture above shown store5 is the lowest mean sales and store2 is the highest mean sales between 45 Walmart store in the U.S. The gap sales difference are quite big, which is around $25000.
ggplot(data,aes(x=Weekly_Sales))+ geom_histogram(aes(y=..density..),colour="black",fill="white", bins = 20)+ geom_density(alpha=.2, fill="red")
Using the graph above, we can conclude the distribution of the weekly sales are expected exponential known the value of 0 is the highest between all. But if we look carefully, Walmart sales has a lot of data (not even 10%, but we can’t assume small quantities because they are hundreds-from hundreds of thousands) of negative sales. So from 2010-2012 distribution data are highly in the range of negatives up to $10.000
After we know how the data distribute and all. Analysis can conduct efficiently because we’ve known data caracteristic. ### Linear Regression
model<-lm(Weekly_Sales~., data)
summary(model)
Call:
lm(formula = Weekly_Sales ~ ., data = data)
Residuals:
Min 1Q Median 3Q Max
-36834 -13092 -7491 5150 665648
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15375.966 417.353 36.842 < 2e-16 ***
Store -157.695 2.688 -58.676 < 2e-16 ***
Dept 111.710 1.127 99.130 < 2e-16 ***
Date2010-02-12 -509.319 580.189 -0.878 0.380026
Date2010-02-19 -633.211 579.165 -1.093 0.274255
Date2010-02-26 -1927.783 580.435 -3.321 0.000896 ***
Date2010-03-05 -899.064 580.780 -1.548 0.121616
Date2010-03-12 -1277.060 580.287 -2.201 0.027755 *
Date2010-03-19 -1534.154 580.829 -2.641 0.008259 **
Date2010-03-26 -1849.053 580.632 -3.185 0.001450 **
Date2010-04-02 281.756 580.533 0.485 0.627435
Date2010-04-09 -777.460 580.435 -1.339 0.180427
Date2010-04-16 -1486.103 580.780 -2.559 0.010504 *
Date2010-04-23 -1572.803 581.325 -2.706 0.006819 **
Date2010-04-30 -1857.219 581.974 -3.191 0.001417 **
Date2010-05-07 -279.547 581.375 -0.481 0.630632
Date2010-05-14 -1316.413 581.874 -2.262 0.023675 *
Date2010-05-21 -1327.753 582.427 -2.280 0.022626 *
Date2010-05-28 -397.310 582.628 -0.682 0.495286
Date2010-06-04 429.526 582.477 0.737 0.460872
Date2010-06-11 -364.366 582.528 -0.625 0.531649
Date2010-06-18 -528.933 581.724 -0.909 0.363218
Date2010-06-25 -847.356 582.125 -1.456 0.145497
Date2010-07-02 -62.446 582.125 -0.107 0.914573
Date2010-07-09 -339.227 582.831 -0.582 0.560544
Date2010-07-16 -884.499 582.932 -1.517 0.129185
Date2010-07-23 -1483.174 581.824 -2.549 0.010798 *
Date2010-07-30 -1522.691 582.125 -2.616 0.008904 **
Date2010-08-06 -247.314 582.578 -0.425 0.671190
Date2010-08-13 -765.264 583.186 -1.312 0.189449
Date2010-08-20 -585.436 582.932 -1.004 0.315237
Date2010-08-27 -965.752 583.084 -1.656 0.097665 .
Date2010-09-03 -683.095 581.674 -1.174 0.240251
Date2010-09-10 -1262.647 581.126 -2.173 0.029799 *
Date2010-09-17 -2059.895 581.525 -3.542 0.000397 ***
Date2010-09-24 -2676.687 581.375 -4.604 4.14e-06 ***
Date2010-10-01 -2392.984 581.226 -4.117 3.84e-05 ***
Date2010-10-08 -1429.320 581.176 -2.459 0.013919 *
Date2010-10-15 -2076.238 581.474 -3.571 0.000356 ***
Date2010-10-22 -2119.883 581.275 -3.647 0.000265 ***
Date2010-10-29 -1923.437 581.375 -3.308 0.000938 ***
Date2010-11-05 -1168.843 581.475 -2.010 0.044417 *
Date2010-11-12 -1035.381 581.574 -1.780 0.075026 .
Date2010-11-19 -1318.831 582.175 -2.265 0.023492 *
Date2010-11-26 5608.776 581.077 9.652 < 2e-16 ***
Date2010-12-03 122.033 580.533 0.210 0.833504
Date2010-12-10 2078.230 580.582 3.580 0.000344 ***
Date2010-12-17 4092.226 580.042 7.055 1.73e-12 ***
Date2010-12-24 10567.162 580.189 18.213 < 2e-16 ***
Date2010-12-31 -3064.236 580.829 -5.276 1.32e-07 ***
Date2011-01-07 -2201.053 581.474 -3.785 0.000154 ***
Date2011-01-14 -2900.091 581.574 -4.987 6.15e-07 ***
Date2011-01-21 -2877.410 582.074 -4.943 7.68e-07 ***
Date2011-01-28 -3231.463 582.024 -5.552 2.82e-08 ***
Date2011-02-04 -1021.893 581.674 -1.757 0.078950 .
Date2011-02-11 -706.481 581.077 -1.216 0.224057
Date2011-02-18 -268.164 580.730 -0.462 0.644247
Date2011-02-25 -1747.762 581.375 -3.006 0.002645 **
Date2011-03-04 -951.657 580.091 -1.641 0.100896
Date2011-03-11 -1767.142 579.797 -3.048 0.002305 **
Date2011-03-18 -1572.448 580.533 -2.709 0.006756 **
Date2011-03-25 -2200.359 580.879 -3.788 0.000152 ***
Date2011-04-01 -2045.926 580.435 -3.525 0.000424 ***
Date2011-04-08 -1117.654 581.226 -1.923 0.054490 .
Date2011-04-15 -1545.230 580.337 -2.663 0.007753 **
Date2011-04-22 -303.997 580.238 -0.524 0.600336
Date2011-04-29 -2029.361 580.386 -3.497 0.000471 ***
Date2011-05-06 -1065.623 579.311 -1.839 0.065847 .
Date2011-05-13 -1509.155 579.651 -2.604 0.009226 **
Date2011-05-20 -1785.921 581.176 -3.073 0.002120 **
Date2011-05-27 -1407.554 580.829 -2.423 0.015378 *
Date2011-06-03 -133.213 581.874 -0.229 0.818918
Date2011-06-10 -515.679 581.774 -0.886 0.375408
Date2011-06-17 -737.359 580.731 -1.270 0.204189
Date2011-06-24 -1173.303 581.375 -2.018 0.043576 *
Date2011-07-01 -552.003 581.425 -0.949 0.342419
Date2011-07-08 -477.934 581.425 -0.822 0.411075
Date2011-07-15 -1232.690 581.774 -2.119 0.034104 *
Date2011-07-22 -1256.140 582.175 -2.158 0.030954 *
Date2011-07-29 -1826.653 582.025 -3.138 0.001699 **
Date2011-08-05 -415.806 581.524 -0.715 0.474593
Date2011-08-12 -1044.305 581.375 -1.796 0.072453 .
Date2011-08-19 -950.076 580.337 -1.637 0.101608
Date2011-08-26 -769.329 580.435 -1.325 0.185027
Date2011-09-02 -1438.244 580.533 -2.477 0.013233 *
Date2011-09-09 -1001.062 580.091 -1.726 0.084402 .
Date2011-09-16 -1919.861 580.681 -3.306 0.000946 ***
Date2011-09-23 -2340.750 580.140 -4.035 5.47e-05 ***
Date2011-09-30 -2496.552 580.435 -4.301 1.70e-05 ***
Date2011-10-07 -860.884 580.091 -1.484 0.137796
Date2011-10-14 -1765.454 580.681 -3.040 0.002363 **
Date2011-10-21 -1305.599 580.385 -2.250 0.024479 *
Date2011-10-28 -1348.506 579.797 -2.326 0.020028 *
Date2011-11-04 -338.939 580.287 -0.584 0.559161
Date2011-11-11 -475.462 579.749 -0.820 0.412150
Date2011-11-18 -1123.317 579.944 -1.937 0.052753 .
Date2011-11-25 5136.258 577.061 8.901 < 2e-16 ***
Date2011-12-02 -352.218 578.346 -0.609 0.542518
Date2011-12-09 1585.564 577.582 2.745 0.006048 **
Date2011-12-16 3072.649 577.439 5.321 1.03e-07 ***
Date2011-12-23 8552.252 576.778 14.828 < 2e-16 ***
Date2011-12-30 -1535.471 577.915 -2.657 0.007886 **
Date2012-01-06 -1696.070 579.359 -2.927 0.003417 **
Date2012-01-13 -2637.748 579.700 -4.550 5.36e-06 ***
Date2012-01-20 -2603.168 580.042 -4.488 7.19e-06 ***
Date2012-01-27 -3338.170 580.385 -5.752 8.84e-09 ***
Date2012-02-03 -1384.599 579.165 -2.391 0.016818 *
Date2012-02-10 -188.250 578.011 -0.326 0.744662
Date2012-02-17 -186.211 577.725 -0.322 0.747212
Date2012-02-24 -1350.763 579.944 -2.329 0.019853 *
Date2012-03-02 -1160.071 578.538 -2.005 0.044945 *
Date2012-03-09 -861.655 579.311 -1.487 0.136916
Date2012-03-16 -986.516 579.797 -1.701 0.088853 .
Date2012-03-23 -1605.160 579.944 -2.768 0.005644 **
Date2012-03-30 -1500.326 579.944 -2.587 0.009681 **
Date2012-04-06 1100.689 578.875 1.901 0.057247 .
Date2012-04-13 -1175.741 579.165 -2.030 0.042351 *
Date2012-04-20 -1661.876 579.262 -2.869 0.004119 **
Date2012-04-27 -1994.755 580.287 -3.438 0.000587 ***
Date2012-05-04 -908.438 580.238 -1.566 0.117436
Date2012-05-11 -1042.402 579.360 -1.799 0.071983 .
Date2012-05-18 -966.658 580.337 -1.666 0.095777 .
Date2012-05-25 -539.118 580.928 -0.928 0.353393
Date2012-06-01 -430.247 580.829 -0.741 0.458848
Date2012-06-08 -69.944 579.993 -0.121 0.904012
Date2012-06-15 -482.932 580.189 -0.832 0.405201
Date2012-06-22 -748.670 579.945 -1.291 0.196727
Date2012-06-29 -937.899 581.325 -1.613 0.106663
Date2012-07-06 451.318 579.945 0.778 0.436446
Date2012-07-13 -1248.012 580.386 -2.150 0.031531 *
Date2012-07-20 -1255.502 580.386 -2.163 0.030525 *
Date2012-07-27 -1817.925 581.326 -3.127 0.001765 **
Date2012-08-03 -836.194 579.652 -1.443 0.149140
Date2012-08-10 -798.196 580.140 -1.376 0.168864
Date2012-08-17 -812.200 580.238 -1.400 0.161583
Date2012-08-24 -818.641 579.993 -1.411 0.158108
Date2012-08-31 -913.108 579.895 -1.575 0.115348
Date2012-09-07 -534.571 579.700 -0.922 0.356450
Date2012-09-14 -1920.581 579.651 -3.313 0.000922 ***
Date2012-09-21 -1827.660 580.042 -3.151 0.001628 **
Date2012-09-28 -2036.439 579.895 -3.512 0.000445 ***
Date2012-10-05 -852.035 579.214 -1.471 0.141287
Date2012-10-12 -1426.475 578.538 -2.466 0.013677 *
Date2012-10-19 -1504.380 580.484 -2.592 0.009554 **
Date2012-10-26 -1408.577 580.042 -2.428 0.015166 *
IsHolidayTRUE NA NA NA NA
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 22300 on 421425 degrees of freedom
Multiple R-squared: 0.03592, Adjusted R-squared: 0.03559
F-statistic: 109 on 144 and 421425 DF, p-value: < 2.2e-16
This model is pretty bad, it only explains about 3.5% of the variances. So assume the linear model are exactly not fit for this data. Though, we actually must try the exponential regression model to this data. Unfortunately, Weekly_Sales have 0 value so it can be proceed further more using exponential regression.
ds_ts <- ts(data_s_w$y, frequency=36)
f <- decompose(ds_ts)
plot(f)
Between 3 Methods above, the seasonal type of time series are match with the data observe. This data have already gathered by summing sales each week from all the stores and departments. #### TS Arima
fit_arima <- auto.arima(data_s_w$y, seasonal = TRUE, stepwise = FALSE, stationary = FALSE, allowmean = TRUE, parallel = TRUE)
plot(forecast(fit_arima, h=14))
Using auto.arima function for this data, unfortunately still can’t solve the problem to predict Walmart Sales, therefor we conduct prophet package for further solving.
#identifikasi model ARIMA
w <- prophet(data_s_w)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#menentukan jumlah periode
future <- make_future_dataframe(w, periods = 144)
#forecasting
forecast <- predict(w, future)
# plot forecasting for 144 weeks further do
plot(w, forecast)
#plot untuk seasonly, weekly, dan yearly
prophet_plot_components(w, forecast)
It shows, using prophet package, the data distribute much closer to the actual data. Now we try using monthly data of sales #### TS Prophet Monthly
#identifikasi model ARIMA
m <- prophet(data_s_m)
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#menentukan jumlah periode
future <- make_future_dataframe(w, periods = 52)
#forecasting
forecast <- predict(m, future)
# plot forecasting for 144 weeks further do
plot(m, forecast)
#plot untuk seasonly, weekly, dan yearly
prophet_plot_components(m, forecast)
Even though the data became lesser, the prediction of data sales still good based on graph and we surely did not use trend methode. Trend methode are not natural for this data.
h_super_bowl<- data_frame(
holiday = "super bowl",
ds = as.Date(c("10/02/12" ,"11/02/11", "12/02/10", "13/02/08"), "%y/%m/%d"
), lower_window = 0, upper_window = 45)
h_labor_day<- data_frame(
holiday = "labor day",
ds = as.Date(c("10/09/10", "11/09/09", "12/09/07", "13/09/06"), "%y/%m/%d"
),lower_window = 0, upper_window = 2)
h_thanksgiving <- data_frame(
holiday = "thanksgiving",
ds = as.Date(c("10/11/26", "11/11/25", "12/11/23", '13/11/29'),"%y/%m/%d"
),lower_window = 0,upper_window = 10)
h_christmas<- data_frame(
holiday = "christmas",
ds = as.Date(c("10/12/31", "11/12/30", "12/12/28", '13/12/27'),"%y/%m/%d"
),lower_window = 0,upper_window = 9)
#compile holidays
liburan <- bind_rows(h_super_bowl,h_labor_day, h_thanksgiving, h_christmas)
hw <- prophet(data_s_w, holidays = liburan )
Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
future <- make_future_dataframe(hw, periods = 144)
forecast <- predict(hw, future)
plot(hw, forecast)
prophet_plot_components(hw, forecast)
This methode is slightly difference with TS Prophet Weekly, because in this time series prediction we include holiday as intervention of the data. In so, chrismas holiday, labor day, super bowl event, and thanksgiving holiday have responsibilities of this time series. As we can see on the graph. Using interfee such as holiday actually good because we consider more variable affecting this time series plot.
I, Galuh Indra Wijaya and my id is 0621154000128. Here, fulfilling my assignment on Bussines Analitic subject on Department Statistics of Institut Teknologi Sepuluh Nopember.