Here is the data summary of the Walmart dataset:
> summary(Walmart)
   Store            Date          Weekly_Sales      Holiday_Flag      Temperature
 Min.   : 1   Length:6435        Min.   : 209986   Min.   :0.00000   Min.   : -2.06
 1st Qu.:12   Class :character   1st Qu.: 553350   1st Qu.:0.00000   1st Qu.: 47.46
 Median :23   Mode  :character   Median : 960746   Median :0.00000   Median : 62.67
 Mean   :23                      Mean   :1046965   Mean   :0.06993   Mean   : 60.66
 3rd Qu.:34                      3rd Qu.:1420159   3rd Qu.:0.00000   3rd Qu.: 74.94
 Max.   :45                      Max.   :3818686   Max.   :1.00000   Max.   :100.14
  Fuel_Price          CPI         Unemployment
 Min.   :2.472   Min.   :126.1   Min.   : 3.879
 1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891
 Median :3.445   Median :182.6   Median : 7.874
 Mean   :3.359   Mean   :171.6   Mean   : 7.999
 3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622
 Max.   :4.468   Max.   :227.2   Max.   :14.313
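Preparation For Data Modeling: Store and Holiday_Flag are converted to factors, and each row is given a uniform random number that is used to split the data into a training set (about 70% of rows) and a validation set (the remaining rows).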
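library(dplyr)
# Treat Store and Holiday_Flag as categorical, and attach one U(0, 1) draw per row
# that is used below for the ~70/30 train/validation split
temp0 <- Walmart %>%
  mutate(random       = runif(n()),
         Store        = as.factor(Store),
         Holiday_Flag = as.factor(Holiday_Flag))
# Sanity checks: column types, and the random column should look uniform on [0, 1]
summary(temp0)
str(temp0)
hist(temp0$random)
summary(temp0$random)
# About 70% of rows go to the training set
trainlm <- temp0 %>%
  filter(random < 0.7) %>%
  select(-random)
# The remaining rows form the validation set
validationlm <- temp0 %>%
  filter(random >= 0.7) %>%
  select(-random)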
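The uniform-threshold split above can be checked on a small simulated data frame. This is a base R sketch under stated assumptions: the data frame `toy`, its `id` and `x` columns, and the seed are illustrative, and `set.seed()` is an addition for reproducibility (the split above is not seeded).

```r
# Illustrative data only: 1,000 rows with an id column (not the Walmart data)
set.seed(2024)                      # assumed seed, only so the example is reproducible
toy <- data.frame(id = 1:1000, x = rnorm(1000))

# Same idea as the split above: one U(0, 1) draw per row, thresholded at 0.7
toy$random <- runif(nrow(toy))
train <- toy[toy$random <  0.7, setdiff(names(toy), "random")]
valid <- toy[toy$random >= 0.7, setdiff(names(toy), "random")]

# Every row lands in exactly one set, and roughly 70% go to training
n_train <- nrow(train)
n_valid <- nrow(valid)
overlap <- length(intersect(train$id, valid$id))
cat("train:", n_train, " validation:", n_valid, " overlap:", overlap, "\n")
```

Because each row's draw is independent, the training share is only approximately 70% and changes from run to run unless the seed is fixed.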
First Model
Selected coefficients from summary(model1):
               Estimate Std. Error t value Pr(>|t|)
Holiday_Flag1   66762.7     9794.8   6.816 1.06e-11 ***
Temperature      -947.8      161.3  -5.877 4.48e-09 ***
Fuel_Price     -44905.5     8787.1  -5.110 3.35e-07 ***
CPI              3025.0     1264.1   2.393 0.016749 *
Unemployment   -20749.0     5260.0  -3.945 8.11e-05 ***
Second Model
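library(olsrr)
# Backward elimination on model1: repeatedly remove the predictor with the largest
# p-value, using the removal threshold prem = 0.01
bckwd <- ols_step_backward_p(model1, details = TRUE, prem = 0.01)
model2 <- bckwd$model   # final lm object after elimination
summary(model2)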
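To make the elimination rule concrete, here is a base R sketch of backward elimination by p-value on simulated data. It is not olsrr's implementation: it uses F-test p-values from `drop1()`, which treats a multi-level factor such as Store as a single term, and the helper name `backward_p`, the data frame `sim`, and its columns are hypothetical.

```r
# Hypothetical helper: repeatedly drop the term with the largest p-value
# until every remaining term has p <= p_remove.
backward_p <- function(fit, p_remove = 0.01) {
  repeat {
    tests <- drop1(fit, test = "F")                    # one F-test per droppable term
    pvals <- setNames(tests[["Pr(>F)"]], rownames(tests))
    pvals <- pvals[!is.na(pvals)]                      # the "<none>" row has no p-value
    if (length(pvals) == 0 || max(pvals) <= p_remove) return(fit)
    worst <- names(which.max(pvals))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

# Simulated data: y depends on x1 and x2, while x3 is pure noise
set.seed(1)
sim <- data.frame(x1 = rnorm(200), x2 = rnorm(200), x3 = rnorm(200))
sim$y <- 2 * sim$x1 - 3 * sim$x2 + rnorm(200)

full  <- lm(y ~ ., data = sim)
final <- backward_p(full, p_remove = 0.01)
kept  <- attr(terms(final), "term.labels")
print(kept)
```

Applied to this report's data, the starting fit would be model1 and the 0.01 threshold mirrors prem = 0.01 above; olsrr's own stopping rule and handling of factor levels may differ in detail.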
Selected coefficients from summary(model2):
               Estimate Std. Error t value Pr(>|t|)
Holiday_Flag1   66762.7     9794.8   6.816 1.06e-11 ***
Temperature      -947.8      161.3  -5.877 4.48e-09 ***
Fuel_Price     -44905.5     8787.1  -5.110 3.35e-07 ***
CPI              3025.0     1264.1   2.393 0.016749 *
Unemployment   -20749.0     5260.0  -3.945 8.11e-05 ***
Third Model
# All main effects plus every pairwise interaction of the predictors, excluding Date
model3 <- lm(Weekly_Sales ~ (. - Date)^2, data = trainlm)
summary(model3)
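Because `(. - Date)^2` expands to every main effect plus every pairwise interaction, and Store has 45 levels, the third model is far larger than the first. The sketch below counts terms and design-matrix columns on a simulated data frame, `walmart_like`, which only mimics the Walmart column names and factor levels; its values are random and purely illustrative.

```r
# Simulated stand-in with the same columns and factor levels as trainlm
# (45 stores, Holiday_Flag levels "0"/"1"); the values are random, not real data
set.seed(7)
n <- 90
walmart_like <- data.frame(
  Store        = factor(rep(1:45, times = 2)),
  Date         = as.character(as.Date("2010-02-05") + 7 * (0:(n - 1))),
  Weekly_Sales = runif(n, 2e5, 4e6),
  Holiday_Flag = factor(rep(0:1, each = 45)),
  Temperature  = rnorm(n, 60, 18),
  Fuel_Price   = runif(n, 2.5, 4.5),
  CPI          = runif(n, 126, 228),
  Unemployment = runif(n, 3.9, 14.3),
  stringsAsFactors = FALSE
)

f <- Weekly_Sales ~ (. - Date)^2
term_labels <- attr(terms(f, data = walmart_like), "term.labels")
X <- model.matrix(f, data = walmart_like)

length(term_labels)   # 6 main effects + 15 two-way interactions
ncol(X)               # number of coefficients lm() would try to estimate
head(term_labels, 8)
```

The column count depends only on the formula and the factor levels, not on the simulated values, so the same formula on trainlm would also need one coefficient per design column (280 here), which is one reason to prune the model afterwards.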
Fourth Model
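# Backward elimination on the interaction model, with the same removal threshold
bckwd2 <- ols_step_backward_p(model3, details = TRUE, prem = 0.01)
model4 <- bckwd2$model   # final lm object after elimination
summary(model4)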
Three Interpreted Predictors:
CPI: the CPI coefficient is positive (about 3,025 in the coefficient tables above, with p = 0.017, the weakest evidence among the predictors shown), so with the other predictors held fixed, sales tend to rise as CPI rises. Several factors likely contribute. For Walmart in particular, which is noticeably cheaper than many competitors, a rising CPI signals inflation and higher prices ahead, which may lead consumers to increase their purchases in the short term to avoid paying more later.
Unemployment: the coefficient is negative (about -20,749 per percentage point of unemployment), so when unemployment is higher, sales within the stores are lower. Put simply, shoppers do not have the income to buy as many Walmart products.
Holiday Flag: weeks flagged as containing a special holiday show higher sales (about 66,763 more per week than otherwise comparable non-holiday weeks). There are a few reasons: shoppers are usually buying gifts and celebrating at these times, which raises their need for goods, and retailers such as Walmart often run in-store promotions around these occasions, which further increases spending and therefore sales.
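In a linear model these coefficients can be combined into a quick what-if calculation. This sketch uses only the five estimates shown in the coefficient tables above; the scenario (a holiday week with CPI 5 points higher and unemployment 1 point higher, at the same store with Temperature and Fuel_Price unchanged) is a hypothetical example.

```r
# Estimates copied from the coefficient excerpt above
coefs <- c(Holiday_Flag1 = 66762.7, Temperature = -947.8,
           Fuel_Price = -44905.5, CPI = 3025.0, Unemployment = -20749.0)

# Hypothetical change in each predictor between two weeks at the same store
delta_x <- c(Holiday_Flag1 = 1, Temperature = 0, Fuel_Price = 0,
             CPI = 5, Unemployment = 1)

# Change in predicted weekly sales = sum of (coefficient * change in predictor)
contrib <- coefs * delta_x
print(contrib)
delta_sales <- sum(contrib)
delta_sales   # 66762.7 + 5 * 3025.0 - 20749.0 = 61138.7
```

This additive reading holds for the main-effects models; in the interaction models (model3 and model4) a predictor's effect also depends on the values of the variables it interacts with.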
R² Calculation on the Validation Set:
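# sales_predicted is not created above; this assumes it comes from the final model, model4
validationlm <- validationlm %>%
  mutate(sales_predicted = predict(model4, newdata = validationlm))
# Out-of-sample R²: 1 - SS_residual / SS_total, centred on the validation-set mean
mean_sales_val <- mean(validationlm$Weekly_Sales)
vallmR2 <- validationlm %>%
  mutate(res2 = (Weekly_Sales - sales_predicted)^2,
         tot2 = (Weekly_Sales - mean_sales_val)^2)
r2 <- 1 - sum(vallmR2$res2) / sum(vallmR2$tot2)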
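The R² formula used above can be checked on simulated data. In this sketch the helper name `r2_score` and the data frame `sim` are assumptions; it confirms that 1 - SS_res/SS_tot reproduces `summary(lm)$r.squared` on the training rows, then applies the same formula to held-out rows, as the validation calculation does.

```r
# Hypothetical helper: R² = 1 - SS_residual / SS_total
r2_score <- function(actual, predicted) {
  1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
}

# Simulated data with a known linear relationship plus noise
set.seed(3)
n <- 300
sim <- data.frame(x = rnorm(n))
sim$y <- 5 + 2 * sim$x + rnorm(n)
is_train <- runif(n) < 0.7
fit <- lm(y ~ x, data = sim[is_train, ])

# In-sample: the hand-computed value matches the R² reported by summary()
r2_train <- r2_score(sim$y[is_train], fitted(fit))
all.equal(r2_train, summary(fit)$r.squared)

# Out-of-sample: same formula, centred on the validation rows' own mean
valid <- sim[!is_train, ]
r2_valid <- r2_score(valid$y, predict(fit, newdata = valid))
r2_valid
```

Unlike the in-sample value, a validation R² can fall below zero when the model predicts the held-out rows worse than their own mean does.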
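The R² calculation returns a value of about .91, meaning the model explains roughly 91% of the variation in weekly sales on validation rows it was not fitted on. This indicates a strong fit for the predictors taken together; it does not by itself show that any single predictor, such as CPI, is closely correlated with sales, since much of the explained variation may come from differences between stores. If R² were, say, .52, the model would explain only about half of the variation, and further testing or additional context would be needed before relying on its conclusions.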