Here is the data summary of the Walmart dataset:

> summary(Walmart)
     Store        Date            Weekly_Sales      Holiday_Flag      Temperature    
 Min.   : 1   Length:6435        Min.   : 209986   Min.   :0.00000   Min.   : -2.06  
 1st Qu.:12   Class :character   1st Qu.: 553350   1st Qu.:0.00000   1st Qu.: 47.46  
 Median :23   Mode  :character   Median : 960746   Median :0.00000   Median : 62.67  
 Mean   :23                      Mean   :1046965   Mean   :0.06993   Mean   : 60.66  
 3rd Qu.:34                      3rd Qu.:1420159   3rd Qu.:0.00000   3rd Qu.: 74.94  
 Max.   :45                      Max.   :3818686   Max.   :1.00000   Max.   :100.14  
   Fuel_Price         CPI         Unemployment   
 Min.   :2.472   Min.   :126.1   Min.   : 3.879  
 1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891  
 Median :3.445   Median :182.6   Median : 7.874  
 Mean   :3.359   Mean   :171.6   Mean   : 7.999  
 3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622  
 Max.   :4.468   Max.   :227.2   Max.   :14.313  

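Preparation for Data Modeling: Store and Holiday_Flag are converted to factors, and a uniform random number is attached to each row so the data can be split into a training set (roughly 70% of rows) and a validation set (roughly 30%).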

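library(dplyr)  # provides %>%, mutate(), filter(), select()

temp0 <- Walmart %>% 
  mutate(random = runif(n()),   # one uniform draw per row, used for the split
         Store = as.factor(Store),
         Holiday_Flag = as.factor(Holiday_Flag)
  )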

summary(temp0)
str(temp0)

hist(temp0$random)
summary(temp0$random)

trainlm <- temp0 %>% 
  filter(random < 0.7) %>% 
  select(-random )

validationlm <- temp0 %>% 
  filter(random >= 0.7) %>% 
  select(-random )
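
The split above depends on unseeded random draws, so each run produces different training and validation sets. A minimal base R sketch of the same 70/30 idea on made-up data follows; the seed value, the toy data frame `toy`, and its columns are assumptions for illustration, not part of the original analysis.

```r
# Toy data standing in for the Walmart data frame (hypothetical values).
set.seed(42)  # assumed seed; any fixed value makes the split repeatable
toy <- data.frame(
  Store        = factor(rep(1:5, each = 20)),
  Weekly_Sales = rnorm(100, mean = 1e6, sd = 2e5)
)

# One uniform draw per row; rows below 0.7 go to training (about 70%).
u <- runif(nrow(toy))
train_toy      <- toy[u <  0.7, ]
validation_toy <- toy[u >= 0.7, ]

# Every row lands in exactly one of the two sets.
nrow(train_toy) + nrow(validation_toy) == nrow(toy)
```

Because the cut is made on random draws rather than on a fixed count, the training share is only approximately 70%.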
  
First Model

                Estimate Std. Error t value Pr(>|t|)
Holiday_Flag1    66762.7     9794.8   6.816 1.06e-11 ***
Temperature       -947.8      161.3  -5.877 4.48e-09 ***
Fuel_Price      -44905.5     8787.1  -5.110 3.35e-07 ***
CPI               3025.0     1264.1   2.393 0.016749 *
Unemployment    -20749.0     5260.0  -3.945 8.11e-05 ***
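
For reference, a coefficient table of this shape is what `summary()` reports for an `lm` fit. The sketch below fits a small linear model on simulated data (the data frame `sim`, the seed, and all numbers in it are invented for illustration) and extracts the same four columns: estimate, standard error, t value, and p-value.

```r
set.seed(1)  # assumed seed so the simulated data are repeatable
n <- 200
sim <- data.frame(
  Holiday_Flag = factor(rbinom(n, 1, 0.1)),
  Temperature  = runif(n, 0, 100),
  Unemployment = runif(n, 4, 14)
)
# Simulated response with known effects plus noise (made-up numbers).
sim$Weekly_Sales <- 1e6 + 60000 * (sim$Holiday_Flag == "1") -
  1000 * sim$Temperature - 20000 * sim$Unemployment + rnorm(n, sd = 50000)

fit <- lm(Weekly_Sales ~ ., data = sim)

# Matrix with columns Estimate, Std. Error, t value, Pr(>|t|).
coef_table <- coef(summary(fit))
print(round(coef_table, 4))
```

A factor predictor such as `Holiday_Flag` appears in the table as `Holiday_Flag1`, the estimated difference between level 1 and the baseline level 0.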
  

Second Model

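library(olsrr)  # provides ols_step_backward_p()

# Backward elimination by p-value; prem is the removal threshold
# (more recent olsrr releases name this argument p_val).
bckwd <- ols_step_backward_p(model1, details = TRUE, prem = 0.01)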

model2 <- bckwd$model

summary(model2)

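
The idea behind p-value-based backward elimination can be sketched by hand in base R: fit the full model, find the term with the largest p-value, drop it if that p-value exceeds the threshold, and repeat. This is an illustration of the general technique on simulated data, not olsrr's actual implementation; the helper name `backward_by_p`, the seed, and the data frame `sim` are hypothetical.

```r
# Hand-rolled backward elimination using F-test p-values from drop1().
backward_by_p <- function(fit, threshold = 0.01) {
  repeat {
    # Stop if only the intercept is left.
    if (length(attr(terms(fit), "term.labels")) == 0) break
    tests <- drop1(fit, test = "F")
    pvals <- tests[["Pr(>F)"]][-1]        # first row is "<none>"
    candidates <- rownames(tests)[-1]
    # Stop once every remaining term clears the threshold.
    if (max(pvals, na.rm = TRUE) <= threshold) break
    worst <- candidates[which.max(pvals)]
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
  fit
}

set.seed(2)  # assumed seed
n <- 300
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))
sim$y <- 5 + 3 * sim$x1 - 2 * sim$x2 + rnorm(n)

full    <- lm(y ~ x1 + x2 + noise, data = sim)
reduced <- backward_by_p(full, threshold = 0.01)
formula(reduced)  # x1 and x2 are kept; noise is usually dropped
```

With a threshold of 0.01, any term whose p-value is above 0.01 is a candidate for removal, so a predictor with p = 0.0167 would not survive this procedure.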

Third Model

model3 <- lm(Weekly_Sales ~ (.-Date)^2 , data = trainlm)
summary(model3)
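
In R's formula syntax, `(...)^2` expands to every main effect plus every pairwise interaction, and `. - Date` means all columns other than the response, with `Date` removed. The sketch below shows the expansion on a tiny hypothetical data frame (`toy`, with columns `y`, `a`, `b`, `c`, `d` invented for illustration), where `d` plays the role of `Date`.

```r
toy <- data.frame(y = 1:4, a = c(2, 1, 4, 3), b = c(1, 3, 2, 4),
                  c = c(4, 2, 1, 3), d = letters[1:4])

# Explicit form: three main effects and three pairwise interactions.
explicit <- attr(terms(y ~ (a + b + c)^2), "term.labels")
print(explicit)

# Dot form: "." stands for every column except y; "- d" removes d
# before the squaring is applied.
dotted <- attr(terms(y ~ (. - d)^2, data = toy), "term.labels")
print(dotted)
```

With a 45-level Store factor in the data, the pairwise expansion produces a very large number of coefficients, which is one reason a selection step afterwards is useful.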

Fourth Model

bckwd2 <- ols_step_backward_p(model3, details = TRUE, prem = 0.01)

model4 <- bckwd2$model

summary(model4)

Three Interpreted Predictors:

  1. CPI: Sales rise as the CPI rises (about 3,025 more in weekly sales per one-point increase in the coefficient table above, holding the other predictors fixed). One plausible explanation is that Walmart is noticeably cheaper than many competitors, so when a rising CPI signals inflation, consumers buy more in the short term to avoid higher prices later.

  2. Unemployment: When the unemployment rate is higher, weekly sales are lower (about 20,749 less per percentage point in the coefficient table above). Put simply, shoppers have less income to spend on Walmart products.

  3. Holiday Flag: Weeks flagged as holiday weeks (Holiday_Flag = 1) show higher sales (about 66,763 more per week in the coefficient table above). Shoppers are buying gifts and celebrating, which raises demand, and retailers like Walmart often run holiday promotions that increase spending further.
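
To make the interpretation concrete, the reported coefficients can be combined to compare two otherwise identical weeks. The sketch below copies the estimates from the coefficient table into a named vector (`b`, `change`, and `delta` are names chosen for illustration) and computes the predicted difference for a holiday week in which unemployment is one point higher. It assumes a model without interaction terms, so each coefficient can be read on its own.

```r
# Estimates copied from the coefficient table reported earlier.
b <- c(Holiday_Flag1 = 66762.7, Temperature = -947.8,
       Fuel_Price = -44905.5, CPI = 3025.0, Unemployment = -20749.0)

# Change in each predictor between the two weeks being compared.
change <- c(Holiday_Flag1 = 1, Temperature = 0,
            Fuel_Price = 0, CPI = 0, Unemployment = 1)

# Predicted difference in weekly sales: sum of coefficient * change.
delta <- sum(b * change[names(b)])
delta  # 66762.7 - 20749.0 = 46013.7
```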

R2 Calculation:

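mean_sales_val <- mean(validationlm$Weekly_Sales)

vallmR2 <- validationlm %>% 
  # Assumption: predictions come from model4, the final model above
  mutate(sales_predicted = predict(model4, newdata = validationlm),
         res2 = (Weekly_Sales - sales_predicted)^2,
         tot2 = (Weekly_Sales - mean_sales_val)^2)

r2 <- 1 - sum(vallmR2$res2) / sum(vallmR2$tot2)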

The R2 calculation returns a value of about .91, meaning the model accounts for roughly 91% of the variance in Weekly_Sales on the validation set. This measures how well the model as a whole predicts held-out data; it does not show that any single predictor, such as CPI, is closely correlated with sales. If the R2 value were, say, .52, the model would explain only about half of the variance, and further testing or additional context would be needed before relying on its predictions.
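
The same out-of-sample R2 can be wrapped in a small function, which also makes its meaning easier to check: perfect predictions give 1, and always predicting the validation mean gives 0. The function name `r2_validation` and the toy vectors are assumptions for illustration.

```r
# Out-of-sample R2: 1 - (residual sum of squares / total sum of squares).
r2_validation <- function(actual, predicted) {
  ss_res <- sum((actual - predicted)^2)
  ss_tot <- sum((actual - mean(actual))^2)
  1 - ss_res / ss_tot
}

actual <- c(10, 12, 15, 11, 18)

r2_perfect <- r2_validation(actual, actual)                 # 1: perfect fit
r2_mean    <- r2_validation(actual, rep(mean(actual), 5))   # 0: no better than the mean
r2_close   <- r2_validation(actual, c(11, 12, 14, 12, 17))  # between 0 and 1

c(perfect = r2_perfect, mean_only = r2_mean, close = r2_close)
```

Computed this way on held-out data, R2 can also be negative when the predictions are worse than simply using the validation mean.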