Context

Linear regression analysis is a fundamental statistical technique used to model and understand the relationship between a dependent variable and one or more independent variables. Its main applications are predicting and forecasting variables, analyzing the relationship between variables, and making statistical inferences about the regression coefficients. In this case, we use it to identify which variables have the greatest impact on predicting weekly sales at Walmart stores.
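As a minimal sketch of the technique, assuming simulated data (the variables x1, x2 and the true coefficients 10, 2 and -3 are illustrative, not taken from the Walmart dataset), the model y = b0 + b1*x1 + b2*x2 + error can be estimated by ordinary least squares. The least-squares estimate solves the normal equations (X'X) b = X'y, and it matches what lm() returns:

```r
# Simulated data (illustrative only): two predictors and a known linear signal
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- runif(n)
y  <- 10 + 2 * x1 - 3 * x2 + rnorm(n)

# Design matrix with an explicit intercept column
X <- cbind(Intercept = 1, x1, x2)

# OLS estimate from the normal equations: solve (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# The same model fitted with lm(), which uses a QR decomposition internally
fit <- lm(y ~ x1 + x2)

print(cbind(normal_equations = as.vector(beta_hat), lm = coef(fit)))
```

Solving the normal equations directly is shown only to make the estimator explicit; lm() is the numerically safer choice in practice.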

Steps

Install libraries and packages

#install.packages("tidyverse")
library(tidyverse)
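The install.packages() call above is commented out, so on a new machine it has to be uncommented by hand. A minimal sketch of an alternative, using a hypothetical helper ensure_packages() built on base R's requireNamespace(), lists the packages that are not installed and installs them only when install = TRUE:

```r
# Hypothetical helper: report (and optionally install) packages that are missing
ensure_packages <- function(pkgs, install = FALSE) {
  # requireNamespace() checks that a package can be loaded without attaching it
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!available]
  if (install && length(missing) > 0) {
    install.packages(missing)
  }
  missing
}

# Only lists what is missing; call ensure_packages("tidyverse", install = TRUE) to install it
print(ensure_packages("tidyverse"))
```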

Import the dataset

df<- read.csv("C:\\Users\\LuisD\\Documents\\Concentración\\Walmart_Store_sales.csv")
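The path above is absolute and specific to one Windows machine. A minimal sketch of a more portable import, assuming a hypothetical helper read_sales_csv() and a hypothetical data folder next to the notebook, checks that the file exists before reading it. To stay self-contained, the demo writes a one-row CSV with illustrative values to a temporary file instead of using the real data:

```r
# Hypothetical helper: fail early with a clear message if the CSV is missing
read_sales_csv <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read.csv(path)
}

# With the real data this could be, e.g. (hypothetical folder layout):
# df <- read_sales_csv(file.path("data", "Walmart_Store_sales.csv"))

# Self-contained demo: one illustrative row with the same column names
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(Store = 1L, Date = "05-02-2010", Weekly_Sales = 1643691),
  tmp, row.names = FALSE
)
sales <- read_sales_csv(tmp)

# Date arrives as character text, which is why it is converted with as.Date() later
str(sales)
```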

Understand the dataset

df$Date<- as.Date(df$Date, format="%d-%m-%Y")

summary(df)
##      Store         Date             Weekly_Sales      Holiday_Flag    
##  Min.   : 1   Min.   :2010-02-05   Min.   : 209986   Min.   :0.00000  
##  1st Qu.:12   1st Qu.:2010-10-08   1st Qu.: 553350   1st Qu.:0.00000  
##  Median :23   Median :2011-06-17   Median : 960746   Median :0.00000  
##  Mean   :23   Mean   :2011-06-17   Mean   :1046965   Mean   :0.06993  
##  3rd Qu.:34   3rd Qu.:2012-02-24   3rd Qu.:1420159   3rd Qu.:0.00000  
##  Max.   :45   Max.   :2012-10-26   Max.   :3818686   Max.   :1.00000  
##   Temperature       Fuel_Price         CPI         Unemployment   
##  Min.   : -2.06   Min.   :2.472   Min.   :126.1   Min.   : 3.879  
##  1st Qu.: 47.46   1st Qu.:2.933   1st Qu.:131.7   1st Qu.: 6.891  
##  Median : 62.67   Median :3.445   Median :182.6   Median : 7.874  
##  Mean   : 60.66   Mean   :3.359   Mean   :171.6   Mean   : 7.999  
##  3rd Qu.: 74.94   3rd Qu.:3.735   3rd Qu.:212.7   3rd Qu.: 8.622  
##  Max.   :100.14   Max.   :4.468   Max.   :227.2   Max.   :14.313
str(df)
## 'data.frame':    6435 obs. of  8 variables:
##  $ Store       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Date        : Date, format: "2010-02-05" "2010-02-12" ...
##  $ Weekly_Sales: num  1643691 1641957 1611968 1409728 1554807 ...
##  $ Holiday_Flag: int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Temperature : num  42.3 38.5 39.9 46.6 46.5 ...
##  $ Fuel_Price  : num  2.57 2.55 2.51 2.56 2.62 ...
##  $ CPI         : num  211 211 211 211 211 ...
##  $ Unemployment: num  8.11 8.11 8.11 8.11 8.11 ...
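The dates in the file are written day-month-year, which is why as.Date() is given the explicit format "%d-%m-%Y". A minimal sketch with three sample strings (matching the first dates shown by str()) confirms the parsed values, checks the weekday with the locale-independent "%u" code (1 = Monday, 7 = Sunday), and shows that consecutive observations are one week apart:

```r
# Dates in the CSV use day-month-year order, e.g. "05-02-2010"
raw_dates <- c("05-02-2010", "12-02-2010", "19-02-2010")

# An explicit format string avoids relying on as.Date()'s default formats
dates <- as.Date(raw_dates, format = "%d-%m-%Y")
print(dates)

# ISO weekday number (1 = Monday ... 7 = Sunday)
print(format(dates, "%u"))

# Gaps between consecutive dates, in days
print(diff(dates))
```

On the full dataset, a check such as table(format(df$Date, "%u")) would show whether every row falls on the same weekday.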

Add variables to the dataset

df$Year<- format(df$Date, "%Y")
df$Year<- as.integer(df$Year)

df$Month<- format(df$Date, "%m")
df$Month<- as.integer(df$Month)

df$WeekYear<- format(df$Date, "%W") # %W: week of the year (00-53), with weeks starting on Monday
df$WeekYear<- as.integer(df$WeekYear)

df$Day<- format(df$Date, "%d")
df$Day<- as.integer(df$Day)

#df$WeekDay<- format(df$Date, "%u")
#df$WeekDay<- as.integer(df$WeekDay)


str(df)
## 'data.frame':    6435 obs. of  12 variables:
##  $ Store       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Date        : Date, format: "2010-02-05" "2010-02-12" ...
##  $ Weekly_Sales: num  1643691 1641957 1611968 1409728 1554807 ...
##  $ Holiday_Flag: int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Temperature : num  42.3 38.5 39.9 46.6 46.5 ...
##  $ Fuel_Price  : num  2.57 2.55 2.51 2.56 2.62 ...
##  $ CPI         : num  211 211 211 211 211 ...
##  $ Unemployment: num  8.11 8.11 8.11 8.11 8.11 ...
##  $ Year        : int  2010 2010 2010 2010 2010 2010 2010 2010 2010 2010 ...
##  $ Month       : int  2 2 2 2 3 3 3 3 4 4 ...
##  $ WeekYear    : int  5 6 7 8 9 10 11 12 13 14 ...
##  $ Day         : int  5 12 19 26 5 12 19 26 2 9 ...
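The "%W" code numbers weeks from the first Monday of the year (days before it belong to week 0), while "%U" does the same from the first Sunday. A minimal sketch with three illustrative dates, using the same Year, Month and Day conversions as above plus two hypothetical columns Week_Mon and Week_Sun, shows where the two conventions differ:

```r
# 2010-01-03 is the first Sunday of 2010; 2010-01-04 is the first Monday
d <- as.Date(c("2010-01-03", "2010-01-04", "2010-02-05"))

calendar <- data.frame(
  Date     = d,
  Year     = as.integer(format(d, "%Y")),
  Month    = as.integer(format(d, "%m")),
  Day      = as.integer(format(d, "%d")),
  Week_Mon = as.integer(format(d, "%W")),  # weeks start on Monday (as in WeekYear)
  Week_Sun = as.integer(format(d, "%U"))   # weeks start on Sunday
)
print(calendar)
```

For 2010-02-05, the first date in the dataset, both conventions give week 5, which matches the WeekYear value shown by str().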

Fit the linear regression

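# Regress Weekly_Sales on every other column of df (the "." in the formula), including Store, Date and the derived calendar variables
regresion<- lm(Weekly_Sales ~ ., data = df)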
summary(regresion)
## 
## Call:
## lm(formula = Weekly_Sales ~ ., data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1094800  -382464   -42860   375406  2587123 
## 
## Coefficients: (1 not defined because of singularities)
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.384e+09  9.127e+09  -0.261   0.7940    
## Store        -1.538e+04  5.202e+02 -29.576  < 2e-16 ***
## Date         -3.399e+03  1.266e+04  -0.268   0.7883    
## Holiday_Flag  4.773e+04  2.706e+04   1.763   0.0779 .  
## Temperature  -1.817e+03  4.053e+02  -4.484 7.47e-06 ***
## Fuel_Price    6.124e+04  2.876e+04   2.130   0.0332 *  
## CPI          -2.109e+03  1.928e+02 -10.941  < 2e-16 ***
## Unemployment -2.209e+04  3.967e+03  -5.569 2.67e-08 ***
## Year          1.212e+06  4.633e+06   0.262   0.7937    
## Month         1.177e+05  3.858e+05   0.305   0.7604    
## WeekYear             NA         NA      NA       NA    
## Day           2.171e+03  1.269e+04   0.171   0.8642    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 520900 on 6424 degrees of freedom
## Multiple R-squared:  0.1495, Adjusted R-squared:  0.1482 
## F-statistic:   113 on 10 and 6424 DF,  p-value: < 2.2e-16
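In this output WeekYear is reported as NA, with the note "1 not defined because of singularities": lm() found that column to be an exact linear combination of other predictors, so its coefficient cannot be estimated separately. Here that is likely because every row falls on the same weekday, which makes Date, Year and WeekYear exactly linearly related. A minimal sketch with simulated data (x1, x2 and x3 are illustrative) reproduces the behavior and uses alias() from the stats package to show the dependency; on the notebook's model, alias(regresion) could be inspected the same way:

```r
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- x1 + 2 * x2            # exact linear combination of x1 and x2
y  <- 1 + x1 - x2 + rnorm(n)

fit <- lm(y ~ x1 + x2 + x3)

# x3 is reported as NA because it adds no information beyond x1 and x2
print(coef(fit))

# alias() shows the detected dependency: x3 = 1 * x1 + 2 * x2
print(alias(fit))
```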

Adjust the regression

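# Drop Store (a store identifier), Date, and the derived calendar variables Year, Month, WeekYear and Day
df_ajustada <- df %>% select(-Store, -Date, -(Year:Day))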
regresion_ajustada<- lm(Weekly_Sales ~., data = df_ajustada)
summary(regresion_ajustada)
## 
## Call:
## lm(formula = Weekly_Sales ~ ., data = df_ajustada)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1022429  -478555  -117266   397246  2800620 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1726523.4    79763.5  21.646  < 2e-16 ***
## Holiday_Flag   74891.7    27639.3   2.710  0.00675 ** 
## Temperature     -724.2      400.5  -1.808  0.07060 .  
## Fuel_Price    -10167.9    15762.8  -0.645  0.51891    
## CPI            -1598.9      195.1  -8.194 3.02e-16 ***
## Unemployment  -41552.3     3972.7 -10.460  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 557400 on 6429 degrees of freedom
## Multiple R-squared:  0.02544,    Adjusted R-squared:  0.02469 
## F-statistic: 33.57 on 5 and 6429 DF,  p-value: < 2.2e-16
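The adjusted model keeps only Holiday_Flag, Temperature, Fuel_Price, CPI and Unemployment, and its R-squared falls from 0.1495 to 0.02544. Because its predictors are a subset of the first model's, one way to test whether the dropped variables matter jointly is a nested-model F-test with anova(); on the notebook's objects that would be anova(regresion_ajustada, regresion), which is not run here, so no result is claimed. A minimal sketch with simulated data (x1, x2 are illustrative):

```r
set.seed(7)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 5 + 2 * x1 + 3 * x2 + rnorm(n)

full    <- lm(y ~ x1 + x2)   # all predictors
reduced <- lm(y ~ x1)        # x2 dropped

# F-test of H0: the coefficients of the dropped predictors are all zero
comparison <- anova(reduced, full)
print(comparison)

# R-squared of each model, as reported by summary()
print(c(reduced = summary(reduced)$r.squared, full = summary(full)$r.squared))
```

A small p-value indicates that dropping the extra predictors significantly worsens the fit.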

Interpretation / conclusion

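Based on the results of the adjusted regression in the previous step, we can make the following statements:
* The variables that are statistically significant at the 1% level are “Holiday_Flag” (p = 0.00675), “CPI” (p = 3.02e-16) and “Unemployment” (p < 2e-16)
* Holiday_Flag is associated with higher sales: holiday weeks show about 74,892 more in weekly sales, holding the other variables constant
* CPI and Unemployment are associated with lower sales: each one-point increase in CPI is associated with about 1,599 less in weekly sales, and each one-point increase in Unemployment with about 41,552 less
* The adjusted model has an R-squared of 0.02544, so these variables explain only about 2.5% of the variation in weekly sales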
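The coefficients above are in different units (a 0/1 holiday indicator, CPI points, an unemployment rate), so their raw magnitudes are not directly comparable as measures of impact. One common approach, not used in the original analysis, is to compare standardized coefficients, obtained by refitting after scaling every variable to mean 0 and standard deviation 1. A minimal sketch with simulated data (x_small, x_large and the helper standardize() are illustrative) shows that each standardized coefficient equals the raw coefficient times sd(x) / sd(y):

```r
set.seed(3)
n       <- 500
x_small <- rnorm(n, sd = 1)      # predictor on a small scale
x_large <- rnorm(n, sd = 100)    # predictor on a large scale
y       <- 2 * x_small + 0.05 * x_large + rnorm(n)

raw <- lm(y ~ x_small + x_large)

# Hypothetical helper: center and scale a numeric vector
standardize <- function(v) (v - mean(v)) / sd(v)

# Refit with every variable on the same (unitless) scale
std <- lm(standardize(y) ~ standardize(x_small) + standardize(x_large))

# The same standardized slopes, computed from the raw fit
by_hand <- coef(raw)[-1] * c(sd(x_small), sd(x_large)) / sd(y)

print(cbind(raw = coef(raw)[-1], standardized = coef(std)[-1], by_hand = by_hand))
```

Here x_large has the smaller raw coefficient but the larger standardized one. On the notebook's data, the same idea would mean scaling Weekly_Sales and the predictors in df_ajustada before calling lm(); no results are claimed here.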
