Linear Regression

Import the CSV dataset

The file can be located interactively with file.choose(); here the path is passed to read.csv() directly.

data <- read.csv("C:\\Users\\anavi\\Downloads\\HousePriceData.csv")

Explore the dataset

str(data)
## 'data.frame':    905 obs. of  10 variables:
##  $ Observation  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Dist_Taxi    : int  9796 8294 11001 8301 10510 6665 13153 5882 7495 8233 ...
##  $ Dist_Market  : int  5250 8186 14399 11188 12629 5142 11869 9948 11589 7067 ...
##  $ Dist_Hospital: int  10703 12694 16991 12289 13921 9972 17811 13315 13370 11400 ...
##  $ Carpet       : int  1659 1461 1340 1451 1770 1442 1542 1261 1090 1030 ...
##  $ Builtup      : int  1961 1752 1609 1748 2111 1733 1858 1507 1321 1235 ...
##  $ Parking      : chr  "Open" "Not Provided" "Not Provided" "Covered" ...
##  $ City_Category: chr  "CAT B" "CAT B" "CAT A" "CAT B" ...
##  $ Rainfall     : int  530 210 720 620 450 760 1030 1020 680 1130 ...
##  $ House_Price  : int  6649000 3982000 5401000 5373000 4662000 4526000 7224000 3772000 4631000 4415000 ...
summary(data)
##   Observation      Dist_Taxi      Dist_Market    Dist_Hospital  
##  Min.   :  1.0   Min.   :  146   Min.   : 1666   Min.   : 3227  
##  1st Qu.:237.0   1st Qu.: 6477   1st Qu.: 9367   1st Qu.:11302  
##  Median :469.0   Median : 8228   Median :11149   Median :13189  
##  Mean   :468.4   Mean   : 8235   Mean   :11022   Mean   :13091  
##  3rd Qu.:700.0   3rd Qu.: 9939   3rd Qu.:12675   3rd Qu.:14855  
##  Max.   :932.0   Max.   :20662   Max.   :20945   Max.   :23294  
##                                                                 
##      Carpet         Builtup        Parking          City_Category     
##  Min.   :  775   Min.   :  932   Length:905         Length:905        
##  1st Qu.: 1317   1st Qu.: 1579   Class :character   Class :character  
##  Median : 1478   Median : 1774   Mode  :character   Mode  :character  
##  Mean   : 1511   Mean   : 1794                                        
##  3rd Qu.: 1654   3rd Qu.: 1985                                        
##  Max.   :24300   Max.   :12730                                        
##  NA's   :7                                                            
##     Rainfall       House_Price       
##  Min.   :-110.0   Min.   :  1492000  
##  1st Qu.: 600.0   1st Qu.:  4623000  
##  Median : 780.0   Median :  5860000  
##  Mean   : 786.9   Mean   :  6083992  
##  3rd Qu.: 970.0   3rd Qu.:  7200000  
##  Max.   :1560.0   Max.   :150000000  
## 
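The summary above points to data-quality issues worth checking before modelling. `Carpet` has 7 missing values. `Rainfall` has a negative minimum (-110). The maxima of `Carpet` (24,300) and `House_Price` (150,000,000) sit far above their third quartiles. The sketch below runs such checks on a small made-up data frame that stands in for the real CSV. The helper name `check_data` and the 1.5 × IQR outlier fence are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: for every numeric column, count missing values,
# negative values and values above the conventional 1.5 x IQR upper fence
check_data <- function(df) {
  num_cols <- names(df)[vapply(df, is.numeric, logical(1))]
  report <- lapply(num_cols, function(col) {
    x <- df[[col]]
    q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
    upper <- q[2] + 1.5 * (q[2] - q[1])  # upper outlier fence
    data.frame(
      column = col,
      n_na   = sum(is.na(x)),
      n_neg  = sum(x < 0, na.rm = TRUE),
      n_high = sum(x > upper, na.rm = TRUE)
    )
  })
  do.call(rbind, report)
}

# Made-up toy data: one NA, one negative rainfall, one extreme carpet area
toy <- data.frame(
  Carpet   = c(1659, 1461, NA, 1451, 24300, 1442),
  Rainfall = c(530, 210, 720, -110, 450, 760),
  Parking  = c("Open", "Not Provided", "Covered", "Open", "Covered", "No Parking")
)
res <- check_data(toy)  # character columns such as Parking are skipped
res
```

On the real data this would be `check_data(data)`. The rows with a missing `Carpet` value are the 7 observations that `lm()` drops later.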

Fit the Model

regresion <- lm(House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + Carpet +
                  factor(Parking) + factor(City_Category) + Rainfall,
                data = data)
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + factor(Parking) + factor(City_Category) + Rainfall, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -4364144 -1142134   -44127  1154032 12009747 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                 -2.142e+06  3.998e+05  -5.357 1.08e-07 ***
## Dist_Taxi                    6.083e+01  3.911e+01   1.555  0.12020    
## Dist_Market                  4.530e+01  3.031e+01   1.494  0.13545    
## Dist_Hospital                1.640e+01  4.386e+01   0.374  0.70858    
## Carpet                       5.735e+03  7.593e+01  75.536  < 2e-16 ***
## factor(Parking)No Parking   -5.502e+05  2.023e+05  -2.719  0.00667 ** 
## factor(Parking)Not Provided -3.146e+05  1.798e+05  -1.749  0.08060 .  
## factor(Parking)Open         -1.458e+05  1.642e+05  -0.888  0.37466    
## factor(City_Category)CAT B  -1.990e+06  1.400e+05 -14.214  < 2e-16 ***
## factor(City_Category)CAT C  -2.979e+06  1.543e+05 -19.313  < 2e-16 ***
## Rainfall                     1.508e+02  2.246e+02   0.672  0.50200    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1784000 on 887 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.8793, Adjusted R-squared:  0.8779 
## F-statistic: 646.1 on 10 and 887 DF,  p-value: < 2.2e-16
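By default, `factor()` sorts its levels alphabetically. The first level becomes the baseline that is absorbed into the intercept: `Covered` for `Parking` and `CAT A` for `City_Category`. This is why neither of them appears among the coefficients above. Each `factor(...)` coefficient is therefore a difference relative to that baseline. The short sketch below uses `No Parking` as an alternative baseline, chosen only for illustration.

```r
# Parking labels as they appear in the dataset
parking <- c("Open", "Not Provided", "Not Provided", "Covered", "No Parking")

f <- factor(parking)
levels(f)  # the first level is the baseline (reference) category

# relevel() (package stats) makes another level the baseline;
# refitting lm() would then report the other levels relative to it
f2 <- relevel(f, ref = "No Parking")
levels(f2)
```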

Conclusions

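The model is highly significant (F = 646.1 on 10 and 887 DF, p < 2.2e-16) and explains about 88% of the variance in house prices (R² = 0.8793, adjusted R² = 0.8779).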
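The 88% figure is the multiple R². The adjusted R² and the F-statistic in the output can both be recovered from it. This needs the number of observations actually used, n = 898 (905 rows minus the 7 dropped for missingness), and the number of slope coefficients, p = 10.

```r
r2 <- 0.8793  # multiple R-squared reported by summary(regresion)
n  <- 898     # 905 rows minus 7 dropped for missing values
p  <- 10      # coefficients excluding the intercept

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
p_val  <- pf(f_stat, p, n - p - 1, lower.tail = FALSE)

round(adj_r2, 4)
round(f_stat, 1)
p_val
```

Small differences from the printed values come from R² being rounded to four decimals.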

Carpet is the strongest predictor. Holding the other variables constant, each additional unit of carpet area raises the predicted price by about 5,735. It is also by far the most statistically significant variable (t = 75.5).
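The standard error shows how precise the 5,735 estimate is. The sketch below computes a 95% confidence interval from the estimate and standard error reported in the summary. It then computes the implied price difference for a hypothetical 100-unit increase in carpet area. On the fitted object, `confint(regresion, "Carpet")` returns the interval directly.

```r
est      <- 5735    # Carpet estimate (5.735e+03)
se       <- 75.93   # its standard error (7.593e+01)
df_resid <- 887     # residual degrees of freedom

t_crit <- qt(0.975, df_resid)
ci <- c(lower = est - t_crit * se, upper = est + t_crit * se)
round(ci)

# Predicted price difference for 100 extra units of carpet area (hypothetical)
effect_100 <- 100 * est
effect_100
```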

Compared with covered parking (the reference level), having no parking is associated with a significantly lower price (p = 0.0067). The estimated reduction is about 550,000.
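The -550,200 coefficient is a point estimate, and its standard error (202,300) is large. The interval calculation below applies the same method to the reported values to show the plausible range of the effect.

```r
est    <- -5.502e5   # factor(Parking)No Parking estimate
se     <- 2.023e5    # its standard error
t_crit <- qt(0.975, 887)

# 95% confidence interval for the price difference vs. covered parking
ci_no_parking <- est + c(-1, 1) * t_crit * se
round(ci_no_parking)
```

The interval excludes zero, which is consistent with p = 0.0067. However, it runs from roughly -950,000 to -150,000. A reduction of "more than 500,000" therefore describes the point estimate, not a guaranteed effect.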

Distances to taxi, market and hospital, as well as rainfall, are not statistically significant in this dataset (all p > 0.1).
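Each p-value in the coefficient table comes from a two-sided t-test of that coefficient against zero, with t = estimate / standard error and 887 degrees of freedom. The sketch below reproduces the p-values for the four non-significant predictors from the reported estimates and standard errors.

```r
# Estimates and standard errors copied from summary(regresion)
coefs <- data.frame(
  term = c("Dist_Taxi", "Dist_Market", "Dist_Hospital", "Rainfall"),
  est  = c(60.83, 45.30, 16.40, 150.8),
  se   = c(39.11, 30.31, 43.86, 224.6)
)
coefs$t <- coefs$est / coefs$se
coefs$p <- 2 * pt(-abs(coefs$t), df = 887)  # two-sided p-value
coefs$significant_5pct <- coefs$p < 0.05
coefs
```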

Predictive Model

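# New house; Parking and City_Category must use labels that exist in the data
datos_nuevos <- data.frame(Dist_Taxi = 13153, Dist_Market = 11869, Dist_Hospital = 17811,
                           Carpet = 1542, Parking = "No Parking", City_Category = "CAT A", Rainfall = 1030)
predict(regresion, newdata = datos_nuevos)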
##       1 
## 7936762
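`predict()` evaluates the fitted equation at the new values. The sketch below reproduces the prediction by hand from the rounded coefficients in the summary. `No Parking` contributes its coefficient. `CAT A` contributes nothing because it is the baseline city category.

```r
# Rounded coefficients from summary(regresion)
b <- c(intercept = -2.142e6, Dist_Taxi = 60.83, Dist_Market = 45.30,
       Dist_Hospital = 16.40, Carpet = 5735, NoParking = -5.502e5,
       Rainfall = 150.8)

# New house: NoParking = 1 because Parking = "No Parking";
# CAT A needs no term because it is the baseline city category
x <- c(intercept = 1, Dist_Taxi = 13153, Dist_Market = 11869,
       Dist_Hospital = 17811, Carpet = 1542, NoParking = 1,
       Rainfall = 1030)

manual_pred <- sum(b * x[names(b)])
manual_pred
```

The small gap to 7,936,762 comes from the coefficients being rounded to four significant digits. On the fitted object, `predict(regresion, newdata = datos_nuevos, interval = "prediction")` also returns a 95% prediction interval. With a residual standard error of 1,784,000, that interval will be wide.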