Linear regression model for house prices.

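# Load the house price data from the authors' local Windows path
casas <- read.csv("C:\\Users\\Torres\\Downloads\\R\\R App\\HousePriceData.csv")
# Drop row 348 before inspecting and modelling the data
casas <- casas[-348, ]
str(casas)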
## 'data.frame':    904 obs. of  10 variables:
##  $ Observation  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Dist_Taxi    : int  9796 8294 11001 8301 10510 6665 13153 5882 7495 8233 ...
##  $ Dist_Market  : int  5250 8186 14399 11188 12629 5142 11869 9948 11589 7067 ...
##  $ Dist_Hospital: int  10703 12694 16991 12289 13921 9972 17811 13315 13370 11400 ...
##  $ Carpet       : int  1659 1461 1340 1451 1770 1442 1542 1261 1090 1030 ...
##  $ Builtup      : int  1961 1752 1609 1748 2111 1733 1858 1507 1321 1235 ...
##  $ Parking      : chr  "Open" "Not Provided" "Not Provided" "Covered" ...
##  $ City_Category: chr  "CAT B" "CAT B" "CAT A" "CAT B" ...
##  $ Rainfall     : int  530 210 720 620 450 760 1030 1020 680 1130 ...
##  $ House_Price  : int  6649000 3982000 5401000 5373000 4662000 4526000 7224000 3772000 4631000 4415000 ...
summary(casas)
##   Observation      Dist_Taxi      Dist_Market    Dist_Hospital       Carpet    
##  Min.   :  1.0   Min.   :  146   Min.   : 1666   Min.   : 3227   Min.   : 775  
##  1st Qu.:236.8   1st Qu.: 6476   1st Qu.: 9366   1st Qu.:11302   1st Qu.:1317  
##  Median :469.5   Median : 8224   Median :11143   Median :13188   Median :1477  
##  Mean   :468.5   Mean   : 8222   Mean   :11011   Mean   :13079   Mean   :1486  
##  3rd Qu.:700.2   3rd Qu.: 9936   3rd Qu.:12668   3rd Qu.:14851   3rd Qu.:1653  
##  Max.   :932.0   Max.   :16850   Max.   :18281   Max.   :22407   Max.   :2229  
##                                                                  NA's   :7     
##     Builtup       Parking          City_Category         Rainfall     
##  Min.   : 932   Length:904         Length:904         Min.   :-110.0  
##  1st Qu.:1578   Class :character   Class :character   1st Qu.: 600.0  
##  Median :1774   Mode  :character   Mode  :character   Median : 780.0  
##  Mean   :1782                                         Mean   : 786.5  
##  3rd Qu.:1983                                         3rd Qu.: 970.0  
##  Max.   :2667                                         Max.   :1560.0  
##                                                                       
##   House_Price      
##  Min.   : 1492000  
##  1st Qu.: 4622750  
##  Median : 5857000  
##  Mean   : 5924793  
##  3rd Qu.: 7187250  
##  Max.   :11632000  
## 
regresion <- lm(House_Price~Dist_Taxi+Dist_Market+Dist_Hospital+Carpet+factor(Parking)+factor(City_Category)+Rainfall, data=casas)
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + factor(Parking) + factor(City_Category) + Rainfall, 
##     data = casas)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3573873  -805741   -63223   762582  4419284 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  5.599e+06  3.669e+05  15.261  < 2e-16 ***
## Dist_Taxi                    2.958e+01  2.680e+01   1.103   0.2701    
## Dist_Market                  1.197e+01  2.079e+01   0.576   0.5650    
## Dist_Hospital                4.955e+01  3.006e+01   1.648   0.0996 .  
## Carpet                       8.006e+02  1.641e+02   4.879 1.27e-06 ***
## factor(Parking)No Parking   -6.134e+05  1.386e+05  -4.426 1.08e-05 ***
## factor(Parking)Not Provided -4.947e+05  1.233e+05  -4.013 6.51e-05 ***
## factor(Parking)Open         -2.632e+05  1.125e+05  -2.339   0.0196 *  
## factor(City_Category)CAT B  -1.878e+06  9.593e+04 -19.573  < 2e-16 ***
## factor(City_Category)CAT C  -2.896e+06  1.057e+05 -27.404  < 2e-16 ***
## Rainfall                    -9.982e+01  1.540e+02  -0.648   0.5171    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1222000 on 886 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.5013, Adjusted R-squared:  0.4956 
## F-statistic: 89.05 on 10 and 886 DF,  p-value: < 2.2e-16
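# New house to score. Builtup is supplied but is not a term in the model, so predict() does not use it.
datos_nuevos <- data.frame(Dist_Taxi=9796, Dist_Market=5250, Dist_Hospital=10703, Carpet=1659, Builtup=1961, Parking="Open", City_Category="CAT B", Rainfall=530)

predict(regresion, datos_nuevos)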
##       1 
## 5616423
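
The prediction can be rebuilt by hand from the coefficient table, which makes explicit how the factor levels enter the model. The sketch below assumes the rounded estimates printed by `summary(regresion)`; the names in `coefs` and the variable `prediccion_manual` are illustrative and do not appear in the original code. Because the coefficients are rounded to four significant digits, the result only approximates the value from `predict()`.

```r
# Rounded coefficients copied from the summary(regresion) table
coefs <- c(
  intercept     = 5.599e+06,
  Dist_Taxi     = 2.958e+01,
  Dist_Market   = 1.197e+01,
  Dist_Hospital = 4.955e+01,
  Carpet        = 8.006e+02,
  Parking_Open  = -2.632e+05,  # factor(Parking)Open
  City_CAT_B    = -1.878e+06,  # factor(City_Category)CAT B
  Rainfall      = -9.982e+01
)

# Values from datos_nuevos: 1 for the intercept and for each active dummy variable
x <- c(1, 9796, 5250, 10703, 1659, 1, 1, 530)

# Linear predictor: sum of coefficient times value
prediccion_manual <- sum(coefs * x)
prediccion_manual
```

The dummy variables for the other Parking and City_Category levels are zero for this house, so they drop out of the sum.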


Conclusions.

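The model's prediction is reasonable relative to the observed distribution of House_Price: the predicted value of 5,616,423 falls between the first quartile (4,622,750) and the median (5,857,000), somewhat below the mean (5,924,793). The inputs match the first observation in the data, whose observed price is 6,649,000, so the model underestimates that house by about 1.03 million, which is less than one residual standard error (1,222,000).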
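
As a quick check on how far off the prediction is, the sketch below compares it with the observed price of the first row shown by `str(casas)`, whose predictor values are the same as those in `datos_nuevos`. All three numbers are copied from the output above; the variable names are illustrative.

```r
prediccion <- 5616423   # value returned by predict(regresion, datos_nuevos)
observado  <- 6649000   # House_Price of observation 1 in str(casas)
rse        <- 1222000   # residual standard error from summary(regresion)

# Absolute error, error as a share of the observed price,
# and error measured in residual standard errors
error          <- observado - prediccion
error_relativo <- error / observado
error_en_rse   <- error / rse

c(error = error, relativo = error_relativo, en_rse = error_en_rse)
```

An error below one residual standard error is typical for this model, not a sign of an unusually accurate prediction.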

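The model's R-squared is moderate: 0.5013 (adjusted 0.4956), so it explains about half of the variance in House_Price. This is with Builtup left out and the categorical variables (Parking and City_Category) entered as factors. Dist_Taxi, Dist_Market, Dist_Hospital and Rainfall remain in the model although none of them is significant at the 0.05 level.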
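
The adjusted R-squared can be recovered from the figures in the `summary(regresion)` output, which is a useful sanity check when quoting the fit. The sketch below assumes the printed Multiple R-squared of 0.5013, the 904 rows reported by `str(casas)`, the 7 rows dropped for missingness, and the 10 slope coefficients implied by the F-statistic; the variable names are illustrative.

```r
r2 <- 0.5013     # Multiple R-squared from summary(regresion)
n  <- 904 - 7    # rows in casas minus the 7 deleted due to missingness
p  <- 10         # slope coefficients (F-statistic on 10 and 886 DF)

# Adjusted R-squared penalises the fit for the number of predictors
r2_ajustada <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
r2_ajustada
```

The result agrees with the reported 0.4956 up to the rounding of the inputs.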

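House size, measured by Carpet (and by Builtup, which records nearly the same area), has a clear effect on price: Carpet is the only numeric predictor significant at the 0.05 level, adding about 800 to the price per unit of carpet area. Among the categorical variables, City_Category has the largest coefficients, followed by Parking.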
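
Raw coefficients are on different scales, so one rough way to compare them is to multiply a numeric coefficient by a typical change in its variable. The sketch below uses the interquartile range of Carpet from `summary(casas)` and the rounded estimates from `summary(regresion)`; it is an illustration of scale, not a formal importance measure, and the variable names are illustrative.

```r
coef_carpet <- 800.6          # Carpet estimate
iqr_carpet  <- 1653 - 1317    # 3rd quartile minus 1st quartile of Carpet

# Price change when Carpet moves from its 1st to its 3rd quartile
efecto_carpet <- coef_carpet * iqr_carpet

# Price change relative to the baseline level of each factor
efecto_cat_c      <- -2.896e+06   # factor(City_Category)CAT C
efecto_no_parking <- -6.134e+05   # factor(Parking)No Parking

c(carpet_iqr = efecto_carpet, cat_c = efecto_cat_c, no_parking = efecto_no_parking)
```

On this scale, moving between city categories shifts the predicted price far more than a typical difference in carpet area.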

For the final model, Builtup had to be dropped because it carries essentially the same information as Carpet. Including both makes the two predictors collinear, so their coefficients become unstable and can offset each other.
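
The overlap between Carpet and Builtup can be illustrated with the ten values of each variable printed by `str(casas)`. The sketch below is only indicative, since it uses ten rows, not the full data set, and it computes a two-variable variance inflation factor; in the full model the auxiliary regression would include every other predictor. The variable names are illustrative.

```r
# First ten values of each column, as printed by str(casas)
carpet  <- c(1659, 1461, 1340, 1451, 1770, 1442, 1542, 1261, 1090, 1030)
builtup <- c(1961, 1752, 1609, 1748, 2111, 1733, 1858, 1507, 1321, 1235)

# Pearson correlation between the two area measures
correlacion <- cor(carpet, builtup)

# Variance inflation factor from regressing one predictor on the other
r2_aux <- summary(lm(builtup ~ carpet))$r.squared
vif    <- 1 / (1 - r2_aux)

c(correlacion = correlacion, vif = vif)
```

A correlation close to 1 and a large variance inflation factor are the usual signs that one of the two variables should be left out.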
