LINEAR REGRESSION

Import the dataset from CSV

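# Read the housing data (adjust the path to your machine)
data <- read.csv("C:\\Users\\marco\\Downloads\\HousePriceData.csv")
# file.choose()  # alternative: select the file interactively
# Copy of the data without row 348
nuevo_data <- data[-348, ]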

Understand the data

str(nuevo_data)
## 'data.frame':    904 obs. of  10 variables:
##  $ Observation  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Dist_Taxi    : int  9796 8294 11001 8301 10510 6665 13153 5882 7495 8233 ...
##  $ Dist_Market  : int  5250 8186 14399 11188 12629 5142 11869 9948 11589 7067 ...
##  $ Dist_Hospital: int  10703 12694 16991 12289 13921 9972 17811 13315 13370 11400 ...
##  $ Carpet       : int  1659 1461 1340 1451 1770 1442 1542 1261 1090 1030 ...
##  $ Builtup      : int  1961 1752 1609 1748 2111 1733 1858 1507 1321 1235 ...
##  $ Parking      : chr  "Open" "Not Provided" "Not Provided" "Covered" ...
##  $ City_Category: chr  "CAT B" "CAT B" "CAT A" "CAT B" ...
##  $ Rainfall     : int  530 210 720 620 450 760 1030 1020 680 1130 ...
##  $ House_Price  : int  6649000 3982000 5401000 5373000 4662000 4526000 7224000 3772000 4631000 4415000 ...
summary(nuevo_data)
##   Observation      Dist_Taxi      Dist_Market    Dist_Hospital       Carpet    
##  Min.   :  1.0   Min.   :  146   Min.   : 1666   Min.   : 3227   Min.   : 775  
##  1st Qu.:236.8   1st Qu.: 6476   1st Qu.: 9366   1st Qu.:11302   1st Qu.:1317  
##  Median :469.5   Median : 8224   Median :11143   Median :13188   Median :1477  
##  Mean   :468.5   Mean   : 8222   Mean   :11011   Mean   :13079   Mean   :1486  
##  3rd Qu.:700.2   3rd Qu.: 9936   3rd Qu.:12668   3rd Qu.:14851   3rd Qu.:1653  
##  Max.   :932.0   Max.   :16850   Max.   :18281   Max.   :22407   Max.   :2229  
##                                                                  NA's   :7     
##     Builtup       Parking          City_Category         Rainfall     
##  Min.   : 932   Length:904         Length:904         Min.   :-110.0  
##  1st Qu.:1578   Class :character   Class :character   1st Qu.: 600.0  
##  Median :1774   Mode  :character   Mode  :character   Median : 780.0  
##  Mean   :1782                                         Mean   : 786.5  
##  3rd Qu.:1983                                         3rd Qu.: 970.0  
##  Max.   :2667                                         Max.   :1560.0  
##                                                                       
##   House_Price      
##  Min.   : 1492000  
##  1st Qu.: 4622750  
##  Median : 5857000  
##  Mean   : 5924793  
##  3rd Qu.: 7187250  
##  Max.   :11632000  
## 
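
The summary above reports 7 missing values in Carpet and a minimum Rainfall of -110, which is not a plausible rainfall amount. A minimal sketch of how such issues could be counted before modeling follows. The data frame `toy` and its values are made up for illustration and are not rows from HousePriceData.csv.

```r
# Hypothetical toy data; the values are illustrative only
toy <- data.frame(
  Carpet   = c(1659, NA, 1340, 1451),
  Builtup  = c(1961, 1752, 1609, 1748),
  Rainfall = c(530, 210, -110, 620)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Row positions with negative rainfall (which() ignores NA comparisons)
negative_rain_rows <- which(toy$Rainfall < 0)
print(negative_rain_rows)

# Rows without any NA; by default lm() drops rows that have an NA
# in any variable used by the model
n_complete <- sum(complete.cases(toy))
print(n_complete)
```

The same three calls could be pointed at `nuevo_data` to decide whether rows should be dropped, corrected, or kept.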

Fit the model

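# Note: the model is fitted on `data` (all rows); `nuevo_data`, which drops row 348, is not used here
regresion <- lm(House_Price~Dist_Taxi+Dist_Market+Dist_Hospital+Carpet+Builtup+factor(Parking)+factor(City_Category)+Rainfall, data=data)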
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + Builtup + factor(Parking) + factor(City_Category) + 
##     Rainfall, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3586934  -837542   -65314   784513  4577689 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  5.568e+06  3.688e+05  15.097  < 2e-16 ***
## Dist_Taxi                    2.834e+01  2.694e+01   1.052   0.2931    
## Dist_Market                  1.237e+01  2.089e+01   0.592   0.5538    
## Dist_Hospital                5.071e+01  3.021e+01   1.679   0.0936 .  
## Carpet                       9.907e+03  1.428e+02  69.398  < 2e-16 ***
## Builtup                     -7.575e+03  2.412e+02 -31.403  < 2e-16 ***
## factor(Parking)No Parking   -6.170e+05  1.393e+05  -4.429 1.06e-05 ***
## factor(Parking)Not Provided -5.077e+05  1.239e+05  -4.096 4.58e-05 ***
## factor(Parking)Open         -2.597e+05  1.131e+05  -2.297   0.0218 *  
## factor(City_Category)CAT B  -1.883e+06  9.641e+04 -19.529  < 2e-16 ***
## factor(City_Category)CAT C  -2.902e+06  1.062e+05 -27.321  < 2e-16 ***
## Rainfall                    -9.984e+01  1.548e+02  -0.645   0.5191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1228000 on 886 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.9429, Adjusted R-squared:  0.9422 
## F-statistic:  1329 on 11 and 886 DF,  p-value: < 2.2e-16
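
The coefficient table has no row for "Covered" parking or for "CAT A". R uses the first factor level (alphabetical by default) as the reference, so the remaining coefficients are differences from that baseline. A small sketch with made-up vectors shows this; the level names are taken from the output above, while `parking`, `city`, and `city_c` are hypothetical names.

```r
# Hypothetical vectors that reuse the category labels seen in the output
parking <- factor(c("Open", "Not Provided", "Covered", "No Parking"))
city    <- factor(c("CAT B", "CAT A", "CAT C"))

# The first level is the reference category used by lm()
print(levels(parking)[1])
print(levels(city)[1])

# relevel() changes the baseline, e.g. to compare against CAT C instead
city_c <- relevel(city, ref = "CAT C")
print(levels(city_c)[1])
```

Read this way, the CAT B and CAT C estimates are price differences relative to CAT A, and the parking estimates are differences relative to Covered.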

GENERATE FORECASTS

datos_nuevos <- data.frame(
  Dist_Taxi     = 8000,
  Dist_Market   = 11000,
  Dist_Hospital = 13000,
  Carpet        = 1500,
  Builtup       = 1800,
  Parking       = "Open",
  City_Category = "CAT A",
  Rainfall      = 600  # seq(600, 1200, length.out = 1) evaluates to 600
)
predict(regresion, datos_nuevos)
##       1 
## 7495898
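
As a check on the forecast, the prediction can be rebuilt by hand from the rounded estimates in the coefficient table. This is a sketch: the vectors `b` and `x` are names introduced here, "Open" parking contributes its dummy coefficient, "CAT A" is the baseline and contributes nothing, and Rainfall is 600 because `seq(600, 1200, length.out = 1)` returns only its starting value.

```r
# Rounded estimates copied from the summary(regresion) output
b <- c(
  Intercept     =  5.568e+06,
  Dist_Taxi     =  2.834e+01,
  Dist_Market   =  1.237e+01,
  Dist_Hospital =  5.071e+01,
  Carpet        =  9.907e+03,
  Builtup       = -7.575e+03,
  Parking_Open  = -2.597e+05,
  Rainfall      = -9.984e+01
)

# Scenario values from datos_nuevos (1 for the intercept and the Open dummy)
x <- c(
  Intercept     = 1,
  Dist_Taxi     = 8000,
  Dist_Market   = 11000,
  Dist_Hospital = 13000,
  Carpet        = 1500,
  Builtup       = 1800,
  Parking_Open  = 1,
  Rainfall      = 600
)

# Linear predictor: sum of coefficient times value
manual <- sum(b * x[names(b)])
print(manual)
```

The result should differ only slightly from the 7495898 returned by `predict()`, because the table shows the coefficients rounded to four significant digits.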

Conclusions

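The linear regression model shows that house price is driven mainly by the size of the property (Carpet and Builtup), the city category, and the type of parking. Carpet has a large positive coefficient, whereas Builtup is negative once Carpet is held fixed, which may reflect how closely the two size measures move together. Houses in CAT A cities are priced highest, with CAT B and CAT C roughly 1.9 and 2.9 million lower, all else equal. The distance variables (Dist_Taxi, Dist_Market, Dist_Hospital) and Rainfall are not statistically significant at the 5% level.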

With an adjusted R-squared of 0.9422, the model is a useful tool for estimating house prices under different scenarios, although it could be improved by including more relevant variables.
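
Estimating prices under several scenarios at once can be sketched as follows. The example runs on simulated data, so `sim`, `fit`, `scenarios`, and every number in it are assumptions made for illustration, not results from HousePriceData.csv. It also requests prediction intervals, which the report above does not compute.

```r
set.seed(1)

# Simulated stand-in data (hypothetical effects and noise level)
n <- 200
sim <- data.frame(
  Carpet        = runif(n, 800, 2200),
  City_Category = sample(c("CAT A", "CAT B", "CAT C"), n, replace = TRUE)
)
city_effect <- c("CAT A" = 0, "CAT B" = -1.9e6, "CAT C" = -2.9e6)
sim$House_Price <- 1e6 + 3000 * sim$Carpet +
  unname(city_effect[sim$City_Category]) + rnorm(n, sd = 5e5)

# Same modeling style as the report: numeric predictor plus a factor
fit <- lm(House_Price ~ Carpet + factor(City_Category), data = sim)

# Grid of scenarios: every combination of the listed values
scenarios <- expand.grid(
  Carpet           = c(1200, 1500, 1800),
  City_Category    = c("CAT A", "CAT C"),
  stringsAsFactors = FALSE
)

# Point forecasts with 95% prediction intervals (columns fit, lwr, upr)
pred <- predict(fit, newdata = scenarios, interval = "prediction", level = 0.95)
print(cbind(scenarios, round(pred)))
```

Applied to `regresion`, the grid would need all eight predictors, for example with Rainfall varied over `seq(600, 1200, by = 200)` while the others are held fixed.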
