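# Load the data (the path is specific to the author's machine)
data <- read.csv("C:\\Users\\marco\\Downloads\\HousePriceData.csv")
# Alternative: choose the file interactively
# data <- read.csv(file.choose())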
# Drop the 348th row (a positional index, not Observation == 348)
nuevo_data <- data[-348, ]
str(nuevo_data)
## 'data.frame': 904 obs. of 10 variables:
## $ Observation : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Dist_Taxi : int 9796 8294 11001 8301 10510 6665 13153 5882 7495 8233 ...
## $ Dist_Market : int 5250 8186 14399 11188 12629 5142 11869 9948 11589 7067 ...
## $ Dist_Hospital: int 10703 12694 16991 12289 13921 9972 17811 13315 13370 11400 ...
## $ Carpet : int 1659 1461 1340 1451 1770 1442 1542 1261 1090 1030 ...
## $ Builtup : int 1961 1752 1609 1748 2111 1733 1858 1507 1321 1235 ...
## $ Parking : chr "Open" "Not Provided" "Not Provided" "Covered" ...
## $ City_Category: chr "CAT B" "CAT B" "CAT A" "CAT B" ...
## $ Rainfall : int 530 210 720 620 450 760 1030 1020 680 1130 ...
## $ House_Price : int 6649000 3982000 5401000 5373000 4662000 4526000 7224000 3772000 4631000 4415000 ...
summary(nuevo_data)
## Observation Dist_Taxi Dist_Market Dist_Hospital Carpet
## Min. : 1.0 Min. : 146 Min. : 1666 Min. : 3227 Min. : 775
## 1st Qu.:236.8 1st Qu.: 6476 1st Qu.: 9366 1st Qu.:11302 1st Qu.:1317
## Median :469.5 Median : 8224 Median :11143 Median :13188 Median :1477
## Mean :468.5 Mean : 8222 Mean :11011 Mean :13079 Mean :1486
## 3rd Qu.:700.2 3rd Qu.: 9936 3rd Qu.:12668 3rd Qu.:14851 3rd Qu.:1653
## Max. :932.0 Max. :16850 Max. :18281 Max. :22407 Max. :2229
## NA's :7
## Builtup Parking City_Category Rainfall
## Min. : 932 Length:904 Length:904 Min. :-110.0
## 1st Qu.:1578 Class :character Class :character 1st Qu.: 600.0
## Median :1774 Mode :character Mode :character Median : 780.0
## Mean :1782 Mean : 786.5
## 3rd Qu.:1983 3rd Qu.: 970.0
## Max. :2667 Max. :1560.0
##
## House_Price
## Min. : 1492000
## 1st Qu.: 4622750
## Median : 5857000
## Mean : 5924793
## 3rd Qu.: 7187250
## Max. :11632000
##
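The summary reports 7 missing values and a negative minimum Rainfall (-110.0), which is implausible for precipitation. A minimal sketch of a quick data-quality check, assuming the column names shown by `str()` above. The helper name `check_quality` and the toy data frame are hypothetical, for illustration only.

```r
# Hypothetical helper: count missing values per column and list the rows
# with negative Rainfall (negative precipitation is physically implausible).
check_quality <- function(df) {
  na_counts <- colSums(is.na(df))
  list(
    na_per_column = na_counts[na_counts > 0],
    # which() skips NA comparisons, so missing Rainfall is not flagged here
    negative_rainfall_rows = which(df$Rainfall < 0)
  )
}

# Toy data frame standing in for nuevo_data (illustrative values only)
toy <- data.frame(
  Carpet   = c(1500, NA, 1300, 1450),
  Rainfall = c(600, -110, 780, NA)
)
res <- check_quality(toy)
print(res)
# On the real data: check_quality(nuevo_data)
```

Whether flagged rows should be dropped, corrected, or kept is a modelling decision; `lm()` already drops rows with missing values by default, which is why the fit below reports 7 deleted observations.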
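# Note: this fits the model on the full `data` (905 rows, including row 348), not on `nuevo_data`;
# pass data = nuevo_data to exclude row 348.
regresion <- lm(House_Price~Dist_Taxi+Dist_Market+Dist_Hospital+Carpet+Builtup+factor(Parking)+factor(City_Category)+Rainfall, data=data)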
summary(regresion)
##
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital +
## Carpet + Builtup + factor(Parking) + factor(City_Category) +
## Rainfall, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3586934 -837542 -65314 784513 4577689
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.568e+06 3.688e+05 15.097 < 2e-16 ***
## Dist_Taxi 2.834e+01 2.694e+01 1.052 0.2931
## Dist_Market 1.237e+01 2.089e+01 0.592 0.5538
## Dist_Hospital 5.071e+01 3.021e+01 1.679 0.0936 .
## Carpet 9.907e+03 1.428e+02 69.398 < 2e-16 ***
## Builtup -7.575e+03 2.412e+02 -31.403 < 2e-16 ***
## factor(Parking)No Parking -6.170e+05 1.393e+05 -4.429 1.06e-05 ***
## factor(Parking)Not Provided -5.077e+05 1.239e+05 -4.096 4.58e-05 ***
## factor(Parking)Open -2.597e+05 1.131e+05 -2.297 0.0218 *
## factor(City_Category)CAT B -1.883e+06 9.641e+04 -19.529 < 2e-16 ***
## factor(City_Category)CAT C -2.902e+06 1.062e+05 -27.321 < 2e-16 ***
## Rainfall -9.984e+01 1.548e+02 -0.645 0.5191
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1228000 on 886 degrees of freedom
## (7 observations deleted due to missingness)
## Multiple R-squared: 0.9429, Adjusted R-squared: 0.9422
## F-statistic: 1329 on 11 and 886 DF, p-value: < 2.2e-16
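R encodes `factor(Parking)` and `factor(City_Category)` with treatment contrasts, so each factor coefficient is a difference from the first (alphabetical) level: Covered for Parking and CAT A for City_Category. For example, the CAT C estimate of -2.902e+06 means a CAT C house is predicted about 2.9 million cheaper than an otherwise identical CAT A house. The sketch below shows how to confirm the baseline and, if preferred, change it with `relevel()`; the short vector of parking values is illustrative.

```r
# Parking values as they appear in the data (illustrative sample)
parking <- c("Open", "Not Provided", "Covered", "No Parking", "Open")
f <- factor(parking)
print(levels(f))        # the first level is the regression baseline

# Make "Open" the baseline, so coefficients compare against open parking
f_open <- relevel(f, ref = "Open")
print(levels(f_open))

# Dummy columns R builds for a model: the baseline level gets no column
print(model.matrix(~ f_open))
```

In the housing model the same idea would be `relevel(factor(Parking), ref = "Open")` inside the formula; the fitted values do not change, only the reference point of the Parking coefficients.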
datos_nuevos <- data.frame(
Dist_Taxi = 8000,
Dist_Market = 11000,
Dist_Hospital = 13000,
Carpet = 1500,
Builtup = 1800,
Parking = "Open",
City_Category = "CAT A",
Rainfall = 600
)
predict(regresion, datos_nuevos)
## 1
## 7495898
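To explore several scenarios at once (for instance, a range of Rainfall values) and to see how uncertain each estimate is, one option is to build a grid with `expand.grid()` and ask `predict()` for prediction intervals. A minimal sketch using the built-in `mtcars` data as a stand-in for the housing data; the model and variable choices are illustrative only.

```r
# Stand-in model on built-in data (mpg explained by weight and horsepower)
modelo <- lm(mpg ~ wt + hp, data = mtcars)

# Scenario grid: vary wt over a range while holding hp fixed
escenarios <- expand.grid(
  wt = seq(2, 4, length.out = 5),
  hp = 150
)

# Point predictions plus 95% prediction intervals for each scenario
pred <- predict(modelo, newdata = escenarios,
                interval = "prediction", level = 0.95)
resultado <- cbind(escenarios, pred)
print(resultado)

# Housing version (hypothetical grid, not run here):
# escenarios <- expand.grid(Dist_Taxi = 8000, Dist_Market = 11000,
#   Dist_Hospital = 13000, Carpet = 1500, Builtup = 1800,
#   Parking = "Open", City_Category = "CAT A",
#   Rainfall = seq(600, 1200, by = 200), stringsAsFactors = FALSE)
# predict(regresion, escenarios, interval = "prediction")
```

A prediction interval covers a single new house, so it is much wider than the confidence interval for the mean price (`interval = "confidence"`).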
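The linear regression model shows that house price is driven mainly by property size (Carpet and Builtup) and city category (both p < 2e-16), and to a lesser extent by parking type. Holding the other variables fixed, a CAT B property is estimated at about 1.88 million less than a CAT A property, and a CAT C property about 2.90 million less; properties without covered parking are also priced lower. By contrast, the distances to taxi, market, and hospital (Dist_Taxi, Dist_Market, Dist_Hospital) are not significant at the 5% level (Dist_Hospital is borderline, p = 0.094), and neither is Rainfall. Carpet has a positive coefficient and Builtup a negative one; because Builtup tracks Carpet closely (about 1.2 times Carpet across the summary quartiles), the two size coefficients should be interpreted jointly rather than one at a time.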
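Opposite signs on two predictors that measure nearly the same thing are a typical symptom of collinearity. The variance inflation factor (VIF) quantifies it as 1 / (1 - R²), where R² comes from regressing one predictor on the others. A minimal sketch computed by hand; the helper name `vif_manual` is hypothetical and `mtcars` stands in for the housing data. The `car` package provides `car::vif()` for the same purpose.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other listed predictors.
vif_manual <- function(df, vars) {
  sapply(vars, function(v) {
    otros <- setdiff(vars, v)
    f <- reformulate(otros, response = v)
    r2 <- summary(lm(f, data = df))$r.squared
    1 / (1 - r2)
  })
}

# disp and wt are strongly correlated in mtcars, so disp gets a large VIF
vifs <- vif_manual(mtcars, c("disp", "wt", "hp"))
print(round(vifs, 2))

# Housing data (hypothetical call, not run here):
# vif_manual(na.omit(nuevo_data), c("Carpet", "Builtup", "Dist_Taxi"))
```

A common rule of thumb treats VIFs above roughly 5 to 10 as a sign that individual coefficients are unstable, even when the model's overall predictions remain good.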
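The model can produce estimates under different scenarios (the configuration above is predicted at about 7.5 million) and fits the data closely (adjusted R² = 0.942, residual standard error of about 1.23 million). This makes it a useful tool for predicting housing prices, although it could be improved by including other relevant variables.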
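One way to check whether additional variables actually improve a model is to compare nested fits with an F-test (`anova()`) and with `AIC()`. A minimal sketch on the built-in `mtcars` data (stand-in data; the variables are illustrative). The same calls would apply to `regresion` and a reduced or extended formula, provided both models are fitted to the same rows (for example after `na.omit()`), because `anova()` requires identical data.

```r
# Reduced and extended models fitted to the same rows
m_base <- lm(mpg ~ wt, data = mtcars)
m_ext  <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)

# F-test: do hp and cyl significantly reduce the residual sum of squares?
comparacion <- anova(m_base, m_ext)
print(comparacion)

# AIC: lower is better; it penalizes the extra parameters
print(AIC(m_base, m_ext))
```

The F-test answers whether the added terms help beyond chance, while AIC weighs fit against complexity; for pure prediction, cross-validated error is a further option.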