Linear Regression

library(readr)
data <- read_csv("HousePriceData.csv")
## Rows: 905 Columns: 10
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Parking, City_Category
## dbl (8): Observation, Dist_Taxi, Dist_Market, Dist_Hospital, Carpet, Builtup...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Understanding the data

str(data)
## spc_tbl_ [905 × 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Observation  : num [1:905] 1 2 3 4 5 6 7 8 9 10 ...
##  $ Dist_Taxi    : num [1:905] 9796 8294 11001 8301 10510 ...
##  $ Dist_Market  : num [1:905] 5250 8186 14399 11188 12629 ...
##  $ Dist_Hospital: num [1:905] 10703 12694 16991 12289 13921 ...
##  $ Carpet       : num [1:905] 1659 1461 1340 1451 1770 ...
##  $ Builtup      : num [1:905] 1961 1752 1609 1748 2111 ...
##  $ Parking      : chr [1:905] "Open" "Not Provided" "Not Provided" "Covered" ...
##  $ City_Category: chr [1:905] "CAT B" "CAT B" "CAT A" "CAT B" ...
##  $ Rainfall     : num [1:905] 530 210 720 620 450 760 1030 1020 680 1130 ...
##  $ House_Price  : num [1:905] 6649000 3982000 5401000 5373000 4662000 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Observation = col_double(),
##   ..   Dist_Taxi = col_double(),
##   ..   Dist_Market = col_double(),
##   ..   Dist_Hospital = col_double(),
##   ..   Carpet = col_double(),
##   ..   Builtup = col_double(),
##   ..   Parking = col_character(),
##   ..   City_Category = col_character(),
##   ..   Rainfall = col_double(),
##   ..   House_Price = col_double()
##   .. )
##  - attr(*, "problems")=<externalptr>
summary(data)
##   Observation      Dist_Taxi      Dist_Market    Dist_Hospital  
##  Min.   :  1.0   Min.   :  146   Min.   : 1666   Min.   : 3227  
##  1st Qu.:237.0   1st Qu.: 6477   1st Qu.: 9367   1st Qu.:11302  
##  Median :469.0   Median : 8228   Median :11149   Median :13189  
##  Mean   :468.4   Mean   : 8235   Mean   :11022   Mean   :13091  
##  3rd Qu.:700.0   3rd Qu.: 9939   3rd Qu.:12675   3rd Qu.:14855  
##  Max.   :932.0   Max.   :20662   Max.   :20945   Max.   :23294  
##                                                                 
##      Carpet         Builtup        Parking          City_Category     
##  Min.   :  775   Min.   :  932   Length:905         Length:905        
##  1st Qu.: 1317   1st Qu.: 1579   Class :character   Class :character  
##  Median : 1478   Median : 1774   Mode  :character   Mode  :character  
##  Mean   : 1511   Mean   : 1794                                        
##  3rd Qu.: 1654   3rd Qu.: 1985                                        
##  Max.   :24300   Max.   :12730                                        
##  NA's   :7                                                            
##     Rainfall       House_Price       
##  Min.   :-110.0   Min.   :  1492000  
##  1st Qu.: 600.0   1st Qu.:  4623000  
##  Median : 780.0   Median :  5860000  
##  Mean   : 786.9   Mean   :  6083992  
##  3rd Qu.: 970.0   3rd Qu.:  7200000  
##  Max.   :1560.0   Max.   :150000000  
## 
head(data)
## # A tibble: 6 × 10
##   Observation Dist_Taxi Dist_Market Dist_Hospital Carpet Builtup Parking     
##         <dbl>     <dbl>       <dbl>         <dbl>  <dbl>   <dbl> <chr>       
## 1           1      9796        5250         10703   1659    1961 Open        
## 2           2      8294        8186         12694   1461    1752 Not Provided
## 3           3     11001       14399         16991   1340    1609 Not Provided
## 4           4      8301       11188         12289   1451    1748 Covered     
## 5           5     10510       12629         13921   1770    2111 Not Provided
## 6           6      6665        5142          9972   1442    1733 Open        
## # ℹ 3 more variables: City_Category <chr>, Rainfall <dbl>, House_Price <dbl>
colnames(data)
##  [1] "Observation"   "Dist_Taxi"     "Dist_Market"   "Dist_Hospital"
##  [5] "Carpet"        "Builtup"       "Parking"       "City_Category"
##  [9] "Rainfall"      "House_Price"
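
The summary above hints at a few things worth checking before modelling: Carpet has 7 missing values, Rainfall has a negative minimum, and the maxima of Carpet and House_Price sit far above their third quartiles. The following is a minimal sketch of such a screening step, not part of the original analysis. The data frame `toy` and the helper `is_high_outlier` are hypothetical names, the rows are made up for illustration, and the 1.5 × IQR rule is only one common rule of thumb.

```r
# Hypothetical toy rows that mimic three columns of HousePriceData.csv;
# the combination of values is made up for illustration only.
toy <- data.frame(
  Carpet      = c(1659, 1461, NA, 1451, 24300),
  Rainfall    = c(530, 210, 720, -110, 450),
  House_Price = c(6649000, 3982000, 5401000, 5373000, 150000000)
)

# Count missing values per column.
na_counts <- colSums(is.na(toy))

# Flag values above Q3 + 1.5 * IQR (assumed rule of thumb, not from the source).
is_high_outlier <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  !is.na(x) & x > q[[2]] + 1.5 * (q[[2]] - q[[1]])
}

# Row indices of suspicious observations.
price_outliers    <- which(is_high_outlier(toy$House_Price))
negative_rainfall <- which(toy$Rainfall < 0)

print(na_counts)
print(price_outliers)
print(negative_rainfall)
```

On the real data the same calls could be applied to `data` directly. Whether flagged rows should be dropped, corrected, or kept is a judgment call that depends on how the data were collected.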

Fitting the model

regresion <- lm(House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + Carpet + Builtup +
                  factor(Parking) + factor(City_Category) + Rainfall, data = data)
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + Builtup + factor(Parking) + factor(City_Category) + 
##     Rainfall, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3586934  -837542   -65314   784513  4577689 
## 
## Coefficients:
##                               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  5.568e+06  3.688e+05  15.097  < 2e-16 ***
## Dist_Taxi                    2.834e+01  2.694e+01   1.052   0.2931    
## Dist_Market                  1.237e+01  2.089e+01   0.592   0.5538    
## Dist_Hospital                5.071e+01  3.021e+01   1.679   0.0936 .  
## Carpet                       9.907e+03  1.428e+02  69.398  < 2e-16 ***
## Builtup                     -7.575e+03  2.412e+02 -31.403  < 2e-16 ***
## factor(Parking)No Parking   -6.170e+05  1.393e+05  -4.429 1.06e-05 ***
## factor(Parking)Not Provided -5.077e+05  1.239e+05  -4.096 4.58e-05 ***
## factor(Parking)Open         -2.597e+05  1.131e+05  -2.297   0.0218 *  
## factor(City_Category)CAT B  -1.883e+06  9.641e+04 -19.529  < 2e-16 ***
## factor(City_Category)CAT C  -2.902e+06  1.062e+05 -27.321  < 2e-16 ***
## Rainfall                    -9.984e+01  1.548e+02  -0.645   0.5191    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1228000 on 886 degrees of freedom
##   (7 observations deleted due to missingness)
## Multiple R-squared:  0.9429, Adjusted R-squared:  0.9422 
## F-statistic:  1329 on 11 and 886 DF,  p-value: < 2.2e-16
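
Carpet and Builtup enter the model with opposite signs, and in the rows printed by `head(data)` Builtup is roughly 1.2 times Carpet. That pattern may indicate collinearity between the two predictors. The sketch below shows one way to check for it, using simulated data rather than the real dataset. The data-generating process, the object names (`sim`, `fit`, `vif_carpet`), and all numeric settings are assumptions made for illustration.

```r
set.seed(1)  # reproducible simulated data, NOT the real dataset
n <- 200
Carpet  <- rnorm(n, mean = 1500, sd = 250)
Builtup <- 1.2 * Carpet + rnorm(n, sd = 10)    # nearly a multiple of Carpet
Price   <- 4000 * Carpet + rnorm(n, sd = 5e5)  # hypothetical price process

sim <- data.frame(Carpet, Builtup, Price)
fit <- lm(Price ~ Carpet + Builtup, data = sim)

# Correlation between the two predictors.
r <- cor(sim$Carpet, sim$Builtup)

# Variance inflation factor for Carpet: 1 / (1 - R^2) of Carpet ~ Builtup.
vif_carpet <- 1 / (1 - summary(lm(Carpet ~ Builtup, data = sim))$r.squared)

print(r)
print(vif_carpet)
print(confint(fit))  # 95% confidence intervals for the coefficients
```

A correlation close to 1 and a large variance inflation factor would suggest that the individual Carpet and Builtup coefficients are hard to interpret separately, even when the model as a whole fits well.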

Conclusions

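Explanatory power of the model: R² = 0.943 (adjusted R² = 0.942), so about 94% of the variance in House_Price is explained.
The model as a whole is highly significant (F = 1329 on 11 and 886 DF, p < 2.2e-16). Carpet is highly significant with a positive coefficient, while Builtup is just as significant but with a negative one. Dist_Hospital also has a positive coefficient, but with p = 0.094 it is significant only at the 10% level, not at the conventional 5%; Dist_Taxi, Dist_Market and Rainfall are not significant. Every Parking and City_Category term is significant at the 5% level (all at 0.1% except Parking = Open, p = 0.022) and negative. These coefficients are measured against the reference levels, Covered parking and CAT A, so they indicate lower expected prices than that baseline rather than a negative effect in absolute terms.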
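
The Parking and City_Category coefficients only make sense relative to a reference level. As a small illustration, the sketch below shows how R picks that level for a factor and how it could be changed. The vector `parking` is a hypothetical stand-in that reuses the four Parking labels seen in the output, not the real column.

```r
# Parking labels as they appear in the data and model output.
parking <- factor(c("Open", "Not Provided", "Covered", "No Parking"))

# Levels are sorted alphabetically; the first one is the reference level
# that lm() absorbs into the intercept.
print(levels(parking))

# Choosing a different baseline changes the dummy coefficients
# but not the fitted values of the model.
parking_np <- relevel(parking, ref = "No Parking")
print(levels(parking_np))
```

With "No Parking" as the baseline, the remaining Parking coefficients would be expressed as differences from houses without parking.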
