House price data

1. Importar Bases de datos

df<- read.csv("HousePriceData.csv")

2. Entender la base de datos

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.3
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
summary(df)
##  ï..Observation    Dist_Taxi      Dist_Market    Dist_Hospital  
##  Min.   :  1.0   Min.   :  146   Min.   : 1666   Min.   : 3227  
##  1st Qu.:237.0   1st Qu.: 6477   1st Qu.: 9367   1st Qu.:11302  
##  Median :469.0   Median : 8228   Median :11149   Median :13189  
##  Mean   :468.4   Mean   : 8235   Mean   :11022   Mean   :13091  
##  3rd Qu.:700.0   3rd Qu.: 9939   3rd Qu.:12675   3rd Qu.:14855  
##  Max.   :932.0   Max.   :20662   Max.   :20945   Max.   :23294  
##                                                                 
##      Carpet         Builtup        Parking          City_Category     
##  Min.   :  775   Min.   :  932   Length:905         Length:905        
##  1st Qu.: 1317   1st Qu.: 1579   Class :character   Class :character  
##  Median : 1478   Median : 1774   Mode  :character   Mode  :character  
##  Mean   : 1511   Mean   : 1794                                        
##  3rd Qu.: 1654   3rd Qu.: 1985                                        
##  Max.   :24300   Max.   :12730                                        
##  NA's   :7                                                            
##     Rainfall       House_Price       
##  Min.   :-110.0   Min.   :  1492000  
##  1st Qu.: 600.0   1st Qu.:  4623000  
##  Median : 780.0   Median :  5860000  
##  Mean   : 786.9   Mean   :  6083992  
##  3rd Qu.: 970.0   3rd Qu.:  7200000  
##  Max.   :1560.0   Max.   :150000000  
## 
count(df, Parking, sort=TRUE)
##        Parking   n
## 1         Open 355
## 2 Not Provided 225
## 3      Covered 184
## 4   No Parking 141
count(df,City_Category, sort=TRUE)
##   City_Category   n
## 1         CAT B 351
## 2         CAT A 320
## 3         CAT C 234

Observación; 1. El precio de la casa esta con datos atipicos 2. Rainfall tiene valores negativos 3. carpet tiene 7 NA

3. Limpiar datos

# ¿Cuántos NA tengo en la base de datos?
sum(is.na(df))
## [1] 7
# ¿Cuantos NA tengo por variable?
sapply(df, function(x) sum(is.na(x)))
## ï..Observation      Dist_Taxi    Dist_Market  Dist_Hospital         Carpet 
##              0              0              0              0              7 
##        Builtup        Parking  City_Category       Rainfall    House_Price 
##              0              0              0              0              0
# Eliminar NA
df<- na.omit(df)

#Eliminar el registro del precio atípico
df<- df[df$House_Price<12000000,]

# eliminar el registro de lluvia atípico 
df<- df[df$Rainfall>=0,]

#Gráficas 
boxplot(df$House_Price, horizontal=TRUE)

4. Generar la regresión lineal

regresion <- lm(House_Price ~ Dist_Taxi + Dist_Market +Dist_Hospital +Carpet + City_Category + Rainfall +Builtup +Parking, data=df )
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + City_Category + Rainfall + Builtup + Parking, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3572009  -800792   -65720   761534  4401585 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          5.599e+06  3.672e+05  15.246  < 2e-16 ***
## Dist_Taxi            3.009e+01  2.682e+01   1.122   0.2622    
## Dist_Market          1.285e+01  2.081e+01   0.618   0.5370    
## Dist_Hospital        4.864e+01  3.008e+01   1.617   0.1062    
## Carpet              -7.997e+02  3.476e+03  -0.230   0.8181    
## City_CategoryCAT B  -1.877e+06  9.598e+04 -19.554  < 2e-16 ***
## City_CategoryCAT C  -2.890e+06  1.059e+05 -27.300  < 2e-16 ***
## Rainfall            -1.175e+02  1.550e+02  -0.758   0.4484    
## Builtup              1.339e+03  2.901e+03   0.462   0.6444    
## ParkingNo Parking   -6.040e+05  1.389e+05  -4.348 1.53e-05 ***
## ParkingNot Provided -4.924e+05  1.235e+05  -3.988 7.22e-05 ***
## ParkingOpen         -2.632e+05  1.126e+05  -2.338   0.0196 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1222000 on 884 degrees of freedom
## Multiple R-squared:  0.5007, Adjusted R-squared:  0.4945 
## F-statistic: 80.58 on 11 and 884 DF,  p-value: < 2.2e-16

5. Ajustar el modelo de regresión lineal

regresion <- lm(House_Price ~ Dist_Taxi + Dist_Market +Dist_Hospital +Carpet + City_Category + Rainfall +Builtup +Parking, data=df )
summary(regresion)
## 
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + 
##     Carpet + City_Category + Rainfall + Builtup + Parking, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3572009  -800792   -65720   761534  4401585 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          5.599e+06  3.672e+05  15.246  < 2e-16 ***
## Dist_Taxi            3.009e+01  2.682e+01   1.122   0.2622    
## Dist_Market          1.285e+01  2.081e+01   0.618   0.5370    
## Dist_Hospital        4.864e+01  3.008e+01   1.617   0.1062    
## Carpet              -7.997e+02  3.476e+03  -0.230   0.8181    
## City_CategoryCAT B  -1.877e+06  9.598e+04 -19.554  < 2e-16 ***
## City_CategoryCAT C  -2.890e+06  1.059e+05 -27.300  < 2e-16 ***
## Rainfall            -1.175e+02  1.550e+02  -0.758   0.4484    
## Builtup              1.339e+03  2.901e+03   0.462   0.6444    
## ParkingNo Parking   -6.040e+05  1.389e+05  -4.348 1.53e-05 ***
## ParkingNot Provided -4.924e+05  1.235e+05  -3.988 7.22e-05 ***
## ParkingOpen         -2.632e+05  1.126e+05  -2.338   0.0196 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1222000 on 884 degrees of freedom
## Multiple R-squared:  0.5007, Adjusted R-squared:  0.4945 
## F-statistic: 80.58 on 11 and 884 DF,  p-value: < 2.2e-16

6. Construir un modelo predictivo

datos<- data.frame(Dist_Taxi=8278, Dist_Market=16251, Dist_Hospital=13857, Carpet=1455, Builtup=1764, Parking="Covered", City_Category="CAT A", Rainfall=390)
predict(regresion,datos)
##       1 
## 7883860

Importar datos

df<- read.csv("HousePriceData.csv")

Limpiar bases de datos

# ¿Cuántos NA tengo en la base de datos?
sum(is.na(df))
## [1] 7
# ¿Cuantos NA tengo por variable?
sapply(df, function(x) sum(is.na(x)))
## ï..Observation      Dist_Taxi    Dist_Market  Dist_Hospital         Carpet 
##              0              0              0              0              7 
##        Builtup        Parking  City_Category       Rainfall    House_Price 
##              0              0              0              0              0
# Eliminar NA
df<- na.omit(df)

#Eliminar el registro del precio atípico
df<- df[df$House_Price<12000000,]

# eliminar el registro de lluvia atípico 
df<- df[df$Rainfall>=0,]
# Instala y carga la biblioteca Shiny
if (!require(shiny)) {
  install.packages("shiny")
  library(shiny)

}
## Loading required package: shiny
# Define la interfaz de la aplicación Shiny
ui <- fluidPage(
  titlePanel("Predicción de Precio de Casa"),
  sidebarLayout(
    sidebarPanel(
      numericInput("Dist_Taxi", "Distancia al Taxi:", value = 8278),
      numericInput("Dist_Market", "Distancia al Mercado:", value = 16251),
      numericInput("Dist_Hospital", "Distancia al Hospital:", value = 13857),
      numericInput("Carpet", "Área de la Alfombra:", value = 1455),
      selectInput("Parking", "Tipo de Estacionamiento:",
                  choices = unique(df$Parking)),
      selectInput("City_Category", "Categoría de la Ciudad:",
                  choices = unique(df$City_Category)),
      numericInput("Rainfall", "Lluvia:", value = 390),
      numericInput("Builtup", "Área Construida:", value = 1764), # Agrega el input para Builtup
      actionButton("submitBtn", "Obtener Precio"),
      hr()
    ),
    mainPanel(
      verbatimTextOutput("predictionText")
    )
  )
)

# Define la función de servidor para la aplicación Shiny
server <- function(input, output) {
  model <- NULL
  
  observeEvent(input$submitBtn, {
    datos <- data.frame(
      Dist_Taxi = input$Dist_Taxi,
      Dist_Market = input$Dist_Market,
      Dist_Hospital = input$Dist_Hospital,
      Carpet = input$Carpet,
      Parking = input$Parking,
      City_Category = input$City_Category,
      Rainfall = input$Rainfall,
      Builtup = input$Builtup # Utiliza el valor ingresado por el usuario para Builtup
    )
    
    # Generar el modelo de regresión lineal con los datos actuales
    model <- lm(House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital +
                Carpet + Parking + City_Category + Rainfall + Builtup, data = df)
    
    prediction <- predict(model, datos)
    output$predictionText <- renderText({
      paste("Precio de la Casa Estimado:", round(prediction, 2))
    })
  })
}

# Crea la aplicación Shiny
shinyApp(ui, server)
Shiny applications not supported in static R Markdown documents
LS0tDQp0aXRsZTogIlJlbnRhIGRlIEJpY2lzIg0KYXV0aG9yOiAiQW5kcsOpcyBHYXJjw61hX0EwMTE5NzQxMSINCmRhdGU6ICI5LzE5LzIwMjMiDQpvdXRwdXQ6IA0KICBodG1sX2RvY3VtZW50Og0KICAgIHRvYzogVFJVRQ0KICAgIHRvY19mbG9hdDogVFJVRQ0KICAgIGNvZGVfZG93bmxvYWQ6IFRSVUUNCiAgICB0aGVtZTogInNpbXBsZXgiDQogICAgaGlnaGxpZ2h0OiAicHlnbWVudHMiDQotLS0NCg0KDQojIDxzcGFuIHN0eWxlPSJjb2xvcjogcmVkOyI+SG91c2UgcHJpY2UgZGF0YTwvc3Bhbj4NCg0KIVtdKGNhc2EuanBnKQ0KDQojIyAxLiBJbXBvcnRhciBCYXNlcyBkZSBkYXRvcw0KDQpgYGB7cn0NCmRmPC0gcmVhZC5jc3YoIkhvdXNlUHJpY2VEYXRhLmNzdiIpDQoNCmBgYA0KDQojIyAyLiBFbnRlbmRlciBsYSBiYXNlIGRlIGRhdG9zDQoNCmBgYHtyfQ0KbGlicmFyeShkcGx5cikNCnN1bW1hcnkoZGYpDQpjb3VudChkZiwgUGFya2luZywgc29ydD1UUlVFKQ0KY291bnQoZGYsQ2l0eV9DYXRlZ29yeSwgc29ydD1UUlVFKQ0KDQpgYGANCk9ic2VydmFjacOzbjsNCjEuIEVsIHByZWNpbyBkZSBsYSBjYXNhIGVzdGEgY29uIGRhdG9zIGF0aXBpY29zDQoyLiBSYWluZmFsbCB0aWVuZSB2YWxvcmVzIG5lZ2F0aXZvcw0KMy4gY2FycGV0IHRpZW5lIDcgTkEgDQoNCiMjIDMuIExpbXBpYXIgZGF0b3MNCg0KYGBge3J9DQojIMK/Q3XDoW50b3MgTkEgdGVuZ28gZW4gbGEgYmFzZSBkZSBkYXRvcz8NCnN1bShpcy5uYShkZikpDQoNCiMgwr9DdWFudG9zIE5BIHRlbmdvIHBvciB2YXJpYWJsZT8NCnNhcHBseShkZiwgZnVuY3Rpb24oeCkgc3VtKGlzLm5hKHgpKSkNCg0KIyBFbGltaW5hciBOQQ0KZGY8LSBuYS5vbWl0KGRmKQ0KDQojRWxpbWluYXIgZWwgcmVnaXN0cm8gZGVsIHByZWNpbyBhdMOtcGljbw0KZGY8LSBkZltkZiRIb3VzZV9QcmljZTwxMjAwMDAwMCxdDQoNCiMgZWxpbWluYXIgZWwgcmVnaXN0cm8gZGUgbGx1dmlhIGF0w61waWNvIA0KZGY8LSBkZltkZiRSYWluZmFsbD49MCxdDQoNCiNHcsOhZmljYXMgDQpib3hwbG90KGRmJEhvdXNlX1ByaWNlLCBob3Jpem9udGFsPVRSVUUpDQpgYGANCg0KDQojIyA0LiBHZW5lcmFyIGxhIHJlZ3Jlc2nDs24gbGluZWFsIA0KDQpgYGB7cn0NCnJlZ3Jlc2lvbiA8LSBsbShIb3VzZV9QcmljZSB+IERpc3RfVGF4aSArIERpc3RfTWFya2V0ICtEaXN0X0hvc3BpdGFsICtDYXJwZXQgKyBDaXR5X0NhdGVnb3J5ICsgUmFpbmZhbGwgK0J1aWx0dXAgK1BhcmtpbmcsIGRhdGE9ZGYgKQ0Kc3VtbWFyeShyZWdyZXNpb24pDQpgYGANCg0KIyMgNS4gQWp1c3RhciBlbCBtb2RlbG8gZGUgcmVncmVzacOzbiBsaW5lYWwgDQoNCmBgYHtyfQ0KcmVncmVzaW9uIDwtIGxtKEhvdXNlX1ByaWNlIH4gRGlzdF9UYXhpICsgRGlzdF9NYXJrZXQgK0Rpc3RfSG9zcGl0YWwgK0NhcnBldCArIENpdHlfQ2F0ZWdvcnkgKyBSYWluZmFsbCArQnVpbHR1cCArUGFya2luZywgZGF0YT1kZiApDQpzdW1tYXJ5KHJlZ3Jlc2lvbikNCg0KYGBgDQoNCiMjIDYuIENvbnN0cnVpciB1biBtb2RlbG8gcHJlZGljdGl2byANCg0KYGBge3J9DQpkYXRvczwtIGRhdGEuZnJhbWUoRGlzdF9UYXhpPTgyNzgsIERpc3RfTWFya2V0PTE2MjUxLCBEaXN0X0hvc3BpdGFsPTEzODU3LCBDYXJwZXQ9MTQ1NSwgQnVpbHR1cD0xNzY0LCBQYXJraW5nPSJDb3ZlcmVkIiwgQ2l0eV9DYXRlZ29yeT0iQ0FUIEEiLCBSYWluZmFsbD0zOTApDQpwcmVkaWN0KHJlZ3Jlc2lvbixkYXRvcykNCmBgYA0KIyMgSW1wb3J0YXIgZGF0b3MNCmBgYHtyfQ0KDQpkZjwtIHJlYWQuY3N2KCJIb3VzZVByaWNlRGF0YS5jc3YiKQ0KDQpgYGANCg0KIyMgTGltcGlhciBiYXNlcyBkZSBkYXRvcyANCmBgYHtyfQ0KIyDCv0N1w6FudG9zIE5BIHRlbmdvIGVuIGxhIGJhc2UgZGUgZGF0b3M/DQpzdW0oaXMubmEoZGYpKQ0KDQojIMK/Q3VhbnRvcyBOQSB0ZW5nbyBwb3IgdmFyaWFibGU/DQpzYXBwbHkoZGYsIGZ1bmN0aW9uKHgpIHN1bShpcy5uYSh4KSkpDQoNCiMgRWxpbWluYXIgTkENCmRmPC0gbmEub21pdChkZikNCg0KI0VsaW1pbmFyIGVsIHJlZ2lzdHJvIGRlbCBwcmVjaW8gYXTDrXBpY28NCmRmPC0gZGZbZGYkSG91c2VfUHJpY2U8MTIwMDAwMDAsXQ0KDQojIGVsaW1pbmFyIGVsIHJlZ2lzdHJvIGRlIGxsdXZpYSBhdMOtcGljbyANCmRmPC0gZGZbZGYkUmFpbmZhbGw+PTAsXQ0KYGBgDQoNCmBgYHtyfQ0KIyBJbnN0YWxhIHkgY2FyZ2EgbGEgYmlibGlvdGVjYSBTaGlueQ0KaWYgKCFyZXF1aXJlKHNoaW55KSkgew0KICBpbnN0YWxsLnBhY2thZ2VzKCJzaGlueSIpDQogIGxpYnJhcnkoc2hpbnkpDQoNCn0NCg0KIyBEZWZpbmUgbGEgaW50ZXJmYXogZGUgbGEgYXBsaWNhY2nDs24gU2hpbnkNCnVpIDwtIGZsdWlkUGFnZSgNCiAgdGl0bGVQYW5lbCgiUHJlZGljY2nDs24gZGUgUHJlY2lvIGRlIENhc2EiKSwNCiAgc2lkZWJhckxheW91dCgNCiAgICBzaWRlYmFyUGFuZWwoDQogICAgICBudW1lcmljSW5wdXQoIkRpc3RfVGF4aSIsICJEaXN0YW5jaWEgYWwgVGF4aToiLCB2YWx1ZSA9IDgyNzgpLA0KICAgICAgbnVtZXJpY0lucHV0KCJEaXN0X01hcmtldCIsICJEaXN0YW5jaWEgYWwgTWVyY2FkbzoiLCB2YWx1ZSA9IDE2MjUxKSwNCiAgICAgIG51bWVyaWNJbnB1dCgiRGlzdF9Ib3NwaXRhbCIsICJEaXN0YW5jaWEgYWwgSG9zcGl0YWw6IiwgdmFsdWUgPSAxMzg1NyksDQogICAgICBudW1lcmljSW5wdXQoIkNhcnBldCIsICLDgXJlYSBkZSBsYSBBbGZvbWJyYToiLCB2YWx1ZSA9IDE0NTUpLA0KICAgICAgc2VsZWN0SW5wdXQoIlBhcmtpbmciLCAiVGlwbyBkZSBFc3RhY2lvbmFtaWVudG86IiwNCiAgICAgICAgICAgICAgICAgIGNob2ljZXMgPSB1bmlxdWUoZGYkUGFya2luZykpLA0KICAgICAgc2VsZWN0SW5wdXQoIkNpdHlfQ2F0ZWdvcnkiLCAiQ2F0ZWdvcsOtYSBkZSBsYSBDaXVkYWQ6IiwNCiAgICAgICAgICAgICAgICAgIGNob2ljZXMgPSB1bmlxdWUoZGYkQ2l0eV9DYXRlZ29yeSkpLA0KICAgICAgbnVtZXJpY0lucHV0KCJSYWluZmFsbCIsICJMbHV2aWE6IiwgdmFsdWUgPSAzOTApLA0KICAgICAgbnVtZXJpY0lucHV0KCJCdWlsdHVwIiwgIsOBcmVhIENvbnN0cnVpZGE6IiwgdmFsdWUgPSAxNzY0KSwgIyBBZ3JlZ2EgZWwgaW5wdXQgcGFyYSBCdWlsdHVwDQogICAgICBhY3Rpb25CdXR0b24oInN1Ym1pdEJ0biIsICJPYnRlbmVyIFByZWNpbyIpLA0KICAgICAgaHIoKQ0KICAgICksDQogICAgbWFpblBhbmVsKA0KICAgICAgdmVyYmF0aW1UZXh0T3V0cHV0KCJwcmVkaWN0aW9uVGV4dCIpDQogICAgKQ0KICApDQopDQoNCiMgRGVmaW5lIGxhIGZ1bmNpw7NuIGRlIHNlcnZpZG9yIHBhcmEgbGEgYXBsaWNhY2nDs24gU2hpbnkNCnNlcnZlciA8LSBmdW5jdGlvbihpbnB1dCwgb3V0cHV0KSB7DQogIG1vZGVsIDwtIE5VTEwNCiAgDQogIG9ic2VydmVFdmVudChpbnB1dCRzdWJtaXRCdG4sIHsNCiAgICBkYXRvcyA8LSBkYXRhLmZyYW1lKA0KICAgICAgRGlzdF9UYXhpID0gaW5wdXQkRGlzdF9UYXhpLA0KICAgICAgRGlzdF9NYXJrZXQgPSBpbnB1dCREaXN0X01hcmtldCwNCiAgICAgIERpc3RfSG9zcGl0YWwgPSBpbnB1dCREaXN0X0hvc3BpdGFsLA0KICAgICAgQ2FycGV0ID0gaW5wdXQkQ2FycGV0LA0KICAgICAgUGFya2luZyA9IGlucHV0JFBhcmtpbmcsDQogICAgICBDaXR5X0NhdGVnb3J5ID0gaW5wdXQkQ2l0eV9DYXRlZ29yeSwNCiAgICAgIFJhaW5mYWxsID0gaW5wdXQkUmFpbmZhbGwsDQogICAgICBCdWlsdHVwID0gaW5wdXQkQnVpbHR1cCAjIFV0aWxpemEgZWwgdmFsb3IgaW5ncmVzYWRvIHBvciBlbCB1c3VhcmlvIHBhcmEgQnVpbHR1cA0KICAgICkNCiAgICANCiAgICAjIEdlbmVyYXIgZWwgbW9kZWxvIGRlIHJlZ3Jlc2nDs24gbGluZWFsIGNvbiBsb3MgZGF0b3MgYWN0dWFsZXMNCiAgICBtb2RlbCA8LSBsbShIb3VzZV9QcmljZSB+IERpc3RfVGF4aSArIERpc3RfTWFya2V0ICsgRGlzdF9Ib3NwaXRhbCArDQogICAgICAgICAgICAgICAgQ2FycGV0ICsgUGFya2luZyArIENpdHlfQ2F0ZWdvcnkgKyBSYWluZmFsbCArIEJ1aWx0dXAsIGRhdGEgPSBkZikNCiAgICANCiAgICBwcmVkaWN0aW9uIDwtIHByZWRpY3QobW9kZWwsIGRhdG9zKQ0KICAgIG91dHB1dCRwcmVkaWN0aW9uVGV4dCA8LSByZW5kZXJUZXh0KHsNCiAgICAgIHBhc3RlKCJQcmVjaW8gZGUgbGEgQ2FzYSBFc3RpbWFkbzoiLCByb3VuZChwcmVkaWN0aW9uLCAyKSkNCiAgICB9KQ0KICB9KQ0KfQ0KDQojIENyZWEgbGEgYXBsaWNhY2nDs24gU2hpbnkNCnNoaW55QXBwKHVpLCBzZXJ2ZXIpDQoNCg0KYGBgDQo=