## hora dia mes año
## Min. : 0.00 Min. : 1.000 Min. : 1.000 Min. :2011
## 1st Qu.: 6.00 1st Qu.: 5.000 1st Qu.: 4.000 1st Qu.:2011
## Median :12.00 Median :10.000 Median : 7.000 Median :2012
## Mean :11.54 Mean : 9.993 Mean : 6.521 Mean :2012
## 3rd Qu.:18.00 3rd Qu.:15.000 3rd Qu.:10.000 3rd Qu.:2012
## Max. :23.00 Max. :19.000 Max. :12.000 Max. :2012
## estacion dia_de_la_semana asueto temperatura
## Min. :1.000 Min. :1.000 Min. :0.00000 Min. : 0.82
## 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:0.00000 1st Qu.:13.94
## Median :3.000 Median :4.000 Median :0.00000 Median :20.50
## Mean :2.507 Mean :4.014 Mean :0.02857 Mean :20.23
## 3rd Qu.:4.000 3rd Qu.:6.000 3rd Qu.:0.00000 3rd Qu.:26.24
## Max. :4.000 Max. :7.000 Max. :1.00000 Max. :41.00
## sensacion_termica humedad velocidad_del_viento
## Min. : 0.76 Min. : 0.00 Min. : 0.000
## 1st Qu.:16.66 1st Qu.: 47.00 1st Qu.: 7.002
## Median :24.24 Median : 62.00 Median :12.998
## Mean :23.66 Mean : 61.89 Mean :12.799
## 3rd Qu.:31.06 3rd Qu.: 77.00 3rd Qu.:16.998
## Max. :45.45 Max. :100.00 Max. :56.997
## rentas_de_no_registrados rentas_de_registrados rentas_totales
## Min. : 0.00 Min. : 0.0 Min. : 1.0
## 1st Qu.: 4.00 1st Qu.: 36.0 1st Qu.: 42.0
## Median : 17.00 Median :118.0 Median :145.0
## Mean : 36.02 Mean :155.6 Mean :191.6
## 3rd Qu.: 49.00 3rd Qu.:222.0 3rd Qu.:284.0
## Max. :367.00 Max. :886.0 Max. :977.0
Observaciones:
1. Los días llegan hasta el 19 y no hasta el 31.
2. ¿Cual es la relación de las estaciones?
R: 1 primavera, 2 verano, 3 otoño, 4 invierno. 3. ¿Cúal es la relación
de los días de la semana?
R: 1 domingo, 2 lunes, 3 martes…, 7 es sabado.
regresion <- lm(rentas_totales ~ hora + dia + mes + año + estacion + dia_de_la_semana + asueto + temperatura + sensacion_termica + humedad + velocidad_del_viento, data = db_bicis)
summary(regresion)##
## Call:
## lm(formula = rentas_totales ~ hora + dia + mes + año + estacion +
## dia_de_la_semana + asueto + temperatura + sensacion_termica +
## humedad + velocidad_del_viento, data = db_bicis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -305.52 -93.64 -27.70 61.85 649.10
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.661e+05 5.496e+03 -30.217 < 2e-16 ***
## hora 7.735e+00 2.070e-01 37.368 < 2e-16 ***
## dia 3.844e-01 2.482e-01 1.549 0.12150
## mes 9.996e+00 1.682e+00 5.943 2.89e-09 ***
## año 8.258e+01 2.732e+00 30.225 < 2e-16 ***
## estacion -7.774e+00 5.177e+00 -1.502 0.13324
## dia_de_la_semana 4.393e-01 6.918e-01 0.635 0.52545
## asueto -4.864e+00 8.365e+00 -0.582 0.56089
## temperatura 1.582e+00 1.038e+00 1.524 0.12752
## sensacion_termica 4.748e+00 9.552e-01 4.971 6.76e-07 ***
## humedad -2.115e+00 7.884e-02 -26.827 < 2e-16 ***
## velocidad_del_viento 5.582e-01 1.809e-01 3.086 0.00203 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 141.7 on 10874 degrees of freedom
## Multiple R-squared: 0.3891, Adjusted R-squared: 0.3885
## F-statistic: 629.6 on 11 and 10874 DF, p-value: < 2.2e-16
regresion <- lm(rentas_totales ~ hora + mes + año + sensacion_termica + humedad + velocidad_del_viento, data = db_bicis)
summary(regresion)##
## Call:
## lm(formula = rentas_totales ~ hora + mes + año + sensacion_termica +
## humedad + velocidad_del_viento, data = db_bicis)
##
## Residuals:
## Min 1Q Median 3Q Max
## -308.60 -93.85 -28.34 61.05 648.09
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.662e+05 5.496e+03 -30.250 < 2e-16 ***
## hora 7.734e+00 2.070e-01 37.364 < 2e-16 ***
## mes 7.574e+00 4.207e-01 18.002 < 2e-16 ***
## año 8.266e+01 2.732e+00 30.258 < 2e-16 ***
## sensacion_termica 6.172e+00 1.689e-01 36.539 < 2e-16 ***
## humedad -2.121e+00 7.858e-02 -26.988 < 2e-16 ***
## velocidad_del_viento 6.208e-01 1.771e-01 3.506 0.000457 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 141.7 on 10879 degrees of freedom
## Multiple R-squared: 0.3886, Adjusted R-squared: 0.3883
## F-statistic: 1153 on 6 and 10879 DF, p-value: < 2.2e-16
datos <- data.frame(hora=11.54, mes=1:12, año=2013, sensacion_termica= 23.66, humedad=61.89, velocidad_del_viento=12.799)
predict(regresion, datos)## 1 2 3 4 5 6 7 8
## 273.6001 281.1738 288.7475 296.3213 303.8950 311.4687 319.0424 326.6161
## 9 10 11 12
## 334.1898 341.7635 349.3372 356.9110
El modelo predictivo nos muestras las bicicletas rentadas por hora, por mes durante el proximo año que sería 2013, considerando las demás variables como promedio con una r cuadrada ajustada del 39%.
## Observation Dist_Taxi Dist_Market Dist_Hospital
## Min. : 1.0 Min. : 146 Min. : 1666 Min. : 3227
## 1st Qu.:237.0 1st Qu.: 6477 1st Qu.: 9367 1st Qu.:11302
## Median :469.0 Median : 8228 Median :11149 Median :13189
## Mean :468.4 Mean : 8235 Mean :11022 Mean :13091
## 3rd Qu.:700.0 3rd Qu.: 9939 3rd Qu.:12675 3rd Qu.:14855
## Max. :932.0 Max. :20662 Max. :20945 Max. :23294
##
## Carpet Builtup Parking City_Category
## Min. : 775 Min. : 932 Length:905 Length:905
## 1st Qu.: 1317 1st Qu.: 1579 Class :character Class :character
## Median : 1478 Median : 1774 Mode :character Mode :character
## Mean : 1511 Mean : 1794
## 3rd Qu.: 1654 3rd Qu.: 1985
## Max. :24300 Max. :12730
## NA's :7
## Rainfall House_Price
## Min. :-110.0 Min. : 1492000
## 1st Qu.: 600.0 1st Qu.: 4623000
## Median : 780.0 Median : 5860000
## Mean : 786.9 Mean : 6083992
## 3rd Qu.: 970.0 3rd Qu.: 7200000
## Max. :1560.0 Max. :150000000
##
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
## Parking n
## 1 Open 355
## 2 Not Provided 225
## 3 Covered 184
## 4 No Parking 141
## City_Category n
## 1 CAT B 351
## 2 CAT A 320
## 3 CAT C 234
Observaciones: 1. El precio maximo de casa esta con datos
atípicos.
2. Rainfall tiene valores negativos.
7. Carpet tiene 7 NA’s
## [1] 7
## Observation Dist_Taxi Dist_Market Dist_Hospital Carpet
## 0 0 0 0 7
## Builtup Parking City_Category Rainfall House_Price
## 0 0 0 0 0
#Eliminar NA's
db_casas <- na.omit(db_casas)
#Eliminar renglon
db_casas <- db_casas[db_casas$House_Price<15000000,]
boxplot(db_casas$House_Price, horizontal = TRUE)regresion_casas <- lm(House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital + Carpet + Builtup + Parking + City_Category + Rainfall, data = db_casas)
summary(regresion_casas)##
## Call:
## lm(formula = House_Price ~ Dist_Taxi + Dist_Market + Dist_Hospital +
## Carpet + Builtup + Parking + City_Category + Rainfall, data = db_casas)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3572286 -803711 -64861 759084 4399052
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.611e+06 3.681e+05 15.246 < 2e-16 ***
## Dist_Taxi 3.041e+01 2.684e+01 1.133 0.2575
## Dist_Market 1.248e+01 2.083e+01 0.599 0.5492
## Dist_Hospital 4.862e+01 3.009e+01 1.616 0.1065
## Carpet -7.734e+02 3.478e+03 -0.222 0.8241
## Builtup 1.315e+03 2.902e+03 0.453 0.6506
## ParkingNo Parking -6.046e+05 1.390e+05 -4.351 1.52e-05 ***
## ParkingNot Provided -4.898e+05 1.236e+05 -3.963 8.00e-05 ***
## ParkingOpen -2.635e+05 1.126e+05 -2.340 0.0195 *
## City_CategoryCAT B -1.875e+06 9.607e+04 -19.517 < 2e-16 ***
## City_CategoryCAT C -2.890e+06 1.059e+05 -27.291 < 2e-16 ***
## Rainfall -1.260e+02 1.558e+02 -0.809 0.4187
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1223000 on 883 degrees of freedom
## Multiple R-squared: 0.5005, Adjusted R-squared: 0.4943
## F-statistic: 80.43 on 11 and 883 DF, p-value: < 2.2e-16