Comparar modelos de supervisados a través de la aplicación de algoritmos de predicción de precios de automóviles determinando el estadístico del error cuadrático medio (rmse).
Con este modelo se responde a preguntas tales como:
¿cuáles son variables que están por encima del 90% de confianza como predictores?,
¿Cuál es el valor de R Square Adjusted o que tanto representan las variables dependientes al precio del vehículo?
Se generan predicciones con datos de validación
Se determina el estadístico RMSE para efectos de comparación
Se identifica la importancia de las variables sobre el precio
Se visualiza el árbol de regresión y sus reglas de asociación
Se identifica la importancia de las variables sobre el precio
Se generan predicciones con datos de validación
Se determina el estadístico RMSE para efectos de comparación
library(readr)
library(PerformanceAnalytics) # Para correlaciones gráficas
## Loading required package: xts
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
##
## Attaching package: 'PerformanceAnalytics'
## The following object is masked from 'package:graphics':
##
## legend
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr) # Para datos tabulares
library(kableExtra) # Para datos tabulares amigables
##
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
##
## group_rows
library(ggplot2) # Para visualizar
library(plotly) # Para visualizar
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(caret) # Para particionar
## Loading required package: lattice
library(Metrics) # Para determinar rmse
##
## Attaching package: 'Metrics'
## The following objects are masked from 'package:caret':
##
## precision, recall
library(rpart) # Para árbol
library(rpart.plot) # Para árbol
library(randomForest) # Para random forest
## randomForest 4.7-1.1
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
##
## margin
## The following object is masked from 'package:dplyr':
##
## combine
library(caret) # Para hacer divisiones o particiones
library(reshape) # Para renombrar columnas
##
## Attaching package: 'reshape'
## The following object is masked from 'package:plotly':
##
## rename
## The following object is masked from 'package:dplyr':
##
## rename
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Datos de precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
n <- nrow(datos)
set.seed(1550) # Semilla
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Datos entrenamiento
datos.entrenamiento <- datos[entrena, ] # [renglones, columna]
# Datos validación
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Datos de Entrenamient. Precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
kable(head(datos.validacion, 10), caption = "Datos de Validación. Precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 21105.00 |
| 17 | 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 41315.00 |
| 23 | 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6377.00 |
| 25 | 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6229.00 |
| 34 | 34 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1940 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 6529.00 |
| 35 | 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7129.00 |
| 43 | 43 | 1 | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | 110 | 3.15 | 3.58 | 9.1 | 100 | 5500 | 25 | 31 | 10345.00 |
| 46 | 46 | 0 | 94.5 | 155.9 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.6 | 70 | 5400 | 38 | 43 | 8916.50 |
Se construye el modelo de regresión lineal múltiple (rm)
# Modelo de regresión lineal múltiple para observar variables de importancia
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9389.1 -1617.2 -152.2 1825.8 8804.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.298e+04 1.698e+04 -2.531 0.012403 *
## symboling 2.790e+02 2.711e+02 1.029 0.305051
## wheelbase 1.783e+02 1.155e+02 1.543 0.124831
## carlength -1.113e+02 5.996e+01 -1.856 0.065472 .
## carwidth 4.493e+02 2.560e+02 1.755 0.081305 .
## carheight 1.157e+02 1.511e+02 0.765 0.445276
## curbweight 1.598e+00 1.756e+00 0.910 0.364275
## enginesize 1.318e+02 1.462e+01 9.014 8.63e-16 ***
## boreratio -1.116e+03 1.258e+03 -0.887 0.376365
## stroke -3.590e+03 8.749e+02 -4.103 6.68e-05 ***
## compressionratio 3.293e+02 8.454e+01 3.895 0.000147 ***
## horsepower 2.671e+01 1.681e+01 1.588 0.114290
## peakrpm 2.476e+00 6.806e-01 3.638 0.000377 ***
## citympg -2.888e+02 1.815e+02 -1.591 0.113629
## highwaympg 1.684e+02 1.611e+02 1.045 0.297669
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3006 on 150 degrees of freedom
## Multiple R-squared: 0.8748, Adjusted R-squared: 0.8631
## F-statistic: 74.86 on 14 and 150 DF, p-value: < 2.2e-16
En modelos lineales múltiples el estadístico Adjusted R-squared: 0.8631 significa que las variables independientes explican aproximadamente el 86.31% de la variable dependiente precio.
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 9 10 14 17 23 25 34 35
## 18317.600 17183.967 17047.429 27321.071 6972.330 7094.584 8870.448 8896.011
## 43 46 50 54 62 64 80 86
## 10325.581 6149.177 48826.395 5462.278 10001.985 12557.491 8680.409 10039.284
## 100 101 103 108 114 118 125 131
## 10230.220 10195.071 23002.193 14397.543 18122.931 18586.806 15595.630 10356.715
## 132 133 136 137 142 143 154 171
## 10748.887 14010.459 13887.894 16343.315 9008.531 8571.675 5884.568 14094.678
## 172 179 184 186 188 189 191 201
## 14150.596 22343.834 9987.443 9992.236 9540.712 11433.798 9895.001 17955.732
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Regresión Lineal Múltiple. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| precio_real | precio_predicciones | |
|---|---|---|
| 9 | 23875.00 | 18317.600 |
| 10 | 17859.17 | 17183.967 |
| 14 | 21105.00 | 17047.429 |
| 17 | 41315.00 | 27321.071 |
| 23 | 6377.00 | 6972.330 |
| 25 | 6229.00 | 7094.584 |
| 34 | 6529.00 | 8870.448 |
| 35 | 7129.00 | 8896.011 |
| 43 | 10345.00 | 10325.581 |
| 46 | 8916.50 | 6149.177 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3905.862
Se construye el modelo de árbol de regresión (ar)
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10825250000 13409.500
## 2) enginesize< 182 149 3147788000 11227.960
## 4) curbweight< 2544 96 564450600 8511.505
## 8) horsepower< 83 56 70852890 7098.795 *
## 9) horsepower>=83 40 225368800 10489.300 *
## 5) curbweight>=2544 53 591803700 16148.340 *
## 3) enginesize>=182 16 364824800 33725.030 *
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 9 10 14 17 23 25 34 35
## 16148.340 16148.340 16148.340 33725.031 7098.795 7098.795 7098.795 7098.795
## 43 46 50 54 62 64 80 86
## 10489.300 7098.795 33725.031 7098.795 10489.300 7098.795 10489.300 10489.300
## 100 101 103 108 114 118 125 131
## 10489.300 10489.300 16148.340 16148.340 16148.340 16148.340 16148.340 16148.340
## 132 133 136 137 142 143 154 171
## 10489.300 16148.340 16148.340 16148.340 7098.795 7098.795 7098.795 16148.340
## 172 179 184 186 188 189 191 201
## 16148.340 16148.340 10489.300 10489.300 7098.795 10489.300 10489.300 16148.340
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
comparaciones
## precio_real precio_predicciones
## 9 23875.00 16148.340
## 10 17859.17 16148.340
## 14 21105.00 16148.340
## 17 41315.00 33725.031
## 23 6377.00 7098.795
## 25 6229.00 7098.795
## 34 6529.00 7098.795
## 35 7129.00 7098.795
## 43 10345.00 10489.300
## 46 8916.50 7098.795
## 50 36000.00 33725.031
## 54 6695.00 7098.795
## 62 10595.00 10489.300
## 64 10795.00 7098.795
## 80 7689.00 10489.300
## 86 6989.00 10489.300
## 100 8949.00 10489.300
## 101 9549.00 10489.300
## 103 14399.00 16148.340
## 108 11900.00 16148.340
## 114 16695.00 16148.340
## 118 18150.00 16148.340
## 125 12764.00 16148.340
## 131 9295.00 16148.340
## 132 9895.00 10489.300
## 133 11850.00 16148.340
## 136 15510.00 16148.340
## 137 18150.00 16148.340
## 142 7126.00 7098.795
## 143 7775.00 7098.795
## 154 6918.00 7098.795
## 171 11199.00 16148.340
## 172 11549.00 16148.340
## 179 16558.00 16148.340
## 184 7975.00 10489.300
## 186 8195.00 10489.300
## 188 9495.00 7098.795
## 189 9995.00 10489.300
## 191 9980.00 10489.300
## 201 16845.00 16148.340
kable(head(comparaciones, 10), caption = "Arbol de regresión. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| precio_real | precio_predicciones | |
|---|---|---|
| 9 | 23875.00 | 16148.340 |
| 10 | 17859.17 | 16148.340 |
| 14 | 21105.00 | 16148.340 |
| 17 | 41315.00 | 33725.031 |
| 23 | 6377.00 | 7098.795 |
| 25 | 6229.00 | 7098.795 |
| 34 | 6529.00 | 7098.795 |
| 35 | 7129.00 | 7098.795 |
| 43 | 10345.00 | 10489.300 |
| 46 | 8916.50 | 7098.795 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3015.785
Se construye el modelo de árbol de regresión (ar)
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 6550318
## % Var explained: 90.02
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## curbweight 25400976.23 3102680945
## enginesize 21940748.55 2123405491
## horsepower 10922012.68 1870139587
## citympg 11467944.21 1188181389
## wheelbase 3355907.31 371131971
## carwidth 4336736.33 338290517
## carlength 2439267.55 301153955
## peakrpm 2398956.81 226756639
## highwaympg 4698141.92 195388095
## compressionratio 2096614.59 127380376
## boreratio -57053.15 67403968
## stroke 318685.30 47237413
## carheight 444491.11 37384698
## symboling 64593.47 29340811
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 9 10 14 17 23 25 34 35
## 20922.131 20297.400 18917.074 33424.783 6164.989 6576.668 6565.689 6746.194
## 43 46 50 54 62 64 80 86
## 9272.921 6975.430 37715.650 6604.697 9351.841 9908.840 8031.698 8658.757
## 100 101 103 108 114 118 125 131
## 9136.484 9136.484 18727.855 15535.743 18362.303 16127.839 13568.976 11820.068
## 132 133 136 137 142 143 154 171
## 10776.856 13124.008 14994.968 16930.377 7981.490 8230.122 7188.639 10344.724
## 172 179 184 186 188 189 191 201
## 13102.698 16353.359 8438.955 8438.955 8331.664 9409.574 9285.293 20622.791
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random Forest. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| precio_real | precio_predicciones | |
|---|---|---|
| 9 | 23875.00 | 20922.131 |
| 10 | 17859.17 | 20297.400 |
| 14 | 21105.00 | 18917.074 |
| 17 | 41315.00 | 33424.783 |
| 23 | 6377.00 | 6164.989 |
| 25 | 6229.00 | 6576.668 |
| 34 | 6529.00 | 6565.689 |
| 35 | 7129.00 | 6746.194 |
| 43 | 10345.00 | 9272.921 |
| 46 | 8916.50 | 6975.430 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2038.738
Se comparan las predicciones
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
Se visualizan las predicciones de cada modelo
kable(comparaciones, caption = "Predicciones de los modelos") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.30 | 140 | 5500 | 17 | 20 | 23875.00 | 18317.600 | 16148.340 | 20922.131 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.00 | 160 | 5500 | 16 | 22 | 17859.17 | 17183.967 | 16148.340 | 20297.400 |
| 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105.00 | 17047.429 | 16148.340 | 18917.074 |
| 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 16 | 22 | 41315.00 | 27321.071 | 33725.031 | 33424.783 |
| 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6377.00 | 6972.330 | 7098.795 | 6164.989 |
| 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6229.00 | 7094.584 | 7098.795 | 6576.668 |
| 34 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1940 | 92 | 2.91 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 6529.00 | 8870.448 | 7098.795 | 6565.689 |
| 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7129.00 | 8896.011 | 7098.795 | 6746.194 |
| 43 | 1 | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | 110 | 3.15 | 3.58 | 9.10 | 100 | 5500 | 25 | 31 | 10345.00 | 10325.581 | 10489.300 | 9272.921 |
| 46 | 0 | 94.5 | 155.9 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 8916.50 | 6149.177 | 7098.795 | 6975.430 |
| 50 | 0 | 102.0 | 191.7 | 70.6 | 47.8 | 3950 | 326 | 3.54 | 2.76 | 11.50 | 262 | 5000 | 13 | 17 | 36000.00 | 48826.395 | 33725.031 | 37715.650 |
| 54 | 1 | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | 91 | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6695.00 | 5462.278 | 7098.795 | 6604.697 |
| 62 | 1 | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | 122 | 3.39 | 3.39 | 8.60 | 84 | 4800 | 26 | 32 | 10595.00 | 10001.985 | 10489.300 | 9351.841 |
| 64 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2443 | 122 | 3.39 | 3.39 | 22.70 | 64 | 4650 | 36 | 42 | 10795.00 | 12557.491 | 7098.795 | 9908.840 |
| 80 | 1 | 93.0 | 157.3 | 63.8 | 50.8 | 2145 | 98 | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7689.00 | 8680.409 | 10489.300 | 8031.698 |
| 86 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | 122 | 3.35 | 3.46 | 8.50 | 88 | 5000 | 25 | 32 | 6989.00 | 10039.284 | 10489.300 | 8658.757 |
| 100 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | 120 | 3.33 | 3.47 | 8.50 | 97 | 5200 | 27 | 34 | 8949.00 | 10230.220 | 10489.300 | 9136.484 |
| 101 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2302 | 120 | 3.33 | 3.47 | 8.50 | 97 | 5200 | 27 | 34 | 9549.00 | 10195.071 | 10489.300 | 9136.484 |
| 103 | 0 | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | 181 | 3.43 | 3.27 | 9.00 | 152 | 5200 | 17 | 22 | 14399.00 | 23002.193 | 16148.340 | 18727.855 |
| 108 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3020 | 120 | 3.46 | 3.19 | 8.40 | 97 | 5000 | 19 | 24 | 11900.00 | 14397.543 | 16148.340 | 15535.743 |
| 114 | 0 | 114.2 | 198.9 | 68.4 | 56.7 | 3285 | 120 | 3.46 | 2.19 | 8.40 | 95 | 5000 | 19 | 24 | 16695.00 | 18122.931 | 16148.340 | 18362.303 |
| 118 | 0 | 108.0 | 186.7 | 68.3 | 56.0 | 3130 | 134 | 3.61 | 3.21 | 7.00 | 142 | 5600 | 18 | 24 | 18150.00 | 18586.806 | 16148.340 | 16127.839 |
| 125 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | 156 | 3.59 | 3.86 | 7.00 | 145 | 5000 | 19 | 24 | 12764.00 | 15595.630 | 16148.340 | 13568.976 |
| 131 | 0 | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | 132 | 3.46 | 3.90 | 8.70 | 90 | 5100 | 23 | 31 | 9295.00 | 10356.715 | 16148.340 | 11820.068 |
| 132 | 2 | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | 132 | 3.46 | 3.90 | 8.70 | 90 | 5100 | 23 | 31 | 9895.00 | 10748.887 | 10489.300 | 10776.856 |
| 133 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | 121 | 3.54 | 3.07 | 9.31 | 110 | 5250 | 21 | 28 | 11850.00 | 14010.459 | 16148.340 | 13124.008 |
| 136 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | 121 | 3.54 | 3.07 | 9.30 | 110 | 5250 | 21 | 28 | 15510.00 | 13887.894 | 16148.340 | 14994.968 |
| 137 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2808 | 121 | 3.54 | 3.07 | 9.00 | 160 | 5500 | 19 | 26 | 18150.00 | 16343.315 | 16148.340 | 16930.377 |
| 142 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2145 | 108 | 3.62 | 2.64 | 9.50 | 82 | 4800 | 32 | 37 | 7126.00 | 9008.531 | 7098.795 | 7981.490 |
| 143 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | 108 | 3.62 | 2.64 | 9.50 | 82 | 4400 | 28 | 33 | 7775.00 | 8571.675 | 7098.795 | 8230.122 |
| 154 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | 92 | 3.05 | 3.03 | 9.00 | 62 | 4800 | 31 | 37 | 6918.00 | 5884.568 | 7098.795 | 7188.639 |
| 171 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2679 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 11199.00 | 14094.678 | 16148.340 | 10344.724 |
| 172 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2714 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 11549.00 | 14150.596 | 16148.340 | 13102.698 |
| 179 | 3 | 102.9 | 183.5 | 67.7 | 52.0 | 2976 | 171 | 3.27 | 3.35 | 9.30 | 161 | 5200 | 20 | 24 | 16558.00 | 22343.834 | 16148.340 | 16353.359 |
| 184 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2209 | 109 | 3.19 | 3.40 | 9.00 | 85 | 5250 | 27 | 34 | 7975.00 | 9987.443 | 10489.300 | 8438.955 |
| 186 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2212 | 109 | 3.19 | 3.40 | 9.00 | 85 | 5250 | 27 | 34 | 8195.00 | 9992.236 | 10489.300 | 8438.955 |
| 188 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | 97 | 3.01 | 3.40 | 23.00 | 68 | 4500 | 37 | 42 | 9495.00 | 9540.712 | 7098.795 | 8331.664 |
| 189 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2300 | 109 | 3.19 | 3.40 | 10.00 | 100 | 5500 | 26 | 32 | 9995.00 | 11433.798 | 10489.300 | 9409.574 |
| 191 | 3 | 94.5 | 165.7 | 64.0 | 51.4 | 2221 | 109 | 3.19 | 3.40 | 8.50 | 90 | 5500 | 24 | 29 | 9980.00 | 9895.001 | 10489.300 | 9285.293 |
| 201 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 2952 | 141 | 3.78 | 3.15 | 9.50 | 114 | 5400 | 23 | 28 | 16845.00 | 17955.732 | 16148.340 | 20622.791 |
Se compara el RMSE
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "Estadístico RMSE de cada modelo") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3905.862 | 3015.785 | 2038.738 |
Se cargaron datos numéricos de precios de automóviles basados en algunas variables numéricas. Se registraron un total de 205 observaciones, de las cuales se usó un 80% de los datos específicamente para la construcción del modelo de entrenamiento, mientras que el 20% restante se usará para la validación del mismo.
Este ejercicio realizará la construcción de 3 modelos estádisticos:
Modelo de regresión lineal múltiple (RM)
Modelo de árbol de regresión (AR)
Modelo de bosques aleatorios (RF)
El modelo de bosques aleatorios resultó ser el modelo más confiable según la semilla 1550. Ya que este obtuvo un RMSE de 2038.738, mientras que los modelos de árbol de regresión y de regresión lineal múltiple, obtuvieron un total de 3015.785 y 3905.862 de RMSE, respectivamente.
Del modelo de regresión líneal múltiple, se obtiene un valor de Adjusted R-Squared del 0.8631, osease, un porcentaje del 86.31% de las variables que dependen de la variable dependiente, que en este caso es: dinero.
El modelo de árbol de regresión sus variables de importancia fueron: enginesize, highwaympg, curbweight y horsepower.
El modelo de bosque aleatorio considera variables de importancia tales como: enginesize, curbweight, horsepower, citympg y carwidth.
A destacar la variable enginesize en todos los modelos como importante y significativa y las variables enginesize, curbweight y horsepower como importantes en los modelos árbol de regresión y bosque aleatorio.
rpart.plot(modelo_ar)
Según el anterior modelo de árbol, la variable predominante es la de enginesize. Según lo indica la gráfica, sólo el 10% de los datos, los cuales tienen un precio de alrededor de 34,000, cumplen el requisito de que el tamaño de su motor es mayor a las 182 unidades. El siguiente punto de inflexión es el curbweight, el cual determina que si este valor es mayor a 2544, entonces el precio rondará los 16,000%, en caso de que sea lo contrario, el precio bajará a un promedio de 8512$.