Compare supervised models by applying car-price prediction algorithms and evaluating each one with the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the engine (Numeric), in … |
| 8 | boreratio | Bore ratio of the engine (Numeric). Relates to the cylinder bore. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Relates to the pistons, strokes, and combustion. |
| 10 | compressionratio | Compression ratio of the engine (Numeric). |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak engine speed (Numeric), in revolutions per minute. |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption. |
| 16 | price (Dependent variable) | Price of the car (Numeric), in dollars. |
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data are 80% of the observations and the validation data are the remaining 20%.
n <- nrow(datos)
set.seed(1287) # Seed
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 17 | 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 41315.00 |
| 18 | 18 | 0 | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 15 | 20 | 36880.00 |
| 19 | 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.03 | 9.5 | 48 | 5100 | 47 | 53 | 5151.00 |
| 23 | 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6377.00 |
| 34 | 34 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1940 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 6529.00 |
| 35 | 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7129.00 |
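As a rough illustration of the 80/20 split above, the following base-R sketch partitions a data frame by row index and checks that the two parts are disjoint and together cover every row. It is only a stand-in under stated assumptions: it uses the built-in `mtcars` data set and plain `sample()` instead of the car-price data and `createDataPartition()`, which additionally balances the split across the distribution of the outcome. The names `datos_demo`, `idx`, `entrenamiento_demo`, and `validacion_demo` are hypothetical.

```r
# Sketch only: simple random 80/20 split in base R (not caret's balanced split).
datos_demo <- mtcars              # stand-in data set shipped with R (assumption)
set.seed(1287)                    # same seed value as in the document

n_demo <- nrow(datos_demo)
idx <- sample(seq_len(n_demo), size = floor(0.80 * n_demo))  # training row indices

entrenamiento_demo <- datos_demo[idx, ]    # about 80% of the rows
validacion_demo    <- datos_demo[-idx, ]   # the remaining rows

# The two parts never share a row and together cover the whole data set.
nrow(entrenamiento_demo) + nrow(validacion_demo) == n_demo
length(intersect(rownames(entrenamiento_demo), rownames(validacion_demo))) == 0
```

Negative indexing (`datos_demo[-idx, ]`) is the same idiom the document uses to obtain the validation rows from `entrena`.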
The multiple linear regression model (rm) is built
# Multiple linear regression model, used to observe which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7828.6 -1646.5 -178.5 1421.8 10783.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.701e+04 1.634e+04 -2.264 0.024992 *
## symboling 1.428e+02 2.644e+02 0.540 0.589957
## wheelbase 1.559e+02 1.209e+02 1.290 0.199161
## carlength -9.624e+01 6.206e+01 -1.551 0.123075
## carwidth 3.593e+02 2.611e+02 1.376 0.170891
## carheight 1.359e+02 1.514e+02 0.897 0.370945
## curbweight 2.437e+00 1.868e+00 1.304 0.194090
## enginesize 8.290e+01 1.741e+01 4.762 4.48e-06 ***
## boreratio -8.236e+02 1.403e+03 -0.587 0.557995
## stroke -2.949e+03 9.422e+02 -3.130 0.002104 **
## compressionratio 3.448e+02 9.076e+01 3.799 0.000211 ***
## horsepower 4.333e+01 1.783e+01 2.430 0.016261 *
## peakrpm 1.914e+00 6.914e-01 2.768 0.006351 **
## citympg -4.346e+02 1.920e+02 -2.264 0.025039 *
## highwaympg 2.935e+02 1.726e+02 1.701 0.091041 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3044 on 150 degrees of freedom
## Multiple R-squared: 0.8287, Adjusted R-squared: 0.8127
## F-statistic: 51.83 on 14 and 150 DF, p-value: < 2.2e-16
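To read the significance stars in the summary above more systematically, the p-values can be mapped onto the cut points listed in the `Signif. codes` legend. The sketch below copies the `Pr(>|t|)` column by hand (the intercept is left out) and uses base-R `cut()`; the names `p_valores`, `codigos`, and `significativos` are hypothetical, and this is an illustration rather than the way `summary()` produces the stars internally.

```r
# p-values copied from the Pr(>|t|) column of the printed summary
p_valores <- c(
  symboling = 0.589957, wheelbase = 0.199161, carlength = 0.123075,
  carwidth = 0.170891, carheight = 0.370945, curbweight = 0.194090,
  enginesize = 4.48e-06, boreratio = 0.557995, stroke = 0.002104,
  compressionratio = 0.000211, horsepower = 0.016261, peakrpm = 0.006351,
  citympg = 0.025039, highwaympg = 0.091041
)

# Same cut points as the "Signif. codes" legend: 0, 0.001, 0.01, 0.05, 0.1, 1
codigos <- cut(p_valores,
               breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
               labels = c("***", "**", "*", ".", " "),
               include.lowest = TRUE)

# Predictors significant at 90% confidence or better (p < 0.10), smallest p first
significativos <- data.frame(p = p_valores, codigo = codigos)[p_valores < 0.10, ]
significativos[order(significativos$p), ]
```

A predictor marked `.` is significant only at the 90% level, while `*`, `**`, and `***` correspond to 95%, 99%, and 99.9% respectively.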
Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is significant at the 95% confidence level (*).
The variable highwaympg is significant at the 90% confidence level (.)
The variables horsepower and citympg are significant at the 95% confidence level (*)
The variables stroke and peakrpm are significant predictors at the 99% confidence level (**)
The variables enginesize and compressionratio are significant predictors at the 99.9% confidence level (***)
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
In this multiple linear model the statistic Adjusted R-squared: 0.8127 means that the independent variables explain approximately 81.27% of the variance of the dependent variable price.
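The adjusted statistic can be checked by hand from the figures printed by `summary(modelo_rm)`. The sketch below applies the usual adjustment, 1 - (1 - R²)(n - 1)/(n - p - 1), to the reported Multiple R-squared and degrees of freedom; the variable names (`r2`, `gl_residual`, `r2_ajustado`) are hypothetical and the inputs are the rounded values from the output, so small rounding differences are possible.

```r
# Values read from the printed summary(modelo_rm) output
r2          <- 0.8287   # Multiple R-squared
gl_residual <- 150      # residual degrees of freedom
p           <- 14       # number of predictors (excluding the intercept)
n           <- gl_residual + p + 1   # training observations: 165

# Adjusted R-squared penalizes R-squared for the number of predictors
r2_ajustado <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(r2_ajustado, 4)   # 0.8127
```

The result agrees with the Adjusted R-squared shown in the model summary.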
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 5 6 10 17 18
## 13154.98323 15418.41343 14843.61184 17888.19716 25237.12169 27525.90355
## 19 23 34 35 37 52
## -29.77325 6885.68947 8240.37857 8279.36436 8785.78000 6443.97860
## 55 57 61 64 67 72
## 5783.56735 10083.34368 9799.96474 12176.76997 14491.80520 28331.72096
## 73 74 75 79 81 84
## 26747.05717 35263.05836 34023.51026 7721.73181 11023.01438 15108.96244
## 85 87 88 91 112 135
## 15121.14550 9918.19073 11174.48645 7284.30155 17628.02063 17971.03303
## 136 144 145 148 149 152
## 14180.32925 11333.54557 9185.84685 11647.26804 10791.62709 6845.26216
## 155 188 197 204
## 6855.47153 9947.35284 15354.55375 19673.99864
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Real prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 13154.98323 |
| 5 | 17450.00 | 15418.41343 |
| 6 | 15250.00 | 14843.61184 |
| 10 | 17859.17 | 17888.19716 |
| 17 | 41315.00 | 25237.12169 |
| 18 | 36880.00 | 27525.90355 |
| 19 | 5151.00 | -29.77325 |
| 23 | 6377.00 | 6885.68947 |
| 34 | 6529.00 | 8240.37857 |
| 35 | 7129.00 | 8279.36436 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 4238.995
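For reference, the RMSE reported above is the square root of the mean squared difference between real and predicted prices. The sketch below writes that formula as a small base-R function, `rmse_manual` (a hypothetical name, not part of the `Metrics` package), and applies it to the ten rows displayed in the comparison table. Because it uses only those ten rows and not all 40 validation observations, its result is not expected to match 4238.995.

```r
# First ten validation rows, copied from the comparison table
precio_real <- c(13495.00, 17450.00, 15250.00, 17859.17, 41315.00,
                 36880.00, 5151.00, 6377.00, 6529.00, 7129.00)
precio_pred <- c(13154.98323, 15418.41343, 14843.61184, 17888.19716, 25237.12169,
                 27525.90355, -29.77325, 6885.68947, 8240.37857, 8279.36436)

# RMSE = square root of the mean of the squared errors
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

rmse_manual(precio_real, precio_pred)
```

Squaring the errors means that large misses, such as rows 17 and 18 in the table, weigh much more heavily than many small ones.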
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 8111414000 12800.930
## 2) enginesize< 182 153 3246984000 11300.870
## 4) curbweight< 2659.5 102 626414700 8745.314
## 8) curbweight< 2368.5 73 162947200 7665.808 *
## 9) curbweight>=2368.5 29 164258400 11462.690
## 18) highwaympg>=29.5 19 31052900 10241.050 *
## 19) highwaympg< 29.5 10 50974420 13783.800 *
## 5) curbweight>=2659.5 51 622124700 16411.980
## 10) carwidth< 68.6 44 472076400 15829.680
## 20) horsepower< 118 19 130055000 13959.370 *
## 21) horsepower>=118 25 225045900 17251.120
## 42) stroke>=3.24 14 62617410 15396.210 *
## 43) stroke< 3.24 11 52952420 19611.910 *
## 11) carwidth>=68.6 7 41351590 20072.140 *
## 3) enginesize>=182 12 130607100 31926.710 *
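The printed tree can be read as a chain of if/else rules that ends in a terminal node, whose `yval` is the predicted price. The sketch below transcribes those rules by hand into a function, `predecir_arbol` (a hypothetical helper written for illustration, not something `rpart` provides), and applies it to the first car in the data set using the values shown in the data table.

```r
# Hand-written version of the printed rpart rules; return values are the terminal-node means
predecir_arbol <- function(auto) {
  if (auto$enginesize >= 182) return(31926.710)          # node 3
  if (auto$curbweight < 2659.5) {                        # node 4
    if (auto$curbweight < 2368.5) return(7665.808)       # node 8
    if (auto$highwaympg >= 29.5) return(10241.050)       # node 18
    return(13783.800)                                    # node 19
  }
  if (auto$carwidth >= 68.6) return(20072.140)           # node 11
  if (auto$horsepower < 118) return(13959.370)           # node 20
  if (auto$stroke >= 3.24) return(15396.210)             # node 42
  19611.910                                              # node 43
}

# First observation of the data set (row 1 of the data table)
auto_1 <- list(enginesize = 130, curbweight = 2548, highwaympg = 27,
               carwidth = 64.1, horsepower = 111, stroke = 2.68)
predecir_arbol(auto_1)   # 13783.8 (terminal node 19)
```

Because every car that reaches the same terminal node receives the same value, a single tree can produce only as many distinct predictions as it has leaves, nine in this case.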
Pending
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 5 6 10 17 18 19 23
## 13783.800 13959.368 13783.800 15396.214 31926.708 31926.708 7665.808 7665.808
## 34 35 37 52 55 57 61 64
## 7665.808 7665.808 7665.808 7665.808 7665.808 13783.800 10241.053 10241.053
## 67 72 73 74 75 79 81 84
## 13959.368 31926.708 31926.708 31926.708 31926.708 7665.808 10241.053 15396.214
## 85 87 88 91 112 135 136 144
## 15396.214 10241.053 10241.053 7665.808 13959.368 13959.368 13959.368 7665.808
## 145 148 149 152 155 188 197 204
## 13783.800 10241.053 13783.800 7665.808 7665.808 7665.808 13959.368 20072.143
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Real prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 13783.800 |
| 5 | 17450.00 | 13959.368 |
| 6 | 15250.00 | 13783.800 |
| 10 | 17859.17 | 15396.214 |
| 17 | 41315.00 | 31926.708 |
| 18 | 36880.00 | 31926.708 |
| 19 | 5151.00 | 7665.808 |
| 23 | 6377.00 | 7665.808 |
| 34 | 6529.00 | 7665.808 |
| 35 | 7129.00 | 7665.808 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3667.028
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 4881554
## % Var explained: 90.07
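The `% Var explained` figure can be related to the `Mean of squared residuals` with a short calculation. The sketch below assumes the percentage is 1 minus the ratio of that mean squared residual to the variance of the training prices (with the variance taken as the sum of squares divided by n), and it borrows the sum of squares from the root-node deviance printed by the regression tree, which was fitted on the same training data. The variable names are hypothetical and the inputs are the rounded values shown in the outputs.

```r
mse_oob    <- 4881554      # "Mean of squared residuals" from the printed model
desviacion <- 8111414000   # root-node deviance in the rpart output: sum((price - mean(price))^2)
n_entrena  <- 165          # number of training observations

# Variance of the training prices, dividing by n (assumption about the formula)
varianza <- desviacion / n_entrena

# Share of the price variance not left in the residuals
pct_var <- 100 * (1 - mse_oob / varianza)
round(pct_var, 2)   # 90.07
```

The agreement with the printed 90.07 suggests the assumption is reasonable for this output, although it does not by itself confirm the package's internal formula.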
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## curbweight 20390050.48 1844550564
## enginesize 17009565.46 1603303285
## horsepower 10328358.10 1189505639
## carwidth 5000360.07 807887095
## highwaympg 7891834.94 778114848
## citympg 6193095.81 720140175
## wheelbase 1858632.66 196482161
## boreratio 902260.25 113738498
## carlength 3869503.72 102600237
## compressionratio 1696751.70 101075913
## carheight 511948.13 78603614
## peakrpm 1357346.52 78078952
## stroke 418080.06 47877524
## symboling -62248.61 20877590
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 5 6 10 17 18 19 23
## 15984.163 15949.189 13528.097 18675.307 29960.336 31058.242 6219.055 6185.613
## 34 35 37 52 55 57 61 64
## 6781.833 6838.408 7027.427 6688.632 6610.649 12843.224 10620.718 10120.373
## 67 72 73 74 75 79 81 84
## 11994.325 29222.729 29347.112 31244.392 31399.692 6552.449 10083.815 13132.334
## 85 87 88 91 112 135 136 144
## 13132.334 8625.072 9710.112 7247.449 15642.488 13312.723 13569.311 9959.631
## 145 148 149 152 155 188 197 204
## 11519.286 10024.999 10293.848 6370.798 8406.746 8681.918 14047.858 16317.052
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Real prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 15984.163 |
| 5 | 17450.00 | 15949.189 |
| 6 | 15250.00 | 13528.097 |
| 10 | 17859.17 | 18675.307 |
| 17 | 41315.00 | 29960.336 |
| 18 | 36880.00 | 31058.242 |
| 19 | 5151.00 | 6219.055 |
| 23 | 6377.00 | 6185.613 |
| 34 | 6529.00 | 6781.833 |
| 35 | 7129.00 | 6838.408 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 3988.078
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 | 13154.98323 | 13783.800 | 15984.163 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 | 15418.41343 | 13959.368 | 15949.189 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 | 14843.61184 | 13783.800 | 13528.097 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.400 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 | 17888.19716 | 15396.214 | 18675.307 |
| 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.390 | 8.0 | 182 | 5400 | 16 | 22 | 41315.00 | 25237.12169 | 31926.708 | 29960.336 |
| 18 | 0 | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | 209 | 3.62 | 3.390 | 8.0 | 182 | 5400 | 15 | 20 | 36880.00 | 27525.90355 | 31926.708 | 31058.242 |
| 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.030 | 9.5 | 48 | 5100 | 47 | 53 | 5151.00 | -29.77325 | 7665.808 | 6219.055 |
| 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6377.00 | 6885.68947 | 7665.808 | 6185.613 |
| 34 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1940 | 92 | 2.91 | 3.410 | 9.2 | 76 | 6000 | 30 | 34 | 6529.00 | 8240.37857 | 7665.808 | 6781.833 |
| 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.410 | 9.2 | 76 | 6000 | 30 | 34 | 7129.00 | 8279.36436 | 7665.808 | 6838.408 |
| 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.410 | 9.2 | 76 | 6000 | 30 | 34 | 7295.00 | 8785.78000 | 7665.808 | 7027.427 |
| 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095.00 | 6443.97860 | 7665.808 | 6688.632 |
| 55 | 1 | 93.1 | 166.8 | 64.2 | 54.1 | 1950 | 91 | 3.08 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 7395.00 | 5783.56735 | 7665.808 | 6610.649 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845.00 | 10083.34368 | 13783.800 | 12843.224 |
| 61 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2410 | 122 | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 8495.00 | 9799.96474 | 10241.053 | 10620.718 |
| 64 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2443 | 122 | 3.39 | 3.390 | 22.7 | 64 | 4650 | 36 | 42 | 10795.00 | 12176.76997 | 10241.053 | 10120.372 |
| 67 | 0 | 104.9 | 175.0 | 66.1 | 54.4 | 2700 | 134 | 3.43 | 3.640 | 22.0 | 72 | 4200 | 31 | 39 | 18344.00 | 14491.80520 | 13959.368 | 11994.325 |
| 72 | -1 | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | 234 | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 34184.00 | 28331.72096 | 31926.708 | 29222.729 |
| 73 | 3 | 96.6 | 180.3 | 70.5 | 50.8 | 3685 | 234 | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 35056.00 | 26747.05717 | 31926.708 | 29347.112 |
| 74 | 0 | 120.9 | 208.1 | 71.7 | 56.7 | 3900 | 308 | 3.80 | 3.350 | 8.0 | 184 | 4500 | 14 | 16 | 40960.00 | 35263.05836 | 31926.708 | 31244.392 |
| 75 | 1 | 112.0 | 199.2 | 72.0 | 55.4 | 3715 | 304 | 3.80 | 3.350 | 8.0 | 184 | 4500 | 14 | 16 | 45400.00 | 34023.51026 | 31926.708 | 31399.692 |
| 79 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 2004 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6669.00 | 7721.73182 | 7665.808 | 6552.449 |
| 81 | 3 | 96.3 | 173.0 | 65.4 | 49.4 | 2370 | 110 | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9959.00 | 11023.01438 | 10241.053 | 10083.815 |
| 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14869.00 | 15108.96244 | 15396.214 | 13132.334 |
| 85 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2926 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14489.00 | 15121.14550 | 15396.214 | 13132.334 |
| 87 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2405 | 122 | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 8189.00 | 9918.19073 | 10241.053 | 8625.073 |
| 88 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9279.00 | 11174.48645 | 10241.053 | 9710.112 |
| 91 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 2017 | 103 | 2.99 | 3.470 | 21.9 | 55 | 4800 | 45 | 50 | 7099.00 | 7284.30155 | 7665.808 | 7247.449 |
| 112 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | 120 | 3.46 | 2.190 | 8.4 | 95 | 5000 | 19 | 24 | 15580.00 | 17628.02063 | 13959.368 | 15642.488 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.070 | 9.3 | 110 | 5250 | 21 | 28 | 15040.00 | 17971.03303 | 13959.368 | 13312.723 |
| 136 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | 121 | 3.54 | 3.070 | 9.3 | 110 | 5250 | 21 | 28 | 15510.00 | 14180.32925 | 13959.368 | 13569.311 |
| 144 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2340 | 108 | 3.62 | 2.640 | 9.0 | 94 | 5200 | 26 | 32 | 9960.00 | 11333.54557 | 7665.808 | 9959.631 |
| 145 | 0 | 97.0 | 172.0 | 65.4 | 54.3 | 2385 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 24 | 25 | 9233.00 | 9185.84685 | 13783.800 | 11519.286 |
| 148 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | 108 | 3.62 | 2.640 | 9.0 | 94 | 5200 | 25 | 31 | 10198.00 | 11647.26804 | 10241.053 | 10024.999 |
| 149 | 0 | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 23 | 29 | 8013.00 | 10791.62709 | 13783.800 | 10293.848 |
| 152 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 38 | 6338.00 | 6845.26216 | 7665.808 | 6370.798 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898.00 | 6855.47153 | 7665.808 | 8406.746 |
| 188 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | 97 | 3.01 | 3.400 | 23.0 | 68 | 4500 | 37 | 42 | 9495.00 | 9947.35284 | 7665.808 | 8681.917 |
| 197 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 2935 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 24 | 28 | 15985.00 | 15354.55375 | 13959.368 | 14047.858 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470.00 | 19673.99864 | 20072.143 | 16317.052 |
The RMSE values are compared
tabla_rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # named so that it does not mask the rmse() function
kable(tabla_rmse, caption = "RMSE statistic of each model") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 4238.995 | 3667.028 | 3988.078 |
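Based on the car-price data, the best model according to the root mean squared error statistic was the regression tree, with an RMSE of 3667.028, compared with 3988.078 for the random forest and 4238.995 for multiple linear regression. In other words, on the validation data its predictions deviate from the real prices by roughly 3,667 dollars in the root-mean-square sense.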
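The ranking can also be reproduced programmatically from the three RMSE values in the table. The sketch below is a minimal base-R illustration; the names `rmse_valores`, `mejor`, and `exceso_pct` are hypothetical, and the numbers are copied from the comparison table.

```r
# RMSE of each model on the validation data, copied from the comparison table
rmse_valores <- c(rm = 4238.995, ar = 3667.028, rf = 3988.078)

sort(rmse_valores)                        # lowest error first
mejor <- names(which.min(rmse_valores))   # "ar", the regression tree

# How much of each model's RMSE exceeds the best one, as a percentage of its own RMSE
exceso_pct <- 100 * (rmse_valores - min(rmse_valores)) / rmse_valores
round(exceso_pct, 1)
```

These figures come from a single 80/20 partition with one seed, so the ordering applies to this particular split.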