Compare supervised learning models by applying prediction algorithms to automobile prices and determining the root mean squared error (RMSE) of each.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are above the 90% confidence level as predictors?
What is the Adjusted R-squared value, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles in inches |
| 3 | carlength | Length of the car (Numeric) |
| 4 | carwidth | Width of the car (Numeric) |
| 5 | carheight | Height of the car (Numeric) |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 7 | enginesize | Engine size (Numeric). Size in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Engine efficiency |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 10 | compressionratio | Compression ratio of the car (Numeric). Compression, a measure of pressure in the engine |
| 11 | horsepower | Horsepower (Numeric). Power of the car |
| 12 | peakrpm | Peak rpm of the car (Numeric). Peak revolutions per minute |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption |
| 16 | price (dependent variable) | Price of the car (Numeric), in dollars |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Training data are 80% of the observations and validation data the remaining 20%.
n <- nrow(datos)
set.seed(1264) # Seed
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
| 15 | 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 27 | 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 |
| 38 | 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 52 | 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 |
| 53 | 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 |
| 56 | 56 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 |
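As a rough illustration of the 80/20 partition, the sketch below contrasts a simple random split with a split stratified by price quantiles, which is the general idea behind `createDataPartition` when the outcome is numeric. The simulated `price` vector, the choice of five quantile groups, and the names `train_simple`, `train_strat` and `valid_strat` are assumptions made for this example; they are not the course data or caret's internal code.

```r
set.seed(1264)

# Hypothetical stand-in for datos$price: 205 simulated prices (not the real data)
price <- runif(205, min = 5000, max = 45000)
n <- length(price)

# Simple random split: 80% of the row indices go to training
train_simple <- sample(n, size = floor(0.80 * n))

# Stratified split: cut price into 5 quantile groups and sample 80% within each
breaks <- quantile(price, probs = seq(0, 1, length.out = 6))
groups <- cut(price, breaks = breaks, include.lowest = TRUE)
train_strat <- unlist(
  lapply(split(seq_len(n), groups), function(idx) {
    idx[sample.int(length(idx), size = ceiling(0.80 * length(idx)))]
  }),
  use.names = FALSE
)

# Validation rows are whatever was not selected for training
valid_strat <- setdiff(seq_len(n), train_strat)

length(train_strat)
length(valid_strat)

# Share of each price group that landed in training (close to 0.80 in every group)
tapply(seq_len(n) %in% train_strat, groups, mean)
```

Stratifying keeps cheap and expensive cars represented in both sets, which matters with only 205 observations.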
The multiple linear regression model (rm) is built
# Multiple linear regression model to examine which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10416.7 -1817.3 -138.2 1372.4 14132.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -6.076e+04 1.873e+04 -3.245 0.001450 **
## symboling 4.248e+02 2.863e+02 1.484 0.139974
## wheelbase 1.181e+02 1.235e+02 0.956 0.340707
## carlength -6.694e+01 6.546e+01 -1.023 0.308151
## carwidth 4.693e+02 2.881e+02 1.629 0.105417
## carheight 2.444e+02 1.659e+02 1.473 0.142726
## curbweight 1.848e+00 1.915e+00 0.965 0.336006
## enginesize 1.208e+02 1.537e+01 7.859 6.94e-13 ***
## boreratio -6.100e+02 1.470e+03 -0.415 0.678727
## stroke -2.943e+03 8.639e+02 -3.406 0.000845 ***
## compressionratio 2.443e+02 9.618e+01 2.540 0.012108 *
## horsepower 3.420e+01 1.887e+01 1.813 0.071850 .
## peakrpm 3.048e+00 8.467e-01 3.600 0.000432 ***
## citympg -1.049e+02 2.122e+02 -0.494 0.621834
## highwaympg 6.544e+01 1.889e+02 0.346 0.729551
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3325 on 150 degrees of freedom
## Multiple R-squared: 0.8587, Adjusted R-squared: 0.8456
## F-statistic: 65.14 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are above the 90% confidence level as predictors?
The intercept is significant at the 99% confidence level (**).
The variable horsepower is significant at the 90% confidence level (.).
The variable compressionratio is significant at the 95% confidence level (*).
The variables enginesize, stroke and peakrpm are significant as predictors at the 99.9% confidence level (***).
The remaining variables (symboling, wheelbase, carlength, carwidth, carheight, curbweight, boreratio, citympg and highwaympg) do not reach the 90% level in this fit.
What is the Adjusted R-squared value, that is, how much of the vehicle price do the independent variables explain?
For this multiple linear model, Adjusted R-squared: 0.8456 means that the independent variables explain approximately 84.56% of the variability of the dependent variable, price.
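The significance reading can be checked mechanically. The sketch below copies the p-values printed in the `summary(modelo_rm)` output into a vector, maps them to the same codes as the "Signif. codes" legend, and recomputes the adjusted R-squared from the printed R-squared. The names `p_values`, `significance` and `significant_90` are assumptions for this example. With the fitted model at hand, the p-values could instead be read from `summary(modelo_rm)$coefficients[, "Pr(>|t|)"]`.

```r
# p-values copied from the summary(modelo_rm) output
p_values <- c(
  "(Intercept)" = 0.001450, symboling = 0.139974, wheelbase = 0.340707,
  carlength = 0.308151, carwidth = 0.105417, carheight = 0.142726,
  curbweight = 0.336006, enginesize = 6.94e-13, boreratio = 0.678727,
  stroke = 0.000845, compressionratio = 0.012108, horsepower = 0.071850,
  peakrpm = 0.000432, citympg = 0.621834, highwaympg = 0.729551
)

# Same cut-offs as the "Signif. codes" legend printed by summary()
codes <- cut(p_values,
             breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)
significance <- data.frame(term = names(p_values),
                           p_value = unname(p_values),
                           code = codes)
significance

# Terms significant at the 90% level or better (p < 0.10)
significant_90 <- names(p_values)[p_values < 0.10]
significant_90

# Adjusted R-squared recomputed from the printed R-squared,
# with n = 165 training rows and p = 14 predictors
r2 <- 0.8587
n <- 165
p <- 14
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2
```

Because the printed R-squared is rounded to four decimals, the recomputed adjusted value can differ from the reported 0.8456 in the last digit.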
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 3 7 12 14 27 38 52
## 12892.721 17053.212 18356.677 12900.714 15468.607 6358.799 10155.460 5744.025
## 53 56 60 67 71 90 97 98
## 5753.265 8415.227 9832.958 12538.073 24532.354 6299.510 6451.038 6000.612
## 113 115 116 121 132 136 141 148
## 17444.981 18082.790 14307.297 6318.145 10466.844 14403.976 8673.875 10999.804
## 152 154 155 161 163 165 166 176
## 5799.579 6140.720 6251.609 5742.111 5976.541 5875.798 12818.868 7629.874
## 178 180 181 190 195 197 199 204
## 7711.182 22441.972 20505.941 11406.537 16275.453 16213.051 15765.956 19489.740
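To make explicit what `predict()` does for a linear model, the sketch below rebuilds the prediction for validation observation 1 by hand, as the intercept plus the sum of each coefficient times its predictor value. The coefficients are the rounded values printed in the summary and the predictor values come from the validation table. The names `coefs`, `obs_1` and `manual_prediction` are assumptions for this example.

```r
# Coefficients as printed (rounded) in the summary(modelo_rm) output
coefs <- c(
  "(Intercept)" = -6.076e+04, symboling = 4.248e+02, wheelbase = 1.181e+02,
  carlength = -6.694e+01, carwidth = 4.693e+02, carheight = 2.444e+02,
  curbweight = 1.848e+00, enginesize = 1.208e+02, boreratio = -6.100e+02,
  stroke = -2.943e+03, compressionratio = 2.443e+02, horsepower = 3.420e+01,
  peakrpm = 3.048e+00, citympg = -1.049e+02, highwaympg = 6.544e+01
)

# Predictor values of observation 1, taken from the validation table
obs_1 <- c(
  symboling = 3, wheelbase = 88.6, carlength = 168.8, carwidth = 64.1,
  carheight = 48.8, curbweight = 2548, enginesize = 130, boreratio = 3.47,
  stroke = 2.68, compressionratio = 9.0, horsepower = 111, peakrpm = 5000,
  citympg = 21, highwaympg = 27
)

# Linear prediction: intercept + sum(coefficient * value)
manual_prediction <- unname(coefs["(Intercept)"] + sum(coefs[names(obs_1)] * obs_1))
manual_prediction

# Difference against the value reported by predict() for observation 1
manual_prediction - 12892.721
```

The result should land within a few dollars of the reported 12892.721; the small gap comes from using coefficients rounded to four significant digits.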
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 12892.721 |
| 3 | 16500 | 17053.212 |
| 7 | 17710 | 18356.677 |
| 12 | 16925 | 12900.714 |
| 14 | 21105 | 15468.607 |
| 27 | 7609 | 6358.799 |
| 38 | 7895 | 10155.460 |
| 52 | 6095 | 5744.025 |
| 53 | 6795 | 5753.265 |
| 56 | 10945 | 8415.227 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2783.952
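For reference, RMSE is the square root of the mean squared difference between actual and predicted values, which is the quantity `Metrics::rmse` returns. The sketch below writes it out in base R and applies it to the first 10 rows of the comparison table. The function name `rmse_manual` is an assumption for this example.

```r
# Root mean squared error: square root of the mean squared difference
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# First 10 validation rows from the comparison table
precio_real <- c(13495, 16500, 17710, 16925, 21105, 7609, 7895, 6095, 6795, 10945)
precio_pred <- c(12892.721, 17053.212, 18356.677, 12900.714, 15468.607,
                 6358.799, 10155.460, 5744.025, 5753.265, 8415.227)

# RMSE over these 10 rows only; it need not match the value for all 40 rows
rmse_manual(precio_real, precio_pred)
```

Because the errors are squared before averaging, a few large misses (such as observation 14) weigh more heavily than many small ones.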
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11742610000 13495.790
## 2) enginesize< 182 148 3021864000 11091.150
## 4) highwaympg>=28.5 99 585475300 8595.414
## 8) carlength< 175.5 78 158367900 7738.115 *
## 9) carlength>=175.5 21 156851600 11779.670 *
## 5) highwaympg< 28.5 49 573881700 16133.550
## 10) horsepower< 112.5 17 90563670 13697.760 *
## 11) horsepower>=112.5 32 328872800 17427.570 *
## 3) enginesize>=182 17 414635700 34430.320 *
Pending
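The printed tree can be read as a set of nested if/else rules. The sketch below hard-codes the thresholds and leaf values from the `modelo_ar` output to show how an observation is routed to a terminal node. The function name `predict_tree_manual` is an assumption for this example, and the leaf values carry the rounding of the printed output.

```r
# Hand-written version of the printed tree; thresholds and leaf values
# are copied from the modelo_ar output
predict_tree_manual <- function(enginesize, highwaympg, carlength, horsepower) {
  if (enginesize >= 182) return(34430.320)      # node 3
  if (highwaympg >= 28.5) {
    if (carlength < 175.5) return(7738.115)     # node 8
    return(11779.670)                           # node 9
  }
  if (horsepower < 112.5) return(13697.760)     # node 10
  17427.570                                     # node 11
}

# Validation observations 1, 3, 12 and 27 (values from the validation table)
predict_tree_manual(enginesize = 130, highwaympg = 27, carlength = 168.8, horsepower = 111)
predict_tree_manual(enginesize = 152, highwaympg = 26, carlength = 171.2, horsepower = 154)
predict_tree_manual(enginesize = 108, highwaympg = 29, carlength = 176.8, horsepower = 101)
predict_tree_manual(enginesize = 90,  highwaympg = 38, carlength = 157.3, horsepower = 68)
```

Since the tree has only five terminal nodes, it can produce only five distinct predicted prices, which helps explain its coarser fit.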
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 3 7 12 14 27 38 52
## 13697.765 17427.568 13697.765 11779.667 17427.568 7738.115 7738.115 7738.115
## 53 56 60 67 71 90 97 98
## 7738.115 13697.765 11779.667 7738.115 34430.324 7738.115 7738.115 7738.115
## 113 115 116 121 132 136 141 148
## 11779.667 13697.765 13697.765 7738.115 11779.667 13697.765 7738.115 7738.115
## 152 154 155 161 163 165 166 176
## 7738.115 7738.115 7738.115 7738.115 7738.115 7738.115 7738.115 11779.667
## 178 180 181 190 195 197 199 204
## 11779.667 17427.568 17427.568 7738.115 17427.568 17427.568 17427.568 13697.765
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 13697.765 |
| 3 | 16500 | 17427.568 |
| 7 | 17710 | 13697.765 |
| 12 | 16925 | 11779.667 |
| 14 | 21105 | 17427.568 |
| 27 | 7609 | 7738.115 |
| 38 | 7895 | 7738.115 |
| 52 | 6095 | 7738.115 |
| 53 | 6795 | 7738.115 |
| 56 | 10945 | 13697.765 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3142.833
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 6635474
## % Var explained: 90.68
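Two figures in this printout can be cross-checked with simple arithmetic. The sketch below assumes that the number of variables tried at each split follows the usual regression default of one third of the predictors, and that "% Var explained" is one minus the ratio of the mean of squared residuals to the variance of price in the training data. The total sum of squares is taken from the root deviance printed by the regression tree, which was fitted on the same training rows. The variable names are assumptions for this example.

```r
# Default number of variables tried per split for regression: floor(p / 3), at least 1
n_predictors <- 14
mtry_default <- max(floor(n_predictors / 3), 1)
mtry_default

# Numbers copied from the printed outputs:
# mean of squared residuals reported by the forest
mse_forest <- 6635474
# total sum of squares of price in training (root deviance of the regression tree)
sst_train <- 11742610000
# number of training rows
n_train <- 165

# Share of the variance of price accounted for by the forest, in percent
pct_var_explained <- 100 * (1 - mse_forest / (sst_train / n_train))
round(pct_var_explained, 2)
```

With the printed values this gives 4 variables per split and about 90.68%, in line with the model summary.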
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 23921240.07 3507396350
## horsepower 18313320.17 2657616960
## curbweight 14335627.60 1797843267
## citympg 7116500.88 1189749373
## highwaympg 5459460.04 811979254
## carlength 4010243.94 628281658
## carwidth 3148396.16 479399757
## wheelbase 1538372.87 178594147
## compressionratio 797461.97 118850390
## boreratio 1603934.55 116466615
## peakrpm 607629.70 109481549
## stroke 430780.69 48872605
## carheight 29414.56 41571686
## symboling 45891.12 11295768
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 3 7 12 14 27 38 52
## 14419.054 15605.947 19349.443 13754.462 18108.969 6434.230 8636.564 5722.184
## 53 56 60 67 71 90 97 98
## 5787.699 12885.283 10380.855 11843.724 28543.429 6862.742 7014.742 7510.330
## 113 115 116 121 132 136 141 148
## 15318.631 17242.612 14549.593 6391.582 10015.792 14920.195 7510.656 10294.958
## 152 154 155 161 163 165 166 176
## 6461.817 7601.928 7968.417 7305.898 7848.658 8034.628 9880.385 10451.422
## 178 180 181 190 195 197 199 204
## 10523.276 16448.013 16855.416 8583.847 16102.935 16440.660 20138.308 17538.939
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 14419.054 |
| 3 | 16500 | 15605.947 |
| 7 | 17710 | 19349.443 |
| 12 | 16925 | 13754.462 |
| 14 | 21105 | 18108.969 |
| 27 | 7609 | 6434.230 |
| 38 | 7895 | 8636.564 |
| 52 | 6095 | 5722.184 |
| 53 | 6795 | 5787.699 |
| 56 | 10945 | 12885.283 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1908.263
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 | 12892.721 | 13697.765 | 14419.054 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 17053.212 | 17427.568 | 15605.947 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 17710 | 18356.677 | 13697.765 | 19349.443 |
| 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925 | 12900.714 | 11779.667 | 13754.462 |
| 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 | 15468.607 | 17427.568 | 18108.969 |
| 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 | 6358.799 | 7738.115 | 6434.230 |
| 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 | 10155.460 | 7738.115 | 8636.564 |
| 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 | 5744.025 | 7738.115 | 5722.184 |
| 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 | 5753.265 | 7738.115 | 5787.699 |
| 56 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 | 8415.227 | 13697.765 | 12885.283 |
| 60 | 1 | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | 122 | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 8845 | 9832.958 | 11779.667 | 10380.855 |
| 67 | 0 | 104.9 | 175.0 | 66.1 | 54.4 | 2700 | 134 | 3.43 | 3.640 | 22.0 | 72 | 4200 | 31 | 39 | 18344 | 12538.073 | 7738.115 | 11843.724 |
| 71 | -1 | 115.6 | 202.6 | 71.7 | 56.3 | 3770 | 183 | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 31600 | 24532.354 | 34430.324 | 28543.429 |
| 90 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1889 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 5499 | 6299.510 | 7738.115 | 6862.742 |
| 97 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7499 | 6451.038 | 7738.115 | 7014.742 |
| 98 | 1 | 94.5 | 170.2 | 63.8 | 53.5 | 2037 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7999 | 6000.612 | 7738.115 | 7510.330 |
| 113 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 16900 | 17444.981 | 11779.667 | 15318.631 |
| 115 | 0 | 114.2 | 198.9 | 68.4 | 58.7 | 3485 | 152 | 3.70 | 3.520 | 21.0 | 95 | 4150 | 25 | 25 | 17075 | 18082.790 | 13697.765 | 17242.612 |
| 116 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | 120 | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 14307.297 | 13697.765 | 14549.593 |
| 121 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 6318.145 | 7738.115 | 6391.582 |
| 132 | 2 | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | 132 | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9895 | 10466.844 | 11779.667 | 10015.792 |
| 136 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | 121 | 3.54 | 3.070 | 9.3 | 110 | 5250 | 21 | 28 | 15510 | 14403.976 | 13697.765 | 14920.195 |
| 141 | 2 | 93.3 | 157.3 | 63.8 | 55.7 | 2240 | 108 | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7603 | 8673.875 | 7738.115 | 7510.656 |
| 148 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | 108 | 3.62 | 2.640 | 9.0 | 94 | 5200 | 25 | 31 | 10198 | 10999.804 | 7738.115 | 10294.958 |
| 152 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 38 | 6338 | 5799.579 | 7738.115 | 6461.817 |
| 154 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 37 | 6918 | 6140.720 | 7738.115 | 7601.928 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 6251.609 | 7738.115 | 7968.417 |
| 161 | 0 | 95.7 | 166.3 | 64.4 | 53.0 | 2094 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 38 | 47 | 7738 | 5742.111 | 7738.115 | 7305.898 |
| 163 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 28 | 34 | 9258 | 5976.541 | 7738.115 | 7848.658 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 29 | 34 | 8238 | 5875.798 | 7738.115 | 8034.628 |
| 166 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | 98 | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9298 | 12818.868 | 7738.115 | 9880.385 |
| 176 | -1 | 102.4 | 175.6 | 66.5 | 53.9 | 2414 | 122 | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 9988 | 7629.874 | 11779.667 | 10451.422 |
| 178 | -1 | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | 122 | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 11248 | 7711.182 | 11779.667 | 10523.276 |
| 180 | 3 | 102.9 | 183.5 | 67.7 | 52.0 | 3016 | 171 | 3.27 | 3.350 | 9.3 | 161 | 5200 | 19 | 24 | 15998 | 22441.972 | 17427.568 | 16448.013 |
| 181 | -1 | 104.5 | 187.8 | 66.5 | 54.1 | 3131 | 171 | 3.27 | 3.350 | 9.2 | 156 | 5200 | 20 | 24 | 15690 | 20505.941 | 17427.568 | 16855.416 |
| 190 | 3 | 94.5 | 159.3 | 64.2 | 55.6 | 2254 | 109 | 3.19 | 3.400 | 8.5 | 90 | 5500 | 24 | 29 | 11595 | 11406.537 | 7738.115 | 8583.847 |
| 195 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 12940 | 16275.453 | 17427.568 | 16102.935 |
| 197 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 2935 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 24 | 28 | 15985 | 16213.051 | 17427.568 | 16440.660 |
| 199 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 3045 | 130 | 3.62 | 3.150 | 7.5 | 162 | 5100 | 17 | 22 | 18420 | 15765.956 | 17427.568 | 20138.308 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470 | 19489.740 | 13697.765 | 17538.939 |
The RMSE values are compared
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # separate name so the rmse() function is not shadowed
kable(rmse_modelos, caption = "RMSE statistic of each model") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2783.952 | 3142.833 | 1908.263 |
The multiple regression model highlights the statistically significant variables: horsepower at the 90% confidence level, compression ratio at 95%, and stroke, peak rpm and engine size at 99.9%.
The most important factors in the regression tree model were engine size, highway fuel consumption, curb weight and horsepower.
The random forest model ranks engine size, horsepower, curb weight, city fuel consumption and highway fuel consumption as its most important variables.
Engine size stands out as significant in all three models, and curb weight and horsepower are also important in both the regression tree and the random forest.
The best model according to the lowest root mean squared error (RMSE) was the random forest, for this particular partition with 80% training and 20% validation data.
The random forest produced the largest change in RMSE (1908.263, about 31% below multiple regression), while multiple regression (2783.952) and the regression tree (3142.833) differ much less from each other.
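The comparison can be summarized numerically. The sketch below takes the three RMSE values from the table, picks the model with the lowest one, and expresses the gap as a percentage reduction relative to each model. The names `rmse_values`, `best_model` and `reduction_pct` are assumptions for this example.

```r
# RMSE values reported in the comparison table
rmse_values <- c(rm = 2783.952, ar = 3142.833, rf = 1908.263)

# Model with the lowest RMSE
best_model <- names(which.min(rmse_values))
best_model

# Percentage reduction in RMSE achieved by the best model relative to each model
reduction_pct <- 100 * (rmse_values - rmse_values[best_model]) / rmse_values
round(reduction_pct, 1)
```

These figures come from a single 80/20 partition with one seed, so the ranking could change under a different split.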