Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
A multiple regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the value of the adjusted R-squared, that is, how much of the vehicle's price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
A regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
A random forest model is built with the training data, using 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For nicer-looking tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing the RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the engine (Numeric), in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of the car (Numeric). A measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak revolutions per minute of the car (Numeric). |
| 13 | citympg | Mileage in the city (Numeric). Fuel economy in miles per gallon. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel economy in miles per gallon. |
| 15 | price | Price of the car (Numeric), in dollars. Dependent variable. |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
kable(head(datos, 10), caption = "Car price data") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data comprise 80% of the observations and the validation data the remaining 20%.
n <- nrow(datos)
set.seed(1279) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
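For comparison, a plain random 80/20 split can be sketched in base R. This is only an illustration on the built-in `mtcars` data, and the object names (`datos_demo`, `filas_entrena`, `entrena_demo`, `valida_demo`) are hypothetical. Unlike `createDataPartition()`, which samples within groups of the outcome so that both sets have a similar price distribution, `sample()` here ignores the outcome entirely.

```r
# Simple random 80/20 split in base R (illustrative; not the caret method used above)
datos_demo <- mtcars                     # built-in data, stands in for the car prices
set.seed(1279)                           # same seed value as the document

n_demo <- nrow(datos_demo)
filas_entrena <- sample(seq_len(n_demo), size = floor(0.80 * n_demo))

entrena_demo <- datos_demo[filas_entrena, ]    # 80% of the rows
valida_demo  <- datos_demo[-filas_entrena, ]   # remaining 20%

# Sizes of the two sets
c(entrenamiento = nrow(entrena_demo), validacion = nrow(valida_demo))
```

No row appears in both sets, which is what keeps the validation RMSE an out-of-sample estimate.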
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 25 | 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6229 |
| 28 | 28 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | 98 | 3.03 | 3.39 | 7.6 | 102 | 5500 | 24 | 30 | 8558 |
| 30 | 30 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2811 | 156 | 3.60 | 3.90 | 7.0 | 145 | 5000 | 19 | 24 | 12964 |
| 32 | 32 | 2 | 86.6 | 144.6 | 63.9 | 50.8 | 1819 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 31 | 38 | 6855 |
| 38 | 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.58 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 42 | 42 | 0 | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | 110 | 3.15 | 3.58 | 9.0 | 101 | 5800 | 24 | 28 | 12945 |
| 50 | 50 | 0 | 102.0 | 191.7 | 70.6 | 47.8 | 3950 | 326 | 3.54 | 2.76 | 11.5 | 262 | 5000 | 13 | 17 | 36000 |
The multiple linear regression model (rm) is built.
# Multiple linear regression model, used to see which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10000.5 -1558.3 -213.8 1475.9 12824.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.830e+04 1.695e+04 -1.670 0.096971 .
## symboling 9.211e+01 2.698e+02 0.341 0.733314
## wheelbase 1.648e+02 1.167e+02 1.412 0.160097
## carlength -1.015e+02 6.151e+01 -1.650 0.101006
## carwidth 3.301e+02 2.565e+02 1.287 0.200120
## carheight 7.705e+01 1.520e+02 0.507 0.612892
## curbweight 1.768e+00 1.841e+00 0.961 0.338198
## enginesize 1.448e+02 1.603e+01 9.034 7.65e-16 ***
## boreratio -2.449e+03 1.354e+03 -1.808 0.072564 .
## stroke -4.076e+03 8.941e+02 -4.559 1.06e-05 ***
## compressionratio 3.173e+02 8.831e+01 3.593 0.000443 ***
## horsepower 2.678e+01 1.736e+01 1.542 0.125080
## peakrpm 2.395e+00 7.029e-01 3.407 0.000844 ***
## citympg -3.498e+02 1.861e+02 -1.880 0.062075 .
## highwaympg 2.069e+02 1.689e+02 1.225 0.222517
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3069 on 150 degrees of freedom
## Multiple R-squared: 0.8622, Adjusted R-squared: 0.8493
## F-statistic: 67.03 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level above 90%?
The intercept is significant at the 90% confidence level (.), with a p-value of 0.097.
The variables boreratio and citympg are significant at the 90% confidence level (.).
The variables enginesize, stroke, compressionratio and peakrpm are significant at the 99.9% confidence level (***).
No variable is marked with * (95%) or ** (99%) alone.
The remaining predictors (symboling, wheelbase, carlength, carwidth, carheight, curbweight, horsepower and highwaympg) have p-values above 0.10.
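The significance stars can also be read programmatically from the coefficient table. The sketch below fits a small model on the built-in `mtcars` data purely for illustration; `modelo_demo`, `p_valores` and `significativos` are hypothetical names, and the same lines should apply to `modelo_rm` by substituting it for `modelo_demo`.

```r
# Fit a small linear model on built-in data (illustrative only)
modelo_demo <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(modelo_demo)$coefficients

# Keep the terms whose p-value is below 0.10, i.e. above 90% confidence
p_valores <- coefs[, "Pr(>|t|)"]
significativos <- names(p_valores)[p_valores < 0.10]
significativos
```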
What is the value of the adjusted R-squared, that is, how much of the vehicle's price do the independent variables explain?
In this multiple linear model, the Adjusted R-squared of 0.8493 means that the independent variables explain approximately 84.93% of the variance of the dependent variable, price.
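The adjusted R-squared penalizes the ordinary R-squared for the number of predictors. A minimal sketch of the standard formula, using a hypothetical helper `r2_ajustado` and the figures printed by `summary(modelo_rm)` (Multiple R-squared 0.8622, 150 residual degrees of freedom, 14 predictors):

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
r2_ajustado <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# n = residual degrees of freedom + predictors + intercept = 150 + 14 + 1 = 165
r2_ajustado(r2 = 0.8622, n = 165, p = 14)
```

Up to rounding of the printed R-squared, this agrees with the adjusted value reported in the model summary.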
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 3 6 25 28 30 32 38
## 14311.454 19626.288 16104.986 7288.586 9176.889 15195.436 8337.452 10235.262
## 42 50 51 52 63 70 74 78
## 10316.630 51915.678 5173.659 6289.890 9967.079 24770.362 43724.776 7843.103
## 88 94 101 102 103 107 114 117
## 10855.164 5864.873 10337.773 23934.672 24072.799 24413.031 18564.063 18006.460
## 122 126 127 138 139 140 152 163
## 6327.869 20224.284 28909.391 15964.520 8691.807 8760.793 6784.575 7193.004
## 165 168 170 175 187 190 192 199
## 6459.614 13739.170 13758.622 11693.531 9831.875 10836.086 16142.879 16091.523
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 14311.454 |
| 3 | 16500 | 19626.288 |
| 6 | 15250 | 16104.986 |
| 25 | 6229 | 7288.586 |
| 28 | 8558 | 9176.889 |
| 30 | 12964 | 15195.436 |
| 32 | 6855 | 8337.452 |
| 38 | 7895 | 10235.262 |
| 42 | 12945 | 10316.630 |
| 50 | 36000 | 51915.678 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 4065.081
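The RMSE is the square root of the mean of the squared differences between actual and predicted prices. A minimal base R sketch follows, with a hypothetical helper name `rmse_manual`; it uses only the ten comparison rows shown above, so its result is not the value obtained on the full validation set.

```r
# Root mean squared error computed by hand
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First ten actual and predicted prices, copied from the comparison table
precio_real <- c(13495, 16500, 15250, 6229, 8558, 12964, 6855, 7895, 12945, 36000)
precio_pred <- c(14311.454, 19626.288, 16104.986, 7288.586, 9176.889,
                 15195.436, 8337.452, 10235.262, 10316.630, 51915.678)

rmse_manual(precio_real, precio_pred)
```

Because the errors are squared before averaging, a single large miss such as row 50 weighs heavily on the statistic.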
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10248780000 13237.890
## 2) enginesize< 182 151 3171719000 11291.390
## 4) curbweight< 2542 95 475063700 8501.874
## 8) curbweight< 2291.5 57 60364260 7263.772 *
## 9) curbweight>=2291.5 38 196261700 10359.030 *
## 5) curbweight>=2542 56 703363300 16023.610
## 10) carwidth< 68.6 48 480511600 15298.900
## 20) horsepower< 118 28 188507200 13952.540 *
## 21) horsepower>=118 20 170191300 17183.810 *
## 11) carwidth>=68.6 8 46382600 20371.880 *
## 3) enginesize>=182 14 334261200 34232.250 *
Pending.
rpart.plot(modelo_ar)
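The printed tree can be read as nested if/else rules, where each terminal node returns the mean price of the training cars that fall into it. The function below is a hand transcription of those rules under a hypothetical name, `predecir_arbol`; it is not produced by `rpart` and uses the rounded node values as printed.

```r
# Decision rules transcribed by hand from the printed rpart tree (illustrative)
predecir_arbol <- function(enginesize, curbweight, carwidth, horsepower) {
  if (enginesize >= 182) return(34232.250)       # node 3
  if (curbweight < 2542) {
    if (curbweight < 2291.5) return(7263.772)    # node 8
    return(10359.030)                            # node 9
  }
  if (carwidth >= 68.6) return(20371.880)        # node 11
  if (horsepower < 118) return(13952.540)        # node 20
  17183.810                                      # node 21
}

# First validation row: enginesize 130, curbweight 2548, carwidth 64.1, horsepower 111
predecir_arbol(130, 2548, 64.1, 111)
```

Only four of the fourteen predictors appear in the rules, and every prediction is one of six values, which is why the tree's predictions repeat so often.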
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 3 6 25 28 30 32 38
## 13952.536 17183.808 10359.026 7263.772 7263.772 17183.808 7263.772 7263.772
## 42 50 51 52 63 70 74 78
## 10359.026 34232.250 7263.772 7263.772 10359.026 34232.250 34232.250 7263.772
## 88 94 101 102 103 107 114 117
## 10359.026 7263.772 10359.026 17183.808 17183.808 17183.808 13952.536 13952.536
## 122 126 127 138 139 140 152 163
## 7263.772 17183.808 34232.250 17183.808 7263.772 7263.772 7263.772 7263.772
## 165 168 170 175 187 190 192 199
## 7263.772 10359.026 13952.536 10359.026 7263.772 7263.772 13952.536 17183.808
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 13952.536 |
| 3 | 16500 | 17183.808 |
| 6 | 15250 | 10359.026 |
| 25 | 6229 | 7263.772 |
| 28 | 8558 | 7263.772 |
| 30 | 12964 | 17183.808 |
| 32 | 6855 | 7263.772 |
| 38 | 7895 | 7263.772 |
| 42 | 12945 | 10359.026 |
| 50 | 36000 | 34232.250 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 2613.561
The random forest model (rf) is built.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 8096805
## % Var explained: 86.96
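The idea behind the forest, many trees fitted to resampled data and then averaged, can be sketched with plain bagging. This is a simplified illustration on the built-in `mtcars` data with hypothetical names (`datos_demo`, `pred_por_arbol`, `pred_bagging`); it is not the `randomForest` algorithm, because it omits the random subset of variables tried at each split.

```r
library(rpart)  # recommended package that ships with standard R installations

set.seed(1279)
n_arboles <- 20          # same number of trees as the model above
datos_demo <- mtcars     # built-in data, used only for illustration

# Fit one tree per bootstrap resample and collect its predictions as a column
pred_por_arbol <- sapply(seq_len(n_arboles), function(i) {
  filas <- sample(nrow(datos_demo), replace = TRUE)
  arbol <- rpart(mpg ~ ., data = datos_demo[filas, ])
  predict(arbol, newdata = datos_demo)
})

# The ensemble prediction is the average over the trees
pred_bagging <- rowMeans(pred_por_arbol)
head(pred_bagging)
```

Averaging smooths out the step-like predictions of a single tree, which is one reason an ensemble often attains a lower RMSE.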
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 16592596.7 1720088563
## horsepower 21912339.0 1553745423
## citympg 7527704.0 1224490150
## curbweight 14871776.5 1190738983
## carwidth 5889207.4 978393110
## highwaympg 13197907.5 898498035
## carlength 5741494.9 768712871
## wheelbase 6314098.6 491810761
## stroke 747848.5 201191213
## boreratio 3638489.5 147509949
## compressionratio 621265.8 115142818
## peakrpm 1439127.4 105073030
## carheight 97009.3 72858578
## symboling 243608.5 46615279
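The two importance columns measure different things: `%IncMSE` is the increase in out-of-bag error when a variable's values are permuted, while `IncNodePurity` is the total reduction in residual sum of squares from splits on that variable. They need not produce the same ranking. The sketch below re-enters the four largest rows of the table above by hand, in a hypothetical data frame `importancia`, and orders them both ways.

```r
# Top four rows of the importance table, copied by hand from the output above
importancia <- data.frame(
  variable      = c("enginesize", "horsepower", "citympg", "curbweight"),
  IncMSE        = c(16592596.7, 21912339.0, 7527704.0, 14871776.5),
  IncNodePurity = c(1720088563, 1553745423, 1224490150, 1190738983)
)

# Ranking by node purity (the ordering used above)
importancia$variable[order(-importancia$IncNodePurity)]

# Ranking by increase in MSE under permutation
importancia$variable[order(-importancia$IncMSE)]
```

By node purity enginesize comes first, whereas by permutation error horsepower does, so the choice of measure affects which variable is reported as most important.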
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 3 6 25 28 30 32 38
## 17515.072 14567.495 13731.712 6649.143 8145.571 13716.133 6482.563 8654.288
## 42 50 51 52 63 70 74 78
## 12202.971 34235.892 6663.734 6475.812 9730.418 25679.976 39709.057 6391.613
## 88 94 101 102 103 107 114 117
## 9337.945 7833.066 9219.387 18160.907 20968.961 16487.228 14111.058 15692.847
## 122 126 127 138 139 140 152 163
## 6853.143 20242.562 30175.201 17531.342 6997.172 7380.448 6582.665 8003.921
## 165 168 170 175 187 190 192 199
## 7892.367 10510.013 10985.612 11839.286 8135.098 9274.737 14279.262 20422.190
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 17515.072 |
| 3 | 16500 | 14567.495 |
| 6 | 15250 | 13731.712 |
| 25 | 6229 | 6649.143 |
| 28 | 8558 | 8145.571 |
| 30 | 12964 | 13716.133 |
| 32 | 6855 | 6482.563 |
| 38 | 7895 | 8654.288 |
| 42 | 12945 | 12202.971 |
| 50 | 36000 | 34235.892 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1942.218
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of the models") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 | 14311.454 | 13952.536 | 17515.072 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 19626.288 | 17183.808 | 14567.495 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 | 16104.986 | 10359.026 | 13731.712 |
| 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 7288.586 | 7263.772 | 6649.143 |
| 28 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | 98 | 3.03 | 3.39 | 7.6 | 102 | 5500 | 24 | 30 | 8558 | 9176.889 | 7263.772 | 8145.571 |
| 30 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2811 | 156 | 3.60 | 3.90 | 7.0 | 145 | 5000 | 19 | 24 | 12964 | 15195.436 | 17183.808 | 13716.133 |
| 32 | 2 | 86.6 | 144.6 | 63.9 | 50.8 | 1819 | 92 | 2.91 | 3.41 | 9.2 | 76 | 6000 | 31 | 38 | 6855 | 8337.452 | 7263.772 | 6482.563 |
| 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.58 | 9.0 | 86 | 5800 | 27 | 33 | 7895 | 10235.262 | 7263.772 | 8654.288 |
| 42 | 0 | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | 110 | 3.15 | 3.58 | 9.0 | 101 | 5800 | 24 | 28 | 12945 | 10316.630 | 10359.026 | 12202.971 |
| 50 | 0 | 102.0 | 191.7 | 70.6 | 47.8 | 3950 | 326 | 3.54 | 2.76 | 11.5 | 262 | 5000 | 13 | 17 | 36000 | 51915.678 | 34232.250 | 34235.892 |
| 51 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1890 | 91 | 3.03 | 3.15 | 9.0 | 68 | 5000 | 30 | 31 | 5195 | 5173.659 | 7263.772 | 6663.734 |
| 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.15 | 9.0 | 68 | 5000 | 31 | 38 | 6095 | 6289.890 | 7263.772 | 6475.812 |
| 63 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2410 | 122 | 3.39 | 3.39 | 8.6 | 84 | 4800 | 26 | 32 | 10245 | 9967.079 | 10359.026 | 9730.418 |
| 70 | 0 | 106.7 | 187.5 | 70.3 | 54.9 | 3495 | 183 | 3.58 | 3.64 | 21.5 | 123 | 4350 | 22 | 25 | 28176 | 24770.362 | 34232.250 | 25679.976 |
| 74 | 0 | 120.9 | 208.1 | 71.7 | 56.7 | 3900 | 308 | 3.80 | 3.35 | 8.0 | 184 | 4500 | 14 | 16 | 40960 | 43724.776 | 34232.250 | 39709.057 |
| 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6189 | 7843.103 | 7263.772 | 6391.613 |
| 88 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.46 | 7.5 | 116 | 5500 | 23 | 30 | 9279 | 10855.164 | 10359.026 | 9337.945 |
| 94 | 1 | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | 97 | 3.15 | 3.29 | 9.4 | 69 | 5200 | 31 | 37 | 7349 | 5864.873 | 7263.772 | 7833.066 |
| 101 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2302 | 120 | 3.33 | 3.47 | 8.5 | 97 | 5200 | 27 | 34 | 9549 | 10337.773 | 10359.026 | 9219.387 |
| 102 | 0 | 100.4 | 181.7 | 66.5 | 55.1 | 3095 | 181 | 3.43 | 3.27 | 9.0 | 152 | 5200 | 17 | 22 | 13499 | 23934.672 | 17183.808 | 18160.907 |
| 103 | 0 | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | 181 | 3.43 | 3.27 | 9.0 | 152 | 5200 | 17 | 22 | 14399 | 24072.799 | 17183.808 | 20968.961 |
| 107 | 1 | 99.2 | 178.5 | 67.9 | 49.7 | 3139 | 181 | 3.43 | 3.27 | 9.0 | 160 | 5200 | 19 | 25 | 18399 | 24413.031 | 17183.808 | 16487.228 |
| 114 | 0 | 114.2 | 198.9 | 68.4 | 56.7 | 3285 | 120 | 3.46 | 2.19 | 8.4 | 95 | 5000 | 19 | 24 | 16695 | 18564.063 | 13952.536 | 14111.058 |
| 117 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.52 | 21.0 | 95 | 4150 | 28 | 33 | 17950 | 18006.460 | 13952.536 | 15692.847 |
| 122 | 1 | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6692 | 6327.869 | 7263.772 | 6853.143 |
| 126 | 3 | 94.5 | 168.9 | 68.3 | 50.2 | 2778 | 151 | 3.94 | 3.11 | 9.5 | 143 | 5500 | 19 | 27 | 22018 | 20224.284 | 17183.808 | 20242.562 |
| 127 | 3 | 89.5 | 168.9 | 65.0 | 51.6 | 2756 | 194 | 3.74 | 2.90 | 9.5 | 207 | 5900 | 17 | 25 | 32528 | 28909.391 | 34232.250 | 30175.201 |
| 138 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | 121 | 3.54 | 3.07 | 9.0 | 160 | 5500 | 19 | 26 | 18620 | 15964.520 | 17183.808 | 17531.342 |
| 139 | 2 | 93.7 | 156.9 | 63.4 | 53.7 | 2050 | 97 | 3.62 | 2.36 | 9.0 | 69 | 4900 | 31 | 36 | 5118 | 8691.807 | 7263.772 | 6997.172 |
| 140 | 2 | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | 108 | 3.62 | 2.64 | 8.7 | 73 | 4400 | 26 | 31 | 7053 | 8760.793 | 7263.772 | 7380.448 |
| 152 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | 92 | 3.05 | 3.03 | 9.0 | 62 | 4800 | 31 | 38 | 6338 | 6784.575 | 7263.772 | 6582.665 |
| 163 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | 98 | 3.19 | 3.03 | 9.0 | 70 | 4800 | 28 | 34 | 9258 | 7193.004 | 7263.772 | 8003.921 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.03 | 9.0 | 70 | 4800 | 29 | 34 | 8238 | 6459.614 | 7263.772 | 7892.367 |
| 168 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2540 | 146 | 3.62 | 3.50 | 9.3 | 116 | 4800 | 24 | 30 | 8449 | 13739.170 | 10359.026 | 10510.013 |
| 170 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2551 | 146 | 3.62 | 3.50 | 9.3 | 116 | 4800 | 24 | 30 | 9989 | 13758.622 | 13952.536 | 10985.612 |
| 175 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | 110 | 3.27 | 3.35 | 22.5 | 73 | 4500 | 30 | 33 | 10698 | 11693.531 | 10359.026 | 11839.286 |
| 187 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2275 | 109 | 3.19 | 3.40 | 9.0 | 85 | 5250 | 27 | 34 | 8495 | 9831.875 | 7263.772 | 8135.098 |
| 190 | 3 | 94.5 | 159.3 | 64.2 | 55.6 | 2254 | 109 | 3.19 | 3.40 | 8.5 | 90 | 5500 | 24 | 29 | 11595 | 10836.086 | 7263.772 | 9274.737 |
| 192 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 24 | 13295 | 16142.879 | 13952.536 | 14279.262 |
| 199 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 3045 | 130 | 3.62 | 3.15 | 7.5 | 162 | 5100 | 17 | 22 | 18420 | 16091.523 | 17183.808 | 20422.190 |
The RMSE values are compared.
# Named rmse_modelos so that the table does not mask the rmse() function from Metrics
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse_modelos, caption = "RMSE statistic of each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 4065.081 | 2613.561 | 1942.218 |
The RMSE of the linear regression model is 4065.081.
The RMSE of the regression tree model is 2613.561.
The RMSE of the random forest model is 1942.218.
Based on the RMSE of each model, the random forest is the best model in R for these data with seed 1279; it is also the best when the same data are used with seed 2022.
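The comparison can also be expressed in a couple of lines of base R. The vector `rmse_valores` is a hypothetical name holding the three values reported above, entered by hand.

```r
# Validation RMSE of each model, copied from the table above
rmse_valores <- c(rm = 4065.081, ar = 2613.561, rf = 1942.218)

# Model with the smallest validation RMSE
names(which.min(rmse_valores))

# RMSE of each model relative to the best one (1 = best)
round(rmse_valores / min(rmse_valores), 2)
```

The ratios show how far each model is from the best one on this particular split; a different seed produces a different split and may change the gaps.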