Compare supervised models by applying car price prediction algorithms and computing the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
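For reference, the comparison metric used throughout can be written as follows. The notation is an assumption introduced here, not taken from the original: $y_i$ is the actual price of validation car $i$, $\hat{y}_i$ is the predicted price, and $n$ is the number of validation observations.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

RMSE is expressed in the same units as the target (dollars here), and a lower value indicates predictions that are closer to the actual prices on average.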
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For nicely formatted tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky; -3 that it is probably quite safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of car (Numeric), in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of the car (Numeric). Compression, a measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak rpm of the car (Numeric). Peak revolutions per minute. |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption. |
| 16 | price | Price of the car (Numeric), in dollars. Dependent variable. |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The data are split into training data (80% of the observations) and validation data (the remaining 20%).
n <- nrow(datos)
set.seed(1283) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
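The idea behind the partition can be sketched in base R. This is a simplified illustration under stated assumptions, not what `createDataPartition` does internally: it draws a plain random sample, whereas caret samples within groups of a numeric outcome, which is likely why the outputs in this document report 165 training rows and 40 validation rows. The data frame `toy` and its columns are simulated stand-ins, and all `toy_*` names are hypothetical.

```r
# Minimal sketch of an 80/20 split with a plain random sample (base R only).
set.seed(1283)  # same seed value as above; the selected rows will differ from caret's

# Simulated stand-in for the 205-row car data (values are made up)
toy <- data.frame(id = 1:205, price = runif(205, 5000, 45000))

n_train <- floor(0.80 * nrow(toy))                  # number of training rows
idx <- sample(seq_len(nrow(toy)), size = n_train)   # row indices for training

toy_train <- toy[idx, ]    # rows selected for training
toy_valid <- toy[-idx, ]   # all remaining rows go to validation

# The two subsets are disjoint and together cover every row
c(train = nrow(toy_train), validation = nrow(toy_valid))
```

Indexing with `-idx` is the same negative-index idiom used above with `datos[-entrena, ]`.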
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 15 | 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.00 | 115 | 5500 | 18 | 22 | 17450.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.00 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16925.00 |
| 19 | 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.03 | 9.50 | 48 | 5100 | 47 | 53 | 5151.00 |
| 22 | 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572.00 |
| 27 | 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 7609.00 |
| 33 | 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.07 | 10.10 | 60 | 5500 | 38 | 42 | 5399.00 |
| 37 | 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7295.00 |
The multiple linear regression model (rm) is built.
# Multiple linear regression model, used to see which variables are significant
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10410.8 -1509.3 -105.6 1595.4 12547.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.498e+04 1.709e+04 -1.462 0.145968
## symboling -6.803e+01 2.777e+02 -0.245 0.806778
## wheelbase 6.017e+01 1.140e+02 0.528 0.598452
## carlength -3.120e+01 6.197e+01 -0.503 0.615352
## carwidth 4.259e+02 2.608e+02 1.633 0.104612
## carheight 2.778e+01 1.568e+02 0.177 0.859625
## curbweight 9.607e-01 1.827e+00 0.526 0.599692
## enginesize 1.414e+02 1.505e+01 9.401 < 2e-16 ***
## boreratio -3.947e+03 1.397e+03 -2.825 0.005366 **
## stroke -4.467e+03 8.990e+02 -4.969 1.81e-06 ***
## compressionratio 3.647e+02 9.362e+01 3.895 0.000147 ***
## horsepower 3.920e+01 1.728e+01 2.269 0.024702 *
## peakrpm 1.977e+00 7.177e-01 2.754 0.006607 **
## citympg -3.367e+02 1.880e+02 -1.790 0.075409 .
## highwaympg 2.017e+02 1.668e+02 1.209 0.228401
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3085 on 150 degrees of freedom
## Multiple R-squared: 0.8733, Adjusted R-squared: 0.8614
## F-statistic: 73.83 on 14 and 150 DF, p-value: < 2.2e-16
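As a consistency check on the summary above, the adjusted coefficient of determination can be recomputed from the reported values. The symbols are notation assumed here: $R^2$ is the multiple R-squared, $n$ the number of training observations, and $p$ the number of predictors.

$$
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1}
$$

With $R^2 = 0.8733$, $n = 165$ and $p = 14$ (which gives the 150 residual degrees of freedom shown):

$$
\bar{R}^2 \approx 1 - 0.1267 \times \frac{164}{150} \approx 0.861
$$

This agrees with the reported Adjusted R-squared of 0.8614, up to the rounding of $R^2$.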
Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is not statistically significant (p ≈ 0.146), and neither are symboling, wheelbase, carlength, carwidth, carheight, curbweight and highwaympg at the 90% level.
The variable citympg is significant at the 90% confidence level (.).
The variable horsepower is significant at the 95% confidence level (*).
The variables boreratio and peakrpm are significant at the 99% confidence level (**).
The variables enginesize, stroke and compressionratio are significant at the 99.9% confidence level (***).
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
In this multiple linear model, Adjusted R-squared: 0.8614 means that the independent variables explain approximately 86.14% of the variance of the dependent variable price.
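Reading significance codes off the printed summary is error-prone, so the same two answers can be extracted programmatically. The following is a minimal sketch on the built-in `mtcars` data, so the model and variable names are illustrative assumptions and unrelated to the car-price data; the same calls should work on `modelo_rm`.

```r
# Sketch on built-in data: mtcars stands in for the car-price data set.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient matrix with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(fit))
p_values <- coefs[, "Pr(>|t|)"]

# Predictors (intercept excluded) significant at the 90% confidence level
alpha <- 0.10
predictors <- setdiff(rownames(coefs), "(Intercept)")
significant <- predictors[p_values[predictors] < alpha]
significant

# Adjusted R-squared as a number rather than printed text
adj_r2 <- summary(fit)$adj.r.squared
adj_r2
```

Changing `alpha` to 0.05, 0.01 or 0.001 reproduces the `*`, `**` and `***` thresholds from the summary legend.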
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 5 10 11 12 19 22 27
## 14598.514 15990.243 18479.498 12559.276 12695.346 -1708.375 5430.154 6944.533
## 33 37 41 47 50 54 57 68
## 4954.259 7912.642 8573.980 10500.792 53579.352 5964.974 7305.516 25029.912
## 69 80 88 89 100 101 104 107
## 25316.784 8657.563 11017.604 11153.673 10258.802 10237.667 23717.361 24603.204
## 110 113 120 135 136 140 154 159
## 13812.776 17505.886 8683.348 21940.870 13644.225 7638.549 6070.431 9474.196
## 160 163 174 181 188 194 200 205
## 10340.400 7222.487 8407.976 22523.634 9668.326 10913.634 16634.820 17797.911
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 14598.514 |
| 5 | 17450.00 | 15990.243 |
| 10 | 17859.17 | 18479.498 |
| 11 | 16430.00 | 12559.276 |
| 12 | 16925.00 | 12695.346 |
| 19 | 5151.00 | -1708.375 |
| 22 | 5572.00 | 5430.154 |
| 27 | 7609.00 | 6944.533 |
| 33 | 5399.00 | 4954.259 |
| 37 | 7295.00 | 7912.642 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 4258.024
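The `rmse()` call above comes from the Metrics package. To make the computation explicit, here is a minimal base R sketch of the same quantity; `rmse_manual` is a hypothetical helper name introduced for illustration, and the three price pairs are the first three rows of the comparison table above.

```r
# Root mean squared error: square the errors, average them, take the square root.
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# First three validation rows from the comparison table above
real <- c(13495.00, 17450.00, 17859.17)
pred <- c(14598.514, 15990.243, 18479.498)

# RMSE over these three rows only (not the full validation set)
rmse_manual(real, pred)
```

Because the errors are squared before averaging, a few large misses (such as the negative predicted price for row 19) weigh heavily on the final value.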
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11264350000 13378.510
## 2) enginesize< 182 150 3154274000 11202.260
## 4) curbweight< 2659.5 101 534647900 8625.168
## 8) curbweight< 2291.5 60 84633810 7315.083 *
## 9) curbweight>=2291.5 41 196333300 10542.370 *
## 5) curbweight>=2659.5 49 566212700 16514.220 *
## 3) enginesize>=182 15 295560200 35141.030 *
Pending
rpart.plot(modelo_ar)
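Variable importance for the fitted tree is not printed in this section. One way to obtain it, sketched here on the built-in `mtcars` data rather than on the car-price data, is the `variable.importance` component that `rpart` stores in the fitted object; `toy_tree` is a hypothetical name, and the same component should be available on `modelo_ar`.

```r
library(rpart)

# Sketch on built-in data: mtcars stands in for the car-price data set.
toy_tree <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector; larger values mean the variable contributed more.
# Surrogate splits also count, so variables that never appear in the
# printed tree can still receive importance.
imp <- toy_tree$variable.importance
imp

# Share of total importance per variable, as percentages
round(100 * imp / sum(imp), 1)
```

This may explain how a variable can be reported as important for a tree whose printed splits use only a couple of predictors.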
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 5 10 11 12 19 22 27
## 10542.366 16514.224 16514.224 10542.366 10542.366 7315.083 7315.083 7315.083
## 33 37 41 47 50 54 57 68
## 7315.083 7315.083 10542.366 16514.224 35141.033 7315.083 10542.366 35141.033
## 69 80 88 89 100 101 104 107
## 35141.033 7315.083 10542.366 10542.366 10542.366 10542.366 16514.224 16514.224
## 110 113 120 135 136 140 154 159
## 16514.224 16514.224 7315.083 16514.224 16514.224 7315.083 7315.083 7315.083
## 160 163 174 181 188 194 200 205
## 7315.083 7315.083 10542.366 16514.224 10542.366 10542.366 16514.224 16514.224
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 10542.366 |
| 5 | 17450.00 | 16514.224 |
| 10 | 17859.17 | 16514.224 |
| 11 | 16430.00 | 10542.366 |
| 12 | 16925.00 | 10542.366 |
| 19 | 5151.00 | 7315.083 |
| 22 | 5572.00 | 7315.083 |
| 27 | 7609.00 | 7315.083 |
| 33 | 5399.00 | 7315.083 |
| 37 | 7295.00 | 7315.083 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3023.092
The random forest model (rf) is built.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 6818800
## % Var explained: 90.01
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 21683763.8 2885806481
## horsepower 21671732.0 2496845349
## citympg 13310749.5 1394117011
## curbweight 12292588.7 1369533548
## highwaympg 6789852.9 1005505226
## carwidth 6359475.2 851347067
## carlength 3590220.6 501114474
## wheelbase 1651354.4 215953664
## peakrpm 763869.0 212413860
## compressionratio 1096175.6 156293648
## carheight 755841.0 97024933
## boreratio 950325.3 91832468
## stroke 165027.4 44253605
## symboling 621553.9 13292838
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 5 10 11 12 19 22 27
## 15755.392 16025.979 20097.458 11385.545 11318.745 6322.735 5873.598 6491.117
## 33 37 41 47 50 54 57 68
## 6511.785 7503.638 8785.860 10010.836 37101.817 6916.543 13145.508 26733.942
## 69 80 88 89 100 101 104 107
## 27783.526 8400.249 10565.021 10565.021 10306.242 10389.005 17483.699 17888.934
## 110 113 120 135 136 140 154 159
## 16695.596 15558.827 8400.249 13798.011 14013.773 7787.873 7570.962 7950.020
## 160 163 174 181 188 194 200 205
## 8072.850 8056.402 10453.044 15871.252 8449.655 12329.320 17483.754 18159.217
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 15755.392 |
| 5 | 17450.00 | 16025.979 |
| 10 | 17859.17 | 20097.458 |
| 11 | 16430.00 | 11385.545 |
| 12 | 16925.00 | 11318.745 |
| 19 | 5151.00 | 6322.735 |
| 22 | 5572.00 | 5873.597 |
| 27 | 7609.00 | 6491.117 |
| 33 | 5399.00 | 6511.785 |
| 37 | 7295.00 | 7503.638 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1970.437
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.680 | 9.00 | 111 | 5000 | 21 | 27 | 13495.00 | 14598.514 | 10542.366 | 15755.392 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.400 | 8.00 | 115 | 5500 | 18 | 22 | 17450.00 | 15990.243 | 16514.224 | 16025.979 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.400 | 7.00 | 160 | 5500 | 16 | 22 | 17859.17 | 18479.498 | 16514.224 | 20097.458 |
| 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.80 | 101 | 5800 | 23 | 29 | 16430.00 | 12559.276 | 10542.366 | 11385.545 |
| 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.80 | 101 | 5800 | 23 | 29 | 16925.00 | 12695.346 | 10542.366 | 11318.745 |
| 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.030 | 9.50 | 48 | 5100 | 47 | 53 | 5151.00 | -1708.375 | 7315.083 | 6322.735 |
| 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.230 | 9.41 | 68 | 5500 | 37 | 41 | 5572.00 | 5430.154 | 7315.083 | 5873.597 |
| 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 7609.00 | 6944.533 | 7315.083 | 6491.117 |
| 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.070 | 10.10 | 60 | 5500 | 38 | 42 | 5399.00 | 4954.259 | 7315.083 | 6511.785 |
| 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.410 | 9.20 | 76 | 6000 | 30 | 34 | 7295.00 | 7912.642 | 7315.083 | 7503.638 |
| 41 | 0 | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | 110 | 3.15 | 3.580 | 9.00 | 86 | 5800 | 27 | 33 | 10295.00 | 8573.980 | 10542.366 | 8785.860 |
| 47 | 2 | 96.0 | 172.6 | 65.2 | 51.4 | 2734 | 119 | 3.43 | 3.230 | 9.20 | 90 | 5000 | 24 | 29 | 11048.00 | 10500.792 | 16514.224 | 10010.836 |
| 50 | 0 | 102.0 | 191.7 | 70.6 | 47.8 | 3950 | 326 | 3.54 | 2.760 | 11.50 | 262 | 5000 | 13 | 17 | 36000.00 | 53579.352 | 35141.033 | 37101.817 |
| 54 | 1 | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | 91 | 3.03 | 3.150 | 9.00 | 68 | 5000 | 31 | 38 | 6695.00 | 5964.974 | 7315.083 | 6916.543 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.40 | 101 | 6000 | 17 | 23 | 11845.00 | 7305.516 | 10542.366 | 13145.507 |
| 68 | -1 | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | 183 | 3.58 | 3.640 | 21.50 | 123 | 4350 | 22 | 25 | 25552.00 | 25029.912 | 35141.033 | 26733.942 |
| 69 | -1 | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | 183 | 3.58 | 3.640 | 21.50 | 123 | 4350 | 22 | 25 | 28248.00 | 25316.784 | 35141.033 | 27783.526 |
| 80 | 1 | 93.0 | 157.3 | 63.8 | 50.8 | 2145 | 98 | 3.03 | 3.390 | 7.60 | 102 | 5500 | 24 | 30 | 7689.00 | 8657.563 | 7315.083 | 8400.249 |
| 88 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.460 | 7.50 | 116 | 5500 | 23 | 30 | 9279.00 | 11017.604 | 10542.366 | 10565.021 |
| 89 | -1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.460 | 7.50 | 116 | 5500 | 23 | 30 | 9279.00 | 11153.673 | 10542.366 | 10565.021 |
| 100 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | 120 | 3.33 | 3.470 | 8.50 | 97 | 5200 | 27 | 34 | 8949.00 | 10258.802 | 10542.366 | 10306.242 |
| 101 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2302 | 120 | 3.33 | 3.470 | 8.50 | 97 | 5200 | 27 | 34 | 9549.00 | 10237.667 | 10542.366 | 10389.005 |
| 104 | 0 | 100.4 | 184.6 | 66.5 | 55.1 | 3060 | 181 | 3.43 | 3.270 | 9.00 | 152 | 5200 | 19 | 25 | 13499.00 | 23717.361 | 16514.224 | 17483.699 |
| 107 | 1 | 99.2 | 178.5 | 67.9 | 49.7 | 3139 | 181 | 3.43 | 3.270 | 9.00 | 160 | 5200 | 19 | 25 | 18399.00 | 24603.204 | 16514.224 | 17888.934 |
| 110 | 0 | 114.2 | 198.9 | 68.4 | 58.7 | 3230 | 120 | 3.46 | 3.190 | 8.40 | 97 | 5000 | 19 | 24 | 12440.00 | 13812.776 | 16514.224 | 16695.596 |
| 113 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.520 | 21.00 | 95 | 4150 | 28 | 33 | 16900.00 | 17505.886 | 16514.224 | 15558.827 |
| 120 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | 98 | 3.03 | 3.390 | 7.60 | 102 | 5500 | 24 | 30 | 7957.00 | 8683.348 | 7315.083 | 8400.249 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.070 | 9.30 | 110 | 5250 | 21 | 28 | 15040.00 | 21940.870 | 16514.224 | 13798.011 |
| 136 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | 121 | 3.54 | 3.070 | 9.30 | 110 | 5250 | 21 | 28 | 15510.00 | 13644.225 | 16514.224 | 14013.773 |
| 140 | 2 | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | 108 | 3.62 | 2.640 | 8.70 | 73 | 4400 | 26 | 31 | 7053.00 | 7638.549 | 7315.083 | 7787.873 |
| 154 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | 92 | 3.05 | 3.030 | 9.00 | 62 | 4800 | 31 | 37 | 6918.00 | 6070.431 | 7315.083 | 7570.962 |
| 159 | 0 | 95.7 | 166.3 | 64.4 | 53.0 | 2275 | 110 | 3.27 | 3.350 | 22.50 | 56 | 4500 | 34 | 36 | 7898.00 | 9474.196 | 7315.083 | 7950.020 |
| 160 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2275 | 110 | 3.27 | 3.350 | 22.50 | 56 | 4500 | 38 | 47 | 7788.00 | 10340.400 | 7315.083 | 8072.850 |
| 163 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | 98 | 3.19 | 3.030 | 9.00 | 70 | 4800 | 28 | 34 | 9258.00 | 7222.487 | 7315.083 | 8056.403 |
| 174 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2326 | 122 | 3.31 | 3.540 | 8.70 | 92 | 4200 | 29 | 34 | 8948.00 | 8407.976 | 10542.366 | 10453.044 |
| 181 | -1 | 104.5 | 187.8 | 66.5 | 54.1 | 3131 | 171 | 3.27 | 3.350 | 9.20 | 156 | 5200 | 20 | 24 | 15690.00 | 22523.634 | 16514.224 | 15871.252 |
| 188 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | 97 | 3.01 | 3.400 | 23.00 | 68 | 4500 | 37 | 42 | 9495.00 | 9668.326 | 10542.366 | 8449.655 |
| 194 | 0 | 100.4 | 183.1 | 66.9 | 55.1 | 2563 | 109 | 3.19 | 3.400 | 9.00 | 88 | 5500 | 25 | 31 | 12290.00 | 10913.634 | 10542.366 | 12329.320 |
| 200 | -1 | 104.3 | 188.8 | 67.2 | 57.5 | 3157 | 130 | 3.62 | 3.150 | 7.50 | 162 | 5100 | 17 | 22 | 18950.00 | 16634.820 | 16514.224 | 17483.754 |
| 205 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | 141 | 3.78 | 3.150 | 9.50 | 114 | 5400 | 19 | 25 | 22625.00 | 17797.911 | 16514.224 | 18159.217 |
The RMSE values are compared.
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic of each model") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 4258.024 | 3023.092 | 1970.437 |
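The comparison in the table can also be done in code. This short sketch hard-codes the three RMSE values reported above (in the notebook itself they would come from `rmse_rm`, `rmse_ar` and `rmse_rf`); `rmse_vals` and `best` are hypothetical names.

```r
# RMSE values copied from the table above
rmse_vals <- c(rm = 4258.024, ar = 3023.092, rf = 1970.437)

# Model with the smallest validation RMSE
best <- names(which.min(rmse_vals))
best

# Percentage reduction in RMSE of each model relative to multiple regression
round(100 * (1 - rmse_vals / rmse_vals["rm"]), 1)
```

The ranking depends on this particular train/validation split, so a different seed could change the gaps between models.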
Numeric car price data were loaded, with price modeled from several numeric variables.
The multiple linear regression model highlights the statistically significant variables: citympg at the 90% confidence level; horsepower at 95%; boreratio and peakrpm at 99%; and enginesize, stroke and compressionratio at 99.9%.
For the regression tree model, the important variables were enginesize, highwaympg, curbweight and horsepower (the printed tree itself splits only on enginesize and curbweight).
The random forest model ranks enginesize, horsepower, citympg, curbweight and highwaympg as the most important variables by IncNodePurity.
The variable enginesize stands out as important and significant in all models, and enginesize, curbweight and horsepower are important in both the regression tree and the random forest models.
According to the root mean squared error (RMSE), the best model was the random forest (1970.437, versus 3023.092 for the regression tree and 4258.024 for multiple regression), for this particular training and validation split of 80% and 20%.