Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are predictors at a confidence level above 90%?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For styled tables
library(ggplot2) # For plotting
library(plotly) # For interactive plotting
library(caret) # For data partitioning
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car: distance between axles, in inches. (Numeric) |
| 3 | carlength | Length of the car. (Numeric) |
| 4 | carwidth | Width of the car. (Numeric) |
| 5 | carheight | Height of the car. (Numeric) |
| 6 | curbweight | Weight of the car without occupants or baggage. (Numeric) |
| 7 | enginesize | Size of the car's engine, in … (Numeric) |
| 8 | boreratio | Bore ratio of the car; described by the source as engine efficiency. (Numeric) |
| 9 | stroke | Stroke, or volume inside the engine; relates to pistons, strokes and combustion. (Numeric) |
| 10 | compressionratio | Compression ratio of the car: a measure of pressure in the engine. (Numeric) |
| 11 | horsepower | Horsepower: power of the car. (Numeric) |
| 12 | peakrpm | Peak revolutions per minute of the car. (Numeric) |
| 13 | citympg | Mileage in the city: fuel consumption in miles per gallon. (Numeric) |
| 14 | highwaympg | Mileage on the highway: fuel consumption in miles per gallon. (Numeric) |
| 16 | price (dependent variable) | Price of the car, in dollars. (Numeric) |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data are 80% of the data and the validation data are the remaining 20%.
n <- nrow(datos)
set.seed(1306) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
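As a rough illustration of what an 80/20 split does, the sketch below partitions a data frame with base R's `sample()`. It is not how `createDataPartition()` works internally (caret stratifies on the outcome variable); the helper `dividir_datos` and the use of the built-in `mtcars` data as a stand-in for `datos` are assumptions made only for this example.

```r
# Hypothetical helper: simple random split (no stratification, unlike caret)
dividir_datos <- function(df, p = 0.80, semilla = 1306) {
  set.seed(semilla)                                # reproducible row selection
  n_entrena <- floor(p * nrow(df))                 # number of training rows
  idx <- sample(seq_len(nrow(df)), size = n_entrena)
  list(entrenamiento = df[idx, , drop = FALSE],    # rows selected for training
       validacion    = df[-idx, , drop = FALSE])   # all remaining rows
}

particion <- dividir_datos(mtcars)                 # mtcars stands in for datos
nrow(particion$entrenamiento)                      # 25 of the 32 rows
nrow(particion$validacion)                         # 7 of the 32 rows
```

The two pieces never share a row, which is what allows the validation set to act as unseen data when the models are compared.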
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 39 | 39 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 |
| 41 | 41 | 0 | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 10295 |
| 53 | 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 |
| 57 | 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 |
| 65 | 65 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | 122 | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 11245 |
| 69 | 69 | -1 | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | 183 | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 28248 |
The multiple linear regression model (rm) is built
# Multiple linear regression model to observe which variables are important
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9794.8 -1813.7 -201.3 1453.3 14617.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -65312.768 18324.688 -3.564 0.00049 ***
## symboling 316.346 288.466 1.097 0.27455
## wheelbase 191.455 132.851 1.441 0.15163
## carlength -90.654 66.088 -1.372 0.17220
## carwidth 617.745 288.389 2.142 0.03380 *
## carheight 128.550 172.216 0.746 0.45657
## curbweight 1.117 2.372 0.471 0.63821
## enginesize 122.415 16.530 7.405 8.76e-12 ***
## boreratio 123.954 1486.645 0.083 0.93366
## stroke -2813.227 944.733 -2.978 0.00339 **
## compressionratio 223.561 105.715 2.115 0.03610 *
## horsepower 25.395 19.880 1.277 0.20344
## peakrpm 2.588 0.838 3.089 0.00240 **
## citympg -249.012 207.640 -1.199 0.23232
## highwaympg 192.664 187.532 1.027 0.30590
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3376 on 150 degrees of freedom
## Multiple R-squared: 0.8506, Adjusted R-squared: 0.8367
## F-statistic: 61.01 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are predictors at a confidence level above 90%?
The intercept is significant at the 99.9% confidence level (***).
No variable falls only at the 90% level (.): wheelbase (p = 0.152) and citympg (p = 0.232) are not significant at that level.
The variables carwidth and compressionratio are significant at the 95% confidence level (*).
The variables stroke and peakrpm are significant as predictors at the 99% confidence level (**).
The variable enginesize is significant as a predictor at the 99.9% confidence level (***).
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
In a multiple linear model, the statistic Adjusted R-squared: 0.8367 means that the independent variables explain approximately 83.67% of the variance of the dependent variable price.
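Reading the significance stars by eye is error-prone, so the answers to both questions can also be pulled out of the model summary programmatically. The sketch below is an illustration on the built-in `mtcars` data; the objects `modelo`, `resumen` and `significativas` are hypothetical names, and `modelo_rm` would take the place of `modelo` in this case study.

```r
# Fit a small model on built-in data (stand-in for modelo_rm)
modelo <- lm(mpg ~ wt + hp, data = mtcars)
resumen <- summary(modelo)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- resumen$coefficients
p_valores <- coefs[-1, "Pr(>|t|)"]            # drop the intercept row

# Predictors significant at the 90% confidence level (p < 0.10)
significativas <- names(p_valores)[p_valores < 0.10]
significativas

# Adjusted R-squared as a plain number
resumen$adj.r.squared
```

Changing the threshold to 0.05, 0.01 or 0.001 gives the lists that correspond to the `*`, `**` and `***` codes.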
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 3 5 11 14 39 41 53 57
## 16249.353 15675.745 13536.659 15897.806 10292.840 8104.347 6084.074 8402.611
## 65 69 76 78 86 90 94 114
## 10398.217 24422.454 18693.078 7694.952 10121.591 6371.855 5949.951 17816.589
## 116 122 125 127 131 138 140 143
## 14719.309 5906.873 15618.864 26615.728 10983.098 16054.143 8434.433 8524.219
## 152 154 155 156 163 167 172 175
## 6242.164 5595.469 5639.370 6555.657 6555.631 11546.107 14155.252 10831.789
## 185 186 191 193 195 200 203 204
## 9753.978 10238.294 9539.189 9168.457 16429.825 16154.051 24309.253 18979.200
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 16249.353 |
| 5 | 17450 | 15675.745 |
| 11 | 16430 | 13536.659 |
| 14 | 21105 | 15897.806 |
| 39 | 9095 | 10292.840 |
| 41 | 10295 | 8104.347 |
| 53 | 6795 | 6084.074 |
| 57 | 11845 | 8402.611 |
| 65 | 11245 | 10398.217 |
| 69 | 28248 | 24422.454 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2510.996
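RMSE is the square root of the mean of the squared differences between actual and predicted values, so it is expressed in the same units as the price. The sketch below writes that definition out by hand; `rmse_manual` is a hypothetical name, and the three value pairs are the first three rows of the comparison table, so the result is not expected to match the RMSE of the full validation set.

```r
# Hypothetical helper: RMSE written out from its definition
rmse_manual <- function(real, prediccion) {
  errores <- real - prediccion        # residual for each observation
  sqrt(mean(errores^2))               # root of the mean squared residual
}

# First three rows of the comparison table (rows 3, 5 and 11)
precio_real <- c(16500, 17450, 16430)
precio_pred <- c(16249.353, 15675.745, 13536.659)

rmse_manual(precio_real, precio_pred)
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones.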
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11445220000 13426.070
## 2) enginesize< 182 149 3002948000 11135.320
## 4) curbweight< 2544 96 484534700 8449.750
## 8) curbweight< 2291.5 58 78729770 7209.621 *
## 9) curbweight>=2291.5 38 180459100 10342.580 *
## 5) curbweight>=2544 53 571917500 15999.740 *
## 3) enginesize>=182 16 379081000 34758.720 *
Pending
rpart.plot(modelo_ar)
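A fitted `rpart` object stores a variable importance vector that can be inspected directly. The sketch below shows one way to read it, using the built-in `mtcars` data as a stand-in; `arbol` and `importancia` are hypothetical names, and `modelo_ar` would be used in place of `arbol` here.

```r
library(rpart)   # recommended package, shipped with most R installations

# Stand-in model on built-in data (modelo_ar would be used in its place)
arbol <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector of importance scores, one entry per variable
importancia <- arbol$variable.importance
importancia

# The same scores expressed as percentages of the total
round(100 * importancia / sum(importancia), 1)
```

Since rpart also credits variables used as surrogate splits, a variable can receive an importance score without appearing in the printed tree. For the rules as text, `rpart.rules()` from rpart.plot can be applied to the fitted tree.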
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 3 5 11 14 39 41 53 57
## 15999.739 15999.739 10342.579 15999.739 7209.621 10342.579 7209.621 10342.579
## 65 69 76 78 86 90 94 114
## 10342.579 34758.719 15999.739 7209.621 10342.579 7209.621 7209.621 15999.739
## 116 122 125 127 131 138 140 143
## 15999.739 7209.621 15999.739 34758.719 15999.739 15999.739 7209.621 7209.621
## 152 154 155 156 163 167 172 175
## 7209.621 7209.621 7209.621 15999.739 7209.621 10342.579 15999.739 10342.579
## 185 186 191 193 195 200 203 204
## 7209.621 7209.621 7209.621 15999.739 15999.739 15999.739 15999.739 15999.739
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 15999.739 |
| 5 | 17450 | 15999.739 |
| 11 | 16430 | 10342.579 |
| 14 | 21105 | 15999.739 |
| 39 | 9095 | 7209.621 |
| 41 | 10295 | 10342.579 |
| 53 | 6795 | 7209.621 |
| 57 | 11845 | 10342.579 |
| 65 | 11245 | 10342.579 |
| 69 | 28248 | 34758.719 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3086.6
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 6726345
## % Var explained: 90.3
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 35617773.057 3136697449
## horsepower 15145464.114 2306265117
## curbweight 16404109.308 1754876848
## highwaympg 4833431.165 912209945
## carlength 3532019.215 629240616
## carwidth 4776638.587 534062936
## citympg 2187339.998 473501897
## wheelbase 3266390.722 444882866
## compressionratio 1533415.802 220660769
## boreratio 5283505.877 145517517
## peakrpm 770155.026 124143868
## stroke 1309263.295 116183809
## carheight 600983.448 53581221
## symboling -6422.724 16237861
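The two importance columns measure different things: `%IncMSE` is based on how much the out-of-bag error grows when a variable's values are permuted, while `IncNodePurity` is the total reduction in node impurity from splits on that variable. The rankings they produce need not agree, as the sketch below illustrates with the top three rows of the output above. The data frame `importancia_rf` is a hand-built copy for illustration, and the column is named `IncMSE` only because `%IncMSE` is not a syntactic R name.

```r
# Top three rows of the importance table printed above
importancia_rf <- data.frame(
  IncMSE        = c(35617773.057, 15145464.114, 16404109.308),
  IncNodePurity = c(3136697449, 2306265117, 1754876848),
  row.names     = c("enginesize", "horsepower", "curbweight")
)

# Ranking by total decrease in node impurity (the ordering used above)
orden_pureza <- rownames(importancia_rf)[order(-importancia_rf$IncNodePurity)]

# Ranking by increase in MSE when the variable is permuted
orden_mse <- rownames(importancia_rf)[order(-importancia_rf$IncMSE)]

orden_pureza  # "enginesize" "horsepower" "curbweight"
orden_mse     # "enginesize" "curbweight" "horsepower"
```

Both rankings put enginesize first, but horsepower and curbweight swap places, so any statement about "the most important variables" should say which measure it uses.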
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 3 5 11 14 39 41 53 57
## 15239.895 15955.170 14854.625 19609.825 8936.829 8904.374 6164.256 12293.090
## 65 69 76 78 86 90 94 114
## 9852.097 28384.943 24593.525 6436.341 8579.975 7004.522 7928.322 17277.907
## 116 122 125 127 131 138 140 143
## 13495.121 6967.620 13734.864 34372.863 10869.378 16347.357 7764.032 8092.583
## 152 154 155 156 163 167 172 175
## 6723.509 7743.936 8158.715 11822.827 7898.086 10366.571 12666.768 10113.736
## 185 186 191 193 195 200 203 204
## 8265.428 8213.279 9709.236 12145.732 15517.227 17483.815 20706.701 18003.279
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 15239.895 |
| 5 | 17450 | 15955.170 |
| 11 | 16430 | 14854.625 |
| 14 | 21105 | 19609.825 |
| 39 | 9095 | 8936.829 |
| 41 | 10295 | 8904.374 |
| 53 | 6795 | 6164.256 |
| 57 | 11845 | 12293.090 |
| 65 | 11245 | 9852.097 |
| 69 | 28248 | 28384.943 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1957.208
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 16249.353 | 15999.739 | 15239.895 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 | 15675.745 | 15999.739 | 15955.170 |
| 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 | 13536.659 | 10342.579 | 14854.625 |
| 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 | 15897.806 | 15999.739 | 19609.825 |
| 39 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 | 10292.840 | 7209.621 | 8936.829 |
| 41 | 0 | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 10295 | 8104.347 | 10342.579 | 8904.374 |
| 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 | 6084.074 | 7209.621 | 6164.256 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 | 8402.611 | 10342.579 | 12293.090 |
| 65 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | 122 | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 11245 | 10398.217 | 10342.579 | 9852.097 |
| 69 | -1 | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | 183 | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 28248 | 24422.454 | 34758.719 | 28384.943 |
| 76 | 1 | 102.7 | 178.4 | 68.0 | 54.8 | 2910 | 140 | 3.78 | 3.120 | 8.0 | 175 | 5000 | 19 | 24 | 16503 | 18693.078 | 15999.739 | 24593.525 |
| 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6189 | 7694.952 | 7209.621 | 6436.341 |
| 86 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | 122 | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 6989 | 10121.591 | 10342.579 | 8579.975 |
| 90 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1889 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 5499 | 6371.855 | 7209.621 | 7004.522 |
| 94 | 1 | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7349 | 5949.951 | 7209.621 | 7928.323 |
| 114 | 0 | 114.2 | 198.9 | 68.4 | 56.7 | 3285 | 120 | 3.46 | 2.190 | 8.4 | 95 | 5000 | 19 | 24 | 16695 | 17816.589 | 15999.739 | 17277.907 |
| 116 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | 120 | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 14719.309 | 15999.739 | 13495.121 |
| 122 | 1 | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6692 | 5906.873 | 7209.621 | 6967.620 |
| 125 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 12764 | 15618.864 | 15999.739 | 13734.864 |
| 127 | 3 | 89.5 | 168.9 | 65.0 | 51.6 | 2756 | 194 | 3.74 | 2.900 | 9.5 | 207 | 5900 | 17 | 25 | 32528 | 26615.728 | 34758.719 | 34372.863 |
| 131 | 0 | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | 132 | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9295 | 10983.098 | 15999.739 | 10869.378 |
| 138 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | 121 | 3.54 | 3.070 | 9.0 | 160 | 5500 | 19 | 26 | 18620 | 16054.143 | 15999.739 | 16347.357 |
| 140 | 2 | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | 108 | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7053 | 8434.433 | 7209.621 | 7764.032 |
| 143 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | 108 | 3.62 | 2.640 | 9.5 | 82 | 4400 | 28 | 33 | 7775 | 8524.219 | 7209.621 | 8092.583 |
| 152 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 38 | 6338 | 6242.164 | 7209.621 | 6723.509 |
| 154 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 37 | 6918 | 5595.469 | 7209.621 | 7743.936 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 5639.370 | 7209.621 | 8158.715 |
| 156 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 3110 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 8778 | 6555.657 | 15999.739 | 11822.827 |
| 163 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 28 | 34 | 9258 | 6555.631 | 7209.621 | 7898.086 |
| 167 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | 98 | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9538 | 11546.107 | 10342.579 | 10366.571 |
| 172 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2714 | 146 | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 11549 | 14155.252 | 15999.739 | 12666.768 |
| 175 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | 110 | 3.27 | 3.350 | 22.5 | 73 | 4500 | 30 | 33 | 10698 | 10831.789 | 10342.579 | 10113.736 |
| 185 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2264 | 97 | 3.01 | 3.400 | 23.0 | 52 | 4800 | 37 | 46 | 7995 | 9753.978 | 7209.621 | 8265.428 |
| 186 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2212 | 109 | 3.19 | 3.400 | 9.0 | 85 | 5250 | 27 | 34 | 8195 | 10238.294 | 7209.621 | 8213.279 |
| 191 | 3 | 94.5 | 165.7 | 64.0 | 51.4 | 2221 | 109 | 3.19 | 3.400 | 8.5 | 90 | 5500 | 24 | 29 | 9980 | 9539.189 | 7209.621 | 9709.236 |
| 193 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2579 | 97 | 3.01 | 3.400 | 23.0 | 68 | 4500 | 33 | 38 | 13845 | 9168.457 | 15999.739 | 12145.732 |
| 195 | -2 | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 12940 | 16429.825 | 15999.739 | 15517.228 |
| 200 | -1 | 104.3 | 188.8 | 67.2 | 57.5 | 3157 | 130 | 3.62 | 3.150 | 7.5 | 162 | 5100 | 17 | 22 | 18950 | 16154.051 | 15999.739 | 17483.815 |
| 203 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3012 | 173 | 3.58 | 2.870 | 8.8 | 134 | 5500 | 18 | 23 | 21485 | 24309.253 | 15999.739 | 20706.701 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470 | 18979.200 | 15999.739 | 18003.279 |
The RMSE values are compared
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # Named rmse_modelos so it does not share a name with Metrics::rmse()
kable(rmse_modelos, caption = "RMSE statistic of each model") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2510.996 | 3086.6 | 1957.208 |
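To make the comparison explicit, the sketch below picks the model with the lowest RMSE and expresses how much lower it is relative to the others. The vector `rmse_valores` simply copies the three values from the table above; `mejor` and `reduccion` are hypothetical names.

```r
# RMSE values copied from the comparison table
rmse_valores <- c(rm = 2510.996, ar = 3086.6, rf = 1957.208)

# Name of the model with the smallest RMSE
mejor <- names(which.min(rmse_valores))
mejor

# Percentage by which the best RMSE is lower than each model's RMSE
reduccion <- round(100 * (rmse_valores - min(rmse_valores)) / rmse_valores, 1)
reduccion
```

A relative figure is easier to interpret than the raw difference, because RMSE is measured in dollars and its scale depends on the price range of the cars in the validation set.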
Numeric automobile price data were loaded, with price explained by a set of numeric variables.
The multiple linear regression model highlights statistically significant variables: carwidth and compressionratio at the 95% confidence level; stroke and peakrpm at the 99% level; and enginesize at the 99.9% level.
For the regression tree model, the important variables were: enginesize, highwaympg, curbweight and horsepower.
The random forest model ranks as most important, by IncNodePurity in the output above: enginesize, horsepower, curbweight, highwaympg and carlength.
The variable enginesize stands out as important and significant in all models, and enginesize, curbweight and horsepower are important in both the regression tree and the random forest.
According to the root mean squared error (RMSE), the best model was the random forest (1957.208, versus 2510.996 for multiple linear regression and 3086.6 for the regression tree), with these particular training and validation data and an 80%/20% split.
Since the random forest has the lowest RMSE on these data, it is the most suitable of the three models used.
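The conclusion holds for one particular partition of the data. A quick way to see how much a validation RMSE can move with the split is to repeat the partition under several seeds, as in the sketch below. It uses a linear model on the built-in `mtcars` data purely as an illustration; the helper `rmse_una_particion` and the list of seeds are assumptions, not part of the original analysis.

```r
# Hypothetical helper: validation RMSE of a linear model for one random split
rmse_una_particion <- function(semilla, df = mtcars, p = 0.80) {
  set.seed(semilla)
  idx <- sample(seq_len(nrow(df)), size = floor(p * nrow(df)))
  modelo <- lm(mpg ~ wt + hp, data = df[idx, ])       # fit on training rows
  pred <- predict(modelo, newdata = df[-idx, ])       # predict held-out rows
  sqrt(mean((df$mpg[-idx] - pred)^2))                 # RMSE on held-out rows
}

semillas <- c(1306, 2021, 2022, 2023, 2024)           # arbitrary seeds
rmse_por_semilla <- sapply(semillas, rmse_una_particion)

round(rmse_por_semilla, 3)
range(rmse_por_semilla)                               # spread across splits
```

If the ranking of the models stays the same across many such splits, the choice of the best model is less likely to be an artifact of a single seed.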