Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) statistic as the comparison metric.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at the 90% confidence level or higher?
What is the value of Adjusted R-squared, that is, how much of the vehicle's price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # Graphical correlations
library(dplyr)
library(knitr) # Tabular output
library(kableExtra) # Nicely formatted tables
library(ggplot2) # Plotting
library(plotly) # Plotting
library(caret) # Train/validation partitions
library(Metrics) # rmse()
library(rpart) # Regression tree
library(rpart.plot) # Regression tree plots
library(randomForest) # Random forest
library(reshape) # Renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Its assigned insurance risk rating. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. (Categorical) |
| 2 | wheelbase | Wheelbase of car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of car (Numeric). |
| 4 | carwidth | Width of car (Numeric). |
| 5 | carheight | Height of car (Numeric). |
| 6 | curbweight | The weight of a car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the car's engine (Numeric). Size in … |
| 8 | boreratio | Bore ratio of car (Numeric). Cylinder bore measure of the engine. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of car (Numeric). Compression, a measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Car peak rpm (Numeric). Peak revolutions per minute. |
| 13 | citympg | Mileage in city (Numeric). Fuel economy in miles per gallon. |
| 14 | highwaympg | Mileage on highway (Numeric). Fuel economy in miles per gallon. |
| 15 | price (Dependent variable) | Price of car (Numeric). Price of the car in dollars. |
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data take 80% of the observations and the validation data take the remaining 20%.
n <- nrow(datos) # Number of observations
set.seed(2022) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 20 | 25 | 24565 |
| 31 | 31 | 2 | 86.6 | 144.6 | 63.9 | 50.8 | 1713 | 92 | 2.91 | 3.410 | 9.6 | 58 | 4800 | 49 | 54 | 6479 |
| 38 | 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 44 | 44 | 0 | 94.3 | 170.7 | 61.8 | 53.5 | 2337 | 111 | 3.31 | 3.230 | 8.5 | 78 | 4800 | 24 | 29 | 6785 |
| 51 | 51 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1890 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 30 | 31 | 5195 |
| 52 | 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 |
| 57 | 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 |
| 75 | 75 | 1 | 112.0 | 199.2 | 72.0 | 55.4 | 3715 | 304 | 3.80 | 3.350 | 8.0 | 184 | 4500 | 14 | 16 | 45400 |
| 78 | 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6189 |
| 79 | 79 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 2004 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6669 |
The multiple linear regression model (rm) is built
# Multiple linear regression model, used to see which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8272.1 -1754.3 -48.1 1498.7 14881.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.550e+04 1.841e+04 -3.014 0.00303 **
## symboling 1.375e+02 2.628e+02 0.523 0.60172
## wheelbase 1.908e+02 1.133e+02 1.685 0.09412 .
## carlength -9.563e+01 6.088e+01 -1.571 0.11835
## carwidth 4.949e+02 2.820e+02 1.755 0.08132 .
## carheight 1.271e+02 1.485e+02 0.856 0.39364
## curbweight 2.967e+00 1.859e+00 1.596 0.11257
## enginesize 1.018e+02 1.570e+01 6.488 1.19e-09 ***
## boreratio -1.067e+03 1.329e+03 -0.803 0.42325
## stroke -2.243e+03 8.312e+02 -2.698 0.00778 **
## compressionratio 2.206e+02 9.640e+01 2.288 0.02354 *
## horsepower 2.933e+01 1.862e+01 1.576 0.11723
## peakrpm 2.342e+00 7.521e-01 3.113 0.00221 **
## citympg -3.601e+02 2.094e+02 -1.720 0.08755 .
## highwaympg 2.932e+02 1.962e+02 1.495 0.13712
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3102 on 150 degrees of freedom
## Multiple R-squared: 0.8492, Adjusted R-squared: 0.8351
## F-statistic: 60.34 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at the 90% confidence level or higher?
The intercept is significant at the 99% confidence level (**).
The variables wheelbase, carwidth and citympg are significant at the 90% confidence level (.)
The variable compressionratio is significant at the 95% confidence level (*)
The variables stroke and peakrpm are significant predictors at the 99% confidence level (**)
The variable enginesize is a significant predictor at the 99.9% confidence level (***)
What is the value of Adjusted R-squared, that is, how much of the vehicle's price do the independent variables explain?
In multiple linear models, Adjusted R-squared: 0.8351 means that the independent variables explain approximately 83.51% of the variability of the dependent variable price, after adjusting for the number of predictors.
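The significance codes read off the summary can also be filtered programmatically. The following is a minimal sketch, assuming a fitted `lm` object; it uses the built-in `mtcars` data, and the names `modelo_demo`, `coeficientes` and `significativos` are hypothetical stand-ins, not part of the car-price analysis.

```r
# Illustrative only: mtcars stands in for the car-price training data
modelo_demo <- lm(mpg ~ wt + hp + drat, data = mtcars)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coeficientes <- summary(modelo_demo)$coefficients

# Keep the rows whose p-value is below 0.10 (the 90% confidence threshold)
significativos <- coeficientes[coeficientes[, "Pr(>|t|)"] < 0.10, , drop = FALSE]
significativos
```

Applied to `modelo_rm`, the same filter would be expected to return the rows marked with `.`, `*`, `**` or `***` in the summary.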
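As a quick check of how the adjusted value relates to the plain R-squared, the arithmetic can be reproduced from the numbers printed in the summary. This is only a sketch; the variable names `r2`, `n_entrenamiento`, `p` and `r2_ajustado` are hypothetical.

```r
# Values reported by summary(modelo_rm) in the output above
r2 <- 0.8492             # Multiple R-squared
n_entrenamiento <- 165   # 150 residual df + 14 predictors + 1 intercept
p <- 14                  # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors
r2_ajustado <- 1 - (1 - r2) * (n_entrenamiento - 1) / (n_entrenamiento - p - 1)
round(r2_ajustado, 4)
# 0.8351, in agreement with the reported Adjusted R-squared
```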
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 15 31 38 44 51 52 57 75
## 17198.963 2501.016 10429.543 6283.025 4391.072 6112.885 8802.582 37401.471
## 78 79 85 86 93 100 104 113
## 7592.731 7770.756 15626.112 10105.913 6214.441 10611.517 21549.303 18311.020
## 117 121 125 126 128 129 130 131
## 18311.020 6997.423 15305.666 19663.111 25200.151 25330.704 34503.077 11375.097
## 133 136 144 147 150 162 165 166
## 13789.703 13946.725 10737.958 8462.704 9469.444 6684.607 6023.436 11188.459
## 174 175 186 189 193 194 204 205
## 8834.319 11073.389 10072.788 11353.595 10097.057 11471.266 19732.396 19200.271
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 15 | 24565 | 17198.963 |
| 31 | 6479 | 2501.016 |
| 38 | 7895 | 10429.543 |
| 44 | 6785 | 6283.025 |
| 51 | 5195 | 4391.072 |
| 52 | 6095 | 6112.885 |
| 57 | 11845 | 8802.582 |
| 75 | 45400 | 37401.471 |
| 78 | 6189 | 7592.731 |
| 79 | 6669 | 7770.756 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3673.339
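To make explicit what `rmse()` computes, here is a minimal sketch of the same statistic written by hand. The helper name `rmse_manual` is hypothetical, and the example uses only the first three validation rows from the table above, so its result is not expected to match the full-sample value of 3673.339.

```r
# Hypothetical helper; the analysis itself calls rmse() from the Metrics package
rmse_manual <- function(real, prediccion) {
  # Square root of the mean of the squared differences
  sqrt(mean((real - prediccion)^2))
}

# First three validation rows from the comparison table above
real <- c(24565, 6479, 7895)
prediccion <- c(17198.963, 2501.016, 10429.543)
rmse_manual(real, prediccion)
```

Because the errors are squared before averaging, a single large miss (such as the first row) weighs heavily on the statistic.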
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 9569698000 13119.900
## 2) enginesize< 182 151 2938851000 11229.970
## 4) highwaympg>=28.5 99 623944000 8667.040
## 8) curbweight< 2291.5 60 84833530 7300.583 *
## 9) curbweight>=2291.5 39 254720700 10769.280 *
## 5) highwaympg< 28.5 52 426561400 16109.390
## 10) horsepower< 114.5 24 130010300 14542.830 *
## 11) horsepower>=114.5 28 187168600 17452.150 *
## 3) enginesize>=182 14 274215500 33504.210 *
Pending
rpart.plot(modelo_ar)
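The outline calls for identifying variable importance in the tree. One possible way, assuming a fitted `rpart` object, is to read its `variable.importance` component. The sketch below uses the built-in `mtcars` data, and the names `arbol_demo` and `importancia` are hypothetical; it is not the car-price tree itself.

```r
library(rpart)

# Illustrative only: mtcars stands in for the car-price training data
arbol_demo <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector of importance scores, sorted from highest to lowest
importancia <- sort(arbol_demo$variable.importance, decreasing = TRUE)
importancia

# Same scores expressed as a percentage of the total
round(100 * importancia / sum(importancia), 1)
```

Note that these scores also credit surrogate splits, so a variable can rank highly without appearing as a split in the printed tree.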
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 15 31 38 44 51 52 57 75
## 17452.149 7300.583 7300.583 10769.282 7300.583 7300.583 14542.833 33504.214
## 78 79 85 86 93 100 104 113
## 7300.583 7300.583 17452.149 10769.282 7300.583 10769.282 17452.149 10769.282
## 117 121 125 126 128 129 130 131
## 10769.282 7300.583 17452.149 17452.149 33504.214 33504.214 33504.214 10769.282
## 133 136 144 147 150 162 165 166
## 14542.833 14542.833 10769.282 7300.583 14542.833 7300.583 7300.583 7300.583
## 174 175 186 189 193 194 204 205
## 10769.282 10769.282 7300.583 10769.282 10769.282 10769.282 14542.833 14542.833
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 15 | 24565 | 17452.149 |
| 31 | 6479 | 7300.583 |
| 38 | 7895 | 7300.583 |
| 44 | 6785 | 10769.282 |
| 51 | 5195 | 7300.583 |
| 52 | 6095 | 7300.583 |
| 57 | 11845 | 14542.833 |
| 75 | 45400 | 33504.214 |
| 78 | 6189 | 7300.583 |
| 79 | 6669 | 7300.583 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3827.726
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 4783542
## % Var explained: 91.75
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 19734904.72 2959385553
## citympg 14645952.47 1673655627
## curbweight 11700258.39 1372066660
## horsepower 10126865.80 1194578601
## carwidth 4245682.63 748157514
## highwaympg 6721732.38 655729324
## carlength 1353716.38 583231822
## boreratio 276449.48 161500053
## stroke 1364697.14 155913342
## compressionratio 572554.27 101758201
## wheelbase 1946978.65 100587531
## peakrpm 1107844.66 47389481
## carheight 98921.28 42043764
## symboling 30983.94 13858920
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 15 31 38 44 51 52 57 75
## 17799.683 5956.730 8779.703 9703.812 6709.958 6734.879 13450.522 37767.236
## 78 79 85 86 93 100 104 113
## 6579.246 6749.000 14177.604 8768.958 6774.953 9872.542 15694.245 14317.617
## 117 121 125 126 128 129 130 131
## 14317.617 6615.899 13853.694 16124.047 28052.757 28052.757 28266.237 11366.945
## 133 136 144 147 150 162 165 166
## 13710.725 14035.183 9680.552 8647.761 12423.736 8069.050 8199.142 10019.898
## 174 175 186 189 193 194 204 205
## 10223.895 10989.203 8332.735 10776.245 11196.059 10898.665 15900.374 18459.557
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 15 | 24565 | 17799.682 |
| 31 | 6479 | 5956.730 |
| 38 | 7895 | 8779.702 |
| 44 | 6785 | 9703.812 |
| 51 | 5195 | 6709.958 |
| 52 | 6095 | 6734.879 |
| 57 | 11845 | 13450.522 |
| 75 | 45400 | 37767.236 |
| 78 | 6189 | 6579.246 |
| 79 | 6669 | 6749.000 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 3139.922
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.190 | 9.00 | 121 | 4250 | 20 | 25 | 24565.0 | 17198.963 | 17452.149 | 17799.682 |
| 31 | 2 | 86.6 | 144.6 | 63.9 | 50.8 | 1713 | 92 | 2.91 | 3.410 | 9.60 | 58 | 4800 | 49 | 54 | 6479.0 | 2501.016 | 7300.583 | 5956.730 |
| 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.580 | 9.00 | 86 | 5800 | 27 | 33 | 7895.0 | 10429.543 | 7300.583 | 8779.702 |
| 44 | 0 | 94.3 | 170.7 | 61.8 | 53.5 | 2337 | 111 | 3.31 | 3.230 | 8.50 | 78 | 4800 | 24 | 29 | 6785.0 | 6283.025 | 10769.282 | 9703.812 |
| 51 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1890 | 91 | 3.03 | 3.150 | 9.00 | 68 | 5000 | 30 | 31 | 5195.0 | 4391.072 | 7300.583 | 6709.958 |
| 52 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | 91 | 3.03 | 3.150 | 9.00 | 68 | 5000 | 31 | 38 | 6095.0 | 6112.885 | 7300.583 | 6734.879 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.40 | 101 | 6000 | 17 | 23 | 11845.0 | 8802.582 | 14542.833 | 13450.522 |
| 75 | 1 | 112.0 | 199.2 | 72.0 | 55.4 | 3715 | 304 | 3.80 | 3.350 | 8.00 | 184 | 4500 | 14 | 16 | 45400.0 | 37401.471 | 33504.214 | 37767.236 |
| 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6189.0 | 7592.731 | 7300.583 | 6579.246 |
| 79 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 2004 | 92 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6669.0 | 7770.756 | 7300.583 | 6749.000 |
| 85 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2926 | 156 | 3.59 | 3.860 | 7.00 | 145 | 5000 | 19 | 24 | 14489.0 | 15626.112 | 17452.149 | 14177.604 |
| 86 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | 122 | 3.35 | 3.460 | 8.50 | 88 | 5000 | 25 | 32 | 6989.0 | 10105.913 | 10769.282 | 8768.958 |
| 93 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1938 | 97 | 3.15 | 3.290 | 9.40 | 69 | 5200 | 31 | 37 | 6849.0 | 6214.441 | 7300.583 | 6774.953 |
| 100 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | 120 | 3.33 | 3.470 | 8.50 | 97 | 5200 | 27 | 34 | 8949.0 | 10611.517 | 10769.282 | 9872.542 |
| 104 | 0 | 100.4 | 184.6 | 66.5 | 55.1 | 3060 | 181 | 3.43 | 3.270 | 9.00 | 152 | 5200 | 19 | 25 | 13499.0 | 21549.303 | 17452.149 | 15694.245 |
| 113 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.520 | 21.00 | 95 | 4150 | 28 | 33 | 16900.0 | 18311.020 | 10769.282 | 14317.618 |
| 117 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.520 | 21.00 | 95 | 4150 | 28 | 33 | 17950.0 | 18311.020 | 10769.282 | 14317.618 |
| 121 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6229.0 | 6997.423 | 7300.583 | 6615.899 |
| 125 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | 156 | 3.59 | 3.860 | 7.00 | 145 | 5000 | 19 | 24 | 12764.0 | 15305.666 | 17452.149 | 13853.694 |
| 126 | 3 | 94.5 | 168.9 | 68.3 | 50.2 | 2778 | 151 | 3.94 | 3.110 | 9.50 | 143 | 5500 | 19 | 27 | 22018.0 | 19663.111 | 17452.149 | 16124.047 |
| 128 | 3 | 89.5 | 168.9 | 65.0 | 51.6 | 2756 | 194 | 3.74 | 2.900 | 9.50 | 207 | 5900 | 17 | 25 | 34028.0 | 25200.151 | 33504.214 | 28052.757 |
| 129 | 3 | 89.5 | 168.9 | 65.0 | 51.6 | 2800 | 194 | 3.74 | 2.900 | 9.50 | 207 | 5900 | 17 | 25 | 37028.0 | 25330.704 | 33504.214 | 28052.757 |
| 130 | 1 | 98.4 | 175.7 | 72.3 | 50.5 | 3366 | 203 | 3.94 | 3.110 | 10.00 | 288 | 5750 | 17 | 28 | 31400.5 | 34503.077 | 33504.214 | 28266.237 |
| 131 | 0 | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | 132 | 3.46 | 3.900 | 8.70 | 90 | 5100 | 23 | 31 | 9295.0 | 11375.097 | 10769.282 | 11366.945 |
| 133 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | 121 | 3.54 | 3.070 | 9.31 | 110 | 5250 | 21 | 28 | 11850.0 | 13789.703 | 14542.833 | 13710.725 |
| 136 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | 121 | 3.54 | 3.070 | 9.30 | 110 | 5250 | 21 | 28 | 15510.0 | 13946.725 | 14542.833 | 14035.183 |
| 144 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2340 | 108 | 3.62 | 2.640 | 9.00 | 94 | 5200 | 26 | 32 | 9960.0 | 10737.958 | 10769.282 | 9680.552 |
| 147 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | 108 | 3.62 | 2.640 | 9.00 | 82 | 4800 | 28 | 32 | 7463.0 | 8462.704 | 7300.583 | 8647.761 |
| 150 | 0 | 96.9 | 173.6 | 65.4 | 54.9 | 2650 | 108 | 3.62 | 2.640 | 7.70 | 111 | 4800 | 23 | 23 | 11694.0 | 9469.444 | 14542.833 | 12423.736 |
| 162 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2122 | 98 | 3.19 | 3.030 | 9.00 | 70 | 4800 | 28 | 34 | 8358.0 | 6684.607 | 7300.583 | 8069.050 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.030 | 9.00 | 70 | 4800 | 29 | 34 | 8238.0 | 6023.436 | 7300.583 | 8199.142 |
| 166 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | 98 | 3.24 | 3.080 | 9.40 | 112 | 6600 | 26 | 29 | 9298.0 | 11188.459 | 7300.583 | 10019.898 |
| 174 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2326 | 122 | 3.31 | 3.540 | 8.70 | 92 | 4200 | 29 | 34 | 8948.0 | 8834.319 | 10769.282 | 10223.895 |
| 175 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | 110 | 3.27 | 3.350 | 22.50 | 73 | 4500 | 30 | 33 | 10698.0 | 11073.389 | 10769.282 | 10989.203 |
| 186 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2212 | 109 | 3.19 | 3.400 | 9.00 | 85 | 5250 | 27 | 34 | 8195.0 | 10072.788 | 7300.583 | 8332.735 |
| 189 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2300 | 109 | 3.19 | 3.400 | 10.00 | 100 | 5500 | 26 | 32 | 9995.0 | 11353.595 | 10769.282 | 10776.245 |
| 193 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2579 | 97 | 3.01 | 3.400 | 23.00 | 68 | 4500 | 33 | 38 | 13845.0 | 10097.057 | 10769.282 | 11196.059 |
| 194 | 0 | 100.4 | 183.1 | 66.9 | 55.1 | 2563 | 109 | 3.19 | 3.400 | 9.00 | 88 | 5500 | 25 | 31 | 12290.0 | 11471.266 | 10769.282 | 10898.665 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.00 | 106 | 4800 | 26 | 27 | 22470.0 | 19732.396 | 14542.833 | 15900.374 |
| 205 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | 141 | 3.78 | 3.150 | 9.50 | 114 | 5400 | 19 | 25 | 22625.0 | 19200.271 | 14542.833 | 18459.557 |
The RMSE values are compared
# Named tabla_rmse so the data frame does not mask the rmse() function from Metrics
tabla_rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(tabla_rmse, caption = "RMSE statistic of each model") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3673.339 | 3827.726 | 3139.922 |
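The comparison in the table can also be made programmatically. The following is a small sketch that hard-codes the three RMSE values reported above; the names `rmse_modelos` and `mejor` are hypothetical.

```r
# RMSE values reported in the table above
rmse_modelos <- c(rm = 3673.339, ar = 3827.726, rf = 3139.922)

# Name of the model with the smallest RMSE
mejor <- names(which.min(rmse_modelos))
mejor
# "rf"

# Percentage reduction in RMSE that the best model achieves over each model
round(100 * (rmse_modelos - rmse_modelos[mejor]) / rmse_modelos, 1)
```

The ranking holds for this particular seed and 80/20 split; a different partition could plausibly reorder the models.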
Numeric car price data were loaded, with price modeled from a set of numeric variables.
The multiple linear regression model highlights statistically significant variables: compressionratio is significant at the 95% confidence level; stroke and peakrpm are significant predictors at the 99% level; and enginesize is a significant predictor at the 99.9% level.
In the regression tree model, the variables of importance were enginesize, highwaympg, curbweight and horsepower.
The random forest model ranks enginesize, curbweight, horsepower, citympg and carwidth as its most important variables.
The variable enginesize stands out as important and significant in all three models, and enginesize, curbweight and horsepower are important in both the regression tree and the random forest.
According to the root mean squared error (RMSE) statistic, the best model was the random forest, for this particular training and validation split and with 80% of the data used for training and 20% for validation.