Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created from 80% of the observations
Validation data are created from the remaining 20%
A multiple regression model is built with the training data
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the value of adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison
A regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison
A random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For nicely formatted tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Engine size (Numeric), in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Related to engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of the car (Numeric). Compression, or a measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak rpm of the car (Numeric). Peak revolutions per minute. |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption. |
| 16 | price (dependent variable) | Price of the car (Numeric), in dollars. |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The data are split into 80% training data and 20% validation data.
n <- nrow(datos)
set.seed(1704) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 25 | 15250.0 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.30 | 140 | 5500 | 17 | 20 | 23875.0 |
| 16 | 16 | 0 | 103.5 | 189.0 | 66.9 | 55.7 | 3230 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 16 | 22 | 30760.0 |
| 17 | 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 16 | 22 | 41315.0 |
| 21 | 21 | 0 | 94.5 | 158.8 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 6575.0 |
| 22 | 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572.0 |
| 35 | 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7129.0 |
| 39 | 39 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | 110 | 3.15 | 3.58 | 9.00 | 86 | 5800 | 27 | 33 | 9095.0 |
| 43 | 43 | 1 | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | 110 | 3.15 | 3.58 | 9.10 | 100 | 5500 | 25 | 31 | 10345.0 |
| 46 | 46 | 0 | 94.5 | 155.9 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 8916.5 |
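The 80/20 split above relies on `createDataPartition` from caret, which for a numeric outcome samples within groups formed from its quantiles. As a rough illustration of the basic idea only, the sketch below performs a plain random split in base R on a small made-up data frame. The names `toy`, `idx_train`, `toy_train` and `toy_valid` are hypothetical, and this sketch does not balance the split across the range of the outcome.

```r
# Hypothetical toy data: 20 rows, invented for illustration only
set.seed(1704)
toy <- data.frame(id = 1:20, price = round(runif(20, 5000, 40000)))

# Draw 80% of the row indices at random for training
idx_train <- sample(seq_len(nrow(toy)), size = floor(0.80 * nrow(toy)))

toy_train <- toy[idx_train, ]   # rows selected for training
toy_valid <- toy[-idx_train, ]  # all remaining rows, used for validation

nrow(toy_train)
nrow(toy_valid)
```

Negative indexing with `-idx_train` is the same idiom the case uses with `datos[-entrena, ]`: it keeps every row that was not drawn for training, so the two sets never overlap.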
The multiple linear regression model (rm) is built
# Multiple linear regression model, used to identify the important variables
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10102.4 -1847.6 -170.1 1637.5 9478.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.867e+04 1.646e+04 -2.958 0.0036 **
## symboling 5.001e+02 2.617e+02 1.911 0.0580 .
## wheelbase 7.755e+01 1.205e+02 0.644 0.5208
## carlength -1.181e+02 6.221e+01 -1.898 0.0596 .
## carwidth 5.641e+02 2.650e+02 2.129 0.0349 *
## carheight 3.007e+02 1.476e+02 2.038 0.0433 *
## curbweight 3.832e+00 2.115e+00 1.812 0.0720 .
## enginesize 1.164e+02 1.471e+01 7.916 5.04e-13 ***
## boreratio -1.713e+03 1.377e+03 -1.244 0.2153
## stroke -3.815e+03 9.292e+02 -4.105 6.61e-05 ***
## compressionratio 2.652e+02 8.853e+01 2.996 0.0032 **
## horsepower 1.981e+01 1.691e+01 1.171 0.2433
## peakrpm 2.192e+00 7.038e-01 3.115 0.0022 **
## citympg -3.415e+02 1.918e+02 -1.781 0.0770 .
## highwaympg 2.378e+02 1.671e+02 1.423 0.1569
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3093 on 150 degrees of freedom
## Multiple R-squared: 0.8679, Adjusted R-squared: 0.8556
## F-statistic: 70.42 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is significant at the 99% confidence level (**).
The variables symboling, carlength, curbweight and citympg are significant at the 90% confidence level (.).
The variables carwidth and carheight are significant at the 95% confidence level (*).
The variables compressionratio and peakrpm are significant at the 99% confidence level (**).
The variables enginesize and stroke are significant at the 99.9% confidence level (***).
The remaining variables (wheelbase, boreratio, horsepower and highwaympg) do not reach the 90% level.
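The symbols next to the `Pr(>|t|)` column follow the `Signif. codes` legend printed under the coefficient table above. A minimal sketch, using the p-values copied from that summary and the hypothetical names `p_valores`, `simbolos` and `tabla_signif`, shows how each p-value maps to its symbol:

```r
# p-values copied from the summary(modelo_rm) output shown earlier
p_valores <- c(symboling = 0.0580, wheelbase = 0.5208, carlength = 0.0596,
               carwidth = 0.0349, carheight = 0.0433, curbweight = 0.0720,
               enginesize = 5.04e-13, boreratio = 0.2153, stroke = 6.61e-05,
               compressionratio = 0.0032, horsepower = 0.2433,
               peakrpm = 0.0022, citympg = 0.0770, highwaympg = 0.1569)

# Cut points taken from the "Signif. codes" legend
simbolos <- cut(p_valores,
                breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                labels = c("***", "**", "*", ".", " "),
                include.lowest = TRUE)

tabla_signif <- data.frame(p_valor = p_valores, simbolo = simbolos)

# Keep only the predictors significant at the 90% level or better
tabla_signif[tabla_signif$p_valor < 0.10, ]
```

A p-value below 0.10 corresponds to the 90% confidence level, below 0.05 to 95%, below 0.01 to 99% and below 0.001 to 99.9%.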
What is the value of adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
In multiple linear models, the statistic Adjusted R-squared: 0.8556 means that the independent variables explain approximately 85.56% of the variability of the dependent variable price.
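Adjusted R-squared can be recomputed from the figures in the model summary: Multiple R-squared of 0.8679, 14 predictors and 150 residual degrees of freedom, which implies 165 training observations. A small sketch of that arithmetic follows; the variable names are hypothetical.

```r
r2 <- 0.8679   # Multiple R-squared from the summary
n  <- 165      # training observations: 150 residual df + 14 predictors + 1 intercept
p  <- 14       # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors
r2_ajustado <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(r2_ajustado, 4)   # 0.8556
```

The result agrees with the `Adjusted R-squared: 0.8556` line in the summary output.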
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 6 9 16 17 21 22 35 39
## 15454.595 19095.320 26435.568 26406.164 5179.175 5060.059 8457.070 9800.187
## 43 46 53 59 60 61 66 68
## 9739.963 5521.655 6580.044 11341.733 9960.211 10097.279 15874.058 24328.653
## 77 78 80 84 88 92 98 103
## 6289.722 7725.231 8272.657 15700.668 10359.588 6186.304 5762.910 23277.211
## 105 106 113 114 118 132 133 134
## 23904.764 24846.970 18513.233 18902.172 18699.679 10439.022 15006.080 14645.138
## 135 140 144 146 156 157 178 189
## 20718.558 9294.672 10885.450 11147.601 10734.886 6591.232 7931.971 11789.942
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 6 | 15250.0 | 15454.595 |
| 9 | 23875.0 | 19095.320 |
| 16 | 30760.0 | 26435.568 |
| 17 | 41315.0 | 26406.164 |
| 21 | 6575.0 | 5179.175 |
| 22 | 5572.0 | 5060.059 |
| 35 | 7129.0 | 8457.070 |
| 39 | 9095.0 | 9800.187 |
| 43 | 10345.0 | 9739.963 |
| 46 | 8916.5 | 5521.655 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3712.237
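The `rmse` function from the Metrics package takes the square root of the mean squared difference between actual and predicted values. As a sketch, the same calculation can be written in base R and applied to the first ten validation rows from the comparison table above, so the result differs from the value computed on all validation rows. `rmse_manual` is a hypothetical helper name.

```r
# RMSE: square root of the mean of the squared errors
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First ten actual and predicted prices from the comparison table
precio_real <- c(15250.0, 23875.0, 30760.0, 41315.0, 6575.0,
                 5572.0, 7129.0, 9095.0, 10345.0, 8916.5)
precio_pred <- c(15454.595, 19095.320, 26435.568, 26406.164, 5179.175,
                 5060.059, 8457.070, 9800.187, 9739.963, 5521.655)

rmse_manual(precio_real, precio_pred)
```

Because the errors are squared before averaging, a single large miss such as observation 17 (41315.0 actual against 26406.164 predicted) weighs heavily on the statistic.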
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10870190000 13357.520
## 2) enginesize< 182 150 3128774000 11231.340
## 4) curbweight< 2689.5 103 618646100 8744.102
## 8) curbweight< 2367.5 70 148932500 7582.079 *
## 9) curbweight>=2367.5 33 174693900 11209.000 *
## 5) curbweight>=2689.5 47 476521200 16682.110 *
## 3) enginesize>=182 15 282416500 34619.230 *
Pending
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 6 9 16 17 21 22 35 39
## 11209.000 16682.110 34619.233 34619.233 7582.079 7582.079 7582.079 7582.079
## 43 46 53 59 60 61 66 68
## 7582.079 7582.079 7582.079 11209.000 11209.000 11209.000 11209.000 34619.233
## 77 78 80 84 88 92 98 103
## 7582.079 7582.079 7582.079 16682.110 11209.000 7582.079 7582.079 16682.110
## 105 106 113 114 118 132 133 134
## 16682.110 16682.110 16682.110 16682.110 16682.110 11209.000 11209.000 16682.110
## 135 140 144 146 156 157 178 189
## 16682.110 7582.079 7582.079 11209.000 16682.110 7582.079 11209.000 7582.079
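The printed tree has four terminal nodes, which is why the predictions above take only four distinct values. As a sketch, the splits can be written out by hand as nested conditions. `predecir_arbol` is a hypothetical helper that hard-codes the thresholds and leaf means from the printed output; it is not how `predict` is implemented.

```r
# Hand-written version of the rules printed for modelo_ar
predecir_arbol <- function(enginesize, curbweight) {
  if (enginesize >= 182) {
    34619.23             # node 3
  } else if (curbweight >= 2689.5) {
    16682.11             # node 5
  } else if (curbweight >= 2367.5) {
    11209.00             # node 9
  } else {
    7582.079             # node 8
  }
}

# Validation row 6: enginesize 136, curbweight 2507
predecir_arbol(136, 2507)
# Validation row 16: enginesize 209, curbweight 3230
predecir_arbol(209, 3230)
```

Both calls land on the same leaves as the `predict` output for observations 6 and 16. Only enginesize and curbweight appear in the rules, because those are the only variables the fitted tree splits on.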
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 6 | 15250.0 | 11209.000 |
| 9 | 23875.0 | 16682.110 |
| 16 | 30760.0 | 34619.233 |
| 17 | 41315.0 | 34619.233 |
| 21 | 6575.0 | 7582.079 |
| 22 | 5572.0 | 7582.079 |
| 35 | 7129.0 | 7582.079 |
| 39 | 9095.0 | 7582.079 |
| 43 | 10345.0 | 7582.079 |
| 46 | 8916.5 | 7582.079 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3322.909
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 5081140
## % Var explained: 92.29
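The value 4 reported for `No. of variables tried at each split` was not set in the call. For regression, `randomForest` defaults `mtry` to one third of the number of predictors, rounded down and never less than 1. A short check with the 14 predictors used here (`mtry_defecto` is a hypothetical name):

```r
p <- 14                               # predictors passed in x
mtry_defecto <- max(floor(p / 3), 1)  # default mtry for regression
mtry_defecto                          # 4
```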
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 25422908.34 2895022676
## curbweight 23971706.18 2160074284
## carwidth 5607239.31 1715149048
## horsepower 8883242.29 1508649681
## highwaympg 3203814.62 890444751
## citympg 5894226.28 742996234
## carlength 3006418.06 629155904
## boreratio 1048408.82 443943684
## wheelbase 2512128.10 258821901
## stroke 767121.74 98490286
## carheight 244894.55 61721323
## compressionratio 208898.42 60462807
## peakrpm 1216907.24 49859749
## symboling 84898.83 10962473
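The two importance columns do not rank the variables identically. As an illustration, the sketch below re-enters the values from the output above and lists the top five variables under each measure. The names `importancia`, `top_mse` and `top_purity` are hypothetical.

```r
# Importance values copied from the modelo_rf$importance output
importancia <- data.frame(
  variable = c("enginesize", "curbweight", "carwidth", "horsepower",
               "highwaympg", "citympg", "carlength", "boreratio",
               "wheelbase", "stroke", "carheight", "compressionratio",
               "peakrpm", "symboling"),
  inc_mse = c(25422908.34, 23971706.18, 5607239.31, 8883242.29,
              3203814.62, 5894226.28, 3006418.06, 1048408.82,
              2512128.10, 767121.74, 244894.55, 208898.42,
              1216907.24, 84898.83),
  inc_node_purity = c(2895022676, 2160074284, 1715149048, 1508649681,
                      890444751, 742996234, 629155904, 443943684,
                      258821901, 98490286, 61721323, 60462807,
                      49859749, 10962473),
  stringsAsFactors = FALSE
)

# Top five variables under each measure
top_mse    <- head(importancia$variable[order(-importancia$inc_mse)], 5)
top_purity <- head(importancia$variable[order(-importancia$inc_node_purity)], 5)
top_mse
top_purity
```

Both rankings start with enginesize and curbweight; they differ from the third position onward.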
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 6 9 16 17 21 22 35 39
## 13988.671 21653.958 30202.289 31068.526 7073.896 6092.961 6732.125 8742.144
## 43 46 53 59 60 61 66 68
## 9660.593 7073.896 6111.287 14270.684 10765.549 11085.659 16654.878 29288.925
## 77 78 80 84 88 92 98 103
## 6027.711 6555.158 8308.740 13385.660 9819.380 6583.651 7386.056 17208.553
## 105 106 113 114 118 132 133 134
## 19171.289 23496.337 16313.111 15193.480 16643.487 10178.258 14385.812 14909.417
## 135 140 144 146 156 157 178 189
## 15488.017 7351.285 9135.426 10776.313 13383.987 7595.688 10266.028 9794.394
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 6 | 15250.0 | 13988.671 |
| 9 | 23875.0 | 21653.958 |
| 16 | 30760.0 | 30202.289 |
| 17 | 41315.0 | 31068.526 |
| 21 | 6575.0 | 7073.896 |
| 22 | 5572.0 | 6092.961 |
| 35 | 7129.0 | 6732.125 |
| 39 | 9095.0 | 8742.144 |
| 43 | 10345.0 | 9660.593 |
| 46 | 8916.5 | 7073.896 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2337.697
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.400 | 8.50 | 110 | 5500 | 19 | 25 | 15250.0 | 15454.595 | 11209.000 | 13988.671 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.400 | 8.30 | 140 | 5500 | 17 | 20 | 23875.0 | 19095.320 | 16682.110 | 21653.958 |
| 16 | 0 | 103.5 | 189.0 | 66.9 | 55.7 | 3230 | 209 | 3.62 | 3.390 | 8.00 | 182 | 5400 | 16 | 22 | 30760.0 | 26435.568 | 34619.233 | 30202.289 |
| 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.390 | 8.00 | 182 | 5400 | 16 | 22 | 41315.0 | 26406.164 | 34619.233 | 31068.526 |
| 21 | 0 | 94.5 | 158.8 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.110 | 9.60 | 70 | 5400 | 38 | 43 | 6575.0 | 5179.175 | 7582.079 | 7073.896 |
| 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.230 | 9.41 | 68 | 5500 | 37 | 41 | 5572.0 | 5060.059 | 7582.079 | 6092.961 |
| 35 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | 92 | 2.91 | 3.410 | 9.20 | 76 | 6000 | 30 | 34 | 7129.0 | 8457.070 | 7582.079 | 6732.125 |
| 39 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | 110 | 3.15 | 3.580 | 9.00 | 86 | 5800 | 27 | 33 | 9095.0 | 9800.187 | 7582.079 | 8742.144 |
| 43 | 1 | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | 110 | 3.15 | 3.580 | 9.10 | 100 | 5500 | 25 | 31 | 10345.0 | 9739.963 | 7582.079 | 9660.593 |
| 46 | 0 | 94.5 | 155.9 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.110 | 9.60 | 70 | 5400 | 38 | 43 | 8916.5 | 5521.655 | 7582.079 | 7073.896 |
| 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.150 | 9.00 | 68 | 5000 | 31 | 38 | 6795.0 | 6580.044 | 7582.079 | 6111.287 |
| 59 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2500 | 80 | 3.33 | 3.255 | 9.40 | 135 | 6000 | 16 | 23 | 15645.0 | 11341.733 | 11209.000 | 14270.684 |
| 60 | 1 | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | 122 | 3.39 | 3.390 | 8.60 | 84 | 4800 | 26 | 32 | 8845.0 | 9960.211 | 11209.000 | 10765.549 |
| 61 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2410 | 122 | 3.39 | 3.390 | 8.60 | 84 | 4800 | 26 | 32 | 8495.0 | 10097.279 | 11209.000 | 11085.659 |
| 66 | 0 | 104.9 | 175.0 | 66.1 | 54.4 | 2670 | 140 | 3.76 | 3.160 | 8.00 | 120 | 5000 | 19 | 27 | 18280.0 | 15874.058 | 11209.000 | 16654.878 |
| 68 | -1 | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | 183 | 3.58 | 3.640 | 21.50 | 123 | 4350 | 22 | 25 | 25552.0 | 24328.653 | 34619.233 | 29288.925 |
| 77 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | 92 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 37 | 41 | 5389.0 | 6289.722 | 7582.079 | 6027.711 |
| 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6189.0 | 7725.231 | 7582.079 | 6555.158 |
| 80 | 1 | 93.0 | 157.3 | 63.8 | 50.8 | 2145 | 98 | 3.03 | 3.390 | 7.60 | 102 | 5500 | 24 | 30 | 7689.0 | 8272.657 | 7582.079 | 8308.740 |
| 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.860 | 7.00 | 145 | 5000 | 19 | 24 | 14869.0 | 15700.668 | 16682.110 | 13385.660 |
| 88 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.460 | 7.50 | 116 | 5500 | 23 | 30 | 9279.0 | 10359.588 | 11209.000 | 9819.380 |
| 92 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1918 | 97 | 3.15 | 3.290 | 9.40 | 69 | 5200 | 31 | 37 | 6649.0 | 6186.304 | 7582.079 | 6583.651 |
| 98 | 1 | 94.5 | 170.2 | 63.8 | 53.5 | 2037 | 97 | 3.15 | 3.290 | 9.40 | 69 | 5200 | 31 | 37 | 7999.0 | 5762.910 | 7582.079 | 7386.056 |
| 103 | 0 | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | 181 | 3.43 | 3.270 | 9.00 | 152 | 5200 | 17 | 22 | 14399.0 | 23277.211 | 16682.110 | 17208.553 |
| 105 | 3 | 91.3 | 170.7 | 67.9 | 49.7 | 3071 | 181 | 3.43 | 3.270 | 9.00 | 160 | 5200 | 19 | 25 | 17199.0 | 23904.764 | 16682.110 | 19171.289 |
| 106 | 3 | 91.3 | 170.7 | 67.9 | 49.7 | 3139 | 181 | 3.43 | 3.270 | 7.80 | 200 | 5200 | 17 | 23 | 19699.0 | 24846.970 | 16682.110 | 23496.337 |
| 113 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.520 | 21.00 | 95 | 4150 | 28 | 33 | 16900.0 | 18513.233 | 16682.110 | 16313.111 |
| 114 | 0 | 114.2 | 198.9 | 68.4 | 56.7 | 3285 | 120 | 3.46 | 2.190 | 8.40 | 95 | 5000 | 19 | 24 | 16695.0 | 18902.172 | 16682.110 | 15193.480 |
| 118 | 0 | 108.0 | 186.7 | 68.3 | 56.0 | 3130 | 134 | 3.61 | 3.210 | 7.00 | 142 | 5600 | 18 | 24 | 18150.0 | 18699.679 | 16682.110 | 16643.487 |
| 132 | 2 | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | 132 | 3.46 | 3.900 | 8.70 | 90 | 5100 | 23 | 31 | 9895.0 | 10439.022 | 11209.000 | 10178.258 |
| 133 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | 121 | 3.54 | 3.070 | 9.31 | 110 | 5250 | 21 | 28 | 11850.0 | 15006.080 | 11209.000 | 14385.812 |
| 134 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2695 | 121 | 3.54 | 3.070 | 9.30 | 110 | 5250 | 21 | 28 | 12170.0 | 14645.138 | 16682.110 | 14909.417 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.070 | 9.30 | 110 | 5250 | 21 | 28 | 15040.0 | 20718.558 | 16682.110 | 15488.017 |
| 140 | 2 | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | 108 | 3.62 | 2.640 | 8.70 | 73 | 4400 | 26 | 31 | 7053.0 | 9294.672 | 7582.079 | 7351.285 |
| 144 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2340 | 108 | 3.62 | 2.640 | 9.00 | 94 | 5200 | 26 | 32 | 9960.0 | 10885.450 | 7582.079 | 9135.426 |
| 146 | 0 | 97.0 | 172.0 | 65.4 | 54.3 | 2510 | 108 | 3.62 | 2.640 | 7.70 | 111 | 4800 | 24 | 29 | 11259.0 | 11147.601 | 11209.000 | 10776.313 |
| 156 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 3110 | 92 | 3.05 | 3.030 | 9.00 | 62 | 4800 | 27 | 32 | 8778.0 | 10734.886 | 16682.110 | 13383.987 |
| 157 | 0 | 95.7 | 166.3 | 64.4 | 53.0 | 2081 | 98 | 3.19 | 3.030 | 9.00 | 70 | 4800 | 30 | 37 | 6938.0 | 6591.232 | 7582.079 | 7595.688 |
| 178 | -1 | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | 122 | 3.31 | 3.540 | 8.70 | 92 | 4200 | 27 | 32 | 11248.0 | 7931.971 | 11209.000 | 10266.028 |
| 189 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2300 | 109 | 3.19 | 3.400 | 10.00 | 100 | 5500 | 26 | 32 | 9995.0 | 11789.942 | 7582.079 | 9794.394 |
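The RMSE values are compared
# Named rmse_modelos so the data frame does not mask the rmse() function from Metrics
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse_modelos, caption = "RMSE statistic of each model") %>%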
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3712.237 | 3322.909 | 2337.697 |
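To put the differences in perspective, the RMSE values from the table can be expressed relative to the best model. A small sketch; `rmse_valores` and `mejor` are hypothetical names.

```r
# RMSE values copied from the table above
rmse_valores <- c(rm = 3712.237, ar = 3322.909, rf = 2337.697)

# Model with the lowest RMSE
mejor <- names(which.min(rmse_valores))
mejor

# Percentage by which each model's RMSE exceeds the lowest one
round(100 * (rmse_valores / min(rmse_valores) - 1), 1)
```

RMSE is expressed in the units of the response, so these figures are errors in dollars on the 40 validation observations.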
This case consisted of loading a numeric dataset of car prices together with a set of numeric predictor variables.
The multiple linear regression model highlights the statistically significant variables: compressionratio and peakrpm are significant predictors at the 99% confidence level, and enginesize and stroke at the 99.9% level. Of the remaining variables, carwidth and carheight reach the 95% level and symboling, carlength, curbweight and citympg the 90% level, while wheelbase, boreratio, horsepower and highwaympg are not significant at the 90% level.
In the regression tree model, the most important variable is enginesize, with a value of 23, followed by curbweight, horsepower, carwidth, highwaympg, citympg, carlength and wheelbase, in that order of importance. The fitted tree itself splits only on enginesize and curbweight.
The random forest model ranks enginesize, curbweight, horsepower, citympg and carwidth as its most important variables, in that order by %IncMSE.
As can be seen, enginesize appears in every model as an important and significant variable.
The best model was the random forest, for these training and validation data and for the 80%/20% split between them. Its RMSE was 2337.697, the lowest of the three regression models (3322.909 for the regression tree and 3712.237 for multiple linear regression).