Compare supervised models by applying prediction algorithms to automobile prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For data partitioning
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably quite safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the engine (Numeric). Engine size in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Related to engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Relates to the pistons, strokes and combustion. |
| 10 | compressionratio | Compression ratio of the car (Numeric). Compression, or a measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak rpm of the car (Numeric). Peak revolutions per minute. |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption. |
| 15 | price (Dependent variable) | Price of the car (Numeric). Price of the car in dollars. |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data use 80% of the observations and the validation data use the remaining 20%.
n <- nrow(datos)
set.seed(1271) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
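To make the partition step concrete, here is a minimal base R sketch of an 80/20 split by simple random sampling. It is a simplification and not what `createDataPartition` does internally: for a numeric outcome, caret samples within groups based on percentiles of `y`, so that both sets have a similar price distribution. The data frame `toy` and the helper `split_80_20` are hypothetical names used only for illustration.

```r
# Hypothetical helper: simple random split (no stratification on the outcome)
split_80_20 <- function(df, p = 0.80, seed = 1271) {
  set.seed(seed)                            # reproducible sampling
  n_train <- floor(p * nrow(df))            # number of training rows
  idx <- sample(seq_len(nrow(df)), size = n_train)
  list(train = df[idx, , drop = FALSE],     # rows selected for training
       valid = df[-idx, , drop = FALSE])    # remaining rows for validation
}

# Toy data with the same number of rows as the car data (205)
toy <- data.frame(id = 1:205, price = seq(5000, 45800, by = 200))
parts <- split_80_20(toy)

nrow(parts$train)  # 164
nrow(parts$valid)  # 41
```

The row counts can differ slightly from those produced by caret, because the stratified sampler rounds within each percentile group.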
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.400 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.400 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 33 | 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399.00 |
| 44 | 44 | 0 | 94.3 | 170.7 | 61.8 | 53.5 | 2337 | 111 | 3.31 | 3.230 | 8.5 | 78 | 4800 | 24 | 29 | 6785.00 |
| 47 | 47 | 2 | 96.0 | 172.6 | 65.2 | 51.4 | 2734 | 119 | 3.43 | 3.230 | 9.2 | 90 | 5000 | 24 | 29 | 11048.00 |
| 56 | 56 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945.00 |
| 57 | 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845.00 |
| 72 | 72 | -1 | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | 234 | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 34184.00 |
| 77 | 77 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 37 | 41 | 5389.00 |
| 84 | 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14869.00 |
The multiple linear regression model (rm) is built.
# Multiple linear regression model to examine which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11507.1 -1689.2 -187.6 1401.1 13627.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -46322.258 18280.752 -2.534 0.012305 *
## symboling 64.364 280.958 0.229 0.819112
## wheelbase 135.009 120.780 1.118 0.265436
## carlength -86.062 62.750 -1.372 0.172264
## carwidth 324.450 285.859 1.135 0.258185
## carheight 162.791 163.722 0.994 0.321674
## curbweight 1.837 2.311 0.795 0.427941
## enginesize 133.912 16.136 8.299 5.64e-14 ***
## boreratio 449.081 1419.698 0.316 0.752198
## stroke -3428.857 889.463 -3.855 0.000171 ***
## compressionratio 292.481 100.613 2.907 0.004203 **
## horsepower 19.572 18.852 1.038 0.300836
## peakrpm 2.841 0.772 3.680 0.000324 ***
## citympg -396.153 207.315 -1.911 0.057928 .
## highwaympg 310.853 187.381 1.659 0.099218 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3310 on 150 degrees of freedom
## Multiple R-squared: 0.8578, Adjusted R-squared: 0.8446
## F-statistic: 64.65 on 14 and 150 DF, p-value: < 2.2e-16
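As an illustration of what the fitted model does when predicting, the sketch below rebuilds one prediction by hand from the coefficients printed in the summary. It uses observation 33, whose values appear in the validation table. Because the printed coefficients are rounded, the result only approximates the value that `predict()` reports for that observation (5301.044). The names `coefs`, `car_33` and `pred_33` are hypothetical.

```r
# Coefficients copied from the summary(modelo_rm) output (rounded as printed)
coefs <- c(
  `(Intercept)` = -46322.258, symboling = 64.364, wheelbase = 135.009,
  carlength = -86.062, carwidth = 324.450, carheight = 162.791,
  curbweight = 1.837, enginesize = 133.912, boreratio = 449.081,
  stroke = -3428.857, compressionratio = 292.481, horsepower = 19.572,
  peakrpm = 2.841, citympg = -396.153, highwaympg = 310.853
)

# Observation 33, values as listed in the validation table
car_33 <- c(
  symboling = 1, wheelbase = 93.7, carlength = 150.0, carwidth = 64.0,
  carheight = 52.6, curbweight = 1837, enginesize = 79, boreratio = 2.91,
  stroke = 3.07, compressionratio = 10.1, horsepower = 60, peakrpm = 5500,
  citympg = 38, highwaympg = 42
)

# Prediction = intercept + sum(coefficient * value), matched by name
pred_33 <- unname(coefs["(Intercept)"] + sum(coefs[names(car_33)] * car_33))
pred_33  # roughly 5300, within rounding of the predict() result
```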
Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is significant at the 95% level (*), with p = 0.0123.
The variables highwaympg and citympg are significant at the 90% level (.).
The variable compressionratio is significant at the 99% level (**).
The variables enginesize, stroke and peakrpm are significant at the 99.9% level (***).
Among these, enginesize has the smallest p-value (5.64e-14) and the largest absolute t value (8.299).
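The mapping from p-values to significance codes can be reproduced with a short base R sketch. The p-values are copied from the summary output; the cut points follow the `Signif. codes` legend printed by R. The names `p_values` and `signif_table` are hypothetical, and the confidence column is simply 1 minus the p-value expressed as a percentage.

```r
# p-values copied from the coefficient table of summary(modelo_rm)
p_values <- c(
  `(Intercept)` = 0.012305, enginesize = 5.64e-14, stroke = 0.000171,
  compressionratio = 0.004203, peakrpm = 0.000324,
  citympg = 0.057928, highwaympg = 0.099218
)

# Cut points and symbols taken from the "Signif. codes" legend
codes <- cut(p_values,
             breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
             labels = c("***", "**", "*", ".", " "),
             include.lowest = TRUE)

signif_table <- data.frame(
  p_value = p_values,
  code = as.character(codes),
  confidence_pct = round((1 - p_values) * 100, 4)
)
signif_table
```

Reading the table row by row shows compressionratio in the `**` band and stroke and peakrpm in the `***` band.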
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
0.8446
In multiple linear models, Adjusted R-squared: 0.8446 means that the independent variables explain approximately 84.46% of the variability of the dependent variable, price.
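The adjusted value can be checked against the other figures in the summary. The sketch below applies the standard adjustment formula, using the printed Multiple R-squared (0.8578), the 165 training rows and the 14 predictors. The helper name `adjusted_r2` is hypothetical, and the small difference from 0.8446 comes from using the rounded R-squared.

```r
# Hypothetical helper: standard adjusted R-squared formula
# r2 = multiple R-squared, n = observations, p = number of predictors
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# 150 residual degrees of freedom = 165 rows - 14 predictors - 1 intercept
adj <- adjusted_r2(r2 = 0.8578, n = 165, p = 14)
round(adj, 3)  # 0.845
```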
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 9 10 33 44 47 56 57 72
## 16942.803 16473.778 5301.044 7204.223 11022.515 7851.656 7851.656 31588.516
## 77 84 87 89 97 100 104 107
## 5979.030 15252.727 10378.638 10405.155 6684.602 10903.360 22703.983 23008.175
## 108 116 122 132 138 140 142 147
## 14327.392 14428.427 6166.339 11091.658 16276.811 8985.576 9585.263 9661.038
## 148 149 155 156 160 161 167 171
## 12213.061 11235.252 6490.468 7996.805 10685.304 7017.584 11959.446 14219.038
## 173 178 188 192 196 201 203 204
## 14925.579 8406.054 8669.066 15473.126 17905.580 18628.974 24791.956 19036.351
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 9 | 23875.00 | 16942.803 |
| 10 | 17859.17 | 16473.778 |
| 33 | 5399.00 | 5301.044 |
| 44 | 6785.00 | 7204.223 |
| 47 | 11048.00 | 11022.515 |
| 56 | 10945.00 | 7851.656 |
| 57 | 11845.00 | 7851.656 |
| 72 | 34184.00 | 31588.516 |
| 77 | 5389.00 | 5979.030 |
| 84 | 14869.00 | 15252.727 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2915.551
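For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The base R sketch below computes it by hand. `rmse_manual` is a hypothetical helper, and the three price pairs are just the first rows of the comparison table, so the result is not the RMSE of the full validation set.

```r
# Hypothetical helper: root mean squared error computed by hand
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Small worked case: errors 1, 0, -2 -> squares 1, 0, 4 -> mean 5/3
rmse_manual(c(3, 5, 7), c(2, 5, 9))  # sqrt(5/3), about 1.29

# First three rows of the comparison table (illustration only)
actual <- c(23875.00, 17859.17, 5399.00)
predicted <- c(16942.803, 16473.778, 5301.044)
rmse_manual(actual, predicted)
```

Because the errors are squared before averaging, a single large miss (such as the first row) dominates the statistic.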
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11562880000 13522.800
## 2) enginesize< 182 148 2976024000 11138.720
## 4) curbweight< 2664 104 693947700 8805.567
## 8) curbweight< 2291 63 92941480 7319.714 *
## 9) curbweight>=2291 41 248195500 11088.710
## 18) highwaympg>=29.5 29 63171640 9991.931 *
## 19) highwaympg< 29.5 12 65834740 13739.250 *
## 5) curbweight>=2664 44 377800600 16653.450 *
## 3) enginesize>=182 17 422193000 34278.320 *
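The printed tree can be read as a set of nested rules, where each terminal node predicts the mean price of the training rows that fall into it. The sketch below transcribes those rules into a plain R function. `predict_tree` is a hypothetical helper written from the printout, not something produced by rpart, and the leaf values are rounded as printed.

```r
# Hypothetical helper: the five terminal nodes of the printed tree as rules
predict_tree <- function(enginesize, curbweight, highwaympg) {
  if (enginesize >= 182) return(34278.320)   # node 3
  if (curbweight >= 2664) return(16653.450)  # node 5
  if (curbweight < 2291) return(7319.714)    # node 8
  if (highwaympg >= 29.5) return(9991.931)   # node 18
  13739.250                                  # node 19
}

# Observation 9: enginesize 131, curbweight 3086, highwaympg 20
predict_tree(131, 3086, 20)   # 16653.45
# Observation 44: enginesize 111, curbweight 2337, highwaympg 29
predict_tree(111, 2337, 29)   # 13739.25
# Observation 72: enginesize 234, curbweight 3740, highwaympg 18
predict_tree(234, 3740, 18)   # 34278.32
```

This also explains why the tree can only return five distinct predicted prices, one per terminal node.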
Pending.
rpart.plot(modelo_ar)
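One possible way to obtain variable importance from an rpart tree is the `variable.importance` element of the fitted object. The sketch below is self-contained and therefore uses the built-in `mtcars` data as a stand-in, with `mpg` playing the role of `price`; it does not report results for `modelo_ar`. The names `fit`, `imp` and `imp_pct` are hypothetical.

```r
library(rpart)  # recommended package distributed with R

# Stand-in data: mtcars replaces the car price data, mpg replaces price
fit <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector of importance scores, sorted explicitly
imp <- sort(fit$variable.importance, decreasing = TRUE)

# Express each score as a percentage of the total for easier reading
imp_pct <- round(100 * imp / sum(imp), 1)
imp_pct
```

Variables that never appear in a split can still receive a score, since rpart also credits surrogate splits, so the list may be longer than the set of variables visible in the plot.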
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 9 10 33 44 47 56 57 72
## 16653.455 16653.455 7319.714 13739.250 16653.455 13739.250 13739.250 34278.324
## 77 84 87 89 97 100 104 107
## 7319.714 16653.455 9991.931 9991.931 7319.714 9991.931 16653.455 16653.455
## 108 116 122 132 138 140 142 147
## 16653.455 16653.455 7319.714 9991.931 16653.455 7319.714 7319.714 7319.714
## 148 149 155 156 160 161 167 171
## 9991.931 13739.250 7319.714 16653.455 7319.714 7319.714 13739.250 16653.455
## 173 178 188 192 196 201 203 204
## 16653.455 9991.931 9991.931 13739.250 16653.455 16653.455 16653.455 16653.455
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 9 | 23875.00 | 16653.455 |
| 10 | 17859.17 | 16653.455 |
| 33 | 5399.00 | 7319.714 |
| 44 | 6785.00 | 13739.250 |
| 47 | 11048.00 | 16653.455 |
| 56 | 10945.00 | 13739.250 |
| 57 | 11845.00 | 13739.250 |
| 72 | 34184.00 | 34278.324 |
| 77 | 5389.00 | 7319.714 |
| 84 | 14869.00 | 16653.455 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3217.158
The random forest model (rf) is built.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 4447333
## % Var explained: 93.65
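To give an intuition for how an ensemble of 20 trees arrives at a prediction, the sketch below averages 20 rpart trees, each fitted on a bootstrap sample. This is plain bagging, not a full random forest: it omits the random subset of variables tried at each split. It uses the built-in `mtcars` data as a stand-in, and all object names are hypothetical.

```r
library(rpart)  # recommended package distributed with R

set.seed(1271)
n_trees <- 20            # same number of trees as ntree = 20
n <- nrow(mtcars)

# Fit one tree per bootstrap sample and predict for every row of mtcars
tree_preds <- sapply(seq_len(n_trees), function(i) {
  boot_idx <- sample(n, size = n, replace = TRUE)
  fit <- rpart(mpg ~ ., data = mtcars[boot_idx, ])
  predict(fit, newdata = mtcars)
})

# The ensemble prediction is the average over the trees
bagged_pred <- rowMeans(tree_preds)
head(bagged_pred)

# Default number of variables tried at each split for regression
p <- 14
max(floor(p / 3), 1)  # 4, matching the model output
```

Averaging over many trees smooths the step-like predictions of a single tree, which is one reason the forest can return a different price for nearly every car.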
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 49729573.3 4931905857
## highwaympg 13259713.3 1526465689
## curbweight 12701949.8 963951056
## horsepower 8623072.0 880190906
## carlength 3045830.6 737353416
## wheelbase 6255495.2 528155676
## citympg 3812712.5 370359547
## carwidth 2516471.0 311465860
## boreratio 4479711.1 257610626
## compressionratio 1017535.0 149039936
## peakrpm 1939356.6 127464095
## stroke 1493552.1 51783892
## carheight 659535.1 51572211
## symboling 274547.4 34059590
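The two importance columns do not always rank variables in the same order. The sketch below copies the printed values into a data frame, expresses IncNodePurity as a share of the total, and lists the variables whose rank differs between the two measures. The column names `inc_mse`, `inc_node_purity` and the derived columns are hypothetical.

```r
# Values copied from the importance table printed above
imp <- data.frame(
  variable = c("enginesize", "highwaympg", "curbweight", "horsepower",
               "carlength", "wheelbase", "citympg", "carwidth", "boreratio",
               "compressionratio", "peakrpm", "stroke", "carheight",
               "symboling"),
  inc_mse = c(49729573.3, 13259713.3, 12701949.8, 8623072.0, 3045830.6,
              6255495.2, 3812712.5, 2516471.0, 4479711.1, 1017535.0,
              1939356.6, 1493552.1, 659535.1, 274547.4),
  inc_node_purity = c(4931905857, 1526465689, 963951056, 880190906,
                      737353416, 528155676, 370359547, 311465860, 257610626,
                      149039936, 127464095, 51783892, 51572211, 34059590)
)

# Share of total node purity, in percent
imp$purity_share_pct <- round(100 * imp$inc_node_purity /
                                sum(imp$inc_node_purity), 1)

# Rank 1 = most important under each measure
imp$rank_mse <- rank(-imp$inc_mse)
imp$rank_purity <- rank(-imp$inc_node_purity)

# Variables ranked differently by the two measures
imp[imp$rank_mse != imp$rank_purity, c("variable", "rank_mse", "rank_purity")]
```

The top four variables (enginesize, highwaympg, curbweight, horsepower) keep the same order under both measures, and enginesize alone accounts for roughly 45% of the total node purity.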
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 9 10 33 44 47 56 57 72
## 19545.596 18617.807 6295.784 10161.188 13073.485 13817.543 13817.543 35946.032
## 77 84 87 89 97 100 104 107
## 5891.486 13470.944 8822.478 9607.702 6859.600 9450.738 14622.413 16847.024
## 108 116 122 132 138 140 142 147
## 15300.398 15279.498 7156.435 9794.326 15916.927 7990.458 7935.674 8767.958
## 148 149 155 156 160 161 167 171
## 9667.052 11505.499 7957.923 10298.215 8022.398 7122.327 9752.818 11191.824
## 173 178 188 192 196 201 203 204
## 11812.162 9909.652 8265.507 15973.523 15680.271 18622.537 18923.450 17244.725
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 9 | 23875.00 | 19545.596 |
| 10 | 17859.17 | 18617.807 |
| 33 | 5399.00 | 6295.784 |
| 44 | 6785.00 | 10161.188 |
| 47 | 11048.00 | 13073.485 |
| 56 | 10945.00 | 13817.542 |
| 57 | 11845.00 | 13817.542 |
| 72 | 34184.00 | 35946.032 |
| 77 | 5389.00 | 5891.486 |
| 84 | 14869.00 | 13470.944 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2146.428
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.400 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 | 16942.803 | 16653.455 | 19545.596 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.400 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 | 16473.778 | 16653.455 | 18617.807 |
| 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399.00 | 5301.044 | 7319.714 | 6295.784 |
| 44 | 0 | 94.3 | 170.7 | 61.8 | 53.5 | 2337 | 111 | 3.31 | 3.230 | 8.5 | 78 | 4800 | 24 | 29 | 6785.00 | 7204.223 | 13739.250 | 10161.188 |
| 47 | 2 | 96.0 | 172.6 | 65.2 | 51.4 | 2734 | 119 | 3.43 | 3.230 | 9.2 | 90 | 5000 | 24 | 29 | 11048.00 | 11022.515 | 16653.455 | 13073.485 |
| 56 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945.00 | 7851.656 | 13739.250 | 13817.542 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845.00 | 7851.656 | 13739.250 | 13817.542 |
| 72 | -1 | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | 234 | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 34184.00 | 31588.516 | 34278.324 | 35946.032 |
| 77 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | 92 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 37 | 41 | 5389.00 | 5979.030 | 7319.714 | 5891.486 |
| 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14869.00 | 15252.727 | 16653.455 | 13470.944 |
| 87 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2405 | 122 | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 8189.00 | 10378.638 | 9991.931 | 8822.478 |
| 89 | -1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9279.00 | 10405.155 | 9991.931 | 9607.702 |
| 97 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7499.00 | 6684.602 | 7319.714 | 6859.600 |
| 100 | 0 | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | 120 | 3.33 | 3.470 | 8.5 | 97 | 5200 | 27 | 34 | 8949.00 | 10903.360 | 9991.931 | 9450.738 |
| 104 | 0 | 100.4 | 184.6 | 66.5 | 55.1 | 3060 | 181 | 3.43 | 3.270 | 9.0 | 152 | 5200 | 19 | 25 | 13499.00 | 22703.983 | 16653.455 | 14622.413 |
| 107 | 1 | 99.2 | 178.5 | 67.9 | 49.7 | 3139 | 181 | 3.43 | 3.270 | 9.0 | 160 | 5200 | 19 | 25 | 18399.00 | 23008.175 | 16653.455 | 16847.024 |
| 108 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3020 | 120 | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 11900.00 | 14327.392 | 16653.455 | 15300.398 |
| 116 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | 120 | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630.00 | 14428.427 | 16653.455 | 15279.498 |
| 122 | 1 | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | 90 | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6692.00 | 6166.339 | 7319.714 | 7156.435 |
| 132 | 2 | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | 132 | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9895.00 | 11091.658 | 9991.931 | 9794.326 |
| 138 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | 121 | 3.54 | 3.070 | 9.0 | 160 | 5500 | 19 | 26 | 18620.00 | 16276.811 | 16653.455 | 15916.927 |
| 140 | 2 | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | 108 | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7053.00 | 8985.576 | 7319.714 | 7990.458 |
| 142 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2145 | 108 | 3.62 | 2.640 | 9.5 | 82 | 4800 | 32 | 37 | 7126.00 | 9585.263 | 7319.714 | 7935.674 |
| 147 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 28 | 32 | 7463.00 | 9661.038 | 7319.714 | 8767.958 |
| 148 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | 108 | 3.62 | 2.640 | 9.0 | 94 | 5200 | 25 | 31 | 10198.00 | 12213.061 | 9991.931 | 9667.052 |
| 149 | 0 | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 23 | 29 | 8013.00 | 11235.252 | 13739.250 | 11505.499 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898.00 | 6490.468 | 7319.714 | 7957.922 |
| 156 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 3110 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 8778.00 | 7996.805 | 16653.455 | 10298.215 |
| 160 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2275 | 110 | 3.27 | 3.350 | 22.5 | 56 | 4500 | 38 | 47 | 7788.00 | 10685.304 | 7319.714 | 8022.398 |
| 161 | 0 | 95.7 | 166.3 | 64.4 | 53.0 | 2094 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 38 | 47 | 7738.00 | 7017.584 | 7319.714 | 7122.328 |
| 167 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | 98 | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9538.00 | 11959.446 | 13739.250 | 9752.818 |
| 171 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2679 | 146 | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 11199.00 | 14219.038 | 16653.455 | 11191.824 |
| 173 | 2 | 98.4 | 176.2 | 65.6 | 53.0 | 2975 | 146 | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 17669.00 | 14925.579 | 16653.455 | 11812.162 |
| 178 | -1 | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | 122 | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 11248.00 | 8406.054 | 9991.931 | 9909.652 |
| 188 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | 97 | 3.01 | 3.400 | 23.0 | 68 | 4500 | 37 | 42 | 9495.00 | 8669.066 | 9991.931 | 8265.507 |
| 192 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | 136 | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 24 | 13295.00 | 15473.126 | 13739.250 | 15973.522 |
| 196 | -1 | 104.3 | 188.8 | 67.2 | 57.5 | 3034 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 13415.00 | 17905.580 | 16653.455 | 15680.271 |
| 201 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 2952 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 16845.00 | 18628.974 | 16653.455 | 18622.537 |
| 203 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3012 | 173 | 3.58 | 2.870 | 8.8 | 134 | 5500 | 18 | 23 | 21485.00 | 24791.956 | 16653.455 | 18923.450 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470.00 | 19036.351 | 16653.455 | 17244.725 |
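The RMSE values are compared.
# Named rmse_modelos so that the data frame does not mask the rmse() function from Metrics
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse_modelos, caption = "RMSE statistic for each model") %>%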
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2915.551 | 3217.158 | 2146.428 |
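The comparison can also be summarized numerically. The sketch below sorts the three RMSE values from the table and computes how much lower the random forest error is relative to the linear model. The names `rmse_values`, `ranking` and `improvement_pct` are hypothetical.

```r
# RMSE values copied from the table above
rmse_values <- c(rm = 2915.551, ar = 3217.158, rf = 2146.428)

# Lower RMSE is better, so an ascending sort gives the ranking
ranking <- sort(rmse_values)
names(ranking)  # "rf" "rm" "ar"

# Relative reduction in RMSE of the random forest versus linear regression
improvement_pct <- unname(
  round(100 * (rmse_values["rm"] - rmse_values["rf"]) / rmse_values["rm"], 1)
)
improvement_pct  # 26.4
```

These figures come from a single 80/20 split with one seed, so the gap between models could change with a different partition.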
Numeric data on automobile prices were loaded, together with a set of numeric predictor variables.
The multiple linear regression model highlights the following statistically significant variables (confidence level taken as 1 minus the p-value):
- enginesize -> > 99.9999 %
- stroke -> 99.9829 %
- peakrpm -> 99.9676 %
- compressionratio -> 99.5797 %
- citympg -> 94.2072 %
- highwaympg -> 90.0782 %
For the regression tree model, the important variables were enginesize, highwaympg, curbweight and horsepower.
The random forest model treats variables such as enginesize, curbweight and highwaympg as important.
The variable enginesize stands out as important and significant in all models, and enginesize, curbweight and horsepower are important in both the regression tree and the random forest models.
According to the root mean squared error (RMSE), the best model was the random forest, for this particular set of training and validation data and for an 80% and 20% split between them. From best to worst, the models ranked as follows: