Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the Adjusted R-squared value, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for the price is identified
The regression tree and its decision rules are visualized
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendly tabular output
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably pretty safe. (Categorical) |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the car's engine (Numeric), in … |
| 8 | boreratio | Bore ratio of the car (Numeric). Related to engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of the car (Numeric). Measure of compression in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak revolutions per minute of the car (Numeric). |
| 13 | citympg | Mileage in the city (Numeric). Fuel consumption in miles per gallon. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel consumption in miles per gallon. |
| 15 | price | Price of the car (Numeric), in dollars. Dependent variable. |
Source: https://archive.ics.uci.edu/ml/datasets/Automobile
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data take 80% of the observations and the validation data the remaining 20%.
n <- nrow(datos)
set.seed(1307) # Seed
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 20970 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105 |
| 18 | 18 | 0 | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 15 | 20 | 36880 |
| 19 | 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.03 | 9.50 | 48 | 5100 | 47 | 53 | 5151 |
| 22 | 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572 |
| 23 | 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6377 |
| 38 | 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.58 | 9.00 | 86 | 5800 | 27 | 33 | 7895 |
| 42 | 42 | 0 | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | 110 | 3.15 | 3.58 | 9.00 | 101 | 5800 | 24 | 28 | 12945 |
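To make the 80/20 partition more concrete, here is a minimal base-R sketch of a simple random split. It is only an approximation of what happens above: `createDataPartition()` additionally stratifies the sample by quantiles of `price`, which is why the partition in this document holds 165 training and 40 validation rows. The data frame `datos_demo` and the names `idx`, `entrenamiento_demo` and `validacion_demo` are hypothetical, and the prices are synthetic.

```r
# Stand-in data frame with the same number of rows as `datos` (205).
# The prices are synthetic and only serve to illustrate the split.
set.seed(1307)
datos_demo <- data.frame(id = 1:205, price = runif(205, 5000, 40000))

# Simple random 80/20 split (no stratification, unlike createDataPartition)
idx <- sample(nrow(datos_demo), size = round(0.80 * nrow(datos_demo)))
entrenamiento_demo <- datos_demo[idx, ]
validacion_demo <- datos_demo[-idx, ]

# The two subsets are disjoint and together cover every row
nrow(entrenamiento_demo) + nrow(validacion_demo) == nrow(datos_demo)
length(intersect(entrenamiento_demo$id, validacion_demo$id)) == 0
```

Indexing with `-idx` is the same idiom the document uses in `datos[-entrena, ]`: every row that was not drawn for training goes to validation.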
The multiple linear regression model (rm) is built
# Multiple linear regression model to inspect which variables matter
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9575.7 -1740.3 18.8 1526.8 14412.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.966e+04 1.788e+04 -2.777 0.006184 **
## symboling 2.560e+02 2.805e+02 0.912 0.362985
## wheelbase 1.277e+02 1.263e+02 1.011 0.313448
## carlength -6.860e+01 6.349e+01 -1.080 0.281674
## carwidth 4.241e+02 2.799e+02 1.515 0.131806
## carheight 2.055e+02 1.588e+02 1.294 0.197486
## curbweight 1.576e+00 1.951e+00 0.808 0.420413
## enginesize 1.145e+02 1.514e+01 7.568 3.56e-12 ***
## boreratio -9.993e+02 1.435e+03 -0.696 0.487201
## stroke -3.039e+03 9.327e+02 -3.258 0.001389 **
## compressionratio 3.309e+02 9.523e+01 3.475 0.000668 ***
## horsepower 3.044e+01 1.849e+01 1.646 0.101770
## peakrpm 2.621e+00 7.464e-01 3.512 0.000588 ***
## citympg -3.980e+02 2.129e+02 -1.869 0.063598 .
## highwaympg 2.490e+02 1.849e+02 1.347 0.179990
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3240 on 150 degrees of freedom
## Multiple R-squared: 0.8465, Adjusted R-squared: 0.8322
## F-statistic: 59.11 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level above 90%?
The intercept is significant at the 99% confidence level (**).
The variable citympg is significant at the 90% confidence level (.).
The variable stroke is significant as a predictor at the 99% confidence level (**).
The variables enginesize, compressionratio and peakrpm are significant as predictors at the 99.9% confidence level (***).
The remaining variables, including wheelbase, carwidth and horsepower, have p-values above 0.10 and do not reach the 90% level.
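The reading of the significance codes can be checked mechanically. The sketch below copies the p-values from the `Pr(>|t|)` column of the summary output and keeps the predictors with p < 0.10; the names `p_valores` and `significativas` are introduced here only for illustration.

```r
# p-values copied from the Pr(>|t|) column of summary(modelo_rm) above
p_valores <- c(
  symboling = 0.362985, wheelbase = 0.313448, carlength = 0.281674,
  carwidth = 0.131806, carheight = 0.197486, curbweight = 0.420413,
  enginesize = 3.56e-12, boreratio = 0.487201, stroke = 0.001389,
  compressionratio = 0.000668, horsepower = 0.101770, peakrpm = 0.000588,
  citympg = 0.063598, highwaympg = 0.179990
)

# Predictors significant at the 90% confidence level (p < 0.10)
significativas <- names(p_valores)[p_valores < 0.10]
significativas
# "enginesize" "stroke" "compressionratio" "peakrpm" "citympg"

# On the fitted model, the same vector could be obtained with
# coef(summary(modelo_rm))[, "Pr(>|t|)"]
```

Note that horsepower (p = 0.10177) falls just short of the 90% threshold.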
What is the Adjusted R-squared value, that is, how much of the vehicle price do the independent variables explain?
In multiple linear models, the statistic Adjusted R-squared: 0.8322 means that the independent variables explain approximately 83.22% of the variance of the dependent variable price, after penalizing for the number of predictors.
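As a quick check, the adjusted value can be recomputed from the figures printed in the summary, using the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1). The variable names below are illustrative; the inputs are the rounded values from the output, so the result agrees only to the printed precision.

```r
r2 <- 0.8465  # Multiple R-squared from the summary output
n  <- 165     # training rows: 150 residual degrees of freedom + 15 coefficients
p  <- 14      # predictors, not counting the intercept

# Adjusted R-squared penalizes R-squared for the number of predictors
r2_ajustado <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(r2_ajustado, 4)
# 0.8322
```

On the fitted model the same quantity is available as `summary(modelo_rm)$adj.r.squared`.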
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 4 13 14 18 19 22 23
## 13187.261 12122.679 16254.768 16341.463 29417.089 -1341.794 4843.380 6480.748
## 38 42 45 53 54 65 68 69
## 10153.838 10542.660 5466.677 6028.840 5563.672 10088.770 24062.963 24885.549
## 72 73 77 84 94 97 103 106
## 30825.178 29184.730 5645.758 15321.158 5920.836 6378.960 22747.899 23650.878
## 117 120 124 133 135 138 143 147
## 17915.873 8481.003 11616.651 14689.541 18801.358 17104.206 7915.848 8681.818
## 148 150 155 159 165 184 188 192
## 11300.588 9821.287 6607.841 8481.645 5970.279 10227.048 9368.167 15747.809
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 13187.261 |
| 4 | 13950 | 12122.679 |
| 13 | 20970 | 16254.768 |
| 14 | 21105 | 16341.463 |
| 18 | 36880 | 29417.089 |
| 19 | 5151 | -1341.794 |
| 22 | 5572 | 4843.380 |
| 23 | 6377 | 6480.748 |
| 38 | 7895 | 10153.838 |
| 42 | 12945 | 10542.660 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3071.741
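For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. The sketch below writes that definition out by hand and applies it to the ten rows shown in the comparison table; `rmse_manual` is a hypothetical helper, not part of the `Metrics` package. Because it uses only 10 of the 40 validation rows, its result is not expected to equal the 3071.741 reported above.

```r
# RMSE written out from its definition
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First ten rows of the comparison table above
precio_real <- c(13495, 13950, 20970, 21105, 36880,
                 5151, 5572, 6377, 7895, 12945)
precio_pred <- c(13187.261, 12122.679, 16254.768, 16341.463, 29417.089,
                 -1341.794, 4843.380, 6480.748, 10153.838, 10542.660)

# RMSE over these ten rows only
rmse_manual(precio_real, precio_pred)
```

Row 19 is worth noticing: the linear model predicts a negative price (-1341.794), something the tree-based models cannot do because they predict averages of observed training prices.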
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10263650000 13207.410
## 2) enginesize< 182 152 3174609000 11330.450
## 4) curbweight< 2624.5 100 610484400 8735.675
## 8) curbweight< 2367.5 67 121724900 7563.381 *
## 9) curbweight>=2367.5 33 209740200 11115.790 *
## 5) curbweight>=2624.5 52 596065500 16320.390
## 10) carwidth< 68.6 44 394490800 15583.750 *
## 11) carwidth>=68.6 8 46382600 20371.880 *
## 3) enginesize>=182 13 292348800 35153.500 *
Pending: variable importance of the regression tree.
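The variable-importance step listed in the outline for the regression tree could be approached as in the following sketch. It uses the built-in `mtcars` data as a stand-in so that it runs on its own; `modelo_demo` and `importancia` are hypothetical names, and nothing here reproduces the results of `modelo_ar`. The sketch relies on the `variable.importance` component that fitted `rpart` objects carry when the tree has at least one split.

```r
library(rpart)

# Stand-in regression tree on a built-in dataset (not the car price data)
modelo_demo <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector with one importance score per variable that
# took part in a primary or surrogate split
importancia <- sort(modelo_demo$variable.importance, decreasing = TRUE)
importancia

# Same scores as percentages of the total, which are easier to compare
round(100 * importancia / sum(importancia), 1)
```

Under the same assumption, the equivalent call for this document would be `modelo_ar$variable.importance`.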
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 4 13 14 18 19 22 23
## 11115.788 7563.381 15583.754 15583.754 35153.500 7563.381 7563.381 7563.381
## 38 42 45 53 54 65 68 69
## 7563.381 11115.788 7563.381 7563.381 7563.381 11115.788 35153.500 35153.500
## 72 73 77 84 94 97 103 106
## 35153.500 35153.500 7563.381 15583.754 7563.381 7563.381 15583.754 15583.754
## 117 120 124 133 135 138 143 147
## 15583.754 7563.381 11115.788 15583.754 15583.754 15583.754 7563.381 7563.381
## 148 150 155 159 165 184 188 192
## 11115.788 15583.754 7563.381 7563.381 7563.381 7563.381 7563.381 15583.754
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 11115.788 |
| 4 | 13950 | 7563.381 |
| 13 | 20970 | 15583.754 |
| 14 | 21105 | 15583.754 |
| 18 | 36880 | 35153.500 |
| 19 | 5151 | 7563.381 |
| 22 | 5572 | 7563.381 |
| 23 | 6377 | 7563.381 |
| 38 | 7895 | 7563.381 |
| 42 | 12945 | 11115.788 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 2962.646
The random forest model (rf) is built
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 5989613
## % Var explained: 90.37
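The "% Var explained" figure can be related to numbers already printed in this document. The sketch below assumes that `randomForest` reports it as 1 - MSE / Var(y), with the variance computed with denominator n, and takes the total sum of squares of the training prices from the root deviance printed for the regression tree. The variable names are illustrative, and the inputs are rounded printed values.

```r
mse_bosque <- 5989613         # "Mean of squared residuals" reported above
devianza_raiz <- 10263650000  # root deviance of modelo_ar: sum((price - mean(price))^2)
n <- 165                      # training rows

# Variance of the training prices with denominator n
varianza_precio <- devianza_raiz / n

# Share of the variance explained by the forest, in percent
pct_explicado <- 100 * (1 - mse_bosque / varianza_precio)
round(pct_explicado, 2)
# 90.37
```

The mean of squared residuals is computed on out-of-bag observations, so this percentage is not the same kind of quantity as the in-sample R-squared of the linear model.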
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 25367855.99 3107956116
## curbweight 21079567.61 2415517518
## horsepower 9913179.25 1627114679
## highwaympg 4377674.74 965418196
## citympg 3988234.29 665637697
## carlength 4565557.03 658863936
## carwidth 2067552.08 246691319
## wheelbase 3834990.52 214304221
## stroke 830642.84 117291996
## boreratio 360360.10 96699499
## peakrpm 1006905.80 86476147
## compressionratio -148366.54 82691393
## carheight 173930.67 54458996
## symboling 43480.14 28061333
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 4 13 14 18 19 22 23
## 15613.917 11652.016 15618.462 15659.087 37439.913 6373.898 5992.345 6190.459
## 38 42 45 53 54 65 68 69
## 9007.894 11131.268 6507.137 6349.365 6875.095 9568.021 28143.673 28143.673
## 72 73 77 84 94 97 103 106
## 32622.177 30988.420 6174.546 13922.200 7581.618 7098.892 16995.930 22068.240
## 117 120 124 133 135 138 143 147
## 14966.372 8047.759 9532.490 13716.900 14770.103 18438.755 8471.235 8701.272
## 148 150 155 159 165 184 188 192
## 9905.081 11237.458 8004.001 7658.347 8367.050 8650.370 8297.474 15743.495
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 15613.917 |
| 4 | 13950 | 11652.016 |
| 13 | 20970 | 15618.462 |
| 14 | 21105 | 15659.087 |
| 18 | 36880 | 37439.913 |
| 19 | 5151 | 6373.898 |
| 22 | 5572 | 5992.345 |
| 23 | 6377 | 6190.459 |
| 38 | 7895 | 9007.894 |
| 42 | 12945 | 11131.268 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1913.242
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495.0 | 13187.261 | 11115.788 | 15613.917 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950.0 | 12122.679 | 7563.381 | 11652.016 |
| 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 20970.0 | 16254.768 | 15583.754 | 15618.462 |
| 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105.0 | 16341.463 | 15583.754 | 15659.087 |
| 18 | 0 | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 15 | 20 | 36880.0 | 29417.089 | 35153.500 | 37439.913 |
| 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.03 | 9.50 | 48 | 5100 | 47 | 53 | 5151.0 | -1341.794 | 7563.381 | 6373.898 |
| 22 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572.0 | 4843.380 | 7563.381 | 5992.345 |
| 23 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6377.0 | 6480.748 | 7563.381 | 6190.459 |
| 38 | 0 | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | 110 | 3.15 | 3.58 | 9.00 | 86 | 5800 | 27 | 33 | 7895.0 | 10153.838 | 7563.381 | 9007.894 |
| 42 | 0 | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | 110 | 3.15 | 3.58 | 9.00 | 101 | 5800 | 24 | 28 | 12945.0 | 10542.660 | 11115.788 | 11131.268 |
| 45 | 1 | 94.5 | 155.9 | 63.6 | 52.0 | 1874 | 90 | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 8916.5 | 5466.677 | 7563.381 | 6507.137 |
| 53 | 1 | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | 91 | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6795.0 | 6028.840 | 7563.381 | 6349.365 |
| 54 | 1 | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | 91 | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6695.0 | 5563.672 | 7563.381 | 6875.095 |
| 65 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | 122 | 3.39 | 3.39 | 8.60 | 84 | 4800 | 26 | 32 | 11245.0 | 10088.770 | 11115.788 | 9568.021 |
| 68 | -1 | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | 183 | 3.58 | 3.64 | 21.50 | 123 | 4350 | 22 | 25 | 25552.0 | 24062.963 | 35153.500 | 28143.673 |
| 69 | -1 | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | 183 | 3.58 | 3.64 | 21.50 | 123 | 4350 | 22 | 25 | 28248.0 | 24885.549 | 35153.500 | 28143.673 |
| 72 | -1 | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | 234 | 3.46 | 3.10 | 8.30 | 155 | 4750 | 16 | 18 | 34184.0 | 30825.178 | 35153.500 | 32622.177 |
| 73 | 3 | 96.6 | 180.3 | 70.5 | 50.8 | 3685 | 234 | 3.46 | 3.10 | 8.30 | 155 | 4750 | 16 | 18 | 35056.0 | 29184.730 | 35153.500 | 30988.420 |
| 77 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | 92 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 37 | 41 | 5389.0 | 5645.758 | 7563.381 | 6174.546 |
| 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.86 | 7.00 | 145 | 5000 | 19 | 24 | 14869.0 | 15321.158 | 15583.754 | 13922.200 |
| 94 | 1 | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | 97 | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 7349.0 | 5920.836 | 7563.381 | 7581.618 |
| 97 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | 97 | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 7499.0 | 6378.960 | 7563.381 | 7098.892 |
| 103 | 0 | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | 181 | 3.43 | 3.27 | 9.00 | 152 | 5200 | 17 | 22 | 14399.0 | 22747.899 | 15583.754 | 16995.930 |
| 106 | 3 | 91.3 | 170.7 | 67.9 | 49.7 | 3139 | 181 | 3.43 | 3.27 | 7.80 | 200 | 5200 | 17 | 23 | 19699.0 | 23650.878 | 15583.754 | 22068.240 |
| 117 | 0 | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | 152 | 3.70 | 3.52 | 21.00 | 95 | 4150 | 28 | 33 | 17950.0 | 17915.873 | 15583.754 | 14966.372 |
| 120 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | 98 | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7957.0 | 8481.003 | 7563.381 | 8047.759 |
| 124 | -1 | 103.3 | 174.6 | 64.6 | 59.8 | 2535 | 122 | 3.35 | 3.46 | 8.50 | 88 | 5000 | 24 | 30 | 8921.0 | 11616.651 | 11115.788 | 9532.490 |
| 133 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | 121 | 3.54 | 3.07 | 9.31 | 110 | 5250 | 21 | 28 | 11850.0 | 14689.541 | 15583.754 | 13716.900 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.07 | 9.30 | 110 | 5250 | 21 | 28 | 15040.0 | 18801.358 | 15583.754 | 14770.103 |
| 138 | 2 | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | 121 | 3.54 | 3.07 | 9.00 | 160 | 5500 | 19 | 26 | 18620.0 | 17104.206 | 15583.754 | 18438.755 |
| 143 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | 108 | 3.62 | 2.64 | 9.50 | 82 | 4400 | 28 | 33 | 7775.0 | 7915.848 | 7563.381 | 8471.235 |
| 147 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | 108 | 3.62 | 2.64 | 9.00 | 82 | 4800 | 28 | 32 | 7463.0 | 8681.818 | 7563.381 | 8701.272 |
| 148 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | 108 | 3.62 | 2.64 | 9.00 | 94 | 5200 | 25 | 31 | 10198.0 | 11300.588 | 11115.788 | 9905.081 |
| 150 | 0 | 96.9 | 173.6 | 65.4 | 54.9 | 2650 | 108 | 3.62 | 2.64 | 7.70 | 111 | 4800 | 23 | 23 | 11694.0 | 9821.287 | 15583.754 | 11237.458 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.03 | 9.00 | 62 | 4800 | 27 | 32 | 7898.0 | 6607.841 | 7563.381 | 8004.001 |
| 159 | 0 | 95.7 | 166.3 | 64.4 | 53.0 | 2275 | 110 | 3.27 | 3.35 | 22.50 | 56 | 4500 | 34 | 36 | 7898.0 | 8481.645 | 7563.381 | 7658.347 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.03 | 9.00 | 70 | 4800 | 29 | 34 | 8238.0 | 5970.279 | 7563.381 | 8367.050 |
| 184 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2209 | 109 | 3.19 | 3.40 | 9.00 | 85 | 5250 | 27 | 34 | 7975.0 | 10227.048 | 7563.381 | 8650.370 |
| 188 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | 97 | 3.01 | 3.40 | 23.00 | 68 | 4500 | 37 | 42 | 9495.0 | 9368.167 | 7563.381 | 8297.474 |
| 192 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | 136 | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 24 | 13295.0 | 15747.809 | 15583.754 | 15743.495 |
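The RMSE values are compared
# Stored as rmse_modelos so the table does not share its name with Metrics::rmse()
rmse_modelos <- data.frame("Regresion multiple" = rmse_rm, "Arboles de regresión" = rmse_ar, "Bosques aleatorios" = rmse_rf)
kable(rmse_modelos, caption = "RMSE statistic of each model") %>%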
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| Regresion.multiple | Arboles.de.regresión | Bosques.aleatorios |
|---|---|---|
| 3071.741 | 2962.646 | 1913.242 |
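To put the three RMSE values on a common footing, the following sketch picks the model with the lowest error and expresses its improvement over the other two as a percentage. The vector `rmse_valores` simply restates the table above; its element names and the other variable names are chosen here for illustration.

```r
# RMSE values copied from the table above
rmse_valores <- c(regresion_multiple = 3071.741,
                  arbol_regresion    = 2962.646,
                  bosque_aleatorio   = 1913.242)

# Model with the lowest validation RMSE
mejor_modelo <- names(which.min(rmse_valores))
mejor_modelo
# "bosque_aleatorio"

# Percentage reduction in RMSE achieved by the best model over each of the others
otros <- rmse_valores[names(rmse_valores) != mejor_modelo]
reduccion_pct <- 100 * (1 - min(rmse_valores) / otros)
round(reduccion_pct, 1)
# regresion_multiple 37.7, arbol_regresion 35.4
```

All three values come from the same 40 validation rows and a single seed, so the ranking could change with a different partition.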
Numeric automobile price data were loaded, based on several numeric variables.
The seed 1307 was used.
The multiple linear regression model:
Interpreting the summary table of the multiple regression model, the most significant variables are enginesize, compressionratio and peakrpm (***), followed by stroke (**). The variable enginesize has the smallest p-value (3.56e-12) and is therefore the most significant predictor in this model.
The regression tree model:
In this model the most important variables are enginesize, curbweight and carwidth, which are the only ones used in the splits. It was the second-best of the models compared in terms of RMSE (2962.646).
The random forest model:
The variables enginesize, curbweight and horsepower are the most important in this model.
The best RMSE among the models was that of the random forest, 1913.242, considerably lower than the others (3071.741 for multiple regression and 2962.646 for the regression tree). The variable enginesize was the most important in all three models, and curbweight was also among the most important in both tree-based models, although it was not significant in the linear regression (p = 0.42). Comparing the predictions against the actual prices, I can say that the results are mixed: sometimes close, but in general I would not say the model is very reliable.