Compare supervised models by applying car-price prediction algorithms and computing the root mean squared error (RMSE) of each one.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
A multiple linear regression model is fitted to the training data
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the Adjusted R-squared, i.e., how much of the variation in vehicle price do the independent variables explain?
Predictions are generated for the validation data
The RMSE is computed for comparison purposes
A regression tree model is fitted to the training data
The importance of each variable for predicting price is identified
The regression tree and its decision rules are visualized
Predictions are generated for the validation data
The RMSE is computed for comparison purposes
A random forest model is fitted to the training data using 20 trees
The importance of each variable for predicting price is identified
Predictions are generated for the validation data
The RMSE is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # Graphical correlation charts
library(dplyr)
library(knitr) # Tabular output
library(kableExtra) # Nicer tabular output
library(ggplot2) # Visualization
library(plotly) # Visualization
library(caret) # Data partitioning (createDataPartition)
library(Metrics) # rmse()
library(rpart) # Regression trees
library(rpart.plot) # Tree plotting
library(randomForest) # Random forests
library(reshape) # Renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car, i.e., the distance between the axles, in inches (Numeric). |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Engine size (Numeric), in … |
| 8 | boreratio | Bore (cylinder diameter) of the engine (Numeric). |
| 9 | stroke | Stroke of the engine, i.e., the distance the piston travels in the cylinder (Numeric). |
| 10 | compressionratio | Compression ratio of the engine, a measure of cylinder pressure (Numeric). |
| 11 | horsepower | Engine power in horsepower (Numeric). |
| 12 | peakrpm | Peak engine revolutions per minute (Numeric). |
| 13 | citympg | Fuel economy in city driving, in miles per gallon (Numeric). |
| 14 | highwaympg | Fuel economy on the highway, in miles per gallon (Numeric). |
| 16 | price (dependent variable) | Price of the car in dollars (Numeric). |
Source: https://archive.ics.uci.edu/ml/datasets/Automobile
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training set takes 80% of the data and the validation set the remaining 20%.
n <- nrow(datos)
set.seed(1349) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
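For a numeric outcome such as `price`, caret's `createDataPartition()` samples within percentile-based groups of `y`, so the training and validation sets get similar price distributions. For comparison, here is a minimal base-R sketch of a plain, non-stratified 80/20 split. The helper `split_80_20` and the toy data frame are hypothetical and only stand in for the car-price data. Note that the caret partition above keeps 165 training rows (see `n= 165` in the regression tree output), slightly more than `round(0.8 * 205) = 164`, presumably because it rounds within each group.

```r
# Hypothetical helper: plain random 80/20 split (no stratification on price)
split_80_20 <- function(df, p = 0.80, seed = 1349) {
  set.seed(seed)                                   # reproducible sampling
  n   <- nrow(df)
  idx <- sample(seq_len(n), size = round(p * n))   # row indices for training
  list(train = df[idx, , drop = FALSE],
       valid = df[-idx, , drop = FALSE])
}

# Toy data standing in for the car prices (assumption, not the real dataset)
toy <- data.frame(enginesize = 61:265,
                  price = seq(5000, 45800, length.out = 205))
parts <- split_80_20(toy)
nrow(parts$train)   # 164
nrow(parts$valid)   # 41
```

A plain split is simpler, but with only about 200 cars and a few very expensive ones, stratifying on price (as caret does) makes it less likely that all the luxury cars end up in one of the two sets.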
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 15 | 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 |
| 17 | 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 41315.00 |
| 21 | 21 | 0 | 94.5 | 158.8 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.6 | 70 | 5400 | 38 | 43 | 6575.00 |
| 25 | 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6229.00 |
| 27 | 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 7609.00 |
| 28 | 28 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | 98 | 3.03 | 3.39 | 7.6 | 102 | 5500 | 24 | 30 | 8558.00 |
| 30 | 30 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2811 | 156 | 3.60 | 3.90 | 7.0 | 145 | 5000 | 19 | 24 | 12964.00 |
| 37 | 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7295.00 |
The multiple linear regression model (rm) is built
# Multiple linear regression model, used to identify significant predictors
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8237.4 -1775.1 -293.9 1748.8 8188.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -59279.491 15976.915 -3.710 0.000291 ***
## symboling 529.303 254.233 2.082 0.039044 *
## wheelbase 206.775 109.680 1.885 0.061330 .
## carlength -101.985 59.650 -1.710 0.089388 .
## carwidth 545.172 245.365 2.222 0.027788 *
## carheight 188.819 144.654 1.305 0.193785
## curbweight 1.772 1.707 1.038 0.300835
## enginesize 111.838 13.863 8.067 2.13e-13 ***
## boreratio -646.436 1378.631 -0.469 0.639824
## stroke -2475.780 864.704 -2.863 0.004795 **
## compressionratio 251.977 83.350 3.023 0.002943 **
## horsepower 25.691 16.360 1.570 0.118454
## peakrpm 2.237 0.708 3.159 0.001914 **
## citympg -180.043 192.173 -0.937 0.350324
## highwaympg 89.607 164.983 0.543 0.587849
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2983 on 150 degrees of freedom
## Multiple R-squared: 0.8743, Adjusted R-squared: 0.8625
## F-statistic: 74.5 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level above 90%?
The intercept is significant at the 99.9% level (***).
The variables wheelbase and carlength are significant at the 90% level (.).
The variables symboling and carwidth are significant at the 95% level (*).
The variables stroke, compressionratio and peakrpm are significant at the 99% level (**).
The variable enginesize is significant at the 99.9% level (***).
The remaining variables (carheight, curbweight, boreratio, horsepower, citympg and highwaympg) have p-values above 0.10 and do not reach the 90% level.
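These readings follow R's significance codes: `***` for p < 0.001 (99.9%), `**` for p < 0.01 (99%), `*` for p < 0.05 (95%) and `.` for p < 0.1 (90%). Below is a minimal sketch of extracting the predictors above 90% confidence programmatically from `summary()` of an `lm` fit. It uses R's built-in `mtcars` data as a stand-in (an assumption), so the variable names differ from the car-price model, but the same lines apply to `summary(modelo_rm)`.

```r
# Fit a multiple linear regression on R's built-in mtcars data (stand-in dataset)
fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(fit)$coefficients
p_values <- coefs[, "Pr(>|t|)"]

# Map p-values to the same confidence labels used above (same cut points as R's stars)
confidence <- cut(p_values,
                  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                  labels = c("99.9%", "99%", "95%", "90%", "< 90%"),
                  include.lowest = TRUE)

# Predictors significant at the 90% level or better (intercept excluded)
above_90 <- setdiff(names(p_values)[p_values < 0.10], "(Intercept)")
print(data.frame(p_value = signif(p_values, 3), confidence = confidence))
print(above_90)
```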
What is the Adjusted R-squared, i.e., how much of the variation in vehicle price do the independent variables explain?
In multiple linear models, an Adjusted R-squared of 0.8625 means that the independent variables explain approximately 86.25% of the variance of the dependent variable, price, after adjusting for the number of predictors.
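The adjusted value penalizes R-squared for the number of predictors p relative to the sample size n. With the figures from the summary above (p = 14 predictors and n = 165 training rows, since 150 residual degrees of freedom plus 15 estimated coefficients give 165; Multiple R-squared = 0.8743), the standard formula reproduces the reported value:

$$
R^2_{\mathrm{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
= 1 - (1 - 0.8743)\,\frac{164}{150}
\approx 1 - 0.1374 = 0.8626
$$

This matches the reported 0.8625 up to the rounding of R-squared in the printout.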
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 10 15 17 21 25 27 28
## 12691.088 16843.627 17604.480 26005.030 5654.218 6794.408 6833.392 8614.499
## 30 37 47 66 68 77 78 88
## 15945.657 9379.388 11308.525 16287.079 24416.856 7013.985 7871.498 10641.664
## 91 103 115 125 129 132 133 135
## 7620.372 21985.682 19244.893 16063.557 25917.629 11241.316 14612.488 17819.011
## 147 151 153 164 165 167 168 169
## 8942.331 5940.300 6624.027 6275.962 6337.982 11650.084 13988.972 13981.884
## 170 172 187 192 194 198 200 201
## 14008.464 14297.298 10735.382 15424.217 11042.971 16502.119 16359.943 18064.359
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 12691.088 |
| 10 | 17859.17 | 16843.627 |
| 15 | 24565.00 | 17604.480 |
| 17 | 41315.00 | 26005.030 |
| 21 | 6575.00 | 5654.218 |
| 25 | 6229.00 | 6794.408 |
| 27 | 7609.00 | 6833.392 |
| 28 | 8558.00 | 8614.499 |
| 30 | 12964.00 | 15945.657 |
| 37 | 7295.00 | 9379.388 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3976.765
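`Metrics::rmse(actual, predicted)` computes the square root of the mean squared difference, sqrt(mean((actual - predicted)^2)). Because the RMSE is in the same units as `price` (dollars), 3976.765 can be read as a typical prediction error of roughly 4,000 dollars on the validation cars, with large errors weighted more heavily than small ones. A base-R equivalent follows. `rmse_manual` is a hypothetical helper, and the toy vectors are made up.

```r
# RMSE written out by hand: square root of the mean squared difference
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Toy check with made-up values (not from the car dataset)
actual    <- c(10, 12, 15)
predicted <- c(11, 12, 13)
rmse_manual(actual, predicted)  # sqrt((1 + 0 + 4) / 3) = sqrt(5/3), about 1.291
```

Applied to `comparaciones$precio_real` and `comparaciones$precio_predicciones`, it should return the same value as `rmse()`.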
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10616350000 13352.880
## 2) enginesize< 182 150 3155582000 11268.030
## 4) curbweight< 2697.5 105 675607100 8858.238
## 8) curbweight< 2291.5 59 85854540 7324.644 *
## 9) curbweight>=2291.5 46 273011600 10825.240
## 18) horsepower< 100.5 30 55831460 9610.700 *
## 19) horsepower>=100.5 16 89952290 13102.500 *
## 5) curbweight>=2697.5 45 447485400 16890.890
## 10) carwidth< 68.6 38 283691400 16156.840 *
## 11) carwidth>=68.6 7 32166770 20875.710 *
## 3) enginesize>=182 15 288888800 34201.370 *
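The root split uses enginesize: the 15 cars with enginesize >= 182 form a terminal node with a mean price of about 34,201, while the remaining 150 cars are split further by curbweight, horsepower and carwidth.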
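In the printout, `deviance` is the residual sum of squares of the prices in each node, and `yval` is the node mean, which becomes the prediction for every car that reaches that leaf. Each split is the threshold that produces the largest drop in deviance from the parent node to its two children. Below is a simplified base-R sketch of that search on a single numeric predictor, using a hypothetical helper `best_split` and made-up data. rpart itself also searches over all predictors and applies stopping rules such as `minsplit`, `minbucket` and `cp`, so this is not its actual implementation. Candidate thresholds are taken as midpoints between observed values, which is consistent with the thresholds ending in .5 in the printout (2697.5, 100.5).

```r
# Hypothetical helper: exhaustive search for the best threshold on one predictor
best_split <- function(x, y) {
  sse <- function(v) sum((v - mean(v))^2)       # node deviance for regression
  xs <- sort(unique(x))
  cuts <- (head(xs, -1) + tail(xs, -1)) / 2     # midpoints between observed values
  total <- sapply(cuts, function(cut) {
    sse(y[x < cut]) + sse(y[x >= cut])          # deviance of left + right child
  })
  list(cut = cuts[which.min(total)],
       parent_sse = sse(y),
       children_sse = min(total))
}

# Toy example: price jumps when enginesize exceeds about 180 (made-up data)
enginesize <- c(90, 110, 130, 150, 170, 190, 210, 230)
price      <- c(7000, 9000, 11000, 13000, 15000, 33000, 35000, 37000)
best_split(enginesize, price)$cut   # 180
```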
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 10 15 17 21 25 27 28
## 13102.500 16156.842 16156.842 34201.367 7324.644 7324.644 7324.644 7324.644
## 30 37 47 66 68 77 78 88
## 16156.842 7324.644 16156.842 13102.500 34201.367 7324.644 7324.644 13102.500
## 91 103 115 125 129 132 133 135
## 7324.644 16156.842 16156.842 16156.842 34201.367 9610.700 13102.500 16156.842
## 147 151 153 164 165 167 168 169
## 7324.644 7324.644 7324.644 7324.644 7324.644 13102.500 13102.500 13102.500
## 170 172 187 192 194 198 200 201
## 13102.500 16156.842 7324.644 13102.500 9610.700 16156.842 16156.842 20875.714
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 13102.500 |
| 10 | 17859.17 | 16156.842 |
| 15 | 24565.00 | 16156.842 |
| 17 | 41315.00 | 34201.367 |
| 21 | 6575.00 | 7324.644 |
| 25 | 6229.00 | 7324.644 |
| 27 | 7609.00 | 7324.644 |
| 28 | 8558.00 | 7324.644 |
| 30 | 12964.00 | 16156.842 |
| 37 | 7295.00 | 7324.644 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3269.982
The random forest model (rf) is built with 20 trees
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 5184444
## % Var explained: 91.94
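Two figures in this printout can be checked by hand. For regression, `randomForest()` uses `mtry = floor(p/3)` by default, so with p = 14 predictors it tries floor(14/3) = 4 variables at each split, as reported. "Mean of squared residuals" is the out-of-bag (OOB) MSE, and "% Var explained" is the pseudo R-squared 1 - MSE/Var(y). If Var(y) is taken as the variance of the training prices with denominator n, i.e. the root-node deviance of the regression tree above divided by n (10,616,350,000 / 165, about 64,341,515), the reported value is reproduced:

$$
100\left(1 - \frac{\mathrm{MSE}_{\mathrm{OOB}}}{\widehat{\sigma}^2_y}\right)
= 100\left(1 - \frac{5\,184\,444}{64\,341\,515}\right)
\approx 100\,(1 - 0.0806) \approx 91.94
$$

With only 20 trees, each training car is out-of-bag in roughly (1 - 1/165)^165, about e^{-1} or 37% of the trees, which is about 7 trees. This OOB estimate therefore rests on few trees per car and is likely to change noticeably with another seed.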
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 36307249.78 3054315745
## horsepower 16162109.64 1969567023
## carwidth 8303231.37 1116205283
## curbweight 7896706.69 1078785414
## citympg 6147977.62 879224284
## highwaympg 6116023.11 814388953
## carlength 3083716.76 364604499
## boreratio 784420.43 217358473
## wheelbase 3478922.28 158541083
## carheight 19359.00 79562961
## peakrpm 411275.70 66734683
## compressionratio 410064.34 46321637
## stroke 45850.23 44956207
## symboling 91473.48 12708110
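`%IncMSE` measures how much the prediction error grows when a predictor's values are randomly permuted. randomForest does this on the out-of-bag samples of each tree. `modelo_rf$importance` holds the unscaled values, while `importance(modelo_rf)` by default divides them by their standard errors. `IncNodePurity` is the total decrease in node residual sum of squares from splits on that variable, averaged over the trees. The sketch below illustrates only the permutation idea. It swaps in a linear model and a hold-out set from R's built-in `mtcars` data (both assumptions), and `perm_importance` is a hypothetical helper, not part of randomForest.

```r
# Permutation importance, illustrated with lm() on mtcars (stand-in model and data)
set.seed(1349)
idx   <- sample(seq_len(nrow(mtcars)), size = 24)   # 24 training rows, 8 hold-out rows
train <- mtcars[idx, ]
test  <- mtcars[-idx, ]

fit <- lm(mpg ~ wt + hp + qsec, data = train)
mse <- function(obs, pred) mean((obs - pred)^2)
base_mse <- mse(test$mpg, predict(fit, newdata = test))

# Hypothetical helper: average increase in hold-out MSE after shuffling one predictor
perm_importance <- function(var, n_rep = 50) {
  increases <- replicate(n_rep, {
    shuffled <- test
    shuffled[[var]] <- sample(shuffled[[var]])     # break the link with mpg
    mse(test$mpg, predict(fit, newdata = shuffled)) - base_mse
  })
  mean(increases)
}

imp <- sapply(c("wt", "hp", "qsec"), perm_importance)
sort(imp, decreasing = TRUE)
```

Since the two columns of the table measure different things, their rankings need not agree. For example, wheelbase ranks above boreratio by `%IncMSE` but below it by `IncNodePurity`.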
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 10 15 17 21 25 27 28
## 14940.032 18816.228 18395.318 32083.526 7568.889 6427.608 6458.889 8065.406
## 30 37 47 66 68 77 78 88
## 14560.910 7616.122 10435.839 17731.630 30087.383 6297.633 6568.064 10846.913
## 91 103 115 125 129 132 133 135
## 7399.587 16226.953 17207.081 14179.875 31665.522 12216.994 13624.912 14925.281
## 147 151 153 164 165 167 168 169
## 8294.124 6589.538 6633.913 7907.417 7989.057 10031.819 14159.838 14159.838
## 170 172 187 192 194 198 200 201
## 14159.838 16008.177 8243.769 15633.610 12011.473 14785.062 18475.875 18838.682
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 14940.032 |
| 10 | 17859.17 | 18816.228 |
| 15 | 24565.00 | 18395.318 |
| 17 | 41315.00 | 32083.526 |
| 21 | 6575.00 | 7568.889 |
| 25 | 6229.00 | 6427.608 |
| 27 | 7609.00 | 6458.889 |
| 28 | 8558.00 | 8065.406 |
| 30 | 12964.00 | 14560.910 |
| 37 | 7295.00 | 7616.122 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2759.932
The predictions of the three models are combined with the validation data
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495.00 | 12691.088 | 13102.500 | 14940.032 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.00 | 160 | 5500 | 16 | 22 | 17859.17 | 16843.627 | 16156.842 | 18816.228 |
| 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.19 | 9.00 | 121 | 4250 | 20 | 25 | 24565.00 | 17604.480 | 16156.842 | 18395.318 |
| 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.00 | 182 | 5400 | 16 | 22 | 41315.00 | 26005.030 | 34201.367 | 32083.526 |
| 21 | 0 | 94.5 | 158.8 | 63.6 | 52.0 | 1909 | 90 | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 6575.00 | 5654.218 | 7324.644 | 7568.889 |
| 25 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6229.00 | 6794.408 | 7324.644 | 6427.608 |
| 27 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | 90 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 7609.00 | 6833.392 | 7324.644 | 6458.889 |
| 28 | 1 | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | 98 | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 8558.00 | 8614.499 | 7324.644 | 8065.406 |
| 30 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2811 | 156 | 3.60 | 3.90 | 7.00 | 145 | 5000 | 19 | 24 | 12964.00 | 15945.657 | 16156.842 | 14560.910 |
| 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7295.00 | 9379.388 | 7324.644 | 7616.122 |
| 47 | 2 | 96.0 | 172.6 | 65.2 | 51.4 | 2734 | 119 | 3.43 | 3.23 | 9.20 | 90 | 5000 | 24 | 29 | 11048.00 | 11308.525 | 16156.842 | 10435.839 |
| 66 | 0 | 104.9 | 175.0 | 66.1 | 54.4 | 2670 | 140 | 3.76 | 3.16 | 8.00 | 120 | 5000 | 19 | 27 | 18280.00 | 16287.079 | 13102.500 | 17731.630 |
| 68 | -1 | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | 183 | 3.58 | 3.64 | 21.50 | 123 | 4350 | 22 | 25 | 25552.00 | 24416.856 | 34201.367 | 30087.383 |
| 77 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | 92 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 37 | 41 | 5389.00 | 7013.985 | 7324.644 | 6297.633 |
| 78 | 2 | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | 92 | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6189.00 | 7871.498 | 7324.644 | 6568.064 |
| 88 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | 110 | 3.17 | 3.46 | 7.50 | 116 | 5500 | 23 | 30 | 9279.00 | 10641.664 | 13102.500 | 10846.913 |
| 91 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 2017 | 103 | 2.99 | 3.47 | 21.90 | 55 | 4800 | 45 | 50 | 7099.00 | 7620.372 | 7324.644 | 7399.587 |
| 103 | 0 | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | 181 | 3.43 | 3.27 | 9.00 | 152 | 5200 | 17 | 22 | 14399.00 | 21985.682 | 16156.842 | 16226.953 |
| 115 | 0 | 114.2 | 198.9 | 68.4 | 58.7 | 3485 | 152 | 3.70 | 3.52 | 21.00 | 95 | 4150 | 25 | 25 | 17075.00 | 19244.893 | 16156.842 | 17207.081 |
| 125 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | 156 | 3.59 | 3.86 | 7.00 | 145 | 5000 | 19 | 24 | 12764.00 | 16063.557 | 16156.842 | 14179.875 |
| 129 | 3 | 89.5 | 168.9 | 65.0 | 51.6 | 2800 | 194 | 3.74 | 2.90 | 9.50 | 207 | 5900 | 17 | 25 | 37028.00 | 25917.629 | 34201.367 | 31665.522 |
| 132 | 2 | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | 132 | 3.46 | 3.90 | 8.70 | 90 | 5100 | 23 | 31 | 9895.00 | 11241.316 | 9610.700 | 12216.994 |
| 133 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | 121 | 3.54 | 3.07 | 9.31 | 110 | 5250 | 21 | 28 | 11850.00 | 14612.488 | 13102.500 | 13624.912 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.07 | 9.30 | 110 | 5250 | 21 | 28 | 15040.00 | 17819.011 | 16156.842 | 14925.281 |
| 147 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | 108 | 3.62 | 2.64 | 9.00 | 82 | 4800 | 28 | 32 | 7463.00 | 8942.331 | 7324.644 | 8294.124 |
| 151 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 1985 | 92 | 3.05 | 3.03 | 9.00 | 62 | 4800 | 35 | 39 | 5348.00 | 5940.300 | 7324.644 | 6589.538 |
| 153 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 2015 | 92 | 3.05 | 3.03 | 9.00 | 62 | 4800 | 31 | 38 | 6488.00 | 6624.027 | 7324.644 | 6633.913 |
| 164 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2169 | 98 | 3.19 | 3.03 | 9.00 | 70 | 4800 | 29 | 34 | 8058.00 | 6275.962 | 7324.644 | 7907.417 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.03 | 9.00 | 70 | 4800 | 29 | 34 | 8238.00 | 6337.982 | 7324.644 | 7989.057 |
| 167 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | 98 | 3.24 | 3.08 | 9.40 | 112 | 6600 | 26 | 29 | 9538.00 | 11650.084 | 13102.500 | 10031.819 |
| 168 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2540 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 8449.00 | 13988.972 | 13102.500 | 14159.838 |
| 169 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2536 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 9639.00 | 13981.884 | 13102.500 | 14159.838 |
| 170 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2551 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 9989.00 | 14008.464 | 13102.500 | 14159.838 |
| 172 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2714 | 146 | 3.62 | 3.50 | 9.30 | 116 | 4800 | 24 | 30 | 11549.00 | 14297.298 | 16156.842 | 16008.177 |
| 187 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2275 | 109 | 3.19 | 3.40 | 9.00 | 85 | 5250 | 27 | 34 | 8495.00 | 10735.382 | 7324.644 | 8243.769 |
| 192 | 0 | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | 136 | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 24 | 13295.00 | 15424.217 | 13102.500 | 15633.610 |
| 194 | 0 | 100.4 | 183.1 | 66.9 | 55.1 | 2563 | 109 | 3.19 | 3.40 | 9.00 | 88 | 5500 | 25 | 31 | 12290.00 | 11042.971 | 9610.700 | 12011.473 |
| 198 | -1 | 104.3 | 188.8 | 67.2 | 57.5 | 3042 | 141 | 3.78 | 3.15 | 9.50 | 114 | 5400 | 24 | 28 | 16515.00 | 16502.119 | 16156.842 | 14785.062 |
| 200 | -1 | 104.3 | 188.8 | 67.2 | 57.5 | 3157 | 130 | 3.62 | 3.15 | 7.50 | 162 | 5100 | 17 | 22 | 18950.00 | 16359.943 | 16156.842 | 18475.875 |
| 201 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 2952 | 141 | 3.78 | 3.15 | 9.50 | 114 | 5400 | 23 | 28 | 16845.00 | 18064.359 | 20875.714 | 18838.682 |
The RMSE of the models is compared
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # distinct name, so it is not confused with Metrics::rmse()
kable(rmse_modelos, caption = "RMSE of each model") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3976.765 | 3269.982 | 2759.932 |
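Because the RMSE is in dollars, the comparison can be stated both as a ranking and as a relative reduction with respect to the linear regression baseline. Below is a small sketch that uses the three values reported in the table above. The vector name `rmse_models` is hypothetical.

```r
# RMSE values reported in the table above (validation set, seed 1349)
rmse_models <- c(rm = 3976.765, ar = 3269.982, rf = 2759.932)

best <- names(which.min(rmse_models))          # model with the lowest RMSE
best                                           # "rf"

# Percentage reduction in RMSE of each model relative to linear regression
reduction <- round(100 * (1 - rmse_models / rmse_models["rm"]), 1)
reduction                                      # rm 0.0, ar 17.8, rf 30.6
```

The random forest lowers the RMSE by about 30.6% relative to the linear model, and the tree lowers it by about 17.8%. These figures come from a single validation set of 40 cars, so they are indicative rather than conclusive.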
In this exercise, numeric car-price data, together with several numeric predictor variables, were loaded in CSV format from a GitHub link.
In the multiple linear regression model, the Adjusted R-squared is 0.8625. This means that the independent variables explain approximately 86.25% of the variance of the dependent variable, price.
In the regression tree model, the most important variables were enginesize, highwaympg, curbweight and horsepower.
In the random forest model, the most important variables (by IncNodePurity) were enginesize, horsepower, carwidth, curbweight and citympg.
enginesize stands out as important and significant in all the models, and enginesize, curbweight and horsepower are important in both the regression tree and the random forest.
According to the root mean squared error (RMSE), the best model was the random forest, for this particular training/validation split with 80% and 20% of the data, respectively.
The RMSE of the linear regression model is 3976.765.
The RMSE of the regression tree model is 3269.982.
The RMSE of the random forest model is 2759.932.
The results shown above were obtained with seed 1349.
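Both the 80/20 partition and the 20-tree forest depend on the random seed, so a different seed can change the RMSE values and possibly the ranking of the models. One way to gauge that sensitivity is to repeat the split with several seeds and look at the spread of the RMSE. The sketch below does this only for a linear model on R's built-in `mtcars` data (an assumption). The same loop could wrap the three models of this case. `rmse_for_seed` is a hypothetical helper, and seeds 1 to 20 are arbitrary.

```r
# Repeated random 80/20 splits to see how much RMSE depends on the seed
rmse_for_seed <- function(seed) {
  set.seed(seed)
  n    <- nrow(mtcars)
  idx  <- sample(seq_len(n), size = round(0.8 * n))    # 80% training rows
  fit  <- lm(mpg ~ wt + hp, data = mtcars[idx, ])
  pred <- predict(fit, newdata = mtcars[-idx, ])
  sqrt(mean((mtcars$mpg[-idx] - pred)^2))             # validation RMSE
}

rmses <- sapply(1:20, rmse_for_seed)
summary(rmses)   # spread of validation RMSE across 20 seeds
sd(rmses)
```

If the spread across seeds is comparable to the gaps between models, a single split with seed 1349 is not enough to declare one model clearly better.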