Compare supervised models by applying car price prediction algorithms and computing the root mean squared error (RMSE) of each one.
- The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
- Training data are created with 80% of the observations.
- Validation data are created with the remaining 20%.
- A multiple linear regression model is built with the training data.
- This model is used to answer questions such as:
  - Which variables are significant predictors at the 90% confidence level or higher?
  - What is the Adjusted R-squared value, that is, how much of the variation in vehicle price do the independent variables explain?
- Predictions are generated for the validation data.
- The RMSE is computed for comparison.
- A regression tree model is built with the training data.
- The importance of each variable for predicting price is identified.
- The regression tree and its decision rules are visualized.
- Predictions are generated for the validation data.
- The RMSE is computed for comparison.
- A random forest model is built with the training data using 20 trees.
- The importance of each variable for predicting price is identified.
- Predictions are generated for the validation data.
- The RMSE is computed for comparison.
- The case ends with a personal interpretation.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlation plots
library(dplyr)
library(knitr) # For tabular output
library(kableExtra) # For nicer tabular output
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For data partitioning
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
str(datos)
## 'data.frame': 205 obs. of 16 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably fairly safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car, the distance between the axles in inches (Numeric). |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the car's engine, in … (Numeric). |
| 8 | boreratio | Bore ratio of the engine, related to engine efficiency (Numeric). |
| 9 | stroke | Stroke or volume inside the engine; relates to the pistons, engine cycle and combustion (Numeric). |
| 10 | compressionratio | Compression ratio of the engine, a measure of pressure in the cylinders (Numeric). |
| 11 | horsepower | Horsepower, the power of the car (Numeric). |
| 12 | peakrpm | Peak revolutions per minute of the engine (Numeric). |
| 13 | citympg | Fuel economy in the city, in miles per gallon (Numeric). |
| 14 | highwaympg | Fuel economy on the highway, in miles per gallon (Numeric). |
| 15 | price | Price of the car in dollars (Numeric). Dependent variable. |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data take 80% of the observations and the validation data the remaining 20%.
n <- nrow(datos) # Number of observations
set.seed(1301) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
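Because `createDataPartition()` receives a numeric outcome here, caret samples within groups defined by quantiles of `price`, so both subsets span the whole price range; rounding up within each group appears consistent with the 165 training rows reported later instead of exactly 164. For comparison only, a minimal base-R sketch of a plain random (non-stratified) 80/20 split; `split_80_20` is a hypothetical helper, not part of caret.

```r
# Minimal sketch of a plain (non-stratified) 80/20 split in base R.
# split_80_20 is a hypothetical helper, not part of caret.
split_80_20 <- function(n, p = 0.8, seed = 1301) {
  set.seed(seed)                                        # note: resets the global RNG state
  idx_train <- sample(seq_len(n), size = round(p * n))  # rows drawn for training
  list(train = sort(idx_train),
       valid = setdiff(seq_len(n), idx_train))          # all remaining rows
}

parts <- split_80_20(205)   # 205 observations, as in datos
length(parts$train)         # 164
length(parts$valid)         # 41
```

Unlike `createDataPartition()`, this split does not balance the price distribution between the two subsets, so with only 205 cars the validation RMSE may vary more from one seed to another.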
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 2 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 3 | 3 | 1 | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | 152 | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 4 | 4 | 2 | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | 109 | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950 |
| 5 | 5 | 2 | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | 136 | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 6 | 6 | 2 | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250 |
| 7 | 7 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 11 | 11 | 2 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 13 | 13 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970 |
| 16 | 16 | 0 | 103.5 | 189.0 | 66.9 | 55.7 | 3230 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 30760 |
| 17 | 17 | 0 | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | 209 | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 41315 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | X | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 8 | 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 12 | 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 14 | 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 21105.00 |
| 15 | 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 |
| 19 | 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.03 | 9.5 | 48 | 5100 | 47 | 53 | 5151.00 |
| 24 | 24 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | 98 | 3.03 | 3.39 | 7.6 | 102 | 5500 | 24 | 30 | 7957.00 |
| 33 | 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.07 | 10.1 | 60 | 5500 | 38 | 42 | 5399.00 |
The multiple linear regression model (rm) is built.
# Multiple linear regression model, used to assess which variables are important
modelo_rm <- lm(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ symboling + wheelbase + carlength + carwidth +
## carheight + curbweight + enginesize + boreratio + stroke +
## compressionratio + horsepower + peakrpm + citympg + highwaympg,
## data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10550.4 -1642.4 -106.2 1513.5 14260.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.902e+04 1.831e+04 -2.678 0.008242 **
## symboling 3.465e+02 2.743e+02 1.263 0.208524
## wheelbase 1.518e+02 1.175e+02 1.292 0.198325
## carlength -9.766e+01 6.148e+01 -1.589 0.114246
## carwidth 3.570e+02 2.862e+02 1.247 0.214268
## carheight 2.193e+02 1.580e+02 1.388 0.167331
## curbweight 1.519e+00 1.853e+00 0.820 0.413749
## enginesize 1.278e+02 1.486e+01 8.605 9.56e-15 ***
## boreratio 4.489e+01 1.537e+03 0.029 0.976743
## stroke -3.052e+03 9.307e+02 -3.279 0.001295 **
## compressionratio 3.342e+02 9.266e+01 3.606 0.000422 ***
## horsepower 2.686e+01 1.783e+01 1.507 0.134010
## peakrpm 2.664e+00 7.542e-01 3.532 0.000549 ***
## citympg -3.524e+02 1.984e+02 -1.776 0.077718 .
## highwaympg 2.407e+02 1.743e+02 1.381 0.169393
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3183 on 150 degrees of freedom
## Multiple R-squared: 0.8653, Adjusted R-squared: 0.8528
## F-statistic: 68.85 on 14 and 150 DF, p-value: < 2.2e-16
Which variables are significant predictors at the 90% confidence level or higher?
- The intercept is significant at the 99% level (**, p = 0.0082).
- The variable citympg is significant at the 90% level (., p = 0.078).
- The variable stroke is significant at the 99% level (**, p = 0.0013).
- The variables peakrpm, compressionratio and enginesize are significant at the 99.9% level (***).
- The remaining predictors have p-values above 0.10.

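The reading above can be reproduced from the coefficient table. A minimal sketch that types in the p-values printed by `summary(modelo_rm)` and maps them to the confidence levels behind R's significance codes; the object names are illustrative. With the fitted model at hand, the same vector is available as `coef(summary(modelo_rm))[, "Pr(>|t|)"]`.

```r
# P-values from the Pr(>|t|) column of summary(modelo_rm) shown above.
p_values <- c("(Intercept)" = 0.008242, symboling = 0.208524,
              wheelbase = 0.198325, carlength = 0.114246,
              carwidth = 0.214268, carheight = 0.167331,
              curbweight = 0.413749, enginesize = 9.56e-15,
              boreratio = 0.976743, stroke = 0.001295,
              compressionratio = 0.000422, horsepower = 0.134010,
              peakrpm = 0.000549, citympg = 0.077718,
              highwaympg = 0.169393)

# Map each p-value to the confidence level of R's significance codes
# (*** below 0.001, ** below 0.01, * below 0.05, . below 0.1).
# cut() uses right-closed intervals; no p-value here sits exactly on a boundary.
confidence <- cut(p_values,
                  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
                  labels = c("99.9%", "99%", "95%", "90%", "below 90%"),
                  include.lowest = TRUE)

significance <- data.frame(variable = names(p_values),
                           p_value = unname(p_values),
                           confidence = confidence)

# Keep only the terms at the 90% confidence level or higher (p < 0.10)
subset(significance, p_value < 0.10)
```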
What is the Adjusted R-squared value, that is, how much of the variation in vehicle price do the independent variables explain?
The Adjusted R-squared of the multiple linear model is 0.8528, which means that the independent variables explain approximately 85.28% of the variance in price, after adjusting for the number of predictors.
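Adjusted R-squared penalizes R-squared for the number of predictors. A minimal worked check using only figures from `summary(modelo_rm)`: R-squared 0.8653, 14 predictors and 150 residual degrees of freedom, which imply 165 training rows.

```r
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
r2 <- 0.8653           # Multiple R-squared from the summary
p  <- 14               # number of predictors
n  <- 150 + p + 1      # residual df + predictors + intercept = 165 rows

r2_adj <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(r2_adj, 4)       # 0.8527; matches the reported 0.8528 up to rounding of R^2
```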
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
predicciones_rm
## 1 8 9 10 12 14 15
## 13442.4682 17681.5012 17524.1731 16419.0054 13153.5205 16615.0727 17246.7120
## 19 24 33 37 47 54 57
## -307.0846 8350.6388 5292.3846 9154.8729 10934.5194 5439.4855 8476.0393
## 64 65 72 84 86 93 102
## 12754.8362 10069.1269 30958.0557 15735.8711 9968.0517 6530.5427 22561.5041
## 135 139 144 146 147 149 151
## 17646.8525 9113.5063 11190.1288 10752.0635 8954.3599 10583.3100 5191.8282
## 155 160 164 165 168 172 174
## 6377.1242 10210.5031 5937.5597 5990.7244 14002.9730 14267.2779 7815.2692
## 179 187 189 204 205
## 21609.1878 10375.9063 11687.8384 19364.3301 18702.0951
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 13442.4682 |
| 8 | 18920.00 | 17681.5012 |
| 9 | 23875.00 | 17524.1731 |
| 10 | 17859.17 | 16419.0054 |
| 12 | 16925.00 | 13153.5205 |
| 14 | 21105.00 | 16615.0727 |
| 15 | 24565.00 | 17246.7120 |
| 19 | 5151.00 | -307.0846 |
| 24 | 7957.00 | 8350.6388 |
| 33 | 5399.00 | 5292.3846 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3291.318
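RMSE is the square root of the mean squared difference between actual and predicted prices, so it is expressed in dollars, like `price`. A minimal sketch of the formula applied to the first three rows of the comparison table; `rmse_manual` is a hypothetical helper that should agree with `Metrics::rmse()`.

```r
# RMSE written out explicitly: sqrt(mean((actual - predicted)^2))
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# First three validation rows of the multiple regression comparison table
real <- c(13495.00, 18920.00, 23875.00)
pred <- c(13442.4682, 17681.5012, 17524.1731)
rmse_manual(real, pred)
```

Because the errors are squared before averaging, a single large miss (such as row 19, predicted at -307.08 for an actual price of 5151) weighs heavily on the result.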
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + wheelbase + carlength + carwidth + carheight + curbweight + enginesize + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg ,
data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11288530000 13502.360
## 2) enginesize< 182 148 2685581000 11115.930
## 4) curbweight< 2659.5 100 576642400 8745.330
## 8) curbweight< 2284.5 58 76999860 7316.207 *
## 9) curbweight>=2284.5 42 217597800 10718.880 *
## 5) curbweight>=2659.5 48 376180900 16054.690 *
## 3) enginesize>=182 17 422193000 34278.320 *
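Reading the printed tree: the first split is on enginesize. The 17 cars with enginesize >= 182 form a terminal node with a mean price of 34,278.32 dollars. The 148 cars with enginesize < 182 are split on curbweight: those with curbweight >= 2659.5 (48 cars) have a mean price of 16,054.69, and those below 2659.5 are split again at curbweight 2284.5, giving mean prices of 7,316.21 (58 cars below 2284.5) and 10,718.88 (42 cars at 2284.5 or more).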
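The rules of the tree printed above can be written as plain R code, which makes it clear that the tree can only return four distinct prices. A minimal sketch; `predict_tree_rules` is a hypothetical helper whose thresholds and leaf values are copied from the `modelo_ar` output and its predictions.

```r
# Printed rpart rules as nested ifelse() calls (vectorized over cars).
predict_tree_rules <- function(enginesize, curbweight) {
  ifelse(enginesize >= 182, 34278.324,          # node 3
    ifelse(curbweight >= 2659.5, 16054.688,     # node 5
      ifelse(curbweight >= 2284.5, 10718.881,   # node 9
                                   7316.207)))  # node 8
}

# Validation rows 1, 47 and 72 from the comparison table
predict_tree_rules(enginesize = c(130, 119, 234),
                   curbweight = c(2548, 2734, 3740))
```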
rpart.plot(modelo_ar)
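The step list mentions identifying variable importance for the tree, while the printed model only shows the splits. rpart stores a score per variable in the `variable.importance` component of the fitted object; it sums the goodness of each split where the variable is the primary splitter and also credits surrogate splits. A sketch on R's built-in `mtcars` data as a stand-in (an assumption made so the example runs on its own); for this document the equivalent would be `modelo_ar$variable.importance`.

```r
library(rpart)

# Stand-in data: mtcars ships with R; the target here is mpg, not price.
fit <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector of importance scores (present only when the tree has splits)
imp_tree <- sort(fit$variable.importance, decreasing = TRUE)

# Share of total importance, in percent
round(100 * imp_tree / sum(imp_tree), 1)
```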
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 8 9 10 12 14 15 19
## 10718.881 16054.688 16054.688 16054.688 10718.881 16054.688 16054.688 7316.207
## 24 33 37 47 54 57 64 65
## 7316.207 7316.207 7316.207 16054.688 7316.207 10718.881 10718.881 10718.881
## 72 84 86 93 102 135 139 144
## 34278.324 16054.688 10718.881 7316.207 16054.688 16054.688 7316.207 10718.881
## 146 147 149 151 155 160 164 165
## 10718.881 10718.881 10718.881 7316.207 10718.881 7316.207 7316.207 7316.207
## 168 172 174 179 187 189 204 205
## 10718.881 16054.688 10718.881 16054.688 7316.207 10718.881 16054.688 16054.688
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 10718.881 |
| 8 | 18920.00 | 16054.688 |
| 9 | 23875.00 | 16054.688 |
| 10 | 17859.17 | 16054.688 |
| 12 | 16925.00 | 10718.881 |
| 14 | 21105.00 | 16054.688 |
| 15 | 24565.00 | 16054.688 |
| 19 | 5151.00 | 7316.207 |
| 24 | 7957.00 | 7316.207 |
| 33 | 5399.00 | 7316.207 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3270.396
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "wheelbase",
"carlength", "carwidth", "carheight", "curbweight",
"enginesize", "boreratio", "stroke",
"compressionratio", "horsepower", "peakrpm",
"citympg", "highwaympg" )],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginesize", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 4
##
## Mean of squared residuals: 5474771
## % Var explained: 92
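"Mean of squared residuals" is the out-of-bag MSE: each training car is predicted only by the trees that did not use it. Its square root can be compared with the validation RMSE, and "% Var explained" relates it to the variance of price. A minimal worked check, assuming the variance is taken with divisor n (with n - 1 the result still rounds to 92) and recovering it from the root-node deviance printed by `modelo_ar`, which is the sum of squared deviations of price in the same training set.

```r
# Figures printed above: out-of-bag MSE from modelo_rf and the root-node
# deviance (sum of squared deviations of price) from modelo_ar.
oob_mse <- 5474771
root_deviance <- 11288530000
n_train <- 165

oob_rmse <- sqrt(oob_mse)                  # about 2340 dollars
var_price <- root_deviance / n_train       # variance of price (divisor n, an assumption)
pct_var_explained <- 100 * (1 - oob_mse / var_price)

round(c(oob_rmse = oob_rmse, pct_var_explained = pct_var_explained), 1)
```

The out-of-bag RMSE of roughly 2,340 dollars is of the same order as the validation RMSE of 2462.144 computed for this model.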
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 30309891.1 3370773186
## curbweight 25887142.8 2486913039
## highwaympg 14471151.6 1447686174
## horsepower 9771662.3 1192337496
## citympg 7688504.0 876804211
## carlength 2962896.2 615095206
## carwidth 6860636.0 391227766
## boreratio 802743.4 178194220
## wheelbase 1209559.8 129085966
## compressionratio 963462.7 113855053
## carheight 676900.0 86710837
## peakrpm 1501528.5 75215877
## stroke 1517104.2 74397497
## symboling 206351.2 18725577
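The table has two importance measures: `%IncMSE` is the increase in out-of-bag MSE when a variable's values are permuted, and `IncNodePurity` is the total decrease in residual sum of squares from splits on that variable. As far as we know, `modelo_rf$importance` holds the unscaled `%IncMSE`, while `importance(modelo_rf)` scales it by default. A minimal sketch, using the printed values, that ranks the variables under both measures (the column is renamed `IncMSE` because `%` is awkward in R names).

```r
# Importance values copied from the modelo_rf$importance output above
importance_rf <- data.frame(
  variable = c("enginesize", "curbweight", "highwaympg", "horsepower",
               "citympg", "carlength", "carwidth", "boreratio", "wheelbase",
               "compressionratio", "carheight", "peakrpm", "stroke", "symboling"),
  IncMSE = c(30309891.1, 25887142.8, 14471151.6, 9771662.3, 7688504.0,
             2962896.2, 6860636.0, 802743.4, 1209559.8, 963462.7,
             676900.0, 1501528.5, 1517104.2, 206351.2),
  IncNodePurity = c(3370773186, 2486913039, 1447686174, 1192337496, 876804211,
                    615095206, 391227766, 178194220, 129085966, 113855053,
                    86710837, 75215877, 74397497, 18725577)
)

# Rank under each measure (1 = most important)
importance_rf$rank_mse <- rank(-importance_rf$IncMSE)
importance_rf$rank_purity <- rank(-importance_rf$IncNodePurity)
importance_rf[order(importance_rf$rank_mse), ]
```

Both measures agree on the top five (enginesize, curbweight, highwaympg, horsepower, citympg); lower down they differ, for example carwidth ranks 6th by `%IncMSE` but 7th by `IncNodePurity`.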
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 8 9 10 12 14 15 19
## 16081.592 18055.464 20143.821 19473.779 13448.462 18302.598 15452.015 6324.190
## 24 33 37 47 54 57 64 65
## 7932.759 6433.715 7708.667 11968.272 6657.384 12312.102 10531.087 9613.614
## 72 84 86 93 102 135 139 144
## 36270.316 13591.730 8647.006 6738.583 16211.983 14782.823 7313.535 9361.297
## 146 147 149 151 155 160 164 165
## 10346.828 8666.483 9768.510 6556.952 8354.043 8156.772 8240.900 8252.780
## 168 172 174 179 187 189 204 205
## 10195.099 12569.833 10334.653 16839.152 8559.271 9813.003 15900.354 17032.625
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495.00 | 16081.593 |
| 8 | 18920.00 | 18055.464 |
| 9 | 23875.00 | 20143.821 |
| 10 | 17859.17 | 19473.779 |
| 12 | 16925.00 | 13448.462 |
| 14 | 21105.00 | 18302.598 |
| 15 | 24565.00 | 15452.015 |
| 19 | 5151.00 | 6324.190 |
| 24 | 7957.00 | 7932.759 |
| 33 | 5399.00 | 6433.715 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2462.144
The predictions of the three models are compared.
comparaciones <- cbind(datos.validacion[, -1], predicciones_rm, predicciones_ar, predicciones_rf) # [, -1] drops the index column X
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of each model") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | symboling | wheelbase | carlength | carwidth | carheight | curbweight | enginesize | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | 130 | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 | 13442.4682 | 10718.881 | 16081.593 |
| 8 | 1 | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | 136 | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 | 17681.5012 | 16054.688 | 18055.464 |
| 9 | 1 | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | 131 | 3.13 | 3.400 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 | 17524.1731 | 16054.688 | 20143.821 |
| 10 | 0 | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | 131 | 3.13 | 3.400 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 | 16419.0054 | 16054.688 | 19473.779 |
| 12 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | 108 | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 | 13153.5205 | 10718.881 | 13448.462 |
| 14 | 0 | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105.00 | 16615.0727 | 16054.688 | 18302.598 |
| 15 | 1 | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | 164 | 3.31 | 3.190 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 | 17246.7120 | 16054.688 | 15452.015 |
| 19 | 2 | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | 61 | 2.91 | 3.030 | 9.5 | 48 | 5100 | 47 | 53 | 5151.00 | -307.0846 | 7316.207 | 6324.190 |
| 24 | 1 | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | 98 | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 7957.00 | 8350.6388 | 7316.207 | 7932.759 |
| 33 | 1 | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | 79 | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399.00 | 5292.3846 | 7316.207 | 6433.715 |
| 37 | 0 | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | 92 | 2.92 | 3.410 | 9.2 | 76 | 6000 | 30 | 34 | 7295.00 | 9154.8729 | 7316.207 | 7708.667 |
| 47 | 2 | 96.0 | 172.6 | 65.2 | 51.4 | 2734 | 119 | 3.43 | 3.230 | 9.2 | 90 | 5000 | 24 | 29 | 11048.00 | 10934.5194 | 16054.688 | 11968.272 |
| 54 | 1 | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | 91 | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6695.00 | 5439.4855 | 7316.207 | 6657.384 |
| 57 | 3 | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | 70 | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845.00 | 8476.0393 | 10718.881 | 12312.103 |
| 64 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2443 | 122 | 3.39 | 3.390 | 22.7 | 64 | 4650 | 36 | 42 | 10795.00 | 12754.8362 | 10718.881 | 10531.087 |
| 65 | 0 | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | 122 | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 11245.00 | 10069.1269 | 10718.881 | 9613.614 |
| 72 | -1 | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | 234 | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 34184.00 | 30958.0557 | 34278.324 | 36270.316 |
| 84 | 3 | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | 156 | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14869.00 | 15735.8711 | 16054.688 | 13591.730 |
| 86 | 1 | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | 122 | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 6989.00 | 9968.0517 | 10718.881 | 8647.006 |
| 93 | 1 | 94.5 | 165.3 | 63.8 | 54.5 | 1938 | 97 | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 6849.00 | 6530.5427 | 7316.207 | 6738.583 |
| 102 | 0 | 100.4 | 181.7 | 66.5 | 55.1 | 3095 | 181 | 3.43 | 3.270 | 9.0 | 152 | 5200 | 17 | 22 | 13499.00 | 22561.5041 | 16054.688 | 16211.983 |
| 135 | 3 | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | 121 | 2.54 | 2.070 | 9.3 | 110 | 5250 | 21 | 28 | 15040.00 | 17646.8525 | 16054.688 | 14782.823 |
| 139 | 2 | 93.7 | 156.9 | 63.4 | 53.7 | 2050 | 97 | 3.62 | 2.360 | 9.0 | 69 | 4900 | 31 | 36 | 5118.00 | 9113.5063 | 7316.207 | 7313.535 |
| 144 | 0 | 97.2 | 172.0 | 65.4 | 52.5 | 2340 | 108 | 3.62 | 2.640 | 9.0 | 94 | 5200 | 26 | 32 | 9960.00 | 11190.1288 | 10718.881 | 9361.297 |
| 146 | 0 | 97.0 | 172.0 | 65.4 | 54.3 | 2510 | 108 | 3.62 | 2.640 | 7.7 | 111 | 4800 | 24 | 29 | 11259.00 | 10752.0635 | 10718.881 | 10346.828 |
| 147 | 0 | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 28 | 32 | 7463.00 | 8954.3599 | 10718.881 | 8666.483 |
| 149 | 0 | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | 108 | 3.62 | 2.640 | 9.0 | 82 | 4800 | 23 | 29 | 8013.00 | 10583.3100 | 10718.881 | 9768.510 |
| 151 | 1 | 95.7 | 158.7 | 63.6 | 54.5 | 1985 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 35 | 39 | 5348.00 | 5191.8282 | 7316.207 | 6556.952 |
| 155 | 0 | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | 92 | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898.00 | 6377.1242 | 10718.881 | 8354.043 |
| 160 | 0 | 95.7 | 166.3 | 64.4 | 52.8 | 2275 | 110 | 3.27 | 3.350 | 22.5 | 56 | 4500 | 38 | 47 | 7788.00 | 10210.5031 | 7316.207 | 8156.773 |
| 164 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2169 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 29 | 34 | 8058.00 | 5937.5597 | 7316.207 | 8240.900 |
| 165 | 1 | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | 98 | 3.19 | 3.030 | 9.0 | 70 | 4800 | 29 | 34 | 8238.00 | 5990.7244 | 7316.207 | 8252.780 |
| 168 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2540 | 146 | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 8449.00 | 14002.9730 | 10718.881 | 10195.099 |
| 172 | 2 | 98.4 | 176.2 | 65.6 | 52.0 | 2714 | 146 | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 11549.00 | 14267.2779 | 16054.688 | 12569.833 |
| 174 | -1 | 102.4 | 175.6 | 66.5 | 54.9 | 2326 | 122 | 3.31 | 3.540 | 8.7 | 92 | 4200 | 29 | 34 | 8948.00 | 7815.2692 | 10718.881 | 10334.653 |
| 179 | 3 | 102.9 | 183.5 | 67.7 | 52.0 | 2976 | 171 | 3.27 | 3.350 | 9.3 | 161 | 5200 | 20 | 24 | 16558.00 | 21609.1878 | 16054.688 | 16839.152 |
| 187 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2275 | 109 | 3.19 | 3.400 | 9.0 | 85 | 5250 | 27 | 34 | 8495.00 | 10375.9063 | 7316.207 | 8559.271 |
| 189 | 2 | 97.3 | 171.7 | 65.5 | 55.7 | 2300 | 109 | 3.19 | 3.400 | 10.0 | 100 | 5500 | 26 | 32 | 9995.00 | 11687.8384 | 10718.881 | 9813.003 |
| 204 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | 145 | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470.00 | 19364.3301 | 16054.688 | 15900.354 |
| 205 | -1 | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | 141 | 3.78 | 3.150 | 9.5 | 114 | 5400 | 19 | 25 | 22625.00 | 18702.0951 | 16054.688 | 17032.625 |
The RMSE of the three models is compared.
rmse_modelos <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # a new name avoids masking Metrics::rmse()
kable(rmse_modelos, caption = "RMSE of each model") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3291.318 | 3270.396 | 2462.144 |
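To put the differences in perspective, a minimal sketch that picks the model with the lowest RMSE from the table and expresses how much lower its RMSE is relative to each other model; the object names are illustrative.

```r
# RMSE values from the table above (dollars)
rmse_values <- c(rm = 3291.318, ar = 3270.396, rf = 2462.144)

best <- names(which.min(rmse_values))   # model with the lowest RMSE

# Percentage by which the best model's RMSE is lower than each model's RMSE
reduction <- 100 * (rmse_values - rmse_values[best]) / rmse_values
round(reduction, 1)
```

On this single split the random forest's RMSE is roughly a quarter lower than that of the other two models, which are within about 21 dollars of each other.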
1.- Numerical car price data were loaded, with price described by a set of numerical variables.

2.- The multiple linear regression model highlights statistically significant variables:
   - The intercept is significant at the 99% level.
   - The variables peakrpm, compressionratio and enginesize are significant predictors at the 99.9% level.
   - The variable stroke is a significant predictor at the 99% level.
   - The variable citympg is a significant predictor at the 90% level.

3.- In the regression tree model, the important variables were enginesize and curbweight.

4.- The random forest model identifies enginesize, curbweight, highwaympg, horsepower and citympg as important variables.

5.- After the analysis, the following stands out:
   - The variable enginesize appears as important and significant in all models.
   - The variables enginesize, compressionratio and peakrpm are important in the multiple linear regression model.
   - The variables enginesize and curbweight are important in both the regression tree and the random forest models.

6.- In terms of the best regression model, which model is better according to RMSE? The best model by root mean squared error was the random forest (rf), with an RMSE of 2462.144. This result comes from a single split of the data into 80% for training and 20% for validation.