Compare supervised models by applying several algorithms to predict automobile prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part, except for the identifier columns car_ID and CarName, which are removed.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
A multiple regression model is built with the training data.
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
A regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
A random forest model is built with the training data, using 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendly tabular output
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For data partitioning
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables, both numeric and categorical.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | Car_ID | Unique id of each observation (Integer) |
| 2 | Symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | carCompany | Name of car company (Categorical). Column CarName in the data |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in a car (Categorical). (std or turbo) |
| 6 | doornumber | Number of doors in a car (Categorical). (two, four) |
| 7 | carbody | Body of car (Categorical). (convertible, sedan, wagon …) |
| 8 | drivewheel | Type of drive wheel (Categorical). (4wd, fwd, rwd) |
| 9 | enginelocation | Location of car engine (Categorical). (front, rear) |
| 10 | wheelbase | Wheelbase of car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of car (Numeric) |
| 12 | carwidth | Width of car (Numeric) |
| 13 | carheight | Height of car (Numeric) |
| 14 | curbweight | Weight of a car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of engine (Numeric). Engine size in … |
| 18 | fuelsystem | Fuel system of car (Categorical) |
| 19 | boreratio | Bore ratio of car (Numeric). Cylinder bore measurement |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of car (Numeric). Compression ratio of the engine |
| 22 | horsepower | Horsepower (Numeric). Power of the car |
| 23 | peakrpm | Car peak rpm (Numeric). Peak revolutions per minute |
| 24 | citympg | Mileage in city (Numeric). Fuel economy in miles per gallon |
| 25 | highwaympg | Mileage on highway (Numeric). Fuel economy in miles per gallon |
| 26 | price (Dependent variable) | Price of car (Numeric). Price of the car in dollars |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables that carry no statistical interest, that is, columns 1 and 3 (car_ID and CarName).
datos <- datos[, c(2,4:26)]
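Selecting by position depends on the column order of the file. As a hedged alternative, the same two columns could be dropped by name. The sketch below uses a small made-up data frame (`toy`, `drop_cols` and `toy_clean` are illustrative names and the values are not the real data):

```r
# Toy data frame with the same kind of columns; the values are made up
toy <- data.frame(
  car_ID    = 1:3,
  symboling = c(3, 1, 2),
  CarName   = c("a", "b", "c"),
  price     = c(13495, 16500, 13950)
)

# Drop the identifier columns by name instead of by position
drop_cols <- c("car_ID", "CarName")
toy_clean <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

names(toy_clean)
```

Applied to `datos`, the same indexing expression would keep the remaining 24 columns regardless of their order.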
The first records again, now without the removed columns.
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data take 80% of the observations and the validation data the remaining 20%.
n <- nrow(datos)
set.seed(1270) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
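For a numeric outcome, createDataPartition() samples within percentile groups of the outcome, so both subsets end up with a similar price distribution. As a rough illustration only, a plain random 80/20 split (not stratified, so not equivalent to the call above) could be sketched in base R. The data frame `toy` and its values are made up:

```r
set.seed(1270)

# Made-up data with 205 rows, the same number of observations as the car data
toy <- data.frame(id = 1:205, price = runif(205, 5000, 45000))

# Plain random sample of 80% of the row indices (no stratification)
n <- nrow(toy)
train_idx <- sample(seq_len(n), size = floor(0.80 * n))

toy_train <- toy[train_idx, ]
toy_valid <- toy[-train_idx, ]

# Row counts of each subset
c(train = nrow(toy_train), valid = nrow(toy_valid))
```

The two subsets are disjoint by construction, because the validation rows are exactly those not selected for training.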
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 20 | 1 | gas | std | two | hatchback | fwd | front | 94.5 | 155.9 | 63.6 | 52.0 | 1874 | ohc | four | 90 | 2bbl | 3.03 | 3.110 | 9.6 | 70 | 5400 | 38 | 43 | 6295 |
| 27 | 1 | gas | std | four | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 |
| 29 | -1 | gas | std | four | wagon | fwd | front | 103.3 | 174.6 | 64.6 | 59.8 | 2535 | ohc | four | 122 | 2bbl | 3.34 | 3.460 | 8.5 | 88 | 5000 | 24 | 30 | 8921 |
| 38 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 43 | 1 | gas | std | two | sedan | fwd | front | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | ohc | four | 110 | 2bbl | 3.15 | 3.580 | 9.1 | 100 | 5500 | 25 | 31 | 10345 |
| 49 | 0 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.170 | 8.1 | 176 | 4750 | 15 | 19 | 35550 |
| 59 | 3 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2500 | rotor | two | 80 | mpfi | 3.33 | 3.255 | 9.4 | 135 | 6000 | 16 | 23 | 15645 |
The multiple linear regression model (rm) is built: the variable price as a function of all the independent variables, both numeric and non-numeric.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
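The expansion of the dot can be checked with terms(), which resolves `.` against the columns of the data frame passed as `data`. A minimal sketch on a made-up data frame (`toy` and `expanded` are illustrative names):

```r
# Made-up data frame with a response and two predictors
toy <- data.frame(
  price      = c(13495, 16500, 13950, 17450),
  enginesize = c(130, 152, 109, 136),
  curbweight = c(2548, 2823, 2337, 2824)
)

# terms() expands the dot into every column of `data` except the response
expanded <- attr(terms(price ~ ., data = toy), "term.labels")
expanded
```

Because the expansion depends on the data frame, removing car_ID and CarName beforehand is what keeps them out of the model.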
# Multiple linear regression model, to observe which variables are important
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5445.3 -1159.1 -26.7 822.4 9670.4
##
## Coefficients: (3 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.655e+04 1.886e+04 -1.407 0.161827
## symboling 1.641e+02 2.889e+02 0.568 0.571131
## fueltypegas -1.441e+04 8.189e+03 -1.759 0.080976 .
## aspirationturbo 1.817e+03 1.125e+03 1.616 0.108741
## doornumbertwo 7.159e+01 6.839e+02 0.105 0.916805
## carbodyhardtop -2.793e+03 1.523e+03 -1.833 0.069158 .
## carbodyhatchback -3.806e+03 1.344e+03 -2.832 0.005401 **
## carbodysedan -2.802e+03 1.481e+03 -1.892 0.060854 .
## carbodywagon -3.892e+03 1.645e+03 -2.366 0.019540 *
## drivewheelfwd -2.234e+02 1.261e+03 -0.177 0.859698
## drivewheelrwd 2.710e+02 1.431e+03 0.189 0.850054
## enginelocationrear 6.487e+03 2.949e+03 2.199 0.029705 *
## wheelbase 7.179e+01 1.132e+02 0.634 0.527171
## carlength -6.647e+01 5.787e+01 -1.149 0.252908
## carwidth 7.542e+02 2.804e+02 2.689 0.008143 **
## carheight 1.700e+02 1.468e+02 1.158 0.249043
## curbweight 3.115e+00 1.937e+00 1.609 0.110257
## enginetypedohcv -1.044e+04 5.648e+03 -1.849 0.066883 .
## enginetypel -1.559e+03 1.907e+03 -0.818 0.415143
## enginetypeohc 2.839e+03 1.044e+03 2.720 0.007471 **
## enginetypeohcf 4.773e+00 1.847e+03 0.003 0.997942
## enginetypeohcv -7.541e+03 1.563e+03 -4.825 4.04e-06 ***
## enginetyperotor -6.192e+03 4.823e+03 -1.284 0.201575
## cylindernumberfive -1.266e+04 3.012e+03 -4.203 4.99e-05 ***
## cylindernumberfour -1.389e+04 3.397e+03 -4.088 7.76e-05 ***
## cylindernumbersix -8.395e+03 2.407e+03 -3.487 0.000676 ***
## cylindernumberthree -5.120e+03 4.944e+03 -1.036 0.302435
## cylindernumbertwelve -7.050e+03 5.075e+03 -1.389 0.167283
## cylindernumbertwo NA NA NA NA
## enginesize 1.038e+02 2.964e+01 3.502 0.000643 ***
## fuelsystem2bbl -6.221e+02 9.919e+02 -0.627 0.531690
## fuelsystem4bbl NA NA NA NA
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.715e+03 2.745e+03 -1.353 0.178418
## fuelsystemmpfi -7.212e+02 1.125e+03 -0.641 0.522640
## fuelsystemspdi -3.142e+03 1.640e+03 -1.915 0.057755 .
## fuelsystemspfi -3.448e+02 2.602e+03 -0.132 0.894807
## boreratio 8.387e+02 2.034e+03 0.412 0.680807
## stroke -5.036e+03 1.054e+03 -4.776 4.96e-06 ***
## compressionratio -1.058e+03 6.253e+02 -1.691 0.093292 .
## horsepower 8.208e+00 2.799e+01 0.293 0.769792
## peakrpm 2.311e+00 6.973e-01 3.314 0.001205 **
## citympg -1.616e+02 1.864e+02 -0.867 0.387536
## highwaympg 2.427e+02 1.654e+02 1.467 0.144788
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2237 on 124 degrees of freedom
## Multiple R-squared: 0.9428, Adjusted R-squared: 0.9244
## F-statistic: 51.1 on 40 and 124 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is not significant: its p-value (0.1618) is above 0.10, so it does not reach the 90% confidence level.
Several coefficients reach the 90% confidence level or higher (p-value < 0.10): levels of fueltype, carbody, enginelocation, enginetype, cylindernumber and fuelsystem, together with carwidth, enginesize, stroke, compressionratio and peakrpm.
Since some predictors do not reach the 90% confidence level, one may want to build a model with only the predictors that do. This is left for future work and is not done in this case.
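The terms that reach a given confidence level can also be read programmatically from the coefficient table instead of scanning the printout. The sketch below is an assumption-laden illustration: it uses the built-in `mtcars` data as a stand-in, and `fit`, `p_values` and `significant` are illustrative names.

```r
# Built-in mtcars data, used only as a stand-in for the car price data
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient table: estimate, standard error, t value and p-value
coefs <- coef(summary(fit))
p_values <- coefs[, "Pr(>|t|)"]

# Terms with a p-value below 0.10 (90% confidence level)
significant <- names(p_values)[p_values < 0.10]
significant
```

The same three steps applied to `modelo_rm` would list its terms with p-value below 0.10; coefficients reported as NA because of singularities do not appear in the coefficient table, so they are skipped automatically.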
In multiple linear models, the statistic Adjusted R-squared: 0.9244 means that the independent variables explain approximately 92.44% of the variability of the dependent variable price in the training data, after adjusting for the number of predictors.
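As a check on the summary output, the adjusted value can be recomputed from the Multiple R-squared and the degrees of freedom printed by summary(). This is a sketch using the rounded figures from that output, so it only matches up to rounding; the variable names are illustrative.

```r
# Values read from the summary output of the linear model
r2 <- 0.9428      # Multiple R-squared
df_resid <- 124   # residual degrees of freedom
df_model <- 40    # numerator degrees of freedom of the F-statistic

# Number of training observations implied by the degrees of freedom
n <- df_resid + df_model + 1

# Adjusted R-squared penalizes R-squared by the number of estimated terms
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
round(adj_r2, 3)
```

The result agrees with the Adjusted R-squared of 0.9244 in the output to about three decimals; the small difference comes from using the rounded 0.9428 as input.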
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 3 8 11 20 27 29 38 43
## 8682.183 19148.054 14511.229 6062.616 6805.102 10417.260 9298.946 9282.296
## 49 59 71 81 89 94 107 108
## 31238.519 13270.855 27477.743 9670.119 8646.032 5164.545 15784.449 11828.330
## 115 117 119 121 122 125 126 128
## 14941.507 16781.202 5443.859 5732.469 6174.386 14246.156 19194.010 33313.180
## 131 139 146 149 150 157 158 166
## 9658.029 7855.868 13330.225 8779.448 11370.215 8095.871 7145.000 8421.996
## 169 180 183 185 191 193 199 202
## 12895.199 21316.926 9112.386 9050.142 7989.768 10274.297 19332.671 21798.999
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 1270 and after running the tests, the conclusion is that the training data must cover every value of the categorical variables that appears in the validation data; that is, the validation data must not contain levels that the model was not trained on, because predict() on a linear model fails when it meets a new factor level.
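This condition can be checked before predicting. The following is a minimal sketch, assuming both data frames store their categorical variables as factors; `unseen_levels` is a hypothetical helper and the two toy data frames are made up.

```r
# Hypothetical helper: for every factor column, list the levels that occur
# in `valid` but never in `train`
unseen_levels <- function(train, valid) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(valid[[col]])),
            unique(as.character(train[[col]])))
  })
  names(out) <- factor_cols
  # Keep only the columns that actually have unseen levels
  out[lengths(out) > 0]
}

# Made-up example: "rotor" appears only in the validation rows
toy_train <- data.frame(enginetype = factor(c("ohc", "ohc", "dohc")),
                        fueltype   = factor(c("gas", "diesel", "gas")))
toy_valid <- data.frame(enginetype = factor(c("ohc", "rotor")),
                        fueltype   = factor(c("gas", "gas")))

unseen_levels(toy_train, toy_valid)
```

An empty list from `unseen_levels(datos.entrenamiento, datos.validacion)` would indicate that the partition satisfies the condition described above.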
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 8682.183 |
| 8 | 18920 | 19148.054 |
| 11 | 16430 | 14511.229 |
| 20 | 6295 | 6062.616 |
| 27 | 7609 | 6805.102 |
| 29 | 8921 | 10417.260 |
| 38 | 7895 | 9298.946 |
| 43 | 10345 | 9282.296 |
| 49 | 35550 | 31238.519 |
| 59 | 15645 | 13270.855 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2361.055
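The rmse() call computes the square root of the mean of the squared differences between actual and predicted prices. As an illustration, the same calculation can be written out in base R; `rmse_manual` is a hypothetical helper, and the vectors hold only the ten rows shown in the comparison table, not the full validation set.

```r
# Hypothetical helper: root mean squared error written out explicitly
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# First ten validation rows from the comparison table
precio_real <- c(16500, 18920, 16430, 6295, 7609,
                 8921, 7895, 10345, 35550, 15645)
precio_pred <- c(8682.183, 19148.054, 14511.229, 6062.616, 6805.102,
                 10417.260, 9298.946, 9282.296, 31238.519, 13270.855)

# RMSE over these ten rows only, so it differs from the full-set value
rmse_manual(precio_real, precio_pred)
```

Because the errors are squared before averaging, a single large miss such as row 3 (16500 actual against 8682.183 predicted) weighs heavily on the statistic.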
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10846890000 13304.800
## 2) enginesize< 182 150 3103259000 11197.030
## 4) curbweight< 2544 95 439724600 8419.074
## 8) curbweight< 2295 59 78760220 7275.322 *
## 9) curbweight>=2295 36 157290000 10293.560 *
## 5) curbweight>=2544 55 664122000 15995.310
## 10) wheelbase< 100.8 27 211128700 14285.450 *
## 11) wheelbase>=100.8 28 297936700 17644.110
## 22) carheight>=56.1 12 45387290 15259.170 *
## 23) carheight< 56.1 16 133102600 19432.810 *
## 3) enginesize>=182 15 413185900 34382.500 *
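Pending: interpretation of the tree. In the output above, the first split is on enginesize (< 182), and the remaining splits use curbweight, wheelbase and carheight.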
rpart.plot(modelo_ar)
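Variable importance for a regression tree is stored in the fitted rpart object. The sketch below is only an illustration on the built-in `mtcars` data, used as a stand-in for the car price data; `toy_tree` and `importance` are illustrative names.

```r
library(rpart)

# Built-in mtcars data, used only as a stand-in for the car price data
toy_tree <- rpart(mpg ~ ., data = mtcars)

# Importance scores accumulated over the splits, rescaled to sum to 1
importance <- toy_tree$variable.importance
round(importance / sum(importance), 3)

# Text form of the splits, readable as if/else rules
print(toy_tree)
```

Under the same assumption, `modelo_ar$variable.importance` would give the corresponding scores for the tree fitted on the training data.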
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 3 8 11 20 27 29 38 43
## 14285.451 19432.812 10293.556 7275.322 7275.322 10293.556 7275.322 7275.322
## 49 59 71 81 89 94 107 108
## 34382.500 10293.556 34382.500 10293.556 10293.556 7275.322 14285.451 15259.167
## 115 117 119 121 122 125 126 128
## 15259.167 15259.167 7275.322 7275.322 7275.322 14285.451 14285.451 34382.500
## 131 139 146 149 150 157 158 166
## 14285.451 7275.322 10293.556 10293.556 14285.451 7275.322 7275.322 7275.322
## 169 180 183 185 191 193 199 202
## 10293.556 19432.812 7275.322 7275.322 7275.322 14285.451 15259.167 19432.812
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 14285.451 |
| 8 | 18920 | 19432.812 |
| 11 | 16430 | 10293.556 |
| 20 | 6295 | 7275.322 |
| 27 | 7609 | 7275.322 |
| 29 | 8921 | 10293.556 |
| 38 | 7895 | 7275.322 |
| 43 | 10345 | 7275.322 |
| 49 | 35550 | 34382.500 |
| 59 | 15645 | 10293.556 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 2631.784
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 6063205
## % Var explained: 90.78
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 22927765.10 2715579805
## curbweight 14602648.07 1923033512
## citympg 10492684.00 1839455079
## horsepower 6937921.45 1589150297
## cylindernumber 3652913.19 701425536
## carwidth 2367213.26 492199482
## wheelbase 2295782.89 315200206
## highwaympg 1561902.28 307856441
## carlength 2641967.75 235318674
## fuelsystem 1847743.45 212246443
## drivewheel 1756440.83 183532640
## compressionratio 122307.46 91023211
## enginetype 718592.13 82834550
## enginelocation 973277.55 63234918
## carheight 1062938.14 62519991
## peakrpm 764780.55 40536668
## carbody 409044.83 33451835
## symboling 472195.10 31437164
## boreratio -122909.38 30802565
## stroke 217792.74 26286953
## fueltype -37441.14 13967919
## aspiration 232175.02 12686123
## doornumber -106755.88 4173710
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 3 8 11 20 27 29 38 43
## 15607.739 17849.030 13949.338 7598.459 6743.323 9474.427 8905.032 9631.631
## 49 59 71 81 89 94 107 108
## 34131.643 15867.411 27979.325 11102.051 9966.003 7593.299 17232.462 16310.350
## 115 117 119 121 122 125 126 128
## 17096.355 16668.340 5876.670 6398.387 6999.223 14351.720 16365.708 31801.647
## 131 139 146 149 150 157 158 166
## 11377.519 7445.248 10716.406 10441.373 13382.287 7893.719 7983.033 9568.537
## 169 180 183 185 191 193 199 202
## 10022.523 17017.098 8312.310 8284.985 9994.751 12189.839 19085.939 19837.434
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 15607.739 |
| 8 | 18920 | 17849.030 |
| 11 | 16430 | 13949.338 |
| 20 | 6295 | 7598.459 |
| 27 | 7609 | 6743.323 |
| 29 | 8921 | 9474.427 |
| 38 | 7895 | 8905.032 |
| 43 | 10345 | 9631.631 |
| 49 | 35550 | 34131.643 |
| 59 | 15645 | 15867.411 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1712.114
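With the three RMSE values available, they can be collected in one small table for comparison. This is a sketch that copies the numbers from the outputs above; `rmse_summary` is an illustrative name, and in the notebook the variables `rmse_rm`, `rmse_ar` and `rmse_rf` could be used instead of the literals.

```r
# RMSE values copied from the outputs of the three models
rmse_summary <- data.frame(
  model = c("multiple regression", "regression tree", "random forest"),
  rmse  = c(2361.055, 2631.784, 1712.114)
)

# Order from the lowest to the highest error
rmse_summary <- rmse_summary[order(rmse_summary$rmse), ]
rmse_summary
```

With this seed and partition, the random forest has the lowest RMSE and the regression tree the highest; a different seed or partition could change the ranking.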
The predictions of the three models are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 8682.183 | 14285.451 | 15607.739 |
| 8 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 18920 | 19148.054 | 19432.812 | 17849.030 |
| 11 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 | 14511.229 | 10293.556 | 13949.338 |
| 20 | gas | std | two | hatchback | fwd | front | 94.5 | 155.9 | 63.6 | 52.0 | 1874 | ohc | four | 90 | 2bbl | 3.03 | 3.110 | 9.6 | 70 | 5400 | 38 | 43 | 6295 | 6062.616 | 7275.322 | 7598.459 |
| 27 | gas | std | four | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 | 6805.102 | 7275.322 | 6743.323 |
| 29 | gas | std | four | wagon | fwd | front | 103.3 | 174.6 | 64.6 | 59.8 | 2535 | ohc | four | 122 | 2bbl | 3.34 | 3.460 | 8.5 | 88 | 5000 | 24 | 30 | 8921 | 10417.260 | 10293.556 | 9474.427 |
| 38 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 | 9298.946 | 7275.322 | 8905.032 |
| 43 | gas | std | two | sedan | fwd | front | 96.5 | 169.1 | 66.0 | 51.0 | 2293 | ohc | four | 110 | 2bbl | 3.15 | 3.580 | 9.1 | 100 | 5500 | 25 | 31 | 10345 | 9282.296 | 7275.322 | 9631.631 |
| 49 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.170 | 8.1 | 176 | 4750 | 15 | 19 | 35550 | 31238.519 | 34382.500 | 34131.643 |
| 59 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2500 | rotor | two | 80 | mpfi | 3.33 | 3.255 | 9.4 | 135 | 6000 | 16 | 23 | 15645 | 13270.855 | 10293.556 | 15867.411 |
| 71 | diesel | turbo | four | sedan | rwd | front | 115.6 | 202.6 | 71.7 | 56.3 | 3770 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 31600 | 27477.743 | 34382.500 | 27979.325 |
| 81 | gas | turbo | two | hatchback | fwd | front | 96.3 | 173.0 | 65.4 | 49.4 | 2370 | ohc | four | 110 | spdi | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9959 | 9670.119 | 10293.556 | 11102.051 |
| 89 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | ohc | four | 110 | spdi | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9279 | 8646.032 | 10293.556 | 9966.003 |
| 94 | gas | std | four | wagon | fwd | front | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7349 | 5164.545 | 7275.322 | 7593.299 |
| 107 | gas | std | two | hatchback | rwd | front | 99.2 | 178.5 | 67.9 | 49.7 | 3139 | ohcv | six | 181 | mpfi | 3.43 | 3.270 | 9.0 | 160 | 5200 | 19 | 25 | 18399 | 15784.449 | 14285.451 | 17232.462 |
| 108 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3020 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 11900 | 11828.330 | 15259.167 | 16310.350 |
| 115 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3485 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 25 | 25 | 17075 | 14941.507 | 15259.167 | 17096.355 |
| 117 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 17950 | 16781.202 | 15259.167 | 16668.340 |
| 119 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1918 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 37 | 41 | 5572 | 5443.859 | 7275.322 | 5876.670 |
| 121 | gas | std | four | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 5732.469 | 7275.322 | 6398.387 |
| 122 | gas | std | four | sedan | fwd | front | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6692 | 6174.386 | 7275.322 | 6999.223 |
| 125 | gas | turbo | two | hatchback | rwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | ohc | four | 156 | spdi | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 12764 | 14246.156 | 14285.451 | 14351.720 |
| 126 | gas | std | two | hatchback | rwd | front | 94.5 | 168.9 | 68.3 | 50.2 | 2778 | ohc | four | 151 | mpfi | 3.94 | 3.110 | 9.5 | 143 | 5500 | 19 | 27 | 22018 | 19194.010 | 14285.451 | 16365.708 |
| 128 | gas | std | two | hardtop | rwd | rear | 89.5 | 168.9 | 65.0 | 51.6 | 2756 | ohcf | six | 194 | mpfi | 3.74 | 2.900 | 9.5 | 207 | 5900 | 17 | 25 | 34028 | 33313.180 | 34382.500 | 31801.647 |
| 131 | gas | std | four | wagon | fwd | front | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | ohc | four | 132 | mpfi | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9295 | 9658.029 | 14285.451 | 11377.519 |
| 139 | gas | std | two | hatchback | fwd | front | 93.7 | 156.9 | 63.4 | 53.7 | 2050 | ohcf | four | 97 | 2bbl | 3.62 | 2.360 | 9.0 | 69 | 4900 | 31 | 36 | 5118 | 7855.868 | 7275.322 | 7445.248 |
| 146 | gas | turbo | four | sedan | 4wd | front | 97.0 | 172.0 | 65.4 | 54.3 | 2510 | ohcf | four | 108 | mpfi | 3.62 | 2.640 | 7.7 | 111 | 4800 | 24 | 29 | 11259 | 13330.225 | 10293.556 | 10716.406 |
| 149 | gas | std | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.0 | 82 | 4800 | 23 | 29 | 8013 | 8779.448 | 10293.556 | 10441.373 |
| 150 | gas | turbo | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2650 | ohcf | four | 108 | mpfi | 3.62 | 2.640 | 7.7 | 111 | 4800 | 23 | 23 | 11694 | 11370.215 | 14285.451 | 13382.287 |
| 157 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2081 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 30 | 37 | 6938 | 8095.871 | 7275.322 | 7893.719 |
| 158 | gas | std | four | hatchback | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2109 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 30 | 37 | 7198 | 7145.000 | 7275.322 | 7983.033 |
| 166 | gas | std | two | sedan | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9298 | 8421.996 | 7275.322 | 9568.538 |
| 169 | gas | std | two | hardtop | rwd | front | 98.4 | 176.2 | 65.6 | 52.0 | 2536 | ohc | four | 146 | mpfi | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 9639 | 12895.199 | 10293.556 | 10022.523 |
| 180 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 3016 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.3 | 161 | 5200 | 19 | 24 | 15998 | 21316.926 | 19432.812 | 17017.098 |
| 183 | diesel | std | two | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2261 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 52 | 4800 | 37 | 46 | 7775 | 9112.386 | 7275.322 | 8312.310 |
| 185 | diesel | std | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2264 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 52 | 4800 | 37 | 46 | 7995 | 9050.142 | 7275.322 | 8284.985 |
| 191 | gas | std | two | hatchback | fwd | front | 94.5 | 165.7 | 64.0 | 51.4 | 2221 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 8.5 | 90 | 5500 | 24 | 29 | 9980 | 7989.768 | 7275.322 | 9994.751 |
| 193 | diesel | turbo | four | sedan | fwd | front | 100.4 | 180.2 | 66.9 | 55.1 | 2579 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 68 | 4500 | 33 | 38 | 13845 | 10274.297 | 14285.451 | 12189.839 |
| 199 | gas | turbo | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 3045 | ohc | four | 130 | mpfi | 3.62 | 3.150 | 7.5 | 162 | 5100 | 17 | 22 | 18420 | 19332.671 | 15259.167 | 19085.939 |
| 202 | gas | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.8 | 55.5 | 3049 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 8.7 | 160 | 5300 | 19 | 25 | 19045 | 21798.999 | 19432.812 | 19837.434 |
The RMSE of the three models is compared
# Gather each model's RMSE: rm = multiple regression, ar = regression tree, rf = random forest
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2361.055 | 2631.784 | 1712.114 |
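As a reminder of what these numbers measure, the following is a minimal sketch of the RMSE computation in base R. The helper name `rmse_manual` is hypothetical. The sample values are the first three rows of the prediction table above, under the assumption that the second-to-last column holds the regression tree predictions.

```r
# Hypothetical helper: root mean squared error between real and predicted values
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Real prices of rows 27, 29 and 38 in the table above
actual <- c(7609, 8921, 7895)
# Assumed to be the regression tree predictions for the same rows
pred_tree <- c(7275.322, 10293.556, 7275.322)

# RMSE over these three rows only (not the full validation set)
print(rmse_manual(actual, pred_tree))
```

Because the errors are squared before averaging, a few large misses weigh more heavily than many small ones, which is worth keeping in mind when comparing the three models.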
The proposed seed was changed from 1271 to 1270 so that the training partition would cover as many category labels as possible. With the previous seed this was not the case and an error occurred: the training data lacked some categorical values that did appear in the validation data, so the model did not know how to handle those cases.
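A check like the one below could be run before fitting, to confirm that a given partition does not leave category labels out of the training data. This is only a sketch: the helper `unseen_levels` and the toy data frames are hypothetical and are not part of the original case.

```r
# Hypothetical helper: for each categorical column, list the values that
# appear in the validation data but never in the training data
unseen_levels <- function(train, valid) {
  is_cat <- vapply(train, function(x) is.factor(x) || is.character(x), logical(1))
  cols <- names(train)[is_cat]
  res <- lapply(cols, function(col) {
    setdiff(unique(as.character(valid[[col]])), unique(as.character(train[[col]])))
  })
  names(res) <- cols
  # Keep only the columns that actually have unseen values
  res[lengths(res) > 0]
}

# Toy data for illustration only (not taken from the case)
train <- data.frame(enginetype = c("ohc", "ohc", "dohc"), price = c(7000, 8000, 15000))
valid <- data.frame(enginetype = c("ohc", "rotor"), price = c(7500, 15645))

# "rotor" is reported because it is absent from the training rows
print(unseen_levels(train, valid))
```

An empty result means every label in the validation data was also seen during training, which is the condition the seed change was meant to achieve.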
With this seed, the best model according to the root mean squared error (RMSE) was the random forest, with an RMSE of 1712.114, using 80% of the data for training and 20% for validation. Ranked by result, the models are as follows: