Compare supervised models by applying car price prediction algorithms and evaluating each one with the root mean squared error (RMSE) statistic.
The previously prepared data is loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data is created with 80% of the observations.
Validation data is created with the remaining 20%.
The multiple regression model is built with the training data.
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its splitting rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For the regression tree
library(rpart.plot) # For plotting the tree
library(randomForest) # For random forest
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables. All of them except car_ID and CarName are used later as predictors, numeric and categorical alike.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably pretty safe (Categorical) |
| 3 | CarName | Name of the car, company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car: convertible, sedan, wagon … (Categorical) |
| 8 | drivewheel | Type of drive wheel: 4wd, fwd, rwd (Categorical) |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car: distance between axles, in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower: power of the car (Numeric) |
| 23 | peakrpm | Peak revolutions per minute of the car (Numeric) |
| 24 | citympg | Mileage in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Mileage on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |
Source: https://archive.ics.uci.edu/ml/datasets/Automobile
Variables of no statistical interest are removed, namely columns 1 and 3: car_ID and CarName.
datos <- datos[, c(2,4:26)]
The first records again:
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Training data takes 80% of the observations and validation data the remaining 20%.
n <- nrow(datos)
set.seed(1307) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
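The 80/20 split above can be illustrated without `caret`. The following is only a minimal sketch in base R: it uses a plain random split on a made-up data frame (`toy`, a hypothetical stand-in for `datos`), not the outcome-stratified sampling that `createDataPartition` performs, so the group sizes and selected rows will not match the report.

```r
# Hypothetical stand-in data; the report itself uses `datos`.
set.seed(1307)
toy <- data.frame(
  price      = round(runif(50, min = 5000, max = 40000)),
  enginesize = sample(60:300, size = 50, replace = TRUE)
)

# Draw 80% of the row indices at random for training.
n_toy     <- nrow(toy)
idx_train <- sample(seq_len(n_toy), size = floor(0.80 * n_toy))

toy_train <- toy[idx_train, ]   # rows selected for training
toy_valid <- toy[-idx_train, ]  # everything else is held out

c(train = nrow(toy_train), valid = nrow(toy_valid))
```

The two subsets never share a row, which is the property the comparison of models relies on.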
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 20970 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105 |
| 18 | 0 | gas | std | four | sedan | rwd | front | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | ohc | six | 209 | mpfi | 3.62 | 3.39 | 8.00 | 182 | 5400 | 15 | 20 | 36880 |
| 19 | 2 | gas | std | two | hatchback | fwd | front | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | l | three | 61 | 2bbl | 2.91 | 3.03 | 9.50 | 48 | 5100 | 47 | 53 | 5151 |
| 22 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572 |
| 23 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6377 |
| 38 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.58 | 9.00 | 86 | 5800 | 27 | 33 | 7895 |
| 42 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | ohc | four | 110 | mpfi | 3.15 | 3.58 | 9.00 | 101 | 5800 | 24 | 28 | 12945 |
The multiple linear regression model (rm) is built: the variable price as a function of all the independent variables, numeric and non-numeric alike.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
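The expansion of the dot in a formula can be checked directly. The sketch below uses a tiny made-up data frame (`toy`, hypothetical) and base R's `terms()` to list the predictors that `price ~ .` stands for.

```r
# Hypothetical three-row data frame with one numeric and one factor predictor.
toy <- data.frame(
  price      = c(10, 12, 15),
  enginesize = c(90, 110, 130),
  fueltype   = factor(c("gas", "diesel", "gas"))
)

# terms() resolves the dot against the columns of `data`,
# excluding the response on the left-hand side.
predictors <- attr(terms(price ~ ., data = toy), "term.labels")
predictors
```

Every column other than the response becomes a term, which is why the short and the long formula give the same model.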
# Multiple linear regression model, used to see which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4401.5 -1108.0 -117.4 1044.9 9213.5
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.685e+03 1.928e+04 -0.502 0.616277
## symboling -4.615e+01 2.712e+02 -0.170 0.865180
## fueltypegas -1.507e+04 8.300e+03 -1.816 0.071821 .
## aspirationturbo 1.351e+03 1.027e+03 1.316 0.190708
## doornumbertwo 6.251e+02 6.723e+02 0.930 0.354280
## carbodyhardtop -3.587e+03 1.497e+03 -2.397 0.018045 *
## carbodyhatchback -3.161e+03 1.387e+03 -2.278 0.024438 *
## carbodysedan -2.406e+03 1.491e+03 -1.614 0.109037
## carbodywagon -3.776e+03 1.655e+03 -2.282 0.024181 *
## drivewheelfwd 8.968e+02 1.308e+03 0.686 0.494229
## drivewheelrwd 2.109e+03 1.451e+03 1.453 0.148621
## enginelocationrear 8.095e+03 2.836e+03 2.854 0.005064 **
## wheelbase -2.279e+01 1.113e+02 -0.205 0.838113
## carlength -2.165e+01 5.713e+01 -0.379 0.705384
## carwidth 7.018e+02 2.732e+02 2.569 0.011391 *
## carheight 1.147e+02 1.506e+02 0.762 0.447782
## curbweight 4.771e+00 2.160e+00 2.209 0.029031 *
## enginetypedohcv -1.007e+03 6.198e+03 -0.162 0.871240
## enginetypel -1.191e+03 1.815e+03 -0.656 0.512800
## enginetypeohc 3.839e+03 1.084e+03 3.543 0.000559 ***
## enginetypeohcf 2.783e+03 1.941e+03 1.434 0.154076
## enginetypeohcv -4.571e+03 1.423e+03 -3.212 0.001679 **
## enginetyperotor 8.026e+03 7.786e+03 1.031 0.304656
## cylindernumberfive -4.547e+03 5.178e+03 -0.878 0.381573
## cylindernumberfour -3.869e+03 6.019e+03 -0.643 0.521614
## cylindernumbersix -3.779e+03 4.114e+03 -0.918 0.360152
## cylindernumbertwelve -1.395e+04 5.127e+03 -2.720 0.007464 **
## cylindernumbertwo NA NA NA NA
## enginesize 1.640e+02 4.294e+01 3.820 0.000210 ***
## fuelsystem2bbl -3.029e+02 1.001e+03 -0.302 0.762825
## fuelsystem4bbl -1.487e+03 2.897e+03 -0.513 0.608781
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.155e+03 2.720e+03 -1.160 0.248393
## fuelsystemmpfi -2.243e+02 1.122e+03 -0.200 0.841852
## fuelsystemspdi -3.370e+03 1.568e+03 -2.150 0.033536 *
## fuelsystemspfi -1.434e+03 2.641e+03 -0.543 0.588154
## boreratio -6.605e+03 2.809e+03 -2.351 0.020295 *
## stroke -6.497e+03 1.350e+03 -4.814 4.23e-06 ***
## compressionratio -1.041e+03 6.248e+02 -1.666 0.098333 .
## horsepower -4.144e+00 2.443e+01 -0.170 0.865550
## peakrpm 2.325e+00 7.054e-01 3.296 0.001279 **
## citympg -1.737e+02 1.807e+02 -0.961 0.338311
## highwaympg 1.940e+02 1.614e+02 1.202 0.231659
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2237 on 124 degrees of freedom
## Multiple R-squared: 0.9395, Adjusted R-squared: 0.92
## F-statistic: 48.16 on 40 and 124 DF, p-value: < 2.2e-16
Which variables are significant predictors at a confidence level above 90%?
The intercept is not statistically significant (p ≈ 0.62).
Several coefficients are significant at the 90% confidence level or higher (p < 0.10): fueltypegas, carbodyhardtop, carbodyhatchback, carbodywagon, enginelocationrear, carwidth, curbweight, enginetypeohc, enginetypeohcv, cylindernumbertwelve, enginesize, fuelsystemspdi, boreratio, stroke, compressionratio and peakrpm.
Since some predictors do not reach the 90% confidence level, one may want to build a model with only the predictors that do. That is left for future work and is not done in this case.
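The terms that reach the 90% level can also be listed programmatically instead of being read off the printout. This is a minimal sketch on the built-in `mtcars` data (an assumption made only so the example is self-contained); with the report's objects the same filter would be applied to `summary(modelo_rm)$coefficients`.

```r
# Stand-in model on a built-in data set; not the car price model.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients

# Keep only rows whose p-value is below 0.10 (90% confidence or more).
signif_90 <- coefs[coefs[, "Pr(>|t|)"] < 0.10, , drop = FALSE]
rownames(signif_90)
```

Note that for factor predictors the rows are individual dummy terms (for example `carbodywagon`), so selecting variables for a reduced model still requires deciding per factor, not per level.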
In this multiple linear model, Adjusted R-squared: 0.92 means that the independent variables explain approximately 92% of the variability of the dependent variable price in the training data.
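The adjusted value can be reproduced from the figures in the summary output: Multiple R-squared 0.9395 and an F-statistic on 40 and 124 degrees of freedom, which implies 40 estimated slopes and 124 + 40 + 1 = 165 training rows. The helper name `adj_r2` below is hypothetical; the formula is the standard adjustment 1 - (1 - R²)(n - 1)/(n - p - 1).

```r
# Adjusted R-squared from R-squared, sample size n and number of slopes p.
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Figures taken from the summary output of the regression model.
valor_ajustado <- adj_r2(r2 = 0.9395, n = 165, p = 40)
round(valor_ajustado, 2)  # approximately 0.92
```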
data_test_new <- datos.validacion # Duplicate test data set
data_test_new$cylindernumber[which(!(data_test_new$cylindernumber%in% unique(datos.entrenamiento$cylindernumber)))] <- NA # Replace new levels by NA
predicciones_rm <- predict(object = modelo_rm, newdata = data_test_new )
## Warning in predict.lm(object = modelo_rm, newdata = data_test_new): prediction
## from a rank-deficient fit may be misleading
predicciones_rm
## 1 4 13 14 18 19 22 23
## 15193.194 9456.155 20018.597 19655.885 34044.491 NA 5638.833 6109.191
## 38 42 45 53 54 65 68 69
## 9741.803 10148.665 6552.068 6422.725 6575.968 10301.545 27633.528 27637.477
## 72 73 77 84 94 97 103 106
## 35683.622 37687.871 6552.592 13599.283 4260.770 5598.273 15532.681 20449.791
## 117 120 124 133 135 138 143 147
## 16392.628 6867.786 9704.882 13066.131 26412.402 12310.955 6970.136 7363.385
## 148 150 155 159 165 184 188 192
## 9436.512 10187.001 5934.065 7696.473 8478.271 10087.960 9426.971 16567.941
comparaciones <- data.frame(precio_real = data_test_new$price, precio_predicciones = predicciones_rm )
Having used seed 1307 and run the tests, the conclusion is that the training data must cover every value of the categorical variables that appears in the validation data. In other words, the validation data must not contain any level the model was not trained on.
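This condition can be checked before predicting. The function below is a sketch with a hypothetical name (`unseen_levels`) and made-up data frames; it reports, per factor column, the values present in the validation set but absent from the training set.

```r
# Return a named list of validation levels that never occur in training.
unseen_levels <- function(train, valid) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  out <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(valid[[col]])),
            unique(as.character(train[[col]])))
  })
  names(out) <- factor_cols
  out[lengths(out) > 0]  # keep only columns that have a problem
}

# Hypothetical example mirroring the "three" cylinders case.
toy_train <- data.frame(cylindernumber = factor(c("four", "six", "four")),
                        price = c(1, 2, 3))
toy_valid <- data.frame(cylindernumber = factor(c("four", "three")),
                        price = c(1, 2))

unseen_levels(toy_train, toy_valid)
```

An empty list means every validation row can be scored by the linear model; otherwise the split (or the seed) may need to be revisited.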
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 15193.194 |
| 4 | 13950 | 9456.155 |
| 13 | 20970 | 20018.597 |
| 14 | 21105 | 19655.885 |
| 18 | 36880 | 34044.491 |
| 19 | 5151 | NA |
| 22 | 5572 | 5638.833 |
| 23 | 6377 | 6109.191 |
| 38 | 7895 | 9741.803 |
| 42 | 12945 | 10148.665 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] NA
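The result is NA because one prediction (row 19, whose cylinder level was replaced by NA) is missing, and a single NA propagates through the mean. One possible workaround, sketched here with a hypothetical helper `rmse_na` and made-up numbers, is to compute the RMSE only over the pairs where both values are present.

```r
# RMSE over complete pairs only; rows with an NA on either side are skipped.
rmse_na <- function(actual, predicted) {
  ok <- complete.cases(actual, predicted)
  sqrt(mean((actual[ok] - predicted[ok])^2))
}

# Made-up values: the second prediction is missing.
actual    <- c(100, 200, 300)
predicted <- c(110, NA, 290)

sqrt(mean((actual - predicted)^2))  # NA, as in the output above
rmse_na(actual, predicted)          # 10
```

Keep in mind that an RMSE computed this way covers one row fewer than the other models, so the comparison is no longer strictly like for like.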
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10263650000 13207.410
## 2) enginesize< 182 152 3174609000 11330.450
## 4) curbweight< 2624.5 100 610484400 8735.675
## 8) curbweight< 2367.5 67 121724900 7563.381 *
## 9) curbweight>=2367.5 33 209740200 11115.790 *
## 5) curbweight>=2624.5 52 596065500 16320.390
## 10) carwidth< 68.6 44 394490800 15583.750 *
## 11) carwidth>=68.6 8 46382600 20371.880 *
## 3) enginesize>=182 13 292348800 35153.500 *
Pending.
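For the variable importance step of the tree, one option is the `variable.importance` component that fitted `rpart` objects carry. The sketch below uses the built-in `mtcars` data as a stand-in (an assumption for self-containment); with the report's objects the same component would be read from `modelo_ar`.

```r
library(rpart)

# Stand-in regression tree on a built-in data set; not the car price tree.
toy_tree <- rpart(mpg ~ ., data = mtcars)

# Named numeric vector, sorted from most to least important variable.
importance <- toy_tree$variable.importance

# Express each variable's share as a percentage of the total.
round(100 * importance / sum(importance), 1)
```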
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 4 13 14 18 19 22 23
## 11115.788 7563.381 15583.754 15583.754 35153.500 7563.381 7563.381 7563.381
## 38 42 45 53 54 65 68 69
## 7563.381 11115.788 7563.381 7563.381 7563.381 11115.788 35153.500 35153.500
## 72 73 77 84 94 97 103 106
## 35153.500 35153.500 7563.381 15583.754 7563.381 7563.381 15583.754 15583.754
## 117 120 124 133 135 138 143 147
## 15583.754 7563.381 11115.788 15583.754 15583.754 15583.754 7563.381 7563.381
## 148 150 155 159 165 184 188 192
## 11115.788 15583.754 7563.381 7563.381 7563.381 7563.381 7563.381 15583.754
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 11115.788 |
| 4 | 13950 | 7563.381 |
| 13 | 20970 | 15583.754 |
| 14 | 21105 | 15583.754 |
| 18 | 36880 | 35153.500 |
| 19 | 5151 | 7563.381 |
| 22 | 5572 | 7563.381 |
| 23 | 6377 | 7563.381 |
| 38 | 7895 | 7563.381 |
| 42 | 12945 | 11115.788 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 2962.646
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 6164538
## % Var explained: 90.09
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 2.954723e+07 2659178989
## horsepower 2.091029e+07 2119314456
## curbweight 1.081980e+07 1265376991
## citympg 7.098837e+06 720326355
## highwaympg 2.116687e+06 565405610
## cylindernumber 4.609822e+06 496954121
## carwidth 3.507011e+06 353783619
## carlength 3.223264e+06 275735568
## drivewheel 1.636207e+06 145938976
## wheelbase 1.238895e+06 113565211
## peakrpm 4.322105e+05 76842423
## fuelsystem 7.678803e+05 67345772
## carbody 9.880913e+05 67189248
## enginelocation 2.582092e+04 61439792
## carheight 6.623651e+05 59181689
## compressionratio 3.483534e+05 57364759
## stroke 3.607838e+03 42470949
## enginetype 3.705955e+05 30033621
## boreratio 5.384846e+05 29555340
## symboling 1.412875e+05 20251621
## doornumber 3.737976e+04 12435386
## aspiration 2.847889e+04 9100012
## fueltype 6.603317e+02 4097582
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 4 13 14 18 19 22 23
## 15405.163 11359.912 18471.225 18530.935 36405.154 6440.026 5887.381 6027.514
## 38 42 45 53 54 65 68 69
## 9182.715 11781.260 6755.093 5772.983 6988.312 9666.115 29696.347 29646.066
## 72 73 77 84 94 97 103 106
## 30946.302 29246.909 6138.789 13662.150 7725.418 6969.484 16943.236 24601.329
## 117 120 124 133 135 138 143 147
## 16178.307 8173.021 9409.783 14300.486 14860.856 16793.338 8711.547 8763.845
## 148 150 155 159 165 184 188 192
## 10430.525 12324.046 8008.327 7883.175 8294.264 8764.537 7693.567 15597.699
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 15405.163 |
| 4 | 13950 | 11359.912 |
| 13 | 20970 | 18471.225 |
| 14 | 21105 | 18530.935 |
| 18 | 36880 | 36405.154 |
| 19 | 5151 | 6440.026 |
| 22 | 5572 | 5887.381 |
| 23 | 6377 | 6027.514 |
| 38 | 7895 | 9182.715 |
| 42 | 12945 | 11781.260 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1987.168
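With the three statistics computed, they can be placed side by side. The sketch below hard-codes the values printed above (NA for the linear model, 2962.646 for the tree, 1987.168 for the random forest); the names `rmse_summary` and `best_model` are hypothetical, and in the report the values would come from `rmse_rm`, `rmse_ar` and `rmse_rf`.

```r
# RMSE values copied from the outputs of the three models.
rmse_summary <- data.frame(
  modelo = c("rm", "ar", "rf"),
  rmse   = c(NA, 2962.646, 1987.168),
  stringsAsFactors = FALSE
)

# Sort from lowest to highest error, leaving the missing value last.
rmse_summary <- rmse_summary[order(rmse_summary$rmse, na.last = TRUE), ]

# which.min() ignores NA, so the model without a valid RMSE cannot win.
best_model <- rmse_summary$modelo[which.min(rmse_summary$rmse)]
best_model
```

On these figures the random forest has the lowest validation error, while the linear model cannot be ranked until its missing prediction is dealt with.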
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.00 | 111 | 5000 | 21 | 27 | 13495.0 | 15193.194 | 11115.788 | 15405.163 |
| 4 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950.0 | 9456.155 | 7563.381 | 11359.912 |
| 13 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 20970.0 | 20018.597 | 15583.754 | 18471.225 |
| 14 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105.0 | 19655.885 | 15583.754 | 18530.935 |
| 18 | gas | std | four | sedan | rwd | front | 110.0 | 197.0 | 70.9 | 56.3 | 3505 | ohc | six | 209 | mpfi | 3.62 | 3.39 | 8.00 | 182 | 5400 | 15 | 20 | 36880.0 | 34044.491 | 35153.500 | 36405.154 |
| 19 | gas | std | two | hatchback | fwd | front | 88.4 | 141.1 | 60.3 | 53.2 | 1488 | l | three | 61 | 2bbl | 2.91 | 3.03 | 9.50 | 48 | 5100 | 47 | 53 | 5151.0 | NA | 7563.381 | 6440.026 |
| 22 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572.0 | 5638.833 | 7563.381 | 5887.381 |
| 23 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6377.0 | 6109.191 | 7563.381 | 6027.514 |
| 38 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.58 | 9.00 | 86 | 5800 | 27 | 33 | 7895.0 | 9741.803 | 7563.381 | 9182.715 |
| 42 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | ohc | four | 110 | mpfi | 3.15 | 3.58 | 9.00 | 101 | 5800 | 24 | 28 | 12945.0 | 10148.665 | 11115.788 | 11781.260 |
| 45 | gas | std | two | sedan | fwd | front | 94.5 | 155.9 | 63.6 | 52.0 | 1874 | ohc | four | 90 | 2bbl | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 8916.5 | 6552.068 | 7563.381 | 6755.093 |
| 53 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6795.0 | 6422.725 | 7563.381 | 5772.983 |
| 54 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6695.0 | 6575.968 | 7563.381 | 6988.312 |
| 65 | gas | std | four | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | ohc | four | 122 | 2bbl | 3.39 | 3.39 | 8.60 | 84 | 4800 | 26 | 32 | 11245.0 | 10301.545 | 11115.788 | 9666.115 |
| 68 | diesel | turbo | four | sedan | rwd | front | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | ohc | five | 183 | idi | 3.58 | 3.64 | 21.50 | 123 | 4350 | 22 | 25 | 25552.0 | 27633.528 | 35153.500 | 29696.347 |
| 69 | diesel | turbo | four | wagon | rwd | front | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | ohc | five | 183 | idi | 3.58 | 3.64 | 21.50 | 123 | 4350 | 22 | 25 | 28248.0 | 27637.477 | 35153.500 | 29646.066 |
| 72 | gas | std | four | sedan | rwd | front | 115.6 | 202.6 | 71.7 | 56.5 | 3740 | ohcv | eight | 234 | mpfi | 3.46 | 3.10 | 8.30 | 155 | 4750 | 16 | 18 | 34184.0 | 35683.622 | 35153.500 | 30946.302 |
| 73 | gas | std | two | convertible | rwd | front | 96.6 | 180.3 | 70.5 | 50.8 | 3685 | ohcv | eight | 234 | mpfi | 3.46 | 3.10 | 8.30 | 155 | 4750 | 16 | 18 | 35056.0 | 37687.871 | 35153.500 | 29246.909 |
| 77 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | ohc | four | 92 | 2bbl | 2.97 | 3.23 | 9.40 | 68 | 5500 | 37 | 41 | 5389.0 | 6552.592 | 7563.381 | 6138.789 |
| 84 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | ohc | four | 156 | spdi | 3.59 | 3.86 | 7.00 | 145 | 5000 | 19 | 24 | 14869.0 | 13599.283 | 15583.754 | 13662.150 |
| 94 | gas | std | four | wagon | fwd | front | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | ohc | four | 97 | 2bbl | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 7349.0 | 4260.770 | 7563.381 | 7725.417 |
| 97 | gas | std | four | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | ohc | four | 97 | 2bbl | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 7499.0 | 5598.273 | 7563.381 | 6969.484 |
| 103 | gas | std | four | wagon | fwd | front | 100.4 | 184.6 | 66.5 | 56.1 | 3296 | ohcv | six | 181 | mpfi | 3.43 | 3.27 | 9.00 | 152 | 5200 | 17 | 22 | 14399.0 | 15532.681 | 15583.754 | 16943.236 |
| 106 | gas | turbo | two | hatchback | rwd | front | 91.3 | 170.7 | 67.9 | 49.7 | 3139 | ohcv | six | 181 | mpfi | 3.43 | 3.27 | 7.80 | 200 | 5200 | 17 | 23 | 19699.0 | 20449.791 | 15583.754 | 24601.329 |
| 117 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.52 | 21.00 | 95 | 4150 | 28 | 33 | 17950.0 | 16392.628 | 15583.754 | 16178.307 |
| 120 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | spdi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7957.0 | 6867.786 | 7563.381 | 8173.021 |
| 124 | gas | std | four | wagon | fwd | front | 103.3 | 174.6 | 64.6 | 59.8 | 2535 | ohc | four | 122 | 2bbl | 3.35 | 3.46 | 8.50 | 88 | 5000 | 24 | 30 | 8921.0 | 9704.882 | 11115.788 | 9409.783 |
| 133 | gas | std | two | hatchback | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | ohc | four | 121 | mpfi | 3.54 | 3.07 | 9.31 | 110 | 5250 | 21 | 28 | 11850.0 | 13066.131 | 15583.754 | 14300.486 |
| 135 | gas | std | two | hatchback | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | ohc | four | 121 | mpfi | 2.54 | 2.07 | 9.30 | 110 | 5250 | 21 | 28 | 15040.0 | 26412.402 | 15583.754 | 14860.856 |
| 138 | gas | turbo | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | dohc | four | 121 | mpfi | 3.54 | 3.07 | 9.00 | 160 | 5500 | 19 | 26 | 18620.0 | 12310.955 | 15583.754 | 16793.338 |
| 143 | gas | std | four | sedan | fwd | front | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | ohcf | four | 108 | 2bbl | 3.62 | 2.64 | 9.50 | 82 | 4400 | 28 | 33 | 7775.0 | 6970.136 | 7563.381 | 8711.547 |
| 147 | gas | std | four | wagon | fwd | front | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | ohcf | four | 108 | 2bbl | 3.62 | 2.64 | 9.00 | 82 | 4800 | 28 | 32 | 7463.0 | 7363.385 | 7563.381 | 8763.845 |
| 148 | gas | std | four | wagon | fwd | front | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | ohcf | four | 108 | mpfi | 3.62 | 2.64 | 9.00 | 94 | 5200 | 25 | 31 | 10198.0 | 9436.512 | 11115.788 | 10430.525 |
| 150 | gas | turbo | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2650 | ohcf | four | 108 | mpfi | 3.62 | 2.64 | 7.70 | 111 | 4800 | 23 | 23 | 11694.0 | 10187.001 | 15583.754 | 12324.046 |
| 155 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | ohc | four | 92 | 2bbl | 3.05 | 3.03 | 9.00 | 62 | 4800 | 27 | 32 | 7898.0 | 5934.065 | 7563.381 | 8008.327 |
| 159 | diesel | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2275 | ohc | four | 110 | idi | 3.27 | 3.35 | 22.50 | 56 | 4500 | 34 | 36 | 7898.0 | 7696.473 | 7563.381 | 7883.175 |
| 165 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | ohc | four | 98 | 2bbl | 3.19 | 3.03 | 9.00 | 70 | 4800 | 29 | 34 | 8238.0 | 8478.271 | 7563.381 | 8294.264 |
| 184 | gas | std | two | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2209 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 9.00 | 85 | 5250 | 27 | 34 | 7975.0 | 10087.960 | 7563.381 | 8764.537 |
| 188 | diesel | turbo | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | ohc | four | 97 | idi | 3.01 | 3.40 | 23.00 | 68 | 4500 | 37 | 42 | 9495.0 | 9426.971 | 7563.381 | 7693.567 |
| 192 | gas | std | four | sedan | fwd | front | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 24 | 13295.0 | 16567.941 | 15583.754 | 15597.699 |
The RMSE of the three models is compared
# One column per model: rm = multiple regression, ar = regression tree, rf = random forest
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| NA | 2962.646 | 1987.168 |
The variables with a confidence level above 90% as predictors are: fueltypegas, carbodyhardtop, carbodyhatchback, carbodywagon, enginelocationrear, carwidth, curbweight, enginetypeohc, enginetypeohcv, cylindernumbertwelve, enginesize, fuelsystemspdi, boreratio, stroke, compressionratio, peakrpm.
In this exercise the random forest model was the most accurate. Along the way I ran into a problem caused by the seed: when the data were split into training and validation sets, some categories of cylindernumber ended up in the validation data without appearing in the training data. To check this, I first used the default seed, 2023, and then my own seed, 1307; the original seed caused no problem, but mine did. I asked a classmate, who did not have this problem. I then looked for the cause and found the missing-category issue, and I worked around the error by replacing with NA the validation values that were not present in the training data. This lets the case run, but the result for that model is lost (its RMSE comes out as NA), so for the comparison I took the value from the class example case. Against that example, the most accurate model is still the random forest, with an RMSE of 1987.168.
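The workaround described above can be sketched in base R. This is a minimal illustration, not the code used in this case: the data frames `train` and `valid` and all their values are hypothetical toy data, and only the column names are borrowed from the data set. It also shows one possible alternative to losing the model's RMSE, which is to compute it only over the rows that received a prediction.

```r
# Toy data (hypothetical values, not the CarPrice data)
train <- data.frame(
  cylindernumber = c("four", "four", "six", "six", "four", "six"),
  enginesize     = c(90, 110, 180, 200, 100, 190),
  price          = c(6000, 8000, 16000, 19000, 7000, 17500)
)
valid <- data.frame(
  cylindernumber = c("four", "three", "six"),
  enginesize     = c(95, 61, 185),
  price          = c(6500, 5151, 17000)
)

# Categories that occur in validation but never in training
unseen <- setdiff(unique(valid$cylindernumber), unique(train$cylindernumber))
print(unseen)

# Replace the unseen categories with NA so that predict() returns NA
# for those rows instead of stopping with a "new levels" error
valid$cylindernumber[valid$cylindernumber %in% unseen] <- NA

model <- lm(price ~ cylindernumber + enginesize, data = train)
pred  <- predict(model, newdata = valid)

# A single NA prediction makes the plain RMSE NA
rmse_naive <- sqrt(mean((valid$price - pred)^2))

# RMSE restricted to the rows that did get a prediction
ok <- !is.na(pred)
rmse_complete <- sqrt(mean((valid$price[ok] - pred[ok])^2))

print(rmse_naive)
print(rmse_complete)
```

If the RMSE is computed this way, the other models should be evaluated on the same subset of rows; otherwise the comparison mixes errors measured on different observations.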
If the sample were larger, the probability that the seed would split the data in such a way that a categorical value does not appear on both sides would be much lower.
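A rough way to quantify this: a category is missing from the training data only when every one of its rows falls in the validation set. The sketch below computes that probability under the assumption of simple random sampling without replacement, which only approximates the partition actually used in this case. The function name `p_all_in_validation` and the sizes (205 rows, 41 of them held out) are illustrative assumptions, not values taken from the results above.

```r
# Probability that all k rows of a category land in the validation set
# when m of the n rows are drawn for validation by simple random sampling.
# Product form of the hypergeometric probability, which avoids overflow
# in the binomial coefficients for large n.
p_all_in_validation <- function(n, m, k) {
  i <- 0:(k - 1)
  prod((m - i) / (n - i))
}

# Hypothetical sizes: 205 rows with 20% (41 rows) held out for validation
p_once  <- p_all_in_validation(n = 205, m = 41, k = 1)  # category seen once
p_twice <- p_all_in_validation(n = 205, m = 41, k = 2)  # category seen twice
p_five  <- p_all_in_validation(n = 205, m = 41, k = 5)  # category seen five times

# Ten times more rows, but the category still appears only once
p_once_large <- p_all_in_validation(n = 2050, m = 410, k = 1)

print(c(p_once, p_twice, p_five, p_once_large))
```

Under this assumption, a category that appears in a single row is left out of training with probability equal to the validation fraction, whatever the total number of rows. The risk falls quickly only when a larger sample also contains more rows of each rare category.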