Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the dataset take part.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model is used to answer questions such as:
Which variables are above the 90% confidence level as predictors?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tabular output
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For the regression tree
library(rpart.plot) # For plotting the tree
library(randomForest) # For random forest
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables; both numeric and categorical variables are used as predictors in this case.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | Car_ID | Unique id of each observation (Integer) |
| 2 | Symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably pretty safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car (Categorical): std or turbo |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car (Categorical): convertible, sedan, wagon … |
| 8 | drivewheel | Type of drive wheel (Categorical): 4wd, fwd, rwd |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the car (Numeric). Engine efficiency |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the car (Numeric). Compression, a measure of pressure in the engine |
| 22 | horsepower | Horsepower (Numeric). Power of the car |
| 23 | peakrpm | Peak rpm of the car (Numeric). Peak revolutions per minute |
| 24 | citympg | Mileage in city (Numeric). Fuel consumption |
| 25 | highwaympg | Mileage on highway (Numeric). Fuel consumption |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile
Remove the variables that carry no statistical interest, that is, drop columns 1 and 3, car_ID and CarName.
datos <- datos[, c(2,4:26)]
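Selecting columns by position depends on the column order of the CSV file. A minimal sketch of the same step done by name, on a small made-up data frame (`toy`, its values and `columnas_excluidas` are hypothetical and only illustrate the idea):

```r
# Toy data frame standing in for the real data (values are made up)
toy <- data.frame(
  car_ID = 1:3,
  symboling = c(3, 1, 2),
  CarName = c("a", "b", "c"),
  price = c(13495, 16500, 13950)
)

# Columns to exclude, identified by name instead of by position
columnas_excluidas <- c("car_ID", "CarName")

# Keep every column whose name is not in the exclusion list
toy_reducido <- toy[, !(names(toy) %in% columnas_excluidas), drop = FALSE]
print(names(toy_reducido))
```

Applied to `datos`, the same expression would be expected to keep the 24 columns that `datos[, c(2,4:26)]` keeps, provided the column names match those shown by `str(datos)`.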
The first records again:
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data take 80% of the observations and the validation data the remaining 20%.
n <- nrow(datos)
set.seed(1306) # Seed
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 39 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 |
| 41 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 10295 |
| 53 | 1 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 |
| 57 | 3 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 |
| 65 | 0 | gas | std | four | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | ohc | four | 122 | 2bbl | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 11245 |
| 69 | -1 | diesel | turbo | four | wagon | rwd | front | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 28248 |
The multiple linear regression model (rm) is built: price as a function of all the independent variables, both numeric and non-numeric.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
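As an illustration of how the dot expands, the sketch below uses base R's `terms()` on a small made-up data frame (`toy` and its values are hypothetical, not the case data):

```r
# Toy data frame with one response and two predictors (values are made up)
toy <- data.frame(
  price = c(13495, 16500, 13950),
  enginesize = c(130, 152, 109),
  fueltype = factor(c("gas", "gas", "diesel"))
)

# terms() resolves the dot against the columns of `data`,
# leaving out the response variable
expandida <- terms(price ~ ., data = toy)
print(attr(expandida, "term.labels"))
```

The dot therefore stands for every column of the data frame other than the response.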
# Multiple linear regression model, used to see which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4580.9 -1100.1 -30.2 852.7 8617.7
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.141e+04 1.914e+04 -1.641 0.103261
## symboling 3.251e+00 2.824e+02 0.012 0.990835
## fueltypegas -1.232e+04 8.168e+03 -1.508 0.134143
## aspirationturbo 1.946e+03 9.920e+02 1.962 0.052052 .
## doornumbertwo 2.793e+02 6.331e+02 0.441 0.659890
## carbodyhardtop -3.503e+03 1.485e+03 -2.359 0.019889 *
## carbodyhatchback -3.562e+03 1.334e+03 -2.670 0.008604 **
## carbodysedan -2.310e+03 1.447e+03 -1.596 0.112988
## carbodywagon -3.209e+03 1.614e+03 -1.989 0.048927 *
## drivewheelfwd 6.980e+01 1.329e+03 0.053 0.958189
## drivewheelrwd 4.682e+02 1.541e+03 0.304 0.761772
## enginelocationrear 8.030e+03 2.979e+03 2.696 0.008010 **
## wheelbase 3.154e+01 1.148e+02 0.275 0.784023
## carlength -8.723e+01 5.612e+01 -1.554 0.122678
## carwidth 9.375e+02 2.829e+02 3.313 0.001211 **
## carheight -1.117e+00 1.558e+02 -0.007 0.994290
## curbweight 3.665e+00 2.307e+00 1.589 0.114667
## enginetypedohcv -8.153e+03 5.226e+03 -1.560 0.121358
## enginetypel -6.848e+02 1.838e+03 -0.372 0.710169
## enginetypeohc 4.326e+03 1.043e+03 4.149 6.17e-05 ***
## enginetypeohcf 9.251e+02 1.968e+03 0.470 0.639086
## enginetypeohcv -6.976e+03 1.475e+03 -4.728 6.09e-06 ***
## enginetyperotor 1.008e+03 4.767e+03 0.211 0.832947
## cylindernumberfive -1.092e+04 3.031e+03 -3.601 0.000458 ***
## cylindernumberfour -1.077e+04 3.319e+03 -3.246 0.001507 **
## cylindernumbersix -6.449e+03 2.443e+03 -2.640 0.009372 **
## cylindernumberthree 4.231e+01 4.757e+03 0.009 0.992918
## cylindernumbertwelve -1.081e+04 4.902e+03 -2.206 0.029241 *
## cylindernumbertwo NA NA NA NA
## enginesize 1.291e+02 2.905e+01 4.443 1.95e-05 ***
## fuelsystem2bbl 1.694e+02 1.025e+03 0.165 0.868982
## fuelsystem4bbl -1.574e+03 2.967e+03 -0.530 0.596775
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -4.136e+03 2.719e+03 -1.521 0.130814
## fuelsystemmpfi -1.450e+01 1.167e+03 -0.012 0.990109
## fuelsystemspdi -2.894e+03 1.586e+03 -1.825 0.070456 .
## fuelsystemspfi -3.275e+02 2.593e+03 -0.126 0.899721
## boreratio -4.256e+01 1.920e+03 -0.022 0.982353
## stroke -5.102e+03 1.085e+03 -4.702 6.78e-06 ***
## compressionratio -8.133e+02 6.161e+02 -1.320 0.189257
## horsepower 7.932e-01 2.716e+01 0.029 0.976752
## peakrpm 2.443e+00 7.345e-01 3.326 0.001163 **
## citympg -4.223e+01 1.593e+02 -0.265 0.791330
## highwaympg 9.591e+01 1.467e+02 0.654 0.514475
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2171 on 123 degrees of freedom
## Multiple R-squared: 0.9494, Adjusted R-squared: 0.9325
## F-statistic: 56.24 on 41 and 123 DF, p-value: < 2.2e-16
Which variables are above the 90% confidence level as predictors?
The intercept does not reach the 90% confidence level (p ≈ 0.103).
Several coefficients are at or above the 90% confidence level (p < 0.10): aspirationturbo, carbodyhardtop, carbodyhatchback, carbodywagon, enginelocationrear, carwidth, enginetypeohc, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, cylindernumbertwelve, enginesize, fuelsystemspdi, stroke and peakrpm.
Since some predictors do not reach the 90% confidence level, one might want to build a model with only the predictors at or above that level. That is left for future work and is not done in this case.
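Picking out the coefficients below a p-value threshold can be done programmatically. The following is only a sketch on R's built-in `mtcars` data, used here as a stand-in; `modelo_demo` and the 0.10 threshold are assumptions for illustration, not part of the case:

```r
# Stand-in model on a built-in dataset (not the car price data)
modelo_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: estimate, standard error, t value and p-value
coeficientes <- coef(summary(modelo_demo))

# Keep the rows whose p-value is below 0.10 (the 90% level)
significativos <- coeficientes[coeficientes[, "Pr(>|t|)"] < 0.10, , drop = FALSE]
print(rownames(significativos))
```

The same filter could be applied to `summary(modelo_rm)`; note that coefficients reported as NA because of singularities do not appear in that table.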
In multiple linear models, the statistic Adjusted R-squared: 0.9325 means that the independent variables explain approximately 93.25% of the variability of the dependent variable price.
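The adjusted value can be checked from the other figures in the model summary. A short sketch, taking R-squared = 0.9494, 41 model degrees of freedom and 123 residual degrees of freedom from that output (so n = 41 + 123 + 1 = 165), and using the usual adjustment formula:

```r
# Figures read from the summary(modelo_rm) output
r2 <- 0.9494 # Multiple R-squared
p <- 41      # Model degrees of freedom (estimated slopes)
n <- 165     # Training observations: 41 + 123 + 1

# Adjusted R-squared penalises R-squared for the number of predictors
r2_ajustado <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_ajustado, 4))
```

The result agrees with the Adjusted R-squared reported in the summary, up to rounding of the inputs.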
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 3 5 11 14 39 41 53 57
## 8264.586 16068.198 13891.389 20587.000 9233.280 7288.972 5774.748 12285.837
## 65 69 76 78 86 90 94 114
## 10347.583 27197.948 20306.878 6907.492 11008.040 6219.037 5109.150 15530.930
## 116 122 125 127 131 138 140 143
## 11425.499 6348.924 14834.549 33695.690 10046.023 12492.372 5820.163 7075.984
## 152 154 155 156 163 167 172 175
## 6070.678 5960.038 5616.276 8621.581 8013.357 6349.528 13069.459 12737.973
## 185 186 191 193 195 200 203 204
## 8589.882 9392.483 8158.094 11034.849 17622.319 18765.849 18888.422 25689.243
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 1306 and run the tests, the conclusion is that the training data must cover every value of the categorical variables that appears in the validation data; that is, the validation data must not contain values the model was not trained on.
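One way to check that condition before predicting is to compare, factor by factor, the values present in each partition. A minimal sketch in base R; the helper `niveles_no_vistos` and the toy data frames are hypothetical and are not part of the original case:

```r
# Returns, for each factor column, the values found in the validation data
# that never occur in the training data (empty list when everything is covered)
niveles_no_vistos <- function(entrenamiento, validacion) {
  columnas <- names(entrenamiento)[sapply(entrenamiento, is.factor)]
  resultado <- lapply(columnas, function(col) {
    setdiff(unique(as.character(validacion[[col]])),
            unique(as.character(entrenamiento[[col]])))
  })
  names(resultado) <- columnas
  resultado[lengths(resultado) > 0]
}

# Toy partitions (values are made up): "4bbl" appears only in validation
toy_entrenamiento <- data.frame(fuelsystem = factor(c("mpfi", "2bbl", "mpfi")),
                                price = c(1, 2, 3))
toy_validacion <- data.frame(fuelsystem = factor(c("mpfi", "4bbl")),
                             price = c(4, 5))

faltantes <- niveles_no_vistos(toy_entrenamiento, toy_validacion)
print(faltantes)
```

Calling it as `niveles_no_vistos(datos.entrenamiento, datos.validacion)` would be expected to return an empty list when a seed gives a usable split.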
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 8264.586 |
| 5 | 17450 | 16068.198 |
| 11 | 16430 | 13891.389 |
| 14 | 21105 | 20587.000 |
| 39 | 9095 | 9233.280 |
| 41 | 10295 | 7288.972 |
| 53 | 6795 | 5774.748 |
| 57 | 11845 | 12285.837 |
| 65 | 11245 | 10347.583 |
| 69 | 28248 | 27197.948 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2627.374
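For reference, RMSE is the square root of the mean of the squared differences between actual and predicted values. A minimal base R sketch of that definition; the helper name `rmse_manual` is hypothetical, and the vectors are only the first five rows of the comparison table above, so the result is not the full validation RMSE:

```r
# RMSE written out from its definition
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First five actual and predicted prices from the comparison table
real <- c(16500, 17450, 16430, 21105, 9095)
prediccion <- c(8264.586, 16068.198, 13891.389, 20587.000, 9233.280)

print(rmse_manual(real, prediccion))
```

Applied to the full `comparaciones` columns, this should give the same value as `rmse()` from the Metrics package.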
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11445220000 13426.070
## 2) enginesize< 182 149 3002948000 11135.320
## 4) curbweight< 2544 96 484534700 8449.750
## 8) curbweight< 2291.5 58 78729770 7209.621 *
## 9) curbweight>=2291.5 38 180459100 10342.580 *
## 5) curbweight>=2544 53 571917500 15999.740 *
## 3) enginesize>=182 16 379081000 34758.720 *
Pending.
rpart.plot(modelo_ar)
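The printed tree can be read as a set of nested rules on enginesize and curbweight. The sketch below writes those rules out by hand, with split points and leaf values copied from the `modelo_ar` output; the function name `predecir_por_reglas` is hypothetical, and the four example cars are validation rows 3, 39, 11 and 69:

```r
# Decision rules transcribed from the printed tree
predecir_por_reglas <- function(enginesize, curbweight) {
  ifelse(enginesize >= 182, 34758.720,          # node 3
    ifelse(curbweight >= 2544, 15999.740,       # node 5
      ifelse(curbweight >= 2291.5, 10342.580,   # node 9
        7209.621)))                             # node 8
}

# enginesize and curbweight of validation rows 3, 39, 11 and 69
ejemplo <- data.frame(
  enginesize = c(152, 110, 108, 183),
  curbweight = c(2823, 2289, 2395, 3750)
)
ejemplo$prediccion <- predecir_por_reglas(ejemplo$enginesize, ejemplo$curbweight)
print(ejemplo)
```

Because the tree has only four leaves, every prediction is one of four values, which is why the tree's predictions repeat so often.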
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 3 5 11 14 39 41 53 57
## 15999.739 15999.739 10342.579 15999.739 7209.621 10342.579 7209.621 10342.579
## 65 69 76 78 86 90 94 114
## 10342.579 34758.719 15999.739 7209.621 10342.579 7209.621 7209.621 15999.739
## 116 122 125 127 131 138 140 143
## 15999.739 7209.621 15999.739 34758.719 15999.739 15999.739 7209.621 7209.621
## 152 154 155 156 163 167 172 175
## 7209.621 7209.621 7209.621 15999.739 7209.621 10342.579 15999.739 10342.579
## 185 186 191 193 195 200 203 204
## 7209.621 7209.621 7209.621 15999.739 15999.739 15999.739 15999.739 15999.739
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 15999.739 |
| 5 | 17450 | 15999.739 |
| 11 | 16430 | 10342.579 |
| 14 | 21105 | 15999.739 |
| 39 | 9095 | 7209.621 |
| 41 | 10295 | 10342.579 |
| 53 | 6795 | 7209.621 |
| 57 | 11845 | 10342.579 |
| 65 | 11245 | 10342.579 |
| 69 | 28248 | 34758.719 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3086.6
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 5719360
## % Var explained: 91.75
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 32157957.45 4115689614
## curbweight 20226244.27 2030138954
## horsepower 7495306.57 1094402848
## cylindernumber 4035909.92 966031591
## citympg 9064117.64 945011459
## wheelbase 5465720.46 649163086
## carwidth 1886327.81 320465716
## highwaympg 1504290.36 279479618
## carlength 1788421.32 215404663
## fuelsystem 915233.56 204301305
## boreratio 1222285.71 144444666
## peakrpm 1467348.31 119236341
## carbody 452459.76 71647651
## compressionratio 907198.35 56385233
## enginetype 13463.02 53595516
## carheight 360821.35 41240011
## stroke 257166.56 33308953
## fueltype 563705.50 32479852
## drivewheel 292191.37 25127909
## symboling 392003.47 15338886
## doornumber -312113.83 12982346
## aspiration 35519.03 2880439
## enginelocation 0.00 0
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 3 5 11 14 39 41 53 57
## 16396.446 15723.201 14202.965 19411.422 8717.329 9531.782 6084.350 12219.111
## 65 69 76 78 86 90 94 114
## 9655.298 26821.192 19078.894 6394.962 8504.639 6974.060 7823.352 14331.072
## 116 122 125 127 131 138 140 143
## 14049.112 7045.413 14021.855 32443.419 11416.847 16015.829 7689.069 8068.413
## 152 154 155 156 163 167 172 175
## 6634.752 7743.184 7824.865 10026.958 7798.067 10150.562 11980.241 13305.775
## 185 186 191 193 195 200 203 204
## 8672.855 8303.257 9084.548 15362.950 15463.375 16245.008 20104.245 19340.888
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 3 | 16500 | 16396.446 |
| 5 | 17450 | 15723.201 |
| 11 | 16430 | 14202.965 |
| 14 | 21105 | 19411.423 |
| 39 | 9095 | 8717.329 |
| 41 | 10295 | 9531.782 |
| 53 | 6795 | 6084.350 |
| 57 | 11845 | 12219.111 |
| 65 | 11245 | 9655.298 |
| 69 | 28248 | 26821.192 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1538.294
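With the three RMSE values in hand, a small summary table makes the comparison explicit. A sketch using the values reported above; `resumen_rmse` is a hypothetical name, and in the actual session the variables `rmse_rm`, `rmse_ar` and `rmse_rf` could be used instead of the literals:

```r
# RMSE of each model on the validation data, as reported above
resumen_rmse <- data.frame(
  modelo = c("Multiple linear regression", "Regression tree", "Random forest"),
  rmse = c(2627.374, 3086.6, 1538.294)
)

# Sort so that the model with the lowest error comes first
resumen_rmse <- resumen_rmse[order(resumen_rmse$rmse), ]
print(resumen_rmse)
```

With this seed and partition, the random forest has the lowest RMSE, followed by multiple linear regression and then the regression tree.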
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 8264.586 | 15999.739 | 16396.446 |
| 5 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 | 16068.198 | 15999.739 | 15723.201 |
| 11 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16430 | 13891.389 | 10342.579 | 14202.965 |
| 14 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 | 20587.000 | 15999.739 | 19411.423 |
| 39 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 | 9233.280 | 7209.621 | 8717.329 |
| 41 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 10295 | 7288.972 | 10342.579 | 9531.782 |
| 53 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 | 5774.748 | 7209.621 | 6084.350 |
| 57 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 | 12285.837 | 10342.579 | 12219.111 |
| 65 | gas | std | four | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | ohc | four | 122 | 2bbl | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 11245 | 10347.583 | 10342.579 | 9655.298 |
| 69 | diesel | turbo | four | wagon | rwd | front | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 28248 | 27197.948 | 34758.719 | 26821.192 |
| 76 | gas | turbo | two | hatchback | rwd | front | 102.7 | 178.4 | 68.0 | 54.8 | 2910 | ohc | four | 140 | mpfi | 3.78 | 3.120 | 8.0 | 175 | 5000 | 19 | 24 | 16503 | 20306.878 | 15999.739 | 19078.894 |
| 78 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | ohc | four | 92 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6189 | 6907.492 | 7209.621 | 6394.962 |
| 86 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | ohc | four | 122 | 2bbl | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 6989 | 11008.040 | 10342.579 | 8504.639 |
| 90 | gas | std | two | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1889 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 5499 | 6219.037 | 7209.621 | 6974.060 |
| 94 | gas | std | four | wagon | fwd | front | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7349 | 5109.150 | 7209.621 | 7823.352 |
| 114 | gas | std | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 56.7 | 3285 | l | four | 120 | mpfi | 3.46 | 2.190 | 8.4 | 95 | 5000 | 19 | 24 | 16695 | 15530.930 | 15999.739 | 14331.072 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 11425.499 | 15999.739 | 14049.112 |
| 122 | gas | std | four | sedan | fwd | front | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6692 | 6348.924 | 7209.621 | 7045.413 |
| 125 | gas | turbo | two | hatchback | rwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2818 | ohc | four | 156 | spdi | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 12764 | 14834.549 | 15999.739 | 14021.855 |
| 127 | gas | std | two | hardtop | rwd | rear | 89.5 | 168.9 | 65.0 | 51.6 | 2756 | ohcf | six | 194 | mpfi | 3.74 | 2.900 | 9.5 | 207 | 5900 | 17 | 25 | 32528 | 33695.690 | 34758.719 | 32443.419 |
| 131 | gas | std | four | wagon | fwd | front | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | ohc | four | 132 | mpfi | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9295 | 10046.023 | 15999.739 | 11416.847 |
| 138 | gas | turbo | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | dohc | four | 121 | mpfi | 3.54 | 3.070 | 9.0 | 160 | 5500 | 19 | 26 | 18620 | 12492.372 | 15999.739 | 16015.829 |
| 140 | gas | std | two | hatchback | fwd | front | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7053 | 5820.163 | 7209.621 | 7689.069 |
| 143 | gas | std | four | sedan | fwd | front | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.5 | 82 | 4400 | 28 | 33 | 7775 | 7075.984 | 7209.621 | 8068.413 |
| 152 | gas | std | two | hatchback | fwd | front | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 38 | 6338 | 6070.678 | 7209.621 | 6634.752 |
| 154 | gas | std | four | wagon | fwd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 37 | 6918 | 5960.038 | 7209.621 | 7743.184 |
| 155 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 5616.276 | 7209.621 | 7824.865 |
| 156 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 3110 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 8778 | 8621.581 | 15999.739 | 10026.958 |
| 163 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 28 | 34 | 9258 | 8013.357 | 7209.621 | 7798.067 |
| 167 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9538 | 6349.528 | 10342.579 | 10150.562 |
| 172 | gas | std | two | hatchback | rwd | front | 98.4 | 176.2 | 65.6 | 52.0 | 2714 | ohc | four | 146 | mpfi | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 11549 | 13069.459 | 15999.739 | 11980.241 |
| 175 | diesel | turbo | four | sedan | fwd | front | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | ohc | four | 110 | idi | 3.27 | 3.350 | 22.5 | 73 | 4500 | 30 | 33 | 10698 | 12737.973 | 10342.579 | 13305.775 |
| 185 | diesel | std | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2264 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 52 | 4800 | 37 | 46 | 7995 | 8589.882 | 7209.621 | 8672.855 |
| 186 | gas | std | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2212 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 9.0 | 85 | 5250 | 27 | 34 | 8195 | 9392.483 | 7209.621 | 8303.257 |
| 191 | gas | std | two | hatchback | fwd | front | 94.5 | 165.7 | 64.0 | 51.4 | 2221 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 8.5 | 90 | 5500 | 24 | 29 | 9980 | 8158.094 | 7209.621 | 9084.548 |
| 193 | diesel | turbo | four | sedan | fwd | front | 100.4 | 180.2 | 66.9 | 55.1 | 2579 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 68 | 4500 | 33 | 38 | 13845 | 11034.849 | 15999.739 | 15362.950 |
| 195 | gas | std | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 12940 | 17622.319 | 15999.739 | 15463.375 |
| 200 | gas | turbo | four | wagon | rwd | front | 104.3 | 188.8 | 67.2 | 57.5 | 3157 | ohc | four | 130 | mpfi | 3.62 | 3.150 | 7.5 | 162 | 5100 | 17 | 22 | 18950 | 18765.849 | 15999.739 | 16245.008 |
| 203 | gas | std | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3012 | ohcv | six | 173 | mpfi | 3.58 | 2.870 | 8.8 | 134 | 5500 | 18 | 23 | 21485 | 18888.422 | 15999.739 | 20104.245 |
| 204 | diesel | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | ohc | six | 145 | idi | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470 | 25689.243 | 15999.739 | 19340.888 |
The RMSE of the three models is compared (rm = multiple regression, ar = regression tree, rf = random forest).
# Collect the RMSE of each model in a single row for side-by-side comparison
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2627.374 | 3086.6 | 1538.294 |
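
As a minimal sketch of how the comparison above can be read, the snippet below defines RMSE by hand and picks the model with the smallest value. The names `rmse_manual`, `rmse_values`, `best_model` and `rmse_ratio` are hypothetical and do not appear in the original analysis (which uses the `rmse()` function from the Metrics package); the three numbers are copied from the table above.

```r
# Hand-written RMSE (illustrative helper, not part of the original analysis):
# square root of the mean squared difference between observed and predicted values.
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# RMSE values copied from the table above
# (rm = multiple regression, ar = regression tree, rf = random forest).
rmse_values <- c(rm = 2627.374, ar = 3086.6, rf = 1538.294)

# Name of the model with the smallest RMSE.
best_model <- names(which.min(rmse_values))

# Each model's RMSE relative to the best one (a ratio of 1 marks the best model).
rmse_ratio <- rmse_values / min(rmse_values)

print(best_model)
print(round(rmse_ratio, 2))
```

Because RMSE is expressed in the same units as the price, a lower value means the predictions fall, on average, closer to the observed prices in the validation data.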
Car price data were loaded, and all variables in the data set, both numeric and categorical, were used.
The multiple linear regression model highlights some statistically significant variables.
According to the root mean squared error (RMSE), the best model was the random forest (1538.294, versus 2627.374 for multiple regression and 3086.6 for the regression tree), for this particular set of training and validation data and an 80%/20% training/validation split.
Based on the most recent exercises, we can conclude that the random forest model tends to yield lower RMSE values than the other two models.