Compare supervised models by applying prediction algorithms to automobile prices, using the root mean squared error (RMSE) statistic as the comparison measure.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the dataset take part, except the identifier columns car_ID and CarName.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model is used to answer questions such as:
Which variables are significant predictors at the 90% confidence level or higher?
What is the value of Adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For nicely formatted tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables, both numeric and categorical.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Datos de precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably fairly safe (Categorical) |
| 3 | CarName | Name of the car, including the company (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car (Categorical): std or turbo |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car (Categorical): convertible, sedan, wagon … |
| 8 | drivewheel | Type of drive wheel (Categorical): 4wd, fwd, rwd |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the engine (Categorical) |
| 17 | enginesize | Size of the engine (Numeric), in … |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric) |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower (Numeric). Power of the car |
| 23 | peakrpm | Peak revolutions per minute (Numeric) |
| 24 | citympg | Mileage in city, in miles per gallon (Numeric) |
| 25 | highwaympg | Mileage on highway, in miles per gallon (Numeric) |
| 26 | price (dependent variable) | Price of the car (Numeric), in dollars |
Variables with no statistical interest are removed, namely columns 1 and 3, car_ID and CarName.
datos <- datos[, c(2,4:26)]
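Selecting by position works, but it silently breaks if the column order ever changes. A minimal sketch of the same removal done by name, on a toy data frame (`ejemplo`, `quitar` and the values are made up for illustration and are not part of the case):

```r
# Toy data frame standing in for `datos` (hypothetical values)
ejemplo <- data.frame(
  car_ID    = 1:3,
  symboling = c(3, 1, 2),
  CarName   = c("a", "b", "c"),
  price     = c(13495, 16500, 13950)
)

# Drop the identifier columns by name instead of by position
quitar <- c("car_ID", "CarName")
ejemplo_limpio <- ejemplo[, !(names(ejemplo) %in% quitar), drop = FALSE]

names(ejemplo_limpio)
```

Applied to the real data, the equivalent expression would be `datos[, !(names(datos) %in% c("car_ID", "CarName"))]`.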
The first records again:
kable(head(datos, 10), caption = "Datos de precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Training data take 80% of the observations and validation data the remaining 20%.
n <- nrow(datos)
set.seed(1749) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
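For a numeric outcome, `createDataPartition()` samples within groups of `price`, so both subsets keep a similar price distribution. As a rough sketch of the basic idea only (a plain, unstratified 80/20 split in base R, not caret's algorithm), using the built-in `mtcars` data as a stand-in; every name ending in `_ej` is hypothetical:

```r
set.seed(1749)                      # same seed as in the case, for reproducibility
n_ej <- nrow(mtcars)                # mtcars is only a stand-in dataset
idx  <- sample(seq_len(n_ej), size = floor(0.80 * n_ej))

entrenamiento_ej <- mtcars[idx, ]   # about 80% of the rows
validacion_ej    <- mtcars[-idx, ]  # the remaining rows

c(entrenamiento = nrow(entrenamiento_ej), validacion = nrow(validacion_ej))
```

In the case itself, the 205 observations end up as 165 training rows and 40 validation rows, as the model outputs further on show.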
kable(head(datos.entrenamiento, 10), caption = "Datos de Entrenamiento. Precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
kable(head(datos.validacion, 10), caption = "Datos de Validación. Precios de carros") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 17 | 0 | gas | std | two | sedan | rwd | front | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | ohc | six | 209 | mpfi | 3.62 | 3.390 | 8.0 | 182 | 5400 | 16 | 22 | 41315 |
| 23 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6377 |
| 24 | 1 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 7957 |
| 37 | 0 | gas | std | four | wagon | fwd | front | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | ohc | four | 92 | 1bbl | 2.92 | 3.410 | 9.2 | 76 | 6000 | 30 | 34 | 7295 |
| 39 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 |
| 41 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 10295 |
| 54 | 1 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6695 |
| 57 | 3 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 11845 |
The multiple linear regression model (rm) is built: price as a function of all the independent variables, both numeric and non-numeric.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
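How the dot expands can be checked directly with `terms()`, which resolves it against the columns of a data frame. A small sketch on the built-in `mtcars` data (a stand-in, with `mpg` playing the role of `price`; `terminos` is a hypothetical name):

```r
# Expand the dot against a data frame: every column except the response
terminos <- attr(terms(mpg ~ ., data = mtcars), "term.labels")
terminos
```

The same call with `price ~ .` and `data = datos.entrenamiento` would list the 23 predictors written out above.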
# Multiple linear regression model, used to see which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5759.7 -957.1 -50.5 896.0 4781.0
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.009e+04 1.692e+04 -1.778 0.077909 .
## symboling 5.424e+02 2.527e+02 2.146 0.033804 *
## fueltypegas -1.244e+04 7.306e+03 -1.702 0.091201 .
## aspirationturbo 1.408e+03 9.935e+02 1.417 0.159045
## doornumbertwo -4.877e+02 5.688e+02 -0.858 0.392833
## carbodyhardtop -3.363e+03 1.284e+03 -2.620 0.009911 **
## carbodyhatchback -3.226e+03 1.102e+03 -2.927 0.004076 **
## carbodysedan -2.404e+03 1.197e+03 -2.008 0.046820 *
## carbodywagon -3.171e+03 1.318e+03 -2.405 0.017650 *
## drivewheelfwd -1.257e+02 1.151e+03 -0.109 0.913189
## drivewheelrwd 1.433e+03 1.256e+03 1.141 0.256188
## enginelocationrear 1.009e+04 2.495e+03 4.046 9.13e-05 ***
## wheelbase 1.533e+02 9.504e+01 1.613 0.109324
## carlength -7.718e+01 4.900e+01 -1.575 0.117834
## carwidth 7.858e+02 2.406e+02 3.265 0.001416 **
## carheight 6.725e+01 1.251e+02 0.537 0.591989
## curbweight 3.408e+00 1.679e+00 2.029 0.044599 *
## enginetypedohcv -4.955e+03 4.445e+03 -1.115 0.267231
## enginetypel -3.017e+03 1.597e+03 -1.889 0.061264 .
## enginetypeohc 2.050e+03 9.093e+02 2.254 0.025949 *
## enginetypeohcf -1.057e+02 1.583e+03 -0.067 0.946850
## enginetypeohcv -5.698e+03 1.250e+03 -4.559 1.22e-05 ***
## enginetyperotor -1.699e+03 4.254e+03 -0.399 0.690306
## cylindernumberfive -8.143e+03 2.517e+03 -3.236 0.001558 **
## cylindernumberfour -9.294e+03 2.858e+03 -3.252 0.001477 **
## cylindernumbersix -5.296e+03 2.025e+03 -2.615 0.010041 *
## cylindernumberthree -1.676e+02 4.199e+03 -0.040 0.968226
## cylindernumbertwelve -4.436e+03 4.182e+03 -1.061 0.290871
## cylindernumbertwo NA NA NA NA
## enginesize 9.145e+01 2.630e+01 3.477 0.000701 ***
## fuelsystem2bbl 1.544e+02 8.919e+02 0.173 0.862823
## fuelsystem4bbl -1.720e+03 2.585e+03 -0.665 0.506993
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.006e+03 2.357e+03 -1.275 0.204679
## fuelsystemmpfi -3.369e+01 1.047e+03 -0.032 0.974383
## fuelsystemspdi -2.422e+03 1.352e+03 -1.791 0.075703 .
## fuelsystemspfi -8.224e+02 2.262e+03 -0.364 0.716839
## boreratio -1.402e+03 1.476e+03 -0.950 0.344098
## stroke -3.859e+03 8.551e+02 -4.513 1.47e-05 ***
## compressionratio -8.640e+02 5.372e+02 -1.608 0.110320
## horsepower 1.135e+01 2.316e+01 0.490 0.624858
## peakrpm 1.769e+00 6.108e-01 2.896 0.004469 **
## citympg 3.749e+01 1.547e+02 0.242 0.808881
## highwaympg 3.476e+01 1.382e+02 0.252 0.801746
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1908 on 123 degrees of freedom
## Multiple R-squared: 0.9564, Adjusted R-squared: 0.9419
## F-statistic: 65.87 on 41 and 123 DF, p-value: < 2.2e-16
Which variables are significant predictors at the 90% confidence level or higher?
The intercept is significant at the 90% confidence level (p ≈ 0.078).
The following coefficients are significant at the 90% level or higher (p < 0.10): symboling, fueltypegas, the four carbody levels, enginelocationrear, carwidth, curbweight, enginetypel, enginetypeohc, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, enginesize, fuelsystemspdi, stroke and peakrpm.
Because some predictors do not reach the 90% confidence level, one could build a model with only the predictors that do. That is left for future work and is not done in this case.
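A confidence level of 90% corresponds to a p-value below 0.10 in the `Pr(>|t|)` column. Rather than reading the summary by eye, the coefficients that pass the threshold can be pulled out programmatically. A minimal sketch on a stand-in model fitted to `mtcars` (the names `ajuste_ej`, `tabla`, `p_valores` and `significativos` are hypothetical):

```r
ajuste_ej <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model, not the case's model

tabla     <- coef(summary(ajuste_ej))           # matrix with Estimate, Std. Error, t value, Pr(>|t|)
p_valores <- tabla[, "Pr(>|t|)"]

# Coefficients significant at the 90% confidence level
significativos <- names(p_valores)[p_valores < 0.10]
significativos
```

For the case's model the same three lines would apply with `modelo_rm` in place of `ajuste_ej`.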
In multiple linear models, the statistic Adjusted R-squared: 0.9419 means that the independent variables explain approximately 94.19% of the variation in the dependent variable price, after adjusting for the number of predictors.
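The Adjusted R-squared printed by `summary()` can be reproduced from the other figures in the same output. A short sketch, taking the Multiple R-squared (0.9564), the 41 estimated slopes and the 123 residual degrees of freedom reported above, which imply n = 165 training rows (`r2_ajustado` is a hypothetical helper name):

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
r2_ajustado <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values taken from the summary output: F-statistic on 41 and 123 DF
r2_ajustado(r2 = 0.9564, n = 165, p = 41)   # approximately 0.9419
```

The small gap between 0.9564 and 0.9419 is the penalty for the number of estimated coefficients.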
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 5 14 17 23 24 37 39 41
## 17493.116 20008.203 29155.811 5717.639 9226.660 7625.667 8321.172 7236.683
## 54 57 65 67 75 83 84 87
## 6751.161 12286.481 10129.783 14225.851 38387.278 13805.130 14090.990 10567.360
## 91 107 110 111 116 119 122 132
## 8048.098 17477.925 12297.181 16020.718 12377.384 6190.020 6641.048 9276.940
## 133 138 140 142 145 148 149 151
## 12366.755 14269.613 6224.384 7868.965 8617.785 8662.135 7973.066 6445.859
## 161 168 178 179 180 187 195 205
## 8360.073 12533.651 8639.560 22639.587 22738.399 10427.936 15978.736 20210.274
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 1749 and run the tests, the conclusion is that the training data must cover every value of the categorical variables that appears in the validation data; that is, the validation data must not contain values the model was never trained on.
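This condition can be checked before predicting. The sketch below is one possible way to list, for each categorical column, the values present in the validation data but absent from the training data; the function `niveles_no_vistos` and the toy data frames `ent` and `val` are hypothetical and only illustrate the idea:

```r
niveles_no_vistos <- function(entrenamiento, validacion) {
  # Categorical columns (factor or character) present in both data frames
  es_categorica <- sapply(entrenamiento, function(x) is.factor(x) || is.character(x))
  columnas <- intersect(names(entrenamiento)[es_categorica], names(validacion))

  # For each column, values observed in validation but never in training
  faltantes <- lapply(columnas, function(col) {
    setdiff(unique(as.character(validacion[[col]])),
            unique(as.character(entrenamiento[[col]])))
  })
  names(faltantes) <- columnas

  # Keep only the columns that actually have unseen values
  faltantes[lengths(faltantes) > 0]
}

# Toy example: "4bbl" appears only in the validation rows
ent <- data.frame(fuelsystem = factor(c("mpfi", "2bbl", "mpfi")), price = c(1, 2, 3))
val <- data.frame(fuelsystem = factor(c("mpfi", "4bbl")), price = c(4, 5))
niveles_no_vistos(ent, val)
```

An empty list from `niveles_no_vistos(datos.entrenamiento, datos.validacion)` would indicate that the split satisfies the condition described above.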
kable(head(comparaciones, 10), caption = "Regresión Lineal Múltiple. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 17493.116 |
| 14 | 21105 | 20008.203 |
| 17 | 41315 | 29155.811 |
| 23 | 6377 | 5717.639 |
| 24 | 7957 | 9226.660 |
| 37 | 7295 | 7625.667 |
| 39 | 9095 | 8321.172 |
| 41 | 10295 | 7236.683 |
| 54 | 6695 | 6751.161 |
| 57 | 11845 | 12286.481 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3206.024
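The RMSE is the square root of the mean of the squared differences between real and predicted values. A base R sketch of that calculation (`rmse_manual` is a hypothetical name), applied here to only the first three validation rows from the table above, so its result is not expected to match the value computed over all validation rows:

```r
# Root mean squared error: sqrt(mean((real - predicted)^2))
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First three validation rows (5, 14 and 17) from the comparison table
rmse_manual(c(17450, 21105, 41315), c(17493.116, 20008.203, 29155.811))
```

Because the errors are squared before averaging, a single large miss such as row 17 weighs heavily on the statistic.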
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 10279300000 13330.560
## 2) enginesize< 182 149 3100016000 11203.630
## 4) highwaympg>=28.5 102 718146000 8778.186
## 8) carlength< 175.5 79 166617400 7785.291 *
## 9) carlength>=175.5 23 206141500 12188.570 *
## 5) highwaympg< 28.5 47 479602100 16467.370
## 10) carwidth< 66.7 23 119400800 14773.610 *
## 11) carwidth>=66.7 24 230985300 18090.550 *
## 3) enginesize>=182 16 228172900 33137.530 *
Pending.
rpart.plot(modelo_ar)
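A fitted `rpart` object stores a variable importance measure in its `variable.importance` component, which is one way to identify the importance of the variables for a tree. A minimal sketch on a stand-in tree fitted to `mtcars` (`arbol_ej` and `importancia` are hypothetical names):

```r
library(rpart)

arbol_ej <- rpart(mpg ~ ., data = mtcars)        # stand-in tree on built-in data
importancia <- arbol_ej$variable.importance      # named numeric vector

# Express each variable's importance as a percentage of the total
round(100 * importancia / sum(importancia), 1)
```

For this case the analogous expression would be `modelo_ar$variable.importance`.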
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 5 14 17 23 24 37 39 41
## 14773.609 14773.609 33137.531 7785.291 7785.291 7785.291 7785.291 7785.291
## 54 57 65 67 75 83 84 87
## 7785.291 14773.609 12188.565 7785.291 33137.531 14773.609 14773.609 7785.291
## 91 107 110 111 116 119 122 132
## 7785.291 18090.549 18090.549 18090.549 18090.549 7785.291 7785.291 12188.565
## 133 138 140 142 145 148 149 151
## 14773.609 14773.609 7785.291 7785.291 14773.609 7785.291 7785.291 7785.291
## 161 168 178 179 180 187 195 205
## 7785.291 12188.565 12188.565 18090.549 18090.549 7785.291 18090.549 18090.549
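Each prediction is simply the mean price of the terminal node a car falls into, which is why only five distinct values appear. The printed tree can be followed by hand; the sketch below hard-codes its thresholds and leaf values (copied from the `modelo_ar` output above) into a hypothetical function, `predecir_arbol`, purely to illustrate how the rules are read:

```r
# Walk the printed tree: thresholds and leaf means copied from the modelo_ar output
predecir_arbol <- function(enginesize, highwaympg, carlength, carwidth) {
  if (enginesize >= 182) return(33137.53)          # node 3
  if (highwaympg >= 28.5) {
    if (carlength < 175.5) 7785.291 else 12188.57  # nodes 8 and 9
  } else {
    if (carwidth < 66.7) 14773.61 else 18090.55    # nodes 10 and 11
  }
}

# Validation row 5: enginesize 136, highwaympg 22, carlength 176.6, carwidth 66.4
predecir_arbol(136, 22, 176.6, 66.4)
```

Row 5 goes left at the root (136 < 182), right at the second split (22 < 28.5) and lands in node 10 (66.4 < 66.7), matching the 14773.609 shown for that row.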
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Arbol de regresión. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 14773.609 |
| 14 | 21105 | 14773.609 |
| 17 | 41315 | 33137.531 |
| 23 | 6377 | 7785.291 |
| 24 | 7957 | 7785.291 |
| 37 | 7295 | 7785.291 |
| 39 | 9095 | 7785.291 |
| 41 | 10295 | 7785.291 |
| 54 | 6695 | 7785.291 |
| 57 | 11845 | 14773.609 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3878.034
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 4759508
## % Var explained: 92.36
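Two figures in this output can be traced back to earlier numbers. For regression, `randomForest()` by default tries one third of the predictors at each split, and the percentage of variance explained compares the mean of squared residuals with the variance of the training prices. A sketch of both calculations, assuming the root-node deviance printed by the regression tree (10279300000) is the total sum of squares of `price` over the same 165 training rows; the variable names are hypothetical:

```r
p_pred <- 23                                  # predictors passed in x
mtry_defecto <- max(floor(p_pred / 3), 1)     # default mtry for regression
mtry_defecto                                  # 7, as printed above

n_ent    <- 165                               # training rows
sc_total <- 10279300000                       # root deviance from the tree output: sum((y - mean(y))^2)
mse_rf   <- 4759508                           # "Mean of squared residuals"

# Share of the variance of price accounted for by the forest
pct_var <- 100 * (1 - mse_rf / (sc_total / n_ent))
round(pct_var, 2)                             # 92.36
```

Both results agree with the printed summary, which suggests the assumption about the total sum of squares holds for this split.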
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## horsepower 16248977.861 2331719872
## enginesize 19216484.402 2165997366
## citympg 7057973.597 1172174777
## cylindernumber 5247963.245 952783757
## curbweight 6999681.450 913286460
## carwidth 2961664.487 859653958
## drivewheel 1788634.514 549160511
## highwaympg 3881978.631 522740868
## carlength 1601046.742 180932114
## peakrpm 2056446.154 172913329
## compressionratio 652303.589 117462773
## stroke -42304.185 108292283
## carbody 165759.685 103839804
## wheelbase 2054947.608 88456893
## boreratio 338091.265 81822668
## fuelsystem 311290.820 77864349
## carheight 354563.238 29861397
## enginetype 13546.483 27799582
## symboling -82428.667 14880229
## fueltype 3462.492 7399621
## aspiration 44428.198 4602065
## doornumber -13339.751 2509531
## enginelocation -4724.695 1017369
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 5 14 17 23 24 37 39 41
## 16077.642 19739.870 31855.966 5964.947 8259.646 7416.402 8172.897 9243.565
## 54 57 65 67 75 83 84 87
## 7172.313 11725.802 9592.528 11375.027 36802.740 14627.563 14764.188 8634.197
## 91 107 110 111 116 119 122 132
## 7056.385 17474.757 15554.846 16479.464 14550.333 6013.414 7201.887 10890.448
## 133 138 140 142 145 148 149 151
## 14163.763 16680.814 7458.832 7953.297 9496.494 10657.578 8958.413 6235.311
## 161 168 178 179 180 187 195 205
## 7215.416 10287.230 10196.848 18485.101 17894.601 8323.903 15571.014 17656.888
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random Forest. Comparación precios reales VS predicción de precios. 10 primeras predicciones") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 16077.642 |
| 14 | 21105 | 19739.870 |
| 17 | 41315 | 31855.966 |
| 23 | 6377 | 5964.947 |
| 24 | 7957 | 8259.646 |
| 37 | 7295 | 7416.402 |
| 39 | 9095 | 8172.897 |
| 41 | 10295 | 9243.565 |
| 54 | 6695 | 7172.313 |
| 57 | 11845 | 11725.802 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2756.469
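With the three RMSE values available, the comparison can be laid out in a single table. A small sketch that hard-codes the values reported above so it runs on its own (in the case itself, `rmse_rm`, `rmse_ar` and `rmse_rf` would be used directly; `resumen_rmse` is a hypothetical name):

```r
resumen_rmse <- data.frame(
  modelo = c("multiple regression", "regression tree", "random forest"),
  rmse   = c(3206.024, 3878.034, 2756.469),   # values reported above
  stringsAsFactors = FALSE
)

# Sort from lowest to highest error
resumen_rmse <- resumen_rmse[order(resumen_rmse$rmse), ]
resumen_rmse
```

For this seed and split, the random forest has the lowest RMSE, followed by the multiple regression and then the regression tree.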
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Predicciones de los modelos") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.00 | 115 | 5500 | 18 | 22 | 17450 | 17493.116 | 14773.609 | 16077.642 |
| 14 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.00 | 121 | 4250 | 21 | 28 | 21105 | 20008.203 | 14773.609 | 19739.870 |
| 17 | gas | std | two | sedan | rwd | front | 103.5 | 193.8 | 67.9 | 53.7 | 3380 | ohc | six | 209 | mpfi | 3.62 | 3.390 | 8.00 | 182 | 5400 | 16 | 22 | 41315 | 29155.811 | 33137.531 | 31855.966 |
| 23 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6377 | 5717.639 | 7785.291 | 5964.947 |
| 24 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.60 | 102 | 5500 | 24 | 30 | 7957 | 9226.660 | 7785.291 | 8259.646 |
| 37 | gas | std | four | wagon | fwd | front | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | ohc | four | 92 | 1bbl | 2.92 | 3.410 | 9.20 | 76 | 6000 | 30 | 34 | 7295 | 7625.667 | 7785.291 | 7416.402 |
| 39 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.00 | 86 | 5800 | 27 | 33 | 9095 | 8321.172 | 7785.291 | 8172.897 |
| 41 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 62.5 | 54.1 | 2372 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.00 | 86 | 5800 | 27 | 33 | 10295 | 7236.683 | 7785.291 | 9243.565 |
| 54 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.00 | 68 | 5000 | 31 | 38 | 6695 | 6751.161 | 7785.291 | 7172.313 |
| 57 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.40 | 101 | 6000 | 17 | 23 | 11845 | 12286.481 | 14773.609 | 11725.802 |
| 65 | gas | std | four | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 55.5 | 2425 | ohc | four | 122 | 2bbl | 3.39 | 3.390 | 8.60 | 84 | 4800 | 26 | 32 | 11245 | 10129.783 | 12188.565 | 9592.528 |
| 67 | diesel | std | four | sedan | rwd | front | 104.9 | 175.0 | 66.1 | 54.4 | 2700 | ohc | four | 134 | idi | 3.43 | 3.640 | 22.00 | 72 | 4200 | 31 | 39 | 18344 | 14225.851 | 7785.291 | 11375.027 |
| 75 | gas | std | two | hardtop | rwd | front | 112.0 | 199.2 | 72.0 | 55.4 | 3715 | ohcv | eight | 304 | mpfi | 3.80 | 3.350 | 8.00 | 184 | 4500 | 14 | 16 | 45400 | 38387.278 | 33137.531 | 36802.740 |
| 83 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2833 | ohc | four | 156 | spdi | 3.58 | 3.860 | 7.00 | 145 | 5000 | 19 | 24 | 12629 | 13805.130 | 14773.609 | 14627.563 |
| 84 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | ohc | four | 156 | spdi | 3.59 | 3.860 | 7.00 | 145 | 5000 | 19 | 24 | 14869 | 14090.990 | 14773.609 | 14764.188 |
| 87 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2405 | ohc | four | 122 | 2bbl | 3.35 | 3.460 | 8.50 | 88 | 5000 | 25 | 32 | 8189 | 10567.360 | 7785.291 | 8634.197 |
| 91 | diesel | std | two | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 2017 | ohc | four | 103 | idi | 2.99 | 3.470 | 21.90 | 55 | 4800 | 45 | 50 | 7099 | 8048.098 | 7785.291 | 7056.385 |
| 107 | gas | std | two | hatchback | rwd | front | 99.2 | 178.5 | 67.9 | 49.7 | 3139 | ohcv | six | 181 | mpfi | 3.43 | 3.270 | 9.00 | 160 | 5200 | 19 | 25 | 18399 | 17477.925 | 18090.549 | 17474.757 |
| 110 | gas | std | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3230 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.40 | 97 | 5000 | 19 | 24 | 12440 | 12297.181 | 18090.549 | 15554.846 |
| 111 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3430 | l | four | 152 | idi | 3.70 | 3.520 | 21.00 | 95 | 4150 | 25 | 25 | 13860 | 16020.718 | 18090.549 | 16479.464 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.40 | 97 | 5000 | 19 | 24 | 16630 | 12377.384 | 18090.549 | 14550.333 |
| 119 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1918 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.40 | 68 | 5500 | 37 | 41 | 5572 | 6190.020 | 7785.291 | 6013.414 |
| 122 | gas | std | four | sedan | fwd | front | 93.7 | 167.3 | 63.8 | 50.8 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.40 | 68 | 5500 | 31 | 38 | 6692 | 6641.048 | 7785.291 | 7201.887 |
| 132 | gas | std | two | hatchback | fwd | front | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | ohc | four | 132 | mpfi | 3.46 | 3.900 | 8.70 | 90 | 5100 | 23 | 31 | 9895 | 9276.940 | 12188.565 | 10890.448 |
| 133 | gas | std | two | hatchback | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2658 | ohc | four | 121 | mpfi | 3.54 | 3.070 | 9.31 | 110 | 5250 | 21 | 28 | 11850 | 12366.755 | 14773.609 | 14163.763 |
| 138 | gas | turbo | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | dohc | four | 121 | mpfi | 3.54 | 3.070 | 9.00 | 160 | 5500 | 19 | 26 | 18620 | 14269.613 | 14773.609 | 16680.814 |
| 140 | gas | std | two | hatchback | fwd | front | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 8.70 | 73 | 4400 | 26 | 31 | 7053 | 6224.384 | 7785.291 | 7458.832 |
| 142 | gas | std | four | sedan | fwd | front | 97.2 | 172.0 | 65.4 | 52.5 | 2145 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.50 | 82 | 4800 | 32 | 37 | 7126 | 7868.965 | 7785.291 | 7953.297 |
| 145 | gas | std | four | sedan | 4wd | front | 97.0 | 172.0 | 65.4 | 54.3 | 2385 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.00 | 82 | 4800 | 24 | 25 | 9233 | 8617.785 | 14773.609 | 9496.494 |
| 148 | gas | std | four | wagon | fwd | front | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | ohcf | four | 108 | mpfi | 3.62 | 2.640 | 9.00 | 94 | 5200 | 25 | 31 | 10198 | 8662.135 | 7785.291 | 10657.578 |
| 149 | gas | std | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.00 | 82 | 4800 | 23 | 29 | 8013 | 7973.066 | 7785.291 | 8958.413 |
| 151 | gas | std | two | hatchback | fwd | front | 95.7 | 158.7 | 63.6 | 54.5 | 1985 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.00 | 62 | 4800 | 35 | 39 | 5348 | 6445.859 | 7785.291 | 6235.311 |
| 161 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2094 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.00 | 70 | 4800 | 38 | 47 | 7738 | 8360.073 | 7785.291 | 7215.416 |
| 168 | gas | std | two | hardtop | rwd | front | 98.4 | 176.2 | 65.6 | 52.0 | 2540 | ohc | four | 146 | mpfi | 3.62 | 3.500 | 9.30 | 116 | 4800 | 24 | 30 | 8449 | 12533.651 | 12188.565 | 10287.230 |
| 178 | gas | std | four | hatchback | fwd | front | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | ohc | four | 122 | mpfi | 3.31 | 3.540 | 8.70 | 92 | 4200 | 27 | 32 | 11248 | 8639.560 | 12188.565 | 10196.848 |
| 179 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 2976 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.30 | 161 | 5200 | 20 | 24 | 16558 | 22639.587 | 18090.549 | 18485.101 |
| 180 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 3016 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.30 | 161 | 5200 | 19 | 24 | 15998 | 22738.399 | 18090.549 | 17894.601 |
| 187 | gas | std | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2275 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 9.00 | 85 | 5250 | 27 | 34 | 8495 | 10427.936 | 7785.291 | 8323.903 |
| 195 | gas | std | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.50 | 114 | 5400 | 23 | 28 | 12940 | 15978.736 | 18090.549 | 15571.014 |
| 205 | gas | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.50 | 114 | 5400 | 19 | 25 | 22625 | 20210.274 | 18090.549 | 17656.888 |
The RMSE of the three models is compared.
# Collect each model's RMSE: rm = multiple regression, ar = regression tree, rf = random forest
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3206.024 | 3878.034 | 2756.469 |
Automobile price data were loaded, using all of the variables, both numeric and categorical.
The multiple linear regression model highlights several statistically significant variables.
By root mean squared error (RMSE), the best model was the random forest (2756.469), ahead of multiple linear regression (3206.024) and the regression tree (3878.034). This ranking holds for this particular training and validation partition, with 80% of the data used for training and 20% for validation.
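
The RMSE values compared above can be recomputed by hand as the square root of the mean squared difference between actual and predicted prices. The sketch below is illustrative only. `rmse_manual` is a hypothetical helper, not part of the original case, and it assumes that the last three columns of the prediction table hold the multiple regression, regression tree and random forest predictions, in that order. Only the first three validation rows (37, 39, 41) are used, so the results will not match the full-sample figures in the RMSE table.

```r
# Hypothetical helper: root mean squared error computed from its definition.
# Metrics::rmse(actual, predicted) should give the same quantity.
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Three validation rows copied from the prediction table (rows 37, 39, 41).
# Assumption: the column order after price is regression, tree, random forest.
price   <- c(7295, 9095, 10295)
pred_rm <- c(7625.667, 8321.172, 7236.683)
pred_ar <- c(7785.291, 7785.291, 7785.291)
pred_rf <- c(7416.402, 8172.897, 9243.565)

# RMSE of each model on this small subset only
rmse_subset <- sapply(
  list(rm = pred_rm, ar = pred_ar, rf = pred_rf),
  function(p) rmse_manual(price, p)
)
print(rmse_subset)
```

Applying the same function to the full validation set, with the actual price column and each model's prediction vector, should reproduce the values shown in the RMSE table.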