Compare supervised learning models by applying car price prediction algorithms and computing the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the dataset are used, except the identifier car_ID and the name CarName, which are removed.
Training data: 80% of the observations
Validation data: the remaining 20%
A multiple linear regression model is built with the training data
This model is used to answer questions such as:
Which variables are predictors at the 90% confidence level or above?
What is the Adjusted R-squared, i.e., how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison
A regression tree model is built with the training data
The importance of each variable for price is identified
The regression tree and its decision rules (splits) are visualized
Predictions are generated with the validation data
The RMSE statistic is computed for comparison
A random forest model is built with the training data, using 20 trees
The importance of each variable for price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For partitioning data into training and validation sets
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables. Some are numeric and some are categorical (factors).
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Car company and model name (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car, std or turbo (Categorical) |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body style of the car (Categorical): convertible, sedan, wagon, … |
| 8 | drivewheel | Type of drive wheel (Categorical): 4wd, fwd, rwd |
| 9 | enginelocation | Location of the car engine (Categorical): front or rear |
| 10 | wheelbase | Wheelbase of the car, the distance between the axles in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine (Numeric), in … |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric), related to the pistons and the combustion cycle |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower, the power of the car (Numeric) |
| 23 | peakrpm | Peak revolutions per minute of the car (Numeric) |
| 24 | citympg | Fuel economy in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Fuel economy on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |
Remove the variables that have no statistical interest. These are columns 1 and 3, car_ID and CarName.
datos <- datos[, c(2,4:26)]
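Selecting columns by position depends on the column order of the CSV. The following is a minimal sketch of dropping `car_ID` and `CarName` by name instead. It uses a small hypothetical data frame (`datos_demo`) in place of the real data.

```r
# Hypothetical toy data frame with some of the columns of CarPrice_Assignment.csv
datos_demo <- data.frame(
  car_ID    = 1:3,
  symboling = c(3, 1, 2),
  CarName   = c("alfa-romero giulia", "audi 100 ls", "audi fox"),
  price     = c(13495, 13950, 15250)
)

# Drop the identifier and the car name by name, so the result does not depend on positions;
# drop = FALSE keeps a data frame even if only one column were left
cols_to_drop <- c("car_ID", "CarName")
datos_demo <- datos_demo[, setdiff(names(datos_demo), cols_to_drop), drop = FALSE]

names(datos_demo)   # "symboling" "price"
```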
The first records again, now without car_ID and CarName:
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Training data take 80% of the observations and validation data the remaining 20%. Seed 2023.
n <- nrow(datos)
set.seed(2023) # Seed
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
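For comparison, here is a base R sketch of a plain random 80/20 split. It uses hypothetical toy data of 205 rows. This is simple random sampling. `createDataPartition()` works differently: it samples within quantile groups of the numeric `price`. The two approaches therefore do not select the same rows, and the sizes can differ slightly. Here the training set has 164 rows, while the split above has 165.

```r
set.seed(2023)  # Seed for reproducibility

# Hypothetical toy data with 205 rows, like the car price dataset
datos_demo <- data.frame(x = rnorm(205), price = runif(205, 5000, 45000))

n <- nrow(datos_demo)
# Randomly pick 80% of the row indices (rounded down) for training
train_idx <- sample(seq_len(n), size = floor(0.80 * n))

train_demo <- datos_demo[train_idx, ]    # training rows
valid_demo <- datos_demo[-train_idx, ]   # remaining rows, for validation

c(training = nrow(train_demo), validation = nrow(valid_demo))
```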
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 24 | 1 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 7957 |
| 28 | 1 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 8558 |
| 32 | 2 | gas | std | two | hatchback | fwd | front | 86.6 | 144.6 | 63.9 | 50.8 | 1819 | ohc | four | 92 | 1bbl | 2.91 | 3.410 | 9.2 | 76 | 6000 | 31 | 38 | 6855 |
| 33 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | ohc | four | 79 | 1bbl | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399 |
| 38 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 40 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2304 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 8845 |
| 53 | 1 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 |
| 54 | 1 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6695 |
| 56 | 3 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 |
The multiple linear regression model (rm) is built. It models price as a function of all the independent variables, both numeric and non-numeric.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
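The dot expansion can be checked with base R's `terms()`. The same formula can also be built explicitly with `reformulate()`. The following is a minimal sketch on a small hypothetical data frame (`df_demo`).

```r
# Hypothetical data frame: a response plus a few predictors
df_demo <- data.frame(symboling  = c(3, 1, 2),
                      fueltype   = factor(c("gas", "gas", "diesel")),
                      enginesize = c(130, 152, 109),
                      price      = c(13495, 16500, 13950))

# Build price ~ <every other column> explicitly
predictors <- setdiff(names(df_demo), "price")
f <- reformulate(predictors, response = "price")
print(f)  # price ~ symboling + fueltype + enginesize

# The terms that price ~ . expands to against the same data
attr(terms(price ~ ., data = df_demo), "term.labels")
```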
# Multiple linear regression model, used to inspect which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4821.1 -976.3 -96.5 918.4 8321.1
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.983e+03 2.044e+04 0.293 0.770185
## symboling 1.725e+02 2.782e+02 0.620 0.536420
## fueltypegas -2.050e+04 8.559e+03 -2.396 0.018096 *
## aspirationturbo 9.834e+02 1.163e+03 0.845 0.399511
## doornumbertwo 4.710e+02 6.632e+02 0.710 0.478933
## carbodyhardtop -3.774e+03 1.536e+03 -2.456 0.015427 *
## carbodyhatchback -3.991e+03 1.422e+03 -2.806 0.005830 **
## carbodysedan -2.703e+03 1.540e+03 -1.755 0.081761 .
## carbodywagon -3.312e+03 1.653e+03 -2.004 0.047255 *
## drivewheelfwd 4.525e+02 1.253e+03 0.361 0.718558
## drivewheelrwd 9.561e+02 1.411e+03 0.678 0.499177
## enginelocationrear 7.130e+03 2.852e+03 2.501 0.013717 *
## wheelbase 6.878e+01 1.173e+02 0.586 0.558661
## carlength -6.035e+01 5.776e+01 -1.045 0.298117
## carwidth 6.304e+02 2.715e+02 2.322 0.021888 *
## carheight -8.810e+01 1.539e+02 -0.572 0.568097
## curbweight 4.520e+00 2.054e+00 2.200 0.029657 *
## enginetypedohcv -7.043e+03 5.779e+03 -1.219 0.225286
## enginetypel -4.597e+02 2.104e+03 -0.218 0.827441
## enginetypeohc 4.420e+03 1.209e+03 3.654 0.000380 ***
## enginetypeohcf 1.974e+03 1.888e+03 1.046 0.297753
## enginetypeohcv -5.429e+03 1.550e+03 -3.502 0.000645 ***
## enginetyperotor -1.370e+03 5.428e+03 -0.252 0.801214
## cylindernumberfive -1.144e+04 3.522e+03 -3.249 0.001495 **
## cylindernumberfour -1.191e+04 3.797e+03 -3.137 0.002137 **
## cylindernumbersix -7.402e+03 2.933e+03 -2.524 0.012878 *
## cylindernumberthree -1.694e+03 5.386e+03 -0.315 0.753626
## cylindernumbertwelve -9.939e+03 4.985e+03 -1.994 0.048390 *
## cylindernumbertwo NA NA NA NA
## enginesize 1.094e+02 3.059e+01 3.577 0.000499 ***
## fuelsystem2bbl -5.791e+02 1.130e+03 -0.512 0.609378
## fuelsystem4bbl -2.565e+03 3.498e+03 -0.733 0.464706
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -5.470e+03 2.852e+03 -1.918 0.057425 .
## fuelsystemmpfi -8.634e+02 1.297e+03 -0.666 0.506784
## fuelsystemspdi -4.753e+03 1.677e+03 -2.835 0.005365 **
## fuelsystemspfi -1.498e+03 2.678e+03 -0.559 0.576922
## boreratio -1.665e+03 1.761e+03 -0.946 0.346113
## stroke -4.860e+03 1.067e+03 -4.555 1.24e-05 ***
## compressionratio -1.457e+03 6.375e+02 -2.286 0.023985 *
## horsepower 1.128e+01 2.760e+01 0.409 0.683510
## peakrpm 2.175e+00 7.521e-01 2.892 0.004533 **
## citympg -7.815e+01 1.772e+02 -0.441 0.659917
## highwaympg 1.187e+02 1.582e+02 0.750 0.454482
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2258 on 123 degrees of freedom
## Multiple R-squared: 0.9443, Adjusted R-squared: 0.9258
## F-statistic: 50.87 on 41 and 123 DF, p-value: < 2.2e-16
Which variables are predictors at the 90% confidence level or above?
The intercept is not significant. Its p-value is about 0.77, so it does not reach the 90% confidence level.
These coefficients reach at least 90% confidence (p-value < 0.10): fueltypegas, carbodyhardtop, carbodyhatchback, carbodysedan, carbodywagon, enginelocationrear, carwidth, curbweight, enginetypeohc, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, cylindernumbertwelve, enginesize, fuelsystemmfi, fuelsystemspdi, stroke, compressionratio and peakrpm.
Some predictors do not reach 90% confidence. One option would be to build a model with only the predictors at or above 90% confidence. That is left for future work and is not done in this case.
In multiple linear models, the Adjusted R-squared of 0.9258 means that the independent variables explain about 92.58% of the variability of the dependent variable, price. The value is adjusted for the number of predictors.
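The predictors that reach 90% confidence can be extracted from the coefficient table returned by `summary()`. The following sketch uses the built-in `mtcars` data as a stand-in; applying it to `modelo_rm` would work the same way. The helper name `significant_terms()` is hypothetical.

```r
# Return the coefficient names whose p-value is below alpha (0.10 -> 90% confidence)
significant_terms <- function(model, alpha = 0.10) {
  coefs <- summary(model)$coefficients     # columns: Estimate, Std. Error, t value, Pr(>|t|)
  p_values <- coefs[, "Pr(>|t|)"]
  names(p_values)[p_values < alpha]
}

fit <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)
significant_terms(fit)

# Caveat for the reduced model proposed as future work: for factors the names are
# per level (e.g. "carbodywagon"), so they must be mapped back to the original
# column (carbody) before refitting.
```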
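Adjusted R-squared penalizes R-squared by the residual degrees of freedom. The formula is R²_adj = 1 - (1 - R²)(n - 1)/(n - p - 1), where n - p - 1 is the residual degrees of freedom. The figures above are R² = 0.9443, n = 165 training rows and 123 residual degrees of freedom. They give about 0.926, which agrees with the reported 0.9258 up to rounding. The following sketch checks the formula against `summary.lm()` on `mtcars`.

```r
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

n   <- nrow(mtcars)
rdf <- fit$df.residual            # n - p - 1 (uses the rank, so it also handles NA coefficients)
adj_r2_manual <- 1 - (1 - s$r.squared) * (n - 1) / rdf

c(manual = adj_r2_manual, summary = s$adj.r.squared)

# Same formula with the rounded numbers reported for modelo_rm
1 - (1 - 0.9443) * (165 - 1) / 123
```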
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
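The warning appears because two coefficients, `cylindernumbertwo` and `fuelsystemidi`, are `NA`. Their dummy columns are linear combinations of other columns, so the design matrix is rank-deficient. The following sketch reproduces the effect with a hypothetical column that is an exact multiple of another, then lists the aliased terms.

```r
set.seed(1)
df_demo <- data.frame(x1 = rnorm(20))
df_demo$x2 <- 2 * df_demo$x1                    # x2 is an exact multiple of x1
df_demo$y  <- 3 + 1.5 * df_demo$x1 + rnorm(20)

fit <- lm(y ~ x1 + x2, data = df_demo)
coef(fit)                                        # the coefficient for x2 is NA

# Names of the coefficients that could not be estimated
names(coef(fit))[is.na(coef(fit))]

# alias() shows how the aliased term depends on the others
alias(fit)
```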
predicciones_rm
## 1 24 28 32 33 38 40 53
## 14177.247 10450.091 12040.616 7758.867 5313.271 9628.532 10205.788 5847.028
## 54 56 58 68 69 73 77 82
## 6380.277 11845.000 11867.600 26435.217 26694.406 39162.416 6857.378 10718.648
## 84 93 94 102 108 111 112 113
## 14389.167 5769.009 5340.911 15921.253 11381.690 15415.387 16467.696 16414.486
## 116 117 121 138 141 149 160 166
## 11630.290 16414.486 5968.732 11860.767 6469.257 8020.156 8280.177 7556.839
## 167 178 179 188 189 193 203 205
## 6426.897 7955.930 20560.194 8976.746 8702.493 10280.318 19125.019 19175.237
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 2023 and after running the tests, we conclude that the training data must cover every value of the categorical variables that appears in the validation data. In other words, the validation data must not contain category values that were not seen during training.
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual vs predicted prices, first 10 predictions") %>%
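The following sketch checks this condition. For every factor column, it lists the values that appear in the validation data but not in the training data. When that happens, `predict()` on an `lm` model stops with an error of the form "factor ... has new level(s) ...". The helper `new_levels()` and the toy data frames are hypothetical. The check compares observed values, not `levels()`, because both sets are subsets of the same data frame and keep all the factor levels.

```r
new_levels <- function(train, valid) {
  factor_cols <- names(valid)[vapply(valid, is.factor, logical(1))]
  result <- lapply(factor_cols, function(col) {
    # values observed in validation that never occur in training
    setdiff(unique(as.character(valid[[col]])), unique(as.character(train[[col]])))
  })
  names(result) <- factor_cols
  result[lengths(result) > 0]   # keep only the columns with unseen values
}

train_demo <- data.frame(carbody = factor(c("sedan", "wagon", "sedan")), price = c(1, 2, 3))
valid_demo <- data.frame(carbody = factor(c("sedan", "convertible")), price = c(4, 5))

new_levels(train_demo, valid_demo)   # returns list(carbody = "convertible")
```

With the notebook's objects, `new_levels(datos.entrenamiento, datos.validacion)` should return an empty list before predictions are made.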
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 14177.247 |
| 24 | 7957 | 10450.091 |
| 28 | 8558 | 12040.616 |
| 32 | 6855 | 7758.867 |
| 33 | 5399 | 5313.271 |
| 38 | 7895 | 9628.532 |
| 40 | 8845 | 10205.788 |
| 53 | 6795 | 5847.028 |
| 54 | 6695 | 6380.277 |
| 56 | 10945 | 11845.000 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2334.121
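RMSE is the square root of the mean squared difference between actual and predicted values: RMSE = sqrt(mean((y - ŷ)^2)). It is expressed in the same units as `price` (dollars). Here is a base R sketch equivalent to `Metrics::rmse()`, with hypothetical numbers.

```r
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))   # root of the mean squared error
}

rmse_manual(c(3, 5, 7), c(2, 5, 9))    # sqrt((1 + 0 + 4) / 3) = 1.290994
```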
The regression tree model (ar) is built with the training data.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11262780000 13419.220
## 2) enginesize< 182 150 3136062000 11240.750
## 4) highwaympg>=28.5 100 640421900 8688.060
## 8) carlength< 174.8 75 157981900 7719.147 *
## 9) carlength>=174.8 25 200802100 11594.800 *
## 5) highwaympg< 28.5 50 540775900 16346.120
## 10) enginesize< 125.5 12 72850800 13601.420 *
## 11) enginesize>=125.5 38 348976500 17212.870 *
## 3) enginesize>=182 15 296231400 35203.970 *
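Reading the printed tree, the first split is on enginesize. The 15 training cars with enginesize >= 182 form a leaf with a predicted price of about 35,204. The remaining 150 cars are split by highwaympg at 28.5:

- Cars with highwaympg >= 28.5 are split again by carlength at 174.8. Their predictions are about 7,719 and 11,595.
- Cars with highwaympg < 28.5 are split again by enginesize at 125.5. Their predictions are about 13,601 and 17,213.

The tree therefore uses only enginesize, highwaympg and carlength, and it can produce only five distinct predicted prices.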
rpart.plot(modelo_ar)
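The outline also asks for the importance of each variable in the tree, but the code above does not compute it. `rpart` stores this in the `variable.importance` component of the fitted object. The following sketch uses the built-in `mtcars` data as a stand-in. For the car model the equivalent would be `modelo_ar$variable.importance`.

```r
library(rpart)

fit_tree <- rpart(mpg ~ ., data = mtcars)   # default rpart.control() settings

# Named numeric vector of importances, sorted from most to least important
imp <- sort(fit_tree$variable.importance, decreasing = TRUE)
round(imp, 1)

# Relative importance as a percentage of the total
round(100 * imp / sum(imp), 1)
```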
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 24 28 32 33 38 40 53
## 17212.873 7719.147 7719.147 7719.147 7719.147 7719.147 11594.800 7719.147
## 54 56 58 68 69 73 77 82
## 7719.147 13601.417 13601.417 35203.967 35203.967 35203.967 7719.147 7719.147
## 84 93 94 102 108 111 112 113
## 17212.873 7719.147 7719.147 17212.873 13601.417 17212.873 13601.417 11594.800
## 116 117 121 138 141 149 160 166
## 13601.417 11594.800 7719.147 13601.417 7719.147 7719.147 7719.147 7719.147
## 167 178 179 188 189 193 203 205
## 7719.147 11594.800 17212.873 7719.147 7719.147 11594.800 17212.873 17212.873
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual vs predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 17212.873 |
| 24 | 7957 | 7719.147 |
| 28 | 8558 | 7719.147 |
| 32 | 6855 | 7719.147 |
| 33 | 5399 | 7719.147 |
| 38 | 7895 | 7719.147 |
| 40 | 8845 | 11594.800 |
| 53 | 6795 | 7719.147 |
| 54 | 6695 | 7719.147 |
| 56 | 10945 | 13601.417 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3144.878
The random forest model (rf) is built with the training data and 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 5063760
## % Var explained: 92.58
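A random forest averages many regression trees, each grown on a bootstrap sample of the training rows. At each split it also samples `mtry` candidate predictors. For regression, the package default is `floor(p/3)`, which gives 7 for the 23 predictors here and matches the output above. The sketch below shows only the first ingredient, bagging of `rpart` trees. It runs on `mtcars` and is a simplification, not how `randomForest` is implemented. The helper names are hypothetical.

```r
library(rpart)

# Grow n_trees regression trees, each on a bootstrap sample (rows drawn with replacement)
bag_trees <- function(formula, data, n_trees = 20) {
  lapply(seq_len(n_trees), function(i) {
    idx <- sample(nrow(data), replace = TRUE)
    rpart(formula, data = data[idx, ])
  })
}

# Average the predictions of all trees for each row of newdata
predict_bagged <- function(trees, newdata) {
  preds <- vapply(trees, function(t) predict(t, newdata = newdata), numeric(nrow(newdata)))
  preds <- matrix(preds, nrow = nrow(newdata))   # keep a matrix even for a single row
  rowMeans(preds)
}

set.seed(2023)
trees <- bag_trees(mpg ~ ., data = mtcars, n_trees = 20)
pred  <- predict_bagged(trees, mtcars)
sqrt(mean((mtcars$mpg - pred)^2))   # in-sample RMSE, optimistic because these rows were used for training

# Default mtry used by randomForest for regression with 23 predictors
max(floor(23 / 3), 1)
```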
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 28824997.114 3560587762
## cylindernumber 12675557.492 2102621144
## curbweight 7304828.512 1321177595
## highwaympg 5998389.299 1016036594
## citympg 4583719.502 888265446
## horsepower 8575234.799 736508907
## carwidth 3365757.880 683346079
## boreratio 4893901.354 525948241
## carlength 1984745.712 183858214
## peakrpm 620210.388 140007046
## wheelbase 1204777.008 137941985
## fuelsystem 1235957.781 90845539
## carheight 163875.151 54509892
## stroke 171333.124 43448903
## compressionratio 253964.712 43241576
## carbody -40915.782 40332083
## enginetype 599418.620 38661738
## drivewheel 519650.130 31861946
## symboling 245967.373 18279142
## aspiration 50826.097 2632290
## doornumber -8138.402 2463232
## fueltype 0.000 1560074
## enginelocation 0.000 0
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 24 28 32 33 38 40 53
## 17163.933 8053.450 8339.010 6733.058 6004.693 8657.903 9684.570 6127.805
## 54 56 58 68 69 73 77 82
## 7132.098 13489.202 13489.202 31653.507 31668.630 32130.729 5885.352 8784.192
## 84 93 94 102 108 111 112 113
## 13052.452 7030.427 7725.812 14278.034 15153.038 18037.587 15734.517 15000.754
## 116 117 121 138 141 149 160 166
## 15153.038 15000.754 6308.975 16675.105 7995.672 8965.717 7733.531 11507.390
## 167 178 179 188 189 193 203 205
## 11975.857 10576.013 16703.757 8158.087 10119.473 11498.150 20399.771 17904.332
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual vs predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 17163.933 |
| 24 | 7957 | 8053.450 |
| 28 | 8558 | 8339.010 |
| 32 | 6855 | 6733.058 |
| 33 | 5399 | 6004.693 |
| 38 | 7895 | 8657.903 |
| 40 | 8845 | 9684.570 |
| 53 | 6795 | 6127.805 |
| 54 | 6695 | 7132.097 |
| 56 | 10945 | 13489.202 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2083.778
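The three RMSE values can be collected in one table to rank the models; lower is better. This sketch copies the values printed above as literals. In the notebook, `c(rmse_rm, rmse_ar, rmse_rf)` could replace them. A different seed or split would change these numbers.

```r
# RMSE values printed above for the validation data (seed 2023)
rmse_models <- data.frame(
  model = c("Multiple linear regression", "Regression tree", "Random forest (20 trees)"),
  rmse  = c(2334.121, 3144.878, 2083.778)
)

rmse_models <- rmse_models[order(rmse_models$rmse), ]   # sort from best to worst
rmse_models
rmse_models$model[1]   # model with the lowest RMSE
```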
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|   | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 | 14177.247 | 17212.873 | 17163.933 |
| 24 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 7957 | 10450.091 | 7719.147 | 8053.450 |
| 28 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 8558 | 12040.616 | 7719.147 | 8339.010 |
| 32 | gas | std | two | hatchback | fwd | front | 86.6 | 144.6 | 63.9 | 50.8 | 1819 | ohc | four | 92 | 1bbl | 2.91 | 3.410 | 9.2 | 76 | 6000 | 31 | 38 | 6855 | 7758.867 | 7719.147 | 6733.058 |
| 33 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | ohc | four | 79 | 1bbl | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399 | 5313.271 | 7719.147 | 6004.693 |
| 38 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 | 9628.532 | 7719.147 | 8657.903 |
| 40 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2304 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 8845 | 10205.788 | 11594.800 | 9684.570 |
| 53 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 | 5847.028 | 7719.147 | 6127.805 |
| 54 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6695 | 6380.277 | 7719.147 | 7132.097 |
| 56 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 | 11845.000 | 13601.417 | 13489.202 |
| 58 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2385 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 13645 | 11867.600 | 13601.417 | 13489.202 |
| 68 | diesel | turbo | four | sedan | rwd | front | 110.0 | 190.9 | 70.3 | 56.5 | 3515 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 25552 | 26435.217 | 35203.967 | 31653.507 |
| 69 | diesel | turbo | four | wagon | rwd | front | 110.0 | 190.9 | 70.3 | 58.7 | 3750 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 28248 | 26694.406 | 35203.967 | 31668.630 |
| 73 | gas | std | two | convertible | rwd | front | 96.6 | 180.3 | 70.5 | 50.8 | 3685 | ohcv | eight | 234 | mpfi | 3.46 | 3.100 | 8.3 | 155 | 4750 | 16 | 18 | 35056 | 39162.416 | 35203.967 | 32130.729 |
| 77 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | ohc | four | 92 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 37 | 41 | 5389 | 6857.378 | 7719.147 | 5885.352 |
| 82 | gas | std | two | hatchback | fwd | front | 96.3 | 173.0 | 65.4 | 49.4 | 2328 | ohc | four | 122 | 2bbl | 3.35 | 3.460 | 8.5 | 88 | 5000 | 25 | 32 | 8499 | 10718.648 | 7719.147 | 8784.192 |
| 84 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2921 | ohc | four | 156 | spdi | 3.59 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 14869 | 14389.167 | 17212.873 | 13052.452 |
| 93 | gas | std | four | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1938 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 6849 | 5769.009 | 7719.147 | 7030.427 |
| 94 | gas | std | four | wagon | fwd | front | 94.5 | 170.2 | 63.8 | 53.5 | 2024 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7349 | 5340.911 | 7719.147 | 7725.812 |
| 102 | gas | std | four | sedan | fwd | front | 100.4 | 181.7 | 66.5 | 55.1 | 3095 | ohcv | six | 181 | mpfi | 3.43 | 3.270 | 9.0 | 152 | 5200 | 17 | 22 | 13499 | 15921.253 | 17212.873 | 14278.034 |
| 108 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3020 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 11900 | 11381.690 | 13601.417 | 15153.038 |
| 111 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3430 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 25 | 25 | 13860 | 15415.387 | 17212.873 | 18037.588 |
| 112 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 2.190 | 8.4 | 95 | 5000 | 19 | 24 | 15580 | 16467.696 | 13601.417 | 15734.517 |
| 113 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 16900 | 16414.486 | 11594.800 | 15000.754 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 11630.290 | 13601.417 | 15153.038 |
| 117 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 17950 | 16414.486 | 11594.800 | 15000.754 |
| 121 | gas | std | four | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 5968.732 | 7719.147 | 6308.975 |
| 138 | gas | turbo | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2847 | dohc | four | 121 | mpfi | 3.54 | 3.070 | 9.0 | 160 | 5500 | 19 | 26 | 18620 | 11860.767 | 13601.417 | 16675.105 |
| 141 | gas | std | two | hatchback | 4wd | front | 93.3 | 157.3 | 63.8 | 55.7 | 2240 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7603 | 6469.257 | 7719.147 | 7995.672 |
| 149 | gas | std | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2420 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.0 | 82 | 4800 | 23 | 29 | 8013 | 8020.156 | 7719.147 | 8965.717 |
| 160 | diesel | std | four | hatchback | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2275 | ohc | four | 110 | idi | 3.27 | 3.350 | 22.5 | 56 | 4500 | 38 | 47 | 7788 | 8280.177 | 7719.147 | 7733.531 |
| 166 | gas | std | two | sedan | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9298 | 7556.839 | 7719.147 | 11507.390 |
| 167 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9538 | 6426.897 | 7719.147 | 11975.857 |
| 178 | gas | std | four | hatchback | fwd | front | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | ohc | four | 122 | mpfi | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 11248 | 7955.930 | 11594.800 | 10576.013 |
| 179 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 2976 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.3 | 161 | 5200 | 20 | 24 | 16558 | 20560.194 | 17212.873 | 16703.757 |
| 188 | diesel | turbo | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 68 | 4500 | 37 | 42 | 9495 | 8976.746 | 7719.147 | 8158.087 |
| 189 | gas | std | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2300 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 10.0 | 100 | 5500 | 26 | 32 | 9995 | 8702.493 | 7719.147 | 10119.473 |
| 193 | diesel | turbo | four | sedan | fwd | front | 100.4 | 180.2 | 66.9 | 55.1 | 2579 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 68 | 4500 | 33 | 38 | 13845 | 10280.318 | 11594.800 | 11498.150 |
| 203 | gas | std | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3012 | ohcv | six | 173 | mpfi | 3.58 | 2.870 | 8.8 | 134 | 5500 | 18 | 23 | 21485 | 19125.019 | 17212.873 | 20399.771 |
| 205 | gas | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.5 | 114 | 5400 | 19 | 25 | 22625 | 19175.237 | 17212.873 | 17904.332 |
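The header of this table is not part of this excerpt. The last four columns appear to be the observed price, followed by the validation predictions of the multiple regression, regression tree and random forest models; that ordering is an assumption. The sketch below shows how RMSE, $\sqrt{\tfrac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$, is obtained for each model using only three rows copied from the table. The names `sample_rows`, `pred_rm`, `pred_ar`, `pred_rf` and `rmse_manual` are hypothetical. The resulting numbers are illustrative and will not match the RMSE computed on the full validation set.

```r
# Three rows copied from the validation table above.
# Assumed (hypothetical) column meaning: price = observed price,
# pred_rm / pred_ar / pred_rf = predictions of multiple regression,
# regression tree and random forest.
sample_rows <- data.frame(
  price   = c(7895, 8845, 6795),
  pred_rm = c(9628.532, 10205.788, 5847.028),
  pred_ar = c(7719.147, 11594.800, 7719.147),
  pred_rf = c(8657.903, 9684.570, 6127.805)
)

# Root mean squared error: square root of the mean squared residual.
# Same definition as Metrics::rmse(actual, predicted).
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# One RMSE per model on this small excerpt (illustrative only)
rmse_sample <- sapply(
  c(rm = "pred_rm", ar = "pred_ar", rf = "pred_rf"),
  function(col) rmse_manual(sample_rows$price, sample_rows[[col]])
)
print(round(rmse_sample, 3))
```

Because the residuals are squared before averaging, a few large pricing errors weigh more heavily than many small ones. RMSE is expressed in the same units as `price`.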
Compare the RMSE of the three models
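# Collect the validation RMSE of each model in a single table
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model",
      col.names = c("Multiple regression (rm)", "Regression tree (ar)", "Random forest (rf)")) %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| Multiple regression (rm) | Regression tree (ar) | Random forest (rf) |
|---|---|---|
| 2334.121 | 3144.878 | 2083.778 |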
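The sketch below picks the best model and measures how much lower its RMSE is than the others, instead of reading the table by eye. It copies the RMSE values reported above into a new data frame, `rmse_tab`, which is a hypothetical name. Applied to the notebook's own `rmse` data frame, the same `unlist`/`which.min` steps should behave identically.

```r
# RMSE values as reported in the table above
rmse_tab <- data.frame(rm = 2334.121, ar = 3144.878, rf = 2083.778)

# Named numeric vector: rm, ar, rf
values <- unlist(rmse_tab[1, ])

# The lowest RMSE means the smallest typical prediction error (price units)
best <- names(which.min(values))
print(best)

# Fraction by which the best model reduces each model's RMSE
reduction <- (values - values[best]) / values
print(round(100 * reduction, 1))
```

Ranking by a single RMSE value is only meaningful here because all three models were evaluated on the same validation rows.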
Car price data were loaded using all variables, both numeric and categorical.
The multiple linear regression model identifies several statistically significant predictors.
According to the root mean squared error (RMSE), the best model was random forest (2083.778). It was followed by multiple regression (2334.121) and the regression tree (3144.878). This ranking holds for these particular training and validation sets, split 80% / 20%.