Compare supervised models by applying car price prediction algorithms and computing the root mean squared error (RMSE) statistic.
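For reference, the RMSE over the $n$ validation observations, with actual prices $y_i$ and predicted prices $\hat{y}_i$, is the square root of the mean squared error:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

It is expressed in the same units as the price (dollars), so a lower value means predictions that are closer to the actual prices.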
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All the variables in the dataset take part, except the identifier (car_ID) and the car name (CarName), which are removed.
Training data: 80% of the observations.
Validation data: the remaining 20%.
A multiple linear regression model is built with the training data.
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level of 90% or higher?
What is the Adjusted R-squared value, that is, how much of the variation in the car price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison.
A regression tree model is built with the training data.
The importance of each variable for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison.
A random forest model is built with the training data, using 20 trees.
The importance of each variable for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr) # For data manipulation (arrange)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For visualization
library(plotly) # For visualization
library(caret) # For data partitioning
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forest
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables, a mix of numeric variables and categorical variables (factors).
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors in the car: two or four (Categorical) |
| 7 | carbody | Body style of the car: convertible, hardtop, hatchback, sedan or wagon (Categorical) |
| 8 | drivewheel | Type of drive wheel: 4wd (four-wheel drive), fwd (front-wheel drive) or rwd (rear-wheel drive) (Categorical) |
| 9 | enginelocation | Location of the car engine: front or rear (Categorical) |
| 10 | wheelbase | Wheelbase of the car, the distance between the axles in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine: pistons, cycles, combustion (Numeric) |
| 21 | compressionratio | Compression ratio of the car, a measure of the pressure in the engine (Numeric) |
| 22 | horsepower | Horsepower, the power of the car (Numeric) |
| 23 | peakrpm | Peak revolutions per minute of the car (Numeric) |
| 24 | citympg | Fuel mileage in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Fuel mileage on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile
Remove the variables that have no statistical interest, namely columns 1 and 3 (car_ID and CarName).
datos <- datos[, c(2,4:26)]
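Selecting columns by position (`c(2,4:26)`) silently breaks if the column order of the CSV ever changes. A minimal sketch, assuming the same column names, that drops the two columns by name instead; the toy data frame `ejemplo` is hypothetical and only stands in for `datos`:

```r
# Toy data frame standing in for `datos` (values taken from the first rows above)
ejemplo <- data.frame(
  car_ID = c(1, 4, 6),
  symboling = c(3, 2, 2),
  CarName = c("alfa-romero giulia", "audi 100 ls", "audi fox"),
  price = c(13495, 13950, 15250)
)

# Drop the identifier and the car name by name rather than by position;
# drop = FALSE keeps a data frame even if only one column remained
columnas_excluidas <- c("car_ID", "CarName")
ejemplo <- ejemplo[, !(names(ejemplo) %in% columnas_excluidas), drop = FALSE]
names(ejemplo)
```

Applied to the real data, the same line would be `datos <- datos[, !(names(datos) %in% columnas_excluidas), drop = FALSE]`.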
The first records again, now without car_ID and CarName.
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Training data: 80% of the observations; validation data: the remaining 20%.
n <- nrow(datos)
set.seed(1264) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
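`createDataPartition()` does not draw a plain random sample: for a numeric `y` such as `price`, caret's documentation describes splitting the values into groups by percentiles and sampling within each group, so that both sets cover the whole price range. The following is a simplified base-R sketch of that idea, not caret's exact algorithm; `stratified_split()` is a hypothetical helper and the prices are synthetic:

```r
# Simplified stratified split for a numeric response (hypothetical helper):
# bin y into quantile groups, then sample a proportion p inside each bin.
stratified_split <- function(y, p = 0.8, groups = 5) {
  breaks <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  bins <- cut(y, breaks = breaks, include.lowest = TRUE)
  # Sample row positions within each bin so every price range is represented
  idx <- lapply(split(seq_along(y), bins), function(i) {
    i[sample.int(length(i), size = ceiling(p * length(i)))]
  })
  sort(unlist(idx, use.names = FALSE))
}

set.seed(1264)
precios <- rexp(100, rate = 1 / 13000)  # synthetic prices, not the real data
entrena_idx <- stratified_split(precios)
length(entrena_idx)  # training rows; the remaining rows go to validation
```

With 100 values and 5 groups, each quintile contributes 16 rows to training, for 80 in total.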
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
| 15 | 1 | gas | std | four | sedan | rwd | front | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 |
| 27 | 1 | gas | std | four | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 |
| 38 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 |
| 52 | 1 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 |
| 53 | 1 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 |
| 56 | 3 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 |
The multiple linear regression model (rm) is built: price as a function of all the independent variables, numeric and categorical alike. Each categorical variable enters the model as dummy variables, one per level except the reference (first) level, which is why the output shows terms such as fueltypegas or carbodyhatchback.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
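The dot in an R formula stands for "every column of `data` except the response". A small sketch on a hypothetical four-column data frame shows how R expands it, using `terms()` with its `data` argument and rebuilding the explicit formula with `reformulate()`:

```r
# Toy data frame with a response and three predictors (hypothetical values)
toy <- data.frame(
  price = c(13495, 16500, 13950, 17450),
  fueltype = factor(c("gas", "gas", "diesel", "gas")),
  enginesize = c(130, 152, 109, 136),
  horsepower = c(111, 154, 102, 115)
)

# Expand the dot using the columns of `toy`
terminos <- attr(terms(price ~ ., data = toy), "term.labels")
terminos

# Rebuild the equivalent explicit formula
reformulate(terminos, response = "price")
```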
# Multiple linear regression model, used to examine which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5738.0 -890.4 0.0 974.8 7628.2
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.258e+04 1.688e+04 -2.522 0.012941 *
## symboling -7.622e+01 2.580e+02 -0.295 0.768135
## fueltypegas -7.843e+03 7.150e+03 -1.097 0.274847
## aspirationturbo 2.819e+03 8.928e+02 3.157 0.002005 **
## doornumbertwo 6.283e+02 5.988e+02 1.049 0.296120
## carbodyhardtop -3.229e+03 1.427e+03 -2.263 0.025377 *
## carbodyhatchback -4.009e+03 1.384e+03 -2.897 0.004455 **
## carbodysedan -2.716e+03 1.491e+03 -1.822 0.070874 .
## carbodywagon -4.146e+03 1.612e+03 -2.573 0.011283 *
## drivewheelfwd 1.688e+02 1.107e+03 0.152 0.879064
## drivewheelrwd -7.977e+01 1.280e+03 -0.062 0.950425
## enginelocationrear 5.970e+03 2.545e+03 2.345 0.020608 *
## wheelbase -5.815e+01 1.021e+02 -0.570 0.569944
## carlength -2.957e+01 4.997e+01 -0.592 0.555192
## carwidth 8.633e+02 2.367e+02 3.647 0.000390 ***
## carheight 1.163e+02 1.387e+02 0.839 0.403078
## curbweight 3.853e+00 1.725e+00 2.234 0.027290 *
## enginetypedohcv -1.059e+04 4.529e+03 -2.338 0.020998 *
## enginetypel -1.505e+03 1.691e+03 -0.890 0.375380
## enginetypeohc 2.381e+03 9.875e+02 2.411 0.017389 *
## enginetypeohcf -1.389e+03 1.744e+03 -0.796 0.427423
## enginetypeohcv -9.558e+03 1.358e+03 -7.041 1.19e-10 ***
## enginetyperotor -3.382e+03 4.428e+03 -0.764 0.446441
## cylindernumberfive -1.305e+04 2.777e+03 -4.702 6.80e-06 ***
## cylindernumberfour -1.432e+04 3.092e+03 -4.631 9.13e-06 ***
## cylindernumbersix -7.387e+03 2.123e+03 -3.479 0.000696 ***
## cylindernumberthree -5.135e+03 4.446e+03 -1.155 0.250382
## cylindernumbertwelve -1.010e+04 4.335e+03 -2.330 0.021431 *
## cylindernumbertwo NA NA NA NA
## enginesize 1.230e+02 2.610e+01 4.713 6.48e-06 ***
## fuelsystem2bbl -9.847e+01 8.958e+02 -0.110 0.912654
## fuelsystem4bbl -2.070e+03 2.727e+03 -0.759 0.449147
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.280e+03 2.473e+03 -1.326 0.187166
## fuelsystemmpfi -3.971e+02 1.012e+03 -0.392 0.695415
## fuelsystemspdi -2.578e+03 1.379e+03 -1.870 0.063892 .
## fuelsystemspfi 1.312e+01 2.362e+03 0.006 0.995575
## boreratio 1.462e+03 1.742e+03 0.839 0.402948
## stroke -5.589e+03 9.581e+02 -5.834 4.49e-08 ***
## compressionratio -5.648e+02 5.360e+02 -1.054 0.294101
## horsepower -1.308e+01 2.272e+01 -0.576 0.565922
## peakrpm 3.415e+00 6.754e-01 5.056 1.51e-06 ***
## citympg 1.153e+01 1.517e+02 0.076 0.939512
## highwaympg 7.951e+01 1.402e+02 0.567 0.571543
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2017 on 123 degrees of freedom
## Multiple R-squared: 0.9574, Adjusted R-squared: 0.9432
## F-statistic: 67.42 on 41 and 123 DF, p-value: < 2.2e-16
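Which variables are significant predictors at a confidence level of 90% or higher?
The intercept is significant at the 95% confidence level (p = 0.0129).
The coefficients with a p-value below 0.10 (marked `.`, `*`, `**` or `***`), that is, significant at 90% confidence or higher, are: aspirationturbo, carbodyhardtop, carbodyhatchback, carbodysedan, carbodywagon, enginelocationrear, carwidth, curbweight, enginetypedohcv, enginetypeohc, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, cylindernumbertwelve, enginesize, fuelsystemspdi, stroke and peakrpm. Two coefficients (cylindernumbertwo and fuelsystemidi) are not estimated (NA) because of singularities: they are exact linear combinations of other predictors.
Since some predictors do not reach the 90% confidence level, one could build a model with only the predictors at or above that level. That is left for future work and is not done in this case.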
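The list above can be obtained programmatically from the coefficient matrix returned by `summary()` (aliased NA coefficients are not included in that matrix). A minimal sketch on the built-in `mtcars` data, standing in for `modelo_rm`; for the case study the same two lines would be applied to `summary(modelo_rm)$coefficients`:

```r
# Fit a small example model on the built-in mtcars data
ajuste <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: estimate, std. error, t value and p-value per term
coefs <- summary(ajuste)$coefficients

# Keep the terms whose p-value is below 0.10 (90% confidence or higher)
significativos <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.10]
significativos
```

A reduced model for the future work mentioned above could then be refit with only those variables, keeping in mind that a categorical variable enters or leaves the model as a whole, not level by level.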
In multiple linear models, the Adjusted R-squared of 0.9432 means that the independent variables explain approximately 94.32% of the variability of the dependent variable, price, in the training data.
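The adjusted value penalizes R-squared for the number of estimated coefficients. A small sketch of the standard formula, fed with the figures printed by `summary(modelo_rm)`: R-squared 0.9574, 165 training rows and 41 slope coefficients (the F-statistic is reported on 41 and 123 degrees of freedom, and 41 + 123 + 1 = 165):

```r
# Adjusted R-squared from R-squared, sample size n and number of
# estimated slope coefficients p (excluding the intercept)
r2_ajustado <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Values read from the summary of modelo_rm
r2_ajustado(0.9574, n = 165, p = 41)  # 0.9432
```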
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 1 3 7 12 14 27 38 52
## 14894.135 7155.108 20258.961 13669.073 20687.997 6777.815 9636.339 5680.662
## 53 56 60 67 71 90 97 98
## 5699.929 12735.366 10401.997 10804.809 27861.474 6863.488 6551.147 5113.741
## 113 115 116 121 132 136 141 148
## 16459.548 14761.806 11449.746 5399.805 9474.092 14441.362 6104.461 8839.126
## 152 154 155 161 163 165 166 176
## 5827.832 6193.676 5619.714 8703.080 7708.080 6606.759 10188.249 6464.469
## 178 180 181 190 195 197 199 204
## 6634.020 22438.604 22973.060 12646.596 18073.803 18173.967 18748.983 26624.392
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Tests with seed 2023 showed that the training data must cover every value of the categorical variables that appears in the validation data; that is, the validation data must not contain category values that were not seen during training. Otherwise `predict()` fails with an error saying that a factor has new levels.
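That condition can be checked before predicting. A minimal sketch with a hypothetical helper `unseen_levels()`: it compares the category values actually present in each set, not `levels()`, because subsetting a data frame keeps every level of the original factor even when no training row uses it, while `lm()` only learns the levels present in the training rows:

```r
# For each factor column, return the values present in `test` that never
# appear in `train` (hypothetical helper)
unseen_levels <- function(train, test) {
  factor_cols <- names(train)[vapply(train, is.factor, logical(1))]
  nuevos <- lapply(factor_cols, function(col) {
    setdiff(unique(as.character(test[[col]])), unique(as.character(train[[col]])))
  })
  names(nuevos) <- factor_cols
  nuevos[lengths(nuevos) > 0]  # keep only columns with problems
}

# Toy example: "rotor" appears only in the validation row
todos <- data.frame(
  enginetype = factor(c("ohc", "ohc", "dohc", "rotor")),
  price = c(13950, 17450, 13495, 10945)
)
entrenamiento <- todos[1:3, ]
validacion <- todos[4, , drop = FALSE]
unseen_levels(entrenamiento, validacion)
levels(entrenamiento$enginetype)  # still lists "rotor"
```

In the case study the call would be `unseen_levels(datos.entrenamiento, datos.validacion)`; an empty list means every validation category was seen during training.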
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 14894.135 |
| 3 | 16500 | 7155.108 |
| 7 | 17710 | 20258.961 |
| 12 | 16925 | 13669.073 |
| 14 | 21105 | 20687.997 |
| 27 | 7609 | 6777.815 |
| 38 | 7895 | 9636.339 |
| 52 | 6095 | 5680.662 |
| 53 | 6795 | 5699.929 |
| 56 | 10945 | 12735.366 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3272.124
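`Metrics::rmse(actual, predicted)` implements the RMSE formula directly. A hand-written version, useful to see what is being computed, should agree with it when applied to `comparaciones$precio_real` and `comparaciones$precio_predicciones`:

```r
# Root mean squared error written out by hand
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# Tiny check with made-up numbers: errors of -3 and 4 give sqrt((9 + 16) / 2)
rmse_manual(c(10, 20), c(13, 16))
```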
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11742610000 13495.790
## 2) enginesize< 182 148 3021864000 11091.150
## 4) highwaympg>=28.5 99 585475300 8595.414
## 8) carlength< 175.5 78 158367900 7738.115 *
## 9) carlength>=175.5 21 156851600 11779.670 *
## 5) highwaympg< 28.5 49 573881700 16133.550
## 10) horsepower< 112.5 17 90563670 13697.760 *
## 11) horsepower>=112.5 32 328872800 17427.570 *
## 3) enginesize>=182 17 414635700 34430.320 *
Pending.
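For the step "identify the importance of the variables", an `rpart` object stores a named vector `variable.importance` (present whenever the tree has at least one split). A minimal sketch on the built-in `mtcars` data, standing in for `modelo_ar`; with the case study's model the equivalent is `modelo_ar$variable.importance`:

```r
library(rpart)

# Example tree on the built-in mtcars data (stands in for modelo_ar)
arbol <- rpart(mpg ~ ., data = mtcars)

# Importance of each variable, sorted from most to least important
importancia <- sort(arbol$variable.importance, decreasing = TRUE)

# Express it as a percentage of the total to ease comparison
porcentaje <- 100 * importancia / sum(importancia)
round(porcentaje, 1)
```

The importance also counts variables used as surrogate splits, so a variable can score above zero even if it does not appear in the printed tree.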
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 3 7 12 14 27 38 52
## 13697.765 17427.568 13697.765 11779.667 17427.568 7738.115 7738.115 7738.115
## 53 56 60 67 71 90 97 98
## 7738.115 13697.765 11779.667 7738.115 34430.324 7738.115 7738.115 7738.115
## 113 115 116 121 132 136 141 148
## 11779.667 13697.765 13697.765 7738.115 11779.667 13697.765 7738.115 7738.115
## 152 154 155 161 163 165 166 176
## 7738.115 7738.115 7738.115 7738.115 7738.115 7738.115 7738.115 11779.667
## 178 180 181 190 195 197 199 204
## 11779.667 17427.568 17427.568 7738.115 17427.568 17427.568 17427.568 13697.765
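The printed tree can be read as a set of if/else rules. A sketch of those rules written by hand, with split values and leaf means copied from the `modelo_ar` output (`predecir_a_mano()` is a hypothetical helper); it reproduces the predictions above, for example car 1 (enginesize 130, highwaympg 27, horsepower 111) falls in node 10:

```r
# Rules of the printed tree written by hand (split values and leaf means
# copied from the modelo_ar output; leaf means are rounded as printed)
predecir_a_mano <- function(enginesize, highwaympg, carlength, horsepower) {
  if (enginesize >= 182) {
    return(34430.320)                                # node 3
  }
  if (highwaympg >= 28.5) {
    if (carlength < 175.5) 7738.115 else 11779.670   # nodes 8 and 9
  } else {
    if (horsepower < 112.5) 13697.760 else 17427.570 # nodes 10 and 11
  }
}

# Validation car 1: enginesize 130, highwaympg 27, carlength 168.8, horsepower 111
predecir_a_mano(130, 27, 168.8, 111)
```

The small difference from 13697.765 in `predicciones_ar` comes only from the rounding of the printed tree.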
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 13697.765 |
| 3 | 16500 | 17427.568 |
| 7 | 17710 | 13697.765 |
| 12 | 16925 | 11779.667 |
| 14 | 21105 | 17427.568 |
| 27 | 7609 | 7738.115 |
| 38 | 7895 | 7738.115 |
| 52 | 6095 | 7738.115 |
| 53 | 6795 | 7738.115 |
| 56 | 10945 | 13697.765 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3142.833
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 5228164
## % Var explained: 92.65
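The line `No. of variables tried at each split: 7` follows from randomForest's default for regression, `mtry = max(floor(p/3), 1)`, with p = 23 predictors. A small sketch of that arithmetic, plus the pseudo R-squared behind `% Var explained`, assumed here to be computed from the out-of-bag mean of squared residuals relative to the variance of the response (with denominator n); both helper names are hypothetical:

```r
# Default number of candidate variables per split for regression forests
mtry_regresion <- function(p) max(floor(p / 3), 1)
mtry_regresion(23)  # 23 predictors in datos.entrenamiento (all but price)

# Pseudo R-squared (%) from an out-of-bag MSE and the response values;
# assumption: the variance uses denominator n, as in randomForest's report
pct_var_explicada <- function(mse_oob, y) {
  n <- length(y)
  100 * (1 - mse_oob / (var(y) * (n - 1) / n))
}

pct_var_explicada(0.5, c(1, 3))  # variance (denominator n) is 1, so 50
```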
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 26700428.60 2.936059e+09
## horsepower 14135156.67 2.384172e+09
## citympg 10901979.44 1.382432e+09
## curbweight 8492078.98 1.277358e+09
## highwaympg 6873665.93 1.163907e+09
## cylindernumber 4231464.58 9.070652e+08
## drivewheel 2475227.06 3.555691e+08
## carwidth 1259453.96 2.962653e+08
## carlength 1606190.67 2.412287e+08
## boreratio 1562423.12 2.089318e+08
## wheelbase 1092363.05 1.225738e+08
## compressionratio 488167.56 1.119685e+08
## fuelsystem 1361864.39 1.054098e+08
## peakrpm 724722.47 9.191926e+07
## carbody 1306911.88 8.823256e+07
## stroke 501644.98 6.101246e+07
## enginelocation 487090.93 4.706763e+07
## carheight 310523.45 2.383873e+07
## aspiration 337177.91 2.047677e+07
## symboling 171429.30 1.571178e+07
## enginetype 258279.67 8.917956e+06
## doornumber 80491.18 8.294080e+06
## fueltype 0.00 2.709375e+04
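The two importance measures do not always agree: the table is sorted by IncNodePurity, but ordering by %IncMSE changes some positions. A sketch with three rows copied from the table above (the column `%IncMSE` is renamed `IncMSE` because `%` is not valid in a plain R name); randomForest's `importance()` function can also report %IncMSE scaled by its standard error through its `scale` argument:

```r
# Three rows copied from the importance table above
imp <- data.frame(
  IncMSE        = c(1259453.96, 1606190.67, 1562423.12),
  IncNodePurity = c(2.962653e+08, 2.412287e+08, 2.089318e+08),
  row.names     = c("carwidth", "carlength", "boreratio")
)

# Ranking by node purity (as in the table) vs. by increase in MSE
rownames(imp)[order(imp$IncNodePurity, decreasing = TRUE)]
rownames(imp)[order(imp$IncMSE, decreasing = TRUE)]
```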
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 3 7 12 14 27 38 52
## 14876.004 15712.824 19685.660 12917.123 19295.866 6811.736 8816.698 5931.224
## 53 56 60 67 71 90 97 98
## 5931.224 12843.357 10728.432 11522.212 28694.987 7302.545 7118.512 7419.763
## 113 115 116 121 132 136 141 148
## 15928.975 16856.017 13488.805 6492.623 10650.431 14363.554 7271.007 9956.282
## 152 154 155 161 163 165 166 176
## 6218.555 7696.061 8112.408 7228.502 7562.012 7880.615 9758.815 10145.972
## 178 180 181 190 195 197 199 204
## 10265.382 17558.471 16946.797 9936.677 16187.671 16526.521 19134.203 19221.157
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 14876.004 |
| 3 | 16500 | 15712.824 |
| 7 | 17710 | 19685.660 |
| 12 | 16925 | 12917.123 |
| 14 | 21105 | 19295.866 |
| 27 | 7609 | 6811.736 |
| 38 | 7895 | 8816.698 |
| 52 | 6095 | 5931.224 |
| 53 | 6795 | 5931.224 |
| 56 | 10945 | 12843.357 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 1875.285
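The three RMSE values printed so far can be gathered in one table and sorted, so the comparison the case asks for is explicit. A minimal sketch using the values from the outputs above (`rmse_rm`, `rmse_ar` and `rmse_rf` in the session):

```r
# RMSE values printed above for each model
resultados <- data.frame(
  modelo = c("Multiple linear regression", "Regression tree", "Random forest"),
  rmse = c(3272.124, 3142.833, 1875.285)
)

# Sort from lowest to highest error
resultados <- resultados[order(resultados$rmse), ]
resultados
resultados$modelo[1]  # model with the lowest RMSE on the validation data

# Relative reduction of the random forest with respect to the regression tree
(3142.833 - 1875.285) / 3142.833
```

In the session, `c(rmse_rm, rmse_ar, rmse_rf)` would replace the hard-coded numbers.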
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
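With all predictions in one data frame, the error of every model can be computed in a single pass over the `predicciones_` columns. A sketch on three rows copied from the table below (cars 1, 3 and 7); with so few rows the numbers are only illustrative, and applied to the full `comparaciones` the same `sapply()` should return the three RMSE values computed earlier:

```r
# Three validation rows copied from the comparison table (cars 1, 3 and 7)
comp <- data.frame(
  price = c(13495, 16500, 17710),
  predicciones_rm = c(14894.135, 7155.108, 20258.961),
  predicciones_ar = c(13697.765, 17427.568, 13697.765),
  predicciones_rf = c(14876.004, 15712.824, 19685.660)
)

# RMSE of every prediction column against the actual price
columnas_pred <- grep("^predicciones_", names(comp), value = TRUE)
rmse_por_modelo <- sapply(columnas_pred, function(col) {
  sqrt(mean((comp$price - comp[[col]])^2))
})
round(rmse_por_modelo, 1)
```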
The predictions of each model are displayed.
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
|  | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 | 14894.135 | 13697.765 | 14876.004 |
| 3 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.470 | 9.0 | 154 | 5000 | 19 | 26 | 16500 | 7155.108 | 17427.568 | 15712.824 |
| 7 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 17710 | 20258.961 | 13697.765 | 19685.660 |
| 12 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925 | 13669.073 | 11779.667 | 12917.123 |
| 14 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.190 | 9.0 | 121 | 4250 | 21 | 28 | 21105 | 20687.997 | 17427.568 | 19295.866 |
| 27 | gas | std | four | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1989 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 7609 | 6777.815 | 7738.115 | 6811.736 |
| 38 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2236 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 7895 | 9636.339 | 7738.115 | 8816.698 |
| 52 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 | 5680.662 | 7738.115 | 5931.224 |
| 53 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6795 | 5699.929 | 7738.115 | 5931.224 |
| 56 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 | 12735.366 | 13697.765 | 12843.357 |
| 60 | gas | std | two | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | ohc | four | 122 | 2bbl | 3.39 | 3.390 | 8.6 | 84 | 4800 | 26 | 32 | 8845 | 10401.997 | 11779.667 | 10728.433 |
| 67 | diesel | std | four | sedan | rwd | front | 104.9 | 175.0 | 66.1 | 54.4 | 2700 | ohc | four | 134 | idi | 3.43 | 3.640 | 22.0 | 72 | 4200 | 31 | 39 | 18344 | 10804.809 | 7738.115 | 11522.212 |
| 71 | diesel | turbo | four | sedan | rwd | front | 115.6 | 202.6 | 71.7 | 56.3 | 3770 | ohc | five | 183 | idi | 3.58 | 3.640 | 21.5 | 123 | 4350 | 22 | 25 | 31600 | 27861.474 | 34430.324 | 28694.987 |
| 90 | gas | std | two | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1889 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 5499 | 6863.488 | 7738.115 | 7302.545 |
| 97 | gas | std | four | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7499 | 6551.147 | 7738.115 | 7118.512 |
| 98 | gas | std | four | wagon | fwd | front | 94.5 | 170.2 | 63.8 | 53.5 | 2037 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7999 | 5113.741 | 7738.115 | 7419.763 |
| 113 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 16900 | 16459.548 | 11779.667 | 15928.975 |
| 115 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3485 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 25 | 25 | 17075 | 14761.806 | 13697.765 | 16856.017 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 11449.746 | 13697.765 | 13488.805 |
| 121 | gas | std | four | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | ohc | four | 90 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 5399.805 | 7738.115 | 6492.623 |
| 132 | gas | std | two | hatchback | fwd | front | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | ohc | four | 132 | mpfi | 3.46 | 3.900 | 8.7 | 90 | 5100 | 23 | 31 | 9895 | 9474.092 | 11779.667 | 10650.431 |
| 136 | gas | std | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | ohc | four | 121 | mpfi | 3.54 | 3.070 | 9.3 | 110 | 5250 | 21 | 28 | 15510 | 14441.362 | 13697.765 | 14363.554 |
| 141 | gas | std | two | hatchback | 4wd | front | 93.3 | 157.3 | 63.8 | 55.7 | 2240 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7603 | 6104.461 | 7738.115 | 7271.007 |
| 148 | gas | std | four | wagon | fwd | front | 97.0 | 173.5 | 65.4 | 53.0 | 2455 | ohcf | four | 108 | mpfi | 3.62 | 2.640 | 9.0 | 94 | 5200 | 25 | 31 | 10198 | 8839.126 | 7738.115 | 9956.282 |
| 152 | gas | std | two | hatchback | fwd | front | 95.7 | 158.7 | 63.6 | 54.5 | 2040 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 38 | 6338 | 5827.832 | 7738.115 | 6218.555 |
| 154 | gas | std | four | wagon | fwd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2280 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 31 | 37 | 6918 | 6193.676 | 7738.115 | 7696.061 |
| 155 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 5619.714 | 7738.115 | 8112.408 |
| 161 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2094 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 38 | 47 | 7738 | 8703.080 | 7738.115 | 7228.502 |
| 163 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 28 | 34 | 9258 | 7708.080 | 7738.115 | 7562.012 |
| 165 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2204 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 29 | 34 | 8238 | 6606.759 | 7738.115 | 7880.615 |
| 166 | gas | std | two | sedan | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9298 | 10188.249 | 7738.115 | 9758.815 |
| 176 | gas | std | four | hatchback | fwd | front | 102.4 | 175.6 | 66.5 | 53.9 | 2414 | ohc | four | 122 | mpfi | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 9988 | 6464.469 | 11779.667 | 10145.972 |
| 178 | gas | std | four | hatchback | fwd | front | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | ohc | four | 122 | mpfi | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 11248 | 6634.020 | 11779.667 | 10265.382 |
| 180 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 3016 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.3 | 161 | 5200 | 19 | 24 | 15998 | 22438.604 | 17427.568 | 17558.471 |
| 181 | gas | std | four | sedan | rwd | front | 104.5 | 187.8 | 66.5 | 54.1 | 3131 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.2 | 156 | 5200 | 20 | 24 | 15690 | 22973.060 | 17427.568 | 16946.797 |
| 190 | gas | std | two | convertible | fwd | front | 94.5 | 159.3 | 64.2 | 55.6 | 2254 | ohc | four | 109 | mpfi | 3.19 | 3.400 | 8.5 | 90 | 5500 | 24 | 29 | 11595 | 12646.596 | 7738.115 | 9936.677 |
| 195 | gas | std | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.5 | 114 | 5400 | 23 | 28 | 12940 | 18073.803 | 17427.568 | 16187.671 |
| 197 | gas | std | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 2935 | ohc | four | 141 | mpfi | 3.78 | 3.150 | 9.5 | 114 | 5400 | 24 | 28 | 15985 | 18173.967 | 17427.568 | 16526.521 |
| 199 | gas | turbo | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 3045 | ohc | four | 130 | mpfi | 3.62 | 3.150 | 7.5 | 162 | 5100 | 17 | 22 | 18420 | 18748.983 | 17427.568 | 19134.203 |
| 204 | diesel | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3217 | ohc | six | 145 | idi | 3.01 | 3.400 | 23.0 | 106 | 4800 | 26 | 27 | 22470 | 26624.392 | 13697.765 | 19221.158 |
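The last three columns of the table hold the validation-set predictions of the multiple regression, the regression tree, and the random forest. Below is a minimal sketch of how each model's RMSE can be computed by hand from columns like these. It assumes only the five rows copied from the top of the table, so the values are illustrative and will not match the document's RMSE on the full validation set. The document itself presumably relies on `Metrics::rmse(actual, predicted)`.

```r
# Minimal sketch: RMSE computed by hand for each model.
# ASSUMPTION: only five validation rows copied from the table above are used,
# so these values will NOT match the document's RMSE on the full validation set.
comparison <- data.frame(
  price = c(21105, 7609, 7895, 6095, 6795),                       # observed price
  rm    = c(20687.997, 6777.815, 9636.339, 5680.662, 5699.929),   # multiple regression
  ar    = c(17427.568, 7738.115, 7738.115, 7738.115, 7738.115),   # regression tree
  rf    = c(19295.866, 6811.736, 8816.698, 5931.224, 5931.224)    # random forest
)

# RMSE = square root of the mean of the squared residuals
rmse_by_hand <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Apply the same error measure to each model's prediction column
rmse_subset <- sapply(comparison[c("rm", "ar", "rf")],
                      function(pred) rmse_by_hand(comparison$price, pred))
print(round(rmse_subset, 3))
```

The regression tree repeats the value 7738.115 across several rows. A tree assigns the same predicted price to every vehicle that falls into the same leaf, so its predictions form a small set of steps rather than a continuous range.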
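Compare the RMSE of the three models: multiple linear regression (`rm`), regression tree (`ar`), and random forest (`rf`). Lower values mean smaller prediction errors on the validation data.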
# Gather the RMSE of each model into a single one-row table
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3272.124 | 3142.833 | 1875.285 |
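For reference, the statistic in the table is the root mean squared error over the $n$ vehicles in the validation set. Here $y_i$ is the observed price and $\hat{y}_i$ is the model's prediction:

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
$$

The squared errors are averaged and then square-rooted, so RMSE is expressed in the same units as `price`. This makes the three values directly comparable. Relative to the multiple regression, the reductions are:

$$
\frac{3272.124 - 1875.285}{3272.124} \approx 0.427, \qquad \frac{3272.124 - 3142.833}{3272.124} \approx 0.040
$$

The random forest therefore lowers the RMSE by roughly 43%, while the regression tree improves on the regression by only about 4%.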
All numeric and categorical variables in the car-price dataset were used as predictors of the price.
The multiple linear regression model identifies several predictors as statistically significant.
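Significance at the 90% confidence level means a coefficient p-value below 0.10. The sketch below shows one way to list those predictors and read the adjusted R-squared from an `lm` fit. It assumes the built-in `mtcars` data as a stand-in for the car-price training set. In the document, the equivalent model would be `lm(price ~ ., data = <training data>)`, where the data-frame name is a placeholder.

```r
# Minimal sketch: predictors significant at the 90% confidence level (p < 0.10).
# ASSUMPTION: the built-in mtcars data stands in for the car-price training set;
# in the document the model would be lm(price ~ ., data = <training data>).
model <- lm(mpg ~ ., data = mtcars)     # "." uses every other column as a predictor
model_summary <- summary(model)

# Coefficient matrix columns: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- model_summary$coefficients
significant <- coefs[coefs[, "Pr(>|t|)"] < 0.10, , drop = FALSE]
print(significant)

# Share of the response's variability explained, penalised by the number of predictors
adj_r2 <- model_summary$adj.r.squared
cat("Adjusted R-squared:", round(adj_r2, 4), "\n")
```

With categorical predictors such as `carbody` or `enginetype`, `lm` creates one coefficient per non-reference level. A "significant variable" is then really a significant level compared with the baseline category.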
Judged by the root mean squared error (RMSE), the best model for this data and this 80% training / 20% validation split was the random forest (RMSE 1875.285). It was followed by the regression tree (3142.833) and the multiple linear regression (3272.124).
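The ranking above comes from a single random 80/20 partition, and a different partition can change each RMSE. The following minimal sketch shows how much the validation RMSE of one model shifts across seeds. It makes three assumptions: `mtcars` stands in for the car-price data, base R `sample()` replaces the `caret` partitioning used in the document, and only the linear model is refitted.

```r
# Minimal sketch: how much does validation RMSE move when the 80/20 split changes?
# ASSUMPTIONS: mtcars stands in for the car-price data, base R sample() replaces
# caret's partitioning, and only the linear model is refitted on each split.
rmse_by_hand <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

split_rmse <- function(seed, data = mtcars, train_share = 0.80) {
  set.seed(seed)
  n <- nrow(data)
  train_idx <- sample(seq_len(n), size = floor(train_share * n))  # 80% training rows
  train <- data[train_idx, ]
  test  <- data[-train_idx, ]                                      # remaining 20% for validation
  fit <- lm(mpg ~ ., data = train)
  rmse_by_hand(test$mpg, predict(fit, newdata = test))
}

# One validation RMSE per seed
rmse_per_seed <- sapply(1:10, split_rmse)
print(round(rmse_per_seed, 3))
cat("Range of RMSE across splits:", round(range(rmse_per_seed), 3), "\n")
```

If the random forest stays ahead when the same loop is applied to all three models, the conclusion rests on firmer ground. K-fold cross-validation, for example through `caret::trainControl(method = "cv")`, is a more systematic version of the same idea.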