Compare supervised models by applying prediction algorithms to car prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model is used to answer questions such as:
Which variables are significant predictors at the 90% confidence level or above?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of the variables for price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of the variables for price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For the regression tree
library(rpart.plot) # For plotting the tree
library(randomForest) # For random forest
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
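The `stringsAsFactors = TRUE` argument is what makes the text columns show up as `Factor` in the structure listing. A minimal sketch of that effect, assuming a tiny inline CSV with made-up rows instead of the real file (the names `csv_texto`, `con_factores` and `sin_factores` are illustrative only):

```r
# Tiny inline CSV (hypothetical values) standing in for the real file
csv_texto <- "fueltype,price\ngas,13495\ndiesel,18344\ngas,16500"

con_factores <- read.csv(text = csv_texto, stringsAsFactors = TRUE)
sin_factores <- read.csv(text = csv_texto, stringsAsFactors = FALSE)

class(con_factores$fueltype)   # factor
class(sin_factores$fueltype)   # character
levels(con_factores$fueltype)  # levels are stored in sorted order
```

Reading text columns as factors matters later, because the tree and forest models treat factor columns as categorical predictors.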
There are 205 observations and 26 variables, both numeric and categorical.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating; a value of +3 indicates that the auto is risky, -3 that it is probably pretty safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in a car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors in a car (Categorical) |
| 7 | carbody | Body of car: convertible, sedan, wagon, … (Categorical) |
| 8 | drivewheel | Type of drive wheel: 4wd, fwd, rwd (Categorical) |
| 9 | enginelocation | Location of car engine (Categorical) |
| 10 | wheelbase | Wheelbase of car: distance between axles, in inches (Numeric) |
| 11 | carlength | Length of car (Numeric) |
| 12 | carwidth | Width of car (Numeric) |
| 13 | carheight | Height of car (Numeric) |
| 14 | curbweight | Weight of a car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of car (Categorical) |
| 19 | boreratio | Bore ratio of car (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric) |
| 21 | compressionratio | Compression ratio of car (Numeric) |
| 22 | horsepower | Horsepower (Numeric) |
| 23 | peakrpm | Car peak rpm: peak revolutions per minute (Numeric) |
| 24 | citympg | Mileage in city, in miles per gallon (Numeric) |
| 25 | highwaympg | Mileage on highway, in miles per gallon (Numeric) |
| 26 | price | Price of car, in dollars (Numeric). Dependent variable |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables of no statistical interest, that is, columns 1 and 3, car_ID and CarName.
datos <- datos[, c(2,4:26)]
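Selecting by position works here, but it depends on the column order of the file. As a hedged alternative, the same two columns can be dropped by name. The sketch below uses a small toy data frame with made-up values (`datos_demo` and `quitar` are illustrative names, not part of the original case):

```r
# Toy data frame (hypothetical values) with the same identifier columns
datos_demo <- data.frame(
  car_ID = 1:3,
  symboling = c(3, 3, 1),
  CarName = c("a", "b", "c"),
  price = c(13495, 16500, 16500)
)

# Drop the identifier columns by name instead of by position
quitar <- c("car_ID", "CarName")
datos_demo <- datos_demo[, !(names(datos_demo) %in% quitar), drop = FALSE]

names(datos_demo)
```

Dropping by name keeps working if columns are later added or reordered in the source file.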
The first records again
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data are 80% of the observations and the validation data are the remaining 20%.
n <- nrow(datos)
set.seed(2001) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
datos.validacion <- datos[-entrena, ]
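For a numeric outcome, `createDataPartition()` samples within groups of the outcome, so both subsets end up with a similar price distribution. The sketch below is a simpler base-R stand-in that draws a plain random 80/20 split without that grouping. It runs on toy data with made-up prices, and names such as `datos_demo` and `filas_entrena` are illustrative assumptions:

```r
set.seed(2001)

# Toy data (hypothetical): 205 rows with a numeric outcome
datos_demo <- data.frame(id = 1:205, price = runif(205, 5000, 45000))

n <- nrow(datos_demo)
# Plain random sample of 80% of the row indices (no stratification)
filas_entrena <- sample(seq_len(n), size = floor(0.80 * n))

entrenamiento_demo <- datos_demo[filas_entrena, ]
validacion_demo <- datos_demo[-filas_entrena, ]

nrow(entrenamiento_demo)
nrow(validacion_demo)
# No row appears in both subsets
length(intersect(entrenamiento_demo$id, validacion_demo$id))
```

The subset sizes of this sketch can differ by a row or so from those produced by caret, since caret rounds within each group.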
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 21105.00 |
| 15 | 1 | gas | std | four | sedan | rwd | front | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 20 | 25 | 24565.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925 |
| 28 | 1 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.39 | 7.6 | 102 | 5500 | 24 | 30 | 8558 |
| 33 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | ohc | four | 79 | 1bbl | 2.91 | 3.07 | 10.1 | 60 | 5500 | 38 | 42 | 5399 |
| 39 | 0 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.58 | 9.0 | 86 | 5800 | 27 | 33 | 9095 |
| 48 | 0 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.1 | 176 | 4750 | 15 | 19 | 32250 |
| 51 | 1 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1890 | ohc | four | 91 | 2bbl | 3.03 | 3.15 | 9.0 | 68 | 5000 | 30 | 31 | 5195 |
The multiple linear regression model (rm) is built: price as a function of all the independent variables, both numeric and non-numeric.
# Multiple linear regression model, used to see which variables matter
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5099.8 -1077.3 -38.6 979.2 8776.0
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.776e+04 1.945e+04 -1.427 0.156154
## symboling 1.028e+02 2.699e+02 0.381 0.703945
## fueltypegas -1.092e+04 8.066e+03 -1.354 0.178316
## aspirationturbo 2.210e+03 1.015e+03 2.176 0.031464 *
## doornumbertwo 4.365e+02 6.578e+02 0.664 0.508227
## carbodyhardtop -2.619e+03 1.531e+03 -1.711 0.089670 .
## carbodyhatchback -2.713e+03 1.468e+03 -1.848 0.066952 .
## carbodysedan -1.619e+03 1.588e+03 -1.020 0.309817
## carbodywagon -2.448e+03 1.675e+03 -1.462 0.146294
## drivewheelfwd 9.917e+02 1.301e+03 0.762 0.447398
## drivewheelrwd 1.749e+03 1.413e+03 1.238 0.217989
## enginelocationrear 6.391e+03 2.732e+03 2.339 0.020933 *
## wheelbase 5.580e+01 1.120e+02 0.498 0.619259
## carlength -4.123e+01 5.709e+01 -0.722 0.471543
## carwidth 5.989e+02 2.706e+02 2.214 0.028702 *
## carheight 8.927e+01 1.452e+02 0.615 0.539792
## curbweight 4.619e+00 1.975e+00 2.338 0.020992 *
## enginetypedohcv -5.236e+03 5.432e+03 -0.964 0.336980
## enginetypel 7.972e+02 2.017e+03 0.395 0.693401
## enginetypeohc 4.634e+03 1.221e+03 3.796 0.000229 ***
## enginetypeohcf 4.502e+03 2.093e+03 2.150 0.033482 *
## enginetypeohcv -5.310e+03 1.499e+03 -3.542 0.000562 ***
## enginetyperotor 1.403e+03 4.905e+03 0.286 0.775358
## cylindernumberfive -1.022e+04 3.045e+03 -3.358 0.001046 **
## cylindernumberfour -1.085e+04 3.424e+03 -3.169 0.001932 **
## cylindernumbersix -6.745e+03 2.542e+03 -2.653 0.009035 **
## cylindernumberthree -2.159e+03 4.984e+03 -0.433 0.665694
## cylindernumbertwelve -1.038e+04 4.961e+03 -2.092 0.038471 *
## cylindernumbertwo NA NA NA NA
## enginesize 1.231e+02 2.918e+01 4.220 4.71e-05 ***
## fuelsystem2bbl 5.847e+02 1.014e+03 0.577 0.565279
## fuelsystem4bbl -5.498e+02 2.956e+03 -0.186 0.852758
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.901e+03 2.655e+03 -1.469 0.144275
## fuelsystemmpfi 2.014e+02 1.104e+03 0.182 0.855586
## fuelsystemspdi -3.027e+03 1.484e+03 -2.040 0.043506 *
## fuelsystemspfi 1.730e+02 2.550e+03 0.068 0.946008
## boreratio -3.343e+03 1.749e+03 -1.911 0.058352 .
## stroke -3.312e+03 1.149e+03 -2.883 0.004653 **
## compressionratio -7.818e+02 6.046e+02 -1.293 0.198386
## horsepower 1.032e+01 2.531e+01 0.408 0.684178
## peakrpm 2.470e+00 6.987e-01 3.536 0.000574 ***
## citympg -2.417e+01 1.695e+02 -0.143 0.886828
## highwaympg 1.236e+02 1.636e+02 0.755 0.451527
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2177 on 123 degrees of freedom
## Multiple R-squared: 0.9502, Adjusted R-squared: 0.9336
## F-statistic: 57.21 on 41 and 123 DF, p-value: < 2.2e-16
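Which variables are significant predictors at the 90% confidence level or above?
The intercept is not significant at this level (p = 0.156).
The following coefficients have p-values below 0.10: aspirationturbo, carbodyhardtop, carbodyhatchback, enginelocationrear, carwidth, curbweight, enginetypeohc, enginetypeohcf, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, cylindernumbertwelve, enginesize, fuelsystemspdi, boreratio, stroke and peakrpm.
In multiple linear models, the statistic Adjusted R-squared: 0.9336 means that the independent variables explain approximately 93.36% of the variance of the dependent variable price, after adjusting for the number of predictors.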
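Both answers can be checked with a few lines of base R. The sketch below first recomputes the adjusted R-squared from the figures printed in the summary, and then shows one way to keep only the coefficients with p < 0.10. Since the fitted `modelo_rm` is not available here, the second part uses the built-in `mtcars` data as a stand-in; all object names are illustrative assumptions.

```r
# Part 1: adjusted R-squared from the values printed in the summary
r2 <- 0.9502         # Multiple R-squared
gl_residual <- 123   # residual degrees of freedom
gl_modelo <- 41      # numerator degrees of freedom of the F-statistic
n <- gl_residual + gl_modelo + 1  # 165 training observations

r2_ajustado <- 1 - (1 - r2) * (n - 1) / gl_residual
round(r2_ajustado, 4)  # 0.9336, as reported in the summary

# Part 2: keep coefficients with p < 0.10 (mtcars used as a stand-in)
ajuste <- lm(mpg ~ wt + hp + qsec, data = mtcars)
coefs <- summary(ajuste)$coefficients
significativos <- coefs[coefs[, "Pr(>|t|)"] < 0.10, , drop = FALSE]
rownames(significativos)
```

Applied to the real model, the same filter on `summary(modelo_rm)$coefficients` would list the rows marked with `.`, `*`, `**` or `***`.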
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
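This warning is connected to the two coefficients that the summary reports as "not defined because of singularities" (the `NA` rows for `cylindernumbertwo` and `fuelsystemidi`). That situation arises when one column of the design matrix is an exact linear combination of others. A minimal sketch on toy data, where `x2` is deliberately built as a multiple of `x1` (all names and values here are made up for illustration):

```r
set.seed(1)

# Toy data (hypothetical): x2 is an exact multiple of x1
x1 <- rnorm(30)
x2 <- 2 * x1
y <- 1 + 3 * x1 + rnorm(30)

ajuste <- lm(y ~ x1 + x2)

# Coefficients that lm() could not estimate are reported as NA
no_estimables <- names(which(is.na(coef(ajuste))))
no_estimables

# The fitted rank is lower than the number of coefficients
ajuste$rank
length(coef(ajuste))
```

The same `is.na(coef(...))` check on the real model would be one way to list the redundant dummy variables before predicting.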
predicciones_rm
## 1 2 5 7 12 28 33 39
## 10491.606 10491.606 16072.654 19771.032 12360.751 11558.929 4897.325 9446.219
## 48 51 52 54 55 56 67 78
## 33188.953 4924.501 5811.561 6358.926 6214.886 12733.453 12679.975 7031.581
## 81 83 95 100 111 112 113 116
## 9642.310 14138.856 6701.187 9816.481 15911.291 15385.279 16807.441 12093.908
## 124 126 134 136 137 140 147 155
## 10543.056 17254.306 12692.948 12983.924 11405.066 6355.460 8240.997 5394.102
## 158 159 166 170 178 180 188 203
## 6241.136 6944.200 7253.067 12124.164 8011.923 20629.708 9834.068 18259.272
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 2001 and run the tests, we conclude that the training data must cover all the values of the categorical variables that appear in the validation data; that is, the validation data must not contain values that the model was not trained on.
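One way to verify this condition before predicting is to compare the categorical values of both subsets. The sketch below is a base-R helper under stated assumptions: the function `niveles_no_vistos()` and the toy data frames are hypothetical and not part of the original case.

```r
# Toy training and validation sets (hypothetical values)
entrenamiento_demo <- data.frame(
  fuelsystem = factor(c("mpfi", "2bbl", "mpfi", "idi")),
  carbody = factor(c("sedan", "wagon", "sedan", "hatchback"))
)
validacion_demo <- data.frame(
  fuelsystem = factor(c("mpfi", "spfi")),
  carbody = factor(c("sedan", "wagon"))
)

# For each factor column, list validation values never seen in training
niveles_no_vistos <- function(entrena, valida) {
  columnas <- names(entrena)[sapply(entrena, is.factor)]
  faltantes <- lapply(columnas, function(col) {
    setdiff(unique(as.character(valida[[col]])),
            unique(as.character(entrena[[col]])))
  })
  names(faltantes) <- columnas
  faltantes[lengths(faltantes) > 0]
}

niveles_no_vistos(entrenamiento_demo, validacion_demo)
```

An empty result means every categorical value in validation was seen in training. With `lm()`, a non-empty result matters because `predict()` stops with an error when `newdata` contains a factor level the model has not seen.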
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 10491.606 |
| 2 | 16500 | 10491.606 |
| 5 | 17450 | 16072.654 |
| 7 | 17710 | 19771.032 |
| 12 | 16925 | 12360.751 |
| 28 | 8558 | 11558.929 |
| 33 | 5399 | 4897.325 |
| 39 | 9095 | 9446.219 |
| 48 | 32250 | 33188.953 |
| 51 | 5195 | 4924.501 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2674.766
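The `rmse()` function from the Metrics package is used here as a black box. As a sketch of what the statistic measures, RMSE is the square root of the mean of the squared differences between actual and predicted values. The helper `rmse_manual()` below is an illustrative name, and the vectors hold only the first 10 validation rows shown in the comparison table, so the result is not expected to equal the value computed over all validation rows.

```r
# RMSE: square root of the mean squared difference
rmse_manual <- function(real, prediccion) {
  sqrt(mean((real - prediccion)^2))
}

# First 10 validation rows from the comparison table above
precio_real <- c(13495, 16500, 17450, 17710, 16925,
                 8558, 5399, 9095, 32250, 5195)
precio_predicciones <- c(10491.606, 10491.606, 16072.654, 19771.032,
                         12360.751, 11558.929, 4897.325, 9446.219,
                         33188.953, 4924.501)

rmse_manual(precio_real, precio_predicciones)
```

Because the differences are squared before averaging, a few large misses (such as row 2) weigh more heavily than many small ones. The result is expressed in the same units as the price.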
The regression tree model (ar) is built
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 11703110000 13514.470
## 2) enginesize< 182 148 3024234000 11116.370
## 4) curbweight< 2659.5 101 543374300 8663.851
## 8) curbweight< 2286.5 58 80815120 7283.776 *
## 9) curbweight>=2286.5 43 203089500 10525.350 *
## 5) curbweight>=2659.5 47 567888200 16386.660 *
## 3) enginesize>=182 17 417867800 34392.090 *
Pending
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 1 2 5 7 12 28 33 39
## 10525.349 10525.349 16386.663 16386.663 10525.349 7283.776 7283.776 10525.349
## 48 51 52 54 55 56 67 78
## 34392.088 7283.776 7283.776 7283.776 7283.776 10525.349 16386.663 7283.776
## 81 83 95 100 111 112 113 116
## 10525.349 16386.663 7283.776 10525.349 16386.663 16386.663 16386.663 16386.663
## 124 126 134 136 137 140 147 155
## 10525.349 16386.663 16386.663 16386.663 16386.663 7283.776 10525.349 10525.349
## 158 159 166 170 178 180 188 203
## 7283.776 7283.776 7283.776 10525.349 10525.349 16386.663 10525.349 16386.663
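The predictions above take only four distinct values because the printed tree has four terminal nodes, and each node predicts the mean price of the training cars that fall into it. The sketch below hand-codes the printed splits as nested conditions. The function name `predecir_arbol()` is an illustrative assumption, and the leaf values are copied from the printout, so they carry its rounding.

```r
# Leaf values copied from the printed tree (rounded as printed)
predecir_arbol <- function(enginesize, curbweight) {
  ifelse(enginesize >= 182, 34392.090,
    ifelse(curbweight >= 2659.5, 16386.660,
      ifelse(curbweight >= 2286.5, 10525.350, 7283.776)))
}

# Four validation cars: rows 1, 5, 33 and 48
predecir_arbol(
  enginesize = c(130, 136, 79, 258),
  curbweight = c(2548, 2824, 1837, 4066)
)
```

Up to that rounding, the four results agree with the values that `predict()` returned for the same rows, which shows that the tree uses only `enginesize` and `curbweight` to predict.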
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 10525.349 |
| 2 | 16500 | 10525.349 |
| 5 | 17450 | 16386.663 |
| 7 | 17710 | 16386.663 |
| 12 | 16925 | 10525.349 |
| 28 | 8558 | 7283.776 |
| 33 | 5399 | 7283.776 |
| 39 | 9095 | 10525.349 |
| 48 | 32250 | 34392.088 |
| 51 | 5195 | 7283.776 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 2452.158
The random forest model (rf) is built with 20 trees
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 6556184
## % Var explained: 90.76
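Two figures in this output can be reproduced by hand. The sketch below assumes the randomForest default for regression of trying one third of the predictors at each split. It also assumes that "% Var explained" compares the out-of-bag mean of squared residuals with the variance of the training prices, computed with divisor n. The root deviance is taken from the regression tree printout, and the variable names are illustrative.

```r
# Default number of variables tried at each split for regression:
# one third of the predictors, rounded down
n_predictores <- 23
mtry_defecto <- max(floor(n_predictores / 3), 1)
mtry_defecto

# "% Var explained" from the mean of squared residuals
mse_oob <- 6556184              # "Mean of squared residuals" in the output
desviacion_raiz <- 11703110000  # root deviance printed by the regression tree
n <- 165                        # training observations
varianza_precio <- desviacion_raiz / n  # variance with divisor n

pct_var_explicada <- 100 * (1 - mse_oob / varianza_precio)
round(pct_var_explicada, 2)
```

Both results agree with the printed output (7 variables per split and 90.76% of variance explained), which supports the assumptions above for this fit.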
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 34272896.52 4448199374
## curbweight 9191300.79 1725800032
## carlength 4823061.86 1209110837
## cylindernumber 12705175.54 1108286753
## citympg 4797379.24 972917499
## carwidth 4906304.34 876779703
## highwaympg 3342776.77 429927566
## horsepower 5043436.70 347986801
## fuelsystem 4526153.48 234288443
## wheelbase 2380672.69 186638380
## drivewheel 813992.64 143191902
## compressionratio 672518.13 98369788
## carbody 867692.53 86982750
## enginelocation 0.00 65328957
## peakrpm 421432.24 58268966
## carheight 134105.15 54675570
## enginetype 215351.67 47026349
## stroke 679348.36 45497385
## aspiration 80020.25 35845022
## boreratio 729426.47 29107861
## fueltype 360747.89 17624606
## symboling -73258.15 15621933
## doornumber 75131.74 7980449
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 1 2 5 7 12 28 33 39
## 13623.183 13623.183 15681.207 22288.179 13859.846 8436.427 6158.163 8473.876
## 48 51 52 54 55 56 67 78
## 35511.482 6586.418 6494.941 6875.696 6875.696 13074.809 12531.550 6431.780
## 81 83 95 100 111 112 113 116
## 9751.479 13957.914 6543.202 9404.517 20878.033 14599.592 15891.083 14773.392
## 124 126 134 136 137 140 147 155
## 9140.462 15856.657 14638.054 14638.054 16876.173 7346.447 8439.802 8112.638
## 158 159 166 170 178 180 188 203
## 7781.830 7797.333 9778.212 10060.150 10354.529 16909.436 8528.115 20142.060
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs. predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 1 | 13495 | 13623.183 |
| 2 | 16500 | 13623.183 |
| 5 | 17450 | 15681.207 |
| 7 | 17710 | 22288.179 |
| 12 | 16925 | 13859.846 |
| 28 | 8558 | 8436.427 |
| 33 | 5399 | 6158.163 |
| 39 | 9095 | 8473.876 |
| 48 | 32250 | 35511.482 |
| 51 | 5195 | 6586.418 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2258.095
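With the three RMSE values available, the comparison stated in the objective can be put in a single table. This sketch copies the values reported above for seed 2001; the data frame name `resumen_rmse` is an illustrative assumption.

```r
# RMSE values reported above for each model
resumen_rmse <- data.frame(
  modelo = c("rm", "ar", "rf"),
  rmse = c(2674.766, 2452.158, 2258.095)
)

# Order from lowest (best) to highest RMSE
resumen_rmse <- resumen_rmse[order(resumen_rmse$rmse), ]
resumen_rmse

# Relative improvement of the best model over multiple regression
mejora_pct <- round(100 * (1 - min(resumen_rmse$rmse) / 2674.766), 1)
mejora_pct
```

For this particular split the random forest has the lowest error, followed by the regression tree and the multiple regression. With a validation set this small, the ranking could change under a different seed.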
The predictions are compared
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed
kable(comparaciones, caption = "Predictions of the models") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 13495 | 10491.606 | 10525.349 | 13623.183 |
| 2 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.680 | 9.0 | 111 | 5000 | 21 | 27 | 16500 | 10491.606 | 10525.349 | 13623.183 |
| 5 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.0 | 115 | 5500 | 18 | 22 | 17450 | 16072.654 | 16386.663 | 15681.207 |
| 7 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.400 | 8.5 | 110 | 5500 | 19 | 25 | 17710 | 19771.032 | 16386.663 | 22288.179 |
| 12 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.800 | 8.8 | 101 | 5800 | 23 | 29 | 16925 | 12360.751 | 10525.349 | 13859.846 |
| 28 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.390 | 7.6 | 102 | 5500 | 24 | 30 | 8558 | 11558.929 | 7283.776 | 8436.427 |
| 33 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1837 | ohc | four | 79 | 1bbl | 2.91 | 3.070 | 10.1 | 60 | 5500 | 38 | 42 | 5399 | 4897.325 | 7283.776 | 6158.163 |
| 39 | gas | std | two | hatchback | fwd | front | 96.5 | 167.5 | 65.2 | 53.3 | 2289 | ohc | four | 110 | 1bbl | 3.15 | 3.580 | 9.0 | 86 | 5800 | 27 | 33 | 9095 | 9446.219 | 10525.349 | 8473.876 |
| 48 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.170 | 8.1 | 176 | 4750 | 15 | 19 | 32250 | 33188.953 | 34392.088 | 35511.482 |
| 51 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1890 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 30 | 31 | 5195 | 4924.501 | 7283.776 | 6586.418 |
| 52 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6095 | 5811.561 | 7283.776 | 6494.941 |
| 54 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1945 | ohc | four | 91 | 2bbl | 3.03 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 6695 | 6358.926 | 7283.776 | 6875.696 |
| 55 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1950 | ohc | four | 91 | 2bbl | 3.08 | 3.150 | 9.0 | 68 | 5000 | 31 | 38 | 7395 | 6214.886 | 7283.776 | 6875.696 |
| 56 | gas | std | two | hatchback | rwd | front | 95.3 | 169.0 | 65.7 | 49.6 | 2380 | rotor | two | 70 | 4bbl | 3.33 | 3.255 | 9.4 | 101 | 6000 | 17 | 23 | 10945 | 12733.453 | 10525.349 | 13074.809 |
| 67 | diesel | std | four | sedan | rwd | front | 104.9 | 175.0 | 66.1 | 54.4 | 2700 | ohc | four | 134 | idi | 3.43 | 3.640 | 22.0 | 72 | 4200 | 31 | 39 | 18344 | 12679.975 | 16386.663 | 12531.550 |
| 78 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 64.4 | 50.8 | 1944 | ohc | four | 92 | 2bbl | 2.97 | 3.230 | 9.4 | 68 | 5500 | 31 | 38 | 6189 | 7031.581 | 7283.776 | 6431.780 |
| 81 | gas | turbo | two | hatchback | fwd | front | 96.3 | 173.0 | 65.4 | 49.4 | 2370 | ohc | four | 110 | spdi | 3.17 | 3.460 | 7.5 | 116 | 5500 | 23 | 30 | 9959 | 9642.310 | 10525.349 | 9751.479 |
| 83 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2833 | ohc | four | 156 | spdi | 3.58 | 3.860 | 7.0 | 145 | 5000 | 19 | 24 | 12629 | 14138.856 | 16386.663 | 13957.914 |
| 95 | gas | std | two | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1951 | ohc | four | 97 | 2bbl | 3.15 | 3.290 | 9.4 | 69 | 5200 | 31 | 37 | 7299 | 6701.187 | 7283.776 | 6543.202 |
| 100 | gas | std | four | hatchback | fwd | front | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | ohc | four | 120 | 2bbl | 3.33 | 3.470 | 8.5 | 97 | 5200 | 27 | 34 | 8949 | 9816.481 | 10525.349 | 9404.517 |
| 111 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3430 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 25 | 25 | 13860 | 15911.291 | 16386.663 | 20878.033 |
| 112 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 2.190 | 8.4 | 95 | 5000 | 19 | 24 | 15580 | 15385.279 | 16386.663 | 14599.592 |
| 113 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.520 | 21.0 | 95 | 4150 | 28 | 33 | 16900 | 16807.441 | 16386.663 | 15891.083 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.190 | 8.4 | 97 | 5000 | 19 | 24 | 16630 | 12093.908 | 16386.663 | 14773.392 |
| 124 | gas | std | four | wagon | fwd | front | 103.3 | 174.6 | 64.6 | 59.8 | 2535 | ohc | four | 122 | 2bbl | 3.35 | 3.460 | 8.5 | 88 | 5000 | 24 | 30 | 8921 | 10543.056 | 10525.349 | 9140.462 |
| 126 | gas | std | two | hatchback | rwd | front | 94.5 | 168.9 | 68.3 | 50.2 | 2778 | ohc | four | 151 | mpfi | 3.94 | 3.110 | 9.5 | 143 | 5500 | 19 | 27 | 22018 | 17254.306 | 16386.663 | 15856.657 |
| 134 | gas | std | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2695 | ohc | four | 121 | mpfi | 3.54 | 3.070 | 9.3 | 110 | 5250 | 21 | 28 | 12170 | 12692.948 | 16386.663 | 14638.054 |
| 136 | gas | std | four | sedan | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2758 | ohc | four | 121 | mpfi | 3.54 | 3.070 | 9.3 | 110 | 5250 | 21 | 28 | 15510 | 12983.924 | 16386.663 | 14638.054 |
| 137 | gas | turbo | two | hatchback | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2808 | dohc | four | 121 | mpfi | 3.54 | 3.070 | 9.0 | 160 | 5500 | 19 | 26 | 18150 | 11405.066 | 16386.663 | 16876.173 |
| 140 | gas | std | two | hatchback | fwd | front | 93.7 | 157.9 | 63.6 | 53.7 | 2120 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 8.7 | 73 | 4400 | 26 | 31 | 7053 | 6355.460 | 7283.776 | 7346.447 |
| 147 | gas | std | four | wagon | fwd | front | 97.0 | 173.5 | 65.4 | 53.0 | 2290 | ohcf | four | 108 | 2bbl | 3.62 | 2.640 | 9.0 | 82 | 4800 | 28 | 32 | 7463 | 8240.997 | 10525.349 | 8439.802 |
| 155 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | ohc | four | 92 | 2bbl | 3.05 | 3.030 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 5394.102 | 10525.349 | 8112.638 |
| 158 | gas | std | four | hatchback | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2109 | ohc | four | 98 | 2bbl | 3.19 | 3.030 | 9.0 | 70 | 4800 | 30 | 37 | 7198 | 6241.136 | 7283.776 | 7781.830 |
| 159 | diesel | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2275 | ohc | four | 110 | idi | 3.27 | 3.350 | 22.5 | 56 | 4500 | 34 | 36 | 7898 | 6944.200 | 7283.776 | 7797.333 |
| 166 | gas | std | two | sedan | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2265 | dohc | four | 98 | mpfi | 3.24 | 3.080 | 9.4 | 112 | 6600 | 26 | 29 | 9298 | 7253.067 | 7283.776 | 9778.212 |
| 170 | gas | std | two | hatchback | rwd | front | 98.4 | 176.2 | 65.6 | 52.0 | 2551 | ohc | four | 146 | mpfi | 3.62 | 3.500 | 9.3 | 116 | 4800 | 24 | 30 | 9989 | 12124.164 | 10525.349 | 10060.150 |
| 178 | gas | std | four | hatchback | fwd | front | 102.4 | 175.6 | 66.5 | 53.9 | 2458 | ohc | four | 122 | mpfi | 3.31 | 3.540 | 8.7 | 92 | 4200 | 27 | 32 | 11248 | 8011.923 | 10525.349 | 10354.529 |
| 180 | gas | std | two | hatchback | rwd | front | 102.9 | 183.5 | 67.7 | 52.0 | 3016 | dohc | six | 171 | mpfi | 3.27 | 3.350 | 9.3 | 161 | 5200 | 19 | 24 | 15998 | 20629.708 | 16386.663 | 16909.436 |
| 188 | diesel | turbo | four | sedan | fwd | front | 97.3 | 171.7 | 65.5 | 55.7 | 2319 | ohc | four | 97 | idi | 3.01 | 3.400 | 23.0 | 68 | 4500 | 37 | 42 | 9495 | 9834.068 | 10525.349 | 8528.115 |
| 203 | gas | std | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3012 | ohcv | six | 173 | mpfi | 3.58 | 2.870 | 8.8 | 134 | 5500 | 18 | 23 | 21485 | 18259.272 | 16386.663 | 20142.060 |
The RMSE of the three models is compared
# Collect the RMSE of each model: rm = multiple regression, ar = regression tree, rf = random forest
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2674.766 | 2452.158 | 2258.095 |
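For reference, RMSE is the square root of the mean squared difference between observed and predicted prices. The sketch below computes it in base R on three rows copied from the validation table above (cars 28, 33 and 39). The helper name `rmse_manual` is hypothetical, and reading the last three columns of that table as the predictions of the regression, tree and forest models, in that order, is an assumption. With only three rows, the result will not match the full-sample values shown in the comparison table.

```r
# Observed prices and predictions for three validation rows (cars 28, 33, 39),
# copied from the table above. The column order rm / ar / rf is an assumption.
precio  <- c(8558, 5399, 9095)
pred_rm <- c(11558.929, 4897.325, 9446.219)
pred_ar <- c(7283.776, 7283.776, 10525.349)
pred_rf <- c(8436.427, 6158.163, 8473.876)

# Hypothetical helper: square root of the mean squared error
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# RMSE of each model on this three-row sample only
rmse_sample <- c(
  rm = rmse_manual(precio, pred_rm),
  ar = rmse_manual(precio, pred_ar),
  rf = rmse_manual(precio, pred_rf)
)
print(rmse_sample)
```

The `rmse(actual, predicted)` function from the Metrics package, which this document loads, performs the same calculation.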
Car price data were loaded, and all variables, both numeric and categorical, took part in the models.
The seed “2001” was used because the seed “1301” (the last digits of the student control number) caused an error when the document was compiled.
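The source does not record which error seed 1301 produced. One frequent cause, when every categorical variable takes part in an 80/20 split, is that a rare factor level lands only in the validation rows, so `predict()` on the linear model stops with a "factor has new levels" message. The base-R sketch below checks for that situation on a small made-up data frame. The data, the helper `niveles_faltantes`, and the use of `sample()` in place of `caret::createDataPartition()` are illustrative assumptions, not the author's code.

```r
# Toy data (made up): one categorical column with a rare level "twelve"
datos <- data.frame(
  cylindernumber = c(rep("four", 7), rep("six", 2), "twelve"),
  price = c(7000, 7500, 8000, 6500, 9000, 8200, 7700, 16000, 18000, 36000),
  stringsAsFactors = TRUE
)

# Hypothetical helper: levels present in validation but absent from training
niveles_faltantes <- function(entrena, valida, columna) {
  setdiff(unique(as.character(valida[[columna]])),
          unique(as.character(entrena[[columna]])))
}

# Try a few seeds and report whether the 80/20 split leaves a level unseen
for (semilla in c(1301, 2001, 2023)) {
  set.seed(semilla)
  idx <- sample(nrow(datos), size = 0.8 * nrow(datos))
  faltan <- niveles_faltantes(datos[idx, ], datos[-idx, ], "cylindernumber")
  cat("seed", semilla, "-> levels missing from training:",
      if (length(faltan) == 0) "none" else faltan, "\n")
}
```

If a seed reports a missing level, changing the seed (as was done here) or stratifying the split on that variable are two possible workarounds.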
The multiple linear regression model flags the following terms as statistically significant:
The intercept is significant above the 95% level; the summary reports it at 99%.
The terms enginetypeohc, peakrpm, enginetypeohcv and enginesize are significant predictors at the 99.9% confidence level.
The terms stroke, cylindernumbersix, cylindernumberfour and cylindernumberfive are significant predictors at the 99% confidence level.
The terms fuelsystemspdi, cylindernumbertwelve, enginetypeohcf, enginetypedohcv, enginelocationrear, carwidth and aspirationturbo are significant predictors at the 95% confidence level.
The terms carbodyhardtop, carbodyhatchback and boreratio are significant predictors at the 90% confidence level.
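The grouping above can be read off the p-values in the model summary. The following sketch shows one way to do it in base R; it uses the built-in `mtcars` data as a stand-in because the fitted price model is not reproduced here, and the object names (`modelo`, `coefs`, `confianza`, `significativos`) are hypothetical.

```r
# Stand-in data: mtcars ships with R; the original model is fit on the car price data
modelo <- lm(mpg ~ wt + hp + qsec + factor(cyl), data = mtcars)

# Coefficient table: estimate, standard error, t value and p-value per term
coefs <- as.data.frame(summary(modelo)$coefficients)
names(coefs) <- c("estimate", "std_error", "t_value", "p_value")

# Map each p-value to the confidence band used in the text
coefs$confianza <- cut(
  coefs$p_value,
  breaks = c(0, 0.001, 0.01, 0.05, 0.1, 1),
  labels = c("99.9%", "99%", "95%", "90%", "< 90%"),
  include.lowest = TRUE
)

# Terms that clear the 90% threshold, strongest first
significativos <- coefs[coefs$p_value < 0.1, ]
significativos <- significativos[order(significativos$p_value), ]
print(significativos[, c("estimate", "p_value", "confianza")])
```

The breaks mirror the significance codes printed by `summary()` for an `lm` fit (0.001, 0.01, 0.05, 0.1).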
In the regression tree model, the most important variables were enginesize and curbweight.
The random forest model ranks enginesize, curbweight, carlength, cylindernumber and citympg as its most important variables.
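As a minimal sketch of where such a ranking comes from, the block below fits a regression tree with `rpart` and reads its stored variable importance. It uses the built-in `mtcars` data as a stand-in for the car price data, and the object names (`arbol`, `importancia`, `importancia_pct`) are hypothetical.

```r
library(rpart)

# Stand-in data: mtcars ships with R; the original tree is fit on the car price data
arbol <- rpart(mpg ~ ., data = mtcars, method = "anova")

# rpart stores importance as a named numeric vector; sort it explicitly
importancia <- sort(arbol$variable.importance, decreasing = TRUE)

# Rescale to percentages so the leading predictors are easy to read
importancia_pct <- round(100 * importancia / sum(importancia), 1)
print(head(importancia_pct, 5))
```

For a model fit with the randomForest package, the `importance()` function returns the comparable measure (node purity increase for regression).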
-The RMSE of the multiple regression model (RM) is 2674.766.
-The RMSE of the regression tree model (AR) is 2452.158.
-The RMSE of the random forest model (RF) is 2258.095.
According to the root mean squared error (RMSE), the best model was the random forest, with a value of 2258.095. This result holds for this particular set of training and validation data and for the 80% training and 20% validation split.
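The ranking can be checked directly from the three values in the comparison table. The sketch below picks the model with the smallest RMSE and expresses how much lower that error is than each model's own; the object names (`rmse_modelos`, `mejor`, `mejora_pct`) are hypothetical.

```r
# RMSE values copied from the comparison table
rmse_modelos <- c(rm = 2674.766, ar = 2452.158, rf = 2258.095)

# Model with the smallest error
mejor <- names(which.min(rmse_modelos))

# Percentage reduction in RMSE that the best model achieves over each model
mejora_pct <- round(100 * (rmse_modelos - min(rmse_modelos)) / rmse_modelos, 1)

print(mejor)
print(mejora_pct)
```

On these figures the random forest error is about 15.6% lower than that of the multiple regression and about 7.9% lower than that of the regression tree. Because the comparison rests on a single split, the gaps may change with a different seed.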