Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model answers questions such as:
Which variables are above the 90% confidence level as predictors?
What is the value of Adjusted R-squared, that is, how much of the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation is given.
# Libraries
library(readr)
library(PerformanceAnalytics) # For graphical correlations
library(dplyr)
library(knitr) # For tabular data
library(kableExtra) # For friendlier tables
library(ggplot2) # For plotting
library(plotly) # For plotting
library(caret) # For partitioning the data
library(Metrics) # For computing RMSE
library(rpart) # For regression trees
library(rpart.plot) # For plotting trees
library(randomForest) # For random forests
library(reshape) # For renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
There are 205 observations and 26 variables. Both numeric and categorical variables are used; only car_ID and CarName are dropped further on.
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car (Categorical): std or turbo |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car (Categorical): convertible, sedan, wagon, … |
| 8 | drivewheel | Type of drive wheel (Categorical): 4wd, fwd, rwd |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower (Numeric). Power of the car |
| 23 | peakrpm | Peak revolutions per minute of the car (Numeric) |
| 24 | citympg | Mileage in city, in miles per gallon (Numeric). Fuel consumption |
| 25 | highwaympg | Mileage on highway, in miles per gallon (Numeric). Fuel consumption |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables of no statistical interest, that is, columns 1 and 3 (car_ID and CarName).
datos <- datos[, c(2,4:26)]
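Selecting columns by position is fragile if the column order ever changes. As a minimal sketch, the same removal can be written by name. The toy data frame `ejemplo` and the names `quitar` and `ejemplo_limpio` are hypothetical and only mimic a few of the real columns.

```r
# Hypothetical toy data frame that mimics a few columns of the real data
ejemplo <- data.frame(
  car_ID    = 1:3,
  symboling = c(3, 3, 1),
  CarName   = c("alfa-romero giulia", "alfa-romero stelvio", "audi fox"),
  price     = c(13495, 16500, 15250)
)

# Drop the identifier columns by name instead of by position
quitar <- c("car_ID", "CarName")
ejemplo_limpio <- ejemplo[, !(names(ejemplo) %in% quitar), drop = FALSE]

names(ejemplo_limpio)
```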
The first records again.
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
The training data are 80% of the observations and the validation data are the remaining 20%.
n <- nrow(datos)
set.seed(1321) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
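For a numeric outcome, `createDataPartition()` samples within groups of the outcome so that both subsets have a similar price distribution. For readers without caret, a simpler non-stratified split can be sketched in base R. This is a swapped-in technique, not the one used in this case: the built-in `mtcars` data set stands in for the car price data, and the names `idx`, `train` and `valid` are illustrative.

```r
set.seed(1321)                      # same seed as above, for reproducibility
n   <- nrow(mtcars)                 # mtcars is a stand-in data set
idx <- sample(seq_len(n), size = floor(0.80 * n))

train <- mtcars[idx, ]              # roughly 80% of the rows
valid <- mtcars[-idx, ]             # the remaining rows

c(train = nrow(train), valid = nrow(valid))
```

Because this split ignores the outcome, the price distributions of the two subsets can differ more than with the stratified partition.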
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16430.00 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.8 | 101 | 5800 | 23 | 29 | 16925.00 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875 |
| 23 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6377 |
| 35 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7129 |
| 36 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 163.4 | 64.0 | 54.5 | 2010 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7295 |
| 37 | 0 | gas | std | four | wagon | fwd | front | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | ohc | four | 92 | 1bbl | 2.92 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7295 |
| 42 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | ohc | four | 110 | mpfi | 3.15 | 3.58 | 9.0 | 101 | 5800 | 24 | 28 | 12945 |
| 48 | 0 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.1 | 176 | 4750 | 15 | 19 | 32250 |
| 49 | 0 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.1 | 176 | 4750 | 15 | 19 | 35550 |
The multiple linear regression model (rm) is built: price as a function of all the independent variables, both numeric and non-numeric.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
# Multiple linear regression model, used to see which variables matter
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4499.9 -1210.6 -55.9 832.3 9434.2
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.790e+04 2.156e+04 -0.830 0.407908
## symboling -1.390e+02 3.033e+02 -0.458 0.647419
## fueltypegas -7.818e+03 8.718e+03 -0.897 0.371541
## aspirationturbo 1.154e+03 1.356e+03 0.851 0.396466
## doornumbertwo 3.308e+02 6.840e+02 0.484 0.629480
## carbodyhardtop -3.822e+03 1.981e+03 -1.929 0.056008 .
## carbodyhatchback -3.135e+03 1.672e+03 -1.875 0.063157 .
## carbodysedan -2.373e+03 1.805e+03 -1.315 0.190876
## carbodywagon -3.012e+03 1.962e+03 -1.535 0.127338
## drivewheelfwd 1.799e+02 1.536e+03 0.117 0.906986
## drivewheelrwd 6.847e+02 1.680e+03 0.408 0.684240
## enginelocationrear 7.017e+03 3.032e+03 2.315 0.022297 *
## wheelbase 2.537e+01 1.221e+02 0.208 0.835695
## carlength -3.374e+01 5.700e+01 -0.592 0.554968
## carwidth 5.185e+02 2.946e+02 1.760 0.080868 .
## carheight 5.463e+01 1.507e+02 0.362 0.717679
## curbweight 2.274e+00 2.178e+00 1.044 0.298392
## enginetypedohcv -1.144e+04 5.841e+03 -1.958 0.052525 .
## enginetypel 1.062e+03 2.023e+03 0.525 0.600664
## enginetypeohc 4.178e+03 1.154e+03 3.620 0.000429 ***
## enginetypeohcf 5.587e+02 1.925e+03 0.290 0.772121
## enginetypeohcv -4.440e+03 1.500e+03 -2.961 0.003682 **
## enginetyperotor -1.717e+03 5.553e+03 -0.309 0.757613
## cylindernumberfive -1.084e+04 3.277e+03 -3.309 0.001229 **
## cylindernumberfour -1.196e+04 3.838e+03 -3.117 0.002273 **
## cylindernumbersix -8.455e+03 2.768e+03 -3.054 0.002766 **
## cylindernumberthree -3.974e+03 5.520e+03 -0.720 0.472885
## cylindernumbertwelve -1.464e+04 5.112e+03 -2.863 0.004932 **
## cylindernumbertwo NA NA NA NA
## enginesize 1.161e+02 3.562e+01 3.261 0.001438 **
## fuelsystem2bbl -3.682e+02 1.086e+03 -0.339 0.735138
## fuelsystem4bbl -1.448e+03 2.984e+03 -0.485 0.628381
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -3.229e+03 2.877e+03 -1.122 0.263945
## fuelsystemmpfi -5.854e+02 1.261e+03 -0.464 0.643304
## fuelsystemspdi -2.511e+03 1.707e+03 -1.471 0.143784
## fuelsystemspfi -7.598e+02 2.684e+03 -0.283 0.777607
## boreratio 2.378e+02 1.927e+03 0.123 0.902010
## stroke -5.162e+03 1.064e+03 -4.852 3.63e-06 ***
## compressionratio -3.830e+02 6.420e+02 -0.596 0.551947
## horsepower 2.913e+01 3.098e+01 0.940 0.348861
## peakrpm 1.903e+00 7.689e-01 2.475 0.014697 *
## citympg -2.166e+02 1.944e+02 -1.114 0.267361
## highwaympg 2.378e+02 1.929e+02 1.233 0.220109
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2307 on 123 degrees of freedom
## Multiple R-squared: 0.934, Adjusted R-squared: 0.912
## F-statistic: 42.45 on 41 and 123 DF, p-value: < 2.2e-16
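Which variables are above the 90% confidence level as predictors?
The intercept is not statistically significant (p = 0.408).
The coefficients at or above the 90% confidence level (p < 0.10) are carbodyhardtop, carbodyhatchback, enginelocationrear, carwidth, enginetypedohcv, enginetypeohc, enginetypeohcv, cylindernumberfive, cylindernumberfour, cylindernumbersix, cylindernumbertwelve, enginesize, stroke and peakrpm.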
Since some predictors fall below the 90% confidence level, one could build a model using only the predictors at or above 90%. That is left for future work and is not done in this case.
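As a rough sketch of that future work, the terms with p < 0.10 can be read from the coefficient table of `summary()` and used to refit a smaller model. The built-in `mtcars` data set stands in for the training data here (with `mpg` playing the role of `price`), and the names `fit`, `keep` and `fit_red` are illustrative.

```r
# mtcars stands in for the training data; mpg plays the role of price
fit <- lm(mpg ~ ., data = mtcars)

coefs <- summary(fit)$coefficients      # columns: Estimate, Std. Error, t value, Pr(>|t|)
pvals <- coefs[-1, "Pr(>|t|)"]          # drop the intercept row
keep  <- names(pvals)[pvals < 0.10]     # terms significant at the 90% level

# Refit with only those terms (intercept-only model if none pass the threshold)
rhs     <- if (length(keep) > 0) paste(keep, collapse = " + ") else "1"
fit_red <- lm(as.formula(paste("mpg ~", rhs)), data = mtcars)

keep
```

With factor predictors such as `carbody`, the coefficient names are level-specific (for example `carbodyhardtop`), so they would have to be mapped back to the original column before refitting.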
In multiple linear models, the statistic Adjusted R-squared: 0.912 means that the independent variables explain approximately 91.2% of the variance of the dependent variable, price.
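The adjusted R-squared in the summary output can be reproduced from the other figures it reports. The short check below uses the rounded Multiple R-squared, the 165 training observations and the 41 estimated slope coefficients; the variable names are illustrative.

```r
r2 <- 0.934   # Multiple R-squared from the summary (rounded)
n  <- 165     # training observations
p  <- 41      # estimated slope coefficients (numerator df of the F statistic)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 3)   # 0.912
```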
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 5 8 9 23 35 36 37 42
## 15839.194 18262.868 19966.563 6202.067 7015.853 7431.001 7194.976 9643.037
## 48 49 52 60 64 74 77 83
## 29529.379 29529.379 6313.567 10442.915 13373.157 43761.687 6115.531 13933.624
## 87 89 100 102 111 113 121 129
## 10733.865 9535.718 10378.333 16927.113 16190.373 17819.076 6067.220 37200.516
## 131 143 145 146 150 155 157 159
## 10151.238 7409.759 7683.388 11198.295 9644.316 6246.826 8288.497 9424.113
## 160 167 170 173 175 195 196 205
## 10400.751 7198.810 13349.287 17503.023 12880.647 17952.482 17523.338 20426.816
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
Having used seed 1321 and after testing, the conclusion is that the training data must cover every value of the categorical variables that appears in the validation data; that is, the validation data must not contain values that were not seen during training. Otherwise predict() on the linear model stops with an error about new factor levels.
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual prices vs predicted prices. First 10 predictions") %>%
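One way to check this condition before predicting is to compare the factor levels in both subsets. The helper `niveles_no_vistos()` below is a hypothetical sketch, not part of any package, and the toy data frames are invented for illustration.

```r
# For each factor column, return the values present in `valid` but absent from `train`
niveles_no_vistos <- function(train, valid) {
  es_factor <- names(train)[vapply(train, is.factor, logical(1))]
  res <- lapply(es_factor, function(col) {
    setdiff(unique(as.character(valid[[col]])),
            unique(as.character(train[[col]])))
  })
  names(res) <- es_factor
  res[lengths(res) > 0]   # keep only the columns with unseen values
}

# Hypothetical toy data: "rotor" appears only in the validation rows
train <- data.frame(enginetype = factor(c("ohc", "ohc", "dohc")), price = c(1, 2, 3))
valid <- data.frame(enginetype = factor(c("ohc", "rotor")), price = c(4, 5))

niveles_no_vistos(train, valid)
```

An empty list means every categorical value in the validation data was also present in training.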
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 15839.194 |
| 8 | 18920 | 18262.868 |
| 9 | 23875 | 19966.563 |
| 23 | 6377 | 6202.067 |
| 35 | 7129 | 7015.853 |
| 36 | 7295 | 7431.001 |
| 37 | 7295 | 7194.976 |
| 42 | 12945 | 9643.037 |
| 48 | 32250 | 29529.379 |
| 49 | 35550 | 29529.379 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 2283.043
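The RMSE is the square root of the mean squared difference between actual and predicted prices, so it is expressed in the same units as price. A base R sketch of the same calculation follows; `rmse_manual` is an illustrative name, and the three values are copied from the first rows of the comparison table above.

```r
# Root mean squared error: square root of the mean squared difference
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# First three rows of the comparison table above
real <- c(17450, 18920, 23875)
pred <- c(15839.194, 18262.868, 19966.563)

rmse_manual(real, pred)
```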
The regression tree model (ar) is built.
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 9914069000 13196.010
## 2) enginesize< 182 151 3153903000 11299.430
## 4) curbweight< 2544 96 539037400 8514.677
## 8) horsepower< 83 52 57008810 7051.635 *
## 9) horsepower>=83 44 239179900 10243.730
## 18) highwaympg>=29.5 34 59677900 9387.235 *
## 19) highwaympg< 29.5 10 69758620 13155.800 *
## 5) curbweight>=2544 55 570966900 16160.090
## 10) horsepower< 118 27 231195100 14663.260 *
## 11) horsepower>=118 28 220944500 17603.470
## 22) stroke>=3.24 15 56366990 15775.740 *
## 23) stroke< 3.24 13 56651040 19712.380 *
## 3) enginesize>=182 14 358772600 33651.960 *
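Each terminal node (marked with `*`) predicts the mean price of the training cars that reach it. To make the rules concrete, the printed tree can be transcribed by hand into nested conditions. The function name `precio_arbol` is illustrative; the split points and node means are copied from the output above.

```r
# Hand-written version of the printed tree; values copied from the output above
precio_arbol <- function(enginesize, curbweight, horsepower, highwaympg, stroke) {
  if (enginesize >= 182) return(33651.960)        # node 3
  if (curbweight < 2544) {
    if (horsepower < 83) return(7051.635)         # node 8
    if (highwaympg >= 29.5) return(9387.235)      # node 18
    return(13155.800)                             # node 19
  }
  if (horsepower < 118) return(14663.260)         # node 10
  if (stroke >= 3.24) return(15775.740)           # node 22
  19712.380                                       # node 23
}

# Validation row 5: enginesize 136, curbweight 2824, horsepower 115
precio_arbol(136, 2824, 115, 22, 3.40)   # 14663.26
# Validation row 42: enginesize 110, curbweight 2465, horsepower 101, highwaympg 28
precio_arbol(110, 2465, 101, 28, 3.58)   # 13155.8
```

Because there are only eight terminal nodes, the tree can produce only eight distinct predicted prices.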
Pending.
rpart.plot(modelo_ar)
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 5 8 9 23 35 36 37 42
## 14663.259 14663.259 15775.744 7051.635 7051.635 7051.635 7051.635 13155.800
## 48 49 52 60 64 74 77 83
## 33651.964 33651.964 7051.635 9387.235 7051.635 33651.964 7051.635 15775.744
## 87 89 100 102 111 113 121 129
## 9387.235 9387.235 9387.235 15775.744 14663.259 14663.259 7051.635 33651.964
## 131 143 145 146 150 155 157 159
## 14663.259 7051.635 7051.635 13155.800 14663.259 7051.635 7051.635 7051.635
## 160 167 170 173 175 195 196 205
## 7051.635 13155.800 14663.259 14663.259 7051.635 14663.259 14663.259 14663.259
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual prices vs predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 14663.259 |
| 8 | 18920 | 14663.259 |
| 9 | 23875 | 15775.744 |
| 23 | 6377 | 7051.635 |
| 35 | 7129 | 7051.635 |
| 36 | 7295 | 7051.635 |
| 37 | 7295 | 7051.635 |
| 42 | 12945 | 13155.800 |
| 48 | 32250 | 33651.964 |
| 49 | 35550 | 33651.964 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3070.846
The random forest model (rf) is built with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 5368937
## % Var explained: 91.06
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 18271337.199 2763218511
## curbweight 21802753.149 1757519442
## highwaympg 12145847.440 1332832000
## horsepower 8390936.228 961235998
## cylindernumber 2075140.760 666974327
## carwidth 1157069.585 589035280
## citympg 1047235.035 231747538
## wheelbase 2221977.418 177946148
## fuelsystem 2814567.544 155389104
## boreratio 2341762.453 132144410
## carlength 1097699.152 105813208
## peakrpm 1910939.006 98352418
## enginetype 563971.312 74589618
## compressionratio 481084.819 67479818
## carheight 53962.934 64519221
## stroke 695015.510 56212464
## enginelocation 0.000 54789810
## drivewheel 683383.192 48007468
## carbody 541857.313 36484529
## aspiration -22111.637 11539279
## symboling -1291.305 8719832
## doornumber -4068.009 5593864
## fueltype -1077.404 3762167
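The two importance measures do not always agree: `%IncMSE` is based on how much the error grows when a variable is permuted, while `IncNodePurity` adds up the reduction in residual sum of squares from the splits on that variable. The sketch below reorders a few rows copied from the table above by each measure. The column is renamed `IncMSE` only to avoid the `%` in the name, and the object names are illustrative.

```r
# A few rows copied from the importance table above
imp <- data.frame(
  IncMSE        = c(18271337.199, 21802753.149, 12145847.440, 8390936.228),
  IncNodePurity = c(2763218511, 1757519442, 1332832000, 961235998),
  row.names     = c("enginesize", "curbweight", "highwaympg", "horsepower")
)

por_pureza <- imp[order(imp$IncNodePurity, decreasing = TRUE), ]
por_mse    <- imp[order(imp$IncMSE,        decreasing = TRUE), ]

rownames(por_pureza)[1]   # "enginesize"
rownames(por_mse)[1]      # "curbweight"
```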
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 5 8 9 23 35 36 37 42
## 16102.142 17943.562 20188.880 5949.182 6754.386 7448.650 7518.507 13280.927
## 48 49 52 60 64 74 77 83
## 37045.684 37045.684 5999.036 10290.985 11808.723 41062.341 6219.577 14015.300
## 87 89 100 102 111 113 121 129
## 8424.019 9953.445 9432.283 14327.299 18137.658 16391.282 6628.247 31710.554
## 131 143 145 146 150 155 157 159
## 10598.618 8259.032 12352.051 11584.069 15127.181 8005.493 7677.760 8893.638
## 160 167 170 173 175 195 196 205
## 8962.406 9819.596 10823.414 12785.221 12142.733 16720.929 16730.369 18947.768
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual prices vs predicted prices. First 10 predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 5 | 17450 | 16102.142 |
| 8 | 18920 | 17943.562 |
| 9 | 23875 | 20188.880 |
| 23 | 6377 | 5949.182 |
| 35 | 7129 | 6754.386 |
| 36 | 7295 | 7448.650 |
| 37 | 7295 | 7518.507 |
| 42 | 12945 | 13280.927 |
| 48 | 32250 | 37045.684 |
| 49 | 35550 | 37045.684 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2169.571
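With the three RMSE values available, the models can be ranked side by side. The sketch below copies the values from the outputs above into a small table; the names `resumen`, `modelo` and `rmse` are illustrative.

```r
# RMSE values copied from the outputs above
resumen <- data.frame(
  modelo = c("multiple regression", "regression tree", "random forest"),
  rmse   = c(2283.043, 3070.846, 2169.571)
)

# Lower RMSE means predictions closer to the actual prices
resumen <- resumen[order(resumen$rmse), ]
resumen
```

On this particular split the random forest has the lowest RMSE and the regression tree the highest. The gap between the random forest and the multiple regression is small, so the ranking could change with a different seed.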
The predictions are compared.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
The predictions of each model are displayed.
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = F, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 5 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450 | 15839.194 | 14663.259 | 16102.142 |
| 8 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920 | 18262.868 | 14663.259 | 17943.562 |
| 9 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875 | 19966.563 | 15775.744 | 20188.880 |
| 23 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6377 | 6202.067 | 7051.635 | 5949.182 |
| 35 | gas | std | two | hatchback | fwd | front | 93.7 | 150.0 | 64.0 | 52.6 | 1956 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7129 | 7015.853 | 7051.635 | 6754.386 |
| 36 | gas | std | four | sedan | fwd | front | 96.5 | 163.4 | 64.0 | 54.5 | 2010 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7295 | 7431.001 | 7051.635 | 7448.650 |
| 37 | gas | std | four | wagon | fwd | front | 96.5 | 157.1 | 63.9 | 58.3 | 2024 | ohc | four | 92 | 1bbl | 2.92 | 3.41 | 9.2 | 76 | 6000 | 30 | 34 | 7295 | 7194.976 | 7051.635 | 7518.507 |
| 42 | gas | std | four | sedan | fwd | front | 96.5 | 175.4 | 65.2 | 54.1 | 2465 | ohc | four | 110 | mpfi | 3.15 | 3.58 | 9.0 | 101 | 5800 | 24 | 28 | 12945 | 9643.037 | 13155.800 | 13280.927 |
| 48 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.1 | 176 | 4750 | 15 | 19 | 32250 | 29529.379 | 33651.964 | 37045.684 |
| 49 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.1 | 176 | 4750 | 15 | 19 | 35550 | 29529.379 | 33651.964 | 37045.684 |
| 52 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1900 | ohc | four | 91 | 2bbl | 3.03 | 3.15 | 9.0 | 68 | 5000 | 31 | 38 | 6095 | 6313.567 | 7051.635 | 5999.036 |
| 60 | gas | std | two | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | ohc | four | 122 | 2bbl | 3.39 | 3.39 | 8.6 | 84 | 4800 | 26 | 32 | 8845 | 10442.915 | 9387.235 | 10290.985 |
| 64 | diesel | std | four | sedan | fwd | front | 98.8 | 177.8 | 66.5 | 55.5 | 2443 | ohc | four | 122 | idi | 3.39 | 3.39 | 22.7 | 64 | 4650 | 36 | 42 | 10795 | 13373.157 | 7051.635 | 11808.723 |
| 74 | gas | std | four | sedan | rwd | front | 120.9 | 208.1 | 71.7 | 56.7 | 3900 | ohcv | eight | 308 | mpfi | 3.80 | 3.35 | 8.0 | 184 | 4500 | 14 | 16 | 40960 | 43761.687 | 33651.964 | 41062.341 |
| 77 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 64.4 | 50.8 | 1918 | ohc | four | 92 | 2bbl | 2.97 | 3.23 | 9.4 | 68 | 5500 | 37 | 41 | 5389 | 6115.531 | 7051.635 | 6219.577 |
| 83 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2833 | ohc | four | 156 | spdi | 3.58 | 3.86 | 7.0 | 145 | 5000 | 19 | 24 | 12629 | 13933.624 | 15775.744 | 14015.300 |
| 87 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2405 | ohc | four | 122 | 2bbl | 3.35 | 3.46 | 8.5 | 88 | 5000 | 25 | 32 | 8189 | 10733.865 | 9387.235 | 8424.019 |
| 89 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2403 | ohc | four | 110 | spdi | 3.17 | 3.46 | 7.5 | 116 | 5500 | 23 | 30 | 9279 | 9535.718 | 9387.235 | 9953.445 |
| 100 | gas | std | four | hatchback | fwd | front | 97.2 | 173.4 | 65.2 | 54.7 | 2324 | ohc | four | 120 | 2bbl | 3.33 | 3.47 | 8.5 | 97 | 5200 | 27 | 34 | 8949 | 10378.333 | 9387.235 | 9432.283 |
| 102 | gas | std | four | sedan | fwd | front | 100.4 | 181.7 | 66.5 | 55.1 | 3095 | ohcv | six | 181 | mpfi | 3.43 | 3.27 | 9.0 | 152 | 5200 | 17 | 22 | 13499 | 16927.113 | 15775.744 | 14327.299 |
| 111 | diesel | turbo | four | wagon | rwd | front | 114.2 | 198.9 | 68.4 | 58.7 | 3430 | l | four | 152 | idi | 3.70 | 3.52 | 21.0 | 95 | 4150 | 25 | 25 | 13860 | 16190.373 | 14663.259 | 18137.658 |
| 113 | diesel | turbo | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3252 | l | four | 152 | idi | 3.70 | 3.52 | 21.0 | 95 | 4150 | 28 | 33 | 16900 | 17819.076 | 14663.259 | 16391.282 |
| 121 | gas | std | four | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.4 | 68 | 5500 | 31 | 38 | 6229 | 6067.220 | 7051.635 | 6628.247 |
| 129 | gas | std | two | convertible | rwd | rear | 89.5 | 168.9 | 65.0 | 51.6 | 2800 | ohcf | six | 194 | mpfi | 3.74 | 2.90 | 9.5 | 207 | 5900 | 17 | 25 | 37028 | 37200.516 | 33651.964 | 31710.554 |
| 131 | gas | std | four | wagon | fwd | front | 96.1 | 181.5 | 66.5 | 55.2 | 2579 | ohc | four | 132 | mpfi | 3.46 | 3.90 | 8.7 | 90 | 5100 | 23 | 31 | 9295 | 10151.238 | 14663.259 | 10598.618 |
| 143 | gas | std | four | sedan | fwd | front | 97.2 | 172.0 | 65.4 | 52.5 | 2190 | ohcf | four | 108 | 2bbl | 3.62 | 2.64 | 9.5 | 82 | 4400 | 28 | 33 | 7775 | 7409.759 | 7051.635 | 8259.032 |
| 145 | gas | std | four | sedan | 4wd | front | 97.0 | 172.0 | 65.4 | 54.3 | 2385 | ohcf | four | 108 | 2bbl | 3.62 | 2.64 | 9.0 | 82 | 4800 | 24 | 25 | 9233 | 7683.388 | 7051.635 | 12352.051 |
| 146 | gas | turbo | four | sedan | 4wd | front | 97.0 | 172.0 | 65.4 | 54.3 | 2510 | ohcf | four | 108 | mpfi | 3.62 | 2.64 | 7.7 | 111 | 4800 | 24 | 29 | 11259 | 11198.295 | 13155.800 | 11584.069 |
| 150 | gas | turbo | four | wagon | 4wd | front | 96.9 | 173.6 | 65.4 | 54.9 | 2650 | ohcf | four | 108 | mpfi | 3.62 | 2.64 | 7.7 | 111 | 4800 | 23 | 23 | 11694 | 9644.316 | 14663.259 | 15127.181 |
| 155 | gas | std | four | wagon | 4wd | front | 95.7 | 169.7 | 63.6 | 59.1 | 2290 | ohc | four | 92 | 2bbl | 3.05 | 3.03 | 9.0 | 62 | 4800 | 27 | 32 | 7898 | 6246.826 | 7051.635 | 8005.493 |
| 157 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2081 | ohc | four | 98 | 2bbl | 3.19 | 3.03 | 9.0 | 70 | 4800 | 30 | 37 | 6938 | 8288.497 | 7051.635 | 7677.760 |
| 159 | diesel | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 53.0 | 2275 | ohc | four | 110 | idi | 3.27 | 3.35 | 22.5 | 56 | 4500 | 34 | 36 | 7898 | 9424.113 | 7051.635 | 8893.638 |
| 160 | diesel | std | four | hatchback | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2275 | ohc | four | 110 | idi | 3.27 | 3.35 | 22.5 | 56 | 4500 | 38 | 47 | 7788 | 10400.751 | 7051.635 | 8962.406 |
| 167 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | dohc | four | 98 | mpfi | 3.24 | 3.08 | 9.4 | 112 | 6600 | 26 | 29 | 9538 | 7198.810 | 13155.800 | 9819.596 |
| 170 | gas | std | two | hatchback | rwd | front | 98.4 | 176.2 | 65.6 | 52.0 | 2551 | ohc | four | 146 | mpfi | 3.62 | 3.50 | 9.3 | 116 | 4800 | 24 | 30 | 9989 | 13349.287 | 14663.259 | 10823.414 |
| 173 | gas | std | two | convertible | rwd | front | 98.4 | 176.2 | 65.6 | 53.0 | 2975 | ohc | four | 146 | mpfi | 3.62 | 3.50 | 9.3 | 116 | 4800 | 24 | 30 | 17669 | 17503.023 | 14663.259 | 12785.221 |
| 175 | diesel | turbo | four | sedan | fwd | front | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | ohc | four | 110 | idi | 3.27 | 3.35 | 22.5 | 73 | 4500 | 30 | 33 | 10698 | 12880.647 | 7051.635 | 12142.733 |
| 195 | gas | std | four | sedan | rwd | front | 104.3 | 188.8 | 67.2 | 56.2 | 2912 | ohc | four | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 23 | 28 | 12940 | 17952.482 | 14663.259 | 16720.929 |
| 196 | gas | std | four | wagon | rwd | front | 104.3 | 188.8 | 67.2 | 57.5 | 3034 | ohc | four | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 23 | 28 | 13415 | 17523.338 | 14663.259 | 16730.369 |
| 205 | gas | turbo | four | sedan | rwd | front | 109.1 | 188.8 | 68.9 | 55.5 | 3062 | ohc | four | 141 | mpfi | 3.78 | 3.15 | 9.5 | 114 | 5400 | 19 | 25 | 22625 | 20426.816 | 14663.259 | 18947.768 |
The RMSE values of the three models are compared
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf) # rm = multiple regression, ar = regression tree, rf = random forest
kable(rmse, caption = "RMSE statistic for each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 2283.043 | 3070.846 | 2169.571 |
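As a minimal sketch of what the RMSE statistic measures, the snippet below computes it by hand for three validation rows copied from the comparison table above (rows 35, 36 and 37). It assumes that the last three columns of that table hold, in order, the predictions of the multiple regression, regression tree and random forest models; the names `rmse_manual`, `pred_rm`, `pred_ar`, `pred_rf` and `rmse_subset` are introduced here for illustration only. Because only three rows are used, the values will not match the full-validation RMSE reported above.

```r
# Manual RMSE: square root of the mean squared difference,
# the same quantity that Metrics::rmse(actual, predicted) returns.
rmse_manual <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Three validation rows (35, 36, 37) copied from the comparison table.
# Assumption: prediction columns are ordered rm, ar, rf.
actual  <- c(7129, 7295, 7295)
pred_rm <- c(7015.853, 7431.001, 7194.976)  # multiple regression
pred_ar <- c(7051.635, 7051.635, 7051.635)  # regression tree
pred_rf <- c(6754.386, 7448.650, 7518.507)  # random forest

# RMSE of each model on this small subset
rmse_subset <- c(
  rm = rmse_manual(actual, pred_rm),
  ar = rmse_manual(actual, pred_ar),
  rf = rmse_manual(actual, pred_rf)
)

# Name of the model with the lowest RMSE on the subset
best_model <- names(which.min(rmse_subset))

print(rmse_subset)
print(best_model)
```

A lower RMSE means the predictions are, on average, closer to the observed prices, which is the criterion used to rank the models.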
Car price data were loaded and all variables, both numeric and categorical, were used as predictors.
The multiple linear regression model identifies some variables as statistically significant.
Judged by the root mean squared error (RMSE), the best model was the random forest (2169.571), ahead of multiple linear regression (2283.043) and the regression tree (3070.846). This ranking applies to this particular split of 80% training data and 20% validation data.
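The ranking depends on which rows fall into the training and validation sets, so a different partition can give different RMSE values. The sketch below illustrates that dependence under clearly stated assumptions: it uses the built-in `mtcars` data set as a stand-in (not the car price data), a small `lm` model, and base R `sample()` for the 80/20 split rather than `caret`. The helper `rmse_one_split` and the seed are hypothetical choices made only for this example.

```r
# Stand-in data: mtcars ships with base R; it is NOT the car price data set.
set.seed(2023)  # arbitrary seed so the sketch is reproducible

# Fit a linear model on a random training split and return the validation RMSE.
rmse_one_split <- function(data, train_fraction = 0.8) {
  n <- nrow(data)
  train_rows <- sample(n, size = floor(train_fraction * n))
  train <- data[train_rows, ]
  valid <- data[-train_rows, ]
  model <- lm(mpg ~ wt + hp, data = train)
  predictions <- predict(model, newdata = valid)
  sqrt(mean((valid$mpg - predictions)^2))
}

# Repeat the 80/20 split ten times and inspect how much the RMSE varies
rmse_by_split <- replicate(10, rmse_one_split(mtcars))
print(summary(rmse_by_split))
```

If the spread across splits is comparable to the gap between two models, a single partition is not enough to declare one of them better.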
These models can be used to help buyers make purchasing decisions. They can also help automakers forecast demand and set production schedules.
Between Python and R, R would be the better choice for carrying out this exercise.