Compare supervised models by applying car-price prediction algorithms and computing the root mean squared error (RMSE) for each one.
- Load the previously prepared data from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
- All the variables of the data set take part, except the identifiers car_ID and CarName.
- Create the training data (80%).
- Create the validation data (20%).
- Build the multiple linear regression model with the training data. This model answers questions such as:
  - Which variables are significant predictors at the 90% confidence level or above?
  - What is the Adjusted R-squared, that is, how much of the variation in the car price do the independent variables explain?
- Generate predictions with the validation data.
- Compute the RMSE for comparison purposes.
- Build the regression tree model with the training data.
- Identify the importance of the variables for the price.
- Display the regression tree and its decision rules.
- Make predictions with the validation data.
- Compute the RMSE for comparison purposes.
- Build the random forest model with the training data and 20 trees.
- Identify the importance of the variables for the price.
- Generate predictions with the validation data.
- Compute the RMSE for comparison purposes.
- At the end of the case, a personal interpretation is given.
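The same statistic is used to compare all three models. This is the formula that `Metrics::rmse()` computes:

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Here n is the number of cars in the validation data, y_i is the actual price of car i and ŷ_i is the predicted price. The RMSE is expressed in the same units as the price (dollars). A lower value means the predictions are closer to the actual prices.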
# Libraries
library(readr)
library(PerformanceAnalytics) # Graphical correlations
library(dplyr) # Pipes (%>%) and data manipulation (arrange)
library(knitr) # Tabular data
library(kableExtra) # Friendlier tables
library(ggplot2) # Visualization
library(plotly) # Visualization
library(caret) # Data partitioning
library(Metrics) # RMSE
library(rpart) # Regression tree
library(rpart.plot) # Regression tree plot
library(randomForest) # Random forest
library(reshape) # Renaming columns
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv",
fileEncoding = "UTF-8",
stringsAsFactors = TRUE)
The data set has 205 observations and 26 variables, both numeric and categorical (factors).
str(datos)
## 'data.frame': 205 obs. of 26 variables:
## $ car_ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ symboling : int 3 3 1 2 2 2 1 1 1 0 ...
## $ CarName : Factor w/ 147 levels "alfa-romero giulia",..: 1 3 2 4 5 9 5 7 6 8 ...
## $ fueltype : Factor w/ 2 levels "diesel","gas": 2 2 2 2 2 2 2 2 2 2 ...
## $ aspiration : Factor w/ 2 levels "std","turbo": 1 1 1 1 1 1 1 1 2 2 ...
## $ doornumber : Factor w/ 2 levels "four","two": 2 2 2 1 1 2 1 1 1 2 ...
## $ carbody : Factor w/ 5 levels "convertible",..: 1 1 3 4 4 4 4 5 4 3 ...
## $ drivewheel : Factor w/ 3 levels "4wd","fwd","rwd": 3 3 3 2 1 2 2 2 2 1 ...
## $ enginelocation : Factor w/ 2 levels "front","rear": 1 1 1 1 1 1 1 1 1 1 ...
## $ wheelbase : num 88.6 88.6 94.5 99.8 99.4 ...
## $ carlength : num 169 169 171 177 177 ...
## $ carwidth : num 64.1 64.1 65.5 66.2 66.4 66.3 71.4 71.4 71.4 67.9 ...
## $ carheight : num 48.8 48.8 52.4 54.3 54.3 53.1 55.7 55.7 55.9 52 ...
## $ curbweight : int 2548 2548 2823 2337 2824 2507 2844 2954 3086 3053 ...
## $ enginetype : Factor w/ 7 levels "dohc","dohcv",..: 1 1 6 4 4 4 4 4 4 4 ...
## $ cylindernumber : Factor w/ 7 levels "eight","five",..: 3 3 4 3 2 2 2 2 2 2 ...
## $ enginesize : int 130 130 152 109 136 136 136 136 131 131 ...
## $ fuelsystem : Factor w/ 8 levels "1bbl","2bbl",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ boreratio : num 3.47 3.47 2.68 3.19 3.19 3.19 3.19 3.19 3.13 3.13 ...
## $ stroke : num 2.68 2.68 3.47 3.4 3.4 3.4 3.4 3.4 3.4 3.4 ...
## $ compressionratio: num 9 9 9 10 8 8.5 8.5 8.5 8.3 7 ...
## $ horsepower : int 111 111 154 102 115 110 110 110 140 160 ...
## $ peakrpm : int 5000 5000 5000 5500 5500 5500 5500 5500 5500 5500 ...
## $ citympg : int 21 21 19 24 18 19 19 19 17 16 ...
## $ highwaympg : int 27 27 26 30 22 25 25 25 20 22 ...
## $ price : num 13495 16500 16500 13950 17450 ...
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| car_ID | symboling | CarName | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | alfa-romero giulia | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | alfa-romero stelvio | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | alfa-romero Quadrifoglio | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 4 | 2 | audi 100 ls | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 5 | 2 | audi 100ls | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | audi fox | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 7 | 1 | audi 100ls | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 8 | 1 | audi 5000 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | audi 4000 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | audi 5000s (diesel) | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique ID of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the engine: std or turbo (Categorical) |
| 6 | doornumber | Number of doors: two or four (Categorical) |
| 7 | carbody | Body style of the car (Categorical): convertible, sedan, wagon, … |
| 8 | drivewheel | Drive wheels: 4wd (four-wheel), fwd (front-wheel) or rwd (rear-wheel) drive (Categorical) |
| 9 | enginelocation | Location of the engine: front or rear (Categorical) |
| 10 | wheelbase | Wheelbase, the distance between the front and rear axles, in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the engine (Categorical) |
| 17 | enginesize | Size of the engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric) |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Engine power in horsepower (Numeric) |
| 23 | peakrpm | Peak engine revolutions per minute (Numeric) |
| 24 | citympg | Fuel economy in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Fuel economy on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |
Source: https://archive.ics.uci.edu/ml/datasets/Automobile
Remove the variables with no statistical interest, that is, columns 1 and 3 (car_ID and CarName).
datos <- datos[, c(2,4:26)]
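Selecting columns by position depends on the column order of the CSV. A slightly more robust alternative drops the two columns by name. The sketch below runs on a small hypothetical miniature of the data so that it works on its own. On the case data, `datos[, !(names(datos) %in% c("car_ID", "CarName"))]` keeps the same 24 columns as `c(2, 4:26)`.

```r
# Hypothetical miniature of the car data with the same leading columns
cars_demo <- data.frame(
  car_ID    = 1:3,
  symboling = c(3, 3, 1),
  CarName   = c("alfa-romero giulia", "alfa-romero stelvio", "alfa-romero Quadrifoglio"),
  price     = c(13495, 16500, 16500)
)

# Drop the identifier columns by name instead of by index
excluded <- c("car_ID", "CarName")
cars_demo <- cars_demo[, !(names(cars_demo) %in% excluded)]

print(names(cars_demo))
```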
The first records again:
kable(head(datos, 10), caption = "Car price data") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.0 | 102 | 5500 | 24 | 30 | 13950.00 |
| 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 17710.00 |
| 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
Split the data: 80% for training and 20% for validation.
n <- nrow(datos)
set.seed(1394) # Seed for reproducibility
entrena <- createDataPartition(y = datos$price, p = 0.80, list = FALSE, times = 1)
# Training data
datos.entrenamiento <- datos[entrena, ] # [rows, columns]
# Validation data
datos.validacion <- datos[-entrena, ]
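According to the caret documentation, when y is numeric (as price is), `createDataPartition()` splits y into groups based on its percentiles and samples within each group. This keeps the whole price range represented in both subsets. The base R function below is a simplified sketch of that idea. `split_by_quantiles` is a hypothetical name, and five groups are assumed. It is not caret's exact algorithm, so it will not reproduce the same rows for seed 1394.

```r
# Hypothetical helper (not part of caret): stratified split by quantile groups
split_by_quantiles <- function(y, p = 0.80, groups = 5) {
  # Group boundaries at the quantiles of y; unique() guards against ties
  breaks_q <- unique(quantile(y, probs = seq(0, 1, length.out = groups + 1)))
  group <- cut(y, breaks = breaks_q, include.lowest = TRUE)
  # Sample round(p * size) positions (at least one) inside each non-empty group
  by_group <- lapply(split(seq_along(y), group, drop = TRUE), function(idx) {
    idx[sample.int(length(idx), max(1, round(p * length(idx))))]
  })
  sort(unname(unlist(by_group)))
}

# Usage on a hypothetical outcome with 100 values
set.seed(1394)
y_demo <- 1:100
train_idx <- split_by_quantiles(y_demo, p = 0.80)
valid_idx <- setdiff(seq_along(y_demo), train_idx)
print(length(train_idx))
```

On the case data, the equivalent call would be `datos[split_by_quantiles(datos$price), ]`.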
kable(head(datos.entrenamiento, 10), caption = "Training data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 13495.00 |
| 2 | 3 | gas | std | two | convertible | rwd | front | 88.6 | 168.8 | 64.1 | 48.8 | 2548 | dohc | four | 130 | mpfi | 3.47 | 2.68 | 9.0 | 111 | 5000 | 21 | 27 | 16500.00 |
| 3 | 1 | gas | std | two | hatchback | rwd | front | 94.5 | 171.2 | 65.5 | 52.4 | 2823 | ohcv | six | 152 | mpfi | 2.68 | 3.47 | 9.0 | 154 | 5000 | 19 | 26 | 16500.00 |
| 5 | 2 | gas | std | four | sedan | 4wd | front | 99.4 | 176.6 | 66.4 | 54.3 | 2824 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.0 | 115 | 5500 | 18 | 22 | 17450.00 |
| 6 | 2 | gas | std | two | sedan | fwd | front | 99.8 | 177.3 | 66.3 | 53.1 | 2507 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 15250.00 |
| 8 | 1 | gas | std | four | wagon | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2954 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.5 | 110 | 5500 | 19 | 25 | 18920.00 |
| 9 | 1 | gas | turbo | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.9 | 3086 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 8.3 | 140 | 5500 | 17 | 20 | 23875.00 |
| 10 | 0 | gas | turbo | two | hatchback | 4wd | front | 99.5 | 178.2 | 67.9 | 52.0 | 3053 | ohc | five | 131 | mpfi | 3.13 | 3.40 | 7.0 | 160 | 5500 | 16 | 22 | 17859.17 |
| 13 | 0 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2710 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.0 | 121 | 4250 | 21 | 28 | 20970.00 |
| 16 | 0 | gas | std | four | sedan | rwd | front | 103.5 | 189.0 | 66.9 | 55.7 | 3230 | ohc | six | 209 | mpfi | 3.62 | 3.39 | 8.0 | 182 | 5400 | 16 | 22 | 30760.00 |
kable(head(datos.validacion, 10), caption = "Validation data. Car prices") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | symboling | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | 2 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950 |
| 7 | 1 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 25 | 17710 |
| 11 | 2 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16430 |
| 12 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16925 |
| 14 | 0 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105 |
| 15 | 1 | gas | std | four | sedan | rwd | front | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 20 | 25 | 24565 |
| 22 | 1 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572 |
| 24 | 1 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7957 |
| 28 | 1 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 8558 |
| 36 | 0 | gas | std | four | sedan | fwd | front | 96.5 | 163.4 | 64.0 | 54.5 | 2010 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7295 |
Build the multiple linear regression model (rm): price as a function of all the independent variables, both numeric and categorical.
The expression price ~ . means price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg
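In a formula, the dot stands for every column of `data` except the response. `terms()` shows the expansion directly. Here it is applied to a hypothetical three-column miniature of the data:

```r
# Hypothetical miniature with three of the car variables
cars_demo <- data.frame(
  fueltype   = factor(c("gas", "gas", "diesel")),
  enginesize = c(130, 152, 109),
  price      = c(13495, 16500, 13950)
)

# terms() expands the dot using the columns of `data`, excluding the response
labels_expanded <- attr(terms(price ~ ., data = cars_demo), "term.labels")
print(labels_expanded)
```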
# Multiple linear regression model, used to identify the important variables
#modelo_rm <- lm(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento)
modelo_rm <- lm(formula = price ~ . ,
data = datos.entrenamiento)
summary(modelo_rm)
##
## Call:
## lm(formula = price ~ ., data = datos.entrenamiento)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3883.3 -1189.2 -159.6 1189.9 8670.2
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.007e+04 1.851e+04 -0.544 0.587160
## symboling -1.205e+02 2.582e+02 -0.467 0.641673
## fueltypegas -1.109e+04 7.989e+03 -1.389 0.167474
## aspirationturbo 2.411e+03 9.607e+02 2.509 0.013397 *
## doornumbertwo 5.199e+02 6.356e+02 0.818 0.414922
## carbodyhardtop -4.471e+03 1.533e+03 -2.918 0.004194 **
## carbodyhatchback -2.721e+03 1.540e+03 -1.767 0.079707 .
## carbodysedan -1.860e+03 1.661e+03 -1.120 0.264997
## carbodywagon -2.447e+03 1.784e+03 -1.372 0.172524
## drivewheelfwd 3.956e+02 1.063e+03 0.372 0.710448
## drivewheelrwd 8.919e+02 1.249e+03 0.714 0.476390
## enginelocationrear 7.299e+03 2.761e+03 2.643 0.009276 **
## wheelbase -3.636e+01 1.088e+02 -0.334 0.738889
## carlength -1.362e+01 5.372e+01 -0.254 0.800270
## carwidth 7.027e+02 2.574e+02 2.730 0.007269 **
## carheight 8.335e+01 1.405e+02 0.593 0.554121
## curbweight 2.667e+00 1.950e+00 1.368 0.173841
## enginetypedohcv -1.998e+03 5.788e+03 -0.345 0.730540
## enginetypel -6.880e+02 1.880e+03 -0.366 0.714989
## enginetypeohc 4.069e+03 1.088e+03 3.741 0.000280 ***
## enginetypeohcf 2.908e+03 1.662e+03 1.750 0.082654 .
## enginetypeohcv -4.908e+03 1.337e+03 -3.669 0.000361 ***
## enginetyperotor 1.408e+04 7.119e+03 1.978 0.050157 .
## cylindernumberfive -2.033e+03 4.435e+03 -0.458 0.647403
## cylindernumberfour 1.626e+02 5.480e+03 0.030 0.976383
## cylindernumbersix -2.724e+03 3.440e+03 -0.792 0.430022
## cylindernumberthree 1.267e+04 7.098e+03 1.786 0.076643 .
## cylindernumbertwelve -2.226e+04 5.525e+03 -4.030 9.71e-05 ***
## cylindernumbertwo NA NA NA NA
## enginesize 2.217e+02 4.300e+01 5.156 9.79e-07 ***
## fuelsystem2bbl 1.015e+01 9.697e+02 0.010 0.991669
## fuelsystem4bbl -3.496e+02 2.738e+03 -0.128 0.898599
## fuelsystemidi NA NA NA NA
## fuelsystemmfi -4.503e+03 2.621e+03 -1.718 0.088309 .
## fuelsystemmpfi -7.522e+01 1.134e+03 -0.066 0.947216
## fuelsystemspdi -4.113e+03 1.542e+03 -2.668 0.008660 **
## fuelsystemspfi 9.782e+01 2.478e+03 0.039 0.968572
## boreratio -9.120e+03 2.849e+03 -3.201 0.001742 **
## stroke -7.621e+03 1.242e+03 -6.136 1.07e-08 ***
## compressionratio -6.800e+02 6.009e+02 -1.132 0.260009
## horsepower 1.379e+01 2.605e+01 0.530 0.597382
## peakrpm 2.318e+00 7.632e-01 3.037 0.002916 **
## citympg -2.245e+02 1.559e+02 -1.440 0.152487
## highwaympg 2.145e+02 1.378e+02 1.556 0.122190
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2109 on 123 degrees of freedom
## Multiple R-squared: 0.9452, Adjusted R-squared: 0.927
## F-statistic: 51.79 on 41 and 123 DF, p-value: < 2.2e-16
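Two coefficients are reported as NA ("2 not defined because of singularities": cylindernumbertwo and fuelsystemidi). A coefficient is NA when its column is an exact linear combination of other columns, so `lm()` cannot estimate it separately. This rank deficiency is also the source of the warning `predict()` gives later. The sketch below reproduces the situation on hypothetical data and lists the undefined coefficients. On the case data, the same expression would be applied to `modelo_rm`.

```r
# Hypothetical data: x2 is an exact copy of x1 (perfect collinearity)
set.seed(1)
demo <- data.frame(x1 = 1:10)
demo$x2 <- demo$x1
demo$y <- 3 * demo$x1 + rnorm(10)

model_demo <- lm(y ~ x1 + x2, data = demo)

# Coefficients that lm() could not estimate are stored as NA
not_defined <- names(coef(model_demo))[is.na(coef(model_demo))]
print(not_defined)

# alias() shows which term is a linear combination of the others
print(alias(model_demo)$Complete)
```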
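Which variables are significant predictors at the 90% confidence level or above?
The intercept is not statistically significant (p = 0.587).
The coefficients with p-values below 0.10 (marked ., *, ** or *** in the output) are significant at the 90% confidence level or above: aspirationturbo, carbodyhardtop, carbodyhatchback, enginelocationrear, carwidth, enginetypeohc, enginetypeohcf, enginetypeohcv, enginetyperotor, cylindernumberthree, cylindernumbertwelve, enginesize, fuelsystemmfi, fuelsystemspdi, boreratio, stroke and peakrpm.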
Since some predictors are not significant at the 90% level, one could build a model with only the predictors at or above that level. This is left for future work and is not done in this case.
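The predictors at or above the 90% level can also be listed from the coefficient table that `summary()` returns, instead of reading the stars. The sketch below uses the built-in `mtcars` data as a stand-in so that it runs on its own. On the case data, the same two lines would be applied to `summary(modelo_rm)`. Coefficients that were not estimated (NA) are omitted from that table.

```r
# Stand-in model on built-in data (assumption: not the car-price data)
model_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table with columns Estimate, Std. Error, t value, Pr(>|t|)
coefs <- summary(model_demo)$coefficients

# Keep the terms whose p-value is below 0.10 (90% confidence level)
significant <- rownames(coefs)[coefs[, "Pr(>|t|)"] < 0.10]
print(significant)
```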
In multiple linear models, the Adjusted R-squared of 0.927 means that the independent variables explain about 92.7% of the variability of the dependent variable, price.
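The Adjusted R-squared can be recovered from other numbers in the summary with the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1). Here R² = 0.9452, n = 165 training cars and p = 41 estimated coefficients besides the intercept (the numerator degrees of freedom of the F-statistic). Note that n - p - 1 = 123 is exactly the residual degrees of freedom reported.

```r
# Values read from the summary(modelo_rm) output above
r2 <- 0.9452  # Multiple R-squared
n  <- 165     # training observations
p  <- 41      # coefficients besides the intercept (F-statistic numerator df)

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 3))
```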
predicciones_rm <- predict(object = modelo_rm, newdata = datos.validacion)
## Warning in predict.lm(object = modelo_rm, newdata = datos.validacion):
## prediction from a rank-deficient fit may be misleading
predicciones_rm
## 4 7 11 12 14 15 22 24
## 10084.248 19859.204 13249.152 12970.168 19025.655 20602.103 5690.098 10952.752
## 28 36 45 48 49 53 55 60
## 11965.026 7863.641 6733.548 33329.521 33329.521 6428.079 6328.207 10452.823
## 73 74 80 85 86 97 99 101
## 36993.542 47062.895 6985.282 14056.681 10599.555 5721.577 3525.766 10854.155
## 116 119 121 132 135 151 163 167
## 10617.766 5808.905 6106.314 9222.908 29835.020 5888.146 7742.246 6781.442
## 174 175 177 190 192 194 196 200
## 8784.681 12538.519 9039.330 11893.611 16431.738 11265.698 16539.946 19686.137
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rm)
With seed 1394 the split works, but testing showed an important requirement: the training data must contain every value of the categorical variables that appears in the validation data. If the validation data contains a category the model never saw during training, predict() stops with an error about new factor levels.
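This requirement can be checked before predicting. The hypothetical helper `new_levels()` below (not from any package) lists, for each factor column, the values in the validation data that never appear in the training data. An empty list means the validation data is fully covered. The helper compares observed values rather than `levels()`, because subsetting a data frame keeps all the original factor levels. On the case data, it would be called as `new_levels(datos.entrenamiento, datos.validacion)`.

```r
# Hypothetical helper: values of each factor in `valid` that never occur in `train`
new_levels <- function(train, valid) {
  factor_cols <- names(valid)[vapply(valid, is.factor, logical(1))]
  result <- lapply(factor_cols, function(col) {
    # Compare observed values, not levels(): subsetting keeps unused levels
    setdiff(unique(as.character(valid[[col]])),
            unique(as.character(train[[col]])))
  })
  names(result) <- factor_cols
  Filter(length, result)  # keep only the columns with unseen values
}

# Usage on hypothetical training and validation rows for carbody
train_demo <- data.frame(carbody = factor(c("sedan", "hatchback", "wagon")))
valid_demo <- data.frame(carbody = factor(c("sedan", "convertible")))
print(new_levels(train_demo, valid_demo))
```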
kable(head(comparaciones, 10), caption = "Multiple linear regression. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 4 | 13950 | 10084.248 |
| 7 | 17710 | 19859.204 |
| 11 | 16430 | 13249.152 |
| 12 | 16925 | 12970.168 |
| 14 | 21105 | 19025.655 |
| 15 | 24565 | 20602.103 |
| 22 | 5572 | 5690.098 |
| 24 | 7957 | 10952.752 |
| 28 | 8558 | 11965.026 |
| 36 | 7295 | 7863.641 |
rmse_rm <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rm
## [1] 3452.468
Build the regression tree model (ar).
modelo_ar <- rpart(formula = price ~ symboling + fueltype + aspiration + doornumber + carbody + drivewheel + enginelocation + wheelbase + carlength + carwidth + carheight + curbweight + enginetype + cylindernumber + enginesize + fuelsystem + boreratio + stroke + compressionratio + horsepower + peakrpm + citympg + highwaympg, data = datos.entrenamiento )
modelo_ar
## n= 165
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 165 9988449000 13174.720
## 2) enginesize< 182 151 3117246000 11263.100
## 4) curbweight< 2544 92 378958800 8311.440
## 8) curbweight< 2291.5 58 61703310 7194.129 *
## 9) curbweight>=2291.5 34 121332600 10217.440 *
## 5) curbweight>=2544 59 686906100 15865.700
## 10) carwidth< 68.6 52 458980000 15207.910 *
## 11) carwidth>=68.6 7 38284790 20752.140 *
## 3) enginesize>=182 14 367921100 33792.820 *
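The root split of the tree uses enginesize (enginesize < 182), so it is the variable that best separates the prices in this tree.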
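Each prediction of a regression tree is the mean price of the training cars in the leaf that the car falls into. The hypothetical function below hard-codes the printed rules of `modelo_ar`, with thresholds and leaf means copied from the output, so it only mirrors this particular tree. It reproduces the tree predictions for validation cars 4, 7, 14, 22 and 48.

```r
# Manual version of the printed rpart rules of modelo_ar (illustration only)
predict_tree_manual <- function(enginesize, curbweight, carwidth) {
  ifelse(enginesize >= 182, 33792.82,               # node 3
    ifelse(curbweight < 2544,
      ifelse(curbweight < 2291.5, 7194.129,        # node 8
                                  10217.44),       # node 9
      ifelse(carwidth < 68.6, 15207.91,            # node 10
                              20752.14)))          # node 11
}

# Validation cars 4, 7, 14, 22 and 48 (values from the validation table)
pred_manual <- predict_tree_manual(
  enginesize = c(109, 136, 164, 90, 258),
  curbweight = c(2337, 2844, 2765, 1876, 4066),
  carwidth   = c(66.2, 71.4, 64.8, 63.8, 69.6)
)
print(pred_manual)
```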
rpart.plot(modelo_ar)
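The step list mentions the importance of the variables for the regression tree, but the output above only shows the rules. `rpart` stores an importance score for each variable in the `variable.importance` component of the fitted object. On the case data, this would be `modelo_ar$variable.importance`. The runnable sketch below uses the built-in `mtcars` data as a stand-in.

```r
library(rpart)  # recommended package, normally installed with R

# Stand-in tree on built-in data (assumption: not the car-price data)
tree_demo <- rpart(mpg ~ ., data = mtcars)

# Named importance scores, sorted from most to least important
importance_demo <- sort(tree_demo$variable.importance, decreasing = TRUE)
print(round(importance_demo, 1))
```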
predicciones_ar <- predict(object = modelo_ar, newdata = datos.validacion)
predicciones_ar
## 4 7 11 12 14 15 22 24
## 10217.441 20752.143 10217.441 10217.441 15207.907 15207.907 7194.129 7194.129
## 28 36 45 48 49 53 55 60
## 7194.129 7194.129 7194.129 33792.821 33792.821 7194.129 7194.129 10217.441
## 73 74 80 85 86 97 99 101
## 33792.821 33792.821 7194.129 15207.907 10217.441 7194.129 7194.129 10217.441
## 116 119 121 132 135 151 163 167
## 15207.907 7194.129 7194.129 10217.441 15207.907 7194.129 7194.129 10217.441
## 174 175 177 190 192 194 196 200
## 10217.441 10217.441 10217.441 7194.129 15207.907 15207.907 15207.907 15207.907
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_ar)
kable(head(comparaciones, 10), caption = "Regression tree. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 4 | 13950 | 10217.441 |
| 7 | 17710 | 20752.143 |
| 11 | 16430 | 10217.441 |
| 12 | 16925 | 10217.441 |
| 14 | 21105 | 15207.907 |
| 15 | 24565 | 15207.907 |
| 22 | 5572 | 7194.129 |
| 24 | 7957 | 7194.129 |
| 28 | 8558 | 7194.129 |
| 36 | 7295 | 7194.129 |
rmse_ar <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_ar
## [1] 3063.57
Build the random forest model (rf) with 20 trees.
modelo_rf <- randomForest(x = datos.entrenamiento[,c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")],
y = datos.entrenamiento[,'price'],
importance = TRUE,
keep.forest = TRUE,
ntree=20)
modelo_rf
##
## Call:
## randomForest(x = datos.entrenamiento[, c("symboling", "fueltype", "aspiration", "doornumber", "carbody", "drivewheel", "enginelocation", "wheelbase", "carlength", "carwidth", "carheight", "curbweight", "enginetype", "cylindernumber", "enginesize", "fuelsystem", "boreratio", "stroke", "compressionratio", "horsepower", "peakrpm", "citympg", "highwaympg")], y = datos.entrenamiento[, "price"], ntree = 20, importance = TRUE, keep.forest = TRUE)
## Type of random forest: regression
## Number of trees: 20
## No. of variables tried at each split: 7
##
## Mean of squared residuals: 4893832
## % Var explained: 91.92
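Two numbers in this output can be checked by hand:

- The 7 variables tried at each split is the randomForest default for regression, max(floor(p/3), 1), with p = 23 predictors.
- The % Var explained is 1 minus the out-of-bag mean of squared residuals divided by the variance of price (with denominator n). That variance can be taken from the root node of the regression tree output (deviance 9988449000 for 165 cars), because both models use the same training rows.

```r
# Default mtry for regression in randomForest: max(floor(p / 3), 1)
p <- 23  # predictors passed to randomForest() through x
mtry <- max(floor(p / 3), 1)
print(mtry)

# % Var explained = 1 - (OOB mean of squared residuals) / var(price),
# with the variance computed with denominator n. Root node of modelo_ar:
# deviance (sum of squares around the mean) = 9988449000 for n = 165 cars
mse_oob <- 4893832
var_price <- 9988449000 / 165
pct_var <- 100 * (1 - mse_oob / var_price)
print(round(pct_var, 2))
```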
as.data.frame(modelo_rf$importance) %>%
arrange(desc(IncNodePurity))
## %IncMSE IncNodePurity
## enginesize 19611902.366 2606723277
## cylindernumber 10291524.925 1794259533
## horsepower 8469100.931 1226545520
## curbweight 11981656.510 905449051
## highwaympg 5361916.707 849222898
## carwidth 3036615.452 512923821
## carlength 1531629.756 391916013
## drivewheel 2185441.488 367427895
## citympg 3300665.091 274890079
## boreratio 3414719.933 252189946
## wheelbase 1382281.078 114921669
## fuelsystem 514051.888 104457081
## peakrpm 676661.781 87830015
## enginelocation 187971.180 51976017
## carbody 534782.050 50532856
## compressionratio 195706.559 49917709
## stroke 221479.452 44614806
## carheight 202354.199 43206432
## symboling 1538.551 28708244
## aspiration 254811.842 25437503
## fueltype 286006.423 18180371
## enginetype 139950.764 16998350
## doornumber 35955.288 6552841
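The two columns do not rank the predictors in the same way:

- %IncMSE is the increase in out-of-bag MSE when a variable's values are randomly permuted. It is available because the model was fitted with importance = TRUE.
- IncNodePurity is the total decrease in residual sum of squares from splits on the variable, averaged over the trees.

With only 20 trees, the permutation scores are likely to be noisy. The sketch below reorders four rows copied from the output above by each criterion.

```r
# Four rows of modelo_rf$importance, copied from the output above
imp_rf <- data.frame(
  variable      = c("enginesize", "cylindernumber", "horsepower", "curbweight"),
  IncMSE        = c(19611902.366, 10291524.925, 8469100.931, 11981656.510),
  IncNodePurity = c(2606723277, 1794259533, 1226545520, 905449051),
  stringsAsFactors = FALSE
)

# Ranking by total decrease in node impurity (RSS) from splits on the variable
by_purity <- imp_rf$variable[order(imp_rf$IncNodePurity, decreasing = TRUE)]
# Ranking by increase in out-of-bag MSE when the variable is permuted
by_mse <- imp_rf$variable[order(imp_rf$IncMSE, decreasing = TRUE)]

print(by_purity)
print(by_mse)
```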
predicciones_rf <- predict(object = modelo_rf, newdata = datos.validacion)
predicciones_rf
## 4 7 11 12 14 15 22 24
## 10833.393 19559.497 10469.089 10434.527 18411.541 16907.338 5970.815 8462.708
## 28 36 45 48 49 53 55 60
## 8489.263 7082.132 6460.348 37851.708 37851.708 5928.584 6773.641 10339.187
## 73 74 80 85 86 97 99 101
## 31914.928 39863.497 8296.641 14105.446 8720.013 7011.459 7128.447 9260.442
## 116 119 121 132 135 151 163 167
## 13611.876 6032.605 6611.235 10111.758 13954.891 6513.617 8028.367 10205.576
## 174 175 177 190 192 194 196 200
## 10594.099 11084.789 10150.349 9021.320 15792.070 11197.373 16146.473 17810.456
comparaciones <- data.frame(precio_real = datos.validacion$price, precio_predicciones = predicciones_rf)
kable(head(comparaciones, 10), caption = "Random forest. Actual vs. predicted prices, first 10 predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | precio_real | precio_predicciones |
|---|---|---|
| 4 | 13950 | 10833.393 |
| 7 | 17710 | 19559.497 |
| 11 | 16430 | 10469.089 |
| 12 | 16925 | 10434.527 |
| 14 | 21105 | 18411.541 |
| 15 | 24565 | 16907.338 |
| 22 | 5572 | 5970.815 |
| 24 | 7957 | 8462.708 |
| 28 | 8558 | 8489.263 |
| 36 | 7295 | 7082.132 |
rmse_rf <- rmse(comparaciones$precio_real, comparaciones$precio_predicciones)
rmse_rf
## [1] 2547.684
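The three RMSE values can be gathered into one named vector to rank the models. In the case itself, `c(rmse_rm, rmse_ar, rmse_rf)` could be used directly. The values are copied here so that the sketch runs on its own. They come from a single 80/20 split with seed 1394, so a different split could change the differences between the models.

```r
# RMSE on the validation data, as printed above for each model
rmse_models <- c(
  multiple_regression = 3452.468,
  regression_tree     = 3063.570,
  random_forest       = 2547.684
)

print(sort(rmse_models))  # lower RMSE means a smaller typical error in dollars
best <- names(which.min(rmse_models))
print(best)

# Relative RMSE reduction of the random forest versus multiple regression (%)
reduction <- 1 - rmse_models[["random_forest"]] / rmse_models[["multiple_regression"]]
print(round(100 * reduction, 1))
```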
Combine the validation data (without the symboling column) with the predictions of the three models.
comparaciones <- data.frame(cbind(datos.validacion[,-1], predicciones_rm, predicciones_ar, predicciones_rf))
Display the predictions of each model.
kable(comparaciones, caption = "Model predictions") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
kable_paper("hover")
| | fueltype | aspiration | doornumber | carbody | drivewheel | enginelocation | wheelbase | carlength | carwidth | carheight | curbweight | enginetype | cylindernumber | enginesize | fuelsystem | boreratio | stroke | compressionratio | horsepower | peakrpm | citympg | highwaympg | price | predicciones_rm | predicciones_ar | predicciones_rf |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | gas | std | four | sedan | fwd | front | 99.8 | 176.6 | 66.2 | 54.3 | 2337 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 10.00 | 102 | 5500 | 24 | 30 | 13950.0 | 10084.248 | 10217.441 | 10833.393 |
| 7 | gas | std | four | sedan | fwd | front | 105.8 | 192.7 | 71.4 | 55.7 | 2844 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 25 | 17710.0 | 19859.204 | 20752.143 | 19559.497 |
| 11 | gas | std | two | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16430.0 | 13249.152 | 10217.441 | 10469.089 |
| 12 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2395 | ohc | four | 108 | mpfi | 3.50 | 2.80 | 8.80 | 101 | 5800 | 23 | 29 | 16925.0 | 12970.168 | 10217.441 | 10434.527 |
| 14 | gas | std | four | sedan | rwd | front | 101.2 | 176.8 | 64.8 | 54.3 | 2765 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 21 | 28 | 21105.0 | 19025.655 | 15207.907 | 18411.541 |
| 15 | gas | std | four | sedan | rwd | front | 103.5 | 189.0 | 66.9 | 55.7 | 3055 | ohc | six | 164 | mpfi | 3.31 | 3.19 | 9.00 | 121 | 4250 | 20 | 25 | 24565.0 | 20602.103 | 15207.907 | 16907.338 |
| 22 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1876 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.41 | 68 | 5500 | 37 | 41 | 5572.0 | 5690.098 | 7194.129 | 5970.815 |
| 24 | gas | turbo | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 2128 | ohc | four | 98 | mpfi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7957.0 | 10952.752 | 7194.129 | 8462.708 |
| 28 | gas | turbo | two | sedan | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 2191 | ohc | four | 98 | mpfi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 8558.0 | 11965.026 | 7194.129 | 8489.263 |
| 36 | gas | std | four | sedan | fwd | front | 96.5 | 163.4 | 64.0 | 54.5 | 2010 | ohc | four | 92 | 1bbl | 2.91 | 3.41 | 9.20 | 76 | 6000 | 30 | 34 | 7295.0 | 7863.641 | 7194.129 | 7082.132 |
| 45 | gas | std | two | sedan | fwd | front | 94.5 | 155.9 | 63.6 | 52.0 | 1874 | ohc | four | 90 | 2bbl | 3.03 | 3.11 | 9.60 | 70 | 5400 | 38 | 43 | 8916.5 | 6733.548 | 7194.129 | 6460.347 |
| 48 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.10 | 176 | 4750 | 15 | 19 | 32250.0 | 33329.521 | 33792.821 | 37851.708 |
| 49 | gas | std | four | sedan | rwd | front | 113.0 | 199.6 | 69.6 | 52.8 | 4066 | dohc | six | 258 | mpfi | 3.63 | 4.17 | 8.10 | 176 | 4750 | 15 | 19 | 35550.0 | 33329.521 | 33792.821 | 37851.708 |
| 53 | gas | std | two | hatchback | fwd | front | 93.1 | 159.1 | 64.2 | 54.1 | 1905 | ohc | four | 91 | 2bbl | 3.03 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 6795.0 | 6428.079 | 7194.129 | 5928.584 |
| 55 | gas | std | four | sedan | fwd | front | 93.1 | 166.8 | 64.2 | 54.1 | 1950 | ohc | four | 91 | 2bbl | 3.08 | 3.15 | 9.00 | 68 | 5000 | 31 | 38 | 7395.0 | 6328.207 | 7194.129 | 6773.641 |
| 60 | gas | std | two | hatchback | fwd | front | 98.8 | 177.8 | 66.5 | 53.7 | 2385 | ohc | four | 122 | 2bbl | 3.39 | 3.39 | 8.60 | 84 | 4800 | 26 | 32 | 8845.0 | 10452.823 | 10217.441 | 10339.187 |
| 73 | gas | std | two | convertible | rwd | front | 96.6 | 180.3 | 70.5 | 50.8 | 3685 | ohcv | eight | 234 | mpfi | 3.46 | 3.10 | 8.30 | 155 | 4750 | 16 | 18 | 35056.0 | 36993.542 | 33792.821 | 31914.928 |
| 74 | gas | std | four | sedan | rwd | front | 120.9 | 208.1 | 71.7 | 56.7 | 3900 | ohcv | eight | 308 | mpfi | 3.80 | 3.35 | 8.00 | 184 | 4500 | 14 | 16 | 40960.0 | 47062.895 | 33792.821 | 39863.497 |
| 80 | gas | turbo | two | hatchback | fwd | front | 93.0 | 157.3 | 63.8 | 50.8 | 2145 | ohc | four | 98 | spdi | 3.03 | 3.39 | 7.60 | 102 | 5500 | 24 | 30 | 7689.0 | 6985.282 | 7194.129 | 8296.641 |
| 85 | gas | turbo | two | hatchback | fwd | front | 95.9 | 173.2 | 66.3 | 50.2 | 2926 | ohc | four | 156 | spdi | 3.59 | 3.86 | 7.00 | 145 | 5000 | 19 | 24 | 14489.0 | 14056.681 | 15207.907 | 14105.446 |
| 86 | gas | std | four | sedan | fwd | front | 96.3 | 172.4 | 65.4 | 51.6 | 2365 | ohc | four | 122 | 2bbl | 3.35 | 3.46 | 8.50 | 88 | 5000 | 25 | 32 | 6989.0 | 10599.555 | 10217.441 | 8720.013 |
| 97 | gas | std | four | sedan | fwd | front | 94.5 | 165.3 | 63.8 | 54.5 | 1971 | ohc | four | 97 | 2bbl | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 7499.0 | 5721.577 | 7194.129 | 7011.459 |
| 99 | gas | std | two | hardtop | fwd | front | 95.1 | 162.4 | 63.8 | 53.3 | 2008 | ohc | four | 97 | 2bbl | 3.15 | 3.29 | 9.40 | 69 | 5200 | 31 | 37 | 8249.0 | 3525.766 | 7194.129 | 7128.447 |
| 101 | gas | std | four | sedan | fwd | front | 97.2 | 173.4 | 65.2 | 54.7 | 2302 | ohc | four | 120 | 2bbl | 3.33 | 3.47 | 8.50 | 97 | 5200 | 27 | 34 | 9549.0 | 10854.155 | 10217.441 | 9260.442 |
| 116 | gas | std | four | sedan | rwd | front | 107.9 | 186.7 | 68.4 | 56.7 | 3075 | l | four | 120 | mpfi | 3.46 | 3.19 | 8.40 | 97 | 5000 | 19 | 24 | 16630.0 | 10617.766 | 15207.907 | 13611.876 |
| 119 | gas | std | two | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.8 | 1918 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.40 | 68 | 5500 | 37 | 41 | 5572.0 | 5808.905 | 7194.129 | 6032.605 |
| 121 | gas | std | four | hatchback | fwd | front | 93.7 | 157.3 | 63.8 | 50.6 | 1967 | ohc | four | 90 | 2bbl | 2.97 | 3.23 | 9.40 | 68 | 5500 | 31 | 38 | 6229.0 | 6106.314 | 7194.129 | 6611.235 |
| 132 | gas | std | two | hatchback | fwd | front | 96.1 | 176.8 | 66.6 | 50.5 | 2460 | ohc | four | 132 | mpfi | 3.46 | 3.90 | 8.70 | 90 | 5100 | 23 | 31 | 9895.0 | 9222.908 | 10217.441 | 10111.757 |
| 135 | gas | std | two | hatchback | fwd | front | 99.1 | 186.6 | 66.5 | 56.1 | 2707 | ohc | four | 121 | mpfi | 2.54 | 2.07 | 9.30 | 110 | 5250 | 21 | 28 | 15040.0 | 29835.020 | 15207.907 | 13954.891 |
| 151 | gas | std | two | hatchback | fwd | front | 95.7 | 158.7 | 63.6 | 54.5 | 1985 | ohc | four | 92 | 2bbl | 3.05 | 3.03 | 9.00 | 62 | 4800 | 35 | 39 | 5348.0 | 5888.146 | 7194.129 | 6513.617 |
| 163 | gas | std | four | sedan | fwd | front | 95.7 | 166.3 | 64.4 | 52.8 | 2140 | ohc | four | 98 | 2bbl | 3.19 | 3.03 | 9.00 | 70 | 4800 | 28 | 34 | 9258.0 | 7742.246 | 7194.129 | 8028.367 |
| 167 | gas | std | two | hatchback | rwd | front | 94.5 | 168.7 | 64.0 | 52.6 | 2300 | dohc | four | 98 | mpfi | 3.24 | 3.08 | 9.40 | 112 | 6600 | 26 | 29 | 9538.0 | 6781.442 | 10217.441 | 10205.576 |
| 174 | gas | std | four | sedan | fwd | front | 102.4 | 175.6 | 66.5 | 54.9 | 2326 | ohc | four | 122 | mpfi | 3.31 | 3.54 | 8.70 | 92 | 4200 | 29 | 34 | 8948.0 | 8784.681 | 10217.441 | 10594.099 |
| 175 | diesel | turbo | four | sedan | fwd | front | 102.4 | 175.6 | 66.5 | 54.9 | 2480 | ohc | four | 110 | idi | 3.27 | 3.35 | 22.50 | 73 | 4500 | 30 | 33 | 10698.0 | 12538.519 | 10217.441 | 11084.789 |
| 177 | gas | std | four | sedan | fwd | front | 102.4 | 175.6 | 66.5 | 54.9 | 2414 | ohc | four | 122 | mpfi | 3.31 | 3.54 | 8.70 | 92 | 4200 | 27 | 32 | 10898.0 | 9039.330 | 10217.441 | 10150.349 |
| 190 | gas | std | two | convertible | fwd | front | 94.5 | 159.3 | 64.2 | 55.6 | 2254 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 8.50 | 90 | 5500 | 24 | 29 | 11595.0 | 11893.611 | 7194.129 | 9021.320 |
| 192 | gas | std | four | sedan | fwd | front | 100.4 | 180.2 | 66.9 | 55.1 | 2661 | ohc | five | 136 | mpfi | 3.19 | 3.40 | 8.50 | 110 | 5500 | 19 | 24 | 13295.0 | 16431.738 | 15207.907 | 15792.070 |
| 194 | gas | std | four | wagon | fwd | front | 100.4 | 183.1 | 66.9 | 55.1 | 2563 | ohc | four | 109 | mpfi | 3.19 | 3.40 | 9.00 | 88 | 5500 | 25 | 31 | 12290.0 | 11265.698 | 15207.907 | 11197.372 |
| 196 | gas | std | four | wagon | rwd | front | 104.3 | 188.8 | 67.2 | 57.5 | 3034 | ohc | four | 141 | mpfi | 3.78 | 3.15 | 9.50 | 114 | 5400 | 23 | 28 | 13415.0 | 16539.946 | 15207.907 | 16146.473 |
| 200 | gas | turbo | four | wagon | rwd | front | 104.3 | 188.8 | 67.2 | 57.5 | 3157 | ohc | four | 130 | mpfi | 3.62 | 3.15 | 7.50 | 162 | 5100 | 17 | 22 | 18950.0 | 19686.137 | 15207.907 | 17810.456 |
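The RMSE of the three models is compared: rm (multiple linear regression), ar (regression tree) and rf (random forest).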
# Gather the RMSE computed earlier for each model in a single data frame
rmse <- data.frame(rm = rmse_rm, ar = rmse_ar, rf = rmse_rf)
# Render it as a styled table
kable(rmse, caption = "RMSE of each model") %>%
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "bordered", "condensed")) %>%
  kable_paper("hover")
| rm | ar | rf |
|---|---|---|
| 3452.468 | 3063.57 | 2547.684 |
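For reference, the RMSE in the table is sqrt((1/n) * sum((y_i - yhat_i)^2)), where y_i is the observed price and yhat_i the model's prediction on the validation set. This is the quantity `Metrics::rmse(actual, predicted)` computes. The base-R sketch below reproduces that definition with a hypothetical helper, `rmse_manual()`. As an illustration, it applies the helper to three rows of the validation table above. It assumes the last four columns of that table are the observed price followed by the rm, ar and rf predictions, in that order. An RMSE over three rows will not match the values computed on the full validation set.

```r
# Root mean squared error: square the residuals, average them, take the root.
# Same definition as Metrics::rmse(actual, predicted).
rmse_manual <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((actual - predicted)^2))
}

# Three rows copied from the validation table (assumed column order:
# observed price, rm, ar, rf predictions).
observed <- c(32250.0, 35550.0, 40960.0)
pred_rm  <- c(33329.521, 33329.521, 47062.895)
pred_ar  <- c(33792.821, 33792.821, 33792.821)
pred_rf  <- c(37851.708, 37851.708, 39863.497)

# Named vector with one RMSE per model (on these three rows only)
sapply(list(rm = pred_rm, ar = pred_ar, rf = pred_rf),
       function(p) rmse_manual(observed, p))
```

Squaring the residuals before averaging penalizes large errors more heavily. A single badly mispredicted car can therefore dominate a model's RMSE.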
In this exercise, car price data were loaded in CSV format from a GitHub link, and all variables in the data set, both numeric and categorical, were used.
In addition, seed 1350 had to be used instead of the intended seed 1349, because with seed 1349 some variables were not handled correctly by the different models.
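Exactly what failed with seed 1349 is not specified. One common cause in R, offered here only as a possibility, is that a random partition puts a category in the validation set that never appears in the training set. A rare cylinder count is a typical example. In that case, `predict()` on an `lm` fit stops with an error saying the factor has new levels. The sketch below checks a split for this problem. It rests on three assumptions. It uses base R `sample()` instead of caret's `createDataPartition()`, which also stratifies on the response. It uses a toy data frame, not the car price data. And it relies on a hypothetical helper, `unseen_levels()`.

```r
# Hypothetical helper: for every character/factor column, list the values
# present in `test` but absent from `train`.
unseen_levels <- function(train, test) {
  is_cat <- vapply(test, function(x) is.factor(x) || is.character(x), logical(1))
  out <- lapply(names(test)[is_cat], function(col) {
    setdiff(unique(as.character(test[[col]])), unique(as.character(train[[col]])))
  })
  names(out) <- names(test)[is_cat]
  out[lengths(out) > 0]  # keep only columns with unseen values
}

# Toy data (not the car price data): "twelve" occurs only once.
toy <- data.frame(
  cylindernumber = c(rep("four", 6), rep("six", 3), "twelve"),
  price = c(7, 8, 9, 8, 7, 9, 15, 16, 17, 40) * 1000
)

# 80/20 split with base R; a different seed gives a different partition.
set.seed(1349)
idx <- sample(nrow(toy), size = round(0.8 * nrow(toy)))
train <- toy[idx, ]
test  <- toy[-idx, ]

# A non-empty result means lm() + predict() on this split would fail.
unseen_levels(train, test)
```

An empty list means every category in the validation set was seen during training. Changing the seed, or stratifying the split, are two ways to obtain such a partition.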
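The multiple linear regression model identifies several statistically significant variables, that is, predictors whose coefficients are significant above the 90% confidence level (p-value below 0.10).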
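The 90% confidence criterion corresponds to keeping coefficients with a p-value below 0.10. Below is a minimal sketch of how to read those coefficients and the adjusted R² from `summary()` of an `lm` fit. The built-in `mtcars` data stand in for the car price data here, so neither the variables nor the numbers are those of this case.

```r
# Stand-in model on the built-in mtcars data (not the car price data).
model <- lm(mpg ~ wt + hp + qsec + am, data = mtcars)
s <- summary(model)

# Coefficient matrix: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- s$coefficients

# Predictors significant at the 90% confidence level (p < 0.10),
# excluding the intercept.
p_values <- coefs[-1, "Pr(>|t|)"]
names(p_values)[p_values < 0.10]

# Proportion of the response's variance explained, adjusted for the number of predictors.
s$adj.r.squared
```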
According to the root mean squared error (RMSE), the best model for these training and validation sets (an 80%/20% split) was the random forest, with an RMSE of 2547.684, compared with 3063.57 for the regression tree and 3452.468 for multiple linear regression.
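The choice of the best model can also be made programmatically. The short sketch below copies the RMSE values from the comparison table above into a data frame. It picks the model with the smallest RMSE, then expresses each model's error as a percentage reduction relative to multiple regression. The name `rmse_table` is illustrative.

```r
# RMSE values as reported in the comparison table.
rmse_table <- data.frame(rm = 3452.468, ar = 3063.57, rf = 2547.684)

# Model with the smallest RMSE.
best <- names(rmse_table)[which.min(unlist(rmse_table))]
best

# Percentage RMSE reduction of each model versus multiple regression (rm).
round(100 * (1 - unlist(rmse_table) / rmse_table$rm), 1)
```

With the reported values, this selects rf. The random forest's RMSE is roughly 26% lower than that of multiple regression, and the regression tree's about 11% lower.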
Compared with the results obtained using seed 2023, the RMSE values here are not particularly low.
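A single 80/20 split depends on the seed, so one way to judge how stable an RMSE is, is to repeat the split for several seeds and examine the spread. The sketch below does this for a linear model on the built-in `mtcars` data as a stand-in. It makes three assumptions. The formula is arbitrary. Seeds 1, 42 and the hypothetical helper `split_rmse()` are added for illustration; only 1349, 1350 and 2023 come from this case. And the split uses base R `sample()` rather than caret.

```r
# RMSE on the validation part of one random 80/20 split.
split_rmse <- function(seed, data = mtcars) {
  set.seed(seed)
  idx <- sample(nrow(data), size = round(0.8 * nrow(data)))
  fit <- lm(mpg ~ wt + hp, data = data[idx, ])
  pred <- predict(fit, newdata = data[-idx, ])
  sqrt(mean((data$mpg[-idx] - pred)^2))
}

seeds <- c(1349, 1350, 2023, 1, 42)
rmse_by_seed <- sapply(seeds, split_rmse)
names(rmse_by_seed) <- seeds
rmse_by_seed
summary(rmse_by_seed)
```

If the spread across seeds is comparable to the gap between two models, a single split is not enough to rank them. Averaging over repeated splits, or using cross-validation, gives a steadier comparison.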