1 Objective

Compare supervised models by applying prediction algorithms to car prices, using the root mean squared error (RMSE) as the comparison statistic.

2 Description

The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv

  • All variables in the dataset are used.

  • Training data are created with 80% of the observations

  • Validation data are created with the remaining 20%

  • A multiple linear regression model is built with the training data

    • This model is used to answer questions such as:

      • Which variables are significant predictors at a confidence level above 90%?

      • What is the Adjusted R-squared value, i.e., how much of the variation in the vehicle price do the independent variables explain?

    • Predictions are generated with the validation data

    • The RMSE statistic is computed for comparison

  • A regression tree model is built with the training data

    • The importance of each variable for the price is identified

    • The regression tree and its decision rules are visualized

    • Predictions are made with the validation data

    • The RMSE statistic is computed for comparison

  • A random forest model is built with the training data and 20 trees

    • The importance of each variable for the price is identified

    • Predictions are generated with the validation data

    • The RMSE statistic is computed for comparison

  • At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.
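The workflow described above can be sketched end to end as follows. This is a minimal sketch and not the author's code. Synthetic data from scikit-learn's `make_regression` stand in for the car-price data. The 80/20 split, the seed 2022 and the 20-tree forest mirror the description. On a given split, the model with the lowest validation RMSE is the better predictor.

```python
# Minimal sketch: compare the three regressors by validation RMSE.
# make_regression data are a stand-in for the car-price data (assumption).
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=205, n_features=10, noise=10.0, random_state=2022)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.80, random_state=2022)

models = {
    "rm": LinearRegression(),                                        # multiple linear regression
    "ar": DecisionTreeRegressor(random_state=2022),                  # regression tree
    "rf": RandomForestRegressor(n_estimators=20, random_state=2022), # random forest, 20 trees
}

rmse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)                 # train on the 80% split
    pred = model.predict(X_va)            # predict on the 20% split
    rmse[name] = np.sqrt(mean_squared_error(y_va, pred))

# Lower RMSE means smaller typical prediction error on the validation set
for name, value in sorted(rmse.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.2f}")
best = min(rmse, key=rmse.get)
```

The ranking applies only to one split. Repeating it with other seeds, or using cross-validation, gives a sturdier comparison.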

3 Development

3.1 Load libraries

# Data handling
import numpy as np
import pandas as pd

# Plots
import matplotlib.pyplot as plt

# Preprocessing and modeling
from sklearn.model_selection import train_test_split

# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics such as adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features

# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV

# Random Forest
from sklearn.ensemble import RandomForestRegressor


# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

3.2 Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
##      car_ID  symboling                   CarName  ... citympg highwaympg    price
## 0         1          3        alfa-romero giulia  ...      21         27  13495.0
## 1         2          3       alfa-romero stelvio  ...      21         27  16500.0
## 2         3          1  alfa-romero Quadrifoglio  ...      19         26  16500.0
## 3         4          2               audi 100 ls  ...      24         30  13950.0
## 4         5          2                audi 100ls  ...      18         22  17450.0
## ..      ...        ...                       ...  ...     ...        ...      ...
## 200     201         -1           volvo 145e (sw)  ...      23         28  16845.0
## 201     202         -1               volvo 144ea  ...      19         25  19045.0
## 202     203         -1               volvo 244dl  ...      18         23  21485.0
## 203     204         -1                 volvo 246  ...      26         27  22470.0
## 204     205         -1               volvo 264gl  ...      19         25  22625.0
## 
## [205 rows x 26 columns]

3.3 Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID                int64
## symboling             int64
## CarName              object
## fueltype             object
## aspiration           object
## doornumber           object
## carbody              object
## drivewheel           object
## enginelocation       object
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginetype           object
## cylindernumber       object
## enginesize            int64
## fuelsystem           object
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

3.3.1 Data dictionary

Col Name Description
1 car_ID Unique id of each observation (Integer)
2 symboling Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical)
3 CarName Name of the car: company and model (Categorical)
4 fueltype Car fuel type, i.e. gas or diesel (Categorical)
5 aspiration Aspiration used in the car: std or turbo (Categorical)
6 doornumber Number of doors in the car (Categorical)
7 carbody Body of the car: convertible, sedan, wagon, … (Categorical)
8 drivewheel Type of drive wheel: fwd, rwd or 4wd (Categorical)
9 enginelocation Location of the car engine (Categorical)
10 wheelbase Wheelbase of the car: distance between the axles, in inches (Numeric)
11 carlength Length of the car (Numeric)
12 carwidth Width of the car (Numeric)
13 carheight Height of the car (Numeric)
14 curbweight Weight of the car without occupants or baggage (Numeric)
15 enginetype Type of engine (Categorical)
16 cylindernumber Number of cylinders in the car (Categorical)
17 enginesize Size of the engine, in … (Numeric)
18 fuelsystem Fuel system of the car (Categorical)
19 boreratio Bore ratio of the engine (Numeric)
20 stroke Stroke: distance travelled by the piston inside the cylinder (Numeric)
21 compressionratio Compression ratio of the engine (Numeric)
22 horsepower Horsepower, the power of the engine (Numeric)
23 peakrpm Peak revolutions per minute of the engine (Numeric)
24 citympg Fuel economy in the city, in miles per gallon (Numeric)
25 highwaympg Fuel economy on the highway, in miles per gallon (Numeric)
26 price Price of the car in dollars (Numeric; dependent variable)

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

3.4 Data preparation

3.4.1 Remove variables

Remove the variables that carry no statistical interest, i.e., columns 1 and 3: car_ID and CarName.

# Drop car_ID (column 1) and CarName (column 3); the remaining 24 columns keep their original order
datos = datos.drop(columns = ['car_ID', 'CarName'])
# datos.describe()
datos
##      symboling fueltype aspiration  ... citympg highwaympg    price
## 0            3      gas        std  ...      21         27  13495.0
## 1            3      gas        std  ...      21         27  16500.0
## 2            1      gas        std  ...      19         26  16500.0
## 3            2      gas        std  ...      24         30  13950.0
## 4            2      gas        std  ...      18         22  17450.0
## ..         ...      ...        ...  ...     ...        ...      ...
## 200         -1      gas        std  ...      23         28  16845.0
## 201         -1      gas      turbo  ...      19         25  19045.0
## 202         -1      gas        std  ...      18         23  21485.0
## 203         -1   diesel      turbo  ...      26         27  22470.0
## 204         -1      gas      turbo  ...      19         25  22625.0
## 
## [205 rows x 24 columns]

3.4.2 Build dummy variables

The following variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.

Identify the dummy variables and build a dataset that includes them.

The pandas function get_dummies() converts categorical data into indicator (dummy) variables.

What are dummy variables? Dummy encoding turns one categorical variable into several columns, one per category. Each row gets 1 in the column of its own category and 0 in the others. With drop_first = True, as in the call below, the first category of each variable is dropped because the remaining columns already imply it.

Example

gender
MALE
FEMALE
MALE

The same data with dummy variables

gender_male gender_female
1 0
0 1
1 0
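The gender example can be reproduced with pandas to show what `drop_first` does. This is a minimal sketch on the toy column above, not the car data. pandas builds column names from the prefix plus the raw category value, so the columns are `gender_FEMALE` and `gender_MALE`. Categories are sorted, so `drop_first=True` removes `gender_FEMALE`. Recent pandas versions return booleans by default, and `dtype=int` gives the 0/1 shown above.

```python
import pandas as pd

# Toy data from the example above
df = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One indicator column per category (dtype=int gives 0/1 instead of True/False)
full = pd.get_dummies(df, dtype=int)
print(full)

# drop_first=True removes the first (alphabetical) category, FEMALE;
# a 0 in gender_MALE then implies FEMALE
reduced = pd.get_dummies(df, drop_first=True, dtype=int)
print(reduced)
```

Dropping one level per variable avoids the redundancy that would otherwise make the dummy columns perfectly collinear with the intercept in a linear regression.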
datos_dummis = pd.get_dummies (datos, drop_first = True)
datos_dummis
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 0            3       88.6  ...                0                0
## 1            3       88.6  ...                0                0
## 2            1       94.5  ...                0                0
## 3            2       99.8  ...                0                0
## 4            2       99.4  ...                0                0
## ..         ...        ...  ...              ...              ...
## 200         -1      109.1  ...                0                0
## 201         -1      109.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 203         -1      109.1  ...                0                0
## 204         -1      109.1  ...                0                0
## 
## [205 rows x 44 columns]

3.5 Training and validation data

Training data: 80% of the observations; validation data: the remaining 20%. Random seed 2022.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80,  random_state = 2022)
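The split sizes follow from the row count. With 205 rows and `train_size=0.80`, scikit-learn keeps floor(0.8 × 205) = 164 rows for training and the remaining 41 for validation. Fixing `random_state` makes the split reproducible. This minimal sketch uses placeholder arrays with 205 rows instead of the real data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the car dataset
X = np.arange(205 * 2).reshape(205, 2)
y = np.arange(205)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.80, random_state=2022)
print(len(X_tr), len(X_va))

# Same seed -> same rows in the validation set
_, _, _, y_va2 = train_test_split(X, y, train_size=0.80, random_state=2022)
```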

3.5.1 Training data

X_entrena
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 152          1       95.7  ...                0                0
## 185          2       97.3  ...                0                0
## 162          0       95.7  ...                0                0
## 47           0      113.0  ...                0                0
## 163          1       94.5  ...                0                0
## ..         ...        ...  ...              ...              ...
## 183          2       97.3  ...                0                0
## 177         -1      102.4  ...                0                0
## 112          0      107.9  ...                0                0
## 173         -1      102.4  ...                0                0
## 125          3       94.5  ...                0                0
## 
## [164 rows x 43 columns]

3.5.2 Validation data

X_valida
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 36           0       96.5  ...                0                0
## 198         -2      104.3  ...                0                0
## 102          0      100.4  ...                0                0
## 146          0       97.0  ...                0                0
## 79           1       93.0  ...                1                0
## 32           1       93.7  ...                0                0
## 107          0      107.9  ...                0                0
## 180         -1      104.5  ...                0                0
## 127          3       89.5  ...                0                0
## 149          0       96.9  ...                0                0
## 43           0       94.3  ...                0                0
## 40           0       96.5  ...                0                0
## 203         -1      109.1  ...                0                0
## 138          2       93.7  ...                0                0
## 201         -1      109.1  ...                0                0
## 20           0       94.5  ...                0                0
## 164          1       94.5  ...                0                0
## 65           0      104.9  ...                0                0
## 22           1       93.7  ...                0                0
## 186          2       97.3  ...                0                0
## 106          1       99.2  ...                0                0
## 156          0       95.7  ...                0                0
## 111          0      107.9  ...                0                0
## 68          -1      110.0  ...                0                0
## 123         -1      103.3  ...                0                0
## 108          0      107.9  ...                0                0
## 78           2       93.7  ...                0                0
## 8            1      105.8  ...                0                0
## 74           1      112.0  ...                0                0
## 10           2      101.2  ...                0                0
## 113          0      114.2  ...                0                0
## 82           3       95.9  ...                1                0
## 57           3       95.3  ...                0                0
## 158          0       95.7  ...                0                0
## 58           3       95.3  ...                0                0
## 17           0      110.0  ...                0                0
## 129          1       98.4  ...                0                0
## 150          1       95.7  ...                0                0
## 73           0      120.9  ...                0                0
## 116          0      107.9  ...                0                0
## 30           2       86.6  ...                0                0
## 
## [41 rows x 43 columns]

3.6 Supervised models: regression

3.6.1 Multiple linear regression model (RM)

The multiple linear regression model (rm) is built

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()

3.6.1.1 Coefficients

Only the coefficients \(\beta_1, \beta_2, \dots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
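A bare `coef_` array is hard to read, so it helps to pair each coefficient with its column name and to print the intercept next to it. This minimal sketch uses hypothetical toy data. The columns `enginesize` and `horsepower` and the exact linear rule are illustrative assumptions, not the car data. Because the toy target has no noise, the fit recovers the coefficients exactly.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: price = 1000 + 2*enginesize + 50*horsepower (no noise)
rng = np.random.default_rng(0)
X = pd.DataFrame({"enginesize": rng.uniform(60, 300, 30),
                  "horsepower": rng.uniform(50, 250, 30)})
y = 1000 + 2 * X["enginesize"] + 50 * X["horsepower"]

model = LinearRegression().fit(X, y)
coefs = pd.Series(model.coef_, index=X.columns)  # beta_1 ... beta_n labeled by column
print("intercept (beta_0):", model.intercept_)
print(coefs)
```

For the document's model, the equivalent would be `pd.Series(modelo_rm.coef_, index=X_entrena.columns)`.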

modelo_rm.coef_
## array([-1.44061234e+02,  4.20304710e+01, -7.69669355e+01,  5.40067885e+02,
##         8.20826526e+01,  5.54718628e+00,  8.41017077e+01, -1.58255808e+03,
##        -3.59143919e+03, -3.37663052e+02,  1.46591017e+01,  2.09786814e+00,
##        -2.37983590e+02,  1.83095361e+02, -3.71831408e+03,  1.50095971e+03,
##         1.92820429e+02, -4.20182555e+03, -3.61331893e+03, -2.55787518e+03,
##        -4.29555739e+03,  8.03555949e+02,  1.62354190e+03,  6.86505950e+03,
##        -2.27373675e-13, -4.60472536e+02,  2.94237926e+03,  1.56214668e+03,
##        -6.39338776e+03, -1.72796049e+03, -1.06398381e+04, -1.12804032e+04,
##        -6.76468915e+03, -2.15379329e+03, -8.33237282e+03, -1.72796049e+03,
##         1.20349506e+03, -1.72796049e+03,  3.71831408e+03, -1.97754583e+03,
##         5.35151220e+02, -1.84032434e+03, -2.24293176e+02])
  • For a linear model, score() returns the coefficient of determination R² (not the adjusted R²). On the training data it is 0.9287, so the independent variables explain approximately 92.87% of the variance of the dependent variable price.
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9286642852257763
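scikit-learn does not report the adjusted R² asked about in the description, but it follows from R², the number of observations \(n\) and the number of predictors \(p\): \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). Plugging in the document's own figures (R² = 0.9287, n = 164, p = 43) gives about 0.903. Some dummy coefficients above look collinear (one is essentially zero, and several repeat), so the effective number of predictors may be smaller. For the 90%-confidence question, statsmodels (already imported as sm) reports p-values and the adjusted R² directly, e.g. `sm.OLS(Y_entrena, sm.add_constant(X_entrena.astype(float))).fit()` exposes `pvalues` and `rsquared_adj`. The sketch below only checks the formula, on synthetic data with hypothetical coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def adjusted_r2(r2, n, p):
    """Adjusted R² for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Synthetic data: 164 rows (like the training set), 5 hypothetical predictors
rng = np.random.default_rng(2022)
X = rng.normal(size=(164, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=1.0, size=164)

model = LinearRegression().fit(X, y)
r2 = model.score(X, y)                                # plain R², as .score() returns
r2_adj = adjusted_r2(r2, n=X.shape[0], p=X.shape[1])  # penalized for the number of predictors
print(f"R2 = {r2:.4f}, adjusted R2 = {r2_adj:.4f}")
```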

3.6.1.2 Predictions of the rm model

Predictions are made with the validation data. The slice [:-1] prints all of them except the last one.

predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1])
## [ 6275.26876164 20183.80316317 15898.03172714  7458.97081347
##   7476.32581442  4100.96861125 13547.48559838 22607.94940748
##  32555.04912632 10035.2107168   8993.20379876  8298.73782873
##  26892.05603303  6948.98854852 21800.61585107  6681.8695686
##   7779.68551985 16688.81823958  6245.85861703  9591.29125384
##  17418.52661503  8079.40012354 17414.70182742 26960.09519985
##  10570.39918611 17496.98536569  7304.08137327 21648.18939631
##  36651.49427211 13620.71628995 16167.7240898  13398.56832513
##  11422.73593142  8593.97388554 15901.18418856 33722.62184987
##  39795.20858981  5592.15191463 39243.07428789 17802.08061133]

3.6.1.3 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 36           0       96.5  ...       7295.0        6275.268762
## 198         -2      104.3  ...      18420.0       20183.803163
## 102          0      100.4  ...      14399.0       15898.031727
## 146          0       97.0  ...       7463.0        7458.970813
## 79           1       93.0  ...       7689.0        7476.325814
## 32           1       93.7  ...       5399.0        4100.968611
## 107          0      107.9  ...      11900.0       13547.485598
## 180         -1      104.5  ...      15690.0       22607.949407
## 127          3       89.5  ...      34028.0       32555.049126
## 149          0       96.9  ...      11694.0       10035.210717
## 43           0       94.3  ...       6785.0        8993.203799
## 40           0       96.5  ...      10295.0        8298.737829
## 203         -1      109.1  ...      22470.0       26892.056033
## 138          2       93.7  ...       5118.0        6948.988549
## 201         -1      109.1  ...      19045.0       21800.615851
## 20           0       94.5  ...       6575.0        6681.869569
## 164          1       94.5  ...       8238.0        7779.685520
## 65           0      104.9  ...      18280.0       16688.818240
## 22           1       93.7  ...       6377.0        6245.858617
## 186          2       97.3  ...       8495.0        9591.291254
## 106          1       99.2  ...      18399.0       17418.526615
## 156          0       95.7  ...       6938.0        8079.400124
## 111          0      107.9  ...      15580.0       17414.701827
## 68          -1      110.0  ...      28248.0       26960.095200
## 123         -1      103.3  ...       8921.0       10570.399186
## 108          0      107.9  ...      13200.0       17496.985366
## 78           2       93.7  ...       6669.0        7304.081373
## 8            1      105.8  ...      23875.0       21648.189396
## 74           1      112.0  ...      45400.0       36651.494272
## 10           2      101.2  ...      16430.0       13620.716290
## 113          0      114.2  ...      16695.0       16167.724090
## 82           3       95.9  ...      12629.0       13398.568325
## 57           3       95.3  ...      13645.0       11422.735931
## 158          0       95.7  ...       7898.0        8593.973886
## 58           3       95.3  ...      15645.0       15901.184189
## 17           0      110.0  ...      36880.0       33722.621850
## 129          1       98.4  ...      31400.5       39795.208590
## 150          1       95.7  ...       5348.0        5592.151915
## 73           0      120.9  ...      40960.0       39243.074288
## 116          0      107.9  ...      17950.0       17802.080611
## 30           2       86.6  ...       6479.0        1307.069171
## 
## [41 rows x 45 columns]

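3.6.1.4 RMSE of the rm model

# squared=False was deprecated in scikit-learn 1.4 and removed in 1.6, so take the square root explicitly
rmse_rm = np.sqrt(mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm
       ))
print(f"The test error (rmse) is: {rmse_rm}")
## The test error (rmse) is: 2887.3005726006995

or, equivalently: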

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2887.3005726006995
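RMSE is the square root of the mean squared difference between the observed and predicted values: \(\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2}\). It is measured in the same units as price (dollars). The minimal sketch below computes it from this definition on the first three rows of the comparison table and checks the result against scikit-learn. On scikit-learn 1.4 or later, `sklearn.metrics.root_mean_squared_error` returns the same value directly.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First three rows of the comparison table: Precio_Real and Precio_Prediccion
y_true = np.array([7295.0, 18420.0, 14399.0])
y_pred = np.array([6275.268762, 20183.803163, 15898.031727])

# RMSE from its definition: sqrt(mean((y - y_hat)^2))
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)
```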

3.6.2 Regression tree model (AR)

The regression tree model (ar) is built

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 2022
          )
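The max_depth argument above is commented out, so the tree grows without limit. The fitted tree later reports depth 13 and 152 leaves for 164 training rows, which suggests it memorizes much of the training set. GridSearchCV is already imported and could choose the depth by cross-validation. This minimal sketch uses synthetic data, and the candidate depths are an assumption.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the 164 training rows (assumption)
X, y = make_regression(n_samples=164, n_features=8, noise=20.0, random_state=2022)

grid = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=2022),
    param_grid={"max_depth": [2, 3, 4, 6, 8, None]},  # hypothetical candidate depths
    scoring="neg_root_mean_squared_error",            # negated RMSE: higher is better
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, "cross-validated RMSE:", -grid.best_score_)
```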

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=2022)
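The description says the importance of each variable is identified. A fitted tree exposes impurity-based importances in `feature_importances_`, normalized to sum to 1. This minimal sketch uses hypothetical data in which price depends mostly on `enginesize` and only weakly on `carheight`.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: enginesize drives the price, carheight barely matters
rng = np.random.default_rng(2022)
X = pd.DataFrame({
    "enginesize": rng.uniform(60, 320, 200),
    "carheight": rng.uniform(47, 60, 200),
})
y = 150 * X["enginesize"] + 10 * X["carheight"] + rng.normal(scale=500, size=200)

tree = DecisionTreeRegressor(max_depth=4, random_state=2022).fit(X, y)
importance = (pd.Series(tree.feature_importances_, index=X.columns)
              .sort_values(ascending=False))   # most important first
print(importance)
```

For the document's tree, `pd.Series(modelo_ar.feature_importances_, index=X_entrena.columns).sort_values(ascending=False)` would give the same kind of ranking.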

3.6.2.1 Regression tree visualization

fig, ax = plt.subplots(figsize=(12, 5))

print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 13
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 152
# feature_names must be the dummy-encoded training columns; class_names applies only to classifiers
#plot = plot_tree(
#            decision_tree = modelo_ar,
#            feature_names = list(datos_dummis.drop(columns = "price").columns),
#            filled        = True,
#            impurity      = False,
#            fontsize      = 10,
#            precision     = 2,
#            ax            = ax
#       )

#plot
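A tree with 152 leaves is hard to read in a 12 × 5 figure. This minimal sketch draws a shallow tree instead. It uses hypothetical data with two of the car-price column names and the non-interactive Agg backend so it runs without a display. The key detail is that `feature_names` must list the same columns, in the same order, as the data passed to `fit()`.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend: renders without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, plot_tree

# Hypothetical data reusing two column names from the car dataset
rng = np.random.default_rng(2022)
X = pd.DataFrame({"enginesize": rng.uniform(60, 320, 100),
                  "curbweight": rng.uniform(1500, 4000, 100)})
y = 150 * X["enginesize"] + 3 * X["curbweight"]

# A shallow tree keeps the drawing legible
tree = DecisionTreeRegressor(max_depth=3, random_state=2022).fit(X, y)

fig, ax = plt.subplots(figsize=(12, 5))
annotations = plot_tree(
    decision_tree=tree,
    feature_names=list(X.columns),  # same columns, same order as in fit()
    filled=True,
    impurity=False,
    fontsize=10,
    precision=2,
    ax=ax,
)
plt.close(fig)
```

For the full model, plot_tree's own `max_depth` argument can limit the drawing to the top levels without refitting.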

Decision rules of the tree
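The "truncated branch of depth N" lines in the output are expected. By default, `export_text` prints only down to `max_depth=10`, and the fitted tree is 13 levels deep, so deeper subtrees are summarized. The minimal sketch below reproduces the effect on a small tree grown on random noise, with hypothetical feature names and a lower `max_depth` of 2. Passing a larger `max_depth` prints the full rule set.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Random noise target, so the unrestricted tree grows deep
rng = np.random.default_rng(2022)
X = rng.uniform(size=(60, 3))
y = rng.normal(size=60)

tree = DecisionTreeRegressor(random_state=2022).fit(X, y)
# Print only the first levels; deeper subtrees become "truncated branch" lines
rules = export_text(tree, feature_names=["a", "b", "c"], max_depth=2)
print(rules)
```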

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos_dummis.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2697.50
## |   |   |--- curbweight <= 2291.50
## |   |   |   |--- citympg <= 29.50
## |   |   |   |   |--- symboling <= 2.50
## |   |   |   |   |   |--- peakrpm <= 4600.00
## |   |   |   |   |   |   |--- curbweight <= 2155.00
## |   |   |   |   |   |   |   |--- value: [7053.00]
## |   |   |   |   |   |   |--- curbweight >  2155.00
## |   |   |   |   |   |   |   |--- symboling <= 1.00
## |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |--- symboling >  1.00
## |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |--- peakrpm >  4600.00
## |   |   |   |   |   |   |--- boreratio <= 3.22
## |   |   |   |   |   |   |   |--- carlength <= 168.10
## |   |   |   |   |   |   |   |   |--- curbweight <= 2134.00
## |   |   |   |   |   |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8358.00]
## |   |   |   |   |   |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2134.00
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5150.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  5150.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2262.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2262.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9095.00]
## |   |   |   |   |   |   |   |--- carlength >  168.10
## |   |   |   |   |   |   |   |   |--- curbweight <= 2251.00
## |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8195.00]
## |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2189.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8058.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2189.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7975.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2251.00
## |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |--- boreratio >  3.22
## |   |   |   |   |   |   |   |--- value: [9298.00]
## |   |   |   |   |--- symboling >  2.50
## |   |   |   |   |   |--- curbweight <= 2237.50
## |   |   |   |   |   |   |--- value: [9980.00]
## |   |   |   |   |   |--- curbweight >  2237.50
## |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |--- citympg >  29.50
## |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |--- carlength <= 156.60
## |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |--- carlength >  156.60
## |   |   |   |   |   |   |--- curbweight <= 1947.50
## |   |   |   |   |   |   |   |--- curbweight <= 1903.50
## |   |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |   |--- curbweight >  1903.50
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |   |--- carwidth >  64.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6695.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |--- curbweight >  1947.50
## |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |--- enginesize <= 94.50
## |   |   |   |   |   |   |   |   |   |--- carlength <= 167.05
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 52.35
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  52.35
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- carlength >  167.05
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 63.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  63.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  94.50
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 100.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2030.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2030.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |--- enginesize >  100.50
## |   |   |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |--- carheight <= 54.50
## |   |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |   |   |   |   |--- carheight >  54.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2262.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2262.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7995.00]
## |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |--- enginesize <= 94.50
## |   |   |   |   |   |   |--- highwaympg <= 39.50
## |   |   |   |   |   |   |   |--- highwaympg <= 32.50
## |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |--- highwaympg >  32.50
## |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl <= 0.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1948.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 90.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  90.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6529.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1948.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7129.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl >  0.50
## |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.00
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6189.00]
## |   |   |   |   |   |   |   |   |   |--- boreratio >  3.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |--- highwaympg >  39.50
## |   |   |   |   |   |   |   |--- wheelbase <= 94.10
## |   |   |   |   |   |   |   |   |--- citympg <= 42.00
## |   |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |   |   |   |--- citympg >  42.00
## |   |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |   |--- wheelbase >  94.10
## |   |   |   |   |   |   |   |   |--- value: [6295.00]
## |   |   |   |   |   |--- enginesize >  94.50
## |   |   |   |   |   |   |--- horsepower <= 69.50
## |   |   |   |   |   |   |   |--- peakrpm <= 4850.00
## |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |   |   |--- peakrpm >  4850.00
## |   |   |   |   |   |   |   |   |--- value: [7799.00]
## |   |   |   |   |   |   |--- horsepower >  69.50
## |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |--- curbweight >  2291.50
## |   |   |   |--- citympg <= 22.00
## |   |   |   |   |--- enginesize <= 125.50
## |   |   |   |   |   |--- drivewheel_fwd <= 0.50
## |   |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |   |--- drivewheel_fwd >  0.50
## |   |   |   |   |   |   |--- symboling <= 2.50
## |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |--- symboling >  2.50
## |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |--- enginesize >  125.50
## |   |   |   |   |   |--- highwaympg <= 24.50
## |   |   |   |   |   |   |--- value: [13295.00]
## |   |   |   |   |   |--- highwaympg >  24.50
## |   |   |   |   |   |   |--- boreratio <= 3.33
## |   |   |   |   |   |   |   |--- value: [15250.00]
## |   |   |   |   |   |   |--- boreratio >  3.33
## |   |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |--- citympg >  22.00
## |   |   |   |   |--- wheelbase <= 99.30
## |   |   |   |   |   |--- curbweight <= 2422.50
## |   |   |   |   |   |   |--- horsepower <= 91.00
## |   |   |   |   |   |   |   |--- wheelbase <= 96.95
## |   |   |   |   |   |   |   |   |--- curbweight <= 2346.50
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2346.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |--- wheelbase >  96.95
## |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 28.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  28.50
## |   |   |   |   |   |   |   |   |   |   |--- fueltype_gas <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9495.00]
## |   |   |   |   |   |   |   |   |   |   |--- fueltype_gas >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9370.00]
## |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9720.00]
## |   |   |   |   |   |   |--- horsepower >  91.00
## |   |   |   |   |   |   |   |--- carwidth <= 65.45
## |   |   |   |   |   |   |   |   |--- enginetype_ohcf <= 0.50
## |   |   |   |   |   |   |   |   |   |--- carheight <= 50.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  50.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2313.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2313.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- enginetype_ohcf >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9960.00]
## |   |   |   |   |   |   |   |--- carwidth >  65.45
## |   |   |   |   |   |   |   |   |--- wheelbase <= 96.90
## |   |   |   |   |   |   |   |   |   |--- value: [10345.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  96.90
## |   |   |   |   |   |   |   |   |   |--- value: [9995.00]
## |   |   |   |   |   |--- curbweight >  2422.50
## |   |   |   |   |   |   |--- carwidth <= 65.30
## |   |   |   |   |   |   |   |--- value: [12945.00]
## |   |   |   |   |   |   |--- carwidth >  65.30
## |   |   |   |   |   |   |   |--- stroke <= 3.45
## |   |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 4725.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [10795.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  4725.00
## |   |   |   |   |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [10198.00]
## |   |   |   |   |   |   |   |--- stroke >  3.45
## |   |   |   |   |   |   |   |   |--- curbweight <= 2629.00
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2538.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9639.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2538.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- drivewheel_fwd <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |   |   |   |   |--- drivewheel_fwd >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2629.00
## |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |--- wheelbase >  99.30
## |   |   |   |   |   |--- wheelbase <= 101.80
## |   |   |   |   |   |   |--- compressionratio <= 8.90
## |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |   |   |--- compressionratio >  8.90
## |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |--- peakrpm <= 5000.00
## |   |   |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |   |   |--- peakrpm >  5000.00
## |   |   |   |   |   |   |   |   |   |--- value: [13950.00]
## |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |   |--- wheelbase >  101.80
## |   |   |   |   |   |   |--- carlength <= 175.10
## |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |   |--- carlength >  175.10
## |   |   |   |   |   |   |   |--- highwaympg <= 33.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2436.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 54.40
## |   |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  54.40
## |   |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2436.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 54.40
## |   |   |   |   |   |   |   |   |   |   |--- value: [11248.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  54.40
## |   |   |   |   |   |   |   |   |   |   |--- value: [10698.00]
## |   |   |   |   |   |   |   |--- highwaympg >  33.50
## |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |--- curbweight >  2697.50
## |   |   |--- enginesize <= 119.50
## |   |   |   |--- drivewheel_rwd <= 0.50
## |   |   |   |   |--- value: [8778.00]
## |   |   |   |--- drivewheel_rwd >  0.50
## |   |   |   |   |--- value: [11048.00]
## |   |   |--- enginesize >  119.50
## |   |   |   |--- peakrpm <= 5450.00
## |   |   |   |   |--- peakrpm <= 4525.00
## |   |   |   |   |   |--- enginesize <= 158.00
## |   |   |   |   |   |   |--- citympg <= 26.50
## |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |   |   |--- citympg >  26.50
## |   |   |   |   |   |   |   |--- citympg <= 29.50
## |   |   |   |   |   |   |   |   |--- value: [16900.00]
## |   |   |   |   |   |   |   |--- citympg >  29.50
## |   |   |   |   |   |   |   |   |--- value: [18344.00]
## |   |   |   |   |   |--- enginesize >  158.00
## |   |   |   |   |   |   |--- wheelbase <= 102.35
## |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [21105.00]
## |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |--- value: [20970.00]
## |   |   |   |   |   |   |--- wheelbase >  102.35
## |   |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |--- peakrpm >  4525.00
## |   |   |   |   |   |--- carwidth <= 68.65
## |   |   |   |   |   |   |--- horsepower <= 153.00
## |   |   |   |   |   |   |   |--- carheight <= 52.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2869.50
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.68
## |   |   |   |   |   |   |   |   |   |   |--- value: [11549.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.68
## |   |   |   |   |   |   |   |   |   |   |--- fuelsystem_spdi <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |   |   |   |   |--- fuelsystem_spdi >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12764.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2869.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14869.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14489.00]
## |   |   |   |   |   |   |   |--- carheight >  52.50
## |   |   |   |   |   |   |   |   |--- citympg <= 23.50
## |   |   |   |   |   |   |   |   |   |--- carlength <= 187.75
## |   |   |   |   |   |   |   |   |   |   |--- horsepower <= 131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- horsepower >  131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  187.75
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.17
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.17
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12440.00]
## |   |   |   |   |   |   |   |   |--- citympg >  23.50
## |   |   |   |   |   |   |   |   |   |--- carheight <= 54.60
## |   |   |   |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  54.60
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2988.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15985.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2988.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16515.00]
## |   |   |   |   |   |   |--- horsepower >  153.00
## |   |   |   |   |   |   |   |--- citympg <= 18.00
## |   |   |   |   |   |   |   |   |--- stroke <= 3.21
## |   |   |   |   |   |   |   |   |   |--- value: [18950.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.21
## |   |   |   |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |   |   |   |   |   |--- citympg >  18.00
## |   |   |   |   |   |   |   |   |--- wheelbase <= 92.90
## |   |   |   |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  92.90
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2996.00
## |   |   |   |   |   |   |   |   |   |   |--- symboling <= 2.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- symboling >  2.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2996.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 103.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15998.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  103.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |--- carwidth >  68.65
## |   |   |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |--- peakrpm >  5450.00
## |   |   |   |   |--- enginesize <= 143.50
## |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |--- boreratio <= 3.37
## |   |   |   |   |   |   |   |--- wheelbase <= 99.45
## |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |   |   |   |--- wheelbase >  99.45
## |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [17710.00]
## |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |--- boreratio >  3.37
## |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |--- citympg <= 18.50
## |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |   |--- citympg >  18.50
## |   |   |   |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |--- value: [18920.00]
## |   |   |   |   |--- enginesize >  143.50
## |   |   |   |   |   |--- cylindernumber_four <= 0.50
## |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |   |--- cylindernumber_four >  0.50
## |   |   |   |   |   |   |--- value: [22018.00]
## |--- enginesize >  182.00
## |   |--- fuelsystem_mpfi <= 0.50
## |   |   |--- wheelbase <= 112.80
## |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |--- value: [28176.00]
## |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |--- value: [25552.00]
## |   |   |--- wheelbase >  112.80
## |   |   |   |--- value: [31600.00]
## |   |--- fuelsystem_mpfi >  0.50
## |   |   |--- doornumber_two <= 0.50
## |   |   |   |--- highwaympg <= 20.50
## |   |   |   |   |--- enginetype_ohcv <= 0.50
## |   |   |   |   |   |--- value: [33900.00]
## |   |   |   |   |--- enginetype_ohcv >  0.50
## |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |--- highwaympg >  20.50
## |   |   |   |   |--- value: [30760.00]
## |   |   |--- doornumber_two >  0.50
## |   |   |   |--- compressionratio <= 8.15
## |   |   |   |   |--- value: [41315.00]
## |   |   |   |--- compressionratio >  8.15
## |   |   |   |   |--- curbweight <= 2778.00
## |   |   |   |   |   |--- value: [32528.00]
## |   |   |   |   |--- curbweight >  2778.00
## |   |   |   |   |   |--- highwaympg <= 21.50
## |   |   |   |   |   |   |--- horsepower <= 208.50
## |   |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |   |--- horsepower >  208.50
## |   |   |   |   |   |   |   |--- value: [36000.00]
## |   |   |   |   |   |--- highwaympg >  21.50
## |   |   |   |   |   |   |--- value: [37028.00]

3.6.2.2 Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos_dummis.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Predictor importance in the model")
## Predictor importance in the model
importancia_predictores.sort_values('importancia', ascending=False)
##                 predictor  importancia
## 6              enginesize     0.663199
## 5              curbweight     0.255530
## 12                citympg     0.014422
## 11                peakrpm     0.012998
## 1               wheelbase     0.012245
## 40        fuelsystem_mpfi     0.011149
## 10             horsepower     0.005951
## 3                carwidth     0.005139
## 9        compressionratio     0.004650
## 16         doornumber_two     0.002660
## 13             highwaympg     0.002078
## 15       aspiration_turbo     0.001959
## 2               carlength     0.001671
## 0               symboling     0.001545
## 18      carbody_hatchback     0.001421
## 4               carheight     0.001183
## 8                  stroke     0.000566
## 19          carbody_sedan     0.000403
## 20          carbody_wagon     0.000380
## 22         drivewheel_rwd     0.000342
## 7               boreratio     0.000204
## 26         enginetype_ohc     0.000144
## 36        fuelsystem_2bbl     0.000061
## 21         drivewheel_fwd     0.000045
## 27        enginetype_ohcf     0.000029
## 31    cylindernumber_four     0.000017
## 28        enginetype_ohcv     0.000006
## 41        fuelsystem_spdi     0.000002
## 14           fueltype_gas     0.000001
## 24       enginetype_dohcv     0.000000
## 35     cylindernumber_two     0.000000
## 39         fuelsystem_mfi     0.000000
## 38         fuelsystem_idi     0.000000
## 37        fuelsystem_4bbl     0.000000
## 17        carbody_hardtop     0.000000
## 33   cylindernumber_three     0.000000
## 34  cylindernumber_twelve     0.000000
## 25           enginetype_l     0.000000
## 32     cylindernumber_six     0.000000
## 30    cylindernumber_five     0.000000
## 29       enginetype_rotor     0.000000
## 23    enginelocation_rear     0.000000
## 42        fuelsystem_spfi     0.000000

The most important predictors for the regression tree model are enginesize, curbweight, citympg, peakrpm and wheelbase. enginesize and curbweight alone account for about 92% of the total importance (0.663 + 0.256).
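scikit-learn's impurity-based importances for a fitted tree add up to 1, so they can be read as shares of the total. The sketch below, using the six largest values copied from the output above, finds the smallest group of top predictors that reaches a chosen share; the helper `predictors_for_share` and the 95% cut-off are illustrative choices, not part of the original analysis.

```python
# Six largest importances, copied from the regression-tree output above
importances = {
    "enginesize": 0.663199,
    "curbweight": 0.255530,
    "citympg": 0.014422,
    "peakrpm": 0.012998,
    "wheelbase": 0.012245,
    "fuelsystem_mpfi": 0.011149,
}


def predictors_for_share(importances, share):
    """Return the smallest list of top predictors whose importances add up to `share`."""
    ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, value in ranked:
        selected.append(name)
        cumulative += value
        if cumulative >= share:
            break
    return selected, cumulative


top, cumulative = predictors_for_share(importances, 0.95)
print(top)                   # ['enginesize', 'curbweight', 'citympg', 'peakrpm', 'wheelbase']
print(round(cumulative, 6))  # 0.958394
```

The five predictors named in the paragraph above are exactly the smallest group that covers 95% of the tree's importance.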

3.6.2.3 Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 7295. , 18950. , 13499. ,  9298. ,  7895. ,  5572. , 16630. ,
##        15750. , 32528. , 10198. ,  8845. ,  6989. , 22625. ,  7799. ,
##        22625. ,  6849. ,  7975. , 14997.5,  6189. ,  7898. , 15998. ,
##         7609. , 16630. , 28176. ,  8921. , 16900. ,  6189. , 17710. ,
##        41315. , 16925. , 13415. , 12764. , 11395. ,  7099. , 11395. ,
##        33900. , 37028. ,  6488. , 34184. , 16900. ,  5151. ])
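Each regression-tree prediction is the value stored in the leaf that a car reaches by following the split rules. As an illustration, the hypothetical function below hand-codes the `enginesize > 182.00` branch of the tree text above (it is printed in full, with no truncated sub-branches) as nested `if` statements. The input dicts are made-up cars; the real model applies the whole tree through `modelo_ar.predict`.

```python
def predict_large_engine(car):
    """Follow the printed rules of the `enginesize > 182.00` branch down to a leaf.

    `car` is a dict keyed by the dummy-encoded column names (hypothetical input).
    """
    if car["enginesize"] <= 182.0:
        raise ValueError("this sketch only covers the enginesize > 182 branch")
    if car["fuelsystem_mpfi"] <= 0.5:
        if car["wheelbase"] <= 112.80:
            return 28176.0 if car["carbody_sedan"] <= 0.5 else 25552.0
        return 31600.0
    if car["doornumber_two"] <= 0.5:
        if car["highwaympg"] <= 20.5:
            return 33900.0 if car["enginetype_ohcv"] <= 0.5 else 34184.0
        return 30760.0
    if car["compressionratio"] <= 8.15:
        return 41315.0
    if car["curbweight"] <= 2778.0:
        return 32528.0
    if car["highwaympg"] <= 21.5:
        return 35056.0 if car["horsepower"] <= 208.5 else 36000.0
    return 37028.0


# Hypothetical car: not mpfi, wheelbase 110.0, not a sedan
print(predict_large_engine({"enginesize": 209, "fuelsystem_mpfi": 0,
                            "wheelbase": 110.0, "carbody_sedan": 0}))  # 28176.0
```

Because every car in the same leaf receives the same value, leaf values such as 28176, 41315 and 33900 show up verbatim in `predicciones_ar`.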

3.6.2.4 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 36           0       96.5  ...       7295.0             7295.0
## 198         -2      104.3  ...      18420.0            18950.0
## 102          0      100.4  ...      14399.0            13499.0
## 146          0       97.0  ...       7463.0             9298.0
## 79           1       93.0  ...       7689.0             7895.0
## 32           1       93.7  ...       5399.0             5572.0
## 107          0      107.9  ...      11900.0            16630.0
## 180         -1      104.5  ...      15690.0            15750.0
## 127          3       89.5  ...      34028.0            32528.0
## 149          0       96.9  ...      11694.0            10198.0
## 43           0       94.3  ...       6785.0             8845.0
## 40           0       96.5  ...      10295.0             6989.0
## 203         -1      109.1  ...      22470.0            22625.0
## 138          2       93.7  ...       5118.0             7799.0
## 201         -1      109.1  ...      19045.0            22625.0
## 20           0       94.5  ...       6575.0             6849.0
## 164          1       94.5  ...       8238.0             7975.0
## 65           0      104.9  ...      18280.0            14997.5
## 22           1       93.7  ...       6377.0             6189.0
## 186          2       97.3  ...       8495.0             7898.0
## 106          1       99.2  ...      18399.0            15998.0
## 156          0       95.7  ...       6938.0             7609.0
## 111          0      107.9  ...      15580.0            16630.0
## 68          -1      110.0  ...      28248.0            28176.0
## 123         -1      103.3  ...       8921.0             8921.0
## 108          0      107.9  ...      13200.0            16900.0
## 78           2       93.7  ...       6669.0             6189.0
## 8            1      105.8  ...      23875.0            17710.0
## 74           1      112.0  ...      45400.0            41315.0
## 10           2      101.2  ...      16430.0            16925.0
## 113          0      114.2  ...      16695.0            13415.0
## 82           3       95.9  ...      12629.0            12764.0
## 57           3       95.3  ...      13645.0            11395.0
## 158          0       95.7  ...       7898.0             7099.0
## 58           3       95.3  ...      15645.0            11395.0
## 17           0      110.0  ...      36880.0            33900.0
## 129          1       98.4  ...      31400.5            37028.0
## 150          1       95.7  ...       5348.0             6488.0
## 73           0      120.9  ...      40960.0            34184.0
## 116          0      107.9  ...      17950.0            16900.0
## 30           2       86.6  ...       6479.0             5151.0
## 
## [41 rows x 45 columns]

3.6.2.5 RMSE of the ar model

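# `squared=False` was deprecated in scikit-learn 1.4 and removed in 1.6;
# taking the square root of the MSE gives the same RMSE on any version
rmse_ar = np.sqrt(mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar
       ))
print(f"The test error (rmse) is: {rmse_ar}")
## The test error (rmse) is: 2609.8260641977377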

Or, equivalently:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2609.8260641977377
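The RMSE is the square root of the mean of the squared differences between actual and predicted prices, $\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}$. As a check, the stdlib sketch below recomputes it by hand from the 41 actual prices and regression-tree predictions copied from the comparison table above; the function name `rmse_manual` is only illustrative.

```python
import math

# Actual prices (Y_valida) and regression-tree predictions (predicciones_ar),
# copied in row order from the comparison table above
actual = [7295.0, 18420.0, 14399.0, 7463.0, 7689.0, 5399.0, 11900.0, 15690.0,
          34028.0, 11694.0, 6785.0, 10295.0, 22470.0, 5118.0, 19045.0, 6575.0,
          8238.0, 18280.0, 6377.0, 8495.0, 18399.0, 6938.0, 15580.0, 28248.0,
          8921.0, 13200.0, 6669.0, 23875.0, 45400.0, 16430.0, 16695.0, 12629.0,
          13645.0, 7898.0, 15645.0, 36880.0, 31400.5, 5348.0, 40960.0, 17950.0,
          6479.0]
predicted = [7295.0, 18950.0, 13499.0, 9298.0, 7895.0, 5572.0, 16630.0, 15750.0,
             32528.0, 10198.0, 8845.0, 6989.0, 22625.0, 7799.0, 22625.0, 6849.0,
             7975.0, 14997.5, 6189.0, 7898.0, 15998.0, 7609.0, 16630.0, 28176.0,
             8921.0, 16900.0, 6189.0, 17710.0, 41315.0, 16925.0, 13415.0, 12764.0,
             11395.0, 7099.0, 11395.0, 33900.0, 37028.0, 6488.0, 34184.0, 16900.0,
             5151.0]


def rmse_manual(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


print(rmse_manual(actual, predicted))  # about 2609.83, matching the value above
# Largest single absolute error (row 73 in the table), for context
print(max(abs(t - p) for t, p in zip(actual, predicted)))  # 6776.0
```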

3.6.3 Random forest model (rf)

The random forest model (rf) is built with seed 2022 and 20 trees (n_estimators = 20).

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 2022)

modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=2022)
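By default, `RandomForestRegressor` fits each tree on a bootstrap sample of the training rows and averages the trees' predictions. The stdlib sketch below shows only that bagging-and-averaging idea, using depth-1 trees (stumps) on a single hypothetical feature; scikit-learn grows much deeper trees and can also subsample features at each split, so this is a simplified stand-in, not the library's algorithm. The toy engine-size and price values are made up.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree: the single split that minimizes squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        m_left, m_right = mean(left), mean(right)
        sse = sum((y - m_left) ** 2 for y in left) + sum((y - m_right) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, m_left, m_right)
    if best is None:  # every x is identical: predict the overall mean
        overall = mean(ys)
        return lambda x: overall
    _, threshold, m_left, m_right = best
    return lambda x: m_left if x <= threshold else m_right


def fit_bagged_stumps(xs, ys, n_estimators=20, seed=2022):
    """Fit each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(s(x) for s in stumps)


# Hypothetical toy data: engine size -> price
engine = [90, 100, 110, 120, 130, 200, 210, 250]
price = [6000, 7000, 8000, 9000, 11000, 30000, 32000, 36000]
forest = fit_bagged_stumps(engine, price)
print(round(forest(95)), round(forest(220)))
```

Averaging over bootstrap samples smooths the step that any single stump would produce, which is the same variance-reduction argument behind the 20 trees used in `modelo_rf`.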

3.6.3.1 Variable importance

# pending ...
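While this step is pending, note that a fitted `RandomForestRegressor` also exposes `feature_importances_`, so the same `pd.DataFrame` plus `sort_values` pattern used for `modelo_ar` should apply to `modelo_rf`. In recent scikit-learn versions that attribute is, as far as I know, the average of the per-tree impurity-based importances, renormalized to sum to 1. The stdlib sketch below illustrates that aggregation with three hypothetical per-tree importance vectors; the predictor names are real columns, but the numbers are invented.

```python
# Hypothetical per-tree importances (one list per tree, same predictor order)
predictors = ["enginesize", "curbweight", "citympg"]
per_tree = [
    [0.70, 0.20, 0.10],
    [0.60, 0.30, 0.10],
    [0.65, 0.25, 0.10],
]

# Average each predictor's importance across the trees ...
averaged = [sum(col) / len(per_tree) for col in zip(*per_tree)]
# ... and renormalize so the shares add up to 1
total = sum(averaged)
forest_importance = {name: value / total for name, value in zip(predictors, averaged)}

for name, value in sorted(forest_importance.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:12s} {value:.3f}")
```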

3.6.3.2 Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 7491.525     , 17491.        , 14890.2       ,  8524.8       ,
##         8057.65      ,  5844.125     , 15933.55      , 16705.9       ,
##        33379.85      , 12634.36666667,  8236.6       ,  8964.05      ,
##        18462.6       ,  7126.6       , 18935.05      ,  8497.975     ,
##         8069.15      , 13756.65      ,  5623.75      ,  8621.4       ,
##        18107.5       ,  7749.25      , 16508.125     , 28952.95      ,
##         9214.95      , 16684.3       ,  6481.55      , 19457.3667    ,
##        36978.05      , 15329.55      , 15677.75      , 13909.00835   ,
##        11499.45      ,  7993.925     , 11575.1       , 35511.35      ,
##        36234.7       ,  6544.775     , 36366.2       , 16684.3       ,
##         5707.925     ])

3.6.3.3 Comparison table


comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 36           0       96.5  ...       7295.0        7491.525000
## 198         -2      104.3  ...      18420.0       17491.000000
## 102          0      100.4  ...      14399.0       14890.200000
## 146          0       97.0  ...       7463.0        8524.800000
## 79           1       93.0  ...       7689.0        8057.650000
## 32           1       93.7  ...       5399.0        5844.125000
## 107          0      107.9  ...      11900.0       15933.550000
## 180         -1      104.5  ...      15690.0       16705.900000
## 127          3       89.5  ...      34028.0       33379.850000
## 149          0       96.9  ...      11694.0       12634.366667
## 43           0       94.3  ...       6785.0        8236.600000
## 40           0       96.5  ...      10295.0        8964.050000
## 203         -1      109.1  ...      22470.0       18462.600000
## 138          2       93.7  ...       5118.0        7126.600000
## 201         -1      109.1  ...      19045.0       18935.050000
## 20           0       94.5  ...       6575.0        8497.975000
## 164          1       94.5  ...       8238.0        8069.150000
## 65           0      104.9  ...      18280.0       13756.650000
## 22           1       93.7  ...       6377.0        5623.750000
## 186          2       97.3  ...       8495.0        8621.400000
## 106          1       99.2  ...      18399.0       18107.500000
## 156          0       95.7  ...       6938.0        7749.250000
## 111          0      107.9  ...      15580.0       16508.125000
## 68          -1      110.0  ...      28248.0       28952.950000
## 123         -1      103.3  ...       8921.0        9214.950000
## 108          0      107.9  ...      13200.0       16684.300000
## 78           2       93.7  ...       6669.0        6481.550000
## 8            1      105.8  ...      23875.0       19457.366700
## 74           1      112.0  ...      45400.0       36978.050000
## 10           2      101.2  ...      16430.0       15329.550000
## 113          0      114.2  ...      16695.0       15677.750000
## 82           3       95.9  ...      12629.0       13909.008350
## 57           3       95.3  ...      13645.0       11499.450000
## 158          0       95.7  ...       7898.0        7993.925000
## 58           3       95.3  ...      15645.0       11575.100000
## 17           0      110.0  ...      36880.0       35511.350000
## 129          1       98.4  ...      31400.5       36234.700000
## 150          1       95.7  ...       5348.0        6544.775000
## 73           0      120.9  ...      40960.0       36366.200000
## 116          0      107.9  ...      17950.0       16684.300000
## 30           2       86.6  ...       6479.0        5707.925000
## 
## [41 rows x 45 columns]

3.6.3.4 RMSE of the rf model

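# `squared=False` was deprecated in scikit-learn 1.4 and removed in 1.6;
# taking the square root of the MSE gives the same RMSE on any version
rmse_rf = np.sqrt(mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf
       ))
print(f"The test error (rmse) is: {rmse_rf}")
## The test error (rmse) is: 2468.378731065271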

Or, equivalently:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2468.378731065271

3.7 Evaluation of the regression models

The predictions of the three regression models are compared side by side.

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 36           0       96.5  ...                7295.0           7491.525000
## 198         -2      104.3  ...               18950.0          17491.000000
## 102          0      100.4  ...               13499.0          14890.200000
## 146          0       97.0  ...                9298.0           8524.800000
## 79           1       93.0  ...                7895.0           8057.650000
## 32           1       93.7  ...                5572.0           5844.125000
## 107          0      107.9  ...               16630.0          15933.550000
## 180         -1      104.5  ...               15750.0          16705.900000
## 127          3       89.5  ...               32528.0          33379.850000
## 149          0       96.9  ...               10198.0          12634.366667
## 43           0       94.3  ...                8845.0           8236.600000
## 40           0       96.5  ...                6989.0           8964.050000
## 203         -1      109.1  ...               22625.0          18462.600000
## 138          2       93.7  ...                7799.0           7126.600000
## 201         -1      109.1  ...               22625.0          18935.050000
## 20           0       94.5  ...                6849.0           8497.975000
## 164          1       94.5  ...                7975.0           8069.150000
## 65           0      104.9  ...               14997.5          13756.650000
## 22           1       93.7  ...                6189.0           5623.750000
## 186          2       97.3  ...                7898.0           8621.400000
## 106          1       99.2  ...               15998.0          18107.500000
## 156          0       95.7  ...                7609.0           7749.250000
## 111          0      107.9  ...               16630.0          16508.125000
## 68          -1      110.0  ...               28176.0          28952.950000
## 123         -1      103.3  ...                8921.0           9214.950000
## 108          0      107.9  ...               16900.0          16684.300000
## 78           2       93.7  ...                6189.0           6481.550000
## 8            1      105.8  ...               17710.0          19457.366700
## 74           1      112.0  ...               41315.0          36978.050000
## 10           2      101.2  ...               16925.0          15329.550000
## 113          0      114.2  ...               13415.0          15677.750000
## 82           3       95.9  ...               12764.0          13909.008350
## 57           3       95.3  ...               11395.0          11499.450000
## 158          0       95.7  ...                7099.0           7993.925000
## 58           3       95.3  ...               11395.0          11575.100000
## 17           0      110.0  ...               33900.0          35511.350000
## 129          1       98.4  ...               37028.0          36234.700000
## 150          1       95.7  ...                6488.0           6544.775000
## 73           0      120.9  ...               34184.0          36366.200000
## 116          0      107.9  ...               16900.0          16684.300000
## 30           2       86.6  ...                5151.0           5707.925000
## 
## [41 rows x 47 columns]

The RMSE values are compared.

First, a NumPy array is created:

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2887.3005726 , 2609.8260642 , 2468.37873107]])

A pandas DataFrame is then built from the NumPy array:


rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  2887.300573  2609.826064  2468.378731
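Picking the best model amounts to taking the column with the smallest RMSE (with the DataFrame above, `rmse.idxmin(axis=1)` returns that column label). The stdlib sketch below does the same with the values copied from the table and also reports how much lower the winner's RMSE is relative to the other two models.

```python
# RMSE values copied from the table above
rmse_by_model = {"rmse_rm": 2887.300573, "rmse_ar": 2609.826064, "rmse_rf": 2468.378731}

best = min(rmse_by_model, key=rmse_by_model.get)
print(best)  # rmse_rf

# Relative RMSE reduction of the best model with respect to each other model
for name, value in rmse_by_model.items():
    if name != best:
        reduction = (value - rmse_by_model[best]) / value
        print(f"{best} vs {name}: {reduction:.1%} lower")
```

With these numbers the random forest's RMSE is about 14.5% lower than multiple regression's and about 5.4% lower than the regression tree's.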

4 Interpretation

It may be similar to the interpretation given in R ... Pending ...

All numeric and categorical variables of the car price dataset were loaded.

Pending

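According to the root mean squared error (RMSE), the best regression model was the random forest, with an RMSE of 2468.37, compared with 2609.83 for the regression tree and 2887.30 for multiple linear regression. The RMSE is the square root of the mean squared difference between predicted and actual prices, so it can be read as a typical prediction error of about 2468 price units, with large errors weighted more heavily than in a simple average.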
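Because errors are squared before averaging, the RMSE is never smaller than the mean absolute error (MAE), and the gap grows when a few predictions miss badly. The stdlib sketch below compares both metrics on three rows copied from the rf comparison table (rows 36, 107 and 74); it is an illustration on a handful of rows, not the reported test RMSE.

```python
import math

# (actual price, random-forest prediction) for rows 36, 107 and 74 of the rf table
pairs = [(7295.0, 7491.525), (11900.0, 15933.55), (45400.0, 36978.05)]

errors = [actual - pred for actual, pred in pairs]
mae_rows = sum(abs(e) for e in errors) / len(errors)              # mean absolute error
rmse_rows = math.sqrt(sum(e ** 2 for e in errors) / len(errors))  # root mean squared error
print(f"MAE = {mae_rows:.2f}, RMSE = {rmse_rows:.2f}")
```

Here the single large miss on row 74 pulls the RMSE well above the MAE, which is why the RMSE is better described as a typical error that penalizes large misses than as a plain average difference.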

Training and validation sets were built with 80% and 20% of the data, respectively.
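As far as I know, `train_test_split(..., test_size=0.2)` rounds the number of test rows up, so the car price dataset's 205 rows split into 164 training rows and 41 validation rows, the 41 rows seen in the comparison tables. The stdlib sketch below reproduces only that shuffle-and-cut logic; the helper `split_80_20` and its seed are hypothetical, so the specific rows it selects will differ from the original split.

```python
import math
import random


def split_80_20(n_rows, test_pct=20, seed=2022):
    """Shuffle row indices and hold out ceil(test_pct% of n_rows) for validation."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = math.ceil(n_rows * test_pct / 100)  # round up, like a 0.2 test_size
    return indices[n_test:], indices[:n_test]


train_idx, valid_idx = split_80_20(205)
print(len(train_idx), len(valid_idx))  # 164 41
```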