Objective

Compare supervised models for predicting car prices by computing the root mean squared error (RMSE) of each model on validation data.

Description

The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv

Development

Load libraries

# Data handling
import numpy as np
import pandas as pd

# Plots
import matplotlib.pyplot as plt

# Preprocessing and modeling
from sklearn.model_selection import train_test_split

# Statistics and multiple linear regression
import statsmodels.api as sm  # statistics such as adjusted R-squared
import seaborn as sns  # plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures  # polynomial regression

# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV

# Random forest
from sklearn.ensemble import RandomForestRegressor


# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
##      car_ID  symboling                   CarName  ... citympg highwaympg    price
## 0         1          3        alfa-romero giulia  ...      21         27  13495.0
## 1         2          3       alfa-romero stelvio  ...      21         27  16500.0
## 2         3          1  alfa-romero Quadrifoglio  ...      19         26  16500.0
## 3         4          2               audi 100 ls  ...      24         30  13950.0
## 4         5          2                audi 100ls  ...      18         22  17450.0
## ..      ...        ...                       ...  ...     ...        ...      ...
## 200     201         -1           volvo 145e (sw)  ...      23         28  16845.0
## 201     202         -1               volvo 144ea  ...      19         25  19045.0
## 202     203         -1               volvo 244dl  ...      18         23  21485.0
## 203     204         -1                 volvo 246  ...      26         27  22470.0
## 204     205         -1               volvo 264gl  ...      19         25  22625.0
## 
## [205 rows x 26 columns]

Data exploration

print("Observaciones y variables: ", datos.shape)
## Observaciones y variables:  (205, 26)
print("Columnas y tipo de dato")
# datos.columns
## Columnas y tipo de dato
datos.dtypes
## car_ID                int64
## symboling             int64
## CarName              object
## fueltype             object
## aspiration           object
## doornumber           object
## carbody              object
## drivewheel           object
## enginelocation       object
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginetype           object
## cylindernumber       object
## enginesize            int64
## fuelsystem           object
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

Data dictionary

| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car, starting with the car company (Categorical) |
| 4 | fueltype | Fuel type: gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the engine: std or turbo (Categorical) |
| 6 | doornumber | Number of doors: two or four (Categorical) |
| 7 | carbody | Body of the car: convertible, sedan, wagon, … (Categorical) |
| 8 | drivewheel | Drive wheels: fwd, rwd or 4wd (Categorical) |
| 9 | enginelocation | Location of the engine: front or rear (Categorical) |
| 10 | wheelbase | Wheelbase, the distance between axles in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the engine (Categorical) |
| 17 | enginesize | Size of the engine (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke of the engine pistons (Numeric) |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Engine power in horsepower (Numeric) |
| 23 | peakrpm | Peak revolutions per minute (Numeric) |
| 24 | citympg | Fuel economy in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Fuel economy on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

Data preparation

Drop variables

Remove the variables that carry no statistical interest, that is, columns 1 and 3: car_ID and CarName.

datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
##      symboling fueltype aspiration  ... citympg highwaympg    price
## 0            3      gas        std  ...      21         27  13495.0
## 1            3      gas        std  ...      21         27  16500.0
## 2            1      gas        std  ...      19         26  16500.0
## 3            2      gas        std  ...      24         30  13950.0
## 4            2      gas        std  ...      18         22  17450.0
## ..         ...      ...        ...  ...     ...        ...      ...
## 200         -1      gas        std  ...      23         28  16845.0
## 201         -1      gas      turbo  ...      19         25  19045.0
## 202         -1      gas        std  ...      18         23  21485.0
## 203         -1   diesel      turbo  ...      26         27  22470.0
## 204         -1      gas      turbo  ...      19         25  22625.0
## 
## [205 rows x 24 columns]
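Listing the 24 columns to keep works, but the same selection can be written by naming only the two columns to remove. A minimal sketch on a tiny hypothetical frame (three rows copied from the output above, only a few columns) using `DataFrame.drop(columns=...)`:

```python
import pandas as pd

# Tiny hypothetical frame with a few of the CarPrice columns
datos = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "symboling": [3, 3, 1],
    "CarName": ["alfa-romero giulia", "alfa-romero stelvio", "alfa-romero Quadrifoglio"],
    "fueltype": ["gas", "gas", "gas"],
    "price": [13495.0, 16500.0, 16500.0],
})

# Drop the identifier and the free-text name; the remaining columns keep their order
datos = datos.drop(columns=["car_ID", "CarName"])
print(list(datos.columns))  # ['symboling', 'fueltype', 'price']
```

On the full dataset this yields the same 24 columns, and it keeps working if new feature columns are added to the CSV.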

Build dummy variables

The following variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
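Instead of reading the categorical columns off the `dtypes` listing, they can be selected programmatically. A minimal sketch on a hypothetical five-column frame; it uses `select_dtypes(exclude="number")`, which picks every non-numeric column (here the text columns that `get_dummies` will encode):

```python
import pandas as pd

# Tiny hypothetical frame mixing numeric and text columns
datos = pd.DataFrame({
    "symboling": [3, 1, -1],
    "fueltype": ["gas", "gas", "diesel"],
    "aspiration": ["std", "std", "turbo"],
    "wheelbase": [88.6, 94.5, 109.1],
    "price": [13495.0, 16500.0, 22470.0],
})

# Non-numeric columns (dtype object in this dataset) are the categorical ones
categoricas = datos.select_dtypes(exclude="number").columns.tolist()
print(categoricas)  # ['fueltype', 'aspiration']
```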

Identify the dummy variables and build a dataset that includes them.

The pandas function get_dummies() converts categorical data into indicator (dummy) variables.

What are dummy variables? Dummy encoding turns a categorical variable into several columns, one per category; each record gets a 1 in the column of the category it belongs to and a 0 in the others.

Example

| gender |
|---|
| MALE |
| FEMALE |
| MALE |

The same data as dummy variables:

| gender_male | gender_female |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |

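# drop_first=True keeps k-1 dummy columns per variable, which avoids perfect collinearity
# with the intercept (the "dummy variable trap"); dtype=int gives 0/1 instead of True/False
datos_dummis = pd.get_dummies(datos, drop_first = True, dtype = int)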
datos_dummis
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 0            3       88.6  ...                0                0
## 1            3       88.6  ...                0                0
## 2            1       94.5  ...                0                0
## 3            2       99.8  ...                0                0
## 4            2       99.4  ...                0                0
## ..         ...        ...  ...              ...              ...
## 200         -1      109.1  ...                0                0
## 201         -1      109.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 203         -1      109.1  ...                0                0
## 204         -1      109.1  ...                0                0
## 
## [205 rows x 44 columns]
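The gender example can be reproduced directly with `pd.get_dummies`, which also shows what `drop_first=True` removes. A minimal sketch on the three hypothetical records of the example:

```python
import pandas as pd

# The gender example as a one-column frame (hypothetical data)
ejemplo = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One 0/1 column per category, named <column>_<category>, categories in sorted order
todas = pd.get_dummies(ejemplo, dtype=int)
print(todas)

# drop_first=True drops the first (alphabetical) category: FEMALE becomes the baseline
reducida = pd.get_dummies(ejemplo, drop_first=True, dtype=int)
print(reducida)
```

With `drop_first=True` a value of 0 in `gender_MALE` already implies FEMALE, so no information is lost; this is why the 9 categorical variables above produce fewer columns than they have categories.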

Training and validation data

80% of the data are used for training and 20% for validation, with random seed 1280.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(
    datos_dummis.drop(columns = "price"),
    datos_dummis['price'],
    train_size = 0.80,
    random_state = 1280
)

Training data

X_entrena
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 68          -1      110.0  ...                0                0
## 135          2       99.1  ...                0                0
## 34           1       93.7  ...                0                0
## 186          2       97.3  ...                0                0
## 30           2       86.6  ...                0                0
## ..         ...        ...  ...              ...              ...
## 173         -1      102.4  ...                0                0
## 49           0      102.0  ...                0                0
## 178          3      102.9  ...                0                0
## 3            2       99.8  ...                0                0
## 189          3       94.5  ...                0                0
## 
## [164 rows x 43 columns]

Validation data

X_valida
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 128          3       89.5  ...                0                0
## 55           3       95.3  ...                0                0
## 14           1      103.5  ...                0                0
## 42           1       96.5  ...                0                0
## 88          -1       96.3  ...                1                0
## 15           0      103.5  ...                0                0
## 107          0      107.9  ...                0                0
## 194         -2      104.3  ...                0                0
## 64           0       98.8  ...                0                0
## 199         -1      104.3  ...                0                0
## 7            1      105.8  ...                0                0
## 133          2       99.1  ...                0                0
## 136          3       99.1  ...                0                0
## 59           1       98.8  ...                0                0
## 62           0       98.8  ...                0                0
## 17           0      110.0  ...                0                0
## 166          1       94.5  ...                0                0
## 12           0      101.2  ...                0                0
## 138          2       93.7  ...                0                0
## 6            1      105.8  ...                0                0
## 119          1       93.7  ...                1                0
## 74           1      112.0  ...                0                0
## 187          2       97.3  ...                0                0
## 160          0       95.7  ...                0                0
## 182          2       97.3  ...                0                0
## 171          2       98.4  ...                0                0
## 50           1       93.1  ...                0                0
## 177         -1      102.4  ...                0                0
## 191          0      100.4  ...                0                0
## 53           1       93.1  ...                0                0
## 111          0      107.9  ...                0                0
## 197         -1      104.3  ...                0                0
## 156          0       95.7  ...                0                0
## 132          3       99.1  ...                0                0
## 5            2       99.8  ...                0                0
## 54           1       93.1  ...                0                0
## 179          3      102.9  ...                0                0
## 184          2       97.3  ...                0                0
## 36           0       96.5  ...                0                0
## 4            2       99.4  ...                0                0
## 109          0      114.2  ...                0                0
## 
## [41 rows x 43 columns]

Supervised models

Multiple linear regression model (RM)

Build the multiple linear regression model (rm).

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()

Coefficients

Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 2.18001631e+02,  7.98408464e+01, -6.63545235e+01,  6.28381129e+02,
##         1.36576714e+01,  5.39647046e+00,  9.54415488e+01, -1.06219421e+03,
##        -4.10493549e+03, -6.12985603e+02,  5.07080839e-01,  2.11049109e+00,
##        -2.13646758e+02,  2.70003243e+02, -4.51807409e+03,  8.72823653e+02,
##         4.43428672e+02, -4.44671616e+03, -3.20702120e+03, -1.58917172e+03,
##        -2.97871087e+03,  3.06190065e+02,  1.42642247e+03,  9.16799620e+03,
##        -5.75689685e+03, -4.30457810e+02,  3.48215006e+03,  1.98214895e+03,
##        -5.06620511e+03, -6.53695832e+02, -7.24395733e+03, -1.01233241e+04,
##        -6.08995629e+03, -1.48239164e+03, -7.73314552e+03, -6.53695832e+02,
##        -2.09754696e+02, -1.21645363e+03,  4.51807409e+03, -2.38228193e+03,
##        -1.35841906e+02, -2.35237245e+03, -1.58999473e+03])
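A bare `coef_` array is hard to read because it does not say which coefficient belongs to which variable. Since `coef_` follows the column order of the training data, it can be labeled with those names, e.g. `pd.Series(modelo_rm.coef_, index=X_entrena.columns)`. A minimal sketch on hypothetical noise-free data, where the fitted values should recover the known coefficients:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data: price = 1000 + 50*horsepower - 200*citympg
X = pd.DataFrame({
    "horsepower": [70, 90, 110, 150, 200],
    "citympg": [30, 28, 24, 20, 16],
})
y = 1000 + 50 * X["horsepower"] - 200 * X["citympg"]

modelo = LinearRegression().fit(X, y)

# coef_ is ordered like the training columns, so it can be labeled with them
coeficientes = pd.Series(modelo.coef_, index=X.columns)
print("Intercept (beta_0):", modelo.intercept_)
print(coeficientes)
```

Because the dummy columns are on a 0/1 scale and the numeric ones are not, the coefficient sizes above are not directly comparable as measures of importance.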
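  • In multiple linear models, the coefficient of determination R² measures the share of the variance of the dependent variable (price) explained by the independent variables. modelo_rm.score() returns R² (not adjusted R²) on the data passed to it: 0.9393 on the training data, meaning the predictors explain about 93.93% of the variance of price in the training set.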
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9393250510068653
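The adjusted R² penalizes R² for the number of predictors \(p\) relative to the number of observations \(n\): \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). A minimal sketch of that formula; the call below plugs in the training R² shown above with n = 164 rows and p = 43 columns of X_entrena, treating every column as a predictor (some dummy columns appear to be collinear, so this is an approximation). The function name is hypothetical; statsmodels' `OLS(...).fit().rsquared_adj` is an alternative, already importable as `sm`.

```python
def r2_ajustado(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Training R^2 from the cell above, n = 164 training rows, p = 43 feature columns
print(round(r2_ajustado(0.9393250510068653, 164, 43), 4))  # 0.9176
```

With 43 predictors and only 164 rows, the penalty lowers the value by about two percentage points.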

Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# [:-1] omits the last element, so only 40 of the 41 predictions are printed
print(predicciones_rm[:-1])
## [37962.16085946 12731.50882386 22778.32764671 10547.10275963
##   9132.42155983 29764.87032954 13408.99438043 17347.64882652
##   9509.85869876 17787.94407715 21081.08784834 14123.74297246
##  11791.95807102  9930.84637472 11046.7611251  33432.38841051
##   7512.62175913 21025.50547975  7440.43701072 21877.01525524
##   6569.53908634 36878.91882902  9627.97435347  9136.91492541
##  10590.63109053 13673.07511573  4058.60356406  7887.45999625
##  17963.83757767  6695.27636089 17809.72158065 16681.76067389
##   8075.9024471  12961.52452907 18022.39589349  6669.14900262
##  20955.30111267 10163.39182974  6778.3411609  18789.86267249]

Comparison table

# assign() returns a copy of X_valida with the real and predicted prices as new columns
comparaciones = X_valida.assign(
    Precio_Real = Y_valida,
    Precio_Prediccion = predicciones_rm
)
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 128          3       89.5  ...      37028.0       37962.160859
## 55           3       95.3  ...      10945.0       12731.508824
## 14           1      103.5  ...      24565.0       22778.327647
## 42           1       96.5  ...      10345.0       10547.102760
## 88          -1       96.3  ...       9279.0        9132.421560
## 15           0      103.5  ...      30760.0       29764.870330
## 107          0      107.9  ...      11900.0       13408.994380
## 194         -2      104.3  ...      12940.0       17347.648827
## 64           0       98.8  ...      11245.0        9509.858699
## 199         -1      104.3  ...      18950.0       17787.944077
## 7            1      105.8  ...      18920.0       21081.087848
## 133          2       99.1  ...      12170.0       14123.742972
## 136          3       99.1  ...      18150.0       11791.958071
## 59           1       98.8  ...       8845.0        9930.846375
## 62           0       98.8  ...      10245.0       11046.761125
## 17           0      110.0  ...      36880.0       33432.388411
## 166          1       94.5  ...       9538.0        7512.621759
## 12           0      101.2  ...      20970.0       21025.505480
## 138          2       93.7  ...       5118.0        7440.437011
## 6            1      105.8  ...      17710.0       21877.015255
## 119          1       93.7  ...       7957.0        6569.539086
## 74           1      112.0  ...      45400.0       36878.918829
## 187          2       97.3  ...       9495.0        9627.974353
## 160          0       95.7  ...       7738.0        9136.914925
## 182          2       97.3  ...       7775.0       10590.631091
## 171          2       98.4  ...      11549.0       13673.075116
## 50           1       93.1  ...       5195.0        4058.603564
## 177         -1      102.4  ...      11248.0        7887.459996
## 191          0      100.4  ...      13295.0       17963.837578
## 53           1       93.1  ...       6695.0        6695.276361
## 111          0      107.9  ...      15580.0       17809.721581
## 197         -1      104.3  ...      16515.0       16681.760674
## 156          0       95.7  ...       6938.0        8075.902447
## 132          3       99.1  ...      11850.0       12961.524529
## 5            2       99.8  ...      15250.0       18022.395893
## 54           1       93.1  ...       7395.0        6669.149003
## 179          3      102.9  ...      15998.0       20955.301113
## 184          2       97.3  ...       7995.0       10163.391830
## 36           0       96.5  ...       7295.0        6778.341161
## 4            2       99.4  ...      17450.0       18789.862672
## 109          0      114.2  ...      12440.0       12873.501508
## 
## [41 rows x 45 columns]

RMSE of the rm model

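# The squared=False argument was deprecated in scikit-learn 1.4 and removed in 1.6;
# root_mean_squared_error (scikit-learn >= 1.4) computes the RMSE directly
from sklearn.metrics import root_mean_squared_error

rmse_rm = root_mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm
       )
print(f"El error (rmse) de test es: {rmse_rm}")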
## El error (rmse) de test es: 2679.906783891391

Or, equivalently, as the square root of the MSE (this form works with any scikit-learn version):

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2679.906783891391
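The RMSE is \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\), so it is expressed in the same units as price: a validation RMSE of about 2,680 means a typical prediction error on the order of 2,680 dollars. A minimal sketch, on hypothetical prices rather than the dataset, checking the formula computed by hand against scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical real and predicted prices (not from the dataset)
y_real = np.array([10000.0, 15000.0, 20000.0, 25000.0])
y_pred = np.array([11000.0, 14000.0, 20000.0, 27000.0])

# RMSE = square root of the mean of the squared residuals
residuos = y_real - y_pred
rmse_manual = np.sqrt(np.mean(residuos ** 2))
rmse_sklearn = np.sqrt(mean_squared_error(y_real, y_pred))

print(rmse_manual, rmse_sklearn)  # both values are about 1224.74
```

Squaring makes large errors weigh more than small ones: the single 2,000-dollar miss here contributes two thirds of the total squared error.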

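Regression tree model (AR)

Build the regression tree model (ar).

modelo_ar = DecisionTreeRegressor(
            # max_depth is not set, so nodes are split until leaves are pure
            # or hold fewer than min_samples_split (default 2) samples
            #max_depth         = 3,
            random_state      = 1280
          )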

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1280)

Regression tree visualization

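print(f"Profundidad del árbol: {modelo_ar.get_depth()}")
## Profundidad del árbol: 15
print(f"Número de nodos terminales: {modelo_ar.get_n_leaves()}")
## Número de nodos terminales: 154

# With 154 leaves the plot would be hard to read at this figure size, so it is
# left commented out. feature_names must come from the dummy-encoded columns the
# model was trained on (43 features), not from the original datos columns.
#fig, ax = plt.subplots(figsize=(12, 5))
#plot = plot_tree(
#            decision_tree = modelo_ar,
#            feature_names = list(datos_dummis.drop(columns = "price").columns),
#            filled        = True,
#            impurity      = False,
#            fontsize      = 10,
#            precision     = 2,
#            ax            = ax
#       )

#plot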
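The unrestricted tree reached depth 15 with 154 leaves for 164 training rows, about 1.06 rows per leaf on average, so it is likely to overfit. The document already imports GridSearchCV and leaves max_depth = 3 commented out; a minimal sketch of choosing max_depth by 5-fold cross-validated RMSE follows. The synthetic data stand in for X_entrena and Y_entrena, and the grid values and cv=5 are assumptions, not the author's settings.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for X_entrena / Y_entrena: 164 rows, 3 features, noisy target
rng = np.random.default_rng(1280)
X = rng.uniform(0, 10, size=(164, 3))
y = 2000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 1000, size=164)

# Try several depths (None = unrestricted) and score each by cross-validated RMSE
grid = GridSearchCV(
    estimator=DecisionTreeRegressor(random_state=1280),
    param_grid={"max_depth": [2, 3, 4, 5, 6, 8, 10, None]},
    scoring="neg_root_mean_squared_error",  # scikit-learn maximizes, so RMSE is negated
    cv=5,
)
grid.fit(X, y)

# refit=True (default) retrains the best configuration on all the data
best_tree = grid.best_estimator_
print("Best max_depth:", grid.best_params_["max_depth"])
print("Cross-validated RMSE:", -grid.best_score_)
print("Depth:", best_tree.get_depth(), "Leaves:", best_tree.get_n_leaves())
```

On the real data, the chosen tree would then be evaluated on X_valida with the same RMSE used for the rm model, so the comparison stays on equal terms.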

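Decision rules of the tree

export_text() prints the fitted tree as nested if/else rules; each leaf shows the predicted price (value), the mean price of the training cars that reach it. By default it prints only the first 10 levels (max_depth=10), so deeper subtrees appear as "truncated branch of depth N".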

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos_dummis.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2544.00
## |   |   |--- horsepower <= 89.00
## |   |   |   |--- curbweight <= 2121.00
## |   |   |   |   |--- horsepower <= 68.50
## |   |   |   |   |   |--- curbweight <= 1987.00
## |   |   |   |   |   |   |--- highwaympg <= 38.50
## |   |   |   |   |   |   |   |--- curbweight <= 1924.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6377.00]
## |   |   |   |   |   |   |   |   |   |--- boreratio >  3.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |--- value: [6795.00]
## |   |   |   |   |   |   |   |--- curbweight >  1924.50
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6189.00]
## |   |   |   |   |   |   |--- highwaympg >  38.50
## |   |   |   |   |   |   |   |--- citympg <= 48.00
## |   |   |   |   |   |   |   |   |--- wheelbase <= 91.05
## |   |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  91.05
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 91.00
## |   |   |   |   |   |   |   |   |   |   |--- carlength <= 153.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5399.00]
## |   |   |   |   |   |   |   |   |   |   |--- carlength >  153.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  91.00
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.13
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5348.00]
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.13
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |   |   |--- citympg >  48.00
## |   |   |   |   |   |   |   |   |--- value: [6479.00]
## |   |   |   |   |   |--- curbweight >  1987.00
## |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |--- carlength <= 166.30
## |   |   |   |   |   |   |   |   |--- horsepower <= 61.50
## |   |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  61.50
## |   |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |--- carlength >  166.30
## |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |--- compressionratio <= 9.20
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6488.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6338.00]
## |   |   |   |   |   |   |   |--- compressionratio >  9.20
## |   |   |   |   |   |   |   |   |--- value: [6669.00]
## |   |   |   |   |--- horsepower >  68.50
## |   |   |   |   |   |--- carheight <= 53.60
## |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |--- carlength <= 157.35
## |   |   |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |   |   |--- carlength >  157.35
## |   |   |   |   |   |   |   |   |--- enginesize <= 93.50
## |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  93.50
## |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2030.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7349.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2030.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7999.00]
## |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |--- curbweight <= 1948.00
## |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 73.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6295.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  73.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6529.00]
## |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |   |--- curbweight >  1948.00
## |   |   |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7129.00]
## |   |   |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |   |   |--- value: [7799.00]
## |   |   |   |   |   |--- carheight >  53.60
## |   |   |   |   |   |   |--- curbweight <= 1903.50
## |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |--- curbweight >  1903.50
## |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |--- citympg <= 28.00
## |   |   |   |   |   |   |   |   |   |--- value: [7053.00]
## |   |   |   |   |   |   |   |   |--- citympg >  28.00
## |   |   |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7295.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7499.00]
## |   |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7299.00]
## |   |   |   |--- curbweight >  2121.00
## |   |   |   |   |--- carlength <= 174.10
## |   |   |   |   |   |--- carwidth <= 63.90
## |   |   |   |   |   |   |--- highwaympg <= 30.00
## |   |   |   |   |   |   |   |--- value: [6785.00]
## |   |   |   |   |   |   |--- highwaympg >  30.00
## |   |   |   |   |   |   |   |--- citympg <= 29.00
## |   |   |   |   |   |   |   |   |--- horsepower <= 67.50
## |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  67.50
## |   |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |   |   |--- citympg >  29.00
## |   |   |   |   |   |   |   |   |--- carlength <= 168.50
## |   |   |   |   |   |   |   |   |   |--- value: [7609.00]
## |   |   |   |   |   |   |   |   |--- carlength >  168.50
## |   |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |--- carwidth >  63.90
## |   |   |   |   |   |   |--- highwaympg <= 27.00
## |   |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |   |--- highwaympg >  27.00
## |   |   |   |   |   |   |   |--- boreratio <= 3.23
## |   |   |   |   |   |   |   |   |--- curbweight <= 2282.00
## |   |   |   |   |   |   |   |   |   |--- carlength <= 166.90
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8358.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  166.90
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2255.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 5
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2255.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2282.00
## |   |   |   |   |   |   |   |   |   |--- value: [9095.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.23
## |   |   |   |   |   |   |   |   |--- carheight <= 50.50
## |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |   |--- carheight >  50.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 4650.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  4650.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 96.60
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  96.60
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |--- carlength >  174.10
## |   |   |   |   |   |--- compressionratio <= 15.75
## |   |   |   |   |   |   |--- carheight <= 54.80
## |   |   |   |   |   |   |   |--- curbweight <= 2338.00
## |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |   |--- curbweight >  2338.00
## |   |   |   |   |   |   |   |   |--- enginesize <= 116.00
## |   |   |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  116.00
## |   |   |   |   |   |   |   |   |   |--- value: [10595.00]
## |   |   |   |   |   |   |--- carheight >  54.80
## |   |   |   |   |   |   |   |--- carheight <= 57.65
## |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |   |--- carheight >  57.65
## |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |--- compressionratio >  15.75
## |   |   |   |   |   |   |--- carlength <= 176.70
## |   |   |   |   |   |   |   |--- value: [10698.00]
## |   |   |   |   |   |   |--- carlength >  176.70
## |   |   |   |   |   |   |   |--- value: [10795.00]
## |   |   |--- horsepower >  89.00
## |   |   |   |--- peakrpm <= 5650.00
## |   |   |   |   |--- carwidth <= 63.90
## |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |   |--- value: [7689.00]
## |   |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |--- carwidth >  63.90
## |   |   |   |   |   |--- stroke <= 3.43
## |   |   |   |   |   |   |--- carlength <= 175.05
## |   |   |   |   |   |   |   |--- carlength <= 162.50
## |   |   |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |   |   |   |--- carlength >  162.50
## |   |   |   |   |   |   |   |   |--- compressionratio <= 8.10
## |   |   |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |   |   |   |   |--- compressionratio >  8.10
## |   |   |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- carlength <= 171.85
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- carlength >  171.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9960.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [10198.00]
## |   |   |   |   |   |   |--- carlength >  175.05
## |   |   |   |   |   |   |   |--- value: [13950.00]
## |   |   |   |   |   |--- stroke >  3.43
## |   |   |   |   |   |   |--- curbweight <= 2538.00
## |   |   |   |   |   |   |   |--- curbweight <= 2408.50
## |   |   |   |   |   |   |   |   |--- symboling <= 2.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2313.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2313.00
## |   |   |   |   |   |   |   |   |   |   |--- enginesize <= 115.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |   |   |   |   |   |--- enginesize >  115.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- symboling >  2.00
## |   |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |   |   |--- curbweight >  2408.50
## |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9639.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- enginesize <= 127.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |   |   |--- enginesize >  127.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |--- curbweight >  2538.00
## |   |   |   |   |   |   |   |--- value: [8449.00]
## |   |   |   |--- peakrpm >  5650.00
## |   |   |   |   |--- curbweight <= 2382.50
## |   |   |   |   |   |--- stroke <= 3.17
## |   |   |   |   |   |   |--- value: [9298.00]
## |   |   |   |   |   |--- stroke >  3.17
## |   |   |   |   |   |   |--- value: [11845.00]
## |   |   |   |   |--- curbweight >  2382.50
## |   |   |   |   |   |--- boreratio <= 3.41
## |   |   |   |   |   |   |--- horsepower <= 118.00
## |   |   |   |   |   |   |   |--- carlength <= 172.20
## |   |   |   |   |   |   |   |   |--- value: [13645.00]
## |   |   |   |   |   |   |   |--- carlength >  172.20
## |   |   |   |   |   |   |   |   |--- value: [12945.00]
## |   |   |   |   |   |   |--- horsepower >  118.00
## |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |--- boreratio >  3.41
## |   |   |   |   |   |   |--- symboling <= 1.00
## |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |   |   |--- symboling >  1.00
## |   |   |   |   |   |   |   |--- value: [16430.00]
## |   |--- curbweight >  2544.00
## |   |   |--- carwidth <= 67.45
## |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |--- horsepower <= 100.00
## |   |   |   |   |   |--- carheight <= 55.15
## |   |   |   |   |   |   |--- carheight <= 53.25
## |   |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |   |   |--- carheight >  53.25
## |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |   |--- carheight >  55.15
## |   |   |   |   |   |   |--- peakrpm <= 4950.00
## |   |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |   |--- peakrpm >  4950.00
## |   |   |   |   |   |   |   |--- value: [9295.00]
## |   |   |   |   |--- horsepower >  100.00
## |   |   |   |   |   |--- boreratio <= 3.52
## |   |   |   |   |   |   |--- horsepower <= 153.00
## |   |   |   |   |   |   |   |--- enginesize <= 155.50
## |   |   |   |   |   |   |   |   |--- compressionratio <= 9.15
## |   |   |   |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |   |   |   |   |   |--- compressionratio >  9.15
## |   |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |--- enginesize >  155.50
## |   |   |   |   |   |   |   |   |--- value: [14399.00]
## |   |   |   |   |   |   |--- horsepower >  153.00
## |   |   |   |   |   |   |   |--- enginesize <= 156.50
## |   |   |   |   |   |   |   |   |--- value: [16500.00]
## |   |   |   |   |   |   |   |--- enginesize >  156.50
## |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |--- boreratio >  3.52
## |   |   |   |   |   |   |--- curbweight <= 2877.00
## |   |   |   |   |   |   |   |--- carlength <= 173.40
## |   |   |   |   |   |   |   |   |--- fuelsystem_spdi <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_spdi >  0.50
## |   |   |   |   |   |   |   |   |   |--- drivewheel_fwd <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [12764.00]
## |   |   |   |   |   |   |   |   |   |--- drivewheel_fwd >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [12629.00]
## |   |   |   |   |   |   |   |--- carlength >  173.40
## |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 113.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11694.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  113.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |--- curbweight >  2877.00
## |   |   |   |   |   |   |   |--- peakrpm <= 4900.00
## |   |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |   |   |   |   |--- peakrpm >  4900.00
## |   |   |   |   |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [13415.00]
## |   |   |   |   |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14869.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14489.00]
## |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |--- carlength <= 178.50
## |   |   |   |   |   |--- horsepower <= 120.50
## |   |   |   |   |   |   |--- stroke <= 3.40
## |   |   |   |   |   |   |   |--- value: [18280.00]
## |   |   |   |   |   |   |--- stroke >  3.40
## |   |   |   |   |   |   |   |--- value: [18344.00]
## |   |   |   |   |   |--- horsepower >  120.50
## |   |   |   |   |   |   |--- value: [21105.00]
## |   |   |   |   |--- carlength >  178.50
## |   |   |   |   |   |--- horsepower <= 158.00
## |   |   |   |   |   |   |--- carlength <= 185.60
## |   |   |   |   |   |   |   |--- stroke <= 3.34
## |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |--- stroke >  3.34
## |   |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |--- carlength >  185.60
## |   |   |   |   |   |   |   |--- peakrpm <= 5325.00
## |   |   |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [15510.00]
## |   |   |   |   |   |   |   |--- peakrpm >  5325.00
## |   |   |   |   |   |   |   |   |--- value: [15985.00]
## |   |   |   |   |   |--- horsepower >  158.00
## |   |   |   |   |   |   |--- highwaympg <= 24.00
## |   |   |   |   |   |   |   |--- value: [18420.00]
## |   |   |   |   |   |   |--- highwaympg >  24.00
## |   |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |--- carwidth >  67.45
## |   |   |   |--- carwidth <= 68.85
## |   |   |   |   |--- peakrpm <= 5100.00
## |   |   |   |   |   |--- curbweight <= 3224.50
## |   |   |   |   |   |   |--- fueltype_gas <= 0.50
## |   |   |   |   |   |   |   |--- value: [13200.00]
## |   |   |   |   |   |   |--- fueltype_gas >  0.50
## |   |   |   |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |   |   |   |--- value: [16503.00]
## |   |   |   |   |   |--- curbweight >  3224.50
## |   |   |   |   |   |   |--- carheight <= 57.70
## |   |   |   |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |   |   |   |--- value: [17425.00]
## |   |   |   |   |   |   |--- carheight >  57.70
## |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |--- peakrpm >  5100.00
## |   |   |   |   |   |--- boreratio <= 3.86
## |   |   |   |   |   |   |--- compressionratio <= 8.85
## |   |   |   |   |   |   |   |--- compressionratio <= 7.40
## |   |   |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |   |--- compressionratio >  7.40
## |   |   |   |   |   |   |   |   |--- compressionratio <= 8.25
## |   |   |   |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |   |   |   |   |   |   |--- compressionratio >  8.25
## |   |   |   |   |   |   |   |   |   |--- value: [19045.00]
## |   |   |   |   |   |   |--- compressionratio >  8.85
## |   |   |   |   |   |   |   |--- symboling <= 2.00
## |   |   |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |   |   |   |   |   |--- symboling >  2.00
## |   |   |   |   |   |   |   |   |--- enginesize <= 176.00
## |   |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  176.00
## |   |   |   |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |--- boreratio >  3.86
## |   |   |   |   |   |   |--- value: [22018.00]
## |   |   |   |--- carwidth >  68.85
## |   |   |   |   |--- highwaympg <= 27.50
## |   |   |   |   |   |--- horsepower <= 137.00
## |   |   |   |   |   |   |--- horsepower <= 124.00
## |   |   |   |   |   |   |   |--- horsepower <= 110.00
## |   |   |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |   |   |   |--- horsepower >  110.00
## |   |   |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |   |   |--- horsepower >  124.00
## |   |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |   |--- horsepower >  137.00
## |   |   |   |   |   |   |--- value: [23875.00]
## |   |   |   |   |--- highwaympg >  27.50
## |   |   |   |   |   |--- value: [16845.00]
## |--- enginesize >  182.00
## |   |--- compressionratio <= 8.05
## |   |   |--- doornumber_two <= 0.50
## |   |   |   |--- value: [40960.00]
## |   |   |--- doornumber_two >  0.50
## |   |   |   |--- value: [41315.00]
## |   |--- compressionratio >  8.05
## |   |   |--- enginesize <= 188.50
## |   |   |   |--- carwidth <= 71.00
## |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |--- value: [28248.00]
## |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |--- value: [28176.00]
## |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |--- value: [25552.00]
## |   |   |   |--- carwidth >  71.00
## |   |   |   |   |--- value: [31600.00]
## |   |   |--- enginesize >  188.50
## |   |   |   |--- enginesize <= 218.50
## |   |   |   |   |--- highwaympg <= 26.50
## |   |   |   |   |   |--- value: [33278.00]
## |   |   |   |   |--- highwaympg >  26.50
## |   |   |   |   |   |--- value: [31400.50]
## |   |   |   |--- enginesize >  218.50
## |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |--- highwaympg <= 18.50
## |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |--- highwaympg >  18.50
## |   |   |   |   |   |   |--- value: [33900.00]
## |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |--- citympg <= 14.50
## |   |   |   |   |   |   |--- value: [36000.00]
## |   |   |   |   |   |--- citympg >  14.50
## |   |   |   |   |   |   |--- value: [35056.00]

Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos_dummis.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##                 predictor   importancia
## 6              enginesize  6.665171e-01
## 5              curbweight  2.153374e-01
## 10             horsepower  3.227645e-02
## 3                carwidth  2.904430e-02
## 9        compressionratio  1.557212e-02
## 11                peakrpm  1.209516e-02
## 19          carbody_sedan  8.951184e-03
## 2               carlength  7.580508e-03
## 7               boreratio  4.129452e-03
## 13             highwaympg  3.371561e-03
## 4               carheight  1.641767e-03
## 8                  stroke  1.219705e-03
## 14           fueltype_gas  7.800696e-04
## 18      carbody_hatchback  5.339940e-04
## 16         doornumber_two  3.286578e-04
## 0               symboling  2.359916e-04
## 12                citympg  2.081932e-04
## 15       aspiration_turbo  1.474810e-04
## 1               wheelbase  9.602660e-06
## 26         enginetype_ohc  6.038975e-06
## 41        fuelsystem_spdi  4.925198e-06
## 20          carbody_wagon  3.736431e-06
## 40        fuelsystem_mpfi  3.707720e-06
## 21         drivewheel_fwd  9.408164e-07
## 36        fuelsystem_2bbl  5.162230e-11
## 24       enginetype_dohcv  0.000000e+00
## 34  cylindernumber_twelve  0.000000e+00
## 17        carbody_hardtop  0.000000e+00
## 22         drivewheel_rwd  0.000000e+00
## 39         fuelsystem_mfi  0.000000e+00
## 38         fuelsystem_idi  0.000000e+00
## 37        fuelsystem_4bbl  0.000000e+00
## 35     cylindernumber_two  0.000000e+00
## 33   cylindernumber_three  0.000000e+00
## 25           enginetype_l  0.000000e+00
## 32     cylindernumber_six  0.000000e+00
## 31    cylindernumber_four  0.000000e+00
## 30    cylindernumber_five  0.000000e+00
## 29       enginetype_rotor  0.000000e+00
## 28        enginetype_ohcv  0.000000e+00
## 27        enginetype_ohcf  0.000000e+00
## 23    enginelocation_rear  0.000000e+00
## 42        fuelsystem_spfi  0.000000e+00

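According to the table above, the five most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth and compressionratio. enginesize alone accounts for about two thirds of the total importance (0.667).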
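The top predictors can also be selected programmatically instead of being read off by eye. The sketch below is a minimal example that assumes `importancia_predictores` has the same two columns as above. To keep it self-contained, the table is rebuilt by hand from the first rows of the printed output. scikit-learn normalizes `feature_importances_` to sum to 1, so summing the top values gives their share of the total.

```python
import pandas as pd

# First rows of the importance table above, copied by hand for self-containment
importancia_predictores = pd.DataFrame({
    "predictor": ["enginesize", "curbweight", "horsepower", "carwidth",
                  "compressionratio", "peakrpm", "carbody_sedan"],
    "importancia": [6.665171e-01, 2.153374e-01, 3.227645e-02, 2.904430e-02,
                    1.557212e-02, 1.209516e-02, 8.951184e-03],
})

# nlargest() sorts by the column in descending order and keeps the k largest rows
top5 = importancia_predictores.nlargest(5, "importancia")
print(top5["predictor"].tolist())

# Share of the total importance captured by the two leading predictors
print(round(top5["importancia"].head(2).sum(), 3))
```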

Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([33278., 11845., 15510.,  9549.,  9279., 40960., 16630., 15985.,
##         8495., 14489., 22470., 15510.,  9989., 10595.,  8495., 40960.,
##         9298., 21105.,  7299., 22470.,  7689., 41315.,  9095.,  7999.,
##         8495.,  9989.,  6095.,  9988., 13845.,  6229., 16630., 13415.,
##         7999.,  9989., 13950.,  6229., 16558.,  8495.,  7295., 18280.,
##        13860.])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
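The table is easier to scan with an explicit error column. The sketch below is minimal and uses three rows copied from the table above. The column names `Error` and `Error_Abs` are illustrative additions, not part of the original analysis.

```python
import pandas as pd

# Three rows copied from the ar comparison table above (index = original row id)
comparaciones = pd.DataFrame(
    {"Precio_Real": [37028.0, 10945.0, 24565.0],
     "Precio_Prediccion": [33278.0, 11845.0, 15510.0]},
    index=[128, 55, 14],
)

# Signed residual (positive means the model underestimates the price)
comparaciones = comparaciones.assign(
    Error=comparaciones["Precio_Real"] - comparaciones["Precio_Prediccion"]
)
# Absolute residual, useful for ranking the worst predictions
comparaciones = comparaciones.assign(Error_Abs=comparaciones["Error"].abs())

# Rows with the largest absolute error first
ordenado = comparaciones.sort_values("Error_Abs", ascending=False)
print(ordenado)
```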
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 128          3       89.5  ...      37028.0            33278.0
## 55           3       95.3  ...      10945.0            11845.0
## 14           1      103.5  ...      24565.0            15510.0
## 42           1       96.5  ...      10345.0             9549.0
## 88          -1       96.3  ...       9279.0             9279.0
## 15           0      103.5  ...      30760.0            40960.0
## 107          0      107.9  ...      11900.0            16630.0
## 194         -2      104.3  ...      12940.0            15985.0
## 64           0       98.8  ...      11245.0             8495.0
## 199         -1      104.3  ...      18950.0            14489.0
## 7            1      105.8  ...      18920.0            22470.0
## 133          2       99.1  ...      12170.0            15510.0
## 136          3       99.1  ...      18150.0             9989.0
## 59           1       98.8  ...       8845.0            10595.0
## 62           0       98.8  ...      10245.0             8495.0
## 17           0      110.0  ...      36880.0            40960.0
## 166          1       94.5  ...       9538.0             9298.0
## 12           0      101.2  ...      20970.0            21105.0
## 138          2       93.7  ...       5118.0             7299.0
## 6            1      105.8  ...      17710.0            22470.0
## 119          1       93.7  ...       7957.0             7689.0
## 74           1      112.0  ...      45400.0            41315.0
## 187          2       97.3  ...       9495.0             9095.0
## 160          0       95.7  ...       7738.0             7999.0
## 182          2       97.3  ...       7775.0             8495.0
## 171          2       98.4  ...      11549.0             9989.0
## 50           1       93.1  ...       5195.0             6095.0
## 177         -1      102.4  ...      11248.0             9988.0
## 191          0      100.4  ...      13295.0            13845.0
## 53           1       93.1  ...       6695.0             6229.0
## 111          0      107.9  ...      15580.0            16630.0
## 197         -1      104.3  ...      16515.0            13415.0
## 156          0       95.7  ...       6938.0             7999.0
## 132          3       99.1  ...      11850.0             9989.0
## 5            2       99.8  ...      15250.0            13950.0
## 54           1       93.1  ...       7395.0             6229.0
## 179          3      102.9  ...      15998.0            16558.0
## 184          2       97.3  ...       7995.0             8495.0
## 36           0       96.5  ...       7295.0             7295.0
## 4            2       99.4  ...      17450.0            18280.0
## 109          0      114.2  ...      12440.0            13860.0
## 
## [41 rows x 45 columns]

RMSE of the ar model

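from sklearn.metrics import root_mean_squared_error  # scikit-learn >= 1.4

# mean_squared_error(..., squared = False) was deprecated in 1.4 and removed in 1.6
rmse_ar = root_mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar
       )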
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 3297.2461458528483

or

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 3297.2461458528483
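RMSE is the square root of the mean squared difference between actual and predicted prices, RMSE = sqrt(mean((y_real - y_pred)^2)). It is expressed in the same units as `price`. The sketch below computes it by hand and checks the result against scikit-learn. It uses only three rows from the ar table, so its value will not match the 3297.25 reported above.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Actual and predicted prices for three validation rows of the ar table above
y_real = np.array([37028.0, 10945.0, 24565.0])
y_pred = np.array([33278.0, 11845.0, 15510.0])

# Manual RMSE: square the residuals, average them, take the square root
residuos = y_real - y_pred
rmse_manual = np.sqrt(np.mean(residuos ** 2))

# Same quantity through scikit-learn (np.sqrt works on every version)
rmse_sklearn = np.sqrt(mean_squared_error(y_real, y_pred))
print(rmse_manual, rmse_sklearn)
```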

Random forest model (RF)

The random forest model (rf) is built with seed 1280 and 20 trees (n_estimators = 20).

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1280)

modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1280)

Variable importance

# pending ...
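Until this step is completed, the table can be built the same way as for the regression tree. The sketch below wraps that logic in a hypothetical helper, `tabla_importancia`, and demonstrates it on synthetic data from `make_regression`. The synthetic data are an assumption and do not reproduce the car-price results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor


def tabla_importancia(modelo, nombres):
    """Hypothetical helper: sorted table of a fitted model's feature_importances_."""
    tabla = pd.DataFrame({"predictor": nombres,
                          "importancia": modelo.feature_importances_})
    return tabla.sort_values("importancia", ascending=False).reset_index(drop=True)


# Synthetic stand-in for X_entrena / Y_entrena (assumption: not the car data)
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       random_state=1280)
nombres = [f"x{i}" for i in range(X.shape[1])]

modelo = RandomForestRegressor(n_estimators=20, random_state=1280).fit(X, y)
tabla = tabla_importancia(modelo, nombres)
print(tabla)
```

With the objects from this document, the call would presumably be `tabla_importancia(modelo_rf, datos_dummis.drop(columns = "price").columns)`.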

Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([31863.775     , 12755.        , 17206.7       ,  8890.        ,
##         9388.        , 37686.5       , 16791.75      , 15437.95      ,
##         9825.        , 17011.15      , 20134.6       , 14415.25      ,
##        16768.60835   , 10256.75      ,  9757.15      , 38117.43571429,
##        10736.55      , 18657.41666667,  7125.55      , 20777.25      ,
##         7964.3       , 39006.63571429,  8054.7       ,  7736.85      ,
##         7299.5       , 12614.1       ,  6105.65      , 10292.6       ,
##        14880.25      ,  6664.8       , 16676.4       , 14416.5       ,
##         7775.6       , 14035.25      , 14001.575     ,  6723.76666667,
##        17296.75      ,  7299.5       ,  7737.33333333, 16814.3       ,
##        16138.75      ])
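The non-integer predictions (for example 38117.43571429) come from averaging. In scikit-learn, `RandomForestRegressor.predict` returns the mean of the predictions of the fitted trees stored in `estimators_`. By default, each tree is trained on a bootstrap sample. The minimal sketch below verifies this behavior on synthetic data, which stands in for the car-price data.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption: stands in for the car-price training/validation sets)
X, y = make_regression(n_samples=150, n_features=4, noise=10.0, random_state=1280)
X_tr, X_va, y_tr = X[:120], X[120:], y[:120]

bosque = RandomForestRegressor(n_estimators=20, random_state=1280).fit(X_tr, y_tr)

# Prediction of every individual tree: shape (20 trees, 30 validation rows)
por_arbol = np.array([arbol.predict(X_va) for arbol in bosque.estimators_])

# The forest prediction is the average over trees
promedio = por_arbol.mean(axis=0)
print(np.allclose(promedio, bosque.predict(X_va)))
```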

Comparison table


comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 128          3       89.5  ...      37028.0       31863.775000
## 55           3       95.3  ...      10945.0       12755.000000
## 14           1      103.5  ...      24565.0       17206.700000
## 42           1       96.5  ...      10345.0        8890.000000
## 88          -1       96.3  ...       9279.0        9388.000000
## 15           0      103.5  ...      30760.0       37686.500000
## 107          0      107.9  ...      11900.0       16791.750000
## 194         -2      104.3  ...      12940.0       15437.950000
## 64           0       98.8  ...      11245.0        9825.000000
## 199         -1      104.3  ...      18950.0       17011.150000
## 7            1      105.8  ...      18920.0       20134.600000
## 133          2       99.1  ...      12170.0       14415.250000
## 136          3       99.1  ...      18150.0       16768.608350
## 59           1       98.8  ...       8845.0       10256.750000
## 62           0       98.8  ...      10245.0        9757.150000
## 17           0      110.0  ...      36880.0       38117.435714
## 166          1       94.5  ...       9538.0       10736.550000
## 12           0      101.2  ...      20970.0       18657.416667
## 138          2       93.7  ...       5118.0        7125.550000
## 6            1      105.8  ...      17710.0       20777.250000
## 119          1       93.7  ...       7957.0        7964.300000
## 74           1      112.0  ...      45400.0       39006.635714
## 187          2       97.3  ...       9495.0        8054.700000
## 160          0       95.7  ...       7738.0        7736.850000
## 182          2       97.3  ...       7775.0        7299.500000
## 171          2       98.4  ...      11549.0       12614.100000
## 50           1       93.1  ...       5195.0        6105.650000
## 177         -1      102.4  ...      11248.0       10292.600000
## 191          0      100.4  ...      13295.0       14880.250000
## 53           1       93.1  ...       6695.0        6664.800000
## 111          0      107.9  ...      15580.0       16676.400000
## 197         -1      104.3  ...      16515.0       14416.500000
## 156          0       95.7  ...       6938.0        7775.600000
## 132          3       99.1  ...      11850.0       14035.250000
## 5            2       99.8  ...      15250.0       14001.575000
## 54           1       93.1  ...       7395.0        6723.766667
## 179          3      102.9  ...      15998.0       17296.750000
## 184          2       97.3  ...       7995.0        7299.500000
## 36           0       96.5  ...       7295.0        7737.333333
## 4            2       99.4  ...      17450.0       16814.300000
## 109          0      114.2  ...      12440.0       16138.750000
## 
## [41 rows x 45 columns]

RMSE of the rf model

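from sklearn.metrics import root_mean_squared_error  # scikit-learn >= 1.4

# mean_squared_error(..., squared = False) was deprecated in 1.4 and removed in 1.6
rmse_rf = root_mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf
       )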
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2616.3578144633602

or

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2616.3578144633602

Model evaluation

The predictions of the three models are compared side by side.

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
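# One prediction column per model: multiple linear regression (rm), regression tree (ar), random forest (rf)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist())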
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 128          3       89.5  ...               33278.0          31863.775000
## 55           3       95.3  ...               11845.0          12755.000000
## 14           1      103.5  ...               15510.0          17206.700000
## 42           1       96.5  ...                9549.0           8890.000000
## 88          -1       96.3  ...                9279.0           9388.000000
## 15           0      103.5  ...               40960.0          37686.500000
## 107          0      107.9  ...               16630.0          16791.750000
## 194         -2      104.3  ...               15985.0          15437.950000
## 64           0       98.8  ...                8495.0           9825.000000
## 199         -1      104.3  ...               14489.0          17011.150000
## 7            1      105.8  ...               22470.0          20134.600000
## 133          2       99.1  ...               15510.0          14415.250000
## 136          3       99.1  ...                9989.0          16768.608350
## 59           1       98.8  ...               10595.0          10256.750000
## 62           0       98.8  ...                8495.0           9757.150000
## 17           0      110.0  ...               40960.0          38117.435714
## 166          1       94.5  ...                9298.0          10736.550000
## 12           0      101.2  ...               21105.0          18657.416667
## 138          2       93.7  ...                7299.0           7125.550000
## 6            1      105.8  ...               22470.0          20777.250000
## 119          1       93.7  ...                7689.0           7964.300000
## 74           1      112.0  ...               41315.0          39006.635714
## 187          2       97.3  ...                9095.0           8054.700000
## 160          0       95.7  ...                7999.0           7736.850000
## 182          2       97.3  ...                8495.0           7299.500000
## 171          2       98.4  ...                9989.0          12614.100000
## 50           1       93.1  ...                6095.0           6105.650000
## 177         -1      102.4  ...                9988.0          10292.600000
## 191          0      100.4  ...               13845.0          14880.250000
## 53           1       93.1  ...                6229.0           6664.800000
## 111          0      107.9  ...               16630.0          16676.400000
## 197         -1      104.3  ...               13415.0          14416.500000
## 156          0       95.7  ...                7999.0           7775.600000
## 132          3       99.1  ...                9989.0          14035.250000
## 5            2       99.8  ...               13950.0          14001.575000
## 54           1       93.1  ...                6229.0           6723.766667
## 179          3      102.9  ...               16558.0          17296.750000
## 184          2       97.3  ...                8495.0           7299.500000
## 36           0       96.5  ...                7295.0           7737.333333
## 4            2       99.4  ...               18280.0          16814.300000
## 109          0      114.2  ...               13860.0          16138.750000
## 
## [41 rows x 47 columns]

The RMSE values are compared.

A NumPy array is created.

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2679.90678389, 3297.24614585, 2616.35781446]])

A DataFrame is built from the NumPy array.


rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  2679.906784  3297.246146  2616.357814
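The best model can also be selected programmatically. The minimal sketch below copies the RMSE values from the table above and uses `DataFrame.idxmin` along the columns.

```python
import pandas as pd

# RMSE values copied from the table above
rmse = pd.DataFrame({"rmse_rm": [2679.906784],
                     "rmse_ar": [3297.246146],
                     "rmse_rf": [2616.357814]})

# Name of the column with the smallest RMSE in the (single) row
mejor = rmse.idxmin(axis=1).iloc[0]
print(mejor)  # rmse_rf

# Models ordered from lowest to highest RMSE
print(rmse.iloc[0].sort_values())
```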

Interpretation

The exercise consisted of loading a car-price dataset and using all of its numeric and categorical variables as predictors of price.

In the multiple linear regression model, the Adjusted R-squared statistic is 0.9393. This means the independent variables explain approximately 93.93% of the variability in the dependent variable, price.
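Adjusted R-squared penalizes R-squared for the number of predictors p given n observations: 1 - (1 - R²)(n - 1)/(n - p - 1). For a model with an intercept, this should match what statsmodels reports as `rsquared_adj`. The penalty is relevant here because the dummy variables add many predictors. The minimal sketch below uses illustrative numbers, not the model's actual R², n or p.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted in p)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Same R^2, more predictors: the adjusted value drops (illustrative numbers only)
print(r2_ajustado(0.95, n=100, p=10))
print(r2_ajustado(0.95, n=100, p=40))
```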

In the regression tree model, the most important predictors are enginesize, curbweight, horsepower, carwidth and compressionratio.

The random forest model considers variables such as enginesize, curbweight, horsepower, citympg and carwidth to be important.

The variable enginesize remains the most important one in all the regression models, including those built in R.

The best model according to the root mean squared error (RMSE) was the random forest, for these particular training and validation sets and an 80%/20% training/validation split. Its RMSE was 2616.357814, the lowest of the three regression models. Compared with the previous case, in which the categorical variables were not included, the RMSE decreased slightly.
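The 80%/20% proportion is consistent with the 41 validation rows shown in the comparison tables out of the 205 cars in the dataset. The sketch below assumes the split was made with `train_test_split` and `train_size = 0.80`, which is an assumption because the actual call is not part of this section. It runs on placeholder arrays of 205 rows.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as CarPrice_Assignment.csv
X = np.arange(205 * 3).reshape(205, 3)
y = np.arange(205)

# 80% training, 20% validation, same seed as the models in this document
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(
    X, y, train_size=0.80, random_state=1280)
print(len(X_entrena), len(X_valida))  # 164 41
```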

Note that with seed 1280, building this model raised no problems or errors from validation values the model did not recognize because they were absent from the training data. This indicates that, for this particular split, the training data cover the values of the categorical variables that appear in the validation data.
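Instead of relying on the absence of errors, you can check directly whether a categorical level is missing from the training data. The minimal sketch below assumes, as `datos_dummis` suggests, that dummies were created on the full dataset before splitting and with `drop_first=True`. For instance, there is no `carbody_convertible` column. The toy data are hypothetical. A level dropped as the reference category cannot be detected this way, because it is encoded as all zeros.

```python
import pandas as pd

# Hypothetical toy data; dummies are built on the full dataset BEFORE splitting
datos = pd.DataFrame({
    "carbody": ["sedan", "wagon", "sedan", "hatchback", "convertible"],
    "price": [1.0, 2.0, 3.0, 4.0, 5.0],
})
dummies = pd.get_dummies(datos, columns=["carbody"], drop_first=True, dtype=int)
X = dummies.drop(columns="price")

# Fixed split for illustration: first three rows train, last two validate
X_entrena, X_valida = X.iloc[:3], X.iloc[3:]

# A dummy level is unseen if it is all zeros in training but present in validation
mascara = (X_entrena.sum() == 0) & (X_valida.sum() > 0)
no_vistos = mascara[mascara].index.tolist()
print(no_vistos)
```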

Finally, comparing the R and Python results, random forest produced the lowest RMSE in both cases. However, the RMSE was 2271.819 in R and 2616.357814 in Python. Therefore, for this specific case, which uses all the numeric and categorical variables, the best-performing model is again random forest, this time as implemented in R.