1 Objective

Compare supervised models for predicting car prices by applying each algorithm and computing the root mean squared error (RMSE) statistic.

2 Description

The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv

  • All variables in the dataset are used.

  • Training data: 80% of the observations.

  • Validation data: the remaining 20%.

  • A multiple linear regression model is built with the training data.

    • This model is used to answer questions such as:

    • Which variables are significant predictors at the 90% confidence level?

    • What is the adjusted R-squared, i.e., how much of the variation in vehicle price do the independent variables explain?

    • Predictions are generated with the validation data.

    • The RMSE statistic is computed for comparison.

  • A regression tree model is built with the training data.

    • The importance of each variable for the price is identified.

    • The regression tree and its decision rules are visualized.

    • Predictions are made with the validation data.

    • The RMSE statistic is computed for comparison.

  • A random forest model is built with the training data and 20 simulated trees.

    • The importance of each variable for the price is identified.

    • Predictions are generated with the validation data.

    • The RMSE statistic is computed for comparison.

  • At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.

3 Development

3.1 Load libraries

# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics such as adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

3.2 Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
##      car_ID  symboling                   CarName  ... citympg highwaympg    price
## 0         1          3        alfa-romero giulia  ...      21         27  13495.0
## 1         2          3       alfa-romero stelvio  ...      21         27  16500.0
## 2         3          1  alfa-romero Quadrifoglio  ...      19         26  16500.0
## 3         4          2               audi 100 ls  ...      24         30  13950.0
## 4         5          2                audi 100ls  ...      18         22  17450.0
## ..      ...        ...                       ...  ...     ...        ...      ...
## 200     201         -1           volvo 145e (sw)  ...      23         28  16845.0
## 201     202         -1               volvo 144ea  ...      19         25  19045.0
## 202     203         -1               volvo 244dl  ...      18         23  21485.0
## 203     204         -1                 volvo 246  ...      26         27  22470.0
## 204     205         -1               volvo 264gl  ...      19         25  22625.0
## 
## [205 rows x 26 columns]

3.3 Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID                int64
## symboling             int64
## CarName              object
## fueltype             object
## aspiration           object
## doornumber           object
## carbody              object
## drivewheel           object
## enginelocation       object
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginetype           object
## cylindernumber       object
## enginesize            int64
## fuelsystem           object
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

3.4 Data dictionary

Col Name Description
1 car_ID Unique id of each observation (Integer)
2 symboling Assigned insurance risk rating: +3 indicates that the auto is risky, -3 that it is probably pretty safe (Categorical)
3 CarName Name of car company and model (Categorical)
4 fueltype Car fuel type, i.e. gas or diesel (Categorical)
5 aspiration Aspiration used in a car: std or turbo (Categorical)
6 doornumber Number of doors in a car (Categorical)
7 carbody Body of car, e.g. convertible, sedan, wagon (Categorical)
8 drivewheel Type of drive wheel: fwd, rwd or 4wd (Categorical)
9 enginelocation Location of car engine, front or rear (Categorical)
10 wheelbase Wheelbase of car: distance between the front and rear axles, in inches (Numeric)
11 carlength Length of car (Numeric)
12 carwidth Width of car (Numeric)
13 carheight Height of car (Numeric)
14 curbweight Weight of a car without occupants or baggage (Numeric)
15 enginetype Type of engine (Categorical)
16 cylindernumber Number of cylinders in the car (Categorical)
17 enginesize Size of the engine (Numeric). Engine size in …
18 fuelsystem Fuel system of car (Categorical)
19 boreratio Bore ratio of the engine cylinders (Numeric)
20 stroke Piston stroke: distance the piston travels inside the cylinder (Numeric)
21 compressionratio Compression ratio of the engine (Numeric)
22 horsepower Engine power in horsepower (Numeric)
23 peakrpm Peak revolutions per minute of the engine (Numeric)
24 citympg Fuel economy in the city, in miles per gallon (Numeric)
25 highwaympg Fuel economy on the highway, in miles per gallon (Numeric)
26 price Price of car in dollars; the dependent variable (Numeric)

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

3.5 Data preparation

3.5.1 Remove variables

Remove the variables that have no statistical interest, i.e., columns 1 and 3: car_ID and CarName.

# Keep every column except the identifiers car_ID and CarName (column order is preserved)
datos = datos.drop(columns = ['car_ID', 'CarName'])
# datos.describe()
datos
##      symboling fueltype aspiration  ... citympg highwaympg    price
## 0            3      gas        std  ...      21         27  13495.0
## 1            3      gas        std  ...      21         27  16500.0
## 2            1      gas        std  ...      19         26  16500.0
## 3            2      gas        std  ...      24         30  13950.0
## 4            2      gas        std  ...      18         22  17450.0
## ..         ...      ...        ...  ...     ...        ...      ...
## 200         -1      gas        std  ...      23         28  16845.0
## 201         -1      gas      turbo  ...      19         25  19045.0
## 202         -1      gas        std  ...      18         23  21485.0
## 203         -1   diesel      turbo  ...      26         27  22470.0
## 204         -1      gas      turbo  ...      19         25  22625.0
## 
## [205 rows x 24 columns]

3.5.2 Build dummy variables

Several variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
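The next sketch shows one way to find those columns programmatically. It runs on a small hypothetical DataFrame that mimics a few CarPrice columns. `select_dtypes(exclude="number")` returns the non-numeric columns, and in this data those are the categorical ones.

```python
import pandas as pd

# Hypothetical mini-frame mimicking a few CarPrice columns
df = pd.DataFrame({
    "fueltype": ["gas", "diesel"],
    "carbody": ["sedan", "wagon"],
    "horsepower": [111, 68],
    "price": [13495.0, 7898.0],
})

# Non-numeric columns are the candidates for dummy coding
categorical = df.select_dtypes(exclude="number").columns.tolist()
print(categorical)
```

On the real data, the same call on `datos` would list the nine variables named above.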

Identify the dummy variables and build a dataset that includes them.

The pandas function get_dummies() converts categorical data into indicator (dummy) variables.

What are dummy variables? Dummy coding turns one categorical variable into several columns, one per category. In each record, the column of the category that applies holds 1 and the other columns hold 0.

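Example

gender
MALE
FEMALE
MALE

The same data as dummy variables

gender_male gender_female
1 0
0 1
1 0

The code below uses drop_first = True. With this setting, get_dummies() drops the column of the first category of each variable. That category is implied when all the other indicator columns are 0. Dropping it avoids perfect collinearity with the intercept of the linear model.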
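The gender example above can be reproduced with `pd.get_dummies()`. The data here are hypothetical. The `dtype=int` argument is added only so the indicators print as 0/1; recent pandas versions return True/False by default.

```python
import pandas as pd

# The gender example from the text (hypothetical data)
df = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One indicator column per category, categories in sorted order
dummies_all = pd.get_dummies(df, dtype=int)

# drop_first=True keeps k-1 columns: FEMALE (first in sorted order)
# is implied whenever gender_MALE == 0
dummies_k1 = pd.get_dummies(df, drop_first=True, dtype=int)

print(dummies_all)
print(dummies_k1)
```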

# One-hot encode every categorical column, keeping k-1 indicators per variable
datos_dummis = pd.get_dummies(datos, drop_first = True)
datos_dummis
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 0            3       88.6  ...                0                0
## 1            3       88.6  ...                0                0
## 2            1       94.5  ...                0                0
## 3            2       99.8  ...                0                0
## 4            2       99.4  ...                0                0
## ..         ...        ...  ...              ...              ...
## 200         -1      109.1  ...                0                0
## 201         -1      109.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 203         -1      109.1  ...                0                0
## 204         -1      109.1  ...                0                0
## 
## [205 rows x 44 columns]

3.5.3 Training and validation data

80% of the data are used for training and the remaining 20% for validation, with seed 1270.

# Split predictors (X) and target (price): 80% training, 20% validation
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(
    datos_dummis.drop(columns = "price"), datos_dummis['price'],
    train_size = 0.80, random_state = 1270)
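The sketch below shows what `train_size = 0.80` and `random_state = 1270` do. It uses a hypothetical 205-row frame, the same size as CarPrice. The split yields 164 training rows and 41 validation rows, which matches X_entrena and X_valida below. Repeating the call with the same seed selects exactly the same rows.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical frame with as many rows as CarPrice (205)
df = pd.DataFrame({"x": np.arange(205), "price": np.arange(205) * 100.0})

def split(seed):
    # Same call pattern as above: 80% training, fixed seed
    return train_test_split(df.drop(columns="price"), df["price"],
                            train_size=0.80, random_state=seed)

X_tr, X_va, y_tr, y_va = split(1270)
X_tr2, X_va2, _, _ = split(1270)   # same seed, same partition

print(len(X_tr), len(X_va))
```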

3.5.3.1 Training data

X_entrena
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 14           1      103.5  ...                0                0
## 24           1       93.7  ...                0                0
## 195         -1      104.3  ...                0                0
## 118          1       93.7  ...                0                0
## 112          0      107.9  ...                0                0
## ..         ...        ...  ...              ...              ...
## 99           0       97.2  ...                0                0
## 57           3       95.3  ...                0                0
## 50           1       93.1  ...                0                0
## 23           1       93.7  ...                0                0
## 46           2       96.0  ...                0                1
## 
## [164 rows x 43 columns]

3.5.3.2 Validation data

X_valida
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 91           1       94.5  ...                0                0
## 188          2       97.3  ...                0                0
## 136          3       99.1  ...                0                0
## 178          3      102.9  ...                0                0
## 191          0      100.4  ...                0                0
## 41           0       96.5  ...                0                0
## 138          2       93.7  ...                0                0
## 125          3       94.5  ...                0                0
## 68          -1      110.0  ...                0                0
## 159          0       95.7  ...                0                0
## 165          1       94.5  ...                0                0
## 110          0      114.2  ...                0                0
## 128          3       89.5  ...                0                0
## 12           0      101.2  ...                0                0
## 21           1       93.7  ...                0                0
## 42           1       96.5  ...                0                0
## 130          0       96.1  ...                0                0
## 49           0      102.0  ...                0                0
## 79           1       93.0  ...                1                0
## 154          0       95.7  ...                0                0
## 108          0      107.9  ...                0                0
## 200         -1      109.1  ...                0                0
## 31           2       86.6  ...                0                0
## 66           0      104.9  ...                0                0
## 105          3       91.3  ...                0                0
## 143          0       97.2  ...                0                0
## 157          0       95.7  ...                0                0
## 140          2       93.3  ...                0                0
## 78           2       93.7  ...                0                0
## 122          1       93.7  ...                0                0
## 150          1       95.7  ...                0                0
## 33           1       93.7  ...                0                0
## 15           0      103.5  ...                0                0
## 63           0       98.8  ...                0                0
## 43           0       94.3  ...                0                0
## 28          -1      103.3  ...                0                0
## 9            0       99.5  ...                0                0
## 97           1       94.5  ...                0                0
## 95           1       94.5  ...                0                0
## 147          0       97.0  ...                0                0
## 81           3       96.3  ...                0                0
## 
## [41 rows x 43 columns]

3.6 Supervised models

3.6.1 Multiple linear regression model (RM)

The multiple linear regression model (rm) is built and fitted with the training data.

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()

3.6.1.1 Coefficients

Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown. The intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 7.08246130e+01,  4.97123832e+01, -2.16196011e+01,  5.48528107e+02,
##         5.27371588e+01,  3.82732156e+00,  1.19387168e+02, -1.90095363e+03,
##        -4.52661596e+03, -1.26052438e+03,  1.94580412e+00,  2.38234138e+00,
##         8.28824386e+00,  9.34452584e+01, -8.56556091e+03,  1.80339184e+03,
##        -6.44361199e+00, -3.48242766e+03, -3.78594142e+03, -2.95267608e+03,
##        -4.43529876e+03,  3.33446885e+02,  1.27793976e+03,  7.59764056e+03,
##        -3.41562385e+03,  1.85344789e+02,  3.64200606e+03,  1.96999358e+03,
##        -5.23758925e+03,  6.40707629e+02, -8.62431175e+03, -1.02967042e+04,
##        -5.78054666e+03, -2.28037137e+03, -8.64019967e-12,  6.40707629e+02,
##        -1.46827527e+02, -2.05584075e+03,  8.56556091e+03, -3.77553325e+03,
##        -2.60481249e+02, -3.44964556e+03, -5.12116830e+02])
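The following sketch makes the role of the coefficients concrete. It uses hypothetical noise-free data, not the car data. LinearRegression stores \(\beta_0\) in `intercept_` and \(\beta_1, \beta_2\) in `coef_`, and every prediction equals \(\beta_0 + \sum_j \beta_j x_j\).

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: 6 observations, 2 predictors
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0],
              [4.0, 3.0], [5.0, 6.0], [6.0, 5.0]])
y = 10 + 2 * X[:, 0] - 3 * X[:, 1]   # exact linear relation, no noise

model = LinearRegression().fit(X, y)

# coef_ holds beta_1..beta_n; the intercept beta_0 is stored separately
print("beta_0:", model.intercept_)
print("beta_1..beta_n:", model.coef_)

# A prediction is beta_0 + sum_j beta_j * x_j
manual = model.intercept_ + X @ model.coef_
```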
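  • In multiple linear models, R-squared measures the proportion of the variance of the dependent variable, price, that the independent variables explain. modelo_rm.score() returns the plain (unadjusted) R-squared on the training data: 0.9403. The independent variables therefore explain about 94% of the variation in price in the training set. The adjusted R-squared, which penalizes the number of predictors, is not computed here.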
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9403176387600196

3.6.1.2 Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# [:-1] omits the last of the 41 predictions; print(predicciones_rm) shows all of them
print(predicciones_rm[:-1])
## [ 6210.17677474  9001.16003137 11694.59290399 20638.42790364
##  16948.32352563 10089.76805812  7173.53049915 17263.34502768
##  26803.60880878  8073.32087234  8191.92373462 16252.52201166
##  36928.82980856 20705.74251711  6055.94363584  9466.96761627
##   9212.65992687 43574.42836317  6880.11129263  5658.2567314
##  17460.90236722 18151.05606293  6711.3249534  13153.36507387
##  20616.77263081 10143.10039083  7086.24773204  6595.5361855
##   6867.09661419  8522.70022369  6253.75631603  7107.52874324
##  31173.31571665 11409.83549429  8449.78726549  9745.354715
##  20755.85344936  5030.77576914  5728.14633041  8982.88289   ]

3.6.1.3 Comparison table

# Validation predictors plus the actual and predicted price of each car
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 91           1       94.5  ...     6649.000        6210.176775
## 188          2       97.3  ...     9995.000        9001.160031
## 136          3       99.1  ...    18150.000       11694.592904
## 178          3      102.9  ...    16558.000       20638.427904
## 191          0      100.4  ...    13295.000       16948.323526
## 41           0       96.5  ...    12945.000       10089.768058
## 138          2       93.7  ...     5118.000        7173.530499
## 125          3       94.5  ...    22018.000       17263.345028
## 68          -1      110.0  ...    28248.000       26803.608809
## 159          0       95.7  ...     7788.000        8073.320872
## 165          1       94.5  ...     9298.000        8191.923735
## 110          0      114.2  ...    13860.000       16252.522012
## 128          3       89.5  ...    37028.000       36928.829809
## 12           0      101.2  ...    20970.000       20705.742517
## 21           1       93.7  ...     5572.000        6055.943636
## 42           1       96.5  ...    10345.000        9466.967616
## 130          0       96.1  ...     9295.000        9212.659927
## 49           0      102.0  ...    36000.000       43574.428363
## 79           1       93.0  ...     7689.000        6880.111293
## 154          0       95.7  ...     7898.000        5658.256731
## 108          0      107.9  ...    13200.000       17460.902367
## 200         -1      109.1  ...    16845.000       18151.056063
## 31           2       86.6  ...     6855.000        6711.324953
## 66           0      104.9  ...    18344.000       13153.365074
## 105          3       91.3  ...    19699.000       20616.772631
## 143          0       97.2  ...     9960.000       10143.100391
## 157          0       95.7  ...     7198.000        7086.247732
## 140          2       93.3  ...     7603.000        6595.536185
## 78           2       93.7  ...     6669.000        6867.096614
## 122          1       93.7  ...     7609.000        8522.700224
## 150          1       95.7  ...     5348.000        6253.756316
## 33           1       93.7  ...     6529.000        7107.528743
## 15           0      103.5  ...    30760.000       31173.315717
## 63           0       98.8  ...    10795.000       11409.835494
## 43           0       94.3  ...     6785.000        8449.787265
## 28          -1      103.3  ...     8921.000        9745.354715
## 9            0       99.5  ...    17859.167       20755.853449
## 97           1       94.5  ...     7999.000        5030.775769
## 95           1       94.5  ...     7799.000        5728.146330
## 147          0       97.0  ...    10198.000        8982.882890
## 81           3       96.3  ...     8499.000        9632.441263
## 
## [41 rows x 45 columns]

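3.6.1.4 RMSE of the rm model

# The squared = False argument is deprecated in recent scikit-learn releases;
# taking the square root of the MSE gives the RMSE in any version
rmse_rm = np.sqrt(mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm
       ))
print(f"The test error (rmse) is: {rmse_rm}")
## The test error (rmse) is: 2518.6653145552323

or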

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2518.6653145552323
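For reference, \(RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). It is expressed in the units of price, i.e. dollars. The small sketch below uses hypothetical values to check the definition against scikit-learn.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 2.0])
y_pred = np.array([2.0, 5.0, 4.0])

# RMSE by definition: square root of the mean squared residual
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Same value through scikit-learn, portable across versions
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))
print(rmse_manual, rmse_sklearn)
```

Squaring the residuals makes large errors weigh more than small ones. A few badly predicted expensive cars can therefore dominate the RMSE.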

3.6.2 Regression tree model (AR)

The regression tree model (ar) is built.

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,   # uncomment to limit the depth of the tree
            random_state      = 1270  # seed for reproducible results
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1270)

3.6.2.1 Regression tree visualization

fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 16
print(f"Number of terminal nodes (leaves): {modelo_ar.get_n_leaves()}")
## Number of terminal nodes (leaves): 150
# A 150-leaf tree is hard to read as a figure; uncomment to draw it anyway.
# feature_names must be the dummy-encoded columns the model was trained on.
#plot = plot_tree(
#            decision_tree = modelo_ar,
#            feature_names = list(datos_dummis.drop(columns = "price").columns),
#            filled        = True,
#            impurity      = False,
#            fontsize      = 10,
#            precision     = 2,
#            ax            = ax
#       )
#plot
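Without max_depth, modelo_ar keeps splitting until its leaves are pure. It ends with 150 leaves for 164 training rows, so most leaves hold a single observation. This suggests the tree may be memorizing the training data.

The sketch below uses hypothetical one-predictor data. It compares an unconstrained tree with one limited to max_depth = 3. It also checks that a regression tree predicts the mean target of the training observations in each leaf.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy data: one predictor, 200 observations
rng = np.random.default_rng(1270)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) * 1000 + rng.normal(0, 100, size=200)

# Unconstrained tree (as in modelo_ar) vs. a depth-limited tree
full = DecisionTreeRegressor(random_state=1270).fit(X, y)
shallow = DecisionTreeRegressor(max_depth=3, random_state=1270).fit(X, y)

print("unconstrained:", full.get_depth(), full.get_n_leaves())
print("max_depth=3:", shallow.get_depth(), shallow.get_n_leaves())

# Each prediction is the mean of the training targets in the same leaf
leaves = shallow.apply(X)
leaf_means = {leaf: y[leaves == leaf].mean() for leaf in np.unique(leaves)}
manual = np.array([leaf_means[leaf] for leaf in leaves])
```

A depth limit (or tuning it with GridSearchCV, already imported) trades some training fit for simpler rules that may generalize better. The validation RMSE is the place to check that.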

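Decision rules of the tree. export_text() prints branches deeper than its max_depth argument (10 by default) as "truncated branch of depth N".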

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos_dummis.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2544.00
## |   |   |--- curbweight <= 2295.00
## |   |   |   |--- curbweight <= 2121.00
## |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |--- carlength <= 156.50
## |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |--- carlength >  156.50
## |   |   |   |   |   |   |--- curbweight <= 1899.00
## |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |--- curbweight >  1899.00
## |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1947.50
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.22
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1927.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1927.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6695.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.22
## |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1947.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2087.50
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 53.25
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  53.25
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2087.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7738.00]
## |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |--- horsepower <= 71.50
## |   |   |   |   |   |   |--- enginesize <= 84.50
## |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |--- value: [5399.00]
## |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |--- enginesize >  84.50
## |   |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |   |--- peakrpm <= 5450.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1887.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1887.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 63.90
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  63.90
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6795.00]
## |   |   |   |   |   |   |   |   |--- peakrpm >  5450.00
## |   |   |   |   |   |   |   |   |   |--- citympg <= 34.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1910.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6377.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1910.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- citympg >  34.00
## |   |   |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |--- horsepower >  71.50
## |   |   |   |   |   |   |--- enginetype_ohcf <= 0.50
## |   |   |   |   |   |   |   |--- value: [7129.00]
## |   |   |   |   |   |   |--- enginetype_ohcf >  0.50
## |   |   |   |   |   |   |   |--- value: [7053.00]
## |   |   |   |--- curbweight >  2121.00
## |   |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |   |--- carwidth <= 64.10
## |   |   |   |   |   |   |--- value: [9980.00]
## |   |   |   |   |   |--- carwidth >  64.10
## |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |   |--- highwaympg <= 36.50
## |   |   |   |   |   |   |--- boreratio <= 3.23
## |   |   |   |   |   |   |   |--- curbweight <= 2282.00
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- citympg <= 27.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2243.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8195.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2243.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |   |   |   |--- citympg >  27.50
## |   |   |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8358.00]
## |   |   |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.21
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.21
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |--- curbweight >  2282.00
## |   |   |   |   |   |   |   |   |--- value: [9095.00]
## |   |   |   |   |   |   |--- boreratio >  3.23
## |   |   |   |   |   |   |   |--- highwaympg <= 32.50
## |   |   |   |   |   |   |   |   |--- value: [7463.00]
## |   |   |   |   |   |   |   |--- highwaympg >  32.50
## |   |   |   |   |   |   |   |   |--- stroke <= 3.00
## |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.00
## |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |--- highwaympg >  36.50
## |   |   |   |   |   |   |--- compressionratio <= 16.25
## |   |   |   |   |   |   |   |--- compressionratio <= 9.25
## |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |   |   |--- compressionratio >  9.25
## |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |   |   |   |   |--- compressionratio >  16.25
## |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [7995.00]
## |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |--- curbweight >  2295.00
## |   |   |   |--- peakrpm <= 5350.00
## |   |   |   |   |--- wheelbase <= 96.95
## |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |--- boreratio <= 3.48
## |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.48
## |   |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |--- wheelbase >  96.95
## |   |   |   |   |   |--- curbweight <= 2412.00
## |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |--- curbweight <= 2313.00
## |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |--- curbweight >  2313.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2355.50
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8949.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2355.50
## |   |   |   |   |   |   |   |   |   |--- carheight <= 54.90
## |   |   |   |   |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  54.90
## |   |   |   |   |   |   |   |   |   |   |--- value: [9370.00]
## |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |--- fuelsystem_idi <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [9720.00]
## |   |   |   |   |   |   |   |--- fuelsystem_idi >  0.50
## |   |   |   |   |   |   |   |   |--- value: [9495.00]
## |   |   |   |   |   |--- curbweight >  2412.00
## |   |   |   |   |   |   |--- curbweight <= 2522.50
## |   |   |   |   |   |   |   |--- curbweight <= 2419.50
## |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |   |--- curbweight >  2419.50
## |   |   |   |   |   |   |   |   |--- fuelsystem_idi <= 0.50
## |   |   |   |   |   |   |   |   |   |--- wheelbase <= 97.90
## |   |   |   |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |   |   |   |   |   |--- wheelbase >  97.90
## |   |   |   |   |   |   |   |   |   |   |--- symboling <= -0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11248.00]
## |   |   |   |   |   |   |   |   |   |   |--- symboling >  -0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_idi >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [10698.00]
## |   |   |   |   |   |   |--- curbweight >  2522.50
## |   |   |   |   |   |   |   |--- curbweight <= 2538.00
## |   |   |   |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9639.00]
## |   |   |   |   |   |   |   |--- curbweight >  2538.00
## |   |   |   |   |   |   |   |   |--- value: [8449.00]
## |   |   |   |--- peakrpm >  5350.00
## |   |   |   |   |--- carlength <= 176.00
## |   |   |   |   |   |--- citympg <= 20.00
## |   |   |   |   |   |   |--- curbweight <= 2382.50
## |   |   |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |   |   |--- curbweight >  2382.50
## |   |   |   |   |   |   |   |--- fuelsystem_4bbl <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |   |   |--- fuelsystem_4bbl >  0.50
## |   |   |   |   |   |   |   |   |--- value: [13645.00]
## |   |   |   |   |   |--- citympg >  20.00
## |   |   |   |   |   |   |--- carwidth <= 63.25
## |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |   |   |--- carwidth >  63.25
## |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |--- wheelbase <= 95.40
## |   |   |   |   |   |   |   |   |   |--- value: [9538.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  95.40
## |   |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |--- horsepower <= 101.00
## |   |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  101.00
## |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |--- carlength >  176.00
## |   |   |   |   |   |--- enginesize <= 108.50
## |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |--- value: [16430.00]
## |   |   |   |   |   |--- enginesize >  108.50
## |   |   |   |   |   |   |--- carwidth <= 66.25
## |   |   |   |   |   |   |   |--- value: [13950.00]
## |   |   |   |   |   |   |--- carwidth >  66.25
## |   |   |   |   |   |   |   |--- value: [15250.00]
## |   |--- curbweight >  2544.00
## |   |   |--- wheelbase <= 100.80
## |   |   |   |--- horsepower <= 153.00
## |   |   |   |   |--- citympg <= 22.00
## |   |   |   |   |   |--- cylindernumber_five <= 0.50
## |   |   |   |   |   |   |--- stroke <= 2.88
## |   |   |   |   |   |   |   |--- drivewheel_rwd <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |--- drivewheel_rwd >  0.50
## |   |   |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |   |   |   |--- stroke >  2.88
## |   |   |   |   |   |   |   |--- curbweight <= 2726.50
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |   |   |--- curbweight >  2726.50
## |   |   |   |   |   |   |   |   |--- carheight <= 55.60
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2877.00
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.88
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.88
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2877.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 98.15
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  98.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |--- carheight >  55.60
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 25.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [14399.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  25.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [15510.00]
## |   |   |   |   |   |--- cylindernumber_five >  0.50
## |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |--- citympg >  22.00
## |   |   |   |   |   |--- wheelbase <= 95.85
## |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |--- wheelbase >  95.85
## |   |   |   |   |   |   |--- curbweight <= 2854.50
## |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2557.00
## |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2557.00
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 30.50
## |   |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.52
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |   |   |   |   |   |   |--- boreratio >  3.52
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  30.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |--- curbweight >  2854.50
## |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |--- horsepower >  153.00
## |   |   |   |   |--- carlength <= 174.85
## |   |   |   |   |   |--- highwaympg <= 25.50
## |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |--- highwaympg >  25.50
## |   |   |   |   |   |   |--- value: [16500.00]
## |   |   |   |   |--- carlength >  174.85
## |   |   |   |   |   |--- enginesize <= 151.00
## |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |   |   |   |--- enginesize >  151.00
## |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |--- wheelbase >  100.80
## |   |   |   |--- carheight <= 56.10
## |   |   |   |   |--- horsepower <= 141.00
## |   |   |   |   |   |--- curbweight <= 2983.00
## |   |   |   |   |   |   |--- horsepower <= 120.50
## |   |   |   |   |   |   |   |--- curbweight <= 2899.00
## |   |   |   |   |   |   |   |   |--- cylindernumber_four <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [17710.00]
## |   |   |   |   |   |   |   |   |--- cylindernumber_four >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [18280.00]
## |   |   |   |   |   |   |   |--- curbweight >  2899.00
## |   |   |   |   |   |   |   |   |--- value: [18920.00]
## |   |   |   |   |   |   |--- horsepower >  120.50
## |   |   |   |   |   |   |   |--- value: [21105.00]
## |   |   |   |   |   |--- curbweight >  2983.00
## |   |   |   |   |   |   |--- wheelbase <= 107.45
## |   |   |   |   |   |   |   |--- drivewheel_rwd <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [23875.00]
## |   |   |   |   |   |   |   |--- drivewheel_rwd >  0.50
## |   |   |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |   |   |--- wheelbase >  107.45
## |   |   |   |   |   |   |   |--- enginesize <= 159.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 3139.50
## |   |   |   |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3139.50
## |   |   |   |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |   |   |   |--- enginesize >  159.00
## |   |   |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |--- horsepower >  141.00
## |   |   |   |   |   |--- carwidth <= 68.15
## |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |--- carlength <= 185.65
## |   |   |   |   |   |   |   |   |--- value: [15998.00]
## |   |   |   |   |   |   |   |--- carlength >  185.65
## |   |   |   |   |   |   |   |   |--- enginesize <= 166.00
## |   |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  166.00
## |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |--- value: [16503.00]
## |   |   |   |   |   |--- carwidth >  68.15
## |   |   |   |   |   |   |--- compressionratio <= 7.85
## |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |--- compressionratio >  7.85
## |   |   |   |   |   |   |   |--- value: [19045.00]
## |   |   |   |--- carheight >  56.10
## |   |   |   |   |--- aspiration_turbo <= 0.50
## |   |   |   |   |   |--- curbweight <= 3038.00
## |   |   |   |   |   |   |--- citympg <= 23.50
## |   |   |   |   |   |   |   |--- citympg <= 21.00
## |   |   |   |   |   |   |   |   |--- value: [11900.00]
## |   |   |   |   |   |   |   |--- citympg >  21.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2973.00
## |   |   |   |   |   |   |   |   |   |--- value: [12940.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2973.00
## |   |   |   |   |   |   |   |   |   |--- value: [13415.00]
## |   |   |   |   |   |   |--- citympg >  23.50
## |   |   |   |   |   |   |   |--- value: [15985.00]
## |   |   |   |   |   |--- curbweight >  3038.00
## |   |   |   |   |   |   |--- carheight <= 58.10
## |   |   |   |   |   |   |   |--- carlength <= 187.75
## |   |   |   |   |   |   |   |   |--- stroke <= 2.69
## |   |   |   |   |   |   |   |   |   |--- value: [15580.00]
## |   |   |   |   |   |   |   |   |--- stroke >  2.69
## |   |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |   |--- carlength >  187.75
## |   |   |   |   |   |   |   |   |--- enginetype_ohc <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |   |--- enginetype_ohc >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [16515.00]
## |   |   |   |   |   |   |--- carheight >  58.10
## |   |   |   |   |   |   |   |--- value: [12440.00]
## |   |   |   |   |--- aspiration_turbo >  0.50
## |   |   |   |   |   |--- highwaympg <= 23.50
## |   |   |   |   |   |   |--- symboling <= -1.50
## |   |   |   |   |   |   |   |--- value: [18420.00]
## |   |   |   |   |   |   |--- symboling >  -1.50
## |   |   |   |   |   |   |   |--- value: [18950.00]
## |   |   |   |   |   |--- highwaympg >  23.50
## |   |   |   |   |   |   |--- highwaympg <= 29.00
## |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |   |   |--- highwaympg >  29.00
## |   |   |   |   |   |   |   |--- value: [17425.00]
## |--- enginesize >  182.00
## |   |--- compressionratio <= 8.05
## |   |   |--- carbody_sedan <= 0.50
## |   |   |   |--- value: [45400.00]
## |   |   |--- carbody_sedan >  0.50
## |   |   |   |--- carlength <= 195.40
## |   |   |   |   |--- value: [41315.00]
## |   |   |   |--- carlength >  195.40
## |   |   |   |   |--- carheight <= 56.50
## |   |   |   |   |   |--- value: [36880.00]
## |   |   |   |   |--- carheight >  56.50
## |   |   |   |   |   |--- value: [40960.00]
## |   |--- compressionratio >  8.05
## |   |   |--- fuelsystem_idi <= 0.50
## |   |   |   |--- carheight <= 50.65
## |   |   |   |   |--- value: [31400.50]
## |   |   |   |--- carheight >  50.65
## |   |   |   |   |--- carheight <= 51.20
## |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |--- carheight >  51.20
## |   |   |   |   |   |--- compressionratio <= 8.90
## |   |   |   |   |   |   |--- highwaympg <= 18.50
## |   |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |   |--- highwaympg >  18.50
## |   |   |   |   |   |   |   |--- value: [33900.00]
## |   |   |   |   |   |--- compressionratio >  8.90
## |   |   |   |   |   |   |--- value: [33278.00]
## |   |   |--- fuelsystem_idi >  0.50
## |   |   |   |--- carwidth <= 71.00
## |   |   |   |   |--- carlength <= 189.20
## |   |   |   |   |   |--- value: [28176.00]
## |   |   |   |   |--- carlength >  189.20
## |   |   |   |   |   |--- value: [25552.00]
## |   |   |   |--- carwidth >  71.00
## |   |   |   |   |--- value: [31600.00]
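To get a prediction, the tree follows these rules from the root down to a leaf and returns that leaf's `value`. The pure-Python sketch below hard-codes only the `enginesize > 182.00` branch printed above, so it assumes the car already satisfies that condition. The nested-tuple representation and the names `SUBTREE` and `predict` are illustrative and are not how scikit-learn stores the tree internally.

```python
# Hypothetical hand-copied rules: the `enginesize > 182.00` branch printed above.
# Internal node: (feature, threshold, subtree_if_<=, subtree_if_>); leaf: float value.
SUBTREE = (
    "compressionratio", 8.05,
    ("carbody_sedan", 0.50,
        45400.00,
        ("carlength", 195.40,
            41315.00,
            ("carheight", 56.50, 36880.00, 40960.00))),
    ("fuelsystem_idi", 0.50,
        ("carheight", 50.65,
            31400.50,
            ("carheight", 51.20,
                35056.00,
                ("compressionratio", 8.90,
                    ("highwaympg", 18.50, 34184.00, 33900.00),
                    33278.00))),
        ("carwidth", 71.00,
            ("carlength", 189.20, 28176.00, 25552.00),
            31600.00)),
)


def predict(node, car):
    """Follow the rules from `node` down to a leaf and return its value."""
    while not isinstance(node, float):
        feature, threshold, left, right = node
        # Same convention as the printed rules: `<=` goes left, `>` goes right.
        node = left if car[feature] <= threshold else right
    return node


# Made-up car: only the features on its path through the rules are needed.
car = {"compressionratio": 9.0, "fuelsystem_idi": 0, "carheight": 54.0}
print(predict(SUBTREE, car))
```

This car ends at the `compressionratio > 8.90` leaf, so the sketch returns 33278.0, which is the same value that leaf shows in the printed rules.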

3.6.2.2 Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos_dummis.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##                 predictor   importancia
## 6              enginesize  6.714462e-01
## 5              curbweight  2.170050e-01
## 9        compressionratio  2.306875e-02
## 1               wheelbase  2.222260e-02
## 10             horsepower  1.377919e-02
## 4               carheight  1.242327e-02
## 2               carlength  7.705249e-03
## 11                peakrpm  6.253695e-03
## 12                citympg  6.199828e-03
## 38         fuelsystem_idi  5.362574e-03
## 15       aspiration_turbo  3.143578e-03
## 19          carbody_sedan  3.001312e-03
## 3                carwidth  2.608831e-03
## 13             highwaympg  1.874106e-03
## 30    cylindernumber_five  1.179311e-03
## 18      carbody_hatchback  1.153570e-03
## 8                  stroke  5.788389e-04
## 40        fuelsystem_mpfi  3.645312e-04
## 37        fuelsystem_4bbl  1.932777e-04
## 0               symboling  1.629297e-04
## 7               boreratio  1.055430e-04
## 16         doornumber_two  7.912552e-05
## 26         enginetype_ohc  3.610621e-05
## 22         drivewheel_rwd  2.501778e-05
## 31    cylindernumber_four  1.569898e-05
## 17        carbody_hardtop  1.150043e-05
## 27        enginetype_ohcf  2.790930e-07
## 20          carbody_wagon  1.022439e-07
## 39         fuelsystem_mfi  0.000000e+00
## 34  cylindernumber_twelve  0.000000e+00
## 36        fuelsystem_2bbl  0.000000e+00
## 41        fuelsystem_spdi  0.000000e+00
## 35     cylindernumber_two  0.000000e+00
## 21         drivewheel_fwd  0.000000e+00
## 33   cylindernumber_three  0.000000e+00
## 32     cylindernumber_six  0.000000e+00
## 29       enginetype_rotor  0.000000e+00
## 28        enginetype_ohcv  0.000000e+00
## 25           enginetype_l  0.000000e+00
## 24       enginetype_dohcv  0.000000e+00
## 23    enginelocation_rear  0.000000e+00
## 14           fueltype_gas  0.000000e+00
## 42        fuelsystem_spfi  0.000000e+00

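According to `feature_importances_`, the most important predictors for the regression tree model are enginesize (about 0.67) and curbweight (about 0.22). They are followed at a distance by compressionratio, wheelbase, and horsepower. Each remaining variable contributes less than 0.013, and 15 of them contribute nothing.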
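scikit-learn normalises impurity-based importances so they sum to 1. A running total of the sorted importances therefore gives the share of the tree's importance that the top predictors account for. The minimal pure-Python sketch below uses the five largest values copied from the table. With them, enginesize and curbweight together reach about 0.888, and all five together reach about 0.948. The helper name `cumulative_share` is hypothetical.

```python
# Importances of the five highest-ranked predictors, copied from the table above.
top_importances = {
    "enginesize": 6.714462e-01,
    "curbweight": 2.170050e-01,
    "compressionratio": 2.306875e-02,
    "wheelbase": 2.222260e-02,
    "horsepower": 1.377919e-02,
}


def cumulative_share(importances):
    """Sort by importance (descending) and return (name, running total) pairs."""
    running = 0.0
    result = []
    for name, value in sorted(importances.items(), key=lambda kv: kv[1], reverse=True):
        running += value
        result.append((name, running))
    return result


for name, total in cumulative_share(top_importances):
    print(f"{name:<17}{total:.3f}")
```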

3.6.2.3 Predictions of the regression tree model (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 6849. ,  8845. , 18620. , 15998. , 17450. ,  8845. ,  6338. ,
##        12764. , 25552. ,  7995. ,  9980. , 17075. , 33278. , 21105. ,
##         5572. ,  9095. , 12290. , 31400.5,  7957. ,  9095. , 17425. ,
##        18920. ,  7129. , 18280. , 17199. ,  8949. ,  5195. ,  7463. ,
##         6229. ,  7126. ,  6338. ,  7129. , 41315. , 10698. ,  6989. ,
##         8921. , 18620. ,  7349. ,  6338. , 11259. ,  6989. ])

3.6.2.4 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 91           1       94.5  ...     6649.000             6849.0
## 188          2       97.3  ...     9995.000             8845.0
## 136          3       99.1  ...    18150.000            18620.0
## 178          3      102.9  ...    16558.000            15998.0
## 191          0      100.4  ...    13295.000            17450.0
## 41           0       96.5  ...    12945.000             8845.0
## 138          2       93.7  ...     5118.000             6338.0
## 125          3       94.5  ...    22018.000            12764.0
## 68          -1      110.0  ...    28248.000            25552.0
## 159          0       95.7  ...     7788.000             7995.0
## 165          1       94.5  ...     9298.000             9980.0
## 110          0      114.2  ...    13860.000            17075.0
## 128          3       89.5  ...    37028.000            33278.0
## 12           0      101.2  ...    20970.000            21105.0
## 21           1       93.7  ...     5572.000             5572.0
## 42           1       96.5  ...    10345.000             9095.0
## 130          0       96.1  ...     9295.000            12290.0
## 49           0      102.0  ...    36000.000            31400.5
## 79           1       93.0  ...     7689.000             7957.0
## 154          0       95.7  ...     7898.000             9095.0
## 108          0      107.9  ...    13200.000            17425.0
## 200         -1      109.1  ...    16845.000            18920.0
## 31           2       86.6  ...     6855.000             7129.0
## 66           0      104.9  ...    18344.000            18280.0
## 105          3       91.3  ...    19699.000            17199.0
## 143          0       97.2  ...     9960.000             8949.0
## 157          0       95.7  ...     7198.000             5195.0
## 140          2       93.3  ...     7603.000             7463.0
## 78           2       93.7  ...     6669.000             6229.0
## 122          1       93.7  ...     7609.000             7126.0
## 150          1       95.7  ...     5348.000             6338.0
## 33           1       93.7  ...     6529.000             7129.0
## 15           0      103.5  ...    30760.000            41315.0
## 63           0       98.8  ...    10795.000            10698.0
## 43           0       94.3  ...     6785.000             6989.0
## 28          -1      103.3  ...     8921.000             8921.0
## 9            0       99.5  ...    17859.167            18620.0
## 97           1       94.5  ...     7999.000             7349.0
## 95           1       94.5  ...     7799.000             6338.0
## 147          0       97.0  ...    10198.000            11259.0
## 81           3       96.3  ...     8499.000             6989.0
## 
## [41 rows x 45 columns]

3.6.2.5 RMSE of the regression tree model (ar)

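# `squared = False` makes mean_squared_error return the RMSE. This argument is deprecated
# since scikit-learn 1.4 and removed in 1.6; newer versions use
# root_mean_squared_error(Y_valida, predicciones_ar) instead.
rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )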
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2887.266992410073

or, equivalently, as the square root of the MSE:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2887.266992410073
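The RMSE is the square root of the mean squared difference between real and predicted prices, RMSE = sqrt((1/n) * sum((y_i - yhat_i)^2)). This means it can be reproduced by hand from the comparison table in section 3.6.2.4. The pure-Python sketch below copies the 41 `(Precio_Real, Precio_Prediccion)` pairs from that table. Row 9's real price is displayed rounded to 17859.167, so the result agrees with the printed value only to within about six decimal places.

```python
from math import sqrt

# (Precio_Real, Precio_Prediccion) for the 41 validation rows, copied in order
# from the regression tree comparison table.
pairs = [
    (6649, 6849), (9995, 8845), (18150, 18620), (16558, 15998), (13295, 17450),
    (12945, 8845), (5118, 6338), (22018, 12764), (28248, 25552), (7788, 7995),
    (9298, 9980), (13860, 17075), (37028, 33278), (20970, 21105), (5572, 5572),
    (10345, 9095), (9295, 12290), (36000, 31400.5), (7689, 7957), (7898, 9095),
    (13200, 17425), (16845, 18920), (6855, 7129), (18344, 18280), (19699, 17199),
    (9960, 8949), (7198, 5195), (7603, 7463), (6669, 6229), (7609, 7126),
    (5348, 6338), (6529, 7129), (30760, 41315), (10795, 10698), (6785, 6989),
    (8921, 8921), (17859.167, 18620), (7999, 7349), (7799, 6338), (10198, 11259),
    (8499, 6989),
]

squared_errors = [(real - pred) ** 2 for real, pred in pairs]
rmse = sqrt(sum(squared_errors) / len(squared_errors))
# Fraction of the total squared error produced by the single worst prediction.
largest_share = max(squared_errors) / sum(squared_errors)

print(f"RMSE recomputed from the table: {rmse:.3f}")
print(f"Share of squared error from the worst row: {largest_share:.1%}")
```

Because errors are squared, a few large misses dominate the RMSE. The worst row (index 15: real 30760, predicted 41315) alone contributes roughly a third of the total squared error.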

3.6.2.6 Random forest model (rf)

The random forest model (rf) is built with the training data, seed 1270 (`random_state`), and 20 trees (`n_estimators`).

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1270)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1270)
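A random forest fits many trees, each on a bootstrap sample of the training rows (drawn with replacement). Its prediction is the average of the individual trees' predictions, so with `n_estimators = 20` that average is taken over 20 trees. The pure-Python sketch below illustrates the idea in simplified form by bagging 20 depth-1 trees (stumps) on a single numeric feature. The helpers `fit_stump` and `fit_bagged_stumps` and the toy data are hypothetical. The sketch also omits the per-split feature subsampling (`max_features`) that random forests can add, so it is not scikit-learn's implementation.

```python
import random
from statistics import mean


def fit_stump(xs, ys):
    """Best single split on one feature (minimises squared error)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        lm, rm = mean(left), mean(right)
        sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lm, rm)
    if best is None:  # every x in this bootstrap sample is identical
        constant = mean(ys)
        return lambda x: constant
    _, threshold, lm, rm = best
    return lambda x: lm if x <= threshold else rm


def fit_bagged_stumps(xs, ys, n_estimators=20, seed=1270):
    """Bootstrap aggregation: each stump is fit on a resample drawn with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    # The ensemble prediction is the average of the individual predictions.
    return lambda x: mean(m(x) for m in models)


# Made-up toy data: engine size vs. price.
xs = [90, 100, 110, 120, 130, 150, 180, 200, 250, 300]
ys = [6000, 7000, 8000, 9500, 11000, 14000, 18000, 30000, 36000, 40000]
forest = fit_bagged_stumps(xs, ys)
print(forest(100), forest(250))
```

Averaging stumps fit on different resamples smooths the single hard threshold a lone stump would produce. The same variance-reduction idea explains why the forest's RMSE can be lower than that of a single regression tree.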

3.6.2.7 Variable importance

# pending ... ...
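This section is still pending. The pattern from section 3.6.2.2 applies here too, using `modelo_rf.feature_importances_` in place of `modelo_ar.feature_importances_`. For a forest, recent scikit-learn versions compute that attribute by averaging the importance vectors of the individual trees (skipping trees that consist of a single node) and renormalising the average so it sums to 1. The pure-Python sketch below reproduces only that averaging step. The helper `forest_importances` is hypothetical, and the per-tree vectors in the example are made up.

```python
def forest_importances(per_tree):
    """Average per-tree importance vectors and renormalise the result to sum to 1."""
    n_trees = len(per_tree)
    n_features = len(per_tree[0])
    averaged = [sum(tree[j] for tree in per_tree) / n_trees for j in range(n_features)]
    total = sum(averaged)
    if total == 0:
        return averaged  # no tree made any split
    return [value / total for value in averaged]


# Made-up importances of three features in two trees.
print(forest_importances([[0.7, 0.3, 0.0], [0.5, 0.3, 0.2]]))
```

With the fitted model, `forest_importances([t.feature_importances_ for t in modelo_rf.estimators_])` should closely match `modelo_rf.feature_importances_`.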

3.6.2.8 Predictions of the random forest model (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 6773.475     ,  9918.9       , 16995.15      , 15886.15      ,
##        14693.25      , 13149.15      ,  6962.05      , 14531.25      ,
##        27242.7       ,  7503.9       ,  9446.25      , 16894.25      ,
##        33227.85      , 19542.75      ,  5712.9       ,  9644.6       ,
##        11217.05      , 34100.55      ,  8159.75      ,  7804.5       ,
##        17365.9       , 18593.5       ,  6773.375     , 11945.5       ,
##        17081.05      ,  8990.05      ,  7345.75      ,  8331.1       ,
##         6438.28333333,  7588.05      ,  6299.        ,  6901.2       ,
##        38980.7       , 10003.6       ,  9407.15      ,  9189.3       ,
##        17632.35      ,  7280.575     ,  6822.4       , 10253.95      ,
##         8232.        ])

3.6.2.9 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 91           1       94.5  ...     6649.000        6773.475000
## 188          2       97.3  ...     9995.000        9918.900000
## 136          3       99.1  ...    18150.000       16995.150000
## 178          3      102.9  ...    16558.000       15886.150000
## 191          0      100.4  ...    13295.000       14693.250000
## 41           0       96.5  ...    12945.000       13149.150000
## 138          2       93.7  ...     5118.000        6962.050000
## 125          3       94.5  ...    22018.000       14531.250000
## 68          -1      110.0  ...    28248.000       27242.700000
## 159          0       95.7  ...     7788.000        7503.900000
## 165          1       94.5  ...     9298.000        9446.250000
## 110          0      114.2  ...    13860.000       16894.250000
## 128          3       89.5  ...    37028.000       33227.850000
## 12           0      101.2  ...    20970.000       19542.750000
## 21           1       93.7  ...     5572.000        5712.900000
## 42           1       96.5  ...    10345.000        9644.600000
## 130          0       96.1  ...     9295.000       11217.050000
## 49           0      102.0  ...    36000.000       34100.550000
## 79           1       93.0  ...     7689.000        8159.750000
## 154          0       95.7  ...     7898.000        7804.500000
## 108          0      107.9  ...    13200.000       17365.900000
## 200         -1      109.1  ...    16845.000       18593.500000
## 31           2       86.6  ...     6855.000        6773.375000
## 66           0      104.9  ...    18344.000       11945.500000
## 105          3       91.3  ...    19699.000       17081.050000
## 143          0       97.2  ...     9960.000        8990.050000
## 157          0       95.7  ...     7198.000        7345.750000
## 140          2       93.3  ...     7603.000        8331.100000
## 78           2       93.7  ...     6669.000        6438.283333
## 122          1       93.7  ...     7609.000        7588.050000
## 150          1       95.7  ...     5348.000        6299.000000
## 33           1       93.7  ...     6529.000        6901.200000
## 15           0      103.5  ...    30760.000       38980.700000
## 63           0       98.8  ...    10795.000       10003.600000
## 43           0       94.3  ...     6785.000        9407.150000
## 28          -1      103.3  ...     8921.000        9189.300000
## 9            0       99.5  ...    17859.167       17632.350000
## 97           1       94.5  ...     7999.000        7280.575000
## 95           1       94.5  ...     7799.000        6822.400000
## 147          0       97.0  ...    10198.000       10253.950000
## 81           3       96.3  ...     8499.000        8232.000000
## 
## [41 rows x 45 columns]

3.6.2.10 RMSE of the random forest model (rf)

rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2447.738067029221

or, equivalently, as the square root of the MSE:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2447.738067029221

3.7 Model evaluation

The predictions of the three models (rm, ar, rf) are compared side by side:

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 91           1       94.5  ...                6849.0           6773.475000
## 188          2       97.3  ...                8845.0           9918.900000
## 136          3       99.1  ...               18620.0          16995.150000
## 178          3      102.9  ...               15998.0          15886.150000
## 191          0      100.4  ...               17450.0          14693.250000
## 41           0       96.5  ...                8845.0          13149.150000
## 138          2       93.7  ...                6338.0           6962.050000
## 125          3       94.5  ...               12764.0          14531.250000
## 68          -1      110.0  ...               25552.0          27242.700000
## 159          0       95.7  ...                7995.0           7503.900000
## 165          1       94.5  ...                9980.0           9446.250000
## 110          0      114.2  ...               17075.0          16894.250000
## 128          3       89.5  ...               33278.0          33227.850000
## 12           0      101.2  ...               21105.0          19542.750000
## 21           1       93.7  ...                5572.0           5712.900000
## 42           1       96.5  ...                9095.0           9644.600000
## 130          0       96.1  ...               12290.0          11217.050000
## 49           0      102.0  ...               31400.5          34100.550000
## 79           1       93.0  ...                7957.0           8159.750000
## 154          0       95.7  ...                9095.0           7804.500000
## 108          0      107.9  ...               17425.0          17365.900000
## 200         -1      109.1  ...               18920.0          18593.500000
## 31           2       86.6  ...                7129.0           6773.375000
## 66           0      104.9  ...               18280.0          11945.500000
## 105          3       91.3  ...               17199.0          17081.050000
## 143          0       97.2  ...                8949.0           8990.050000
## 157          0       95.7  ...                5195.0           7345.750000
## 140          2       93.3  ...                7463.0           8331.100000
## 78           2       93.7  ...                6229.0           6438.283333
## 122          1       93.7  ...                7126.0           7588.050000
## 150          1       95.7  ...                6338.0           6299.000000
## 33           1       93.7  ...                7129.0           6901.200000
## 15           0      103.5  ...               41315.0          38980.700000
## 63           0       98.8  ...               10698.0          10003.600000
## 43           0       94.3  ...                6989.0           9407.150000
## 28          -1      103.3  ...                8921.0           9189.300000
## 9            0       99.5  ...               18620.0          17632.350000
## 97           1       94.5  ...                7349.0           7280.575000
## 95           1       94.5  ...                6338.0           6822.400000
## 147          0       97.0  ...               11259.0          10253.950000
## 81           3       96.3  ...                6989.0           8232.000000
## 
## [41 rows x 47 columns]

The RMSE values of the three models are compared.

A NumPy array is created:

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2518.66531456, 2887.26699241, 2447.73806703]])

A DataFrame is built from the NumPy array:

rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  2518.665315  2887.266992  2447.738067
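A lower RMSE is better, so the models can be ranked by sorting on it. The relative gap between them can be computed in the same step. The minimal pure-Python sketch below uses the three RMSE values printed above. The random forest's RMSE comes out about 2.8% lower than that of the multiple regression and about 15.2% lower than that of the regression tree. The dictionary labels are illustrative.

```python
# RMSE values copied from the output above.
rmse_by_model = {
    "rm (multiple regression)": 2518.6653145552323,
    "ar (regression tree)": 2887.266992410073,
    "rf (random forest)": 2447.738067029221,
}

# Ascending order: the first entry has the smallest (best) RMSE.
ranking = sorted(rmse_by_model.items(), key=lambda kv: kv[1])
best_name, best_rmse = ranking[0]

for position, (name, value) in enumerate(ranking, start=1):
    # How much lower the best model's RMSE is, relative to this model's RMSE.
    reduction = 100 * (value - best_rmse) / value
    print(f"{position}. {name:<26}{value:10.2f}   best is {reduction:4.1f}% lower")
```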

4 Interpretation

The proposed seed was changed from 1271 to 1270 so that it matches the one used in R. In R, seed 1271 produced a partition that did not contain every category label, which caused an error. Seed 1270 was therefore chosen so that the Python and R results can be compared.

With this seed, the best model according to the root mean squared error (RMSE) was the random forest, with an RMSE of 2447.738067029221, using 80% of the data for training and 20% for validation. From best to worst, the models ranked as follows:

  1. Random forest 2447.738067029221
  2. Multiple regression 2518.6653145552323
  3. Regression tree 2887.266992410073

For these data, we conclude that the random forest was the best model in both Python and R. Between the two languages, the lower RMSE was obtained in R (1712.114), well below the 2447.738067029221 obtained in Python. However, R and Python use different random number generators, so the same seed does not yield the same training/validation split. Part of this gap may therefore come from the different partitions rather than from the language itself.