Objective

Compare supervised models by applying car-price prediction algorithms and evaluating each one with the root mean squared error (RMSE).

Description

The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv

Development

Load libraries

# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics such as adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial regression
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
##      car_ID  symboling                   CarName  ... citympg highwaympg    price
## 0         1          3        alfa-romero giulia  ...      21         27  13495.0
## 1         2          3       alfa-romero stelvio  ...      21         27  16500.0
## 2         3          1  alfa-romero Quadrifoglio  ...      19         26  16500.0
## 3         4          2               audi 100 ls  ...      24         30  13950.0
## 4         5          2                audi 100ls  ...      18         22  17450.0
## ..      ...        ...                       ...  ...     ...        ...      ...
## 200     201         -1           volvo 145e (sw)  ...      23         28  16845.0
## 201     202         -1               volvo 144ea  ...      19         25  19045.0
## 202     203         -1               volvo 244dl  ...      18         23  21485.0
## 203     204         -1                 volvo 246  ...      26         27  22470.0
## 204     205         -1               volvo 264gl  ...      19         25  22625.0
## 
## [205 rows x 26 columns]

Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID                int64
## symboling             int64
## CarName              object
## fueltype             object
## aspiration           object
## doornumber           object
## carbody              object
## drivewheel           object
## enginelocation       object
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginetype           object
## cylindernumber       object
## enginesize            int64
## fuelsystem           object
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object
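
The dtype listing above separates text columns from numeric ones. As a minimal sketch, assuming a small made-up frame (`muestra`, a hypothetical stand-in for `datos` with invented values), the same separation can be obtained programmatically with `select_dtypes`:

```python
import pandas as pd

# Hypothetical toy frame standing in for `datos`; column names are taken from
# the dataset, the three rows are only illustrative.
muestra = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "horsepower": [111, 154, 102],
    "price": [13495.0, 16500.0, 13950.0],
})

# Non-numeric columns are the categorical candidates for dummy encoding.
columnas_categoricas = muestra.select_dtypes(exclude="number").columns.tolist()
# Numeric columns can be fed to the models as they are.
columnas_numericas = muestra.select_dtypes(include="number").columns.tolist()

print(columnas_categoricas)
print(columnas_numericas)
```

Applied to the real `datos` frame, the first list should correspond to the columns reported as `object` in the output above.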

Data dictionary

Col | Name | Description
1 | car_ID | Unique ID of each observation (Integer)
2 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably pretty safe (Categorical)
3 | CarName | Name of the car company and model (Categorical)
4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical)
5 | aspiration | Aspiration used in the car: std or turbo (Categorical)
6 | doornumber | Number of doors (Categorical)
7 | carbody | Body style of the car: convertible, sedan, wagon, … (Categorical)
8 | drivewheel | Type of drive wheel: fwd, rwd, 4wd (Categorical)
9 | enginelocation | Location of the car engine (Categorical)
10 | wheelbase | Wheelbase of the car: distance between axles, in inches (Numeric)
11 | carlength | Length of the car (Numeric)
12 | carwidth | Width of the car (Numeric)
13 | carheight | Height of the car (Numeric)
14 | curbweight | Weight of the car without occupants or baggage (Numeric)
15 | enginetype | Type of engine (Categorical)
16 | cylindernumber | Number of cylinders in the car (Categorical)
17 | enginesize | Size of the engine, in … (Numeric)
18 | fuelsystem | Fuel system of the car (Categorical)
19 | boreratio | Bore ratio of the engine (Numeric)
20 | stroke | Stroke or volume inside the engine; piston travel (Numeric)
21 | compressionratio | Compression ratio of the engine (Numeric)
22 | horsepower | Horsepower; power of the car (Numeric)
23 | peakrpm | Peak revolutions per minute (Numeric)
24 | citympg | Mileage in the city, in miles per gallon (Numeric)
25 | highwaympg | Mileage on the highway, in miles per gallon (Numeric)
26 | price | Dependent variable. Price of the car, in dollars (Numeric)

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

Data preparation

Remove variables

Remove the variables that carry no statistical interest, namely columns 1 and 3: car_ID and CarName.

datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
##      symboling fueltype aspiration  ... citympg highwaympg    price
## 0            3      gas        std  ...      21         27  13495.0
## 1            3      gas        std  ...      21         27  16500.0
## 2            1      gas        std  ...      19         26  16500.0
## 3            2      gas        std  ...      24         30  13950.0
## 4            2      gas        std  ...      18         22  17450.0
## ..         ...      ...        ...  ...     ...        ...      ...
## 200         -1      gas        std  ...      23         28  16845.0
## 201         -1      gas      turbo  ...      19         25  19045.0
## 202         -1      gas        std  ...      18         23  21485.0
## 203         -1   diesel      turbo  ...      26         27  22470.0
## 204         -1      gas      turbo  ...      19         25  22625.0
## 
## [205 rows x 24 columns]

Build dummy variables

The following variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber, fuelsystem.

Identify the categorical variables and build a dataset that includes the dummy variables.

The Pandas function get_dummies() converts categorical data into indicator (dummy) variables.

What are dummy variables? A categorical variable is encoded as several columns, one per category; each record gets a 1 in the column of the category it belongs to and a 0 in the others.

Example

gender
MALE
FEMALE
MALE

Same data as dummy variables

gender_male gender_female
1 0
0 1
1 0
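# drop_first = True drops the first level of each categorical variable,
# so a variable with k categories yields k - 1 dummy columns
datos_dummis = pd.get_dummies(datos, drop_first = True)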
datos_dummis
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 0            3       88.6  ...                0                0
## 1            3       88.6  ...                0                0
## 2            1       94.5  ...                0                0
## 3            2       99.8  ...                0                0
## 4            2       99.4  ...                0                0
## ..         ...        ...  ...              ...              ...
## 200         -1      109.1  ...                0                0
## 201         -1      109.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 203         -1      109.1  ...                0                0
## 204         -1      109.1  ...                0                0
## 
## [205 rows x 44 columns]
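
To make the effect of `drop_first` concrete, here is a minimal sketch on a three-row gender column like the one in the example. The frame `ejemplo` and its values are illustrative assumptions, not part of the dataset. Passing `dtype=int` is also an assumption made here so that the result shows 0/1 as in the printed output, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical three-row column mirroring the toy example
ejemplo = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One indicator column per category
completas = pd.get_dummies(ejemplo, columns=["gender"], dtype=int)

# drop_first=True removes the first category in sorted order (FEMALE here);
# a row of zeros in the remaining columns then identifies that category
reducidas = pd.get_dummies(ejemplo, columns=["gender"], drop_first=True, dtype=int)

print(completas)
print(reducidas)
```

With two categories a single column is enough: `gender_MALE` equal to 0 already implies FEMALE, which is why the full dataset ends up with fewer dummy columns than categories.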

Training and validation data

80% of the data is used for training and the remaining 20% for validation. Seed: 1301

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80,  random_state = 1301)
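
As a quick sanity check of the split, the following sketch uses placeholder arrays with 205 rows (an assumption standing in for `datos_dummis`) to show the resulting partition sizes and that fixing `random_state` makes the partition reproducible:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the dataset (205);
# the values themselves are meaningless
X = np.arange(205 * 2).reshape(205, 2)
y = np.arange(205)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.80, random_state=1301)
print(X_tr.shape, X_va.shape)

# Repeating the call with the same seed gives the same training rows
X_tr2, _, _, _ = train_test_split(X, y, train_size=0.80, random_state=1301)
print(np.array_equal(X_tr, X_tr2))
```

The 164 and 41 rows match the shapes of X_entrena and X_valida shown in the outputs of this section.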

Training data

X_entrena
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 14           1      103.5  ...                0                0
## 62           0       98.8  ...                0                0
## 97           1       94.5  ...                0                0
## 133          2       99.1  ...                0                0
## 178          3      102.9  ...                0                0
## ..         ...        ...  ...              ...              ...
## 177         -1      102.4  ...                0                0
## 81           3       96.3  ...                0                0
## 180         -1      104.5  ...                0                0
## 110          0      114.2  ...                0                0
## 171          2       98.4  ...                0                0
## 
## [164 rows x 43 columns]

Validation data

X_valida
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 46           2       96.0  ...                0                1
## 113          0      114.2  ...                0                0
## 167          2       98.4  ...                0                0
## 165          1       94.5  ...                0                0
## 108          0      107.9  ...                0                0
## 152          1       95.7  ...                0                0
## 172          2       98.4  ...                0                0
## 85           1       96.3  ...                0                0
## 57           3       95.3  ...                0                0
## 69           0      106.7  ...                0                0
## 99           0       97.2  ...                0                0
## 48           0      113.0  ...                0                0
## 157          0       95.7  ...                0                0
## 189          3       94.5  ...                0                0
## 176         -1      102.4  ...                0                0
## 22           1       93.7  ...                0                0
## 75           1      102.7  ...                0                0
## 93           1       94.5  ...                0                0
## 201         -1      109.1  ...                0                0
## 43           0       94.3  ...                0                0
## 155          0       95.7  ...                0                0
## 67          -1      110.0  ...                0                0
## 194         -2      104.3  ...                0                0
## 2            1       94.5  ...                0                0
## 135          2       99.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 186          2       97.3  ...                0                0
## 3            2       99.8  ...                0                0
## 187          2       97.3  ...                0                0
## 137          2       99.1  ...                0                0
## 122          1       93.7  ...                0                0
## 156          0       95.7  ...                0                0
## 33           1       93.7  ...                0                0
## 124          3       95.9  ...                1                0
## 53           1       93.1  ...                0                0
## 129          1       98.4  ...                0                0
## 168          2       98.4  ...                0                0
## 16           0      103.5  ...                0                0
## 65           0      104.9  ...                0                0
## 204         -1      109.1  ...                0                0
## 144          0       97.0  ...                0                0
## 
## [41 rows x 43 columns]

Supervised models

Multiple linear regression model (RM)

The multiple linear regression model (rm) is built.

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()

Coefficients

Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 2.68100427e+02,  8.83316895e+01, -8.28706837e+01,  7.10879453e+02,
##         1.21425850e+02,  2.93759760e+00,  1.05105948e+02, -1.57740306e+02,
##        -4.08412330e+03, -8.07673133e+02,  9.81893755e+00,  1.83003485e+00,
##        -8.95468660e+01,  1.36029292e+02, -5.70417046e+03,  1.92899760e+03,
##        -6.94499927e+02, -2.64454009e+03, -4.07655513e+03, -3.42806111e+03,
##        -4.56924149e+03, -7.28468560e+01,  1.71611286e+03,  8.13040677e+03,
##         1.81898940e-12, -8.59616319e+02,  3.66269879e+03,  1.04029967e+03,
##        -4.72551415e+03, -4.47511307e+02, -9.23480706e+03, -1.05444418e+04,
##        -7.52553932e+03, -1.69130637e+03, -6.91151864e+03, -4.47511307e+02,
##        -3.52320017e+02, -2.95513432e+03,  5.70417046e+03, -3.86949329e+03,
##        -5.32096260e+02, -2.94563552e+03,  0.00000000e+00])
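
A bare array of coefficients is hard to read without the variable names. The sketch below, built on a small synthetic frame (the columns `x1`, `x2` and the relation used to generate `y` are assumptions for illustration only), shows one way to pair each coefficient with its column:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 5 + 2*x1 - 3*x2 (no noise)
X = pd.DataFrame({
    "x1": [0.0, 1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, 0.0, 2.0, 1.0, 3.0],
})
y = 5.0 + 2.0 * X["x1"] - 3.0 * X["x2"]

modelo = LinearRegression().fit(X, y)

# Label each slope with the name of its column
coeficientes = pd.Series(modelo.coef_, index=X.columns)
print(modelo.intercept_)
print(coeficientes)
```

The same pattern should carry over to the fitted model of this section by using `modelo_rm.coef_` with `X_entrena.columns` as the index.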
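  • For the multiple linear model, score() returns the coefficient of determination R-squared on the training data, 0.9577914317431622. This is the ordinary R-squared, not the adjusted one: the independent variables explain approximately 95.78% of the variance of the dependent variable price in the training set.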
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9577914317431622
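
The value printed by `score()` is the ordinary R-squared. An adjusted version, which penalizes the number of predictors, can be derived from it with the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1). The helper `r2_ajustado` below is a hypothetical name introduced for this sketch. It uses n = 164 training rows and p = 43 predictors, as reported in the training output. Treat p = 43 as an upper bound, since some dummy columns may be constant in the training set.

```python
def r2_ajustado(r2: float, n: int, p: int) -> float:
    """Adjusted R-squared from ordinary R-squared, n observations and p predictors."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Training R-squared reported by modelo_rm.score(X_entrena, Y_entrena)
valor_ajustado = r2_ajustado(0.9577914317431622, n=164, p=43)
print(valor_ajustado)
```

With these inputs the adjusted value comes out at roughly 0.94, somewhat lower than the raw 0.958 because 43 predictors are being fitted on only 164 observations.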

Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# Note: the slice [:-1] leaves out the last prediction, so 40 of the 41 values are printed
print(predicciones_rm[:-1])
## [12411.20721105 15798.21690342 14661.82497634  7941.23455489
##  17584.63558621  7105.54865613 18705.64587837 10723.24523047
##  11409.68798802 27814.08645445 10216.5252933  29977.07589286
##   7486.66594653 13566.43747217  9521.71949889  5637.4299192
##  20744.1827087   5390.57143907 22271.4379705   8804.36683692
##   8823.2712081  27719.73250122 17330.58219065 10257.56239734
##  13664.18262819 19321.45525875 10283.54943227 10149.3437339
##  10395.25412964 13289.72030658  7917.90786352  8077.19240745
##   7328.5219638  15862.32300525  6858.32790485 37642.25596387
##  14650.07458593 29157.11662548 17969.74012867 21465.90853588]

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 46           2       96.0  ...      11048.0       12411.207211
## 113          0      114.2  ...      16695.0       15798.216903
## 167          2       98.4  ...       8449.0       14661.824976
## 165          1       94.5  ...       9298.0        7941.234555
## 108          0      107.9  ...      13200.0       17584.635586
## 152          1       95.7  ...       6488.0        7105.548656
## 172          2       98.4  ...      17669.0       18705.645878
## 85           1       96.3  ...       6989.0       10723.245230
## 57           3       95.3  ...      13645.0       11409.687988
## 69           0      106.7  ...      28176.0       27814.086454
## 99           0       97.2  ...       8949.0       10216.525293
## 48           0      113.0  ...      35550.0       29977.075893
## 157          0       95.7  ...       7198.0        7486.665947
## 189          3       94.5  ...      11595.0       13566.437472
## 176         -1      102.4  ...      10898.0        9521.719499
## 22           1       93.7  ...       6377.0        5637.429919
## 75           1      102.7  ...      16503.0       20744.182709
## 93           1       94.5  ...       7349.0        5390.571439
## 201         -1      109.1  ...      19045.0       22271.437971
## 43           0       94.3  ...       6785.0        8804.366837
## 155          0       95.7  ...       8778.0        8823.271208
## 67          -1      110.0  ...      25552.0       27719.732501
## 194         -2      104.3  ...      12940.0       17330.582191
## 2            1       94.5  ...      16500.0       10257.562397
## 135          2       99.1  ...      15510.0       13664.182628
## 202         -1      109.1  ...      21485.0       19321.455259
## 186          2       97.3  ...       8495.0       10283.549432
## 3            2       99.8  ...      13950.0       10149.343734
## 187          2       97.3  ...       9495.0       10395.254130
## 137          2       99.1  ...      18620.0       13289.720307
## 122          1       93.7  ...       7609.0        7917.907864
## 156          0       95.7  ...       6938.0        8077.192407
## 33           1       93.7  ...       6529.0        7328.521964
## 124          3       95.9  ...      12764.0       15862.323005
## 53           1       93.1  ...       6695.0        6858.327905
## 129          1       98.4  ...      31400.5       37642.255964
## 168          2       98.4  ...       9639.0       14650.074586
## 16           0      103.5  ...      41315.0       29157.116625
## 65           0      104.9  ...      18280.0       17969.740129
## 204         -1      109.1  ...      22625.0       21465.908536
## 144          0       97.0  ...       9233.0        8530.667361
## 
## [41 rows x 45 columns]

RMSE of the rm model

rmse_rm = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm,
        squared = False   # returns RMSE instead of MSE; deprecated in scikit-learn 1.4 and removed in 1.6
       )
print(f"The test error (rmse) is: {rmse_rm}")
## The test error (rmse) is: 3497.7989684360423

or

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3497.7989684360423
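
Both calls above compute the same quantity: the square root of the mean of the squared differences between real and predicted prices. A minimal NumPy sketch of that definition follows. The function name `rmse` and the two-value example are assumptions for illustration, not data from the validation set.

```python
import numpy as np

def rmse(y_real, y_pred):
    """Root mean squared error: sqrt(mean((y_real - y_pred)^2))."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_real - y_pred) ** 2)))

# Squared errors are 4 and 0, their mean is 2, so the result is sqrt(2)
valor = rmse([3.0, 5.0], [1.0, 5.0])
print(valor)
```

Because the errors are squared before averaging, a few large misses weigh heavily on the result, and the RMSE is expressed in the same unit as the target (dollars in this case). In scikit-learn 1.4 and later, `sklearn.metrics.root_mean_squared_error` offers the same computation without the `squared` argument.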

Regression tree model (AR)

The regression tree model (ar) is built.

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 1301
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=1301)

Regression tree visualization

fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of leaf nodes: {modelo_ar.get_n_leaves()}")
## Number of leaf nodes: 151
# Plot call kept commented out; feature names must come from datos_dummis,
# the frame the model was trained on
#plot = plot_tree(
#            decision_tree = modelo_ar,
#            feature_names = list(datos_dummis.drop(columns = "price").columns),
#            filled        = True,
#            impurity      = False,
#            fontsize      = 10,
#            precision     = 2,
#            ax            = ax
#       )
#plot
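
A tree with 151 leaves for 164 training rows has almost one leaf per observation, which suggests it memorizes the training data. The sketch below uses synthetic data (the generated `X`, `y` and the choice of `max_depth=3` are assumptions, not the author's settings) to illustrate how the commented-out `max_depth` parameter limits the size of the tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data with the same number of rows as the training set
rng = np.random.default_rng(1301)
X = rng.uniform(0, 10, size=(164, 3))
y = 1000 * X[:, 0] + rng.normal(0, 500, size=164)

# Unrestricted tree: grows until the leaves cannot be split any further
arbol_libre = DecisionTreeRegressor(random_state=1301).fit(X, y)
# Restricted tree: at most 3 levels, hence at most 2**3 = 8 leaves
arbol_limitado = DecisionTreeRegressor(max_depth=3, random_state=1301).fit(X, y)

for nombre, arbol in [("unrestricted", arbol_libre), ("max_depth=3", arbol_limitado)]:
    print(nombre, arbol.get_depth(), arbol.get_n_leaves())
```

A shallower tree is easier to plot and read. Whether it also predicts better on the validation set would need to be checked, for instance by comparing RMSE values or by searching over `max_depth` with the GridSearchCV class already imported.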

Decision rules of the tree

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos_dummis.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2697.50
## |   |   |--- curbweight <= 2291.50
## |   |   |   |--- curbweight <= 2072.00
## |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |--- carlength <= 156.50
## |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |--- carlength >  156.50
## |   |   |   |   |   |   |--- curbweight <= 1928.00
## |   |   |   |   |   |   |   |--- curbweight <= 1899.00
## |   |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |   |--- curbweight >  1899.00
## |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |--- curbweight >  1928.00
## |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |--- carheight <= 55.90
## |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7999.00]
## |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |   |   |   |   |--- carheight >  55.90
## |   |   |   |   |   |   |   |   |   |--- value: [7295.00]
## |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |--- carlength <= 167.05
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1944.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1944.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1980.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1980.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |--- carlength >  167.05
## |   |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |--- stroke <= 3.26
## |   |   |   |   |   |   |--- highwaympg <= 38.50
## |   |   |   |   |   |   |   |--- highwaympg <= 37.00
## |   |   |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |   |   |--- value: [5118.00]
## |   |   |   |   |   |   |   |--- highwaympg >  37.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1924.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [6795.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1924.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1985.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1985.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |--- highwaympg >  38.50
## |   |   |   |   |   |   |   |--- horsepower <= 69.00
## |   |   |   |   |   |   |   |   |--- cylindernumber_three <= 0.50
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 91.00
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg <= 41.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg >  41.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5399.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  91.00
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5150.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5348.00]
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  5150.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |   |   |   |--- cylindernumber_three >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |   |--- horsepower >  69.00
## |   |   |   |   |   |   |   |   |--- value: [6295.00]
## |   |   |   |   |   |--- stroke >  3.26
## |   |   |   |   |   |   |--- boreratio <= 3.03
## |   |   |   |   |   |   |   |--- curbweight <= 1766.00
## |   |   |   |   |   |   |   |   |--- value: [6479.00]
## |   |   |   |   |   |   |   |--- curbweight >  1766.00
## |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |--- value: [7129.00]
## |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |--- boreratio >  3.03
## |   |   |   |   |   |   |   |--- value: [7799.00]
## |   |   |   |--- curbweight >  2072.00
## |   |   |   |   |--- symboling <= 2.50
## |   |   |   |   |   |--- enginetype_ohcf <= 0.50
## |   |   |   |   |   |   |--- highwaympg <= 35.00
## |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2262.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2262.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9095.00]
## |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |   |--- wheelbase <= 93.35
## |   |   |   |   |   |   |   |   |   |   |--- value: [7689.00]
## |   |   |   |   |   |   |   |   |   |--- wheelbase >  93.35
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5375.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  5375.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |   |   |--- highwaympg >  35.00
## |   |   |   |   |   |   |   |--- citympg <= 32.50
## |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |   |   |--- citympg >  32.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2262.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 46.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  46.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7738.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2262.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 46.50
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 96.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  96.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7995.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  46.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |--- enginetype_ohcf >  0.50
## |   |   |   |   |   |   |--- curbweight <= 2167.50
## |   |   |   |   |   |   |   |--- horsepower <= 77.50
## |   |   |   |   |   |   |   |   |--- value: [7053.00]
## |   |   |   |   |   |   |   |--- horsepower >  77.50
## |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |   |   |   |   |--- curbweight >  2167.50
## |   |   |   |   |   |   |   |--- compressionratio <= 9.25
## |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [7463.00]
## |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |   |   |--- compressionratio >  9.25
## |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |--- symboling >  2.50
## |   |   |   |   |   |--- value: [9980.00]
## |   |   |--- curbweight >  2291.50
## |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |--- wheelbase <= 99.45
## |   |   |   |   |   |--- highwaympg <= 28.50
## |   |   |   |   |   |   |--- horsepower <= 110.50
## |   |   |   |   |   |   |   |--- boreratio <= 3.24
## |   |   |   |   |   |   |   |   |--- value: [12945.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.24
## |   |   |   |   |   |   |   |   |--- cylindernumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |   |   |   |--- cylindernumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |   |   |--- horsepower >  110.50
## |   |   |   |   |   |   |   |--- boreratio <= 3.54
## |   |   |   |   |   |   |   |   |--- horsepower <= 123.00
## |   |   |   |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |   |   |   |   |   |--- horsepower >  123.00
## |   |   |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.54
## |   |   |   |   |   |   |   |   |--- value: [11694.00]
## |   |   |   |   |   |--- highwaympg >  28.50
## |   |   |   |   |   |   |--- wheelbase <= 96.95
## |   |   |   |   |   |   |   |--- drivewheel_rwd <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |   |   |   |--- drivewheel_rwd >  0.50
## |   |   |   |   |   |   |   |   |--- value: [9538.00]
## |   |   |   |   |   |   |--- wheelbase >  96.95
## |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |--- wheelbase >  99.45
## |   |   |   |   |   |--- highwaympg <= 24.50
## |   |   |   |   |   |   |--- value: [13295.00]
## |   |   |   |   |   |--- highwaympg >  24.50
## |   |   |   |   |   |   |--- cylindernumber_five <= 0.50
## |   |   |   |   |   |   |   |--- symboling <= 1.00
## |   |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |   |   |   |--- symboling >  1.00
## |   |   |   |   |   |   |   |   |--- value: [16430.00]
## |   |   |   |   |   |   |--- cylindernumber_five >  0.50
## |   |   |   |   |   |   |   |--- value: [15250.00]
## |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |--- carwidth <= 66.75
## |   |   |   |   |   |--- compressionratio <= 8.55
## |   |   |   |   |   |   |--- peakrpm <= 5100.00
## |   |   |   |   |   |   |   |--- wheelbase <= 99.80
## |   |   |   |   |   |   |   |   |--- curbweight <= 2366.50
## |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2366.50
## |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |--- wheelbase >  99.80
## |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |   |--- peakrpm >  5100.00
## |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |--- citympg <= 25.00
## |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |   |   |   |--- citympg >  25.00
## |   |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |--- compressionratio >  8.55
## |   |   |   |   |   |   |--- curbweight <= 2419.50
## |   |   |   |   |   |   |   |--- carlength <= 173.70
## |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl <= 0.50
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 97.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9960.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  97.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9995.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_2bbl >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [10345.00]
## |   |   |   |   |   |   |   |--- carlength >  173.70
## |   |   |   |   |   |   |   |   |--- curbweight <= 2349.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 54.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  54.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2349.00
## |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.35
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5000.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  5000.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |   |   |   |   |   |--- boreratio >  3.35
## |   |   |   |   |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9370.00]
## |   |   |   |   |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9720.00]
## |   |   |   |   |   |   |--- curbweight >  2419.50
## |   |   |   |   |   |   |   |--- peakrpm <= 4950.00
## |   |   |   |   |   |   |   |   |--- compressionratio <= 9.00
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 88.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  88.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [11248.00]
## |   |   |   |   |   |   |   |   |--- compressionratio >  9.00
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- boreratio >  3.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |   |--- peakrpm >  4950.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2519.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [10198.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2519.50
## |   |   |   |   |   |   |   |   |   |--- value: [9295.00]
## |   |   |   |   |--- carwidth >  66.75
## |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |--- value: [12290.00]
## |   |--- curbweight >  2697.50
## |   |   |--- boreratio <= 3.37
## |   |   |   |--- peakrpm <= 5000.00
## |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |--- curbweight <= 2737.50
## |   |   |   |   |   |   |   |--- value: [20970.00]
## |   |   |   |   |   |   |--- curbweight >  2737.50
## |   |   |   |   |   |   |   |--- value: [21105.00]
## |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |--- peakrpm >  5000.00
## |   |   |   |   |--- highwaympg <= 21.00
## |   |   |   |   |   |--- value: [23875.00]
## |   |   |   |   |--- highwaympg >  21.00
## |   |   |   |   |   |--- cylindernumber_five <= 0.50
## |   |   |   |   |   |   |--- carlength <= 185.05
## |   |   |   |   |   |   |   |--- citympg <= 19.50
## |   |   |   |   |   |   |   |   |--- value: [15998.00]
## |   |   |   |   |   |   |   |--- citympg >  19.50
## |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |--- carlength >  185.05
## |   |   |   |   |   |   |   |--- curbweight <= 2919.00
## |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |--- curbweight >  2919.00
## |   |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |--- cylindernumber_five >  0.50
## |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |--- carlength <= 177.40
## |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |   |   |   |--- carlength >  177.40
## |   |   |   |   |   |   |   |   |--- carheight <= 53.85
## |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |   |   |--- carheight >  53.85
## |   |   |   |   |   |   |   |   |   |--- value: [17710.00]
## |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |--- value: [18920.00]
## |   |   |--- boreratio >  3.37
## |   |   |   |--- horsepower <= 156.00
## |   |   |   |   |--- peakrpm <= 5450.00
## |   |   |   |   |   |--- compressionratio <= 9.40
## |   |   |   |   |   |   |--- curbweight <= 2877.00
## |   |   |   |   |   |   |   |--- curbweight <= 2762.50
## |   |   |   |   |   |   |   |   |--- value: [11549.00]
## |   |   |   |   |   |   |   |--- curbweight >  2762.50
## |   |   |   |   |   |   |   |   |--- fuelsystem_mfi <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [12629.00]
## |   |   |   |   |   |   |   |   |--- fuelsystem_mfi >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |--- curbweight >  2877.00
## |   |   |   |   |   |   |   |--- carheight <= 57.70
## |   |   |   |   |   |   |   |   |--- curbweight <= 3067.50
## |   |   |   |   |   |   |   |   |   |--- carwidth <= 67.45
## |   |   |   |   |   |   |   |   |   |   |--- cylindernumber_six <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- cylindernumber_six >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |   |--- carwidth >  67.45
## |   |   |   |   |   |   |   |   |   |   |--- value: [11900.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3067.50
## |   |   |   |   |   |   |   |   |   |--- cylindernumber_four <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- carlength <= 183.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |   |   |--- carlength >  183.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [14399.00]
## |   |   |   |   |   |   |   |   |   |--- cylindernumber_four >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 2.69
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15580.00]
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  2.69
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |   |--- carheight >  57.70
## |   |   |   |   |   |   |   |   |--- value: [12440.00]
## |   |   |   |   |   |--- compressionratio >  9.40
## |   |   |   |   |   |   |--- carlength <= 187.75
## |   |   |   |   |   |   |   |--- carlength <= 180.85
## |   |   |   |   |   |   |   |   |--- value: [18344.00]
## |   |   |   |   |   |   |   |--- carlength >  180.85
## |   |   |   |   |   |   |   |   |--- value: [17425.00]
## |   |   |   |   |   |   |--- carlength >  187.75
## |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |   |--- symboling <= -1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [15985.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  -1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3038.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [13415.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3038.00
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 58.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16515.00]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  58.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |--- peakrpm >  5450.00
## |   |   |   |   |   |--- highwaympg <= 25.50
## |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |--- highwaympg >  25.50
## |   |   |   |   |   |   |--- value: [22018.00]
## |   |   |   |--- horsepower >  156.00
## |   |   |   |   |--- compressionratio <= 8.40
## |   |   |   |   |   |--- cylindernumber_six <= 0.50
## |   |   |   |   |   |   |--- carheight <= 56.85
## |   |   |   |   |   |   |   |--- value: [18420.00]
## |   |   |   |   |   |   |--- carheight >  56.85
## |   |   |   |   |   |   |   |--- value: [18950.00]
## |   |   |   |   |   |--- cylindernumber_six >  0.50
## |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |   |   |--- compressionratio >  8.40
## |   |   |   |   |   |--- carlength <= 174.60
## |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |--- carlength >  174.60
## |   |   |   |   |   |   |--- boreratio <= 3.49
## |   |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |   |   |   |   |--- boreratio >  3.49
## |   |   |   |   |   |   |   |--- value: [18150.00]
## |--- enginesize >  182.00
## |   |--- highwaympg <= 16.50
## |   |   |--- carbody_sedan <= 0.50
## |   |   |   |--- value: [45400.00]
## |   |   |--- carbody_sedan >  0.50
## |   |   |   |--- value: [40960.00]
## |   |--- highwaympg >  16.50
## |   |   |--- stroke <= 3.52
## |   |   |   |--- citympg <= 15.50
## |   |   |   |   |--- boreratio <= 3.58
## |   |   |   |   |   |--- value: [36000.00]
## |   |   |   |   |--- boreratio >  3.58
## |   |   |   |   |   |--- value: [36880.00]
## |   |   |   |--- citympg >  15.50
## |   |   |   |   |--- stroke <= 3.25
## |   |   |   |   |   |--- carbody_hardtop <= 0.50
## |   |   |   |   |   |   |--- stroke <= 3.00
## |   |   |   |   |   |   |   |--- value: [37028.00]
## |   |   |   |   |   |   |--- stroke >  3.00
## |   |   |   |   |   |   |   |--- carheight <= 53.65
## |   |   |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |   |   |--- carheight >  53.65
## |   |   |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |--- carbody_hardtop >  0.50
## |   |   |   |   |   |   |--- value: [33278.00]
## |   |   |   |   |--- stroke >  3.25
## |   |   |   |   |   |--- value: [30760.00]
## |   |   |--- stroke >  3.52
## |   |   |   |--- carheight <= 57.50
## |   |   |   |   |--- highwaympg <= 22.00
## |   |   |   |   |   |--- value: [32250.00]
## |   |   |   |   |--- highwaympg >  22.00
## |   |   |   |   |   |--- value: [31600.00]
## |   |   |   |--- carheight >  57.50
## |   |   |   |   |--- value: [28248.00]

Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos_dummis.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##                 predictor  importancia
## 6              enginesize     0.671667
## 5              curbweight     0.240145
## 13             highwaympg     0.033713
## 11                peakrpm     0.011829
## 7               boreratio     0.009099
## 10             horsepower     0.006099
## 8                  stroke     0.005291
## 9        compressionratio     0.004247
## 1               wheelbase     0.003702
## 3                carwidth     0.002414
## 2               carlength     0.001792
## 4               carheight     0.001682
## 18      carbody_hatchback     0.001486
## 0               symboling     0.001193
## 30    cylindernumber_five     0.001172
## 19          carbody_sedan     0.001149
## 12                citympg     0.001021
## 17        carbody_hardtop     0.000544
## 20          carbody_wagon     0.000506
## 31    cylindernumber_four     0.000458
## 40        fuelsystem_mpfi     0.000254
## 27        enginetype_ohcf     0.000171
## 32     cylindernumber_six     0.000159
## 22         drivewheel_rwd     0.000115
## 35     cylindernumber_two     0.000037
## 16         doornumber_two     0.000033
## 36        fuelsystem_2bbl     0.000009
## 33   cylindernumber_three     0.000008
## 39         fuelsystem_mfi     0.000006
## 38         fuelsystem_idi     0.000000
## 37        fuelsystem_4bbl     0.000000
## 34  cylindernumber_twelve     0.000000
## 41        fuelsystem_spdi     0.000000
## 21         drivewheel_fwd     0.000000
## 29       enginetype_rotor     0.000000
## 28        enginetype_ohcv     0.000000
## 26         enginetype_ohc     0.000000
## 25           enginetype_l     0.000000
## 24       enginetype_dohcv     0.000000
## 23    enginelocation_rear     0.000000
## 15       aspiration_turbo     0.000000
## 14           fueltype_gas     0.000000
## 42        fuelsystem_spfi     0.000000

According to the importance table above, the most important predictors for the regression tree model are enginesize, curbweight, highwaympg, peakrpm, and boreratio.

Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([11549., 15580., 11199.,  7957., 17425.,  6338., 14489.,  8499.,
##        11395., 31600.,  9959., 32250.,  6918.,  9980.,  9988.,  6095.,
##        18420.,  7999., 18150.,  9538., 22470., 31600., 15985., 24565.,
##        11549., 18150.,  8195.,  8845.,  9960., 18150.,  6918.,  6918.,
##         7129., 12629.,  7395., 35056., 11199., 30760., 16925., 16845.,
##        12170.])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 46           2       96.0  ...      11048.0            11549.0
## 113          0      114.2  ...      16695.0            15580.0
## 167          2       98.4  ...       8449.0            11199.0
## 165          1       94.5  ...       9298.0             7957.0
## 108          0      107.9  ...      13200.0            17425.0
## 152          1       95.7  ...       6488.0             6338.0
## 172          2       98.4  ...      17669.0            14489.0
## 85           1       96.3  ...       6989.0             8499.0
## 57           3       95.3  ...      13645.0            11395.0
## 69           0      106.7  ...      28176.0            31600.0
## 99           0       97.2  ...       8949.0             9959.0
## 48           0      113.0  ...      35550.0            32250.0
## 157          0       95.7  ...       7198.0             6918.0
## 189          3       94.5  ...      11595.0             9980.0
## 176         -1      102.4  ...      10898.0             9988.0
## 22           1       93.7  ...       6377.0             6095.0
## 75           1      102.7  ...      16503.0            18420.0
## 93           1       94.5  ...       7349.0             7999.0
## 201         -1      109.1  ...      19045.0            18150.0
## 43           0       94.3  ...       6785.0             9538.0
## 155          0       95.7  ...       8778.0            22470.0
## 67          -1      110.0  ...      25552.0            31600.0
## 194         -2      104.3  ...      12940.0            15985.0
## 2            1       94.5  ...      16500.0            24565.0
## 135          2       99.1  ...      15510.0            11549.0
## 202         -1      109.1  ...      21485.0            18150.0
## 186          2       97.3  ...       8495.0             8195.0
## 3            2       99.8  ...      13950.0             8845.0
## 187          2       97.3  ...       9495.0             9960.0
## 137          2       99.1  ...      18620.0            18150.0
## 122          1       93.7  ...       7609.0             6918.0
## 156          0       95.7  ...       6938.0             6918.0
## 33           1       93.7  ...       6529.0             7129.0
## 124          3       95.9  ...      12764.0            12629.0
## 53           1       93.1  ...       6695.0             7395.0
## 129          1       98.4  ...      31400.5            35056.0
## 168          2       98.4  ...       9639.0            11199.0
## 16           0      103.5  ...      41315.0            30760.0
## 65           0      104.9  ...      18280.0            16925.0
## 204         -1      109.1  ...      22625.0            16845.0
## 144          0       97.0  ...       9233.0            12170.0
## 
## [41 rows x 45 columns]

RMSE of the ar model

rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 3857.4438625129756

or, equivalently,

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 3857.4438625129756
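Both calls above compute the same quantity: the square root of the mean of the squared differences between actual and predicted prices. Note that recent scikit-learn releases deprecate the `squared` argument, so the `np.sqrt` variant is the more portable of the two. As a minimal sketch of the arithmetic, the block below reproduces the calculation by hand on only the first three rows of the comparison table (rows 46, 113 and 167), so its result is not the model's RMSE; the helper name `rmse_manual` is hypothetical and not part of the original analysis.

```python
import math

def rmse_manual(y_true, y_pred):
    """Square root of the mean squared difference (hypothetical helper)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# First three rows of the comparison table (Precio_Real vs Precio_Prediccion)
real = [11048.0, 16695.0, 8449.0]
pred = [11549.0, 15580.0, 11199.0]

rmse_3 = rmse_manual(real, pred)
print(f"RMSE over three rows: {rmse_3:.2f}")
```

Because the errors are squared before averaging, the single large miss on row 167 (2750) dominates the result, which is why RMSE penalizes a few large errors more than many small ones.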

Random forest model (RF)

The random forest model (rf) is built with seed 1301 and 20 trees.

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1301)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=1301)

Variable importance

# pending ... ...
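This step is left pending in the original. One possible way to fill it, shown here only as a sketch, is to read the fitted forest's `feature_importances_` attribute, as was done for the regression tree. The block below is self-contained and therefore uses synthetic data from `make_regression` rather than the car data; the column names `x0` to `x4` and the variable names are assumptions made for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for X_entrena / Y_entrena (assumption: NOT the car data)
X, y = make_regression(n_samples=200, n_features=5, n_informative=2,
                       random_state=1301)
feature_names = [f"x{i}" for i in range(X.shape[1])]  # hypothetical names

# Same settings as the model in the text: 20 trees, seed 1301
forest = RandomForestRegressor(n_estimators=20, random_state=1301)
forest.fit(X, y)

# Pair each feature with its impurity-based importance, largest first
importances = sorted(zip(feature_names, forest.feature_importances_),
                     key=lambda pair: pair[1], reverse=True)
for name, value in importances:
    print(f"{name}: {value:.4f}")
```

With the real data, the same pattern would apply to `modelo_rf` and the columns of `X_entrena`. These importances are normalized to sum to 1, so they are comparable with the regression tree's table.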

Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([13798.13335   , 13813.        , 11036.2       ,  9192.7       ,
##        17365.325     ,  6500.        , 13438.6       ,  9022.1       ,
##        11398.75      , 28760.95      ,  9612.95      , 33779.9       ,
##         8088.3       ,  9816.1       ,  9840.79166667,  5937.8       ,
##        18121.2167    ,  7630.1       , 18782.15835   , 12100.65      ,
##        13827.        , 28033.825     , 15846.95      , 16833.15      ,
##        14018.95835   , 18472.4167    ,  8284.7       , 10410.65      ,
##        10500.75      , 17249.35      ,  7773.25      ,  7815.95      ,
##         6953.15      , 15182.4       ,  7070.7       , 37306.8       ,
##        11036.2       , 32506.1       , 14871.45      , 17622.5       ,
##        10814.15      ])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 46           2       96.0  ...      11048.0       13798.133350
## 113          0      114.2  ...      16695.0       13813.000000
## 167          2       98.4  ...       8449.0       11036.200000
## 165          1       94.5  ...       9298.0        9192.700000
## 108          0      107.9  ...      13200.0       17365.325000
## 152          1       95.7  ...       6488.0        6500.000000
## 172          2       98.4  ...      17669.0       13438.600000
## 85           1       96.3  ...       6989.0        9022.100000
## 57           3       95.3  ...      13645.0       11398.750000
## 69           0      106.7  ...      28176.0       28760.950000
## 99           0       97.2  ...       8949.0        9612.950000
## 48           0      113.0  ...      35550.0       33779.900000
## 157          0       95.7  ...       7198.0        8088.300000
## 189          3       94.5  ...      11595.0        9816.100000
## 176         -1      102.4  ...      10898.0        9840.791667
## 22           1       93.7  ...       6377.0        5937.800000
## 75           1      102.7  ...      16503.0       18121.216700
## 93           1       94.5  ...       7349.0        7630.100000
## 201         -1      109.1  ...      19045.0       18782.158350
## 43           0       94.3  ...       6785.0       12100.650000
## 155          0       95.7  ...       8778.0       13827.000000
## 67          -1      110.0  ...      25552.0       28033.825000
## 194         -2      104.3  ...      12940.0       15846.950000
## 2            1       94.5  ...      16500.0       16833.150000
## 135          2       99.1  ...      15510.0       14018.958350
## 202         -1      109.1  ...      21485.0       18472.416700
## 186          2       97.3  ...       8495.0        8284.700000
## 3            2       99.8  ...      13950.0       10410.650000
## 187          2       97.3  ...       9495.0       10500.750000
## 137          2       99.1  ...      18620.0       17249.350000
## 122          1       93.7  ...       7609.0        7773.250000
## 156          0       95.7  ...       6938.0        7815.950000
## 33           1       93.7  ...       6529.0        6953.150000
## 124          3       95.9  ...      12764.0       15182.400000
## 53           1       93.1  ...       6695.0        7070.700000
## 129          1       98.4  ...      31400.5       37306.800000
## 168          2       98.4  ...       9639.0       11036.200000
## 16           0      103.5  ...      41315.0       32506.100000
## 65           0      104.9  ...      18280.0       14871.450000
## 204         -1      109.1  ...      22625.0       17622.500000
## 144          0       97.0  ...       9233.0       10814.150000
## 
## [41 rows x 45 columns]

RMSE of the rf model

rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2857.2700474092694

or, equivalently,

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2857.2700474092694

Model evaluation

The predictions of the three models are compared

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 46           2       96.0  ...               11549.0          13798.133350
## 113          0      114.2  ...               15580.0          13813.000000
## 167          2       98.4  ...               11199.0          11036.200000
## 165          1       94.5  ...                7957.0           9192.700000
## 108          0      107.9  ...               17425.0          17365.325000
## 152          1       95.7  ...                6338.0           6500.000000
## 172          2       98.4  ...               14489.0          13438.600000
## 85           1       96.3  ...                8499.0           9022.100000
## 57           3       95.3  ...               11395.0          11398.750000
## 69           0      106.7  ...               31600.0          28760.950000
## 99           0       97.2  ...                9959.0           9612.950000
## 48           0      113.0  ...               32250.0          33779.900000
## 157          0       95.7  ...                6918.0           8088.300000
## 189          3       94.5  ...                9980.0           9816.100000
## 176         -1      102.4  ...                9988.0           9840.791667
## 22           1       93.7  ...                6095.0           5937.800000
## 75           1      102.7  ...               18420.0          18121.216700
## 93           1       94.5  ...                7999.0           7630.100000
## 201         -1      109.1  ...               18150.0          18782.158350
## 43           0       94.3  ...                9538.0          12100.650000
## 155          0       95.7  ...               22470.0          13827.000000
## 67          -1      110.0  ...               31600.0          28033.825000
## 194         -2      104.3  ...               15985.0          15846.950000
## 2            1       94.5  ...               24565.0          16833.150000
## 135          2       99.1  ...               11549.0          14018.958350
## 202         -1      109.1  ...               18150.0          18472.416700
## 186          2       97.3  ...                8195.0           8284.700000
## 3            2       99.8  ...                8845.0          10410.650000
## 187          2       97.3  ...                9960.0          10500.750000
## 137          2       99.1  ...               18150.0          17249.350000
## 122          1       93.7  ...                6918.0           7773.250000
## 156          0       95.7  ...                6918.0           7815.950000
## 33           1       93.7  ...                7129.0           6953.150000
## 124          3       95.9  ...               12629.0          15182.400000
## 53           1       93.1  ...                7395.0           7070.700000
## 129          1       98.4  ...               35056.0          37306.800000
## 168          2       98.4  ...               11199.0          11036.200000
## 16           0      103.5  ...               30760.0          32506.100000
## 65           0      104.9  ...               16925.0          14871.450000
## 204         -1      109.1  ...               16845.0          17622.500000
## 144          0       97.0  ...               12170.0          10814.150000
## 
## [41 rows x 47 columns]

The RMSE values are compared.

A NumPy array is created

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3497.79896844, 3857.44386251, 2857.27004741]])

A DataFrame is built from the NumPy array

rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  3497.798968  3857.443863  2857.270047
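As a small sketch of how the comparison can be read programmatically, the block below takes the three RMSE values printed above, ranks them, and reports each one relative to the multiple regression result. The dictionary and variable names are hypothetical, and using rm as the baseline is an assumption made for illustration.

```python
# RMSE values copied from the output above (rm, ar, rf)
rmse_by_model = {"rm": 3497.798968, "ar": 3857.443863, "rf": 2857.270047}

# The model with the lowest RMSE on the validation set
best = min(rmse_by_model, key=rmse_by_model.get)

# Rank models from lowest to highest RMSE and compare against rm
baseline = rmse_by_model["rm"]
ranking = sorted(rmse_by_model.items(), key=lambda kv: kv[1])
for name, value in ranking:
    change = (value - baseline) / baseline * 100
    print(f"{name}: RMSE = {value:.2f} ({change:+.1f}% vs rm)")

print("Lowest RMSE:", best)
```

The ranking holds only for this particular split and seed; a different partition of the data could change the order.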

Interpretation

Car price data were loaded, and the models were built on all the variables, both numeric and categorical.

The most important predictors in the regression tree model, according to the importance table above, are:

enginesize 0.671667, curbweight 0.240145, highwaympg 0.033713, peakrpm 0.011829, boreratio 0.009099

The seed 1301 was used.

The multiple linear regression model highlights statistically significant variables.

-The RMSE of the RM model (multiple regression) is: 3497.798968

-The RMSE of the AR model (regression tree) is: 3857.443863

-The RMSE of the RF model (random forest) is: 2857.270047

The best model according to the root mean squared error (RMSE) was the random forest, with a value of 2857.270047. This result holds for this particular training and validation data, split 80% for training and 20% for validation.