1 Objective

Compare supervised models by applying car-price prediction algorithms and computing the root mean squared error (RMSE) of each one.

2 Description

The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv

  • All variables in the data set are used.

  • Training data are created from 80% of the observations.

  • Validation data are created from the remaining 20%.

  • The multiple regression model is built with the training data.

    • This model answers questions such as:

    • Which variables are significant predictors at a confidence level above 90%?

    • What is the Adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?

    • Predictions are generated with the validation data.

    • The RMSE is computed for comparison.

  • The regression tree model is built with the training data.

    • The importance of each variable for the price is identified.

    • The regression tree and its decision rules are visualized.

    • Predictions are made with the validation data.

    • The RMSE is computed for comparison.

  • The random forest model is built with the training data and 20 trees.

    • The importance of each variable for the price is identified.

    • Predictions are generated with the validation data.

    • The RMSE is computed for comparison.

  • At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.

3 Development

3.1 Load libraries

# Data handling
import numpy as np
import pandas as pd

# Plots
import matplotlib.pyplot as plt

# Preprocessing and modeling
from sklearn.model_selection import train_test_split

# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, Adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial

# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV

# Random Forest
from sklearn.ensemble import RandomForestRegressor


# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

3.2 Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
##      car_ID  symboling                   CarName  ... citympg highwaympg    price
## 0         1          3        alfa-romero giulia  ...      21         27  13495.0
## 1         2          3       alfa-romero stelvio  ...      21         27  16500.0
## 2         3          1  alfa-romero Quadrifoglio  ...      19         26  16500.0
## 3         4          2               audi 100 ls  ...      24         30  13950.0
## 4         5          2                audi 100ls  ...      18         22  17450.0
## ..      ...        ...                       ...  ...     ...        ...      ...
## 200     201         -1           volvo 145e (sw)  ...      23         28  16845.0
## 201     202         -1               volvo 144ea  ...      19         25  19045.0
## 202     203         -1               volvo 244dl  ...      18         23  21485.0
## 203     204         -1                 volvo 246  ...      26         27  22470.0
## 204     205         -1               volvo 264gl  ...      19         25  22625.0
## 
## [205 rows x 26 columns]

3.3 Data exploration

print("Observaciones y variables: ", datos.shape)
## Observaciones y variables:  (205, 26)
print("Columnas y tipo de dato")
# datos.columns
## Columnas y tipo de dato
datos.dtypes
## car_ID                int64
## symboling             int64
## CarName              object
## fueltype             object
## aspiration           object
## doornumber           object
## carbody              object
## drivewheel           object
## enginelocation       object
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginetype           object
## cylindernumber       object
## enginesize            int64
## fuelsystem           object
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

3.4 Data dictionary

Col | Name | Description
1 | car_ID | Unique id of each observation (Integer)
2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical)
3 | CarName | Name of the car company and model (Categorical)
4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical)
5 | aspiration | Aspiration used in the car: std or turbo (Categorical)
6 | doornumber | Number of doors in the car (Categorical)
7 | carbody | Body of the car: convertible, sedan, wagon … (Categorical)
8 | drivewheel | Type of drive wheel: fwd, rwd, 4wd (Categorical)
9 | enginelocation | Location of the car engine (Categorical)
10 | wheelbase | Wheelbase of the car, the distance between axles in inches (Numeric)
11 | carlength | Length of the car (Numeric)
12 | carwidth | Width of the car (Numeric)
13 | carheight | Height of the car (Numeric)
14 | curbweight | Weight of the car without occupants or baggage (Numeric)
15 | enginetype | Type of engine (Categorical)
16 | cylindernumber | Number of cylinders in the car (Categorical)
17 | enginesize | Size of the engine, in … (Numeric)
18 | fuelsystem | Fuel system of the car (Categorical)
19 | boreratio | Bore ratio of the car, related to engine efficiency (Numeric)
20 | stroke | Stroke or volume inside the engine: pistons, strokes, combustion (Numeric)
21 | compressionratio | Compression ratio of the car, a measure of pressure in the engine (Numeric)
22 | horsepower | Horsepower, the power of the car (Numeric)
23 | peakrpm | Peak revolutions per minute of the car (Numeric)
24 | citympg | Mileage in the city, in miles per gallon (Numeric)
25 | highwaympg | Mileage on the highway, in miles per gallon (Numeric)
26 | price | Dependent variable. Price of the car in dollars (Numeric)

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

3.5 Data preparation

3.5.1 Remove variables

Remove the variables that are of no statistical interest, that is, columns 1 and 3: car_ID and CarName.

datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
##      symboling fueltype aspiration  ... citympg highwaympg    price
## 0            3      gas        std  ...      21         27  13495.0
## 1            3      gas        std  ...      21         27  16500.0
## 2            1      gas        std  ...      19         26  16500.0
## 3            2      gas        std  ...      24         30  13950.0
## 4            2      gas        std  ...      18         22  17450.0
## ..         ...      ...        ...  ...     ...        ...      ...
## 200         -1      gas        std  ...      23         28  16845.0
## 201         -1      gas      turbo  ...      19         25  19045.0
## 202         -1      gas        std  ...      18         23  21485.0
## 203         -1   diesel      turbo  ...      26         27  22470.0
## 204         -1      gas      turbo  ...      19         25  22625.0
## 
## [205 rows x 24 columns]
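
An arguably shorter way to get the same result is to drop the two columns by name instead of listing the 24 columns to keep. The sketch below is only an illustration: `muestra` and `muestra_reducida` are hypothetical names, and the frame is a tiny stand-in built from a few values shown in the listing above. With the real data the same call would be applied to `datos`.

```python
import pandas as pd

# Tiny stand-in for the real data set (values copied from the first rows shown above)
muestra = pd.DataFrame({
    "car_ID": [1, 2, 3],
    "symboling": [3, 3, 1],
    "CarName": ["alfa-romero giulia", "alfa-romero stelvio", "alfa-romero Quadrifoglio"],
    "price": [13495.0, 16500.0, 16500.0],
})

# Drop the identifier and the free-text name; every other column is kept
muestra_reducida = muestra.drop(columns=["car_ID", "CarName"])
print(list(muestra_reducida.columns))
```

Dropping by name keeps the code valid if the data set later gains or reorders columns.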

3.5.2 Build dummy variables

The following variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
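
The categorical columns do not have to be picked out by eye. As a minimal sketch, assuming a small stand-in frame (`muestra`, a hypothetical name, with values taken from the listings above), the non-numeric columns can be selected by dtype:

```python
import pandas as pd

# Stand-in rows with a mix of numeric and text columns (values from the listings above)
muestra = pd.DataFrame({
    "symboling": [3, 1, -1],
    "fueltype": ["gas", "gas", "diesel"],
    "aspiration": ["std", "std", "turbo"],
    "wheelbase": [88.6, 94.5, 109.1],
})

# Non-numeric columns are the candidates for dummy encoding
categoricas = list(muestra.select_dtypes(exclude="number").columns)
print(categoricas)
```

Applied to `datos`, the same call should return the nine variables listed above.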

Identify the categorical variables and build a data set that includes the dummy variables.

The pandas function get_dummies() converts categorical data into indicator (dummy) variables.

What are dummy variables? A categorical variable is encoded as several columns, one per category. Each record holds a 1 in the column of the category it belongs to and a 0 in the others.

Example

genero
MASCULINO
FEMENINO
MASCULINO

The same data as dummy variables

genero_masculino genero_femenino
1 0
0 1
1 0

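
The example table can be reproduced with a short sketch. The frame `ejemplo` and the result names are hypothetical. `dtype=int` is passed so the indicators print as 0/1 regardless of the pandas version, and the column names produced by pandas keep the original spelling of each category.

```python
import pandas as pd

ejemplo = pd.DataFrame({"genero": ["MASCULINO", "FEMENINO", "MASCULINO"]})

# One indicator column per category
completas = pd.get_dummies(ejemplo, dtype=int)

# drop_first=True removes the first category (in sorted order), which is
# then implied whenever all the remaining indicators are 0
reducidas = pd.get_dummies(ejemplo, drop_first=True, dtype=int)

print(completas)
print(reducidas)
```

With two categories, `drop_first=True` leaves a single column. This is why the encoded data set has fewer columns than the total number of categories.
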
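# drop_first = True omits the first category of each variable, which is implied when the remaining indicators are all 0
datos_dummis = pd.get_dummies(datos, drop_first = True)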
datos_dummis
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 0            3       88.6  ...                0                0
## 1            3       88.6  ...                0                0
## 2            1       94.5  ...                0                0
## 3            2       99.8  ...                0                0
## 4            2       99.4  ...                0                0
## ..         ...        ...  ...              ...              ...
## 200         -1      109.1  ...                0                0
## 201         -1      109.1  ...                0                0
## 202         -1      109.1  ...                0                0
## 203         -1      109.1  ...                0                0
## 204         -1      109.1  ...                0                0
## 
## [205 rows x 44 columns]

3.5.3 Training and validation data

The training data are 80% of the observations and the validation data the remaining 20%. The random seed is 1349.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80,  random_state = 1349)
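
To see what sizes this split yields, here is a small sketch on synthetic arrays with the same number of rows as the data set (205). The arrays and the names `X_ent`, `X_val` are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the data set (205)
X = np.arange(205 * 2).reshape(205, 2)
y = np.arange(205)

X_ent, X_val, y_ent, y_val = train_test_split(
    X, y, train_size=0.80, random_state=1349
)
print(X_ent.shape, X_val.shape)

# Fixing random_state makes the partition reproducible
X_ent2, _, _, _ = train_test_split(X, y, train_size=0.80, random_state=1349)
print(np.array_equal(X_ent, X_ent2))
```

The 164 and 41 rows match the shapes of X_entrena and X_valida shown in the following sections.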

3.5.3.1 Training data

X_entrena
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 179          3      102.9  ...                0                0
## 28          -1      103.3  ...                0                0
## 132          3       99.1  ...                0                0
## 116          0      107.9  ...                0                0
## 123         -1      103.3  ...                0                0
## ..         ...        ...  ...              ...              ...
## 194         -2      104.3  ...                0                0
## 164          1       94.5  ...                0                0
## 17           0      110.0  ...                0                0
## 126          3       89.5  ...                0                0
## 18           2       88.4  ...                0                0
## 
## [164 rows x 43 columns]

3.5.3.2 Validation data

X_valida
##      symboling  wheelbase  ...  fuelsystem_spdi  fuelsystem_spfi
## 154          0       95.7  ...                0                0
## 147          0       97.0  ...                0                0
## 104          3       91.3  ...                0                0
## 102          0      100.4  ...                0                0
## 61           1       98.8  ...                0                0
## 163          1       94.5  ...                0                0
## 124          3       95.9  ...                1                0
## 7            1      105.8  ...                0                0
## 169          2       98.4  ...                0                0
## 16           0      103.5  ...                0                0
## 15           0      103.5  ...                0                0
## 73           0      120.9  ...                0                0
## 187          2       97.3  ...                0                0
## 109          0      114.2  ...                0                0
## 52           1       93.1  ...                0                0
## 111          0      107.9  ...                0                0
## 80           3       96.3  ...                1                0
## 103          0      100.4  ...                0                0
## 75           1      102.7  ...                0                0
## 10           2      101.2  ...                0                0
## 54           1       93.1  ...                0                0
## 40           0       96.5  ...                0                0
## 57           3       95.3  ...                0                0
## 66           0      104.9  ...                0                0
## 137          2       99.1  ...                0                0
## 161          0       95.7  ...                0                0
## 158          0       95.7  ...                0                0
## 11           0      101.2  ...                0                0
## 171          2       98.4  ...                0                0
## 2            1       94.5  ...                0                0
## 177         -1      102.4  ...                0                0
## 184          2       97.3  ...                0                0
## 21           1       93.7  ...                0                0
## 82           3       95.9  ...                1                0
## 125          3       94.5  ...                0                0
## 6            1      105.8  ...                0                0
## 65           0      104.9  ...                0                0
## 48           0      113.0  ...                0                0
## 42           1       96.5  ...                0                0
## 27           1       93.7  ...                0                0
## 79           1       93.0  ...                1                0
## 
## [41 rows x 43 columns]

3.6 Supervised models

3.6.1 Multiple linear regression model (RM)

The multiple linear regression model (rm) is built.

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()

3.6.1.1 Coefficients

Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 2.04388493e+02,  3.16126675e+01, -5.83182354e+01,  8.88164168e+02,
##         1.41889656e+02,  3.11573551e+00,  1.23542219e+02, -9.39968556e+02,
##        -4.45174029e+03, -3.85091460e+02, -1.98531362e+01,  1.90332440e+00,
##         4.12596487e+01,  2.54267156e+01, -2.21433311e+03,  3.20956657e+03,
##        -1.97026142e+02, -3.25992564e+03, -3.49764588e+03, -2.68368678e+03,
##        -3.57821520e+03,  3.04267898e+02,  8.94224762e+02,  1.15655151e+04,
##        -2.45524955e+03, -1.05050270e+03,  2.41000287e+03, -1.29316916e+02,
##        -4.74398095e+03,  7.33671797e+02, -8.02999609e+03, -9.98863262e+03,
##        -6.45072457e+03, -1.93555522e+03, -8.02476950e+03,  7.33671797e+02,
##        -2.49184328e+01, -3.09837933e+03,  2.21433311e+03, -2.59273097e+03,
##         2.58576497e+02, -1.47437544e+03,  1.74188010e+02])
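
A bare array of coefficients is hard to read without the variable names. As a minimal sketch on synthetic data (the frame `X`, the target `y` and the name `coeficientes` are assumptions), each coefficient can be paired with its column:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data generated exactly by y = 5 + 2*x1 - 3*x2
X = pd.DataFrame({"x1": [0.0, 1.0, 2.0, 3.0, 4.0], "x2": [1.0, 0.0, 2.0, 1.0, 3.0]})
y = 5 + 2 * X["x1"] - 3 * X["x2"]

modelo = LinearRegression().fit(X, y)

# Pair each coefficient with its column name
coeficientes = pd.Series(modelo.coef_, index=X.columns)
print(coeficientes)
print(modelo.intercept_)
```

For the model above, the analogous call would use modelo_rm.coef_ with X_entrena.columns as the index.
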
  • The R-squared printed below, about 0.9552, is computed on the training data and means that the independent variables explain about 95.5% of the variation in the dependent variable price. Note that score() returns the unadjusted R-squared, not the Adjusted R-squared.
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9552451622081178
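
The Adjusted R-squared can be derived from this value with the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below uses the figures reported above and assumes that all 43 encoded columns count as predictors. If some dummy columns are redundant, the effective p is smaller and the adjusted value slightly higher.

```python
# Values taken from the output above
r2 = 0.9552451622081178   # modelo_rm.score(X_entrena, Y_entrena)
n = 164                   # training observations
p = 43                    # predictors after dummy encoding (assumed all count)

# Adjusted R-squared penalizes R-squared for the number of predictors
r2_ajustado = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_ajustado, 4))
```

Under these assumptions the result is about 0.939, somewhat below the unadjusted 0.955.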

3.6.1.2 Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# Note: the slice [:-1] leaves out the last element, so 40 of the 41 predictions are printed
print(predicciones_rm[:-1])
## [ 6474.28534028  8784.63604725 17131.26653913 15887.38291784
##  10057.63199124  7930.96816998 14881.59184891 20438.1219921
##  11913.68026675 27930.19570335 27335.40419357 42812.72467294
##  10568.03009743 11953.55052779  5966.55058655 17085.12986277
##  10152.23184869 16063.50754817 18725.03002097 12772.31493574
##   6621.69508548  7093.51950264 11410.57867756 11641.68787686
##  14020.44637291  6893.53320286  7455.53788324 12560.56409214
##  12421.5451551  10269.08129135  8736.67563889  8177.45242998
##   5870.96255677 14347.77070264 17760.21189592 20989.91950907
##  15106.77815584 30496.58715621  8845.11875073 10556.66271416]

3.6.1.3 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 154          0       95.7  ...       7898.0        6474.285340
## 147          0       97.0  ...      10198.0        8784.636047
## 104          3       91.3  ...      17199.0       17131.266539
## 102          0      100.4  ...      14399.0       15887.382918
## 61           1       98.8  ...      10595.0       10057.631991
## 163          1       94.5  ...       8058.0        7930.968170
## 124          3       95.9  ...      12764.0       14881.591849
## 7            1      105.8  ...      18920.0       20438.121992
## 169          2       98.4  ...       9989.0       11913.680267
## 16           0      103.5  ...      41315.0       27930.195703
## 15           0      103.5  ...      30760.0       27335.404194
## 73           0      120.9  ...      40960.0       42812.724673
## 187          2       97.3  ...       9495.0       10568.030097
## 109          0      114.2  ...      12440.0       11953.550528
## 52           1       93.1  ...       6795.0        5966.550587
## 111          0      107.9  ...      15580.0       17085.129863
## 80           3       96.3  ...       9959.0       10152.231849
## 103          0      100.4  ...      13499.0       16063.507548
## 75           1      102.7  ...      16503.0       18725.030021
## 10           2      101.2  ...      16430.0       12772.314936
## 54           1       93.1  ...       7395.0        6621.695085
## 40           0       96.5  ...      10295.0        7093.519503
## 57           3       95.3  ...      13645.0       11410.578678
## 66           0      104.9  ...      18344.0       11641.687877
## 137          2       99.1  ...      18620.0       14020.446373
## 161          0       95.7  ...       8358.0        6893.533203
## 158          0       95.7  ...       7898.0        7455.537883
## 11           0      101.2  ...      16925.0       12560.564092
## 171          2       98.4  ...      11549.0       12421.545155
## 2            1       94.5  ...      16500.0       10269.081291
## 177         -1      102.4  ...      11248.0        8736.675639
## 184          2       97.3  ...       7995.0        8177.452430
## 21           1       93.7  ...       5572.0        5870.962557
## 82           3       95.9  ...      12629.0       14347.770703
## 125          3       94.5  ...      22018.0       17760.211896
## 6            1      105.8  ...      17710.0       20989.919509
## 65           0      104.9  ...      18280.0       15106.778156
## 48           0      113.0  ...      35550.0       30496.587156
## 42           1       96.5  ...      10345.0        8845.118751
## 27           1       93.7  ...       8558.0       10556.662714
## 79           1       93.0  ...       7689.0        7872.676913
## 
## [41 rows x 45 columns]

3.6.1.4 RMSE of the rm model

# squared = False returns the RMSE instead of the MSE
rmse_rm = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rm}")
## El error (rmse) de test es: 3362.896276671872

or, equivalently,

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3362.896276671872
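
Both calls compute the same quantity. As an illustration only, the sketch below computes the RMSE by hand with NumPy on the first three rows of the comparison table. Because it uses a three-row subset, its value differs from the 3362.90 obtained on all 41 validation rows. In recent scikit-learn releases the `squared` argument of mean_squared_error is deprecated, so the np.sqrt form is likely the more portable of the two.

```python
import numpy as np

# First three rows of the comparison table above (real vs. predicted price)
precio_real = np.array([7898.0, 10198.0, 17199.0])
precio_prediccion = np.array([6474.285340, 8784.636047, 17131.266539])

# RMSE: square the errors, average them, take the square root
errores = precio_real - precio_prediccion
rmse = np.sqrt(np.mean(errores ** 2))
print(rmse)
```

Because the errors are squared before averaging, large misses such as the 41315 vs. 27930 row in the table weigh heavily on the full-sample RMSE.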

3.6.2 Regression tree model (AR)

The regression tree model (ar) is built.

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 1349
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1349)

3.6.2.1 Regression tree visualization

fig, ax = plt.subplots(figsize=(12, 5))

print(f"Profundidad del árbol: {modelo_ar.get_depth()}")
## Profundidad del árbol: 14
print(f"Número de nodos terminales: {modelo_ar.get_n_leaves()}")
## Número de nodos terminales: 152
#plot = plot_tree(
#            decision_tree = modelo_ar,
#            feature_names = list(datos_dummis.drop(columns = "price").columns),
#            class_names   = 'price',
#            filled        = True,
#            impurity      = False,
#            fontsize      = 10,
#            precision     = 2,
#            ax            = ax
#       )

#plot
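
With 152 leaves for 164 training rows, the tree has almost one leaf per observation, so a plot or rule listing of it is hard to read. The commented-out max_depth in the constructor points to one remedy. The sketch below, on synthetic data with hypothetical names (`arbol_completo`, `arbol_limitado`, `x0`..`x2`), compares an unrestricted tree with one limited to depth 3:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1349)

# Synthetic stand-in: 164 rows, 3 numeric predictors, noisy target
X = rng.normal(size=(164, 3))
y = 10 * X[:, 0] + rng.normal(size=164)

arbol_completo = DecisionTreeRegressor(random_state=1349).fit(X, y)
arbol_limitado = DecisionTreeRegressor(max_depth=3, random_state=1349).fit(X, y)

print(arbol_completo.get_n_leaves(), arbol_limitado.get_n_leaves())

# A depth-3 tree has at most 8 leaves, so its rules fit on one screen
print(export_text(arbol_limitado, feature_names=["x0", "x1", "x2"]))
```

Whether a shallower tree also predicts better on the validation data is an empirical question that the RMSE comparison can answer.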

Decision rules of the tree

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos_dummis.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2544.00
## |   |   |--- curbweight <= 2216.50
## |   |   |   |--- horsepower <= 68.50
## |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |--- curbweight <= 2104.00
## |   |   |   |   |   |   |--- carlength <= 166.05
## |   |   |   |   |   |   |   |--- fuelsystem_idi <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |--- fuelsystem_idi >  0.50
## |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |   |   |   |--- carlength >  166.05
## |   |   |   |   |   |   |   |--- carwidth <= 64.00
## |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |   |   |   |--- carwidth >  64.00
## |   |   |   |   |   |   |   |   |--- value: [6695.00]
## |   |   |   |   |   |--- curbweight >  2104.00
## |   |   |   |   |   |   |--- value: [7609.00]
## |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |--- highwaympg <= 38.50
## |   |   |   |   |   |   |--- highwaympg <= 34.50
## |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |--- highwaympg >  34.50
## |   |   |   |   |   |   |   |--- curbweight <= 1985.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1888.00
## |   |   |   |   |   |   |   |   |   |--- value: [6377.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1888.00
## |   |   |   |   |   |   |   |   |   |--- carlength <= 158.20
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  64.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6189.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  158.20
## |   |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |--- curbweight >  1985.50
## |   |   |   |   |   |   |   |   |--- carheight <= 52.65
## |   |   |   |   |   |   |   |   |   |--- value: [6669.00]
## |   |   |   |   |   |   |   |   |--- carheight >  52.65
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2027.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [6488.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2027.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [6338.00]
## |   |   |   |   |   |--- highwaympg >  38.50
## |   |   |   |   |   |   |--- citympg <= 48.00
## |   |   |   |   |   |   |   |--- curbweight <= 1662.50
## |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |   |--- curbweight >  1662.50
## |   |   |   |   |   |   |   |   |--- enginesize <= 91.00
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.15
## |   |   |   |   |   |   |   |   |   |   |--- value: [5399.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.15
## |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  91.00
## |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [5348.00]
## |   |   |   |   |   |   |   |   |   |--- carwidth >  64.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |   |--- citympg >  48.00
## |   |   |   |   |   |   |   |--- value: [6479.00]
## |   |   |   |--- horsepower >  68.50
## |   |   |   |   |--- carwidth <= 63.50
## |   |   |   |   |   |--- value: [5118.00]
## |   |   |   |   |--- carwidth >  63.50
## |   |   |   |   |   |--- curbweight <= 2124.00
## |   |   |   |   |   |   |--- carheight <= 53.60
## |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |--- carlength <= 157.35
## |   |   |   |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |   |   |   |--- carlength >  157.35
## |   |   |   |   |   |   |   |   |   |--- compressionratio <= 9.50
## |   |   |   |   |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6938.00]
## |   |   |   |   |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
## |   |   |   |   |   |   |   |   |   |--- compressionratio >  9.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1948.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1846.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1846.50
## |   |   |   |   |   |   |   |   |   |   |--- citympg <= 34.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6529.00]
## |   |   |   |   |   |   |   |   |   |   |--- citympg >  34.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6295.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1948.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 53.05
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.20
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7129.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  64.20
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  53.05
## |   |   |   |   |   |   |   |   |   |   |--- value: [7799.00]
## |   |   |   |   |   |   |--- carheight >  53.60
## |   |   |   |   |   |   |   |--- curbweight <= 1903.50
## |   |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |   |--- curbweight >  1903.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1928.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1928.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 63.85
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  63.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7295.00]
## |   |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7053.00]
## |   |   |   |   |   |--- curbweight >  2124.00
## |   |   |   |   |   |   |--- horsepower <= 76.00
## |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |--- value: [8238.00]
## |   |   |   |   |   |   |--- horsepower >  76.00
## |   |   |   |   |   |   |   |--- citympg <= 30.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2210.50
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.02
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.02
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg <= 32.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg >  32.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7975.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2210.50
## |   |   |   |   |   |   |   |   |   |--- value: [8195.00]
## |   |   |   |   |   |   |   |--- citympg >  30.00
## |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |--- curbweight >  2216.50
## |   |   |   |--- cylindernumber_four <= 0.50
## |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |--- curbweight <= 2503.50
## |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |--- curbweight >  2503.50
## |   |   |   |   |   |   |--- value: [15250.00]
## |   |   |   |--- cylindernumber_four >  0.50
## |   |   |   |   |--- horsepower <= 89.00
## |   |   |   |   |   |--- carwidth <= 66.00
## |   |   |   |   |   |   |--- horsepower <= 80.00
## |   |   |   |   |   |   |   |--- curbweight <= 2277.50
## |   |   |   |   |   |   |   |   |--- peakrpm <= 4450.00
## |   |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |   |   |   |--- peakrpm >  4450.00
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 54.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  54.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |   |   |--- curbweight >  2277.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2308.50
## |   |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2308.50
## |   |   |   |   |   |   |   |   |   |--- value: [6785.00]
## |   |   |   |   |   |   |--- horsepower >  80.00
## |   |   |   |   |   |   |   |--- carheight <= 53.15
## |   |   |   |   |   |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 96.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  96.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7463.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |--- carheight >  53.15
## |   |   |   |   |   |   |   |   |--- curbweight <= 2255.50
## |   |   |   |   |   |   |   |   |   |--- value: [7895.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2255.50
## |   |   |   |   |   |   |   |   |   |--- citympg <= 23.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |   |   |   |   |   |--- citympg >  23.50
## |   |   |   |   |   |   |   |   |   |   |--- symboling <= 1.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- symboling >  1.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |--- carwidth >  66.00
## |   |   |   |   |   |   |--- curbweight <= 2417.50
## |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |--- value: [9370.00]
## |   |   |   |   |   |   |--- curbweight >  2417.50
## |   |   |   |   |   |   |   |--- fuelsystem_idi <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |   |--- fuelsystem_idi >  0.50
## |   |   |   |   |   |   |   |   |--- citympg <= 33.00
## |   |   |   |   |   |   |   |   |   |--- value: [10698.00]
## |   |   |   |   |   |   |   |   |--- citympg >  33.00
## |   |   |   |   |   |   |   |   |   |--- value: [10795.00]
## |   |   |   |   |--- horsepower >  89.00
## |   |   |   |   |   |--- carheight <= 54.00
## |   |   |   |   |   |   |--- curbweight <= 2538.00
## |   |   |   |   |   |   |   |--- horsepower <= 103.00
## |   |   |   |   |   |   |   |   |--- carwidth <= 66.55
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 108.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9960.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  108.50
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 52.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9980.00]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  52.65
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  66.55
## |   |   |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |   |   |   |--- horsepower >  103.00
## |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5700.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9639.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  5700.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9538.00]
## |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2334.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9298.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2334.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |   |--- curbweight >  2538.00
## |   |   |   |   |   |   |   |--- value: [8449.00]
## |   |   |   |   |   |--- carheight >  54.00
## |   |   |   |   |   |   |--- highwaympg <= 31.00
## |   |   |   |   |   |   |   |--- compressionratio <= 8.75
## |   |   |   |   |   |   |   |   |--- enginesize <= 108.50
## |   |   |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  108.50
## |   |   |   |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |   |   |   |--- compressionratio >  8.75
## |   |   |   |   |   |   |   |   |--- carlength <= 176.00
## |   |   |   |   |   |   |   |   |   |--- value: [12945.00]
## |   |   |   |   |   |   |   |   |--- carlength >  176.00
## |   |   |   |   |   |   |   |   |   |--- value: [13950.00]
## |   |   |   |   |   |   |--- highwaympg >  31.00
## |   |   |   |   |   |   |   |--- highwaympg <= 33.00
## |   |   |   |   |   |   |   |   |--- carheight <= 55.30
## |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |   |   |--- carheight >  55.30
## |   |   |   |   |   |   |   |   |   |--- value: [9995.00]
## |   |   |   |   |   |   |   |--- highwaympg >  33.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2313.00
## |   |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2313.00
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 94.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  94.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8949.00]
## |   |--- curbweight >  2544.00
## |   |   |--- carwidth <= 68.60
## |   |   |   |--- horsepower <= 118.50
## |   |   |   |   |--- horsepower <= 92.50
## |   |   |   |   |   |--- wheelbase <= 98.25
## |   |   |   |   |   |   |--- carheight <= 53.30
## |   |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |   |   |--- carheight >  53.30
## |   |   |   |   |   |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |   |   |   |   |   |--- value: [9295.00]
## |   |   |   |   |   |--- wheelbase >  98.25
## |   |   |   |   |   |   |--- enginesize <= 103.00
## |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |--- enginesize >  103.00
## |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |--- horsepower >  92.50
## |   |   |   |   |   |--- curbweight <= 2701.00
## |   |   |   |   |   |   |--- boreratio <= 3.50
## |   |   |   |   |   |   |   |--- horsepower <= 110.50
## |   |   |   |   |   |   |   |   |--- value: [13295.00]
## |   |   |   |   |   |   |   |--- horsepower >  110.50
## |   |   |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |   |   |   |--- boreratio >  3.50
## |   |   |   |   |   |   |   |--- carbody_hardtop <= 0.50
## |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 110.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  110.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11694.00]
## |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |   |--- carbody_hardtop >  0.50
## |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |--- curbweight >  2701.00
## |   |   |   |   |   |   |--- horsepower <= 114.50
## |   |   |   |   |   |   |   |--- curbweight <= 3038.00
## |   |   |   |   |   |   |   |   |--- carheight <= 56.45
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2923.50
## |   |   |   |   |   |   |   |   |   |   |--- enginesize <= 131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- enginesize >  131.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12940.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [15985.00]
## |   |   |   |   |   |   |   |   |--- carheight >  56.45
## |   |   |   |   |   |   |   |   |   |--- symboling <= -0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [13415.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  -0.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11900.00]
## |   |   |   |   |   |   |   |--- curbweight >  3038.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 3224.50
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.36
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg <= 26.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg >  26.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16515.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.36
## |   |   |   |   |   |   |   |   |   |   |--- value: [13200.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3224.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3357.50
## |   |   |   |   |   |   |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [17425.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3357.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |   |   |--- horsepower >  114.50
## |   |   |   |   |   |   |   |--- carheight <= 53.65
## |   |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |   |   |   |   |--- carheight >  53.65
## |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |--- horsepower >  118.50
## |   |   |   |   |--- horsepower <= 131.50
## |   |   |   |   |   |--- carheight <= 55.00
## |   |   |   |   |   |   |--- doornumber_two <= 0.50
## |   |   |   |   |   |   |   |--- value: [21105.00]
## |   |   |   |   |   |   |--- doornumber_two >  0.50
## |   |   |   |   |   |   |   |--- value: [20970.00]
## |   |   |   |   |   |--- carheight >  55.00
## |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |--- horsepower >  131.50
## |   |   |   |   |   |--- horsepower <= 158.00
## |   |   |   |   |   |   |--- drivewheel_fwd <= 0.50
## |   |   |   |   |   |   |   |--- citympg <= 18.50
## |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |--- citympg >  18.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 3141.00
## |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3141.00
## |   |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |   |--- drivewheel_fwd >  0.50
## |   |   |   |   |   |   |   |--- fuelsystem_spdi <= 0.50
## |   |   |   |   |   |   |   |   |--- horsepower <= 148.50
## |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  148.50
## |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |--- fuelsystem_spdi >  0.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2923.50
## |   |   |   |   |   |   |   |   |   |--- value: [14869.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2923.50
## |   |   |   |   |   |   |   |   |   |--- value: [14489.00]
## |   |   |   |   |   |--- horsepower >  158.00
## |   |   |   |   |   |   |--- compressionratio <= 9.15
## |   |   |   |   |   |   |   |--- carlength <= 174.45
## |   |   |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |   |   |   |   |   |--- carlength >  174.45
## |   |   |   |   |   |   |   |   |--- carbody_wagon <= 0.50
## |   |   |   |   |   |   |   |   |   |--- drivewheel_rwd <= 0.50
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 54.05
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  54.05
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |   |   |--- drivewheel_rwd >  0.50
## |   |   |   |   |   |   |   |   |   |   |--- enginesize <= 155.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18420.00]
## |   |   |   |   |   |   |   |   |   |   |--- enginesize >  155.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |   |   |   |   |   |   |--- carbody_wagon >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [18950.00]
## |   |   |   |   |   |   |--- compressionratio >  9.15
## |   |   |   |   |   |   |   |--- citympg <= 19.50
## |   |   |   |   |   |   |   |   |--- value: [15998.00]
## |   |   |   |   |   |   |   |--- citympg >  19.50
## |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |--- carwidth >  68.60
## |   |   |   |--- curbweight <= 3055.50
## |   |   |   |   |--- peakrpm <= 5450.00
## |   |   |   |   |   |--- horsepower <= 137.00
## |   |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |   |   |--- horsepower >  137.00
## |   |   |   |   |   |   |--- value: [19045.00]
## |   |   |   |   |--- peakrpm >  5450.00
## |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |--- curbweight >  3055.50
## |   |   |   |   |--- peakrpm <= 5450.00
## |   |   |   |   |   |--- citympg <= 22.50
## |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |   |--- citympg >  22.50
## |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |--- peakrpm >  5450.00
## |   |   |   |   |   |--- value: [23875.00]
## |--- enginesize >  182.00
## |   |--- highwaympg <= 16.50
## |   |   |--- value: [45400.00]
## |   |--- highwaympg >  16.50
## |   |   |--- fuelsystem_mpfi <= 0.50
## |   |   |   |--- curbweight <= 3760.00
## |   |   |   |   |--- carbody_sedan <= 0.50
## |   |   |   |   |   |--- carheight <= 56.80
## |   |   |   |   |   |   |--- value: [28176.00]
## |   |   |   |   |   |--- carheight >  56.80
## |   |   |   |   |   |   |--- value: [28248.00]
## |   |   |   |   |--- carbody_sedan >  0.50
## |   |   |   |   |   |--- value: [25552.00]
## |   |   |   |--- curbweight >  3760.00
## |   |   |   |   |--- value: [31600.00]
## |   |   |--- fuelsystem_mpfi >  0.50
## |   |   |   |--- carbody_hatchback <= 0.50
## |   |   |   |   |--- curbweight <= 4008.00
## |   |   |   |   |   |--- curbweight <= 2778.00
## |   |   |   |   |   |   |--- value: [33278.00]
## |   |   |   |   |   |--- curbweight >  2778.00
## |   |   |   |   |   |   |--- peakrpm <= 4875.00
## |   |   |   |   |   |   |   |--- carwidth <= 71.10
## |   |   |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |   |   |--- carwidth >  71.10
## |   |   |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |   |--- peakrpm >  4875.00
## |   |   |   |   |   |   |   |--- enginesize <= 267.50
## |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |--- value: [36880.00]
## |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |--- value: [37028.00]
## |   |   |   |   |   |   |   |--- enginesize >  267.50
## |   |   |   |   |   |   |   |   |--- value: [36000.00]
## |   |   |   |   |--- curbweight >  4008.00
## |   |   |   |   |   |--- value: [32250.00]
## |   |   |   |--- carbody_hatchback >  0.50
## |   |   |   |   |--- value: [31400.50]

3.6.2.2 Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos_dummis.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##                 predictor  importancia
## 6              enginesize     0.652109
## 5              curbweight     0.242459
## 10             horsepower     0.036227
## 3                carwidth     0.019139
## 13             highwaympg     0.017630
## 40        fuelsystem_mpfi     0.011611
## 31    cylindernumber_four     0.006195
## 4               carheight     0.003089
## 18      carbody_hatchback     0.001877
## 11                peakrpm     0.001450
## 1               wheelbase     0.001373
## 9        compressionratio     0.001306
## 7               boreratio     0.001258
## 21         drivewheel_fwd     0.001142
## 8                  stroke     0.000766
## 12                citympg     0.000713
## 19          carbody_sedan     0.000550
## 2               carlength     0.000548
## 41        fuelsystem_spdi     0.000211
## 0               symboling     0.000201
## 20          carbody_wagon     0.000048
## 17        carbody_hardtop     0.000038
## 16         doornumber_two     0.000026
## 38         fuelsystem_idi     0.000017
## 22         drivewheel_rwd     0.000016
## 23    enginelocation_rear     0.000000
## 34  cylindernumber_twelve     0.000000
## 14           fueltype_gas     0.000000
## 15       aspiration_turbo     0.000000
## 39         fuelsystem_mfi     0.000000
## 37        fuelsystem_4bbl     0.000000
## 36        fuelsystem_2bbl     0.000000
## 35     cylindernumber_two     0.000000
## 33   cylindernumber_three     0.000000
## 24       enginetype_dohcv     0.000000
## 32     cylindernumber_six     0.000000
## 30    cylindernumber_five     0.000000
## 29       enginetype_rotor     0.000000
## 28        enginetype_ohcv     0.000000
## 27        enginetype_ohcf     0.000000
## 26         enginetype_ohc     0.000000
## 25           enginetype_l     0.000000
## 42        fuelsystem_spfi     0.000000

According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth, and highwaympg.
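To make the ranking concrete, the following minimal sketch accumulates the five largest importances printed in the table above. It is written in plain Python so that it runs without the notebook's objects; the names `top_importances` and `cumulative` are illustrative and do not appear in the original analysis.

```python
# Five largest importances, copied from the table printed above
top_importances = [
    ("enginesize", 0.652109),
    ("curbweight", 0.242459),
    ("horsepower", 0.036227),
    ("carwidth", 0.019139),
    ("highwaympg", 0.017630),
]

# Running total of importance as predictors are added in rank order
cumulative = []
running = 0.0
for name, value in top_importances:
    running += value
    cumulative.append((name, round(running, 6)))

for name, total in cumulative:
    print(f"{name:<12} cumulative importance = {total:.6f}")
```

Read this way, enginesize and curbweight together account for roughly 89% of the total importance in this tree, and the remaining predictors contribute comparatively little.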

3.6.2.3 Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 6918.,  9960., 19699., 13499.,  8845.,  8238., 15690., 21485.,
##        11694., 36880., 36880., 45400.,  6785., 16695.,  6095., 16630.,
##         9639., 13499., 18420., 13950.,  6695.,  8845., 11395., 12290.,
##        18150.,  7198.,  7788., 13950., 17669., 15690.,  9988.,  7775.,
##         5572., 14869., 15690., 21485., 21105., 32250.,  9980.,  7957.,
##         7957.])

3.6.2.4 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 154          0       95.7  ...       7898.0             6918.0
## 147          0       97.0  ...      10198.0             9960.0
## 104          3       91.3  ...      17199.0            19699.0
## 102          0      100.4  ...      14399.0            13499.0
## 61           1       98.8  ...      10595.0             8845.0
## 163          1       94.5  ...       8058.0             8238.0
## 124          3       95.9  ...      12764.0            15690.0
## 7            1      105.8  ...      18920.0            21485.0
## 169          2       98.4  ...       9989.0            11694.0
## 16           0      103.5  ...      41315.0            36880.0
## 15           0      103.5  ...      30760.0            36880.0
## 73           0      120.9  ...      40960.0            45400.0
## 187          2       97.3  ...       9495.0             6785.0
## 109          0      114.2  ...      12440.0            16695.0
## 52           1       93.1  ...       6795.0             6095.0
## 111          0      107.9  ...      15580.0            16630.0
## 80           3       96.3  ...       9959.0             9639.0
## 103          0      100.4  ...      13499.0            13499.0
## 75           1      102.7  ...      16503.0            18420.0
## 10           2      101.2  ...      16430.0            13950.0
## 54           1       93.1  ...       7395.0             6695.0
## 40           0       96.5  ...      10295.0             8845.0
## 57           3       95.3  ...      13645.0            11395.0
## 66           0      104.9  ...      18344.0            12290.0
## 137          2       99.1  ...      18620.0            18150.0
## 161          0       95.7  ...       8358.0             7198.0
## 158          0       95.7  ...       7898.0             7788.0
## 11           0      101.2  ...      16925.0            13950.0
## 171          2       98.4  ...      11549.0            17669.0
## 2            1       94.5  ...      16500.0            15690.0
## 177         -1      102.4  ...      11248.0             9988.0
## 184          2       97.3  ...       7995.0             7775.0
## 21           1       93.7  ...       5572.0             5572.0
## 82           3       95.9  ...      12629.0            14869.0
## 125          3       94.5  ...      22018.0            15690.0
## 6            1      105.8  ...      17710.0            21485.0
## 65           0      104.9  ...      18280.0            21105.0
## 48           0      113.0  ...      35550.0            32250.0
## 42           1       96.5  ...      10345.0             9980.0
## 27           1       93.7  ...       8558.0             7957.0
## 79           1       93.0  ...       7689.0             7957.0
## 
## [41 rows x 45 columns]

3.6.2.5 RMSE of the ar model

rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2777.332173147461

or

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2777.332173147461
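Both calls above compute the same quantity, the square root of the mean squared difference between real and predicted prices. As a hedged illustration, the sketch below reproduces that formula by hand on the first three validation rows (154, 147 and 104) of the comparison table. The helper `rmse` and the variable names are hypothetical and only serve this example.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# First three validation rows from the comparison table (rows 154, 147, 104)
precio_real = [7898.0, 10198.0, 17199.0]
precio_prediccion = [6918.0, 9960.0, 19699.0]

rmse_muestra = rmse(precio_real, precio_prediccion)
print(f"RMSE over three rows: {rmse_muestra:.2f}")
```

Because the errors are squared before averaging, the single large miss on row 104 (2500) dominates the result. Note also that recent scikit-learn releases deprecate the `squared` argument in favour of `root_mean_squared_error`; if the first form fails on a newer installation, the `np.sqrt` form should still work.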

3.6.2.6 Random forest model (RF)

The random forest model (rf) is built with seed 1349 and 20 trees.

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1349)

modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1349)

3.6.2.7 Variable importance

# pending ... ...
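This step is left pending in the original. As a non-authoritative sketch of how it could be filled in, the block below fits a random forest on synthetic stand-in data and ranks its `feature_importances_`, mirroring the table built for the regression tree. The data (`X_demo`, `y_demo`) and the names `modelo_demo` and `importancia_rf` are assumptions made for illustration; they are not part of the original notebook and the numbers they produce say nothing about the car price data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical): y depends strongly on x0, weakly on x1
rng = np.random.default_rng(1349)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x0", "x1", "x2"])
y_demo = 5.0 * X_demo["x0"] + 0.5 * X_demo["x1"] + rng.normal(scale=0.1, size=200)

# Same hyperparameters as the notebook's forest: 20 trees, seed 1349
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1349)
modelo_demo.fit(X_demo, y_demo)

# One importance per column, sorted from most to least important
importancia_rf = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_rf)
```

In the notebook itself, the same pattern would presumably apply with `modelo_rf` in place of `modelo_demo` and `X_entrena.columns` as the predictor names, assuming `X_entrena` is a DataFrame.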

3.6.2.8 Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 7977.        , 10349.65      , 18873.95      , 15997.25      ,
##         8937.61666667,  8474.925     , 15009.65835   , 21553.7       ,
##        10903.4       , 33480.825     , 33529.8       , 40447.45      ,
##         8460.9       , 17086.        ,  6032.7       , 15509.65      ,
##        10494.6       , 16181.1       , 17863.55835   , 10647.675     ,
##         6820.55      ,  8693.9       , 11317.5       , 13087.15      ,
##        17445.55      ,  7971.65      ,  7898.3       , 10385.40833333,
##        14249.9       , 15676.825     , 10461.7       ,  8418.2       ,
##         5810.45      , 14793.05835   , 16371.7       , 21921.7       ,
##        16728.225     , 31958.175     ,  9515.13333333,  8418.225     ,
##         8274.35      ])

3.6.2.9 Comparison table


comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 154          0       95.7  ...       7898.0        7977.000000
## 147          0       97.0  ...      10198.0       10349.650000
## 104          3       91.3  ...      17199.0       18873.950000
## 102          0      100.4  ...      14399.0       15997.250000
## 61           1       98.8  ...      10595.0        8937.616667
## 163          1       94.5  ...       8058.0        8474.925000
## 124          3       95.9  ...      12764.0       15009.658350
## 7            1      105.8  ...      18920.0       21553.700000
## 169          2       98.4  ...       9989.0       10903.400000
## 16           0      103.5  ...      41315.0       33480.825000
## 15           0      103.5  ...      30760.0       33529.800000
## 73           0      120.9  ...      40960.0       40447.450000
## 187          2       97.3  ...       9495.0        8460.900000
## 109          0      114.2  ...      12440.0       17086.000000
## 52           1       93.1  ...       6795.0        6032.700000
## 111          0      107.9  ...      15580.0       15509.650000
## 80           3       96.3  ...       9959.0       10494.600000
## 103          0      100.4  ...      13499.0       16181.100000
## 75           1      102.7  ...      16503.0       17863.558350
## 10           2      101.2  ...      16430.0       10647.675000
## 54           1       93.1  ...       7395.0        6820.550000
## 40           0       96.5  ...      10295.0        8693.900000
## 57           3       95.3  ...      13645.0       11317.500000
## 66           0      104.9  ...      18344.0       13087.150000
## 137          2       99.1  ...      18620.0       17445.550000
## 161          0       95.7  ...       8358.0        7971.650000
## 158          0       95.7  ...       7898.0        7898.300000
## 11           0      101.2  ...      16925.0       10385.408333
## 171          2       98.4  ...      11549.0       14249.900000
## 2            1       94.5  ...      16500.0       15676.825000
## 177         -1      102.4  ...      11248.0       10461.700000
## 184          2       97.3  ...       7995.0        8418.200000
## 21           1       93.7  ...       5572.0        5810.450000
## 82           3       95.9  ...      12629.0       14793.058350
## 125          3       94.5  ...      22018.0       16371.700000
## 6            1      105.8  ...      17710.0       21921.700000
## 65           0      104.9  ...      18280.0       16728.225000
## 48           0      113.0  ...      35550.0       31958.175000
## 42           1       96.5  ...      10345.0        9515.133333
## 27           1       93.7  ...       8558.0        8418.225000
## 79           1       93.0  ...       7689.0        8274.350000
## 
## [41 rows x 45 columns]

3.6.2.10 RMSE of the rf model

rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2776.9562227134757

or

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2776.9562227134757

3.7 Model evaluation

The predictions of the three models are compared.

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 154          0       95.7  ...                6918.0           7977.000000
## 147          0       97.0  ...                9960.0          10349.650000
## 104          3       91.3  ...               19699.0          18873.950000
## 102          0      100.4  ...               13499.0          15997.250000
## 61           1       98.8  ...                8845.0           8937.616667
## 163          1       94.5  ...                8238.0           8474.925000
## 124          3       95.9  ...               15690.0          15009.658350
## 7            1      105.8  ...               21485.0          21553.700000
## 169          2       98.4  ...               11694.0          10903.400000
## 16           0      103.5  ...               36880.0          33480.825000
## 15           0      103.5  ...               36880.0          33529.800000
## 73           0      120.9  ...               45400.0          40447.450000
## 187          2       97.3  ...                6785.0           8460.900000
## 109          0      114.2  ...               16695.0          17086.000000
## 52           1       93.1  ...                6095.0           6032.700000
## 111          0      107.9  ...               16630.0          15509.650000
## 80           3       96.3  ...                9639.0          10494.600000
## 103          0      100.4  ...               13499.0          16181.100000
## 75           1      102.7  ...               18420.0          17863.558350
## 10           2      101.2  ...               13950.0          10647.675000
## 54           1       93.1  ...                6695.0           6820.550000
## 40           0       96.5  ...                8845.0           8693.900000
## 57           3       95.3  ...               11395.0          11317.500000
## 66           0      104.9  ...               12290.0          13087.150000
## 137          2       99.1  ...               18150.0          17445.550000
## 161          0       95.7  ...                7198.0           7971.650000
## 158          0       95.7  ...                7788.0           7898.300000
## 11           0      101.2  ...               13950.0          10385.408333
## 171          2       98.4  ...               17669.0          14249.900000
## 2            1       94.5  ...               15690.0          15676.825000
## 177         -1      102.4  ...                9988.0          10461.700000
## 184          2       97.3  ...                7775.0           8418.200000
## 21           1       93.7  ...                5572.0           5810.450000
## 82           3       95.9  ...               14869.0          14793.058350
## 125          3       94.5  ...               15690.0          16371.700000
## 6            1      105.8  ...               21485.0          21921.700000
## 65           0      104.9  ...               21105.0          16728.225000
## 48           0      113.0  ...               32250.0          31958.175000
## 42           1       96.5  ...                9980.0           9515.133333
## 27           1       93.7  ...                7957.0           8418.225000
## 79           1       93.0  ...                7957.0           8274.350000
## 
## [41 rows x 47 columns]

The RMSE values are compared.

A numpy array is created.

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3362.89627667, 2777.33217315, 2776.95622271]])

A DataFrame is built from the numpy array.


rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  3362.896277  2777.332173  2776.956223
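To read the comparison more directly, the following minimal sketch picks the lowest RMSE and expresses each model's error relative to it. The values are copied from the table printed above; the names `rmse_por_modelo`, `mejor_modelo` and `mejor_rmse` are illustrative and not part of the original notebook.

```python
# RMSE values copied from the table printed above
rmse_por_modelo = {
    "rmse_rm": 3362.896277,
    "rmse_ar": 2777.332173,
    "rmse_rf": 2776.956223,
}

# The model with the smallest RMSE has the lowest validation error
mejor_modelo = min(rmse_por_modelo, key=rmse_por_modelo.get)
mejor_rmse = rmse_por_modelo[mejor_modelo]

# Report each model's RMSE and how far it sits above the best one
for nombre, valor in sorted(rmse_por_modelo.items(), key=lambda kv: kv[1]):
    exceso = (valor - mejor_rmse) / mejor_rmse * 100
    print(f"{nombre}: {valor:.2f} (+{exceso:.2f}% vs. best)")
```

On these figures the random forest has the lowest error, but only by about 0.38 price units over the regression tree, so on a single 41-row validation split the two are practically tied. The multiple regression error is roughly 21% higher.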

4 Interpretation

In this exercise, car price data and several predictor variables were loaded in CSV format from a GitHub link.

The RMSE of the multiple linear regression model is 3362.896277.

The RMSE of the regression tree model is 2777.332173.

The RMSE of the random forest model is 2776.956223.

Training and validation sets were built with 80% and 20% of the data, respectively.