1 Objective

Compare supervised models by applying car-price prediction algorithms and computing the root mean squared error (RMSE) statistic for each.

2 Description

  • The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv

  • Training data are created from 80% of the observations

  • Validation data are created from the remaining 20%

  • A multiple linear regression model is fitted on the training data

    • This model is used to answer questions such as:

      • Which variables are significant predictors at the 90% confidence level or above?

      • What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?

    • Predictions are generated on the validation data

    • The RMSE statistic is computed for comparison purposes

  • A regression tree model is fitted on the training data

    • The importance of each variable for the price is identified

    • The regression tree and its decision rules are visualized

    • Predictions are made on the validation data

    • The RMSE statistic is computed for comparison purposes

  • A random forest model is built on the training data with 20 trees

    • The importance of each variable for the price is identified

    • Predictions are generated on the validation data

    • The RMSE statistic is computed for comparison purposes

  • At the end of the case, a personal interpretation is given

3 Development

3.1 Load libraries

# Data handling
import numpy as np
import pandas as pd

# Plots
import matplotlib.pyplot as plt

# Preprocessing and modeling
from sklearn.model_selection import train_test_split

# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features

# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV

# Random forest
from sklearn.ensemble import RandomForestRegressor

# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

3.2 Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
##      Unnamed: 0  symboling  wheelbase  ...  citympg  highwaympg    price
## 0             1          3       88.6  ...       21          27  13495.0
## 1             2          3       88.6  ...       21          27  16500.0
## 2             3          1       94.5  ...       19          26  16500.0
## 3             4          2       99.8  ...       24          30  13950.0
## 4             5          2       99.4  ...       18          22  17450.0
## ..          ...        ...        ...  ...      ...         ...      ...
## 200         201         -1      109.1  ...       23          28  16845.0
## 201         202         -1      109.1  ...       19          25  19045.0
## 202         203         -1      109.1  ...       18          23  21485.0
## 203         204         -1      109.1  ...       26          27  22470.0
## 204         205         -1      109.1  ...       19          25  22625.0
## 
## [205 rows x 16 columns]

3.3 Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0            int64
## symboling             int64
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginesize            int64
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

3.4 Data dictionary

Col  Name              Description
1    symboling         Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably quite safe. (Categorical)
2    wheelbase         Wheelbase of the car: distance between axles, in inches. (Numeric)
3    carlength         Length of the car. (Numeric)
4    carwidth          Width of the car. (Numeric)
5    carheight         Height of the car. (Numeric)
6    curbweight        Weight of the car without occupants or baggage. (Numeric)
7    enginesize        Size of the engine, in … (Numeric)
8    boreratio         Bore ratio of the car. Engine efficiency. (Numeric)
9    stroke            Stroke or volume inside the engine (pistons, strokes, combustion). (Numeric)
10   compressionratio  Compression ratio of the car: a measure of pressure in the engine. (Numeric)
11   horsepower        Horsepower: power of the car. (Numeric)
12   peakrpm           Peak revolutions per minute of the car. (Numeric)
13   citympg           Mileage in the city, in miles per gallon: fuel consumption. (Numeric)
14   highwaympg        Mileage on the highway, in miles per gallon: fuel consumption. (Numeric)
16   price             Price of the car, in dollars. (Numeric, dependent variable)

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

3.5 Clean data

Keep only the required variables, which drops the index column Unnamed: 0:

‘symboling’, ‘wheelbase’, ‘carlength’, ‘carwidth’, ‘carheight’, ‘curbweight’, ‘enginesize’, ‘boreratio’, ‘stroke’, ‘compressionratio’, ‘horsepower’, ‘peakrpm’, ‘citympg’, ‘highwaympg’, ‘price’


datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
##         symboling   wheelbase   carlength  ...     citympg  highwaympg         price
## count  205.000000  205.000000  205.000000  ...  205.000000  205.000000    205.000000
## mean     0.834146   98.756585  174.049268  ...   25.219512   30.751220  13276.710571
## std      1.245307    6.021776   12.337289  ...    6.542142    6.886443   7988.852332
## min     -2.000000   86.600000  141.100000  ...   13.000000   16.000000   5118.000000
## 25%      0.000000   94.500000  166.300000  ...   19.000000   25.000000   7788.000000
## 50%      1.000000   97.000000  173.200000  ...   24.000000   30.000000  10295.000000
## 75%      2.000000  102.400000  183.100000  ...   30.000000   34.000000  16503.000000
## max      3.000000  120.900000  208.100000  ...   49.000000   54.000000  45400.000000
## 
## [8 rows x 15 columns]
datos
##      symboling  wheelbase  carlength  ...  citympg  highwaympg    price
## 0            3       88.6      168.8  ...       21          27  13495.0
## 1            3       88.6      168.8  ...       21          27  16500.0
## 2            1       94.5      171.2  ...       19          26  16500.0
## 3            2       99.8      176.6  ...       24          30  13950.0
## 4            2       99.4      176.6  ...       18          22  17450.0
## ..         ...        ...        ...  ...      ...         ...      ...
## 200         -1      109.1      188.8  ...       23          28  16845.0
## 201         -1      109.1      188.8  ...       19          25  19045.0
## 202         -1      109.1      188.8  ...       18          23  21485.0
## 203         -1      109.1      188.8  ...       26          27  22470.0
## 204         -1      109.1      188.8  ...       19          25  22625.0
## 
## [205 rows x 15 columns]

3.5.1 Training and validation data

The training set holds 80% of the observations and the validation set the remaining 20%. The random seed is 1307.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80,  random_state = 1307)
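
As a quick check of what an 80/20 split produces, the sketch below applies the same call to a stand-in table of 205 rows. The DataFrame demo and its columns x1, x2 are hypothetical placeholders, not the car data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the car data: 205 rows, two predictors and a target
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=205),
    "x2": rng.normal(size=205),
    "price": rng.normal(size=205),
})

X_tr, X_va, y_tr, y_va = train_test_split(
    demo.drop(columns="price"),
    demo["price"],
    train_size=0.80,
    random_state=1307,  # a fixed seed makes the split reproducible
)

# Row counts of the training and validation parts
print(X_tr.shape, X_va.shape)
```

With 205 rows, the split should yield 164 training rows and 41 validation rows, the same counts reported for X_entrena and X_valida.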

3.5.1.1 Training data

X_entrena
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 109          0      114.2      198.9  ...     5000       19          24
## 121          1       93.7      167.3  ...     5500       31          38
## 9            0       99.5      178.2  ...     5500       16          22
## 35           0       96.5      163.4  ...     6000       30          34
## 25           1       93.7      157.3  ...     5500       31          38
## ..         ...        ...        ...  ...      ...      ...         ...
## 59           1       98.8      177.8  ...     4800       26          32
## 134          3       99.1      186.6  ...     5250       21          28
## 178          3      102.9      183.5  ...     5200       20          24
## 81           3       96.3      173.0  ...     5000       25          32
## 122          1       93.7      167.3  ...     5500       31          38
## 
## [164 rows x 14 columns]

3.5.1.2 Validation data

X_valida
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 12           0      101.2      176.8  ...     4250       21          28
## 65           0      104.9      175.0  ...     5000       19          27
## 167          2       98.4      176.2  ...     4800       24          30
## 5            2       99.8      177.3  ...     5500       19          25
## 34           1       93.7      150.0  ...     6000       30          34
## 199         -1      104.3      188.8  ...     5100       17          22
## 201         -1      109.1      188.8  ...     5300       19          25
## 44           1       94.5      155.9  ...     5400       38          43
## 41           0       96.5      175.4  ...     5800       24          28
## 148          0       96.9      173.6  ...     4800       23          29
## 179          3      102.9      183.5  ...     5200       19          24
## 184          2       97.3      171.7  ...     4800       37          46
## 182          2       97.3      171.7  ...     4800       37          46
## 106          1       99.2      178.5  ...     5200       19          25
## 153          0       95.7      169.7  ...     4800       31          37
## 102          0      100.4      184.6  ...     5200       17          22
## 130          0       96.1      181.5  ...     5100       23          31
## 161          0       95.7      166.3  ...     4800       28          34
## 156          0       95.7      166.3  ...     4800       30          37
## 197         -1      104.3      188.8  ...     5400       24          28
## 13           0      101.2      176.8  ...     4250       21          28
## 61           1       98.8      177.8  ...     4800       26          32
## 149          0       96.9      173.6  ...     4800       23          23
## 124          3       95.9      173.2  ...     5000       19          24
## 98           2       95.1      162.4  ...     5200       31          37
## 16           0      103.5      193.8  ...     5400       16          22
## 60           0       98.8      177.8  ...     4800       26          32
## 139          2       93.7      157.9  ...     4400       26          31
## 2            1       94.5      171.2  ...     5000       19          26
## 142          0       97.2      172.0  ...     4400       28          33
## 7            1      105.8      192.7  ...     5500       19          25
## 55           3       95.3      169.0  ...     6000       17          23
## 172          2       98.4      176.2  ...     4800       24          30
## 73           0      120.9      208.1  ...     4500       14          16
## 32           1       93.7      150.0  ...     5500       38          42
## 158          0       95.7      166.3  ...     4500       34          36
## 15           0      103.5      189.0  ...     5400       16          22
## 49           0      102.0      191.7  ...     5000       13          17
## 19           1       94.5      155.9  ...     5400       38          43
## 93           1       94.5      170.2  ...     5200       31          37
## 112          0      107.9      186.7  ...     4150       28          33
## 
## [41 rows x 14 columns]

3.6 Supervised models

3.6.1 Multiple linear regression model (RM)

Build the multiple linear regression model (rm).

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()

3.6.1.1 Coefficients

Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 3.17186534e+02,  1.32383009e+02, -1.03370273e+02,  6.44126965e+02,
##         1.13974626e+02,  1.43640873e+00,  1.32654173e+02, -2.07214251e+03,
##        -3.90223816e+03,  3.72881265e+02,  2.86621423e+01,  2.33860775e+00,
##        -2.79207643e+02,  1.65303498e+02])
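  • In multiple linear models, an Adjusted R-squared of 0.8347 means that the independent variables explain approximately 83.47% of the variance in the dependent variable, price. Note that modelo_rm.score, called next, returns the unadjusted R-squared on the training data, so the value it prints differs from this figure.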
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8691448544672669
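
The score printed above is the plain R-squared on the training data. As a rough sketch, an adjusted value can be derived from it with the usual formula for a model with an intercept, using the 164 training rows and 14 predictors shown earlier. The variable names r2, n, p and r2_adj are illustrative only.

```python
# Values taken from the text: training R^2, number of training rows, number of predictors
r2 = 0.8691448544672669
n, p = 164, 14

# Adjusted R^2 penalizes R^2 for the number of predictors in the model
r2_adj = 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(r2_adj, 4))
```

Under these assumptions the result comes out near 0.857, slightly below the unadjusted score, as expected when predictors are penalized.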

3.6.1.2 Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# The slice [:-1] leaves out the last prediction, so 40 of the 41 values are printed
print(predicciones_rm[:-1])
## [16626.26023764 15839.87709934 13468.89237861 15760.3524183
##   8704.34670222 15763.85156511 18969.81484685  6241.33875894
##   9892.03503603 10276.63266463 22919.99328956 11192.71102931
##  11188.40180312 23687.97143591  5691.39036531 22785.4695227
##  10073.2670804   6689.99280404  6591.39017885 15835.24716701
##  16705.26271786  9906.21949034  9961.64216746 15239.70264707
##   6804.71130653 27080.00069376  9830.09750074  8570.88999407
##  18580.59324031  8394.0002033  18869.01798138  8367.27261818
##  14207.70480256 41876.84759442  5932.13261809  9696.32662776
##  26944.53898237 49802.38703319  6241.33875894  5647.58429948]

3.6.1.3 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 12           0      101.2  ...      20970.0       16626.260238
## 65           0      104.9  ...      18280.0       15839.877099
## 167          2       98.4  ...       8449.0       13468.892379
## 5            2       99.8  ...      15250.0       15760.352418
## 34           1       93.7  ...       7129.0        8704.346702
## 199         -1      104.3  ...      18950.0       15763.851565
## 201         -1      109.1  ...      19045.0       18969.814847
## 44           1       94.5  ...       8916.5        6241.338759
## 41           0       96.5  ...      12945.0        9892.035036
## 148          0       96.9  ...       8013.0       10276.632665
## 179          3      102.9  ...      15998.0       22919.993290
## 184          2       97.3  ...       7995.0       11192.711029
## 182          2       97.3  ...       7775.0       11188.401803
## 106          1       99.2  ...      18399.0       23687.971436
## 153          0       95.7  ...       6918.0        5691.390365
## 102          0      100.4  ...      14399.0       22785.469523
## 130          0       96.1  ...       9295.0       10073.267080
## 161          0       95.7  ...       8358.0        6689.992804
## 156          0       95.7  ...       6938.0        6591.390179
## 197         -1      104.3  ...      16515.0       15835.247167
## 13           0      101.2  ...      21105.0       16705.262718
## 61           1       98.8  ...      10595.0        9906.219490
## 149          0       96.9  ...      11694.0        9961.642167
## 124          3       95.9  ...      12764.0       15239.702647
## 98           2       95.1  ...       8249.0        6804.711307
## 16           0      103.5  ...      41315.0       27080.000694
## 60           0       98.8  ...       8495.0        9830.097501
## 139          2       93.7  ...       7053.0        8570.889994
## 2            1       94.5  ...      16500.0       18580.593240
## 142          0       97.2  ...       7775.0        8394.000203
## 7            1      105.8  ...      18920.0       18869.017981
## 55           3       95.3  ...      10945.0        8367.272618
## 172          2       98.4  ...      17669.0       14207.704803
## 73           0      120.9  ...      40960.0       41876.847594
## 32           1       93.7  ...       5399.0        5932.132618
## 158          0       95.7  ...       7898.0        9696.326628
## 15           0      103.5  ...      30760.0       26944.538982
## 49           0      102.0  ...      36000.0       49802.387033
## 19           1       94.5  ...       6295.0        6241.338759
## 93           1       94.5  ...       7349.0        5647.584299
## 112          0      107.9  ...      16900.0       18540.628858
## 
## [41 rows x 16 columns]

3.6.1.4 RMSE of the rm model

rmse_rm = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm,
        squared = False
       )
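print(f"Test error (RMSE): {rmse_rm}")
## Test error (RMSE): 4231.747475030332

or, equivalently, taking the square root of the MSE (this form does not rely on the squared argument, which recent scikit-learn releases have deprecated):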

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 4231.747475030332
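
To make the statistic concrete, here is a small sketch that computes RMSE by hand and checks it against scikit-learn. The arrays y_true and y_pred are hypothetical values, not prices from the case.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

# RMSE by hand: square the errors, average them, take the square root
rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Same quantity through scikit-learn's MSE followed by a square root
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_manual, rmse_sklearn)
```

Because RMSE is expressed in the units of the target, the value obtained for the rm model can be read as a typical prediction error in dollars.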

3.6.2 Regression tree model (AR)

Build the regression tree model (ar).

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 1307
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=1307)

3.6.2.1 Visualizing the regression tree

fig, ax = plt.subplots(figsize=(12, 5))

print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 153
plot = plot_tree(
            decision_tree = modelo_ar,
            feature_names = datos.drop(columns = "price").columns,
            class_names   = 'price',
            filled        = True,
            impurity      = False,
            fontsize      = 10,
            precision     = 2,
            ax            = ax
       )

plot
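
A tree with 153 terminal nodes fitted on 164 training rows leaves most leaves with a single car, which makes the plot hard to read. The max_depth argument that is commented out in the model definition limits this growth. A minimal sketch on made-up data (X_demo and y_demo are hypothetical) shows the effect:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: 164 rows, 3 predictors, noisy continuous target
rng = np.random.default_rng(1307)
X_demo = rng.normal(size=(164, 3))
y_demo = 2.0 * X_demo[:, 0] + rng.normal(size=164)

# Unrestricted tree versus a tree capped at depth 3
full = DecisionTreeRegressor(random_state=1307).fit(X_demo, y_demo)
shallow = DecisionTreeRegressor(max_depth=3, random_state=1307).fit(X_demo, y_demo)

print(full.get_depth(), full.get_n_leaves())
print(shallow.get_depth(), shallow.get_n_leaves())
```

A tree capped at depth 3 has at most 8 terminal nodes, so it is easier to plot and interpret, possibly at the cost of a coarser fit.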

Decision rules of the tree

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2542.00
## |   |   |--- curbweight <= 2247.00
## |   |   |   |--- curbweight <= 2072.00
## |   |   |   |   |--- wheelbase <= 94.10
## |   |   |   |   |   |--- stroke <= 3.09
## |   |   |   |   |   |   |--- carlength <= 149.00
## |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |--- carlength >  149.00
## |   |   |   |   |   |   |   |--- value: [5118.00]
## |   |   |   |   |   |--- stroke >  3.09
## |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |--- carlength <= 153.65
## |   |   |   |   |   |   |   |   |--- highwaympg <= 46.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 51.70
## |   |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  51.70
## |   |   |   |   |   |   |   |   |   |   |--- value: [6529.00]
## |   |   |   |   |   |   |   |   |--- highwaympg >  46.00
## |   |   |   |   |   |   |   |   |   |--- value: [6479.00]
## |   |   |   |   |   |   |   |--- carlength >  153.65
## |   |   |   |   |   |   |   |   |--- citympg <= 34.00
## |   |   |   |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- citympg >  34.00
## |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.10
## |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |   |   |--- carwidth >  64.10
## |   |   |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |--- boreratio <= 3.05
## |   |   |   |   |   |   |   |   |--- curbweight <= 1978.00
## |   |   |   |   |   |   |   |   |   |--- carlength <= 162.05
## |   |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  162.05
## |   |   |   |   |   |   |   |   |   |   |--- value: [6695.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1978.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  64.10
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6669.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.05
## |   |   |   |   |   |   |   |   |--- value: [7395.00]
## |   |   |   |   |--- wheelbase >  94.10
## |   |   |   |   |   |--- carlength <= 156.50
## |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |--- carlength >  156.50
## |   |   |   |   |   |   |--- carwidth <= 63.70
## |   |   |   |   |   |   |   |--- curbweight <= 2000.00
## |   |   |   |   |   |   |   |   |--- highwaympg <= 41.00
## |   |   |   |   |   |   |   |   |   |--- value: [5348.00]
## |   |   |   |   |   |   |   |   |--- highwaympg >  41.00
## |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |   |   |--- curbweight >  2000.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2027.50
## |   |   |   |   |   |   |   |   |   |--- value: [6488.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2027.50
## |   |   |   |   |   |   |   |   |   |--- value: [6338.00]
## |   |   |   |   |   |   |--- carwidth >  63.70
## |   |   |   |   |   |   |   |--- curbweight <= 1903.50
## |   |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |   |--- curbweight >  1903.50
## |   |   |   |   |   |   |   |   |--- carheight <= 54.00
## |   |   |   |   |   |   |   |   |   |--- carlength <= 167.90
## |   |   |   |   |   |   |   |   |   |   |--- value: [7799.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  167.90
## |   |   |   |   |   |   |   |   |   |   |--- value: [7999.00]
## |   |   |   |   |   |   |   |   |--- carheight >  54.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1928.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1928.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.44
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.44
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |--- curbweight >  2072.00
## |   |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |   |--- value: [9980.00]
## |   |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |   |--- citympg <= 29.50
## |   |   |   |   |   |   |--- horsepower <= 71.50
## |   |   |   |   |   |   |   |--- wheelbase <= 95.10
## |   |   |   |   |   |   |   |   |--- curbweight <= 2186.50
## |   |   |   |   |   |   |   |   |   |--- value: [8058.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2186.50
## |   |   |   |   |   |   |   |   |   |--- value: [8238.00]
## |   |   |   |   |   |   |   |--- wheelbase >  95.10
## |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |--- horsepower >  71.50
## |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |--- wheelbase <= 93.50
## |   |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7689.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  93.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 33.50
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  64.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7895.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  33.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2210.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7975.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2210.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8195.00]
## |   |   |   |   |   |--- citympg >  29.50
## |   |   |   |   |   |   |--- highwaympg <= 37.50
## |   |   |   |   |   |   |   |--- enginesize <= 103.00
## |   |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |   |   |   |   |   |--- enginesize >  103.00
## |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |   |   |   |   |--- highwaympg >  37.50
## |   |   |   |   |   |   |   |--- wheelbase <= 94.70
## |   |   |   |   |   |   |   |   |--- value: [7609.00]
## |   |   |   |   |   |   |   |--- wheelbase >  94.70
## |   |   |   |   |   |   |   |   |--- value: [7738.00]
## |   |   |--- curbweight >  2247.00
## |   |   |   |--- horsepower <= 100.50
## |   |   |   |   |--- carwidth <= 65.75
## |   |   |   |   |   |--- peakrpm <= 5100.00
## |   |   |   |   |   |   |--- carheight <= 53.90
## |   |   |   |   |   |   |   |--- carheight <= 50.50
## |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |--- carheight >  50.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |   |   |--- citympg <= 26.50
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 4900.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6785.00]
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  4900.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |   |   |   |--- citympg >  26.50
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg <= 39.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7463.00]
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg >  39.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |--- carheight >  53.90
## |   |   |   |   |   |   |   |--- horsepower <= 65.00
## |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |--- horsepower >  65.00
## |   |   |   |   |   |   |   |   |--- horsepower <= 85.00
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 4650.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9495.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  4650.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  85.00
## |   |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |--- peakrpm >  5100.00
## |   |   |   |   |   |   |--- highwaympg <= 30.00
## |   |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |   |   |--- highwaympg >  30.00
## |   |   |   |   |   |   |   |--- curbweight <= 2332.00
## |   |   |   |   |   |   |   |   |--- horsepower <= 98.50
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 109.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  109.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2303.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2303.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- horsepower >  98.50
## |   |   |   |   |   |   |   |   |   |--- value: [9995.00]
## |   |   |   |   |   |   |   |--- curbweight >  2332.00
## |   |   |   |   |   |   |   |   |--- carlength <= 172.75
## |   |   |   |   |   |   |   |   |   |--- value: [9960.00]
## |   |   |   |   |   |   |   |   |--- carlength >  172.75
## |   |   |   |   |   |   |   |   |   |--- carheight <= 53.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [10198.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  53.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |--- carwidth >  65.75
## |   |   |   |   |   |--- curbweight <= 2397.50
## |   |   |   |   |   |   |--- peakrpm <= 5150.00
## |   |   |   |   |   |   |   |--- symboling <= 0.00
## |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |   |   |   |   |   |   |--- symboling >  0.00
## |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |--- peakrpm >  5150.00
## |   |   |   |   |   |   |   |--- value: [10345.00]
## |   |   |   |   |   |--- curbweight >  2397.50
## |   |   |   |   |   |   |--- carwidth <= 66.55
## |   |   |   |   |   |   |   |--- curbweight <= 2419.50
## |   |   |   |   |   |   |   |   |--- carheight <= 54.40
## |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |--- carheight >  54.40
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2412.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [10245.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2412.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |   |--- curbweight >  2419.50
## |   |   |   |   |   |   |   |   |--- horsepower <= 78.50
## |   |   |   |   |   |   |   |   |   |--- carheight <= 55.20
## |   |   |   |   |   |   |   |   |   |   |--- value: [10698.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  55.20
## |   |   |   |   |   |   |   |   |   |   |--- value: [10795.00]
## |   |   |   |   |   |   |   |   |--- horsepower >  78.50
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 4500.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [11248.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  4500.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |--- carwidth >  66.55
## |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |--- horsepower >  100.50
## |   |   |   |   |--- carlength <= 176.40
## |   |   |   |   |   |--- citympg <= 20.00
## |   |   |   |   |   |   |--- horsepower <= 118.00
## |   |   |   |   |   |   |   |--- curbweight <= 2382.50
## |   |   |   |   |   |   |   |   |--- value: [11845.00]
## |   |   |   |   |   |   |   |--- curbweight >  2382.50
## |   |   |   |   |   |   |   |   |--- value: [13645.00]
## |   |   |   |   |   |   |--- horsepower >  118.00
## |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |--- citympg >  20.00
## |   |   |   |   |   |   |--- carheight <= 53.45
## |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2282.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9298.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2282.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9538.00]
## |   |   |   |   |   |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |--- boreratio <= 3.39
## |   |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |   |   |   |--- boreratio >  3.39
## |   |   |   |   |   |   |   |   |   |--- value: [9639.00]
## |   |   |   |   |   |   |--- carheight >  53.45
## |   |   |   |   |   |   |   |--- value: [11259.00]
## |   |   |   |   |--- carlength >  176.40
## |   |   |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |   |   |--- symboling <= 1.00
## |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |   |   |--- symboling >  1.00
## |   |   |   |   |   |   |   |--- value: [16430.00]
## |   |   |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |   |   |--- value: [13950.00]
## |   |--- curbweight >  2542.00
## |   |   |--- carwidth <= 68.65
## |   |   |   |--- horsepower <= 118.50
## |   |   |   |   |--- carwidth <= 65.85
## |   |   |   |   |   |--- curbweight <= 2549.50
## |   |   |   |   |   |   |--- value: [14997.50]
## |   |   |   |   |   |--- curbweight >  2549.50
## |   |   |   |   |   |   |--- enginesize <= 105.50
## |   |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |   |--- enginesize >  105.50
## |   |   |   |   |   |   |   |--- curbweight <= 2615.00
## |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |   |--- curbweight >  2615.00
## |   |   |   |   |   |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |   |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2696.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2696.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11549.00]
## |   |   |   |   |--- carwidth >  65.85
## |   |   |   |   |   |--- curbweight <= 2697.50
## |   |   |   |   |   |   |--- carlength <= 181.65
## |   |   |   |   |   |   |   |--- compressionratio <= 15.75
## |   |   |   |   |   |   |   |   |--- value: [13295.00]
## |   |   |   |   |   |   |   |--- compressionratio >  15.75
## |   |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |--- carlength >  181.65
## |   |   |   |   |   |   |   |--- compressionratio <= 9.31
## |   |   |   |   |   |   |   |   |--- curbweight <= 2629.00
## |   |   |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2629.00
## |   |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |   |--- compressionratio >  9.31
## |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |--- curbweight >  2697.50
## |   |   |   |   |   |   |--- carlength <= 181.60
## |   |   |   |   |   |   |   |--- peakrpm <= 4850.00
## |   |   |   |   |   |   |   |   |--- value: [18344.00]
## |   |   |   |   |   |   |   |--- peakrpm >  4850.00
## |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |   |   |--- carlength >  181.60
## |   |   |   |   |   |   |   |--- curbweight <= 3241.00
## |   |   |   |   |   |   |   |   |--- stroke <= 3.11
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2732.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2732.50
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg <= 26.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15580.00]
## |   |   |   |   |   |   |   |   |   |   |--- highwaympg >  26.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15510.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.11
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3136.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 3054.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  3054.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3136.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 3213.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13200.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  3213.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12440.00]
## |   |   |   |   |   |   |   |--- curbweight >  3241.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 3357.50
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 136.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  136.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [17950.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3357.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |--- horsepower >  118.50
## |   |   |   |   |--- horsepower <= 144.00
## |   |   |   |   |   |--- enginesize <= 142.50
## |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |--- enginesize >  142.50
## |   |   |   |   |   |   |--- curbweight <= 2916.50
## |   |   |   |   |   |   |   |--- value: [22018.00]
## |   |   |   |   |   |   |--- curbweight >  2916.50
## |   |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |--- horsepower >  144.00
## |   |   |   |   |   |--- horsepower <= 158.00
## |   |   |   |   |   |   |--- boreratio <= 3.35
## |   |   |   |   |   |   |   |--- curbweight <= 3141.00
## |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |   |--- curbweight >  3141.00
## |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |   |--- boreratio >  3.35
## |   |   |   |   |   |   |   |--- curbweight <= 2877.00
## |   |   |   |   |   |   |   |   |--- stroke <= 3.88
## |   |   |   |   |   |   |   |   |   |--- value: [12629.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.88
## |   |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |   |--- curbweight >  2877.00
## |   |   |   |   |   |   |   |   |--- carwidth <= 66.40
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14869.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2923.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [14489.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  66.40
## |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |--- horsepower >  158.00
## |   |   |   |   |   |   |--- horsepower <= 187.50
## |   |   |   |   |   |   |   |--- enginesize <= 135.50
## |   |   |   |   |   |   |   |   |--- carheight <= 54.05
## |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |   |   |--- carheight >  54.05
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2827.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2827.50
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5300.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18420.00]
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  5300.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |   |   |   |   |   |--- enginesize >  135.50
## |   |   |   |   |   |   |   |   |--- wheelbase <= 97.00
## |   |   |   |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  97.00
## |   |   |   |   |   |   |   |   |   |--- boreratio <= 3.52
## |   |   |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |   |   |   |--- boreratio >  3.52
## |   |   |   |   |   |   |   |   |   |   |--- value: [16503.00]
## |   |   |   |   |   |   |--- horsepower >  187.50
## |   |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |--- carwidth >  68.65
## |   |   |   |--- curbweight <= 2982.00
## |   |   |   |   |--- horsepower <= 112.00
## |   |   |   |   |   |--- value: [17710.00]
## |   |   |   |   |--- horsepower >  112.00
## |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |--- curbweight >  2982.00
## |   |   |   |   |--- carlength <= 190.75
## |   |   |   |   |   |--- stroke <= 3.01
## |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |   |--- stroke >  3.01
## |   |   |   |   |   |   |--- horsepower <= 110.00
## |   |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |   |   |--- horsepower >  110.00
## |   |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |--- carlength >  190.75
## |   |   |   |   |   |--- value: [23875.00]
## |--- enginesize >  182.00
## |   |--- citympg <= 14.50
## |   |   |--- value: [45400.00]
## |   |--- citympg >  14.50
## |   |   |--- compressionratio <= 9.75
## |   |   |   |--- compressionratio <= 8.05
## |   |   |   |   |--- value: [36880.00]
## |   |   |   |--- compressionratio >  8.05
## |   |   |   |   |--- curbweight <= 2778.00
## |   |   |   |   |   |--- value: [33278.00]
## |   |   |   |   |--- curbweight >  2778.00
## |   |   |   |   |   |--- compressionratio <= 8.90
## |   |   |   |   |   |   |--- wheelbase <= 104.80
## |   |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |   |--- wheelbase >  104.80
## |   |   |   |   |   |   |   |--- carheight <= 54.65
## |   |   |   |   |   |   |   |   |--- value: [33900.00]
## |   |   |   |   |   |   |   |--- carheight >  54.65
## |   |   |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |--- compressionratio >  8.90
## |   |   |   |   |   |   |--- value: [37028.00]
## |   |   |--- compressionratio >  9.75
## |   |   |   |--- carwidth <= 71.00
## |   |   |   |   |--- carheight <= 57.60
## |   |   |   |   |   |--- carheight <= 55.70
## |   |   |   |   |   |   |--- value: [28176.00]
## |   |   |   |   |   |--- carheight >  55.70
## |   |   |   |   |   |   |--- value: [25552.00]
## |   |   |   |   |--- carheight >  57.60
## |   |   |   |   |   |--- value: [28248.00]
## |   |   |   |--- carwidth >  71.00
## |   |   |   |   |--- symboling <= 0.00
## |   |   |   |   |   |--- value: [31600.00]
## |   |   |   |   |--- symboling >  0.00
## |   |   |   |   |   |--- value: [31400.50]

3.6.2.2 Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##            predictor  importancia
## 6         enginesize     0.663625
## 5         curbweight     0.230675
## 10        horsepower     0.032163
## 3           carwidth     0.026297
## 12           citympg     0.020558
## 9   compressionratio     0.011517
## 2          carlength     0.008241
## 4          carheight     0.001527
## 13        highwaympg     0.001454
## 11           peakrpm     0.001428
## 8             stroke     0.001043
## 7          boreratio     0.000709
## 1          wheelbase     0.000708
## 0          symboling     0.000055

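According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth and citympg. The first two dominate: enginesize alone accounts for about 66% of the total importance and curbweight for about 23%.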
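To make the ranking easier to read, the sketch below reuses the importance values printed above and accumulates them in descending order. The names `importancias`, `ordenadas`, `acumulada` and `resumen` are illustrative and not part of the original notebook; the numbers are copied from the table.

```python
import pandas as pd

# Importance values copied from the table printed above (regression tree, modelo_ar)
importancias = pd.Series({
    "enginesize": 0.663625, "curbweight": 0.230675, "horsepower": 0.032163,
    "carwidth": 0.026297, "citympg": 0.020558, "compressionratio": 0.011517,
    "carlength": 0.008241, "carheight": 0.001527, "highwaympg": 0.001454,
    "peakrpm": 0.001428, "stroke": 0.001043, "boreratio": 0.000709,
    "wheelbase": 0.000708, "symboling": 0.000055,
})

# Sort from most to least important and accumulate the shares
ordenadas = importancias.sort_values(ascending=False)
acumulada = ordenadas.cumsum()

resumen = pd.DataFrame({"importancia": ordenadas, "acumulada": acumulada})
print(resumen.head(5))
```

Read this way, enginesize and curbweight together account for roughly 89% of the total importance, and the remaining twelve predictors share the rest.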

3.6.2.3 Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([22018. , 18150. ,  9639. , 16430. ,  6229. , 18420. , 22625. ,
##         8916.5, 11259. ,  9233. , 16558. ,  7898. ,  7898. , 16558. ,
##         7898. , 13499. , 13295. ,  9258. ,  7198. , 15985. , 22018. ,
##         8845. , 11048. , 12629. ,  7799. , 36880. , 10245. ,  7957. ,
##        15690. ,  7895. , 17710. , 11845. , 11549. , 45400. ,  5118. ,
##         7463. , 36880. , 45400. ,  8916.5,  7999. , 17950. ])

3.6.2.4 Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 12           0      101.2  ...      20970.0            22018.0
## 65           0      104.9  ...      18280.0            18150.0
## 167          2       98.4  ...       8449.0             9639.0
## 5            2       99.8  ...      15250.0            16430.0
## 34           1       93.7  ...       7129.0             6229.0
## 199         -1      104.3  ...      18950.0            18420.0
## 201         -1      109.1  ...      19045.0            22625.0
## 44           1       94.5  ...       8916.5             8916.5
## 41           0       96.5  ...      12945.0            11259.0
## 148          0       96.9  ...       8013.0             9233.0
## 179          3      102.9  ...      15998.0            16558.0
## 184          2       97.3  ...       7995.0             7898.0
## 182          2       97.3  ...       7775.0             7898.0
## 106          1       99.2  ...      18399.0            16558.0
## 153          0       95.7  ...       6918.0             7898.0
## 102          0      100.4  ...      14399.0            13499.0
## 130          0       96.1  ...       9295.0            13295.0
## 161          0       95.7  ...       8358.0             9258.0
## 156          0       95.7  ...       6938.0             7198.0
## 197         -1      104.3  ...      16515.0            15985.0
## 13           0      101.2  ...      21105.0            22018.0
## 61           1       98.8  ...      10595.0             8845.0
## 149          0       96.9  ...      11694.0            11048.0
## 124          3       95.9  ...      12764.0            12629.0
## 98           2       95.1  ...       8249.0             7799.0
## 16           0      103.5  ...      41315.0            36880.0
## 60           0       98.8  ...       8495.0            10245.0
## 139          2       93.7  ...       7053.0             7957.0
## 2            1       94.5  ...      16500.0            15690.0
## 142          0       97.2  ...       7775.0             7895.0
## 7            1      105.8  ...      18920.0            17710.0
## 55           3       95.3  ...      10945.0            11845.0
## 172          2       98.4  ...      17669.0            11549.0
## 73           0      120.9  ...      40960.0            45400.0
## 32           1       93.7  ...       5399.0             5118.0
## 158          0       95.7  ...       7898.0             7463.0
## 15           0      103.5  ...      30760.0            36880.0
## 49           0      102.0  ...      36000.0            45400.0
## 19           1       94.5  ...       6295.0             8916.5
## 93           1       94.5  ...       7349.0             7999.0
## 112          0      107.9  ...      16900.0            17950.0
## 
## [41 rows x 16 columns]

3.6.2.5 RMSE of the regression tree model (ar)

rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2554.2590842575455

Or, equivalently, taking the square root of the mean squared error:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2554.2590842575455
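Both calls above compute the same quantity: the square root of the mean of the squared differences between real and predicted prices. As a minimal sketch, the calculation can be written with NumPy alone, which also avoids relying on the `squared` argument of `mean_squared_error` (to my understanding, that argument is deprecated in recent scikit-learn releases). The helper `rmse_manual` is hypothetical, and the sample uses only the first five rows of the comparison table, so its result is not the RMSE of the full validation set.

```python
import numpy as np

# First five rows of the comparison table above (real price vs. tree prediction)
precio_real = np.array([20970.0, 18280.0, 8449.0, 15250.0, 7129.0])
precio_prediccion = np.array([22018.0, 18150.0, 9639.0, 16430.0, 6229.0])

def rmse_manual(y_true, y_pred):
    """Square root of the mean squared error (hypothetical helper, not in the notebook)."""
    errores = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(errores ** 2)))

rmse_muestra = rmse_manual(precio_real, precio_prediccion)
print(f"RMSE on the five-row sample: {rmse_muestra:.2f}")
```

Calling `rmse_manual(Y_valida, predicciones_ar)` in the notebook should reproduce the value reported above, since it applies the same formula to all 41 validation rows.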

3.6.2.6 Random forest model (RF)

The random forest model (rf) is built with seed 1307 and 20 trees.

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1307)

modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=1307)
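For regression, a random forest predicts by averaging the predictions of its individual trees. The sketch below illustrates this on synthetic data generated only for the example (it is not the car price dataset); the names `bosque`, `pred_bosque` and `promedio` are assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data, generated only to illustrate the averaging step
rng = np.random.default_rng(1307)
X = rng.uniform(0, 1, size=(60, 3))
y = 10 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 0.1, size=60)

# Same settings as in the notebook: 20 trees, seed 1307
bosque = RandomForestRegressor(n_estimators=20, random_state=1307)
bosque.fit(X, y)

# Prediction of the forest for the first five rows
pred_bosque = bosque.predict(X[:5])

# Prediction of each fitted tree, then the mean across trees
pred_arboles = np.array([arbol.predict(X[:5]) for arbol in bosque.estimators_])
promedio = pred_arboles.mean(axis=0)

print(np.allclose(pred_bosque, promedio))
```

This averaging is why the forest predictions in the tables are rarely round numbers, while a single tree returns the mean price stored in one leaf.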

3.6.2.7 Variable importance

# pendiente ... ...
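This step is left pending in the notebook. As a sketch of one possible approach, a fitted `RandomForestRegressor` exposes a `feature_importances_` attribute, just like the regression tree, so the same kind of table can be built. The helper `tabla_importancia` and the stand-in object `modelo_demo` are hypothetical, and the importances in the demo are made up purely to show the call.

```python
from types import SimpleNamespace

import numpy as np
import pandas as pd

def tabla_importancia(modelo, nombres):
    """Return predictors sorted by importance (hypothetical helper)."""
    tabla = pd.DataFrame({
        "predictor": list(nombres),
        "importancia": modelo.feature_importances_,
    })
    return tabla.sort_values("importancia", ascending=False).reset_index(drop=True)

# Stand-in object with made-up importances, used only to show the call.
# In the notebook the arguments would be modelo_rf and
# datos.drop(columns="price").columns
modelo_demo = SimpleNamespace(feature_importances_=np.array([0.2, 0.7, 0.1]))
tabla_demo = tabla_importancia(modelo_demo, ["a", "b", "c"])
print(tabla_demo)
```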

3.6.2.8 Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([15658.9       , 14528.1       ,  9792.25      , 13554.05      ,
##         7088.775     , 17404.35      , 19165.05      ,  6651.375     ,
##        12607.15      , 10483.95      , 17038.70835   ,  8804.8       ,
##         8659.8       , 16964.84168333,  8020.85      , 14571.8       ,
##        11591.        ,  8812.9       ,  7527.95      , 14376.5       ,
##        15857.7       ,  9996.6       , 13873.44166667, 13523.7       ,
##         7445.40416667, 34456.18335   , 10117.65      ,  7830.4       ,
##        13542.35      ,  8128.975     , 18734.70835   , 12392.5       ,
##        12919.5       , 39419.5       ,  5877.375     ,  8016.95      ,
##        35589.25835   , 37070.9       ,  6651.375     ,  7783.65      ,
##        16407.7       ])

3.6.2.9 Comparison table


comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 12           0      101.2  ...      20970.0       15658.900000
## 65           0      104.9  ...      18280.0       14528.100000
## 167          2       98.4  ...       8449.0        9792.250000
## 5            2       99.8  ...      15250.0       13554.050000
## 34           1       93.7  ...       7129.0        7088.775000
## 199         -1      104.3  ...      18950.0       17404.350000
## 201         -1      109.1  ...      19045.0       19165.050000
## 44           1       94.5  ...       8916.5        6651.375000
## 41           0       96.5  ...      12945.0       12607.150000
## 148          0       96.9  ...       8013.0       10483.950000
## 179          3      102.9  ...      15998.0       17038.708350
## 184          2       97.3  ...       7995.0        8804.800000
## 182          2       97.3  ...       7775.0        8659.800000
## 106          1       99.2  ...      18399.0       16964.841683
## 153          0       95.7  ...       6918.0        8020.850000
## 102          0      100.4  ...      14399.0       14571.800000
## 130          0       96.1  ...       9295.0       11591.000000
## 161          0       95.7  ...       8358.0        8812.900000
## 156          0       95.7  ...       6938.0        7527.950000
## 197         -1      104.3  ...      16515.0       14376.500000
## 13           0      101.2  ...      21105.0       15857.700000
## 61           1       98.8  ...      10595.0        9996.600000
## 149          0       96.9  ...      11694.0       13873.441667
## 124          3       95.9  ...      12764.0       13523.700000
## 98           2       95.1  ...       8249.0        7445.404167
## 16           0      103.5  ...      41315.0       34456.183350
## 60           0       98.8  ...       8495.0       10117.650000
## 139          2       93.7  ...       7053.0        7830.400000
## 2            1       94.5  ...      16500.0       13542.350000
## 142          0       97.2  ...       7775.0        8128.975000
## 7            1      105.8  ...      18920.0       18734.708350
## 55           3       95.3  ...      10945.0       12392.500000
## 172          2       98.4  ...      17669.0       12919.500000
## 73           0      120.9  ...      40960.0       39419.500000
## 32           1       93.7  ...       5399.0        5877.375000
## 158          0       95.7  ...       7898.0        8016.950000
## 15           0      103.5  ...      30760.0       35589.258350
## 49           0      102.0  ...      36000.0       37070.900000
## 19           1       94.5  ...       6295.0        6651.375000
## 93           1       94.5  ...       7349.0        7783.650000
## 112          0      107.9  ...      16900.0       16407.700000
## 
## [41 rows x 16 columns]

3.6.2.10 RMSE of the random forest model (rf)

rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2328.478373613539

Or, equivalently, taking the square root of the mean squared error:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2328.478373613539

3.7 Model evaluation

The predictions of the three models (multiple regression, regression tree and random forest) are compared against the real price.

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 12           0      101.2  ...               22018.0          15658.900000
## 65           0      104.9  ...               18150.0          14528.100000
## 167          2       98.4  ...                9639.0           9792.250000
## 5            2       99.8  ...               16430.0          13554.050000
## 34           1       93.7  ...                6229.0           7088.775000
## 199         -1      104.3  ...               18420.0          17404.350000
## 201         -1      109.1  ...               22625.0          19165.050000
## 44           1       94.5  ...                8916.5           6651.375000
## 41           0       96.5  ...               11259.0          12607.150000
## 148          0       96.9  ...                9233.0          10483.950000
## 179          3      102.9  ...               16558.0          17038.708350
## 184          2       97.3  ...                7898.0           8804.800000
## 182          2       97.3  ...                7898.0           8659.800000
## 106          1       99.2  ...               16558.0          16964.841683
## 153          0       95.7  ...                7898.0           8020.850000
## 102          0      100.4  ...               13499.0          14571.800000
## 130          0       96.1  ...               13295.0          11591.000000
## 161          0       95.7  ...                9258.0           8812.900000
## 156          0       95.7  ...                7198.0           7527.950000
## 197         -1      104.3  ...               15985.0          14376.500000
## 13           0      101.2  ...               22018.0          15857.700000
## 61           1       98.8  ...                8845.0           9996.600000
## 149          0       96.9  ...               11048.0          13873.441667
## 124          3       95.9  ...               12629.0          13523.700000
## 98           2       95.1  ...                7799.0           7445.404167
## 16           0      103.5  ...               36880.0          34456.183350
## 60           0       98.8  ...               10245.0          10117.650000
## 139          2       93.7  ...                7957.0           7830.400000
## 2            1       94.5  ...               15690.0          13542.350000
## 142          0       97.2  ...                7895.0           8128.975000
## 7            1      105.8  ...               17710.0          18734.708350
## 55           3       95.3  ...               11845.0          12392.500000
## 172          2       98.4  ...               11549.0          12919.500000
## 73           0      120.9  ...               45400.0          39419.500000
## 32           1       93.7  ...                5118.0           5877.375000
## 158          0       95.7  ...                7463.0           8016.950000
## 15           0      103.5  ...               36880.0          35589.258350
## 49           0      102.0  ...               45400.0          37070.900000
## 19           1       94.5  ...                8916.5           6651.375000
## 93           1       94.5  ...                7999.0           7783.650000
## 112          0      107.9  ...               17950.0          16407.700000
## 
## [41 rows x 18 columns]

The RMSE of the three models is compared.

A NumPy array is created with the three values.

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[4231.74747503, 2554.25908426, 2328.47837361]])

A pandas DataFrame is built from the NumPy array.


rmse = pd.DataFrame(rmse)

rmse.columns = ['Regresion multiple', 'Arbol de regresion', 'Bosque aleatorio']
print("RMSE por modelo\n", rmse)
## RMSE por modelo
##     Regresion multiple  Arbol de regresion  Bosque aleatorio
## 0         4231.747475         2554.259084       2328.478374
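To put the three errors on a common scale, the sketch below sorts the models by RMSE and expresses each one as a relative reduction with respect to multiple regression. The values are copied from the output above; the names `rmse_modelos`, `base` and `resumen`, and the column `reduccion_vs_regresion`, are assumptions made for illustration.

```python
import pandas as pd

# RMSE values copied from the output above
rmse_modelos = pd.Series({
    "Regresion multiple": 4231.747475,
    "Arbol de regresion": 2554.259084,
    "Bosque aleatorio": 2328.478374,
})

# Multiple regression is used as the baseline for the comparison
base = rmse_modelos["Regresion multiple"]

resumen = pd.DataFrame({
    "rmse": rmse_modelos,
    "reduccion_vs_regresion": (base - rmse_modelos) / base,
}).sort_values("rmse")

print(resumen.round(4))
```

On these numbers, the tree and the forest reduce the validation RMSE by roughly 40% and 45% relative to multiple regression, while the gap between the two tree-based models is only about 226 price units.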

4 Interpretation

A dataset of automobile prices was loaded, containing only numeric variables as predictors.

Seed 1307 was used.

Variable importance:

enginesize is the most important variable in these models and the one that drives the largest changes in the predictions. The next most important variables, curbweight and carwidth, also have a large effect across the models.

The best RMSE was obtained by the random forest, with a value of 2328.478374. The gap to the next best model, the regression tree (2554.259084), is not very large, although in Python the difference is more pronounced than in R. Compared with the R models, the accuracy of the Python models was lower. As in previous cases, this is because, even when the same type of model is used, the implementation and the train/validation split produced by the seed differ, which leads to different results.

The R visualizations are much better than the Python ones, and I think they make many points of the activity clearer while working through the case.