Objective

Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) as the comparison statistic.

Description

Development

Load libraries

# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
##      Unnamed: 0  symboling  wheelbase  ...  citympg  highwaympg    price
## 0             1          3       88.6  ...       21          27  13495.0
## 1             2          3       88.6  ...       21          27  16500.0
## 2             3          1       94.5  ...       19          26  16500.0
## 3             4          2       99.8  ...       24          30  13950.0
## 4             5          2       99.4  ...       18          22  17450.0
## ..          ...        ...        ...  ...      ...         ...      ...
## 200         201         -1      109.1  ...       23          28  16845.0
## 201         202         -1      109.1  ...       19          25  19045.0
## 202         203         -1      109.1  ...       18          23  21485.0
## 203         204         -1      109.1  ...       26          27  22470.0
## 204         205         -1      109.1  ...       19          25  22625.0
## 
## [205 rows x 16 columns]

Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0            int64
## symboling             int64
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginesize            int64
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

Data dictionary

Col  Name              Description
1    symboling         Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical).
2    wheelbase         Wheelbase of the car (Numeric). Distance between axles, in inches.
3    carlength         Length of the car (Numeric).
4    carwidth          Width of the car (Numeric).
5    carheight         Height of the car (Numeric).
6    curbweight        Weight of the car without occupants or baggage (Numeric).
7    enginesize        Size of the engine (Numeric), in …
8    boreratio         Bore ratio of the car (Numeric). Engine efficiency.
9    stroke            Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion.
10   compressionratio  Compression ratio of the car (Numeric). Measure of compression in the engine.
11   horsepower        Horsepower (Numeric). Power of the car.
12   peakrpm           Peak rpm of the car (Numeric). Peak revolutions per minute.
13   citympg           Mileage in the city (Numeric). Fuel consumption.
14   highwaympg        Mileage on the highway (Numeric). Fuel consumption.
15   price             Dependent variable. Price of the car (Numeric), in dollars.

Source: https://archive.ics.uci.edu/ml/datasets/Automobile

Clean data

Keep only the variables that are needed:

‘symboling’, ‘wheelbase’, ‘carlength’, ‘carwidth’, ‘carheight’, ‘curbweight’, ‘enginesize’, ‘boreratio’, ‘stroke’, ‘compressionratio’, ‘horsepower’, ‘peakrpm’, ‘citympg’, ‘highwaympg’, ‘price’

datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
##         symboling   wheelbase   carlength  ...     citympg  highwaympg         price
## count  205.000000  205.000000  205.000000  ...  205.000000  205.000000    205.000000
## mean     0.834146   98.756585  174.049268  ...   25.219512   30.751220  13276.710571
## std      1.245307    6.021776   12.337289  ...    6.542142    6.886443   7988.852332
## min     -2.000000   86.600000  141.100000  ...   13.000000   16.000000   5118.000000
## 25%      0.000000   94.500000  166.300000  ...   19.000000   25.000000   7788.000000
## 50%      1.000000   97.000000  173.200000  ...   24.000000   30.000000  10295.000000
## 75%      2.000000  102.400000  183.100000  ...   30.000000   34.000000  16503.000000
## max      3.000000  120.900000  208.100000  ...   49.000000   54.000000  45400.000000
## 
## [8 rows x 15 columns]
datos
##      symboling  wheelbase  carlength  ...  citympg  highwaympg    price
## 0            3       88.6      168.8  ...       21          27  13495.0
## 1            3       88.6      168.8  ...       21          27  16500.0
## 2            1       94.5      171.2  ...       19          26  16500.0
## 3            2       99.8      176.6  ...       24          30  13950.0
## 4            2       99.4      176.6  ...       18          22  17450.0
## ..         ...        ...        ...  ...      ...         ...      ...
## 200         -1      109.1      188.8  ...       23          28  16845.0
## 201         -1      109.1      188.8  ...       19          25  19045.0
## 202         -1      109.1      188.8  ...       18          23  21485.0
## 203         -1      109.1      188.8  ...       26          27  22470.0
## 204         -1      109.1      188.8  ...       19          25  22625.0
## 
## [205 rows x 15 columns]

Training and validation data

The training set holds 80% of the observations and the validation set the remaining 20%. Random seed: 1279.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80,  random_state = 1279)
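
As a quick sanity check on the split, the following minimal sketch repeats the same call on a stand-in table with 205 rows and confirms the partition sizes. The frame `demo` and its columns `x1`, `x2` are hypothetical placeholders, not the car data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the car data: 205 rows, two predictors and a target
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.normal(size=205),
    "x2": rng.normal(size=205),
    "price": rng.normal(size=205),
})

X_tr, X_va, y_tr, y_va = train_test_split(
    demo.drop(columns="price"), demo["price"],
    train_size=0.80, random_state=1279,
)

# Row counts of each partition and the share used for training
print(X_tr.shape, X_va.shape)
print(len(X_tr) / len(demo))

# The two partitions must not share any row index
assert set(X_tr.index).isdisjoint(X_va.index)
```

With 205 rows and `train_size = 0.80`, the split yields 164 training rows and 41 validation rows, which matches the shapes of `X_entrena` and `X_valida` in the outputs of this section.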

Training data

X_entrena
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 194         -2      104.3      188.8  ...     5400       23          28
## 132          3       99.1      186.6  ...     5250       21          28
## 84           3       95.9      173.2  ...     5000       19          24
## 90           1       94.5      165.3  ...     4800       45          50
## 4            2       99.4      176.6  ...     5500       18          22
## ..         ...        ...        ...  ...      ...      ...         ...
## 40           0       96.5      175.4  ...     5800       27          33
## 86           1       96.3      172.4  ...     5000       25          32
## 60           0       98.8      177.8  ...     4800       26          32
## 155          0       95.7      169.7  ...     4800       27          32
## 167          2       98.4      176.2  ...     4800       24          30
## 
## [164 rows x 14 columns]

Validation data

X_valida
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 83           3       95.9      173.2  ...     5000       19          24
## 91           1       94.5      165.3  ...     5200       31          37
## 1            3       88.6      168.8  ...     5000       21          27
## 110          0      114.2      198.9  ...     4150       25          25
## 136          3       99.1      186.6  ...     5500       19          26
## 105          3       91.3      170.7  ...     5200       17          23
## 197         -1      104.3      188.8  ...     5400       24          28
## 49           0      102.0      191.7  ...     5000       13          17
## 181         -1      104.5      187.8  ...     5200       19          24
## 140          2       93.3      157.3  ...     4400       26          31
## 198         -2      104.3      188.8  ...     5100       17          22
## 146          0       97.0      173.5  ...     4800       28          32
## 99           0       97.2      173.4  ...     5200       27          34
## 130          0       96.1      181.5  ...     5100       23          31
## 145          0       97.0      172.0  ...     4800       24          29
## 70          -1      115.6      202.6  ...     4350       22          25
## 37           0       96.5      167.5  ...     5800       27          33
## 13           0      101.2      176.8  ...     4250       21          28
## 5            2       99.8      177.3  ...     5500       19          25
## 203         -1      109.1      188.8  ...     4800       26          27
## 131          2       96.1      176.8  ...     5100       23          31
## 48           0      113.0      199.6  ...     4750       15          19
## 174         -1      102.4      175.6  ...     4500       30          33
## 41           0       96.5      175.4  ...     5800       24          28
## 6            1      105.8      192.7  ...     5500       19          25
## 196         -2      104.3      188.8  ...     5400       24          28
## 199         -1      104.3      188.8  ...     5100       17          22
## 63           0       98.8      177.8  ...     4650       36          42
## 191          0      100.4      180.2  ...     5500       19          24
## 184          2       97.3      171.7  ...     4800       37          46
## 149          0       96.9      173.6  ...     4800       23          23
## 62           0       98.8      177.8  ...     4800       26          32
## 152          1       95.7      158.7  ...     4800       31          38
## 109          0      114.2      198.9  ...     5000       19          24
## 68          -1      110.0      190.9  ...     4350       22          25
## 127          3       89.5      168.9  ...     5900       17          25
## 201         -1      109.1      188.8  ...     5300       19          25
## 61           1       98.8      177.8  ...     4800       26          32
## 10           2      101.2      176.8  ...     5800       23          29
## 87           1       96.3      172.4  ...     5500       23          30
## 32           1       93.7      150.0  ...     5500       38          42
## 
## [41 rows x 14 columns]

Supervised models

Multiple linear regression model (RM)

The multiple linear regression model (rm) is built:

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()

Coefficients

Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 1.53154482e+02,  1.11888397e+02, -7.30998253e+01,  6.43581473e+02,
##         1.10897998e+02,  1.03717880e+00,  1.29981002e+02, -1.85797897e+03,
##        -3.53332377e+03,  2.98102781e+02,  2.85527772e+01,  2.38908193e+00,
##        -2.81088636e+02,  1.88537018e+02])
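
The bare coefficient array is hard to read because it carries no variable names. One possible way to label it, sketched here on hypothetical toy data (`X_demo`, `y_demo` and `modelo_demo` are illustrative names, not part of the notebook), is to wrap `coef_` in a pandas Series indexed by the training columns:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data: y depends exactly on two named predictors
rng = np.random.default_rng(42)
X_demo = pd.DataFrame({
    "a": rng.normal(size=50),
    "b": rng.normal(size=50),
})
y_demo = 3.0 * X_demo["a"] - 2.0 * X_demo["b"] + 5.0

modelo_demo = LinearRegression().fit(X_demo, y_demo)

# coef_ follows the column order of the training frame,
# so the column names can serve as the index
coeficientes = pd.Series(modelo_demo.coef_, index=X_demo.columns)
print(coeficientes)
print("Intercept:", modelo_demo.intercept_)
```

The same pattern should carry over to the fitted model here, for instance `pd.Series(modelo_rm.coef_, index=X_entrena.columns)`.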
  • In multiple linear models, an Adjusted R-squared of 0.8347 means that the independent variables explain approximately 83.47% of the variability of the dependent variable, price. The value printed next by modelo_rm.score is the plain (unadjusted) R-squared on the training data.
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8507340734785982
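
Since `score` returns the plain R-squared, an adjusted value can be derived from it with the usual formula \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), where \(n\) is the number of observations and \(p\) the number of predictors. The helper `adjusted_r2` below is a hypothetical function written for illustration. The result need not coincide with the 0.8347 quoted above, whose origin is not shown in this section.

```python
def adjusted_r2(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared from plain R-squared, sample size and predictor count."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Inputs taken from this section: training R-squared, 164 training rows, 14 predictors
print(adjusted_r2(0.8507340734785982, n_obs=164, n_predictors=14))
```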

Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
# Note: the slice [:-1] leaves out the last prediction
print(predicciones_rm[:-1])
## [15160.84661158  6366.79565534 13349.3148034  17386.46742585
##  16136.16698363 24321.10020209 16232.39865749 48725.2932995
##  19737.15751738  8559.22788284 15699.67544259  9193.25186618
##  10376.97681821 10796.86845542 10674.48875487 25427.77293285
##  10209.25853887 16597.24930195 15636.34520052 20502.30050282
##  10866.4598746  31170.42354412 11685.13736124 10286.87474132
##  18948.91335475 15824.09864632 16113.16134776 12669.6279701
##  15764.31318033 10446.34934839  9907.95055504 10287.08588237
##   6323.76138793 14461.27636072 25000.86342357 27375.54611131
##  19260.52703775 10214.69449666 12964.60690436 10733.53208554]

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 83           3       95.9  ...      14869.0       15160.846612
## 91           1       94.5  ...       6649.0        6366.795655
## 1            3       88.6  ...      16500.0       13349.314803
## 110          0      114.2  ...      13860.0       17386.467426
## 136          3       99.1  ...      18150.0       16136.166984
## 105          3       91.3  ...      19699.0       24321.100202
## 197         -1      104.3  ...      16515.0       16232.398657
## 49           0      102.0  ...      36000.0       48725.293299
## 181         -1      104.5  ...      15750.0       19737.157517
## 140          2       93.3  ...       7603.0        8559.227883
## 198         -2      104.3  ...      18420.0       15699.675443
## 146          0       97.0  ...       7463.0        9193.251866
## 99           0       97.2  ...       8949.0       10376.976818
## 130          0       96.1  ...       9295.0       10796.868455
## 145          0       97.0  ...      11259.0       10674.488755
## 70          -1      115.6  ...      31600.0       25427.772933
## 37           0       96.5  ...       7895.0       10209.258539
## 13           0      101.2  ...      21105.0       16597.249302
## 5            2       99.8  ...      15250.0       15636.345201
## 203         -1      109.1  ...      22470.0       20502.300503
## 131          2       96.1  ...       9895.0       10866.459875
## 48           0      113.0  ...      35550.0       31170.423544
## 174         -1      102.4  ...      10698.0       11685.137361
## 41           0       96.5  ...      12945.0       10286.874741
## 6            1      105.8  ...      17710.0       18948.913355
## 196         -2      104.3  ...      15985.0       15824.098646
## 199         -1      104.3  ...      18950.0       16113.161348
## 63           0       98.8  ...      10795.0       12669.627970
## 191          0      100.4  ...      13295.0       15764.313180
## 184          2       97.3  ...       7995.0       10446.349348
## 149          0       96.9  ...      11694.0        9907.950555
## 62           0       98.8  ...      10245.0       10287.085882
## 152          1       95.7  ...       6488.0        6323.761388
## 109          0      114.2  ...      12440.0       14461.276361
## 68          -1      110.0  ...      28248.0       25000.863424
## 127          3       89.5  ...      34028.0       27375.546111
## 201         -1      109.1  ...      19045.0       19260.527038
## 61           1       98.8  ...      10595.0       10214.694497
## 10           2      101.2  ...      16430.0       12964.606904
## 87           1       96.3  ...       9279.0       10733.532086
## 32           1       93.7  ...       5399.0        5756.785192
## 
## [41 rows x 16 columns]
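
To make the gap between actual and predicted prices explicit, a residual column can be added to such a table. The sketch below uses three rows copied from the comparison table. The column names `Error` and `Error_Abs` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Three rows copied from the comparison table (index = original row label)
tabla = pd.DataFrame(
    {
        "Precio_Real": [14869.0, 6649.0, 16500.0],
        "Precio_Prediccion": [15160.846612, 6366.795655, 13349.314803],
    },
    index=[83, 91, 1],
)

# "Error" and "Error_Abs" are illustrative column names
tabla["Error"] = tabla["Precio_Real"] - tabla["Precio_Prediccion"]
tabla["Error_Abs"] = tabla["Error"].abs()
print(tabla)
```

A negative `Error` means the model overestimated the price of that car, and a positive one means it underestimated it.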

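RMSE of the rm model

# Note: recent scikit-learn releases deprecate the `squared` argument;
# the np.sqrt alternative further down works regardless of version
rmse_rm = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm,
        squared = False
       )
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 3274.200409009657

or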

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3274.200409009657
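
Both calls compute the same quantity, \(RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). As a minimal sketch of that definition, the hypothetical helper `rmse` below reproduces it with NumPy alone, applied to three actual/predicted pairs copied from the comparison table:

```python
import numpy as np

def rmse(y_true, y_pred) -> float:
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Three actual/predicted pairs copied from the comparison table
reales = [14869.0, 6649.0, 16500.0]
predichos = [15160.846612, 6366.795655, 13349.314803]
print(rmse(reales, predichos))
```

Because the residuals are squared before averaging, large misses weigh more heavily than small ones, and the result is expressed in the same unit as the price.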

Regression tree model (AR)

The regression tree model (ar) is built:

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 2022
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=2022)

Visualizing the regression tree

fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 16
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 154
plot = plot_tree(
            decision_tree = modelo_ar,
            feature_names = datos.drop(columns = "price").columns,
            class_names   = 'price',
            filled        = True,
            impurity      = False,
            fontsize      = 10,
            precision     = 2,
            ax            = ax
       )
plot
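
A depth of 16 with 154 terminal nodes for 164 training rows means the tree comes close to memorizing individual cars. The commented-out `max_depth` argument in the model definition is one way to restrain that growth. The sketch below, on hypothetical synthetic data (`X_demo`, `y_demo`, `arbol_libre` and `arbol_limitado` are illustrative names), compares an unrestricted tree with one capped at depth 3:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical noisy data: 164 rows (same size as the training set), 3 predictors
rng = np.random.default_rng(2022)
X_demo = rng.uniform(0, 10, size=(164, 3))
y_demo = 2.0 * X_demo[:, 0] + np.sin(X_demo[:, 1]) + rng.normal(scale=1.0, size=164)

# Unrestricted tree versus a tree capped at depth 3
arbol_libre = DecisionTreeRegressor(random_state=2022).fit(X_demo, y_demo)
arbol_limitado = DecisionTreeRegressor(max_depth=3, random_state=2022).fit(X_demo, y_demo)

for nombre, arbol in [("unrestricted", arbol_libre), ("max_depth=3", arbol_limitado)]:
    print(nombre, "depth:", arbol.get_depth(), "leaves:", arbol.get_n_leaves())
```

A tree of depth 3 has at most 8 leaves, so it is also far easier to plot and read. Whether it predicts better on the validation data would have to be checked with the RMSE.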

Decision rules of the tree

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2557.00
## |   |   |--- horsepower <= 83.00
## |   |   |   |--- wheelbase <= 94.40
## |   |   |   |   |--- highwaympg <= 39.50
## |   |   |   |   |   |--- stroke <= 2.50
## |   |   |   |   |   |   |--- value: [5118.00]
## |   |   |   |   |   |--- stroke >  2.50
## |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |--- highwaympg <= 32.50
## |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |--- highwaympg >  32.50
## |   |   |   |   |   |   |   |   |--- wheelbase <= 89.85
## |   |   |   |   |   |   |   |   |   |--- value: [6855.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  89.85
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1902.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1888.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6377.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1888.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6095.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1902.50
## |   |   |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6189.00]
## |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |--- enginesize <= 90.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1978.00
## |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1978.00
## |   |   |   |   |   |   |   |   |   |--- carlength <= 162.30
## |   |   |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |   |   |--- carlength >  162.30
## |   |   |   |   |   |   |   |   |   |   |--- value: [6692.00]
## |   |   |   |   |   |   |   |--- enginesize >  90.50
## |   |   |   |   |   |   |   |   |--- carwidth <= 64.30
## |   |   |   |   |   |   |   |   |   |--- compressionratio <= 9.30
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1947.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6695.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1947.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
## |   |   |   |   |   |   |   |   |   |--- compressionratio >  9.30
## |   |   |   |   |   |   |   |   |   |   |--- value: [7609.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  64.30
## |   |   |   |   |   |   |   |   |   |--- value: [6669.00]
## |   |   |   |   |--- highwaympg >  39.50
## |   |   |   |   |   |--- compressionratio <= 9.55
## |   |   |   |   |   |   |--- horsepower <= 58.00
## |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |--- horsepower >  58.00
## |   |   |   |   |   |   |   |--- enginesize <= 91.00
## |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |--- enginesize >  91.00
## |   |   |   |   |   |   |   |   |--- value: [5389.00]
## |   |   |   |   |   |--- compressionratio >  9.55
## |   |   |   |   |   |   |--- value: [6479.00]
## |   |   |   |--- wheelbase >  94.40
## |   |   |   |   |--- curbweight <= 2115.50
## |   |   |   |   |   |--- carheight <= 54.00
## |   |   |   |   |   |   |--- citympg <= 30.50
## |   |   |   |   |   |   |   |--- carheight <= 52.90
## |   |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |   |   |   |   |   |--- carheight >  52.90
## |   |   |   |   |   |   |   |   |--- value: [6938.00]
## |   |   |   |   |   |   |--- citympg >  30.50
## |   |   |   |   |   |   |   |--- carlength <= 157.35
## |   |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |   |--- value: [7605.75]
## |   |   |   |   |   |   |   |--- carlength >  157.35
## |   |   |   |   |   |   |   |   |--- compressionratio <= 9.50
## |   |   |   |   |   |   |   |   |   |--- carlength <= 164.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  164.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2026.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7349.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2026.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |--- compressionratio >  9.50
## |   |   |   |   |   |   |   |   |   |--- value: [6575.00]
## |   |   |   |   |   |--- carheight >  54.00
## |   |   |   |   |   |   |--- compressionratio <= 9.10
## |   |   |   |   |   |   |   |--- curbweight <= 2012.50
## |   |   |   |   |   |   |   |   |--- value: [5348.00]
## |   |   |   |   |   |   |   |--- curbweight >  2012.50
## |   |   |   |   |   |   |   |   |--- value: [6338.00]
## |   |   |   |   |   |   |--- compressionratio >  9.10
## |   |   |   |   |   |   |   |--- curbweight <= 1913.50
## |   |   |   |   |   |   |   |   |--- value: [5499.00]
## |   |   |   |   |   |   |   |--- curbweight >  1913.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5000.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  5000.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 1990.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  1990.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7295.00]
## |   |   |   |   |--- curbweight >  2115.50
## |   |   |   |   |   |--- curbweight <= 2304.50
## |   |   |   |   |   |   |--- curbweight <= 2142.50
## |   |   |   |   |   |   |   |--- curbweight <= 2131.00
## |   |   |   |   |   |   |   |   |--- value: [8358.00]
## |   |   |   |   |   |   |   |--- curbweight >  2131.00
## |   |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |   |--- curbweight >  2142.50
## |   |   |   |   |   |   |   |--- highwaympg <= 36.50
## |   |   |   |   |   |   |   |   |--- wheelbase <= 95.10
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2186.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8058.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2186.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8238.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  95.10
## |   |   |   |   |   |   |   |   |   |--- carlength <= 170.85
## |   |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  170.85
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |   |   |--- highwaympg >  36.50
## |   |   |   |   |   |   |   |   |--- stroke <= 3.19
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 100.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  100.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7126.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.19
## |   |   |   |   |   |   |   |   |   |--- carwidth <= 64.95
## |   |   |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |   |   |   |   |--- carwidth >  64.95
## |   |   |   |   |   |   |   |   |   |   |--- value: [7775.00]
## |   |   |   |   |   |--- curbweight >  2304.50
## |   |   |   |   |   |   |--- carlength <= 172.80
## |   |   |   |   |   |   |   |--- horsepower <= 75.00
## |   |   |   |   |   |   |   |   |--- value: [9495.00]
## |   |   |   |   |   |   |   |--- horsepower >  75.00
## |   |   |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |   |--- carlength >  172.80
## |   |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |--- horsepower >  83.00
## |   |   |   |--- citympg <= 23.50
## |   |   |   |   |--- stroke <= 3.36
## |   |   |   |   |   |--- curbweight <= 2382.50
## |   |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |   |--- curbweight >  2382.50
## |   |   |   |   |   |   |--- citympg <= 22.00
## |   |   |   |   |   |   |   |--- horsepower <= 123.00
## |   |   |   |   |   |   |   |   |--- citympg <= 19.00
## |   |   |   |   |   |   |   |   |   |--- value: [13645.00]
## |   |   |   |   |   |   |   |   |--- citympg >  19.00
## |   |   |   |   |   |   |   |   |   |--- value: [13495.00]
## |   |   |   |   |   |   |   |--- horsepower >  123.00
## |   |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |   |--- citympg >  22.00
## |   |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |   |--- stroke >  3.36
## |   |   |   |   |   |--- carlength <= 172.70
## |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |--- carlength >  172.70
## |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |--- citympg >  23.50
## |   |   |   |   |--- compressionratio <= 9.70
## |   |   |   |   |   |--- curbweight <= 2216.50
## |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |--- wheelbase <= 93.35
## |   |   |   |   |   |   |   |   |--- value: [7689.00]
## |   |   |   |   |   |   |   |--- wheelbase >  93.35
## |   |   |   |   |   |   |   |   |--- curbweight <= 2210.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 32.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7957.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  32.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7975.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2210.50
## |   |   |   |   |   |   |   |   |   |--- value: [8195.00]
## |   |   |   |   |   |--- curbweight >  2216.50
## |   |   |   |   |   |   |--- horsepower <= 89.00
## |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |--- carlength <= 175.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 56.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [9095.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  56.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |   |   |   |--- carlength >  175.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2417.50
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 63.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  63.85
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2417.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11245.00]
## |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |--- carlength <= 172.70
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5125.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2385.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2385.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8189.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  5125.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |   |   |--- carlength >  172.70
## |   |   |   |   |   |   |   |   |   |--- compressionratio <= 8.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |   |   |--- compressionratio >  8.55
## |   |   |   |   |   |   |   |   |   |   |--- value: [8845.00]
## |   |   |   |   |   |   |--- horsepower >  89.00
## |   |   |   |   |   |   |   |--- carlength <= 162.50
## |   |   |   |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |   |   |   |--- carlength >  162.50
## |   |   |   |   |   |   |   |   |--- compressionratio <= 9.20
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 33.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2456.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 6
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2456.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11248.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  33.00
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 65.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  65.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8948.00]
## |   |   |   |   |   |   |   |   |--- compressionratio >  9.20
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2545.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2538.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2538.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8449.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2545.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |--- compressionratio >  9.70
## |   |   |   |   |   |--- highwaympg <= 31.00
## |   |   |   |   |   |   |--- value: [13950.00]
## |   |   |   |   |   |--- highwaympg >  31.00
## |   |   |   |   |   |   |--- value: [9995.00]
## |   |--- curbweight >  2557.00
## |   |   |--- carwidth <= 68.65
## |   |   |   |--- peakrpm <= 4375.00
## |   |   |   |   |--- citympg <= 23.00
## |   |   |   |   |   |--- citympg <= 20.50
## |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |   |--- citympg >  20.50
## |   |   |   |   |   |   |--- value: [20970.00]
## |   |   |   |   |--- citympg >  23.00
## |   |   |   |   |   |--- compressionratio <= 21.50
## |   |   |   |   |   |   |--- curbweight <= 3224.50
## |   |   |   |   |   |   |   |--- value: [13200.00]
## |   |   |   |   |   |   |--- curbweight >  3224.50
## |   |   |   |   |   |   |   |--- carlength <= 192.80
## |   |   |   |   |   |   |   |   |--- value: [17425.00]
## |   |   |   |   |   |   |   |--- carlength >  192.80
## |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |   |   |--- compressionratio >  21.50
## |   |   |   |   |   |   |--- value: [18344.00]
## |   |   |   |--- peakrpm >  4375.00
## |   |   |   |   |--- highwaympg <= 27.50
## |   |   |   |   |   |--- peakrpm <= 5350.00
## |   |   |   |   |   |   |--- horsepower <= 153.00
## |   |   |   |   |   |   |   |--- stroke <= 3.18
## |   |   |   |   |   |   |   |   |--- enginesize <= 130.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3180.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [15580.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3180.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  130.00
## |   |   |   |   |   |   |   |   |   |--- value: [18280.00]
## |   |   |   |   |   |   |   |--- stroke >  3.18
## |   |   |   |   |   |   |   |   |--- curbweight <= 3067.50
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 138.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [11900.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  138.00
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2879.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2879.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- curbweight >  3067.50
## |   |   |   |   |   |   |   |   |   |--- carlength <= 185.65
## |   |   |   |   |   |   |   |   |   |   |--- carlength <= 183.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |   |   |--- carlength >  183.15
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [14399.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  185.65
## |   |   |   |   |   |   |   |   |   |   |--- value: [16630.00]
## |   |   |   |   |   |   |--- horsepower >  153.00
## |   |   |   |   |   |   |   |--- carheight <= 50.85
## |   |   |   |   |   |   |   |   |--- wheelbase <= 95.25
## |   |   |   |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  95.25
## |   |   |   |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |   |   |   |   |   |--- carheight >  50.85
## |   |   |   |   |   |   |   |   |--- curbweight <= 2996.00
## |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5100.00
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.29
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16503.00]
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.29
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [16500.00]
## |   |   |   |   |   |   |   |   |   |--- peakrpm >  5100.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2996.00
## |   |   |   |   |   |   |   |   |   |--- carheight <= 53.05
## |   |   |   |   |   |   |   |   |   |   |--- value: [15998.00]
## |   |   |   |   |   |   |   |   |   |--- carheight >  53.05
## |   |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |--- peakrpm >  5350.00
## |   |   |   |   |   |   |--- compressionratio <= 9.25
## |   |   |   |   |   |   |   |--- carheight <= 55.15
## |   |   |   |   |   |   |   |   |--- carlength <= 177.40
## |   |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |   |   |   |   |--- carlength >  177.40
## |   |   |   |   |   |   |   |   |   |--- value: [17859.17]
## |   |   |   |   |   |   |   |--- carheight >  55.15
## |   |   |   |   |   |   |   |   |--- wheelbase <= 103.55
## |   |   |   |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |   |   |   |   |   |   |--- wheelbase >  103.55
## |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |--- compressionratio >  9.25
## |   |   |   |   |   |   |   |--- value: [22018.00]
## |   |   |   |   |--- highwaympg >  27.50
## |   |   |   |   |   |--- carwidth <= 65.40
## |   |   |   |   |   |   |--- carwidth <= 64.40
## |   |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |   |--- carwidth >  64.40
## |   |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |   |--- carwidth >  65.40
## |   |   |   |   |   |   |--- curbweight <= 2736.00
## |   |   |   |   |   |   |   |--- boreratio <= 3.10
## |   |   |   |   |   |   |   |   |--- enginesize <= 109.00
## |   |   |   |   |   |   |   |   |   |--- value: [13845.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  109.00
## |   |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.10
## |   |   |   |   |   |   |   |   |--- stroke <= 3.45
## |   |   |   |   |   |   |   |   |   |--- symboling <= 2.50
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 99.75
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  99.75
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [12290.00]
## |   |   |   |   |   |   |   |   |   |--- symboling >  2.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.45
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2696.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2696.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [11549.00]
## |   |   |   |   |   |   |--- curbweight >  2736.00
## |   |   |   |   |   |   |   |--- wheelbase <= 101.70
## |   |   |   |   |   |   |   |   |--- carwidth <= 66.05
## |   |   |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  66.05
## |   |   |   |   |   |   |   |   |   |--- value: [15510.00]
## |   |   |   |   |   |   |   |--- wheelbase >  101.70
## |   |   |   |   |   |   |   |   |--- symboling <= -1.50
## |   |   |   |   |   |   |   |   |   |--- value: [12940.00]
## |   |   |   |   |   |   |   |   |--- symboling >  -1.50
## |   |   |   |   |   |   |   |   |   |--- value: [13415.00]
## |   |   |--- carwidth >  68.65
## |   |   |   |--- curbweight <= 2983.00
## |   |   |   |   |--- carheight <= 55.60
## |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |   |--- carheight >  55.60
## |   |   |   |   |   |--- value: [18920.00]
## |   |   |   |--- curbweight >  2983.00
## |   |   |   |   |--- stroke <= 3.28
## |   |   |   |   |   |--- boreratio <= 3.68
## |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |   |--- boreratio >  3.68
## |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |--- stroke >  3.28
## |   |   |   |   |   |--- value: [23875.00]
## |--- enginesize >  182.00
## |   |--- highwaympg <= 17.00
## |   |   |--- symboling <= 0.50
## |   |   |   |--- value: [40960.00]
## |   |   |--- symboling >  0.50
## |   |   |   |--- value: [45400.00]
## |   |--- highwaympg >  17.00
## |   |   |--- horsepower <= 139.00
## |   |   |   |--- carlength <= 189.20
## |   |   |   |   |--- value: [28176.00]
## |   |   |   |--- carlength >  189.20
## |   |   |   |   |--- value: [25552.00]
## |   |   |--- horsepower >  139.00
## |   |   |   |--- curbweight <= 3373.00
## |   |   |   |   |--- curbweight <= 3015.00
## |   |   |   |   |   |--- curbweight <= 2778.00
## |   |   |   |   |   |   |--- value: [32528.00]
## |   |   |   |   |   |--- curbweight >  2778.00
## |   |   |   |   |   |   |--- value: [37028.00]
## |   |   |   |   |--- curbweight >  3015.00
## |   |   |   |   |   |--- compressionratio <= 9.00
## |   |   |   |   |   |   |--- value: [30760.00]
## |   |   |   |   |   |--- compressionratio >  9.00
## |   |   |   |   |   |   |--- value: [31400.50]
## |   |   |   |--- curbweight >  3373.00
## |   |   |   |   |--- carwidth <= 68.75
## |   |   |   |   |   |--- value: [41315.00]
## |   |   |   |   |--- carwidth >  68.75
## |   |   |   |   |   |--- carlength <= 198.30
## |   |   |   |   |   |   |--- compressionratio <= 8.15
## |   |   |   |   |   |   |   |--- value: [36880.00]
## |   |   |   |   |   |   |--- compressionratio >  8.15
## |   |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |--- carlength >  198.30
## |   |   |   |   |   |   |--- citympg <= 15.50
## |   |   |   |   |   |   |   |--- value: [32250.00]
## |   |   |   |   |   |   |--- citympg >  15.50
## |   |   |   |   |   |   |   |--- value: [34184.00]

Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Predictor importance in the model")
## Predictor importance in the model
importancia_predictores.sort_values('importancia', ascending=False)
##            predictor  importancia
## 6         enginesize     0.670830
## 5         curbweight     0.206807
## 10        horsepower     0.033579
## 13        highwaympg     0.026208
## 3           carwidth     0.018517
## 12           citympg     0.013630
## 11           peakrpm     0.012132
## 8             stroke     0.005338
## 9   compressionratio     0.003770
## 1          wheelbase     0.003268
## 2          carlength     0.002260
## 0          symboling     0.001523
## 4          carheight     0.001100
## 7          boreratio     0.001040

According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, highwaympg, and carwidth.
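
As a small check on that ranking, the sketch below re-sorts the importances printed in the table and accumulates the share of the top predictors. The dictionary is a hand copy of the printed values and the helper name `top_predictores` is hypothetical; neither is part of the original workflow.

```python
# Importances copied by hand from the printed table (illustrative copy, not recomputed).
importancias = {
    "symboling": 0.001523,
    "wheelbase": 0.003268,
    "carlength": 0.002260,
    "carwidth": 0.018517,
    "carheight": 0.001100,
    "curbweight": 0.206807,
    "enginesize": 0.670830,
    "boreratio": 0.001040,
    "stroke": 0.005338,
    "compressionratio": 0.003770,
    "horsepower": 0.033579,
    "peakrpm": 0.012132,
    "citympg": 0.013630,
    "highwaympg": 0.026208,
}


def top_predictores(importancias, k):
    """Return the k (name, importance) pairs with the largest importance."""
    ordenados = sorted(importancias.items(), key=lambda par: par[1], reverse=True)
    return ordenados[:k]


top5 = top_predictores(importancias, 5)

# Running total of importance as predictors are added in ranked order.
acumulado = 0.0
for nombre, valor in top5:
    acumulado += valor
    print(f"{nombre:<12} {valor:.6f}  cumulative: {acumulado:.6f}")
```

In these figures, enginesize and curbweight alone account for roughly 88% of the total importance, which is why the remaining predictors contribute so little to the tree's splits.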

Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([14489.   ,  6849.   , 13495.   , 17075.   , 18620.   , 17199.   ,
##        13415.   , 40960.   , 15690.   ,  7053.   , 15690.   ,  7775.   ,
##         9549.   , 11199.   , 11248.   , 25552.   ,  9095.   , 20970.   ,
##         9959.   , 23875.   ,  9959.   , 32250.   ,  8013.   , 11248.   ,
##        18920.   , 12940.   , 15690.   ,  8013.   , 17859.167,  7775.   ,
##        15580.   ,  8495.   ,  6338.   , 16630.   , 25552.   , 32528.   ,
##        22625.   ,  8845.   , 16925.   ,  9279.   ,  6479.   ])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 83           3       95.9  ...      14869.0          14489.000
## 91           1       94.5  ...       6649.0           6849.000
## 1            3       88.6  ...      16500.0          13495.000
## 110          0      114.2  ...      13860.0          17075.000
## 136          3       99.1  ...      18150.0          18620.000
## 105          3       91.3  ...      19699.0          17199.000
## 197         -1      104.3  ...      16515.0          13415.000
## 49           0      102.0  ...      36000.0          40960.000
## 181         -1      104.5  ...      15750.0          15690.000
## 140          2       93.3  ...       7603.0           7053.000
## 198         -2      104.3  ...      18420.0          15690.000
## 146          0       97.0  ...       7463.0           7775.000
## 99           0       97.2  ...       8949.0           9549.000
## 130          0       96.1  ...       9295.0          11199.000
## 145          0       97.0  ...      11259.0          11248.000
## 70          -1      115.6  ...      31600.0          25552.000
## 37           0       96.5  ...       7895.0           9095.000
## 13           0      101.2  ...      21105.0          20970.000
## 5            2       99.8  ...      15250.0           9959.000
## 203         -1      109.1  ...      22470.0          23875.000
## 131          2       96.1  ...       9895.0           9959.000
## 48           0      113.0  ...      35550.0          32250.000
## 174         -1      102.4  ...      10698.0           8013.000
## 41           0       96.5  ...      12945.0          11248.000
## 6            1      105.8  ...      17710.0          18920.000
## 196         -2      104.3  ...      15985.0          12940.000
## 199         -1      104.3  ...      18950.0          15690.000
## 63           0       98.8  ...      10795.0           8013.000
## 191          0      100.4  ...      13295.0          17859.167
## 184          2       97.3  ...       7995.0           7775.000
## 149          0       96.9  ...      11694.0          15580.000
## 62           0       98.8  ...      10245.0           8495.000
## 152          1       95.7  ...       6488.0           6338.000
## 109          0      114.2  ...      12440.0          16630.000
## 68          -1      110.0  ...      28248.0          25552.000
## 127          3       89.5  ...      34028.0          32528.000
## 201         -1      109.1  ...      19045.0          22625.000
## 61           1       98.8  ...      10595.0           8845.000
## 10           2      101.2  ...      16430.0          16925.000
## 87           1       96.3  ...       9279.0           9279.000
## 32           1       93.7  ...       5399.0           6479.000
## 
## [41 rows x 16 columns]

RMSE of the ar model

rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )
print(f"The test error (RMSE) is: {rmse_ar}")
## The test error (RMSE) is: 2583.235362230902

or, equivalently,

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2583.235362230902
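
Both calls compute the same quantity: the square root of the mean squared difference between actual and predicted prices. As a minimal sketch using only the standard library, the function below (the name `rmse_manual` is hypothetical) applies that definition to the first five rows of the ar comparison table. The result describes only that five-row subset, not the full 41-row validation set. Recent scikit-learn releases deprecate the `squared` argument in favor of a separate `root_mean_squared_error` function, so the `np.sqrt` form is the more portable of the two.

```python
import math


def rmse_manual(reales, predichos):
    """Root mean squared error: sqrt(mean((y - y_hat)^2))."""
    if len(reales) != len(predichos):
        raise ValueError("both sequences must have the same length")
    errores_cuadrados = [(y - y_hat) ** 2 for y, y_hat in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(errores_cuadrados))


# First five rows of the ar comparison table (Precio_Real, Precio_Prediccion).
precio_real = [14869.0, 6649.0, 16500.0, 13860.0, 18150.0]
precio_prediccion = [14489.0, 6849.0, 13495.0, 17075.0, 18620.0]

rmse_subconjunto = rmse_manual(precio_real, precio_prediccion)
print(f"RMSE on the five-row subset: {rmse_subconjunto:.2f}")
```

Because the errors are squared before averaging, the two rows that miss by about 3000 dominate this subset's RMSE, while the rows that miss by a few hundred barely register.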

Random forest model (RF)

The random forest model (rf) is built with seed 2022 and 20 trees.

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 2022)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=2022)
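
A random forest regressor predicts by averaging the predictions of its individual trees. The sketch below illustrates only that averaging step in plain Python; the per-tree values are made up for illustration and are not taken from `modelo_rf`, and the helper name `prediccion_bosque` is hypothetical.

```python
from statistics import mean


def prediccion_bosque(predicciones_por_arbol):
    """Average the per-tree predictions for a single observation."""
    if not predicciones_por_arbol:
        raise ValueError("at least one tree prediction is required")
    return mean(predicciones_por_arbol)


# Hypothetical predictions from four trees for one car (illustrative values only).
arboles = [14000.0, 15000.0, 14500.0, 13500.0]
print(prediccion_bosque(arboles))
```

With a fitted scikit-learn forest, the individual trees are available through the `estimators_` attribute, so the same average can be compared against the output of `modelo_rf.predict`.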

Variable importance

# pending ... ...

Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([14372.25   ,  6870.825  , 12041.25   , 17622.     , 16788.1    ,
##        17087.6    , 13827.05   , 38090.425  , 17047.05   ,  7904.1    ,
##        17001.     ,  8493.55   ,  9293.95   , 12045.45   , 10209.1    ,
##        27342.3    ,  8752.85   , 18803.75   , 14565.55835, 19037.75   ,
##        11802.6    , 35166.3    , 10769.35   , 11089.05   , 20403.95835,
##        13693.45   , 17557.3    , 10204.65   , 15593.4167 ,  8167.5    ,
##        12719.2    ,  9141.1    ,  6567.     , 17678.45   , 25996.95   ,
##        34271.625  , 19811.6    ,  8931.9    , 14121.05   ,  9824.2    ,
##         6062.59375])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 83           3       95.9  ...      14869.0        14372.25000
## 91           1       94.5  ...       6649.0         6870.82500
## 1            3       88.6  ...      16500.0        12041.25000
## 110          0      114.2  ...      13860.0        17622.00000
## 136          3       99.1  ...      18150.0        16788.10000
## 105          3       91.3  ...      19699.0        17087.60000
## 197         -1      104.3  ...      16515.0        13827.05000
## 49           0      102.0  ...      36000.0        38090.42500
## 181         -1      104.5  ...      15750.0        17047.05000
## 140          2       93.3  ...       7603.0         7904.10000
## 198         -2      104.3  ...      18420.0        17001.00000
## 146          0       97.0  ...       7463.0         8493.55000
## 99           0       97.2  ...       8949.0         9293.95000
## 130          0       96.1  ...       9295.0        12045.45000
## 145          0       97.0  ...      11259.0        10209.10000
## 70          -1      115.6  ...      31600.0        27342.30000
## 37           0       96.5  ...       7895.0         8752.85000
## 13           0      101.2  ...      21105.0        18803.75000
## 5            2       99.8  ...      15250.0        14565.55835
## 203         -1      109.1  ...      22470.0        19037.75000
## 131          2       96.1  ...       9895.0        11802.60000
## 48           0      113.0  ...      35550.0        35166.30000
## 174         -1      102.4  ...      10698.0        10769.35000
## 41           0       96.5  ...      12945.0        11089.05000
## 6            1      105.8  ...      17710.0        20403.95835
## 196         -2      104.3  ...      15985.0        13693.45000
## 199         -1      104.3  ...      18950.0        17557.30000
## 63           0       98.8  ...      10795.0        10204.65000
## 191          0      100.4  ...      13295.0        15593.41670
## 184          2       97.3  ...       7995.0         8167.50000
## 149          0       96.9  ...      11694.0        12719.20000
## 62           0       98.8  ...      10245.0         9141.10000
## 152          1       95.7  ...       6488.0         6567.00000
## 109          0      114.2  ...      12440.0        17678.45000
## 68          -1      110.0  ...      28248.0        25996.95000
## 127          3       89.5  ...      34028.0        34271.62500
## 201         -1      109.1  ...      19045.0        19811.60000
## 61           1       98.8  ...      10595.0         8931.90000
## 10           2      101.2  ...      16430.0        14121.05000
## 87           1       96.3  ...       9279.0         9824.20000
## 32           1       93.7  ...       5399.0         6062.59375
## 
## [41 rows x 16 columns]

RMSE of the rf model

rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"The test error (RMSE) is: {rmse_rf}")
## The test error (RMSE) is: 2073.366352389405

or, equivalently,

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2073.366352389405

Model evaluation

The predictions of the three models are compared.

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 83           3       95.9  ...             14489.000           14372.25000
## 91           1       94.5  ...              6849.000            6870.82500
## 1            3       88.6  ...             13495.000           12041.25000
## 110          0      114.2  ...             17075.000           17622.00000
## 136          3       99.1  ...             18620.000           16788.10000
## 105          3       91.3  ...             17199.000           17087.60000
## 197         -1      104.3  ...             13415.000           13827.05000
## 49           0      102.0  ...             40960.000           38090.42500
## 181         -1      104.5  ...             15690.000           17047.05000
## 140          2       93.3  ...              7053.000            7904.10000
## 198         -2      104.3  ...             15690.000           17001.00000
## 146          0       97.0  ...              7775.000            8493.55000
## 99           0       97.2  ...              9549.000            9293.95000
## 130          0       96.1  ...             11199.000           12045.45000
## 145          0       97.0  ...             11248.000           10209.10000
## 70          -1      115.6  ...             25552.000           27342.30000
## 37           0       96.5  ...              9095.000            8752.85000
## 13           0      101.2  ...             20970.000           18803.75000
## 5            2       99.8  ...              9959.000           14565.55835
## 203         -1      109.1  ...             23875.000           19037.75000
## 131          2       96.1  ...              9959.000           11802.60000
## 48           0      113.0  ...             32250.000           35166.30000
## 174         -1      102.4  ...              8013.000           10769.35000
## 41           0       96.5  ...             11248.000           11089.05000
## 6            1      105.8  ...             18920.000           20403.95835
## 196         -2      104.3  ...             12940.000           13693.45000
## 199         -1      104.3  ...             15690.000           17557.30000
## 63           0       98.8  ...              8013.000           10204.65000
## 191          0      100.4  ...             17859.167           15593.41670
## 184          2       97.3  ...              7775.000            8167.50000
## 149          0       96.9  ...             15580.000           12719.20000
## 62           0       98.8  ...              8495.000            9141.10000
## 152          1       95.7  ...              6338.000            6567.00000
## 109          0      114.2  ...             16630.000           17678.45000
## 68          -1      110.0  ...             25552.000           25996.95000
## 127          3       89.5  ...             32528.000           34271.62500
## 201         -1      109.1  ...             22625.000           19811.60000
## 61           1       98.8  ...              8845.000            8931.90000
## 10           2      101.2  ...             16925.000           14121.05000
## 87           1       96.3  ...              9279.000            9824.20000
## 32           1       93.7  ...              6479.000            6062.59375
## 
## [41 rows x 18 columns]

The RMSE values are compared.

A NumPy array is created.

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3274.20040901, 2583.23536223, 2073.36635239]])

A DataFrame is built from the NumPy array.

rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  3274.200409  2583.235362  2073.366352
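
To make the comparison explicit, the sketch below takes the three RMSE values from the table, picks the smallest, and expresses each model's error relative to linear regression. The dictionary is a hand copy of the printed values and the helper name `reduccion_relativa` is hypothetical.

```python
# RMSE values copied by hand from the table above.
rmse_modelos = {
    "rmse_rm": 3274.200409,  # linear regression
    "rmse_ar": 2583.235362,  # regression tree
    "rmse_rf": 2073.366352,  # random forest
}


def reduccion_relativa(base, valor):
    """Fractional reduction of `valor` with respect to `base`."""
    return (base - valor) / base


# The model with the smallest RMSE on the validation set.
mejor = min(rmse_modelos, key=rmse_modelos.get)
print(f"Lowest RMSE: {mejor}")

base = rmse_modelos["rmse_rm"]
for nombre, valor in rmse_modelos.items():
    print(f"{nombre}: {valor:.2f} ({reduccion_relativa(base, valor):.1%} below rmse_rm)")
```

On this validation split the random forest's RMSE is roughly 37% lower than that of linear regression, and the regression tree's is roughly 21% lower. These figures come from a single split, so they indicate a ranking rather than a guaranteed margin.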

Interpretation

The RMSE of the linear regression model is 3274.200409.

The RMSE of the regression tree model is 2583.235362.

The RMSE of the random forest model is 2073.366352.

Based on these RMSE values, the best-performing model in Python for these data with seed 1279 is the random forest, which has the lowest error of the three. It is also the best-performing model when the same data are used with seed 2022.