title: “Case 6. Comparison of regression models. Car price data. Python programming”
author: “Luis Alberto Jimenez Soto”
date: “2022-10-18”
output:
html_document:
code_folding: hide
toc: true
toc_float: true
toc_depth: 6
number_sections: yes

Objective

Compare supervised models by applying prediction algorithms to car prices, using the root mean squared error (RMSE) as the evaluation statistic.

Description

Development

Load libraries

# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns  # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score

Load data

datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
##      Unnamed: 0  symboling  wheelbase  ...  citympg  highwaympg    price
## 0             1          3       88.6  ...       21          27  13495.0
## 1             2          3       88.6  ...       21          27  16500.0
## 2             3          1       94.5  ...       19          26  16500.0
## 3             4          2       99.8  ...       24          30  13950.0
## 4             5          2       99.4  ...       18          22  17450.0
## ..          ...        ...        ...  ...      ...         ...      ...
## 200         201         -1      109.1  ...       23          28  16845.0
## 201         202         -1      109.1  ...       19          25  19045.0
## 202         203         -1      109.1  ...       18          23  21485.0
## 203         204         -1      109.1  ...       26          27  22470.0
## 204         205         -1      109.1  ...       19          25  22625.0
## 
## [205 rows x 16 columns]

Data exploration

print("Observations and variables: ", datos.shape)
## Observations and variables:  (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0            int64
## symboling             int64
## wheelbase           float64
## carlength           float64
## carwidth            float64
## carheight           float64
## curbweight            int64
## enginesize            int64
## boreratio           float64
## stroke              float64
## compressionratio    float64
## horsepower            int64
## peakrpm               int64
## citympg               int64
## highwaympg            int64
## price               float64
## dtype: object

Data dictionary

| Col | Name | Description |
|-----|------|-------------|
| 1 | symboling | Its assigned insurance risk rating. A value of +3 indicates that the auto is risky, -3 that it is probably pretty safe. (Categorical) |
| 2 | wheelbase | Wheelbase of car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of car (Numeric). |
| 4 | carwidth | Width of car (Numeric). |
| 5 | carheight | Height of car (Numeric). |
| 6 | curbweight | The weight of a car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the car's engine (Numeric). Engine size in … |
| 8 | boreratio | Bore ratio of car (Numeric). Engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of car (Numeric). Compression, or a measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Car peak rpm (Numeric). Peak revolutions per minute. |
| 13 | citympg | Mileage in city (Numeric). Fuel consumption. |
| 14 | highwaympg | Mileage on highway (Numeric). Fuel consumption. |
| 16 | price | Dependent variable. Price of car (Numeric). Car price in dollars. |

~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~

Clean data

Keep only the necessary variables:

‘symboling’, ‘wheelbase’, ‘carlength’, ‘carwidth’, ‘carheight’, ‘curbweight’, ‘enginesize’, ‘boreratio’, ‘stroke’, ‘compressionratio’, ‘horsepower’, ‘peakrpm’, ‘citympg’, ‘highwaympg’, ‘price’

datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
##         symboling   wheelbase   carlength  ...     citympg  highwaympg         price
## count  205.000000  205.000000  205.000000  ...  205.000000  205.000000    205.000000
## mean     0.834146   98.756585  174.049268  ...   25.219512   30.751220  13276.710571
## std      1.245307    6.021776   12.337289  ...    6.542142    6.886443   7988.852332
## min     -2.000000   86.600000  141.100000  ...   13.000000   16.000000   5118.000000
## 25%      0.000000   94.500000  166.300000  ...   19.000000   25.000000   7788.000000
## 50%      1.000000   97.000000  173.200000  ...   24.000000   30.000000  10295.000000
## 75%      2.000000  102.400000  183.100000  ...   30.000000   34.000000  16503.000000
## max      3.000000  120.900000  208.100000  ...   49.000000   54.000000  45400.000000
## 
## [8 rows x 15 columns]
datos
##      symboling  wheelbase  carlength  ...  citympg  highwaympg    price
## 0            3       88.6      168.8  ...       21          27  13495.0
## 1            3       88.6      168.8  ...       21          27  16500.0
## 2            1       94.5      171.2  ...       19          26  16500.0
## 3            2       99.8      176.6  ...       24          30  13950.0
## 4            2       99.4      176.6  ...       18          22  17450.0
## ..         ...        ...        ...  ...      ...         ...      ...
## 200         -1      109.1      188.8  ...       23          28  16845.0
## 201         -1      109.1      188.8  ...       19          25  19045.0
## 202         -1      109.1      188.8  ...       18          23  21485.0
## 203         -1      109.1      188.8  ...       26          27  22470.0
## 204         -1      109.1      188.8  ...       19          25  22625.0
## 
## [205 rows x 15 columns]

Training and validation data

The training set holds 80% of the observations and the validation set the remaining 20%. The random seed is 1271.

X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80,  random_state = 1271)
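
As a rough illustration of what the 80/20 split does, the sketch below shuffles row indices with a fixed seed and cuts them at 80%. It uses only the standard library; the helper name split_indices is hypothetical, and its shuffle will not reproduce the exact rows that train_test_split selects with random_state = 1271. Only the resulting sizes are comparable.

```python
import math
import random


def split_indices(n_rows, train_size=0.80, seed=1271):
    """Shuffle row indices with a fixed seed and cut them into two parts."""
    n_train = math.floor(train_size * n_rows)  # rows assigned to training
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    return indices[:n_train], indices[n_train:]


# 205 is the number of rows in the car price data
train_idx, valid_idx = split_indices(205)
print(len(train_idx), len(valid_idx))
```

For the 205 rows of this data set the sketch yields 164 training indices and 41 validation indices, the same sizes reported for X_entrena and X_valida.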

Training data

X_entrena
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 156          0       95.7      166.3  ...     4800       30          37
## 69           0      106.7      187.5  ...     4350       22          25
## 116          0      107.9      186.7  ...     4150       28          33
## 111          0      107.9      186.7  ...     5000       19          24
## 115          0      107.9      186.7  ...     5000       19          24
## ..         ...        ...        ...  ...      ...      ...         ...
## 125          3       94.5      168.9  ...     5500       19          27
## 36           0       96.5      157.1  ...     6000       30          34
## 174         -1      102.4      175.6  ...     4500       30          33
## 27           1       93.7      157.3  ...     5500       24          30
## 56           3       95.3      169.0  ...     6000       17          23
## 
## [164 rows x 14 columns]

Validation data

X_valida
##      symboling  wheelbase  carlength  ...  peakrpm  citympg  highwaympg
## 84           3       95.9      173.2  ...     5000       19          24
## 139          2       93.7      157.9  ...     4400       26          31
## 143          0       97.2      172.0  ...     5200       26          32
## 38           0       96.5      167.5  ...     5800       27          33
## 15           0      103.5      189.0  ...     5400       16          22
## 122          1       93.7      167.3  ...     5500       31          38
## 150          1       95.7      158.7  ...     4800       35          39
## 160          0       95.7      166.3  ...     4800       38          47
## 179          3      102.9      183.5  ...     5200       19          24
## 30           2       86.6      144.6  ...     4800       49          54
## 9            0       99.5      178.2  ...     5500       16          22
## 35           0       96.5      163.4  ...     6000       30          34
## 76           2       93.7      157.3  ...     5500       37          41
## 168          2       98.4      176.2  ...     4800       24          30
## 8            1      105.8      192.7  ...     5500       17          20
## 89           1       94.5      165.3  ...     5200       31          37
## 5            2       99.8      177.3  ...     5500       19          25
## 145          0       97.0      172.0  ...     4800       24          29
## 51           1       93.1      159.1  ...     5000       31          38
## 66           0      104.9      175.0  ...     4200       31          39
## 49           0      102.0      191.7  ...     5000       13          17
## 43           0       94.3      170.7  ...     4800       24          29
## 186          2       97.3      171.7  ...     5250       27          34
## 141          0       97.2      172.0  ...     4800       32          37
## 142          0       97.2      172.0  ...     4400       28          33
## 23           1       93.7      157.3  ...     5500       24          30
## 193          0      100.4      183.1  ...     5500       25          31
## 10           2      101.2      176.8  ...     5800       23          29
## 1            3       88.6      168.8  ...     5000       21          27
## 86           1       96.3      172.4  ...     5000       25          32
## 57           3       95.3      169.0  ...     6000       17          23
## 62           0       98.8      177.8  ...     4800       26          32
## 20           0       94.5      158.8  ...     5400       38          43
## 13           0      101.2      176.8  ...     4250       21          28
## 185          2       97.3      171.7  ...     5250       27          34
## 119          1       93.7      157.3  ...     5500       24          30
## 198         -2      104.3      188.8  ...     5100       17          22
## 165          1       94.5      168.7  ...     6600       26          29
## 196         -2      104.3      188.8  ...     5400       24          28
## 3            2       99.8      176.6  ...     5500       24          30
## 32           1       93.7      150.0  ...     5500       38          42
## 
## [41 rows x 14 columns]

Supervised models

Multiple linear regression model (RM)

Build the multiple linear regression model (rm).

modelo_rm = LinearRegression()
 
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()

Coefficients

Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.

modelo_rm.coef_
## array([ 2.11081298e+02,  1.62441842e+02, -7.45636564e+01,  3.90949747e+02,
##         9.97443007e+01,  6.79302644e-01,  1.38939896e+02, -1.37839030e+03,
##        -3.87041646e+03,  3.28628928e+02,  3.32912969e+01,  2.53731651e+00,
##        -2.07657002e+02,  1.08210062e+02])
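
Each prediction of a linear model is the intercept plus the sum of every coefficient multiplied by its feature value. The sketch below shows that arithmetic with made-up numbers: a hypothetical two-feature model, not the coefficients printed above. The function name predict_linear is also an assumption for illustration.

```python
def predict_linear(intercept, coefs, row):
    """Return intercept + sum(coef_i * x_i) for one observation."""
    return intercept + sum(b * x for b, x in zip(coefs, row))


# Hypothetical values, chosen only to make the arithmetic easy to follow
intercept = 1000.0
coefs = [2.0, -0.5]
row = [100.0, 40.0]

pred = predict_linear(intercept, coefs, row)
print(pred)  # 1000 + 2*100 - 0.5*40 = 1180.0
```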
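  • In multiple linear regression, an Adjusted R-squared of 0.8347 means that the independent variables explain approximately 83.47% of the variance of the dependent variable, price. The score() call below reports the ordinary (unadjusted) R-squared on the training data, so its value differs.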
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8609377759913558
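
The score method of LinearRegression returns the ordinary R-squared. A minimal sketch of how an adjusted value could be derived from it follows, assuming the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), with n the number of training rows and p the number of predictors. The function name adjusted_r2 is hypothetical.

```python
def adjusted_r2(r2, n_rows, n_predictors):
    """Adjust R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_rows - 1) / (n_rows - n_predictors - 1)


# Training R-squared printed above, 164 training rows, 14 predictors
r2_train = 0.8609377759913558
r2_adj = adjusted_r2(r2_train, n_rows=164, n_predictors=14)
print(f"Adjusted R-squared (training): {r2_adj:.4f}")
```

Because the formula penalizes every extra predictor, the adjusted value is never above the ordinary R-squared when the model has at least one predictor.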

Predictions of the rm model

predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1])  # [:-1] leaves out the last of the 41 predictions
## [15299.82250044  8572.83459328 11337.10303479  9970.03560356
##  27836.14524479  7294.84326523  5412.33585414  6021.46270628
##  22751.98267255  2262.18611057 16885.41402655  7950.15614101
##   6545.73015596 13768.61197328 18155.76432809  6282.43971037
##  15678.75014976 10814.12751094  5950.89996771 13797.49363348
##  50223.89310709  6592.59871662  9738.13946724  9249.63962107
##   8663.06939391  8424.0042451  10477.5299584  13567.64320057
##  13740.30019382  9806.25989378  7773.0707937   9821.45440669
##   5933.70180563 17035.70356939  9695.3434007   8424.0042451
##  15753.99075796 12157.06245924 16003.19579602 11674.06257723]

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 84           3       95.9  ...    14489.000       15299.822500
## 139          2       93.7  ...     7053.000        8572.834593
## 143          0       97.2  ...     9960.000       11337.103035
## 38           0       96.5  ...     9095.000        9970.035604
## 15           0      103.5  ...    30760.000       27836.145245
## 122          1       93.7  ...     7609.000        7294.843265
## 150          1       95.7  ...     5348.000        5412.335854
## 160          0       95.7  ...     7738.000        6021.462706
## 179          3      102.9  ...    15998.000       22751.982673
## 30           2       86.6  ...     6479.000        2262.186111
## 9            0       99.5  ...    17859.167       16885.414027
## 35           0       96.5  ...     7295.000        7950.156141
## 76           2       93.7  ...     5389.000        6545.730156
## 168          2       98.4  ...     9639.000       13768.611973
## 8            1      105.8  ...    23875.000       18155.764328
## 89           1       94.5  ...     5499.000        6282.439710
## 5            2       99.8  ...    15250.000       15678.750150
## 145          0       97.0  ...    11259.000       10814.127511
## 51           1       93.1  ...     6095.000        5950.899968
## 66           0      104.9  ...    18344.000       13797.493633
## 49           0      102.0  ...    36000.000       50223.893107
## 43           0       94.3  ...     6785.000        6592.598717
## 186          2       97.3  ...     8495.000        9738.139467
## 141          0       97.2  ...     7126.000        9249.639621
## 142          0       97.2  ...     7775.000        8663.069394
## 23           1       93.7  ...     7957.000        8424.004245
## 193          0      100.4  ...    12290.000       10477.529958
## 10           2      101.2  ...    16430.000       13567.643201
## 1            3       88.6  ...    16500.000       13740.300194
## 86           1       96.3  ...     8189.000        9806.259894
## 57           3       95.3  ...    13645.000        7773.070794
## 62           0       98.8  ...    10245.000        9821.454407
## 20           0       94.5  ...     6575.000        5933.701806
## 13           0      101.2  ...    21105.000       17035.703569
## 185          2       97.3  ...     8195.000        9695.343401
## 119          1       93.7  ...     7957.000        8424.004245
## 198         -2      104.3  ...    18420.000       15753.990758
## 165          1       94.5  ...     9298.000       12157.062459
## 196         -2      104.3  ...    15985.000       16003.195796
## 3            2       99.8  ...    13950.000       11674.062577
## 32           1       93.7  ...     5399.000        5607.114218
## 
## [41 rows x 16 columns]

RMSE of the rm model

rmse_rm = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rm,
        squared = False
       )
print(f"The test error (rmse) is: {rmse_rm}")
## The test error (rmse) is: 3351.4574039517665

or


print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3351.4574039517665
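
Both calls above compute the same quantity: the square root of the mean of the squared differences between real and predicted prices. A minimal sketch of that arithmetic in plain Python follows, using the first three rows of the comparison table as sample values. The function name rmse is hypothetical, and three rows are not the full validation set, so the result is not the model's RMSE. Recent scikit-learn releases deprecate the squared argument of mean_squared_error, so the np.sqrt form may be the more portable of the two.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equal-length sequences."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three rows of the comparison table (Precio_Real, Precio_Prediccion)
real_prices = [14489.0, 7053.0, 9960.0]
predicted_prices = [15299.822500, 8572.834593, 11337.103035]

sample_rmse = rmse(real_prices, predicted_prices)
print(f"RMSE on the three sample rows: {sample_rmse:.2f}")
```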

Regression tree model (AR)

Build the regression tree model (ar).

modelo_ar = DecisionTreeRegressor(
            #max_depth         = 3,
            random_state      = 1271
          )

Train the model

modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1271)

Regression tree visualization

fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 153
plot = plot_tree(
            decision_tree = modelo_ar,
            feature_names = datos.drop(columns = "price").columns,
            class_names   = 'price',
            filled        = True,
            impurity      = False,
            fontsize      = 10,
            precision     = 2,
            ax            = ax
       )
plot
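
The tree above grows to depth 14 with 153 leaves from 164 training rows, so most leaves hold a single car. The commented-out max_depth argument is one way to restrain that growth. The sketch below contrasts an unrestricted tree with a max_depth = 3 tree on synthetic data; the arrays X and y are made up for illustration and are not the car data, although they use the same row count of 164.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: 164 rows, 3 features, noisy linear target (illustration only)
rng = np.random.default_rng(1271)
X = rng.uniform(0, 10, size=(164, 3))
y = 5 * X[:, 0] + rng.normal(0, 2, size=164)

# Same estimator as in the text, with and without a depth limit
deep_tree = DecisionTreeRegressor(random_state=1271).fit(X, y)
shallow_tree = DecisionTreeRegressor(max_depth=3, random_state=1271).fit(X, y)

for name, tree in [("unrestricted", deep_tree), ("max_depth=3", shallow_tree)]:
    print(f"{name}: depth={tree.get_depth()}, leaves={tree.get_n_leaves()}")
```

A tree limited to depth 3 can have at most 8 leaves, which makes the plot readable and usually reduces overfitting, at the cost of coarser predictions.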

Decision rules of the tree

texto_modelo = export_text(
                    decision_tree = modelo_ar,
                    feature_names = list(datos.drop(columns = "price").columns)
               )
print(texto_modelo)
## |--- enginesize <= 182.00
## |   |--- curbweight <= 2659.50
## |   |   |--- curbweight <= 2291.50
## |   |   |   |--- symboling <= 2.50
## |   |   |   |   |--- curbweight <= 2115.50
## |   |   |   |   |   |--- wheelbase <= 94.10
## |   |   |   |   |   |   |--- stroke <= 3.09
## |   |   |   |   |   |   |   |--- citympg <= 39.00
## |   |   |   |   |   |   |   |   |--- value: [5118.00]
## |   |   |   |   |   |   |   |--- citympg >  39.00
## |   |   |   |   |   |   |   |   |--- value: [5151.00]
## |   |   |   |   |   |   |--- stroke >  3.09
## |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |--- highwaympg <= 32.50
## |   |   |   |   |   |   |   |   |   |--- value: [5195.00]
## |   |   |   |   |   |   |   |   |--- highwaympg >  32.50
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 39.50
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 93.40
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  93.40
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  39.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [5572.00]
## |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |--- boreratio <= 3.05
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1978.00
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 63.90
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [6229.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  63.90
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1978.00
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7150.50]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- boreratio >  3.05
## |   |   |   |   |   |   |   |   |   |--- value: [7395.00]
## |   |   |   |   |   |--- wheelbase >  94.10
## |   |   |   |   |   |   |--- carheight <= 54.00
## |   |   |   |   |   |   |   |--- compressionratio <= 9.20
## |   |   |   |   |   |   |   |   |--- carheight <= 52.90
## |   |   |   |   |   |   |   |   |   |--- value: [7198.00]
## |   |   |   |   |   |   |   |   |--- carheight >  52.90
## |   |   |   |   |   |   |   |   |   |--- value: [6938.00]
## |   |   |   |   |   |   |   |--- compressionratio >  9.20
## |   |   |   |   |   |   |   |   |--- symboling <= 0.50
## |   |   |   |   |   |   |   |   |   |--- value: [8916.50]
## |   |   |   |   |   |   |   |   |--- symboling >  0.50
## |   |   |   |   |   |   |   |   |   |--- symboling <= 1.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2026.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2026.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- symboling >  1.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8249.00]
## |   |   |   |   |   |   |--- carheight >  54.00
## |   |   |   |   |   |   |   |--- carwidth <= 63.70
## |   |   |   |   |   |   |   |   |--- curbweight <= 2027.50
## |   |   |   |   |   |   |   |   |   |--- value: [6488.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2027.50
## |   |   |   |   |   |   |   |   |   |--- value: [6338.00]
## |   |   |   |   |   |   |   |--- carwidth >  63.70
## |   |   |   |   |   |   |   |   |--- curbweight <= 1944.50
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 1928.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6649.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  1928.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [6849.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  1944.50
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 62.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [7099.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  62.00
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase <= 95.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- wheelbase >  95.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7295.00]
## |   |   |   |   |--- curbweight >  2115.50
## |   |   |   |   |   |--- curbweight <= 2142.50
## |   |   |   |   |   |   |--- curbweight <= 2131.00
## |   |   |   |   |   |   |   |--- value: [8358.00]
## |   |   |   |   |   |   |--- curbweight >  2131.00
## |   |   |   |   |   |   |   |--- value: [9258.00]
## |   |   |   |   |   |--- curbweight >  2142.50
## |   |   |   |   |   |   |--- curbweight <= 2277.50
## |   |   |   |   |   |   |   |--- carheight <= 50.70
## |   |   |   |   |   |   |   |   |--- value: [8558.00]
## |   |   |   |   |   |   |   |--- carheight >  50.70
## |   |   |   |   |   |   |   |   |--- carwidth <= 63.90
## |   |   |   |   |   |   |   |   |   |--- stroke <= 3.02
## |   |   |   |   |   |   |   |   |   |   |--- value: [7603.00]
## |   |   |   |   |   |   |   |   |   |--- stroke >  3.02
## |   |   |   |   |   |   |   |   |   |   |--- value: [7689.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  63.90
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2206.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2186.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8058.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2186.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8238.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2206.50
## |   |   |   |   |   |   |   |   |   |   |--- citympg <= 37.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 4
## |   |   |   |   |   |   |   |   |   |   |--- citympg >  37.50
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [7788.00]
## |   |   |   |   |   |   |--- curbweight >  2277.50
## |   |   |   |   |   |   |   |--- highwaympg <= 34.50
## |   |   |   |   |   |   |   |   |--- carlength <= 171.60
## |   |   |   |   |   |   |   |   |   |--- value: [7898.00]
## |   |   |   |   |   |   |   |   |--- carlength >  171.60
## |   |   |   |   |   |   |   |   |   |--- value: [7463.00]
## |   |   |   |   |   |   |   |--- highwaympg >  34.50
## |   |   |   |   |   |   |   |   |--- value: [6918.00]
## |   |   |   |--- symboling >  2.50
## |   |   |   |   |--- carlength <= 162.50
## |   |   |   |   |   |--- value: [11595.00]
## |   |   |   |   |--- carlength >  162.50
## |   |   |   |   |   |--- value: [9980.00]
## |   |   |--- curbweight >  2291.50
## |   |   |   |--- highwaympg <= 29.50
## |   |   |   |   |--- horsepower <= 91.50
## |   |   |   |   |   |--- highwaympg <= 27.00
## |   |   |   |   |   |   |--- value: [9233.00]
## |   |   |   |   |   |--- highwaympg >  27.00
## |   |   |   |   |   |   |--- value: [8013.00]
## |   |   |   |   |--- horsepower >  91.50
## |   |   |   |   |   |--- wheelbase <= 100.15
## |   |   |   |   |   |   |--- citympg <= 16.50
## |   |   |   |   |   |   |   |--- value: [15645.00]
## |   |   |   |   |   |   |--- citympg >  16.50
## |   |   |   |   |   |   |   |--- peakrpm <= 6300.00
## |   |   |   |   |   |   |   |   |--- carwidth <= 65.30
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2506.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [12945.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2506.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [13495.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  65.30
## |   |   |   |   |   |   |   |   |   |--- carlength <= 171.30
## |   |   |   |   |   |   |   |   |   |   |--- value: [11395.00]
## |   |   |   |   |   |   |   |   |   |--- carlength >  171.30
## |   |   |   |   |   |   |   |   |   |   |--- compressionratio <= 8.51
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11694.00]
## |   |   |   |   |   |   |   |   |   |   |--- compressionratio >  8.51
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [11850.00]
## |   |   |   |   |   |   |   |--- peakrpm >  6300.00
## |   |   |   |   |   |   |   |   |--- value: [9538.00]
## |   |   |   |   |   |--- wheelbase >  100.15
## |   |   |   |   |   |   |--- value: [16925.00]
## |   |   |   |--- highwaympg >  29.50
## |   |   |   |   |--- carwidth <= 66.75
## |   |   |   |   |   |--- compressionratio <= 8.55
## |   |   |   |   |   |   |--- boreratio <= 3.34
## |   |   |   |   |   |   |   |--- symboling <= 2.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 2313.00
## |   |   |   |   |   |   |   |   |   |--- value: [9549.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2313.00
## |   |   |   |   |   |   |   |   |   |--- compressionratio <= 8.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [9279.00]
## |   |   |   |   |   |   |   |   |   |--- compressionratio >  8.00
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 57.25
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8949.00]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  57.25
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |   |   |--- symboling >  2.00
## |   |   |   |   |   |   |   |   |--- value: [9959.00]
## |   |   |   |   |   |   |--- boreratio >  3.34
## |   |   |   |   |   |   |   |--- carlength <= 172.70
## |   |   |   |   |   |   |   |   |--- value: [6989.00]
## |   |   |   |   |   |   |   |--- carlength >  172.70
## |   |   |   |   |   |   |   |   |--- curbweight <= 2431.50
## |   |   |   |   |   |   |   |   |   |--- value: [8499.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2431.50
## |   |   |   |   |   |   |   |   |   |--- value: [8921.00]
## |   |   |   |   |   |--- compressionratio >  8.55
## |   |   |   |   |   |   |--- curbweight <= 2412.00
## |   |   |   |   |   |   |   |--- curbweight <= 2397.50
## |   |   |   |   |   |   |   |   |--- curbweight <= 2302.00
## |   |   |   |   |   |   |   |   |   |--- enginesize <= 109.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [9995.00]
## |   |   |   |   |   |   |   |   |   |--- enginesize >  109.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [10345.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2302.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2349.00
## |   |   |   |   |   |   |   |   |   |   |--- stroke <= 3.47
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9495.00]
## |   |   |   |   |   |   |   |   |   |   |--- stroke >  3.47
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2349.00
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm <= 5300.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9720.00]
## |   |   |   |   |   |   |   |   |   |   |--- peakrpm >  5300.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [10295.00]
## |   |   |   |   |   |   |   |--- curbweight >  2397.50
## |   |   |   |   |   |   |   |   |--- value: [8495.00]
## |   |   |   |   |   |   |--- curbweight >  2412.00
## |   |   |   |   |   |   |   |--- citympg <= 24.50
## |   |   |   |   |   |   |   |   |--- carwidth <= 66.55
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2545.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [8449.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2545.50
## |   |   |   |   |   |   |   |   |   |   |--- curbweight <= 2565.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9989.00]
## |   |   |   |   |   |   |   |   |   |   |--- curbweight >  2565.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9295.00]
## |   |   |   |   |   |   |   |   |--- carwidth >  66.55
## |   |   |   |   |   |   |   |   |   |--- value: [9895.00]
## |   |   |   |   |   |   |   |--- citympg >  24.50
## |   |   |   |   |   |   |   |   |--- stroke <= 3.00
## |   |   |   |   |   |   |   |   |   |--- value: [10198.00]
## |   |   |   |   |   |   |   |   |--- stroke >  3.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2419.50
## |   |   |   |   |   |   |   |   |   |   |--- carheight <= 54.40
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [9988.00]
## |   |   |   |   |   |   |   |   |   |   |--- carheight >  54.40
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [10898.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2419.50
## |   |   |   |   |   |   |   |   |   |   |--- horsepower <= 78.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |   |   |--- horsepower >  78.50
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |--- carwidth >  66.75
## |   |   |   |   |   |--- value: [13845.00]
## |   |--- curbweight >  2659.50
## |   |   |--- carwidth <= 68.60
## |   |   |   |--- horsepower <= 118.00
## |   |   |   |   |--- horsepower <= 92.50
## |   |   |   |   |   |--- enginesize <= 105.50
## |   |   |   |   |   |   |--- value: [8778.00]
## |   |   |   |   |   |--- enginesize >  105.50
## |   |   |   |   |   |   |--- value: [11048.00]
## |   |   |   |   |--- horsepower >  92.50
## |   |   |   |   |   |--- curbweight <= 2736.00
## |   |   |   |   |   |   |--- boreratio <= 3.37
## |   |   |   |   |   |   |   |--- peakrpm <= 5375.00
## |   |   |   |   |   |   |   |   |--- value: [15040.00]
## |   |   |   |   |   |   |   |--- peakrpm >  5375.00
## |   |   |   |   |   |   |   |   |--- value: [13295.00]
## |   |   |   |   |   |   |--- boreratio >  3.37
## |   |   |   |   |   |   |   |--- carwidth <= 66.05
## |   |   |   |   |   |   |   |   |--- curbweight <= 2696.50
## |   |   |   |   |   |   |   |   |   |--- value: [11199.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  2696.50
## |   |   |   |   |   |   |   |   |   |--- value: [11549.00]
## |   |   |   |   |   |   |   |--- carwidth >  66.05
## |   |   |   |   |   |   |   |   |--- value: [12170.00]
## |   |   |   |   |   |--- curbweight >  2736.00
## |   |   |   |   |   |   |--- carwidth <= 66.45
## |   |   |   |   |   |   |   |--- boreratio <= 3.40
## |   |   |   |   |   |   |   |   |--- value: [17450.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.40
## |   |   |   |   |   |   |   |   |--- value: [17669.00]
## |   |   |   |   |   |   |--- carwidth >  66.45
## |   |   |   |   |   |   |   |--- curbweight <= 3241.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 3136.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3038.00
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 66.85
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15510.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  66.85
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 3
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3038.00
## |   |   |   |   |   |   |   |   |   |   |--- horsepower <= 96.00
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [15580.00]
## |   |   |   |   |   |   |   |   |   |   |--- horsepower >  96.00
## |   |   |   |   |   |   |   |   |   |   |   |--- truncated branch of depth 2
## |   |   |   |   |   |   |   |   |--- curbweight >  3136.00
## |   |   |   |   |   |   |   |   |   |--- horsepower <= 96.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [13200.00]
## |   |   |   |   |   |   |   |   |   |--- horsepower >  96.00
## |   |   |   |   |   |   |   |   |   |   |--- value: [12440.00]
## |   |   |   |   |   |   |   |--- curbweight >  3241.00
## |   |   |   |   |   |   |   |   |--- carheight <= 57.70
## |   |   |   |   |   |   |   |   |   |--- highwaympg <= 28.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [16695.00]
## |   |   |   |   |   |   |   |   |   |--- highwaympg >  28.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [17425.00]
## |   |   |   |   |   |   |   |   |--- carheight >  57.70
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 3457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [13860.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  3457.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [17075.00]
## |   |   |   |--- horsepower >  118.00
## |   |   |   |   |--- stroke <= 3.24
## |   |   |   |   |   |--- enginesize <= 145.50
## |   |   |   |   |   |   |--- boreratio <= 3.77
## |   |   |   |   |   |   |   |--- carlength <= 187.75
## |   |   |   |   |   |   |   |   |--- peakrpm <= 5550.00
## |   |   |   |   |   |   |   |   |   |--- curbweight <= 2827.50
## |   |   |   |   |   |   |   |   |   |   |--- carwidth <= 66.30
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18280.00]
## |   |   |   |   |   |   |   |   |   |   |--- carwidth >  66.30
## |   |   |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |   |   |--- curbweight >  2827.50
## |   |   |   |   |   |   |   |   |   |   |--- value: [18620.00]
## |   |   |   |   |   |   |   |   |--- peakrpm >  5550.00
## |   |   |   |   |   |   |   |   |   |--- value: [18150.00]
## |   |   |   |   |   |   |   |--- carlength >  187.75
## |   |   |   |   |   |   |   |   |--- value: [18950.00]
## |   |   |   |   |   |   |--- boreratio >  3.77
## |   |   |   |   |   |   |   |--- value: [16503.00]
## |   |   |   |   |   |--- enginesize >  145.50
## |   |   |   |   |   |   |--- highwaympg <= 26.00
## |   |   |   |   |   |   |   |--- value: [24565.00]
## |   |   |   |   |   |   |--- highwaympg >  26.00
## |   |   |   |   |   |   |   |--- enginesize <= 157.50
## |   |   |   |   |   |   |   |   |--- value: [22018.00]
## |   |   |   |   |   |   |   |--- enginesize >  157.50
## |   |   |   |   |   |   |   |   |--- value: [20970.00]
## |   |   |   |   |--- stroke >  3.24
## |   |   |   |   |   |--- horsepower <= 153.00
## |   |   |   |   |   |   |--- curbweight <= 2877.00
## |   |   |   |   |   |   |   |--- boreratio <= 3.59
## |   |   |   |   |   |   |   |   |--- boreratio <= 3.58
## |   |   |   |   |   |   |   |   |   |--- value: [12629.00]
## |   |   |   |   |   |   |   |   |--- boreratio >  3.58
## |   |   |   |   |   |   |   |   |   |--- value: [12764.00]
## |   |   |   |   |   |   |   |--- boreratio >  3.59
## |   |   |   |   |   |   |   |   |--- value: [12964.00]
## |   |   |   |   |   |   |--- curbweight >  2877.00
## |   |   |   |   |   |   |   |--- compressionratio <= 8.00
## |   |   |   |   |   |   |   |   |--- value: [14869.00]
## |   |   |   |   |   |   |   |--- compressionratio >  8.00
## |   |   |   |   |   |   |   |   |--- curbweight <= 3195.50
## |   |   |   |   |   |   |   |   |   |--- value: [13499.00]
## |   |   |   |   |   |   |   |   |--- curbweight >  3195.50
## |   |   |   |   |   |   |   |   |   |--- value: [14399.00]
## |   |   |   |   |   |--- horsepower >  153.00
## |   |   |   |   |   |   |--- boreratio <= 3.35
## |   |   |   |   |   |   |   |--- wheelbase <= 103.70
## |   |   |   |   |   |   |   |   |--- citympg <= 19.50
## |   |   |   |   |   |   |   |   |   |--- value: [16500.00]
## |   |   |   |   |   |   |   |   |--- citympg >  19.50
## |   |   |   |   |   |   |   |   |   |--- value: [16558.00]
## |   |   |   |   |   |   |   |--- wheelbase >  103.70
## |   |   |   |   |   |   |   |   |--- enginesize <= 166.00
## |   |   |   |   |   |   |   |   |   |--- value: [15750.00]
## |   |   |   |   |   |   |   |   |--- enginesize >  166.00
## |   |   |   |   |   |   |   |   |   |--- value: [15690.00]
## |   |   |   |   |   |   |--- boreratio >  3.35
## |   |   |   |   |   |   |   |--- horsepower <= 180.00
## |   |   |   |   |   |   |   |   |--- carlength <= 174.60
## |   |   |   |   |   |   |   |   |   |--- value: [17199.00]
## |   |   |   |   |   |   |   |   |--- carlength >  174.60
## |   |   |   |   |   |   |   |   |   |--- value: [18399.00]
## |   |   |   |   |   |   |   |--- horsepower >  180.00
## |   |   |   |   |   |   |   |   |--- value: [19699.00]
## |   |   |--- carwidth >  68.60
## |   |   |   |--- curbweight <= 2983.00
## |   |   |   |   |--- curbweight <= 2953.00
## |   |   |   |   |   |--- citympg <= 21.00
## |   |   |   |   |   |   |--- value: [17710.00]
## |   |   |   |   |   |--- citympg >  21.00
## |   |   |   |   |   |   |--- value: [16845.00]
## |   |   |   |   |--- curbweight >  2953.00
## |   |   |   |   |   |--- value: [18920.00]
## |   |   |   |--- curbweight >  2983.00
## |   |   |   |   |--- horsepower <= 147.00
## |   |   |   |   |   |--- enginesize <= 159.00
## |   |   |   |   |   |   |--- peakrpm <= 5100.00
## |   |   |   |   |   |   |   |--- value: [22470.00]
## |   |   |   |   |   |   |--- peakrpm >  5100.00
## |   |   |   |   |   |   |   |--- value: [22625.00]
## |   |   |   |   |   |--- enginesize >  159.00
## |   |   |   |   |   |   |--- value: [21485.00]
## |   |   |   |   |--- horsepower >  147.00
## |   |   |   |   |   |--- value: [19045.00]
## |--- enginesize >  182.00
## |   |--- compressionratio <= 8.05
## |   |   |--- symboling <= 0.50
## |   |   |   |--- highwaympg <= 21.00
## |   |   |   |   |--- carlength <= 202.55
## |   |   |   |   |   |--- value: [36880.00]
## |   |   |   |   |--- carlength >  202.55
## |   |   |   |   |   |--- value: [40960.00]
## |   |   |   |--- highwaympg >  21.00
## |   |   |   |   |--- value: [41315.00]
## |   |   |--- symboling >  0.50
## |   |   |   |--- value: [45400.00]
## |   |--- compressionratio >  8.05
## |   |   |--- compressionratio <= 9.75
## |   |   |   |--- curbweight <= 2778.00
## |   |   |   |   |--- value: [33278.00]
## |   |   |   |--- curbweight >  2778.00
## |   |   |   |   |--- enginesize <= 214.00
## |   |   |   |   |   |--- value: [37028.00]
## |   |   |   |   |--- enginesize >  214.00
## |   |   |   |   |   |--- carheight <= 51.80
## |   |   |   |   |   |   |--- value: [35056.00]
## |   |   |   |   |   |--- carheight >  51.80
## |   |   |   |   |   |   |--- boreratio <= 3.55
## |   |   |   |   |   |   |   |--- value: [34184.00]
## |   |   |   |   |   |   |--- boreratio >  3.55
## |   |   |   |   |   |   |   |--- value: [33900.00]
## |   |   |--- compressionratio >  9.75
## |   |   |   |--- carwidth <= 71.00
## |   |   |   |   |--- curbweight <= 3632.50
## |   |   |   |   |   |--- carlength <= 189.20
## |   |   |   |   |   |   |--- value: [28176.00]
## |   |   |   |   |   |--- carlength >  189.20
## |   |   |   |   |   |   |--- value: [25552.00]
## |   |   |   |   |--- curbweight >  3632.50
## |   |   |   |   |   |--- value: [28248.00]
## |   |   |   |--- carwidth >  71.00
## |   |   |   |   |--- carwidth <= 72.00
## |   |   |   |   |   |--- value: [31600.00]
## |   |   |   |   |--- carwidth >  72.00
## |   |   |   |   |   |--- value: [31400.50]

Predictor importance

importancia_predictores = pd.DataFrame(
                            {'predictor': datos.drop(columns = "price").columns, 
                            'importancia': modelo_ar.feature_importances_}
                            )
                            
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
##            predictor  importancia
## 6         enginesize     0.699023
## 5         curbweight     0.207796
## 9   compressionratio     0.030440
## 10        horsepower     0.018983
## 3           carwidth     0.015350
## 8             stroke     0.009370
## 13        highwaympg     0.004972
## 0          symboling     0.004642
## 1          wheelbase     0.002686
## 7          boreratio     0.002103
## 12           citympg     0.001633
## 2          carlength     0.001482
## 11           peakrpm     0.000781
## 4          carheight     0.000739

According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, compressionratio, horsepower and carwidth. At the other end, peakrpm and carheight contribute the least.
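As a minimal sketch of how these importance values behave, the following fits a small tree on synthetic data and ranks its predictors. The data and the column names `driver`, `minor` and `noise` are assumptions made for illustration; they are not the car price dataset. The point is that `feature_importances_` is normalized to sum to 1, which is why the values can be read as shares.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (assumption for illustration): y depends strongly on
# 'driver', weakly on 'minor', and not at all on 'noise'.
rng = np.random.default_rng(1271)
X = pd.DataFrame({
    "driver": rng.uniform(0, 10, 200),
    "minor": rng.uniform(0, 10, 200),
    "noise": rng.uniform(0, 10, 200),
})
y = 100 * X["driver"] + 5 * X["minor"] + rng.normal(0, 1, 200)

tree = DecisionTreeRegressor(random_state=1271).fit(X, y)

# Same construction as in the report: one row per predictor, sorted by importance.
importance = pd.DataFrame({
    "predictor": X.columns,
    "importance": tree.feature_importances_,
}).sort_values("importance", ascending=False)

print(importance)
print("Sum of importances:", importance["importance"].sum())
```

Multiplying the `importance` column by 100 gives percentages comparable to those quoted in the interpretation.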

Model predictions (ar)

predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([14869. ,  8358. ,  9495. ,  7898. , 41315. ,  7689. ,  6488. ,
##         6938. , 16500. ,  5572. , 16500. ,  7295. ,  5572. ,  8449. ,
##        22625. ,  6649. , 11694. , 11694. ,  6795. , 11048. , 28248. ,
##         8013. ,  7898. ,  8058. ,  8238. ,  8358. , 13845. , 16925. ,
##        13495. ,  6989. , 11395. ,  8495. ,  8916.5, 20970. ,  7975. ,
##         8358. , 18950. ,  7898. , 12940. ,  9495. ,  5118. ])

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 84           3       95.9  ...    14489.000            14869.0
## 139          2       93.7  ...     7053.000             8358.0
## 143          0       97.2  ...     9960.000             9495.0
## 38           0       96.5  ...     9095.000             7898.0
## 15           0      103.5  ...    30760.000            41315.0
## 122          1       93.7  ...     7609.000             7689.0
## 150          1       95.7  ...     5348.000             6488.0
## 160          0       95.7  ...     7738.000             6938.0
## 179          3      102.9  ...    15998.000            16500.0
## 30           2       86.6  ...     6479.000             5572.0
## 9            0       99.5  ...    17859.167            16500.0
## 35           0       96.5  ...     7295.000             7295.0
## 76           2       93.7  ...     5389.000             5572.0
## 168          2       98.4  ...     9639.000             8449.0
## 8            1      105.8  ...    23875.000            22625.0
## 89           1       94.5  ...     5499.000             6649.0
## 5            2       99.8  ...    15250.000            11694.0
## 145          0       97.0  ...    11259.000            11694.0
## 51           1       93.1  ...     6095.000             6795.0
## 66           0      104.9  ...    18344.000            11048.0
## 49           0      102.0  ...    36000.000            28248.0
## 43           0       94.3  ...     6785.000             8013.0
## 186          2       97.3  ...     8495.000             7898.0
## 141          0       97.2  ...     7126.000             8058.0
## 142          0       97.2  ...     7775.000             8238.0
## 23           1       93.7  ...     7957.000             8358.0
## 193          0      100.4  ...    12290.000            13845.0
## 10           2      101.2  ...    16430.000            16925.0
## 1            3       88.6  ...    16500.000            13495.0
## 86           1       96.3  ...     8189.000             6989.0
## 57           3       95.3  ...    13645.000            11395.0
## 62           0       98.8  ...    10245.000             8495.0
## 20           0       94.5  ...     6575.000             8916.5
## 13           0      101.2  ...    21105.000            20970.0
## 185          2       97.3  ...     8195.000             7975.0
## 119          1       93.7  ...     7957.000             8358.0
## 198         -2      104.3  ...    18420.000            18950.0
## 165          1       94.5  ...     9298.000             7898.0
## 196         -2      104.3  ...    15985.000            12940.0
## 3            2       99.8  ...    13950.000             9495.0
## 32           1       93.7  ...     5399.000             5118.0
## 
## [41 rows x 16 columns]

RMSE of the ar model

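# squared = False returns the RMSE instead of the MSE. Recent scikit-learn
# releases deprecate this argument in favor of root_mean_squared_error.
rmse_ar = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_ar,
        squared = False
       )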
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2759.7797861241993

Or, equivalently:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2759.7797861241993
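Both calls compute the same quantity, the square root of the mean of the squared differences between real and predicted prices. A minimal sketch of that definition in plain Python follows; the helper name `rmse` is hypothetical, and the three rows are copied from the comparison table above only to show the arithmetic, so the result is not the model's RMSE over all 41 validation rows.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# First three validation rows (Precio_Real vs. Precio_Prediccion of the tree).
real = [14489.0, 7053.0, 9960.0]
pred = [14869.0, 8358.0, 9495.0]

# Errors are -380, -1305 and 465, so the squared errors add up to 2063650.
print(rmse(real, pred))
```

Because the errors are squared before averaging, a single large miss (such as row 15 in the table) weighs much more than several small ones.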

Random forest model (RF)

The random forest model (rf) is built with seed 1271 and 20 trees.

modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1271)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1271)

Variable importance

# pending ... ...
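One possible way to fill in this step is sketched below. It mirrors the regression tree code, since `RandomForestRegressor` also exposes `feature_importances_`. The data here are synthetic and the column names `engine`, `weight` and `noise` are hypothetical, so the numbers say nothing about the car price dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption for illustration): the target is driven mostly
# by 'engine', to a lesser extent by 'weight', and not at all by 'noise'.
rng = np.random.default_rng(1271)
X = pd.DataFrame({
    "engine": rng.uniform(50, 300, 300),
    "weight": rng.uniform(1500, 4000, 300),
    "noise": rng.uniform(0, 1, 300),
})
y = 120 * X["engine"] + 2 * X["weight"] + rng.normal(0, 500, 300)

# Same settings as the report: 20 trees and seed 1271.
forest = RandomForestRegressor(n_estimators=20, random_state=1271).fit(X, y)

importance_rf = pd.DataFrame({
    "predictor": X.columns,
    "importance": forest.feature_importances_,
}).sort_values("importance", ascending=False)

print(importance_rf)
```

With the report's objects, the same pattern would use `modelo_rf.feature_importances_` and the column names of `X_entrena`, assuming `X_entrena` is a DataFrame.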

Model predictions (rf)

predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([14033.5       ,  7287.4       ,  9564.1       ,  8227.75      ,
##        40392.75      ,  7222.7125    ,  6304.3       ,  7091.95      ,
##        16733.9       ,  5485.5       , 17475.6       ,  7284.275     ,
##         5708.15      ,  8796.2       , 18386.85      ,  6723.45      ,
##        11413.35      , 10336.75      ,  6356.8       , 13603.8       ,
##        32523.325     ,  9309.7       ,  8189.25      ,  7890.925     ,
##         8242.95      ,  8533.8       , 12139.1       , 14795.9       ,
##        12091.55      ,  8481.88333333, 12001.2       ,  9569.46666667,
##         8429.12916667, 19361.2       ,  7947.4       ,  8533.8       ,
##        17821.        ,  9616.2       , 14617.        , 10729.38333333,
##         5626.25      ])
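Unlike the single tree, whose predictions are leaf values from the training data, the forest's predictions are averages over its 20 trees, which is why they show fractional values. The sketch below checks this on synthetic data (an assumption for illustration, not the car price data) by averaging the per-tree predictions by hand.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression problem with three numeric predictors.
rng = np.random.default_rng(1271)
X = rng.uniform(0, 10, size=(200, 3))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)

forest = RandomForestRegressor(n_estimators=20, random_state=1271).fit(X, y)

# Predict five rows with every individual tree, then average across trees.
X_new = X[:5]
per_tree = np.array([tree.predict(X_new) for tree in forest.estimators_])
manual_average = per_tree.mean(axis=0)

# The manual average should match the forest's own prediction.
print(np.allclose(manual_average, forest.predict(X_new)))
```

Averaging many trees grown on bootstrap samples tends to smooth out the errors of any individual tree, which is consistent with the lower RMSE reported for the forest below.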

Comparison table

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Real  Precio_Prediccion
## 84           3       95.9  ...    14489.000       14033.500000
## 139          2       93.7  ...     7053.000        7287.400000
## 143          0       97.2  ...     9960.000        9564.100000
## 38           0       96.5  ...     9095.000        8227.750000
## 15           0      103.5  ...    30760.000       40392.750000
## 122          1       93.7  ...     7609.000        7222.712500
## 150          1       95.7  ...     5348.000        6304.300000
## 160          0       95.7  ...     7738.000        7091.950000
## 179          3      102.9  ...    15998.000       16733.900000
## 30           2       86.6  ...     6479.000        5485.500000
## 9            0       99.5  ...    17859.167       17475.600000
## 35           0       96.5  ...     7295.000        7284.275000
## 76           2       93.7  ...     5389.000        5708.150000
## 168          2       98.4  ...     9639.000        8796.200000
## 8            1      105.8  ...    23875.000       18386.850000
## 89           1       94.5  ...     5499.000        6723.450000
## 5            2       99.8  ...    15250.000       11413.350000
## 145          0       97.0  ...    11259.000       10336.750000
## 51           1       93.1  ...     6095.000        6356.800000
## 66           0      104.9  ...    18344.000       13603.800000
## 49           0      102.0  ...    36000.000       32523.325000
## 43           0       94.3  ...     6785.000        9309.700000
## 186          2       97.3  ...     8495.000        8189.250000
## 141          0       97.2  ...     7126.000        7890.925000
## 142          0       97.2  ...     7775.000        8242.950000
## 23           1       93.7  ...     7957.000        8533.800000
## 193          0      100.4  ...    12290.000       12139.100000
## 10           2      101.2  ...    16430.000       14795.900000
## 1            3       88.6  ...    16500.000       12091.550000
## 86           1       96.3  ...     8189.000        8481.883333
## 57           3       95.3  ...    13645.000       12001.200000
## 62           0       98.8  ...    10245.000        9569.466667
## 20           0       94.5  ...     6575.000        8429.129167
## 13           0      101.2  ...    21105.000       19361.200000
## 185          2       97.3  ...     8195.000        7947.400000
## 119          1       93.7  ...     7957.000        8533.800000
## 198         -2      104.3  ...    18420.000       17821.000000
## 165          1       94.5  ...     9298.000        9616.200000
## 196         -2      104.3  ...    15985.000       14617.000000
## 3            2       99.8  ...    13950.000       10729.383333
## 32           1       93.7  ...     5399.000        5626.250000
## 
## [41 rows x 16 columns]

RMSE of the rf model

# squared = False returns the RMSE instead of the MSE
rmse_rf = mean_squared_error(
        y_true  = Y_valida,
        y_pred  = predicciones_rf,
        squared = False
       )
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2380.5577291513127

Or, equivalently:

print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2380.5577291513127

Model evaluation

The predictions of the three models are compared

comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
##      symboling  wheelbase  ...  Precio_Prediccion_ar  Precio_Prediccion_rf
## 84           3       95.9  ...               14869.0          14033.500000
## 139          2       93.7  ...                8358.0           7287.400000
## 143          0       97.2  ...                9495.0           9564.100000
## 38           0       96.5  ...                7898.0           8227.750000
## 15           0      103.5  ...               41315.0          40392.750000
## 122          1       93.7  ...                7689.0           7222.712500
## 150          1       95.7  ...                6488.0           6304.300000
## 160          0       95.7  ...                6938.0           7091.950000
## 179          3      102.9  ...               16500.0          16733.900000
## 30           2       86.6  ...                5572.0           5485.500000
## 9            0       99.5  ...               16500.0          17475.600000
## 35           0       96.5  ...                7295.0           7284.275000
## 76           2       93.7  ...                5572.0           5708.150000
## 168          2       98.4  ...                8449.0           8796.200000
## 8            1      105.8  ...               22625.0          18386.850000
## 89           1       94.5  ...                6649.0           6723.450000
## 5            2       99.8  ...               11694.0          11413.350000
## 145          0       97.0  ...               11694.0          10336.750000
## 51           1       93.1  ...                6795.0           6356.800000
## 66           0      104.9  ...               11048.0          13603.800000
## 49           0      102.0  ...               28248.0          32523.325000
## 43           0       94.3  ...                8013.0           9309.700000
## 186          2       97.3  ...                7898.0           8189.250000
## 141          0       97.2  ...                8058.0           7890.925000
## 142          0       97.2  ...                8238.0           8242.950000
## 23           1       93.7  ...                8358.0           8533.800000
## 193          0      100.4  ...               13845.0          12139.100000
## 10           2      101.2  ...               16925.0          14795.900000
## 1            3       88.6  ...               13495.0          12091.550000
## 86           1       96.3  ...                6989.0           8481.883333
## 57           3       95.3  ...               11395.0          12001.200000
## 62           0       98.8  ...                8495.0           9569.466667
## 20           0       94.5  ...                8916.5           8429.129167
## 13           0      101.2  ...               20970.0          19361.200000
## 185          2       97.3  ...                7975.0           7947.400000
## 119          1       93.7  ...                8358.0           8533.800000
## 198         -2      104.3  ...               18950.0          17821.000000
## 165          1       94.5  ...                7898.0           9616.200000
## 196         -2      104.3  ...               12940.0          14617.000000
## 3            2       99.8  ...                9495.0          10729.383333
## 32           1       93.7  ...                5118.0           5626.250000
## 
## [41 rows x 18 columns]

The RMSE values are compared.

A NumPy array is created

rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3351.45740395, 2759.77978612, 2380.55772915]])

A DataFrame is built from the NumPy array

rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
##        rmse_rm      rmse_ar      rmse_rf
## 0  3351.457404  2759.779786  2380.557729

Interpretation

Overall, the results were better than those obtained in R with the same seed (1271). The data loaded were numeric car prices together with several numeric predictor variables.

The regression tree model highlights the following variables by importance:

- enginesize -> 69.9023%
- curbweight -> 20.7796%
- compressionratio -> 3.0440%
- horsepower -> 1.8983%
- carwidth -> 1.5350%

In the regression tree model, the most important variables were enginesize, curbweight, compressionratio and horsepower.

The random forest model considers variables such as enginesize, curbweight and compressionratio to be important.

Notably, enginesize stands out as important and significant in all models, while enginesize, curbweight and horsepower are important in both the regression tree and random forest models.

According to the root mean squared error (RMSE), the best model was the random forest, for this particular training/validation split with 80% of the data used for training and 20% for validation. From best to worst (lowest to highest RMSE), the models rank as follows:

  1. Random Forest 2380.5577291513127
  2. Regression Tree 2759.7797861241993
  3. Multiple Regression 3351.4574039517665
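The ranking above can be reproduced programmatically from the three RMSE values. The sketch below is one way to do it with a pandas Series; the name `reduction_pct` and the choice of multiple regression as the baseline are assumptions made for illustration.

```python
import pandas as pd

# RMSE values reported above for each model.
rmse = pd.Series({
    "rmse_rm": 3351.4574039517665,  # multiple regression
    "rmse_ar": 2759.7797861241993,  # regression tree
    "rmse_rf": 2380.5577291513127,  # random forest
})

# Ascending sort: the model with the lowest error comes first.
ranking = rmse.sort_values()

# Relative RMSE reduction of each model with respect to multiple regression.
reduction_pct = (1 - ranking / rmse["rmse_rm"]) * 100

print(pd.DataFrame({
    "rmse": ranking,
    "reduction_vs_rm_pct": reduction_pct.round(1),
}))
```

Expressing the gap as a percentage of the baseline makes the comparison easier to read than raw RMSE values in price units. It still reflects a single 80/20 split, so the ordering could change with a different seed.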