Compare supervised models by applying algorithms that predict automobile prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
The multiple regression model is built with the training data.
This model is used to answer questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of the variables for the price is identified.
The regression tree and its decision rules are displayed.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
The random forest model is built with the training data and 20 trees.
The importance of the variables for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors (Categorical) |
| 7 | carbody | Body of the car: convertible, sedan, wagon, … (Categorical) |
| 8 | drivewheel | Type of drive wheel: fwd, rwd, … (Categorical) |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car: distance between axles, in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the car's engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric) |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower: power of the car (Numeric) |
| 23 | peakrpm | Peak revolutions per minute (Numeric) |
| 24 | citympg | Mileage in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Mileage on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars (Numeric). Dependent variable |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables that carry no statistical interest, that is, drop columns 1 and 3, car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
Some variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
Identify the dummy variables and build a data set that includes them.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? A categorical variable is encoded as several columns, one per category; each record gets a 1 in the column of the category it belongs to and a 0 in the others.
Example
| genero |
|---|
| MASCULINO |
| FEMENINO |
| MASCULINO |
Same data with dummy variables
| genero_masculino | genero_femenino |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
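The encoding in the example above can be reproduced with a minimal sketch. The small DataFrame `ejemplo` is a hypothetical stand-in for the real data, and `dtype=int` is passed only so that the indicators print as 0/1 rather than booleans. With `drop_first=True`, the option used for the car data, only one of the two columns is kept, since the other is redundant.

```python
import pandas as pd

# Hypothetical toy data mirroring the example table
ejemplo = pd.DataFrame({"genero": ["MASCULINO", "FEMENINO", "MASCULINO"]})

# One indicator column per category
completo = pd.get_dummies(ejemplo, dtype=int)

# drop_first=True drops the first category (alphabetical order),
# leaving k - 1 columns for a variable with k categories
reducido = pd.get_dummies(ejemplo, drop_first=True, dtype=int)

print(completo)
print(reducido)
```

Dropping one level per variable avoids perfectly collinear columns, which matters for the linear regression model more than for the tree-based models.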
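# drop_first = True drops the first level of each categorical variable (k categories yield k - 1 columns)
datos_dummis = pd.get_dummies(datos, drop_first = True)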
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
Training data with 80% of the observations and validation data with the remaining 20%. Seed 1279.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80, random_state = 1279)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 194 -2 104.3 ... 0 0
## 132 3 99.1 ... 0 0
## 84 3 95.9 ... 1 0
## 90 1 94.5 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 40 0 96.5 ... 0 0
## 86 1 96.3 ... 0 0
## 60 0 98.8 ... 0 0
## 155 0 95.7 ... 0 0
## 167 2 98.4 ... 0 0
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 83 3 95.9 ... 1 0
## 91 1 94.5 ... 0 0
## 1 3 88.6 ... 0 0
## 110 0 114.2 ... 0 0
## 136 3 99.1 ... 0 0
## 105 3 91.3 ... 0 0
## 197 -1 104.3 ... 0 0
## 49 0 102.0 ... 0 0
## 181 -1 104.5 ... 0 0
## 140 2 93.3 ... 0 0
## 198 -2 104.3 ... 0 0
## 146 0 97.0 ... 0 0
## 99 0 97.2 ... 0 0
## 130 0 96.1 ... 0 0
## 145 0 97.0 ... 0 0
## 70 -1 115.6 ... 0 0
## 37 0 96.5 ... 0 0
## 13 0 101.2 ... 0 0
## 5 2 99.8 ... 0 0
## 203 -1 109.1 ... 0 0
## 131 2 96.1 ... 0 0
## 48 0 113.0 ... 0 0
## 174 -1 102.4 ... 0 0
## 41 0 96.5 ... 0 0
## 6 1 105.8 ... 0 0
## 196 -2 104.3 ... 0 0
## 199 -1 104.3 ... 0 0
## 63 0 98.8 ... 0 0
## 191 0 100.4 ... 0 0
## 184 2 97.3 ... 0 0
## 149 0 96.9 ... 0 0
## 62 0 98.8 ... 0 0
## 152 1 95.7 ... 0 0
## 109 0 114.2 ... 0 0
## 68 -1 110.0 ... 0 0
## 127 3 89.5 ... 0 0
## 201 -1 109.1 ... 0 0
## 61 1 98.8 ... 0 0
## 10 2 101.2 ... 0 0
## 87 1 96.3 ... 1 0
## 32 1 93.7 ... 0 0
##
## [41 rows x 43 columns]
The multiple linear regression model (rm) is built.
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()
Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
modelo_rm.coef_
## array([-1.62658176e+02, -6.67463269e-01, -4.36780879e+01, 8.75428011e+02,
## 7.67575520e+01, 2.36157670e+00, 1.01153813e+02, -1.77705736e+03,
## -4.07910689e+03, -6.02560006e+02, 2.01272193e+01, 2.34060075e+00,
## -1.42460910e+02, 1.56983425e+02, -5.15893119e+03, 1.59045490e+03,
## -1.28893373e+02, -3.25289419e+03, -3.66533902e+03, -2.73322171e+03,
## -3.50009243e+03, -9.19992628e+01, 1.19674663e+03, 5.52616675e+03,
## -1.00187607e+04, 1.14663243e+03, 4.70398070e+03, 2.82508796e+03,
## -5.20273001e+03, -1.56350278e+03, -1.26155810e+04, -1.34479719e+04,
## -8.16699884e+03, -4.27609647e+03, 5.45696821e-12, -1.56350278e+03,
## 1.21334870e+02, -1.39551864e+03, 5.15893119e+03, -2.11420948e+03,
## -1.32963782e+01, -2.29966785e+03, 2.03010607e+02])
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9388472336156414
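The value printed by `score` is the plain R-squared on the training data. One of the questions posed for this model concerns the adjusted R-squared, which penalizes the number of predictors. A minimal sketch of the standard formula \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\) follows. The helper name `r2_ajustado` is hypothetical, and the sketch assumes that all 43 columns count as predictors (with n = 164 training rows), which may overstate p if some dummy columns are constant or duplicated in the training set.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("n must be greater than p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Values taken from the outputs above: training R-squared, 164 rows, 43 columns
r2_entrena = 0.9388472336156414
print(r2_ajustado(r2_entrena, n=164, p=43))
```

Under these assumptions the adjusted value comes out a little below the plain R-squared, as expected with 43 predictors and only 164 rows.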
predicciones_rm = modelo_rm.predict(X_valida)
# Note: [:-1] leaves out the last prediction, so 40 of the 41 values are printed
print(predicciones_rm[:-1])
## [13219.24990938 6388.53104004 12141.2333224 15565.74242356
## 10927.33477441 20598.88010714 16858.40973215 46534.59873777
## 19618.49931304 6038.74323093 19981.70280723 7976.34277616
## 10069.30912419 10357.74649517 12441.7959641 28247.40482282
## 9346.29755253 21445.44618526 15188.68076476 27183.98096159
## 9389.33188024 30100.84979426 12816.21641579 10595.53595676
## 20263.6888157 17435.46510507 19416.45531779 12373.04370883
## 16401.29485231 9552.27983716 11182.34267727 11203.08959656
## 5980.14992351 12027.29364017 26906.6929 33099.59821979
## 21891.42656755 9782.217732 13643.03766767 11118.85043794]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 83 3 95.9 ... 14869.0 13219.249909
## 91 1 94.5 ... 6649.0 6388.531040
## 1 3 88.6 ... 16500.0 12141.233322
## 110 0 114.2 ... 13860.0 15565.742424
## 136 3 99.1 ... 18150.0 10927.334774
## 105 3 91.3 ... 19699.0 20598.880107
## 197 -1 104.3 ... 16515.0 16858.409732
## 49 0 102.0 ... 36000.0 46534.598738
## 181 -1 104.5 ... 15750.0 19618.499313
## 140 2 93.3 ... 7603.0 6038.743231
## 198 -2 104.3 ... 18420.0 19981.702807
## 146 0 97.0 ... 7463.0 7976.342776
## 99 0 97.2 ... 8949.0 10069.309124
## 130 0 96.1 ... 9295.0 10357.746495
## 145 0 97.0 ... 11259.0 12441.795964
## 70 -1 115.6 ... 31600.0 28247.404823
## 37 0 96.5 ... 7895.0 9346.297553
## 13 0 101.2 ... 21105.0 21445.446185
## 5 2 99.8 ... 15250.0 15188.680765
## 203 -1 109.1 ... 22470.0 27183.980962
## 131 2 96.1 ... 9895.0 9389.331880
## 48 0 113.0 ... 35550.0 30100.849794
## 174 -1 102.4 ... 10698.0 12816.216416
## 41 0 96.5 ... 12945.0 10595.535957
## 6 1 105.8 ... 17710.0 20263.688816
## 196 -2 104.3 ... 15985.0 17435.465105
## 199 -1 104.3 ... 18950.0 19416.455318
## 63 0 98.8 ... 10795.0 12373.043709
## 191 0 100.4 ... 13295.0 16401.294852
## 184 2 97.3 ... 7995.0 9552.279837
## 149 0 96.9 ... 11694.0 11182.342677
## 62 0 98.8 ... 10245.0 11203.089597
## 152 1 95.7 ... 6488.0 5980.149924
## 109 0 114.2 ... 12440.0 12027.293640
## 68 -1 110.0 ... 28248.0 26906.692900
## 127 3 89.5 ... 34028.0 33099.598220
## 201 -1 109.1 ... 19045.0 21891.426568
## 61 1 98.8 ... 10595.0 9782.217732
## 10 2 101.2 ... 16430.0 13643.037668
## 87 1 96.3 ... 9279.0 11118.850438
## 32 1 93.7 ... 5399.0 5231.908775
##
## [41 rows x 45 columns]
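# squared = False returns the RMSE instead of the MSE. Recent scikit-learn
# releases deprecate this argument; the np.sqrt form shown next does not depend on it.
rmse_rm = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_rm,
    squared = False
    )
print(f"The validation error (RMSE) is: {rmse_rm}")
## The validation error (RMSE) is: 2862.41714342437
Or, equivalently: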
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2862.41714342437
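Both calls compute the same quantity, \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). As an illustration, the sketch below evaluates it by hand on the first five validation rows of the comparison table above. The function name `rmse` is hypothetical and the predictions are rounded to two decimals, so the result is not the model's RMSE over all 41 rows.

```python
import math

def rmse(reales, predichos):
    """Root mean squared error between two equally long sequences."""
    if len(reales) != len(predichos):
        raise ValueError("both sequences must have the same length")
    errores_cuadrados = [(r - p) ** 2 for r, p in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(reales))

# First five rows of the comparison table (real price, rounded prediction)
precio_real = [14869.0, 6649.0, 16500.0, 13860.0, 18150.0]
precio_prediccion = [13219.25, 6388.53, 12141.23, 15565.74, 10927.33]

print(rmse(precio_real, precio_prediccion))
```

Because the errors are squared before averaging, a single large miss (such as the fifth row here) weighs more heavily than several small ones.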
The regression tree model (ar) is built.
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 2022
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=2022)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 16
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 153
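# feature_names must match the columns used for training (datos_dummis, not datos);
# class_names does not apply to a regression tree and has been dropped
#plot = plot_tree(
#    decision_tree = modelo_ar,
#    feature_names = list(datos_dummis.drop(columns = "price").columns),
#    filled = True,
#    impurity = False,
#    fontsize = 10,
#    precision = 2,
#    ax = ax
#    )
#plot
Decision rules of the tree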
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2557.00
## | | |--- horsepower <= 83.00
## | | | |--- wheelbase <= 94.40
## | | | | |--- carbody_sedan <= 0.50
## | | | | | |--- horsepower <= 71.00
## | | | | | | |--- stroke <= 3.09
## | | | | | | | |--- carlength <= 149.00
## | | | | | | | | |--- value: [5151.00]
## | | | | | | | |--- carlength > 149.00
## | | | | | | | | |--- value: [5118.00]
## | | | | | | |--- stroke > 3.09
## | | | | | | | |--- highwaympg <= 34.50
## | | | | | | | | |--- value: [5195.00]
## | | | | | | | |--- highwaympg > 34.50
## | | | | | | | | |--- highwaympg <= 39.50
## | | | | | | | | | |--- curbweight <= 1985.50
## | | | | | | | | | | |--- curbweight <= 1924.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- curbweight > 1924.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- curbweight > 1985.50
## | | | | | | | | | | |--- value: [6669.00]
## | | | | | | | | |--- highwaympg > 39.50
## | | | | | | | | | |--- fuelsystem_2bbl <= 0.50
## | | | | | | | | | | |--- value: [6479.00]
## | | | | | | | | | |--- fuelsystem_2bbl > 0.50
## | | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | | |--- value: [5389.00]
## | | | | | |--- horsepower > 71.00
## | | | | | | |--- curbweight <= 1948.00
## | | | | | | | |--- carwidth <= 63.95
## | | | | | | | | |--- value: [6855.00]
## | | | | | | | |--- carwidth > 63.95
## | | | | | | | | |--- value: [6529.00]
## | | | | | | |--- curbweight > 1948.00
## | | | | | | | |--- carheight <= 53.15
## | | | | | | | | |--- value: [7129.00]
## | | | | | | | |--- carheight > 53.15
## | | | | | | | | |--- value: [7053.00]
## | | | | |--- carbody_sedan > 0.50
## | | | | | |--- curbweight <= 1947.50
## | | | | | | |--- value: [6695.00]
## | | | | | |--- curbweight > 1947.50
## | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | |--- enginesize <= 90.50
## | | | | | | | | |--- carlength <= 162.30
## | | | | | | | | | |--- value: [7150.50]
## | | | | | | | | |--- carlength > 162.30
## | | | | | | | | | |--- value: [6692.00]
## | | | | | | | |--- enginesize > 90.50
## | | | | | | | | |--- carlength <= 167.05
## | | | | | | | | | |--- value: [7395.00]
## | | | | | | | | |--- carlength > 167.05
## | | | | | | | | | |--- value: [7609.00]
## | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | |--- value: [6785.00]
## | | | |--- wheelbase > 94.40
## | | | | |--- curbweight <= 2115.50
## | | | | | |--- carheight <= 54.00
## | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | |--- carlength <= 157.35
## | | | | | | | | |--- value: [8916.50]
## | | | | | | | |--- carlength > 157.35
## | | | | | | | | |--- compressionratio <= 9.50
## | | | | | | | | | |--- citympg <= 30.50
## | | | | | | | | | | |--- value: [6938.00]
## | | | | | | | | | |--- citympg > 30.50
## | | | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | | | |--- value: [8249.00]
## | | | | | | | | |--- compressionratio > 9.50
## | | | | | | | | | |--- value: [6575.00]
## | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | |--- compressionratio <= 9.50
## | | | | | | | | |--- carheight <= 53.05
## | | | | | | | | | |--- value: [7198.00]
## | | | | | | | | |--- carheight > 53.05
## | | | | | | | | | |--- value: [7799.00]
## | | | | | | | |--- compressionratio > 9.50
## | | | | | | | | |--- value: [6295.00]
## | | | | | |--- carheight > 54.00
## | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | |--- curbweight <= 1913.50
## | | | | | | | | |--- value: [5499.00]
## | | | | | | | |--- curbweight > 1913.50
## | | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | | |--- horsepower <= 62.00
## | | | | | | | | | | |--- value: [7099.00]
## | | | | | | | | | |--- horsepower > 62.00
## | | | | | | | | | | |--- enginesize <= 94.50
## | | | | | | | | | | | |--- value: [7295.00]
## | | | | | | | | | | |--- enginesize > 94.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | |--- curbweight <= 2012.50
## | | | | | | | | |--- value: [5348.00]
## | | | | | | | |--- curbweight > 2012.50
## | | | | | | | | |--- value: [6338.00]
## | | | | |--- curbweight > 2115.50
## | | | | | |--- curbweight <= 2304.50
## | | | | | | |--- curbweight <= 2142.50
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- value: [8358.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- value: [9258.00]
## | | | | | | |--- curbweight > 2142.50
## | | | | | | | |--- highwaympg <= 36.50
## | | | | | | | | |--- wheelbase <= 95.10
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- value: [8058.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- value: [8238.00]
## | | | | | | | | |--- wheelbase > 95.10
## | | | | | | | | | |--- carheight <= 52.75
## | | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | | |--- carheight > 52.75
## | | | | | | | | | | |--- value: [7898.00]
## | | | | | | | |--- highwaympg > 36.50
## | | | | | | | | |--- stroke <= 3.19
## | | | | | | | | | |--- carwidth <= 64.50
## | | | | | | | | | | |--- value: [6918.00]
## | | | | | | | | | |--- carwidth > 64.50
## | | | | | | | | | | |--- value: [7126.00]
## | | | | | | | | |--- stroke > 3.19
## | | | | | | | | | |--- carwidth <= 64.95
## | | | | | | | | | | |--- value: [7788.00]
## | | | | | | | | | |--- carwidth > 64.95
## | | | | | | | | | | |--- value: [7775.00]
## | | | | | |--- curbweight > 2304.50
## | | | | | | |--- curbweight <= 2402.50
## | | | | | | | |--- horsepower <= 75.00
## | | | | | | | | |--- value: [9495.00]
## | | | | | | | |--- horsepower > 75.00
## | | | | | | | | |--- value: [9233.00]
## | | | | | | |--- curbweight > 2402.50
## | | | | | | | |--- value: [8013.00]
## | | |--- horsepower > 83.00
## | | | |--- citympg <= 23.50
## | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | |--- horsepower <= 108.50
## | | | | | | |--- curbweight <= 2382.50
## | | | | | | | |--- value: [11395.00]
## | | | | | | |--- curbweight > 2382.50
## | | | | | | | |--- value: [13645.00]
## | | | | | |--- horsepower > 108.50
## | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | |--- value: [9279.00]
## | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | |--- value: [9959.00]
## | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | |--- carwidth <= 64.45
## | | | | | | |--- value: [13495.00]
## | | | | | |--- carwidth > 64.45
## | | | | | | |--- curbweight <= 2447.50
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- curbweight > 2447.50
## | | | | | | | |--- value: [15645.00]
## | | | |--- citympg > 23.50
## | | | | |--- compressionratio <= 9.70
## | | | | | |--- curbweight <= 2216.50
## | | | | | | |--- carheight <= 50.70
## | | | | | | | |--- value: [8558.00]
## | | | | | | |--- carheight > 50.70
## | | | | | | | |--- wheelbase <= 93.35
## | | | | | | | | |--- value: [7689.00]
## | | | | | | | |--- wheelbase > 93.35
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- value: [8195.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- carlength <= 164.50
## | | | | | | | | | | |--- value: [7957.00]
## | | | | | | | | | |--- carlength > 164.50
## | | | | | | | | | | |--- value: [7975.00]
## | | | | | |--- curbweight > 2216.50
## | | | | | | |--- horsepower <= 89.00
## | | | | | | | |--- symboling <= 0.50
## | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | |--- carwidth <= 63.55
## | | | | | | | | | | |--- value: [10295.00]
## | | | | | | | | | |--- carwidth > 63.55
## | | | | | | | | | | |--- stroke <= 3.43
## | | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | | | | |--- stroke > 3.43
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | |--- boreratio <= 3.27
## | | | | | | | | | | |--- value: [9095.00]
## | | | | | | | | | |--- boreratio > 3.27
## | | | | | | | | | | |--- value: [11245.00]
## | | | | | | | |--- symboling > 0.50
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- citympg <= 25.50
## | | | | | | | | | | |--- value: [8499.00]
## | | | | | | | | | |--- citympg > 25.50
## | | | | | | | | | | |--- value: [8845.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- stroke <= 3.43
## | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | | | |--- stroke > 3.43
## | | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | | |--- value: [6989.00]
## | | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | | |--- value: [8189.00]
## | | | | | | |--- horsepower > 89.00
## | | | | | | | |--- carheight <= 55.25
## | | | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | | | |--- highwaympg <= 33.00
## | | | | | | | | | | |--- curbweight <= 2456.50
## | | | | | | | | | | | |--- truncated branch of depth 6
## | | | | | | | | | | |--- curbweight > 2456.50
## | | | | | | | | | | | |--- value: [11248.00]
## | | | | | | | | | |--- highwaympg > 33.00
## | | | | | | | | | | |--- symboling <= -0.50
## | | | | | | | | | | | |--- value: [8948.00]
## | | | | | | | | | | |--- symboling > -0.50
## | | | | | | | | | | | |--- value: [9549.00]
## | | | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- curbweight <= 2538.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2538.00
## | | | | | | | | | | | |--- value: [8449.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- enginesize <= 122.00
## | | | | | | | | | | | |--- value: [9538.00]
## | | | | | | | | | | |--- enginesize > 122.00
## | | | | | | | | | | | |--- value: [9989.00]
## | | | | | | | |--- carheight > 55.25
## | | | | | | | | |--- value: [11595.00]
## | | | | |--- compressionratio > 9.70
## | | | | | |--- citympg <= 25.00
## | | | | | | |--- value: [13950.00]
## | | | | | |--- citympg > 25.00
## | | | | | | |--- value: [9995.00]
## | |--- curbweight > 2557.00
## | | |--- carwidth <= 68.65
## | | | |--- peakrpm <= 4375.00
## | | | | |--- stroke <= 3.36
## | | | | | |--- symboling <= 0.50
## | | | | | | |--- value: [20970.00]
## | | | | | |--- symboling > 0.50
## | | | | | | |--- value: [24565.00]
## | | | | |--- stroke > 3.36
## | | | | | |--- wheelbase <= 106.40
## | | | | | | |--- value: [18344.00]
## | | | | | |--- wheelbase > 106.40
## | | | | | | |--- curbweight <= 3224.50
## | | | | | | | |--- value: [13200.00]
## | | | | | | |--- curbweight > 3224.50
## | | | | | | | |--- wheelbase <= 111.05
## | | | | | | | | |--- value: [17425.00]
## | | | | | | | |--- wheelbase > 111.05
## | | | | | | | | |--- value: [17075.00]
## | | | |--- peakrpm > 4375.00
## | | | | |--- citympg <= 20.50
## | | | | | |--- peakrpm <= 5350.00
## | | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | | |--- highwaympg <= 24.50
## | | | | | | | | |--- carlength <= 175.80
## | | | | | | | | | |--- value: [12764.00]
## | | | | | | | | |--- carlength > 175.80
## | | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | | |--- boreratio <= 3.37
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- boreratio > 3.37
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | | |--- curbweight <= 3047.50
## | | | | | | | | | | | |--- value: [11900.00]
## | | | | | | | | | | |--- curbweight > 3047.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | |--- highwaympg > 24.50
## | | | | | | | | |--- wheelbase <= 96.85
## | | | | | | | | | |--- horsepower <= 157.00
## | | | | | | | | | | |--- value: [16500.00]
## | | | | | | | | | |--- horsepower > 157.00
## | | | | | | | | | | |--- value: [17199.00]
## | | | | | | | | |--- wheelbase > 96.85
## | | | | | | | | | |--- boreratio <= 3.60
## | | | | | | | | | | |--- value: [18399.00]
## | | | | | | | | | |--- boreratio > 3.60
## | | | | | | | | | | |--- value: [18280.00]
## | | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | | |--- curbweight <= 2879.50
## | | | | | | | | |--- stroke <= 3.88
## | | | | | | | | | |--- value: [12629.00]
## | | | | | | | | |--- stroke > 3.88
## | | | | | | | | | |--- value: [12964.00]
## | | | | | | | |--- curbweight > 2879.50
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- enginetype_ohcv <= 0.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | | | | | | | |--- enginetype_ohcv > 0.50
## | | | | | | | | | | |--- value: [14399.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- value: [13499.00]
## | | | | | |--- peakrpm > 5350.00
## | | | | | | |--- boreratio <= 3.77
## | | | | | | | |--- carheight <= 55.15
## | | | | | | | | |--- horsepower <= 137.50
## | | | | | | | | | |--- value: [17450.00]
## | | | | | | | | |--- horsepower > 137.50
## | | | | | | | | | |--- value: [17859.17]
## | | | | | | | |--- carheight > 55.15
## | | | | | | | | |--- stroke <= 3.14
## | | | | | | | | | |--- value: [18620.00]
## | | | | | | | | |--- stroke > 3.14
## | | | | | | | | | |--- value: [18150.00]
## | | | | | | |--- boreratio > 3.77
## | | | | | | | |--- value: [22018.00]
## | | | | |--- citympg > 20.50
## | | | | | |--- carlength <= 174.40
## | | | | | | |--- carlength <= 171.15
## | | | | | | | |--- value: [8778.00]
## | | | | | | |--- carlength > 171.15
## | | | | | | | |--- value: [11048.00]
## | | | | | |--- carlength > 174.40
## | | | | | | |--- curbweight <= 2736.00
## | | | | | | | |--- boreratio <= 3.10
## | | | | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | | | | |--- value: [13845.00]
## | | | | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | | | | |--- value: [15040.00]
## | | | | | | | |--- boreratio > 3.10
## | | | | | | | | |--- peakrpm <= 5025.00
## | | | | | | | | | |--- carbody_hardtop <= 0.50
## | | | | | | | | | | |--- value: [11549.00]
## | | | | | | | | | |--- carbody_hardtop > 0.50
## | | | | | | | | | | |--- value: [11199.00]
## | | | | | | | | |--- peakrpm > 5025.00
## | | | | | | | | | |--- compressionratio <= 9.31
## | | | | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | | | | |--- value: [12170.00]
## | | | | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | | | | |--- value: [12290.00]
## | | | | | | | | | |--- compressionratio > 9.31
## | | | | | | | | | | |--- value: [11850.00]
## | | | | | | |--- curbweight > 2736.00
## | | | | | | | |--- symboling <= 0.50
## | | | | | | | | |--- symboling <= -1.50
## | | | | | | | | | |--- value: [12940.00]
## | | | | | | | | |--- symboling > -1.50
## | | | | | | | | | |--- value: [13415.00]
## | | | | | | | |--- symboling > 0.50
## | | | | | | | | |--- highwaympg <= 29.00
## | | | | | | | | | |--- value: [15510.00]
## | | | | | | | | |--- highwaympg > 29.00
## | | | | | | | | | |--- value: [17669.00]
## | | |--- carwidth > 68.65
## | | | |--- curbweight <= 2983.00
## | | | | |--- curbweight <= 2953.00
## | | | | | |--- value: [16845.00]
## | | | | |--- curbweight > 2953.00
## | | | | | |--- value: [18920.00]
## | | | |--- curbweight > 2983.00
## | | | | |--- cylindernumber_five <= 0.50
## | | | | | |--- peakrpm <= 5450.00
## | | | | | | |--- value: [22625.00]
## | | | | | |--- peakrpm > 5450.00
## | | | | | | |--- value: [21485.00]
## | | | | |--- cylindernumber_five > 0.50
## | | | | | |--- value: [23875.00]
## |--- enginesize > 182.00
## | |--- citympg <= 14.50
## | | |--- symboling <= 0.50
## | | | |--- value: [40960.00]
## | | |--- symboling > 0.50
## | | | |--- value: [45400.00]
## | |--- citympg > 14.50
## | | |--- fuelsystem_mpfi <= 0.50
## | | | |--- carbody_sedan <= 0.50
## | | | | |--- value: [28176.00]
## | | | |--- carbody_sedan > 0.50
## | | | | |--- value: [25552.00]
## | | |--- fuelsystem_mpfi > 0.50
## | | | |--- curbweight <= 3373.00
## | | | | |--- curbweight <= 3015.00
## | | | | | |--- carbody_hardtop <= 0.50
## | | | | | | |--- value: [37028.00]
## | | | | | |--- carbody_hardtop > 0.50
## | | | | | | |--- value: [32528.00]
## | | | | |--- curbweight > 3015.00
## | | | | | |--- compressionratio <= 9.00
## | | | | | | |--- value: [30760.00]
## | | | | | |--- compressionratio > 9.00
## | | | | | | |--- value: [31400.50]
## | | | |--- curbweight > 3373.00
## | | | | |--- curbweight <= 3442.50
## | | | | | |--- value: [41315.00]
## | | | | |--- curbweight > 3442.50
## | | | | | |--- curbweight <= 3712.50
## | | | | | | |--- citympg <= 15.50
## | | | | | | | |--- value: [36880.00]
## | | | | | | |--- citympg > 15.50
## | | | | | | | |--- value: [35056.00]
## | | | | | |--- curbweight > 3712.50
## | | | | | | |--- cylindernumber_six <= 0.50
## | | | | | | | |--- value: [34184.00]
## | | | | | | |--- cylindernumber_six > 0.50
## | | | | | | | |--- value: [32250.00]
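The lines reading `truncated branch of depth …` in the listing above appear because `export_text` prints only the first 10 levels by default (`max_depth=10`), while this tree has 16. A minimal sketch on synthetic data (hypothetical values, not the car data) shows how the `max_depth` argument controls the truncation.

```python
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical toy data: one predictor, 16 distinct responses
X = [[i] for i in range(16)]
y = [i ** 2 for i in range(16)]

arbol = DecisionTreeRegressor(random_state=0).fit(X, y)

# Printing only the top levels collapses deeper splits into "truncated branch" lines
texto_corto = export_text(arbol, feature_names=["x"], max_depth=1)

# Setting max_depth to the depth of the tree prints every rule
texto_completo = export_text(arbol, feature_names=["x"], max_depth=arbol.get_depth())

print(texto_corto)
```

For the model above, passing `max_depth = modelo_ar.get_depth()` to `export_text` would list all 153 leaves, at the cost of a much longer output.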
importancia_predictores = pd.DataFrame(
{'predictor': datos_dummis.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 6.700701e-01
## 5 curbweight 2.090632e-01
## 12 citympg 3.271158e-02
## 10 horsepower 2.256340e-02
## 3 carwidth 1.295450e-02
## 40 fuelsystem_mpfi 1.292520e-02
## 11 peakrpm 1.215281e-02
## 8 stroke 5.464142e-03
## 2 carlength 3.342841e-03
## 0 symboling 3.178284e-03
## 1 wheelbase 2.586566e-03
## 21 drivewheel_fwd 2.458972e-03
## 7 boreratio 2.456716e-03
## 13 highwaympg 2.013865e-03
## 9 compressionratio 1.601364e-03
## 19 carbody_sedan 1.389110e-03
## 17 carbody_hardtop 9.970823e-04
## 4 carheight 8.295498e-04
## 18 carbody_hatchback 5.223021e-04
## 30 cylindernumber_five 2.161564e-04
## 22 drivewheel_rwd 1.868486e-04
## 32 cylindernumber_six 1.830626e-04
## 36 fuelsystem_2bbl 7.648682e-05
## 16 doornumber_two 2.851832e-05
## 15 aspiration_turbo 2.263104e-05
## 20 carbody_wagon 4.322900e-06
## 28 enginetype_ohcv 3.964347e-07
## 34 cylindernumber_twelve 0.000000e+00
## 39 fuelsystem_mfi 0.000000e+00
## 38 fuelsystem_idi 0.000000e+00
## 37 fuelsystem_4bbl 0.000000e+00
## 41 fuelsystem_spdi 0.000000e+00
## 35 cylindernumber_two 0.000000e+00
## 27 enginetype_ohcf 0.000000e+00
## 33 cylindernumber_three 0.000000e+00
## 31 cylindernumber_four 0.000000e+00
## 29 enginetype_rotor 0.000000e+00
## 26 enginetype_ohc 0.000000e+00
## 25 enginetype_l 0.000000e+00
## 24 enginetype_dohcv 0.000000e+00
## 23 enginelocation_rear 0.000000e+00
## 14 fueltype_gas 0.000000e+00
## 42 fuelsystem_spfi 0.000000e+00
According to the importance table above, the most important predictors for the regression tree model are enginesize, curbweight, citympg, horsepower, and carwidth. The first two alone account for roughly 88% of the total importance.
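The ranking above can be reproduced in a few lines. The following is only a minimal sketch on synthetic data: the columns `x0` to `x3`, the target, and the name `arbol` are hypothetical stand-ins, not the car price data or the `modelo_ar` object from this case. It shows how `feature_importances_` can be paired with column names and sorted to pick the top predictors.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical): only x0 and x1 drive the target
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x0", "x1", "x2", "x3"])
y = 5.0 * X["x0"] + 2.0 * X["x1"] + rng.normal(scale=0.1, size=200)

arbol = DecisionTreeRegressor(random_state=0).fit(X, y)

# Pair each column with its importance and sort from most to least important
importancia = (
    pd.Series(arbol.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
)

# Keep the names of the two most important predictors
top_2 = importancia.head(2).index.tolist()
print(importancia)
print("Top predictors:", top_2)
```

The importances are normalized to sum to 1, so each value can be read as the share of the total impurity reduction attributed to that predictor.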
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([14489., 6849., 13495., 17075., 18620., 12764., 13415., 40960.,
## 15998., 7053., 11900., 7898., 9549., 12290., 11248., 25552.,
## 9095., 20970., 15645., 22625., 15645., 32250., 8013., 11248.,
## 16845., 12940., 16695., 8013., 17450., 7775., 11048., 8495.,
## 6338., 16695., 28176., 32528., 22625., 8845., 16925., 9959.,
## 5118.])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 83 3 95.9 ... 14869.0 14489.0
## 91 1 94.5 ... 6649.0 6849.0
## 1 3 88.6 ... 16500.0 13495.0
## 110 0 114.2 ... 13860.0 17075.0
## 136 3 99.1 ... 18150.0 18620.0
## 105 3 91.3 ... 19699.0 12764.0
## 197 -1 104.3 ... 16515.0 13415.0
## 49 0 102.0 ... 36000.0 40960.0
## 181 -1 104.5 ... 15750.0 15998.0
## 140 2 93.3 ... 7603.0 7053.0
## 198 -2 104.3 ... 18420.0 11900.0
## 146 0 97.0 ... 7463.0 7898.0
## 99 0 97.2 ... 8949.0 9549.0
## 130 0 96.1 ... 9295.0 12290.0
## 145 0 97.0 ... 11259.0 11248.0
## 70 -1 115.6 ... 31600.0 25552.0
## 37 0 96.5 ... 7895.0 9095.0
## 13 0 101.2 ... 21105.0 20970.0
## 5 2 99.8 ... 15250.0 15645.0
## 203 -1 109.1 ... 22470.0 22625.0
## 131 2 96.1 ... 9895.0 15645.0
## 48 0 113.0 ... 35550.0 32250.0
## 174 -1 102.4 ... 10698.0 8013.0
## 41 0 96.5 ... 12945.0 11248.0
## 6 1 105.8 ... 17710.0 16845.0
## 196 -2 104.3 ... 15985.0 12940.0
## 199 -1 104.3 ... 18950.0 16695.0
## 63 0 98.8 ... 10795.0 8013.0
## 191 0 100.4 ... 13295.0 17450.0
## 184 2 97.3 ... 7995.0 7775.0
## 149 0 96.9 ... 11694.0 11048.0
## 62 0 98.8 ... 10245.0 8495.0
## 152 1 95.7 ... 6488.0 6338.0
## 109 0 114.2 ... 12440.0 16695.0
## 68 -1 110.0 ... 28248.0 28176.0
## 127 3 89.5 ... 34028.0 32528.0
## 201 -1 109.1 ... 19045.0 22625.0
## 61 1 98.8 ... 10595.0 8845.0
## 10 2 101.2 ... 16430.0 16925.0
## 87 1 96.3 ... 9279.0 9959.0
## 32 1 93.7 ... 5399.0 5118.0
##
## [41 rows x 45 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2825.9006679934546
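Or, equivalently, taking the square root of the mean squared error with numpy (this form does not depend on the `squared` argument, which recent scikit-learn versions no longer accept):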
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2825.9006679934546
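As a reminder of what this statistic measures, RMSE is the square root of the mean of the squared differences between actual and predicted prices. The sketch below computes it by hand for only the first five rows of the comparison table above (so the result is not the full validation RMSE) and checks it against scikit-learn. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First five (actual, predicted) pairs taken from the comparison table above
precio_real = np.array([14869.0, 6649.0, 16500.0, 13860.0, 18150.0])
precio_pred = np.array([14489.0, 6849.0, 13495.0, 17075.0, 18620.0])

# RMSE by hand: residuals -> squared -> mean -> square root
errores = precio_real - precio_pred
rmse_manual = np.sqrt(np.mean(errores ** 2))

# Same quantity through scikit-learn's mean squared error
rmse_sklearn = np.sqrt(mean_squared_error(precio_real, precio_pred))

print(rmse_manual, rmse_sklearn)
```

Because the errors are squared before averaging, a few large misses (such as the rows with errors above 3000) weigh more heavily on RMSE than many small ones.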
The random forest model (rf) is built with seed 2022 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 2022)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=2022)
# pending ... ...
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([14143.75 , 6759. , 12404.9 , 17453.5 , 17023.15 ,
## 17236.75835, 14484. , 38227.825 , 16158.2 , 7630.1 ,
## 16907.2 , 8574. , 9364.7 , 12355.35 , 10192.2 ,
## 27041.9 , 8560.5 , 18588.35 , 13638.4 , 19285. ,
## 12043.15 , 34599.9 , 10807.5 , 11082.1 , 18924.2 ,
## 14141.45 , 17467.2 , 9651.45 , 14919.9 , 7964.35 ,
## 13348.2 , 9444.55 , 6510.15 , 16810.6 , 26215.75 ,
## 34658.2 , 19839.55 , 9099.4 , 14358.15 , 9639.55 ,
## 5921.45 ])
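These predictions are not whole numbers like those of the single tree because a random forest regressor averages the predictions of its individual trees. The following minimal sketch illustrates this on synthetic data; `X`, `y`, `bosque`, and `X_nuevo` are hypothetical names, not the objects used in this case.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (hypothetical), not the car price data
rng = np.random.default_rng(2022)
X = rng.uniform(0, 10, size=(150, 3))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.5, size=150)

# Same settings as in the text: 20 trees, seed 2022
bosque = RandomForestRegressor(n_estimators=20, random_state=2022).fit(X, y)

X_nuevo = X[:5]

# One row of predictions per tree, then the average across trees
pred_por_arbol = np.stack([arbol.predict(X_nuevo) for arbol in bosque.estimators_])
promedio = pred_por_arbol.mean(axis=0)

# The forest's own prediction should match the average of its trees
pred_bosque = bosque.predict(X_nuevo)
print(np.allclose(promedio, pred_bosque))
```

Averaging over trees trained on different bootstrap samples tends to reduce variance, which is one plausible reason the forest's RMSE reported later is lower than that of the single tree.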
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 83 3 95.9 ... 14869.0 14143.75000
## 91 1 94.5 ... 6649.0 6759.00000
## 1 3 88.6 ... 16500.0 12404.90000
## 110 0 114.2 ... 13860.0 17453.50000
## 136 3 99.1 ... 18150.0 17023.15000
## 105 3 91.3 ... 19699.0 17236.75835
## 197 -1 104.3 ... 16515.0 14484.00000
## 49 0 102.0 ... 36000.0 38227.82500
## 181 -1 104.5 ... 15750.0 16158.20000
## 140 2 93.3 ... 7603.0 7630.10000
## 198 -2 104.3 ... 18420.0 16907.20000
## 146 0 97.0 ... 7463.0 8574.00000
## 99 0 97.2 ... 8949.0 9364.70000
## 130 0 96.1 ... 9295.0 12355.35000
## 145 0 97.0 ... 11259.0 10192.20000
## 70 -1 115.6 ... 31600.0 27041.90000
## 37 0 96.5 ... 7895.0 8560.50000
## 13 0 101.2 ... 21105.0 18588.35000
## 5 2 99.8 ... 15250.0 13638.40000
## 203 -1 109.1 ... 22470.0 19285.00000
## 131 2 96.1 ... 9895.0 12043.15000
## 48 0 113.0 ... 35550.0 34599.90000
## 174 -1 102.4 ... 10698.0 10807.50000
## 41 0 96.5 ... 12945.0 11082.10000
## 6 1 105.8 ... 17710.0 18924.20000
## 196 -2 104.3 ... 15985.0 14141.45000
## 199 -1 104.3 ... 18950.0 17467.20000
## 63 0 98.8 ... 10795.0 9651.45000
## 191 0 100.4 ... 13295.0 14919.90000
## 184 2 97.3 ... 7995.0 7964.35000
## 149 0 96.9 ... 11694.0 13348.20000
## 62 0 98.8 ... 10245.0 9444.55000
## 152 1 95.7 ... 6488.0 6510.15000
## 109 0 114.2 ... 12440.0 16810.60000
## 68 -1 110.0 ... 28248.0 26215.75000
## 127 3 89.5 ... 34028.0 34658.20000
## 201 -1 109.1 ... 19045.0 19839.55000
## 61 1 98.8 ... 10595.0 9099.40000
## 10 2 101.2 ... 16430.0 14358.15000
## 87 1 96.3 ... 9279.0 9639.55000
## 32 1 93.7 ... 5399.0 5921.45000
##
## [41 rows x 45 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 1949.9442508048328
Or, equivalently, taking the square root of the mean squared error with numpy:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 1949.9442508048328
The predictions of the three models are compared side by side.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 83 3 95.9 ... 14489.0 14143.75000
## 91 1 94.5 ... 6849.0 6759.00000
## 1 3 88.6 ... 13495.0 12404.90000
## 110 0 114.2 ... 17075.0 17453.50000
## 136 3 99.1 ... 18620.0 17023.15000
## 105 3 91.3 ... 12764.0 17236.75835
## 197 -1 104.3 ... 13415.0 14484.00000
## 49 0 102.0 ... 40960.0 38227.82500
## 181 -1 104.5 ... 15998.0 16158.20000
## 140 2 93.3 ... 7053.0 7630.10000
## 198 -2 104.3 ... 11900.0 16907.20000
## 146 0 97.0 ... 7898.0 8574.00000
## 99 0 97.2 ... 9549.0 9364.70000
## 130 0 96.1 ... 12290.0 12355.35000
## 145 0 97.0 ... 11248.0 10192.20000
## 70 -1 115.6 ... 25552.0 27041.90000
## 37 0 96.5 ... 9095.0 8560.50000
## 13 0 101.2 ... 20970.0 18588.35000
## 5 2 99.8 ... 15645.0 13638.40000
## 203 -1 109.1 ... 22625.0 19285.00000
## 131 2 96.1 ... 15645.0 12043.15000
## 48 0 113.0 ... 32250.0 34599.90000
## 174 -1 102.4 ... 8013.0 10807.50000
## 41 0 96.5 ... 11248.0 11082.10000
## 6 1 105.8 ... 16845.0 18924.20000
## 196 -2 104.3 ... 12940.0 14141.45000
## 199 -1 104.3 ... 16695.0 17467.20000
## 63 0 98.8 ... 8013.0 9651.45000
## 191 0 100.4 ... 17450.0 14919.90000
## 184 2 97.3 ... 7775.0 7964.35000
## 149 0 96.9 ... 11048.0 13348.20000
## 62 0 98.8 ... 8495.0 9444.55000
## 152 1 95.7 ... 6338.0 6510.15000
## 109 0 114.2 ... 16695.0 16810.60000
## 68 -1 110.0 ... 28176.0 26215.75000
## 127 3 89.5 ... 32528.0 34658.20000
## 201 -1 109.1 ... 22625.0 19839.55000
## 61 1 98.8 ... 8845.0 9099.40000
## 10 2 101.2 ... 16925.0 14358.15000
## 87 1 96.3 ... 9959.0 9639.55000
## 32 1 93.7 ... 5118.0 5921.45000
##
## [41 rows x 47 columns]
The RMSE values are compared.
A numpy array is created with the RMSE of each model.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2862.41714342, 2825.90066799, 1949.9442508 ]])
A DataFrame is built from the numpy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 2862.417143 2825.900668 1949.944251
The RMSE of the multiple linear regression model is 2862.417143.
The RMSE of the regression tree model is 2825.900668.
The RMSE of the random forest model is 1949.944251.
Given these RMSE values, the best predictor for these data in Python with seed 1279 is the random forest model, since it has the lowest RMSE; the regression tree and multiple linear regression perform similarly to each other and clearly worse. The random forest is also the best model when the same data are used with seed 2022.
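The comparison can also be done programmatically rather than by eye. The sketch below rebuilds the RMSE table from the values reported above and picks the model with the lowest error. The names `fila`, `mejor`, and `mejora_relativa` are illustrative, and the relative reduction is just one possible way to summarize the gap.

```python
import pandas as pd

# RMSE values as reported in the table above
rmse = pd.DataFrame(
    [[2862.417143, 2825.900668, 1949.944251]],
    columns=["rmse_rm", "rmse_ar", "rmse_rf"],
)

fila = rmse.iloc[0]

# Column name of the model with the lowest RMSE
mejor = fila.idxmin()

# Reduction in RMSE relative to the worst-performing model
mejora_relativa = 1 - fila / fila.max()

print(fila.sort_values())
print("Lowest RMSE:", mejor)
print(mejora_relativa.round(3))
```

Since all three values come from a single train/validation split of 41 validation rows, the ranking is best read as indicative for this split rather than as a general result.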