Compare supervised models by applying several algorithms to predict car prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part, apart from car_ID and CarName, which are dropped during preparation.
Training data are created from 80% of the observations.
Validation data are created from the remaining 20%.
The multiple linear regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison.
The regression tree model is built with the training data.
The importance of each variable for the price is identified.
The regression tree and its decision rules are displayed.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison.
The random forest model is built with the training data, using 20 trees.
The importance of each variable for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison.
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique id of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car: convertible, sedan, wagon … (Categorical) |
| 8 | drivewheel | Type of drive wheel: fwd, rwd, 4wd (Categorical) |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the car's engine (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Piston travel |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower (Numeric). Engine power |
| 23 | peakrpm | Peak engine rpm (Numeric). Peak revolutions per minute |
| 24 | citympg | Mileage in the city (Numeric). Fuel consumption |
| 25 | highwaympg | Mileage on the highway (Numeric). Fuel consumption |
| 26 | price (dependent variable) | Price of the car (Numeric), in dollars |
Remove the variables of no statistical interest, that is, drop columns 1 and 3: car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
Several variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber, fuelsystem.
Identify them and build a data set that includes the dummy variables.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? A categorical variable is encoded as several columns, one per category; each record gets a 1 in the column of the category it belongs to and a 0 in the others. With drop_first = True, as used below, the first category of each variable is dropped and acts as the baseline.
Example
| gender |
|---|
| MALE |
| FEMALE |
| MALE |
Same data with dummy variables
| gender_male | gender_female |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
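The encoding in the example can be reproduced with a short sketch. The `gender` column and its values are toy data in the spirit of the example above, not columns of the car data set, and `dtype=int` is an assumption added so that recent pandas versions print 0/1 instead of booleans.

```python
import pandas as pd

# Toy data in the spirit of the example above (hypothetical, not from the car data set)
ejemplo = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One indicator column per category; columns follow the sorted category order
completo = pd.get_dummies(ejemplo, dtype=int)
print(completo)

# drop_first=True drops the first category, which becomes the implicit baseline
reducido = pd.get_dummies(ejemplo, drop_first=True, dtype=int)
print(reducido)
```

Dropping one column per variable loses no information, because the dropped category is implied whenever all remaining indicators are 0; it also avoids perfectly collinear columns in a linear regression that includes an intercept.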
datos_dummis = pd.get_dummies (datos, drop_first = True)
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
The training data are 80% of the observations and the validation data the remaining 20%. The random seed is 1280.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80, random_state = 1280)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 68 -1 110.0 ... 0 0
## 135 2 99.1 ... 0 0
## 34 1 93.7 ... 0 0
## 186 2 97.3 ... 0 0
## 30 2 86.6 ... 0 0
## .. ... ... ... ... ...
## 173 -1 102.4 ... 0 0
## 49 0 102.0 ... 0 0
## 178 3 102.9 ... 0 0
## 3 2 99.8 ... 0 0
## 189 3 94.5 ... 0 0
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 128 3 89.5 ... 0 0
## 55 3 95.3 ... 0 0
## 14 1 103.5 ... 0 0
## 42 1 96.5 ... 0 0
## 88 -1 96.3 ... 1 0
## 15 0 103.5 ... 0 0
## 107 0 107.9 ... 0 0
## 194 -2 104.3 ... 0 0
## 64 0 98.8 ... 0 0
## 199 -1 104.3 ... 0 0
## 7 1 105.8 ... 0 0
## 133 2 99.1 ... 0 0
## 136 3 99.1 ... 0 0
## 59 1 98.8 ... 0 0
## 62 0 98.8 ... 0 0
## 17 0 110.0 ... 0 0
## 166 1 94.5 ... 0 0
## 12 0 101.2 ... 0 0
## 138 2 93.7 ... 0 0
## 6 1 105.8 ... 0 0
## 119 1 93.7 ... 1 0
## 74 1 112.0 ... 0 0
## 187 2 97.3 ... 0 0
## 160 0 95.7 ... 0 0
## 182 2 97.3 ... 0 0
## 171 2 98.4 ... 0 0
## 50 1 93.1 ... 0 0
## 177 -1 102.4 ... 0 0
## 191 0 100.4 ... 0 0
## 53 1 93.1 ... 0 0
## 111 0 107.9 ... 0 0
## 197 -1 104.3 ... 0 0
## 156 0 95.7 ... 0 0
## 132 3 99.1 ... 0 0
## 5 2 99.8 ... 0 0
## 54 1 93.1 ... 0 0
## 179 3 102.9 ... 0 0
## 184 2 97.3 ... 0 0
## 36 0 96.5 ... 0 0
## 4 2 99.4 ... 0 0
## 109 0 114.2 ... 0 0
##
## [41 rows x 43 columns]
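The row counts above (164 for training, 41 for validation) follow from applying an 80% split to 205 observations. The sketch below checks this on a synthetic stand-in frame; the columns `x` and `price` are placeholder values, and the helper `partir` is a hypothetical name introduced only for this illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same number of rows as the car data (205)
demo = pd.DataFrame({"x": np.arange(205), "price": np.arange(205) * 10.0})

def partir(df):
    # Same arguments as in the text: 80% for training, fixed seed 1280
    return train_test_split(
        df.drop(columns="price"), df["price"],
        train_size=0.80, random_state=1280
    )

X_a, X_b, y_a, y_b = partir(demo)
X_a2, X_b2, y_a2, y_b2 = partir(demo)

print(X_a.shape, X_b.shape)          # sizes of the two partitions
print(X_a.index.equals(X_a2.index))  # same seed, same rows
```

Fixing `random_state` makes the partition reproducible, which matters here because the RMSE of every model is compared on the same validation rows.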
The multiple linear regression model (rm) is built.
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()
Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
modelo_rm.coef_
## array([ 2.18001631e+02, 7.98408464e+01, -6.63545235e+01, 6.28381129e+02,
## 1.36576714e+01, 5.39647046e+00, 9.54415488e+01, -1.06219421e+03,
## -4.10493549e+03, -6.12985603e+02, 5.07080839e-01, 2.11049109e+00,
## -2.13646758e+02, 2.70003243e+02, -4.51807409e+03, 8.72823653e+02,
## 4.43428672e+02, -4.44671616e+03, -3.20702120e+03, -1.58917172e+03,
## -2.97871087e+03, 3.06190065e+02, 1.42642247e+03, 9.16799620e+03,
## -5.75689685e+03, -4.30457810e+02, 3.48215006e+03, 1.98214895e+03,
## -5.06620511e+03, -6.53695832e+02, -7.24395733e+03, -1.01233241e+04,
## -6.08995629e+03, -1.48239164e+03, -7.73314552e+03, -6.53695832e+02,
## -2.09754696e+02, -1.21645363e+03, 4.51807409e+03, -2.38228193e+03,
## -1.35841906e+02, -2.35237245e+03, -1.58999473e+03])
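A raw coefficient array is hard to read without the column names. One possible way to pair each coefficient with its predictor is sketched below on synthetic data; the two column names are borrowed from the data set only as labels, and the values and the relation used to build `y` are invented for the illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic predictors (labels borrowed from the data set, values invented)
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "enginesize": rng.normal(size=50),
    "curbweight": rng.normal(size=50),
})
# Exact linear relation with no noise, so the fit recovers 3, -2 and 5
y = 3.0 * X["enginesize"] - 2.0 * X["curbweight"] + 5.0

modelo = LinearRegression().fit(X, y)

# Label each coefficient with its column and sort by absolute size
coeficientes = pd.Series(modelo.coef_, index=X.columns)
coeficientes = coeficientes.sort_values(key=np.abs, ascending=False)
print(coeficientes)
print(modelo.intercept_)
```

Applied to the model in the text, the same pattern would use `modelo_rm.coef_` and `X_entrena.columns`. Coefficient sizes are only comparable across predictors when the predictors share a scale, so the ordering is a reading aid rather than a measure of importance.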
print(modelo_rm.score(X_entrena, Y_entrena)) # R-squared on the training data
## 0.9393250510068653
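The score above is the plain R-squared on the training data, while the question posed at the start asks for the adjusted value. A minimal sketch of the usual adjustment, \(1 - (1 - R^2)\frac{n - 1}{n - p - 1}\), follows. The function name `r2_ajustado` is hypothetical, and taking \(p = 43\) assumes every dummy column counts as a separate predictor, which may overstate \(p\) if some columns are redundant.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Values taken from the text: training R-squared, 164 training rows, 43 predictor columns
valor = r2_ajustado(0.9393250510068653, n=164, p=43)
print(valor)
```

With these inputs the result is roughly 0.918, a little below the unadjusted 0.939 because the penalty grows with the number of predictors. An alternative, not shown here, would be to fit the model with statsmodels (imported above as `sm`), whose OLS results report the adjusted R-squared and a p-value per coefficient, which is what the question about 90% confidence requires.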
predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1]) # [:-1] shows every prediction except the last one
## [37962.16085946 12731.50882386 22778.32764671 10547.10275963
## 9132.42155983 29764.87032954 13408.99438043 17347.64882652
## 9509.85869876 17787.94407715 21081.08784834 14123.74297246
## 11791.95807102 9930.84637472 11046.7611251 33432.38841051
## 7512.62175913 21025.50547975 7440.43701072 21877.01525524
## 6569.53908634 36878.91882902 9627.97435347 9136.91492541
## 10590.63109053 13673.07511573 4058.60356406 7887.45999625
## 17963.83757767 6695.27636089 17809.72158065 16681.76067389
## 8075.9024471 12961.52452907 18022.39589349 6669.14900262
## 20955.30111267 10163.39182974 6778.3411609 18789.86267249]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 37962.160859
## 55 3 95.3 ... 10945.0 12731.508824
## 14 1 103.5 ... 24565.0 22778.327647
## 42 1 96.5 ... 10345.0 10547.102760
## 88 -1 96.3 ... 9279.0 9132.421560
## 15 0 103.5 ... 30760.0 29764.870330
## 107 0 107.9 ... 11900.0 13408.994380
## 194 -2 104.3 ... 12940.0 17347.648827
## 64 0 98.8 ... 11245.0 9509.858699
## 199 -1 104.3 ... 18950.0 17787.944077
## 7 1 105.8 ... 18920.0 21081.087848
## 133 2 99.1 ... 12170.0 14123.742972
## 136 3 99.1 ... 18150.0 11791.958071
## 59 1 98.8 ... 8845.0 9930.846375
## 62 0 98.8 ... 10245.0 11046.761125
## 17 0 110.0 ... 36880.0 33432.388411
## 166 1 94.5 ... 9538.0 7512.621759
## 12 0 101.2 ... 20970.0 21025.505480
## 138 2 93.7 ... 5118.0 7440.437011
## 6 1 105.8 ... 17710.0 21877.015255
## 119 1 93.7 ... 7957.0 6569.539086
## 74 1 112.0 ... 45400.0 36878.918829
## 187 2 97.3 ... 9495.0 9627.974353
## 160 0 95.7 ... 7738.0 9136.914925
## 182 2 97.3 ... 7775.0 10590.631091
## 171 2 98.4 ... 11549.0 13673.075116
## 50 1 93.1 ... 5195.0 4058.603564
## 177 -1 102.4 ... 11248.0 7887.459996
## 191 0 100.4 ... 13295.0 17963.837578
## 53 1 93.1 ... 6695.0 6695.276361
## 111 0 107.9 ... 15580.0 17809.721581
## 197 -1 104.3 ... 16515.0 16681.760674
## 156 0 95.7 ... 6938.0 8075.902447
## 132 3 99.1 ... 11850.0 12961.524529
## 5 2 99.8 ... 15250.0 18022.395893
## 54 1 93.1 ... 7395.0 6669.149003
## 179 3 102.9 ... 15998.0 20955.301113
## 184 2 97.3 ... 7995.0 10163.391830
## 36 0 96.5 ... 7295.0 6778.341161
## 4 2 99.4 ... 17450.0 18789.862672
## 109 0 114.2 ... 12440.0 12873.501508
##
## [41 rows x 45 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 2679.906783891391
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2679.906783891391
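Both calls above compute \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). As a sketch of what happens underneath, the same quantity can be written with NumPy alone; the function name `rmse` is introduced here for illustration, and the three price pairs are copied from the first rows of the comparison table, so the printed value is not the RMSE of the full validation set.

```python
import numpy as np

def rmse(y_real, y_pred):
    """Square root of the mean of the squared differences."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_real - y_pred) ** 2)))

# First three rows of the comparison table (real price, predicted price)
reales = [37028.0, 10945.0, 24565.0]
predichos = [37962.160859, 12731.508824, 22778.327647]
print(rmse(reales, predichos))
```

Newer scikit-learn releases deprecate the `squared` argument of `mean_squared_error`, so a manual version like this one, or the `np.sqrt(...)` form already used above, is a version-independent fallback. Because RMSE is expressed in the units of the target, the value of about 2680 reads directly as dollars of typical error.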
The regression tree model (ar) is built.
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 1280
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1280)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 15
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 154
#plot = plot_tree(
# decision_tree = modelo_ar,
# feature_names = list(datos_dummis.drop(columns = "price").columns),
# filled = True,
# impurity = False,
# fontsize = 10,
# precision = 2,
# ax = ax
# )
#plot
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2544.00
## | | |--- horsepower <= 89.00
## | | | |--- curbweight <= 2121.00
## | | | | |--- horsepower <= 68.50
## | | | | | |--- curbweight <= 1987.00
## | | | | | | |--- highwaympg <= 38.50
## | | | | | | | |--- curbweight <= 1924.50
## | | | | | | | | |--- curbweight <= 1902.50
## | | | | | | | | | |--- boreratio <= 3.00
## | | | | | | | | | | |--- value: [6377.00]
## | | | | | | | | | |--- boreratio > 3.00
## | | | | | | | | | | |--- value: [6095.00]
## | | | | | | | | |--- curbweight > 1902.50
## | | | | | | | | | |--- value: [6795.00]
## | | | | | | | |--- curbweight > 1924.50
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- value: [6189.00]
## | | | | | | |--- highwaympg > 38.50
## | | | | | | | |--- citympg <= 48.00
## | | | | | | | | |--- wheelbase <= 91.05
## | | | | | | | | | |--- value: [5151.00]
## | | | | | | | | |--- wheelbase > 91.05
## | | | | | | | | | |--- enginesize <= 91.00
## | | | | | | | | | | |--- carlength <= 153.65
## | | | | | | | | | | | |--- value: [5399.00]
## | | | | | | | | | | |--- carlength > 153.65
## | | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | | |--- enginesize > 91.00
## | | | | | | | | | | |--- stroke <= 3.13
## | | | | | | | | | | | |--- value: [5348.00]
## | | | | | | | | | | |--- stroke > 3.13
## | | | | | | | | | | | |--- value: [5389.00]
## | | | | | | | |--- citympg > 48.00
## | | | | | | | | |--- value: [6479.00]
## | | | | | |--- curbweight > 1987.00
## | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | |--- carlength <= 166.30
## | | | | | | | | |--- horsepower <= 61.50
## | | | | | | | | | |--- value: [7099.00]
## | | | | | | | | |--- horsepower > 61.50
## | | | | | | | | | |--- value: [7150.50]
## | | | | | | | |--- carlength > 166.30
## | | | | | | | | |--- value: [6692.00]
## | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | |--- compressionratio <= 9.20
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- value: [6488.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- value: [6338.00]
## | | | | | | | |--- compressionratio > 9.20
## | | | | | | | | |--- value: [6669.00]
## | | | | |--- horsepower > 68.50
## | | | | | |--- carheight <= 53.60
## | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | |--- carlength <= 157.35
## | | | | | | | | |--- value: [8916.50]
## | | | | | | | |--- carlength > 157.35
## | | | | | | | | |--- enginesize <= 93.50
## | | | | | | | | | |--- value: [6575.00]
## | | | | | | | | |--- enginesize > 93.50
## | | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | | |--- curbweight <= 2030.50
## | | | | | | | | | | | |--- value: [7349.00]
## | | | | | | | | | | |--- curbweight > 2030.50
## | | | | | | | | | | | |--- value: [7999.00]
## | | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | | |--- value: [8249.00]
## | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | |--- curbweight <= 1948.00
## | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | |--- horsepower <= 73.00
## | | | | | | | | | | |--- value: [6295.00]
## | | | | | | | | | |--- horsepower > 73.00
## | | | | | | | | | | |--- value: [6529.00]
## | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | |--- value: [6855.00]
## | | | | | | | |--- curbweight > 1948.00
## | | | | | | | | |--- citympg <= 30.50
## | | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | | |--- value: [7198.00]
## | | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | | |--- value: [7129.00]
## | | | | | | | | |--- citympg > 30.50
## | | | | | | | | | |--- value: [7799.00]
## | | | | | |--- carheight > 53.60
## | | | | | | |--- curbweight <= 1903.50
## | | | | | | | |--- value: [5499.00]
## | | | | | | |--- curbweight > 1903.50
## | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- value: [6649.00]
## | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | |--- citympg <= 28.00
## | | | | | | | | | |--- value: [7053.00]
## | | | | | | | | |--- citympg > 28.00
## | | | | | | | | | |--- symboling <= 0.50
## | | | | | | | | | | |--- value: [7295.00]
## | | | | | | | | | |--- symboling > 0.50
## | | | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | | | |--- value: [7499.00]
## | | | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | | | |--- value: [7299.00]
## | | | |--- curbweight > 2121.00
## | | | | |--- carlength <= 174.10
## | | | | | |--- carwidth <= 63.90
## | | | | | | |--- highwaympg <= 30.00
## | | | | | | | |--- value: [6785.00]
## | | | | | | |--- highwaympg > 30.00
## | | | | | | | |--- citympg <= 29.00
## | | | | | | | | |--- horsepower <= 67.50
## | | | | | | | | | |--- value: [7898.00]
## | | | | | | | | |--- horsepower > 67.50
## | | | | | | | | | |--- value: [7603.00]
## | | | | | | | |--- citympg > 29.00
## | | | | | | | | |--- carlength <= 168.50
## | | | | | | | | | |--- value: [7609.00]
## | | | | | | | | |--- carlength > 168.50
## | | | | | | | | | |--- value: [6918.00]
## | | | | | |--- carwidth > 63.90
## | | | | | | |--- highwaympg <= 27.00
## | | | | | | | |--- value: [9233.00]
## | | | | | | |--- highwaympg > 27.00
## | | | | | | | |--- boreratio <= 3.23
## | | | | | | | | |--- curbweight <= 2282.00
## | | | | | | | | | |--- carlength <= 166.90
## | | | | | | | | | | |--- curbweight <= 2131.00
## | | | | | | | | | | | |--- value: [8358.00]
## | | | | | | | | | | |--- curbweight > 2131.00
## | | | | | | | | | | | |--- value: [9258.00]
## | | | | | | | | | |--- carlength > 166.90
## | | | | | | | | | | |--- curbweight <= 2255.50
## | | | | | | | | | | | |--- truncated branch of depth 5
## | | | | | | | | | | |--- curbweight > 2255.50
## | | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | | |--- curbweight > 2282.00
## | | | | | | | | | |--- value: [9095.00]
## | | | | | | | |--- boreratio > 3.23
## | | | | | | | | |--- carheight <= 50.50
## | | | | | | | | | |--- value: [8499.00]
## | | | | | | | | |--- carheight > 50.50
## | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | |--- peakrpm <= 4650.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- peakrpm > 4650.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | |--- wheelbase <= 96.60
## | | | | | | | | | | | |--- value: [8189.00]
## | | | | | | | | | | |--- wheelbase > 96.60
## | | | | | | | | | | | |--- value: [8013.00]
## | | | | |--- carlength > 174.10
## | | | | | |--- compressionratio <= 15.75
## | | | | | | |--- carheight <= 54.80
## | | | | | | | |--- curbweight <= 2338.00
## | | | | | | | | |--- value: [8845.00]
## | | | | | | | |--- curbweight > 2338.00
## | | | | | | | | |--- enginesize <= 116.00
## | | | | | | | | | |--- value: [10295.00]
## | | | | | | | | |--- enginesize > 116.00
## | | | | | | | | | |--- value: [10595.00]
## | | | | | | |--- carheight > 54.80
## | | | | | | | |--- carheight <= 57.65
## | | | | | | | | |--- value: [8495.00]
## | | | | | | | |--- carheight > 57.65
## | | | | | | | | |--- value: [8921.00]
## | | | | | |--- compressionratio > 15.75
## | | | | | | |--- carlength <= 176.70
## | | | | | | | |--- value: [10698.00]
## | | | | | | |--- carlength > 176.70
## | | | | | | | |--- value: [10795.00]
## | | |--- horsepower > 89.00
## | | | |--- peakrpm <= 5650.00
## | | | | |--- carwidth <= 63.90
## | | | | | |--- carheight <= 50.70
## | | | | | | |--- value: [8558.00]
## | | | | | |--- carheight > 50.70
## | | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | | |--- value: [7689.00]
## | | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | | |--- value: [7957.00]
## | | | | |--- carwidth > 63.90
## | | | | | |--- stroke <= 3.43
## | | | | | | |--- carlength <= 175.05
## | | | | | | | |--- carlength <= 162.50
## | | | | | | | | |--- value: [11595.00]
## | | | | | | | |--- carlength > 162.50
## | | | | | | | | |--- compressionratio <= 8.10
## | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | |--- compressionratio > 8.10
## | | | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | | | |--- carlength <= 171.85
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- carlength > 171.85
## | | | | | | | | | | | |--- value: [9960.00]
## | | | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | | | |--- value: [10198.00]
## | | | | | | |--- carlength > 175.05
## | | | | | | | |--- value: [13950.00]
## | | | | | |--- stroke > 3.43
## | | | | | | |--- curbweight <= 2538.00
## | | | | | | | |--- curbweight <= 2408.50
## | | | | | | | | |--- symboling <= 2.00
## | | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | | |--- value: [9549.00]
## | | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | | |--- enginesize <= 115.00
## | | | | | | | | | | | |--- value: [9279.00]
## | | | | | | | | | | |--- enginesize > 115.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- symboling > 2.00
## | | | | | | | | | |--- value: [9959.00]
## | | | | | | | |--- curbweight > 2408.50
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- enginesize <= 127.00
## | | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | | | |--- enginesize > 127.00
## | | | | | | | | | | | |--- value: [9895.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- value: [10898.00]
## | | | | | | |--- curbweight > 2538.00
## | | | | | | | |--- value: [8449.00]
## | | | |--- peakrpm > 5650.00
## | | | | |--- curbweight <= 2382.50
## | | | | | |--- stroke <= 3.17
## | | | | | | |--- value: [9298.00]
## | | | | | |--- stroke > 3.17
## | | | | | | |--- value: [11845.00]
## | | | | |--- curbweight > 2382.50
## | | | | | |--- boreratio <= 3.41
## | | | | | | |--- horsepower <= 118.00
## | | | | | | | |--- carlength <= 172.20
## | | | | | | | | |--- value: [13645.00]
## | | | | | | | |--- carlength > 172.20
## | | | | | | | | |--- value: [12945.00]
## | | | | | | |--- horsepower > 118.00
## | | | | | | | |--- value: [15645.00]
## | | | | | |--- boreratio > 3.41
## | | | | | | |--- symboling <= 1.00
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- symboling > 1.00
## | | | | | | | |--- value: [16430.00]
## | |--- curbweight > 2544.00
## | | |--- carwidth <= 67.45
## | | | |--- carbody_sedan <= 0.50
## | | | | |--- horsepower <= 100.00
## | | | | | |--- carheight <= 55.15
## | | | | | | |--- carheight <= 53.25
## | | | | | | | |--- value: [11048.00]
## | | | | | | |--- carheight > 53.25
## | | | | | | | |--- value: [12290.00]
## | | | | | |--- carheight > 55.15
## | | | | | | |--- peakrpm <= 4950.00
## | | | | | | | |--- value: [8778.00]
## | | | | | | |--- peakrpm > 4950.00
## | | | | | | | |--- value: [9295.00]
## | | | | |--- horsepower > 100.00
## | | | | | |--- boreratio <= 3.52
## | | | | | | |--- horsepower <= 153.00
## | | | | | | | |--- enginesize <= 155.50
## | | | | | | | | |--- compressionratio <= 9.15
## | | | | | | | | | |--- value: [14997.50]
## | | | | | | | | |--- compressionratio > 9.15
## | | | | | | | | | |--- value: [15040.00]
## | | | | | | | |--- enginesize > 155.50
## | | | | | | | | |--- value: [14399.00]
## | | | | | | |--- horsepower > 153.00
## | | | | | | | |--- enginesize <= 156.50
## | | | | | | | | |--- value: [16500.00]
## | | | | | | | |--- enginesize > 156.50
## | | | | | | | | |--- value: [15750.00]
## | | | | | |--- boreratio > 3.52
## | | | | | | |--- curbweight <= 2877.00
## | | | | | | | |--- carlength <= 173.40
## | | | | | | | | |--- fuelsystem_spdi <= 0.50
## | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | |--- fuelsystem_spdi > 0.50
## | | | | | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | | | | | |--- value: [12764.00]
## | | | | | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | | | | | |--- value: [12629.00]
## | | | | | | | |--- carlength > 173.40
## | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | |--- horsepower <= 113.50
## | | | | | | | | | | |--- value: [11694.00]
## | | | | | | | | | |--- horsepower > 113.50
## | | | | | | | | | | |--- value: [11199.00]
## | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | |--- value: [9989.00]
## | | | | | | |--- curbweight > 2877.00
## | | | | | | | |--- peakrpm <= 4900.00
## | | | | | | | | |--- value: [17669.00]
## | | | | | | | |--- peakrpm > 4900.00
## | | | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | | | |--- value: [13415.00]
## | | | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | |--- carbody_sedan > 0.50
## | | | | |--- carlength <= 178.50
## | | | | | |--- horsepower <= 120.50
## | | | | | | |--- stroke <= 3.40
## | | | | | | | |--- value: [18280.00]
## | | | | | | |--- stroke > 3.40
## | | | | | | | |--- value: [18344.00]
## | | | | | |--- horsepower > 120.50
## | | | | | | |--- value: [21105.00]
## | | | | |--- carlength > 178.50
## | | | | | |--- horsepower <= 158.00
## | | | | | | |--- carlength <= 185.60
## | | | | | | | |--- stroke <= 3.34
## | | | | | | | | |--- value: [13499.00]
## | | | | | | | |--- stroke > 3.34
## | | | | | | | | |--- value: [13845.00]
## | | | | | | |--- carlength > 185.60
## | | | | | | | |--- peakrpm <= 5325.00
## | | | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | | | |--- value: [15690.00]
## | | | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | | | |--- value: [15510.00]
## | | | | | | | |--- peakrpm > 5325.00
## | | | | | | | | |--- value: [15985.00]
## | | | | | |--- horsepower > 158.00
## | | | | | | |--- highwaympg <= 24.00
## | | | | | | | |--- value: [18420.00]
## | | | | | | |--- highwaympg > 24.00
## | | | | | | | |--- value: [18620.00]
## | | |--- carwidth > 67.45
## | | | |--- carwidth <= 68.85
## | | | | |--- peakrpm <= 5100.00
## | | | | | |--- curbweight <= 3224.50
## | | | | | | |--- fueltype_gas <= 0.50
## | | | | | | | |--- value: [13200.00]
## | | | | | | |--- fueltype_gas > 0.50
## | | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | | |--- value: [16630.00]
## | | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | | |--- value: [16503.00]
## | | | | | |--- curbweight > 3224.50
## | | | | | | |--- carheight <= 57.70
## | | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | | |--- value: [16695.00]
## | | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | | |--- value: [17425.00]
## | | | | | | |--- carheight > 57.70
## | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | |--- value: [13860.00]
## | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | |--- value: [17075.00]
## | | | | |--- peakrpm > 5100.00
## | | | | | |--- boreratio <= 3.86
## | | | | | | |--- compressionratio <= 8.85
## | | | | | | | |--- compressionratio <= 7.40
## | | | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | | | |--- value: [18150.00]
## | | | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | | | |--- value: [17859.17]
## | | | | | | | |--- compressionratio > 7.40
## | | | | | | | | |--- compressionratio <= 8.25
## | | | | | | | | | |--- value: [19699.00]
## | | | | | | | | |--- compressionratio > 8.25
## | | | | | | | | | |--- value: [19045.00]
## | | | | | | |--- compressionratio > 8.85
## | | | | | | | |--- symboling <= 2.00
## | | | | | | | | |--- value: [18399.00]
## | | | | | | | |--- symboling > 2.00
## | | | | | | | | |--- enginesize <= 176.00
## | | | | | | | | | |--- value: [16558.00]
## | | | | | | | | |--- enginesize > 176.00
## | | | | | | | | | |--- value: [17199.00]
## | | | | | |--- boreratio > 3.86
## | | | | | | |--- value: [22018.00]
## | | | |--- carwidth > 68.85
## | | | | |--- highwaympg <= 27.50
## | | | | | |--- horsepower <= 137.00
## | | | | | | |--- horsepower <= 124.00
## | | | | | | | |--- horsepower <= 110.00
## | | | | | | | | |--- value: [22470.00]
## | | | | | | | |--- horsepower > 110.00
## | | | | | | | | |--- value: [22625.00]
## | | | | | | |--- horsepower > 124.00
## | | | | | | | |--- value: [21485.00]
## | | | | | |--- horsepower > 137.00
## | | | | | | |--- value: [23875.00]
## | | | | |--- highwaympg > 27.50
## | | | | | |--- value: [16845.00]
## |--- enginesize > 182.00
## | |--- compressionratio <= 8.05
## | | |--- doornumber_two <= 0.50
## | | | |--- value: [40960.00]
## | | |--- doornumber_two > 0.50
## | | | |--- value: [41315.00]
## | |--- compressionratio > 8.05
## | | |--- enginesize <= 188.50
## | | | |--- carwidth <= 71.00
## | | | | |--- carbody_sedan <= 0.50
## | | | | | |--- doornumber_two <= 0.50
## | | | | | | |--- value: [28248.00]
## | | | | | |--- doornumber_two > 0.50
## | | | | | | |--- value: [28176.00]
## | | | | |--- carbody_sedan > 0.50
## | | | | | |--- value: [25552.00]
## | | | |--- carwidth > 71.00
## | | | | |--- value: [31600.00]
## | | |--- enginesize > 188.50
## | | | |--- enginesize <= 218.50
## | | | | |--- highwaympg <= 26.50
## | | | | | |--- value: [33278.00]
## | | | | |--- highwaympg > 26.50
## | | | | | |--- value: [31400.50]
## | | | |--- enginesize > 218.50
## | | | | |--- doornumber_two <= 0.50
## | | | | | |--- highwaympg <= 18.50
## | | | | | | |--- value: [34184.00]
## | | | | | |--- highwaympg > 18.50
## | | | | | | |--- value: [33900.00]
## | | | | |--- doornumber_two > 0.50
## | | | | | |--- citympg <= 14.50
## | | | | | | |--- value: [36000.00]
## | | | | | |--- citympg > 14.50
## | | | | | | |--- value: [35056.00]
importancia_predictores = pd.DataFrame(
    {'predictor': datos_dummis.drop(columns = "price").columns,
     'importancia': modelo_ar.feature_importances_}
)
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 6.665171e-01
## 5 curbweight 2.153374e-01
## 10 horsepower 3.227645e-02
## 3 carwidth 2.904430e-02
## 9 compressionratio 1.557212e-02
## 11 peakrpm 1.209516e-02
## 19 carbody_sedan 8.951184e-03
## 2 carlength 7.580508e-03
## 7 boreratio 4.129452e-03
## 13 highwaympg 3.371561e-03
## 4 carheight 1.641767e-03
## 8 stroke 1.219705e-03
## 14 fueltype_gas 7.800696e-04
## 18 carbody_hatchback 5.339940e-04
## 16 doornumber_two 3.286578e-04
## 0 symboling 2.359916e-04
## 12 citympg 2.081932e-04
## 15 aspiration_turbo 1.474810e-04
## 1 wheelbase 9.602660e-06
## 26 enginetype_ohc 6.038975e-06
## 41 fuelsystem_spdi 4.925198e-06
## 20 carbody_wagon 3.736431e-06
## 40 fuelsystem_mpfi 3.707720e-06
## 21 drivewheel_fwd 9.408164e-07
## 36 fuelsystem_2bbl 5.162230e-11
## 24 enginetype_dohcv 0.000000e+00
## 34 cylindernumber_twelve 0.000000e+00
## 17 carbody_hardtop 0.000000e+00
## 22 drivewheel_rwd 0.000000e+00
## 39 fuelsystem_mfi 0.000000e+00
## 38 fuelsystem_idi 0.000000e+00
## 37 fuelsystem_4bbl 0.000000e+00
## 35 cylindernumber_two 0.000000e+00
## 33 cylindernumber_three 0.000000e+00
## 25 enginetype_l 0.000000e+00
## 32 cylindernumber_six 0.000000e+00
## 31 cylindernumber_four 0.000000e+00
## 30 cylindernumber_five 0.000000e+00
## 29 enginetype_rotor 0.000000e+00
## 28 enginetype_ohcv 0.000000e+00
## 27 enginetype_ohcf 0.000000e+00
## 23 enginelocation_rear 0.000000e+00
## 42 fuelsystem_spfi 0.000000e+00
According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth, and compressionratio.
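As a rough check on how concentrated the tree's importance is, the sketch below accumulates the five largest values from the printed table. The `top5` frame and the `acumulada` column are illustrative names that do not appear in the original notebook; the numbers are copied from the output above.

```python
import pandas as pd

# Five largest importances, copied from the table printed above.
top5 = pd.DataFrame({
    "predictor": ["enginesize", "curbweight", "horsepower",
                  "carwidth", "compressionratio"],
    "importancia": [6.665171e-01, 2.153374e-01, 3.227645e-02,
                    2.904430e-02, 1.557212e-02],
})

# Running total: share of the tree's total importance covered so far.
top5["acumulada"] = top5["importancia"].cumsum()
print(top5)
```

On these figures, enginesize and curbweight alone cover about 88% of the total importance, and the five predictors together cover roughly 96%.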
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([33278., 11845., 15510., 9549., 9279., 40960., 16630., 15985.,
## 8495., 14489., 22470., 15510., 9989., 10595., 8495., 40960.,
## 9298., 21105., 7299., 22470., 7689., 41315., 9095., 7999.,
## 8495., 9989., 6095., 9988., 13845., 6229., 16630., 13415.,
## 7999., 9989., 13950., 6229., 16558., 8495., 7295., 18280.,
## 13860.])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 33278.0
## 55 3 95.3 ... 10945.0 11845.0
## 14 1 103.5 ... 24565.0 15510.0
## 42 1 96.5 ... 10345.0 9549.0
## 88 -1 96.3 ... 9279.0 9279.0
## 15 0 103.5 ... 30760.0 40960.0
## 107 0 107.9 ... 11900.0 16630.0
## 194 -2 104.3 ... 12940.0 15985.0
## 64 0 98.8 ... 11245.0 8495.0
## 199 -1 104.3 ... 18950.0 14489.0
## 7 1 105.8 ... 18920.0 22470.0
## 133 2 99.1 ... 12170.0 15510.0
## 136 3 99.1 ... 18150.0 9989.0
## 59 1 98.8 ... 8845.0 10595.0
## 62 0 98.8 ... 10245.0 8495.0
## 17 0 110.0 ... 36880.0 40960.0
## 166 1 94.5 ... 9538.0 9298.0
## 12 0 101.2 ... 20970.0 21105.0
## 138 2 93.7 ... 5118.0 7299.0
## 6 1 105.8 ... 17710.0 22470.0
## 119 1 93.7 ... 7957.0 7689.0
## 74 1 112.0 ... 45400.0 41315.0
## 187 2 97.3 ... 9495.0 9095.0
## 160 0 95.7 ... 7738.0 7999.0
## 182 2 97.3 ... 7775.0 8495.0
## 171 2 98.4 ... 11549.0 9989.0
## 50 1 93.1 ... 5195.0 6095.0
## 177 -1 102.4 ... 11248.0 9988.0
## 191 0 100.4 ... 13295.0 13845.0
## 53 1 93.1 ... 6695.0 6229.0
## 111 0 107.9 ... 15580.0 16630.0
## 197 -1 104.3 ... 16515.0 13415.0
## 156 0 95.7 ... 6938.0 7999.0
## 132 3 99.1 ... 11850.0 9989.0
## 5 2 99.8 ... 15250.0 13950.0
## 54 1 93.1 ... 7395.0 6229.0
## 179 3 102.9 ... 15998.0 16558.0
## 184 2 97.3 ... 7995.0 8495.0
## 36 0 96.5 ... 7295.0 7295.0
## 4 2 99.4 ... 17450.0 18280.0
## 109 0 114.2 ... 12440.0 13860.0
##
## [41 rows x 45 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 3297.2461458528483
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 3297.2461458528483
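Both calls above compute the same quantity, the square root of the mean of the squared differences between real and predicted prices. The sketch below writes that out with NumPy only; the helper `rmse` and the variable names are hypothetical, and the five value pairs are the first five rows of the comparison table printed above. Recent scikit-learn releases deprecate `squared=False` in favor of `sklearn.metrics.root_mean_squared_error`, so a NumPy-only version can serve as a version-independent fallback.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# First five validation rows from the comparison table above
# (Precio_Real vs. Precio_Prediccion of the regression tree).
reales = [37028.0, 10945.0, 24565.0, 10345.0, 9279.0]
predichos = [33278.0, 11845.0, 15510.0, 9549.0, 9279.0]

rmse_cinco = rmse(reales, predichos)
print(f"RMSE on five rows: {rmse_cinco:.2f}")
```

In the notebook itself, `rmse(Y_valida, predicciones_ar)` would be expected to match the value reported above, assuming `Y_valida` and `predicciones_ar` are one-dimensional and of equal length.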
The random forest model (rf) is built with seed 1280 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1280)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1280)
# pending ... ...
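The variable-importance step for the random forest is left pending here. A minimal sketch of how it could be done follows, reusing the table layout from the regression tree. It runs on synthetic data, because the notebook's objects are not available in a standalone block; `X_demo`, `y_demo`, `modelo_demo`, and `importancia_rf` are hypothetical names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for X_entrena / Y_entrena (assumption, not the car data).
rng = np.random.default_rng(1280)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x0", "x1", "x2"])
y_demo = 100 * X_demo["x0"] + 10 * X_demo["x1"] + rng.normal(size=200)

# Same settings as the notebook's forest: 20 trees, seed 1280.
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1280)
modelo_demo.fit(X_demo, y_demo)

# Same table layout used for the regression tree above.
importancia_rf = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_rf)
```

With the notebook's own objects, the same pattern would presumably use `X_entrena.columns` and `modelo_rf.feature_importances_`, assuming `X_entrena` is a pandas DataFrame.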
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([31863.775 , 12755. , 17206.7 , 8890. ,
## 9388. , 37686.5 , 16791.75 , 15437.95 ,
## 9825. , 17011.15 , 20134.6 , 14415.25 ,
## 16768.60835 , 10256.75 , 9757.15 , 38117.43571429,
## 10736.55 , 18657.41666667, 7125.55 , 20777.25 ,
## 7964.3 , 39006.63571429, 8054.7 , 7736.85 ,
## 7299.5 , 12614.1 , 6105.65 , 10292.6 ,
## 14880.25 , 6664.8 , 16676.4 , 14416.5 ,
## 7775.6 , 14035.25 , 14001.575 , 6723.76666667,
## 17296.75 , 7299.5 , 7737.33333333, 16814.3 ,
## 16138.75 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 31863.775000
## 55 3 95.3 ... 10945.0 12755.000000
## 14 1 103.5 ... 24565.0 17206.700000
## 42 1 96.5 ... 10345.0 8890.000000
## 88 -1 96.3 ... 9279.0 9388.000000
## 15 0 103.5 ... 30760.0 37686.500000
## 107 0 107.9 ... 11900.0 16791.750000
## 194 -2 104.3 ... 12940.0 15437.950000
## 64 0 98.8 ... 11245.0 9825.000000
## 199 -1 104.3 ... 18950.0 17011.150000
## 7 1 105.8 ... 18920.0 20134.600000
## 133 2 99.1 ... 12170.0 14415.250000
## 136 3 99.1 ... 18150.0 16768.608350
## 59 1 98.8 ... 8845.0 10256.750000
## 62 0 98.8 ... 10245.0 9757.150000
## 17 0 110.0 ... 36880.0 38117.435714
## 166 1 94.5 ... 9538.0 10736.550000
## 12 0 101.2 ... 20970.0 18657.416667
## 138 2 93.7 ... 5118.0 7125.550000
## 6 1 105.8 ... 17710.0 20777.250000
## 119 1 93.7 ... 7957.0 7964.300000
## 74 1 112.0 ... 45400.0 39006.635714
## 187 2 97.3 ... 9495.0 8054.700000
## 160 0 95.7 ... 7738.0 7736.850000
## 182 2 97.3 ... 7775.0 7299.500000
## 171 2 98.4 ... 11549.0 12614.100000
## 50 1 93.1 ... 5195.0 6105.650000
## 177 -1 102.4 ... 11248.0 10292.600000
## 191 0 100.4 ... 13295.0 14880.250000
## 53 1 93.1 ... 6695.0 6664.800000
## 111 0 107.9 ... 15580.0 16676.400000
## 197 -1 104.3 ... 16515.0 14416.500000
## 156 0 95.7 ... 6938.0 7775.600000
## 132 3 99.1 ... 11850.0 14035.250000
## 5 2 99.8 ... 15250.0 14001.575000
## 54 1 93.1 ... 7395.0 6723.766667
## 179 3 102.9 ... 15998.0 17296.750000
## 184 2 97.3 ... 7995.0 7299.500000
## 36 0 96.5 ... 7295.0 7737.333333
## 4 2 99.4 ... 17450.0 16814.300000
## 109 0 114.2 ... 12440.0 16138.750000
##
## [41 rows x 45 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2616.3578144633602
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2616.3578144633602
The predictions of the three models are compared.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 128 3 89.5 ... 33278.0 31863.775000
## 55 3 95.3 ... 11845.0 12755.000000
## 14 1 103.5 ... 15510.0 17206.700000
## 42 1 96.5 ... 9549.0 8890.000000
## 88 -1 96.3 ... 9279.0 9388.000000
## 15 0 103.5 ... 40960.0 37686.500000
## 107 0 107.9 ... 16630.0 16791.750000
## 194 -2 104.3 ... 15985.0 15437.950000
## 64 0 98.8 ... 8495.0 9825.000000
## 199 -1 104.3 ... 14489.0 17011.150000
## 7 1 105.8 ... 22470.0 20134.600000
## 133 2 99.1 ... 15510.0 14415.250000
## 136 3 99.1 ... 9989.0 16768.608350
## 59 1 98.8 ... 10595.0 10256.750000
## 62 0 98.8 ... 8495.0 9757.150000
## 17 0 110.0 ... 40960.0 38117.435714
## 166 1 94.5 ... 9298.0 10736.550000
## 12 0 101.2 ... 21105.0 18657.416667
## 138 2 93.7 ... 7299.0 7125.550000
## 6 1 105.8 ... 22470.0 20777.250000
## 119 1 93.7 ... 7689.0 7964.300000
## 74 1 112.0 ... 41315.0 39006.635714
## 187 2 97.3 ... 9095.0 8054.700000
## 160 0 95.7 ... 7999.0 7736.850000
## 182 2 97.3 ... 8495.0 7299.500000
## 171 2 98.4 ... 9989.0 12614.100000
## 50 1 93.1 ... 6095.0 6105.650000
## 177 -1 102.4 ... 9988.0 10292.600000
## 191 0 100.4 ... 13845.0 14880.250000
## 53 1 93.1 ... 6229.0 6664.800000
## 111 0 107.9 ... 16630.0 16676.400000
## 197 -1 104.3 ... 13415.0 14416.500000
## 156 0 95.7 ... 7999.0 7775.600000
## 132 3 99.1 ... 9989.0 14035.250000
## 5 2 99.8 ... 13950.0 14001.575000
## 54 1 93.1 ... 6229.0 6723.766667
## 179 3 102.9 ... 16558.0 17296.750000
## 184 2 97.3 ... 8495.0 7299.500000
## 36 0 96.5 ... 7295.0 7737.333333
## 4 2 99.4 ... 18280.0 16814.300000
## 109 0 114.2 ... 13860.0 16138.750000
##
## [41 rows x 47 columns]
The RMSE values are compared.
A NumPy array is created.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2679.90678389, 3297.24614585, 2616.35781446]])
A DataFrame is built from the NumPy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 2679.906784 3297.246146 2616.357814
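To read the comparison off the table programmatically, the one-row frame can be sorted so the lowest RMSE comes first. This is a sketch using the values printed above; `ranking`, `mejor`, and `mejora` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# RMSE values copied from the table printed above.
rmse = pd.DataFrame(
    [[2679.906784, 3297.246146, 2616.357814]],
    columns=["rmse_rm", "rmse_ar", "rmse_rf"],
)

# Turn the single row into a Series and sort ascending: lowest RMSE first.
ranking = rmse.iloc[0].sort_values()
mejor = ranking.index[0]
print(ranking)
print(f"Lowest RMSE: {mejor}")

# Relative improvement of the best model over the runner-up.
mejora = (ranking.iloc[1] - ranking.iloc[0]) / ranking.iloc[1]
print(f"Relative improvement over second best: {mejora:.2%}")
```

On these figures the random forest's advantage over the multiple linear regression is roughly 2.4%, while the regression tree is clearly behind both.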
The exercise consisted of loading a car price data set and modeling price against all of its numeric and categorical variables.
The multiple linear regression model yields an Adjusted R-squared of 0.9393, meaning that the independent variables explain approximately 93.93% of the variance of the dependent variable, price.
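For reference, the adjusted statistic penalizes the ordinary R-squared for the number of predictors. The sketch below implements the usual formula; the function name `r2_ajustado` is hypothetical, the input `r2 = 0.95` is made up for illustration, and `n = 164` and `p = 43` are assumed sizes for the training set and predictor count, not figures reported in this section.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1).

    r2: ordinary R-squared, n: number of observations,
    p: number of predictors (excluding the intercept).
    """
    if n - p - 1 <= 0:
        raise ValueError("n must be greater than p + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Hypothetical inputs: r2 = 0.95 is made up for illustration;
# n = 164 and p = 43 are assumed sizes, not values from the notebook output.
valor = r2_ajustado(r2=0.95, n=164, p=43)
print(round(valor, 4))
```

The adjusted value is always at or below the ordinary R-squared, and the gap widens as more predictors are added relative to the number of observations.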
In the regression tree model, the most important predictors are enginesize, curbweight, horsepower, carwidth, and compressionratio.
The random forest model ranks enginesize, curbweight, horsepower, citympg, and carwidth as its most important variables.
The variable enginesize remains the most important one in all of the regression models, including those built in R.
According to the root mean squared error (RMSE), the best model was the random forest, for this particular 80%/20% split into training and validation data. Its RMSE was 2616.357814, the lowest of the three regression models. Compared with the previous case, which did not include the categorical variables, the value did decrease slightly.
Note that with seed 1280 there were no problems or errors caused by validation values that the model could not recognize because they were absent from the training data. This indicates that, for this split, the training data cover every value of the categorical variables that appears in the validation data.
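A coverage claim of this kind can be checked directly instead of being inferred from the absence of errors. The sketch below lists, per categorical column, the levels that occur in the validation data but not in the training data. The helper `niveles_no_vistos` and the two small frames are made up for illustration and are not the car data.

```python
import pandas as pd

def niveles_no_vistos(entrena, valida, columnas):
    """Return, per column, the category levels present in `valida`
    but absent from `entrena`. Columns with full coverage are omitted."""
    faltantes = {}
    for col in columnas:
        extra = set(valida[col].unique()) - set(entrena[col].unique())
        if extra:
            faltantes[col] = sorted(extra)
    return faltantes

# Tiny made-up frames (not the car data) to show both outcomes.
entrena = pd.DataFrame({"fueltype": ["gas", "gas", "diesel"],
                        "carbody": ["sedan", "wagon", "sedan"]})
valida = pd.DataFrame({"fueltype": ["gas", "diesel"],
                       "carbody": ["sedan", "hardtop"]})

print(niveles_no_vistos(entrena, valida, ["fueltype", "carbody"]))
```

An empty dictionary means every validation level was seen during training. In the example, `carbody` is reported because `hardtop` appears only in the validation frame.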
Finally, comparing the R results with the Python results, random forest gave the lowest RMSE in both cases. However, its RMSE was 2271.819 in R and 2616.357814 in Python. For this case specifically, using all numeric and categorical variables, the best model is therefore again random forest, in particular the one built in R.