Compare supervised models by applying several algorithms to predict car prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the dataset are used, except the identifier columns car_ID and CarName, which are removed further on.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
A multiple linear regression model is built with the training data.
This model is used to answer questions such as:
Which variables are statistically significant predictors at the 90% confidence level?
What is the adjusted R-squared, that is, how much of the variation in the vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
A regression tree model is built with the training data.
The importance of each variable for the price is identified.
The regression tree and its decision rules are displayed.
Predictions are made with the validation data.
The RMSE statistic is computed for comparison purposes.
A random forest model is built with the training data, using 20 trees.
The importance of each variable for the price is identified.
Predictions are generated with the validation data.
The RMSE statistic is computed for comparison purposes.
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the best predictor.
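For reference, the comparison statistic can be written as follows. The symbols are notation introduced here for illustration: \(y_i\) is the observed price, \(\hat{y}_i\) the predicted price, and \(n\) the number of validation observations.

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
\]

Because the errors are squared before averaging, large misses weigh more than small ones, and the result is expressed in the same units as the price.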
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique ID of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors (Categorical) |
| 7 | carbody | Body style of the car: convertible, sedan, wagon, … (Categorical) |
| 8 | drivewheel | Type of drive wheel, e.g. fwd or rwd (Categorical) |
| 9 | enginelocation | Location of the engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car: distance between axles, in inches (Numeric) |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the engine (Categorical) |
| 17 | enginesize | Size of the engine, in … (Numeric) |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric) |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower: power of the car (Numeric) |
| 23 | peakrpm | Peak revolutions per minute (Numeric) |
| 24 | citympg | Mileage in the city, in miles per gallon (Numeric) |
| 25 | highwaympg | Mileage on the highway, in miles per gallon (Numeric) |
| 26 | price | Price of the car in dollars; the dependent variable (Numeric) |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Variables with no statistical interest are removed, namely columns 1 and 3, car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
Several variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
These variables are identified and a new dataset is built that includes the corresponding dummy variables.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? A categorical variable is encoded as several columns, one per category, and each row gets a 1 in the column of its own category and a 0 in the others.
Example
| gender |
|---|
| MALE |
| FEMALE |
| MALE |
The same data with dummy variables
| gender_male | gender_female |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
With drop_first = True, as used in this case, the first category of each variable is dropped, because its value is implied when all the remaining columns are 0.
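The encoding can be reproduced on a toy column. This is a minimal sketch: the `ejemplo` DataFrame and its values are made up for illustration and are not part of the car dataset, and `dtype=int` is passed only so that the result shows 0/1 rather than booleans.

```python
import pandas as pd

# Toy data resembling the example table (hypothetical, not from the car dataset).
ejemplo = pd.DataFrame({"gender": ["MALE", "FEMALE", "MALE"]})

# One indicator column per category; dtype=int forces 0/1 instead of True/False.
todas = pd.get_dummies(ejemplo, dtype=int)

# drop_first=True removes the first category in sorted order, here FEMALE.
reducidas = pd.get_dummies(ejemplo, drop_first=True, dtype=int)

print(todas)
print(reducidas)
```

Dropping one category per variable avoids redundant columns, which matters mainly for the linear regression model.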
datos_dummis = pd.get_dummies(datos, drop_first = True)
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
The training data contain 80% of the observations and the validation data the remaining 20%. The random seed is 1349.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80, random_state = 1349)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 179 3 102.9 ... 0 0
## 28 -1 103.3 ... 0 0
## 132 3 99.1 ... 0 0
## 116 0 107.9 ... 0 0
## 123 -1 103.3 ... 0 0
## .. ... ... ... ... ...
## 194 -2 104.3 ... 0 0
## 164 1 94.5 ... 0 0
## 17 0 110.0 ... 0 0
## 126 3 89.5 ... 0 0
## 18 2 88.4 ... 0 0
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 154 0 95.7 ... 0 0
## 147 0 97.0 ... 0 0
## 104 3 91.3 ... 0 0
## 102 0 100.4 ... 0 0
## 61 1 98.8 ... 0 0
## 163 1 94.5 ... 0 0
## 124 3 95.9 ... 1 0
## 7 1 105.8 ... 0 0
## 169 2 98.4 ... 0 0
## 16 0 103.5 ... 0 0
## 15 0 103.5 ... 0 0
## 73 0 120.9 ... 0 0
## 187 2 97.3 ... 0 0
## 109 0 114.2 ... 0 0
## 52 1 93.1 ... 0 0
## 111 0 107.9 ... 0 0
## 80 3 96.3 ... 1 0
## 103 0 100.4 ... 0 0
## 75 1 102.7 ... 0 0
## 10 2 101.2 ... 0 0
## 54 1 93.1 ... 0 0
## 40 0 96.5 ... 0 0
## 57 3 95.3 ... 0 0
## 66 0 104.9 ... 0 0
## 137 2 99.1 ... 0 0
## 161 0 95.7 ... 0 0
## 158 0 95.7 ... 0 0
## 11 0 101.2 ... 0 0
## 171 2 98.4 ... 0 0
## 2 1 94.5 ... 0 0
## 177 -1 102.4 ... 0 0
## 184 2 97.3 ... 0 0
## 21 1 93.7 ... 0 0
## 82 3 95.9 ... 1 0
## 125 3 94.5 ... 0 0
## 6 1 105.8 ... 0 0
## 65 0 104.9 ... 0 0
## 48 0 113.0 ... 0 0
## 42 1 96.5 ... 0 0
## 27 1 93.7 ... 0 0
## 79 1 93.0 ... 1 0
##
## [41 rows x 43 columns]
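The 164/41 row counts follow directly from the 80% split of 205 observations. The sketch below checks this on stand-in data, an array of 205 row indices instead of the real car data; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 205 row indices, the same number of rows as the car dataset.
filas = np.arange(205)

entrena, valida = train_test_split(filas, train_size=0.80, random_state=1349)
print(len(entrena), len(valida))

# Repeating the call with the same seed reproduces the same partition.
entrena_2, valida_2 = train_test_split(filas, train_size=0.80, random_state=1349)
print(np.array_equal(valida, valida_2))
```

Fixing `random_state` is what makes the RMSE values of the different models comparable, since every model is trained and validated on the same rows.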
The multiple linear regression model (rm) is built.
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()
Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
modelo_rm.coef_
## array([ 2.04388493e+02, 3.16126675e+01, -5.83182354e+01, 8.88164168e+02,
## 1.41889656e+02, 3.11573551e+00, 1.23542219e+02, -9.39968556e+02,
## -4.45174029e+03, -3.85091460e+02, -1.98531362e+01, 1.90332440e+00,
## 4.12596487e+01, 2.54267156e+01, -2.21433311e+03, 3.20956657e+03,
## -1.97026142e+02, -3.25992564e+03, -3.49764588e+03, -2.68368678e+03,
## -3.57821520e+03, 3.04267898e+02, 8.94224762e+02, 1.15655151e+04,
## -2.45524955e+03, -1.05050270e+03, 2.41000287e+03, -1.29316916e+02,
## -4.74398095e+03, 7.33671797e+02, -8.02999609e+03, -9.98863262e+03,
## -6.45072457e+03, -1.93555522e+03, -8.02476950e+03, 7.33671797e+02,
## -2.49184328e+01, -3.09837933e+03, 2.21433311e+03, -2.59273097e+03,
## 2.58576497e+02, -1.47437544e+03, 1.74188010e+02])
print(modelo_rm.score(X_entrena, Y_entrena)) # R-squared on the training data
## 0.9552451622081178
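The score above is the plain R-squared, while the objectives ask for the adjusted value. A minimal sketch of the standard adjustment follows, assuming that all 43 dummy-encoded columns count as predictors; the helper `r2_ajustado` is a hypothetical name introduced here.

```python
def r2_ajustado(r2, n_obs, n_predictores):
    """Adjusted R-squared from plain R-squared, sample size and predictor count."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictores - 1)

# Values reported in this case: training R-squared, 164 training rows, 43 columns.
# Treating all 43 columns as predictors is an assumption of this sketch.
r2_entrena = 0.9552451622081178
ajustado = r2_ajustado(r2_entrena, n_obs=164, n_predictores=43)
print(round(ajustado, 4))
```

The adjusted value is always at most the plain R-squared, and the gap grows with the number of predictors relative to the number of rows. The significance question needs a p-value for each coefficient, which LinearRegression does not report; the statsmodels import at the top of the case is the usual route to those.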
predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1]) # note: [:-1] omits the last of the 41 predictions
## [ 6474.28534028 8784.63604725 17131.26653913 15887.38291784
## 10057.63199124 7930.96816998 14881.59184891 20438.1219921
## 11913.68026675 27930.19570335 27335.40419357 42812.72467294
## 10568.03009743 11953.55052779 5966.55058655 17085.12986277
## 10152.23184869 16063.50754817 18725.03002097 12772.31493574
## 6621.69508548 7093.51950264 11410.57867756 11641.68787686
## 14020.44637291 6893.53320286 7455.53788324 12560.56409214
## 12421.5451551 10269.08129135 8736.67563889 8177.45242998
## 5870.96255677 14347.77070264 17760.21189592 20989.91950907
## 15106.77815584 30496.58715621 8845.11875073 10556.66271416]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 154 0 95.7 ... 7898.0 6474.285340
## 147 0 97.0 ... 10198.0 8784.636047
## 104 3 91.3 ... 17199.0 17131.266539
## 102 0 100.4 ... 14399.0 15887.382918
## 61 1 98.8 ... 10595.0 10057.631991
## 163 1 94.5 ... 8058.0 7930.968170
## 124 3 95.9 ... 12764.0 14881.591849
## 7 1 105.8 ... 18920.0 20438.121992
## 169 2 98.4 ... 9989.0 11913.680267
## 16 0 103.5 ... 41315.0 27930.195703
## 15 0 103.5 ... 30760.0 27335.404194
## 73 0 120.9 ... 40960.0 42812.724673
## 187 2 97.3 ... 9495.0 10568.030097
## 109 0 114.2 ... 12440.0 11953.550528
## 52 1 93.1 ... 6795.0 5966.550587
## 111 0 107.9 ... 15580.0 17085.129863
## 80 3 96.3 ... 9959.0 10152.231849
## 103 0 100.4 ... 13499.0 16063.507548
## 75 1 102.7 ... 16503.0 18725.030021
## 10 2 101.2 ... 16430.0 12772.314936
## 54 1 93.1 ... 7395.0 6621.695085
## 40 0 96.5 ... 10295.0 7093.519503
## 57 3 95.3 ... 13645.0 11410.578678
## 66 0 104.9 ... 18344.0 11641.687877
## 137 2 99.1 ... 18620.0 14020.446373
## 161 0 95.7 ... 8358.0 6893.533203
## 158 0 95.7 ... 7898.0 7455.537883
## 11 0 101.2 ... 16925.0 12560.564092
## 171 2 98.4 ... 11549.0 12421.545155
## 2 1 94.5 ... 16500.0 10269.081291
## 177 -1 102.4 ... 11248.0 8736.675639
## 184 2 97.3 ... 7995.0 8177.452430
## 21 1 93.7 ... 5572.0 5870.962557
## 82 3 95.9 ... 12629.0 14347.770703
## 125 3 94.5 ... 22018.0 17760.211896
## 6 1 105.8 ... 17710.0 20989.919509
## 65 0 104.9 ... 18280.0 15106.778156
## 48 0 113.0 ... 35550.0 30496.587156
## 42 1 96.5 ... 10345.0 8845.118751
## 27 1 93.7 ... 8558.0 10556.662714
## 79 1 93.0 ... 7689.0 7872.676913
##
## [41 rows x 45 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 3362.896276671872
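Or, equivalently, take the square root of the mean squared error; this form also works in scikit-learn releases where the `squared` argument is no longer available: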
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3362.896276671872
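To make the statistic concrete, the calculation can be done by hand. This sketch uses only the first five rows of the comparison table shown earlier, so its result is not comparable with the RMSE over all 41 validation rows.

```python
import math

# First five validation rows from the comparison table (real vs. predicted price).
precio_real = [7898.0, 10198.0, 17199.0, 14399.0, 10595.0]
precio_prediccion = [6474.285340, 8784.636047, 17131.266539,
                     15887.382918, 10057.631991]

# Square each error, average the squares, then take the square root.
errores_cuadrados = [(r - p) ** 2 for r, p in zip(precio_real, precio_prediccion)]
mse = sum(errores_cuadrados) / len(errores_cuadrados)
rmse = math.sqrt(mse)

print(f"RMSE of the five rows: {rmse:.2f}")
```

The result is in the same units as the price, so it can be read as a typical prediction error in dollars.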
The regression tree model (ar) is built.
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 1349
)
Train the model.
modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=1349)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 152
#plot = plot_tree(
# decision_tree = modelo_ar,
# feature_names = list(datos_dummis.drop(columns = "price").columns),
# filled = True,
# impurity = False,
# fontsize = 10,
# precision = 2,
# ax = ax
# )
#plot
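A tree with 152 terminal nodes for 164 training rows has almost one leaf per observation. The sketch below, on synthetic data rather than the car dataset, illustrates how the `max_depth` argument that is commented out in the model definition limits the size of the tree; all names and values are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (an assumption for illustration, not the car dataset).
rng = np.random.default_rng(1349)
X = rng.uniform(0, 10, size=(164, 3))
y = 1000 * X[:, 0] + 200 * X[:, 1] + rng.normal(0, 300, size=164)

# Same learner, with and without a depth limit.
arbol_completo = DecisionTreeRegressor(random_state=1349).fit(X, y)
arbol_limitado = DecisionTreeRegressor(max_depth=3, random_state=1349).fit(X, y)

for nombre, arbol in [("unrestricted", arbol_completo), ("max_depth=3", arbol_limitado)]:
    print(nombre, "depth:", arbol.get_depth(), "leaves:", arbol.get_n_leaves())
```

A depth of 3 allows at most 8 leaves, which gives a tree small enough to plot and read. Whether the smaller tree predicts better on the validation data would have to be checked with the RMSE.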
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2544.00
## | | |--- curbweight <= 2216.50
## | | | |--- horsepower <= 68.50
## | | | | |--- carbody_hatchback <= 0.50
## | | | | | |--- curbweight <= 2104.00
## | | | | | | |--- carlength <= 166.05
## | | | | | | | |--- fuelsystem_idi <= 0.50
## | | | | | | | | |--- value: [7150.50]
## | | | | | | | |--- fuelsystem_idi > 0.50
## | | | | | | | | |--- value: [7099.00]
## | | | | | | |--- carlength > 166.05
## | | | | | | | |--- carwidth <= 64.00
## | | | | | | | | |--- value: [6692.00]
## | | | | | | | |--- carwidth > 64.00
## | | | | | | | | |--- value: [6695.00]
## | | | | | |--- curbweight > 2104.00
## | | | | | | |--- value: [7609.00]
## | | | | |--- carbody_hatchback > 0.50
## | | | | | |--- highwaympg <= 38.50
## | | | | | | |--- highwaympg <= 34.50
## | | | | | | | |--- value: [5195.00]
## | | | | | | |--- highwaympg > 34.50
## | | | | | | | |--- curbweight <= 1985.50
## | | | | | | | | |--- curbweight <= 1888.00
## | | | | | | | | | |--- value: [6377.00]
## | | | | | | | | |--- curbweight > 1888.00
## | | | | | | | | | |--- carlength <= 158.20
## | | | | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | | | |--- carwidth > 64.10
## | | | | | | | | | | | |--- value: [6189.00]
## | | | | | | | | | |--- carlength > 158.20
## | | | | | | | | | | |--- value: [6095.00]
## | | | | | | | |--- curbweight > 1985.50
## | | | | | | | | |--- carheight <= 52.65
## | | | | | | | | | |--- value: [6669.00]
## | | | | | | | | |--- carheight > 52.65
## | | | | | | | | | |--- curbweight <= 2027.50
## | | | | | | | | | | |--- value: [6488.00]
## | | | | | | | | | |--- curbweight > 2027.50
## | | | | | | | | | | |--- value: [6338.00]
## | | | | | |--- highwaympg > 38.50
## | | | | | | |--- citympg <= 48.00
## | | | | | | | |--- curbweight <= 1662.50
## | | | | | | | | |--- value: [5151.00]
## | | | | | | | |--- curbweight > 1662.50
## | | | | | | | | |--- enginesize <= 91.00
## | | | | | | | | | |--- stroke <= 3.15
## | | | | | | | | | | |--- value: [5399.00]
## | | | | | | | | | |--- stroke > 3.15
## | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | |--- enginesize > 91.00
## | | | | | | | | | |--- carwidth <= 64.00
## | | | | | | | | | | |--- value: [5348.00]
## | | | | | | | | | |--- carwidth > 64.00
## | | | | | | | | | | |--- value: [5389.00]
## | | | | | | |--- citympg > 48.00
## | | | | | | | |--- value: [6479.00]
## | | | |--- horsepower > 68.50
## | | | | |--- carwidth <= 63.50
## | | | | | |--- value: [5118.00]
## | | | | |--- carwidth > 63.50
## | | | | | |--- curbweight <= 2124.00
## | | | | | | |--- carheight <= 53.60
## | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | |--- carlength <= 157.35
## | | | | | | | | | |--- value: [8916.50]
## | | | | | | | | |--- carlength > 157.35
## | | | | | | | | | |--- compressionratio <= 9.50
## | | | | | | | | | | |--- citympg <= 30.50
## | | | | | | | | | | | |--- value: [6938.00]
## | | | | | | | | | | |--- citympg > 30.50
## | | | | | | | | | | | |--- truncated branch of depth 4
## | | | | | | | | | |--- compressionratio > 9.50
## | | | | | | | | | | |--- value: [6575.00]
## | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | |--- curbweight <= 1948.00
## | | | | | | | | | |--- curbweight <= 1846.50
## | | | | | | | | | | |--- value: [6855.00]
## | | | | | | | | | |--- curbweight > 1846.50
## | | | | | | | | | | |--- citympg <= 34.00
## | | | | | | | | | | | |--- value: [6529.00]
## | | | | | | | | | | |--- citympg > 34.00
## | | | | | | | | | | | |--- value: [6295.00]
## | | | | | | | | |--- curbweight > 1948.00
## | | | | | | | | | |--- carheight <= 53.05
## | | | | | | | | | | |--- carwidth <= 64.20
## | | | | | | | | | | | |--- value: [7129.00]
## | | | | | | | | | | |--- carwidth > 64.20
## | | | | | | | | | | | |--- value: [7198.00]
## | | | | | | | | | |--- carheight > 53.05
## | | | | | | | | | | |--- value: [7799.00]
## | | | | | | |--- carheight > 53.60
## | | | | | | | |--- curbweight <= 1903.50
## | | | | | | | | |--- value: [5499.00]
## | | | | | | | |--- curbweight > 1903.50
## | | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | | |--- curbweight <= 1928.00
## | | | | | | | | | | |--- value: [6649.00]
## | | | | | | | | | |--- curbweight > 1928.00
## | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- carwidth <= 63.85
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- carwidth > 63.85
## | | | | | | | | | | | |--- value: [7295.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- value: [7053.00]
## | | | | | |--- curbweight > 2124.00
## | | | | | | |--- horsepower <= 76.00
## | | | | | | | |--- symboling <= 0.50
## | | | | | | | | |--- value: [9258.00]
## | | | | | | | |--- symboling > 0.50
## | | | | | | | | |--- value: [8238.00]
## | | | | | | |--- horsepower > 76.00
## | | | | | | | |--- citympg <= 30.00
## | | | | | | | | |--- curbweight <= 2210.50
## | | | | | | | | | |--- stroke <= 3.02
## | | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | | |--- stroke > 3.02
## | | | | | | | | | | |--- highwaympg <= 32.00
## | | | | | | | | | | | |--- value: [7957.00]
## | | | | | | | | | | |--- highwaympg > 32.00
## | | | | | | | | | | | |--- value: [7975.00]
## | | | | | | | | |--- curbweight > 2210.50
## | | | | | | | | | |--- value: [8195.00]
## | | | | | | | |--- citympg > 30.00
## | | | | | | | | |--- value: [7126.00]
## | | |--- curbweight > 2216.50
## | | | |--- cylindernumber_four <= 0.50
## | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | |--- value: [11395.00]
## | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | |--- curbweight <= 2503.50
## | | | | | | |--- value: [15645.00]
## | | | | | |--- curbweight > 2503.50
## | | | | | | |--- value: [15250.00]
## | | | |--- cylindernumber_four > 0.50
## | | | | |--- horsepower <= 89.00
## | | | | | |--- carwidth <= 66.00
## | | | | | | |--- horsepower <= 80.00
## | | | | | | | |--- curbweight <= 2277.50
## | | | | | | | | |--- peakrpm <= 4450.00
## | | | | | | | | | |--- value: [7603.00]
## | | | | | | | | |--- peakrpm > 4450.00
## | | | | | | | | | |--- horsepower <= 54.00
## | | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | | |--- horsepower > 54.00
## | | | | | | | | | | |--- value: [7788.00]
## | | | | | | | |--- curbweight > 2277.50
## | | | | | | | | |--- curbweight <= 2308.50
## | | | | | | | | | |--- value: [6918.00]
## | | | | | | | | |--- curbweight > 2308.50
## | | | | | | | | | |--- value: [6785.00]
## | | | | | | |--- horsepower > 80.00
## | | | | | | | |--- carheight <= 53.15
## | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | |--- wheelbase <= 96.65
## | | | | | | | | | | | |--- value: [6989.00]
## | | | | | | | | | | |--- wheelbase > 96.65
## | | | | | | | | | | | |--- value: [7463.00]
## | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | |--- value: [8189.00]
## | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | |--- value: [8499.00]
## | | | | | | | |--- carheight > 53.15
## | | | | | | | | |--- curbweight <= 2255.50
## | | | | | | | | | |--- value: [7895.00]
## | | | | | | | | |--- curbweight > 2255.50
## | | | | | | | | | |--- citympg <= 23.50
## | | | | | | | | | | |--- value: [8013.00]
## | | | | | | | | | |--- citympg > 23.50
## | | | | | | | | | | |--- symboling <= 1.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- symboling > 1.00
## | | | | | | | | | | | |--- value: [8495.00]
## | | | | | |--- carwidth > 66.00
## | | | | | | |--- curbweight <= 2417.50
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- value: [8845.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- value: [9370.00]
## | | | | | | |--- curbweight > 2417.50
## | | | | | | | |--- fuelsystem_idi <= 0.50
## | | | | | | | | |--- value: [11245.00]
## | | | | | | | |--- fuelsystem_idi > 0.50
## | | | | | | | | |--- citympg <= 33.00
## | | | | | | | | | |--- value: [10698.00]
## | | | | | | | | |--- citympg > 33.00
## | | | | | | | | | |--- value: [10795.00]
## | | | | |--- horsepower > 89.00
## | | | | | |--- carheight <= 54.00
## | | | | | | |--- curbweight <= 2538.00
## | | | | | | | |--- horsepower <= 103.00
## | | | | | | | | |--- carwidth <= 66.55
## | | | | | | | | | |--- enginesize <= 108.50
## | | | | | | | | | | |--- value: [9960.00]
## | | | | | | | | | |--- enginesize > 108.50
## | | | | | | | | | | |--- carheight <= 52.65
## | | | | | | | | | | | |--- value: [9980.00]
## | | | | | | | | | | |--- carheight > 52.65
## | | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | |--- carwidth > 66.55
## | | | | | | | | | |--- value: [9895.00]
## | | | | | | | |--- horsepower > 103.00
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- peakrpm <= 5700.00
## | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | | | |--- peakrpm > 5700.00
## | | | | | | | | | | |--- value: [9538.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- curbweight <= 2334.00
## | | | | | | | | | | |--- value: [9298.00]
## | | | | | | | | | |--- curbweight > 2334.00
## | | | | | | | | | | |--- value: [9279.00]
## | | | | | | |--- curbweight > 2538.00
## | | | | | | | |--- value: [8449.00]
## | | | | | |--- carheight > 54.00
## | | | | | | |--- highwaympg <= 31.00
## | | | | | | | |--- compressionratio <= 8.75
## | | | | | | | | |--- enginesize <= 108.50
## | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | |--- enginesize > 108.50
## | | | | | | | | | |--- value: [11595.00]
## | | | | | | | |--- compressionratio > 8.75
## | | | | | | | | |--- carlength <= 176.00
## | | | | | | | | | |--- value: [12945.00]
## | | | | | | | | |--- carlength > 176.00
## | | | | | | | | | |--- value: [13950.00]
## | | | | | | |--- highwaympg > 31.00
## | | | | | | | |--- highwaympg <= 33.00
## | | | | | | | | |--- carheight <= 55.30
## | | | | | | | | | |--- value: [10898.00]
## | | | | | | | | |--- carheight > 55.30
## | | | | | | | | | |--- value: [9995.00]
## | | | | | | | |--- highwaympg > 33.00
## | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | |--- value: [9549.00]
## | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | |--- horsepower <= 94.50
## | | | | | | | | | | |--- value: [8948.00]
## | | | | | | | | | |--- horsepower > 94.50
## | | | | | | | | | | |--- value: [8949.00]
## | |--- curbweight > 2544.00
## | | |--- carwidth <= 68.60
## | | | |--- horsepower <= 118.50
## | | | | |--- horsepower <= 92.50
## | | | | | |--- wheelbase <= 98.25
## | | | | | | |--- carheight <= 53.30
## | | | | | | | |--- value: [11048.00]
## | | | | | | |--- carheight > 53.30
## | | | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | | | |--- value: [8778.00]
## | | | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | | | |--- value: [9295.00]
## | | | | | |--- wheelbase > 98.25
## | | | | | | |--- enginesize <= 103.00
## | | | | | | | |--- value: [13845.00]
## | | | | | | |--- enginesize > 103.00
## | | | | | | | |--- value: [12290.00]
## | | | | |--- horsepower > 92.50
## | | | | | |--- curbweight <= 2701.00
## | | | | | | |--- boreratio <= 3.50
## | | | | | | | |--- horsepower <= 110.50
## | | | | | | | | |--- value: [13295.00]
## | | | | | | | |--- horsepower > 110.50
## | | | | | | | | |--- value: [14997.50]
## | | | | | | |--- boreratio > 3.50
## | | | | | | | |--- carbody_hardtop <= 0.50
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- horsepower <= 110.50
## | | | | | | | | | | |--- value: [11850.00]
## | | | | | | | | | |--- horsepower > 110.50
## | | | | | | | | | | |--- value: [11694.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- value: [12170.00]
## | | | | | | | |--- carbody_hardtop > 0.50
## | | | | | | | | |--- value: [11199.00]
## | | | | | |--- curbweight > 2701.00
## | | | | | | |--- horsepower <= 114.50
## | | | | | | | |--- curbweight <= 3038.00
## | | | | | | | | |--- carheight <= 56.45
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- enginesize <= 131.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- enginesize > 131.00
## | | | | | | | | | | | |--- value: [12940.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [15985.00]
## | | | | | | | | |--- carheight > 56.45
## | | | | | | | | | |--- symboling <= -0.50
## | | | | | | | | | | |--- value: [13415.00]
## | | | | | | | | | |--- symboling > -0.50
## | | | | | | | | | | |--- value: [11900.00]
## | | | | | | | |--- curbweight > 3038.00
## | | | | | | | | |--- curbweight <= 3224.50
## | | | | | | | | | |--- stroke <= 3.36
## | | | | | | | | | | |--- highwaympg <= 26.00
## | | | | | | | | | | | |--- value: [16630.00]
## | | | | | | | | | | |--- highwaympg > 26.00
## | | | | | | | | | | | |--- value: [16515.00]
## | | | | | | | | | |--- stroke > 3.36
## | | | | | | | | | | |--- value: [13200.00]
## | | | | | | | | |--- curbweight > 3224.50
## | | | | | | | | | |--- curbweight <= 3357.50
## | | | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | | | |--- value: [16695.00]
## | | | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | | | |--- value: [17425.00]
## | | | | | | | | | |--- curbweight > 3357.50
## | | | | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | | | | |--- value: [13860.00]
## | | | | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | | | | |--- value: [17075.00]
## | | | | | | |--- horsepower > 114.50
## | | | | | | | |--- carheight <= 53.65
## | | | | | | | | |--- value: [17669.00]
## | | | | | | | |--- carheight > 53.65
## | | | | | | | | |--- value: [17450.00]
## | | | |--- horsepower > 118.50
## | | | | |--- horsepower <= 131.50
## | | | | | |--- carheight <= 55.00
## | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | |--- value: [21105.00]
## | | | | | | |--- doornumber_two > 0.50
## | | | | | | | |--- value: [20970.00]
## | | | | | |--- carheight > 55.00
## | | | | | | |--- value: [24565.00]
## | | | | |--- horsepower > 131.50
## | | | | | |--- horsepower <= 158.00
## | | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | | |--- citympg <= 18.50
## | | | | | | | | |--- value: [18150.00]
## | | | | | | | |--- citympg > 18.50
## | | | | | | | | |--- curbweight <= 3141.00
## | | | | | | | | | |--- value: [15690.00]
## | | | | | | | | |--- curbweight > 3141.00
## | | | | | | | | | |--- value: [15750.00]
## | | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | | |--- fuelsystem_spdi <= 0.50
## | | | | | | | | |--- horsepower <= 148.50
## | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | |--- horsepower > 148.50
## | | | | | | | | | |--- value: [13499.00]
## | | | | | | | |--- fuelsystem_spdi > 0.50
## | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | |--- value: [14489.00]
## | | | | | |--- horsepower > 158.00
## | | | | | | |--- compressionratio <= 9.15
## | | | | | | | |--- carlength <= 174.45
## | | | | | | | | |--- value: [19699.00]
## | | | | | | | |--- carlength > 174.45
## | | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | | | | |--- carheight <= 54.05
## | | | | | | | | | | | |--- value: [17859.17]
## | | | | | | | | | | |--- carheight > 54.05
## | | | | | | | | | | | |--- value: [18150.00]
## | | | | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | | | | |--- enginesize <= 155.50
## | | | | | | | | | | | |--- value: [18420.00]
## | | | | | | | | | | |--- enginesize > 155.50
## | | | | | | | | | | | |--- value: [18399.00]
## | | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | | |--- value: [18950.00]
## | | | | | | |--- compressionratio > 9.15
## | | | | | | | |--- citympg <= 19.50
## | | | | | | | | |--- value: [15998.00]
## | | | | | | | |--- citympg > 19.50
## | | | | | | | | |--- value: [16558.00]
## | | |--- carwidth > 68.60
## | | | |--- curbweight <= 3055.50
## | | | | |--- peakrpm <= 5450.00
## | | | | | |--- horsepower <= 137.00
## | | | | | | |--- value: [16845.00]
## | | | | | |--- horsepower > 137.00
## | | | | | | |--- value: [19045.00]
## | | | | |--- peakrpm > 5450.00
## | | | | | |--- value: [21485.00]
## | | | |--- curbweight > 3055.50
## | | | | |--- peakrpm <= 5450.00
## | | | | | |--- citympg <= 22.50
## | | | | | | |--- value: [22625.00]
## | | | | | |--- citympg > 22.50
## | | | | | | |--- value: [22470.00]
## | | | | |--- peakrpm > 5450.00
## | | | | | |--- value: [23875.00]
## |--- enginesize > 182.00
## | |--- highwaympg <= 16.50
## | | |--- value: [45400.00]
## | |--- highwaympg > 16.50
## | | |--- fuelsystem_mpfi <= 0.50
## | | | |--- curbweight <= 3760.00
## | | | | |--- carbody_sedan <= 0.50
## | | | | | |--- carheight <= 56.80
## | | | | | | |--- value: [28176.00]
## | | | | | |--- carheight > 56.80
## | | | | | | |--- value: [28248.00]
## | | | | |--- carbody_sedan > 0.50
## | | | | | |--- value: [25552.00]
## | | | |--- curbweight > 3760.00
## | | | | |--- value: [31600.00]
## | | |--- fuelsystem_mpfi > 0.50
## | | | |--- carbody_hatchback <= 0.50
## | | | | |--- curbweight <= 4008.00
## | | | | | |--- curbweight <= 2778.00
## | | | | | | |--- value: [33278.00]
## | | | | | |--- curbweight > 2778.00
## | | | | | | |--- peakrpm <= 4875.00
## | | | | | | | |--- carwidth <= 71.10
## | | | | | | | | |--- value: [35056.00]
## | | | | | | | |--- carwidth > 71.10
## | | | | | | | | |--- value: [34184.00]
## | | | | | | |--- peakrpm > 4875.00
## | | | | | | | |--- enginesize <= 267.50
## | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | |--- value: [36880.00]
## | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | |--- value: [37028.00]
## | | | | | | | |--- enginesize > 267.50
## | | | | | | | | |--- value: [36000.00]
## | | | | |--- curbweight > 4008.00
## | | | | | |--- value: [32250.00]
## | | | |--- carbody_hatchback > 0.50
## | | | | |--- value: [31400.50]
importancia_predictores = pd.DataFrame(
{'predictor': datos_dummis.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.652109
## 5 curbweight 0.242459
## 10 horsepower 0.036227
## 3 carwidth 0.019139
## 13 highwaympg 0.017630
## 40 fuelsystem_mpfi 0.011611
## 31 cylindernumber_four 0.006195
## 4 carheight 0.003089
## 18 carbody_hatchback 0.001877
## 11 peakrpm 0.001450
## 1 wheelbase 0.001373
## 9 compressionratio 0.001306
## 7 boreratio 0.001258
## 21 drivewheel_fwd 0.001142
## 8 stroke 0.000766
## 12 citympg 0.000713
## 19 carbody_sedan 0.000550
## 2 carlength 0.000548
## 41 fuelsystem_spdi 0.000211
## 0 symboling 0.000201
## 20 carbody_wagon 0.000048
## 17 carbody_hardtop 0.000038
## 16 doornumber_two 0.000026
## 38 fuelsystem_idi 0.000017
## 22 drivewheel_rwd 0.000016
## 23 enginelocation_rear 0.000000
## 34 cylindernumber_twelve 0.000000
## 14 fueltype_gas 0.000000
## 15 aspiration_turbo 0.000000
## 39 fuelsystem_mfi 0.000000
## 37 fuelsystem_4bbl 0.000000
## 36 fuelsystem_2bbl 0.000000
## 35 cylindernumber_two 0.000000
## 33 cylindernumber_three 0.000000
## 24 enginetype_dohcv 0.000000
## 32 cylindernumber_six 0.000000
## 30 cylindernumber_five 0.000000
## 29 enginetype_rotor 0.000000
## 28 enginetype_ohcv 0.000000
## 27 enginetype_ohcf 0.000000
## 26 enginetype_ohc 0.000000
## 25 enginetype_l 0.000000
## 42 fuelsystem_spfi 0.000000
According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth, and highwaympg. The first two alone account for about 89% of the total importance.
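To make the ranking easy to check, here is a minimal sketch in plain Python that sorts a handful of importances and accumulates their share. The values are copied from the six top rows of the table above; the names `importancias`, `ordenadas`, and `acumulada` are illustrative and do not come from the original notebook.

```python
# Importances copied from the six top rows of the table above
# (deliberately listed out of order to show the sorting step).
importancias = {
    "highwaympg": 0.017630,
    "enginesize": 0.652109,
    "carwidth": 0.019139,
    "curbweight": 0.242459,
    "fuelsystem_mpfi": 0.011611,
    "horsepower": 0.036227,
}

# Sort predictors from most to least important.
ordenadas = sorted(importancias.items(), key=lambda par: par[1], reverse=True)

# Accumulate the share of total importance covered by the top-k predictors.
acumulada = []
total = 0.0
for nombre, valor in ordenadas:
    total += valor
    acumulada.append((nombre, round(total, 6)))

for nombre, suma in acumulada:
    print(f"{nombre:<16} {suma:.6f}")
```

With the fitted model at hand, the same ranking comes directly from sorting `importancia_predictores` as done above. The plain-Python version only makes the cumulative share explicit.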
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 6918., 9960., 19699., 13499., 8845., 8238., 15690., 21485.,
## 11694., 36880., 36880., 45400., 6785., 16695., 6095., 16630.,
## 9639., 13499., 18420., 13950., 6695., 8845., 11395., 12290.,
## 18150., 7198., 7788., 13950., 17669., 15690., 9988., 7775.,
## 5572., 14869., 15690., 21485., 21105., 32250., 9980., 7957.,
## 7957.])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 154 0 95.7 ... 7898.0 6918.0
## 147 0 97.0 ... 10198.0 9960.0
## 104 3 91.3 ... 17199.0 19699.0
## 102 0 100.4 ... 14399.0 13499.0
## 61 1 98.8 ... 10595.0 8845.0
## 163 1 94.5 ... 8058.0 8238.0
## 124 3 95.9 ... 12764.0 15690.0
## 7 1 105.8 ... 18920.0 21485.0
## 169 2 98.4 ... 9989.0 11694.0
## 16 0 103.5 ... 41315.0 36880.0
## 15 0 103.5 ... 30760.0 36880.0
## 73 0 120.9 ... 40960.0 45400.0
## 187 2 97.3 ... 9495.0 6785.0
## 109 0 114.2 ... 12440.0 16695.0
## 52 1 93.1 ... 6795.0 6095.0
## 111 0 107.9 ... 15580.0 16630.0
## 80 3 96.3 ... 9959.0 9639.0
## 103 0 100.4 ... 13499.0 13499.0
## 75 1 102.7 ... 16503.0 18420.0
## 10 2 101.2 ... 16430.0 13950.0
## 54 1 93.1 ... 7395.0 6695.0
## 40 0 96.5 ... 10295.0 8845.0
## 57 3 95.3 ... 13645.0 11395.0
## 66 0 104.9 ... 18344.0 12290.0
## 137 2 99.1 ... 18620.0 18150.0
## 161 0 95.7 ... 8358.0 7198.0
## 158 0 95.7 ... 7898.0 7788.0
## 11 0 101.2 ... 16925.0 13950.0
## 171 2 98.4 ... 11549.0 17669.0
## 2 1 94.5 ... 16500.0 15690.0
## 177 -1 102.4 ... 11248.0 9988.0
## 184 2 97.3 ... 7995.0 7775.0
## 21 1 93.7 ... 5572.0 5572.0
## 82 3 95.9 ... 12629.0 14869.0
## 125 3 94.5 ... 22018.0 15690.0
## 6 1 105.8 ... 17710.0 21485.0
## 65 0 104.9 ... 18280.0 21105.0
## 48 0 113.0 ... 35550.0 32250.0
## 42 1 96.5 ... 10345.0 9980.0
## 27 1 93.7 ... 8558.0 7957.0
## 79 1 93.0 ... 7689.0 7957.0
##
## [41 rows x 45 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2777.332173147461
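or, equivalently, taking the square root of the mean squared error (this form does not rely on the squared argument, which newer scikit-learn releases deprecate):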
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2777.332173147461
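Both calls above compute the same quantity: the square root of the mean of the squared differences between actual and predicted prices. As a sketch of that arithmetic, the helper below (a hypothetical `rmse` function, not part of scikit-learn) applies it to the first five rows of the comparison table only, so its result is not the RMSE of the full validation set.

```python
import math


def rmse(reales, predichos):
    """Root mean squared error: sqrt(mean((real - predicted)^2))."""
    if len(reales) != len(predichos) or not reales:
        raise ValueError("inputs must be non-empty and of equal length")
    errores_cuadrados = [(r - p) ** 2 for r, p in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(errores_cuadrados))


# First five rows of the comparison table above (indices 154, 147, 104, 102, 61).
precio_real = [7898.0, 10198.0, 17199.0, 14399.0, 10595.0]
precio_prediccion = [6918.0, 9960.0, 19699.0, 13499.0, 8845.0]

rmse_parcial = rmse(precio_real, precio_prediccion)
print(f"RMSE on these five rows only: {rmse_parcial:.2f}")
```

Because the errors are squared before averaging, a single large miss (such as the 2500 difference in the third row) weighs more heavily than several small ones.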
The random forest model (rf) is built with seed 1349 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1349)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=1349)
# pending ... ...
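The step left pending here is presumably the variable-importance ranking for the random forest, which the introduction lists for this model. The sketch below shows one way it could be done, mirroring the table built for the regression tree. It runs on synthetic data, not on the car-price data, so the numbers it prints say nothing about the real model. `X_demo`, `y_demo`, `modelo_demo`, and `importancia_rf` are illustrative names.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the training data (assumption: NOT the real dataset).
rng = np.random.default_rng(1349)
n = 200
X_demo = pd.DataFrame({
    "enginesize": rng.uniform(60, 330, n),
    "curbweight": rng.uniform(1500, 4100, n),
    "noise": rng.normal(size=n),  # predictor unrelated to the target
})
# Target driven mostly by enginesize, a little by curbweight.
y_demo = 100 * X_demo["enginesize"] + 2 * X_demo["curbweight"] + rng.normal(0, 500, n)

# Same settings as the model above: 20 trees, seed 1349.
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1349)
modelo_demo.fit(X_demo, y_demo)

# Impurity-based importances, one per column, summing to 1.
importancia_rf = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_rf)
```

On the real data, the same pattern would use `modelo_rf.feature_importances_` together with the column names of `X_entrena`.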
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 7977. , 10349.65 , 18873.95 , 15997.25 ,
## 8937.61666667, 8474.925 , 15009.65835 , 21553.7 ,
## 10903.4 , 33480.825 , 33529.8 , 40447.45 ,
## 8460.9 , 17086. , 6032.7 , 15509.65 ,
## 10494.6 , 16181.1 , 17863.55835 , 10647.675 ,
## 6820.55 , 8693.9 , 11317.5 , 13087.15 ,
## 17445.55 , 7971.65 , 7898.3 , 10385.40833333,
## 14249.9 , 15676.825 , 10461.7 , 8418.2 ,
## 5810.45 , 14793.05835 , 16371.7 , 21921.7 ,
## 16728.225 , 31958.175 , 9515.13333333, 8418.225 ,
## 8274.35 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 154 0 95.7 ... 7898.0 7977.000000
## 147 0 97.0 ... 10198.0 10349.650000
## 104 3 91.3 ... 17199.0 18873.950000
## 102 0 100.4 ... 14399.0 15997.250000
## 61 1 98.8 ... 10595.0 8937.616667
## 163 1 94.5 ... 8058.0 8474.925000
## 124 3 95.9 ... 12764.0 15009.658350
## 7 1 105.8 ... 18920.0 21553.700000
## 169 2 98.4 ... 9989.0 10903.400000
## 16 0 103.5 ... 41315.0 33480.825000
## 15 0 103.5 ... 30760.0 33529.800000
## 73 0 120.9 ... 40960.0 40447.450000
## 187 2 97.3 ... 9495.0 8460.900000
## 109 0 114.2 ... 12440.0 17086.000000
## 52 1 93.1 ... 6795.0 6032.700000
## 111 0 107.9 ... 15580.0 15509.650000
## 80 3 96.3 ... 9959.0 10494.600000
## 103 0 100.4 ... 13499.0 16181.100000
## 75 1 102.7 ... 16503.0 17863.558350
## 10 2 101.2 ... 16430.0 10647.675000
## 54 1 93.1 ... 7395.0 6820.550000
## 40 0 96.5 ... 10295.0 8693.900000
## 57 3 95.3 ... 13645.0 11317.500000
## 66 0 104.9 ... 18344.0 13087.150000
## 137 2 99.1 ... 18620.0 17445.550000
## 161 0 95.7 ... 8358.0 7971.650000
## 158 0 95.7 ... 7898.0 7898.300000
## 11 0 101.2 ... 16925.0 10385.408333
## 171 2 98.4 ... 11549.0 14249.900000
## 2 1 94.5 ... 16500.0 15676.825000
## 177 -1 102.4 ... 11248.0 10461.700000
## 184 2 97.3 ... 7995.0 8418.200000
## 21 1 93.7 ... 5572.0 5810.450000
## 82 3 95.9 ... 12629.0 14793.058350
## 125 3 94.5 ... 22018.0 16371.700000
## 6 1 105.8 ... 17710.0 21921.700000
## 65 0 104.9 ... 18280.0 16728.225000
## 48 0 113.0 ... 35550.0 31958.175000
## 42 1 96.5 ... 10345.0 9515.133333
## 27 1 93.7 ... 8558.0 8418.225000
## 79 1 93.0 ... 7689.0 8274.350000
##
## [41 rows x 45 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2776.9562227134757
or, equivalently, taking the square root of the mean squared error:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2776.9562227134757
The predictions of the three models are compared side by side.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 154 0 95.7 ... 6918.0 7977.000000
## 147 0 97.0 ... 9960.0 10349.650000
## 104 3 91.3 ... 19699.0 18873.950000
## 102 0 100.4 ... 13499.0 15997.250000
## 61 1 98.8 ... 8845.0 8937.616667
## 163 1 94.5 ... 8238.0 8474.925000
## 124 3 95.9 ... 15690.0 15009.658350
## 7 1 105.8 ... 21485.0 21553.700000
## 169 2 98.4 ... 11694.0 10903.400000
## 16 0 103.5 ... 36880.0 33480.825000
## 15 0 103.5 ... 36880.0 33529.800000
## 73 0 120.9 ... 45400.0 40447.450000
## 187 2 97.3 ... 6785.0 8460.900000
## 109 0 114.2 ... 16695.0 17086.000000
## 52 1 93.1 ... 6095.0 6032.700000
## 111 0 107.9 ... 16630.0 15509.650000
## 80 3 96.3 ... 9639.0 10494.600000
## 103 0 100.4 ... 13499.0 16181.100000
## 75 1 102.7 ... 18420.0 17863.558350
## 10 2 101.2 ... 13950.0 10647.675000
## 54 1 93.1 ... 6695.0 6820.550000
## 40 0 96.5 ... 8845.0 8693.900000
## 57 3 95.3 ... 11395.0 11317.500000
## 66 0 104.9 ... 12290.0 13087.150000
## 137 2 99.1 ... 18150.0 17445.550000
## 161 0 95.7 ... 7198.0 7971.650000
## 158 0 95.7 ... 7788.0 7898.300000
## 11 0 101.2 ... 13950.0 10385.408333
## 171 2 98.4 ... 17669.0 14249.900000
## 2 1 94.5 ... 15690.0 15676.825000
## 177 -1 102.4 ... 9988.0 10461.700000
## 184 2 97.3 ... 7775.0 8418.200000
## 21 1 93.7 ... 5572.0 5810.450000
## 82 3 95.9 ... 14869.0 14793.058350
## 125 3 94.5 ... 15690.0 16371.700000
## 6 1 105.8 ... 21485.0 21921.700000
## 65 0 104.9 ... 21105.0 16728.225000
## 48 0 113.0 ... 32250.0 31958.175000
## 42 1 96.5 ... 9980.0 9515.133333
## 27 1 93.7 ... 7957.0 8418.225000
## 79 1 93.0 ... 7957.0 8274.350000
##
## [41 rows x 47 columns]
The RMSE values are compared.
A NumPy array is created.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3362.89627667, 2777.33217315, 2776.95622271]])
A pandas DataFrame is built from the NumPy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 3362.896277 2777.332173 2776.956223
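Once the three values sit side by side, the model with the lowest RMSE and its margin over the others can be read off programmatically. The sketch below uses plain Python and the RMSE values printed above; `rmse_por_modelo`, `mejor`, and `diferencias` are illustrative names.

```python
# RMSE values as printed in the table above.
rmse_por_modelo = {
    "rmse_rm": 3362.896277,  # multiple linear regression
    "rmse_ar": 2777.332173,  # regression tree
    "rmse_rf": 2776.956223,  # random forest
}

# Lower RMSE means smaller typical prediction error on the validation data.
mejor = min(rmse_por_modelo, key=rmse_por_modelo.get)

# How much worse each model is than the best one, in price units.
diferencias = {
    modelo: valor - rmse_por_modelo[mejor]
    for modelo, valor in rmse_por_modelo.items()
}

print("Lowest RMSE:", mejor)
for modelo, diferencia in diferencias.items():
    print(f"{modelo}: +{diferencia:.6f}")
```

A gap this small between two models comes from a single 80/20 split, so it is weak evidence on its own. Repeating the split with other seeds would show whether the ordering holds.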
In this exercise, car-price data were loaded from a CSV file hosted on GitHub, together with the variables used as predictors; categorical variables entered the models as dummy variables.
The RMSE of the multiple linear regression model is 3362.896277.
The RMSE of the regression tree model is 2777.332173.
The RMSE of the random forest model is 2776.956223.
Training and validation sets were built with 80% and 20% of the observations, respectively.
The random forest therefore has the lowest validation error, although its margin over the regression tree is under one unit (about 0.38), so the two are practically tied on this split. Both improve on the linear regression by roughly 586.