Compare supervised learning models by applying several algorithms to predict automobile prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data is loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the dataset take part.
Training data is created with 80% of the observations
Validation data is created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the value of Adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of each variable for the price is identified
The regression tree and its decision rules are displayed
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of each variable for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the better predictor.
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random Forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
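The dtype listing above separates numeric columns from text columns, and the text columns are the ones that need encoding later. A minimal sketch of how that split could be obtained programmatically, using a hypothetical three-row frame (`muestra`, with invented values in the style of the dataset) rather than the real data:

```python
import pandas as pd

# Hypothetical sample that mimics a few columns of the dataset (values are illustrative).
muestra = pd.DataFrame({
    "fueltype": ["gas", "diesel", "gas"],
    "aspiration": ["std", "turbo", "std"],
    "horsepower": [111, 154, 102],
    "price": [13495.0, 16500.0, 13950.0],
})

# Non-numeric columns are the categorical ones; numeric columns can be used directly.
categoricas = muestra.select_dtypes(exclude="number").columns.tolist()
numericas = muestra.select_dtypes(include="number").columns.tolist()

print("Categorical:", categoricas)
print("Numeric:", numericas)
```

Selecting by `exclude="number"` rather than by the `object` dtype is a design choice here: it should keep working if pandas stores text columns under a different dtype.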
| Col | Name | Description |
|---|---|---|
| 1 | Car_ID | Unique id of each observation (Integer) |
| 2 | Symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car, beginning with the car company (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car: std or turbo (Categorical) |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body style of the car: convertible, sedan, wagon … (Categorical) |
| 8 | drivewheel | Type of drive wheel: fwd, rwd … (Categorical) |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the engine (Categorical) |
| 17 | enginesize | Size of the engine (Numeric). Engine size in … |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke, or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the engine (Numeric). A measure of pressure in the engine |
| 22 | horsepower | Horsepower (Numeric). Power of the car |
| 23 | peakrpm | Peak engine rpm (Numeric). Maximum revolutions per minute |
| 24 | citympg | Mileage in the city (Numeric). Fuel consumption, in miles per gallon |
| 25 | highwaympg | Mileage on the highway (Numeric). Fuel consumption, in miles per gallon |
| 26 | price (Dependent variable) | Price of the car (Numeric), in dollars |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables of no statistical interest, that is, columns 1 and 3: car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
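Listing every column to keep works, but the same result can usually be reached by naming only the columns to discard. A small sketch of that alternative, on a hypothetical two-row frame (`muestra`) built from values shown in the first rows of the dataset:

```python
import pandas as pd

# Hypothetical two-row sample; only four of the 26 columns are reproduced here.
muestra = pd.DataFrame({
    "car_ID": [1, 2],
    "CarName": ["alfa-romero giulia", "alfa-romero stelvio"],
    "symboling": [3, 3],
    "price": [13495.0, 16500.0],
})

# Drop the identifier and the free-text name; every other column is kept in order.
reducido = muestra.drop(columns=["car_ID", "CarName"])
print(reducido.columns.tolist())
```

This form stays correct if columns are later added to the source file, whereas an explicit keep-list has to be updated by hand.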
Some variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
Identify the categorical variables and build a dataset that includes their dummy variables.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? A categorical variable is encoded as several columns, one per category, and each record holds 1 in the column of the category it belongs to and 0 in the others.
Example
| gender |
|---|
| MALE |
| FEMALE |
| MALE |
The same data as dummy variables
| gender_male | gender_female |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
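The encoding described above can be tried on a tiny frame before applying it to the whole dataset. The sketch below uses a hypothetical `muestra` frame with one categorical column and compares the full encoding with the `drop_first=True` variant, which drops the first category of each variable because it is implied by the remaining ones. Recent pandas versions return boolean columns by default, so `dtype=int` is passed here on the assumption that 0/1 values are wanted.

```python
import pandas as pd

# Hypothetical sample with a single categorical column.
muestra = pd.DataFrame({"fueltype": ["gas", "diesel", "gas"]})

# One column per category (categories are ordered alphabetically).
completo = pd.get_dummies(muestra, dtype=int)

# drop_first=True removes the first category ("diesel"), leaving k-1 columns.
reducido = pd.get_dummies(muestra, drop_first=True, dtype=int)

print(completo)
print(reducido)
```

Dropping one level per variable avoids perfectly redundant columns, which matters for the linear regression more than for the tree-based models.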
datos_dummis = pd.get_dummies(datos, drop_first = True)
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
The training data is 80% of the observations and the validation data is the remaining 20%. Seed: 1307
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(
    datos_dummis.drop(columns = "price"),
    datos_dummis['price'],
    train_size = 0.80,
    random_state = 1307
)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 109 0 114.2 ... 0 0
## 121 1 93.7 ... 0 0
## 9 0 99.5 ... 0 0
## 35 0 96.5 ... 0 0
## 25 1 93.7 ... 0 0
## .. ... ... ... ... ...
## 59 1 98.8 ... 0 0
## 134 3 99.1 ... 0 0
## 178 3 102.9 ... 0 0
## 81 3 96.3 ... 0 0
## 122 1 93.7 ... 0 0
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 12 0 101.2 ... 0 0
## 65 0 104.9 ... 0 0
## 167 2 98.4 ... 0 0
## 5 2 99.8 ... 0 0
## 34 1 93.7 ... 0 0
## 199 -1 104.3 ... 0 0
## 201 -1 109.1 ... 0 0
## 44 1 94.5 ... 0 0
## 41 0 96.5 ... 0 0
## 148 0 96.9 ... 0 0
## 179 3 102.9 ... 0 0
## 184 2 97.3 ... 0 0
## 182 2 97.3 ... 0 0
## 106 1 99.2 ... 0 0
## 153 0 95.7 ... 0 0
## 102 0 100.4 ... 0 0
## 130 0 96.1 ... 0 0
## 161 0 95.7 ... 0 0
## 156 0 95.7 ... 0 0
## 197 -1 104.3 ... 0 0
## 13 0 101.2 ... 0 0
## 61 1 98.8 ... 0 0
## 149 0 96.9 ... 0 0
## 124 3 95.9 ... 1 0
## 98 2 95.1 ... 0 0
## 16 0 103.5 ... 0 0
## 60 0 98.8 ... 0 0
## 139 2 93.7 ... 0 0
## 2 1 94.5 ... 0 0
## 142 0 97.2 ... 0 0
## 7 1 105.8 ... 0 0
## 55 3 95.3 ... 0 0
## 172 2 98.4 ... 0 0
## 73 0 120.9 ... 0 0
## 32 1 93.7 ... 0 0
## 158 0 95.7 ... 0 0
## 15 0 103.5 ... 0 0
## 49 0 102.0 ... 0 0
## 19 1 94.5 ... 0 0
## 93 1 94.5 ... 0 0
## 112 0 107.9 ... 0 0
##
## [41 rows x 43 columns]
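The 164 and 41 rows reported above follow from splitting 205 observations at 80%. A minimal sketch that reproduces only the partition sizes, using placeholder arrays instead of the real predictors (the arrays `X` and `y` are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same number of rows as the dataset (205).
X = np.arange(205 * 2).reshape(205, 2)
y = np.arange(205)

X_ent, X_val, y_ent, y_val = train_test_split(
    X, y, train_size=0.80, random_state=1307
)

# Every row lands in exactly one of the two partitions.
print(X_ent.shape, X_val.shape)
```

Fixing `random_state` makes the partition repeatable, so the three models are compared on the same validation rows.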
The multiple linear regression model (rm) is built
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()
Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
modelo_rm.coef_
## array([ 6.32613859e+02, 2.27963855e+02, -1.10498740e+02, 1.05150436e+03,
## 6.57202182e+01, 3.23427770e+00, 1.01958489e+02, -4.46104395e+02,
## -4.30940839e+03, -9.10266208e+02, -1.14561783e+01, 1.80088062e+00,
## -1.53320270e+02, 1.92507393e+02, -5.87164142e+03, 1.96430362e+03,
## -8.03927732e+02, -2.24962852e+03, -2.75484291e+03, -1.76613571e+03,
## -2.76841960e+03, -7.11479655e+02, 6.27655796e+02, 1.40654542e+04,
## -5.71810369e+03, -5.68451258e+03, 3.61941565e+02, -2.36432179e+03,
## -5.87958377e+03, -1.00942368e+03, -8.83871908e+03, -9.11706679e+03,
## -7.15766828e+03, 2.51726428e+03, 1.81898940e-12, -1.00942368e+03,
## -9.66144774e+02, -3.00659664e+03, 5.87164142e+03, -4.03936001e+03,
## -1.27001937e+03, -3.08920128e+03, -1.47396074e+03])
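A bare array of 43 numbers is hard to read because it does not say which coefficient belongs to which predictor. One possible way to label them is to wrap `coef_` in a pandas Series indexed by the column names. The sketch below illustrates the idea on a hypothetical noise-free dataset (`x1`, `x2` and the true coefficients are invented), not on the car data:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical predictors and a response built as y = 5 + 2*x1 + 3*x2.
X = pd.DataFrame({"x1": [0, 1, 2, 3, 4], "x2": [1, 0, 2, 1, 3]})
y = 5 + 2 * X["x1"] + 3 * X["x2"]

modelo = LinearRegression().fit(X, y)

# Pair each slope with its column name; the intercept is kept apart.
coeficientes = pd.Series(modelo.coef_, index=X.columns)
print("Intercept:", modelo.intercept_)
print(coeficientes)
```

For the model in this case, the same pattern would presumably be `pd.Series(modelo_rm.coef_, index=X_entrena.columns)`.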
# R-squared on the training data
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.949825599700917
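The value above is the plain R-squared on the training data. The adjusted version mentioned in the objectives penalizes the number of predictors: \(R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - p - 1}\). A minimal sketch, assuming n = 164 training rows and p = 43 predictors as reported above; if some dummy columns are constant or collinear in the training split, the effective p would be smaller and the result slightly higher. The helper name `r2_ajustado` is hypothetical.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Training R-squared printed above, with the training-set dimensions.
r2_entrena = 0.949825599700917
valor = r2_ajustado(r2_entrena, n=164, p=43)
print(valor)
```

statsmodels, imported at the top as `sm`, reports the same quantity as `rsquared_adj` on a fitted OLS result, along with the p-values needed for the 90% confidence question.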
predicciones_rm = modelo_rm.predict(X_valida)
# [:-1] leaves out the last of the 41 predictions
print(predicciones_rm[:-1])
## [16906.24645928 17905.34379668 12687.08065739 14473.81234375
## 7531.24111686 18092.69851123 22002.6908004 6524.46701582
## 8555.44476079 8649.54186513 21546.70020133 10197.10748669
## 9383.47692113 16337.88097091 6237.12717436 14033.0496879
## 9190.99612388 6882.6353427 8022.76284149 16158.0504239
## 17888.05946516 10196.93786786 10749.92730184 15452.77150481
## 6457.26913979 26421.2491729 11556.11227891 6057.31654712
## 9121.86281173 7111.8637196 20288.21179853 12736.91430575
## 16409.34019649 44096.31976689 6063.21650899 7694.1831824
## 26350.36527899 40876.74888604 5535.7598136 5176.01593528]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 12 0 101.2 ... 20970.0 16906.246459
## 65 0 104.9 ... 18280.0 17905.343797
## 167 2 98.4 ... 8449.0 12687.080657
## 5 2 99.8 ... 15250.0 14473.812344
## 34 1 93.7 ... 7129.0 7531.241117
## 199 -1 104.3 ... 18950.0 18092.698511
## 201 -1 109.1 ... 19045.0 22002.690800
## 44 1 94.5 ... 8916.5 6524.467016
## 41 0 96.5 ... 12945.0 8555.444761
## 148 0 96.9 ... 8013.0 8649.541865
## 179 3 102.9 ... 15998.0 21546.700201
## 184 2 97.3 ... 7995.0 10197.107487
## 182 2 97.3 ... 7775.0 9383.476921
## 106 1 99.2 ... 18399.0 16337.880971
## 153 0 95.7 ... 6918.0 6237.127174
## 102 0 100.4 ... 14399.0 14033.049688
## 130 0 96.1 ... 9295.0 9190.996124
## 161 0 95.7 ... 8358.0 6882.635343
## 156 0 95.7 ... 6938.0 8022.762841
## 197 -1 104.3 ... 16515.0 16158.050424
## 13 0 101.2 ... 21105.0 17888.059465
## 61 1 98.8 ... 10595.0 10196.937868
## 149 0 96.9 ... 11694.0 10749.927302
## 124 3 95.9 ... 12764.0 15452.771505
## 98 2 95.1 ... 8249.0 6457.269140
## 16 0 103.5 ... 41315.0 26421.249173
## 60 0 98.8 ... 8495.0 11556.112279
## 139 2 93.7 ... 7053.0 6057.316547
## 2 1 94.5 ... 16500.0 9121.862812
## 142 0 97.2 ... 7775.0 7111.863720
## 7 1 105.8 ... 18920.0 20288.211799
## 55 3 95.3 ... 10945.0 12736.914306
## 172 2 98.4 ... 17669.0 16409.340196
## 73 0 120.9 ... 40960.0 44096.319767
## 32 1 93.7 ... 5399.0 6063.216509
## 158 0 95.7 ... 7898.0 7694.183182
## 15 0 103.5 ... 30760.0 26350.365279
## 49 0 102.0 ... 36000.0 40876.748886
## 19 1 94.5 ... 6295.0 5535.759814
## 93 1 94.5 ... 7349.0 5176.015935
## 112 0 107.9 ... 16900.0 17075.756190
##
## [41 rows x 45 columns]
rmse_rm = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_rm,
    squared = False  # return the RMSE instead of the MSE
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 3461.5175007507987
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3461.5175007507987
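Both calls compute the same statistic, \(\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). A hand-written sketch of that formula, applied only to the first three rows of the comparison table above, so its result is not the RMSE of the full validation set. The function name `rmse` is hypothetical.

```python
import math

def rmse(reales, predichos):
    """Square root of the mean of the squared prediction errors."""
    errores_cuadrados = [(r - p) ** 2 for r, p in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(errores_cuadrados))

# First three validation rows (Precio_Real, Precio_Prediccion) from the table above.
reales = [20970.0, 18280.0, 8449.0]
predichos = [16906.246459, 17905.343797, 12687.080657]
print(rmse(reales, predichos))
```

Because the errors are squared before averaging, a few large misses weigh more than many small ones. Recent scikit-learn releases appear to deprecate the `squared` argument in favor of a separate `root_mean_squared_error` function, so the `np.sqrt` form is likely the more portable of the two calls.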
The regression tree model (ar) is built
modelo_ar = DecisionTreeRegressor(
    # max_depth = 3,
    random_state = 1307
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=1307)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of leaf nodes: {modelo_ar.get_n_leaves()}")
## Number of leaf nodes: 153
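With 164 training rows, 153 leaves means that almost every observation sits in its own leaf, which usually signals a tree that memorizes the training data. The sketch below, on synthetic data (the arrays and the linear relationship are assumptions for illustration), shows how the `max_depth` parameter that is commented out above bounds the size of the tree:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data with the same number of rows as the training set (164).
rng = np.random.default_rng(1307)
X = rng.uniform(0, 10, size=(164, 3))
y = 3 * X[:, 0] + rng.normal(0, 1, size=164)

# Unrestricted tree versus a tree limited to three levels of splits.
arbol_libre = DecisionTreeRegressor(random_state=1307).fit(X, y)
arbol_limitado = DecisionTreeRegressor(max_depth=3, random_state=1307).fit(X, y)

print("Unrestricted:", arbol_libre.get_depth(), arbol_libre.get_n_leaves())
print("Limited:     ", arbol_limitado.get_depth(), arbol_limitado.get_n_leaves())
```

A depth of 3 allows at most 8 leaves. Whether that helps on the car data would have to be checked against the validation RMSE.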
# class_names applies only to classifiers, so it is omitted here
# plot = plot_tree(
#     decision_tree = modelo_ar,
#     feature_names = list(datos_dummis.drop(columns = "price").columns),
#     filled = True,
#     impurity = False,
#     fontsize = 10,
#     precision = 2,
#     ax = ax
# )
# plot
Decision rules of the tree
texto_modelo = export_text(
    decision_tree = modelo_ar,
    feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2542.00
## | | |--- curbweight <= 2247.00
## | | | |--- curbweight <= 2072.00
## | | | | |--- carbody_hatchback <= 0.50
## | | | | | |--- carlength <= 156.50
## | | | | | | |--- value: [8916.50]
## | | | | | |--- carlength > 156.50
## | | | | | | |--- curbweight <= 1899.00
## | | | | | | | |--- value: [5499.00]
## | | | | | | |--- curbweight > 1899.00
## | | | | | | | |--- curbweight <= 1947.50
## | | | | | | | | |--- curbweight <= 1928.00
## | | | | | | | | | |--- carheight <= 53.25
## | | | | | | | | | | |--- value: [6575.00]
## | | | | | | | | | |--- carheight > 53.25
## | | | | | | | | | | |--- value: [6649.00]
## | | | | | | | | |--- curbweight > 1928.00
## | | | | | | | | | |--- wheelbase <= 93.80
## | | | | | | | | | | |--- value: [6695.00]
## | | | | | | | | | |--- wheelbase > 93.80
## | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | |--- curbweight > 1947.50
## | | | | | | | | |--- carlength <= 168.75
## | | | | | | | | | |--- carlength <= 167.05
## | | | | | | | | | | |--- curbweight <= 1980.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- curbweight > 1980.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- carlength > 167.05
## | | | | | | | | | | |--- value: [6692.00]
## | | | | | | | | |--- carlength > 168.75
## | | | | | | | | | |--- value: [7999.00]
## | | | | |--- carbody_hatchback > 0.50
## | | | | | |--- stroke <= 3.26
## | | | | | | |--- citympg <= 33.00
## | | | | | | | |--- highwaympg <= 37.00
## | | | | | | | | |--- enginetype_ohcf <= 0.50
## | | | | | | | | | |--- value: [5195.00]
## | | | | | | | | |--- enginetype_ohcf > 0.50
## | | | | | | | | | |--- value: [5118.00]
## | | | | | | | |--- highwaympg > 37.00
## | | | | | | | | |--- curbweight <= 1985.50
## | | | | | | | | | |--- curbweight <= 1924.50
## | | | | | | | | | | |--- curbweight <= 1902.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 1902.50
## | | | | | | | | | | | |--- value: [6795.00]
## | | | | | | | | | |--- curbweight > 1924.50
## | | | | | | | | | | |--- enginesize <= 91.00
## | | | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | | | |--- enginesize > 91.00
## | | | | | | | | | | | |--- value: [6189.00]
## | | | | | | | | |--- curbweight > 1985.50
## | | | | | | | | | |--- stroke <= 3.13
## | | | | | | | | | | |--- curbweight <= 2027.50
## | | | | | | | | | | | |--- value: [6488.00]
## | | | | | | | | | | |--- curbweight > 2027.50
## | | | | | | | | | | | |--- value: [6338.00]
## | | | | | | | | | |--- stroke > 3.13
## | | | | | | | | | | |--- value: [6669.00]
## | | | | | | |--- citympg > 33.00
## | | | | | | | |--- stroke <= 3.13
## | | | | | | | | |--- citympg <= 41.00
## | | | | | | | | | |--- value: [5348.00]
## | | | | | | | | |--- citympg > 41.00
## | | | | | | | | | |--- value: [5151.00]
## | | | | | | | |--- stroke > 3.13
## | | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | |--- carwidth > 64.10
## | | | | | | | | | |--- value: [5389.00]
## | | | | | |--- stroke > 3.26
## | | | | | | |--- curbweight <= 1984.00
## | | | | | | | |--- citympg <= 40.00
## | | | | | | | | |--- wheelbase <= 90.15
## | | | | | | | | | |--- value: [6855.00]
## | | | | | | | | |--- wheelbase > 90.15
## | | | | | | | | | |--- value: [6529.00]
## | | | | | | | |--- citympg > 40.00
## | | | | | | | | |--- value: [6479.00]
## | | | | | | |--- curbweight > 1984.00
## | | | | | | | |--- value: [7799.00]
## | | | |--- curbweight > 2072.00
## | | | | |--- highwaympg <= 29.50
## | | | | | |--- value: [9980.00]
## | | | | |--- highwaympg > 29.50
## | | | | | |--- highwaympg <= 35.50
## | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | | | |--- value: [9258.00]
## | | | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | | | |--- value: [8195.00]
## | | | | | | |--- doornumber_two > 0.50
## | | | | | | | |--- carheight <= 50.70
## | | | | | | | | |--- value: [8558.00]
## | | | | | | | |--- carheight > 50.70
## | | | | | | | | |--- wheelbase <= 93.50
## | | | | | | | | | |--- fuelsystem_spdi <= 0.50
## | | | | | | | | | | |--- value: [7603.00]
## | | | | | | | | | |--- fuelsystem_spdi > 0.50
## | | | | | | | | | | |--- value: [7689.00]
## | | | | | | | | |--- wheelbase > 93.50
## | | | | | | | | | |--- fuelsystem_2bbl <= 0.50
## | | | | | | | | | | |--- peakrpm <= 5650.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- peakrpm > 5650.00
## | | | | | | | | | | | |--- value: [7895.00]
## | | | | | | | | | |--- fuelsystem_2bbl > 0.50
## | | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | | |--- value: [8058.00]
## | | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | | |--- value: [8238.00]
## | | | | | |--- highwaympg > 35.50
## | | | | | | |--- highwaympg <= 37.50
## | | | | | | | |--- horsepower <= 76.00
## | | | | | | | | |--- value: [7198.00]
## | | | | | | | |--- horsepower > 76.00
## | | | | | | | | |--- value: [7126.00]
## | | | | | | |--- highwaympg > 37.50
## | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | |--- value: [7609.00]
## | | | | | | | |--- carwidth > 64.10
## | | | | | | | | |--- value: [7738.00]
## | | |--- curbweight > 2247.00
## | | | |--- horsepower <= 100.50
## | | | | |--- carwidth <= 65.75
## | | | | | |--- peakrpm <= 5100.00
## | | | | | | |--- carheight <= 53.90
## | | | | | | | |--- symboling <= 2.00
## | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | |--- citympg <= 26.50
## | | | | | | | | | | |--- peakrpm <= 4900.00
## | | | | | | | | | | | |--- value: [6785.00]
## | | | | | | | | | | |--- peakrpm > 4900.00
## | | | | | | | | | | | |--- value: [6989.00]
## | | | | | | | | | |--- citympg > 26.50
## | | | | | | | | | | |--- carheight <= 52.90
## | | | | | | | | | | | |--- value: [7788.00]
## | | | | | | | | | | |--- carheight > 52.90
## | | | | | | | | | | | |--- value: [7463.00]
## | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | |--- value: [8189.00]
## | | | | | | | |--- symboling > 2.00
## | | | | | | | | |--- value: [8499.00]
## | | | | | | |--- carheight > 53.90
## | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | |--- value: [7898.00]
## | | | | | | | |--- carwidth > 64.10
## | | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | | |--- compressionratio <= 16.00
## | | | | | | | | | | |--- value: [9233.00]
## | | | | | | | | | |--- compressionratio > 16.00
## | | | | | | | | | | |--- value: [9495.00]
## | | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | | |--- value: [8921.00]
## | | | | | |--- peakrpm > 5100.00
## | | | | | | |--- highwaympg <= 30.00
## | | | | | | | |--- value: [11595.00]
## | | | | | | |--- highwaympg > 30.00
## | | | | | | | |--- curbweight <= 2332.00
## | | | | | | | | |--- highwaympg <= 32.50
## | | | | | | | | | |--- value: [9995.00]
## | | | | | | | | |--- highwaympg > 32.50
## | | | | | | | | | |--- wheelbase <= 97.25
## | | | | | | | | | | |--- curbweight <= 2303.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2303.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- wheelbase > 97.25
## | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | |--- curbweight > 2332.00
## | | | | | | | | |--- carheight <= 52.75
## | | | | | | | | | |--- value: [9960.00]
## | | | | | | | | |--- carheight > 52.75
## | | | | | | | | | |--- enginetype_ohcf <= 0.50
## | | | | | | | | | | |--- value: [10295.00]
## | | | | | | | | | |--- enginetype_ohcf > 0.50
## | | | | | | | | | | |--- value: [10198.00]
## | | | | |--- carwidth > 65.75
## | | | | | |--- curbweight <= 2397.50
## | | | | | | |--- citympg <= 25.50
## | | | | | | | |--- value: [10345.00]
## | | | | | | |--- citympg > 25.50
## | | | | | | | |--- stroke <= 3.47
## | | | | | | | | |--- value: [8845.00]
## | | | | | | | |--- stroke > 3.47
## | | | | | | | | |--- value: [8948.00]
## | | | | | |--- curbweight > 2397.50
## | | | | | | |--- carheight <= 52.20
## | | | | | | | |--- value: [9895.00]
## | | | | | | |--- carheight > 52.20
## | | | | | | | |--- curbweight <= 2419.50
## | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | |--- fuelsystem_2bbl <= 0.50
## | | | | | | | | | | |--- value: [10898.00]
## | | | | | | | | | |--- fuelsystem_2bbl > 0.50
## | | | | | | | | | | |--- value: [10245.00]
## | | | | | | | |--- curbweight > 2419.50
## | | | | | | | | |--- compressionratio <= 15.60
## | | | | | | | | | |--- symboling <= -0.50
## | | | | | | | | | | |--- value: [11248.00]
## | | | | | | | | | |--- symboling > -0.50
## | | | | | | | | | | |--- value: [11245.00]
## | | | | | | | | |--- compressionratio > 15.60
## | | | | | | | | | |--- peakrpm <= 4575.00
## | | | | | | | | | | |--- value: [10698.00]
## | | | | | | | | | |--- peakrpm > 4575.00
## | | | | | | | | | | |--- value: [10795.00]
## | | | |--- horsepower > 100.50
## | | | | |--- carlength <= 176.40
## | | | | | |--- enginetype_rotor <= 0.50
## | | | | | | |--- enginetype_ohcf <= 0.50
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | | | |--- value: [9959.00]
## | | | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | | | |--- highwaympg <= 29.50
## | | | | | | | | | | |--- value: [9538.00]
## | | | | | | | | | |--- highwaympg > 29.50
## | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- wheelbase <= 95.40
## | | | | | | | | | |--- value: [9298.00]
## | | | | | | | | |--- wheelbase > 95.40
## | | | | | | | | | |--- value: [9279.00]
## | | | | | | |--- enginetype_ohcf > 0.50
## | | | | | | | |--- value: [11259.00]
## | | | | | |--- enginetype_rotor > 0.50
## | | | | | | |--- horsepower <= 118.00
## | | | | | | | |--- curbweight <= 2382.50
## | | | | | | | | |--- value: [11845.00]
## | | | | | | | |--- curbweight > 2382.50
## | | | | | | | | |--- value: [13645.00]
## | | | | | | |--- horsepower > 118.00
## | | | | | | | |--- value: [15645.00]
## | | | | |--- carlength > 176.40
## | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- doornumber_two > 0.50
## | | | | | | | |--- value: [16430.00]
## | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | |--- value: [13950.00]
## | |--- curbweight > 2542.00
## | | |--- carwidth <= 68.65
## | | | |--- horsepower <= 118.50
## | | | | |--- carwidth <= 65.85
## | | | | | |--- wheelbase <= 92.15
## | | | | | | |--- value: [14997.50]
## | | | | | |--- wheelbase > 92.15
## | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | |--- value: [8778.00]
## | | | | | | |--- doornumber_two > 0.50
## | | | | | | | |--- curbweight <= 2615.00
## | | | | | | | | |--- value: [9989.00]
## | | | | | | | |--- curbweight > 2615.00
## | | | | | | | | |--- boreratio <= 3.52
## | | | | | | | | | |--- value: [11048.00]
## | | | | | | | | |--- boreratio > 3.52
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- value: [11199.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- value: [11549.00]
## | | | | |--- carwidth > 65.85
## | | | | | |--- curbweight <= 2697.50
## | | | | | | |--- carlength <= 181.65
## | | | | | | | |--- boreratio <= 3.10
## | | | | | | | | |--- value: [13845.00]
## | | | | | | | |--- boreratio > 3.10
## | | | | | | | | |--- value: [13295.00]
## | | | | | | |--- carlength > 181.65
## | | | | | | | |--- symboling <= 2.50
## | | | | | | | | |--- compressionratio <= 9.15
## | | | | | | | | | |--- value: [12290.00]
## | | | | | | | | |--- compressionratio > 9.15
## | | | | | | | | | |--- value: [12170.00]
## | | | | | | | |--- symboling > 2.50
## | | | | | | | | |--- value: [11850.00]
## | | | | | |--- curbweight > 2697.50
## | | | | | | |--- carheight <= 55.25
## | | | | | | | |--- horsepower <= 93.50
## | | | | | | | | |--- value: [18344.00]
## | | | | | | | |--- horsepower > 93.50
## | | | | | | | | |--- value: [17450.00]
## | | | | | | |--- carheight > 55.25
## | | | | | | | |--- curbweight <= 3241.00
## | | | | | | | | |--- stroke <= 3.11
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- carwidth <= 67.45
## | | | | | | | | | | | |--- value: [15510.00]
## | | | | | | | | | | |--- carwidth > 67.45
## | | | | | | | | | | | |--- value: [15580.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- value: [15040.00]
## | | | | | | | | |--- stroke > 3.11
## | | | | | | | | | |--- curbweight <= 3136.00
## | | | | | | | | | | |--- curbweight <= 3054.50
## | | | | | | | | | | | |--- truncated branch of depth 4
## | | | | | | | | | | |--- curbweight > 3054.50
## | | | | | | | | | | | |--- value: [16630.00]
## | | | | | | | | | |--- curbweight > 3136.00
## | | | | | | | | | | |--- compressionratio <= 14.70
## | | | | | | | | | | | |--- value: [12440.00]
## | | | | | | | | | | |--- compressionratio > 14.70
## | | | | | | | | | | | |--- value: [13200.00]
## | | | | | | | |--- curbweight > 3241.00
## | | | | | | | | |--- carheight <= 57.70
## | | | | | | | | | |--- curbweight <= 3268.50
## | | | | | | | | | | |--- value: [17950.00]
## | | | | | | | | | |--- curbweight > 3268.50
## | | | | | | | | | | |--- value: [16695.00]
## | | | | | | | | |--- carheight > 57.70
## | | | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | | | |--- value: [13860.00]
## | | | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | | | |--- value: [17075.00]
## | | | |--- horsepower > 118.50
## | | | | |--- horsepower <= 144.00
## | | | | | |--- peakrpm <= 5550.00
## | | | | | | |--- citympg <= 19.50
## | | | | | | | |--- value: [22018.00]
## | | | | | | |--- citympg > 19.50
## | | | | | | | |--- value: [24565.00]
## | | | | | |--- peakrpm > 5550.00
## | | | | | | |--- value: [18150.00]
## | | | | |--- horsepower > 144.00
## | | | | | |--- horsepower <= 158.00
## | | | | | | |--- compressionratio <= 9.10
## | | | | | | | |--- curbweight <= 2877.00
## | | | | | | | | |--- fuelsystem_spdi <= 0.50
## | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | |--- fuelsystem_spdi > 0.50
## | | | | | | | | | |--- value: [12629.00]
## | | | | | | | |--- curbweight > 2877.00
## | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | | | | |--- compressionratio > 9.10
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- value: [15750.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- value: [15690.00]
## | | | | | |--- horsepower > 158.00
## | | | | | | |--- horsepower <= 187.50
## | | | | | | | |--- enginesize <= 135.50
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- curbweight <= 2946.00
## | | | | | | | | | | |--- value: [18620.00]
## | | | | | | | | | |--- curbweight > 2946.00
## | | | | | | | | | | |--- value: [18420.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- carwidth <= 67.20
## | | | | | | | | | | |--- value: [18150.00]
## | | | | | | | | | |--- carwidth > 67.20
## | | | | | | | | | | |--- value: [17859.17]
## | | | | | | | |--- enginesize > 135.50
## | | | | | | | | |--- enginetype_ohcv <= 0.50
## | | | | | | | | | |--- enginesize <= 155.50
## | | | | | | | | | | |--- value: [16503.00]
## | | | | | | | | | |--- enginesize > 155.50
## | | | | | | | | | | |--- value: [16558.00]
## | | | | | | | | |--- enginetype_ohcv > 0.50
## | | | | | | | | | |--- value: [17199.00]
## | | | | | | |--- horsepower > 187.50
## | | | | | | | |--- value: [19699.00]
## | | |--- carwidth > 68.65
## | | | |--- curbweight <= 2982.00
## | | | | |--- symboling <= 0.00
## | | | | | |--- value: [16845.00]
## | | | | |--- symboling > 0.00
## | | | | | |--- value: [17710.00]
## | | | |--- curbweight > 2982.00
## | | | | |--- compressionratio <= 8.55
## | | | | | |--- value: [23875.00]
## | | | | |--- compressionratio > 8.55
## | | | | | |--- horsepower <= 124.00
## | | | | | | |--- peakrpm <= 5100.00
## | | | | | | | |--- value: [22470.00]
## | | | | | | |--- peakrpm > 5100.00
## | | | | | | | |--- value: [22625.00]
## | | | | | |--- horsepower > 124.00
## | | | | | | |--- value: [21485.00]
## |--- enginesize > 182.00
## | |--- highwaympg <= 17.00
## | | |--- value: [45400.00]
## | |--- highwaympg > 17.00
## | | |--- compressionratio <= 9.75
## | | | |--- enginetype_ohc <= 0.50
## | | | | |--- carbody_hardtop <= 0.50
## | | | | | |--- enginesize <= 214.00
## | | | | | | |--- value: [37028.00]
## | | | | | |--- enginesize > 214.00
## | | | | | | |--- wheelbase <= 104.80
## | | | | | | | |--- value: [35056.00]
## | | | | | | |--- wheelbase > 104.80
## | | | | | | | |--- cylindernumber_six <= 0.50
## | | | | | | | | |--- value: [34184.00]
## | | | | | | | |--- cylindernumber_six > 0.50
## | | | | | | | | |--- value: [33900.00]
## | | | | |--- carbody_hardtop > 0.50
## | | | | | |--- value: [33278.00]
## | | | |--- enginetype_ohc > 0.50
## | | | | |--- value: [36880.00]
## | | |--- compressionratio > 9.75
## | | | |--- carwidth <= 71.00
## | | | | |--- carbody_sedan <= 0.50
## | | | | | |--- doornumber_two <= 0.50
## | | | | | | |--- value: [28248.00]
## | | | | | |--- doornumber_two > 0.50
## | | | | | | |--- value: [28176.00]
## | | | | |--- carbody_sedan > 0.50
## | | | | | |--- value: [25552.00]
## | | | |--- carwidth > 71.00
## | | | | |--- compressionratio <= 15.75
## | | | | | |--- value: [31400.50]
## | | | | |--- compressionratio > 15.75
## | | | | | |--- value: [31600.00]
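Each line of the printed tree is a threshold test on one predictor, and each `value` leaf is the price predicted for the cars that reach it. As a minimal sketch, the `enginesize > 182.00` branch shown above can be transcribed by hand into nested conditions. The function name `predecir_rama_motor_grande` and the two example cars are hypothetical and only serve to illustrate how the rules are read; they are not part of the original workflow.

```python
def predecir_rama_motor_grande(auto):
    """Hand transcription of the `enginesize > 182.00` branch printed above."""
    if auto["enginesize"] <= 182.00:
        raise ValueError("This sketch only covers the enginesize > 182.00 branch")
    if auto["highwaympg"] <= 17.00:
        return 45400.00
    if auto["compressionratio"] <= 9.75:
        if auto["enginetype_ohc"] > 0.50:
            return 36880.00
        if auto["carbody_hardtop"] > 0.50:
            return 33278.00
        if auto["enginesize"] <= 214.00:
            return 37028.00
        if auto["wheelbase"] <= 104.80:
            return 35056.00
        return 34184.00 if auto["cylindernumber_six"] <= 0.50 else 33900.00
    # From here on: compressionratio > 9.75
    if auto["carwidth"] > 71.00:
        return 31400.50 if auto["compressionratio"] <= 15.75 else 31600.00
    if auto["carbody_sedan"] > 0.50:
        return 25552.00
    return 28248.00 if auto["doornumber_two"] <= 0.50 else 28176.00


# Hypothetical cars, used only to follow two paths of the tree
auto_a = {"enginesize": 200, "highwaympg": 16}
auto_b = {"enginesize": 190, "highwaympg": 25,
          "compressionratio": 8.0, "enginetype_ohc": 1}
print(predecir_rama_motor_grande(auto_a))
print(predecir_rama_motor_grande(auto_b))
```

Only the predictors along the path actually taken are consulted, which is why `auto_a` needs just two fields.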
importancia_predictores = pd.DataFrame(
{'predictor': datos_dummis.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.661943
## 5 curbweight 0.226520
## 10 horsepower 0.031904
## 3 carwidth 0.025962
## 13 highwaympg 0.017288
## 9 compressionratio 0.011234
## 2 carlength 0.006263
## 29 enginetype_rotor 0.003402
## 1 wheelbase 0.003112
## 11 peakrpm 0.003039
## 4 carheight 0.002600
## 12 citympg 0.001662
## 18 carbody_hatchback 0.000957
## 8 stroke 0.000887
## 26 enginetype_ohc 0.000567
## 19 carbody_sedan 0.000526
## 16 doornumber_two 0.000520
## 21 drivewheel_fwd 0.000512
## 17 carbody_hardtop 0.000348
## 0 symboling 0.000288
## 27 enginetype_ohcf 0.000275
## 40 fuelsystem_mpfi 0.000058
## 20 carbody_wagon 0.000032
## 28 enginetype_ohcv 0.000031
## 36 fuelsystem_2bbl 0.000028
## 7 boreratio 0.000023
## 22 drivewheel_rwd 0.000009
## 41 fuelsystem_spdi 0.000006
## 32 cylindernumber_six 0.000006
## 34 cylindernumber_twelve 0.000000
## 39 fuelsystem_mfi 0.000000
## 38 fuelsystem_idi 0.000000
## 37 fuelsystem_4bbl 0.000000
## 35 cylindernumber_two 0.000000
## 31 cylindernumber_four 0.000000
## 33 cylindernumber_three 0.000000
## 15 aspiration_turbo 0.000000
## 30 cylindernumber_five 0.000000
## 14 fueltype_gas 0.000000
## 25 enginetype_l 0.000000
## 24 enginetype_dohcv 0.000000
## 23 enginelocation_rear 0.000000
## 42 fuelsystem_spfi 0.000000
According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, horsepower, carwidth, and highwaympg.
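The importances of a tree sum to 1, so reading the table cumulatively shows how much of the total a handful of predictors covers. The sketch below copies the five largest values from the table printed above; the names `importancia_top` and `acumulada` are hypothetical.

```python
import pandas as pd

# Five largest importances, copied from the table printed above
importancia_top = pd.DataFrame({
    "predictor": ["enginesize", "curbweight", "horsepower", "carwidth", "highwaympg"],
    "importancia": [0.661943, 0.226520, 0.031904, 0.025962, 0.017288],
})

# Running total: share of the overall importance covered so far
importancia_top["acumulada"] = importancia_top["importancia"].cumsum()
print(importancia_top)
```

With these values, enginesize and curbweight alone account for roughly 89% of the total importance, and the five predictors together for roughly 96%.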
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([24565. , 22018. , 9639. , 13950. , 6529. , 18420. , 21485. ,
## 8916.5, 9279. , 8921. , 16558. , 9495. , 9495. , 17199. ,
## 7898. , 13499. , 13295. , 9258. , 7198. , 15985. , 24565. ,
## 8845. , 8778. , 12629. , 7150.5, 36880. , 10245. , 8238. ,
## 12964. , 9258. , 17710. , 11845. , 11199. , 45400. , 5348. ,
## 7463. , 36880. , 45400. , 5348. , 7999. , 17950. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 12 0 101.2 ... 20970.0 24565.0
## 65 0 104.9 ... 18280.0 22018.0
## 167 2 98.4 ... 8449.0 9639.0
## 5 2 99.8 ... 15250.0 13950.0
## 34 1 93.7 ... 7129.0 6529.0
## 199 -1 104.3 ... 18950.0 18420.0
## 201 -1 109.1 ... 19045.0 21485.0
## 44 1 94.5 ... 8916.5 8916.5
## 41 0 96.5 ... 12945.0 9279.0
## 148 0 96.9 ... 8013.0 8921.0
## 179 3 102.9 ... 15998.0 16558.0
## 184 2 97.3 ... 7995.0 9495.0
## 182 2 97.3 ... 7775.0 9495.0
## 106 1 99.2 ... 18399.0 17199.0
## 153 0 95.7 ... 6918.0 7898.0
## 102 0 100.4 ... 14399.0 13499.0
## 130 0 96.1 ... 9295.0 13295.0
## 161 0 95.7 ... 8358.0 9258.0
## 156 0 95.7 ... 6938.0 7198.0
## 197 -1 104.3 ... 16515.0 15985.0
## 13 0 101.2 ... 21105.0 24565.0
## 61 1 98.8 ... 10595.0 8845.0
## 149 0 96.9 ... 11694.0 8778.0
## 124 3 95.9 ... 12764.0 12629.0
## 98 2 95.1 ... 8249.0 7150.5
## 16 0 103.5 ... 41315.0 36880.0
## 60 0 98.8 ... 8495.0 10245.0
## 139 2 93.7 ... 7053.0 8238.0
## 2 1 94.5 ... 16500.0 12964.0
## 142 0 97.2 ... 7775.0 9258.0
## 7 1 105.8 ... 18920.0 17710.0
## 55 3 95.3 ... 10945.0 11845.0
## 172 2 98.4 ... 17669.0 11199.0
## 73 0 120.9 ... 40960.0 45400.0
## 32 1 93.7 ... 5399.0 5348.0
## 158 0 95.7 ... 7898.0 7463.0
## 15 0 103.5 ... 30760.0 36880.0
## 49 0 102.0 ... 36000.0 45400.0
## 19 1 94.5 ... 6295.0 5348.0
## 93 1 94.5 ... 7349.0 7999.0
## 112 0 107.9 ... 16900.0 17950.0
##
## [41 rows x 45 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2848.317613190242
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2848.317613190242
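Both calls compute the same quantity: the square root of the mean of the squared differences between real and predicted prices. As a minimal sketch, it can be written by hand with NumPy, which also avoids depending on the `squared` argument of `mean_squared_error`, an argument that recent scikit-learn releases deprecate. The function name `rmse_manual` is hypothetical, and the three price pairs are the first rows of the comparison table above, so the result is not the RMSE of the full validation set.

```python
import numpy as np


def rmse_manual(y_real, y_pred):
    """Root mean squared error: sqrt(mean((y_real - y_pred)^2))."""
    y_real = np.asarray(y_real, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_real - y_pred) ** 2)))


# First three rows of the comparison table (Precio_Real, Precio_Prediccion)
precio_real = [20970.0, 18280.0, 8449.0]
precio_pred = [24565.0, 22018.0, 9639.0]
print(rmse_manual(precio_real, precio_pred))
```

Because the errors are squared before averaging, a few large misses weigh more heavily in the RMSE than many small ones.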
The random forest model (rf) is built with seed 1307 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1307)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=1307)
# pending ... ...
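The variable importance step for the random forest is left pending above. A possible sketch, analogous to the one used for the regression tree, is shown below on synthetic stand-in data so that it runs on its own. The names `X_demo`, `y_demo`, `modelo_demo`, and `importancia_rf_demo` are hypothetical; with the objects of this case the same pattern would presumably use `modelo_rf.feature_importances_` together with the columns of `X_entrena`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (NOT the car price dataset): x0 dominates the target
rng = np.random.default_rng(1307)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x0", "x1", "x2"])
y_demo = 10 * X_demo["x0"] + X_demo["x1"] + rng.normal(scale=0.1, size=200)

# Same settings as in the text: 20 trees and seed 1307
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1307)
modelo_demo.fit(X_demo, y_demo)

# Importance table, sorted from most to least important
importancia_rf_demo = pd.DataFrame(
    {"predictor": X_demo.columns,
     "importancia": modelo_demo.feature_importances_}
).sort_values("importancia", ascending=False)
print(importancia_rf_demo)
```

In a random forest the importances are averaged over all trees, so they tend to be spread less sharply than in a single tree.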
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([15853.55 , 15804.7 , 10331. , 12790.9 ,
## 6985.475 , 17827.7 , 19760.15 , 6745.95 ,
## 12716.05 , 10337.35 , 16633.9 , 8641.8 ,
## 8449.8 , 16958.45 , 8007.55 , 14487.05 ,
## 11779.75 , 8457.25 , 7490.4 , 13902.15 ,
## 16487.1 , 9982.65 , 14338.11666667, 14808.2 ,
## 7192.01666667, 36052.55 , 10084.85 , 7875. ,
## 13790.17085 , 8299.125 , 19134.4 , 11983.4 ,
## 12924.55 , 40175.7 , 5866.1 , 7907.35 ,
## 36395.05 , 36480.35 , 6337.8 , 7608.1 ,
## 16169.4 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 12 0 101.2 ... 20970.0 15853.550000
## 65 0 104.9 ... 18280.0 15804.700000
## 167 2 98.4 ... 8449.0 10331.000000
## 5 2 99.8 ... 15250.0 12790.900000
## 34 1 93.7 ... 7129.0 6985.475000
## 199 -1 104.3 ... 18950.0 17827.700000
## 201 -1 109.1 ... 19045.0 19760.150000
## 44 1 94.5 ... 8916.5 6745.950000
## 41 0 96.5 ... 12945.0 12716.050000
## 148 0 96.9 ... 8013.0 10337.350000
## 179 3 102.9 ... 15998.0 16633.900000
## 184 2 97.3 ... 7995.0 8641.800000
## 182 2 97.3 ... 7775.0 8449.800000
## 106 1 99.2 ... 18399.0 16958.450000
## 153 0 95.7 ... 6918.0 8007.550000
## 102 0 100.4 ... 14399.0 14487.050000
## 130 0 96.1 ... 9295.0 11779.750000
## 161 0 95.7 ... 8358.0 8457.250000
## 156 0 95.7 ... 6938.0 7490.400000
## 197 -1 104.3 ... 16515.0 13902.150000
## 13 0 101.2 ... 21105.0 16487.100000
## 61 1 98.8 ... 10595.0 9982.650000
## 149 0 96.9 ... 11694.0 14338.116667
## 124 3 95.9 ... 12764.0 14808.200000
## 98 2 95.1 ... 8249.0 7192.016667
## 16 0 103.5 ... 41315.0 36052.550000
## 60 0 98.8 ... 8495.0 10084.850000
## 139 2 93.7 ... 7053.0 7875.000000
## 2 1 94.5 ... 16500.0 13790.170850
## 142 0 97.2 ... 7775.0 8299.125000
## 7 1 105.8 ... 18920.0 19134.400000
## 55 3 95.3 ... 10945.0 11983.400000
## 172 2 98.4 ... 17669.0 12924.550000
## 73 0 120.9 ... 40960.0 40175.700000
## 32 1 93.7 ... 5399.0 5866.100000
## 158 0 95.7 ... 7898.0 7907.350000
## 15 0 103.5 ... 30760.0 36395.050000
## 49 0 102.0 ... 36000.0 36480.350000
## 19 1 94.5 ... 6295.0 6337.800000
## 93 1 94.5 ... 7349.0 7608.100000
## 112 0 107.9 ... 16900.0 16169.400000
##
## [41 rows x 45 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2215.6474443416696
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2215.6474443416696
The predictions of the three models are compared.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 12 0 101.2 ... 24565.0 15853.550000
## 65 0 104.9 ... 22018.0 15804.700000
## 167 2 98.4 ... 9639.0 10331.000000
## 5 2 99.8 ... 13950.0 12790.900000
## 34 1 93.7 ... 6529.0 6985.475000
## 199 -1 104.3 ... 18420.0 17827.700000
## 201 -1 109.1 ... 21485.0 19760.150000
## 44 1 94.5 ... 8916.5 6745.950000
## 41 0 96.5 ... 9279.0 12716.050000
## 148 0 96.9 ... 8921.0 10337.350000
## 179 3 102.9 ... 16558.0 16633.900000
## 184 2 97.3 ... 9495.0 8641.800000
## 182 2 97.3 ... 9495.0 8449.800000
## 106 1 99.2 ... 17199.0 16958.450000
## 153 0 95.7 ... 7898.0 8007.550000
## 102 0 100.4 ... 13499.0 14487.050000
## 130 0 96.1 ... 13295.0 11779.750000
## 161 0 95.7 ... 9258.0 8457.250000
## 156 0 95.7 ... 7198.0 7490.400000
## 197 -1 104.3 ... 15985.0 13902.150000
## 13 0 101.2 ... 24565.0 16487.100000
## 61 1 98.8 ... 8845.0 9982.650000
## 149 0 96.9 ... 8778.0 14338.116667
## 124 3 95.9 ... 12629.0 14808.200000
## 98 2 95.1 ... 7150.5 7192.016667
## 16 0 103.5 ... 36880.0 36052.550000
## 60 0 98.8 ... 10245.0 10084.850000
## 139 2 93.7 ... 8238.0 7875.000000
## 2 1 94.5 ... 12964.0 13790.170850
## 142 0 97.2 ... 9258.0 8299.125000
## 7 1 105.8 ... 17710.0 19134.400000
## 55 3 95.3 ... 11845.0 11983.400000
## 172 2 98.4 ... 11199.0 12924.550000
## 73 0 120.9 ... 45400.0 40175.700000
## 32 1 93.7 ... 5348.0 5866.100000
## 158 0 95.7 ... 7463.0 7907.350000
## 15 0 103.5 ... 36880.0 36395.050000
## 49 0 102.0 ... 45400.0 36480.350000
## 19 1 94.5 ... 5348.0 6337.800000
## 93 1 94.5 ... 7999.0 7608.100000
## 112 0 107.9 ... 17950.0 16169.400000
##
## [41 rows x 47 columns]
The RMSE values are compared.
A NumPy array is created.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3461.51750075, 2848.31761319, 2215.64744434]])
A DataFrame is built from the NumPy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 3461.517501 2848.317613 2215.647444
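To make the comparison explicit, the three values can be ranked and the relative reduction of the best model computed. This is a minimal sketch using the RMSE values printed above; the names `rmse_tabla`, `ranking`, `mejor`, and `mejora_pct` are hypothetical.

```python
import pandas as pd

# RMSE values copied from the output above
rmse_tabla = pd.DataFrame(
    [[3461.517501, 2848.317613, 2215.647444]],
    columns=["rmse_rm", "rmse_ar", "rmse_rf"],
)

# Rank the models from lowest (best) to highest RMSE
ranking = rmse_tabla.iloc[0].sort_values()
mejor = ranking.index[0]

# Relative RMSE reduction of the best model against each model, in percent
mejora_pct = (1 - ranking.iloc[0] / ranking) * 100

print(ranking)
print(mejor)
print(mejora_pct.round(1))
```

With these values the random forest has the lowest RMSE, about 22% below the regression tree and about 36% below the multiple regression.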
In the multiple regression model, the variables with confidence above 90% as predictors are: fueltypegas, carbodyhardtop, carbodyhatchback, carbodywagon, enginelocationrear, carwidth, curbweight, enginetypeohc, enginetypeohcv, cylindernumbertwelve, enginesize, fuelsystemspdi, boreratio, stroke, compressionratio, peakrpm.
The result is consistent across both languages: the random forest model remains the most accurate predictor. With my seed, the lowest RMSE is 2215.64 (random forest), which is higher than the one obtained in R.
Using Python, I did not run into any problems when splitting the data with categorical variables. I am not sure about the R implementation; I think the Python library handles them better, or in this case the split for my seed simply did not cause problems.