Compare supervised learning models by applying several algorithms to predict car prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data are created with 80% of the rows
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of each variable for the price is identified
The regression tree and its decision rules are displayed
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data, using 20 trees
The importance of each variable for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the best predictor.
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 26)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | Car_ID | Unique id of each observation (Integer) |
| 2 | Symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car (Categorical). (std or turbo) |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car (Categorical). (convertible, sedan, wagon …) |
| 8 | drivewheel | Type of drive wheel (Categorical). (fwd, rwd, 4wd) |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine (Numeric). Engine size in … |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the car (Numeric). A measure of pressure in the engine |
| 22 | horsepower | Horsepower (Numeric). Engine power |
| 23 | peakrpm | Car peak rpm (Numeric). Peak revolutions per minute |
| 24 | citympg | Mileage in city (Numeric). Fuel consumption in miles per gallon |
| 25 | highwaympg | Mileage on highway (Numeric). Fuel consumption in miles per gallon |
| 26 | price (Dependent variable) | Price of the car (Numeric). Car price in dollars |
Remove the variables that carry no statistical interest, that is, drop columns 1 and 3, car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
Several variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
Identify the dummy variables and build a data set that includes them.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? They encode a categorical variable as several indicator columns, one per category; each row gets a 1 in the column of the category it belongs to and a 0 otherwise.
Example
| genero |
|---|
| MASCULINO |
| FEMENINO |
| MASCULINO |
The same data with dummy variables
| genero_masculino | genero_femenino |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
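The encoding in the example above can be sketched with pandas. This is a minimal illustration, not the car data: the `ejemplo` frame and the `dtype=int` argument are assumptions added here so that the output shows 0/1 values. It also shows what `drop_first = True` (the option used on the car data) changes: one indicator per variable is dropped, because it is implied when all the remaining indicators are 0.

```python
import pandas as pd

# Toy data mirroring the `genero` example; not part of the car data set.
ejemplo = pd.DataFrame({"genero": ["MASCULINO", "FEMENINO", "MASCULINO"]})

# One indicator column per category (dtype=int forces 0/1 instead of booleans).
completo = pd.get_dummies(ejemplo, dtype=int)

# drop_first=True drops the first category in sorted order (FEMENINO here),
# since it is implied when every remaining indicator is 0.
reducido = pd.get_dummies(ejemplo, drop_first=True, dtype=int)

print(completo)
print(reducido)
```

Dropping one level per variable avoids perfectly redundant columns, which matters mostly for the linear regression model; tree-based models are less sensitive to it.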
datos_dummis = pd.get_dummies (datos, drop_first = True)
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
Split the data into 80% for training and 20% for validation, using seed 2022.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80, random_state = 2022)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 152 1 95.7 ... 0 0
## 185 2 97.3 ... 0 0
## 162 0 95.7 ... 0 0
## 47 0 113.0 ... 0 0
## 163 1 94.5 ... 0 0
## .. ... ... ... ... ...
## 183 2 97.3 ... 0 0
## 177 -1 102.4 ... 0 0
## 112 0 107.9 ... 0 0
## 173 -1 102.4 ... 0 0
## 125 3 94.5 ... 0 0
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 36 0 96.5 ... 0 0
## 198 -2 104.3 ... 0 0
## 102 0 100.4 ... 0 0
## 146 0 97.0 ... 0 0
## 79 1 93.0 ... 1 0
## 32 1 93.7 ... 0 0
## 107 0 107.9 ... 0 0
## 180 -1 104.5 ... 0 0
## 127 3 89.5 ... 0 0
## 149 0 96.9 ... 0 0
## 43 0 94.3 ... 0 0
## 40 0 96.5 ... 0 0
## 203 -1 109.1 ... 0 0
## 138 2 93.7 ... 0 0
## 201 -1 109.1 ... 0 0
## 20 0 94.5 ... 0 0
## 164 1 94.5 ... 0 0
## 65 0 104.9 ... 0 0
## 22 1 93.7 ... 0 0
## 186 2 97.3 ... 0 0
## 106 1 99.2 ... 0 0
## 156 0 95.7 ... 0 0
## 111 0 107.9 ... 0 0
## 68 -1 110.0 ... 0 0
## 123 -1 103.3 ... 0 0
## 108 0 107.9 ... 0 0
## 78 2 93.7 ... 0 0
## 8 1 105.8 ... 0 0
## 74 1 112.0 ... 0 0
## 10 2 101.2 ... 0 0
## 113 0 114.2 ... 0 0
## 82 3 95.9 ... 1 0
## 57 3 95.3 ... 0 0
## 158 0 95.7 ... 0 0
## 58 3 95.3 ... 0 0
## 17 0 110.0 ... 0 0
## 129 1 98.4 ... 0 0
## 150 1 95.7 ... 0 0
## 73 0 120.9 ... 0 0
## 116 0 107.9 ... 0 0
## 30 2 86.6 ... 0 0
##
## [41 rows x 43 columns]
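As a quick sanity check of the split sizes reported above (164 training rows and 41 validation rows out of 205), the sketch below applies the same `train_size` and `random_state` to a plain list of row positions. The `indices` list is a stand-in for the real rows and is an assumption of this example.

```python
from sklearn.model_selection import train_test_split

n_filas = 205                    # number of rows in the car data
indices = list(range(n_filas))   # stand-in for the real rows

idx_entrena, idx_valida = train_test_split(
    indices, train_size=0.80, random_state=2022
)

# Sizes of the two parts.
print(len(idx_entrena), len(idx_valida))

# The two parts never share a row, and together they cover all rows.
sin_solape = set(idx_entrena).isdisjoint(idx_valida)
print(sin_solape, len(idx_entrena) + len(idx_valida) == n_filas)
```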
The multiple linear regression model (rm) is built
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()
Only the slope coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_
modelo_rm.coef_
## array([-1.44061234e+02, 4.20304710e+01, -7.69669355e+01, 5.40067885e+02,
## 8.20826526e+01, 5.54718628e+00, 8.41017077e+01, -1.58255808e+03,
## -3.59143919e+03, -3.37663052e+02, 1.46591017e+01, 2.09786814e+00,
## -2.37983590e+02, 1.83095361e+02, -3.71831408e+03, 1.50095971e+03,
## 1.92820429e+02, -4.20182555e+03, -3.61331893e+03, -2.55787518e+03,
## -4.29555739e+03, 8.03555949e+02, 1.62354190e+03, 6.86505950e+03,
## -2.27373675e-13, -4.60472536e+02, 2.94237926e+03, 1.56214668e+03,
## -6.39338776e+03, -1.72796049e+03, -1.06398381e+04, -1.12804032e+04,
## -6.76468915e+03, -2.15379329e+03, -8.33237282e+03, -1.72796049e+03,
## 1.20349506e+03, -1.72796049e+03, 3.71831408e+03, -1.97754583e+03,
## 5.35151220e+02, -1.84032434e+03, -2.24293176e+02])
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9286642852257763
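The value printed by `score` is the plain R-squared on the training data, not the adjusted one. A minimal sketch of the usual adjustment follows, assuming the standard formula \(1 - (1 - R^2)(n - 1)/(n - p - 1)\) and counting all 43 columns of X_entrena as predictors. That count is an assumption: several coefficients above repeat exactly, which suggests some dummy columns are redundant, so the effective number of predictors may be smaller. The function name `r2_ajustado` is hypothetical.

```python
def r2_ajustado(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a model with p predictors fitted on n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

r2_entrena = 0.9286642852257763   # value printed by modelo_rm.score above
n_obs, n_pred = 164, 43           # shape of X_entrena

print(round(r2_ajustado(r2_entrena, n_obs, n_pred), 4))
```

The adjustment penalizes the number of predictors, so with 43 columns and only 164 rows the adjusted value is noticeably lower than the raw score.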
Predictions are made with the validation data
predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1]) # [:-1] leaves out the last prediction, so 40 of the 41 values are printed
## [ 6275.26876164 20183.80316317 15898.03172714 7458.97081347
## 7476.32581442 4100.96861125 13547.48559838 22607.94940748
## 32555.04912632 10035.2107168 8993.20379876 8298.73782873
## 26892.05603303 6948.98854852 21800.61585107 6681.8695686
## 7779.68551985 16688.81823958 6245.85861703 9591.29125384
## 17418.52661503 8079.40012354 17414.70182742 26960.09519985
## 10570.39918611 17496.98536569 7304.08137327 21648.18939631
## 36651.49427211 13620.71628995 16167.7240898 13398.56832513
## 11422.73593142 8593.97388554 15901.18418856 33722.62184987
## 39795.20858981 5592.15191463 39243.07428789 17802.08061133]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 6275.268762
## 198 -2 104.3 ... 18420.0 20183.803163
## 102 0 100.4 ... 14399.0 15898.031727
## 146 0 97.0 ... 7463.0 7458.970813
## 79 1 93.0 ... 7689.0 7476.325814
## 32 1 93.7 ... 5399.0 4100.968611
## 107 0 107.9 ... 11900.0 13547.485598
## 180 -1 104.5 ... 15690.0 22607.949407
## 127 3 89.5 ... 34028.0 32555.049126
## 149 0 96.9 ... 11694.0 10035.210717
## 43 0 94.3 ... 6785.0 8993.203799
## 40 0 96.5 ... 10295.0 8298.737829
## 203 -1 109.1 ... 22470.0 26892.056033
## 138 2 93.7 ... 5118.0 6948.988549
## 201 -1 109.1 ... 19045.0 21800.615851
## 20 0 94.5 ... 6575.0 6681.869569
## 164 1 94.5 ... 8238.0 7779.685520
## 65 0 104.9 ... 18280.0 16688.818240
## 22 1 93.7 ... 6377.0 6245.858617
## 186 2 97.3 ... 8495.0 9591.291254
## 106 1 99.2 ... 18399.0 17418.526615
## 156 0 95.7 ... 6938.0 8079.400124
## 111 0 107.9 ... 15580.0 17414.701827
## 68 -1 110.0 ... 28248.0 26960.095200
## 123 -1 103.3 ... 8921.0 10570.399186
## 108 0 107.9 ... 13200.0 17496.985366
## 78 2 93.7 ... 6669.0 7304.081373
## 8 1 105.8 ... 23875.0 21648.189396
## 74 1 112.0 ... 45400.0 36651.494272
## 10 2 101.2 ... 16430.0 13620.716290
## 113 0 114.2 ... 16695.0 16167.724090
## 82 3 95.9 ... 12629.0 13398.568325
## 57 3 95.3 ... 13645.0 11422.735931
## 158 0 95.7 ... 7898.0 8593.973886
## 58 3 95.3 ... 15645.0 15901.184189
## 17 0 110.0 ... 36880.0 33722.621850
## 129 1 98.4 ... 31400.5 39795.208590
## 150 1 95.7 ... 5348.0 5592.151915
## 73 0 120.9 ... 40960.0 39243.074288
## 116 0 107.9 ... 17950.0 17802.080611
## 30 2 86.6 ... 6479.0 1307.069171
##
## [41 rows x 45 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 2887.3005726006995
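Alternatively, take the square root of the MSE directly. This form does not rely on the squared argument of mean_squared_error, which was deprecated in scikit-learn 1.4: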
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2887.3005726006995
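For reference, the statistic is \(\mathrm{RMSE} = \sqrt{\tfrac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}\). The sketch below computes it by hand, without scikit-learn, on the first three rows of the comparison table above. The helper name `rmse` is hypothetical, and since only three rows are used the result is not the validation RMSE reported above.

```python
import math

def rmse(reales, predichos):
    """Root mean squared error between two equal-length sequences."""
    if len(reales) != len(predichos):
        raise ValueError("both sequences must have the same length")
    suma = sum((r - p) ** 2 for r, p in zip(reales, predichos))
    return math.sqrt(suma / len(reales))

# First three rows of the comparison table (actual vs. predicted price).
reales = [7295.0, 18420.0, 14399.0]
predichos = [6275.268762, 20183.803163, 15898.031727]

print(rmse(reales, predichos))
```

Because the errors are squared before averaging, a few large misses (such as the last validation row) weigh heavily on the result, and the statistic is expressed in the same units as the price.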
The regression tree model (ar) is built
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 2022
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=2022)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 13
print(f"Number of leaf nodes: {modelo_ar.get_n_leaves()}")
## Number of leaf nodes: 152
# The feature names must come from the dummy-encoded data used to fit the tree
#plot = plot_tree(
# decision_tree = modelo_ar,
# feature_names = list(datos_dummis.drop(columns = "price").columns),
# filled = True,
# impurity = False,
# fontsize = 10,
# precision = 2,
# ax = ax
# )
#plot
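A tree with 152 leaves fitted on 164 training rows has close to one leaf per car, which may indicate overfitting. The sketch below illustrates, on synthetic data only, how the `max_depth` argument (left commented out in the model definition above) bounds the size of the tree. The data, the names `arbol_libre` and `arbol_podado`, and the choice of depth 3 are all assumptions of this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data, used only to illustrate the effect of max_depth.
rng = np.random.default_rng(2022)
X = rng.uniform(0, 10, size=(164, 3))
y = 1000 * X[:, 0] + 500 * X[:, 1] + rng.normal(0, 300, size=164)

# Unrestricted tree versus a tree limited to three levels of splits.
arbol_libre = DecisionTreeRegressor(random_state=2022).fit(X, y)
arbol_podado = DecisionTreeRegressor(max_depth=3, random_state=2022).fit(X, y)

for nombre, arbol in [("unrestricted", arbol_libre), ("max_depth=3", arbol_podado)]:
    print(nombre, "depth:", arbol.get_depth(), "leaves:", arbol.get_n_leaves())
```

A tree of depth 3 can have at most 8 leaves, which also makes it small enough to plot legibly.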
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2697.50
## | | |--- curbweight <= 2291.50
## | | | |--- citympg <= 29.50
## | | | | |--- symboling <= 2.50
## | | | | | |--- peakrpm <= 4600.00
## | | | | | | |--- curbweight <= 2155.00
## | | | | | | | |--- value: [7053.00]
## | | | | | | |--- curbweight > 2155.00
## | | | | | | | |--- symboling <= 1.00
## | | | | | | | | |--- value: [7775.00]
## | | | | | | | |--- symboling > 1.00
## | | | | | | | | |--- value: [7603.00]
## | | | | | |--- peakrpm > 4600.00
## | | | | | | |--- boreratio <= 3.22
## | | | | | | | |--- carlength <= 168.10
## | | | | | | | | |--- curbweight <= 2134.00
## | | | | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | | | | |--- value: [8358.00]
## | | | | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | | | | |--- value: [7957.00]
## | | | | | | | | |--- curbweight > 2134.00
## | | | | | | | | | |--- peakrpm <= 5150.00
## | | | | | | | | | | |--- value: [9258.00]
## | | | | | | | | | |--- peakrpm > 5150.00
## | | | | | | | | | | |--- curbweight <= 2262.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2262.50
## | | | | | | | | | | | |--- value: [9095.00]
## | | | | | | | |--- carlength > 168.10
## | | | | | | | | |--- curbweight <= 2251.00
## | | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | | |--- value: [8195.00]
## | | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | | |--- curbweight <= 2189.00
## | | | | | | | | | | | |--- value: [8058.00]
## | | | | | | | | | | |--- curbweight > 2189.00
## | | | | | | | | | | | |--- value: [7975.00]
## | | | | | | | | |--- curbweight > 2251.00
## | | | | | | | | | |--- value: [7898.00]
## | | | | | | |--- boreratio > 3.22
## | | | | | | | |--- value: [9298.00]
## | | | | |--- symboling > 2.50
## | | | | | |--- curbweight <= 2237.50
## | | | | | | |--- value: [9980.00]
## | | | | | |--- curbweight > 2237.50
## | | | | | | |--- value: [11595.00]
## | | | |--- citympg > 29.50
## | | | | |--- carbody_hatchback <= 0.50
## | | | | | |--- carlength <= 156.60
## | | | | | | |--- value: [8916.50]
## | | | | | |--- carlength > 156.60
## | | | | | | |--- curbweight <= 1947.50
## | | | | | | | |--- curbweight <= 1903.50
## | | | | | | | | |--- value: [5499.00]
## | | | | | | | |--- curbweight > 1903.50
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- carwidth <= 64.00
## | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | | |--- carwidth > 64.00
## | | | | | | | | | | |--- value: [6695.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- value: [6649.00]
## | | | | | | |--- curbweight > 1947.50
## | | | | | | | |--- symboling <= 1.50
## | | | | | | | | |--- enginesize <= 94.50
## | | | | | | | | | |--- carlength <= 167.05
## | | | | | | | | | | |--- carheight <= 52.35
## | | | | | | | | | | | |--- value: [7150.50]
## | | | | | | | | | | |--- carheight > 52.35
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- carlength > 167.05
## | | | | | | | | | | |--- carwidth <= 63.70
## | | | | | | | | | | | |--- value: [6918.00]
## | | | | | | | | | | |--- carwidth > 63.70
## | | | | | | | | | | | |--- value: [6692.00]
## | | | | | | | | |--- enginesize > 94.50
## | | | | | | | | | |--- enginesize <= 100.50
## | | | | | | | | | | |--- curbweight <= 2030.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- curbweight > 2030.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- enginesize > 100.50
## | | | | | | | | | | |--- fuelsystem_2bbl <= 0.50
## | | | | | | | | | | | |--- value: [7099.00]
## | | | | | | | | | | |--- fuelsystem_2bbl > 0.50
## | | | | | | | | | | | |--- value: [7126.00]
## | | | | | | | |--- symboling > 1.50
## | | | | | | | | |--- carheight <= 54.50
## | | | | | | | | | |--- value: [8249.00]
## | | | | | | | | |--- carheight > 54.50
## | | | | | | | | | |--- curbweight <= 2262.50
## | | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | | |--- curbweight > 2262.50
## | | | | | | | | | | |--- value: [7995.00]
## | | | | |--- carbody_hatchback > 0.50
## | | | | | |--- enginesize <= 94.50
## | | | | | | |--- highwaympg <= 39.50
## | | | | | | | |--- highwaympg <= 32.50
## | | | | | | | | |--- value: [5195.00]
## | | | | | | | |--- highwaympg > 32.50
## | | | | | | | | |--- fuelsystem_2bbl <= 0.50
## | | | | | | | | | |--- curbweight <= 1948.00
## | | | | | | | | | | |--- wheelbase <= 90.15
## | | | | | | | | | | | |--- value: [6855.00]
## | | | | | | | | | | |--- wheelbase > 90.15
## | | | | | | | | | | | |--- value: [6529.00]
## | | | | | | | | | |--- curbweight > 1948.00
## | | | | | | | | | | |--- value: [7129.00]
## | | | | | | | | |--- fuelsystem_2bbl > 0.50
## | | | | | | | | | |--- boreratio <= 3.00
## | | | | | | | | | | |--- carheight <= 50.70
## | | | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | | | |--- carheight > 50.70
## | | | | | | | | | | | |--- value: [6189.00]
## | | | | | | | | | |--- boreratio > 3.00
## | | | | | | | | | | |--- curbweight <= 1902.50
## | | | | | | | | | | | |--- value: [6095.00]
## | | | | | | | | | | |--- curbweight > 1902.50
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | |--- highwaympg > 39.50
## | | | | | | | |--- wheelbase <= 94.10
## | | | | | | | | |--- citympg <= 42.00
## | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | |--- value: [5389.00]
## | | | | | | | | |--- citympg > 42.00
## | | | | | | | | | |--- value: [5151.00]
## | | | | | | | |--- wheelbase > 94.10
## | | | | | | | | |--- value: [6295.00]
## | | | | | |--- enginesize > 94.50
## | | | | | | |--- horsepower <= 69.50
## | | | | | | | |--- peakrpm <= 4850.00
## | | | | | | | | |--- value: [7788.00]
## | | | | | | | |--- peakrpm > 4850.00
## | | | | | | | | |--- value: [7799.00]
## | | | | | | |--- horsepower > 69.50
## | | | | | | | |--- value: [7198.00]
## | | |--- curbweight > 2291.50
## | | | |--- citympg <= 22.00
## | | | | |--- enginesize <= 125.50
## | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | |--- value: [11395.00]
## | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | |--- symboling <= 2.50
## | | | | | | | |--- value: [12170.00]
## | | | | | | |--- symboling > 2.50
## | | | | | | | |--- value: [11850.00]
## | | | | |--- enginesize > 125.50
## | | | | | |--- highwaympg <= 24.50
## | | | | | | |--- value: [13295.00]
## | | | | | |--- highwaympg > 24.50
## | | | | | | |--- boreratio <= 3.33
## | | | | | | | |--- value: [15250.00]
## | | | | | | |--- boreratio > 3.33
## | | | | | | | |--- value: [14997.50]
## | | | |--- citympg > 22.00
## | | | | |--- wheelbase <= 99.30
## | | | | | |--- curbweight <= 2422.50
## | | | | | | |--- horsepower <= 91.00
## | | | | | | | |--- wheelbase <= 96.95
## | | | | | | | | |--- curbweight <= 2346.50
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- value: [8845.00]
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- value: [8499.00]
## | | | | | | | | |--- curbweight > 2346.50
## | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | |--- value: [6989.00]
## | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | | | | | |--- value: [8013.00]
## | | | | | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | | | | | |--- value: [8189.00]
## | | | | | | | |--- wheelbase > 96.95
## | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | |--- highwaympg <= 28.50
## | | | | | | | | | | |--- value: [9233.00]
## | | | | | | | | | |--- highwaympg > 28.50
## | | | | | | | | | | |--- fueltype_gas <= 0.50
## | | | | | | | | | | | |--- value: [9495.00]
## | | | | | | | | | | |--- fueltype_gas > 0.50
## | | | | | | | | | | | |--- value: [9370.00]
## | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | |--- value: [9720.00]
## | | | | | | |--- horsepower > 91.00
## | | | | | | | |--- carwidth <= 65.45
## | | | | | | | | |--- enginetype_ohcf <= 0.50
## | | | | | | | | | |--- carheight <= 50.50
## | | | | | | | | | | |--- value: [9959.00]
## | | | | | | | | | |--- carheight > 50.50
## | | | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- enginetype_ohcf > 0.50
## | | | | | | | | | |--- value: [9960.00]
## | | | | | | | |--- carwidth > 65.45
## | | | | | | | | |--- wheelbase <= 96.90
## | | | | | | | | | |--- value: [10345.00]
## | | | | | | | | |--- wheelbase > 96.90
## | | | | | | | | | |--- value: [9995.00]
## | | | | | |--- curbweight > 2422.50
## | | | | | | |--- carwidth <= 65.30
## | | | | | | | |--- value: [12945.00]
## | | | | | | |--- carwidth > 65.30
## | | | | | | | |--- stroke <= 3.45
## | | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | | |--- peakrpm <= 4725.00
## | | | | | | | | | | |--- value: [10795.00]
## | | | | | | | | | |--- peakrpm > 4725.00
## | | | | | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | | | | | |--- value: [11245.00]
## | | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | | |--- value: [10198.00]
## | | | | | | | |--- stroke > 3.45
## | | | | | | | | |--- curbweight <= 2629.00
## | | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | | |--- curbweight <= 2538.00
## | | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | | | | |--- curbweight > 2538.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | | |--- drivewheel_fwd <= 0.50
## | | | | | | | | | | | |--- value: [9989.00]
## | | | | | | | | | | |--- drivewheel_fwd > 0.50
## | | | | | | | | | | | |--- value: [9895.00]
## | | | | | | | | |--- curbweight > 2629.00
## | | | | | | | | | |--- value: [11199.00]
## | | | | |--- wheelbase > 99.30
## | | | | | |--- wheelbase <= 101.80
## | | | | | | |--- compressionratio <= 8.90
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- compressionratio > 8.90
## | | | | | | | |--- carbody_wagon <= 0.50
## | | | | | | | | |--- peakrpm <= 5000.00
## | | | | | | | | | |--- value: [13845.00]
## | | | | | | | | |--- peakrpm > 5000.00
## | | | | | | | | | |--- value: [13950.00]
## | | | | | | | |--- carbody_wagon > 0.50
## | | | | | | | | |--- value: [12290.00]
## | | | | | |--- wheelbase > 101.80
## | | | | | | |--- carlength <= 175.10
## | | | | | | | |--- value: [8921.00]
## | | | | | | |--- carlength > 175.10
## | | | | | | | |--- highwaympg <= 33.50
## | | | | | | | | |--- curbweight <= 2436.00
## | | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | | |--- value: [10898.00]
## | | | | | | | | |--- curbweight > 2436.00
## | | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | | |--- value: [11248.00]
## | | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | | |--- value: [10698.00]
## | | | | | | | |--- highwaympg > 33.50
## | | | | | | | | |--- value: [8948.00]
## | |--- curbweight > 2697.50
## | | |--- enginesize <= 119.50
## | | | |--- drivewheel_rwd <= 0.50
## | | | | |--- value: [8778.00]
## | | | |--- drivewheel_rwd > 0.50
## | | | | |--- value: [11048.00]
## | | |--- enginesize > 119.50
## | | | |--- peakrpm <= 5450.00
## | | | | |--- peakrpm <= 4525.00
## | | | | | |--- enginesize <= 158.00
## | | | | | | |--- citympg <= 26.50
## | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | |--- value: [13860.00]
## | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | |--- value: [17075.00]
## | | | | | | |--- citympg > 26.50
## | | | | | | | |--- citympg <= 29.50
## | | | | | | | | |--- value: [16900.00]
## | | | | | | | |--- citympg > 29.50
## | | | | | | | | |--- value: [18344.00]
## | | | | | |--- enginesize > 158.00
## | | | | | | |--- wheelbase <= 102.35
## | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | |--- value: [21105.00]
## | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | |--- value: [20970.00]
## | | | | | | |--- wheelbase > 102.35
## | | | | | | | |--- value: [24565.00]
## | | | | |--- peakrpm > 4525.00
## | | | | | |--- carwidth <= 68.65
## | | | | | | |--- horsepower <= 153.00
## | | | | | | | |--- carheight <= 52.50
## | | | | | | | | |--- curbweight <= 2869.50
## | | | | | | | | | |--- stroke <= 3.68
## | | | | | | | | | | |--- value: [11549.00]
## | | | | | | | | | |--- stroke > 3.68
## | | | | | | | | | | |--- fuelsystem_spdi <= 0.50
## | | | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | | | |--- fuelsystem_spdi > 0.50
## | | | | | | | | | | | |--- value: [12764.00]
## | | | | | | | | |--- curbweight > 2869.50
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | | | | | |--- carheight > 52.50
## | | | | | | | | |--- citympg <= 23.50
## | | | | | | | | | |--- carlength <= 187.75
## | | | | | | | | | | |--- horsepower <= 131.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- horsepower > 131.00
## | | | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | | |--- carlength > 187.75
## | | | | | | | | | | |--- stroke <= 3.17
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- stroke > 3.17
## | | | | | | | | | | | |--- value: [12440.00]
## | | | | | | | | |--- citympg > 23.50
## | | | | | | | | | |--- carheight <= 54.60
## | | | | | | | | | | |--- value: [17669.00]
## | | | | | | | | | |--- carheight > 54.60
## | | | | | | | | | | |--- curbweight <= 2988.50
## | | | | | | | | | | | |--- value: [15985.00]
## | | | | | | | | | | |--- curbweight > 2988.50
## | | | | | | | | | | | |--- value: [16515.00]
## | | | | | | |--- horsepower > 153.00
## | | | | | | | |--- citympg <= 18.00
## | | | | | | | | |--- stroke <= 3.21
## | | | | | | | | | |--- value: [18950.00]
## | | | | | | | | |--- stroke > 3.21
## | | | | | | | | | |--- value: [19699.00]
## | | | | | | | |--- citympg > 18.00
## | | | | | | | | |--- wheelbase <= 92.90
## | | | | | | | | | |--- value: [17199.00]
## | | | | | | | | |--- wheelbase > 92.90
## | | | | | | | | | |--- curbweight <= 2996.00
## | | | | | | | | | | |--- symboling <= 2.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- symboling > 2.00
## | | | | | | | | | | | |--- value: [16558.00]
## | | | | | | | | | |--- curbweight > 2996.00
## | | | | | | | | | | |--- wheelbase <= 103.70
## | | | | | | | | | | | |--- value: [15998.00]
## | | | | | | | | | | |--- wheelbase > 103.70
## | | | | | | | | | | | |--- value: [15750.00]
## | | | | | |--- carwidth > 68.65
## | | | | | | |--- aspiration_turbo <= 0.50
## | | | | | | | |--- value: [16845.00]
## | | | | | | |--- aspiration_turbo > 0.50
## | | | | | | | |--- value: [22625.00]
## | | | |--- peakrpm > 5450.00
## | | | | |--- enginesize <= 143.50
## | | | | | |--- carbody_wagon <= 0.50
## | | | | | | |--- boreratio <= 3.37
## | | | | | | | |--- wheelbase <= 99.45
## | | | | | | | | |--- value: [17450.00]
## | | | | | | | |--- wheelbase > 99.45
## | | | | | | | | |--- carbody_hatchback <= 0.50
## | | | | | | | | | |--- value: [17710.00]
## | | | | | | | | |--- carbody_hatchback > 0.50
## | | | | | | | | | |--- value: [17859.17]
## | | | | | | |--- boreratio > 3.37
## | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | |--- citympg <= 18.50
## | | | | | | | | | |--- value: [18150.00]
## | | | | | | | | |--- citympg > 18.50
## | | | | | | | | | |--- value: [18620.00]
## | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | |--- value: [18150.00]
## | | | | | |--- carbody_wagon > 0.50
## | | | | | | |--- value: [18920.00]
## | | | | |--- enginesize > 143.50
## | | | | | |--- cylindernumber_four <= 0.50
## | | | | | | |--- value: [21485.00]
## | | | | | |--- cylindernumber_four > 0.50
## | | | | | | |--- value: [22018.00]
## |--- enginesize > 182.00
## | |--- fuelsystem_mpfi <= 0.50
## | | |--- wheelbase <= 112.80
## | | | |--- carbody_sedan <= 0.50
## | | | | |--- value: [28176.00]
## | | | |--- carbody_sedan > 0.50
## | | | | |--- value: [25552.00]
## | | |--- wheelbase > 112.80
## | | | |--- value: [31600.00]
## | |--- fuelsystem_mpfi > 0.50
## | | |--- doornumber_two <= 0.50
## | | | |--- highwaympg <= 20.50
## | | | | |--- enginetype_ohcv <= 0.50
## | | | | | |--- value: [33900.00]
## | | | | |--- enginetype_ohcv > 0.50
## | | | | | |--- value: [34184.00]
## | | | |--- highwaympg > 20.50
## | | | | |--- value: [30760.00]
## | | |--- doornumber_two > 0.50
## | | | |--- compressionratio <= 8.15
## | | | | |--- value: [41315.00]
## | | | |--- compressionratio > 8.15
## | | | | |--- curbweight <= 2778.00
## | | | | | |--- value: [32528.00]
## | | | | |--- curbweight > 2778.00
## | | | | | |--- highwaympg <= 21.50
## | | | | | | |--- horsepower <= 208.50
## | | | | | | | |--- value: [35056.00]
## | | | | | | |--- horsepower > 208.50
## | | | | | | | |--- value: [36000.00]
## | | | | | |--- highwaympg > 21.50
## | | | | | | |--- value: [37028.00]
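The "truncated branch of depth ..." lines in the listing above appear because `export_text` prints only a limited number of levels by default (its `max_depth` parameter defaults to 10), while this tree has depth 13. A minimal sketch of that parameter on a synthetic tree follows; the data and the feature names `x0` and `x1` are assumptions of the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data, used only to obtain a reasonably deep tree.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 2))
y = np.sin(6 * X[:, 0]) + X[:, 1] + rng.normal(0, 0.1, size=200)
arbol = DecisionTreeRegressor(random_state=0).fit(X, y)

nombres = ["x0", "x1"]

# Short summary: deeper branches are collapsed into "truncated branch" lines.
resumen = export_text(arbol, feature_names=nombres, max_depth=2)

# Full listing: allow as many levels as the tree actually has.
completo = export_text(arbol, feature_names=nombres, max_depth=arbol.get_depth())

print(resumen)
print(len(completo.splitlines()), "lines in the full listing")
```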
importancia_predictores = pd.DataFrame(
{'predictor': datos_dummis.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importance of the predictors in the model")
## Importance of the predictors in the model
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.663199
## 5 curbweight 0.255530
## 12 citympg 0.014422
## 11 peakrpm 0.012998
## 1 wheelbase 0.012245
## 40 fuelsystem_mpfi 0.011149
## 10 horsepower 0.005951
## 3 carwidth 0.005139
## 9 compressionratio 0.004650
## 16 doornumber_two 0.002660
## 13 highwaympg 0.002078
## 15 aspiration_turbo 0.001959
## 2 carlength 0.001671
## 0 symboling 0.001545
## 18 carbody_hatchback 0.001421
## 4 carheight 0.001183
## 8 stroke 0.000566
## 19 carbody_sedan 0.000403
## 20 carbody_wagon 0.000380
## 22 drivewheel_rwd 0.000342
## 7 boreratio 0.000204
## 26 enginetype_ohc 0.000144
## 36 fuelsystem_2bbl 0.000061
## 21 drivewheel_fwd 0.000045
## 27 enginetype_ohcf 0.000029
## 31 cylindernumber_four 0.000017
## 28 enginetype_ohcv 0.000006
## 41 fuelsystem_spdi 0.000002
## 14 fueltype_gas 0.000001
## 24 enginetype_dohcv 0.000000
## 35 cylindernumber_two 0.000000
## 39 fuelsystem_mfi 0.000000
## 38 fuelsystem_idi 0.000000
## 37 fuelsystem_4bbl 0.000000
## 17 carbody_hardtop 0.000000
## 33 cylindernumber_three 0.000000
## 34 cylindernumber_twelve 0.000000
## 25 enginetype_l 0.000000
## 32 cylindernumber_six 0.000000
## 30 cylindernumber_five 0.000000
## 29 enginetype_rotor 0.000000
## 23 enginelocation_rear 0.000000
## 42 fuelsystem_spfi 0.000000
According to the regression tree model, the most important predictors of price are enginesize, curbweight, citympg, peakrpm and wheelbase.
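As a quick check of how concentrated the importance is, the five largest values from the table above can be accumulated. The sketch below copies those five numbers by hand; the names `importancias` and `tabla` are introduced only for this illustration.

```python
# Five largest importances, copied from the table printed above
importancias = {
    "enginesize": 0.663199,
    "curbweight": 0.255530,
    "citympg": 0.014422,
    "peakrpm": 0.012998,
    "wheelbase": 0.012245,
}

# Sort from most to least important and keep a running total
acumulada = 0.0
tabla = []
for predictor, valor in sorted(importancias.items(), key=lambda kv: kv[1], reverse=True):
    acumulada += valor
    tabla.append((predictor, valor, acumulada))

for predictor, valor, total in tabla:
    print(f"{predictor:<12} {valor:.6f}  cumulative: {total:.6f}")
```

In this table, enginesize and curbweight together account for roughly 92% of the total importance, so the remaining predictors contribute very little to the tree's splits.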
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 7295. , 18950. , 13499. , 9298. , 7895. , 5572. , 16630. ,
## 15750. , 32528. , 10198. , 8845. , 6989. , 22625. , 7799. ,
## 22625. , 6849. , 7975. , 14997.5, 6189. , 7898. , 15998. ,
## 7609. , 16630. , 28176. , 8921. , 16900. , 6189. , 17710. ,
## 41315. , 16925. , 13415. , 12764. , 11395. , 7099. , 11395. ,
## 33900. , 37028. , 6488. , 34184. , 16900. , 5151. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 7295.0
## 198 -2 104.3 ... 18420.0 18950.0
## 102 0 100.4 ... 14399.0 13499.0
## 146 0 97.0 ... 7463.0 9298.0
## 79 1 93.0 ... 7689.0 7895.0
## 32 1 93.7 ... 5399.0 5572.0
## 107 0 107.9 ... 11900.0 16630.0
## 180 -1 104.5 ... 15690.0 15750.0
## 127 3 89.5 ... 34028.0 32528.0
## 149 0 96.9 ... 11694.0 10198.0
## 43 0 94.3 ... 6785.0 8845.0
## 40 0 96.5 ... 10295.0 6989.0
## 203 -1 109.1 ... 22470.0 22625.0
## 138 2 93.7 ... 5118.0 7799.0
## 201 -1 109.1 ... 19045.0 22625.0
## 20 0 94.5 ... 6575.0 6849.0
## 164 1 94.5 ... 8238.0 7975.0
## 65 0 104.9 ... 18280.0 14997.5
## 22 1 93.7 ... 6377.0 6189.0
## 186 2 97.3 ... 8495.0 7898.0
## 106 1 99.2 ... 18399.0 15998.0
## 156 0 95.7 ... 6938.0 7609.0
## 111 0 107.9 ... 15580.0 16630.0
## 68 -1 110.0 ... 28248.0 28176.0
## 123 -1 103.3 ... 8921.0 8921.0
## 108 0 107.9 ... 13200.0 16900.0
## 78 2 93.7 ... 6669.0 6189.0
## 8 1 105.8 ... 23875.0 17710.0
## 74 1 112.0 ... 45400.0 41315.0
## 10 2 101.2 ... 16430.0 16925.0
## 113 0 114.2 ... 16695.0 13415.0
## 82 3 95.9 ... 12629.0 12764.0
## 57 3 95.3 ... 13645.0 11395.0
## 158 0 95.7 ... 7898.0 7099.0
## 58 3 95.3 ... 15645.0 11395.0
## 17 0 110.0 ... 36880.0 33900.0
## 129 1 98.4 ... 31400.5 37028.0
## 150 1 95.7 ... 5348.0 6488.0
## 73 0 120.9 ... 40960.0 34184.0
## 116 0 107.9 ... 17950.0 16900.0
## 30 2 86.6 ... 6479.0 5151.0
##
## [41 rows x 45 columns]
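# Note: `squared = False` is deprecated since scikit-learn 1.4; newer versions
# provide root_mean_squared_error(y_true, y_pred) for the same result.
rmse_ar = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_ar,
    squared = False
)
print(f"The test error (rmse) is: {rmse_ar}")
## The test error (rmse) is: 2609.8260641977377
or, equivalently: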
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2609.8260641977377
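Both calls compute the same quantity, the square root of the mean of the squared differences between actual and predicted prices. A minimal sketch of that formula with NumPy follows, applied only to the first five rows of the comparison table above. The function `rmse` and the variable names are introduced here for illustration, and the result describes this five-row sample, not the full validation set.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# First five rows of the tree comparison table (rows 36, 198, 102, 146, 79)
precio_real = [7295.0, 18420.0, 14399.0, 7463.0, 7689.0]
precio_prediccion = [7295.0, 18950.0, 13499.0, 9298.0, 7895.0]

rmse_muestra = rmse(precio_real, precio_prediccion)
print(f"RMSE of the five-row sample: {rmse_muestra:.2f}")  # 948.74
```

Because the errors are squared before averaging, the single large miss (row 146, off by 1835) dominates the result.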
The random forest (rf) model is built with seed 2022 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 2022)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=2022)
# pending ... ...
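The importance of the variables in the random forest is left pending above. As a hedged sketch of how it could be obtained, the example below fits a `RandomForestRegressor` on synthetic data and tabulates `feature_importances_` the same way as for the regression tree. The data, the column names (`x_fuerte`, `x_debil`, `x_ruido`) and the `_demo` variables are assumptions made up for this illustration; they are not the car price data or the author's results.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): one strong predictor, one weak, one pure noise
rng = np.random.default_rng(2022)
X_demo = pd.DataFrame({
    "x_fuerte": rng.normal(size=200),
    "x_debil": rng.normal(size=200),
    "x_ruido": rng.normal(size=200),
})
y_demo = 5.0 * X_demo["x_fuerte"] + 0.5 * X_demo["x_debil"] + rng.normal(scale=0.1, size=200)

# Same settings as the model in the text: 20 trees, seed 2022
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=2022)
modelo_demo.fit(X_demo, y_demo)

# Importances are normalized to sum to 1, as in the regression tree table
importancia_demo = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_demo)
```

With the real data, the same pattern would presumably use `modelo_rf.feature_importances_` together with the columns of `X_entrena`.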
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 7491.525 , 17491. , 14890.2 , 8524.8 ,
## 8057.65 , 5844.125 , 15933.55 , 16705.9 ,
## 33379.85 , 12634.36666667, 8236.6 , 8964.05 ,
## 18462.6 , 7126.6 , 18935.05 , 8497.975 ,
## 8069.15 , 13756.65 , 5623.75 , 8621.4 ,
## 18107.5 , 7749.25 , 16508.125 , 28952.95 ,
## 9214.95 , 16684.3 , 6481.55 , 19457.3667 ,
## 36978.05 , 15329.55 , 15677.75 , 13909.00835 ,
## 11499.45 , 7993.925 , 11575.1 , 35511.35 ,
## 36234.7 , 6544.775 , 36366.2 , 16684.3 ,
## 5707.925 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 7491.525000
## 198 -2 104.3 ... 18420.0 17491.000000
## 102 0 100.4 ... 14399.0 14890.200000
## 146 0 97.0 ... 7463.0 8524.800000
## 79 1 93.0 ... 7689.0 8057.650000
## 32 1 93.7 ... 5399.0 5844.125000
## 107 0 107.9 ... 11900.0 15933.550000
## 180 -1 104.5 ... 15690.0 16705.900000
## 127 3 89.5 ... 34028.0 33379.850000
## 149 0 96.9 ... 11694.0 12634.366667
## 43 0 94.3 ... 6785.0 8236.600000
## 40 0 96.5 ... 10295.0 8964.050000
## 203 -1 109.1 ... 22470.0 18462.600000
## 138 2 93.7 ... 5118.0 7126.600000
## 201 -1 109.1 ... 19045.0 18935.050000
## 20 0 94.5 ... 6575.0 8497.975000
## 164 1 94.5 ... 8238.0 8069.150000
## 65 0 104.9 ... 18280.0 13756.650000
## 22 1 93.7 ... 6377.0 5623.750000
## 186 2 97.3 ... 8495.0 8621.400000
## 106 1 99.2 ... 18399.0 18107.500000
## 156 0 95.7 ... 6938.0 7749.250000
## 111 0 107.9 ... 15580.0 16508.125000
## 68 -1 110.0 ... 28248.0 28952.950000
## 123 -1 103.3 ... 8921.0 9214.950000
## 108 0 107.9 ... 13200.0 16684.300000
## 78 2 93.7 ... 6669.0 6481.550000
## 8 1 105.8 ... 23875.0 19457.366700
## 74 1 112.0 ... 45400.0 36978.050000
## 10 2 101.2 ... 16430.0 15329.550000
## 113 0 114.2 ... 16695.0 15677.750000
## 82 3 95.9 ... 12629.0 13909.008350
## 57 3 95.3 ... 13645.0 11499.450000
## 158 0 95.7 ... 7898.0 7993.925000
## 58 3 95.3 ... 15645.0 11575.100000
## 17 0 110.0 ... 36880.0 35511.350000
## 129 1 98.4 ... 31400.5 36234.700000
## 150 1 95.7 ... 5348.0 6544.775000
## 73 0 120.9 ... 40960.0 36366.200000
## 116 0 107.9 ... 17950.0 16684.300000
## 30 2 86.6 ... 6479.0 5707.925000
##
## [41 rows x 45 columns]
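# Note: `squared = False` is deprecated since scikit-learn 1.4; newer versions
# provide root_mean_squared_error(y_true, y_pred) for the same result.
rmse_rf = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_rf,
    squared = False
)
print(f"The test error (rmse) is: {rmse_rf}")
## The test error (rmse) is: 2468.378731065271
or, equivalently: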
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2468.378731065271
The predictions of the three regression models are compared side by side.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 36 0 96.5 ... 7295.0 7491.525000
## 198 -2 104.3 ... 18950.0 17491.000000
## 102 0 100.4 ... 13499.0 14890.200000
## 146 0 97.0 ... 9298.0 8524.800000
## 79 1 93.0 ... 7895.0 8057.650000
## 32 1 93.7 ... 5572.0 5844.125000
## 107 0 107.9 ... 16630.0 15933.550000
## 180 -1 104.5 ... 15750.0 16705.900000
## 127 3 89.5 ... 32528.0 33379.850000
## 149 0 96.9 ... 10198.0 12634.366667
## 43 0 94.3 ... 8845.0 8236.600000
## 40 0 96.5 ... 6989.0 8964.050000
## 203 -1 109.1 ... 22625.0 18462.600000
## 138 2 93.7 ... 7799.0 7126.600000
## 201 -1 109.1 ... 22625.0 18935.050000
## 20 0 94.5 ... 6849.0 8497.975000
## 164 1 94.5 ... 7975.0 8069.150000
## 65 0 104.9 ... 14997.5 13756.650000
## 22 1 93.7 ... 6189.0 5623.750000
## 186 2 97.3 ... 7898.0 8621.400000
## 106 1 99.2 ... 15998.0 18107.500000
## 156 0 95.7 ... 7609.0 7749.250000
## 111 0 107.9 ... 16630.0 16508.125000
## 68 -1 110.0 ... 28176.0 28952.950000
## 123 -1 103.3 ... 8921.0 9214.950000
## 108 0 107.9 ... 16900.0 16684.300000
## 78 2 93.7 ... 6189.0 6481.550000
## 8 1 105.8 ... 17710.0 19457.366700
## 74 1 112.0 ... 41315.0 36978.050000
## 10 2 101.2 ... 16925.0 15329.550000
## 113 0 114.2 ... 13415.0 15677.750000
## 82 3 95.9 ... 12764.0 13909.008350
## 57 3 95.3 ... 11395.0 11499.450000
## 158 0 95.7 ... 7099.0 7993.925000
## 58 3 95.3 ... 11395.0 11575.100000
## 17 0 110.0 ... 33900.0 35511.350000
## 129 1 98.4 ... 37028.0 36234.700000
## 150 1 95.7 ... 6488.0 6544.775000
## 73 0 120.9 ... 34184.0 36366.200000
## 116 0 107.9 ... 16900.0 16684.300000
## 30 2 86.6 ... 5151.0 5707.925000
##
## [41 rows x 47 columns]
The RMSE of the three models is compared.
A numpy array is created:
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2887.3005726 , 2609.8260642 , 2468.37873107]])
A DataFrame is built from the numpy array:
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 2887.300573 2609.826064 2468.378731
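To turn the table into a ranking, the lowest RMSE can be picked and the gap of each model to it expressed as a fraction of that model's own RMSE. The sketch below copies the three values printed above; `rmse_por_modelo`, `mejor` and `reduccion` are names introduced only for this illustration.

```python
# RMSE values copied from the table above
rmse_por_modelo = {
    "rmse_rm": 2887.300573,  # multiple linear regression
    "rmse_ar": 2609.826064,  # regression tree
    "rmse_rf": 2468.378731,  # random forest
}

# The best predictor is the model with the smallest RMSE
mejor = min(rmse_por_modelo, key=rmse_por_modelo.get)
rmse_mejor = rmse_por_modelo[mejor]

# Fraction of each model's RMSE that the best model removes
reduccion = {
    modelo: (valor - rmse_mejor) / valor
    for modelo, valor in rmse_por_modelo.items()
}

print("Best model:", mejor)
for modelo, fraccion in reduccion.items():
    print(f"{modelo}: best model reduces RMSE by {fraccion:.1%}")
```

On these numbers the random forest lowers the RMSE by about 14.5% relative to multiple regression and about 5.4% relative to the single regression tree.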
It may be similar to the one in R ….. Pending …..
All numeric and categorical variables of the car price dataset were loaded.
Pending
According to the root mean squared error (RMSE), the best regression model was random forests, with an RMSE of 2468.38 against 2609.83 for the regression tree and 2887.30 for multiple regression; that is, its predictions deviate from the actual prices by roughly 2468 price units in the root-mean-square sense.
Training and validation sets were built with 80% and 20% of the data, respectively.
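The 41 validation rows shown in the tables above are consistent with 20% of a 205-row dataset. The sketch below illustrates such a split on row indices with NumPy only. It is not the `train_test_split` call used in the case; the total of 205 rows and the seed are assumptions for illustration, so the selected rows will not match the ones in the document.

```python
import numpy as np

n_filas = 205                      # assumed number of rows in the dataset
rng = np.random.default_rng(2022)  # seed chosen only for this illustration

# Shuffle the row positions, then cut at 20% for validation
indices = rng.permutation(n_filas)
n_valida = int(round(0.20 * n_filas))
idx_valida = indices[:n_valida]
idx_entrena = indices[n_valida:]

print(len(idx_entrena), len(idx_valida))  # 164 41
```

Keeping the two index sets disjoint is what makes the validation RMSE an estimate of error on cars the models have not seen during training.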