Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv
All variables in the data set take part.
Training data are created with 80% of the observations.
Validation data are created with the remaining 20%.
A multiple linear regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE is computed for comparison purposes.
A regression tree model is built with the training data.
The importance of each variable for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE is computed for comparison purposes.
A random forest model is built with the training data, using 20 trees.
The importance of each variable for the price is identified.
Predictions are generated with the validation data.
The RMSE is computed for comparison purposes.
At the end of the case, a personal interpretation compares the RMSE of each model and states which model is the best predictor.
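The steps listed above can be sketched compactly as a loop over the three model types. This is only an illustration under stated assumptions: the data are synthetic (generated with make_regression, not the car price data), and the names modelos and rmse_por_modelo are hypothetical. The 80/20 split, the seed 1270 and the 20 trees follow the description above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 205 rows, 10 numeric predictors
X, y = make_regression(n_samples=205, n_features=10, noise=15.0, random_state=1270)

# 80% training, 20% validation, same seed as the case study
X_entrena, X_valida, y_entrena, y_valida = train_test_split(
    X, y, train_size=0.80, random_state=1270
)

# The three model families compared in the case (names are illustrative)
modelos = {
    "regresion_multiple": LinearRegression(),
    "arbol_regresion": DecisionTreeRegressor(random_state=1270),
    "bosque_aleatorio": RandomForestRegressor(n_estimators=20, random_state=1270),
}

# Fit each model on the training data and score it on the validation data
rmse_por_modelo = {}
for nombre, modelo in modelos.items():
    modelo.fit(X_entrena, y_entrena)
    predicciones = modelo.predict(X_valida)
    rmse_por_modelo[nombre] = float(np.sqrt(mean_squared_error(y_valida, predicciones)))

# Lowest RMSE first: the model at the top predicts best on this validation set
for nombre, valor in sorted(rmse_por_modelo.items(), key=lambda par: par[1]):
    print(f"{nombre}: {valor:.2f}")
```

The ranking obtained on synthetic data says nothing about the car data; only the structure of the comparison carries over.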
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment.csv")
datos
## car_ID symboling CarName ... citympg highwaympg price
## 0 1 3 alfa-romero giulia ... 21 27 13495.0
## 1 2 3 alfa-romero stelvio ... 21 27 16500.0
## 2 3 1 alfa-romero Quadrifoglio ... 19 26 16500.0
## 3 4 2 audi 100 ls ... 24 30 13950.0
## 4 5 2 audi 100ls ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 volvo 145e (sw) ... 23 28 16845.0
## 201 202 -1 volvo 144ea ... 19 25 19045.0
## 202 203 -1 volvo 244dl ... 18 23 21485.0
## 203 204 -1 volvo 246 ... 26 27 22470.0
## 204 205 -1 volvo 264gl ... 19 25 22625.0
##
## [205 rows x 26 columns]
print("Observaciones y variables: ", datos.shape)
## Observaciones y variables: (205, 26)
print("Columnas y tipo de dato")
# datos.columns
## Columnas y tipo de dato
datos.dtypes
## car_ID int64
## symboling int64
## CarName object
## fueltype object
## aspiration object
## doornumber object
## carbody object
## drivewheel object
## enginelocation object
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginetype object
## cylindernumber object
## enginesize int64
## fuelsystem object
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | car_ID | Unique ID of each observation (Integer) |
| 2 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical) |
| 3 | CarName | Name of the car company and model (Categorical) |
| 4 | fueltype | Car fuel type, i.e. gas or diesel (Categorical) |
| 5 | aspiration | Aspiration used in the car (Categorical): std or turbo |
| 6 | doornumber | Number of doors in the car (Categorical) |
| 7 | carbody | Body of the car (Categorical): convertible, sedan, wagon … |
| 8 | drivewheel | Type of drive wheel (Categorical): 4wd, fwd, rwd |
| 9 | enginelocation | Location of the car engine (Categorical) |
| 10 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches |
| 11 | carlength | Length of the car (Numeric) |
| 12 | carwidth | Width of the car (Numeric) |
| 13 | carheight | Height of the car (Numeric) |
| 14 | curbweight | Weight of the car without occupants or baggage (Numeric) |
| 15 | enginetype | Type of engine (Categorical) |
| 16 | cylindernumber | Number of cylinders in the car (Categorical) |
| 17 | enginesize | Size of the engine (Numeric), in … |
| 18 | fuelsystem | Fuel system of the car (Categorical) |
| 19 | boreratio | Bore ratio of the engine (Numeric) |
| 20 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion |
| 21 | compressionratio | Compression ratio of the engine (Numeric) |
| 22 | horsepower | Horsepower (Numeric). Engine power |
| 23 | peakrpm | Peak revolutions per minute (Numeric) |
| 24 | citympg | Mileage in the city (Numeric), in miles per gallon |
| 25 | highwaympg | Mileage on the highway (Numeric), in miles per gallon |
| 26 | price | Price of the car (Numeric), in dollars. Dependent variable |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Remove the variables that carry no statistical interest, namely columns 1 and 3: car_ID and CarName.
datos = datos[['symboling','fueltype','aspiration','doornumber','carbody','drivewheel','enginelocation','wheelbase','carlength','carwidth','carheight','curbweight', 'enginetype','cylindernumber','enginesize','fuelsystem','boreratio','stroke','compressionratio','horsepower','peakrpm','citympg','highwaympg', 'price']]
# datos.describe()
datos
## symboling fueltype aspiration ... citympg highwaympg price
## 0 3 gas std ... 21 27 13495.0
## 1 3 gas std ... 21 27 16500.0
## 2 1 gas std ... 19 26 16500.0
## 3 2 gas std ... 24 30 13950.0
## 4 2 gas std ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 gas std ... 23 28 16845.0
## 201 -1 gas turbo ... 19 25 19045.0
## 202 -1 gas std ... 18 23 21485.0
## 203 -1 diesel turbo ... 26 27 22470.0
## 204 -1 gas turbo ... 19 25 22625.0
##
## [205 rows x 24 columns]
Some variables are categorical (dtype object): fueltype, aspiration, doornumber, carbody, drivewheel, enginelocation, enginetype, cylindernumber and fuelsystem.
Identify the categorical variables and build a data set that includes their dummy variables.
The pandas function get_dummies() converts categorical data into indicator (dummy) variables.
What are dummy variables? They encode a categorical variable as several columns, one per category, where each record holds 1 if it belongs to that category and 0 otherwise.
Example
| genero |
|---|
| MASCULINO |
| FEMENINO |
| MASCULINO |
The same data as dummy variables
| genero_masculino | genero_femenino |
|---|---|
| 1 | 0 |
| 0 | 1 |
| 1 | 0 |
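The toy example above can be reproduced with a short sketch. It assumes a hypothetical three-row DataFrame named ejemplo (not part of the car data) and passes dtype=int so that the indicators print as 0/1 regardless of the pandas version. It also shows what drop_first=True does: one category is dropped and becomes the baseline, so a two-level variable ends up as a single column.

```python
import pandas as pd

# Toy data mirroring the example table (hypothetical, not from the car data set)
ejemplo = pd.DataFrame({"genero": ["MASCULINO", "FEMENINO", "MASCULINO"]})

# One indicator column per category; dtype=int forces 0/1 instead of booleans
completo = pd.get_dummies(ejemplo, dtype=int)

# drop_first=True removes the first category in sorted order (FEMENINO here),
# which then acts as the baseline level
reducido = pd.get_dummies(ejemplo, drop_first=True, dtype=int)

print(completo)
print(reducido)
```

pandas orders the generated columns by category name, so genero_FEMENINO comes before genero_MASCULINO, unlike the hand-written table. Dropping one level avoids perfectly collinear columns, which matters for the linear regression model.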
datos_dummis = pd.get_dummies(datos, drop_first = True)
datos_dummis
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 0 3 88.6 ... 0 0
## 1 3 88.6 ... 0 0
## 2 1 94.5 ... 0 0
## 3 2 99.8 ... 0 0
## 4 2 99.4 ... 0 0
## .. ... ... ... ... ...
## 200 -1 109.1 ... 0 0
## 201 -1 109.1 ... 0 0
## 202 -1 109.1 ... 0 0
## 203 -1 109.1 ... 0 0
## 204 -1 109.1 ... 0 0
##
## [205 rows x 44 columns]
The training data are 80% of the observations and the validation data the remaining 20%. Seed: 1270.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos_dummis.drop(columns = "price"), datos_dummis['price'],train_size = 0.80, random_state = 1270)
X_entrena
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 14 1 103.5 ... 0 0
## 24 1 93.7 ... 0 0
## 195 -1 104.3 ... 0 0
## 118 1 93.7 ... 0 0
## 112 0 107.9 ... 0 0
## .. ... ... ... ... ...
## 99 0 97.2 ... 0 0
## 57 3 95.3 ... 0 0
## 50 1 93.1 ... 0 0
## 23 1 93.7 ... 0 0
## 46 2 96.0 ... 0 1
##
## [164 rows x 43 columns]
X_valida
## symboling wheelbase ... fuelsystem_spdi fuelsystem_spfi
## 91 1 94.5 ... 0 0
## 188 2 97.3 ... 0 0
## 136 3 99.1 ... 0 0
## 178 3 102.9 ... 0 0
## 191 0 100.4 ... 0 0
## 41 0 96.5 ... 0 0
## 138 2 93.7 ... 0 0
## 125 3 94.5 ... 0 0
## 68 -1 110.0 ... 0 0
## 159 0 95.7 ... 0 0
## 165 1 94.5 ... 0 0
## 110 0 114.2 ... 0 0
## 128 3 89.5 ... 0 0
## 12 0 101.2 ... 0 0
## 21 1 93.7 ... 0 0
## 42 1 96.5 ... 0 0
## 130 0 96.1 ... 0 0
## 49 0 102.0 ... 0 0
## 79 1 93.0 ... 1 0
## 154 0 95.7 ... 0 0
## 108 0 107.9 ... 0 0
## 200 -1 109.1 ... 0 0
## 31 2 86.6 ... 0 0
## 66 0 104.9 ... 0 0
## 105 3 91.3 ... 0 0
## 143 0 97.2 ... 0 0
## 157 0 95.7 ... 0 0
## 140 2 93.3 ... 0 0
## 78 2 93.7 ... 0 0
## 122 1 93.7 ... 0 0
## 150 1 95.7 ... 0 0
## 33 1 93.7 ... 0 0
## 15 0 103.5 ... 0 0
## 63 0 98.8 ... 0 0
## 43 0 94.3 ... 0 0
## 28 -1 103.3 ... 0 0
## 9 0 99.5 ... 0 0
## 97 1 94.5 ... 0 0
## 95 1 94.5 ... 0 0
## 147 0 97.0 ... 0 0
## 81 3 96.3 ... 0 0
##
## [41 rows x 43 columns]
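The row counts above (164 for training, 41 for validation) follow from applying the 80% share to the 205 observations. The sketch below illustrates the idea of a seeded, shuffled split with NumPy alone. It is an assumption-laden illustration: the names indices, ind_entrena and ind_valida are hypothetical, and NumPy's generator will not reproduce the exact rows that train_test_split selects with the same seed.

```python
import numpy as np

n_obs = 205          # rows in the data set
train_size = 0.80    # share of rows used for training
semilla = 1270       # seed, so the shuffle is repeatable

# Shuffle the row positions once, in a reproducible way
rng = np.random.default_rng(semilla)
indices = rng.permutation(n_obs)

# The first 80% of the shuffled positions train the model, the rest validate it
n_entrena = int(train_size * n_obs)
ind_entrena = indices[:n_entrena]
ind_valida = indices[n_entrena:]

print(len(ind_entrena), len(ind_valida))
```

Fixing the seed is what makes the comparison fair: all three models are trained and validated on exactly the same rows.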
The multiple linear regression model (rm) is built.
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()
Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_.
modelo_rm.coef_
## array([ 7.08246130e+01, 4.97123832e+01, -2.16196011e+01, 5.48528107e+02,
## 5.27371588e+01, 3.82732156e+00, 1.19387168e+02, -1.90095363e+03,
## -4.52661596e+03, -1.26052438e+03, 1.94580412e+00, 2.38234138e+00,
## 8.28824386e+00, 9.34452584e+01, -8.56556091e+03, 1.80339184e+03,
## -6.44361199e+00, -3.48242766e+03, -3.78594142e+03, -2.95267608e+03,
## -4.43529876e+03, 3.33446885e+02, 1.27793976e+03, 7.59764056e+03,
## -3.41562385e+03, 1.85344789e+02, 3.64200606e+03, 1.96999358e+03,
## -5.23758925e+03, 6.40707629e+02, -8.62431175e+03, -1.02967042e+04,
## -5.78054666e+03, -2.28037137e+03, -8.64019967e-12, 6.40707629e+02,
## -1.46827527e+02, -2.05584075e+03, 8.56556091e+03, -3.77553325e+03,
## -2.60481249e+02, -3.44964556e+03, -5.12116830e+02])
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.9403176387600196
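The value printed above is the plain R-squared on the training data, whereas the stated goal asks for the adjusted R-squared. A minimal sketch of the usual adjustment follows, assuming n = 164 training rows and p = 43 predictors (the shape of X_entrena). The function name r2_ajustado is hypothetical. Some dummy columns appear to be linearly dependent (one coefficient is practically zero), so p = 43 may overstate the effective number of predictors and the result should be read as approximate.

```python
def r2_ajustado(r2, n, p):
    """Adjusted R-squared for a model with p predictors fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2_entrena = 0.9403176387600196   # modelo_rm.score(X_entrena, Y_entrena) reported above
n, p = 164, 43                    # rows and columns of X_entrena

valor_ajustado = r2_ajustado(r2_entrena, n, p)
print(valor_ajustado)
```

The adjustment penalizes the number of predictors, so the adjusted value is lower than the plain R-squared. For the question about predictors above 90% confidence, one option is to fit the same model with statsmodels (sm.OLS with sm.add_constant) and keep the variables whose p-values are below 0.10.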
predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1]) # [:-1] leaves out the last of the 41 predictions
## [ 6210.17677474 9001.16003137 11694.59290399 20638.42790364
## 16948.32352563 10089.76805812 7173.53049915 17263.34502768
## 26803.60880878 8073.32087234 8191.92373462 16252.52201166
## 36928.82980856 20705.74251711 6055.94363584 9466.96761627
## 9212.65992687 43574.42836317 6880.11129263 5658.2567314
## 17460.90236722 18151.05606293 6711.3249534 13153.36507387
## 20616.77263081 10143.10039083 7086.24773204 6595.5361855
## 6867.09661419 8522.70022369 6253.75631603 7107.52874324
## 31173.31571665 11409.83549429 8449.78726549 9745.354715
## 20755.85344936 5030.77576914 5728.14633041 8982.88289 ]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 91 1 94.5 ... 6649.000 6210.176775
## 188 2 97.3 ... 9995.000 9001.160031
## 136 3 99.1 ... 18150.000 11694.592904
## 178 3 102.9 ... 16558.000 20638.427904
## 191 0 100.4 ... 13295.000 16948.323526
## 41 0 96.5 ... 12945.000 10089.768058
## 138 2 93.7 ... 5118.000 7173.530499
## 125 3 94.5 ... 22018.000 17263.345028
## 68 -1 110.0 ... 28248.000 26803.608809
## 159 0 95.7 ... 7788.000 8073.320872
## 165 1 94.5 ... 9298.000 8191.923735
## 110 0 114.2 ... 13860.000 16252.522012
## 128 3 89.5 ... 37028.000 36928.829809
## 12 0 101.2 ... 20970.000 20705.742517
## 21 1 93.7 ... 5572.000 6055.943636
## 42 1 96.5 ... 10345.000 9466.967616
## 130 0 96.1 ... 9295.000 9212.659927
## 49 0 102.0 ... 36000.000 43574.428363
## 79 1 93.0 ... 7689.000 6880.111293
## 154 0 95.7 ... 7898.000 5658.256731
## 108 0 107.9 ... 13200.000 17460.902367
## 200 -1 109.1 ... 16845.000 18151.056063
## 31 2 86.6 ... 6855.000 6711.324953
## 66 0 104.9 ... 18344.000 13153.365074
## 105 3 91.3 ... 19699.000 20616.772631
## 143 0 97.2 ... 9960.000 10143.100391
## 157 0 95.7 ... 7198.000 7086.247732
## 140 2 93.3 ... 7603.000 6595.536185
## 78 2 93.7 ... 6669.000 6867.096614
## 122 1 93.7 ... 7609.000 8522.700224
## 150 1 95.7 ... 5348.000 6253.756316
## 33 1 93.7 ... 6529.000 7107.528743
## 15 0 103.5 ... 30760.000 31173.315717
## 63 0 98.8 ... 10795.000 11409.835494
## 43 0 94.3 ... 6785.000 8449.787265
## 28 -1 103.3 ... 8921.000 9745.354715
## 9 0 99.5 ... 17859.167 20755.853449
## 97 1 94.5 ... 7999.000 5030.775769
## 95 1 94.5 ... 7799.000 5728.146330
## 147 0 97.0 ... 10198.000 8982.882890
## 81 3 96.3 ... 8499.000 9632.441263
##
## [41 rows x 45 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rm}")
## El error (rmse) de test es: 2518.6653145552323
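or, equivalently, as the square root of the mean squared error (a form that does not depend on the squared argument, which recent scikit-learn versions deprecate):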
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 2518.6653145552323
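Both calls above compute the same quantity: the square root of the average squared difference between real and predicted prices. The sketch below spells that out in plain Python. The helper name rmse is hypothetical, and the three value pairs are simply the first three rows of the comparison table, so the printed number is not the RMSE of the full validation set.

```python
import math


def rmse(reales, predichos):
    """Root mean squared error between two equally long sequences."""
    errores_cuadrados = [(r - p) ** 2 for r, p in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(errores_cuadrados))


# First three rows of the comparison table (Precio_Real, Precio_Prediccion)
reales = [6649.0, 9995.0, 18150.0]
predichos = [6210.176775, 9001.160031, 11694.592904]

print(rmse(reales, predichos))
```

Because the errors are squared before averaging, a single large miss (such as the third row) weighs more than several small ones. The RMSE is expressed in the same units as the price.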
The regression tree model (ar) is built.
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 1270
)
Train the model.
modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1270)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Profundidad del árbol: {modelo_ar.get_depth()}")
## Profundidad del árbol: 16
print(f"Número de nodos terminales: {modelo_ar.get_n_leaves()}")
## Número de nodos terminales: 150
# feature_names must match the dummy-encoded training columns;
# class_names is omitted because it only applies to classification trees
#plot = plot_tree(
# decision_tree = modelo_ar,
# feature_names = list(X_entrena.columns),
# filled = True,
# impurity = False,
# fontsize = 10,
# precision = 2,
# ax = ax
# )
#plot
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos_dummis.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2544.00
## | | |--- curbweight <= 2295.00
## | | | |--- curbweight <= 2121.00
## | | | | |--- carbody_hatchback <= 0.50
## | | | | | |--- carlength <= 156.50
## | | | | | | |--- value: [8916.50]
## | | | | | |--- carlength > 156.50
## | | | | | | |--- curbweight <= 1899.00
## | | | | | | | |--- value: [5499.00]
## | | | | | | |--- curbweight > 1899.00
## | | | | | | | |--- symboling <= 1.50
## | | | | | | | | |--- curbweight <= 1947.50
## | | | | | | | | | |--- stroke <= 3.22
## | | | | | | | | | | |--- curbweight <= 1927.00
## | | | | | | | | | | | |--- value: [6575.00]
## | | | | | | | | | | |--- curbweight > 1927.00
## | | | | | | | | | | | |--- value: [6695.00]
## | | | | | | | | | |--- stroke > 3.22
## | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | |--- curbweight > 1947.50
## | | | | | | | | | |--- curbweight <= 2087.50
## | | | | | | | | | | |--- carheight <= 53.25
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- carheight > 53.25
## | | | | | | | | | | | |--- truncated branch of depth 6
## | | | | | | | | | |--- curbweight > 2087.50
## | | | | | | | | | | |--- value: [7738.00]
## | | | | | | | |--- symboling > 1.50
## | | | | | | | | |--- value: [8249.00]
## | | | | |--- carbody_hatchback > 0.50
## | | | | | |--- horsepower <= 71.50
## | | | | | | |--- enginesize <= 84.50
## | | | | | | | |--- symboling <= 1.50
## | | | | | | | | |--- value: [5399.00]
## | | | | | | | |--- symboling > 1.50
## | | | | | | | | |--- value: [5151.00]
## | | | | | | |--- enginesize > 84.50
## | | | | | | | |--- citympg <= 30.50
## | | | | | | | | |--- value: [5195.00]
## | | | | | | | |--- citympg > 30.50
## | | | | | | | | |--- peakrpm <= 5450.00
## | | | | | | | | | |--- curbweight <= 1902.50
## | | | | | | | | | | |--- curbweight <= 1887.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 1887.00
## | | | | | | | | | | | |--- value: [6095.00]
## | | | | | | | | | |--- curbweight > 1902.50
## | | | | | | | | | | |--- carwidth <= 63.90
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- carwidth > 63.90
## | | | | | | | | | | | |--- value: [6795.00]
## | | | | | | | | |--- peakrpm > 5450.00
## | | | | | | | | | |--- citympg <= 34.00
## | | | | | | | | | | |--- curbweight <= 1910.00
## | | | | | | | | | | | |--- value: [6377.00]
## | | | | | | | | | | |--- curbweight > 1910.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- citympg > 34.00
## | | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | | |--- value: [5389.00]
## | | | | | |--- horsepower > 71.50
## | | | | | | |--- enginetype_ohcf <= 0.50
## | | | | | | | |--- value: [7129.00]
## | | | | | | |--- enginetype_ohcf > 0.50
## | | | | | | | |--- value: [7053.00]
## | | | |--- curbweight > 2121.00
## | | | | |--- highwaympg <= 29.50
## | | | | | |--- carwidth <= 64.10
## | | | | | | |--- value: [9980.00]
## | | | | | |--- carwidth > 64.10
## | | | | | | |--- value: [11595.00]
## | | | | |--- highwaympg > 29.50
## | | | | | |--- highwaympg <= 36.50
## | | | | | | |--- boreratio <= 3.23
## | | | | | | | |--- curbweight <= 2282.00
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- citympg <= 27.50
## | | | | | | | | | | |--- curbweight <= 2243.50
## | | | | | | | | | | | |--- value: [8195.00]
## | | | | | | | | | | |--- curbweight > 2243.50
## | | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | | | |--- citympg > 27.50
## | | | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | | | |--- value: [8358.00]
## | | | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | | | |--- value: [9258.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- carheight <= 50.70
## | | | | | | | | | | |--- value: [8558.00]
## | | | | | | | | | |--- carheight > 50.70
## | | | | | | | | | | |--- stroke <= 3.21
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- stroke > 3.21
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | |--- curbweight > 2282.00
## | | | | | | | | |--- value: [9095.00]
## | | | | | | |--- boreratio > 3.23
## | | | | | | | |--- highwaympg <= 32.50
## | | | | | | | | |--- value: [7463.00]
## | | | | | | | |--- highwaympg > 32.50
## | | | | | | | | |--- stroke <= 3.00
## | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | |--- stroke > 3.00
## | | | | | | | | | |--- value: [7898.00]
## | | | | | |--- highwaympg > 36.50
## | | | | | | |--- compressionratio <= 16.25
## | | | | | | | |--- compressionratio <= 9.25
## | | | | | | | | |--- value: [6918.00]
## | | | | | | | |--- compressionratio > 9.25
## | | | | | | | | |--- value: [7126.00]
## | | | | | | |--- compressionratio > 16.25
## | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | |--- value: [7995.00]
## | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | |--- value: [7775.00]
## | | |--- curbweight > 2295.00
## | | | |--- peakrpm <= 5350.00
## | | | | |--- wheelbase <= 96.95
## | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | |--- curbweight <= 2385.00
## | | | | | | | |--- value: [6989.00]
## | | | | | | |--- curbweight > 2385.00
## | | | | | | | |--- boreratio <= 3.48
## | | | | | | | | |--- value: [8189.00]
## | | | | | | | |--- boreratio > 3.48
## | | | | | | | | |--- value: [8013.00]
## | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | |--- value: [9895.00]
## | | | | |--- wheelbase > 96.95
## | | | | | |--- curbweight <= 2412.00
## | | | | | | |--- symboling <= 0.50
## | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | |--- value: [9549.00]
## | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | |--- curbweight <= 2355.50
## | | | | | | | | | |--- stroke <= 3.50
## | | | | | | | | | | |--- value: [8949.00]
## | | | | | | | | | |--- stroke > 3.50
## | | | | | | | | | | |--- value: [8948.00]
## | | | | | | | | |--- curbweight > 2355.50
## | | | | | | | | | |--- carheight <= 54.90
## | | | | | | | | | | |--- value: [9233.00]
## | | | | | | | | | |--- carheight > 54.90
## | | | | | | | | | | |--- value: [9370.00]
## | | | | | | |--- symboling > 0.50
## | | | | | | | |--- fuelsystem_idi <= 0.50
## | | | | | | | | |--- value: [9720.00]
## | | | | | | | |--- fuelsystem_idi > 0.50
## | | | | | | | | |--- value: [9495.00]
## | | | | | |--- curbweight > 2412.00
## | | | | | | |--- curbweight <= 2522.50
## | | | | | | | |--- curbweight <= 2419.50
## | | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | | |--- value: [10898.00]
## | | | | | | | |--- curbweight > 2419.50
## | | | | | | | | |--- fuelsystem_idi <= 0.50
## | | | | | | | | | |--- wheelbase <= 97.90
## | | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | | |--- wheelbase > 97.90
## | | | | | | | | | | |--- symboling <= -0.50
## | | | | | | | | | | | |--- value: [11248.00]
## | | | | | | | | | | |--- symboling > -0.50
## | | | | | | | | | | | |--- value: [11245.00]
## | | | | | | | | |--- fuelsystem_idi > 0.50
## | | | | | | | | | |--- value: [10698.00]
## | | | | | | |--- curbweight > 2522.50
## | | | | | | | |--- curbweight <= 2538.00
## | | | | | | | | |--- fuelsystem_mpfi <= 0.50
## | | | | | | | | | |--- value: [8921.00]
## | | | | | | | | |--- fuelsystem_mpfi > 0.50
## | | | | | | | | | |--- value: [9639.00]
## | | | | | | | |--- curbweight > 2538.00
## | | | | | | | | |--- value: [8449.00]
## | | | |--- peakrpm > 5350.00
## | | | | |--- carlength <= 176.00
## | | | | | |--- citympg <= 20.00
## | | | | | | |--- curbweight <= 2382.50
## | | | | | | | |--- value: [11395.00]
## | | | | | | |--- curbweight > 2382.50
## | | | | | | | |--- fuelsystem_4bbl <= 0.50
## | | | | | | | | |--- value: [15645.00]
## | | | | | | | |--- fuelsystem_4bbl > 0.50
## | | | | | | | | |--- value: [13645.00]
## | | | | | |--- citympg > 20.00
## | | | | | | |--- carwidth <= 63.25
## | | | | | | | |--- value: [10295.00]
## | | | | | | |--- carwidth > 63.25
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- wheelbase <= 95.40
## | | | | | | | | | |--- value: [9538.00]
## | | | | | | | | |--- wheelbase > 95.40
## | | | | | | | | | |--- value: [9959.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- horsepower <= 101.00
## | | | | | | | | | |--- value: [8845.00]
## | | | | | | | | |--- horsepower > 101.00
## | | | | | | | | | |--- value: [9279.00]
## | | | | |--- carlength > 176.00
## | | | | | |--- enginesize <= 108.50
## | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- doornumber_two > 0.50
## | | | | | | | |--- value: [16430.00]
## | | | | | |--- enginesize > 108.50
## | | | | | | |--- carwidth <= 66.25
## | | | | | | | |--- value: [13950.00]
## | | | | | | |--- carwidth > 66.25
## | | | | | | | |--- value: [15250.00]
## | |--- curbweight > 2544.00
## | | |--- wheelbase <= 100.80
## | | | |--- horsepower <= 153.00
## | | | | |--- citympg <= 22.00
## | | | | | |--- cylindernumber_five <= 0.50
## | | | | | | |--- stroke <= 2.88
## | | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | | |--- value: [15040.00]
## | | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | | |--- value: [14997.50]
## | | | | | | |--- stroke > 2.88
## | | | | | | | |--- curbweight <= 2726.50
## | | | | | | | | |--- doornumber_two <= 0.50
## | | | | | | | | | |--- value: [12170.00]
## | | | | | | | | |--- doornumber_two > 0.50
## | | | | | | | | | |--- value: [11850.00]
## | | | | | | | |--- curbweight > 2726.50
## | | | | | | | | |--- carheight <= 55.60
## | | | | | | | | | |--- curbweight <= 2877.00
## | | | | | | | | | | |--- stroke <= 3.88
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- stroke > 3.88
## | | | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | | |--- curbweight > 2877.00
## | | | | | | | | | | |--- wheelbase <= 98.15
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- wheelbase > 98.15
## | | | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | |--- carheight > 55.60
## | | | | | | | | | |--- highwaympg <= 25.00
## | | | | | | | | | | |--- value: [14399.00]
## | | | | | | | | | |--- highwaympg > 25.00
## | | | | | | | | | | |--- value: [15510.00]
## | | | | | |--- cylindernumber_five > 0.50
## | | | | | | |--- value: [17450.00]
## | | | | |--- citympg > 22.00
## | | | | | |--- wheelbase <= 95.85
## | | | | | | |--- value: [8778.00]
## | | | | | |--- wheelbase > 95.85
## | | | | | | |--- curbweight <= 2854.50
## | | | | | | | |--- carbody_sedan <= 0.50
## | | | | | | | | |--- curbweight <= 2557.00
## | | | | | | | | | |--- value: [9989.00]
## | | | | | | | | |--- curbweight > 2557.00
## | | | | | | | | | |--- highwaympg <= 30.50
## | | | | | | | | | | |--- boreratio <= 3.52
## | | | | | | | | | | | |--- value: [11048.00]
## | | | | | | | | | | |--- boreratio > 3.52
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- highwaympg > 30.50
## | | | | | | | | | | |--- value: [12290.00]
## | | | | | | | |--- carbody_sedan > 0.50
## | | | | | | | | |--- value: [13845.00]
## | | | | | | |--- curbweight > 2854.50
## | | | | | | | |--- value: [17669.00]
## | | | |--- horsepower > 153.00
## | | | | |--- carlength <= 174.85
## | | | | | |--- highwaympg <= 25.50
## | | | | | | |--- value: [17199.00]
## | | | | | |--- highwaympg > 25.50
## | | | | | | |--- value: [16500.00]
## | | | | |--- carlength > 174.85
## | | | | | |--- enginesize <= 151.00
## | | | | | | |--- value: [18620.00]
## | | | | | |--- enginesize > 151.00
## | | | | | | |--- value: [18399.00]
## | | |--- wheelbase > 100.80
## | | | |--- carheight <= 56.10
## | | | | |--- horsepower <= 141.00
## | | | | | |--- curbweight <= 2983.00
## | | | | | | |--- horsepower <= 120.50
## | | | | | | | |--- curbweight <= 2899.00
## | | | | | | | | |--- cylindernumber_four <= 0.50
## | | | | | | | | | |--- value: [17710.00]
## | | | | | | | | |--- cylindernumber_four > 0.50
## | | | | | | | | | |--- value: [18280.00]
## | | | | | | | |--- curbweight > 2899.00
## | | | | | | | | |--- value: [18920.00]
## | | | | | | |--- horsepower > 120.50
## | | | | | | | |--- value: [21105.00]
## | | | | | |--- curbweight > 2983.00
## | | | | | | |--- wheelbase <= 107.45
## | | | | | | | |--- drivewheel_rwd <= 0.50
## | | | | | | | | |--- value: [23875.00]
## | | | | | | | |--- drivewheel_rwd > 0.50
## | | | | | | | | |--- value: [24565.00]
## | | | | | | |--- wheelbase > 107.45
## | | | | | | | |--- enginesize <= 159.00
## | | | | | | | | |--- curbweight <= 3139.50
## | | | | | | | | | |--- value: [22625.00]
## | | | | | | | | |--- curbweight > 3139.50
## | | | | | | | | | |--- value: [22470.00]
## | | | | | | | |--- enginesize > 159.00
## | | | | | | | | |--- value: [21485.00]
## | | | | |--- horsepower > 141.00
## | | | | | |--- carwidth <= 68.15
## | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | |--- carlength <= 185.65
## | | | | | | | | |--- value: [15998.00]
## | | | | | | | |--- carlength > 185.65
## | | | | | | | | |--- enginesize <= 166.00
## | | | | | | | | | |--- value: [15750.00]
## | | | | | | | | |--- enginesize > 166.00
## | | | | | | | | | |--- value: [15690.00]
## | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | |--- value: [16503.00]
## | | | | | |--- carwidth > 68.15
## | | | | | | |--- compressionratio <= 7.85
## | | | | | | | |--- value: [18150.00]
## | | | | | | |--- compressionratio > 7.85
## | | | | | | | |--- value: [19045.00]
## | | | |--- carheight > 56.10
## | | | | |--- aspiration_turbo <= 0.50
## | | | | | |--- curbweight <= 3038.00
## | | | | | | |--- citympg <= 23.50
## | | | | | | | |--- citympg <= 21.00
## | | | | | | | | |--- value: [11900.00]
## | | | | | | | |--- citympg > 21.00
## | | | | | | | | |--- curbweight <= 2973.00
## | | | | | | | | | |--- value: [12940.00]
## | | | | | | | | |--- curbweight > 2973.00
## | | | | | | | | | |--- value: [13415.00]
## | | | | | | |--- citympg > 23.50
## | | | | | | | |--- value: [15985.00]
## | | | | | |--- curbweight > 3038.00
## | | | | | | |--- carheight <= 58.10
## | | | | | | | |--- carlength <= 187.75
## | | | | | | | | |--- stroke <= 2.69
## | | | | | | | | | |--- value: [15580.00]
## | | | | | | | | |--- stroke > 2.69
## | | | | | | | | | |--- value: [16630.00]
## | | | | | | | |--- carlength > 187.75
## | | | | | | | | |--- enginetype_ohc <= 0.50
## | | | | | | | | | |--- value: [16695.00]
## | | | | | | | | |--- enginetype_ohc > 0.50
## | | | | | | | | | |--- value: [16515.00]
## | | | | | | |--- carheight > 58.10
## | | | | | | | |--- value: [12440.00]
## | | | | |--- aspiration_turbo > 0.50
## | | | | | |--- highwaympg <= 23.50
## | | | | | | |--- symboling <= -1.50
## | | | | | | | |--- value: [18420.00]
## | | | | | | |--- symboling > -1.50
## | | | | | | | |--- value: [18950.00]
## | | | | | |--- highwaympg > 23.50
## | | | | | | |--- highwaympg <= 29.00
## | | | | | | | |--- value: [17075.00]
## | | | | | | |--- highwaympg > 29.00
## | | | | | | | |--- value: [17425.00]
## |--- enginesize > 182.00
## | |--- compressionratio <= 8.05
## | | |--- carbody_sedan <= 0.50
## | | | |--- value: [45400.00]
## | | |--- carbody_sedan > 0.50
## | | | |--- carlength <= 195.40
## | | | | |--- value: [41315.00]
## | | | |--- carlength > 195.40
## | | | | |--- carheight <= 56.50
## | | | | | |--- value: [36880.00]
## | | | | |--- carheight > 56.50
## | | | | | |--- value: [40960.00]
## | |--- compressionratio > 8.05
## | | |--- fuelsystem_idi <= 0.50
## | | | |--- carheight <= 50.65
## | | | | |--- value: [31400.50]
## | | | |--- carheight > 50.65
## | | | | |--- carheight <= 51.20
## | | | | | |--- value: [35056.00]
## | | | | |--- carheight > 51.20
## | | | | | |--- compressionratio <= 8.90
## | | | | | | |--- highwaympg <= 18.50
## | | | | | | | |--- value: [34184.00]
## | | | | | | |--- highwaympg > 18.50
## | | | | | | | |--- value: [33900.00]
## | | | | | |--- compressionratio > 8.90
## | | | | | | |--- value: [33278.00]
## | | |--- fuelsystem_idi > 0.50
## | | | |--- carwidth <= 71.00
## | | | | |--- carlength <= 189.20
## | | | | | |--- value: [28176.00]
## | | | | |--- carlength > 189.20
## | | | | | |--- value: [25552.00]
## | | | |--- carwidth > 71.00
## | | | | |--- value: [31600.00]
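Each path in the printed rules is a chain of if/else tests that ends in a leaf value, and that leaf value is the predicted price. As an illustration, the sketch below transcribes by hand only the branch for enginesize > 182.00 and compressionratio <= 8.05. The function name predecir_rama_motor_grande and the example car are hypothetical, and every other branch is deliberately left out.

```python
def predecir_rama_motor_grande(auto):
    """Follow the printed rules for enginesize > 182.00 and compressionratio <= 8.05.

    Returns None when the car falls outside this branch, because the other
    branches of the tree are not transcribed in this sketch.
    """
    if not (auto["enginesize"] > 182.00 and auto["compressionratio"] <= 8.05):
        return None
    if auto["carbody_sedan"] <= 0.50:
        return 45400.00
    if auto["carlength"] <= 195.40:
        return 41315.00
    if auto["carheight"] <= 56.50:
        return 36880.00
    return 40960.00


# Hypothetical car: large engine, low compression ratio, sedan body
auto_ejemplo = {
    "enginesize": 200,
    "compressionratio": 8.0,
    "carbody_sedan": 1,
    "carlength": 190.0,
    "carheight": 55.0,
}

print(predecir_rama_motor_grande(auto_ejemplo))
```

With 150 leaves for 164 training rows, most leaves hold a single car, which suggests the unpruned tree largely memorizes the training prices.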
importancia_predictores = pd.DataFrame(
{'predictor': datos_dummis.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importancia de los predictores en el modelo")
## Importancia de los predictores en el modelo
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 6.714462e-01
## 5 curbweight 2.170050e-01
## 9 compressionratio 2.306875e-02
## 1 wheelbase 2.222260e-02
## 10 horsepower 1.377919e-02
## 4 carheight 1.242327e-02
## 2 carlength 7.705249e-03
## 11 peakrpm 6.253695e-03
## 12 citympg 6.199828e-03
## 38 fuelsystem_idi 5.362574e-03
## 15 aspiration_turbo 3.143578e-03
## 19 carbody_sedan 3.001312e-03
## 3 carwidth 2.608831e-03
## 13 highwaympg 1.874106e-03
## 30 cylindernumber_five 1.179311e-03
## 18 carbody_hatchback 1.153570e-03
## 8 stroke 5.788389e-04
## 40 fuelsystem_mpfi 3.645312e-04
## 37 fuelsystem_4bbl 1.932777e-04
## 0 symboling 1.629297e-04
## 7 boreratio 1.055430e-04
## 16 doornumber_two 7.912552e-05
## 26 enginetype_ohc 3.610621e-05
## 22 drivewheel_rwd 2.501778e-05
## 31 cylindernumber_four 1.569898e-05
## 17 carbody_hardtop 1.150043e-05
## 27 enginetype_ohcf 2.790930e-07
## 20 carbody_wagon 1.022439e-07
## 39 fuelsystem_mfi 0.000000e+00
## 34 cylindernumber_twelve 0.000000e+00
## 36 fuelsystem_2bbl 0.000000e+00
## 41 fuelsystem_spdi 0.000000e+00
## 35 cylindernumber_two 0.000000e+00
## 21 drivewheel_fwd 0.000000e+00
## 33 cylindernumber_three 0.000000e+00
## 32 cylindernumber_six 0.000000e+00
## 29 enginetype_rotor 0.000000e+00
## 28 enginetype_ohcv 0.000000e+00
## 25 enginetype_l 0.000000e+00
## 24 enginetype_dohcv 0.000000e+00
## 23 enginelocation_rear 0.000000e+00
## 14 fueltype_gas 0.000000e+00
## 42 fuelsystem_spfi 0.000000e+00
According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, compressionratio, wheelbase and horsepower.
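As a quick check of the ranking, the largest importances can be pulled straight from the table. The sketch below is only illustrative: it rebuilds a small DataFrame by hand from the first rows of the printed output (rounded) instead of using `modelo_ar`, and the name `top5` is an assumption that does not appear in the original analysis.

```python
import pandas as pd

# Values copied (rounded) from the importance table printed above;
# in the notebook itself they come from modelo_ar.feature_importances_.
importancia_predictores = pd.DataFrame({
    "predictor": ["enginesize", "curbweight", "compressionratio",
                  "wheelbase", "horsepower", "carheight", "carlength"],
    "importancia": [0.6714, 0.2170, 0.0231, 0.0222, 0.0138, 0.0124, 0.0077],
})

# nlargest keeps the n rows with the highest value in the given column.
top5 = importancia_predictores.nlargest(5, "importancia")
print(top5)

# Share of the total importance carried by the two leading predictors.
print(top5["importancia"].head(2).sum())
```

With the rounded values used here, enginesize and curbweight together account for most of the importance, which matches the first splits of the printed tree.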
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 6849. , 8845. , 18620. , 15998. , 17450. , 8845. , 6338. ,
## 12764. , 25552. , 7995. , 9980. , 17075. , 33278. , 21105. ,
## 5572. , 9095. , 12290. , 31400.5, 7957. , 9095. , 17425. ,
## 18920. , 7129. , 18280. , 17199. , 8949. , 5195. , 7463. ,
## 6229. , 7126. , 6338. , 7129. , 41315. , 10698. , 6989. ,
## 8921. , 18620. , 7349. , 6338. , 11259. , 6989. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 91 1 94.5 ... 6649.000 6849.0
## 188 2 97.3 ... 9995.000 8845.0
## 136 3 99.1 ... 18150.000 18620.0
## 178 3 102.9 ... 16558.000 15998.0
## 191 0 100.4 ... 13295.000 17450.0
## 41 0 96.5 ... 12945.000 8845.0
## 138 2 93.7 ... 5118.000 6338.0
## 125 3 94.5 ... 22018.000 12764.0
## 68 -1 110.0 ... 28248.000 25552.0
## 159 0 95.7 ... 7788.000 7995.0
## 165 1 94.5 ... 9298.000 9980.0
## 110 0 114.2 ... 13860.000 17075.0
## 128 3 89.5 ... 37028.000 33278.0
## 12 0 101.2 ... 20970.000 21105.0
## 21 1 93.7 ... 5572.000 5572.0
## 42 1 96.5 ... 10345.000 9095.0
## 130 0 96.1 ... 9295.000 12290.0
## 49 0 102.0 ... 36000.000 31400.5
## 79 1 93.0 ... 7689.000 7957.0
## 154 0 95.7 ... 7898.000 9095.0
## 108 0 107.9 ... 13200.000 17425.0
## 200 -1 109.1 ... 16845.000 18920.0
## 31 2 86.6 ... 6855.000 7129.0
## 66 0 104.9 ... 18344.000 18280.0
## 105 3 91.3 ... 19699.000 17199.0
## 143 0 97.2 ... 9960.000 8949.0
## 157 0 95.7 ... 7198.000 5195.0
## 140 2 93.3 ... 7603.000 7463.0
## 78 2 93.7 ... 6669.000 6229.0
## 122 1 93.7 ... 7609.000 7126.0
## 150 1 95.7 ... 5348.000 6338.0
## 33 1 93.7 ... 6529.000 7129.0
## 15 0 103.5 ... 30760.000 41315.0
## 63 0 98.8 ... 10795.000 10698.0
## 43 0 94.3 ... 6785.000 6989.0
## 28 -1 103.3 ... 8921.000 8921.0
## 9 0 99.5 ... 17859.167 18620.0
## 97 1 94.5 ... 7999.000 7349.0
## 95 1 94.5 ... 7799.000 6338.0
## 147 0 97.0 ... 10198.000 11259.0
## 81 3 96.3 ... 8499.000 6989.0
##
## [41 rows x 45 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2887.266992410073
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2887.266992410073
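Both calls compute the same quantity, the square root of the mean of the squared errors. A minimal sketch with NumPy only, using the first three real/predicted pairs of the comparison table as sample data (the variable names `y_real`, `y_pred` and `rmse_manual` are illustrative, not part of the original notebook):

```python
import numpy as np

# First three rows of the comparison table (Precio_Real, Precio_Prediccion).
y_real = np.array([6649.0, 9995.0, 18150.0])
y_pred = np.array([6849.0, 8845.0, 18620.0])

# RMSE = sqrt(mean((y_real - y_pred)^2))
errores = y_real - y_pred
rmse_manual = np.sqrt(np.mean(errores ** 2))
print(rmse_manual)
```

Recent scikit-learn releases deprecate the `squared` argument of `mean_squared_error`, so the `np.sqrt(...)` form is probably the more portable of the two variants.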
The random forest model (rf) is built with seed 1270 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1270)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=1270)
# pending ... ...
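The step left pending here appears to be the variable-importance ranking for the random forest, as was done for the regression tree. A possible sketch follows. It runs on synthetic data, not on the car data, so its numbers say nothing about `modelo_rf`; the names `X_demo`, `y_demo`, `modelo_demo` and `importancia_rf` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for X_entrena / Y_entrena (NOT the car data):
# the target depends strongly on "a", weakly on "b", and not at all on "c".
rng = np.random.default_rng(1270)
X_demo = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])
y_demo = 5.0 * X_demo["a"] + 0.5 * X_demo["b"] + rng.normal(scale=0.1, size=200)

modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1270)
modelo_demo.fit(X_demo, y_demo)

# Same layout as the regression tree table: one row per predictor, sorted.
importancia_rf = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_rf)
```

In the notebook, the same pattern would presumably use `X_entrena.columns` and `modelo_rf.feature_importances_`.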
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 6773.475 , 9918.9 , 16995.15 , 15886.15 ,
## 14693.25 , 13149.15 , 6962.05 , 14531.25 ,
## 27242.7 , 7503.9 , 9446.25 , 16894.25 ,
## 33227.85 , 19542.75 , 5712.9 , 9644.6 ,
## 11217.05 , 34100.55 , 8159.75 , 7804.5 ,
## 17365.9 , 18593.5 , 6773.375 , 11945.5 ,
## 17081.05 , 8990.05 , 7345.75 , 8331.1 ,
## 6438.28333333, 7588.05 , 6299. , 6901.2 ,
## 38980.7 , 10003.6 , 9407.15 , 9189.3 ,
## 17632.35 , 7280.575 , 6822.4 , 10253.95 ,
## 8232. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 91 1 94.5 ... 6649.000 6773.475000
## 188 2 97.3 ... 9995.000 9918.900000
## 136 3 99.1 ... 18150.000 16995.150000
## 178 3 102.9 ... 16558.000 15886.150000
## 191 0 100.4 ... 13295.000 14693.250000
## 41 0 96.5 ... 12945.000 13149.150000
## 138 2 93.7 ... 5118.000 6962.050000
## 125 3 94.5 ... 22018.000 14531.250000
## 68 -1 110.0 ... 28248.000 27242.700000
## 159 0 95.7 ... 7788.000 7503.900000
## 165 1 94.5 ... 9298.000 9446.250000
## 110 0 114.2 ... 13860.000 16894.250000
## 128 3 89.5 ... 37028.000 33227.850000
## 12 0 101.2 ... 20970.000 19542.750000
## 21 1 93.7 ... 5572.000 5712.900000
## 42 1 96.5 ... 10345.000 9644.600000
## 130 0 96.1 ... 9295.000 11217.050000
## 49 0 102.0 ... 36000.000 34100.550000
## 79 1 93.0 ... 7689.000 8159.750000
## 154 0 95.7 ... 7898.000 7804.500000
## 108 0 107.9 ... 13200.000 17365.900000
## 200 -1 109.1 ... 16845.000 18593.500000
## 31 2 86.6 ... 6855.000 6773.375000
## 66 0 104.9 ... 18344.000 11945.500000
## 105 3 91.3 ... 19699.000 17081.050000
## 143 0 97.2 ... 9960.000 8990.050000
## 157 0 95.7 ... 7198.000 7345.750000
## 140 2 93.3 ... 7603.000 8331.100000
## 78 2 93.7 ... 6669.000 6438.283333
## 122 1 93.7 ... 7609.000 7588.050000
## 150 1 95.7 ... 5348.000 6299.000000
## 33 1 93.7 ... 6529.000 6901.200000
## 15 0 103.5 ... 30760.000 38980.700000
## 63 0 98.8 ... 10795.000 10003.600000
## 43 0 94.3 ... 6785.000 9407.150000
## 28 -1 103.3 ... 8921.000 9189.300000
## 9 0 99.5 ... 17859.167 17632.350000
## 97 1 94.5 ... 7999.000 7280.575000
## 95 1 94.5 ... 7799.000 6822.400000
## 147 0 97.0 ... 10198.000 10253.950000
## 81 3 96.3 ... 8499.000 8232.000000
##
## [41 rows x 45 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2447.738067029221
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2447.738067029221
The predictions of the three models are compared side by side.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 91 1 94.5 ... 6849.0 6773.475000
## 188 2 97.3 ... 8845.0 9918.900000
## 136 3 99.1 ... 18620.0 16995.150000
## 178 3 102.9 ... 15998.0 15886.150000
## 191 0 100.4 ... 17450.0 14693.250000
## 41 0 96.5 ... 8845.0 13149.150000
## 138 2 93.7 ... 6338.0 6962.050000
## 125 3 94.5 ... 12764.0 14531.250000
## 68 -1 110.0 ... 25552.0 27242.700000
## 159 0 95.7 ... 7995.0 7503.900000
## 165 1 94.5 ... 9980.0 9446.250000
## 110 0 114.2 ... 17075.0 16894.250000
## 128 3 89.5 ... 33278.0 33227.850000
## 12 0 101.2 ... 21105.0 19542.750000
## 21 1 93.7 ... 5572.0 5712.900000
## 42 1 96.5 ... 9095.0 9644.600000
## 130 0 96.1 ... 12290.0 11217.050000
## 49 0 102.0 ... 31400.5 34100.550000
## 79 1 93.0 ... 7957.0 8159.750000
## 154 0 95.7 ... 9095.0 7804.500000
## 108 0 107.9 ... 17425.0 17365.900000
## 200 -1 109.1 ... 18920.0 18593.500000
## 31 2 86.6 ... 7129.0 6773.375000
## 66 0 104.9 ... 18280.0 11945.500000
## 105 3 91.3 ... 17199.0 17081.050000
## 143 0 97.2 ... 8949.0 8990.050000
## 157 0 95.7 ... 5195.0 7345.750000
## 140 2 93.3 ... 7463.0 8331.100000
## 78 2 93.7 ... 6229.0 6438.283333
## 122 1 93.7 ... 7126.0 7588.050000
## 150 1 95.7 ... 6338.0 6299.000000
## 33 1 93.7 ... 7129.0 6901.200000
## 15 0 103.5 ... 41315.0 38980.700000
## 63 0 98.8 ... 10698.0 10003.600000
## 43 0 94.3 ... 6989.0 9407.150000
## 28 -1 103.3 ... 8921.0 9189.300000
## 9 0 99.5 ... 18620.0 17632.350000
## 97 1 94.5 ... 7349.0 7280.575000
## 95 1 94.5 ... 6338.0 6822.400000
## 147 0 97.0 ... 11259.0 10253.950000
## 81 3 96.3 ... 6989.0 8232.000000
##
## [41 rows x 47 columns]
The RMSE values are compared.
A NumPy array is created with the three values.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[2518.66531456, 2887.26699241, 2447.73806703]])
A DataFrame is built from the NumPy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 2518.665315 2887.266992 2447.738067
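To read the comparison as a ranking, the single row can be sorted from lowest to highest RMSE. This is a small sketch that re-enters the three values printed above by hand; the name `ranking` is illustrative.

```python
import pandas as pd

# RMSE values copied from the table above.
rmse = pd.DataFrame(
    [[2518.665315, 2887.266992, 2447.738067]],
    columns=["rmse_rm", "rmse_ar", "rmse_rf"],
)

# Take the single row as a Series and sort ascending: lowest RMSE first.
ranking = rmse.iloc[0].sort_values()
print(ranking)
print("Lowest RMSE:", ranking.idxmin())
```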
The proposed seed was changed from 1271 to 1270 so that it matches the one used in R, where seed 1271 produced a split that did not cover the full set of labels and raised an error. Seed 1270 was therefore chosen so that the Python and R results can be compared.
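The kind of problem described for seed 1271 can be checked before fitting by listing the category levels that a given split leaves out of the training data. The sketch below is an assumption-laden illustration: it uses pandas' `sample` as a stand-in for the actual split (so it does not reproduce the R or scikit-learn partitions), toy data instead of the car data, and a hypothetical helper named `niveles_faltantes`.

```python
import pandas as pd

def niveles_faltantes(datos, columna, semilla, frac=0.8):
    """Return the levels of `columna` absent from a random training sample."""
    entrena = datos.sample(frac=frac, random_state=semilla)
    return set(datos[columna]) - set(entrena[columna])

# Toy data: the level "twelve" appears only once, so some splits may miss it.
demo = pd.DataFrame({
    "cylindernumber": ["four"] * 8 + ["six"] * 3 + ["twelve"],
    "price": range(12),
})

for semilla in (1270, 1271):
    print(semilla, niveles_faltantes(demo, "cylindernumber", semilla))
```

An empty set means every level is represented in the training sample for that seed.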
With this seed, the best model according to the root mean squared error (RMSE) was the random forest, with an RMSE of 2447.738067029221, using 80% of the data for training and 20% for validation. Ordered from lowest to highest RMSE, the models rank as follows: random forest (2447.74), multiple linear regression (2518.67), regression tree (2887.27).
For these data, the best model in both Python and R was the random forest. Between the two languages, R gave the better result in this case, with an RMSE of 1712.114, considerably lower than the 2447.738067029221 obtained in Python.