Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) statistic as the basis for comparison.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created with 80% of the observations
Validation data are created with the remaining 20%
The multiple regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of each variable for the price is identified
The regression tree and its decision rules are visualized
Predictions are made with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data and 20 trees
The importance of each variable for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
## Unnamed: 0 symboling wheelbase ... citympg highwaympg price
## 0 1 3 88.6 ... 21 27 13495.0
## 1 2 3 88.6 ... 21 27 16500.0
## 2 3 1 94.5 ... 19 26 16500.0
## 3 4 2 99.8 ... 24 30 13950.0
## 4 5 2 99.4 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 109.1 ... 23 28 16845.0
## 201 202 -1 109.1 ... 19 25 19045.0
## 202 203 -1 109.1 ... 18 23 21485.0
## 203 204 -1 109.1 ... 26 27 22470.0
## 204 205 -1 109.1 ... 19 25 22625.0
##
## [205 rows x 16 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0 int64
## symboling int64
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginesize int64
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating. A value of +3 indicates that the car is risky; -3 that it is probably quite safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car (Numeric). Distance between axles, in inches. |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the engine (Numeric), in … |
| 8 | boreratio | Bore ratio of the engine (Numeric). Related to engine efficiency. |
| 9 | stroke | Stroke or volume inside the engine (Numeric). Pistons, strokes, combustion. |
| 10 | compressionratio | Compression ratio of the engine (Numeric). A measure of pressure in the engine. |
| 11 | horsepower | Horsepower (Numeric). Power of the car. |
| 12 | peakrpm | Peak revolutions per minute (Numeric). |
| 13 | citympg | Mileage in the city (Numeric). Fuel economy in miles per gallon. |
| 14 | highwaympg | Mileage on the highway (Numeric). Fuel economy in miles per gallon. |
| 15 | price | Price of the car in dollars (Numeric). Dependent variable. |
Keep only the required variables, which drops the index column 'Unnamed: 0':
'symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price'
datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
## symboling wheelbase carlength ... citympg highwaympg price
## count 205.000000 205.000000 205.000000 ... 205.000000 205.000000 205.000000
## mean 0.834146 98.756585 174.049268 ... 25.219512 30.751220 13276.710571
## std 1.245307 6.021776 12.337289 ... 6.542142 6.886443 7988.852332
## min -2.000000 86.600000 141.100000 ... 13.000000 16.000000 5118.000000
## 25% 0.000000 94.500000 166.300000 ... 19.000000 25.000000 7788.000000
## 50% 1.000000 97.000000 173.200000 ... 24.000000 30.000000 10295.000000
## 75% 2.000000 102.400000 183.100000 ... 30.000000 34.000000 16503.000000
## max 3.000000 120.900000 208.100000 ... 49.000000 54.000000 45400.000000
##
## [8 rows x 15 columns]
datos
## symboling wheelbase carlength ... citympg highwaympg price
## 0 3 88.6 168.8 ... 21 27 13495.0
## 1 3 88.6 168.8 ... 21 27 16500.0
## 2 1 94.5 171.2 ... 19 26 16500.0
## 3 2 99.8 176.6 ... 24 30 13950.0
## 4 2 99.4 176.6 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 109.1 188.8 ... 23 28 16845.0
## 201 -1 109.1 188.8 ... 19 25 19045.0
## 202 -1 109.1 188.8 ... 18 23 21485.0
## 203 -1 109.1 188.8 ... 26 27 22470.0
## 204 -1 109.1 188.8 ... 19 25 22625.0
##
## [205 rows x 15 columns]
The training data are 80% of the observations and the validation data are the remaining 20%. Seed 2022.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80, random_state = 2022)
X_entrena
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 152 1 95.7 158.7 ... 4800 31 38
## 185 2 97.3 171.7 ... 5250 27 34
## 162 0 95.7 166.3 ... 4800 28 34
## 47 0 113.0 199.6 ... 4750 15 19
## 163 1 94.5 168.7 ... 4800 29 34
## .. ... ... ... ... ... ... ...
## 183 2 97.3 171.7 ... 5250 27 34
## 177 -1 102.4 175.6 ... 4200 27 32
## 112 0 107.9 186.7 ... 4150 28 33
## 173 -1 102.4 175.6 ... 4200 29 34
## 125 3 94.5 168.9 ... 5500 19 27
##
## [164 rows x 14 columns]
X_valida
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 36 0 96.5 157.1 ... 6000 30 34
## 198 -2 104.3 188.8 ... 5100 17 22
## 102 0 100.4 184.6 ... 5200 17 22
## 146 0 97.0 173.5 ... 4800 28 32
## 79 1 93.0 157.3 ... 5500 24 30
## 32 1 93.7 150.0 ... 5500 38 42
## 107 0 107.9 186.7 ... 5000 19 24
## 180 -1 104.5 187.8 ... 5200 20 24
## 127 3 89.5 168.9 ... 5900 17 25
## 149 0 96.9 173.6 ... 4800 23 23
## 43 0 94.3 170.7 ... 4800 24 29
## 40 0 96.5 175.4 ... 5800 27 33
## 203 -1 109.1 188.8 ... 4800 26 27
## 138 2 93.7 156.9 ... 4900 31 36
## 201 -1 109.1 188.8 ... 5300 19 25
## 20 0 94.5 158.8 ... 5400 38 43
## 164 1 94.5 168.7 ... 4800 29 34
## 65 0 104.9 175.0 ... 5000 19 27
## 22 1 93.7 157.3 ... 5500 31 38
## 186 2 97.3 171.7 ... 5250 27 34
## 106 1 99.2 178.5 ... 5200 19 25
## 156 0 95.7 166.3 ... 4800 30 37
## 111 0 107.9 186.7 ... 5000 19 24
## 68 -1 110.0 190.9 ... 4350 22 25
## 123 -1 103.3 174.6 ... 5000 24 30
## 108 0 107.9 186.7 ... 4150 28 33
## 78 2 93.7 157.3 ... 5500 31 38
## 8 1 105.8 192.7 ... 5500 17 20
## 74 1 112.0 199.2 ... 4500 14 16
## 10 2 101.2 176.8 ... 5800 23 29
## 113 0 114.2 198.9 ... 5000 19 24
## 82 3 95.9 173.2 ... 5000 19 24
## 57 3 95.3 169.0 ... 6000 17 23
## 158 0 95.7 166.3 ... 4500 34 36
## 58 3 95.3 169.0 ... 6000 16 23
## 17 0 110.0 197.0 ... 5400 15 20
## 129 1 98.4 175.7 ... 5750 17 28
## 150 1 95.7 158.7 ... 4800 35 39
## 73 0 120.9 208.1 ... 4500 14 16
## 116 0 107.9 186.7 ... 4150 28 33
## 30 2 86.6 144.6 ... 4800 49 54
##
## [41 rows x 14 columns]
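The row counts shown above (164 for training, 41 for validation) follow from applying the 80% fraction to the 205 observations. A minimal sketch of that arithmetic in plain Python; `split_sizes` is a hypothetical helper written for illustration, not part of scikit-learn, and it assumes the training size is rounded down with the remainder going to validation:

```python
import math


def split_sizes(n_samples, train_fraction):
    """Return (n_train, n_valid) for a fractional split (illustrative helper)."""
    # Assumption: the training size is rounded down and validation gets the rest.
    n_train = math.floor(train_fraction * n_samples)
    return n_train, n_samples - n_train


# 205 observations and an 80% training fraction, as in the case above.
n_train, n_valid = split_sizes(205, 0.80)
print(n_train, n_valid)
```

With so few validation rows, each individual car has a visible effect on any error statistic computed on that set.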
The multiple linear regression model (rm) is built
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
## LinearRegression()
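Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown, one per predictor in column order; the intercept \(\beta_0\) is stored separately in modelo_rm.intercept_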
modelo_rm.coef_
## array([ 9.52673247e+01, 1.81953839e+02, -1.28460128e+02, 2.59733294e+02,
## 1.73000745e+02, 4.28258282e+00, 1.06140749e+02, -8.55114242e+02,
## -3.19867165e+03, 2.59655471e+02, 2.92332989e+01, 2.06775583e+00,
## -4.56614472e+02, 3.82529624e+02])
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8347922699728584
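The value printed above is the ordinary R-squared on the training data. The adjusted R-squared asked about at the start of the case can be derived from it with the standard correction for the number of predictors. A sketch, assuming n = 164 training rows and p = 14 predictors as shown in the split above; the function name `adjusted_r2` is illustrative:

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


# Training R-squared reported by modelo_rm.score(X_entrena, Y_entrena) above.
r2_train = 0.8347922699728584
r2_adj = adjusted_r2(r2_train, n_obs=164, n_predictors=14)
print(round(r2_adj, 4))
```

Because the correction penalizes every extra predictor, the adjusted value is never larger than the ordinary R-squared.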
predicciones_rm = modelo_rm.predict(X_valida)
print(predicciones_rm[:-1])
## [ 8868.95979292 16715.53931972 23107.53653894 8818.03421416
## 8623.47306273 5461.51958231 15048.03395087 20678.17706935
## 26577.0976254 10007.94335078 7167.58858707 8902.49607572
## 19625.68457485 8829.47097854 19927.5491719 5757.26535678
## 6290.35140508 17189.29706731 6650.17766836 10005.933005
## 22820.3487011 7059.04332282 18423.78105848 25736.16593187
## 12360.08359525 18730.60224629 7661.73706868 17339.95676112
## 37325.85810799 13165.15590746 18902.21907465 15028.83777249
## 8237.19412365 8338.33771142 11241.64527253 28938.56728162
## 34898.22431444 5502.18614768 39070.50660777 18966.14430163]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 8868.959793
## 198 -2 104.3 ... 18420.0 16715.539320
## 102 0 100.4 ... 14399.0 23107.536539
## 146 0 97.0 ... 7463.0 8818.034214
## 79 1 93.0 ... 7689.0 8623.473063
## 32 1 93.7 ... 5399.0 5461.519582
## 107 0 107.9 ... 11900.0 15048.033951
## 180 -1 104.5 ... 15690.0 20678.177069
## 127 3 89.5 ... 34028.0 26577.097625
## 149 0 96.9 ... 11694.0 10007.943351
## 43 0 94.3 ... 6785.0 7167.588587
## 40 0 96.5 ... 10295.0 8902.496076
## 203 -1 109.1 ... 22470.0 19625.684575
## 138 2 93.7 ... 5118.0 8829.470979
## 201 -1 109.1 ... 19045.0 19927.549172
## 20 0 94.5 ... 6575.0 5757.265357
## 164 1 94.5 ... 8238.0 6290.351405
## 65 0 104.9 ... 18280.0 17189.297067
## 22 1 93.7 ... 6377.0 6650.177668
## 186 2 97.3 ... 8495.0 10005.933005
## 106 1 99.2 ... 18399.0 22820.348701
## 156 0 95.7 ... 6938.0 7059.043323
## 111 0 107.9 ... 15580.0 18423.781058
## 68 -1 110.0 ... 28248.0 25736.165932
## 123 -1 103.3 ... 8921.0 12360.083595
## 108 0 107.9 ... 13200.0 18730.602246
## 78 2 93.7 ... 6669.0 7661.737069
## 8 1 105.8 ... 23875.0 17339.956761
## 74 1 112.0 ... 45400.0 37325.858108
## 10 2 101.2 ... 16430.0 13165.155907
## 113 0 114.2 ... 16695.0 18902.219075
## 82 3 95.9 ... 12629.0 15028.837772
## 57 3 95.3 ... 13645.0 8237.194124
## 158 0 95.7 ... 7898.0 8338.337711
## 58 3 95.3 ... 15645.0 11241.645273
## 17 0 110.0 ... 36880.0 28938.567282
## 129 1 98.4 ... 31400.5 34898.224314
## 150 1 95.7 ... 5348.0 5502.186148
## 73 0 120.9 ... 40960.0 39070.506608
## 116 0 107.9 ... 17950.0 18966.144302
## 30 2 86.6 ... 6479.0 2314.338652
##
## [41 rows x 16 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"Test error (RMSE): {rmse_rm}")
## Test error (RMSE): 3703.892330296177
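or, equivalently, with NumPy (this form also runs on recent scikit-learn releases, which no longer accept the squared argument):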
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3703.892330296177
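Both calls above compute the same statistic: the square root of the mean of the squared differences between real and predicted prices. A dependency-free sketch of that definition; the `rmse` function is written here for illustration, and the three value pairs are the first rows of the comparison table above:

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error of two equally long, non-empty sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# First three validation rows (indices 36, 198, 102) from the table above.
reales = [7295.0, 18420.0, 14399.0]
predichos = [8868.959793, 16715.539320, 23107.536539]
print(rmse(reales, predichos))
```

Applied to all 41 validation rows instead of three, the same computation should reproduce the value reported above.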
The regression tree model (ar) is built
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 2022
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
## DecisionTreeRegressor(random_state=2022)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of leaf nodes: {modelo_ar.get_n_leaves()}")
## Number of leaf nodes: 152
plot = plot_tree(
decision_tree = modelo_ar,
feature_names = list(datos.drop(columns = "price").columns),
# class_names omitted: it only applies to classification trees
filled = True,
impurity = False,
fontsize = 10,
precision = 2,
ax = ax
)
plot
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2697.50
## | | |--- curbweight <= 2291.50
## | | | |--- citympg <= 29.50
## | | | | |--- symboling <= 2.50
## | | | | | |--- peakrpm <= 4600.00
## | | | | | | |--- carwidth <= 63.70
## | | | | | | | |--- value: [7053.00]
## | | | | | | |--- carwidth > 63.70
## | | | | | | | |--- carheight <= 54.10
## | | | | | | | | |--- value: [7775.00]
## | | | | | | | |--- carheight > 54.10
## | | | | | | | | |--- value: [7603.00]
## | | | | | |--- peakrpm > 4600.00
## | | | | | | |--- highwaympg <= 29.50
## | | | | | | | |--- value: [9298.00]
## | | | | | | |--- highwaympg > 29.50
## | | | | | | | |--- carlength <= 168.10
## | | | | | | | | |--- curbweight <= 2134.00
## | | | | | | | | | |--- carheight <= 51.80
## | | | | | | | | | | |--- value: [7957.00]
## | | | | | | | | | |--- carheight > 51.80
## | | | | | | | | | | |--- value: [8358.00]
## | | | | | | | | |--- curbweight > 2134.00
## | | | | | | | | | |--- citympg <= 27.50
## | | | | | | | | | | |--- curbweight <= 2262.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2262.50
## | | | | | | | | | | | |--- value: [9095.00]
## | | | | | | | | | |--- citympg > 27.50
## | | | | | | | | | | |--- value: [9258.00]
## | | | | | | | |--- carlength > 168.10
## | | | | | | | | |--- carwidth <= 63.80
## | | | | | | | | | |--- value: [7898.00]
## | | | | | | | | |--- carwidth > 63.80
## | | | | | | | | | |--- curbweight <= 2210.50
## | | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | | |--- value: [8058.00]
## | | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | | |--- value: [7975.00]
## | | | | | | | | | |--- curbweight > 2210.50
## | | | | | | | | | | |--- value: [8195.00]
## | | | | |--- symboling > 2.50
## | | | | | |--- carheight <= 53.50
## | | | | | | |--- value: [9980.00]
## | | | | | |--- carheight > 53.50
## | | | | | | |--- value: [11595.00]
## | | | |--- citympg > 29.50
## | | | | |--- wheelbase <= 94.10
## | | | | | |--- highwaympg <= 39.50
## | | | | | | |--- highwaympg <= 32.50
## | | | | | | | |--- value: [5195.00]
## | | | | | | |--- highwaympg > 32.50
## | | | | | | | |--- curbweight <= 1978.00
## | | | | | | | | |--- compressionratio <= 9.30
## | | | | | | | | | |--- curbweight <= 1947.50
## | | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | | |--- truncated branch of depth 4
## | | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | | |--- value: [6855.00]
## | | | | | | | | | |--- curbweight > 1947.50
## | | | | | | | | | | |--- highwaympg <= 36.00
## | | | | | | | | | | | |--- value: [7129.00]
## | | | | | | | | | | |--- highwaympg > 36.00
## | | | | | | | | | | | |--- value: [7395.00]
## | | | | | | | | |--- compressionratio > 9.30
## | | | | | | | | | |--- curbweight <= 1955.50
## | | | | | | | | | | |--- value: [6189.00]
## | | | | | | | | | |--- curbweight > 1955.50
## | | | | | | | | | | |--- value: [6229.00]
## | | | | | | | |--- curbweight > 1978.00
## | | | | | | | | |--- enginesize <= 94.00
## | | | | | | | | | |--- carlength <= 162.30
## | | | | | | | | | | |--- value: [7150.50]
## | | | | | | | | | |--- carlength > 162.30
## | | | | | | | | | | |--- value: [6692.00]
## | | | | | | | | |--- enginesize > 94.00
## | | | | | | | | | |--- value: [7609.00]
## | | | | | |--- highwaympg > 39.50
## | | | | | | |--- carheight <= 52.00
## | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | |--- value: [5572.00]
## | | | | | | | |--- carwidth > 64.10
## | | | | | | | | |--- value: [5389.00]
## | | | | | | |--- carheight > 52.00
## | | | | | | | |--- value: [5151.00]
## | | | | |--- wheelbase > 94.10
## | | | | | |--- carheight <= 54.00
## | | | | | | |--- curbweight <= 2016.00
## | | | | | | | |--- curbweight <= 1891.50
## | | | | | | | | |--- value: [7605.75]
## | | | | | | | |--- curbweight > 1891.50
## | | | | | | | | |--- highwaympg <= 40.00
## | | | | | | | | | |--- value: [8249.00]
## | | | | | | | | |--- highwaympg > 40.00
## | | | | | | | | | |--- value: [8916.50]
## | | | | | | |--- curbweight > 2016.00
## | | | | | | | |--- horsepower <= 69.50
## | | | | | | | | |--- curbweight <= 2026.00
## | | | | | | | | | |--- value: [7349.00]
## | | | | | | | | |--- curbweight > 2026.00
## | | | | | | | | | |--- carlength <= 168.25
## | | | | | | | | | | |--- curbweight <= 2151.50
## | | | | | | | | | | | |--- value: [7799.00]
## | | | | | | | | | | |--- curbweight > 2151.50
## | | | | | | | | | | | |--- value: [7788.00]
## | | | | | | | | | |--- carlength > 168.25
## | | | | | | | | | | |--- value: [7999.00]
## | | | | | | | |--- horsepower > 69.50
## | | | | | | | | |--- highwaympg <= 42.00
## | | | | | | | | | |--- carheight <= 52.65
## | | | | | | | | | | |--- value: [7126.00]
## | | | | | | | | | |--- carheight > 52.65
## | | | | | | | | | | |--- value: [7198.00]
## | | | | | | | | |--- highwaympg > 42.00
## | | | | | | | | | |--- value: [7738.00]
## | | | | | |--- carheight > 54.00
## | | | | | | |--- curbweight <= 1903.50
## | | | | | | | |--- value: [5499.00]
## | | | | | | |--- curbweight > 1903.50
## | | | | | | | |--- horsepower <= 53.50
## | | | | | | | | |--- curbweight <= 2262.50
## | | | | | | | | | |--- value: [7775.00]
## | | | | | | | | |--- curbweight > 2262.50
## | | | | | | | | | |--- value: [7995.00]
## | | | | | | | |--- horsepower > 53.50
## | | | | | | | | |--- carlength <= 161.05
## | | | | | | | | | |--- curbweight <= 2027.50
## | | | | | | | | | | |--- value: [6488.00]
## | | | | | | | | | |--- curbweight > 2027.50
## | | | | | | | | | | |--- value: [6338.00]
## | | | | | | | | |--- carlength > 161.05
## | | | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | | | |--- curbweight <= 1928.00
## | | | | | | | | | | | |--- value: [6649.00]
## | | | | | | | | | | |--- curbweight > 1928.00
## | | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | | | |--- peakrpm <= 5000.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- peakrpm > 5000.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | |--- curbweight > 2291.50
## | | | |--- citympg <= 22.00
## | | | | |--- compressionratio <= 9.15
## | | | | | |--- wheelbase <= 100.10
## | | | | | | |--- carlength <= 173.05
## | | | | | | | |--- value: [14997.50]
## | | | | | | |--- carlength > 173.05
## | | | | | | | |--- value: [15250.00]
## | | | | | |--- wheelbase > 100.10
## | | | | | | |--- value: [13295.00]
## | | | | |--- compressionratio > 9.15
## | | | | | |--- citympg <= 19.00
## | | | | | | |--- value: [11395.00]
## | | | | | |--- citympg > 19.00
## | | | | | | |--- symboling <= 2.50
## | | | | | | | |--- value: [12170.00]
## | | | | | | |--- symboling > 2.50
## | | | | | | | |--- value: [11850.00]
## | | | |--- citympg > 22.00
## | | | | |--- wheelbase <= 99.30
## | | | | | |--- curbweight <= 2422.50
## | | | | | | |--- horsepower <= 91.00
## | | | | | | | |--- wheelbase <= 96.95
## | | | | | | | | |--- curbweight <= 2346.50
## | | | | | | | | | |--- wheelbase <= 96.40
## | | | | | | | | | | |--- value: [8499.00]
## | | | | | | | | | |--- wheelbase > 96.40
## | | | | | | | | | | |--- value: [8845.00]
## | | | | | | | | |--- curbweight > 2346.50
## | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | |--- value: [6989.00]
## | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | |--- carheight <= 53.25
## | | | | | | | | | | | |--- value: [8189.00]
## | | | | | | | | | | |--- carheight > 53.25
## | | | | | | | | | | | |--- value: [8013.00]
## | | | | | | | |--- wheelbase > 96.95
## | | | | | | | | |--- carheight <= 54.00
## | | | | | | | | | |--- value: [9720.00]
## | | | | | | | | |--- carheight > 54.00
## | | | | | | | | | |--- citympg <= 25.00
## | | | | | | | | | | |--- value: [9233.00]
## | | | | | | | | | |--- citympg > 25.00
## | | | | | | | | | | |--- horsepower <= 76.00
## | | | | | | | | | | | |--- value: [9495.00]
## | | | | | | | | | | |--- horsepower > 76.00
## | | | | | | | | | | | |--- value: [9370.00]
## | | | | | | |--- horsepower > 91.00
## | | | | | | | |--- carwidth <= 65.45
## | | | | | | | | |--- horsepower <= 95.50
## | | | | | | | | | |--- value: [9960.00]
## | | | | | | | | |--- horsepower > 95.50
## | | | | | | | | | |--- symboling <= 2.00
## | | | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- symboling > 2.00
## | | | | | | | | | | |--- value: [9959.00]
## | | | | | | | |--- carwidth > 65.45
## | | | | | | | | |--- compressionratio <= 9.55
## | | | | | | | | | |--- value: [10345.00]
## | | | | | | | | |--- compressionratio > 9.55
## | | | | | | | | | |--- value: [9995.00]
## | | | | | |--- curbweight > 2422.50
## | | | | | | |--- highwaympg <= 28.50
## | | | | | | | |--- value: [12945.00]
## | | | | | | |--- highwaympg > 28.50
## | | | | | | | |--- stroke <= 3.45
## | | | | | | | | |--- peakrpm <= 5000.00
## | | | | | | | | | |--- highwaympg <= 37.00
## | | | | | | | | | | |--- boreratio <= 3.50
## | | | | | | | | | | | |--- value: [11245.00]
## | | | | | | | | | | |--- boreratio > 3.50
## | | | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | | |--- highwaympg > 37.00
## | | | | | | | | | | |--- value: [10795.00]
## | | | | | | | | |--- peakrpm > 5000.00
## | | | | | | | | | |--- value: [10198.00]
## | | | | | | | |--- stroke > 3.45
## | | | | | | | | |--- curbweight <= 2629.00
## | | | | | | | | | |--- curbweight <= 2538.00
## | | | | | | | | | | |--- highwaympg <= 30.50
## | | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | | | | |--- highwaympg > 30.50
## | | | | | | | | | | | |--- value: [9895.00]
## | | | | | | | | | |--- curbweight > 2538.00
## | | | | | | | | | | |--- curbweight <= 2545.50
## | | | | | | | | | | | |--- value: [8449.00]
## | | | | | | | | | | |--- curbweight > 2545.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- curbweight > 2629.00
## | | | | | | | | | |--- value: [11199.00]
## | | | | |--- wheelbase > 99.30
## | | | | | |--- wheelbase <= 101.80
## | | | | | | |--- peakrpm <= 5650.00
## | | | | | | | |--- carlength <= 181.65
## | | | | | | | | |--- curbweight <= 2458.00
## | | | | | | | | | |--- value: [13950.00]
## | | | | | | | | |--- curbweight > 2458.00
## | | | | | | | | | |--- value: [13845.00]
## | | | | | | | |--- carlength > 181.65
## | | | | | | | | |--- value: [12290.00]
## | | | | | | |--- peakrpm > 5650.00
## | | | | | | | |--- value: [16925.00]
## | | | | | |--- wheelbase > 101.80
## | | | | | | |--- wheelbase <= 102.85
## | | | | | | | |--- highwaympg <= 33.50
## | | | | | | | | |--- curbweight <= 2436.00
## | | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | | |--- value: [10898.00]
## | | | | | | | | |--- curbweight > 2436.00
## | | | | | | | | | |--- stroke <= 3.44
## | | | | | | | | | | |--- value: [10698.00]
## | | | | | | | | | |--- stroke > 3.44
## | | | | | | | | | | |--- value: [11248.00]
## | | | | | | | |--- highwaympg > 33.50
## | | | | | | | | |--- value: [8948.00]
## | | | | | | |--- wheelbase > 102.85
## | | | | | | | |--- value: [8921.00]
## | |--- curbweight > 2697.50
## | | |--- enginesize <= 119.50
## | | | |--- stroke <= 3.13
## | | | | |--- value: [8778.00]
## | | | |--- stroke > 3.13
## | | | | |--- value: [11048.00]
## | | |--- enginesize > 119.50
## | | | |--- peakrpm <= 5450.00
## | | | | |--- peakrpm <= 4525.00
## | | | | | |--- wheelbase <= 104.20
## | | | | | | |--- wheelbase <= 102.35
## | | | | | | | |--- curbweight <= 2737.50
## | | | | | | | | |--- value: [20970.00]
## | | | | | | | |--- curbweight > 2737.50
## | | | | | | | | |--- value: [21105.00]
## | | | | | | |--- wheelbase > 102.35
## | | | | | | | |--- value: [24565.00]
## | | | | | |--- wheelbase > 104.20
## | | | | | | |--- curbweight <= 3341.00
## | | | | | | | |--- stroke <= 3.58
## | | | | | | | | |--- value: [16900.00]
## | | | | | | | |--- stroke > 3.58
## | | | | | | | | |--- value: [18344.00]
## | | | | | | |--- curbweight > 3341.00
## | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | |--- value: [13860.00]
## | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | |--- value: [17075.00]
## | | | | |--- peakrpm > 4525.00
## | | | | | |--- carwidth <= 68.65
## | | | | | | |--- horsepower <= 153.00
## | | | | | | | |--- carheight <= 52.50
## | | | | | | | | |--- curbweight <= 2869.50
## | | | | | | | | | |--- boreratio <= 3.61
## | | | | | | | | | | |--- boreratio <= 3.59
## | | | | | | | | | | | |--- value: [12764.00]
## | | | | | | | | | | |--- boreratio > 3.59
## | | | | | | | | | | | |--- value: [12964.00]
## | | | | | | | | | |--- boreratio > 3.61
## | | | | | | | | | | |--- value: [11549.00]
## | | | | | | | | |--- curbweight > 2869.50
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | | | | | |--- carheight > 52.50
## | | | | | | | | |--- citympg <= 23.50
## | | | | | | | | | |--- carlength <= 187.75
## | | | | | | | | | | |--- carlength <= 185.60
## | | | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | | | |--- carlength > 185.60
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- carlength > 187.75
## | | | | | | | | | | |--- carlength <= 193.85
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- carlength > 193.85
## | | | | | | | | | | | |--- value: [12440.00]
## | | | | | | | | |--- citympg > 23.50
## | | | | | | | | | |--- highwaympg <= 29.00
## | | | | | | | | | | |--- carheight <= 56.85
## | | | | | | | | | | | |--- value: [15985.00]
## | | | | | | | | | | |--- carheight > 56.85
## | | | | | | | | | | | |--- value: [16515.00]
## | | | | | | | | | |--- highwaympg > 29.00
## | | | | | | | | | | |--- value: [17669.00]
## | | | | | | |--- horsepower > 153.00
## | | | | | | | |--- compressionratio <= 7.90
## | | | | | | | | |--- symboling <= 1.00
## | | | | | | | | | |--- value: [18950.00]
## | | | | | | | | |--- symboling > 1.00
## | | | | | | | | | |--- value: [19699.00]
## | | | | | | | |--- compressionratio > 7.90
## | | | | | | | | |--- carheight <= 50.85
## | | | | | | | | | |--- value: [17199.00]
## | | | | | | | | |--- carheight > 50.85
## | | | | | | | | | |--- curbweight <= 2996.00
## | | | | | | | | | | |--- citympg <= 19.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- citympg > 19.50
## | | | | | | | | | | | |--- value: [16558.00]
## | | | | | | | | | |--- curbweight > 2996.00
## | | | | | | | | | | |--- carwidth <= 67.10
## | | | | | | | | | | | |--- value: [15750.00]
## | | | | | | | | | | |--- carwidth > 67.10
## | | | | | | | | | | | |--- value: [15998.00]
## | | | | | |--- carwidth > 68.65
## | | | | | | |--- curbweight <= 3007.00
## | | | | | | | |--- value: [16845.00]
## | | | | | | |--- curbweight > 3007.00
## | | | | | | | |--- value: [22625.00]
## | | | |--- peakrpm > 5450.00
## | | | | |--- enginesize <= 143.50
## | | | | | |--- curbweight <= 2845.50
## | | | | | | |--- enginesize <= 128.50
## | | | | | | | |--- value: [18150.00]
## | | | | | | |--- enginesize > 128.50
## | | | | | | | |--- carlength <= 184.65
## | | | | | | | | |--- value: [17450.00]
## | | | | | | | |--- carlength > 184.65
## | | | | | | | | |--- value: [17710.00]
## | | | | | |--- curbweight > 2845.50
## | | | | | | |--- highwaympg <= 24.50
## | | | | | | | |--- highwaympg <= 23.00
## | | | | | | | | |--- value: [17859.17]
## | | | | | | | |--- highwaympg > 23.00
## | | | | | | | | |--- value: [18150.00]
## | | | | | | |--- highwaympg > 24.50
## | | | | | | | |--- carheight <= 55.90
## | | | | | | | | |--- value: [18920.00]
## | | | | | | | |--- carheight > 55.90
## | | | | | | | | |--- value: [18620.00]
## | | | | |--- enginesize > 143.50
## | | | | | |--- citympg <= 18.50
## | | | | | | |--- value: [21485.00]
## | | | | | |--- citympg > 18.50
## | | | | | | |--- value: [22018.00]
## |--- enginesize > 182.00
## | |--- citympg <= 19.50
## | | |--- carheight <= 54.70
## | | | |--- compressionratio <= 8.05
## | | | | |--- value: [41315.00]
## | | | |--- compressionratio > 8.05
## | | | | |--- curbweight <= 2778.00
## | | | | | |--- value: [32528.00]
## | | | | |--- curbweight > 2778.00
## | | | | | |--- stroke <= 3.00
## | | | | | | |--- carlength <= 180.30
## | | | | | | | |--- value: [37028.00]
## | | | | | | |--- carlength > 180.30
## | | | | | | | |--- value: [36000.00]
## | | | | | |--- stroke > 3.00
## | | | | | | |--- citympg <= 15.50
## | | | | | | | |--- value: [33900.00]
## | | | | | | |--- citympg > 15.50
## | | | | | | | |--- value: [35056.00]
## | | |--- carheight > 54.70
## | | | |--- carwidth <= 69.30
## | | | | |--- value: [30760.00]
## | | | |--- carwidth > 69.30
## | | | | |--- value: [34184.00]
## | |--- citympg > 19.50
## | | |--- wheelbase <= 112.80
## | | | |--- carheight <= 55.70
## | | | | |--- value: [28176.00]
## | | | |--- carheight > 55.70
## | | | | |--- value: [25552.00]
## | | |--- wheelbase > 112.80
## | | | |--- value: [31600.00]
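Each root-to-leaf path in the printed rules is equivalent to a chain of nested conditions. As an illustration, the short branch `enginesize <= 182.00`, `curbweight > 2697.50`, `enginesize <= 119.50` and its two `stroke` leaves can be written by hand as follows; `predict_branch` is a hypothetical helper that covers only this branch, and the input values in the call are made up:

```python
def predict_branch(enginesize, curbweight, stroke):
    """Follow one branch copied from the printed rules; None outside of it."""
    if enginesize <= 182.00 and curbweight > 2697.50 and enginesize <= 119.50:
        # The two leaves of this branch, as printed in the rules above.
        return 8778.00 if stroke <= 3.13 else 11048.00
    return None  # Any other car falls in a branch not reproduced here.


# Made-up car: small engine, heavy body, short stroke.
print(predict_branch(enginesize=110, curbweight=2800, stroke=3.0))
```

The fitted tree applies the same kind of test at every node, which is why `modelo_ar.predict` can only return values that appear as leaves in the rules.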
importancia_predictores = pd.DataFrame(
{'predictor': datos.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importance of the predictors in the model")
## Importance of the predictors in the model
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.654370
## 5 curbweight 0.257881
## 12 citympg 0.023633
## 1 wheelbase 0.020205
## 11 peakrpm 0.014138
## 9 compressionratio 0.007895
## 10 horsepower 0.005439
## 3 carwidth 0.005062
## 4 carheight 0.003937
## 13 highwaympg 0.002349
## 2 carlength 0.001952
## 8 stroke 0.001505
## 0 symboling 0.001471
## 7 boreratio 0.000165
According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, citympg, wheelbase, and peakrpm; enginesize and curbweight alone account for about 91% of the total importance.
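The ranking can be checked directly from the importances printed above. A small sketch in plain Python that sorts them and accumulates the share covered by the top five; the dictionary simply restates the printed values, and the variable names are illustrative:

```python
# Importances copied from the table printed by the regression tree above.
importancias = {
    "enginesize": 0.654370, "curbweight": 0.257881, "citympg": 0.023633,
    "wheelbase": 0.020205, "peakrpm": 0.014138, "compressionratio": 0.007895,
    "horsepower": 0.005439, "carwidth": 0.005062, "carheight": 0.003937,
    "highwaympg": 0.002349, "carlength": 0.001952, "stroke": 0.001505,
    "symboling": 0.001471, "boreratio": 0.000165,
}

# Sort from most to least important and accumulate the top five.
ranking = sorted(importancias.items(), key=lambda par: par[1], reverse=True)
acumulado = 0.0
for nombre, valor in ranking[:5]:
    acumulado += valor
    print(f"{nombre:<12} {valor:.6f}  cumulative: {acumulado:.6f}")

top5 = [nombre for nombre, _ in ranking[:5]]
```

Importances of this kind sum to approximately one, so the cumulative column reads as the fraction of the tree's total error reduction attributed to those predictors.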
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([ 6488. , 18950. , 13499. , 8195. , 8558. , 5151. ,
## 16630. , 15750. , 32528. , 12945. , 8499. , 6989. ,
## 22625. , 7609. , 22625. , 8916.5 , 8058. , 13295. ,
## 6189. , 8195. , 17199. , 7198. , 16630. , 25552. ,
## 8921. , 16900. , 7150.5 , 17859.167, 34184. , 16925. ,
## 12440. , 12764. , 11395. , 7788. , 11395. , 34184. ,
## 35056. , 6488. , 34184. , 16900. , 5572. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 6488.000
## 198 -2 104.3 ... 18420.0 18950.000
## 102 0 100.4 ... 14399.0 13499.000
## 146 0 97.0 ... 7463.0 8195.000
## 79 1 93.0 ... 7689.0 8558.000
## 32 1 93.7 ... 5399.0 5151.000
## 107 0 107.9 ... 11900.0 16630.000
## 180 -1 104.5 ... 15690.0 15750.000
## 127 3 89.5 ... 34028.0 32528.000
## 149 0 96.9 ... 11694.0 12945.000
## 43 0 94.3 ... 6785.0 8499.000
## 40 0 96.5 ... 10295.0 6989.000
## 203 -1 109.1 ... 22470.0 22625.000
## 138 2 93.7 ... 5118.0 7609.000
## 201 -1 109.1 ... 19045.0 22625.000
## 20 0 94.5 ... 6575.0 8916.500
## 164 1 94.5 ... 8238.0 8058.000
## 65 0 104.9 ... 18280.0 13295.000
## 22 1 93.7 ... 6377.0 6189.000
## 186 2 97.3 ... 8495.0 8195.000
## 106 1 99.2 ... 18399.0 17199.000
## 156 0 95.7 ... 6938.0 7198.000
## 111 0 107.9 ... 15580.0 16630.000
## 68 -1 110.0 ... 28248.0 25552.000
## 123 -1 103.3 ... 8921.0 8921.000
## 108 0 107.9 ... 13200.0 16900.000
## 78 2 93.7 ... 6669.0 7150.500
## 8 1 105.8 ... 23875.0 17859.167
## 74 1 112.0 ... 45400.0 34184.000
## 10 2 101.2 ... 16430.0 16925.000
## 113 0 114.2 ... 16695.0 12440.000
## 82 3 95.9 ... 12629.0 12764.000
## 57 3 95.3 ... 13645.0 11395.000
## 158 0 95.7 ... 7898.0 7788.000
## 58 3 95.3 ... 15645.0 11395.000
## 17 0 110.0 ... 36880.0 34184.000
## 129 1 98.4 ... 31400.5 35056.000
## 150 1 95.7 ... 5348.0 6488.000
## 73 0 120.9 ... 40960.0 34184.000
## 116 0 107.9 ... 17950.0 16900.000
## 30 2 86.6 ... 6479.0 5572.000
##
## [41 rows x 16 columns]
rmse_ar = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_ar,
    squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 3083.213106381579
or
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 3083.213106381579
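Both calls compute the same quantity: the square root of the mean of the squared differences between actual and predicted prices. Newer scikit-learn releases have deprecated the `squared` argument of `mean_squared_error`, so the second form is likely the more portable one. As an illustration only, the sketch below reproduces the calculation in plain Python on the first three validation rows of the table above; the function name `rmse_manual` and the variable names are hypothetical and not part of the original case.

```python
import math

def rmse_manual(reales, predichos):
    """Root mean squared error: square root of the mean squared difference."""
    if len(reales) != len(predichos):
        raise ValueError("both sequences must have the same length")
    errores_cuadrados = [(r - p) ** 2 for r, p in zip(reales, predichos)]
    return math.sqrt(sum(errores_cuadrados) / len(errores_cuadrados))

# First three validation rows (indices 36, 198 and 102) from the table above.
precio_real = [7295.0, 18420.0, 14399.0]
precio_prediccion = [6488.0, 18950.0, 13499.0]

rmse_muestra = rmse_manual(precio_real, precio_prediccion)
print(f"RMSE over three rows: {rmse_muestra:.2f}")
```

Because the differences are squared before averaging, a few large errors (such as row 74, with an actual price of 45400 and a prediction of 34184) weigh more heavily on the RMSE than many small ones.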
The random forest model (rf) is built with seed 2022 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 2022)
modelo_rf.fit(X_entrena, Y_entrena)
## RandomForestRegressor(n_estimators=20, random_state=2022)
# pending ... ...
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([ 7221.3 , 17313.25 , 15198.55 , 8344.125 ,
## 8148.3 , 5997.65833333, 16042.3 , 16319.15 ,
## 32770.1 , 13142.31666667, 8251. , 9378.7 ,
## 17923.7 , 7193.4 , 19358. , 8454.35625 ,
## 8138.45 , 13321.575 , 5722.45 , 8594.4 ,
## 18117.65 , 7528.5 , 16760.675 , 28334.2 ,
## 9723.4 , 16598.2 , 7032.6 , 19354.52505 ,
## 35721.9 , 15169.1 , 15848.925 , 13572.8 ,
## 11598.2 , 7746.55 , 11859.95 , 36633.05 ,
## 36368.65 , 6461.85 , 35681.5 , 16598.2 ,
## 5827.19375 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 36 0 96.5 ... 7295.0 7221.300000
## 198 -2 104.3 ... 18420.0 17313.250000
## 102 0 100.4 ... 14399.0 15198.550000
## 146 0 97.0 ... 7463.0 8344.125000
## 79 1 93.0 ... 7689.0 8148.300000
## 32 1 93.7 ... 5399.0 5997.658333
## 107 0 107.9 ... 11900.0 16042.300000
## 180 -1 104.5 ... 15690.0 16319.150000
## 127 3 89.5 ... 34028.0 32770.100000
## 149 0 96.9 ... 11694.0 13142.316667
## 43 0 94.3 ... 6785.0 8251.000000
## 40 0 96.5 ... 10295.0 9378.700000
## 203 -1 109.1 ... 22470.0 17923.700000
## 138 2 93.7 ... 5118.0 7193.400000
## 201 -1 109.1 ... 19045.0 19358.000000
## 20 0 94.5 ... 6575.0 8454.356250
## 164 1 94.5 ... 8238.0 8138.450000
## 65 0 104.9 ... 18280.0 13321.575000
## 22 1 93.7 ... 6377.0 5722.450000
## 186 2 97.3 ... 8495.0 8594.400000
## 106 1 99.2 ... 18399.0 18117.650000
## 156 0 95.7 ... 6938.0 7528.500000
## 111 0 107.9 ... 15580.0 16760.675000
## 68 -1 110.0 ... 28248.0 28334.200000
## 123 -1 103.3 ... 8921.0 9723.400000
## 108 0 107.9 ... 13200.0 16598.200000
## 78 2 93.7 ... 6669.0 7032.600000
## 8 1 105.8 ... 23875.0 19354.525050
## 74 1 112.0 ... 45400.0 35721.900000
## 10 2 101.2 ... 16430.0 15169.100000
## 113 0 114.2 ... 16695.0 15848.925000
## 82 3 95.9 ... 12629.0 13572.800000
## 57 3 95.3 ... 13645.0 11598.200000
## 158 0 95.7 ... 7898.0 7746.550000
## 58 3 95.3 ... 15645.0 11859.950000
## 17 0 110.0 ... 36880.0 36633.050000
## 129 1 98.4 ... 31400.5 36368.650000
## 150 1 95.7 ... 5348.0 6461.850000
## 73 0 120.9 ... 40960.0 35681.500000
## 116 0 107.9 ... 17950.0 16598.200000
## 30 2 86.6 ... 6479.0 5827.193750
##
## [41 rows x 16 columns]
rmse_rf = mean_squared_error(
    y_true = Y_valida,
    y_pred = predicciones_rf,
    squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2646.5187973672428
or
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2646.5187973672428
The predictions of the three models are compared
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 36 0 96.5 ... 6488.000 7221.300000
## 198 -2 104.3 ... 18950.000 17313.250000
## 102 0 100.4 ... 13499.000 15198.550000
## 146 0 97.0 ... 8195.000 8344.125000
## 79 1 93.0 ... 8558.000 8148.300000
## 32 1 93.7 ... 5151.000 5997.658333
## 107 0 107.9 ... 16630.000 16042.300000
## 180 -1 104.5 ... 15750.000 16319.150000
## 127 3 89.5 ... 32528.000 32770.100000
## 149 0 96.9 ... 12945.000 13142.316667
## 43 0 94.3 ... 8499.000 8251.000000
## 40 0 96.5 ... 6989.000 9378.700000
## 203 -1 109.1 ... 22625.000 17923.700000
## 138 2 93.7 ... 7609.000 7193.400000
## 201 -1 109.1 ... 22625.000 19358.000000
## 20 0 94.5 ... 8916.500 8454.356250
## 164 1 94.5 ... 8058.000 8138.450000
## 65 0 104.9 ... 13295.000 13321.575000
## 22 1 93.7 ... 6189.000 5722.450000
## 186 2 97.3 ... 8195.000 8594.400000
## 106 1 99.2 ... 17199.000 18117.650000
## 156 0 95.7 ... 7198.000 7528.500000
## 111 0 107.9 ... 16630.000 16760.675000
## 68 -1 110.0 ... 25552.000 28334.200000
## 123 -1 103.3 ... 8921.000 9723.400000
## 108 0 107.9 ... 16900.000 16598.200000
## 78 2 93.7 ... 7150.500 7032.600000
## 8 1 105.8 ... 17859.167 19354.525050
## 74 1 112.0 ... 34184.000 35721.900000
## 10 2 101.2 ... 16925.000 15169.100000
## 113 0 114.2 ... 12440.000 15848.925000
## 82 3 95.9 ... 12764.000 13572.800000
## 57 3 95.3 ... 11395.000 11598.200000
## 158 0 95.7 ... 7788.000 7746.550000
## 58 3 95.3 ... 11395.000 11859.950000
## 17 0 110.0 ... 34184.000 36633.050000
## 129 1 98.4 ... 35056.000 36368.650000
## 150 1 95.7 ... 6488.000 6461.850000
## 73 0 120.9 ... 34184.000 35681.500000
## 116 0 107.9 ... 16900.000 16598.200000
## 30 2 86.6 ... 5572.000 5827.193750
##
## [41 rows x 18 columns]
The RMSE values are compared. A NumPy array is created
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3703.8923303 , 3083.21310638, 2646.51879737]])
A DataFrame is built from the NumPy array
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 3703.89233 3083.213106 2646.518797
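Since a lower RMSE means predictions closer to the actual prices, the comparison reduces to finding the minimum. The sketch below is an illustration under stated assumptions: the three values are copied by hand from the output above, and the names `rmse_por_modelo`, `mejor` and `mejora_relativa` are hypothetical. It picks the model with the lowest RMSE and expresses each model's reduction relative to multiple regression.

```python
# RMSE values copied from the output above.
rmse_por_modelo = {
    "rmse_rm": 3703.8923303,   # multiple regression
    "rmse_ar": 3083.21310638,  # regression tree
    "rmse_rf": 2646.51879737,  # random forest
}

# The model with the smallest RMSE is the preferred one.
mejor = min(rmse_por_modelo, key=rmse_por_modelo.get)

# Relative reduction of each RMSE with respect to multiple regression.
referencia = rmse_por_modelo["rmse_rm"]
mejora_relativa = {
    modelo: 1 - valor / referencia
    for modelo, valor in rmse_por_modelo.items()
}

print("Lowest RMSE:", mejor)
for modelo, mejora in mejora_relativa.items():
    print(f"{modelo}: {mejora:.1%} lower than rmse_rm")
```

With these values, the random forest reduces the RMSE by roughly 28.5% relative to multiple regression, and the regression tree by roughly 16.8%.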
It may be similar to the one in R ….. Pending …..
Numeric car price data were loaded, with the price modeled from a set of numeric variables.
Pending
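The best model according to the root mean squared error (RMSE) was the random forest, with an RMSE of 2646.52 on the validation data, compared with 3083.21 for the regression tree and 3703.89 for multiple regression. In other words, its predictions deviate from the actual prices by about 2646.52 price units in root-mean-square terms.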
Training and validation sets were built with 80% and 20% of the data, respectively.