| title: “Case 6. Comparison of regression models. Car price data. Python programming” |
| author: “Luis Alberto Jimenez Soto” |
| date: “2022-10-18” |
| output: |
| html_document: |
| code_folding: hide |
| toc: true |
| toc_float: true |
| toc_depth: 6 |
| number_sections: yes |
Compare supervised models by applying algorithms that predict car prices, using the root mean squared error (RMSE) as the comparison statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
The training set is created with 80% of the data.
The validation set is created with the remaining 20%.
The multiple linear regression model is built with the training data.
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in the vehicle's price do the independent variables explain?
Predictions are generated with the validation data.
The RMSE is computed for comparison purposes.
The regression tree model is built with the training data.
The importance of each variable for the price is identified.
The regression tree and its decision rules are visualized.
Predictions are made with the validation data.
The RMSE is computed for comparison purposes.
The random forest model is built with the training data, using 20 trees.
The importance of each variable for the price is identified.
Predictions are generated with the validation data.
The RMSE is computed for comparison purposes.
The case closes with a personal interpretation.
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Statistics, adjusted R-squared
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
## Unnamed: 0 symboling wheelbase ... citympg highwaympg price
## 0 1 3 88.6 ... 21 27 13495.0
## 1 2 3 88.6 ... 21 27 16500.0
## 2 3 1 94.5 ... 19 26 16500.0
## 3 4 2 99.8 ... 24 30 13950.0
## 4 5 2 99.4 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 109.1 ... 23 28 16845.0
## 201 202 -1 109.1 ... 19 25 19045.0
## 202 203 -1 109.1 ... 18 23 21485.0
## 203 204 -1 109.1 ... 26 27 22470.0
## 204 205 -1 109.1 ... 19 25 22625.0
##
## [205 rows x 16 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0 int64
## symboling int64
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginesize int64
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: +3 indicates that the car is risky, -3 that it is probably quite safe (categorical). |
| 2 | wheelbase | Wheelbase of the car: distance between axles, in inches (numeric). |
| 3 | carlength | Length of the car (numeric). |
| 4 | carwidth | Width of the car (numeric). |
| 5 | carheight | Height of the car (numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (numeric). |
| 7 | enginesize | Size of the engine, in … (numeric). |
| 8 | boreratio | Bore ratio of the engine (numeric). |
| 9 | stroke | Stroke, or volume inside the engine (numeric); relates to the pistons and the combustion cycle. |
| 10 | compressionratio | Compression ratio of the engine (numeric). |
| 11 | horsepower | Horsepower: power of the car (numeric). |
| 12 | peakrpm | Peak revolutions per minute (numeric). |
| 13 | citympg | Mileage in the city (numeric); fuel consumption. |
| 14 | highwaympg | Mileage on the highway (numeric); fuel consumption. |
| 15 | price (dependent variable) | Price of the car in dollars (numeric). |
~Source: https://archive.ics.uci.edu/ml/datasets/Automobile~
Keep only the necessary variables:
‘symboling’, ‘wheelbase’, ‘carlength’, ‘carwidth’, ‘carheight’, ‘curbweight’, ‘enginesize’, ‘boreratio’, ‘stroke’, ‘compressionratio’, ‘horsepower’, ‘peakrpm’, ‘citympg’, ‘highwaympg’, ‘price’
datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
## symboling wheelbase carlength ... citympg highwaympg price
## count 205.000000 205.000000 205.000000 ... 205.000000 205.000000 205.000000
## mean 0.834146 98.756585 174.049268 ... 25.219512 30.751220 13276.710571
## std 1.245307 6.021776 12.337289 ... 6.542142 6.886443 7988.852332
## min -2.000000 86.600000 141.100000 ... 13.000000 16.000000 5118.000000
## 25% 0.000000 94.500000 166.300000 ... 19.000000 25.000000 7788.000000
## 50% 1.000000 97.000000 173.200000 ... 24.000000 30.000000 10295.000000
## 75% 2.000000 102.400000 183.100000 ... 30.000000 34.000000 16503.000000
## max 3.000000 120.900000 208.100000 ... 49.000000 54.000000 45400.000000
##
## [8 rows x 15 columns]
datos
## symboling wheelbase carlength ... citympg highwaympg price
## 0 3 88.6 168.8 ... 21 27 13495.0
## 1 3 88.6 168.8 ... 21 27 16500.0
## 2 1 94.5 171.2 ... 19 26 16500.0
## 3 2 99.8 176.6 ... 24 30 13950.0
## 4 2 99.4 176.6 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 109.1 188.8 ... 23 28 16845.0
## 201 -1 109.1 188.8 ... 19 25 19045.0
## 202 -1 109.1 188.8 ... 18 23 21485.0
## 203 -1 109.1 188.8 ... 26 27 22470.0
## 204 -1 109.1 188.8 ... 19 25 22625.0
##
## [205 rows x 15 columns]
The training set takes 80% of the data and the validation set the remaining 20%. Seed: 1271.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80, random_state = 1271)
X_entrena
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 156 0 95.7 166.3 ... 4800 30 37
## 69 0 106.7 187.5 ... 4350 22 25
## 116 0 107.9 186.7 ... 4150 28 33
## 111 0 107.9 186.7 ... 5000 19 24
## 115 0 107.9 186.7 ... 5000 19 24
## .. ... ... ... ... ... ... ...
## 125 3 94.5 168.9 ... 5500 19 27
## 36 0 96.5 157.1 ... 6000 30 34
## 174 -1 102.4 175.6 ... 4500 30 33
## 27 1 93.7 157.3 ... 5500 24 30
## 56 3 95.3 169.0 ... 6000 17 23
##
## [164 rows x 14 columns]
X_valida
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 84 3 95.9 173.2 ... 5000 19 24
## 139 2 93.7 157.9 ... 4400 26 31
## 143 0 97.2 172.0 ... 5200 26 32
## 38 0 96.5 167.5 ... 5800 27 33
## 15 0 103.5 189.0 ... 5400 16 22
## 122 1 93.7 167.3 ... 5500 31 38
## 150 1 95.7 158.7 ... 4800 35 39
## 160 0 95.7 166.3 ... 4800 38 47
## 179 3 102.9 183.5 ... 5200 19 24
## 30 2 86.6 144.6 ... 4800 49 54
## 9 0 99.5 178.2 ... 5500 16 22
## 35 0 96.5 163.4 ... 6000 30 34
## 76 2 93.7 157.3 ... 5500 37 41
## 168 2 98.4 176.2 ... 4800 24 30
## 8 1 105.8 192.7 ... 5500 17 20
## 89 1 94.5 165.3 ... 5200 31 37
## 5 2 99.8 177.3 ... 5500 19 25
## 145 0 97.0 172.0 ... 4800 24 29
## 51 1 93.1 159.1 ... 5000 31 38
## 66 0 104.9 175.0 ... 4200 31 39
## 49 0 102.0 191.7 ... 5000 13 17
## 43 0 94.3 170.7 ... 4800 24 29
## 186 2 97.3 171.7 ... 5250 27 34
## 141 0 97.2 172.0 ... 4800 32 37
## 142 0 97.2 172.0 ... 4400 28 33
## 23 1 93.7 157.3 ... 5500 24 30
## 193 0 100.4 183.1 ... 5500 25 31
## 10 2 101.2 176.8 ... 5800 23 29
## 1 3 88.6 168.8 ... 5000 21 27
## 86 1 96.3 172.4 ... 5000 25 32
## 57 3 95.3 169.0 ... 6000 17 23
## 62 0 98.8 177.8 ... 4800 26 32
## 20 0 94.5 158.8 ... 5400 38 43
## 13 0 101.2 176.8 ... 4250 21 28
## 185 2 97.3 171.7 ... 5250 27 34
## 119 1 93.7 157.3 ... 5500 24 30
## 198 -2 104.3 188.8 ... 5100 17 22
## 165 1 94.5 168.7 ... 6600 26 29
## 196 -2 104.3 188.8 ... 5400 24 28
## 3 2 99.8 176.6 ... 5500 24 30
## 32 1 93.7 150.0 ... 5500 38 42
##
## [41 rows x 14 columns]
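The split above can be checked on a small stand-in table. The sketch below is only an illustration: the `demo` DataFrame and its values are synthetic (an assumption, not the car data), but it has the same 205 rows, so `train_size = 0.80` yields the same 164 and 41 rows reported above.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the car data (hypothetical values): 205 rows
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "enginesize": rng.integers(60, 330, size=205),
    "curbweight": rng.integers(1500, 4100, size=205),
})
demo["price"] = 100 * demo["enginesize"] + rng.normal(0, 1000, size=205)

# Same call pattern as in the case: 80% training, fixed seed
X_train, X_val, y_train, y_val = train_test_split(
    demo.drop(columns="price"), demo["price"],
    train_size=0.80, random_state=1271,
)
print(X_train.shape, X_val.shape)

# No row belongs to both subsets
print(set(X_train.index).isdisjoint(X_val.index))

# Repeating the call with the same seed reproduces the same partition
X_train_again, _, _, _ = train_test_split(
    demo.drop(columns="price"), demo["price"],
    train_size=0.80, random_state=1271,
)
print(X_train.index.equals(X_train_again.index))
```

Fixing `random_state` is what makes the comparison between models fair: every model is trained and validated on exactly the same rows.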
The multiple linear regression model (rm) is built.
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()
Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown:
modelo_rm.coef_
## array([ 2.11081298e+02, 1.62441842e+02, -7.45636564e+01, 3.90949747e+02,
## 9.97443007e+01, 6.79302644e-01, 1.38939896e+02, -1.37839030e+03,
## -3.87041646e+03, 3.28628928e+02, 3.32912969e+01, 2.53731651e+00,
## -2.07657002e+02, 1.08210062e+02])
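A bare array of coefficients is hard to read because it does not say which value belongs to which variable. As a minimal sketch, assuming a fitted `LinearRegression` and a DataFrame of predictors, the coefficients can be labeled with the column names, since `coef_` follows the column order used in `fit`. The data and the names `X_demo`, `y_demo` and `demo_model` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends on x1 and x2 only, with no noise
rng = np.random.default_rng(0)
X_demo = pd.DataFrame(rng.normal(size=(100, 3)), columns=["x1", "x2", "x3"])
y_demo = 10.0 + 3.0 * X_demo["x1"] - 2.0 * X_demo["x2"]

demo_model = LinearRegression().fit(X_demo, y_demo)

# coef_ holds beta_1..beta_n in the column order of X; intercept_ holds beta_0
coefficients = pd.Series(demo_model.coef_, index=X_demo.columns)
print("intercept:", round(demo_model.intercept_, 4))
print(coefficients.round(4))
```

Applied to the case, the same pattern would be `pd.Series(modelo_rm.coef_, index=X_entrena.columns)`.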
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8609377759913558
predicciones_rm = modelo_rm.predict(X_valida)
# Note: the [:-1] slice leaves out the last prediction, so 40 of the 41 values are printed
print(predicciones_rm[:-1])
## [15299.82250044 8572.83459328 11337.10303479 9970.03560356
## 27836.14524479 7294.84326523 5412.33585414 6021.46270628
## 22751.98267255 2262.18611057 16885.41402655 7950.15614101
## 6545.73015596 13768.61197328 18155.76432809 6282.43971037
## 15678.75014976 10814.12751094 5950.89996771 13797.49363348
## 50223.89310709 6592.59871662 9738.13946724 9249.63962107
## 8663.06939391 8424.0042451 10477.5299584 13567.64320057
## 13740.30019382 9806.25989378 7773.0707937 9821.45440669
## 5933.70180563 17035.70356939 9695.3434007 8424.0042451
## 15753.99075796 12157.06245924 16003.19579602 11674.06257723]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 84 3 95.9 ... 14489.000 15299.822500
## 139 2 93.7 ... 7053.000 8572.834593
## 143 0 97.2 ... 9960.000 11337.103035
## 38 0 96.5 ... 9095.000 9970.035604
## 15 0 103.5 ... 30760.000 27836.145245
## 122 1 93.7 ... 7609.000 7294.843265
## 150 1 95.7 ... 5348.000 5412.335854
## 160 0 95.7 ... 7738.000 6021.462706
## 179 3 102.9 ... 15998.000 22751.982673
## 30 2 86.6 ... 6479.000 2262.186111
## 9 0 99.5 ... 17859.167 16885.414027
## 35 0 96.5 ... 7295.000 7950.156141
## 76 2 93.7 ... 5389.000 6545.730156
## 168 2 98.4 ... 9639.000 13768.611973
## 8 1 105.8 ... 23875.000 18155.764328
## 89 1 94.5 ... 5499.000 6282.439710
## 5 2 99.8 ... 15250.000 15678.750150
## 145 0 97.0 ... 11259.000 10814.127511
## 51 1 93.1 ... 6095.000 5950.899968
## 66 0 104.9 ... 18344.000 13797.493633
## 49 0 102.0 ... 36000.000 50223.893107
## 43 0 94.3 ... 6785.000 6592.598717
## 186 2 97.3 ... 8495.000 9738.139467
## 141 0 97.2 ... 7126.000 9249.639621
## 142 0 97.2 ... 7775.000 8663.069394
## 23 1 93.7 ... 7957.000 8424.004245
## 193 0 100.4 ... 12290.000 10477.529958
## 10 2 101.2 ... 16430.000 13567.643201
## 1 3 88.6 ... 16500.000 13740.300194
## 86 1 96.3 ... 8189.000 9806.259894
## 57 3 95.3 ... 13645.000 7773.070794
## 62 0 98.8 ... 10245.000 9821.454407
## 20 0 94.5 ... 6575.000 5933.701806
## 13 0 101.2 ... 21105.000 17035.703569
## 185 2 97.3 ... 8195.000 9695.343401
## 119 1 93.7 ... 7957.000 8424.004245
## 198 -2 104.3 ... 18420.000 15753.990758
## 165 1 94.5 ... 9298.000 12157.062459
## 196 -2 104.3 ... 15985.000 16003.195796
## 3 2 99.8 ... 13950.000 11674.062577
## 32 1 93.7 ... 5399.000 5607.114218
##
## [41 rows x 16 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 3351.4574039517665
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3351.4574039517665
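Both calls compute the same quantity, \(\sqrt{\frac{1}{n}\sum_i (y_i - \hat{y}_i)^2}\). The sketch below reproduces it by hand on four pairs of real and predicted prices, rounded from the first rows of the comparison table above, so the result is an illustration and not the validation RMSE. Recent scikit-learn releases deprecate the `squared` argument, so the `np.sqrt` form is the one less tied to a particular version.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# First four validation rows, rounded for illustration
y_true = np.array([14489.0, 7053.0, 9960.0, 9095.0])
y_pred = np.array([15299.8, 8572.8, 11337.1, 9970.0])

# RMSE from its definition: square the errors, average them, take the root
rmse_by_hand = np.sqrt(np.mean((y_true - y_pred) ** 2))

# Same value through scikit-learn, without relying on the `squared` argument
rmse_sklearn = np.sqrt(mean_squared_error(y_true, y_pred))

print(rmse_by_hand, rmse_sklearn)
```

Because RMSE is expressed in the units of the target, it reads directly as a typical error in dollars.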
The regression tree model (ar) is built.
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 1271
)
Train the model:
modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1271)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 14
print(f"Number of terminal nodes: {modelo_ar.get_n_leaves()}")
## Number of terminal nodes: 153
# class_names applies only to classification trees, so it is not passed here
plot = plot_tree(
    decision_tree = modelo_ar,
    feature_names = list(datos.drop(columns = "price").columns),
    filled = True,
    impurity = False,
    fontsize = 10,
    precision = 2,
    ax = ax
)
plot
Decision rules of the tree:
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2659.50
## | | |--- curbweight <= 2291.50
## | | | |--- symboling <= 2.50
## | | | | |--- curbweight <= 2115.50
## | | | | | |--- wheelbase <= 94.10
## | | | | | | |--- stroke <= 3.09
## | | | | | | | |--- citympg <= 39.00
## | | | | | | | | |--- value: [5118.00]
## | | | | | | | |--- citympg > 39.00
## | | | | | | | | |--- value: [5151.00]
## | | | | | | |--- stroke > 3.09
## | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | |--- highwaympg <= 32.50
## | | | | | | | | | |--- value: [5195.00]
## | | | | | | | | |--- highwaympg > 32.50
## | | | | | | | | | |--- highwaympg <= 39.50
## | | | | | | | | | | |--- wheelbase <= 93.40
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- wheelbase > 93.40
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- highwaympg > 39.50
## | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | |--- boreratio <= 3.05
## | | | | | | | | | |--- curbweight <= 1978.00
## | | | | | | | | | | |--- carwidth <= 63.90
## | | | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | | | |--- carwidth > 63.90
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- curbweight > 1978.00
## | | | | | | | | | | |--- carheight <= 50.70
## | | | | | | | | | | | |--- value: [7150.50]
## | | | | | | | | | | |--- carheight > 50.70
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- boreratio > 3.05
## | | | | | | | | | |--- value: [7395.00]
## | | | | | |--- wheelbase > 94.10
## | | | | | | |--- carheight <= 54.00
## | | | | | | | |--- compressionratio <= 9.20
## | | | | | | | | |--- carheight <= 52.90
## | | | | | | | | | |--- value: [7198.00]
## | | | | | | | | |--- carheight > 52.90
## | | | | | | | | | |--- value: [6938.00]
## | | | | | | | |--- compressionratio > 9.20
## | | | | | | | | |--- symboling <= 0.50
## | | | | | | | | | |--- value: [8916.50]
## | | | | | | | | |--- symboling > 0.50
## | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | |--- curbweight <= 2026.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2026.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | |--- value: [8249.00]
## | | | | | | |--- carheight > 54.00
## | | | | | | | |--- carwidth <= 63.70
## | | | | | | | | |--- curbweight <= 2027.50
## | | | | | | | | | |--- value: [6488.00]
## | | | | | | | | |--- curbweight > 2027.50
## | | | | | | | | | |--- value: [6338.00]
## | | | | | | | |--- carwidth > 63.70
## | | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | | |--- curbweight <= 1928.00
## | | | | | | | | | | |--- value: [6649.00]
## | | | | | | | | | |--- curbweight > 1928.00
## | | | | | | | | | | |--- value: [6849.00]
## | | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | | |--- horsepower <= 62.00
## | | | | | | | | | | |--- value: [7099.00]
## | | | | | | | | | |--- horsepower > 62.00
## | | | | | | | | | | |--- wheelbase <= 95.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- wheelbase > 95.50
## | | | | | | | | | | | |--- value: [7295.00]
## | | | | |--- curbweight > 2115.50
## | | | | | |--- curbweight <= 2142.50
## | | | | | | |--- curbweight <= 2131.00
## | | | | | | | |--- value: [8358.00]
## | | | | | | |--- curbweight > 2131.00
## | | | | | | | |--- value: [9258.00]
## | | | | | |--- curbweight > 2142.50
## | | | | | | |--- curbweight <= 2277.50
## | | | | | | | |--- carheight <= 50.70
## | | | | | | | | |--- value: [8558.00]
## | | | | | | | |--- carheight > 50.70
## | | | | | | | | |--- carwidth <= 63.90
## | | | | | | | | | |--- stroke <= 3.02
## | | | | | | | | | | |--- value: [7603.00]
## | | | | | | | | | |--- stroke > 3.02
## | | | | | | | | | | |--- value: [7689.00]
## | | | | | | | | |--- carwidth > 63.90
## | | | | | | | | | |--- curbweight <= 2206.50
## | | | | | | | | | | |--- curbweight <= 2186.50
## | | | | | | | | | | | |--- value: [8058.00]
## | | | | | | | | | | |--- curbweight > 2186.50
## | | | | | | | | | | | |--- value: [8238.00]
## | | | | | | | | | |--- curbweight > 2206.50
## | | | | | | | | | | |--- citympg <= 37.50
## | | | | | | | | | | | |--- truncated branch of depth 4
## | | | | | | | | | | |--- citympg > 37.50
## | | | | | | | | | | | |--- value: [7788.00]
## | | | | | | |--- curbweight > 2277.50
## | | | | | | | |--- highwaympg <= 34.50
## | | | | | | | | |--- carlength <= 171.60
## | | | | | | | | | |--- value: [7898.00]
## | | | | | | | | |--- carlength > 171.60
## | | | | | | | | | |--- value: [7463.00]
## | | | | | | | |--- highwaympg > 34.50
## | | | | | | | | |--- value: [6918.00]
## | | | |--- symboling > 2.50
## | | | | |--- carlength <= 162.50
## | | | | | |--- value: [11595.00]
## | | | | |--- carlength > 162.50
## | | | | | |--- value: [9980.00]
## | | |--- curbweight > 2291.50
## | | | |--- highwaympg <= 29.50
## | | | | |--- horsepower <= 91.50
## | | | | | |--- highwaympg <= 27.00
## | | | | | | |--- value: [9233.00]
## | | | | | |--- highwaympg > 27.00
## | | | | | | |--- value: [8013.00]
## | | | | |--- horsepower > 91.50
## | | | | | |--- wheelbase <= 100.15
## | | | | | | |--- citympg <= 16.50
## | | | | | | | |--- value: [15645.00]
## | | | | | | |--- citympg > 16.50
## | | | | | | | |--- peakrpm <= 6300.00
## | | | | | | | | |--- carwidth <= 65.30
## | | | | | | | | | |--- curbweight <= 2506.50
## | | | | | | | | | | |--- value: [12945.00]
## | | | | | | | | | |--- curbweight > 2506.50
## | | | | | | | | | | |--- value: [13495.00]
## | | | | | | | | |--- carwidth > 65.30
## | | | | | | | | | |--- carlength <= 171.30
## | | | | | | | | | | |--- value: [11395.00]
## | | | | | | | | | |--- carlength > 171.30
## | | | | | | | | | | |--- compressionratio <= 8.51
## | | | | | | | | | | | |--- value: [11694.00]
## | | | | | | | | | | |--- compressionratio > 8.51
## | | | | | | | | | | | |--- value: [11850.00]
## | | | | | | | |--- peakrpm > 6300.00
## | | | | | | | | |--- value: [9538.00]
## | | | | | |--- wheelbase > 100.15
## | | | | | | |--- value: [16925.00]
## | | | |--- highwaympg > 29.50
## | | | | |--- carwidth <= 66.75
## | | | | | |--- compressionratio <= 8.55
## | | | | | | |--- boreratio <= 3.34
## | | | | | | | |--- symboling <= 2.00
## | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | |--- value: [9549.00]
## | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | |--- compressionratio <= 8.00
## | | | | | | | | | | |--- value: [9279.00]
## | | | | | | | | | |--- compressionratio > 8.00
## | | | | | | | | | | |--- carheight <= 57.25
## | | | | | | | | | | | |--- value: [8949.00]
## | | | | | | | | | | |--- carheight > 57.25
## | | | | | | | | | | | |--- value: [8921.00]
## | | | | | | | |--- symboling > 2.00
## | | | | | | | | |--- value: [9959.00]
## | | | | | | |--- boreratio > 3.34
## | | | | | | | |--- carlength <= 172.70
## | | | | | | | | |--- value: [6989.00]
## | | | | | | | |--- carlength > 172.70
## | | | | | | | | |--- curbweight <= 2431.50
## | | | | | | | | | |--- value: [8499.00]
## | | | | | | | | |--- curbweight > 2431.50
## | | | | | | | | | |--- value: [8921.00]
## | | | | | |--- compressionratio > 8.55
## | | | | | | |--- curbweight <= 2412.00
## | | | | | | | |--- curbweight <= 2397.50
## | | | | | | | | |--- curbweight <= 2302.00
## | | | | | | | | | |--- enginesize <= 109.50
## | | | | | | | | | | |--- value: [9995.00]
## | | | | | | | | | |--- enginesize > 109.50
## | | | | | | | | | | |--- value: [10345.00]
## | | | | | | | | |--- curbweight > 2302.00
## | | | | | | | | | |--- curbweight <= 2349.00
## | | | | | | | | | | |--- stroke <= 3.47
## | | | | | | | | | | | |--- value: [9495.00]
## | | | | | | | | | | |--- stroke > 3.47
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- curbweight > 2349.00
## | | | | | | | | | | |--- peakrpm <= 5300.00
## | | | | | | | | | | | |--- value: [9720.00]
## | | | | | | | | | | |--- peakrpm > 5300.00
## | | | | | | | | | | | |--- value: [10295.00]
## | | | | | | | |--- curbweight > 2397.50
## | | | | | | | | |--- value: [8495.00]
## | | | | | | |--- curbweight > 2412.00
## | | | | | | | |--- citympg <= 24.50
## | | | | | | | | |--- carwidth <= 66.55
## | | | | | | | | | |--- curbweight <= 2545.50
## | | | | | | | | | | |--- value: [8449.00]
## | | | | | | | | | |--- curbweight > 2545.50
## | | | | | | | | | | |--- curbweight <= 2565.00
## | | | | | | | | | | | |--- value: [9989.00]
## | | | | | | | | | | |--- curbweight > 2565.00
## | | | | | | | | | | | |--- value: [9295.00]
## | | | | | | | | |--- carwidth > 66.55
## | | | | | | | | | |--- value: [9895.00]
## | | | | | | | |--- citympg > 24.50
## | | | | | | | | |--- stroke <= 3.00
## | | | | | | | | | |--- value: [10198.00]
## | | | | | | | | |--- stroke > 3.00
## | | | | | | | | | |--- curbweight <= 2419.50
## | | | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | | | |--- value: [10898.00]
## | | | | | | | | | |--- curbweight > 2419.50
## | | | | | | | | | | |--- horsepower <= 78.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- horsepower > 78.50
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | |--- carwidth > 66.75
## | | | | | |--- value: [13845.00]
## | |--- curbweight > 2659.50
## | | |--- carwidth <= 68.60
## | | | |--- horsepower <= 118.00
## | | | | |--- horsepower <= 92.50
## | | | | | |--- enginesize <= 105.50
## | | | | | | |--- value: [8778.00]
## | | | | | |--- enginesize > 105.50
## | | | | | | |--- value: [11048.00]
## | | | | |--- horsepower > 92.50
## | | | | | |--- curbweight <= 2736.00
## | | | | | | |--- boreratio <= 3.37
## | | | | | | | |--- peakrpm <= 5375.00
## | | | | | | | | |--- value: [15040.00]
## | | | | | | | |--- peakrpm > 5375.00
## | | | | | | | | |--- value: [13295.00]
## | | | | | | |--- boreratio > 3.37
## | | | | | | | |--- carwidth <= 66.05
## | | | | | | | | |--- curbweight <= 2696.50
## | | | | | | | | | |--- value: [11199.00]
## | | | | | | | | |--- curbweight > 2696.50
## | | | | | | | | | |--- value: [11549.00]
## | | | | | | | |--- carwidth > 66.05
## | | | | | | | | |--- value: [12170.00]
## | | | | | |--- curbweight > 2736.00
## | | | | | | |--- carwidth <= 66.45
## | | | | | | | |--- boreratio <= 3.40
## | | | | | | | | |--- value: [17450.00]
## | | | | | | | |--- boreratio > 3.40
## | | | | | | | | |--- value: [17669.00]
## | | | | | | |--- carwidth > 66.45
## | | | | | | | |--- curbweight <= 3241.00
## | | | | | | | | |--- curbweight <= 3136.00
## | | | | | | | | | |--- curbweight <= 3038.00
## | | | | | | | | | | |--- carwidth <= 66.85
## | | | | | | | | | | | |--- value: [15510.00]
## | | | | | | | | | | |--- carwidth > 66.85
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- curbweight > 3038.00
## | | | | | | | | | | |--- horsepower <= 96.00
## | | | | | | | | | | | |--- value: [15580.00]
## | | | | | | | | | | |--- horsepower > 96.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | |--- curbweight > 3136.00
## | | | | | | | | | |--- horsepower <= 96.00
## | | | | | | | | | | |--- value: [13200.00]
## | | | | | | | | | |--- horsepower > 96.00
## | | | | | | | | | | |--- value: [12440.00]
## | | | | | | | |--- curbweight > 3241.00
## | | | | | | | | |--- carheight <= 57.70
## | | | | | | | | | |--- highwaympg <= 28.50
## | | | | | | | | | | |--- value: [16695.00]
## | | | | | | | | | |--- highwaympg > 28.50
## | | | | | | | | | | |--- value: [17425.00]
## | | | | | | | | |--- carheight > 57.70
## | | | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | | | |--- value: [13860.00]
## | | | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | | | |--- value: [17075.00]
## | | | |--- horsepower > 118.00
## | | | | |--- stroke <= 3.24
## | | | | | |--- enginesize <= 145.50
## | | | | | | |--- boreratio <= 3.77
## | | | | | | | |--- carlength <= 187.75
## | | | | | | | | |--- peakrpm <= 5550.00
## | | | | | | | | | |--- curbweight <= 2827.50
## | | | | | | | | | | |--- carwidth <= 66.30
## | | | | | | | | | | | |--- value: [18280.00]
## | | | | | | | | | | |--- carwidth > 66.30
## | | | | | | | | | | | |--- value: [18150.00]
## | | | | | | | | | |--- curbweight > 2827.50
## | | | | | | | | | | |--- value: [18620.00]
## | | | | | | | | |--- peakrpm > 5550.00
## | | | | | | | | | |--- value: [18150.00]
## | | | | | | | |--- carlength > 187.75
## | | | | | | | | |--- value: [18950.00]
## | | | | | | |--- boreratio > 3.77
## | | | | | | | |--- value: [16503.00]
## | | | | | |--- enginesize > 145.50
## | | | | | | |--- highwaympg <= 26.00
## | | | | | | | |--- value: [24565.00]
## | | | | | | |--- highwaympg > 26.00
## | | | | | | | |--- enginesize <= 157.50
## | | | | | | | | |--- value: [22018.00]
## | | | | | | | |--- enginesize > 157.50
## | | | | | | | | |--- value: [20970.00]
## | | | | |--- stroke > 3.24
## | | | | | |--- horsepower <= 153.00
## | | | | | | |--- curbweight <= 2877.00
## | | | | | | | |--- boreratio <= 3.59
## | | | | | | | | |--- boreratio <= 3.58
## | | | | | | | | | |--- value: [12629.00]
## | | | | | | | | |--- boreratio > 3.58
## | | | | | | | | | |--- value: [12764.00]
## | | | | | | | |--- boreratio > 3.59
## | | | | | | | | |--- value: [12964.00]
## | | | | | | |--- curbweight > 2877.00
## | | | | | | | |--- compressionratio <= 8.00
## | | | | | | | | |--- value: [14869.00]
## | | | | | | | |--- compressionratio > 8.00
## | | | | | | | | |--- curbweight <= 3195.50
## | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | |--- curbweight > 3195.50
## | | | | | | | | | |--- value: [14399.00]
## | | | | | |--- horsepower > 153.00
## | | | | | | |--- boreratio <= 3.35
## | | | | | | | |--- wheelbase <= 103.70
## | | | | | | | | |--- citympg <= 19.50
## | | | | | | | | | |--- value: [16500.00]
## | | | | | | | | |--- citympg > 19.50
## | | | | | | | | | |--- value: [16558.00]
## | | | | | | | |--- wheelbase > 103.70
## | | | | | | | | |--- enginesize <= 166.00
## | | | | | | | | | |--- value: [15750.00]
## | | | | | | | | |--- enginesize > 166.00
## | | | | | | | | | |--- value: [15690.00]
## | | | | | | |--- boreratio > 3.35
## | | | | | | | |--- horsepower <= 180.00
## | | | | | | | | |--- carlength <= 174.60
## | | | | | | | | | |--- value: [17199.00]
## | | | | | | | | |--- carlength > 174.60
## | | | | | | | | | |--- value: [18399.00]
## | | | | | | | |--- horsepower > 180.00
## | | | | | | | | |--- value: [19699.00]
## | | |--- carwidth > 68.60
## | | | |--- curbweight <= 2983.00
## | | | | |--- curbweight <= 2953.00
## | | | | | |--- citympg <= 21.00
## | | | | | | |--- value: [17710.00]
## | | | | | |--- citympg > 21.00
## | | | | | | |--- value: [16845.00]
## | | | | |--- curbweight > 2953.00
## | | | | | |--- value: [18920.00]
## | | | |--- curbweight > 2983.00
## | | | | |--- horsepower <= 147.00
## | | | | | |--- enginesize <= 159.00
## | | | | | | |--- peakrpm <= 5100.00
## | | | | | | | |--- value: [22470.00]
## | | | | | | |--- peakrpm > 5100.00
## | | | | | | | |--- value: [22625.00]
## | | | | | |--- enginesize > 159.00
## | | | | | | |--- value: [21485.00]
## | | | | |--- horsepower > 147.00
## | | | | | |--- value: [19045.00]
## |--- enginesize > 182.00
## | |--- compressionratio <= 8.05
## | | |--- symboling <= 0.50
## | | | |--- highwaympg <= 21.00
## | | | | |--- carlength <= 202.55
## | | | | | |--- value: [36880.00]
## | | | | |--- carlength > 202.55
## | | | | | |--- value: [40960.00]
## | | | |--- highwaympg > 21.00
## | | | | |--- value: [41315.00]
## | | |--- symboling > 0.50
## | | | |--- value: [45400.00]
## | |--- compressionratio > 8.05
## | | |--- compressionratio <= 9.75
## | | | |--- curbweight <= 2778.00
## | | | | |--- value: [33278.00]
## | | | |--- curbweight > 2778.00
## | | | | |--- enginesize <= 214.00
## | | | | | |--- value: [37028.00]
## | | | | |--- enginesize > 214.00
## | | | | | |--- carheight <= 51.80
## | | | | | | |--- value: [35056.00]
## | | | | | |--- carheight > 51.80
## | | | | | | |--- boreratio <= 3.55
## | | | | | | | |--- value: [34184.00]
## | | | | | | |--- boreratio > 3.55
## | | | | | | | |--- value: [33900.00]
## | | |--- compressionratio > 9.75
## | | | |--- carwidth <= 71.00
## | | | | |--- curbweight <= 3632.50
## | | | | | |--- carlength <= 189.20
## | | | | | | |--- value: [28176.00]
## | | | | | |--- carlength > 189.20
## | | | | | | |--- value: [25552.00]
## | | | | |--- curbweight > 3632.50
## | | | | | |--- value: [28248.00]
## | | | |--- carwidth > 71.00
## | | | | |--- carwidth <= 72.00
## | | | | | |--- value: [31600.00]
## | | | | |--- carwidth > 72.00
## | | | | | |--- value: [31400.50]
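With no depth limit the tree grows to 153 terminal nodes for 164 training rows, so the rules above are long and some branches are truncated in the text output. A minimal sketch of the alternative hinted at by the commented-out `max_depth = 3` argument follows; it runs on synthetic data with hypothetical names (`X_demo`, `full_tree`, `small_tree`) and only illustrates how the limit shortens the rule set.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical data with two predictors named after columns of the case
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "enginesize": rng.uniform(60, 330, size=200),
    "curbweight": rng.uniform(1500, 4100, size=200),
})
y_demo = 100 * X_demo["enginesize"] + 2 * X_demo["curbweight"] + rng.normal(0, 500, size=200)

# Unrestricted tree versus a tree limited to three levels
full_tree = DecisionTreeRegressor(random_state=1271).fit(X_demo, y_demo)
small_tree = DecisionTreeRegressor(max_depth=3, random_state=1271).fit(X_demo, y_demo)

print(full_tree.get_depth(), full_tree.get_n_leaves())
print(small_tree.get_depth(), small_tree.get_n_leaves())

# The limited tree has at most 2**3 = 8 leaves, so its rules fit on one screen
print(export_text(small_tree, feature_names=list(X_demo.columns)))
```

A shallower tree is easier to read, although its validation error may be higher or lower than that of the full tree; that has to be checked with the RMSE.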
importancia_predictores = pd.DataFrame(
{'predictor': datos.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importance of the predictors in the model")
## Importance of the predictors in the model
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.699023
## 5 curbweight 0.207796
## 9 compressionratio 0.030440
## 10 horsepower 0.018983
## 3 carwidth 0.015350
## 8 stroke 0.009370
## 13 highwaympg 0.004972
## 0 symboling 0.004642
## 1 wheelbase 0.002686
## 7 boreratio 0.002103
## 12 citympg 0.001633
## 2 carlength 0.001482
## 11 peakrpm 0.000781
## 4 carheight 0.000739
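According to the table above, the most important predictors for the regression tree model are enginesize, curbweight, compressionratio, horsepower and carwidth; enginesize and curbweight alone account for about 90% of the total importance.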
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([14869. , 8358. , 9495. , 7898. , 41315. , 7689. , 6488. ,
## 6938. , 16500. , 5572. , 16500. , 7295. , 5572. , 8449. ,
## 22625. , 6649. , 11694. , 11694. , 6795. , 11048. , 28248. ,
## 8013. , 7898. , 8058. , 8238. , 8358. , 13845. , 16925. ,
## 13495. , 6989. , 11395. , 8495. , 8916.5, 20970. , 7975. ,
## 8358. , 18950. , 7898. , 12940. , 9495. , 5118. ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 84 3 95.9 ... 14489.000 14869.0
## 139 2 93.7 ... 7053.000 8358.0
## 143 0 97.2 ... 9960.000 9495.0
## 38 0 96.5 ... 9095.000 7898.0
## 15 0 103.5 ... 30760.000 41315.0
## 122 1 93.7 ... 7609.000 7689.0
## 150 1 95.7 ... 5348.000 6488.0
## 160 0 95.7 ... 7738.000 6938.0
## 179 3 102.9 ... 15998.000 16500.0
## 30 2 86.6 ... 6479.000 5572.0
## 9 0 99.5 ... 17859.167 16500.0
## 35 0 96.5 ... 7295.000 7295.0
## 76 2 93.7 ... 5389.000 5572.0
## 168 2 98.4 ... 9639.000 8449.0
## 8 1 105.8 ... 23875.000 22625.0
## 89 1 94.5 ... 5499.000 6649.0
## 5 2 99.8 ... 15250.000 11694.0
## 145 0 97.0 ... 11259.000 11694.0
## 51 1 93.1 ... 6095.000 6795.0
## 66 0 104.9 ... 18344.000 11048.0
## 49 0 102.0 ... 36000.000 28248.0
## 43 0 94.3 ... 6785.000 8013.0
## 186 2 97.3 ... 8495.000 7898.0
## 141 0 97.2 ... 7126.000 8058.0
## 142 0 97.2 ... 7775.000 8238.0
## 23 1 93.7 ... 7957.000 8358.0
## 193 0 100.4 ... 12290.000 13845.0
## 10 2 101.2 ... 16430.000 16925.0
## 1 3 88.6 ... 16500.000 13495.0
## 86 1 96.3 ... 8189.000 6989.0
## 57 3 95.3 ... 13645.000 11395.0
## 62 0 98.8 ... 10245.000 8495.0
## 20 0 94.5 ... 6575.000 8916.5
## 13 0 101.2 ... 21105.000 20970.0
## 185 2 97.3 ... 8195.000 7975.0
## 119 1 93.7 ... 7957.000 8358.0
## 198 -2 104.3 ... 18420.000 18950.0
## 165 1 94.5 ... 9298.000 7898.0
## 196 -2 104.3 ... 15985.000 12940.0
## 3 2 99.8 ... 13950.000 9495.0
## 32 1 93.7 ... 5399.000 5118.0
##
## [41 rows x 16 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"The test error (rmse) is: {rmse_ar}")
## The test error (rmse) is: 2759.7797861241993
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2759.7797861241993
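As a sanity check on what both calls compute, RMSE is the square root of the mean of the squared differences between actual and predicted prices. The sketch below applies that definition, using only the standard library, to the first three validation rows shown earlier (actual prices 14489, 7053 and 9960 against tree predictions 14869, 8358 and 9495). The helper name `rmse_manual` is hypothetical, and the result covers only these three rows, not the full validation set. Note also that recent scikit-learn releases deprecate the `squared` argument in favor of a separate `root_mean_squared_error` function, so the `np.sqrt(...)` form is likely the more portable of the two.

```python
import math

def rmse_manual(y_true, y_pred):
    """Root mean squared error from its definition (hypothetical helper)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# First three rows of the comparison table (rows 84, 139 and 143)
precio_real = [14489.0, 7053.0, 9960.0]
precio_prediccion = [14869.0, 8358.0, 9495.0]

rmse_tres_filas = rmse_manual(precio_real, precio_prediccion)
print(f"RMSE over three rows: {rmse_tres_filas:.2f}")
```

Because the errors are squared before averaging, a single large miss (such as row 15, predicted 41315 against a real price of 30760) weighs much more on the RMSE than several small ones.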
The random forest model (rf) is built with seed 1271 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1271)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1271)
# pending ... ...
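The variable importance step for the random forest is left pending here. As a rough sketch of how it could be done, the block below reads `feature_importances_` from a fitted `RandomForestRegressor`, in the same way as for the regression tree. It runs on synthetic stand-in data, not on the car price dataset: `X_demo`, `y_demo`, `modelo_demo` and `importancia_demo` are hypothetical names, and the numbers it prints say nothing about the real model.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): NOT the car price dataset
rng = np.random.default_rng(1271)
n = 200
X_demo = pd.DataFrame({
    "enginesize": rng.uniform(60, 330, n),
    "curbweight": rng.uniform(1500, 4100, n),
    "carheight": rng.uniform(47, 60, n),
})
# The synthetic price depends strongly on enginesize, weakly on
# curbweight, and not at all on carheight
y_demo = (100 * X_demo["enginesize"]
          + 2 * X_demo["curbweight"]
          + rng.normal(0, 500, n))

# Same settings as the model in the text: 20 trees, seed 1271
modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1271)
modelo_demo.fit(X_demo, y_demo)

# Impurity-based importances, one value per column
importancia_demo = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_demo)
```

For the real model, the same pattern would presumably use `X_entrena.columns` and `modelo_rf.feature_importances_`.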
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([14033.5 , 7287.4 , 9564.1 , 8227.75 ,
## 40392.75 , 7222.7125 , 6304.3 , 7091.95 ,
## 16733.9 , 5485.5 , 17475.6 , 7284.275 ,
## 5708.15 , 8796.2 , 18386.85 , 6723.45 ,
## 11413.35 , 10336.75 , 6356.8 , 13603.8 ,
## 32523.325 , 9309.7 , 8189.25 , 7890.925 ,
## 8242.95 , 8533.8 , 12139.1 , 14795.9 ,
## 12091.55 , 8481.88333333, 12001.2 , 9569.46666667,
## 8429.12916667, 19361.2 , 7947.4 , 8533.8 ,
## 17821. , 9616.2 , 14617. , 10729.38333333,
## 5626.25 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 84 3 95.9 ... 14489.000 14033.500000
## 139 2 93.7 ... 7053.000 7287.400000
## 143 0 97.2 ... 9960.000 9564.100000
## 38 0 96.5 ... 9095.000 8227.750000
## 15 0 103.5 ... 30760.000 40392.750000
## 122 1 93.7 ... 7609.000 7222.712500
## 150 1 95.7 ... 5348.000 6304.300000
## 160 0 95.7 ... 7738.000 7091.950000
## 179 3 102.9 ... 15998.000 16733.900000
## 30 2 86.6 ... 6479.000 5485.500000
## 9 0 99.5 ... 17859.167 17475.600000
## 35 0 96.5 ... 7295.000 7284.275000
## 76 2 93.7 ... 5389.000 5708.150000
## 168 2 98.4 ... 9639.000 8796.200000
## 8 1 105.8 ... 23875.000 18386.850000
## 89 1 94.5 ... 5499.000 6723.450000
## 5 2 99.8 ... 15250.000 11413.350000
## 145 0 97.0 ... 11259.000 10336.750000
## 51 1 93.1 ... 6095.000 6356.800000
## 66 0 104.9 ... 18344.000 13603.800000
## 49 0 102.0 ... 36000.000 32523.325000
## 43 0 94.3 ... 6785.000 9309.700000
## 186 2 97.3 ... 8495.000 8189.250000
## 141 0 97.2 ... 7126.000 7890.925000
## 142 0 97.2 ... 7775.000 8242.950000
## 23 1 93.7 ... 7957.000 8533.800000
## 193 0 100.4 ... 12290.000 12139.100000
## 10 2 101.2 ... 16430.000 14795.900000
## 1 3 88.6 ... 16500.000 12091.550000
## 86 1 96.3 ... 8189.000 8481.883333
## 57 3 95.3 ... 13645.000 12001.200000
## 62 0 98.8 ... 10245.000 9569.466667
## 20 0 94.5 ... 6575.000 8429.129167
## 13 0 101.2 ... 21105.000 19361.200000
## 185 2 97.3 ... 8195.000 7947.400000
## 119 1 93.7 ... 7957.000 8533.800000
## 198 -2 104.3 ... 18420.000 17821.000000
## 165 1 94.5 ... 9298.000 9616.200000
## 196 -2 104.3 ... 15985.000 14617.000000
## 3 2 99.8 ... 13950.000 10729.383333
## 32 1 93.7 ... 5399.000 5626.250000
##
## [41 rows x 16 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"The test error (rmse) is: {rmse_rf}")
## The test error (rmse) is: 2380.5577291513127
or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2380.5577291513127
The predictions of the three models are compared side by side.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion_rm = predicciones_rm.flatten().tolist(), Precio_Prediccion_ar = predicciones_ar.flatten().tolist(), Precio_Prediccion_rf = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 84 3 95.9 ... 14869.0 14033.500000
## 139 2 93.7 ... 8358.0 7287.400000
## 143 0 97.2 ... 9495.0 9564.100000
## 38 0 96.5 ... 7898.0 8227.750000
## 15 0 103.5 ... 41315.0 40392.750000
## 122 1 93.7 ... 7689.0 7222.712500
## 150 1 95.7 ... 6488.0 6304.300000
## 160 0 95.7 ... 6938.0 7091.950000
## 179 3 102.9 ... 16500.0 16733.900000
## 30 2 86.6 ... 5572.0 5485.500000
## 9 0 99.5 ... 16500.0 17475.600000
## 35 0 96.5 ... 7295.0 7284.275000
## 76 2 93.7 ... 5572.0 5708.150000
## 168 2 98.4 ... 8449.0 8796.200000
## 8 1 105.8 ... 22625.0 18386.850000
## 89 1 94.5 ... 6649.0 6723.450000
## 5 2 99.8 ... 11694.0 11413.350000
## 145 0 97.0 ... 11694.0 10336.750000
## 51 1 93.1 ... 6795.0 6356.800000
## 66 0 104.9 ... 11048.0 13603.800000
## 49 0 102.0 ... 28248.0 32523.325000
## 43 0 94.3 ... 8013.0 9309.700000
## 186 2 97.3 ... 7898.0 8189.250000
## 141 0 97.2 ... 8058.0 7890.925000
## 142 0 97.2 ... 8238.0 8242.950000
## 23 1 93.7 ... 8358.0 8533.800000
## 193 0 100.4 ... 13845.0 12139.100000
## 10 2 101.2 ... 16925.0 14795.900000
## 1 3 88.6 ... 13495.0 12091.550000
## 86 1 96.3 ... 6989.0 8481.883333
## 57 3 95.3 ... 11395.0 12001.200000
## 62 0 98.8 ... 8495.0 9569.466667
## 20 0 94.5 ... 8916.5 8429.129167
## 13 0 101.2 ... 20970.0 19361.200000
## 185 2 97.3 ... 7975.0 7947.400000
## 119 1 93.7 ... 8358.0 8533.800000
## 198 -2 104.3 ... 18950.0 17821.000000
## 165 1 94.5 ... 7898.0 9616.200000
## 196 -2 104.3 ... 12940.0 14617.000000
## 3 2 99.8 ... 9495.0 10729.383333
## 32 1 93.7 ... 5118.0 5626.250000
##
## [41 rows x 18 columns]
The RMSE values are compared.
A NumPy array is created:
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3351.45740395, 2759.77978612, 2380.55772915]])
A DataFrame is built from the NumPy array:
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 3351.457404 2759.779786 2380.557729
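To make the comparison explicit, the three RMSE values can be sorted from lowest to highest and the gap between the best and the worst model expressed as a fraction. The sketch below copies the values from the output above and uses only the standard library. The names `rmse_por_modelo`, `ranking` and `reduccion` are hypothetical and do not appear in the original code.

```python
# RMSE values copied from the table above
rmse_por_modelo = {
    "rmse_rm": 3351.457404,  # multiple linear regression
    "rmse_ar": 2759.779786,  # regression tree
    "rmse_rf": 2380.557729,  # random forest
}

# Sort from lowest (best) to highest (worst) RMSE
ranking = sorted(rmse_por_modelo.items(), key=lambda par: par[1])

mejor_nombre, mejor_rmse = ranking[0]
peor_nombre, peor_rmse = ranking[-1]

# Relative RMSE reduction of the best model with respect to the worst
reduccion = (peor_rmse - mejor_rmse) / peor_rmse

for nombre, valor in ranking:
    print(f"{nombre}: {valor:.2f}")
print(f"Relative reduction of {mejor_nombre} vs {peor_nombre}: {reduccion:.1%}")
```

This ranking holds only for this particular 80/20 split and seed. A different partition could change the gaps between the models.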
Overall, the results were better than those obtained in R with the same seed (1271). Numeric car price data were loaded, together with several numeric predictor variables.
The regression tree model highlights the following variables by importance (these are relative importances, not tests of statistical significance):
- enginesize -> 69.9023 %
- curbweight -> 20.7796 %
- compressionratio -> 3.0440 %
- horsepower -> 1.8983 %
- carwidth -> 1.5350 %
In order, the most important variables of the regression tree model were enginesize, curbweight, compressionratio and horsepower.
The random forest model considers enginesize, curbweight and compressionratio to be important variables.
The variable enginesize stands out as important and significant in all models, and the variables enginesize, curbweight and horsepower stand out as important in the regression tree and random forest models.
According to the root mean squared error (RMSE), the best model was the random forest, for this particular set of training and validation data and with an 80% training and 20% validation split. The models ranked as follows, from best to worst: