Compare supervised learning models by applying several algorithms to predict automobile prices and evaluating each one with the root mean squared error (RMSE) statistic.
The previously prepared data are loaded from https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv
Training data are created from 80% of the observations
Validation data are created from the remaining 20%
The multiple linear regression model is built with the training data
This model answers questions such as:
Which variables are significant predictors at a confidence level above 90%?
What is the adjusted R-squared, that is, how much of the variation in vehicle price do the independent variables explain?
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The regression tree model is built with the training data
The importance of each variable for the price is identified
The regression tree and its decision rules are visualized
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
The random forest model is built with the training data, using 20 trees
The importance of each variable for the price is identified
Predictions are generated with the validation data
The RMSE statistic is computed for comparison purposes
At the end of the case, a personal interpretation is given
# Data handling
import numpy as np
import pandas as pd
# Plots
import matplotlib.pyplot as plt
# Preprocessing and modeling
from sklearn.model_selection import train_test_split
# Statistics and multiple linear regression
import statsmodels.api as sm # Adjusted R-squared statistics
import seaborn as sns # Plots
from sklearn import linear_model
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures # Polynomial features
# Regression tree
from sklearn.tree import DecisionTreeRegressor
from sklearn.tree import plot_tree
from sklearn.tree import export_graphviz
from sklearn.tree import export_text
from sklearn.model_selection import GridSearchCV
# Random forest
from sklearn.ensemble import RandomForestRegressor
# Metrics
from sklearn import metrics
from sklearn.metrics import mean_squared_error, r2_score
datos = pd.read_csv("https://raw.githubusercontent.com/rpizarrog/Analisis-Inteligente-de-datos/main/datos/CarPrice_Assignment_Numericas_Preparado.csv")
datos
## Unnamed: 0 symboling wheelbase ... citympg highwaympg price
## 0 1 3 88.6 ... 21 27 13495.0
## 1 2 3 88.6 ... 21 27 16500.0
## 2 3 1 94.5 ... 19 26 16500.0
## 3 4 2 99.8 ... 24 30 13950.0
## 4 5 2 99.4 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 201 -1 109.1 ... 23 28 16845.0
## 201 202 -1 109.1 ... 19 25 19045.0
## 202 203 -1 109.1 ... 18 23 21485.0
## 203 204 -1 109.1 ... 26 27 22470.0
## 204 205 -1 109.1 ... 19 25 22625.0
##
## [205 rows x 16 columns]
print("Observations and variables: ", datos.shape)
## Observations and variables: (205, 16)
print("Columns and data types")
# datos.columns
## Columns and data types
datos.dtypes
## Unnamed: 0 int64
## symboling int64
## wheelbase float64
## carlength float64
## carwidth float64
## carheight float64
## curbweight int64
## enginesize int64
## boreratio float64
## stroke float64
## compressionratio float64
## horsepower int64
## peakrpm int64
## citympg int64
## highwaympg int64
## price float64
## dtype: object
| Col | Name | Description |
|---|---|---|
| 1 | symboling | Assigned insurance risk rating: a value of +3 indicates that the car is risky, -3 that it is probably quite safe (Categorical). |
| 2 | wheelbase | Wheelbase of the car: distance between the axles, in inches (Numeric). |
| 3 | carlength | Length of the car (Numeric). |
| 4 | carwidth | Width of the car (Numeric). |
| 5 | carheight | Height of the car (Numeric). |
| 6 | curbweight | Weight of the car without occupants or baggage (Numeric). |
| 7 | enginesize | Size of the engine, in … (Numeric). |
| 8 | boreratio | Bore ratio of the car; a measure related to engine efficiency (Numeric). |
| 9 | stroke | Stroke or volume inside the engine: pistons, strokes, combustion (Numeric). |
| 10 | compressionratio | Compression ratio of the car: a measure of pressure in the engine (Numeric). |
| 11 | horsepower | Horsepower: power of the car (Numeric). |
| 12 | peakrpm | Peak revolutions per minute of the car (Numeric). |
| 13 | citympg | Mileage in the city: fuel consumption (Numeric). |
| 14 | highwaympg | Mileage on the highway: fuel consumption (Numeric). |
| 15 | price (dependent variable) | Price of the car, in dollars (Numeric). |
Keep only the required variables:
‘symboling’, ‘wheelbase’, ‘carlength’, ‘carwidth’, ‘carheight’, ‘curbweight’, ‘enginesize’, ‘boreratio’, ‘stroke’, ‘compressionratio’, ‘horsepower’, ‘peakrpm’, ‘citympg’, ‘highwaympg’, ‘price’
datos = datos[['symboling', 'wheelbase', 'carlength', 'carwidth', 'carheight', 'curbweight', 'enginesize', 'boreratio', 'stroke', 'compressionratio', 'horsepower', 'peakrpm', 'citympg', 'highwaympg', 'price']]
datos.describe()
## symboling wheelbase carlength ... citympg highwaympg price
## count 205.000000 205.000000 205.000000 ... 205.000000 205.000000 205.000000
## mean 0.834146 98.756585 174.049268 ... 25.219512 30.751220 13276.710571
## std 1.245307 6.021776 12.337289 ... 6.542142 6.886443 7988.852332
## min -2.000000 86.600000 141.100000 ... 13.000000 16.000000 5118.000000
## 25% 0.000000 94.500000 166.300000 ... 19.000000 25.000000 7788.000000
## 50% 1.000000 97.000000 173.200000 ... 24.000000 30.000000 10295.000000
## 75% 2.000000 102.400000 183.100000 ... 30.000000 34.000000 16503.000000
## max 3.000000 120.900000 208.100000 ... 49.000000 54.000000 45400.000000
##
## [8 rows x 15 columns]
datos
## symboling wheelbase carlength ... citympg highwaympg price
## 0 3 88.6 168.8 ... 21 27 13495.0
## 1 3 88.6 168.8 ... 21 27 16500.0
## 2 1 94.5 171.2 ... 19 26 16500.0
## 3 2 99.8 176.6 ... 24 30 13950.0
## 4 2 99.4 176.6 ... 18 22 17450.0
## .. ... ... ... ... ... ... ...
## 200 -1 109.1 188.8 ... 23 28 16845.0
## 201 -1 109.1 188.8 ... 19 25 19045.0
## 202 -1 109.1 188.8 ... 18 23 21485.0
## 203 -1 109.1 188.8 ... 26 27 22470.0
## 204 -1 109.1 188.8 ... 19 25 22625.0
##
## [205 rows x 15 columns]
The training set takes 80% of the data and the validation set the remaining 20%. The random seed is 1280.
X_entrena, X_valida, Y_entrena, Y_valida = train_test_split(datos.drop(columns = "price"), datos['price'],train_size = 0.80, random_state = 1280)
X_entrena
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 68 -1 110.0 190.9 ... 4350 22 25
## 135 2 99.1 186.6 ... 5250 21 28
## 34 1 93.7 150.0 ... 6000 30 34
## 186 2 97.3 171.7 ... 5250 27 34
## 30 2 86.6 144.6 ... 4800 49 54
## .. ... ... ... ... ... ... ...
## 173 -1 102.4 175.6 ... 4200 29 34
## 49 0 102.0 191.7 ... 5000 13 17
## 178 3 102.9 183.5 ... 5200 20 24
## 3 2 99.8 176.6 ... 5500 24 30
## 189 3 94.5 159.3 ... 5500 24 29
##
## [164 rows x 14 columns]
X_valida
## symboling wheelbase carlength ... peakrpm citympg highwaympg
## 128 3 89.5 168.9 ... 5900 17 25
## 55 3 95.3 169.0 ... 6000 17 23
## 14 1 103.5 189.0 ... 4250 20 25
## 42 1 96.5 169.1 ... 5500 25 31
## 88 -1 96.3 172.4 ... 5500 23 30
## 15 0 103.5 189.0 ... 5400 16 22
## 107 0 107.9 186.7 ... 5000 19 24
## 194 -2 104.3 188.8 ... 5400 23 28
## 64 0 98.8 177.8 ... 4800 26 32
## 199 -1 104.3 188.8 ... 5100 17 22
## 7 1 105.8 192.7 ... 5500 19 25
## 133 2 99.1 186.6 ... 5250 21 28
## 136 3 99.1 186.6 ... 5500 19 26
## 59 1 98.8 177.8 ... 4800 26 32
## 62 0 98.8 177.8 ... 4800 26 32
## 17 0 110.0 197.0 ... 5400 15 20
## 166 1 94.5 168.7 ... 6600 26 29
## 12 0 101.2 176.8 ... 4250 21 28
## 138 2 93.7 156.9 ... 4900 31 36
## 6 1 105.8 192.7 ... 5500 19 25
## 119 1 93.7 157.3 ... 5500 24 30
## 74 1 112.0 199.2 ... 4500 14 16
## 187 2 97.3 171.7 ... 4500 37 42
## 160 0 95.7 166.3 ... 4800 38 47
## 182 2 97.3 171.7 ... 4800 37 46
## 171 2 98.4 176.2 ... 4800 24 30
## 50 1 93.1 159.1 ... 5000 30 31
## 177 -1 102.4 175.6 ... 4200 27 32
## 191 0 100.4 180.2 ... 5500 19 24
## 53 1 93.1 166.8 ... 5000 31 38
## 111 0 107.9 186.7 ... 5000 19 24
## 197 -1 104.3 188.8 ... 5400 24 28
## 156 0 95.7 166.3 ... 4800 30 37
## 132 3 99.1 186.6 ... 5250 21 28
## 5 2 99.8 177.3 ... 5500 19 25
## 54 1 93.1 166.8 ... 5000 31 38
## 179 3 102.9 183.5 ... 5200 19 24
## 184 2 97.3 171.7 ... 4800 37 46
## 36 0 96.5 157.1 ... 6000 30 34
## 4 2 99.4 176.6 ... 5500 18 22
## 109 0 114.2 198.9 ... 5000 19 24
##
## [41 rows x 14 columns]
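The row counts printed above (164 for training, 41 for validation) follow from an 80/20 split of 205 observations. A minimal NumPy sketch of a seeded shuffle-and-slice split is given here. It is a simplified stand-in for `train_test_split`, not its actual implementation; the function name `simple_split` is hypothetical, and it will not reproduce the same row indices as the scikit-learn call.

```python
import numpy as np

def simple_split(n_rows, train_fraction=0.80, seed=1280):
    """Return (train_idx, valid_idx) for a shuffled split of range(n_rows)."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)                 # random order of row positions
    n_train = int(np.floor(train_fraction * n_rows))   # 80% of the rows, rounded down
    return shuffled[:n_train], shuffled[n_train:]

train_idx, valid_idx = simple_split(205)
print(len(train_idx), len(valid_idx))
```

Fixing the seed makes the partition repeatable, which is what allows the three models to be compared on the same validation rows.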
The multiple linear regression model (rm) is built
modelo_rm = LinearRegression()
modelo_rm.fit(X_entrena,Y_entrena)
LinearRegression()
Only the coefficients \(\beta_1, \beta_2, \ldots, \beta_n\) are shown:
modelo_rm.coef_
## array([ 4.02793231e+02, 2.03014113e+02, -9.74700439e+01, 5.09515292e+02,
## 9.68098709e+01, 4.02781858e+00, 9.55256449e+01, -6.38186583e+02,
## -3.18635391e+03, 2.77323992e+02, 2.04070616e+01, 2.90014494e+00,
## -2.67368697e+02, 2.28507848e+02])
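`coef_` holds only the slope terms; scikit-learn stores the intercept separately in `intercept_`, and a prediction is the intercept plus the dot product of the predictors with the coefficients. The sketch below illustrates this with ordinary least squares in NumPy on a small synthetic data set. The data and the names `X_toy` and `y_toy` are made up for illustration and are not the car data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 2))                      # two synthetic predictors
y_toy = 2.0 + 3.0 * X_toy[:, 0] - 1.0 * X_toy[:, 1]   # exact linear relation

# Prepend a column of ones so the first fitted value is the intercept
A = np.column_stack([np.ones(len(X_toy)), X_toy])
beta, *_ = np.linalg.lstsq(A, y_toy, rcond=None)
intercept, coefs = beta[0], beta[1:]

# A prediction is intercept + x . coefs
x_new = np.array([1.0, 2.0])
y_new = intercept + x_new @ coefs
print(intercept, coefs, y_new)
```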
print(modelo_rm.score(X_entrena, Y_entrena))
## 0.8519801797047908
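The value above is the ordinary R-squared on the training data. The adjusted R-squared mentioned among the questions can be derived from it with the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n = 164 training rows and p = 14 predictors, both taken from the outputs above. A small sketch (the helper name `adjusted_r2` is hypothetical):

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R-squared from R-squared, sample size and number of predictors."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

r2_train = 0.8519801797047908   # printed by modelo_rm.score(X_entrena, Y_entrena)
r2_adj = adjusted_r2(r2_train, n_obs=164, n_predictors=14)
print(round(r2_adj, 4))
```

This gives roughly 0.838. Significance of individual predictors requires p-values, which `LinearRegression` does not report; the `statsmodels` package imported earlier provides them through `sm.OLS(...).fit().summary()`.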
predicciones_rm = modelo_rm.predict(X_valida)
# Note: the [:-1] slice leaves out the last of the 41 predictions
print(predicciones_rm[:-1])
## [25334.93889816 9901.46967623 16808.08912172 10303.49217485
## 9889.71141769 25260.33422037 15501.79756236 16251.70951709
## 9806.12905593 16606.51147682 19062.5317603 14241.4476332
## 16839.29817633 9873.55177656 9745.71177722 28814.31580314
## 12283.80130552 14950.53947059 8768.32689048 18619.4717164
## 8279.68437465 36599.42082388 9948.09759854 6679.00941098
## 11172.04600752 13773.31129715 4511.76981691 8083.27802033
## 15022.27444079 5314.96673833 18868.86737442 17036.60329956
## 6480.51886485 14497.98481704 15097.6104327 5303.19650209
## 21798.15169376 11184.12946327 8929.30313789 16073.7941481 ]
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rm.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 25334.938898
## 55 3 95.3 ... 10945.0 9901.469676
## 14 1 103.5 ... 24565.0 16808.089122
## 42 1 96.5 ... 10345.0 10303.492175
## 88 -1 96.3 ... 9279.0 9889.711418
## 15 0 103.5 ... 30760.0 25260.334220
## 107 0 107.9 ... 11900.0 15501.797562
## 194 -2 104.3 ... 12940.0 16251.709517
## 64 0 98.8 ... 11245.0 9806.129056
## 199 -1 104.3 ... 18950.0 16606.511477
## 7 1 105.8 ... 18920.0 19062.531760
## 133 2 99.1 ... 12170.0 14241.447633
## 136 3 99.1 ... 18150.0 16839.298176
## 59 1 98.8 ... 8845.0 9873.551777
## 62 0 98.8 ... 10245.0 9745.711777
## 17 0 110.0 ... 36880.0 28814.315803
## 166 1 94.5 ... 9538.0 12283.801306
## 12 0 101.2 ... 20970.0 14950.539471
## 138 2 93.7 ... 5118.0 8768.326890
## 6 1 105.8 ... 17710.0 18619.471716
## 119 1 93.7 ... 7957.0 8279.684375
## 74 1 112.0 ... 45400.0 36599.420824
## 187 2 97.3 ... 9495.0 9948.097599
## 160 0 95.7 ... 7738.0 6679.009411
## 182 2 97.3 ... 7775.0 11172.046008
## 171 2 98.4 ... 11549.0 13773.311297
## 50 1 93.1 ... 5195.0 4511.769817
## 177 -1 102.4 ... 11248.0 8083.278020
## 191 0 100.4 ... 13295.0 15022.274441
## 53 1 93.1 ... 6695.0 5314.966738
## 111 0 107.9 ... 15580.0 18868.867374
## 197 -1 104.3 ... 16515.0 17036.603300
## 156 0 95.7 ... 6938.0 6480.518865
## 132 3 99.1 ... 11850.0 14497.984817
## 5 2 99.8 ... 15250.0 15097.610433
## 54 1 93.1 ... 7395.0 5303.196502
## 179 3 102.9 ... 15998.0 21798.151694
## 184 2 97.3 ... 7995.0 11184.129463
## 36 0 96.5 ... 7295.0 8929.303138
## 4 2 99.4 ... 17450.0 16073.794148
## 109 0 114.2 ... 12440.0 16631.113581
##
## [41 rows x 16 columns]
rmse_rm = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rm,
squared = False
)
print(f"The test error (RMSE) is: {rmse_rm}")
## The test error (RMSE) is: 3792.859350862403
Or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rm)))
## Root Mean Squared Error RMSE: 3792.859350862403
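The `squared = False` argument of `mean_squared_error` was deprecated in later scikit-learn releases, and version 1.4 introduced `root_mean_squared_error` as its replacement, so the first form may fail on a newer installation. A hedged sketch of a version-tolerant helper follows. The NumPy fallback is an assumption added for illustration, and the sample values are made up.

```python
import numpy as np

try:
    # Available in scikit-learn 1.4 and later
    from sklearn.metrics import root_mean_squared_error
except ImportError:
    def root_mean_squared_error(y_true, y_pred):
        """NumPy fallback: square root of the mean squared difference."""
        y_true = np.asarray(y_true, dtype=float)
        y_pred = np.asarray(y_pred, dtype=float)
        return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up values, only to exercise the helper
real_prices = [10000.0, 15000.0, 20000.0]
predicted_prices = [11000.0, 14000.0, 22000.0]
rmse_demo = root_mean_squared_error(real_prices, predicted_prices)
print(rmse_demo)
```

The NumPy form used in the second `print` above does not depend on the `squared` argument and works the same way on any version.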
The regression tree model (ar) is built
modelo_ar = DecisionTreeRegressor(
#max_depth = 3,
random_state = 1280
)
Train the model
modelo_ar.fit(X_entrena, Y_entrena)
DecisionTreeRegressor(random_state=1280)
fig, ax = plt.subplots(figsize=(12, 5))
print(f"Tree depth: {modelo_ar.get_depth()}")
## Tree depth: 15
print(f"Number of leaf nodes: {modelo_ar.get_n_leaves()}")
## Number of leaf nodes: 154
plot = plot_tree(
decision_tree = modelo_ar,
feature_names = list(X_entrena.columns),  # predictor names as a plain list
# class_names is omitted: it only applies to classification trees
filled = True,
impurity = False,
fontsize = 10,
precision = 2,
ax = ax
)
plot
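A depth of 15 with 154 leaves for 164 training rows means the unrestricted tree has almost one leaf per observation, so the plotted figure is hard to read. The commented-out `max_depth` argument is one way to keep the tree small. The sketch below contrasts both settings on synthetic data. The data are random and only illustrative, and the depth value 3 is taken from the commented line.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1280)
X_demo = rng.uniform(size=(164, 3))   # synthetic predictors, same row count as the training set
y_demo = 10000 * X_demo[:, 0] + rng.normal(scale=500, size=164)

full_tree = DecisionTreeRegressor(random_state=1280).fit(X_demo, y_demo)
small_tree = DecisionTreeRegressor(max_depth=3, random_state=1280).fit(X_demo, y_demo)

# A binary tree of depth 3 has at most 2**3 = 8 leaves
print(full_tree.get_depth(), full_tree.get_n_leaves())
print(small_tree.get_depth(), small_tree.get_n_leaves())
```

A shallower tree is easier to plot and interpret, although whether it predicts better on the validation data has to be checked with the RMSE.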
Decision rules of the tree
texto_modelo = export_text(
decision_tree = modelo_ar,
feature_names = list(datos.drop(columns = "price").columns)
)
print(texto_modelo)
## |--- enginesize <= 182.00
## | |--- curbweight <= 2544.00
## | | |--- horsepower <= 89.00
## | | | |--- curbweight <= 2121.00
## | | | | |--- horsepower <= 68.50
## | | | | | |--- curbweight <= 1987.00
## | | | | | | |--- citympg <= 33.00
## | | | | | | | |--- curbweight <= 1924.50
## | | | | | | | | |--- curbweight <= 1902.50
## | | | | | | | | | |--- carlength <= 158.20
## | | | | | | | | | | |--- value: [6377.00]
## | | | | | | | | | |--- carlength > 158.20
## | | | | | | | | | | |--- value: [6095.00]
## | | | | | | | | |--- curbweight > 1902.50
## | | | | | | | | | |--- value: [6795.00]
## | | | | | | | |--- curbweight > 1924.50
## | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | |--- value: [6229.00]
## | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | |--- value: [6189.00]
## | | | | | | |--- citympg > 33.00
## | | | | | | | |--- stroke <= 3.32
## | | | | | | | | |--- highwaympg <= 47.50
## | | | | | | | | | |--- enginesize <= 91.00
## | | | | | | | | | | |--- horsepower <= 64.00
## | | | | | | | | | | | |--- value: [5399.00]
## | | | | | | | | | | |--- horsepower > 64.00
## | | | | | | | | | | | |--- value: [5572.00]
## | | | | | | | | | |--- enginesize > 91.00
## | | | | | | | | | | |--- wheelbase <= 94.70
## | | | | | | | | | | | |--- value: [5389.00]
## | | | | | | | | | | |--- wheelbase > 94.70
## | | | | | | | | | | | |--- value: [5348.00]
## | | | | | | | | |--- highwaympg > 47.50
## | | | | | | | | | |--- value: [5151.00]
## | | | | | | | |--- stroke > 3.32
## | | | | | | | | |--- value: [6479.00]
## | | | | | |--- curbweight > 1987.00
## | | | | | | |--- boreratio <= 3.02
## | | | | | | | |--- carheight <= 50.70
## | | | | | | | | |--- value: [7150.50]
## | | | | | | | |--- carheight > 50.70
## | | | | | | | | |--- curbweight <= 2010.50
## | | | | | | | | | |--- carwidth <= 64.10
## | | | | | | | | | | |--- value: [6692.00]
## | | | | | | | | | |--- carwidth > 64.10
## | | | | | | | | | | |--- value: [6669.00]
## | | | | | | | | |--- curbweight > 2010.50
## | | | | | | | | | |--- value: [7099.00]
## | | | | | | |--- boreratio > 3.02
## | | | | | | | |--- curbweight <= 2027.50
## | | | | | | | | |--- value: [6488.00]
## | | | | | | | |--- curbweight > 2027.50
## | | | | | | | | |--- value: [6338.00]
## | | | | |--- horsepower > 68.50
## | | | | | |--- carheight <= 53.60
## | | | | | | |--- compressionratio <= 9.30
## | | | | | | | |--- curbweight <= 1948.00
## | | | | | | | | |--- carwidth <= 63.95
## | | | | | | | | | |--- value: [6855.00]
## | | | | | | | | |--- carwidth > 63.95
## | | | | | | | | | |--- value: [6529.00]
## | | | | | | | |--- curbweight > 1948.00
## | | | | | | | | |--- carlength <= 158.15
## | | | | | | | | | |--- value: [7129.00]
## | | | | | | | | |--- carlength > 158.15
## | | | | | | | | | |--- value: [7198.00]
## | | | | | | |--- compressionratio > 9.30
## | | | | | | | |--- carlength <= 157.35
## | | | | | | | | |--- curbweight <= 1891.50
## | | | | | | | | | |--- value: [7605.75]
## | | | | | | | | |--- curbweight > 1891.50
## | | | | | | | | | |--- value: [8916.50]
## | | | | | | | |--- carlength > 157.35
## | | | | | | | | |--- curbweight <= 1958.50
## | | | | | | | | | |--- value: [6575.00]
## | | | | | | | | |--- curbweight > 1958.50
## | | | | | | | | | |--- symboling <= 1.50
## | | | | | | | | | | |--- curbweight <= 2026.00
## | | | | | | | | | | | |--- value: [7349.00]
## | | | | | | | | | | |--- curbweight > 2026.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | |--- symboling > 1.50
## | | | | | | | | | | |--- value: [8249.00]
## | | | | | |--- carheight > 53.60
## | | | | | | |--- curbweight <= 1903.50
## | | | | | | | |--- value: [5499.00]
## | | | | | | |--- curbweight > 1903.50
## | | | | | | | |--- curbweight <= 1944.50
## | | | | | | | | |--- curbweight <= 1928.00
## | | | | | | | | | |--- value: [6649.00]
## | | | | | | | | |--- curbweight > 1928.00
## | | | | | | | | | |--- value: [6849.00]
## | | | | | | | |--- curbweight > 1944.50
## | | | | | | | | |--- highwaympg <= 32.50
## | | | | | | | | | |--- value: [7053.00]
## | | | | | | | | |--- highwaympg > 32.50
## | | | | | | | | | |--- wheelbase <= 95.50
## | | | | | | | | | | |--- curbweight <= 1961.00
## | | | | | | | | | | | |--- value: [7299.00]
## | | | | | | | | | | |--- curbweight > 1961.00
## | | | | | | | | | | | |--- value: [7499.00]
## | | | | | | | | | |--- wheelbase > 95.50
## | | | | | | | | | | |--- value: [7295.00]
## | | | |--- curbweight > 2121.00
## | | | | |--- carlength <= 174.10
## | | | | | |--- carwidth <= 63.90
## | | | | | | |--- horsepower <= 75.50
## | | | | | | | |--- citympg <= 29.00
## | | | | | | | | |--- citympg <= 26.50
## | | | | | | | | | |--- value: [7603.00]
## | | | | | | | | |--- citympg > 26.50
## | | | | | | | | | |--- value: [7898.00]
## | | | | | | | |--- citympg > 29.00
## | | | | | | | | |--- horsepower <= 65.00
## | | | | | | | | | |--- value: [6918.00]
## | | | | | | | | |--- horsepower > 65.00
## | | | | | | | | | |--- value: [7609.00]
## | | | | | | |--- horsepower > 75.50
## | | | | | | | |--- value: [6785.00]
## | | | | | |--- carwidth > 63.90
## | | | | | | |--- highwaympg <= 27.00
## | | | | | | | |--- value: [9233.00]
## | | | | | | |--- highwaympg > 27.00
## | | | | | | | |--- boreratio <= 3.23
## | | | | | | | | |--- curbweight <= 2282.00
## | | | | | | | | | |--- curbweight <= 2154.50
## | | | | | | | | | | |--- curbweight <= 2131.00
## | | | | | | | | | | | |--- value: [8358.00]
## | | | | | | | | | | |--- curbweight > 2131.00
## | | | | | | | | | | | |--- value: [9258.00]
## | | | | | | | | | |--- curbweight > 2154.50
## | | | | | | | | | | |--- curbweight <= 2255.50
## | | | | | | | | | | | |--- truncated branch of depth 5
## | | | | | | | | | | |--- curbweight > 2255.50
## | | | | | | | | | | | |--- value: [8495.00]
## | | | | | | | | |--- curbweight > 2282.00
## | | | | | | | | | |--- value: [9095.00]
## | | | | | | | |--- boreratio > 3.23
## | | | | | | | | |--- symboling <= 2.00
## | | | | | | | | | |--- curbweight <= 2385.00
## | | | | | | | | | | |--- peakrpm <= 4650.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | | |--- peakrpm > 4650.00
## | | | | | | | | | | | |--- truncated branch of depth 3
## | | | | | | | | | |--- curbweight > 2385.00
## | | | | | | | | | | |--- enginesize <= 115.00
## | | | | | | | | | | | |--- value: [8013.00]
## | | | | | | | | | | |--- enginesize > 115.00
## | | | | | | | | | | | |--- value: [8189.00]
## | | | | | | | | |--- symboling > 2.00
## | | | | | | | | | |--- value: [8499.00]
## | | | | |--- carlength > 174.10
## | | | | | |--- compressionratio <= 15.75
## | | | | | | |--- curbweight <= 2397.50
## | | | | | | | |--- curbweight <= 2338.00
## | | | | | | | | |--- value: [8845.00]
## | | | | | | | |--- curbweight > 2338.00
## | | | | | | | | |--- carlength <= 176.60
## | | | | | | | | | |--- value: [10295.00]
## | | | | | | | | |--- carlength > 176.60
## | | | | | | | | | |--- value: [10595.00]
## | | | | | | |--- curbweight > 2397.50
## | | | | | | | |--- compressionratio <= 8.55
## | | | | | | | | |--- value: [8921.00]
## | | | | | | | |--- compressionratio > 8.55
## | | | | | | | | |--- value: [8495.00]
## | | | | | |--- compressionratio > 15.75
## | | | | | | |--- peakrpm <= 4575.00
## | | | | | | | |--- value: [10698.00]
## | | | | | | |--- peakrpm > 4575.00
## | | | | | | | |--- value: [10795.00]
## | | |--- horsepower > 89.00
## | | | |--- peakrpm <= 5650.00
## | | | | |--- wheelbase <= 94.10
## | | | | | |--- curbweight <= 2168.00
## | | | | | | |--- curbweight <= 2136.50
## | | | | | | | |--- value: [7957.00]
## | | | | | | |--- curbweight > 2136.50
## | | | | | | | |--- value: [7689.00]
## | | | | | |--- curbweight > 2168.00
## | | | | | | |--- value: [8558.00]
## | | | | |--- wheelbase > 94.10
## | | | | | |--- stroke <= 3.43
## | | | | | | |--- wheelbase <= 98.55
## | | | | | | | |--- carlength <= 162.50
## | | | | | | | | |--- value: [11595.00]
## | | | | | | | |--- carlength > 162.50
## | | | | | | | | |--- compressionratio <= 8.10
## | | | | | | | | | |--- value: [11259.00]
## | | | | | | | | |--- compressionratio > 8.10
## | | | | | | | | | |--- curbweight <= 2397.50
## | | | | | | | | | | |--- curbweight <= 2320.00
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | | | | |--- curbweight > 2320.00
## | | | | | | | | | | | |--- value: [9960.00]
## | | | | | | | | | |--- curbweight > 2397.50
## | | | | | | | | | | |--- value: [10198.00]
## | | | | | | |--- wheelbase > 98.55
## | | | | | | | |--- value: [13950.00]
## | | | | | |--- stroke > 3.43
## | | | | | | |--- curbweight <= 2538.00
## | | | | | | | |--- curbweight <= 2408.50
## | | | | | | | | |--- carheight <= 50.50
## | | | | | | | | | |--- value: [9959.00]
## | | | | | | | | |--- carheight > 50.50
## | | | | | | | | | |--- curbweight <= 2313.00
## | | | | | | | | | | |--- value: [9549.00]
## | | | | | | | | | |--- curbweight > 2313.00
## | | | | | | | | | | |--- stroke <= 3.47
## | | | | | | | | | | | |--- value: [9279.00]
## | | | | | | | | | | |--- stroke > 3.47
## | | | | | | | | | | | |--- truncated branch of depth 2
## | | | | | | | |--- curbweight > 2408.50
## | | | | | | | | |--- carheight <= 54.40
## | | | | | | | | | |--- highwaympg <= 30.50
## | | | | | | | | | | |--- value: [9639.00]
## | | | | | | | | | |--- highwaympg > 30.50
## | | | | | | | | | | |--- symboling <= 0.50
## | | | | | | | | | | | |--- value: [9988.00]
## | | | | | | | | | | |--- symboling > 0.50
## | | | | | | | | | | | |--- value: [9895.00]
## | | | | | | | | |--- carheight > 54.40
## | | | | | | | | | |--- value: [10898.00]
## | | | | | | |--- curbweight > 2538.00
## | | | | | | | |--- value: [8449.00]
## | | | |--- peakrpm > 5650.00
## | | | | |--- curbweight <= 2382.50
## | | | | | |--- carheight <= 51.10
## | | | | | | |--- value: [11845.00]
## | | | | | |--- carheight > 51.10
## | | | | | | |--- value: [9298.00]
## | | | | |--- curbweight > 2382.50
## | | | | | |--- boreratio <= 3.41
## | | | | | | |--- horsepower <= 118.00
## | | | | | | | |--- wheelbase <= 95.90
## | | | | | | | | |--- value: [13645.00]
## | | | | | | | |--- wheelbase > 95.90
## | | | | | | | | |--- value: [12945.00]
## | | | | | | |--- horsepower > 118.00
## | | | | | | | |--- value: [15645.00]
## | | | | | |--- boreratio > 3.41
## | | | | | | |--- symboling <= 1.00
## | | | | | | | |--- value: [16925.00]
## | | | | | | |--- symboling > 1.00
## | | | | | | | |--- value: [16430.00]
## | |--- curbweight > 2544.00
## | | |--- carwidth <= 67.45
## | | | |--- wheelbase <= 100.80
## | | | | |--- citympg <= 22.00
## | | | | | |--- horsepower <= 153.00
## | | | | | | |--- horsepower <= 128.00
## | | | | | | | |--- stroke <= 2.88
## | | | | | | | | |--- highwaympg <= 27.50
## | | | | | | | | | |--- value: [14997.50]
## | | | | | | | | |--- highwaympg > 27.50
## | | | | | | | | | |--- value: [15040.00]
## | | | | | | | |--- stroke > 2.88
## | | | | | | | | |--- value: [15510.00]
## | | | | | | |--- horsepower > 128.00
## | | | | | | | |--- curbweight <= 2877.00
## | | | | | | | | |--- stroke <= 3.88
## | | | | | | | | | |--- boreratio <= 3.58
## | | | | | | | | | | |--- value: [12629.00]
## | | | | | | | | | |--- boreratio > 3.58
## | | | | | | | | | | |--- value: [12764.00]
## | | | | | | | | |--- stroke > 3.88
## | | | | | | | | | |--- value: [12964.00]
## | | | | | | | |--- curbweight > 2877.00
## | | | | | | | | |--- peakrpm <= 5100.00
## | | | | | | | | | |--- curbweight <= 2923.50
## | | | | | | | | | | |--- value: [14869.00]
## | | | | | | | | | |--- curbweight > 2923.50
## | | | | | | | | | | |--- value: [14489.00]
## | | | | | | | | |--- peakrpm > 5100.00
## | | | | | | | | | |--- curbweight <= 3195.50
## | | | | | | | | | | |--- value: [13499.00]
## | | | | | | | | | |--- curbweight > 3195.50
## | | | | | | | | | | |--- value: [14399.00]
## | | | | | |--- horsepower > 153.00
## | | | | | | |--- enginesize <= 136.50
## | | | | | | | |--- value: [18620.00]
## | | | | | | |--- enginesize > 136.50
## | | | | | | | |--- value: [16500.00]
## | | | | |--- citympg > 22.00
## | | | | | |--- carheight <= 55.15
## | | | | | | |--- curbweight <= 2854.50
## | | | | | | | |--- boreratio <= 3.31
## | | | | | | | | |--- citympg <= 29.00
## | | | | | | | | | |--- value: [12290.00]
## | | | | | | | | |--- citympg > 29.00
## | | | | | | | | | |--- value: [13845.00]
## | | | | | | | |--- boreratio > 3.31
## | | | | | | | | |--- curbweight <= 2600.50
## | | | | | | | | | |--- value: [9989.00]
## | | | | | | | | |--- curbweight > 2600.50
## | | | | | | | | | |--- stroke <= 2.94
## | | | | | | | | | | |--- value: [11694.00]
## | | | | | | | | | |--- stroke > 2.94
## | | | | | | | | | | |--- compressionratio <= 9.25
## | | | | | | | | | | | |--- value: [11048.00]
## | | | | | | | | | | |--- compressionratio > 9.25
## | | | | | | | | | | | |--- value: [11199.00]
## | | | | | | |--- curbweight > 2854.50
## | | | | | | | |--- value: [17669.00]
## | | | | | |--- carheight > 55.15
## | | | | | | |--- enginesize <= 112.00
## | | | | | | | |--- value: [8778.00]
## | | | | | | |--- enginesize > 112.00
## | | | | | | | |--- value: [9295.00]
## | | | |--- wheelbase > 100.80
## | | | | |--- peakrpm <= 5150.00
## | | | | | |--- wheelbase <= 102.75
## | | | | | | |--- value: [21105.00]
## | | | | | |--- wheelbase > 102.75
## | | | | | | |--- curbweight <= 2872.50
## | | | | | | | |--- citympg <= 25.00
## | | | | | | | | |--- value: [18280.00]
## | | | | | | | |--- citympg > 25.00
## | | | | | | | | |--- value: [18344.00]
## | | | | | | |--- curbweight > 2872.50
## | | | | | | | |--- value: [18420.00]
## | | | | |--- peakrpm > 5150.00
## | | | | | |--- carheight <= 56.85
## | | | | | | |--- highwaympg <= 26.00
## | | | | | | | |--- curbweight <= 3141.00
## | | | | | | | | |--- value: [15690.00]
## | | | | | | | |--- curbweight > 3141.00
## | | | | | | | | |--- value: [15750.00]
## | | | | | | |--- highwaympg > 26.00
## | | | | | | | |--- value: [15985.00]
## | | | | | |--- carheight > 56.85
## | | | | | | |--- value: [13415.00]
## | | |--- carwidth > 67.45
## | | | |--- carwidth <= 68.85
## | | | | |--- peakrpm <= 5100.00
## | | | | | |--- curbweight <= 3224.50
## | | | | | | |--- horsepower <= 96.00
## | | | | | | | |--- value: [13200.00]
## | | | | | | |--- horsepower > 96.00
## | | | | | | | |--- carlength <= 182.55
## | | | | | | | | |--- value: [16503.00]
## | | | | | | | |--- carlength > 182.55
## | | | | | | | | |--- value: [16630.00]
## | | | | | |--- curbweight > 3224.50
## | | | | | | |--- curbweight <= 3357.50
## | | | | | | | |--- stroke <= 2.86
## | | | | | | | | |--- value: [16695.00]
## | | | | | | | |--- stroke > 2.86
## | | | | | | | | |--- value: [17425.00]
## | | | | | | |--- curbweight > 3357.50
## | | | | | | | |--- curbweight <= 3457.50
## | | | | | | | | |--- value: [13860.00]
## | | | | | | | |--- curbweight > 3457.50
## | | | | | | | | |--- value: [17075.00]
## | | | | |--- peakrpm > 5100.00
## | | | | | |--- boreratio <= 3.86
## | | | | | | |--- compressionratio <= 8.85
## | | | | | | | |--- compressionratio <= 7.40
## | | | | | | | | |--- peakrpm <= 5550.00
## | | | | | | | | | |--- value: [17859.17]
## | | | | | | | | |--- peakrpm > 5550.00
## | | | | | | | | | |--- value: [18150.00]
## | | | | | | | |--- compressionratio > 7.40
## | | | | | | | | |--- compressionratio <= 8.25
## | | | | | | | | | |--- value: [19699.00]
## | | | | | | | | |--- compressionratio > 8.25
## | | | | | | | | | |--- value: [19045.00]
## | | | | | | |--- compressionratio > 8.85
## | | | | | | | |--- curbweight <= 3105.00
## | | | | | | | | |--- carwidth <= 67.80
## | | | | | | | | | |--- value: [16558.00]
## | | | | | | | | |--- carwidth > 67.80
## | | | | | | | | | |--- value: [17199.00]
## | | | | | | | |--- curbweight > 3105.00
## | | | | | | | | |--- value: [18399.00]
## | | | | | |--- boreratio > 3.86
## | | | | | | |--- value: [22018.00]
## | | | |--- carwidth > 68.85
## | | | | |--- curbweight <= 2982.00
## | | | | | |--- value: [16845.00]
## | | | | |--- curbweight > 2982.00
## | | | | | |--- carheight <= 55.70
## | | | | | | |--- compressionratio <= 9.15
## | | | | | | | |--- value: [21485.00]
## | | | | | | |--- compressionratio > 9.15
## | | | | | | | |--- compressionratio <= 16.25
## | | | | | | | | |--- value: [22625.00]
## | | | | | | | |--- compressionratio > 16.25
## | | | | | | | | |--- value: [22470.00]
## | | | | | |--- carheight > 55.70
## | | | | | | |--- value: [23875.00]
## |--- enginesize > 182.00
## | |--- compressionratio <= 8.05
## | | |--- highwaympg <= 19.00
## | | | |--- value: [40960.00]
## | | |--- highwaympg > 19.00
## | | | |--- value: [41315.00]
## | |--- compressionratio > 8.05
## | | |--- peakrpm <= 4550.00
## | | | |--- carwidth <= 71.00
## | | | | |--- carheight <= 57.60
## | | | | | |--- carheight <= 55.70
## | | | | | | |--- value: [28176.00]
## | | | | | |--- carheight > 55.70
## | | | | | | |--- value: [25552.00]
## | | | | |--- carheight > 57.60
## | | | | | |--- value: [28248.00]
## | | | |--- carwidth > 71.00
## | | | | |--- value: [31600.00]
## | | |--- peakrpm > 4550.00
## | | | |--- citympg <= 16.50
## | | | | |--- carlength <= 195.65
## | | | | | |--- enginesize <= 280.00
## | | | | | | |--- value: [35056.00]
## | | | | | |--- enginesize > 280.00
## | | | | | | |--- value: [36000.00]
## | | | | |--- carlength > 195.65
## | | | | | |--- symboling <= -0.50
## | | | | | | |--- value: [34184.00]
## | | | | | |--- symboling > -0.50
## | | | | | | |--- value: [33900.00]
## | | | |--- citympg > 16.50
## | | | | |--- carheight <= 51.05
## | | | | | |--- value: [31400.50]
## | | | | |--- carheight > 51.05
## | | | | | |--- value: [33278.00]
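Each rule above has the form `feature <= threshold`. With the default squared-error criterion, a regression tree picks the threshold that minimises the summed squared error of the two resulting groups, and candidate thresholds are midpoints between consecutive sorted values, which is why values such as 68.50 appear. The sketch shows this search for a single feature. It is a simplified illustration, not scikit-learn's implementation, and the sample values are invented.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) of the best single split of y on feature x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(x)):
        if x[i] == x[i - 1]:
            continue                      # no threshold fits between equal values
        left, right = y[:i], y[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_threshold = (x[i - 1] + x[i]) / 2   # midpoint between neighbours
            best_sse = sse
    return best_threshold, best_sse

# Invented engine sizes and prices, for illustration only
engine = [100, 120, 150, 200, 250, 300]
price = [6000, 6500, 7000, 30000, 32000, 35000]
threshold, sse = best_split(engine, price)
print(threshold)
```

A full tree repeats this search over every feature at every node and keeps the feature and threshold with the lowest error.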
importancia_predictores = pd.DataFrame(
{'predictor': datos.drop(columns = "price").columns,
'importancia': modelo_ar.feature_importances_}
)
print("Importance of the predictors in the model")
## Importance of the predictors in the model
importancia_predictores.sort_values('importancia', ascending=False)
## predictor importancia
## 6 enginesize 0.657813
## 5 curbweight 0.220324
## 10 horsepower 0.028156
## 3 carwidth 0.027944
## 11 peakrpm 0.022573
## 9 compressionratio 0.015858
## 1 wheelbase 0.010841
## 12 citympg 0.005752
## 4 carheight 0.004025
## 7 boreratio 0.003041
## 2 carlength 0.002328
## 8 stroke 0.001070
## 13 highwaympg 0.000169
## 0 symboling 0.000106
The most important predictors for the regression tree model are therefore enginesize, curbweight, horsepower, carwidth and peakrpm.
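One way to justify a cut-off like the five predictors named above is to accumulate the importances, which sum to 1, until a chosen share is reached. The sketch below uses the values printed in the table. The 95% threshold is an assumption made here for illustration, not a rule stated in the source.

```python
# Importances copied from the table above, already sorted in descending order
importances = [
    ("enginesize", 0.657813), ("curbweight", 0.220324), ("horsepower", 0.028156),
    ("carwidth", 0.027944), ("peakrpm", 0.022573), ("compressionratio", 0.015858),
    ("wheelbase", 0.010841), ("citympg", 0.005752), ("carheight", 0.004025),
    ("boreratio", 0.003041), ("carlength", 0.002328), ("stroke", 0.001070),
    ("highwaympg", 0.000169), ("symboling", 0.000106),
]

threshold = 0.95          # assumed share of total importance to cover
cumulative = 0.0
selected = []
for name, value in importances:
    selected.append(name)
    cumulative += value
    if cumulative >= threshold:
        break

print(selected, round(cumulative, 6))
```

With this threshold the loop stops after five predictors, and the first two alone account for about 88% of the total importance.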
predicciones_ar = modelo_ar.predict(X = X_valida)
predicciones_ar
## array([33278., 11845., 18420., 9549., 9279., 41315., 16630., 15985.,
## 8495., 18420., 16845., 15510., 18620., 10595., 8495., 41315.,
## 9298., 21105., 7499., 16845., 7957., 40960., 9095., 7198.,
## 8495., 11199., 6095., 9988., 15510., 6229., 13200., 13415.,
## 7198., 15510., 13950., 6229., 16558., 8495., 7295., 15510.,
## 17425.])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_ar.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 33278.0
## 55 3 95.3 ... 10945.0 11845.0
## 14 1 103.5 ... 24565.0 18420.0
## 42 1 96.5 ... 10345.0 9549.0
## 88 -1 96.3 ... 9279.0 9279.0
## 15 0 103.5 ... 30760.0 41315.0
## 107 0 107.9 ... 11900.0 16630.0
## 194 -2 104.3 ... 12940.0 15985.0
## 64 0 98.8 ... 11245.0 8495.0
## 199 -1 104.3 ... 18950.0 18420.0
## 7 1 105.8 ... 18920.0 16845.0
## 133 2 99.1 ... 12170.0 15510.0
## 136 3 99.1 ... 18150.0 18620.0
## 59 1 98.8 ... 8845.0 10595.0
## 62 0 98.8 ... 10245.0 8495.0
## 17 0 110.0 ... 36880.0 41315.0
## 166 1 94.5 ... 9538.0 9298.0
## 12 0 101.2 ... 20970.0 21105.0
## 138 2 93.7 ... 5118.0 7499.0
## 6 1 105.8 ... 17710.0 16845.0
## 119 1 93.7 ... 7957.0 7957.0
## 74 1 112.0 ... 45400.0 40960.0
## 187 2 97.3 ... 9495.0 9095.0
## 160 0 95.7 ... 7738.0 7198.0
## 182 2 97.3 ... 7775.0 8495.0
## 171 2 98.4 ... 11549.0 11199.0
## 50 1 93.1 ... 5195.0 6095.0
## 177 -1 102.4 ... 11248.0 9988.0
## 191 0 100.4 ... 13295.0 15510.0
## 53 1 93.1 ... 6695.0 6229.0
## 111 0 107.9 ... 15580.0 13200.0
## 197 -1 104.3 ... 16515.0 13415.0
## 156 0 95.7 ... 6938.0 7198.0
## 132 3 99.1 ... 11850.0 15510.0
## 5 2 99.8 ... 15250.0 13950.0
## 54 1 93.1 ... 7395.0 6229.0
## 179 3 102.9 ... 15998.0 16558.0
## 184 2 97.3 ... 7995.0 8495.0
## 36 0 96.5 ... 7295.0 7295.0
## 4 2 99.4 ... 17450.0 15510.0
## 109 0 114.2 ... 12440.0 17425.0
##
## [41 rows x 16 columns]
rmse_ar = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_ar,
squared = False
)
print(f"El error (rmse) de test es: {rmse_ar}")
## El error (rmse) de test es: 2885.7290324596465
Or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_ar)))
## Root Mean Squared Error RMSE: 2885.7290324596465
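Both calls above compute the same quantity, the square root of the mean of the squared differences between real and predicted prices. The `squared` argument of `mean_squared_error` has been deprecated in newer scikit-learn releases, so a version-independent sketch with NumPy alone may be useful. The helper `rmse` and the variable names below are hypothetical, and the five values are copied from the first rows of the comparison table above, so the result is a partial RMSE and not the 2885.73 reported for all 41 rows.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred)^2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# First five rows of the comparison table above (index 128, 55, 14, 42, 88)
reales = [37028.0, 10945.0, 24565.0, 10345.0, 9279.0]
predichos = [33278.0, 11845.0, 18420.0, 9549.0, 9279.0]

rmse_parcial = rmse(reales, predichos)
print(rmse_parcial)
```

Under the same assumptions, calling `rmse(Y_valida, predicciones_ar)` should match the value printed above.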
The random forest model (rf) is built with seed 1280 and 20 trees.
modelo_rf = RandomForestRegressor(n_estimators = 20, random_state = 1280)
modelo_rf.fit(X_entrena, Y_entrena)
RandomForestRegressor(n_estimators=20, random_state=1280)
# pending ... ...
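The variable importance step for the random forest is left pending here. A minimal sketch of how it could be done, mirroring the regression tree code above, follows on synthetic data. `X_demo`, `y_demo`, `modelo_demo`, and the generated values are hypothetical and do not reproduce the results of this case.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumption): price driven mostly by enginesize
rng = np.random.default_rng(1280)
X_demo = pd.DataFrame({
    "enginesize": rng.uniform(60, 330, 200),
    "curbweight": rng.uniform(1500, 4100, 200),
    "noise": rng.normal(size=200),
})
y_demo = (150 * X_demo["enginesize"]
          + 2 * X_demo["curbweight"]
          + rng.normal(0, 500, 200))

modelo_demo = RandomForestRegressor(n_estimators=20, random_state=1280)
modelo_demo.fit(X_demo, y_demo)

# Same pattern as for the regression tree: one importance per predictor
importancia_rf = pd.DataFrame({
    "predictor": X_demo.columns,
    "importancia": modelo_demo.feature_importances_,
}).sort_values("importancia", ascending=False)
print(importancia_rf)
```

With the real data, the same pattern would presumably use `modelo_rf.feature_importances_` together with the columns of `X_entrena`.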
predicciones_rf = modelo_rf.predict(X_valida)
predicciones_rf
## array([31829.2 , 12755. , 17371.1 , 9242.9 ,
## 9214.6 , 37967.875 , 16620.65 , 15406.95 ,
## 9755.4 , 17145.85 , 21182.9 , 14734.55 ,
## 17433.3 , 10369.8 , 9700.25 , 38584.625 ,
## 10579.25 , 18922.05 , 7282.0875 , 21206.4 ,
## 8003.7 , 39273.2 , 8010.7 , 7725.025 ,
## 7487.25 , 12735.9 , 5999. , 10244.4 ,
## 15082.175 , 6519.25 , 16294.25 , 14781.65 ,
## 7702.65 , 13798.19166667, 13619.575 , 6552.65 ,
## 17096.45 , 7487.25 , 7740.225 , 16513.95 ,
## 16326.25 ])
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(Precio_Prediccion = predicciones_rf.flatten().tolist())
print(comparaciones)
## symboling wheelbase ... Precio_Real Precio_Prediccion
## 128 3 89.5 ... 37028.0 31829.200000
## 55 3 95.3 ... 10945.0 12755.000000
## 14 1 103.5 ... 24565.0 17371.100000
## 42 1 96.5 ... 10345.0 9242.900000
## 88 -1 96.3 ... 9279.0 9214.600000
## 15 0 103.5 ... 30760.0 37967.875000
## 107 0 107.9 ... 11900.0 16620.650000
## 194 -2 104.3 ... 12940.0 15406.950000
## 64 0 98.8 ... 11245.0 9755.400000
## 199 -1 104.3 ... 18950.0 17145.850000
## 7 1 105.8 ... 18920.0 21182.900000
## 133 2 99.1 ... 12170.0 14734.550000
## 136 3 99.1 ... 18150.0 17433.300000
## 59 1 98.8 ... 8845.0 10369.800000
## 62 0 98.8 ... 10245.0 9700.250000
## 17 0 110.0 ... 36880.0 38584.625000
## 166 1 94.5 ... 9538.0 10579.250000
## 12 0 101.2 ... 20970.0 18922.050000
## 138 2 93.7 ... 5118.0 7282.087500
## 6 1 105.8 ... 17710.0 21206.400000
## 119 1 93.7 ... 7957.0 8003.700000
## 74 1 112.0 ... 45400.0 39273.200000
## 187 2 97.3 ... 9495.0 8010.700000
## 160 0 95.7 ... 7738.0 7725.025000
## 182 2 97.3 ... 7775.0 7487.250000
## 171 2 98.4 ... 11549.0 12735.900000
## 50 1 93.1 ... 5195.0 5999.000000
## 177 -1 102.4 ... 11248.0 10244.400000
## 191 0 100.4 ... 13295.0 15082.175000
## 53 1 93.1 ... 6695.0 6519.250000
## 111 0 107.9 ... 15580.0 16294.250000
## 197 -1 104.3 ... 16515.0 14781.650000
## 156 0 95.7 ... 6938.0 7702.650000
## 132 3 99.1 ... 11850.0 13798.191667
## 5 2 99.8 ... 15250.0 13619.575000
## 54 1 93.1 ... 7395.0 6552.650000
## 179 3 102.9 ... 15998.0 17096.450000
## 184 2 97.3 ... 7995.0 7487.250000
## 36 0 96.5 ... 7295.0 7740.225000
## 4 2 99.4 ... 17450.0 16513.950000
## 109 0 114.2 ... 12440.0 16326.250000
##
## [41 rows x 16 columns]
rmse_rf = mean_squared_error(
y_true = Y_valida,
y_pred = predicciones_rf,
squared = False
)
print(f"El error (rmse) de test es: {rmse_rf}")
## El error (rmse) de test es: 2631.9569804706516
Or, equivalently:
print('Root Mean Squared Error RMSE:', np.sqrt(metrics.mean_squared_error(Y_valida, predicciones_rf)))
## Root Mean Squared Error RMSE: 2631.9569804706516
The predictions of the three models are compared.
comparaciones = pd.DataFrame(X_valida)
comparaciones = comparaciones.assign(Precio_Real = Y_valida)
comparaciones = comparaciones.assign(
    Precio_Prediccion_rm = predicciones_rm.flatten().tolist(),
    Precio_Prediccion_ar = predicciones_ar.flatten().tolist(),
    Precio_Prediccion_rf = predicciones_rf.flatten().tolist()
)
print(comparaciones)
## symboling wheelbase ... Precio_Prediccion_ar Precio_Prediccion_rf
## 128 3 89.5 ... 33278.0 31829.200000
## 55 3 95.3 ... 11845.0 12755.000000
## 14 1 103.5 ... 18420.0 17371.100000
## 42 1 96.5 ... 9549.0 9242.900000
## 88 -1 96.3 ... 9279.0 9214.600000
## 15 0 103.5 ... 41315.0 37967.875000
## 107 0 107.9 ... 16630.0 16620.650000
## 194 -2 104.3 ... 15985.0 15406.950000
## 64 0 98.8 ... 8495.0 9755.400000
## 199 -1 104.3 ... 18420.0 17145.850000
## 7 1 105.8 ... 16845.0 21182.900000
## 133 2 99.1 ... 15510.0 14734.550000
## 136 3 99.1 ... 18620.0 17433.300000
## 59 1 98.8 ... 10595.0 10369.800000
## 62 0 98.8 ... 8495.0 9700.250000
## 17 0 110.0 ... 41315.0 38584.625000
## 166 1 94.5 ... 9298.0 10579.250000
## 12 0 101.2 ... 21105.0 18922.050000
## 138 2 93.7 ... 7499.0 7282.087500
## 6 1 105.8 ... 16845.0 21206.400000
## 119 1 93.7 ... 7957.0 8003.700000
## 74 1 112.0 ... 40960.0 39273.200000
## 187 2 97.3 ... 9095.0 8010.700000
## 160 0 95.7 ... 7198.0 7725.025000
## 182 2 97.3 ... 8495.0 7487.250000
## 171 2 98.4 ... 11199.0 12735.900000
## 50 1 93.1 ... 6095.0 5999.000000
## 177 -1 102.4 ... 9988.0 10244.400000
## 191 0 100.4 ... 15510.0 15082.175000
## 53 1 93.1 ... 6229.0 6519.250000
## 111 0 107.9 ... 13200.0 16294.250000
## 197 -1 104.3 ... 13415.0 14781.650000
## 156 0 95.7 ... 7198.0 7702.650000
## 132 3 99.1 ... 15510.0 13798.191667
## 5 2 99.8 ... 13950.0 13619.575000
## 54 1 93.1 ... 6229.0 6552.650000
## 179 3 102.9 ... 16558.0 17096.450000
## 184 2 97.3 ... 8495.0 7487.250000
## 36 0 96.5 ... 7295.0 7740.225000
## 4 2 99.4 ... 15510.0 16513.950000
## 109 0 114.2 ... 17425.0 16326.250000
##
## [41 rows x 18 columns]
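A side-by-side table is easier to read once each prediction is turned into an absolute error. The sketch below does this for five rows copied from the tables above (index 128, 55, 14, 42, 88). The names `muestra` and `errores` are hypothetical, and the multiple regression column is left out because its values are truncated in the printed output.

```python
import pandas as pd

# Five rows copied from the comparison tables above
muestra = pd.DataFrame({
    "Precio_Real": [37028.0, 10945.0, 24565.0, 10345.0, 9279.0],
    "Precio_Prediccion_ar": [33278.0, 11845.0, 18420.0, 9549.0, 9279.0],
    "Precio_Prediccion_rf": [31829.2, 12755.0, 17371.1, 9242.9, 9214.6],
}, index=[128, 55, 14, 42, 88])

# Absolute error of each model, row by row
errores = pd.DataFrame({
    "error_abs_ar": (muestra["Precio_Real"] - muestra["Precio_Prediccion_ar"]).abs(),
    "error_abs_rf": (muestra["Precio_Real"] - muestra["Precio_Prediccion_rf"]).abs(),
})
print(errores)
print(errores.mean())
```

On these five rows alone the regression tree happens to have the smaller mean absolute error, a reminder that individual rows can disagree with the aggregate RMSE over all 41 validation rows.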
The RMSE values are compared.
A NumPy array is created.
rmse = np.array([[rmse_rm, rmse_ar, rmse_rf]])
rmse
## array([[3792.85935086, 2885.72903246, 2631.95698047]])
A DataFrame is built from the NumPy array.
rmse = pd.DataFrame(rmse)
rmse.columns = ['rmse_rm', 'rmse_ar', 'rmse_rf']
rmse
## rmse_rm rmse_ar rmse_rf
## 0 3792.859351 2885.729032 2631.95698
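To make the comparison explicit, the model with the lowest RMSE can be selected programmatically, along with how much each model reduces the error relative to multiple regression. This is a sketch using the values from the table above. `rmse_tabla`, `mejor`, and `reduccion` are hypothetical names.

```python
import pandas as pd

# RMSE values copied from the table above
rmse_tabla = pd.Series({
    "rmse_rm": 3792.859351,
    "rmse_ar": 2885.729032,
    "rmse_rf": 2631.956980,
})

# Label of the model with the lowest RMSE
mejor = rmse_tabla.idxmin()

# Relative reduction of each RMSE with respect to multiple regression
reduccion = 1 - rmse_tabla / rmse_tabla["rmse_rm"]
print(mejor)
print(reduccion.round(3))
```

With these values, the random forest RMSE is roughly 31% lower than that of multiple regression, and the regression tree RMSE roughly 24% lower.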
The exercise consisted of loading a data set of automobile prices together with a number of numeric predictor variables.
In the multiple linear regression model, the Adjusted R-squared statistic is 0.8519, meaning that the independent variables explain approximately 85.19% of the variability of the dependent variable, price.
In the regression tree model, the most important predictors are enginesize, curbweight, horsepower, carwidth, and peakrpm.
The random forest model ranks enginesize, curbweight, horsepower, citympg, and carwidth as its most important variables.
One point worth noting is that enginesize appears as the most important variable in all of the regression models, including those built in R.
According to the root mean squared error (RMSE), the best model was the random forest, for this particular 80%/20% split between training and validation data. Its RMSE was 2631.95698, the lowest of the three regression models.
Comparing the results in R with those in Python, random forest gave the lowest RMSE in both cases. However, its RMSE was 2245.088 in R versus 2631.95698 in Python, so it can be concluded that, specifically with these data, the best-performing model is indeed the random forest, in its R implementation.