La regresión lineal múltiple permite generar un modelo lineal en el que el valor de la variable dependiente o respuesta (\(Y\)) se determina a partir de un conjunto de variables independientes llamadas predictores (\(X1\), \(X2\), \(X3\),…, \(Xn\)). Es una extensión de la regresión lineal simple.
Los modelos lineales múltiples tiene la siguiente forma: \(Y_{i}=(\beta_{0}+\beta_{1}X_{1i}+\beta_{2}X_{2i}+\cdots+\beta_{n}X_{ni})+e_{i}\)
Donde \(Y\) es la variablde
dependiente.
\(\beta_{0}\) es la ordenada en el
origen (o intercepta), el valor de la variable dependiente \(Y\) cuando todos los predictores son
cero.
\(\beta_{i}\) es el efecto promedio que
tiene el incremento en una unidad de la variable predictora \(Yi\) sobre la variable dependiente \(Xi\), manteniéndose constantes el resto de
variables. Se conocen como coeficientes parciales de regresión.
\(e_{i}\): es el residuo o error.
Calculado como la diferencia entre el valor observado y el estimado por
el modelo.
El modelo de Regresion Lineal Multiple (MRM) tiene la siguiente estructura matricial. \[Y_{nx1}=X_{nx(k+1)}\beta _{k+1}+\varepsilon _{nx1}\]
Modelo \[\begin{bmatrix}Y_{1} \\Y_{2} \\\vdots \\ Y_{n} \end{bmatrix}=\begin{bmatrix} 1 & X_{11} & \cdots & X_{1k}\\ 1 & X_{12} & \cdots & \cdots\\ \vdots & \vdots & \ddots & \ddots \\ 1 & X_{n1} & \cdots & X_{nk} \end{bmatrix}\begin{bmatrix} \beta _{0}\\ \beta _{1}\ \\\vdots \ \\\beta _{k} \ \end{bmatrix}+\begin{bmatrix} \varepsilon _{1} \\ \varepsilon _{2} \\ \vdots \\ \varepsilon _{n} \end{bmatrix}\]
library(tidyr)
library(pastecs)
library(correlation)
## Warning: package 'correlation' was built under R version 4.1.3
library(boot)
library(corrplot)
library(qgraph)
Si no abre, copia y pega el link a tu explorador
library(vembedr)
embed_url("https://www.youtube.com/watch?v=cpLoEcJC1m4")
Los datos que se utilizarán son de las librerías de r, estos son: data(“trees”). Estos datos vienen como: “Girth”, “Height”, y “Volume”, es decir; es la circunferencia de árbol a 1.3 m, la altura total del árbol, y el volumen. Son 31 observaciones y 3 variables de Black Cherry (Prunus serotina). Note que las unidades son en pulgadas, pies y pies cubicos, respectivamente. Para eso será necesario convertirlas al sistema métrico decimal (S.M.D.), para mejor entendimiento. Ademas re-nombraremos las variables. Note that the diameter (in inches) is erroneously labelled Girth in the data.
Datos<-trees # el objeto
Datos<-Datos %>% drop_na() # Eliminar celdas con NA
attach(Datos) # crucial
head(Datos)
## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
Datos$D<-(Girth*2.54) #Convertir de "inches" a "cm" y llamar "Girth" como "Diametro"
Datos$H<-(Height*0.3048) #Convertir de "ft" a "m" y llamar "Height" como "Altura"
Datos$V<-(Volume*0.0283168) #Convertir de "cubic ft" a "m3" y llamar "Volume" como "Volumen"
attach(Datos)
plot(D, V)
#
# #Generando las variables necesarias para el modelo
#
Datos$DH<-D*H; attach(Datos)
Datos$D2<-D*D; attach(Datos)
Datos$H2<-H*H; attach(Datos)
Datos$D2H<-D2*H; attach(Datos)
Datos$D2H2<-D2*H2; attach(Datos)
Datos$lnDH<-log(DH); attach(Datos)
Datos$lnD2H<-log(D2H); attach(Datos)
Datos$lnD2H2<-log(D2H2); attach(Datos)
Datos$lnD<-log(D); attach(Datos)
Datos$lnH<-log(H); attach(Datos)
Datos$lnV<-log(V); attach(Datos)
View(Datos)
Datos<-Datos %>% drop_na() # Eliminar celdas con NA
attach(Datos)
library(psych)
## Warning: package 'psych' was built under R version 4.1.3
library(colorRamps)
## Warning: package 'colorRamps' was built under R version 4.1.3
library(corrr)
## Warning: package 'corrr' was built under R version 4.1.3
res.cor <- correlate(Datos); res.cor
## # A tibble: 17 x 18
## term Girth Height Volume D H V DH D2 H2 D2H
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Girth NA 0.519 0.967 1 0.519 0.967 0.972 0.993 0.520 0.980
## 2 Height 0.519 NA 0.598 0.519 1 0.598 0.697 0.508 0.999 0.596
## 3 Volume 0.967 0.598 NA 0.967 0.598 1 0.977 0.979 0.603 0.989
## 4 D 1 0.519 0.967 NA 0.519 0.967 0.972 0.993 0.520 0.980
## 5 H 0.519 1 0.598 0.519 NA 0.598 0.697 0.508 0.999 0.596
## 6 V 0.967 0.598 1 0.967 0.598 NA 0.977 0.979 0.603 0.989
## 7 DH 0.972 0.697 0.977 0.972 0.697 0.977 NA 0.971 0.699 0.987
## 8 D2 0.993 0.508 0.979 0.993 0.508 0.979 0.971 NA 0.512 0.992
## 9 H2 0.520 0.999 0.603 0.520 0.999 0.603 0.699 0.512 NA 0.601
## 10 D2H 0.980 0.596 0.989 0.980 0.596 0.989 0.987 0.992 0.601 NA
## 11 D2H2 0.957 0.655 0.984 0.957 0.655 0.984 0.987 0.974 0.662 0.995
## 12 lnDH 0.957 0.726 0.938 0.957 0.726 0.938 0.984 0.936 0.724 0.947
## 13 lnD2H 0.981 0.644 0.946 0.981 0.644 0.946 0.982 0.959 0.642 0.957
## 14 lnD2H2 0.957 0.726 0.938 0.957 0.726 0.938 0.984 0.936 0.724 0.947
## 15 lnD 0.992 0.530 0.940 0.992 0.530 0.940 0.961 0.970 0.529 0.953
## 16 lnH 0.517 0.999 0.592 0.517 0.999 0.592 0.693 0.504 0.995 0.590
## 17 lnV 0.969 0.648 0.960 0.969 0.648 0.960 0.974 0.950 0.646 0.950
## # ... with 7 more variables: D2H2 <dbl>, lnDH <dbl>, lnD2H <dbl>, lnD2H2 <dbl>,
## # lnD <dbl>, lnH <dbl>, lnV <dbl>
corPlot(Datos[, 4:15],
gr = colorRampPalette(heat.colors(40))) # correlacion
mod_1 <- lm(V ~ D, data=Datos); summary(mod_1)
##
## Call:
## lm(formula = V ~ D, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.228385 -0.087972 0.004303 0.098961 0.271468
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.046121 0.095290 -10.98 7.62e-12 ***
## D 0.056476 0.002758 20.48 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1204 on 29 degrees of freedom
## Multiple R-squared: 0.9353, Adjusted R-squared: 0.9331
## F-statistic: 419.4 on 1 and 29 DF, p-value: < 2.2e-16
mod_2 <- lm(V ~ H, data=Datos); summary(mod_2)
##
## Call:
## lm(formula = V ~ H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.60242 -0.28018 -0.08195 0.34171 0.84532
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.46706 0.82892 -2.976 0.005835 **
## H 0.14338 0.03566 4.021 0.000378 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3794 on 29 degrees of freedom
## Multiple R-squared: 0.3579, Adjusted R-squared: 0.3358
## F-statistic: 16.16 on 1 and 29 DF, p-value: 0.0003784
mod_3 <- lm(V ~ D + H, data=Datos); summary(mod_3)
##
## Call:
## lm(formula = V ~ D + H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181411 -0.075021 -0.008143 0.062306 0.240259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642025 0.244607 -6.713 2.75e-07 ***
## D 0.052488 0.002946 17.816 < 2e-16 ***
## H 0.031517 0.012091 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
mod_4 <- lm(V ~ D + H + DH, data=Datos); summary(mod_4)
##
## Call:
## lm(formula = V ~ D + H + DH, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.186383 -0.030222 0.008568 0.044291 0.132096
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9650816 0.6749523 2.911 0.00713 **
## D -0.0652830 0.0214197 -3.048 0.00511 **
## H -0.1205028 0.0287853 -4.186 0.00027 ***
## DH 0.0049251 0.0008916 5.524 7.48e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0767 on 27 degrees of freedom
## Multiple R-squared: 0.9756, Adjusted R-squared: 0.9728
## F-statistic: 359.3 on 3 and 27 DF, p-value: < 2.2e-16
mod_5 <- lm(V ~ D2 + H2, data=Datos); summary(mod_5)
##
## Call:
## lm(formula = V ~ D2 + H2, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.145436 -0.062615 0.005763 0.069788 0.120467
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.162e-01 9.040e-02 -4.604 8.19e-05 ***
## D2 7.381e-04 2.919e-05 25.283 < 2e-16 ***
## H2 7.206e-04 1.887e-04 3.818 0.000682 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07875 on 28 degrees of freedom
## Multiple R-squared: 0.9733, Adjusted R-squared: 0.9714
## F-statistic: 510 on 2 and 28 DF, p-value: < 2.2e-16
mod_6 <- lm(V ~ D + H + D2H, data=Datos); summary(mod_6)
##
## Call:
## lm(formula = V ~ D + H + D2H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.139710 -0.031007 -0.004463 0.048810 0.119015
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.682e-02 3.097e-01 -0.151 0.881
## D -1.050e-03 9.067e-03 -0.116 0.909
## H 2.780e-03 9.326e-03 0.298 0.768
## D2H 3.092e-05 5.113e-06 6.047 1.87e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.07295 on 27 degrees of freedom
## Multiple R-squared: 0.9779, Adjusted R-squared: 0.9754
## F-statistic: 398.1 on 3 and 27 DF, p-value: < 2.2e-16
mod_7 <- lm(V ~ DH, data=Datos); summary(mod_7)
##
## Call:
## lm(formula = V ~ DH, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.27423 -0.05360 -0.01545 0.07503 0.15607
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.148e-01 6.652e-02 -10.75 1.26e-11 ***
## DH 1.993e-03 8.125e-05 24.53 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1015 on 29 degrees of freedom
## Multiple R-squared: 0.954, Adjusted R-squared: 0.9524
## F-statistic: 601.7 on 1 and 29 DF, p-value: < 2.2e-16
mod_8 <- lm(V ~ D2H -1, data=Datos); summary(mod_8)
##
## Call:
## lm(formula = V ~ D2H - 1, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.132227 -0.030672 -0.009461 0.045434 0.121603
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## D2H 3.036e-05 3.920e-07 77.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06952 on 30 degrees of freedom
## Multiple R-squared: 0.995, Adjusted R-squared: 0.9949
## F-statistic: 5996 on 1 and 30 DF, p-value: < 2.2e-16
mod_9 <- lm(V ~ D2H2, data=Datos); summary(mod_9)
##
## Call:
## lm(formula = V ~ D2H2, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.210030 -0.062616 -0.005158 0.062191 0.151258
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.058e-02 2.975e-02 2.709 0.0112 *
## D2H2 1.155e-06 3.836e-08 30.100 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08338 on 29 degrees of freedom
## Multiple R-squared: 0.969, Adjusted R-squared: 0.9679
## F-statistic: 906 on 1 and 29 DF, p-value: < 2.2e-16
mod_10 <- lm(lnV ~ lnD, data=Datos); summary(mod_10)
##
## Call:
## lm(formula = lnV ~ lnD, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.205999 -0.068702 0.001011 0.072585 0.247963
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -7.96836 0.31416 -25.36 <2e-16 ***
## lnD 2.19997 0.08983 24.49 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.115 on 29 degrees of freedom
## Multiple R-squared: 0.9539, Adjusted R-squared: 0.9523
## F-statistic: 599.7 on 1 and 29 DF, p-value: < 2.2e-16
mod_11 <- lm(lnV ~ lnH, data=Datos); summary(mod_11)
##
## Call:
## lm(formula = lnV ~ lnH, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.65691 -0.27917 -0.08039 0.42193 0.61252
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.7919 2.7247 -4.695 5.92e-05 ***
## lnH 3.9821 0.8677 4.589 7.93e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4074 on 29 degrees of freedom
## Multiple R-squared: 0.4207, Adjusted R-squared: 0.4008
## F-statistic: 21.06 on 1 and 29 DF, p-value: 7.928e-05
mod_12 <- lm(lnV ~ lnDH, data=Datos); summary(mod_12)
##
## Call:
## lm(formula = lnV ~ lnDH, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.245089 -0.032449 0.006533 0.067887 0.150925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.19442 0.40029 -30.46 <2e-16 ***
## lnDH 1.79567 0.06033 29.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0953 on 29 degrees of freedom
## Multiple R-squared: 0.9683, Adjusted R-squared: 0.9672
## F-statistic: 885.8 on 1 and 29 DF, p-value: < 2.2e-16
mod_13 <- lm(lnV ~ lnD2H, data=Datos); summary(mod_13)
##
## Call:
## lm(formula = lnV ~ lnD2H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.167714 -0.048284 -0.007407 0.064576 0.137522
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.45760 0.28721 -36.41 <2e-16 ***
## lnD2H 1.00473 0.02835 35.44 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08041 on 29 degrees of freedom
## Multiple R-squared: 0.9774, Adjusted R-squared: 0.9767
## F-statistic: 1256 on 1 and 29 DF, p-value: < 2.2e-16
mod_14 <- lm(lnV ~ lnD2H2, data=Datos); summary(mod_14)
##
## Call:
## lm(formula = lnV ~ lnD2H2, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.245089 -0.032449 0.006533 0.067887 0.150925
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12.19442 0.40029 -30.46 <2e-16 ***
## lnD2H2 0.89784 0.03017 29.76 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.0953 on 29 degrees of freedom
## Multiple R-squared: 0.9683, Adjusted R-squared: 0.9672
## F-statistic: 885.8 on 1 and 29 DF, p-value: < 2.2e-16
mod_15 <- lm(lnV ~ lnD + lnH + lnD2H, data=Datos); summary(mod_15)
##
## Call:
## lm(formula = lnV ~ lnD + lnH + lnD2H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.168561 -0.048488 0.002431 0.063637 0.129223
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.71682 0.54996 -19.487 < 2e-16 ***
## lnD 1.98265 0.07501 26.432 < 2e-16 ***
## lnH 1.11712 0.20444 5.464 7.81e-06 ***
## lnD2H NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared: 0.9777, Adjusted R-squared: 0.9761
## F-statistic: 613.2 on 2 and 28 DF, p-value: < 2.2e-16
mod_16 <- lm(lnV ~ lnD + lnH, data=Datos); summary(mod_16)
##
## Call:
## lm(formula = lnV ~ lnD + lnH, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.168561 -0.048488 0.002431 0.063637 0.129223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -10.71682 0.54996 -19.487 < 2e-16 ***
## lnD 1.98265 0.07501 26.432 < 2e-16 ***
## lnH 1.11712 0.20444 5.464 7.81e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.08139 on 28 degrees of freedom
## Multiple R-squared: 0.9777, Adjusted R-squared: 0.9761
## F-statistic: 613.2 on 2 and 28 DF, p-value: < 2.2e-16
# estadisticos de todos los modelos
library(performance)
## Warning: package 'performance' was built under R version 4.1.3
Seleccion<-compare_performance(mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7,
mod_8, mod_9, mod_10, mod_11, mod_12, mod_13, mod_14,
mod_15, mod_16,
rank = TRUE,
verbose = TRUE, metrics = "all");
## Warning: When comparing models, please note that probably not all models were fit
## from same data.
Seleccion
## # Comparison of Model Performance Indices
##
## Name | Model | R2 | R2 (adj.) | RMSE | Sigma | AIC weights | BIC weights | Performance-Score
## --------------------------------------------------------------------------------------------------
## mod_8 | lm | 0.995 | 0.995 | 0.068 | 0.070 | 0.915 | 0.984 | 99.98%
## mod_6 | lm | 0.978 | 0.975 | 0.068 | 0.073 | 0.052 | 0.007 | 66.62%
## mod_4 | lm | 0.976 | 0.973 | 0.072 | 0.077 | 0.011 | 0.001 | 65.29%
## mod_13 | lm | 0.977 | 0.977 | 0.078 | 0.080 | 0.006 | 0.003 | 64.88%
## mod_5 | lm | 0.973 | 0.971 | 0.075 | 0.079 | 0.008 | 0.002 | 64.87%
## mod_15 | lm | 0.978 | 0.976 | 0.077 | 0.081 | 0.003 | 7.00e-04 | 64.74%
## mod_16 | lm | 0.978 | 0.976 | 0.077 | 0.081 | 0.003 | 7.00e-04 | 64.74%
## mod_9 | lm | 0.969 | 0.968 | 0.081 | 0.083 | 0.002 | 0.001 | 64.03%
## mod_12 | lm | 0.968 | 0.967 | 0.092 | 0.095 | 3.23e-05 | 1.70e-05 | 62.77%
## mod_14 | lm | 0.968 | 0.967 | 0.092 | 0.095 | 3.23e-05 | 1.70e-05 | 62.77%
## mod_7 | lm | 0.954 | 0.952 | 0.098 | 0.102 | 4.55e-06 | 2.39e-06 | 61.40%
## mod_3 | lm | 0.948 | 0.944 | 0.104 | 0.110 | 2.45e-07 | 6.29e-08 | 60.30%
## mod_10 | lm | 0.954 | 0.952 | 0.111 | 0.115 | 9.65e-08 | 5.07e-08 | 60.07%
## mod_1 | lm | 0.935 | 0.933 | 0.116 | 0.120 | 2.30e-08 | 1.21e-08 | 58.56%
## mod_11 | lm | 0.421 | 0.401 | 0.394 | 0.407 | 8.93e-25 | 4.69e-25 | 3.29%
## mod_2 | lm | 0.358 | 0.336 | 0.367 | 0.379 | 8.14e-24 | 4.27e-24 | 2.77%
library(DT)
datatable(Seleccion, extensions = "Buttons",
options = list(dom = "Bfrtip",
buttons = c("copy", "csv", "excel", "pdf")))
library(see)
## Warning: package 'see' was built under R version 4.1.3
plot(compare_performance(mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7,
mod_8, mod_9, mod_10, mod_11, mod_12, mod_13, mod_14,
mod_15, mod_16, rank = TRUE))
## Warning: When comparing models, please note that probably not all models were fit
## from same data.
############ DE AQUI EN ADELANTE ES EL ANALISIS DEL MODELO QUE DESEE
# revisar los supuestos del modelo
library(performance)
library(car)
modelo<-mod_3 # nombre del modelo (objeto) a evaluar
summary(modelo)
##
## Call:
## lm(formula = V ~ D + H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181411 -0.075021 -0.008143 0.062306 0.240259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642025 0.244607 -6.713 2.75e-07 ***
## D 0.052488 0.002946 17.816 < 2e-16 ***
## H 0.031517 0.012091 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
confint(modelo)
## 2.5 % 97.5 %
## (Intercept) -2.143079490 -1.14097039
## D 0.046453365 0.05852304
## H 0.006749267 0.05628557
check_model(modelo)
outlierTest(modelo, cutoff=Inf, n.max=5) # prueba de bonferroni para detectar outliers
## rstudent unadjusted p-value Bonferroni p
## 31 2.765603 0.010122 0.31377
## 18 -1.859901 0.073827 NA
## 2 1.651668 0.110190 NA
## 3 1.567740 0.128590 NA
## 1 1.532069 0.137140 NA
check_heteroskedasticity (modelo) # los errores deben mostrar homogeneidad
## OK: Error variance appears to be homoscedastic (p = 0.206).
check_autocorrelation (modelo) # residualaes no correlacionados
## Warning: Autocorrelated residuals detected (p = 0.022).
check_collinearity (modelo) # variables (x) independienets no correlacioandas
## # Check for Multicollinearity
##
## Low Correlation
##
## Term VIF VIF 95% CI Increased SE Tolerance Tolerance 95% CI
## D 1.37 [1.09, 2.47] 1.17 0.73 [0.41, 0.91]
## H 1.37 [1.09, 2.47] 1.17 0.73 [0.41, 0.91]
check_normality (modelo) # normalidad de los residuales (errores)
## OK: residuals appear as normally distributed (p = 0.639).
library("effects")
plot(predictorEffects(modelo))
library(sjPlot)
## Warning: package 'sjPlot' was built under R version 4.1.3
plot_model(modelo, type = "pred", show.data = TRUE, xlab="hola")
## $D
##
## $H
library(car)
Anova<-Anova(modelo); Anova # el anova del modelo
## Anova Table (Type II tests)
##
## Response: V
## Sum Sq Df F value Pr(>F)
## D 3.8352 1 317.4129 < 2e-16 ***
## H 0.0821 1 6.7943 0.01449 *
## Residuals 0.3383 28
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(modelo)
##
## Call:
## lm(formula = V ~ D + H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181411 -0.075021 -0.008143 0.062306 0.240259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642025 0.244607 -6.713 2.75e-07 ***
## D 0.052488 0.002946 17.816 < 2e-16 ***
## H 0.031517 0.012091 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
# Análisis de observaciones influyentes
library(car)
summary(influence.measures(modelo))
## Potentially influential observations of
## lm(formula = V ~ D + H, data = Datos) :
##
## dfb.1_ dfb.D dfb.H dffit cov.r cook.d hat
## 31 -0.74 0.97 0.34 1.50_* 0.68 0.61 0.23
influencePlot(modelo)
## StudRes Hat CookD
## 18 -1.859901 0.1434615 0.1775359
## 20 -1.105744 0.2112366 0.1082855
## 31 2.765603 0.2270585 0.6052326
library(splines)
library(visreg)
## Warning: package 'visreg' was built under R version 4.1.3
summary(modelo)
##
## Call:
## lm(formula = V ~ D + H, data = Datos)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181411 -0.075021 -0.008143 0.062306 0.240259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642025 0.244607 -6.713 2.75e-07 ***
## D 0.052488 0.002946 17.816 < 2e-16 ***
## H 0.031517 0.012091 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
# forma elegante de ver el modelo
visreg2d(modelo, x="D", y="H", plot.type="image") # que variables?
# visreg2d(modelo, x="D", y="H", plot.type="persp")
# visreg2d(modelo, "D", "H", plot.type="rgl")
library("plot3D")
colnames(Datos)
## [1] "Girth" "Height" "Volume" "D" "H" "V" "DH" "D2"
## [9] "H2" "D2H" "D2H2" "lnDH" "lnD2H" "lnD2H2" "lnD" "lnH"
## [17] "lnV"
# x, y and z coordinates
x <- D # Cual es la variable x de su modelo?
y <- H # Cual es la variable y de su modelo?
z <- V # Cual es la variable que esta prediciendo?
scatter3D(x, y, z, clab = c("Ozone", "(ppb)"))
# Compute the linear regression
fit <- lm(z ~ x + y); summary(fit)
##
## Call:
## lm(formula = z ~ x + y)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.181411 -0.075021 -0.008143 0.062306 0.240259
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.642025 0.244607 -6.713 2.75e-07 ***
## x 0.052488 0.002946 17.816 < 2e-16 ***
## y 0.031517 0.012091 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1099 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
# predict values on regular xy grid
grid.lines = 100
x.pred <- seq(min(x), max(x), length.out = grid.lines)
y.pred <- seq(min(y), max(y), length.out = grid.lines)
xy <- expand.grid( x = x.pred, y = y.pred)
z.pred <- matrix(predict(fit, newdata = xy),
nrow = grid.lines, ncol = grid.lines)
# fitted points for droplines to surface
fitpoints <- predict(fit)
# scatter plot with regression plane
scatter3D(x, y, z, pch = 18, cex = 2,
theta = 20, phi = 20, ticktype = "detailed",
xlab = "Diametro (cm)", ylab = "Altura (m)", zlab = "Volumen (m3)",
surf = list(x = x.pred, y = y.pred, z = z.pred,
facets = NA, fit = fitpoints), main = "")
library("plot3Drgl")
plotrgl()
library(report)
## Warning: package 'report' was built under R version 4.1.3
report(modelo)
## We fitted a linear model (estimated using OLS) to predict V with D (formula: V
## ~ D + H). The model explains a statistically significant and substantial
## proportion of variance (R2 = 0.95, F(2, 28) = 254.97, p < .001, adj. R2 =
## 0.94). The model's intercept, corresponding to D = 0, is at -1.64 (95% CI
## [-2.14, -1.14], t(28) = -6.71, p < .001). Within this model:
##
## - The effect of D is statistically significant and positive (beta = 0.05, 95%
## CI [0.05, 0.06], t(28) = 17.82, p < .001; Std. beta = 0.90, 95% CI [0.80,
## 1.00])
## - The effect of H is statistically significant and positive (beta = 0.03, 95%
## CI [6.75e-03, 0.06], t(28) = 2.61, p = 0.014; Std. beta = 0.13, 95% CI [0.03,
## 0.23])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation. and We fitted a linear
## model (estimated using OLS) to predict V with H (formula: V ~ D + H). The model
## explains a statistically significant and substantial proportion of variance (R2
## = 0.95, F(2, 28) = 254.97, p < .001, adj. R2 = 0.94). The model's intercept,
## corresponding to H = 0, is at -1.64 (95% CI [-2.14, -1.14], t(28) = -6.71, p <
## .001). Within this model:
##
## - The effect of D is statistically significant and positive (beta = 0.05, 95%
## CI [0.05, 0.06], t(28) = 17.82, p < .001; Std. beta = 0.90, 95% CI [0.80,
## 1.00])
## - The effect of H is statistically significant and positive (beta = 0.03, 95%
## CI [6.75e-03, 0.06], t(28) = 2.61, p = 0.014; Std. beta = 0.13, 95% CI [0.03,
## 0.23])
##
## Standardized parameters were obtained by fitting the model on a standardized
## version of the dataset. 95% Confidence Intervals (CIs) and p-values were
## computed using a Wald t-distribution approximation.
tabla<-report_table(modelo)
library(DT)
# datatable(tabla)
datatable(tabla, extensions = "Buttons",
options = list(dom = "Bfrtip",
buttons = c("copy", "csv", "excel", "pdf")))
library(equatiomatic)
## Warning: package 'equatiomatic' was built under R version 4.1.3
extract_eq(modelo) # El modelo final
\[ \operatorname{V} = \alpha + \beta_{1}(\operatorname{D}) + \beta_{2}(\operatorname{H}) + \epsilon \]
extract_eq(modelo, wrap = TRUE, use_coefs = TRUE) # El modelo final y coefficientes
\[ \begin{aligned} \operatorname{\widehat{V}} &= -1.64 + 0.05(\operatorname{D}) + 0.03(\operatorname{H}) \end{aligned} \]
library(vembedr)
embed_url("https://www.youtube.com/watch?v=Ke3vfYpKmtU") # lecccion 1
embed_url("https://www.youtube.com/watch?v=Yk8QOjhUvBY") # lecccion 2
embed_url("https://www.youtube.com/watch?v=qFIXTihSZ2w") # lecccion 3
embed_url("https://www.youtube.com/watch?v=KBoTaKgGtg8") # lecccion 4
embed_url("https://www.youtube.com/watch?v=5cDtzvCDzy0") # lecccion 5