A continuación se presenta la matriz de dispersión de la variable independiente Volume y las variables independientes Girth y Heightd de la base de datos tree:
trees <- as.data.frame(trees)
attach(trees)
pairs(Volume~Girth+Height,col='blue',main='Matriz de Dispersión de las variables de Estudio')
Presentamos el primer modelo propuesto:
regre.1 <- lm(Volume ~ Girth + Height)
summary(regre.1)
##
## Call:
## lm(formula = Volume ~ Girth + Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.4065 -2.6493 -0.2876 2.2003 8.4847
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -57.9877 8.6382 -6.713 2.75e-07 ***
## Girth 4.7082 0.2643 17.816 < 2e-16 ***
## Height 0.3393 0.1302 2.607 0.0145 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.882 on 28 degrees of freedom
## Multiple R-squared: 0.948, Adjusted R-squared: 0.9442
## F-statistic: 255 on 2 and 28 DF, p-value: < 2.2e-16
El Criterio de Información de AKAIKE (AIC) se muestra acontinuación:
AIC(regre.1)
## [1] 176.91
A continuación se presenta el segundo modelo propuesto, para ello se obtiene el valor umbral o punto de corte:
library(chngpt)
## Warning: package 'chngpt' was built under R version 3.6.1
##
## The package 'chngpt' has been loaded. If used for a publication, please cite:
## Fong, Huang, Gilbert, Permar (2017), BMC Bioinformatics, DOI:10.1186/s12859-017-1863-x
## chngpt: threshold regression model estimation and inference
test=chngpt.test(formula.null=Volume~Height, formula.chngpt=~Girth, trees,
type="segmented", family="gaussian")
test
##
## Maximum of Likelihood Ratio Statistics
##
## data: trees
## Maximal statistic = 24.904, threshold = 13.3, p-value < 2.2e-16
## alternative hypothesis: two-sided
Con el siguiente gráfico visualizamos el valor umbral para cada uno de los valores de la variable umbral:
El modelo de regresión umbral fue connstruido con la librería chngpt, donde visualizaremos los coeficientes de regresión:
library(chngpt)
regre.2=chngptm(formula.1=Volume~Height, formula.2=~Girth, data=trees,
type="segmented", family="gaussian")
summary(regre.2)[-2]
## $coefficients
## est Std. Error* (lower upper) p.value
## (Intercept) -39.4511441 7.3083389 -53.7754885 -25.1267998 6.735034e-08
## Height 0.3955727 0.1176930 0.1648944 0.6262509 7.764513e-04
## Girth 2.6666296 0.8619170 0.9772722 4.3559869 1.975870e-03
## (Girth-chngpt)+ 3.3709885 0.7256953 1.9486256 4.7933513 3.397732e-06
##
## $threshold.type
## [1] "segmented"
Procedemos a obtener el Coeficiente de Determinación:
chngpt = 13.300000
trees$Yest = ifelse(trees$Girth > chngpt,
-39.4511441 + 0.3955727 * trees$Height + 2.6666296 * trees$Girth + 3.3709885 *(trees$Girth-chngpt),
-39.4511441 + 0.3955727 * trees$Height + 2.6666296 * trees$Girth)
trees$erroru = trees$Volume - trees$Yest
r_cuadrado = sum((trees$Yest - mean(trees$Volume))^2)/ sum((trees$Volume - mean(trees$Volume))^2)
r_cuadrado
## [1] 0.9766911
También obtenemos el AIC
regre.2$best.fit
##
## Call: glm(formula = formula.new, family = family, data = data, weights = chngpt.glm.weights,
## offset = chngpt.glm.offset)
##
## Coefficients:
## (Intercept) Height Girth x.mod.e
## -39.4511 0.3956 2.6666 3.3710
##
## Degrees of Freedom: 30 Total (i.e. Null); 27 Residual
## Null Deviance: 8106
## Residual Deviance: 188.9 AIC: 154
A continuación se presenta el tercer modelo propuesto, para ello separaremos la data en dos grupos, considerando el grupo de corte con el valor umbral:
base1 <- subset.data.frame(trees,trees$Girth <= chngpt)
head(base1)
## Girth Height Volume Yest erroru
## 1 8.3 70 10.3 10.371971 -0.07197058
## 2 8.6 65 10.3 9.194096 1.10590404
## 3 8.8 63 10.2 8.936276 1.26372352
## 4 10.5 72 16.4 17.029701 -0.62970110
## 5 10.7 81 18.8 21.123181 -2.32318132
## 6 10.8 83 19.7 22.180990 -2.48098968
base2 <- subset.data.frame(trees,trees$Girth > chngpt)
head(base2)
## Girth Height Volume Yest erroru
## 19 13.7 71 25.7 26.51574 -0.8157385
## 20 13.8 64 24.9 24.35049 0.5495086
## 21 14.0 78 34.5 31.09603 3.4039672
## 22 14.2 80 31.7 33.09470 -1.3947019
## 23 14.5 74 36.3 32.53255 3.7674489
## 24 16.0 72 38.3 40.79783 -2.4978328
Luego, procedemos a estimar los modelos de regresión múltiple
umbral.1 <- lm(base1$Volume ~ base1$Girth + base1$Height)
summary(umbral.1)
##
## Call:
## lm(formula = base1$Volume ~ base1$Girth + base1$Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9936 -1.7492 -0.1353 0.8771 5.7239
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -36.6780 6.1068 -6.006 2.41e-05 ***
## base1$Girth 2.6483 0.5141 5.151 0.000118 ***
## base1$Height 0.3599 0.1066 3.375 0.004162 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.23 on 15 degrees of freedom
## Multiple R-squared: 0.8753, Adjusted R-squared: 0.8587
## F-statistic: 52.64 on 2 and 15 DF, p-value: 1.656e-07
umbral.2 <- lm(base2$Volume ~ base2$Girth + base2$Height)
summary(umbral.2)
##
## Call:
## lm(formula = base2$Volume ~ base2$Girth + base2$Height)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.822 -2.051 1.438 2.649 3.372
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -91.3395 12.7786 -7.148 3.11e-05 ***
## base2$Girth 5.5149 0.6643 8.302 8.50e-06 ***
## base2$Height 0.5987 0.2425 2.469 0.0332 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.238 on 10 degrees of freedom
## Multiple R-squared: 0.961, Adjusted R-squared: 0.9533
## F-statistic: 123.4 on 2 and 10 DF, p-value: 8.971e-08
Obtenemos el coeficiente de determinación:
## [1] 0.9454559
Finalmente obtenemos el AIC:
## [1] 156.6976
Una vez realizado los cálculos para los 03 modelos, procederemos a realizar la comparación de los Coeficientes de Determinación y Criterio de Información de Akaike (AIC)
## Modelo 1 Modelo 2 Modelo 3
## R2 0.948 0.977 0.945
## AIC 176.910 154.000 156.698
A continuación se presenta el análisis de residuales obtenidos con el modelo umbral:
qqnorm(trees$erroru)
qqline(trees$erroru, lty = 2,col='red')
library(nortest)
ad.test(trees$erroru)
##
## Anderson-Darling normality test
##
## data: trees$erroru
## A = 0.15917, p-value = 0.9441
library(alr3)
## Warning: package 'alr3' was built under R version 3.6.1
## Loading required package: car
## Warning: package 'car' was built under R version 3.6.1
## Loading required package: carData
## Warning: package 'carData' was built under R version 3.6.1
durbinWatsonTest(trees$erroru)
## [1] 2.232977