Librerias

library(readxl)
library(lmtest)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(olsrr)

## 
## Attaching package: 'olsrr'

## The following object is masked from 'package:datasets':
## 
##     rivers

library(car)

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(gvlma)

PARTE 1

datos <- read_excel('03Memoria.xlsx')
datos |> select_if(is.numeric) |> pairs()

attach(datos)
modelo <-lm(proporcion~tiempo)
modelo

## 
## Call:
## lm(formula = proporcion ~ tiempo)
## 
## Coefficients:
## (Intercept)       tiempo  
##   5.049e-01   -5.477e-05

modelo |> residuals() -> residuales

Normalidad

Gráfica:

data.frame(residuales) |> 
ggplot(aes(sample=residuales))+
stat_qq(size = 2) +
stat_qq_line(distribution = "qnorm")+
labs(x = "Cuantil teórico",
y = "Residuales")+
theme_minimal()

Prueba de hipótesis:

H0 : Los errores se distribuyen normalmente
H1 : Los errores No se distribuyen normalmente

Nivel de significancia: α = 0.05

Estadistico: Test de normalidad de Shapiro Wilk

residuales |> shapiro.test()

## 
##  Shapiro-Wilk normality test
## 
## data:  residuales
## W = 0.93006, p-value = 0.2181

Decisión: Como el p-value = 0.2181 > 0.05 , No se rechaza Ho.

Conclusión: A un 95% de confianza existe evidencia estadística para afirmar que los errores se distribuyen normalmente a pesar de que graficamente no se observe eso.

Homocedasticidad de errores

Gráfica:

  modelo |> plot(which=1)

Prueba de hipótesis:

H0 : Los errores tienen varianza homogenea
H1 : Los errores No tienen varianza homogenea

Nivel de significancia: α = 0.05

Estadistico: Prueba de Breusch Pagan

modelo |> ols_test_breusch_pagan()

## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##                  Data                  
##  --------------------------------------
##  Response : proporcion 
##  Variables: fitted values of proporcion 
## 
##         Test Summary         
##  ----------------------------
##  DF            =    1 
##  Chi2          =    0.259826 
##  Prob > Chi2   =    0.6102397

Decisión: Como el p-value = 0.6102397 > 0.05 , No se rechaza Ho.

Conclusión: A un 95% de confianza existe evidencia estadística para afirmar que los errores son homocedasticos

Independencia de errores

Gráfica:

data.frame(residuales) |>
ggplot(aes(x=1:nrow(datos),y=residuales))+
geom_point(size = 3) +
geom_line()+
geom_hline(yintercept=0)+
labs(x = "Orden",
y = "Residual",
title = "Evaluación de independencia",
subtitle = "Modelo")+
theme_minimal()

### Prueba de hipótesis: - H0 : Los errores son independientes - H1 : Los errores No son independientes

Nivel de significancia: α = 0.05

Estadistico: Prueba de Durbin Watson

modelo |> dwtest(alternative = "two.sided")

## 
##  Durbin-Watson test
## 
## data:  modelo
## DW = 2.1898, p-value = 0.6424
## alternative hypothesis: true autocorrelation is not 0

Decisión: Como el p-value = 0.6424 > 0.05 , No se rechaza Ho.

Conclusión: A un 95% de confianza existe evidencia estadística para afirmar que los errores son independientes

Linealidad

Grafico:

library(GGally)

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

datos |> select_if(is.numeric) |> ggpairs()

### Prueba de hipótesis: - H0 : El modelo es lineal - H1 : El modelo No es lineal

Nivel de significancia: α = 0.05

Estadístico: Prueba de Función de enlace

modelo |> gvlma() |> summary()

## 
## Call:
## lm(formula = proporcion ~ tiempo)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.18017 -0.10551 -0.02945  0.10588  0.33511 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.049e-01  4.212e-02  11.988 4.38e-09 ***
## tiempo      -5.477e-05  1.429e-05  -3.833  0.00163 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1525 on 15 degrees of freedom
## Multiple R-squared:  0.4948, Adjusted R-squared:  0.4611 
## F-statistic: 14.69 on 1 and 15 DF,  p-value: 0.001631
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = modelo) 
## 
##                     Value p-value                   Decision
## Global Stat        8.1961 0.08465    Assumptions acceptable.
## Skewness           1.3826 0.23966    Assumptions acceptable.
## Kurtosis           0.1210 0.72791    Assumptions acceptable.
## Link Function      6.1972 0.01280 Assumptions NOT satisfied!
## Heteroscedasticity 0.4953 0.48159    Assumptions acceptable.

Decisión: Como el p-value = 0.01279548 < 0.05 , Se rechaza Ho.

Conclusión: A un 95% de confianza existe evidencia estadística para afirmar que el modelo no es lineal.

PARTE II

datos1 = read_xlsx('03Memoria.xlsx')
datos1 |> mutate(tiempo2 = exp(-tiempo)) -> datos1
lm(proporcion~tiempo2, datos1) |> summary()

## 
## Call:
## lm(formula = proporcion ~ tiempo2, data = datos1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32102 -0.13102 -0.02102  0.13898  0.30084 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4010     0.0459   8.737 2.86e-07 ***
## tiempo2       1.2084     0.5144   2.349   0.0329 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1834 on 15 degrees of freedom
## Multiple R-squared:  0.269,  Adjusted R-squared:  0.2202 
## F-statistic: 5.519 on 1 and 15 DF,  p-value: 0.03293

par(mfrow=c(1,2))
plot(datos1$tiempo2,datos1$proporcion)
lm(proporcion~tiempo2, datos1) |> plot(which=1)

modelo2 <-lm(proporcion~tiempo2, datos1)

PARTE 3

Normalidad

modelo2 |> residuals() -> residuales
residuales |> shapiro.test()

## 
##  Shapiro-Wilk normality test
## 
## data:  residuales
## W = 0.98053, p-value = 0.9617

Homocedasticidad de errores

modelo2 |> ols_test_breusch_pagan()

## 
##  Breusch Pagan Test for Heteroskedasticity
##  -----------------------------------------
##  Ho: the variance is constant            
##  Ha: the variance is not constant        
## 
##                  Data                  
##  --------------------------------------
##  Response : proporcion 
##  Variables: fitted values of proporcion 
## 
##         Test Summary         
##  ----------------------------
##  DF            =    1 
##  Chi2          =    0.4918378 
##  Prob > Chi2   =    0.4831087

Independencia de errores

modelo2 |> dwtest(alternative = "two.sided")

## 
##  Durbin-Watson test
## 
## data:  modelo2
## DW = 2.6568, p-value = 0.2042
## alternative hypothesis: true autocorrelation is not 0

Linealidad

modelo2 |> gvlma() |> summary()

## 
## Call:
## lm(formula = proporcion ~ tiempo2, data = datos1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.32102 -0.13102 -0.02102  0.13898  0.30084 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.4010     0.0459   8.737 2.86e-07 ***
## tiempo2       1.2084     0.5144   2.349   0.0329 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1834 on 15 degrees of freedom
## Multiple R-squared:  0.269,  Adjusted R-squared:  0.2202 
## F-statistic: 5.519 on 1 and 15 DF,  p-value: 0.03293
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = modelo2) 
## 
##                       Value p-value                Decision
## Global Stat        3.763140 0.43901 Assumptions acceptable.
## Skewness           0.001900 0.96523 Assumptions acceptable.
## Kurtosis           0.470270 0.49286 Assumptions acceptable.
## Link Function      3.289759 0.06971 Assumptions acceptable.
## Heteroscedasticity 0.001211 0.97224 Assumptions acceptable.

Con el modelo aplicado se cumplen todos los supuestos puede haber mejores modelos

Tambien dado que se verifica los supuestos de normalidad, homocedasticidad e independencia de errores y NO la linealidad aplicaremos Transformación Box Tidwell

datos1 = read_xlsx('03Memoria.xlsx')
boxTidwell(proporcion~tiempo, data=datos1)

##  MLE of lambda Score Statistic (z)  Pr(>|z|)    
##       0.040488              4.9101 9.101e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## iterations =  6

Tarea5

Luis Salsavilca

2022-06-09

Librerias

PARTE 1

Normalidad

Gráfica:

Prueba de hipótesis:

Homocedasticidad de errores

Gráfica:

Prueba de hipótesis:

Independencia de errores

Gráfica:

Linealidad

Grafico:

PARTE II

PARTE 3

Normalidad

Homocedasticidad de errores

Independencia de errores

Linealidad