library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.2.3
## Warning: package 'ggplot2' was built under R version 4.2.3
## Warning: package 'tibble' was built under R version 4.2.3
## Warning: package 'tidyr' was built under R version 4.2.3
## Warning: package 'readr' was built under R version 4.2.3
## Warning: package 'purrr' was built under R version 4.3.1
## Warning: package 'dplyr' was built under R version 4.2.3
## Warning: package 'forcats' was built under R version 4.3.1
## Warning: package 'lubridate' was built under R version 4.3.1
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the ]8;;http://conflicted.r-lib.org/conflicted package]8;; to force all conflicts to become errors
library(broom.mixed)
## Warning: package 'broom.mixed' was built under R version 4.2.3
library(lme4)
## Warning: package 'lme4' was built under R version 4.2.3
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
library(ggplot2)
data("CO2")
uptake2 = CO2$uptake + runif(84,0.02, 0.04)
view(uptake2)
ggplot(CO2, aes(x = conc, y = uptake2, color = Type)) + geom_point()
Captación deCO2 de distitas planta tiene alta variación. Vemos plantas de Quebecen la parte superior y Missisipi en la inferior.
Tratamiento experimental = enfriamiento
ggplot(CO2, aes(x = conc, y = uptake2, color = Type)) + geom_point(aes(shape = Treatment)) + theme_bw()
ggplot(CO2, aes(x = conc, y = uptake2)) +
geom_point(aes(color = Treatment))+
facet_wrap(~Type)
ggplot(CO2, aes(x = conc, y = uptake2, color = Type)) +
geom_point(aes(shape = Treatment)) +
geom_path(aes(group = Plant, lty = Treatment)) +
theme_bw()
ggplot(CO2, aes(x = conc, y = uptake2, color = Type)) +
geom_point(aes(shape = Treatment)) +
geom_path(aes(group = Plant, lty = Treatment)) +
theme_bw() + facet_wrap(~Type)
Cada planta aislada, en cada tratamiento (Enfriamiento y no enfriamiento) = Variabilidad asociada a la panta.
tienen la misma pendiente pero diferente intercepto.
## Modelo normal / lineal simple - Minimos cuadrados: es la tecnica que me permite identificar los parametros entre el intercepto y la pendiente (y=a+bx)
Fit1 <- lm(uptake2 ~ I(log(conc)) + Type:Treatment, data = CO2)
summary(Fit1)
##
## Call:
## lm(formula = uptake2 ~ I(log(conc)) + Type:Treatment, data = CO2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.7166 -2.8981 0.5869 2.7640 8.8698
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -33.5303 4.0765 -8.225 3.17e-12 ***
## I(log(conc)) 8.4845 0.6783 12.509 < 2e-16 ***
## TypeQuebec:Treatmentnonchilled 19.5200 1.4402 13.554 < 2e-16 ***
## TypeMississippi:Treatmentnonchilled 10.1389 1.4402 7.040 6.26e-10 ***
## TypeQuebec:Treatmentchilled 15.9401 1.4402 11.068 < 2e-16 ***
## TypeMississippi:Treatmentchilled NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.667 on 79 degrees of freedom
## Multiple R-squared: 0.8228, Adjusted R-squared: 0.8138
## F-statistic: 91.69 on 4 and 79 DF, p-value: < 2.2e-16
Captación CO2- variable respuesta de - Concentración CO2: Hay relación entre las variables con respecto a la captación de CO2 de las plantas, siendo significativos en el modelo.
En base a pendiente del intercepto, se rechaza hipotesis nula (No es igual a 0)
R cuadrado ajustado = 81.3 % Los datos estan relacionados, (supera el 70%, es positivo en los datos)
Hay sigularidad en los datso TypeMississippi:Treatmentchilled
ggplot(CO2, aes(x = conc, y = uptake2, color = Type)) +
geom_point() +
geom_smooth(method='lm', formula=y~x, se=FALSE, col='darkblue') +
theme_light()
Se incluye un factor aleatorio en este caso las plantas, por ello se usa el modelo lineal mixto
Utilizando el paquete lme4 se puede llevar el modelo a plantas que no estan en el experimento, entonces cada planta tendra un intercepto distinto.
# Agregar el factor aleatorio: (Planta) - Para estimar los parámetros del modelos
Fit2 <- lmer(uptake2 ~ I(log(conc)) + Type:Treatment + (1|Plant), data = CO2)
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
summary(Fit2)
## Linear mixed model fit by REML ['lmerMod']
## Formula: uptake2 ~ I(log(conc)) + Type:Treatment + (1 | Plant)
## Data: CO2
##
## REML criterion at convergence: 482.3
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -2.6932 -0.5388 0.1034 0.6896 1.8919
##
## Random effects:
## Groups Name Variance Std.Dev.
## Plant (Intercept) 2.152 1.467
## Residual 20.252 4.500
## Number of obs: 84, groups: Plant, 12
##
## Fixed effects:
## Estimate Std. Error t value
## (Intercept) -33.5303 4.0213 -8.338
## I(log(conc)) 8.4845 0.6541 12.971
## TypeQuebec:Treatmentnonchilled 19.5200 1.8341 10.643
## TypeMississippi:Treatmentnonchilled 10.1389 1.8341 5.528
## TypeQuebec:Treatmentchilled 15.9401 1.8341 8.691
##
## Correlation of Fixed Effects:
## (Intr) I(l()) TypQbc:Trtmntn TypM:T
## I(log(cnc)) -0.947
## TypQbc:Trtmntn -0.228 0.000
## TypMssssp:T -0.228 0.000 0.500
## TypQbc:Trtmntc -0.228 0.000 0.500 0.500
## fit warnings:
## fixed-effect model matrix is rank deficient so dropping 1 column / coefficient
*Cada una de las plantas va a tener un intercepto distinto. La media de los residuos cambia en relación al modelo lineal simple.
# Valores efectos variables:
ranef(Fit2)
## $Plant
## (Intercept)
## Qn1 -0.89792104
## Qn2 -0.07460288
## Qn3 0.97252392
## Qc1 -0.75881301
## Qc3 0.35571971
## Qc2 0.40309330
## Mn3 -0.78337733
## Mn2 0.59186336
## Mn1 0.19151397
## Mc2 -1.56595943
## Mc3 0.63343986
## Mc1 0.93251957
##
## with conditional variances for "Plant"
# Valores efectos fijos:
fixef(Fit2)
## (Intercept) I(log(conc))
## -33.530325 8.484494
## TypeQuebec:Treatmentnonchilled TypeMississippi:Treatmentnonchilled
## 19.519957 10.138867
## TypeQuebec:Treatmentchilled
## 15.940114
# Modelo lineal simple
broom::glance(Fit1)
## # A tibble: 1 × 12
## r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.823 0.814 4.67 91.7 6.94e-29 4 -246. 504. 519.
## # ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>
broom::tidy(Fit1)
## # A tibble: 6 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) -33.5 4.08 -8.23 3.17e-12
## 2 I(log(conc)) 8.48 0.678 12.5 2.01e-20
## 3 TypeQuebec:Treatmentnonchilled 19.5 1.44 13.6 2.59e-22
## 4 TypeMississippi:Treatmentnonchilled 10.1 1.44 7.04 6.26e-10
## 5 TypeQuebec:Treatmentchilled 15.9 1.44 11.1 9.85e-18
## 6 TypeMississippi:Treatmentchilled NA NA NA NA
El modelo lineal mixto no se tienen R cuadrado:
#Modelo lineal mixto:
broom.mixed::glance(Fit2)
## # A tibble: 1 × 7
## nobs sigma logLik AIC BIC REMLcrit df.residual
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
## 1 84 4.50 -241. 496. 513. 482. 77
broom.mixed::tidy(Fit2)
## # A tibble: 7 × 6
## effect group term estimate std.error statistic
## <chr> <chr> <chr> <dbl> <dbl> <dbl>
## 1 fixed <NA> (Intercept) -33.5 4.02 -8.34
## 2 fixed <NA> I(log(conc)) 8.48 0.654 13.0
## 3 fixed <NA> TypeQuebec:Treatmentnonchilled 19.5 1.83 10.6
## 4 fixed <NA> TypeMississippi:Treatmentnonch… 10.1 1.83 5.53
## 5 fixed <NA> TypeQuebec:Treatmentchilled 15.9 1.83 8.69
## 6 ran_pars Plant sd__(Intercept) 1.47 NA NA
## 7 ran_pars Residual sd__Observation 4.50 NA NA
broom.mixed::tidy(Fit2) %>% View()
En tidy se puede observar los valores estimados para la interacción entre las plantas y el enfriamento, la que tiene una estaimación mayor es TypeQuebec:Treatmentnonchilled seguida por TypeQuebec:Treatmentchilled, coon esto se puede confirmar lo encontrado en el analisis gráfico en donde con cualquiera de los dos tratamientos las plantas de Quebec tienen mayor captura de CO2.
### CONCLUSIÓNES:
Con esta función se agrega una familia, se agrega la familia poisson porque no se tienen datos decimales y son valores positivos -La base de datos ChickWeight (peso de pollos) es muy similar a la anterior, se tiene el peso, tiempo y dieta, tambien la identidad del pollo los cuales son 50
data("ChickWeight")
ggplot(ChickWeight, aes(x = Time, y = weight)) + geom_point(aes(color = Diet)) + geom_path(aes(color = Diet, group = Chick))
Fit1_Poisson <- glm(weight ~ Diet:Time, data = ChickWeight, family = poisson)
summary(Fit1_Poisson)
##
## Call:
## glm(formula = weight ~ Diet:Time, family = poisson, data = ChickWeight)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -12.0552 -1.1072 -0.3559 1.2662 8.3815
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.8589865 0.0093064 414.66 <2e-16 ***
## Diet1:Time 0.0655659 0.0007219 90.82 <2e-16 ***
## Diet2:Time 0.0753719 0.0007680 98.14 <2e-16 ***
## Diet3:Time 0.0862448 0.0007292 118.28 <2e-16 ***
## Diet4:Time 0.0822816 0.0007564 108.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 22444.4 on 577 degrees of freedom
## Residual deviance: 4037.8 on 573 degrees of freedom
## AIC: 7790.6
##
## Number of Fisher Scoring iterations: 4
Fit2_Poisson <- glmer(weight ~ Diet:Time + (1|Chick), data = ChickWeight, family = poisson)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?
summary(Fit2_Poisson)
## Generalized linear mixed model fit by maximum likelihood (Laplace
## Approximation) [glmerMod]
## Family: poisson ( log )
## Formula: weight ~ Diet:Time + (1 | Chick)
## Data: ChickWeight
##
## AIC BIC logLik deviance df.resid
## 5302.2 5328.3 -2645.1 5290.2 572
##
## Scaled residuals:
## Min 1Q Median 3Q Max
## -5.1497 -0.9033 0.0867 0.9551 5.1246
##
## Random effects:
## Groups Name Variance Std.Dev.
## Chick (Intercept) 0.04522 0.2127
## Number of obs: 578, groups: Chick, 50
##
## Fixed effects:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 3.838164 0.031528 121.74 <2e-16 ***
## Diet1:Time 0.066646 0.001046 63.72 <2e-16 ***
## Diet2:Time 0.074865 0.001293 57.90 <2e-16 ***
## Diet3:Time 0.086779 0.001232 70.47 <2e-16 ***
## Diet4:Time 0.076251 0.001263 60.37 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Correlation of Fixed Effects:
## (Intr) Dt1:Tm Dt2:Tm Dt3:Tm
## Diet1:Time -0.173
## Diet2:Time -0.124 0.021
## Diet3:Time -0.122 0.021 0.015
## Diet4:Time -0.117 0.017 0.014 0.014
## optimizer (Nelder_Mead) convergence code: 0 (OK)
## Model is nearly unidentifiable: very large eigenvalue
## - Rescale variables?
#Modleo lineal
broom::tidy(Fit1_Poisson)
## # A tibble: 5 × 5
## term estimate std.error statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 (Intercept) 3.86 0.00931 415. 0
## 2 Diet1:Time 0.0656 0.000722 90.8 0
## 3 Diet2:Time 0.0754 0.000768 98.1 0
## 4 Diet3:Time 0.0862 0.000729 118. 0
## 5 Diet4:Time 0.0823 0.000756 109. 0
#Modelo con mixto
broom.mixed::tidy(Fit2_Poisson)
## # A tibble: 6 × 7
## effect group term estimate std.error statistic p.value
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 fixed <NA> (Intercept) 3.84 0.0315 122. 0
## 2 fixed <NA> Diet1:Time 0.0666 0.00105 63.7 0
## 3 fixed <NA> Diet2:Time 0.0749 0.00129 57.9 0
## 4 fixed <NA> Diet3:Time 0.0868 0.00123 70.5 0
## 5 fixed <NA> Diet4:Time 0.0763 0.00126 60.4 0
## 6 ran_pars Chick sd__(Intercept) 0.213 NA NA NA
*El intercepto cambia. Hay variabilidad por los individuos.
View(broom::tidy(Fit1_Poisson))
View(broom.mixed::tidy(Fit2_Poisson))
Tomar en cuenta la variabilidad de un individuo Vs otro mejora el modelo. Se debe tener más de una medida para cada individuo