completing approaches
unique tells us all factors in var. length says how many different var.
length(unique(popula$school))
## [1] 100
How many pupills in each school
popula %>%
group_by(school) %>%
summarise(n_sch = n())
schoold <- popula %>%
group_by(school) %>%
summarise(mean_p = mean(popular, na.rm = TRUE), mean_s = mean(sex, na.rm = TRUE))
head(schoold)
ecoreg <- lm(mean_p ~ mean_s, data = schoold)
summary(ecoreg)
##
## Call:
## lm(formula = mean_p ~ mean_s, data = schoold)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.75980 -0.66987 0.00539 0.56354 2.21650
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.3813 0.4528 9.676 6.12e-16 ***
## mean_s 1.9045 0.9108 2.091 0.0391 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9394 on 98 degrees of freedom
## Multiple R-squared: 0.04271, Adjusted R-squared: 0.03294
## F-statistic: 4.373 on 1 and 98 DF, p-value: 0.03911
The fallacy of ecological level association student-level causal connection.
dc ## CPOOLING MODEL
# cpooling <- gls(popular ~ sex, data = popula, method = "ML")
cpooling <- lm(popular ~ sex, data = popula)
summary(cpooling)
##
## Call:
## lm(formula = popular ~ sex, data = popula)
##
## Residuals:
## popularity according to sociometric score
## Min 1Q Median 3Q Max
## -2.8782 -0.8782 0.1218 1.1218 4.1218
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.87817 0.03572 136.58 <2e-16 ***
## sex 0.88261 0.05118 17.25 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.144 on 1998 degrees of freedom
## Multiple R-squared: 0.1296, Adjusted R-squared: 0.1291
## F-statistic: 297.4 on 1 and 1998 DF, p-value: < 2.2e-16
School environment may be an important determiant of what is popular and what is not There may be important between-school variations in the relationship The overall relationship is easily dominated by a few very large schools
dcoef <- popula %>%
group_by(school) %>%
do(mod = lm(popular ~ sex, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()
This histogram states the average pop. index for males in the 100 schools. Intercept signifies the start the avg. of pop of males.
dcoef <- popula %>%
group_by(school) %>%
do(mod = lm(popular ~ sex, data = .))
coef <- dcoef %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(coef, aes(x = sexc)) + geom_histogram()
slopes signifies difference in variable gender. in some schools boys feel more popular than girls this is shown from the below 0 area. but, majority of schools girls feel more popular than boys.
compared to complete-pooling * partial pooling model allows between- school variations. compared to no-pooling * distribution superimpose.
Before looking a plot create a superimpose hypothesis.
m1_lme <- lme(popular ~ sex, data = popula, random = ~1|school, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 4492.886 4515.29 -2242.443
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.9237762 0.6779928
##
## Fixed effects: popular ~ sex
## Value Std.Error DF t-value p-value
## (Intercept) 4.897215 0.09487616 1899 51.61692 0
## sex 0.843713 0.03097064 1899 27.24234 0
## Correlation:
## (Intr)
## sex -0.159
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -3.319094760 -0.688695720 0.002237255 0.595886663 3.825100008
##
## Number of Observations: 2000
## Number of Groups: 100
random intercept does not assume the affect to be diff. we only assume that the intercept follows a normal distribution. random= ~1 gives us the random school intercept. Random effects is divided into two parts residual and intercept. instead of plot 100 schools intercept in this one we are forcing schools to follow a normal intercept distribution.
m2_lme <- lme(popular ~ sex, data = popula, random = ~ sex|school, method = "ML")
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 4341.617 4375.223 -2164.809
##
## Random effects:
## Formula: ~sex | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.9645887 (Intr)
## sex 0.5185785 -0.278
## Residual 0.6264514
##
## Fixed effects: popular ~ sex
## Value Std.Error DF t-value p-value
## (Intercept) 4.890112 0.09856659 1899 49.61226 0
## sex 0.843120 0.05936106 1899 14.20325 0
## Correlation:
## (Intr)
## sex -0.307
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.948021007 -0.657669324 -0.003250328 0.535579989 3.530548405
##
## Number of Observations: 2000
## Number of Groups: 100
random = ~ sex|school want to see random intercept and random affect of slope.
fixed affect gives the common trend shared by 100 schools.
states on average the avg. popularity index is 4.89 for boys on the otherhand the is subject to diff. school levels.
and average of girls is .8431 more than.
random effect is unniquue to an individual school
AIC(cpooling, m1_lme, m2_lme)
m3_lme <- lme(popular ~ sex + texp, data = popula, random = ~ sex|school, method = "ML")
summary(m3_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 4275.175 4314.382 -2130.588
##
## Random effects:
## Formula: ~sex | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.6344228 (Intr)
## sex 0.5193270 0.064
## Residual 0.6264869
##
## Fixed effects: popular ~ sex + texp
## Value Std.Error DF t-value p-value
## (Intercept) 3.339973 0.15928089 1899 20.96907 0
## sex 0.843175 0.05943020 1899 14.18765 0
## texp 0.108353 0.01011962 98 10.70718 0
## Correlation:
## (Intr) sex
## sex -0.020
## texp -0.908 0.000
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.937854583 -0.616381486 0.002220378 0.528756210 3.518629489
##
## Number of Observations: 2000
## Number of Groups: 100
teachers experience seem to increase pupils popularity.
m4_lme <- lme(popular ~ sex*texp, data = popula, random = ~ sex|school, method = "ML")
summary(m4_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 4261.85 4306.657 -2122.925
##
## Random effects:
## Formula: ~sex | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.6347377 (Intr)
## sex 0.4692521 0.08
## Residual 0.6264320
##
## Fixed effects: popular ~ sex * texp
## Value Std.Error DF t-value p-value
## (Intercept) 3.313651 0.15954654 1898 20.769180 0e+00
## sex 1.329479 0.13183479 1898 10.084432 0e+00
## texp 0.110229 0.01013882 98 10.872007 0e+00
## sex:texp -0.034025 0.00837995 1898 -4.060303 1e-04
## Correlation:
## (Intr) sex texp
## sex -0.046
## texp -0.909 0.042
## sex:texp 0.042 -0.908 -0.046
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.93387434 -0.64733763 0.02294381 0.53272602 3.49150352
##
## Number of Observations: 2000
## Number of Groups: 100
by creating interaction between sex and teach.exp. fixed effect seem to be a significant interaction between gender and teach exp. on avg. the popularity indexx for a boy with teacher that has 0exp. is 3.31 . for a boy whose teacher has 1 yr of exp. is 1.32 + .11
for females- on avverage 3.3136+ 1.3294 for female with 1 yr of exp= 3.3136 + 1.3294 + (-.034)
creates a new center the mean becomes the starting point. 0. anything below mean will fall below 0 and those above will be higher than 0.
popula %<>% mutate(ctexp = texp - mean(texp))
Random intercept for school and slope for sex with intearction with centered teacher exp.
m4a_lme <- lme(popular ~ sex*ctexp, data = popula, random = ~ sex|school, method = "ML")
summary(m4a_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 4261.85 4306.657 -2122.925
##
## Random effects:
## Formula: ~sex | school
## Structure: General positive-definite, Log-Cholesky parametrization
## StdDev Corr
## (Intercept) 0.6347377 (Intr)
## sex 0.4692521 0.08
## Residual 0.6264320
##
## Fixed effects: popular ~ sex * ctexp
## Value Std.Error DF t-value p-value
## (Intercept) 4.885851 0.06660875 1898 73.35149 0e+00
## sex 0.844178 0.05510824 1898 15.31855 0e+00
## ctexp 0.110229 0.01013882 98 10.87201 0e+00
## sex:ctexp -0.034025 0.00837995 1898 -4.06030 1e-04
## Correlation:
## (Intr) sex ctexp
## sex -0.044
## ctexp -0.006 0.000
## sex:ctexp 0.000 -0.004 -0.046
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.93387434 -0.64733763 0.02294381 0.53272602 3.49150352
##
## Number of Observations: 2000
## Number of Groups: 100
fixed effect: on average males perception of popularity is 4.88 when teacher exp. is averaged. on average females perception of pop. is 4.88 + .844 when teacher exp is averaged.
when teachers exp. increases by 1 year boys perception is 4.885 + .11 when teachers exp. increases by 1 year more than the average is 4.885 + .844 + -.03
texreg(list(m4_lme, m4a_lme))
##
## \begin{table}
## \begin{center}
## \begin{tabular}{l c c }
## \hline
## & Model 1 & Model 2 \\
## \hline
## (Intercept) & $3.31^{***}$ & $4.89^{***}$ \\
## & $(0.16)$ & $(0.07)$ \\
## sex & $1.33^{***}$ & $0.84^{***}$ \\
## & $(0.13)$ & $(0.06)$ \\
## texp & $0.11^{***}$ & \\
## & $(0.01)$ & \\
## sex:texp & $-0.03^{***}$ & \\
## & $(0.01)$ & \\
## ctexp & & $0.11^{***}$ \\
## & & $(0.01)$ \\
## sex:ctexp & & $-0.03^{***}$ \\
## & & $(0.01)$ \\
## \hline
## AIC & 4261.85 & 4261.85 \\
## BIC & 4306.66 & 4306.66 \\
## Log Likelihood & -2122.92 & -2122.92 \\
## Num. obs. & 2000 & 2000 \\
## Num. groups & 100 & 100 \\
## \hline
## \multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
## \end{tabular}
## \caption{Statistical models}
## \label{table:coefficients}
## \end{center}
## \end{table}
m0_lme <- lme(popular ~ 1, random = ~ 1|school, data = popula, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
## Data: popula
## AIC BIC logLik
## 5118.722 5135.525 -2556.361
##
## Random effects:
## Formula: ~1 | school
## (Intercept) Residual
## StdDev: 0.9331052 0.7991726
##
## Fixed effects: popular ~ 1
## Value Std.Error DF t-value p-value
## (Intercept) 5.307603 0.0950469 1900 55.84194 0
##
## Standardized Within-Group Residuals:
## Min Q1 Med Q3 Max
## -2.88722829 -0.63321133 -0.05066821 0.71097805 3.00407338
##
## Number of Observations: 2000
## Number of Groups: 100
.933/ (.933+ .799)= .538 the difference of individual pop. .538 between schools.
intervals(m0_lme)
## Approximate 95% confidence intervals
##
## Fixed effects:
## lower est. upper
## (Intercept) 5.121243 5.307603 5.493964
## attr(,"label")
## [1] "Fixed effects:"
##
## Random Effects:
## Level: school
## lower est. upper
## sd((Intercept)) 0.801257 0.9331052 1.086649
##
## Within-group standard error:
## lower est. upper
## 0.7741578 0.7991726 0.8249956