completing approaches

unique tells us all factors in var. length says how many different var.

length(unique(popula$school))
## [1] 100

How many pupills in each school

popula %>% 
  group_by(school) %>% 
  summarise(n_sch = n())

First model ecological analysis

summarizing the mean of popularity and sex by school

schoold <- popula %>% 
  group_by(school) %>% 
  summarise(mean_p = mean(popular, na.rm = TRUE), mean_s = mean(sex, na.rm = TRUE))
head(schoold)

so in school 1, the mean popularity perception of pupils is 7.55 and the mean of sex is .50

ecoreg <- lm(mean_p ~ mean_s, data = schoold)
summary(ecoreg)
## 
## Call:
## lm(formula = mean_p ~ mean_s, data = schoold)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.75980 -0.66987  0.00539  0.56354  2.21650 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.3813     0.4528   9.676 6.12e-16 ***
## mean_s        1.9045     0.9108   2.091   0.0391 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9394 on 98 degrees of freedom
## Multiple R-squared:  0.04271,    Adjusted R-squared:  0.03294 
## F-statistic: 4.373 on 1 and 98 DF,  p-value: 0.03911

On average higher proportion of females in school is associated to a 1.9045.

The fallacy of ecological level association student-level causal connection.

dc ## CPOOLING MODEL

# cpooling <- gls(popular ~ sex, data = popula, method = "ML")
cpooling <- lm(popular ~ sex, data = popula)
summary(cpooling)
## 
## Call:
## lm(formula = popular ~ sex, data = popula)
## 
## Residuals:
## popularity according to sociometric score 
##     Min      1Q  Median      3Q     Max 
## -2.8782 -0.8782  0.1218  1.1218  4.1218 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.87817    0.03572  136.58   <2e-16 ***
## sex          0.88261    0.05118   17.25   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.144 on 1998 degrees of freedom
## Multiple R-squared:  0.1296, Adjusted R-squared:  0.1291 
## F-statistic: 297.4 on 1 and 1998 DF,  p-value: < 2.2e-16

This model shows that sex influences popularity but, fallacy environment might have influence

School environment may be an important determiant of what is popular and what is not There may be important between-school variations in the relationship The overall relationship is easily dominated by a few very large schools

NO POOLING

dcoef <- popula %>% 
    group_by(school) %>% 
    do(mod = lm(popular ~ sex, data = .))
coef <- dcoef %>% do(data.frame(intc = coef(.$mod)[1]))
ggplot(coef, aes(x = intc)) + geom_histogram()

This histogram states the average pop. index for males in the 100 schools. Intercept signifies the start the avg. of pop of males.

GENDER DIFFERENCE by looking at slope!

dcoef <- popula %>% 
    group_by(school) %>% 
    do(mod = lm(popular ~ sex, data = .))
coef <- dcoef %>% do(data.frame(sexc = coef(.$mod)[2]))
ggplot(coef, aes(x = sexc)) + geom_histogram()

slopes signifies difference in variable gender. in some schools boys feel more popular than girls this is shown from the below 0 area. but, majority of schools girls feel more popular than boys.

Partial Pooling

compared to complete-pooling * partial pooling model allows between- school variations. compared to no-pooling * distribution superimpose.

Before looking a plot create a superimpose hypothesis.

Random Intercept

m1_lme <- lme(popular ~ sex, data = popula, random = ~1|school, method = "ML")
summary(m1_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##        AIC     BIC    logLik
##   4492.886 4515.29 -2242.443
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.9237762 0.6779928
## 
## Fixed effects: popular ~ sex 
##                Value  Std.Error   DF  t-value p-value
## (Intercept) 4.897215 0.09487616 1899 51.61692       0
## sex         0.843713 0.03097064 1899 27.24234       0
##  Correlation: 
##     (Intr)
## sex -0.159
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -3.319094760 -0.688695720  0.002237255  0.595886663  3.825100008 
## 
## Number of Observations: 2000
## Number of Groups: 100

random intercept does not assume the affect to be diff. we only assume that the intercept follows a normal distribution. random= ~1 gives us the random school intercept. Random effects is divided into two parts residual and intercept. instead of plot 100 schools intercept in this one we are forcing schools to follow a normal intercept distribution.

Multilevel model #2

m2_lme <- lme(popular ~ sex, data = popula, random = ~ sex|school, method = "ML")
summary(m2_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##        AIC      BIC    logLik
##   4341.617 4375.223 -2164.809
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.9645887 (Intr)
## sex         0.5185785 -0.278
## Residual    0.6264514       
## 
## Fixed effects: popular ~ sex 
##                Value  Std.Error   DF  t-value p-value
## (Intercept) 4.890112 0.09856659 1899 49.61226       0
## sex         0.843120 0.05936106 1899 14.20325       0
##  Correlation: 
##     (Intr)
## sex -0.307
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -2.948021007 -0.657669324 -0.003250328  0.535579989  3.530548405 
## 
## Number of Observations: 2000
## Number of Groups: 100

random = ~ sex|school want to see random intercept and random affect of slope.

fixed affect gives the common trend shared by 100 schools.

AIC(cpooling, m1_lme, m2_lme)

Adding school-level covariate (adding teacher experience)

m3_lme <- lme(popular ~ sex + texp, data = popula, random = ~ sex|school, method = "ML")
summary(m3_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##        AIC      BIC    logLik
##   4275.175 4314.382 -2130.588
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.6344228 (Intr)
## sex         0.5193270 0.064 
## Residual    0.6264869       
## 
## Fixed effects: popular ~ sex + texp 
##                Value  Std.Error   DF  t-value p-value
## (Intercept) 3.339973 0.15928089 1899 20.96907       0
## sex         0.843175 0.05943020 1899 14.18765       0
## texp        0.108353 0.01011962   98 10.70718       0
##  Correlation: 
##      (Intr) sex   
## sex  -0.020       
## texp -0.908  0.000
## 
## Standardized Within-Group Residuals:
##          Min           Q1          Med           Q3          Max 
## -2.937854583 -0.616381486  0.002220378  0.528756210  3.518629489 
## 
## Number of Observations: 2000
## Number of Groups: 100

teachers experience seem to increase pupils popularity.

Interaction model

m4_lme <- lme(popular ~ sex*texp, data = popula, random = ~ sex|school, method = "ML")
summary(m4_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##       AIC      BIC    logLik
##   4261.85 4306.657 -2122.925
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.6347377 (Intr)
## sex         0.4692521 0.08  
## Residual    0.6264320       
## 
## Fixed effects: popular ~ sex * texp 
##                 Value  Std.Error   DF   t-value p-value
## (Intercept)  3.313651 0.15954654 1898 20.769180   0e+00
## sex          1.329479 0.13183479 1898 10.084432   0e+00
## texp         0.110229 0.01013882   98 10.872007   0e+00
## sex:texp    -0.034025 0.00837995 1898 -4.060303   1e-04
##  Correlation: 
##          (Intr) sex    texp  
## sex      -0.046              
## texp     -0.909  0.042       
## sex:texp  0.042 -0.908 -0.046
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.93387434 -0.64733763  0.02294381  0.53272602  3.49150352 
## 
## Number of Observations: 2000
## Number of Groups: 100

by creating interaction between sex and teach.exp. fixed effect seem to be a significant interaction between gender and teach exp. on avg. the popularity indexx for a boy with teacher that has 0exp. is 3.31 . for a boy whose teacher has 1 yr of exp. is 1.32 + .11

for females- on avverage 3.3136+ 1.3294 for female with 1 yr of exp= 3.3136 + 1.3294 + (-.034)

CENTERING COVARIATES

creates a new center the mean becomes the starting point. 0. anything below mean will fall below 0 and those above will be higher than 0.

popula %<>% mutate(ctexp = texp - mean(texp))

Random intercept for school and slope for sex with intearction with centered teacher exp.

m4a_lme <- lme(popular ~ sex*ctexp, data = popula, random = ~ sex|school, method = "ML")
summary(m4a_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##       AIC      BIC    logLik
##   4261.85 4306.657 -2122.925
## 
## Random effects:
##  Formula: ~sex | school
##  Structure: General positive-definite, Log-Cholesky parametrization
##             StdDev    Corr  
## (Intercept) 0.6347377 (Intr)
## sex         0.4692521 0.08  
## Residual    0.6264320       
## 
## Fixed effects: popular ~ sex * ctexp 
##                 Value  Std.Error   DF  t-value p-value
## (Intercept)  4.885851 0.06660875 1898 73.35149   0e+00
## sex          0.844178 0.05510824 1898 15.31855   0e+00
## ctexp        0.110229 0.01013882   98 10.87201   0e+00
## sex:ctexp   -0.034025 0.00837995 1898 -4.06030   1e-04
##  Correlation: 
##           (Intr) sex    ctexp 
## sex       -0.044              
## ctexp     -0.006  0.000       
## sex:ctexp  0.000 -0.004 -0.046
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.93387434 -0.64733763  0.02294381  0.53272602  3.49150352 
## 
## Number of Observations: 2000
## Number of Groups: 100

fixed effect: on average males perception of popularity is 4.88 when teacher exp. is averaged. on average females perception of pop. is 4.88 + .844 when teacher exp is averaged.

when teachers exp. increases by 1 year boys perception is 4.885 + .11 when teachers exp. increases by 1 year more than the average is 4.885 + .844 + -.03

texreg(list(m4_lme, m4a_lme))
## 
## \begin{table}
## \begin{center}
## \begin{tabular}{l c c }
## \hline
##  & Model 1 & Model 2 \\
## \hline
## (Intercept)    & $3.31^{***}$  & $4.89^{***}$  \\
##                & $(0.16)$      & $(0.07)$      \\
## sex            & $1.33^{***}$  & $0.84^{***}$  \\
##                & $(0.13)$      & $(0.06)$      \\
## texp           & $0.11^{***}$  &               \\
##                & $(0.01)$      &               \\
## sex:texp       & $-0.03^{***}$ &               \\
##                & $(0.01)$      &               \\
## ctexp          &               & $0.11^{***}$  \\
##                &               & $(0.01)$      \\
## sex:ctexp      &               & $-0.03^{***}$ \\
##                &               & $(0.01)$      \\
## \hline
## AIC            & 4261.85       & 4261.85       \\
## BIC            & 4306.66       & 4306.66       \\
## Log Likelihood & -2122.92      & -2122.92      \\
## Num. obs.      & 2000          & 2000          \\
## Num. groups    & 100           & 100           \\
## \hline
## \multicolumn{3}{l}{\scriptsize{$^{***}p<0.001$, $^{**}p<0.01$, $^*p<0.05$}}
## \end{tabular}
## \caption{Statistical models}
## \label{table:coefficients}
## \end{center}
## \end{table}

Intra- class correlation

popularity mainly an individual level no gender or teacher exp.

m0_lme <- lme(popular ~ 1, random = ~ 1|school, data = popula, method = "ML")
summary(m0_lme)
## Linear mixed-effects model fit by maximum likelihood
##  Data: popula 
##        AIC      BIC    logLik
##   5118.722 5135.525 -2556.361
## 
## Random effects:
##  Formula: ~1 | school
##         (Intercept)  Residual
## StdDev:   0.9331052 0.7991726
## 
## Fixed effects: popular ~ 1 
##                Value Std.Error   DF  t-value p-value
## (Intercept) 5.307603 0.0950469 1900 55.84194       0
## 
## Standardized Within-Group Residuals:
##         Min          Q1         Med          Q3         Max 
## -2.88722829 -0.63321133 -0.05066821  0.71097805  3.00407338 
## 
## Number of Observations: 2000
## Number of Groups: 100

.933/ (.933+ .799)= .538 the difference of individual pop. .538 between schools.

Computing confidence intervals

intervals(m0_lme)
## Approximate 95% confidence intervals
## 
##  Fixed effects:
##                lower     est.    upper
## (Intercept) 5.121243 5.307603 5.493964
## attr(,"label")
## [1] "Fixed effects:"
## 
##  Random Effects:
##   Level: school 
##                    lower      est.    upper
## sd((Intercept)) 0.801257 0.9331052 1.086649
## 
##  Within-group standard error:
##     lower      est.     upper 
## 0.7741578 0.7991726 0.8249956