1 Introduction

The data set days absent contains attendance data on 314 high school juniors from high schools. The response variable of interest is days absent from school.

The standardized math score for each student was given as well as the program variable indicating the type of instructional program in which the student was enrolled.

Column 1: Student ID
Column 2: Gender ID
Column 3: Program ID
Column 4: Standardized mathematics score
Column 5: Days absent from school

The interest is to model the number of days of absence from school using student’s gender, the standardized test in math, and the type of program in which the student is enrolled.

2 Data management

#input data
dta <- read.csv("C:/Users/Ching-Fang Wu/Documents/data/days_absent.csv",h=T)

head(dta,8)

##      ID Gender  Program Math Days
## 1 S1001   male Academic   63    4
## 2 S1002   male Academic   27    4
## 3 S1003 female Academic   20    2
## 4 S1004 female Academic   16    3
## 5 S1005 female Academic    2    3
## 6 S1006 female Academic   71   13
## 7 S1007 female Academic   63   11
## 8 S1008   male Academic    3    7

tail(dta,8)

##        ID Gender    Program Math Days
## 307 S2150   male Vocational   63    3
## 308 S2151 female Vocational   59    7
## 309 S2152 female Vocational   46    1
## 310 S2153   male   Academic   26    1
## 311 S2154 female Vocational   79    3
## 312 S2155 female   Academic   59    0
## 313 S2156 female Vocational   90    0
## 314 S2157 female Vocational   77    2

str(dta)

## 'data.frame':    314 obs. of  5 variables:
##  $ ID     : chr  "S1001" "S1002" "S1003" "S1004" ...
##  $ Gender : chr  "male" "male" "female" "female" ...
##  $ Program: chr  "Academic" "Academic" "Academic" "Academic" ...
##  $ Math   : int  63 27 20 16 2 71 63 3 51 49 ...
##  $ Days   : int  4 4 2 3 3 13 11 7 10 9 ...

類別變數有三個：ID、Gender、Program 數值變數有兩個：Math、Days

summary(dta)

##       ID               Gender            Program               Math      
##  Length:314         Length:314         Length:314         Min.   : 1.00  
##  Class :character   Class :character   Class :character   1st Qu.:28.00  
##  Mode  :character   Mode  :character   Mode  :character   Median :48.00  
##                                                           Mean   :48.27  
##                                                           3rd Qu.:70.00  
##                                                           Max.   :99.00  
##       Days       
##  Min.   : 0.000  
##  1st Qu.: 1.000  
##  Median : 4.000  
##  Mean   : 5.955  
##  3rd Qu.: 8.000  
##  Max.   :35.000

標準化數學分數的平均值=48.27和中位數差不多。缺席天數平均值=5.955，中位數=4。

3 Data Visulization

library(ggplot2)
ggplot(dta, 
       aes(Days)) +
 geom_histogram(binwidth = 1, fill=8, col=1)+
 facet_grid(Gender ~ Program) + #依據Gender及Program來畫缺席天數次數表
 labs(x="Days absent", 
      y="Frequency") +
 theme_minimal()+
 theme(legend.position = c(.95, .9))

4 mean and variance

knitr::kable(aggregate(cbind(Days, Math)~ Gender+Program, data=dta, FUN=mean))

Gender	Program	Days	Math
female	Academic	7.759036	43.69880
male	Academic	6.119048	41.50000
female	General	12.363636	44.40909
male	General	8.555556	45.33333
female	Vocational	2.709091	58.00000
male	Vocational	2.634615	58.84615

Academic program 的女學生平均缺席天數為7.759，數學標準化分數平均為43.698。 General program的女學生平均缺席天數為12.364，數學標準化分數平均為44.409。 Vocational program的女學生平均缺席天數為2.709，數學標準化分數平均為58。

knitr::kable(aggregate(cbind(Days, Math)~ Gender+Program, data=dta, FUN=var))

Gender	Program	Days	Math
female	Academic	68.55098	586.7252
male	Academic	41.81698	689.8193
female	General	84.33766	804.5390
male	General	41.67320	923.5294
female	Vocational	10.28418	512.4815
male	Vocational	18.07956	364.9170

5 Effects of program, gender, and math

ggplot(dta, 
       aes(Math, Days, 
           color=Gender)) +
 stat_smooth(method="glm",
             formula=y ~ x,
             method.args=list(family=poisson),
             size=rel(.5)) +
 geom_point(pch=20, alpha=.5) +
 facet_grid(. ~ Program) +
 labs(y="Days absent", 
      x="Math score") +
 theme_minimal()+
 theme(legend.position = c(.95, .9))

Acdemic及Generalprogram的學生，不論男女，缺課天數越少，數學分數越高。

不同program學生的缺課天數，缺課天數最多的是General的學生，Acdemic次之，Vactional最少。

6 Parameter estimates

sjPlot::tab_model(m0 <- glm(Days ~ Program*Gender*Math, data=dta,family=poisson(link=log)), show.se=T, show.r2=F, show.obs=F)

	Days
Predictors	Incidence Rate Ratios	std. Error	CI	p
(Intercept)	10.43	0.08	8.94 – 12.14	<0.001
Program [General]	1.47	0.13	1.13 – 1.91	0.004
Program [Vocational]	0.23	0.25	0.14 – 0.36	<0.001
Gender [male]	0.85	0.11	0.69 – 1.06	0.146
Math	0.99	0.00	0.99 – 1.00	<0.001
Program [General] * Gender [male]	0.97	0.20	0.65 – 1.43	0.863
Program [Vocational] * Gender [male]	1.59	0.37	0.76 – 3.27	0.214
Program [General] * Math	1.00	0.00	1.00 – 1.01	0.479
Program [Vocational] * Math	1.01	0.00	1.00 – 1.02	0.019
Gender [male] * Math	1.00	0.00	0.99 – 1.00	0.274
(Program [General] Gender [male]) Math	1.00	0.00	0.99 – 1.01	0.692
(Program [Vocational] Gender [male]) Math	1.00	0.01	0.98 – 1.01	0.634

7 Goodness of fit and overdispersion

1 - pchisq(deviance(m0), df.residual(m0))

## [1] 0

performance::check_overdispersion(m0)

## # Overdispersion test
## 
##        dispersion ratio =    6.492
##   Pearson's Chi-Squared = 1960.446
##                 p-value =  < 0.001

## Overdispersion detected.

dispersion ratio = 6.492

plot(log(fitted(m0)), log((dta$Days-fitted(m0))^2), 
     xlab=expression(hat(mu)),
     ylab=expression((y-hat(mu))^2))
grid()
abline(0, 1, col=2)

8 Dispersion parameter

knitr::kable(as.data.frame(summary(m0, dispersion=6.492)$coef))

	Estimate	Std. Error	z value	Pr(>\|z\|)
(Intercept)	2.3450136	0.1989867	11.7847779	0.0000000
ProgramGeneral	0.3873043	0.3411743	1.1352096	0.2562875
ProgramVocational	-1.4878389	0.6257345	-2.3777477	0.0174187
Gendermale	-0.1588403	0.2785265	-0.5702878	0.5684825
Math	-0.0071045	0.0043252	-1.6425893	0.1004679
ProgramGeneral:Gendermale	-0.0347250	0.5144980	-0.0674929	0.9461893
ProgramVocational:Gendermale	0.4617659	0.9469746	0.4876222	0.6258175
ProgramGeneral:Math	0.0019809	0.0071360	0.2775861	0.7813301
ProgramVocational:Math	0.0094842	0.0103312	0.9180121	0.3586125
Gendermale:Math	-0.0026914	0.0062654	-0.4295662	0.6675112
ProgramGeneral:Gendermale:Math	-0.0017088	0.0110050	-0.1552787	0.8766016
ProgramVocational:Gendermale:Math	-0.0029735	0.0159262	-0.1867067	0.8518906

Week8 homework 2

Ching-Fang Wu

Sun Nov 01 18:16:38 2020