Two measures of bullying behavior have been constructed. The first uses self-report on a 9-item Illinois Bully Scale. The second is peer nomination in which children list any person they perceive as a bully. The total number of nominations a person receives is a measure of bullying behavior.
Source: Espelage, D.L., Holt, M.K., & Henkel, R.R. (2003). Examination of Peer-Group Contextual Effects on Aggression During Early Adolescence. Child Development, 74, 205-220.
pacman::p_load(tidyverse, MASS, pscl)
dta1<- read.table("bullying.txt", h=T)
str(dta1)
## 'data.frame': 291 obs. of 2 variables:
## $ Score : num 1.56 1.56 1.11 1.56 1.22 1.33 1.11 1 1.11 1.22 ...
## $ Nomination: int 0 0 0 0 4 0 0 0 0 19 ...
head(dta1)
## Score Nomination
## 1 1.56 0
## 2 1.56 0
## 3 1.11 0
## 4 1.56 0
## 5 1.22 4
## 6 1.33 0
# proportion of zeros
sum(dta1$Nomination < 1)/length(dta1$Nomination)
## [1] 0.5635739
colMeans(dta1)
## Score Nomination
## 1.658110 2.487973
# use the Freedman-Diaconis rule for bin width
bd <- 2*IQR(dta1$Nomination)/(dim(dta1)[1]^(1/3))
ot<-theme_set(theme_bw())
# histogram of nomination
ggplot(dta1, aes(Nomination))+
stat_bin(binwidth = bd)+
labs(x="Number of nominations",
y="Count")+
theme_bw()
#Poisson fit
ggplot(dta1, aes(Score, Nomination))+
geom_jitter(alpha=.2)+
stat_smooth(method = "glm", method.args = list(family=poisson))+
labs(y="Number of peer nomination",
x="Bully score")
#poisoon model
summary(m0<-glm(Nomination~Score, family=poisson, data=dta1))
##
## Call:
## glm(formula = Nomination ~ Score, family = poisson, data = dta1)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -6.4995 -1.8246 -1.5952 -0.1552 11.6806
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.66308 0.08871 -7.475 7.75e-14 ***
## Score 0.81434 0.03504 23.239 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 2205.2 on 290 degrees of freedom
## Residual deviance: 1774.6 on 289 degrees of freedom
## AIC: 2160.3
##
## Number of Fisher Scoring iterations: 6
# zero-inflated poisson
summary(m1<-zeroinfl(Nomination~Score, data=dta1)) #equivalent Nomination~Score|Score
##
## Call:
## zeroinfl(formula = Nomination ~ Score, data = dta1)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -2.37068 -0.68873 -0.59685 -0.05357 9.87491
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.49604 0.09987 4.967 6.81e-07 ***
## Score 0.59609 0.03942 15.120 < 2e-16 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.4967 0.3463 4.322 1.54e-05 ***
## Score -0.7716 0.1967 -3.924 8.72e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 8
## Log-likelihood: -780.6 on 4 Df
# zero-inflated negative binomial distribution
summary(m2<-zeroinfl(Nomination~Score, data=dta1, dist="negbin"))
##
## Call:
## zeroinfl(formula = Nomination ~ Score, data = dta1, dist = "negbin")
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.58338 -0.48698 -0.38705 0.02297 6.96982
##
## Count model coefficients (negbin with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.6737 0.4354 -1.547 0.122
## Score 0.8819 0.2052 4.297 1.73e-05 ***
## Log(theta) -1.0658 0.1792 -5.947 2.73e-09 ***
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.954 2.484 1.189 0.234
## Score -3.419 2.260 -1.513 0.130
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Theta = 0.3445
## Number of iterations in BFGS optimization: 21
## Log-likelihood: -494.7 on 5 Df
vuong(m0, m1)
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw -4.891816 model2 > model1 4.9955e-07
## AIC-corrected -4.858930 model2 > model1 5.9011e-07
## BIC-corrected -4.798531 model2 > model1 7.9917e-07
vuong(m1, m2)
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw -4.242754 model2 > model1 1.104e-05
## AIC-corrected -4.242754 model2 > model1 1.104e-05
## BIC-corrected -4.242754 model2 > model1 1.104e-05
# Fortify data with fitted values and residuals
dta1_m<- data.frame(dta1,
fit1=predict(m1, type="response"),
fit2=predict(m2, type="response"),
r1=resid(m1, type="pearson"),
r2=resid(m2, type="pearson"))
ggplot(dta1_m, aes(fit1, r1))+
geom_point(pch=20)+
geom_point(aes(fit2,r2), pch=1)+
geom_hline(yintercept = 0)+
labs(x="Fitted values", y="Residuals",
title="Poisson vs. Negative Binomial")
#Fortify data with model fits and CIs
dta1_m2<-data.frame(dta1, yhat=predict(m2))
#model fit over observations
ggplot(dta1_m2, aes(Score, Nomination))+
geom_point(pch=20, alpha=.2)+
geom_line(aes(Score, yhat), color="magenta", lwd=rel(1))+
stat_smooth(method = "glm", method.args = list(family=poisson))+
labs(x="Bully score", y="Predicted number of nomination")
A sample of 978 young men aged 20 to 22 years was extracted from the National Longitudinal Survey of Youth (NLSY). The outcome of interest is whether a youth’s reported major activity in 1985 was (1) working, (2) in school, or (3) inactive. Six additional variables were also recorded.
Source: Powers, D.A., & Xie, Y. (2000). Statistical Methods for Categorical Data Analysis. p. 232.
dta2<- read.table("youth_employment.txt", h=T)
head(dta2)
## Employment Race Family DadEdu Income Local ASAB
## 1 Work Others Non-Intact Otherwise 0.4909 6.9 -0.110
## 2 School Others Non-Intact Otherwise 0.6940 4.8 0.452
## 3 Work Others Otherwise Otherwise 1.1000 6.5 0.967
## 4 Work Others Non-Intact College 1.5000 3.8 1.667
## 5 Work Others Non-Intact Otherwise 0.2544 6.9 0.000
## 6 School Others Non-Intact Otherwise 0.9391 5.4 0.000
str(dta2)
## 'data.frame': 978 obs. of 7 variables:
## $ Employment: chr "Work" "School" "Work" "Work" ...
## $ Race : chr "Others" "Others" "Others" "Others" ...
## $ Family : chr "Non-Intact" "Non-Intact" "Otherwise" "Non-Intact" ...
## $ DadEdu : chr "Otherwise" "Otherwise" "Otherwise" "College" ...
## $ Income : num 0.491 0.694 1.1 1.5 0.254 ...
## $ Local : num 6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
## $ ASAB : num -0.11 0.452 0.967 1.667 0 ...
dta2$Race <- relevel(factor(dta2$Race), ref="Others", levels=c("Black", "Others"))
dta2$Family<-relevel(factor(dta2$Family), ref="Otherwise")
dta2$DadEdu<-relevel(factor(dta2$DadEdu), ref="Otherwise")
dta2$Incf<- factor(cut(dta2$Income,
breaks=c(quantile(dta2$Income, probs=seq(0, 1, by=.33))),
labels=c("Low", "Middle","High"), ordered=T, lowest.inclued=T))
dta2$Localf<- factor(cut(dta2$Local,
breaks=c(quantile(dta2$Local, probs=seq(0, 1, by=.5))),
labels=c("Low","High"), ordered=T, lowest.inclued=T))
dta2$ASABf<- factor(cut(dta2$ASAB,
breaks=c(quantile(dta2$ASAB, probs=seq(0, 1, by=.33))),
labels=c("Low", "Middle","High"), ordered=T, lowest.inclued=T))
str(dta2)
## 'data.frame': 978 obs. of 10 variables:
## $ Employment: chr "Work" "School" "Work" "Work" ...
## $ Race : Factor w/ 2 levels "Others","Black": 1 1 1 1 1 1 1 1 1 1 ...
## $ Family : Factor w/ 2 levels "Otherwise","Non-Intact": 2 2 1 2 2 2 1 1 2 1 ...
## $ DadEdu : Factor w/ 2 levels "Otherwise","College": 1 1 1 2 1 1 1 2 2 2 ...
## $ Income : num 0.491 0.694 1.1 1.5 0.254 ...
## $ Local : num 6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
## $ ASAB : num -0.11 0.452 0.967 1.667 0 ...
## $ Incf : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 2 3 3 1 3 3 3 3 3 ...
## $ Localf : Ord.factor w/ 2 levels "Low"<"High": 1 1 1 1 1 1 1 1 1 1 ...
## $ ASABf : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 3 3 NA 2 2 2 3 3 3 ...
with(dta2,ftable(Race,Family, DadEdu,Employment))
## Employment Idle School Work
## Race Family DadEdu
## Others Otherwise Otherwise 84 161 141
## College 23 66 72
## Non-Intact Otherwise 47 43 54
## College 5 12 18
## Black Otherwise Otherwise 28 60 26
## College 6 8 3
## Non-Intact Otherwise 39 40 27
## College 3 10 2
a<-as.data.frame(with(dta2,ftable(Race,Family, DadEdu,Employment)))
with(dta2, ftable(Incf,ASABf,Localf, Employment))
## Employment Idle School Work
## Incf ASABf Localf
## Low Low Low 26 39 19
## High 42 34 19
## Middle Low 21 22 15
## High 11 26 10
## High Low 3 13 8
## High 4 5 3
## Middle Low Low 13 20 24
## High 11 24 10
## Middle Low 22 28 36
## High 6 16 13
## High Low 9 22 29
## High 9 19 12
## High Low Low 6 9 10
## High 4 6 5
## Middle Low 18 23 16
## High 4 16 15
## High Low 15 40 59
## High 11 26 27
b<-as.data.frame(with(dta2, ftable(Incf,ASABf,Localf, Employment)))
dta2_fn<-subset(dta2, Employment!="School")%>% mutate(resp=ifelse(Employment=="Work", 1,0))
dta2_fp<-subset(dta2, Employment!="Work") %>% mutate(resp=ifelse(Employment=="School",1,0))
m_fn<-glm(resp~Race+Family+DadEdu+Income+Local+ASAB, data=dta2_fn, family=binomial)
m_fp<-glm(resp~Race+Family+DadEdu+Income+Local+ASAB, data=dta2_fp, family=binomial)
sjPlot::tab_model(m_fn, m_fp, show.obs = F, show.r2 = F)
| resp | resp | |||||
|---|---|---|---|---|---|---|
| Predictors | Odds Ratios | CI | p | Odds Ratios | CI | p |
| (Intercept) | 2.02 | 1.00 – 4.10 | 0.051 | 1.52 | 0.78 – 2.94 | 0.218 |
| Race [Black] | 0.64 | 0.41 – 0.99 | 0.046 | 1.25 | 0.85 – 1.84 | 0.265 |
| Family [Non-Intact] | 0.90 | 0.62 – 1.33 | 0.609 | 0.57 | 0.40 – 0.82 | 0.003 |
| DadEdu [College] | 1.19 | 0.74 – 1.92 | 0.471 | 1.28 | 0.81 – 2.04 | 0.303 |
| Income | 1.72 | 1.10 – 2.73 | 0.019 | 1.22 | 0.84 – 1.81 | 0.316 |
| Local | 0.92 | 0.85 – 0.99 | 0.025 | 1.01 | 0.95 – 1.09 | 0.708 |
| ASAB | 1.34 | 1.07 – 1.67 | 0.010 | 1.22 | 0.99 – 1.51 | 0.067 |
Really strange plots
ggplot(subset(a, Employment!="Idle"), aes(DadEdu, Freq/sum(Freq), color=Employment))+
geom_line()+
stat_smooth(method = "glm", method.args = list(family=binomial))+
geom_point()+
facet_grid(Family~Race)+
scale_y_continuous(seq(-1.5, 1,by=.5))+
labs(x="Father's education", y="Log-odds(Inactive)")
ggplot(subset(b, Employment!="Idle"), aes(ASABf, Freq/cumsum(Freq), color=Employment))+
geom_line()+
stat_smooth(method = "glm", method.args = list(family=binomial))+
geom_point()+
facet_wrap(Incf~Localf, nrow=2)+
scale_y_continuous(seq(-1.5, 1,by=.5))+
labs(x="Father's education", y="Log-odds(Inactive)")
summary(m0 <- nnet::multinom(Employment ~ ., data = dta2))
## # weights: 39 (24 variable)
## initial value 1046.977511
## iter 10 value 995.801470
## iter 20 value 989.576609
## final value 989.321874
## converged
## Call:
## nnet::multinom(formula = Employment ~ ., data = dta2)
##
## Coefficients:
## (Intercept) RaceBlack FamilyNon-Intact DadEduCollege Income
## School 0.7018017 0.2391504 -0.54708859 0.2263025 -0.07437054
## Work 0.7184842 -0.3885990 -0.04838905 0.1326759 0.22198338
## Local ASAB Incf.L Incf.Q Localf.L ASABf.L
## School -0.004238422 -0.01547256 0.1184440 -0.09025778 0.11135278 0.4116813
## Work -0.067554926 -0.08513663 0.3412993 -0.29255812 -0.02332271 0.5843150
## ASABf.Q
## School 0.04306545
## Work 0.12077936
##
## Std. Errors:
## (Intercept) RaceBlack FamilyNon-Intact DadEduCollege Income
## School 0.5384768 0.1997812 0.1888134 0.2396764 0.4137664
## Work 0.5635782 0.2230273 0.1957335 0.2457077 0.4089509
## Local ASAB Incf.L Incf.Q Localf.L ASABf.L ASABf.Q
## School 0.05398749 0.2637922 0.3230540 0.1537179 0.1882321 0.4201313 0.1469578
## Work 0.05915372 0.2756148 0.3261682 0.1578611 0.2015597 0.4372910 0.1540811
##
## Residual Deviance: 1978.644
## AIC: 2026.644
The mental health status of a sample of 1,660 young New York residents in midtown Manhattan was classified by their parents’ socioeconomic status (A=High, F=Low).
Source: Srole, L. & Fisher, A.K. (1978). Mental health in the metropolis. The midtown Manhattan study. Reported in Agresti, A. (1989). Tutorial on modeling ordered categorical response data. Psychological Bulletin, 105, 290-301.
dta3<- read.table("mentalSES.txt", h=T)
str(dta3)
## 'data.frame': 1660 obs. of 8 variables:
## $ Id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ ses : int 6 6 6 6 6 6 6 6 6 6 ...
## $ A : int 1 1 1 1 1 1 1 1 1 1 ...
## $ B : int 0 0 0 0 0 0 0 0 0 0 ...
## $ C : int 0 0 0 0 0 0 0 0 0 0 ...
## $ D : int 0 0 0 0 0 0 0 0 0 0 ...
## $ E : int 0 0 0 0 0 0 0 0 0 0 ...
## $ mental: int 1 1 1 1 1 1 1 1 1 1 ...
head(dta3)
## Id ses A B C D E mental
## 1 1 6 1 0 0 0 0 1
## 2 2 6 1 0 0 0 0 1
## 3 3 6 1 0 0 0 0 1
## 4 4 6 1 0 0 0 0 1
## 5 5 6 1 0 0 0 0 1
## 6 6 6 1 0 0 0 0 1
In a survey people were asked, “Is addiction a disease or are addicts weak-willed?” The response was in three categories, 0 = “addicts are weak-willed,” 1 = “addiction is a disease,” or 2 = “both alternatives hold”. Use an appropriate model to examine how the response depends on respondent’s gender (0 = male, 1 = female), age, and academic status (1 = academic or 0 = otherwise). The dataset is available as addiction{catdata}.
data(addiction, package = "catdata")
head(addiction)
## ill gender age university
## 1 1 1 61 0
## 2 0 1 43 0
## 3 2 0 44 0
## 4 0 1 21 1
## 5 0 0 33 0
## 6 1 0 83 0
str(addiction)
## 'data.frame': 712 obs. of 4 variables:
## $ ill : int 1 0 2 0 0 1 0 1 1 1 ...
## $ gender : int 1 1 0 1 0 0 0 0 1 0 ...
## $ age : int 61 43 44 21 33 83 29 61 37 19 ...
## $ university: int 0 0 0 1 0 0 0 0 0 0 ...
m0_addic<-glm(ill~factor(gender)+age+factor(university), data=addiction, family = poisson)
summary(m0_addic)
##
## Call:
## glm(formula = ill ~ factor(gender) + age + factor(university),
## family = poisson, data = addiction)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.92293 -1.17525 0.01138 0.33807 1.33959
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.570631 0.113507 -5.027 4.98e-07 ***
## factor(gender)1 0.081732 0.079617 1.027 0.30462
## age 0.010792 0.002222 4.857 1.19e-06 ***
## factor(university)1 0.235455 0.084554 2.785 0.00536 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 528.42 on 681 degrees of freedom
## Residual deviance: 498.77 on 678 degrees of freedom
## (30 observations deleted due to missingness)
## AIC: 1570.6
##
## Number of Fisher Scoring iterations: 5
The data were collected during face-to-face home-based interviews with 357 female participants. Each participant’s height, weight as well as her answer to the following question were recorded. “Vigorous physical activities usually make you breathe hard or feel tired most of the time. Examples of vigorous activities include: jogging, fast dancing, soccer, fast swimming, fast biking, and Stairmaster. How many days in a typical week do you do vigorous physical activities for 20 minutes or more?” BMI was calculated using the Quetelet index (kg/m2), which is considered both a convenient and reliable indicator of overweight and obesity.
Source: Slymen, D.J., Ayala, G.X., Arredondo, E.M., & Elder, J.P. (2006). A demonstration of modeling count data with an application to physical activity. Epidemiologic Perspectives & Innovations, 3:3, 1-9.
dta_2<- read.table("physical_activity.txt", h=T)
head(dta_2)
## bmi count
## 1 22 0
## 2 31 0
## 3 29 0
## 4 33 0
## 5 18 0
## 6 25 0
str(dta_2)
## 'data.frame': 357 obs. of 2 variables:
## $ bmi : int 22 31 29 33 18 25 28 30 29 16 ...
## $ count: int 0 0 0 0 0 0 0 0 0 0 ...
t<-rbind(colMeans(dta_2), apply(dta_2, 2, FUN=var))
rownames(t)<-c("ave", "var")
t
## bmi count
## ave 26.44538 0.5938375
## var 14.88816 2.2194001
ggplot(dta_2, aes(x=bmi, y=count))+
geom_bar(stat="identity")+
labs(x="BMI", y="Count")+
theme_minimal()+
theme(legend.position = c(.1,.9))
#Poisson fit
ggplot(dta_2, aes(bmi, count))+
geom_jitter(alpha=.2)+
stat_smooth(method = "glm", method.args = list(family=poisson))+
labs(y="Count of BMI",
x="BMI")
summary(m0_dta_2<-glm(count~bmi, data=dta_2, family="poisson"))
##
## Call:
## glm(formula = count ~ bmi, family = "poisson", data = dta_2)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8404 -1.1270 -0.9570 -0.8127 4.8654
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 2.27048 0.43383 5.234 1.66e-07 ***
## bmi -0.10898 0.01729 -6.302 2.94e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 795.57 on 356 degrees of freedom
## Residual deviance: 756.44 on 355 degrees of freedom
## AIC: 946.99
##
## Number of Fisher Scoring iterations: 6
summary(m1_dta_2<-zeroinfl(count~bmi,data=dta_2))
##
## Call:
## zeroinfl(formula = count ~ bmi, data = dta_2)
##
## Pearson residuals:
## Min 1Q Median 3Q Max
## -0.7046 -0.4250 -0.3568 -0.2989 4.8343
##
## Count model coefficients (poisson with log link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 1.73817 0.53913 3.224 0.00126 **
## bmi -0.02276 0.02165 -1.051 0.29321
##
## Zero-inflation model coefficients (binomial with logit link):
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.51385 0.95813 -1.580 0.11411
## bmi 0.11588 0.03736 3.101 0.00193 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Number of iterations in BFGS optimization: 7
## Log-likelihood: -281.4 on 4 Df
vuong(m0_dta_2, m1_dta_2)
## Vuong Non-Nested Hypothesis Test-Statistic:
## (test-statistic is asymptotically distributed N(0,1) under the
## null that the models are indistinguishible)
## -------------------------------------------------------------
## Vuong z-statistic H_A p-value
## Raw -7.822786 model2 > model1 2.5834e-15
## AIC-corrected -7.740484 model2 > model1 4.9520e-15
## BIC-corrected -7.580912 model2 > model1 1.7157e-14
#parameter estimates
show(m0_est<-cbind(Estimate=coef(m0_dta_2), confint(m0_dta_2)))
## Estimate 2.5 % 97.5 %
## (Intercept) 2.2704806 1.4099338 3.11111748
## bmi -0.1089812 -0.1428471 -0.07502992
show(m1_est<-cbind(Estimate=coef(m1_dta_2), confint(m1_dta_2)))
## Estimate 2.5 % 97.5 %
## count_(Intercept) 1.73816504 0.68148156 2.79484853
## count_bmi -0.02275706 -0.06519126 0.01967714
## zero_(Intercept) -1.51384962 -3.39175256 0.36405331
## zero_bmi 0.11588088 0.04264803 0.18911372
exp(m0_est)
## Estimate 2.5 % 97.5 %
## (Intercept) 9.6840538 4.0956844 22.4461134
## bmi 0.8967472 0.8668866 0.9277157
exp(m1_est)
## Estimate 2.5 % 97.5 %
## count_(Intercept) 5.6868986 1.97680431 16.360151
## count_bmi 0.9774999 0.93688825 1.019872
## zero_(Intercept) 0.2200612 0.03364965 1.439151
## zero_bmi 1.1228621 1.04357053 1.208178
anova(m0_dta_2, m1_dta_2)
## Analysis of Deviance Table
##
## Model: poisson, link: log
##
## Response: count
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev
## NULL 356 795.57
## bmi 1 39.13 355 756.44
dta_2_m<- data.frame(dta_2,
fit0=predict(m0_dta_2, type="response"),
fit1=predict(m1_dta_2, type="response"),
r0=resid(m0_dta_2, type="pearson"),
r1=resid(m1_dta_2, type="pearson"))
ggplot(dta_2_m, aes(fit0, r0))+
geom_point(pch=20)+
geom_point(aes(fit1, r1),pch=3)+
geom_hline(yintercept = 0)+
labs(x="Fitted values", y="Residuals")
-m1_dta_2 model is better fitted(?)
In a clinical study a sample of 127 patients with sport related injuries have been treated with two different therapies (chosen by random design). After 3,7 and 10 days of treatment the pain occuring during knee movement was observed. The dataset is available as knee{catdata}. Analyze the effects of gender and age on the level of pain, (R1 = 0, no pain, …, 5 = severe pain), before treatment started. You may consult the package vignette for an analysis on the variable R4 by issuing the command in your R session: > data(knee, package=“catdata”) > vignette(“ordinal-knee1”)
data(knee, package="catdata")
head(knee)
## N Th Age Sex R1 R2 R3 R4
## 1 1 1 28 1 4 4 4 4
## 2 2 1 32 1 4 4 4 4
## 3 3 1 41 1 3 3 3 3
## 4 4 2 21 1 4 3 3 2
## 5 5 2 34 1 4 3 3 2
## 6 6 1 24 1 3 3 3 2
str(knee)
## 'data.frame': 127 obs. of 8 variables:
## $ N : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Th : int 1 1 1 2 2 1 2 2 2 1 ...
## $ Age: int 28 32 41 21 34 24 28 40 24 39 ...
## $ Sex: num 1 1 1 1 1 1 1 1 0 0 ...
## $ R1 : int 4 4 3 4 4 3 4 3 4 4 ...
## $ R2 : int 4 4 3 3 3 3 3 2 4 4 ...
## $ R3 : int 4 4 3 3 3 3 3 2 4 4 ...
## $ R4 : int 4 4 3 2 2 2 2 2 3 3 ...
dta_3<-reshape2::melt(knee[,5:8], key=, value=value)
names(dta_3)<-c("Measure Time", "Pain level")
summary(m0_dta_3<-polr(factor(dta_3$`Pain level`)~factor(dta_3$`Measure Time`), method = "probit",Hess=T))
## Call:
## polr(formula = factor(dta_3$`Pain level`) ~ factor(dta_3$`Measure Time`),
## Hess = T, method = "probit")
##
## Coefficients:
## Value Std. Error t value
## factor(dta_3$`Measure Time`)R2 -0.3476 0.1335 -2.604
## factor(dta_3$`Measure Time`)R3 -0.5326 0.1342 -3.968
## factor(dta_3$`Measure Time`)R4 -0.7395 0.1354 -5.460
##
## Intercepts:
## Value Std. Error t value
## 1|2 -1.1634 0.1073 -10.8458
## 2|3 -0.7974 0.1042 -7.6538
## 3|4 -0.1022 0.1003 -1.0183
## 4|5 0.9509 0.1075 8.8467
##
## Residual Deviance: 1523.003
## AIC: 1537.003