Homework

In Class Exercise 1

Two measures of bullying behavior have been constructed. The first uses self-report on a 9-item Illinois Bully Scale. The second is peer nomination in which children list any person they perceive as a bully. The total number of nominations a person receives is a measure of bullying behavior.

Source: Espelage, D.L., Holt, M.K., & Henkel, R.R. (2003). Examination of Peer-Group Contextual Effects on Aggression During Early Adolescence. Child Development, 74, 205-220.

Column 1: Score on the Illinois Bully Scale
Column 2: Total number of peer nominations

pacman::p_load(tidyverse, MASS, pscl)
dta1<- read.table("bullying.txt", h=T)
str(dta1)

## 'data.frame':    291 obs. of  2 variables:
##  $ Score     : num  1.56 1.56 1.11 1.56 1.22 1.33 1.11 1 1.11 1.22 ...
##  $ Nomination: int  0 0 0 0 4 0 0 0 0 19 ...

head(dta1)

##   Score Nomination
## 1  1.56          0
## 2  1.56          0
## 3  1.11          0
## 4  1.56          0
## 5  1.22          4
## 6  1.33          0

# proportion of zeros
sum(dta1$Nomination < 1)/length(dta1$Nomination)

## [1] 0.5635739

colMeans(dta1)

##      Score Nomination 
##   1.658110   2.487973

Visualization

# use the Freedman-Diaconis rule for bin width
bd <- 2*IQR(dta1$Nomination)/(dim(dta1)[1]^(1/3))
ot<-theme_set(theme_bw())
# histogram of nomination
ggplot(dta1, aes(Nomination))+
  stat_bin(binwidth = bd)+
  labs(x="Number of nominations",
       y="Count")+
  theme_bw()

#Poisson fit
ggplot(dta1, aes(Score, Nomination))+
  geom_jitter(alpha=.2)+
  stat_smooth(method = "glm", method.args = list(family=poisson))+
  labs(y="Number of peer nomination",
       x="Bully score")

Models

#poisoon model
summary(m0<-glm(Nomination~Score, family=poisson, data=dta1))

## 
## Call:
## glm(formula = Nomination ~ Score, family = poisson, data = dta1)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -6.4995  -1.8246  -1.5952  -0.1552  11.6806  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.66308    0.08871  -7.475 7.75e-14 ***
## Score        0.81434    0.03504  23.239  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 2205.2  on 290  degrees of freedom
## Residual deviance: 1774.6  on 289  degrees of freedom
## AIC: 2160.3
## 
## Number of Fisher Scoring iterations: 6

# zero-inflated poisson
summary(m1<-zeroinfl(Nomination~Score, data=dta1)) #equivalent Nomination~Score|Score

## 
## Call:
## zeroinfl(formula = Nomination ~ Score, data = dta1)
## 
## Pearson residuals:
##      Min       1Q   Median       3Q      Max 
## -2.37068 -0.68873 -0.59685 -0.05357  9.87491 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  0.49604    0.09987   4.967 6.81e-07 ***
## Score        0.59609    0.03942  15.120  < 2e-16 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.4967     0.3463   4.322 1.54e-05 ***
## Score        -0.7716     0.1967  -3.924 8.72e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 8 
## Log-likelihood: -780.6 on 4 Df

# zero-inflated negative binomial distribution
summary(m2<-zeroinfl(Nomination~Score, data=dta1, dist="negbin"))

## 
## Call:
## zeroinfl(formula = Nomination ~ Score, data = dta1, dist = "negbin")
## 
## Pearson residuals:
##      Min       1Q   Median       3Q      Max 
## -0.58338 -0.48698 -0.38705  0.02297  6.96982 
## 
## Count model coefficients (negbin with log link):
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -0.6737     0.4354  -1.547    0.122    
## Score         0.8819     0.2052   4.297 1.73e-05 ***
## Log(theta)   -1.0658     0.1792  -5.947 2.73e-09 ***
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)    2.954      2.484   1.189    0.234
## Score         -3.419      2.260  -1.513    0.130
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Theta = 0.3445 
## Number of iterations in BFGS optimization: 21 
## Log-likelihood: -494.7 on 5 Df

Comparing non-nested models

vuong(m0, m1)

## Vuong Non-Nested Hypothesis Test-Statistic: 
## (test-statistic is asymptotically distributed N(0,1) under the
##  null that the models are indistinguishible)
## -------------------------------------------------------------
##               Vuong z-statistic             H_A    p-value
## Raw                   -4.891816 model2 > model1 4.9955e-07
## AIC-corrected         -4.858930 model2 > model1 5.9011e-07
## BIC-corrected         -4.798531 model2 > model1 7.9917e-07

m1 is better fitted

vuong(m1, m2)

## Vuong Non-Nested Hypothesis Test-Statistic: 
## (test-statistic is asymptotically distributed N(0,1) under the
##  null that the models are indistinguishible)
## -------------------------------------------------------------
##               Vuong z-statistic             H_A   p-value
## Raw                   -4.242754 model2 > model1 1.104e-05
## AIC-corrected         -4.242754 model2 > model1 1.104e-05
## BIC-corrected         -4.242754 model2 > model1 1.104e-05

There is significant differece between m1 and m2

Redidual plots

# Fortify data with fitted values and residuals
dta1_m<- data.frame(dta1, 
                    fit1=predict(m1, type="response"),
                    fit2=predict(m2, type="response"),
                    r1=resid(m1, type="pearson"),
                    r2=resid(m2, type="pearson"))

ggplot(dta1_m, aes(fit1, r1))+
  geom_point(pch=20)+
  geom_point(aes(fit2,r2), pch=1)+
  geom_hline(yintercept = 0)+
  labs(x="Fitted values", y="Residuals",
       title="Poisson vs. Negative Binomial")

#Fortify data with model fits and CIs
dta1_m2<-data.frame(dta1, yhat=predict(m2))

#model fit over observations
ggplot(dta1_m2, aes(Score, Nomination))+
  geom_point(pch=20, alpha=.2)+
  geom_line(aes(Score, yhat), color="magenta", lwd=rel(1))+
  stat_smooth(method = "glm", method.args = list(family=poisson))+
  labs(x="Bully score", y="Predicted number of nomination")

In Class Exercise 2

A sample of 978 young men aged 20 to 22 years was extracted from the National Longitudinal Survey of Youth (NLSY). The outcome of interest is whether a youth’s reported major activity in 1985 was (1) working, (2) in school, or (3) inactive. Six additional variables were also recorded.

Source: Powers, D.A., & Xie, Y. (2000). Statistical Methods for Categorical Data Analysis. p. 232.

Column 1: Employment status: School = In school, Work = Working, Idle = Inactive
Column 2: Ethnicity, Black or Others
Column 3: Family structure, Non-intact or Otherwise
Column 4: Father’s education, College or Otherwise
Column 5: Family income in 1979 in thousands
Column 6: Local unemployment rate in 1980
Column 7: Standardized score on the Armed Services Aptitude Battery Test

dta2<- read.table("youth_employment.txt", h=T)
head(dta2)

##   Employment   Race     Family    DadEdu Income Local   ASAB
## 1       Work Others Non-Intact Otherwise 0.4909   6.9 -0.110
## 2     School Others Non-Intact Otherwise 0.6940   4.8  0.452
## 3       Work Others  Otherwise Otherwise 1.1000   6.5  0.967
## 4       Work Others Non-Intact   College 1.5000   3.8  1.667
## 5       Work Others Non-Intact Otherwise 0.2544   6.9  0.000
## 6     School Others Non-Intact Otherwise 0.9391   5.4  0.000

str(dta2)

## 'data.frame':    978 obs. of  7 variables:
##  $ Employment: chr  "Work" "School" "Work" "Work" ...
##  $ Race      : chr  "Others" "Others" "Others" "Others" ...
##  $ Family    : chr  "Non-Intact" "Non-Intact" "Otherwise" "Non-Intact" ...
##  $ DadEdu    : chr  "Otherwise" "Otherwise" "Otherwise" "College" ...
##  $ Income    : num  0.491 0.694 1.1 1.5 0.254 ...
##  $ Local     : num  6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
##  $ ASAB      : num  -0.11 0.452 0.967 1.667 0 ...

dta2$Race <- relevel(factor(dta2$Race), ref="Others", levels=c("Black", "Others"))
dta2$Family<-relevel(factor(dta2$Family), ref="Otherwise")
dta2$DadEdu<-relevel(factor(dta2$DadEdu), ref="Otherwise")
dta2$Incf<- factor(cut(dta2$Income, 
                       breaks=c(quantile(dta2$Income, probs=seq(0, 1, by=.33))),
                       labels=c("Low", "Middle","High"), ordered=T, lowest.inclued=T))
dta2$Localf<- factor(cut(dta2$Local, 
                       breaks=c(quantile(dta2$Local, probs=seq(0, 1, by=.5))),
                       labels=c("Low","High"), ordered=T, lowest.inclued=T))
dta2$ASABf<- factor(cut(dta2$ASAB, 
                       breaks=c(quantile(dta2$ASAB, probs=seq(0, 1, by=.33))),
                       labels=c("Low", "Middle","High"), ordered=T, lowest.inclued=T))

str(dta2)

## 'data.frame':    978 obs. of  10 variables:
##  $ Employment: chr  "Work" "School" "Work" "Work" ...
##  $ Race      : Factor w/ 2 levels "Others","Black": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Family    : Factor w/ 2 levels "Otherwise","Non-Intact": 2 2 1 2 2 2 1 1 2 1 ...
##  $ DadEdu    : Factor w/ 2 levels "Otherwise","College": 1 1 1 2 1 1 1 2 2 2 ...
##  $ Income    : num  0.491 0.694 1.1 1.5 0.254 ...
##  $ Local     : num  6.9 4.8 6.5 3.8 6.9 5.4 4.2 6.2 6.2 6.9 ...
##  $ ASAB      : num  -0.11 0.452 0.967 1.667 0 ...
##  $ Incf      : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 2 3 3 1 3 3 3 3 3 ...
##  $ Localf    : Ord.factor w/ 2 levels "Low"<"High": 1 1 1 1 1 1 1 1 1 1 ...
##  $ ASABf     : Ord.factor w/ 3 levels "Low"<"Middle"<..: 2 3 3 NA 2 2 2 3 3 3 ...

with(dta2,ftable(Race,Family, DadEdu,Employment))

##                             Employment Idle School Work
## Race   Family     DadEdu                               
## Others Otherwise  Otherwise              84    161  141
##                   College                23     66   72
##        Non-Intact Otherwise              47     43   54
##                   College                 5     12   18
## Black  Otherwise  Otherwise              28     60   26
##                   College                 6      8    3
##        Non-Intact Otherwise              39     40   27
##                   College                 3     10    2

a<-as.data.frame(with(dta2,ftable(Race,Family, DadEdu,Employment)))

with(dta2, ftable(Incf,ASABf,Localf, Employment))

##                      Employment Idle School Work
## Incf   ASABf  Localf                            
## Low    Low    Low                 26     39   19
##               High                42     34   19
##        Middle Low                 21     22   15
##               High                11     26   10
##        High   Low                  3     13    8
##               High                 4      5    3
## Middle Low    Low                 13     20   24
##               High                11     24   10
##        Middle Low                 22     28   36
##               High                 6     16   13
##        High   Low                  9     22   29
##               High                 9     19   12
## High   Low    Low                  6      9   10
##               High                 4      6    5
##        Middle Low                 18     23   16
##               High                 4     16   15
##        High   Low                 15     40   59
##               High                11     26   27

b<-as.data.frame(with(dta2, ftable(Incf,ASABf,Localf, Employment)))

dta2_fn<-subset(dta2, Employment!="School")%>% mutate(resp=ifelse(Employment=="Work", 1,0))
dta2_fp<-subset(dta2, Employment!="Work") %>% mutate(resp=ifelse(Employment=="School",1,0))
m_fn<-glm(resp~Race+Family+DadEdu+Income+Local+ASAB, data=dta2_fn, family=binomial)
m_fp<-glm(resp~Race+Family+DadEdu+Income+Local+ASAB, data=dta2_fp, family=binomial)
sjPlot::tab_model(m_fn, m_fp, show.obs = F, show.r2 = F)

	resp			resp
Predictors	Odds Ratios	CI	p	Odds Ratios	CI	p
(Intercept)	2.02	1.00 – 4.10	0.051	1.52	0.78 – 2.94	0.218
Race [Black]	0.64	0.41 – 0.99	0.046	1.25	0.85 – 1.84	0.265
Family [Non-Intact]	0.90	0.62 – 1.33	0.609	0.57	0.40 – 0.82	0.003
DadEdu [College]	1.19	0.74 – 1.92	0.471	1.28	0.81 – 2.04	0.303
Income	1.72	1.10 – 2.73	0.019	1.22	0.84 – 1.81	0.316
Local	0.92	0.85 – 0.99	0.025	1.01	0.95 – 1.09	0.708
ASAB	1.34	1.07 – 1.67	0.010	1.22	0.99 – 1.51	0.067

Really strange plots

ggplot(subset(a, Employment!="Idle"), aes(DadEdu, Freq/sum(Freq), color=Employment))+
  geom_line()+
  stat_smooth(method = "glm", method.args = list(family=binomial))+
  geom_point()+
  facet_grid(Family~Race)+
  scale_y_continuous(seq(-1.5, 1,by=.5))+
  labs(x="Father's education", y="Log-odds(Inactive)")

ggplot(subset(b, Employment!="Idle"), aes(ASABf, Freq/cumsum(Freq), color=Employment))+
  geom_line()+
  stat_smooth(method = "glm", method.args = list(family=binomial))+
  geom_point()+
  facet_wrap(Incf~Localf, nrow=2)+
  scale_y_continuous(seq(-1.5, 1,by=.5))+
  labs(x="Father's education", y="Log-odds(Inactive)")

summary(m0 <- nnet::multinom(Employment ~ ., data = dta2))

## # weights:  39 (24 variable)
## initial  value 1046.977511 
## iter  10 value 995.801470
## iter  20 value 989.576609
## final  value 989.321874 
## converged

## Call:
## nnet::multinom(formula = Employment ~ ., data = dta2)
## 
## Coefficients:
##        (Intercept)  RaceBlack FamilyNon-Intact DadEduCollege      Income
## School   0.7018017  0.2391504      -0.54708859     0.2263025 -0.07437054
## Work     0.7184842 -0.3885990      -0.04838905     0.1326759  0.22198338
##               Local        ASAB    Incf.L      Incf.Q    Localf.L   ASABf.L
## School -0.004238422 -0.01547256 0.1184440 -0.09025778  0.11135278 0.4116813
## Work   -0.067554926 -0.08513663 0.3412993 -0.29255812 -0.02332271 0.5843150
##           ASABf.Q
## School 0.04306545
## Work   0.12077936
## 
## Std. Errors:
##        (Intercept) RaceBlack FamilyNon-Intact DadEduCollege    Income
## School   0.5384768 0.1997812        0.1888134     0.2396764 0.4137664
## Work     0.5635782 0.2230273        0.1957335     0.2457077 0.4089509
##             Local      ASAB    Incf.L    Incf.Q  Localf.L   ASABf.L   ASABf.Q
## School 0.05398749 0.2637922 0.3230540 0.1537179 0.1882321 0.4201313 0.1469578
## Work   0.05915372 0.2756148 0.3261682 0.1578611 0.2015597 0.4372910 0.1540811
## 
## Residual Deviance: 1978.644 
## AIC: 2026.644

In Class Exercise 3

The mental health status of a sample of 1,660 young New York residents in midtown Manhattan was classified by their parents’ socioeconomic status (A=High, F=Low).

Source: Srole, L. & Fisher, A.K. (1978). Mental health in the metropolis. The midtown Manhattan study. Reported in Agresti, A. (1989). Tutorial on modeling ordered categorical response data. Psychological Bulletin, 105, 290-301.

Column 1: Subject ID
Column 2: SES, numerical code A=6, … , F=1
Column 3: SES, reference code for A
Column 4: SES, reference code for B
Column 5: SES, reference code for C
Column 6: SES, reference code for D
Column 7: SES, reference code for E
Column 9: Mental impairment, well=1, mild=2, moderate=3, impaired=4

dta3<- read.table("mentalSES.txt", h=T)
str(dta3)

## 'data.frame':    1660 obs. of  8 variables:
##  $ Id    : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ ses   : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ A     : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ B     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ C     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ D     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ E     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ mental: int  1 1 1 1 1 1 1 1 1 1 ...

head(dta3)

##   Id ses A B C D E mental
## 1  1   6 1 0 0 0 0      1
## 2  2   6 1 0 0 0 0      1
## 3  3   6 1 0 0 0 0      1
## 4  4   6 1 0 0 0 0      1
## 5  5   6 1 0 0 0 0      1
## 6  6   6 1 0 0 0 0      1

Exercise 1

In a survey people were asked, “Is addiction a disease or are addicts weak-willed?” The response was in three categories, 0 = “addicts are weak-willed,” 1 = “addiction is a disease,” or 2 = “both alternatives hold”. Use an appropriate model to examine how the response depends on respondent’s gender (0 = male, 1 = female), age, and academic status (1 = academic or 0 = otherwise). The dataset is available as addiction{catdata}.

data(addiction, package = "catdata")
head(addiction)

##   ill gender age university
## 1   1      1  61          0
## 2   0      1  43          0
## 3   2      0  44          0
## 4   0      1  21          1
## 5   0      0  33          0
## 6   1      0  83          0

str(addiction)

## 'data.frame':    712 obs. of  4 variables:
##  $ ill       : int  1 0 2 0 0 1 0 1 1 1 ...
##  $ gender    : int  1 1 0 1 0 0 0 0 1 0 ...
##  $ age       : int  61 43 44 21 33 83 29 61 37 19 ...
##  $ university: int  0 0 0 1 0 0 0 0 0 0 ...

m0_addic<-glm(ill~factor(gender)+age+factor(university), data=addiction, family = poisson)
summary(m0_addic)

## 
## Call:
## glm(formula = ill ~ factor(gender) + age + factor(university), 
##     family = poisson, data = addiction)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.92293  -1.17525   0.01138   0.33807   1.33959  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         -0.570631   0.113507  -5.027 4.98e-07 ***
## factor(gender)1      0.081732   0.079617   1.027  0.30462    
## age                  0.010792   0.002222   4.857 1.19e-06 ***
## factor(university)1  0.235455   0.084554   2.785  0.00536 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 528.42  on 681  degrees of freedom
## Residual deviance: 498.77  on 678  degrees of freedom
##   (30 observations deleted due to missingness)
## AIC: 1570.6
## 
## Number of Fisher Scoring iterations: 5

Exercise 2

The data were collected during face-to-face home-based interviews with 357 female participants. Each participant’s height, weight as well as her answer to the following question were recorded. “Vigorous physical activities usually make you breathe hard or feel tired most of the time. Examples of vigorous activities include: jogging, fast dancing, soccer, fast swimming, fast biking, and Stairmaster. How many days in a typical week do you do vigorous physical activities for 20 minutes or more?” BMI was calculated using the Quetelet index (kg/m2), which is considered both a convenient and reliable indicator of overweight and obesity.

Source: Slymen, D.J., Ayala, G.X., Arredondo, E.M., & Elder, J.P. (2006). A demonstration of modeling count data with an application to physical activity. Epidemiologic Perspectives & Innovations, 3:3, 1-9.

Column 1: Participant’s BMI index
Column 2: Number of times of vigorous activities per week

dta_2<- read.table("physical_activity.txt", h=T)
head(dta_2)

##   bmi count
## 1  22     0
## 2  31     0
## 3  29     0
## 4  33     0
## 5  18     0
## 6  25     0

str(dta_2)

## 'data.frame':    357 obs. of  2 variables:
##  $ bmi  : int  22 31 29 33 18 25 28 30 29 16 ...
##  $ count: int  0 0 0 0 0 0 0 0 0 0 ...

t<-rbind(colMeans(dta_2), apply(dta_2, 2, FUN=var))
rownames(t)<-c("ave", "var")
t

##          bmi     count
## ave 26.44538 0.5938375
## var 14.88816 2.2194001

ggplot(dta_2, aes(x=bmi, y=count))+
  geom_bar(stat="identity")+
  labs(x="BMI", y="Count")+
  theme_minimal()+
  theme(legend.position = c(.1,.9))

#Poisson fit
ggplot(dta_2, aes(bmi, count))+
  geom_jitter(alpha=.2)+
  stat_smooth(method = "glm", method.args = list(family=poisson))+
  labs(y="Count of BMI",
       x="BMI")

summary(m0_dta_2<-glm(count~bmi, data=dta_2, family="poisson"))

## 
## Call:
## glm(formula = count ~ bmi, family = "poisson", data = dta_2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.8404  -1.1270  -0.9570  -0.8127   4.8654  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  2.27048    0.43383   5.234 1.66e-07 ***
## bmi         -0.10898    0.01729  -6.302 2.94e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 795.57  on 356  degrees of freedom
## Residual deviance: 756.44  on 355  degrees of freedom
## AIC: 946.99
## 
## Number of Fisher Scoring iterations: 6

summary(m1_dta_2<-zeroinfl(count~bmi,data=dta_2))

## 
## Call:
## zeroinfl(formula = count ~ bmi, data = dta_2)
## 
## Pearson residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7046 -0.4250 -0.3568 -0.2989  4.8343 
## 
## Count model coefficients (poisson with log link):
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept)  1.73817    0.53913   3.224  0.00126 **
## bmi         -0.02276    0.02165  -1.051  0.29321   
## 
## Zero-inflation model coefficients (binomial with logit link):
##             Estimate Std. Error z value Pr(>|z|)   
## (Intercept) -1.51385    0.95813  -1.580  0.11411   
## bmi          0.11588    0.03736   3.101  0.00193 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
## 
## Number of iterations in BFGS optimization: 7 
## Log-likelihood: -281.4 on 4 Df

vuong(m0_dta_2, m1_dta_2)

## Vuong Non-Nested Hypothesis Test-Statistic: 
## (test-statistic is asymptotically distributed N(0,1) under the
##  null that the models are indistinguishible)
## -------------------------------------------------------------
##               Vuong z-statistic             H_A    p-value
## Raw                   -7.822786 model2 > model1 2.5834e-15
## AIC-corrected         -7.740484 model2 > model1 4.9520e-15
## BIC-corrected         -7.580912 model2 > model1 1.7157e-14

#parameter estimates
show(m0_est<-cbind(Estimate=coef(m0_dta_2), confint(m0_dta_2)))

##               Estimate      2.5 %      97.5 %
## (Intercept)  2.2704806  1.4099338  3.11111748
## bmi         -0.1089812 -0.1428471 -0.07502992

show(m1_est<-cbind(Estimate=coef(m1_dta_2), confint(m1_dta_2)))

##                      Estimate       2.5 %     97.5 %
## count_(Intercept)  1.73816504  0.68148156 2.79484853
## count_bmi         -0.02275706 -0.06519126 0.01967714
## zero_(Intercept)  -1.51384962 -3.39175256 0.36405331
## zero_bmi           0.11588088  0.04264803 0.18911372

exp(m0_est)

##              Estimate     2.5 %     97.5 %
## (Intercept) 9.6840538 4.0956844 22.4461134
## bmi         0.8967472 0.8668866  0.9277157

exp(m1_est)

##                    Estimate      2.5 %    97.5 %
## count_(Intercept) 5.6868986 1.97680431 16.360151
## count_bmi         0.9774999 0.93688825  1.019872
## zero_(Intercept)  0.2200612 0.03364965  1.439151
## zero_bmi          1.1228621 1.04357053  1.208178

As feel breath hard or feel tired, 0.89 times as frequent for increasing one BMI unit.

anova(m0_dta_2, m1_dta_2)

## Analysis of Deviance Table
## 
## Model: poisson, link: log
## 
## Response: count
## 
## Terms added sequentially (first to last)
## 
## 
##      Df Deviance Resid. Df Resid. Dev
## NULL                   356     795.57
## bmi   1    39.13       355     756.44

dta_2_m<- data.frame(dta_2, 
                     fit0=predict(m0_dta_2, type="response"),
                     fit1=predict(m1_dta_2, type="response"),
                     r0=resid(m0_dta_2, type="pearson"),
                     r1=resid(m1_dta_2, type="pearson"))
ggplot(dta_2_m, aes(fit0, r0))+
  geom_point(pch=20)+
  geom_point(aes(fit1, r1),pch=3)+
  geom_hline(yintercept = 0)+
  labs(x="Fitted values", y="Residuals")

-m1_dta_2 model is better fitted(?)

Exercise 3

In a clinical study a sample of 127 patients with sport related injuries have been treated with two different therapies (chosen by random design). After 3,7 and 10 days of treatment the pain occuring during knee movement was observed. The dataset is available as knee{catdata}. Analyze the effects of gender and age on the level of pain, (R1 = 0, no pain, …, 5 = severe pain), before treatment started. You may consult the package vignette for an analysis on the variable R4 by issuing the command in your R session: > data(knee, package=“catdata”) > vignette(“ordinal-knee1”)

data(knee, package="catdata")
head(knee)

##   N Th Age Sex R1 R2 R3 R4
## 1 1  1  28   1  4  4  4  4
## 2 2  1  32   1  4  4  4  4
## 3 3  1  41   1  3  3  3  3
## 4 4  2  21   1  4  3  3  2
## 5 5  2  34   1  4  3  3  2
## 6 6  1  24   1  3  3  3  2

str(knee)

## 'data.frame':    127 obs. of  8 variables:
##  $ N  : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Th : int  1 1 1 2 2 1 2 2 2 1 ...
##  $ Age: int  28 32 41 21 34 24 28 40 24 39 ...
##  $ Sex: num  1 1 1 1 1 1 1 1 0 0 ...
##  $ R1 : int  4 4 3 4 4 3 4 3 4 4 ...
##  $ R2 : int  4 4 3 3 3 3 3 2 4 4 ...
##  $ R3 : int  4 4 3 3 3 3 3 2 4 4 ...
##  $ R4 : int  4 4 3 2 2 2 2 2 3 3 ...

dta_3<-reshape2::melt(knee[,5:8], key=, value=value)

names(dta_3)<-c("Measure Time", "Pain level")
summary(m0_dta_3<-polr(factor(dta_3$`Pain level`)~factor(dta_3$`Measure Time`), method = "probit",Hess=T))

## Call:
## polr(formula = factor(dta_3$`Pain level`) ~ factor(dta_3$`Measure Time`), 
##     Hess = T, method = "probit")
## 
## Coefficients:
##                                  Value Std. Error t value
## factor(dta_3$`Measure Time`)R2 -0.3476     0.1335  -2.604
## factor(dta_3$`Measure Time`)R3 -0.5326     0.1342  -3.968
## factor(dta_3$`Measure Time`)R4 -0.7395     0.1354  -5.460
## 
## Intercepts:
##     Value    Std. Error t value 
## 1|2  -1.1634   0.1073   -10.8458
## 2|3  -0.7974   0.1042    -7.6538
## 3|4  -0.1022   0.1003    -1.0183
## 4|5   0.9509   0.1075     8.8467
## 
## Residual Deviance: 1523.003 
## AIC: 1537.003

Homework_1102

yuwenchen

11/2/2020