In completing the take-home portion of Midterm I, I consulted my class notes, my own reading notes, and the assigned textbook. I used RStudio to complete this portion of the midterm.
I have neither given nor received any assistance on any portion of this exam.
require(mosaic)
help = fetchData("HELPrct")
## Data HELPrct found in package.
Question 1
histogram(~pcs | substance, data = help, main = "Fig 1. Physical Component Score by Substance Group",
xlab = "Physical Component Score")
favstats(pcs ~ substance, data = help)
## min Q1 median Q3 max mean sd n missing
## alcohol 14.07 38.20 47.36 56.90 68.12 46.83 11.250 177 0
## cocaine 23.48 44.63 53.52 58.33 71.63 51.26 10.293 152 0
## heroin 21.92 39.55 46.40 52.45 74.81 45.85 9.822 124 0
The histograms in Fig 1 show the distribution of the Physical Component Score (PCS) for each substance group. The distribution for the alcohol group is skewed left and centered at a median of 47.36, so half of the group scored above 47.36 and half scored below it; the middle 50% of the group (the interquartile range, Q3 - Q1 = 56.90 - 38.20) spans 18.70 points. The distribution for the cocaine group is also skewed left and centered at a median of 53.52, so half of the group scored above 53.52 and half scored below it; its middle 50% spans 13.70 points (58.33 - 44.63). The distribution for the heroin group is roughly normal, perhaps slightly skewed right, so its mean of 45.85 and standard deviation of 9.82 summarize it reasonably well. By the 68-95 rule, roughly 68% of the heroin group scored between 36.03 and 55.67 (within one standard deviation of the mean), and roughly 95% scored between 26.21 and 65.49 (within two standard deviations).
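The summary numbers in the paragraph above are simple arithmetic on the favstats() output. A minimal R sketch that reproduces them, assuming only the quartiles, mean, and standard deviation copied from that table (the variable names are illustrative, not from the original code):

```r
# Quartiles copied from the favstats() output above
q1 <- c(alcohol = 38.20, cocaine = 44.63)
q3 <- c(alcohol = 56.90, cocaine = 58.33)
iqr <- q3 - q1                      # width of the middle 50% of each group

# Heroin mean and standard deviation from the same table
heroin_mean <- 45.85
heroin_sd   <- 9.822

# 68-95 rule: mean +/- 1 SD and mean +/- 2 SD (assumes a roughly normal shape)
band68 <- heroin_mean + c(-1, 1) * heroin_sd
band95 <- heroin_mean + c(-2, 2) * heroin_sd

round(iqr, 2)     # alcohol 18.7, cocaine 13.7
round(band68, 2)  # 36.03 55.67
round(band95, 2)  # 26.21 65.49
```

The IQR describes the two skewed groups without assuming any particular shape, whereas the mean +/- SD bands are only meaningful when the distribution is close to normal, which is why they are used for the heroin group alone.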
Question 2
mod1 = mm(pcs ~ 1, data = help)
mod1
## Groupwise Model.
## Call:
## pcs ~ 1
##
## Coefficients:
## all
## 48
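mod1 is the grand model for PCS: it ignores substance group and predicts the same value for every subject. Its single coefficient, printed as 48 and equal to 48.05 before rounding, is the mean PCS of all 453 subjects. In other words, without distinguishing between groups, the best prediction of a subject's PCS (the one that minimizes the sum of squared residuals) is the overall average, 48.05.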
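As a check on the grand mean, the overall mean PCS is the size-weighted average of the three group means. A sketch using the rounded group means and group sizes from the favstats() table in Question 1, so the result agrees with 48.05 only to two decimals:

```r
# Group sizes and (rounded) group means from the favstats() output
n      <- c(alcohol = 177, cocaine = 152, heroin = 124)
mean_g <- c(alcohol = 46.83, cocaine = 51.26, heroin = 45.85)

grand <- sum(n * mean_g) / sum(n)   # weight each group mean by its size
sum(n)            # 453 subjects
round(grand, 2)   # 48.05
```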
Question 3
sd(residuals(mod1))
## [1] 10.78
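The standard deviation of the residuals of mod1 is 10.78. A residual is the difference between a subject's observed PCS and the model's predicted value, so for the grand model this is simply the standard deviation of PCS itself. If the residuals are roughly normal, about 68% of them fall within one standard deviation of zero (between -10.78 and 10.78, a band 21.56 points wide), and about 95% fall within two standard deviations (between -21.56 and 21.56, a band 43.12 points wide).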
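For an intercept-only model the residuals are just the observations minus their mean, which is why their standard deviation equals that of PCS. A small base-R illustration with hypothetical values (not the HELPrct data), using lm(pcs ~ 1) as a stand-in for mm() to fit the grand mean:

```r
pcs <- c(38.2, 47.4, 56.9, 53.5, 44.6)   # hypothetical scores, illustration only
fit <- lm(pcs ~ 1)                        # grand-mean (intercept-only) model
res <- residuals(fit)

coef(fit)                                 # the mean of pcs
all.equal(unname(res), pcs - mean(pcs))   # TRUE: residual = observed - mean
all.equal(sd(res), sd(pcs))               # TRUE: same spread as the data
c(-1, 1) * sd(res)                        # the +/- 1 SD band around zero
```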
Question 4
mod2 = mm(pcs ~ substance, data = help)
mod2
## Groupwise Model.
## Call:
## pcs ~ substance
##
## Coefficients:
## alcohol cocaine heroin
## 46.8 51.3 45.8
mod2 is a groupwise model: it takes into account which substance group each subject belongs to, so it can explain some of the variation that mod1 leaves in the residuals. Its coefficients are 46.83, 51.26, and 45.85 for the alcohol, cocaine, and heroin groups, respectively (printed rounded as 46.8, 51.3, and 45.8). Each coefficient is the model's prediction for every subject in that group and equals that group's mean PCS, matching the means in the favstats() output from Question 1.
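The same idea can be reproduced in base R. A sketch with hypothetical data, assuming lm() with the intercept removed (pcs ~ grp - 1) as a stand-in for mm(); it fits one coefficient per group, and each coefficient equals that group's mean:

```r
pcs <- c(40, 50, 45, 55, 60, 35, 45)   # hypothetical scores
grp <- factor(c("alcohol", "alcohol", "cocaine", "cocaine", "cocaine",
                "heroin", "heroin"))

fit <- lm(pcs ~ grp - 1)               # "- 1" drops the intercept: one mean per group
coef(fit)                              # about 45, 53.33, 40
tapply(pcs, grp, mean)                 # group means computed directly

all.equal(as.vector(coef(fit)), as.vector(tapply(pcs, grp, mean)))  # TRUE
```

Dropping the intercept is only a parameterization choice: with the intercept kept, lm() reports one group mean plus differences from it, but the fitted values are identical.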
Question 5
sd(residuals(mod2))
## [1] 10.53
The standard deviation of mod2's residuals is 10.53, slightly smaller than the 10.78 for mod1. If the residuals are roughly normal, about 68% of them fall between -10.53 and 10.53 (a band 21.06 points wide), and about 95% fall between -21.06 and 21.06 (a band 42.12 points wide).
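The 68% and 95% figures in Questions 3 and 5 come from the normal distribution, and the band widths are multiples of the residual standard deviation. A short check, assuming normally distributed residuals and using the two standard deviations reported above:

```r
# Share of a normal distribution within 1 and 2 SDs of its mean
pnorm(1) - pnorm(-1)   # about 0.683
pnorm(2) - pnorm(-2)   # about 0.954

resid_sd <- c(mod1 = 10.78, mod2 = 10.53)   # from sd(residuals(...)) above
width68 <- 2 * resid_sd   # band from -1 SD to +1 SD: 21.56, 21.06
width95 <- 4 * resid_sd   # band from -2 SD to +2 SD: 43.12, 42.12
```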
Question 6
var(resid(mod1))
## [1] 116.3
var(resid(mod2))
## [1] 110.9
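The residual variances differ because mod2 uses substance group to explain part of the variability in PCS, whereas mod1 takes no explanatory variable into account and predicts the grand mean for everyone. For each model, the variance of PCS splits into the variance of the fitted values (the model variance) plus the variance of the residuals. mod1's fitted values are all the same, so its model variance is 0 and the entire variance of PCS, 116.3, remains in the residuals. mod2's fitted values differ by group, so the model accounts for about 116.3 - 110.9 = 5.4 of that variance, and its residual variance is correspondingly smaller, 110.9. Substance group therefore explains only about 5.4 / 116.3, or roughly 4.6%, of the variance in PCS.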
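The variance decomposition used in this answer holds for any least-squares model that includes an intercept or a full set of group means. A base-R sketch on hypothetical data (not HELPrct), with lm() standing in for mm():

```r
pcs <- c(40, 50, 45, 55, 60, 35, 45)   # hypothetical scores
grp <- factor(c("alcohol", "alcohol", "cocaine", "cocaine", "cocaine",
                "heroin", "heroin"))
fit <- lm(pcs ~ grp)

total    <- var(pcs)
model    <- var(fitted(fit))      # variance of the group-mean predictions
residual <- var(residuals(fit))   # variance left over

all.equal(total, model + residual)   # TRUE: total = model + residual
model / total                        # share explained
summary(fit)$r.squared               # same value, reported as R-squared
```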
Question 7
bootstrap = do(1000) * median(pcs ~ substance, data = resample(help))
confint(bootstrap)
## name lower upper
## 1 alcohol 44.27 49.68
## 2 cocaine 51.71 55.00
## 3 heroin 44.07 48.35
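I am 95% confident that the population median PCS is between 44.27 and 49.68 for the alcohol group, between 51.71 and 55.00 for the cocaine group, and between 44.07 and 48.35 for the heroin group. Because no random seed was set, these bootstrap intervals will shift slightly each time the code is rerun.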
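The bootstrap behind these intervals can be sketched in base R. This is a simplified version under stated assumptions: it resamples a single hypothetical group's scores with replacement (the original resamples whole rows of help, so group sizes vary between replicates), takes the median of each resample, and reads off the 2.5th and 97.5th percentiles of those medians. The set.seed() call makes the result reproducible.

```r
set.seed(2024)                                    # fix the random draws so reruns agree
pcs <- c(38.2, 47.4, 56.9, 53.5, 44.6, 41.0, 50.3, 46.8)   # hypothetical scores

# Median of 1000 resamples, each drawn with replacement and the same size as pcs
boot_medians <- replicate(1000, median(sample(pcs, replace = TRUE)))

# Percentile interval: middle 95% of the bootstrap medians
ci <- quantile(boot_medians, c(0.025, 0.975))
ci
```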