In this exercise, students will work together to co-construct knowledge about a useful statistical technique called polynomial contrasts. For convenience, we will consider the Rehabilitation Therapy data set described in Exercise 16.9 (which you studied in a previous assignment), but now replace the prior fitness status with a fitness score (whose value is a real number). In particular, ‘below average’ is replaced by a score of 10, ‘average’ by a score of 20, and ‘above average’ by a score of 30; the observed values of the response variable are kept the same. Please work together on the following questions.
Answer: Polynomial contrasts are orthogonal contrasts that generalize to several means (more than 2). They can be used to construct an F test of whether there is a polynomial relationship between the means. These contrasts can be considered when 1) the contrasts are orthogonal, that is, the contrasts \[\sum{c_i\bar{Y}_i}\] and \[\sum{d_i\bar{Y}_i}\] satisfy \[\sum{c_i} = \sum{d_i} = 0\] and, further, \[\sum{\frac{c_id_i}{n_i}} = 0\]
where \(n_i\) is the number of observations in class \(i\); and 2) the spacing between the levels is constant.
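For reference, the F test mentioned above usually takes the following single-degree-of-freedom form for one contrast. The symbols \(\hat{L}\), \(MSE\), \(n_T\) (total sample size), and \(r\) (number of factor levels) follow common textbook notation and are assumptions here, since the answer does not define them.

\[\hat{L} = \sum{c_i\bar{Y}_i}, \qquad s^2\{\hat{L}\} = MSE\sum{\frac{c_i^2}{n_i}}, \qquad F^* = \frac{\hat{L}^2}{s^2\{\hat{L}\}} \sim F(1,\, n_T - r) \text{ under } H_0: \sum{c_i\mu_i} = 0\]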
df = read.table("http://www.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2016%20Data%20Sets/CH16PR09.txt")
df$Recover = df$V1
df$Fitness = df$V2
df$V1 <- df$V2 <- df$V3 <- NULL
df$Fitness[df$Fitness ==1] = 10
df$Fitness[df$Fitness ==2] = 20
df$Fitness[df$Fitness ==3] = 30
The spacing between the levels is constant: the fitness scores 10, 20, and 30 are each 10 units apart.
One can pick \((c_1,c_2,c_3) = (1,-1,0)\) and \((d_1,d_2,d_3)=(4/9,5/9,-1)\), which give \(\sum{c_i} = \sum{d_i} =0\) and \[\sum{\frac{c_id_i}{n_i}} = \frac{4/9}{8} + \frac{-5/9}{10}+\frac{0\cdot(-1)}{6} = \frac{1}{18} - \frac{1}{18} = 0\]
So yes, a polynomial contrast can be used with this data.
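The orthogonality conditions can also be checked numerically. The sketch below assumes the group sizes 8, 10, and 6 that appear as denominators in the calculation above; the variable names are illustrative only.

```r
# Group sizes, taken from the denominators in the hand calculation
n  <- c(8, 10, 6)

# Contrast coefficients chosen in the answer
ci <- c(1, -1, 0)
di <- c(4/9, 5/9, -1)

# Each set of coefficients should sum to zero (up to floating-point rounding)
sum_c <- sum(ci)
sum_d <- sum(di)

# Orthogonality condition for unequal group sizes
cross <- sum(ci * di / n)

# all.equal() compares with a small numerical tolerance
checks <- c(
  c_sums_to_zero = isTRUE(all.equal(sum_c, 0)),
  d_sums_to_zero = isTRUE(all.equal(sum_d, 0)),
  orthogonal     = isTRUE(all.equal(cross, 0))
)
print(checks)
```

Comparing with `all.equal()` rather than `==` avoids false negatives caused by rounding in fractions such as 4/9.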
Answer: See part b).
d) As in Exercise 16.9(g), describe the relationship between physical fitness scores and duration of required physical therapy. If possible, use polynomial contrasts to draw a conclusion. [5pts]
library(devtools)
## Warning: package 'devtools' was built under R version 3.4.2
library(easyGgplot2)
## Loading required package: ggplot2
library("ggplot2")
ggplot2.dotplot(data=df, xName='Fitness',yName='Recover', groupName='Fitness'
,legendPosition="top",addBoxplot=TRUE)
## `stat_bindot()` using `bins = 30`. Pick better value with `binwidth`.
The group means appear to change linearly with the fitness score.
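One way to back up the visual impression is to split the fitness effect into linear and quadratic components. The sketch below uses `poly()`, which builds polynomial terms that are orthogonal over the observed data and so accounts for unequal group sizes. The helper `trend_test` and the data frame `toy` are hypothetical: `toy` holds made-up numbers for illustration only, not the textbook data.

```r
# Hypothetical helper: fit linear and quadratic trend components of
# Recover on Fitness and return the coefficient table with t tests.
trend_test <- function(data) {
  fit <- lm(Recover ~ poly(Fitness, 2), data = data)
  coef(summary(fit))
}

# Made-up toy data (NOT the Exercise 16.9 data), with unequal group sizes
toy <- data.frame(
  Fitness = rep(c(10, 20, 30), times = c(3, 4, 3)),
  Recover = c(40, 38, 39, 33, 31, 32, 30, 25, 24, 26)
)

res <- trend_test(toy)
# Row 2 tests the linear component, row 3 the quadratic component
print(res)
```

With the real data, the call would be `trend_test(df)` after the recoding step. A small p-value in the linear row together with a large p-value in the quadratic row would support a linear relationship between the means.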
Yes, we can make a prediction by extrapolating the “linear regression” fit implied by the polynomial contrast.
df = read.table("http://www.stat.ufl.edu/~rrandles/sta4210/Rclassnotes/data/textdatasets/KutnerData/Chapter%2019%20Data%20Sets/CH19PR10.txt")
df$Cash = df$V1
df$Age = df$V2
df$Gender = df$V3
df$V1 <- df$V2 <- df$V3 <- df$V4 <- NULL
df$Age[df$Age ==1] = "Young"
df$Age[df$Age ==2] = "Middle"
df$Age[df$Age ==3] = "Elderly"
df$Gender[df$Gender == 1] = "Male"
df$Gender[df$Gender == 2] = "Female"
# Two-way ANOVA; Age * Gender expands to Age + Gender + Age:Gender
av = aov(Cash ~ Age * Gender, data = df)
anova(av)
## Analysis of Variance Table
##
## Response: Cash
##            Df Sum Sq Mean Sq F value    Pr(>F)
## Age         2 316.72 158.361 66.2907 9.789e-12 ***
## Gender      1   5.44   5.444  2.2791    0.1416
## Age:Gender  2   5.06   2.528  1.0581    0.3597
## Residuals  30  71.67   2.389
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
\(H_0:\) all \((\alpha\beta)_{ij} = 0\); \(H_a:\) not all \((\alpha\beta)_{ij}\) equal zero. The p-value for Age:Gender in the ANOVA table is \(0.3597 > 0.05\), so we fail to reject \(H_0\): there is no significant Age by Gender interaction.
\(H_0:\) all \(\beta_j = 0\); \(H_a:\) not all \(\beta_j\) equal zero. The p-value for Gender in the ANOVA table is \(0.1416 > 0.05\), so we fail to reject \(H_0\) and conclude that the Gender main effect is not significant.
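The two p-values used in these tests can be recovered from the F statistics and degrees of freedom printed in the ANOVA table. The sketch below copies those numbers by hand; the variable names are illustrative, and `df_resid` is used so the data frame `df` is not overwritten.

```r
# F statistics and degrees of freedom copied from the ANOVA table
f_interaction <- 1.0581  # Age:Gender, numerator df = 2
f_gender      <- 2.2791  # Gender,     numerator df = 1
df_resid      <- 30      # residual (denominator) df

# Upper-tail probabilities of the F distribution give the p-values
p_interaction <- pf(f_interaction, df1 = 2, df2 = df_resid, lower.tail = FALSE)
p_gender      <- pf(f_gender,      df1 = 1, df2 = df_resid, lower.tail = FALSE)

alpha <- 0.05  # significance level used in the text
p_values <- c(interaction = p_interaction, gender = p_gender)
print(round(p_values, 4))

# TRUE means the p-value exceeds alpha, i.e. we fail to reject H0
print(p_values > alpha)
```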