Reading: Ch. 5, 6.1
Exercises to hand in: 5.28, 5.34, 5.38, 5.46, 5.52, 5.58

library(tidyverse)
library(Stat2Data)
library(skimr)
library(agricolae)

5.28 Discrimination: exploratory

The city of New Haven, Connecticut, administered exams (both written and oral) in November and December of 2003 to firefighters hoping to qualify for promotion to either Lieutenant or Captain in the city fire department. A final score consisting of a 60% weight for the written exam and a 40% weight for the oral exam was computed for each person who took the exam. Those people receiving a total score of at least 70% were deemed to be eligible for promotion. In a situation where \(t\) openings were available, the people with the top \(t+2\) scores would be considered for those openings. A concern was raised, however, that the exams were discriminatory with respect to race and a lawsuit was filed. The data are given in the data file Ricci. For each person who took the exams, there are measurements on their race (black, white, or Hispanic), which position they were trying for (Lieutenant, Captain), scores on the oral and written exams, and the combined score. The concern over the exams administered by the city was that they were discriminatory based on race. Here we concentrate on the overall, combined score on the two tests for these people seeking promotion and we analyze the average score for the three different races.

data("Ricci")

Use a graphical approach to answer the question of whether the average combined score is different for the three races. What do the graphs suggest about any further analysis that could be done? Explain.

data("Ricci")
Ricci %>%
  group_by(Race) %>%
  skim(Combine)

## Skim summary statistics
##  n obs: 118 
##  n variables: 5 
##  group variables: Race 
## 
## ── Variable type:numeric ──────────────────────────────────────────────
##  Race variable missing complete  n  mean   sd    p0   p25   p50   p75
##     B  Combine       0       27 27 63.74 8.74 45.93 57.66 61.07 72.03
##     H  Combine       0       23 23 65.34 7.14 54.13 60.08 65    69.95
##     W  Combine       0       68 68 72.68 8.83 56.32 68.02 71.64 78.45
##   p100     hist
##  76.6  ▁▂▅▇▁▃▅▆
##  79.68 ▅▅▇▅▅▅▂▃
##  92.81 ▅▂▇▇▆▅▃▂

ggplot(Ricci) + geom_boxplot(aes(x=Race, y=Combine))

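The boxplots suggest that the typical combined score differs among the three races: scores are generally higher for White candidates than for Black and Hispanic candidates, whose distributions are similar to each other. (Boxplots mark medians rather than means, but the group means in the summary table, 72.68 vs. 63.74 and 65.34, show the same pattern.) Because the differences between groups are sizable relative to the variability within groups, the graphs suggest following up with a one-way ANOVA to test whether the mean combined scores differ.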
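Because a boxplot marks each group's median rather than its mean, it can help to overlay the group means before comparing averages. The following is a minimal sketch, assuming ggplot2 3.3.0 or later (for the `fun` argument of `stat_summary`) and the Ricci data from Stat2Data; the object name `p` and the diamond marker are illustrative choices, not part of the original analysis.

```r
library(ggplot2)
library(Stat2Data)

data("Ricci")

# Boxplots of combined score by race, with each group's mean overlaid
# as a diamond so that means and medians can be compared directly.
p <- ggplot(Ricci, aes(x = Race, y = Combine)) +
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 18, size = 3)

print(p)
```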

Check the conditions necessary for conducting an ANOVA to determine if the combined score is significantly different for at least one race.

ANOVA requires that the observations be independent, that the residuals be approximately normal, and that the groups have roughly equal variances. Each score belongs to a different candidate, so treating the observations as independent is reasonable. In the normal QQ plot of the residuals, the points follow the line closely, so the normality condition appears to be met. The group standard deviations in the summary above (8.74, 7.14, and 8.83) are similar, and the residuals vs. fitted plot shows comparable spread across groups, so the equal-variance condition is also reasonable. In the ANOVA table, the p-value (5.014e-06) is very small, so we reject the null hypothesis of equal mean scores and conclude that the mean combined score is significantly different for at least one race.

a1 = aov(Combine ~ Race, data=Ricci)
anova(a1)

## Analysis of Variance Table
## 
## Response: Combine
##            Df Sum Sq Mean Sq F value    Pr(>F)    
## Race        2 1971.7  985.83  13.595 5.014e-06 ***
## Residuals 115 8339.1   72.51                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(a1, which=2)

plot(a1, which=1)
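The equal-variance condition can also be checked numerically with a common rule of thumb: the largest group standard deviation should be no more than about twice the smallest. A small sketch using the standard deviations printed in the skim output above (the names `sds` and `sd_ratio` are illustrative):

```r
# Group standard deviations of Combine, copied from the skim summary
sds <- c(B = 8.74, H = 7.14, W = 8.83)

# Ratio of largest to smallest SD; values below about 2 are usually
# taken as consistent with the equal-variance condition
sd_ratio <- max(sds) / min(sds)
round(sd_ratio, 2)
```

The ratio is about 1.24, comfortably under 2, which agrees with the visual impression from the residual plot.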

5.34 Aphid honeydew

Aphids (a type of small insect) produce a form of liquid waste, called honeydew, when they eat plant sap. An experiment was conducted to see whether the amounts of honeydew produced by aphids differ for different combinations of type of aphid and type of host plant. The following ANOVA table was produced with the data from this experiment.

table <- matrix(c("m-1","SST","4.9807", "MST/MSE", "0.000","46","39.87","0.8667", " ", " ", "51", "64.77", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("aphid race-host plant combina","Error","Total")
table <- as.table(table)
table

##                               Df   Sum Sq Mean Sq F value Pr(>F)
## aphid race-host plant combina m-1 SST     4.9807  MST/MSE 0.000 
## Error                         46  39.87   0.8667                
## Total                         51  64.77

Fill in the three missing values in this ANOVA table. Also show how you calculate them.

table <- matrix(c("5","24.9","4.9807", "5.747", "0.000","46","39.87","0.8667", " ", " ", "51", "64.77", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("aphid race-host plant combina","Error","Total")
table <- as.table(table)
table

##                               Df  Sum Sq Mean Sq F value Pr(>F)
## aphid race-host plant combina 5  24.9    4.9807  5.747   0.000 
## Error                         46 39.87   0.8667                
## Total                         51 64.77
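The three filled-in values follow from the additivity of degrees of freedom and sums of squares, and from the definition of the F-ratio. A short sketch of the arithmetic, using only the numbers given in the original table (variable names are illustrative):

```r
# Values given in the original table
df_total  <- 51
df_error  <- 46
ss_total  <- 64.77
ss_error  <- 39.87
ms_groups <- 4.9807
ms_error  <- 0.8667

# Degrees of freedom and sums of squares are additive
df_groups <- df_total - df_error   # groups df = m - 1
ss_groups <- ss_total - ss_error

# F is the ratio of the two mean squares
f_value <- ms_groups / ms_error

# Upper-tail p-value from the F distribution, as a consistency check
p_value <- pf(f_value, df_groups, df_error, lower.tail = FALSE)

c(df_groups = df_groups,
  ss_groups = round(ss_groups, 2),
  f_value   = round(f_value, 3))
```

This gives 5 degrees of freedom, a sum of squares of 24.9, and an F value of about 5.747. As a check, 24.9 / 5 = 4.98 matches the given mean square, and the p-value is below 0.001, consistent with the 0.000 shown in the table.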

How many different aphid/plant combinations were considered in this analysis? Explain how you know.

There are 6 different aphid/plant combinations. The degrees of freedom for the groups equal the number of groups minus one (m - 1), and here they are 51 - 46 = 5, so m = 5 + 1 = 6.

Summarize the conclusion from this ANOVA (in context).

The F-ratio indicates that the mean square between groups is about 6 times (5.747) the mean square within groups. The p-value is very small (reported as 0.000), so we reject the null hypothesis of equal means and conclude that the mean amount of honeydew produced differs for at least one combination of type of aphid and type of host plant.

5.38 Meniscus: stiffness

An experiment was conducted to compare three different methods of repairing a meniscus (cartilage in the knee). Eighteen lightly embalmed cadaveric specimens were used, with each being randomly assigned to one of the three treatments: vertical suture, meniscus arrow, FasT-Fix. Each knee was evaluated on three different response variables: load at failure, stiffness, and displacement. The data are located in the file Meniscus. For this exercise we will concentrate on the stiffness response variable (variable name Stiffness).

Give the hypotheses that would be tested in an ANOVA procedure for this dataset.

\[ H_0: \mu_1=\mu_2=\mu_3 \\ H_A: \text{at least one } \mu_i \text{ is not the same} \]

Show that the conditions for ANOVA are met for these data.

data(Meniscus)
a1 <- aov(Stiffness~factor(Method), data=Meniscus)
plot(a1, which=2)

plot(a1, which=1)

ggplot(Meniscus) + geom_boxplot(aes(x=factor(Method), y=Stiffness))

Each specimen was randomly assigned to one of the three methods, so the observations can be treated as independent. In the normal QQ plot, most of the residuals fall close to the dotted line, so the normality condition appears to be met. In the residuals vs. fitted plot, the red line is nearly flat and close to the dotted zero line, and the residuals fall within a roughly even band, so the equal-variance condition is reasonable. Therefore, the conditions for ANOVA appear to be met for these data.

Conduct an ANOVA. Report the ANOVA table and interpret the results. Do the data provide strong evidence that the mean value of stiffness differs based on the type of meniscus repair? Explain.

anova(a1)

## Analysis of Variance Table
## 
## Response: Stiffness
##                Df Sum Sq Mean Sq F value  Pr(>F)  
## factor(Method)  2 10.570   5.285  4.9811 0.02193 *
## Residuals      15 15.915   1.061                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-ratio is about 5 (4.98), so the mean square between groups is about 5 times the mean square within groups. Since the p-value (0.022) is smaller than the significance level \(\alpha=0.05\), we reject the null hypothesis that mean stiffness is the same for all three repair methods and conclude that mean stiffness differs for at least one type of meniscus repair. The evidence is moderately strong rather than overwhelming: the result is significant at the 5% level but would not be at the 1% level.
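The F value and p-value in the table can be reproduced from the mean squares and degrees of freedom. A minimal sketch, using the rounded values printed in the ANOVA table (so the results agree with the table only up to rounding; variable names are illustrative):

```r
# Mean squares and degrees of freedom from the ANOVA table
ms_method <- 5.285
ms_resid  <- 1.061
df_method <- 2
df_resid  <- 15

# F statistic: between-group mean square over within-group mean square
f_value <- ms_method / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_method, df_resid, lower.tail = FALSE)

# Critical value for a test at the 5% level
f_crit <- qf(0.95, df_method, df_resid)

round(c(f_value = f_value, p_value = p_value, f_crit = f_crit), 4)
```

The observed F exceeds the 5% critical value, which is the same decision as comparing the p-value with 0.05.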

5.46 Meniscus stiffness: Fisher’s LSD

In Exercise 5.38 we discovered that there was a significant difference between the treatments with respect to the response variable stiffness. For this variable, larger values are better (less stiffness to the specimen). The researchers were comparing a potential new treatment (FasT-Fix) to two commonly used treatments (vertical suture and meniscus arrow). Use Fisher’s LSD to determine which differences exist between the treatments and discuss the ramifications of your conclusions for doctors.

a1 = aov(Stiffness~factor(Method), data=Meniscus)
anova(a1)

## Analysis of Variance Table
## 
## Response: Stiffness
##                Df Sum Sq Mean Sq F value  Pr(>F)  
## factor(Method)  2 10.570   5.285  4.9811 0.02193 *
## Residuals      15 15.915   1.061                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

print(LSD.test(a1,"factor(Method)"))

## $statistics
##   MSerror Df     Mean       CV t.value     LSD
##     1.061 15 7.183333 14.33942 2.13145 1.26757
## 
## $parameters
##         test p.ajusted         name.t ntr alpha
##   Fisher-LSD      none factor(Method)   3  0.05
## 
## $means
##   Stiffness       std r      LCL      UCL Min Max   Q25  Q50   Q75
## 1      7.75 0.9710819 6 6.853692 8.646308 6.3 8.7 7.225 7.80 8.600
## 2      6.10 1.3266499 6 5.203692 6.996308 4.7 8.4 5.200 5.95 6.475
## 3      7.70 0.6928203 6 6.803692 8.596308 6.4 8.3 7.625 7.85 8.150
## 
## $comparison
## NULL
## 
## $groups
##   Stiffness groups
## 1      7.75      a
## 3      7.70      a
## 2      6.10      b
## 
## attr(,"class")
## [1] "group"

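Fisher's LSD is 1.27, so two methods differ significantly when their mean stiffness values differ by more than 1.27. The differences between Methods 1 and 2 (7.75 - 6.10 = 1.65) and between Methods 3 and 2 (7.70 - 6.10 = 1.60) exceed this threshold, while the difference between Methods 1 and 3 (0.05) does not. The grouping letters show the same thing: Methods 1 and 3 share the letter a, and Method 2 has the letter b. This means that the new method (Method 3) performs similarly to Method 1 but differs from Method 2. For doctors, the new treatment appears to be about as good as Method 1 and better than Method 2 with respect to stiffness, since larger values are better here.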
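The LSD value and the grouping can be reproduced by hand from the ANOVA table. For two groups of equal size n, Fisher's LSD is the t critical value multiplied by sqrt(MSE * (1/n + 1/n)). A minimal sketch using the values printed above (MSE = 1.061 on 15 degrees of freedom, n = 6 per method); the variable names and the labels `m1`, `m2`, `m3` are illustrative:

```r
# Values from the ANOVA table and the LSD.test output
mse      <- 1.061
df_error <- 15
n        <- 6   # specimens per method

# Two-sided t critical value at alpha = 0.05
t_crit <- qt(0.975, df_error)

# Least significant difference for two groups of size n
lsd <- t_crit * sqrt(mse * (1 / n + 1 / n))

# Group means from the $means component
means <- c(m1 = 7.75, m2 = 6.10, m3 = 7.70)

# Absolute pairwise differences, flagged when they exceed the LSD
diffs <- abs(outer(means, means, "-"))
signif_diff <- diffs > lsd

round(lsd, 3)
signif_diff
```

Only the pairs involving Method 2 are flagged, which matches the a/a/b grouping reported by `LSD.test`.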

5.52 Words with Friends

Revisit the dataset WordsWithFriends that was analyzed in Section 5.8. In that analysis in the Case Study, we questioned whether the number of blank tiles that a player receives was related to the final score. In that analysis, our conclusion was that there is a noticeable difference between the final scores of the games, depending on how many blank tiles the player receives. In this exercise we ask the same question, but with respect to the winning margin rather than the final score.

Show that the conditions for ANOVA are met for these data.

data("WordsWithFriends")
a1 <- aov(WinMargin~factor(BlanksNumber), data=WordsWithFriends)
plot(a1, which=2)

plot(a1, which=1)

ggplot(WordsWithFriends) + geom_boxplot(aes(x=factor(BlanksNumber), y=WinMargin))

The games are separate observations, so treating the winning margins as independent is reasonable. In the normal QQ plot, most of the residuals line up with the dotted line, so the normality condition holds. In the residuals vs. fitted plot, the red line is flat and lies on the dotted zero line, showing that the equal-variance condition is also met.

Conduct an ANOVA. Report the ANOVA table and interpret the results.

anova(a1)

## Analysis of Variance Table
## 
## Response: WinMargin
##                       Df Sum Sq Mean Sq F value   Pr(>F)   
## factor(BlanksNumber)   2   9514  4757.2  6.9884 0.001028 **
## Residuals            441 300202   680.7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The F-ratio is about 7 (6.99), so the mean square between groups is about 7 times the mean square within groups. The p-value (0.001) is well below the usual 0.05 significance level, so there is enough evidence to reject the null hypothesis that the mean winning margin is the same regardless of how many blank tiles the player receives. We conclude that the mean winning margin differs for at least one number of blank tiles, so the winning margin, like the final score, is related to the number of blank tiles a player receives.

5.58 Salary

A researcher wanted to know if the mean salaries of men and women are different. She chose a stratified random sample of 280 people from the 2000 U.S. Census consisting of men and women from New York State, Oregon, Arizona, and Iowa. The researcher, not understanding much about statistics, had Minitab compute an ANOVA table for her. It is shown below:

table <- matrix(c("1","8190848743","8190848743", "12.45", "0.000","278","1.82913E+11","657958980", " ", " ", "279", "1.91103E+11", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("sex","Error","Total")
table <- as.table(table)
add <- matrix(c("S=25651","R-sq=4.29","R-sq(adj)=3.94"),ncol=3,byrow=TRUE)
table

##       Df   Sum Sq     Mean Sq    F value Pr(>F)
## sex   1   8190848743  8190848743 12.45   0.000 
## Error 278 1.82913E+11 657958980                
## Total 279 1.91103E+11

add

##      [,1]      [,2]        [,3]            
## [1,] "S=25651" "R-sq=4.29" "R-sq(adj)=3.94"

Is a person’s sex significant in predicting their salary? Explain your conclusions.

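According to the ANOVA table, a person’s sex is significant in predicting their salary: F = 12.45 and the p-value is reported as 0.000 (below 0.001), so we reject the null hypothesis that the mean salaries of men and women are equal.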

What value of \(R^2\) does the ANOVA model have? Is this good? Explain.

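The \(R^2\) value for this ANOVA model is 4.29%. This is very small: sex explains only about 4% of the variability in salary, leaving about 96% unexplained. So although sex is statistically significant, this is not a good \(R^2\) value and the model is of little use for predicting an individual's salary.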
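The summary statistics reported under the table can be recovered from the sums of squares. A minimal sketch using the table's values (variable names are illustrative, and the results match the Minitab output only up to the rounding of the printed sums of squares):

```r
# Values from the ANOVA table
ss_sex   <- 8190848743
ss_total <- 1.91103e11
ms_error <- 657958980
df_total <- 279

# R-squared: share of total variability explained by sex
r_sq <- ss_sex / ss_total

# S: residual standard deviation, the square root of the error mean square
s <- sqrt(ms_error)

# Adjusted R-squared: compares the error mean square with the total mean square
r_sq_adj <- 1 - ms_error / (ss_total / df_total)

round(100 * c(r_sq = r_sq, r_sq_adj = r_sq_adj), 2)
round(s)
```

These reproduce R-sq = 4.29%, R-sq(adj) = 3.94%, and S = 25651. The typical deviation of a salary from its group mean is therefore roughly 25,650 in the units of the salary variable.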

The researcher did not look at residual plots. They are shown in Figure 5.32. What conclusions do you reach about the ANOVA after examining these plots? Explain.

The normality condition for ANOVA is not really met because the QQ plot shows curvature. There are also some outliers, so the residuals vs. fitted plot has points that fall outside the band. The conditions for ANOVA are therefore questionable for these data, and the conclusions from the ANOVA table should be treated with caution.