library(tidyverse)
library(Stat2Data)
library(skimr)
library(agricolae)
The city of New Haven, Connecticut, administered exams (both written and oral) in November and December of 2003 to firefighters hoping to qualify for promotion to either Lieutenant or Captain in the city fire department. A final score consisting of a 60% weight for the written exam and a 40% weight for the oral exam was computed for each person who took the exam. Those people receiving a total score of at least 70% were deemed to be eligible for promotion. In a situation where \(t\) openings were available, the people with the top \(t+2\) scores would be considered for those openings. A concern was raised, however, that the exams were discriminatory with respect to race, and a lawsuit was filed. The data are given in the data file Ricci. For each person who took the exams, there are measurements on their race (black, white, or Hispanic), which position they were trying for (Lieutenant, Captain), scores on the oral and written exams, and the combined score. Here we concentrate on the overall, combined score on the two tests for these people seeking promotion and we analyze the average score for the three different races.
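As a quick illustration of the scoring rule described above, the sketch below applies the 60/40 weighting, the 70% cutoff, and the top \(t+2\) rule to a handful of made-up candidates. The data frame `candidates`, its scores, and the choice of one opening are hypothetical and are not taken from the Ricci data. Restricting the \(t+2\) list to eligible candidates is also an assumption about how the two rules combine.

```r
# Hypothetical candidates; these scores are illustrative, not from Ricci
candidates <- data.frame(
  id      = c("A", "B", "C", "D", "E"),
  written = c(82, 65, 74, 90, 58),
  oral    = c(70, 72, 60, 85, 75)
)

# Combined score: 60% written + 40% oral
candidates$combined <- 0.6 * candidates$written + 0.4 * candidates$oral

# Eligibility cutoff: combined score of at least 70
candidates$eligible <- candidates$combined >= 70

# With t openings, the top t + 2 scores are considered
# (assumption: only eligible candidates can appear on that list)
openings   <- 1
ranked     <- candidates[order(-candidates$combined), ]
considered <- head(ranked[ranked$eligible, ], openings + 2)

considered
```

Only two of the five made-up candidates clear the cutoff here, so the considered list is shorter than \(t+2\).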
data("Ricci")
Ricci %>%
group_by(Race) %>%
skim(Combine)
## Skim summary statistics
## n obs: 118
## n variables: 5
## group variables: Race
##
## ── Variable type:numeric ──────────────────────────────────────────────
## Race variable missing complete n mean sd p0 p25 p50 p75
## B Combine 0 27 27 63.74 8.74 45.93 57.66 61.07 72.03
## H Combine 0 23 23 65.34 7.14 54.13 60.08 65 69.95
## W Combine 0 68 68 72.68 8.83 56.32 68.02 71.64 78.45
## p100 hist
## 76.6 ▁▂▅▇▁▃▅▆
## 79.68 ▅▅▇▅▅▅▂▃
## 92.81 ▅▂▇▇▆▅▃▂
ggplot(Ricci) + geom_boxplot(aes(x=Race, y=Combine))
a1 = aov(Combine ~ Race, data=Ricci)
anova(a1)
## Analysis of Variance Table
##
## Response: Combine
## Df Sum Sq Mean Sq F value Pr(>F)
## Race 2 1971.7 985.83 13.595 5.014e-06 ***
## Residuals 115 8339.1 72.51
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
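The F statistic in this table can be approximately rebuilt from the group sizes, means, and standard deviations in the `skim()` output. The sketch below does that arithmetic by hand. Because it starts from rounded summaries, the sums of squares differ slightly from the `anova()` values; the variable names are illustrative.

```r
# Group summaries copied from the skim() output (rounded to two decimals)
n    <- c(B = 27,    H = 23,    W = 68)
xbar <- c(B = 63.74, H = 65.34, W = 72.68)
s    <- c(B = 8.74,  H = 7.14,  W = 8.83)

# Grand mean is the size-weighted average of the group means
grand_mean <- sum(n * xbar) / sum(n)

# Between-group and within-group sums of squares
ss_groups <- sum(n * (xbar - grand_mean)^2)
ss_error  <- sum((n - 1) * s^2)

# Degrees of freedom: K - 1 for groups, N - K for error
df_groups <- length(n) - 1
df_error  <- sum(n) - length(n)

# F statistic and its upper-tail p-value
f_stat  <- (ss_groups / df_groups) / (ss_error / df_error)
p_value <- pf(f_stat, df_groups, df_error, lower.tail = FALSE)

round(c(SSGroups = ss_groups, SSE = ss_error, F = f_stat), 2)
p_value
```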
plot(a1, which=2)
plot(a1, which=1)
Aphids (a type of small insect) produce a form of liquid waste, called honeydew, when they eat plant sap. An experiment was conducted to see whether the amounts of honeydew produced by aphids differ for different combinations of type of aphid and type of host plant. The following ANOVA table was produced with the data from this experiment.
table <- matrix(c("m-1","SST","4.9807", "MST/MSE", "0.000","46","39.87","0.8667", " ", " ", "51", "64.77", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("aphid race-host plant combina","Error","Total")
table <- as.table(table)
table
## Df Sum Sq Mean Sq F value Pr(>F)
## aphid race-host plant combina m-1 SST 4.9807 MST/MSE 0.000
## Error 46 39.87 0.8667
## Total 51 64.77
table <- matrix(c("5","24.9","4.9807", "5.747", "0.000","46","39.87","0.8667", " ", " ", "51", "64.77", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("aphid race-host plant combina","Error","Total")
table <- as.table(table)
table
## Df Sum Sq Mean Sq F value Pr(>F)
## aphid race-host plant combina 5 24.9 4.9807 5.747 0.000
## Error 46 39.87 0.8667
## Total 51 64.77
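The missing entries follow from the additivity of degrees of freedom and sums of squares, and from the definition of F as a ratio of mean squares. A minimal sketch of that arithmetic, using only the numbers given in the table (variable names are illustrative):

```r
# Values given in the exercise table
df_total  <- 51
df_error  <- 46
ss_total  <- 64.77
ss_error  <- 39.87
ms_groups <- 4.9807
ms_error  <- 0.8667

# Degrees of freedom are additive: df_groups = df_total - df_error
df_groups <- df_total - df_error

# Number of aphid race-host plant combinations, since df_groups = m - 1
m <- df_groups + 1

# Sums of squares are additive as well
ss_groups <- ss_total - ss_error

# Consistency check: SS should be close to df * MS (up to rounding)
ss_check <- df_groups * ms_groups

# F statistic and its upper-tail p-value
f_stat  <- ms_groups / ms_error
p_value <- pf(f_stat, df_groups, df_error, lower.tail = FALSE)

c(df = df_groups, m = m, SS = ss_groups, F = round(f_stat, 3))
p_value
```

The p-value is below 0.001, which is why the table prints it as 0.000.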
An experiment was conducted to compare three different methods of repairing a meniscus (cartilage in the knee). Eighteen lightly embalmed cadaveric specimens were used, with each being randomly assigned to one of the three treatments: vertical suture, meniscus arrow, FasT-Fix. Each knee was evaluated on three different response variables: load at failure, stiffness, and displacement. The data are located in the file Meniscus. For this exercise we will concentrate on the stiffness response variable (variable name Stiffness).
\[ H_0: \mu_1=\mu_2=\mu_3\\ H_A: \text{at least one } \mu_i \text{ is not the same} \]
data(Meniscus)
a1 <- aov(Stiffness~factor(Method), data=Meniscus)
plot(a1, which=2)
plot(a1, which=1)
ggplot(Meniscus) + geom_boxplot(aes(x=factor(Method), y=Stiffness))
anova(a1)
## Analysis of Variance Table
##
## Response: Stiffness
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(Method) 2 10.570 5.285 4.9811 0.02193 *
## Residuals 15 15.915 1.061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In Exercise 5.38 we discovered that there was a significant difference between the treatments with respect to the response variable stiffness. For this variable, larger values are better (less stiffness to the specimen). The researchers were comparing a potential new treatment (FasT-Fix) to two commonly used treatments (vertical suture and meniscus arrow). Use Fisher’s LSD to determine which differences exist between the treatments and discuss the ramifications of your conclusions for doctors.
a1 = aov(Stiffness~factor(Method), data=Meniscus)
anova(a1)
## Analysis of Variance Table
##
## Response: Stiffness
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(Method) 2 10.570 5.285 4.9811 0.02193 *
## Residuals 15 15.915 1.061
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
print(LSD.test(a1,"factor(Method)"))
## $statistics
## MSerror Df Mean CV t.value LSD
## 1.061 15 7.183333 14.33942 2.13145 1.26757
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none factor(Method) 3 0.05
##
## $means
## Stiffness std r LCL UCL Min Max Q25 Q50 Q75
## 1 7.75 0.9710819 6 6.853692 8.646308 6.3 8.7 7.225 7.80 8.600
## 2 6.10 1.3266499 6 5.203692 6.996308 4.7 8.4 5.200 5.95 6.475
## 3 7.70 0.6928203 6 6.803692 8.596308 6.4 8.3 7.625 7.85 8.150
##
## $comparison
## NULL
##
## $groups
## Stiffness groups
## 1 7.75 a
## 3 7.70 a
## 2 6.10 b
##
## attr(,"class")
## [1] "group"
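The LSD value and the letter groupings can be checked by hand from the pieces in this output. The sketch below recomputes the least significant difference from MSE, the error degrees of freedom, and the common group size, then compares each pair of means against it. The group labels 1, 2, 3 are kept as they appear in the output; variable names are illustrative.

```r
# Quantities taken from the LSD.test() output
mse      <- 1.061   # MSerror
df_error <- 15      # error degrees of freedom
r        <- 6       # replicates per method
alpha    <- 0.05
means    <- c("1" = 7.75, "2" = 6.10, "3" = 7.70)

# LSD = t* x sqrt(MSE * (1/n_i + 1/n_j)), with equal group sizes here
t_crit <- qt(1 - alpha / 2, df_error)
lsd    <- t_crit * sqrt(mse * (1 / r + 1 / r))

# Absolute difference for every pair of methods
pairs <- combn(names(means), 2)
diffs <- abs(means[pairs[1, ]] - means[pairs[2, ]])
names(diffs) <- paste(pairs[1, ], pairs[2, ], sep = "-")

# A pair differs significantly when its gap exceeds the LSD
significant <- diffs > lsd

round(c(t = t_crit, LSD = lsd), 5)
data.frame(diff = diffs, significant = significant)
```

Methods 1 and 3 are not separated from each other, while method 2 differs from both, which matches the a/a/b grouping.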
Revisit the dataset WordsWithFriends that was analyzed in Section 5.8. In that Case Study, we asked whether the number of blank tiles that a player receives was related to the final score. Our conclusion was that there is a noticeable difference between the final scores of the games, depending on how many blank tiles the player receives. In this exercise we ask the same question, but with respect to the winning margin rather than the final score.
data("WordsWithFriends")
a1 <- aov(WinMargin~factor(BlanksNumber), data=WordsWithFriends)
plot(a1, which=2)
plot(a1, which=1)
ggplot(WordsWithFriends) + geom_boxplot(aes(x=factor(BlanksNumber), y=WinMargin))
anova(a1)
## Analysis of Variance Table
##
## Response: WinMargin
## Df Sum Sq Mean Sq F value Pr(>F)
## factor(BlanksNumber) 2 9514 4757.2 6.9884 0.001028 **
## Residuals 441 300202 680.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
A researcher wanted to know if the mean salaries of men and women are different. She chose a stratified random sample of 280 people from the 2000 U.S. Census consisting of men and women from New York State, Oregon, Arizona, and Iowa. The researcher, not understanding much about statistics, had Minitab compute an ANOVA table for her. It is shown below:
table <- matrix(c("1","8190848743","8190848743", "12.45", "0.000","278","1.82913E+11","657958980", " ", " ", "279", "1.91103E+11", " ", " ", " "),ncol=5,byrow=TRUE)
colnames(table) <- c("Df", " Sum Sq", "Mean Sq", "F value", "Pr(>F)")
rownames(table) <- c("sex","Error","Total")
table <- as.table(table)
add <- matrix(c("S=25651","R-sq=4.29","R-sq(adj)=3.94"),ncol=3,byrow=TRUE)
table
## Df Sum Sq Mean Sq F value Pr(>F)
## sex 1 8190848743 8190848743 12.45 0.000
## Error 278 1.82913E+11 657958980
## Total 279 1.91103E+11
add
## [,1] [,2] [,3]
## [1,] "S=25651" "R-sq=4.29" "R-sq(adj)=3.94"
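The summary line (S, R-sq, R-sq(adj)) can be recovered from the ANOVA table itself. The sketch below does that arithmetic. It also takes the square root of F, because with only two groups the one-way ANOVA F statistic equals the square of the pooled two-sample t statistic. Variable names are illustrative.

```r
# Entries from the Minitab ANOVA table
ss_sex   <- 8190848743
ss_total <- 1.91103e11
ms_error <- 657958980
df_sex   <- 1
df_error <- 278
df_total <- 279

# F is the ratio of the two mean squares
f_stat <- (ss_sex / df_sex) / ms_error

# S is the residual standard error, the square root of MSE
s <- sqrt(ms_error)

# R-sq is the share of total variation explained by sex (in percent)
r_sq <- 100 * ss_sex / ss_total

# Adjusted R-sq compares MSE with the total mean square
r_sq_adj <- 100 * (1 - ms_error / (ss_total / df_total))

# With two groups, |t| from the pooled two-sample test is sqrt(F)
t_equiv <- sqrt(f_stat)
p_value <- pf(f_stat, df_sex, df_error, lower.tail = FALSE)

round(c(F = f_stat, S = s, Rsq = r_sq, RsqAdj = r_sq_adj, t = t_equiv), 2)
p_value
```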