Unless otherwise stated, assume all significance levels are 0.05.

Chi-square test

file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"
housetasks <- read.delim(file_path, row.names = 1)

chisq = chisq.test(housetasks)

chisq$observed
##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7
## Tidying      53          11       1      57
## Dishes       32          24       4      53
## Shopping     33          23       9      55
## Official     12          46      23      15
## Driving      10          51      75       3
## Finances     13          13      21      66
## Insurance     8           1      53      77
## Repairs       0           3     160       2
## Holidays      0           1       6     153
round(chisq$expected,2)
##             Wife Alternating Husband Jointly
## Laundry    60.55       25.63   38.45   51.37
## Main_meal  52.64       22.28   33.42   44.65
## Dinner     37.16       15.73   23.59   31.52
## Breakfeast 48.17       20.39   30.58   40.86
## Tidying    41.97       17.77   26.65   35.61
## Dishes     38.88       16.46   24.69   32.98
## Shopping   41.28       17.48   26.22   35.02
## Official   33.03       13.98   20.97   28.02
## Driving    47.82       20.24   30.37   40.57
## Finances   38.88       16.46   24.69   32.98
## Insurance  47.82       20.24   30.37   40.57
## Repairs    56.77       24.03   36.05   48.16
## Holidays   55.05       23.30   34.95   46.70
chisq
## 
##  Pearson's Chi-squared test
## 
## data:  housetasks
## X-squared = 1944.5, df = 36, p-value < 2.2e-16

Given the above output:

  1. Write the null and alternative hypotheses for the data.
  2. Make a statistical conclusion.
  3. Make a scientific (biological) conclusion.
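
As a rough sketch of where the expected counts and the X-squared value come from, the observed table printed above can be re-entered by hand (so no download is needed) and the statistic rebuilt from the row and column totals. The object names `observed`, `expected_by_hand`, `x_squared`, `df` and `p_value` are illustrative and not part of the original worksheet.

```r
# Observed counts copied from the chisq$observed printout above.
observed <- matrix(
  c(156, 14,   2,   4,
    124, 20,   5,   4,
     77, 11,   7,  13,
     82, 36,  15,   7,
     53, 11,   1,  57,
     32, 24,   4,  53,
     33, 23,   9,  55,
     12, 46,  23,  15,
     10, 51,  75,   3,
     13, 13,  21,  66,
      8,  1,  53,  77,
      0,  3, 160,   2,
      0,  1,   6, 153),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes",
      "Shopping", "Official", "Driving", "Finances", "Insurance", "Repairs",
      "Holidays"),
    c("Wife", "Alternating", "Husband", "Jointly")
  )
)

# Expected count for each cell = (row total * column total) / grand total.
expected_by_hand <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic = sum over cells of (observed - expected)^2 / expected.
x_squared <- sum((observed - expected_by_hand)^2 / expected_by_hand)

# Degrees of freedom = (rows - 1) * (columns - 1).
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail p-value from the chi-square distribution.
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

round(expected_by_hand, 2)
x_squared
df
```

If the counts were copied correctly, these values should agree with `chisq$expected` and the printed test output.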

Z test

One-sample

Suppose IQ in a certain population is normally distributed with mean 100 and standard deviation 15.

A scientist wants to know if a new medication affects IQ levels, so she recruits 20 patients to use it for one month and records their IQ levels at the end of the month.

library(BSDA)
## Loading required package: lattice
## 
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
## 
##     Orange
iqlevels = c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115)

z.test(iqlevels, mu=100, sigma.x=15)
## 
##  One-sample z-Test
## 
## data:  iqlevels
## z = 0.90933, p-value = 0.3632
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
##   96.47608 109.62392
## sample estimates:
## mean of x 
##    103.05

Given the above output:

  1. Write the null and alternative hypotheses for the data.
  2. What is the test statistic?
  3. What does the p value indicate?
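
The numbers in the z-test output can be rebuilt with base R alone. The following is a minimal sketch, assuming the same data and the known population standard deviation of 15; the names `mu0`, `sigma`, `se`, `z`, `p_value` and `ci` are chosen here for illustration.

```r
iqlevels <- c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
              105, 109, 109, 109, 110, 112, 112, 113, 114, 115)

mu0   <- 100   # hypothesised population mean
sigma <- 15    # known population standard deviation
n     <- length(iqlevels)

# Standard error of the sample mean when sigma is known.
se <- sigma / sqrt(n)

# z statistic: distance of the sample mean from mu0 in standard errors.
z <- (mean(iqlevels) - mu0) / se

# Two-sided p-value from the standard normal distribution.
p_value <- 2 * pnorm(-abs(z))

# 95% confidence interval for the population mean.
ci <- mean(iqlevels) + c(-1, 1) * qnorm(0.975) * se

z
p_value
ci
```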

Two-sample

Suppose the IQ levels of individuals in two different cities are known to be normally distributed, each with a population standard deviation of 15.

A scientist wants to know whether the mean IQ levels of individuals in city A and city B differ, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.

library(BSDA)
cityA = c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 114)

cityB = c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109, 109, 114, 115, 116, 117, 117, 128, 129, 130, 133)

z.test(x=cityA, y=cityB, mu=0, sigma.x=15, sigma.y=15)
## 
##  Two-sample z-Test
## 
## data:  cityA and cityB
## z = -1.7182, p-value = 0.08577
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -17.446925   1.146925
## sample estimates:
## mean of x mean of y 
##    100.65    108.80

Given the above output:

  1. Write the null and alternative hypotheses for the data.
  2. What is the test statistic?
  3. What does the 95% confidence interval indicate?
  4. Write a statistical conclusion.
  5. Write a scientific (biological) conclusion.
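
The two-sample z statistic and its confidence interval can be sketched in base R as follows, assuming the same two samples and a known standard deviation of 15 in both cities. The names `sigma`, `se_diff`, `mean_diff`, `z`, `p_value` and `ci` are illustrative.

```r
cityA <- c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
           105, 109, 109, 109, 110, 112, 112, 113, 114, 114)
cityB <- c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
           109, 114, 115, 116, 117, 117, 128, 129, 130, 133)

sigma <- 15  # known population standard deviation in both cities

# Standard error of the difference between two independent sample means.
se_diff <- sqrt(sigma^2 / length(cityA) + sigma^2 / length(cityB))

# Observed difference in sample means (city A minus city B).
mean_diff <- mean(cityA) - mean(cityB)

# z statistic for a hypothesised difference of 0.
z <- mean_diff / se_diff

# Two-sided p-value and 95% confidence interval for the difference.
p_value <- 2 * pnorm(-abs(z))
ci <- mean_diff + c(-1, 1) * qnorm(0.975) * se_diff

z
p_value
ci
```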

T-test

One-sample

We want to test whether the mean number of buses that pass by in one day is 3.

x <- c(3,7,11,0,7,0,4,5,6,2)

qqnorm(x)
qqline(x)


  1. Do the data look normally distributed?

t.test(x, mu=3)
## 
##  One Sample t-test
## 
## data:  x
## t = 1.3789, df = 9, p-value = 0.2012
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  2.0392 6.9608
## sample estimates:
## mean of x 
##       4.5
  1. What are the null and alternative hypotheses?
  2. What is the test statistic?
  3. What was the mean number of buses observed?
  4. Write a statistical conclusion.
  5. Write a scientific conclusion.
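
For reference, the one-sample t statistic can be rebuilt from the sample mean and sample standard deviation. This is a minimal sketch using the same data; `mu0`, `se`, `t_stat`, `df` and `p_value` are names introduced here for illustration.

```r
x <- c(3, 7, 11, 0, 7, 0, 4, 5, 6, 2)

mu0 <- 3            # hypothesised mean
n   <- length(x)

# Estimated standard error: sample standard deviation over sqrt(n).
se <- sd(x) / sqrt(n)

# t statistic and its degrees of freedom.
t_stat <- (mean(x) - mu0) / se
df     <- n - 1

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = df)

t_stat
df
p_value
```

Unlike the z-test, the standard deviation is estimated from the sample here, which is why the t distribution with n - 1 degrees of freedom is used instead of the normal distribution.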

Two-sample

Paired

A scientist wants to compare the weights of mice before and after treatment. She wants to see if there is any significant difference in their weights.

before <- c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)

after <- c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)

mice <- data.frame(group = rep(c("before", "after"), each = 10), weight = c(before, after))

d <- with(mice, weight[group == "before"] - weight[group == "after"])
shapiro.test(d)
## 
##  Shapiro-Wilk normality test
## 
## data:  d
## W = 0.94536, p-value = 0.6141

  1. What does the Shapiro-Wilk test tell us?

t.test(before, after, paired=TRUE)
## 
##  Paired t-test
## 
## data:  before and after
## t = -20.883, df = 9, p-value = 6.2e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -215.5581 -173.4219
## sample estimates:
## mean of the differences 
##                 -194.49
  1. What are the null and alternative hypotheses?
  2. What was the value for the mean difference?
  3. Write a statistical conclusion.
  4. Write a biological conclusion.
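
A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which is why normality was checked on `d` rather than on each group. The sketch below illustrates this with the same data; `paired_result`, `diff_result` and `t_by_hand` are illustrative names.

```r
before <- c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
after  <- c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)

# Within-pair differences (before minus after).
d <- before - after

# The paired test and the one-sample test on the differences.
paired_result <- t.test(before, after, paired = TRUE)
diff_result   <- t.test(d, mu = 0)

# The same statistic computed directly from the differences.
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))

mean(d)
unname(paired_result$statistic)
unname(diff_result$statistic)
t_by_hand
```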

Equal variance (unpaired)

We want to know whether the average woman’s weight differs from the average man’s.

women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)

men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

genderweights <- data.frame(group = rep(c("Woman", "Man"), each = 9), weight = c(women_weight, men_weight))

genderweights
##    group weight
## 1  Woman   38.9
## 2  Woman   61.2
## 3  Woman   73.3
## 4  Woman   21.8
## 5  Woman   63.4
## 6  Woman   64.6
## 7  Woman   48.4
## 8  Woman   48.8
## 9  Woman   48.5
## 10   Man   67.8
## 11   Man   60.0
## 12   Man   63.4
## 13   Man   76.0
## 14   Man   89.4
## 15   Man   73.3
## 16   Man   67.3
## 17   Man   61.3
## 18   Man   62.4
# Shapiro-Wilk normality test for Men's weights
with(genderweights, shapiro.test(weight[group == "Man"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[group == "Man"]
## W = 0.86425, p-value = 0.1066
# Shapiro-Wilk normality test for Women's weights
with(genderweights, shapiro.test(weight[group == "Woman"]))
## 
##  Shapiro-Wilk normality test
## 
## data:  weight[group == "Woman"]
## W = 0.94266, p-value = 0.6101
var.test(weight~group, data = genderweights)
## 
##  F test to compare two variances
## 
## data:  weight by group
## F = 0.36134, num df = 8, denom df = 8, p-value = 0.1714
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.08150656 1.60191315
## sample estimates:
## ratio of variances 
##          0.3613398
  1. Have the assumptions of a t-test been satisfied? Use the output to support your claim.
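
The F statistic reported by `var.test` is a ratio of the two sample variances. With the formula interface the groups are ordered by factor level, so "Man" (first alphabetically) ends up in the numerator. A minimal sketch of the calculation, with illustrative names `f_ratio`, `df_num`, `df_den` and `p_value`:

```r
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

# Ratio of sample variances: "Man" over "Woman", matching the factor order.
f_ratio <- var(men_weight) / var(women_weight)

# Numerator and denominator degrees of freedom.
df_num <- length(men_weight) - 1
df_den <- length(women_weight) - 1

# Two-sided p-value: twice the smaller tail area of the F distribution.
p_lower <- pf(f_ratio, df_num, df_den)
p_value <- 2 * min(p_lower, 1 - p_lower)

f_ratio
p_value
```
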
t.test(women_weight, men_weight, var.equal=TRUE)
## 
##  Two Sample t-test
## 
## data:  women_weight and men_weight
## t = -2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29.748019  -4.029759
## sample estimates:
## mean of x mean of y 
##  52.10000  68.98889
  1. What are the null and alternative hypotheses?
  2. What was the value for the mean difference?
  3. Write a statistical conclusion.
  4. Write a biological conclusion.
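
With `var.equal=TRUE`, the two sample variances are combined into a single pooled estimate. The sketch below rebuilds the statistic under that assumption; `pooled_var`, `se`, `t_stat`, `df` and `p_value` are names introduced for illustration.

```r
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

n1 <- length(women_weight)
n2 <- length(men_weight)

# Pooled variance: the two sample variances weighted by their degrees of freedom.
pooled_var <- ((n1 - 1) * var(women_weight) + (n2 - 1) * var(men_weight)) /
  (n1 + n2 - 2)

# Standard error of the difference in means under equal variances.
se <- sqrt(pooled_var * (1 / n1 + 1 / n2))

# t statistic, degrees of freedom and two-sided p-value.
t_stat  <- (mean(women_weight) - mean(men_weight)) / se
df      <- n1 + n2 - 2
p_value <- 2 * pt(-abs(t_stat), df = df)

t_stat
df
p_value
```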

Unequal variance (unpaired)

If the two groups in the above example had been found to have unequal variances, the code would be:

t.test(women_weight, men_weight, var.equal=FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  women_weight and men_weight
## t = -2.7842, df = 13.114, p-value = 0.01538
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -29.981920  -3.795858
## sample estimates:
## mean of x mean of y 
##  52.10000  68.98889
  1. What are the null and alternative hypotheses?
  2. What was the value for the mean difference?
  3. Write a statistical conclusion.
  4. Write a scientific conclusion.
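
The non-integer degrees of freedom in the Welch output come from the Welch-Satterthwaite approximation. As a minimal sketch with the same data (names `v1`, `v2`, `se`, `t_stat` and `welch_df` are illustrative):

```r
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

n1 <- length(women_weight)
n2 <- length(men_weight)

# Per-group variance of the sample mean (no pooling).
v1 <- var(women_weight) / n1
v2 <- var(men_weight) / n2

# Standard error and t statistic without the equal-variance assumption.
se     <- sqrt(v1 + v2)
t_stat <- (mean(women_weight) - mean(men_weight)) / se

# Welch-Satterthwaite approximation to the degrees of freedom.
welch_df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

# Two-sided p-value.
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

t_stat
welch_df
p_value
```

Because the two groups have the same size, the t statistic matches the pooled version; only the degrees of freedom, and therefore the p-value and confidence interval, change.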

Wilcoxon signed-rank (one-sample)

Use this test when the data cannot be assumed to be normally distributed.

Is the median weight of the rabbits greater than 25 g?

set.seed(1234)
rabbits = data.frame(name = paste0(rep("R_", 10), 1:10), weight = round(rnorm(10, 30, 2), 1))

wilcox.test(rabbits$weight, mu=25, alternative = "greater")
## Warning in wilcox.test.default(rabbits$weight, mu = 25, alternative =
## "greater"): cannot compute exact p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  rabbits$weight
## V = 55, p-value = 0.002897
## alternative hypothesis: true location is greater than 25
  1. What are the null and alternative hypotheses?
  2. Write a statistical conclusion.
  3. Write a scientific conclusion.
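
The statistic V in the signed-rank output is the sum of the ranks of the positive differences from the hypothesised value. A rough sketch of that calculation on the same simulated data (the names `d`, `ranks` and `v_stat` are illustrative):

```r
set.seed(1234)
rabbits <- data.frame(name = paste0(rep("R_", 10), 1:10),
                      weight = round(rnorm(10, 30, 2), 1))

# Differences from the hypothesised median; exact zeros are dropped.
d <- rabbits$weight - 25
d <- d[d != 0]

# Rank the absolute differences (tied values share an average rank).
ranks <- rank(abs(d))

# V = sum of the ranks that belong to positive differences.
v_stat <- sum(ranks[d > 0])

v_stat
```

The tied average ranks are also the reason for the warning in the output: with ties, `wilcox.test` falls back on a normal approximation instead of an exact p-value.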

Wilcoxon-Mann-Whitney (two-sample)

mpg = gas mileage (miles per US gallon) of 32 automobiles from the 1974 Motor Trend US magazine

am = transmission type (0 = automatic, 1 = manual)

The gas mileage data for manual and automatic transmissions are independent.

wilcox.test(mpg~am, data=mtcars)
## Warning in wilcox.test.default(x = c(21.4, 18.7, 18.1, 14.3, 24.4, 22.8, :
## cannot compute exact p-value with ties
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  mpg by am
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0
  1. What are the null and alternative hypotheses?
  2. Write a statistical conclusion.
  3. Write a scientific conclusion.
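
The statistic W in the rank-sum output can be sketched by ranking all observations together and summing the ranks of the first group (here `am == 0`, automatic, because the formula interface orders groups by level). The names `ranks`, `is_auto`, `n_auto` and `w_stat` are illustrative.

```r
# Rank all mpg values together, ignoring group membership.
ranks <- rank(mtcars$mpg)

# Logical index for the first group (automatic transmission).
is_auto <- mtcars$am == 0
n_auto  <- sum(is_auto)

# W = rank sum of the first group minus its smallest possible value.
w_stat <- sum(ranks[is_auto]) - n_auto * (n_auto + 1) / 2

w_stat
```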

Paired Wilcoxon signed-rank (two-sample)

Five varieties of barley were grown in six locations in 1931 and 1932.

Loc = location

Var = variety of barley (“manchuria”, “svansota”, “velvet”, “trebi” and “peatland”)

Y1 = yield in 1931

Y2 = yield in 1932

We want to find out whether the yields in 1931 and 1932 differ (that is, whether or not they come from identical populations).

library(MASS)
head(immer)
##   Loc Var    Y1    Y2
## 1  UF   M  81.0  80.7
## 2  UF   S 105.4  82.3
## 3  UF   V 119.7  80.4
## 4  UF   T 109.7  87.2
## 5  UF   P  98.3  84.2
## 6   W   M 146.6 100.4
wilcox.test(immer$Y1, immer$Y2, paired = TRUE)
## Warning in wilcox.test.default(immer$Y1, immer$Y2, paired = TRUE): cannot
## compute exact p-value with ties
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  immer$Y1 and immer$Y2
## V = 368.5, p-value = 0.005318
## alternative hypothesis: true location shift is not equal to 0
  1. What are the null and alternative hypotheses?
  2. Write a statistical conclusion.
  3. Write a scientific conclusion.
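
As with the one-sample case, the paired statistic V is the sum of the ranks of the positive within-pair differences. A minimal sketch using the same `immer` data from MASS (the names `d` and `v_stat` are illustrative):

```r
library(MASS)

# Within-pair differences in yield (1931 minus 1932); exact zeros are dropped.
d <- immer$Y1 - immer$Y2
d <- d[d != 0]

# V = sum of the ranks of |d| that belong to positive differences.
v_stat <- sum(rank(abs(d))[d > 0])

v_stat
```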