Unless otherwise stated, assume all significance levels are 0.05.
file_path <- "http://www.sthda.com/sthda/RDoc/data/housetasks.txt"
housetasks <- read.delim(file_path, row.names = 1)
chisq = chisq.test(housetasks)
chisq$observed
##            Wife Alternating Husband Jointly
## Laundry     156          14       2       4
## Main_meal   124          20       5       4
## Dinner       77          11       7      13
## Breakfeast   82          36      15       7
## Tidying      53          11       1      57
## Dishes       32          24       4      53
## Shopping     33          23       9      55
## Official     12          46      23      15
## Driving      10          51      75       3
## Finances     13          13      21      66
## Insurance     8           1      53      77
## Repairs       0           3     160       2
## Holidays      0           1       6     153
round(chisq$expected,2)
##             Wife Alternating Husband Jointly
## Laundry    60.55       25.63   38.45   51.37
## Main_meal  52.64       22.28   33.42   44.65
## Dinner     37.16       15.73   23.59   31.52
## Breakfeast 48.17       20.39   30.58   40.86
## Tidying    41.97       17.77   26.65   35.61
## Dishes     38.88       16.46   24.69   32.98
## Shopping   41.28       17.48   26.22   35.02
## Official   33.03       13.98   20.97   28.02
## Driving    47.82       20.24   30.37   40.57
## Finances   38.88       16.46   24.69   32.98
## Insurance  47.82       20.24   30.37   40.57
## Repairs    56.77       24.03   36.05   48.16
## Holidays   55.05       23.30   34.95   46.70
chisq
##
## Pearson's Chi-squared test
##
## data: housetasks
## X-squared = 1944.5, df = 36, p-value < 2.2e-16
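The expected counts and the test statistic above can be reproduced by hand. The following is a minimal sketch, assuming the observed counts are typed in from the printed table rather than downloaded; the names `observed`, `expected`, `x_squared`, `df`, and `p_value` are illustrative and not part of the original code.

```r
# Observed counts copied from the chisq$observed table above
observed <- matrix(
  c(156, 14,   2,   4,
    124, 20,   5,   4,
     77, 11,   7,  13,
     82, 36,  15,   7,
     53, 11,   1,  57,
     32, 24,   4,  53,
     33, 23,   9,  55,
     12, 46,  23,  15,
     10, 51,  75,   3,
     13, 13,  21,  66,
      8,  1,  53,  77,
      0,  3, 160,   2,
      0,  1,   6, 153),
  ncol = 4, byrow = TRUE,
  dimnames = list(
    c("Laundry", "Main_meal", "Dinner", "Breakfeast", "Tidying", "Dishes",
      "Shopping", "Official", "Driving", "Finances", "Insurance", "Repairs",
      "Holidays"),
    c("Wife", "Alternating", "Husband", "Jointly")))

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, its degrees of freedom, and the upper-tail p-value
x_squared <- sum((observed - expected)^2 / expected)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(x_squared, df, lower.tail = FALSE)
```

Comparing `round(expected, 2)`, `x_squared`, and `df` with the output of `chisq.test()` is a quick check that the test is doing nothing more than this arithmetic.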
Given the above output:
Suppose IQ in a certain population is normally distributed with mean 100 and standard deviation 15.
A scientist wants to know if a new medication affects IQ levels, so she recruits 20 patients to use it for one month and records their IQ levels at the end of the month.
library(BSDA)
## Loading required package: lattice
##
## Attaching package: 'BSDA'
## The following object is masked from 'package:datasets':
##
## Orange
iqlevels = c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 115)
z.test(iqlevels, mu=100, sigma.x=15)
##
## One-sample z-Test
##
## data: iqlevels
## z = 0.90933, p-value = 0.3632
## alternative hypothesis: true mean is not equal to 100
## 95 percent confidence interval:
## 96.47608 109.62392
## sample estimates:
## mean of x
## 103.05
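The one-sample z statistic can also be computed directly in base R, without BSDA. This is a sketch under the same assumptions as the example (known population standard deviation of 15, two-sided alternative); the names `mu0`, `sigma`, `se`, `z`, `p_value`, and `ci` are illustrative.

```r
iqlevels <- c(88, 92, 94, 94, 96, 97, 97, 97, 99, 99,
              105, 109, 109, 109, 110, 112, 112, 113, 114, 115)

mu0   <- 100              # hypothesised population mean
sigma <- 15               # known population standard deviation
n     <- length(iqlevels)

se <- sigma / sqrt(n)                    # standard error of the sample mean
z  <- (mean(iqlevels) - mu0) / se        # z statistic
p_value <- 2 * pnorm(-abs(z))            # two-sided p-value

# 95 percent confidence interval for the population mean
ci <- mean(iqlevels) + c(-1, 1) * qnorm(0.975) * se
```

Printing `z`, `p_value`, and `ci` should agree with the `z.test()` output to the displayed precision.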
Given the above output:
Suppose the IQ levels of individuals in two different cities are known to be normally distributed, each with a population standard deviation of 15.
A scientist wants to know whether the mean IQ levels of individuals in city A and city B differ, so she selects a simple random sample of 20 individuals from each city and records their IQ levels.
library(BSDA)
cityA = c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99, 105, 109, 109, 109, 110, 112, 112, 113, 114, 114)
cityB = c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109, 109, 114, 115, 116, 117, 117, 128, 129, 130, 133)
z.test(x=cityA, y=cityB, mu=0, sigma.x=15, sigma.y=15)
##
## Two-sample z-Test
##
## data: cityA and cityB
## z = -1.7182, p-value = 0.08577
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -17.446925 1.146925
## sample estimates:
## mean of x mean of y
## 100.65 108.80
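For the two-sample case the only change is the standard error, which combines the two known variances. A minimal base R sketch, assuming independent samples and a two-sided alternative; `sigma`, `se`, `z`, `p_value`, and `ci` are illustrative names.

```r
cityA <- c(82, 84, 85, 89, 91, 91, 92, 94, 99, 99,
           105, 109, 109, 109, 110, 112, 112, 113, 114, 114)
cityB <- c(90, 91, 91, 91, 95, 95, 99, 99, 108, 109,
           109, 114, 115, 116, 117, 117, 128, 129, 130, 133)

sigma <- 15   # known population standard deviation in both cities

# Standard error of the difference in sample means
se <- sqrt(sigma^2 / length(cityA) + sigma^2 / length(cityB))

z <- (mean(cityA) - mean(cityB)) / se    # z statistic for H0: difference = 0
p_value <- 2 * pnorm(-abs(z))            # two-sided p-value

# 95 percent confidence interval for the difference in means
ci <- (mean(cityA) - mean(cityB)) + c(-1, 1) * qnorm(0.975) * se
```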
Given the above output:
We want to test whether the mean number of buses that pass by in one day is 3.
x <- c(3,7,11,0,7,0,4,5,6,2)
qqnorm(x)
qqline(x)
t.test(x, mu=3)
##
## One Sample t-test
##
## data: x
## t = 1.3789, df = 9, p-value = 0.2012
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
## 2.0392 6.9608
## sample estimates:
## mean of x
## 4.5
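The t statistic differs from the z statistic only in that the standard deviation is estimated from the sample and the reference distribution is Student's t with n - 1 degrees of freedom. A sketch in base R, with illustrative names `mu0`, `se`, `t_stat`, `df`, and `p_value`:

```r
x <- c(3, 7, 11, 0, 7, 0, 4, 5, 6, 2)

mu0 <- 3                         # hypothesised mean
n   <- length(x)
se  <- sd(x) / sqrt(n)           # estimated standard error of the mean

t_stat  <- (mean(x) - mu0) / se  # t statistic
df      <- n - 1                 # degrees of freedom
p_value <- 2 * pt(-abs(t_stat), df)   # two-sided p-value

# 95 percent confidence interval for the mean
ci <- mean(x) + c(-1, 1) * qt(0.975, df) * se
```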
A scientist wants to compare the weights of mice before and after treatment. She wants to see if there is any significant difference in their weights.
before <- c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
after <- c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)
mice <- data.frame(group = rep(c("before", "after"), each = 10), weight = c(before, after))
d <- with(mice, weight[group == "before"] - weight[group == "after"])
shapiro.test(d)
##
## Shapiro-Wilk normality test
##
## data: d
## W = 0.94536, p-value = 0.6141
t.test(before, after, paired=TRUE)
##
## Paired t-test
##
## data: before and after
## t = -20.883, df = 9, p-value = 6.2e-09
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -215.5581 -173.4219
## sample estimates:
## mean of the differences
## -194.49
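A paired t-test is equivalent to a one-sample t-test on the within-pair differences, which is why normality was checked on `d` rather than on each group. The sketch below illustrates this equivalence; `paired` and `one_sample` are illustrative names.

```r
before <- c(200.1, 190.9, 192.7, 213, 241.4, 196.9, 172.2, 185.5, 205.2, 193.7)
after  <- c(392.9, 393.2, 345.1, 393, 434, 427.9, 422, 383.9, 392.3, 352.2)

d <- before - after                       # within-pair differences

paired     <- t.test(before, after, paired = TRUE)
one_sample <- t.test(d, mu = 0)           # same test, expressed on the differences

# Both calls give the same statistic, degrees of freedom, and p-value
all.equal(unname(paired$statistic), unname(one_sample$statistic))
all.equal(paired$p.value, one_sample$p.value)
```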
We want to know: does the average woman's weight differ from the average man's?
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)
genderweights <- data.frame(group = rep(c("Woman", "Man"), each = 9), weight = c(women_weight, men_weight))
genderweights
##    group weight
## 1  Woman   38.9
## 2  Woman   61.2
## 3  Woman   73.3
## 4  Woman   21.8
## 5  Woman   63.4
## 6  Woman   64.6
## 7  Woman   48.4
## 8  Woman   48.8
## 9  Woman   48.5
## 10   Man   67.8
## 11   Man   60.0
## 12   Man   63.4
## 13   Man   76.0
## 14   Man   89.4
## 15   Man   73.3
## 16   Man   67.3
## 17   Man   61.3
## 18   Man   62.4
# Shapiro-Wilk normality test for Men's weights
with(genderweights, shapiro.test(weight[group == "Man"]))
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Man"]
## W = 0.86425, p-value = 0.1066
# Shapiro-Wilk normality test for Women's weights
with(genderweights, shapiro.test(weight[group == "Woman"]))
##
## Shapiro-Wilk normality test
##
## data: weight[group == "Woman"]
## W = 0.94266, p-value = 0.6101
var.test(weight~group, data = genderweights)
##
## F test to compare two variances
##
## data: weight by group
## F = 0.36134, num df = 8, denom df = 8, p-value = 0.1714
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.08150656 1.60191315
## sample estimates:
## ratio of variances
## 0.3613398
t.test(women_weight, men_weight, var.equal=TRUE)
##
## Two Sample t-test
##
## data: women_weight and men_weight
## t = -2.7842, df = 16, p-value = 0.01327
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.748019 -4.029759
## sample estimates:
## mean of x mean of y
## 52.10000 68.98889
If the two groups in the above example had been found to have unequal variances, the code would be:
t.test(women_weight, men_weight, var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: women_weight and men_weight
## t = -2.7842, df = 13.114, p-value = 0.01538
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -29.981920 -3.795858
## sample estimates:
## mean of x mean of y
## 52.10000 68.98889
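The Welch test keeps the same numerator as the pooled test but replaces the pooled variance with the two separate sample variances and adjusts the degrees of freedom using the Welch-Satterthwaite approximation. With equal group sizes, as here, the t statistic is the same and only the degrees of freedom change. A sketch with illustrative names `v_w`, `v_m`, `t_stat`, and `welch_df`:

```r
women_weight <- c(38.9, 61.2, 73.3, 21.8, 63.4, 64.6, 48.4, 48.8, 48.5)
men_weight   <- c(67.8, 60, 63.4, 76, 89.4, 73.3, 67.3, 61.3, 62.4)

n_w <- length(women_weight)
n_m <- length(men_weight)

# Squared standard error of each group mean
v_w <- var(women_weight) / n_w
v_m <- var(men_weight) / n_m

# Welch t statistic
t_stat <- (mean(women_weight) - mean(men_weight)) / sqrt(v_w + v_m)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v_w + v_m)^2 / (v_w^2 / (n_w - 1) + v_m^2 / (n_m - 1))

p_value <- 2 * pt(-abs(t_stat), welch_df)   # two-sided p-value
```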
When the data cannot be assumed to be normally distributed:
Is the median weight of the rabbits greater than 25 g?
set.seed(1234)
rabbits = data.frame(name = paste0(rep("R_", 10), 1:10), weight = round(rnorm(10, 30, 2), 1))
wilcox.test(rabbits$weight, mu=25, alternative = "greater")
## Warning in wilcox.test.default(rabbits$weight, mu = 25, alternative =
## "greater"): cannot compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: rabbits$weight
## V = 55, p-value = 0.002897
## alternative hypothesis: true location is greater than 25
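The statistic V is the sum of the ranks of the positive differences from the hypothesised value, where ranks are assigned to the absolute differences. A sketch of that calculation, assuming the same seed as above; `weight`, `d`, and `V` are illustrative names.

```r
set.seed(1234)
weight <- round(rnorm(10, 30, 2), 1)   # same simulated weights as above

d <- weight - 25          # differences from the hypothesised median
d <- d[d != 0]            # zero differences are dropped before ranking

# Rank the absolute differences (ties get average ranks),
# then sum the ranks that belong to positive differences
V <- sum(rank(abs(d))[d > 0])

# Compare with the statistic reported by wilcox.test()
# (the ties warning is expected and suppressed here)
reported <- suppressWarnings(
  wilcox.test(weight, mu = 25, alternative = "greater")$statistic)
```

Every simulated weight exceeds 25, so all ten ranks are summed and V takes its maximum possible value, 10 * 11 / 2 = 55. The repeated weights are what trigger the warning about ties.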
mpg = gas mileage of various 1974 U.S. automobiles
am = transmission type (0 = automatic, 1 = manual)
The gas mileage data for manual and automatic transmissions are independent.
wilcox.test(mpg~am, data=mtcars)
## Warning in wilcox.test.default(x = c(21.4, 18.7, 18.1, 14.3, 24.4, 22.8, :
## cannot compute exact p-value with ties
##
## Wilcoxon rank sum test with continuity correction
##
## data: mpg by am
## W = 42, p-value = 0.001871
## alternative hypothesis: true location shift is not equal to 0
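The rank sum statistic W can be computed by ranking all observations together, summing the ranks in the first group, and subtracting the smallest value that sum could take. With the formula `mpg ~ am`, the first group is `am == 0` (automatic). A sketch using the built-in `mtcars` data; `r`, `n_auto`, and `W` are illustrative names.

```r
# Rank all mpg values together (ties get average ranks)
r <- rank(mtcars$mpg)

# Number of cars in the first group (automatic transmission)
n_auto <- sum(mtcars$am == 0)

# Sum of ranks in the first group minus its minimum possible value
W <- sum(r[mtcars$am == 0]) - n_auto * (n_auto + 1) / 2

# Compare with the statistic reported by wilcox.test()
reported <- suppressWarnings(wilcox.test(mpg ~ am, data = mtcars)$statistic)
```

A small W relative to its maximum of `n_auto * sum(mtcars$am == 1)` means the automatic cars mostly occupy the low ranks, that is, they tend to have lower mileage.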
Five varieties of barley were grown in six locations in each of 1931 and 1932.
Loc = location
Var = variety of barley (“manchuria”, “svansota”, “velvet”, “trebi” and “peatland”)
Y1 = yield in 1931
Y2 = yield in 1932
We want to find out whether the yields in 1931 and 1932 differ (that is, whether they come from identical populations).
library(MASS)
head(immer)
##   Loc Var    Y1    Y2
## 1  UF   M  81.0  80.7
## 2  UF   S 105.4  82.3
## 3  UF   V 119.7  80.4
## 4  UF   T 109.7  87.2
## 5  UF   P  98.3  84.2
## 6   W   M 146.6 100.4
wilcox.test(immer$Y1, immer$Y2, paired = TRUE)
## Warning in wilcox.test.default(immer$Y1, immer$Y2, paired = TRUE): cannot
## compute exact p-value with ties
##
## Wilcoxon signed rank test with continuity correction
##
## data: immer$Y1 and immer$Y2
## V = 368.5, p-value = 0.005318
## alternative hypothesis: true location shift is not equal to 0
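As with the paired t-test, the paired Wilcoxon test is a one-sample signed rank test applied to the within-pair differences. The sketch below computes V from the differences and compares it with the value from `wilcox.test()`; it assumes the MASS package is installed, and `d` and `V` are illustrative names.

```r
library(MASS)   # provides the immer data set

d <- immer$Y1 - immer$Y2   # yield difference for each location/variety pair
d <- d[d != 0]             # zero differences are dropped before ranking

# Sum of the ranks of |d| that belong to positive differences
V <- sum(rank(abs(d))[d > 0])

# Same statistic from the paired call and from the one-sample call on d
paired     <- suppressWarnings(wilcox.test(immer$Y1, immer$Y2, paired = TRUE))
one_sample <- suppressWarnings(wilcox.test(immer$Y1 - immer$Y2))
```

Both `paired$statistic` and `one_sample$statistic` should match the hand-computed `V`, which in turn should match the V reported in the output above.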