library(readr)
library(pander)
library(BSDA)
\(~\)
Problem 1: The following data present the productivity scores of 10 workers before and after a training program designed to improve productivity. Higher scores indicate better productivity.
Worker | Before | After |
---|---|---|
1 | 95 | 105 |
2 | 97 | 111 |
3 | 105 | 111 |
4 | 94 | 106 |
5 | 103 | 106 |
6 | 97 | 104 |
7 | 98 | 105 |
8 | 95 | 102 |
9 | 100 | 106 |
10 | 95 | 102 |
Do these data provide sufficient evidence that the training program has improved productivity? Perform a hypothesis test at the \(5\%\) significance level.
\(~\)
Problem: Is the mean productivity score of the workers after the training program significantly higher than their mean productivity score before the program?
Let \(\mu_1\) be the mean productivity score of the workers after the training program.
Let \(\mu_2\) be the mean productivity score of the workers before the training program.
Ho: The mean productivity score of the workers after the training program is less than or equal to the mean productivity score before the training program (\(\mu_1 \le \mu_2\)).
Ha: The mean productivity score of the workers after the training program is greater than the mean productivity score before the training program (\(\mu_1 > \mu_2\)).
Significance Level: \(0.05\).
Test Statistic: t-statistic for dependent or paired observations.
Critical Region:
# Upper-tail critical value at alpha = 0.05 with df = n - 1 = 9.
tcritical <- qt(0.05, 9, lower.tail = FALSE)
tcritical
[1] 1.833113
Decision Rule: Reject Ho if t-computed is greater than or equal to \(1.833\). Otherwise, fail to reject Ho.
# Create data manually:
before <- c(95, 97, 105, 94, 103, 97, 98, 95, 100, 95)
after <- c(105, 111, 111, 106, 106, 104, 105, 102, 106, 102)
pander(t.test(after, before, paired = TRUE, alternative = "greater", mu = 0))
Test statistic | df | P value | Alternative hypothesis |
---|---|---|---|
7.776 | 9 | 1.388e-05 * * * | greater |
mean of the differences |
---|
7.9 |
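As a cross-check on the `t.test()` output, the paired t-statistic can be computed by hand from the differences. This is a minimal sketch; the names `d`, `n`, `se_d`, `t_computed`, and `p_value` are introduced here for illustration and are not part of the original solution.

```r
# Productivity scores from the table in Problem 1.
before <- c(95, 97, 105, 94, 103, 97, 98, 95, 100, 95)
after  <- c(105, 111, 111, 106, 106, 104, 105, 102, 106, 102)

# Paired differences (after - before), one per worker.
d <- after - before
n <- length(d)

# Standard error of the mean difference.
se_d <- sd(d) / sqrt(n)

# t-statistic under Ho (mean difference = 0), with n - 1 degrees of freedom.
t_computed <- mean(d) / se_d

# Upper-tail p-value, matching alternative = "greater".
p_value <- pt(t_computed, df = n - 1, lower.tail = FALSE)

round(c(mean_diff = mean(d), t = t_computed), 3)
p_value
```

The mean difference and t-statistic should agree with the values in the pander table above, since `t.test(..., paired = TRUE)` performs the same one-sample computation on the differences.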
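Decision: Since t-computed (\(7.776\)) is greater than \(1.833\), reject Ho. At the \(5\%\) significance level, the data provide sufficient evidence that the training program has improved productivity.
\(~\)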
Problem 2: According to a recent study, when shopping online for luxury goods, men spend a mean of \(\$2,401\), whereas women spend a mean of \(\$1,527\). (Data extracted from R.A. Smith, “Fashion Online: Retailers Tackle the Gender Gap,” The Wall Street Journal, March 13, 2008, pp. D1, D10) Suppose that the study was based on a sample of 600 men and 700 women, and the standard deviation of the amount spent was \(\$1,200\) for men and \(\$1,000\) for women. At the \(0.01\) significance level, is there evidence that the mean amount spent by men is higher than that by women?
Problem: Is the mean amount spent shopping online by men (significantly) greater than the mean amount spent by women?
Let \(\mu_1\) be the mean amount spent shopping online by men.
Let \(\mu_2\) be the mean amount spent shopping online by women.
Ho: The mean amount spent shopping online by men is less than or equal to the mean amount spent by women.
Ha: The mean amount spent shopping online by men is greater than the mean amount spent by women.
Significance Level: \(0.01\).
Test Statistic: pooled two-sample t-statistic, assuming equal population variances (the population standard deviations are unknown and are estimated by the sample standard deviations).
Critical Region:
tcritical <- qt(0.01, 1298, lower.tail = FALSE)
tcritical
[1] 2.329224
Decision Rule: Reject Ho if t-computed is greater than \(2.329\). Otherwise, fail to reject Ho. (Note: because the sample sizes are large, the critical t-value is nearly the same as the z-table value for a one-tailed test at the \(0.01\) significance level, \(2.326\).)
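The note on large samples can be illustrated by placing the t and z critical values side by side. This is a small sketch; `alpha`, `df`, `t_crit`, and `z_crit` are names chosen here for illustration.

```r
# One-tailed test at alpha = 0.01.
alpha <- 0.01
df <- 600 + 700 - 2  # pooled degrees of freedom: n1 + n2 - 2 = 1298

# Upper-tail critical values from the t and standard normal distributions.
t_crit <- qt(alpha, df, lower.tail = FALSE)
z_crit <- qnorm(alpha, lower.tail = FALSE)

# The two values differ only in the third decimal place.
round(c(t = t_crit, z = z_crit, difference = t_crit - z_crit), 4)
```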
pander(tsum.test(mean.x = 2401, s.x = 1200, n.x = 600, mean.y = 1527, s.y = 1000, n.y = 700,
                 alternative = "greater", mu = 0, var.equal = TRUE))
Test statistic | df | P value | Alternative hypothesis | mean of x | mean of y |
---|---|---|---|---|---|
14.32 | 1298 | 0 * * * | greater | 2401 | 1527 |
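The statistic reported by `tsum.test()` can be reproduced from the summary figures using the usual pooled-variance formula. This is a sketch for verification only; the variable names (`sp2`, `se_diff`, `t_computed`, and so on) are assumptions made here, not part of the original solution.

```r
# Summary statistics given in Problem 2.
mean_men   <- 2401; sd_men   <- 1200; n_men   <- 600
mean_women <- 1527; sd_women <- 1000; n_women <- 700

# Degrees of freedom for the pooled test.
df <- n_men + n_women - 2

# Pooled variance: df-weighted average of the two sample variances.
sp2 <- ((n_men - 1) * sd_men^2 + (n_women - 1) * sd_women^2) / df

# Standard error of the difference in sample means.
se_diff <- sqrt(sp2 * (1 / n_men + 1 / n_women))

# t-statistic under Ho: mu_1 - mu_2 = 0.
t_computed <- (mean_men - mean_women) / se_diff
round(t_computed, 2)
```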
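Decision: Since t-computed (\(14.32\)) is greater than \(2.329\), reject Ho. At the \(0.01\) significance level, there is evidence that the mean amount spent shopping online by men is higher than that by women.
\(~\)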
Problem 3: A bank with a branch located in a commercial district of a city has the business objective of developing an improved process for serving customers during the noon-to-1 pm lunch period. Management decides to first study the waiting time in the current process. The waiting time is defined as the time that elapses from when the customer enters the line until he or she reaches the teller window. Data are collected from a random sample of 35 customers.
Suppose that another branch, located in a residential area, is also concerned with improving the process of serving customers in the noon-to-1 pm lunch period. Data are collected from a random sample of 40 customers. The obtained data from the two branches are stored in the “waitingtime.csv” file.
At the \(5\%\) significance level, is there evidence of a difference in the mean waiting time between the two branches? What are the implications of the result?
# Import "waitingtime.csv" file.
waitingtime <- read.csv("waitingtime.csv")
head(waitingtime)
branch1 branch2
1 3.25 8.21
2 7.26 7.55
3 1.29 9.25
4 2.15 8.15
5 3.85 9.77
6 4.25 12.25
# Test for equality of population variances.
pander(var.test(waitingtime$branch1, waitingtime$branch2))
Test statistic | num df | denom df | P value | Alternative hypothesis |
---|---|---|---|---|
0.6195 | 34 | 39 | 0.1582 | two.sided |
ratio of variances |
---|
0.6195 |
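The two-sided p-value of the F test can be recovered from the reported statistic and degrees of freedom, which helps show where the figure comes from. This is a sketch that uses the rounded statistic from the table above, so the result may differ slightly in the last digit; `f_stat`, `df_num`, `df_den`, and `lower_tail` are names introduced here.

```r
# Values reported by var.test() above.
f_stat <- 0.6195
df_num <- 34  # n1 - 1 for branch1 (35 customers)
df_den <- 39  # n2 - 1 for branch2 (40 customers)

# Lower-tail probability of the observed variance ratio under Ho.
lower_tail <- pf(f_stat, df_num, df_den)

# Two-sided p-value: twice the smaller tail probability.
p_value <- 2 * min(lower_tail, 1 - lower_tail)
p_value
```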
# Perform the hypothesis test:
pander(t.test(waitingtime$branch1, waitingtime$branch2, alternative = "two.sided",
              mu = 0, var.equal = TRUE))
Test statistic | df | P value | Alternative hypothesis | mean of x |
---|---|---|---|---|
-9.744 | 73 | 7.609e-15 * * * | two.sided | 3.76 |
mean of y |
---|
7.893 |
\(~\)
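The test for equality of variances gives no evidence that the population variances differ (\(p = 0.1582 > 0.05\)), so the pooled-variance t-test (`var.equal = TRUE`) is used for the hypothesis test. Since the p-value of the t-test (\(7.609 \times 10^{-15}\)) is less than \(0.05\), Ho is rejected: there is evidence of a difference in the mean waiting time between the two branches. The sample mean waiting time for branch2 (\(7.893\)) is about twice that for branch1 (\(3.76\)).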
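The two-sided decision for Problem 3 can also be checked from the reported test statistic and degrees of freedom alone, without the data file. This is a sketch using the rounded statistic from the pander table; `t_stat`, `t_crit`, `p_value`, and `reject_h0` are names introduced here for illustration.

```r
# Values reported by t.test() for Problem 3.
t_stat <- -9.744
df <- 73  # n1 + n2 - 2 = 35 + 40 - 2
alpha <- 0.05

# Two-sided critical value: alpha is split equally between the two tails.
t_crit <- qt(alpha / 2, df, lower.tail = FALSE)

# Two-sided p-value from the absolute value of the statistic.
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

# Reject Ho when |t| reaches the critical value.
reject_h0 <- abs(t_stat) >= t_crit

c(t_crit = t_crit, p_value = p_value)
reject_h0
```

Both routes, the critical value and the p-value, necessarily lead to the same decision.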