LRA 9-1 TEST CONCERNING THE MEANS OF TWO POPULATIONS

library(readr)
library(pander)
library(BSDA)

$~$

Problem 1: The following data present the productivity scores of 10 workers before and after a training program designed to improve productivity. Higher scores indicate better productivity.

Worker	Before	After
1	95	105
2	97	111
3	105	111
4	94	106
5	103	106
6	97	104
7	98	105
8	95	102
9	100	106
10	95	102

Do these data provide sufficient evidence that the training program has improved productivity? Perform a hypothesis test at the $5\%$ significance level.

$~$

Problem: Is the mean productivity score of the workers after the training program significantly higher than their mean productivity score before the program?
Let $\mu_1$ be the mean productivity score of the workers after the training program.
Let $\mu_2$ be the mean productivity score of the workers before the training program.
Ho: The mean productivity score of the workers after the training program is less than or equal to their mean productivity score before the training program ($\mu_1 \le \mu_2$).
Ha: The mean productivity score of the workers after the training program is greater than their mean productivity score before the training program ($\mu_1 > \mu_2$).
Significance Level: $0.05$.
Test Statistic: t-statistic for dependent (paired) observations, since each worker is measured both before and after the program.
Critical Region:

tcritical <- qt(0.05, df = 9, lower.tail = FALSE)
tcritical

  [1] 1.833113

Decision Rule: Reject Ho if t-computed is greater than or equal to $1.833$. Otherwise, fail to reject Ho.

Computation:

# Create data manually:
before <- c(95, 97, 105, 94, 103, 97, 98, 95, 100, 95)
after <- c(105, 111, 111, 106, 106, 104, 105, 102, 106, 102)
pander(t.test(after, before, paired = TRUE, alternative = "greater", mu=0))

Paired t-test: `after` and `before` (continued below)
Test statistic	df	P value	Alternative hypothesis
7.776	9	1.388e-05 * * *	greater

mean of the differences
7.9

$~$

Decision: Reject Ho since t-computed ($7.776$) is greater than $1.833$.
Conclusion: The data do not support Ho. Hence, the mean productivity score of the workers after the training program is significantly greater than their mean productivity score before the training program. This implies that the training program has improved productivity.
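
As a cross-check on the `t.test` output, the paired t-statistic can be recomputed by hand from the differences using $t = \bar{d} / (s_d / \sqrt{n})$. The sketch below uses base R only; the names `d`, `d_bar`, `s_d`, `t_computed`, and `p_value` are introduced here for illustration and are not part of the original solution.

```r
# Paired t-statistic computed by hand (illustrative check, base R only).
before <- c(95, 97, 105, 94, 103, 97, 98, 95, 100, 95)
after  <- c(105, 111, 111, 106, 106, 104, 105, 102, 106, 102)

d     <- after - before          # per-worker difference (after minus before)
n     <- length(d)               # number of pairs
d_bar <- mean(d)                 # mean of the differences
s_d   <- sd(d)                   # sample standard deviation of the differences

t_computed <- d_bar / (s_d / sqrt(n))                       # paired t-statistic
p_value    <- pt(t_computed, df = n - 1, lower.tail = FALSE) # upper-tail p-value

round(c(mean_diff = d_bar, t = t_computed), 3)
```

The rounded values should agree with the mean of the differences (7.9) and the test statistic (7.776) reported above.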

$~$

Problem 2: According to a recent study, when shopping online for luxury goods, men spend a mean of $\$2,401$, whereas women spend a mean of $\$1,527$. (Data extracted from R.A. Smith, “Fashion Online: Retailers Tackle the Gender Gap,” The Wall Street Journal, March 13, 2008, pp. D1, D10) Suppose that the study was based on a sample of 600 men and 700 women, and the standard deviation of the amount spent was $\$1,200$ for men and $\$1,000$ for women. At the $0.01$ significance level, is there evidence that the mean amount spent by men is higher than that by women?

Problem: Is the mean amount spent shopping online by men (significantly) greater than the mean amount spent by women?
Let $\mu_1$ be the mean amount spent shopping online by men.
Let $\mu_2$ be the mean amount spent shopping online by women.
Ho: The mean amount spent shopping online by men is less than or equal to the mean amount spent by women.
Ha: The mean amount spent shopping online by men is greater than the mean amount spent by women.
Significance Level: $0.01$.
Test Statistic: t-statistic for two independent samples, assuming equal population variances (the t-statistic is used because the population variances, or standard deviations, are unknown).
Critical Region:

tcritical <- qt(0.01, df = 1298, lower.tail = FALSE)
tcritical

  [1] 2.329224

Decision Rule: Reject Ho if t-computed is greater than $2.329$. Otherwise, fail to reject Ho. (Note: because the sample sizes are large, the critical t-value is nearly identical to the z critical value of $2.326$ for a one-tailed test at the 0.01 significance level.)
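
The note on large samples can be checked directly by comparing the t critical value with the standard normal one. This is a small illustrative sketch in base R; `t_crit` and `z_crit` are names introduced here.

```r
# Compare the one-tailed critical values at the 0.01 level (illustrative check).
t_crit <- qt(0.01, df = 600 + 700 - 2, lower.tail = FALSE)  # t with 1298 df
z_crit <- qnorm(0.01, lower.tail = FALSE)                   # standard normal

round(c(t = t_crit, z = z_crit), 3)
```

With 1298 degrees of freedom the two values differ only in the third decimal place, so either one leads to the same decision here.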

Computation:

pander(tsum.test(mean.x = 2401, s.x = 1200, n.x = 600, mean.y = 1527, s.y = 1000, n.y = 700, 
                 alternative = "greater", mu = 0, var.equal = TRUE))

Standard Two-Sample t-Test: `Summarized x` and `y` $~$
Test statistic	df	P value	Alternative hypothesis	mean of x	mean of y
14.32	1298	0 * * *	greater	2401	1527

Decision: Reject Ho since t-computed ($14.32$) is greater than $2.329$.
Conclusion: The data do not support Ho. Hence, the mean amount spent shopping online by men is significantly greater than the mean amount spent by women.
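
For readers without the BSDA package, the pooled-variance t-statistic can be reproduced from the summary statistics alone, using $s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$. The following is a minimal sketch in base R; all variable names are introduced here for illustration.

```r
# Pooled two-sample t-statistic from summary statistics (illustrative check).
mean_x <- 2401; s_x <- 1200; n_x <- 600   # men
mean_y <- 1527; s_y <- 1000; n_y <- 700   # women

df   <- n_x + n_y - 2                                      # degrees of freedom
sp2  <- ((n_x - 1) * s_x^2 + (n_y - 1) * s_y^2) / df       # pooled variance
se   <- sqrt(sp2 * (1 / n_x + 1 / n_y))                    # standard error of the difference
t_computed <- (mean_x - mean_y) / se                       # pooled t-statistic
p_value    <- pt(t_computed, df = df, lower.tail = FALSE)  # upper-tail p-value

round(c(t = t_computed, df = df), 2)
```

The result should match the test statistic (14.32) and degrees of freedom (1298) reported by `tsum.test`.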

$~$

Problem 3: A bank with a branch located in a commercial district of a city has the business objective of developing an improved process for serving customers during the noon-to-1 pm lunch period. Management decides to first study the waiting time in the current process. The waiting time is defined as the time that elapses from when the customer enters the line until he or she reaches the teller window. Data are collected from a random sample of 35 customers.

Suppose that another branch, located in a residential area, is also concerned with improving the process of serving customers in the noon-to-1 pm lunch period. Data are collected from a random sample of 40 customers. The obtained data from the two branches are stored in the “waitingtime.csv” file.
At the $5\%$ significance level, is there evidence of a difference in the mean waiting time between the two branches? What are the implications of the result?

Problem: Is there a significant difference in the mean waiting time of customers between the two bank branches?
Let $\mu_1$ be the mean waiting time of customers for the branch located in a commercial district.
Let $\mu_2$ be the mean waiting time of customers for the branch located in a residential area.
Ho: There is no significant difference in the mean waiting time of customers between the two bank branches.
Ha: There is a significant difference in the mean waiting time of customers between the two bank branches.
Significance Level: $0.05$.
Test Statistic: t-statistic for two independent samples.
Computations:

# Import "waitingtime.csv" file.
waitingtime <- read.csv("waitingtime.csv")
head(waitingtime)

    branch1 branch2
  1    3.25    8.21
  2    7.26    7.55
  3    1.29    9.25
  4    2.15    8.15
  5    3.85    9.77
  6    4.25   12.25

# Test for equality of population variances.
pander(var.test(waitingtime$branch1, waitingtime$branch2))

F test to compare two variances: `waitingtime$branch1` and `waitingtime$branch2` (continued below)
Test statistic	num df	denom df	P value	Alternative hypothesis
0.6195	34	39	0.1582	two.sided

ratio of variances
0.6195

# Perform the hypothesis test:
pander(t.test(waitingtime$branch1, waitingtime$branch2, alternative = "two.sided", 
              mu=0, var.equal = TRUE))

Two Sample t-test: `waitingtime$branch1` and `waitingtime$branch2` (continued below)
Test statistic	df	P value	Alternative hypothesis	mean of x
-9.744	73	7.609e-15 * * *	two.sided	3.76

mean of y
7.893

$~$

The test for equality of variances does not reject the hypothesis that the population variances are equal ($p = 0.1582 > 0.05$). Equal variances are therefore assumed in the hypothesis test.
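
The F test result can also be read through its critical region. The sketch below, in base R, takes the rounded F statistic reported by `var.test` as given and compares it with the two-sided 5% critical values; the variable names are introduced here for illustration.

```r
# Critical-region view of the F test for equal variances (illustrative check).
f_stat <- 0.6195          # ratio of sample variances reported by var.test (rounded)
df1 <- 35 - 1             # numerator df (commercial branch, n = 35)
df2 <- 40 - 1             # denominator df (residential branch, n = 40)

f_lower <- qf(0.025, df1, df2)   # lower 2.5% critical value
f_upper <- qf(0.975, df1, df2)   # upper 2.5% critical value

# Two-sided p-value: twice the smaller tail probability.
p_value <- 2 * min(pf(f_stat, df1, df2), 1 - pf(f_stat, df1, df2))

c(f_lower = f_lower, f_stat = f_stat, f_upper = f_upper, p_value = p_value)
```

Because the F statistic lies between the two critical values, the hypothesis of equal variances is not rejected, in agreement with the reported p-value.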

Decision: Reject Ho since $p < 0.05$.
Conclusion: The data do not support Ho. Hence, there is a significant difference in the mean waiting time of customers between the two bank branches. The result further implies that, on average, waiting times are significantly lower at the branch located in the commercial district (sample mean $3.76$) than at the branch in the residential area (sample mean $7.893$). Considering that more people are likely to transact with the branch in the commercial area than with the one in the residential area, the residential branch could adopt the measures employed by the commercial branch in the hope of improving its customer service during the noon-to-1 pm lunch period.
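
Problems 1 and 2 state a critical region, whereas this problem is decided by the p-value. For consistency, the same decision can be sketched with a two-tailed critical value in base R. The test statistic is taken from the output above; the variable names are introduced here for illustration.

```r
# Critical-value approach for the two-tailed test (illustrative check).
t_stat <- -9.744                  # test statistic reported by t.test
df     <- 35 + 40 - 2             # pooled degrees of freedom (73)
alpha  <- 0.05

t_crit <- qt(alpha / 2, df, lower.tail = FALSE)  # two-tailed critical value
reject <- abs(t_stat) >= t_crit                  # TRUE means reject Ho

# Two-sided p-value recomputed from the rounded test statistic.
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

list(t_crit = round(t_crit, 3), reject = reject)
```

Under this rule, Ho is rejected when the absolute value of t-computed is at least the critical value, which gives the same decision as the p-value approach.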

AE311
