Inference for Two Means

library(tidyverse)

Questions

Provide a detailed and unambiguous example of a business question that is appropriately addressed with a paired samples $t$-test (and sufficiently different than examples discussed in the books or class).

Answer:

 At McDonald’s, we tested whether adding table service (i.e., order your food at the counter and we’ll bring it to your table) would materially improve overall customer satisfaction compared with the standard in-store procedure, in which customers wait by the counter for their order after ordering. We selected a group of restaurants, replaced the standard in-store procedure with table service, and measured each restaurant’s pre-post change in in-store customer satisfaction (CSAT), with the pre-period reflecting CSAT scores before table service was added and the post-period reflecting scores after it.

Provide a detailed and unambiguous example of a business question that is appropriately addressed with an independent samples $t$-test (and sufficiently different than examples discussed in the books or class).

Answer:

 At McDonald’s, we launched a delivery pilot in 2017 with a major mobile delivery partner and needed to determine which digital menu layout within the partner’s app was optimal. We created two layout configurations and tested them on two similar but independent groups of consumers. After four weeks, we compared the two groups’ mean user-experience ratings (along with basket creation / conversion rates) and adopted the “winner.”

Provide an explanation to a coworker that is trying to understand the difference between a “paired samples $t$-test” and a “two independent samples $t$-test.”

Answer:

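 Two-sample t-tests come in two forms depending on how the data were collected, so you must first decide whether the two sets of observations are paired or independent. In a paired samples t-test, each observation in one group is linked to a specific observation in the other (the same restaurant before and after a change, or the same day measured for two carriers). The test works on the difference score for each pair, and the null hypothesis is that the population mean of those differences is zero; in effect, it is a one-sample t-test on the differences. An independent samples t-test compares two separate groups with no link between individual observations (for example, different customers shown different app layouts) and tests whether the two population means differ significantly from one another.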

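To make the distinction concrete, here is a minimal base-R sketch using the FedEx/UPS daily tonnage figures from the question below. It checks that a paired t-test gives the same statistic as a one-sample t-test on the within-day differences and, purely for contrast, runs an independent samples test on the same numbers as if the days were unrelated (which would not be appropriate for these data).

```r
# FedEx and UPS tonnage recorded on the same ten days (table below)
fedex <- c(2.5, 3.5, 2.5, 7.2, 4.72, 3, 3.7, 5.1, 2.9, 3.95)
ups   <- c(2.1, 4.24, 3.01, 5.2, 3.1, 3.3, 2.2, 2.95, 2.6, 1.9)

# Paired test: works on the ten within-day differences
paired <- t.test(fedex, ups, paired = TRUE)

# Same thing as a one-sample test of H0: mean difference = 0
one_sample <- t.test(fedex - ups, mu = 0)

# For contrast only: treat the columns as two unrelated groups
# (pooled-variance independent samples test); this ignores the pairing
independent <- t.test(fedex, ups, var.equal = TRUE)

c(paired      = unname(paired$statistic),
  one_sample  = unname(one_sample$statistic),
  independent = unname(independent$statistic))
```

Because FedEx and UPS volumes move together from day to day in this sample, differencing removes that shared day-level variation, so the paired standard error is smaller and the paired t statistic is larger than the independent samples one computed from the same numbers.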
Here is a sample, by day, of the FedEx and UPS cargo tonnage at the Indianapolis airport. Notice that the FedEx and UPS data are taken on the same day. Is there any statistical difference in the mean amount of cargo handled by the companies at the Indianapolis airport?

Day	FedEx	UPS
27-Dec	2.5	2.1
13-Jan	3.5	4.24
9-Mar	2.5	3.01
27-Nov	7.2	5.2
9-Oct	4.72	3.1
19-Oct	3	3.3
28-Feb	3.7	2.2
20-Jul	5.1	2.95
26-Jul	2.9	2.6
9-Mar	3.95	1.9

    indy <- data.frame(
      day   = c('27-Dec','13-Jan','9-Mar','27-Nov','9-Oct','19-Oct',
                '28-Feb','20-Jul','26-Jul','9-Mar'),
      fedex = c(2.5,3.5,2.5,7.2,4.72,3,3.7,5.1,2.9,3.95),
      ups   = c(2.1,4.24,3.01,5.2,3.1,3.3,2.2,2.95,2.6,1.9))

    # Paired t-test: FedEx and UPS were measured on the same day, so the test
    # uses the ten within-day differences (fedex - ups). var.equal plays no
    # role in a paired test, and recent versions of R no longer accept
    # paired = TRUE in the formula interface, so the columns are passed directly.
    t_test <- t.test(indy$fedex, indy$ups,
                     alternative = "two.sided",
                     paired = TRUE)
    t_test

## 
##    Paired t-test
## 
## data:  indy$fedex and indy$ups
## t = 2.3519, df = 9, p-value = 0.04317
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.03230894 1.66169106
## sample estimates:
## mean of the differences 
##                   0.847

What is the null hypothesis?
Answer:

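 $H_0: \mu_D = 0$, where $\mu_D$ is the population mean of the daily differences (FedEx minus UPS); that is, on average FedEx and UPS handle the same cargo tonnage at Indianapolis.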

What is the $p$-value for the test of the null hypothesis?
Answer:
```
 0.0431727  
```
What is the appropriate 95% confidence interval?
Answer:
```
 0.0323089, 1.6616911     
```
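The interval above is $\bar{d} \pm t_{0.975,\,9}\, s_d/\sqrt{n}$. A minimal base-R sketch that rebuilds the test statistic, p-value, and interval by hand from the table's values, then checks the interval against the `conf.int` element returned by `t.test()`:

```r
# Daily FedEx and UPS tonnage from the Indianapolis table above
fedex <- c(2.5, 3.5, 2.5, 7.2, 4.72, 3, 3.7, 5.1, 2.9, 3.95)
ups   <- c(2.1, 4.24, 3.01, 5.2, 3.1, 3.3, 2.2, 2.95, 2.6, 1.9)

d      <- fedex - ups                       # within-day differences
n      <- length(d)                         # 10 days, so df = n - 1 = 9
d_bar  <- mean(d)                           # point estimate of mu_D
se     <- sd(d) / sqrt(n)                   # standard error of the mean difference
t_stat <- d_bar / se                        # statistic for H0: mu_D = 0
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided p-value
t_crit <- qt(0.975, df = n - 1)             # critical value for a 95% CI
ci     <- d_bar + c(-1, 1) * t_crit * se    # d_bar +/- t_crit * SE

# The same interval can be read straight off the htest object
fit <- t.test(fedex, ups, paired = TRUE)
all.equal(ci, as.numeric(fit$conf.int))
```

Pulling `fit$p.value` and `fit$conf.int` from the result object, rather than retyping printed numbers, also avoids copying the wrong figure into an answer.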

What is a one-sentence summary of the analysis?
Answer:

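 At the .05 level, the paired t-test rejects the null hypothesis (t = 2.35, df = 9, p = .043): on the sampled days, FedEx's daily tonnage exceeded UPS's by 0.847 on average (95% CI 0.03 to 1.66).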

We are in an environment where screen time is spread across four screens (phone, computer, home TV, and TVs outside the home) and radio. A marketing firm used a sample of high-value individuals to collect data on the hours per week spent watching television or streaming media and the hours per week spent listening to radio and podcasts, to assess whether there are differences in the mean amount of time spent on the different media platforms. The data set is available here as an SPSS file and here as a CSV file.
```
w_l <- read_csv('~/Downloads/Watching_and_Listening (2).csv')

# One row per individual (ID) with two weekly-hours columns, so the test
# uses within-person differences. The two non-ID columns are taken in
# alphabetical order (the order as.factor() would use), so the mean
# difference below is alphabetically-first column minus second column.
media <- sort(setdiff(names(w_l), "ID"))
x <- as.numeric(w_l[[media[1]]])
y <- as.numeric(w_l[[media[2]]])

w_l_t <- t.test(x, y, paired = TRUE)
w_l_t
```
```
## 
##    Paired t-test
## 
## data:  x and y
## t = 2.9802, df = 15, p-value = 0.009341
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.174769 7.074918
## sample estimates:
## mean of the differences 
##                4.124844
```
1. What specific type of $t$-test is appropriate?
  Answer:
```
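 A paired samples t-test: each individual reports both TV/streaming hours and radio/podcast hours, so the two measurements are linked by ID  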
```
2. What is the $p$-value?
  Answer:
```
 0.0093412 
```
3. What is the 95% confidence interval?
  Answer:
```
 1.174769, 7.074918  
```
4. What is your one sentence summary for conveying this to a top management team?
  Answer:
```
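 Among the sampled high-value individuals, average weekly time spent on TV/streaming and on radio/podcasts differed by about 4.1 hours (95% CI 1.2 to 7.1 hours, p = .009), a statistically significant gap  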
```

A manufacturer produces both a deluxe and a standard model of handmade coffee mugs. Selling prices obtained from a sample of retail partners are provided here. The suggested difference in the retail price between the deluxe and standard is $10, due to the premium that should be charged for the deluxe version. Use a .05 Type I error rate and test that the mean difference between the prices of the two models at different retail stores is $10.

Retail Outlet	Deluxe	Standard
1	39	27
2	39	28
3	45	35
4	38	30
5	40	30
6	39	34
7	35	29

    mugs <- data.frame(outlet   = 1:7,
                       deluxe   = c(39,39,45,38,40,39,35),
                       standard = c(27,28,35,30,30,34,29))

    # Both prices come from the same outlet, so test the paired differences
    # (deluxe - standard) against the suggested premium: H0: mu_D = 10
    mug_t.test <- t.test(mugs$deluxe, mugs$standard, mu = 10, paired = TRUE)
    mug_t.test

## 
##    Paired t-test
## 
## data:  mugs$deluxe and mugs$standard
## t = -1.1587, df = 6, p-value = 0.2906
## alternative hypothesis: true difference in means is not equal to 10
## 95 percent confidence interval:
##   6.443752 11.270534
## sample estimates:
## mean of the differences 
##                8.857143

What is the appropriate test statistic and $p$-value for the test of interest?
Answer:
```
 T-value: -1.1587309   
 P-Value: 0.2905957  
```
What is the 95% confidence interval for the difference between the mean prices of the two models?
Answer:
```
 6.4437519, 11.2705338  
```

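On average, are retailers following the advice of a $10 differential between the versions?
Answer:

 The data are consistent with retailers following the advice on average. The p-value (0.291) is above .05, so we fail to reject the null hypothesis that the mean difference is 10, and the 95% confidence interval (6.44, 11.27) contains 10. Failing to reject does not prove the differential is exactly $10: with only seven outlets the interval is wide, and the observed average premium was $8.86.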

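The paired test against a non-zero value uses $t = (\bar{d} - 10)/(s_d/\sqrt{n})$ with $n - 1$ degrees of freedom. A minimal base-R sketch that reproduces the statistic, p-value, and interval from the outlet table and checks whether the interval contains the suggested 10:

```r
# Selling prices at the same seven retail outlets (table above)
deluxe   <- c(39, 39, 45, 38, 40, 39, 35)
standard <- c(27, 28, 35, 30, 30, 34, 29)

d      <- deluxe - standard              # price gap at each outlet
n      <- length(d)
mu0    <- 10                             # hypothesized mean gap in dollars
se     <- sd(d) / sqrt(n)
t_stat <- (mean(d) - mu0) / se           # (d_bar - 10) / (s_d / sqrt(n))
p_val  <- 2 * pt(-abs(t_stat), df = n - 1)
ci     <- mean(d) + c(-1, 1) * qt(0.975, df = n - 1) * se

# Does the 95% CI contain the suggested differential?
contains_10 <- ci[1] <= mu0 && mu0 <= ci[2]
```

For a two-sided test at .05, "p above .05" and "the 95% interval contains 10" are the same decision, which is why both are cited in the answer.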
When Major League Baseball expanded by 15%, the number of players put on the disabled list due to injury increased 32% over the same period (USA Today). Address whether Major League Baseball players being put on the disabled list are on the list longer, on average, now as compared to a decade ago.

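    # Pooled-variance (Student) t-test for two independent means, computed
    # from summary statistics:
    #   x1, x2   = sample means;  sd1, sd2 = sample SDs;  n1, n2 = sample sizes
    #   obs      = number of means estimated (2), so df = n1 + n2 - obs
    #   D        = hypothesized difference in population means (usually 0)
    #   sides    = 1 for a one-sided p-value, 2 for a two-sided p-value
    # The one-sided p-value is the tail beyond |t|, so it is only valid when
    # t falls in the direction stated by the alternative hypothesis.
    ind_two_means_T <- function(x1,x2,sd1,sd2,n1,n2,obs,D,sides) {
      mean_diff <- (x1-x2)
      df <- (n1+n2)-obs
      pooled <- sqrt((((sd1^2)*(n1-1))+((sd2^2)*(n2-1)))/df)
      std_err_mean_diff <- pooled*(sqrt((1/n1)+(1/n2)))
      t <- (mean_diff - D)/std_err_mean_diff
      p <- sides*pt(-abs(t),df)
      cat('mean difference - ',mean_diff,
          '\ndegress freedom - ',df,
          '\npooled -',pooled,
          '\nstd err. - ',std_err_mean_diff,
          '\nt-value -  ',t,
          '\np-value -  ',p)
      # Also return the values invisibly so they can be reused
      invisible(list(mean_diff = mean_diff, df = df, pooled = pooled,
                     se = std_err_mean_diff, t = t, p = p))
    }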

Formulate null and alternative hypotheses that can be used to address the question.
Answer:
```
 Ho: M1 <= M2   (M1 = mean days on the list Now, M2 = mean days Last Decade)  
 Ha: M1 > M2    (players are on the list longer now)  
```

Here is a summary of the data.

Metrics | Now | Last Decade
------------- | ------------- | -------------
Sample Size | $n_1=45$ | $n_2=38$
Sample Mean | $\bar{x}_1=60$ | $\bar{x}_2=51$
Sample Standard Deviation | $s_1=18$ | $s_2=15$

What is the point estimate of the difference between population mean number of days on the disabled list Now compared to Last Decade?
Answer:
```
 60 - 51 = 9  
```

Use $\alpha=.01$. What is your conclusion about the number of days on the disabled list? What is the $p$-value?
Answer:

ind_two_means_T(60,51,18,15,45,38,2,0,1)

## mean difference -  9 
## degress freedom -  81 
## pooled - 16.69664 
## std err. -  3.678494 
## t-value -   2.446653 
## p-value -   0.008291782
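The same conclusion can be reached with the critical-value rule for an upper-tailed test at $\alpha = .01$: reject when $t > t_{0.99,\,81}$. A minimal base-R sketch from the summary table (subscript 1 = Now, 2 = Last Decade), using the same pooled-variance formula as `ind_two_means_T()`:

```r
# Summary statistics from the table above (1 = Now, 2 = Last Decade)
n1 <- 45; x1 <- 60; s1 <- 18
n2 <- 38; x2 <- 51; s2 <- 15
alpha <- 0.01

dof    <- n1 + n2 - 2
sp     <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / dof)  # pooled SD
se     <- sp * sqrt(1 / n1 + 1 / n2)
t_stat <- (x1 - x2) / se

# Upper-tailed test of H0: mu1 <= mu2 against Ha: mu1 > mu2
p_upper <- pt(t_stat, dof, lower.tail = FALSE)
t_crit  <- qt(1 - alpha, dof)        # reject H0 when t_stat > t_crit
reject  <- t_stat > t_crit
```

Using `lower.tail = FALSE` makes the direction of the alternative explicit, instead of relying on `-abs(t)`.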

Do these data suggest that Major League Baseball should be concerned about the apparent increase of players on the disabled list? Provide an explanation and use formal statistical reasoning for your conclusions/statements you make.
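Answer: Reject the null hypothesis. The pooled two-sample t-test gives t = 2.45 with 81 degrees of freedom and a one-sided p-value of 0.0083, which is below $\alpha=.01$, so there is sufficient evidence that players placed on the disabled list now stay on it longer, on average, than a decade ago (an estimated 9 more days: 60 vs. 51). Combined with the 32% increase in the number of players placed on the list against only a 15% expansion of the league, this suggests Major League Baseball has reason to be concerned: injuries are growing faster than the league, and they last longer. Because these are observational samples from two periods, the test establishes a difference, not its cause.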

Periodically, Merrill Lynch customers are asked to evaluate Merrill Lynch financial consultants. Higher ratings on the client satisfaction survey indicate better service, with 7 the maximum service rating. Independent samples of service ratings for two financial consultants are summarized here from high-value clients. Use $\alpha=.05$ and test to see if there is evidence that one of the two consultants performs statistically better than the other.

Metrics | Consultant A | Consultant B
------------- | ------------- | -------------
Sample Size | $n_1=16$ | $n_2=10$
Sample Mean | $\bar{x}_1=6.82$ | $\bar{x}_2=6.25$
Sample Standard Deviation | $s_1=.64$ | $s_2=.75$

1. State the null and alternative hypotheses.
  Answer:
```
 Ho: MA = MB   (MA, MB = population mean service ratings of Consultants A and B)  
 Ha: MA != MB  
```
2. Compute the value of the test statistic.
  Answer:
```
ind_two_means_T(6.82,6.25,.64,.75,16,10,2,0,1)
```
```
## mean difference -  0.57 
## degress freedom -  24 
## pooled - 0.6833283 
## std err. -  0.2754584 
## t-value -   2.069278 
## p-value -   0.02472583
```
3. What is the $p$-value?
  Answer:
```
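 One-sided p-value from the call above (sides = 1): 0.02472583  
 Two-sided p-value for Ha: MA != MB: 2 x 0.02472583 ≈ 0.0495  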
```
4. What is your conclusion?
  Answer:
```
 Reject the null hypothesis at .05 (two-sided p ≈ 0.0495): Consultant A's mean rating (6.82) is statistically higher than Consultant B's (6.25)  
```
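Because the alternative hypothesis is two-sided, the p-value has to cover both tails. A minimal base-R sketch that recomputes the pooled statistic and its two-sided p-value from the table, and, as a comparison that is not part of the assignment (which assumes equal variances), a Welch test with Welch-Satterthwaite degrees of freedom:

```r
# Summary statistics from the Merrill Lynch table above
n1 <- 16; x1 <- 6.82; s1 <- 0.64   # Consultant A
n2 <- 10; x2 <- 6.25; s2 <- 0.75   # Consultant B

# Pooled-variance t, as in ind_two_means_T()
df_p  <- n1 + n2 - 2
sp    <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df_p)
t_p   <- (x1 - x2) / (sp * sqrt(1 / n1 + 1 / n2))
p_two <- 2 * pt(-abs(t_p), df_p)   # two-sided, since Ha is mu_A != mu_B

# Welch version (no equal-variance assumption), for comparison only
v1   <- s1^2 / n1
v2   <- s2^2 / n2
t_w  <- (x1 - x2) / sqrt(v1 + v2)
df_w <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))  # Welch-Satterthwaite
p_w  <- 2 * pt(-abs(t_w), df_w)
```

On these numbers the pooled two-sided p-value sits just under .05 while the Welch p-value is above .05, so the "statistically better" conclusion is borderline and depends on the equal-variance assumption.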
FedEx and UPS are the world’s two leading cargo carriers by volume and revenue (The Wall Street Journal). Memphis International and Louisville are two of the 10 largest cargo airports in the world. The following are random samples of the tons of cargo per day handled by these two airports. Is there evidence that there is a difference in the amount of cargo handled by the airports? You may use SPSS or Excel.

Memphis | Louisville
------------- | -------------
9.1 | 4.7
15.1 | 5
8.8 | 4.2
10 | 3.3
7.5 | 5.5
10.5 | 2.2
8.3 | 4.1
9.2 | 2.6
6 | 3.4
5.8 | 7
12.1 |
9.3 |

```
memphis <- c(9.1,15.1,8.8,10,7.5,10.5,8.3,9.2,6,5.8,12.1,9.3)
louisville <- c(4.7,5,4.2,3.3,5.5,2.2,4.1,2.6,3.4,7)

 cat('           Memphis | Louisville\n','mean - ',mean(memphis),' | ',mean(louisville),'\ncount -       ',length(memphis),' | ',length(louisville),'\nsd -    ',sd(memphis),' | ',sd(louisville))
```
```
##            Memphis | Louisville
##  mean -  9.308333  |  4.2 
## count -        12  |  10 
## sd -     2.542175  |  1.431394
```
```
  ind_two_means_T(9.308,4.2,2.542,1.43,12,10,2,0,2) 
```
```
## mean difference -  5.108 
## degress freedom -  20 
## pooled - 2.115225 
## std err. -  0.9056851 
## t-value -   5.63993 
## p-value -   1.608778e-05
```
1. What are the most appropriate null and alternative hypotheses?
  Answer:
```
Ho: M_Memphis = M_Louisville   (population mean tons of cargo per day)  
Ha: M_Memphis != M_Louisville  
```
2. What is the mean difference?
  Answer:
```
 mean difference -  5.108 
```
3. What is the pooled standard deviation?
  Answer:
```
 pooled - 2.115225  
```
4. What is the standard error of the mean difference?
  Answer:
```
 std err. -  0.9056851  
```
5. Assuming homogeneity of variance and a Type 1 error rate of .05, can the null hypothesis be rejected?
  Answer:
```
 Yes: t = 5.64 with 20 df and p = 1.6e-05 < .05, so reject the null  
```
6. What is the 95% confidence interval for the mean difference?
  Answer:
```
 [3.219, 6.998]  
```
7. What is a one-sentence summary of the analysis?
  Answer:
```
 On average, Memphis handled about 5.1 more tons of cargo per day than Louisville (95% CI 3.2 to 7.0 tons, p < .001)  
```
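The summary-statistic function above was fed rounded means and SDs. A minimal base-R cross-check runs the pooled-variance (Student) two-sample test directly on the raw samples with `t.test(..., var.equal = TRUE)`, which also returns the 95% confidence interval:

```r
memphis    <- c(9.1, 15.1, 8.8, 10, 7.5, 10.5, 8.3, 9.2, 6, 5.8, 12.1, 9.3)
louisville <- c(4.7, 5, 4.2, 3.3, 5.5, 2.2, 4.1, 2.6, 3.4, 7)

# Pooled-variance (Student) two-sample t-test on the raw data
fit <- t.test(memphis, louisville, var.equal = TRUE)

fit$statistic            # t, with df = 12 + 10 - 2 = 20
fit$p.value
round(fit$conf.int, 3)   # 95% CI for mu_Memphis - mu_Louisville
```

The raw-data t is about 5.64, essentially the value from `ind_two_means_T()`, and the rounded interval is the [3.219, 6.998] reported above.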
Using the power.t.test() function, suppose you have two groups in which you wish to try to estimate a reasonable sample size to have 90% statistical power to find an effect that shows there is a 1 standard deviation impact on the mean. Assume that you want a two-sided test, a Type I error rate of .05, and assume homogeneity of variance. What is the necessary sample size per group that should be used? As before in the Inference for a Mean assignment, be sure to examine the helpfile and consider each of the arguments carefully for the situation.
```
power.t.test(power = .90, delta = 1, sd = 1, sig.level = .05,
             type = "two.sample", alternative = "two.sided")
```
```
## 
##      Two-sample t test power calculation 
## 
##               n = 22.0211
##           delta = 1
##              sd = 1
##       sig.level = 0.05
##           power = 0.9
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
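```

Answer:

 23 per group: the calculation gives n = 22.02, which is rounded up to the next whole observation (46 in total).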

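As a sanity check on the rounded-up answer, a small Monte Carlo sketch estimates the power of a pooled two-sample t-test with 23 per group. The simulation settings are illustrative assumptions: normal data with SD 1 and means 1 SD apart, 5,000 replicates, and an arbitrary seed.

```r
set.seed(2024)              # arbitrary seed for reproducibility
n_per_group <- 23           # 22.02 rounded up
reps <- 5000

# Simulate two normal groups whose means differ by 1 SD; record rejections
rejections <- replicate(reps, {
  g1 <- rnorm(n_per_group, mean = 0, sd = 1)
  g2 <- rnorm(n_per_group, mean = 1, sd = 1)
  t.test(g1, g2, var.equal = TRUE)$p.value < 0.05
})

empirical <- mean(rejections)    # share of replicates that reject H0
analytic  <- power.t.test(n = n_per_group, delta = 1, sd = 1,
                          sig.level = 0.05)$power
c(empirical = empirical, analytic = analytic)
```

Both values should land slightly above 0.90; with 22 per group the analytic power falls just short of 0.90, which is why the answer rounds up rather than down.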

Watch the John Rauser keynote Statistics Without the Agonizing Pain. He talks about “50,000 repetitions” in which the label does not matter. Relate this to the idea of how we talked about sampling distributions. John Rauser compares $t$-tests (unflatteringly) with an empirical approach to statistical inference known as resampling (which he demonstrates; more to come on this topic later; the jackknife and bootstrap methods are special cases). Provide some thoughts about the theoretical and the empirical approaches to two-group comparisons and how they may work for your particular situation.

Answer:

 John contrasts two approaches: the "STATS 101" analytical method and the computational method, and he describes the analytical method in an unflattering way. His example asks whether drinking beer affects how often a person is bitten by mosquitoes: the two group means differ by 4.4, which sets up a null hypothesis (no effect) and an alternative hypothesis. He first walks through the analytical method using Welch's t-test: the t-statistic is larger than the critical value, so the null hypothesis is rejected and the skeptic's position with it. His point is that the sampling-distribution formula is a barrier: the idea of a sampling distribution is hard to understand, and presenting it only in mathematical form is hopeless for most people.

 The computational method asks directly whether 4.4 is a large or a small difference. If the null hypothesis is true, the group labels do not matter, so the data are color coded, the labels are randomly shuffled, new group means are computed, and a new difference is recorded. Repeating this over and over (the "50,000 repetitions") builds the sampling distribution of the difference under the null hypothesis, which is the same sampling distribution the t-test approximates with a formula, and the observed 4.4 is then compared against it. The computational method needs only three things: the ability to follow a simple logical argument, random number generation, and iteration. It lays out the argument in a simple and clear way; if you can program a computer, you have a superpower for learning statistics, because you can tinker with the fundamentals and be playful instead of fearful.

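Rauser's shuffling argument can be sketched in a few lines of base R. This minimal permutation test reuses the Memphis and Louisville cargo samples from earlier (not Rauser's beer and mosquito data, which are not reproduced here); the seed and the 10,000 shuffles are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed for reproducibility
memphis    <- c(9.1, 15.1, 8.8, 10, 7.5, 10.5, 8.3, 9.2, 6, 5.8, 12.1, 9.3)
louisville <- c(4.7, 5, 4.2, 3.3, 5.5, 2.2, 4.1, 2.6, 3.4, 7)

tons     <- c(memphis, louisville)
airport  <- rep(c("Memphis", "Louisville"), c(length(memphis), length(louisville)))
observed <- mean(memphis) - mean(louisville)

# Under H0 the airport label does not matter: shuffle labels, recompute the gap
reps <- 10000
shuffled_diffs <- replicate(reps, {
  perm <- sample(airport)
  mean(tons[perm == "Memphis"]) - mean(tons[perm == "Louisville"])
})

# Two-sided p-value: share of shuffles at least as extreme as the observed gap
# (+1 in numerator and denominator counts the observed labelling itself)
p_perm <- (sum(abs(shuffled_diffs) >= abs(observed)) + 1) / (reps + 1)
```

A histogram of `shuffled_diffs` is the empirical sampling distribution under the null hypothesis; the observed gap of about 5.1 tons sits far in its tail, in line with the tiny p-value from the Student t-test.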
Ken Kelley’s Statistics for Managerial Decision Making Course University of Notre Dame’s MS in Business Analytics Program
