Notre Dame’s MSBA Program

Team One: Matt Bufalino, Will Mason, Mary Keller, Jackie Cisneros, Paul Sliwka

library(tidyverse)
options(scipen=999)

Questions
Inference for One and Two Proportions

Creating a function for inference for one proportion

inf_for_prop <- function(success,n,null,alpha,sides,alternative) {
  
  std_error_true <- sqrt(null*(1-null)/n)
  
  std_error_false <- sqrt(success*(1-success)/n)
  
  critical_values <- if (sides == 2) {
    qnorm(c(alpha/2,(1-(alpha/2)))) 
  } else if (sides == 1 & alternative == 'greater') {
    c(qnorm(1-alpha),'inf')
  } else if (sides == 1 & alternative == 'less') {
    c('-inf',qnorm(alpha)) 
  } else {
    return("argument needs greater or less")
  } 
  
  conf_interval <- if (sides == 2) {
    c(success+std_error_false*critical_values[1],success+std_error_false*critical_values[2])
  } else if (sides == 1 & alternative == 'greater') {
    c(success-std_error_false*as.numeric(critical_values[1]),1)
  } else if (sides == 1 & alternative == 'less') {
    c(0,success-std_error_false*as.numeric(critical_values[2]))
  } else {
    return("argument needs greater or less")
  }
  
  moe <- if (sides == 2) {
    success-conf_interval[1] 
  } else if (sides == 1 & alternative == 'greater') {
    success-conf_interval[1] 
  } else if (sides == 1 & alternative == 'less') {
    conf_interval[2] - success
  } else {
    return("argument needs greater or less")
  } 

  z <- (success-null)/sqrt((null*(1-null))/n)
  
  # Two-sided: double the upper-tail area beyond |z|.
  # One-sided: use the signed z so the tail matches the alternative.
  p <- if (sides == 2) {
    2 * (1-pnorm(abs(z))) 
  } else if (sides == 1 && alternative == 'greater') {
    1-pnorm(z) 
  } else if (sides == 1 && alternative == 'less') {
    pnorm(z) 
  } else {
    return("argument needs greater or less")
  }
  
  return(cat("Standard error Null True: ",std_error_true,
             "\nStandard error Null False: ",std_error_false,
             "\nCritical Values: ",critical_values,
             "\nMargin of Error: ",moe,
             "\nConfidence Interval: ",conf_interval,
             "\nZ-value: ",z,
             "\nP-Value",p))
}
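
As a compact summary of what `inf_for_prop` computes, the formulas can be written as follows. The notation is an assumption added here for illustration: \(\hat{p}\) stands for the `success` argument, \(p_0\) for `null`, and \(\Phi\) for the standard normal CDF.

\[
SE_0 = \sqrt{\frac{p_0(1-p_0)}{n}}, \qquad
SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, \qquad
z = \frac{\hat{p}-p_0}{SE_0}
\]
\[
\text{two-sided CI: } \hat{p} \pm z_{1-\alpha/2}\, SE_{\hat{p}}, \qquad
\text{two-sided } p\text{-value: } 2\,\bigl(1-\Phi(|z|)\bigr)
\]

The null-based standard error \(SE_0\) is used for the test statistic, while the sample-based \(SE_{\hat{p}}\) is used for the confidence interval.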

Function for inference for two proportions

inf_for_prop_two <- function(prop1, n1, prop2, n2, alpha,sides,alternative) {
  diff <- prop1 - prop2
  
  pooled <- ((prop1*n1)+(prop2*n2))/(n1+n2)
  
  std_err_true <- sqrt((pooled*(1-pooled))*((1/n1)+(1/n2)))
  
  std_err_false <- sqrt(prop1*(1-prop1)/n1+prop2*(1-prop2)/n2)
  
  critical_values <- if (sides == 2) {
    qnorm(c(alpha/2,(1-(alpha/2)))) 
  } else if (sides == 1 & alternative == 'greater') {
    c(qnorm(1-alpha),'inf')
  } else if (sides == 1 & alternative == 'less') {
    c('-inf',qnorm(alpha)) 
  } else {
    return("argument needs greater or less")
  } 
  
  
  conf_interval <- if (sides == 2) {
    c(diff+std_err_false*critical_values[1],diff+std_err_false*critical_values[2])
  } else if (sides == 1 & alternative == 'greater') {
    c(diff-std_err_false*as.numeric(critical_values[1]),1)
  } else if (sides == 1 & alternative == 'less') {
    c(-1,diff-std_err_false*as.numeric(critical_values[2]))
  } else {
    return("argument needs greater or less")
  }
  

  z <- diff/std_err_true
  
  # Two-sided: double the upper-tail area beyond |z|.
  # One-sided: use the signed z so the tail matches the alternative.
  p <- if (sides == 2) {
    2 * (1-pnorm(abs(z))) 
  } else if (sides == 1 && alternative == 'greater') {
    1-pnorm(z) 
  } else if (sides == 1 && alternative == 'less') {
    pnorm(z) 
  } else {
    return("argument needs greater or less")
  }

  
  return(cat('\nObserved Proportion Difference: ',diff,
             '\nPooled Estimate of Proportion (for Hypothesis Test): ',pooled,
             '\nStandard Error of Difference in Proportions (Null True): ',std_err_true,
             '\nStandard Error of Difference in Proportions (Null False): ',std_err_false,
             '\nCritical Values: ',critical_values,
             '\nConfidence Interval: ',conf_interval,
             '\nZ-Value: ',z,
             '\nP-Value: ',p))
 
}
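
The quantities computed by `inf_for_prop_two` can be summarized in the same way. The symbols \(\hat{p}_1\), \(\hat{p}_2\) (for `prop1`, `prop2`) and \(\bar{p}\) (for `pooled`) are notation assumed here for illustration.

\[
\bar{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}, \qquad
SE_0 = \sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}, \qquad
z = \frac{\hat{p}_1-\hat{p}_2}{SE_0}
\]
\[
SE_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}, \qquad
\text{two-sided CI: } (\hat{p}_1-\hat{p}_2) \pm z_{1-\alpha/2}\, SE_{\hat{p}_1-\hat{p}_2}
\]

The pooled estimate is used only for the hypothesis test, because the null hypothesis assumes a single common proportion; the confidence interval uses the unpooled standard error.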

Consider the following null hypothesis significance test: \(H_0: p=.35\) \(H_a: p\neq.35\)
A sample of 300 provided a sample proportion \(\hat{p}\) = .275.

Compute the value of the test statistic.

What is the \(p\)-value? Answers:

inf_for_prop(.275,300,.35,.05,2,'greater')

## Standard error Null True:  0.02753785 
## Standard error Null False:  0.02577951 
## Critical Values:  -1.959964 1.959964 
## Margin of Error:  0.05052692 
## Confidence Interval:  0.2244731 0.3255269 
## Z-value:  -2.723524 
## P-Value 0.006458954
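
The test statistic and p-value reported above can be reproduced by hand. This is a minimal sketch using only the numbers given in the question; the variable names are illustrative and not part of the original function.

```r
# Hand computation of the one-proportion z test (values from the question)
p_hat <- 0.275   # sample proportion
p_0   <- 0.35    # hypothesized proportion under H0
n     <- 300     # sample size

se_null <- sqrt(p_0 * (1 - p_0) / n)   # standard error assuming H0 is true
z       <- (p_hat - p_0) / se_null     # test statistic
p_value <- 2 * pnorm(-abs(z))          # two-sided p-value

round(c(z = z, p_value = p_value), 6)
```

Because the p-value is below .05, the sketch leads to the same decision as the function output.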

At \(\alpha\)=.05, what is your conclusion? Answer:
```
 Reject the Null Hypothesis
```

What is/are your conclusions?
Answer:

 The null value .35 lies above the 95% confidence interval (.224, .326), so it is not among the  
 plausible values for the population proportion. We therefore reject the null hypothesis in favor of the  
 alternative: the sample proportion of .275 differs significantly from .35 at \(\alpha\) = .05.

The Consumer Reports National Research Center conducted a telephone survey of 2,000 adults to learn about the major economic concerns for the future. The survey results showed that 1,760 of the respondents think the future health of Social Security is a major economic concern.
```
inf_for_prop(.88,2000,.50,.10,2,'greater')  
```
```
## Standard error Null True:  0.01118034 
## Standard error Null False:  0.007266361 
## Critical Values:  -1.644854 1.644854 
## Margin of Error:  0.0119521 
## Confidence Interval:  0.8680479 0.8919521 
## Z-value:  33.98823 
## P-Value 0
```
1. What is the point estimate of the population proportion of adults who think the future health of Social Security is a major economic concern?
```
1760/2000
```
```
## [1] 0.88
```
2. At 90% confidence, what is the margin of error?
```
  1.645*(sqrt((.88*(1-.88))/2000))
```
```
## [1] 0.01195316
```
3. Develop a 90% confidence interval (two-sided) for the population proportion of adults who think the future health of Social Security is a major economic concern. Answer:
```
 Confidence Interval:  0.8680479 0.8919521
```
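
The interval above can also be built step by step from the survey counts. This is a sketch under the same normal-approximation (Wald) method the function uses; it takes the exact critical value from `qnorm` rather than the rounded 1.645, and the variable names are illustrative.

```r
# 90% two-sided Wald interval for the Social Security survey
x <- 1760                      # respondents who named Social Security
n <- 2000                      # total respondents
conf_level <- 0.90

p_hat  <- x / n                                   # point estimate
z_star <- qnorm(1 - (1 - conf_level) / 2)         # critical value
moe    <- z_star * sqrt(p_hat * (1 - p_hat) / n)  # margin of error
ci     <- c(lower = p_hat - moe, upper = p_hat + moe)

ci
```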

Facebook was voted the most popular website, with 17% of a sample of 2,500 Internet users in the 12–17 age group using the site.

inf_for_prop(.17,2500,.50,.05,2,'greater')

## Standard error Null True:  0.01 
## Standard error Null False:  0.007512656 
## Critical Values:  -1.959964 1.959964 
## Margin of Error:  0.01472454 
## Confidence Interval:  0.1552755 0.1847245 
## Z-value:  -33 
## P-Value 0

At 95% confidence, what is the margin of error? Answer:
```
 Margin of Error:  0.01472454   
```
What is the interval estimate of the population proportion for which Facebook is the most popular website among Internet users, using a 95% two-sided confidence interval? Answer:
```
 Confidence Interval:  0.1552755 0.1847245   
```
How would your conclusions have changed if only 1,000 youths had participated in the survey and the same estimate had been obtained?
```
inf_for_prop(.17,1000,.50,.05,2,'greater')
```
```
## Standard error Null True:  0.01581139 
## Standard error Null False:  0.01187855 
## Critical Values:  -1.959964 1.959964 
## Margin of Error:  0.02328153 
## Confidence Interval:  0.1467185 0.1932815 
## Z-value:  -20.87103 
## P-Value 0
```
Answer:
```
 The margin of error grows from about .015 to about .023, so the confidence interval gets wider.  
 The estimate is therefore less precise than with 2,500 respondents.  
```
1. For a similar setting, would you recommend spending the resources to obtain another sample of size 2,500?
Answer:
```
No, I would not. With 1,000 respondents the p-value is still essentially 0, so the statistical evidence is strong;  
additionally, the confidence interval and margin of error are still small, so the estimate is already precise. 
```
Although a larger sample would give a more precise estimate (less sampling error), you probably do not need the increased precision, because you would not act on that information any differently. So, arguably, the additional cost of the larger sample is not justified.

How would your conclusions have changed if only 100 youths had participated in the survey and the same estimate had been obtained?

inf_for_prop(.17,100,.50,.05,2,'greater')

## Standard error Null True:  0.05 
## Standard error Null False:  0.03756328 
## Critical Values:  -1.959964 1.959964 
## Margin of Error:  0.07362268 
## Confidence Interval:  0.09637732 0.2436227 
## Z-value:  -6.6 
## P-Value 0.00000000004111578

Answer:

 The margin of error and the confidence interval both get larger (the interval now runs from about .096 to .244), so the estimate is much less precise.

What are the implications of sample size for how you use the data and the conclusions you can draw?

Answer:

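 Sample size affects both precision and statistical power: a smaller sample gives a wider confidence interval and less power, which raises the risk of a Type II error (failing to detect a real effect).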
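
The effect of sample size on precision can be seen by computing the margin of error for the three sample sizes used above. This is a minimal sketch with the same estimate (.17) and a 95% two-sided Wald interval; the variable names are illustrative.

```r
# Margin of error for the same estimate at three sample sizes
p_hat <- 0.17
n     <- c(100, 1000, 2500)

# 95% two-sided margin of error; it shrinks in proportion to 1 / sqrt(n)
moe <- qnorm(0.975) * sqrt(p_hat * (1 - p_hat) / n)

data.frame(n = n, margin_of_error = moe, ci_width = 2 * moe)
```

Because the margin of error scales with \(1/\sqrt{n}\), going from 1,000 to 2,500 respondents narrows the interval far less than going from 100 to 1,000 does.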

Eagle Outfitters is a chain of stores specializing in outdoor apparel and camping gear. They are considering a promotion that involves mailing discount coupons to all their credit card customers. This promotion will be considered a success if more than 10% of those receiving the coupons use them. Before going national with the promotion, coupons were sent to a sample of 100 credit card customers. The Eagle data file is available as a CSV file.
Load the Data
```
eagle <- read_csv('https://www.dropbox.com/s/bpmn8uo3pz5dkjm/Eagle.csv?dl=1')  

eagle <- eagle %>% 
  mutate(Used_Coupon = as.factor(Used_Coupon))

summary(eagle)
```
```
##  Used_Coupon
##  No :87     
##  Yes:13
```
1. Develop the null and alternative hypotheses that most appropriately address the question of having more than 10% coupon usage.
  
  Answer:
```
   Ho: use <= .10
   Ha: use > .10 
```
2. The file Eagle contains the sample data. Develop a point estimate of the population proportion.
  
  Answer:
```
   point estimate = 0.13
```
3. Use \(\alpha\)=.05 to conduct your hypothesis test. Should Eagle go national with the promotion?
```
inf_for_prop(.13,100,.10,.05,1,'greater')
```
```
## Standard error Null True:  0.03 
## Standard error Null False:  0.03363034 
## Critical Values:  1.64485362695147 inf 
## Margin of Error:  0.05531699 
## Confidence Interval:  0.07468301 1 
## Z-value:  1 
## P-Value 0.1586553
```
4. What is the most appropriate confidence interval that goes along with the hypothesis test in part 3?
  
  Answer:
```
    Confidence Interval:  0.07468301 - 1
```
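
The one-sided test above can be cross-checked with base R's `prop.test`, using the counts from the data summary (13 of 100 customers used the coupon). This is a sketch, not the method used in the original analysis: `correct = FALSE` turns off the continuity correction so that the chi-squared statistic equals \(z^2\), and the interval `prop.test` reports is a score interval, so it will not match the Wald interval above exactly.

```r
# Cross-check of the Eagle Outfitters test with base R
check <- prop.test(x = 13, n = 100, p = 0.10,
                   alternative = "greater", correct = FALSE)

z_from_chisq <- sqrt(unname(check$statistic))  # X-squared = z^2 here
p_value      <- check$p.value

c(z = z_from_chisq, p_value = p_value)

# Decision at alpha = .05
if (p_value < 0.05) "Reject H0" else "Fail to reject H0"
```

With a p-value of about .159, the null hypothesis is not rejected at \(\alpha\) = .05, so this sample does not provide sufficient evidence that coupon usage exceeds 10%.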
Describe a scenario in which a marketing research team would be interested in testing the difference between two proportions using \(\alpha\)= .05.
Answer:
```
   Comparing the success of two different ads in an A/B test: the proportion of customers who respond to Ad A versus Ad B.     
```

In a two-group context, suppose that group 1 had a proportion of .75 with a sample size of 437, whereas group 2 had a proportion of .687 with a sample size of 219.

 inf_for_prop_two(.75,437,.687,219,.05,2,'equal')

## 
## Observed Proportion Difference:  0.063 
## Pooled Estimate of Proportion (for Hypothesis Test):  0.728968 
## Standard Error of Difference in Proportions (Null True):  0.0368005 
## Standard Error of Difference in Proportions (Null False):  0.03756246 
## Critical Values:  -1.959964 1.959964 
## Confidence Interval:  -0.01062107 0.1366211 
## Z-Value:  1.711933 
## P-Value:  0.08690893

What is the \(p\)-value for the test of equal proportions?
Answer:
```
 P-Value: 0.08690893
```
What are the confidence interval limits for the population difference between proportions?
Answer:
```
 Confidence Interval:  -0.01062107 0.1366211  
```
What is your conclusion regarding the null hypothesis?
Answer:
```
 Fail to Reject  
```

Suppose, instead, that the sample sizes of group 1 and group 2 were 645 and 931, respectively.

   inf_for_prop_two(.75,675,.687,931,.05,2,'equal')

## 
## Observed Proportion Difference:  0.063 
## Pooled Estimate of Proportion (for Hypothesis Test):  0.7134788 
## Standard Error of Difference in Proportions (Null True):  0.02285677 
## Standard Error of Difference in Proportions (Null False):  0.02255539 
## Critical Values:  -1.959964 1.959964 
## Confidence Interval:  0.01879225 0.1072077 
## Z-Value:  2.756294 
## P-Value:  0.005846035

What is the \(p\)-value for the test of equal proportions?
Answer:
```
    P-Value:  0.005846035
```
What are the confidence interval limits for the population difference between proportions? Answer:
```
 Confidence Interval:   0.01879225 0.1072077  
```
What is your conclusion regarding the null hypothesis?
Answer:
```
 Reject the Null Hypothesis  
```

What is different about the confidence intervals in the two scenarios and what causes the difference?
Answer:
```
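 The interval from the larger samples is narrower and excludes 0, whereas the interval from the smaller samples is wider and includes 0. Larger sample sizes reduce the standard error, which gives a more precise estimate of the difference.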
```
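
The change in interval width can be computed directly. This is a minimal sketch with a hypothetical helper, `wald_width`, that is not part of the original code; it uses the sample sizes passed to the two function calls above (437 and 219, then 675 and 931).

```r
# Width of the two-sided Wald interval for a difference in proportions
wald_width <- function(p1, n1, p2, n2, conf_level = 0.95) {
  se <- sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
  2 * qnorm(1 - (1 - conf_level) / 2) * se             # upper limit minus lower limit
}

width_small <- wald_width(0.75, 437, 0.687, 219)
width_large <- wald_width(0.75, 675, 0.687, 931)

c(smaller_samples = width_small, larger_samples = width_large)
```

The observed difference (.063) is the same in both scenarios; only the standard error changes, and that alone moves the lower limit above 0.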

The Professional Golf Association (PGA) measured the putting accuracy of professional golfers playing on the PGA Tour and the best amateur golfers playing in the World Amateur Championship (Golf Magazine). A sample of 1,075 6-foot putts by professional golfers found 688 made putts. A sample of 1,200 6-foot putts by amateur golfers found 696 made putts.
```
  inf_for_prop_two(.64,1075,.58,1200,.05,2,'equal')
```
```
## 
## Observed Proportion Difference:  0.06 
## Pooled Estimate of Proportion (for Hypothesis Test):  0.6083516 
## Standard Error of Difference in Proportions (Null True):  0.02049847 
## Standard Error of Difference in Proportions (Null False):  0.02042855 
## Critical Values:  -1.959964 1.959964 
## Confidence Interval:  0.01996078 0.1000392 
## Z-Value:  2.927048 
## P-Value:  0.003421956
```
1. Give the proportions of made 6-foot putts by both professional golfers and amateur golfers.
```
c(688/1075,
696/1200)
```
```
## [1] 0.64 0.58
```
2. What is the point estimate of the difference between the proportions of the two populations?
  Answer:
```
 Observed Proportion Difference:  0.06  
```
3. What is the 95% two-sided confidence interval for the difference between the two population proportions?
  Answer:
```
 Confidence Interval:  0.01996078 0.1000392  
```
4. Interpret the 95% confidence interval and provide a summary statement about the difference between the two populations.
  Answer:
```
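 The 95% confidence interval (0.020, 0.100) excludes 0, so we reject the null hypothesis that the two population proportions are equal: professionals make roughly 2 to 10 percentage points more of their 6-foot putts than amateurs.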
```
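
The putting results can be cross-checked from the raw counts with base R's `prop.test`. This is a sketch rather than the method used above; with `correct = FALSE` (no continuity correction) the two-sample test uses the pooled proportion for the p-value and the unpooled standard error for the interval, which is the same approach as `inf_for_prop_two`.

```r
# Two-sample cross-check from the raw counts of made putts
made  <- c(professional = 688, amateur = 696)
putts <- c(professional = 1075, amateur = 1200)

check <- prop.test(x = made, n = putts, correct = FALSE)

check$estimate   # sample proportions
check$conf.int   # 95% interval for the difference (professional minus amateur)
check$p.value    # two-sided p-value
```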
Chicago O’Hare (ORD) and Atlanta Hartsfield-Jackson (ATL) are among the busiest airports in the United States. The congestion often leads to delayed flight arrivals as well as delayed flight departures. The Bureau of Transportation tracks the on-time and delayed performance at major airports (Travel & Leisure). A flight is considered delayed if it is more than 15 minutes behind schedule. The following sample data show the delayed departures at Chicago O’Hare and Atlanta Hartsfield-Jackson airports.
```
library(tidyverse, quietly = TRUE)
Flights <- tribble(
            ~Airport, ~Flights, ~Delays,
            "ORD", 900, 252,
            "ATL", 1200, 312)
Flights
```
1. State in words — not in equation form — the null hypotheses that can be used to infer whether the population proportions of delayed departures differ at these two airports.
  Answer:
```
 The null hypothesis is that the proportion of delayed departures is the same at Chicago O'Hare and Atlanta Hartsfield-Jackson. 
```
```
inf_for_prop_two(.28,1900,.26,1200,.05,2,'equal')
```
```
## 
## Observed Proportion Difference:  0.02 
## Pooled Estimate of Proportion (for Hypothesis Test):  0.2722581 
## Standard Error of Difference in Proportions (Null True):  0.01641317 
## Standard Error of Difference in Proportions (Null False):  0.01632295 
## Critical Values:  -1.959964 1.959964 
## Confidence Interval:  -0.01199239 0.05199239 
## Z-Value:  1.218534 
## P-Value:  0.2230213
```
2. What is the point estimate of the proportion of flights that have delayed departures at Chicago O’Hare?
  Answer:
```
    O'Hare 0.28  
```
3. What is the point estimate of the proportion of flights that have delayed departures at Atlanta Hartsfield-Jackson?
  Answer:
```
    Atlanta 0.26  
```
4. What is the \(p\)-value for the hypothesis test?
  Answer:
```
    P-Value:  0.2230213 
```
5. Provide a summary statement that describes the outcomes to the question of interest and your conclusion.
  Answer:
```
 There is no statistically significant difference between the proportions of delayed departures  
 at Chicago and Atlanta: the p-value (.223) is greater than .05, so we fail to reject the null hypothesis.
```
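
As a cross-check, the test can be recomputed directly from the counts in the table (252 of 900 departures at ORD and 312 of 1,200 at ATL). This is a minimal sketch; note that it uses 900 as the ORD sample size, as tabulated, and the variable names are illustrative.

```r
# Two-proportion z test from the tabulated counts
delays  <- c(ORD = 252, ATL = 312)
flights <- c(ORD = 900, ATL = 1200)

p_hat   <- delays / flights                       # sample proportions
pooled  <- sum(delays) / sum(flights)             # pooled proportion under H0
se_null <- sqrt(pooled * (1 - pooled) * sum(1 / flights))

z       <- (p_hat[["ORD"]] - p_hat[["ATL"]]) / se_null
p_value <- 2 * pnorm(-abs(z))                     # two-sided p-value

c(z = z, p_value = p_value)
```

With these counts the p-value remains well above .05, so the conclusion of no statistically significant difference is unchanged.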
A Republican and a Democratic representative discussed the possibility of legislation that the two were putting forward together to a mixed audience of politically interested adults. Of those in the audience that participated in a follow-up questionnaire, 161 of 350 Republicans supported the legislation, whereas 79 of 250 Democrats supported it.
1. With a Type I error rate of .05, is there a difference in the level of support for the legislation between Republicans and Democrats? Explain your conclusion.
  Answer:
  
  Republican - 0.46
  Democrat - 0.316
  The sample proportions suggest that support differs between the two groups. The analysis below confirms this: the p-value (.0004) is below .05, so we reject the null hypothesis that the two proportions are equal.
```
inf_for_prop_two(.46,350,.316,250,.05,2,'equal')
```
```
## 
## Observed Proportion Difference:  0.144 
## Pooled Estimate of Proportion (for Hypothesis Test):  0.4 
## Standard Error of Difference in Proportions (Null True):  0.0405674 
## Standard Error of Difference in Proportions (Null False):  0.03967733 
## Critical Values:  -1.959964 1.959964 
## Confidence Interval:  0.06623387 0.2217661 
## Z-Value:  3.549648 
## P-Value:  0.0003857468
```
1. What is the \(p\)-value for the null hypothesis test?
  Answer:
```
    P-Value:  0.0003857468 
```
2. What is the 95% confidence interval for the difference in the proportions?
  Answer:
```
 Confidence Interval:  0.06623387 0.2217661   
```

Ken Kelley’s Statistics for Managerial Decision Making