Name(s):
Matt Bufalino, Mary Keller, Will Mason, Jackie Cisneros, Paul Sliwka
Questions
Inference for One and Two Proportions
# The answer is.... The standard error for the significance test is .0275, so z = -2.723524.
p <- .275
p
## [1] 0.275
SE <- sqrt((.35*(1-.35))/300)
SE
## [1] 0.02753785
z <- (p-.35)/SE
z
## [1] -2.723524
What is the \(p\)-value?
# The answer is.... The two-tailed area more extreme than +/- 2.7235 is about .0065, i.e. p < .01.
At \(\alpha\)=.05, what is your conclusion?
# The answer is.... Reject the null hypothesis.
What is/are your conclusions?
Answer: The hypothesized value of .35 lies above the upper limit of the 95% confidence interval (.224, .326), so it is outside the range of plausible parameter values. We therefore reject the null hypothesis; the sample proportion of .275 is our best estimate of the population proportion.
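As a cross-check on the answers above, the following sketch recomputes the \(p\)-value with `pnorm()` and the 95% confidence interval around the sample proportion. The variable names (`p_hat`, `p0`, `n_obs`) are introduced here for illustration only, and the two-tailed alternative is an assumption based on the "+/-" wording of the answer.

```r
# One-sample z-test for a proportion (values taken from the code above)
p_hat <- 0.275   # sample proportion
p0    <- 0.35    # hypothesized (null) proportion
n_obs <- 300     # sample size

# The test statistic uses the standard error evaluated at the null value
se_null <- sqrt(p0 * (1 - p0) / n_obs)
z_stat  <- (p_hat - p0) / se_null

# Two-tailed p-value (assumed alternative): area beyond |z| in both tails
p_value <- 2 * pnorm(-abs(z_stat))

# The confidence interval uses the standard error at the sample proportion
se_hat <- sqrt(p_hat * (1 - p_hat) / n_obs)
ci_95  <- p_hat + c(-1, 1) * qnorm(0.975) * se_hat

z_stat
p_value
ci_95
```

Note that the test and the interval use different standard errors: the test is built around the null value .35, the interval around the observed .275.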
# The answer is.... If successes are denoted as 1 and failures as 0, the mean of the 0s and 1s is the proportion of successes. There are 1760 1s out of n = 2000. Therefore the point estimate is .88
PE <- 1760/2000
PE
## [1] 0.88
# The answer is..... 0.01195316
SE <- sqrt((.88*(1-.88))/2000)
SE
## [1] 0.007266361
MOE <- 1.645*SE
MOE
## [1] 0.01195316
# The answer is.... CI90 = (0.8680468, 0.8919532)
PE + MOE
## [1] 0.8919532
PE - MOE
## [1] 0.8680468
# The answer is....0.01472481
PE2 <- .17
n2 <- 2500
SE2 <- sqrt((PE2 *(1-PE2))/n2)
SE2
## [1] 0.007512656
MOE2 <- 1.96*SE2
MOE2
## [1] 0.01472481
# The answer is.... CI95 = (0.1552752,0.1847248)
PE2 - MOE2
## [1] 0.1552752
PE2 + MOE2
## [1] 0.1847248
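The interval calculations above repeat the same three steps (standard error, margin of error, limits), so they can be sketched as one small function. `prop_ci()` is a hypothetical helper written for this illustration, not a function from any package; it takes the critical value from `qnorm()` rather than the rounded 1.645 and 1.96, so the limits may differ from the output above in the sixth decimal place.

```r
# Hypothetical helper: Wald confidence interval for a single proportion
prop_ci <- function(p_hat, n, conf = 0.95) {
  z_crit <- qnorm(1 - (1 - conf) / 2)      # two-sided critical value
  se     <- sqrt(p_hat * (1 - p_hat) / n)  # standard error at the sample proportion
  moe    <- z_crit * se                    # margin of error
  c(lower = p_hat - moe, estimate = p_hat, upper = p_hat + moe)
}

# 90% interval for 1760 successes out of 2000
ci_90 <- prop_ci(1760 / 2000, 2000, conf = 0.90)
ci_90

# 95% interval for a proportion of .17 with n = 2500
ci_95 <- prop_ci(0.17, 2500, conf = 0.95)
ci_95
```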
# The answer is....CI95=[0.146718, 0.193282] which is .17 +/- 0.02328196 so the CI got wider with a smaller sample size.
PE3 <- .17
n <- 1000
SE3 <- sqrt((PE3*(1-PE3))/n)
SE3
## [1] 0.01187855
MOE3 <- 1.96*SE3
MOE3
## [1] 0.02328196
PE3 - MOE3
## [1] 0.146718
PE3 + MOE3
## [1] 0.193282
i. For a similar setting, would you recommend spending the resources to obtain another sample of size 2,500?
```r
# The answer is.... A larger sample gives a more precise estimate because it reduces sampling error, but you probably do not need the increased precision. If the extra precision would not change how you act on the information, it is arguably not worth paying the additional cost of the larger sample.
```
# The answer is.... CI95=[0.09637597, 0.243624] which is .17 +/- 0.07362403 so the CI got wider with a smaller sample size.
PE4 <- .17
n4 <- 100
SE4 <- sqrt((PE4*(1-PE4))/n4)
SE4
## [1] 0.03756328
MOE4 <- 1.96*SE4
MOE4
## [1] 0.07362403
PE4 - MOE4
## [1] 0.09637597
PE4 + MOE4
## [1] 0.243624
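The three intervals for the same point estimate of .17 can be placed side by side to show how the margin of error shrinks with the square root of the sample size. This is a minimal sketch that reuses the 1.96 multiplier from the code above; the names `n_sizes` and `moe_table` are made up for the example.

```r
# Same point estimate, three sample sizes from the questions above
p_hat   <- 0.17
n_sizes <- c(2500, 1000, 100)

# Standard error and margin of error are computed for all sizes at once
se  <- sqrt(p_hat * (1 - p_hat) / n_sizes)
moe <- 1.96 * se

moe_table <- data.frame(
  n     = n_sizes,
  se    = se,
  moe   = moe,
  lower = p_hat - moe,
  upper = p_hat + moe
)
moe_table

# The margin of error scales with 1 / sqrt(n):
# going from n = 2500 to n = 100 multiplies it by sqrt(2500 / 100)
moe_ratio <- moe[3] / moe[1]
moe_ratio
```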
a. What are the implications of sample size for how you use the data and for the conclusions you can draw?
```r
# The answer is.... In the sample of 2,500 Internet users in the 12–17 age group, 17% named Facebook the most popular website, CI95 = (0.1552752, 0.1847248). Smaller samples give the same point estimate but wider intervals, so the conclusions are less precise.
```
PE5 <- 10/100
PE5
## [1] 0.1
**Answer**: The standard error for the significance test is 0.029, so z = 0.349. The upper-tail critical value at \(\alpha\) = .05 is 1.645, and the area to the right of z = 0.349 is about .363. Thus we fail to reject the null hypothesis.
n5 <- 100
PE5 <- .10
null <- .09
SEforhyptest <- sqrt((null*(1-null))/n5)
SEforhyptest
## [1] 0.02861818
SEforCI <- sqrt((PE5*(1-PE5))/n5)
SEforCI
## [1] 0.03
zvalue <- (PE5-null)/SEforhyptest
zvalue
## [1] 0.3494283
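The \(p\)-value and critical value quoted in the answer can be recomputed from the z statistic. The sketch below assumes an upper-tailed alternative at \(\alpha\) = .05, which is what the answer suggests; the variable names are introduced here for illustration.

```r
# One-sided (upper-tail) z-test for a proportion, values from the code above
n_obs <- 100
p_hat <- 0.10   # sample proportion
p0    <- 0.09   # null value

se_null <- sqrt(p0 * (1 - p0) / n_obs)
z_stat  <- (p_hat - p0) / se_null

# Upper-tail p-value: area to the right of the observed z
p_upper <- pnorm(z_stat, lower.tail = FALSE)

# Upper-tail critical value for alpha = .05 (assumed significance level)
z_crit <- qnorm(0.95)

# Decision rule: reject only if z exceeds the critical value
reject_null <- z_stat > z_crit

c(z = z_stat, p_value = p_upper, critical = z_crit)
reject_null
```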
d. What is the most appropriate confidence interval that goes along with the hypothesis test in part c?
```r
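# The answer is.... One-sided CI95 = [.051, 1], with lower limit .10 - 1.645*.03, to match the one-sided test in part c. The two-sided 95% interval computed below is (0.0412, 0.1588).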
MOE5 <- 1.96*SEforCI
MOE5
```
```
## [1] 0.0588
```
```r
PE5 + MOE5
```
```
## [1] 0.1588
```
```r
PE5 - MOE5
```
```
## [1] 0.0412
```
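An interval whose upper limit is 1 is consistent with a one-sided 95% lower confidence bound, which pairs naturally with an upper-tailed test. The sketch below shows that calculation under that assumption; `lower_bound` and `ci_one_sided` are names made up for the example.

```r
# One-sided 95% lower confidence bound for a proportion
p_hat <- 0.10
n_obs <- 100

# Standard error at the sample proportion (as used for confidence intervals)
se_hat <- sqrt(p_hat * (1 - p_hat) / n_obs)

# All 5% of the error is placed in one tail, so the multiplier is qnorm(0.95)
lower_bound  <- p_hat - qnorm(0.95) * se_hat
ci_one_sided <- c(lower = lower_bound, upper = 1)
ci_one_sided

# The null value .09 lies inside this interval, in line with failing to reject
null_inside <- 0.09 >= lower_bound
null_inside
```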
Describe a scenario in which a marketing research team would be interested in testing the difference between two proportions using \(\alpha\)= .05.
Answer: Is there a difference in the proportion of credit card holders who make a purchase between those who received a 10% coupon and those who received a 20% coupon?
In a two-group context, suppose that group 1 had a proportion of .75 with a sample size of 437, whereas group 2 had a proportion of .687 with a sample size of 219.
# The answer is.... The p-value is 0.0869, which is > .05.
n1 <- 437
n2 <- 219
p1_positive <- .75
p2_positive <- .687
diff <- p1_positive - p2_positive
diff
## [1] 0.063
pooled_proportion <- (n1* p1_positive + n2* p2_positive)/(n1 + n2)
pooled_proportion
## [1] 0.728968
SE_diff_proportion <- sqrt(pooled_proportion*(1-pooled_proportion)*(1/n1 + 1/n2))
SE_diff_proportion
## [1] 0.0368005
z.test.diff <- diff/SE_diff_proportion
z.test.diff
## [1] 1.711933
p.test.diff <- 2*(1-pnorm(abs(z.test.diff)))
p.test.diff
## [1] 0.08690893
# The answer is....CI95=[-0.0091,0.1351] which is .063 +/- 0.0721
alpha <- .05
C <- 1-alpha
MOE.diff.prop <- SE_diff_proportion*qnorm(1-alpha/2)
diff
## [1] 0.063
MOE.diff.prop
## [1] 0.07212765
c(diff - MOE.diff.prop, diff, diff + MOE.diff.prop)
## [1] -0.009127646 0.063000000 0.135127646
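Because the same sequence (difference, pooled proportion, standard error, z, \(p\)-value, interval) recurs for every two-group question, it can be wrapped in a function. `two_prop_z()` is a hypothetical helper written for this sketch; it follows the pooled-standard-error approach used in the code above, including for the interval.

```r
# Hypothetical helper: pooled two-proportion z-test with a two-sided p-value
two_prop_z <- function(p1, n1, p2, n2, alpha = 0.05) {
  diff_hat  <- p1 - p2
  pooled    <- (n1 * p1 + n2 * p2) / (n1 + n2)            # pooled proportion
  se_pooled <- sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
  z         <- diff_hat / se_pooled
  p_value   <- 2 * pnorm(-abs(z))                         # two-sided
  moe       <- qnorm(1 - alpha / 2) * se_pooled
  list(diff = diff_hat, se = se_pooled, z = z, p_value = p_value,
       ci = c(lower = diff_hat - moe, upper = diff_hat + moe))
}

# Group 1: .75 with n = 437; group 2: .687 with n = 219
res_small <- two_prop_z(0.75, 437, 0.687, 219)
res_small$z
res_small$p_value
res_small$ci
```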
# The answer is.... P-value < .01 (0.0066)
n3 <- 645
n4 <- 931
p3_positive <- .75
p4_positive <- .687
diff2 <- p3_positive - p4_positive
diff2
## [1] 0.063
pooled_proportion2 <- (n3* p3_positive + n4* p4_positive)/(n3 + n4)
pooled_proportion2
## [1] 0.7127836
SE_diff_proportion2 <- sqrt(pooled_proportion2*(1-pooled_proportion2)*(1/n3 + 1/n4))
SE_diff_proportion2
## [1] 0.02317965
z.test.diff2 <- diff2/SE_diff_proportion2
z.test.diff2
## [1] 2.717901
p.test.diff2 <- 2*(1-pnorm(abs(z.test.diff2)))
p.test.diff2
## [1] 0.006569743
i. What are the confidence intervals limits for the population difference between proportions?
# The answer is.... CI95=[0.0176, 0.1084] which is .0630 +/- .0454
alpha2 <- .05
C2 <- 1-alpha2
MOE.diff.prop2 <- SE_diff_proportion2*qnorm(1-alpha2/2)
diff2
## [1] 0.063
MOE.diff.prop2
## [1] 0.04543128
c(diff2 - MOE.diff.prop2, diff2, diff2 + MOE.diff.prop2)
## [1] 0.01756872 0.06300000 0.10843128
ii. What is your conclusion regarding the null hypothesis?
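**Answer**: With the larger sample sizes, the same difference of .063 is statistically significant (p = .0066, and the 95% CI excludes 0), so we reject the null hypothesis.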
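The contrast between the two scenarios comes entirely from the standard error, since the observed difference is .063 in both. The sketch below runs both sample-size scenarios in one vectorized pass; the labels `small` and `large` are illustrative names, not terms from the assignment.

```r
# Same observed proportions, two different pairs of sample sizes
p_first  <- 0.75
p_second <- 0.687
diff_hat <- p_first - p_second

n_first  <- c(small = 437, large = 645)
n_second <- c(small = 219, large = 931)

# Pooled proportion and standard error for each scenario
pooled <- (n_first * p_first + n_second * p_second) / (n_first + n_second)
se     <- sqrt(pooled * (1 - pooled) * (1 / n_first + 1 / n_second))

# Larger samples shrink the standard error, which inflates z for the same difference
z <- diff_hat / se
p <- 2 * pnorm(-abs(z))

rbind(se = se, z = z, p_value = p)
```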
# The answer is.... Professional : 0.64 and Amateur: 0.58
np <- 1075
na <- 1200
pp_positive <- 688/1075
pp_positive
## [1] 0.64
pa_positive <- 696/1200
pa_positive
## [1] 0.58
b. What is the point estimate of the difference between the proportions of the two populations?
# The answer is.... .06
PGAdiff <- pp_positive - pa_positive
PGAdiff
## [1] 0.06
c. What is the 95% two-sided confidence interval for the difference between the two population proportions?
# The answer is.... CI95 = (0.0198, 0.1002) which is 0.0600 +/- 0.0402
PGApooled_proportion <- (np* pp_positive + na* pa_positive)/(np + na)
PGApooled_proportion
## [1] 0.6083516
PGASE_diff_proportion <- sqrt(PGApooled_proportion*(1-PGApooled_proportion)*(1/np + 1/na))
PGASE_diff_proportion
## [1] 0.02049847
PGAz.test.diff <- PGAdiff/PGASE_diff_proportion
PGAz.test.diff
## [1] 2.927048
PGAp.test.diff <- 2*(1-pnorm(abs(PGAz.test.diff)))
PGAp.test.diff
## [1] 0.003421956
alpha <- .05
PGAC <- 1-alpha
PGAMOE.diff.prop <- PGASE_diff_proportion*qnorm(1-alpha/2)
PGAdiff
## [1] 0.06
PGAMOE.diff.prop
## [1] 0.04017625
c(PGAdiff - PGAMOE.diff.prop, PGAdiff, PGAdiff + PGAMOE.diff.prop)
## [1] 0.01982375 0.06000000 0.10017625
a. Interpret the 95% confidence interval and provide a summary statement about the difference between the two populations.
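**Answer**: We are 95% confident that the population proportion for professionals is between .0198 and .1002 higher than for amateurs. Because the interval excludes 0 (z = 2.927 versus the critical value of +/- 1.960; p = .0034 < .01), we reject the null hypothesis of no difference.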
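The hand calculation can be cross-checked with `prop.test()` from base R's `stats` package, using the counts 688 of 1075 and 696 of 1200 from the code above. With `correct = FALSE` the reported X-squared statistic is the square of the pooled z statistic, so the \(p\)-value should agree. The interval it reports is built from the unpooled standard error, so its limits are expected to differ slightly from the pooled limits computed above.

```r
# Cross-check with prop.test(); continuity correction off to match the z-test
pga_test <- prop.test(x = c(688, 696), n = c(1075, 1200), correct = FALSE)

# X-squared with 1 df is z squared, so its square root recovers |z|
z_from_chisq <- sqrt(unname(pga_test$statistic))
z_from_chisq

pga_test$p.value    # two-sided p-value
pga_test$conf.int   # 95% CI for the difference (unpooled standard error)
```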
library(tidyverse, quietly = TRUE)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3 v purrr 0.3.4
## v tibble 3.0.4 v dplyr 1.0.2
## v tidyr 1.1.2 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Flights <- tribble(
~Airport, ~Flights, ~Delays,
"ORD", 900, 252,
"ATL", 1200, 312)
Flights
a. State in words --- not in equation form --- the null hypotheses that can be used to infer whether the population proportions of delayed departures differ at these two airports.
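**Answer**: The proportion of flights with delayed departures is the same at Chicago O’Hare as at Atlanta Hartsfield-Jackson.
b. What is the point estimate of the proportion of flights that have delayed departures at Chicago O’Hare?
# The answer is.... ORD : .28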
ORDn <- 900
ORDp <- 252/900
ORDp
## [1] 0.28
b. What is the point estimate of the proportion of flights that have delayed departures at Atlanta Hartsfield-Jackson?
# The answer is.... ATL : .26
ATLn <- 1200
ATLp <- 312/1200
ATLp
## [1] 0.26
c. What is the \(p\)-value for the hypothesis test?
# The answer is....P-value is >.05 (0.3061511) thus we fail to reject the null hypothesis.
Flightspooled_proportion <- (ORDn* ORDp + ATLn* ATLp)/(ORDn + ATLn)
Flightspooled_proportion
## [1] 0.2685714
FlightsSE_diff_proportion <- sqrt(Flightspooled_proportion*(1-Flightspooled_proportion)*(1/ORDn + 1/ATLn))
FlightsSE_diff_proportion
## [1] 0.01954401
Flightsdiff <- ORDp - ATLp
Flightsdiff
## [1] 0.02
Flightsz.test.diff <- Flightsdiff/FlightsSE_diff_proportion
Flightsz.test.diff
## [1] 1.023332
Flightsp.test.diff <- 2*(1-pnorm(abs(Flightsz.test.diff)))
Flightsp.test.diff
## [1] 0.3061511
alpha <- .05
FlightsC <- 1-alpha
FlightsMOE.diff.prop <- FlightsSE_diff_proportion*qnorm(1-alpha/2)
Flightsdiff
## [1] 0.02
FlightsMOE.diff.prop
## [1] 0.03830555
c(Flightsdiff - FlightsMOE.diff.prop, Flightsdiff, Flightsdiff + FlightsMOE.diff.prop)
## [1] -0.01830555 0.02000000 0.05830555
d. Provide a summary statement that describes the outcomes to the question of interest and your conclusion.
**Answer**: There is not a statistically significant difference in the proportion of delayed departures between Chicago O’Hare and Atlanta Hartsfield-Jackson airports: the p-value is > .05 (0.306), with a z-value of 1.023 against a critical z-value of +/- 1.960.
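The same conclusion can be reached by treating the flight counts as a 2 x 2 table and running a Pearson chi-square test, since for two groups the chi-square statistic (without continuity correction) equals the squared pooled z statistic. This is a sketch of that equivalence; the table layout and the name `delay_table` are choices made for the example.

```r
# 2 x 2 table of delayed vs. on-time departures, from the counts above
delay_table <- matrix(
  c(252,  900 - 252,
    312, 1200 - 312),
  nrow = 2, byrow = TRUE,
  dimnames = list(Airport = c("ORD", "ATL"),
                  Departure = c("Delayed", "OnTime"))
)
delay_table

# Pearson chi-square test without continuity correction
flights_chisq <- chisq.test(delay_table, correct = FALSE)

# Square root of the statistic recovers |z| from the two-proportion z-test
z_equiv <- sqrt(unname(flights_chisq$statistic))
z_equiv
flights_chisq$p.value
```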
# The answer is...P-Value is < .001 (0.0003857468)
RepN <- 350
DemN <- 250
Rep_P_Positive <- 161/350
Rep_P_Positive
## [1] 0.46
Dem_P_Positive <- 79/250
Dem_P_Positive
## [1] 0.316
Politicspooled_proportion <- (RepN* Rep_P_Positive + DemN* Dem_P_Positive)/(RepN + DemN)
Politicspooled_proportion
## [1] 0.4
PoliticsSE_diff_proportion <- sqrt(Politicspooled_proportion*(1-Politicspooled_proportion)*(1/RepN + 1/DemN))
PoliticsSE_diff_proportion
## [1] 0.0405674
Politicsdiff <- Rep_P_Positive - Dem_P_Positive
Politicsdiff
## [1] 0.144
Politicsz.test.diff <- Politicsdiff/PoliticsSE_diff_proportion
Politicsz.test.diff
## [1] 3.549648
Politicsp.test.diff <- 2*(1-pnorm(abs(Politicsz.test.diff)))
Politicsp.test.diff
## [1] 0.0003857468
b. What is the 95% confidence interval for the difference in the proportions?
# The answer is.... CI95= (.0645, 0.2235) which is 0.14400000 +/- 0.0795
alpha <- .05
PoliticsC <- 1-alpha
PoliticsMOE.diff.prop <- PoliticsSE_diff_proportion*qnorm(1-alpha/2)
Politicsdiff
## [1] 0.144
PoliticsMOE.diff.prop
## [1] 0.07951065
c(Politicsdiff - PoliticsMOE.diff.prop, Politicsdiff, Politicsdiff + PoliticsMOE.diff.prop)
## [1] 0.06448935 0.14400000 0.22351065
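The interval above is built from the pooled standard error, which is the one used for the hypothesis test. Many textbooks instead build the confidence interval from the unpooled standard error, because the interval does not assume the two proportions are equal. The sketch below shows that alternative for comparison; it is not the method used in the code above, and the variable names are illustrative.

```r
# Unpooled (Wald) 95% CI for the difference between two proportions
n_rep <- 350
n_dem <- 250
p_rep <- 161 / n_rep
p_dem <- 79 / n_dem
diff_hat <- p_rep - p_dem

# Each group contributes its own variance term; no pooling
se_unpooled  <- sqrt(p_rep * (1 - p_rep) / n_rep + p_dem * (1 - p_dem) / n_dem)
moe_unpooled <- qnorm(0.975) * se_unpooled
ci_unpooled  <- diff_hat + c(-1, 1) * moe_unpooled

moe_unpooled   # compare with the pooled margin of error, 0.07951065
ci_unpooled
```

Here the two approaches lead to the same conclusion, since both intervals exclude 0.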