Young millennials, adults aged 18 to 34, are viewed as the future of the restaurant industry. During 2011, this group consumed a mean of 192 restaurant meals per person (NPD Group website, November 7, 2012). Conduct a hypothesis test to determine if the poor economy caused a change in the frequency of consuming restaurant meals by young millennials in 2012.
Formulate hypotheses that can be used to determine whether the annual mean number of restaurant meals per person has changed for young millennials in 2012.
\(H_0: \mu = 192\)
\(H_a: \mu \ne 192\)
Based on a sample, the NPD Group stated that the mean number of restaurant meals consumed by young millennials in 2012 was 182. Assume the sample size was 150 and that, based on past studies, the population standard deviation can be assumed to be 55. Use the sample results to compute the test statistic and p-value for your hypothesis test.
# set up variables
n = 150
std_dev = 55
m0 = 192
m = 182
# compute z value (test statistic)
denom = std_dev/sqrt(n)
z_value = (round((m - m0)/denom, digits=4))
z_value
## [1] -2.2268
# compute p value
p_value = 2 * pnorm(abs(z_value), lower.tail = FALSE)
p_value
## [1] 0.02596064
# double-checking
library(TeachingDemos)
# this is a two-sided test, so the default alternative ("two.sided") can be used
z.test(m, mu=m0, sd=std_dev, n=n)
##
## One Sample z-test
##
## data: m
## z = -2.2268, n = 150.0000, Std. Dev. = 55.0000, Std. Dev. of the sample
## mean = 4.4907, p-value = 0.02596
## alternative hypothesis: true mean is not equal to 192
## 95 percent confidence interval:
## 173.1983 190.8017
## sample estimates:
## mean of m
## 182
At a significance level of .05, what is your conclusion?
My p-value is 0.026. Suppose the true mean were still 192 and I took 1,000 samples of 150 young millennials. I would then expect about 26 of them to have means at least this far from 192, in either direction. A significance level of 5% corresponds to 50 out of 1,000 samples, so my sample result is less likely than the threshold I have set. Therefore, at a significance level of 5%, I reject H0.
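The "about 26 out of 1,000 samples" reading can be checked by simulation. The sketch below assumes that under H0 the sample mean is normal with mean 192 and standard error 55/sqrt(150), which is the same assumption the z-test makes. The seed and the number of simulated samples are arbitrary choices.

```r
# Simulate sample means under H0: mu = 192, sigma = 55, n = 150
set.seed(2012)                  # arbitrary seed, for reproducibility only
n_reps <- 100000                # number of simulated samples
se <- 55 / sqrt(150)            # standard error of the sample mean
sim_means <- rnorm(n_reps, mean = 192, sd = se)

# Two-sided: share of simulated means at least as far from 192 as 182 is
observed_gap <- abs(182 - 192)
sim_p <- mean(abs(sim_means - 192) >= observed_gap)
sim_p
```

The simulated share should land close to the analytic p-value of about 0.026. Small differences are Monte Carlo noise.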
At a significance level of 5%, the data provides evidence of a change in the frequency of consuming restaurant meals by young millennials in 2012.
Fowle Marketing Research, Inc., bases charges to a client on the assumption that telephone surveys can be completed in a mean time of 15 minutes or less. If a longer mean survey time is necessary, a premium rate is charged. A sample of 35 surveys provided the survey times shown in the file named Fowle. Based upon past studies, the population standard deviation is assumed known with σ = 4 minutes. Is the premium rate justified?
First, import the data.
library(readxl)
Fowle <- read_excel("~/StMary's/Data Analysis with R/Hypothesis Testing/Fowle.xlsx")
Formulate the null and alternative hypotheses for this application.
\(H_0: \mu \le 15\) \(H_a: \mu > 15\)
Note: this is a one-sided (upper-tail) test.
Compute the value of the test statistic.
# set up variables
n = 35
m0 = 15
std_dev = 4
# get mean
m = mean(Fowle$Time)
# compute z value (test statistic)
denom = std_dev/sqrt(n)
z_value = (round((m - m0)/denom, digits=4))
z_value
## [1] 2.958
What is the p-value?
# since this is one-sided don't multiply by 2
p=pnorm(z_value, lower.tail=F)
p
## [1] 0.001548211
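The only difference between the two-sided test in the restaurant-meals problem and this upper-tail test is how the tail area becomes a p-value. Below is a minimal sketch of one helper covering both cases. `z_test_summary` is a hypothetical name, not a function from TeachingDemos or base R. The Fowle sample mean of 17 minutes is taken from the `z.test` check below.

```r
# Hypothetical helper: one-sample z-test from summary statistics
# (sigma is the known population standard deviation)
z_test_summary <- function(xbar, mu0, sigma, n,
                           alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z <- (xbar - mu0) / (sigma / sqrt(n))
  p <- switch(alternative,
              two.sided = 2 * pnorm(abs(z), lower.tail = FALSE),  # both tails
              less      = pnorm(z),                               # lower tail only
              greater   = pnorm(z, lower.tail = FALSE))           # upper tail only
  list(z = z, p_value = p)
}

# Restaurant meals (two-sided) and Fowle survey times (upper tail)
meals <- z_test_summary(182, 192, 55, 150, "two.sided")
fowle <- z_test_summary(17, 15, 4, 35, "greater")
meals$p_value   # about 0.026
fowle$p_value   # about 0.0015
```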
# double check
library(TeachingDemos)
# 17 is the sample mean of the Fowle survey times computed above
z.test(17, 15, sd=4, n=35, alternative = "greater")
##
## One Sample z-test
##
## data: 17
## z = 2.958, n = 35.00000, Std. Dev. = 4.00000, Std. Dev. of the sample
## mean = 0.67612, p-value = 0.001548
## alternative hypothesis: true mean is greater than 15
## 95 percent confidence interval:
## 15.88788 Inf
## sample estimates:
## mean of 17
## 17
At α = .01, what is your conclusion?
Compare my p-value with α = 0.01:
print(paste("My p value of", p, "is less than the significance level of 0.01. Therefore, I reject H0. At a significance level of 1%, the data provide evidence that the premium rate is justified."))
## [1] "My p value of 0.00154821061985595 is less than the significance level of 0.01. Therefore, I reject H0. At a significance level of 1%, the data provide evidence that the premium rate is justified."
The national mean annual salary for a school administrator is $90,000 a year (The Cincinnati Enquirer, April 7, 2012). A school official took a sample of 25 school administrators in the state of Ohio to learn about salaries in that state and to see if they differed from the national average.
Formulate hypotheses that can be used to determine whether the population mean annual administrator salary in Ohio differs from the national mean of $90,000.
\(H_0: \mu = 90,000\)
\(H_a: \mu \ne 90,000\)
The sample data for 25 Ohio administrators are contained in the file named Administrator. What is the p-value for your hypothesis test in part (a)?
Note: we do not know the population standard deviation. We therefore estimate it with the sample standard deviation s and use the t distribution with n - 1 = 24 degrees of freedom.
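To see why the t distribution matters for n = 25, compare its two-sided 5% critical values with the standard normal value. The t values are larger, because estimating σ with s adds uncertainty. They shrink toward 1.96 as the degrees of freedom grow. The degrees of freedom listed are arbitrary illustration values, apart from 24, which matches this problem.

```r
# Two-sided 5% critical values: t with several degrees of freedom vs. standard normal
dfs <- c(5, 24, 100, 1000)
t_crit <- qt(0.975, dfs)   # upper 2.5% point of t for each df
z_crit <- qnorm(0.975)     # about 1.96
round(cbind(df = dfs, t_crit = t_crit), 3)
z_crit
```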
Import the data.
library(readxl)
Administrator <- read_excel("Administrator.xlsx")
Calculate the sample mean and standard deviation, then compute the t statistic and p-value.
# get sample mean & standard deviation
m = mean(Administrator$Salary)
sample_std = sd(Administrator$Salary)
# set up variables
m0 = 90000
n = 25
# calculate t value
t =round((m - m0) / (sample_std/sqrt(n)), digits=3)
t
## [1] -2.141
# calculate p value
p = round(2 * pt(abs(t), n-1, lower.tail = FALSE), digits=3) # multiply by 2 because it's two-sided
p
## [1] 0.043
At α = .05, can your null hypothesis be rejected? What is your conclusion?
print(paste("Reject H0. Since my p value of", p, "is less than the significance level of 0.05, at this significance level the data support the conclusion that the mean salary in Ohio differs from the national mean of 90,000"))
## [1] "Reject H0. Since my p value of 0.043 is less than the significance level of 0.05, at this significance level the data support the conclusion that the mean salary in Ohio differs from the national mean of 90,000"
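Remember, the p-value is the probability of obtaining a sample result at least as extreme as this one if H0 is true. That probability is below the threshold we set, so we reject H0.
The critical value approach gives the same answer. Because the test is two-sided, the lower critical value is the 2.5th percentile of the t distribution with n - 1 = 24 degrees of freedom: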
tcrit = round(qt(0.025,n-1), digits=3)
print(paste("Sample t value of", t, "< critical t value of", tcrit))
## [1] "Sample t value of -2.141 < critical t value of -2.064"
print("Reject H0. Since the sample t value is lower than the critical t value, at 5% significance the data indicate that Ohio salaries differ from national salaries.")
## [1] "Reject H0. Since the sample t value is lower than the critical t value, at 5% significance the data indicate that Ohio salaries differ from national salaries."
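For a two-sided test, the p-value approach and the critical value approach always agree. Rejecting when p < α is the same as rejecting when |t| exceeds the upper α/2 critical value. Below is a minimal sketch using a hypothetical helper, `two_sided_decisions`. The second call uses a made-up, less extreme t value purely for contrast.

```r
# Hypothetical helper: two-sided t-test decision by both approaches
two_sided_decisions <- function(t, df, alpha = 0.05) {
  p_value <- 2 * pt(abs(t), df, lower.tail = FALSE)  # p-value approach
  t_crit  <- qt(1 - alpha / 2, df)                   # positive critical value
  c(p_value_rejects  = p_value < alpha,
    critical_rejects = abs(t) > t_crit)              # |t| checks both tails at once
}

two_sided_decisions(-2.141, df = 24)   # Ohio salaries: both TRUE
two_sided_decisions(-1.5,   df = 24)   # hypothetical t: both FALSE
```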
Note: email and ask about H0/HA!
# Problem 4: Question 33. The mean annual premium for automobile insurance.
The mean annual premium for automobile insurance in the United States is $1503 (insure.com website, March 6, 2014). Being from Pennsylvania, you believe automobile insurance is cheaper there and wish to develop statistical support for your opinion. A sample of 25 automobile insurance policies from the state of Pennsylvania showed a mean annual premium of $1440 with a standard deviation of s = $165.
\(H_0: \mu \ge 1503\) \(H_a: \mu < 1503\)
Note: we don't have the population standard deviation, so we must use the t distribution with n - 1 degrees of freedom.
# set up variables
x0 = 1503
xbar = 1440
sample_std_dev = 165
n = 25
# compute point estimate difference
x0 - xbar
## [1] 63
# compute test statistic (t value)
t = (xbar-x0) / (sample_std_dev/sqrt(n))
t
## [1] -1.909091
# compute p value
p = pt(t, n-1)
p
## [1] 0.03413866
# equivalent to pt(abs(t), n-1, lower.tail = F)
# compute critical t value
t_crit = qt(0.05, n-1)
t_crit
## [1] -1.710882
The p-value of 0.034 is less than 0.05, and equivalently t = -1.909 is below the critical value of -1.711. So at a significance level of 0.05, we reject H0.
At a significance level of 0.05, our data provides evidence that the mean annual premium in Pennsylvania is lower than the national mean annual premium.
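The same conclusion can be reached with a one-sided confidence bound. For H0: μ ≥ 1503 at α = 0.05, the test rejects exactly when 1503 lies above the 95% upper confidence bound for μ. Below is a minimal sketch using only the summary statistics given in the problem.

```r
# One-sided 95% upper confidence bound for the Pennsylvania mean premium
xbar <- 1440; s <- 165; n <- 25
upper_bound <- xbar + qt(0.95, n - 1) * s / sqrt(n)
upper_bound          # about 1496.46
upper_bound < 1503   # TRUE: the national mean lies above the bound
```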
Eagle Outfitters is a chain of stores specializing in outdoor apparel and camping gear. They are considering a promotion that involves mailing discount coupons to all their credit card customers. This promotion will be considered a success if more than 10% of those receiving the coupons use them. Before going national with the promotion, coupons were sent to a sample of 100 credit card customers.
Develop hypotheses that can be used to test whether the population proportion of those who will use the coupons is sufficient to go national.
\(H_0: p \le 0.10\) \(H_a: p > 0.10\)
# import data
library(readxl)
Eagle <- read_excel("Eagle.xlsx")
# get yes/ no values
table(Eagle$`Used Coupon?`)
##
## No Yes
## 87 13
# calculate point estimate
no = 87
yes = 13
pbar = yes/(no+yes)
# calculate z value
z=(pbar - 0.10)/sqrt(0.10*(1-0.10)/100)
# calculate p value
p=round(pnorm(z,lower.tail=F), digits=3)
print(paste("z value is",z, "and p value is", p))
## [1] "z value is 1 and p value is 0.159"
print("Is the p value less than the significance level of 0.05?")
## [1] "Is the p value less than the significance level of 0.05?"
print(p < 0.05)
## [1] FALSE
At the 5% significance level, we cannot reject H0. The sample does not provide sufficient evidence that more than 10% of customers would use the coupons, so Eagle Outfitters should not go national with the promotion.
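Base R's `prop.test` and `binom.test` can cross-check this result. In the sketch below, `prop.test` with `correct = FALSE` reproduces the large-sample z-test, since its chi-squared statistic is z squared. `binom.test` is a different, exact method, swapped in here only for comparison. Both should lead to not rejecting H0.

```r
# Eagle Outfitters: 13 of 100 sampled customers used the coupon
x <- 13; n <- 100; p0 <- 0.10

# Normal approximation is usually considered reasonable when n*p0 and n*(1 - p0) are both >= 5
c(n * p0, n * (1 - p0))

# Large-sample test without continuity correction: matches the manual z = 1
approx_test <- prop.test(x, n, p = p0, alternative = "greater", correct = FALSE)
approx_test$p.value   # about 0.159

# Exact binomial test as a cross-check that does not rely on the approximation
exact_test <- binom.test(x, n, p = p0, alternative = "greater")
exact_test$p.value
```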
In recent years more people have been working past the age of 65. In 2005, 27% of people aged 65–69 worked. A recent report from the Organization for Economic Co-operation and Development (OECD) claimed that the percentage working had increased (USA Today, November 16, 2012). The findings reported by the OECD were consistent with taking a sample of 600 people aged 65–69 and finding that 180 of them were working.
1. Develop a point estimate of the proportion of people aged 65–69 who are working.
pbar = 180/600
pbar
## [1] 0.3
\(H_0: p \le 0.27\)
\(H_a: p > 0.27\)
# set up variables
# we already set up pbar = 180/600
p0 = 0.27
n = 600
# compute test statistic (z value)
z = (pbar - p0)/(sqrt(p0 * (1 - p0)/n))
# compute p value (upper tail, since Ha is p > 0.27)
p = pnorm(z, lower.tail=FALSE)
round(p, 3)
## [1] 0.049
# compare p value to significance level of 5%
print("Is the probability of a sample proportion at least this high (if H0 is true) less than the significance level we have chosen?")
## [1] "Is the probability of a sample proportion at least this high (if H0 is true) less than the significance level we have chosen?"
print(p < 0.05)
## [1] TRUE
print("Reject H0 at 5% significance. At a significance level of 5%, these data provide evidence that the percentage of people aged 65-69 who are working has increased since 2005.")
## [1] "Reject H0 at 5% significance. At a significance level of 5%, these data provide evidence that the percentage of people aged 65-69 who are working has increased since 2005."
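The problem statement gives 180 working out of 600 sampled, so the point estimate is 0.30. Below is a minimal sketch that cross-checks the manual upper-tail z-test against base R's `prop.test` without continuity correction. The two p-values should agree to floating-point precision, and they land just under 0.05.

```r
# OECD claim: 180 of 600 people aged 65-69 working, versus 27% in 2005
x <- 180; n <- 600; p0 <- 0.27

pbar <- x / n                                 # point estimate, 0.30
z <- (pbar - p0) / sqrt(p0 * (1 - p0) / n)    # large-sample z statistic
p_manual <- pnorm(z, lower.tail = FALSE)      # upper-tail p-value

# Same test via prop.test (no continuity correction)
p_builtin <- prop.test(x, n, p = p0, alternative = "greater",
                       correct = FALSE)$p.value
c(z = z, p_manual = p_manual, p_builtin = p_builtin)
```

Because the p-value is only slightly below 0.05, the conclusion would flip at α = 0.01.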