Let \(\mu\) be the true average awesomeness of your instructor.
\(H_{0}: \mu = 0\) (don't forget two spaces at the end of this line)
\(H_{A}: \mu > 0\) (alternatively, you can simply write H0 and Ha)
Check assumptions: We can assume that the mean of the awesomeness measure has an approximately normal sampling distribution since the sample size is large.
# If the data is available, use this space to graphically check the normality of the data.
# Import the data here and set the code chunk to evaluate
t.test(awesomedata, mu=0, alternative="greater")
State your decision to reject or fail to reject here.
Reject \(H_{0}\), the p-value is less than \(\alpha=.05\).
V. State your conclusion and justify it with a p-value.
There is sufficient reason to believe that the true average awesomeness of your instructor is greater than 0 (p<.0001).
Exercise 4.16 presents the results of a 2006-2010 survey showing that the average age of women at first marriage is 23.44. Suppose a social scientist believes that this value has increased in 2012, but she would also be interested if she found a decrease. Below is how she set up her hypotheses. Indicate any errors you see.
\(H_{0}: \mu = 23.44\) year old female at age of first marriage
\(H_{A}: \mu > 23.44\) year old female at age of first marriage
I see an error in her alternative hypothesis: it is one sided, so it only tests whether the average age at first marriage is greater than 23.44 years. Because she would also be interested in a decrease, she needs to conduct a two sided t-test, with the alternative hypothesis \(H_{A}: \mu \neq 23.44\). This would fix the error in her hypotheses.
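The difference between the two setups can be sketched in R. The test statistic and degrees of freedom below are hypothetical values chosen only for illustration; they do not come from the exercise.

```r
# Hypothetical test statistic and degrees of freedom (illustration only)
t_stat <- 1.8
df <- 99

# One-sided p-value for H_A: mu > 23.44 (upper tail only)
p_greater <- pt(t_stat, df, lower.tail = FALSE)

# Two-sided p-value for H_A: mu != 23.44 (both tails)
p_two_sided <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)

c(greater = p_greater, two_sided = p_two_sided)
```

With a one sided "greater" alternative, a decrease in the average age would produce a negative test statistic and a large p-value, so the decrease she cares about could never be detected.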
Exercise 4.13 provides a 95% CI for the mean waiting time at an emergency room (ER) of (128 min., 147 min.). Answer the following questions based on this interval.
3*60
## [1] 180
Three hours is equal to 180 minutes, which is well above the confidence interval. We are 95% confident that the average waiting time is between 128 and 147 minutes, so no, the local newspaper's claim is not supported by the interval. The newspaper appears to be overstating the waiting time.
2*60
## [1] 120
Two hours is 120 minutes, so the Dean of Medicine is closer to the actual average waiting time than the local newspaper, but his figure is still a little low. In fact the Dean's figure is 8 to 27 minutes below the values in the interval. So no, this claim is not supported by the 95% CI of 128 to 147 minutes for the mean waiting time at the ER.
The Dean's estimate of 120 minutes falls 8 minutes below the lower end of the 95% confidence interval. A 99% confidence interval is wider than a 95% interval, not narrower, so it could in principle reach further down, while 90% or 85% intervals would be narrower and leave the estimate even farther outside. Back-calculating from the 95% interval with a normal approximation gives a 99% interval of roughly 125 to 150 minutes, which still excludes 120. So no, the Dean's claim would not be supported by a 99% CI either.
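A 99% interval can be approximated from the reported 95% interval alone. This is a minimal sketch, assuming the original interval was built with a normal critical value (the exercise does not say); the variable names are chosen here for illustration.

```r
# Reported 95% CI for the mean ER waiting time, in minutes
lower_95 <- 128
upper_95 <- 147

center <- (lower_95 + upper_95) / 2      # point estimate
margin_95 <- (upper_95 - lower_95) / 2   # 95% margin of error

# Assumption: the interval used a normal critical value
se <- margin_95 / qnorm(0.975)

# A 99% interval uses a larger critical value, so it is wider
margin_99 <- qnorm(0.995) * se
ci_99 <- c(center - margin_99, center + margin_99)
ci_99

# Is the Dean's 120 minutes inside the 99% interval?
dean <- 2 * 60
dean_inside <- dean >= ci_99[1] & dean <= ci_99[2]
dean_inside
```

Under that assumption the wider interval still stops short of 120 minutes.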
The nutrition label on a bag of potato chips says that a one ounce (28 gram) serving of potato chips has 130 calories and contains ten grams of fat, with three grams of saturated fat. A random sample of 35 bags yielded a sample mean of 134 calories with a standard deviation of 17 calories. Is there evidence that the nutrition label does not provide an accurate measure of calories in the bags of potato chips? We have verified the independence, sample size, and skew conditions are satisfied.
Given the sample standard deviation of 17 calories, about 68% of bags would fall within one standard deviation of the sample mean, between 117 and 151 calories, and about 95% would fall within two standard deviations, between 100 and 168 calories. Those ranges describe individual bags, though, while the question is about the average. For the average, the standard error is \(17/\sqrt{35} \approx 2.87\) calories, so the sample mean of 134 is only about 1.39 standard errors above the labeled 130, which corresponds to a two sided p-value of roughly 0.17. Since this is greater than 0.05, there is not sufficient evidence that the nutrition label provides an inaccurate measure of calories. No sample can make us 100% certain, but a larger sample than 35, perhaps double, would shrink the standard error and the margin of error and give a sharper test.
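Because only summary statistics are given, the test can be sketched by hand in R rather than with `t.test()`. The variable names are chosen here for illustration; the numbers come from the exercise.

```r
# Summary statistics from the exercise
n <- 35
x_bar <- 134
s <- 17
mu_0 <- 130   # calories stated on the label

# Standard error of the sample mean
se <- s / sqrt(n)

# One-sample t statistic and two-sided p-value
t_stat <- (x_bar - mu_0) / se
p_value <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

c(se = se, t = t_stat, p = p_value)
```

The two sided p-value is used because the question asks whether the label is inaccurate in either direction.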
According to Wikipedia, the average age for a woman in the US to get married is 28 years (http://en.Wikipedia.org/wiki/Age_at_first_marriage). The average age at first marriage of 5,534 US women who responded to the National Survey of Family Growth (NSFG) conducted by the CDC in the 2006-2010 cycle was 23.4 years. Is there reason to believe that women who respond to the NSFG survey marry significantly earlier than the average woman?
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(openintro)
## Please visit openintro.org for free statistics materials
##
## Attaching package: 'openintro'
## The following object is masked from 'package:datasets':
##
## cars
marriage <- read.csv("C:/Users/Aithne/Documents/R/ageAtMar.csv", header=TRUE, stringsAsFactors = FALSE)
Let \(\mu\) be the true average age at first marriage of women in the USA who took the NSFG survey. The Wikipedia page claims that the average age for US women is 28 years. So we can now set up our hypotheses.
\(H_{0}: \mu = 28\) year old female at age of first marriage
\(H_{A}: \mu < 28\) year old female at age of first marriage
The most appropriate statistical method relies on the Central Limit Theorem. We have a large sample size, the observations are independent, and we assume the data are approximately normal. Therefore, we will conduct a one sample t test for a mean, which with a sample this large is essentially the same as a z test.
ggplot(marriage, aes(x=age)) +
  geom_histogram(aes(y=..density..), color="black", fill=NA) +
  stat_function(fun = dnorm, color="red", args = list(mean = mean(marriage$age), sd = sd(marriage$age)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
qqnorm(marriage$age)
qqline(marriage$age, col="red")
Looking at the marriage data set, we have over 5,000 observations. This qualifies as a large sample, since it is well over the usual guideline of 30 for the CLT. Next, the Q-Q plot lets us check whether the data are normal: plotted against the red reference line, the points follow a roughly normal pattern. The histogram also shows a fairly bell-shaped curve, with a few outliers in the 20 year range, so overall we can accept that the data are approximately normal. Finally, we have to determine whether the individual cases are independent. When one woman marries has no bearing on when another woman marries, so we can assume the observations are independent. Now that all of our conditions have been met, we can examine our data set.
t.test(marriage$age, mu=28, alternative="less")
##
## One Sample t-test
##
## data: marriage$age
## t = -71.845, df = 5533, p-value < 2.2e-16
## alternative hypothesis: true mean is less than 28
## 95 percent confidence interval:
## -Inf 23.5446
## sample estimates:
## mean of x
## 23.44019
t.test(marriage$age, mu=28, alternative="two.sided")
##
## One Sample t-test
##
## data: marriage$age
## t = -71.845, df = 5533, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 28
## 95 percent confidence interval:
## 23.31577 23.56461
## sample estimates:
## mean of x
## 23.44019
The first test provides us with the following information: \(\bar{x}\) = 23.4 and a p-value below 0.0001. Since our p-value is less than \(\alpha = 0.05\), we reject the null hypothesis. Additionally, the two tailed test gives a confidence interval: we are 95% confident that the average age at first marriage of US women who took the NSFG survey is between 23.3 and 23.6 years.
There is very strong evidence that the average age at first marriage for women in the USA who took the NSFG survey is less than the 28 years claimed on Wikipedia. Therefore, we reject our null hypothesis in favor of our alternative, that these women are on average less than 28 years old at the time of their first marriage. We are 95% confident that their average age at first marriage is between 23.3 and 23.6 years.
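As a check on the reported interval, the two sided 95% confidence interval can be rebuilt from the numbers printed by `t.test()` above. This is only a sketch: the standard error is not printed, so it is recovered here from the t statistic, and the variable names are chosen for illustration.

```r
# Values copied from the t.test() output above
x_bar <- 23.44019
t_stat <- -71.845
df <- 5533
mu_0 <- 28

# Standard error implied by t = (x_bar - mu_0) / se
se <- (x_bar - mu_0) / t_stat

# Two-sided 95% confidence interval for the mean
ci_95 <- x_bar + c(-1, 1) * qt(0.975, df) * se
round(ci_95, 2)
```

The endpoints round to 23.32 and 23.56, matching the interval printed by the two sided test.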
I do think this is reasonable, with a few caveats. One, the data come from drivers of 2012 Prius models who freely chose to submit their mileage, not from every Prius driver. This seems to me like convenience sampling. Two, only 14 Prius drivers were sampled and added to the histogram. To avoid bias and rely on the CLT, we need independent observations, no strong skew, and more than 30 observations, so we would need to more than double our sample size to use this dataset with confidence.
Let \(\mu\) be the average MPG a 2012 Prius gets (city and highway combined).
\(H_{0}: \mu = 50\) MPG
\(H_{A}: \mu \neq 50\) MPG
Check assumptions: We must assume that the observations are independent and that the MPG values are approximately normal without strong skew. The usual large sample guideline of more than 30 observations is not met here, since only 14 drivers were sampled.
Compare your data to the hypothesized value by calculating a test statistic and a p-value. Now we would conduct our test and compare our p-value to \(\alpha\), assuming that we chose \(\alpha = 0.05\).
State your decision to reject or fail to reject here.
Reject \(H_{0}\) if the p-value is less than \(\alpha=.05\). Fail to reject \(H_{0}\) if the p-value is greater than or equal to \(\alpha=.05\).
V. State your conclusion and justify it with a p-value.
Here we would state our conclusion depending on whether our p-value was less than \(\alpha\) or greater than or equal to \(\alpha\).
Roughly 95% of individual observations fall within 2 standard deviations of the mean, which gives a range of 39.6 to 60.4 MPG for individual 2012 Prius drivers. That range describes single cars, not the average: a 95% confidence interval for the average MPG would use the standard error (the standard deviation divided by \(\sqrt{14}\)) and would be considerably narrower. If I actually had the data, I would conduct a t test. The R code would be:
t.test(prius$mpg, mu=50, alternative="less")
t.test(prius$mpg, mu=50, alternative="two.sided")
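Even without the raw data, the gap between a 2 standard deviation range and a confidence interval for the mean can be sketched. The sample size of 14 comes from the text; the standard deviation of 5.2 is an assumption, back-calculated from the 39.6 to 60.4 range quoted above.

```r
# n comes from the text; s is an assumption back-calculated from the
# 39.6 to 60.4 range quoted above (2 * s = 10.4)
n <- 14
s <- 5.2

# Spread of individual cars: about 2 standard deviations
range_half_width <- 2 * s

# Margin of error for the mean: t critical value times standard error
se <- s / sqrt(n)
margin_mean <- qt(0.975, df = n - 1) * se

c(individual = range_half_width, mean = margin_mean)
```

The margin of error for the mean is much smaller than the spread of individual cars, which is why the two intervals should not be confused.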
In February 2013, two CA residents filed a class-action lawsuit against Anheuser-Busch, alleging the company was watering down beers to boost profits. They argued that because water was being added, the true alcohol content of the beer by volume is less than the advertised amount. They alleged that Budweiser beer has an alcohol content by volume of 4.7% instead of the stated 5%. A media outlet picked up on this suit and hired independent labs to test samples of Budweiser beer and find the alcohol content. Five cans were measured, and the alcohol content was 4.94, 5.00, 4.99, 4.95 and 4.90.
bud <- c(4.94, 5.00, 4.99, 4.95, 4.90)
I. Let \(\mu\) be the true average alcohol content by volume of Budweiser beer, which the plaintiffs claim is 4.7%.
\(H_{0}: \mu = 4.7\)
\(H_{A}: \mu \neq 4.7\)
hist(bud)
var(bud)
## [1] 0.00163
sd(bud)
## [1] 0.04037326
summary(bud)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.900 4.940 4.950 4.956 4.990 5.000
t.test(bud, mu=4.7, alternative="less")
##
## One Sample t-test
##
## data: bud
## t = 14.179, df = 4, p-value = 0.9999
## alternative hypothesis: true mean is less than 4.7
## 95 percent confidence interval:
## -Inf 4.994491
## sample estimates:
## mean of x
## 4.956
t.test(bud, mu=4.7, alternative="two.sided")
##
## One Sample t-test
##
## data: bud
## t = 14.179, df = 4, p-value = 0.0001437
## alternative hypothesis: true mean is not equal to 4.7
## 95 percent confidence interval:
## 4.90587 5.00613
## sample estimates:
## mean of x
## 4.956
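Assuming \(\alpha = 0.05\), the one sided test gives a p-value of 0.9999, which is greater than \(\alpha\), so we do not reject \(H_{0}: \mu = 4.7\) in favor of the alternative that the mean is less than 4.7. The two sided test gives a p-value of 0.0001437, which is less than \(\alpha\), so we reject \(H_{0}\) in favor of \(H_{A}: \mu \neq 4.7\).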
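The two p-values come from the same test statistic and differ only in which tail area is used. A minimal sketch of the calculation by hand, using the five measurements from the problem (the names `mu_0`, `p_less` and `p_two_sided` are chosen here for illustration):

```r
bud <- c(4.94, 5.00, 4.99, 4.95, 4.90)
mu_0 <- 4.7   # alcohol content alleged in the lawsuit

n <- length(bud)
se <- sd(bud) / sqrt(n)            # standard error of the mean
t_stat <- (mean(bud) - mu_0) / se  # one-sample t statistic

# alternative = "less": area below the observed t
p_less <- pt(t_stat, df = n - 1)

# alternative = "two.sided": area in both tails
p_two_sided <- 2 * pt(abs(t_stat), df = n - 1, lower.tail = FALSE)

c(t = t_stat, p_less = p_less, p_two_sided = p_two_sided)
```

Because the sample mean is above 4.7, the test statistic is positive, so nearly all of the area lies below it and the "less" p-value is close to 1.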
V. Make a decision about the hypothesis and state your conclusion in full English sentences in the context of the research hypothesis, using no symbols or jargon.
The five lab measurements give no evidence that the average alcohol content by volume of Budweiser beer is below 4.7%. The sample average of 4.956% is well above 4.7%, and the two sided test indicates that this difference is too large to be explained by chance. We are 95% confident that the true average alcohol content by volume of Budweiser beer lies between 4.91% and 5.01%, a range that includes the 5% stated by Anheuser-Busch and excludes the 4.7% alleged in the lawsuit.
A Type I error is committed if we reject the null hypothesis when it is in fact true. The probability of making a Type I error when the null hypothesis is true is \(\alpha = 0.05\). In this problem, a Type I error would mean concluding that the average alcohol content of Budweiser beer differs from 4.7% when it really is 4.7%.
A Type I error can occur only when the null hypothesis is rejected. The one sided test, with a p-value of 0.9999, does not come close to the p-value under 0.05 needed to reject, so that decision cannot be a Type I error. The two sided test does reject (p = 0.0001437), so it is the only decision here that could be a Type I error.
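The meaning of \(\alpha = 0.05\) as a Type I error rate can be illustrated with a small simulation. This is only a sketch: the data are simulated so that the null hypothesis is true, the sample size of 5 matches the problem, and the standard deviation of 0.04, the seed, and the number of simulations are assumptions made for illustration.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible

alpha <- 0.05
n_sims <- 2000

# Draw samples for which H0 (mu = 4.7) is true and test each one.
# The sample size of 5 matches the problem; sd = 0.04 is an assumption.
p_values <- replicate(n_sims, {
  x <- rnorm(5, mean = 4.7, sd = 0.04)
  t.test(x, mu = 4.7)$p.value
})

# Share of simulations that wrongly reject H0 (Type I errors)
type1_rate <- mean(p_values < alpha)
type1_rate
```

The share of wrongful rejections should land close to 0.05, which is what choosing \(\alpha = 0.05\) commits us to.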