GPA and major. Undergraduate students taking an introductory statistics course at Duke University conducted a survey about GPA and major. The side-by-side box plots show the distribution of GPA among three groups of majors. Also provided is the ANOVA output.
[Figure: side-by-side box plots of GPA by major group: Arts and Humanities, Natural Sciences, Social Sciences]
\(H_0: \mu_1=\mu_2=\mu_3\)
\(H_A:\) At least one mean differs.
We fail to reject the null; there is no convincing evidence of a difference in mean GPA across the three groups of majors.
Sample size: df1 + df2 + 1 = 198
Open source textbook. A professor using an open source introductory statistics book predicts that 60% of the students will purchase a hard copy of the book, 25% will print it out from the web, and 15% will read it online. At the end of the semester he asks his students to complete a survey where they indicate what format of the book they used. Of the 126 students, 71 said they bought a hard copy of the book, 30 said they printed it out from the web, and 25 said they read it online.
\(H_0:p_1=0.60, p_2=0.25, p_3=0.15\)
\(H_A:\) At least one proportion differs.
Expected counts are found by computing np for each category:
Hard copy = 126 * 0.60 = 75.6
Printed copy = 126 * 0.25 = 31.5
Online = 126 * 0.15 = 18.9
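The expected counts above can be reproduced in one step. This is a minimal sketch; the category names `hard_copy`, `printed`, and `online` are labels chosen here for illustration, not names from the exercise.

```r
# Expected counts under H0: sample size times each hypothesized proportion
n <- 126
p0 <- c(hard_copy = 0.60, printed = 0.25, online = 0.15)  # illustrative labels

expected <- n * p0
expected

# Sample-size condition: every expected count should be at least 5
all(expected >= 5)
```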
Independence: We have no reason to believe that one student’s choice of format will influence another’s.
Sample Size: All expected values are greater than 5.
Degrees of freedom: must be greater than 1. df = number of categories - 1; with 3 categories, df = 2.
Decision rule: reject the null if the p-value is less than 5%.
Test statistic and p-value
x=c(71,30,25)
p=c(.6,.25,.15)
chisq.test(x,p=p)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 2.3201, df = 2, p-value = 0.3135
Decision: Fail to reject the null, because the p-value (0.3135) is greater than 5%.
The data do not provide convincing evidence that the distribution of book formats differs from the one the professor predicted.
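As a cross-check, the statistic and p-value reported by chisq.test can be rebuilt by hand from the observed and expected counts. This is a sketch of the standard goodness-of-fit calculation; the variable names are chosen here for illustration.

```r
# Goodness-of-fit statistic by hand: sum of (observed - expected)^2 / expected
observed <- c(71, 30, 25)
p0 <- c(0.60, 0.25, 0.15)
expected <- sum(observed) * p0

chi_sq <- sum((observed - expected)^2 / expected)
df <- length(observed) - 1

# Upper-tail area of the chi-square distribution with df degrees of freedom
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(c(statistic = chi_sq, p_value = p_value), 4)
```

The rounded values should agree with the X-squared = 2.3201 and p-value = 0.3135 in the output above.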
Chicken diet and weight, Part III. In Exercises 5.29 and 5.31 we compared the effects of two types of feed at a time. A better analysis would first consider all feed types at once: casein, horsebean, linseed, meat meal, soybean, and sunflower. The ANOVA output below can be used to test for differences between the average weights of chicks on different diets.
Conduct a hypothesis test to determine if these data provide convincing evidence that the average weight of chicks varies across some (or all) groups. Make sure to check relevant conditions. Figures and summary statistics are shown below.
The IQRs look like they vary too much across the groups for the constant-variance condition of ANOVA to hold.
\(H_0: \mu_1=\mu_2=\mu_3=\mu_4=\mu_5=\mu_6\)
\(H_A:\) At least one mean differs.
Since the p-value is approximately 0, we reject the null and conclude that the average weight of chicks differs across at least some of the feed types. However, the variability does not look equal across the groups, so the constant-variance condition is questionable.
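The ANOVA and the variability concern can both be checked in R. This sketch assumes the exercise's data match the built-in `chickwts` data set from the datasets package, which has the same six feed types; the source does not confirm they are the same data.

```r
# ASSUMPTION: the exercise data correspond to R's built-in chickwts data set
# (columns: weight, feed)
fit <- aov(weight ~ feed, data = chickwts)
summary(fit)

# Rough check of the constant-variance condition:
# compare the group standard deviations
sds <- tapply(chickwts$weight, chickwts$feed, sd)
round(sds, 1)

# Ratio of largest to smallest group SD
max(sds) / min(sds)
```

A large ratio between the biggest and smallest standard deviation would support the worry that the groups are not equally variable.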
5.42 Child care hours, Part II. Exercise 5.14 introduces the China Health and Nutrition Survey which, among other things, collects information on number of hours Chinese parents spend taking care of their children under age 6. The side by side box plots below show the distribution of this variable by educational attainment of the parent. Also provided below is the ANOVA output for comparing average hours across educational attainment categories.
\(H_0: \mu_1=\mu_2=\mu_3=\mu_4=\mu_5\)
\(H_A:\) At least one mean differs.
Although the technical school category has a larger interquartile range than the other categories, the p-value is 0.28 (28%), so we fail to reject the null. The data do not provide convincing evidence that the average number of hours spent on child care differs across educational attainment levels.
Browsing on the mobile device. A 2012 survey of 2,254 American adults indicates that 17% of cell phone owners do their browsing on their phone rather than a computer or other device.
\(H_0: p = 0.38\)
\(H_A: p \neq 0.38\)
Here 0.38 is the Chinese proportion that the American sample proportion of 0.17 is compared against (the two differ by 0.21).
Decision rule: reject the null if the p-value is less than alpha = 5%.
Z=(.17-.38)/sqrt(.38*.62/2254)
2*pnorm(-abs(Z),0,1)
Z score is about -20.5
P value is effectively 0 (on the order of 1e-93)
Because the p-value is so minuscule, we reject the null. The proportion of Americans who browse the internet only on their cell phones (17%) is far different from the Chinese proportion.
U=.17+1.96*sqrt(.17*.83/2254)
L=.17-1.96*sqrt(.17*.83/2254)
c(L,U)
## [1] 0.1544925 0.1855075
The Chinese proportion of 0.38 (0.21 above the sample proportion) does not fall anywhere near the CI, which we would expect from such an extreme Z score.
College smokers. We are interested in estimating the proportion of students at a university who smoke. Out of a random sample of 200 students from this university, 40 students smoke.
U=.2+1.96*(sqrt(.2*.8/200))
L=.2-1.96*(sqrt(.2*.8/200))
c(L,U)
## [1] 0.1445628 0.2554372
The sample proportion phat = 40/200 = 0.2 is the right point estimate here. For a proportion no separate mean or standard deviation is needed, because the standard error comes directly from phat: sqrt(phat*(1-phat)/n).
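The interval calculation can be wrapped in a small helper so the critical value comes from qnorm instead of being typed by hand. This is a sketch: `prop_ci` is a hypothetical function name, and the 95% level is an assumed default.

```r
# Normal-approximation confidence interval for a single proportion.
# prop_ci is a hypothetical helper written for this example.
prop_ci <- function(x, n, conf = 0.95) {
  p_hat <- x / n                        # sample proportion
  z_star <- qnorm(1 - (1 - conf) / 2)   # about 1.96 when conf = 0.95
  se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

# 40 smokers out of 200 sampled students
ci <- prop_ci(40, 200)
ci
```

For the smoking data this gives an interval of roughly 0.145 to 0.255.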
True or false, Part I. Determine if the statements below are true or false. For each false statement, suggest an alternative wording to make it a true statement.
The chi-square distribution, just like the normal distribution, has two parameters, mean and standard deviation.
False, there is only one parameter, degrees of freedom.
The chi-square distribution is always right skewed, regardless of the value of the degrees of freedom parameter.
True.
The chi-square statistic is always positive.
True, because each difference is squared and then divided by a positive expected count.
As the degrees of freedom increases, the shape of the chi-square distribution becomes more skewed.
False, it becomes less skewed and more normal as the degrees of freedom increases.
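The last answer can be checked numerically. The skewness of a chi-square distribution with k degrees of freedom is sqrt(8/k), so it is always positive but shrinks as k grows. The df values below are arbitrary choices for illustration.

```r
# Skewness of the chi-square distribution is sqrt(8 / df):
# always positive (right skewed), but decreasing in df
df <- c(1, 2, 5, 10, 30, 100)   # arbitrary example values
skewness <- sqrt(8 / df)

round(data.frame(df, skewness), 3)
```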
Evolution vs. creationism. A Gallup Poll released in December 2010 asked 1019 adults living in the Continental U.S. about their belief in the origin of humans. These results, along with results from a more comprehensive poll from 2001 (that we will assume to be exactly accurate), are summarized in the table below:
Humans evolved, with God guiding: 1019 * 0.38 ≈ 387
Humans evolved, but God had no part in process: 1019 * 0.16 ≈ 163
God created humans in present form: 1019 * 0.40 ≈ 408
Other / No opinion: 1019 * 0.06 ≈ 61
\(H_0:p_1=0.37, p_2=0.12, p_3=0.45, p_4=.06\)
\(H_A:\) At least one proportion differs.
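Expected counts under the null (n times each 2001 proportion, in the same order as above):
1019 * 0.37 ≈ 377
1019 * 0.12 ≈ 122
1019 * 0.45 ≈ 459
1019 * 0.06 ≈ 61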
Independence: We have no reason to believe that one person’s response will influence another’s.
Sample Size: All expected values are greater than 5. We don’t have an n for 2001, but that poll is assumed to be exactly accurate, so its proportions are treated as the fixed null values.
Degrees of freedom: must be greater than 1. df = number of categories - 1; with 4 categories, df = 3.
Decision rule: reject the null if the p-value is less than 5%.
Test statistic and p-value
x=c(387,163,408,61)
p=c(.37,.12,.45,.06)
chisq.test(x,p=p)
##
## Chi-squared test for given probabilities
##
## data: x
## X-squared = 19.397, df = 3, p-value = 0.0002263
The p-value is less than 5%, so we reject the null. The data provide convincing evidence that people’s beliefs about the origin of humans have changed between 2001 and 2010.
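To see which categories drive the result, the statistic can be split into per-category contributions. This is a sketch; the short labels are chosen here for illustration.

```r
# Per-category contributions to the chi-square statistic
x <- c(387, 163, 408, 61)
p <- c(0.37, 0.12, 0.45, 0.06)
res <- chisq.test(x, p = p)

contrib <- (res$observed - res$expected)^2 / res$expected
names(contrib) <- c("guided", "no_god", "created", "other")  # illustrative labels

round(contrib, 3)
sum(contrib)   # should match the X-squared value reported above
```

The second category (evolved, but God had no part) contributes the most, about 13.6 of the total 19.4, followed by the third category at about 5.6.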