Lab 2 Assignment MA 2612
Problem 1. The hair color and eye color of n = 592 undergraduate statistics students were randomly collected (from a 1974 study). The observed counts were recorded in a contingency table, with rows representing hair color and columns representing eye color. The data are also available in R as the built-in data set “HairEyeColor” (which additionally splits the counts by sex).
The differences between the expected and observed counts of students’ hair and eye colors seem to vary a lot. For example, the expected count of students with blond hair and brown eyes was about 47, but the observed count was only 7. This is a relatively large difference, and it suggests that, based on this sample, hair color and eye color are not independent of each other.
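Each expected count comes from the row and column totals under the assumption of independence. As a worked example, assuming the margins of R's HairEyeColor data summed over sex (127 students with blond hair, 220 students with brown eyes, n = 592), the blond-hair/brown-eye expected count is:

```latex
E_{ij} = \frac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{n},
\qquad
E_{\text{Blond, Brown}} = \frac{127 \times 220}{592} = \frac{27940}{592} \approx 47.2
```

Compared with the observed count of 7, this cell alone is off by about 40 students.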
Null Hypothesis: Hair and Eye Color of Students are Independent.
Alternative Hypothesis: Hair and Eye Color of Students are Not Independent.
α = 0.01
Conditions:
- Independence Assumption:
The data were stated to be randomly collected, and 592 students is plausibly no more than 10% of all undergraduate statistics students.
- Counted Data Condition:
We have counts of undergraduate statistics students in categories of Hair Color and Eye Color.
- Expected Cell Frequency Condition:
We expect at least 5 students in every cell: the smallest expected count (Red hair, Green eyes) is 71 × 64 / 592 ≈ 7.7.
- df = (r - 1)(c - 1) = (4 - 1)(4 - 1) = 9
OK to use the Chi-Squared model with 9 degrees of freedom and the Chi-Squared Test of Independence.
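The Expected Cell Frequency Condition and the degrees of freedom can also be checked directly in R. This is a minimal sketch assuming the built-in HairEyeColor table; the names `expected`, `min_expected`, `all_at_least_5`, and `deg_free` are illustrative and not part of the original lab.

```r
# Sketch: check the Expected Cell Frequency Condition and the degrees of freedom.
# HairEyeColor is a built-in 4 x 4 x 2 table (Hair x Eye x Sex); sum over Sex.
hairEyeMatrix <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected counts under independence: row total * column total / n
expected <- outer(rowSums(hairEyeMatrix), colSums(hairEyeMatrix)) / sum(hairEyeMatrix)

# Smallest expected count, and whether every cell reaches 5
min_expected <- min(expected)
all_at_least_5 <- all(expected >= 5)

# Degrees of freedom (r - 1)(c - 1)
deg_free <- (nrow(hairEyeMatrix) - 1) * (ncol(hairEyeMatrix) - 1)

print(round(expected, 2))
cat("Smallest expected count:", round(min_expected, 2), "\n")
cat("All expected counts >= 5:", all_at_least_5, "\n")
cat("Degrees of freedom:", deg_free, "\n")
```

`chisq.test(hairEyeMatrix)$expected` returns the same matrix; the explicit `outer()` form is used here only to make the row total × column total / n formula visible.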
# Collapses the 3-way HairEyeColor table (Hair x Eye x Sex) into a 4 x 4
# Hair x Eye contingency table by summing over Sex
hairEyeMatrix <- margin.table(HairEyeColor, margin = c(1, 2))
# Conducts Pearson's chi-squared test of independence
chisq.test(x = hairEyeMatrix)
##
## Pearson's Chi-squared test
##
## data: hairEyeMatrix
## X-squared = 138.29, df = 9, p-value < 2.2e-16
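To see where X-squared = 138.29 comes from, the statistic can be recomputed from its definition: the sum over all 16 cells of (observed − expected)² / expected. This sketch assumes the same `hairEyeMatrix` construction used above; `hair_eye_test`, `x2`, `deg_free`, `p_value`, and `pearson_resid` are illustrative names.

```r
# Sketch: reproduce Pearson's chi-squared statistic by hand for the Hair x Eye table.
hairEyeMatrix <- margin.table(HairEyeColor, margin = c(1, 2))

hair_eye_test <- chisq.test(x = hairEyeMatrix)
observed <- hair_eye_test$observed
expected <- hair_eye_test$expected

# X^2 = sum over all cells of (Observed - Expected)^2 / Expected
x2 <- sum((observed - expected)^2 / expected)
deg_free <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution with 9 df
p_value <- pchisq(x2, df = deg_free, lower.tail = FALSE)

# Signed Pearson residuals (O - E) / sqrt(E): which cells drive the statistic
pearson_resid <- (observed - expected) / sqrt(expected)

cat("X-squared:", round(x2, 2), " df:", deg_free, " p-value:", signif(p_value, 3), "\n")
print(round(pearson_resid, 2))
```

The residual for the blond-hair/brown-eye cell from part (a) is strongly negative (about −5.9), one of the largest contributions to the statistic.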
Since the p-value (less than 2.2e-16) is below the significance level of 0.01, we reject the null hypothesis. There is strong statistical evidence that undergraduate statistics students’ hair color and eye color are not independent. This supports the preliminary observation made in part (a): the large differences between the expected and observed counts are what produce such a large Chi-Squared statistic (138.29) and such a small p-value.
Problem 2. We’ve all heard that “7” is considered to be the lucky number. To find out if this is true, a randomly selected group of undergraduate students was asked to choose a number between 1 and 20. Of the n = 568 students asked, x = 42 chose 7.
Null Hypothesis: p = 0.05
Alternative Hypothesis: p > 0.05
Where p is the true proportion of undergraduate students who, when asked to choose a number between 1 and 20, will choose 7.
α = 0.05
Conditions:
- Independence Assumption: The sample was stated to be randomly selected, and 568 undergraduate students is less than or equal to 10% of all undergraduate students. Thus we can assume that the students' choices are independent of each other.
- Sample Size Assumption:
p-naught = 0.05 (the null proportion; the sample proportion is p-hat = 42/568 = 0.0739)
n(p-naught) = (568)(0.05) = 28.4 > 10 successes
n(q-naught) = (568)(1 - 0.05) = 539.6 > 10 failures
Under the null hypothesis we expect at least 10 successes and 10 failures (where a success is a student choosing 7 and a failure is not choosing 7)
OK to use Normal Model and One-Proportion z-test
prop.test(x = 42, n = 568, p = 0.05, alternative = "greater", correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: 42 out of 568, null probability 0.05
## X-squared = 6.8554, df = 1, p-value = 0.004419
## alternative hypothesis: true p is greater than 0.05
## 95 percent confidence interval:
## 0.05783328 1.00000000
## sample estimates:
## p
## 0.07394366
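The z-statistic behind this output can be computed by hand. Without the continuity correction, `prop.test` reports X-squared = z², so squaring the z below should reproduce 6.8554. This is a sketch; the variable names are illustrative.

```r
# Sketch: one-proportion z-test for H0: p = 0.05 vs HA: p > 0.05, computed by hand.
x <- 42
n <- 568
p0 <- 0.05

p_hat <- x / n                          # sample proportion, about 0.0739
se0 <- sqrt(p0 * (1 - p0) / n)          # standard error assuming H0 is true
z <- (p_hat - p0) / se0                 # test statistic
p_value <- pnorm(z, lower.tail = FALSE) # upper-tail p-value for HA: p > 0.05

cat("z =", round(z, 4), " z^2 =", round(z^2, 4), " p-value =", signif(p_value, 4), "\n")
```

The standard error uses p0 rather than p-hat because the p-value is computed assuming the null hypothesis is true.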
Since the p-value of 0.004419 is less than the significance level of 0.05, we reject the null hypothesis. There is strong enough statistical evidence to suggest that the true proportion of undergraduate students who would choose “7” is greater than 0.05, the 1-in-20 rate expected if every number were equally likely. This could suggest that students show a bias towards the lucky number 7.
Null Hypothesis: p1 - p2 = 0
Alternative Hypothesis: p1 - p2 > 0
Where p1 is the true proportion of professional and semi-professional baseball players who, when asked what their favorite number between 1 and 20 is, choose 7, and p2 is the true proportion of undergraduate students who, when asked the same question, choose 7.
α = 0.05
Conditions:
- Independence Assumption: Both samples are stated to be randomly selected. 182 baseball players is less than or equal to 10% of all professional and semi-professional baseball players. 568 undergraduate students is less than or equal to 10% of all undergraduate students. Thus we can assume that each baseball player and student is independent of other baseball players and students.
- Independent Groups Assumption: We assume that the two samples are independent of each other, so that the baseball players' choices have no influence on the students' choices (and vice versa).
- Sample Size Assumption:
p1-hat = 25/182 = 0.1374
p2-hat = 42/568 = 0.0739
n1(p1-hat) = (182)(0.1374) = 25.0068 > 10 successes
n1(q1-hat) = (182)(1 - 0.1374) = 156.9932 > 10 failures
n2(p2-hat) = (568)(0.0739) = 41.9752 > 10 successes
n2(q2-hat) = (568)(1 - 0.0739) = 526.0248 > 10 failures
We observe at least 10 successes and 10 failures in each group
OK to use Normal Model and Two-Proportion z-Test
prop.test(x = c(25, 42), n = c(182, 568), alternative = "greater", correct = FALSE)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: c(25, 42) out of c(182, 568)
## X-squared = 6.8143, df = 1, p-value = 0.004521
## alternative hypothesis: greater
## 95 percent confidence interval:
## 0.01772806 1.00000000
## sample estimates:
## prop 1 prop 2
## 0.13736264 0.07394366
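The same relationship holds for the two-sample version: without the continuity correction, the reported X-squared equals the square of the pooled two-proportion z-statistic. This sketch uses the counts from part (b) (25 of 182 baseball players, 42 of 568 students); the variable names are illustrative.

```r
# Sketch: pooled two-proportion z-test for H0: p1 - p2 = 0 vs HA: p1 - p2 > 0.
x_players <- 25
n_players <- 182
x_students <- 42
n_students <- 568

p_players <- x_players / n_players      # about 0.1374
p_students <- x_students / n_students   # about 0.0739

# Pooled proportion under H0 (both groups share one p): 67 / 750
p_pooled <- (x_players + x_students) / (n_players + n_students)
se_pooled <- sqrt(p_pooled * (1 - p_pooled) * (1 / n_players + 1 / n_students))

z <- (p_players - p_students) / se_pooled
p_value <- pnorm(z, lower.tail = FALSE) # upper-tail p-value for HA: p1 - p2 > 0

cat("difference =", round(p_players - p_students, 4), " z =", round(z, 4),
    " z^2 =", round(z^2, 4), " p-value =", signif(p_value, 4), "\n")
```

Pooling is used for the test statistic because H0 says the two proportions are equal; the one-sided confidence interval that `prop.test` prints is a separate calculation.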
Since the p-value of 0.004521 is lower than the significance level of 0.05, we reject the null hypothesis. There is strong enough statistical evidence to say that the proportion of professional and semi-professional baseball players who would choose the number 7 from numbers 1-20 is likely higher than the proportion of undergraduate students who would do the same thing. This could indicate that baseball players are more biased towards choosing “7” than the students.
Based on the statistical results from parts (a) and (b), there is strong evidence that the number “7” may be perceived as a lucky number among both undergraduate students and baseball players. The one-proportion z-test from part (a) showed that students chose “7” significantly more often than the 5% (1 in 20) expected if every number were equally likely, indicating a possible bias toward this number. Furthermore, the two-proportion z-test from part (b) showed that baseball players were significantly more likely than students to choose “7”, with an estimated difference of about 6.3 percentage points (13.7% vs. 7.4%). Together, these findings suggest that the belief in 7 as a “lucky” number may influence number preferences in general and that this superstition appears even stronger among baseball players, supporting the idea that they are more superstitious than the average student.