Lab 2 Assignment MA 2612

Problem 1. The hair color and eye color of n = 592 undergraduate statistics students were collected from a random sample (from a 1974 study). The observed counts were recorded in a contingency table as follows, with rows representing hair color and columns representing eye color. These data can also be found as a built-in data set in R called “HairEyeColor”.

  1. Without using R, calculate the expected counts for the students’ hair color and eye color. Include your calculations and then check your answer using R. What are your preliminary observations when comparing the expected counts to the observed counts?
Expected counts for students’ hair color and eye color

The differences between the expected and observed counts of students’ hair and eye colors vary considerably from cell to cell. For example, the expected count of students with Blond hair and Brown eyes was about 47, but the observed count was only 7. This is a relatively large difference. It leads me to believe that, based on this sample, eye color and hair color are not independent of each other.
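As a check on the hand calculation, here is a minimal sketch of the same arithmetic in R, assuming the counts are taken from the built-in `HairEyeColor` table collapsed over sex. Each expected count under independence is (row total × column total) / grand total. The names `observed`, `expected`, and `n` are my own and are used only for illustration.

```r
# Collapse the built-in 3-way table over Sex to get the 4 x 4 Hair x Eye counts
observed <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected count for each cell = (row total * column total) / grand total
n <- sum(observed)
expected <- outer(rowSums(observed), colSums(observed)) / n

# Largest absolute gap between the hand formula and what chisq.test() computes
max(abs(expected - chisq.test(observed)$expected))

# Full table of expected counts, rounded for reading
round(expected, 2)

# The cell discussed above: Blond hair with Brown eyes
observed["Blond", "Brown"]
expected["Blond", "Brown"]
```

Printing `round(observed - expected, 2)` as well would show, cell by cell, where the sample departs most from what independence predicts.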

  1. Conduct and interpret, against a significance level of α = 0.01, a χ2 test of independence for hair against eye color for all students, regardless of biological sex. How does this result compare to your answer in part (a)?

Null Hypothesis: Hair and Eye Color of Students are Independent.

Alternative Hypothesis: Hair and Eye Color of Students are Not Independent.

α = 0.01

Conditions:

- Independence Assumption:
  The data were stated to be randomly collected, and 592 students is less than 10% of all undergraduate statistics students.
- Counted Data Condition:
  We have counts of undergraduate statistics students in categories of Hair Color and Eye Color.
- Expected Cell Frequency Condition:
  We expect at least 5 students in every cell; the smallest expected count is about 7.68 (Red hair, Green eyes).
- df = (r-1)(c-1) = (4-1)(4-1) = 9

OK to use a Chi-Squared model with 9 degrees of freedom and the Chi-Squared Test of Independence.
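The expected cell frequency condition and the degrees of freedom can also be checked directly in R. This is only a sketch under the assumption that the test is run on the 4 × 4 hair-by-eye table collapsed over sex; `min_expected` and `df` are illustrative names of my own.

```r
# 4 x 4 table of Hair x Eye counts, summed over Sex
observed <- margin.table(HairEyeColor, margin = c(1, 2))

# Expected counts under independence, as computed by chisq.test()
expected <- chisq.test(observed)$expected

# Expected Cell Frequency Condition: every expected count should be at least 5
min_expected <- min(expected)
min_expected
all(expected >= 5)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
df
```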

# Collapses the 3-way HairEyeColor table (Hair x Eye x Sex) into a 2-way Hair x Eye table, ignoring sex (used for the Chi-Squared Test)
hairEyeMatrix<-margin.table(HairEyeColor, margin = c(1,2))

# conducts the chi-squared test for independence
chisq.test(x=hairEyeMatrix)
## 
##  Pearson's Chi-squared test
## 
## data:  hairEyeMatrix
## X-squared = 138.29, df = 9, p-value < 2.2e-16

Since the p-value is less than 2.2e-16, which is below the significance level of 0.01, we reject the null hypothesis. There is strong statistical evidence that undergraduate statistics students’ hair and eye colors are not independent. This supports the preliminary observation made in part (a): the large differences between the expected and observed counts are what lead to such a large Chi-Squared statistic and such a small p-value.
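To make the link between the count differences and the test statistic concrete, the statistic can be rebuilt by hand as the sum of (observed − expected)² / expected over all 16 cells. The sketch below assumes the same collapsed table; `contrib`, `chi_sq`, and `p_value` are illustrative names of my own.

```r
# Observed 4 x 4 table and its expected counts under independence
observed <- margin.table(HairEyeColor, margin = c(1, 2))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Contribution of each cell to the Chi-Squared statistic
contrib <- (observed - expected)^2 / expected
round(contrib, 2)

# Test statistic, degrees of freedom, and upper-tail p-value
chi_sq <- sum(contrib)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

chi_sq
p_value
```

The cells with the largest entries in `contrib` are the ones where the observed and expected counts disagree the most.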

Problem 2. We’ve all heard that “7” is considered to be the lucky number. To find out if this is true, a randomly selected group of undergraduate students were asked to choose a number between 1 and 20. In that sample, n = 568 students were asked and x = 42 students chose 7.

  1. Against a significance level of α = 0.05, do the students show a bias towards lucky number 7? In other words, is the proportion of students who picked “7” greater than 0.05?

Null Hypothesis: p = 0.05

Alternative Hypothesis: p > 0.05

Where p is the true proportion of undergraduate students who, when asked to choose a number between 1 and 20, will choose 7.

α = 0.05

Conditions:

 - Independence Assumption: The sample was stated to be randomly selected, and 568 undergraduate students is less than 10% of all undergraduate students. Thus we can assume that the students’ choices are independent of each other.
 - Sample Size Assumption:
 
 p-hat = 42/568 = 0.0739 (sample proportion); p-naught = 0.05 (null value)
 
 np-naught = (568)(0.05) = 28.4 > 10 expected successes
 
 n(q-naught) = (568)(1 - 0.05) = 539.6 > 10 expected failures
 
 Under the null hypothesis we expect at least 10 successes and 10 failures (where a success is a student choosing 7 and a failure is not choosing 7)

OK to use Normal Model and One-Proportion z-test

prop.test(x = 42, n = 568, p = 0.05, alternative = "greater", correct = FALSE)
## 
##  1-sample proportions test without continuity correction
## 
## data:  42 out of 568, null probability 0.05
## X-squared = 6.8554, df = 1, p-value = 0.004419
## alternative hypothesis: true p is greater than 0.05
## 95 percent confidence interval:
##  0.05783328 1.00000000
## sample estimates:
##          p 
## 0.07394366

Since the p-value of 0.004419 is less than the significance level of 0.05, we reject the null hypothesis. There is strong enough statistical evidence to suggest that the true proportion of students who pick “7” is greater than 0.05. This suggests that students show a bias towards lucky number 7.
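The one-proportion z-test can also be sketched by hand. Without the continuity correction, the X-squared value reported by `prop.test()` is the square of the z statistic, and the one-sided p-value is the upper tail of the standard Normal. The variable names below (`p_hat`, `se0`, `z`, `p_value`) are my own, and the standard error uses the null value p0 = 0.05.

```r
x  <- 42     # students who chose 7
n  <- 568    # students asked
p0 <- 0.05   # null proportion: 1 of 20 equally likely numbers

# Success/failure check under the null value
c(expected_successes = n * p0, expected_failures = n * (1 - p0))

# Sample proportion and standard error under the null hypothesis
p_hat <- x / n
se0   <- sqrt(p0 * (1 - p0) / n)

# z statistic and one-sided (upper-tail) p-value
z       <- (p_hat - p0) / se0
p_value <- pnorm(z, lower.tail = FALSE)

c(z = z, z_squared = z^2, p_value = p_value)
```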

  1. Baseball players are considered to be superstitious. To determine if this is the case, a random sample of n = 182 professional and semi-professional baseball players were asked what their favorite number is between 1 and 20. In that sample, x = 25 baseball players chose 7. Against a significance level of α = 0.05, do baseball players have a greater bias towards “7” than the students?

Null Hypothesis: p1 - p2 = 0

Alternative Hypothesis: p1 - p2 > 0

Where p1 is the true proportion of professional and semi-professional baseball players who, when asked for their favorite number between 1 and 20, choose 7, and p2 is the true proportion of undergraduate students who, when asked to choose a number between 1 and 20, choose 7.

α = 0.05

Conditions:

 - Independence Assumption: Both samples are stated to be randomly selected. 182 baseball players is less than 10% of all professional and semi-professional baseball players, and 568 undergraduate students is less than 10% of all undergraduate students. Thus we can assume that each baseball player and student is independent of the other baseball players and students.
 - Independent Groups Assumption: We are going to assume that the bias baseball players may have towards choosing 7 is independent of the bias that students may have.
 - Sample Size Assumption:
 
 p1-hat = 25/182 = 0.1374
 
 p2-hat = 42/568 = 0.0739
 
 n1(p1-hat) = 25 > 10 successes
 
 n1(q1-hat) = 182 - 25 = 157 > 10 failures
 
 n2(p2-hat) = 42 > 10 successes
 
 n2(q2-hat) = 568 - 42 = 526 > 10 failures
 
 Each group has at least 10 successes and 10 failures
 

OK to use Normal Model and Two-Proportion z-Test

prop.test(x = c(25, 42), n = c(182, 568), alternative = "greater", correct = FALSE)
## 
##  2-sample test for equality of proportions without continuity correction
## 
## data:  c(25, 42) out of c(182, 568)
## X-squared = 6.8143, df = 1, p-value = 0.004521
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.01772806 1.00000000
## sample estimates:
##     prop 1     prop 2 
## 0.13736264 0.07394366

Since the p-value of 0.004521 is lower than the significance level of 0.05, we reject the null hypothesis. There is strong enough statistical evidence to say that the proportion of professional and semi-professional baseball players who would choose the number 7 from numbers 1-20 is likely higher than the proportion of undergraduate students who would do the same thing. This could indicate that baseball players are more biased towards choosing “7” than the students.
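As with the one-sample case, the two-proportion z-test can be sketched by hand. Under the null hypothesis the two groups share one proportion, so the standard error uses the pooled proportion of all respondents who chose 7. The names `p_pool`, `se_pool`, `z`, and `p_value` are my own and are used only for illustration.

```r
x <- c(25, 42)     # number who chose 7: baseball players, students
n <- c(182, 568)   # sample sizes: baseball players, students

# Sample proportions and pooled proportion under H0: p1 - p2 = 0
p_hat  <- x / n
p_pool <- sum(x) / sum(n)

# Pooled standard error of the difference in sample proportions
se_pool <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))

# z statistic and one-sided (upper-tail) p-value
z       <- (p_hat[1] - p_hat[2]) / se_pool
p_value <- pnorm(z, lower.tail = FALSE)

c(difference = p_hat[1] - p_hat[2], z = z, z_squared = z^2, p_value = p_value)
```

Squaring `z` should reproduce the X-squared value printed by `prop.test()` when `correct = FALSE`.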

  1. Based on the results in parts (a) and (b), what conclusions can you draw about the number “7” being considered “lucky” and the superstition of baseball players?

Based on the results from parts (a) and (b), there is statistical evidence that both undergraduate students and baseball players favor the number “7”. The one-proportion z-test in part (a) showed that students selected “7” significantly more often than the 5% expected if all twenty numbers were equally likely, indicating a bias toward this number. The two-proportion z-test in part (b) showed that baseball players were significantly more likely than students to choose “7”, with an estimated difference of about 6.3 percentage points between the two groups. Together, these findings are consistent with 7 being regarded as a “lucky” number and with this preference being stronger among baseball players. The tests show only that “7” is chosen more often, not why, so they support but do not prove the idea that baseball players are more superstitious than the average student.