Exercise 5.5.

  • Work through the general process for hypothesis testing given in Section 5.1 using the data in Table 5.1.

  • Create a function which takes such a table (of any number of columns) as input and produces the same output as Exercise 5.2.

  • Work out how to use chisq.test() to perform a goodness of fit test and check the answer from that with the output from your function.

  • Have we proved that parents stop having children after the first girl? Why?

General process

The following is a six-step general process for hypothesis testing and p-values:

  1. Define your hypotheses: null, \(H_0\), and alternative, \(H_1\)

  2. Specify the statistical test under \(H_0\)

  3. State and check any assumptions you are making

  4. Calculate the test statistic to be used in the test

  5. Work out the p-value corresponding to the test statistic

  6. Draw conclusions in terms of the wording of \(H_0\)

1. Define your hypotheses: \(H_0\) and \(H_1\)

\(H_0\): The Geometric (0.5) model is a good fit.

\(H_1\): The Geometric (0.5) model is not a good fit.

2. Specify the statistical test under \(H_0\)

Statistical test:

  • \(\chi^2\) goodness-of-fit test

3. State and check any assumptions you are making

  • The probability of a female birth is 0.5;
  • We will ignore multiple births.

4. Calculate the test statistic to be used in the test

First, look at the table of observed and expected counts:

##     Observed Expected
## 0        100    115.0
## 1         90     57.5
## 2         28     28.8
## 3          7     14.4
## >=4        5     14.4
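
The Expected column can be reproduced, to one decimal place, from the Geometric (0.5) probabilities. The sketch below assumes the categories count failures before the first success (the convention used by R's dgeom()) and that everything from 4 upwards is pooled into the last cell. The names observed, probs, expected and tab are illustrative and not taken from the original code. It also checks a common rule of thumb, not stated in the text above, that every expected count is at least 5.

```r
# Observed counts copied from the table above (categories 0, 1, 2, 3, >=4)
observed <- c(100, 90, 28, 7, 5)
n <- sum(observed)  # total number of observations

# Geometric(0.5) probabilities: P(X = 0..3), then the pooled tail P(X >= 4)
probs <- c(dgeom(0:3, prob = 0.5),
           pgeom(3, prob = 0.5, lower.tail = FALSE))

# Expected counts under H0
expected <- n * probs

tab <- data.frame(Observed = observed,
                  Expected = expected,
                  row.names = c("0", "1", "2", "3", ">=4"))
print(round(tab, 1))

# Rule-of-thumb check (an assumption added here): all expected counts >= 5
all(expected >= 5)
```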

The test statistic is \(\chi^2 = \sum_i (O_i - E_i)^2 / E_i\), so we need \((O_i - E_i)^2/E_i\) for each category.

  • We compute this with the function below.
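
As a hand check, the individual contributions can be computed directly from the table as printed. This is a minimal sketch using the rounded Expected values shown above; the variable names are illustrative.

```r
# Observed and (rounded) expected counts as printed in the table
observed <- c(100, 90, 28, 7, 5)
expected <- c(115.0, 57.5, 28.8, 14.4, 14.4)

# Contribution of each category: (O_i - E_i)^2 / E_i
contrib <- (observed - expected)^2 / expected
round(contrib, 4)

# The test statistic is the sum of the contributions
chi2 <- sum(contrib)
chi2
```

The category labelled 1 contributes by far the most, which is where the observed count (90) and the expected count (57.5) differ most.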

Function

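hyp_test <- function(a) {
  # a: data frame with columns Observed and Expected (one row per category)
  O <- a$Observed
  E <- a$Expected

  # Contribution of each category to the chi-squared statistic
  C <- (O - E)^2 / E

  chi2 <- sum(C)

  # Degrees of freedom: number of categories minus 1
  # (the original used length(Observed), which is not defined inside the function)
  D <- length(O) - 1

  # Upper-tail probability of the chi-squared distribution
  p_value <- 1 - pchisq(q = chi2, df = D)

  return(list("test statistics" = chi2,
              "degree of freedom" = D,
              "p-value" = p_value))
}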

5. Work out the p-value corresponding to the test statistic

hyp_test (a = T)
## $`test statistics`
## [1] 30.2872
## 
## $`degree of freedom`
## [1] 4
## 
## $`p-value`
## [1] 4.277784e-06

6. Draw conclusions in terms of the wording of \(H_0\)

How do we interpret these results? Suppose we use a 5% significance level.

qchisq(p = 0.95, df = 4)
## [1] 9.487729
  • The critical value is 9.487729.
  • Since 30.2872 > 9.487729, the test statistic lies in the rejection region.
  • There is sufficient evidence to reject \(H_0\) at the 5% significance level.
  • We conclude that the Geometric (0.5) distribution is not a good fit.

Alternatively, we can use the p-value, 4.277784e-06.

(The p-value represents the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample data, under the assumption that the null hypothesis is true.)

  • The p-value is far below 0.05, so data like these would be very unlikely if the null hypothesis were true.
  • Therefore, we have sufficient evidence to reject the null hypothesis at the 5% significance level.
  • Conclusion: the Geometric (0.5) distribution is not a good fit.
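
The critical-value and p-value approaches always give the same decision, because the p-value is below the significance level exactly when the statistic exceeds the critical value. A short sketch, using the statistic and degrees of freedom reported above (the names alpha, chi2 and dof are illustrative):

```r
alpha <- 0.05
chi2  <- 30.2872   # test statistic reported above
dof   <- 4         # degrees of freedom reported above

# Critical value and upper-tail p-value of the chi-squared distribution
critical_value <- qchisq(p = 1 - alpha, df = dof)
p_value <- pchisq(q = chi2, df = dof, lower.tail = FALSE)

# Both comparisons lead to the same decision about H0
reject_by_critical_value <- chi2 > critical_value
reject_by_p_value <- p_value < alpha
c(critical = reject_by_critical_value, p = reject_by_p_value)
```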

How to use chisq.test() to perform a goodness of fit test?

chisq.test(T$Observed, p = T$Expected / sum(T$Expected))
## 
##  Chi-squared test for given probabilities
## 
## data:  T$Observed
## X-squared = 30.3, df = 4, p-value = 4.252e-06
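
The small gap between 30.2872 from the function and 30.3 from chisq.test() appears to come from rounding: the Expected column as printed sums to 230.1 rather than 230, so rescaling it into probabilities shifts the expected counts slightly. A sketch that avoids this by passing the Geometric (0.5) probabilities directly, and then checks a hand calculation against chisq.test(), might look like this (the variable names are illustrative, and the counts are copied from the table):

```r
observed <- c(100, 90, 28, 7, 5)

# Exact Geometric(0.5) cell probabilities, with the tail pooled into ">=4"
probs <- c(dgeom(0:3, prob = 0.5),
           pgeom(3, prob = 0.5, lower.tail = FALSE))

res <- chisq.test(x = observed, p = probs)

# Hand calculation using the unrounded expected counts
expected <- sum(observed) * probs
chi2_manual <- sum((observed - expected)^2 / expected)
p_manual <- pchisq(chi2_manual, df = length(observed) - 1, lower.tail = FALSE)

# These should agree up to floating-point error
all.equal(unname(res$statistic), chi2_manual)
all.equal(res$p.value, p_manual)
```

With unrounded expected counts the statistic is a little smaller than either value above, and the conclusion is unchanged.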

Have we proved that parents stop having children after the first girl?

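  • In the previous part, we showed that the Geometric (0.5) distribution is not a good fit.

What does this mean?

  • No, we have not proved it. We rejected the Geometric (0.5) model, so the data do not support it; and even a non-significant result would only mean failing to reject \(H_0\), not proving it.
  • Family planning decisions might be influenced by factors other than the gender of the child.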