Fitting probability models to frequency data

M. Drew LaMar
February 26, 2016


https://xkcd.com/882/

Class Announcements

  • Exam #1 will be graded by Monday
  • New reading assignment policy (WILL ANNOUNCE ON MONDAY)

Errors in Hypothesis Testing - Revisited

Definition: A Type I error is rejecting a true null hypothesis. The probability of a Type I error is given by \[ \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ true}] = \alpha \]

Definition: A Type II error is failing to reject a false null hypothesis. The probability of a Type II error is given by \[ \mathrm{Pr[Do \ not \ reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] = \beta \]

Errors in Hypothesis Testing - Power

Definition: The power of a statistical test (denoted \( 1-\beta \)) is given by \[ \begin{align*} \mathrm{Pr[Reject} \ H_{0} \ | \ H_{0} \ \mathrm{is \ false}] & = 1-\beta \\ & = 1 - \mathrm{Pr[Type \ II \ error]} \end{align*} \]

Probability of errors in hypothesis testing

  • \( \alpha \) is the significance level
  • \( 1-\beta \) is the power
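To make \( \alpha \), \( \beta \), and power concrete, here is a minimal sketch that computes them exactly for a two-sided binomial test. The sample size, the null proportion, and especially the "true" proportion `p_true` are illustrative assumptions, not values taken from the slides.

```r
# Exact Type I error rate and power of a two-sided binomial test.
# All numeric settings below are illustrative assumptions.
n      <- 18    # sample size (assumed)
p0     <- 0.5   # proportion under H0 (assumed)
p_true <- 0.8   # proportion if H0 is false (hypothetical)
alpha  <- 0.05  # significance level

x <- 0:n
# Two-sided p-value for every possible outcome 0, 1, ..., n
pvals  <- sapply(x, function(k) binom.test(k, n, p = p0)$p.value)
reject <- pvals <= alpha                    # outcomes that lead to rejecting H0

type1 <- sum(dbinom(x[reject], n, p0))      # Pr[Reject H0 | H0 is true]
power <- sum(dbinom(x[reject], n, p_true))  # Pr[Reject H0 | H0 is false]
beta  <- 1 - power                          # Pr[Type II error]

round(c(type1 = type1, power = power, beta = beta), 3)
```

Because the binomial distribution is discrete, the achieved Type I error rate in this sketch is at most \( \alpha \) rather than exactly equal to it.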

Statistical power example
https://qubeshub.org/tools/statpowerviz/

Power analysis

Power of a statistical test is a function of
     - Significance level \( \alpha \)
     - Variability of data
     - Sample size
     - Effect size

  • Desired power is set by researcher (typically 80%)
  • Significance level set by researcher
  • Data variability and effect size can be estimated by previous studies or pilot studies
  • Sample size is then calculated to achieve the desired power, given the significance level, data variability, and effect size fixed above
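The steps above can be sketched for a binomial test, where the variability of the data is determined by the proportion and the sample size. This is only one possible approach: the helper `binom_power` is a hypothetical name, and the null proportion, the assumed true proportion (standing in for the effect size), and the search limit of 500 are all assumptions made for illustration.

```r
# Smallest sample size at which a two-sided binomial test reaches the
# desired power. All settings are illustrative assumptions.
p0      <- 0.5   # proportion under H0 (assumed)
p_true  <- 0.7   # effect size, expressed as an assumed true proportion
alpha   <- 0.05  # significance level set by the researcher
desired <- 0.80  # desired power set by the researcher

# Exact power of the test for a given n (hypothetical helper)
binom_power <- function(n, p_true, p0, alpha) {
  x <- 0:n
  pvals <- sapply(x, function(k) binom.test(k, n, p = p0)$p.value)
  sum(dbinom(x[pvals <= alpha], n, p_true))
}

# Increase n until the desired power is first reached (capped at 500)
n <- 1
while (binom_power(n, p_true, p0, alpha) < desired && n < 500) {
  n <- n + 1
}
n_needed <- n
achieved <- binom_power(n_needed, p_true, p0, alpha)
c(n = n_needed, power = round(achieved, 3))
```

The power of an exact test does not increase smoothly with sample size, so this search reports the first sample size that reaches the target; slightly larger samples can occasionally dip back below it.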

Chapter 8: Fitting probability models to frequency data

From proportions and binomial distributions…

…to working with direct frequency distributions.

         Right Left
Observed    14    4
Expected     9    9
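A table like this can be rebuilt and plotted in a few lines of R. The sketch below assumes the expected counts come from an equal split between the two outcomes, which matches the 9 and 9 shown; the name `FreqTable` and the plot settings are illustrative choices rather than the code behind the original figure.

```r
# Rebuild the Right/Left table and plot observed next to expected counts
Obs <- c(Right = 14, Left = 4)
Exp <- rep(sum(Obs) / 2, 2)  # equal split under the assumed null model: 9 and 9
FreqTable <- rbind(Observed = Obs, Expected = Exp)
FreqTable

# One group of bars per outcome, with Observed beside Expected
barplot(FreqTable, beside = TRUE, legend.text = TRUE, ylab = "Frequency")
```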

Chi-squared goodness-of-fit test

Note: The binomial test is an example of a goodness-of-fit test.

Definition: A goodness-of-fit test is a method for comparing an observed frequency distribution with the frequency distribution that would be expected under a simple probability model governing the occurrence of different outcomes.

Definition: A model in this case is a simplified, mathematical representation that mimics how we think a natural process works.

Working through an example

Assignment Problem #21

A more recent study of Feline High-Rise Syndrome (FHRS) included data on the month in which each of 119 cats fell (Vnuk et al. 2004). The data are in the accompanying table. Can we infer that the rate of cats falling varies between months of the year?

Month      Number fallen    Month      Number fallen
January     4               July       19
February    6               August     13
March       8               September  12
April      10               October    12
May         9               November    7
June       14               December    5

Example - Assignment Problem #21

Null and alternative hypotheses

Question: What are the null and alternative hypotheses?

Answer:
     \( H_{0} \): The frequency of cats falling is the same in each month.
     \( H_{A} \): The frequency of cats falling is not the same in each month.
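The same hypotheses can be written in symbols. The notation \( p_{i} \), introduced here for the probability that a fall occurs in month \( i \), is not from the slides, and the statement assumes every month is treated as equally likely regardless of its length:

\[ H_{0}: p_{1} = p_{2} = \cdots = p_{12} = \tfrac{1}{12} \qquad H_{A}: p_{i} \neq \tfrac{1}{12} \ \text{for at least one } i \]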

Example - Assignment Problem #21

Observed and Expected Frequencies

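# Month labels and observed number of falls in each month
rows <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
Obs <- c(4, 6, 8, 10, 9, 14, 19, 13, 12, 12, 7, 5)
# Expected counts under H0: the 119 falls spread evenly over 12 months
Exp <- rep(sum(Obs)/12, 12)
FHRSTable <- matrix(c(Obs, Exp), ncol = 2, dimnames = list(rows, c("Obs","Exp")))
# Append a row of column totals
addmargins(FHRSTable, margin = 1)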

    Obs        Exp
Jan   4   9.916667
Feb   6   9.916667
Mar   8   9.916667
Apr  10   9.916667
May   9   9.916667
Jun  14   9.916667
Jul  19   9.916667
Aug  13   9.916667
Sep  12   9.916667
Oct  12   9.916667
Nov   7   9.916667
Dec   5   9.916667
Sum 119 119.000000

Example - Assignment Problem #21

# Two groups of bars (Obs, Exp), each with one bar per month
barplot(FHRSTable, beside=TRUE)

# Twelve groups of bars (one per month), each with an Obs bar and an Exp bar
barplot(t(FHRSTable), beside=TRUE)

Example - Assignment Problem #21

\( \chi^2 \) test statistic

Definition: The \( \chi^2 \) statistic measures the discrepancy between observed frequencies from the data and expected frequencies from the null hypothesis and is given by

\[ \chi^2 = \sum_{i}\frac{(\mathrm{Observed}_{i} - \mathrm{Expected}_{i})^2}{\mathrm{Expected}_{i}} \]

Discuss: What would support the null hypothesis more: a small value or large value for \( \chi^{2} \)?

Answer: Small value for \( \chi^{2} \), since it means the observed frequencies lie close to those expected under the null hypothesis.
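As a worked illustration of the formula, the statistic for the cat-fall data can be computed term by term. This sketch reuses the observed counts from the table and the equal expected counts from the earlier slide; `contrib`, `chi2`, and `chi2_builtin` are names chosen here, and the cross-check relies on `chisq.test` assuming equal proportions when only a vector of counts is supplied.

```r
# Chi-squared statistic for the monthly cat-fall counts, computed by hand
Obs <- c(4, 6, 8, 10, 9, 14, 19, 13, 12, 12, 7, 5)
Exp <- rep(sum(Obs) / 12, 12)   # equal expected count in every month

contrib <- (Obs - Exp)^2 / Exp  # contribution of each month to the sum
chi2 <- sum(contrib)
chi2                            # about 20.66

# Cross-check against base R, which uses equal proportions by default
chi2_builtin <- unname(chisq.test(Obs)$statistic)
all.equal(chi2, chi2_builtin)
```

Inspecting `contrib` shows which months drive the discrepancy; here July, with 19 falls against roughly 9.9 expected, contributes the most.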