Assignment Goodness of Fit

Part 1

Book Questions

This is a modified question from Chapter 8, Q 2:

The parasitic nematode Camallanus oxycephalus infects many freshwater fish, including shad. The following table gives the number of nematodes per fish:

  1. Produce a graph of the data. What type of graph is most appropriate?

  2. Calculate the frequencies expected if nematodes infect fish “at random” (i.e., independently and with equal probability).

  3. Overlay the expected frequencies onto your graph. What are the noticeable differences?

  4. Is there evidence that nematodes do not worm their way into the fish at random? Here and always, show all four steps of hypothesis testing.

Number of nematodes (x)    Number of fish (observed frequency)
0                          110
1                          45
2                          22
3                          10
4                          6
5                          4
6                          2
7                          1
Total                      200

Do you want to run this in R?

If you want to run this in R, run the following code:

nematodes <- data.frame(
  Nematodes = 0:7,
  Fish = c(110, 45, 22, 10, 6, 4, 2, 1)
)

# Convert to a vector of observations
obs <- rep(nematodes$Nematodes, times = nematodes$Fish)

You will end up with an object called “obs” that holds your observations. You can then run the analysis with the code at the end of this document, using obs in place of x 😃
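For questions 1 and 3, one possible way to draw the bar graph and overlay the expected frequencies is sketched below. It assumes base R graphics and a Poisson model with the mean estimated from the table; the names n_fish, expected, and mids are only illustrative.

```r
# Nematode counts from the table above
nematodes <- data.frame(
  Nematodes = 0:7,
  Fish = c(110, 45, 22, 10, 6, 4, 2, 1)
)

# Sample mean number of nematodes per fish (weighted by the number of fish)
n_fish <- sum(nematodes$Fish)
avg <- sum(nematodes$Nematodes * nematodes$Fish) / n_fish

# Expected Poisson frequencies for 0 to 7 nematodes per fish
expected <- n_fish * dpois(nematodes$Nematodes, lambda = avg)

# Bar graph of the observed frequencies; barplot() returns the bar midpoints
mids <- barplot(nematodes$Fish,
                names.arg = nematodes$Nematodes,
                xlab = "Number of nematodes per fish",
                ylab = "Number of fish")

# Overlay the expected frequencies as connected points
points(mids, expected, pch = 19, type = "b")
legend("topright",
       legend = c("Bars: observed", "Points: expected (Poisson)"),
       bty = "n")
```

Plotting the expected values at the bar midpoints keeps them lined up with the observed bars, which makes the differences easier to see.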

Book Reading

Read the summaries of Chapters 6, 7, and 8 (they are pretty short!) and mention something that you learned from them.

Part 2: Do your own study

Instructions

  1. Pick one type of count you can easily collect this week:
  • Cars passing a fixed point in short intervals

  • People entering a building or bus

  • Messages or notifications you receive per hour

  • Or propose another similar idea! You can ask ChatGPT, and you can make it related to your interests.

  2. Collect at least 20 counts using equal time or space intervals. Example: 20 one-minute counts of cars passing your street.
  3. Calculate the sample mean of your counts (online it is often called \(\lambda\), but you can call it \(\mu\)).
  4. Compute the expected Poisson frequencies for 0, 1, 2, 3, … all the way to the highest count you saw.
  5. Compare observed vs. expected frequencies:

\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]

How do I compute the expected Poisson frequencies and \(\chi^2\)?

You can check the last slideshow, which contains the steps, and check Chapter 6 in the book.

You can also follow the instructions later in this document to run it in R.
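For reference, the standard Poisson formula gives the expected frequency of intervals with exactly \(i\) events. Here \(n\) (a symbol introduced only for this note) is the number of intervals you counted, \(\mu\) is your sample mean, and \(E_i\) is the same expected frequency that appears in the \(\chi^2\) formula:

\[ E_i = n \, \frac{e^{-\mu} \mu^i}{i!} \]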

  6. Interpret the results:
  • Is your process plausibly Poisson (random)?

  • If not, what could cause it to deviate (e.g., bursts, patterns, clumping)?

  7. Turn in a short summary that includes:
  • Your data (table or brief summary)

  • Histogram of counts

  • The four steps of a hypothesis test. Include the mean, variance, and \(\chi^2\) test result, as well as the decision

  • One paragraph interpreting what it means about your process


Instructions to run the test in R (you can also do it by hand or in Excel)

Put your data in R inside the vector, with the values separated by commas:

x <- c(`put your data here`)

Step 1: Calculate a mean:

avg <- mean(x)
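The summary also asks for the variance. For a Poisson process the variance equals the mean, so their ratio is a quick informal check before the formal test. The sketch below uses made-up example counts (not real data), and the names s2 and ratio are only illustrative.

```r
# Hypothetical example counts; replace with your own data
x <- c(2, 0, 1, 3, 1, 0, 2, 4, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0)

avg <- mean(x)      # sample mean
s2 <- var(x)        # sample variance (n - 1 in the denominator)
ratio <- s2 / avg   # close to 1 is what a Poisson process would give

cat("mean =", avg, " variance =", s2, " variance/mean =", ratio, "\n")
```

A ratio well above 1 suggests clumping (bursts), while a ratio well below 1 suggests counts that are more regular than random.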

Step 2: Create a table

obs_counts <- table(factor(x, levels = 0:max(x)))

If you are using R for your assignment, make sure the table is correct, and present the table in your results.
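One way to check the table is sketched below, again with made-up example counts. Using factor() with levels from 0 to max(x) keeps values that never occurred in the table with a count of 0, so the observed counts line up with the expected counts later.

```r
# Hypothetical example counts; replace with your own data
x <- c(2, 0, 1, 3, 1, 0, 2, 4, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0)

obs_counts <- table(factor(x, levels = 0:max(x)))

# The counts should add up to the number of observations
stopifnot(sum(obs_counts) == length(x))

# There should be one category for every value from 0 to the maximum
stopifnot(length(obs_counts) == max(x) + 1)

print(obs_counts)
```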

Step 3: Compute the expected Poisson frequencies

expected_probs <- dpois(0:max(x), lambda = avg)
expected_counts <- sum(obs_counts) * expected_probs

Step 4: Obtain the statistic and the p-value

chisq_value <- sum((obs_counts - expected_counts)^2 / expected_counts)
df <- length(obs_counts) - 1 - 1   # subtract 1 for estimated mean
p_value <- 1 - pchisq(chisq_value, df)
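One thing to be aware of: the Poisson probabilities for 0 up to max(x) do not quite add up to 1, because the distribution has no upper limit, so the expected counts above add up to slightly less than your number of observations. A possible variant, sketched below with made-up example counts, treats the last category as "max(x) or more" so that the expected counts sum to the sample size. This is an optional alternative, not a required part of the steps above; the names k and probs are only illustrative.

```r
# Hypothetical example counts; replace with your own data
x <- c(2, 0, 1, 3, 1, 0, 2, 4, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0)

avg <- mean(x)
k <- max(x)
obs_counts <- table(factor(x, levels = 0:k))

# Poisson probabilities for 0 to k - 1, with the last category as "k or more"
probs <- dpois(0:k, lambda = avg)
probs[k + 1] <- ppois(k - 1, lambda = avg, lower.tail = FALSE)  # P(X >= k)
expected_counts <- length(x) * probs

# Same statistic and degrees of freedom as in the steps above
chisq_value <- sum((obs_counts - expected_counts)^2 / expected_counts)
df <- length(obs_counts) - 1 - 1   # subtract 1 for estimated mean
p_value <- pchisq(chisq_value, df, lower.tail = FALSE)

cat("chi-square =", chisq_value, " df =", df, " p-value =", p_value, "\n")
```

If several categories have very small expected counts, many textbooks recommend combining them before computing \(\chi^2\); check the book for the rule it uses.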

Step 5: Plot a histogram of your data

hist(x)
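Because counts are whole numbers, hist() with its default breaks may group neighbouring values (such as 0 and 1) into one bar. A minimal sketch that gives each count its own bar, assuming made-up example data, is:

```r
# Hypothetical example counts; replace with your own data
x <- c(2, 0, 1, 3, 1, 0, 2, 4, 1, 1, 0, 2, 3, 1, 0, 1, 2, 5, 1, 0)

# Breaks at -0.5, 0.5, 1.5, ... so each whole number falls in its own bin
h <- hist(x,
          breaks = seq(-0.5, max(x) + 0.5, by = 1),
          xlab = "Count per interval",
          main = "Histogram of counts")
```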