Assignment Goodness of Fit
Part 1
Book Questions
This is a modified version of Chapter 8, Question 2:
The parasitic nematode Camallanus oxycephalus infects many freshwater fish, including shad. The following table gives the number of nematodes per fish:

| Number of nematodes (x) | Number of fish (observed frequency) |
|---|---|
| 0 | 110 |
| 1 | 45 |
| 2 | 22 |
| 3 | 10 |
| 4 | 6 |
| 5 | 4 |
| 6 | 2 |
| 7 | 1 |
| Total | 200 |

- Produce a graph of the data. What type of graph is most appropriate?
- Calculate the frequencies expected if nematodes infect fish "at random" (i.e., independently and with equal probability).
- Overlay the expected frequencies on your graph. What are the noticeable differences?
- Is there evidence that nematodes do not worm their way into the fish at random? Here and always, show all four steps of hypothesis testing.

If you want to run this in R, run the following code:

```r
nematodes <- data.frame(
  Nematodes = 0:7,
  Fish = c(110, 45, 22, 10, 6, 4, 2, 1)
)
# Convert to a vector of observations: one entry per fish, holding its nematode count
obs <- rep(nematodes$Nematodes, times = nematodes$Fish)
```

You will end up with an object called "obs". These are your observations. To run the analysis with the code at the end of this document, first set `x <- obs` 😃
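One way to approach the graphing questions in R is sketched below. It assumes that "at random" means a Poisson model whose mean is estimated from the data, draws a bar graph of the observed frequency distribution, and overlays the expected frequencies. Combining the last bar into "7 or more" is a choice made in this sketch so that the expected frequencies add up to the 200 fish; the variable names are illustrative.

```r
# Nematode counts from the table above: one entry per fish (n = 200)
obs <- rep(0:7, times = c(110, 45, 22, 10, 6, 4, 2, 1))

observed <- table(factor(obs, levels = 0:7))
lambda_hat <- mean(obs)  # sample mean, used as the Poisson mean

# Expected number of fish under the Poisson model; the last category is "7 or more"
expected <- length(obs) * dpois(0:7, lambda = lambda_hat)
expected[8] <- length(obs) * ppois(6, lambda = lambda_hat, lower.tail = FALSE)

# Bar graph of the observed frequencies with the expected frequencies overlaid
mids <- barplot(observed, col = "grey80",
                xlab = "Number of nematodes", ylab = "Number of fish",
                ylim = c(0, 1.1 * max(observed, expected)))
points(mids, expected, pch = 19, col = "red")
lines(mids, expected, col = "red")
legend("topright", legend = c("Observed", "Expected (Poisson)"),
       pch = c(15, 19), col = c("grey80", "red"))
```

Comparing the bar heights with the red points category by category shows where the observed data have more or fewer fish than a random process would predict.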
Book Reading
Read the summaries of Chapters 6, 7, and 8 (they are pretty short!) and mention something you learned from them.
Part 2: Do your own study
Instructions
- Pick one type of count you can easily collect this week:
Cars passing a fixed point in short intervals
People entering a building or bus
Messages or notifications you receive per hour
(Or propose another similar idea! You can ask ChatGPT for suggestions, and you can make it related to your interests.)
- Collect at least 20 counts using equal time or space intervals. Example: 20 one-minute counts of cars passing your street.
- Calculate the sample mean of your counts. (Online sources often call it \(\lambda\), but you can call it \(\mu\).)
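- Compute the expected Poisson frequencies for 0, 1, 2, 3, … all the way up to the highest count you saw (treat the highest value as "that many or more" so the expected frequencies add up to your total number of counts).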
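- Compare the observed and expected frequencies with the \(\chi^2\) statistic, where \(O_i\) and \(E_i\) are the observed and expected frequencies in category \(i\):
\[ \chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i} \]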
The last slideshow lists the steps, and Chapter 6 of the book covers them as well.
You can also follow the instructions in this document to run the test in R.
- Interpret the results:
Is your process plausibly Poisson (random)?
If not, what could cause it to deviate (e.g., bursts, patterns, clumping)?
- Turn in a short summary that includes:
Your data (table or brief summary)
Histogram of counts
The four steps of a hypothesis test, including the mean, variance, and \(\chi^2\) test result, as well as your decision
One paragraph interpreting what it means about your process
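The summary asks for the variance as well as the mean. A quick descriptive check, sketched here with the nematode data from Part 1 as an example, is the variance-to-mean ratio: a Poisson distribution has a variance equal to its mean, so a ratio well above 1 suggests clumping or bursts, and a ratio well below 1 suggests more regular spacing than random. This ratio is only a summary; the \(\chi^2\) test is still the hypothesis test.

```r
# Nematode data from Part 1, used as an example; replace obs with your own counts
obs <- rep(0:7, times = c(110, 45, 22, 10, 6, 4, 2, 1))

sample_mean <- mean(obs)
sample_var <- var(obs)                # sample variance (denominator n - 1)
dispersion <- sample_var / sample_mean

# For a Poisson process the variance equals the mean, so this ratio should be near 1.
# A ratio > 1 suggests clumping (bursts); a ratio < 1 suggests more even spacing than random.
cat("mean =", sample_mean, " variance =", round(sample_var, 3),
    " variance/mean =", round(dispersion, 2), "\n")
```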
Instructions to estimate it in R
You can also do it by hand or in Excel.
Put your data into an R vector, separating the counts with commas:
```r
# Replace `put your data here` with your counts, separated by commas
x <- c(`put your data here`)
```

Step 1: Calculate the mean:
```r
avg <- mean(x)
```

Step 2: Create a table
```r
# factor() with levels 0:max(x) keeps values that were never observed (count 0) in the table
obs_counts <- table(factor(x, levels = 0:max(x)))
```

If you are using R for your assignment, make sure the table is correct, and present it in your results.
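The reason for wrapping `x` in `factor()` is shown below with a made-up toy vector (used only for illustration) in which the value 1 never occurs. A plain `table()` silently drops the empty category, so the observed table would be shorter than the vector of Poisson probabilities for `0:max(x)`, and the categories would no longer line up in the \(\chi^2\) calculation.

```r
# Made-up counts, used only to illustrate the behaviour (the value 1 never occurs)
toy <- c(0, 2, 2, 3)

plain <- table(toy)                              # categories 0, 2, 3 only
full <- table(factor(toy, levels = 0:max(toy)))  # categories 0, 1, 2, 3 (count 0 for 1)

print(plain)
print(full)

# Only the factor version has one entry per Poisson probability for 0:max(toy)
length(full) == length(dpois(0:max(toy), lambda = mean(toy)))
```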
Step 3: Calculate the expected Poisson frequencies:
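```r
expected_probs <- dpois(0:max(x), lambda = avg)
# Make the last category "max(x) or more" so the probabilities sum to 1
expected_probs[length(expected_probs)] <- ppois(max(x) - 1, lambda = avg, lower.tail = FALSE)
expected_counts <- sum(obs_counts) * expected_probs
```

Step 4: Obtain the test statistic and the p-value: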
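```r
chisq_value <- sum((obs_counts - expected_counts)^2 / expected_counts)
# Degrees of freedom = number of categories - 1 - 1 (the extra 1 is for the estimated mean)
df <- length(obs_counts) - 1 - 1
p_value <- pchisq(chisq_value, df, lower.tail = FALSE)  # same as 1 - pchisq(chisq_value, df)
```

Step 5: Make sure to make a histogram of your data!

```r
# Breaks at half-integers give each count (0, 1, 2, ...) its own bar
hist(x, breaks = seq(-0.5, max(x) + 0.5, by = 1), xlab = "Count per interval", main = "Histogram of counts")
```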
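A common rule of thumb for the \(\chi^2\) goodness-of-fit test is that no expected frequency should be below 1 and no more than 20% of them should be below 5. When the mean is small, the Poisson expected frequencies for the largest counts are tiny, so those categories are usually combined before testing. The sketch below uses a simpler, stricter cutoff: it merges the upper tail into one "k or more" category until that category's expected count is at least `min_expected` (5 by default). The helper name `poisson_gof_lumped` is hypothetical, and it assumes small expected counts occur only in the upper tail, which is typical for a small mean; with a large mean, the lowest counts may also need combining.

```r
# Nematode data from Part 1; replace obs with your own counts to reuse the function
obs <- rep(0:7, times = c(110, 45, 22, 10, 6, 4, 2, 1))

# Hypothetical helper (not from the assignment): Poisson goodness-of-fit test that
# lumps the upper tail into a single "k or more" category until its expected count
# reaches min_expected. Assumes small expected counts occur only in the upper tail.
poisson_gof_lumped <- function(x, min_expected = 5) {
  stopifnot(max(x) >= 2)              # need at least 3 categories so that df >= 1
  n <- length(x)
  avg <- mean(x)
  k <- max(x)                         # start with one category per observed value
  repeat {
    probs <- c(dpois(0:(k - 1), lambda = avg),
               ppois(k - 1, lambda = avg, lower.tail = FALSE))  # last entry is P(X >= k)
    expected <- n * probs
    if (expected[k + 1] >= min_expected || k == 2) break
    k <- k - 1                        # merge the top category into the one below it
  }
  observed <- c(sapply(0:(k - 1), function(v) sum(x == v)), sum(x >= k))
  chisq_value <- sum((observed - expected)^2 / expected)
  df <- length(observed) - 1 - 1      # categories - 1 - 1 for the estimated mean
  list(categories = c(0:(k - 1), paste0(k, "+")),
       observed = observed, expected = expected,
       chisq = chisq_value, df = df,
       p_value = pchisq(chisq_value, df, lower.tail = FALSE))
}

result <- poisson_gof_lumped(obs)
print(result)
```

Because the helper returns the combined observed and expected frequencies alongside the statistic, they can be copied directly into the results table.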