2024-11-17

Sample Size and Standard Deviation

  • Let’s consider two built-in R data sets: women and ChickWeight.
  • The first data frame contains only 15 observations; the second contains 578.
  • How will the plots differ? How does sample size affect our ability to draw conclusions?
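
The observation counts can be checked directly in R. This is a minimal sketch; the variable names n_women and n_chicks are chosen here for illustration only.

```r
# Both data sets ship with base R (package "datasets").
n_women  <- nrow(women)        # rows in the small data set
n_chicks <- nrow(ChickWeight)  # rows in the large data set

c(women = n_women, ChickWeight = n_chicks)
```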

Plot with very few observations

  • First, let’s plot our tiny data set:
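
The original figure is not reproduced here. One way to draw such a plot, assuming a simple base-graphics scatter plot of weight against height, might be:

```r
# Scatter plot of the 15 observations in `women`.
# The choice of axes and labels is an assumption, not the original figure.
plot(women$height, women$weight,
     xlab = "Height (in)", ylab = "Weight (lb)",
     main = "women: 15 observations")
```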

Plot with many more observations

  • Now let’s plot our larger data set and consider how the two plots differ:
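
The original figure is not reproduced here. A comparable plot for the larger data set, assuming weight is plotted against the Time column, could be sketched as:

```r
# Scatter plot of all 578 observations in `ChickWeight`.
# The choice of axes and labels is an assumption, not the original figure.
plot(ChickWeight$Time, ChickWeight$weight,
     xlab = "Time (days)", ylab = "Weight (g)",
     main = "ChickWeight: 578 observations")
```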

We can also view our data sets as bar plots

  • In this case, the difference in sample sizes is not as immediately obvious!
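
The original bar plots are not reproduced here. One possible reconstruction, assuming each bar shows a mean weight per group, illustrates why sample size gets hidden: both plots end up with a similar number of bars even though one rests on far more data.

```r
# Mean weight per height for `women` (one observation per height, so 15 bars).
women_means <- tapply(women$weight, women$height, mean)

# Mean weight per measurement day for `ChickWeight` (12 distinct days, so 12 bars).
chick_means <- tapply(ChickWeight$weight, ChickWeight$Time, mean)

old_par <- par(mfrow = c(1, 2))  # two panels side by side
barplot(women_means, xlab = "Height (in)", ylab = "Mean weight (lb)",
        main = "women")
barplot(chick_means, xlab = "Time (days)", ylab = "Mean weight (g)",
        main = "ChickWeight")
par(old_par)                     # restore the previous layout
```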

What equations help us draw conclusions from these types of data sets?

  • Because the weights of living things are continuous, we should use the formula for Continuous Mean:

\(\mu=\int xf(x) dx\)
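
As a rough numerical check of this formula, the integral can be evaluated with R's integrate(). The sketch below assumes, purely for illustration, that weight follows a normal density whose parameters are taken from women$weight; the data do not have to be normal.

```r
# Assumption for illustration only: f(x) is a normal density with the
# sample mean and standard deviation of women$weight as its parameters.
m <- mean(women$weight)
s <- sd(women$weight)

# Integrand x * f(x) from the formula for the continuous mean.
integrand <- function(x) x * dnorm(x, mean = m, sd = s)

# Integrate over a wide finite range (m +/- 10 sd) that holds
# essentially all of the probability mass.
mu <- integrate(integrand, lower = m - 10 * s, upper = m + 10 * s)$value

c(sample_mean = m, integral = mu)
```

The two values should agree closely, since the mean of a normal density is its mean parameter.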

How does sample size affect Standard Deviation?

  • Recall the formula for Standard Deviation:

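Discrete case, where the last step assumes every observation is equally likely (\(P_{i} = 1/n\)):

\(\sigma=\sqrt{V(X)}=\sqrt{\sum_{i} P_{i}(x_{i} - \mu)^2}=\sqrt{\frac{\sum_{i} (x_{i} - \mu)^2}{n}}\)

Continuous case:

\(\sigma=\sqrt{V(X)}=\sqrt{\int (x-\mu)^{2}f(x) dx}\)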
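
The discrete formula can be computed by hand and compared with R's built-in sd(). Note that sd() divides by n - 1 rather than n, so the two results differ slightly. The variable names below are illustrative.

```r
x  <- women$weight
n  <- length(x)
mu <- mean(x)

# Discrete formula with equal weights: divide the summed squared deviations by n.
sd_population <- sqrt(sum((x - mu)^2) / n)

# R's built-in sd() divides by n - 1 instead (sample standard deviation).
sd_sample <- sd(x)

c(divide_by_n = sd_population, divide_by_n_minus_1 = sd_sample)
```

The two values are related by a factor of \(\sqrt{n/(n-1)}\), which approaches 1 as n grows.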

Use caution when drawing conclusions:

  • Below is the R code that calculates the sample standard deviation of weight for our two data sets:
stand_dev_women <- sd(women$weight, na.rm = TRUE)
stand_dev_women
[1] 15.49869
stand_dev_chickens <- sd(ChickWeight$weight, na.rm = TRUE)
stand_dev_chickens
[1] 71.07196

Appearances can be deceiving

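  • As a rule of thumb, a sample of at least 30 observations is often recommended before relying on normal-approximation methods; smaller samples can still be informative but carry more uncertainty.
  • In general, estimates such as the sample mean become more precise as sample size increases.
  • The standard deviation itself does not shrink as sample size increases, because it measures the spread of the data. What shrinks is the standard error of the mean, \(\sigma/\sqrt{n}\). Here the larger data set actually has the larger standard deviation (71.07 vs. 15.50).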
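
A small simulation can show how sample size relates to standard deviation and to the standard error of the mean. This is a sketch under stated assumptions: the population is hypothetical (normal, with mean 137 and standard deviation 15) and is not drawn from either data set.

```r
set.seed(1)  # make the simulated draws reproducible

sigma <- 15                       # hypothetical population standard deviation
sizes <- c(10, 100, 1000, 10000)  # sample sizes to compare

# For each n: draw n values, then record the sample sd and the standard error.
results <- t(sapply(sizes, function(n) {
  x <- rnorm(n, mean = 137, sd = sigma)
  c(n = n, sample_sd = sd(x), std_error = sd(x) / sqrt(n))
}))

results
```

In a run like this, the sample_sd column should settle near sigma as n grows, while std_error keeps shrinking roughly in proportion to \(1/\sqrt{n}\).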