Of Chickens and Women

2024-11-17

Sample Size and Standard Deviation

Let’s consider two R built-in data sets: women and ChickWeight (our chickens).
The first data frame contains only 15 observations; the second contains 578.
How will the plots differ? How does sample size affect our ability to draw conclusions?

Plot with very few observations

First, let’s plot our tiny data set:
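The original figure is not reproduced here. A minimal sketch of one way to draw it, assuming a base-graphics scatter plot of weight against height (the axis labels and title are illustrative choices, not the author's):

```r
# Scatter plot of the built-in `women` data set (15 rows).
# Columns used: height (inches) and weight (pounds).
n_women <- nrow(women)

plot(women$height, women$weight,
     xlab = "Height (in)",
     ylab = "Weight (lbs)",
     main = paste("women:", n_women, "observations"))
```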

Plot with many more observations

Now let’s plot our larger data set and consider how the two differ:
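As with the smaller data set, the figure itself is missing from this text. A hedged sketch, assuming a base-graphics scatter plot of chick weight against age (labels and title are illustrative):

```r
# Scatter plot of the built-in `ChickWeight` data set (578 rows).
# Columns used: Time (days since hatching) and weight (grams).
n_chickens <- nrow(ChickWeight)

plot(ChickWeight$Time, ChickWeight$weight,
     xlab = "Time (days)",
     ylab = "Weight (g)",
     main = paste("ChickWeight:", n_chickens, "observations"))
```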

We can also view our data sets as bar plots

In this case, the difference in sample sizes is not as immediately obvious!

What equations help us draw conclusions from these types of data sets?

Because weight is a continuous quantity, the population mean is given by the formula for Continuous Mean, where \(f(x)\) is the probability density of the weights:

\(\mu=\int xf(x) dx\)
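In practice we do not know \(f(x)\); we only have a finite sample, so \(\mu\) is estimated by the sample mean. The sketch below computes it by hand and compares it with R's built-in mean(); the variable names are illustrative.

```r
# Estimate the mean weight from the `women` sample.
x <- women$weight

# Sample mean by hand: the sum of the observations divided by their count.
mean_by_hand <- sum(x) / length(x)

# The built-in function should give the same value.
mean_builtin <- mean(x)

all.equal(mean_by_hand, mean_builtin)
```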

How does sample size affect Standard Deviation?

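Recall the formulas for Standard Deviation in the discrete and continuous cases:

\(\sigma=\sqrt{V(X)}=\sqrt{\sum_{i} P_{i}(x_{i} - \mu)^2}=\sqrt{\frac{\sum_{i} (x_{i} - \mu)^2}{n}}\) (discrete; the last step assumes each of the \(n\) values is equally likely, \(P_{i}=1/n\))

\(\sigma=\sqrt{V(X)}=\sqrt{\int (x-\mu)^{2}f(x) dx}\) (continuous)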
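One detail worth checking before applying the formula to data: R's sd() divides by \(n-1\) (the sample standard deviation) rather than by \(n\). A short sketch comparing the two on the women data set, with illustrative variable names:

```r
# Compare the divide-by-n formula with R's sd(), which divides by n - 1.
x <- women$weight
n <- length(x)
squared_deviations <- (x - mean(x))^2

# Population-style standard deviation: divide by n.
pop_sd <- sqrt(sum(squared_deviations) / n)

# Sample standard deviation: divide by n - 1.
sample_sd <- sqrt(sum(squared_deviations) / (n - 1))

all.equal(sample_sd, sd(x))  # sd() matches the n - 1 version
pop_sd < sample_sd           # dividing by n gives the smaller value
```

The gap between the two versions shrinks as \(n\) grows, so it matters more for the 15 women than for the 578 chicken measurements.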

Use caution when drawing conclusions:

Below is the code that calculates the sample Standard Deviation (R's sd() divides by \(n-1\)) for our two data sets:

stand_dev_women <- sd(women$weight, na.rm = TRUE)
stand_dev_women

[1] 15.49869

stand_dev_chickens <- sd(ChickWeight$weight, na.rm = TRUE)
stand_dev_chickens

[1] 71.07196

Appearances can be deceiving

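As a rule of thumb, a sample of at least 30 observations is often recommended before relying on normal-approximation methods; it is a guideline, not a strict cutoff.
In general, the precision of estimates such as the sample mean improves as sample size increases.
The standard deviation itself does not shrink as sample size increases: it estimates the spread of the population, and here the larger data set has the larger value. What decreases is the standard error of the mean, \(\sigma/\sqrt{n}\).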
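To see where sample size enters, one can compare each data set's standard deviation with the standard error of its mean, \(s/\sqrt{n}\). The sketch below uses a hypothetical helper, std_error(), and treats the observations as independent, which ChickWeight's repeated measurements on the same chicks do not strictly satisfy.

```r
# Hypothetical helper: standard error of the mean, sd / sqrt(n),
# counting only the non-missing observations.
std_error <- function(x) {
  sd(x, na.rm = TRUE) / sqrt(sum(!is.na(x)))
}

sd_women    <- sd(women$weight, na.rm = TRUE)
sd_chickens <- sd(ChickWeight$weight, na.rm = TRUE)

se_women    <- std_error(women$weight)
se_chickens <- std_error(ChickWeight$weight)

# The chickens have the larger standard deviation...
sd_chickens > sd_women
# ...but, with far more observations, the smaller standard error.
se_chickens < se_women
```

The two data sets measure different things in different units, so the comparison only illustrates how \(n\) affects the standard error; it says nothing about women versus chickens.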