In this exercise you will follow through our textbook, History of Teacup Giraffes, Module: Mean, Median, Mode. Make sure to read the module before you do this exercise.
The code chunk below creates the teacup giraffes dataset, similar to the one in the textbook. It is similar to the code used in Quiz2-a, with two differences: 1) only island1 data is included and 2) three very tall teacup giraffes are added. You will work with this dataset for the quiz.
# Load the package
library(tidyverse)
# set the seed for reproducible random data
set.seed(2020)
# Create height vectors of teacup giraffes
island1 <- rnorm(50, 9, 2) #50 giraffes around 9 inches tall
island1 <- c(island1, 90,100,110) # add three giraffes 90, 100, and 110 inches tall
# Combine the height vectors into a dataframe
d <-
  data.frame(island1) %>%
  gather(Location, Height, 1) # transforms the data to long form
d
## Location Height
## 1 island1 9.753944
## 2 island1 9.603097
## 3 island1 6.803954
## 4 island1 6.739188
## 5 island1 3.406931
## 6 island1 10.441147
## 7 island1 10.878242
## 8 island1 8.541245
## 9 island1 12.518263
## 10 island1 9.234734
## 11 island1 7.293754
## 12 island1 10.818518
## 13 island1 11.392746
## 14 island1 8.256832
## 15 island1 8.753480
## 16 island1 12.600086
## 17 island1 12.407992
## 18 island1 2.922471
## 19 island1 4.422050
## 20 island1 9.116607
## 21 island1 13.348731
## 22 island1 11.196365
## 23 island1 9.636441
## 24 island1 8.853705
## 25 island1 10.668537
## 26 island1 9.397501
## 27 island1 11.595683
## 28 island1 10.873437
## 29 island1 8.705134
## 30 island1 9.220864
## 31 island1 7.374991
## 32 island1 7.512596
## 33 island1 11.190690
## 34 island1 13.870747
## 35 island1 9.776237
## 36 island1 9.581255
## 37 island1 8.428803
## 38 island1 9.152029
## 39 island1 7.879403
## 40 island1 9.894377
## 41 island1 10.817002
## 42 island1 7.989881
## 43 island1 8.397992
## 44 island1 7.547928
## 45 island1 6.639846
## 46 island1 9.506149
## 47 island1 8.258577
## 48 island1 9.044359
## 49 island1 10.320088
## 50 island1 9.977587
## 51 island1 90.000000
## 52 island1 100.000000
## 53 island1 110.000000
The three common measures of central tendency are the mean (the average), the median (the middle value), and the mode (the value that occurs most frequently).
To find the mean, take the sum of the values and divide it by the number of values. To find the median, order the values from smallest to largest and check whether the number of values, n, is odd or even. If n is odd, the median is the middle value, at position (n+1)/2. If n is even, the median is the mean of the two values at positions n/2 and (n/2)+1. To find the mode, find the value that occurs most often in the data set.
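The three procedures described above can be sketched in base R. This is only an illustration: the `heights` vector is made-up toy data (not the quiz data), and `manual_median()` and `manual_mode()` are hypothetical helper names. Note that base R's built-in `mode()` reports a variable's storage type, not the statistical mode, so the mode is computed here from a frequency table.

```r
# Hypothetical toy data, not taken from the quiz
heights <- c(7, 9, 9, 10, 12, 8)

# Mean: sum of the values divided by the number of values
manual_mean <- sum(heights) / length(heights)

# Median: sort, then take the middle value (odd n)
# or the mean of the two middle values (even n)
manual_median <- function(x) {
  x <- sort(x)
  n <- length(x)
  if (n %% 2 == 1) {
    x[(n + 1) / 2]
  } else {
    (x[n / 2] + x[n / 2 + 1]) / 2
  }
}

# Mode: the value(s) with the highest count in a frequency table
manual_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

manual_mean              # agrees with mean(heights)
manual_median(heights)   # agrees with median(heights)
manual_mode(heights)     # 9 appears twice, more than any other value
```

If several values tie for the highest count, `manual_mode()` returns all of them.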
Hint: Refer to the code in Data Visualization with R: Ch1.2.4 Summarizing data.
summarize(d, mean_ht = mean(Height, na.rm=TRUE))
## mean_ht
## 1 14.38797
Hint: Replace mean with median in the code in Q3.
summarize(d, median_ht = median(Height, na.rm=TRUE))
## median_ht
## 1 9.506149
Hint: Refer to the code in Data Visualization with R: Ch3.2.1 Histogram.
ggplot(d, aes(x = Height)) +
  geom_histogram()
Hint: Note that the histogram in Q5 is peaked at 9 inches.
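Of the two measures, the median is the better choice because it is more representative of the data set. The three very tall giraffes pull the mean up to about 14.4 inches, while the median stays at about 9.5 inches, close to the peak of the histogram.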
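One way to check this claim is to compare both measures with and without the three very tall giraffes. The sketch below reuses the seed and `rnorm()` call from the data-creation chunk; the names `typical`, `with_tall`, `mean_shift`, and `median_shift` are introduced here only for illustration.

```r
# Recreate the 50 typical giraffes with the same seed as the quiz data
set.seed(2020)
typical <- rnorm(50, 9, 2)

# Add the three very tall giraffes
with_tall <- c(typical, 90, 100, 110)

# How far does each measure move when the tall giraffes are added?
mean_shift   <- mean(with_tall) - mean(typical)
median_shift <- median(with_tall) - median(typical)

cat("Shift in mean:  ", round(mean_shift, 2), "inches\n")
cat("Shift in median:", round(median_shift, 2), "inches\n")
```

The mean moves by several inches, while the median barely changes, which is why the median describes a typical giraffe better here.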
Law of Large Numbers: Why do you think the mean of a sample would not likely represent the mean of the population when the sample size is very small? Explain using your own hypothetical research.
Hint: Here is my own hypothetical example. I want to know how tall a typical PSU student is, but there are too many students to collect data from, so I collected a sample of 10 students to get the mean. Would the mean of the 10-student sample be a good representation of all PSU students? What if there is a 7-footer in the 10-student sample? Would that skew the mean? Would the 7-footer have the same degree of influence had I collected a sample of 100 students?
The mean of a very small sample isn't a reliable way to represent a population because, with so few values, one unusual value can skew the entire result. For example, suppose you wanted to find out how many PSU students like hockey and asked only the first 20 people who walked into the Silver Center. You might end up asking mostly theater people, which wouldn't be a great representation of the PSU population.
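A small simulation can illustrate the idea behind the question. Everything in this sketch is assumed for illustration: the population of 10,000 student heights (mean 67 inches, standard deviation 4), the seed, and the helper `sample_means()` are all hypothetical, not real PSU data.

```r
set.seed(1)  # arbitrary seed so the simulation is reproducible

# Hypothetical population of student heights in inches
population <- rnorm(10000, mean = 67, sd = 4)

# Draw many samples of size n and return the mean of each one
sample_means <- function(n, reps = 1000) {
  replicate(reps, mean(sample(population, n)))
}

# How much do sample means vary from sample to sample?
spread_10  <- sd(sample_means(10))
spread_100 <- sd(sample_means(100))

# How far does one 7-footer (84 inches) pull the mean
# when everyone else is exactly 67 inches tall?
pull_10  <- mean(c(rep(67, 9), 84)) - 67    # 17 / 10  = 1.7 inches
pull_100 <- mean(c(rep(67, 99), 84)) - 67   # 17 / 100 = 0.17 inches

cat("Spread of sample means, n = 10: ", round(spread_10, 2), "\n")
cat("Spread of sample means, n = 100:", round(spread_100, 2), "\n")
```

Means from samples of 10 vary much more than means from samples of 100, and the same 7-footer shifts the mean ten times as far in the smaller sample.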
Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.