In this exercise you will follow through our textbook, History of Teacup Giraffes, Module: Mean, Median, Mode. Make sure to read the module before you do this exercise.

The code chunk below creates a teacup giraffes dataset similar to the one in the textbook. It is similar to the code used in Quiz2-a, with two differences: 1) only island1 data is included and 2) three very tall teacup giraffes are added. You will work with this dataset for the quiz.

# Load the package
library(tidyverse)

# set the seed for reproducible random data
set.seed(2020)

# Create height vectors of teacup giraffes
island1 <- rnorm(50, 9, 2) # 50 giraffes: mean 9 inches, SD 2 inches
island1 <- c(island1, 90,100,110) # add three giraffes 90, 100, and 110 inches tall

# Combine the height vectors into a dataframe
d <- 
  data.frame(island1) %>% 
  gather(Location, Height, 1) #transforms data to a long form

d
##    Location     Height
## 1   island1   9.753944
## 2   island1   9.603097
## 3   island1   6.803954
## 4   island1   6.739188
## 5   island1   3.406931
## 6   island1  10.441147
## 7   island1  10.878242
## 8   island1   8.541245
## 9   island1  12.518263
## 10  island1   9.234734
## 11  island1   7.293754
## 12  island1  10.818518
## 13  island1  11.392746
## 14  island1   8.256832
## 15  island1   8.753480
## 16  island1  12.600086
## 17  island1  12.407992
## 18  island1   2.922471
## 19  island1   4.422050
## 20  island1   9.116607
## 21  island1  13.348731
## 22  island1  11.196365
## 23  island1   9.636441
## 24  island1   8.853705
## 25  island1  10.668537
## 26  island1   9.397501
## 27  island1  11.595683
## 28  island1  10.873437
## 29  island1   8.705134
## 30  island1   9.220864
## 31  island1   7.374991
## 32  island1   7.512596
## 33  island1  11.190690
## 34  island1  13.870747
## 35  island1   9.776237
## 36  island1   9.581255
## 37  island1   8.428803
## 38  island1   9.152029
## 39  island1   7.879403
## 40  island1   9.894377
## 41  island1  10.817002
## 42  island1   7.989881
## 43  island1   8.397992
## 44  island1   7.547928
## 45  island1   6.639846
## 46  island1   9.506149
## 47  island1   8.258577
## 48  island1   9.044359
## 49  island1  10.320088
## 50  island1   9.977587
## 51  island1  90.000000
## 52  island1 100.000000
## 53  island1 110.000000

Q1 What are the three common measures of centrality?

The three common measures are the mean (the average), the median (the middle value of the sorted data) and the mode (the value that occurs most frequently).

Q2 Explain how you would manually calculate each of the three measures in your own words.

To find the mean, take the sum of the values and divide it by the number of values. To find the median, order the values from smallest to largest and check whether the number of values n is odd or even. If n is odd, the median is the middle value, at position (n+1)/2. If n is even, find the values at positions n/2 and (n/2)+1; the median is the mean of those two values. To find the mode, find the value that occurs most often in the data set.
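The manual steps described above can be sketched in base R. This is only an illustration: the `heights` vector and the `manual_*` names are hypothetical and are not part of the quiz data.

```r
# Hypothetical small sample of heights in inches (not the quiz data)
heights <- c(7, 9, 9, 10, 12, 8)
n <- length(heights)

# Mean: sum of the values divided by the number of values
manual_mean <- sum(heights) / n

# Median: sort, then take the middle value (odd n)
# or the mean of the two middle values (even n)
sorted <- sort(heights)
manual_median <- if (n %% 2 == 1) {
  sorted[(n + 1) / 2]
} else {
  (sorted[n / 2] + sorted[n / 2 + 1]) / 2
}

# Mode: the value(s) with the highest count
# (base R has no built-in function for the statistical mode)
counts <- table(heights)
manual_mode <- as.numeric(names(counts)[as.vector(counts) == max(counts)])

manual_mean
manual_median
manual_mode
```

The first two results can be checked against the built-in `mean()` and `median()` functions.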

Q3 Calculate the mean height using the mean() function.

Hint: Refer to the code in Data Visualization with R: Ch1.2.4 Summarizing data.

summarize(d, mean_ht = mean(Height, na.rm=TRUE))
##    mean_ht
## 1 14.38797

Q4 Calculate the median height using the median() function.

Hint: Replace mean with median in the code in Q3.

summarize(d, median_ht = median(Height, na.rm=TRUE))
##   median_ht
## 1  9.506149

Q5 Create a histogram.

Hint: Refer to the code in Data Visualization with R: Ch3.2.1 Histogram.

ggplot(d, aes(x = Height)) +
  geom_histogram() 

Q6 Which of the two measures would be appropriate to represent the typical height? Why?

Hint: Note that the histogram in Q5 is peaked at 9 inches.

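The median is the better of the two measures because it is more representative of the data set. The three very tall giraffes (90, 100, and 110 inches) pull the mean up to about 14.4 inches, while the median stays at about 9.5 inches, close to the peak of the histogram.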
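A minimal sketch of why the median resists extreme values, using a small hypothetical vector (`typical` and `with_outlier` are illustrative names and values, not the quiz data):

```r
# Hypothetical heights in inches (illustrative values only)
typical <- c(7, 8, 9, 10, 11)
with_outlier <- c(typical, 100)  # add one very tall giraffe

mean(typical)         # 45 / 5 = 9
median(typical)       # 9
mean(with_outlier)    # 145 / 6, about 24.17
median(with_outlier)  # (9 + 10) / 2 = 9.5
```

One extreme value more than doubles the mean but moves the median by only half an inch.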

Q7 Law of Large Numbers Why do you think the mean of a sample would not likely represent the mean of the population when the sample size is very small? Explain using your own hypothetical research.

Hint: Here is my own hypothetical example. I want to know how tall a typical PSU student is. But there are too many students to collect data from. So I collected a sample of 10 students to get the mean. Would the mean of the 10 student sample be a good representation of all PSU students? What if there is a 7 footer in the 10 student sample? Would that skew the mean? Would the 7 footer have the same degree of influence had I collected a sample of 100 students?

The mean of a very small sample isn’t a reliable way to represent a population because, with so few values, one unusual value can skew the entire result. For example, if you were trying to find out how many PSU students like hockey and asked only the first 20 people who walked into the Silver Center, you might end up asking mostly theater people, who wouldn’t be a great representation of the population of PSU.
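The 7 footer example from the hint can be worked through with simple arithmetic. This sketch assumes, purely for illustration, that every other student is exactly 67 inches tall; the vector names are hypothetical.

```r
# Hypothetical heights in inches: everyone is 67 except one 7 footer (84)
small_sample <- c(rep(67, 9), 84)   # n = 10
large_sample <- c(rep(67, 99), 84)  # n = 100

# How far the single 7 footer pulls each sample mean above 67
shift_small <- mean(small_sample) - 67  # 17 / 10 = 1.7 inches
shift_large <- mean(large_sample) - 67  # 17 / 100 = 0.17 inches

shift_small
shift_large
```

The same extreme value has ten times less influence on the mean when the sample is ten times larger.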

Q8 Hide the messages, but display the code and its results on the webpage.

Hint: Use message, echo and results in the chunk options. Refer to the RMarkdown Reference Guide.
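One possible way to apply these options to every chunk at once is a setup chunk that calls `knitr::opts_chunk$set()`. This is a sketch, assuming the page is knitted with the knitr package; the same options can instead be written in an individual chunk header.

```r
# Assumed setup chunk: applies to all later chunks in the document
knitr::opts_chunk$set(
  message = FALSE,     # hide messages (e.g. from library(tidyverse))
  echo = TRUE,         # display the code
  results = "markup"   # display the results
)
```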

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.