Math 32

Exploration 2

In this exploration, you are asked to

  • find sample means and medians from a randomized data sample
  • observe how means and medians differ in practice
  1. First,
    1. Type your name into the author field in the YAML parameters above
    2. Also save this file while replacing “template” with your full name (be sure to keep the *.rmd file format).
  2. Demonstrate your understanding of the example code block below by adding at least 5 comments to it in your own words.
library("ggplot2")
#the above line loads a library that allows us to plot data
N <- 32
#this above line sets the variable N as 32 
shoe_sizes <- 6:14
#the above line sets a range of the available shoe sizes 
without_outlier <- sample(shoe_sizes, N, replace = TRUE)
#the above line draws N = 32 shoe sizes at random, allowing repeats (replace = TRUE), and stores them in without_outlier
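#the line below puts the sample in a data frame so that ggplot can use it
df_without_outlier <- data.frame(without_outlier)

#the lines below append one extreme value (100) to the sample and build a second data frame
outlier <- 100
with_outlier <- c(without_outlier, outlier)
df_with_outlier <- data.frame(with_outlier)

#the lines below draw a histogram of the shoe sizes and add a blue dashed line at the mean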
ggplot(df_without_outlier, aes(x = without_outlier)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  geom_vline(aes(xintercept = mean(without_outlier)),
             color = "blue", linetype = "dashed", size = 2) +
  labs(title = "Shoe Sizes", subtitle = "Exploring Means",
       caption = "Math 32", x = "shoe sizes") +
  theme_minimal() +
  theme(legend.position = "none")

  1. Make at least two observations about the results (the graph) in an unordered list.
  • Across several runs, the mean stays close to shoe size 10, the center of the 6 to 14 range
  • The shape of the shoe size distribution changes on every run, because sample() draws new random values each time
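The run-to-run variation noted above comes from the random draw. As a minimal sketch (the seed value 32 and the names first_draw and second_draw are illustrative assumptions, not part of the assignment), fixing the seed before sampling makes the draw repeatable, and the average of the available sizes shows why the sample mean stays near 10:

```r
# Hypothetical seed value; any integer works, it only has to be the same both times
set.seed(32)
first_draw <- sample(6:14, 32, replace = TRUE)

set.seed(32)
second_draw <- sample(6:14, 32, replace = TRUE)

# Same seed, same sample
identical(first_draw, second_draw)  # TRUE

# Average of the sizes 6 through 14 when each is equally likely
mean(6:14)  # 10
```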
  1. In the code block below,

    1. set the eval parameter for the code block to TRUE
    2. graph the “with outlier” data by making changes in the ggplot line
    3. add another geom_vline layer that
      • calculates and displays the mean with the outlier
      • has a red color
      • is a dotted line
    4. replace “Math 32” with your name in the caption
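#plot the data that includes the outlier
ggplot(df_with_outlier, aes(x = with_outlier)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  #blue dashed line: mean of the original sample (without the outlier)
  geom_vline(xintercept = mean(without_outlier),
             color = "blue", linetype = "dashed", size = 2) +
  #red dotted line: mean of the sample that includes the outlier
  geom_vline(aes(xintercept = mean(with_outlier)),
             color = "red", linetype = "dotted", size = 2) +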
  labs(title = "Shoe Sizes", subtitle = "Exploring Means",
       caption = "Humberto Flores", x = "shoe sizes") +
  theme_minimal() +
  theme(legend.position = "none")

  1. Repeat the previous task in a new code block, but compute the median without and with the outlier.
ggplot(df_without_outlier, aes(x = without_outlier)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  geom_vline(aes(xintercept = median(without_outlier)),
             color = "red", linetype = "dotted", size = 2) +
  labs(title = "Shoe Sizes", subtitle = "Exploring Medians",
       caption = "Math 32", x = "shoe sizes") +
  theme_minimal() +
  theme(legend.position = "none")

ggplot(df_with_outlier, aes(x = with_outlier)) +
  geom_histogram(binwidth = 1, color = "black", fill = "white") +
  geom_vline(aes(xintercept = median(with_outlier)),
             color = "red", linetype = "dotted", size = 2) +
  labs(title = "Shoe Sizes", subtitle = "Exploring Medians",
       caption = "Humberto Flores", x = "shoe sizes") +
  theme_minimal() +
  theme(legend.position = "none")

  1. Compute the difference between the means from before and after including the outlier. Compute the difference between the medians from before and after including the outlier.
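#diff() takes lagged differences within a single vector, so subtract the two values directly
#the results depend on the random sample; the mean difference equals (outlier - mean(without_outlier)) / (N + 1)
mean(with_outlier) - mean(without_outlier)
median(with_outlier) - median(without_outlier)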
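The effect of a single extreme value on each statistic can be checked by hand on a small fixed sample. The sketch below is illustrative only: small_sample is a hypothetical set of eight shoe sizes, not the random sample from this exploration, chosen so the arithmetic is easy to follow.

```r
# Hypothetical fixed sample of eight shoe sizes (sum = 80, so the mean is 10)
small_sample <- c(6, 8, 9, 10, 10, 11, 12, 14)
outlier <- 100
small_with_outlier <- c(small_sample, outlier)

# Mean: (80 + 100) / 9 = 20, so the outlier raises the mean by 10
mean_shift <- mean(small_with_outlier) - mean(small_sample)
mean_shift  # 10

# The same shift from the formula (outlier - old mean) / (n + 1)
(outlier - mean(small_sample)) / (length(small_sample) + 1)  # 10

# Median: the middle of the sorted values is 10 both before and after
median_shift <- median(small_with_outlier) - median(small_sample)
median_shift  # 0
```

In this example the mean doubles while the median stays put, which is the usual reason the median is preferred when a data set may contain extreme values.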
  1. Make at least two observations about the results (your new graphs) in an unordered list.
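  • The outlier pulls the mean upward by (100 - mean(without_outlier)) / 33, roughly 2.7 shoe sizes when the original mean is near 10, while the median barely moves (it shifts by half the gap between the two middle values of the original sample, typically 0 or 0.5)
  • Including the outlier also stretches the x-axis out to 100, so the histogram of the regular shoe sizes is squeezed against the left edge of the graph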
  1. Finally, knit your work as either an HTML or PDF document, and upload that document back into the CatCourses assignment.