DATA 605 Discussion 9

R Function

simulate <- function( simulations, samples ){
  
basket <- c(0,1,2,3,4,5,6,7,8,9)
stdDev <- sd(basket) # standard deviation
expected <- mean(basket) # mean
avgs <- rep(0, simulations) # store averages 
passed <- 0 # track passed tests


for(i in 1:simulations){
  trial <- sample(basket, size = samples, replace = TRUE)
  avg <- mean(trial)
  avgs[i] <- avg
  if(avg >= ( expected - 3*stdDev ) & avg <= ( expected + 3*stdDev ) ){
    passed <- passed + 1
  }
}
 
print( paste0("The average was within 3 standard deviations of the mean ", passed, " of ", simulations, " times."))

results <- data.frame(n = avgs)

ggplot(results, aes(n)) +
  geom_histogram(fill="steelblue", stat = "bin", bins=200) +
  labs( title="Histogram of final simulation", x= "Mean", y = "Frequncy") +
  geom_vline(xintercept = c(expected), linetype="dashed", color = "red") + 
  scale_x_continuous(limits=c(3, 6), expand = c(0, 0)) +
  theme_minimal()
}

A Single Simulation

simulate(1, 1000)

## [1] "The average was within 3 standard deviations of the mean 1 of 1 times."

As evident above, the simulation was within three standard deviations of the mean. Infact, it was within 1 standard deviation of the mean. Because were are taking 1000 samples this is not really surprising. Let’s see what the passing rate will be when doing 1000 simulations instead of one.

1000 Simulations

simulate(1000, 1000)

## [1] "The average was within 3 standard deviations of the mean 1000 of 1000 times."

Having a 1000 simulations, the histogram is centered very close to the expected value of 4.5. This is an evident improvement over doing a single simulation. It should also be noted that 100% percent of the means fell within 3 standard deviations of the mean. This distribution now resembles a normal distribution (thanks to the Central Limit Theorem) and it is expected that 99.7% of the data will be within 3 standard deviations of the mean, so this is not surprising. However, having a standard deviation of approximately 3.027 and a maximum value of 9, it seems impossible to get a value greater than 3 standard deviations of the mean. The same goes for the left side of the normal curve because 3 standard deviations below the mean of 4.5 will result in a negative value, which is impossible with a minimum value of 0 in the data set

DATA 605 Discussion 9

Albert Gilharry

October 24, 2018