Problem set 1

This week, you’ll have only one programming assignment. Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with those of R’s mean and sd functions. Please document your work in an R-Markdown file and ensure that you have good comments to help the reader follow your work.

#random sample
x = sample(1:12, 50, replace = TRUE)
x
##  [1]  9  4 11  6  8  8  7  3 12  5  2  4 12  6  8  4  8  7  4  3  6  3  2
## [24]  2  9  5  2  4 12 12  3  4  1 12  6  1  7  9  3  5 10  5  7  4 10  2
## [47]  9  7  4  4
#treating every element as equally likely, the expected value equals the arithmetic mean of the array

meanB <- function(input) {
 calc <- sum(input)/length(input)
 return (calc)
}

#compare
meanB(x)
## [1] 6.02
mean(x)
## [1] 6.02
#sample sd is the sqrt of the sum of squared differences from the mean, divided by (n - 1)
sdB <- function(input) {
  calc <- sqrt(sum((input-meanB(input))^2)/(length(input)-1))
  return (calc)
}

#compare
sdB(x)
## [1] 3.222941
sd(x)
## [1] 3.222941
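
The printed values agree to the digits shown. As a small sketch of a programmatic comparison, the check below uses all.equal(), which compares within a numerical tolerance. The seed and the variable names y, mean_matches and sd_matches are assumptions added for illustration, and the two functions are repeated so the block runs on its own.

```r
# Copies of the two functions defined above, so this block is self-contained
meanB <- function(input) sum(input) / length(input)
sdB <- function(input) sqrt(sum((input - meanB(input))^2) / (length(input) - 1))

set.seed(42)  # arbitrary seed, assumed here only for reproducibility
y <- sample(1:12, 50, replace = TRUE)

# all.equal() tolerates tiny floating-point differences; isTRUE() turns its
# result into a plain TRUE/FALSE
mean_matches <- isTRUE(all.equal(meanB(y), mean(y)))
sd_matches <- isTRUE(all.equal(sdB(y), sd(y)))

mean_matches
sd_matches
```

Both checks should come back TRUE, because sd() also uses the (n - 1) denominator.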

Now, consider that instead of being able to neatly fit the values in memory in an array, you have an infinite stream of numbers arriving one at a time. How would you estimate the mean and standard deviation of such a stream? Your functions should be able to return the current estimate of the mean and standard deviation whenever they are called, so your program should maintain these estimates as values arrive. (Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values.)
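
One standard way to keep such estimates without storing past values is an incremental update, usually attributed to Welford. It is offered here as one possible approach, not necessarily the one the assignment intends. The symbols are assumptions introduced for this sketch: x_n is the n-th value, \bar{x}_n is the mean of the first n values, M_n is the sum of squared deviations of the first n values from \bar{x}_n, and s_n is the sample standard deviation, with \bar{x}_0 = 0 and M_0 = 0.

```latex
\begin{align}
\bar{x}_n &= \bar{x}_{n-1} + \frac{x_n - \bar{x}_{n-1}}{n} \\
M_n &= M_{n-1} + (x_n - \bar{x}_{n-1})(x_n - \bar{x}_n) \\
s_n &= \sqrt{\frac{M_n}{n-1}}, \qquad n \ge 2
\end{align}
```

Only n, \bar{x}_n and M_n need to be kept between values, so memory use stays constant however long the stream runs.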

library(zoo)
## Warning: package 'zoo' was built under R version 3.3.3
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
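roller <- function(valuerange, n) {
  #draws n samples from valuerange and returns the running mean & sd after each value

  #simulates the stream: n draws with replacement
  array <- sample(valuerange, n, replace = TRUE)

  #running mean: width = length(array) with partial=TRUE makes every window span
  #element 1 through the current element, so this is a cumulative (not sliding) mean
  meanC <- rollapplyr(array, length(array), meanB, partial=TRUE)
  #running sd over the same expanding window (NaN for the first value, since n - 1 = 0)
  sdC <- rollapplyr(array, length(array), sdB, partial=TRUE)
  df <- data.frame(meanC, sdC, array)
  return (df)
}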

x <- roller(1:12, 300)
head(x, 20)
##       meanC      sdC array
## 1  2.000000      NaN     2
## 2  5.500000 4.949747     9
## 3  7.000000 4.358899    10
## 4  6.000000 4.082483     3
## 5  5.800000 3.563706     5
## 6  6.166667 3.311596     8
## 7  6.428571 3.101459     8
## 8  6.750000 3.011881     9
## 9  6.777778 2.818589     7
## 10 6.200000 3.224903     1
## 11 6.363636 3.107176     8
## 12 6.083333 3.117643     3
## 13 5.692308 3.301126     1
## 14 5.428571 3.321591     2
## 15 5.333333 3.221949     4
## 16 5.125000 3.222318     2
## 17 5.352941 3.258473     9
## 18 5.500000 3.222166     8
## 19 5.473684 3.133483     5
## 20 5.400000 3.067658     4
tail(x, 20)
##        meanC      sdC array
## 281 6.306050 3.444291     2
## 282 6.322695 3.449501    11
## 283 6.332155 3.447055     9
## 284 6.313380 3.455476     1
## 285 6.329825 3.460540    11
## 286 6.342657 3.461274    10
## 287 6.362369 3.471318    12
## 288 6.350694 3.470924     3
## 289 6.349481 3.464955     6
## 290 6.358621 3.462455     9
## 291 6.353952 3.457397     5
## 292 6.356164 3.451659     7
## 293 6.341297 3.455128     2
## 294 6.350340 3.452711     9
## 295 6.332203 3.460882     1
## 296 6.334459 3.455229     7
## 297 6.353535 3.465018    12
## 298 6.369128 3.469635    11
## 299 6.351171 3.477698     1
## 300 6.340000 3.477265     3
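
The roller function above generates the whole sample up front and recomputes each statistic over the full history, which would not work for an unbounded stream. A minimal sketch of a constant-memory alternative follows, using Welford's online algorithm rather than the zoo-based approach. The names make_stream_stats, update, current_mean and current_sd are hypothetical and not part of the original solution, and the seed is assumed only for reproducibility.

```r
# Hypothetical helper: returns closures that share running state,
# so no past values need to be stored.
make_stream_stats <- function() {
  n <- 0    # number of values seen so far
  mu <- 0   # running mean
  m2 <- 0   # running sum of squared deviations from the mean

  # Welford update: fold one new value into the running state
  update <- function(value) {
    n <<- n + 1
    delta <- value - mu               # deviation from the old mean
    mu <<- mu + delta / n
    m2 <<- m2 + delta * (value - mu)  # second factor uses the updated mean
    invisible(NULL)
  }

  # Current estimates; NA until enough values have been seen
  current_mean <- function() if (n > 0) mu else NA_real_
  current_sd <- function() if (n > 1) sqrt(m2 / (n - 1)) else NA_real_

  list(update = update, mean = current_mean, sd = current_sd)
}

# Simulated stream: values are fed in one at a time
set.seed(1)  # arbitrary seed, assumed for reproducibility
values <- sample(1:12, 300, replace = TRUE)
stats <- make_stream_stats()
for (v in values) stats$update(v)

# The estimates can be requested at any point in the stream
stats$mean()
stats$sd()

# Sanity check against the built-in functions on the stored sample
all.equal(stats$mean(), mean(values))
all.equal(stats$sd(), sd(values))
```

Each call to update does a fixed amount of work, and the estimates match what mean() and sd() would give on all values seen so far. If older values should gradually lose influence, as the hint suggests, an exponentially weighted variant would be a different design choice.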