Problem Set

Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with that of R’s mean and std functions. Please document your work in an RMarkdown file and ensure that you have good comments to help the reader follow your work.

Now, consider that instead of being able to neatly fit the values in memory in an array, you have an infinite stream of numbers coming by. How would you estimate the mean and standard deviation of such a stream? Your function should be able to return the current estimate of the mean and standard deviation at any time it is asked. Your program should maintain these current estimates and return them back at any invocation of these functions.

(Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values).

library(zoo)
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# create mean function
meanFun <- function(x) {
  sum(x)/length(x)
}

# compare meanFun() and mean()
a <- c(0.1, 0.34, 0.004)
mean(a)
## [1] 0.148
meanFun(a)
## [1] 0.148
# create sd function
sdFun <- function(x) {
  sqrt(sum((x - meanFun(x))^2)/(length(x) - 1))
}

# comparing the function sdFun() and sd()
sdFun(a)
## [1] 0.1730665
sd(a)
## [1] 0.1730665
# rolling mean and sd
rollingFun <- function(vec, width, FUN) 
    sapply(seq_along(vec), function(i) if (i < width) NA else FUN(vec[i:(i-width+1)]))

set.seed(1)
vec <- sample(1:50, 50)

# testing rolling mean
rollingFun(vec, 1, meanFun)
##  [1] 14 19 28 43 10 41 42 29 27  3  9  7 44 15 48 18 25 33 13 34 47 39 49
## [24]  4 30 46  1 40 20  8 31 12 23 38 50 11 36  2 35  5 16  6 26 17 21 22
## [47] 24 32 45 37
rollapply(vec, 1, mean)
##  [1] 14 19 28 43 10 41 42 29 27  3  9  7 44 15 48 18 25 33 13 34 47 39 49
## [24]  4 30 46  1 40 20  8 31 12 23 38 50 11 36  2 35  5 16  6 26 17 21 22
## [47] 24 32 45 37
# testing rolling sd
rollingFun(vec, 2, sdFun)
##  [1]         NA  3.5355339  6.3639610 10.6066017 23.3345238 21.9203102
##  [7]  0.7071068  9.1923882  1.4142136 16.9705627  4.2426407  1.4142136
## [13] 26.1629509 20.5060967 23.3345238 21.2132034  4.9497475  5.6568542
## [19] 14.1421356 14.8492424  9.1923882  5.6568542  7.0710678 31.8198052
## [25] 18.3847763 11.3137085 31.8198052 27.5771645 14.1421356  8.4852814
## [31] 16.2634560 13.4350288  7.7781746 10.6066017  8.4852814 27.5771645
## [37] 17.6776695 24.0416306 23.3345238 21.2132034  7.7781746  7.0710678
## [43] 14.1421356  6.3639610  2.8284271  0.7071068  1.4142136  5.6568542
## [49]  9.1923882  5.6568542
rollapply(vec, 2, sd)
##  [1]  3.5355339  6.3639610 10.6066017 23.3345238 21.9203102  0.7071068
##  [7]  9.1923882  1.4142136 16.9705627  4.2426407  1.4142136 26.1629509
## [13] 20.5060967 23.3345238 21.2132034  4.9497475  5.6568542 14.1421356
## [19] 14.8492424  9.1923882  5.6568542  7.0710678 31.8198052 18.3847763
## [25] 11.3137085 31.8198052 27.5771645 14.1421356  8.4852814 16.2634560
## [31] 13.4350288  7.7781746 10.6066017  8.4852814 27.5771645 17.6776695
## [37] 24.0416306 23.3345238 21.2132034  7.7781746  7.0710678 14.1421356
## [43]  6.3639610  2.8284271  0.7071068  1.4142136  5.6568542  9.1923882
## [49]  5.6568542

Using last computed E(X) and and the new value to compute rolling mean and rolling standard deviation

# rolling mean
rolling.mean <- function(x) {
  n <- length(x)
  (x[n] + meanFun(x[1:n-1]) * (n - 1))/n
}

# test
rolling.mean(a)
## [1] 0.148
mean(a)
## [1] 0.148
# rolling sd
# sd = sqrt(E(X^2) - E(X)^2)
rolling.sd <- function(x) {
n = length(x)
sqrt(n * sum(x^2) - sum(x)^2)/n
}

# test running sd. The test fails as the comparisons of the below 2 functions are not equal. 
rolling.sd(a)
## [1] 0.1413082
sd(a)
## [1] 0.1730665

Reference:

http://math.stackexchange.com/questions/102978/incremental-computation-of-standard-deviation

https://www.r-bloggers.com/using-r-for-introductory-statistics-chapter-5/

https://www.mathsisfun.com/data/standard-deviation-formulas.html

https://www.johndcook.com/blog/standard_deviation/