1. Problem set 1

Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with that of R’s mean and std functions. Then, consider that instead of being able to neatly fit the values in memory into an array, you have an infinite stream of numbers coming by. How would you estimate the mean and standard deviation of such a stream? Your function should be able to return the current estimate of the mean and standard deviation at any time it is asked. Your program should maintain these current estimates and return them back at any invocation of these functions.

(Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values).

(1.1) Expected Value.

\[\bar { x } =\cfrac { 1 }{ n } \sum { X }\]

expected_value <- function(x) {
  total <- sum(x)
  count <- length(x)
  expected_value <- sum(x) / length(x)
  return(expected_value)
}
x <- array(rnorm(10^3))
x_mean <- expected_value(x); x_mean

## [1] 0.05925143

all.equal(mean(x),x_mean)

## [1] TRUE

(1.2) Standard Deviation.

The R sd function calculates the Standard Error which instead of \(n\), uses an \(n-1\) Bessel correction. \[s=\sqrt { \cfrac { 1 }{ n-1 } \sum { { \left( X-\bar { X } \right) }^{ 2 } } } =\sqrt { \cfrac { 1 }{ n-1 } \left( \sum { { X }^{ 2 } } -n{ \bar { X } }^{ 2 } \right) \quad } \]

standard_error <- function(x) {
  n <- length(x)
  x_mean <- expected_value(x)
  numerator <- sum(x^2) - n * mean(x)^2
  radicand <- numerator / (n - 1)
  standard_error <- sqrt(radicand)
  return(standard_error)
}
x_sd <- standard_error(x); x_sd

## [1] 0.9425413

all.equal(sd(x),x_sd)

## [1] TRUE

(1.3) Estimate Mean and standard deviation of an infinite stream.

Return the current estimate of the mean and standard deviation at any time it is asked. Maintain these current estimates and return them back at any invocation of these functions.

library(rstream)

The package rstream provides multiple streams of uniform random numbers.

(1.3.a) Moving Statistics.

Given a sequence \({ \left( a_{ i } \right) }_{ i=1 }^{ N }\), an \(n\)-moving average is a new sequence \({ \left( s_{ i } \right) }_{ i=1 }^{ N-n+1 }\) defined from the \(a_{ i }\) by taking the arithmetic mean of subsequences of \(n\) terms, such that \(s_{ i }=\frac { 1 }{ n } \sum _{ j=1 }^{ i+n-1 }{ a_{ j } }\). So the sequences \(S_{ n }\) giving \(n\)-moving averages are:

\[S_{ 2 }=\frac { 1 }{ 2 } \left( a_{ 1 }+a_{ 2 },\quad a_{ 2 }+a_{ 3 },\quad ...,\quad a_{ n-1 }+a_{ n } \right)\] \[S_{ 3 }=\frac { 1 }{ 3 } \left( a_{ 1 }+a_{ 2 }+a_{ 3 },\quad a_{ 2 }+a_{ 3 }+a_{ 4 },\quad ...,\quad a_{ n-2 }+a_{ n-1 }+a_{ n } \right)\] \[\vdots\] \[S_{ k }=\frac { 1 }{ k } \left( a_{ 1 }+\cdots +a_{ k },\quad a_{ 2 }+\cdots +a_{ k+1 },\quad ...,\quad a_{ n-k }+\cdots +a_{ k+n } \right)\]

A moving standard error flows the same divide and calculate pattern.

moving_statistics <- function(xt, n) {
  N <- length(xt)
  i <- N - n + 1
  mu_moving <- se_moving <- numeric(i)
  a <- seq(1, i, 1)
  b <- seq(n, N, 1)
  for (j in 1:i) {
    mu_moving[j] = expected_value(x[a[j]:b[j]])
    se_moving[j] = standard_error(x[a[j]:b[j]])
    }
  moving <- data.frame(mu_moving, se_moving)
  return(moving)
}
N = 10^3; n = 30
xt = rnorm(N)
head((x_MA <- moving_statistics(xt, n)))

##    mu_moving se_moving
## 1 -0.1654873 0.9108045
## 2 -0.1496909 0.9069050
## 3 -0.1065249 0.9004941
## 4 -0.1282317 0.8943737
## 5 -0.1173102 0.8962090
## 6 -0.1339914 0.8835013

(1.3.b) Rolling Window Statistics

\[\bar { x } _{ 1,2 }=\frac { \bar { x } _{ 1 }+\bar { x } _{ 2 } }{ 2 }\]

\[\bar { x } =\frac { \sum { { \bar { x } _{ i } } } }{ n }\]

\[s_{ 1,2 }=\sqrt { \frac { \left[ (N_{ 1 }-1)s_{ t_{ i } }^{ 2 }+N_{ 1 }\mu _{ 1 } \right] +\left[ (N_{ 2 }-1)s_{ t_{ j } }^{ 2 }+N_{ 2 }\mu _{ 2 } \right] -\left[ \left( N_{ 1 }+N_{ 2 } \right) { \left( \frac { \bar { x } _{ 1 }+\bar { x } _{ 2 } }{ 2 } \right) }^{ 2 } \right] }{ N_{ 1 }+N_{ 2 }-1 } }\]

\[\bar { s } =\frac { \sum { \left[ (N_{ i }-1)s_{ i }^{ 2 }+N_{ i }\mu _{ i }^{ 2 } \right] } -\left[ \sum { N_{ i } } { \left( \frac { \sum { { \bar { x } _{ 1 } } } }{ n } \right) }^{ 2 } \right] }{ \sum { N } -1 }\]

rolling_statistics <- function(xt){
  if(exists("x_Roll_Stats")) {
    avg_past <- x_Roll_Stats[1,1]
    avg_curr <- expected_value(xt)
    avg_roll <- (avg_past + avg_curr) / 2
    se_past <- x_Roll_Stats[1,3]
    se_curr <- standard_error(xt)
    var_past <- se_past^2
    var_curr <- se_curr^2
    ssq_past <- (N - 1) * var_past^2 + N * avg_past^2
    ssq_curr <- (N - 1) * var_curr^2 + N * avg_curr^2
    se_roll <- sqrt(((ssq_past + ssq_curr) - 2 * N * avg_roll^2) / (2 * N - 1))
    } else {
    avg_curr <- avg_roll <- expected_value(xt)
    se_curr <- se_roll <- standard_error(xt)
    }
  rolling <- data.frame(avg_curr, avg_roll, se_curr, se_roll)
  names(rolling) <- c("mu_Current", "mu_Rolling", "se_Current", "se_Rolling")
  return(rolling)
}
# Check the function (Current = Rolling on first run).
xt <- rnorm(N)
(x_Roll_Stats <- rolling_statistics(xt))

##    mu_Current  mu_Rolling se_Current se_Rolling
## 1 0.004490588 0.004490588   1.029393   1.029393

# Iterate the function (Current = Rolling on first run).
rm(x_Roll_Stats); i = 0; check <- 10
while (i < check) {
  xt <- rnorm(N)
  if(exists("x_Roll_Stats")) {
  x_Roll_Stats <- rolling_statistics(xt)
  stats <- rbind(stats, x_Roll_Stats)
  } else {
  x_Roll_Stats <- rolling_statistics(xt)
  stats <- x_Roll_Stats  
  }
  i = i + 1
}; stats

##      mu_Current   mu_Rolling se_Current se_Rolling
## 1   0.027025707  0.027025707  0.9762791  0.9762791
## 2   0.007581022  0.017303365  0.9978300  0.9744297
## 3   0.067009090  0.037295056  1.0158877  1.0141913
## 4   0.006494774  0.036751932  0.9739829  0.9914266
## 5  -0.016686458 -0.005095842  1.0326228  1.0090068
## 6   0.009766281 -0.003460088  1.0075413  1.0408627
## 7   0.059138571  0.034452426  0.9854847  0.9934613
## 8   0.049660348  0.054399459  0.9811796  0.9667257
## 9  -0.052716478 -0.001528065  1.0199836  1.0033488
## 10 -0.044361514 -0.048538996  0.9992143  1.0193670

References

http://mathworld.wolfram.com/MovingAverage.html

http://jonisalonen.com/2014/efficient-and-accurate-rolling-standard-deviation/

DATA 605 Assignment 7

Jose Zuniga