Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with that of R’s mean and std functions. Then, consider that instead of being able to neatly fit the values in memory into an array, you have an infinite stream of numbers coming by. How would you estimate the mean and standard deviation of such a stream? Your function should be able to return the current estimate of the mean and standard deviation at any time it is asked. Your program should maintain these current estimates and return them back at any invocation of these functions.
(Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values).
\[\bar { x } =\cfrac { 1 }{ n } \sum { X }\]
expected_value <- function(x) {
total <- sum(x)
count <- length(x)
expected_value <- sum(x) / length(x)
return(expected_value)
}
x <- array(rnorm(10^3))
x_mean <- expected_value(x); x_mean
## [1] 0.05925143
all.equal(mean(x),x_mean)
## [1] TRUE
The R sd function calculates the Standard Error which instead of \(n\), uses an \(n-1\) Bessel correction. \[s=\sqrt { \cfrac { 1 }{ n-1 } \sum { { \left( X-\bar { X } \right) }^{ 2 } } } =\sqrt { \cfrac { 1 }{ n-1 } \left( \sum { { X }^{ 2 } } -n{ \bar { X } }^{ 2 } \right) \quad } \]
standard_error <- function(x) {
n <- length(x)
x_mean <- expected_value(x)
numerator <- sum(x^2) - n * mean(x)^2
radicand <- numerator / (n - 1)
standard_error <- sqrt(radicand)
return(standard_error)
}
x_sd <- standard_error(x); x_sd
## [1] 0.9425413
all.equal(sd(x),x_sd)
## [1] TRUE
Return the current estimate of the mean and standard deviation at any time it is asked. Maintain these current estimates and return them back at any invocation of these functions.
library(rstream)
The package rstream provides multiple streams of uniform random numbers.
Given a sequence \({ \left( a_{ i } \right) }_{ i=1 }^{ N }\), an \(n\)-moving average is a new sequence \({ \left( s_{ i } \right) }_{ i=1 }^{ N-n+1 }\) defined from the \(a_{ i }\) by taking the arithmetic mean of subsequences of \(n\) terms, such that \(s_{ i }=\frac { 1 }{ n } \sum _{ j=1 }^{ i+n-1 }{ a_{ j } }\). So the sequences \(S_{ n }\) giving \(n\)-moving averages are:
\[S_{ 2 }=\frac { 1 }{ 2 } \left( a_{ 1 }+a_{ 2 },\quad a_{ 2 }+a_{ 3 },\quad ...,\quad a_{ n-1 }+a_{ n } \right)\] \[S_{ 3 }=\frac { 1 }{ 3 } \left( a_{ 1 }+a_{ 2 }+a_{ 3 },\quad a_{ 2 }+a_{ 3 }+a_{ 4 },\quad ...,\quad a_{ n-2 }+a_{ n-1 }+a_{ n } \right)\] \[\vdots\] \[S_{ k }=\frac { 1 }{ k } \left( a_{ 1 }+\cdots +a_{ k },\quad a_{ 2 }+\cdots +a_{ k+1 },\quad ...,\quad a_{ n-k }+\cdots +a_{ k+n } \right)\]
A moving standard error flows the same divide and calculate pattern.
moving_statistics <- function(xt, n) {
N <- length(xt)
i <- N - n + 1
mu_moving <- se_moving <- numeric(i)
a <- seq(1, i, 1)
b <- seq(n, N, 1)
for (j in 1:i) {
mu_moving[j] = expected_value(x[a[j]:b[j]])
se_moving[j] = standard_error(x[a[j]:b[j]])
}
moving <- data.frame(mu_moving, se_moving)
return(moving)
}
N = 10^3; n = 30
xt = rnorm(N)
head((x_MA <- moving_statistics(xt, n)))
## mu_moving se_moving
## 1 -0.1654873 0.9108045
## 2 -0.1496909 0.9069050
## 3 -0.1065249 0.9004941
## 4 -0.1282317 0.8943737
## 5 -0.1173102 0.8962090
## 6 -0.1339914 0.8835013
\[\bar { x } _{ 1,2 }=\frac { \bar { x } _{ 1 }+\bar { x } _{ 2 } }{ 2 }\]
\[\bar { x } =\frac { \sum { { \bar { x } _{ i } } } }{ n }\]
\[s_{ 1,2 }=\sqrt { \frac { \left[ (N_{ 1 }-1)s_{ t_{ i } }^{ 2 }+N_{ 1 }\mu _{ 1 } \right] +\left[ (N_{ 2 }-1)s_{ t_{ j } }^{ 2 }+N_{ 2 }\mu _{ 2 } \right] -\left[ \left( N_{ 1 }+N_{ 2 } \right) { \left( \frac { \bar { x } _{ 1 }+\bar { x } _{ 2 } }{ 2 } \right) }^{ 2 } \right] }{ N_{ 1 }+N_{ 2 }-1 } }\]
\[\bar { s } =\frac { \sum { \left[ (N_{ i }-1)s_{ i }^{ 2 }+N_{ i }\mu _{ i }^{ 2 } \right] } -\left[ \sum { N_{ i } } { \left( \frac { \sum { { \bar { x } _{ 1 } } } }{ n } \right) }^{ 2 } \right] }{ \sum { N } -1 }\]
rolling_statistics <- function(xt){
if(exists("x_Roll_Stats")) {
avg_past <- x_Roll_Stats[1,1]
avg_curr <- expected_value(xt)
avg_roll <- (avg_past + avg_curr) / 2
se_past <- x_Roll_Stats[1,3]
se_curr <- standard_error(xt)
var_past <- se_past^2
var_curr <- se_curr^2
ssq_past <- (N - 1) * var_past^2 + N * avg_past^2
ssq_curr <- (N - 1) * var_curr^2 + N * avg_curr^2
se_roll <- sqrt(((ssq_past + ssq_curr) - 2 * N * avg_roll^2) / (2 * N - 1))
} else {
avg_curr <- avg_roll <- expected_value(xt)
se_curr <- se_roll <- standard_error(xt)
}
rolling <- data.frame(avg_curr, avg_roll, se_curr, se_roll)
names(rolling) <- c("mu_Current", "mu_Rolling", "se_Current", "se_Rolling")
return(rolling)
}
# Check the function (Current = Rolling on first run).
xt <- rnorm(N)
(x_Roll_Stats <- rolling_statistics(xt))
## mu_Current mu_Rolling se_Current se_Rolling
## 1 0.004490588 0.004490588 1.029393 1.029393
# Iterate the function (Current = Rolling on first run).
rm(x_Roll_Stats); i = 0; check <- 10
while (i < check) {
xt <- rnorm(N)
if(exists("x_Roll_Stats")) {
x_Roll_Stats <- rolling_statistics(xt)
stats <- rbind(stats, x_Roll_Stats)
} else {
x_Roll_Stats <- rolling_statistics(xt)
stats <- x_Roll_Stats
}
i = i + 1
}; stats
## mu_Current mu_Rolling se_Current se_Rolling
## 1 0.027025707 0.027025707 0.9762791 0.9762791
## 2 0.007581022 0.017303365 0.9978300 0.9744297
## 3 0.067009090 0.037295056 1.0158877 1.0141913
## 4 0.006494774 0.036751932 0.9739829 0.9914266
## 5 -0.016686458 -0.005095842 1.0326228 1.0090068
## 6 0.009766281 -0.003460088 1.0075413 1.0408627
## 7 0.059138571 0.034452426 0.9854847 0.9934613
## 8 0.049660348 0.054399459 0.9811796 0.9667257
## 9 -0.052716478 -0.001528065 1.0199836 1.0033488
## 10 -0.044361514 -0.048538996 0.9992143 1.0193670