This week, you’ll have only one programming assignment. Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with that of R’s mean and std functions. Please document your work in an RMarkdown file and ensure that you have good comments to help the reader follow your work.
Now, consider that instead of being able to neatly fit the values in memory in an array, you have an infinite stream of numbers coming by. How would you estimate the mean and standard deviation of such a stream? Your function should be able to return the current estimate of the mean and standard deviation at any time it is asked. Your program should maintain these current estimates and return them back at any invocation of these functions. (Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values).
Sol:
# Expected value function
my_ev <- function(x){
m <- sum(x)/length(x)
return(m)
}
# Standard deviation function
my_sd <- function(x){
s <- sqrt(sum((x-my_ev(x))^2)/(length(x)-1))
return(s)
}
# Generate random array of values
set.seed(7)
x <- sample(100, 10, replace = FALSE, prob = NULL)
# Mean using function created
(my_ev_function <- my_ev(x))
## [1] 43.9
# Standard deviation using function
(my_sd_function <- my_sd(x))
## [1] 33.3315
# r mean
(r_mean <- mean(x))
## [1] 43.9
# r Standard deviation
(r_sd <- sd(x))
## [1] 33.3315
# Compare expected value/mean
my_ev_function == r_mean
## [1] TRUE
# Compare standard deviation
my_sd_function == r_sd
## [1] TRUE
stream_data <- function(x){
# Define variables
n <- 0
mean <- 0.0
M2 <- 0.0
for (i in x){
n <- n + 1
# Calculate mean of streaming data
d <- i - mean
mean <- mean + d/n
d2 <- i - mean
M2 <- M2 + d * d2
# Calculate standard deviation of streaming data
if (n < 2) {
sd <- 0
}else{
sd <- sqrt(M2/(n-1))
}
# Print the rolling mean and standard deviation for the global stream
result <- print (c("Length" = round(n), "Mean" = round(mean, 3), "Standard deviation" = round(sd, 3)))
}
}
# Test using the randomly generated array from previous section
test <- stream_data(x)
## Length Mean Standard deviation
## 1 99 0
## Length Mean Standard deviation
## 2.000 69.500 41.719
## Length Mean Standard deviation
## 3.000 50.333 44.411
## Length Mean Standard deviation
## 4.000 39.500 42.241
## Length Mean Standard deviation
## 5.000 36.400 37.233
## Length Mean Standard deviation
## 6.000 43.000 37.019
## Length Mean Standard deviation
## 7.000 41.429 34.048
## Length Mean Standard deviation
## 8.000 47.625 36.067
## Length Mean Standard deviation
## 9.000 44.111 35.346
## Length Mean Standard deviation
## 10.000 43.900 33.331