1. Problem Set 1

Write a function to compute the expected value and standard deviation of an array of values.
Compare your results with those of R’s mean and sd functions.

Solution:

# Expected value: sum(x) / length(x)
exp_value <- function(v) {
  # The mean of an empty vector is undefined, so return NA explicitly
  if (length(v) == 0) return(NA_real_)
  sum(v) / length(v)
}
# validating the mean using mean()
a<- c(1,2,3,4,5)
(exp_value(a) == mean(a))
## [1] TRUE
Using the formula for the standard deviation of a sample drawn from a population:
$$ s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}} $$
where:
$s$ = the sample standard deviation
$x_i$ = each value in the sample
$\bar{x}$ = the mean of the values
$N$ = the sample size
std_deviation <- function(v) {
  # The sample standard deviation needs at least two values (divisor is N - 1)
  if (length(v) < 2) return(NA_real_)
  sqrt(sum((v - exp_value(v))^2) / (length(v) - 1))
}

# validating the standard deviation using sd()
a<- c(1,2,3,4,5)
(std_deviation(a) == sd(a))
## [1] TRUE
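
As a quick sanity check of the sample standard deviation formula, the calculation for a can be traced one step at a time. This is only an illustrative sketch; the intermediate names (m, dev, ss, s) are introduced here and are not part of the original solution.

```r
# Worked example for a = 1, 2, 3, 4, 5 (intermediate names are illustrative)
a   <- c(1, 2, 3, 4, 5)
m   <- sum(a) / length(a)          # mean: 15 / 5 = 3
dev <- a - m                       # deviations from the mean: -2 -1 0 1 2
ss  <- sum(dev^2)                  # sum of squared deviations: 4 + 1 + 0 + 1 + 4 = 10
s   <- sqrt(ss / (length(a) - 1))  # sqrt(10 / 4) = sqrt(2.5), about 1.581139
```

The final value matches the result that sd(a) reports for the same vector.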

Now, consider that instead of being able to neatly fit the values in memory in an array,
you have an infinite stream of numbers coming by. How would you estimate the mean and
standard deviation of such a stream? Your function should be able to return the current
estimate of the mean and standard deviation at any time it is asked. Your program should
maintain these current estimates and return them back at any invocation of these functions.
(Hint: You can maintain a rolling estimate of the mean and standard deviation and allow
these to slowly change over time as you see more and more new values).

# Initialize the global variable that holds the stream data
stream <- NA

rollingfunc <- function(x) {

  # On the first call, drop the NA placeholder so the stream starts empty
  if (is.na(stream[1])) stream <<- stream[-1]

  # Append the new values to the global stream
  stream <<- c(stream, x)

  # Print the global stream, from the first value to the most recent one
  print(data.frame(stream))

  # Return the mean and standard deviation of every value seen so far
  return(data.frame(mean = exp_value(stream),
                    std = std_deviation(stream)))
}
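
Because rollingfunc keeps every value it has seen, its memory use grows with the stream, which cannot work for a truly infinite stream. One common alternative is Welford's online algorithm, which keeps only a count, a running mean, and a running sum of squared deviations. The sketch below is an assumption about how that could look here, not part of the original solution; make_running_stats and the names n, mu, and m2 are hypothetical.

```r
# Hypothetical constant-memory alternative using Welford's online algorithm.
make_running_stats <- function() {
  n  <- 0   # number of values seen so far
  mu <- 0   # running mean
  m2 <- 0   # running sum of squared deviations from the mean

  # Fold a chunk of new values into the running state, one value at a time
  update <- function(x) {
    for (xi in x) {
      n     <<- n + 1
      delta <- xi - mu                 # deviation from the old mean
      mu    <<- mu + delta / n         # updated mean
      m2    <<- m2 + delta * (xi - mu) # second factor uses the updated mean
    }
    invisible(NULL)
  }

  # Current estimates; the sample standard deviation needs at least two values
  current <- function() {
    data.frame(mean = if (n > 0) mu else NA_real_,
               std  = if (n > 1) sqrt(m2 / (n - 1)) else NA_real_)
  }

  list(update = update, current = current)
}

# Usage: feed the stream in chunks and ask for the estimates at any time
rs <- make_running_stats()
rs$update(c(1, 2, 3, 4, 5))
rs$update(c(10, 55, 22))
est <- rs$current()
```

For the same input, est should agree with mean() and sd() of the combined values up to floating-point rounding, so a comparison with all.equal() is safer than ==.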

Testing

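# Reset the global stream before testing
stream <- NA

a <- c(1, 2, 3, 4, 5)
b <- c(10, 55, 22)
c <- c(11, 22, 33, 44, 55, 10, 55, 22, 66)
d <- 1:20

# e is the concatenation of everything that will be streamed
e <- c(a, b, c, d)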


# initial test 
rollingfunc(a)
##   stream
## 1      1
## 2      2
## 3      3
## 4      4
## 5      5
##   mean      std
## 1    3 1.581139
mean(a)
## [1] 3
sd(a)
## [1] 1.581139
rollingfunc(b)
##   stream
## 1      1
## 2      2
## 3      3
## 4      4
## 5      5
## 6     10
## 7     55
## 8     22
##    mean      std
## 1 12.75 18.37506
rollingfunc(c)
##    stream
## 1       1
## 2       2
## 3       3
## 4       4
## 5       5
## 6      10
## 7      55
## 8      22
## 9      11
## 10     22
## 11     33
## 12     44
## 13     55
## 14     10
## 15     55
## 16     22
## 17     66
##       mean      std
## 1 24.70588 22.23107
k<- rollingfunc(d)
##    stream
## 1       1
## 2       2
## 3       3
## 4       4
## 5       5
## 6      10
## 7      55
## 8      22
## 9      11
## 10     22
## 11     33
## 12     44
## 13     55
## 14     10
## 15     55
## 16     22
## 17     66
## 18      1
## 19      2
## 20      3
## 21      4
## 22      5
## 23      6
## 24      7
## 25      8
## 26      9
## 27     10
## 28     11
## 29     12
## 30     13
## 31     14
## 32     15
## 33     16
## 34     17
## 35     18
## 36     19
## 37     20
# Final check: e is the concatenation of a, b, c, and d, so its mean and sd should match the stream's
mean(e) == k$mean
## [1] TRUE
sd(e) == k$std
## [1] TRUE