This week, you’ll have only one programming assignment. Please write a function to compute the expected value and standard deviation of an array of values. Compare your results with those of R’s mean and sd functions. Please document your work in an R Markdown file and ensure that you have good comments to help the reader follow your work.
#random sample
x = sample(1:12, 50, replace = TRUE)
x
## [1] 9 4 11 6 8 8 7 3 12 5 2 4 12 6 8 4 8 7 4 3 6 3 2
## [24] 2 9 5 2 4 12 12 3 4 1 12 6 1 7 9 3 5 10 5 7 4 10 2
## [47] 9 7 4 4
# The expected value is estimated by the arithmetic mean of the array.
meanB <- function(input) {
  calc <- sum(input) / length(input)
  return(calc)
}
#compare
meanB(x)
## [1] 6.02
mean(x)
## [1] 6.02
# Sample sd: square root of the sum of squared differences from the mean,
# divided by n - 1 (the same denominator that R's sd uses).
sdB <- function(input) {
  calc <- sqrt(sum((input - meanB(input))^2) / (length(input) - 1))
  return(calc)
}
#compare
sdB(x)
## [1] 3.222941
sd(x)
## [1] 3.222941
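For reference, the two functions above correspond to the usual sample statistics. The symbols $x_1, \dots, x_n$ (the array values), $\bar{x}$ and $s$ are introduced here for illustration only:

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$

Dividing by $n - 1$ rather than $n$ is what makes sdB agree with R's sd.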
Now, consider that instead of being able to neatly fit the values in memory in an array, you have an infinite stream of numbers coming by. How would you estimate the mean and standard deviation of such a stream? Your function should be able to return the current estimate of the mean and standard deviation at any time it is asked. Your program should maintain these current estimates and return them back at any invocation of these functions. (Hint: You can maintain a rolling estimate of the mean and standard deviation and allow these to slowly change over time as you see more and more new values).
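One way to read the hint about estimates that "slowly change over time" is an exponentially weighted mean and standard deviation. The sketch below is only an illustration of that reading, not the approach taken in the rest of this write-up: `make_ew_stats` and its `alpha` parameter (the weight given to each new value, defaulting to an arbitrary 0.05) are hypothetical names and choices.

```r
# Hypothetical helper: exponentially weighted running mean and sd.
# alpha is an assumed tuning parameter, the weight given to each new value.
make_ew_stats <- function(alpha = 0.05) {
  stopifnot(alpha > 0, alpha <= 1)
  initialised <- FALSE
  mu <- NA_real_   # exponentially weighted mean
  s2 <- NA_real_   # exponentially weighted variance

  # Fold one new value into the estimates; older values fade geometrically.
  update <- function(value) {
    if (!initialised) {
      mu <<- value
      s2 <<- 0
      initialised <<- TRUE
    } else {
      delta <- value - mu
      mu <<- mu + alpha * delta
      s2 <<- (1 - alpha) * (s2 + alpha * delta^2)
    }
    invisible(NULL)
  }

  # Both estimates are NA until the first value has arrived.
  list(update = update,
       mean = function() mu,
       sd = function() sqrt(s2))
}

# Usage: feed values one at a time and query the estimates at any point.
ew <- make_ew_stats(alpha = 0.5)
for (value in c(0, 10)) ew$update(value)
ew$mean()
ew$sd()
```

A smaller alpha makes the estimates change more slowly; unlike a cumulative mean, this variant gradually forgets old values, which only helps if the stream's distribution is expected to drift.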
library(zoo)
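roller <- function(valuerange, n) {
  # Takes a range of values and a number of samples, and returns the
  # running mean and sd after each sample.
  # Draw the sample (held fully in memory)
  array <- sample(valuerange, n, replace = TRUE)
  # Running mean: with width = length(array) and partial = TRUE, each
  # right-aligned window covers every value seen so far
  meanC <- rollapplyr(array, length(array), meanB, partial = TRUE)
  # Running sd over the same expanding window (NaN for the first value)
  sdC <- rollapplyr(array, length(array), sdB, partial = TRUE)
  df <- data.frame(meanC, sdC, array)
  return(df)
}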
x <- roller(1:12, 300)
head(x, 20)
##       meanC      sdC array
## 1  2.000000      NaN     2
## 2  5.500000 4.949747     9
## 3  7.000000 4.358899    10
## 4  6.000000 4.082483     3
## 5  5.800000 3.563706     5
## 6  6.166667 3.311596     8
## 7  6.428571 3.101459     8
## 8  6.750000 3.011881     9
## 9  6.777778 2.818589     7
## 10 6.200000 3.224903     1
## 11 6.363636 3.107176     8
## 12 6.083333 3.117643     3
## 13 5.692308 3.301126     1
## 14 5.428571 3.321591     2
## 15 5.333333 3.221949     4
## 16 5.125000 3.222318     2
## 17 5.352941 3.258473     9
## 18 5.500000 3.222166     8
## 19 5.473684 3.133483     5
## 20 5.400000 3.067658     4
tail(x, 20)
##        meanC      sdC array
## 281 6.306050 3.444291     2
## 282 6.322695 3.449501    11
## 283 6.332155 3.447055     9
## 284 6.313380 3.455476     1
## 285 6.329825 3.460540    11
## 286 6.342657 3.461274    10
## 287 6.362369 3.471318    12
## 288 6.350694 3.470924     3
## 289 6.349481 3.464955     6
## 290 6.358621 3.462455     9
## 291 6.353952 3.457397     5
## 292 6.356164 3.451659     7
## 293 6.341297 3.455128     2
## 294 6.350340 3.452711     9
## 295 6.332203 3.460882     1
## 296 6.334459 3.455229     7
## 297 6.353535 3.465018    12
## 298 6.369128 3.469635    11
## 299 6.351171 3.477698     1
## 300 6.340000 3.477265     3
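The roller function above keeps the whole sample in memory and recomputes each estimate from scratch, which would not carry over to an unbounded stream. A minimal sketch of one alternative, Welford's online algorithm, is given below. It is a swapped-in technique rather than the method used above, and `make_stream_stats` and its field names are hypothetical. It stores only three numbers: the count, the running mean, and the running sum of squared deviations.

```r
# Hypothetical helper (not part of the original assignment code):
# a streaming estimator that keeps only three numbers in memory.
make_stream_stats <- function() {
  n <- 0    # number of values seen so far
  mu <- 0   # running mean
  m2 <- 0   # running sum of squared deviations from the mean

  # Fold one new value into the running estimates (Welford's update).
  update <- function(value) {
    n <<- n + 1
    delta <- value - mu              # deviation from the old mean
    mu <<- mu + delta / n
    m2 <<- m2 + delta * (value - mu) # uses the updated mean
    invisible(NULL)
  }

  # Current estimates; sd is NA until at least two values have arrived.
  current_mean <- function() if (n > 0) mu else NA_real_
  current_sd <- function() if (n > 1) sqrt(m2 / (n - 1)) else NA_real_

  list(update = update, mean = current_mean, sd = current_sd)
}

# Usage: feed values one at a time and query the estimates at any point.
stats <- make_stream_stats()
stream <- c(2, 9, 10, 3, 5)  # first five values from the table above
for (value in stream) stats$update(value)
stats$mean()
stats$sd()
```

Fed the first five values of the table above, this reproduces row 5: a mean of 5.8 and a standard deviation of about 3.563706. Each update costs constant time and memory, so the estimates can be queried at any point in the stream.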