INFO371 Lab 1: Explore Statistics

1. Variance of normal RV-s

1. pick the sample size n (1000 is a good number)

# sample size
n <- 1000

2. generate n standard normal variables. Use the R function rnorm. (Find out its usage yourself).

vars <- rnorm(n)

3. compute the mean of your sample X.

# calculate mean of sample
meanVal <- mean(vars)

mean of sample = -0.0123271

4. calculate the variance of your sample. Use the variance definition.

# calculate variance with variance definition
v1 <- mean((vars - meanVal)^2)

variance_1 = 1.0268284

5. next, calculate it by the shortcut formula $\mathrm{Var}\,X = E[X^2] - (E[X])^2$. Do you get the same results?

# calculate variance with shortcut formula
v2 <- mean(vars^2)-(meanVal^2)

variance_2 = 1.0268284

The result is the same as above!
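The agreement between the definition and the shortcut formula can also be checked programmatically rather than by eye. The following is a minimal sketch, assuming a fresh sample and an arbitrary seed (371, chosen only for reproducibility); the names `x`, `v_def`, `v_short` and `same` are illustrative and not part of the lab code above.

```r
set.seed(371)                        # arbitrary seed, only for reproducibility
x <- rnorm(1000)                     # fresh sample of standard normals

v_def   <- mean((x - mean(x))^2)     # variance by the definition
v_short <- mean(x^2) - mean(x)^2     # variance by the shortcut formula

# all.equal compares with a small numerical tolerance instead of exact equality
same <- isTRUE(all.equal(v_def, v_short))
print(same)
```

The two formulas are algebraically identical, so any difference comes from floating-point rounding. The shortcut subtracts two numbers of similar size, so it can lose precision when the mean is large relative to the standard deviation; for standard normals this is not a concern.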

6. now compute variance using the var function. Figure out yourself how to use it.

# calculate variance with the var function
v3 <- var(vars)

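variance_3 ≈ 1.027856 (the inline value did not render; this figure is recovered as sdX^2 = 1.0138325^2, since sd is the square root of var)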
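The var function does not return exactly the same number as the two manual calculations, because it divides by n - 1 (the sample variance) while the definition above divides by n. A minimal sketch of that relationship, assuming a fresh sample with an arbitrary seed; `v_pop`, `v_sample` and `rescaled` are illustrative names:

```r
set.seed(371)                         # arbitrary seed, only for reproducibility
n <- 1000
x <- rnorm(n)

v_pop    <- mean((x - mean(x))^2)     # divides by n
v_sample <- var(x)                    # divides by n - 1

# rescaling the n-denominator variance recovers what var() reports
rescaled <- v_pop * n / (n - 1)
print(all.equal(v_sample, rescaled))
```

For n = 1000 the correction factor is 1000/999, so the two values differ only in the third decimal place.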

7. compute the standard deviation of the sample sd X (whichever way you want).

# calculate standard deviation of 1000 standard normal variables
sdX <- sd(vars)

standard deviation = 1.0138325

8. Finally, we do some inference: what percentage of your numbers fall outside of the range [X - 1.96 * sdX, X + 1.96 * sdX]?

# calculate lower and upper bounds of the range mean ± 1.96 sd (covers about 95% of the values)
lowerbound <- meanVal - 1.96 * sdX
upperbound <- meanVal + 1.96 * sdX

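The range runs from -1.9994387 to 1.9747845. It is not a confidence interval for the true mean: it is the sample mean plus or minus 1.96 sample standard deviations, which for normally distributed data covers about 95% of the individual values.

With a standard deviation of 1.0138325, I infer that about 5% of the values will fall outside the range of (-1.9994387, 1.9747845).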
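The share of values outside the range can be counted directly instead of inferred. This is a minimal sketch on a fresh sample with an arbitrary seed, so its output will not reproduce the numbers reported above; `lower`, `upper`, `outside` and `pct_outside` are illustrative names.

```r
set.seed(371)                          # arbitrary seed, only for reproducibility
x <- rnorm(1000)

lower <- mean(x) - 1.96 * sd(x)
upper <- mean(x) + 1.96 * sd(x)

# logical vector: TRUE for every value outside [lower, upper]
outside <- x < lower | x > upper

# the mean of a logical vector is the proportion of TRUE values
pct_outside <- 100 * mean(outside)
print(pct_outside)
```

Applied to the lab's own vector, the same expression would be `100 * mean(vars < lowerbound | vars > upperbound)`.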

9. Plot a histogram of your random numbers.

# create a histogram of the 1000 standard normal variables
# vertical lines mark the sample mean (red) and the mean ± 1.96 sd bounds (blue)
hist(vars)
abline(v = meanVal, col = "red")
abline(v = lowerbound, col = "blue")
abline(v = upperbound, col = "blue")

2. Variance of Means

1. pick a small sample size n, say n = 3. Pick the number of samples m, say m = 1000.

# assign sample size and number of sample values
n <- 3
m <- 1000

2. create m samples of size n of standard normals and store their corresponding means µ.

# create a vector of m sample means, each from a sample of size n
means <- sapply(1:m, function(a) mean(rnorm(n)))

3. find the variance of the sample of means. (Use whatever way you like).

# calculate variance of means
varMeans <- var(means)

4. repeat the previous with 100× larger sample size (e.g. n = 300). How large is the variance now?

# calculate new variance with a sample size 100 times larger than above
newMeans <- sapply(1:m, function(a) mean(rnorm(300)))
newVar <- var(newMeans)

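With a sample size 100 times larger, the variance is now 0.0035209. This is roughly 100 times smaller than the original variance (0.3223589); the ratio in this run is about 92. This agrees with theory: the variance of the mean of n independent standard normals is 1/n, which is about 0.333 for n = 3 and about 0.0033 for n = 300.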
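The simulated variances can be set side by side with the theoretical value 1/n. The following is a sketch under assumptions: it draws new samples with an arbitrary seed, so the figures will differ from those reported above, and the helper `var_of_means` is a hypothetical name introduced only for this illustration.

```r
set.seed(371)                          # arbitrary seed, only for reproducibility
m <- 1000                              # number of samples, as in the lab

# hypothetical helper: variance of m sample means, each from n standard normals
var_of_means <- function(n) var(replicate(m, mean(rnorm(n))))

sizes       <- c(3, 300)
simulated   <- sapply(sizes, var_of_means)
theoretical <- 1 / sizes               # Var(mean) = 1/n for standard normals

print(data.frame(n = sizes, simulated, theoretical))

# how many times smaller the variance is at n = 300 than at n = 3
ratio <- simulated[1] / simulated[2]
print(ratio)
```

With only m = 1000 simulated means the ratio fluctuates around 100 from run to run, which is why a single run can land a few units above or below it.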

5. how much smaller will be the standard deviation if the sample is 10× larger?

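# calculate and explore the relationship between standard deviation and sample size
# (descriptive names avoid masking the base function c)
sd3 <- sd(means)         # n = 3
sd300 <- sd(newMeans)    # n = 300

# standard deviations of m sample means for n = 10, 100 and 1000
sd10 <- sd(sapply(1:m, function(i) mean(rnorm(10))))
sd100 <- sd(sapply(1:m, function(i) mean(rnorm(100))))
sd1000 <- sd(sapply(1:m, function(i) mean(rnorm(1000))))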

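After dividing the standard deviations of the means for the different sample sizes, I found that the standard deviation is roughly 3 times smaller when the sample is 10 times larger (in theory, sqrt(10) ≈ 3.16). Further, if the sample is 100 times larger, the standard deviation is roughly 10 times smaller (about 9.6 in this run, from the variances 0.3223589 and 0.0035209 above). This is because the standard deviation of a sample mean scales as 1/sqrt(n).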
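The division step described above can be written out explicitly. This is a minimal sketch assuming new samples and an arbitrary seed, so the ratios will differ slightly from run to run; `sd_of_means` is a hypothetical helper and `ratios` an illustrative name.

```r
set.seed(371)                          # arbitrary seed, only for reproducibility
m <- 1000                              # number of samples, as in the lab

# hypothetical helper: standard deviation of m sample means of size n
sd_of_means <- function(n) sd(replicate(m, mean(rnorm(n))))

sizes <- c(10, 100, 1000)              # each size is 10 times the previous one
sds   <- sapply(sizes, sd_of_means)

# ratio of each standard deviation to the one at the next (10x larger) size
ratios <- sds[-length(sds)] / sds[-1]
print(ratios)

# theoretical ratio for a 10x larger sample
print(sqrt(10))
```

Each ratio should sit close to sqrt(10), and multiplying two consecutive ratios gives the factor for a 100 times larger sample, which is close to 10.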

Katie Goulding

January 14, 2019