In this first homework tutorial we will explore some properties of the normal distribution.
Use rnorm() to sample 10 values from the standard normal distribution. What are the true mean and true standard deviation for the standard normal distribution?
rnorm(10)
## [1] -0.4363084 0.5804208 -1.2827720 1.2097871 -0.6487326 1.0897538
## [7] 0.3723478 -1.5918110 0.7869499 -0.1909680
True mean = 0, true standard deviation = 1.
rnorm(10)
## [1] 1.0286419 0.6750924 -0.4912495 -0.3224056 -2.1993826 2.0653127
## [7] 0.5689649 0.2737641 1.3568065 -0.7871434
No, the output is different. Each call to rnorm() draws a fresh random sample.
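If repeatable output is wanted, the random number generator can be seeded before sampling. The following is a minimal sketch, not part of the original exercise; the seed value 42 and the variable names are arbitrary choices for illustration.

```r
# Seeding the generator makes rnorm() reproducible.
set.seed(42)               # 42 is an arbitrary, illustrative seed
a <- rnorm(10)
set.seed(42)               # reset the generator to the same state
b <- rnorm(10)
identical(a, b)            # same seed, same sample

# Without reseeding, the generator state has advanced,
# so the next call returns different values.
c_unseeded <- rnorm(10)
identical(a, c_unseeded)
```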
Use rnorm() to sample 100 values from the standard normal distribution. This time, instead of simply printing the output of rnorm(), assign it to a variable named r100.
r100 = rnorm(100)
Compute the mean of r100 using the function mean(). What value do you expect? Does the sample mean that you just computed differ from it? Why?
mean(r100)
## [1] -0.1207082
The expected mean is 0. The sample mean is not exactly 0 because the sample is random, so its mean is itself a random quantity that varies around the true value.
Use length() and sum() to compute the mean of r100 in an alternative way. Do you get exactly the same answer?
sum(r100)/length(r100)
## [1] -0.1207082
Yes. I get the same answer.
Compute the standard deviation of r100 using the function sd(). What value do you expect? Does the standard deviation that you just computed differ from it? Why?
sd(r100)
## [1] 1.012902
The expected standard deviation is 1. The computed value differs slightly because sd() returns the sample standard deviation, an estimate based on 100 random draws, rather than the true standard deviation of the distribution.
Use var() to compute the variance of r100. What relationship do you expect between the variance and standard deviation of r100? Does it hold exactly?
var(r100)
## [1] 1.02597
sqrt(var(r100))
## [1] 1.012902
The standard deviation should equal the square root of the variance. The computation confirms this: sqrt(var(r100)) prints the same value as sd(r100).
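The relationship can also be checked against the textbook formula for the sample variance, which divides by n - 1. This is a small sketch under the assumption of a freshly drawn sample; the seed and the names x and manual_var are illustrative only.

```r
set.seed(2)                 # arbitrary seed so the sketch is repeatable
x <- rnorm(100)
n <- length(x)

# Sample variance computed by hand: squared deviations over (n - 1).
manual_var <- sum((x - mean(x))^2) / (n - 1)

all.equal(var(x), manual_var)    # var() uses the same n - 1 formula
all.equal(sd(x), sqrt(var(x)))   # sd() is the square root of var()
```

all.equal() compares with a small numerical tolerance, which is the safer choice for floating-point results than ==.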
r10K. Compute the mean and standard deviation. How do the values differ from what you got for r100? Why do you think that is?
r10k = rnorm(10000)
r10k_mean = mean(r10k)
r10k_mean
## [1] 0.01308977
r10k_sd = sd(r10k)
r10k_sd
## [1] 1.003742
The mean and standard deviation of r10k vary less from run to run and are closer to the expected values of 0 and 1 than those of r100, because a larger sample gives more precise estimates.
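One way to see why the larger sample is more stable is to repeat the sampling many times and measure how much the sample mean moves. The sketch below assumes 200 repetitions per sample size, an arbitrary choice, and compares the result with the theoretical standard error sigma / sqrt(n).

```r
set.seed(3)                      # arbitrary seed for repeatability
sizes <- c(100, 10000)

# For each sample size, draw 200 samples and take the standard
# deviation of their means (the empirical standard error).
spread <- sapply(sizes, function(n) sd(replicate(200, mean(rnorm(n)))))

spread             # empirical spread of the sample mean for each size
1 / sqrt(sizes)    # theoretical standard error: 0.1 and 0.01
```

With 100 times as many values, the sample mean fluctuates about 10 times less.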
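hist(). Describe the shape. Does it look normal?
hist(r10k)
Yes. It is bell-shaped and symmetric around 0, as expected for a standard normal distribution.
r10K using plot.ecdf(). How does the shape of this ECDF relate to that of the histogram for the same data?
plot.ecdf(r10k)
The ECDF is the running total of the histogram: at each x it gives the fraction of values less than or equal to x, rising from 0 at the minimum to 1 at the maximum. It climbs fastest where the histogram is tallest.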
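The link between the two plots can be verified numerically: accumulating the histogram counts and dividing by the sample size should reproduce the ECDF at the right-hand edge of each bin. This is a sketch on a freshly drawn sample and relies on hist()'s default right-closed bins; the names h, cum_frac and Fn are illustrative.

```r
set.seed(4)                      # arbitrary seed for repeatability
x <- rnorm(10000)

h <- hist(x, plot = FALSE)       # bin the data without drawing
cum_frac <- cumsum(h$counts) / length(x)   # running fraction per bin

Fn <- ecdf(x)                    # ECDF as a function of x
# Evaluate the ECDF at each bin's upper edge and compare.
all.equal(cum_frac, Fn(h$breaks[-1]))
```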
r10K.m1. Add an ECDF for r10K.m1 to the current plot and color it red (see the help page for plot.ecdf). How do they differ?
r10k.m1 = rnorm(10000, mean = 1)
plot.ecdf(r10k)
plot.ecdf(r10k.m1, col = "red", add=TRUE)
The red curve has a similar shape to the black one, but it is shifted 1 unit to the right because the mean of this sample is 1.
r10K.s2. Add its ECDF to the current plot and color it blue. How does this ECDF differ?
r10k.s2 = rnorm(10000, mean = 0, sd = 2)
plot.ecdf(r10k)
plot.ecdf(r10k.m1, col = "red", add = TRUE)
plot.ecdf(r10k.s2, col = "blue", add = TRUE)
The blue curve rises more gently around the mean than the other two because its standard deviation is 2, so the values are more spread out.
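The same shift and stretch show up in the theoretical CDFs that these ECDFs approximate. The sketch below uses pnorm() and dnorm() on an illustrative grid q; it is a check of the underlying identities, not part of the original exercise.

```r
q <- seq(-4, 4, by = 0.5)        # illustrative grid of x values

# Changing the mean to 1 shifts the CDF one unit to the right.
all.equal(pnorm(q, mean = 1), pnorm(q - 1))

# Changing the sd to 2 stretches the CDF horizontally by a factor of 2.
all.equal(pnorm(q, mean = 0, sd = 2), pnorm(q / 2))

# The slope of the CDF at the mean is the density there;
# doubling the sd halves it, hence the gentler rise.
dnorm(0, sd = 1)
dnorm(0, sd = 2)
```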
r10K.lt0 that tells us whether each object in r10K is less than zero. Use head() to look at the first few values of r10K, and then the corresponding values of r10K.lt0.
r10k.lt0 = r10k < 0
head(r10k)
## [1] -0.4553348 -1.8291003 -1.9197924 1.1423789 -0.3629006 -0.4395897
head(r10k.lt0)
## [1] TRUE TRUE TRUE FALSE TRUE TRUE
Use table() to summarize r10K.lt0. Is this what you expected? Motivate your answer.
table(r10k.lt0)
## r10k.lt0
## FALSE TRUE
## 5047 4953
The expected split is 50:50 between TRUE and FALSE. Because the sample is generated randomly, the observed counts are rarely exactly equal, but they should be close to 5000 each.
Use pnorm() to obtain the theoretically expected fraction of values less than 0. Is this what you would expect?
pnorm(0)
## [1] 0.5
Yes. The fraction should be 50%.
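How close to 5000 is "close enough" can be judged by treating the number of negative values as a binomial count. The following sketch uses the TRUE count of 4953 reported by table() above; the variable names are illustrative.

```r
n <- 10000
p <- pnorm(0)                        # theoretical P(X < 0) = 0.5
expected <- n * p                    # expected count: 5000
sd_count <- sqrt(n * p * (1 - p))    # binomial standard deviation: 50

observed <- 4953                     # TRUE count from table() above
z <- (observed - expected) / sd_count
z                                    # -0.94 standard deviations
```

A deviation of less than one standard deviation is entirely ordinary for a random sample.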
r10K only, and then use abline() to draw a vertical line at \(x = 1\). Estimate (by eye) the proportion of values in r10K that are greater than 1. What is your estimate? How many values (out of 10,000) do you expect to be > 1?
plot.ecdf(r10k)
abline(v = 1)
About 15% of the values in r10k appear to be greater than 1, that is, roughly 1500 out of 10000.
r10K.gt1 that tells us whether each object in r10K is greater than 1.
r10k.gt1 = r10k > 1
Use table() to summarize r10K.gt1. Is this what you expected? Motivate your answer.
table(r10k.gt1)
## r10k.gt1
## FALSE TRUE
## 8359 1641
The observed count of 1641 is close to my estimate and consistent with the theoretical expectation.
Use pnorm() to obtain the theoretically expected fraction of values greater than 1. It may require a little bit of thinking to get this right.
1 - pnorm(1)
## [1] 0.1586553
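pnorm() can also return the upper tail directly through its lower.tail argument, and the mean of a logical vector gives the observed proportion without going through table(). This is a sketch on a freshly drawn sample, so the empirical value will not match the 1641 count above exactly.

```r
set.seed(5)                          # arbitrary seed for repeatability
x <- rnorm(10000)

# TRUE counts as 1 and FALSE as 0, so the mean is the fraction > 1.
emp_frac <- mean(x > 1)

# Upper-tail probability P(X > 1), equivalent to 1 - pnorm(1).
theo_frac <- pnorm(1, lower.tail = FALSE)

emp_frac
theo_frac
```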
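r10K.s2 do you expect to be larger than 10?
plot.ecdf(r10k.s2)
abline(v = 10)
Essentially none. With a standard deviation of 2, the value 10 lies 5 standard deviations above the mean, so far less than 0.01% of the values are expected to exceed it.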
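The expected fraction above 10 can be computed directly rather than read off the plot. A minimal sketch, assuming the same parameters used to generate r10k.s2 (mean 0, standard deviation 2):

```r
# Upper-tail probability P(X > 10) for a normal with mean 0 and sd 2.
p_gt10 <- pnorm(10, mean = 0, sd = 2, lower.tail = FALSE)
p_gt10                 # on the order of 1e-07

# Expected number of such values in a sample of 10,000: well below 1.
10000 * p_gt10

# Equivalent to being 5 standard deviations out on the standard normal.
all.equal(p_gt10, pnorm(5, lower.tail = FALSE))
```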