Assignment 1 - Normal Distribution

In this first homework tutorial we will explore some properties of the normal distribution.

Use the function rnorm() to sample 10 values from the standard normal distribution. What are the true mean and true standard deviation for the standard normal distribution?

    rnorm(10)

##  [1] -0.4363084  0.5804208 -1.2827720  1.2097871 -0.6487326  1.0897538
##  [7]  0.3723478 -1.5918110  0.7869499 -0.1909680

True mean = 0, true standard deviation = 1.

Repeat the same command. Do you get the same output? Why?

    rnorm(10)

##  [1]  1.0286419  0.6750924 -0.4912495 -0.3224056 -2.1993826  2.0653127
##  [7]  0.5689649  0.2737641  1.3568065 -0.7871434

No, the output is different. Each call to rnorm() draws a new random sample, so the values change from one call to the next.
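
If repeatable output is wanted, the random number generator can be seeded first. The sketch below is only an illustration: the seed value 42 and the names `first`, `second` and `third` are arbitrary choices, not part of the assignment.

```r
# Fixing the seed makes the "random" draws repeatable.
# The seed value 42 is arbitrary.
set.seed(42)
first <- rnorm(10)

set.seed(42)
second <- rnorm(10)

# Same seed, same draws
identical(first, second)

# Without resetting the seed, the next call continues the random stream
# and returns different values
third <- rnorm(10)
identical(first, third)
```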

Use rnorm() to sample 100 values from the standard normal distribution. This time, instead of simply printing the output of rnorm(), assign it to a variable named r100.

    r100 = rnorm(100)

Compute the mean of the variable r100 using the function mean(). What value do you expect? Does the sample mean that you just computed differ from it? Why?

    mean(r100)

## [1] -0.1207082

The expected mean is 0. The sample mean (-0.1207082) differs slightly from it because r100 is a finite random sample, so its mean is itself a random quantity that scatters around 0.

Combine the functions length() and sum() to compute the mean of r100 in an alternative way. Do you get exactly the same answer?

    sum(r100)/length(r100)

## [1] -0.1207082

Yes. I get the same answer.
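
The printed output shows only about 7 significant digits, so two values can print identically and still differ in their last bits. A minimal sketch of a more careful comparison, assuming a freshly generated stand-in `x` for r100 (the seed is arbitrary):

```r
set.seed(1)              # arbitrary seed, for repeatability only
x <- rnorm(100)          # stand-in for r100

m1 <- mean(x)
m2 <- sum(x) / length(x)

# Agreement within numerical tolerance
all.equal(m1, m2)

# Bitwise equality is not guaranteed for floating-point results,
# so inspect the difference directly rather than relying on ==
m1 - m2
```

all.equal() is usually the safer check for floating-point results, since mean() and sum() may accumulate rounding error slightly differently.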

Compute the standard deviation of the variable r100 using the function sd(). What value do you expect? Does the standard deviation that you just computed differ from it? Why?

    sd(r100)

## [1] 1.012902

The expected standard deviation is 1. The computed value (1.012902) differs slightly because sd() returns the sample standard deviation, an estimate based on 100 random draws, not the true standard deviation of the distribution.

Use the function var() to compute the variance of r100. What relationship do you expect between the variance and standard deviation of r100? Does it hold exactly?

    var(r100)

## [1] 1.02597

    sqrt(var(r100))

## [1] 1.012902

The standard deviation should equal the square root of the variance. The computation confirms this: sqrt(var(r100)) and sd(r100) print the same value.
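
As a sketch of what var() and sd() compute, the sample variance can be rebuilt by hand with the n - 1 denominator. The vector `x` and the name `v_manual` are stand-ins introduced here for illustration, and the seed is arbitrary.

```r
set.seed(1)              # arbitrary seed
x <- rnorm(100)          # stand-in for r100

# Sample variance by hand: squared deviations from the mean,
# divided by n - 1 (not n)
v_manual <- sum((x - mean(x))^2) / (length(x) - 1)

all.equal(v_manual, var(x))

# The standard deviation is the square root of that variance
all.equal(sqrt(var(x)), sd(x))
```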

Generate a set of 10,000 standard normal random variates and store it as r10K. Compute the mean and standard deviation. How do the values differ from what you got for r100? Why do you think that is?

    r10k = rnorm(10000)
    r10k_mean = mean(r10k)
    r10k_mean

## [1] 0.01308977

    r10k_sd = sd(r10k)
    r10k_sd

## [1] 1.003742

Compared with r100, the mean and standard deviation of r10k are closer to the true values of 0 and 1, and they vary less from run to run. A larger sample averages out more of the random fluctuation, so its estimates are more precise.
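
One way to quantify this is the standard error of the mean, sigma / sqrt(n). The sketch below compares that formula with a small simulation; the helper `se()`, the 200 repetitions and the seed are assumptions made for illustration only.

```r
# Theoretical standard error of the sample mean for sigma = 1
se <- function(n, sigma = 1) sigma / sqrt(n)
se(100)      # 0.1
se(10000)    # 0.01

# Simulation: spread of the sample mean over repeated draws
set.seed(1)  # arbitrary seed
means_100 <- replicate(200, mean(rnorm(100)))
means_10k <- replicate(200, mean(rnorm(10000)))

sd(means_100)   # should be near se(100)
sd(means_10k)   # should be near se(10000)
```

Increasing the sample size by a factor of 100 shrinks the typical error of the mean by a factor of 10.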

Plot a histogram of r10K using hist(). Describe the shape. Does it look normal?

    hist(r10k)

Yes. The histogram is symmetric and bell-shaped, centered near 0, which is what a standard normal distribution looks like.
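
A visual check that goes beyond judging by eye is to overlay the theoretical density on the histogram. This is a sketch with regenerated data: `draws` is a stand-in for r10k, and the seed and the choice of about 50 bins are arbitrary.

```r
set.seed(1)                 # arbitrary seed
draws <- rnorm(10000)       # stand-in for r10k

# freq = FALSE puts the histogram on a density scale,
# so the bar areas sum to 1
h <- hist(draws, freq = FALSE, breaks = 50)

# Overlay the theoretical standard normal density
curve(dnorm(x), add = TRUE, col = "red")
```

If the sample is normal, the red curve should follow the tops of the bars closely.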

Plot an ECDF (empirical cumulative distribution function) of r10K using plot.ecdf(). How does the shape of this ECDF relate to that of the histogram for the same data?

    plot.ecdf(r10k)

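The ECDF is the running total of the histogram: at each x it gives the fraction of values less than or equal to x, accumulated from the minimum to the maximum. It rises most steeply where the histogram bars are tallest (near 0) and flattens out in the tails.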
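
The link between the two plots can be sketched numerically: the cumulative bin counts of the histogram, divided by the sample size, should match the ECDF evaluated at the right-hand bin edges. The names `draws`, `cum_frac` and `Fn` are introduced here for illustration, and the seed is arbitrary.

```r
set.seed(1)                 # arbitrary seed
draws <- rnorm(10000)       # stand-in for r10k

# Bin counts only, no plot
h <- hist(draws, plot = FALSE)

# Running total of the counts, as a fraction of the sample
cum_frac <- cumsum(h$counts) / length(draws)

# ecdf() returns a step function; evaluate it at the right bin edges
Fn <- ecdf(draws)
ecdf_at_edges <- Fn(h$breaks[-1])

all.equal(cum_frac, ecdf_at_edges)
```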

Generate a set of 10,000 random normal variates with an expected mean equal to 1. Call it r10K.m1. Add an ECDF for r10K.m1 to the current plot and color it red (see the help page for plot.ecdf). How do they differ?

    r10k.m1 = rnorm(10000, mean = 1)
    plot.ecdf(r10k)
    plot.ecdf(r10k.m1, col = "red", add=TRUE)

The red curve has a similar shape to the black one but is shifted 1 unit to the right, since the mean of this set is 1 instead of 0.

Generate a set of 10,000 random normal variates with an expected mean equal to zero and an expected standard deviation equal to 2. Call it r10K.s2. Add its ECDF to the current plot and color it blue. How does this ECDF differ?

    r10k.s2 = rnorm(10000, mean = 0, sd = 2)
    plot.ecdf(r10k)
    plot.ecdf(r10k.m1, col = "red", add = TRUE)
    plot.ecdf(r10k.s2, col = "blue", add = TRUE)

The blue curve rises more gradually around the mean than the other two curves, since its standard deviation is 2 and the values are more spread out.
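
The same behaviour can be checked on the theoretical CDFs: changing the mean shifts the curve, and changing the standard deviation stretches it horizontally. This is a sketch using pnorm() and dnorm() on a hypothetical grid `q`, which is not part of the assignment.

```r
q <- seq(-4, 4, by = 0.5)   # hypothetical grid of evaluation points

# Shift: the CDF with mean 1 is the standard CDF moved 1 unit right
all.equal(pnorm(q, mean = 1), pnorm(q - 1))

# Scale: the CDF with sd 2 is the standard CDF stretched by a factor of 2
all.equal(pnorm(q, mean = 0, sd = 2), pnorm(q / 2))

# The slope of a CDF is the density, so at the mean the sd = 2 curve
# is half as steep as the sd = 1 curve
dnorm(0)
dnorm(0, sd = 2)
```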

Create a logical array r10K.lt0 that tells us whether each object in r10K is less than zero. Use head() to look at the first few values of r10K, and then the corresponding values of r10K.lt0.

    r10k.lt0 = r10k < 0
    head(r10k)

## [1] -0.4553348 -1.8291003 -1.9197924  1.1423789 -0.3629006 -0.4395897

    head(r10k.lt0)

## [1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE

Use table() to summarize r10K.lt0. Is this what you expected? Motivate your answer.

    table(r10k.lt0)

## r10k.lt0
## FALSE  TRUE 
##  5047  4953

The expected split is 50:50 between TRUE and FALSE, because the standard normal distribution is symmetric about 0. The sample is random, though, so the observed counts (4953 vs. 5047) are close to, but not exactly, equal.
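
How far from 5000 the count can reasonably stray can be sketched with the binomial model: each of the 10,000 values is below zero with probability 0.5, independently of the others. The variable names below are introduced for illustration; the observed count is taken from the table output.

```r
n <- 10000
p <- pnorm(0)                        # probability of a value below 0

expected_count <- n * p              # 5000
sd_count <- sqrt(n * p * (1 - p))    # binomial standard deviation: 50

observed <- 4953                     # TRUE count from the table
(observed - expected_count) / sd_count   # -0.94
```

The observed count is less than one standard deviation from the expected 5000, which is unremarkable for a random sample.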

Use pnorm() to obtain the theoretically expected fraction of values less than 0. Is this what you would expect?

    pnorm(0)

## [1] 0.5

Yes. The fraction should be 50%.

Redraw the ECDF for r10K only, and then use abline() to draw a vertical line at \(x = 1\). Estimate (by eye) the proportion of values in r10K that are greater than 1. What is your estimate? How many values (out of 10,000) do you expect to be > 1?

    plot.ecdf(r10k)
    abline(v = 1)

About 15% of the values in r10k appear to be greater than 1 (the ECDF reads roughly 0.85 at x = 1), which is about 1500 out of 10000.

Create a logical array r10K.gt1 that tells us whether each object in r10K is greater than 1.

    r10k.gt1 = r10k > 1

Use table() to summarize r10K.gt1. Is this what you expected? Motivate your answer.

    table(r10k.gt1)

## r10k.gt1
## FALSE  TRUE 
##  8359  1641

The table result (1641 of 10000, or about 16.4%) is close to my estimate of 15%, and it is consistent with what theory predicts for a random sample.
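
Counts and fractions can also be read straight off a logical vector, because TRUE counts as 1 and FALSE as 0 in arithmetic. A minimal sketch with regenerated data (`draws` and `gt1` are stand-ins for r10k and r10k.gt1, and the seed is arbitrary, so the numbers will not match the table):

```r
set.seed(1)                 # arbitrary seed
draws <- rnorm(10000)       # stand-in for r10k
gt1 <- draws > 1            # stand-in for r10k.gt1

sum(gt1)                    # number of values greater than 1
mean(gt1)                   # fraction of values greater than 1

# The same count, extracted from the table
table(gt1)[["TRUE"]]
```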

Use pnorm() to obtain the theoretically expected fraction of values greater than 1. It may require a little bit of thinking to get this right.

    1 - pnorm(1)

## [1] 0.1586553
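
The same upper-tail probability can be obtained in other ways. The sketch below uses the `lower.tail` argument of pnorm() and the symmetry of the normal distribution; the variable names are introduced here for illustration.

```r
upper     <- 1 - pnorm(1)
upper_alt <- pnorm(1, lower.tail = FALSE)   # ask for the upper tail directly
upper_sym <- pnorm(-1)                      # symmetry about 0

all.equal(upper, upper_alt)
all.equal(upper, upper_sym)

# Expected number of values greater than 1 out of 10,000
10000 * upper
```

This gives an expected count of roughly 1587.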

What fraction of values of r10K.s2 do you expect to be larger than 10?

    plot.ecdf(r10k.s2)
    abline(v = 10)

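Essentially none. With a standard deviation of 2, the value 10 lies five standard deviations above the mean, so the expected fraction is about 2.9e-7 (far below 0.01%), or roughly 0.003 values out of 10,000.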
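
The tail fraction can be computed rather than read off the plot, where the ECDF is indistinguishable from 1 long before x = 10. A sketch using pnorm() with the mean and standard deviation of r10k.s2; the name `frac` is introduced here for illustration.

```r
# Upper-tail probability beyond 10 for mean 0, sd 2
frac <- pnorm(10, mean = 0, sd = 2, lower.tail = FALSE)
frac

# Equivalent to the standard normal tail beyond 5 standard deviations
all.equal(frac, pnorm(5, lower.tail = FALSE))

# Expected number of such values in a sample of 10,000
10000 * frac
```

Using `lower.tail = FALSE` avoids subtracting a number very close to 1 from 1, which costs precision in far tails.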

Yikun Chen

Tue Jan 23 20:20:23 2018