a. Then run the following R commands. Please spend some time trying to understand the code well.

rnorm2 <- function(n,mean,sd) { mean+sd*scale(rnorm(n)) } # rescale n normal draws so the sample has exactly the given mean and sd
set.seed(1239)
r1 <- rnorm2(100,25,4)
r2 <- rnorm2(50,10,3)
samplingframe <- c(r1,r2) # combine r1 and r2 into one vector of 150 values
hist(samplingframe, breaks=20,col = "pink")

Please describe the distribution that you obtain in one or two sentences.

According to the code, r1 holds 100 values rescaled to have mean 25 and standard deviation 4, and r2 holds 50 values rescaled to have mean 10 and standard deviation 3. The histogram of the combined values is bimodal and negatively skewed, with a larger peak near 25 and a smaller peak near 10, so it is neither bell-shaped nor symmetric.
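A quick numerical check of this description, offered as a sketch rather than part of the assigned commands: because `rnorm2` rescales with `scale()`, each group should hit its target mean and standard deviation exactly, and the combined frame should average (100*25 + 50*10)/150 = 20. The definitions from part a are repeated so the block runs on its own.

```r
# Same definitions as in part a, repeated so this block is self-contained.
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(1239)
r1 <- rnorm2(100, 25, 4)
r2 <- rnorm2(50, 10, 3)
samplingframe <- c(r1, r2)

# scale() standardizes to sample mean 0 and sample sd 1,
# so each group matches its targets up to floating-point error.
c(mean(r1), sd(r1))   # 25 and 4
c(mean(r2), sd(r2))   # 10 and 3

# The combined mean is the size-weighted average of the group means.
mean(samplingframe)   # 20
length(samplingframe) # 150
```

The combined mean of 20 falls in the gap between the two peaks, which is one way to see that a single mean summarizes a bimodal frame poorly.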

b. Draw 50 samples of size 15 from the sampling frame in part a, and plot the sampling distribution of means as a histogram.

Define the function first.

samp1m <- function(si, sa){
  samp1 <- function(si){
    draws <- sample(samplingframe, size = si, replace = FALSE) # draw "si" values without replacement
    return(mean(draws)) # return the mean of this sample
  }
  set.seed(1239) # fix the seed so the results are reproducible
  replicate(sa, samp1(si)) # repeat the draw "sa" times and collect the means
}
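As a sanity check on the function above, the following sketch confirms that it returns one mean per sample and that, because the seed is reset inside the function, repeated calls with the same arguments give identical results. The setup lines repeat part a so the block is self-contained; `m` is just an illustrative variable name.

```r
# Setup repeated from part a.
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(1239)
samplingframe <- c(rnorm2(100, 25, 4), rnorm2(50, 10, 3))

# Same logic as the function defined above.
samp1m <- function(si, sa) {
  samp1 <- function(si) mean(sample(samplingframe, size = si, replace = FALSE))
  set.seed(1239)            # seed is reset on every call
  replicate(sa, samp1(si))  # vector of "sa" sample means
}

m <- samp1m(15, 50)
length(m)                     # one mean per sample: 50
identical(m, samp1m(15, 50))  # TRUE, because the seed is reset each time
```

One consequence of this design is that parts b and c both start from the same point in the random number stream; that keeps the plots reproducible, though the two sets of samples are not drawn independently of each other.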

Plot the histogram for 50 samples of size 15.

hist(samp1m(15, 50), xlab = "Sample Mean", main = "Distribution of Sample Mean (Sample: 50, Size: 15)", col = "blue") # histogram of 50 sample means, each from a sample of size 15; sets the x-axis label, title, and color

c. Draw 50 samples of size 45 from the sampling frame in part a, and plot the sampling distribution of means as a histogram.

Using the same function as in part b, set the size to 45, change the color to red, and plot the result.

hist(samp1m(45, 50), xlab = "Sample Mean", main = "Distribution of Sample Mean (Sample: 50, Size: 45)", col = "red") # histogram of 50 sample means, each from a sample of size 45; sets the x-axis label, title, and color

d. Please ensure that the distributions in parts b and c are side-by-side on the same plot. Explain the three histograms in terms of their differences and similarities (in fewer than 25 words).

par(mfrow=c(1,2))
hist(samp1m(15, 50), xlab = "Sample Mean", main = "Distribution of Sample Mean (Sample: 50, Size: 15)", col = "blue")
hist(samp1m(45, 50), xlab = "Sample Mean", main = "Distribution of Sample Mean (Sample: 50, Size: 45)", col = "red")

The part a histogram is bimodal and the widest. The histograms from parts b and c look closer to normal, and part c is narrower.
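The difference in spread between parts b and c can also be checked numerically. The sketch below compares the observed standard deviation of the 50 sample means with the textbook standard error for sampling without replacement from a finite population, sigma/sqrt(n) * sqrt((N - n)/(N - 1)). The helper `se_theory` and the variables `N` and `sigma` are illustrative names, not part of the assignment.

```r
# Setup repeated from parts a and b so the block is self-contained.
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(1239)
samplingframe <- c(rnorm2(100, 25, 4), rnorm2(50, 10, 3))
samp1m <- function(si, sa) {
  samp1 <- function(si) mean(sample(samplingframe, size = si, replace = FALSE))
  set.seed(1239)
  replicate(sa, samp1(si))
}

N <- length(samplingframe)
# Population sd of the frame (divides by N, not N - 1).
sigma <- sqrt(mean((samplingframe - mean(samplingframe))^2))

# Standard error of the mean with the finite population correction.
se_theory <- function(n) sigma / sqrt(n) * sqrt((N - n) / (N - 1))

# Observed spread of the sample means next to the theoretical value.
c(observed = sd(samp1m(15, 50)), theory = se_theory(15))
c(observed = sd(samp1m(45, 50)), theory = se_theory(45))
```

With only 50 samples per histogram the observed values will not match the theory exactly, but the size 45 means should be clearly less spread out than the size 15 means.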

e. Explain CLT in your own words in one or two sentences.

The Central Limit Theorem states that when many samples of a sufficiently large size are drawn from a population, the distribution of the sample means is approximately normal, whatever the shape of the population. Additionally, as the sample size grows, the sample mean gets closer to the population mean.
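For reference, the classical form of the theorem can be written as follows. The notation is the standard textbook one and is not taken from the assignment: the X_i are assumed independent and identically distributed with mean \mu and finite variance \sigma^2, and \xrightarrow{d} (from the amsmath package) denotes convergence in distribution.

```latex
\[
\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\frac{\sqrt{n}\,\bigl(\bar{X}_n - \mu\bigr)}{\sigma}
\xrightarrow{d} \mathcal{N}(0, 1)
\quad \text{as } n \to \infty .
\]
```

In this exercise the samples are drawn without replacement from a frame of 150 values, so the independence assumption holds only approximately and the statement above serves as a guide rather than an exact description.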

f. Does this exercise help you understand CLT? If so why? If not, why not? Restrict your response to one or two sentences.

Yes. In the exercise, we can see that, compared with the distribution for size 15, the distribution for size 45 is clearly closer to a normal distribution. This supports the theory that the larger the sample size, the closer the distribution of sample means is to normal.
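One way to back up this visual impression is with normal Q-Q plots and a simple skewness measure, sketched below. The helper `skew` is a hypothetical name for the usual moment-based sample skewness (values near 0 suggest symmetry); with only 50 sample means per case the numbers are noisy, so they should be read as a rough guide.

```r
# Setup repeated from parts a and b so the block is self-contained.
rnorm2 <- function(n, mean, sd) { mean + sd * scale(rnorm(n)) }
set.seed(1239)
samplingframe <- c(rnorm2(100, 25, 4), rnorm2(50, 10, 3))
samp1m <- function(si, sa) {
  samp1 <- function(si) mean(sample(samplingframe, size = si, replace = FALSE))
  set.seed(1239)
  replicate(sa, samp1(si))
}

# Moment-based sample skewness: third central moment over variance^(3/2).
skew <- function(x) mean((x - mean(x))^3) / mean((x - mean(x))^2)^1.5

m15 <- samp1m(15, 50)
m45 <- samp1m(45, 50)
c(frame = skew(samplingframe), size15 = skew(m15), size45 = skew(m45))

# Points close to the reference line indicate an approximately normal shape.
par(mfrow = c(1, 2))
qqnorm(m15, main = "Q-Q plot, size 15"); qqline(m15)
qqnorm(m45, main = "Q-Q plot, size 45"); qqline(m45)
```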

Assignment 02 Part C

Jiasheng Li

January 30, 2018