Problem set 1

  1. First, write a function that will produce a sample of a random variable that is distributed as follows:

\[ f(x) = \begin{cases} x, & 0 \leq x \leq 1 \\ 2 - x, & 1 < x \leq 2 \end{cases} \]

That is, when your function is called, it will return a random value between 0 and 2 that is distributed according to the above PDF. Please note that this is not the same as writing a function for the PDF and evaluating it at uniformly sampled points. In the online session this week, I’ll cover sampling techniques. You will find them useful when you do the assignment for this week. In addition, as usual, there are one-liners in R that will give you samples from a function. We’ll cover both of these approaches in the online session.

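# Density of the first distribution: rises on [0, 1], falls on (1, 2]
pdf1 <- function(x){
  if(x>=0 && x<=2){
    if(x<=1){
      return(x)
    }else{
      return(2-x)
    }
  }
  return(0) # the density is 0 outside [0, 2]; avoids returning NULL
}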
  2. Now, write a function that will produce a sample of a random variable that is distributed as follows:

\[ f(x) = \begin{cases} 1 - x, & 0 \leq x \leq 1 \\ x - 1, & 1 < x \leq 2 \end{cases} \]

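# Density of the second distribution: falls on [0, 1], rises on (1, 2]
pdf2 <- function(x){
  if(x>=0 && x<=2){
    if(x<=1){
      return(1-x)
    }else{
      return(x-1)
    }
  }
  return(0) # the density is 0 outside [0, 2]; avoids returning NULL
}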
  3. Draw 1000 samples (call your function 1000 times each) from each of the above two distributions and plot the resulting histograms. You should have one histogram for each PDF. Check that each histogram matches your understanding of these PDFs.
# Draw 1000 uniform random values on [0, 2], the support of both PDFs
samp <- runif(1000,0,2)
samp[1:10]
##  [1] 1.8917914 1.7198534 0.8054960 0.4864305 0.8108921 0.5591463 1.9412815
##  [8] 0.8468654 1.4005029 1.3590370
# sapply evaluates the PDF at each sampled value
p1 <- sapply(samp, pdf1)
p1[1:10]
##  [1] 0.10820859 0.28014658 0.80549601 0.48643052 0.81089210 0.55914625
##  [7] 0.05871845 0.84686538 0.59949713 0.64096302
p2 <- sapply(samp, pdf2)

# Without the lecture I would have stopped here. Note that this is a
# histogram of density values, not of samples drawn from the PDF.
hist(p1)

I was lost on what to do next but found the following link which showed the use of the “prob” argument:

http://stackoverflow.com/questions/17001808/generate-random-integers-between-two-values-with-a-given-probability-using-r

# Resample the uniform values, weighting each one by its density under the PDF
samp1 <- sample(samp, 1000, replace = TRUE, prob = p1)
hist(samp1, breaks=25)

samp2 <- sample(samp, 1000, replace = TRUE, prob = p2)
hist(samp2, breaks=25)
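
The weighted resampling above only ever returns one of the 1000 uniform values in samp, so it approximates the two distributions rather than drawing from them exactly. As an alternative, here is a minimal sketch of inverse transform sampling, assuming the CDFs of the two PDFs are inverted by hand. The names rpdf1 and rpdf2 are hypothetical and are not part of the solution above.

```r
# Inverse transform sampling: draw u ~ Uniform(0, 1) and return F^{-1}(u).

# PDF 1: F(x) = x^2 / 2 on [0, 1] and 1 - (2 - x)^2 / 2 on (1, 2]
rpdf1 <- function(n) {
  u <- runif(n)
  ifelse(u <= 0.5, sqrt(2 * u), 2 - sqrt(2 * (1 - u)))
}

# PDF 2: F(x) = (1 - (1 - x)^2) / 2 on [0, 1] and (1 + (x - 1)^2) / 2 on (1, 2]
# Both branches of the inverse collapse to 1 + sign(2u - 1) * sqrt(|2u - 1|).
rpdf2 <- function(n) {
  u <- runif(n)
  1 + sign(2 * u - 1) * sqrt(abs(2 * u - 1))
}
```

Calling hist(rpdf1(1000), breaks=25) and hist(rpdf2(1000), breaks=25) should give a peak at 1 for the first PDF and a dip at 1 for the second, the same shapes as the resampled histograms.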

  4. Now, write a program that will take a sample set size n as a parameter and the PDF as the second parameter, and perform 1000 iterations where it samples from the PDF, each time taking n samples and computing the mean of these n samples. It then plots a histogram of the 1000 means that it computes.
meanSamp <- function(set, n, pdf){
  # set: large data set to resample from; n: size of each sample; pdf: density used as weights
  p <- sapply(set, pdf)  # the weights do not change, so compute them once
  means <- numeric(1000) # preallocate storage for the 1000 means
  for (i in 1:1000){
    draw <- sample(set, n, replace = TRUE, prob = p)
    # store the mean of this sample
    means[i] <- mean(draw)
  }
  return(hist(means, breaks=25))
}
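
The function above resamples a fixed data set. A variant that takes only n and the PDF, as the question describes, could draw from the PDF directly by rejection sampling. The following is a minimal sketch under stated assumptions: the names rejection_sample, sample_means and tri_pdf are hypothetical, and the defaults lower = 0, upper = 2 and fmax = 1 are chosen to fit the two PDFs in this problem set (both live on [0, 2] and never exceed 1).

```r
# Rejection sampling: propose x ~ Uniform(lower, upper) and keep it
# with probability pdf(x) / fmax, where fmax bounds the density from above.
rejection_sample <- function(n, pdf, lower = 0, upper = 2, fmax = 1) {
  out <- numeric(0)
  while (length(out) < n) {
    x <- runif(n, lower, upper)
    keep <- runif(n, 0, fmax) <= sapply(x, pdf)
    out <- c(out, x[keep])
  }
  out[seq_len(n)] # trim any surplus accepted values
}

# Means of `iterations` samples, each of size n, drawn from the PDF
sample_means <- function(n, pdf, iterations = 1000) {
  replicate(iterations, mean(rejection_sample(n, pdf)))
}

# Illustrative density, equal to the first PDF: 1 - |x - 1| on [0, 2]
tri_pdf <- function(x) if (x >= 0 && x <= 2) 1 - abs(x - 1) else 0
```

With these definitions, hist(sample_means(10, tri_pdf), breaks=25) plots the 1000 means. Returning the means instead of the histogram object also makes it possible to inspect them numerically.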
  5. Verify that as you set n to something like 10 or 20, each of the two PDFs produces normally distributed sample means, empirically verifying the Central Limit Theorem. Please play around with various values of n and you’ll see that even for reasonably small sample sizes such as 10, the Central Limit Theorem holds.
meanSamp(samp, 20, pdf1)

meanSamp(samp, 10, pdf1)

meanSamp(samp, 20, pdf2)

meanSamp(samp, 10, pdf2)

Even with a sample size of 10, the histograms of the sample means are approximately bell-shaped, as the Central Limit Theorem predicts.
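
The histograms give a visual check. A numerical one is to compare the spread of the means with the value the Central Limit Theorem predicts, sigma / sqrt(n). The sketch below does this for the first PDF only. It assumes a shortcut that is not used elsewhere in this solution: the sum of two independent Uniform(0, 1) draws has exactly the triangular density of the first PDF, whose variance is 1/6. The seed is arbitrary.

```r
set.seed(123) # arbitrary seed so the run is reproducible
n <- 10       # size of each sample

# runif(n) + runif(n) gives n draws from the triangular density on [0, 2]
means <- replicate(1000, mean(runif(n) + runif(n)))

observed <- sd(means)          # empirical spread of the 1000 means
expected <- sqrt((1 / 6) / n)  # sigma / sqrt(n) with sigma^2 = 1/6
round(c(observed = observed, expected = expected), 3)
```

The two numbers should agree closely. For a visual test of normality, qqnorm(means) followed by qqline(means) should show points lying near the line.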