Question 1:

The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population for September 2017.

                          Unemployed (Y=0)   Employed (Y=1)
Non-college grads (X=0)   0.026              0.576
College grads (X=1)       0.009              0.389
p <- matrix(c(.026,.009,.576,.389), ncol=2)
p
##       [,1]  [,2]
## [1,] 0.026 0.576
## [2,] 0.009 0.389
  1. Compute E(Y)
# column sums give the marginal distribution of Y: Pr(Y=0), Pr(Y=1)
py <- apply(p,2,sum)
Ey <- py[1]*0 + py[2]*1
Ey
## [1] 0.965
  1. The unemployment rate is the fraction of the labor force that is unemployed. Which one of the following E(X), E(Y), 1-E(X), 1-E(Y), E(X)*E(Y), E(XY), is the unemployment rate? Explain.

Unemployment rate = # of unemployed / # in the labor force = Pr(Y=0) = 1 - Pr(Y=1) = 1 - E(Y), because Y is binary and so E(Y) = Pr(Y=1).

  1. Calculate E(Y|X=1) and E(Y|X=0)
# row sums give the marginal distribution of X: Pr(X=0), Pr(X=1)
px <- apply(p,1,sum)
p_y0_x0 <- p[1,1]/px[1]
p_y1_x0 <- p[1,2]/px[1]
p_y0_x1 <- p[2,1]/px[2]
p_y1_x1 <- p[2,2]/px[2]
e_y_x1 <- 0*p_y0_x1 + 1*p_y1_x1
e_y_x1
## [1] 0.9773869
e_y_x0 <- 0*p_y0_x0 + 1*p_y1_x0
e_y_x0
## [1] 0.9568106
  1. Calculate the unemployment rate for (i) college graduates, and (ii) non-college graduates.
unemployment_college <- 1-e_y_x1
unemployment_college
## [1] 0.02261307
unemployment_non <- 1-e_y_x0
unemployment_non
## [1] 0.04318937
  1. A randomly selected member of this population reports being unemployed. What is the probability that this worker is not a college graduate?
# Pr(X=1|Y=0) = Pr(X=1,Y=0)/Pr(Y=0); the complement is Pr(X=0|Y=0)
p_x1_y0 <- p[2,1]/py[1]
p_non <- 1-p_x1_y0
p_non
## [1] 0.7428571
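The conditional probability can also be computed directly as Pr(X=0, Y=0) / Pr(Y=0), without going through the complement. The following is a minimal sketch that re-enters the table values from the question; the names `joint`, `p_unemp`, and `p_non_direct` are illustrative and not part of the original solution.

```r
# Joint probabilities re-entered from the table in the question
# (rows: X=0 non-college, X=1 college; columns: Y=0 unemployed, Y=1 employed).
joint <- matrix(c(0.026, 0.009, 0.576, 0.389), ncol = 2,
                dimnames = list(c("X=0", "X=1"), c("Y=0", "Y=1")))

# Marginal probability of being unemployed, Pr(Y=0).
p_unemp <- sum(joint[, "Y=0"])

# Conditional probability Pr(X=0 | Y=0) = Pr(X=0, Y=0) / Pr(Y=0).
p_non_direct <- joint["X=0", "Y=0"] / p_unemp
p_non_direct
```

Naming the rows and columns is a design choice that makes it harder to pick the wrong cell than with numeric indices.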
  1. Are educational achievement and employment status independent? Explain.

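No. If employment status and educational achievement were independent, the conditional distribution of Y given X would not depend on X, so E(Y|X=1) would equal E(Y|X=0). The two conditional means differ (college graduates have a lower unemployment rate than non-college graduates), so the two variables are not independent.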
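Another way to check independence is to compare the joint table with the product of its marginals, since independence requires Pr(X=x, Y=y) = Pr(X=x) Pr(Y=y) in every cell. The sketch below re-enters the table values from the question; the names `joint`, `marg_x`, `marg_y`, and `indep` are illustrative.

```r
# Joint probabilities re-entered from the table in the question
# (rows: X=0, X=1; columns: Y=0, Y=1).
joint <- matrix(c(0.026, 0.009, 0.576, 0.389), ncol = 2)

# Marginal distributions of X (row sums) and Y (column sums).
marg_x <- rowSums(joint)
marg_y <- colSums(joint)

# Under independence the joint table would equal this outer product.
indep <- outer(marg_x, marg_y)

# Cell-by-cell gap between the actual table and the independent one;
# any nonzero entry means X and Y are not independent.
joint - indep

# FALSE here means the two tables differ.
isTRUE(all.equal(joint, indep))
```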

Question 2:

The random variable Y has a mean of 1 and a variance of 4. Let Z=(Y-1)/2. Show that E(Z)=0 and var(Z)=1.

ey <- 1
vy <- 4
ez <- (1/2)*ey-(1/2)
ez
## [1] 0
vz <- (1/2)^2*(vy)
vz 
## [1] 1
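
One way to write out the reasoning behind the code above uses the standard rules for a linear transformation of a random variable, \(E(a+bY)=a+bE(Y)\) and \(\operatorname{var}(a+bY)=b^2\operatorname{var}(Y)\), here with \(a=-1/2\) and \(b=1/2\):

\[
E(Z) = E\left(\frac{Y-1}{2}\right) = \frac{E(Y)-1}{2} = \frac{1-1}{2} = 0,
\qquad
\operatorname{var}(Z) = \left(\frac{1}{2}\right)^2 \operatorname{var}(Y) = \frac{4}{4} = 1.
\]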

Question 3:

Y follows a normal distribution with mean 30 and variance 100 (standard deviation 10).

  1. Compute the theoretical probability of \(P(20\leq Y \leq 50)\). (Hint: the “pnorm” function returns the distribution function of the normal distribution.)
pnorm(50,mean=30,sd=10) - pnorm(20,mean=30,sd=10)
## [1] 0.8185946
pnorm
## function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) 
## .Call(C_pnorm, q, mean, sd, lower.tail, log.p)
## <bytecode: 0x7fcd71439120>
## <environment: namespace:stats>
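The same probability can be obtained by standardizing Y first, which ties this question to the transformation in Question 2. The sketch below assumes only base R's `pnorm` with its default `mean = 0` and `sd = 1`; the names `mu`, `sigma`, `z_lo`, `z_hi`, and `p_std` are illustrative.

```r
# Parameters from the question: mean 30, variance 100, so sd = 10.
mu <- 30
sigma <- sqrt(100)

# Standardize the endpoints: z = (y - mu) / sigma.
z_lo <- (20 - mu) / sigma   # -1
z_hi <- (50 - mu) / sigma   #  2

# With the default mean = 0 and sd = 1, pnorm() is the standard normal CDF.
p_std <- pnorm(z_hi) - pnorm(z_lo)
p_std
```

This should agree with the `pnorm(50,mean=30,sd=10) - pnorm(20,mean=30,sd=10)` result above.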
  1. Use simulations (10,000 repetitions) to compute \(P(30\leq Y \leq 45)\). (Hint: the rnorm function returns random numbers from the normal distribution. Your code could look like the following:)
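# draw 10,000 numbers from the above distribution and store them.
a <- rnorm(10000,30,10)
# count the number of draws that fall in the given range.
sum(a>30 & a<45)
## [1] 4295
# divide the count by the number of draws to get the simulated probability.
sum(a>30 & a<45)/10000
## [1] 0.4295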
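A simulated probability is the share of draws that land in the range, so it can be compared with the exact value from `pnorm`. The following is a minimal sketch; the seed and the names `n_draws`, `draws`, `p_sim`, and `p_theory` are assumptions made here for reproducibility and readability.

```r
set.seed(1)  # arbitrary seed, chosen here only to make the run reproducible

n_draws <- 10000
draws <- rnorm(n_draws, mean = 30, sd = 10)

# Share of draws in [30, 45]: the simulated probability.
p_sim <- mean(draws >= 30 & draws <= 45)

# Theoretical probability from the normal CDF.
p_theory <- pnorm(45, mean = 30, sd = 10) - pnorm(30, mean = 30, sd = 10)

c(simulated = p_sim, theoretical = p_theory)
```

Taking `mean()` of a logical vector gives the proportion of `TRUE` values, which avoids dividing by the number of draws by hand.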

Question 4:

From the class example, we know that the distribution of X, which takes the values 7, 77, and 777 with the following probabilities:

\(P(X=7)=0.7\)
\(P(X=77)=0.2\)
\(P(X=777)=0.1\)

has mean \(\mu =98\) and standard deviation \(\sigma =228.01\). Use 500 repetitions, each with a sample size of 100, to:

  1. Verify the mean and the standard deviation of the above distribution
  2. Plot the distribution of means for those 500 repetitions

Your code could look like the following:

reps <- 500
sample.size <- 100
for(n in sample.size) {
  samplemean <- rep(0,reps)
  stdsamplemean <- rep(0,reps)

  for(i in 1:reps){
    # draw one sample of size n=100; note that rnorm() draws from a normal
    # distribution with the same mean and sd, not from the discrete distribution above
    x <- rnorm(n,98,228.01)
    samplemean[i] <- mean(x)  #store the mean of the sample
    stdsamplemean[i] <- sd(x) #store the standard deviation of the sample
  }
}
# compute the mean of the sample means
mean(samplemean)
## [1] 98.53246
# compute the mean of the sample standard deviations
mean(stdsamplemean)
## [1] 226.3606
# plot your data:
  # the hist() function is the easiest way to do so.
hist(samplemean)

  #we will learn more sophisticated plotting skills later.
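
The code above draws from a normal distribution with the stated mean and standard deviation. The sketch below instead draws from the three-point distribution itself, using base R's `sample()` with its `prob` argument, and also computes \(\mu\) and \(\sigma\) exactly from the probabilities. The seed and the names `vals`, `probs`, `mu`, and `sigma` are assumptions introduced here for illustration.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

vals  <- c(7, 77, 777)      # possible values of X
probs <- c(0.7, 0.2, 0.1)   # their probabilities

# Exact mean and standard deviation of the distribution.
mu    <- sum(vals * probs)
sigma <- sqrt(sum((vals - mu)^2 * probs))

reps <- 500
sample.size <- 100

# Each repetition draws 100 values of X and records the sample mean and sd.
samplemean    <- numeric(reps)
stdsamplemean <- numeric(reps)
for (i in seq_len(reps)) {
  x <- sample(vals, size = sample.size, replace = TRUE, prob = probs)
  samplemean[i]    <- mean(x)
  stdsamplemean[i] <- sd(x)
}

c(mu = mu, sigma = sigma)
mean(samplemean)      # should be close to mu
mean(stdsamplemean)   # should be roughly comparable to sigma
hist(samplemean)
```

Sampling from the discrete distribution is what makes the histogram informative: the underlying distribution is strongly skewed, yet the sample means should cluster around \(\mu\).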