The following table gives the joint probability distribution between employment status and college graduation among those either employed or looking for work (unemployed) in the working-age U.S. population for September 2017.
| | Unemployed (Y=0) | Employed (Y=1) |
|---|---|---|
| Non-college grads (X=0) | 0.026 | 0.576 |
| College grads (X=1) | 0.009 | 0.389 |
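p <- matrix(c(.026,.009,.576,.389), ncol=2) # filled column by column: rows are X, columns are Y
p
## [,1] [,2]
## [1,] 0.026 0.576
## [2,] 0.009 0.389
py <- apply(p,2,sum) # marginal distribution of Y: Pr(Y=0), Pr(Y=1)
Ey <- 0*py[1] + 1*py[2]
Ey
## [1] 0.965
Unemployment rate = (number of unemployed) / (size of the labor force) = Pr(Y=0) = 1 - Pr(Y=1) = 1 - E(Y) = 0.035
px <- apply(p,1,sum) # marginal distribution of X: Pr(X=0), Pr(X=1)
# conditional probabilities Pr(Y=y | X=x) = Pr(X=x, Y=y) / Pr(X=x)
p_y0_x0 <- p[1,1]/px[1]
p_y1_x0 <- p[1,2]/px[1]
p_y0_x1 <- p[2,1]/px[2]
p_y1_x1 <- p[2,2]/px[2]
e_y_x1 <- 0*p_y0_x1 + 1*p_y1_x1 # E(Y | X=1)
e_y_x1
## [1] 0.9773869
e_y_x0 <- 0*p_y0_x0 + 1*p_y1_x0 # E(Y | X=0)
e_y_x0
## [1] 0.9568106
unemployment_college <- 1-e_y_x1 # unemployment rate among college graduates
unemployment_college
## [1] 0.02261307
unemployment_non <- 1-e_y_x0 # unemployment rate among non-college graduates
unemployment_non
## [1] 0.04318937
p_x1_y0 <- p[2,1]/py[1] # Pr(X=1 | Y=0): share of the unemployed who are college graduates
p_non <- 1-p_x1_y0 # Pr(X=0 | Y=0): share of the unemployed who are not college graduates
p_non
## [1] 0.7428571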
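No. Employment status and educational achievement are not independent. Independence would require the conditional distribution of Y to be the same for every value of X, but E(Y|X=1) differs from E(Y|X=0), so the unemployment rate depends on whether a worker is a college graduate.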
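One way to check the independence claim directly is to compare the joint distribution with the product of its marginals. The sketch below rebuilds the joint distribution from the table values; the names `joint`, `marg_x`, `marg_y`, `indep`, `max_gap`, and `cond_unemp` are illustrative and not part of the original solution.

```r
# Joint distribution copied from the table (rows: X = 0, 1; columns: Y = 0, 1).
joint <- matrix(c(0.026, 0.576,
                  0.009, 0.389),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("X=0", "X=1"), c("Y=0", "Y=1")))

marg_x <- rowSums(joint)  # Pr(X = x)
marg_y <- colSums(joint)  # Pr(Y = y)

# Under independence the joint would equal the outer product of the marginals.
indep <- outer(marg_x, marg_y)

# Largest absolute gap between the actual joint and the independent benchmark.
max_gap <- max(abs(joint - indep))
max_gap

# Conditional unemployment rates Pr(Y = 0 | X = x); they coincide only under independence.
cond_unemp <- joint[, "Y=0"] / marg_x
cond_unemp
```

A gap clearly above zero, together with conditional unemployment rates that differ across the two rows, is consistent with the answer that X and Y are dependent.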
The random variable Y has a mean of 1 and a variance of 4. Let Z=(Y-1)/2. Show that E(Z)=0 and var(Z)=1.
ey <- 1
vy <- 4
ez <- (1/2)*ey-(1/2)
ez
## [1] 0
vz <- (1/2)^2*(vy)
vz
## [1] 1
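The same result follows from the linearity of expectation and the scaling rule for variance. As a short written derivation using only the values given in the problem:

\[ E(Z) = E\left(\frac{Y-1}{2}\right) = \frac{E(Y)-1}{2} = \frac{1-1}{2} = 0 \]

\[ \operatorname{var}(Z) = \operatorname{var}\left(\frac{Y-1}{2}\right) = \left(\frac{1}{2}\right)^2 \operatorname{var}(Y) = \frac{4}{4} = 1 \]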
Y follows a normal distribution with mean 30 and variance 100 (standard deviation 10).
pnorm(50,mean=30,sd=10) - pnorm(20,mean=30,sd=10)
## [1] 0.8185946
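The probability above can be cross-checked by standardizing Y by hand. Here \(\Phi\) denotes the standard normal CDF, a symbol introduced only for this check:

\[ P(20 \le Y \le 50) = \Phi\left(\frac{50-30}{10}\right) - \Phi\left(\frac{20-30}{10}\right) = \Phi(2) - \Phi(-1) \approx 0.97725 - 0.15866 \approx 0.8186 \]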
pnorm
## function (q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
## .Call(C_pnorm, q, mean, sd, lower.tail, log.p)
## <bytecode: 0x7fcd71439120>
## <environment: namespace:stats>
# draw 10,000 numbers from the above distribution and store them.
a <- rnorm(10000,30,10)
# count the number of draws that fall between 30 and 45.
sum(a>30 & a<45)
## [1] 4295
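The count of 4295 out of 10,000 draws is a simulated share of 0.4295, which can be compared with the exact probability that Y falls between 30 and 45. The sketch below repeats the simulation alongside the exact value; the seed and the names `draws`, `sim_share`, and `exact` are assumptions made for illustration, so the simulated share will not match 0.4295 exactly.

```r
set.seed(1)  # arbitrary seed, chosen only so the run is repeatable

# Simulated share of draws from N(30, 10^2) that fall strictly between 30 and 45.
draws <- rnorm(10000, mean = 30, sd = 10)
sim_share <- mean(draws > 30 & draws < 45)

# Exact probability of the same event from the normal CDF.
exact <- pnorm(45, mean = 30, sd = 10) - pnorm(30, mean = 30, sd = 10)

c(simulated = sim_share, exact = exact)
```

The exact probability is about 0.433, so a count near 4,330 out of 10,000 is what the simulation should produce on average.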
From the class example, we know that the distribution of X with possible values 7, 77, and 777 and the following probabilities:
\(P(X=7)=0.7\)
\(P(X=77)=0.2\)
\(P(X=777)=0.1\)
has mean \(\mu =98\) and standard deviation \(\sigma =228.01\). Use 500 repetitions, each with a sample size of 100, to simulate the distribution of the sample mean.
Your code could look like the following:
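reps <- 500
sample.size <- 100
for(n in sample.size) { # only one sample size here, so the outer loop runs once
  samplemean <- rep(0,reps)
  stdsamplemean <- rep(0,reps)
  for(i in 1:reps){
    # note: rnorm() draws from a normal distribution with the same mean and sd as X,
    # not from the three-point distribution of X itself
    x <- rnorm(n,98,228.01) # draw one such sample with n=100
    samplemean[i] <- mean(x) # store the mean of the sample
    stdsamplemean[i] <- sd(x) # store the standard deviation of the sample
  }
}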
# compute the mean of the sample means
mean(samplemean)
## [1] 98.53246
# compute the mean of the sample standard deviations
mean(stdsamplemean)
## [1] 226.3606
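Both simulated values sit close to their population counterparts: the mean of the sample means (98.53) is near \(\mu = 98\), and the average sample standard deviation (226.36) is near \(\sigma = 228.01\). For reference, standard results for the sample mean \(\bar{X}\) of n independent draws give the following benchmarks. The spread of the sample means, `sd(samplemean)`, is not computed in the code above, and would be expected to be close to the second value:

\[ E(\bar{X}) = \mu = 98, \qquad \operatorname{sd}(\bar{X}) = \frac{\sigma}{\sqrt{n}} = \frac{228.01}{\sqrt{100}} = 22.801 \]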
# plot your data:
# the hist() function is the easiest way to do so.
hist(samplemean)
#we will learn more sophisticated plotting skills later.
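The simulation above draws from a normal distribution that shares the mean and standard deviation of X. A possible variant, sketched here under that observation, draws from the three-point distribution itself using `sample()`. The seed and the names `vals`, `probs`, `mu`, `sigma`, and `xbar` are illustrative choices, not part of the original code.

```r
set.seed(1)  # arbitrary seed, chosen only so the run is repeatable

vals  <- c(7, 77, 777)     # possible values of X
probs <- c(0.7, 0.2, 0.1)  # their probabilities

# Population mean and standard deviation computed from the definition.
mu    <- sum(vals * probs)
sigma <- sqrt(sum((vals - mu)^2 * probs))

reps <- 500
n    <- 100

# For each repetition, draw n values of X with replacement and keep the sample mean.
xbar <- replicate(reps, mean(sample(vals, size = n, replace = TRUE, prob = probs)))

c(mu = mu, sigma = sigma,
  mean_xbar = mean(xbar), sd_xbar = sd(xbar),
  se_theory = sigma / sqrt(n))
```

Calling `hist(xbar)` afterwards shows the sampling distribution of the mean for the discrete case, which can be compared with the histogram from the normal draws.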