R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

Summary of different R functions

| Distribution | Random number generation | Density (PDF; PMF for Poisson) | Cumulative distribution function (CDF) | Quantile function (inverse of the CDF) |
|---|---|---|---|---|
| Normal | rnorm(n, mean = 0, sd = 1) | dnorm(x, mean = 0, sd = 1, log = FALSE) | pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) | qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) |
| t | rt(n, df, ncp) | dt(x, df, ncp, log = FALSE) | pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE) | qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE) |
| Exponential | rexp(n, rate = 1) | dexp(x, rate = 1, log = FALSE) | pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE) | qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE) |
| Poisson | rpois(n, lambda) | dpois(x, lambda, log = FALSE) | ppois(q, lambda, lower.tail = TRUE, log.p = FALSE) | qpois(p, lambda, lower.tail = TRUE, log.p = FALSE) |
| Chi-square | rchisq(n, df, ncp = 0) | dchisq(x, df, ncp = 0, log = FALSE) | pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) | qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) |
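
The four prefixes fit together: `p` is the CDF, `q` inverts it, `d` is the density whose integral gives the CDF, and `r` draws random values. A minimal sketch using the normal family (the point 1.5, the seed, and the number of draws are arbitrary choices for illustration):

```r
x <- 1.5

# q* undoes p*: the quantile function is the inverse of the CDF
p <- pnorm(x)
x.back <- qnorm(p)

# integrating the density d* from -Inf to x reproduces the CDF p*
area <- integrate(dnorm, lower = -Inf, upper = x)$value

# r* draws random values; the share of draws <= x should be close to p
set.seed(1)
draws <- rnorm(10000)
share <- mean(draws <= x)

c(p = p, x.back = x.back, area = area, share = share)
```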

6.3 Using the Normal Curve to Approximate Binomial Distribution Problems

6.17

Convert the following binomial distribution problems to normal distribution problems. Use the correction for continuity.

a.

P(x ≤ 16|n = 30 and p = .70)

# function to calculate mean for a binomial distribution 
bi.mu <- function(n, p){
  return(n*p)
}

# function to calculate standard deviation for a binomial distribution 
bi.sd <- function(n, p){
  return((n*p*(1-p))^.5)
}

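# Shared helper for problems 6.17 and 6.19.
# p17 = TRUE: only restate the problem as a normal distribution problem (6.17).
# p17 = FALSE: also check that mu +/- 3*sd lies within [0, n] and, if it does, compute the probability (6.19).
# dec is the number of decimal places used when rounding the CDF values
# a and b are the continuity-corrected lower and upper bounds for x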
f17 <- function(n, p, a=-Inf, b=Inf, dec=4, p17=TRUE){
  mu = bi.mu(n, p)
  sd = bi.sd(n, p)
  
  # normal CDF evaluated at the upper bound b and the lower bound a
  b.cpdf = round(pnorm(b, mu, sd), dec)
  a.cpdf = round(pnorm(a, mu, sd), dec)
  
  # probability that x falls between a and b
  cpdf = b.cpdf-a.cpdf
  
  l = mu - 3 * sd
  h = mu + 3 * sd
  
  if (p17){
    sprintf("P(%.2f <= x <= %.2f | mu = %.2f and sd = %.2f)", a, b, mu, sd)
  } else {
    if ( !( 0 <= l && h <= n) ){
      sprintf("Failed Test: %.2f <= %.2f & %.2f <= %.2f", 0, l, h, n)
    } else {
      sprintf("P(%.2f <= x <= %.2f | mu = %.2f and sd = %.2f) = %.4f", a, b, mu, sd, cpdf)
    }
  }
  
}

f17(30, .7, b=16.5)
## [1] "P(-Inf <= x <= 16.50 | mu = 21.00 and sd = 2.51)"

b.

P(10 < x ≤ 20|n = 25 and p = .50)

f17(25, .5, a=10.5, b=20.5)
## [1] "P(10.50 <= x <= 20.50 | mu = 12.50 and sd = 2.50)"

c.

P(x = 22|n = 40 and p = .60)

f17(40, .6, a=21.5, b=22.5)
## [1] "P(21.50 <= x <= 22.50 | mu = 24.00 and sd = 3.10)"

d.

P(x > 14|n = 16 and p = .45)

f17(16, .45, a=14.5)
## [1] "P(14.50 <= x <= Inf | mu = 7.20 and sd = 1.99)"

6.19

Where appropriate, work the following binomial distribution problems by using the normal curve. Also, use Table A.2 to find the answers by using the binomial distribution and compare the answers obtained by the two methods.

a.

P(x = 8|n = 25 and p = .40) = ?

reference: https://www.r-bloggers.com/normal-distribution-functions/

# function for problem 6.19, using same function as in problem 6.17 with p17=FALSE
f19 <- function(n, p, a=-Inf, b=Inf, dec=4, p17=FALSE){
  f17(n, p, a, b, dec, p17)
}

f19(n=25, p=.4, a=7.5, b=8.5)
## [1] "P(7.50 <= x <= 8.50 | mu = 10.00 and sd = 2.45) = 0.1164"

b.

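Note the difference between > and ≥ (and likewise between < and ≤): whether the endpoint is included decides the direction of the continuity correction. Here x ≥ 13 includes 13, so the lower bound becomes 12.5.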

P(x ≥ 13|n = 20 and p = .60) = ?

f19(n=20, p=.6, a=12.5, b=Inf)
## [1] "P(12.50 <= x <= Inf | mu = 12.00 and sd = 2.19) = 0.4097"

c.

P(x = 7|n = 15 and p = .50) = ?

f19(n=15, p=.5, a=6.5, b=7.5)
## [1] "P(6.50 <= x <= 7.50 | mu = 7.50 and sd = 1.94) = 0.1972"

d.

P(x < 3|n = 10 and p = .70) = ?

f19(n=10, p=.7, a=-Inf, b=2.5)
## [1] "Failed Test: 0.00 <= 2.65 & 11.35 <= 10.00"
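
Problem 6.19 also asks for a comparison with the binomial distribution. As a sketch, the exact values can be computed with `dbinom()` and `pbinom()` instead of a table lookup (that swap is an assumption of this example, and `norm.approx` is a hypothetical helper name), next to the normal approximations with the continuity correction:

```r
# normal approximation of P(a <= x <= b) for a binomial(n, p);
# a and b are expected to be continuity-corrected already
norm.approx <- function(n, p, a = -Inf, b = Inf) {
  mu <- n * p
  s <- sqrt(n * p * (1 - p))
  pnorm(b, mu, s) - pnorm(a, mu, s)
}

cmp <- data.frame(
  problem = c("a", "b", "c"),
  exact = c(
    dbinom(8, 25, .4),                       # P(x = 8)
    pbinom(12, 20, .6, lower.tail = FALSE),  # P(x >= 13) = 1 - P(x <= 12)
    dbinom(7, 15, .5)                        # P(x = 7)
  ),
  normal = c(
    norm.approx(25, .4, 7.5, 8.5),
    norm.approx(20, .6, a = 12.5),
    norm.approx(15, .5, 6.5, 7.5)
  )
)
cmp$diff <- cmp$normal - cmp$exact
cmp
```

Part d is left out because it fails the mu ± 3 sd test, so only the exact value `pbinom(2, 10, .7)` applies there.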

6.21

One study on managers’ satisfaction with management tools reveals that 59% of all managers use self-directed work teams as a management tool. Suppose 70 managers selected randomly in the United States are interviewed. What is the probability that fewer than 35 use self-directed work teams as a management tool?

Reference: http://www.r-tutor.com/elementary-statistics/probability-distributions/binomial-distribution

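Why is the answer .0495? That value appears to match the normal approximation with the continuity correction, using z = (34.5 − 41.3)/4.115 ≈ −1.65 looked up in a z table. The code below computes the exact binomial probability instead, which is why it gives .0501.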

n = 70
p = .59
sum(dbinom(seq(0, 34), 70, .59))
## [1] 0.05013635
pbinom(34.5, size = 70, prob = .59)
## [1] 0.05013635
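
The value .0495 appears to come from the normal approximation this section is about, not from the exact binomial sum. A sketch, assuming the z score is rounded to two decimals as it would be for a table lookup:

```r
n <- 70
p <- .59
mu <- n * p                     # 41.3
s <- sqrt(n * p * (1 - p))

# "fewer than 35" means x <= 34, so the continuity-corrected bound is 34.5
z <- (34.5 - mu) / s

pnorm(z)                        # normal approximation with unrounded z
round(pnorm(round(z, 2)), 4)    # z rounded to two decimals, as in a z table
```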

6.23

According to the International Data Corporation, HP is the leading company in the United States in PC sales with about 26% of the market share. Suppose a business researcher randomly selects 130 recent purchasers of PCs in the United States.

a.

What is the probability that more than 39 PC purchasers bought an HP computer?

p = .26
n = 130
x = 39
pbinom(x, n, p, lower.tail = FALSE)
## [1] 0.1280253

b.

What is the probability that between 28 and 38 PC purchasers (inclusive) bought an HP computer?

pbinom(38, n, p) - pbinom(27, n, p)
## [1] 0.724942

c.

What is the probability that fewer than 23 PC purchasers bought an HP computer?

pbinom(22, n, p)
## [1] 0.009623083

d.

What is the probability that exactly 33 PC purchasers bought an HP computer?

dbinom(33, n, p)
## [1] 0.07915209

6.4 Exponential Distribution

PDF

exp.d <- function(x, lambda=1){
  return(lambda * exp(-lambda * x))
}

x = seq(0, 8, .01)
y = dexp(x)
y1 = exp.d(x)
# compare with a tolerance rather than exact floating-point equality
isTRUE(all.equal(y, y1))
## [1] TRUE

PDF plot for several values of lambda

require("ggplot2")
## Loading required package: ggplot2
lambdas = c(.2, .5, 1., 2.)

df <- data.frame()

for (lambda in lambdas) {
  df0 = data.frame(x)
  df0$y = dexp(x, lambda)
  df0$lambda = as.character(lambda)
  df = rbind(df, df0)
}

g <- ggplot(data=df, aes(x=x, y=y), size=.8) + 
  geom_line(aes(col=lambda)) + 
  ggtitle("Exponential Distribution") +
  theme(
    panel.background = element_blank(),
    plot.title = element_text(family = "Helvetica", face="bold", color="steelblue", size = 15, hjust=.5),
    axis.title = element_text(family = "Helvetica", size = 15, color="steelblue4"),
    axis.text = element_text(family = "Courier", color="cornflowerblue", size = 15)
  )
g

Probabilities of Exponential Distribution

When random arrivals occur at a rate of 1.2 per minute, 2 minutes or more will elapse between arrivals about 9.07% of the time.

exp.p <- function(x, lambda, lower.tail = TRUE){
  p = exp(-x*lambda)
  if (lower.tail){
    return(1-p)
  } else {
    return(p)
  }
  
}

f6.27 <- function(x, l, lower.tail=TRUE){
  print(c(pexp(x, l, lower.tail), exp.p(x, l, lower.tail)))
}
f6.27(2, 1.2, FALSE)
## [1] 0.09071795 0.09071795
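
The same tail probability is the area under the exponential density to the right of 2. A short numerical check with `integrate()`, given as a sketch:

```r
lambda <- 1.2

# area under the density from 2 to infinity
tail.area <- integrate(dexp, lower = 2, upper = Inf, rate = lambda)$value

# closed form: P(x >= x0) = exp(-lambda * x0)
closed.form <- exp(-lambda * 2)

c(tail.area, closed.form)
```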

6.27

a.

P(x ≥ 5|λ = 1.35)

f6.27(5, 1.35, FALSE)
## [1] 0.00117088 0.00117088

b.

P(x < 3|λ = 0.68)

f6.27(3, .68)
## [1] 0.8699713 0.8699713

c.

P(x > 4|λ = 1.7)

f6.27(4, 1.7, FALSE)
## [1] 0.001113775 0.001113775

d.

P(x < 6|λ = 0.80)

f6.27(6, .8, TRUE)
## [1] 0.9917703 0.9917703

6.33

During the dry month of August, one U.S. city has measurable rain on average only two days per month. If the arrival of rainy days is Poisson distributed in this city during the month of August, what is the average number of days that will pass between measurable rain? What is the standard deviation? What is the probability during this month that there will be a period of less than two days between rain?

f6.27(2, 1/15)
## [1] 0.1248267 0.1248267
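
The call above only answers the last question. The first two follow from the exponential distribution having a mean and a standard deviation that both equal 1/λ. A sketch, assuming a 30-day month so that λ = 2/30 = 1/15 per day, which is the rate used in the call above:

```r
days.in.month <- 30           # assumption; it matches the rate 1/15 used above
lambda <- 2 / days.in.month   # rainy days per day

mean.gap <- 1 / lambda        # average number of days between rain
sd.gap <- 1 / lambda          # for an exponential distribution the sd equals the mean

p.less.2 <- pexp(2, rate = lambda)   # P(gap < 2 days)

c(mean = mean.gap, sd = sd.gap, p = p.less.2)
```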

Chapter 7. Sampling and Sampling Distributions

7.1 Sampling

7.2 Sampling Distribution of x bar

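Standard error of the sample mean (CLT)

# In the helpers below, rho denotes a standard deviation (usually written sigma), not a correlation.
# Standard error of the sample mean: rho / sqrt(n)
clt.sd <- function(rho, n){
  return(rho/n^.5)
}

# Standard error with the finite population correction for a population of size N
clt.sd.f <- function(rho, n, N){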
  return( rho/(n^.5) * ((N-n)/(N-1))^.5 )
}


z.score <- function(x, mu, rho){
  (x-mu)/rho
}
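
A small simulation can make the standard error formula concrete. This is a sketch with arbitrary choices (the seed, 5000 replications, a normal population with mean 50 and standard deviation 10, samples of size 64); it does not use the helpers above, so it runs on its own:

```r
set.seed(42)
sigma <- 10
n <- 64

# draw many samples and keep each sample mean
sample.means <- replicate(5000, mean(rnorm(n, mean = 50, sd = sigma)))

simulated.se <- sd(sample.means)    # spread of the simulated sample means
theoretical.se <- sigma / sqrt(n)   # sigma / sqrt(n) = 1.25

c(simulated = simulated.se, theoretical = theoretical.se)
```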

7.13

A population has a mean of 50 and a standard deviation of 10. If a random sample of 64 is taken, what is the probability that the sample mean is each of the following?

a.

Greater than 52

x=52;mu=50;rho=10;size=64
rho.x_=clt.sd(rho, size)
q = z.score(x, mu, rho.x_)
pnorm(q, lower.tail = FALSE)
## [1] 0.05479929
f7.13 <- function(x=52, mu=50, rho=10, n=64, lower.tail=TRUE){
  rho.x_=clt.sd(rho, n)
  q = z.score(x, mu, rho.x_)
  return(pnorm(q, lower.tail=lower.tail))
}
f7.13(lower.tail=FALSE)
## [1] 0.05479929

b.

Less than 51

f7.13(x=51)
## [1] 0.7881446

c.

Less than 47

f7.13(x=47)
## [1] 0.008197536

d.

Between 48.5 and 52.4

f7.13(x=52.4) - f7.13(x=48.5)
## [1] 0.8575014

e.

Between 50.6 and 51.3

f7.13(x=51.3) - f7.13(x=50.6)
## [1] 0.1664437

7.15

Suppose a random sample of size 36 is drawn from a population with a mean of 278. If 86% of the time the sample mean is less than 280, what is the population standard deviation?

z = qnorm(.86)
x=280;mu=278;n=36
rho = (x-mu)/z*n^.5
rho
## [1] 11.10783
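
The rearrangement comes from solving z = (x̄ − μ)/(σ/√n) for σ. Plugging the result back in is a quick way to confirm it, sketched here with the numbers from the problem:

```r
x.bar <- 280; mu <- 278; n <- 36

z <- qnorm(.86)                       # z value with 86% of the area to its left
sigma <- (x.bar - mu) / z * sqrt(n)   # z = (x.bar - mu) / (sigma / sqrt(n)) solved for sigma

# check: with this sigma, P(sample mean < 280) should be 0.86
pnorm(x.bar, mean = mu, sd = sigma / sqrt(n))
```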

7.17

Find the probability in each case.

a.

f7.17 <- function(x=76.5, mu=75, rho=6, n=60, N=1000, lower.tail=TRUE){
  rho.x_=clt.sd.f(rho, n, N)
  q = z.score(x, mu, rho.x_)
  return(pnorm(q, lower.tail=lower.tail))
}
f7.17()
## [1] 0.9770515

b.

x1=107;x2=107.7;N=90;n=36;mu=108;rho=3.46
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.2391082

c.

x1=36;x2=Inf;N=250;n=100;mu=35.6;rho=4.89
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.1459611

d.

x1=-Inf;x2=123;N=5000;n=60;mu=125;rho=13.4
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.1224152

7.19

Suppose a subdivision on the southwest side of Denver, Colorado, contains 1,500 houses. The subdivision was built in 1983. A sample of 100 houses is selected randomly and evaluated by an appraiser. If the mean appraised value of a house in this subdivision for all houses is $227,000, with a standard deviation of $8,500, what is the probability that the sample average is greater than $229,000?

N=1500;n=100;mu=227000;rho=8500;x1=229000;x2=Inf
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.007451792

7.21

According to Nielsen Media Research, the average number of hours of TV viewing by adults (18 and over) per week in the United States is 36.07 hours. Suppose the standard deviation is 8.4 hours and a random sample of 42 adults is taken.

a.

What is the probability that the sample average is more than 38 hours?

mu=36.07;rho=8.4;n=42;x1=38;x2=Inf
f7.13(x2,mu,rho,n) - f7.13(x1, mu, rho, n)
## [1] 0.06824009

b.

What is the probability that the sample average is less than 33.5 hours?

f7.13(33.5, mu, rho, n)
## [1] 0.023695

c.

What is the probability that the sample average is less than 26 hours?

f7.13(26, mu, rho, n)
## [1] 3.94999e-15

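If the sample average actually is less than 26 hours, what would it mean in terms of the Nielsen Media Research figures? A sample mean this low is practically impossible if the Nielsen figures hold (the probability is about 3.9e-15), so either the figures do not describe the population that was sampled, or the sample was not random and is heavily biased.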

d.

Suppose the population standard deviation is unknown. If 71% of all sample means are greater than 35 hours and the population mean is still 36.07 hours, what is the value of the population standard deviation?

z = qnorm(.71, lower.tail = FALSE)
x=35;mu=36.07;n=42
rho = (x-mu)/z*n^.5
rho
## [1] 12.53087

7.3 Sampling Distribution of p_hat (sample proportion)

7.5 z Formula for sample proportions for np>5 and nq>5

z.phat <- function(phat, p, n){
  return((phat-p)/(p*(1-p)/n)^.5)
}
z.phat(.5, .6, 120)
## [1] -2.236068
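
The z formula relies on the normal approximation, which the heading restricts to np > 5 and nq > 5. A hypothetical helper (`phat.ok` is not from the source) that checks the condition before the formula is applied:

```r
# TRUE when both n*p and n*q exceed 5, where q = 1 - p
phat.ok <- function(p, n) {
  q <- 1 - p
  (n * p > 5) && (n * q > 5)
}

phat.ok(.6, 120)    # n*p = 72 and n*q = 48, so the condition holds
phat.ok(.02, 100)   # n*p = 2, so the approximation is not appropriate
```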

7.23

A population proportion is .58. Suppose a random sample of 660 items is sampled randomly from this population.

a.

What is the probability that the sample proportion is greater than .60?

p=.58;n=660;phat=.6
q = z.phat(phat, p, n)
pnorm(q, lower.tail = FALSE)
## [1] 0.1489308

b.

What is the probability that the sample proportion is between .55 and .65?

q1 = z.phat(.55, p, n)
q2 = z.phat(.65, p, n)
pnorm(q2) - pnorm(q1)
## [1] 0.940668

c.

What is the probability that the sample proportion is greater than .57?

q = z.phat(.57, p, n)
pnorm(q, lower.tail = FALSE)
## [1] 0.6986477

d.

What is the probability that the sample proportion is between .53 and .56?

q1 = z.phat(.53, p, n)
q2 = z.phat(.56, p, n)
pnorm(q2) - pnorm(q1)
## [1] 0.1443044

e.

What is the probability that the sample proportion is less than .48?

pnorm(z.phat(.48, p, n))
## [1] 9.69195e-08

7.25

If a population proportion is .28 and if the sample size is 140, 30% of the time the sample proportion will be less than what value if you are taking random samples?

p=.28;n=140;
z = qnorm(.3)
phat <- z * (p*(1-p)/n)^.5 + p
phat
## [1] 0.2601004

7.27

According to a survey by Accountemps, 48% of executives believe that employees are most productive on Tuesdays. Suppose 200 executives are randomly surveyed.

a.

What is the probability that fewer than 90 of the executives believe employees are most productive on Tuesdays?

p=.48;n=200
pnorm(z.phat(90/n, p, n))
## [1] 0.1978828

b.

What is the probability that more than 100 of the executives believe employees are most productive on Tuesdays?

pnorm(z.phat(100/n, p, n), lower.tail = FALSE)
## [1] 0.2856498

c.

What is the probability that more than 80 of the executives believe employees are most productive on Tuesdays?

pnorm(z.phat(80/n, p, n), lower.tail = FALSE)
## [1] 0.98823

Test2

(1)

Assume the outcomes from an experiment with a trials conform to a Poisson distribution with lambda equal to 2. Determine the probability of obtaining an outcome greater than 4. (Round to four decimal places.)

# "greater than 4" means X >= 5, i.e. 1 - P(X <= 4)
ppois(4, 2, lower.tail = FALSE)
## [1] 0.05265302

(2) Use Bayes’ theorem to find the indicated probability.

5.8% of a population is infected with a certain disease. There is a test for the disease, however the test is not completely accurate. 93.9% of those who have the disease test positive. However 4.1% of those who do not have the disease also test positive (false positives). A person is randomly selected and tested for the disease.

What is the probability that the person has the disease given that the test result is positive?

p.disease = .058
p.positive.when.disease = .939
p.positive.when.disease_no = .041

p.disease_no = 1 - p.disease

p.positive = p.positive.when.disease*p.disease + p.positive.when.disease_no*p.disease_no

p.disease.when.positive = p.disease * p.positive.when.disease / p.positive
p.disease.when.positive
## [1] 0.5850844
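
The same result can be read off with counts instead of probabilities. A sketch that applies the three rates to a hypothetical population of 100,000 people (the population size is an arbitrary choice and cancels out):

```r
population <- 100000

diseased <- population * .058
healthy <- population - diseased

true.positives <- diseased * .939    # diseased people who test positive
false.positives <- healthy * .041    # healthy people who test positive

# share of all positive tests that belong to diseased people
posterior <- true.positives / (true.positives + false.positives)
posterior
```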

(3)

Given the following probability distribution for the variable x, compute the expected value and variance of x using the distribution below. Round to two decimal places.

| x | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| Probability | .749 | .225 | .024 | .002 |

x <- c(0, 1, 2, 3)
p <- c(.749, .225, .024, .002)

mu = x %*% p
round(mu, 2)
##      [,1]
## [1,] 0.28
discrete.var <- function(x, mu, p){
  return((x-c(mu))^2 %*% p)
}
round(discrete.var(x, mu, p), 2)
##      [,1]
## [1,] 0.26
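
An equivalent route avoids matrix products and uses the shortcut Var(x) = E[x²] − μ². A sketch with plain `sum()`, which also returns ordinary numbers rather than 1×1 matrices:

```r
x <- c(0, 1, 2, 3)
p <- c(.749, .225, .024, .002)

stopifnot(isTRUE(all.equal(sum(p), 1)))   # probabilities must sum to 1

mu <- sum(x * p)                  # expected value
variance <- sum(x^2 * p) - mu^2   # E[x^2] - mu^2

round(c(mean = mu, variance = variance), 2)
```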

(4) Solve the problem.

Given the following sample data: 1.3,2.2,2.7,3.1,3.3,3.7, use quantile() in R with type = 7 to find the estimated 33rd percentile. Round to 2 decimal places. Pick the correct answer.

round(quantile(c(1.3,2.2,2.7,3.1,3.3,3.7), probs = c(0, .25, .33, .5, .75, 1), type = 7), 2)
##   0%  25%  33%  50%  75% 100% 
## 1.30 2.33 2.53 2.90 3.25 3.70
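
Type 7 interpolates linearly between order statistics at position h = (n − 1)p + 1. A manual sketch of that rule for the 33rd percentile (`q7` is a hypothetical helper name), which can be compared with `quantile()`:

```r
q7 <- function(x, p) {
  x <- sort(x)
  n <- length(x)
  h <- (n - 1) * p + 1    # fractional position in the sorted data
  lo <- floor(h)
  hi <- min(lo + 1, n)    # guard against running past the last value
  x[lo] + (h - lo) * (x[hi] - x[lo])
}

dat <- c(1.3, 2.2, 2.7, 3.1, 3.3, 3.7)
q7(dat, .33)                                   # h = 2.65, between 2.2 and 2.7
unname(quantile(dat, probs = .33, type = 7))   # built-in result for comparison
```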

(5) Solve the problem.

A study of the amount of time it takes a mechanic to rebuild the transmission for a 2005 Chevrolet Cavalier shows that the mean is 8.4 hours and the standard deviation is 1.8 hours. If a random sample of 36 mechanics is selected, find the probability that their mean rebuild time exceeds 8.7 hours. Assume the mean rebuild time has a normal distribution. (Hint, interpolate in the tables or use pnorm().)

round(f7.13(x=8.7, mu=8.4, rho=1.8, n=36, lower.tail=FALSE), 3)
## [1] 0.159

(6)

Use the normal distribution to estimate the probability of 50 successes for a binomial distribution with n = 76 and the probability of success p = 0.7. Do NOT use the binomial distribution but use normal approximation to binomial distribution. Round to four decimal places.

n=76
p=.7

mu = bi.mu(n, p)
sd = bi.sd(n, p)

round(dnorm(50, mu, sd), 4)
## [1] 0.0725
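
`dnorm()` returns the height of the normal curve at 50, which approximates the area of a strip of width 1. The continuity correction used in section 6.3 instead takes the area between 49.5 and 50.5. A sketch of that variant, with the exact binomial value shown only as a sanity check:

```r
n <- 76
p <- .7
mu <- n * p
s <- sqrt(n * p * (1 - p))

# area under the normal curve between 49.5 and 50.5
approx.cc <- pnorm(50.5, mu, s) - pnorm(49.5, mu, s)

# height of the normal curve at 50
approx.height <- dnorm(50, mu, s)

# exact binomial probability, for reference only
exact <- dbinom(50, n, p)

round(c(continuity = approx.cc, height = approx.height, exact = exact), 4)
```

The two normal-based values are close but need not agree in the fourth decimal, so it is worth checking which of them the question expects.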

(7)

The given values are discrete (binomial outcomes). Use the continuity correction and describe the region of the normal distribution that corresponds to the indicated probability.

The probability of less than 48 correct answers.

 The area to the left of 47.5 

(8) Solve the problem. Round to the nearest tenth unless indicated otherwise.

In one region, the September energy consumption levels for single-family homes are found to be normally distributed with a mean of 1050 kWh and a standard deviation of 225 kWh. Find P45 (45th percentile).

mu = 1050
rho = 225
p = .45
round(qnorm(p, mu, rho), 1)
## [1] 1021.7

(9) Find the indicated probability.

In a taste test, five different customers are each presented with 3 different soft drinks. The same soft drinks are used with each customer, but presented in random order. If the selections were made by random guesses, find the probability that all five customers would pick the same soft drink as their favorite. (There is more than one way the customers can agree.)

round(3/(3^5), 5)
## [1] 0.01235
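
The count behind 3/3^5 can be confirmed by brute force: list all 3^5 equally likely ways the five customers can choose and count those where everyone agrees. A sketch using `expand.grid()`:

```r
# every combination of favorites for 5 customers and 3 drinks
choices <- expand.grid(rep(list(1:3), 5))

# a row is unanimous when it contains only one distinct drink
unanimous <- apply(choices, 1, function(row) length(unique(row)) == 1)

nrow(choices)     # 243 possible outcomes
sum(unanimous)    # 3 of them are unanimous
round(mean(unanimous), 5)
```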

(10) Find the indicated probability.

As a prize for winning a contest, the contestant is blindfolded and allowed to draw 3 dollar bills one at a time out of an urn. The urn contains forty $1 bills and ten $100 bills. The urn is churned before each selection so that the selection would be at random. What is the probability that none of the $100 bills are selected? (Round to four decimal places.)

round(40/50*39/49*38/48, 4)
## [1] 0.5041
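
Drawing without replacement is a hypergeometric setting, so the product above can be cross-checked with `dhyper()`. In this sketch the $100 bills play the role of the "white" balls in R's parameterization:

```r
# dhyper(x, m, n, k): probability of x white balls in k draws
# from an urn with m white and n black balls
p.none <- dhyper(0, m = 10, n = 40, k = 3)

round(p.none, 4)
```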