Distribution | Random Number Generation | Probability Density Function (PDF) | Cumulative Distribution Function (CDF) | Quantile Function (inverse of the CDF) |
---|---|---|---|---|
Normal | rnorm(n, mean = 0, sd = 1) | dnorm(x, mean = 0, sd = 1, log = FALSE) | pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) | qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) |
t Distribution | rt(n, df, ncp) | dt(x, df, ncp, log = FALSE) | pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE) | qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE) |
Exponential | rexp(n, rate = 1) | dexp(x, rate = 1, log = FALSE) | pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE) | qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE) |
Poisson | rpois(n, lambda) | dpois(x, lambda, log = FALSE) | ppois(q, lambda, lower.tail = TRUE, log.p = FALSE) | qpois(p, lambda, lower.tail = TRUE, log.p = FALSE) |
Chi-Square | rchisq(n, df, ncp = 0) | dchisq(x, df, ncp = 0, log = FALSE) | pchisq(q, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) | qchisq(p, df, ncp = 0, lower.tail = TRUE, log.p = FALSE) |
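Every row of the table follows the same naming pattern: the r prefix draws random values, d evaluates the density (or probability mass), p gives the cumulative probability, and q inverts p. A minimal sketch of that relationship for the standard normal case; the seed and the value 1.96 are arbitrary choices for illustration.

```r
# d/p/q/r naming convention, illustrated with the standard normal distribution
set.seed(1)                      # arbitrary seed so the draws are reproducible
draws <- rnorm(5)                # r: five random values from N(0, 1)
dens0 <- dnorm(0)                # d: density at 0, equal to 1 / sqrt(2 * pi)
p196  <- pnorm(1.96)             # p: P(X <= 1.96)
back  <- qnorm(p196)             # q: inverts p, recovering 1.96
c(density_at_0 = dens0, cdf_at_1.96 = p196, quantile_back = back)
```

The same round trip between the p and q functions should hold for the other distributions in the table once their parameters are supplied.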
Convert the following binomial distribution problems to normal distribution problems. Use the correction for continuity.
P(x ≤ 16|n = 30 and p = .70)
# function to calculate mean for a binomial distribution
bi.mu <- function(n, p){
return(n*p)
}
# function to calculate standard deviation for a binomial distribution
bi.sd <- function(n, p){
return((n*p*(1-p))^.5)
}
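# shared helper for problems 6.17 and 6.19: with p17 = TRUE it only restates the problem in normal terms (6.17);
# with p17 = FALSE it also checks the mu +/- 3*sd rule and computes the probability (6.19)
# dec is the number of decimal places used when rounding the CDF values
# a and b are the continuity-corrected lower and upper bounds for x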
f17 <- function(n, p, a=-Inf, b=Inf, dec=4, p17=TRUE){
mu = bi.mu(n, p)
sd = bi.sd(n, p)
# normal CDF evaluated at b and at a
b.cpdf = round(pnorm(b, mu, sd), dec)
a.cpdf = round(pnorm(a, mu, sd), dec)
# cumulative probability between a and b
cpdf = b.cpdf-a.cpdf
l = mu - 3 * sd
h = mu + 3 * sd
if (p17){
sprintf("P(%.2f <= x <= %.2f | mu = %.2f and sd = %.2f)", a, b, mu, sd)
} else {
if ( !( 0 <= l && h <= n) ){
sprintf("Failed Test: %.2f <= %.2f & %.2f <= %.2f", 0, l, h, n)
} else {
sprintf("P(%.2f <= x <= %.2f | mu = %.2f and sd = %.2f) = %.4f", a, b, mu, sd, cpdf)
}
}
}
f17(30, .7, b=16.5)
## [1] "P(-Inf <= x <= 16.50 | mu = 21.00 and sd = 2.51)"
P(10 < x ≤ 20|n = 25 and p = .50)
f17(25, .5, a=10.5, b=20.5)
## [1] "P(10.50 <= x <= 20.50 | mu = 12.50 and sd = 2.50)"
P(x = 22|n = 40 and p = .60)
f17(40, .6, a=21.5, b=22.5)
## [1] "P(21.50 <= x <= 22.50 | mu = 24.00 and sd = 3.10)"
P(x > 14|n = 16 and p = .45)
f17(16, .45, a=14.5)
## [1] "P(14.50 <= x <= Inf | mu = 7.20 and sd = 1.99)"
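The four conversions above follow one rule: each integer k of the binomial variable is spread over the interval from k - 0.5 to k + 0.5, so the bound moves by half a unit in whichever direction keeps the intended integers inside the region. A minimal sketch of that rule; cc_bounds is a hypothetical helper introduced here for illustration and is not part of the original solution.

```r
# Hypothetical helper (not from the original): maps a binomial event on the
# integer k to the continuity-corrected interval for the normal approximation.
cc_bounds <- function(op, k) {
  switch(op,
    "<"  = c(-Inf, k - 0.5),     # x < k   keeps integers up to k - 1
    "<=" = c(-Inf, k + 0.5),     # x <= k  keeps integers up to k
    "==" = c(k - 0.5, k + 0.5),  # x = k   keeps only k
    ">=" = c(k - 0.5, Inf),      # x >= k  keeps integers from k
    ">"  = c(k + 0.5, Inf),      # x > k   keeps integers from k + 1
    stop("unknown operator: ", op))
}
cc_bounds("<=", 16)   # bounds for P(x <= 16)
cc_bounds("==", 22)   # bounds for P(x = 22)
cc_bounds(">", 14)    # bounds for P(x > 14)
```

For a two-sided event such as 10 < x ≤ 20, the lower bound comes from the ">" case and the upper bound from the "<=" case.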
Where appropriate, work the following binomial distribution problems by using the normal curve. Also, use Table A.2 to find the answers by using the binomial distribution and compare the answers obtained by the two methods.
P(x = 8|n = 25 and p = .40) = ?
reference: https://www.r-bloggers.com/normal-distribution-functions/
# function for problem 6.19, using same function as in problem 6.17 with p17=FALSE
f19 <- function(n, p, a=-Inf, b=Inf, dec=4, p17=FALSE){
f17(n, p, a, b, dec, p17)
}
f19(n=25, p=.4, a=7.5, b=8.5)
## [1] "P(7.50 <= x <= 8.50 | mu = 10.00 and sd = 2.45) = 0.1164"
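Note the difference between > and ≥ (and likewise < and ≤): it determines whether the boundary value itself is included, and therefore whether the continuity correction adds or subtracts 0.5.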
P(x ≥ 13|n = 20 and p = .60) = ?
f19(n=20, p=.6, a=12.5, b=Inf)
## [1] "P(12.50 <= x <= Inf | mu = 12.00 and sd = 2.19) = 0.4097"
P(x = 7|n = 15 and p = .50) = ?
f19(n=15, p=.5, a=6.5, b=7.5)
## [1] "P(6.50 <= x <= 7.50 | mu = 7.50 and sd = 1.94) = 0.1972"
P(x < 3|n = 10 and p = .70) = ?
f19(n=10, p=.7, a=-Inf, b=2.5)
## [1] "Failed Test: 0.00 <= 2.65 & 11.35 <= 10.00"
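The problem statement also asks for a comparison with the exact binomial value, and the failed test in the last part means the normal curve should not be used there at all. A minimal sketch of both points, using dbinom() and pbinom() in place of the printed Table A.2 (that substitution is an assumption, not the textbook's procedure).

```r
# Exact binomial vs. continuity-corrected normal approximation for P(x = 8)
n <- 25; p <- 0.40
mu <- n * p
sigma <- sqrt(n * p * (1 - p))
exact_p  <- dbinom(8, size = n, prob = p)
approx_p <- pnorm(8.5, mu, sigma) - pnorm(7.5, mu, sigma)
c(exact = exact_p, approx = approx_p, abs_diff = abs(exact_p - approx_p))

# When mu +/- 3 * sigma falls outside [0, n], use the exact value instead:
# P(x < 3 | n = 10, p = .70) is the same event as P(x <= 2)
exact_tail <- pbinom(2, size = 10, prob = 0.70)
exact_tail
```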
One study on managers’ satisfaction with management tools reveals that 59% of all managers use self-directed work teams as a management tool. Suppose 70 managers selected randomly in the United States are interviewed. What is the probability that fewer than 35 use self-directed work teams as a management tool?
Reference: http://www.r-tutor.com/elementary-statistics/probability-distributions/binomial-distribution
n = 70
p = .59
sum(dbinom(seq(0, 34), 70, .59))
## [1] 0.05013635
pbinom(34.5, size = 70, prob = .59)
## [1] 0.05013635
According to the International Data Corporation, HP is the leading company in the United States in PC sales with about 26% of the market share. Suppose a business researcher randomly selects 130 recent purchasers of PCs in the United States.
What is the probability that more than 39 PC purchasers bought an HP computer?
p = .26
n = 130
x = 39
pbinom(x, n, p, lower.tail = FALSE)
## [1] 0.1280253
What is the probability that between 28 and 38 PC purchasers (inclusive) bought an HP computer?
pbinom(38, n, p) - pbinom(27, n, p)
## [1] 0.724942
What is the probability that fewer than 23 PC purchasers bought an HP computer?
pbinom(22, n, p)
## [1] 0.009623083
What is the probability that exactly 33 PC purchasers bought an HP computer?
dbinom(33, n, p)
## [1] 0.07915209
exp.d <- function(x, lambda=1){
return(lambda * exp(-lambda * x))
}
x = seq(0, 8, .01);y=dexp(x);y1=exp.d(x)
isTRUE(all.equal(y, y1))
## [1] TRUE
require("ggplot2")
## Loading required package: ggplot2
lambdas = c(.2, .5, 1., 2.)
df <- data.frame()
for (lambda in lambdas) {
df0 = data.frame(x)
df0$y = dexp(x, lambda)
df0$lambda = as.character(lambda)
df = rbind(df, df0)
}
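# line width belongs in geom_line(); ggplot() silently ignores it (linewidth needs ggplot2 >= 3.4.0, older versions use size)
g <- ggplot(data=df, aes(x=x, y=y)) +
geom_line(aes(col=lambda), linewidth=.8) +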
ggtitle("Exponential Distribution") +
theme(
panel.background = element_blank(),
plot.title = element_text(family = "Helvetica", face="bold", color="steelblue", size = (15), hjust=.5),
axis.title = element_text(family = "Helvetica", size = (15), color="steelblue4"),
axis.text = element_text(family = "Courier", color="cornflowerblue", size = (15))
)
g
About 9.07% of the time when the rate of random arrivals is 1.2 per minute, 2 minutes or more will elapse between arrivals.
exp.p <- function(x, lambda, lower.tail = TRUE){
p = exp(-x*lambda)
if (lower.tail){
return(1-p)
} else {
return(p)
}
}
f6.27 <- function(x, l, lower.tail=TRUE){
print(c(pexp(x, l, lower.tail), exp.p(x, l, lower.tail)))
}
f6.27(2, 1.2, FALSE)
## [1] 0.09071795 0.09071795
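The 9.07% figure is the upper-tail probability of the exponential distribution, which is what exp.p returns when lower.tail = FALSE. As a worked derivation for λ = 1.2 and a gap of 2 minutes:

```latex
P(x \ge x_0) = \int_{x_0}^{\infty} \lambda e^{-\lambda x}\,dx = e^{-\lambda x_0},
\qquad
P(x \ge 2 \mid \lambda = 1.2) = e^{-(1.2)(2)} = e^{-2.4} \approx 0.0907
```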
P(x ≥ 5|λ = 1.35)
f6.27(5, 1.35, FALSE)
## [1] 0.00117088 0.00117088
P(x < 3|λ = 0.68)
f6.27(3, .68)
## [1] 0.8699713 0.8699713
P(x > 4|λ = 1.7)
f6.27(4, 1.7, FALSE)
## [1] 0.001113775 0.001113775
P(x < 6|λ = 0.80)
f6.27(6, .8, TRUE)
## [1] 0.9917703 0.9917703
During the dry month of August, one U.S. city has measurable rain on average only two days per month. If the arrival of rainy days is Poisson distributed in this city during the month of August, what is the average number of days that will pass between measurable rain? What is the standard deviation? What is the probability during this month that there will be a period of less than two days between rain?
f6.27(2, 1/15)
## [1] 0.1248267 0.1248267
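The call above only answers the last part of the question. For an exponential distribution the mean and the standard deviation of the gap between events are both 1/λ. A minimal sketch, assuming a 30-day month so that λ = 2/30 = 1/15 per day, which matches the rate used above (August actually has 31 days, so this is an approximation).

```r
# Time between rainy days when rain arrives at an average of 2 days per month
lambda   <- 2 / 30                    # rainy days per day; assumes a 30-day month
mean_gap <- 1 / lambda                # average number of days between rain
sd_gap   <- 1 / lambda                # exponential: sd equals the mean
p_lt_2   <- pexp(2, rate = lambda)    # P(gap < 2 days)
c(mean_days = mean_gap, sd_days = sd_gap, p_less_than_2 = p_lt_2)
```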
clt.sd <- function(rho, n){
return(rho/n^.5)
}
clt.sd.f <- function(rho, n, N){
return( rho/(n^.5) * ((N-n)/(N-1))^.5 )
}
z.score <- function(x, mu, rho){
(x-mu)/rho
}
A population has a mean of 50 and a standard deviation of 10. If a random sample of 64 is taken, what is the probability that the sample mean is each of the following?
Greater than 52
x=52;mu=50;rho=10;size=64
rho.x_=clt.sd(rho, size)
q = z.score(x, mu, rho.x_)
pnorm(q, lower.tail = FALSE)
## [1] 0.05479929
f7.13 <- function(x=52, mu=50, rho=10, n=64, lower.tail=TRUE){
rho.x_=clt.sd(rho, n)
q = z.score(x, mu, rho.x_)
return(pnorm(q, lower.tail=lower.tail))
}
f7.13(lower.tail=FALSE)
## [1] 0.05479929
Less than 51
f7.13(x=51)
## [1] 0.7881446
Less than 47
f7.13(x=47)
## [1] 0.008197536
Between 48.5 and 52.4
f7.13(x=52.4) - f7.13(x=48.5)
## [1] 0.8575014
Between 50.6 and 51.3
f7.13(x=51.3) - f7.13(x=50.6)
## [1] 0.1664437
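The answers above rest on the central limit theorem: the sample mean of 64 observations is treated as normal with mean 50 and standard error 10/√64 = 1.25. A small simulation can be used as a sanity check. This is only a sketch: the seed and number of replications are arbitrary, and the population is assumed to be normal, which the problem does not state.

```r
# Simulate many samples of size 64 and compare with the theoretical answer
set.seed(42)                                   # arbitrary seed
reps <- 20000; n <- 64
samples <- matrix(rnorm(reps * n, mean = 50, sd = 10), nrow = reps)
xbar <- rowMeans(samples)                      # one sample mean per row
simulated   <- mean(xbar > 52)                 # share of sample means above 52
theoretical <- pnorm(52, mean = 50, sd = 10 / sqrt(n), lower.tail = FALSE)
c(simulated = simulated, theoretical = theoretical, sd_of_means = sd(xbar))
```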
Suppose a random sample of size 36 is drawn from a population with a mean of 278. If 86% of the time the sample mean is less than 280, what is the population standard deviation?
z = qnorm(.86)
x=280;mu=278;n=36
rho = (x-mu)/z*n^.5
rho
## [1] 11.10783
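The calculation rearranges z = (x̄ - μ) / (σ/√n) into σ = (x̄ - μ)·√n / z, with z taken as the 86th percentile of the standard normal. A quick way to check the result is to plug σ back in and confirm that the probability comes out as .86; the variable names below are illustrative.

```r
# Solve for the population standard deviation, then verify by substitution
n <- 36; mu <- 278; xbar <- 280
sigma <- (xbar - mu) / qnorm(0.86) * sqrt(n)
check <- pnorm(xbar, mean = mu, sd = sigma / sqrt(n))   # should return 0.86
c(sigma = sigma, check = check)
```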
Find the probability in each case.
f7.17 <- function(x=76.5, mu=75, rho=6, n=60, N=1000, lower.tail=TRUE){
rho.x_=clt.sd.f(rho, n, N)
q = z.score(x, mu, rho.x_)
return(pnorm(q, lower.tail=lower.tail))
}
f7.17()
## [1] 0.9770515
x1=107;x2=107.7;N=90;n=36;mu=108;rho=3.46
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.2391082
x1=36;x2=Inf;N=250;n=100;mu=35.6;rho=4.89
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.1459611
x1=-Inf;x2=123;N=5000;n=60;mu=125;rho=13.4
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.1224152
Suppose a subdivision on the southwest side of Denver, Colorado, contains 1,500 houses. The subdivision was built in 1983. A sample of 100 houses is selected randomly and evaluated by an appraiser. If the mean appraised value of a house in this subdivision for all houses is $227,000, with a standard deviation of $8,500, what is the probability that the sample average is greater than $229,000?
N=1500;n=100;mu=227000;rho=8500;x1=229000;x2=Inf
f7.17(x2, mu, rho, n, N) - f7.17(x1, mu, rho, n, N)
## [1] 0.007451792
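Because 100 of only 1,500 houses are sampled, the standard error is multiplied by the finite population correction √((N - n)/(N - 1)). The sketch below shows how much the correction changes the answer compared with the uncorrected standard error; the variable names are illustrative.

```r
# Effect of the finite population correction on the subdivision problem
N <- 1500; n <- 100; mu <- 227000; sigma <- 8500
se_plain <- sigma / sqrt(n)                        # ignores the finite population
se_fpc   <- se_plain * sqrt((N - n) / (N - 1))     # corrected standard error
p_plain  <- pnorm(229000, mu, se_plain, lower.tail = FALSE)
p_fpc    <- pnorm(229000, mu, se_fpc,   lower.tail = FALSE)
c(se_plain = se_plain, se_fpc = se_fpc, p_plain = p_plain, p_fpc = p_fpc)
```

The correction shrinks the standard error, so a sample mean this far above the population mean becomes less likely.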
According to Nielsen Media Research, the average number of hours of TV viewing by adults (18 and over) per week in the United States is 36.07 hours. Suppose the standard deviation is 8.4 hours and a random sample of 42 adults is taken.
What is the probability that the sample average is more than 38 hours?
mu=36.07;rho=8.4;n=42;x1=38;x2=Inf
f7.13(x2,mu,rho,n) - f7.13(x1, mu, rho, n)
## [1] 0.06824009
What is the probability that the sample average is less than 33.5 hours?
f7.13(33.5, mu, rho, n)
## [1] 0.023695
What is the probability that the sample average is less than 26 hours?
f7.13(26, mu, rho, n)
## [1] 3.94999e-15
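If the sample average actually is less than 26 hours, what would it mean in terms of the Nielsen Media Research figures? - Such a result is essentially impossible under the stated figures (probability ≈ 3.9e-15), so either the sample was not random (it is strongly biased), or the Nielsen mean of 36.07 hours or the assumed standard deviation does not describe this population.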
Suppose the population standard deviation is unknown. If 71% of all sample means are greater than 35 hours and the population mean is still 36.07 hours, what is the value of the population standard deviation?
z = qnorm(.71, lower.tail = FALSE)
x=35;mu=36.07;n=42
rho = (x-mu)/z*n^.5
rho
## [1] 12.53087
z.phat <- function(phat, p, n){
return((phat-p)/(p*(1-p)/n)^.5)
}
z.phat(.5, .6, 120)
## [1] -2.236068
A population proportion is .58. Suppose a sample of 660 items is drawn randomly from this population.
What is the probability that the sample proportion is greater than .60?
p=.58;n=660;phat=.6
q = z.phat(phat, p, n)
pnorm(q, lower.tail = FALSE)
## [1] 0.1489308
What is the probability that the sample proportion is between .55 and .65?
q1 = z.phat(.55, p, n)
q2 = z.phat(.65, p, n)
pnorm(q2) - pnorm(q1)
## [1] 0.940668
What is the probability that the sample proportion is greater than .57?
q = z.phat(.57, p, n)
pnorm(q, lower.tail = FALSE)
## [1] 0.6986477
What is the probability that the sample proportion is between .53 and .56?
q1 = z.phat(.53, p, n)
q2 = z.phat(.56, p, n)
pnorm(q2) - pnorm(q1)
## [1] 0.1443044
What is the probability that the sample proportion is less than .48?
pnorm(z.phat(.48, p, n))
## [1] 9.69195e-08
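These answers treat the sample proportion as normal with mean p and standard error √(p(1 - p)/n), without a continuity correction. Since the underlying count is binomial, the first answer can be compared with the exact value as a rough check. A minimal sketch: a sample proportion above .60 with n = 660 corresponds to a count above 396.

```r
# Normal approximation for the sample proportion vs. the exact binomial tail
p <- 0.58; n <- 660
se <- sqrt(p * (1 - p) / n)
approx_p <- pnorm(0.60, mean = p, sd = se, lower.tail = FALSE)
exact_p  <- pbinom(396, size = n, prob = p, lower.tail = FALSE)  # P(count > 396)
c(approx = approx_p, exact = exact_p, abs_diff = abs(approx_p - exact_p))
```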
If a population proportion is .28 and if the sample size is 140, 30% of the time the sample proportion will be less than what value if you are taking random samples?
p=.28;n=140;
z = qnorm(.3)
phat <- z * (p*(1-p)/n)^.5 + p
phat
## [1] 0.2601004
According to a survey by Accountemps, 48% of executives believe that employees are most productive on Tuesdays. Suppose 200 executives are randomly surveyed.
What is the probability that fewer than 90 of the executives believe employees are most productive on Tuesdays?
p=.48;n=200
pnorm(z.phat(90/n, p, n))
## [1] 0.1978828
What is the probability that more than 100 of the executives believe employees are most productive on Tuesdays?
pnorm(z.phat(100/n, p, n), lower.tail = FALSE)
## [1] 0.2856498
What is the probability that more than 80 of the executives believe employees are most productive on Tuesdays?
pnorm(z.phat(80/n, p, n), lower.tail = FALSE)
## [1] 0.98823
Assume the outcomes from an experiment with a trials conform to a Poisson distribution with lambda equal to 2. Determine the probability of obtaining an outcome greater than 4. (Round to four decimal places.)
# P(X > 4) = 1 - P(X <= 4), so the cutoff passed to ppois is 4, not 5
ppois(4, 2, lower.tail = FALSE)
## [1] 0.05265302
5.8% of a population is infected with a certain disease. There is a test for the disease, however the test is not completely accurate. 93.9% of those who have the disease test positive. However 4.1% of those who do not have the disease also test positive (false positives). A person is randomly selected and tested for the disease.
What is the probability that the person has the disease given that the test result is positive?
p.disease = .058
p.positive.when.disease = .939
p.positive.when.disease_no = .041
p.disease_no = 1 - p.disease
p.positive = p.positive.when.disease*p.disease + p.positive.when.disease_no*p.disease_no
p.disease.when.positive = p.disease * p.positive.when.disease / p.positive
p.disease.when.positive
## [1] 0.5850844
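The same result can be read off as natural frequencies, which makes it easier to see why only about 59% of positive results are true positives: the healthy group is so much larger that its false positives nearly match the true positives. A sketch for a hypothetical cohort of 100,000 people; the cohort size is an arbitrary choice and does not affect the ratio.

```r
# Bayes' theorem as counts in a hypothetical cohort
population <- 100000                      # arbitrary cohort size
sick       <- population * 0.058          # people with the disease
healthy    <- population - sick           # people without it
true_pos   <- sick * 0.939                # sick and test positive
false_pos  <- healthy * 0.041             # healthy but test positive
ppv <- true_pos / (true_pos + false_pos)  # P(disease | positive)
c(true_pos = true_pos, false_pos = false_pos, ppv = ppv)
```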
Given the following probability distribution for the variable x, compute the expected value and variance of x using the distribution below. Round to two decimal places.
x | 0 | 1 | 2 | 3 |
---|---|---|---|---|
Probability | .749 | .225 | .024 | .002 |
x <- c(0, 1, 2, 3)
p <- c(.749, .225, .024, .002)
mu = x %*% p
round(mu, 2)
## [,1]
## [1,] 0.28
discrete.var <- function(x, mu, p){
return((x-c(mu))^2 %*% p)
}
round(discrete.var(x, mu, p), 2)
## [,1]
## [1,] 0.26
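The matrix product %*% returns 1x1 matrices, which is why the output above prints with [,1] and [1,] labels. An equivalent sketch with sum() returns plain numbers and uses the shortcut Var(x) = E[x²] - (E[x])²; the names ex and var_x are illustrative.

```r
# Expected value and variance of a discrete distribution as plain scalars
x <- 0:3
p <- c(.749, .225, .024, .002)
stopifnot(isTRUE(all.equal(sum(p), 1)))   # probabilities must sum to 1
ex    <- sum(x * p)                       # E[x]
var_x <- sum(x^2 * p) - ex^2              # E[x^2] - (E[x])^2
round(c(mean = ex, variance = var_x), 2)
```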
Given the following sample data: 1.3,2.2,2.7,3.1,3.3,3.7, use quantile() in R with type = 7 to find the estimated 33rd percentile. Round to 2 decimal places. Pick the correct answer.
round(quantile(c(1.3,2.2,2.7,3.1,3.3,3.7), probs = c(0, .25, .33, .5, .75, 1), type = 7), 2)
## 0% 25% 33% 50% 75% 100%
## 1.30 2.33 2.53 2.90 3.25 3.70
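With type = 7, quantile() interpolates linearly between order statistics at the fractional position h = (n - 1)p + 1. The sketch below reproduces the 33rd percentile by hand and compares it with the built-in result; it handles only the case where h lies strictly between two positions, which is enough for this data.

```r
# Type 7 quantile computed by hand for the 33rd percentile
x    <- sort(c(1.3, 2.2, 2.7, 3.1, 3.3, 3.7))
prob <- 0.33
h    <- (length(x) - 1) * prob + 1        # fractional position, here 2.65
lo   <- floor(h)                          # index of the lower neighbour
manual  <- x[lo] + (h - lo) * (x[lo + 1] - x[lo])
builtin <- unname(quantile(x, probs = prob, type = 7))
c(manual = manual, builtin = builtin)
```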
A study of the amount of time it takes a mechanic to rebuild the transmission for a 2005 Chevrolet Cavalier shows that the mean is 8.4 hours and the standard deviation is 1.8 hours. If a random sample of 36 mechanics is selected, find the probability that their mean rebuild time exceeds 8.7 hours. Assume the mean rebuild time has a normal distribution. (Hint, interpolate in the tables or use pnorm().)
clt.sd <- function(rho, n){
return(rho/n^.5)
}
clt.sd.f <- function(rho, n, N){
return( rho/(n^.5) * ((N-n)/(N-1))^.5 )
}
z.score <- function(x, mu, rho){
(x-mu)/rho
}
f7.13 <- function(x=52, mu=50, rho=10, n=64, lower.tail=TRUE){
rho.x_=clt.sd(rho, n)
q = z.score(x, mu, rho.x_)
return(pnorm(q, lower.tail=lower.tail))
}
round(f7.13(x=8.7, mu=8.4, rho=1.8, n=36, lower.tail=FALSE), 3)
## [1] 0.159
Use the normal distribution to estimate the probability of 50 successes for a binomial distribution with n = 76 and the probability of success p = 0.7. Do NOT use the binomial distribution but use normal approximation to binomial distribution. Round to four decimal places.
# function to calculate mean for a binomial distribution
bi.mu <- function(n, p){
return(n*p)
}
# function to calculate standard deviation for a binomial distribution
bi.sd <- function(n, p){
return((n*p*(1-p))^.5)
}
n=76
p=.7
mu = bi.mu(n, p)
sd = bi.sd(n, p)
round(dnorm(50, mu, sd), 4)
## [1] 0.0725
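The solution above uses the normal density at 50 as the estimate, which amounts to a rectangle of width 1 centred on 50. The more common textbook form applies the continuity correction and takes the area between 49.5 and 50.5. A sketch comparing the two; both are approximations and they should agree closely here.

```r
# Two normal-approximation estimates of P(x = 50 | n = 76, p = .7)
n <- 76; p <- 0.7
mu    <- n * p
sigma <- sqrt(n * p * (1 - p))
density_est  <- dnorm(50, mu, sigma)                              # height at 50
interval_est <- pnorm(50.5, mu, sigma) - pnorm(49.5, mu, sigma)   # area 49.5 to 50.5
round(c(density = density_est, interval = interval_est), 4)
```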
The given values are discrete (binomial outcomes). Use the continuity correction and describe the region of the normal distribution that corresponds to the indicated probability.
The probability of less than 48 correct answers.
The area to the left of 47.5
In one region, the September energy consumption levels for single-family homes are found to be normally distributed with a mean of 1050 kWh and a standard deviation of 225 kWh. Find P45 (45th percentile).
mu = 1050
rho = 225
p = .45
round(qnorm(p, mu, rho), 1)
## [1] 1021.7
In a taste test, five different customers are each presented with 3 different soft drinks. The same soft drinks are used with each customer, but presented in random order. If the selections were made by random guesses, find the probability that all five customers would pick the same soft drink as their favorite. (There is more than one way the customers can agree.)
round(3/(3^5), 5)
## [1] 0.01235
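The numerator 3 counts the drinks all five customers could agree on, and the denominator 3^5 counts every possible combination of picks. Because the sample space is small, the count can be checked by brute-force enumeration. A minimal sketch; the variable names are illustrative.

```r
# Enumerate all 3^5 combinations of favourites and count the unanimous ones
picks <- expand.grid(rep(list(1:3), 5))   # one row per combination of 5 picks
all_same <- apply(picks, 1, function(row) length(unique(row)) == 1)
c(favourable = sum(all_same), total = nrow(picks), prob = mean(all_same))
```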
As a prize for winning a contest, the contestant is blindfolded and allowed to draw 3 dollar bills one at a time out of an urn. The urn contains forty $1 bills and ten $100 bills. The urn is churned before each selection so that the selection would be at random. What is the probability that none of the $100 bills are selected? (Round to four decimal places.)
round(40/50*39/49*38/48, 4)
## [1] 0.5041
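Drawing three bills without replacement and getting no $100 bills is a hypergeometric event, so the product of conditional probabilities above can be cross-checked with dhyper() or with a ratio of combinations. A sketch, treating the ten $100 bills as the "successes".

```r
# Three equivalent ways to get P(no $100 bill in 3 draws without replacement)
p_product <- 40/50 * 39/49 * 38/48              # sequential conditional probabilities
p_hyper   <- dhyper(0, m = 10, n = 40, k = 3)   # 0 of the 10 $100 bills in 3 draws
p_choose  <- choose(40, 3) / choose(50, 3)      # ways to pick 3 $1 bills / all ways
round(c(product = p_product, hyper = p_hyper, choose = p_choose), 4)
```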