4.16 The population distribution in the United States based on race/ethnicity and blood type, as reported by the American Red Cross, is given here.

                    Blood type
Race/Ethnicity   O       A       B       AB
White            36%     32.2%   8.8%    3.2%
Black            7%      2.9%    2.5%    .5%
Asian            1.7%    1.2%    1%      .3%
All others       1.5%    .8%     .3%     .1%

Revised table with totals
bloodtype <- read.csv(file="Bloodtype.csv", header = TRUE)
attach(bloodtype)
bloodtype
P(Asian and O) = .017 = 1.7%
P(W and !A) = P(W) - P(W and A) = .802 - .322 = .480 = 48.0%, where P(W) = .36 + .322 + .088 + .032 = .802
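What is the probability that an Asian donor will have either type A or type B blood? P(Asian and (A or B)) = P(Asian and A) + P(Asian and B) = .012 + .010 = .022 = 2.2% of all donors. Given that the donor is Asian, P(A or B | Asian) = .022/.042 = .524 = 52.4%, where P(Asian) = .017 + .012 + .010 + .003 = .042.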
What is the probability that a donor will have neither type A nor type AB blood?
P(not (A or AB)) = P(B or O) = P(B) + P(O) = .126 + .462 = .588 = 58.8%
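The row and column totals used in these answers can be checked directly in R. The sketch below types the table in by hand instead of reading Bloodtype.csv (whose layout is not shown here), so the object names `blood`, `race_tot`, and `type_tot` are illustrative assumptions.

```r
# Joint proportions from the 4.16 table (rows: race/ethnicity, columns: blood type)
blood <- matrix(c(36.0, 32.2, 8.8, 3.2,
                   7.0,  2.9, 2.5, 0.5,
                   1.7,  1.2, 1.0, 0.3,
                   1.5,  0.8, 0.3, 0.1),
                nrow = 4, byrow = TRUE,
                dimnames = list(c("White", "Black", "Asian", "Other"),
                                c("O", "A", "B", "AB"))) / 100

race_tot <- rowSums(blood)   # marginal P(race/ethnicity)
type_tot <- colSums(blood)   # marginal P(blood type)

# P(White and not type A): subtract the joint cell from the row total
p_white_notA <- unname(race_tot["White"] - blood["White", "A"])

# P(A or B | Asian): joint probability divided by the Asian row total
p_AorB_given_asian <- unname((blood["Asian", "A"] + blood["Asian", "B"]) /
                               race_tot["Asian"])

# P(neither A nor AB) = P(O) + P(B)
p_neither <- unname(type_tot["O"] + type_tot["B"])

round(c(p_white_notA, p_AorB_given_asian, p_neither), 3)
```

Working from the marginal totals keeps joint and conditional probabilities from being mixed up.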
4.23 A survey of 1,000 U.S. government employees who have an advanced college degree produced the following responses to the offering of a promotion to a higher grade position that would involve moving to a new location.

                        Married
Promotion    Both Spouses    One Spouse      Unmarried    Total
             Professional    Professional
Rejected     184             56              17           257
Accepted     276             314             153          743
Total        460             370             170          1,000
Use the results of the survey to estimate the following probabilities. a. What is the probability that a randomly selected government employee having an advanced college degree would accept a promotion? P(A) = 743/1000 = .743 = 74.3%
What is the probability that a randomly selected government employee having an advanced college degree would not accept a promotion? P(R) = 257/1000 = .257 = 25.7%
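What is the probability that a randomly selected government employee having an advanced college degree has a spouse with a professional position? The three marital-status columns are mutually exclusive, so no overlap term is needed. Reading the "Both Spouses Professional" column as the employees whose spouse holds a professional position (the "One Spouse Professional" column being employees whose spouse does not): P(BSP) = 460/1000 = .46 = 46%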
4.24 Refer to Exercise 4.23. Define the following events.
Event A: A randomly selected government employee having an advanced college degree would accept a promotion
Event B: A randomly selected government employee having an advanced college degree has a spouse in a professional career
Event C: A randomly selected government employee having an advanced college degree has a spouse without a professional position
Event D: A randomly selected government employee having an advanced college degree is unmarried
Use the results of the survey in Exercise 4.23 to compute the following probabilities:
P(A) = 743/1000 = .743 = 74.3%
P(B) = 460/1000 = .46 = 46% (the "Both Spouses Professional" column)
P(C) = 370/1000 = .37 = 37% (the "One Spouse Professional" column)
P(A|C) = P(A and C)/P(C) = (314/1000)/(370/1000) = 314/370 = .849 = 84.9%
Note that P(A and C) = .314 is read from the table. It is not P(A)P(C) = .743 * .37 = .275, because A and C are not independent.
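A conditional probability such as P(A|C) comes straight from the cell counts. The following is a minimal sketch with the survey table entered by hand; the matrix name `survey` and the short column labels are assumptions made for illustration.

```r
# Survey counts from Exercise 4.23 (rows: response, columns: marital status)
survey <- matrix(c(184,  56,  17,
                   276, 314, 153),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("Rejected", "Accepted"),
                                 c("Both", "One", "Unmarried")))
n <- sum(survey)

p_A <- sum(survey["Accepted", ]) / n          # P(accept)
p_C <- sum(survey[, "One"]) / n               # P(spouse without professional position)
p_A_and_C <- survey["Accepted", "One"] / n    # joint probability from the cell
p_A_given_C <- p_A_and_C / p_C                # definition of conditional probability

# If A and C were independent, the joint would equal the product of the marginals
c(joint = p_A_and_C, product = p_A * p_C, conditional = p_A_given_C)
```

The gap between the joint probability and the product of the marginals is why the independence shortcut P(A)P(C) should not be substituted for P(A and C).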
4.34 In a January 15, 1998, article, the New England Journal of Medicine (338:141-146) reported on the utility of using computerized tomography (CT) as a diagnostic test for patients with clinically suspected appendicitis. In at least 20% of patients with appendicitis, the correct diagnosis was not made. On the other hand, the appendix was normal in 15% to 40% of patients who underwent emergency appendectomy. A study was designed to determine the prospective effectiveness of using CT as a diagnostic test to improve the treatment of these patients. The study examined 100 consecutive patients suspected of having acute appendicitis who presented to the emergency department or were referred there from a physician’s office. The 100 patients underwent a CT scan, and the surgeon made an assessment of the presence of appendicitis for each of the patients. The final clinical outcomes were determined at surgery and by pathological examination of the appendix after appendectomy or by clinical follow-up at least 2 months after CT scanning.
                                     Presence of Appendicitis
Radiologic Determination             Confirmed (C)    Ruled Out (RO)
Definitely appendicitis (DA)         50               1
Equivocally appendicitis (EA)        2                2
Definitely not appendicitis (DNA)    1                44
appendicitis <- read.csv(file="appendicitis.csv", header = TRUE)
attach(appendicitis)
appendicitis
The 1996 rate of occurrence of appendicitis was approximately P(C) = .00108.
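Conditioning on the true status uses the column totals: 53 patients with confirmed appendicitis (C) and 47 with appendicitis ruled out (!C).
P(DA|C) = 50/53 = .943
P(DA|!C) = 1/47 = .0213
P(EA|C) = 2/53 = .038
P(EA|!C) = 2/47 = .043
P(DNA|C) = 1/53 = .0189
P(DNA|!C) = 44/47 = .936
The sensitivity of a diagnostic test is the true positive rate, P(DA|C), and the specificity is the true negative rate, P(DNA|!C). Sensitivity = .943 Specificity = .936
The probability of a patient having appendicitis is P(C) = .00108, and the probability of not having appendicitis is P(!C) = .99892.
Bayes' theorem: P(A|B) = P(B|A) * P(A) / [P(B|A) * P(A) + P(B|!A) * P(!A)]
The probability that a patient had appendicitis given that the determination was definitely appendicitis:
P(C|DA) = P(DA|C) * P(C) / [P(DA|C) * P(C) + P(DA|!C) * P(!C)]
= .943 * .00108 / (.943 * .00108 + .0213 * .99892)
= .0457
The probability that a patient did not have appendicitis given that the determination was definitely appendicitis:
P(!C|DA) = P(DA|!C) * P(!C) / [P(DA|!C) * P(!C) + P(DA|C) * P(C)]
= 1 - P(C|DA)
= 1 - .0457
= .9543
The probability that a patient did not have appendicitis given that the determination was definitely not appendicitis:
P(!C|DNA) = P(DNA|!C) * P(!C) / [P(DNA|!C) * P(!C) + P(DNA|C) * P(C)]
= .936 * .99892 / (.936 * .99892 + .0189 * .00108)
= .99998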
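The whole Bayes calculation can be done in a few lines by dividing each column of the count table by its total and then reweighting by the population rate. This is a sketch only: the counts are typed in by hand, not read from appendicitis.csv, and the names `lik`, `prior`, `joint`, and `post` are illustrative.

```r
# Study counts (rows: radiologic determination, columns: true status)
counts <- matrix(c(50,  1,
                    2,  2,
                    1, 44),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("DA", "EA", "DNA"), c("C", "RO")))

# P(determination | status): divide each column by its column total
lik <- sweep(counts, 2, colSums(counts), "/")

# Population rate of appendicitis given in the exercise
prior <- c(C = 0.00108, RO = 1 - 0.00108)

# Joint probabilities, then P(status | determination) by normalising each row
joint <- sweep(lik, 2, prior, "*")
post <- joint / rowSums(joint)

round(lik, 4)
round(post, 5)
```

Each row of `post` sums to 1, so P(!C|DA) is read off as the complement of P(C|DA) without a second calculation.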
4.48 The CFO of a hospital is concerned about the risk of patients contracting an infection after a one-week or longer stay in the hospital. A long-term study estimates that the chance of contracting an infection after a one-week or longer stay in a hospital is 10%. A random sample of 50 patients who have been in the hospital at least 1 week is selected.
pbinom(4,50,.10,lower.tail = FALSE)
[1] 0.5688016
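The probability of infection is the same for each patient, and the patients are independent of one another. These are the assumptions of a binomial distribution, along with a fixed number of trials (n = 50) and two possible outcomes per trial.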
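Because `lower.tail = FALSE` returns P(X > q) rather than P(X >= q), it is easy to be off by one in a binomial tail. A small check, using the same n = 50 and probability .10 as above, compares three equivalent ways of getting P(X >= 5); the variable names are illustrative.

```r
n <- 50
p <- 0.10

# P(X > 4) is the same event as P(X >= 5) for a count
upper_tail <- pbinom(4, size = n, prob = p, lower.tail = FALSE)

# Summing the individual probabilities for 5, 6, ..., 50
by_sum <- sum(dbinom(5:n, size = n, prob = p))

# Complement of P(X <= 4)
by_complement <- 1 - pbinom(4, size = n, prob = p)

c(upper_tail, by_sum, by_complement)
```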
4.50 Customers arrive at a grocery store checkout at a rate of six per 30 minutes during the hours of 5 p.m. and 7 p.m. during the workweek. Let C be the number of customers arriving at the checkout during any 30-minute period of time. The management of the store wants to determine the frequency of the following events. Compute the probabilities of these events: a. No customers arrive.
dpois(0,lambda = 6)
[1] 0.002478752
ppois(6,lambda = 6,lower.tail = FALSE)
[1] 0.3936972
ppois(3,lambda = 6,lower.tail = TRUE)
[1] 0.1512039
4.70 The College Boards, which are administered each year to many thousands of high school students, are scored so as to yield a mean of 513 and a standard deviation of 130. These scores are close to being normally distributed. What percentage of the scores can be expected to satisfy each of the following conditions?
pnorm(600,mean = 513,sd = 130,lower.tail = FALSE)
[1] 0.2516741
pnorm(700,mean = 513,sd = 130,lower.tail = FALSE)
[1] 0.07515157
pnorm(449,mean = 513,sd = 130,lower.tail = TRUE)
[1] 0.3112509
pnorm(600,mean = 513,sd = 130,lower.tail = TRUE) - pnorm(450,mean = 513,sd = 130,lower.tail = TRUE)
[1] 0.4343513
4.82 Based on the 1990 census, the number of hours per day adults spend watching television is approximately normally distributed with a mean of 5 hours and a standard deviation of 1.3 hours.
x = random variable for the number of hours spent watching TV
P(x > 7)
pnorm(7,mean = 5,sd = 1.3,lower.tail = FALSE)
[1] 0.0619679
P(ybar > 5.5)
Central Limit Theorem: ybar has a mean μ_ybar and a standard deviation σ_ybar, where μ_ybar = μ and σ_ybar = σ/sqrt(n).
μ is the population mean and σ is the population standard deviation.
n = 500
μ = 5 hours
σ = 1.3 hours
so μ_ybar = 5
σ_ybar = 1.3/sqrt(500) = .0581
Therefore, ybar has a mean of 5 hours and a standard deviation of .0581 hours.
P(ybar > 5.5) = P(z > (5.5 - 5)/.0581) = P(z > 8.61)
Since 99.7% of the values fall within 3 standard deviations of the mean, a sample mean more than 8 standard deviations above the mean has a probability that is essentially 0.
pnorm(5.5,mean = 5,sd = 1.3/sqrt(500),lower.tail = FALSE)
R returns a value that is effectively 0.
The results of this survey are not consistent with the 1990 census.
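A short simulation shows how tightly the mean of 500 observations clusters around 5 hours. The sketch assumes normally distributed viewing times with the census mean and standard deviation; the seed and the 2000 replicates are arbitrary choices for illustration.

```r
set.seed(482)              # arbitrary seed so the run is repeatable
mu <- 5                    # census mean (hours)
sigma <- 1.3               # census standard deviation (hours)
n <- 500                   # sample size

se_theory <- sigma / sqrt(n)   # standard deviation of ybar under the CLT

# Draw 2000 samples of size n and keep each sample mean
sample_means <- replicate(2000, mean(rnorm(n, mean = mu, sd = sigma)))

se_sim <- sd(sample_means)         # should be close to se_theory
largest_mean <- max(sample_means)  # compare with the 5.5 in question

c(se_theory = se_theory, se_sim = se_sim, largest_mean = largest_mean)
```

In such a run the largest simulated mean stays well below 5.5, which is what a distance of more than 8 standard errors implies.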
4.110 Suppose the probability that a major earthquake occurs on a given day in Fresno, California, is 1 in 10,000.
Probability of a major earthquake on a given day is 1/10000 = .0001
Expected number of earthquakes in the next 1000 days = 1000 * .0001 = .1 (an expected count, not a probability; the Poisson calculation below gives the probability of at least one)
ppois(0,lambda = .10,lower.tail = FALSE)
[1] 0.09516258
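The Poisson figure is an approximation to a binomial with n = 1000 days and daily probability .0001. As a rough check, assuming the days are independent, the exact binomial answer can be computed alongside it; the variable names are illustrative.

```r
n_days <- 1000
p_day <- 1 / 10000

# Exact: 1 - P(no earthquake on any of the 1000 days)
exact <- 1 - (1 - p_day)^n_days

# Same quantity through the binomial distribution function
exact_binom <- pbinom(0, size = n_days, prob = p_day, lower.tail = FALSE)

# Poisson approximation with lambda = n * p
approx <- ppois(0, lambda = n_days * p_day, lower.tail = FALSE)

c(exact = exact, exact_binom = exact_binom, poisson = approx)
```

With n large and the probability this small, the two answers agree to about five decimal places.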
4.112 Airlines overbook (sell more tickets than there are seats) flights, based on past records that indicate that approximately 5% of all passengers fail to arrive on time for their flight. Suppose a plane will hold 250 passengers, but the airline books 260 seats. What is the probability that at least 1 passenger will be bumped from the flight?
Probability that a passenger doesn’t arrive on time is .05
Probability that a passenger arrives on time is .95
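x = binomial random variable counting the number of passengers who arrive on time (a success is a passenger arriving on time). At least 1 passenger is bumped when more than 250 of the 260 arrive.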
P(x>250)
n = 260
pi = .95
pbinom(250,260,.95,lower.tail = FALSE)
[1] 0.1590758
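The same probability can be reached by counting no-shows instead of arrivals, which is a useful cross-check on the direction of the tail. In this sketch `no_show` is the hypothetical complementary variable: if x passengers arrive, 260 - x fail to arrive, and more than 250 arrivals means at most 9 no-shows.

```r
booked <- 260
seats <- 250
p_arrive <- 0.95

# P(more than 250 of the 260 booked passengers arrive)
p_bumped_arrivals <- pbinom(seats, size = booked, prob = p_arrive,
                            lower.tail = FALSE)

# Equivalent event: at most 9 no-shows, each with probability .05
max_no_show <- booked - seats - 1
p_bumped_no_show <- pbinom(max_no_show, size = booked, prob = 1 - p_arrive)

c(p_bumped_arrivals, p_bumped_no_show)
```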
4.113 For the last 300 years, extensive records have been kept on volcanic activity in Japan. In 2002, there were five eruptions or instances of major seismic activity. From historical records, the mean number of eruptions or instances of major seismic activity is 2.4 per year. A researcher is interested in modeling the number of eruptions or major seismic activities over the 5-year period of 2005-2010.
Since we are counting events (volcanic activity) over a period of time, a Poisson model should be used.
2.4*5
[1] 12
dpois(0,lambda = 2.4)
[1] 0.09071795
ppois(1,lambda = 2.4,lower.tail = FALSE)
[1] 0.691559
4.114 As part of a study to determine factors that may explain differences in animal species relative to their size, the following body masses (in grams) of 50 different bird species were reported in the paper “Temperature and the Northern Distributions of Wintering Birds,” by Richard Repasky (1991). 7.7 10.1 21.6 8.6 12.0 11.4 16.6 9.4 11.5 9.0 8.2 20.2 48.5 21.6 26.1 6.2 19.1 21.0 28.1 10.6 31.6 6.7 5.0 68.8 23.9 19.8 20.1 6.0 99.6 19.8 16.5 9.0 448.0 21.3 17.4 36.9 34.0 41.0 15.9 12.5 10.2 31.0 21.5 11.9 32.5 9.8 93.9 10.9 19.6 14.5
birds <- read.csv(file="birds.csv", header = TRUE)
attach(birds)
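No. The body masses do not appear to be normally distributed: the distribution is strongly right-skewed, with one extreme value at 448.0 g.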
hist(Bird_Mass)
Now let’s perform a quantitative assessment
Let’s construct the intervals:
y+/-s
slower = mean(Bird_Mass) - sd(Bird_Mass)
shigher = mean(Bird_Mass) + sd(Bird_Mass)
slower
[1] -32.55376
shigher
[1] 94.03776
y+/-2s
s2lower = mean(Bird_Mass) - 2 * sd(Bird_Mass)
s2higher = mean(Bird_Mass) + 2 * sd(Bird_Mass)
s2lower
[1] -95.84951
s2higher
[1] 157.3335
y+/-3s
s3lower = mean(Bird_Mass) - 3 * sd(Bird_Mass)
s3higher = mean(Bird_Mass) + 3 * sd(Bird_Mass)
s3lower
[1] -159.1453
s3higher
[1] 220.6293
Let’s count the number of measurements falling in each of the three intervals:
y+/-s
sum((mean(Bird_Mass) - sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + sd(Bird_Mass)))
[1] 48
y+/-2s
sum((mean(Bird_Mass) - 2*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 2*sd(Bird_Mass)))
[1] 49
y+/-3s
sum((mean(Bird_Mass) - 3*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 3*sd(Bird_Mass)))
[1] 49
Let’s convert these numbers to percentages and compare the results to the Empirical Rule.
y+/-s
sum((mean(Bird_Mass) - sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + sd(Bird_Mass)))/length(Bird_Mass)
[1] 0.96
Empirical Rule says: 68% fall within 1 sd
y+/-2s
sum((mean(Bird_Mass) - 2*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 2*sd(Bird_Mass)))/length(Bird_Mass)
[1] 0.98
Empirical Rule says: 95% fall within 2 sd
y+/-3s
sum((mean(Bird_Mass) - 3*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 3*sd(Bird_Mass)))/length(Bird_Mass)
[1] 0.98
Empirical Rule says: 99.7% fall within 3 sd
# Drop the extreme value 448.0 and repeat the assessment on the remaining 49 masses
newbd <- Bird_Mass[Bird_Mass != 448.0]
hist(newbd)
Now let’s perform a quantitative assessment
Let’s construct the intervals:
y+/-s
slower = mean(newbd) - sd(newbd)
shigher = mean(newbd) + sd(newbd)
slower
[1] 2.513012
shigher
[1] 41.94005
y+/-2s
s2lower = mean(newbd) - 2 * sd(newbd)
s2higher = mean(newbd) + 2 * sd(newbd)
s2lower
[1] -17.20051
s2higher
[1] 61.65357
y+/-3s
s3lower = mean(newbd) - 3 * sd(newbd)
s3higher = mean(newbd) + 3 * sd(newbd)
s3lower
[1] -36.91403
s3higher
[1] 81.36709
Let’s count the number of measurements falling in each of the three intervals:
y+/-s
sum((mean(newbd) - sd(newbd)) < newbd & newbd < (mean(newbd) + sd(newbd)))
[1] 45
y+/-2s
sum((mean(newbd) - 2*sd(newbd)) < newbd & newbd < (mean(newbd) + 2*sd(newbd)))
[1] 46
y+/-3s
sum((mean(newbd) - 3*sd(newbd)) < newbd & newbd < (mean(newbd) + 3*sd(newbd)))
[1] 47
Let’s convert these numbers to percentages and compare the results to the Empirical Rule.
y+/-s
sum((mean(newbd) - sd(newbd)) < newbd & newbd < (mean(newbd) + sd(newbd)))/length(newbd)
[1] 0.9183673
Empirical Rule says: 68% fall within 1 sd
y+/-2s
sum((mean(newbd) - 2*sd(newbd)) < newbd & newbd < (mean(newbd) + 2*sd(newbd)))/length(newbd)
[1] 0.9387755
Empirical Rule says: 95% fall within 2 sd
y+/-3s
sum((mean(newbd) - 3*sd(newbd)) < newbd & newbd < (mean(newbd) + 3*sd(newbd)))/length(newbd)
[1] 0.9591837
Empirical Rule says: 99.7% fall within 3 sd
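The interval counts above repeat the same expression for k = 1, 2, 3, with and without the extreme value. One way to condense this is a small helper function; `within_k_sd` is a hypothetical name, and the data are typed in from the exercise instead of being read from birds.csv.

```r
# Body masses (grams) of the 50 bird species listed in Exercise 4.114
bird_mass <- c(7.7, 10.1, 21.6, 8.6, 12.0, 11.4, 16.6, 9.4, 11.5, 9.0,
               8.2, 20.2, 48.5, 21.6, 26.1, 6.2, 19.1, 21.0, 28.1, 10.6,
               31.6, 6.7, 5.0, 68.8, 23.9, 19.8, 20.1, 6.0, 99.6, 19.8,
               16.5, 9.0, 448.0, 21.3, 17.4, 36.9, 34.0, 41.0, 15.9, 12.5,
               10.2, 31.0, 21.5, 11.9, 32.5, 9.8, 93.9, 10.9, 19.6, 14.5)

# Proportion of observations strictly within k standard deviations of the mean
within_k_sd <- function(x, k) {
  mean(abs(x - mean(x)) < k * sd(x))
}

empirical_rule <- c(0.68, 0.95, 0.997)
with_448 <- sapply(1:3, function(k) within_k_sd(bird_mass, k))

trimmed <- bird_mass[bird_mass != 448.0]
without_448 <- sapply(1:3, function(k) within_k_sd(trimmed, k))

# One row per k, for side-by-side comparison with the Empirical Rule
data.frame(k = 1:3, empirical_rule, with_448, without_448)
```

In both versions far more than 68% of the masses fall within one standard deviation, which is the quantitative sign that the data are not close to normal.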
With 448.0
summary(Bird_Mass)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00   10.30   18.25   30.74   25.55  448.00
Without 448.0
summary(newbd)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00   10.20   17.40   22.23   23.90   99.60
Standard Deviation and MAD with 448.0
sd(Bird_Mass)
[1] 63.29576
mad(Bird_Mass)
[1] 11.78667
Standard Deviation and MAD without 448.0
sd(newbd)
[1] 19.71352
mad(newbd)
[1] 10.67472
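The contrast between the two measures is easier to see once it is clear what `mad()` computes. By default R multiplies the median absolute deviation from the median by the constant 1.4826, which makes it comparable to the standard deviation for normal data. The sketch below reproduces that by hand; `mad_by_hand` is a hypothetical helper and the data are typed in from the exercise.

```r
bird_mass <- c(7.7, 10.1, 21.6, 8.6, 12.0, 11.4, 16.6, 9.4, 11.5, 9.0,
               8.2, 20.2, 48.5, 21.6, 26.1, 6.2, 19.1, 21.0, 28.1, 10.6,
               31.6, 6.7, 5.0, 68.8, 23.9, 19.8, 20.1, 6.0, 99.6, 19.8,
               16.5, 9.0, 448.0, 21.3, 17.4, 36.9, 34.0, 41.0, 15.9, 12.5,
               10.2, 31.0, 21.5, 11.9, 32.5, 9.8, 93.9, 10.9, 19.6, 14.5)
trimmed <- bird_mass[bird_mass != 448.0]

# Median absolute deviation, scaled the way R's mad() scales it by default
mad_by_hand <- function(x) 1.4826 * median(abs(x - median(x)))

# How much each measure changes when the single extreme value is removed
sd_ratio <- sd(bird_mass) / sd(trimmed)
mad_ratio <- mad(bird_mass) / mad(trimmed)

c(mad_builtin = mad(bird_mass), mad_by_hand = mad_by_hand(bird_mass),
  sd_ratio = sd_ratio, mad_ratio = mad_ratio)
```

Removing one observation changes the standard deviation by a factor of about three while the MAD barely moves, which is why the MAD is the more robust measure of spread for these data.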
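Extra Problem: Plot “P(drug user | test positive) = (p1)(p) / [(p1)(p) + (1 - p2)(1 - p)]” against p = 0:1 by 0.1, for p1 = .1, .5, .9 and p2 = .9, on the same graph and comment. Here p is the proportion of drug users, p1 is the probability that a user tests positive, and p2 is the probability that a non-user tests negative.
p <- seq(0, 1, by = .1)   # proportion of drug users
p2 <- .9                  # P(test negative | non-user)
# P(drug user | test positive) as a function of p1 = P(test positive | user)
ppv <- function(p1) p1 * p / (p1 * p + (1 - p2) * (1 - p))
plot(p, ppv(.9), type = "b", ylim = c(0, 1), xlab = "p", ylab = "P(drug user | test positive)")
lines(p, ppv(.5), type = "b", lty = 2)
lines(p, ppv(.1), type = "b", lty = 3)
legend("bottomright", legend = c("p1 = .9", "p1 = .5", "p1 = .1"), lty = 1:3)
title(main = "P(Drug User | Test Positive)")
For every p1 the probability rises from 0 to 1 as p goes from 0 to 1, and for a fixed p a larger p1 gives a larger probability. With p1 = .1 the test is uninformative, since a user is no more likely to test positive than a non-user (1 - p2 = .1), so that curve is the line y = p. When p is small a positive test is weak evidence even with p1 = .9: at p = .1 the probability is only .5.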