4.16 The population distribution in the United States based on race/ethnicity and blood type, as reported by the American Red Cross, is given here.

                        Blood type
Race/Ethnicity      O        A        B       AB
White             36%    32.2%     8.8%     3.2%
Black              7%     2.9%     2.5%      .5%
Asian            1.7%     1.2%       1%      .3%
All others       1.5%      .8%      .3%      .1%

Revised table with totals

bloodtype <- read.csv(file="Bloodtype.csv", header = TRUE)
attach(bloodtype)
bloodtype

  a. A volunteer blood donor walks into a Red Cross blood donation center. What is the probability she will be Asian and have type O blood?

P(Asian and O) = .017 = 1.7%

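  b. What is the probability that a white donor will not have type A blood?

The donor is known to be white, so this is a conditional probability. P(White) = .36 + .322 + .088 + .032 = .802.

P(!A | White) = 1 - P(White and A)/P(White) = 1 - .322/.802 = 1 - .4015 = .5985 = 59.85%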

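  c. What is the probability that an Asian donor will have either type A or type B blood?

The donor is known to be Asian, with P(Asian) = .017 + .012 + .010 + .003 = .042.

P(A or B | Asian) = [P(Asian and A) + P(Asian and B)]/P(Asian) = (.012 + .010)/.042 = .022/.042 = .5238 = 52.38%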

  d. What is the probability that a donor will have neither type A nor type AB blood?

P(!(A or AB)) = P(O or B) = P(O) + P(B) = .462 + .126 = .588 = 58.8%
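
The four answers for Exercise 4.16 can be cross-checked directly from the joint table. The sketch below types the percentages in as proportions rather than reading `Bloodtype.csv`; the object names (`joint`, `race_tot`, `type_tot`, `p_a` to `p_d`) are illustrative choices, not names from the original analysis.

```r
# Joint probabilities from the Red Cross table, entered as proportions
joint <- matrix(c(0.360, 0.322, 0.088, 0.032,
                  0.070, 0.029, 0.025, 0.005,
                  0.017, 0.012, 0.010, 0.003,
                  0.015, 0.008, 0.003, 0.001),
                nrow = 4, byrow = TRUE,
                dimnames = list(race = c("White", "Black", "Asian", "Other"),
                                type = c("O", "A", "B", "AB")))

race_tot <- rowSums(joint)  # marginal probability of each race/ethnicity
type_tot <- colSums(joint)  # marginal probability of each blood type

p_a <- joint["Asian", "O"]                                   # joint: Asian and O
p_b <- 1 - joint["White", "A"] / race_tot["White"]           # not A, given White
p_c <- sum(joint["Asian", c("A", "B")]) / race_tot["Asian"]  # A or B, given Asian
p_d <- 1 - sum(type_tot[c("A", "AB")])                       # neither A nor AB

round(unname(c(p_a, p_b, p_c, p_d)), 4)
```

Parts (b) and (c) divide by a row total because the race of the donor is given; parts (a) and (d) do not condition on anything.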

4.23 A survey of 1,000 U.S. government employees who have an advanced college degree produced the following responses to the offering of a promotion to a higher grade position that would involve moving to a new location.

                         Married
              Both Spouses    One Spouse
Promotion     Professional    Professional    Unmarried    Total
Rejected           184              56            17         257
Accepted           276             314           153         743
Total              460             370           170       1,000

Use the results of the survey to estimate the following probabilities.

  a. What is the probability that a randomly selected government employee having an advanced college degree would accept a promotion? P(A) = 743/1000 = .743 = 74.3%

  b. What is the probability that a randomly selected government employee having an advanced college degree would not accept a promotion? P(R) = 257/1000 = .257 = 25.7%

  c. What is the probability that a randomly selected government employee having an advanced college degree has a spouse with a professional position? Every surveyed employee is a professional, so the "Both Spouses Professional" column is read here as the group whose spouse is also a professional. The three columns are mutually exclusive, so no overlap term is needed. P(BSP) = 460/1000 = .46 = 46%

4.24 Refer to Exercise 4.23. Define the following events.

Event A: A randomly selected government employee having an advanced college degree would accept a promotion

Event B: A randomly selected government employee having an advanced college degree has a spouse in a professional career

Event C: A randomly selected government employee having an advanced college degree has a spouse without a professional position

Event D: A randomly selected government employee having an advanced college degree is unmarried

Use the results of the survey in Exercise 4.23 to compute the following probabilities:

  a. P(A)

P(A) = 743/1000 = .743 = 74.3%

  b. P(B)

Event B corresponds to the "Both Spouses Professional" column. P(B) = 460/1000 = .46 = 46%

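  c. P(A|C)

Event C corresponds to the "One Spouse Professional" column, so P(C) = 370/1000 = .37 and P(A and C) = 314/1000 = .314. A and C are not assumed to be independent, so P(A and C) is taken from the table rather than computed as P(A)P(C).

P(A|C) = P(A and C)/P(C) = .314/.37 = .8486 = 84.86%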

  d. P(A|D)

P(D) = 170/1000 = .17

P(A and D) = 153/1000 = .153

P(A|D) = P(A and D)/P(D) = .153/.17 = .9 = 90%
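
As a check on Exercises 4.23 and 4.24, the conditional probabilities can be read off the count table by dividing each "Accepted" cell by its column total. This is a small sketch with the survey counts typed in by hand; `counts`, `p_accept`, and `p_accept_given` are illustrative names.

```r
# Survey counts: rows are the decision, columns are the marital/spouse status
counts <- matrix(c(184,  56,  17,
                   276, 314, 153),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(decision = c("Rejected", "Accepted"),
                                 status = c("Both", "One", "Unmarried")))

n <- sum(counts)                                  # 1,000 employees
p_accept <- sum(counts["Accepted", ]) / n         # P(A)
p_status <- colSums(counts) / n                   # P(B), P(C), P(D)
p_accept_given <- counts["Accepted", ] / colSums(counts)  # P(A|B), P(A|C), P(A|D)

round(p_accept_given, 4)
```

Because the acceptance rate differs across the three columns, accepting a promotion is not independent of marital/spouse status in this sample, which is why P(A and C) cannot be replaced by P(A)P(C).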

4.34 In a January 15, 1998, article, the New England Journal of Medicine (338:141-146) reported on the utility of using computerized tomography (CT) as a diagnostic test for patients with clinically suspected appendicitis. In at least 20% of patients with appendicitis, the correct diagnosis was not made. On the other hand, the appendix was normal in 15% to 40% of patients who underwent emergency appendectomy. A study was designed to determine the prospective effectiveness of using CT as a diagnostic test to improve the treatment of these patients. The study examined 100 consecutive patients suspected of having acute appendicitis who presented to the emergency department or were referred there from a physician’s office. The 100 patients underwent a CT scan, and the surgeon made an assessment of the presence of appendicitis for each of the patients. The final clinical outcomes were determined at surgery and by pathological examination of the appendix after appendectomy or by clinical follow-up at least 2 months after CT scanning.

                                        Presence of Appendicitis
Radiologic Determination              Confirmed (C)    Ruled Out (RO)
Definitely appendicitis (DA)               50                1
Equivocally appendicitis (EA)               2                2
Definitely not appendicitis (DNA)           1               44

appendicitis <- read.csv(file="appendicitis.csv", header = TRUE)
attach(appendicitis)
appendicitis

The 1996 rate of occurrence of appendicitis was approximately P(C) = .00108.

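  a. Find the sensitivity and specificity of the radiological determination of appendicitis.

Sensitivity and specificity condition on the true status of the patient, so the denominators are the column totals: 50 + 2 + 1 = 53 confirmed (C) and 1 + 2 + 44 = 47 ruled out (RO, written !C below).

P(DA|C) = 50/53 = .943

P(EA|C) = 2/53 = .038

P(DNA|C) = 1/53 = .019

P(DA|!C) = 1/47 = .021

P(EA|!C) = 2/47 = .043

P(DNA|!C) = 44/47 = .936

The sensitivity of a diagnostic test is its true positive rate, while the specificity is its true negative rate. Sensitivity = P(DA|C) = .943 Specificity = P(DNA|!C) = .936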

  b. Find the probability that a patient truly had appendicitis given that the radiological determination was definitely appendicitis (DA).

The probability of a patient having appendicitis is P(C) = .00108, and the probability of not having appendicitis is P(!C) = .99892.

Bayes' theorem: P(A|B) = P(B|A) * P(A) / [P(B|A) * P(A) + P(B|!A) * P(!A)]

The probability that a patient had appendicitis given that the determination was definitely appendicitis, using P(DA|C) = 50/53 = .9434 and P(DA|!C) = 1/47 = .0213:

P(C|DA) = P(DA|C) * P(C) / [P(DA|C) * P(C) + P(DA|!C) * P(!C)]

    = .9434 * .00108 / (.9434 * .00108 + .0213 * .99892)
    
    = .0457

  c. Find the probability that a patient truly did not have appendicitis given that the radiological determination was definitely appendicitis (DA).

Using P(DA|C) = 50/53 = .9434 and P(DA|!C) = 1/47 = .0213:

P(!C|DA) = P(DA|!C) * P(!C) / [P(DA|!C) * P(!C) + P(DA|C) * P(C)]

        = .0213 * .99892 / (.0213 * .99892 + .9434 * .00108)
        
        = 1 - P(C|DA)
        
        = .9543
        
  d. Find the probability that a patient truly did not have appendicitis given that the radiological determination was definitely not appendicitis (DNA).

Using P(DNA|!C) = 44/47 = .9362 and P(DNA|C) = 1/53 = .0189:

P(!C|DNA) = P(DNA|!C) * P(!C) / [P(DNA|!C) * P(!C) + P(DNA|C) * P(C)]

        = .9362 * .99892 / (.9362 * .99892 + .0189 * .00108)
        
        = .99998
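
The Bayes' theorem calculations in Exercise 4.34 follow one pattern: a prior, the chance of the observed reading under the hypothesis, and the chance of the same reading under the alternative. A minimal sketch, assuming the likelihoods are taken from the column totals of the study table (53 confirmed, 47 ruled out); the helper name `posterior` and its argument names are hypothetical.

```r
# Posterior probability of a hypothesis H given an observed reading E:
#   P(H|E) = P(E|H) P(H) / [P(E|H) P(H) + P(E|!H) P(!H)]
posterior <- function(prior, p_e_given_h, p_e_given_not_h) {
  num <- p_e_given_h * prior
  num / (num + p_e_given_not_h * (1 - prior))
}

p_c <- 0.00108  # 1996 rate of occurrence of appendicitis, from the exercise

# Reading DA: P(DA|C) = 50/53, P(DA|!C) = 1/47
p_c_given_da <- posterior(p_c, 50 / 53, 1 / 47)
p_notc_given_da <- 1 - p_c_given_da

# Reading DNA, hypothesis "no appendicitis": P(DNA|!C) = 44/47, P(DNA|C) = 1/53
p_notc_given_dna <- posterior(1 - p_c, 44 / 47, 1 / 53)

round(c(p_c_given_da, p_notc_given_da, p_notc_given_dna), 5)
```

The posterior for a DA reading stays small because the prior .00108 is for the general population; among patients already suspected of appendicitis the prior, and therefore the posterior, would be much higher.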
        
        

4.48 The CFO of a hospital is concerned about the risk of patients contracting an infection after a one-week or longer stay in the hospital. A long-term study estimates that the chance of contracting an infection after a one-week or longer stay in a hospital is 10%. A random sample of 50 patients who have been in the hospital at least 1 week is selected.

  a. If the 10% infection rate is correct, what is the probability that at least 5 patients out of the 50 will have an infection?

pbinom(4,50,.10,lower.tail = FALSE)

[1] 0.5688016

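  b. What assumptions are you making in computing the probability in part (a)?

The calculation assumes a binomial distribution: each of the 50 patients either contracts an infection or does not, the probability of infection is the same 10% for every patient, and the patients' outcomes are independent of one another.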
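
To make the "at least 5" calculation concrete, the tail probability can be rebuilt from the binomial formula P(X = k) = C(n, k) p^k (1 - p)^(n - k) and the complement rule. This is only a cross-check of the `pbinom` call above; the variable names are illustrative.

```r
n <- 50     # patients sampled
p <- 0.10   # assumed infection probability per patient
k <- 0:4    # outcomes excluded by "at least 5"

# P(X <= 4) from the binomial probability mass function
p_at_most_4 <- sum(choose(n, k) * p^k * (1 - p)^(n - k))

# Complement rule: P(X >= 5) = 1 - P(X <= 4)
p_at_least_5 <- 1 - p_at_most_4
p_at_least_5
```

`pbinom(4, ..., lower.tail = FALSE)` returns P(X > 4), which is why the first argument is 4 rather than 5.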

4.50 Customers arrive at a grocery store checkout at a rate of six per 30 minutes during the hours of 5 p.m. and 7 p.m. during the workweek. Let C be the number of customers arriving at the checkout during any 30-minute period of time. The management of the store wants to determine the frequency of the following events. Compute the probabilities of these events:

  a. No customers arrive.

dpois(0,lambda = 6)

[1] 0.002478752

  b. More than six customers arrive.

ppois(6,lambda = 6,lower.tail = FALSE)

[1] 0.3936972

  c. At most three customers arrive.

ppois(3,lambda = 6,lower.tail = TRUE)

[1] 0.1512039

4.70 The College Boards, which are administered each year to many thousands of high school students, are scored so as to yield a mean of 513 and a standard deviation of 130. These scores are close to being normally distributed. What percentage of the scores can be expected to satisfy each of the following conditions?

  a. Greater than 600

pnorm(600,mean = 513,sd = 130,lower.tail = FALSE)

[1] 0.2516741

  b. Greater than 700

pnorm(700,mean = 513,sd = 130,lower.tail = FALSE)

[1] 0.07515157

  c. Less than 450

The normal distribution is continuous, so the cutoff is 450 itself rather than 449.

pnorm(450,mean = 513,sd = 130,lower.tail = TRUE)

[1] 0.3139746

  d. Between 450 and 600

pnorm(600,mean = 513,sd = 130,lower.tail = TRUE) - pnorm(450,mean = 513,sd = 130,lower.tail = TRUE)

[1] 0.4343513
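
The `pnorm` calls above pass the mean and standard deviation directly. The same probabilities come from standardizing first, z = (y - mu)/sigma, and then using the standard normal distribution, which is how the calculation would be done with a z table. A short sketch with illustrative variable names:

```r
mu <- 513
sigma <- 130

# Standardize the two cutoffs
z_600 <- (600 - mu) / sigma
z_450 <- (450 - mu) / sigma

# P(450 < y < 600) computed on the z scale and on the original scale
p_between_z <- pnorm(z_600) - pnorm(z_450)
p_between <- pnorm(600, mean = mu, sd = sigma) - pnorm(450, mean = mu, sd = sigma)

c(z_450 = z_450, z_600 = z_600, p_between_z = p_between_z, p_between = p_between)
```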

4.82 Based on the 1990 census, the number of hours per day adults spend watching television is approximately normally distributed with a mean of 5 hours and a standard deviation of 1.3 hours.

  a. What proportion of the population spends more than 7 hours per day watching television?

Let x be the number of hours per day spent watching television. We need P(x > 7).

pnorm(7,mean = 5,sd = 1.3,lower.tail = FALSE)

[1] 0.0619679

  b. In a 1998 study of television viewing, a random sample of 500 adults reported that the average number of hours spent viewing television was greater than 5.5 hours per day. Do the results of this survey appear to be consistent with the 1990 census? (Hint: If the census results are still correct, what is the probability that the average viewing time would exceed 5.5 hours?)

We need P(ybar > 5.5).

Central Limit Theorem: ybar has a mean u_ybar and a standard deviation o_ybar, where u_ybar = u and o_ybar = o/sqrt(n).

u is the population mean and o is the population standard deviation.

n = 500

u = 5 hours

o = 1.3 hours

so u_ybar = 5

o_ybar = 1.3/sqrt(500) = .0581

Therefore, ybar has a mean of 5 hours and a standard deviation of .0581 hours.

P(ybar > 5.5) = P(z > (5.5 - 5)/.0581) = P(z > 8.61)

Since 99.7% of the values fall within 3 standard deviations of the mean, a sample mean more than 8 standard deviations above 5 would be essentially impossible if the census figures still held. Rounded to four decimal places:

round(pnorm(5.5,mean = 5,sd = 1.3/sqrt(500),lower.tail = FALSE), 4)

[1] 0

The results of this survey are not consistent with the 1990 census.
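
A small simulation can illustrate why a sample mean above 5.5 would be so surprising under the 1990 census figures. This is a sketch under stated assumptions: viewing times are drawn with `rnorm` from a normal distribution with mean 5 and standard deviation 1.3, the seed and the 2,000 replications are arbitrary choices, and the variable names are illustrative.

```r
set.seed(1)   # arbitrary seed, only for reproducibility
mu <- 5
sigma <- 1.3
n <- 500

# 2,000 simulated surveys, each recording the mean of 500 viewing times
ybar <- replicate(2000, mean(rnorm(n, mean = mu, sd = sigma)))

se_theory <- sigma / sqrt(n)      # Central Limit Theorem standard error
se_sim <- sd(ybar)                # spread of the simulated sample means
share_above <- mean(ybar > 5.5)   # fraction of simulated means above 5.5

c(se_theory = se_theory, se_sim = se_sim, share_above = share_above)
```

The simulated standard error should sit close to 1.3/sqrt(500), and none of the simulated means should come near 5.5, in line with the conclusion above.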

4.110 Suppose the probability that a major earthquake occurs on a given day in Fresno, California, is 1 in 10,000.

  a. In the next 1,000 days, what is the expected number of major earthquakes in Fresno?

The probability of a major earthquake on a given day is 1/10000 = .0001. The expected number of earthquakes in the next 1,000 days is 1000 * .0001 = .1 earthquakes (an expected count, not a probability).

  b. If the occurrence of major earthquakes can be modeled by the Poisson distribution, calculate the probability that there will be at least one major earthquake in Fresno during the next 1,000 days.

ppois(0,lambda = .10,lower.tail = FALSE)

[1] 0.09516258
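
The Poisson answer can be compared with the exact calculation that treats each of the 1,000 days as an independent trial with probability 1/10,000, so that P(at least one) = 1 - (1 - p)^n. The sketch below assumes that independence; the variable names are illustrative.

```r
p_day <- 1 / 10000   # probability of a major earthquake on one day
days <- 1000

# Exact binomial calculation: complement of "no earthquake on any day"
exact <- 1 - (1 - p_day)^days

# Poisson approximation with lambda = n * p = 0.1
pois_approx <- ppois(0, lambda = days * p_day, lower.tail = FALSE)

c(exact = exact, poisson = pois_approx, difference = exact - pois_approx)
```

With n large and p small the two values agree to about five decimal places, which is the usual justification for using the Poisson model here.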

4.112 Airlines overbook (sell more tickets than there are seats) flights, based on past records that indicate that approximately 5% of all passengers fail to arrive on time for their flight. Suppose a plane will hold 250 passengers, but the airline books 260 seats. What is the probability that at least 1 passenger will be bumped from the flight?

Probability that a passenger doesn’t arrive on time is .05

Probability that a passenger arrives on time is .95

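x = binomial random variable counting the passengers who arrive on time (a success is an on-time arrival)

At least one passenger is bumped when more than 250 of the 260 booked passengers arrive on time:

P(x > 250)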

n = 260

pi = .95

pbinom(250,260,.95,lower.tail = FALSE)

[1] 0.1590758

4.113 For the last 300 years, extensive records have been kept on volcanic activity in Japan. In 2002, there were five eruptions or instances of major seismic activity. From historical records, the mean number of eruptions or instances of major seismic activity is 2.4 per year. A researcher is interested in modeling the number of eruptions or major seismic activities over the 5-year period of 2005-2010.

  a. What probability model might be appropriate?

Since we are counting events (eruptions or instances of major seismic activity) that occur over a fixed period of time, a Poisson model is appropriate.

  b. What is the expected number of eruptions or instances of major seismic activity during 2005-2010?

2.4*5

[1] 12

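  c. What is the probability of no eruptions or instances of major seismic activity during 2005-2010?

The question covers the whole 5-year period, so the Poisson mean is lambda = 2.4 * 5 = 12 rather than the one-year mean of 2.4.

dpois(0,lambda = 12)

[1] 6.144212e-06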

  d. What is the probability of at least two eruptions or instances of major seismic activity during 2005-2010?

Again using the 5-year mean lambda = 2.4 * 5 = 12:

ppois(1,lambda = 12,lower.tail = FALSE)

[1] 0.9999201
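
For a Poisson process the mean scales with the length of the interval, so a 5-year window at 2.4 events per year has mean 12. One way to see this, assuming the five yearly counts are independent Poisson variables with mean 2.4, is that "no events in five years" means "no events in each of the five years". The variable names below are illustrative.

```r
rate_per_year <- 2.4
years <- 5
lambda <- rate_per_year * years   # mean count for the whole period

# P(no events in 5 years) computed two ways
p_none_period <- dpois(0, lambda)
p_none_each_year <- dpois(0, rate_per_year)^years

# P(at least two events in 5 years)
p_at_least_two <- ppois(1, lambda, lower.tail = FALSE)

c(lambda = lambda, p_none_period = p_none_period,
  p_none_each_year = p_none_each_year, p_at_least_two = p_at_least_two)
```

Using `lambda = 2.4` in `dpois` or `ppois` would instead answer the question for a single year.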

4.114 As part of a study to determine factors that may explain differences in animal species relative to their size, the following body masses (in grams) of 50 different bird species were reported in the paper “Temperature and the Northern Distributions of Wintering Birds,” by Richard Repasky (1991).

 7.7  10.1  21.6   8.6  12.0  11.4  16.6   9.4  11.5   9.0
 8.2  20.2  48.5  21.6  26.1   6.2  19.1  21.0  28.1  10.6
31.6   6.7   5.0  68.8  23.9  19.8  20.1   6.0  99.6  19.8
16.5   9.0 448.0  21.3  17.4  36.9  34.0  41.0  15.9  12.5
10.2  31.0  21.5  11.9  32.5   9.8  93.9  10.9  19.6  14.5

birds <- read.csv(file="birds.csv", header = TRUE)
attach(birds)
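
  a. Does the distribution of the body masses appear to follow a normal distribution? Provide both a graphical and a quantitative assessment.

No. The masses are strongly right-skewed: most species weigh less than 50 grams, while a few are much heavier, including one extreme value of 448.0 grams.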

hist(Bird_Mass)

Now let’s perform a quantitative assessment

Let’s construct the intervals:

y+/-s

slower = mean(Bird_Mass) - sd(Bird_Mass)
shigher = mean(Bird_Mass) + sd(Bird_Mass)
slower

[1] -32.55376

shigher

[1] 94.03776

y+/-2s

s2lower = mean(Bird_Mass) - 2 * sd(Bird_Mass)
s2higher = mean(Bird_Mass) + 2 * sd(Bird_Mass)
s2lower

[1] -95.84951

s2higher

[1] 157.3335

y+/-3s

s3lower = mean(Bird_Mass) - 3 * sd(Bird_Mass)
s3higher = mean(Bird_Mass) + 3 * sd(Bird_Mass)
s3lower

[1] -159.1453

s3higher

[1] 220.6293

Let’s count the number of measurements falling in each of the three intervals:

y+/-s

 sum((mean(Bird_Mass) - sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + sd(Bird_Mass)))

[1] 48

y+/-2s

 sum((mean(Bird_Mass) - 2*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 2*sd(Bird_Mass)))

[1] 49

y+/-3s

 sum((mean(Bird_Mass) - 3*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 3*sd(Bird_Mass)))

[1] 49

Let’s convert these numbers to percentages and compare the results to the Empirical Rule. y+/-s

sum((mean(Bird_Mass) - sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + sd(Bird_Mass)))/length(Bird_Mass)

[1] 0.96

Empirical Rule says: 68% fall within 1 sd

y+/-2s

sum((mean(Bird_Mass) - 2*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 2*sd(Bird_Mass)))/length(Bird_Mass)

[1] 0.98

Empirical Rule says: 95% fall within 2 sd

y+/-3s

sum((mean(Bird_Mass) - 3*sd(Bird_Mass)) < Bird_Mass & Bird_Mass < (mean(Bird_Mass) + 3*sd(Bird_Mass)))/length(Bird_Mass)

[1] 0.98

Empirical Rule says: 99.7% fall within 3 sd

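  b. Repeat part (a), with the outlier 448.0 removed.

The distribution of the body masses still does not appear to follow a normal distribution.

# Drop the outlier from the mass vector so that mean(), sd(), and hist() receive a numeric vector
newbd <- Bird_Mass[Bird_Mass != 448.0]
hist(newbd)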

Now let’s perform a quantitative assessment

Let’s construct the intervals:

y+/-s

slower = mean(newbd) - sd(newbd)
shigher = mean(newbd) + sd(newbd)
slower

[1] 2.513012

shigher

[1] 41.94005

y+/-2s

s2lower = mean(newbd) - 2 * sd(newbd)
s2higher = mean(newbd) + 2 * sd(newbd)
s2lower

[1] -17.20051

s2higher

[1] 61.65357

y+/-3s

s3lower = mean(newbd) - 3 * sd(newbd)
s3higher = mean(newbd) + 3 * sd(newbd)
s3lower

[1] -36.91403

s3higher

[1] 81.36709

Let’s count the number of measurements falling in each of the three intervals:

y+/-s

 sum((mean(newbd) - sd(newbd)) < newbd & newbd < (mean(newbd) + sd(newbd)))

[1] 45

y+/-2s

 sum((mean(newbd) - 2*sd(newbd)) < newbd & newbd < (mean(newbd) + 2*sd(newbd)))

[1] 46

y+/-3s

 sum((mean(newbd) - 3*sd(newbd)) < newbd & newbd < (mean(newbd) + 3*sd(newbd)))

[1] 47

Let’s convert these numbers to percentages and compare the results to the Empirical Rule. y+/-s

sum((mean(newbd) - sd(newbd)) < newbd & newbd < (mean(newbd) + sd(newbd)))/length(newbd)

[1] 0.9183673

Empirical Rule says: 68% fall within 1 sd

y+/-2s

sum((mean(newbd) - 2*sd(newbd)) < newbd & newbd < (mean(newbd) + 2*sd(newbd)))/length(newbd)

[1] 0.9387755

Empirical Rule says: 95% fall within 2 sd

y+/-3s

sum((mean(newbd) - 3*sd(newbd)) < newbd & newbd < (mean(newbd) + 3*sd(newbd)))/length(newbd)

[1] 0.9591837

Empirical Rule says: 99.7% fall within 3 sd
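
The interval checks in parts (a) and (b) repeat the same expression for k = 1, 2, 3. A compact alternative, sketched below, wraps the calculation in a helper and applies it to both versions of the data. The masses are typed in from the exercise rather than read from `birds.csv`, and the names `bird_mass` and `share_within` are hypothetical.

```r
# Body masses (grams) of the 50 bird species, as listed in the exercise
bird_mass <- c( 7.7, 10.1, 21.6,  8.6, 12.0, 11.4, 16.6,  9.4, 11.5,  9.0,
                8.2, 20.2, 48.5, 21.6, 26.1,  6.2, 19.1, 21.0, 28.1, 10.6,
               31.6,  6.7,  5.0, 68.8, 23.9, 19.8, 20.1,  6.0, 99.6, 19.8,
               16.5,  9.0, 448.0, 21.3, 17.4, 36.9, 34.0, 41.0, 15.9, 12.5,
               10.2, 31.0, 21.5, 11.9, 32.5,  9.8, 93.9, 10.9, 19.6, 14.5)

# Proportion of observations strictly within k standard deviations of the mean
share_within <- function(x, k) {
  m <- mean(x)
  s <- sd(x)
  mean(x > m - k * s & x < m + k * s)
}

no_outlier <- bird_mass[bird_mass != 448.0]
with_outlier_share <- sapply(1:3, function(k) share_within(bird_mass, k))
no_outlier_share <- sapply(1:3, function(k) share_within(no_outlier, k))

# One row per data set, one column per k, next to the Empirical Rule values
rbind(empirical_rule = c(0.68, 0.95, 0.997),
      with_outlier = with_outlier_share,
      no_outlier = no_outlier_share)
```

In both versions far more than 68% of the masses fall within one standard deviation of the mean, because the large values inflate the standard deviation; this is the quantitative sign that the data are not close to normal.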

  c. Determine the sample mean and median with and without the value 448.0 in the data set.

With 448.0

summary(Bird_Mass)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00   10.30   18.25   30.74   25.55  448.00

Without 448.0

summary(newbd)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00   10.20   17.40   22.23   23.90   99.60

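  d. Determine the sample standard deviation and MAD (median absolute deviation) with and without the value 448.0 in the data set. R's mad() returns the median absolute deviation about the median, multiplied by the constant 1.4826 by default.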

Standard Deviation and MAD with 448.0

sd(Bird_Mass)

[1] 63.29576

mad(Bird_Mass)

[1] 11.78667

Standard Deviation and MAD without 448.0

sd(newbd)

[1] 19.71352

mad(newbd)

[1] 10.67472
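
To show what `mad()` computes, the sketch below rebuilds it by hand on a short illustrative vector (the first ten masses from the exercise, chosen only to keep the example small) and contrasts it with the mean absolute deviation, which is a different statistic. The variable names are illustrative.

```r
# Illustrative subset: the first ten body masses from the exercise
x <- c(7.7, 10.1, 21.6, 8.6, 12.0, 11.4, 16.6, 9.4, 11.5, 9.0)

# Median absolute deviation about the median, with R's default scale constant
mad_by_hand <- 1.4826 * median(abs(x - median(x)))

# R's built-in version (default: center = median(x), constant = 1.4826)
mad_builtin <- mad(x)

# Mean absolute deviation about the mean, shown only for contrast
mean_abs_dev <- mean(abs(x - mean(x)))

c(mad_by_hand = mad_by_hand, mad_builtin = mad_builtin, mean_abs_dev = mean_abs_dev)
```

Because it is built from medians, the MAD moves far less than the standard deviation when the value 448.0 is removed, which matches the output above.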

Extra Problem: Plot “P(drug user | test positive) = (π1)(π) / [(π1)(π) + (1-π2)(1-π)]” against π = 0:1 by 0.1, π1 = .1, .5, .9, and π2 = .9 on the same graph and comment.

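x <- seq(0, 1, by = .1)   # prior probability of being a drug user (horizontal axis)
# Posterior P(drug user | test positive) from the formula above; p1 and p2 are its two test parameters
post <- function(p, p1, p2) p1 * p / (p1 * p + (1 - p2) * (1 - p))
p1 <- c(.1, .5, .9)
y <- sapply(p1, function(s) post(x, s, p2 = .9))   # one column of y per value of p1
matplot(x, y, type = "b", pch = 1:3, lty = 1:3, col = 1:3, xlab = "P(drug user)", ylab = "P(drug user | test positive)")
legend("bottomright", legend = paste("p1 =", p1), pch = 1:3, lty = 1:3, col = 1:3)
title(main = "Drug User Test Positive")

For every value of p1 the posterior probability rises from 0 to 1 as the prior probability of drug use goes from 0 to 1, so a positive test is most convincing when drug use is already common. At a fixed prior, a larger p1 gives a higher curve: when the prior is .5 the posterior probabilities are .5, .833, and .9 for p1 = .1, .5, and .9.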