Exercise 5.1.1

#Consider taking a random sample of size 3 from the knee replacement population of Example 5.1.3. What is the probability that the total cost for those in the sample will be greater than $125,000? This is Pr{Y > 125,000}. Using Table 5.1.2, we sum the probabilities for the sample totals 130, 155, and 180 (in thousands of dollars).

prgr125 <- (12/64 + 6/64 + 1/64)*100 # probabilities for totals 130, 155, 180, as a percentage
print(prgr125)
## [1] 29.6875

#interpretation: for a random sample of n = 3, there is about a 29.7% chance the total cost will exceed $125,000
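#A minimal sketch of where such probabilities come from, by enumerating every ordered sample of size 3. The population used here (costs of 0, 35, and 60 thousand dollars with probabilities 1/4, 1/2, and 1/4) is an assumption, chosen because it reproduces the 12/64, 6/64, and 1/64 used above; check it against Example 5.1.3 before relying on it.

```r
# ASSUMED population (not quoted in the text above): cost in $1000s and its probability
costs <- c(0, 35, 60)
probs <- c(1/4, 1/2, 1/4)

# every ordered sample of size 3, as indices into costs/probs
idx <- expand.grid(first = 1:3, second = 1:3, third = 1:3)

# total cost and probability of each ordered sample (draws are independent)
totals <- costs[idx$first] + costs[idx$second] + costs[idx$third]
p <- probs[idx$first] * probs[idx$second] * probs[idx$third]

# add up the probabilities of the samples whose total exceeds 125
pr_gt_125 <- sum(p[totals > 125]) * 100
print(pr_gt_125)
## [1] 29.6875
```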

Exercise 5.1.2

#Consider taking a random sample of size 3 from the knee replacement population of Example 5.1.3. What is the probability that the total cost for those in the sample will be between $80,000 and $125,000? This is Pr{80,000 < Y < 125,000}. From Table 5.1.2, the sample totals inside this interval are 95, 105, and 120 (in thousands of dollars); a total of 70 falls below $80,000, so it is excluded. We sum the probabilities for those three totals.

Prbwt80to125 <- (12/64 + 8/64 + 3/64)*100 # probabilities for totals 95, 105, 120, as a percentage
print(Prbwt80to125)
## [1] 35.9375

#interpretation: for a random sample of n = 3, there is about a 35.9% chance the total cost will be between $80,000 and $125,000

Exercise 5.2.4

#The serum cholesterol levels of a population of 12- to 14-year-olds follow a normal distribution with mean 155 mg/dl and standard deviation 27 mg/dl (as in Example 4.1.1).

#(a) What percentage of the 12- to 14-year-olds have serum cholesterol values between 145 and 165 mg/dl? Given the population mean and SD, we use pnorm to find the area under the normal curve below each endpoint and take the difference, Pr{Y < 165} - Pr{Y < 145}.

pr165to145 <-(pnorm(165,155,27)-pnorm(145,155,27))*100
print(pr165to145)
## [1] 28.88935

#interpretation: about 28.9% of the 12- to 14-year-olds in this population have serum cholesterol values between 145 and 165 mg/dl.

#(b) Suppose we were to choose at random from the population a large number of groups of nine 12- to 14-year-olds each. In what percentage of the groups would the group mean cholesterol value be between 145 and 165 mg/dl? Here we are asked for the proportion of the sampling distribution of the sample mean (Y bar) that lies between 145 and 165. Each group has nine subjects, so n = 9. By Theorem 5.2.1 (p. 162), the mean of the sampling distribution of Y bar equals the population mean, mu = 155, and its SD equals the population SD divided by the square root of n, sigma/sqrt(n) = 27/sqrt(9) = 9.

se <- 27/sqrt(9) # n = 9, so se = sigma/sqrt(n)
print(se)
## [1] 9

#now calculate Pr{145 < Y bar < 165} using pnorm

round((pnorm(165,155,se) - pnorm(145,155,se))*100, 2)
## [1] 73.35

#interpretation: assuming random and independent sampling, in about 73.35% of groups of nine 12- to 14-year-olds the group mean cholesterol value would be between 145 and 165 mg/dl.
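#As a sketch of how the group size drives this kind of calculation, the helper below (pr_mean_between is a hypothetical name, not from the textbook) wraps the sigma/sqrt(n) step so the same interval can be evaluated for several values of n. It assumes a normal sampling distribution for the mean.

```r
# Pr{lo < Y bar < hi} when the sample mean is normal with mean mu and SD sigma/sqrt(n)
pr_mean_between <- function(lo, hi, mu, sigma, n) {
  se <- sigma / sqrt(n)
  pnorm(hi, mu, se) - pnorm(lo, mu, se)
}

# same interval, population mean 155 and SD 27, for a few group sizes
for (n in c(1, 9, 25)) {
  cat("n =", n, " percent =", round(pr_mean_between(145, 165, 155, 27, n) * 100, 1), "\n")
}
```

#With n = 1 the helper reproduces the part (a) calculation for a single subject; larger groups concentrate the sample mean more tightly around 155.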

Exercise 5.2.9

#The basal diameter of a sea anemone is an indicator of its age. The density curve shown here represents the distribution of diameters in a certain large population of anemones; the population mean diameter is 4.2 cm, the standard deviation is 1.4 cm. Let Y bar represent the mean diameter of 25 anemones randomly chosen from the population. The SD of the sampling distribution of Y bar is se = sigma/sqrt(n) = 1.4/sqrt(25) = 0.28.
#(a) Find the approximate value of Pr{4 < Y bar < 5}.

round((pnorm(5,4.2,0.28) - pnorm(4,4.2,0.28))*100, 2)
## [1] 76.03

#interpretation: for a random sample of n = 25, there is about a 76.0% chance that the sample mean diameter lies between 4 and 5 cm

#(b) Why is your answer to part (a) approximately correct even though the population distribution of diameters is clearly not normal? Would the same approach be equally valid for a sample of size 2 rather than 25? Why or why not?
#Ans: by the Central Limit Theorem (Theorem 5.2.1), no matter what distribution Y has in the population, if the sample size is large enough, the sampling distribution of Y bar is approximately normal. Here n = 25 is reasonably large (close to 30), so the normal approximation is acceptable. With n = 2 the sample is far too small to meet the conditions of the Central Limit Theorem: the sampling distribution of Y bar would still reflect the non-normal shape of the population, so the same approach would not be equally valid.
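#A small simulation can illustrate the answer. The anemone population's density curve is not reproduced here, so the sketch below substitutes a right-skewed gamma distribution with the same mean (4.2) and SD (1.4); that stand-in, the helper name skewness_of_means, and the seed are all assumptions for illustration only.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

# ASSUMED stand-in population: gamma with mean 4.2 and SD 1.4
shape <- (4.2 / 1.4)^2   # mean^2 / variance = 9
rate  <- 4.2 / 1.4^2     # mean / variance

# skewness of the simulated sampling distribution of the mean for samples of size n
skewness_of_means <- function(n, reps = 10000) {
  means <- replicate(reps, mean(rgamma(n, shape = shape, rate = rate)))
  mean((means - mean(means))^3) / sd(means)^3
}

skew_n2  <- skewness_of_means(2)
skew_n25 <- skewness_of_means(25)
print(c(n2 = skew_n2, n25 = skew_n25))
```

#A skewness near 0 indicates a roughly symmetric, normal-like shape; in this sketch the value for n = 25 should sit much closer to 0 than the value for n = 2.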

Exercise 5.2.13

#A certain assay for serum alanine aminotransferase (ALT) is rather imprecise. The results of repeated assays of a single specimen follow a normal distribution with mean equal to the ALT concentration for that specimen and standard deviation equal to 4 U/l (as in Exercise 4.S.15). Suppose a hospital lab measures many specimens every day, and specimens with reported ALT values of 40 or more are flagged as “unusually high.” If a patient’s true ALT concentration is 35 U/l, find the probability that his specimen will be flagged as “unusually high”

#(a) if the reported value is the result of a single assay. Here the mean is 35 and n = 1, so se = 4/sqrt(1) = 4. Now calculate the area under the curve to the right of 40.

(1-pnorm(40, 35, 4))*100
## [1] 10.56498

#interpretation: for a single assay (n = 1), there is about a 10.6% chance that the specimen will be flagged as unusually high

#(b) if the reported value is the mean of three independent assays of the same specimen. Here the mean is 35 and se = 4/sqrt(3) = 2.3094.

(1-pnorm(40, 35, 2.3094))*100
## [1] 1.519137

#interpretation: for the mean of three assays (n = 3), there is about a 1.5% chance that the specimen will be flagged as unusually high
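#Parts (a) and (b) differ only in the number of assays averaged, so the pattern can be sketched as a function of that number. pr_flag is a hypothetical helper name; the cutoff of 40, true value of 35, and assay SD of 4 come from the exercise.

```r
# chance (in percent) that the mean of k independent assays is 40 or more
pr_flag <- function(k, cutoff = 40, true_alt = 35, assay_sd = 4) {
  se <- assay_sd / sqrt(k)
  (1 - pnorm(cutoff, true_alt, se)) * 100
}

# k = 1 and k = 3 correspond to parts (a) and (b); k = 5 is an extra illustration
print(sapply(c(1, 3, 5), pr_flag))
```

#Averaging more assays shrinks the SD of the reported value, so a specimen with a true ALT of 35 is flagged less often.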

Exercise 5.2.19

#The partial pressure of oxygen, PaO2, is a measure of the amount of oxygen in the blood. Assume that the distribution of PaO2 levels among newborns has an average of 38 mmHg and a standard deviation of 9. If we take a sample of size n = 25, the sampling distribution of the sample mean has mean 38 and se = 9/sqrt(25) = 1.8, and by the Central Limit Theorem it is approximately normal.

#(a) what is the probability that the sample average will be greater than 36?

(1-pnorm(36,38,9/sqrt(25)))*100
## [1] 86.67397

#interpretation: for a random sample of n = 25, there is about an 86.7% chance that the sample mean will be greater than 36 mmHg

#(b) what is the probability that the sample average will be greater than 41?

(1-pnorm(41,38,9/sqrt(25)))*100
## [1] 4.779035

#interpretation: for a random sample of n = 25, there is about a 4.8% chance that the sample mean will be greater than 41 mmHg
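#An equivalent way to get an upper-tail probability is pnorm's lower.tail argument, which avoids the 1 - pnorm(...) subtraction. A brief sketch for both parts, using the same mean 38 and se = 9/sqrt(25):

```r
se <- 9 / sqrt(25)  # SD of the sampling distribution of the mean

# lower.tail = FALSE returns the area to the right of the first argument
pr_gt_36 <- pnorm(36, mean = 38, sd = se, lower.tail = FALSE) * 100
pr_gt_41 <- pnorm(41, mean = 38, sd = se, lower.tail = FALSE) * 100
print(c(pr_gt_36, pr_gt_41))
```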

Exercise 5.4.1

#Consider interviewing a random sample of n = 50 adults. Let P hat denote the proportion of the 50 sampled adults who drink coffee. If the population proportion of coffee drinkers is 0.80, what is the appropriate approximate model for the distribution of P hat over many such samples of size 50? That is, what type of distribution is this, what is the mean, and what is the standard deviation?
#P hat is the sample proportion, and its sampling distribution describes how it varies over many samples. If n is large, the sampling distribution of P hat can be approximated by a normal distribution (Theorem 5.4.1).

#the mean of the distribution of P hat = (p) = 0.8

#the sd of the distribution of P hat = sqrt (p(q)/n)

sdphat <- sqrt((0.8*0.2)/50)
print(sdphat)
## [1] 0.05656854
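
#These values can be checked with a quick simulation. The sketch below draws many samples of size 50 with rbinom and compares the simulated mean and SD of P hat with p and sqrt(pq/n); the seed and the number of replications are arbitrary choices.

```r
set.seed(1)  # arbitrary seed so the run is repeatable

n <- 50
p <- 0.8

# 100000 simulated samples: count of coffee drinkers in each, converted to a proportion
phat <- rbinom(100000, size = n, prob = p) / n

sim_mean <- mean(phat)
sim_sd <- sd(phat)
print(c(sim_mean = sim_mean, theory_mean = p))
print(c(sim_sd = sim_sd, theory_sd = sqrt(p * (1 - p) / n)))
```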

Exercise 5.4.3

#In the United States, 44% of the population has type O blood. Suppose a random sample of 12 persons is taken. Find the probability that 6 of the persons will have type O blood (and 6 will not)

#(a) using the binomial distribution formula. Here we can use the dbinom function, which returns the binomial probability Pr{Y = 6} for 6 successes in 12 trials with p = 0.44.

Pry6 <- (dbinom(6,12,0.44))*100
print(Pry6)
## [1] 20.67836

#interpretation: given a random sample of n = 12, there is approximately a 20.7% chance that exactly 6 subjects will have type O blood
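
#Since part (a) asks for the binomial formula itself, the same number can be written out term by term as nCj * p^j * (1-p)^(n-j). A short sketch comparing it with dbinom:

```r
n <- 12    # sample size
j <- 6     # number with type O blood
p <- 0.44  # population proportion with type O blood

# binomial formula: (number of ways to pick j of n) * p^j * (1 - p)^(n - j)
by_formula <- choose(n, j) * p^j * (1 - p)^(n - j) * 100
print(by_formula)

# should agree with the built-in binomial probability
print(all.equal(by_formula, dbinom(j, n, p) * 100))
```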

#(b) using the normal approximation. Here mean = np = 12(0.44) = 5.28 and sd = sqrt(npq) = sqrt(5.28*0.56) = 1.7195. Apply the continuity correction to approximate exactly 6 successes: use pnorm at 0.5 above and below 6, that is, at 6.5 and 5.5.

(pnorm(6.5,5.28,1.7195) - pnorm(5.5,5.28,1.7195))*100
## [1] 21.00921

#interpretation: given a random sample n = 12, there is approximately a 21.0% chance that 6 subjects will have type O blood

Exercise 5.4.7

#Consider random sampling from a dichotomous population with p = 0.3, and let E be the event that P hat is within +/- 0.05 of p. Use the normal approximation (without the continuity correction) to calculate Pr{E} for a sample of size n = 400. Using Theorem 5.4.1, P hat has mean = p = 0.3 and sd = sqrt(pq/n) = sqrt((0.3*0.7)/400) = 0.0229. Now use pnorm at 0.35 and 0.25 and take the difference.

sdphat400 <- sqrt((0.3*0.7)/400) # about 0.0229
round((pnorm(0.35,0.3,sdphat400) - pnorm(0.25,0.3,sdphat400))*100, 2)
## [1] 97.09

#interpretation: for a random sample of n = 400, there is approximately a 97.1% chance that P hat falls between 0.25 and 0.35, that is, within 0.05 of p.
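
#For comparison, the exact binomial probability of the same event can be computed with pbinom. P hat within 0.05 of 0.3 with n = 400 means the count of successes is between 100 and 140 inclusive. This is a cross-check, not part of what the exercise asks for.

```r
n <- 400
p <- 0.3

# Pr{100 <= Y <= 140} = Pr{Y <= 140} - Pr{Y <= 99}
exact <- (pbinom(140, n, p) - pbinom(99, n, p)) * 100
print(exact)
```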