R Markdown
http://bit.ly/snsdemo
- The average number of lymphocytes in 1 microliter of blood in a population of adults is 2,900 with a standard deviation of 600. Krishanu has his blood drawn with a recorded count of 4,000 lymphocytes in 1 microliter of blood. Assuming the population of lymphocyte count follows a normal distribution, what proportion of the population has a higher lymphocyte count than Krishanu?
Mean lymphocyte count = 2900, SD = 600, Krishanu's count = 4000. What proportion of the population has a count greater than 4000?
z-score = (x - mean) / SD = (4000 - 2900) / 600 = 1.833
#Since the population follows a normal distribution, we can use pnorm with the z-score as the quantile; 1 - pnorm gives the upper tail
result <- (1-(pnorm(1.833,0,1)))*100
print(result)
## [1] 3.340129
#The proportion is 3.34%
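As a cross-check, here is a minimal sketch that gets the same tail probability without typing the z-score by hand. The variable names (`mu`, `sigma`, `x`) are my own labels for the numbers in the problem statement; `lower.tail = FALSE` asks `pnorm` for the upper tail directly.

```r
# Values taken from the problem statement
mu <- 2900     # population mean
sigma <- 600   # population standard deviation
x <- 4000      # Krishanu's count

# Standardize, then take the upper tail of the standard normal
z <- (x - mu) / sigma
p_from_z <- 1 - pnorm(z)

# Same tail probability on the original scale, no manual z-score needed
p_direct <- pnorm(x, mean = mu, sd = sigma, lower.tail = FALSE)

print(c(z = z, p_from_z = p_from_z, p_direct = p_direct))
```

Both routes should agree; the direct call also avoids the small rounding introduced by truncating z to 1.833.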
- The probability that a woman will develop non-Hodgkin lymphoma in her lifetime is 1.9%. What is the probability that exactly two women drawn from a sample of ten will develop non-Hodgkin lymphoma in their lifetime?
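n = 10 women, k = 2 cases, probability of success p = 0.019, probability of failure 1 - p = 0.981
combinations = 10!/(2!(10-2)!) = (10x9)/(2x1) = 45
Probability = 45 x (0.019)^2 x (0.981)^(10-2)
#"Exactly two" is a point probability, so we use dbinom; 1 - pbinom(2,10,0.019) would instead give the probability of more than two
Proba <- dbinom(2,10,0.019)*100
print(round(Proba, 2))
## [1] 1.39
#The probability is about 1.39%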
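To make the "exactly two" calculation concrete, the sketch below evaluates the binomial formula by hand with `choose()` and compares it with `dbinom()`. It also computes the upper tail from `pbinom()` to show that this is a different quantity (more than two cases). The variable names are mine, not part of the assignment.

```r
n <- 10      # women sampled
k <- 2       # number who develop non-Hodgkin lymphoma
p <- 0.019   # lifetime probability for one woman

# Binomial formula written out: C(n, k) * p^k * (1 - p)^(n - k)
p_manual <- choose(n, k) * p^k * (1 - p)^(n - k)

# Built-in probability mass function, P(X = 2)
p_exact <- dbinom(k, size = n, prob = p)

# Upper tail P(X > 2), which is NOT the same as P(X = 2)
p_more_than_two <- pbinom(k, size = n, prob = p, lower.tail = FALSE)

print(c(manual = p_manual, dbinom = p_exact, more_than_two = p_more_than_two))
```

The first two numbers should match, while the third is much smaller, since three or more cases out of ten is rarer than exactly two.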
- Suppose we now sample 500 women. What is the probability that at least 15 develop non-Hodgkin lymphoma in their lifetime?
500 women, probability of success p = 0.019
mean = 500 x 0.019 = 9.5, variance = 500 x 0.019 x 0.981 = 9.32
Then we standardize using the normal approximation to the binomial, computing the z-score first
z-score = (15 - 9.5)/sqrt(9.32) = 5.5/3.053 = 1.80
total <- (1- pnorm(1.80,0,1))*100
#The probability is about 3.6%
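The normal approximation can be checked against the exact binomial tail. The sketch below is one way to do that, under my own variable names; it also includes a continuity-corrected version (using 14.5 instead of 15), which is a common refinement and not something the assignment asks for.

```r
n <- 500
p <- 0.019

# Mean and standard deviation of the binomial count
mu <- n * p
sigma <- sqrt(n * p * (1 - p))

# Normal approximation to P(X >= 15), without continuity correction
p_normal <- pnorm(15, mean = mu, sd = sigma, lower.tail = FALSE)

# Same approximation with a continuity correction (assumption: optional refinement)
p_normal_cc <- pnorm(14.5, mean = mu, sd = sigma, lower.tail = FALSE)

# Exact binomial: P(X >= 15) = P(X > 14)
p_exact <- pbinom(14, size = n, prob = p, lower.tail = FALSE)

print(c(mu = mu, sigma = sigma, normal = p_normal,
        normal_cc = p_normal_cc, exact = p_exact))
```

With a small p the binomial is right-skewed, so the exact tail and the approximation can differ noticeably; comparing the three numbers shows how much.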
- In the following problem, you will verify the Central Limit Theorem:
- Load in the file “hanes_subset.Rdata”
load("/Users/victorleon/Documents/IntroR_F21/VilcekR_fall21/hanes_subset.RData")
- Plot a histogram of the variable “BMI”. What is the distribution?
hist(hanes$BMI)

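The distribution is right-skewed: most values cluster in the 20s, with a long tail toward high BMI. It is unimodal and roughly bell-shaped near the center, but it is not symmetric, so it is only approximately similar to a normal distribution.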
- Use descriptive statistics to calculate the mean, median, variance, standard deviation, range, and interquartile range of the variable “BMI”.
summary(hanes$BMI)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   17.08   22.67   25.50   27.01   29.36   63.58
# mean is 27.01
# median is 25.50
var(hanes$BMI)
## [1] 44.22672
# variance is 44.23
sd(hanes$BMI)
## [1] 6.650317
#Standard deviation is 6.65
# To calculate the range, subtract the minimum from the maximum
max(hanes$BMI)-min(hanes$BMI)
## [1] 46.5
#The range is 46.5
IQR(hanes$BMI, na.rm = TRUE)
## [1] 6.69
#The interquartile range is 6.69
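The six statistics above can also be collected in one call. The helper below, `describe()`, is a hypothetical convenience function of my own, shown on a small made-up vector (`toy`) so that it runs without the data file; with the real data the call would be `describe(hanes$BMI)`.

```r
# Hypothetical helper: returns the six requested statistics as a named vector
describe <- function(x) {
  c(mean     = mean(x),
    median   = median(x),
    variance = var(x),
    sd       = sd(x),
    range    = max(x) - min(x),
    iqr      = IQR(x))
}

# Made-up values for illustration only, not taken from hanes
toy <- c(2, 4, 4, 4, 5, 5, 7, 9)
stats <- describe(toy)
print(round(stats, 2))
```

Returning a named vector keeps all the numbers in one place, which makes it easy to round or tabulate them later.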
- There are 580 observations of the variable BMI. Write code to randomly sample 1000 subsets of the observations in BMI, each of size 10 without replacement. Then find the mean of each of these 1000 samples and make a histogram. What is the distribution? Now repeat the steps above using subsets of the data of size 40. Does this distribution look different? If so, why?
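# Draw 1000 samples of size 10 without replacement and keep the mean of each one
means10 <- replicate(1000, mean(sample(hanes$BMI, 10)))
hist(means10)

mean(means10)
# The result varies with the random draw; it should be close to the overall mean BMI of 27.01
# Repeat with 1000 samples of size 40
means40 <- replicate(1000, mean(sample(hanes$BMI, 40)))
hist(means40)

With samples of size 10, the sample means should look roughly bell-shaped but still somewhat right-skewed and fairly spread out. With samples of size 40, the distribution of sample means should be narrower and closer to a normal distribution. This is what the Central Limit Theorem predicts: as the sample size grows, the sample means become more nearly normally distributed around the population mean, and their spread shrinks roughly as SD/sqrt(n). A sample size of about 30 is a common rule of thumb for the approximation to work well.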
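Since the data file is not available outside the course folder, here is a self-contained sketch of the same experiment on simulated data. The population `pop` is a made-up right-skewed stand-in for BMI (a gamma draw with parameters I picked to give a mean near 27), and `sample_means()` is a hypothetical helper; neither comes from the assignment.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Assumption: a right-skewed stand-in for the 580 BMI values
pop <- rgamma(580, shape = 16, rate = 0.6)

# Hypothetical helper: means of `reps` samples of size n, drawn without replacement
sample_means <- function(x, n, reps = 1000) {
  replicate(reps, mean(sample(x, n)))
}

means10 <- sample_means(pop, 10)
means40 <- sample_means(pop, 40)

hist(means10, main = "Sample means, n = 10")
hist(means40, main = "Sample means, n = 40")

# Compare the observed spread with the SD / sqrt(n) approximation
print(c(sd10 = sd(means10), approx10 = sd(pop) / sqrt(10),
        sd40 = sd(means40), approx40 = sd(pop) / sqrt(40)))
```

The observed spreads should sit close to, and slightly below, the SD/sqrt(n) values, because sampling without replacement from a finite population reduces the variance a little.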
- No R code is required for these final questions. For the following, determine whether the error made is either 1) a Type I error or 2) a Type II error:
- As the 𝛼 level for a statistical test decreases, what error type also decreases?
Type I error
- Suppose you have a study looking at whether a new drug could reduce the rate of deaths from lung cancer. The truth (that you don’t know of course) is that the drug can reduce the rate of cancer-related deaths. You set up a clinical trial to compare this new drug to the standard drug on the market. Your null hypothesis is that there is no difference in reduction of lung cancer deaths between the 2 drugs, your alternative is that there is a difference. However, after running the study your statistical test concluded that there is no difference in cancer death rates when using the two drugs. What error was made?
Type II error
- If you perform a hypothesis test and the null hypothesis is false, which type of error cannot be made?
Type I error
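Although no code is required here, the link between 𝛼 and the Type I error rate can be illustrated with a small simulation. In this sketch both groups are drawn from the same normal distribution, so the null hypothesis is true and every rejection is a Type I error. The group size of 20 and the 5000 repetitions are arbitrary choices of mine.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

# Simulate 5000 two-sample t-tests where the null hypothesis is true
p_values <- replicate(5000, t.test(rnorm(20), rnorm(20))$p.value)

# Share of tests that (wrongly) reject the null at each alpha level
rate_05 <- mean(p_values < 0.05)
rate_01 <- mean(p_values < 0.01)

print(c(alpha_0.05 = rate_05, alpha_0.01 = rate_01))
```

The rejection rate should land near each 𝛼, and it drops when 𝛼 is lowered, which is the behavior described in the first answer.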