Question 1.
dbinom (10, size=50, prob=0.1)
## [1] 0.01518333
plot(dbinom (0:50, size=50, prob=0.1))
# The probability of 10 or less influenza-related cases out of 50
pbinom(10, size=50, prob=0.1)
## [1] 0.9906454
# The probability of 10 or more influenza-related cases out of 50
1-pbinom(9, size=50, prob=0.1) # Pr(X>=10) = 1-Pr(X<=9) = 1 – pbinom(9, size=50, prob=0.1)
## [1] 0.02453794
Discussion
If 10 out of 50 visitors in the emergency room had influenza-related cases today, today rate is 20%. But, the probability of 10 influenza-related cases out of 50 is ~1.518%, and the cumulative probability of 10 or more influenza-related cases out of 50 is ~2.454%. Therefore, today rate is way higher than ~1.518% and ~2.454% , and it is very weird situation.
Question 2.
# negative binomial distribution of mean expression
# n = number of observations
# size = target for number of successful trials
# prob = probability of success in each trial
negativeRandomSample = rnbinom (10000, size=1, prob=0.0001)
# on average, it takes ~10,000 times to have one success
summary(negativeRandomSample)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 2882 6970 10020 13830 125300
# This makes sense because I set the probability to 0.0001
sorted=sort(negativeRandomSample)
cumulatedSum=cumsum(negativeRandomSample) # cumulative summation
pnbinom(6926, size=1,prob=0.0001) # median 6926
## [1] 0.4997937
qnbinom(0.5, size=1,prob=0.0001) #6931, close to median
## [1] 6931
Discussion
• In the generation of random sample of 10,000 negative binomial variates, as I set “size=1, prob=0.0001”, on average, it takes ~10,000 times to have one success. Therefore, it makes sense that I got mean=~10,040.
• In the 10,000 random samples of negative binomial variates, median was 6,926. It was confirmed when I used CDF; I got the probability ~0.4998. In the quantile function, I got 6,931, which is close to median. It is comparable to the sorted output; the medians for the sample and the nbinom distribution are almost the same
Question 3.
basicProbability=c(0.855,0.027,0.018,0.003,0.095,0.002)
basicProbability
## [1] 0.855 0.027 0.018 0.003 0.095 0.002
oneSample_size100 = rmultinom( n = 1, size =100, prob = basicProbability)
nineSample_size100 = rmultinom( n = 9, size =100, prob = basicProbability)
oneSample_size100 = array(oneSample_size100, dim=c(6,1))
nineSample_size100 = array(nineSample_size100, dim=c(6,9))
tenSample_size100 = cbind(oneSample_size100, nineSample_size100)
tenSample_size100
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 86 75 83 85 87 87 91 82 85 88
## [2,] 5 7 1 1 3 1 0 4 4 2
## [3,] 1 3 3 3 1 1 3 0 5 3
## [4,] 0 1 0 0 0 0 0 0 0 0
## [5,] 8 13 12 11 9 11 6 14 6 6
## [6,] 0 1 1 0 0 0 0 0 0 1
rownames(tenSample_size100) <- c("negative_healthy","positive_healthy", "indeterminate_healthy",
"negative_sick","positive_sick","indeterminate_sick")
healthyPositive <- tenSample_size100["positive_healthy", ]
table(healthyPositive)
## healthyPositive
## 0 1 2 3 4 5 7
## 1 3 1 1 2 1 1
summary(healthyPositive)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 1.0 2.5 2.8 4.0 7.0
hist(healthyPositive)