Because he is lazy, your teacher has assigned grades for an exam at random, and to help hide his deception he has given the fake grades a normal distribution with a mean of 70 and a standard deviation of 10.
\(z = \frac{x - \mu}{\sigma}\)
zs <- function(x, mu, sd) {
    z <- (x - mu)/sd
    return(z)
}
(z <- zs(45, 70, 10))
## [1] -2.5
What percentile are you?
# pnorm() gives the cumulative probability (the percentile) of a score; qnorm() is
# its inverse
paste(round(pnorm(45, mean = 70, sd = 10) * 100, 2), "%", sep = "")
## [1] "0.62%"
What is the total chance of getting something at least that far from the mean, in either direction? (That is, the chance of getting 45 or below, or a score equally far or farther above the mean.)
pnorm(z) * 2
## [1] 0.01241933
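Doubling `pnorm(z)` gives the two-tailed probability only because z is negative here. A minimal sketch of a sign-agnostic version follows; `two_tailed_p` is a hypothetical helper name, not part of the original script.

```r
# Hypothetical helper (not in the original): two-tailed probability for any z.
two_tailed_p <- function(z) {
    # Use the lower tail at -|z| so the result is the same for +z and -z.
    2 * pnorm(-abs(z))
}

two_tailed_p(-2.5)  # same value as pnorm(-2.5) * 2
two_tailed_p(2.5)   # symmetric: identical result
```

With a positive z, `pnorm(z) * 2` would exceed 1, which is why the absolute value matters.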
Write a script that generates a population of at least 10,000 numbers and samples at random 9 of them.
library(DT)  # provides datatable()
# 100 x 100 matrix so that all 10,000 draws are kept as the population
datatable(iqsample <- as.data.frame(matrix(sample(1:200, 10000, replace = TRUE), 100,
    100)))
rs <- function(d, x) {
    # d = data frame holding the population, x = number of values to draw
    sampleVector <- numeric(x)  # preallocate the result vector
    for (i in seq_len(x)) {
        row <- sample(nrow(d), 1)  # choose a random row
        col <- sample(ncol(d), 1)  # choose a random column
        sampleVector[i] <- d[row, col]  # store that row, col value
    }
    return(sampleVector)
}
(sample <- rs(iqsample, 9))
## [1] 154 79 127 117 143 175 145 4 27
Calculate by hand the sample mean. Please show your work using proper mathematical notation in LaTeX.
\(\bar{x}=\frac{1}{n}\sum_{i=1}^{n}{x_i}\)
mean(sample)
## [1] 107.8889
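With the nine values drawn above, the by-hand calculation runs as follows (the numbers apply only to this particular random sample):

\(\bar{x}=\frac{154+79+127+117+143+175+145+4+27}{9}=\frac{971}{9}\approx 107.8889\)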
Calculate by hand the sample standard deviation. \(s=\sqrt{\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{n-1}}\)
sd(sample)
## [1] 59.01153
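Worked through for the same nine values, using the shortcut that the sum of squared deviations equals the sum of squares minus \(n\bar{x}^2\):

\(\sum_{i=1}^{9}(x_i-\bar{x})^2=\sum_{i=1}^{9}x_i^2-9\bar{x}^2=132619-\frac{971^2}{9}\approx 27858.889\)
\(s=\sqrt{\frac{27858.889}{8}}=\sqrt{3482.361}\approx 59.0115\)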
Calculate by hand the standard error. \(SE=\frac{s}{\sqrt{n}}\)
(se <- sd(sample)/sqrt(length(sample)))
## [1] 19.67051
(me <- qnorm(0.975) * se)  # 95% margin of error, z * SE
## [1] 38.55349
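Worked through with this sample's numbers: the quantity \(\frac{s}{\sqrt{n}}\) is the standard error itself, and multiplying it by the 97.5th-percentile z score gives the half-width used for the 95% interval.

\(\frac{s}{\sqrt{n}}=\frac{59.01153}{\sqrt{9}}\approx 19.67051\)
\(z*\frac{s}{\sqrt{n}}=1.959964*19.67051\approx 38.5535\)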
Calculate by hand the 95% CI using the normal (z) distribution. (You can use R or tables to get the score.) \(\text{Confidence Interval}:CI=\bar{x}\pm z*\frac{s}{\sqrt{n}}\)
ci <- function(cl, data) {
    x <- (1 - cl)/2 + cl  # upper quantile, e.g. 0.975 for a 95% CI
    CI <- c(mean(data) - qnorm(x) * sd(data)/sqrt(length(data)), mean(data) + qnorm(x) *
        sd(data)/sqrt(length(data)))
    return(CI)
}
ci(0.95, sample)
## [1] 69.3354 146.4424
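Substituting this sample's mean and standard deviation, with \(z=1.959964\) from `qnorm(0.975)`:

\(CI=107.8889\pm 1.959964*\frac{59.01153}{\sqrt{9}}=107.8889\pm 38.5535\)
\(CI\approx(69.3354,\ 146.4424)\)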
Calculate by hand the 95% CI using the t distribution. (You can use R or tables to get the score.) \(\text{Confidence Interval}:CI=\bar{x}\pm t*\frac{s}{\sqrt{n}}\)
cit <- function(cl, data) {
    x <- (1 - cl)/2 + cl  # upper quantile, e.g. 0.975 for a 95% CI
    CI <- c(mean(data) - qt(x, length(data) - 1) * sd(data)/sqrt(length(data)), mean(data) +
        qt(x, length(data) - 1) * sd(data)/sqrt(length(data)))
    return(CI)
}
(citVector <- cit(0.95, sample))
## [1] 62.52861 153.24917
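The same substitution with the t score for 8 degrees of freedom, \(t=2.306004\) from `qt(0.975, 8)`:

\(CI=107.8889\pm 2.306004*\frac{59.01153}{\sqrt{9}}=107.8889\pm 45.3603\)
\(CI\approx(62.5286,\ 153.2492)\)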
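(tse <- qt(0.975, 8) * (sd(sample)/sqrt(length(sample))))  # t-based margin of error (half-width of the CI)
## [1] 45.36028
To halve the interval, the margin of error must fall to half its current value:
\(t*\frac{s}{\sqrt{n}}=\frac{45.36028}{2}=22.68014\)
Hold \(t=2.306\) and \(s=59.012\) fixed:
\(2.306*\frac{59.012}{\sqrt{n}}=22.68014\)
and solve for n:
\(59.012=\frac{22.68014}{2.306}\sqrt{n}\)
\(6=\sqrt{n}\)
\(n=36\)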
Algebraically, approximately 36 observations (27 more) are needed to halve the interval. The script below runs an experiment with the data, drawing a fresh sample of increasing size n until the confidence interval is no wider than half the original. It works out to between 22 and 25 observations.
# Find a value for n that shrinks the 95% CI to half its width
findCI.5 <- function(iVector, clevel, n) {
    dCit <- diff(iVector)  # width of the original CI
    dCit.5 <- dCit/2  # half that width (the target)
    while (dCit > dCit.5) {
        n <- n + 1  # increment n with each loop
        ciVector <- cit(clevel, rs(iqsample, n))  # CI of a fresh sample of size n
        dCit <- diff(ciVector)  # width of the new CI, compared in the loop condition
        # for testing: print(c(dCit, dCit.5))
    }
    return(n)  # return the first n at which the condition is met
}
findCI.5(citVector, 0.95, 9)  # run the function
## [1] 20
round(mean(replicate(30, findCI.5(citVector, 0.95, 9), simplify = TRUE)))  # average over 30 runs, since the result varies
## [1] 24
I was not sure whether this question called for solving for n algebraically from the equation above or for this script. I chose the script because the sample mean and standard deviation that determine the confidence interval change with every new sample. The script therefore tries successive values of n and stops when the confidence interval is no wider than half the initial interval. The resulting n differs from run to run (because the sample statistics change with each sample), with a mean between 22 and 25.
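A third option sits between the algebra and the simulation: keep the sample standard deviation fixed, as the algebra does, but let the t score shrink as n grows. The sketch below assumes the standard deviation stays at 59.01153, which a real new sample would not guarantee; `target` and `n_halved` are names introduced here for illustration.

```r
# Sketch: smallest n whose t-based margin of error is at most half the original,
# holding the sample sd fixed (an assumption; a new sample would have a new sd).
s <- 59.01153                               # sd of the original 9 observations
target <- qt(0.975, 8) * s / sqrt(9) / 2    # half the original margin of error

n_halved <- 9
while (qt(0.975, n_halved - 1) * s / sqrt(n_halved) > target) {
    n_halved <- n_halved + 1                # try the next sample size
}
n_halved
```

Because the t score falls as the degrees of freedom rise, this search stops below the 36 obtained when t is held at 2.306.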
(n <- round((20000/(1000/qnorm(0.975)))^2))
## [1] 1537
(n <- round((20000/(100/qnorm(0.975)))^2))
## [1] 153658
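Both lines appear to apply the usual sample-size formula for a target margin of error \(E\) at 95% confidence, reading 20000 as the standard deviation \(\sigma\) and 1000 (then 100) as \(E\). That reading is an assumption drawn from the code, since the question text is not reproduced here.

\(n=\left(\frac{z*\sigma}{E}\right)^2\)
\(n=\left(\frac{1.959964*20000}{1000}\right)^2\approx 1536.6\)
\(n=\left(\frac{1.959964*20000}{100}\right)^2\approx 153658.4\)

Cutting the margin of error by a factor of 10 multiplies the required sample size by 100.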
Write a script to test the accuracy of the confidence interval calculation as in Module 4.3. But with a few differences: (1) Test the 99% CI, not the 95% CI. (2) Each sample should be only 20 individuals, which means you need to use the t distribution to calculate your 99% CI. (3) Run 1000 complete samples rather than 100. (4) Your population distribution must be different from that used in the lesson, although anything else is fine, including any of the other continuous distributions we’ve discussed so far.
# 1. Set how many times we do the whole thing
nruns <- 1000  # change (3)
# 2. Set how many individuals to draw in each run
nsamples <- 20  # change (2)
# 3. Create an empty matrix to hold our summary data: the mean and the lower and
# upper CI bounds.
sample_summary <- matrix(NA, nruns, 3)
# 4. Run the loop
for (j in 1:nruns) {
    sampler <- rep(NA, nsamples)
    # 5. Our sampling loop. Population: a t distribution with 19 degrees of freedom
    # (true mean 0)
    for (i in 1:nsamples) {
        sampler[i] <- rt(1, 19)
    }
    # An example using chronotypes (doesn't actually test the CLT but is an
    # interesting topic). Twenty-five percent show a chronotype earlier than 2:24,
    # 50% fall between 2:24 and 4:15, and another 25% show a chronotype later than
    # 4:15.
    # source: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0178782#sec008
    # ct <- runif(1, 1, 100)
    # if (ct <= 25) {  # a morning lark (MSF < 2:24)
    #     sampler[i] <- 0
    # } else if (ct <= 75) {  # a bear type (2:24 < MSF < 4:15)
    #     sampler[i] <- 0.5
    # } else {  # a night owl (4:15 < MSF)
    #     sampler[i] <- 1
    # }
    # 7. Finally, calculate the mean and 99% CIs for each sample and save them in the
    # correct row of our sample_summary matrix
    sample_summary[j, 1] <- mean(sampler)  # mean
    standard_error <- sd(sampler)/sqrt(nsamples)  # standard error
    sample_summary[j, 2] <- mean(sampler) - qt(0.995, length(sampler) - 1) * standard_error  # lower 99% CI bound, changes (1,2)
    sample_summary[j, 3] <- mean(sampler) + qt(0.995, length(sampler) - 1) * standard_error  # upper 99% CI bound, changes (1,2)
}
counter <- 0
for (j in 1:nruns) {
    # If the true mean (0) is above the lower CI bound and below the upper CI bound:
    if (0 > sample_summary[j, 2] && 0 < sample_summary[j, 3]) {
        counter <- counter + 1
    }
}
counter/nruns
## [1] 0.982
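The same coverage test can be sketched more compactly as a function, which makes it easy to swap in another population. This is an illustrative rewrite rather than the script used above; `coverage`, `rdist`, and `true_mean` are hypothetical names, and the seed is arbitrary.

```r
# Hypothetical helper: fraction of t-based CIs that contain the true mean.
coverage <- function(nruns, nsamples, cl, rdist, true_mean) {
    tcrit <- qt(1 - (1 - cl)/2, nsamples - 1)    # two-sided critical value
    hits <- replicate(nruns, {
        x <- rdist(nsamples)                     # one sample from the population
        half_width <- tcrit * sd(x)/sqrt(nsamples)
        abs(mean(x) - true_mean) < half_width    # TRUE if the CI covers the mean
    })
    mean(hits)                                   # proportion of covering intervals
}

set.seed(1)  # arbitrary seed, only for reproducibility
coverage(1000, 20, 0.99, function(n) rt(n, 19), true_mean = 0)
```

Passing the population as a function (`rdist`) keeps the interval logic separate from the choice of distribution, so a different population only needs a different generator and its true mean.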