1. You get back your exam from problem 3.d of Homework 3, and you got a 45. What is your z score?
  2. What percentile are you?
  3. What is the total chance of getting something at least that far from the mean, in either direction? (Ie, the chance of getting 45 or below or equally far or farther above the mean.)
  1. Write a script that generates a population of at least 10,000 numbers and samples at random 9 of them.
  2. Calculate by hand the sample mean. Please show your work using proper mathematical notation using latex.
  3. Calculate by hand the sample standard deviation.
  4. Calculate by hand the standard error.
  5. Calculate by hand the 95% CI using the normal (z) distribution. (You can use R or tables to get the score.)
  6. Calculate by hand the 95% CI using the t distribution. (You can use R or tables to get the score.)
  1. Explain why 2.e is incorrect.
  2. In a sentence or two each, explain what’s wrong with each of the wrong answers in Module 4.4, “Calculating percentiles and scores,” and suggest what error in thinking might have led someone to choose that answer. (http://www.nickbeauchamp.com/comp_stats_NB/compstats_04-04.html)
  1. Based on 2, calculate how many more individuals you would have to sample from your population to shrink your 95% CI by 1/2 (ie, reduce the interval to half the size). Please show your work.
  2. Say you want to know the average income in the US. Previous studies have suggested that the standard deviation of your sample will be $20,000. How many people do you need to survey to get a 95% confidence interval of ± $1,000? How many people do you need to survey to get a 95% CI of ± $100?
  1. Write a script to test the accuracy of the confidence interval calculation as in Module 4.3. But with a few differences: (1) Test the 99% CI, not the 95% CI. (2) Each sample should be only 20 individuals, which means you need to use the t distribution to calculate your 99% CI. (3) Run 1000 complete samples rather than 100. (4) Your population distribution must be different from that used in the lesson, although anything else is fine, including any of the other continuous distributions we’ve discussed so far.

Homework 4

1

a

You get back your exam from problem 3.d of Homework 3, and you got a 45. What is your z score?
Because he is lazy, your teacher has assigned grades for an exam at random, and to help hide his deception he has given the fake grades a normal distribution with a mean of 70 and a standard deviation of 10.

\(z = \frac{x - \mu}{\sigma}\)

zs <- function(x, mu, sd) {
    z <- (x - mu)/sd
    return(z)
}
(z <- zs(45, 70, 10))
## [1] -2.5

b

What percentile are you?

paste(round(pnorm(45, mean = 70, sd = 10) * 100, 2), "%", sep = "")
## [1] "0.62%"

c

What is the total chance of getting something at least that far from the mean, in either direction? (Ie, the chance of getting 45 or below or equally far or farther above the mean.)

pnorm(z) * 2
## [1] 0.01241933
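
As a cross-check on the doubling shortcut, the two tails can also be added explicitly. This is a minimal sketch that assumes the same normal distribution (mean 70, standard deviation 10) given in the problem; the names `lower_tail`, `upper_tail` and `two_tailed` are illustrative and do not appear in the original script.

```r
mu <- 70
sigma <- 10
x <- 45
dist_from_mean <- abs(x - mu)  # how far the score is from the mean

# Probability of a score at or below mu - distance (45 or lower)
lower_tail <- pnorm(mu - dist_from_mean, mean = mu, sd = sigma)
# Probability of a score at or above mu + distance (95 or higher)
upper_tail <- pnorm(mu + dist_from_mean, mean = mu, sd = sigma, lower.tail = FALSE)

two_tailed <- lower_tail + upper_tail
two_tailed
```

Because the normal distribution is symmetric, the two tails are equal, which is why doubling `pnorm(z)` works when z is negative.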

2

a

Write a script that generates a population of at least 10,000 numbers and samples at random 9 of them.

library(DT)  # provides datatable()
# 10,000 values arranged as 1,000 rows x 10 columns
datatable(iqsample <- as.data.frame(matrix(sample(1:200, 10000, replace = TRUE), 1000, 
    10)))
rs <- function(d, x) {
    # d = data frame holding the population, x = number of values to draw
    sampleVector <- numeric(0)  # start with an empty vector
    for (i in seq_len(x)) {
        row <- sample(nrow(d), 1)  # choose a random row number
        col <- sample(ncol(d), 1)  # choose a random column number
        sampleVector <- append(sampleVector, d[row, col])  # add that cell's value to the vector
    }
    return(sampleVector)
}
(sample <- rs(iqsample, 9))
## [1] 154  79 127 117 143 175 145   4  27
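
For comparison, the same kind of draw can be sketched with the population held in a plain vector. This is only an illustration: the names `population` and `draw` and the seed are assumptions, not part of the original script, and unlike `rs()` this version cannot pick the same population member twice.

```r
set.seed(1)  # assumed seed, used only so the sketch is reproducible

# A population of 10,000 integer scores between 1 and 200
population <- sample(1:200, 10000, replace = TRUE)

# Draw 9 members of the population at random, without replacement
draw <- sample(population, 9)
draw
```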

b

Calculate by hand the sample mean. Please show your work using proper mathematical notation using latex.
\(\bar{x}=\frac{1}{N}\sum_{i=1}^{N}{x_i}\)

mean(sample)
## [1] 107.8889
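
Written out for the nine sampled values printed in 2a, the calculation is:
\(\bar{x}=\frac{154+79+127+117+143+175+145+4+27}{9}=\frac{971}{9}\approx 107.89\)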

c

Calculate by hand the sample standard deviation. \(s=\sqrt{\sum_{i=1}^{n}\frac{(x_i-\bar{x})^2}{n-1}}\)

sd(sample)
## [1] 59.01153
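
The steps behind `sd()` can be spelled out one at a time. The sketch below hard-codes the nine values printed in 2a (an assumption, since a fresh run of the script would draw different numbers); the names `x_bar` and `sq_dev` are illustrative.

```r
x <- c(154, 79, 127, 117, 143, 175, 145, 4, 27)  # sample printed in 2a
n <- length(x)

x_bar <- sum(x) / n                 # sample mean
sq_dev <- (x - x_bar)^2             # squared deviation of each value from the mean
s <- sqrt(sum(sq_dev) / (n - 1))    # divide by n - 1, then take the square root
s
```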

d

Calculate by hand the standard error. \(SE=\frac{s}{\sqrt{N}}\)

(se <- sd(sample)/sqrt(length(sample)))
## [1] 19.67051

e

Calculate by hand the 95% CI using the normal (z) distribution. (You can use R or tables to get the score.) \(\text{Confidence Interval}:CI=\bar{x}\pm z*\frac{s}{\sqrt{N}}\)

ci <- function(cl, data) {
    x <- (1 - cl)/2 + cl
    CI <- c(mean(data) - qnorm(x) * sd(data)/sqrt(length(data)), mean(data) + qnorm(x) * 
        sd(data)/sqrt(length(data)))
    return(CI)
}
ci(0.95, sample)
## [1]  69.3354 146.4424

f

Calculate by hand the 95% CI using the t distribution. (You can use R or tables to get the score.) \(\text{Confidence Interval}:CI=\bar{x}\pm t*\frac{s}{\sqrt{N}}\)

cit <- function(cl, data) {
    x <- (1 - cl)/2 + cl
    CI <- c(mean(data) - qt(x, length(data) - 1) * sd(data)/sqrt(length(data)), mean(data) + 
        qt(x, length(data) - 1) * sd(data)/sqrt(length(data)))
    return(CI)
}
(citVector <- cit(0.95, sample))
## [1]  62.52861 153.24917
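
To see where the difference between the two intervals comes from, the half-width can be split into the standard error and the critical value. This sketch hard-codes the sample printed in 2a (an assumption); `z_crit` and `t_crit` are illustrative names.

```r
x <- c(154, 79, 127, 117, 143, 175, 145, 4, 27)  # sample printed in 2a
n <- length(x)

se <- sd(x) / sqrt(n)              # standard error of the mean
z_crit <- qnorm(0.975)             # critical value from the normal distribution
t_crit <- qt(0.975, df = n - 1)    # critical value from t with n - 1 = 8 df

# Half-width of each 95% interval
c(z_margin = z_crit * se, t_margin = t_crit * se)
```

Both intervals share the same standard error; only the critical value changes, and the t value is larger, so the t interval is wider.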

3

a

2.e is incorrect because the z interval treats the standard deviation as known, whereas here it is estimated by \(s\) from a sample of only 9, well below the usual n>=30 rule of thumb for treating the sampling distribution of the mean as normal. With so few observations the mean is sensitive to outliers, the sample often exhibits skew, and \(s\) is itself an uncertain estimate, so the z interval is too narrow. The t distribution with \(n-1\) degrees of freedom is recommended for such data sets because its heavier tails widen the interval to adjust for these factors.

b

3±2∗1.5333
  1. The critical t-value is looked up with the whole alpha level (1 minus the confidence level, or .10) in one tail instead of splitting it between the two tails, and the degrees of freedom (\(n-1\)) are incorrectly set equal to n. In addition, the sample standard deviation (2) is used in place of the standard error \(SE=\frac{s}{\sqrt{N}}\).
3±1∗1.5333
  1. The critical t-value is looked up with the whole alpha level (.10) in one tail, and the degrees of freedom (\(n-1\)) are incorrectly set equal to n. The standard error \(SE=\frac{s}{\sqrt{N}}\) is correct.
3±2∗1.6383
  1. The critical t-value is looked up with the whole alpha level (.10) in one tail, although the degrees of freedom (\(n-1\)) are correct. The sample standard deviation is used in place of the standard error \(SE=\frac{s}{\sqrt{N}}\).
3±1∗2.3533
  1. Correct answer.
3±1∗2.132
  1. The tail area \(\frac{1-\text{CL}}{2}\) is correct, but the degrees of freedom are incorrectly set equal to n. The standard error \(SE=\frac{s}{\sqrt{N}}\) is correct.
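
The multipliers in these answer choices can be approximately reproduced with `qt()`. The sketch below assumes, based only on the numbers in the choices, that the module problem asks for a 90% CI from a sample of n = 4; that setup is an inference and is not stated here, and the element names are illustrative.

```r
n <- 4       # assumed sample size
cl <- 0.90   # assumed confidence level

crit <- c(
  one_tail_df_n  = qt(cl, df = n),                    # whole alpha in one tail, df = n
  one_tail_df_n1 = qt(cl, df = n - 1),                # whole alpha in one tail, df = n - 1
  two_tail_df_n1 = qt(1 - (1 - cl) / 2, df = n - 1),  # alpha split in two, df = n - 1
  two_tail_df_n  = qt(1 - (1 - cl) / 2, df = n)       # alpha split in two, df = n
)
round(crit, 4)
```

Under those assumptions, each wrong multiplier corresponds to one combination of the tail-area and degrees-of-freedom mistakes described above.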

4

a

Based on 2, calculate how many more individuals you would have to sample from your population to shrink your 95% CI by 1/2 (ie, reduce the interval to half the size). Please show your work.
If we ignore the change in \(\bar{x}\) and \(s\) with each additional observation and approach the question algebraically, we can set the half-width of the interval to 1/2 its current value, which reduces the interval to half:
(tse <- qt(0.975, 8) * (sd(sample)/sqrt(length(sample))))
## [1] 45.36028

\(ME=t*\frac{s}{\sqrt{n}}=\frac{45.36028}{2}\)
Here \(ME\) is the margin of error, the half-width of the interval (the standard error multiplied by the critical t-value).
\(2.306*\frac{59.012}{\sqrt{n}}=22.68014\)
and solve for n:
\(59.012=\frac{22.68014}{2.306}\sqrt{n}\)
\(6=\sqrt{n}\)
\(n=36\)
Algebraically, holding \(t\), \(\bar{x}\) and \(s\) fixed, approximately 36 observations (27 more) are needed to halve the interval. The script below runs an experiment with the data by incrementing n with each sample until the confidence interval is half the original. It works out to between 22 and 25 observations.

# Find a value for n that shrinks the 95%CI by 1/2
findCI.5 <- function(iVector, clevel, n) {
    dCit <- diff(iVector)  # width of the original CI
    dCit.5 <- dCit/2  # target: 1/2 the original width
    while (dCit > dCit.5) {
        # keep going until the CI is no wider than the target
        n <- n + 1  # increment n with each loop
        ciVector <- cit(clevel, rs(iqsample, n))  # CI from a fresh sample of size n
        dCit <- diff(ciVector)  # width of the new CI, compared in the loop condition
        # for testing: print(c(dCit, dCit.5))
    }
    return(n)  # return the first n at which the condition is met
}
findCI.5(citVector, 0.95, 9)  #run the function
## [1] 20
round(mean(replicate(30, findCI.5(citVector, 0.95, 9), simplify = TRUE)))  # average over 30 runs, since the result varies
## [1] 24
I was not sure whether this question was looking for an algebraic solution for n based on the equation for SE or for this script. I assumed the script, because the mean and sd that determine the confidence interval change with each new sample. The script therefore tries successive values of n, drawing a fresh sample each time, and stops when the confidence interval becomes smaller than half the initial confidence interval. The resulting n is different each time (because the mean and sd change with each sample), with an average between 22 and 25.
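
One middle ground between the fixed-t algebra and the random-sample script is to hold \(s\) at its observed value but let the critical t-value shrink as n grows. The sketch below makes that assumption explicit; `half_width` and `n_needed` are hypothetical names that do not appear in the original script.

```r
s <- 59.01153   # sample standard deviation printed in 2c, held fixed (assumption)
n0 <- 9         # original sample size

# Half-width of the t-based CI for a sample of size n
half_width <- function(n, s, cl = 0.95) {
  qt(1 - (1 - cl) / 2, df = n - 1) * s / sqrt(n)
}

target <- half_width(n0, s) / 2   # half of the original half-width

# Increase n until the half-width is no larger than the target
n_needed <- n0
while (half_width(n_needed, s) > target) {
  n_needed <- n_needed + 1
}
n_needed
```

Under these assumptions the loop stops below the 36 obtained with a fixed t-value, because the critical value itself falls as the degrees of freedom increase.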

b

Say you want to know the average income in the US. Previous studies have suggested that the standard deviation of your sample will be $20,000. How many people do you need to survey to get a 95% confidence interval of ± $1,000? How many people do you need to survey to get a 95% CI of ± $100?
\(1.95996\frac{s}{\sqrt{n}}=ME\), where \(ME\) is the desired margin of error (the ± half-width of the CI)
\(n=\left(\frac{20000}{\frac{ME}{1.95996}}\right)^2\)
(n <- ceiling((20000/(1000/qnorm(0.975)))^2))  # round up so the margin is at most $1,000
## [1] 1537
(n <- ceiling((20000/(100/qnorm(0.975)))^2))  # round up so the margin is at most $100
## [1] 153659
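
The two calculations follow the same pattern, so they can be wrapped in a small helper. This is a sketch only; `n_for_margin` is a hypothetical name, and it assumes the z-based formula above with the population standard deviation taken as given.

```r
# Smallest n for which z * sd / sqrt(n) is no larger than the desired margin
n_for_margin <- function(sd, margin, cl = 0.95) {
  z <- qnorm(1 - (1 - cl) / 2)   # two-sided critical value
  ceiling((z * sd / margin)^2)   # round up to a whole person
}

n_for_margin(20000, 1000)
n_for_margin(20000, 100)
```

Because n depends on the square of sd / margin, shrinking the margin by a factor of 10 multiplies the required sample size by about 100.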

5

Write a script to test the accuracy of the confidence interval calculation as in Module 4.3. But with a few differences: (1) Test the 99% CI, not the 95% CI. (2) Each sample should be only 20 individuals, which means you need to use the t distribution to calculate your 99% CI. (3) Run 1000 complete samples rather than 100. (4) Your population distribution must be different from that used in the lesson, although anything else is fine, including any of the other continuous distributions we’ve discussed so far.

# 1. Set how many times we do the whole thing
nruns <- 1000  # change (3)
# 2. Set how many individuals to draw in each sample
nsamples <- 20  # change (2)
# 3. Create an empty matrix to hold our summary data: the mean and the upper and
# lower CI bounds.
sample_summary <- matrix(NA, nruns, 3)
# 4. Run the loop
for (j in 1:nruns) {
    sampler <- rep(NA, nsamples)
    # 5. Our sampling loop. The population is a t distribution with 19 degrees of
    # freedom, whose true mean is 0: change (4)
    for (i in 1:nsamples) {
        sampler[i] <- rt(1, 19)
    }
    # An example using chronotypes (doesn't actually test the CLT but is an
    # interesting topic). Twenty-five percent show a chronotype earlier than 2:24,
    # 50% fall between 2:24 and 4:15, and another 25% show a chronotype later than 4:15.
    # source: http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0178782#sec008
    # ct <- runif(1, 1, 100)
    # if (ct <= 25) {  # A morning lark (MSF < 2:24)
    #     sampler[i] <- 0
    # } else {
    #     if (ct > 25 && ct <= 75) {  # A bear type (2:24 < MSF < 4:15)
    #         sampler[i] <- .5
    #     } else {  # A night owl (4:15 < MSF)
    #         sampler[i] <- 1
    #     }
    # } }
    
    # 7. Finally, calculate the mean and 99% CI's for each sample and save it in the
    # correct row of our sample_summary matrix
    sample_summary[j, 1] <- mean(sampler)  # mean
    standard_error <- sd(sampler)/sqrt(nsamples)  # standard error
    sample_summary[j, 2] <- mean(sampler) - qt(0.995, length(sampler) - 1) * standard_error  # lower 99% CI bound changes (1,2)
    sample_summary[j, 3] <- mean(sampler) + qt(0.995, length(sampler) - 1) * standard_error  # upper 99% CI bound changes (1,2)
}
counter <- 0
for (j in 1:nruns) {
    # If the true population mean (0) is above the lower CI bound and below the upper CI bound:
    if (0 > sample_summary[j, 2] && 0 < sample_summary[j, 3]) {
        counter <- counter + 1
    }
}
counter/nruns
## [1] 0.982
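
The same coverage check can be sketched more compactly with `replicate()`. This is an illustrative alternative, not the lesson's script: `ci_covers`, the `rpop` argument and the seed are assumptions, and the population is the same t distribution with 19 degrees of freedom (true mean 0) used above.

```r
set.seed(42)  # assumed seed, used only so the sketch is reproducible

# Draw one sample of size n from rpop and report whether its CI contains true_mean
ci_covers <- function(n, cl, rpop, true_mean) {
  x <- rpop(n)
  margin <- qt(1 - (1 - cl) / 2, df = n - 1) * sd(x) / sqrt(n)
  abs(mean(x) - true_mean) < margin
}

# Share of 1000 samples of 20 whose 99% t interval contains the true mean
coverage <- mean(replicate(1000, ci_covers(20, 0.99, function(n) rt(n, df = 19), 0)))
coverage
```

Passing the population as a function makes it easy to swap in another continuous distribution, provided `true_mean` is changed to match.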