In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.
Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities using the R chunks below. Here, x denotes the IQ of the randomly selected person.
# Probability of an IQ greater than 120; lower.tail = FALSE returns the upper tail
pnorm(q = 120, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.09121122
# Probability of an IQ less than 120; the default lower.tail = TRUE already returns the lower tail
pnorm(q = 120, mean = 100, sd = 15)
## [1] 0.9087888
# Find P(IQ < 120) and P(IQ > 80) separately
pnorm(q = 120, mean = 100, sd = 15)
## [1] 0.9087888
pnorm(q = 80, mean = 100, sd = 15, lower.tail = FALSE)
## [1] 0.9087888
Each of the two probabilities above is 0.9088, but neither one is the probability of an IQ between 80 and 120. That probability is \(P(x < 120) - P(x < 80) = 0.9088 - 0.0912 = 0.8176\), so the probability that a randomly selected person will have an IQ between 80 and 120 is about 81.76%.
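One way to check an "in between" probability is to subtract two cumulative probabilities. The chunk below is a minimal sketch of that idea for the same normal distribution (mean 100, standard deviation 15); the object names `p_below_120`, `p_below_80`, and `p_between` are illustrative and not part of the original project.

```r
# P(80 < X < 120) for X ~ Normal(100, 15), computed as P(X < 120) - P(X < 80)
p_below_120 <- pnorm(q = 120, mean = 100, sd = 15)
p_below_80  <- pnorm(q = 80,  mean = 100, sd = 15)

# Difference of the two lower-tail probabilities
p_between <- p_below_120 - p_below_80
p_between
```

Because 80 and 120 are the same distance from the mean, the two tails outside the range are equal, and the result is one minus twice the upper-tail probability found earlier.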
# Store the values needed for the standard error of the mean, the degrees of freedom, and the t-score used to calculate the probability
mu <- 100
sigma <- 15
n <- 12
x <- 120
#Find the standard error of mean and store it as an object for ease of access
SEM <- sigma / sqrt(n)
#Find the degree of freedom
df <- n - 1
#Find the t-score
t <- (x - mu) / SEM
# Using pt(), calculate the probability that the mean IQ is greater than 120 (1 - pt() gives the upper tail)
prob <- 1 - pt(t, df)
#To print answer...
print(prob)
## [1] 0.0003709099
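Since IQ scores are assumed to be normally distributed with a known \(\sigma = 15\), the mean of a sample of 12 people is itself normally distributed with standard error \(\sigma / \sqrt{n}\). Under that assumption, the same tail probability can also be sketched with pnorm() instead of pt(). The names `x_bar`, `sem`, and `p_normal` are illustrative.

```r
# Values taken from the chunk above
mu    <- 100   # population mean
sigma <- 15    # known population standard deviation
n     <- 12    # sample size
x_bar <- 120   # sample mean of interest

# Standard error of the mean
sem <- sigma / sqrt(n)

# Upper-tail probability that the sample mean exceeds 120,
# using the normal sampling distribution of the mean
p_normal <- pnorm(q = x_bar, mean = mu, sd = sem, lower.tail = FALSE)
p_normal
```

This tail probability comes out smaller than the pt() result because a t distribution with 11 degrees of freedom has heavier tails than the normal distribution.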
Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean and sample size of the PulseRate variable in this dataset and answer the question that follows below.
# Store the NSCC student dataset in environment
nscc <- read.csv("nscc_student_data.csv")
# Find the mean pulse rate of this sample
mean(nscc$PulseRate, na.rm = TRUE)
## [1] 73.47368
# Find the sample size of pulse rates (hint: it's how many non-NA values there are)
# sum(!is.na()) gives the actual sample size, not counting NA values
sum(!is.na(nscc$PulseRate))
## [1] 38
Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?
I find it unlikely that the population mean pulse rate for all NSCC students is exactly equal to the sample mean found, because a sample mean varies from sample to sample and this one is based on only 38 students.
If we assume the standard deviation of pulse rates for all NSCC students is \(\sigma = 14\), construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude in a complete sentence below.
(Note: we can create a valid confidence interval here since n > 30)
# Store mean
mean <- mean(nscc$PulseRate, na.rm = TRUE)
# Calculate lower bound of 95% CI; both bounds use the form x-bar +/- 1.96*(sigma/sqrt(n))
mean - 1.96*(14/sqrt(38))
## [1] 69.02233
# Calculate upper bound of 95% CI
mean + 1.96*(14/sqrt(38))
## [1] 77.92504
I am 95% confident that the mean pulse rate for all NSCC students is between 69.02 and 77.93 bpm.
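The interval calculation can also be wrapped in a small function that looks up the critical value with qnorm() rather than typing 1.96 by hand. This is only a sketch: `z_interval` is a hypothetical helper, not a built-in R function, and the inputs are the sample mean, \(\sigma\), and sample size reported above.

```r
# Hypothetical helper: z-based confidence interval for a mean when sigma is known
z_interval <- function(x_bar, sigma, n, conf_level = 0.95) {
  alpha  <- 1 - conf_level
  z_crit <- qnorm(1 - alpha / 2)        # two-sided critical value
  margin <- z_crit * sigma / sqrt(n)    # margin of error
  c(lower = x_bar - margin, upper = x_bar + margin)
}

# Summary values reported earlier: mean 73.47368, sigma 14, n 38
ci_95 <- z_interval(x_bar = 73.47368, sigma = 14, n = 38)
ci_95
```

Passing a different `conf_level` reuses the same steps for any other confidence level.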
Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.
# Calculate lower bound of 99% CI (the critical value for 99% confidence is 2.576)
mean - 2.576*(14/sqrt(38))
## [1] 67.62333
# Calculate upper bound of 99% CI
mean + 2.576*(14/sqrt(38))
## [1] 79.32404
I am 99% confident that the mean pulse rate of all NSCC students is between 67.62 and 79.32 bpm.
Describe and explain the difference you observe in your confidence interval results for questions 5 and 6.
The 99% confidence interval is wider than the 95% confidence interval. Being more confident that the interval captures the population mean requires a larger critical value, so the interval extends more standard errors on each side of the sample mean.
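To see how the confidence level drives the width, the margin of error can be computed for several levels at once. The sketch below assumes the same \(\sigma = 14\) and n = 38 used above; the 90% level and the object names `conf_levels`, `z_crit`, and `margins` are added only for illustration.

```r
sigma <- 14   # assumed population standard deviation
n     <- 38   # sample size of pulse rates

# Confidence levels to compare (90% is included only for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical values and the resulting margins of error
z_crit  <- qnorm(1 - (1 - conf_levels) / 2)
margins <- z_crit * sigma / sqrt(n)

data.frame(conf_level = conf_levels, z_crit = z_crit, margin = margins)
```

The standard error is the same in every row, so the growth in the margin comes entirely from the critical value.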
In the Fall 2019 semester of the 2019-20 academic year, the average NSCC student took 12.3 credits with \(\sigma = 3.4\). I’m curious if that average differed among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average credits last year differs from the Fall 2019 average.
\(H_0: \mu = 12.3\) \(H_A: \mu \neq 12.3\)
# Calculate mean of Credits variable
mean(nscc$Credits)
## [1] 11.775
# Calculate sample size of Credits variable
sum(!is.na(nscc$Credits))
## [1] 40
# Lower bound of 95% CI
11.775 - 1.96*(3.4/sqrt(40))
## [1] 10.72133
# Upper bound of 95% CI
11.775 + 1.96*(3.4/sqrt(40))
## [1] 12.82867
Because the hypothesized mean of 12.3 falls inside the 95% confidence interval of \(10.72 < \mu < 12.83\), we fail to reject the null hypothesis.
There is not sufficient evidence to conclude that the mean credits of NSCC students last year differs from the Fall 2019 mean of 12.3 credits.
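The decision rule for a two-sided test by confidence interval can be written out explicitly: reject \(H_0\) only if the hypothesized mean falls outside the interval. The chunk below is a sketch using the summary values from this question; `mu_0`, `ci`, and `reject_h0` are illustrative names.

```r
mu_0  <- 12.3     # hypothesized mean (Fall 2019 average)
x_bar <- 11.775   # sample mean of Credits
sigma <- 3.4      # assumed population standard deviation
n     <- 40       # sample size of Credits

# 95% confidence interval for the mean
margin <- qnorm(0.975) * sigma / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)

# Reject H0 only when mu_0 lies outside the interval
reject_h0 <- mu_0 < ci[["lower"]] || mu_0 > ci[["upper"]]
reject_h0
```

Here `reject_h0` is FALSE, which matches the decision to fail to reject the null hypothesis.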
NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by the p-value method to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults. Recall the assumption that \(\sigma = 14\) for NSCC student pulse rates.
\(H_0: \mu = 72\) \(H_A: \mu > 72\)
# Calculate mean of PulseRate variable, we also store the mean as an object for ease of use in calculating pnorm...
mean_pr <- mean(nscc$PulseRate, na.rm = TRUE)
# Confirm the sample size; sum(!is.na()) counts only the non-NA values
sum(!is.na(nscc$PulseRate))
## [1] 38
# p-value: probability of a sample mean at least this large by random chance if the pop mean was indeed 72 bpm
pnorm(q = mean_pr, mean = 72, sd = 14/sqrt(38), lower.tail = FALSE)
## [1] 0.2582061
Based on the p-value of 0.2582 > 0.05, we fail to reject the null hypothesis.
There is not sufficient evidence to support the claim that NSCC students have a mean pulse rate higher than the national average of 72 bpm.
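The same p-value can be reached by first standardizing the sample mean into a z test statistic and then taking the upper tail of the standard normal distribution. This is a sketch using the rounded sample mean reported above; `z_stat` and `p_value` are illustrative names.

```r
x_bar <- 73.47368   # sample mean pulse rate (rounded, from the output above)
mu_0  <- 72         # hypothesized mean under H0
sigma <- 14         # assumed population standard deviation
n     <- 38         # sample size of pulse rates

# z test statistic: how many standard errors the sample mean is above mu_0
z_stat <- (x_bar - mu_0) / (sigma / sqrt(n))

# Right-tailed p-value, since H_A is mu > 72
p_value <- pnorm(z_stat, lower.tail = FALSE)
c(z = z_stat, p_value = p_value)
```

A sample mean less than one standard error above 72 is not unusual under \(H_0\), which is why the p-value is well above 0.05.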