Project #4 - Sampling Distributions

Purpose

In this project, students will demonstrate their understanding of the normal distribution, sampling distributions, confidence intervals, and hypothesis tests.

Question 1

Assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities. Here, x denotes the IQ of the randomly selected person.
a. P(x > 140)

# Probability that a randomly selected person has an IQ > 140
1 - pnorm(140, mean = 100, sd = 15)

## [1] 0.003830381

The probability that a randomly selected person has an IQ greater than 140 is about 0.4%.

b. P(x < 110)

# Probability that a randomly selected person has an IQ < 110
pnorm(110, mean = 100, sd = 15)

## [1] 0.7475075

The probability that a randomly selected person has an IQ less than 110 is about 74.8%.

c. What is the probability that a randomly selected student will have an IQ between 80 and 120?

# Probability that a randomly selected student has 80 < IQ < 120
pnorm(120, 100, 15) - pnorm(80, 100, 15)

## [1] 0.8175776

The probability that a randomly selected student has an IQ between 80 and 120 is about 82%.
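The three probabilities in Question 1 can be cross-checked by standardizing each IQ value to a z-score, z = (x - 100) / 15, and using the standard normal distribution. The following is a minimal sketch; the helper name `z_score` is introduced here for illustration only and is not part of the assignment.

```r
# Hypothetical helper: convert an IQ value to a z-score
z_score <- function(x, mu = 100, sigma = 15) (x - mu) / sigma

# P(x > 140): upper tail of the standard normal
p_above_140 <- pnorm(z_score(140), lower.tail = FALSE)

# P(x < 110): lower tail of the standard normal
p_below_110 <- pnorm(z_score(110))

# P(80 < x < 120): difference of two lower-tail probabilities
p_between <- pnorm(z_score(120)) - pnorm(z_score(80))

print(c(p_above_140, p_below_110, p_between))
```

These values should agree with the `pnorm()` calls above that pass the mean and standard deviation directly.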

Question 2

Continue to assume IQ scores are normally distributed with a mean of 100 and a standard deviation of 15. If a person is randomly selected, find each of the requested probabilities.
a. What is the probability that a randomly selected student will have an IQ greater than 110?

# Probability that a randomly selected person has an IQ > 110
1 - pnorm(110, 100, 15)

## [1] 0.2524925

The probability that a randomly selected person has an IQ greater than 110 is about 25%.

b. Suppose that a random sample of 12 students is selected. What is the probability that their mean IQ is greater than 110?

# Probability that the mean IQ of 12 randomly selected students is > 110
# (the standard error of the mean is 15/sqrt(12))
1 - pnorm(110, 100, 15/sqrt(12))

## [1] 0.01046067

The probability that a random sample of 12 students has a mean IQ greater than 110 is about 1%.
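The drop from about 25% to about 1% comes from the sampling distribution of the mean, whose standard deviation (the standard error) is 15/sqrt(12), roughly 4.33, rather than 15. The sketch below recomputes the analytic answer and compares it with a simulation; the seed, the number of replications, and all variable names are arbitrary choices made for illustration.

```r
mu <- 100      # population mean IQ
sigma <- 15    # population standard deviation
n <- 12        # sample size

# Standard error of the sample mean
se <- sigma / sqrt(n)

# Analytic probability that the sample mean exceeds 110
p_exact <- pnorm(110, mean = mu, sd = se, lower.tail = FALSE)

# Simulation check: each row of the matrix is one sample of size n
set.seed(42)   # arbitrary seed, only for reproducibility
reps <- 20000
samples <- matrix(rnorm(reps * n, mean = mu, sd = sigma), ncol = n)
p_sim <- mean(rowMeans(samples) > 110)

print(c(exact = p_exact, simulated = p_sim))
```

The simulated proportion should land close to the analytic value, with some run-to-run noise.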

Question 3

Load and store the sample NSCC Student Dataset using the read.csv() function. Find the mean, standard deviation, and sample size of the PulseRate variable in this dataset. Do you think it is likely or unlikely that the population mean pulse rate for all NSCC students is exactly equal to that sample mean found?

 nscc_data <- read.csv("C:/Users/selma/Desktop/Stats/nscc_data.csv")
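The mean, standard deviation, and sample size requested in this question could be computed along the following lines. This is only a sketch: `demo_data` is a made-up stand-in so the snippet runs on its own, and its values are not taken from the NSCC dataset. With the real data, `nscc_data$PulseRate` would replace `demo_data$PulseRate`.

```r
# Made-up stand-in for nscc_data; these values are NOT from the NSCC dataset
demo_data <- data.frame(PulseRate = c(68, 72, NA, 80, 75, 66))

pulse <- demo_data$PulseRate

mn_demo <- mean(pulse, na.rm = TRUE)   # sample mean, ignoring missing values
sd_demo <- sd(pulse, na.rm = TRUE)     # sample standard deviation
n_demo  <- sum(!is.na(pulse))          # number of non-missing observations

print(c(mean = mn_demo, sd = sd_demo, n = n_demo))
```

Because a sample mean varies from sample to sample, it is unlikely that the population mean pulse rate is exactly equal to the sample mean.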

Question 4

Construct a 95% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence. (Note: we can create a valid confidence interval here since n > 30)

# Store mean and std dev
mn_P <- mean(nscc_data$PulseRate, na.rm = TRUE)
sd <- sd(nscc_data$PulseRate, na.rm = TRUE)
 
# Calculate lower bound of 95% CI
mn_P - 1.96*(sd/sqrt(40))

## [1] 69.59647

# Calculate upper bound of 95% CI
mn_P + 1.96*(sd/sqrt(40))

## [1] 77.3509

We are 95% confident that the mean pulse rate of all NSCC students is between 69.6 and 77.4 bpm.

Question 5

Construct a 99% confidence interval for the mean pulse rate of all NSCC students and conclude your result in a complete sentence below the R chunk.

# Calculate lower bound of 99% CI
mn_P - 2.58*(sd/sqrt(40))

## [1] 68.37

# Calculate upper bound of 99% CI
mn_P + 2.58*(sd/sqrt(40))

## [1] 78.57736

We are 99% confident that the mean pulse rate of all NSCC students is between 68.37 and 78.58 bpm.

Question 6

Describe and explain the difference you observe in your confidence intervals for questions 4 and 5.

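The 95% confidence interval from Question 4 (69.6 to 77.4) is narrower than the 99% confidence interval from Question 5 (68.37 to 78.58). Both intervals are centered on the same sample mean, but the 99% interval uses a larger critical value (2.58 instead of 1.96), so it covers a wider range and is more likely to contain the true mean pulse rate.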

What happened to our confidence interval estimates as we increased the percent confidence?

As the percent confidence increases, the interval estimate becomes wider.
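The widening can be traced to the critical value, which grows with the confidence level. A short sketch using `qnorm()` to look up the two-sided critical values; the variable names and the extra 90% level are illustrative choices, not part of the assignment.

```r
# Confidence levels to compare (90% is added only for illustration)
conf_levels <- c(0.90, 0.95, 0.99)

# Two-sided critical value: leaves (1 - level)/2 in each tail
z_star <- qnorm(1 - (1 - conf_levels) / 2)

# For a fixed standard error, interval width is proportional to z_star,
# so this ratio is how much wider the 99% interval is than the 95% one
width_ratio <- z_star[3] / z_star[2]

print(round(z_star, 3))
print(width_ratio)
```

The rounded values 1.96 and 2.58 used in Questions 4 and 5 correspond to the 95% and 99% entries.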

Question 7

In the Fall 2009 semester of the 2009-10 academic year, the average NSCC student took 12.1 credits. I’m curious if that average differs among NSCC students last year (a sample of which is in the NSCC student dataset). Conduct a hypothesis test by a confidence interval to determine if the average number of credits last year differs from Fall 2009.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0\): \(\mu_{2017}\) = 12.1
\(H_A\): \(\mu_{2017}\) \(\neq\) 12.1

Create confidence interval

# Store mean of Credits variable
mn_C <-mean(nscc_data$Credits, na.rm = TRUE)

# Store standard deviation of Credits variable
s <-sd(nscc_data$Credits, na.rm = TRUE)

# Store sample size of Credits variable
n <- 40

# Lower bound of 95% CI
11.8 - 1.96*(3.37/sqrt(40))

## [1] 10.75563

# Upper bound of 95% CI
11.8 + 1.96*(3.37/sqrt(40))

## [1] 12.84437

Make decision to reject \(H_0\) or fail to reject \(H_0\) based on confidence interval

Since 12.1 falls within our 95% confidence interval (10.76 to 12.84), we fail to reject \(H_0\).

Write a concluding statement

There is not sufficient evidence to support the claim that the average number of credits taken by NSCC students has changed between Fall 2009 and 2017.
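The decision rule used in Question 7 can be written out explicitly: reject \(H_0\) only if the hypothesized mean falls outside the confidence interval. A minimal sketch using the rounded values from the Question 7 code (11.8, 3.37, 40); the variable names are introduced here for illustration.

```r
x_bar <- 11.8   # sample mean of Credits (rounded, as used in Question 7)
s     <- 3.37   # sample standard deviation of Credits (rounded)
n     <- 40     # sample size
mu_0  <- 12.1   # hypothesized mean from Fall 2009

# 95% confidence interval: mean plus or minus 1.96 standard errors
ci <- x_bar + c(-1, 1) * 1.96 * s / sqrt(n)

# Reject H0 only when mu_0 lies outside the interval
reject_h0 <- mu_0 < ci[1] || mu_0 > ci[2]

print(ci)
print(reject_h0)
```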

Question 8

NSCC is investigating whether NSCC students have a higher than average stress level which can be identified by a higher than average standing pulse rate. Conduct a hypothesis test by a p-value to determine if NSCC students have a higher pulse rate than the national average of 72 bpm for adults.

Write the hypotheses (Try to emulate the “LaTeX” format I used in lecture notes. Otherwise, just give your best effort.)

\(H_0\): \(\mu\) = 72
\(H_A\): \(\mu\) > 72

Calculate p-value of getting sample statistics by chance

# P-value: probability of a sample mean of 73.5 bpm or higher if the true mean were 72 bpm
pnorm(73.5, mean = 72, sd = sd/sqrt(n), lower.tail = FALSE)

## [1] 0.2241427

If the true mean were 72 bpm, the probability of getting a sample mean of 73.5 bpm or higher by random chance is about 0.22.

Make decision to reject \(H_0\) or fail to reject \(H_0\) at a significance level of 0.05 based on p-value.

In this case, we fail to reject \(H_0\) at a significance level of 0.05 since the p-value of 0.22 is greater than 0.05.

Write a concluding statement.

There is not sufficient evidence to support the claim that NSCC students have a higher pulse rate than the national average of 72 bpm for adults.
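The same p-value can be reached through a z test statistic, z = (sample mean - 72) / standard error. The sketch below is an approximation under one stated assumption: the report does not print the standard error, so it is back-calculated here from the 95% confidence bounds printed in Question 4. All variable names are illustrative.

```r
# Assumption: standard error recovered from the Question 4 bounds,
# since (upper - lower) = 2 * 1.96 * SE
se <- (77.3509 - 69.59647) / (2 * 1.96)

x_bar <- 73.5   # sample mean pulse rate (rounded, as used in Question 8)
mu_0  <- 72     # national average under H0

# z test statistic: how many standard errors x_bar lies above mu_0
z <- (x_bar - mu_0) / se

# One-sided (upper-tail) p-value for HA: mu > 72
p_value <- pnorm(z, lower.tail = FALSE)

print(c(z = z, p_value = p_value))
```

A p-value above 0.05 leads to the same decision as above: fail to reject \(H_0\).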