Project #5 - The Full Picture: Inference on NSCC Students

Purpose – The Full Statistical Portrait of NSCC Students

In Project #4, we used the normal distribution to begin asking whether NSCC students differ from national averages. Now, armed with a full toolkit – t-tests, proportion tests, two-sample methods, and confidence intervals – we return to the NSCC Student Dataset to build a more complete statistical portrait. In this project, you will conduct multiple hypothesis tests and construct confidence intervals. For each inference question, you are responsible for identifying the correct type of inference to apply. Consider carefully: Is the variable of interest numeric or categorical? Are you comparing one group to a standard, or two groups to each other?

Preparation

Load the NSCC Student Dataset and familiarize yourself with its variables.

# Load the NSCC student dataset
nscc <- read.csv("C:/Users/sband/Downloads/Honors Stats/nscc_student_data.csv")

# Preview the structure of the dataset
str(nscc)

## 'data.frame':    40 obs. of  15 variables:
##  $ Gender      : chr  "Female" "Female" "Female" "Female" ...
##  $ PulseRate   : int  64 75 74 65 NA 72 72 60 66 60 ...
##  $ CoinFlip1   : int  5 4 6 4 NA 6 6 3 7 6 ...
##  $ CoinFlip2   : int  5 6 1 4 NA 5 6 5 8 5 ...
##  $ Height      : num  62 62 60 62 66 ...
##  $ ShoeLength  : num  11 11 10 10.8 NA ...
##  $ Age         : int  19 21 25 19 26 21 19 24 24 20 ...
##  $ Siblings    : int  4 3 2 1 6 1 2 2 3 1 ...
##  $ RandomNum   : int  797 749 13 613 53 836 423 16 12 543 ...
##  $ HoursWorking: int  35 25 30 18 24 15 20 0 40 30 ...
##  $ Credits     : int  13 12 6 9 15 9 15 15 13 16 ...
##  $ Birthday    : chr  "July 5" "December 27" "January 31" "6-13" ...
##  $ ProfsAge    : int  31 30 29 31 32 32 28 28 31 28 ...
##  $ Coffee      : chr  "No" "Yes" "Yes" "Yes" ...
##  $ VoterReg    : chr  "Yes" "Yes" "No" "Yes" ...

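Variable Classification: Gender, Birthday, Coffee, and VoterReg are character variables. Gender, Coffee, and VoterReg are categorical. Birthday is stored as text in inconsistent formats (for example, "July 5" and "6-13"). Height and ShoeLength are numeric (double) variables because they include decimal values. The majority of the variables are whole-number integers: PulseRate, CoinFlip1, CoinFlip2, Age, Siblings, RandomNum, HoursWorking, Credits, and ProfsAge. Several variables contain missing values (NA), including PulseRate, CoinFlip1, CoinFlip2, and ShoeLength.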
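To double-check a classification like this one, a short helper can list each column's storage type, whether it would be treated as categorical or quantitative, and how many values are missing. This is a minimal sketch. `classify_columns` is a hypothetical name, and the helper assumes that every non-numeric column is categorical and every numeric or integer column is quantitative.

```r
# Hypothetical helper: summarize storage type, inferential role, and missing
# values for every column of a data frame
classify_columns <- function(df) {
  # First class of each column, e.g. "character", "integer", "numeric"
  type <- unname(vapply(df, function(col) class(col)[1], character(1)))
  # Numeric/integer columns are treated as quantitative, all others as categorical
  role <- ifelse(unname(vapply(df, is.numeric, logical(1))),
                 "quantitative", "categorical")
  # Number of NA values in each column
  missing <- unname(vapply(df, function(col) sum(is.na(col)), integer(1)))
  data.frame(variable = names(df), type = type, role = role,
             missing = missing, stringsAsFactors = FALSE)
}

# Small made-up example with one column of each kind
toy <- data.frame(
  Gender    = c("Female", "Male", "Female"),
  PulseRate = c(64L, NA, 72L),
  Height    = c(62, 66.5, 60),
  stringsAsFactors = FALSE
)
classify_columns(toy)
```

After the data are loaded, `classify_columns(nscc)` should list Gender, Birthday, Coffee, and VoterReg as categorical. It should also show nonzero missing counts for the columns whose `str()` preview includes NA.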

Question 1: How Old Are NSCC Students?

According to the American Association of Community Colleges (AACC), the national average age of community college students is 26 years old. Is the average age of students in this NSCC sample consistent with that national figure, or do NSCC students differ?

a. Write the hypotheses.

Single-Sample T-Test

\(H_0: \mu = 26\)

\(H_1: \mu \neq 26\)

\(\alpha = 0.05\)

b. Calculate the test statistic and p-value.

# Find the test statistic
t.test(nscc$Age, mu = 26)$statistic

##        t 
## -1.12844

# Find the p-value
t.test(nscc$Age, mu = 26)$p.value

## [1] 0.2660281
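The statistic that `t.test()` reports is \(t = (\bar{x} - \mu_0)/(s/\sqrt{n})\), compared against a t distribution with \(n - 1\) degrees of freedom. The sketch below computes it by hand as a check. `one_sample_t` is a hypothetical helper that drops NA values first, as `t.test()` does. The example vector uses the first ten Age values shown by `str()` above, not the full sample.

```r
# Hypothetical helper: one-sample t statistic and two-sided p-value by hand
one_sample_t <- function(x, mu0) {
  x <- x[!is.na(x)]                            # drop missing values
  n <- length(x)
  se <- sd(x) / sqrt(n)                        # standard error of the mean
  t_stat <- (mean(x) - mu0) / se
  p_value <- 2 * pt(-abs(t_stat), df = n - 1)  # two-sided tail area
  c(t = t_stat, df = n - 1, p = p_value)
}

# First ten Age values from the str() preview
ages <- c(19, 21, 25, 19, 26, 21, 19, 24, 24, 20)
one_sample_t(ages, mu0 = 26)
t.test(ages, mu = 26)[c("statistic", "p.value")]  # should agree
```

With the full data, `one_sample_t(nscc$Age, mu0 = 26)` should reproduce the t of about -1.128 and the p-value of about 0.266 reported above.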

c. Decision and Conclusion.

Since \(p = 0.266 > \alpha = 0.05\), fail to reject \(H_0\).

There is insufficient evidence to suggest that the mean age of NSCC students differs from the national community college average of 26 years.

Question 2: Are NSCC Students More Civically Engaged?

According to the U.S. Census Bureau, approximately 70% of eligible American adults are registered to vote. Do NSCC students register to vote at a higher rate than the general population?

a. Write the hypotheses.

Single-Proportion Z-Test

\(H_0: \ p = 0.70\)

\(H_1: \ p > 0.70\)

\(\alpha = 0.05\)

b. Calculate the test statistic and p-value.

# Table of NSCC student data voter registration variable
table(nscc$VoterReg)

## 
##  No Yes 
##   9  31

# Test statistic of single proportion (prop.test() reports X-squared, which equals z^2)
prop.test(x = 31, n = 40, p = 0.70, alternative = "greater", correct = FALSE)$statistic

## X-squared 
##  1.071429

# P-value of single proportion
prop.test(x = 31, n = 40, p = 0.70, alternative = "greater", correct = FALSE)$p.value

## [1] 0.1503115
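`prop.test()` reports a chi-squared statistic rather than z. For a one-sample test without continuity correction, X-squared is the square of \(z = (\hat{p} - p_0)/\sqrt{p_0(1-p_0)/n}\). The sketch below computes z directly. It also checks the usual success-failure condition for the normal approximation (\(np_0 \ge 10\) and \(n(1-p_0) \ge 10\)). `one_prop_z` is a hypothetical helper.

```r
# Hypothetical helper: one-proportion z-test, upper-tailed (H1: p > p0)
one_prop_z <- function(successes, n, p0) {
  p_hat <- successes / n
  se <- sqrt(p0 * (1 - p0) / n)                # standard error under H0
  z <- (p_hat - p0) / se
  c(p_hat = p_hat,
    z = z,
    p_value = pnorm(z, lower.tail = FALSE),    # upper-tail area
    expected_yes = n * p0,                     # should be at least 10
    expected_no = n * (1 - p0))                # should be at least 10
}

# Voter registration counts from the table above: 31 "Yes" out of 40
one_prop_z(successes = 31, n = 40, p0 = 0.70)
```

This gives \(\hat{p} = 0.775\) and z of about 1.035. Squaring z gives about 1.071, the X-squared above. The p-value is the same, about 0.150. The expected counts are 28 and 12, both at least 10, so the normal approximation is reasonable here.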

c. Decision and Conclusion.

Since \(p = 0.150 > \alpha = 0.05\), fail to reject \(H_0\).

There is insufficient evidence to suggest that NSCC students register to vote at a higher rate than the roughly 70% of eligible American adults nationally.

Question 3: Do Male and Female Students Work Different Hours?

Balancing school and work is a reality for many community college students. Is there a significant difference in the average number of hours worked per week between male and female NSCC students?

a. Write the hypotheses.

Two-Sample (Welch) T-Test

\(H_0: \mu_F = \mu_M\)

\(H_1: \mu_F \neq \mu_M\)

\(\alpha = 0.05\)

Let \(\mu_F\) = mean hours worked per week for female NSCC students and \(\mu_M\) = mean hours worked per week for male NSCC students.

b. Calculate the test statistic and p-value.

# Store the NSCC female data into an object
nsccfemales <- subset(nscc, Gender == "Female")

# Store the NSCC male data into an object
nsccmales <- subset(nscc, Gender == "Male")

# Test statistic of two samples (t.test() runs Welch's test by default, without assuming equal variances)
t.test(nsccfemales$HoursWorking, nsccmales$HoursWorking)$statistic

##        t 
## 2.255941

# P-value of two samples
t.test(nsccfemales$HoursWorking, nsccmales$HoursWorking)$p.value

## [1] 0.03671193
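By default `t.test()` runs Welch's two-sample t-test, which does not assume equal variances. The statistic is \(t = (\bar{x}_F - \bar{x}_M)/\sqrt{s_F^2/n_F + s_M^2/n_M}\), and the degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below reproduces that calculation. `welch_t` is a hypothetical helper, and the two vectors are made-up hours used only for the check.

```r
# Hypothetical helper: Welch two-sample t-test (unequal variances), two-sided
welch_t <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  vx <- var(x) / length(x)               # squared standard error, group x
  vy <- var(y) / length(y)               # squared standard error, group y
  t_stat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  # Welch-Satterthwaite degrees of freedom
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  c(t = t_stat, df = df, p = 2 * pt(-abs(t_stat), df = df))
}

# Made-up weekly hours for two small groups
hours_a <- c(35, 25, 30, 18, 24, 15, 20, 40)
hours_b <- c(0, 10, 20, 12, 8)
welch_t(hours_a, hours_b)
```

With the real data, `welch_t(nsccfemales$HoursWorking, nsccmales$HoursWorking)` should match the t of about 2.256 and the p-value of about 0.0367 above. A positive t means the first group (female students) had the larger sample mean.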

c. Decision and Conclusion.

Since \(p = 0.037 < \alpha = 0.05\), reject \(H_0\) in favor of \(H_1\).

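There is sufficient evidence at the 0.05 level to conclude that the mean number of hours worked per week differs between male and female NSCC students. The positive t statistic (2.26) indicates that the female students in this sample worked more hours on average.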

Question 4: Does Coffee Preference Differ by Gender?

Three out of four NSCC students drink coffee – but is that rate the same for men and women? Test whether there is a significant difference in the proportion of coffee drinkers between male and female students.

a. Write the hypotheses.

Two-Proportion Z-Test

\(H_0: p_F = p_M\)

\(H_1: p_F \neq p_M\)

\(\alpha = 0.05\)

Let \(p_F\) = proportion of female NSCC students who drink coffee and \(p_M\) = proportion of male NSCC students who drink coffee.

b. Calculate the test statistic and p-value.

# Table of NSCC female coffee drinkers
table(nsccfemales$Coffee)

## 
##  No Yes 
##   7  20

# Table of NSCC male coffee drinkers
table(nsccmales$Coffee)

## 
##  No Yes 
##   3  10

# Test statistic of pooled proportions (X-squared equals the square of the pooled z)
prop.test(x = c(20, 10), n = c(20 + 7, 10 + 3), correct = FALSE)$statistic

## Warning in prop.test(x = c(20, 10), n = c(20 + 7, 10 + 3), correct = FALSE):
## Chi-squared approximation may be incorrect

## X-squared 
## 0.0379867

# P-value of pooled proportions
prop.test(x = c(20, 10), n = c(20 + 7, 10 + 3), correct = FALSE)$p.value

## Warning in prop.test(x = c(20, 10), n = c(20 + 7, 10 + 3), correct = FALSE):
## Chi-squared approximation may be incorrect

## [1] 0.8454698
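The pooled two-proportion z-test uses \(\hat{p} = (x_F + x_M)/(n_F + n_M)\) under \(H_0\) and \(z = (\hat{p}_F - \hat{p}_M)/\sqrt{\hat{p}(1-\hat{p})(1/n_F + 1/n_M)}\). For a 2 × 2 table, `prop.test()` without continuity correction reports \(z^2\) as X-squared. The sketch below computes z by hand and also shows the expected counts behind R's warning. `two_prop_z` is a hypothetical helper.

```r
# Hypothetical helper: pooled two-proportion z-test, two-sided
two_prop_z <- function(x1, n1, x2, n2) {
  p1 <- x1 / n1
  p2 <- x2 / n2
  p_pool <- (x1 + x2) / (n1 + n2)        # pooled proportion under H0
  se <- sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
  z <- (p1 - p2) / se
  c(p1 = p1, p2 = p2, z = z, p_value = 2 * pnorm(-abs(z)))
}

# Coffee counts from the tables above: females 20 of 27, males 10 of 13
two_prop_z(x1 = 20, n1 = 27, x2 = 10, n2 = 13)

# Expected counts under H0 (row total * column total / grand total);
# prop.test() warns when any of these falls below 5
observed <- matrix(c(20, 7, 10, 3), nrow = 2, byrow = TRUE,
                   dimnames = list(c("Female", "Male"), c("Yes", "No")))
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
expected
```

The z from this sketch is about -0.195. Its square matches the X-squared of about 0.038 above. The expected count of male non-coffee drinkers is 13 × 10 / 40 = 3.25, below the usual threshold of 5, and that is what triggers the warning. `fisher.test(observed)` is one exact test worth running as a cross-check.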

c. Decision and Conclusion.

Since \(p = 0.845 > \alpha = 0.05\), fail to reject \(H_0\).

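There is insufficient evidence to suggest that the proportion of coffee drinkers differs between male and female NSCC students. One expected count is below 5, so R warns that the chi-squared approximation may be inaccurate. This p-value should therefore be treated as approximate.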

Question 5: How Many Credits Are NSCC Students Taking?

Rather than testing a specific claim, we want to estimate the true average number of credits taken per semester by all NSCC students.

a. Construct and interpret a 95% confidence interval for the mean credits taken per semester.

One-Sample T-Interval

# Calculate a 95% confidence interval
t.test(nscc$Credits, conf.level = 0.95)$conf.int

## [1] 10.69715 12.85285
## attr(,"conf.level")
## [1] 0.95
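The interval that `t.test()` returns is \(\bar{x} \pm t^{*} \, s/\sqrt{n}\), where \(t^{*}\) is the 97.5th percentile of a t distribution with \(n - 1\) degrees of freedom. The sketch below builds it by hand. `t_interval` is a hypothetical helper. The example vector uses only the first ten Credits values shown by `str()` above.

```r
# Hypothetical helper: one-sample t confidence interval for a mean
t_interval <- function(x, conf_level = 0.95) {
  x <- x[!is.na(x)]
  n <- length(x)
  t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)  # critical value
  margin <- t_star * sd(x) / sqrt(n)                   # margin of error
  c(lower = mean(x) - margin, estimate = mean(x), upper = mean(x) + margin)
}

# First ten Credits values from the str() preview
credits <- c(13, 12, 6, 9, 15, 9, 15, 15, 13, 16)
t_interval(credits)
t.test(credits, conf.level = 0.95)$conf.int  # should agree
```

The interval is centered on the sample mean. For the full data, that center is (10.69715 + 12.85285) / 2 = 11.775 credits.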

b. Interpretation.

We are 95% confident that the true mean number of credits NSCC students take per semester is between 10.70 and 12.85.

c. Follow-up question: Does this interval suggest students on average are taking a full-time load (defined as ≥ 12 credits)?

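Since 12 lies within the interval (10.70 to 12.85), an average full-time load is plausible. However, means below 12 are just as plausible, and more of the interval lies below 12 than above it. The interval therefore does not show that students, on average, take a full-time course load; it only fails to rule one out.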

Question 6: Your Turn – Choose Your Own Inference

Using any variable(s) in the NSCC Student Dataset that have not yet been analyzed in this project, formulate an original research question. Your question must be answerable with inference via a hypothesis test and/or confidence interval.

a. State your research question.

Is there a difference in the average age of male and female students at NSCC?

b. Identify the type of inference and justify your choice.

I chose a two-sample t-test because Age is a numeric variable and we are comparing its mean across two independent groups (female and male students).

c. Write hypotheses (or describe what you are estimating, if using a confidence interval).

\(H_0: \mu_f = \mu_m\)

\(H_1: \mu_f \neq \mu_m\)

\(\alpha = 0.05\)

Let \(\mu_f\) = the mean age of female NSCC students and \(\mu_m\) = the mean age of male NSCC students.

d. Conduct the analysis.

# Test statistic of two samples
t.test(nsccfemales$Age, nsccmales$Age)$statistic

##          t 
## -0.7852323

# P-value of two samples
t.test(nsccfemales$Age, nsccmales$Age)$p.value

## [1] 0.4433215

e. Conclusion.

Since \(p = 0.443 > \alpha = 0.05\), fail to reject \(H_0\).

There is insufficient evidence to suggest that there is a difference in the mean age of male and female NSCC students.

Question 7: Reflection

Across this project, you conducted multiple hypothesis tests on the same dataset.

What is the risk of conducting many hypothesis tests on the same dataset? (Hint: think about what α = 0.05 means in terms of false positives.)

Conducting many hypothesis tests on the same dataset raises the chance of a false positive. Each test run at α = 0.05 has a 5% chance of a Type I error (rejecting a true null hypothesis). When several tests are run, the chance that at least one of them produces a false positive is larger than 5%. As a result, a "significant" finding is more likely to be a fluke than a single test's α suggests.
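To put a number on this risk, assume each of \(k\) independent tests uses \(\alpha = 0.05\) and every null hypothesis is true. Then the chance of at least one false positive is \(1 - 0.95^k\). The sketch below applies this to the five hypothesis tests in this project (Questions 1, 2, 3, 4, and 6). It then applies a Bonferroni adjustment, one common correction that was not part of the original analysis. Independence is an assumption here, since several tests reuse the same students and variables.

```r
# Probability of at least one false positive across k independent tests
alpha <- 0.05
k <- 5
familywise_error <- 1 - (1 - alpha)^k
familywise_error   # about 0.226

# p-values reported for Questions 1, 2, 3, 4, and 6
p_values <- c(Q1 = 0.2660281, Q2 = 0.1503115, Q3 = 0.03671193,
              Q4 = 0.8454698, Q6 = 0.4433215)

# Bonferroni: multiply each p-value by k (capped at 1); this is equivalent
# to comparing the raw p-values against alpha / k = 0.01
p.adjust(p_values, method = "bonferroni")
```

After this adjustment, the Question 3 result (adjusted p of about 0.184) would no longer be significant at the 0.05 level.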

Choose a question in this project where you found a result that was statistically significant. Do you believe this result is practically significant? If so, what do you believe could be reasons for this result? Discuss.

The difference in hours worked between male and female students was statistically significant, but I am not sure it is practically significant. Hours worked vary a lot: some male students work 0 hours, while every female student works at least 12. That gap hints at a practically meaningful difference. However, with only 13 male students in the sample, I am not sure it reflects the entire NSCC population. There could be many reasons for the number of hours a student works that have nothing to do with gender.
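A p-value does not say how large a difference is. One common effect-size measure, not used in the original analysis, is Cohen's d: the difference in sample means divided by a pooled standard deviation. By a widely cited rule of thumb, values near 0.2, 0.5, and 0.8 are read as small, medium, and large. The sketch below is illustrative only, and `cohens_d` is a hypothetical helper.

```r
# Hypothetical helper: Cohen's d using the pooled standard deviation
cohens_d <- function(x, y) {
  x <- x[!is.na(x)]
  y <- y[!is.na(y)]
  nx <- length(x)
  ny <- length(y)
  pooled_sd <- sqrt(((nx - 1) * var(x) + (ny - 1) * var(y)) / (nx + ny - 2))
  (mean(x) - mean(y)) / pooled_sd
}

# Toy check: means differ by 3 and both groups have SD 1, so d = -3
cohens_d(c(1, 2, 3), c(4, 5, 6))
```

Calling `cohens_d(nsccfemales$HoursWorking, nsccmales$HoursWorking)` would express the hours difference on a standardized scale. That makes it easier to judge whether the difference is practically significant.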

What is a limitation of the NSCC Student Dataset that affects all of the conclusions drawn in this project?

I think the biggest limitation is the sample size. With 40 observations, the sample meets the common n ≥ 30 guideline for t procedures, but only barely. The two-sample comparisons also split it into smaller groups of 27 female and 13 male students. Some variables have limitations of their own. Hours worked is most likely a self-reported estimate, and some of the values appear to be overtime. Part-time and overtime schedules can change from week to week, so a single number may not show whether male or female NSCC students actually work more hours.

MAT143H - Introduction to Statistics Honors

Spencer Anderson

Due: 4/30/2026