knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(ggplot2)
To answer this question, I looked at data.montgomerycountymd.gov, where I found the Montgomery College enrollment dataset (last updated July 5, 2023). The dataset is extensive: it contains 25,320 rows, each representing one enrolled student, and 18 columns, including race, student type, and gender. To answer the question, I only look at two of these columns: gender and student status.
# Read the enrollment data from the CSV file in the working directory
college <- read.csv("Montgomery_College_Enrollment_Data_2025.csv")
head(college)
## Fall.Term Student.Type Student.Status Gender Ethnicity Race
## 1 2015 Continuing Full-Time Female Not Hispanic White
## 2 2015 Continuing Part-Time Male Not Hispanic White
## 3 2015 Continuing Part-Time Male Not Hispanic Black
## 4 2015 New Full-Time Male Not Hispanic Asian
## 5 2015 New Full-Time Female Hispanic White
## 6 2015 Continuing Full-Time Female Hispanic Hispanic
## Attending.Germantown Attending.Rockville Attending.Takoma.Park.SS
## 1 Yes Yes No
## 2 No Yes No
## 3 No Yes No
## 4 No Yes No
## 5 No Yes No
## 6 Yes No No
## Attend.Day.or.Evening MC.Program.Description
## 1 Day Only Health Sciences (Pre-Clinical Studies)
## 2 Evening Only Building Trades Technology (AA & AAS)
## 3 Day & Evening Computer Gaming & Simulation (AA - All Tracks)
## 4 Day Only Graphic Design (AA, AAS, & AFA - All Tracks)
## 5 Day & Evening General Studies (AA - All Tracks)
## 6 Day Only General Studies (AA - All Tracks)
## Age.Group HS.Category MCPS.High.School City.in.MD
## 1 25 - 29 Foreign Country Bethesda
## 2 21 - 24 MCPS Sherwood High School Olney
## 3 20 or Younger MCPS Quince Orchard Sr High School Gaithersburg
## 4 20 or Younger MCPS Thomas Sprigg Wootton High Sch North Potomac
## 5 20 or Younger MCPS Montgomery Blair High School Silver Spring
## 6 20 or Younger MCPS Clarksburg High School Germantown
## State ZIP County.in.MD
## 1 MD 20816 Montgomery
## 2 MD 20832 Montgomery
## 3 MD 20877 Montgomery
## 4 MD 20878 Montgomery
## 5 MD 20906 Montgomery
## 6 MD 20876 Montgomery
summary(college)
## Fall.Term Student.Type Student.Status Gender
## Min. :2015 Length:25320 Length:25320 Length:25320
## 1st Qu.:2015 Class :character Class :character Class :character
## Median :2015 Mode :character Mode :character Mode :character
## Mean :2015
## 3rd Qu.:2015
## Max. :2015
##
## Ethnicity Race Attending.Germantown Attending.Rockville
## Length:25320 Length:25320 Length:25320 Length:25320
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## Attending.Takoma.Park.SS Attend.Day.or.Evening MC.Program.Description
## Length:25320 Length:25320 Length:25320
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
##
## Age.Group HS.Category MCPS.High.School City.in.MD
## Length:25320 Length:25320 Length:25320 Length:25320
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## State ZIP County.in.MD
## Length:25320 Min. : 926 Length:25320
## Class :character 1st Qu.:20852 Class :character
## Mode :character Median :20877 Mode :character
## Mean :20892
## 3rd Qu.:20902
## Max. :95492
## NA's :99
To analyze the dataset, I used the select function to keep only the Gender and Student.Status columns. I then used the filter function to remove missing values from those two columns. After cleaning the dataset, I filtered it again to keep only male students, creating a subset for comparing enrollment status among male students. To obtain the counts needed for the hypothesis test, I created a frequency table and a bar plot showing the number of full-time and part-time male students.
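One caveat about the missing-value step: with its default `na.strings = "NA"`, `read.csv` stores an empty text field as `""`, not as `NA`, so `!is.na()` alone does not drop blank entries in character columns. The sketch below illustrates this on a tiny inline CSV with made-up rows (not the real dataset) and uses base R `subset()`; the same conditions can be passed to dplyr's `filter()`.

```r
# Hypothetical inline CSV: row 2 has an empty Gender, row 4 an empty Student.Status
raw <- read.csv(text = "Gender,Student.Status
Male,Full-Time
,Part-Time
Female,Part-Time
Male,", stringsAsFactors = FALSE)

# Empty character fields come back as "" rather than NA
n_na    <- sum(is.na(raw$Gender))   # 0
n_blank <- sum(raw$Gender == "")    # 1

# Checking for both NA and "" removes the incomplete records
cleaned <- subset(raw,
                  !is.na(Gender) & Gender != "" &
                  !is.na(Student.Status) & Student.Status != "")
nrow(cleaned)                       # 2
```

In this dataset the issue does not affect the male subset, since blank genders are excluded by `Gender == "Male"` anyway, but checking for `""` makes the cleaning step do what it describes.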
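# Keep only the two variables needed for the analysis
college <- select(college, Gender, Student.Status)
# Drop missing values; read.csv stores empty text fields as "" rather than NA, so check for both
college <- filter(college, !is.na(Gender), Gender != "", !is.na(Student.Status), Student.Status != "")
# Subset to male students only
males_data <- filter(college, Gender == "Male")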
# Frequency table
table(males_data$Student.Status)
##
## Full-Time Part-Time
## 4507 7456
# Barplot
barplot(table(males_data$Student.Status),
main = "Male Students by Enrollment Status",
xlab = "Student Status",
ylab = "Number of Male Students",
col = c("skyblue", "lightgreen"))
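The frequency table gives counts, while the hypotheses below are about proportions. As a small sketch, the counts can be converted with `prop.table()`; here they are copied by hand from the table above, and with the live data `prop.table(table(males_data$Student.Status))` gives the same result.

```r
# Counts copied from the frequency table of male students
male_counts <- c("Full-Time" = 4507, "Part-Time" = 7456)

# Share of male students in each enrollment status; the two shares sum to 1
male_props <- prop.table(male_counts)
round(male_props, 3)
```

This shows roughly 0.377 of male students enrolled full-time and 0.623 part-time, which already suggests the full-time share is the smaller one.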
For this analysis, the aim was to determine whether the proportion of male students at Montgomery College (MC) who are enrolled full-time is greater than the proportion who are enrolled part-time. Because Student.Status is a categorical variable and the question compares two proportions, I used a two-sample test for equality of proportions (prop.test in R).
Hypotheses
\(H_0: p_1 = p_2\)
\(H_a: p_1 > p_2\)
where
\(p_1\) = proportion of male students who are enrolled full-time
\(p_2\) = proportion of male students who are enrolled part-time
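prop.test reports a chi-squared statistic, but for two groups it is the square of the pooled two-proportion z statistic \(z = (\hat p_1 - \hat p_2) / \sqrt{\hat p (1 - \hat p)(1/n_1 + 1/n_2)}\), where \(\hat p\) is the pooled proportion under \(H_0\). A sketch of this arithmetic, assuming the counts from the frequency table, is shown below; the check uses `correct = FALSE`, because the continuity correction makes the reported statistic slightly smaller than \(z^2\).

```r
# Full-time and part-time male counts from the frequency table
x <- c(4507, 7456)
n <- c(sum(x), sum(x))       # both proportions share the denominator 11963

p_hat  <- x / n              # sample proportions (p1-hat, p2-hat)
p_pool <- sum(x) / sum(n)    # pooled proportion under H0: p1 = p2

# Pooled z statistic and one-sided p-value for Ha: p1 > p2
z <- (p_hat[1] - p_hat[2]) / sqrt(p_pool * (1 - p_pool) * sum(1 / n))
p_value <- pnorm(z, lower.tail = FALSE)

# Without continuity correction, prop.test's X-squared equals z^2
check <- prop.test(x, n, alternative = "greater", correct = FALSE)
all.equal(unname(check$statistic), z^2)
```

Here \(z\) is large and negative (about -38), because \(\hat p_1\) is well below \(\hat p_2\); a one-sided test toward "greater" therefore yields a p-value near 1.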
# Counts from the frequency table: 4507 full-time and 7456 part-time male students
prop.test(c(4507, 7456), c(4507 + 7456, 4507 + 7456), alternative = "greater")
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(4507, 7456) out of c(4507 + 7456, 4507 + 7456)
## X-squared = 1452.9, df = 1, p-value = 1
## alternative hypothesis: greater
## 95 percent confidence interval:
## -0.2568994 1.0000000
## sample estimates:
## prop 1 prop 2
## 0.376745 0.623255
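prop.test returns an `htest` object, so the decision rule (compare the p-value with α = 0.05) can also be applied in code rather than read off the printed output. A minimal sketch, assuming the counts from the frequency table:

```r
x <- c(4507, 7456)       # full-time, part-time male students
n <- c(sum(x), sum(x))
alpha <- 0.05            # significance level

result <- prop.test(x, n, alternative = "greater")

# Components of the htest object
result$estimate          # sample proportions p1-hat and p2-hat
result$p.value           # one-sided p-value

decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision                 # "fail to reject H0"
```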
Significance Level: α = 0.05
P-value = 1
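Decision: Since the p-value (1) is greater than the significance level (0.05), we fail to reject the null hypothesis. In the context of the question, there is no evidence that the proportion of male students enrolled full-time (about 37.7%) is greater than the proportion enrolled part-time (about 62%). In fact, the sample proportions point in the opposite direction, which is why the one-sided p-value is essentially 1.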
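Because every male student is either full-time or part-time, \(p_1 + p_2 = 1\). So \(p_1 = p_2\) holds exactly when \(p_1 = 0.5\), and \(p_1 > p_2\) exactly when \(p_1 > 0.5\). Both proportions also come from the same group of students rather than from two independent samples, which the two-sample test assumes. A hedged alternative, sketched below, is a one-sample test of the full-time share against 0.5, either with `prop.test` or with the exact `binom.test`; on these counts it leads to the same conclusion.

```r
full_time <- 4507
total     <- 4507 + 7456

# One-sample test of H0: p1 = 0.5 against Ha: p1 > 0.5
one_sample <- prop.test(full_time, total, p = 0.5, alternative = "greater")
one_sample$estimate      # sample proportion of full-time male students
one_sample$p.value

# Exact binomial version of the same one-sided test
exact <- binom.test(full_time, total, p = 0.5, alternative = "greater")
exact$p.value
```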