Research Question: Is student enrollment status (full-time vs part-time) associated with age group, and how do enrollment patterns vary across different racial groups?

The dataset for my project is the Montgomery College Enrollment Data, provided by dataMontgomery (Link: https://data.montgomerycountymd.gov/Education/Montgomery-College-Enrollment-Data/wmr2-6hn6/about_data). This dataset contains 25,320 observations and 18 variables of student enrollment records at Montgomery College. Each row represents one student in a given fall term and includes information such as fall term, student type, student status (full-time vs part-time), campus location, gender, ethnicity, race, and other enrollment characteristics.

The purpose of this project is to see whether enrollment status (full-time vs part-time) is associated with students’ age groups and to explore how these enrollment patterns differ across racial groups. I am interested in this dataset because, as a student at Montgomery College, I know that MC has a very large and diverse campus. Understanding how age and race relate to full-time and part-time enrollment can provide insight into which groups of students are more likely to attend part-time. These patterns may reflect differences in work obligations, family responsibilities, or access to resources.

Variables Selected

student_status: part or full time enrollment (categorical)

age_group: age range of enrollee (categorical)

race: student race (categorical)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mc_enrollment <- read_csv("Montgomery_College_Enrollment_Data.csv")
## Rows: 25320 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (16): Student Type, Student Status, Gender, Ethnicity, Race, Attending G...
## dbl  (2): Fall Term, ZIP
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
str(mc_enrollment)
## spc_tbl_ [25,320 × 18] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Fall Term               : num [1:25320] 2015 2015 2015 2015 2015 ...
##  $ Student Type            : chr [1:25320] "Continuing" "Continuing" "Continuing" "New" ...
##  $ Student Status          : chr [1:25320] "Full-Time" "Part-Time" "Part-Time" "Full-Time" ...
##  $ Gender                  : chr [1:25320] "Female" "Male" "Male" "Male" ...
##  $ Ethnicity               : chr [1:25320] "Not Hispanic" "Not Hispanic" "Not Hispanic" "Not Hispanic" ...
##  $ Race                    : chr [1:25320] "White" "White" "Black" "Asian" ...
##  $ Attending Germantown    : chr [1:25320] "Yes" "No" "No" "No" ...
##  $ Attending Rockville     : chr [1:25320] "Yes" "Yes" "Yes" "Yes" ...
##  $ Attending Takoma Park/SS: chr [1:25320] "No" "No" "No" "No" ...
##  $ Attend Day or Evening   : chr [1:25320] "Day Only" "Evening Only" "Day & Evening" "Day Only" ...
##  $ MC Program Description  : chr [1:25320] "Health Sciences (Pre-Clinical Studies)" "Building Trades Technology (AA & AAS)" "Computer Gaming & Simulation (AA - All Tracks)" "Graphic Design (AA, AAS, & AFA - All Tracks)" ...
##  $ Age Group               : chr [1:25320] "25 - 29" "21 - 24" "20 or Younger" "20 or Younger" ...
##  $ HS Category             : chr [1:25320] "Foreign Country" "MCPS" "MCPS" "MCPS" ...
##  $ MCPS High School        : chr [1:25320] NA "Sherwood High School" "Quince Orchard Sr High School" "Thomas Sprigg Wootton High Sch" ...
##  $ City in MD              : chr [1:25320] "Bethesda" "Olney" "Gaithersburg" "North Potomac" ...
##  $ State                   : chr [1:25320] "MD" "MD" "MD" "MD" ...
##  $ ZIP                     : num [1:25320] 20816 20832 20877 20878 20906 ...
##  $ County in MD            : chr [1:25320] "Montgomery" "Montgomery" "Montgomery" "Montgomery" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Fall Term` = col_double(),
##   ..   `Student Type` = col_character(),
##   ..   `Student Status` = col_character(),
##   ..   Gender = col_character(),
##   ..   Ethnicity = col_character(),
##   ..   Race = col_character(),
##   ..   `Attending Germantown` = col_character(),
##   ..   `Attending Rockville` = col_character(),
##   ..   `Attending Takoma Park/SS` = col_character(),
##   ..   `Attend Day or Evening` = col_character(),
##   ..   `MC Program Description` = col_character(),
##   ..   `Age Group` = col_character(),
##   ..   `HS Category` = col_character(),
##   ..   `MCPS High School` = col_character(),
##   ..   `City in MD` = col_character(),
##   ..   State = col_character(),
##   ..   ZIP = col_double(),
##   ..   `County in MD` = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>
head(mc_enrollment)
## # A tibble: 6 × 18
##   `Fall Term` `Student Type` `Student Status` Gender Ethnicity    Race    
##         <dbl> <chr>          <chr>            <chr>  <chr>        <chr>   
## 1        2015 Continuing     Full-Time        Female Not Hispanic White   
## 2        2015 Continuing     Part-Time        Male   Not Hispanic White   
## 3        2015 Continuing     Part-Time        Male   Not Hispanic Black   
## 4        2015 New            Full-Time        Male   Not Hispanic Asian   
## 5        2015 New            Full-Time        Female Hispanic     White   
## 6        2015 Continuing     Full-Time        Female Hispanic     Hispanic
## # ℹ 12 more variables: `Attending Germantown` <chr>,
## #   `Attending Rockville` <chr>, `Attending Takoma Park/SS` <chr>,
## #   `Attend Day or Evening` <chr>, `MC Program Description` <chr>,
## #   `Age Group` <chr>, `HS Category` <chr>, `MCPS High School` <chr>,
## #   `City in MD` <chr>, State <chr>, ZIP <dbl>, `County in MD` <chr>

Data Analysis

To answer my research questions, I first loaded my dataset and inspected its structure with EDA functions such as str() and head(). My next step was to clean the data by fixing the variable names: I replaced spaces between words with underscores using gsub() and converted uppercase letters to lowercase using tolower(). I also converted empty strings to NA. The dataset contained “Unknown” categories for both age and race, so I used na_if() to convert those entries in the age_group and race variables to NA. I then created a new dataset named “mc_enrollment_clean.” In this new dataset, I used select() to pull the variables I need for my chi-square tests (student_status, age_group, and race) and used filter() to remove the rows containing NAs. After confirming with colSums() that no NAs remained, I used arrange() to sort the data by age_group, which made the cleaned dataset easier to read. Lastly, I used summary() to check each variable before running my chi-squared tests.

#Cleaning variable names to replace spaces(" ") with underscores & putting them in lowercase
names(mc_enrollment) <- gsub(" ", "_", names(mc_enrollment)) 
names(mc_enrollment) <- tolower(names(mc_enrollment)) 

# Convert empty strings to NA
mc_enrollment[mc_enrollment == ""] <- NA
# Convert "Unknown" entries in age_group and race to NA
mc_enrollment <- mc_enrollment |>
  mutate(
    age_group = na_if(age_group, "Unknown"),
    race = na_if(race, "Unknown")
  )
# Selecting only the variables I am using and get rid of NA's
mc_enrollment_clean <- mc_enrollment |>
  select(student_status, age_group, race) |>
  filter(!is.na(student_status),
         !is.na(age_group),
         !is.na(race))

colSums(is.na(mc_enrollment_clean))
## student_status      age_group           race 
##              0              0              0
# Arrange data by age group
mc_enrollment_clean <- mc_enrollment_clean |>
  arrange(age_group)
str(mc_enrollment_clean)
## tibble [25,266 × 3] (S3: tbl_df/tbl/data.frame)
##  $ student_status: chr [1:25266] "Part-Time" "Full-Time" "Full-Time" "Full-Time" ...
##  $ age_group     : chr [1:25266] "20 or Younger" "20 or Younger" "20 or Younger" "20 or Younger" ...
##  $ race          : chr [1:25266] "Black" "Asian" "White" "Hispanic" ...
head(mc_enrollment_clean)
## # A tibble: 6 × 3
##   student_status age_group     race    
##   <chr>          <chr>         <chr>   
## 1 Part-Time      20 or Younger Black   
## 2 Full-Time      20 or Younger Asian   
## 3 Full-Time      20 or Younger White   
## 4 Full-Time      20 or Younger Hispanic
## 5 Full-Time      20 or Younger Hispanic
## 6 Full-Time      20 or Younger Black
summary(mc_enrollment_clean$student_status)
##    Length     Class      Mode 
##     25266 character character
summary(mc_enrollment_clean$age_group)
##    Length     Class      Mode 
##     25266 character character
summary(mc_enrollment_clean$race)
##    Length     Class      Mode 
##     25266 character character
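
The summary() calls above report only length, class, and mode because the three columns are stored as character vectors. A minimal sketch of how converting the columns to factors would let summary() and table() report category counts instead. The `toy` data frame and its values are hypothetical, not taken from the dataset; only the four age-group labels come from the output above.

```r
# Hypothetical toy data standing in for mc_enrollment_clean (values are made up)
toy <- data.frame(
  student_status = c("Full-Time", "Part-Time", "Part-Time", "Full-Time", "Part-Time"),
  age_group      = c("20 or Younger", "30 or Older", "21 - 24", "20 or Younger", "25 - 29"),
  stringsAsFactors = FALSE
)

# On a character column, summary() gives Length / Class / Mode only
summary(toy$age_group)

# Converting to a factor with an explicit level order makes summary() return counts
age_levels <- c("20 or Younger", "21 - 24", "25 - 29", "30 or Older")
toy$age_group <- factor(toy$age_group, levels = age_levels)
toy$student_status <- factor(toy$student_status)

age_counts <- summary(toy$age_group)        # counts per age group
status_counts <- table(toy$student_status)  # counts per enrollment status
print(age_counts)
print(status_counts)
```

Setting the levels explicitly also fixes the category order for later tables and plots, rather than relying on alphabetical sorting of the labels.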

Statistical Analysis

I used the chi-squared test of independence to examine whether enrollment status (full-time vs part-time) is associated with age group and with race. First, I made two contingency tables, one to show students who were full-time or part-time in each age group, and another to show students who were full-time or part-time in each race group. Then I ran a chi-squared test on each table. All expected values were above 5, so the chi-squared test was appropriate.

For age group, the chi-squared test gave me the value of 2692.1 for the X^2, 3 for the degrees of freedom, and <2.2e-16 for the p-value. Since the p-value was extremely small, I rejected the null hypothesis, meaning enrollment status is associated with age group. The bar graph shows the same pattern: younger students tend to be full time more often, while older students are much more likely to be part time.

For race, the chi-squared test gave me the value of 322.52 for the X^2, 6 for the degrees of freedom, and <2.2e-16 as the p-value. Since the p-value is extremely small again, I rejected the null hypothesis again, meaning enrollment status is associated with race. The bar graph shows that every racial group has more part-time than full-time students. In raw counts, the largest gaps were among Black and White students, who make up the largest groups, while the other race groups also show higher part-time enrollment, though the size of the difference varies by race.

\(H_0\): Enrollment status is not associated with age group

\(H_a\): Enrollment status is associated with age group

\(H_0\): Enrollment status is not associated with race

\(H_a\): Enrollment status is associated with race
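
For reference, the standard quantities behind a chi-squared test of independence can be written out as follows. The symbols \(O_{ij}\), \(E_{ij}\), \(R_i\), \(C_j\), \(n\), \(r\), and \(c\) are notation introduced here, not from the original write-up: observed and expected counts, row and column totals, the overall total, and the number of rows and columns. The worked line uses the full-time row total (8867), the "20 or Younger" column total (10519), and the overall total (25266) from the observed counts reported in this section.

```latex
\[
E_{ij} = \frac{R_i \, C_j}{n}, \qquad
\chi^2 = \sum_{i}\sum_{j} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \qquad
df = (r - 1)(c - 1)
\]

\[
E_{\text{Full-Time},\,\text{20 or Younger}}
  = \frac{8867 \times 10519}{25266} \approx 3691.6
\]

\[
df_{\text{age}} = (2 - 1)(4 - 1) = 3, \qquad
df_{\text{race}} = (2 - 1)(7 - 1) = 6
\]
```

The worked expected count and both degrees of freedom agree with the values R reports for these tests.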

#Enrollment status vs. age 
observed_status_age <- table(mc_enrollment_clean$student_status,
mc_enrollment_clean$age_group)
observed_status_age
##            
##             20 or Younger 21 - 24 25 - 29 30 or Older
##   Full-Time          5537    1819     778         733
##   Part-Time          4982    4512    2539        4366
#Enrollment status vs. race 
observed_status_race <- table(mc_enrollment_clean$student_status,
mc_enrollment_clean$race)
observed_status_race
##            
##             Asian Black Hispanic Multi-Race Native American Pacific Islander
##   Full-Time  1577  2874      440        334             191              127
##   Part-Time  1961  5343     1588        526             310              166
##            
##             White
##   Full-Time  3324
##   Part-Time  6505
#Chi square test
#For Age
chi_status_age <- chisq.test(observed_status_age)
chi_status_age
## 
##  Pearson's Chi-squared test
## 
## data:  observed_status_age
## X-squared = 2692.1, df = 3, p-value < 2.2e-16
#For Race 
chi_status_race <- chisq.test(observed_status_race)
chi_status_race
## 
##  Pearson's Chi-squared test
## 
## data:  observed_status_race
## X-squared = 322.52, df = 6, p-value < 2.2e-16
#Check expected cell count assumptions
#Age Group
chi_status_age$expected
##            
##             20 or Younger  21 - 24  25 - 29 30 or Older
##   Full-Time        3691.6 2221.839 1164.088    1789.473
##   Part-Time        6827.4 4109.161 2152.912    3309.527
#Race
chi_status_race$expected
##            
##                Asian    Black  Hispanic Multi-Race Native American
##   Full-Time 1241.647 2883.723  711.7184   301.8135        175.8239
##   Part-Time 2296.353 5333.277 1316.2816   558.1865        325.1761
##            
##             Pacific Islander    White
##   Full-Time         102.8272 3449.448
##   Part-Time         190.1728 6379.552
#Bar-graph 
barplot(table(mc_enrollment_clean$student_status),
        main = "Enrollment Status at Montgomery College",
        xlab = "Enrollment Status",
        ylab = "Count",
        col  = c("#FF7256", "#00C5CD"))

# Side-by-side bar plot: Enrollment Status by Age Group (AI usage)
ggplot(mc_enrollment_clean, aes(x = age_group, fill = student_status)) +
  geom_bar(position = "dodge") +
  labs(title = "Enrollment Status by Age Group",
       x = "Age Group",
       y = "Count",
       fill = "Enrollment Status") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

# Side-by-side bar plot: Enrollment Status by Race (AI usage)
ggplot(mc_enrollment_clean, aes(x = race, fill = student_status)) +
  geom_bar(position = "dodge") +
  labs(title = "Enrollment Status by Race",
       x = "Race",
       y = "Count",
       fill = "Enrollment Status") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
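
Because the groups differ a lot in size, raw counts can make the comparison between groups harder to read. The following is a sketch, not part of the original analysis, that re-enters the printed contingency counts by hand, re-checks the age-group test, and converts both tables to within-group shares using prop.table(). The object names (`age_counts`, `race_counts`, `check_age`, `share_by_age`, `share_by_race`) are illustrative.

```r
# Counts copied by hand from the contingency tables printed above
age_counts <- matrix(
  c(5537, 1819,  778,  733,
    4982, 4512, 2539, 4366),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    status    = c("Full-Time", "Part-Time"),
    age_group = c("20 or Younger", "21 - 24", "25 - 29", "30 or Older")
  )
)

race_counts <- matrix(
  c(1577, 2874,  440, 334, 191, 127, 3324,
    1961, 5343, 1588, 526, 310, 166, 6505),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    status = c("Full-Time", "Part-Time"),
    race   = c("Asian", "Black", "Hispanic", "Multi-Race",
               "Native American", "Pacific Islander", "White")
  )
)

# Re-run the age-group test on the re-entered counts
check_age <- chisq.test(age_counts)
check_age$statistic            # should agree with the X-squared reported above
check_age$parameter            # degrees of freedom
all(check_age$expected >= 5)   # expected-count assumption

# Share of full-time vs part-time within each column (margin = 2)
share_by_age  <- prop.table(age_counts,  margin = 2)
share_by_race <- prop.table(race_counts, margin = 2)
round(share_by_age, 3)
round(share_by_race, 3)
```

Column shares show the part-time proportion for each group directly, so groups of very different sizes can be compared on the same scale.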

Conclusion and Future Directions

My analysis of the Montgomery College enrollment data showed significant differences in enrollment status across both age groups and racial groups. Younger students were more likely to be full time, while older students were much more likely to attend part time. The results for race also showed clear differences between full-time and part-time students, with every racial group having more part-time students than full-time students. These findings answered my research questions by showing how enrollment status varies depending on both age and race.

The finding that older students are more likely to attend part time suggests they may be balancing school with work or other responsibilities. The data also showed all racial groups being more part time than full time, which may reflect limited access to resources such as financial aid, childcare, or transportation. Understanding these patterns is important because enrollment status is often linked to academic progress, time to degree, and overall student success.

For future research, I would want to explore other variables such as gender, campus location, and student type (whether a student is new, continuing, or returning), which would help reveal additional patterns in who enrolls full-time versus part-time. Understanding these trends would help show whether certain groups need more academic, financial, or scheduling support.