Introduction - The Problem

The majority of community college students, approximately 80%, begin their college education intending to transfer to a bachelor’s degree program and complete it, either before or after earning an associate’s degree; however, only about 17% earn a bachelor’s degree within six years.[1] Because community colleges are affordable, they are especially attractive to socioeconomically disadvantaged students and to students from underrepresented groups, who may be the first in their families to attend college. Advisers also often recommend them to students with financial difficulties. As a result, these students are disproportionately enrolled at community colleges.[2] Yet there is evidence that beginning post-secondary education at a community college may hinder the long-term academic and career goals of students seeking a bachelor’s degree.[3] The transfer process, often called the ‘transfer pipeline’, presents significant hurdles that are at least partly to blame for the low rate of bachelor’s degree attainment among students who start at community colleges. This may be even more true for humanities majors, who we suspect face unique difficulties in the transfer process. They lack the kind of support that is more often offered to STEM majors through programs specifically designed to help students in those fields navigate transfer.

Review of the Literature

Using Data Mining to Explore Why Community College Transfer Students Earn Bachelor’s Degrees With Excess Credits

This study used data mining techniques, including partition trees, and multiple regression analysis to measure how demographic and academic predictors affect two outcome variables: the total number of excess credits attempted and the total number of excess credits earned. Excess credits, defined as credits beyond those required for the degree attained, are considered one of the major obstacles to degree completion. The study compared three groups: community college students who earned an associate degree before transferring, community college students who transferred without one, and students who started at a four-year college.
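
The excess-credit outcome in this study reduces to simple arithmetic. The sketch below is our own illustration of that definition, not the study's code. The record fields are hypothetical, as is the 120-credit requirement in the example.

```python
from dataclasses import dataclass


@dataclass
class DegreeRecord:
    # Hypothetical fields; real records may name or store these differently.
    credits_attempted: float
    credits_earned: float
    credits_required: float  # credits required for the degree actually attained


def excess_credits(record: DegreeRecord) -> tuple[float, float]:
    """Return (excess attempted, excess earned) credits, floored at zero."""
    excess_attempted = max(0.0, record.credits_attempted - record.credits_required)
    excess_earned = max(0.0, record.credits_earned - record.credits_required)
    return excess_attempted, excess_earned


# Example: a student who attempted 150 and earned 138 credits
# toward a degree assumed (hypothetically) to require 120 credits.
print(excess_credits(DegreeRecord(150.0, 138.0, 120.0)))  # (30.0, 18.0)
```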

Data Mining and Its Applications in Higher Education

This book chapter gives a brief overview of ‘data mining’, or machine learning, techniques, including decision trees, artificial neural networks, and chi-squared automatic interaction detection (CHAID), and of their applications in higher education. It emphasizes the difference between two approaches to research. In the data science approach, the norm is to understand the data while refraining from prior assumptions. The traditional research process instead requires statistical assumptions and prior hypotheses.

Using Longitudinal Data to Increase Community College Student Success

This article presents a research framework for evaluating student progression through derived milestones. Here, milestones are accomplishments that students achieve as they move through the pipeline from community college to a four-year college. Momentum, a second key performance indicator (KPI), consolidates stages of progress that the milestone metric may not capture.
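
In our data, a milestone framework of this kind could be implemented as a set of yes/no checks applied to each student's record. The sketch below is hypothetical. The milestone names, their order, the 30-credit threshold, and the record fields are our assumptions, not the article's definitions. The momentum metric is not modeled.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class StudentProgress:
    # Hypothetical per-student summary fields.
    cc_credits_earned: float
    earned_associate: bool
    transferred: bool
    earned_bachelor: bool


# Hypothetical milestones, listed in pipeline order.
MILESTONES: list[tuple[str, Callable[[StudentProgress], bool]]] = [
    ("completed_30_cc_credits", lambda s: s.cc_credits_earned >= 30),
    ("earned_associate", lambda s: s.earned_associate),
    ("transferred", lambda s: s.transferred),
    ("earned_bachelor", lambda s: s.earned_bachelor),
]


def milestones_reached(student: StudentProgress) -> list[str]:
    """Return the names of the milestones this student has reached."""
    return [name for name, reached in MILESTONES if reached(student)]


# A student who transferred after 45 credits without an associate degree.
student = StudentProgress(45, earned_associate=False, transferred=True, earned_bachelor=False)
print(milestones_reached(student))  # ['completed_30_cc_credits', 'transferred']
```

The code checks each milestone independently instead of assuming a fixed sequence, because students may transfer either before or after earning an associate degree.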

Project Description

Objectives

We intend to contribute to an ongoing research project at the university, whose data we will use, by investigating factors that influence the success or failure of humanities students at community colleges. Specifically, we will compare humanities majors with non-humanities majors to answer the following questions:

  1. Are there student characteristics particular to humanities majors, as opposed to non-humanities majors, that predict outcomes such as successful transfer and bachelor’s degree completion?
  2. Do the predictors and outcomes differ depending on the specific humanities major?

Data

We expect to have access to the full university database. It includes quantitative and qualitative data on student characteristics such as demographics, high school records, parental education, academic performance, attempted and successful transfers, transferred credits, and graduation dates. We will submit a list of the variables we would like to use for the project, and those variables will be extracted for us. Student-level data will be de-identified and accessible only to the researchers and their academic adviser for the project.

Methodology

Assumptions

We will use the definition of the humanities agreed on by the University project team, which includes the following fields:

  • Area, Ethnic, Cultural, Gender, And Group Studies
  • Communication, Journalism, And Related Programs
  • English Language And Literature/Letters
  • Foreign Languages, Literatures, And Linguistics
  • History
  • Liberal Arts And Sciences, General Studies And Humanities
  • Philosophy And Religious Studies
  • Visual And Performing Arts
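
Applying this definition amounts to a lookup from each student's recorded field of study to a humanities or non-humanities label. The sketch assumes, hypothetically, that student records store the field title in the same wording as the list above. If the data uses program codes instead, a code-to-title mapping would be needed first.

```python
# Field titles as listed in the University project team's definition.
HUMANITIES_FIELDS = [
    "Area, Ethnic, Cultural, Gender, And Group Studies",
    "Communication, Journalism, And Related Programs",
    "English Language And Literature/Letters",
    "Foreign Languages, Literatures, And Linguistics",
    "History",
    "Liberal Arts And Sciences, General Studies And Humanities",
    "Philosophy And Religious Studies",
    "Visual And Performing Arts",
]


def normalize(title: str) -> str:
    """Case-fold and collapse whitespace so minor formatting differences still match."""
    return " ".join(title.casefold().split())


_NORMALIZED_HUMANITIES = {normalize(t) for t in HUMANITIES_FIELDS}


def major_group(field_title: str) -> str:
    """Label a major's field title as 'humanities' or 'non-humanities'."""
    if normalize(field_title) in _NORMALIZED_HUMANITIES:
        return "humanities"
    return "non-humanities"


print(major_group("history"))  # humanities
print(major_group("Biological And Biomedical Sciences"))  # non-humanities
```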

We will also use the University project team’s definitions of “Transfer Success” and “Graduation Success,” although we do not yet have access to those definitions.

Tools

We will use R (and possibly Python) for analysis and GitHub for collaboration. We will choose specific R or Python packages as needed.

Exploratory Data Analysis (EDA)

We will thoroughly examine the data and its underlying structure, including distributions, frequencies, missing data, and outliers. We also want to determine how to map the data to a sound database design by identifying suitable keys, such as primary and foreign keys.
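
These checks can be sketched with Python's standard library. The sketch covers per-column missing-value counts, Tukey's interquartile-range (IQR) rule for flagging outliers, and a test of whether a column could serve as a primary key. The toy records and column names are hypothetical. The 1.5 times IQR fence is a common convention, not a project decision.

```python
import statistics


def missing_counts(rows: list[dict], columns: list[str]) -> dict[str, int]:
    """Count missing (None or absent) values per column."""
    return {c: sum(1 for row in rows if row.get(c) is None) for c in columns}


def observed(rows: list[dict], column: str) -> list:
    """Return the non-missing values of one column."""
    return [row[column] for row in rows if row.get(column) is not None]


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def is_candidate_key(rows: list[dict], column: str) -> bool:
    """A column can serve as a primary key only if it is never missing and never repeats."""
    values = [row.get(column) for row in rows]
    return None not in values and len(set(values)) == len(values)


# Hypothetical toy records; column names are illustrative, not from the university database.
rows = [
    {"student_id": 101, "hs_gpa": 3.2, "credits_first_term": 10},
    {"student_id": 102, "hs_gpa": None, "credits_first_term": 11},
    {"student_id": 103, "hs_gpa": 2.8, "credits_first_term": 12},
    {"student_id": 104, "hs_gpa": 3.9, "credits_first_term": 12},
    {"student_id": 105, "hs_gpa": 3.5, "credits_first_term": 13},
    {"student_id": 106, "hs_gpa": 3.0, "credits_first_term": 14},
    {"student_id": 107, "hs_gpa": 2.5, "credits_first_term": 15},
    {"student_id": 108, "hs_gpa": 3.7, "credits_first_term": 100},
]

print(missing_counts(rows, ["student_id", "hs_gpa", "credits_first_term"]))
print(iqr_outliers(observed(rows, "credits_first_term")))  # [100]
print(is_candidate_key(rows, "student_id"), is_candidate_key(rows, "hs_gpa"))
```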

Correlations will be useful for identifying relationships among variables. Because we do not yet have access to the data, our hypotheses are provisional. For example, we hypothesize that bachelor’s degree completion is correlated with household income.

Predictive modeling

We will use predictive modeling to identify the features that best predict successful transfer from a community college to a four-year college and subsequent bachelor’s degree completion. This work also requires the necessary statistical rigor in data preparation and interpretation, including imputation, outlier treatment, transformations, feature engineering, and assessment of variable importance.
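
The preprocessing steps listed above, followed by a simple classifier, could look like the sketch below. It uses only the Python standard library so that it runs anywhere. Median imputation, capping at Tukey's fences, standardization, and a gradient-descent logistic regression are our illustrative choices, not methods the project has settled on. All feature names and values are hypothetical.

```python
import math
import statistics


def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    med = statistics.median([v for v in values if v is not None])
    return [med if v is None else v for v in values]


def cap_outliers(values, k=1.5):
    """Winsorize: clip values to Tukey's fences instead of dropping them."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale to mean 0 and population standard deviation 1 (fails on constant input)."""
    mu, sd = statistics.fmean(values), statistics.pstdev(values)
    return [(v - mu) / sd for v in values]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression by batch gradient descent on the mean log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, xi):
    """Predicted probability of the positive class for one feature vector."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))


# Hypothetical toy data: high school GPA (one missing), household income in $1,000s
# (one extreme value), and whether the student transferred (1) or not (0).
hs_gpa = [2.1, 2.5, None, 3.0, 3.2, 3.4, 3.7, 3.9]
income_k = [30, 42, 38, 55, 47, 61, 58, 400]
transferred = [0, 0, 0, 1, 0, 1, 1, 1]

gpa_z = standardize(impute_median(hs_gpa))
income_z = standardize(cap_outliers(income_k))
X = [list(row) for row in zip(gpa_z, income_z)]
w, b = fit_logistic(X, transferred)

# With standardized inputs, |weight| gives a rough, model-specific importance ranking.
for name, weight in sorted(zip(["hs_gpa", "income"], w), key=lambda t: -abs(t[1])):
    print(f"{name}: {weight:+.2f}")
```

In the actual analysis, a tested implementation such as R's `glm()` with `family = binomial` or scikit-learn's `LogisticRegression` would replace the hand-written fitting loop. Variable importance would ideally be checked on held-out data rather than read off training-set weights.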

Evaluation Measures

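Correlations and other hypotheses will be tested at specified confidence levels. Models will be evaluated with metrics such as accuracy for classification outcomes, and adjusted R-squared, mean absolute percentage error (MAPE), and root mean squared error (RMSE) for continuous outcomes.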
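
The metrics named above are simple enough to state directly in code. This Python sketch is for reference only; in practice we would use the implementations in our chosen R or Python packages. The Fisher z-transformation for a correlation's confidence interval is one suggested way of applying confidence levels to correlations, not a decision the project has made.

```python
import math
from statistics import NormalDist


def accuracy(y_true, y_pred):
    """Share of predictions that match the observed class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the outcome's own units."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined when any observed value is zero."""
    if any(t == 0 for t in y_true):
        raise ValueError("MAPE is undefined when an observed value is 0")
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def adjusted_r_squared(y_true, y_pred, n_predictors):
    """R-squared penalized for the number of predictors in the fitted model."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)


def correlation_ci(r, n, confidence=0.95):
    """Approximate confidence interval for Pearson's r via the Fisher z-transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))  # 0.75
print(correlation_ci(0.5, 100))
```

If the interval returned by `correlation_ci` excludes zero, the correlation differs from zero at the chosen confidence level. Adjusted R-squared uses the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors.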

References