Team info

  • Group name: Mammoth Statisticians
  • Group members: Nam Nguyen, Juhwan Jeong

Purpose

State your research question, a description of the variables you’ll use, and your data sources (please include website links if possible).

Research Question: How much does financial aid affect graduation rates and retention rates across different types of institutions?

Variables:

  • The name of the university/college (name) is our identification variable.
  • The amount of financial aid (aid_values) is our numerical explanatory/predictor variable.
  • The type of university (type) is our categorical explanatory/predictor variable.
  • The graduation rate within the expected time (grad_100; i.e., completing a degree within 4 years at a 4-year institution), the graduation rate within 1.5 times the expected time (grad_150; i.e., completing a degree within 6 years at a 4-year institution), and the retention rate (retain) are our numerical outcome variables.

These data were pulled from the College Completion microsite produced by The Chronicle of Higher Education with support from the Bill & Melinda Gates Foundation.

  • Microsite: http://collegecompletion.chronicle.com/
  • Data file: https://data.world/databeats/college-completion/workspace/file?filename=cc_institution_details.csv

Load packages and data

  1. Load all necessary packages
  2. Load the dataset, run the clean_names() function from the janitor package, then select() only the variables you are going to use.

Example:

name                                 type                    aid_values  grad_100  grad_150  retain
Alabama A&M University               Public                  7142        10.0      29.1      63.1
University of Alabama at Birmingham  Public                  6088        29.4      53.5      80.2
Amridge University                   Private not-for-profit  2540        0.0       66.7      37.5
University of Alabama at Huntsville  Public                  6647        16.5      48.4      81.0
Alabama State University             Public                  7256        8.8       25.2      62.2
University of Alabama at Tuscaloosa  Public                  10390       42.7      66.7      87.0
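
A minimal sketch of the two loading steps, which could produce a table shaped like the one above. The raw column names (chronname, control, aid_value, grad_100_value, grad_150_value, retain_value) and the local file path are assumptions about the downloaded file, not something stated in this proposal. A two-row stand-in data frame replaces the real file so the sketch runs on its own.

```r
library(dplyr)    # select() and the %>% pipe
library(janitor)  # clean_names()

# Stand-in for the downloaded file so this sketch is self-contained.
# With the real data this would be something like:
#   raw <- readr::read_csv("cc_institution_details.csv")  # assumed local path
# The raw column names below are assumptions about that file.
raw <- data.frame(
  chronname      = c("Alabama A&M University", "Amridge University"),
  control        = c("Public", "Private not-for-profit"),
  aid_value      = c(7142, 2540),
  grad_100_value = c(10.0, 0.0),
  grad_150_value = c(29.1, 66.7),
  retain_value   = c(63.1, 37.5),
  state          = c("Alabama", "Alabama")  # extra column, dropped by select()
)

college <- raw %>%
  clean_names() %>%            # standardize column names to snake_case
  select(                      # keep and rename only the variables we use
    name       = chronname,
    type       = control,
    aid_values = aid_value,
    grad_100   = grad_100_value,
    grad_150   = grad_150_value,
    retain     = retain_value
  )

print(college)
```

Renaming inside select() keeps the step to a single call; a separate rename() afterwards would work equally well.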

Create EDA visualizations

Create “exploratory data analysis” visualizations of your data. These are preliminary and can change for the submission; the only requirement is that your visualizations use each of the measurement variables included in your dataset, to test whether they work.
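
One possible starting point, sketched here with only the six example rows shown earlier rather than the full dataset: a faceted scatterplot of financial aid against each outcome variable, colored by institution type. The object names (college, college_long, eda_plot) are illustrative, and the choice of plot is an assumption, not a requirement of the assignment.

```r
library(ggplot2)  # plotting
library(tidyr)    # pivot_longer()

# The six example rows from this proposal, used as stand-in data.
college <- data.frame(
  name = c("Alabama A&M University", "University of Alabama at Birmingham",
           "Amridge University", "University of Alabama at Huntsville",
           "Alabama State University", "University of Alabama at Tuscaloosa"),
  type = c("Public", "Public", "Private not-for-profit",
           "Public", "Public", "Public"),
  aid_values = c(7142, 6088, 2540, 6647, 7256, 10390),
  grad_100   = c(10.0, 29.4, 0.0, 16.5, 8.8, 42.7),
  grad_150   = c(29.1, 53.5, 66.7, 48.4, 25.2, 66.7),
  retain     = c(63.1, 80.2, 37.5, 81.0, 62.2, 87.0)
)

# Stack the three outcome variables into one column so a single plot
# can show all of them against financial aid.
college_long <- pivot_longer(
  college,
  cols      = c(grad_100, grad_150, retain),
  names_to  = "outcome",
  values_to = "rate"
)

# One panel per outcome; color distinguishes institution type.
eda_plot <- ggplot(college_long,
                   aes(x = aid_values, y = rate, color = type)) +
  geom_point() +
  facet_wrap(~ outcome) +
  labs(x = "Financial aid (aid_values)", y = "Rate (%)",
       color = "Institution type")

print(eda_plot)
```

This uses every variable except name, which serves only as an identifier. With the full dataset, a boxplot of each outcome by type would be a natural companion plot.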