Context of Data
In 2018, 21,441 students matriculated into medical schools across the United States (1). These students are among the 40.8% of all applicants who received admission to a medical school; the unfortunate news is that they face an average annual tuition of over $50,000. Smith College pre-health students are included in these statistics every year: from 2004 to 2013, 209 Smith graduates matriculated into medical schools across the nation (2). The number of students on the pre-health track increases every year, and this trend is evident in many introductory STEM courses (e.g., SDS 220, organic chemistry, cell biology, and physics), where pre-health students make up a sizeable portion of each class.
A student applying in the 2018 cycle would have considered various factors when choosing a potential medical school, including GPA, MCAT scores, location, enrollment size, school rank, tuition, and popular residency programs. Once a student is accepted, tuition becomes a significant factor in deciding which medical school to attend, as the average debt for newly minted doctors is approximately $220,000.
Source of Data
U.S. News and World Report, a well-established source used by countless pre-health students, provides these data points (3). Each row in our dataset corresponds to one of 94 allopathic medical schools in the United States (i.e., schools that grant M.D. degrees as opposed to D.O. degrees) and records its tuition, enrollment, and categorical rank.
Because many of the 94 medical schools are tied for the same rank, we converted the rankings into a categorical variable by grouping the schools into four categories: 1st ranked, 2nd ranked, 3rd ranked, and 4th ranked.
Research Question
In this study, we set out to determine the factors that influence a medical school’s tuition rates. Specifically, we evaluate enrollment (numerical explanatory variable) and rank (categorical explanatory variable).
Limitations of Data
Our dataset, based on information from U.S. News and World Report, is limited in two respects: 1) we can only consider allopathic medical schools, and 2) the rankings may not be fully objective.
Applicants can choose to attend an allopathic medical school (M.D. degree) or an osteopathic medical school (D.O. degree), but U.S. News and World Report ranks only the allopathic medical schools. Thus, the 94 medical schools do not represent the entirety of American medical schools, as D.O. schools are excluded from our dataset.
Moreover, the methodology by which U.S. News and World Report ranks medical schools changes frequently, giving different weights to different factors every year (4). The ranking methodology also includes factors such as “academic peer assessment surveys,” which are more subjective.
The structure of our data set, including the first few values of each variable, is shown using the glimpse function.
## Observations: 92
## Variables: 4
## $ name_of_school <chr> "Harvard University", "Johns Hopkins Universi...
## $ tuition <dbl> 59800, 51900, 52814, 56229, 46631, 49900, 559...
## $ enrollement <int> 715, 485, 531, 482, 607, 263, 595, 727, 494, ...
## $ categorical_rank <chr> "1st", "1st", "1st", "1st", "1st", "1st", "1s...
| mean_tuition | median_tuition | sd_tuition | IQR_tuition |
|---|---|---|---|
| 54961.92 | 55975 | 11264.49 | 10899.25 |
The mean and median tuition rates for M.D. medical schools across the United States are similar, differing by approximately 1,000 (USD). The spread is much larger: the standard deviation is approximately 11,300 (USD), and the IQR (the distance between the 1st and 3rd quartiles) is approximately 10,900 (USD). This shows that considerable variability exists among medical school tuitions. Our project sets out to determine the sources of such variability.
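As a rough illustration of how these four summary statistics are computed, the sketch below uses only the first six tuition values visible in the glimpse output, so its results will not match the table for the full data set. It is written in Python for illustration and is not the code that produced the table. The variable and function names are hypothetical, and the quartile method (`inclusive`) is an assumption about how the IQR was calculated.

```python
import statistics

# First six tuition values (USD) visible in the glimpse output.
# This is only a subset of the data, used for illustration.
tuition = [59800, 51900, 52814, 56229, 46631, 49900]


def iqr(values):
    """Interquartile range: distance between the 1st and 3rd quartiles."""
    # Assumption: linear interpolation that includes both endpoints.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


summary = {
    "mean_tuition": statistics.mean(tuition),
    "median_tuition": statistics.median(tuition),
    "sd_tuition": statistics.stdev(tuition),  # sample standard deviation
    "IQR_tuition": iqr(tuition),
}

for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Reporting the standard deviation and IQR alongside the mean and median matters here because the two pairs answer different questions: the first pair locates the typical tuition, the second measures how far schools stray from it.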
| categorical_rank | mean_tuition_rank | median_tuition_rank | sd_tuition_rank | IQR_tuition_rank |
|---|---|---|---|---|
| 1st | 53800.96 | 54566 | 7112.431 | 7783.0 |
| 2nd | 54322.50 | 55318 | 9681.037 | 12814.5 |
| 3rd | 56296.76 | 56022 | 14288.276 | 14796.0 |
| 4th | 55729.31 | 56604 | 14122.631 | 3551.0 |
To examine the effect of rank on tuition rates, we computed the same summary statistics within each of the four ranks. We see little difference in means and medians across the four ranks, but different levels of variability within each rank, as shown by the standard deviations and IQRs.
| mean_enrollement | median_enrollement |
|---|---|
| 651.3913 | 637 |
Finally, we computed the mean and median enrollment for American allopathic medical schools to provide context for our outcome variable, tuition. As the average enrollment is roughly 651 students, each school matriculates approximately 163 first-year medical students annually (651 / 4), assuming enrollment is spread evenly over a four-year program.
The distribution appears roughly symmetric and resembles a bell curve; we do not see a left or right skew in our data.
| correlation |
|---|
| -0.0229924 |
This initial data visualization shows a very weak negative correlation between enrollment and tuition (r ≈ -0.02), which indicates almost no linear relationship. When enrollment passes 1,000 students, the data points become sparser.
We see that the median tuitions across all four ranks are quite similar, but that the amount of variability differs from rank to rank. The lowest ranked medical schools have the most outliers in tuition rates, whereas the highest ranked medical schools have a narrower spread, with only one outlier (Baylor College of Medicine).
For the 1st and 4th ranked medical schools, we see a slight positive relationship between enrollment and tuition. The 3rd ranked medical schools exhibit a slight negative relationship, while the 2nd ranked medical schools show a more strongly negative relationship than the other ranks.
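The correlation coefficient reported above is Pearson's r, the covariance of the two variables divided by the product of their standard deviations. The minimal sketch below computes it from scratch on the first six rows visible in the glimpse output. Because it uses only six schools, its value will differ from the full-data correlation in the table, and the function name `pearson_r` is a hypothetical helper introduced for illustration.

```python
import math

# First six rows visible in the glimpse output (subset for illustration only).
enrollment = [715, 485, 531, 482, 607, 263]
tuition = [59800, 51900, 52814, 56229, 46631, 49900]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sums of cross-products and squared deviations; the 1/(n-1) factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(enrollment, tuition)
print(f"r (first six schools only) = {r:.3f}")
```

A value near 0 means that a straight line explains almost none of the variation in tuition. It does not rule out a relationship that differs by rank, which is what the interaction model examines.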
| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | 50162.105 | 8105.708 | 6.188 | 0.000 | 34043.015 | 66281.194 |
| enrollement | 6.170 | 13.176 | 0.468 | 0.641 | -20.032 | 32.372 |
| categorical_rank2nd | 12381.163 | 10829.344 | 1.143 | 0.256 | -9154.175 | 33916.500 |
| categorical_rank3rd | 8909.417 | 11395.336 | 0.782 | 0.437 | -13751.459 | 31570.292 |
| categorical_rank4th | 2307.526 | 10646.573 | 0.217 | 0.829 | -18864.352 | 23479.404 |
| enrollement:categorical_rank2nd | -18.491 | 16.672 | -1.109 | 0.271 | -51.645 | 14.663 |
| enrollement:categorical_rank3rd | -10.010 | 16.919 | -0.592 | 0.556 | -43.655 | 23.635 |
| enrollement:categorical_rank4th | -0.831 | 16.707 | -0.050 | 0.960 | -34.055 | 32.392 |
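\[
\begin{aligned}
\widehat{y} ={}& 50{,}162.105 + 6.170 \cdot \text{enrollement} \\
&+ 12{,}381.163 \cdot \mathbb{1}_{\text{2nd}} - 18.491 \cdot \text{enrollement} \cdot \mathbb{1}_{\text{2nd}} \\
&+ 8{,}909.417 \cdot \mathbb{1}_{\text{3rd}} - 10.010 \cdot \text{enrollement} \cdot \mathbb{1}_{\text{3rd}} \\
&+ 2{,}307.526 \cdot \mathbb{1}_{\text{4th}} - 0.831 \cdot \text{enrollement} \cdot \mathbb{1}_{\text{4th}}
\end{aligned}
\]
Here \(\widehat{y}\) is the predicted tuition (USD), and \(\mathbb{1}_{\text{2nd}}\) equals 1 if a school's categorical_rank is 2nd and 0 otherwise (likewise for 3rd and 4th).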
Our model fits a different intercept and slope for each categorical rank. The baseline is the 1st categorical rank, with an intercept of 50,162 (USD) and a slope of 6.17 (USD) per additional student enrolled. For schools in the 2nd categorical rank, the slope is -12.32 (USD) per additional student enrolled, while the intercept is much higher at 62,543 (USD). This rank has both the highest intercept and the slope with the largest absolute magnitude. Schools in the 3rd categorical rank have a slope of -3.84, meaning that each additional student enrolled is associated with a 3.84 (USD) decrease in tuition. The intercept for 3rd ranked schools, 59,072 (USD), is lower than that of 2nd ranked schools but higher than that of 1st ranked schools. Schools in the 4th categorical rank have a slope of 5.34, meaning that each additional student enrolled is associated with a 5.34 (USD) increase in tuition. Their intercept is 52,470 (USD), which is higher than that of 1st ranked schools but lower than those of 2nd and 3rd ranked schools. The intercepts do not have a practical interpretation because no school has an enrollment of zero students.
We can see these observations in our exploratory data analysis and our interaction slopes visualization. The slopes of the 1st (red) and 4th (purple) categorically ranked schools are similar and positive, though the 4th ranked line sits above the 1st, meaning it has a higher intercept. Both the 2nd (green) and 3rd (blue) categorically ranked schools have negative slopes, though the green line is much steeper than the blue and also has a higher intercept.
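The per-rank intercepts and slopes follow directly from the regression table: each rank's line is the baseline (1st rank) intercept and slope plus that rank's offset terms. The following sketch does that arithmetic using the rounded coefficients copied from the table. The constant and function names are hypothetical, chosen only for this illustration.

```python
# Coefficients copied from the regression table (rounded to 3 decimals).
BASE_INTERCEPT = 50162.105  # intercept for the baseline rank (1st)
BASE_SLOPE = 6.170          # USD per additional student, baseline rank

# Offsets relative to the baseline: categorical_rank terms and
# enrollement:categorical_rank interaction terms.
INTERCEPT_OFFSET = {"1st": 0.0, "2nd": 12381.163, "3rd": 8909.417, "4th": 2307.526}
SLOPE_OFFSET = {"1st": 0.0, "2nd": -18.491, "3rd": -10.010, "4th": -0.831}


def line_for_rank(rank):
    """Return (intercept, slope) of the fitted tuition line for one rank."""
    return (
        BASE_INTERCEPT + INTERCEPT_OFFSET[rank],
        BASE_SLOPE + SLOPE_OFFSET[rank],
    )


for rank in ("1st", "2nd", "3rd", "4th"):
    intercept, slope = line_for_rank(rank)
    print(f"{rank}: intercept = {intercept:,.3f} USD, slope = {slope:.3f} USD/student")
```

Because the inputs are rounded to three decimals, the results may differ slightly in the last digit from values computed with the unrounded model coefficients.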
Limitations of our analysis include the way we grouped the ranks, because we limited the grouping to only four categories. Outliers, specifically Texas-based medical schools, increase the variance within the categories, so our model is not the most accurate predictor of tuition for all schools. Another limitation is that we cannot make predictions for schools with fewer than 226 or more than 1,409 students, as those are the lowest and highest enrollments in our data set.
There are no clear trends among categorical rank, enrollment, and tuition rates. In the regression table, the 95% confidence interval for the enrollment slope and for every interaction term includes zero. We might expect higher ranked schools to be more expensive; however, the 4th categorically ranked schools have a higher intercept than the 1st ranked schools and a nearly identical slope. In our analysis, we consider only out-of-state tuition, which can be more expensive than in-state tuition, even though state schools populate much of the lower ranked categories. Overall, we see that enrollment does not have a uniform relationship with tuition across all four ranks.
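The extrapolation limit can be made explicit in code. The sketch below is a hypothetical helper, not part of the original analysis. It predicts tuition from the rounded regression-table coefficients and refuses any enrollment outside the observed range of 226 to 1,409 students.

```python
# Observed enrollment range in the data set; predictions outside it
# would be extrapolation.
MIN_ENROLLMENT = 226
MAX_ENROLLMENT = 1409

# Rounded coefficients from the regression table; the 1st rank is the baseline.
BASE_INTERCEPT = 50162.105
BASE_SLOPE = 6.170
INTERCEPT_OFFSET = {"1st": 0.0, "2nd": 12381.163, "3rd": 8909.417, "4th": 2307.526}
SLOPE_OFFSET = {"1st": 0.0, "2nd": -18.491, "3rd": -10.010, "4th": -0.831}


def predict_tuition(enrollment, rank):
    """Predicted tuition (USD) for a school, within the observed range only."""
    if rank not in INTERCEPT_OFFSET:
        raise ValueError(f"unknown categorical rank: {rank!r}")
    if not MIN_ENROLLMENT <= enrollment <= MAX_ENROLLMENT:
        raise ValueError(
            f"enrollment {enrollment} is outside the observed range "
            f"[{MIN_ENROLLMENT}, {MAX_ENROLLMENT}]"
        )
    intercept = BASE_INTERCEPT + INTERCEPT_OFFSET[rank]
    slope = BASE_SLOPE + SLOPE_OFFSET[rank]
    return intercept + slope * enrollment


# Example: a 1st ranked school with 715 students (the first row in glimpse).
print(f"{predict_tuition(715, '1st'):,.2f}")
```

Raising an error rather than returning a number is a deliberate choice in this sketch: a fitted line says nothing reliable about schools smaller or larger than any in the data.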
Moral of the story: Medical schools are expensive, unless you’re in Texas.
Because we noticed that medical schools in the Southwest were the cheapest, we thought it would be interesting to fit a multiple regression model with location as the categorical variable instead of rank.