Context of Data
Map of all medical schools in the United States1
In 2018, 21,441 students matriculated into medical schools across the United States2. Fortunately for these students, they are part of the 40.8% of all applicants who received admission into a medical school; the unfortunate news is that they face an average annual tuition of over $50K. Smith College pre-health students are included in these statistics every year! From 2004 to 2013, 209 Smith College graduates matriculated into medical schools across the nation3. The number of students on the pre-health track increases every year, and this trend is evident in many of the introductory STEM courses (e.g., SDS 220, organic chemistry, cell biology, physics, etc.), as pre-health students compose a sizeable portion of these classes.
A student applying in the 2018 cycle would have considered various factors when choosing a potential medical school. Such factors include GPA, MCAT scores, location, enrollment size, school rank, tuition, and popular residency programs. Once a student is accepted, tuition becomes a significant factor in deciding which medical school to attend, as the average debt is approximately $220,000 for newly minted doctors.
Source of Data
We obtained these data points from U.S. News and World Report, a useful and well-established source for countless pre-health students4. Each row in our dataset corresponds to one of 94 allopathic medical schools in the United States (i.e., schools that give M.D. degrees as opposed to D.O. degrees) and records tuition, enrollment, and categorical rank.
As many of the 94 medical schools are tied for the same rank, we turned these rankings into a categorical variable by grouping the schools into 4 categories: 1st ranked, 2nd ranked, 3rd ranked, and 4th ranked.
Research Question
In this study, we set out to determine the factors that influence a medical school’s tuition rates. Specifically, we evaluate enrollment (numerical explanatory variable) and rank (categorical explanatory variable).
Limitations of Data
Our dataset based on the information provided from U.S. News and World Report is limited in certain aspects in that 1) we can only consider allopathic medical schools, and 2) the ranking may not be 100% objective.
Applicants can choose to attend an allopathic medical school (M.D. degree) or osteopathic medical school (D.O. degree), but this site only ranks the allopathic medical schools. Thus, the 94 medical schools do not represent the entirety of American medical schools as D.O. schools are excluded in our dataset.
Moreover, the methodology by which U.S. News and World Report ranks the medical schools changes frequently, giving more weight to different factors every year5. The ranking methodology also includes factors such as “academic peer assessment surveys”, which seem to be more subjective.
A sample of our data is seen below.
name_of_school | tuition | enrollement | categorical_rank |
---|---|---|---|
Tufts University | 60704 | 835 | 3rd |
Boston University | 58976 | 697 | 2nd |
Stony Brook University–SUNY | 65160 | 535 | 3rd |
University of California–Davis | 49337 | 450 | 2nd |
University of Southern California (Keck) | 61428 | 720 | 2nd |
mean_tuition | median_tuition | sd_tuition | IQR_tuition |
---|---|---|---|
$54,961.92 | $55,975 | $11,264.49 | $10,899.25 |
The mean and median tuition rates for M.D. medical schools across the United States are similar, differing by approximately 1,000 (USD). The spread, however, is much larger: the standard deviation is approximately 11,000 (USD), and the IQR (the distance between the 1st and 3rd quartiles) is also approximately 11,000 (USD). This shows that tuition varies considerably between medical schools. Our project sets out to determine the causes of such variability.
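The four summary statistics can be sketched with Python's standard library. This is only an illustration: it uses the five sample schools from the table above rather than all 94, so its output will not match the full-data values, and the helper name `summarize` is hypothetical.

```python
import statistics

# Tuition (USD) for the five sample schools listed in the sample table.
# The full dataset has 94 schools, so these results differ from the report's.
sample_tuition = [60704, 58976, 65160, 49337, 61428]


def summarize(values):
    """Return the mean, median, sample standard deviation, and IQR."""
    # "inclusive" quartiles interpolate between observed data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "iqr": q3 - q1,
    }


summary = summarize(sample_tuition)
for name, value in summary.items():
    print(f"{name}: {value:,.2f}")
```

Comparing the mean with the median gives a quick check for skew, while the standard deviation and IQR describe spread in two different ways: the IQR ignores the tails, so a large gap between the two can hint at outliers.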
categorical_rank | mean_tuition_rank | median_tuition_rank | sd_tuition_rank | IQR_tuition_rank |
---|---|---|---|---|
1st | $53,800.96 | $54,566 | $7,112.43 | $7,783.00 |
2nd | $54,322.50 | $55,318 | $9,681.04 | $12,814.50 |
3rd | $56,296.76 | $56,022 | $14,288.28 | $14,796.00 |
4th | $55,729.31 | $56,604 | $14,122.63 | $3,551.00 |
To examine the effect of rank on tuition rates, we computed the same summary statistics within each of the four ranks. We see little difference in means and medians between the four ranks, but different levels of variability within each rank, as seen in the standard deviations and IQRs.
mean_enrollement | median_enrollement |
---|---|
651.3913 | 637 |
Finally, we computed the mean and median enrollment for American allopathic medical schools to provide context for our outcome variable, tuition. As the average enrollment is approximately 650 students, and assuming enrollment is spread evenly over a four-year program, approximately 163 first-year medical students (651.39 / 4) matriculate at each school annually.
The distribution seems to be normal as it resembles a bell curve. We do not see a left or right skew in our data, meaning we do not need to complete any log transformations.
correlation |
---|
-0.023 |
In order to evaluate the effect of enrollment on tuition, we first found the correlation coefficient between enrollment and tuition. This initial calculation shows a very weak negative correlation (-0.023). That is, as enrollment increases there is a slight tendency for tuition to decrease, but the value is so close to 0 (and so far from -1) that there is essentially no linear relationship.
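As a rough sketch of how such a coefficient is computed, the snippet below implements the Pearson correlation directly. It uses only the five sample schools from the sample table, so its result is not expected to match the full-data value of -0.023; the function name `pearson_r` is hypothetical.

```python
import math

# Enrollment and tuition (USD) for the five sample schools only.
enrollment = [835, 697, 535, 450, 720]
tuition = [60704, 58976, 65160, 49337, 61428]


def pearson_r(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


r = pearson_r(enrollment, tuition)
print(f"r = {r:.3f}")
```

The coefficient always falls between -1 and 1, and values near 0 indicate little or no linear association.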
We then explored a data visualization of tuition cost by categorical rank. We see that the median tuitions between all four ranks are quite similar, but greater variability is present within each rank. The lowest ranked medical schools have the most outliers in tuition rates, whereas the highest ranked medical schools have a smaller distribution, with only one outlier (Baylor College of Medicine).
For the 1st and 4th ranked medical schools, we see a slight positive relationship between enrollment and tuition. The 3rd ranked medical schools exhibit a slight negative relationship, while the 2nd ranked medical schools show a more strongly negative relationship than the other ranks.
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 50162.105 | 8105.708 | 6.188 | 0.000 | 34043.015 | 66281.194 |
enrollement | 6.170 | 13.176 | 0.468 | 0.641 | -20.032 | 32.372 |
categorical_rank2nd | 12381.163 | 10829.344 | 1.143 | 0.256 | -9154.175 | 33916.500 |
categorical_rank3rd | 8909.417 | 11395.336 | 0.782 | 0.437 | -13751.459 | 31570.292 |
categorical_rank4th | 2307.526 | 10646.573 | 0.217 | 0.829 | -18864.352 | 23479.404 |
enrollement:categorical_rank2nd | -18.491 | 16.672 | -1.109 | 0.271 | -51.645 | 14.663 |
enrollement:categorical_rank3rd | -10.010 | 16.919 | -0.592 | 0.556 | -43.655 | 23.635 |
enrollement:categorical_rank4th | -0.831 | 16.707 | -0.050 | 0.960 | -34.055 | 32.392 |
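\[\begin{aligned} \widehat{y} =\ & 50{,}162.105 + 6.170 \cdot \text{enrollment} \\ & + 12{,}381.163 \cdot \mathbb{1}_{\text{is 2nd}} - 18.491 \cdot \text{enrollment} \cdot \mathbb{1}_{\text{is 2nd}} \\ & + 8{,}909.417 \cdot \mathbb{1}_{\text{is 3rd}} - 10.010 \cdot \text{enrollment} \cdot \mathbb{1}_{\text{is 3rd}} \\ & + 2{,}307.526 \cdot \mathbb{1}_{\text{is 4th}} - 0.831 \cdot \text{enrollment} \cdot \mathbb{1}_{\text{is 4th}} \end{aligned}\]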
Example for Second Ranked Schools
\(\widehat{y} = 62{,}543.268 - 12.321 \cdot \text{enrollment}\)
Example for Fourth Ranked Schools
\(\widehat{y} = 52{,}469.631 + 5.339 \cdot \text{enrollment}\)
We chose the interaction model because we did not want to force the slopes of the regression lines to be the same. We wanted to observe whether the enrollment slope, rather than the intercept, changes between categorical ranks, because the intercept has no practical meaning.
Our model estimates a different enrollment slope for each categorical rank. The baseline is the 1st categorical rank, with an intercept of 50,162 (USD) and a slope of 6.17 (USD) per additional student enrolled. For schools in the 2nd categorical rank, the slope is -12.32 (USD) per additional student enrolled, while the intercept is much higher at 62,543 (USD). This is the highest intercept and also the slope with the largest absolute magnitude. Schools in the 3rd categorical rank have a slope of -3.84, meaning that each additional student enrolled is associated with a 3.84 (USD) decrease in tuition. Their intercept, 59,071.52 (USD), is lower than that of the 2nd ranked schools but higher than that of the 1st ranked schools. Schools in the 4th categorical rank have a slope of 5.34, meaning that each additional student enrolled is associated with a 5.34 (USD) increase in tuition. Their intercept is 52,469.63 (USD), which is higher than that of the 1st ranked schools but lower than those of the 2nd and 3rd ranked schools. The intercepts do not have a practical interpretation because no school has an enrollment of zero students.
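The rank-specific intercepts and slopes quoted above follow from adding each rank's offsets to the baseline coefficients. The sketch below reproduces that arithmetic from the regression table; the constant and function names are hypothetical and chosen only for illustration.

```python
# Coefficients copied from the regression table (baseline: 1st categorical rank).
BASE_INTERCEPT = 50162.105
BASE_SLOPE = 6.170

# Offsets relative to the baseline; the 1st rank has no offset by construction.
INTERCEPT_OFFSET = {"1st": 0.0, "2nd": 12381.163, "3rd": 8909.417, "4th": 2307.526}
SLOPE_OFFSET = {"1st": 0.0, "2nd": -18.491, "3rd": -10.010, "4th": -0.831}


def rank_line(rank):
    """Return (intercept, slope) of the fitted line for one categorical rank."""
    intercept = BASE_INTERCEPT + INTERCEPT_OFFSET[rank]
    slope = BASE_SLOPE + SLOPE_OFFSET[rank]
    return intercept, slope


for rank in ("1st", "2nd", "3rd", "4th"):
    intercept, slope = rank_line(rank)
    print(f"{rank}: tuition = {intercept:,.3f} + ({slope:.3f}) * enrollment")
```

Because the interaction terms are offsets, the sign of a rank's slope depends on both the baseline slope and the offset: the 4th rank's offset is negative, yet its slope stays positive.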
We can see these observations in our exploratory data analysis and our interaction slopes visualization. The slopes of the 1st (red) and 4th (purple) categorically ranked schools are similar and positive, though the 4th is above the 1st, meaning it has a higher intercept. Both the 2nd (green) and 3rd (blue) categorically ranked schools have negative slopes, though the green line is much steeper than the blue and also has a higher intercept.
Limitations of our analysis include the method of categorically grouping the ranks, because we have limited our grouping to only four categories. There are outliers, specifically from Texas-based medical schools, that cause higher degrees of variance within the categories. This indicates that our model is not the most accurate predictor of tuition for all schools. Another limitation is that we cannot make predictions for schools with fewer than 226 students or more than 1409 students, as those are the lowest and highest enrollments in our data set.
There are no clear trends between categorical rank, enrollment, and tuition rates. We might expect that higher ranked schools would be more expensive; however, we see that the 4th categorically ranked schools have a higher intercept than the 1st ranked schools and a similar slope. In our analysis, we are considering only out-of-state tuitions, which can be more expensive than in-state tuitions, even though state schools populate much of the lower ranked categories. Overall, we see that enrollment does not have a uniform correlation with tuition across all four ranks.
Moral of the story: Medical schools are expensive, unless you’re in Texas.
The null hypothesis is that there is no relationship between cost of tuition and enrollment size, taking into account the categorical rank of schools.
\[\begin{aligned} H_0:&\ \beta_{enrollment:firstranked} = \beta_{enrollment:secondranked} = \beta_{enrollment:thirdranked} = \beta_{enrollment:fourthranked} = 0 \\ \mbox{vs }H_A:&\ \mbox{at least one of these enrollment slopes} \neq 0 \end{aligned}\]
term | p_value | lower_ci | upper_ci |
---|---|---|---|
intercept | 0.000 | 34043.015 | 66281.194 |
enrollement | 0.641 | -20.032 | 32.372 |
categorical_rank2nd | 0.256 | -9154.175 | 33916.500 |
categorical_rank3rd | 0.437 | -13751.459 | 31570.292 |
categorical_rank4th | 0.829 | -18864.352 | 23479.404 |
enrollement:categorical_rank2nd | 0.271 | -51.645 | 14.663 |
enrollement:categorical_rank3rd | 0.556 | -43.655 | 23.635 |
enrollement:categorical_rank4th | 0.960 | -34.055 | 32.392 |
First we must complete residual analysis to determine if we can make conclusions based on our linear regression.
The scatterplot of residuals shows no particular trend, meaning that the average value of the residuals is around zero. There seem to be equal numbers of residuals above and below zero, so our regression model produces both negative and positive errors that average out around zero. The histogram also shows a roughly normal distribution of the residuals, further indicating that the linear regression model is a viable option for exploring our data set.
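To make the idea of a residual concrete, the sketch below computes observed minus predicted tuition for the five sample schools, using the coefficients from the regression table. It is a minimal illustration rather than the full residual analysis over all 94 schools, and the names `predict_tuition` and `SAMPLE` are hypothetical.

```python
# Baseline (1st rank) coefficients from the regression table.
INTERCEPT = 50162.105
SLOPE = 6.170

# rank -> (intercept offset, slope offset), also from the regression table.
OFFSETS = {
    "1st": (0.0, 0.0),
    "2nd": (12381.163, -18.491),
    "3rd": (8909.417, -10.010),
    "4th": (2307.526, -0.831),
}

# (school, observed tuition in USD, enrollment, categorical rank)
SAMPLE = [
    ("Tufts University", 60704, 835, "3rd"),
    ("Boston University", 58976, 697, "2nd"),
    ("Stony Brook University–SUNY", 65160, 535, "3rd"),
    ("University of California–Davis", 49337, 450, "2nd"),
    ("University of Southern California (Keck)", 61428, 720, "2nd"),
]


def predict_tuition(enrollment, rank):
    """Fitted tuition from the interaction model for one school."""
    d_intercept, d_slope = OFFSETS[rank]
    return (INTERCEPT + d_intercept) + (SLOPE + d_slope) * enrollment


# Residual = observed - fitted; positive means the model under-predicts.
residuals = {
    name: observed - predict_tuition(enrollment, rank)
    for name, observed, enrollment, rank in SAMPLE
}

for name, value in residuals.items():
    print(f"{name}: {value:,.2f}")
```

With the full dataset, plotting these residuals against enrollment and in a histogram gives the two diagnostic views described above.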
If the regression slopes were zero, there would be no relationship between enrollment and tuition, taking into account categorical rank. Sampling variation will cause a range of values for the regression slopes from sample to sample or year to year. The plausible range of values for the enrollment slope of each categorical rank, taking into account sampling variation, is found from the confidence intervals and p values in the regression table above.
The intercept and the differences in intercepts have no practical interpretation because there are no schools with zero enrollment; therefore, we will not interpret the intercepts in the regression table. The enrollment slope for the first categorical ranked schools is 6.17 and its 95% confidence interval is [-20.0, 32.4], which contains zero. The observed slope differs from zero, but by no more than we would expect from sampling variation alone, so zero is a plausible value for the true slope. Because the confidence interval contains zero, we fail to reject the null hypothesis. The p value is 0.641, well above 0.05, and therefore we also fail to reject the null hypothesis that there is no effect of enrollment on tuition cost.
The enrollment slope for the second categorical ranked schools is -12.32. The 95% confidence interval for its offset from the first ranked slope is [-51.6, 14.7], which also contains zero, meaning we cannot distinguish the second ranked slope from the first ranked slope. Since the first ranked slope is itself plausibly zero, the slope for second ranked schools could be zero as well, in which case enrollment would not affect tuition rates for these schools. The p value is 0.271, also above 0.05, and therefore we fail to reject the null hypothesis that there is no effect of enrollment on tuition cost.
The enrollment slope for the third categorical ranked schools is -3.84. The 95% confidence interval for its offset from the first ranked slope is [-43.7, 23.6], which also contains zero, meaning there is no detectable difference in enrollment slope between first and third categorically ranked schools. The slope for third ranked schools could therefore be zero as well, in which case enrollment would not affect tuition rates for these schools either. The p value is 0.556, also above 0.05, and therefore we fail to reject the null hypothesis that there is no effect of enrollment on tuition cost.
The enrollment slope for the fourth categorical ranked schools is 5.34. The 95% confidence interval for its offset from the first ranked slope is [-34.1, 32.4], which also contains zero, meaning there is no detectable difference in enrollment slope between first and fourth categorically ranked schools. The slope for fourth ranked schools could therefore be zero as well, in which case enrollment would not affect tuition rates for these schools either. The p value is 0.96, also above 0.05, and therefore we fail to reject the null hypothesis that there is no effect of enrollment on tuition cost.
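The decision rule applied to each enrollment term above can be summarized in a short sketch: a term gives no evidence of an effect when its p value exceeds 0.05 and its 95% confidence interval contains zero. The values are copied from the regression table; the function name `fails_to_reject` is hypothetical.

```python
ALPHA = 0.05  # significance level matching the 95% confidence intervals

# term -> (p_value, lower_ci, upper_ci), copied from the regression table.
ENROLLMENT_TERMS = {
    "enrollement": (0.641, -20.032, 32.372),
    "enrollement:categorical_rank2nd": (0.271, -51.645, 14.663),
    "enrollement:categorical_rank3rd": (0.556, -43.655, 23.635),
    "enrollement:categorical_rank4th": (0.960, -34.055, 32.392),
}


def fails_to_reject(p_value, lower_ci, upper_ci, alpha=ALPHA):
    """True when zero is a plausible value for the coefficient."""
    return p_value > alpha and lower_ci <= 0.0 <= upper_ci


decisions = {
    term: fails_to_reject(*values) for term, values in ENROLLMENT_TERMS.items()
}

for term, keep_null in decisions.items():
    verdict = "fail to reject H0" if keep_null else "reject H0"
    print(f"{term}: {verdict}")
```

For contrast, the intercept row (p value 0.000, interval [34043.015, 66281.194]) does not satisfy this rule, although the intercept has no practical interpretation here.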
Our results demonstrate that no statistically significant relationship exists between medical school tuition rates and enrollment, taking into account categorical rank. Although our multiple regression model indicates a slightly positive relationship between enrollment and tuition for 1st and 4th ranked schools, and a slightly negative relationship between enrollment and tuition for 2nd and 3rd ranked medical schools, we can conclude at the 95% confidence level that these slopes are close enough to zero that there is no detectable effect of enrollment on tuition across all four ranks.
The “take-home message” of our analysis is that tuition costs for medical schools are not directly influenced by enrollment and rank. Although such aspects may play a role in tuition rates, a host of other factors must be considered (e.g., location, private or public affiliation, facilities/infrastructure costs, staff salaries, malpractice insurance, etc.) to gain a comprehensive understanding.
Moreover, limitations to our research question are important to consider when evaluating the determinants of tuition rates. Our data set focuses solely on allopathic schools (n = 94) while excluding osteopathic schools (n = 34), since U.S. News and World Report does not use uniform criteria for rating both types of medical schools in the same ranked list. Another caveat is that U.S. News and World Report adjusts the weights of different factors annually, so schools on the borderline of the four categorical ranks can fluctuate by year. Lastly, tuition rates can vary markedly depending on the student’s in-state or out-of-state status. For instance, an in-state candidate to University of Alabama-Birmingham School of Medicine would pay 27,582 (USD) while an out-of-state candidate would pay 61,848 (USD). Our data set includes only out-of-state tuition values in order to generalize the findings to a wider population, so considering the in-state tuition costs could affect the results.
Thus, future work on this analysis could investigate other factors such as location or student resources, include data from osteopathic medical schools, and consider in-state tuition costs. Another interesting consideration for analyses on medical schools is: “What factors affect a medical school’s rank?” By way of illustration, preliminary findings of location and medical school rankings are included in the Supplementary Materials section.
As we noticed that Southwestern medical schools were the cheapest medical schools, we thought it would be interesting to see a multiple regression model with location as the categorical variable instead of rank.
term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
---|---|---|---|---|---|---|
intercept | 53891.339 | 4821.111 | 11.178 | 0.000 | 44300.613 | 63482.065 |
enrollement | 7.608 | 6.373 | 1.194 | 0.236 | -5.070 | 20.285 |
locationNE | 6698.295 | 9489.998 | 0.706 | 0.482 | -12180.334 | 25576.924 |
locationSE | -2774.167 | 8903.972 | -0.312 | 0.756 | -20487.004 | 14938.670 |
locationSW | -9202.384 | 16956.803 | -0.543 | 0.589 | -42934.868 | 24530.099 |
locationW | -3196.515 | 9048.017 | -0.353 | 0.725 | -21195.903 | 14802.874 |
enrollement:locationNE | -12.747 | 14.750 | -0.864 | 0.390 | -42.089 | 16.595 |
enrollement:locationSE | -1.899 | 12.828 | -0.148 | 0.883 | -27.417 | 23.619 |
enrollement:locationSW | -19.613 | 20.519 | -0.956 | 0.342 | -60.433 | 21.206 |
enrollement:locationW | 2.360 | 14.399 | 0.164 | 0.870 | -26.285 | 31.004 |
Using this visualization, we can see that although SW schools seem to have an enrollment slope similar to that of the other locations, the schools in this region have much lower tuitions. From the regression table, we can see that all of the offsets in slopes for NE, SE, SW, and W have confidence intervals that contain zero. This means that, at the 95% confidence level, we fail to reject our null hypothesis that enrollment does not have an effect on tuition rates. All of the p values are also above 0.05, which leads to the same conclusion. Even though SW schools have lower tuition rates, the effect of enrollment on tuition does not change based on the location of the school.
1. “Map of Medical Schools.” Google Search, Google Maps, 2018, www.google.com/maps/d/u/0/viewer?ie=UTF8&oe=UTF8&msa=0&mid=1yAdRuBw9Avr0Iy-C_WcSgNK73bQ&ll=34.2736651081515%2C-112.2418275&z=3.
2. Association of American Medical Colleges. “MCAT Scores and GPAs for Applicants and Matriculants to U.S. Medical Schools, 2017-2018 through 2018-2019.” Fact Sheet, AAMC, 2018, www.aamc.org/download/321494/data/factstablea16.pdf.
3. “Smith College PreHealth Facts.” Frequently Asked Questions, Smith College, 2018, www.smith.edu/prehealth/faq.php.
4. “The Best Medical Schools for Research, Ranked.” U.S. News & World Report, U.S. News & World Report, 2018, premium.usnews.com/best-graduate-schools/top-medical-schools/research-rankings.
5. Strauss, Valerie. “Why U.S. News College Rankings Shouldn’t Matter to Anyone.” The Washington Post, WP Company, 10 Sept. 2013, www.washingtonpost.com/news/answer-sheet/wp/2013/09/10/why-u-s-news-college-rankings-shouldnt-matter-to-anyone/?utm_term.