In this report, we focus on trends among the highest ranked colleges in America. We were interested in college data because we wanted to see which variables carry the most weight for top colleges. These rankings are created through formulas that combine statistical measures based on many individual variables from the dataset.
As college students, we made the life-altering decision of which college to attend not long ago. Each of us relied on campus visits and family knowledge to make that decision, but we also cared about how highly our school was ranked, since rank can signal the strength of its academic programs. So, when we searched for datasets, we became interested in which trends in specific variables are associated with a higher ranked college. Looking ahead, a higher ranked college may also earn more recognition in a job search or when applying to graduate school. It is interesting to compare how the variables in the dataset relate to rank with the variables each of us personally associated with a “higher” ranked school. Even though a ranking methodology should be objective and fair, there will always be an element of bias in which variables are chosen for the dataset.
We explored two questions: how college rank relates to the percentage of students with financial aid, and how rank relates to student faculty ratio.
The dataset we will be using contains the top American colleges of 2022 listed by rank and contains details about them including financial aid, student population, campus setting, location, and more. The variables we will be focusing on in relation to our questions are: the rank of the college, college type (private vs. public), percentage of students with financial aid, region in the U.S., total student population, and student faculty ratio. The summary descriptives for each of these variables are listed in the tables below.
# values obtained with the summary() and sd() functions
tsp= c(421,3112,9850,16074,24363,102826,16284.87,0)
psfa = c(40,78,92,86.2,90,100,NA,3)
sfr = c(3,10,13.5,14.12,17,49,5.203,0)
df= data.frame(tsp,psfa,sfr)
colnames(df)= c("totalStudentPop","percentOfStudentsFinAid","studentFacultyRatio")
rownames(df) = c("Min.","1st Qu.", "Median", "Mean", "3rd Qu.", "Max.","Sd", "NA's")
print(df)
## totalStudentPop percentOfStudentsFinAid studentFacultyRatio
## Min. 421.00 40.0 3.000
## 1st Qu. 3112.00 78.0 10.000
## Median 9850.00 92.0 13.500
## Mean 16074.00 86.2 14.120
## 3rd Qu. 24363.00 90.0 17.000
## Max. 102826.00 100.0 49.000
## Sd 16284.87 NA 5.203
## NA's 0.00 3.0 0.000
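The table above was assembled by typing in values reported by summary() and sd(). As a minimal sketch, the same layout can be computed directly from a data frame. The `toy` data frame and the `describe` helper below are hypothetical stand-ins (the values are made up for illustration, not taken from the dataset). Note that sd() returns NA for a column with missing values unless na.rm = TRUE is set, which may be why the Sd entry for percentOfStudentsFinAid is NA.

```r
# Hypothetical stand-in for the college data; values are illustrative only.
toy <- data.frame(
  totalStudentPop = c(421, 3112, 9850, 24363, 102826),
  percentOfStudentsFinAid = c(40, 78, NA, 92, 100),
  studentFacultyRatio = c(3, 10, 13.5, 17, 49)
)

# Hypothetical helper: the six summary() statistics plus Sd and the NA count.
describe <- function(x) {
  q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE, names = FALSE)
  c("Min." = q[1], "1st Qu." = q[2], "Median" = q[3],
    "Mean" = mean(x, na.rm = TRUE),
    "3rd Qu." = q[4], "Max." = q[5],
    "Sd" = sd(x, na.rm = TRUE),   # na.rm = TRUE avoids an NA result
    "NA's" = sum(is.na(x)))
}

# One column of statistics per variable, one row per statistic.
summary_table <- sapply(toy, describe)
print(round(summary_table, 2))
```

Computing the table this way keeps the statistics in sync with the data if the dataset is updated.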
p= c(36,21,21,20,2)
c = c(181,105,104,99,9)
df1=data.frame(c,p)
colnames(df1)=c("Count","Percentage (%)")
rownames(df1)= c("Northeast","West","South","Midwest","NA")
print(df1)
## Count Percentage (%)
## Northeast 181 36
## West 105 21
## South 104 21
## Midwest 99 20
## NA 9 2
p1 = c(54,46)
c1 = c(270,228)
# counts first, then percentages, so the columns match the names below
df2 = data.frame(c1,p1)
colnames(df2)=c("Count","Percentage (%)")
rownames(df2)= c("Private not-for-profit","Public")
print(df2)
## Count Percentage (%)
## Private not-for-profit 270 54
## Public 228 46
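Count and percentage tables like the ones above can also be derived from the raw categorical column rather than typed by hand. The sketch below rebuilds a college-type vector from the counts reported in this section (270 private not-for-profit, 228 public); the vector is a hypothetical stand-in for the dataset's collegeType column, and `type_table` is an illustrative name.

```r
# Hypothetical stand-in for college$collegeType, rebuilt from the reported counts.
collegeType <- c(rep("Private not-for-profit", 270), rep("Public", 228))

counts <- table(collegeType)                 # number of colleges per type
percent <- round(100 * prop.table(counts))   # share of all colleges, in percent

type_table <- data.frame(
  Count = as.vector(counts),
  Percent = as.vector(percent),
  row.names = names(counts)
)
print(type_table)
```

For a column with missing values, such as region, table(x, useNA = "ifany") would keep the NA category in the counts.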
There were 18 other variables presented in the dataset that we decided not to use for our overall discussion. We grouped these other variables into three groups of information on: location, general statistics, and other miscellaneous details.
The variables that describe the location are City, State, Country, Longitude, and Latitude. The state code (stateCode) is also included, along with the setting of the campus (campusSetting), which describes whether the setting is urban, suburban, or another type.
The general statistics variables include undergraduate population (undergradPop), total aid granted (totalGrantAid), percentage of students with grants (percentOfStudentsGrant), and the number of students at the school (studentPopulation). Additionally, there is a variable describing median base salary (medianBaseSalary), but it is unclear which group of people this salary represents.
The last group of variables that we did not include describes more general information. These include a college’s name (organizationName), the year it was founded (yearFounded), its website (website), phone number (phoneNumber), and Carnegie classification (carnegieClassification), which has categories such as Doctoral Universities: Very high Research Activity and Baccalaureate Colleges: Arts & Sciences Focus. There is also a description of the college (description), which gives a general idea of what the college prides itself on.
What are the trends between the ranks of colleges and the percentage of students with financial aid?
We predict that higher ranked colleges will have a higher percentage of students with financial aid. Additionally, we predict that public colleges will generally have a higher percentage of students with financial aid over private not-for-profit colleges.
library(ggplot2)
# one point per college; color also gives a separate linear fit per college type
ggplot(college, aes(x = rank, y = percentOfStudentsFinAid, color = collegeType)) +
  geom_point() +
  geom_smooth(method = lm, se = FALSE) +
  ylim(40, 100) +
  labs(
    title = "Trends between College Rank and \n Percent of Students with Financial Aid",
    x = "Rank of College",
    y = "Percentage of Students with Financial Aid",
    color = "Type of College",
    caption = "Scatter plot of College Rank vs. Percent of Students \n with Financial Aid of 500 Colleges in the United States (2022). \n Subsetted by the type of college (public and private not-for-profit).") +
  theme_light() +
  theme(plot.caption = element_text(face="bold", size=8),plot.title = element_text(size = 15))
When interpreting the chart, we first focused on the overall trend between college rank and the percentage of students with financial aid. The fitted line slopes upward (a positive trend) for both college types. Because rank 1 is the best rank, a larger rank number means a lower ranked college. The general trend among the top 500 colleges in the United States is therefore that as colleges become lower ranked (as the rank number increases), the percentage of students with financial aid increases. In other words, the higher ranked a college is, the smaller its share of students with financial aid. This was an interesting observation that went against our initial prediction that higher ranked colleges would have a higher percentage of students with financial aid.
Next, we looked at the differences between the two college types: private (not-for-profit) and public. What first stood out to us was that the percentage of students with financial aid rises more steeply for private not-for-profit colleges as colleges become lower ranked. For public colleges, the increase is more gradual. This suggests that the percentage of students with financial aid is more strongly tied to rank at private colleges, whereas it is somewhat more stable across ranks at public colleges. Moreover, we found it interesting that many private colleges ranked below the top 100 have 100% of their students receiving financial aid. Although public, higher ranked colleges have a higher percentage of students with financial aid than private, higher ranked colleges, the pattern reverses as rank falls. From the graph, lower ranked public universities have a lower percentage of students with financial aid than lower ranked private colleges (though still a higher percentage than higher ranked schools). We wondered why this might be the case. When discussing this trend as a group, we thought that one reason private colleges may award financial aid to more students is that they want to attract strong students. This is a traditional reason for granting financial aid at all, but we think it could be a leading factor for these private, lower ranked schools, especially since private institutions generally cost more than public institutions.
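The comparison of steepness between the two trend lines can be made concrete by fitting the same linear models that geom_smooth(method = lm) draws and reading off the slopes. The sketch below is only an illustration: `toy` is hypothetical, noise-free data constructed so that the private line is steeper, and its numbers are not estimates from the college dataset.

```r
# Hypothetical, noise-free toy data; not taken from the college dataset.
toy <- data.frame(
  rank = rep(1:10, times = 2),
  collegeType = rep(c("Private not-for-profit", "Public"), each = 10)
)
toy$percentOfStudentsFinAid <- ifelse(
  toy$collegeType == "Public",
  80 + 0.5 * toy$rank,   # shallow increase with rank number
  60 + 3 * toy$rank      # steep increase with rank number
)

# Fit one least-squares line per college type and keep the slope on rank.
slopes <- sapply(
  split(toy, toy$collegeType),
  function(d) coef(lm(percentOfStudentsFinAid ~ rank, data = d))[["rank"]]
)
print(slopes)
```

Each slope is the change in the percentage of students with financial aid per one-position change in rank, so a larger slope corresponds to a steeper line in the chart.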
In general, this chart begins to point to things we find positive and important in a college that may not be a main priority for the higher ranked colleges in the United States. As students, we all care about how many of our classmates receive financial aid and how much they receive. We see a tradeoff between attending a prestigious, higher ranked college and having a higher chance of receiving financial aid. After some reflection, we realized that this chart supports what we and those around us experienced during our college application processes. Thus, the dilemma between a better ranked college and the desire for financial aid seems to be a common one.
We chose a scatterplot to show this trend because it is an effective way to display 500 different colleges. Since each dot represents a different college, a scatterplot keeps the individual data points visible and allows trend lines to be added. Additionally, a scatterplot lets us show the direction of the relationship (positive or negative) for each type of school while still showing the minimum, maximum, and outlying data points. Scatterplots also work well with most continuous data. The use of color to differentiate between college types also helps us compare private and public colleges.
The alternatives we considered were a bubble chart and converting rank into a factor in order to subset the data. For a bubble chart, we would have struggled to pick a meaningful variable to map to bubble size for this question, so we stopped pursuing that idea. Since we had many continuous variables in our dataset, we also debated converting the rank variable into a factor and then subsetting the data. We believed this would distort our conclusions, as we wanted to look at a broader trend across all the colleges in the United States.
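For reference, converting rank into a factor would typically mean binning it into ranges. The sketch below shows one way this could be done with cut(); the ranks, break points, and labels are hypothetical choices for illustration, not a grouping we used in this report.

```r
# Hypothetical ranks chosen to show how the bin edges behave.
rank <- c(1, 100, 101, 250, 500)

# cut() uses right-closed intervals by default, so rank 100 falls in "1-100".
rank_group <- cut(
  rank,
  breaks = c(0, 100, 200, 300, 400, 500),
  labels = c("1-100", "101-200", "201-300", "301-400", "401-500")
)
print(table(rank_group))
```

Binning of this kind discards the ordering of colleges within each group, which is part of why it suits a broad trend less well than plotting rank directly.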
Thus, the scatterplot of Trends between College Rank and Percent of Students with Financial Aid was the best visualization, as it allowed us to show a large quantity of data and more easily identify trends and clusters. There are, of course, downsides to using scatterplots, including overlapping data points, but we believed the advantages outweighed the disadvantages.
What is the relationship between student faculty ratio and rank?
We predict that higher ranked colleges will have a lower student faculty ratio. We also predict that the region of the college will not affect the trend of rank versus student faculty ratio.
# keep the first 100 rows (assumes the data are sorted by rank)
college1 = college[1:100, ]
g1 = ggplot(college1, aes(x = rank, y = studentFacultyRatio))
g1 +
  geom_point(alpha = 0.6, aes(color = region, size = totalStudentPop)) +
  geom_smooth(method = lm, se = FALSE) +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  scale_y_continuous(breaks = seq(0, 35, 5)) +
  labs(
    title = "College Rank vs Student-Faculty Ratio",
    x = "College Rank",
    y = "Student-Faculty Ratio",
    caption = "Bubble Chart of Rank vs Student Faculty Ratio \n of the Top 100 Colleges in the United States (2022). \n Subsetted by region (color) and student population (size of points)") +
  guides(size = guide_legend(title = "Student Population")) +
  theme_light() +
  theme(plot.caption = element_text(face = "bold", size = 8), plot.title = element_text(hjust = 0.5))
Within this dataset, higher ranked colleges tend to have a lower student faculty ratio. In examining the top 100 US colleges grouped by region, student faculty ratio generally increased as colleges became lower ranked (as the rank number increased). The West region was the only exception to this trend. It is also important to note that colleges with a larger student population often have a higher student faculty ratio than smaller colleges.
Colleges use a low student faculty ratio to attract students, since it signals smaller class sizes and a higher quality of classroom learning. Colleges with low student faculty ratios may offer more collaboration, as well as stronger relationships with professors and peers. Higher ranked colleges often have the financial resources to support a lower student faculty ratio by hiring a large number of professors and other faculty. In comparison, colleges with a higher student faculty ratio tend to be larger, which results in larger classes and less interaction. However, larger colleges have other appealing aspects beyond student faculty ratio. These typically include popular sports programs and a wide variety of academic majors.
With regard to the West region, we found it interesting that the data did not seem to follow the same trend as in other regions. The colleges within this region are spread across the ranks despite a high overall student faculty ratio. This could be due to the large student populations among colleges in the West region. Since many of these schools have larger student populations, student faculty ratio may not be as influential in the rank of a college within this region. The West region, in terms of this dataset, also suggests that other factors play an influential role in determining college rank.
We chose to display the data through a bubble chart because we believed it was the clearest and most effective way to include the variables that we wanted to incorporate. The ability to see each of the 100 points individually, along with the trend line, was crucial for seeing patterns in the data. While this could have been done with a simple scatterplot, there was more that we wanted to examine and visualize. By turning this graph into a bubble chart, we were able to include the total student population (size of the bubbles) and the region that each college is in (color of the bubbles). This was beneficial because we could see not only trends in student faculty ratio and rank, but also how region and student population relate to this trend.
We considered grouping the data by rank, and creating boxplots subsetted by region, but did not believe that it would show trends as effectively as a bubble chart; nor would we be able to key in on a specific point or groups of points to see specific trends.
Of course, the bubble chart has its downsides: it can be hard to discern the different bubble sizes, and overlapping bubbles make it difficult to see the trends for individual regions. However, after discussing which type of visualization would communicate our data most effectively, it became clear that a bubble chart would be the best choice and that its advantages outweighed its disadvantages.
Based on our analysis of the data, we discovered two trends related to college rankings in 2022: (1) Within this dataset, higher ranked colleges have a lower percentage of students with financial aid than lower ranked colleges. Among lower ranked colleges, private colleges also had a higher percentage of students with financial aid than public colleges. (2) Among the colleges included in this study, higher ranked colleges have a lower student faculty ratio than lower ranked colleges.
In light of our analysis, we acknowledge that college rankings are subject to bias. Although Forbes, the source of this dataset, is reliable, the metrics that define a “top” university may not be the same for everyone. US News was the first to monetize college rankings, and it made a huge business out of it. In an Insider article about the negative impacts of college rankings, the author writes that US News selects what it considers valuable in higher education and builds its ratings on that subjective judgment. Colleges then react by trying to adjust their numbers to climb the rankings. This means that as colleges put all their effort into raising their numbers and rising through the ranks, they sacrifice their students’ college experience, become more expensive, and make admission extremely challenging for applicants. Thus, it is important for students to keep this in mind when comparing colleges and making a final decision. Unfortunately, the market for rankings is not slowing down but growing at a rapid pace. There are rankings for high schools, elementary schools, and even preschools. Therefore, we believe that the metrics used by US News and other sources are controversial and have negative impacts, especially on those going through the college application process just as we did.