Each of our team members originally selected a few datasets that we compared as a group. Our group decided to explore a dataset Ana found related to the topic of racial and ethnic diversity at two-year and four-year colleges and universities in the United States. We initially thought to compare diversity at two-year community colleges versus 4-year schools and also were curious about the variations in diversity between the two that could be found comparatively across the country.
As we were exploring the dataset we identified a number of named colleges and universities that are known as “for-profit” institutions. Some of these institutions have been the subject of recent lawsuits regarding predatory (Halperin, 2020; Legal Services Center, 2020; Redman, 2020). The high number of current lawsuits regarding for-profit colleges’ predatory behaviors are a result of the actions of current Education Secretary Betsy DeVos. In 2019, DeVos “repealed an Obama-era regulation that sought to crack down on for-profit colleges and universities that produced graduates with no meaningful job prospects and mountains of student debt they could not hope to repay” (Green, 2019. para. 1). Another action that DeVos took was to deny debt forgiveness to students who had been prey to for-profit predatory educational institutions (Lobosco, 2019; Turner, 2019). We were aware of past and recent news articles describing the marketing strategies of these colleges that were aimed at students of color, low-income students, immigrant communities and students who are first in their family to go to college (Bonadies et al. 2018; Conti, 2019; Voorhees, 2019). Previous studies have found that loan debt is higher for students who have attended For-Profit institutions, which disproportionately affects students of color.
library(git2r)
library(usethis)
Race and For-Profits
(Body, 2019)
Loan Default Rates at For-Profits
(Body, 2019)
Despite their high cost, For-profit institutions have a lower graduation rate and employment rate than non-profit institutions.
For-Profit Graduation & Employment Rates
(Lopez, 2015)
We have examined our dataset and explored possible consistencies and/or inconsistencies with these previous reports.
1. Incorporation of Census Demographic Data
After the initial presentation of our analysis, questions from our colleagues and professor resulted in discussions regarding the comparative proportions of racial and ethnic groups at schools compared to the overall proportion of racial and ethnic backgrounds in the United States. We therefore gathered demographic data from the U.S. Census Bureau and incorporated that data into our analysis.
2. Incorporation of Tuition Cost and Salary Potential Data
Additionally, we received questions regarding economic inequalities such as differences in tuition, particularly at highly-ranked private, non-profit institutions such as Harvard. We therefore merged two additional datasets for tuition and salary potential into our previously compiled dataset regarding diversity at for-profit and non-profit schools. This dataset is titled “monsterfinal”. From this dataset we were able too begin a visual exploration of the data regarding tuition costs and salary potential at early and mid-career markers. This analysis was further developed by exploring the differences in in-state tuition versus out-of-state tuition and salary potential at public non-profit schools.
3. Analysis of the Data for the Top Ranked U.S. Tertiary Education Institutions
Lastly, we initiated a visual comparative analysis of cost of tuition of the top ranked of schools. We completed a second webscraping to compile a list of the top ranked tertiary education programs in the U.S. We utilized this list to create visualizations on the cost, salary potential and demographics at these schools.
How do racial and ethnic demographics vary between private and public institutions in our dataset?
How do racial and ethnic demographics vary between not-for-profit institutions and for-profit colleges in our dataset?
What is the relationship between tuition rate and salary potential?
Are For-Profit Tuition Costs Higher than Non-Profit Tuition Costs?
How do diversity, tuition rates and salary rates at the top 20 schools compare to other schools?
For-Profit Institutions: For-profit institutions are defined by the way that “revenue earned by the school is invested”. For-profit colleges have investors who want to make a profit. Their operations management is determined in part on maximizing the return profit for investors. “Money earned by the school may be used to pay out investors and award bonuses to executives, as well as sustain the operation’s profitability through aggressive marketing and recruitment strategies” (TBS Staff, 2019, para. 8)
Not-for Profit Institutions: Non-profit colleges can be either public or private. Regardless of whether the school is public or private, non-profit colleges must “reinvest the money earned through enrollment into the educational mission” (TBS Staff, 2019, para. 8).
Our dataset did not include information on whether colleges were for-profit.
In order to determine which of the schools on our list were for-profit institutions, we decided to find a website with a list of for-profit institutions, scrape the data from that website, to create a second dataset with a list of for profit schools.
#Loading the rvest package
library('rvest')
## Loading required package: xml2
library(tidyverse)
## -- Attaching packages ---------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2 v purrr 0.3.4
## v tibble 3.0.1 v dplyr 1.0.0
## v tidyr 1.1.0 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.5.0
## -- Conflicts ------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x readr::guess_encoding() masks rvest::guess_encoding()
## x purrr::is_empty() masks git2r::is_empty()
## x dplyr::lag() masks stats::lag()
## x purrr::pluck() masks rvest::pluck()
## x dplyr::pull() masks git2r::pull()
## x purrr::when() masks git2r::when()
library(dplyr)
diversity <- read_csv("diversity2.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## total_enrollment = col_double(),
## state = col_character(),
## category = col_character(),
## enrollment = col_double()
## )
salary <- read_csv("salary_potential.csv")
## Parsed with column specification:
## cols(
## rank = col_double(),
## name = col_character(),
## state_name = col_character(),
## early_career_pay = col_double(),
## mid_career_pay = col_double(),
## make_world_better_percent = col_double(),
## stem_percent = col_double()
## )
salary_added <- left_join(diversity,salary,by="name")
salary_added_final <- na.omit(salary_added)
tuition <- read_csv("tuition_cost.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## state = col_character(),
## state_code = col_character(),
## type = col_character(),
## degree_length = col_character(),
## room_and_board = col_double(),
## in_state_tuition = col_double(),
## in_state_total = col_double(),
## out_of_state_tuition = col_double(),
## out_of_state_total = col_double()
## )
tuition_added <- left_join(tuition,salary_added_final,by="name")
tuition_added_final <- na.omit(tuition_added)
monsterfinal <- tuition_added_final
monsterfinal <- monsterfinal %>% filter(category != "Women" & category != "Two Or More Races" & category != "Non-Resident Foreign" & category != "Unknown" & category != "Total Minority")
monsterfinal <- monsterfinal %>% mutate(enrollment_percentage = enrollment/ total_enrollment*100)
monsterfinal %>%
mutate_if(is.numeric, round, digits = 2)
## # A tibble: 3,588 x 21
## name state.x state_code type degree_length room_and_board in_state_tuition
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 2 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 3 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 4 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 5 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 6 Adam~ Colora~ CO Publ~ 4 Year 8782 9440
## 7 Agne~ Georgia GA Priv~ 4 Year 12330 41160
## 8 Agne~ Georgia GA Priv~ 4 Year 12330 41160
## 9 Agne~ Georgia GA Priv~ 4 Year 12330 41160
## 10 Agne~ Georgia GA Priv~ 4 Year 12330 41160
## # ... with 3,578 more rows, and 14 more variables: in_state_total <dbl>,
## # out_of_state_tuition <dbl>, out_of_state_total <dbl>,
## # total_enrollment <dbl>, state.y <chr>, category <chr>, enrollment <dbl>,
## # rank <dbl>, state_name <chr>, early_career_pay <dbl>, mid_career_pay <dbl>,
## # make_world_better_percent <dbl>, stem_percent <dbl>,
## # enrollment_percentage <dbl>
summary(monsterfinal)
## name state.x state_code type
## Length:3588 Length:3588 Length:3588 Length:3588
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## degree_length room_and_board in_state_tuition in_state_total
## Length:3588 Min. : 4275 Min. : 4220 Min. :11816
## Class :character 1st Qu.: 9040 1st Qu.:10370 1st Qu.:20264
## Mode :character Median :10584 Median :28312 Median :37830
## Mean :11030 Mean :27561 Mean :38590
## 3rd Qu.:13020 3rd Qu.:41642 3rd Qu.:54042
## Max. :18156 Max. :58230 Max. :75003
## out_of_state_tuition out_of_state_total total_enrollment state.y
## Min. : 6570 Min. :13674 Min. : 457 Length:3588
## 1st Qu.:21683 1st Qu.:31578 1st Qu.: 1960 Class :character
## Median :30096 Median :40478 Median : 3788 Mode :character
## Mean :31863 Mean :42893 Mean : 7775
## 3rd Qu.:41760 3rd Qu.:54450 3rd Qu.:10646
## Max. :58230 Max. :75003 Max. :60767
## category enrollment rank state_name
## Length:3588 Min. : 0.0 Min. : 1.00 Length:3588
## Class :character 1st Qu.: 15.0 1st Qu.: 5.00 Class :character
## Mode :character Median : 102.0 Median :11.00 Mode :character
## Mean : 1096.7 Mean :11.33
## 3rd Qu.: 796.2 3rd Qu.:17.00
## Max. :33293.0 Max. :25.00
## early_career_pay mid_career_pay make_world_better_percent stem_percent
## Min. :32500 Min. : 61900 Min. :34.00 Min. : 0.00
## 1st Qu.:45500 1st Qu.: 81200 1st Qu.:48.00 1st Qu.: 8.00
## Median :49400 Median : 89100 Median :52.00 Median :14.00
## Mean :51055 Mean : 92510 Mean :53.06 Mean :17.36
## 3rd Qu.:55000 3rd Qu.:100500 3rd Qu.:58.00 3rd Qu.:22.00
## Max. :88800 Max. :158200 Max. :86.00 Max. :97.00
## enrollment_percentage
## Min. : 0.0000
## 1st Qu.: 0.3656
## Median : 2.9349
## Mean :14.3040
## 3rd Qu.:10.5827
## Max. :98.8543
ggplot(data=monsterfinal, aes(x=category, y=enrollment_percentage, fill=type)) +
geom_bar(stat="identity", position=position_dodge()) +
ggtitle("Public & Private Racial & Ethnic Percent Enrollment Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
theme_minimal()+
scale_fill_manual(values=c("#9999CC", "#66CC99"))
This plot suggests that there are higher proportions of Black and Native Hawaiian/Pacific Islanders at private schools than public schools and a higher proportion of White students at non-profit schools than for-profit schools. Further computational analysis is required to determine significance. One critique raised of this plot is that population proportion information was not provided, which we incorporate below.
library(dplyr)
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:git2r':
##
## config
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
monsterfinal %>%
mutate_if(is.numeric, round, digits = 2) %>%
ggplot(aes(x=category, y=enrollment_percentage, fill=type, text = paste("Percent:", enrollment_percentage, '</br>', '</br>School Name:', name))) +
geom_bar(stat="identity", position=position_dodge()) +
ggtitle("For-Profit/Non-Profit Racial & Ethnic Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
theme_minimal()
ggplotly(tooltip = "text")
This plot presents the same information as above, however instead of summary information, the interactivity of this plot allows readers to examine the enrollment percentages of each race/ethnicity by school name.
library(dplyr)
sum(tuition_added$type == "Public")
## [1] 3994
sum(tuition_added$type == "Private")
## [1] 5101
sum(tuition_added$type == "For Profit")
## [1] 107
library(RColorBrewer)
ggplot(monsterfinal, aes(reorder(type, enrollment_percentage), enrollment_percentage, fill = type)) +
geom_bar(stat = 'identity') +
facet_wrap(~ category)+
ggtitle("For-Profit/Non-Profit Racial & Ethnic Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity")+
theme_bw()+
scale_fill_brewer()
This facet wrap plot is consist with the barplot shown above, however the scale is not quite correct. The scale problem was present for a number of our facet-wrap plots with enrollment percentage and will be discussed further below.
tuition4plot <- tuition_added %>%
filter(category != "Women" & category != "Unknown" & category != "Two Or More Races" & category != "Total Minority" & category != "Non-Resident Foreign")
ggplot(data=tuition4plot, aes(x=category, y=enrollment, fill=degree_length)) +
geom_bar(stat="identity", position=position_dodge()) +
ggtitle("2 Year & 4 Year Racial & Ethnic Enrollment Comparisons") +
labs(y = "Enrollment Totals", x = "Race/Ethnicity")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
theme_minimal()+
scale_fill_manual(values=c("#CC6666", "#9999CC", "#66CC99"))
This graph provides total enrollment numbers and suggests that White students and Latinx students are represented in much greater numbers than other racial and ethnic groups at 4 year schools.
According to the US census population statistics website the following is the breakdown of the racial demographics in the United States.
census <- data.frame("category" = c("White", "Black", "American Indian/Alaska Native", "Asian", "Native Hawaiian / Pacific Islander", "Hispanic"), "Percent_Pop" = c(.72, .127, .09, .056, .002, .18))
census
## category Percent_Pop
## 1 White 0.720
## 2 Black 0.127
## 3 American Indian/Alaska Native 0.090
## 4 Asian 0.056
## 5 Native Hawaiian / Pacific Islander 0.002
## 6 Hispanic 0.180
pcensus <- ggplot(census, aes(reorder(x = category, Percent_Pop), y = Percent_Pop, fill = category))+
geom_bar(stat = "identity")+
geom_text(aes(label = scales::percent(Percent_Pop),
y = Percent_Pop,
group = category, vjust = -.2))+
ggtitle("Percentage of U.S. Population by Race/Ethnicity") +
labs(y = "Percent of Population", x = "Race/Ethnicity")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
theme_minimal()+
scale_fill_brewer()
pcensus
head(diversity)
## # A tibble: 6 x 5
## name total_enrollment state category enrollment
## <chr> <dbl> <chr> <chr> <dbl>
## 1 University of Phoe~ 195059 Arizo~ Women 134722
## 2 University of Phoe~ 195059 Arizo~ American Indian / Alas~ 876
## 3 University of Phoe~ 195059 Arizo~ Asian 1959
## 4 University of Phoe~ 195059 Arizo~ Black 31455
## 5 University of Phoe~ 195059 Arizo~ Hispanic 13984
## 6 University of Phoe~ 195059 Arizo~ Native Hawaiian / Paci~ 1019
diversity <- read_csv("diversity2.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## total_enrollment = col_double(),
## state = col_character(),
## category = col_character(),
## enrollment = col_double()
## )
There were numerous “tidying” exercises that we undertook to clean our data and make the webscraping dataset and the diversity dataset comparable. Our tidying exercises included, but were not limited to, creating the following: 1. a new column calculating the percent of students in attendance based on race and ethnicity 2. a new binary column identifying the selected for-profit and not-for-profit schools represented by “1” and “0” respectively.
3. two new datasets: one for our randomly selected for-profit schools and one for our randomly selected not-for-profit schools.
diversity_new <- diversity %>% mutate(enrollment_percentage = enrollment/ total_enrollment*100)
head(diversity_new)
## # A tibble: 6 x 6
## name total_enrollment state category enrollment enrollment_perce~
## <chr> <dbl> <chr> <chr> <dbl> <dbl>
## 1 University~ 195059 Arizo~ Women 134722 69.1
## 2 University~ 195059 Arizo~ American Ind~ 876 0.449
## 3 University~ 195059 Arizo~ Asian 1959 1.00
## 4 University~ 195059 Arizo~ Black 31455 16.1
## 5 University~ 195059 Arizo~ Hispanic 13984 7.17
## 6 University~ 195059 Arizo~ Native Hawai~ 1019 0.522
We removed these categories to eliminate categories where there could be duplicitous counts.
diversity_new5 <- diversity_new %>% filter(category != "Women" & category != "Two Or More Races" & category != "Non-Resident Foreign" & category != "Unknown" & category != "Total Minority")
This function was used to assess how many separate schools were included in the dataset. There were over 4000 schools listed.
# unique(diversity_new5$name)
These functions were used to attempt to find individual schools. It was successful.
We checked the structure of our new dataset.
str(diversity_new5)
## tibble [27,630 x 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:27630] "University of Phoenix-Arizona" "University of Phoenix-Arizona" "University of Phoenix-Arizona" "University of Phoenix-Arizona" ...
## $ total_enrollment : num [1:27630] 195059 195059 195059 195059 195059 ...
## $ state : chr [1:27630] "Arizona" "Arizona" "Arizona" "Arizona" ...
## $ category : chr [1:27630] "American Indian / Alaska Native" "Asian" "Black" "Hispanic" ...
## $ enrollment : num [1:27630] 876 1959 31455 13984 1019 ...
## $ enrollment_percentage: num [1:27630] 0.449 1.004 16.126 7.169 0.522 ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. total_enrollment = col_double(),
## .. state = col_character(),
## .. category = col_character(),
## .. enrollment = col_double()
## .. )
This initial approach went one-by-one. For schools with multiple locations, some with 20 or more, we needed a faster and more efficient option.
diversity_new5$name[diversity_new5$name == "Spencerian College at Louisville (Ky.)"] <- "Spencerian College"
diversity_new5$name[diversity_new5$name == "Spencerian College at Lexington (Ky.)"] <- "Spencerian College"
diversity_new5$name[diversity_new5$name == "NewSchool of Architecture and Design"] <- "NewSchool Arch. Design"
diversity_new5$name[diversity_new5$name == "Western International University"] <- "Western Intl Univ."
diversity_new5$name[diversity_new5$name == "Schiller International University"] <- "Schiller Intl Univ."
diversity_new5$name[diversity_new5$name == "National Paralegal College"] <- "Natl Paralegal College"
diversity_new5$name[diversity_new5$name == "WestCoast University"] <- "West Coast Univ."
#Failed Attempt
# ME <- diversity_new %>%
# select(contains('Mildred Elley'))
# ME
We discovered the “grepl” function which allowed us to change entire groups of rows that contained a similar word or set of words. All of the different locations could be changed in one set of code instead of multiple sets.
diversity_new5$name[grepl('Mildred Elley', diversity_new5$name)] <- 'Mildred Elley'
This code identified all of the entries that included “West Coast University” in the name.
which(grepl("West Coast University", diversity_new5$name))
## [1] 12781 12782 12783 12784 12785 12786 13459 13460 13461 13462 13463 13464
## [13] 15265 15266 15267 15268 15269 15270 16747 16748 16749 16750 16751 16752
## [25] 20113 20114 20115 20116 20117 20118 27145 27146 27147 27148 27149 27150
These indices were used to show the current names in the dataset.
diversity_new5$name[14911]
## [1] "Salus University"
diversity_new5$name[15707]
## [1] "Concorde Career College at Garden Grove (Calif.)"
diversity_new5$name[17813]
## [1] "Argosy University Inland Empire"
diversity_new5$name[19541]
## [1] "Fortis College at Cincinnati"
diversity_new5$name[23466]
## [1] "Beacon College"
diversity_new5$name[31675]
## [1] NA
diversity_new5$name[grepl('West Coast University', diversity_new5$name)] <- 'West Coast Univ.'
We used the same indices to ensure that the names of all groups changed to the single name.
diversity_new5$name[14911]
## [1] "Salus University"
diversity_new5$name[15707]
## [1] "Concorde Career College at Garden Grove (Calif.)"
diversity_new5$name[17813]
## [1] "Argosy University Inland Empire"
diversity_new5$name[19541]
## [1] "Fortis College at Cincinnati"
diversity_new5$name[23466]
## [1] "Beacon College"
diversity_new5$name[31675]
## [1] NA
which(grepl("Brookline College", diversity_new5$name))
## [1] 13495 13496 13497 13498 13499 13500 19135 19136 19137 19138 19139 19140
## [13] 20281 20282 20283 20284 20285 20286 21247 21248 21249 21250 21251 21252
diversity_new5$name[15746]
## [1] "Concorde Career College at San Diego"
diversity_new5$name[24791]
## [1] "Gupton Jones College of Funeral Service"
diversity_new5$name[grepl("Brookline College", diversity_new5$name)] <- "Brookline College"
diversity_new5$name[15746]
## [1] "Concorde Career College at San Diego"
diversity_new5$name[24791]
## [1] "Gupton Jones College of Funeral Service"
We changed all of the names of our schools to simplified versions which was particularly important for the schools with multiple locations.
diversity_new5$name[grepl('Clover Park Technical College', diversity_new5$name)] <- 'Clover Park Tech College'
diversity_new5$name[grepl('San Diego City College', diversity_new5$name)] <- 'San Diego City College'
diversity_new5$name[grepl('Aspen University', diversity_new5$name)] <- 'Aspen Univ.'
diversity_new5$name[grepl('Grand Canyon University', diversity_new5$name)] <- 'Grand Canyon Univ.'
diversity_new5$name[grepl('American Public University', diversity_new5$name)] <- 'American Public Univ.'
diversity_new5$name[grepl('Blue Cliff College', diversity_new5$name)] <- 'Blue Cliff College'
diversity_new5$name[grepl('University of Phoenix', diversity_new5$name)] <- 'University of Phoenix'
diversity_new5$name[grepl('Stevens-Henager College', diversity_new5$name)] <- 'Stevens-Henager College'
diversity_new5$name[grepl('DeVry University', diversity_new5$name)] <- 'DeVry Univ.'
diversity_new5$name[grepl('Pioneer Pacific College', diversity_new5$name)] <- 'Pioneer Pacific College'
diversity_new5$name[grepl('National College at', diversity_new5$name)] <- 'National College'
diversity_new5$name[grepl('Strayer University',diversity_new5$name)]<- 'Strayer Univ.'
diversity_new5$name[grepl('Lincoln Tech',diversity_new5$name)]<- 'Lincoln Tech'
diversity_new5$name[grepl('Fashion Institute of Design and Merchandising', diversity_new5$name)]<- 'Fashion Institute'
diversity_new5$name[grepl('Centura College',diversity_new5$name)]<- 'Centura College'
diversity_new5$name[grepl('Rasmussen College',diversity_new5$name)]<- 'Rasmussen College'
diversity_new5$name[grepl('Fortis College',diversity_new5$name)]<- 'Fortis College'
For some reason Southeastern Community College (Iowa) would not change names. It also would not appear if we searched it using the which function. We were only able to filter this school by the total_enrollment column. Perhaps the difficulties with the name are a result of the parentheses. Eventually we found the gsub() function to remove the parentheses from the name.
library(stringr)
#str_replace(diversity_new5$name, "\\(.*\\)", "")
#diversity_new5$name <- as.character(diversity_new5$name)
#unlist(strsplit(diversity_new5$name, " \\(.*\\)"))
#library(plyr)
#diversity_new5$name <- as.character(diversity_new5$name)
#c<-strsplit(diversity_new5$name, "\\(")
#ldply(c)
# str_replace_all(diversity_new5$name, "[^[:alnum:]]", " ")
# str_replace_all(diversity_new5$name, "[[:punct:]]", " ")
diversity_new5$name <- gsub("Southeastern Community College (Iowa)", "Southeastern Community College Iowa", diversity_new5$name, fixed=TRUE)
which(grepl("Southeastern Community College Iowa", diversity_new5$name))
## [1] 9217 9218 9219 9220 9221 9222
which(grepl("Southeastern Community College (Iowa)", diversity_new5$name))
## [1] 9217 9218 9219 9220 9221 9222
diversity_new5$name[grepl("Southeastern Community College (Iowa)", diversity_new5$name)] <- "Southeastern Community College Iowa"
which(grepl("2987", diversity_new5$total_enrollment))
## [1] 9217 9218 9219 9220 9221 9222
diversity_new5$name[9217:9222]
## [1] "Southeastern Community College Iowa" "Southeastern Community College Iowa"
## [3] "Southeastern Community College Iowa" "Southeastern Community College Iowa"
## [5] "Southeastern Community College Iowa" "Southeastern Community College Iowa"
All of the schools with multiple locations have a single name now, however, they still are separated by rows. We would like to combine all of the statistic for each “category” for all of the locations into one summarized row. For example in the following image you can see four locations for the University of Houston.
We were able to combine each of the four into one set of 7 lines with the averages calculated for each category’s enrollment and enrollment percentage.
The resolution appears in the creation of dataset “c3a” below.
For consistency of methodology we decided to select 35 schools from the not-for-profit educational institutions to compare to the 35 randomly selected for-profit schools. The final list of randomly selected not-for-profit schools was also geographically diverse.
sample_n(diversity_new,35)
## # A tibble: 35 x 6
## name total_enrollment state category enrollment enrollment_perc~
## <chr> <dbl> <chr> <chr> <dbl> <dbl>
## 1 Carteret Co~ 1659 North ~ Asian 12 0.723
## 2 University ~ 2575 Texas Hispanic 950 36.9
## 3 California ~ 6631 Califo~ Hispanic 2639 39.8
## 4 St. Thomas ~ 1942 New Yo~ Unknown 133 6.85
## 5 <NA> 217 <NA> Unknown 35 16.1
## 6 ITT Technic~ 336 Michig~ Two Or Mor~ 12 3.57
## 7 Southwest V~ 2546 Virgin~ Unknown 4 0.157
## 8 Haskell Ind~ 808 Kansas Two Or Mor~ 0 0
## 9 Everest Uni~ 634 Florida Non-Reside~ 2 0.315
## 10 <NA> 131 <NA> Native Haw~ 0 0
## # ... with 25 more rows
diversity_new5$name[grepl('University of Houston', diversity_new5$name)] <- 'Univ. of Houston'
diversity_new5$name[grepl('University of Colorado', diversity_new5$name)] <- 'Univ. of Colorado'
diversity_new5$name[grepl('University of Massachusetts', diversity_new5$name)] <- 'Univ. of Massachusetts'
diversity_new5$name[grepl('Arizona College', diversity_new5$name)] <- 'Arizona College'
diversity_new5$name[grepl('University of Idaho', diversity_new5$name)] <- 'Univ. of Idaho'
diversity_new5$name[grepl('Clark College', diversity_new5$name)] <- 'Clark College'
diversity_new5$name[grepl('City University of New York Hunter College', diversity_new5$name)] <- 'City Univ. New York Hunter College'
diversity_new5$name[grepl('State University of New York', diversity_new5$name)] <- 'State Univ. of New York'
diversity_new5$name[grepl('Pennsylvania State University',diversity_new5$name)]<- 'Penn State Univ.'
diversity_new5$name[grepl('University of Minnesota',diversity_new5$name)]<- 'Univ. of Minn'
diversity_new5$name[grepl('Arizona State University',diversity_new5$name)]<- 'Arizona State'
diversity_new5$name[grepl('ITT Technical Institute',diversity_new5$name)]<- 'ITT Tech'
We were initially going to use the column to create our plots and complete out analysis, however we realized it would be faster to create separate datasets with the randomly selected schools. The below coding successfully create a new column with “1” for the identified for-profit colleges and “0” for the rest.
diversity_new3 <- diversity_new %>% mutate(diversity_new,forProfit = ifelse(name=="Spencerian_College", 1,
ifelse(name=="Aspen University", 1,
ifelse(name=="American Public University system", 1, 0))))
diversity_new3$forProfit[47834] # Check on Spencerian_College entry
## [1] 0
diversity_new3$forProfit[24533] # Check on Aspen University
## [1] 1
diversity_new3$forProfit[123] # Check on American Public University system
## [1] 1
Since the final combined dataset had nearly 4000 rows we decided to make a new dataset with simplified data. We combined all of the school entries with multiple locations into one entry with the rows populated by the averages for each variable
for_profit2 <- diversity_new5 %>%
filter(name == "Spencerian College" |
name == "Mildred Elley" |
name == "Brookline College" |
name == "Grand Canyon Univ." |
name == "Aspen Univ." |
name == "American Public Univ." |
name == "Western Intl Univ." |
name == "NewSchool Arch. Design" |
name == "Schiller Intl Univ." |
name == "Natl Paralegal College" |
name == "West Coast Univ." |
name == "Blue Cliff College" |
name == "Walden University" |
name == "Neumont University" |
name == "University of Phoenix" |
name == "Stevens-Henager College" |
name == "DeVry Univ." |
name == "Pioneer Pacific College" |
name == "Stratford University" |
name == "Capella University" |
name == "Grantham University" |
name == "Redstone College" |
name == "National College" |
name == "Strayer Univ." |
name == "Lincoln Tech" |
name == "Fashion Institute" |
name == "Centura College" |
name == "Rasmussen College" |
name == "Fortis College" |
name == "Full Sail University" |
name == "Rocky Mountain College of Art & Design" |
name == "Minneapolis Business College" |
name == "Paier College of Art" |
name == "Vista College" |
name == "Bay State College" )
str(for_profit2)
## tibble [1,014 x 6] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:1014] "University of Phoenix" "University of Phoenix" "University of Phoenix" "University of Phoenix" ...
## $ total_enrollment : num [1:1014] 195059 195059 195059 195059 195059 ...
## $ state : chr [1:1014] "Arizona" "Arizona" "Arizona" "Arizona" ...
## $ category : chr [1:1014] "American Indian / Alaska Native" "Asian" "Black" "Hispanic" ...
## $ enrollment : num [1:1014] 876 1959 31455 13984 1019 ...
## $ enrollment_percentage: num [1:1014] 0.449 1.004 16.126 7.169 0.522 ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. total_enrollment = col_double(),
## .. state = col_character(),
## .. category = col_character(),
## .. enrollment = col_double()
## .. )
head(for_profit2)
## # A tibble: 6 x 6
## name total_enrollment state category enrollment enrollment_perce~
## <chr> <dbl> <chr> <chr> <dbl> <dbl>
## 1 Universit~ 195059 Arizo~ American Indi~ 876 0.449
## 2 Universit~ 195059 Arizo~ Asian 1959 1.00
## 3 Universit~ 195059 Arizo~ Black 31455 16.1
## 4 Universit~ 195059 Arizo~ Hispanic 13984 7.17
## 5 Universit~ 195059 Arizo~ Native Hawaii~ 1019 0.522
## 6 Universit~ 195059 Arizo~ White 58209 29.8
not_profit <- diversity_new5 %>%
filter(name == "Univ. of Idaho" |
name == "Southeastern Community College Iowa"|
name == "Clover Park Technical College"|
name == "Clark College"|
name == "City Univ. New York Hunter College"|
name == "Univ. of Houston" |
name == "Univ. of Colorado"|
name == "Univ. of Massachusetts" |
name == "Arizona College" |
name == "San Diego City College" |
name == "State Univ. of New York" |
name == "Eastern Shore Community College" |
name == "Southern Oregon University" |
name == "Santa Monica College" |
name == "Adirondack Community College" |
name == "East Georgia State College" |
name == "Smith College" |
name == "East Tennessee State University" |
name == "Austin Community College" |
name == "Tompkins Cortland Community College" |
name == "Penn State Univ." |
name == "Univ. of Minn" |
name == "Arizona State" |
name == "ITT Tech" |
name == "University of Hawaii Hawaii Community College"|
name == "Burlington College" |
name == "University of Central Oklahoma" |
name == "Southern Virginia University" |
name == "Pennsylvania Highlands Community College" |
name == "Colgate University"|
name == "University of Pittsburg")
For some reason the name filter for “Southeastern Community College (Iowa)” would not select properly. The same happened for the attempted name change above.
for_profit3 <- cbind(for_profit2, profit_status = "for-profit")
not_profit2 <- cbind(not_profit, profit_status = "non-profit")
combo_set <- rbind(for_profit3, not_profit2)
combo_set2 <- combo_set %>%
mutate_if(is.numeric, round, digits = 2)
# Removing the "state" category so we can make a new dataset with means for each column without an error due to categorical differences that cannot be merged with a mean calculation.
c3 <- combo_set2[,-3]
#creating a new dataset with all locations in one entry for each school.
c4 <- c3 %>%
group_by(name, category, profit_status) %>%
summarise_each(funs(mean))
## Warning: `summarise_each_()` is deprecated as of dplyr 0.7.0.
## Please use `across()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
## Warning: `funs()` is deprecated as of dplyr 0.8.0.
## Please use a list of either functions or lambdas:
##
## # Simple named list:
## list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`:
## tibble::lst(mean, median)
##
## # Using lambdas
## list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_warnings()` to see where this warning was generated.
This list was based off of unmerged school locations so we merged all locations into one prior to the final dataset merge.
# For-profit short combined list
c3a <- c3 %>%
filter(profit_status == "for-profit") %>%
group_by(name, category, profit_status) %>%
summarise_each(funs(mean))
#str(c3a)
# not-profit short combined list
c3b <- c3 %>%
filter(profit_status == "non-profit") %>%
group_by(name, category, profit_status) %>%
summarise_each(funs(mean))
#str(c3b)
# creating census short combined list
c3c <- c3b
c3c$profit_status[c3c$profit_status == "non-profit"] <- "census"
#str(c3c)
# census percentages
c3c$enrollment_percentage[c3c$category == 'White'] <- as.numeric(72)
c3c$enrollment_percentage[c3c$category == 'Black'] <- as.numeric(12.7)
c3c$enrollment_percentage[c3c$category == 'American Indian / Alaska Native'] <- as.numeric(0.9)
c3c$enrollment_percentage[c3c$category == 'Asian'] <- as.numeric(5.6)
c3c$enrollment_percentage[c3c$category == 'Native Hawaiian / Pacific Islander'] <- as.numeric(0.2)
c3c$enrollment_percentage[c3c$category == 'Hispanic'] <- as.numeric(18)
# population totals
#census4bind$enrollment[census4bind$category == 'White'] <- '235560556'
#census4bind$enrollment[census4bind$category == 'Black'] <- '41550265'
#census4bind$enrollment[census4bind$category == 'American Indian / Alaska Native'] <- '2944507'
#census4bind$enrollment[census4bind$category == 'Asian'] <- '18321377'
#census4bind$enrollment[census4bind$category == 'Native Hawaiian / Pacific Islander'] <- '654334.9'
#census4bind$enrollment[census4bind$category == 'Hispanic'] <- '58890139'
c3c$enrollment_percentage <- as.numeric(as.character(c3c$enrollment_percentage))
#str(c3c)
c3bind <-rbind(c3c, c3b, c3a )
#str(c3bind)
plot1 <- for_profit2 %>%
ggplot() +
geom_bar(aes(x=name, y= enrollment_percentage ,
fill = category),
position = "fill",
stat = "identity" ) +
coord_flip() +
theme_minimal() +
ggtitle("Diversity % of Enrollment in For Profit US Schools", ) +
theme (plot.title = element_text(hjust = .01, size=15)) +
labs(fill = "Race") +
theme(legend.justification = -20,
legend.position="bottom",
legend.text = element_text(size=6) ,
) +
xlab("School Name") +
ylab ("Percent of Enrollment")
plot1
plot2 <- not_profit %>%
ggplot() +
geom_bar(aes(x=name, y= enrollment_percentage ,
fill = category),
position = "fill",
stat = "identity" ) +
coord_flip() +
theme_minimal() +
ggtitle("Diversity % of Enrollment in Not for Profit US Schools", ) +
theme (plot.title = element_text(hjust = .01, size=12)) +
labs(fill = "Race") +
theme( legend.position="bottom",
legend.justification = "left",
legend.text = element_text(size=6) ,
) +
xlab("School Name") +
ylab ("Percent of Enrollment")
plot2
Comparing this chart to the chart above suggests that there is a higher proportion of white students at non-profit schools than for-profit schools and that there is a higher percentage of students of color at for-profit schools than non-profit schools. Computational analysis must be completed.
ggplot(data=c3bind, aes(x=category, y=enrollment_percentage, fill=profit_status)) +
geom_bar(stat="identity", position=position_dodge()) +
ggtitle("For-Profit/Non-Profit Racial & Ethnic Percent Enrollment Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity")+
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
theme_minimal()+
scale_fill_manual(values=c("#CC6666","#9999CC", "#66CC99"))
This graph incorporates new census data to compare the population proportions to the enrollment percentage proportions. The most striking information in this graphic is the high percentage of for-profit enrollment of most students of color and the higher percentage of white students at non-profit institutions.
c3bind <- c3bind %>%
mutate(enrollment_percentage = enrollment_percentage/100) %>%
mutate_if(is.numeric, round, digits = 2)
## `mutate_if()` ignored the following grouping variables:
## Columns `name`, `category`
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
ggplot(c3bind, aes(reorder(profit_status, enrollment_percentage), enrollment_percentage, fill = profit_status)) +
geom_bar(stat = 'identity') +
labs(x = "Isolated Race/Ethnic Categories", y = "Enrollment Percentage", fill = "Profit Status")+
#scale_y_continuous(labels = function(x) paste0((x*5), "%")) + # Multiply by 5 & add %
facet_wrap(~ category)+
theme_bw()+
scale_fill_brewer()
The above facet-wrap graph is consistent with the barplot above. However, one issue that this graph presents is the percentage scale. This is an issue that impacted a few of the other percentage-based visualizations we completed, particularly with the facet-wrap feature and will be discussed further.
With these individual breakdowns we thought it would be interesting to see what the overall breakdown would be between aggregated people of color as a category and white people as a category, particularly because of the vastly higher number of white people in the U.S. population compared with each individual group of people of color.
# creating a set with only categories for people of color
c5a <- c4 %>%
filter(category != "White")
# removing individual categories
c5b <- c5a[,-2]
# creating a new dataset with percentages summed for overall people of color
c5c <- c5b %>%
group_by(name, profit_status) %>%
summarise_each(funs(mean))
# adding a column to distinguish POC
c5c <- cbind(c5c, category = "People of Color")
c5cmean <- mean(c5c$enrollment_percentage)
c5cmean
## [1] 7.334774
# creating a set of all white category
c5d <- c4 %>%
filter(category == "White")
c5dmean <- mean(c5d$enrollment_percentage)
c5dmean
## [1] 48.29188
# joining the overall people of color dataset with the white people dataset for comparison
c6 <- rbind(c5c, c5d)
# white census category
c3c_white <- c3c %>%
filter(category == "White")
mean(c3c_white$enrollment_percentage)
## [1] 72
# people of color census category
c3c_poc <- c3c %>%
filter(category != "White")
# adding a column to distinguish POC
c3c_poc <- c3c_poc[,-2]
mean(c3c_poc$enrollment_percentage)
## [1] 7.48
c3c_poc <- cbind(c3c_poc, category = "People of Color")
c6census <- rbind(c6, c3c_white, c3c_poc)
p1 <- c6 %>% mutate(enrollment_percentage = enrollment_percentage/100)
ggplot(c6, aes(reorder(category, enrollment_percentage), enrollment_percentage, fill = category)) +
geom_bar(stat = 'identity') +
ggtitle("For-Profit/Non-Profit Racial & Ethnic Percent Enrollment Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity") +
scale_fill_manual(values=c("#9999CC", "#66CC99")) +
theme_light()+
facet_wrap(~ profit_status)
This plot makes clear that there are more people of color enrolled in for-profit colleges than non-profit colleges and more White people enrolled in non-profit colleges than for-profit colleges. Two problems, we do not have census proportions incorporated and our scale is incorrect. Let us now incorporate the census information and then address the scale issue.
ggplot(c6census, aes(reorder(category, enrollment_percentage), enrollment_percentage, fill = category)) +
geom_bar(stat = 'identity') +
ggtitle("For-Profit/Non-Profit Racial & Ethnic Percent Enrollment Comparisons") +
labs(y = "Enrollment Percentage", x = "Race/Ethnicity") +
scale_fill_manual(values=c("#9999CC", "#66CC99")) +
scale_y_continuous(labels = function(x) paste(((x/40) + 20), "%")) + # Multiply by 5 & add %
theme_light()+
facet_wrap(~ profit_status)
This graph incorporates the census information and supports the alternative hypothesis that there are proportionally unequal representations of people of color and white people at for-profit and non-profit colleges. It also makes clear that there are proportionally more white people enrolling in institutions of higher education than people of color.
Regarding the scale issue, we were able to manually manipulate the y-axis scale based off of our knowledge that white people account for 72% of the population. However, would this be ethical? We do not think so. Although we are basing the scale off of accurate information, it opens a slippery slope of visual adjustment of data. It is a good lesson for us to see how easy it is to manipulate the data presentation. We want to learn the correct way by investigating why our scale is inaccurate and determine how to fix the problem in the formulated plot without aesthetic manipulation.
ptc1 <- monsterfinal %>%
ggplot()+
geom_point(aes(x=in_state_total, y=early_career_pay, fill = type, text = paste("Name:", name, '</br>', '</br>In-State Tuition:', in_state_total, '</br>Early Career Pay:', early_career_pay)))+
stat_smooth(aes(x=in_state_total, y=early_career_pay), method = 'lm', se = FALSE, lwd = 0.5, col = "black")+
ggtitle("In-State Tuition and Early-Career Salary")+
labs(x="In-State Tuition (per year)", y="Early-Career Salary")+
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))
## Warning: Ignoring unknown aesthetics: text
ggplotly(ptc1, tooltip = "text")
## `geom_smooth()` using formula 'y ~ x'
This plot suggests a few interpretations. First, private schools cost much more than in-state public schools. Second, in general, there appears to be a positively correlated relationship between tuition cost and early-career salary, with some exceptions. Third, there is a wider range of tuition costs in private schools than the public school tuition range. This graph is interactive to allow the reader to explore costs and salary potential at individual schools. There was not much aesthetic manipulation of this graph, further graphs explore utilizing different shapes and colors. Now let us examine if the visual is supported by the math.
fitInStateEarlyCareer<- lm(monsterfinal$early_career_pay ~ monsterfinal$in_state_total)
summary(fitInStateEarlyCareer)
##
## Call:
## lm(formula = monsterfinal$early_career_pay ~ monsterfinal$in_state_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16883.4 -4543.3 -858.2 3383.9 28405.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.116e+04 2.623e+02 156.91 <2e-16 ***
## monsterfinal$in_state_total 2.565e-01 6.138e-03 41.79 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6747 on 3586 degrees of freedom
## Multiple R-squared: 0.3275, Adjusted R-squared: 0.3273
## F-statistic: 1746 on 1 and 3586 DF, p-value: < 2.2e-16
The In-State tuition rates do have a low p-value suggesting that we should reject the null hypothesis that there is no relationship between in-state tuition and early career pay. However the adjusted R-squared value is only 32.73%, as is demonstrated by the wide spread of datapoints on our plot, telling us that in-state tuition alone does not predict early career pay, rather there is a strong chance that other factors contribute to the observed outcome.
ptc2 <- monsterfinal %>%
ggplot()+
geom_point(aes(x=in_state_total, y=mid_career_pay, fill = type, text = paste("Name:", name, '</br>', '</br>In-State Tuition:', in_state_total, '</br>Mid Career Pay:', mid_career_pay)))+
stat_smooth(aes(x=in_state_total, y=mid_career_pay), method = 'lm', se = FALSE, lwd = 0.5, col = "black")+
ggtitle("In-State Tuition and Mid-Career Salary")+
labs(x="In-State Tuition (per year)", y="Mid-Career Salary")+
theme_minimal()+
scale_fill_viridis_d()+
theme(plot.title = element_text(hjust = 0.5))
## Warning: Ignoring unknown aesthetics: text
ggplotly(ptc2, tooltip = "text")
## `geom_smooth()` using formula 'y ~ x'
The results of this plot are very similar to the previous plot of early-career salary. A few notable points are the expanded range of salary at mid-career versus early-career. There are divergences both upward and downward further down the career track. However the general positive correlation still appears to hold. Aesthetically we varied the color scheme to introduce further contrast. Let us examine the numbers.
fitInStateMidCareer<- lm(monsterfinal$mid_career_pay ~ monsterfinal$in_state_total)
summary(fitInStateMidCareer)
##
## Call:
## lm(formula = monsterfinal$mid_career_pay ~ monsterfinal$in_state_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32182 -8723 -1893 6109 50414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.265e+04 4.956e+02 146.59 <2e-16 ***
## monsterfinal$in_state_total 5.147e-01 1.160e-02 44.38 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12750 on 3586 degrees of freedom
## Multiple R-squared: 0.3545, Adjusted R-squared: 0.3543
## F-statistic: 1970 on 1 and 3586 DF, p-value: < 2.2e-16
The In-State tuition rates do have a low p-value suggesting that we should reject the null hypothesis that there is no relationship between in-state tuition and mid-career pay. However the adjusted R-squared value is only 35.43%, as is demonstrated by the wide spread of datapoints on our plot, telling us that in-state tuition alone does not predict mid-career pay, rather there is a strong chance that other factors contribute to the observed outcome.
ptc3 <- monsterfinal %>%
ggplot()+
geom_point(aes(x=out_of_state_total, y=early_career_pay, fill = type, shape = type, text = paste("Name:", name, '</br>', '</br>Out-of-State Tuition:', out_of_state_total, '</br>Early Career Pay:', early_career_pay)))+
stat_smooth(aes(x=out_of_state_total, y=early_career_pay), method = 'lm', se = FALSE, lwd = 0.5, col = "black")+
ggtitle("Out-of-State Tuition and Early-Career Salary")+
labs(x="Out-of-State Tuition (per year)", y="Early-Career Salary")+
theme_minimal()+
theme(plot.title = element_text(hjust = 0.5))
## Warning: Ignoring unknown aesthetics: text
ggplotly(ptc3, tooltip = "text")
## `geom_smooth()` using formula 'y ~ x'
This plot shows us that there is much more overlap in tuition costs with out-of-state public school tuition and private school tuition. The positively correlated relationship appears to be stronger in this plot. Aesthetically we changed the shape of the public school datapoints, however we find that a color contrast is more effective for this plot due to the high number of datapoints. If we had fewer datapoints the different shapes could be increased in size and would be more effective. Let us now examine the math.
fitOutStateEarlyCareer<- lm(monsterfinal$early_career_pay ~ monsterfinal$out_of_state_total)
summary(fitOutStateEarlyCareer)
##
## Call:
## lm(formula = monsterfinal$early_career_pay ~ monsterfinal$out_of_state_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16254.6 -3955.1 -913.2 2777.8 28770.0
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.496e+04 3.053e+02 114.52 <2e-16 ***
## monsterfinal$out_of_state_total 3.753e-01 6.720e-03 55.84 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6017 on 3586 degrees of freedom
## Multiple R-squared: 0.4651, Adjusted R-squared: 0.465
## F-statistic: 3119 on 1 and 3586 DF, p-value: < 2.2e-16
The out-of-state tuition rates do have a low p-value suggesting that we should reject the null hypothesis that there is no relationship between in-state tuition and early career pay. The adjusted R-squared value is 46.5%, which is noticeably greater than in-state tuition. However, it is still not close to 100%, which tells us that out-of-state tuition alone does not predict early career pay, rather there is a strong chance that other factors contribute to the observed outcome.
ptc4 <- monsterfinal %>%
ggplot()+
geom_point(aes(x=out_of_state_total, y=mid_career_pay, fill = type, text = paste("Name:", name, '</br>', '</br>Out-of-State Tuition:', out_of_state_total, '</br>Mid-Career Pay:', mid_career_pay)))+
stat_smooth(aes(x=out_of_state_total, y=mid_career_pay), method = 'lm', se = FALSE, lwd = 0.5, col = "black")+
ggtitle("Out-of-State Tuition and Mid-Career Salary")+
labs(x="Out-of-State Tuition (per year)", y="Mid-Career Salary")+
theme_minimal()+
scale_fill_viridis_d()+
theme(plot.title = element_text(hjust = 0.5))
## Warning: Ignoring unknown aesthetics: text
ggplotly(ptc4, tooltip = "text")
## `geom_smooth()` using formula 'y ~ x'
Similarly to the in-state tuition plots, this plot largely mimics the early-career plot, however there is a wider apparent range in salaries. The positive correlation still appears to hold.
fitOutStateMidCareer<- lm(monsterfinal$mid_career_pay ~ monsterfinal$out_of_state_total)
summary(fitOutStateMidCareer)
##
## Call:
## lm(formula = monsterfinal$mid_career_pay ~ monsterfinal$out_of_state_total)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30896 -7692 -1775 5496 49950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.047e+04 5.717e+02 105.76 <2e-16 ***
## monsterfinal$out_of_state_total 7.471e-01 1.259e-02 59.36 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 11270 on 3586 degrees of freedom
## Multiple R-squared: 0.4956, Adjusted R-squared: 0.4954
## F-statistic: 3523 on 1 and 3586 DF, p-value: < 2.2e-16
The out-of-state tuition rates do have a low p-value suggesting that we should reject the null hypothesis that there is no relationship between out-of-state tuition and mid-career pay. The adjusted R-squared value is 49.54%, which is noticeably greater than in-state tuition. However, it is still not close to 100%, which tells us that out-of-state tuition alone does not predict mid-career pay, rather there is a strong chance that other factors contribute to the observed outcome.
library('rvest')
library(tidyverse)
library(dplyr)
#read.csv("/Users/tiffanyking/Documents/Data101/tuition_cost.csv")
#setwd("/Users/tiffanyking/Documents/Data101")
#tuition <- read_csv("tuition_cost.csv")
head(tuition)
## # A tibble: 6 x 10
## name state state_code type degree_length room_and_board in_state_tuition
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Aani~ Mont~ MT Publ~ 2 Year NA 2380
## 2 Abil~ Texas TX Priv~ 4 Year 10350 34850
## 3 Abra~ Geor~ GA Publ~ 2 Year 8474 4128
## 4 Acad~ Minn~ MN For ~ 2 Year NA 17661
## 5 Acad~ Cali~ CA For ~ 4 Year 16648 27810
## 6 Adam~ Colo~ CO Publ~ 4 Year 8782 9440
## # ... with 3 more variables: in_state_total <dbl>, out_of_state_tuition <dbl>,
## # out_of_state_total <dbl>
We changed all of the names of our schools to simplified versions which was particularly important for the schools with multiple locations.
tuition$name[grepl('Clover Park Technical College', tuition$name)] <- 'Clover Park Tech College'
tuition$name[grepl('San Diego City College', tuition$name)] <- 'San Diego City College'
tuition$name[grepl('Aspen University', tuition$name)] <- 'Aspen Univ.'
tuition$name[grepl('Grand Canyon University', tuition$name)] <- 'Grand Canyon Univ.'
tuition$name[grepl('American Public University', tuition$name)] <- 'American Public Univ.'
tuition$name[grepl("Southeastern Community College (Iowa)", tuition$name)] <- "Southeastern Community College Iowa"
tuition$name[grepl('Blue Cliff College', tuition$name)] <- 'Blue Cliff College'
tuition$name[grepl('University of Phoenix', tuition$name)] <- 'University of Phoenix'
tuition$name[grepl('Stevens-Henager College', tuition$name)] <- 'Stevens-Henager College'
tuition$name[grepl('DeVry University', tuition$name)] <- 'DeVry Univ.'
tuition$name[grepl('Pioneer Pacific College', tuition$name)] <- 'Pioneer Pacific College'
tuition$name[grepl('National College at', tuition$name)] <- 'National College'
tuition$name[grepl('Strayer University', tuition$name)]<- 'Strayer Univ.'
tuition$name[grepl('Lincoln Tech',tuition$name)]<- 'Lincoln Tech'
tuition$name[grepl('Fashion Institute of Design and Merchandising', tuition$name)]<- 'Fashion Institute'
tuition$name[grepl('Centura College', tuition$name)]<- 'Centura College'
tuition$name[grepl('Rasmussen College', tuition$name)]<- 'Rasmussen College'
tuition$name[grepl('Fortis College',tuition$name)]<- 'Fortis College'
tuition$name[grepl('University of Houston', tuition$name)] <- 'Univ. of Houston'
tuition$name[grepl('University of Colorado', tuition$name)] <- 'Univ. of Colorado'
tuition$name[grepl('University of Massachusetts', tuition$name)] <- 'Univ. of Massachusetts'
tuition$name[grepl('Arizona College', tuition$name)] <- 'Arizona College'
tuition$name[grepl('University of Idaho', tuition$name)] <- 'Univ. of Idaho'
tuition$name[grepl('Clark College', tuition$name)] <- 'Clark College'
tuition$name[grepl('City University of New York Hunter College', tuition$name)] <- 'City Univ. New York Hunter College'
tuition$name[grepl('State University of New York', tuition$name)] <- 'State Univ. of New York'
tuition$name[grepl('Pennsylvania State University',tuition$name)]<- 'Penn State Univ.'
tuition$name[grepl('University of Minnesota',tuition$name)]<- 'Univ. of Minn'
tuition$name[grepl('Arizona State University',tuition$name)]<- 'Arizona State'
tuition$name[grepl('ITT Technical Institute',tuition$name)]<- 'ITT Tech'
for_profit2 <- tuition %>%
filter(name == "Spencerian College" |
name == "Mildred Elley" |
name == "Brookline College" |
name == "Grand Canyon Univ." |
name == "Aspen Univ." |
name == "American Public Univ." |
name == "Western Intl Univ." |
name == "NewSchool Arch. Design" |
name == "Schiller Intl Univ." |
name == "Natl Paralegal College" |
name == "West Coast Univ." |
name == "Blue Cliff College" |
name == "Walden University" |
name == "Neumont University" |
name == "University of Phoenix" |
name == "Stevens-Henager College" |
name == "DeVry Univ." |
name == "Pioneer Pacific College" |
name == "Stratford University" |
name == "Capella University" |
name == "Grantham University" |
name == "Redstone College" |
name == "National College" |
name == "Strayer Univ." |
name == "Lincoln Tech" |
name == "Fashion Institute" |
name == "Centura College" |
name == "Rasmussen College" |
name == "Fortis College" |
name == "Full Sail University" |
name == "Rocky Mountain College of Art & Design" |
name == "Minneapolis Business College" |
name == "Paier College of Art" |
name == "Vista College" |
name == "Bay State College" )
str(for_profit2)
## tibble [6 x 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:6] "Bay State College" "Capella University" "Grand Canyon Univ." "Grantham University" ...
## $ state : chr [1:6] "Massachusetts" "Minnesota" "Arizona" "Kansas" ...
## $ state_code : chr [1:6] "MA" "MN" "AZ" "KS" ...
## $ type : chr [1:6] "For Profit" "For Profit" "For Profit" "For Profit" ...
## $ degree_length : chr [1:6] "2 Year" "4 Year" "4 Year" "4 Year" ...
## $ room_and_board : num [1:6] 13300 NA 10100 NA NA NA
## $ in_state_tuition : num [1:6] 27750 13788 27090 6540 18040 ...
## $ in_state_total : num [1:6] 41050 13788 37190 6540 18040 ...
## $ out_of_state_tuition: num [1:6] 27750 13788 27090 6540 18040 ...
## $ out_of_state_total : num [1:6] 41050 13788 37190 6540 18040 ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. state = col_character(),
## .. state_code = col_character(),
## .. type = col_character(),
## .. degree_length = col_character(),
## .. room_and_board = col_double(),
## .. in_state_tuition = col_double(),
## .. in_state_total = col_double(),
## .. out_of_state_tuition = col_double(),
## .. out_of_state_total = col_double()
## .. )
head(for_profit2)
## # A tibble: 6 x 10
## name state state_code type degree_length room_and_board in_state_tuition
## <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Bay ~ Mass~ MA For ~ 2 Year 13300 27750
## 2 Cape~ Minn~ MN For ~ 4 Year NA 13788
## 3 Gran~ Ariz~ AZ For ~ 4 Year 10100 27090
## 4 Gran~ Kans~ KS For ~ 4 Year NA 6540
## 5 Paie~ Conn~ CT For ~ 4 Year NA 18040
## 6 Wald~ Minn~ MN For ~ 4 Year NA 15045
## # ... with 3 more variables: in_state_total <dbl>, out_of_state_tuition <dbl>,
## # out_of_state_total <dbl>
not_profit <- tuition %>%
filter(name == "Univ. of Idaho" |
name == "Clover Park Technical College"|
name == "Clark College"|
name == "City Univ. New York Hunter College"|
name == "Univ. of Houston" |
name == "Univ. of Colorado"|
name == "Univ. of Massachusetts" |
name == "Arizona College" |
name == "San Diego City College" |
name == "State Univ. of New York" |
name == "Eastern Shore Community College" |
name == "Southern Oregon University" |
name == "Santa Monica College" |
name == "Adirondack Community College" |
name == "East Georgia State College" |
name == "Smith College" |
name == "East Tennessee State University" |
name == "Austin Community College" |
name == "Tompkins Cortland Community College" |
name == "Penn State Univ." |
name == "Univ. of Minn" |
name == "Arizona State" |
name == "ITT Tech" |
name == "University of Hawaii Hawaii Community College"|
name == "Burlington College" |
name == "University of Central Oklahoma" |
name == "Southern Virginia University" |
name == "Pennsylvania Highlands Community College" |
name == "Colgate University"|
name == "University of Pittsburg")
diversity <- read_csv("diversity2.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## total_enrollment = col_double(),
## state = col_character(),
## category = col_character(),
## enrollment = col_double()
## )
salary <- read_csv("salary_potential.csv")
## Parsed with column specification:
## cols(
## rank = col_double(),
## name = col_character(),
## state_name = col_character(),
## early_career_pay = col_double(),
## mid_career_pay = col_double(),
## make_world_better_percent = col_double(),
## stem_percent = col_double()
## )
salary_added <- left_join(diversity,salary,by="name")
salary_added_final <- na.omit(salary_added)
tuition <- read_csv("tuition_cost.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## state = col_character(),
## state_code = col_character(),
## type = col_character(),
## degree_length = col_character(),
## room_and_board = col_double(),
## in_state_tuition = col_double(),
## in_state_total = col_double(),
## out_of_state_tuition = col_double(),
## out_of_state_total = col_double()
## )
tuition_added <- left_join(tuition,salary_added_final,by="name")
monsterfinal <- na.omit(tuition_added)
library(ggplot2)
mean(not_profit$in_state_tuition)
## [1] 13346.43
mean(for_profit2$in_state_tuition)
## [1] 18042.17
mean(not_profit$out_of_state_tuition)
## [1] 23281.38
mean(for_profit2$out_of_state_tuition)
## [1] 18042.17
test <- data.frame( "Name" = c("Not Profit In State","Not Profit Out State","For Profit In State","For Profit Out State"),"Mean" = c(13346.43,23281.38,18042.17,18042.17))
str(not_profit)
## tibble [37 x 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ name : chr [1:37] "Adirondack Community College" "Arizona State" "Austin Community College" "Arizona College" ...
## $ state : chr [1:37] "New York" "Arizona" "Texas" "Arizona" ...
## $ state_code : chr [1:37] "NY" "AZ" "TX" "AZ" ...
## $ type : chr [1:37] "Public" "Public" "Public" "Public" ...
## $ degree_length : chr [1:37] "2 Year" "4 Year" "2 Year" "2 Year" ...
## $ room_and_board : num [1:37] 11660 12648 NA 7700 NA ...
## $ in_state_tuition : num [1:37] 5375 10822 2550 2580 4287 ...
## $ in_state_total : num [1:37] 17035 23470 2550 10280 4287 ...
## $ out_of_state_tuition: num [1:37] 9935 28336 13020 11100 9718 ...
## $ out_of_state_total : num [1:37] 21595 40984 13020 18800 9718 ...
## - attr(*, "spec")=
## .. cols(
## .. name = col_character(),
## .. state = col_character(),
## .. state_code = col_character(),
## .. type = col_character(),
## .. degree_length = col_character(),
## .. room_and_board = col_double(),
## .. in_state_tuition = col_double(),
## .. in_state_total = col_double(),
## .. out_of_state_tuition = col_double(),
## .. out_of_state_total = col_double()
## .. )
install.packages("RColorBrewer")
## Warning: package 'RColorBrewer' is in use and will not be installed
library(RColorBrewer)
plot1 <- test %>%
ggplot(aes(x=Name, y=Mean,fill=Name)) +
geom_bar(stat="identity")+
coord_flip() +
theme_minimal() +
ggtitle("Average Tuition Rate For & Not Profit US Schools", ) +
theme (plot.title = element_text(hjust = .01, size=15)) +
theme(legend.justification = -20,
legend.position="bottom",
legend.text = element_text(size=6) ,
) +
xlab("School Type") +
ylab ("Cost") +
scale_color_brewer()
plot1
This plot, created from a randomly selected dataset, suggests that the out of state tuition costs outweigh the costs at for profit institutions. Non-profit in-state institutions were the least expensive in our dataset. The dataset did include community colleges, which would be one factor to control for in further analysis.
test2 <- data.frame ("Name" = c("Univ. of Idaho", "Clover Park Technical College", "Clark College","City Univ. New York Hunter College", "Univ. of Houston","Univ. of Colorado", "Univ. of Massachusetts", "Arizona College","San Diego City College","State Univ. of New York", "Eastern Shore Community College","Southern Oregon University", "Santa Monica College","Adirondack Community College","East Georgia State College","Smith College","East Tennessee State University","Austin Community College", "Tompkins Cortland Community College","Penn State Univ.","Univ. of Minn","Arizona State","ITT Tech","University of Hawaii Hawaii Community College","Burlington College","University of Central Oklahoma", "Southern Virginia University","Pennsylvania Highlands Community College","Colgate University","University of Pittsburg") , "Tuition"=c(7864, 4271, 4287, 7180, 12506, 11463, 14835, 2640, 1418, 8540, 4800, 9654, 1444,5375,3666, 52404,9277,2550,6046,18454,11921,10822, 45000, 3915, 22500, 7488, 16495, 5820, 55870, 19080))
plot1 <- test2%>%
ggplot(aes(x=Name, y=Tuition ,fill=Tuition)) +
geom_bar(stat="identity")+
coord_flip() +
theme_minimal() +
ggtitle("In-State Tuition Rate for Not Profit US Schools", ) +
theme (plot.title = element_text(hjust = .01, size=15)) +
theme(legend.justification = 1,
legend.position="bottom",
legend.text = element_text(size=6) ,
) +
xlab("School name") +
ylab ("Tuition Cost") +
scale_color_brewer()
plot1
This plot identifies the individual costs of tuition within our non-profit school dataset. The private colleges were much more expensive than the in-state tuition at public schools.
test3 <- data.frame("Name"=c("Spencerian College", "Mildred Elley", "Brookline College","Grand Canyon Univ.","Aspen Univ.", "American Public Univ.", "Western Intl Univ.", "NewSchool Arch. Design", "Schiller Intl Univ.", "Natl Paralegal College", "West Coast Univ.", "Blue Cliff College", "Walden University", "Neumont University", "University of Phoenix", "Stevens-Henager College", "DeVry Univ.", "Pioneer Pacific College", "Stratford University", "Capella University","Grantham University", "Redstone College", "National College","Strayer Univ.","Lincoln Tech", "Fashion Institute","Centura College","Rasmussen College","Fortis College", "Full Sail University", "Rocky Mountain College of Art & Design", "Minneapolis Business College", "Paier College of Art", "Vista College","Bay State College"), "Tuition3"=c(16860,14250,13500,27090,14492,7324,11040,28386,17700,7995,17030,15182,15045,24750,9,552,14585,15188,2310,16750,13788,6540,15530,13320,27401,5740,15887,5200,15511,19929,28512,15120,18040,2720,27750))
##Graph the Test 3 Data Set(For-Profit-In-state)
plot1 <- test3%>%
ggplot(aes(x=Name, y=Tuition3 ,fill=Tuition3)) +
geom_bar(stat="identity")+
coord_flip() +
theme_minimal() +
ggtitle("In-State Tuition Rate for For Profit US Schools", ) +
theme (plot.title = element_text(hjust = .01, size=15)) +
theme(legend.justification = 1,
legend.position="bottom",
legend.text = element_text(size=6) ,
) +
xlab("School name") +
ylab ("Tuition Cost") +
scale_color_brewer()
plot1
Consistent with our previous plots, the in-state tuition rates at for-profit schools were all more expensive than the in-state tuition at non-profit schools.
#install.packages("RColorBrewer")
#Loading the rvest package
library('rvest')
library(tidyverse)
library(dplyr)
#Specifying the url for desired website to be scraped
url <- 'https://en.wikipedia.org/wiki/College_and_university_rankings_in_the_United_States'
#Reading the HTML code from the website
top <- read_html(url)
#Using CSS selectors to scrape the rankings section
top_html <- html_nodes(top,'td a')
#Converting the ranking data to text
topschools <- html_text(top_html)
#topschools
topschools_new <- topschools[11:30]
#df <-data_frame(topschools_new)
#df
topschools_new
## [1] "Harvard University"
## [2] "Stanford University"
## [3] "Yale University"
## [4] "Massachusetts Institute of Technology"
## [5] "Princeton University"
## [6] "University of Pennsylvania"
## [7] "Brown University"
## [8] "California Institute of Technology"
## [9] "Duke University"
## [10] "Dartmouth College"
## [11] "Cornell University"
## [12] "Pomona College"
## [13] "University of California, Berkeley"
## [14] "Columbia University"
## [15] "Georgetown University"
## [16] "University of Chicago"
## [17] "Northwestern University"
## [18] "University of Notre Dame"
## [19] "Williams College"
## [20] "University of Michigan"
diversity$name[grepl('University of Michigan', diversity$name)] <- 'University of Michigan'
diversity <- diversity %>% mutate(enrollment_percentage = enrollment/ total_enrollment*100)
topschools_diversity <- diversity %>%
filter(name == "Harvard University" |
name == "Stanford University" |
name == "Yale University" |
name == "Massachusetts Institute of Technology" |
name == "Princeton University" |
name == "University of Pennsylvania" |
name == "Brown University" |
name == "California Institute of Technology" |
name == "Duke University" |
name == "Dartmouth College" |
name == "Cornell University" |
name == "Pomona College" |
name == "University of California at Berkeley" |
name == "Columbia University" |
#name == "Georgetown University" |
name == "University of Chicago" |
name == "Northwestern University" |
name == "University of Notre Dame" |
name == "Rensselaer Polytechnic Institute" |
name == "University of Michigan"
)
unique(topschools_diversity$name)
## [1] "University of Michigan"
## [2] "University of California at Berkeley"
## [3] "Harvard University"
## [4] "Columbia University"
## [5] "University of Pennsylvania"
## [6] "Cornell University"
## [7] "Northwestern University"
## [8] "Stanford University"
## [9] "Duke University"
## [10] "University of Chicago"
## [11] "Yale University"
## [12] "University of Notre Dame"
## [13] "Massachusetts Institute of Technology"
## [14] "Brown University"
## [15] "Princeton University"
## [16] "Rensselaer Polytechnic Institute"
## [17] "Dartmouth College"
## [18] "California Institute of Technology"
## [19] "Pomona College"
topschools_diversity <- topschools_diversity %>% filter(category != "Women" & category != "Two Or More Races" & category != "Non-Resident Foreign" & category != "Unknown" & category != "Total Minority")
plot1 <- topschools_diversity %>%
ggplot() +
geom_bar(aes(x=name, y= enrollment_percentage ,
fill = category),
position = "fill",
stat = "identity" ) +
coord_flip() +
theme_minimal() +
ggtitle("Diversity Percent of Enrollment for Top 19 Collegs in US", ) +
theme (plot.title = element_text(hjust = .01, size=13)) +
labs(fill = "Race") +
theme(legend.justification = -5,
legend.position="bottom",
legend.text = element_text(size=7) ,
) +
xlab("School Name") +
ylab ("Percent of Enrollment")
plot1
The graph shows that white students are overwhelmingly present in the top 19 colleges in the US. The 2nd largest present population are Asian students. Hispanic and Black students follow, but are very underwhelmed in numbers. The population of American Indian and Native Hawaiian students are barely visible and present on the graph. This graph fuels the preconception that top US schools (including IVY leagues) are reserved for students with access to resources and privilege (white and asian).
tuition <- read_csv("tuition_cost.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## state = col_character(),
## state_code = col_character(),
## type = col_character(),
## degree_length = col_character(),
## room_and_board = col_double(),
## in_state_tuition = col_double(),
## in_state_total = col_double(),
## out_of_state_tuition = col_double(),
## out_of_state_total = col_double()
## )
salary <- read_csv("salary_potential.csv")
## Parsed with column specification:
## cols(
## rank = col_double(),
## name = col_character(),
## state_name = col_character(),
## early_career_pay = col_double(),
## mid_career_pay = col_double(),
## make_world_better_percent = col_double(),
## stem_percent = col_double()
## )
salary_added <- merge(x = diversity, y = salary, by = "name", all = TRUE)
all_added$in_state_tuition[all_added$name == "Harvard University"] <- 46340
all_added$in_state_total[all_added$name == "Harvard University"] <- 67580
all_added$out_of_state_tuition[all_added$name == "Harvard University"] <- 46340
all_added$out_of_state_total[all_added$name == "Harvard University"] <- 67580
all_added$out_of_state_tuition[all_added$name == "Harvard University"] <- 46340
all_added$room_and_board[all_added$name == "Harvard University"] <- 17160
all_added$state.y [all_added$name == "Harvard University"] <- "Massachusetts"
all_added$state_code [all_added$name == "Harvard University"] <- "MA"
all_added$type[all_added$name == "Harvard University"] <- "Private"
all_added$degree_length[all_added$name == "Harvard University"] <- "4 Year"
monsterfinal <- na.omit(all_added)
topschools_monster <- monsterfinal %>%
filter(name == "Harvard University" |
name == "Stanford University" |
name == "Yale University" |
name == "Massachusetts Institute of Technology" |
name == "Princeton University" |
name == "University of Pennsylvania" |
name == "Brown University" |
name == "California Institute of Technology" |
name == "Duke University" |
name == "Dartmouth College" |
name == "Cornell University" |
name == "Pomona College" |
name == "University of California at Berkeley" |
name == "Columbia University" |
name == "Georgetown University" |
name == "University of Chicago" |
name == "Northwestern University" |
name == "University of Notre Dame" |
name == "Rensselaer Polytechnic Institute" |
name == "University of Michigan"
)
unique(topschools_monster$name)
## [1] "Brown University"
## [2] "California Institute of Technology"
## [3] "Cornell University"
## [4] "Dartmouth College"
## [5] "Duke University"
## [6] "Harvard University"
## [7] "Massachusetts Institute of Technology"
## [8] "Northwestern University"
## [9] "Pomona College"
## [10] "Princeton University"
## [11] "Rensselaer Polytechnic Institute"
## [12] "Stanford University"
## [13] "University of Chicago"
## [14] "University of Notre Dame"
## [15] "University of Pennsylvania"
## [16] "Yale University"
library(ggplot2)
library(plotly)
plot_top <- ggplot(topschools_monster,
aes(x=early_career_pay, y=in_state_total, color=name)) +
geom_point(stat = 'identity') +
geom_smooth(method=lm, colour="grey") +
ggtitle("Early Career Pay and Instate tutition of Top Schools") +
xlab("Early Career Pay Salary") +
ylab ("In-state Tuition Total") +
labs(color = "School Name")
plot_top
## `geom_smooth()` using formula 'y ~ x'
ggplotly(plot_top)
## `geom_smooth()` using formula 'y ~ x'
The graph shows an intersection between the tuition costs and early career salary outcomes of students who attended top US schools. One would assume that the more expensive a school would be, is correlated to the “prestige” and be reflected in a higher salary coming straight of out undergrad. However, this graph suggest the opposite, that the lesser inexpensive “top schools” are yielding relatively higher early career salary than their more expensive counterparts.
monsterfinaltop <- topschools_monster %>% filter(category != "Women" & category != "Two Or More Races" & category != "Non-Resident Foreign" & category != "Unknown" & category != "Total Minority")
mft2 <- monsterfinaltop %>% mutate_if(is.numeric, round, digits = 2)
library(ggplot2)
library(plotly)
plot_top2 <- ggplot(mft2,
aes(x=category, y=enrollment_percentage, color=name, text = paste("Percent:", enrollment_percentage, '</br>', '</br>School Name:', name))) +
geom_point(stat = 'identity') +
geom_smooth(method=lm, colour="grey") +
ggtitle("Racial and Ethnic Demographics at Top Schools") +
xlab("Race/Ethnicity") +
ylab ("Enrollment Percentage") +
scale_x_discrete(labels = function(x) str_wrap(x, width = 10))+
labs(color = "School Name")
ggplotly(plot_top2tooltip = "text")
## `geom_smooth()` using formula 'y ~ x'
This graph makes clear the disproportionate representation of White people at top schools.
As the graphs above illustrate, our data verifies that indeed there are a higher proportion of students of color at for-profit colleges than at non-profit colleges. White Students have much greater attendance at non-profit colleges. These findings provide support for the argument that there are higher numbers of enrollment of students of color at for-profit colleges. One possible explanation could be targeted marketing campaigns by for-profit colleges to students of color. Further research is required.
Given the well-documented high cost of for-profit education, the exorbitant student debt incurred at these schools, the lower rates of employment upon graduation and the low levels of satisfaction with educational quality from previous students, all potential students should think twice before enrolling in for-profit colleges. This is particularly true for students of color who may be identified by for-profit colleges as susceptible to predatory exploitative practices that will provide additional profits to the college’s shareholders without making good on their promise to provide a marketable, quality education. The predatory actions of for-profit colleges contribute to our nation’s growing economic divide between the haves and the have nots and perpetuates the unequal education system we have today. Fortunately, there are many people and organizations working to end these predatory practices, but in the meantime while they continue to exist, we must inform one another about the importance of obtaining a quality education at not-for-profit institutions, like Montgomery College.
It is also clear that whites are disproportionately represented at the top 20 schools as well, which provides continued advantages for White people in earning potential throughout their lives, as the salary potential data on these schools portrayed.
In general, our data visualizations suggest the higher the tuition paid, the higher the salary received. However, although positive correlations were found to hold some significance, the R-squared values made it clear that there are many other possible explanatory variables for that outcome, including social capital, particularly at high-cost, high-salary expectation schools.
This may be related to the interesting finding we had that at the top 20 schools, cost of tuition and future salary potential were negatively correlated. Perhaps there is a tipping point. At this point though, we just do not know and future computational analysis and research is required to answer these questions and explain these findings comprehensively.
The following is a list of more detailed analysis that could be done with this dataset:
Comparing diversity of for-profit and not-for-profit schools in the same geographic regions utilizing local racial demographics.
Comparing diversity in community colleges, four-year-institutions, and for-profit institutions.
Comparing the tuition rates, salary potential and overall profits received from community colleges, four-year institutions and for-profit institutions.
A longitudinal study could be conducted to see if there have been changes in the demographic makeup of for-profit colleges in the time before the Obama-era regulations, the lack of regulation during the DeVos era and the patterns that emerge after the predatory lending lawsuits and any resulting legislation.
Body, D. (2019, Mar. 19). Worse Off Than When They Enrolled: The Consequence of For-Profit Colleges for People of Color. The Aspen Institute. https://www.aspeninstitute.org/blog-posts/worse-off-than-when-they-enrolled-the-consequence-of-for-profit-colleges-for-people-of-color/
Bonadies, G.G., Rovenger, J., Connor, E., Shum, B. & Merrill, T. (2018, Jul. 30). For-Profit Schools’ Predatory Practices and Students of Color: A Mission to Enroll Rather than Educate, Harvard Law Review Blog. https://blog.harvardlawreview.org/for-profit-schools-predatory-practices-and-students-of-color-a-mission-to-enroll-rather-than-educate/
Conti, A. (2019, Sep. 10). How For-Profit Colleges Have Targeted and Taken Advantage of Black Students. Vice. https://www.vice.com/en_us/article/bjwj3d/how-for-profit-colleges-have-targeted-and-taken-advantage-of-black-students
Green, E.L. (2019, Jun. 28). DeVos Repeals Obama-Era Rule Cracking down on For-Profit Colleges, New York Times. https://www.nytimes.com/2019/06/28/us/politics/betsy-devos-for-profit-colleges.html
Halperin, D. 22 States Sue DeVos to Overturn Anti-Student Rule. Republic Report. https://www.republicreport.org/2020/22-states-sue-devos-to-overturn-anti-student-rule/
Legal Services Center(2020), Project on Predatory Student Lending: Cases, Harvard Law School. https://predatorystudentlending.org/cases/
Lobosco, K. (2019, Jul. 23). For-profit college students are waiting 958 days for loan relief, CNN. https://www.cnn.com/2019/07/23/politics/betsy-devos-loan-forgiveness-for-profit-college-students/index.html
Lopez, M. (2015, Feb. 12). BEWARE: For-Profit Colleges. The Patriot Post. https://bcchspatriotpost.com/2391/news/beware-for-profit-colleges/
Redman, H. (2020, Jun. 27). AG Sues Department of Education Over For-Profit College Rules. Urban Milwaukee. https://urbanmilwaukee.com/2020/06/27/ag-sues-department-of-education-over-for-profit-college-rules/
TBS Staff (2019, Jul. 29). For-Profit Colleges vs. Non-Profit Colleges - What’s The Difference? The Best Schools Magazine. https://thebestschools.org/magazine/for-profit-vs-non-profit
Turner, C. (2019, Nov. 14). Devos Refuses to Forgive student Debt For Those DeFrauded by For-Profit Colleges, All Things Considered, NPR. https://www.npr.org/2019/11/14/779465130/devos-refuses-to-forgive-student-debt-for-those-defrauded-by-for-profit-colleges
Voorhees, K. (2019, Oct. 17). Civil Rights Groups: For-Profit Colleges Exploit Black and Latino Students. The Leadership Conference Education Fund. https://civilrights.org/edfund/2019/10/17/civil-rights-groups-for-profit-colleges-exploit-black-and-latino-students/