Reading the data

The dataset was obtained from https://www.kaggle.com/jessemostipak/college-tuition-diversity-and-pay?select=tuition_cost.csv

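library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(ggthemes); library(cowplot); library(highcharter); library(gganimate)
college_costs <- read_csv("Tuition and fees by college university for 2018-2019.csv")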
glimpse(college_costs)
## Rows: 2,973
## Columns: 10
## $ name                 <chr> "Aaniiih Nakoda College", "Abilene Christian U...
## $ state                <chr> "Montana", "Texas", "Georgia", "Minnesota", "C...
## $ state_code           <chr> "MT", "TX", "GA", "MN", "CA", "CO", "NY", "NY"...
## $ type                 <chr> "Public", "Private", "Public", "For Profit", "...
## $ degree_length        <chr> "2 Year", "4 Year", "2 Year", "2 Year", "4 Yea...
## $ room_and_board       <dbl> NA, 10350, 8474, NA, 16648, 8782, 16030, 11660...
## $ in_state_tuition     <dbl> 2380, 34850, 4128, 17661, 27810, 9440, 38660, ...
## $ in_state_total       <dbl> 2380, 45200, 12602, 17661, 44458, 18222, 54690...
## $ out_of_state_tuition <dbl> 2380, 34850, 12550, 17661, 27810, 20456, 38660...
## $ out_of_state_total   <dbl> 2380, 45200, 21024, 17661, 44458, 29238, 54690...
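
The glimpse() output above already shows NA entries in room_and_board. One quick way to count missing values per column is sketched below. The small data frame toy is hypothetical and only stands in for college_costs.

```r
# Hypothetical toy data standing in for college_costs
toy <- data.frame(
  name = c("A College", "B University", "C Institute"),
  room_and_board = c(NA, 10350, NA),
  in_state_tuition = c(2380, 34850, 17661)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

The same call on the real data would be colSums(is.na(college_costs)).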

Cleaning the data

Check the character columns for unexpected values.

levels(factor(college_costs$state))
##  [1] "Alabama"        "Alaska"         "Arizona"        "Arkansas"      
##  [5] "California"     "Colorado"       "Connecticut"    "Delaware"      
##  [9] "Florida"        "Georgia"        "Hawaii"         "Idaho"         
## [13] "Illinois"       "Indiana"        "Iowa"           "Kansas"        
## [17] "Kentucky"       "Louisiana"      "Maine"          "Maryland"      
## [21] "Massachusetts"  "Michigan"       "Minnesota"      "Mississippi"   
## [25] "Missouri"       "Montana"        "Nebraska"       "Nevada"        
## [29] "New Hampshire"  "New Jersey"     "New Mexico"     "New York"      
## [33] "North Carolina" "North Dakota"   "Ohio"           "Oklahoma"      
## [37] "Oregon"         "Pennsylvania"   "Rhode Island"   "South Carolina"
## [41] "South Dakota"   "Tennessee"      "Texas"          "Utah"          
## [45] "Vermont"        "Virginia"       "Washington"     "West Virginia" 
## [49] "Wisconsin"      "Wyoming"
levels(factor(college_costs$state_code))
##  [1] "AK" "AL" "AR" "AS" "AZ" "CA" "CO" "CT" "DC" "DE" "FL" "GA" "GU" "HI" "IA"
## [16] "ID" "IL" "IN" "KS" "KY" "LA" "MA" "MD" "ME" "MI" "MN" "MO" "MS" "MT" "NC"
## [31] "ND" "NE" "NH" "NJ" "NM" "NV" "NY" "OH" "OK" "OR" "PA" "PR" "RI" "SC" "SD"
## [46] "TN" "TX" "UT" "VA" "VI" "VT" "WA" "WI" "WV" "WY"
levels(factor(college_costs$type))
## [1] "For Profit" "Other"      "Private"    "Public"
levels(factor(college_costs$degree_length))
## [1] "2 Year" "4 Year" "Other"
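
The listings above show 50 state names but 55 state codes, which suggests that some rows carry a code without a state name. A minimal sketch for finding those codes follows. The data frame toy is hypothetical, and the assumption that the extra codes have a missing state is an inference from the level counts, not something the output states directly.

```r
library(dplyr)

# Hypothetical toy data: some rows have a code but no state name
toy <- tibble(
  state = c("Texas", NA, NA, "Ohio", NA),
  state_code = c("TX", "PR", "DC", "OH", "GU")
)

# Codes that appear on rows where the state name is missing
codes_without_state <- toy %>%
  filter(is.na(state)) %>%
  distinct(state_code) %>%
  pull(state_code)
print(codes_without_state)
```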

The variable state_code has 55 levels. Remove the U.S. territories (AS, GU, PR, VI) to keep only the 50 states and DC, then look at the rows with the value “Other” in the variables type and degree_length.

college_costs <- college_costs %>%
  filter(!state_code %in% c("AS", "GU", "PR", "VI"))
college_costs %>%
  filter(type == "Other" | degree_length == "Other")
## # A tibble: 1 x 10
##   name  state state_code type  degree_length room_and_board in_state_tuition
##   <chr> <chr> <chr>      <chr> <chr>                  <dbl>            <dbl>
## 1 Univ~ Texas TX         Other Other                     NA             8448
## # ... with 3 more variables: in_state_total <dbl>, out_of_state_tuition <dbl>,
## #   out_of_state_total <dbl>

A search shows that the university above is a 4-year public institution, so update its type and degree_length accordingly.

college_costs <- college_costs %>%
  mutate(type = replace(type, type == "Other", "Public"),
         degree_length = replace(degree_length, degree_length == "Other", "4 Year"))
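
The replace() calls above overwrite every “Other” value, which is safe here only because a single row matched. A more targeted variant that keys on the school name is sketched below. The name "Example University" and the data frame toy are hypothetical placeholders, not values from the dataset.

```r
library(dplyr)

# Hypothetical toy data; "Example University" is a placeholder name
toy <- tibble(
  name = c("Example University", "Another College"),
  type = c("Other", "Private"),
  degree_length = c("Other", "4 Year")
)

# Change only the row for the named school and leave all others alone
fixed <- toy %>%
  mutate(
    type = if_else(name == "Example University", "Public", type),
    degree_length = if_else(name == "Example University", "4 Year", degree_length)
  )
print(fixed)
```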

Convert the character variables (except name) into factors and express the cost variables in thousands of dollars.

college_costs <- college_costs %>%
  # across() replaces the superseded mutate_at() and the deprecated funs()
  mutate(across(c(state, state_code, type, degree_length), as.factor),
         across(c(room_and_board, in_state_tuition, in_state_total,
                  out_of_state_tuition, out_of_state_total), ~ .x / 10^3))
str(college_costs)
## tibble [2,929 x 10] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ name                : chr [1:2929] "Aaniiih Nakoda College" "Abilene Christian University" "Abraham Baldwin Agricultural College" "Academy College" ...
##  $ state               : Factor w/ 50 levels "Alabama","Alaska",..: 26 43 10 23 5 6 32 32 22 46 ...
##  $ state_code          : Factor w/ 51 levels "AK","AL","AR",..: 27 44 11 24 5 6 35 35 23 46 ...
##  $ type                : Factor w/ 3 levels "For Profit","Private",..: 3 2 3 1 1 3 2 3 2 1 ...
##  $ degree_length       : Factor w/ 2 levels "2 Year","4 Year": 1 2 1 1 2 2 2 1 2 1 ...
##  $ room_and_board      : num [1:2929] NA 10.35 8.47 NA 16.65 ...
##  $ in_state_tuition    : num [1:2929] 2.38 34.85 4.13 17.66 27.81 ...
##  $ in_state_total      : num [1:2929] 2.38 45.2 12.6 17.66 44.46 ...
##  $ out_of_state_tuition: num [1:2929] 2.38 34.85 12.55 17.66 27.81 ...
##  $ out_of_state_total  : num [1:2929] 2.38 45.2 21.02 17.66 44.46 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   name = col_character(),
##   ..   state = col_character(),
##   ..   state_code = col_character(),
##   ..   type = col_character(),
##   ..   degree_length = col_character(),
##   ..   room_and_board = col_double(),
##   ..   in_state_tuition = col_double(),
##   ..   in_state_total = col_double(),
##   ..   out_of_state_tuition = col_double(),
##   ..   out_of_state_total = col_double()
##   .. )

Summarizing the data and checking for outliers

summary(college_costs)
##      name                    state        state_code           type     
##  Length:2929        California  : 254   CA     : 254   For Profit: 102  
##  Class :character   New York    : 221   NY     : 221   Private   :1258  
##  Mode  :character   Pennsylvania: 160   PA     : 160   Public    :1569  
##                     Texas       : 150   TX     : 150                    
##                     Ohio        : 127   OH     : 127                    
##                     (Other)     :2009   IL     : 125                    
##                     NA's        :   8   (Other):1892                    
##  degree_length room_and_board  in_state_tuition in_state_total  
##  2 Year:1112   Min.   : 0.03   Min.   : 0.480   Min.   : 0.962  
##  4 Year:1817   1st Qu.: 7.95   1st Qu.: 4.915   1st Qu.: 5.880  
##                Median :10.03   Median :10.408   Median :17.960  
##                Mean   :10.12   Mean   :16.658   Mean   :23.115  
##                3rd Qu.:12.43   3rd Qu.:27.400   3rd Qu.:36.511  
##                Max.   :21.30   Max.   :59.985   Max.   :75.003  
##                NA's   :1060                                     
##  out_of_state_tuition out_of_state_total
##  Min.   : 0.48        Min.   : 1.376    
##  1st Qu.: 9.80        1st Qu.:11.460    
##  Median :17.75        Median :23.813    
##  Mean   :20.75        Mean   :27.211    
##  3rd Qu.:29.32        3rd Qu.:39.267    
##  Max.   :59.98        Max.   :75.003    
## 

Check if there are outliers for the variables in_state_total and out_of_state_total.

boxplot(college_costs$in_state_total, plot = FALSE)$out
## numeric(0)
boxplot(college_costs$out_of_state_total, plot = FALSE)$out
## numeric(0)
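
The $out component used above comes from the usual boxplot rule: a value is flagged when it lies more than 1.5 times the hinge spread beyond the lower or upper hinge. The sketch below reproduces that rule by hand on a hypothetical vector x and compares it with boxplot.stats().

```r
# Hypothetical values; the last one is far from the rest
x <- c(1, 2, 3, 4, 5, 100)

# fivenum() returns min, lower hinge, median, upper hinge, max
hinges <- fivenum(x)
spread <- hinges[4] - hinges[2]
lower_fence <- hinges[2] - 1.5 * spread
upper_fence <- hinges[4] + 1.5 * spread

# Values outside the fences, computed by hand and by the built-in helper
manual_out <- x[x < lower_fence | x > upper_fence]
builtin_out <- boxplot.stats(x)$out
print(manual_out)
print(builtin_out)
```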

Now, check if there are outliers for the variables in_state_total and out_of_state_total with respect to the variables type and degree_length.

g1 <- ggplot(college_costs, aes(type, in_state_total)) +
  geom_boxplot() +
  labs(title = "In-state Costs",
       x = "Type", 
       y = "Total In-state Costs (thousand dollars)") +
  theme_solarized() +
  facet_wrap( ~ degree_length, ncol = 1)
g2 <- ggplot(college_costs, aes(type, out_of_state_total)) +
  geom_boxplot() +
  labs(title = "Out-of-state Costs",
       x = "Type", 
       y = "Total Out-of-state Costs (thousand dollars)") +
  theme_solarized() +
  facet_wrap(~ degree_length, ncol = 1)
plot_grid(g1, g2, ncol = 2, align = "h")
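
The faceted boxplots show outliers visually. To get a count per group instead, the same boxplot rule can be applied inside summarise(). This is only a sketch: the data frame toy and the helper count_outliers() are hypothetical.

```r
library(dplyr)

# Hypothetical toy data: one public school is far more expensive than the rest
toy <- tibble(
  type = rep(c("Public", "Private"), each = 6),
  degree_length = "4 Year",
  in_state_total = c(10, 11, 12, 13, 14, 60,
                     40, 41, 42, 43, 44, 45)
)

# Hypothetical helper: number of points the boxplot rule flags
count_outliers <- function(x) length(boxplot.stats(x)$out)

outlier_counts <- toy %>%
  group_by(type, degree_length) %>%
  summarise(n_outliers = count_outliers(in_state_total), .groups = "drop")
print(outlier_counts)
```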

Comparing the averages of the variables in_state_total and out_of_state_total for each state

Regroup the data by state_code and then compute the average costs for each state.

by_state <- college_costs %>%
  group_by(state_code)
mean_state <- summarise(by_state,
          mean_in = round(mean(in_state_total), 3),
          mean_out = round(mean(out_of_state_total), 3))
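
According to the summary() output, in_state_total and out_of_state_total have no missing values, so mean() works as written. A column with NA entries, such as room_and_board, would need na.rm = TRUE. The sketch below illustrates the difference on a hypothetical data frame toy.

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(
  state_code = c("MD", "MD", "PA", "PA"),
  room_and_board = c(10, NA, 8, 12)
)

toy_means <- toy %>%
  group_by(state_code) %>%
  summarise(
    mean_default = mean(room_and_board),               # NA if any value is missing
    mean_na_rm = mean(room_and_board, na.rm = TRUE)    # ignores missing values
  )
print(toy_means)
```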

Plot the average out-of-state costs in ascending order, with each bar shaded by the average in-state costs.

ggplot(mean_state) +
  geom_bar(aes(x = reorder(state_code, mean_out), y = mean_out, fill = mean_in), 
           stat = "identity") +
  labs(title = "Average Out-of-state Costs (thousand dollars)",
       subtitle = "All Colleges",
       x = "State", 
       y = "Average Out-of-state Costs",
       fill = "In-state") +
  scale_fill_gradient(low="lightblue", high="blue") +
  scale_y_continuous(breaks = seq(0, 60, 10)) +
  theme(axis.text.x = element_text(size = 6, angle = 70, hjust = 1))

Visualize the average costs for each state using highcharts.

highchart() %>%
  hc_add_series(data = mean_state$mean_out,
                type = "column",
                name = "Average Out-of-state Costs",
                color = "green") %>%
  hc_add_series(data = mean_state$mean_in,
                type = "line",
                name = "Average In-state Costs",
                color = "blue") %>%
  hc_add_series(data = mean_state$mean_out - mean_state$mean_in,
                type = "line",
                name = "Difference between Out and In",
                color = "black") %>%
  hc_title(text = "<b> Average Costs (Tuition and Fees) for Colleges </b>",
           align = "center") %>%
  hc_subtitle(text = "2018 ~ 2019", align = "center") %>%
  hc_xAxis(title = list(text = "<b> State </b>"),
           categories = mean_state$state_code,
           tickInterval = 1) %>%
  hc_yAxis(title = list(text = "<b> Average Costs (thousand dollars) </b>")) %>%
  hc_legend(align = "right", verticalAlign = "top") %>%
  hc_tooltip(shared = TRUE)

Comparing the averages of the variables in_state_total and out_of_state_total in several ways and then putting them together

First, visualize the average of the variable out_of_state_total for each state and then fill with the average of the variable in_state_total using blue colors. The scale is from lighter blues to darker blues.

p1 <- ggplot(mean_state) +
  geom_bar(aes(x = state_code, y = mean_out, fill = mean_in), 
           stat = "identity", show.legend = FALSE) +
  labs(title = "Average Out-of-state Costs (thousands)",
       subtitle = "All Colleges",
       x = "State", 
       y = "Avg Out-of-state Costs") +
  scale_fill_gradient(low="lightblue", high="blue") +
  scale_y_continuous(breaks = seq(0, 60, 10)) +
  theme(axis.text.x = element_text(size = 6, angle = 90, vjust = 0.5))

Second, visualize the average of the variable out_of_state_total for 4 year colleges in each state and then fill with the average of the variable in_state_total using blue colors. The scale is from lighter blues to darker blues.

by_state_4year <- college_costs %>%
  filter(degree_length == "4 Year") %>%
  group_by(state_code)
mean_state_4year <- summarise(by_state_4year,
          mean_in_4year = round(mean(in_state_total), 3),
          mean_out_4year = round(mean(out_of_state_total), 3))

p2 <- ggplot(mean_state_4year) +
  geom_bar(aes(x = state_code, y = mean_out_4year, fill = mean_in_4year), 
           stat = "identity", show.legend = FALSE) +
  labs(title = "Average Out-of-state Costs (thousands)",
       subtitle = "4 Year Colleges",
       x = "State", 
       y = "Avg Out-of-state Costs") +
  scale_fill_gradient(low="lightblue", high="blue") +
  scale_y_continuous(breaks = seq(0, 60, 10)) +
  theme(axis.text.x = element_text(size = 6, angle = 90, vjust = 0.5))

Third, visualize the average of the variable out_of_state_total for private colleges in each state and then fill with the average of the variable in_state_total using blue colors. The scale is from lighter blues to darker blues.

by_state_priv <- college_costs %>%
  filter(type == "Private") %>%
  group_by(state_code)
mean_state_priv <- summarise(by_state_priv,
          mean_in_priv = round(mean(in_state_total), 3),
          mean_out_priv = round(mean(out_of_state_total), 3))

p3 <- ggplot(mean_state_priv) +
  geom_bar(aes(x = state_code, y = mean_out_priv, fill = mean_in_priv), 
           stat = "identity", show.legend = FALSE) +
  labs(title = "Average Out-of-state Costs (thousands)",
       subtitle = "Private Colleges",
       x = "State", 
       y = "Avg Out-of-state Costs") +
  scale_fill_gradient(low="lightblue", high="blue") +
  scale_y_continuous(breaks = seq(0, 60, 10)) +
  theme(axis.text.x = element_text(size = 6, angle = 90, vjust = 0.5))

Lastly, visualize the average of the variable out_of_state_total for 4 year private colleges in each state and then fill with the average of the variable in_state_total using blue colors. The scale is from lighter blues to darker blues.

by_state_4priv <- college_costs %>%
  filter(degree_length == "4 Year" & type == "Private") %>%
  group_by(state_code)
mean_state_4priv <- summarise(by_state_4priv,
          mean_in_4priv = round(mean(in_state_total), 3),
          mean_out_4priv = round(mean(out_of_state_total), 3))

p4 <- ggplot(mean_state_4priv) +
  geom_bar(aes(x = state_code, y = mean_out_4priv, fill = mean_in_4priv), 
           stat = "identity", show.legend = FALSE) +
  labs(title = "Average Out-of-state Costs (thousands)",
       subtitle = "4 Year Private Colleges",
       x = "State", 
       y = "Avg Out-of-state Costs") +
  scale_fill_gradient(low="lightblue", high="blue") +
  scale_y_continuous(breaks = seq(0, 60, 10)) +
  theme(axis.text.x = element_text(size = 6, angle = 90, vjust = 0.5))

Now, arrange the 4 plots above to compare. Note that the plots are filled with the average of the variable in_state_total using blue colors. The scale is from lighter blues to darker blues for larger mean in-state costs. Legends are removed to have bigger plots.

plot_grid(p1, p2, p3, p4, ncol = 2, labels = "AUTO", align = "h")
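
The four summaries above differ only in the filter applied before grouping. A small helper could remove the repetition, as in this sketch. The function mean_by_state() and the data frame toy are hypothetical and are not part of the original analysis.

```r
library(dplyr)

# Hypothetical helper: average total costs per state for any subset of the data
mean_by_state <- function(data) {
  data %>%
    group_by(state_code) %>%
    summarise(
      mean_in = round(mean(in_state_total), 3),
      mean_out = round(mean(out_of_state_total), 3),
      .groups = "drop"
    )
}

# Hypothetical toy data with the same column names as college_costs
toy <- tibble(
  state_code = c("MD", "MD", "PA", "PA"),
  type = c("Private", "Public", "Private", "Private"),
  degree_length = c("4 Year", "2 Year", "4 Year", "4 Year"),
  in_state_total = c(50, 10, 40, 44),
  out_of_state_total = c(50, 20, 40, 44)
)

# Same helper, different subsets
all_colleges <- mean_by_state(toy)
four_year_private <- toy %>%
  filter(degree_length == "4 Year", type == "Private") %>%
  mean_by_state()
print(all_colleges)
print(four_year_private)
```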

Comparing the averages of the variables in_state_total and out_of_state_total for all colleges with respect to the variables type and degree_length

by_type_degree <- college_costs %>%
  group_by(type, degree_length) 
mean_type_degree <- by_type_degree %>%
  summarise(mean_in_type_degree = round(mean(in_state_total), 3),
            mean_out_type_degree = round(mean(out_of_state_total), 3))

g3 <- ggplot(mean_type_degree, aes(type, mean_in_type_degree)) +
  geom_bar(aes(fill = type), stat = "identity", show.legend = FALSE) +
  labs(title = "Average In-state Costs (thousands)",
       x = "Type", 
       y = "Average In-state Costs") +
  theme_solarized() +
  facet_wrap( ~ degree_length, nrow = 2)
g4 <- ggplot(mean_type_degree, aes(type, mean_out_type_degree)) +
  geom_bar(aes(fill = type), stat = "identity", show.legend = FALSE) +
  labs(title = "Average Out-of-state Costs (thousands)",
       x = "Type", 
       y = "Average Out-of-state Costs") +
  theme_solarized() +
  facet_wrap( ~ degree_length, nrow = 2)
plot_grid(g3, g4, ncol = 2, align = "h", labels = c("E", "F"))

Focusing on the averages of the variable out_of_state_total for 4 year colleges

Group the 4-year colleges by state_code and type, then compute the average costs. Plot the averages by state for each type and animate the plot. Hovering over the rendered plot reveals a play button.

mean_state_4type <- college_costs %>%
  filter(degree_length == "4 Year") %>%
  group_by(state_code, type) %>%
  summarise(mean_in_state_4type = round(mean(in_state_total), 3),
            mean_out_state_4type = round(mean(out_of_state_total), 3))

pp1 <- ggplot(mean_state_4type, aes(x = state_code, y = mean_out_state_4type)) +
  geom_bar(aes(fill = type), stat = "identity", show.legend = FALSE) +
  scale_fill_manual(values = c("salmon", "blue", "red")) +
  labs(title = "Average Out-of-state Costs (thousand dollars) for 4 Year Colleges",
       x = "State", 
       y = "Average Out-of-state Costs") +
  theme_solarized() +
  theme(axis.text.x = element_text(size = 7, angle = 70, hjust = 1)) +
  facet_wrap(~ type, ncol = 1)

pp2 <- pp1 +
  transition_states(state_code, transition_length = 2, state_length = 1) +
  labs(subtitle = "State: {closest_state}") +
  shadow_mark()

animate(pp2 + enter_fade() + exit_fly(y_loc = 1), renderer = av_renderer(), fps=4)

Compute the average costs of 4-year colleges by type.

mean_4type <- college_costs %>%
  filter(degree_length == "4 Year") %>%
  group_by(type) %>%
  summarise(mean_in_4type = round(mean(in_state_total), 3),
            mean_out_4type = round(mean(out_of_state_total), 3))

mean_4type
## # A tibble: 3 x 3
##   type       mean_in_4type mean_out_4type
##   <fct>              <dbl>          <dbl>
## 1 For Profit          21.8           21.8
## 2 Private             40.3           40.3
## 3 Public              18.8           30.5
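
The out-of-state premium for each type follows directly from this table. The sketch below copies the rounded values printed above into a hypothetical data frame mean_4type_tbl and subtracts the two columns.

```r
# Values copied from the mean_4type output above (thousand dollars)
mean_4type_tbl <- data.frame(
  type = c("For Profit", "Private", "Public"),
  mean_in_4type = c(21.8, 40.3, 18.8),
  mean_out_4type = c(21.8, 40.3, 30.5)
)

# Extra cost of attending as an out-of-state student
mean_4type_tbl$premium <- mean_4type_tbl$mean_out_4type - mean_4type_tbl$mean_in_4type
print(mean_4type_tbl)
```

Only public colleges show a premium, about 11.7 thousand dollars with these rounded figures.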

Import an image for average college costs over the years 2000~2020 from the article [1].

knitr::include_graphics("average tuition growth.png")

Focusing on the colleges in Maryland, Pennsylvania, and Virginia

Count the schools in Maryland, Pennsylvania, and Virginia by type and degree length.

MD_PA_VA <- college_costs %>%
  filter(state_code == "MD" | state_code == "PA" | state_code == "VA")
by_type_degree <- MD_PA_VA %>%
  group_by(state_code, type, degree_length)
by_type_degree %>%
  summarise(count = n())
## # A tibble: 14 x 4
## # Groups:   state_code, type [8]
##    state_code type       degree_length count
##    <fct>      <fct>      <fct>         <int>
##  1 MD         Private    4 Year           16
##  2 MD         Public     2 Year           16
##  3 MD         Public     4 Year           13
##  4 PA         For Profit 2 Year            3
##  5 PA         For Profit 4 Year            4
##  6 PA         Private    2 Year           12
##  7 PA         Private    4 Year           84
##  8 PA         Public     2 Year           16
##  9 PA         Public     4 Year           41
## 10 VA         For Profit 2 Year            6
## 11 VA         For Profit 4 Year            5
## 12 VA         Private    4 Year           29
## 13 VA         Public     2 Year           24
## 14 VA         Public     4 Year           15
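
The group_by() plus summarise(count = n()) pattern above can also be written with dplyr's count(). The sketch below shows on a hypothetical data frame toy that both forms give the same table.

```r
library(dplyr)

# Hypothetical toy data
toy <- tibble(
  state_code = c("MD", "MD", "PA", "PA", "PA"),
  type = c("Public", "Public", "Private", "Private", "Public")
)

# Long form, as used in the analysis
long_form <- toy %>%
  group_by(state_code, type) %>%
  summarise(count = n(), .groups = "drop")

# Shorthand with count()
short_form <- toy %>%
  count(state_code, type, name = "count")
print(short_form)
```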

Visually compare the distributions of the costs of attending schools in these states.

gg1 <- ggplot(MD_PA_VA) +
  geom_boxplot(aes(x = in_state_total, 
                   y = reorder(state_code, in_state_total), fill = state_code),
               show.legend = FALSE) +
  labs(title = "In-state Costs (thousands)",
       x = "In-state Costs", 
       y = "State") +
  theme_solarized()
gg2 <- ggplot(MD_PA_VA) +
  geom_boxplot(aes(x = out_of_state_total, 
                   y = reorder(state_code, out_of_state_total), fill = state_code),
               show.legend = FALSE) +
  labs(title = "Out-of-state Costs (thousands)",
       x = "Out-of-state Costs", 
       y = "State") +
  theme_solarized()
plot_grid(gg1, gg2, ncol = 2, labels = c("G", "H"), align = "h")
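
The box widths in these plots correspond to the interquartile range (IQR) of each state. To compare the spread numerically, the median and IQR can be computed per state, as in this sketch. The data frame toy is hypothetical.

```r
library(dplyr)

# Hypothetical toy data: MD costs are more spread out than PA costs
toy <- tibble(
  state_code = rep(c("MD", "PA"), each = 5),
  in_state_total = c(5, 15, 30, 45, 60,
                     20, 22, 25, 28, 30)
)

# Median and interquartile range of costs for each state
spread_by_state <- toy %>%
  group_by(state_code) %>%
  summarise(
    median_cost = median(in_state_total),
    iqr_cost = IQR(in_state_total),
    .groups = "drop"
  )
print(spread_by_state)
```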

Compare the average costs of attending schools in MD, PA, and VA with respect to type.

mean_3type <- MD_PA_VA %>%
  group_by(state_code, type) %>%
  summarise(mean_in_3type = round(mean(in_state_total), 3))
gg3 <- ggplot(mean_3type, aes(state_code, mean_in_3type)) +
  geom_bar(aes(fill = state_code), stat = "identity", show.legend = FALSE) +
  labs(title = "Average In-state Costs (thousands)",
       subtitle = "All Colleges",
       x = "State", 
       y = "Average In-state Costs") +
  theme_solarized() +
  facet_wrap( ~ type)
mean_3type_4year <- MD_PA_VA %>%
  filter(degree_length == "4 Year") %>%
  group_by(state_code, type) %>%
  summarise(mean_in_3type_4year = round(mean(in_state_total), 3))
gg4 <- ggplot(mean_3type_4year, aes(state_code, mean_in_3type_4year)) +
  geom_bar(aes(fill = state_code), stat = "identity", show.legend = FALSE) +
  labs(title = "Average In-state Costs (thousands)",
       subtitle = "4-year Colleges",
       x = "State", 
       y = "Average In-state Costs") +
  theme_solarized() +
  facet_wrap( ~ type)
plot_grid(gg3, gg4, ncol = 2, labels = c("I", "J"), align = "h")


Essay

My son is starting college this fall. While searching for and visiting colleges, I noticed that Maryland has very few 4-year colleges or universities compared to Pennsylvania and Virginia. Tuition and fees played a big part in deciding where to apply. That is why I became interested in college tuition and fees and chose this dataset.

The dataset “Tuition and fees by college university for 2018-2019.csv” was obtained from Kaggle (https://www.kaggle.com/jessemostipak/college-tuition-diversity-and-pay?select=tuition_cost.csv). The site says that the data originally came from the US Department of Education and that the poster filtered it down to a few tables, as seen in the dataset. The dataset has 10 columns and 2,973 rows. The columns are name (college names), state, state_code, type (For Profit, Other, Private, Public), degree_length (2 Year, 4 Year, Other), room_and_board, in_state_tuition, in_state_total (in_state_tuition plus room_and_board), out_of_state_tuition, and out_of_state_total (out_of_state_tuition plus room_and_board). There were 55 state codes, so I removed the U.S. territories to keep only the 50 states and DC. There was one “Other” in each of the variables type and degree_length. Filtering for these values showed that both belonged to the same university, so I replaced them with the appropriate values after looking up the university. I converted the character variables (except name) into factors and expressed the numerical cost variables in thousands of dollars.

The average in-state total cost for all colleges is $23,115 and the average out-of-state total cost is $27,211, a difference of about $4,100. For 4-year public colleges, however, the difference is more than $10,000. Taken over all colleges, the variables in_state_total and out_of_state_total have no outliers. However, once the data are grouped by type and degree length, outliers appear for 2-year colleges of every type and for 4-year public colleges. There is a clear correlation between the state averages of the two variables, and the nine states with the highest averages in both variables are in the Northeast.

According to Plots A, B, C, and D, the state averages of the two variables for 4-year colleges, private colleges, and 4-year private colleges follow a similar trend and roughly track the averages for all colleges. Plots E and F show that the two variables do not differ for for-profit or private colleges. For public colleges, however, the average out-of-state costs are about $5,000 and $12,000 higher than the average in-state costs for 2-year and 4-year colleges, respectively.

The data show that the average in-state and out-of-state costs for 4-year public colleges are $18,815 and $30,513, respectively, and that the average cost for 4-year private colleges is about $40,300. The chart “Average Tuition Growth among National Universities 2000-2020” from the article [1] gives the corresponding costs as $12,000, $27,000, and $42,000. That chart shows 20 years of tuition changes, as reported to U.S. News by the 381 ranked National Universities included in the recently released 2020 Best Colleges rankings. The two analyses differ. I do not know exactly why, but they draw on different data sources.

In any case, the chart from the article [1] indicates that the average cost of tuition and fees at private and public National Universities has risen significantly since the late 1990s for both in-state and out-of-state students. The article [2] says that in the last 10 years costs increased by roughly 25.3% at private colleges and about 29.8% at public colleges. Yet from the 2008 school year to the 2018 school year, 41 states spent less per student after adjusting for inflation. During that period, states spent an average of 13% less per student, about $1,220. The article [3] says that from 1989-90 to 2019-20, average tuition and fees tripled at public four-year institutions and more than doubled at public two-year and private four-year institutions, after adjusting for inflation. From 1988 to 2018, families with incomes in the bottom 20% saw their incomes rise 12%, compared to increases of 51% for families in the top 20%.

Now, let’s focus on the three states MD, PA, and VA. I was looking at 4-year private colleges for my son because the cost of attending a private college after financial aid is only a little higher than the out-of-state cost of a 4-year public college. Look at the numbers of 4-year private colleges in the three states: MD = 16, PA = 84, and VA = 29. MD has very few colleges compared to PA and VA, regardless of type or degree length. The boxplots G and H are arranged in ascending order of the mean. Judging by the IQR, MD has the most variable college costs and PA the least variable. Overall, PA is the most expensive of the three states in which to attend college and VA is the least expensive. Since I was interested in 4-year private colleges, I note that Plots I and J imply that MD is the most expensive state in which to attend a 4-year private college and VA is the least expensive.

References:
[1] https://www.usnews.com/education/best-colleges/paying-for-college/articles/2017-09-20/see-20-years-of-tuition-growth-at-national-universities
[2] https://www.cnbc.com/2019/12/13/cost-of-college-increased-by-more-than-25percent-in-the-last-10-years.html
[3] https://research.collegeboard.org/trends/college-pricing/highlights


The END