Question 1: Read the data into R and turn it into a tidy dataset.
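No code is recorded for this step. As a rough sketch only, assuming the raw file arrives in wide format with one row per state and one column per academic year (the real file layout is not shown here), the reshaping could look like this. The name tuition_wide and all values are made up for illustration; tuitiontidy and its columns State, Year, and Tuition match the names used in the answers that follow.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide-format table standing in for the raw data:
# one row per state, one column per academic year (values are invented).
tuition_wide <- tibble::tribble(
  ~State,   ~`2004-05`, ~`2005-06`,
  "StateA", 5000,       5200,
  "StateB", 8000,       7900
)

# Gather every year column into Year/Tuition pairs,
# giving one row per state per year.
tuitiontidy <- tuition_wide %>%
  pivot_longer(cols = -State, names_to = "Year", values_to = "Tuition")
```

With the real data, tuition_wide would instead come from whichever reader function matches the file type.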
Question 2: Calculate the average tuition in each state across all years in the data, and sort it from highest to lowest. Do you notice any patterns? Answer: Tuition in the Northeast seems to be the most expensive.
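# Average tuition per state across all years, highest first
AvgTuition <- tuitiontidy %>%
  group_by(State) %>%
  summarise(State_Avg = mean(Tuition)) %>%
  arrange(desc(State_Avg))
# Horizontal bar chart of the state averages, ordered by average tuition
# (geom_col() is equivalent to geom_bar(stat = "identity"))
ggplot(AvgTuition) +
  geom_col(aes(x = reorder(State, State_Avg), y = State_Avg)) +
  coord_flip()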
Question 4: Does the average tuition cost decrease for any state? Which states have the largest/smallest increases over this period? To adjust for the fact that states have very different tuition costs in 2004, calculate this in terms of percentage change. Hint: Think about how arrange() combined with first() and last() would be helpful in calculating this change. Answer: Yes, the average tuition cost decreases for Ohio. Hawaii has the largest tuition cost increase and Maryland has the smallest increase.
# Percentage change from each state's first year to its last year
Perc_Change <- tuitiontidy %>%
  arrange(State, Year) %>%
  group_by(State) %>%
  summarise(Tuition_Perc_Change =
              (last(Tuition) - first(Tuition)) / first(Tuition) * 100) %>%
  arrange(desc(Tuition_Perc_Change))
Question 4a: Now consider which states have the largest/smallest increases in absolute value (dollar amount) over this period. Are the states with the highest percent change the same as the states with the highest absolute change? Answer: They are not the same. Hawaii, Colorado, Arizona, Georgia, and Nevada had the highest percent change, but Hawaii, Arizona, Colorado, Illinois, and New Hampshire had the highest absolute change.
# Dollar change from each state's first year to its last year
Dollar_Change <- tuitiontidy %>%
  arrange(State, Year) %>%
  group_by(State) %>%
  summarise(Tuition_Dollar_Change = last(Tuition) - first(Tuition)) %>%
  arrange(desc(Tuition_Dollar_Change))
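To see why the percent and dollar rankings can disagree, here is a small self-contained sketch with two hypothetical states. All numbers are invented, and the Year labels are assumed to be strings that sort chronologically. A state that starts cheap can double its tuition and still add fewer dollars than an expensive state with a modest percentage rise.

```r
library(dplyr)

# Invented figures for two hypothetical states (not from the real data).
toy <- tibble::tribble(
  ~State,   ~Year,     ~Tuition,
  "StateA", "2004-05",  2000,
  "StateA", "2015-16",  4000,
  "StateB", "2004-05", 10000,
  "StateB", "2015-16", 12500
)

# Compute both measures side by side from the first and last year per state.
toy_change <- toy %>%
  arrange(State, Year) %>%
  group_by(State) %>%
  summarise(
    Dollar_Change = last(Tuition) - first(Tuition),
    Perc_Change   = Dollar_Change / first(Tuition) * 100
  )

# State ranked first under each measure
top_by_percent <- toy_change$State[which.max(toy_change$Perc_Change)]
top_by_dollars <- toy_change$State[which.max(toy_change$Dollar_Change)]
```

Here StateA leads on percentage change while StateB leads on dollar change, which is the same kind of mismatch seen between the two lists in the answer.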
Question 5: Let’s get a closer look at a few states and their tuition trends. Explore the data and choose five states that have an interesting pattern. Justify why you chose these states. Answer: I chose two states with a high percentage change in cost, one near the middle, and two with a low or negative percentage change. I want to get a sense of whether the large percentage increases came from a low starting price, with those states catching up to tuition elsewhere, or not.
# Keep only the five states chosen for a closer look
Intrest_States <- tuitiontidy %>%
  filter(State %in% c("Hawaii", "Colorado", "Idaho", "Maryland", "Ohio"))
Question 5a: Next, plot each state’s tuition change over time using a line graph. What can you conclude from the graph? Answer: The states with the highest percentage increases in tuition, Hawaii and Colorado, started out with fairly low tuition, and even their drastic increases do not put them at the top of the list for average tuition; they now fall just below the overall average. Idaho, my middle-of-the-range state for percentage change, is still fairly low and well below the overall average tuition. Maryland and Ohio, my states with a low and a negative percentage change, still have tuition slightly higher than the overall average.
# One line-chart panel per selected state
ggplot(Intrest_States) +
  geom_line(aes(x = Year, y = Tuition, group = State)) +
  facet_wrap(~ State) +
  theme(axis.text.x = element_text(angle = 90))
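The answer compares each state with the overall average tuition, which the faceted plot does not draw. One possible way to make that comparison visible, sketched here on invented data for two hypothetical states, is a dashed horizontal reference line at the overall mean. The toy table, its values, and the names overall_avg and p are assumptions for illustration only.

```r
library(dplyr)
library(ggplot2)

# Invented figures for two hypothetical states (not from the real data).
toy <- tibble::tribble(
  ~State,   ~Year,     ~Tuition,
  "StateA", "2004-05", 4000,
  "StateA", "2015-16", 8000,
  "StateB", "2004-05", 9000,
  "StateB", "2015-16", 9500
)

# Overall average across every state and year in the toy table
overall_avg <- mean(toy$Tuition)

# Same faceted line chart, plus a dashed line at the overall average
p <- ggplot(toy) +
  geom_line(aes(x = Year, y = Tuition, group = State)) +
  geom_hline(yintercept = overall_avg, linetype = "dashed") +
  facet_wrap(~ State) +
  theme(axis.text.x = element_text(angle = 90))
```

Printing p draws the chart; with the real data, the mean would be taken over tuitiontidy rather than the toy table.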
Bonus Question: Show the tuition trends over time for all 50 states all in one image, with each state in its own little graph. Make the line for the state that you are from (or pick a random state if you’re not from the US) a different color than all the other states.
# One panel per state; the home state (Pennsylvania) gets its own line color
ggplot(tuitiontidy) +
  geom_line(aes(x = Year, y = Tuition, group = State,
                color = ifelse(State == "Pennsylvania", "Home", "Other"))) +
  facet_wrap(~ State) +
  theme(axis.text.x = element_text(angle = 90), legend.position = "none")