Question 1: Read the data into R and turn it into a tidy dataset. Answer: First, I loaded the packages I needed with library(tidyverse) and library(readxl). Then I read in the data and assigned it to the name tuition so I could refer to it in all later code. The read_excel() function tells R that the file is an Excel spreadsheet and reads it in as a new dataset. The argument I passed to it is the path to the file, which for me is Dropbox -> DATA101 -> Data -> Raw -> us-avg-tuition.xlsx. This path is slightly different from the one on Canvas because of this week's technology issues: I had to rename the file with dashes instead of underscores so my system didn't flag it as a duplicate. Running these lines added the dataset to my environment as tuition. Next, I needed to tidy the data. In its original form, a single variable (year) is spread across many columns, one column per year, which makes later coding harder than it needs to be. To fix this, I started a pipe with tuition and used the pivot_longer() function. Instead of listing every year column individually, I used a colon (`2004`:`2015`) to select all of the columns from 2004 through 2015. Then I told R the names of the two new columns: names_to = "year" for the old column names and values_to = "tuition" for the values. The result, tuition_long, is the tidy dataset used in the rest of this assignment.
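A minimal sketch of the reshaping step described above. The read_excel() call is left as a comment because the path is specific to my machine (and assumes Dropbox sits in the working directory). To keep the example self-contained, a toy tibble with made-up numbers stands in for the spreadsheet, and the year columns are assumed to be named `2004`, `2005`, and so on, as the answer implies; only three years are included here.

```r
library(dplyr)
library(tidyr)
# library(readxl)
# tuition <- read_excel("Dropbox/DATA101/Data/Raw/us-avg-tuition.xlsx")  # path assumed

# Toy stand-in for the wide spreadsheet: one row per state, one column per year.
# These numbers are made up for illustration only.
tuition <- tibble(
  State  = c("State A", "State B"),
  `2004` = c(5000, 7000),
  `2005` = c(5200, 7100),
  `2006` = c(5500, 7300)
)

# Stack the year columns into two columns: `year` (the old column name)
# and `tuition` (the cell value). `2004`:`2006` selects every column from
# 2004 through 2006, just as `2004`:`2015` would on the real file.
tuition_long <- tuition %>%
  pivot_longer(`2004`:`2006`, names_to = "year", values_to = "tuition")

tuition_long
```

Each state now has one row per year. Note that `year` comes out as text, which is why it is converted with as.numeric() before drawing line graphs later on.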

Question 2: Calculate the average tuition in each state across all years in the data, and sort it from highest to lowest. Do you notice any patterns? Answer: First, I gave my summary table a name, avg_st_tuition, so that throughout the assignment I could remember what it contains and how I created it. To organize the data, I grouped it by State, which makes State the first column of the resulting table. Next, I calculated the mean for each state inside summarise() with avg_state_tuition = mean(tuition). Then I used arrange() to sort the states from the highest average tuition to the lowest. Finally, I used print() to display the table. Analysis: One pattern I noticed is that states closer to the East Coast tend to have higher average tuition. Perhaps this is because all eight Ivy League schools are in the Northeast. We can look at a few more observations to test this hypothesis: Pennsylvania, home of the University of Pennsylvania, has the 4th-highest average; New Jersey, home of Princeton University, is 3rd; and Massachusetts, home of Harvard University, is 9th. Of course, this is only one of many factors that determine tuition costs and, consequently, each state's average, so we cannot attribute the top-10 placement of these three Ivy League states to it alone. However, it is a pattern worth noting.

avg_st_tuition <- tuition_long %>%
  group_by(State) %>%                                  # one group per state
  summarise(avg_state_tuition = mean(tuition)) %>%     # mean tuition across all years
  arrange(desc(avg_state_tuition))                     # highest average first
print(avg_st_tuition)
## # A tibble: 50 × 2
##    State          avg_state_tuition
##    <chr>                      <dbl>
##  1 Vermont                   13067.
##  2 New Hampshire             12781.
##  3 New Jersey                12054.
##  4 Pennsylvania              11970.
##  5 Illinois                  11228.
##  6 Michigan                  10477.
##  7 South Carolina            10377.
##  8 Delaware                  10099.
##  9 Massachusetts             10058.
## 10 Ohio                       9942.
## # ℹ 40 more rows
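One way to check the East Coast idea a little more systematically is to attach a region to each state. The sketch below assumes base R's built-in `state.name` and `state.region` vectors (U.S. Census regions, which are not the same thing as "East Coast") and, to stay self-contained, uses only the ten rows printed above. On the full avg_st_tuition table, the same join followed by group_by(region) and summarise(mean(avg_state_tuition)) would compare regions directly.

```r
library(dplyr)

# The ten highest rows printed above (values rounded as printed).
top10 <- tibble(
  State = c("Vermont", "New Hampshire", "New Jersey", "Pennsylvania", "Illinois",
            "Michigan", "South Carolina", "Delaware", "Massachusetts", "Ohio"),
  avg_state_tuition = c(13067, 12781, 12054, 11970, 11228,
                        10477, 10377, 10099, 10058, 9942)
)

# Lookup table of Census regions built from base R's state.name / state.region.
regions <- tibble(State = state.name, region = as.character(state.region))

# Attach a region to each state, then count how many top-10 states fall in each region.
top10_by_region <- top10 %>%
  left_join(regions, by = "State") %>%
  count(region, sort = TRUE)

top10_by_region
```

Half of the top ten are Northeast states, while three (Illinois, Michigan, Ohio) are in the North Central region, so the pattern may be broader than the East Coast alone.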

Question 3: Plot the average tuition in each state using a bar chart, and arrange it from highest to lowest. HINT: you’ll need to read the documentation for geom_bar to figure out what options to use. And to arrange the states in your plot, you’ll need to look up the reorder option in ggplot. Answer: First, I piped avg_st_tuition into ggplot() to start building a bar graph. In the aesthetics, I used reorder(State, -avg_state_tuition) so the bars run in descending order, and mapped avg_state_tuition to the y-axis. I added a red point on top of each bar to make the values easier to read, since there are 50 bars on one graph. Then I used xlab() and ylab() to label the axes and ggtitle() to give the graph a title. To keep the state names from overlapping, I used theme(axis.text.x = element_text(angle = 90, hjust = 1)), which rotates the names to a vertical orientation so they are easy to read. Data analysis: The graph shows that Vermont has the highest average tuition of the 50 states and Wyoming the lowest. I also notice a relationship between average tuition and a state's political affiliation: most Republican-leaning states appear toward the right side of the graph, meaning that many of them have lower average tuition. Contrary to my expectations, there does not appear to be a relationship between a state's coastal location and its average tuition. Of course, these observations could be coincidental, because average tuition reflects many factors beyond the scope of our dataset, but these patterns could also help in predicting future tuition averages.

avg_st_tuition %>%
  # reorder() with a negated value puts the highest average on the left
  ggplot(aes(x = reorder(State, -avg_state_tuition), y = avg_state_tuition)) +
  geom_col() +                           # same as geom_bar(stat = "identity")
  geom_point(color = "red", size = 3) +  # one red point at the top of each state's bar
  xlab("State") +
  ylab("Average State Tuition") +
  ggtitle("Average State Tuition, 2004-2015") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))  # vertical state names
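Why the minus sign in reorder() produces a descending chart: reorder() (from base R's stats package) turns a character vector into a factor whose levels are sorted by a second variable, and ggplot2 draws a discrete x-axis in level order. A small sketch using three of the averages printed in Question 2:

```r
# Three states in an arbitrary order, with their averages from Question 2.
states  <- c("New Jersey", "Vermont", "New Hampshire")
avg_fee <- c(12054, 13067, 12781)

# Levels are sorted by ascending -avg_fee, i.e. by descending average.
ordered_states <- reorder(states, -avg_fee)
levels(ordered_states)
```

Dropping the minus sign would sort the levels from lowest to highest instead, so the bars would rise from left to right.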

Question 4: Does the average tuition cost decrease for any state? Which states have the largest/smallest increases over this period? To adjust for the fact that states have very different tuition costs in 2004, calculate this in terms of percentage change. HINT: Think about how arrange() combined with first() and last() would be helpful to calculating this change. Answer: To answer this question, I created a new table, avg_state_tuition_pct. I grouped tuition_long by State, and inside summarise() I used first(tuition) to get each state's starting (2004) tuition and last(tuition) to get its current (2015) tuition. From those two values I calculated the percentage change as (last - first) / first * 100. My final steps were to arrange the table from the largest change to the smallest and print all 50 rows. Data Analysis: Only one state's average tuition decreased: Ohio, with a change of -1.8% (-1.754094% unrounded). In other words, Ohio's average tuition was slightly lower in 2015 than in 2004. This is useful information for data analysts trying to predict future tuition in this state. Maryland had the smallest increase, 7.4% (7.417266% unrounded), and Hawaii had the largest, 138.5% (138.477470% unrounded). These increases, and Ohio's decrease, could be related to changes in political leadership, student needs, or other factors. These explanations are only speculation, but they could hold some merit.

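avg_state_tuition_pct <- tuition_long %>%
  arrange(State, year) %>%               # first()/last() depend on rows being in year order
  group_by(State) %>%
  summarise(
    starting_tuition = first(tuition),   # earliest year (2004)
    current_tuition = last(tuition),     # latest year (2015)
    avg_state_tuition_pct = (current_tuition - starting_tuition) / starting_tuition * 100) %>%
  arrange(desc(avg_state_tuition_pct))   # largest percentage change first
print(avg_state_tuition_pct, n = 50)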
## # A tibble: 50 × 4
##    State          starting_tuition current_tuition avg_state_tuition_pct
##    <chr>                     <dbl>           <dbl>                 <dbl>
##  1 Hawaii                    4267.          10175.                138.  
##  2 Colorado                  4704.           9748.                107.  
##  3 Arizona                   5138.          10646.                107.  
##  4 Georgia                   4298.           8447.                 96.5 
##  5 Nevada                    3621.           6667.                 84.1 
##  6 Louisiana                 4453.           7871.                 76.8 
##  7 California                5286.           9270.                 75.4 
##  8 Alabama                   5683.           9751.                 71.6 
##  9 Tennessee                 5426.           9263.                 70.7 
## 10 Kentucky                  5640.           9567.                 69.6 
## 11 Virginia                  7030.          11819.                 68.1 
## 12 Oklahoma                  4454.           7450.                 67.2 
## 13 Washington                6192.          10288.                 66.2 
## 14 Florida                   3848.           6360.                 65.3 
## 15 Illinois                  8183.          13189.                 61.2 
## 16 Kansas                    5345.           8530.                 59.6 
## 17 West Virginia             4575.           7171.                 56.7 
## 18 North Carolina            4493.           6973.                 55.2 
## 19 Utah                      4125.           6363.                 54.2 
## 20 Rhode Island              7476.          11390.                 52.4 
## 21 Alaska                    4328.           6571.                 51.8 
## 22 Michigan                  7931.          11991.                 51.2 
## 23 Idaho                     4525.           6818.                 50.7 
## 24 New Hampshire            10188.          15160.                 48.8 
## 25 South Dakota              5479.           8055.                 47.0 
## 26 Connecticut               7984.          11397.                 42.8 
## 27 Texas                     6395.           9117.                 42.6 
## 28 Oregon                    6579.           9371.                 42.4 
## 29 Mississippi               5029.           7147.                 42.1 
## 30 South Carolina            8330.          11816.                 41.8 
## 31 Delaware                  8353.          11676.                 39.8 
## 32 Arkansas                  5772.           7867.                 36.3 
## 33 Maine                     7058.           9573.                 35.6 
## 34 Vermont                  11067.          14993.                 35.5 
## 35 Wisconsin                 6575.           8815.                 34.1 
## 36 Minnesota                 8144.          10831.                 33.0 
## 37 North Dakota              5804.           7688.                 32.5 
## 38 New Jersey               10054.          13303.                 32.3 
## 39 Massachusetts             8863.          11588.                 30.7 
## 40 New Mexico                4926.           6355.                 29.0 
## 41 Pennsylvania             10394.          13395.                 28.9 
## 42 Nebraska                  5947.           7608.                 27.9 
## 43 Indiana                   7368.           9120.                 23.8 
## 44 New York                  6235.           7644.                 22.6 
## 45 Wyoming                   4086.           4891.                 19.7 
## 46 Iowa                      6813.           7877.                 15.6 
## 47 Missouri                  7477.           8564.                 14.5 
## 48 Montana                   5630.           6351.                 12.8 
## 49 Maryland                  8531.           9163.                  7.42
## 50 Ohio                     10378.          10196.                 -1.75
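first() and last() simply return the first and last rows of each group, so the result is only correct if the rows are sorted by year. The rows in tuition_long appear to already be in year order after pivot_longer(), but adding arrange(), as the prompt's hint suggests, makes that explicit. A small sketch using only Ohio's two endpoint values from the table above, stored in the wrong order on purpose:

```r
library(dplyr)

# Ohio's endpoint values from the table above, deliberately out of order.
ohio <- tibble(
  State   = "Ohio",
  year    = c(2015, 2004),
  tuition = c(10196, 10378)
)

pct_change <- ohio %>%
  group_by(State) %>%
  arrange(year, .by_group = TRUE) %>%   # first() = earliest year, last() = latest year
  summarise(
    starting_tuition = first(tuition),
    current_tuition  = last(tuition),
    pct_change       = (current_tuition - starting_tuition) / starting_tuition * 100
  )

pct_change
```

The result is about -1.75%, matching the table. Without the arrange() line, first() would pick the 2015 value and the sign of the change would flip.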
  1. Now consider which states have the largest/smallest increases in absolute value (dollar amount) over this period. Are the states with the highest percent change the same as the states with the highest absolute change? Answer: Once the percentage-change code was written, the absolute change was easy: I reused the same code, replaced the percentage formula with the simple difference last(tuition) - first(tuition), and renamed the table and column from pct to abs. Data Analysis: The state with the largest absolute increase is the same as the state with the largest percentage increase: Hawaii's average tuition rose by $5,908.19. Maryland again has the smallest increase, at $632.73, and Ohio remains the only state whose average tuition decreased, by $182.04. Note: the top three states are the same under both measures (with Colorado and Arizona swapping places), but the rest of the top 10 differs between the two rankings.
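avg_state_tuition_abs <- tuition_long %>%
  arrange(State, year) %>%               # first()/last() depend on rows being in year order
  group_by(State) %>%
  summarise(
    starting_tuition = first(tuition),   # earliest year (2004)
    current_tuition = last(tuition),     # latest year (2015)
    avg_state_tuition_abs = current_tuition - starting_tuition) %>%  # dollar change
  arrange(desc(avg_state_tuition_abs))   # largest dollar change first
print(avg_state_tuition_abs, n = 50)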
## # A tibble: 50 × 4
##    State          starting_tuition current_tuition avg_state_tuition_abs
##    <chr>                     <dbl>           <dbl>                 <dbl>
##  1 Hawaii                    4267.          10175.                 5908.
##  2 Arizona                   5138.          10646.                 5508.
##  3 Colorado                  4704.           9748.                 5044.
##  4 Illinois                  8183.          13189.                 5006.
##  5 New Hampshire            10188.          15160.                 4972.
##  6 Virginia                  7030.          11819.                 4789.
##  7 Georgia                   4298.           8447.                 4149.
##  8 Washington                6192.          10288.                 4096.
##  9 Alabama                   5683.           9751.                 4068.
## 10 Michigan                  7931.          11991.                 4060.
## 11 California                5286.           9270.                 3984.
## 12 Kentucky                  5640.           9567.                 3927.
## 13 Vermont                  11067.          14993.                 3926.
## 14 Rhode Island              7476.          11390.                 3914.
## 15 Tennessee                 5426.           9263.                 3838.
## 16 South Carolina            8330.          11816.                 3486.
## 17 Louisiana                 4453.           7871.                 3418.
## 18 Connecticut               7984.          11397.                 3414.
## 19 Delaware                  8353.          11676.                 3323.
## 20 New Jersey               10054.          13303.                 3249.
## 21 Kansas                    5345.           8530.                 3185.
## 22 Nevada                    3621.           6667.                 3046.
## 23 Pennsylvania             10394.          13395.                 3001.
## 24 Oklahoma                  4454.           7450.                 2995.
## 25 Oregon                    6579.           9371.                 2793.
## 26 Massachusetts             8863.          11588.                 2725.
## 27 Texas                     6395.           9117.                 2722.
## 28 Minnesota                 8144.          10831.                 2688.
## 29 West Virginia             4575.           7171.                 2596.
## 30 South Dakota              5479.           8055.                 2576.
## 31 Maine                     7058.           9573.                 2515.
## 32 Florida                   3848.           6360.                 2512.
## 33 North Carolina            4493.           6973.                 2480.
## 34 Idaho                     4525.           6818.                 2294.
## 35 Alaska                    4328.           6571.                 2243.
## 36 Wisconsin                 6575.           8815.                 2240.
## 37 Utah                      4125.           6363.                 2237.
## 38 Mississippi               5029.           7147.                 2118.
## 39 Arkansas                  5772.           7867.                 2095.
## 40 North Dakota              5804.           7688.                 1884.
## 41 Indiana                   7368.           9120.                 1752.
## 42 Nebraska                  5947.           7608.                 1661.
## 43 New Mexico                4926.           6355.                 1429.
## 44 New York                  6235.           7644.                 1409.
## 45 Missouri                  7477.           8564.                 1087.
## 46 Iowa                      6813.           7877.                 1064.
## 47 Wyoming                   4086.           4891.                  805.
## 48 Montana                   5630.           6351.                  721.
## 49 Maryland                  8531.           9163.                  633.
## 50 Ohio                     10378.          10196.                 -182.
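To answer "are they the same states?" more precisely, the two top-10 lists can be compared as sets. In this sketch the state names are copied from the two tables above; with the full tables, slice_max(avg_state_tuition_pct, n = 10) %>% pull(State) (and the same for avg_state_tuition_abs) would produce these vectors.

```r
# Top-10 states by percent change and by dollar change, copied from the tables above.
top10_pct <- c("Hawaii", "Colorado", "Arizona", "Georgia", "Nevada",
               "Louisiana", "California", "Alabama", "Tennessee", "Kentucky")
top10_abs <- c("Hawaii", "Arizona", "Colorado", "Illinois", "New Hampshire",
               "Virginia", "Georgia", "Washington", "Alabama", "Michigan")

in_both  <- intersect(top10_pct, top10_abs)  # in both top-10 lists
pct_only <- setdiff(top10_pct, top10_abs)    # high percent change only
abs_only <- setdiff(top10_abs, top10_pct)    # high dollar change only

in_both
pct_only
abs_only
```

Five states appear in both lists. The five that rank highly only by percentage (Nevada, Louisiana, California, Tennessee, Kentucky) all started 2004 with lower tuition than the five that rank highly only by dollars, so a similar dollar increase translates into a larger percentage for them.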

Question 5: Let’s get a closer look at a few states and their tuition trends. Explore the data and choose five states that have an interesting pattern. Justify why you chose these states. Answer: I chose the five states with the largest dollar increases in tuition (the absolute change from Question 4): Hawaii, Arizona, Colorado, Illinois, and New Hampshire. I thought these would be interesting to look at because I wanted to see how similar their tuition increases were to one another. If I were a data analyst, I would also look at the five states with the smallest increases to see whether there is a relationship between the top and bottom five. For now, I focused on the top five so I could compare them with each other.

  1. Next, plot each state’s tuition change over time using a line graph. What can you conclude from the graph? Answer: From the graph, I can conclude that while New Hampshire is the most expensive of the five states, its line is not as steep as Hawaii's. Hawaii was the least expensive of the five in 2004, but it has seen a much sharper climb in tuition costs since then.
top_five_avg_state_tuition <- tuition_long %>%
  filter(State %in% c("Hawaii", "Arizona", "Colorado", "Illinois", "New Hampshire")) %>%
  mutate(year = as.numeric(year))   # numeric years give a continuous x-axis for the lines

ggplot(top_five_avg_state_tuition, aes(x = year, y = tuition, color = State)) +
  geom_line()
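The five states in the filter() call above were typed in by hand from the absolute-change table. A sketch of selecting them programmatically with dplyr's slice_max(), plus slice_min() for the bottom five mentioned in the answer, using only the rows needed from that table. On the real data you would pipe avg_state_tuition_abs in directly and then use filter(State %in% top_five).

```r
library(dplyr)

# A subset of the absolute-change table above (values rounded as printed),
# covering the five largest and the five smallest dollar changes.
abs_change <- tibble(
  State = c("Hawaii", "Arizona", "Colorado", "Illinois", "New Hampshire", "Virginia",
            "Iowa", "Wyoming", "Montana", "Maryland", "Ohio"),
  avg_state_tuition_abs = c(5908, 5508, 5044, 5006, 4972, 4789,
                            1064, 805, 721, 633, -182)
)

# Largest and smallest dollar changes, without typing the state names.
top_five    <- abs_change %>% slice_max(avg_state_tuition_abs, n = 5) %>% pull(State)
bottom_five <- abs_change %>% slice_min(avg_state_tuition_abs, n = 5) %>% pull(State)

top_five
bottom_five
```

Selecting the states this way keeps the plot in sync with the table if the data or the ranking measure ever changes.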

BONUS QUESTION: Show the tuition over time for all 50 states all in one image, with each state in its own little graph. Make the line for the state that you are from (or pick a random state if you’re not from the US) a different color than all the other states. Answer: I am from New Jersey, so New Jersey's line is the red one in the graph below.

tuition_long %>%
  mutate(home_state = State == "New Jersey") %>%   # TRUE only for New Jersey
  # group = State joins each state's yearly points into a single line
  ggplot(aes(x = year, y = tuition, group = State, color = home_state)) +
  geom_line() +
  scale_color_manual(values = c("TRUE" = "red", "FALSE" = "grey50"), guide = "none") +
  labs(title = "National Tuition, 2004-2015", x = "Year", y = "Tuition") +
  facet_wrap(~State, nrow = 5) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 0.5),
        strip.text.x = element_text(size = 6))
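The `geom_line()` message "Each group consists of only one observation" appears when year is stored as text and no group aesthetic is set: ggplot2 then groups by every discrete variable in the plot (here year and State together), so each group holds a single point and there is nothing to connect. A small sketch using two states' endpoint values from the Question 4 tables, with ggplot2's layer_data() used to count the groups:

```r
library(ggplot2)

# Endpoint values for two states from the Question 4 tables,
# with year stored as text (as it is straight out of pivot_longer()).
toy <- data.frame(
  State   = rep(c("New Jersey", "Ohio"), each = 2),
  year    = rep(c("2004", "2015"), times = 2),
  tuition = c(10054, 13303, 10378, 10196)
)

# Default grouping: interaction of year and State, i.e. one point per group.
p_default <- ggplot(toy, aes(x = year, y = tuition, color = State)) + geom_line()

# Explicit grouping: one group (and one line) per state.
p_grouped <- ggplot(toy, aes(x = year, y = tuition, color = State, group = State)) +
  geom_line()

length(unique(layer_data(p_default)$group))
length(unique(layer_data(p_grouped)$group))
```

Converting year to a number, as in Question 5, has the same effect, because a continuous x variable no longer takes part in the default grouping.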