DATA 607 - Project 2

Class Rank and Housing Choice

For this project we created a .csv file from the image Donghwan Kim posted in Week 5 and saved it to GitHub at the URL shown below. We then imported the .csv file into R as a data frame called RankHousing.

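library(tidyverse) # dplyr, tidyr and ggplot2 (glimpse, fill, pivot_longer, ggplot)
library(naniar)    # replace_with_na_all

fileURL <- "https://raw.githubusercontent.com/douglasbarley/DATA607/master/ClassRankAndHousing.csv"
RankHousing <- read.csv(fileURL, header = TRUE)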

names(RankHousing) <- c("State_of_residence","Remove_column","Class_rank","Off_campus","On_campus","Total")

glimpse(RankHousing)

## Rows: 10
## Columns: 6
## $ State_of_residence <chr> "State of residence", "In state", "", "", "Out o...
## $ Remove_column      <chr> "", "Class Rank", "", "Total", "Class Rank", "",...
## $ Class_rank         <chr> "", "Underclassman", "Upperclassman", "", "Under...
## $ Off_campus         <chr> "Off-campus", "58", "108", "166", "13", "39", "5...
## $ On_campus          <chr> "On-campus", "110", "7", "177", "30", "2", "32",...
## $ Total              <chr> "Total", "168", "115", "283", "43", "41", "84", ...

Tidying the data

To tidy the table we remove unnecessary rows and columns, forward-fill the missing state-of-residence values from the rows above them, and pivot the table longer so that housing choice becomes a single column with two possible values (off campus, on campus). All counts are stacked in a single column, Num.
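As a rough, self-contained illustration of the forward-fill and pivot steps described above, the sketch below works on a hand-typed toy table instead of the imported file. The names `toy` and `toy_long` are hypothetical, and `na_if()` is used here as a stand-in for the blank-to-NA step.

```r
library(dplyr)
library(tidyr)

# Hand-typed toy rows mimicking the raw layout:
# the state label is blank on the second row of each group
toy <- tibble(
  State_of_residence = c("In state", "", "Out of state", ""),
  Class_rank = c("Underclassman", "Upperclassman", "Underclassman", "Upperclassman"),
  Off_campus = c(58L, 108L, 13L, 39L),
  On_campus  = c(110L, 7L, 30L, 2L)
)

toy_long <- toy %>%
  mutate(State_of_residence = na_if(State_of_residence, "")) %>% # blank strings become NA
  fill(State_of_residence) %>%                                   # carry the last state label down
  pivot_longer(Off_campus:On_campus,
               names_to = "Housing_pref", values_to = "Num")     # one row per housing choice

toy_long
```

Each of the four input rows becomes two rows, one per housing choice, so the long table has eight rows.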

RankHousing <- RankHousing[-c(1,4,7,8,9,10),] %>% # remove unnecessary rows
  select(!c(Remove_column)) %>%                   # remove unnecessary column
  replace_with_na_all(condition = ~.x == "") %>%  # replace null strings with NA values
  fill(State_of_residence) %>%                    # forward fill values into NA values
  mutate(Off_campus = as.integer(Off_campus),
         On_campus = as.integer(On_campus),
         Total = as.integer(Total)) %>%
  pivot_longer(`Off_campus`:`On_campus`, names_to = "Housing_pref", values_to = "Num", values_drop_na = TRUE)

RankHousing$State_of_residence <- gsub(" ", "_", RankHousing$State_of_residence) # replace every space in the values


RankHousing <- RankHousing %>%
  select(State_of_residence,Class_rank,Housing_pref,Num,Total) %>%
  mutate(Pct = Num / Total) # share of each class-rank group choosing this housing

RankHousing

## # A tibble: 8 x 6
##   State_of_residence Class_rank    Housing_pref   Num Total    Pct
##   <chr>              <chr>         <chr>        <int> <int>  <dbl>
## 1 In_state           Underclassman Off_campus      58   168 0.345 
## 2 In_state           Underclassman On_campus      110   168 0.655 
## 3 In_state           Upperclassman Off_campus     108   115 0.939 
## 4 In_state           Upperclassman On_campus        7   115 0.0609
## 5 Out_of_state       Underclassman Off_campus      13    43 0.302 
## 6 Out_of_state       Underclassman On_campus       30    43 0.698 
## 7 Out_of_state       Upperclassman Off_campus      39    41 0.951 
## 8 Out_of_state       Upperclassman On_campus        2    41 0.0488

Analysis of housing choice by class rank

We want to see how students’ housing choices change by residency status as class rank increases, so we plot underclassman and upperclassman housing preferences side by side.

ggplot(RankHousing) + geom_col(aes(x= State_of_residence, y = Num, fill = Housing_pref)) + facet_wrap(~Class_rank)
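Because the in-state and out-of-state groups differ in size, a proportional version of the same chart may make the comparison easier to read. The sketch below is one possible variant, not part of the original analysis: it retypes the counts from the tidied table into a hypothetical `housing` data frame and uses `position = "fill"` so each bar is scaled to 100%.

```r
library(ggplot2)

# Counts copied by hand from the tidied table above
housing <- data.frame(
  State_of_residence = rep(c("In_state", "Out_of_state"), each = 4),
  Class_rank = rep(rep(c("Underclassman", "Upperclassman"), each = 2), times = 2),
  Housing_pref = rep(c("Off_campus", "On_campus"), times = 4),
  Num = c(58, 110, 108, 7, 13, 30, 39, 2)
)

# position = "fill" stacks the bars and rescales each one to a total of 1
p <- ggplot(housing) +
  geom_col(aes(x = State_of_residence, y = Num, fill = Housing_pref),
           position = "fill") +
  facet_wrap(~Class_rank) +
  labs(y = "Share of students")

p
```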

Conclusion

In terms of raw numbers, more in-state underclassmen than out-of-state underclassmen prefer to live off campus (58 versus 13), although part of that gap reflects the larger in-state group (168 versus 43 students). The data table shows that 34.5% of in-state underclassmen would prefer to live off campus compared to 30.2% of out-of-state underclassmen, so the in-state preference is only modestly higher. At the upperclassman level the preference is roughly equal: 93.9% of in-state upperclassmen and 95.1% of out-of-state upperclassmen would prefer to live off campus.
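The percentages quoted above can be reproduced directly from the counts in the tidied table. The following is a small base R sketch; the vector names `off_campus`, `totals`, `pct_off` and `shift` are hypothetical. It also computes the change from underclassman to upperclassman in percentage points, based on the rounded percentages.

```r
# Off-campus counts and group totals, copied by hand from the tidied table
off_campus <- c(in_under = 58, in_upper = 108, out_under = 13, out_upper = 39)
totals     <- c(in_under = 168, in_upper = 115, out_under = 43, out_upper = 41)

# Percentage of each group preferring off-campus housing, rounded to one decimal
pct_off <- round(100 * off_campus / totals, 1)
pct_off

# Change from underclassman to upperclassman, in percentage points
shift <- c(in_state     = pct_off[["in_upper"]]  - pct_off[["in_under"]],
           out_of_state = pct_off[["out_upper"]] - pct_off[["out_under"]])
shift
```

Under these numbers, the off-campus share rises by about 59.4 points for in-state students and about 64.9 points for out-of-state students.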

Douglas Barley, Rachel Greenlee, and Atina Karim

10/3/2020
