IPEDS_Dash.knit

## .center[Developing a Shiny Dashboard]
## .center[Using IPEDS, Census, and BLS Data]

### Mark Perkins, Ph.D.
### Jonathan W. Carrier, Ph.D.
#### .center[University of Wyoming]

---
# Background
- This presentation provides an overview of how to create a dashboard with R using IPEDS, Census, and BLS Data
- We combine these three data sources to generate an interactive dashboard
- Here is a link to the dashboard 
- <https://ipedsdash.shinyapps.io/demonstration/>
- Here is a link to all the code
- <https://rpubs.com/IPEDS>
## Here I'll provide an overview of what it does then how it was made

---
# Getting IPEDS Data
## Link to Access file
- Download the ipeds() package and load the library
- Download the RODBC() package and load the library
- Link to the Access file from which you want to pull data
- Ready to go!

```r
library(RODBC)
library(ipeds)
#Get IPEDS DATA
IPEDSDatabase <- odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users/mperk/OneDrive/Dashboard/2022 Work/IPEDS201920.accdb")
```

---
# Getting the IPEDS Data
## Locating What you Want
- IPEDS has A LOT of data 
- I spent hours analyzing the keys to figure out what I want
- The best thing to do is open the Access file and read two tables (YY means
IPEDS year):
- vartableYY
- valuesetsYY
## This is when I open Access and Show You!

---
# Data Munging
- There were a lot of fields to munge on this project
- I have linked my code here 
- <https://rpubs.com/IPEDS>
- I will go over a few fundamentals on building the data
- Then I well get to the development of the dashboard
## This is where I go to my RPubs page

---
# Instituion Information
- This required me to write custom query code using the tidyverse to gather those tables
- I then custom coded variables to my needs

```r
institutioninformation <-  sqlFetch(IPEDSDatabase, "HD2019") %>%
subset(CONTROL == 1) %>%
select(UNITID, COUNTYCD, STABBR, INSTNM, IALIAS, F1SYSNAM, LONGITUD, LATITUDE, 
       LOCALE, C18SZSET)%>%
mutate(locale = case_when(LOCALE == 11 ~ "City",
             LOCALE == 12 ~ "City",
             LOCALE == 13 ~ "City",
             LOCALE == 21 ~ "Suburb",
             LOCALE == 22 ~ "Suburb",
             LOCALE == 23 ~ "Suburb",
             LOCALE == 31 ~ "Town",
             LOCALE == 32 ~ "Town",
             LOCALE == 33 ~ "Town",
             LOCALE == 41 ~ "Rural",
             LOCALE == 42 ~ "Rural",
             LOCALE == 43 ~ "Rural",
             LOCALE == -3 ~ "Unknown"))
```
---

# Other Information
## Several other pieces were gathered including
- Enrollment Information
- Graduation information (100%, 150%, 200%)
- Persistence (PT, FT)
## Changed variable names so they made sense
## Hard programmed certain variables
## Joined all the tables to one IPEDS table

---
## Enrollment and Graduation
- Just a little chunk of code on enrollment and graduation

```r
#Get Institution Enrollment from EF Table and Keep all Variables
enrollmentinformationrace <-  sqlFetch(IPEDSDatabase, "EF2019A")
enrollmentinformationrace <-  subset(enrollmentinformationrace, EFALEVEL == 1)
enrollmentinformationrace <-  enrollmentinformationrace %>%
 select(UNITID, EFAIANT, EFASIAT, EFBKAAT, EFHISPT,
        EFNHPIT, EFNRALT, EFUNKNT, EF2MORT, EFWHITT)

#Get Institution Enrollment from EF Table and Keep all Variables
enrollmentinformationgender <-  sqlFetch(IPEDSDatabase, "EF2019")
enrollmentinformationgender <-  subset(enrollmentinformationgender, EFLEVEL == 10)

#Get CC Graduation Data fro GR Table and Reduce to Desired Variables

graduationinformation <-  sqlFetch(IPEDSDatabase, "GR200_19")
graduationinformation <-  graduationinformation %>%
 select(UNITID, L4GR100, L4GR150, L4GR200)

#Get Four Graduation Data fro GR Table and Reduce to Desired Variables
universitygrad <-  sqlFetch(IPEDSDatabase, "GR200_19")
universitygrad <-  universitygrad %>%
 select(UNITID, BAGR100, BAGR150, BAGR200)
```

---
# Joining and renaming
- Here I show how I joined the data tables and began to set out to rename them
so they were understandable
- It was very useful to keep a data dictionary as I went

```r
#Join to one master file for IPEDS
ipedsdashdata <-  left_join(institutioninformation, enrollmentinformationgender, by = "UNITID")
ipedsdashdata <-  left_join(ipedsdashdata, enrollmentinformationrace, by = "UNITID")
ipedsdashdata <-  left_join(ipedsdashdata, graduationinformation, by = "UNITID")
ipedsdashdata <-  left_join(ipedsdashdata, universitygrad, by = "UNITID")
ipedsdashdata <-  left_join(ipedsdashdata, retentioninformation, by = "UNITID")
ipedsdashdata <-  left_join(ipedsdashdata, costinformation, by = "UNITID")
ipedsdashdata<- left_join(ipedsdashdata, transferinformation, by = "UNITID")

#Rename variable
ipedsdashdata <-  ipedsdashdata %>%
  rename("Institution" = INSTNM)
```

---
# Custom variables
- Here I show how I calculate proportions
- There is a more efficient way to pipe this

```r
# Calculate gender enrollment rates
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Women" = (Tot_Women / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_FT" = (Tot_Full_Time / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Am_Indian" = (Am_Indian / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Asian" = (Asian / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Black" = (Black / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Hispanic" = (Hispanic / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Hawaii_PI" = (Hawaii_PI / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Non_Resident" = (Non_Resident / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Unknown" = (Unknown / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_Two_or_More" = (Two_or_More / Tot_Enrolled)*100)
ipedsdashdata <-  ipedsdashdata %>%
 mutate("Percent_White" = (White / Tot_Enrolled)*100)
```

---
## Alas I wrote my final data set to a .csv file
- Later I link .csv files to Shiny, but it's not really necessary to do that
- I also formatted the county code to match the Census county code for joining
- Hence the spriinttf("%05d")
- That converts it to 5 digit county code with two leading zeros

```r
ipedsdashdata$COUNTYCODE  <-  sprintf("%05d", ipedsdashdata$COUNTYCD)
write.csv(ipedsdashdata, "ipedsdashdata.csv")
```

---
## Getting United States Census Bureau Data
### Get your Census API Key and use tidycensus!
- Get your API key here: <https://api.census.gov/data/key_signup.html>
- Here is more information about that: <https://www.census.gov/data/developers/guidance/api-user-guide.html>
- Download the tidycensus package <https://walker-data.com/tidycensus/>
- Then run the code with your key
- Your key goes in the quotes; I suppressed mine

```r
library(tidycensus)
census_api_key("")
```

---
## You can download the variable key
- Then get to work building your dataset
- See RMarkdown link for full code
- Here I grab income data using tidycensus

```r
key  <-  load_variables(2019, "acs5", cache = TRUE)

write.csv(key, "censuskey.csv")

#Estimated median household income in past 12 months, inflation adjusted 2015
income  <-  get_acs(geography = "county", variables = c(MedIncome = "B19013_001"))
income <- rename.variable(income, "estimate", "Median_Household_Income")
income <-  select(income, GEOID, Median_Household_Income)
```

---

## Health insurance population
- This shows how we got the total population data
- This will be used to calculate the proportion of folks with health insurance

```r
#Total in Population

HealthPro <- get_acs(geography = "county", variables = c(White = "C27001A_005", Black = "C27001B_005", Native = "C27001C_005", Asian = "C27001D_005", Pacific = "C27001E_005", Other = "C27001F_005", TwoMore = "C27001G_005", WhiteNot = "C27001H_005", Hispanic = "C27001I_005"))
HealthPro <-  HealthPro %>%
 select(GEOID, NAME, variable, estimate)
HealthPro <-  HealthPro %>%
 spread(variable, estimate)
HealthPro <-  HealthPro %>%
 replace_na(list(White = 0, Black = 0, Native = 0, Asian = 0, Pacific = 0, Other = 0, TwoMore = 0, WhiteNot = 0, Hispanic = 0))
HealthPro <-  HealthPro %>%
 mutate("HPro" = White + Black + Native + Asian + Pacific + Other + TwoMore + WhiteNot + Hispanic)
```

---
## Those with health insurance
- This calculates those with health insurance
- Similar programming, different census fields

```r
#Total with health

HealthTot <- get_acs(geography = "county", variables = c(Whitet = "C27001A_007", Blackt = "C27001B_007", Nativet = "C27001C_007", Asiant = "C27001D_007", Pacifict = "C27001E_007", Othert = "C27001F_007", TwoMoret = "C27001G_007", WhiteNott = "C27001H_007", Hispanict = "C27001I_007"))
HealthTot <-  HealthTot %>%
 select(GEOID, NAME, variable, estimate)
HealthTot <-  HealthTot %>%
 spread(variable, estimate)
HealthTot <-  HealthTot %>%
 replace_na(list(Whitet = 0, Blackt = 0, Nativet = 0, Asiant = 0, Pacifict = 0, Othert = 0, TwoMoret = 0, WhiteNott = 0, Hispanict = 0))
HealthTot <-  HealthTot %>%
 mutate("HTot" = Whitet + Blackt + Nativet + Asiant + Pacifict + Othert + TwoMoret + WhiteNott + Hispanict)
```

---
## Calculate the proportions
- Join tables
- Do math
- This is one of several fields calculated
- Check RPubs to see
- <https://rpubs.com/IPEDS/Data_Pull>

```r
#Join the health variables
Health <- left_join(HealthTot, HealthPro, by = "GEOID")
Health <- rename.variable(Health, "NAME.x", "County")
Health <-  select(Health, GEOID, County, HPro, HTot)
Health <-  Health%>%
 mutate("NoHealth" = HTot/HPro)
Health <-  Health %>%
mutate("WithHealth" = 1-NoHealth)
```

---
# Join them all!
- A series of joins gets you there
- You could do one line of code to join, but I like to do it iteratively
- Ultimately, you end up with a massive dataset
- IPEDS and Census United!

```r
census <-  left_join(employ, Health, by = "GEOID")
census <-  rename.variable(census, "GEOID.x", "GEOID")
census <-  left_join(census, income, by = "GEOID")
census <-  left_join(census, TotalWhite, by = "GEOID")
census <-  left_join(census, Veteran, by = "GEOID")
census <-  left_join(census, HousePercent, by = "GEOID")
census <-  left_join(census, Married, by = "GEOID")
census <-  left_join(census, Education, by = "GEOID")
census <-  left_join(census, Tribes, by = "GEOID")
census <-  left_join(census, Parenting, by = "GEOID")
census <-  left_join(census, Citizen, by = "GEOID")
census <-  left_join(census, Renters, by = "GEOID")
census <- rename.variable(census, "County.x", "County")

write.csv(census, "census.csv")

###Join with IPEDS Database
ipedscensusdata <-  left_join(ipedsdashdata, census, by = c("COUNTYCODE" = "GEOID"))
```

---
## Had one more thing to get, BLS data
- Got job projections from this link <https://www.bls.gov/emp/>
- Got CIP/SOC crosswalk form this link <https://nces.ed.gov/ipeds/cipcode/post3.aspx?y=56>
- Had to write them, munge them, and join them
- This is only a fraction of the code; see RPubs site  <https://rpubs.com/IPEDS/Data_Pull>

```r
CIPSOC<- read.csv("CIP_SOC.csv")

CIPSOC<- CIPSOC %>%
  rename("CIPCODE" = CIP2020Code)
CIPSOC<- CIPSOC %>%
  rename("Degree_Title" = CIP2020Title)
CIPSOC<- CIPSOC %>%
  rename("Career_Title" = SOC2018Title)
CIPSOC<- CIPSOC %>%
  rename("SOCCODE" = SOC2018Code)

Wage<- read.delim("wage.txt")

Wage<- Wage%>%
  rename("SOCCODE" = OCC_CODE)
```

---
## Grab CIP/SOC Data and Munge Away
-I wrote everything as .csv files
-This was for the Dashboard

```r
degree<- left_join(instname, instdegree, by = "UNITID")
degree<- left_join(degree, Wage, by = "CIPCODE")
Degrees_and_Jobs<- degree%>%
select(Institution_Name, Select_a_State, Degree_Title, Career_Title, CIPCODE, SOCCODE, Total_Programs, Total_Employed, Mean_Hourly, Mean_Annual, Hourly_Median,  Median_Annual)
write.csv(Degrees_and_Jobs, "Degrees_and_Jobs.csv", fileEncoding = "UTF-8")

## Projections Data Munge
CIPSOC<- read.csv("CIP_SOC.csv")
CIPSOC<- CIPSOC %>%
  rename("CIPCODE" = CIP2020Code) %>%
  rename("Degree_Title" = CIP2020Title)%>%
  rename("Career_Title" = SOC2018Title) %>%
  rename("SOCCODE" = SOC2018Code)
projections<- read.csv("Projections.csv") %>%
            rename("SOCCODE" = 'X2019.National.Employment.Matrix.code') %>%
            rename("Career_Title" = 'X2019.National.Employment.Matrix.title')
projections2<- left_join(projections, CIPSOC, by = "SOCCODE")
degree2<- left_join(instname, instdegree, by = "UNITID")
Career_Projections<- left_join(degree2, projections2, by = "CIPCODE")
write.csv(Career_Projections, "Career_Projections.csv", fileEncoding = "UTF-8")
```

---
## Time to Dine and Dash(board)
## Let's Talk About Shiny Application Dashboard
## I'm going to talk about the basic pattern
## Then I'll take you to the code

---
# Basic Pattern
## UI or User Interface
- This is what the user sees when they go to your application
- This is where you include graphics, input controls, etc
- This is where they navigate the dashboard
- This is where they see graphics and filter things

## Server
- This is the brains behind your application
- This is where the mechanisms behind the functions go

## The server and UI unite!

---
# Before the UI
## The first thing I do is load all my libraries
- This is only a faction of the data loaded
- The linked code gives all the data loaded

```r
library(plotly)
library(ggplot2)
library(shiny)
library(shinyWidgets)
library(shinydashboard)
library(leaflet)
library(leaflet.extras)
library(DT)
library(formattable)
library(data.table)

GradRate<- read.csv("GradRate.csv")
```

---
## The UI!
- Next you build your user interface (UI)
- This is a small example
- The linked code gives the full UI for the dashboard
- Consider the elements: skin, title, sidebar, menu, tabs
- Notice tabName = "intro"

---
# Here is the UI

```r
ui  <-  dashboardPage(skin = "purple", 
                    
                    dashboardHeader(title = "IPEDS Dashboard (2019 Data)"), 
                    
                    dashboardSidebar(
                      sidebarMenu(
                        menuItem("Introduction", tabName = "intro", icon = icon("user")))), 
                    
                    dashboardBody(
                      tabItems(
                        tabItem(tabName = "intro",
                                h1("IPEDS Dashboard", align = "center"), 
                                 fluidRow(
                                  selectInput("stateInput", "Select One or More States:", 
                                              choices = sort(unique(GradRate$State)), 
                                              selected = "WY", multiple = TRUE), 
                                  uiOutput("instInput")),
                                 fluidRow(
                                  box(plotlyOutput("plot", height = 500), width = 1000)))
  )))
```
---
## The Server
- Next we build various elements into our server
- Input controls linked with our UI inputs
- Maps
- Plots (ggplot2, leaflet)
- This uses the input control as the data element in ggplot
- It's wrapped with plotly to give tootips and otther features
- See next slide

---
- A Side Bar Graph with Filters

```r
server  <-  function(input, output, session) {
  
  df0  <-  eventReactive(input$stateInput, {
    GradRate %>% filter(State %in% input$stateInput)
  })
  output$instInput  <-  renderUI({
    selectInput("instInput", "Next Select One or More Colleges:", sort(unique(df0()$Institution)), selected = "Casper College", multiple = TRUE)
  })
  
  df1  <-  eventReactive(input$instInput, {
    df0() %>% filter(Institution %in% input$instInput)
  })

output$plot  <-  renderPlotly({
    
  grad <-  ggplot(df1(), aes(x = RateLevel, y = Rate, group = Institution, fill = Institution, 
                             text = paste("Institution:", Institution, "<br />State:", State, "<br />Rate Level:", 
                                        RateLevel, "<br />GraduationRate:", Rate %>% paste0("%")))) +  
      geom_bar(stat = "summary", fun = "mean", position = "dodge2") + 
      ggtitle("Graduation and Retention Rates") + 
      xlab("") + 
      ylab("Rate") + 
      scale_y_continuous(labels = function(x) paste0(x, "%")) + 
      scale_fill_brewer(palette = "Dark2") + 
      coord_flip() + 
      theme(axis.text.x = element_text(angle = 45))
    ggplotly(grad, tooltip = "text")
   })}
```

---
## I'll take you through the dashboard and code
- <https://ipedsdash.shinyapps.io/demonstration/>
- <https://rpubs.com/IPEDS>
## There is a lot not covered here
- Leaflet
- Correlations
- Getting regression results in application
- But it's all in the code
- Or Contact me!
---
###Get this presentation online (write this down)
- It has links to all the code
- <https://rpubs.com/MarkRules/>
## Thank You!