## .center[Developing a Shiny Dashboard] ## .center[Using IPEDS, Census, and BLS Data] ### Mark Perkins, Ph.D. ### Jonathan W. Carrier, Ph.D. #### .center[University of Wyoming] --- # Background - This presentation provides an overview of how to create a dashboard with R using IPEDS, Census, and BLS Data - We combine these three data sources to generate an interactive dashboard - Here is a link to the dashboard - <https://ipedsdash.shinyapps.io/demonstration/> - Here is a link to all the code - <https://rpubs.com/IPEDS> ## Here I'll provide an overview of what it does then how it was made --- # Getting IPEDS Data ## Link to Access file - Download the ipeds() package and load the library - Download the RODBC() package and load the library - Link to the Access file from which you want to pull data - Ready to go! ```r library(RODBC) library(ipeds) #Get IPEDS DATA IPEDSDatabase <- odbcDriverConnect("Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/Users/mperk/OneDrive/Dashboard/2022 Work/IPEDS201920.accdb") ``` --- # Getting the IPEDS Data ## Locating What you Want - IPEDS has A LOT of data - I spent hours analyzing the keys to figure out what I want - The best thing to do is open the Access file and read two tables (YY means IPEDS year): - vartableYY - valuesetsYY ## This is when I open Access and Show You! --- # Data Munging - There were a lot of fields to munge on this project - I have linked my code here - <https://rpubs.com/IPEDS> - I will go over a few fundamentals on building the data - Then I well get to the development of the dashboard ## This is where I go to my RPubs page --- # Instituion Information - This required me to write custom query code using the tidyverse to gather those tables - I then custom coded variables to my needs ```r institutioninformation <- sqlFetch(IPEDSDatabase, "HD2019") %>% subset(CONTROL == 1) %>% select(UNITID, COUNTYCD, STABBR, INSTNM, IALIAS, F1SYSNAM, LONGITUD, LATITUDE, LOCALE, C18SZSET)%>% mutate(locale = case_when(LOCALE == 11 ~ "City", LOCALE == 12 ~ "City", LOCALE == 13 ~ "City", LOCALE == 21 ~ "Suburb", LOCALE == 22 ~ "Suburb", LOCALE == 23 ~ "Suburb", LOCALE == 31 ~ "Town", LOCALE == 32 ~ "Town", LOCALE == 33 ~ "Town", LOCALE == 41 ~ "Rural", LOCALE == 42 ~ "Rural", LOCALE == 43 ~ "Rural", LOCALE == -3 ~ "Unknown")) ``` --- # Other Information ## Several other pieces were gathered including - Enrollment Information - Graduation information (100%, 150%, 200%) - Persistence (PT, FT) ## Changed variable names so they made sense ## Hard programmed certain variables ## Joined all the tables to one IPEDS table --- ## Enrollment and Graduation - Just a little chunk of code on enrollment and graduation ```r #Get Institution Enrollment from EF Table and Keep all Variables enrollmentinformationrace <- sqlFetch(IPEDSDatabase, "EF2019A") enrollmentinformationrace <- subset(enrollmentinformationrace, EFALEVEL == 1) enrollmentinformationrace <- enrollmentinformationrace %>% select(UNITID, EFAIANT, EFASIAT, EFBKAAT, EFHISPT, EFNHPIT, EFNRALT, EFUNKNT, EF2MORT, EFWHITT) #Get Institution Enrollment from EF Table and Keep all Variables enrollmentinformationgender <- sqlFetch(IPEDSDatabase, "EF2019") enrollmentinformationgender <- subset(enrollmentinformationgender, EFLEVEL == 10) #Get CC Graduation Data fro GR Table and Reduce to Desired Variables graduationinformation <- sqlFetch(IPEDSDatabase, "GR200_19") graduationinformation <- graduationinformation %>% select(UNITID, L4GR100, L4GR150, L4GR200) #Get Four Graduation Data fro GR Table and Reduce to Desired Variables universitygrad <- sqlFetch(IPEDSDatabase, "GR200_19") universitygrad <- universitygrad %>% select(UNITID, BAGR100, BAGR150, BAGR200) ``` --- # Joining and renaming - Here I show how I joined the data tables and began to set out to rename them so they were understandable - It was very useful to keep a data dictionary as I went ```r #Join to one master file for IPEDS ipedsdashdata <- left_join(institutioninformation, enrollmentinformationgender, by = "UNITID") ipedsdashdata <- left_join(ipedsdashdata, enrollmentinformationrace, by = "UNITID") ipedsdashdata <- left_join(ipedsdashdata, graduationinformation, by = "UNITID") ipedsdashdata <- left_join(ipedsdashdata, universitygrad, by = "UNITID") ipedsdashdata <- left_join(ipedsdashdata, retentioninformation, by = "UNITID") ipedsdashdata <- left_join(ipedsdashdata, costinformation, by = "UNITID") ipedsdashdata<- left_join(ipedsdashdata, transferinformation, by = "UNITID") #Rename variable ipedsdashdata <- ipedsdashdata %>% rename("Institution" = INSTNM) ``` --- # Custom variables - Here I show how I calculate proportions - There is a more efficient way to pipe this ```r # Calculate gender enrollment rates ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Women" = (Tot_Women / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_FT" = (Tot_Full_Time / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Am_Indian" = (Am_Indian / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Asian" = (Asian / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Black" = (Black / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Hispanic" = (Hispanic / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Hawaii_PI" = (Hawaii_PI / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Non_Resident" = (Non_Resident / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Unknown" = (Unknown / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_Two_or_More" = (Two_or_More / Tot_Enrolled)*100) ipedsdashdata <- ipedsdashdata %>% mutate("Percent_White" = (White / Tot_Enrolled)*100) ``` --- ## Alas I wrote my final data set to a .csv file - Later I link .csv files to Shiny, but it's not really necessary to do that - I also formatted the county code to match the Census county code for joining - Hence the spriinttf("%05d") - That converts it to 5 digit county code with two leading zeros ```r ipedsdashdata$COUNTYCODE <- sprintf("%05d", ipedsdashdata$COUNTYCD) write.csv(ipedsdashdata, "ipedsdashdata.csv") ``` --- ## Getting United States Census Bureau Data ### Get your Census API Key and use tidycensus! - Get your API key here: <https://api.census.gov/data/key_signup.html> - Here is more information about that: <https://www.census.gov/data/developers/guidance/api-user-guide.html> - Download the tidycensus package <https://walker-data.com/tidycensus/> - Then run the code with your key - Your key goes in the quotes; I suppressed mine ```r library(tidycensus) census_api_key("") ``` --- ## You can download the variable key - Then get to work building your dataset - See RMarkdown link for full code - Here I grab income data using tidycensus ```r key <- load_variables(2019, "acs5", cache = TRUE) write.csv(key, "censuskey.csv") #Estimated median household income in past 12 months, inflation adjusted 2015 income <- get_acs(geography = "county", variables = c(MedIncome = "B19013_001")) income <- rename.variable(income, "estimate", "Median_Household_Income") income <- select(income, GEOID, Median_Household_Income) ``` --- ## Health insurance population - This shows how we got the total population data - This will be used to calculate the proportion of folks with health insurance ```r #Total in Population HealthPro <- get_acs(geography = "county", variables = c(White = "C27001A_005", Black = "C27001B_005", Native = "C27001C_005", Asian = "C27001D_005", Pacific = "C27001E_005", Other = "C27001F_005", TwoMore = "C27001G_005", WhiteNot = "C27001H_005", Hispanic = "C27001I_005")) HealthPro <- HealthPro %>% select(GEOID, NAME, variable, estimate) HealthPro <- HealthPro %>% spread(variable, estimate) HealthPro <- HealthPro %>% replace_na(list(White = 0, Black = 0, Native = 0, Asian = 0, Pacific = 0, Other = 0, TwoMore = 0, WhiteNot = 0, Hispanic = 0)) HealthPro <- HealthPro %>% mutate("HPro" = White + Black + Native + Asian + Pacific + Other + TwoMore + WhiteNot + Hispanic) ``` --- ## Those with health insurance - This calculates those with health insurance - Similar programming, different census fields ```r #Total with health HealthTot <- get_acs(geography = "county", variables = c(Whitet = "C27001A_007", Blackt = "C27001B_007", Nativet = "C27001C_007", Asiant = "C27001D_007", Pacifict = "C27001E_007", Othert = "C27001F_007", TwoMoret = "C27001G_007", WhiteNott = "C27001H_007", Hispanict = "C27001I_007")) HealthTot <- HealthTot %>% select(GEOID, NAME, variable, estimate) HealthTot <- HealthTot %>% spread(variable, estimate) HealthTot <- HealthTot %>% replace_na(list(Whitet = 0, Blackt = 0, Nativet = 0, Asiant = 0, Pacifict = 0, Othert = 0, TwoMoret = 0, WhiteNott = 0, Hispanict = 0)) HealthTot <- HealthTot %>% mutate("HTot" = Whitet + Blackt + Nativet + Asiant + Pacifict + Othert + TwoMoret + WhiteNott + Hispanict) ``` --- ## Calculate the proportions - Join tables - Do math - This is one of several fields calculated - Check RPubs to see - <https://rpubs.com/IPEDS/Data_Pull> ```r #Join the health variables Health <- left_join(HealthTot, HealthPro, by = "GEOID") Health <- rename.variable(Health, "NAME.x", "County") Health <- select(Health, GEOID, County, HPro, HTot) Health <- Health%>% mutate("NoHealth" = HTot/HPro) Health <- Health %>% mutate("WithHealth" = 1-NoHealth) ``` --- # Join them all! - A series of joins gets you there - You could do one line of code to join, but I like to do it iteratively - Ultimately, you end up with a massive dataset - IPEDS and Census United! ```r census <- left_join(employ, Health, by = "GEOID") census <- rename.variable(census, "GEOID.x", "GEOID") census <- left_join(census, income, by = "GEOID") census <- left_join(census, TotalWhite, by = "GEOID") census <- left_join(census, Veteran, by = "GEOID") census <- left_join(census, HousePercent, by = "GEOID") census <- left_join(census, Married, by = "GEOID") census <- left_join(census, Education, by = "GEOID") census <- left_join(census, Tribes, by = "GEOID") census <- left_join(census, Parenting, by = "GEOID") census <- left_join(census, Citizen, by = "GEOID") census <- left_join(census, Renters, by = "GEOID") census <- rename.variable(census, "County.x", "County") write.csv(census, "census.csv") ###Join with IPEDS Database ipedscensusdata <- left_join(ipedsdashdata, census, by = c("COUNTYCODE" = "GEOID")) ``` --- ## Had one more thing to get, BLS data - Got job projections from this link <https://www.bls.gov/emp/> - Got CIP/SOC crosswalk form this link <https://nces.ed.gov/ipeds/cipcode/post3.aspx?y=56> - Had to write them, munge them, and join them - This is only a fraction of the code; see RPubs site <https://rpubs.com/IPEDS/Data_Pull> ```r CIPSOC<- read.csv("CIP_SOC.csv") CIPSOC<- CIPSOC %>% rename("CIPCODE" = CIP2020Code) CIPSOC<- CIPSOC %>% rename("Degree_Title" = CIP2020Title) CIPSOC<- CIPSOC %>% rename("Career_Title" = SOC2018Title) CIPSOC<- CIPSOC %>% rename("SOCCODE" = SOC2018Code) Wage<- read.delim("wage.txt") Wage<- Wage%>% rename("SOCCODE" = OCC_CODE) ``` --- ## Grab CIP/SOC Data and Munge Away -I wrote everything as .csv files -This was for the Dashboard ```r degree<- left_join(instname, instdegree, by = "UNITID") degree<- left_join(degree, Wage, by = "CIPCODE") Degrees_and_Jobs<- degree%>% select(Institution_Name, Select_a_State, Degree_Title, Career_Title, CIPCODE, SOCCODE, Total_Programs, Total_Employed, Mean_Hourly, Mean_Annual, Hourly_Median, Median_Annual) write.csv(Degrees_and_Jobs, "Degrees_and_Jobs.csv", fileEncoding = "UTF-8") ## Projections Data Munge CIPSOC<- read.csv("CIP_SOC.csv") CIPSOC<- CIPSOC %>% rename("CIPCODE" = CIP2020Code) %>% rename("Degree_Title" = CIP2020Title)%>% rename("Career_Title" = SOC2018Title) %>% rename("SOCCODE" = SOC2018Code) projections<- read.csv("Projections.csv") %>% rename("SOCCODE" = 'X2019.National.Employment.Matrix.code') %>% rename("Career_Title" = 'X2019.National.Employment.Matrix.title') projections2<- left_join(projections, CIPSOC, by = "SOCCODE") degree2<- left_join(instname, instdegree, by = "UNITID") Career_Projections<- left_join(degree2, projections2, by = "CIPCODE") write.csv(Career_Projections, "Career_Projections.csv", fileEncoding = "UTF-8") ``` --- ## Time to Dine and Dash(board) ## Let's Talk About Shiny Application Dashboard ## I'm going to talk about the basic pattern ## Then I'll take you to the code --- # Basic Pattern ## UI or User Interface - This is what the user sees when they go to your application - This is where you include graphics, input controls, etc - This is where they navigate the dashboard - This is where they see graphics and filter things ## Server - This is the brains behind your application - This is where the mechanisms behind the functions go ## The server and UI unite! --- # Before the UI ## The first thing I do is load all my libraries - This is only a faction of the data loaded - The linked code gives all the data loaded ```r library(plotly) library(ggplot2) library(shiny) library(shinyWidgets) library(shinydashboard) library(leaflet) library(leaflet.extras) library(DT) library(formattable) library(data.table) GradRate<- read.csv("GradRate.csv") ``` --- ## The UI! - Next you build your user interface (UI) - This is a small example - The linked code gives the full UI for the dashboard - Consider the elements: skin, title, sidebar, menu, tabs - Notice tabName = "intro" --- # Here is the UI ```r ui <- dashboardPage(skin = "purple", dashboardHeader(title = "IPEDS Dashboard (2019 Data)"), dashboardSidebar( sidebarMenu( menuItem("Introduction", tabName = "intro", icon = icon("user")))), dashboardBody( tabItems( tabItem(tabName = "intro", h1("IPEDS Dashboard", align = "center"), fluidRow( selectInput("stateInput", "Select One or More States:", choices = sort(unique(GradRate$State)), selected = "WY", multiple = TRUE), uiOutput("instInput")), fluidRow( box(plotlyOutput("plot", height = 500), width = 1000))) ))) ``` --- ## The Server - Next we build various elements into our server - Input controls linked with our UI inputs - Maps - Plots (ggplot2, leaflet) - This uses the input control as the data element in ggplot - It's wrapped with plotly to give tootips and otther features - See next slide --- - A Side Bar Graph with Filters ```r server <- function(input, output, session) { df0 <- eventReactive(input$stateInput, { GradRate %>% filter(State %in% input$stateInput) }) output$instInput <- renderUI({ selectInput("instInput", "Next Select One or More Colleges:", sort(unique(df0()$Institution)), selected = "Casper College", multiple = TRUE) }) df1 <- eventReactive(input$instInput, { df0() %>% filter(Institution %in% input$instInput) }) output$plot <- renderPlotly({ grad <- ggplot(df1(), aes(x = RateLevel, y = Rate, group = Institution, fill = Institution, text = paste("Institution:", Institution, "<br />State:", State, "<br />Rate Level:", RateLevel, "<br />GraduationRate:", Rate %>% paste0("%")))) + geom_bar(stat = "summary", fun = "mean", position = "dodge2") + ggtitle("Graduation and Retention Rates") + xlab("") + ylab("Rate") + scale_y_continuous(labels = function(x) paste0(x, "%")) + scale_fill_brewer(palette = "Dark2") + coord_flip() + theme(axis.text.x = element_text(angle = 45)) ggplotly(grad, tooltip = "text") })} ``` --- ## I'll take you through the dashboard and code - <https://ipedsdash.shinyapps.io/demonstration/> - <https://rpubs.com/IPEDS> ## There is a lot not covered here - Leaflet - Correlations - Getting regression results in application - But it's all in the code - Or Contact me! --- ###Get this presentation online (write this down) - It has links to all the code - <https://rpubs.com/MarkRules/> ## Thank You!