Introduction to the educationdata package

The educationdata package was developed by Kyle Ueyama, Lead Data Engineer at the Urban Institute. The package lets users pull data from the Education Data Explorer and build data frames containing their variables of interest.


The Urban Institute

The Urban Institute was established under President Lyndon B. Johnson as an independent nonprofit organization that could provide socially just and equitable data analysis to inform policy. The institute conducts research in more than 20 areas, listed in Table 1 below. Its research draws on data from many sources, including the Civil Rights Data Collection, Small Area Income and Poverty Estimates, EDFacts, and the Integrated Postsecondary Education Data System.

In addition to conducting research, the Urban Institute builds data tools that help the general public access and understand different data sets. One such tool is the Education Data Explorer, which lets you create your own data frame from its pool of K-12 and higher education records. The educationdata package uses the Education Data Explorer to import data from multiple sources.


Research area                                        Description
Aging and retirement                                 Demographic trends among aging Americans
Children and youth                                   Services that promote children’s health and development
Climate, disasters, and environment                  Effects of climate change on communities
Crime, justice, and safety                           Efficacy of system practices and those most affected
Economic mobility and inequality                     Factors that shape workers’ upward mobility
Education                                            Impact of state and federal policies on K-12 and higher education
Families                                             Economic pressures and demographic changes
Health and health care                               Impact of America’s health care system
Housing finance                                      Analysis of equitable housing finance practices
Housing                                              Housing affordability and options
Immigrants and immigration                           Data on immigrants’ experiences and impact of policies
International development                            Development interventions in fragile countries
Land use                                             Influence of land use regulations on housing affordability
Neighborhoods, cities, and metros                    Local needs and policies on a range of community issues
Nonprofits and philanthropy                          How nonprofits measure impact
Sexual orientation, gender identity, and expression  Roles of identity and orientation in financial well-being, health, and education
Social safety net                                    Measurement of poverty’s several dimensions
State and local finance                              Fiscal challenges facing state and local governments
Race and equity                                      Systemic barriers and economic inequities
Taxes and budgets                                    Impacts of tax policy changes
Wealth and financial well-being                      Barriers to financial stability
Workforce                                            Changing labor markets and employer needs
Table 1. Urban Institute research areas, each with a brief description


How the educationdata package works

The package pulls data from the Urban Institute's Education Data Explorer. Its key function is get_education_data(), whose arguments mirror the selectable options in the Education Data Explorer API, shown in Image 1 below.

Image 1. Education Data Explorer API



The get_education_data() function uses the following arguments:

  • level - required
    • the data level to query: K-12 schools ('schools'), school districts ('school-districts'), or higher education ('college-university')
  • source - required
    • the specific data source you want to access, such as 'ccd' or 'ipeds'
  • topic - required
    • the topic that contains the variable(s) you are interested in pulling in
  • subtopic - optional
    • break the topic out by available demographic groupings, such as race or sex
  • filters - optional
    • restrict the query to specific groups within the data, such as a year or grade
  • add_labels - defaults to FALSE
    • when TRUE, replace integer-coded values with their descriptive labels
  • csv - defaults to FALSE
    • when TRUE, download the data from the portal's full CSV files instead of querying the API
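
As a rough sketch of how these arguments fit together, the call below is modeled on the kind of example shown in the package documentation: K-12 enrollment from the Common Core of Data, broken out by race and sex, limited to grades 9 through 12 in 2008, with integer codes converted to labels. The names query_args, run_query, and enroll_df are illustrative and not part of the package. The request needs an internet connection and may take a while, so it is switched off by default.

```r
# Arguments collected in a named list so they can be inspected
# before any request is sent. `query_args` is an illustrative name.
query_args <- list(
  level = "schools",
  source = "ccd",
  topic = "enrollment",
  subtopic = list("race", "sex"),
  filters = list(year = 2008, grade = 9:12),
  add_labels = TRUE
)

# Set to TRUE to actually send the request (requires network access)
run_query <- FALSE

if (run_query && requireNamespace("educationdata", quietly = TRUE)) {
  # do.call() passes the list elements as named arguments
  enroll_df <- do.call(educationdata::get_education_data, query_args)
  print(head(enroll_df))
}
```

Keeping the arguments in a list is only a convenience for this sketch; passing them directly to get_education_data(), as in the examples later in this document, is equivalent.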


The educationdata package also provides the get_education_data_summary() function, which returns summary statistics for your variables of interest. It takes the level, source, topic, subtopic, and filters arguments described above, plus a few more:

  • stat - required
    • the summary statistic to compute (average, sum, median, etc.)
  • var - required
    • the variable you want to summarize
  • by - required
    • the variable(s) by which to group the results
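
To make the roles of stat, var, and by concrete, here is a minimal offline sketch using base R on a made-up data frame. It only illustrates the idea of "sum of enrollment by year"; it is not how the package or the API computes summaries, and every value in toy_enrollment is hypothetical.

```r
# Toy enrollment records; all values are made up for illustration
toy_enrollment <- data.frame(
  year = c(2018, 2018, 2019, 2019, 2019),
  enrollment = c(120, 80, 150, 90, 60)
)

# Comparable in spirit to stat = 'sum', var = 'enrollment', by = 'year':
# sum the enrollment column within each year
toy_summary <- aggregate(enrollment ~ year, data = toy_enrollment, FUN = sum)
print(toy_summary)
```

The result has one row per year, which is the general shape to expect when grouping a summary by a single variable.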


Using the educationdata package


# Install the educationdata package with the install.packages() function

install.packages("educationdata")

# Load the library

library(educationdata)


Example question: What is the student-to-faculty ratio at colleges and universities?

To answer this question, you could use the following function and arguments to collect needed data.

stufac_df <- get_education_data(level = 'college-university',
                                source = 'ipeds',
                                topic = 'student-faculty-ratio')

head(stufac_df)


Example question: What are the demographics of Fall 2020 undergraduate students?

To answer this question, you could use the following function and arguments to collect needed data.

# This query can take a long time to return,

# so evaluation is turned off for this example

# filters is passed as a list so that each value keeps its own type
fallugs_df <- get_education_data(level = 'college-university',
                                 source = 'ipeds',
                                 topic = 'fall-enrollment',
                                 subtopic = list('race', 'sex'),
                                 filters = list(level_of_study = 'undergraduate', year = 2020))

head(fallugs_df)


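Example question: How many students were enrolled in high school in 2019?

To approach this question, you could use the summary function as shown below. As written, the call filters only by year and does not restrict the grade range, so narrowing the total to high school students would require an additional grade filter.

hsfem19_df <- get_education_data_summary(level = 'schools',
                                         source = 'ccd',
                                         topic = 'enrollment',
                                         filters = list(year = 2019),
                                         stat = 'sum',
                                         var = 'enrollment',
                                         by = 'year')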

head(hsfem19_df)


Other possible applications of the educationdata package

In the first example question, the resulting data frame includes a variable named 'fips'. FIPS (Federal Information Processing Standards) codes are numeric codes assigned to geographic areas such as states and counties. Because U.S. Census and American Community Survey data also use FIPS codes, the data frames produced by this package can be merged with those sources to analyze education data in a regional context.
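
A merge on FIPS codes might look like the sketch below. Both data frames are small made-up stand-ins, and the column names mean_ratio, state_fips, and median_income are hypothetical. The sketch also assumes the education data stores the code as a number while the Census extract stores a zero-padded string, a mismatch worth checking for in real data.

```r
# Toy stand-in for a data frame returned by the package; values are made up
edu_df <- data.frame(
  fips = c(6, 36, 48),               # state FIPS: California, New York, Texas
  mean_ratio = c(18.2, 14.5, 17.1)   # hypothetical student-to-faculty ratios
)

# Toy stand-in for a Census/ACS extract keyed by zero-padded FIPS strings
census_df <- data.frame(
  state_fips = c("06", "36", "48"),
  median_income = c(80000, 72000, 64000),  # hypothetical values
  stringsAsFactors = FALSE
)

# Convert numeric codes to two-character strings so the keys match
edu_df$state_fips <- sprintf("%02d", as.integer(edu_df$fips))

# Inner join on the shared key
merged_df <- merge(edu_df, census_df, by = "state_fips")
print(merged_df)
```

Padding the code to two characters is what lets California's code 6 match the string "06"; without that step the merge would silently drop such rows.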


Resources

Ueyama, K. educationdata: Retrieve Records from the Urban Institute’s Education Data Portal API. CRAN, accessed April 30, 2022, https://CRAN.R-project.org/package=educationdata.

Education Data Explorer, Education Data Portal (Version 0.15.0), Urban Institute, accessed April 30, 2022, https://educationdata.urban.org/documentation/, made available under the ODC Attribution License.