Project Set-Up

Whenever I start an R Project, I run the following script to create the same folder structure to store all components of the project, along with the project itself. (tip from Andrew Ba Tran’s R-Journalism Course)

Next we load all packages.

Getting California Education Data

When possible, all data projects should work with the most comprehensive, unprocessed, and official databases as possible. While this often takes more time and effort initially, this cost comes with huge benefits:

  • Accuracy: the more data is processed, the more likely you’ll introduce error–especially when you are copying and pasting data, or even worse, eye-balling and typing from already-processed sources (e.g. scanned documents.

  • Scaling: With the full dataset, you can scale any future project to your needs. You can quickly look into depth at any accounting question you have or do a full comparison across the state and across 17 years of data. You can build models to compare the relationship of variables to answer future questions you may have. The possibilities are enormous.

  • Automation (i.e. saving time): Because the database all follows a consistent and tidy format, you can use programming scripts to automate much of your analysis workflow.

How to do it:

  1. Download SACS files (Microsoft Access .mdb format). https://www.cde.ca.gov/ds/fd/fd/

  2. Create a function to loads SACS file, grab the financial database and the school name database within, and merges into R dataframe. This process is done for every file for every year, in return creating a list of dataframes for ever year of financial data: df_list.

Wrangle Data

Now we create a function that filters the dataframe list (df_list) to only include financial information for non-charter schools and for General Funds, and apply this function to all years in df_list. The reason we have to filter out charters is some school districts include charter schools that report using the same SACS accounting system, and not the “Alternative SACS” system, and are thus included in the count. If you don’t filter them out, and summarize by school districts, your results won’t match what you’ll find elsewhere. And we only want the General Fund because this is the only fund typically bargained over and looked to to measure the financial health of the district.

And the end of this process is a new object: df_list_filtered

df_filter_and_select <- function(df){
  df_filtered <- df |>
    filter(
      Fund == 1, # General Fund Only
      SchoolCode == 0) |> # No charter schools included
    select(Dname, Object, Fiscalyear, Value, Resource, SchoolCode) # Drop all columns we don't care about

  return(df_filtered)
}

df_list_filtered <- map(df_list, df_filter_and_select)

Analyze Data

Now we create a function that groups, summarises, filters, pivots, and mutates the all years of filtered dataframe so they are ready for analysis and display. In particular, we mostly care about total General Fund revenues and expenses, which we can use to calculate net-revenue; as well as the Unrestricted Fund Balance for the end of the year (since this is a pot of money that can be bargained over). And for all financial analysis, we contextualize net-revenue and fund balances as a percentage of total expenditures. By doing so, we can compare net-revenue and fund balances across time and district.

df_sum_function <- function(df){

  df_sum <- df |>
    group_by(
      School =
        Dname,
      YearStart =
        Fiscalyear,
      Revenue =
        Object >= 8010 &
        Object <= 8799,
      Expense =
        Object >= 1000 &
        Object <= 7439,
      UnrestrictedFund =
        Object >= 9700 &
        Object<= 9790 & #I believe 9791 is beginning year balance, so it shouldn't be included
        Resource >= 0 &
        Resource <= 2000) |>
    summarise(Total = sum(Value)) |>
    filter(Revenue | Expense | UnrestrictedFund) |>
    pivot_longer(
      cols = 3:5,
      names_to = "Type",
      values_to = "Check") |>
    filter(Check == "TRUE") |>
    select(-Check) |>
    pivot_wider(
      names_from = Type,
      values_from = Total) |>
    mutate(
      NetRevenue = Revenue - Expense,
      NetRevenuePerExp = NetRevenue / Expense,
      UnrestrictedFundPerExp = UnrestrictedFund / Expense) |>
    pivot_longer(
      cols = 3:8,
      names_to = "Type",
      values_to = "Total") |>
    mutate(Total = round(Total, 2)) |>
    mutate(YearEnd = YearStart + 1, .after = School) |>
    ungroup(YearStart) |>
    select(-YearStart)

return(df_sum)
  }


df_sum <- map(df_list_filtered, df_sum_function)

df_sum_all <- bind_rows(df_sum)

The Result

The process above spits out a perfectly tidy and relatively small csv file that can now be used for analysis and visualization going forward and easily updated every Fall when new SACS files come out.

tibble(df_sum_all)
## # A tibble: 113,808 × 4
##    School                                          YearEnd Type            Total
##    <chr>                                             <dbl> <chr>           <dbl>
##  1 "Abc Unified                                  "    2004 Unrestrict…   2.46e+6
##  2 "Abc Unified                                  "    2004 Expense       1.50e+8
##  3 "Abc Unified                                  "    2004 Revenue       1.47e+8
##  4 "Abc Unified                                  "    2004 NetRevenue   -2.24e+6
##  5 "Abc Unified                                  "    2004 NetRevenue…  -1   e-2
##  6 "Abc Unified                                  "    2004 Unrestrict…   2   e-2
##  7 "Acalanes Union High                          "    2004 Unrestrict…   4.02e+6
##  8 "Acalanes Union High                          "    2004 Expense       7.59e+7
##  9 "Acalanes Union High                          "    2004 Revenue       7.61e+7
## 10 "Acalanes Union High                          "    2004 NetRevenue    1.77e+5
## # … with 113,798 more rows
write_csv(df_sum_all, "output_data/ca_ed_data_sum_all.csv")

What’s Next

  • Creating list of SEIU 1021 represented school districts
  • Create table and graphic templates analyzing financial health, and apply them to all SEIU 1021 schools
  • Create table and graphic templates for any particular school financial report
  • Create table and graphic templates analyzing all CA schools in 2021 and for 2004-2021
  • Create report Rmarkdown template for all future financial report analysis for SEIU 1021 schools
  • Create interactive Shiny web app that allows staff and members to analyze data