Data-Driven Careers in the Public Sector

Collin Paschall

  • [some bio here]

Main points

  • Building a public sector career is about (1) technical chops and (2) showing you can bridge the gap between data-driven insights and the politics of public policy.

Plan for Presentation

  • Define public sector data science
  • Explore roles
  • Technical distinctions
  • The “public sector special” skill set

An example of a public sector issue

  • You have a limited budget to upgrade aging infrastructure (e.g., lead pipes).
  • Option A: Focus on the oldest pipes to prevent the most leaks.
  • Option B: Focus on low-income neighborhoods with the least access to healthcare.
  • Complexities: How do we use data to balance utility with justice? How do we determine political feasibility?

Public Sector Dynamics

  • Thinking of data as a public good rather than a corporate asset.
  • The responsibility of the data analyst: Accuracy, transparency, and accountability.
  • Why a liberal arts background is a competitive advantage: You can define the “should,” not just the “how.”

The Public Sector Landscape

  • Federal Agencies: The “Big Data” of the Census, BLS, and HHS.
  • State & Local Government: Where policy hits the pavement (Housing, Transit, Public Health).
  • “Civic Tech”: Think tanks (Urban Institute), NGOs, and Digital Service teams (USDS, 18F).
  • The Private-Public Hybrid: Consulting for social impact (Deloitte, Booz Allen, smaller boutiques).
  • Actual Political Work: legislative assistant roles and advocacy work.

Career Roles & Archetypes

  • Data Analyst: Telling stories through visualization (Tableau/PowerBI) and reporting.
  • Data Scientist: Building models to anticipate community needs.
  • Policy Researcher: Designing rigorous studies to see “what actually works.”
  • Data Engineer: The “plumbing”—ensuring government data is clean, accessible, and secure.

The Methodological Divide

  • Econometrics (Inference): Focused on understanding causality and the “Why.” Social science and theory-driven.
  • Machine Learning (Prediction): Focused on maximizing accuracy and the “What.” Computational and pattern-driven.
  • The Trade-off: Do you need to isolate a specific effect for a law, or do you need an automated algorithm that works?

Program Evaluation (Econometrics)

  • Question: “Did the $5M universal pre-K pilot cause the rise in literacy rates?”
  • Causal Inference (Counterfactuals, Difference-in-Differences).
  • Controlling for “noise” and confounding variables.
  • Purpose: defend a budget or scale a pilot program to the national level.
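
A difference-in-differences estimate comes down to simple arithmetic, sketched below. The literacy rates are made-up numbers for illustration, not results from any real pilot, and the function name `did_estimate` is hypothetical.

```python
# Hypothetical literacy rates (percent): illustrative only, not real data.
rates = {
    ("pilot", "before"): 62.0,
    ("pilot", "after"): 70.0,
    ("comparison", "before"): 60.0,
    ("comparison", "after"): 63.0,
}


def did_estimate(rates):
    """Return the 2x2 difference-in-differences estimate."""
    # Change over time in the districts that ran the pilot.
    pilot_change = rates[("pilot", "after")] - rates[("pilot", "before")]
    # Change over time in comparison districts (the counterfactual trend).
    comparison_change = (
        rates[("comparison", "after")] - rates[("comparison", "before")]
    )
    # The estimated effect is the change beyond the background trend.
    return pilot_change - comparison_change


effect = did_estimate(rates)
print(f"Estimated effect: {effect:.1f} percentage points")
```

The estimate is only credible if the two groups would have followed parallel trends without the pilot; a real evaluation would also report uncertainty, typically through a regression.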

Predictive Analytics (ML)

  • Question: “Which census tracts are most likely to experience a spike in food insecurity next month?”
  • Supervised Learning (Random Forests).
  • Preventing “Black Box” bias and ensuring transparency.
  • Purpose: early intervention systems and resource allocation in real time.
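
The resource allocation step can be sketched as ranking tracts by predicted risk and flagging as many as staff can actually reach. The risk scores, tract names, and the `flag_for_review` helper are all hypothetical; in practice the scores would come from a trained model such as a random forest.

```python
# Hypothetical predicted risk scores per census tract (0 to 1).
# These are placeholders, not output from a real model.
predicted_risk = {
    "tract_A": 0.82,
    "tract_B": 0.35,
    "tract_C": 0.67,
    "tract_D": 0.91,
}


def flag_for_review(scores, capacity):
    """Return the `capacity` highest-risk tracts, highest first."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:capacity]


# Assume outreach staff can cover two tracts this month.
flagged = flag_for_review(predicted_risk, capacity=2)
print(flagged)
```

Keeping the ranking rule this simple is one way to make the allocation easy to explain to the people affected by it.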

The Hybrid Reality: Mixed Methods

  • In practice, these tools are rarely used in isolation.
  • Using ML to identify a problem area, then using Econometrics to test a solution.
  • Incorporating qualitative data: Why the numbers don’t always tell the whole story.

“Hard” Skills

  • Programming:
    • R: The gold standard for statistical social science, especially with the Tidyverse.
    • Python: For scalable machine learning and automation.
  • Databases (SQL): The essential language for “talking” to government servers.
  • Version Control (Git and GitHub): Showing your work and collaborating on open-source policy projects.
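
As a rough illustration of the SQL and Python skills above, the sketch below runs a typical grouped query from Python using the built-in `sqlite3` module. The `service_requests` table, its columns, and its rows are invented for this example; a real agency database would have its own schema and usually a different database engine.

```python
import sqlite3

# In-memory database with a hypothetical table of city service requests.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE service_requests "
    "(neighborhood TEXT, category TEXT, days_open INTEGER)"
)
conn.executemany(
    "INSERT INTO service_requests VALUES (?, ?, ?)",
    [
        ("Eastside", "pothole", 12),
        ("Eastside", "streetlight", 30),
        ("Westside", "pothole", 4),
        ("Westside", "pothole", 6),
    ],
)

# Count requests and average time open for each neighborhood.
rows = conn.execute(
    """
    SELECT neighborhood, COUNT(*) AS n_requests, AVG(days_open) AS avg_days_open
    FROM service_requests
    GROUP BY neighborhood
    ORDER BY neighborhood
    """
).fetchall()

for row in rows:
    print(row)

conn.close()
```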

The “Soft” Skills are the Hardest

  • Data Ethics: Identifying bias in historical datasets (e.g., over-policing).
  • Data Privacy: Navigating the legalities of PII.
  • Translation: Turning a p-value into a 1-page briefing memo.
  • Critical Thinking: Asking “Who is missing from this dataset?”
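
One way to start answering “Who is missing from this dataset?” is to compare each group's share of the data with its share of the population. The sketch below uses made-up age groups and counts; the `representation_gaps` helper is hypothetical.

```python
# Hypothetical numbers for illustration only.
population_share = {"under_18": 0.22, "18_to_64": 0.61, "65_plus": 0.17}
survey_counts = {"under_18": 20, "18_to_64": 150, "65_plus": 30}


def representation_gaps(counts, population):
    """Return each group's share in the data minus its population share.

    Negative values mean the group is underrepresented in the data.
    """
    total = sum(counts.values())
    return {
        group: counts.get(group, 0) / total - population[group]
        for group in population
    }


gaps = representation_gaps(survey_counts, population_share)

# List the most underrepresented groups first.
for group, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{group}: {gap:+.2f}")
```

A check like this only catches groups you already thought to measure, so it complements the question rather than replacing it.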

The “Unspoken” Requirement: Political Savvy

  • Navigating bureaucracy and institutional inertia.
  • Balancing competing interests: Budget Hawks (Efficiency) vs. Community Advocates (Equity) vs. Legal Teams (Privacy).
  • Learning to “speak” three languages: Data (to your peers), Policy (to your bosses), and Impact (to the public).

Frame your findings!

  • A p-value doesn’t win an argument; a compelling narrative backed by evidence does.
  • Translating “Statistical Significance” into “Constituent Impact.”
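
A small sketch of that translation step: turning a percentage-point effect into an approximate head count for a briefing memo. The effect size, enrollment figure, and `constituent_impact` helper are hypothetical.

```python
def constituent_impact(effect_pct_points, population):
    """Convert a percentage-point effect into an approximate head count."""
    return round(effect_pct_points / 100 * population)


# Hypothetical inputs: a 5-point estimated effect among 4,000 enrolled children.
estimated_effect = 5.0
enrolled_children = 4000

additional_children = constituent_impact(estimated_effect, enrolled_children)
print(
    f"Roughly {additional_children} additional children out of "
    f"{enrolled_children:,} enrolled would meet the literacy benchmark."
)
```

A head count like this is only as reliable as the estimate behind it, so a memo should still note the uncertainty in plain language.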

Why “Perfect” Models Fail

  • Technical success vs. Political failure: Why accurate models often sit on a shelf unused.
  • Lack of “buy-in” from frontline workers or community members.
  • The importance of “Human-in-the-Loop” systems.

Working lean

  • Many agencies operate on legacy systems, Excel spreadsheets, or even paper records.
  • Infrastructure is often fragmented; data “silos” are the norm, not the exception.
  • In a low-tech environment, a little skill goes a very long way.

Easy wins

  • Automating a manual 20-hour weekly reporting task with a simple R or Python script can be revolutionary for a small agency.
  • You aren’t just an analyst; you are often the architect.
  • Using open-source tools (R, Python, QGIS) to solve problems without needing a million-dollar software budget.
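
To give a sense of scale, the kind of script that replaces a manual weekly report can be very short. The sketch below uses only the Python standard library; the CSV columns, sample rows, and `weekly_summary` helper are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical export from a case-management system. A real script would
# open a file on disk instead of this in-memory sample.
raw_export = io.StringIO(
    "case_id,office,status\n"
    "1,North,open\n"
    "2,North,closed\n"
    "3,South,open\n"
    "4,South,open\n"
)


def weekly_summary(csv_file):
    """Count open cases per office from a CSV export."""
    open_cases = Counter()
    for row in csv.DictReader(csv_file):
        if row["status"] == "open":
            open_cases[row["office"]] += 1
    return dict(open_cases)


summary = weekly_summary(raw_export)
for office, count in sorted(summary.items()):
    print(f"{office}: {count} open cases")
```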

Final Thoughts & Q&A

  • A great fit for liberal arts graduates: define problems, weigh complexities, solve challenges.
  • Next Steps: Internships, fellowships, and building your first policy-focused repo.
  • Questions?