Author

Charles Scarborough

Professional Profile

I am a San Francisco, CA-based data scientist with over five years of professional experience in spatial and time-series data management, analysis, and visualization, particularly on projects involving air quality, meteorology, wildfire science, and smoke impacts to air quality. I am an expert in establishing complex and efficient data pipelines in the Python and R programming languages and have extensive experience using ArcGIS software for geospatial analysis and cartography. I apply my expertise and skills to various domains, such as data dashboard development, mobile air quality monitoring, and wildfire risk and impacts. I regularly collaborate with a diverse client list, including the U.S. Environmental Protection Agency (EPA), California Department of Forestry and Fire Protection (CALFIRE), local and state governments, and private companies. I value environmental stewardship and protecting public health from externalities imposed by pollution, including climate change, and have built my career around these values.

Select Projects

Dashboard Products

I have worked on or fully developed 13 data analysis and visualization dashboards in collaboration with other team members and agencies, namely the U.S. EPA, CALFIRE, and Southern California Edison (SCE). For these projects, we establish data pipelines and build the dashboards using Shiny and complementary R packages. We use Amazon Web Services (AWS) S3 cloud storage to house large data sets, which are dynamically read into the dashboards based on user input. The dashboards that I create are highly interactive and feature data analysis and visualizations of particular interest to the client. For each dashboard, I have written detailed user guides and README files and practiced version control on GitHub. The following sections highlight the dashboards where I was responsible for establishing data pipelines, coding the user interface (UI) and back-end processing, and demonstrating the dashboards to clients, but I have also advised on and contributed to the development of other dashboards.

Near-Road Data Quality Dashboard

Status: operational

The Near-Road Data Quality Dashboard provides quality assurance and data analysis for all data collected at sites in the nationwide Near-Road air quality monitoring network. The dashboard allows users to quickly understand pollutant concentrations and metadata at the site and network levels. Analysis and visualization include data completeness, summary statistics, interactive time series analysis, and more in the form of interactive tables and figures. This dashboard is updated every week to incorporate new data collected at air monitoring sites. Therefore, the pre-processing scripts in the data pipeline are soft coded to handle unexpected data. I presented the dashboard in 2022 at the National Ambient Air Monitoring Conference. Since the presentation, I have updated the dashboard to include a model that quantifies the expected average concentration of a pollutant as a function of wind direction and wind speed using nonparametric wind regression (NWR), a kernel smoothing method that amounts to a type of weighted average. This model helps users understand from which direction and at what wind speed a pollutant originates. Users can download visualizations from the model (see below).

An example of data visualization output from the Near-Road Dashboard.
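The weighted-average idea behind the wind regression model can be illustrated with a minimal Python sketch. This is not the dashboard's code (the dashboard is built in R with Shiny); the function name `nwr_estimate`, the Gaussian kernels, and the bandwidths `sigma` and `h` are assumptions chosen for illustration only.

```python
import numpy as np

def nwr_estimate(wind_dir, wind_speed, conc, theta, u, sigma=10.0, h=1.0):
    """Kernel-weighted mean concentration at wind direction theta (degrees)
    and wind speed u. Gaussian kernels and bandwidths are illustrative
    assumptions, not the dashboard's actual settings."""
    wind_dir = np.asarray(wind_dir, dtype=float)
    wind_speed = np.asarray(wind_speed, dtype=float)
    conc = np.asarray(conc, dtype=float)

    # Smallest signed angular difference in degrees, so 350 and 10 are 20 apart
    d_theta = (wind_dir - theta + 180.0) % 360.0 - 180.0

    # Observations closer to (theta, u) receive larger weights
    k_dir = np.exp(-0.5 * (d_theta / sigma) ** 2)
    k_spd = np.exp(-0.5 * ((wind_speed - u) / h) ** 2)
    weights = k_dir * k_spd

    total = weights.sum()
    if total == 0.0:
        return float("nan")  # no observation close enough to contribute
    return float((weights * conc).sum() / total)
```

Evaluating this function over a grid of directions and speeds would give the kind of surface that a polar plot displays.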

Lead Monitoring Dashboard

Status: in progress

The Lead Dashboard is a spatial data analysis tool for analyzing all lead monitoring data collected in the nationwide lead monitoring network. The goal of the dashboard is to better understand redundancies and gaps in the lead monitoring network based on concentrations and known lead sources by comparing lead monitoring sites and National Emissions Inventory (NEI) point lead sources. The dashboard allows users to visualize lead sites and lead concentration data since 2008 on an interactive map and compare them to known sources of lead pollution. A user can define a circular radius around a selected site, then spatially summarize and compile all known lead sources within the radius. The dashboard also features spatial summaries and visualizations at multiple geographic levels (e.g., metropolitan, county, and state).
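The radius-based summary described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the great-circle (haversine) distance, the function names, and the `emissions_tpy` field are hypothetical and do not reflect the dashboard's actual R implementation or the NEI data schema.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lam = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def summarize_sources_within_radius(site_lat, site_lon, sources, radius_km):
    """Count and total the emissions of sources within radius_km of a site.

    `sources` is a list of dicts with hypothetical keys
    'lat', 'lon', and 'emissions_tpy' (tons per year)."""
    nearby = [
        s for s in sources
        if haversine_km(site_lat, site_lon, s["lat"], s["lon"]) <= radius_km
    ]
    return {
        "n_sources": len(nearby),
        "total_emissions_tpy": sum(s["emissions_tpy"] for s in nearby),
        "sources": nearby,
    }
```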

SmokePath Explorer

Status: operational

The SmokePath Explorer tool was developed in collaboration with the California and Nevada Smoke and Air Committee (CANSAC), the Desert Research Institute (DRI), CALFIRE, and Sonoma Technology. The goal of the tool is to support fire and air agencies in long-term prescribed fire planning to mitigate smoke impacts from prescribed fires. The tool is highly interactive and features a map, user feedback, download options, links to similar tools, and more. In short, the tool calculates meteorological statistics and the probability of air transport from a user-defined starting location, identifies potentially sensitive populations that may be impacted, and produces easy-to-read visuals of the resultant data. I presented the dashboard at ShinyConf 2024. The dashboard and underlying data were also presented at the American Meteorological Society annual conference and are described in Zare-Harofteh et al. (2025).

A portion of the SmokePath Explorer’s UI.

Routine Data Analysis

I have established data pipelines to efficiently apply data analysis and quality control (QC) algorithms to air quality data collected in multiple settings, including at landfills and along the fence line perimeter of oil refineries. These projects are conducted on a weekly, monthly, quarterly, or annual basis, and are usually summarized in a report along with the input data. In total, I have worked on or currently work on five separate routine data analysis projects. The following sections summarize two routine data analysis projects that I have been heavily involved in.

Refinery Perimeter Air Monitoring

Large refineries in the South Coast Air Basin in California are required to conduct real-time fence line perimeter monitoring of numerous air pollutants. The purpose of this regulation is to provide air quality information to the public about levels of various air pollutants at or near the property boundaries of petroleum refineries and in nearby communities. I oversee data analysis for nearly two dozen air quality and meteorology monitors at a South Coast (Los Angeles) refinery, and report data pursuant to the regulation on a monthly and quarterly basis. Quarterly reports can be found here. Monthly reports are not shared with the public.

Monthly Reporting

January 2023 - present

Monthly reporting involves summarizing and producing visualizations of air quality monitor performance metrics and writing the monthly report.

Quarterly Reporting

January 2022 - present

For quarterly reporting, I oversee data analysis and QC operations, advise junior data scientists on decisions related to data analysis and QC, and write the report.

Sunshine Canyon Landfill Air Monitoring

Air pollutants and meteorology have been continuously monitored at sites on and near the Sunshine Canyon Landfill in the greater Los Angeles area since 2007. Since the project’s inception, multiple data scientists have worked on quarterly and annual reports to summarize data quality. I began working on the project in 2021, when I translated the data QC algorithms from a SQL database to R to improve project efficiency by automating the QC process. My efforts have saved numerous hours of work and allow for more control over data quality operations. Besides routine data analysis, I produce maps of wildfires and wildfire impacts for instances where smoke impacted air pollution levels. Quarterly and annual reports can be found here under the “Quarterly Reports” and “Annual Reports” drop-downs, respectively.

Quarterly Reporting

January 2021 - present

Quarterly reporting involves summarizing air quality data, calculating statistics and producing visualizations, and writing the report.

Annual Reporting

2021 - present

Annual reporting involves summarizing the quarterly reports and air quality data collected throughout the year, calculating statistics and conducting analysis, producing visualizations, and writing the report.

Wildfire Science

Community Wildfire Protection Plans (CWPPs)

CWPPs are science-based assessments of wildfire hazards in communities such as towns or counties. The purpose of a CWPP is to provide community stakeholders with guidance and strategies to reduce fire hazard risk in the community, while promoting the protection of the community’s economic and ecological assets. CWPPs are heavily reliant on geospatial analysis, including wildfire modeling. I have worked on three separate CWPPs, where I collected geospatial data, ran a wildfire model, calculated parcel-based wildfire risk scores, and produced visualization products of all input and output data. Wildfire model inputs include fuel models, landscape characteristics (e.g., slope, aspect), forest canopy characteristics, and meteorology. I use Python and R programming to accomplish a variety of geospatial data science tasks, such as producing spatial statistics, preparing input geospatial data for wildfire modeling, and conducting risk assessments. I have been heavily involved in the following CWPPs:

Since the publication of the Marin County CWPP, I have updated the input data to incorporate impervious surfaces (e.g., buildings and roads) and gridded meteorology data. This effort is critical in refining the wildfire model output and calculating subsequent wildfire risk scores.

Smoke Impacts to Air Quality

Smoke from wildfires, prescribed burns, and agricultural burns impacts air quality throughout the world. When smoke causes elevated concentrations of federally regulated pollutants (e.g., ozone, PM2.5), air agencies in the U.S. can request that these concentrations be omitted from the record to avoid penalties. To omit these concentrations from the record, agencies must prepare an “exceptional event demonstration”, which provides evidence, in the form of analysis of air pollution data, smoke transport, and more, that the elevated concentration was out of their control. I have worked on over a dozen demonstrations or similar documents for Clark County, Nevada; Maricopa County, Arizona; and Jefferson County, Alabama. The analysis is extensive and includes statistical analysis, air trajectory analysis, mapping of fires, smoke, and air monitors, meteorological analysis, and more, depending on the complexity of the smoke impact event. I have served as task manager and lead data scientist on demonstrations for Maricopa County, Arizona. I have also written numerous sections of analysis. Demonstrations for Clark County, Nevada, can be found here.

Miscellaneous

Air Pollution Deposition

Deposition of pollutants such as nitrogen and ozone onto trees can affect tree health. Forests have a threshold (critical load) for how much pollution they can absorb without harm. Quantifying the critical load is an area of ongoing research, and the critical load varies among tree species. It is important to understand critical loads because exceeding these thresholds leads to reduced biodiversity, altered soil chemistry, and impaired tree health, including effects on growth and mortality. I have been involved with research to understand critical loads among different tree species. My role in this research was to prepare multiple large raster data sets covering the contiguous U.S. to be used in machine learning algorithms. Specifically, I prepared raster data sets of drought indices, ozone, soil characteristics, and more. This preparation included ensuring consistent spatial resolution, projection, and extent between different raster data sets, interpolating rasters between years to create raster time series, producing maps of resultant data, and more. Further, I applied downscaling techniques to the raster data sets and created versions with and without human-driven pollution emissions to quantify the effect that humans have on critical loads. These data were used to predict tree mortality. This research was published in Coughlin et al. (2024) and in Pavlovic et al. (2025).
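As an illustration of interpolating rasters between years, the following Python sketch linearly interpolates, cell by cell, between two rasters that already share the same resolution, projection, and extent. Linear interpolation and the function name `interpolate_between_years` are assumptions for illustration; the method used in the published research may differ.

```python
import numpy as np

def interpolate_between_years(raster_a, year_a, raster_b, year_b):
    """Return {year: raster} for every year from year_a to year_b inclusive.

    Assumes both rasters are aligned 2-D arrays on the same grid and that
    values change linearly in time (an illustrative simplification)."""
    raster_a = np.asarray(raster_a, dtype=float)
    raster_b = np.asarray(raster_b, dtype=float)
    if raster_a.shape != raster_b.shape:
        raise ValueError("rasters must share the same shape (grid)")
    if year_b <= year_a:
        raise ValueError("year_b must be later than year_a")

    series = {}
    for year in range(year_a, year_b + 1):
        # Weight of the later raster grows from 0 to 1 across the interval
        w = (year - year_a) / (year_b - year_a)
        series[year] = (1.0 - w) * raster_a + w * raster_b
    return series
```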

Dust Impacts to Air Quality

Similar to smoke impacts, dust impacts to air pollution concentrations can sometimes require an exceptional event demonstration. I have worked on ten exceptional event demonstrations for Clark County, Nevada, where we linked high winds with elevated PM10 concentrations (dust). Like the demonstrations to link wildfire smoke and elevated ozone and PM2.5, demonstrations to link high winds and elevated PM10 require extensive analysis. For these demonstrations, I focused on mapping land use, land cover type (e.g., urban area, forest, and crops), and dust emissions around Clark County; conducting meteorological analysis; and researching and summarizing dust control regulations in the Desert Southwest.

Greenhouse Gas Emissions

Methane emissions from oil and gas infrastructure are poorly understood and vary widely across facilities and basins. Emissions can vary extensively due to several facility-level characteristics, such as the type of infrastructure, equipment age, operations, maintenance, and environmental conditions (e.g., topography and meteorology). In recent years, there have been significant technological advances in measuring methane concentrations and emission rates using satellite instruments, airborne surveys, ground-based mobile measurements, onsite unmanned aerial vehicle (UAV) measurements, and low-cost sensors. I served as lead data scientist on a project to evaluate a variety of methane measurement devices (satellite, airborne, and surface) against in-situ measurements collected at an oil refinery in California. As lead data scientist, I created data pipelines to collect and integrate methane emission data into a common format to be used in regular reporting. I also regularly created data visualization products showing methane emission concentrations and analyzer performance.

Mobile Air Quality Monitoring

I served as lead data scientist for a mobile air quality monitoring project based in Sacramento, California. This project consisted of a month-long mobile air quality monitoring campaign focused on communities in the city of Sacramento. The campaign measured pollutants such as black carbon (BC), carbon dioxide, methane, ozone, and more. Pollutants were continuously measured along fixed routes to ensure representative data collection. As lead data scientist, I established the data pipeline to apply QC algorithms to the raw data, corrected for ambient (background) pollutant concentrations using a time-series-based approach, allocated the time-series data onto multiple grids, and identified high pollution zones (HPZs) using spatial statistical methods. The final report can be found here. In 2023, I presented the project at the International Conference on Carbonaceous Particles in the Atmosphere.

Example map from the resultant mobile monitoring data showing average gridded ozone (top) and identified HPZ (bottom).
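Two of the steps described above, background correction and allocation of measurements onto a grid, can be sketched in Python as follows. The project report documents the actual methods; the rolling low percentile used for the background, the latitude/longitude binning, and all function names and parameters here are assumptions for illustration only.

```python
import numpy as np

def rolling_background(values, window=5, percentile=5.0):
    """Estimate background as a low percentile within a centered rolling window.

    The window length (in samples) and percentile are illustrative choices."""
    values = np.asarray(values, dtype=float)
    half = window // 2
    background = np.empty_like(values)
    for i in range(values.size):
        lo = max(0, i - half)
        hi = min(values.size, i + half + 1)
        background[i] = np.percentile(values[lo:hi], percentile)
    return background

def grid_mean(lats, lons, values, cell_deg=0.001):
    """Average values within square latitude/longitude cells.

    Returns {(row_index, col_index): mean value} for occupied cells only."""
    cells = {}
    for lat, lon, value in zip(lats, lons, values):
        key = (int(np.floor(lat / cell_deg)), int(np.floor(lon / cell_deg)))
        cells.setdefault(key, []).append(value)
    return {key: float(np.mean(vals)) for key, vals in cells.items()}
```

Subtracting `rolling_background(values)` from the raw series gives the local enhancement above background, which can then be passed to `grid_mean` along with the measurement coordinates.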

Network Assessments

A network assessment is a federally mandated review, conducted every five years, of a state’s ambient air quality monitoring network. The goal of a network assessment is to assess the strengths and weaknesses of a state’s air monitoring network and to provide recommendations for optimizing and improving the overall effectiveness and efficiency of the network, while also maintaining compliance with federal ambient air monitoring regulations. These goals are accomplished by analyzing spatial and temporal trends in population growth based on the most current data, analyzing spatial patterns in air pollution emissions data, and analyzing trends in air quality data. I have served as lead data scientist on two network assessments, which involved a comprehensive review of air quality data (e.g., data completeness, air pollution concentrations, monitoring gaps), statistical trends analysis, correlation analysis, population analysis, and spatial emissions analysis. As lead data scientist, I establish data pipelines, assign analysis tasks, review work, and provide recommendations. The two network assessments are:

  • 2025 Wyoming Network Assessment (not yet published)

  • 2025 Florida Network Assessment (not yet published)

Select Publications

Coughlin, J., S. Y. Chang, K. Craig, C. R. Scarborough, C. T. Driscoll, C. M. Clark, and N. R. Pavlovic. 2024. “Characterizing Localized Nitrogen Sensitivity of Tree Species and the Associated Influences of Mediating Factors.” Ecosphere 15 (7). https://doi.org/10.1002/ecs2.4925.
Pavlovic, N. R., S. Y. Chang, K. J. Craig, C. R. Scarborough, J. G. Coughlin, J. D. Herrick, and C. T. Driscoll. 2025. “Quantification of Ozone Exposure Impacts and Their Uncertainties on Growth and Survival of 88 Tree Species Across the United States.” Journal of Geophysical Research Atmospheres 130 (12). https://doi.org/10.1029/2024JD042063.
Zare-Harofteh, A., S. J. Kramer, S. Huang, C. R. Scarborough, K. Besong, N. Kumar, T. Brown, and F. Hosseinpour. 2025. “SmokePath Explorer - a Data Driven Smoke Management Tool to Support Prescribed Fire Planning in California, USA.” Journal of the Air & Waste Management Association 75 (September). https://doi.org/10.1080/10962247.2025.2553822.