Collocation Analysis Harrisburg and Philly

Author

Alain Izabayo

1. Introduction

In this analysis we calculate the r-squared values and root mean square errors (rmse) of the low-cost sensors collocated with EPA monitors in Harrisburg and Philadelphia. The goal is to understand how the data collected by the three air sensor types (airgradient, quantaq and clarity) compares with the data collected by EPA monitors and reported to AirNow. Tabulating the results makes it easy to see how each device performs against the EPA monitors.

Let’s first set up some libraries that are likely to be useful.

2. Data extraction from sensors

The data is extracted from the sensors and partly cleaned before it is sent to InfluxDB, a cloud-hosted database. In this notebook we read that data and analyze it here.

The sensor data from all three collocation sites was downloaded, cleaned in Python and uploaded to the shared drive. Below we read the CSV file that stores the data from these sensors.
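As a minimal sketch of this read step, the snippet below writes two rows to a temporary CSV and reads them back with base R. The temporary file is a hypothetical stand-in for the shared-drive file, whose path is not given here; the column names follow the structure printed below.

```r
# Hypothetical stand-in for the shared-drive CSV: write two rows to a temp
# file so the example runs without access to the real data.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  ",time_US_eastern,latitude,longitude,device_sn,manufacturer,site_name,parameter,pm_gases_measures",
  "0,2026-04-15 20:00:00,40.2,-76.8,MOD-X-00958,quantaq,Harrisburg,PM2.5,7.29",
  "1,2026-04-15 21:00:00,40.2,-76.8,MOD-X-00958,quantaq,Harrisburg,PM2.5,7.27"
), csv_path)

# stringsAsFactors = FALSE keeps text columns as character rather than factor.
# The unnamed first column (an exported row index) is named "X" by read.csv.
sensor_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types before going further.
str(sensor_data)
```

Reading the time column as plain character is expected at this stage; it is converted in a later step.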

We make sure that the data type of each attribute in the dataset matches our expectation.

'data.frame':   56565 obs. of  9 variables:
 $ X                : int  0 1 2 3 4 ...
 $ time_US_eastern  : chr  "2026-04-15 20:00:00" "2026-04-15 21:00:00" ...
 $ latitude         : num  40.2 40.2 ...
 $ longitude        : num  -76.8 -76.8 ...
 $ device_sn        : chr  "MOD-X-00958" "MOD-X-00958" ...
 $ manufacturer     : chr  "quantaq" "quantaq" ...
 $ site_name        : chr  "Harrisburg" "Harrisburg" ...
 $ parameter        : chr  "PM2.5" "PM2.5" ...
 $ pm_gases_measures: num  7.29 7.27 ...

R does not recognize the time column as a date-time, so we convert it to a date-time type for easier use during the analysis.
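One way to do this conversion in base R is with `as.POSIXct`. This is a sketch on a two-row toy data frame; the time zone `America/New_York` is an assumption based on the column name `time_US_eastern`.

```r
# Toy rows in the same text format as the time column shown above.
sensor_data <- data.frame(
  time_US_eastern = c("2026-04-15 20:00:00", "2026-04-15 21:00:00"),
  pm_gases_measures = c(7.29, 7.27),
  stringsAsFactors = FALSE
)

# Parse the character timestamps into POSIXct date-times.
# Assumption: the timestamps are US Eastern local time.
sensor_data$time_US_eastern <- as.POSIXct(
  sensor_data$time_US_eastern,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "America/New_York"
)

class(sensor_data$time_US_eastern)
```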

time_US_eastern latitude longitude device_sn manufacturer site_name parameter pm_gases_measures
46298 2026-04-14 20:00:00 40.24701 -76.84699 840420430401 EPA Harrisburg OZONE 36.0
46322 2026-04-14 20:00:00 40.24701 -76.84699 840420430401 EPA Harrisburg PM2.5 4.8
46299 2026-04-14 21:00:00 40.24701 -76.84699 840420430401 EPA Harrisburg OZONE 33.0

3. Data restructuring for RMSE and R-Squared

Let’s keep only the devices we plan to use in this analysis. For consistency, we use the same devices that JustAir uses to calculate the collocation factors.

In addition, we keep the EPA monitors, since they serve as the reference. Let’s first identify the serial numbers (sn) of the EPA monitors.
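The filtering described above can be sketched as follows. The serial numbers are the Harrisburg ones that appear in this report, `HYPOTHETICAL-01` is an invented extra device added only to show that it gets dropped, and the variable names are illustrative.

```r
# Toy data: two selected low-cost sensors, one EPA monitor, and one
# hypothetical device that is not part of the analysis.
sensor_data <- data.frame(
  device_sn = c("MOD-X-00958", "A93NM49L", "840420430401", "HYPOTHETICAL-01"),
  manufacturer = c("quantaq", "clarity", "EPA", "clarity"),
  site_name = "Harrisburg",
  stringsAsFactors = FALSE
)

# Identify the EPA monitor serial numbers from the manufacturer column.
epa_sn <- unique(sensor_data$device_sn[sensor_data$manufacturer == "EPA"])

# Serial numbers of the low-cost devices chosen for the analysis.
selected_sn <- c("MOD-X-00958", "A93NM49L")

# Keep the selected devices plus the EPA reference monitors.
analysis_data <- sensor_data[sensor_data$device_sn %in% c(selected_sn, epa_sn), ]
```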

Since PM2.5 is compared using 24-hour averages, we average the PM2.5 data over 24 hours.
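A possible way to build the 24-hour averages with base R is shown below on 48 hours of toy data. It assumes that "24-hour" means calendar-day averages; the notebook may use a different windowing. The toy timestamps use UTC for simplicity, whereas the real column would use its own time zone when extracting the date.

```r
# Toy hourly data: 24 hours at 4 and then 24 hours at 10.
hourly <- data.frame(
  time_US_eastern = as.POSIXct("2026-04-15 00:00:00", tz = "UTC") + 3600 * 0:47,
  device_sn = "MOD-X-00958",
  parameter = "PM2.5",
  pm_gases_measures = c(rep(4, 24), rep(10, 24)),
  stringsAsFactors = FALSE
)

# Assumption: average per calendar day, so first derive the date.
hourly$date <- as.Date(hourly$time_US_eastern, tz = "UTC")

# Mean per device, parameter and day.
daily <- aggregate(
  pm_gases_measures ~ device_sn + parameter + date,
  data = hourly,
  FUN = mean
)
daily
```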

4. Calculating the r-squared and rmse

We first make a table with all the thresholds recommended by the EPA for comparison.
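A sketch of such a threshold table, using the target values that appear in the result tables of this report. The column names `min_r_squared`, `max_rmse` and `rmse_unit` are illustrative; keeping the limits numeric makes later comparisons easier, while the label columns reproduce the text shown in the tables.

```r
# Target values as listed in the Target.R.squared and Target.rmse columns.
thresholds <- data.frame(
  Parameter = c("PM2.5", "OZONE", "NO2", "CO"),
  min_r_squared = c(0.70, 0.80, 0.70, 0.80),
  max_rmse = c(7, 5, 5, 1),
  rmse_unit = c("\u00b5g/m^3", "ppb", "ppb", "ppm"),
  stringsAsFactors = FALSE
)

# Human-readable labels in the same style as the result tables.
thresholds$Target.R.squared <- paste0(
  ">=", formatC(thresholds$min_r_squared, format = "f", digits = 2)
)
thresholds$Target.rmse <- paste0("<=", thresholds$max_rmse, thresholds$rmse_unit)

thresholds
```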

For each location we find the r-squared and rmse of each low-cost ‘representative’ device per manufacturer, using the EPA data as the reference. For reusability and simplicity, let’s write a function that builds the r-squared and rmse data frames directly from a parameter and a location.
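A minimal sketch of such a function is given below. The name `collocation_stats` and its arguments are hypothetical, and taking r-squared and the p-value from a simple linear regression of sensor on reference is an assumption about the method, not a description of the notebook's actual code. The rmse is the root of the mean squared difference between paired sensor and reference values.

```r
# Hypothetical helper: compare one sensor with the EPA reference for a given
# parameter and site, returning one row of statistics.
collocation_stats <- function(data, param, site, sensor_sn, epa_sn) {
  sub <- data[data$parameter == param & data$site_name == site, ]
  cols <- c("time_US_eastern", "pm_gases_measures")
  sensor <- sub[sub$device_sn == sensor_sn, cols]
  ref <- sub[sub$device_sn == epa_sn, cols]

  # Pair sensor and reference readings on their shared timestamps.
  paired <- merge(sensor, ref, by = "time_US_eastern",
                  suffixes = c("_sensor", "_ref"))

  # Assumption: r-squared and p-value come from a simple linear regression.
  fit <- summary(lm(pm_gases_measures_sensor ~ pm_gases_measures_ref,
                    data = paired))
  err <- paired$pm_gases_measures_sensor - paired$pm_gases_measures_ref

  data.frame(
    Location = site,
    Device.sn = sensor_sn,
    Parameter = param,
    R.squared = round(fit$r.squared, 3),
    P.values = fit$coefficients["pm_gases_measures_ref", "Pr(>|t|)"],
    rmse = round(sqrt(mean(err^2)), 1),
    stringsAsFactors = FALSE
  )
}

# Toy data: ten daily reference values and a sensor that is off by +/- 1.
times <- as.POSIXct("2026-04-15", tz = "UTC") + 86400 * 0:9
toy <- data.frame(
  time_US_eastern = rep(times, 2),
  device_sn = rep(c("840420430401", "MOD-X-00958"), each = 10),
  site_name = "Harrisburg",
  parameter = "PM2.5",
  pm_gases_measures = c(1:10, 1:10 + rep(c(1, -1), 5)),
  stringsAsFactors = FALSE
)

stats <- collocation_stats(toy, "PM2.5", "Harrisburg",
                           sensor_sn = "MOD-X-00958", epa_sn = "840420430401")
stats
```

Returning a one-row data frame per device keeps the results easy to stack into the tables shown in the following sections.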

We section the calculations by location for organization.

4.1 Harrisburg

The EPA monitors in Harrisburg only measure PM2.5 and Ozone.

4.1.1 Harrisburg PM2.5

We use the function to find the PM2.5 r-squared and rmse values in Harrisburg for the three low-cost sensors.

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Harrisburg MOD-X-00958 quantaq PM2.5 0.97 2.139003e-47 >=0.70 0.5 <=7µg/m^3
Harrisburg A93NM49L clarity PM2.5 0.957 1.195538e-42 >=0.70 0.6 <=7µg/m^3
Harrisburg airgradient:588c8135a6ec airgradient PM2.5 0.942 1.05461e-25 >=0.70 0.8 <=7µg/m^3

Based on the values above, the R.squared and rmse values for all three devices (quantaq, clarity and airgradient) are within the EPA recommended range.
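The comparison against the targets can also be made explicit in code. The sketch below re-enters the Harrisburg PM2.5 values from the table above and flags whether each device meets the PM2.5 targets (r-squared of at least 0.70, rmse of at most 7 µg/m^3); the column names `meets_r_squared` and `meets_rmse` are illustrative.

```r
# Harrisburg PM2.5 results, copied from the table above.
results <- data.frame(
  Manufacturer = c("quantaq", "clarity", "airgradient"),
  R.squared = c(0.970, 0.957, 0.942),
  rmse = c(0.5, 0.6, 0.8),
  stringsAsFactors = FALSE
)

# PM2.5 targets from the Target.R.squared and Target.rmse columns.
min_r_squared <- 0.70
max_rmse <- 7

# Flag each device against both targets.
results$meets_r_squared <- results$R.squared >= min_r_squared
results$meets_rmse <- results$rmse <= max_rmse

results
```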

4.1.2 Harrisburg Ozone

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Harrisburg MOD-X-00958 quantaq OZONE 0.723 0e+00 >=0.80 6.3 <=5ppb
Harrisburg A93NM49L clarity OZONE 0.752 0e+00 >=0.80 6 <=5ppb
Harrisburg airgradient:588c8135a6ec airgradient OZONE 0.632 4.744426e-214 >=0.80 7.3 <=5ppb

The r-squared values for ozone in Harrisburg are all below the EPA recommended threshold of 0.80, and the rmse values (6 to 7.3 ppb) all exceed the 5 ppb target. Of the three devices, airgradient has the lowest r-squared value and the largest rmse.

4.2 MON - Philadelphia

The Philadelphia Montgomery site (MON) measures PM2.5, CO and NO2.

4.2.1 MON - PM2.5

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-MON A636FW6Q clarity PM2.5 0.942 3.525698e-23 >=0.70 1.3 <=7µg/m^3
Philly-MON airgradient:588c813fe180 airgradient PM2.5 0.92 6.928558e-17 >=0.70 1.5 <=7µg/m^3

Both the airgradient and clarity devices at the Montgomery site have r-squared values above the EPA recommended threshold. Their rmse values are also below the threshold.

4.2.2 MON - NO2

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-MON A636FW6Q clarity NO2 0.23 5.345159e-45 >=0.70 4.5 <=5ppb
Philly-MON airgradient:588c813fe180 airgradient NO2 0.356 1.536529e-61 >=0.70 4.1 <=5ppb

The r-squared values for NO2 at the Montgomery site are far below the threshold, although both rmse values are within the 5 ppb target.

4.2.3 MON - CO

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-MON A636FW6Q clarity CO 0.798 2.429596e-275 >=0.80 0 <=1ppm

The CO r-squared value for the clarity device at the Montgomery site (0.798) falls just short of the 0.80 threshold.

4.3 NEW - Philadelphia

4.3.1 NEW - PM2.5

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-NEW A44MFTF3 clarity PM2.5 0.663 4.785889e-09 >=0.70 1.6 <=7µg/m^3
Philly-NEW MOD-X-00959 quantaq PM2.5 0.741 1.331475e-10 >=0.70 1.4 <=7µg/m^3
Philly-NEW airgradient:588c81359ef8 airgradient PM2.5 0.637 3.595473e-07 >=0.70 1.7 <=7µg/m^3

Surprisingly, two of the three PM2.5 r-squared values at the NEW site are below the threshold: clarity (0.663) and airgradient (0.637) fall short of 0.70, while quantaq (0.741) meets it. This makes NEW the only site where some PM2.5 r-squared values do not meet the threshold. The values are even lower when hourly data is used.

4.3.2 NEW - OZONE

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-NEW A44MFTF3 clarity OZONE 0.824 3.673595e-300 >=0.80 5.4 <=5ppb
Philly-NEW MOD-X-00959 quantaq OZONE 0.89 0e+00 >=0.80 4.3 <=5ppb
Philly-NEW airgradient:588c81359ef8 airgradient OZONE 0.728 1.025456e-184 >=0.80 6.9 <=5ppb

Only airgradient fails to reach the r-squared threshold for ozone at the NEW site (0.728 against a target of 0.80). For rmse, only quantaq (4.3 ppb) is within the 5 ppb target.

4.3.3 NEW - NO2

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-NEW A44MFTF3 clarity NO2 0.277 2.019119e-57 >=0.70 4.8 <=5ppb
Philly-NEW MOD-X-00959 quantaq NO2 0.203 8.03833e-39 >=0.70 5.1 <=5ppb
Philly-NEW airgradient:588c81359ef8 airgradient NO2 0.588 2.949638e-127 >=0.70 3.8 <=5ppb

The r-squared values for NO2 at the NEW site are all below the threshold. Surprisingly, airgradient comes closest to 0.70 (0.588) and also has the lowest rmse.

4.3.4 NEW - CO

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Philly-NEW A44MFTF3 clarity CO 0.481 1.154292e-111 >=0.80 0 <=1ppm
Philly-NEW MOD-X-00959 quantaq CO 0.311 2.646471e-61 >=0.80 0 <=1ppm

Both the clarity and quantaq CO r-squared values at the NEW site are below the threshold, with clarity having the higher r-squared (0.481 against 0.311).

5. Summary table of all parameters at all locations

Location Device.sn Manufacturer Parameter R.squared P.values Target.R.squared rmse Target.rmse
Harrisburg MOD-X-00958 quantaq PM2.5 0.97 2.139003e-47 >=0.70 0.5 <=7µg/m^3
Harrisburg A93NM49L clarity PM2.5 0.957 1.195538e-42 >=0.70 0.6 <=7µg/m^3
Harrisburg airgradient:588c8135a6ec airgradient PM2.5 0.942 1.05461e-25 >=0.70 0.8 <=7µg/m^3
Harrisburg MOD-X-00958 quantaq OZONE 0.723 0e+00 >=0.80 6.3 <=5ppb
Harrisburg A93NM49L clarity OZONE 0.752 0e+00 >=0.80 6 <=5ppb
Harrisburg airgradient:588c8135a6ec airgradient OZONE 0.632 4.744426e-214 >=0.80 7.3 <=5ppb
Philly-MON A636FW6Q clarity PM2.5 0.942 3.525698e-23 >=0.70 1.3 <=7µg/m^3
Philly-MON airgradient:588c813fe180 airgradient PM2.5 0.92 6.928558e-17 >=0.70 1.5 <=7µg/m^3
Philly-MON A636FW6Q clarity NO2 0.23 5.345159e-45 >=0.70 4.5 <=5ppb
Philly-MON airgradient:588c813fe180 airgradient NO2 0.356 1.536529e-61 >=0.70 4.1 <=5ppb
Philly-MON A636FW6Q clarity CO 0.798 2.429596e-275 >=0.80 0 <=1ppm
Philly-NEW A44MFTF3 clarity PM2.5 0.663 4.785889e-09 >=0.70 1.6 <=7µg/m^3
Philly-NEW MOD-X-00959 quantaq PM2.5 0.741 1.331475e-10 >=0.70 1.4 <=7µg/m^3
Philly-NEW airgradient:588c81359ef8 airgradient PM2.5 0.637 3.595473e-07 >=0.70 1.7 <=7µg/m^3
Philly-NEW A44MFTF3 clarity OZONE 0.824 3.673595e-300 >=0.80 5.4 <=5ppb
Philly-NEW MOD-X-00959 quantaq OZONE 0.89 0e+00 >=0.80 4.3 <=5ppb
Philly-NEW airgradient:588c81359ef8 airgradient OZONE 0.728 1.025456e-184 >=0.80 6.9 <=5ppb
Philly-NEW A44MFTF3 clarity NO2 0.277 2.019119e-57 >=0.70 4.8 <=5ppb
Philly-NEW MOD-X-00959 quantaq NO2 0.203 8.03833e-39 >=0.70 5.1 <=5ppb
Philly-NEW airgradient:588c81359ef8 airgradient NO2 0.588 2.949638e-127 >=0.70 3.8 <=5ppb
Philly-NEW A44MFTF3 clarity CO 0.481 1.154292e-111 >=0.80 0 <=1ppm
Philly-NEW MOD-X-00959 quantaq CO 0.311 2.646471e-61 >=0.80 0 <=1ppm
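A summary table like the one above can be assembled by stacking the per-section result data frames. The sketch below does this with `rbind` for two of the sections, using a subset of the columns and values copied from the tables in this report; the variable names are illustrative.

```r
# Per-section results (subset of columns), copied from the tables above.
mon_pm25 <- data.frame(
  Location = "Philly-MON",
  Manufacturer = c("clarity", "airgradient"),
  Parameter = "PM2.5",
  R.squared = c(0.942, 0.920),
  rmse = c(1.3, 1.5),
  stringsAsFactors = FALSE
)
mon_no2 <- data.frame(
  Location = "Philly-MON",
  Manufacturer = c("clarity", "airgradient"),
  Parameter = "NO2",
  R.squared = c(0.230, 0.356),
  rmse = c(4.5, 4.1),
  stringsAsFactors = FALSE
)

# Stack the sections; do.call(rbind, ...) scales to any number of tables
# as long as they share the same columns.
summary_table <- do.call(rbind, list(mon_pm25, mon_no2))
summary_table
```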