'data.frame': 56565 obs. of 9 variables:
$ X : int 0 1 2 3 4 ...
$ time_US_eastern : chr "2026-04-15 20:00:00" "2026-04-15 21:00:00" ...
$ latitude : num 40.2 40.2 ...
$ longitude : num -76.8 -76.8 ...
$ device_sn : chr "MOD-X-00958" "MOD-X-00958" ...
$ manufacturer : chr "quantaq" "quantaq" ...
$ site_name : chr "Harrisburg" "Harrisburg" ...
$ parameter : chr "PM2.5" "PM2.5" ...
$ pm_gases_measures: num 7.29 7.27 ...
Collocation Analysis Harrisburg and Philly
1. Introduction
In this analysis we calculate the r-squared values and root mean square errors (rmse) of the low-cost sensors collocated with EPA monitors in Harrisburg and Philadelphia. The goal is to understand how the data collected by the three air sensor types (airgradient, quantaq and clarity) compare to the data collected by EPA monitors and reported to AirNow. Tabulating the results makes it easy to see how each device performs against the EPA monitors.
Let’s first set up some libraries that are likely to be useful.
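For reference, the two metrics are commonly defined as shown here. The symbols are introduced for illustration only: $s_i$ is a low-cost sensor reading, $r_i$ the matching EPA reading, $\hat{s}_i$ the value fitted by a linear regression of sensor on reference, $\bar{s}$ the sensor mean and $n$ the number of paired readings. The notebook's own calculation may differ in detail, for example in whether rmse is taken before or after a correction is applied.

$$
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(s_i - r_i\right)^2},
\qquad
R^2 = 1 - \frac{\sum_{i=1}^{n}\left(s_i - \hat{s}_i\right)^2}{\sum_{i=1}^{n}\left(s_i - \bar{s}\right)^2}
$$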
2. Data extraction from sensors
The data is extracted from the sensors and cleaned before it is sent to InfluxDB, a cloud-hosted database. In this notebook we read data from InfluxDB and analyze it here.
The sensor data from all three collocation sites was downloaded, cleaned in Python and uploaded to the shared drive. Below we read the CSV that stores the data from these sensors.
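A minimal sketch of the read step, assuming the file is a plain comma-separated CSV with the columns shown in the `str()` output. The real file path is not given in this notebook, so the example writes two sample rows to a temporary file and reads them back; `csv_path` and `sensor_data` are illustrative names.

```r
# Stand-in for the cleaned CSV on the shared drive (the real path is not
# given here): two sample rows modeled on the str() output are written to
# a temporary file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "X,time_US_eastern,latitude,longitude,device_sn,manufacturer,site_name,parameter,pm_gases_measures",
  "0,2026-04-15 20:00:00,40.2,-76.8,MOD-X-00958,quantaq,Harrisburg,PM2.5,7.29",
  "1,2026-04-15 21:00:00,40.2,-76.8,MOD-X-00958,quantaq,Harrisburg,PM2.5,7.27"
), csv_path)

# Read the file, keeping text columns as character rather than factor.
sensor_data <- read.csv(csv_path, stringsAsFactors = FALSE)

# Inspect the column types.
str(sensor_data)
```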
We make sure that the data type of each attribute in the dataset is as expected.
R does not recognize the time column as a time, so we convert it to a date-time type for easier use during analysis.
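The conversion could look like the sketch below. It assumes the timestamps follow the `YYYY-MM-DD HH:MM:SS` layout seen in the `str()` output and that the column name `time_US_eastern` means the `America/New_York` time zone; the time zone name is an assumption, not something the notebook states.

```r
# Two sample rows with the time stored as text, as read from the CSV.
sensor_data <- data.frame(
  time_US_eastern = c("2026-04-15 20:00:00", "2026-04-15 21:00:00"),
  pm_gases_measures = c(7.29, 7.27),
  stringsAsFactors = FALSE
)

# Parse the text into POSIXct date-times (time zone assumed to be US Eastern).
sensor_data$time_US_eastern <- as.POSIXct(
  sensor_data$time_US_eastern,
  format = "%Y-%m-%d %H:%M:%S",
  tz = "America/New_York"
)

# The column is now a date-time class rather than character.
class(sensor_data$time_US_eastern)
```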
| | time_US_eastern | latitude | longitude | device_sn | manufacturer | site_name | parameter | pm_gases_measures |
|---|---|---|---|---|---|---|---|---|
| 46298 | 2026-04-14 20:00:00 | 40.24701 | -76.84699 | 840420430401 | EPA | Harrisburg | OZONE | 36.0 |
| 46322 | 2026-04-14 20:00:00 | 40.24701 | -76.84699 | 840420430401 | EPA | Harrisburg | PM2.5 | 4.8 |
| 46299 | 2026-04-14 21:00:00 | 40.24701 | -76.84699 | 840420430401 | EPA | Harrisburg | OZONE | 33.0 |
3. Data restructuring for RMSE and R-Squared
Let’s filter for only the devices we plan to use in this analysis. For consistency, we use the devices that JustAir is using to calculate the collocation factors.
In addition, we also add the EPA monitors since they are used as the reference. Let’s first identify the serial numbers (sn) of the EPA monitors.
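One way to sketch this step in base R is shown here. The Harrisburg serial numbers are the ones that appear in the results tables, while `OTHER-DEVICE` is a made-up row added only to show that unselected devices are dropped; `epa_sn`, `selected_sn` and `analysis_data` are illustrative names.

```r
# Small sample of the device metadata; "OTHER-DEVICE" is hypothetical.
sensor_data <- data.frame(
  device_sn = c("MOD-X-00958", "A93NM49L", "840420430401", "OTHER-DEVICE"),
  manufacturer = c("quantaq", "clarity", "EPA", "clarity"),
  site_name = "Harrisburg",
  stringsAsFactors = FALSE
)

# Serial numbers of the EPA reference monitors.
epa_sn <- unique(sensor_data$device_sn[sensor_data$manufacturer == "EPA"])

# Representative low-cost devices for Harrisburg (from the results tables).
selected_sn <- c("MOD-X-00958", "A93NM49L", "airgradient:588c8135a6ec")

# Keep only the selected low-cost devices plus the EPA monitors.
analysis_data <- sensor_data[sensor_data$device_sn %in% c(selected_sn, epa_sn), ]
```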
Since we need 24-hour average data for PM2.5, we average the hourly data over each 24-hour period.
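The daily averaging can be sketched with `aggregate()`. This version assumes a calendar day in US Eastern time defines each 24-hour window and applies no completeness rule (such as a minimum number of hours per day); the readings are made-up numbers chosen so the means are easy to check.

```r
# Four illustrative hourly PM2.5 readings spanning two calendar days.
hourly <- data.frame(
  time_US_eastern = as.POSIXct(
    c("2026-04-14 20:00:00", "2026-04-14 21:00:00",
      "2026-04-15 00:00:00", "2026-04-15 01:00:00"),
    tz = "America/New_York"
  ),
  device_sn = "MOD-X-00958",
  site_name = "Harrisburg",
  parameter = "PM2.5",
  pm_gases_measures = c(4, 6, 7, 9)
)

# Calendar date in local (US Eastern) time.
hourly$date <- as.Date(hourly$time_US_eastern, tz = "America/New_York")

# Mean per day, device, site and parameter.
daily <- aggregate(
  pm_gases_measures ~ date + device_sn + site_name + parameter,
  data = hourly,
  FUN = mean
)
daily
```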
4. Calculating the r-squared and rmse
We first make a table of all the thresholds recommended by the EPA for comparison.
For each location we find the r-squared and rmse using the EPA data as the reference for each low-cost ‘representative’ device per manufacturer. For reusability and simplicity, let’s write a function that directly builds data frames of r-squared and rmse based on parameter and location.
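The threshold table can be written as a small data frame. The target strings are the ones shown in the `Target.R.squared` and `Target.rmse` columns of the results tables; the numeric columns `min_r_squared` and `max_rmse` are additions made here so the targets can be compared programmatically.

```r
# EPA targets per parameter, as they appear in the results tables.
epa_targets <- data.frame(
  Parameter = c("PM2.5", "OZONE", "NO2", "CO"),
  Target.R.squared = c(">=0.70", ">=0.80", ">=0.70", ">=0.80"),
  Target.rmse = c("<=7\u00b5g/m^3", "<=5ppb", "<=5ppb", "<=1ppm"),
  # Numeric copies of the targets (illustrative column names).
  min_r_squared = c(0.70, 0.80, 0.70, 0.80),
  max_rmse = c(7, 5, 5, 1),
  stringsAsFactors = FALSE
)
epa_targets
```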
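A function of this kind might look like the sketch below. It is not the notebook's actual function: `collocation_stats` and its arguments are hypothetical, r-squared and the p-value are taken from a simple linear regression of sensor on reference, and rmse is computed on the raw paired differences. The toy readings are made up so the result is easy to verify.

```r
# Hypothetical helper: one row of statistics per low-cost device.
collocation_stats <- function(data, location, parameter, reference_sn, device_sns) {
  subset_data <- data[data$site_name == location & data$parameter == parameter, ]
  reference <- subset_data[subset_data$device_sn == reference_sn,
                           c("time_US_eastern", "pm_gases_measures")]
  names(reference)[2] <- "reference"
  rows <- lapply(device_sns, function(sn) {
    device <- subset_data[subset_data$device_sn == sn,
                          c("time_US_eastern", "manufacturer", "pm_gases_measures")]
    names(device)[3] <- "sensor"
    # Pair sensor and reference readings taken at the same time.
    paired <- merge(device, reference, by = "time_US_eastern")
    fit <- summary(lm(sensor ~ reference, data = paired))
    data.frame(
      Location = location,
      Device.sn = sn,
      Manufacturer = paired$manufacturer[1],
      Parameter = parameter,
      R.squared = round(fit$r.squared, 3),
      P.values = fit$coefficients["reference", "Pr(>|t|)"],
      rmse = round(sqrt(mean((paired$sensor - paired$reference)^2)), 1),
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data: five paired hours for one EPA monitor and one low-cost device.
times <- seq(as.POSIXct("2026-04-14 20:00:00", tz = "America/New_York"),
             by = "hour", length.out = 5)
toy <- data.frame(
  time_US_eastern = c(times, times),
  device_sn = rep(c("840420430401", "MOD-X-00958"), each = 5),
  manufacturer = rep(c("EPA", "quantaq"), each = 5),
  site_name = "Harrisburg",
  parameter = "PM2.5",
  pm_gases_measures = c(1, 2, 3, 4, 5, 1.1, 1.9, 3.1, 3.9, 5.1),
  stringsAsFactors = FALSE
)

stats <- collocation_stats(toy, "Harrisburg", "PM2.5", "840420430401", "MOD-X-00958")
stats
```

Merging on the timestamp keeps only the hours where both the sensor and the reference reported, so gaps in either series do not distort the comparison.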
We section the calculations by location for organization.
4.1 Harrisburg
The EPA monitors in Harrisburg only measure PM2.5 and Ozone.
4.1.1 Harrisburg PM2.5
We use the function to find the PM2.5 r-squared and rmse values in Harrisburg for the three low-cost sensors.
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Harrisburg | MOD-X-00958 | quantaq | PM2.5 | 0.97 | 2.139003e-47 | >=0.70 | 0.5 | <=7µg/m^3 |
| Harrisburg | A93NM49L | clarity | PM2.5 | 0.957 | 1.195538e-42 | >=0.70 | 0.6 | <=7µg/m^3 |
| Harrisburg | airgradient:588c8135a6ec | airgradient | PM2.5 | 0.942 | 1.05461e-25 | >=0.70 | 0.8 | <=7µg/m^3 |
Based on the values above, the R.squared and rmse values for all three devices (quantaq, clarity and airgradient) are within the EPA recommended range.
4.1.2 Harrisburg Ozone
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Harrisburg | MOD-X-00958 | quantaq | OZONE | 0.723 | 0e+00 | >=0.80 | 6.3 | <=5ppb |
| Harrisburg | A93NM49L | clarity | OZONE | 0.752 | 0e+00 | >=0.80 | 6 | <=5ppb |
| Harrisburg | airgradient:588c8135a6ec | airgradient | OZONE | 0.632 | 4.744426e-214 | >=0.80 | 7.3 | <=5ppb |
The r-squared values for ozone in Harrisburg are all below the EPA recommended threshold, and the rmse values all exceed the 5 ppb target. Airgradient has the lowest r-squared value and the largest rmse.
4.2 MON - Philadelphia
The Philadelphia Montgomery site measures PM2.5, CO and NO2.
4.2.1 MON - PM2.5
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-MON | A636FW6Q | clarity | PM2.5 | 0.942 | 3.525698e-23 | >=0.70 | 1.3 | <=7µg/m^3 |
| Philly-MON | airgradient:588c813fe180 | airgradient | PM2.5 | 0.92 | 6.928558e-17 | >=0.70 | 1.5 | <=7µg/m^3 |
Both the airgradient and clarity devices at the Montgomery site have r-squared values above the EPA recommended threshold. Their rmse values are also below the threshold.
4.2.2 MON - NO2
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-MON | A636FW6Q | clarity | NO2 | 0.23 | 5.345159e-45 | >=0.70 | 4.5 | <=5ppb |
| Philly-MON | airgradient:588c813fe180 | airgradient | NO2 | 0.356 | 1.536529e-61 | >=0.70 | 4.1 | <=5ppb |
The r-squared values for NO2 at the Montgomery site are far below the threshold, although the rmse values are within the 5 ppb target.
4.2.3 MON - CO
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-MON | A636FW6Q | clarity | CO | 0.798 | 2.429596e-275 | >=0.80 | 0 | <=1ppm |
The CO r-squared value for the clarity device at the Montgomery site (0.798) falls just short of the 0.80 threshold.
4.3 NEW - Philadelphia
4.3.1 NEW - PM2.5
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-NEW | A44MFTF3 | clarity | PM2.5 | 0.663 | 4.785889e-09 | >=0.70 | 1.6 | <=7µg/m^3 |
| Philly-NEW | MOD-X-00959 | quantaq | PM2.5 | 0.741 | 1.331475e-10 | >=0.70 | 1.4 | <=7µg/m^3 |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | PM2.5 | 0.637 | 3.595473e-07 | >=0.70 | 1.7 | <=7µg/m^3 |
Surprisingly, two of the three PM2.5 r-squared values at the NEW site are below the threshold: clarity (0.663) and airgradient (0.637). Only quantaq (0.741) passes. This makes NEW the only site where some PM2.5 r-squared values do not meet the threshold. The values are even lower when hourly data is used.
4.3.2 NEW - OZONE
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-NEW | A44MFTF3 | clarity | OZONE | 0.824 | 3.673595e-300 | >=0.80 | 5.4 | <=5ppb |
| Philly-NEW | MOD-X-00959 | quantaq | OZONE | 0.89 | 0e+00 | >=0.80 | 4.3 | <=5ppb |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | OZONE | 0.728 | 1.025456e-184 | >=0.80 | 6.9 | <=5ppb |
Only airgradient fails to reach the r-squared threshold for ozone at the NEW site (0.728 against 0.80). For rmse, only quantaq is within the 5 ppb target.
4.3.3 NEW - NO2
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-NEW | A44MFTF3 | clarity | NO2 | 0.277 | 2.019119e-57 | >=0.70 | 4.8 | <=5ppb |
| Philly-NEW | MOD-X-00959 | quantaq | NO2 | 0.203 | 8.03833e-39 | >=0.70 | 5.1 | <=5ppb |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | NO2 | 0.588 | 2.949638e-127 | >=0.70 | 3.8 | <=5ppb |
The r-squared values for NO2 at the NEW site are all below the threshold. Surprisingly, airgradient comes closest to 0.70 (0.588) and it also has the lowest rmse.
4.3.4 NEW - CO
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Philly-NEW | A44MFTF3 | clarity | CO | 0.481 | 1.154292e-111 | >=0.80 | 0 | <=1ppm |
| Philly-NEW | MOD-X-00959 | quantaq | CO | 0.311 | 2.646471e-61 | >=0.80 | 0 | <=1ppm |
Both the clarity and quantaq CO r-squared values at the NEW site are below the threshold, with clarity having the higher value.
5. Summary table of all parameters at all locations
| Location | Device.sn | Manufacturer | Parameter | R.squared | P.values | Target.R.squared | rmse | Target.rmse |
|---|---|---|---|---|---|---|---|---|
| Harrisburg | MOD-X-00958 | quantaq | PM2.5 | 0.97 | 2.139003e-47 | >=0.70 | 0.5 | <=7µg/m^3 |
| Harrisburg | A93NM49L | clarity | PM2.5 | 0.957 | 1.195538e-42 | >=0.70 | 0.6 | <=7µg/m^3 |
| Harrisburg | airgradient:588c8135a6ec | airgradient | PM2.5 | 0.942 | 1.05461e-25 | >=0.70 | 0.8 | <=7µg/m^3 |
| Harrisburg | MOD-X-00958 | quantaq | OZONE | 0.723 | 0e+00 | >=0.80 | 6.3 | <=5ppb |
| Harrisburg | A93NM49L | clarity | OZONE | 0.752 | 0e+00 | >=0.80 | 6 | <=5ppb |
| Harrisburg | airgradient:588c8135a6ec | airgradient | OZONE | 0.632 | 4.744426e-214 | >=0.80 | 7.3 | <=5ppb |
| Philly-MON | A636FW6Q | clarity | PM2.5 | 0.942 | 3.525698e-23 | >=0.70 | 1.3 | <=7µg/m^3 |
| Philly-MON | airgradient:588c813fe180 | airgradient | PM2.5 | 0.92 | 6.928558e-17 | >=0.70 | 1.5 | <=7µg/m^3 |
| Philly-MON | A636FW6Q | clarity | NO2 | 0.23 | 5.345159e-45 | >=0.70 | 4.5 | <=5ppb |
| Philly-MON | airgradient:588c813fe180 | airgradient | NO2 | 0.356 | 1.536529e-61 | >=0.70 | 4.1 | <=5ppb |
| Philly-MON | A636FW6Q | clarity | CO | 0.798 | 2.429596e-275 | >=0.80 | 0 | <=1ppm |
| Philly-NEW | A44MFTF3 | clarity | PM2.5 | 0.663 | 4.785889e-09 | >=0.70 | 1.6 | <=7µg/m^3 |
| Philly-NEW | MOD-X-00959 | quantaq | PM2.5 | 0.741 | 1.331475e-10 | >=0.70 | 1.4 | <=7µg/m^3 |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | PM2.5 | 0.637 | 3.595473e-07 | >=0.70 | 1.7 | <=7µg/m^3 |
| Philly-NEW | A44MFTF3 | clarity | OZONE | 0.824 | 3.673595e-300 | >=0.80 | 5.4 | <=5ppb |
| Philly-NEW | MOD-X-00959 | quantaq | OZONE | 0.89 | 0e+00 | >=0.80 | 4.3 | <=5ppb |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | OZONE | 0.728 | 1.025456e-184 | >=0.80 | 6.9 | <=5ppb |
| Philly-NEW | A44MFTF3 | clarity | NO2 | 0.277 | 2.019119e-57 | >=0.70 | 4.8 | <=5ppb |
| Philly-NEW | MOD-X-00959 | quantaq | NO2 | 0.203 | 8.03833e-39 | >=0.70 | 5.1 | <=5ppb |
| Philly-NEW | airgradient:588c81359ef8 | airgradient | NO2 | 0.588 | 2.949638e-127 | >=0.70 | 3.8 | <=5ppb |
| Philly-NEW | A44MFTF3 | clarity | CO | 0.481 | 1.154292e-111 | >=0.80 | 0 | <=1ppm |
| Philly-NEW | MOD-X-00959 | quantaq | CO | 0.311 | 2.646471e-61 | >=0.80 | 0 | <=1ppm |
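To read pass or fail straight off the summary, the targets can be joined to the results and compared numerically. The sketch below uses four rows copied from the summary table. The columns `min_r_squared`, `max_rmse`, `meets_r_squared` and `meets_rmse` are illustrative names, not part of the original output.

```r
# Four rows copied from the summary table.
summary_table <- data.frame(
  Location = c("Harrisburg", "Harrisburg", "Philly-MON", "Philly-NEW"),
  Manufacturer = c("quantaq", "quantaq", "clarity", "clarity"),
  Parameter = c("PM2.5", "OZONE", "CO", "OZONE"),
  R.squared = c(0.97, 0.723, 0.798, 0.824),
  rmse = c(0.5, 6.3, 0, 5.4),
  stringsAsFactors = FALSE
)

# Numeric versions of the targets shown in the Target columns.
targets <- data.frame(
  Parameter = c("PM2.5", "OZONE", "NO2", "CO"),
  min_r_squared = c(0.70, 0.80, 0.70, 0.80),
  max_rmse = c(7, 5, 5, 1),
  stringsAsFactors = FALSE
)

# Attach the targets to each result row and flag whether each one is met.
checked <- merge(summary_table, targets, by = "Parameter")
checked$meets_r_squared <- checked$R.squared >= checked$min_r_squared
checked$meets_rmse <- checked$rmse <= checked$max_rmse
checked
```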