This tutorial demonstrates how to import, clean, analyze, and visualize meteorological tracking data in R. We’ll compare sensor measurements against NOAA weather data to evaluate sensor accuracy and determine which device performs better.
First, we’ll install and load all necessary packages:
# Install packages if needed (uncomment the lines below)
# install.packages(c("readxl", "dplyr", "ggplot2", "leaflet", "lubridate", "summarytools", "httr", "jsonlite", "plotly"))
library(readxl) # For reading Excel files
library(dplyr) # For data manipulation
library(ggplot2) # For plotting
library(leaflet) # For interactive maps
library(lubridate) # For date-time manipulation
library(summarytools)# For summary statistics
library(httr) # For API calls
library(jsonlite) # For JSON parsing
library(plotly) # For interactive plots
# Import the Excel file
weather_data <- read_excel("CDT_Meteo_Tracker_Data.xlsx")
# View the first few rows
head(weather_data)
## # A tibble: 6 × 11
## TimeStampFull Date Time Lat Lon alt_m Temp_F
## <chr> <dttm> <dttm> <dbl> <dbl> <dbl> <dbl>
## 1 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:32 42.7 -109. 2779 10.6
## 2 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:35 42.7 -109. 2778 10.6
## 3 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:38 42.7 -109. 2779 10.6
## 4 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:41 42.7 -109. 2779 10.6
## 5 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:44 42.7 -109. 2778 10.6
## 6 2025-09-09T0… 2025-09-09 00:00:00 1899-12-31 08:53:47 42.7 -109. 2779 10.6
## # ℹ 4 more variables: Hum <dbl>, Press <dbl>, DP <dbl>, Device <chr>
Pay special attention to the timestamp, date, and time variables:
# Convert data types
weather_data <- weather_data %>%
mutate(
# Parse ISO 8601 timestamp with timezone
TimeStampFull = ymd_hms(TimeStampFull, tz = "UTC"),
# Parse date
Date = ymd(Date),
# Parse time (HMS format)
Time = hms::as_hms(Time),
# Ensure numeric variables are numeric
Lat = as.numeric(Lat),
Lon = as.numeric(Lon),
alt_m = as.numeric(alt_m),
Hum = as.numeric(Hum),
Press = as.numeric(Press),
DP = as.numeric(DP),
# Ensure Device is a factor
Device = as.factor(Device)
)
# Check if temperature is actually in Celsius (common issue with sensor data)
# If the column is named Temp_F but values are small (typical for C), convert to F
if (mean(weather_data$Temp_F, na.rm = TRUE) < 30) {
cat("NOTE: Temperature values appear to be in Celsius. Converting to Fahrenheit...\n")
weather_data <- weather_data %>%
mutate(
Temp_C = Temp_F, # Save original Celsius values
Temp_F = (Temp_F * 9/5) + 32 # Convert to Fahrenheit
)
cat("Conversion complete. Original Celsius values saved in 'Temp_C' column.\n\n")
} else {
# Ensure Temp_F is numeric
weather_data$Temp_F <- as.numeric(weather_data$Temp_F)
}
## NOTE: Temperature values appear to be in Celsius. Converting to Fahrenheit...
## Conversion complete. Original Celsius values saved in 'Temp_C' column.
# Check the structure
str(weather_data)
## tibble [13,647 × 12] (S3: tbl_df/tbl/data.frame)
## $ TimeStampFull: POSIXct[1:13647], format: "2025-09-09 14:53:32" "2025-09-09 14:53:35" ...
## $ Date : Date[1:13647], format: "2025-09-09" "2025-09-09" ...
## $ Time : 'hms' num [1:13647] 08:53:32 08:53:35 08:53:38 08:53:41 ...
## ..- attr(*, "units")= chr "secs"
## $ Lat : num [1:13647] 42.7 42.7 42.7 42.7 42.7 ...
## $ Lon : num [1:13647] -109 -109 -109 -109 -109 ...
## $ alt_m : num [1:13647] 2779 2778 2779 2779 2778 ...
## $ Temp_F : num [1:13647] 51.1 51.1 51.1 51.1 51.1 ...
## $ Hum : num [1:13647] 69 69 68 68 68 68 68 67 66 66 ...
## $ Press : num [1:13647] 733 733 733 733 733 733 733 733 733 733 ...
## $ DP : num [1:13647] 5.3 5.2 5.1 5.1 5.1 5.1 5.1 4.9 4.6 4.6 ...
## $ Device : Factor w/ 2 levels "Purple","Red": 2 2 2 2 2 2 2 2 2 2 ...
## $ Temp_C : num [1:13647] 10.6 10.6 10.6 10.6 10.6 10.6 10.7 10.7 10.6 10.5 ...
# Display temperature range to verify
cat("\nTemperature range after conversion:\n")
##
## Temperature range after conversion:
cat("Min:", round(min(weather_data$Temp_F, na.rm = TRUE), 1), "°F\n")
## Min: 29.7 °F
cat("Max:", round(max(weather_data$Temp_F, na.rm = TRUE), 1), "°F\n")
## Max: 92.5 °F
cat("Mean:", round(mean(weather_data$Temp_F, na.rm = TRUE), 1), "°F\n")
## Mean: 52.9 °F
# Create color palette for devices (Red and Purple)
device_colors <- colorFactor(palette = c("red", "purple"),
domain = weather_data$Device)
# Create the leaflet map
leaflet_map <- leaflet(weather_data) %>%
# Add multiple base layers
addProviderTiles(providers$OpenTopoMap, group = "OpenTopoMap") %>%
addProviderTiles(providers$Esri.WorldTopoMap, group = "ESRI Topo") %>%
addProviderTiles("Esri.WorldImagery", group = "Satellite") %>%
addWMSTiles(
baseUrl = "https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WMSServer?",
layers = "0",
options = WMSTileOptions(format = "image/png", transparent = TRUE),
attribution = "USGS The National Map",
group = "USGS Topo"
) %>%
addProviderTiles(providers$OpenStreetMap, group = "OpenStreetMap") %>%
# Add markers with popups and labels
addCircleMarkers(
lng = ~Lon,
lat = ~Lat,
radius = 5,
color = ~device_colors(Device),
fillOpacity = 0.7,
stroke = TRUE,
weight = 1,
# Hover label - expanded to include all key variables
label = ~paste0(
"Device: ", Device,
" | Temp: ", round(Temp_F, 1), "°F",
" | Hum: ", Hum, "%",
" | Press: ", Press, " hPa",
" | DP: ", round(DP, 1), "°F"
),
# Click popup
popup = ~paste0(
"<strong>Device:</strong> ", Device, "<br/>",
"<strong>Temperature:</strong> ", round(Temp_F, 1), "°F<br/>",
"<strong>Humidity:</strong> ", Hum, "%<br/>",
"<strong>Pressure:</strong> ", Press, " hPa<br/>",
"<strong>Dew Point:</strong> ", round(DP, 1), "°F<br/>",
"<strong>Altitude:</strong> ", round(alt_m, 1), " m<br/>",
"<strong>Time:</strong> ", format(TimeStampFull, "%Y-%m-%d %H:%M:%S")
)
) %>%
# Add layer control with radio buttons
addLayersControl(
baseGroups = c("OpenTopoMap", "ESRI Topo", "Satellite", "USGS Topo", "OpenStreetMap"),
options = layersControlOptions(collapsed = FALSE)
) %>%
# Add legend
addLegend(
position = "bottomright",
pal = device_colors,
values = ~Device,
title = "Device",
opacity = 1
) %>%
# Add scale bar
addScaleBar(position = "bottomleft")
# Display the map
leaflet_map
Exercise 1: In the map above, try changing the base layer that loads by default. Currently it loads OpenTopoMap first - can you make it load a different layer first instead? (Hint: The order in which you add the provider tiles matters!)
# Display dataset dimensions
cat("Dataset dimensions:", nrow(weather_data), "rows x", ncol(weather_data), "columns\n\n")
## Dataset dimensions: 13647 rows x 12 columns
# Check date range
cat("Dataset date range:\n")
## Dataset date range:
cat("Earliest:", as.character(min(weather_data$Date, na.rm = TRUE)), "\n")
## Earliest: 2024-09-05
cat("Latest:", as.character(max(weather_data$Date, na.rm = TRUE)), "\n\n")
## Latest: 2025-09-11
# Summary of all variables
summary(weather_data)
## TimeStampFull Date Time
## Min. :2024-09-05 17:27:21 Min. :2024-09-05 Min. :00:00:20.000000
## 1st Qu.:2024-09-15 19:31:45 1st Qu.:2024-09-15 1st Qu.:08:38:12.500000
## Median :2025-09-03 03:10:32 Median :2025-09-02 Median :13:31:44.000000
## Mean :2025-05-03 09:33:49 Mean :2025-05-02 Mean :12:49:47.503041
## 3rd Qu.:2025-09-07 13:52:33 3rd Qu.:2025-09-07 3rd Qu.:17:18:42.000000
## Max. :2025-09-12 00:47:54 Max. :2025-09-11 Max. :23:59:43.000000
## Lat Lon alt_m Temp_F
## Min. :42.52 Min. :-109.8 Min. :2178 Min. :29.66
## 1st Qu.:42.70 1st Qu.:-109.5 1st Qu.:2824 1st Qu.:42.26
## Median :42.79 Median :-109.3 Median :2985 Median :52.88
## Mean :42.80 Mean :-109.3 Mean :2950 Mean :52.93
## 3rd Qu.:42.93 3rd Qu.:-109.1 3rd Qu.:3118 3rd Qu.:62.60
## Max. :43.02 Max. :-108.9 Max. :3561 Max. :92.48
## Hum Press DP Device
## Min. :13.00 Min. :660.0 Min. :-5.6000 Purple:4788
## 1st Qu.:32.00 1st Qu.:708.0 1st Qu.:-1.5000 Red :8859
## Median :49.00 Median :718.0 Median : 0.5000
## Mean :51.54 Mean :718.8 Mean : 0.6537
## 3rd Qu.:70.00 3rd Qu.:728.0 3rd Qu.: 2.6000
## Max. :99.00 Max. :780.0 Max. :13.4000
## Temp_C
## Min. :-1.30
## 1st Qu.: 5.70
## Median :11.60
## Mean :11.63
## 3rd Qu.:17.00
## Max. :33.60
# Summary statistics for numeric variables by device
weather_data %>%
group_by(Device) %>%
summarise(
n = n(),
Mean_Temp_F = mean(Temp_F, na.rm = TRUE),
SD_Temp_F = sd(Temp_F, na.rm = TRUE),
Mean_Humidity = mean(Hum, na.rm = TRUE),
SD_Humidity = sd(Hum, na.rm = TRUE),
Mean_Pressure = mean(Press, na.rm = TRUE),
SD_Pressure = sd(Press, na.rm = TRUE),
Mean_Altitude_m = mean(alt_m, na.rm = TRUE),
Min_Altitude_m = min(alt_m, na.rm = TRUE),
Max_Altitude_m = max(alt_m, na.rm = TRUE)
) %>%
knitr::kable(digits = 2, caption = "Summary Statistics by Device")
| Device | n | Mean_Temp_F | SD_Temp_F | Mean_Humidity | SD_Humidity | Mean_Pressure | SD_Pressure | Mean_Altitude_m | Min_Altitude_m | Max_Altitude_m |
|---|---|---|---|---|---|---|---|---|---|---|
| Purple | 4788 | 54.72 | 11.62 | 50.43 | 21.83 | 719.40 | 26.97 | 2915.23 | 2178 | 3561 |
| Red | 8859 | 51.97 | 13.20 | 52.14 | 21.24 | 718.43 | 10.30 | 2968.11 | 2749 | 3320 |
We’ll use the Open-Meteo API to retrieve historical weather data. There is a limit on how much data you can take, if you exceed this you will be cut-off for a time period. If we want to eventually download all the data corresponding for all of our points we will need to be a bit strategic about doing it slower or in chunks. Here are the terms for the free tier of Open_Meteo: Less than 10’000 API calls per day, 5’000 per hour and 600 per minute. For more information visit: https://open-meteo.com/
# Function to get weather data from Open-Meteo API
get_weather_data <- function(lat, lon, datetime, debug = FALSE) {
# Convert datetime to date
date <- as.Date(datetime)
# Build API URL
base_url <- "https://archive-api.open-meteo.com/v1/archive"
url <- paste0(
base_url,
"?latitude=", lat,
"&longitude=", lon,
"&start_date=", date,
"&end_date=", date,
"&hourly=temperature_2m,relative_humidity_2m,surface_pressure",
"&temperature_unit=fahrenheit"
)
# Default return values
default_result <- list(
NOAA_Temp_F = NA_real_,
NOAA_Humidity = NA_real_,
NOAA_Pressure_hPa = NA_real_
)
# Make API request
result <- tryCatch({
response <- GET(url)
if (debug) {
cat("URL:", url, "\n")
cat("Status:", status_code(response), "\n")
}
if (status_code(response) == 200) {
data <- content(response, "parsed")
# Check if data exists
if (is.null(data$hourly)) {
if (debug) cat("No hourly data in response\n")
return(default_result)
}
if (is.null(data$hourly$time) || length(data$hourly$time) == 0) {
if (debug) cat("No time data\n")
return(default_result)
}
if (is.null(data$hourly$temperature_2m) || length(data$hourly$temperature_2m) == 0) {
if (debug) cat("No temperature data\n")
return(default_result)
}
hourly_data <- data$hourly
# Unwrap lists if needed
hourly_data$time <- unlist(hourly_data$time)
hourly_data$temperature_2m <- unlist(hourly_data$temperature_2m)
hourly_data$relative_humidity_2m <- unlist(hourly_data$relative_humidity_2m)
hourly_data$surface_pressure <- unlist(hourly_data$surface_pressure)
if (debug) {
cat("Number of hourly records:", length(hourly_data$time), "\n")
cat("First time value:", hourly_data$time[1], "\n")
cat("Target datetime:", as.character(datetime), "\n")
cat("Target hour:", hour(datetime), "\n")
}
# Get the hour from our datetime (0-23)
target_hour <- hour(datetime)
# The API returns 24 hours (0-23) for the requested date
# Index is target_hour + 1 (since R is 1-indexed)
idx <- target_hour + 1
if (debug) {
cat("Using index:", idx, "\n")
cat("Temperature at index:", hourly_data$temperature_2m[idx], "\n")
}
# Safely extract values with bounds checking
temp_val <- if (idx > 0 && idx <= length(hourly_data$temperature_2m)) {
val <- hourly_data$temperature_2m[idx]
if (!is.null(val)) as.numeric(val) else NA_real_
} else NA_real_
hum_val <- if (idx > 0 && idx <= length(hourly_data$relative_humidity_2m)) {
val <- hourly_data$relative_humidity_2m[idx]
if (!is.null(val)) as.numeric(val) else NA_real_
} else NA_real_
press_val <- if (idx > 0 && idx <= length(hourly_data$surface_pressure)) {
val <- hourly_data$surface_pressure[idx]
if (!is.null(val)) as.numeric(val) else NA_real_
} else NA_real_
if (debug) {
cat("Extracted - Temp:", temp_val, "Hum:", hum_val, "Press:", press_val, "\n\n")
}
return(list(
NOAA_Temp_F = temp_val,
NOAA_Humidity = hum_val,
NOAA_Pressure_hPa = press_val
))
} else {
if (debug) cat("API returned error status\n")
return(default_result)
}
}, error = function(e) {
if (debug) cat("Error:", e$message, "\n")
return(default_result)
})
return(result)
}
# Sample 100 points to demonstrate (to avoid excessive API calls and wait time)
set.seed(123)
sample_indices <- sample(1:nrow(weather_data), min(100, nrow(weather_data)))
# Initialize columns
weather_data$NOAA_Temp_F <- NA
weather_data$NOAA_Humidity <- NA
weather_data$NOAA_Pressure_hPa <- NA
# Test with sample point to see if API is working
cat("Testing API with first sample point...\n")
## Testing API with first sample point...
test_result <- get_weather_data(
weather_data$Lat[sample_indices[1]],
weather_data$Lon[sample_indices[1]],
weather_data$TimeStampFull[sample_indices[1]],
debug = TRUE
)
## URL: https://archive-api.open-meteo.com/v1/archive?latitude=42.585104&longitude=-109.086524&start_date=2025-09-11&end_date=2025-09-11&hourly=temperature_2m,relative_humidity_2m,surface_pressure&temperature_unit=fahrenheit
## Status: 200
## Number of hourly records: 24
## First time value: 2025-09-11T00:00
## Target datetime: 2025-09-11 16:30:02
## Target hour: 16
## Using index: 17
## Temperature at index: 55.3
## Extracted - Temp: 55.3 Hum: 51 Press: 742.8
if (!is.na(test_result$NOAA_Temp_F)) {
cat("SUCCESS! API is returning data.\n\n")
} else {
cat("WARNING: API test failed. Check the debug output above.\n\n")
}
## SUCCESS! API is returning data.
# Fetch NOAA data for sample points
cat("Fetching NOAA weather data for", length(sample_indices), "sample points...\n")
## Fetching NOAA weather data for 100 sample points...
cat("This will take approximately", length(sample_indices) * 0.5, "seconds.\n\n")
## This will take approximately 50 seconds.
# Counter for successful retrievals
success_count <- 0
fail_count <- 0
for (i in sample_indices) {
res <- get_weather_data(
weather_data$Lat[i],
weather_data$Lon[i],
weather_data$TimeStampFull[i]
)
weather_data$NOAA_Temp_F[i] <- res$NOAA_Temp_F
weather_data$NOAA_Humidity[i] <- res$NOAA_Humidity
weather_data$NOAA_Pressure_hPa[i] <- res$NOAA_Pressure_hPa
# Track success/failure
if (!is.na(weather_data$NOAA_Temp_F[i])) {
success_count <- success_count + 1
} else {
fail_count <- fail_count + 1
}
# add countdown feedback to know how long data retrieval is taking
if (i %% 10 == 0) {
cat("Processing point", which(sample_indices == i), "of", length(sample_indices),
"(", round(which(sample_indices == i) / length(sample_indices) * 100, 1), "% complete )\n")
}
Sys.sleep(0.5)
}
## Processing point 25 of 100 ( 25 % complete )
## Processing point 30 of 100 ( 30 % complete )
## Processing point 36 of 100 ( 36 % complete )
## Processing point 54 of 100 ( 54 % complete )
## Processing point 57 of 100 ( 57 % complete )
## Processing point 93 of 100 ( 93 % complete )
## Processing point 96 of 100 ( 96 % complete )
## Processing point 99 of 100 ( 99 % complete )
cat("\nNOAA data fetch complete!\n")
##
## NOAA data fetch complete!
cat("Successfully retrieved:", success_count, "of", length(sample_indices), "sample points\n")
## Successfully retrieved: 100 of 100 sample points
cat("Failed to retrieve:", fail_count, "points\n\n")
## Failed to retrieve: 0 points
# Create comparison dataset with only points that have NOAA data
comparison_data <- weather_data %>%
filter(!is.na(NOAA_Temp_F)) %>%
mutate(
Temp_Diff = Temp_F - NOAA_Temp_F,
Temp_Pct_Diff = ((Temp_F - NOAA_Temp_F) / NOAA_Temp_F) * 100,
Temp_Abs_Diff = abs(Temp_Diff),
Hum_Diff = Hum - NOAA_Humidity,
Hum_Abs_Diff = abs(Hum_Diff),
Press_Diff = Press - NOAA_Pressure_hPa,
Press_Abs_Diff = abs(Press_Diff)
)
cat("Comparison dataset created with", nrow(comparison_data), "points\n")
## Comparison dataset created with 100 points
Note: We’re sampling 100 points to demonstrate the API integration without excessive wait times. To fetch NOAA data for all points, change the sample line accordingly.
Now let’s analyze how well each sensor performs compared to NOAA data.
#Option to Load in comparison data instead of making API call if you have the csv file saved to your working directory
# library(readr)
# comparison_data <- read_csv("CDT_Weather_Comparison.csv")
accuracy_summary <- comparison_data %>%
group_by(Device) %>%
summarise(
n = n(),
# Temperature accuracy
Mean_Temp_Error = mean(Temp_Diff, na.rm = TRUE),
MAE_Temp = mean(Temp_Abs_Diff, na.rm = TRUE),
RMSE_Temp = sqrt(mean(Temp_Diff^2, na.rm = TRUE)),
# Humidity accuracy
Mean_Hum_Error = mean(Hum_Diff, na.rm = TRUE),
MAE_Hum = mean(Hum_Abs_Diff, na.rm = TRUE),
# Pressure accuracy
Mean_Press_Error = mean(Press_Diff, na.rm = TRUE),
MAE_Press = mean(Press_Abs_Diff, na.rm = TRUE)
)
knitr::kable(accuracy_summary, digits = 2,
caption = "Sensor Accuracy Comparison (Lower values = better accuracy)")
| Device | n | Mean_Temp_Error | MAE_Temp | RMSE_Temp | Mean_Hum_Error | MAE_Hum | Mean_Press_Error | MAE_Press |
|---|---|---|---|---|---|---|---|---|
| Purple | 45 | 0.84 | 3.89 | 4.65 | 6.71 | 9.56 | -4.23 | 4.31 |
| Red | 55 | -0.83 | 8.30 | 11.59 | 3.64 | 11.93 | -7.84 | 7.98 |
Interpretation: - Mean Error: Average difference (positive = sensor reads higher than NOAA) - MAE (Mean Absolute Error): Average magnitude of errors (ignoring direction) - RMSE (Root Mean Square Error): Emphasizes larger errors
ggplot(comparison_data, aes(x = Hum_Diff, color = Device, fill = Device)) +
geom_density(alpha = 0.3, linewidth = 1.2) +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", linewidth = 1) +
labs(title = "Humidity Error Distribution (Sensor - NOAA)",
subtitle = "Curves centered near 0 indicate accurate measurements",
x = "Humidity Error (%)",
y = "Density") +
theme_minimal() +
scale_fill_manual(values = c("Red" = "red", "Purple" = "purple")) +
scale_color_manual(values = c("Red" = "darkred", "Purple" = "darkviolet"))
ggplot(comparison_data, aes(x = Temp_Diff, color = Device, fill = Device)) +
geom_density(alpha = 0.3, linewidth = 1.2) +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", linewidth = 1) +
labs(title = "Temperature Error Distribution (Sensor - NOAA)",
subtitle = "Curves centered near 0 indicate accurate measurements",
x = "Temperature Error (°F)",
y = "Density") +
theme_minimal() +
scale_fill_manual(values = c("Red" = "red", "Purple" = "purple")) +
scale_color_manual(values = c("Red" = "darkred", "Purple" = "darkviolet"))
Exercise 2: Modify the code above to create Pressure difference distributions instead of Temperature. Change the x-axis variable and update the title and x-axis label accordingly. Hint: look at what the column names for comparison_data are.
## put your exercise 1 code here
ggplot(comparison_data, aes(x = Device, y = Temp_Abs_Diff, fill = Device)) +
geom_boxplot() +
labs(title = "Absolute Temperature Error by Device",
subtitle = "Lower values and tighter boxes indicate better accuracy",
x = "Device",
y = "Absolute Temperature Error (°F)") +
theme_minimal() +
scale_fill_manual(values = c("Red" = "red", "Purple" = "purple"))
Exercise 3: Modify the code above to create a box plot showing the Pressure absolute difference distribution instead of Temperature. Change the y-axis variable and update the title and y-axis label accordingly.
## put your exercise 2 code here
# Categorize errors as small medium and large. change according to what might make the most sense
comparison_data <- comparison_data %>%
mutate(
Temp_Error_Category = case_when(
Temp_Abs_Diff < 3 ~ "Small (<3°F)",
Temp_Abs_Diff < 6 ~ "Medium (3-6°F)",
TRUE ~ "Large (>6°F)"
)
)
error_counts <- comparison_data %>%
count(Device, Temp_Error_Category) %>%
group_by(Device) %>%
mutate(Percentage = (n / sum(n)) * 100)
ggplot(error_counts, aes(x = Device, y = Percentage, fill = Temp_Error_Category)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = paste0(round(Percentage, 1), "%")),
position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
labs(title = "Temperature Error Categories by Device",
subtitle = "Higher percentage of 'Small' errors indicates better sensor",
x = "Device",
y = "Percentage of Measurements",
fill = "Error Size") +
theme_minimal() +
scale_fill_brewer(palette = "RdYlGn", direction = -1)
Exercise 4: Modify the code above to create bar charts showing the error sizes for Pressure absolute differences. Update the title accordingly.
## put your exercise 3 code here
# Create interactive scatter plot
p <- ggplot(comparison_data, aes(x = NOAA_Temp_F, y = Temp_F, color = Device,
text = paste0(
"Device: ", Device, "<br>",
"Sensor Temp: ", round(Temp_F, 2), "°F<br>",
"NOAA Temp: ", round(NOAA_Temp_F, 2), "°F<br>",
"Difference: ", round(Temp_Diff, 2), "°F<br>",
"% Difference: ", round(Temp_Pct_Diff, 2), "%<br>",
"Date/Time: ", format(TimeStampFull, "%Y-%m-%d %H:%M")
))) +
geom_point(size = 3, alpha = 0.7) +
geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "black", linewidth = 1) +
scale_color_manual(values = c("Red" = "red", "Purple" = "purple")) +
theme_minimal() +
labs(title = "Sensor Temperature vs NOAA Temperature",
subtitle = "Points closer to the diagonal line indicate better accuracy",
x = "NOAA Temperature (°F)",
y = "Sensor Temperature (°F)")
ggplotly(p, tooltip = "text") %>%
layout(hovermode = "closest")
Exercise 5: Create a similar interactive scatter plot for Humidity (Sensor Humidity vs NOAA Humidity).
## put your exercise 4 code here
Let’s investigate if there are patterns to the measurement errors based on date. Do the sensors underestimate or overestimate the noaa weather data?
# Group by date to see if certain days have more errors
daily_errors <- comparison_data %>%
filter(Device =="Red") %>%
group_by(Date, Device) %>%
summarise(
Mean_Temp_Error = mean(Temp_Diff, na.rm = TRUE),
n = n(),
.groups = "drop"
)
ggplot(daily_errors, aes(x = Date, y = Mean_Temp_Error, color = Device, size = n)) +
geom_point(alpha = 0.7) +
geom_line(aes(group = Device), linewidth = 1) +
labs(title = "Mean Temperature Error by Date",
subtitle = "Point size indicates number of measurements that day",
x = "Date",
y = "Mean Absolute Error (°F)",
size = "Number of\nMeasurements") +
theme_minimal() +
scale_color_manual(values = c("Red" = "red", "Purple" = "purple"))
Exercise 6: Create a similar mean absolute temperature error by date plot for the Purple device.
## put your exercise 5 code here
# Count positive (overestimate) vs negative (underestimate) errors
error_direction <- comparison_data %>%
mutate(Error_Direction = ifelse(Temp_Diff > 0, "Overestimate", "Underestimate")) %>%
count(Device, Error_Direction) %>%
group_by(Device) %>%
mutate(Percentage = (n / sum(n)) * 100)
ggplot(error_direction, aes(x = Device, y = Percentage, fill = Error_Direction)) +
geom_bar(stat = "identity", position = "dodge") +
geom_text(aes(label = paste0(round(Percentage, 1), "%")),
position = position_dodge(width = 0.9), vjust = -0.5) +
labs(title = "Direction of Temperature Errors",
subtitle = "Does the sensor tend to read high or low?",
x = "Device",
y = "Percentage of Measurements",
fill = "Error Direction") +
theme_minimal() +
scale_fill_manual(values = c("Overestimate" = "red", "Underestimate" = "blue"))
Now let’s create an interactive map that shows over/underestimates for temperature, humidity, and pressure:
# Create error classification for visualization
comparison_data <- comparison_data %>%
mutate(
Temp_Error_Type = case_when(
Temp_Diff > 1 ~ "Overestimate",
Temp_Diff < -1 ~ "Underestimate",
TRUE ~ "Accurate"
),
Hum_Error_Type = case_when(
Hum_Diff > 5 ~ "Overestimate",
Hum_Diff < -5 ~ "Underestimate",
TRUE ~ "Accurate"
),
Press_Error_Type = case_when(
Press_Diff > 2 ~ "Overestimate",
Press_Diff < -2 ~ "Underestimate",
TRUE ~ "Accurate"
)
)
# Create color palettes for each measurement type
temp_error_colors <- colorFactor(
palette = c("blue", "green", "red"),
domain = c("Underestimate", "Accurate", "Overestimate")
)
hum_error_colors <- colorFactor(
palette = c("blue", "green", "red"),
domain = c("Underestimate", "Accurate", "Overestimate")
)
press_error_colors <- colorFactor(
palette = c("blue", "green", "red"),
domain = c("Underestimate", "Accurate", "Overestimate")
)
# Create the interactive error map
error_map <- leaflet(comparison_data) %>%
# Add base layers
addProviderTiles(providers$Esri.WorldTopoMap, group = "ESRI Topo") %>%
addProviderTiles("Esri.WorldImagery", group = "Satellite") %>%
addWMSTiles(
baseUrl = "https://basemap.nationalmap.gov/arcgis/services/USGSTopo/MapServer/WMSServer?",
layers = "0",
options = WMSTileOptions(format = "image/png", transparent = TRUE),
attribution = "USGS The National Map",
group = "USGS Topo"
) %>%
addProviderTiles(providers$OpenStreetMap, group = "OpenStreetMap") %>%
# Temperature error markers
addCircleMarkers(
lng = ~Lon,
lat = ~Lat,
radius = 7,
color = ~temp_error_colors(Temp_Error_Type),
fillOpacity = 0.7,
stroke = TRUE,
weight = 2,
group = "Temperature Errors",
label = ~paste0(
"Device: ", Device,
" | ", Temp_Error_Type,
" | Error: ", round(Temp_Diff, 1), "°F"
),
popup = ~paste0(
"<strong>Device:</strong> ", Device, "<br/>",
"<strong>Temperature Error:</strong> ", Temp_Error_Type, "<br/>",
"Sensor: ", round(Temp_F, 1), "°F<br/>",
"NOAA: ", round(NOAA_Temp_F, 1), "°F<br/>",
"Difference: ", round(Temp_Diff, 1), "°F<br/>",
"<strong>Time:</strong> ", format(TimeStampFull, "%Y-%m-%d %H:%M")
)
) %>%
# Humidity error markers
addCircleMarkers(
lng = ~Lon,
lat = ~Lat,
radius = 7,
color = ~hum_error_colors(Hum_Error_Type),
fillOpacity = 0.7,
stroke = TRUE,
weight = 2,
group = "Humidity Errors",
label = ~paste0(
"Device: ", Device,
" | ", Hum_Error_Type,
" | Error: ", round(Hum_Diff, 1), "%"
),
popup = ~paste0(
"<strong>Device:</strong> ", Device, "<br/>",
"<strong>Humidity Error:</strong> ", Hum_Error_Type, "<br/>",
"Sensor: ", Hum, "%<br/>",
"NOAA: ", round(NOAA_Humidity, 1), "%<br/>",
"Difference: ", round(Hum_Diff, 1), "%<br/>",
"<strong>Time:</strong> ", format(TimeStampFull, "%Y-%m-%d %H:%M")
)
) %>%
# Pressure error markers
addCircleMarkers(
lng = ~Lon,
lat = ~Lat,
radius = 7,
color = ~press_error_colors(Press_Error_Type),
fillOpacity = 0.7,
stroke = TRUE,
weight = 2,
group = "Pressure Errors",
label = ~paste0(
"Device: ", Device,
" | ", Press_Error_Type,
" | Error: ", round(Press_Diff, 1), " hPa"
),
popup = ~paste0(
"<strong>Device:</strong> ", Device, "<br/>",
"<strong>Pressure Error:</strong> ", Press_Error_Type, "<br/>",
"Sensor: ", Press, " hPa<br/>",
"NOAA: ", round(NOAA_Pressure_hPa, 1), " hPa<br/>",
"Difference: ", round(Press_Diff, 1), " hPa<br/>",
"<strong>Time:</strong> ", format(TimeStampFull, "%Y-%m-%d %H:%M")
)
) %>%
# Add layer controls
addLayersControl(
baseGroups = c("ESRI Topo", "Satellite", "USGS Topo", "OpenStreetMap"),
overlayGroups = c("Temperature Errors", "Humidity Errors", "Pressure Errors"),
options = layersControlOptions(collapsed = FALSE)
) %>%
# Show only temperature errors by default
hideGroup(c("Humidity Errors", "Pressure Errors")) %>%
# Add legends
addLegend(
position = "bottomright",
pal = temp_error_colors,
values = ~Temp_Error_Type,
title = "Error Type",
opacity = 1,
group = "Temperature Errors"
) %>%
addLegend(
position = "bottomright",
pal = hum_error_colors,
values = ~Hum_Error_Type,
title = "Error Type",
opacity = 1,
group = "Humidity Errors"
) %>%
addLegend(
position = "bottomright",
pal = press_error_colors,
values = ~Press_Error_Type,
title = "Error Type",
opacity = 1,
group = "Pressure Errors"
) %>%
# Add scale bar
addScaleBar(position = "bottomleft")
# Display the map
error_map
This interactive map allows you to: - Switch base layers: Choose between ESRI Topo, Satellite, USGS Topo, or OpenStreetMap - Toggle error types: Check/uncheck boxes to show Temperature, Humidity, or Pressure errors - Color coding: - Blue = Underestimate (sensor reads too low) - Green = Accurate (within threshold) - Red = Overestimate (sensor reads too high) - Hover labels: Quick view of device, error type, and magnitude - Click popups: Detailed comparison data
# Save the enhanced dataset with NOAA data
# write.csv(comparison_data, "CDT_Weather_Comparison.csv", row.names = FALSE)
# write_xlsx(comparison_data, "CDT_Weather_Comparison.xlsx")
cat("Analysis complete!\n")
## Analysis complete!
cat("Comparison dataset contains", nrow(comparison_data), "points with NOAA data.\n")
## Comparison dataset contains 100 points with NOAA data.
This tutorial demonstrated:
Based on the comparison with NOAA data: - Total comparison points: 100 - Red device measurements: 55 - Purple device measurements: 45
Exercise 7: Review the analysis comparing the red and purple meteo devices. Type your answers below each question. Provide at least three sentences per response:
7a: Which device would you say was more accurate compared to NOAA weather data (Temperature, Pressure, Humidity) and why would you say so? Be sure to mention what graph(s) or statistic(s) support your conclusion.
7b: What was something interesting you noticed about the difference between device data and the NOAA temperature data? Where there any patterns you noticed in this descriptive statistical analysis regarding the potential accuracy of these devices compared to NOAA historical data? Mention what graph(s) or statistic(s) helped you notice this pattern.
To submit this assignment, publish on RPubs and post link url on Canvas