HW03 - CE 5363 Time Series Data Analysis of Speed Variation
Packages and File Path
Time Variables
month_levels <- c(
"January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"
)
time_levels <- c("Morning Peak", "Daytime", "Evening Peak", "Nighttime")
speed_data <- speed_data_raw %>%
transmute(
xd_id,
measurement_tstamp = ymd_hms(measurement_tstamp, quiet = TRUE),
speed,
average_speed,
reference_speed,
travel_time_seconds,
confidence_score,
cvalue
) %>%
mutate(
month = month(measurement_tstamp),
month_name = as.character(month(measurement_tstamp, label = TRUE, abbr = FALSE)),
day_name = as.character(wday(measurement_tstamp, label = TRUE, abbr = FALSE)),
hour = hour(measurement_tstamp),
day_type = if_else(day_name %in% c("Saturday", "Sunday"), "Weekend", "Weekday"),
time_period = case_when(
hour >= 6 & hour < 9 ~ "Morning Peak",
hour >= 9 & hour < 16 ~ "Daytime",
hour >= 16 & hour < 20 ~ "Evening Peak",
hour >= 20 | hour < 6 ~ "Nighttime",
TRUE ~ NA_character_
)
) %>%
filter(
!is.na(measurement_tstamp),
!is.na(month),
!is.na(month_name),
!is.na(day_type),
!is.na(time_period),
!is.na(speed)
) %>%
mutate(
month_name = factor(month_name, levels = month_levels),
day_type = factor(day_type, levels = c("Weekday", "Weekend")),
time_period = factor(time_period, levels = time_levels)
)Introduction
This report examines roadway speed time series data to understand how speed changes across months, day types, and time periods. The dataset is processed by converting timestamps and extracting key time attributes, including month, hour, and day of week. Observations are grouped into weekday and weekend and further classified into four time periods: morning peak, daytime, evening peak, and nighttime.
The analysis summarizes speed behavior by computing mean speed, standard deviation, and 85th percentile speed for each combination of month, day type, and time period. In addition to the required visualizations, additional plots are developed to better capture variability, distribution patterns, and monthly differences. These results are used to support interpretation of temporal speed patterns and their implications for traffic operations and roadway safety.
Dataset Description
dataset_info <- data.frame(
Description = c(
"Dataset File Path",
"Number of Observations",
"Number of Unique Road Segments"
),
Value = c(
data_path,
format(nrow(speed_data), big.mark = ","),
format(n_distinct(speed_data$xd_id), big.mark = ",")
)
)
kable(dataset_info, align = c("l", "c"), caption = "Dataset description") %>%
kable_styling(full_width = FALSE) %>%
row_spec(0, bold = TRUE) %>%
column_spec(1, bold = TRUE)| Description | Value |
|---|---|
| Dataset File Path | C:/Users/asr171/OneDrive - Texas State University/HW 3 asr171/Final/data_asr171.csv |
| Number of Observations | 7,491,739 |
| Number of Unique Road Segments | 153 |
Summary Table
summary_table <- speed_data %>%
filter(!is.na(month), !is.na(month_name), !is.na(day_type), !is.na(time_period), !is.na(speed)) %>%
group_by(month, month_name, day_type, time_period) %>%
summarise(
mean_speed = round(mean(speed, na.rm = TRUE), 2),
std_speed = round(sd(speed, na.rm = TRUE), 2),
p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
n_obs = n(),
.groups = "drop"
) %>%
arrange(month, day_type, time_period)
monthly_summary <- speed_data %>%
filter(!is.na(month), !is.na(month_name), !is.na(speed)) %>%
group_by(month, month_name) %>%
summarise(
mean_speed = round(mean(speed, na.rm = TRUE), 2),
std_speed = round(sd(speed, na.rm = TRUE), 2),
p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
n_obs = n(),
.groups = "drop"
) %>%
arrange(month)
daytype_summary <- speed_data %>%
filter(!is.na(day_type), !is.na(speed)) %>%
group_by(day_type) %>%
summarise(
mean_speed = round(mean(speed, na.rm = TRUE), 2),
std_speed = round(sd(speed, na.rm = TRUE), 2),
p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
n_obs = n(),
.groups = "drop"
)
timeperiod_summary <- speed_data %>%
filter(!is.na(time_period), !is.na(speed)) %>%
group_by(time_period) %>%
summarise(
mean_speed = round(mean(speed, na.rm = TRUE), 2),
std_speed = round(sd(speed, na.rm = TRUE), 2),
p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
n_obs = n(),
.groups = "drop"
)
bar_summary <- speed_data %>%
filter(!is.na(day_type), !is.na(time_period), !is.na(speed)) %>%
group_by(day_type, time_period) %>%
summarise(
mean_speed = round(mean(speed, na.rm = TRUE), 2),
p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
std_speed = round(sd(speed, na.rm = TRUE), 2),
.groups = "drop"
)
datatable(
summary_table,
caption = "Summary table of mean speed, standard deviation, 85th percentile speed, and observation count",
options = list(pageLength = 12, scrollX = TRUE)
)Plot Style
base_theme <- theme_minimal(base_size = 15) +
theme(
plot.title = element_text(face = "bold", size = 20, color = "#a61c46"),
plot.subtitle = element_text(size = 12, color = "#6b1f2a"),
axis.title = element_text(face = "bold", color = "#4a1c1c"),
axis.text = element_text(color = "#4a1c1c"),
panel.grid.minor = element_blank(),
panel.background = element_rect(fill = "#f5f5f5", color = NA),
plot.background = element_rect(fill = "#f5f5f5", color = NA),
legend.position = "bottom",
legend.title = element_text(face = "bold"),
strip.text = element_text(face = "bold", size = 12)
)
period_fill <- c(
"Morning Peak" = "#e6d3c3",
"Daytime" = "#e89a7a",
"Evening Peak" = "#f26c4f",
"Nighttime" = "#d62828"
)
metric_fill <- c(
"Mean Speed" = "#e89a7a",
"85th Percentile Speed" = "#d62828"
)
day_fill <- c(
"Weekday" = "#f26c4f",
"Weekend" = "#e6d3c3"
)Monthly Average Speed
plot_monthly <- ggplot(monthly_summary, aes(x = month, y = mean_speed, group = 1)) +
geom_line(color = "#d62828", linewidth = 1.2) +
geom_point(size = 3.2, color = "#e89a7a") +
geom_label(
aes(label = round(mean_speed, 2)),
size = 3.3,
fill = "white",
color = "#4a1c1c",
label.size = 0.15,
vjust = -0.8
) +
scale_x_continuous(breaks = monthly_summary$month, labels = monthly_summary$month_name) +
labs(
title = "Monthly Average Speed",
subtitle = "Mean speed variation across months for all roadway segments",
x = "Month",
y = "Average Speed"
) +
base_theme +
theme(axis.text.x = element_text(angle = 22, hjust = 1))
plot_monthlySpeed Distribution by Day Type and Time Period
set.seed(123)
speed_sample_box <- speed_data %>%
select(speed, day_type, time_period) %>%
sample_n(min(250000, n()))
plot_box <- ggplot(speed_sample_box, aes(x = time_period, y = speed, fill = time_period)) +
geom_boxplot(outlier.alpha = 0.10, width = 0.7) +
facet_wrap(~day_type, ncol = 1) +
scale_fill_manual(values = period_fill) +
labs(
title = "Speed Distribution by Day Type and Time Period",
subtitle = "Comparison of speed variability across weekday and weekend conditions",
x = "Time Period",
y = "Speed",
fill = "Time Period"
) +
base_theme
plot_boxMean and 85th Percentile Speed Comparison
bar_long <- bar_summary %>%
pivot_longer(c(mean_speed, p85_speed), names_to = "metric", values_to = "speed_value") %>%
mutate(metric = factor(metric, levels = c("mean_speed", "p85_speed"), labels = c("Mean Speed", "85th Percentile Speed")))
plot_bar <- ggplot(bar_long, aes(x = time_period, y = speed_value, fill = metric)) +
geom_col(position = position_dodge(width = 0.75), width = 0.68) +
geom_text(
aes(label = round(speed_value, 2)),
position = position_dodge(width = 0.75),
vjust = -0.35,
size = 3.5,
color = "#4a1c1c"
) +
facet_wrap(~day_type) +
scale_fill_manual(values = metric_fill) +
labs(
title = "Mean and 85th Percentile Speed Comparison",
subtitle = "Differences in average and high-end speeds across time periods and day types",
x = "Time Period",
y = "Speed",
fill = "Metric"
) +
base_theme +
theme(axis.text.x = element_text(angle = 14, hjust = 1))
plot_barInterpretation of Results
highest_month <- monthly_summary %>%
filter(!is.na(month_name)) %>%
slice_max(order_by = mean_speed, n = 1, with_ties = FALSE)
lowest_month <- monthly_summary %>%
filter(!is.na(month_name)) %>%
slice_min(order_by = mean_speed, n = 1, with_ties = FALSE)
highest_variability_period <- timeperiod_summary %>%
filter(!is.na(time_period)) %>%
slice_max(order_by = std_speed, n = 1, with_ties = FALSE)
weekday_mean <- daytype_summary %>%
filter(day_type == "Weekday") %>%
pull(mean_speed)
weekend_mean <- daytype_summary %>%
filter(day_type == "Weekend") %>%
pull(mean_speed)
peak_summary <- bar_summary %>%
filter(time_period %in% c("Morning Peak", "Evening Peak")) %>%
summarise(peak_mean = mean(mean_speed), .groups = "drop")
offpeak_summary <- bar_summary %>%
filter(time_period %in% c("Daytime", "Nighttime")) %>%
summarise(offpeak_mean = mean(mean_speed), .groups = "drop")
largest_gap <- bar_summary %>%
mutate(gap = p85_speed - mean_speed) %>%
slice_max(order_by = gap, n = 1, with_ties = FALSE)
cat(
"<h3>Interpretation of Results</h3>",
"<p><b>Monthly variation in speed:</b> The analysis shows noticeable variation in average speed across months. The highest average speed is observed in <b>",
as.character(highest_month$month_name),
"</b> (", round(highest_month$mean_speed, 2),
"), while the lowest occurs in <b>",
as.character(lowest_month$month_name),
"</b> (", round(lowest_month$mean_speed, 2),
"). This indicates that speed patterns change over the year, suggesting possible seasonal or demand related effects.</p>",
"<p><b>Speed variability across time periods:</b> The <b>",
as.character(highest_variability_period$time_period),
"</b> period shows the highest variability in speed, based on the standard deviation. This suggests that traffic conditions during this period are less consistent, likely due to fluctuations in traffic flow or congestion levels.</p>",
"<p><b>Weekday and weekend comparison:</b> The average weekday speed is <b>",
round(weekday_mean, 2),
"</b>, while the average weekend speed is <b>",
round(weekend_mean, 2),
"</b>. ",
ifelse(
weekend_mean > weekday_mean,
"Weekend speeds are slightly higher, indicating relatively smoother traffic conditions.",
"Weekday speeds are slightly higher, indicating more stable flow conditions during the week."
),
"</p>",
"<p><b>Peak versus off peak conditions:</b> The average speed during peak periods is <b>",
round(peak_summary$peak_mean, 2),
"</b>, compared to <b>",
round(offpeak_summary$offpeak_mean, 2),
"</b> during off peak periods. This confirms that peak periods are associated with lower operating speeds, reflecting higher traffic demand and congestion.</p>",
"<p><b>Mean and 85th percentile comparison:</b> The difference between mean speed and 85th percentile speed highlights the spread of higher end speeds. The largest gap is observed during <b>",
as.character(largest_gap$time_period),
"</b> under <b>",
as.character(largest_gap$day_type),
"</b> conditions. This suggests a wider distribution of speeds, where some drivers travel significantly faster than the average.</p>",
"<p><b>Implications for traffic operations and safety:</b> These findings indicate that speed patterns vary by time, day type, and operating conditions. Lower speeds during peak periods reflect congestion effects, while higher variability in certain periods suggests unstable traffic flow. The presence of larger gaps between mean and higher end speeds may indicate potential safety concerns related to speed dispersion. These insights can support targeted speed management strategies across different temporal conditions.</p>"
)Interpretation of Results
Monthly variation in speed: The analysis shows noticeable variation in average speed across months. The highest average speed is observed in January ( 42.98 ), while the lowest occurs in October ( 42.14 ). This indicates that speed patterns change over the year, suggesting possible seasonal or demand related effects.
Speed variability across time periods: The Evening Peak period shows the highest variability in speed, based on the standard deviation. This suggests that traffic conditions during this period are less consistent, likely due to fluctuations in traffic flow or congestion levels.
Weekday and weekend comparison: The average weekday speed is 42.27 , while the average weekend speed is 43.08 . Weekend speeds are slightly higher, indicating relatively smoother traffic conditions.
Peak versus off peak conditions: The average speed during peak periods is 42.1 , compared to 42.8 during off peak periods. This confirms that peak periods are associated with lower operating speeds, reflecting higher traffic demand and congestion.
Mean and 85th percentile comparison: The difference between mean speed and 85th percentile speed highlights the spread of higher end speeds. The largest gap is observed during Morning Peak under Weekday conditions. This suggests a wider distribution of speeds, where some drivers travel significantly faster than the average.
Implications for traffic operations and safety: These findings indicate that speed patterns vary by time, day type, and operating conditions. Lower speeds during peak periods reflect congestion effects, while higher variability in certain periods suggests unstable traffic flow. The presence of larger gaps between mean and higher end speeds may indicate potential safety concerns related to speed dispersion. These insights can support targeted speed management strategies across different temporal conditions.
Heatmap of Mean Speed by Month and Time Period
month_order <- speed_data %>%
filter(!is.na(month), !is.na(month_name)) %>%
distinct(month, month_name) %>%
arrange(month) %>%
pull(month_name)
heatmap_data <- speed_data %>%
filter(!is.na(month), !is.na(month_name), !is.na(time_period), !is.na(speed)) %>%
group_by(month, month_name, time_period) %>%
summarise(mean_speed = mean(speed, na.rm = TRUE), .groups = "drop") %>%
mutate(month_name = factor(month_name, levels = month_order))
plot_heatmap <- ggplot(heatmap_data, aes(x = month_name, y = time_period, fill = mean_speed)) +
geom_tile(color = "white", linewidth = 0.6) +
geom_text(aes(label = round(mean_speed, 2)), size = 3.6, color = "#4a1c1c") +
scale_fill_gradientn(
colours = c("#e6d3c3", "#e89a7a", "#f26c4f", "#d62828"),
name = "Mean Speed"
) +
labs(
title = "Mean Speed by Month and Time Period",
subtitle = "Monthly differences in average speed across daily operating periods",
x = "Month",
y = "Time Period"
) +
base_theme +
theme(axis.text.x = element_text(angle = 24, hjust = 1))
plot_heatmapSpeed and Variability Relationship
relationship_data <- bar_summary %>%
mutate(label = paste(day_type, time_period, sep = " - "))
plot_relationship <- ggplot(relationship_data, aes(x = mean_speed, y = std_speed, color = time_period, shape = day_type)) +
geom_point(size = 4.2, stroke = 1.1) +
geom_text_repel(aes(label = label), size = 3.5, color = "#4a1c1c", max.overlaps = Inf, show.legend = FALSE) +
scale_color_manual(values = period_fill) +
scale_shape_manual(values = c(16, 17)) +
labs(
title = "Speed and Variability Relationship",
subtitle = "Mean speed and standard deviation across day types and time periods",
x = "Mean Speed",
y = "Standard Deviation of Speed",
color = "Time Period",
shape = "Day Type"
) +
base_theme
plot_relationshipAdditional Interpretation
cat(
"<h3>Additional Interpretation from Visualizations</h3>",
"<p><b>Monthly pattern across time periods:</b> The heatmap provides a clearer view of how mean speed varies jointly across months and time periods. Rather than following a uniform trend, speed levels differ by both temporal dimensions. Nighttime consistently shows higher average speeds across most months, while daytime and evening peak periods tend to exhibit relatively lower values. This indicates that the effect of month is not uniform and is influenced by underlying traffic conditions within each time period.</p>",
"<p><b>Speed and variability relationship:</b> The relationship between mean speed and standard deviation highlights differences in traffic stability across conditions. Periods with higher mean speeds are not always associated with lower variability. In some cases, higher speeds are accompanied by increased dispersion, suggesting less uniform traffic flow. Conversely, certain lower speed conditions show relatively tighter distributions, indicating more stable movement. This distinction is important for understanding how speed and variability interact under different operational contexts.</p>",
"<p><b>Differences between weekday and weekend conditions:</b> When comparing weekday and weekend patterns within the relationship plot, weekend conditions generally show slightly higher mean speeds with comparable or lower variability in some time periods. This suggests relatively smoother traffic conditions during weekends, likely due to reduced demand. However, variability patterns remain dependent on the specific time period, indicating that both demand and temporal factors jointly influence traffic behavior.</p>",
"<p><b>Implications for traffic behavior:</b> The additional visualizations provide deeper insight beyond average comparisons by capturing how speed patterns vary jointly across time and how they relate to variability. The presence of higher speeds alongside increased variability in certain conditions may indicate inconsistent driver behavior, which has implications for safety and speed management. These findings support the need for time specific strategies that consider both average speed and its dispersion rather than relying on a single summary measure.</p>"
)Additional Interpretation from Visualizations
Monthly pattern across time periods: The heatmap provides a clearer view of how mean speed varies jointly across months and time periods. Rather than following a uniform trend, speed levels differ by both temporal dimensions. Nighttime consistently shows higher average speeds across most months, while daytime and evening peak periods tend to exhibit relatively lower values. This indicates that the effect of month is not uniform and is influenced by underlying traffic conditions within each time period.
Speed and variability relationship: The relationship between mean speed and standard deviation highlights differences in traffic stability across conditions. Periods with higher mean speeds are not always associated with lower variability. In some cases, higher speeds are accompanied by increased dispersion, suggesting less uniform traffic flow. Conversely, certain lower speed conditions show relatively tighter distributions, indicating more stable movement. This distinction is important for understanding how speed and variability interact under different operational contexts.
Differences between weekday and weekend conditions: When comparing weekday and weekend patterns within the relationship plot, weekend conditions generally show slightly higher mean speeds with comparable or lower variability in some time periods. This suggests relatively smoother traffic conditions during weekends, likely due to reduced demand. However, variability patterns remain dependent on the specific time period, indicating that both demand and temporal factors jointly influence traffic behavior.
Implications for traffic behavior: The additional visualizations provide deeper insight beyond average comparisons by capturing how speed patterns vary jointly across time and how they relate to variability. The presence of higher speeds alongside increased variability in certain conditions may indicate inconsistent driver behavior, which has implications for safety and speed management. These findings support the need for time specific strategies that consider both average speed and its dispersion rather than relying on a single summary measure.