HW03 - CE 5363 Time Series Data Analysis of Speed Variation

Packages and File Path

library(readr)
library(dplyr)
library(lubridate)
library(ggplot2)
library(tidyr)
library(knitr)
library(kableExtra)
library(DT)
library(forcats)
library(ggrepel)
library(scales)

data_path <- "C:/Users/asr171/OneDrive - Texas State University/HW 3 asr171/Final/data_asr171.csv"
output_dir <- getwd()

Dataset

speed_data_raw <- read_csv(data_path, show_col_types = FALSE)

Time Variables

month_levels <- c(
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
)

time_levels <- c("Morning Peak", "Daytime", "Evening Peak", "Nighttime")

speed_data <- speed_data_raw %>%
  transmute(
    xd_id,
    measurement_tstamp = ymd_hms(measurement_tstamp, quiet = TRUE),
    speed,
    average_speed,
    reference_speed,
    travel_time_seconds,
    confidence_score,
    cvalue
  ) %>%
  mutate(
    month = month(measurement_tstamp),
    month_name = as.character(month(measurement_tstamp, label = TRUE, abbr = FALSE)),
    day_name = as.character(wday(measurement_tstamp, label = TRUE, abbr = FALSE)),
    hour = hour(measurement_tstamp),
    day_type = if_else(day_name %in% c("Saturday", "Sunday"), "Weekend", "Weekday"),
    time_period = case_when(
      hour >= 6 & hour < 9 ~ "Morning Peak",
      hour >= 9 & hour < 16 ~ "Daytime",
      hour >= 16 & hour < 20 ~ "Evening Peak",
      hour >= 20 | hour < 6 ~ "Nighttime",
      TRUE ~ NA_character_
    )
  ) %>%
  filter(
    !is.na(measurement_tstamp),
    !is.na(month),
    !is.na(month_name),
    !is.na(day_type),
    !is.na(time_period),
    !is.na(speed)
  ) %>%
  mutate(
    month_name = factor(month_name, levels = month_levels),
    day_type = factor(day_type, levels = c("Weekday", "Weekend")),
    time_period = factor(time_period, levels = time_levels)
  )

Introduction

This report examines roadway speed time series data to understand how speed changes across months, day types, and time periods. The dataset is processed by converting timestamps and extracting key time attributes, including month, hour, and day of week. Observations are grouped into weekday and weekend and further classified into four time periods: morning peak, daytime, evening peak, and nighttime.

The analysis summarizes speed behavior by computing mean speed, standard deviation, and 85th percentile speed for each combination of month, day type, and time period. In addition to the required visualizations, additional plots are developed to better capture variability, distribution patterns, and monthly differences. These results are used to support interpretation of temporal speed patterns and their implications for traffic operations and roadway safety.

Dataset Description

dataset_info <- data.frame(
  Description = c(
    "Dataset File Path",
    "Number of Observations",
    "Number of Unique Road Segments"
  ),
  Value = c(
    data_path,
    format(nrow(speed_data), big.mark = ","),
    format(n_distinct(speed_data$xd_id), big.mark = ",")
  )
)

kable(dataset_info, align = c("l", "c"), caption = "Dataset description") %>%
  kable_styling(full_width = FALSE) %>%
  row_spec(0, bold = TRUE) %>%
  column_spec(1, bold = TRUE)
Dataset description
Description Value
Dataset File Path C:/Users/asr171/OneDrive - Texas State University/HW 3 asr171/Final/data_asr171.csv
Number of Observations 7,491,739
Number of Unique Road Segments 153

Summary Table

summary_table <- speed_data %>%
  filter(!is.na(month), !is.na(month_name), !is.na(day_type), !is.na(time_period), !is.na(speed)) %>%
  group_by(month, month_name, day_type, time_period) %>%
  summarise(
    mean_speed = round(mean(speed, na.rm = TRUE), 2),
    std_speed = round(sd(speed, na.rm = TRUE), 2),
    p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
    n_obs = n(),
    .groups = "drop"
  ) %>%
  arrange(month, day_type, time_period)

monthly_summary <- speed_data %>%
  filter(!is.na(month), !is.na(month_name), !is.na(speed)) %>%
  group_by(month, month_name) %>%
  summarise(
    mean_speed = round(mean(speed, na.rm = TRUE), 2),
    std_speed = round(sd(speed, na.rm = TRUE), 2),
    p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
    n_obs = n(),
    .groups = "drop"
  ) %>%
  arrange(month)

daytype_summary <- speed_data %>%
  filter(!is.na(day_type), !is.na(speed)) %>%
  group_by(day_type) %>%
  summarise(
    mean_speed = round(mean(speed, na.rm = TRUE), 2),
    std_speed = round(sd(speed, na.rm = TRUE), 2),
    p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
    n_obs = n(),
    .groups = "drop"
  )

timeperiod_summary <- speed_data %>%
  filter(!is.na(time_period), !is.na(speed)) %>%
  group_by(time_period) %>%
  summarise(
    mean_speed = round(mean(speed, na.rm = TRUE), 2),
    std_speed = round(sd(speed, na.rm = TRUE), 2),
    p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
    n_obs = n(),
    .groups = "drop"
  )

bar_summary <- speed_data %>%
  filter(!is.na(day_type), !is.na(time_period), !is.na(speed)) %>%
  group_by(day_type, time_period) %>%
  summarise(
    mean_speed = round(mean(speed, na.rm = TRUE), 2),
    p85_speed = round(quantile(speed, probs = 0.85, na.rm = TRUE, names = FALSE), 2),
    std_speed = round(sd(speed, na.rm = TRUE), 2),
    .groups = "drop"
  )

datatable(
  summary_table,
  caption = "Summary table of mean speed, standard deviation, 85th percentile speed, and observation count",
  options = list(pageLength = 12, scrollX = TRUE)
)

Plot Style

base_theme <- theme_minimal(base_size = 15) +
  theme(
    plot.title = element_text(face = "bold", size = 20, color = "#a61c46"),
    plot.subtitle = element_text(size = 12, color = "#6b1f2a"),
    axis.title = element_text(face = "bold", color = "#4a1c1c"),
    axis.text = element_text(color = "#4a1c1c"),
    panel.grid.minor = element_blank(),
    panel.background = element_rect(fill = "#f5f5f5", color = NA),
    plot.background = element_rect(fill = "#f5f5f5", color = NA),
    legend.position = "bottom",
    legend.title = element_text(face = "bold"),
    strip.text = element_text(face = "bold", size = 12)
  )

period_fill <- c(
  "Morning Peak" = "#e6d3c3",
  "Daytime" = "#e89a7a",
  "Evening Peak" = "#f26c4f",
  "Nighttime" = "#d62828"
)

metric_fill <- c(
  "Mean Speed" = "#e89a7a",
  "85th Percentile Speed" = "#d62828"
)

day_fill <- c(
  "Weekday" = "#f26c4f",
  "Weekend" = "#e6d3c3"
)

Monthly Average Speed

plot_monthly <- ggplot(monthly_summary, aes(x = month, y = mean_speed, group = 1)) +
  geom_line(color = "#d62828", linewidth = 1.2) +
  geom_point(size = 3.2, color = "#e89a7a") +
  geom_label(
    aes(label = round(mean_speed, 2)),
    size = 3.3,
    fill = "white",
    color = "#4a1c1c",
    label.size = 0.15,
    vjust = -0.8
  ) +
  scale_x_continuous(breaks = monthly_summary$month, labels = monthly_summary$month_name) +
  labs(
    title = "Monthly Average Speed",
    subtitle = "Mean speed variation across months for all roadway segments",
    x = "Month",
    y = "Average Speed"
  ) +
  base_theme +
  theme(axis.text.x = element_text(angle = 22, hjust = 1))

plot_monthly

ggsave(
  file.path(output_dir, "monthly_average_speed.png"),
  plot = plot_monthly,
  width = 11,
  height = 6.6,
  dpi = 420,
  bg = "white"
)

Speed Distribution by Day Type and Time Period

set.seed(123)

speed_sample_box <- speed_data %>%
  select(speed, day_type, time_period) %>%
  sample_n(min(250000, n()))

plot_box <- ggplot(speed_sample_box, aes(x = time_period, y = speed, fill = time_period)) +
  geom_boxplot(outlier.alpha = 0.10, width = 0.7) +
  facet_wrap(~day_type, ncol = 1) +
  scale_fill_manual(values = period_fill) +
  labs(
    title = "Speed Distribution by Day Type and Time Period",
    subtitle = "Comparison of speed variability across weekday and weekend conditions",
    x = "Time Period",
    y = "Speed",
    fill = "Time Period"
  ) +
  base_theme

plot_box

ggsave(
  file.path(output_dir, "speed_distribution_day_type_time_period.png"),
  plot = plot_box,
  width = 11,
  height = 8.5,
  dpi = 420,
  bg = "white"
)

Mean and 85th Percentile Speed Comparison

bar_long <- bar_summary %>%
  pivot_longer(c(mean_speed, p85_speed), names_to = "metric", values_to = "speed_value") %>%
  mutate(metric = factor(metric, levels = c("mean_speed", "p85_speed"), labels = c("Mean Speed", "85th Percentile Speed")))

plot_bar <- ggplot(bar_long, aes(x = time_period, y = speed_value, fill = metric)) +
  geom_col(position = position_dodge(width = 0.75), width = 0.68) +
  geom_text(
    aes(label = round(speed_value, 2)),
    position = position_dodge(width = 0.75),
    vjust = -0.35,
    size = 3.5,
    color = "#4a1c1c"
  ) +
  facet_wrap(~day_type) +
  scale_fill_manual(values = metric_fill) +
  labs(
    title = "Mean and 85th Percentile Speed Comparison",
    subtitle = "Differences in average and high-end speeds across time periods and day types",
    x = "Time Period",
    y = "Speed",
    fill = "Metric"
  ) +
  base_theme +
  theme(axis.text.x = element_text(angle = 14, hjust = 1))

plot_bar

ggsave(
  file.path(output_dir, "mean_and_85th_percentile_speed_comparison.png"),
  plot = plot_bar,
  width = 11,
  height = 6.8,
  dpi = 420,
  bg = "white"
)

Interpretation of Results

highest_month <- monthly_summary %>%
  filter(!is.na(month_name)) %>%
  slice_max(order_by = mean_speed, n = 1, with_ties = FALSE)

lowest_month <- monthly_summary %>%
  filter(!is.na(month_name)) %>%
  slice_min(order_by = mean_speed, n = 1, with_ties = FALSE)

highest_variability_period <- timeperiod_summary %>%
  filter(!is.na(time_period)) %>%
  slice_max(order_by = std_speed, n = 1, with_ties = FALSE)

weekday_mean <- daytype_summary %>%
  filter(day_type == "Weekday") %>%
  pull(mean_speed)

weekend_mean <- daytype_summary %>%
  filter(day_type == "Weekend") %>%
  pull(mean_speed)

peak_summary <- bar_summary %>%
  filter(time_period %in% c("Morning Peak", "Evening Peak")) %>%
  summarise(peak_mean = mean(mean_speed), .groups = "drop")

offpeak_summary <- bar_summary %>%
  filter(time_period %in% c("Daytime", "Nighttime")) %>%
  summarise(offpeak_mean = mean(mean_speed), .groups = "drop")

largest_gap <- bar_summary %>%
  mutate(gap = p85_speed - mean_speed) %>%
  slice_max(order_by = gap, n = 1, with_ties = FALSE)

cat(
  "<h3>Interpretation of Results</h3>",
  "<p><b>Monthly variation in speed:</b> The analysis shows noticeable variation in average speed across months. The highest average speed is observed in <b>",
  as.character(highest_month$month_name),
  "</b> (", round(highest_month$mean_speed, 2),
  "), while the lowest occurs in <b>",
  as.character(lowest_month$month_name),
  "</b> (", round(lowest_month$mean_speed, 2),
  "). This indicates that speed patterns change over the year, suggesting possible seasonal or demand related effects.</p>",
  "<p><b>Speed variability across time periods:</b> The <b>",
  as.character(highest_variability_period$time_period),
  "</b> period shows the highest variability in speed, based on the standard deviation. This suggests that traffic conditions during this period are less consistent, likely due to fluctuations in traffic flow or congestion levels.</p>",
  "<p><b>Weekday and weekend comparison:</b> The average weekday speed is <b>",
  round(weekday_mean, 2),
  "</b>, while the average weekend speed is <b>",
  round(weekend_mean, 2),
  "</b>. ",
  ifelse(
    weekend_mean > weekday_mean,
    "Weekend speeds are slightly higher, indicating relatively smoother traffic conditions.",
    "Weekday speeds are slightly higher, indicating more stable flow conditions during the week."
  ),
  "</p>",
  "<p><b>Peak versus off peak conditions:</b> The average speed during peak periods is <b>",
  round(peak_summary$peak_mean, 2),
  "</b>, compared to <b>",
  round(offpeak_summary$offpeak_mean, 2),
  "</b> during off peak periods. This confirms that peak periods are associated with lower operating speeds, reflecting higher traffic demand and congestion.</p>",
  "<p><b>Mean and 85th percentile comparison:</b> The difference between mean speed and 85th percentile speed highlights the spread of higher end speeds. The largest gap is observed during <b>",
  as.character(largest_gap$time_period),
  "</b> under <b>",
  as.character(largest_gap$day_type),
  "</b> conditions. This suggests a wider distribution of speeds, where some drivers travel significantly faster than the average.</p>",
  "<p><b>Implications for traffic operations and safety:</b> These findings indicate that speed patterns vary by time, day type, and operating conditions. Lower speeds during peak periods reflect congestion effects, while higher variability in certain periods suggests unstable traffic flow. The presence of larger gaps between mean and higher end speeds may indicate potential safety concerns related to speed dispersion. These insights can support targeted speed management strategies across different temporal conditions.</p>"
)

Interpretation of Results

Monthly variation in speed: The analysis shows noticeable variation in average speed across months. The highest average speed is observed in January ( 42.98 ), while the lowest occurs in October ( 42.14 ). This indicates that speed patterns change over the year, suggesting possible seasonal or demand related effects.

Speed variability across time periods: The Evening Peak period shows the highest variability in speed, based on the standard deviation. This suggests that traffic conditions during this period are less consistent, likely due to fluctuations in traffic flow or congestion levels.

Weekday and weekend comparison: The average weekday speed is 42.27 , while the average weekend speed is 43.08 . Weekend speeds are slightly higher, indicating relatively smoother traffic conditions.

Peak versus off peak conditions: The average speed during peak periods is 42.1 , compared to 42.8 during off peak periods. This confirms that peak periods are associated with lower operating speeds, reflecting higher traffic demand and congestion.

Mean and 85th percentile comparison: The difference between mean speed and 85th percentile speed highlights the spread of higher end speeds. The largest gap is observed during Morning Peak under Weekday conditions. This suggests a wider distribution of speeds, where some drivers travel significantly faster than the average.

Implications for traffic operations and safety: These findings indicate that speed patterns vary by time, day type, and operating conditions. Lower speeds during peak periods reflect congestion effects, while higher variability in certain periods suggests unstable traffic flow. The presence of larger gaps between mean and higher end speeds may indicate potential safety concerns related to speed dispersion. These insights can support targeted speed management strategies across different temporal conditions.

Heatmap of Mean Speed by Month and Time Period

month_order <- speed_data %>%
  filter(!is.na(month), !is.na(month_name)) %>%
  distinct(month, month_name) %>%
  arrange(month) %>%
  pull(month_name)

heatmap_data <- speed_data %>%
  filter(!is.na(month), !is.na(month_name), !is.na(time_period), !is.na(speed)) %>%
  group_by(month, month_name, time_period) %>%
  summarise(mean_speed = mean(speed, na.rm = TRUE), .groups = "drop") %>%
  mutate(month_name = factor(month_name, levels = month_order))

plot_heatmap <- ggplot(heatmap_data, aes(x = month_name, y = time_period, fill = mean_speed)) +
  geom_tile(color = "white", linewidth = 0.6) +
  geom_text(aes(label = round(mean_speed, 2)), size = 3.6, color = "#4a1c1c") +
  scale_fill_gradientn(
    colours = c("#e6d3c3", "#e89a7a", "#f26c4f", "#d62828"),
    name = "Mean Speed"
  ) +
  labs(
    title = "Mean Speed by Month and Time Period",
    subtitle = "Monthly differences in average speed across daily operating periods",
    x = "Month",
    y = "Time Period"
  ) +
  base_theme +
  theme(axis.text.x = element_text(angle = 24, hjust = 1))

plot_heatmap

ggsave(
  file.path(output_dir, "heatmap_mean_speed_month_time_period.png"),
  plot = plot_heatmap,
  width = 11,
  height = 6.8,
  dpi = 420,
  bg = "white"
)

Speed and Variability Relationship

relationship_data <- bar_summary %>%
  mutate(label = paste(day_type, time_period, sep = " - "))

plot_relationship <- ggplot(relationship_data, aes(x = mean_speed, y = std_speed, color = time_period, shape = day_type)) +
  geom_point(size = 4.2, stroke = 1.1) +
  geom_text_repel(aes(label = label), size = 3.5, color = "#4a1c1c", max.overlaps = Inf, show.legend = FALSE) +
  scale_color_manual(values = period_fill) +
  scale_shape_manual(values = c(16, 17)) +
  labs(
    title = "Speed and Variability Relationship",
    subtitle = "Mean speed and standard deviation across day types and time periods",
    x = "Mean Speed",
    y = "Standard Deviation of Speed",
    color = "Time Period",
    shape = "Day Type"
  ) +
  base_theme

plot_relationship

ggsave(
  file.path(output_dir, "speed_and_variability_relationship.png"),
  plot = plot_relationship,
  width = 11,
  height = 6.8,
  dpi = 420,
  bg = "white"
)

Additional Interpretation

cat(
  "<h3>Additional Interpretation from Visualizations</h3>",
  "<p><b>Monthly pattern across time periods:</b> The heatmap provides a clearer view of how mean speed varies jointly across months and time periods. Rather than following a uniform trend, speed levels differ by both temporal dimensions. Nighttime consistently shows higher average speeds across most months, while daytime and evening peak periods tend to exhibit relatively lower values. This indicates that the effect of month is not uniform and is influenced by underlying traffic conditions within each time period.</p>",
  "<p><b>Speed and variability relationship:</b> The relationship between mean speed and standard deviation highlights differences in traffic stability across conditions. Periods with higher mean speeds are not always associated with lower variability. In some cases, higher speeds are accompanied by increased dispersion, suggesting less uniform traffic flow. Conversely, certain lower speed conditions show relatively tighter distributions, indicating more stable movement. This distinction is important for understanding how speed and variability interact under different operational contexts.</p>",
  "<p><b>Differences between weekday and weekend conditions:</b> When comparing weekday and weekend patterns within the relationship plot, weekend conditions generally show slightly higher mean speeds with comparable or lower variability in some time periods. This suggests relatively smoother traffic conditions during weekends, likely due to reduced demand. However, variability patterns remain dependent on the specific time period, indicating that both demand and temporal factors jointly influence traffic behavior.</p>",
  "<p><b>Implications for traffic behavior:</b> The additional visualizations provide deeper insight beyond average comparisons by capturing how speed patterns vary jointly across time and how they relate to variability. The presence of higher speeds alongside increased variability in certain conditions may indicate inconsistent driver behavior, which has implications for safety and speed management. These findings support the need for time specific strategies that consider both average speed and its dispersion rather than relying on a single summary measure.</p>"
)

Additional Interpretation from Visualizations

Monthly pattern across time periods: The heatmap provides a clearer view of how mean speed varies jointly across months and time periods. Rather than following a uniform trend, speed levels differ by both temporal dimensions. Nighttime consistently shows higher average speeds across most months, while daytime and evening peak periods tend to exhibit relatively lower values. This indicates that the effect of month is not uniform and is influenced by underlying traffic conditions within each time period.

Speed and variability relationship: The relationship between mean speed and standard deviation highlights differences in traffic stability across conditions. Periods with higher mean speeds are not always associated with lower variability. In some cases, higher speeds are accompanied by increased dispersion, suggesting less uniform traffic flow. Conversely, certain lower speed conditions show relatively tighter distributions, indicating more stable movement. This distinction is important for understanding how speed and variability interact under different operational contexts.

Differences between weekday and weekend conditions: When comparing weekday and weekend patterns within the relationship plot, weekend conditions generally show slightly higher mean speeds with comparable or lower variability in some time periods. This suggests relatively smoother traffic conditions during weekends, likely due to reduced demand. However, variability patterns remain dependent on the specific time period, indicating that both demand and temporal factors jointly influence traffic behavior.

Implications for traffic behavior: The additional visualizations provide deeper insight beyond average comparisons by capturing how speed patterns vary jointly across time and how they relate to variability. The presence of higher speeds alongside increased variability in certain conditions may indicate inconsistent driver behavior, which has implications for safety and speed management. These findings support the need for time specific strategies that consider both average speed and its dispersion rather than relying on a single summary measure.