Planning Alerts Project

Author

Bazil Zafar

Company’s Introduction

Planning Alerts offers a subscription service that notifies customers of Irish planning applications that affect them. Customers outline their preferred locations, and Planning Alerts sends targeted alerts for any new, updated, or decided applications. The service gives customers a full heads-up without requiring them to trawl through countywide lists or newspapers. Alerts arrive by email, app, or SMS text message, saving customers significant time. You can rea…

This report explores trends and behaviors from website usage data to provide actionable insights.

# Load necessary libraries
library(tidyverse)
library(ggplot2)
library(kableExtra)
library(knitr)
library(lubridate)
# Import data and format datetime fields
pa_data <- read_csv("planning_alerts_data.csv", show_col_types = FALSE) %>%
  mutate(
    tfc_stamped_dt = parse_date_time(tfc_stamped, orders = c("dmy HMS", "ymd HMS", "ymd HM", "dmy HM"), quiet = TRUE)
  ) %>%
  filter(!is.na(tfc_stamped_dt)) %>%  # Ensure valid timestamps
  select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
  rename(tfc_stamped = tfc_stamped_dt) %>%
  mutate(
    day = wday(tfc_stamped, label = TRUE, abbr = FALSE),
    hour = hour(tfc_stamped),
    week = floor_date(tfc_stamped, "week"),
    month = floor_date(tfc_stamped, "month")
  )

1. Users by Device Type

device_users <- pa_data %>%
  group_by(tfc_device_type) %>%
  summarize(users = n_distinct(tfc_cookie), .groups = "drop")

ggplot(device_users, aes(x = reorder(tfc_device_type, -users), y = users, fill = tfc_device_type)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  geom_text(aes(label = users), vjust = -0.5, size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Users by Device Type",
    x = "Device Type",
    y = "Number of Users"
  ) +
  theme_minimal(base_size = 12)

Desktop users are the largest group, indicating a need to optimize the desktop experience. The presence of mobile-browser and app users suggests that mobile-friendly features also merit investment.

2. Sessions by Day of the Week

sessions_by_day <- pa_data %>%
  group_by(day, tfc_device_type) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = "drop")

ggplot(sessions_by_day, aes(x = day, y = sessions, fill = tfc_device_type)) +
  geom_bar(stat = "identity", position = "dodge") +
  geom_text(aes(label = sessions), position = position_dodge(width = 0.9), vjust = -0.5, size = 3) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Sessions by Day of the Week",
    x = "Day",
    y = "Number of Sessions",
    fill = "Device Type"
  ) +
  theme_minimal(base_size = 12)

Sessions peak on weekdays, particularly from desktop users, suggesting targeted weekday campaigns may be more effective.

3. Average Session Length

session_length <- pa_data %>%
  group_by(tfc_session) %>%
  summarize(
    session_duration = as.numeric(difftime(max(tfc_stamped), min(tfc_stamped), units = "mins"))
  ) %>%
  summarize(
    avg_session_length = mean(session_duration, na.rm = TRUE),
    median_session_length = median(session_duration, na.rm = TRUE)
  )

session_length %>%
  kable(
    caption = "Average Session Length (Minutes)",
    col.names = c("Average Duration", "Median Duration"),
    format.args = list(digits = 2)
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Average Session Length (Minutes)

Average Duration   Median Duration
----------------   ---------------
            2193                 0

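The mean session duration is approximately 2193 minutes (about 36.5 hours), while the median is 0 minutes. A median of 0 means at least half of all sessions have identical first and last timestamps, which suggests most users browse quickly. The mean is inflated by a minority of sessions whose first and last hits are far apart, so the median is the more representative figure here.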
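The gap between a mean and a median on skewed durations can be illustrated with a small sketch. The durations below are hypothetical values chosen for illustration, not figures from the Planning Alerts data, and the code uses base R only.

```r
# Hypothetical session durations in minutes (assumed values, not report data):
# eight single-hit sessions and two sessions whose hits are days apart.
session_duration <- c(0, 0, 0, 0, 0, 0, 0, 0, 4320, 10080)

mean_duration   <- mean(session_duration)    # pulled upward by the two long sessions
median_duration <- median(session_duration)  # unaffected by the two long sessions

# Share of sessions with zero duration (first and last timestamps identical)
share_zero <- mean(session_duration == 0)

cat("mean:", mean_duration, "| median:", median_duration,
    "| share with zero duration:", share_zero, "\n")
```

Reporting the share of zero-duration sessions alongside the median is one way to show how much of the mean comes from a few long sessions.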

4. Users by Hour of the Day

# pa_data was loaded in the import step and already has an `hour` column,
# so the file does not need to be read again here.

# Sanity check: the aggregation below needs `hour` and `tfc_cookie`
missing_cols <- setdiff(c("hour", "tfc_cookie"), names(pa_data))
if (length(missing_cols) > 0) {
  stop("Missing required column(s): ", paste(missing_cols, collapse = ", "))
}

# Aggregate unique users by hour
users_by_hour <- pa_data %>%
  group_by(hour) %>%
  summarize(users = n_distinct(tfc_cookie), .groups = "drop")

# Sanity check: a line chart needs at least two hours to show a trend
if (nrow(users_by_hour) < 2) {
  warning("users_by_hour has fewer than two rows; the chart cannot show an hourly trend.")
}

# Plot users by hour
ggplot(users_by_hour, aes(x = hour, y = users, group = 1)) +
  geom_line(color = "blue", linewidth = 1) +
  geom_point(size = 2, color = "blue") +
  geom_text(aes(label = users), vjust = -0.5, size = 3) +
  labs(
    title = "Users by Hour of the Day",
    x = "Hour",
    y = "Number of Users"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    panel.grid.major = element_line(color = "gray", linetype = "dashed", linewidth = 0.5),
    panel.grid.minor = element_blank()
  )
`geom_line()`: Each group consists of only one observation.
ℹ Do you need to adjust the group aesthetic?

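The `geom_line()` message above indicates that users_by_hour contained only one row, meaning every parsed timestamp fell in the same hour. As rendered, the chart therefore cannot show an hourly pattern, and the claim that user activity peaks during afternoon hours (and that afternoons are an optimal time for content releases) needs to be confirmed once the hour values have been verified.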
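Before reading a trend from an hourly chart, it may help to confirm how many timestamps parsed and how many distinct hours they cover. The following is a minimal sketch in base R on hypothetical strings. The day/month/year format is an assumption, because the raw format of tfc_stamped is not shown in this report, and `as.POSIXct()` is swapped in for `parse_date_time()` to keep the sketch dependency-free.

```r
# Hypothetical raw timestamps in day/month/year order; the real format of
# tfc_stamped is not shown in the report, so this format is an assumption.
raw_stamps <- c("03/01/2023 09:15:00", "03/01/2023 14:40:10",
                "04/01/2023 14:05:00", "not a date")

# Strings that do not match the format become NA instead of raising an error
parsed <- as.POSIXct(raw_stamps, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Count parse failures and tabulate the hours that did parse
n_failed    <- sum(is.na(parsed))
hour_value  <- as.integer(format(parsed[!is.na(parsed)], "%H"))
hour_counts <- table(hour_value)

cat("failed to parse:", n_failed, "\n")
print(hour_counts)

# A single distinct hour means an hourly line chart carries no information
if (length(hour_counts) < 2) {
  message("Only one distinct hour found; an hourly line chart would be uninformative.")
}
```

Running the same two checks on the real column would show whether the single-point chart comes from the data itself or from how the timestamps were parsed.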

5. Most Common Session Journeys

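common_journeys <- pa_data %>%
  group_by(tfc_session) %>%
  summarize(
    # unique() keeps the first occurrence of each screen in row order, so
    # repeat visits are collapsed and the journey follows the row order of pa_data
    journey = paste(unique(tfc_full_url_screen), collapse = " -> ")
  ) %>%
  count(journey, sort = TRUE) %>%
  # with_ties = FALSE caps the result at exactly five rows
  slice_max(n, n = 5, with_ties = FALSE)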

common_journeys %>%
  kable(
    caption = "Top 5 Most Common Session Journeys",
    col.names = c("Journey", "Number of Sessions")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed"))

Top 5 Most Common Session Journeys

Journey                   Number of Sessions
-----------------------   ------------------
application                           175555
list                                   38103
applicationmob                         33225
map                                    19615
applicationmob -> list                  3515

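The most common journeys are single-screen sessions: application (175,555 sessions), list (38,103), applicationmob (33,225), and map (19,615). The only multi-screen journey in the top five is applicationmob -> list (3,515 sessions). Most sessions therefore appear to land directly on one screen and end there, which emphasizes the importance of making those landing screens effective and of streamlining the path from application details to the list view.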
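A journey string reflects the visit only if the hits are in time order. The sketch below, on hypothetical hits that reuse screen names from the table above, sorts by timestamp first and then collapses consecutive repeats with `rle()`. This is a swapped-in alternative to `unique()`, not the method used in this report, and the column names `session`, `stamp`, and `screen` are illustrative.

```r
# Hypothetical page hits, deliberately out of time order (assumed data)
hits <- data.frame(
  session = c("s1", "s1", "s1", "s2", "s2", "s3"),
  stamp   = as.POSIXct(c("2023-01-03 10:02:00", "2023-01-03 10:00:00",
                         "2023-01-03 10:01:00", "2023-01-03 11:00:00",
                         "2023-01-03 11:05:00", "2023-01-03 12:00:00"),
                       tz = "UTC"),
  screen  = c("list", "applicationmob", "applicationmob",
              "application", "application", "application"),
  stringsAsFactors = FALSE
)

# Sort by session and timestamp so each journey follows the order of the visit
hits <- hits[order(hits$session, hits$stamp), ]

# Collapse consecutive repeats of the same screen, then join with arrows
journey_of <- function(screens) paste(rle(screens)$values, collapse = " -> ")
journeys <- tapply(hits$screen, hits$session, journey_of)

# Count sessions per journey, most common first
journey_counts <- sort(table(journeys), decreasing = TRUE)
print(journey_counts)
```

Unlike `unique()`, `rle()` keeps a later return to an earlier screen (for example list -> application -> list), which may or may not be wanted depending on how a journey is defined.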

6. Referrals by Source

referral_sources <- pa_data %>%
  mutate(
    referrer_category = case_when(
      str_detect(tfc_referrer, "google|bing") ~ "Search Engine",
      str_detect(tfc_referrer, "facebook|instagram|linkedin") ~ "Social Media",
      is.na(tfc_referrer) ~ "Direct/Unknown",
      TRUE ~ "Other"
    )
  ) %>%
  group_by(referrer_category) %>%
  summarize(users = n_distinct(tfc_cookie), .groups = "drop")

ggplot(referral_sources, aes(x = reorder(referrer_category, -users), y = users, fill = referrer_category)) +
  geom_bar(stat = "identity", show.legend = FALSE) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Referrals by Source",
    x = "Source",
    y = "Number of Users"
  ) +
  theme_minimal()

Most users are referred by search engines, indicating the importance of SEO.
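The referrer categories depend on pattern matching, so it can be worth checking the rules on a few known values. The sketch below is a base R approximation of the same precedence (search engine, then social media, then missing, then other). The referrer strings are hypothetical, `classify_referrer` is an illustrative helper name, and the case-insensitive matching is an added assumption that referrer hosts may appear in mixed case.

```r
# Hypothetical referrer values; NA stands for a hit with no referrer recorded
referrer <- c("https://www.google.com/", "https://WWW.BING.COM/search",
              "https://m.facebook.com/", NA, "https://example.org/news")

# Illustrative helper: later assignments overwrite earlier ones, so search
# engines take precedence over social media, as in the case_when() above.
classify_referrer <- function(x) {
  category <- rep("Other", length(x))
  category[grepl("facebook|instagram|linkedin", x, ignore.case = TRUE)] <- "Social Media"
  category[grepl("google|bing", x, ignore.case = TRUE)] <- "Search Engine"
  category[is.na(x)] <- "Direct/Unknown"  # grepl() returns FALSE for NA input
  category
}

print(table(classify_referrer(referrer)))
```

Plain substring patterns such as "bing" can also match unrelated hosts, so anchoring the patterns to the host name may be worth considering for the real data.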

7. Repeat vs One-Time Visitors

visitor_type <- pa_data %>%
  group_by(tfc_cookie) %>%
  summarize(total_sessions = n_distinct(tfc_session)) %>%
  mutate(visitor_type = ifelse(total_sessions > 1, "Repeat Visitor", "One-Time Visitor")) %>%
  count(visitor_type)

ggplot(visitor_type, aes(x = "", y = n, fill = visitor_type)) +
  geom_bar(stat = "identity", width = 1) +
  coord_polar("y") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Repeat vs. One-Time Visitors",
    fill = "Visitor Type"
  ) +
  theme_minimal()

Repeat visitors account for approximately 7.21% of all visitors, so the large majority visit only once. This suggests that loyalty strategies could be an effective way to drive engagement.
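The pie chart shows the split but not the percentage itself. The sketch below shows one way such a share could be computed, using base R on hypothetical cookie and session pairs. The column names `cookie` and `session` and all values are illustrative, not taken from the dataset.

```r
# Hypothetical cookie/session pairs, one row per page hit (assumed data)
hits <- data.frame(
  cookie  = c("a", "a", "a", "b", "c", "c", "d"),
  session = c("s1", "s1", "s2", "s3", "s4", "s4", "s5"),
  stringsAsFactors = FALSE
)

# Number of distinct sessions recorded for each cookie
sessions_per_cookie <- tapply(hits$session, hits$cookie,
                              function(s) length(unique(s)))

# A repeat visitor is a cookie seen in more than one session
repeat_share <- 100 * mean(sessions_per_cookie > 1)

cat("repeat visitors:", repeat_share, "% of", length(sessions_per_cookie), "visitors\n")
```

Because a visitor is identified by cookie, anyone who clears cookies or switches devices is counted as a new one-time visitor, so a share computed this way is likely a lower bound.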

Conclusion

This analysis provides actionable insights, such as the importance of SEO, optimizing peak usage times, and focusing on popular applications for marketing.

The findings highlight the importance of tailoring content and features to device-specific preferences, as well as focusing on peak usage times for targeted campaigns. Based on this analysis, it is recommended to enhance website design, prioritize mobile optimization, and allocate resources strategically to capitalize on high-traffic periods.