Planning Alerts Data Visualisation and Analysis

Author

Esther John Bassey

1 Introduction

Planning Alerts offers a subscription service that notifies customers of Irish planning applications that affect them. Customers outline their preferred locations, and PlanningAlerts.ie sends targeted alerts for any new, updated or decided applications. Customers get advance notice without trawling through countywide lists or newspapers and without checking back for updates, because alerts arrive by email, app or SMS text message. The various packages that Planning Alerts offers are described on its about page.

Import the planning_alerts_data.csv file and create a new field, tfc_stamped_dt, that holds the tfc_stamped datetime converted to the format YYYY-MM-DD HH:MM:SS. Then drop the original tfc_stamped field and rename the new field to tfc_stamped.
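
As a quick illustration of the conversion step, the sketch below parses two made-up timestamps with `dmy_hm()` and prints them in the YYYY-MM-DD HH:MM:SS layout. The sample strings are hypothetical; they only assume the day/month/year hour:minute order implied by the import code, and the real file may use a different separator.

```r
library(lubridate)

# Hypothetical raw values in day/month/year hour:minute order (not taken from the data)
raw_stamps <- c("09/06/2024 14:35", "31/08/2024 23:59")

# dmy_hm() returns POSIXct values in UTC unless a tz argument is supplied
parsed <- dmy_hm(raw_stamps)

# Render the parsed values in the target layout
formatted <- format(parsed, "%Y-%m-%d %H:%M:%S")
print(formatted)
```

If the source timestamps were recorded in local Irish time, passing `tz = "Europe/Dublin"` to `dmy_hm()` may be more appropriate; that is an assumption to check against how the data was collected.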

library(tidyverse)
library(grid)
library(lubridate)
library(gridExtra) # Contains the grid.arrange function for laying out several plots in the same plot window.
library(scales)
library(cluster)
library(kableExtra) # Styling helpers for tables produced with knitr::kable.

2 Analysis

plan <- read_csv("planning_alerts_data.csv") %>%
  mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
  select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
  rename(tfc_stamped = tfc_stamped_dt)
weekly_sesh <- plan %>%
  mutate(week = floor_date(tfc_stamped, unit = "week")) %>%
  group_by(week) %>%
  summarise(
    seshperwk = n_distinct(tfc_session), .groups = "drop"
  )
swk_avg = mean(weekly_sesh$seshperwk)
print(swk_avg)
[1] 23642.25
month_sesh <- plan %>%
  mutate(month = floor_date(tfc_stamped, unit = "month")) %>%
  group_by(month) %>%
  summarise(
    seshpermt = n_distinct(tfc_session), .groups = "drop"
  )

smt_avg = mean(month_sesh$seshpermt)
print(smt_avg)
[1] 94553
day_sesh <- plan %>%
  mutate(day = floor_date(tfc_stamped, unit = "day")) %>%
  group_by(day) %>%
  summarise(
    seshperdy = n_distinct(tfc_session), .groups = "drop"
  )

hour_sesh <- plan %>%
  mutate(hour = hour(tfc_stamped)) %>%
  group_by(hour) %>%
  summarise(
    seshperhr = n_distinct(tfc_session), .groups = "drop"
  )

shr_avg = mean(hour_sesh$seshperhr)
print(shr_avg)
[1] 12101
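
The session counts above rely on `n_distinct(tfc_session)` rather than `n()`, because the log holds one row per page view. The sketch below shows the difference on a small, hypothetical log; the column names follow the data set, but the values are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical page-view log: session "a" has two page views in the same week
toy <- tibble(
  tfc_stamped = ymd_hm(c("2024-06-10 09:00", "2024-06-12 10:30", "2024-06-17 08:15")),
  tfc_session = c("a", "a", "b")
)

toy_weekly <- toy %>%
  mutate(week = floor_date(tfc_stamped, unit = "week")) %>%
  group_by(week) %>%
  summarise(
    page_views = n(),                     # rows in the log
    sessions   = n_distinct(tfc_session), # unique sessions
    .groups = "drop"
  )

print(toy_weekly)
```

In the first toy week the two page views belong to one session, so `page_views` is 2 while `sessions` is 1.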
knitr::kable(month_sesh,
             digits = c(0,2),
             align = "lr",
             col.names=c("Month", "Monthly Sessions"))
Month Monthly Sessions
2024-06-01 51020
2024-07-01 115899
2024-08-01 116740
knitr::kable(weekly_sesh,
             digits = c(0,2),
             align = "lr",
             col.names=c("Week", "Weekly Sessions"))
Week Weekly Sessions
2024-06-09 5433
2024-06-16 21242
2024-06-23 23024
2024-06-30 19614
2024-07-07 19866
2024-07-14 29602
2024-07-21 29745
2024-07-28 32042
2024-08-04 26181
2024-08-11 25652
2024-08-18 32779
2024-08-25 18527
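
The first week in the table above is far lower than the rest, which suggests it may cover only part of a week. As a rough sensitivity check, the sketch below recomputes the weekly average with and without that week, using the counts from the table. Treating the first week as incomplete is an assumption, not something the data confirms.

```r
# Weekly session counts copied from the table above
weekly_sessions <- c(5433, 21242, 23024, 19614, 19866, 29602,
                     29745, 32042, 26181, 25652, 32779, 18527)

# Average over all twelve weeks (matches the reported swk_avg)
mean_all <- mean(weekly_sessions)

# Assumption: the week starting 2024-06-09 is incomplete, so drop it
mean_without_first <- mean(weekly_sessions[-1])

print(mean_all)
print(round(mean_without_first, 2))
```

The same check could be applied to the monthly figures, where June is also much lower than July and August.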
ggplot(weekly_sesh) +
  geom_line(mapping = aes(y = seshperwk, x = week), colour = "purple")+
  ylab("Weekly Sessions")+
  xlab("Dates")

ggplot(day_sesh) +
  geom_line(mapping = aes(y = seshperdy, x = day), colour = "orange")+
  ylab("Daily Sessions")+
  xlab("Dates by Day")

ggplot(data = hour_sesh, aes(x = hour, y = seshperhr)) +
  geom_line(colour = "blue", linewidth = 1) +
  geom_point(colour = "orange", size = 1) +
  scale_x_continuous(breaks = 0:23) +
  labs(
    title = "Sessions by Hour of the Day",
    x = "Hour of the Day",
    y = "Number of Sessions"
  ) +
  theme_minimal()

# Count the number of page views recorded for each device type
device_type_tab <- plan %>%
  count(tfc_device_type, name = "visits")
users_by_device <- plan %>%
  group_by(tfc_device_type) %>%
  summarise(user_count = n_distinct(tfc_cookie))

ggplot(users_by_device, aes(x = tfc_device_type, y = user_count, fill = tfc_device_type)) +
  geom_bar(stat = "identity") +
  labs(title = "Users by Device Type", x = "Device Type", y = "Unique Users") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1), 
        legend.position = "none") + # Removes the side legend
  scale_x_discrete(labels = function(x) stringr::str_wrap(x, width = 10)) # Wraps labels
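
To compare device types on a common scale, the user counts can be converted to percentage shares. The sketch below does this on hypothetical counts; the device labels and numbers are placeholders, not values from the data set.

```r
library(dplyr)

# Hypothetical counts per device type (placeholders, not results from the data)
toy_devices <- tibble(
  tfc_device_type = c("Desktop", "Mobile", "App"),
  user_count      = c(40, 50, 10)
)

device_share <- toy_devices %>%
  mutate(share_pct = 100 * user_count / sum(user_count)) %>% # share of all users
  arrange(desc(share_pct))                                   # largest share first

print(device_share)
```

Applied to `users_by_device`, the same two steps would give the share behind each bar. Note that a cookie seen on more than one device type would be counted once per type.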

visit_frequency <- plan %>%
  group_by(tfc_cookie) %>%
  summarise(session_count = n_distinct(tfc_session)) %>%  # Count sessions, not page views
  mutate(visitor_type = ifelse(session_count == 1, "Once-off", "Repeat"))

ggplot(visit_frequency, aes(x = visitor_type, y = session_count)) +
  geom_violin(fill = "purple", alpha = 0.6) +
  geom_jitter(width = 0.2, alpha = 0.5, color = "green") +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Distribution of Session Counts by Visitor Type",
       x = "Visitor Type", 
       y = "Session Count")

# Count the number of visits for each planning application
most_visited_application <- plan %>%
  filter(!is.na(tfc_application_reference)) %>%  # Ensure we're only looking at valid applications
  group_by(tfc_application_reference) %>%
  summarise(visit_count = n(), .groups = "drop") %>%  # Count visits
  arrange(desc(visit_count)) %>%  # Sort by visit count descending
  slice(1)  # Get the most visited application

# Print the result
print(most_visited_application)
# A tibble: 1 × 2
  tfc_application_reference        visit_count
  <chr>                                  <int>
1 57FA8F202310B985459A2790F316SDDE        7841
# Get the top N most visited applications (e.g., top 5)
top_applications <- plan %>%
  filter(!is.na(tfc_application_reference)) %>%
  group_by(tfc_application_reference) %>%
  summarise(visit_count = n(), .groups = "drop") %>%
  arrange(desc(visit_count)) %>%
  slice_head(n = 5)  # Change n to get more or fewer

# Create a bar plot for the top applications
ggplot(top_applications, aes(x = reorder(tfc_application_reference, visit_count), y = visit_count)) +
  geom_bar(stat = "identity", fill = "purple") +
  coord_flip() +  # Flip coordinates for better readability
  labs(title = "Top 5 Most Visited Planning Applications",
       x = "Planning Application Reference",
       y = "Visit Count")

journeys <- plan %>%
  filter(!is.na(tfc_session)) %>%
  arrange(tfc_session, tfc_stamped) %>%  # Put each session's page views in time order
  group_by(tfc_session) %>%
  summarise(journey = paste(tfc_full_url, collapse = " -> "), .groups = "drop")

# View the top journeys
top_journeys <- journeys %>%
  count(journey, sort = TRUE) %>%
  slice_max(n, n = 10)  # Get the top 10 journeys (ties are kept)
# Print the top journeys
print(top_journeys)
# A tibble: 10 × 2
   journey                                                      n
   <chr>                                                    <int>
 1 application?pref=57FA8F202310B985459A2790F316SDDE01B113   7715
 2 /application?pref=5D452C202003B079FF6E253E2004LM6306D46A  5877
 3 /list?cref=lh5D500F18AB5E270D2333073F0100196B             2614
 4 /list?ref=12                                              2412
 5 /list?cref=rn5D45241C2CAEB9B22333073EF6000B8B             2193
 6 application?pref=57FA8A20230579EB04D427827F52KE572E83F2   2131
 7 /list?cref=mo55F54EB34B0E74922333073EF4000A82             2076
 8 /list?cref=ww57FA9B78FE3B87DA2333073F04001EEC             2011
 9 /list?cref=ck55DFE714444FC8C72333073F09002145             2002
10 /application?pref=555AA620160695A5C0002480989FCK0BFB82D1  1976
# Create a bar plot for the top journeys
ggplot(top_journeys, aes(x = reorder(journey, n), y = n)) +
  geom_bar(stat = "identity", fill = "orange") +
  coord_flip() +  # Flip for better readability
  labs(title = "Top 10 Common User Journeys",
       x = "User Journey",
       y = "Count of Sessions")
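
A journey string is only meaningful if each session's page views are in time order, and repeated views of the same page (for example, a refresh) can inflate it. The sketch below builds journeys on a hypothetical, deliberately unordered log; the URLs and timestamps are invented, and dropping consecutive duplicates is an optional refinement rather than part of the analysis above.

```r
library(dplyr)
library(lubridate)

# Hypothetical log, deliberately out of time order
toy <- tibble(
  tfc_session  = c("s1", "s1", "s1", "s2"),
  tfc_stamped  = ymd_hm(c("2024-07-01 10:05", "2024-07-01 10:00",
                          "2024-07-01 10:06", "2024-07-01 11:00")),
  tfc_full_url = c("/application?pref=A", "/list?ref=12",
                   "/application?pref=A", "/list?ref=12")
)

toy_journeys <- toy %>%
  arrange(tfc_session, tfc_stamped) %>%  # time order within each session
  group_by(tfc_session) %>%
  # Drop a page view when it repeats the previous URL in the same session
  filter(is.na(lag(tfc_full_url)) | tfc_full_url != lag(tfc_full_url)) %>%
  summarise(journey = paste(tfc_full_url, collapse = " -> "), .groups = "drop")

print(toy_journeys)
```

Session s1 starts on the list page and then opens an application, so its journey reads from list to application even though the rows arrived in a different order.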

# Analyze sessions by day of the week
daily_usage <- plan %>%
  mutate(day = wday(tfc_stamped, label = TRUE)) %>%
  group_by(day) %>%
  summarise(sessions = n_distinct(tfc_session), .groups = "drop")

# Visualization: Sessions by Day of the Week
ggplot(daily_usage, aes(x = day, y = sessions, fill = day)) +
  geom_bar(stat = "identity") +
  labs(title = "Peak Usage by Day of the Week", 
       x = "Day of the Week", 
       y = "Number of Sessions") +
  theme_minimal() +
  theme(legend.position = "none")

# Identify top 10 days with the highest session counts
top_usage_days <- plan %>%
  mutate(date = as_date(tfc_stamped)) %>%
  group_by(date) %>%
  summarise(sessions = n_distinct(tfc_session), .groups = "drop") %>%
  arrange(desc(sessions)) %>%
  slice_head(n = 10)

# Display results
print(top_usage_days)
# A tibble: 10 × 2
   date       sessions
   <date>        <int>
 1 2024-08-20     6209
 2 2024-07-23     5720
 3 2024-07-31     5641
 4 2024-07-22     5567
 5 2024-08-26     5535
 6 2024-07-17     5525
 7 2024-07-18     5473
 8 2024-08-06     5419
 9 2024-08-21     5250
10 2024-07-24     5064
# Visualization: Top 10 Days
ggplot(top_usage_days, aes(x = reorder(date, sessions), y = sessions, fill = date)) +
  geom_bar(stat = "identity") +
  coord_flip() + # Horizontal bars for readability
  labs(title = "Top 10 Dates with Highest Usage", 
       x = "Date", 
       y = "Number of Sessions") +
  theme_minimal() +
  theme(legend.position = "none")

# Weekly session averages with peaks highlighted
weekly_usage <- plan %>%
  mutate(week = floor_date(tfc_stamped, unit = "week")) %>%
  group_by(week) %>%
  summarise(sessions = n_distinct(tfc_session), .groups = "drop") %>%
  mutate(peak = ifelse(sessions == max(sessions), "Peak", "Normal"))

# Visualization: Weekly Usage
ggplot(weekly_usage, aes(x = week, y = sessions, color = peak, group = 1)) +  # group = 1 keeps one continuous line
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  scale_color_manual(values = c("Peak" = "orange", "Normal" = "blue")) +
  labs(title = "Weekly Usage Trends with Peaks Highlighted", 
       x = "Week", 
       y = "Number of Sessions", 
       color = "Usage") +
  theme_minimal()

# Create a heatmap dataset
heatmap_data <- plan %>%
  mutate(hour = hour(tfc_stamped), day = wday(tfc_stamped, label = TRUE)) %>%
  group_by(day, hour) %>%
  summarise(session_count = n_distinct(tfc_session), .groups = "drop")

# Heatmap visualization
ggplot(heatmap_data, aes(x = hour, y = day, fill = session_count)) +
  geom_tile(color = "white") +
  scale_fill_gradient(low = "purple", high = "orange") +
  labs(title = "Heatmap of Session Activity",
       x = "Hour of Day",
       y = "Day of Week",
       fill = "Sessions") +
  theme_minimal()
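
If any day and hour combination has no sessions, it is absent from the grouped data and appears as a gap in the heatmap. One possible way to fill such gaps with zeros is `tidyr::complete()`, sketched below on hypothetical data with two days and two hours.

```r
library(dplyr)
library(tidyr)

# Hypothetical heatmap data with one missing combination (Tue, hour 10)
toy_heat <- tibble(
  day           = factor(c("Mon", "Mon", "Tue"), levels = c("Mon", "Tue")),
  hour          = c(9L, 10L, 9L),
  session_count = c(5, 3, 4)
)

# Expand to every day/hour pair and fill the missing counts with zero
toy_full <- toy_heat %>%
  complete(day, hour = 9:10, fill = list(session_count = 0))

print(toy_full)
```

For the full data set the hour range would be `0:23`; whether any gaps exist depends on the data.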

# Add week and user type columns: a visitor is "New" in the first week they appear and "Repeat" afterwards
plan_weekly <- plan %>%
  mutate(week = floor_date(tfc_stamped, "week")) %>%
  group_by(tfc_cookie) %>%
  mutate(visitor_type = ifelse(week == min(week), "New", "Repeat")) %>%
  group_by(week, visitor_type) %>%
  summarise(user_count = n_distinct(tfc_cookie), .groups = "drop")

# Visualization of weekly retention
ggplot(plan_weekly, aes(x = week, y = user_count, color = visitor_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "User Retention by Week",
       x = "Week",
       y = "Number of Users",
       color = "Visitor Type") +
  theme_minimal()
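
One way to label visitors is to treat a cookie as "New" in the first week it appears and "Repeat" in every later week. The sketch below applies that rule to a hypothetical log of three cookies; the cookie IDs and timestamps are invented for illustration.

```r
library(dplyr)
library(lubridate)

# Hypothetical log: u1 returns in a second week, u2 visits twice in one week, u3 arrives later
toy <- tibble(
  tfc_cookie  = c("u1", "u1", "u2", "u2", "u3"),
  tfc_stamped = ymd_hm(c("2024-06-10 09:00", "2024-06-18 09:00",
                         "2024-06-11 12:00", "2024-06-12 12:00",
                         "2024-06-19 15:00"))
)

toy_retention <- toy %>%
  mutate(week = floor_date(tfc_stamped, "week")) %>%
  group_by(tfc_cookie) %>%
  # The first week a cookie is seen counts as "New"; later weeks count as "Repeat"
  mutate(visitor_type = if_else(week == min(week), "New", "Repeat")) %>%
  group_by(week, visitor_type) %>%
  summarise(user_count = n_distinct(tfc_cookie), .groups = "drop")

print(toy_retention)
```

Here the first week has two new users, and the second week has one new user (u3) and one repeat user (u1). Cookies first seen before the data starts would still be labelled "New", which is a limitation of any log with a fixed start date.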

# Add month and user type columns: a visitor is "New" in the first month they appear and "Repeat" afterwards
plan_monthly <- plan %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  group_by(tfc_cookie) %>%
  mutate(visitor_type = ifelse(month == min(month), "New", "Repeat")) %>%
  group_by(month, visitor_type) %>%
  summarise(user_count = n_distinct(tfc_cookie), .groups = "drop")

# Visualization of monthly retention
ggplot(plan_monthly, aes(x = month, y = user_count, color = visitor_type)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  labs(title = "User Retention by Month",
       x = "Month",
       y = "Number of Users",
       color = "Visitor Type") +
  theme_minimal()

# Create a dataframe with user retention information
user_retention <- plan %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  group_by(month, tfc_cookie) %>%
  summarise(session_count = n_distinct(tfc_session), .groups = "drop") %>%  # Sessions per user per month, not page views
  group_by(month) %>%
  summarise(total_users = n_distinct(tfc_cookie),
            returning_users = sum(session_count > 1), # If a user has more than 1 session in a month, they are a returning user
            retention_rate = (returning_users / total_users) * 100)

# Visualization of retention rate by month
ggplot(user_retention, aes(x = month, y = retention_rate)) +
  geom_line(linewidth = 1, color = "purple") +
  geom_point(size = 2, color = "orange") +
  labs(title = "Monthly Retention Rate",
       x = "Month",
       y = "Retention Rate (%)") +
  theme_minimal()
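
The rate above measures users with more than one session within a month. An alternative definition, sketched below as an assumption rather than the method used in this report, is month-over-month retention: the share of one month's users who come back in the following month. The cookies and dates are hypothetical.

```r
library(dplyr)
library(lubridate)

# Hypothetical visits: only u1 returns in consecutive months
toy <- tibble(
  tfc_cookie  = c("u1", "u2", "u1", "u3", "u1"),
  tfc_stamped = ymd(c("2024-06-15", "2024-06-20", "2024-07-02",
                      "2024-07-10", "2024-08-05"))
)

# One row per user per month in which they were active
user_months <- toy %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  distinct(tfc_cookie, month)

# Shift each row forward one month and keep users who were active again in that month
carried_over <- user_months %>%
  mutate(month = month %m+% months(1)) %>%
  semi_join(user_months, by = c("tfc_cookie", "month")) %>%
  count(month, name = "returning_users")

# Users active in the previous month, aligned to the month being measured
previous_totals <- user_months %>%
  count(month, name = "previous_users") %>%
  mutate(month = month %m+% months(1))

monthly_retention <- carried_over %>%
  left_join(previous_totals, by = "month") %>%
  mutate(retention_rate = 100 * returning_users / previous_users)

print(monthly_retention)
```

In the toy data, one of the two June users returns in July and one of the two July users returns in August, giving 50% for both months.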

3 Interpretation

The analysis segments sessions by hour, day, week and month to show when users engage with planning applications. Weekly sessions average 23,642 and monthly sessions average 94,553, but both figures appear to be pulled down by partial periods: the first week in the data (starting 2024-06-09) records only 5,433 sessions, and June records 51,020 against 115,899 in July and 116,740 in August. Weekly sessions peak at 32,779 in the week starting 2024-08-18, and the busiest single day is 2024-08-20 with 6,209 sessions. All ten of the busiest days fall between Monday and Thursday in July and August, which points to weekday peaks that may coincide with particular planning applications or campaigns. The most visited application (reference 57FA8F202310B985459A2790F316SDDE, 7,841 visits) and the most common journeys, which are dominated by single application and list pages, show what attracts the most attention. The distribution of users by device type also suggests that mobile access is a significant contributor to engagement. Taken together, the results indicate when and where to focus effort for the greatest impact.

4 Recommendation

Based on the session frequency and trends, a few targeted strategies are recommended. First, use the peak days and hours identified to time planning alerts: sending notifications during high-engagement windows, such as the busiest weekdays, can increase visibility and response rates. Second, prioritise the mobile user experience, as sessions from mobile devices are significantly higher; quicker load times and intuitive navigation could improve engagement and retention. Third, use the top user journeys to create personalised experiences that guide users more directly toward relevant planning applications. Lastly, given the high session counts in July and August, campaigns or updates aligned with these months could drive more interaction and conversions. Together, these adjustments would make the service more responsive to how users actually behave.
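
To turn the peak-timing recommendation into concrete send windows, the day and hour combinations can be ranked by session count. The sketch below does this on hypothetical heatmap-style data; the counts are invented, and choosing the top three windows is an arbitrary cut-off for illustration.

```r
library(dplyr)

# Hypothetical day/hour session counts (not results from the data)
toy_heat <- tibble(
  day           = c("Mon", "Tue", "Tue", "Wed", "Sat"),
  hour          = c(9L, 10L, 14L, 10L, 11L),
  session_count = c(120, 310, 280, 295, 60)
)

# Keep the three busiest windows, highest first
peak_windows <- toy_heat %>%
  slice_max(session_count, n = 3)

print(peak_windows)
```

Applied to `heatmap_data`, the same call would list the windows with the most sessions. Whether alerts sent in those windows actually perform better would still need to be tested.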