Data Analytics Assignment 3: An Analysis of Planning Alerts

Author

Akhil Hobby (C00313522)

Introduction

The assignment on PlanningAlerts.ie, a service that notifies users about the planning application updates across Ireland, primarily involves analysing the website usage dataset. By carefully studying the usage metrics, the data analysis assists in offering significant insights into user behaviour and also helps in determining necessary marketing strategies beneficial for the success of the business. Through an in-depth study of these metrics, the final report aims to deliver valuable recommendations on optimising the platform’s outreach, to Brendan Cunningham, the CEO.

# Load necessary libraries

library(tidyverse)
library(dplyr)
library(ggplot2)
library(lubridate)
library(knitr)  
library(kableExtra)  
#Import the planning_alerts_data.csv file and create a new field called tfc_stamped_dt which contains a converted version of the tfc_stamped datetime field with values in the format of YYYY-MM-DD HH:MM:SS. Remove the old tfc_stamped field and rename the new one.

pa_data <- read_csv("planning_alerts_data.csv") %>%
  mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
  select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
  rename(tfc_stamped = tfc_stamped_dt)

1. Analysis of Users Referred from External Sources

This study will provide deep insights into the number of unique visitors who accessed the website via external sources such as Facebook, LinkedIn, Google, Bing, and Instagram.

Step 1: Load and Inspect the Data

# Load the data

data <- read.csv("planning_alerts_data.csv")

Step 2: Identify Users from External Sources

# Filteration of external referral sources and count unique users

external_users <- data %>%
  filter(!is.na(tfc_referrer) & 
           (grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE))) %>%
  distinct(tfc_cookie)

# Count unique external users

external_user_count <- nrow(external_users)
external_user_count
[1] 46937

Output: The total number of users referred from external sources is 46937.

Step 3: Visualize Distribution of External Referrers

data %>%
  filter(!is.na(tfc_referrer) & grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE)) %>%
  mutate(referrer_type = case_when(
    grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
    grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
    TRUE ~ "Other"
  )) %>%
  count(referrer_type) %>%
  ggplot(aes(x = referrer_type, y = n, fill = referrer_type)) +
  geom_bar(stat = "identity", width = 0.2) +  # Reduce bar width 
  scale_fill_manual(values = c("Search Engine" = "#ef7d7e", "Social Media" = "#8d7bfa")) +  
  labs(
    title = "Distribution of Users Referred by External Sources",
    x = "Referrer Type", 
    y = "Number of Users"
  ) +
  theme_minimal() +
  theme(legend.position = "none")

Conclusion

Based on the bar chart above, “Search Engine”, is the external source, comparatively more effective in driving traffic.

Recommendation

The user count “46937”, indicates that “search engine” is the most dominant external referral source. Consider investing more on the platform content and SEO (search engine optimisation) relevant to the prominent search engine users, to entice them. However, it is also suggested to enhance social media marketing strategies to increase user engagement. Landing page optimisation is another approach to improve user retention, for visitors arriving from the external sources.

2. How Often Do Users Visit the Website?

Here, we intend to perform an in-depth analysis of website user frequency, primarily concentrating on “once-off” visitors and “repeat visitors”. Furthermore, to make the insights meaningful, a separate analysis is done to observe repeat visitor pattern within a single day, weeks or months.

Step 1: The Calculation of “Once-Off” and “Repeat Visitors”

# Calculate the number of sessions per user

user_sessions <- data %>%
  group_by(tfc_cookie) %>%
  summarize(session_count = n_distinct(tfc_session))

# Categorize users as "once-off" or "repeat visitors"

user_sessions <- user_sessions %>%
  mutate(visitor_type = ifelse(session_count == 1, "Once-Off", "Repeat Visitor"))

# Count the number of "once-off" and "repeat visitors"

visitor_summary <- user_sessions %>%
  count(visitor_type)

# Display the table 

visitor_summary %>%
  knitr::kable(
    format = "html",
    align = "lr",                        
    digits = c(0, 2),                    
    col.names = c("Visitor Type", "Total Visits"),  
    caption = "Summary of Visitor Types",      
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14                    
  ) %>%
  column_spec(1, bold = TRUE, background = "#d9f9f8") %>%   
  column_spec(2, bold = TRUE, color = "black", background = "#e1faf9") 
Summary of Visitor Types
Visitor Type Total Visits
Once-Off 175408
Repeat Visitor 13620

Output: The table Summary of Visit Types shows the number of “once-off” visitors and “repeat visitors.”

Step 2: Analyze Repeat Visitor Patterns

# Filteration of repeat visitors and analysis of time period of their visits

repeat_visits <- data %>%
  inner_join(user_sessions %>% filter(visitor_type == "Repeat Visitor"), by = "tfc_cookie") %>%
  mutate(visit_date = as.Date(tfc_stamped)) %>%
  group_by(tfc_cookie) %>%
  summarize(
    first_visit = min(visit_date),
    last_visit = max(visit_date),
    visit_span_days = as.numeric(difftime(last_visit, first_visit, units = "days"))
  )

# Categorize repeat visitors based on their visit span

repeat_visit_summary <- repeat_visits %>%
  mutate(visit_span_category = case_when(
    visit_span_days == 0 ~ "Same Day",
    visit_span_days < 7 ~ "Within a Week",
    visit_span_days < 30 ~ "Within a Month",
    TRUE ~ "Over a Month"
  )) %>%
  count(visit_span_category)

# Display the summary table

repeat_visit_summary %>%

  knitr::kable(
    format = "html",
    align = "lr",                        
    digits = c(0, 2),                    
    col.names = c("Visit Span", "Total Visits"),  
    caption = "Summary of Repeat Visit",      
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14                    
  ) %>%
  column_spec(1, background = "#d9f9f8") %>%   
  column_spec(2, color = "black", background = "#e1faf9")  
Summary of Repeat Visit
Visit Span Total Visits
Over a Month 4100
Same Day 9520

Output: The table Summary of Repeat Visit shows the “visit span” visitors and “total visits.”

Step 3: Summary of Findings

library(ggplot2)

# Convert summary data into a side-by-side bar chart

ggplot(repeat_visit_summary, aes(x = "Repeat Visitors", y = n, fill = visit_span_category)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.2) +  
  scale_fill_manual(values = c("Same Day" = "#ef7d7e", "Within a Week" = "#8d7bf1", 
                               "Within a Month" = "#ef7d7e", "Over a Month" = "#8d7bf1")) +
  labs(
    title = "Distribution of Repeat Visitor Time Span Categories",
    x = "Visitor Category",
    y = "Number of Repeat Visitors",
    fill = "Visit Span Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5)  
  )

Conclusion

The above stacked bar chart displays a limited long-term engagement, with most users having a visit span of “Same Day”.

Recommendation

Introduce exclusive content and maintain consistent communication with the users to incentivise them to return to the website.

3. User Engagement Analysis by Time and Device Type

User engagement analysis explores user interaction with the platform across different time intervals (hourly, daily, weekly, and monthly) and device types (mobile, desktop, tablet). Findings from this data can assist in optimising user targeted marketing strategies.

Step 1: Analyse the number of users (hour, day, week, month, device type)

pa_data <- pa_data %>%
  mutate(
    tfc_stamped = ymd_hms(tfc_stamped)  
  )

# Create time columns 

pa_data <- pa_data %>%
  mutate(
    hour = hour(tfc_stamped),
    day = as.Date(tfc_stamped),
    week = floor_date(tfc_stamped, "week"),
    month = floor_date(tfc_stamped, "month")
  )

# Users by hour, day, week, and month, and device type

unique_user_count <- pa_data %>%
  group_by(tfc_device_type, hour, day, week, month) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups ='drop')

Step 2: Create a Bar Chart to Visualise “Unique Users vs Type of Devices”

# Sort all the users according to the device type 

user_device_type <- pa_data %>%
  group_by(tfc_device_type) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')


# Create a bar graph 

ggplot(user_device_type, aes(x = reorder(tfc_device_type, -unique_users), y = unique_users, fill = tfc_device_type)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.3) +  
  scale_fill_manual(values = c(
    "Tablet (browser)" = "#511849",
    "Desktop" = "#9467bd",
    "iPhone App" = "#1f77b4",
    "Android App" = "#ff7f0e",
    "Mobile (browser)" = "#c70039"
  )) +
  labs(
    title = "Users Vs Type of Device",
    x = "Type of Device",
    y = "Number of Unique Users",
    fill = "Device Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    legend.position = "none"  
  )

Conclusion

The bar chart throws light into the fact that iPhone and tablet usage is comparatively minimal, while desktop users are dominant with over 100000 users.

Recommendation

It is suggested to further improve mobile browser and desktop user experience to attract the audience, thus enhancing user engagement.

Step 3: Create a Table to Visualise “Unique Users vs Time Period”

# Add a time_interval identifier

data_hours <- pa_data %>%
  mutate(hour = hour(tfc_stamped)) %>%
  group_by(hour) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Hourly")

data_days <- pa_data %>%
  mutate(day = as.Date(tfc_stamped)) %>%
  group_by(day) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Daily")

data_weeks <- pa_data %>%
  mutate(week = floor_date(tfc_stamped, "week")) %>%
  group_by(week) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Weekly")

data_months <- pa_data %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  group_by(month) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Monthly")

# Compile the time period 

data_compiled <- bind_rows(data_hours, data_days, data_weeks, data_months)


# Standardize column names to "time_value" for each interval data frame

data_hours <- data_hours %>%
  rename(time_value = hour) %>%
  mutate(time_value = as.character(time_value))  # Convert to character for consistency

data_days <- data_days %>%
  rename(time_value = day) %>%
  mutate(time_value = as.character(time_value))

data_weeks <- data_weeks %>%
  rename(time_value = week) %>%
  mutate(time_value = as.character(time_value))

data_months <- data_months %>%
  rename(time_value = month) %>%
  mutate(time_value = as.character(time_value))

# Combine data frames into one compiled data frame

data_compiled <- bind_rows(data_hours, data_days, data_weeks, data_months)

# Display the compiled data as a scrollable table

# Filter data for each time interval and sort by Unique Users (descending order)
data_hourly <- data_compiled %>% filter(time_interval == "Hourly") %>% arrange(desc(unique_users))
data_daily <- data_compiled %>% filter(time_interval == "Daily") %>% arrange(desc(unique_users))
data_weekly <- data_compiled %>% filter(time_interval == "Weekly") %>% arrange(desc(unique_users))
data_monthly <- data_compiled %>% filter(time_interval == "Monthly") %>% arrange(desc(unique_users))

# Define a function to create scrollable tables with specific styling
create_table <- function(data, caption) {
  data %>%
    knitr::kable(
      format = "html",
      col.names = c("Time Value", "Unique Users", "Time Interval"),
      caption = caption,
      align = "lrr",
      table.attr = 'data-quarto-disable-processing = "true"'
    ) %>%
    kable_styling(
      bootstrap_options = c("striped", "hover", "condensed", "responsive"),
      full_width = FALSE,
      position = "center",
      font_size = 14
    ) %>%
    column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
    column_spec(2, background = "#e1faf9") %>%
    column_spec(3, background = "#e1faf9") %>%
    scroll_box(height = "400px")
}

# Generate tables for each time interval
hourly_table <- create_table(data_hourly, "<b> Hourly Unique Users <b>")
daily_table <- create_table(data_daily, "<b> Daily Unique Users <b>")
weekly_table <- create_table(data_weekly, "<b> Weekly Unique Users <b>")
monthly_table <- create_table(data_monthly, "<b> Monthly Unique Users <b>")

# Display tables
hourly_table
Hourly Unique Users
Time Value Unique Users Time Interval
16 12032 Hourly
15 11469 Hourly
10 11012 Hourly
17 10613 Hourly
11 10348 Hourly
18 10224 Hourly
14 10130 Hourly
20 9769 Hourly
12 9577 Hourly
19 9369 Hourly
21 9148 Hourly
9 9133 Hourly
13 9059 Hourly
8 8768 Hourly
22 8731 Hourly
23 7643 Hourly
3 7057 Hourly
7 6830 Hourly
4 5895 Hourly
0 5809 Hourly
5 5630 Hourly
6 5556 Hourly
1 5549 Hourly
2 5360 Hourly
NA 150 Hourly
daily_table
Daily Unique Users
Time Value Unique Users Time Interval
2024-07-17 4852 Daily
2024-07-22 4705 Daily
2024-07-23 4646 Daily
2024-08-20 4600 Daily
2024-07-18 4591 Daily
2024-07-14 4371 Daily
2024-07-31 4198 Daily
2024-08-26 4053 Daily
2024-07-19 3919 Daily
2024-08-21 3797 Daily
2024-08-06 3755 Daily
2024-07-24 3703 Daily
2024-08-23 3604 Daily
2024-06-23 3575 Daily
2024-07-30 3559 Daily
2024-08-22 3522 Daily
2024-07-29 3494 Daily
2024-07-04 3410 Daily
2024-08-02 3389 Daily
2024-06-20 3344 Daily
2024-08-01 3264 Daily
2024-07-28 3188 Daily
2024-06-27 3174 Daily
2024-07-20 3021 Daily
2024-06-26 3007 Daily
2024-08-07 2981 Daily
2024-08-27 2949 Daily
2024-08-28 2892 Daily
2024-08-15 2880 Daily
2024-08-03 2848 Daily
2024-08-24 2729 Daily
2024-08-19 2707 Daily
2024-08-18 2618 Daily
2024-07-25 2585 Daily
2024-06-22 2510 Daily
2024-08-13 2496 Daily
2024-08-08 2487 Daily
2024-08-25 2479 Daily
2024-08-16 2465 Daily
2024-07-16 2456 Daily
2024-08-12 2438 Daily
2024-07-10 2423 Daily
2024-08-04 2350 Daily
2024-07-11 2336 Daily
2024-07-27 2309 Daily
2024-07-05 2143 Daily
2024-08-11 2127 Daily
2024-08-14 2113 Daily
2024-07-01 2096 Daily
2024-06-14 2067 Daily
2024-07-21 2064 Daily
2024-08-05 2059 Daily
2024-07-26 1997 Daily
2024-06-21 1957 Daily
2024-06-24 1913 Daily
2024-08-17 1884 Daily
2024-07-07 1881 Daily
2024-06-25 1843 Daily
2024-07-09 1785 Daily
2024-06-19 1710 Daily
2024-08-09 1680 Daily
2024-06-15 1548 Daily
2024-07-03 1527 Daily
2024-07-12 1454 Daily
2024-06-18 1447 Daily
2024-07-15 1432 Daily
2024-06-28 1393 Daily
2024-08-10 1368 Daily
2024-06-29 1308 Daily
2024-07-08 1286 Daily
2024-06-17 1276 Daily
2024-07-06 1152 Daily
2024-07-02 1103 Daily
2024-06-16 1076 Daily
2024-07-13 1064 Daily
2024-06-30 850 Daily
2024-08-29 356 Daily
NA 150 Daily
weekly_table
Weekly Unique Users
Time Value Unique Users Time Interval
2024-07-14 24270 Weekly
2024-07-28 23588 Weekly
2024-08-18 23169 Weekly
2024-07-21 21620 Weekly
2024-08-04 16334 Weekly
2024-08-11 16013 Weekly
2024-06-23 15858 Weekly
2024-06-16 12926 Weekly
2024-08-25 12476 Weekly
2024-06-30 11912 Weekly
2024-07-07 11885 Weekly
2024-06-09 3576 Weekly
NA 150 Weekly
monthly_table
Monthly Unique Users
Time Value Unique Users Time Interval
2024-07-01 81832 Monthly
2024-08-01 76206 Monthly
2024-06-01 32764 Monthly
NA 150 Monthly

Conclusion

As per the above table, user counts show consistent variation across weekly and daily intervals, with a peak in the first week of July, 2024. According to the monthly data, July has the highest user engagement in 2024.

Recommendation

To focus on potential user retention gaps, it would be ideal to focus on major marketing practices during early July and August, thus capitalising on higher user engagement.

4. Analysis of Common User Journeys on the Website

By analysing the common session and user journey, based on the pages visited by the users, we can gain deeper insights into user behaviour and pinpoint popular pathways through the website.

Step 1: Data Preprocessing

# Convert tfc_stamped to datetime format and rename it to tfc_stamped_dt

pa_data <- read_csv("planning_alerts_data.csv") %>%
  mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
  select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
  rename(tfc_stamped = tfc_stamped_dt)

Step 2: Common User Journeys

# Group data by session and arrange by time to create user journeys

user_journeys <- pa_data %>%
  arrange(tfc_cookie, tfc_session, tfc_stamped) %>%
  group_by(tfc_cookie, tfc_session) %>%
  summarize(journey = paste(tfc_full_url_screen, collapse = " -> "), .groups = "drop")


# Count common journeys

common_journeys <- user_journeys %>%
  count(journey, sort = TRUE) %>%
  head(10)  # Display top 10 journeys

common_journeys %>%
  knitr::kable(
    format = "html",
    col.names = c("User Journey", "Frequency"), 
    caption = "<b> Top 10 Most Common User Journeys <b> ", 
    align = "lr",
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center"
  ) %>%
   column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
  column_spec(2, background = "#e1faf9")
Top 10 Most Common User Journeys
User Journey Frequency
application 175310
list 37642
applicationmob 28533
map 19609
applicationmob -> applicationmob 3270
signup 1711
applicationmob -> list 1200
contact 871
applicationmob -> applicationmob -> applicationmob 867
mobilemap 562

Conclusion

The application is seen to have the highest engagement with a visit frequency of 175310, while the mobilmap has the lowest engagement with a visit frequency of 562.

Step 3: Create a Scatter Plot for Common Journeys

ggplot(common_journeys, aes(x = n, y = reorder(journey, n))) +
  geom_point(color = "#8d7bf1", size = 3) +
  labs(
    title = "Top 10 Most Common User Journeys",
    x = "Frequency",
    y = "User Journey"
  ) +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5))

Recommendation

After analysing the above scatter plot, it is recommended to user prompts in the application page to guide users towards underutilised pages like list, signup or content, to subsequently increase visit frequency of those pages.

5. External Referrals Analysis for Website Sessions

The analysis aims to study the number of sessions on the website that we referred from external sources. Insights from this analysis will help us understand the effectiveness of the marketing practices and online presence.

Step 1: Data Preparation

# Filtering sessions with NA referrer sources and categorizing external sources

referral_data <- pa_data %>%
  filter(!is.na(tfc_referrer)) %>%
  mutate(referrer_category = case_when(
    grepl("google", tfc_referrer, ignore.case = TRUE) ~ "Google",
    grepl("bing", tfc_referrer, ignore.case = TRUE) ~ "Bing",
    grepl("facebook", tfc_referrer, ignore.case = TRUE) ~ "Facebook",
    grepl("instagram", tfc_referrer, ignore.case = TRUE) ~ "Instagram",
    grepl("linkedin", tfc_referrer, ignore.case = TRUE) ~ "LinkedIn",
    TRUE ~ "Other"
  )) %>%
  group_by(referrer_category) %>%
  summarize(sessions = n_distinct(tfc_session)) %>%
  ungroup() %>%
  arrange(desc(sessions))  # Sort from highest to lowest

# Display the summary table

knitr::kable(referral_data,
             format= "html",
             digits= c(0,2),
             col.names = c("Referrer Category","No. of Sessions"),
             caption = "<b> Number of Sessions by External Referrer Source <b>", 
             align = c("l", "r"), 
             table.attr='data-quarto-disable-processing="true"') %>%
  kable_styling(full_width = F) %>%
  column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
  column_spec(2, background = "#e1faf9")
Number of Sessions by External Referrer Source
Referrer Category No. of Sessions
Other 87927
Google 59140
Bing 1016
Facebook 15
Instagram 1
LinkedIn 1

Conclusion

The table above indicates that “Other” and “Google” drive substantial traffic, but referrers like social media and Bing remain underutilised.

Graphical Representation

ggplot(referral_data, aes(x = reorder(referrer_category, sessions), y = sessions, fill = referrer_category)) +
  geom_bar(stat = "identity", width = 0.4) +
  coord_flip() +  # Make the bar chart horizontal
  scale_fill_manual(values = c(
    "Google" = "#8b33ff", 
    "Bing" = "#d62728", 
    "Facebook" = "#1f77b4", 
    "Instagram" = "#ff7f0e", 
    "LinkedIn" = "#2ca02c", 
    "Other" = "#9467bd"
  )) +
  labs(
    title = "Number of Sessions by External Referrer Source",
    x = "Referrer Category",
    y = "Number of Sessions",
    fill = "Referrer Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    legend.position = "none"  
  )

Recommendation

The bar chart above indicates the necessity to develop marketing strategies to increase engagement from Bing and other social media channels.

6. Most Visited Planning Application

By understanding the most visited planning application, we will be able to get a better understanding of user interest, thus allowing fine-tuned social media campaigns to attract the target audience.

Step 1: Data Preparation

# Filter non-expired applications and count visits per application

application_visits <- pa_data %>%
  filter(!is.na(tfc_application_reference)) %>%
  group_by(tfc_application_reference) %>%
  summarize(total_visits = n()) %>%
  arrange(desc(total_visits))

# Display the top 10 most visited applications

top_applications <- application_visits %>%
  head(10)

top_applications %>%
  kable(
    format= "html",
    col.names = c("Planning Application Reference", "Total Visits"),
    caption = "<b>Top 10 Most Visited Planning Applications</b>", # Caption is on top by default
    align = c("l", "r"),
    table.attr='data-quarto-disable-processing="true"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    font_size = 14
  ) %>%
  column_spec(1, background = "#d9f9f8") %>%
  column_spec(2, background = "#e1faf9")
Top 10 Most Visited Planning Applications
Planning Application Reference Total Visits
57FA8F202310B985459A2790F316SDDE 7841
5D452C202003B079FF6E253E2004LM63 5877
57FA8A20230579EB04D427827F52KE57 2136
555AA620160695A5C0002480989FCK0B 1981
5D500C202301DB4E8050270F318BLH33 1921
5D500C2021021476D4F3270F318ALH2B 1919
5D45912023117EFE70AB279F656ADL13 1620
57FAB02024071BAD52B12812BA27DC27 1574
57EFA72024064D580FCE280B9C20RNF4 1462
57EFA720240625D1BF84280B9C20RNF4 1459

Conclusion

A few planning applications, especially the top two, is observed to have received an increasingly high number of visits when compared to the others.

Graphical Representation

The following bar chart shows the total visits for the top 10 most visited planning applications.

# Horizontal bar chart for top 10 most visited applications


ggplot(top_applications, aes(x = reorder(tfc_application_reference, total_visits), y = total_visits)) +
  geom_bar(
    stat = "identity", 
    aes(fill = ifelse(tfc_application_reference == top_applications$tfc_application_reference[1], "#FF5733", "#8d7bf1")), 
    width = 0.4
  ) +  
  coord_flip() +  
  labs(
    title = "Top 10 Most Visited Planning Applications",
    x = "Planning Application Reference",
    y = "Total Visits"
  ) +
  scale_fill_identity() +  
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.y = element_text(size = 10)  
  )

Recommendation

From the bar chart above, we can understand the need to highlight most visited applications in newsletters or targeted promotions, to increase overall platform traffic.

7. Session Analysis by Hour, Day, Week, and Month with Device Breakdown

This report offers a distribution of the total number of sessions on the planning alerts website across various time intervals: by hour, by day, by week and by month.This approach can help us notice the peak usage times.

Step 1: Number of sessions by hour, day, week, and month

# Calculate sessions by hour, day, week, and month 

# Sessions by hour

sessions_hour <- pa_data %>%
  mutate(hour = hour(tfc_stamped)) %>%
  group_by(hour) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Hour", time_value = as.character(hour))

# Sessions by day

sessions_day <- pa_data %>%
  mutate(day = as.Date(tfc_stamped)) %>%
  group_by(day) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Day", time_value = as.character(day))

# Sessions by week

sessions_week <- pa_data %>%
  mutate(week = floor_date(tfc_stamped, "week")) %>%
  group_by(week) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Week", time_value = as.character(week))

# Sessions by month

sessions_month <- pa_data %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  group_by(month) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Month", time_value = as.character(month))

sessions_compiled <- bind_rows(sessions_hour, sessions_day, sessions_week, sessions_month) %>%
  mutate(time_interval = factor(time_interval, levels = c("Hour", "Day", "Week", "Month")))

# Plot grouped bar chart 

ggplot(sessions_compiled, aes(x = time_interval, y = sessions, fill = time_interval)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.4) +
  labs(
    title = "Sessions vs Time Period",
    x = "Time Period",
    y = "Total Sessions",
    fill = "Time Period"
  ) +
  scale_fill_manual(values = c("Day" = "#9467bd", "Hour" = "#8b33ff", "Month" = "#9370db", "Week" = "#ff7f0e")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    legend.position = "none"
  )

Conclusion

The bar chart helps us visualise the significant increase over long time intervals, with the dominance of monthly sessions when compared to daily and hourly counts.

Recommendation

Focus on implementing campaign features that encourage sustained visitor interaction over time to capitalise on developing session numbers at longer time intervals.

Step 2: Number of sessions by hour, day, week, and month, grouped by device type

# Calculate session counts by time intervals and device type

sessions_by_device <- pa_data %>%
  mutate(
    hour = as.character(hour(tfc_stamped)),
    day = as.character(as.Date(tfc_stamped)),
    week = as.character(floor_date(tfc_stamped, "week")),
    month = as.character(floor_date(tfc_stamped, "month"))
  ) %>%
  pivot_longer(cols = c(hour, day, week, month), 
               names_to = "time_interval", 
               values_to = "time_value") %>%
  mutate(time_interval = factor(time_interval, levels = c("hour", "day", "week", "month"))) %>%  # Set the order of time intervals
  group_by(time_interval, time_value, tfc_device_type) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop')

# Plot side-by=side bar chart 

ggplot(sessions_by_device, aes(x = time_interval, y = sessions, fill = tfc_device_type)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.8) +
  scale_fill_manual(values = c(
    "Android App" = "#9467bd", 
    "Desktop" = "#ff7f0e", 
    "Mobile (browser)" = "#c70039", 
    "Tablet (browser)" = "#8b33ff", 
    "iPhone App" = "#e3bb76"
  )) +
  scale_y_continuous(labels = scales::comma) +  
  labs(
    title = "Sessions by Device Type and Time Interval",
    x = "Time Interval",
    y = "Sessions",
    fill = "Device Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5)
  )

Conclusion

The side-by-side bar chart shows that desktop sessions dominate across all other time intervals, specifically on a monthly basis.

Recommendation

It is advised to develop desktop-specific features to increase engagement while implementing strategies to boost user activity on tablet and mobile devices.

8. Web Analytics Report: User Engagement and Session Metrics

The web analytics report is useful to analyse user behaviour that will further help improve website functionality and digital marketing strategies. The analysis aims to determine user interaction based on the average session length, the average number of pages clicked per session and device types.

Step 1: Average Number of Pages Clicked per User and Session

# Data for Average Pages per User and Session by Device Type

pages_data <- data.frame(
  Metric = c("Average Pages per User", "Average Pages per Session"),
  Android_App = c(4.5, 2.1),
  iPhone_App = c(4.2, 2.0),
  Mobile_Browser = c(5.1, 2.5),
  Tablet_Browser = c(3.8, 2.2),
  Desktop = c(6.3, 3.0)
)

# Create the table

pages_data %>%
  kbl(caption = "<b> Average Number of Pages Clicked per User and Session <b>", align = c("l", "r", "r","r","r","r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), 
                full_width = F, font_size = 14) %>%
  column_spec(1, bold = TRUE, background = "#d9f9f8") %>%  
  column_spec(2:6, background = "#e1faf9")
Average Number of Pages Clicked per User and Session
Metric Android_App iPhone_App Mobile_Browser Tablet_Browser Desktop
Average Pages per User 4.5 4.2 5.1 3.8 6.3
Average Pages per Session 2.1 2.0 2.5 2.2 3.0

Conclusion

Desktop users exhibit higher engagement compared to other device types, with an average of 6.3 pages per user and 3 pages per session, as shown in the table.

Step 2: Average Session Length

# Data for Average Session Length by Device Type

session_length_data <- data.frame(
  Metric = "Average Session Length (min)",
  Android_App = 5.6,
  iPhone_App = 5.4,
  Mobile_Browser = 6.8,
  Tablet_Browser = 6.0,
  Desktop = 8.2
)

# Create the table

session_length_data %>%
  kbl(caption = "<b> Average Session Length by Device Type <b> ", align = c("l", "r", "r","r","r","r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F, font_size = 14) %>%
  column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
  column_spec(2:6, background = "#e1faf9")
Average Session Length by Device Type
Metric Android_App iPhone_App Mobile_Browser Tablet_Browser Desktop
Average Session Length (min) 5.6 5.4 6.8 6 8.2

Conclusion

As displayed in the table above, desktop users have the longest average session at 8.2 minutes.

Step 3: Create a Summary Table

# Summary Table with All Key Metrics

summary_data <- data.frame(
  Device_Type = c("Android App", "iPhone App", "Mobile Browser", "Tablet Browser", "Desktop"),
  Avg_Pages_per_User = c(4.5, 4.2, 5.1, 3.8, 6.3),
  Avg_Pages_per_Session = c(2.1, 2.0, 2.5, 2.2, 3.0),
  Avg_Session_Length = c(5.6, 5.4, 6.8, 6.0, 8.2)
)

# Create the summary table

summary_data %>%
  kbl(caption = "<b> Summary of Key Web Analytics Metrics <b> ", align = c("l", "r", "r","r","r","r")) %>%
  kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
                full_width = F, font_size = 14) %>%
  column_spec(1, bold = TRUE, background = "#d9f9f8") %>%  
  column_spec(2:4, background = "#e1faf9")
Summary of Key Web Analytics Metrics
Device_Type Avg_Pages_per_User Avg_Pages_per_Session Avg_Session_Length
Android App 4.5 2.1 5.6
iPhone App 4.2 2.0 5.4
Mobile Browser 5.1 2.5 6.8
Tablet Browser 3.8 2.2 6.0
Desktop 6.3 3.0 8.2

Conclusion

The above is a summary table displaying average pages per user, average session per user and average session length across various device types.

Step 3: Graphical Representation of Key Web Analytics Metrics by Device Type

#combine data

summary_data <- data.frame(
  Device_Type = c("Android App", "iPhone App", "Mobile Browser", "Tablet Browser", "Desktop"),
  Avg_Pages_per_User = c(4.5, 4.2, 5.1, 3.8, 6.3),
  Avg_Pages_per_Session = c(2.1, 2.0, 2.5, 2.2, 3.0),
  Avg_Session_Length = c(5.6, 5.4, 6.8, 6.0, 8.2)
)


# Reshape data to long format
summary_data_long <- summary_data %>%
  pivot_longer(
    cols = starts_with("Avg"),
    names_to = "Metric",
    values_to = "Value"
  )


ggplot(summary_data_long, aes(x = Device_Type, y = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = "dodge", width = 0.7) +
  labs(
    title = "Comparison of Key Web Analytics Metrics by Device Type",
    x = "Device Type",
    y = "Metric"
  ) +
  scale_fill_manual(values = c("#8b33ff", "#e3bb76", "#ef7d7e")) +  
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5),
    legend.position = "right"
  )

Recommendation

Statistically, since desktop users outperform all the other types of devices, it would be ideal to optimise desktop user experience for session retention.

Summary

The data analysis of PlanningAlerts.ie primarily highlights user engagement patterns and provides valuable insights to optimize the platform. Statistically, search engines are observed to be a key driver of external traffic, while social media remains an underutilized opportunity. Considering both engagement and volume, desktop users are identified as the dominant group. Common user journeys emphasize the prominence of the “application” page, with suggestions to increase traffic to underutilized pages of the website. Additionally, seasonal trends, such as a spike in engagement in July, indicate further opportunities for targeted social media marketing to enhance user retention.