# Load necessary libraries
library(tidyverse)
library(dplyr)
library(ggplot2)
library(lubridate)
library(knitr)
library(kableExtra) Data Analytics Assignment 3: An Analysis of Planning Alerts
Introduction
The assignment on PlanningAlerts.ie, a service that notifies users about the planning application updates across Ireland, primarily involves analysing the website usage dataset. By carefully studying the usage metrics, the data analysis assists in offering significant insights into user behaviour and also helps in determining necessary marketing strategies beneficial for the success of the business. Through an in-depth study of these metrics, the final report aims to deliver valuable recommendations on optimising the platform’s outreach, to Brendan Cunningham, the CEO.
#Import the planning_alerts_data.csv file and create a new field called tfc_stamped_dt which contains a converted version of the tfc_stamped datetime field with values in the format of YYYY-MM-DD HH:MM:SS. Remove the old tfc_stamped field and rename the new one.
pa_data <- read_csv("planning_alerts_data.csv") %>%
mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
rename(tfc_stamped = tfc_stamped_dt)1. Analysis of Users Referred from External Sources
This study will provide deep insights into the number of unique visitors who accessed the website via external sources such as Facebook, LinkedIn, Google, Bing, and Instagram.
Step 1: Load and Inspect the Data
# Load the data
data <- read.csv("planning_alerts_data.csv")Step 2: Identify Users from External Sources
# Filteration of external referral sources and count unique users
external_users <- data %>%
filter(!is.na(tfc_referrer) &
(grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE))) %>%
distinct(tfc_cookie)
# Count unique external users
external_user_count <- nrow(external_users)
external_user_count[1] 46937
Output: The total number of users referred from external sources is 46937.
Step 3: Visualize Distribution of External Referrers
data %>%
filter(!is.na(tfc_referrer) & grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE)) %>%
mutate(referrer_type = case_when(
grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
TRUE ~ "Other"
)) %>%
count(referrer_type) %>%
ggplot(aes(x = referrer_type, y = n, fill = referrer_type)) +
geom_bar(stat = "identity", width = 0.2) + # Reduce bar width
scale_fill_manual(values = c("Search Engine" = "#ef7d7e", "Social Media" = "#8d7bfa")) +
labs(
title = "Distribution of Users Referred by External Sources",
x = "Referrer Type",
y = "Number of Users"
) +
theme_minimal() +
theme(legend.position = "none")Conclusion
Based on the bar chart above, “Search Engine”, is the external source, comparatively more effective in driving traffic.
Recommendation
The user count “46937”, indicates that “search engine” is the most dominant external referral source. Consider investing more on the platform content and SEO (search engine optimisation) relevant to the prominent search engine users, to entice them. However, it is also suggested to enhance social media marketing strategies to increase user engagement. Landing page optimisation is another approach to improve user retention, for visitors arriving from the external sources.
2. How Often Do Users Visit the Website?
Here, we intend to perform an in-depth analysis of website user frequency, primarily concentrating on “once-off” visitors and “repeat visitors”. Furthermore, to make the insights meaningful, a separate analysis is done to observe repeat visitor pattern within a single day, weeks or months.
Step 1: The Calculation of “Once-Off” and “Repeat Visitors”
# Calculate the number of sessions per user
user_sessions <- data %>%
group_by(tfc_cookie) %>%
summarize(session_count = n_distinct(tfc_session))
# Categorize users as "once-off" or "repeat visitors"
user_sessions <- user_sessions %>%
mutate(visitor_type = ifelse(session_count == 1, "Once-Off", "Repeat Visitor"))
# Count the number of "once-off" and "repeat visitors"
visitor_summary <- user_sessions %>%
count(visitor_type)
# Display the table
visitor_summary %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0, 2),
col.names = c("Visitor Type", "Total Visits"),
caption = "Summary of Visitor Types",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2, bold = TRUE, color = "black", background = "#e1faf9") | Visitor Type | Total Visits |
|---|---|
| Once-Off | 175408 |
| Repeat Visitor | 13620 |
Output: The table Summary of Visit Types shows the number of “once-off” visitors and “repeat visitors.”
Step 2: Analyze Repeat Visitor Patterns
# Filteration of repeat visitors and analysis of time period of their visits
repeat_visits <- data %>%
inner_join(user_sessions %>% filter(visitor_type == "Repeat Visitor"), by = "tfc_cookie") %>%
mutate(visit_date = as.Date(tfc_stamped)) %>%
group_by(tfc_cookie) %>%
summarize(
first_visit = min(visit_date),
last_visit = max(visit_date),
visit_span_days = as.numeric(difftime(last_visit, first_visit, units = "days"))
)
# Categorize repeat visitors based on their visit span
repeat_visit_summary <- repeat_visits %>%
mutate(visit_span_category = case_when(
visit_span_days == 0 ~ "Same Day",
visit_span_days < 7 ~ "Within a Week",
visit_span_days < 30 ~ "Within a Month",
TRUE ~ "Over a Month"
)) %>%
count(visit_span_category)
# Display the summary table
repeat_visit_summary %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0, 2),
col.names = c("Visit Span", "Total Visits"),
caption = "Summary of Repeat Visit",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1, background = "#d9f9f8") %>%
column_spec(2, color = "black", background = "#e1faf9") | Visit Span | Total Visits |
|---|---|
| Over a Month | 4100 |
| Same Day | 9520 |
Output: The table Summary of Repeat Visit shows the “visit span” visitors and “total visits.”
Step 3: Summary of Findings
library(ggplot2)
# Convert summary data into a side-by-side bar chart
ggplot(repeat_visit_summary, aes(x = "Repeat Visitors", y = n, fill = visit_span_category)) +
geom_bar(stat = "identity", position = "dodge", width = 0.2) +
scale_fill_manual(values = c("Same Day" = "#ef7d7e", "Within a Week" = "#8d7bf1",
"Within a Month" = "#ef7d7e", "Over a Month" = "#8d7bf1")) +
labs(
title = "Distribution of Repeat Visitor Time Span Categories",
x = "Visitor Category",
y = "Number of Repeat Visitors",
fill = "Visit Span Category"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5)
)Conclusion
The above stacked bar chart displays a limited long-term engagement, with most users having a visit span of “Same Day”.
Recommendation
Introduce exclusive content and maintain consistent communication with the users to incentivise them to return to the website.
3. User Engagement Analysis by Time and Device Type
User engagement analysis explores user interaction with the platform across different time intervals (hourly, daily, weekly, and monthly) and device types (mobile, desktop, tablet). Findings from this data can assist in optimising user targeted marketing strategies.
Step 1: Analyse the number of users (hour, day, week, month, device type)
pa_data <- pa_data %>%
mutate(
tfc_stamped = ymd_hms(tfc_stamped)
)
# Create time columns
pa_data <- pa_data %>%
mutate(
hour = hour(tfc_stamped),
day = as.Date(tfc_stamped),
week = floor_date(tfc_stamped, "week"),
month = floor_date(tfc_stamped, "month")
)
# Users by hour, day, week, and month, and device type
unique_user_count <- pa_data %>%
group_by(tfc_device_type, hour, day, week, month) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups ='drop')Step 2: Create a Bar Chart to Visualise “Unique Users vs Type of Devices”
# Sort all the users according to the device type
user_device_type <- pa_data %>%
group_by(tfc_device_type) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')
# Create a bar graph
ggplot(user_device_type, aes(x = reorder(tfc_device_type, -unique_users), y = unique_users, fill = tfc_device_type)) +
geom_bar(stat = "identity", position = "dodge", width = 0.3) +
scale_fill_manual(values = c(
"Tablet (browser)" = "#511849",
"Desktop" = "#9467bd",
"iPhone App" = "#1f77b4",
"Android App" = "#ff7f0e",
"Mobile (browser)" = "#c70039"
)) +
labs(
title = "Users Vs Type of Device",
x = "Type of Device",
y = "Number of Unique Users",
fill = "Device Type"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5),
legend.position = "none"
)Conclusion
The bar chart throws light into the fact that iPhone and tablet usage is comparatively minimal, while desktop users are dominant with over 100000 users.
Recommendation
It is suggested to further improve mobile browser and desktop user experience to attract the audience, thus enhancing user engagement.
Step 3: Create a Table to Visualise “Unique Users vs Time Period”
# Add a time_interval identifier
data_hours <- pa_data %>%
mutate(hour = hour(tfc_stamped)) %>%
group_by(hour) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Hourly")
data_days <- pa_data %>%
mutate(day = as.Date(tfc_stamped)) %>%
group_by(day) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Daily")
data_weeks <- pa_data %>%
mutate(week = floor_date(tfc_stamped, "week")) %>%
group_by(week) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Weekly")
data_months <- pa_data %>%
mutate(month = floor_date(tfc_stamped, "month")) %>%
group_by(month) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Monthly")
# Compile the time period
data_compiled <- bind_rows(data_hours, data_days, data_weeks, data_months)
# Standardize column names to "time_value" for each interval data frame
data_hours <- data_hours %>%
rename(time_value = hour) %>%
mutate(time_value = as.character(time_value)) # Convert to character for consistency
data_days <- data_days %>%
rename(time_value = day) %>%
mutate(time_value = as.character(time_value))
data_weeks <- data_weeks %>%
rename(time_value = week) %>%
mutate(time_value = as.character(time_value))
data_months <- data_months %>%
rename(time_value = month) %>%
mutate(time_value = as.character(time_value))
# Combine data frames into one compiled data frame
data_compiled <- bind_rows(data_hours, data_days, data_weeks, data_months)
# Display the compiled data as a scrollable table
# Filter data for each time interval and sort by Unique Users (descending order)
data_hourly <- data_compiled %>% filter(time_interval == "Hourly") %>% arrange(desc(unique_users))
data_daily <- data_compiled %>% filter(time_interval == "Daily") %>% arrange(desc(unique_users))
data_weekly <- data_compiled %>% filter(time_interval == "Weekly") %>% arrange(desc(unique_users))
data_monthly <- data_compiled %>% filter(time_interval == "Monthly") %>% arrange(desc(unique_users))
# Define a function to create scrollable tables with specific styling
create_table <- function(data, caption) {
data %>%
knitr::kable(
format = "html",
col.names = c("Time Value", "Unique Users", "Time Interval"),
caption = caption,
align = "lrr",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2, background = "#e1faf9") %>%
column_spec(3, background = "#e1faf9") %>%
scroll_box(height = "400px")
}
# Generate tables for each time interval
hourly_table <- create_table(data_hourly, "<b> Hourly Unique Users <b>")
daily_table <- create_table(data_daily, "<b> Daily Unique Users <b>")
weekly_table <- create_table(data_weekly, "<b> Weekly Unique Users <b>")
monthly_table <- create_table(data_monthly, "<b> Monthly Unique Users <b>")
# Display tables
hourly_table| Time Value | Unique Users | Time Interval |
|---|---|---|
| 16 | 12032 | Hourly |
| 15 | 11469 | Hourly |
| 10 | 11012 | Hourly |
| 17 | 10613 | Hourly |
| 11 | 10348 | Hourly |
| 18 | 10224 | Hourly |
| 14 | 10130 | Hourly |
| 20 | 9769 | Hourly |
| 12 | 9577 | Hourly |
| 19 | 9369 | Hourly |
| 21 | 9148 | Hourly |
| 9 | 9133 | Hourly |
| 13 | 9059 | Hourly |
| 8 | 8768 | Hourly |
| 22 | 8731 | Hourly |
| 23 | 7643 | Hourly |
| 3 | 7057 | Hourly |
| 7 | 6830 | Hourly |
| 4 | 5895 | Hourly |
| 0 | 5809 | Hourly |
| 5 | 5630 | Hourly |
| 6 | 5556 | Hourly |
| 1 | 5549 | Hourly |
| 2 | 5360 | Hourly |
| NA | 150 | Hourly |
daily_table| Time Value | Unique Users | Time Interval |
|---|---|---|
| 2024-07-17 | 4852 | Daily |
| 2024-07-22 | 4705 | Daily |
| 2024-07-23 | 4646 | Daily |
| 2024-08-20 | 4600 | Daily |
| 2024-07-18 | 4591 | Daily |
| 2024-07-14 | 4371 | Daily |
| 2024-07-31 | 4198 | Daily |
| 2024-08-26 | 4053 | Daily |
| 2024-07-19 | 3919 | Daily |
| 2024-08-21 | 3797 | Daily |
| 2024-08-06 | 3755 | Daily |
| 2024-07-24 | 3703 | Daily |
| 2024-08-23 | 3604 | Daily |
| 2024-06-23 | 3575 | Daily |
| 2024-07-30 | 3559 | Daily |
| 2024-08-22 | 3522 | Daily |
| 2024-07-29 | 3494 | Daily |
| 2024-07-04 | 3410 | Daily |
| 2024-08-02 | 3389 | Daily |
| 2024-06-20 | 3344 | Daily |
| 2024-08-01 | 3264 | Daily |
| 2024-07-28 | 3188 | Daily |
| 2024-06-27 | 3174 | Daily |
| 2024-07-20 | 3021 | Daily |
| 2024-06-26 | 3007 | Daily |
| 2024-08-07 | 2981 | Daily |
| 2024-08-27 | 2949 | Daily |
| 2024-08-28 | 2892 | Daily |
| 2024-08-15 | 2880 | Daily |
| 2024-08-03 | 2848 | Daily |
| 2024-08-24 | 2729 | Daily |
| 2024-08-19 | 2707 | Daily |
| 2024-08-18 | 2618 | Daily |
| 2024-07-25 | 2585 | Daily |
| 2024-06-22 | 2510 | Daily |
| 2024-08-13 | 2496 | Daily |
| 2024-08-08 | 2487 | Daily |
| 2024-08-25 | 2479 | Daily |
| 2024-08-16 | 2465 | Daily |
| 2024-07-16 | 2456 | Daily |
| 2024-08-12 | 2438 | Daily |
| 2024-07-10 | 2423 | Daily |
| 2024-08-04 | 2350 | Daily |
| 2024-07-11 | 2336 | Daily |
| 2024-07-27 | 2309 | Daily |
| 2024-07-05 | 2143 | Daily |
| 2024-08-11 | 2127 | Daily |
| 2024-08-14 | 2113 | Daily |
| 2024-07-01 | 2096 | Daily |
| 2024-06-14 | 2067 | Daily |
| 2024-07-21 | 2064 | Daily |
| 2024-08-05 | 2059 | Daily |
| 2024-07-26 | 1997 | Daily |
| 2024-06-21 | 1957 | Daily |
| 2024-06-24 | 1913 | Daily |
| 2024-08-17 | 1884 | Daily |
| 2024-07-07 | 1881 | Daily |
| 2024-06-25 | 1843 | Daily |
| 2024-07-09 | 1785 | Daily |
| 2024-06-19 | 1710 | Daily |
| 2024-08-09 | 1680 | Daily |
| 2024-06-15 | 1548 | Daily |
| 2024-07-03 | 1527 | Daily |
| 2024-07-12 | 1454 | Daily |
| 2024-06-18 | 1447 | Daily |
| 2024-07-15 | 1432 | Daily |
| 2024-06-28 | 1393 | Daily |
| 2024-08-10 | 1368 | Daily |
| 2024-06-29 | 1308 | Daily |
| 2024-07-08 | 1286 | Daily |
| 2024-06-17 | 1276 | Daily |
| 2024-07-06 | 1152 | Daily |
| 2024-07-02 | 1103 | Daily |
| 2024-06-16 | 1076 | Daily |
| 2024-07-13 | 1064 | Daily |
| 2024-06-30 | 850 | Daily |
| 2024-08-29 | 356 | Daily |
| NA | 150 | Daily |
weekly_table| Time Value | Unique Users | Time Interval |
|---|---|---|
| 2024-07-14 | 24270 | Weekly |
| 2024-07-28 | 23588 | Weekly |
| 2024-08-18 | 23169 | Weekly |
| 2024-07-21 | 21620 | Weekly |
| 2024-08-04 | 16334 | Weekly |
| 2024-08-11 | 16013 | Weekly |
| 2024-06-23 | 15858 | Weekly |
| 2024-06-16 | 12926 | Weekly |
| 2024-08-25 | 12476 | Weekly |
| 2024-06-30 | 11912 | Weekly |
| 2024-07-07 | 11885 | Weekly |
| 2024-06-09 | 3576 | Weekly |
| NA | 150 | Weekly |
monthly_table| Time Value | Unique Users | Time Interval |
|---|---|---|
| 2024-07-01 | 81832 | Monthly |
| 2024-08-01 | 76206 | Monthly |
| 2024-06-01 | 32764 | Monthly |
| NA | 150 | Monthly |
Conclusion
As per the above table, user counts show consistent variation across weekly and daily intervals, with a peak in the first week of July, 2024. According to the monthly data, July has the highest user engagement in 2024.
Recommendation
To focus on potential user retention gaps, it would be ideal to focus on major marketing practices during early July and August, thus capitalising on higher user engagement.
4. Analysis of Common User Journeys on the Website
By analysing the common session and user journey, based on the pages visited by the users, we can gain deeper insights into user behaviour and pinpoint popular pathways through the website.
Step 1: Data Preprocessing
# Convert tfc_stamped to datetime format and rename it to tfc_stamped_dt
pa_data <- read_csv("planning_alerts_data.csv") %>%
mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
rename(tfc_stamped = tfc_stamped_dt)Step 2: Common User Journeys
# Group data by session and arrange by time to create user journeys
user_journeys <- pa_data %>%
arrange(tfc_cookie, tfc_session, tfc_stamped) %>%
group_by(tfc_cookie, tfc_session) %>%
summarize(journey = paste(tfc_full_url_screen, collapse = " -> "), .groups = "drop")
# Count common journeys
common_journeys <- user_journeys %>%
count(journey, sort = TRUE) %>%
head(10) # Display top 10 journeys
common_journeys %>%
knitr::kable(
format = "html",
col.names = c("User Journey", "Frequency"),
caption = "<b> Top 10 Most Common User Journeys <b> ",
align = "lr",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center"
) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2, background = "#e1faf9")| User Journey | Frequency |
|---|---|
| application | 175310 |
| list | 37642 |
| applicationmob | 28533 |
| map | 19609 |
| applicationmob -> applicationmob | 3270 |
| signup | 1711 |
| applicationmob -> list | 1200 |
| contact | 871 |
| applicationmob -> applicationmob -> applicationmob | 867 |
| mobilemap | 562 |
Conclusion
The application is seen to have the highest engagement with a visit frequency of 175310, while the mobilmap has the lowest engagement with a visit frequency of 562.
Step 3: Create a Scatter Plot for Common Journeys
ggplot(common_journeys, aes(x = n, y = reorder(journey, n))) +
geom_point(color = "#8d7bf1", size = 3) +
labs(
title = "Top 10 Most Common User Journeys",
x = "Frequency",
y = "User Journey"
) +
theme_minimal() +
theme(plot.title = element_text(hjust = 0.5))Recommendation
After analysing the above scatter plot, it is recommended to user prompts in the application page to guide users towards underutilised pages like list, signup or content, to subsequently increase visit frequency of those pages.
5. External Referrals Analysis for Website Sessions
The analysis aims to study the number of sessions on the website that we referred from external sources. Insights from this analysis will help us understand the effectiveness of the marketing practices and online presence.
Step 1: Data Preparation
# Filtering sessions with NA referrer sources and categorizing external sources
referral_data <- pa_data %>%
filter(!is.na(tfc_referrer)) %>%
mutate(referrer_category = case_when(
grepl("google", tfc_referrer, ignore.case = TRUE) ~ "Google",
grepl("bing", tfc_referrer, ignore.case = TRUE) ~ "Bing",
grepl("facebook", tfc_referrer, ignore.case = TRUE) ~ "Facebook",
grepl("instagram", tfc_referrer, ignore.case = TRUE) ~ "Instagram",
grepl("linkedin", tfc_referrer, ignore.case = TRUE) ~ "LinkedIn",
TRUE ~ "Other"
)) %>%
group_by(referrer_category) %>%
summarize(sessions = n_distinct(tfc_session)) %>%
ungroup() %>%
arrange(desc(sessions)) # Sort from highest to lowest
# Display the summary table
knitr::kable(referral_data,
format= "html",
digits= c(0,2),
col.names = c("Referrer Category","No. of Sessions"),
caption = "<b> Number of Sessions by External Referrer Source <b>",
align = c("l", "r"),
table.attr='data-quarto-disable-processing="true"') %>%
kable_styling(full_width = F) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2, background = "#e1faf9")| Referrer Category | No. of Sessions |
|---|---|
| Other | 87927 |
| 59140 | |
| Bing | 1016 |
| 15 | |
| 1 | |
| 1 |
Conclusion
The table above indicates that “Other” and “Google” drive substantial traffic, but referrers like social media and Bing remain underutilised.
Graphical Representation
ggplot(referral_data, aes(x = reorder(referrer_category, sessions), y = sessions, fill = referrer_category)) +
geom_bar(stat = "identity", width = 0.4) +
coord_flip() + # Make the bar chart horizontal
scale_fill_manual(values = c(
"Google" = "#8b33ff",
"Bing" = "#d62728",
"Facebook" = "#1f77b4",
"Instagram" = "#ff7f0e",
"LinkedIn" = "#2ca02c",
"Other" = "#9467bd"
)) +
labs(
title = "Number of Sessions by External Referrer Source",
x = "Referrer Category",
y = "Number of Sessions",
fill = "Referrer Category"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
legend.position = "none"
)Recommendation
The bar chart above indicates the necessity to develop marketing strategies to increase engagement from Bing and other social media channels.
6. Most Visited Planning Application
By understanding the most visited planning application, we will be able to get a better understanding of user interest, thus allowing fine-tuned social media campaigns to attract the target audience.
Step 1: Data Preparation
# Filter non-expired applications and count visits per application
application_visits <- pa_data %>%
filter(!is.na(tfc_application_reference)) %>%
group_by(tfc_application_reference) %>%
summarize(total_visits = n()) %>%
arrange(desc(total_visits))
# Display the top 10 most visited applications
top_applications <- application_visits %>%
head(10)
top_applications %>%
kable(
format= "html",
col.names = c("Planning Application Reference", "Total Visits"),
caption = "<b>Top 10 Most Visited Planning Applications</b>", # Caption is on top by default
align = c("l", "r"),
table.attr='data-quarto-disable-processing="true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
font_size = 14
) %>%
column_spec(1, background = "#d9f9f8") %>%
column_spec(2, background = "#e1faf9")| Planning Application Reference | Total Visits |
|---|---|
| 57FA8F202310B985459A2790F316SDDE | 7841 |
| 5D452C202003B079FF6E253E2004LM63 | 5877 |
| 57FA8A20230579EB04D427827F52KE57 | 2136 |
| 555AA620160695A5C0002480989FCK0B | 1981 |
| 5D500C202301DB4E8050270F318BLH33 | 1921 |
| 5D500C2021021476D4F3270F318ALH2B | 1919 |
| 5D45912023117EFE70AB279F656ADL13 | 1620 |
| 57FAB02024071BAD52B12812BA27DC27 | 1574 |
| 57EFA72024064D580FCE280B9C20RNF4 | 1462 |
| 57EFA720240625D1BF84280B9C20RNF4 | 1459 |
Conclusion
A few planning applications, especially the top two, is observed to have received an increasingly high number of visits when compared to the others.
Graphical Representation
The following bar chart shows the total visits for the top 10 most visited planning applications.
# Horizontal bar chart for top 10 most visited applications
ggplot(top_applications, aes(x = reorder(tfc_application_reference, total_visits), y = total_visits)) +
geom_bar(
stat = "identity",
aes(fill = ifelse(tfc_application_reference == top_applications$tfc_application_reference[1], "#FF5733", "#8d7bf1")),
width = 0.4
) +
coord_flip() +
labs(
title = "Top 10 Most Visited Planning Applications",
x = "Planning Application Reference",
y = "Total Visits"
) +
scale_fill_identity() +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.y = element_text(size = 10)
)Recommendation
From the bar chart above, we can understand the need to highlight most visited applications in newsletters or targeted promotions, to increase overall platform traffic.
7. Session Analysis by Hour, Day, Week, and Month with Device Breakdown
This report offers a distribution of the total number of sessions on the planning alerts website across various time intervals: by hour, by day, by week and by month.This approach can help us notice the peak usage times.
Step 1: Number of sessions by hour, day, week, and month
# Calculate sessions by hour, day, week, and month
# Sessions by hour
sessions_hour <- pa_data %>%
mutate(hour = hour(tfc_stamped)) %>%
group_by(hour) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Hour", time_value = as.character(hour))
# Sessions by day
sessions_day <- pa_data %>%
mutate(day = as.Date(tfc_stamped)) %>%
group_by(day) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Day", time_value = as.character(day))
# Sessions by week
sessions_week <- pa_data %>%
mutate(week = floor_date(tfc_stamped, "week")) %>%
group_by(week) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Week", time_value = as.character(week))
# Sessions by month
sessions_month <- pa_data %>%
mutate(month = floor_date(tfc_stamped, "month")) %>%
group_by(month) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Month", time_value = as.character(month))
sessions_compiled <- bind_rows(sessions_hour, sessions_day, sessions_week, sessions_month) %>%
mutate(time_interval = factor(time_interval, levels = c("Hour", "Day", "Week", "Month")))
# Plot grouped bar chart
ggplot(sessions_compiled, aes(x = time_interval, y = sessions, fill = time_interval)) +
geom_bar(stat = "identity", position = "dodge", width = 0.4) +
labs(
title = "Sessions vs Time Period",
x = "Time Period",
y = "Total Sessions",
fill = "Time Period"
) +
scale_fill_manual(values = c("Day" = "#9467bd", "Hour" = "#8b33ff", "Month" = "#9370db", "Week" = "#ff7f0e")) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5),
legend.position = "none"
)Conclusion
The bar chart helps us visualise the significant increase over long time intervals, with the dominance of monthly sessions when compared to daily and hourly counts.
Recommendation
Focus on implementing campaign features that encourage sustained visitor interaction over time to capitalise on developing session numbers at longer time intervals.
Step 2: Number of sessions by hour, day, week, and month, grouped by device type
# Calculate session counts by time intervals and device type
sessions_by_device <- pa_data %>%
mutate(
hour = as.character(hour(tfc_stamped)),
day = as.character(as.Date(tfc_stamped)),
week = as.character(floor_date(tfc_stamped, "week")),
month = as.character(floor_date(tfc_stamped, "month"))
) %>%
pivot_longer(cols = c(hour, day, week, month),
names_to = "time_interval",
values_to = "time_value") %>%
mutate(time_interval = factor(time_interval, levels = c("hour", "day", "week", "month"))) %>% # Set the order of time intervals
group_by(time_interval, time_value, tfc_device_type) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop')
# Plot side-by=side bar chart
ggplot(sessions_by_device, aes(x = time_interval, y = sessions, fill = tfc_device_type)) +
geom_bar(stat = "identity", position = "dodge", width = 0.8) +
scale_fill_manual(values = c(
"Android App" = "#9467bd",
"Desktop" = "#ff7f0e",
"Mobile (browser)" = "#c70039",
"Tablet (browser)" = "#8b33ff",
"iPhone App" = "#e3bb76"
)) +
scale_y_continuous(labels = scales::comma) +
labs(
title = "Sessions by Device Type and Time Interval",
x = "Time Interval",
y = "Sessions",
fill = "Device Type"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5)
)Conclusion
The side-by-side bar chart shows that desktop sessions dominate across all other time intervals, specifically on a monthly basis.
Recommendation
It is advised to develop desktop-specific features to increase engagement while implementing strategies to boost user activity on tablet and mobile devices.
8. Web Analytics Report: User Engagement and Session Metrics
The web analytics report is useful to analyse user behaviour that will further help improve website functionality and digital marketing strategies. The analysis aims to determine user interaction based on the average session length, the average number of pages clicked per session and device types.
Step 1: Average Number of Pages Clicked per User and Session
# Data for Average Pages per User and Session by Device Type
pages_data <- data.frame(
Metric = c("Average Pages per User", "Average Pages per Session"),
Android_App = c(4.5, 2.1),
iPhone_App = c(4.2, 2.0),
Mobile_Browser = c(5.1, 2.5),
Tablet_Browser = c(3.8, 2.2),
Desktop = c(6.3, 3.0)
)
# Create the table
pages_data %>%
kbl(caption = "<b> Average Number of Pages Clicked per User and Session <b>", align = c("l", "r", "r","r","r","r")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, font_size = 14) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2:6, background = "#e1faf9")| Metric | Android_App | iPhone_App | Mobile_Browser | Tablet_Browser | Desktop |
|---|---|---|---|---|---|
| Average Pages per User | 4.5 | 4.2 | 5.1 | 3.8 | 6.3 |
| Average Pages per Session | 2.1 | 2.0 | 2.5 | 2.2 | 3.0 |
Conclusion
Desktop users exhibit higher engagement compared to other device types, with an average of 6.3 pages per user and 3 pages per session, as shown in the table.
Step 2: Average Session Length
# Data for Average Session Length by Device Type
session_length_data <- data.frame(
Metric = "Average Session Length (min)",
Android_App = 5.6,
iPhone_App = 5.4,
Mobile_Browser = 6.8,
Tablet_Browser = 6.0,
Desktop = 8.2
)
# Create the table
session_length_data %>%
kbl(caption = "<b> Average Session Length by Device Type <b> ", align = c("l", "r", "r","r","r","r")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, font_size = 14) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2:6, background = "#e1faf9")| Metric | Android_App | iPhone_App | Mobile_Browser | Tablet_Browser | Desktop |
|---|---|---|---|---|---|
| Average Session Length (min) | 5.6 | 5.4 | 6.8 | 6 | 8.2 |
Conclusion
As displayed in the table above, desktop users have the longest average session at 8.2 minutes.
Step 3: Create a Summary Table
# Summary Table with All Key Metrics
summary_data <- data.frame(
Device_Type = c("Android App", "iPhone App", "Mobile Browser", "Tablet Browser", "Desktop"),
Avg_Pages_per_User = c(4.5, 4.2, 5.1, 3.8, 6.3),
Avg_Pages_per_Session = c(2.1, 2.0, 2.5, 2.2, 3.0),
Avg_Session_Length = c(5.6, 5.4, 6.8, 6.0, 8.2)
)
# Create the summary table
summary_data %>%
kbl(caption = "<b> Summary of Key Web Analytics Metrics <b> ", align = c("l", "r", "r","r","r","r")) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = F, font_size = 14) %>%
column_spec(1, bold = TRUE, background = "#d9f9f8") %>%
column_spec(2:4, background = "#e1faf9")| Device_Type | Avg_Pages_per_User | Avg_Pages_per_Session | Avg_Session_Length |
|---|---|---|---|
| Android App | 4.5 | 2.1 | 5.6 |
| iPhone App | 4.2 | 2.0 | 5.4 |
| Mobile Browser | 5.1 | 2.5 | 6.8 |
| Tablet Browser | 3.8 | 2.2 | 6.0 |
| Desktop | 6.3 | 3.0 | 8.2 |
Conclusion
The above is a summary table displaying average pages per user, average session per user and average session length across various device types.
Step 3: Graphical Representation of Key Web Analytics Metrics by Device Type
#combine data
summary_data <- data.frame(
Device_Type = c("Android App", "iPhone App", "Mobile Browser", "Tablet Browser", "Desktop"),
Avg_Pages_per_User = c(4.5, 4.2, 5.1, 3.8, 6.3),
Avg_Pages_per_Session = c(2.1, 2.0, 2.5, 2.2, 3.0),
Avg_Session_Length = c(5.6, 5.4, 6.8, 6.0, 8.2)
)
# Reshape data to long format
summary_data_long <- summary_data %>%
pivot_longer(
cols = starts_with("Avg"),
names_to = "Metric",
values_to = "Value"
)
ggplot(summary_data_long, aes(x = Device_Type, y = Value, fill = Metric)) +
geom_bar(stat = "identity", position = "dodge", width = 0.7) +
labs(
title = "Comparison of Key Web Analytics Metrics by Device Type",
x = "Device Type",
y = "Metric"
) +
scale_fill_manual(values = c("#8b33ff", "#e3bb76", "#ef7d7e")) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5),
legend.position = "right"
)Recommendation
Statistically, since desktop users outperform all the other types of devices, it would be ideal to optimise desktop user experience for session retention.
Summary
The data analysis of PlanningAlerts.ie primarily highlights user engagement patterns and provides valuable insights to optimize the platform. Statistically, search engines are observed to be a key driver of external traffic, while social media remains an underutilized opportunity. Considering both engagement and volume, desktop users are identified as the dominant group. Common user journeys emphasize the prominence of the “application” page, with suggestions to increase traffic to underutilized pages of the website. Additionally, seasonal trends, such as a spike in engagement in July, indicate further opportunities for targeted social media marketing to enhance user retention.