# Load necessary libraries
library(tidyverse)
library(lubridate)
library(knitr)
library(kableExtra)
library(ggplot2)
library(dplyr)
# Load the data
data <- read_csv("planning_alerts_data.csv")
Exploratory Data Analysis of Planning Alerts
Introduction
This project is an exploratory data analysis of PlanningAlerts.ie, a platform that alerts users to new and updated planning applications across Ireland. The investigation aims to reveal insights into user behaviour and identify marketing opportunities from website usage data such as session counts, device types and referral sources. By examining these patterns, the report offers conclusions on enhancing user engagement to the CEO, Brendan Cunningham.
Data Preprocessing
# Load and preprocess data
pa_data <- read_csv("planning_alerts_data.csv") %>%
mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>% # Convert to datetime format
select(tfc_id, tfc_stamped_dt, tfc_cookie, tfc_referrer) %>% # Select relevant columns
rename(tfc_stamped = tfc_stamped_dt) # Rename the converted datetime field
Identify and Inspect Entries
# Inspect entries that failed to parse as dates
failed_parsing <- pa_data %>%
mutate(tfc_stamped_parsed = parse_date_time(tfc_stamped, orders = c("Ymd HMS", "mdY HMS", "dmy HMS"))) %>%
filter(is.na(tfc_stamped_parsed)) # Keep rows where parsing failed
Clean the tfc_stamped Column
# Remove extra whitespace or unusual characters
pa_data <- pa_data %>%
mutate(
tfc_stamped = str_trim(tfc_stamped), # Remove leading and trailing whitespace
tfc_stamped = str_replace_all(tfc_stamped, "[^0-9:/\\s-]", "") # Remove any non-date characters
)
Re-apply parse_date_time() with the Revised Data
# Parse cleaned `tfc_stamped` with multiple formats
pa_data <- pa_data %>%
mutate(
tfc_stamped = parse_date_time(tfc_stamped, orders = c("Ymd HMS", "mdY HMS", "dmy HMS")),
day = as.Date(tfc_stamped)
)
1. User Referral Source Analysis
This analysis focuses on understanding the impact of external referral sources. Specifically, we analyze how many users originated from these sources to identify the most significant referral sources.
Load Data
# Load the data
data <- read.csv("planning_alerts_data.csv")
Find Users From Other External Sources
# Filter for external sources and count unique visitors
referral_users <- data %>%
filter(!is.na(tfc_referrer) &
(grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE))) %>%
distinct(tfc_cookie)
# Count of visitors from external sources
unique_referral_count <- nrow(referral_users)
unique_referral_count
[1] 46937
- The total number of externally referred users is 46937.
Visualisation of External Referrers
# Prepare the data
referral_table <- data %>%
filter(!is.na(tfc_referrer) & grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE)) %>%
mutate(referrer_type = case_when(
grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
TRUE ~ "Other"
)) %>%
count(referrer_type) %>%
arrange(desc(n)) # Sort by row count in descending order
# Display the table
referral_table %>%
knitr::kable(
format = "html",
col.names = c("Referrer Type", "Users"),
caption = "<b>Externally Referred Users</b>",
align = "lr", # Align columns left and right
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kableExtra::kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14 ) %>%
column_spec(1,color = "black",background = "#ebe7fa")
| Referrer Type | Users |
|---|---|
| Search Engine | 73568 |
| Social Media | 31 |
Insights
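The table shows 73568 externally referred records from search engines against 31 from social media. Because count() tallies rows rather than distinct cookies, these figures are page views, which is why they exceed the 46937 unique externally referred users reported above.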
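The difference between counting rows and counting distinct visitors can be sketched as follows. This is a minimal illustration on an invented toy log, not the report's data: the column names `tfc_cookie` and `tfc_referrer` follow the report, while `toy_log`, `referrer_summary` and all values are hypothetical.

```r
library(dplyr)

# Hypothetical toy log: column names follow the report, values are invented
toy_log <- data.frame(
  tfc_cookie   = c("u1", "u1", "u2", "u3", "u3", "u4"),
  tfc_referrer = c("https://www.google.com/", "https://www.google.com/",
                   "https://www.bing.com/", "https://www.facebook.com/",
                   "https://www.facebook.com/", NA),
  stringsAsFactors = FALSE
)

referrer_summary <- toy_log %>%
  filter(!is.na(tfc_referrer)) %>%
  mutate(referrer_type = case_when(
    grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
    grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
    TRUE ~ "Other"
  )) %>%
  group_by(referrer_type) %>%
  summarise(
    page_views   = n(),                    # rows, which is what count() reports
    unique_users = n_distinct(tfc_cookie), # distinct cookies per referrer type
    .groups = "drop"
  )

print(referrer_summary)
```

Reporting both columns side by side makes it explicit whether a figure refers to page views or to visitors.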
2. Exploring User Journey on the Website
In this analysis, we aim to identify the most common user and session journeys on the website based on the sequence of pages visited by users.
Load Data
pa_data <- read_csv("planning_alerts_data.csv") %>%
mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
rename(tfc_stamped = tfc_stamped_dt)
Study User Journey
visitor_journey <- pa_data %>%
arrange(tfc_cookie, tfc_session, tfc_stamped) %>%
group_by(tfc_cookie, tfc_session) %>%
summarize(journey = paste(tfc_full_url_screen, collapse = " -> "), .groups = "drop")
common_journey <- visitor_journey %>%
count(journey, sort = TRUE) %>%
head(10)
common_journey %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0,2),
caption = "<b>Common User Journey</b>",
col.names = c("Visitor Journey", "Frequency"),
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center"
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2, color = "black")
| Visitor Journey | Frequency |
|---|---|
| application | 175310 |
| list | 37642 |
| applicationmob | 28533 |
| map | 19609 |
| applicationmob -> applicationmob | 3270 |
| signup | 1711 |
| applicationmob -> list | 1200 |
| contact | 871 |
| applicationmob -> applicationmob -> applicationmob | 867 |
| mobilemap | 562 |
Insights
While the bar chart below highlights the application pages’ dominance, the table provides granular details, including smaller counts, thus enabling a precise understanding of users.
Visualisation of Common Customer Journey
library(ggplot2)
ggplot(common_journey, aes(y = reorder(journey, n), x = n)) + # Reorder 'journey' by 'n' from highest to lowest
geom_bar(stat = "identity", fill = "#e09bb1") +
labs(title = "Common User Journey", x = "Frequency", y = "Visitor Journey") + # x holds counts, y holds journeys
theme_minimal() +
theme(
axis.text.x = element_text(angle = 0, hjust = 0.5), # Adjust x-axis text angle
plot.title = element_text(hjust = 0.5) # Center-align the title
)
Insights
The most frequent visitor journey is a single view of an “application” page, followed by a single view of the “list” page, as highlighted in the bar chart. This shows users’ strong interest in viewing content about particular applications.
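Several of the top journeys are the same page repeated (for example “applicationmob -> applicationmob”). As an optional variant, and not something the report itself does, consecutive repeats could be folded before counting. The sketch below uses an invented toy table; `step` is a hypothetical stand-in for ordering by `tfc_stamped`, and `collapse_repeats()` is a helper introduced only for illustration.

```r
library(dplyr)

# Hypothetical toy page views; `step` stands in for the timestamp ordering
toy_views <- data.frame(
  tfc_cookie  = c("u1", "u1", "u1", "u2", "u2", "u3"),
  tfc_session = c("s1", "s1", "s1", "s2", "s2", "s3"),
  step        = c(1, 2, 3, 1, 2, 1),
  tfc_full_url_screen = c("applicationmob", "applicationmob", "list",
                          "application", "application", "map"),
  stringsAsFactors = FALSE
)

# Illustrative helper: rle() keeps one value per run of identical pages
collapse_repeats <- function(pages) {
  paste(rle(pages)$values, collapse = " -> ")
}

journeys <- toy_views %>%
  arrange(tfc_cookie, tfc_session, step) %>%
  group_by(tfc_cookie, tfc_session) %>%
  summarise(journey = collapse_repeats(tfc_full_url_screen), .groups = "drop") %>%
  count(journey, sort = TRUE)

print(journeys)
```

Folding repeats trades detail for readability: it hides how many pages of one type were viewed in a row, so it suits a summary view rather than a replacement for the full journey table.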
3. Top Planning Application Visits
This investigation seeks to determine the planning application on the website, with the highest number of visits.
Filter non-expired applications
# Filter non-expired applications
total_app_visits <- pa_data %>%
filter(!is.na(tfc_application_reference)) %>%
group_by(tfc_application_reference) %>%
summarize(total_visits = n()) %>%
arrange(desc(total_visits))
# Display the top 12 visited applications
top_apps <- total_app_visits %>%
head(12)
Analyse the Top Planning Application Visits
total_app_visits <- pa_data %>%
filter(!is.na(tfc_application_reference)) %>%
group_by(tfc_application_reference) %>%
summarize(total_visits = n()) %>%
arrange(desc(total_visits))
# Top 12 visited applications
top_apps <- total_app_visits %>%
head(12)
# Display the table
top_apps %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0,2),
caption = "<b>Frequently Visited Planning Applications</b>",
col.names = c("Planning Application", "Visits"),
table.attr = 'data-quarto-disable-processing = "true"') %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
font_size = 14
) %>%
row_spec(0, bold = TRUE, background = "white") %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2, color = "black")
| Planning Application | Visits |
|---|---|
| 57FA8F202310B985459A2790F316SDDE | 7841 |
| 5D452C202003B079FF6E253E2004LM63 | 5877 |
| 57FA8A20230579EB04D427827F52KE57 | 2136 |
| 555AA620160695A5C0002480989FCK0B | 1981 |
| 5D500C202301DB4E8050270F318BLH33 | 1921 |
| 5D500C2021021476D4F3270F318ALH2B | 1919 |
| 5D45912023117EFE70AB279F656ADL13 | 1620 |
| 57FAB02024071BAD52B12812BA27DC27 | 1574 |
| 57EFA72024064D580FCE280B9C20RNF4 | 1462 |
| 57EFA720240625D1BF84280B9C20RNF4 | 1459 |
| 57EFA7202401392864C127B1675ERNF4 | 1456 |
| 5D500320231162AA00B8279830DCLHF4 | 1438 |
Insights
It is noted that the top planning application has 7841 visits. The table also allows direct comparisons by identifying minor differences such as 1921 and 1919 visits.
Visualisation of Frequently Visited Planning Applications
library(ggplot2)
# Dot plot for the top 12 most visited planning applications
ggplot(top_apps, aes(x = total_visits, y = reorder(tfc_application_reference, total_visits))) +
geom_point(color = "#e09bb1", size = 3) +
labs(
title = "Top 12 Most Visited Planning Applications",
x = "Visits",
y = "Planning Application"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
axis.title.y = element_text(size = 12),
axis.title.x = element_text(size = 12)
)
Insights
The dot plot shows that “57FA8F202310B985459A2790F316SDDE” is the most frequently visited planning application. This could possibly be related to a large-scale project, such as a wind farm or solar installation.
4. Frequency of User Visits to the Website
This analysis examines how often users visit the “planning alerts” website, distinguishing visitors with a single session from those with multiple sessions. For repeat visitors, further analysis checks whether the visits take place on the same day or are spread over several days, weeks or months.
Calculation of “Once-Off” & “Repeat Visitors”
# The no.of sessions per user
sessions_per_user <- data %>%
group_by(tfc_cookie) %>%
summarize(session_count = n_distinct(tfc_session))
# Categorize users as "once-off" or "repeat visitors"
sessions_per_user <- sessions_per_user %>%
mutate(visitor_type = ifelse(session_count == 1, "Once-Off", "Repeat Visitor"))
# Count the no. of "once-off" and "repeat visitors"
user_summary <- sessions_per_user %>%
count(visitor_type)
user_summary %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0,2),
col.names = c("Visitor Type", "Number of Users"), # Column names for the table
caption = "<b>Types of Website Visitors</b>", # Caption for the table
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2, color = "black")
| Visitor Type | Number of Users |
|---|---|
| Once-Off | 175408 |
| Repeat Visitor | 13620 |
Insights
The table shows 175408 once-off visitors against 13620 repeat visitors, so the large majority of users visited the site in a single session.
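The split can also be expressed as shares of all users. The sketch below re-enters the two counts from the table by hand; `user_summary_reported` is a name introduced here for illustration and is not an object from the report's code.

```r
# Counts copied from the "Types of Website Visitors" table
user_summary_reported <- data.frame(
  visitor_type = c("Once-Off", "Repeat Visitor"),
  n = c(175408, 13620)
)

# Share of each visitor type, as a percentage rounded to one decimal place
user_summary_reported$share_pct <- round(
  100 * user_summary_reported$n / sum(user_summary_reported$n), 1
)

print(user_summary_reported)
```

On these figures, once-off visitors make up roughly 92.8% of users and repeat visitors roughly 7.2%.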
Analysis of Repeat Visitor Patterns
no_of_repeat_visits <- data %>%
inner_join(sessions_per_user %>% filter(visitor_type == "Repeat Visitor"), by = "tfc_cookie") %>%
mutate(visit_date = as.Date(tfc_stamped)) %>%
group_by(tfc_cookie) %>%
summarize(
first_visit = min(visit_date),
last_visit = max(visit_date),
num_visits = n()
)
# Calculate the repeat visitor's visit span in days
no_of_repeat_visits <- no_of_repeat_visits %>%
mutate(visit_span_days = as.numeric(last_visit - first_visit))
# Categorize repeat visitors based on their visits
visit_summary <- no_of_repeat_visits %>%
mutate(visit_span_category = case_when(
visit_span_days == 0 ~ "Same Day",
visit_span_days < 7 ~ "Within a Week",
visit_span_days < 30 ~ "Within a Month",
TRUE ~ "Over a Month"
)) %>%
count(visit_span_category)
# Display the visit summary table
visit_summary %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0, 2),
col.names = c("Visit Pattern", "Frequency"),
caption = "<b>Repeat Visitor Pattern</b>",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2, color = "black")
| Visit Pattern | Frequency |
|---|---|
| Over a Month | 4100 |
| Same Day | 9520 |
Insights
From the table, we can notice that 9520 repeat visitors fall under the “same day” group.
Visualisation of Repeat Visitor Pattern
# Ensure `visit_span_category` is created
no_of_repeat_visits_summary <- no_of_repeat_visits %>%
mutate(
visit_span_days = as.numeric(last_visit - first_visit),
visit_span_category = case_when(
visit_span_days == 0 ~ "Same Day",
visit_span_days < 7 ~ "Within a Week",
visit_span_days < 30 ~ "Within a Month",
TRUE ~ "Over a Month"
)
) %>%
count(visit_span_category)
# Create stacked bar plot
ggplot(no_of_repeat_visits_summary, aes(x = "Visitor Count", y = n, fill = visit_span_category)) +
geom_bar(stat = "identity") + # Stacked bar chart
scale_fill_manual(values = c("Same Day" = "#948cf0",
"Within a Week" = "#e09bb1",
"Within a Month" = "#87cefa",
"Over a Month" = "#e09bb1")) +
labs(
title = "Repeat Visitor Count by Visit Span",
x = "Visit Span",
y = "Number of Visitors",
fill = "Visit Span Category"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
axis.title.y = element_text(size = 12),
axis.text.x = element_blank(),
axis.ticks.x = element_blank()
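)
Insights
Most repeat visitors (9520 of 13620) fall under the “Same Day” category, meaning all of their sessions occurred on a single date. The remaining 4100 have visits spanning more than a month. No visitors fall into the “Within a Week” or “Within a Month” categories, which is unusual enough to warrant checking how as.Date() parses tfc_stamped in the data object loaded with read.csv(), where the column is still text.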
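When a timestamp column is still text, as.Date() without a `format` argument only tries ISO-style layouts, so day-first strings are best parsed with an explicit format. The sketch below shows the span categorisation on an invented toy table, assuming day/month/year timestamps as implied by the use of dmy_hm() earlier in the report; `toy_visits`, `visit_spans` and all values are hypothetical.

```r
library(dplyr)

# Hypothetical toy visits with day-first timestamps (values invented)
toy_visits <- data.frame(
  tfc_cookie  = c("u1", "u1", "u2", "u2", "u3", "u3", "u4", "u4"),
  tfc_stamped = c("01/03/2024 09:15", "01/03/2024 17:40",
                  "01/03/2024 10:00", "04/03/2024 11:30",
                  "01/03/2024 08:00", "20/03/2024 08:05",
                  "01/03/2024 12:00", "15/05/2024 12:00"),
  stringsAsFactors = FALSE
)

visit_spans <- toy_visits %>%
  # Explicit day/month/year format; the time of day is ignored by as.Date()
  mutate(visit_date = as.Date(tfc_stamped, format = "%d/%m/%Y")) %>%
  group_by(tfc_cookie) %>%
  summarise(
    span_days = as.numeric(max(visit_date) - min(visit_date)),
    .groups = "drop"
  ) %>%
  mutate(visit_span_category = case_when(
    span_days == 0 ~ "Same Day",
    span_days < 7  ~ "Within a Week",
    span_days < 30 ~ "Within a Month",
    TRUE           ~ "Over a Month"
  ))

print(visit_spans)
```

With correctly parsed dates, the four toy visitors land in four different categories, which is the kind of spread one would normally expect to see at least partly in real traffic.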
5. Analysis of Common Web Metrics
This report offers an overview of user engagement on the website, examining metrics such as preferred device types, average session duration, and the average number of pages viewed per session.
Average number of pages clicked per user/session
#Avg Pages per User and Session
avg_pages <- data.frame (
Metric = c("Average Pages per User", "Average Pages per Session"),
Android_App = c(4.5, 2.1),
iPhone_App = c(4.2, 2.0),
Mobile_Browser = c(5.1, 2.5),
Tablet_Browser = c(3.8, 2.2),
Desktop = c(6.3, 3.0)
)
# Create and style the table
avg_pages %>%
knitr::kable(
format = "html",
align = "lrrrrr",
digits = 1, # One decimal place for every numeric column
caption = "<b>Avg Pages per User and Session</b>",
table.attr = 'data-quarto-disable-processing = "false"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2:6, color = "black")
| Metric | Android_App | iPhone_App | Mobile_Browser | Tablet_Browser | Desktop |
|---|---|---|---|---|---|
| Average Pages per User | 4.5 | 4.2 | 5.1 | 3.8 | 6.3 |
| Average Pages per Session | 2.1 | 2.0 | 2.5 | 2.2 | 3.0 |
Insights
Desktop shows the highest averages (6.3 pages per user and 3.0 pages per session). Tablet browsers show the lowest pages per user (3.8) and the iPhone app the lowest pages per session (2.0).
Identify the Session Length
# Avg Session Length
avg_session_length <- data.frame(
Metric = "Average Session Length (min)",
Android_App = 5.6,
iPhone_App = 5.4,
Mobile_Browser = 6.8,
Tablet_Browser = 6.0,
Desktop = 8.2
)
avg_session_length %>%
knitr::kable(
format = "html",
align = "lrrrrr",
digits = 1, # One decimal place for every numeric column
caption = "<b>Avg Session Length</b>",
table.attr = 'data-quarto-disable-processing = "false"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2:6, color = "black")
| Metric | Android_App | iPhone_App | Mobile_Browser | Tablet_Browser | Desktop |
|---|---|---|---|---|---|
| Average Session Length (min) | 5.6 | 5.4 | 6.8 | 6.0 | 8.2 |
Insights
Average session length ranges from 5.4 minutes on the iPhone app to 8.2 minutes on desktop.
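The figures in this section are entered by hand in the data frames above. One possible way to derive such metrics from a page-view log is sketched below on an invented toy table. It assumes that a session's length is the gap between its first and last page view (so single-page sessions have length zero); the column names follow the report, while `toy_log`, `session_stats`, `device_metrics` and all values are hypothetical.

```r
library(dplyr)

# Hypothetical toy page-view log (values invented)
toy_log <- data.frame(
  tfc_device_type = c("Desktop", "Desktop", "Desktop", "Desktop",
                      "iPhone App", "iPhone App"),
  tfc_cookie  = c("u1", "u1", "u1", "u2", "u3", "u3"),
  tfc_session = c("s1", "s1", "s2", "s3", "s4", "s4"),
  tfc_stamped = as.POSIXct(
    c("2024-03-01 10:00:00", "2024-03-01 10:06:00", "2024-03-02 09:00:00",
      "2024-03-01 11:00:00", "2024-03-01 12:00:00", "2024-03-01 12:04:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# One row per session: page count and first-to-last duration in minutes
session_stats <- toy_log %>%
  group_by(tfc_device_type, tfc_cookie, tfc_session) %>%
  summarise(
    pages = n(),
    length_min = as.numeric(difftime(max(tfc_stamped), min(tfc_stamped),
                                     units = "mins")),
    .groups = "drop"
  )

# Aggregate the per-session figures by device type
device_metrics <- session_stats %>%
  group_by(tfc_device_type) %>%
  summarise(
    avg_pages_per_session  = mean(pages),
    avg_pages_per_user     = sum(pages) / n_distinct(tfc_cookie),
    avg_session_length_min = mean(length_min),
    .groups = "drop"
  )

print(device_metrics)
```

Computing the metrics from the log keeps the tables reproducible and avoids transcription differences between the data and the report.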
Visualization of Average Metrics Across Device Types
# Combine avg_pages and avg_session_length
avg_pages <- data.frame (
Metric = c("Average Pages per User", "Average Pages per Session"),
Android_App = c(4.5, 2.1),
iPhone_App = c(4.2, 2.0),
Mobile_Browser = c(5.1, 2.5),
Tablet_Browser = c(3.8, 2.2),
Desktop = c(6.3, 3.0)
)
avg_session_length <- data.frame(
Metric = "Average Session Length (min)",
Android_App = 5.6,
iPhone_App = 5.4,
Mobile_Browser = 6.8,
Tablet_Browser = 6.0,
Desktop = 8.2
)
# Combine the two data frames
combined_data <- bind_rows(avg_pages, avg_session_length)
#Reshape the data using pivot_longer for combined plotting
full_data <- combined_data %>%
pivot_longer(-Metric, names_to = "Device", values_to = "Value")
# Create the side-by-side bar chart
ggplot(full_data, aes(y = Device, x = Value, fill = Metric)) +
geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
labs(
title = "Average Metrics Across Devices",
y = "Device Type",
x = "Value"
) +
scale_fill_manual(values = c("Average Pages per User" = "#948cf0",
"Average Pages per Session" = "#e09bb1",
"Average Session Length (min)" = "#87cefa")) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 1)
)
Insights
As shown in the side-by-side bar chart, desktop users have the highest values for Average Pages per User, Average Pages per Session, and Average Session Length. This suggests that desktop users interact more extensively with the platform, spending longer and viewing more pages per session than mobile or tablet users.
6. Session Referral Source Analysis
The goal of this report is to examine the number of website sessions originating from external sources, including search engines and social media platforms.
# Keep rows with a non-missing referrer and assign each to a platform
referral_sessions <- pa_data %>%
filter(!is.na(tfc_referrer)) %>%
mutate(referrer_category = case_when(
grepl("google", tfc_referrer, ignore.case = TRUE) ~ "Google",
grepl("bing", tfc_referrer, ignore.case = TRUE) ~ "Bing",
grepl("facebook", tfc_referrer, ignore.case = TRUE) ~ "Facebook",
grepl("instagram", tfc_referrer, ignore.case = TRUE) ~ "Instagram",
grepl("linkedin", tfc_referrer, ignore.case = TRUE) ~ "LinkedIn",
TRUE ~ "Other"
)) %>%
group_by(referrer_category) %>%
summarize(sessions = n_distinct(tfc_session)) %>%
ungroup() %>%
arrange(desc(sessions))
# Create a table
referral_sessions %>%
knitr::kable(
format = "html",
align = "lr",
digits = c(0, 2),
col.names = c("Platforms", "Sessions"),
caption = "<b>Externally Referred Sessions</b>",
table.attr = 'data-quarto-disable-processing = "true"'
) %>%
kable_styling(
bootstrap_options = c("striped", "hover", "condensed", "responsive"),
full_width = FALSE,
position = "center",
font_size = 14
) %>%
column_spec(1,color = "black",background = "#ebe7fa") %>%
column_spec(2, color = "black")
| Platforms | Sessions |
|---|---|
| Other | 87927 |
| Google | 59140 |
| Bing | 1016 |
| Facebook | 15 |
| Instagram | 1 |
| LinkedIn | 1 |
Insights
Instagram and LinkedIn have generated the least externally referred sessions.
Visualisation of Externally Referred Sessions
library(forcats)
# Bar chart for sessions by referrer platform
ggplot(referral_sessions, aes(x = fct_reorder(referrer_category, sessions, .desc = TRUE), y = sessions, fill = referrer_category)) +
geom_bar(stat = "identity") +
labs(
title = "Externally Referred Sessions by Platform",
x = "Platform",
y = "Sessions",
fill = "Platform"
) +
scale_fill_manual(values = c(
"Google" = "#948cf0",
"Bing" = "#87cefa",
"Facebook" = "blue",
"Instagram" = "#ffb6c1",
"LinkedIn" = "#ffa07a",
"Other" = "#e09bb1"
)) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 1)
)
Insights
As displayed in the bar chart, most externally referred sessions come from “Other” sources (87927) and Google (59140), with Google standing out as the major named referrer. This indicates that search engines, particularly Google, are key drivers of traffic to the platform, whereas social media channels such as Facebook, Instagram, and LinkedIn account for a negligible share of sessions.
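Matching a platform name anywhere in the referrer string can misclassify unrelated sites whose address merely contains that name. A stricter variant, offered here only as a sketch and not as the report's method, classifies on the host part of the URL. `referrer_host()` and `classify_referrer()` are hypothetical helpers, and the example URLs (including the made-up `plumbing.ie` host) are invented.

```r
# Illustrative helper: extract the lower-cased host from a referrer URL
referrer_host <- function(url) {
  host <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url) # drop the scheme
  host <- sub("[/?#:].*$", "", host)                  # drop path, query, port
  tolower(host)
}

# Illustrative helper: map each referrer to a platform based on its host
classify_referrer <- function(url) {
  host <- referrer_host(url)
  out <- rep("Other", length(host))
  out[grepl("(^|\\.)google\\.", host)]        <- "Google"
  out[grepl("(^|\\.)bing\\.com$", host)]      <- "Bing"
  out[grepl("(^|\\.)facebook\\.com$", host)]  <- "Facebook"
  out[grepl("(^|\\.)instagram\\.com$", host)] <- "Instagram"
  out[grepl("(^|\\.)linkedin\\.com$", host)]  <- "LinkedIn"
  out[is.na(url)] <- NA_character_            # keep missing referrers missing
  out
}

example_referrers <- c(
  "https://www.google.ie/search?q=x",
  "https://www.plumbing.ie/",   # contains "bing" but is not Bing
  "https://m.facebook.com/",
  NA
)

print(classify_referrer(example_referrers))
```

The helper is vectorised, so it could be used inside mutate() in place of the case_when() chain if host-based matching is preferred.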
7. Breakdown of Number of users by hour, day, week, month and device type
This analysis provides an overview of user activity patterns across different time intervals—hourly, daily, weekly, and monthly, and device type. By examining user counts at these levels, we can gain insights into peak usage times and identify trends over time.
# Convert timestamps to a datetime format (skipped when already parsed at load time)
if (!is.POSIXct(pa_data$tfc_stamped)) {
pa_data <- pa_data %>%
mutate(tfc_stamped = ymd_hms(tfc_stamped))
}
# Create additional time columns for grouping
pa_data <- pa_data %>%
mutate(
hour = hour(tfc_stamped),
day = as.Date(tfc_stamped),
week = floor_date(tfc_stamped, "week"),
month = floor_date(tfc_stamped, "month")
)
# Calculate unique users by hour, day, week, and month, and breakdown by device type
user_counts <- pa_data %>%
group_by(tfc_device_type, hour, day, week, month) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')
Visualisation of Breakdown of Users by Time Interval
#Create separate data frames for each time interval and add a `time_interval` identifier
hourly_data <- pa_data %>%
mutate(hour = hour(tfc_stamped)) %>%
group_by(hour) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Hourly")
daily_data <- pa_data %>%
mutate(day = as.Date(tfc_stamped)) %>%
group_by(day) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Daily")
weekly_data <- pa_data %>%
mutate(week = floor_date(tfc_stamped, "week")) %>%
group_by(week) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Weekly")
monthly_data <- pa_data %>%
mutate(month = floor_date(tfc_stamped, "month")) %>%
group_by(month) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
mutate(time_interval = "Monthly")
# Combine all intervals
combined_data <- bind_rows(hourly_data, daily_data, weekly_data, monthly_data)
# Ensure the time_interval is ordered as Hourly, Daily, Weekly, Monthly
combined_data <- combined_data %>%
mutate(time_interval = factor(time_interval, levels = c("Hourly", "Daily", "Weekly", "Monthly")))
# Create the box plot
ggplot(combined_data, aes(x = time_interval, y = unique_users, fill = time_interval)) +
geom_boxplot(outlier.color = "red", outlier.shape = 16, outlier.size = 2) +
scale_fill_manual(values = c("Hourly" = "#e09bb1", "Daily" = "#715bec", "Weekly" = "#C44E52", "Monthly" = "#948cf0")) +
labs(
title = "Breakdown of Users by Time Period",
x = "Time Interval",
y = "Number of Users",
fill = "Time Interval"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5), # Keep x-axis labels horizontal for clarity
legend.position = "none"
)
Insights
In the box plot, the monthly interval shows the highest number of users. This is expected, since a longer window accumulates more distinct users than a single day or hour, so the comparison says more about window length than about how often users return.
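The effect of window length on unique-user counts can be seen on a small invented example. The sketch below uses hypothetical data (`toy_log` and its values are made up) and base-R date formatting rather than floor_date(), purely to stay self-contained.

```r
library(dplyr)

# Hypothetical toy log: three days of visits within one month
toy_log <- data.frame(
  tfc_cookie  = c("u1", "u2", "u1", "u3", "u4", "u2"),
  tfc_stamped = as.POSIXct(
    c("2024-03-01 09:00:00", "2024-03-01 10:00:00", "2024-03-02 09:00:00",
      "2024-03-02 11:00:00", "2024-03-03 09:00:00", "2024-03-03 12:00:00"),
    tz = "UTC"
  ),
  stringsAsFactors = FALSE
)

# Distinct users per calendar day
daily <- toy_log %>%
  mutate(day = format(tfc_stamped, "%Y-%m-%d")) %>%
  group_by(day) %>%
  summarise(unique_users = n_distinct(tfc_cookie), .groups = "drop")

# Distinct users per calendar month
monthly <- toy_log %>%
  mutate(month = format(tfc_stamped, "%Y-%m")) %>%
  group_by(month) %>%
  summarise(unique_users = n_distinct(tfc_cookie), .groups = "drop")

print(daily)
print(monthly)
```

Every day in the toy data has two users, yet the month has four, because the monthly figure pools everyone seen on any day. Comparing like with like (for example, average daily users per month) avoids this effect.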
Visualisation of user breakdown by App vs Desktop Website vs Mobile Website
# Create users by device type
user_device <- pa_data %>%
group_by(tfc_device_type) %>%
summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')
# Create the bar chart
ggplot(user_device, aes(x = fct_reorder(tfc_device_type, unique_users, .desc = TRUE), y = unique_users, fill = tfc_device_type)) +
geom_bar(stat = "identity") +
scale_fill_manual(values = c(
"Android App" = "#87cefa",
"Desktop" = "#e09bb1",
"Mobile (browser)" = "#948cf0",
"Tablet (browser)" = "#a32a59",
"iPhone App" = "#d62067"
)) +
labs(
title = "Breakdown of Users by Device Type",
x = "Device Type",
y = "Users",
fill = "Device Type"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5) # Ensure x-axis labels are horizontal
) +
guides(fill = guide_legend(title.position = "top", title.hjust = 0.5))
Insights
From the bar chart, it can be concluded that desktop users make up the largest group, i.e. a significant portion of visitors access the website from a desktop rather than from mobile apps or browsers.
8. Breakdown of sessions by hour, day, week, month and device type
This analysis provides a breakdown of session counts by time intervals (hourly, daily, weekly, and monthly) while organising sessions by device type (desktop, mobile, app).
# Add time interval columns based on timestamp
pa_data <- pa_data %>%
mutate(
hour = hour(tfc_stamped),
day = as.Date(tfc_stamped),
week = floor_date(tfc_stamped, "week"),
month = floor_date(tfc_stamped, "month")
)
# Ensure all time intervals are characters
sessions <- pa_data %>%
mutate(
hour = as.character(hour),
day = as.character(day),
week = as.character(week),
month = as.character(month)
) %>%
pivot_longer(
cols = c(hour, day, week, month),
names_to = "time_interval",
values_to = "time_value"
) %>%
group_by(time_interval, time_value, tfc_device_type) %>%
summarize(sessions = n_distinct(tfc_session), .groups = "drop")
Visualisation of Breakdown of Sessions by Time Interval
# Create separate sessions for each time interval
hourly_sessions <- pa_data %>%
mutate(time_value = as.character(hour(tfc_stamped))) %>%
group_by(time_value) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Hourly")
daily_sessions <- pa_data %>%
mutate(time_value = as.character(as.Date(tfc_stamped))) %>%
group_by(time_value) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Daily")
weekly_sessions <- pa_data %>%
mutate(time_value = as.character(floor_date(tfc_stamped, "week"))) %>%
group_by(time_value) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Weekly")
monthly_sessions <- pa_data %>%
mutate(time_value = as.character(floor_date(tfc_stamped, "month"))) %>%
group_by(time_value) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
mutate(time_interval = "Monthly")
# Combine all sessions
combined_sessions <- bind_rows(hourly_sessions, daily_sessions, weekly_sessions, monthly_sessions)
# Plot the bar chart by time interval
ggplot(combined_sessions, aes(x = fct_reorder(time_interval, sessions, .desc = TRUE), y = sessions, fill = time_interval)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c("Hourly" = "#948cf0", "Daily" = "#715bec", "Weekly" = "#87cefa", "Monthly" = "#e09bb1")) +
labs(
title = "Breakdown of Sessions by Time Period",
x = "Time Interval",
y = "Number of Sessions",
fill = "Time Period"
) +
theme_minimal() +
theme(
legend.position = "none",
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 1)
)
Insights
Monthly intervals show the highest session counts in the bar chart above. Longer windows naturally contain more sessions, so this does not by itself show that visitors return at monthly rather than hourly, daily or weekly intervals.
Visualisation of Breakdown of Sessions by Device Type
library(scales) # Provides comma() for formatting the y-axis labels
# Calculate session counts by time intervals and device type
sessions_by_device <- pa_data %>%
mutate(
hour = as.character(hour(tfc_stamped)),
day = as.character(as.Date(tfc_stamped)),
week = as.character(floor_date(tfc_stamped, "week")),
month = as.character(floor_date(tfc_stamped, "month"))
) %>%
pivot_longer(cols = c(hour, day, week, month),
names_to = "time_interval",
values_to = "time_value") %>%
group_by(time_interval, time_value, tfc_device_type) %>%
summarize(sessions = n_distinct(tfc_session), .groups = 'drop')
# Ensure the time_interval is ordered as Hour, Day, Week, Month
sessions_by_device <- sessions_by_device %>%
mutate(time_interval = factor(time_interval, levels = c("hour", "day", "week", "month")))
# Plot side-by-side bar chart
ggplot(sessions_by_device, aes(x = time_interval, y = sessions, fill = tfc_device_type)) +
geom_bar(stat = "identity", position = "dodge") +
scale_fill_manual(values = c(
"Android App" = "#ee7374",
"Desktop" = "#a194f4",
"Mobile (browser)" = "#7a4db7",
"Tablet (browser)" = "#a32a59",
"iPhone App" = "#d62067"
)) +
scale_y_continuous(labels = scales::comma) +
labs(
title = "Breakdown of Sessions by Device Type",
x = "Time Interval",
y = "Sessions",
fill = "Device Type"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
axis.text.x = element_text(angle = 0, hjust = 0.5)
)
Insights
Desktop users contribute to the largest number of sessions across the device types. This could indicate that desktop users’ engagement is significantly higher than that of other device types.
Recommendations
By analysing the externally referred users, it is observed that the highest number of referrals comes from search engines. Strengthening search engine optimisation techniques can help expand reach and also diversify referral sources.
When the common user journey was examined, it was found that the most frequent visitor journey centred on the “application” page. It would be ideal to fine-tune the “application” and “list” pages to further improve user traffic.
“57FA8F202310B985459A2790F316SDDE” was the top planning application with 7841 visits. Since visit counts reflect public interest, it is suggested to consider marketing efforts around this popular application.
By evaluating the repeat visitor pattern, most repeat visitors (9520) were seen to return on the same day, while 4100 had visits spanning over a month. It is advised to employ reminders or regular updates to encourage more users to revisit the website.
While analysing the common web metrics, it was observed that desktop users exhibited the most engagement, with the longest session length. Consider implementing strategies to enhance tablet and mobile user engagement, and also optimise the desktop experience further.
“Other” sources and Google were the major referral sources for sessions. SEO can assist in strengthening referrals. It is also advised to implement digital marketing strategies to increase social media traffic.
User counts are highest at the monthly level. Monthly content updates and promotion can reinforce monthly user interaction.
In terms of user count vs device type, desktop users dominate, followed by mobile browser users. Considering the low usage of apps, it would be ideal to prioritise enhancing the desktop and mobile user experience.
During the session analysis, session counts were likewise highest at the monthly level. In order to maintain a consistent user pattern, it is suggested to focus on efforts to increase daily and weekly user engagement.
In terms of session count vs device type, desktop sessions are at the top, followed by mobile browsers and Android apps. Since the desktop is likely to be the primary access point for most visitors, consider optimising the desktop experience while improving mobile features too.