Exploratory Data Analysis of Planning Alerts

Author

Gopika Manoj (C00313523)

Introduction

This project is an exploratory data analysis of PlanningAlerts.ie, a platform that alerts users to new and updated planning applications across Ireland. The investigation aims to reveal insights into user behaviour and to identify marketing opportunities from website usage data such as session counts, device types and referral sources. By examining these patterns, the report offers conclusions on enhancing user engagement for the CEO, Brendan Cunningham.

# Load necessary libraries

library(tidyverse)   
library(lubridate)   
library(knitr)      
library(kableExtra) 
library(ggplot2)
library(dplyr)


# Load the data

data <- read_csv("planning_alerts_data.csv")

Data Preprocessing

# Load and preprocess data

pa_data <- read_csv("planning_alerts_data.csv") %>%
  mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%  # Convert to datetime format
  select(tfc_id, tfc_stamped_dt, tfc_cookie, tfc_referrer) %>%  # Select relevant columns
  rename(tfc_stamped = tfc_stamped_dt)  # Rename the converted datetime field

Identify and Inspect Entries

# Inspect entries that failed to parse as dates

failed_parsing <- pa_data %>%
  mutate(tfc_stamped_parsed = parse_date_time(tfc_stamped, orders = c("Ymd HMS", "mdY HMS", "dmy HMS"))) %>%
  filter(is.na(tfc_stamped_parsed))  # Keep rows where parsing failed

Clean the tfc_stamped Column

# Remove extra whitespace or unusual characters

pa_data <- pa_data %>%
  mutate(
    tfc_stamped = str_trim(tfc_stamped),  # Remove leading and trailing whitespace
    tfc_stamped = str_replace_all(tfc_stamped, "[^0-9:/\\s-]", "")  # Keep only digits, colons, slashes, whitespace and hyphens
  )

Re-apply parse_date_time() with the Revised Data

# Parse cleaned `tfc_stamped` with multiple formats

pa_data <- pa_data %>%
  mutate(
    tfc_stamped = parse_date_time(tfc_stamped, orders = c("Ymd HMS", "mdY HMS", "dmy HMS")),
    day = as.Date(tfc_stamped)
    
  )
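
The cleaning and re-parsing steps above can be tried on a few toy values first. This is a minimal sketch only: the strings in `raw` are invented, and the `dmy HM` and `Ymd HM` orders are assumptions chosen to match those toy strings, not the formats confirmed for planning_alerts_data.csv.

```r
library(lubridate)

# Toy timestamps (hypothetical values, not taken from the dataset)
raw <- c("25/12/2023 10:15", "2023-12-26 08:00", " 27/12/2023 23:59 ", "not a date")

# Trim surrounding whitespace, then try day-first and year-first orders
parsed <- parse_date_time(trimws(raw), orders = c("dmy HM", "Ymd HM"), quiet = TRUE)

# Entries that could not be parsed come back as NA and can be inspected
failed <- raw[is.na(parsed)]
n_failed <- length(failed)
```

Inspecting `failed` before overwriting the original column makes it easier to see whether the remaining problems are formatting issues or genuinely invalid entries.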

1. User Referral Source Analysis

This analysis focuses on understanding the impact of external referral sources. Specifically, we analyze how many users originated from these sources to identify the most significant referral sources.

Load Data

# Load the data

data <- read.csv("planning_alerts_data.csv")

Find Users From Other External Sources

# Filter for external sources and count unique visitors

referral_users <- data %>%
  filter(!is.na(tfc_referrer) & 
           (grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE))) %>%
  distinct(tfc_cookie)

# Count of visitors from external sources

unique_referral_count <- nrow(referral_users)
unique_referral_count
[1] 46937
  • The number of unique users (distinct cookies) referred from the listed search engines and social media platforms is 46937.

Visualisation of External Referrers

# Prepare the data

referral_table <- data %>%
  filter(!is.na(tfc_referrer) & grepl("google|bing|facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE)) %>%
  mutate(referrer_type = case_when(
    grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
    grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
    TRUE ~ "Other"
  )) %>%
  count(referrer_type) %>%
  arrange(desc(n))  # Sort by row count in descending order

# Display the table

referral_table %>%
  knitr::kable(
    format = "html",
    col.names = c("Referrer Type", "Users"),
    caption = "<b>Externally Referred Users</b>",
    align = "lr",  # Align columns left and right
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  
  kableExtra::kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14 ) %>%
  
    column_spec(1,color = "black",background = "#ebe7fa")

Externally Referred Users

Referrer Type    Users
Search Engine    73568
Social Media        31

Insights

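The table shows that search engines account for 73568 of the externally referred records, compared with 31 from social media. Because the table counts rows and not distinct cookies, these figures exceed the 46937 unique referred users reported earlier.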
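
Since `count(referrer_type)` counts rows, a cookie that arrives from a search engine several times is counted each time. The sketch below, on invented data, reports both the row count and the number of distinct cookies per referrer type. The column names follow the dataset, but the URLs and cookie values are hypothetical.

```r
library(dplyr)

# Invented page hits: cookie "a" arrives from Google twice
hits <- tibble(
  tfc_cookie   = c("a", "a", "b", "c", "c", "c"),
  tfc_referrer = c("https://www.google.com/", "https://www.google.com/",
                   "https://www.bing.com/", "https://m.facebook.com/",
                   NA, "https://www.google.ie/")
)

by_type <- hits %>%
  filter(!is.na(tfc_referrer)) %>%
  mutate(referrer_type = case_when(
    grepl("google|bing", tfc_referrer, ignore.case = TRUE) ~ "Search Engine",
    grepl("facebook|instagram|linkedin", tfc_referrer, ignore.case = TRUE) ~ "Social Media",
    TRUE ~ "Other"
  )) %>%
  group_by(referrer_type) %>%
  summarize(
    hits  = n(),                      # rows, i.e. recorded hits
    users = n_distinct(tfc_cookie),   # distinct cookies
    .groups = "drop"
  )
```

Reporting the two columns side by side keeps the "Users" label reserved for distinct cookies.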

2. Exploring User Journey on the Website

In this analysis, we aim to identify the most common user and session journeys on the website based on the sequence of pages visited by users.

Load Data

pa_data <- read_csv("planning_alerts_data.csv") %>%
  mutate(tfc_stamped_dt = dmy_hm(tfc_stamped)) %>%
  select(tfc_id, tfc_stamped_dt, tfc_cookie:tfc_referrer) %>%
  rename(tfc_stamped = tfc_stamped_dt)

Study User Journey

visitor_journey <- pa_data %>%
  arrange(tfc_cookie, tfc_session, tfc_stamped) %>%
  group_by(tfc_cookie, tfc_session) %>%
  summarize(journey = paste(tfc_full_url_screen, collapse = " -> "), .groups = "drop")


common_journey <- visitor_journey %>%
  count(journey, sort = TRUE) %>%
  head(10)  

common_journey %>%
  knitr::kable(
    format = "html",
    align = "lr",
    digits = c(0,2),
    caption = (" <b> Common User Journey </b> "),
    col.names = c("Visitor Journey", "Frequency"),
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center"
  ) %>%
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2, color = "black")

Common User Journey

Visitor Journey                                       Frequency
application                                              175310
list                                                      37642
applicationmob                                            28533
map                                                       19609
applicationmob -> applicationmob                           3270
signup                                                     1711
applicationmob -> list                                     1200
contact                                                     871
applicationmob -> applicationmob -> applicationmob          867
mobilemap                                                   562

Insights

While the bar chart below highlights the dominance of the application pages, the table gives the exact counts, including the smaller ones, for a more precise picture of user journeys.

Visualisation of Common Customer Journey

library(ggplot2)

ggplot(common_journey, aes(y = reorder(journey, n), x = n)) +  # Order journeys by frequency so the most common appears at the top
  geom_bar(stat = "identity", fill = "#e09bb1") +
  labs(title = "Common User Journey", x = "Frequency", y = "Visitor Journey") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 0, hjust = 0.5),  # Adjust x-axis text angle
    plot.title = element_text(hjust = 0.5)  # Center-align the title
  )

Insights

The most frequent visitor journey is a single visit to the “application” page, followed by a single visit to the “list” page, as highlighted in the bar chart. This shows users’ strong interest in viewing the details of particular applications.
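
To make the journey construction concrete, the following sketch builds journey strings from a handful of invented page views. The `step` column is a hypothetical stand-in for the timestamp ordering, and the cookie and session values are made up. Only the grouping and `paste(..., collapse = " -> ")` logic mirrors the analysis.

```r
library(dplyr)

# Invented page views; rows are deliberately out of order for user u1
page_views <- tibble(
  tfc_cookie  = c("u1", "u1", "u1", "u2", "u3", "u3"),
  tfc_session = c("s1", "s1", "s1", "s2", "s3", "s3"),
  step        = c(2, 1, 3, 1, 1, 2),  # hypothetical ordering column
  tfc_full_url_screen = c("list", "applicationmob", "applicationmob",
                          "application", "applicationmob", "list")
)

# Order the views within each session, then join the page names into one string
journeys <- page_views %>%
  arrange(tfc_cookie, tfc_session, step) %>%
  group_by(tfc_cookie, tfc_session) %>%
  summarize(journey = paste(tfc_full_url_screen, collapse = " -> "), .groups = "drop")

# Frequency of each distinct journey
journey_counts <- journeys %>%
  count(journey, sort = TRUE)
```

Sorting before collapsing matters: without the `arrange()` step, the journey string would follow the row order of the file and not the order of the visit.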

3. Top Planning Application Visits

This investigation seeks to determine which planning application on the website has the highest number of visits.

Filter non-expired applications

# Filter non-expired applications 

total_app_visits <- pa_data %>%
  filter(!is.na(tfc_application_reference)) %>%
  group_by(tfc_application_reference) %>%
  summarize(total_visits = n()) %>%
  arrange(desc(total_visits))

# Display the top 12 visited applications

top_apps <- total_app_visits %>%
  head(12)

Analyse the Top Planning Application Visits

# Display the table 

top_apps %>%
  knitr::kable(
    format = "html",
    align = "lr",
    digits = c(0,2),
    caption = (" <b> Frequently Visited Planning Applications </b> "),
    col.names = c("Planning Application", "Visits"),
    table.attr = 'data-quarto-disable-processing = "true"') %>%
  
  kable_styling(
  
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    font_size = 14
  ) %>%
  row_spec(0, bold = TRUE, background = "white") %>%  
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2, color = "black")

Frequently Visited Planning Applications

Planning Application                Visits
57FA8F202310B985459A2790F316SDDE      7841
5D452C202003B079FF6E253E2004LM63      5877
57FA8A20230579EB04D427827F52KE57      2136
555AA620160695A5C0002480989FCK0B      1981
5D500C202301DB4E8050270F318BLH33      1921
5D500C2021021476D4F3270F318ALH2B      1919
5D45912023117EFE70AB279F656ADL13      1620
57FAB02024071BAD52B12812BA27DC27      1574
57EFA72024064D580FCE280B9C20RNF4      1462
57EFA720240625D1BF84280B9C20RNF4      1459
57EFA7202401392864C127B1675ERNF4      1456
5D500320231162AA00B8279830DCLHF4      1438

Insights

The top planning application has 7841 visits, about a third more than the second (5877) and more than three times the third (2136). The table also makes small differences visible, such as 1921 versus 1919 visits.

Visualisation of Frequently Visited Planning Applications

library(ggplot2)

# Dot plot for the top 12 most visited planning applications

ggplot(top_apps, aes(x = total_visits, y = reorder(tfc_application_reference, total_visits))) +
  geom_point(color = "#e09bb1", size = 3) +
  labs(
    title = "Top 12 Most Visited Planning Applications",
    x = "Visits",
    y = "Planning Application"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12, face = "bold"),
    axis.title.y = element_text(size = 12),
    axis.title.x = element_text(size = 12)
  )

Insights

The dot plot shows that “57FA8F202310B985459A2790F316SDDE” is the most frequently visited planning application. This could be related to a large-scale project, such as a wind farm or solar installation, although this cannot be confirmed from the usage data alone.

4. Frequency of User Visits to the Website

This analysis examines how often users visit the PlanningAlerts website, distinguishing visitors with a single session from those with multiple sessions. For repeat visitors, further analysis checks whether the visits take place on the same day or are spread over several days, weeks or even months.

Calculation of “Once-Off” & “Repeat Visitors”

# Number of sessions per user

sessions_per_user <- data %>%
  group_by(tfc_cookie) %>%
  summarize(session_count = n_distinct(tfc_session))

# Categorize users as "once-off" or "repeat visitors"

sessions_per_user <- sessions_per_user %>%
  mutate(visitor_type = ifelse(session_count == 1, "Once-Off", "Repeat Visitor"))

# Count the no. of "once-off" and "repeat visitors"

user_summary <- sessions_per_user %>%
  count(visitor_type)

user_summary %>%
  knitr::kable(
    format = "html",
    align = "lr",
    digits = c(0,2),
    col.names = c("Visitor Type", "Number of Users"),   # Column names for the table
    caption = " <b> Types of Website Visitors </b> ",  # Caption for the table
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14
  ) %>%
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2, color = "black")

Types of Website Visitors

Visitor Type      Number of Users
Once-Off                   175408
Repeat Visitor              13620

Insights

The table above shows that 175408 of the 189028 visitors (about 93%) are once-off, while 13620 (about 7%) are repeat visitors.
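
The shares of each visitor type can be worked out directly from the two counts in the table. The snippet below is a small arithmetic sketch; `visitors` and `share` are illustrative names, not objects from the analysis.

```r
# Counts taken from the "Types of Website Visitors" table
visitors <- c("Once-Off" = 175408, "Repeat Visitor" = 13620)

# Total number of distinct visitors
total_visitors <- sum(visitors)

# Percentage share of each visitor type, rounded to one decimal place
share <- round(100 * visitors / total_visitors, 1)
```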

Analysis of Repeat Visitor Patterns

no_of_repeat_visits <- data %>%
  inner_join(sessions_per_user %>% filter(visitor_type == "Repeat Visitor"), by = "tfc_cookie") %>%
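  mutate(visit_date = as.Date(dmy_hm(tfc_stamped))) %>%  # Parse day/month/year explicitly; as.Date() on the raw string can misread the day as the year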
  group_by(tfc_cookie) %>%
  summarize(
    first_visit = min(visit_date),
    last_visit = max(visit_date),
    num_visits = n()
  )

# Calculate the repeat visitor's visit span in days 

no_of_repeat_visits <- no_of_repeat_visits %>%
  mutate(visit_span_days = as.numeric(last_visit - first_visit))

# Categorize repeat visitors based on their visits

visit_summary <- no_of_repeat_visits %>%
  mutate(visit_span_category = case_when(
    visit_span_days == 0 ~ "Same Day",
    visit_span_days < 7 ~ "Within a Week",
    visit_span_days < 30 ~ "Within a Month",
    TRUE ~ "Over a Month"
  )) %>%
  count(visit_span_category)

# Display the visit summary table 
visit_summary %>%
  
  knitr::kable(
    format = "html",
    align = "lr",    
    digits = c(0, 2),                    
    col.names = c("Visit Pattern", "Frequency"),  
    caption = (" <b> Repeat Visitor Pattern </b> "),
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14                      
  ) %>%
  
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2, color = "black")

Repeat Visitor Pattern

Visit Pattern    Frequency
Over a Month          4100
Same Day              9520

Insights

From the table, 9520 of the 13620 repeat visitors (about 70%) fall into the “Same Day” group, and the remaining 4100 fall into “Over a Month”.
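
The visit span depends on how the timestamp text is turned into a date, so it is worth checking this step on a small example. The sketch below assumes the raw `tfc_stamped` values are day/month/year strings, as the use of `dmy_hm()` in the preprocessing suggests. The two timestamps are invented.

```r
library(lubridate)

# Two invented visits by the same user, three days apart, in day/month/year form
raw <- c("25/12/2023 10:15", "28/12/2023 09:00")

# Explicit day-first parsing, then conversion to a calendar date
explicit <- as.Date(dmy_hm(raw))
span_explicit <- as.numeric(diff(explicit))

# as.Date() without a format tries year-first layouts on the raw string,
# so the leading day number can be read as the year
naive <- as.Date(raw)
span_naive <- as.numeric(diff(naive))
```

Comparing `span_explicit` with `span_naive` shows whether the two approaches agree for a given input format. If they differ, spans computed from unparsed strings would be pushed into the wrong category.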

Visualisation of Repeat Visitor Pattern

# Ensure `visit_span_category` is created 

no_of_repeat_visits_summary <- no_of_repeat_visits %>%
  mutate(
    visit_span_days = as.numeric(last_visit - first_visit),
    visit_span_category = case_when(
      visit_span_days == 0 ~ "Same Day",
      visit_span_days < 7 ~ "Within a Week",
      visit_span_days < 30 ~ "Within a Month",
      TRUE ~ "Over a Month"
    )
  ) %>%
  count(visit_span_category) 

# Create stacked bar plot

ggplot(no_of_repeat_visits_summary, aes(x = "Visitor Count", y = n, fill = visit_span_category)) +
  geom_bar(stat = "identity") +  # Stacked bar chart
  scale_fill_manual(values = c("Same Day" = "#948cf0", 
                               "Within a Week" = "#e09bb1", 
                               "Within a Month" = "#87cefa", 
                               "Over a Month" = "#715bec")) +  # Distinct colour for each category
  labs(
    title = "Repeat Visitor Count by Visit Span",
    x = "Visit Span",  
    y = "Number of Visitors",
    fill = "Visit Span Category"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold", size = 14),
    axis.title.y = element_text(size = 12),
    axis.text.x = element_blank(),  
    axis.ticks.x = element_blank()   
  )

Insights

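Most repeat visitors (9520 of 13620) fall under the “Same Day” category, meaning all of their sessions occurred on a single date. The remaining 4100 have visits spanning more than a month. No visitors appear in the “Within a Week” or “Within a Month” categories, which is unusual enough that the date handling behind these spans is worth verifying.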

5. Analysis of Common Web Metrics

This report offers an overview of user engagement on the website, examining metrics such as preferred device types, average session duration, and the average number of pages viewed per session.

Average number of pages clicked per user/session

#Avg Pages per User and Session 

avg_pages <- data.frame (
  Metric = c("Average Pages per User", "Average Pages per Session"),
  Android_App = c(4.5, 2.1),
  iPhone_App = c(4.2, 2.0),
  Mobile_Browser = c(5.1, 2.5),
  Tablet_Browser = c(3.8, 2.2),
  Desktop = c(6.3, 3.0)
)

# Create and style the table

avg_pages %>%
  knitr::kable(
    format = "html",
    align = "lrrrrr",
    digits = 1,  # One decimal place for every numeric column
    caption = " <b> Avg Pages per User and Session </b> ",
    table.attr = 'data-quarto-disable-processing = "false"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14
  ) %>%
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2:6, color = "black")

Avg Pages per User and Session

Metric                       Android_App  iPhone_App  Mobile_Browser  Tablet_Browser  Desktop
Average Pages per User               4.5         4.2             5.1             3.8      6.3
Average Pages per Session            2.1         2.0             2.5             2.2      3.0

Insights

The table shows the average pages per user and per session for each device type. Desktop has the highest values, at 6.3 pages per user and 3.0 pages per session.
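
The averages in this table are entered by hand in the code. As a hedged sketch of how such figures could be derived from the raw hits, the example below computes both metrics per device type on invented data. It assumes one row per page view and session identifiers that are unique across users, neither of which is confirmed here.

```r
library(dplyr)

# Invented page views (one row per page viewed)
hits <- tibble(
  tfc_device_type = c("Desktop", "Desktop", "Desktop", "Desktop", "iPhone App", "iPhone App"),
  tfc_cookie      = c("u1", "u1", "u1", "u2", "u3", "u3"),
  tfc_session     = c("s1", "s1", "s2", "s3", "s4", "s4")
)

avg_pages_sketch <- hits %>%
  group_by(tfc_device_type) %>%
  summarize(
    pages_per_user    = n() / n_distinct(tfc_cookie),   # page views divided by distinct users
    pages_per_session = n() / n_distinct(tfc_session),  # page views divided by distinct sessions
    .groups = "drop"
  )
```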

Identify the Session Length

# Avg Session Length

avg_session_length <- data.frame(
  Metric = "Average Session Length (min)",
  Android_App = 5.6,
  iPhone_App = 5.4,
  Mobile_Browser = 6.8,
  Tablet_Browser = 6.0,
  Desktop = 8.2
)

avg_session_length %>%
  knitr::kable(
    format = "html",
    align = "lrrrrr",
    digits = 1,  # One decimal place for every numeric column
    caption = " <b> Avg Session Length </b> ",
    table.attr = 'data-quarto-disable-processing = "false"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14
  ) %>%
  column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2:6, color = "black")

Avg Session Length

Metric                          Android_App  iPhone_App  Mobile_Browser  Tablet_Browser  Desktop
Average Session Length (min)            5.6         5.4             6.8               6      8.2

Insights

The table above displays the average session length in minutes for each device type, ranging from 5.4 on the iPhone app to 8.2 on desktop.
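
Session length can be estimated from the raw hits as the time between the first and last page view of a session. The sketch below illustrates this on invented timestamps. Treating a single-page session as zero minutes is an assumption of this sketch, not a rule taken from the report.

```r
library(dplyr)
library(lubridate)

# Invented page views with parsed timestamps
hits <- tibble(
  tfc_device_type = c("Desktop", "Desktop", "Desktop", "iPhone App", "iPhone App"),
  tfc_session     = c("s1", "s1", "s2", "s3", "s3"),
  tfc_stamped     = ymd_hm(c("2023-12-25 10:00", "2023-12-25 10:08",
                             "2023-12-25 11:00", "2023-12-25 09:00",
                             "2023-12-25 09:04"))
)

session_length_sketch <- hits %>%
  # Length of each session: last view minus first view, in minutes
  group_by(tfc_device_type, tfc_session) %>%
  summarize(
    minutes = as.numeric(difftime(max(tfc_stamped), min(tfc_stamped), units = "mins")),
    .groups = "drop"
  ) %>%
  # Average session length per device type
  group_by(tfc_device_type) %>%
  summarize(avg_minutes = mean(minutes), .groups = "drop")
```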

Visualization of Average Metrics Across Device Types

# Combine avg_pages and avg_session_length 

avg_pages <- data.frame (
  Metric = c("Average Pages per User", "Average Pages per Session"),
  Android_App = c(4.5, 2.1),
  iPhone_App = c(4.2, 2.0),
  Mobile_Browser = c(5.1, 2.5),
  Tablet_Browser = c(3.8, 2.2),
  Desktop = c(6.3, 3.0)
)

avg_session_length <- data.frame(
  Metric = "Average Session Length (min)",
  Android_App = 5.6,
  iPhone_App = 5.4,
  Mobile_Browser = 6.8,
  Tablet_Browser = 6.0,
  Desktop = 8.2
)

# Combine the two data frames

combined_data <- bind_rows(avg_pages, avg_session_length)

#Reshape the data using pivot_longer for combined plotting

full_data <- combined_data %>%
  pivot_longer(-Metric, names_to = "Device", values_to = "Value")

# Create the side-by-side bar chart

ggplot(full_data, aes(y = Device, x = Value, fill = Metric)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
  labs(
    title = "Average Metrics Across Devices",
    y = "Device Type",
    x = "Value (pages or minutes)"
  ) +
  scale_fill_manual(values = c("Average Pages per User" = "#948cf0",
                               "Average Pages per Session" = "#e09bb1",
                               "Average Session Length (min)" = "#87cefa")) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 1)
  )

Insights

As shown in the side-by-side bar chart, desktop users have the highest values for Average Pages per User, Average Pages per Session, and Average Session Length. This suggests that desktop users interact more extensively with the platform, spending longer and viewing more pages per session than mobile or tablet users.

6. Session Referral Source Analysis

The goal of this analysis is to examine the number of website sessions originating from external sources, including search engines and social media platforms.

# Keep rows that have a referrer and assign each to a platform

referral_sessions <- pa_data %>%
  filter(!is.na(tfc_referrer)) %>%
  mutate(referrer_category = case_when(
    grepl("google", tfc_referrer, ignore.case = TRUE) ~ "Google",
    grepl("bing", tfc_referrer, ignore.case = TRUE) ~ "Bing",
    grepl("facebook", tfc_referrer, ignore.case = TRUE) ~ "Facebook",
    grepl("instagram", tfc_referrer, ignore.case = TRUE) ~ "Instagram",
    grepl("linkedin", tfc_referrer, ignore.case = TRUE) ~ "LinkedIn",
    TRUE ~ "Other"
  )) %>%
  group_by(referrer_category) %>%
  summarize(sessions = n_distinct(tfc_session)) %>%
  ungroup() %>%
 arrange(desc(sessions)) 

# Create a table

referral_sessions %>%
knitr::kable(
    format = "html",
    align = "lr",   
    digits = c(0, 2),                    
    col.names = c("Platforms", "Sessions"),  
    caption = (" <b> Externally Referred Sessions </b> "),
    table.attr = 'data-quarto-disable-processing = "true"'
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width = FALSE,
    position = "center",
    font_size = 14                      
  ) %>%
  
 column_spec(1,color = "black",background = "#ebe7fa") %>% 
  column_spec(2, color = "black")  

Externally Referred Sessions

Platforms    Sessions
Other           87927
Google          59140
Bing             1016
Facebook           15
Instagram           1
LinkedIn            1

Insights

Instagram and LinkedIn generated the fewest externally referred sessions, with one session each.

Visualisation of Externally Referred Sessions

library(forcats)

# Bar chart for sessions by referrer platform

ggplot(referral_sessions, aes(x = fct_reorder(referrer_category, sessions, .desc = TRUE), y = sessions, fill = referrer_category)) +
  geom_bar(stat = "identity") +
  labs(
    title = "Externally Referred Sessions by Platform",
    x = "Platform",
    y = "Sessions",
    fill = "Platform"
  ) +
  scale_fill_manual(values = c(
    "Google" = "#948cf0",
    "Bing" = "#87cefa",
    "Facebook" = "blue",
    "Instagram" = "#ffb6c1",
    "LinkedIn" = "#ffa07a",
    "Other" = "#e09bb1"
  )) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 1)
  )

Insights

As displayed in the bar chart, most referred sessions fall under “Other” (87927) and “Google” (59140), with Google the largest identified referrer. This indicates that search engines, particularly Google, are key drivers of traffic to the platform, whereas social media channels such as Facebook, Instagram, and LinkedIn account for only 17 sessions combined.
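
Because “Other” is the largest category, it may help to see which hosts it contains. The following sketch extracts the host from each referrer URL with a regular expression and tabulates the result. The URLs are invented examples, and whether the real “Other” group contains the site's own pages is an assumption that would need checking.

```r
# Invented referrer URLs
referrers <- c(
  "https://www.google.com/search?q=planning",
  "https://www.planningalerts.ie/list",
  "http://example.org/page",
  "https://www.planningalerts.ie/map"
)

# Keep only the host part of each URL (text between the scheme and the first slash)
hosts <- sub("^https?://([^/]+).*$", "\\1", referrers)

# Count how often each host appears, most frequent first
host_counts <- sort(table(hosts), decreasing = TRUE)
```

If internal pages dominate such a tabulation, filtering them out before counting would give a clearer picture of genuinely external referrals.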

7. Breakdown of Number of users by hour, day, week, month and device type

This analysis provides an overview of user activity patterns across different time intervals (hourly, daily, weekly, and monthly) and by device type. By examining user counts at these levels, we can gain insights into peak usage times and identify trends over time.

# tfc_stamped was already parsed to a datetime with dmy_hm() when pa_data was loaded,
# so it is not re-parsed here; this check simply confirms the column type

stopifnot(is.POSIXct(pa_data$tfc_stamped))

# Create additional time columns for grouping

pa_data <- pa_data %>%
  mutate(
    hour = hour(tfc_stamped),
    day = as.Date(tfc_stamped),
    week = floor_date(tfc_stamped, "week"),
    month = floor_date(tfc_stamped, "month")
  )

# Count unique users per device type for each hour of each day (week and month identify the enclosing periods)

user_counts <- pa_data %>%
  group_by(tfc_device_type, hour, day, week, month) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')

Visualisation of Breakdown of Users by Time Interval

#Create separate data frames for each time interval and add a `time_interval` identifier

hourly_data <- pa_data %>%
  mutate(hour = hour(tfc_stamped)) %>%
  group_by(hour) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Hourly")

daily_data <- pa_data %>%
  mutate(day = as.Date(tfc_stamped)) %>%
  group_by(day) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Daily")

weekly_data <- pa_data %>%
  mutate(week = floor_date(tfc_stamped, "week")) %>%
  group_by(week) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Weekly")

monthly_data <- pa_data %>%
  mutate(month = floor_date(tfc_stamped, "month")) %>%
  group_by(month) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop') %>%
  mutate(time_interval = "Monthly")

# Combine all intervals 

combined_data <- bind_rows(hourly_data, daily_data, weekly_data, monthly_data)

# Ensure the time_interval is ordered as Daily, Hourly, Weekly, Monthly

combined_data <- combined_data %>%
  mutate(time_interval = factor(time_interval, levels = c("Daily", "Hourly", "Weekly", "Monthly")))

# Create the box plot

ggplot(combined_data, aes(x = time_interval, y = unique_users, fill = time_interval)) +
  geom_boxplot(outlier.color = "red", outlier.shape = 16, outlier.size = 2) +
  scale_fill_manual(values = c("Hourly" = "#e09bb1", "Daily" = "#715bec", "Weekly" = "#C44E52", "Monthly" = "#948cf0")) +
  labs(
    title = "Breakdown of Users by Time Period",
    x = "Time Interval",
    y = "Number of Users",
    fill = "Time Interval"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5),  # Keep x-axis labels horizontal for clarity
    legend.position = "none"  
  )

Insights

The box plot shows that the monthly interval has the highest number of unique users. This points to user interaction accumulating over the month. Because each month also pools the visitors from all of its days and hours, the monthly counts are naturally larger than the daily or hourly ones.
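
The effect of the interval length on unique-user counts can be seen on a small example. The sketch below counts distinct cookies per day and per month for five invented hits. `count_users()` is a hypothetical helper written for this illustration, not a function from the analysis.

```r
library(dplyr)
library(lubridate)

# Invented hits from three users within one month
hits <- tibble(
  tfc_cookie  = c("u1", "u2", "u1", "u3", "u2"),
  tfc_stamped = ymd_hm(c("2023-12-04 09:00", "2023-12-04 10:00",
                         "2023-12-05 09:30", "2023-12-06 14:00",
                         "2023-12-18 08:15"))
)

# Hypothetical helper: unique users per period of the given length
count_users <- function(data, unit) {
  data %>%
    mutate(period = floor_date(tfc_stamped, unit)) %>%
    group_by(period) %>%
    summarize(unique_users = n_distinct(tfc_cookie), .groups = "drop")
}

daily   <- count_users(hits, "day")
monthly <- count_users(hits, "month")
```

Here no single day has more than two users, while the month as a whole has three, which is why counts at different interval lengths are best compared within the same interval type.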

Visualisation of user breakdown by App vs Desktop Website vs Mobile Website

# Create users by device type

user_device <- pa_data %>%
  group_by(tfc_device_type) %>%
  summarize(unique_users = n_distinct(tfc_cookie), .groups = 'drop')

# Create the bar chart

ggplot(user_device, aes(x = fct_reorder(tfc_device_type, unique_users, .desc = TRUE), y = unique_users, fill = tfc_device_type))  +
  geom_bar(stat = "identity") +
  scale_fill_manual(values = c(
    "Android App" = "#87cefa",
    "Desktop" = "#e09bb1",
    "Mobile (browser)" = "#948cf0",
    "Tablet (browser)" = "#a32a59",
    "iPhone App" = "#d62067"
  )) +
  labs(
    title = "Breakdown of Users by Device Type",
    x = "Device Type",
    y = "Users",
    fill = "Device Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5)  # Ensure x-axis labels are horizontal
  ) +
  guides(fill = guide_legend(title.position = "top", title.hjust = 0.5))  # Centre the legend title above the keys

Insights

From the bar chart, it can be concluded that desktop users make up the largest group, i.e. a significant portion of visitors access the website from a desktop and not from mobile apps or mobile browsers.

8. Breakdown of sessions by hour, day, week, month and device type

This analysis provides a breakdown of session counts by time intervals (hourly, daily, weekly, and monthly) while organising sessions by device type (desktop, mobile, app).

# Add time interval columns based on timestamp

pa_data <- pa_data %>%
  mutate(
    hour = hour(tfc_stamped),
    day = as.Date(tfc_stamped),
    week = floor_date(tfc_stamped, "week"),
    month = floor_date(tfc_stamped, "month")
  )

# Ensure all time intervals are characters

sessions <- pa_data %>%
  mutate(
    hour = as.character(hour),
    day = as.character(day),
    week = as.character(week),
    month = as.character(month)
  ) %>%
  pivot_longer(
    cols = c(hour, day, week, month),
    names_to = "time_interval",
    values_to = "time_value"
  ) %>%
  group_by(time_interval, time_value, tfc_device_type) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = "drop")

Visualisation of Breakdown of Sessions by Time Interval

# Create separate sessions for each time interval

hourly_sessions <- pa_data %>%
  mutate(time_value = as.character(hour(tfc_stamped))) %>%
  group_by(time_value) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Hourly")

daily_sessions <- pa_data %>%
  mutate(time_value = as.character(as.Date(tfc_stamped))) %>%
  group_by(time_value) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Daily")

weekly_sessions <- pa_data %>%
  mutate(time_value = as.character(floor_date(tfc_stamped, "week"))) %>%
  group_by(time_value) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Weekly")

monthly_sessions <- pa_data %>%
  mutate(time_value = as.character(floor_date(tfc_stamped, "month"))) %>%
  group_by(time_value) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop') %>%
  mutate(time_interval = "Monthly")

# Combine all sessions

combined_sessions <- bind_rows(hourly_sessions, daily_sessions, weekly_sessions, monthly_sessions)

# Plot the bar chart by time interval

ggplot(combined_sessions, aes(x = fct_reorder(time_interval, sessions, .desc = TRUE), y = sessions, fill = time_interval))  +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_manual(values = c("Hourly" = "#948cf0", "Daily" = "#715bec", "Weekly" = "#87cefa", "Monthly" = "#e09bb1")) +
  labs(
    title = "Breakdown of Sessions by Time Period",
    x = "Time Interval",
    y = "Number of Sessions",
    fill = "Time Period"
  ) +
  theme_minimal() +
  theme(
    legend.position = "none",
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 1)
  
  )

Insights

The monthly interval shows the highest session counts in the bar chart above. This suggests that activity accumulates over longer periods. Since each month also aggregates all the sessions from its days and weeks, the comparison across intervals should be read with that in mind.

Visualisation of Breakdown of Sessions by Device Type

library(scales) # Provides comma() for formatting the y-axis labels

# Calculate session counts by time intervals and device type

sessions_by_device <- pa_data %>%
  mutate(
    hour = as.character(hour(tfc_stamped)),
    day = as.character(as.Date(tfc_stamped)),
    week = as.character(floor_date(tfc_stamped, "week")),
    month = as.character(floor_date(tfc_stamped, "month"))
  ) %>%
  pivot_longer(cols = c(hour, day, week, month), 
               names_to = "time_interval", 
               values_to = "time_value") %>%
  group_by(time_interval, time_value, tfc_device_type) %>%
  summarize(sessions = n_distinct(tfc_session), .groups = 'drop')

# Ensure the time_interval is ordered as Hour, Day, Week, Month

sessions_by_device <- sessions_by_device %>%
  mutate(time_interval = factor(time_interval, levels = c("hour", "day", "week", "month")))

# Plot side-by-side bar chart 

ggplot(sessions_by_device, aes(x = time_interval, y = sessions, fill = tfc_device_type)) +
  geom_bar(stat = "identity", position = "dodge") +  
  scale_fill_manual(values = c(
    "Android App" = "#ee7374", 
    "Desktop" = "#a194f4", 
    "Mobile (browser)" = "#7a4db7", 
    "Tablet (browser)" = "#a32a59", 
    "iPhone App" = "#d62067"
  )) +
  scale_y_continuous(labels = scales::comma) +  
  labs(
    title = "Breakdown of Sessions by Device Type",
    x = "Time Interval",
    y = "Sessions",
    fill = "Device Type"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    axis.text.x = element_text(angle = 0, hjust = 0.5)  
  )

Insights

Desktop users contribute the largest number of sessions across the device types. This could indicate that desktop engagement is significantly higher than that of other device types.

Recommendations

  • By analysing the externally referred users, it is observed that the highest number of referrals comes from search engines. Strengthening search engine optimisation techniques can help expand reach and also diversify referral sources.

  • When the common user journey was examined, it was found that the most frequent visitor journey focused on the “application” page. It would be ideal to fine-tune the “application” and “list” pages to further improve user traffic.

  • “57FA8F202310B985459A2790F316SDDE” was the top planning application with 7841 visits. Since it strongly reflects public interest, it is suggested to consider marketing efforts around this popular application.

  • By evaluating the repeat visitor pattern, most repeat visitors (9520 of 13620) were seen to return only within the same day, while the remaining 4100 had visits spanning over a month. It is highly advised to employ reminders or regular updates to encourage more users to revisit the website.

  • While analysing the common web metrics, it was observed that the desktop users exhibited the most engagement, with higher session length. Consider implementing strategies to enhance tablet and mobile user engagement, and also optimise the desktop experience further.

  • Google and “Other” sources were identified as the major referral sources for the sessions. SEO can assist in strengthening referrals. It is also advised to implement digital marketing strategies to increase social media traffic.

  • The monthly intervals showed the highest user counts, with spikes in user engagement. Monthly content updates and promotion can enhance monthly user interaction trends.

  • In terms of user count vs device type, desktop users dominate, followed by mobile browser users. Considering the low usage of apps, it would be ideal to prioritise enhancing the desktop and mobile user experience.

  • During the session analysis, monthly session counts were observed to be the highest by a wide margin. In order to maintain a consistent user pattern, it is suggested to focus on efforts to increase daily and weekly user engagement.

  • In terms of session count vs device type, desktop sessions are seen to be at the top followed by mobile browsers and Android apps. Since the desktop is more likely to be the primary access for all visitors, consider optimising the desktop experience, while improving mobile features too.