It is recommended that Zappy Internet further investigate the viability of consistent internet packages outlined below.
The data analysis has found gaps in the market that could allow Zappy to gain market share from competitors who offer inferior products.
Aiming to enter the internet service provider market, Zappy Internet (client) is conducting due diligence before offering a premium and value package in NSW, QLD and VIC based on internet speed.
Assumptions that have material effect on this report include:
On initial investigation, the data appears complete and usable without excessive manipulation and sorting.
Chart 1 gives an indication that there is a wide range of speeds for most services offered to the Australian public. For the client, a wide distribution may infer that there is potential to position within some of these market segments.
# Load packages
library(tidyverse)
library(ggplot2)
library(dplyr)
library(plotly)
library(ggpubr)
library(gridExtra)
theme_set(theme_pubr())
# Read in the dataset
broadband = read.csv("broadband.csv")
#head(broadband, 5)
# create cols for df from main xls
a <- broadband$technology
b <- broadband$tier
c <- broadband$rsp
d <- broadband$state_or_territory
e <- broadband$All.hour.trimmed.mean.download.speed
f <- broadband$All.hour.minimum.download.speed
g <- broadband$All.hour.maximum.download.speed
# combination of cols to form df
df <- data.frame(a,b,c,d,e,f,g)
#rename the col names for usability
names(df) <- c('connection', 'speed', 'ISP', 'state', 'trimmed_mean', 'mean_min', 'mean_max')
# create a col with the range (max - min)
rng <- df$mean_max - df$mean_min
# add to df
df$rng <- rng
#remove rows with the field -Fixed Wireless as this is not needed
new_df <- subset(df, connection != 'Fixed Wireless')
#remove rows with the field -500-990/50Mbps as this is not needed
new_df1 <- subset(new_df, speed != '500-990/50 Mbps')
#reorder the cols so state is first col
new_df1 <- new_df1[, c("state","connection","speed", "ISP", "trimmed_mean", "mean_max", "mean_min", "rng")]
#####################
## All State Data ##
#####################
std_df <- new_df1[with(new_df1, order(state, rng)),]
# Select the std_df data, and focus on range (quantitative) on y axis and State (qualitative) on x axis
p = ggplot(data=std_df, aes(x=factor(state),y=rng))
t <- ggplot(std_df, aes(trimmed_mean, rng)) + geom_point()
#Plot facets with cor line
t + facet_wrap(vars(speed), scales = "free") +
labs(x = "Trimmed Max Mean Speed in Mbps", y = "Max-Min Speed Range in Mbps",
title ="Australian ISP Speeds",
subtitle = "Mean Speed Range vs Trimmed Speed",
caption = "Chart 1",) +
theme_bw() +
theme(text=element_text(size = 10)) +
stat_smooth(method = "lm",
formula = y ~ x,
geom = "smooth")
The two higher speed (250 Mbps) services in the NSW market illustrates the arbitrage prospect for Zappy Internet. While the 250/25 service is positioned towards the y axis with its range at just over 30% of download speed. The 250/100 service’s range pushes it to be an outlier. The potential is in offering a similar package with a reduced range in mean speed. For a budget product the 100 Mbps offerings have a proportionally larger range than the 250/25 Mbps currently on offer in NSW. This would mean that a 100 Mbps connection with a lower range should be achievable and a better product for consumers.
#####################
## Indiv State Data##
#####################
# NSW
# get only NSW rows for analysis
nsw_df <- subset(std_df, state == 'NSW')
meanPrice2 <- nsw_df %>% group_by(speed) %>% summarise(meanRange = mean(rng))
meanPrice3 <- nsw_df %>% group_by(speed) %>% summarise(s_dev = sd(rng, na.rm = FALSE))
NSW_sd = merge(x = meanPrice2, y = meanPrice3, by = 'speed')
rownames(NSW_sd) <- NSW_sd$speed
NSW_sd$speed <- NULL
# get the download speed as a number by removing the text after the speed
nsw_df$dwld <- (gsub("\\/.*","",nsw_df$speed))
nsw_df$dwld <- as.numeric(nsw_df$dwld)
pct_dl <- (nsw_df$rng/nsw_df$dwld)*100
nsw_df$pct_dl <- pct_dl
#head(nsw_df, 5)
mean_pct <- nsw_df %>% group_by(speed) %>% summarise(meanpct = mean(pct_dl))
#head(mean_pct, 5)
nsw_mns = merge(x = mean_pct, y = meanPrice2, by = "speed")
nsw_mns$state <- c("NSW", "NSW","NSW", "NSW", "NSW", "NSW", "NSW", "NSW", "NSW" )
The Queensland market would appear to be the best of the three states for opportunity. The wider distribution of the services on Chart 3 could be interpreted to be a sign of a less competitive market than the tighter distributions of NSW and VIC. No offering of a 250/100 service may be a potential market to explore. Otherwise, the 25/10 services have a large range as proportion of overall maximum speed. As a budget service, Zappy Internet could take a market share with a more consistent connection at this speed.
#####################
## Indiv State Data##
#####################
# QLD
qld_df <- subset(std_df, state == 'QLD')
#head(qld_df, 10)
meanPriceq <- qld_df %>% group_by(speed) %>% summarise(meanRange = mean(rng))
meanPriceq3 <- qld_df %>% group_by(speed) %>% summarise(s_dev = sd(rng, na.rm = FALSE))
QLD_sd = merge(x = meanPriceq, y = meanPriceq3, by = 'speed')
rownames(QLD_sd) <- QLD_sd$speed
QLD_sd$speed <- NULL
qld_df$dwld <- (gsub("\\/.*","",qld_df$speed))
qld_df$dwld <- as.numeric(qld_df$dwld)
pct_dl <- (qld_df$rng/qld_df$dwld)*100
qld_df$pct_dl <- pct_dl
#head(nsw_df, 5)
mean_pct <- qld_df %>% group_by(speed) %>% summarise(meanpct = mean(pct_dl))
#head(mean_pct, 5)
qld_mns = merge(x = mean_pct, y = meanPrice2, by = "speed")
qld_mns$state <- c("QLD", "QLD","QLD", "QLD", "QLD", "QLD", "QLD", "QLD" )
This market would be the hardest market to gain traction in based on the range of internet speeds. Once again, the 250/100 connection is an outlier. If further research indicates there is more market uptake for this speed this could be a chance to get a foothold in the Victorian market. A lower budget connection would be hard to differentiate on this data.
#####################
## Indiv State Data##
#####################
# VIC
vic_df <- subset(std_df, state == 'VIC')
#head(qld_df, 10)
meanPricev <- vic_df %>% group_by(speed) %>% summarise(meanRange = mean(rng))
meanPricev3 <- vic_df %>% group_by(speed) %>% summarise(s_dev = sd(rng, na.rm = FALSE))
vic_sd = merge(x = meanPricev, y = meanPricev3, by = 'speed')
rownames(vic_sd) <- vic_sd$speed
vic_sd$speed <- NULL
vic_df$dwld <- (gsub("\\/.*","",vic_df$speed))
vic_df$dwld <- as.numeric(vic_df$dwld)
pct_dl <- (vic_df$rng/vic_df$dwld)*100
vic_df$pct_dl <- pct_dl
#head(nsw_df, 5)
mean_pct <- vic_df %>% group_by(speed) %>% summarise(meanpct = mean(pct_dl))
vic_mns = merge(x = mean_pct, y = meanPrice2, by = "speed")
vic_mns$state <- c("VIC", "VIC","VIC", "VIC", "VIC", "VIC", "VIC", "VIC", "VIC" )
########################
## Data back together ##
########################
#recombine the state data
e_seab <- rbind(nsw_mns, qld_mns, vic_mns)
sb <- ggplot(e_seab, aes(meanpct, meanRange ))
colours <- c('#4AC6B7', '#1972A4', '#965F8A', '#FF7070', '#C61951')
# Make the 3d plot
fig <- plot_ly(e_seab, x = ~state, y = ~meanpct, z = ~meanRange, color = ~speed, colors = colours,
marker = list(symbol = 'circle', sizemode = 'diameter'), sizes = c(5, 150))
fig <- fig %>% add_markers()
fig <- fig %>% layout(scene = list(xaxis = list(title = 'Speed'),
yaxis = list(title = 'Range as a % of speed'),
zaxis = list(title = 'Range (Mbps')),
annotations = list(
x = 1.13,
y = 1.05,
text = 'State',
xref = 'paper',
yref = 'paper',
showarrow = FALSE
))
fig
Data Carpentry. (2022, May 16). Data visualisations with GGPlot2. Retrieved from Datacarpentry.org: https://datacarpentry.org/R-ecology-lesson/04-visualization-ggplot2.html#Arranging_plots
Freedman, D., Pisani, R., & Purves, R. (2007). Statistics - Fourth Edition. New York: W. W. Norton & Company, Inc.
Geeks for Geeks. (2021, Apr 28). Add a regression line to GGPlot2. Retrieved from Geeks for Geeks: https://www.geeksforgeeks.org/add-regression-line-to-ggplot2-plot-in-r/
GGPlot2. (n.d.). Text. Retrieved from GGPlot2: https://ggplot2.tidyverse.org/reference/geom_text.html?q=geom_label#geom-label-
Statology - Zach. (2021, Jan 27). Character to Numeric. Retrieved from Statology: https://www.statology.org/character-to-numeric-in-r/
Statology - Zach. (2021, Jan 27). Character to Numeric. Retrieved from Statology: https://www.statology.org/character-to-numeric-in-r/