Welcome to my Business Analytics project! I developed a relational database for a digital marketing company and utilized SQL to analyze and explore their revenue growth opportunities. The data is hypothetical, assuming that their clients are S&P 500 companies.
This digital marketing company provides three primary marketing services: SEO (Search Engine Optimization), PPC (Pay-Per-Click Google Ads), and Web Design. Each of these services is treated as an individual project, and a client has the flexibility to engage in up to three distinct projects, one for each service (SEO, PPC, Web Design).
My objective is to explore and analyze their clients and projects data, and ultimately identify opportunities for increasing revenue.
The script follows the below structure:
1:Create The Database
2: Import Data from CSV Files
3: Review Data
4: Data Analytics and Data Visualization
I appreciate you taking the time to check this out! If you prefer, feel free to skip ahead to the Data Analytics and Visualization section. Key points (Insights and Suggestions) are bolded for your convenience.
# R function to turn off warnings
options(warn=-1)
#load packages
library(sqldf)
library(ggplot2)
library(dplyr)
library(stringr)
library(forcats)
# Establish Database Connection
db <- dbConnect(SQLite(), dbname="digitalmktg_db.sqlite")
The “clients” table contains information about the clients of the digital marketing company. In this case, the clients are the S&P 500 companies.
sqldf("DROP TABLE IF EXISTS clients", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE 'clients'(
client_id varchar(20) PRIMARY KEY,
biz_name varchar(100) NOT NULL,
industry varchar(100) NOT NULL)
", dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
The “projects” table houses information about each project. Each client may have from one to three projects including SEO, PPC, and Web Design.
sqldf("DROP TABLE IF EXISTS projects", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE projects (
pro_id varchar(20) PRIMARY KEY,
pro_type varchar(100) NOT NULL,
pro_name varchar(100) NOT NULL,
client_id varchar(20) NOT NULL REFERENCES clients(client_id)
)",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
The “event” table stores information about the dates when clients signed up, expanded, or canceled services.
Clients have the option to expand the scope of their projects multiple times, such as increasing the number of pages for SEO optimization or raising the budget for PPC Google Ad Campaigns. There is no expansion or cancellation for web design projects since it is a one-time project, unlike SEO and PPC, which are monthly projects. For web design, clients may sign up for a website redesign or the creation of a new site.
In this case, we assume that there is no project resumed after cancellation
sqldf("DROP TABLE IF EXISTS events", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE events (
event_id varchar(20) PRIMARY KEY,
pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
client_id varchar(20) NOT NULL REFERENCES clients(client_id),
event_type varchar(100) NOT NULL,
date date NOT NULL
)",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
The “ppc” table contains information about the scope, monthly time budget and actual average time spent per month of each PPC project. The scope represents the management fee the digital marketing company charges to the client for overseeing their Google Ad Campaigns.
sqldf("DROP TABLE IF EXISTS ppc", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE ppc (
ppc_id varchar(20) PRIMARY KEY,
pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
client_id varchar(20) NOT NULL REFERENCES clients(client_id),
ppc_scope decimal(10,2) NOT NULL,
budget_time decimal(6,2) NOT NULL,
monthly_avg_spend decimal(6,2) NOT NULL
)",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
The “seo” table contains information about the scope, monthly revenue, monthly time budget and actual average time spent per month of each SEO project. The scope represents the number of SEO pages. The scope directly correlates to the revenue. The higher the scope leads to the higher revenue for the digital marketing company.
sqldf("DROP TABLE IF EXISTS seo", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE seo (
seo_id varchar(20) PRIMARY KEY,
pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
client_id varchar(20) NOT NULL REFERENCES clients(client_id),
seo_scope int NOT NULL,
seo_rev decimal(10,2) NOT NULL,
budget_time decimal(6,2) NOT NULL,
monthly_avg_spend decimal(6,2) NOT NULL
)",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
The “web” table contains information about the total revenue, total time budget and actual total time spent of each Web Design project. The scope for each Web Design project varies depending on the amount of work and hours required to build or redesign the website based on the client’s request and customization.
sqldf("DROP TABLE IF EXISTS web", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE web (
wd_id varchar(20) PRIMARY KEY,
pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
client_id varchar(20) NOT NULL REFERENCES clients(client_id),
wd_rev decimal(10,2) NOT NULL,
budget_time decimal(6,2) NOT NULL,
total_spend decimal(6,2) NOT NULL
)",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
# Set working directory
setwd("/Users/rosenguyen/Documents/dms")
dbWriteTable(db, "clients", read.csv("/Users/rosenguyen/Documents/dms/clients.csv"), append = TRUE)
dbWriteTable(db, "projects", read.csv("/Users/rosenguyen/Documents/dms/projects.csv"), append = TRUE)
dbWriteTable(db, "events", read.csv("/Users/rosenguyen/Documents/dms/events.csv"), append = TRUE)
dbWriteTable(db, "ppc", read.csv("/Users/rosenguyen/Documents/dms/ppc.csv"), append = TRUE)
dbWriteTable(db, "web", read.csv("/Users/rosenguyen/Documents/dms/web.csv"), append = TRUE)
dbWriteTable(db, "seo", read.csv("/Users/rosenguyen/Documents/dms/seo.csv"), append = TRUE)
sqldf("PRAGMA table_info(clients)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 client_id varchar(20) 0 NA 1
## 2 1 biz_name varchar(100) 1 NA 0
## 3 2 industry varchar(100) 1 NA 0
sqldf("SELECT * FROM clients LIMIT 5", dbname = "digitalmktg_db.sqlite")
## client_id biz_name industry
## 1 CNT389 Qualcomm Information Technology
## 2 CNT031 Ameriprise Financial Financials
## 3 CNT460 United Rentals Industrials
## 4 CNT161 Eaton Corporation Industrials
## 5 CNT068 BlackRock Financials
client_review <- sqldf("SELECT COUNT(client_id) AS num_clients, COUNT (DISTINCT industry) AS unique_num_industry
FROM clients
", dbname = "digitalmktg_db.sqlite")
client_review
## num_clients unique_num_industry
## 1 500 11
=> The digital marketing company has a total of 500 clients across 11 different industries.
sqldf("PRAGMA table_info(projects)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 pro_id varchar(20) 0 NA 1
## 2 1 pro_type varchar(100) 1 NA 0
## 3 2 pro_name varchar(100) 1 NA 0
## 4 3 client_id varchar(20) 1 NA 0
sqldf("SELECT * FROM projects LIMIT 5", dbname = "digitalmktg_db.sqlite")
## pro_id pro_type pro_name client_id
## 1 PRO0001 WD MMM-WD CNT001
## 2 PRO0002 SEO MMM-SEO CNT001
## 3 PRO0003 PPC MMM-PPC CNT001
## 4 PRO0004 WD AOS-WD CNT002
## 5 PRO0005 SEO AOS-SEO CNT002
sqldf("SELECT COUNT(pro_id) AS num_projects, COUNT(DISTINCT pro_type) AS num_project_type, COUNT(DISTINCT client_id) AS num_clients
FROM projects
", dbname = "digitalmktg_db.sqlite")
## num_projects num_project_type num_clients
## 1 839 3 500
=> The 500 clients collectively have 839 projects, categorized into three types: SEO, PPC, and Web Design.
sqldf("PRAGMA table_info(events)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 event_id varchar(20) 0 NA 1
## 2 1 pro_id varchar(20) 1 NA 0
## 3 2 client_id varchar(20) 1 NA 0
## 4 3 event_type varchar(100) 1 NA 0
## 5 4 date date 1 NA 0
sqldf("SELECT * FROM events LIMIT 5", dbname = "digitalmktg_db.sqlite")
## event_id pro_id client_id event_type date
## 1 E00001 PRO0001 CNT001 start 2014-08-09
## 2 E00002 PRO0002 CNT001 start 2019-10-17
## 3 E00003 PRO0002 CNT001 expansion 2021-04-17
## 4 E00004 PRO0002 CNT001 expansion 2021-09-17
## 5 E00005 PRO0002 CNT001 expansion 2022-11-17
sqldf("SELECT COUNT(event_id) AS num_event, COUNT(DISTINCT pro_id) AS num_projects, COUNT(DISTINCT event_type) AS num_event_type
FROM events
", dbname = "digitalmktg_db.sqlite")
## num_event num_projects num_event_type
## 1 2198 839 3
sqldf("SELECT DISTINCT event_type AS event_type_cat
FROM events
", dbname = "digitalmktg_db.sqlite")
## event_type_cat
## 1 start
## 2 expansion
## 3 cancel
sqldf("SELECT MIN(date) AS earliest_date, MAX(date) AS latest_date, (MAX(date)-MIN(date)) AS year_range
FROM events
", dbname = "digitalmktg_db.sqlite")
## earliest_date latest_date year_range
## 1 2012-01-10 2023-12-31 11
=> The 839 projects collectively have 2198 events, categorized into three types: start, expansion, and cancel. The date period spans from January 10, 2012 to December 31, 2023. The data set covers a 11-year period.
sqldf("PRAGMA table_info(ppc)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 ppc_id varchar(20) 0 NA 1
## 2 1 pro_id varchar(20) 1 NA 0
## 3 2 client_id varchar(20) 1 NA 0
## 4 3 ppc_scope decimal(10,2) 1 NA 0
## 5 4 budget_time decimal(6,2) 1 NA 0
## 6 5 monthly_avg_spend decimal(6,2) 1 NA 0
sqldf("SELECT * FROM ppc LIMIT 5", dbname = "digitalmktg_db.sqlite")
## ppc_id pro_id client_id ppc_scope budget_time monthly_avg_spend
## 1 PPC001 PRO0003 CNT001 10000 50 145
## 2 PPC002 PRO0006 CNT002 11000 55 60
## 3 PPC003 PRO0009 CNT003 12000 60 96
## 4 PPC004 PRO0033 CNT011 20000 100 129
## 5 PPC005 PRO0036 CNT012 21000 105 194
ppc_review <- sqldf("SELECT COUNT(ppc_id) AS num_projects, MIN(ppc_scope) AS smallest_scope, MAX(ppc_scope) AS biggest_scope, AVG(ppc_scope) AS average_scope, MAX(ppc_scope)- MIN(ppc_scope) AS scope_range
FROM ppc
", dbname = "digitalmktg_db.sqlite")
ppc_review
## num_projects smallest_scope biggest_scope average_scope scope_range
## 1 311 10000 35000 22450.16 25000
=> There are 311 PPC projects with a monthly scope ranging from $10,000 to $35,000. The average scope across these projects is $22,450, and the scope range is $25,000.
sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range
FROM ppc
", dbname = "digitalmktg_db.sqlite")
## lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1 50 175 112.2508 125
The budget time encompasses the total allocated time for the entire team, including a project manager, digital marketing specialists, web developers, and the copywriters assigned to that specific project.
=> The budget time for a PPC project varies from a minimum of 50 hours to a maximum of 175 hours, resulting in a range of 125 hours. On average, the budget time for a PPC project is approximately 112 hours.
sqldf("SELECT MIN(monthly_avg_spend) AS lowest_monthly_avg_spend, MAX(monthly_avg_spend) AS highest_monthly_avg_spend, AVG(monthly_avg_spend) AS average_monthly_avg_spend, MAX(monthly_avg_spend)- MIN(monthly_avg_spend) AS monthly_avg_spend_range
FROM ppc
", dbname = "digitalmktg_db.sqlite")
## lowest_monthly_avg_spend highest_monthly_avg_spend average_monthly_avg_spend
## 1 51 199 124.7685
## monthly_avg_spend_range
## 1 148
The monthly average spend represents the total time spent on average per month by the entire team assigned to that specific project.
=> The actual average time spent for a PPC project per month varies from a minimum of 51 hours to a maximum of 199 hours, resulting in a range of almost 125 hours. On average, a team actually spent approximately 148 hours on a PPC project .
sqldf("PRAGMA table_info(seo)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 seo_id varchar(20) 0 NA 1
## 2 1 pro_id varchar(20) 1 NA 0
## 3 2 client_id varchar(20) 1 NA 0
## 4 3 seo_scope INT 1 NA 0
## 5 4 seo_rev decimal(10,2) 1 NA 0
## 6 5 budget_time decimal(6,2) 1 NA 0
## 7 6 monthly_avg_spend decimal(6,2) 1 NA 0
sqldf("SELECT * FROM seo LIMIT 5", dbname = "digitalmktg_db.sqlite")
## seo_id pro_id client_id seo_scope seo_rev budget_time monthly_avg_spend
## 1 SEO001 PRO0002 CNT001 570 57000 285 275
## 2 SEO002 PRO0005 CNT002 980 98000 490 348
## 3 SEO003 PRO0011 CNT004 650 65000 325 469
## 4 SEO004 PRO0014 CNT005 930 93000 465 409
## 5 SEO005 PRO0017 CNT006 890 89000 445 259
seo_review <- sqldf("SELECT COUNT(seo_id) AS num_projects, MIN(seo_scope ) AS smallest_scope, MAX(seo_scope ) AS biggest_scope, AVG(seo_scope ) AS average_scope, MAX(seo_scope )- MIN(seo_scope ) AS scope_range
FROM seo
", dbname = "digitalmktg_db.sqlite")
seo_review
## num_projects smallest_scope biggest_scope average_scope scope_range
## 1 386 500 1000 746.6062 500
=> There are 386 SEO projects with the smallest scope at 500 SEO pages to the largest scope at 10,00 SEO pages. The average scope across these projects is approximately 746 SEO pages, and the scope range is 500 pages.
sqldf("SELECT COUNT(seo_id) AS num_projects, MIN(seo_rev) AS lowest_rev, MAX(seo_rev) AS highest_rev, AVG(seo_rev) AS average_rev, MAX(seo_rev)- MIN(seo_rev) AS rev_range
FROM seo
", dbname = "digitalmktg_db.sqlite")
## num_projects lowest_rev highest_rev average_rev rev_range
## 1 386 50000 100000 74660.62 50000
=> The revenue for a SEO project ranges from $50,000 to $100,000, creating a $50,000 span. The average revenue for an SEO project is around $74,661.
sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range
FROM seo
", dbname = "digitalmktg_db.sqlite")
## lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1 250 500 373.3031 250
=> The amount of time budgeted for a SEO project varies from a minimum of 250 hours to a maximum of 500 hours, resulting in a range of 250 hours. On average, the expected time spend for a SEO project is approximately 373 hours.
sqldf("SELECT MIN(monthly_avg_spend) AS lowest_monthly_avg_spend, MAX(monthly_avg_spend) AS highest_monthly_avg_spend, AVG(monthly_avg_spend) AS average_monthly_avg_spend, MAX(monthly_avg_spend)- MIN(monthly_avg_spend) AS monthly_avg_spend_range
FROM seo
", dbname = "digitalmktg_db.sqlite")
## lowest_monthly_avg_spend highest_monthly_avg_spend average_monthly_avg_spend
## 1 200 499 343.1218
## monthly_avg_spend_range
## 1 299
=> The actual average time spent for a SEO project per month varies from a minimum of 200 hours to a maximum of 499 hours, resulting in a range of 299 hours. On average, the average time spent a month for a SEO project is approximately 343 hours.
sqldf("PRAGMA table_info(web)", dbname = "digitalmktg_db.sqlite")
## cid name type notnull dflt_value pk
## 1 0 wd_id varchar(20) 0 NA 1
## 2 1 pro_id varchar(20) 1 NA 0
## 3 2 client_id varchar(20) 1 NA 0
## 4 3 wd_rev decimal(10,2) 1 NA 0
## 5 4 budget_time decimal(6,2) 1 NA 0
## 6 5 total_spend decimal(6,2) 1 NA 0
sqldf("SELECT * FROM web LIMIT 5", dbname = "digitalmktg_db.sqlite")
## wd_id pro_id client_id wd_rev budget_time total_spend
## 1 WD001 PRO0001 CNT001 80000 400 896.0
## 2 WD002 PRO0004 CNT002 15000 75 91.5
## 3 WD003 PRO0016 CNT006 65000 325 734.5
## 4 WD004 PRO0019 CNT007 20000 100 223.0
## 5 WD005 PRO0022 CNT008 60000 300 294.0
web_review <- sqldf("SELECT COUNT(wd_id) AS num_projects, MIN(wd_rev) AS lowest_rev, MAX(wd_rev) AS highest_rev, AVG(wd_rev) AS average_rev, MAX(wd_rev)- MIN(wd_rev) AS rev_range
FROM web
", dbname = "digitalmktg_db.sqlite")
web_review
## num_projects lowest_rev highest_rev average_rev rev_range
## 1 142 5000 95000 52042.25 90000
=> There are 142 Web Design projects with revenue ranging from $5,000 to $95,000, showing a substantial range of $90,000. On average, the revenue from a Web Design project for this digital marketing company is approximately $52,042.
sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range
FROM web
", dbname = "digitalmktg_db.sqlite")
## lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1 25 475 260.2113 450
Web Design is a one-time project and not a monthly recurring project. Consequently, the budget time for a Web Design project represents the total time allocated for the entire project, rather than being distributed monthly.
=> The budget time for a Web Design project varies from a minimum of 25 hours to a maximum of 475 hours, resulting in a range of 450 hours. On average, the budget time for a PPC project is approximately 260 hours.
sqldf("SELECT MIN(total_spend) AS lowest_total_spend, MAX(total_spend) AS highest_total_spend, AVG(total_spend) AS average_total_spend, MAX(total_spend)- MIN(total_spend) AS total_spend_range
FROM web
", dbname = "digitalmktg_db.sqlite")
## lowest_total_spend highest_total_spend average_total_spend total_spend_range
## 1 25.75 1173.25 402.419 1147.5
=> The actual total time spent on a Web Design project varies from a minimum of 25.75 hours to a maximum of 499 hours, resulting in a range of almost 1,148 hours. On average, the average time spent a month for a Web Design project is approximately 402 hours.
# Number of clients by industry
ind_client_ct <- sqldf("
SELECT industry, COUNT(*) AS client_ct
FROM clients
GROUP BY industry
ORDER BY client_ct DESC
", dbname = "digitalmktg_db.sqlite")
ind_client_ct
## industry client_ct
## 1 Industrials 78
## 2 Financials 72
## 3 Information Technology 64
## 4 Health Care 64
## 5 Consumer Discretionary 53
## 6 Consumer Staples 38
## 7 Real Estate 31
## 8 Utilities 30
## 9 Materials 28
## 10 Energy 23
## 11 Communication Services 19
# The average number of client per industry
avg_client_num <- sqldf("
WITH ind_client_ct AS (
SELECT industry, COUNT(*) AS client_ct
FROM clients
GROUP BY industry
ORDER BY client_ct DESC)
SELECT ROUND(AVG(client_ct), 0) AS avg_client_num
FROM ind_client_ct
", dbname = "digitalmktg_db.sqlite")
avg_client_num
## avg_client_num
## 1 45
# Which industries have the client base smaller than the average size?
target_industry <- sqldf("
SELECT industry, client_ct, ((SELECT avg_client_num FROM avg_client_num) - client_ct) AS target_increase
FROM ind_client_ct e1
WHERE client_ct < (
SELECT avg_client_num
FROM avg_client_num
)
", dbname = "digitalmktg_db.sqlite")
target_industry
## industry client_ct target_increase
## 1 Consumer Staples 38 7
## 2 Real Estate 31 14
## 3 Utilities 30 15
## 4 Materials 28 17
## 5 Energy 23 22
## 6 Communication Services 19 26
# Calculate the average of client_ct
average_client_ct <- mean(ind_client_ct$client_ct)
# Wrap the x-axis labels with a maximum width of 10 characters
ind_client_ct$industry <- str_wrap(ind_client_ct$industry, width = 10)
# Plot the bar chart with the average line
ggplot(data = ind_client_ct, aes(x = reorder(industry, -client_ct), y = client_ct)) +
geom_bar(stat = "identity", fill = "skyblue") +
geom_text(aes(label = client_ct), vjust = -0.5, color = "black") +
geom_hline(yintercept = average_client_ct, linetype = "solid", color = "red", size = 1) +
labs(title = "Client Distribution by Industry", x = "Industry", y = "Number of Clients")
=> Insight: The digital marketing company has an average of 45 clients per industry. There are 6 industries with a client base below the average.
=> Suggestion: They can acquire more clients in the industries with the client below the average. Given the existing clientele in these industries, they already have prior researches, resources, knowledge, and a track record of successful results and projects. This positions them well to attract potential new clients in those industries. For industries with high number of clients, the pool of new clients become smaller and it is easier to encounter the competitors of these clients.
sqldf("
SELECT SUM(target_increase) AS total_client_to_grow
FROM target_industry
", dbname = "digitalmktg_db.sqlite")
## total_client_to_grow
## 1 101
=> Conclusion: By leveraging opportunity 1, the digital marketing company can on-board up to 101 new clients, and sign up a maximum of 303 new projects considering each client can have a total of three projects.
# How many clients never had an SEO project?
need_seo <- sqldf("
SELECT COUNT( DISTINCT client_id) AS client_without_seo_ct
FROM projects
WHERE client_id NOT IN
(SELECT client_id
FROM seo)
", dbname = "digitalmktg_db.sqlite")
need_seo
## client_without_seo_ct
## 1 114
=> Insight: There are 114 existing clients who have never had an SEO project with the digital marketing company.
=> Suggestion: The sales team and project managers can offer SEO programs to these clients.
# How many clients never had a PPC project?
need_ppc <- sqldf("
SELECT COUNT(DISTINCT client_id) AS client_without_ppc_ct
FROM projects
WHERE client_id NOT IN
(SELECT client_id
FROM ppc)
", dbname = "digitalmktg_db.sqlite")
need_ppc
## client_without_ppc_ct
## 1 189
=> Insight: There are 189 clients who have never had a PPC project with the digital marketing company.
=> Suggestion: The sales team and project managers can suggest PPC programs for these existing clients.
# How many clients never had a Web Design project?
need_web_des <- sqldf("SELECT COUNT( DISTINCT client_id) AS client_without_wd_ct
FROM projects
WHERE client_id NOT IN
(SELECT client_id
FROM web)
", dbname = "digitalmktg_db.sqlite")
need_web_des
## client_without_wd_ct
## 1 358
=> Insight: There are 358 clients who have never signed up for a Web Design project with the digital marketing company
=> Suggestion: The sales team and project managers can recommend the Web Design service to these existing clients.
#Visualization
sqldf("DROP TABLE IF EXISTS pro_type_dis", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE 'pro_type_dis'(
pro_type varchar(100) NOT NULL,
pro_existence varchar(100) NOT NULL,
pro_perc decimal(10,2) NOT NULL
)", dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf(c(
"INSERT INTO pro_type_dis VALUES('SEO', 'With', (SELECT num_projects FROM seo_review)*100/ (SELECT num_clients FROM client_review))",
"INSERT INTO pro_type_dis VALUES('SEO', 'Without', (SELECT client_without_seo_ct FROM need_seo)*100/ (SELECT num_clients FROM client_review))",
"INSERT INTO pro_type_dis VALUES('PPC', 'With', (SELECT num_projects FROM ppc_review)*100/ (SELECT num_clients FROM client_review))",
"INSERT INTO pro_type_dis VALUES('PPC', 'Without', (SELECT client_without_ppc_ct FROM need_ppc)*100/ (SELECT num_clients FROM client_review))",
"INSERT INTO pro_type_dis VALUES('WD', 'With', (SELECT num_projects FROM web_review)*100/ (SELECT num_clients FROM client_review))",
"INSERT INTO pro_type_dis VALUES('WD', 'Without', (SELECT client_without_wd_ct FROM need_web_des)*100/ (SELECT num_clients FROM client_review))"
),
dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
pro_type_dis <-
sqldf("SELECT *
FROM pro_type_dis
", dbname="digitalmktg_db.sqlite")
pro_type_dis %>%
ggplot() +
aes(x = pro_type, y = pro_perc, fill = fct_rev(fct_infreq(factor(pro_existence)))) +
geom_bar(stat = "identity") +
xlab("Project Types") +
ylab("Percentage of Total") +
geom_text(aes(label = paste0(sprintf("%1.0f", pro_perc), "%")),
position = position_stack(vjust = 0.5)) +
ggtitle("Client Distribution (%) by Project Type") +
theme_bw() +
labs(fill = "Clients")
rm(pro_type_dis)
=> Insight: Leveraging the current client base, the digital marketing company has the potential to increase PPC projects by 37%, SEO projects by 22%, and Web Design projects by 71%.
# How many client haven't undergone a web redesign for over 3 years (or 36 months)?
need_web_redes <- sqldf("
WITH latest_wd_pro AS (
SELECT client_id, MAX(date) AS latest_date
FROM events
WHERE pro_id IN (
SELECT pro_id
FROM projects
WHERE pro_type = 'WD'
)
GROUP BY client_id, pro_id
)
SELECT COUNT(CASE WHEN((julianday('2023-12-31') - julianday(latest_date))/365)>= 3 THEN 1 ELSE 0 END) AS client_need_wd_ct
FROM latest_wd_pro
", dbname = "digitalmktg_db.sqlite")
need_web_redes
## client_need_wd_ct
## 1 142
=> Insight: Assuming that a website should have a web redesign every 3 years, there are currently 142 existing clients, who used this digital marketing company for their web design project, have now reached the time for a web redesign need.
=> Suggestion: The sales team and project managers can suggest and advocate a web redesign for these existing clients if applicable.
total_poten_pro <- sqldf("
SELECT e1.client_without_seo_ct + e2.client_without_ppc_ct + e3.client_without_wd_ct + e4.client_need_wd_ct AS total_poten_pro
FROM need_seo e1
JOIN need_ppc e2
JOIN need_web_des e3
JOIN need_web_redes e4
", dbname = "digitalmktg_db.sqlite")
total_poten_pro
## total_poten_pro
## 1 803
=> Conclusion: Through Opportunity 2, which is increasing the number of projects for existing clients, the digital marketing company has the potential to add a total of 803 new projects.
# How many existing projects have never been expanded before?
total_pro_start <- sqldf("WITH pro_start AS(
SELECT pro_id, date AS start_date
FROM events
WHERE event_type ='start'
AND pro_id NOT IN
(SELECT pro_id FROM web)
AND pro_id NOT IN
(SELECT pro_id
FROM events
WHERE event_type = 'expansion'))
SELECT COUNT(*) AS pro_ct
FROM pro_start
WHERE ROUND(((julianday('2023-12-31') - julianday(start_date))/365),0)>= 2
", dbname = "digitalmktg_db.sqlite")
total_pro_start
## pro_ct
## 1 58
=> Insight: There are 58 projects (SEO and PPC) that have remained at the same scope in the past 2 to 11 years (never expanded before).
never_ex <- sqldf("
WITH pro_start AS(
SELECT pro_id, date AS start_date
FROM events
WHERE event_type ='start'
AND pro_id NOT IN
(SELECT pro_id FROM web)
AND pro_id NOT IN
(SELECT pro_id
FROM events
WHERE event_type = 'expansion'))
SELECT COUNT(DISTINCT pro_id) AS pro_ct, (ROUND((julianday('2023-12-31') - julianday(start_date))/365,0)) AS period
FROM pro_start
GROUP BY period
", dbname = "digitalmktg_db.sqlite")
never_ex
## pro_ct period
## 1 5 0
## 2 5 1
## 3 7 2
## 4 3 3
## 5 3 4
## 6 6 5
## 7 10 6
## 8 8 7
## 9 4 8
## 10 5 9
## 11 6 10
## 12 5 11
## 13 1 12
ggplot(never_ex, aes(x = period, y = pro_ct)) +
geom_bar(stat = "identity", fill = "orange", color = "black") +
geom_text(aes(label = pro_ct), vjust = -0.5, color = "black") +
labs(title = "Projects Never Expanded Before (by their Age/Duration)",
x = "Duration (Years Old)",
y = "Number of Projects") + scale_x_continuous(breaks = seq(min(never_ex$period), max(never_ex$period), by = 1))
=> Insight: Upon closer examination, we can see that the room for expansion becomes more evident. For instance, there are 10 projects that are 6 years old and have never been expanded. Fascinatingly, there is one project that is 12 years old and has never undergone an expansion.
=> Suggestion: The sales team and project managers can examine the projects older than 2 years that have never had an expansion and present opportunities of increasing to larger project scopes to these respective clients.
# How many projects haven't been expanded in the last 11 years?
total_need_exp <- sqldf("
WITH latest_expansion AS
(SELECT pro_id, MAX(date) AS latest_date
FROM events
WHERE event_type ='expansion'
GROUP BY pro_id)
SELECT COUNT(pro_id) AS client_need_expan_ct
FROM latest_expansion
WHERE ROUND(((julianday('2023-12-31') - julianday(latest_date))/365),0)>= 2
", dbname = "digitalmktg_db.sqlite")
total_need_exp
## client_need_expan_ct
## 1 426
=> Insight: There are 426 projects (SEO and PPC) that have maintained the same scope in the past 2 to 11 years (since its last expansion).
# How many projects haven't been expanded again in the last 11 years?
need_more_exp <- sqldf("
WITH latest_expansion AS
(SELECT pro_id, MAX(date) AS latest_date
FROM events
WHERE event_type ='expansion'
GROUP BY pro_id)
SELECT COUNT(DISTINCT pro_id) AS pro_ct, (ROUND((julianday('2023-12-31') - julianday(latest_date))/365,0)) AS period
FROM latest_expansion
GROUP BY period
", dbname = "digitalmktg_db.sqlite")
need_more_exp
## pro_ct period
## 1 98 0
## 2 105 1
## 3 56 2
## 4 53 3
## 5 64 4
## 6 54 5
## 7 62 6
## 8 38 7
## 9 38 8
## 10 25 9
## 11 23 10
## 12 13 11
ggplot(need_more_exp, aes(x = period, y = pro_ct)) +
geom_bar(stat = "identity", fill = "skyblue", color = "black") +
geom_text(aes(label = pro_ct), vjust = -0.5, color = "black") +
labs(title = "The Last Time Existing Projects Expanded (in Years)",
x = "Duration (Years Ago)",
y = "Number of Projects") + scale_x_continuous(breaks = seq(min(need_more_exp$period), max(need_more_exp$period), by = 1))
=> Insight: Breaking down the total number of projects by year duration (from their last expansion to now), we can see that there are numerous spaces for expansion. For example, 105 projects haven’t expanded in over a year. In other words, the last time these 105 projects expanded was a year ago.
=> Suggestion: The sales team and project managers can review these projects (especially the ones that have not expanded in the last 2 to 12 years) and propose expansion to these respective clients.
sqldf("
SELECT e1.pro_ct + e2.client_need_expan_ct AS total_pro_op3
FROM total_pro_start e1
JOIN total_need_exp e2
", dbname = "digitalmktg_db.sqlite")
## total_pro_op3
## 1 484
=> Conclusion: By capitalizing on Opportunity 3, the digital marketing company has the potential to extend the scope of 484 existing projects.
=> Final Conclusion: Considering all three opportunities, by acquiring 101 new clients, initiating 803 new projects, and expanding 484 existing projects, the digital marketing company can substantially boost its revenue.