Welcome to my Business Analytics project! I developed a relational database for a digital marketing company and utilized SQL to analyze and explore their revenue growth opportunities. The data is hypothetical, assuming that their clients are S&P 500 companies.

This digital marketing company provides three primary marketing services: SEO (Search Engine Optimization), PPC (Pay-Per-Click Google Ads), and Web Design. Each of these services is treated as an individual project, and a client has the flexibility to engage in up to three distinct projects, one for each service (SEO, PPC, Web Design).

My objective is to explore and analyze their clients and projects data, and ultimately identify opportunities for increasing revenue.

The script follows the below structure:

1:Create The Database

2: Import Data from CSV Files

3: Review Data

4: Data Analytics and Data Visualization

I appreciate you taking the time to check this out! If you prefer, feel free to skip ahead to the Data Analytics and Visualization section. Key points (Insights and Suggestions) are bolded for your convenience.

# R function to turn off warnings

options(warn=-1)

 #load packages
library(sqldf)
library(ggplot2)
library(dplyr)
library(stringr)
library(forcats)

1. Create The Database

# Establish Database Connection
db <- dbConnect(SQLite(), dbname="digitalmktg_db.sqlite")

Create the “clients” table

The “clients” table contains information about the clients of the digital marketing company. In this case, the clients are the S&P 500 companies.

sqldf("DROP TABLE IF EXISTS clients", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE 'clients'(
      client_id varchar(20) PRIMARY KEY,
      biz_name varchar(100) NOT NULL,
      industry varchar(100) NOT NULL)
     ", dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

Create the “projects” table

The “projects” table houses information about each project. Each client may have from one to three projects including SEO, PPC, and Web Design.

sqldf("DROP TABLE IF EXISTS projects", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE projects (
      pro_id varchar(20) PRIMARY KEY,
      pro_type varchar(100) NOT NULL,
      pro_name varchar(100) NOT NULL,
      client_id varchar(20) NOT NULL REFERENCES clients(client_id)
      )",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

Create the “event” table

The “event” table stores information about the dates when clients signed up, expanded, or canceled services.

Clients have the option to expand the scope of their projects multiple times, such as increasing the number of pages for SEO optimization or raising the budget for PPC Google Ad Campaigns. There is no expansion or cancellation for web design projects since it is a one-time project, unlike SEO and PPC, which are monthly projects. For web design, clients may sign up for a website redesign or the creation of a new site.

In this case, we assume that there is no project resumed after cancellation

sqldf("DROP TABLE IF EXISTS events", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE events (
      event_id varchar(20) PRIMARY KEY,
      pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
      client_id varchar(20) NOT NULL REFERENCES clients(client_id),
      event_type varchar(100) NOT NULL,
      date date NOT NULL
      )",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

Create the “ppc” table

The “ppc” table contains information about the scope, monthly time budget and actual average time spent per month of each PPC project. The scope represents the management fee the digital marketing company charges to the client for overseeing their Google Ad Campaigns.

sqldf("DROP TABLE IF EXISTS ppc", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE ppc (
      ppc_id varchar(20) PRIMARY KEY,
      pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
      client_id varchar(20) NOT NULL REFERENCES clients(client_id),
      ppc_scope decimal(10,2) NOT NULL,
      budget_time decimal(6,2) NOT NULL,
      monthly_avg_spend decimal(6,2) NOT NULL
      )",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

Create the “seo” table

The “seo” table contains information about the scope, monthly revenue, monthly time budget and actual average time spent per month of each SEO project. The scope represents the number of SEO pages. The scope directly correlates to the revenue. The higher the scope leads to the higher revenue for the digital marketing company.

sqldf("DROP TABLE IF EXISTS seo", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE seo (
      seo_id varchar(20) PRIMARY KEY,
      pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
      client_id varchar(20) NOT NULL REFERENCES clients(client_id),
      seo_scope int NOT NULL,
      seo_rev decimal(10,2) NOT NULL,
      budget_time decimal(6,2) NOT NULL,
      monthly_avg_spend decimal(6,2) NOT NULL
      )",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

Create the “web” table

The “web” table contains information about the total revenue, total time budget and actual total time spent of each Web Design project. The scope for each Web Design project varies depending on the amount of work and hours required to build or redesign the website based on the client’s request and customization.

sqldf("DROP TABLE IF EXISTS web", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE web (
     wd_id varchar(20) PRIMARY KEY,
     pro_id varchar(20) NOT NULL REFERENCES projects(project_id),
     client_id varchar(20) NOT NULL REFERENCES clients(client_id),
     wd_rev decimal(10,2) NOT NULL,
     budget_time decimal(6,2) NOT NULL,
     total_spend decimal(6,2) NOT NULL
     )",dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows

2: Import Data from CSV Files

# Set working directory

setwd("/Users/rosenguyen/Documents/dms")

dbWriteTable(db, "clients", read.csv("/Users/rosenguyen/Documents/dms/clients.csv"), append = TRUE)
dbWriteTable(db, "projects", read.csv("/Users/rosenguyen/Documents/dms/projects.csv"), append = TRUE)
dbWriteTable(db, "events", read.csv("/Users/rosenguyen/Documents/dms/events.csv"), append = TRUE)
dbWriteTable(db, "ppc", read.csv("/Users/rosenguyen/Documents/dms/ppc.csv"), append = TRUE)
dbWriteTable(db, "web", read.csv("/Users/rosenguyen/Documents/dms/web.csv"), append = TRUE)
dbWriteTable(db, "seo", read.csv("/Users/rosenguyen/Documents/dms/seo.csv"), append = TRUE)

3: Review Data

Summary of “clients” table

sqldf("PRAGMA table_info(clients)", dbname = "digitalmktg_db.sqlite")
##   cid      name         type notnull dflt_value pk
## 1   0 client_id  varchar(20)       0         NA  1
## 2   1  biz_name varchar(100)       1         NA  0
## 3   2  industry varchar(100)       1         NA  0
sqldf("SELECT * FROM clients LIMIT 5", dbname = "digitalmktg_db.sqlite")
##   client_id             biz_name               industry
## 1    CNT389             Qualcomm Information Technology
## 2    CNT031 Ameriprise Financial             Financials
## 3    CNT460       United Rentals            Industrials
## 4    CNT161    Eaton Corporation            Industrials
## 5    CNT068            BlackRock             Financials
client_review <- sqldf("SELECT COUNT(client_id) AS num_clients, COUNT (DISTINCT industry) AS unique_num_industry
       FROM clients
      ", dbname = "digitalmktg_db.sqlite")

client_review
##   num_clients unique_num_industry
## 1         500                  11

=> The digital marketing company has a total of 500 clients across 11 different industries.

Summary of “projects” table

sqldf("PRAGMA table_info(projects)", dbname = "digitalmktg_db.sqlite")
##   cid      name         type notnull dflt_value pk
## 1   0    pro_id  varchar(20)       0         NA  1
## 2   1  pro_type varchar(100)       1         NA  0
## 3   2  pro_name varchar(100)       1         NA  0
## 4   3 client_id  varchar(20)       1         NA  0
sqldf("SELECT * FROM projects LIMIT 5", dbname = "digitalmktg_db.sqlite")
##    pro_id pro_type pro_name client_id
## 1 PRO0001       WD   MMM-WD    CNT001
## 2 PRO0002      SEO  MMM-SEO    CNT001
## 3 PRO0003      PPC  MMM-PPC    CNT001
## 4 PRO0004       WD   AOS-WD    CNT002
## 5 PRO0005      SEO  AOS-SEO    CNT002
sqldf("SELECT COUNT(pro_id) AS num_projects, COUNT(DISTINCT pro_type) AS num_project_type, COUNT(DISTINCT client_id) AS num_clients
       FROM projects
      ", dbname = "digitalmktg_db.sqlite")
##   num_projects num_project_type num_clients
## 1          839                3         500

=> The 500 clients collectively have 839 projects, categorized into three types: SEO, PPC, and Web Design.

Summary of “events” table

sqldf("PRAGMA table_info(events)", dbname = "digitalmktg_db.sqlite")
##   cid       name         type notnull dflt_value pk
## 1   0   event_id  varchar(20)       0         NA  1
## 2   1     pro_id  varchar(20)       1         NA  0
## 3   2  client_id  varchar(20)       1         NA  0
## 4   3 event_type varchar(100)       1         NA  0
## 5   4       date         date       1         NA  0
sqldf("SELECT * FROM events LIMIT 5", dbname = "digitalmktg_db.sqlite")
##   event_id  pro_id client_id event_type       date
## 1   E00001 PRO0001    CNT001      start 2014-08-09
## 2   E00002 PRO0002    CNT001      start 2019-10-17
## 3   E00003 PRO0002    CNT001  expansion 2021-04-17
## 4   E00004 PRO0002    CNT001  expansion 2021-09-17
## 5   E00005 PRO0002    CNT001  expansion 2022-11-17
sqldf("SELECT COUNT(event_id) AS num_event, COUNT(DISTINCT pro_id) AS num_projects, COUNT(DISTINCT event_type) AS num_event_type
       FROM events
      ", dbname = "digitalmktg_db.sqlite")
##   num_event num_projects num_event_type
## 1      2198          839              3
sqldf("SELECT DISTINCT event_type AS event_type_cat
       FROM events
      ", dbname = "digitalmktg_db.sqlite")
##   event_type_cat
## 1          start
## 2      expansion
## 3         cancel
sqldf("SELECT MIN(date) AS earliest_date, MAX(date) AS latest_date, (MAX(date)-MIN(date)) AS year_range
       FROM events
      ", dbname = "digitalmktg_db.sqlite")
##   earliest_date latest_date year_range
## 1    2012-01-10  2023-12-31         11

=> The 839 projects collectively have 2198 events, categorized into three types: start, expansion, and cancel. The date period spans from January 10, 2012 to December 31, 2023. The data set covers a 11-year period.

Summary of “ppc” table

sqldf("PRAGMA table_info(ppc)", dbname = "digitalmktg_db.sqlite")
##   cid              name          type notnull dflt_value pk
## 1   0            ppc_id   varchar(20)       0         NA  1
## 2   1            pro_id   varchar(20)       1         NA  0
## 3   2         client_id   varchar(20)       1         NA  0
## 4   3         ppc_scope decimal(10,2)       1         NA  0
## 5   4       budget_time  decimal(6,2)       1         NA  0
## 6   5 monthly_avg_spend  decimal(6,2)       1         NA  0
sqldf("SELECT * FROM ppc LIMIT 5", dbname = "digitalmktg_db.sqlite")
##   ppc_id  pro_id client_id ppc_scope budget_time monthly_avg_spend
## 1 PPC001 PRO0003    CNT001     10000          50               145
## 2 PPC002 PRO0006    CNT002     11000          55                60
## 3 PPC003 PRO0009    CNT003     12000          60                96
## 4 PPC004 PRO0033    CNT011     20000         100               129
## 5 PPC005 PRO0036    CNT012     21000         105               194
ppc_review <- sqldf("SELECT COUNT(ppc_id) AS num_projects, MIN(ppc_scope) AS smallest_scope, MAX(ppc_scope) AS biggest_scope, AVG(ppc_scope) AS average_scope, MAX(ppc_scope)- MIN(ppc_scope) AS scope_range 
       FROM ppc
      ", dbname = "digitalmktg_db.sqlite")

ppc_review
##   num_projects smallest_scope biggest_scope average_scope scope_range
## 1          311          10000         35000      22450.16       25000

=> There are 311 PPC projects with a monthly scope ranging from $10,000 to $35,000. The average scope across these projects is $22,450, and the scope range is $25,000.

sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range 
       FROM ppc
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1                 50                 175            112.2508               125

The budget time encompasses the total allocated time for the entire team, including a project manager, digital marketing specialists, web developers, and the copywriters assigned to that specific project.

=> The budget time for a PPC project varies from a minimum of 50 hours to a maximum of 175 hours, resulting in a range of 125 hours. On average, the budget time for a PPC project is approximately 112 hours.

sqldf("SELECT MIN(monthly_avg_spend) AS lowest_monthly_avg_spend, MAX(monthly_avg_spend) AS highest_monthly_avg_spend, AVG(monthly_avg_spend) AS average_monthly_avg_spend, MAX(monthly_avg_spend)- MIN(monthly_avg_spend) AS monthly_avg_spend_range 
       FROM ppc
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_monthly_avg_spend highest_monthly_avg_spend average_monthly_avg_spend
## 1                       51                       199                  124.7685
##   monthly_avg_spend_range
## 1                     148

The monthly average spend represents the total time spent on average per month by the entire team assigned to that specific project.

=> The actual average time spent for a PPC project per month varies from a minimum of 51 hours to a maximum of 199 hours, resulting in a range of almost 125 hours. On average, a team actually spent approximately 148 hours on a PPC project .

Summary of “seo” table

sqldf("PRAGMA table_info(seo)", dbname = "digitalmktg_db.sqlite")
##   cid              name          type notnull dflt_value pk
## 1   0            seo_id   varchar(20)       0         NA  1
## 2   1            pro_id   varchar(20)       1         NA  0
## 3   2         client_id   varchar(20)       1         NA  0
## 4   3         seo_scope           INT       1         NA  0
## 5   4           seo_rev decimal(10,2)       1         NA  0
## 6   5       budget_time  decimal(6,2)       1         NA  0
## 7   6 monthly_avg_spend  decimal(6,2)       1         NA  0
sqldf("SELECT * FROM seo LIMIT 5", dbname = "digitalmktg_db.sqlite")
##   seo_id  pro_id client_id seo_scope seo_rev budget_time monthly_avg_spend
## 1 SEO001 PRO0002    CNT001       570   57000         285               275
## 2 SEO002 PRO0005    CNT002       980   98000         490               348
## 3 SEO003 PRO0011    CNT004       650   65000         325               469
## 4 SEO004 PRO0014    CNT005       930   93000         465               409
## 5 SEO005 PRO0017    CNT006       890   89000         445               259
seo_review <- sqldf("SELECT COUNT(seo_id) AS num_projects, MIN(seo_scope ) AS smallest_scope, MAX(seo_scope ) AS biggest_scope, AVG(seo_scope ) AS average_scope, MAX(seo_scope )- MIN(seo_scope ) AS scope_range 
       FROM seo
      ", dbname = "digitalmktg_db.sqlite")

seo_review 
##   num_projects smallest_scope biggest_scope average_scope scope_range
## 1          386            500          1000      746.6062         500

=> There are 386 SEO projects with the smallest scope at 500 SEO pages to the largest scope at 10,00 SEO pages. The average scope across these projects is approximately 746 SEO pages, and the scope range is 500 pages.

sqldf("SELECT COUNT(seo_id) AS num_projects, MIN(seo_rev) AS lowest_rev, MAX(seo_rev) AS highest_rev, AVG(seo_rev) AS average_rev, MAX(seo_rev)- MIN(seo_rev) AS rev_range 
       FROM seo
      ", dbname = "digitalmktg_db.sqlite")
##   num_projects lowest_rev highest_rev average_rev rev_range
## 1          386      50000      100000    74660.62     50000

=> The revenue for a SEO project ranges from $50,000 to $100,000, creating a $50,000 span. The average revenue for an SEO project is around $74,661.

sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range 
       FROM seo
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1                250                 500            373.3031               250

=> The amount of time budgeted for a SEO project varies from a minimum of 250 hours to a maximum of 500 hours, resulting in a range of 250 hours. On average, the expected time spend for a SEO project is approximately 373 hours.

sqldf("SELECT MIN(monthly_avg_spend) AS lowest_monthly_avg_spend, MAX(monthly_avg_spend) AS highest_monthly_avg_spend, AVG(monthly_avg_spend) AS average_monthly_avg_spend, MAX(monthly_avg_spend)- MIN(monthly_avg_spend) AS monthly_avg_spend_range 
       FROM seo
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_monthly_avg_spend highest_monthly_avg_spend average_monthly_avg_spend
## 1                      200                       499                  343.1218
##   monthly_avg_spend_range
## 1                     299

=> The actual average time spent for a SEO project per month varies from a minimum of 200 hours to a maximum of 499 hours, resulting in a range of 299 hours. On average, the average time spent a month for a SEO project is approximately 343 hours.

Summary of “web” table

sqldf("PRAGMA table_info(web)", dbname = "digitalmktg_db.sqlite")
##   cid        name          type notnull dflt_value pk
## 1   0       wd_id   varchar(20)       0         NA  1
## 2   1      pro_id   varchar(20)       1         NA  0
## 3   2   client_id   varchar(20)       1         NA  0
## 4   3      wd_rev decimal(10,2)       1         NA  0
## 5   4 budget_time  decimal(6,2)       1         NA  0
## 6   5 total_spend  decimal(6,2)       1         NA  0
sqldf("SELECT * FROM web LIMIT 5", dbname = "digitalmktg_db.sqlite")
##   wd_id  pro_id client_id wd_rev budget_time total_spend
## 1 WD001 PRO0001    CNT001  80000         400       896.0
## 2 WD002 PRO0004    CNT002  15000          75        91.5
## 3 WD003 PRO0016    CNT006  65000         325       734.5
## 4 WD004 PRO0019    CNT007  20000         100       223.0
## 5 WD005 PRO0022    CNT008  60000         300       294.0
web_review <- sqldf("SELECT COUNT(wd_id) AS num_projects, MIN(wd_rev) AS lowest_rev, MAX(wd_rev) AS highest_rev, AVG(wd_rev) AS average_rev, MAX(wd_rev)- MIN(wd_rev) AS rev_range 
       FROM web
      ", dbname = "digitalmktg_db.sqlite")

web_review 
##   num_projects lowest_rev highest_rev average_rev rev_range
## 1          142       5000       95000    52042.25     90000

=> There are 142 Web Design projects with revenue ranging from $5,000 to $95,000, showing a substantial range of $90,000. On average, the revenue from a Web Design project for this digital marketing company is approximately $52,042.

sqldf("SELECT MIN(budget_time) AS lowest_budget_time, MAX(budget_time) AS highest_budget_time, AVG(budget_time) AS average_budget_time, MAX(budget_time)- MIN(budget_time) AS budget_time_range 
       FROM web
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_budget_time highest_budget_time average_budget_time budget_time_range
## 1                 25                 475            260.2113               450

Web Design is a one-time project and not a monthly recurring project. Consequently, the budget time for a Web Design project represents the total time allocated for the entire project, rather than being distributed monthly.

=> The budget time for a Web Design project varies from a minimum of 25 hours to a maximum of 475 hours, resulting in a range of 450 hours. On average, the budget time for a PPC project is approximately 260 hours.

sqldf("SELECT MIN(total_spend) AS lowest_total_spend, MAX(total_spend) AS highest_total_spend, AVG(total_spend) AS average_total_spend, MAX(total_spend)- MIN(total_spend) AS total_spend_range 
       FROM web
      ", dbname = "digitalmktg_db.sqlite")
##   lowest_total_spend highest_total_spend average_total_spend total_spend_range
## 1              25.75             1173.25             402.419            1147.5

=> The actual total time spent on a Web Design project varies from a minimum of 25.75 hours to a maximum of 499 hours, resulting in a range of almost 1,148 hours. On average, the average time spent a month for a Web Design project is approximately 402 hours.

4: Data Analytics and Data Visualization

Opportunity 1: Increase Customers

Increase customers in the industries with a client base smaller than the average size

# Number of clients by industry

ind_client_ct <- sqldf("
       SELECT industry, COUNT(*) AS client_ct
       FROM clients
       GROUP BY industry
       ORDER BY client_ct DESC
      ", dbname = "digitalmktg_db.sqlite")

ind_client_ct
##                  industry client_ct
## 1             Industrials        78
## 2              Financials        72
## 3  Information Technology        64
## 4             Health Care        64
## 5  Consumer Discretionary        53
## 6        Consumer Staples        38
## 7             Real Estate        31
## 8               Utilities        30
## 9               Materials        28
## 10                 Energy        23
## 11 Communication Services        19
# The average number of client per industry

avg_client_num <- sqldf("
WITH ind_client_ct AS (
       SELECT industry, COUNT(*) AS client_ct
       FROM clients
       GROUP BY industry
       ORDER BY client_ct DESC)

       SELECT ROUND(AVG(client_ct), 0) AS avg_client_num
       FROM ind_client_ct
      ", dbname = "digitalmktg_db.sqlite")

avg_client_num
##   avg_client_num
## 1             45
# Which industries have the client base smaller than the average size?

target_industry <- sqldf("
       SELECT industry, client_ct, ((SELECT avg_client_num FROM avg_client_num) - client_ct) AS target_increase
       FROM ind_client_ct e1
       WHERE client_ct < (
       SELECT avg_client_num
       FROM avg_client_num
       )
      ", dbname = "digitalmktg_db.sqlite")

target_industry
##                 industry client_ct target_increase
## 1       Consumer Staples        38               7
## 2            Real Estate        31              14
## 3              Utilities        30              15
## 4              Materials        28              17
## 5                 Energy        23              22
## 6 Communication Services        19              26
# Calculate the average of client_ct
average_client_ct <- mean(ind_client_ct$client_ct)

# Wrap the x-axis labels with a maximum width of 10 characters
ind_client_ct$industry <- str_wrap(ind_client_ct$industry, width = 10)

# Plot the bar chart with the average line
ggplot(data = ind_client_ct, aes(x = reorder(industry, -client_ct), y = client_ct)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  geom_text(aes(label = client_ct), vjust = -0.5, color = "black") +
  geom_hline(yintercept = average_client_ct, linetype = "solid", color = "red", size = 1) +
  labs(title = "Client Distribution by Industry", x = "Industry", y = "Number of Clients")

=> Insight: The digital marketing company has an average of 45 clients per industry. There are 6 industries with a client base below the average.

=> Suggestion: They can acquire more clients in the industries with the client below the average. Given the existing clientele in these industries, they already have prior researches, resources, knowledge, and a track record of successful results and projects. This positions them well to attract potential new clients in those industries. For industries with high number of clients, the pool of new clients become smaller and it is easier to encounter the competitors of these clients.

sqldf("
       SELECT SUM(target_increase) AS total_client_to_grow
       FROM target_industry
      ", dbname = "digitalmktg_db.sqlite")
##   total_client_to_grow
## 1                  101

=> Conclusion: By leveraging opportunity 1, the digital marketing company can on-board up to 101 new clients, and sign up a maximum of 303 new projects considering each client can have a total of three projects.

Opportunity 2: Increase Projects for Existing Clients

1. Increase the Number of SEO projects

# How many clients never had an SEO project?

need_seo <- sqldf("
       SELECT COUNT( DISTINCT client_id) AS client_without_seo_ct
       FROM projects
       WHERE client_id NOT IN
       (SELECT client_id
       FROM seo)
      ", dbname = "digitalmktg_db.sqlite")

need_seo
##   client_without_seo_ct
## 1                   114

=> Insight: There are 114 existing clients who have never had an SEO project with the digital marketing company.

=> Suggestion: The sales team and project managers can offer SEO programs to these clients.

2. Increase the Number of PPC projects

# How many clients never had a PPC project?

need_ppc <- sqldf("
       SELECT COUNT(DISTINCT client_id) AS client_without_ppc_ct
       FROM projects
       WHERE client_id NOT IN
       (SELECT client_id
       FROM ppc)
      ", dbname = "digitalmktg_db.sqlite")

need_ppc 
##   client_without_ppc_ct
## 1                   189

=> Insight: There are 189 clients who have never had a PPC project with the digital marketing company.

=> Suggestion: The sales team and project managers can suggest PPC programs for these existing clients.

3. Increase the Number of Web Design projects

# How many clients never had a Web Design project?

need_web_des <- sqldf("SELECT COUNT( DISTINCT client_id) AS client_without_wd_ct
       FROM projects
       WHERE client_id NOT IN
       (SELECT client_id
       FROM web)
      ", dbname = "digitalmktg_db.sqlite")

need_web_des
##   client_without_wd_ct
## 1                  358

=> Insight: There are 358 clients who have never signed up for a Web Design project with the digital marketing company

=> Suggestion: The sales team and project managers can recommend the Web Design service to these existing clients.

#Visualization

sqldf("DROP TABLE IF EXISTS pro_type_dis", dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf("CREATE TABLE 'pro_type_dis'(
      pro_type varchar(100) NOT NULL,
      pro_existence varchar(100) NOT NULL,
      pro_perc decimal(10,2) NOT NULL
     )", dbname="digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
sqldf(c(
  "INSERT INTO pro_type_dis VALUES('SEO', 'With', (SELECT num_projects FROM seo_review)*100/ (SELECT num_clients FROM client_review))",
  "INSERT INTO pro_type_dis VALUES('SEO', 'Without', (SELECT client_without_seo_ct FROM need_seo)*100/ (SELECT num_clients FROM client_review))",
  "INSERT INTO pro_type_dis VALUES('PPC', 'With', (SELECT num_projects FROM ppc_review)*100/ (SELECT num_clients FROM client_review))",
  "INSERT INTO pro_type_dis VALUES('PPC', 'Without', (SELECT client_without_ppc_ct FROM need_ppc)*100/ (SELECT num_clients FROM client_review))",
  "INSERT INTO pro_type_dis VALUES('WD', 'With', (SELECT num_projects FROM web_review)*100/ (SELECT num_clients FROM client_review))",
  "INSERT INTO pro_type_dis VALUES('WD', 'Without', (SELECT client_without_wd_ct FROM need_web_des)*100/ (SELECT num_clients FROM client_review))"
  
),
dbname = "digitalmktg_db.sqlite")
## data frame with 0 columns and 0 rows
pro_type_dis <- 
     sqldf("SELECT * 
     FROM pro_type_dis
     ", dbname="digitalmktg_db.sqlite")

pro_type_dis %>%
  ggplot() +
  aes(x = pro_type, y = pro_perc, fill = fct_rev(fct_infreq(factor(pro_existence)))) +
  geom_bar(stat = "identity") +
  xlab("Project Types") +
  ylab("Percentage of Total") +
  geom_text(aes(label = paste0(sprintf("%1.0f", pro_perc), "%")),
            position = position_stack(vjust = 0.5)) +
  ggtitle("Client Distribution (%) by Project Type") +
  theme_bw() +
  labs(fill = "Clients")

rm(pro_type_dis)

=> Insight: Leveraging the current client base, the digital marketing company has the potential to increase PPC projects by 37%, SEO projects by 22%, and Web Design projects by 71%.

4. Increase the Number of Web Design projects (Existing Web Design Clients)

# How many client haven't undergone a web redesign for over 3 years (or 36 months)?

need_web_redes <- sqldf("
        WITH latest_wd_pro AS (
        SELECT client_id, MAX(date) AS latest_date
        FROM events
        WHERE pro_id IN (
            SELECT pro_id
            FROM projects
            WHERE pro_type = 'WD'
        )
        GROUP BY client_id, pro_id
    )
       
    SELECT COUNT(CASE WHEN((julianday('2023-12-31') - julianday(latest_date))/365)>= 3 THEN 1 ELSE 0 END) AS client_need_wd_ct
    FROM latest_wd_pro 
", dbname = "digitalmktg_db.sqlite")

need_web_redes
##   client_need_wd_ct
## 1               142

=> Insight: Assuming that a website should have a web redesign every 3 years, there are currently 142 existing clients, who used this digital marketing company for their web design project, have now reached the time for a web redesign need.

=> Suggestion: The sales team and project managers can suggest and advocate a web redesign for these existing clients if applicable.

total_poten_pro <- sqldf("
       SELECT e1.client_without_seo_ct + e2.client_without_ppc_ct + e3.client_without_wd_ct + e4.client_need_wd_ct AS total_poten_pro
       FROM need_seo e1
       JOIN need_ppc e2
       JOIN need_web_des e3
       JOIN need_web_redes e4
", dbname = "digitalmktg_db.sqlite")

total_poten_pro
##   total_poten_pro
## 1             803

=> Conclusion: Through Opportunity 2, which is increasing the number of projects for existing clients, the digital marketing company has the potential to add a total of 803 new projects.

Opportunity 3: Increase the Scope of the Existing Projects

1. Expand the scope of Existing Projects that have never been expanded in the past 2 to 11 years

# How many existing projects have never been expanded before?

total_pro_start <- sqldf("WITH pro_start AS(
       SELECT pro_id, date AS start_date
       FROM events
       WHERE event_type ='start'
       AND pro_id NOT IN
          (SELECT pro_id FROM web)
       AND pro_id NOT IN
          (SELECT pro_id 
          FROM events
          WHERE event_type = 'expansion'))
          
      SELECT COUNT(*) AS pro_ct
      FROM pro_start
      WHERE ROUND(((julianday('2023-12-31') - julianday(start_date))/365),0)>= 2
       ", dbname = "digitalmktg_db.sqlite")

total_pro_start
##   pro_ct
## 1     58

=> Insight: There are 58 projects (SEO and PPC) that have remained at the same scope in the past 2 to 11 years (never expanded before).

never_ex <- sqldf("
       WITH pro_start AS(
       SELECT pro_id, date AS start_date
       FROM events
       WHERE event_type ='start'
       AND pro_id NOT IN
          (SELECT pro_id FROM web)
       AND pro_id NOT IN
          (SELECT pro_id 
          FROM events
          WHERE event_type = 'expansion'))
       
       SELECT COUNT(DISTINCT pro_id) AS pro_ct, (ROUND((julianday('2023-12-31') - julianday(start_date))/365,0)) AS period
       FROM pro_start
       GROUP BY period
      ", dbname = "digitalmktg_db.sqlite")  

never_ex
##    pro_ct period
## 1       5      0
## 2       5      1
## 3       7      2
## 4       3      3
## 5       3      4
## 6       6      5
## 7      10      6
## 8       8      7
## 9       4      8
## 10      5      9
## 11      6     10
## 12      5     11
## 13      1     12
ggplot(never_ex, aes(x = period, y = pro_ct)) +
  geom_bar(stat = "identity", fill = "orange", color = "black") +
  geom_text(aes(label = pro_ct), vjust = -0.5, color = "black") + 
  labs(title = "Projects Never Expanded Before (by their Age/Duration)",
       x = "Duration (Years Old)",
       y = "Number of Projects") + scale_x_continuous(breaks = seq(min(never_ex$period), max(never_ex$period), by = 1))

=> Insight: Upon closer examination, we can see that the room for expansion becomes more evident. For instance, there are 10 projects that are 6 years old and have never been expanded. Fascinatingly, there is one project that is 12 years old and has never undergone an expansion.

=> Suggestion: The sales team and project managers can examine the projects older than 2 years that have never had an expansion and present opportunities of increasing to larger project scopes to these respective clients.

2. Expand the scope of Existing Projects that have not been expanded again in the past 2 to 11 years

# How many projects haven't been expanded in the last 11 years?

total_need_exp <- sqldf("
       WITH latest_expansion AS 
       (SELECT pro_id, MAX(date) AS latest_date
       FROM events
       WHERE event_type ='expansion'
       GROUP BY pro_id)
       
       SELECT COUNT(pro_id) AS client_need_expan_ct
       FROM latest_expansion
       WHERE ROUND(((julianday('2023-12-31') - julianday(latest_date))/365),0)>= 2
      ", dbname = "digitalmktg_db.sqlite")

total_need_exp
##   client_need_expan_ct
## 1                  426

=> Insight: There are 426 projects (SEO and PPC) that have maintained the same scope in the past 2 to 11 years (since its last expansion).

# How many projects haven't been expanded again in the last 11 years?

need_more_exp <- sqldf("
       WITH latest_expansion AS 
       (SELECT pro_id, MAX(date) AS latest_date
       FROM events
       WHERE event_type ='expansion'
       GROUP BY pro_id)
       
       SELECT COUNT(DISTINCT pro_id) AS pro_ct, (ROUND((julianday('2023-12-31') - julianday(latest_date))/365,0)) AS period
       FROM latest_expansion
       GROUP BY period
      ", dbname = "digitalmktg_db.sqlite")

need_more_exp
##    pro_ct period
## 1      98      0
## 2     105      1
## 3      56      2
## 4      53      3
## 5      64      4
## 6      54      5
## 7      62      6
## 8      38      7
## 9      38      8
## 10     25      9
## 11     23     10
## 12     13     11
ggplot(need_more_exp, aes(x = period, y = pro_ct)) +
  geom_bar(stat = "identity", fill = "skyblue", color = "black") +
  geom_text(aes(label = pro_ct), vjust = -0.5, color = "black") + 
  labs(title = "The Last Time Existing Projects Expanded (in Years)",
       x = "Duration (Years Ago)",
       y = "Number of Projects") + scale_x_continuous(breaks = seq(min(need_more_exp$period), max(need_more_exp$period), by = 1))

=> Insight: Breaking down the total number of projects by year duration (from their last expansion to now), we can see that there are numerous spaces for expansion. For example, 105 projects haven’t expanded in over a year. In other words, the last time these 105 projects expanded was a year ago.

=> Suggestion: The sales team and project managers can review these projects (especially the ones that have not expanded in the last 2 to 12 years) and propose expansion to these respective clients.

sqldf("
       SELECT e1.pro_ct + e2.client_need_expan_ct AS total_pro_op3
       FROM total_pro_start e1
       JOIN total_need_exp e2
      ", dbname = "digitalmktg_db.sqlite")
##   total_pro_op3
## 1           484

=> Conclusion: By capitalizing on Opportunity 3, the digital marketing company has the potential to extend the scope of 484 existing projects.

=> Final Conclusion: Considering all three opportunities, by acquiring 101 new clients, initiating 803 new projects, and expanding 484 existing projects, the digital marketing company can substantially boost its revenue.