Personal Project: E-commerce Database analysis for Maven Fuzzy Factory

In this Case Study, I will assume the role of an E-commerce Database Analyst at a fictional company called Maven Fuzzy Factor: an online retail company. The company was established in 2012 and has been operating for 3 years.

I have been assigned the task of analyzing data, providing insights, and optimizing marketing channels. Additionally, I will measure and experiment with website conversion performance and utilize data to understand the impact of launching new products.

I will follow the steps of the data analysis process: Ask, Prepare, Process, Analyze, Share, and Act.


I. Ask

Business Task

  • Website session analysis

  • Websites session analysis from device type

  • Conversion performance analysis

  • Revenue and Margin analysis


II. Prepare

Resource

I utilize the company’s Database to perform analysis and identify tends over 3 years from 19/03/2012 to 31/12/2014

Data utilize from Maven Fuzzy Factory


III. Process

Observation:

Connect to MySQL in R language

Install Packages

install.packages("RMySQL")

install.packages("DBI")

install.packages("kableExtra")

In there:

  • Package “RMySQL” allow connect to MySQL

  • Package “DBI” allow interact with Database

  • Package “kableExtra” allows to retrieve query results in the form of tables on HTML screen

Utilize library of 3 packages

library(DBI)

library(RMySQL) 

library(kableExtra)

IV. Analyze and Share

1. Website traffic analysis

a) Websites session analysis from device type
query = "SELECT 
            YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter, 
            device_type, COUNT(device_type) AS count_devive
          FROM 
            website_pageviews AS wp
            LEFT JOIN website_sessions AS ws 
                ON wp.website_session_id = ws.website_session_id
          GROUP BY device_type, YEAR(wp.created_at), QUARTER(wp.created_at)
          ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at)
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quarter device_type count_devive
2012 1 mobile 1191
2012 1 desktop 2528
2012 2 desktop 17356
2012 2 mobile 5883
2012 3 desktop 29988
2012 3 mobile 7389
2012 4 desktop 58756
2012 4 mobile 13392
2013 1 mobile 9770
2013 1 desktop 37889

Visulaization

Observation:

  • The website sessions from desktop is higher than mobile in all years and quarters.

  • The difference between desktop and mobile session can represent that customers frequently access the company’s website using desktop devices

b) Analysis of sessino from Utm_Source and Utm_Campaign

utm_source

query = "SELECT 
            YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter,  
            source AS utm_source, COUNT(source) AS count
          FROM 
            website_pageviews AS wp
            LEFT JOIN (
                SELECT 
                    *, CASE WHEN utm_source IS NULL THEN 'NULL' ELSE utm_source END AS source
                FROM website_sessions
                        ) AS ws 
                ON wp.website_session_id = ws.website_session_id
          GROUP BY YEAR(wp.created_at), QUARTER(wp.created_at), source
          ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at), source
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quarter utm_source count
2012 1 bsearch 2
2012 1 gsearch 3668
2012 1 NULL 49
2012 2 bsearch 166
2012 2 gsearch 21034
2012 2 NULL 2039
2012 3 bsearch 4651
2012 3 gsearch 28936
2012 3 NULL 3790
2012 4 bsearch 14417

utm_campaign

query = "SELECT 
            YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter, 
            campaign AS utm_campaign, COUNT(campaign) AS count
          FROM 
            website_pageviews AS wp
            LEFT JOIN (
                SELECT 
                    *, CASE WHEN utm_campaign IS NULL THEN 'NULL' ELSE utm_campaign END AS campaign
                FROM website_sessions
                        ) AS ws 
                ON wp.website_session_id = ws.website_session_id
          GROUP BY YEAR(wp.created_at), QUARTER(wp.created_at), campaign
          ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at), campaign
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quarter utm_campaign count
2012 1 brand 16
2012 1 nonbrand 3654
2012 1 NULL 49
2012 2 brand 970
2012 2 nonbrand 20230
2012 2 NULL 2039
2012 3 brand 1973
2012 3 nonbrand 31614
2012 3 NULL 3790
2012 4 brand 4181

Visulaization

Observation:

  • There are 4 main traffic sources: NULL, bsearch, search and socialbook. Where NULL represents accesses without a specified source

  • The gsearch traffic source accounts for a significantly larger amount of traffic than other traffic sources. It can be commented that customers often access the company’s website via Utm_source “gsearch”


2. Conversion performance analysis

a) Analysis of conversion performance from Funnels
query = "SELECT 
            YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quater, 
            pageview_url, COUNT(device_type) AS count
          FROM 
            website_pageviews AS wp
            LEFT JOIN website_sessions AS ws 
                ON wp.website_session_id = ws.website_session_id
          GROUP BY pageview_url, YEAR(wp.created_at), QUARTER(wp.created_at)
          HAVING pageview_url NOT IN ('/lander-1', '/lander-2', '/lander-3', '/lander-4', '/lander-5')
          ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at)
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quater pageview_url count
2012 1 /home 1879
2012 1 /products 743
2012 1 /the-original-mr-fuzzy 530
2012 1 /cart 228
2012 1 /shipping 156
2012 1 /billing 123
2012 1 /thank-you-for-your-order 60
2012 2 /cart 1473
2012 2 /shipping 979
2012 2 /billing 813

Visulaization

Observation:

  • Website traffic tends to increase steadily and steadily from 2012 to 2015

  • Funnels like /home, /products, and /the-original-mr-fuzzy usually get more traffic than other Funnels

  • Conversion rate from /products Funnel to orders is quite good and stable

b) Analysis of the number of customers accessing the website during the day
query = "SELECT 
            YEAR(created_at) AS year, MONTH(created_at) AS month, DAY(created_at) AS day,
            COUNT(user_id) AS users
          FROM website_sessions
          GROUP BY YEAR(created_at), MONTH(created_at), DAY(created_at)
          ORDER BY YEAR(created_at), MONTH(created_at), DAY(created_at)
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year month day users
2012 3 19 137
2012 3 20 161
2012 3 21 191
2012 3 22 177
2012 3 23 156
2012 3 24 74
2012 3 25 73
2012 3 26 175
2012 3 27 163
2012 3 28 165

Visulaization

Observation:

  • The number of customers accessing the website tends to increase gradually from 2012 to 2015

  • There are 3 periods when the number of customers accessing the website spikes - That is Black Friday

c) Analysis of sessions, orders and CVR by quarter
query = "SELECT
            YEAR(ws.created_at) year, QUARTER(ws.created_at) quater, 
            COUNT(ws.website_session_id) AS session,
            COUNT(order_id) AS orders,
            COUNT(order_id)/COUNT(ws.website_session_id) AS CVR
          FROM website_sessions AS ws
            LEFT JOIN orders AS o
                ON ws.website_session_id = o.website_session_id
          GROUP BY YEAR(ws.created_at), QUARTER(ws.created_at)
          ORDER BY YEAR(ws.created_at), QUARTER(ws.created_at)
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quater session orders CVR
2012 1 1879 60 0.0319
2012 2 11433 347 0.0304
2012 3 16892 684 0.0405
2012 4 32266 1495 0.0463
2013 1 19833 1273 0.0642
2013 2 24745 1718 0.0694
2013 3 27663 1840 0.0665
2013 4 40540 2616 0.0645
2014 1 46779 3069 0.0656
2014 2 53129 3848 0.0724

Visulaization

Observation:

  • Volume of Website sessions and orders tend to increase gradually from 2012 to 2015

  • Conversion rates typically range from about 0.03 to 0.9, which is between 3% and 9% of sessions converted into orders. This ratio is considered good and acceptable throughout the observation range


3. Revenue and Margin analysis

query = "SELECT 
            YEAR(created_at) year, QUARTER(created_at) quater,
            SUM(price_usd) AS revenue,
            SUM(price_usd - cogs_usd) AS margin
          FROM orders AS o
          GROUP BY YEAR(created_at), QUARTER(created_at)
          ORDER BY YEAR(created_at), QUARTER(created_at)
          LIMIT 10;"

result = dbGetQuery(con, query)

table_html = kable(result, format = "html") %>%
              kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))

table_html
year quater revenue margin
2012 1 2999.40 1830.0
2012 2 17346.53 10583.5
2012 3 34193.16 20862.0
2012 4 74735.05 45597.5
2013 1 66377.27 40744.5
2013 2 88492.83 54230.5
2013 3 95241.53 58370.5
2013 4 143136.24 88251.0
2014 1 190771.14 120197.0
2014 2 247711.95 156481.5

Visulaization

Observation:

  • Revenue and Margin tends to increase gradually from 2012 to 2015

  • The company usually has the highest revenue and profit in the last quarters of a year