Personal Project: E-commerce Database analysis for Maven Fuzzy
Factory
In this Case Study, I will assume the role of an E-commerce Database
Analyst at a fictional company called Maven Fuzzy Factor: an online
retail company. The company was established in 2012 and has been
operating for 3 years.
I have been assigned the task of analyzing data, providing insights,
and optimizing marketing channels. Additionally, I will measure and
experiment with website conversion performance and utilize data to
understand the impact of launching new products.
I will follow the steps of the data analysis process: Ask, Prepare,
Process, Analyze, Share, and Act.
I. Ask
Business Task
Website session analysis
Websites session analysis from device type
Conversion performance analysis
Revenue and Margin analysis
II. Prepare
Resource
I utilize the company’s Database to perform analysis and identify
tends over 3 years from 19/03/2012 to 31/12/2014
Data utilize from Maven Fuzzy
Factory
III. Process
- Maven Fuzzy Factory Database Overview

Observation:
Since the tables in the Maven_fuzzy_factory Database all have a
Primary Key, there are no duplicated values in those tables.
In the process of data analysis will manipulate many tables in
the Maven_fuzzy_factory Database, so the cleaning process will be done
together with the analyze step
Connect to MySQL in R language
Install Packages
install.packages("RMySQL")
install.packages("DBI")
install.packages("kableExtra")
In there:
Package “RMySQL” allow connect to MySQL
Package “DBI” allow interact with Database
Package “kableExtra” allows to retrieve query results in the form
of tables on HTML screen
Utilize library of 3 packages
library(DBI)
library(RMySQL)
library(kableExtra)
IV. Analyze and Share
1. Website traffic analysis
a) Websites session analysis from device type
query = "SELECT
YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter,
device_type, COUNT(device_type) AS count_devive
FROM
website_pageviews AS wp
LEFT JOIN website_sessions AS ws
ON wp.website_session_id = ws.website_session_id
GROUP BY device_type, YEAR(wp.created_at), QUARTER(wp.created_at)
ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at)
LIMIT 10;"
result = dbGetQuery(con, query)
table_html = kable(result, format = "html") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
table_html
|
year
|
quarter
|
device_type
|
count_devive
|
|
2012
|
1
|
mobile
|
1191
|
|
2012
|
1
|
desktop
|
2528
|
|
2012
|
2
|
desktop
|
17356
|
|
2012
|
2
|
mobile
|
5883
|
|
2012
|
3
|
desktop
|
29988
|
|
2012
|
3
|
mobile
|
7389
|
|
2012
|
4
|
desktop
|
58756
|
|
2012
|
4
|
mobile
|
13392
|
|
2013
|
1
|
mobile
|
9770
|
|
2013
|
1
|
desktop
|
37889
|
Visulaization
Observation:
The website sessions from desktop is higher than mobile in all
years and quarters.
The difference between desktop and mobile session can represent
that customers frequently access the company’s website using desktop
devices
b) Analysis of sessino from Utm_Source and
Utm_Campaign
utm_source
query = "SELECT
YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter,
source AS utm_source, COUNT(source) AS count
FROM
website_pageviews AS wp
LEFT JOIN (
SELECT
*, CASE WHEN utm_source IS NULL THEN 'NULL' ELSE utm_source END AS source
FROM website_sessions
) AS ws
ON wp.website_session_id = ws.website_session_id
GROUP BY YEAR(wp.created_at), QUARTER(wp.created_at), source
ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at), source
LIMIT 10;"
result = dbGetQuery(con, query)
table_html = kable(result, format = "html") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
table_html
|
year
|
quarter
|
utm_source
|
count
|
|
2012
|
1
|
bsearch
|
2
|
|
2012
|
1
|
gsearch
|
3668
|
|
2012
|
1
|
NULL
|
49
|
|
2012
|
2
|
bsearch
|
166
|
|
2012
|
2
|
gsearch
|
21034
|
|
2012
|
2
|
NULL
|
2039
|
|
2012
|
3
|
bsearch
|
4651
|
|
2012
|
3
|
gsearch
|
28936
|
|
2012
|
3
|
NULL
|
3790
|
|
2012
|
4
|
bsearch
|
14417
|
utm_campaign
query = "SELECT
YEAR(wp.created_at) AS year, QUARTER(wp.created_at) AS quarter,
campaign AS utm_campaign, COUNT(campaign) AS count
FROM
website_pageviews AS wp
LEFT JOIN (
SELECT
*, CASE WHEN utm_campaign IS NULL THEN 'NULL' ELSE utm_campaign END AS campaign
FROM website_sessions
) AS ws
ON wp.website_session_id = ws.website_session_id
GROUP BY YEAR(wp.created_at), QUARTER(wp.created_at), campaign
ORDER BY YEAR(wp.created_at), QUARTER(wp.created_at), campaign
LIMIT 10;"
result = dbGetQuery(con, query)
table_html = kable(result, format = "html") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
table_html
|
year
|
quarter
|
utm_campaign
|
count
|
|
2012
|
1
|
brand
|
16
|
|
2012
|
1
|
nonbrand
|
3654
|
|
2012
|
1
|
NULL
|
49
|
|
2012
|
2
|
brand
|
970
|
|
2012
|
2
|
nonbrand
|
20230
|
|
2012
|
2
|
NULL
|
2039
|
|
2012
|
3
|
brand
|
1973
|
|
2012
|
3
|
nonbrand
|
31614
|
|
2012
|
3
|
NULL
|
3790
|
|
2012
|
4
|
brand
|
4181
|
Visulaization
Observation:
There are 4 main traffic sources: NULL, bsearch, search and
socialbook. Where NULL represents accesses without a
specified source
The gsearch traffic source accounts for a significantly
larger amount of traffic than other traffic sources. It can be commented
that customers often access the company’s website via Utm_source
“gsearch”
3. Revenue and Margin analysis
query = "SELECT
YEAR(created_at) year, QUARTER(created_at) quater,
SUM(price_usd) AS revenue,
SUM(price_usd - cogs_usd) AS margin
FROM orders AS o
GROUP BY YEAR(created_at), QUARTER(created_at)
ORDER BY YEAR(created_at), QUARTER(created_at)
LIMIT 10;"
result = dbGetQuery(con, query)
table_html = kable(result, format = "html") %>%
kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover"))
table_html
|
year
|
quater
|
revenue
|
margin
|
|
2012
|
1
|
2999.40
|
1830.0
|
|
2012
|
2
|
17346.53
|
10583.5
|
|
2012
|
3
|
34193.16
|
20862.0
|
|
2012
|
4
|
74735.05
|
45597.5
|
|
2013
|
1
|
66377.27
|
40744.5
|
|
2013
|
2
|
88492.83
|
54230.5
|
|
2013
|
3
|
95241.53
|
58370.5
|
|
2013
|
4
|
143136.24
|
88251.0
|
|
2014
|
1
|
190771.14
|
120197.0
|
|
2014
|
2
|
247711.95
|
156481.5
|
Visulaization
Observation: