kpmg-combined.knit

SQL Queries and Analysis

SQL queries are written in RMarkdown.

Libraries used
- library(odbc)
- library(DT)

This is what our final table will look like. Transactions will be the main table that will be analyzed. It has unique [customer_id] and [transaction_id]s as well as the [listing_price], which will subtract [standard] cost to get the “profit”.

I will be left joining the [CustomerAddress] table by the customer_id to Transactions.customer_id.

And the [CustomerDemographic] table by the customer_id to Transactions.customer_id.

This is the data preview.

select 
    *
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id

Problem: What is the total number of canceled vs. approved orders within the dataset?

Query Process: Group by the [order_status], and count(order_status).

select 
    order_status
    , count(order_status) quantity
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
group by order_status

Conclusion: 19821 Approved Orders, and 179 canceled orders.

Problem: Lets calculate the total profit Sprocket Central has gained.

Query Process: Use Transaction table with the joins created earlier, filter for [order_status]=‘Approved’ orders only, leaving out ‘Canceled’ orders. Round the sum of [list_price] - [standard_cost] to 0 decimals.

select 
    round(sum(list_price - isnull(standard_cost,0)),0) Total_Profit
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'

Conclusion: 11047456$ was the total amount of profit within the dataset.

Problem: What brand type was the most popular. List the profit earned for each brand made.

Query Process: Count the total number of [customer_id], then use over() to create a new column called “Total_Customers” that has the total number of customers. Make this query into a common table expression named cte1.

Query Process: Group by the [brand], do a count() of ‘Approved’ orders, and do a sum of [list_price]-[standard_cost].

select 
    brand
    ,Count(*) Total_Customers
    ,round(sum(list_price-isnull(standard_cost,0)),0) Profit
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'
group by
    brand
order by
    Total_Customers desc

Conclusion: Solex seems to be the most popular brand, but WeareA2B seems to produce the most profit.

Problem: Lets give a percentage of the total profit each state produces.

Query Process: Count the total number of [customer_id], then use over() to create a new column called “Total_Profit” that has the total profit. Make this query into a common table expression named cte1.

Query from cte1, group by [state], do another aggregation * 100 / max[Total_Customers], name it Percentage.

with cte1 as(
select
    isnull(state,'Unknown') as state
    ,list_price
    ,standard_cost
        /*This Total_Profit column will be used to calculate the percentage
    in the next query.*/
    ,sum(list_price - isnull(standard_cost,0)) over () Total_Profit
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'
)

select 
    state
    ,round(((sum(list_price - isnull(standard_cost,0)))*100.0/max(Total_Profit)),2) as Percentage 
from 
    cte1
group by 
    state
order by Percentage desc

Conclusion: Majority of the profit comes from New South Wales.

Problem: For each month lets give the percentage increase in profit compared to the previous month.

Query Process: Group by month, and calculate the monthly profit, sum of [list_price] - [standard_cost], that new column will be “monthly_profit”. Then use the lag() function to accesses the value from the previous row of the new column “monthly_profit”, name it “prev_monthly_profit”. Name the query as cte1.

Query from cte1, create new column called “percent_change”. The equation for percent change is ([monthly_profit] - [prev_monthly_profit])/[prev_monthly_profit]. Standard percent change equation.

with cte1 as(
select
    datepart(MONTH,transaction_date) month
    ,sum(list_price - isnull(standard_cost,0)) monthly_profit
    ,lag(sum(list_price - isnull(standard_cost,0))) 
    over (order by datepart(year,transaction_date), datepart(MONTH,transaction_date)) prev_monthly_profit
    
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'
group by
    datepart(MONTH,transaction_date),datepart(year,transaction_date)
)

select
    month
    ,prev_monthly_profit
    ,monthly_profit
    ,round(((100*(monthly_profit-prev_monthly_profit))/prev_monthly_profit),2) percent_change
from cte1

Conclusion: Month of October, 2017, seems to have the highest profit earned, and the highest percent increase in revenue.

Problem: What are the top 10 customers.

Query Process: Group by [Transactions.customer_id], [first_name], [last_name], [job_title]. Order by the profit_earned descending.

select top 10
    Transactions.customer_id
    ,first_name
    ,last_name
    ,sum(list_price - isnull(standard_cost,0)) profit_earned
    ,job_title
    
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'
group by
    Transactions.customer_id
    ,first_name
    ,last_name
    ,job_title
order by 
    profit_earned desc

Conclusion: Seems a recruiting manager named Tye Doohan was the highest paying customer.

Problem: Label our customers into different age groups: young, middle-aged, elderly.

Query Process: Use a case statement to create a column called [Age_Label] to label our customers that are Young if the [Todays_date]-[DOB] <= 45, >= 65 as Middle-Aged, else then Elderly. Create a new column that count the total amount of customers then use over() clause. Make it cte1.

New query from cte1, group by [Age_Label], make a new column call Age_Percent that counts the total number of each age group then divide by the total number of customers to get a percentage.

with cte1 as(
select
    count(*) over () total_customers
    ,case when datediff(year,DOB,CAST( GETDATE() AS Date )) <= 45 then 'Young'
    when datediff(year,DOB,CAST( GETDATE() AS Date )) <= 65 then 'Middle Aged'
    else 'Elderly' 
    end Age_Label
from 
    Transactions
    left join CustomerAddress 
        on CustomerAddress.customer_id = Transactions.customer_id
    left join CustomerDemographic
        on CustomerDemographic.customer_id = Transactions.customer_id
where
    order_status = 'Approved'
)
select
    Age_Label
    ,round((100.*count(Age_Label))/max(total_customers),2) Age_Percent
from
    cte1
group by
    Age_Label
order by
    Age_Percent desc

Conclusion: Majority are Young and under 45.

Introduction

Project Recap

Schema

SQL Queries and Analysis

Tableau Visualization