CSV Files
select *
from supply_chain_data
SQL queries are written in RMarkdown.
Problem: I would like to understand the customer
demographic, and get a percentage of each.
Query Process: Make a sub-query labeled as temp, then create
a new column call [Total_Customers], which is a count aggregation of the
entire data set, then use the OVER() clause to place the aggregated
result in all the rows.
Call onto the subquery temp (with the
from clause), group by [Customer_demographics]. Do a count aggregation
that will be grouped by [Customer_demographics], multiply by 100 divided
by the aggregation MAX() of column [Total_Customers] to get the
percentage.
select
Customer_demographics
,(count(Customer_demographics)*100/max(Total_Customers)) as Percentage
from
(
Select
Customer_demographics
,count(Customer_demographics) over () as Total_Customers
from supply_chain_data
) as temp
group by
Customer_demographics
Conclusion: Table gives a percentage of each customer
demographic, if we ignore the unknown factor, we can assume majority of
customers are female.
Problem: Lets see which [Product_type] produces the most
revenue, and their average manufacturing cost.
Query Process:Grouped by [Product_type], then take the sum of the [Revenue_generated] and average() of [Manufacturing_costs]. Ordered by Revenue_generated.
select
Product_type
,sum(Revenue_generated) as Revenue_generated
,avg(Manufacturing_costs) as Avg_Manufac_costs
from
supply_chain_data
group by
Product_type
order by
2 desc
Conclusion: Skincare seems to produce the most
revenue.
Problem: Lets check shipping carriers and their average
costs.
Query Process: Group by [Shipping_carriers], take average() of [Shipping_costs].
select
Shipping_carriers
,round(avg(Shipping_costs),2) as avg_Shipping_costs
from
supply_chain_data
group by
Shipping_carriers
order by
2 asc
Conclusion: Carrier B has a lower average shipping
cost.
Problem: I want to know who is at fault when it comes
to defect rates, is it the Location or the Supplier.
Query Process: Created a first cte1, where
we give the AVG() defect rate of each Location using partition by
clause(). And the AVG() defect rate of each Supplier using partition by
() clause.
Create a second cte2. From cte1,
create a case statement, if the [location_defect_rate] is > than
[supplier_defect_rate], we label that as a ‘Bad Location’, else ‘Bad
Supplier’.
Create a third cte3. Group by
Location, Supplier_name, label. Then get the max defect rate of
supplier_defect_rate, and location_defect_rate to compare Location
versus Supplier.
/*
Gives the average defect rate of Locations, and average defect rate of all Suppliers.
*/
with cte1 as(
Select
Location
,Supplier_name
,Defect_rates
,round(avg(Defect_rates) over(partition by Location),2) as location_defect_rate
,round(avg(Defect_rates) over (partition by Supplier_name),2) as supplier_defect_rate
from
supply_chain_data
)
/*
Create a case statement where if the location or the supplier has a higher defect rate
*/
, cte2 as(
select
Location
,Supplier_name
,supplier_defect_rate
,location_defect_rate
,Case
when location_defect_rate > supplier_defect_rate then 'Bad Location'
else 'Bad Supplier'
end as label
from cte1
)
/*
Group by Location, Supplier_name, label to compare Location and Supplier.
*/
, cte3 as(
select
Supplier_name
,Location
,max(supplier_defect_rate) supplier_defect_rate
,max(location_defect_rate) location_defect_rate
, label
from cte2
group by
Location
, Supplier_name
, label
)
select *
from cte3
Order by
1,2
Here shows the list of locations the manufacturing supplies come from and their Supplier number, as well as this average supplier defect rate and location defect rate.
Here is a count of how many [Bad Location], and
[Bad Suppliers].
/*
Gives the average defect rate of Locations, and average defect rate of all Suppliers.
*/
with cte1 as(
Select
Location
,Supplier_name
,Defect_rates
,round(avg(Defect_rates) over(partition by Location),2) as location_defect_rate
,round(avg(Defect_rates) over (partition by Supplier_name),2) as supplier_defect_rate
from
supply_chain_data
)
/*
Create a case statement where if the location or the supplier has a higher defect rate
*/
, cte2 as(
select
Location
,Supplier_name
,supplier_defect_rate
,location_defect_rate
,Case
when location_defect_rate > supplier_defect_rate then 'Bad Location'
else 'Bad Supplier'
end as label
from cte1
)
, cte3 as(
select
Supplier_name
,Location
,max(supplier_defect_rate) supplier_defect_rate
,max(location_defect_rate) location_defect_rate
, label
from cte2
group by
Location
, Supplier_name
, label
)
select
count(case when label = 'Bad Location' then 1 else null end) as Bad_Location_Count
,count(case when label = 'Bad Supplier' then 1 else null end) as Bad_Supplier_Count
from cte3
Conclusion: There are 17 rows where the Suppliers had a
higher average defect rate than the location, and 8 rows where the
Location has a higher average defect rate than the Supplier.