i. INTRODUCTION:
The global consumer market is expanding on a large scale, driven by rapid technological advancement and a growing working-class population in developing countries. According to the PwC Global Consumer Insights Pulse Survey 2023, “Consumers seek friction-less experiences in a world of disruptions”.
Although the consumer market faced hardship after the COVID-19 pandemic, it is recovering toward its pre-COVID state as people resume purchasing while remaining cautious about savings to some extent. The global consumer market can be divided into pre-COVID-19 and post-COVID-19 periods, since the e-commerce revolution has had a great impact on people and the economy. Some of the major consumer markets in the world are China, the USA, and India (statista.com, 2023).
i. a) Dataset - Sales Order Summary:
In this report, we analyze the sales order summary dataset, which contains information about consumer market sales in several countries across five regions: Africa, Asia Pacific, Europe, LATAM, and USCA. The products sold fall into three main categories: Furniture, Office supplies, and Technology.
The dataset fields can be grouped by data type for analysis:
Categorical nominal - City, State, Region, Market, Department, Division, ProductID
Categorical ordinal - Order Priority, ShipMode
Quantitative discrete - Quantity, Returns
Quantitative continuous - Product Price, Shipping cost, Loss per Return
With these fields we can analyze overall sales, sales in each market, consumer insights for each market, loss versus sales, and the variation of shipping cost across markets.
i. b) Descriptive Statistics:
Descriptive statistics is
the branch of statistics that involves collecting, organizing,
summarizing, and presenting data in the most meaningful format for easy
understanding. The results are displayed in the form of tables, charts,
graphs, etc. For instance, a government census report about the
national population would use descriptive statistics
(Bluman, 2018).
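As a minimal sketch of what such a summary looks like in R, the snippet below computes the mean, median, and standard deviation of a small vector. The values in `ship_cost` are hypothetical and are not taken from the sales order summary dataset.

```r
# Hypothetical shipping costs for six orders (illustrative values only)
ship_cost <- c(4.2, 6.8, 8.5, 9.1, 12.4, 55.0)

# Summarize the sample with three common descriptive statistics
desc_stats <- c(
  mean   = mean(ship_cost),
  median = median(ship_cost),
  sd     = sd(ship_cost)
)
print(round(desc_stats, 2))
```

A single large value pulls the mean well above the median here, which is one reason to report both.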
i. c) Inferential Statistics:
Inferential statistics, the
other main component of statistics, involves generalizing results from
collected samples to a population, performing hypothesis tests and
estimations, determining relationships between variables, and making
predictions.
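A small sketch of inference in R follows, assuming a simulated sample rather than the report's dataset. It runs a one-sample t-test against a hypothetical reference value of 10 and extracts the confidence interval for the population mean; the sample size, mean, and spread are all made up for illustration.

```r
# Simulated sample of 30 shipping costs (hypothetical; not the report's data)
set.seed(42)
sample_cost <- rnorm(30, mean = 14, sd = 5)

# Test whether the population mean differs from a hypothetical value of 10
test_result <- t.test(sample_cost, mu = 10)

# 95% confidence interval for the population mean, estimated from the sample
ci <- test_result$conf.int
print(test_result$p.value)
print(ci)
```

Real shipping costs are often skewed, so the normal sample used here is a simplification.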
i. d) R Script versus R Markdown:
R Script: An R script is a plain text file of R code that we can run
in RStudio, with the output shown in the console. The code can be saved
in a “.R” file; however, a script on its own does not combine narrative
text, code, and output into a formatted report.
R Markdown: In an R Markdown file, we
can keep all our work in one “.Rmd” source and render it to different
formats such as HTML, PDF, and Word. The rendered document can hold all
kinds of content from the R Markdown file, such as text, code, graphs, and tables.
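The difference can be sketched as follows, assuming the `rmarkdown` package and pandoc may or may not be installed. The file name `example_report.Rmd` and its contents are hypothetical; the render step is skipped when the tooling is unavailable.

```r
# Write a tiny, hypothetical R Markdown document to a temporary file
fence <- strrep("`", 3)  # builds the three-backtick chunk delimiter
rmd_path <- file.path(tempdir(), "example_report.Rmd")
writeLines(c(
  "---",
  "title: \"Example report\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "summary(c(1, 2, 3))",
  fence
), rmd_path)

# Render to HTML only if rmarkdown and pandoc are available on this machine
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, output_format = "html_document", quiet = TRUE)
  print(out_file)
}
```

Changing `output_format` to `"word_document"` or `"pdf_document"` would target the other formats mentioned above; PDF output additionally needs a LaTeX installation.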
ii. ANALYSIS SECTION:
Task 1:
Description: This section presents basic descriptive
statistics of the given dataset “sales order summary” and
market-specific consumer sales.
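# loading the packages used in this report
library(knitr)
library(kableExtra)
library(dplyr)
library(RColorBrewer)
# computing descriptive statistics for shipping cost and product price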
mean_shipcost <- mean(salesordersummary$Shipping_Cost_Each)
median_shipcost <- median(salesordersummary$Shipping_Cost_Each)
sd_shipcost <- sd(salesordersummary$Shipping_Cost_Each)
mean_prodcost <- mean(salesordersummary$Product_Price)
median_prodcost <- median(salesordersummary$Product_Price)
sd_prodcost <- sd(salesordersummary$Product_Price)
# row and column labels for the summary table
colname <- c("Product cost", "Ship cost")
rowname <- c("Mean", "Median", "Standard deviation")
# building the 3 x 2 table as a matrix, filled row by row
salessummarytable <- matrix(c(mean_prodcost, mean_shipcost, median_prodcost, median_shipcost, sd_prodcost, sd_shipcost), nrow = 3, byrow = TRUE, dimnames = list(rowname, colname))
# applying kable()
kable(salessummarytable, digits = 2, align = "c", format = "html")%>%
kable_styling(stripe_color = "blue", bootstrap_options = "striped", table.envir = "table", protect_latex = TRUE)
| | Product cost | Ship cost |
|---|---|---|
| Mean | 548.90 | 14.32 |
| Median | 501.35 | 8.59 |
| Standard deviation | 293.96 | 13.98 |
# sum shipping cost incurred in each market
sumshipcost_market <- tapply(salesordersummary$Shipping_Cost_Each, salesordersummary$Market, sum)
# presenting bar graph of sum of shipping cost in each market
par(mai=c(1.8,1.8,2,2),mar=c(4.5,4,2,1))
sumshipcostpermarket_bar <- barplot(sumshipcost_market, main = "Sum of shipping cost per product in the markets", col = brewer.pal(8,"Blues"), las = 1, xlab = "Markets", ylab = "Shipping cost", ylim = c(0,max(sumshipcost_market)*1.25))
text(x = sumshipcostpermarket_bar, y = sumshipcost_market,
labels = round(sumshipcost_market, 2), pos = 3, cex = 0.7)
# presenting pie chart of the sum of shipping cost in each market
# pie() draws no axes, so the axis arguments (las, xlab, ylab, ylim) are dropped
pie(sumshipcost_market, main = "Sum of shipping cost per product in the markets", col = brewer.pal(8,"Pastel1"))
Observation: The charts above visualize the sum of shipping costs
in each market. The Asia Pacific market ranks first with a total
shipping cost of 3931.4, followed by the Europe market with a total
shipping cost of 3607.02.
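The grouping step behind these totals can be sketched on a toy data frame. The rows below are hypothetical; only the column names `Market` and `Shipping_Cost_Each` follow the report's dataset.

```r
# Toy orders (hypothetical values) with the same column names as the report
orders <- data.frame(
  Market = c("Africa", "Europe", "Europe", "USCA", "Africa", "USCA"),
  Shipping_Cost_Each = c(20.5, 8.0, 12.0, 5.5, 30.0, 4.5)
)

# tapply() splits the costs by market and applies sum() to each group
total_by_market <- tapply(orders$Shipping_Cost_Each, orders$Market, sum)
print(total_by_market)

# Name of the market with the largest total
top_market <- names(which.max(total_by_market))
print(top_market)
```

A total depends on how many orders a market has as well as on how expensive each shipment is, so totals and averages answer different questions.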
Task 2:
Description: Presenting a box plot and a histogram to
display the distribution of shipping costs incurred in the sales.
# stacking the two plots in one column; mfcol must be passed to par() directly
par(mfcol = c(2, 1), mar = c(4.5, 4, 2, 1))
# Calculate mean and median
mean_value <- mean(salesordersummary$Shipping_Cost_Each)
median_value <- median(salesordersummary$Shipping_Cost_Each)
hist(salesordersummary$Shipping_Cost_Each, main = "Histogram of shipping cost per product - distribution", col = brewer.pal(9,"Greens"), xlab = "Shipping cost per product", ylim = c(0,max(salesordersummary$Shipping_Cost_Each)*6))
# Signifying average to the chart
abline(v = mean_value, col = "red", lwd = 2)
text(y = max(salesordersummary$Shipping_Cost_Each)*5.5, x = mean_value,
paste("Mean:", round(mean_value, 2)), col = "red")
# Signifying median to the chart
abline(v = median_value, col = "blue", lwd = 2)
text(y = max(salesordersummary$Shipping_Cost_Each)*5, x = median_value,
paste("Median:", round(median_value, 2)), col = "blue")
# presenting boxplot of sales order summary with respect to shipping cost
boxplot(salesordersummary$Shipping_Cost_Each, main = "Box plot of shipping cost per product - distribution", col = brewer.pal(7,"YlOrRd"), las = 1, horizontal = T, cex = 0.55, cex.main = 0.7, cex.lab = 0.7, cex.axis = 0.6, ylim = c(0,max(salesordersummary$Shipping_Cost_Each)*1.05))
# denoting mean in the box plot
abline(col = "red", lwd = 2, v = mean_value)
# a single horizontal box sits at y = 1, so the labels go just above and below it
text(y = 1.4, x = mean_value,
paste("Mean:", round(mean_value, 2)), col = "red", pos = 4)
# denoting median in the box plot
abline(col = "blue", lwd = 2, v = median_value)
text(y = 0.6, x = median_value,
paste("Median:", round(median_value, 2)), col = "blue", pos = 4)
Observation: The histogram and box plot show the distribution of
shipping costs across sales. The median is 8.59 and the mean is 14.32;
a mean above the median suggests a right-skewed distribution. The box
plot shows several outliers, which correspond to shipping costs
between 45 and 60.
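The outliers drawn by a box plot follow the usual 1.5 × IQR rule, which can be sketched by hand. The values below are hypothetical, and `quantile()` is used for the quartiles; `boxplot()` itself uses hinges, which can differ slightly from quartiles on some samples.

```r
# Hypothetical shipping costs with two unusually large values
ship_cost <- c(3, 5, 6, 7, 8, 9, 10, 12, 14, 48, 57)

# Quartiles and the fences at 1.5 * IQR beyond them
q <- unname(quantile(ship_cost, c(0.25, 0.75)))
iqr <- q[2] - q[1]
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# Values outside the fences are flagged as outliers
outliers <- ship_cost[ship_cost < lower_fence | ship_cost > upper_fence]
print(c(lower = lower_fence, upper = upper_fence))
print(outliers)
```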
Task 3:
Description: Presenting a box plot of
shipping cost by market. In addition, we identify the market with the
highest shipping cost and the market with the lowest
shipping cost.
# boxplot to review shipping cost w.r.t. markets
boxplot(Shipping_Cost_Each ~ Market, data = salesordersummary, main = "Boxplot of market and shipment cost", col = brewer.pal(7,"Paired"), xlab = "Markets", ylab = "Shipping cost")
# finding the markets with the highest and the lowest maximum shipping cost
highshipcostperprod <- names(which.max(tapply(salesordersummary$Shipping_Cost_Each, salesordersummary$Market, max)))
lowshipcostperprod <- names(which.min(tapply(salesordersummary$Shipping_Cost_Each, salesordersummary$Market, max)))
Market with the highest maximum shipping cost: Africa
Market with the lowest maximum shipping cost: USCA
Task 4:
Description: Showing a bar chart that uses the
tapply function to determine the average shipping cost in each market,
and comparing it with the box plot interpreted in task 3.
#using tapply function
meanshipcost <- tapply(salesordersummary$Shipping_Cost_Each, salesordersummary$Market, mean)
#forming bar chart and assigning it to the object
barplotofshipcostmkt <- barplot(meanshipcost, main = "Average ship cost per product in the markets", col = brewer.pal(8,"Blues"), las = 1, xlab = "Markets", ylab = "Shipping cost", ylim = c(0,max(meanshipcost)*1.25), cex.axis = 0.6, cex = 0.41, cex.lab = 0.65,cex.main = 0.85)
text(x = barplotofshipcostmkt, y = meanshipcost + 0.05 * max(meanshipcost),
labels = round(meanshipcost, 2), cex = 0.7, pos = 3)
Observation: Reviewing the above bar chart and comparing
it with the box plot presented in task 3, we observe that the
Africa market has the highest average shipping cost of all
the markets.
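Ranking markets by their maximum shipping cost (as in task 3) and by their average (as in this task) need not give the same answer. The sketch below uses hypothetical rows chosen so that the two rankings disagree; only the column names follow the report's dataset.

```r
# Toy orders (hypothetical values) where the two rankings disagree
orders <- data.frame(
  Market = c("Africa", "Africa", "Europe", "Europe", "USCA", "USCA"),
  Shipping_Cost_Each = c(20, 22, 2, 30, 3, 5)
)

# Average and maximum shipping cost per market
mean_by_market <- tapply(orders$Shipping_Cost_Each, orders$Market, mean)
max_by_market  <- tapply(orders$Shipping_Cost_Each, orders$Market, max)

highest_mean <- names(which.max(mean_by_market))
highest_max  <- names(which.max(max_by_market))
print(rbind(mean = mean_by_market, max = max_by_market))
print(c(highest_mean = highest_mean, highest_max = highest_max))
```

When both measures point to the same market, as the report finds for Africa, the conclusion is more robust than either measure alone.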
Task 5:
Description: This task compares
shipping cost across shipping methods for all markets and sales.
#assigning objects by pulling the ship method and ship cost data from the dataset
modeofship <- salesordersummary$ShipMode
shipcost <- salesordersummary$Shipping_Cost_Each
# forming box plot to understand the comparison
boxplot(shipcost~modeofship,
main="Representation of Ship mode Vs Ship cost",
ylab="Ship Cost",
xlab="Ship Mode",
col=brewer.pal(8,"Spectral"))
Observation: The box plot shows that the shipping cost of the
“same day delivery” method is the highest, followed in order by
first class, second class, and standard class.
Task 6:
Description: Adding a new column to the
dataset and saving the result as a new dataset.
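The code for this task is not shown in the report. A minimal sketch of the step, assuming the new column is the product price multiplied by the quantity, could look like the following. The toy data frame `orders` and the formula are assumptions; only the names `Total_salesvalue`, `Product_Price`, and `Quantity` come from the report.

```r
library(dplyr)

# Toy orders (hypothetical values) standing in for the full dataset
orders <- data.frame(
  Product_Price = c(100, 250.5),
  Quantity = c(2L, 4L)
)

# Assumed definition: total sales value = unit price * quantity sold
orders1 <- orders %>%
  mutate(Total_salesvalue = Product_Price * Quantity)
print(orders1)
```

Assigning the result to a new object leaves the original data frame unchanged, which matches the report's use of a separate dataset in the next task.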
Task 7:
Description: Using the new dataset created in
task 6, which adds a column for total sales, this step identifies
the market with the highest sales value.
salesordersummary1 %>%
group_by(Market) %>%
summarise(Total_salesvalue = sum(Total_salesvalue)) %>%
filter(Total_salesvalue == max(Total_salesvalue)) %>%
kable(align = "c", digits = 2)%>%
kable_styling(full_width = NULL, stripe_color = "brown", table.envir = "table", protect_latex = TRUE, bootstrap_options = c("striped", "hover"))
| Market | Total_salesvalue |
|---|---|
| Asia Pacific | 24704625 |
Observation: The Asia Pacific market has the highest total sales, at 24,704,625 (about 2.47 × 10^7).
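Large totals are easier to read with thousands separators than in the default numeric output. A small sketch using base R's `formatC()` on the Asia Pacific total from the table above:

```r
# Total sales value for Asia Pacific, as shown in the table above
total_sales <- 24704625

# Format as a whole number with comma thousands separators
formatted <- formatC(total_sales, format = "d", big.mark = ",")
print(formatted)
```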
Task 8:
Description: A self-posed question to analyze the
dataset using a combination of three functions: mutate(), filter(),
and group_by().
Question: In the African market, which segment has the highest
total loss due to returns in the furniture department? This
question aims to quantify the loss the African consumer market incurs
from returns in the furniture department.
Market - Africa
Department - Furniture
# creating new object by adding additional column to the data set as "Total_lossreturns"
salesordersummary2 <- salesordersummary %>%
mutate(Total_lossreturns = Loss_Per_Return * Returns)
# forming the table to figure out the segment in the african market that incurs high loss due to returns for furniture department
salesordersummary2 %>%
filter(Market == "Africa", Department == "Furniture") %>%
group_by(Segment) %>%
summarise(Total_lossreturns = sum(Total_lossreturns)) %>%
filter(Total_lossreturns == max(Total_lossreturns)) %>%
kable(align = "c", digits = 2)%>%
kable_styling(full_width = NULL, stripe_color = "blue", table.envir = "table", protect_latex = TRUE, bootstrap_options = c("striped", "hover"))
| Segment | Total_lossreturns |
|---|---|
| Consumer | 21754.81 |
Observation: To answer the question, I ran R code using
mutate(), filter(), and group_by(). In the African market, the Consumer
segment has the highest total loss due to returns in the furniture
department.
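A variant of this pipeline, sketched on hypothetical rows, ranks every segment instead of keeping only the top one, which shows how far ahead the leading segment is. The data values are made up; the column names follow the report's dataset.

```r
library(dplyr)

# Toy returns data (hypothetical values) with the report's column names
returns <- data.frame(
  Market = c("Africa", "Africa", "Africa", "Africa", "Europe"),
  Department = c("Furniture", "Furniture", "Furniture", "Technology", "Furniture"),
  Segment = c("Consumer", "Consumer", "Corporate", "Consumer", "Consumer"),
  Loss_Per_Return = c(10, 6, 20, 5, 8),
  Returns = c(3, 2, 1, 2, 4)
)

# Same steps as above, but arrange() keeps all segments in ranked order
loss_by_segment <- returns %>%
  mutate(Total_lossreturns = Loss_Per_Return * Returns) %>%
  filter(Market == "Africa", Department == "Furniture") %>%
  group_by(Segment) %>%
  summarise(Total_lossreturns = sum(Total_lossreturns)) %>%
  arrange(desc(Total_lossreturns))
print(loss_by_segment)
```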
Conclusion: In summary, the sales order summary dataset,
which comprises different types of data, helps us analyze
shipping costs, return losses, product prices, consumer market strengths
across the world, and consumer interest in segments or departments. It
is notable that the “Same day” shipping method incurs the highest
shipping costs. In addition, the Asia Pacific market has the highest
sales compared to the other markets.
Acknowledgement: I would like to thank my professor, Diana Chiluiza, PhD, who helped me understand the fundamentals of analytics through great and interactive class sessions.
Appendix: An R Markdown file has been attached to this report. The name of the file is “FinalProject_Rmarkdown.rmd”.