library(readxl)
mydata <- read_excel("C:/Users/pauli/Downloads/Adidas US Sales Datasets.xlsx")
head(mydata)
## # A tibble: 6 × 13
## Retai…¹ Retai…² `Invoice Date` Region State City Product Price…³ Units…⁴
## <chr> <dbl> <dttm> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Foot L… 1185732 2020-01-01 00:00:00 North… New … New … Men's … 50 1200
## 2 Foot L… 1185732 2020-01-02 00:00:00 North… New … New … Men's … 50 1000
## 3 Foot L… 1185732 2020-01-03 00:00:00 North… New … New … Women'… 40 1000
## 4 Foot L… 1185732 2020-01-04 00:00:00 North… New … New … Women'… 45 850
## 5 Foot L… 1185732 2020-01-05 00:00:00 North… New … New … Men's … 60 900
## 6 Foot L… 1185732 2020-01-06 00:00:00 North… New … New … Women'… 50 1000
## # … with 4 more variables: `Total Sales` <dbl>, `Operating Profit` <dbl>,
## # `Operating Margin` <dbl>, `Sales Method` <chr>, and abbreviated variable
## # names ¹Retailer, ²`Retailer ID`, ³`Price per Unit`, ⁴`Units Sold`
mydata1 <- mydata[,c(-2, -3, -4, -6, -12)]
head(mydata1)
## # A tibble: 6 × 8
## Retailer State Product Price…¹ Units…² Total…³ Opera…⁴ Sales…⁵
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 Foot Locker New York Men's Street Foo… 50 1200 60000 300000 In-sto…
## 2 Foot Locker New York Men's Athletic F… 50 1000 50000 150000 In-sto…
## 3 Foot Locker New York Women's Street F… 40 1000 40000 140000 In-sto…
## 4 Foot Locker New York Women's Athletic… 45 850 38250 133875 In-sto…
## 5 Foot Locker New York Men's Apparel 60 900 54000 162000 In-sto…
## 6 Foot Locker New York Women's Apparel 50 1000 50000 125000 In-sto…
## # … with abbreviated variable names ¹`Price per Unit`, ²`Units Sold`,
## # ³`Total Sales`, ⁴`Operating Profit`, ⁵`Sales Method`
The initial data set consists of 9648 observations of 13 variables. For further research the data set is trimmed down to 8 variables (4 of them are characteristic and 4 of them are numeric)
The unit of observation: Product line of Adidas by retailer in specific state in the US
The variables are the following:
-Retailer: name of a retailer -State: name of the state in the US -Product: type of Adidas merchandise -Price per unit: the price of a merchandise in US Dollars -Units sold: quantity of Adidas merchandise -Total sales: total sales in US Dollars -Operating profit: operating profit in US Dollars -Sales method: sales channel
The data set was found on the Kaggle website, the author is Heemali Chaudhari. Retrieved January 5th 2023, from https://www.kaggle.com/datasets/heemalichaudhari/adidas-sales-dataset .
The purpose of analyzing Adidas sales data is to identify the top-performing states, retailers, products, and sales methods in the US.Additionally, we will use regression analysis to examine the relationship between the price per unit and the quantity of units sold on total sales.
RQ: What are the key drivers of Adidas sales?
Let’s begin with some descriptive statistics.
summary(mydata1)
## Retailer State Product Price per Unit
## Length:9648 Length:9648 Length:9648 Min. : 7.00
## Class :character Class :character Class :character 1st Qu.: 35.00
## Mode :character Mode :character Mode :character Median : 45.00
## Mean : 45.22
## 3rd Qu.: 55.00
## Max. :110.00
## Units Sold Total Sales Operating Profit Sales Method
## Min. : 0.0 Min. : 0 Min. : 0 Length:9648
## 1st Qu.: 106.0 1st Qu.: 4065 1st Qu.: 1922 Class :character
## Median : 176.0 Median : 7804 Median : 4371 Mode :character
## Mean : 256.9 Mean :12455 Mean : 34425
## 3rd Qu.: 350.0 3rd Qu.:15864 3rd Qu.: 52063
## Max. :1275.0 Max. :82500 Max. :390000
library(psych)
## Warning: package 'psych' was built under R version 4.2.2
describe(mydata1[, c(-1, -2, -3, -8)])
## vars n mean sd median trimmed mad min
## Price per Unit 1 9648 45.22 14.71 45.00 44.83 14.83 7
## Units Sold 2 9648 256.93 214.25 176.00 221.49 134.92 0
## Total Sales 3 9648 12455.08 12716.39 7803.50 9958.75 6802.91 0
## Operating Profit 4 9648 34425.24 54193.11 4371.42 22270.22 5086.02 0
## max range skew kurtosis se
## Price per Unit 110 103 0.36 0.43 0.15
## Units Sold 1275 1275 1.46 1.70 2.18
## Total Sales 82500 82500 1.96 4.12 129.46
## Operating Profit 390000 390000 2.23 5.81 551.73
hist(mydata$`Price per Unit`, xlab = "Price of product", ylab = "Frequency", main = "Product's price distribution", breaks = seq(10, 50, 1000),
right = FALSE)
mean(mydata1$`Price per Unit`)
## [1] 45.21663
First, to look at the total sales we should check the distribution of price of product line. The average price of product is 45.22 as already mentioned before. By observing the graph above, we can see that the selling price has normal distribution, which is not skewed to the right or left.
mydata2 <- aggregate(mydata1$`Total Sales` ~ State, mydata1, sum)
colnames(mydata2)<-c("State","Total Sales")
state_with_highest_sales <- mydata2[order(-mydata2$`Total Sales`) , ]
head(state_with_highest_sales)
## State Total Sales
## 32 New York 8670464
## 5 California 8580508
## 9 Florida 7820589
## 43 Texas 6612371
## 40 South Carolina 3593112
## 18 Louisiana 3377031
Next, we will examine the states with the highest revenue of Adidas merchandise. The top three states by total sales were New York, California, and Florida, in that order.
mydata3 <- aggregate(mydata1$`Total Sales` ~ mydata1$`Sales Method`, mydata1, sum)
colnames(mydata3)<-c("Sales method","Total Sales")
sales_method_with_the_highest_total_sales <- mydata3[order(-mydata3$`Total Sales`) , ]
head(sales_method_with_the_highest_total_sales)
## Sales method Total Sales
## 2 Online 44965657
## 3 Outlet 39536618
## 1 In-store 35664375
Subsequently, we will analyze the total sales of Adidas merchandise by sales method. Online sales generated the most revenue, followed by outlet sales and in-store sales. The following two graphics show the percentage of total sales generated by each sales method.
sales_method <- c(35664375, 44965657, 39536618)
labels <- c("In-store", "Online", "Outlet")
colors <- c("red", "green", "yellow")
pie(sales_method, labels = labels, col = colors, main = "Pie chart of total sales by sales method")
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.2.2
##
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
##
## %+%, alpha
sales_method <- c(35664375/1000000, 44965657/1000000, 39536618/1000000)
labels <- c("In-store", "Online", "Outlet")
ggplot(mydata3, aes(x = labels , y = sales_method)) +
geom_col() +
ggtitle("Total sales by sales method in millions (US Dollars)") +
ylab("Total sales") +
xlab("Sales method") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
mydata4 <- aggregate(mydata1$`Total Sales` ~ mydata1$Retailer, mydata1, sum)
colnames(mydata4)<-c("Retailer","Total Sales")
retailer_with_highest_total_sales <- mydata4[order(-mydata4$`Total Sales`) , ]
head(retailer_with_highest_total_sales)
## Retailer Total Sales
## 6 West Gear 32409558
## 2 Foot Locker 29024945
## 4 Sports Direct 24616622
## 3 Kohl's 13512453
## 5 Walmart 10506085
## 1 Amazon 10096987
Additionally, we analyzed total sales of Adidas merchandise by retailer. West Gear had the highest total sales, followed by Foot Locker and Sports Direct. Below there are two graphic representations of that.
library(ggplot2)
df <- data.frame (
retailer = c("West Gear", "Foot Locker", "Sports Direct", "Kohl's", "Amazon", "Walmart"),
total_sales = c( 32409558, 29024945, 24616622, 13512453, 10506085, 10096987
))
colors <- c("red", "green", "blue", "yellow", "darkorange", "dodgerblue2"
)
pie(df$total_sales, labels = df$retailer, col = colors, main = "Total sales by retailer")
library(ggplot2)
library(scales)
##
## Attaching package: 'scales'
## The following objects are masked from 'package:psych':
##
## alpha, rescale
df <- data.frame (
retailer = c("West Gear", "Foot Locker", "Sports Direct", "Kohl's", "Amazon", "Walmart"),
total_sales = c( 32409558/1000000, 29024945/1000000, 24616622/1000000, 13512453/1000000, 10506085/1000000, 10096987/1000000
))
ggplot(df, aes(x = retailer , y = total_sales)) +
geom_col() +
ggtitle("Total sales per retailer in millions (US Dollars)") +
ylab("Total sales") +
xlab("Retailer") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
mydata5 <- aggregate(mydata1$`Total Sales` ~ mydata1$Product, mydata1, sum)
colnames(mydata5)<-c("Product","Total sales")
product_with_the_highest_sales <- mydata5[order(-mydata5$`Total sales`) , ]
head(product_with_the_highest_sales)
## Product Total sales
## 3 Men's Street Footwear 27680769
## 4 Women's Apparel 23870985
## 2 Men's Athletic Footwear 20577180
## 6 Women's Street Footwear 17201563
## 1 Men's Apparel 16520632
## 5 Women's Athletic Footwear 14315521
Next, we look at total sales by the type of Adidas product sold. Men’s Street Footwear generated the most revenue, followed by Women’s Apparel in second place and Men’s Athletic Footwear in third.
product_type <- data.frame(
product = c("Men's Street Footwear", "Women's Apparel", "Men's Athletic Footwear", "Women's Street Footwear", "Men's Apparel", "Women's Athletic Footwear"),
total_sales = c(27680769/1000000, 23870985/1000000, 20577180/1000000, 17201563/1000000, 16520632/1000000, 14315521/1000000)
)
library(ggplot2)
library(scales)
ggplot(product_type, aes(x = product , y = total_sales)) +
geom_col() +
ggtitle("Total sales per product in millions (US Dollars)") +
ylab("Total sales") +
xlab("Type of Product") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
mydata1$`Total Sales in thousand`<-(mydata$`Total Sales`/1000)
library(car)
## Warning: package 'car' was built under R version 4.2.2
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
scatterplot(x=mydata1$`Units Sold`, y=mydata1$`Total Sales in thousand`,
main = "Relationship between total sales and units sold",
xlab = "Units sold",
ylab = "Total Sales",
smooth = FALSE)
library(car)
scatterplot(x=mydata1$`Price per Unit`, y=mydata1$`Total Sales in thousand`,
main = "Relationship between total sales and average price per unit",
ylab = "Total Sales",
xlab ="Price" ,
smooth = FALSE)
reg<-lm(mydata1$`Total Sales`~mydata1$`Price per Unit`+mydata1$`Units Sold`,data=mydata1)
summary(reg)
##
## Call:
## lm(formula = mydata1$`Total Sales` ~ mydata1$`Price per Unit` +
## mydata1$`Units Sold`, data = mydata1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -17739.8 -1356.7 -453.8 1288.0 27820.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -12694.056 104.254 -121.8 <2e-16 ***
## mydata1$`Price per Unit` 274.618 2.258 121.6 <2e-16 ***
## mydata1$`Units Sold` 49.554 0.155 319.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3143 on 9645 degrees of freedom
## Multiple R-squared: 0.9389, Adjusted R-squared: 0.9389
## F-statistic: 7.411e+04 on 2 and 9645 DF, p-value: < 2.2e-16
Based on our economic instinct we expected price and units sold both significantly impact total sales, therefore, we conducted a regression analysis. We also created two scatter plots (above), which show a positive correlation between price and units sold as independent variables and total sales as the dependent variable. The scatter plots indicate that units sold have a greater impact on total sales. According to the regression function, if the average price per unit increases by one US Dollar, total sales increase on average by 274.62 US Dollars, all else being equal. In contrast, if the number of units sold increases by one, total sales on average increase by 50 units, all else being equal.