Author: Yaqi Hu


1 Research question (2.5 points)

1.1 Clear research question that can be tested statistically (1.5 points)

Are there differences in pricing between women’s and men’s clothing items at Zara?

1.2 Which variables need to be collected to answer your research question (1 point)

Two variables are needed: section, which indicates whether a clothing item is for women or men, and price, the price of each women’s and men’s clothing item at Zara.

2 Data (2.5 points)

2.1 Import of the data, presentation of the data using a function head, definition of the variables used (0.5 points)
mydata <- read.table("./zara.csv", header = TRUE, sep = ";", dec = ".")

head(mydata)
##   Product.ID Product.Position Promotion Product.Category Seasonal Sales.Volume
## 1     185102            Aisle        No         Clothing       No         2823
## 2     188771            Aisle        No         Clothing       No          654
## 3     180176          End-cap       Yes         Clothing      Yes         2220
## 4     112917            Aisle       Yes         Clothing      Yes         1568
## 5     192936          End-cap        No         Clothing      Yes         2942
## 6     117590          End-cap        No         Clothing       No         2968
##   brand                                                                 url
## 1  Zara       https://www.zara.com/us/en/basic-puffer-jacket-p06985450.html
## 2  Zara             https://www.zara.com/us/en/tuxedo-jacket-p08896675.html
## 3  Zara      https://www.zara.com/us/en/slim-fit-suit-jacket-p01564520.html
## 4  Zara       https://www.zara.com/us/en/stretch-suit-jacket-p01564300.html
## 5  Zara       https://www.zara.com/us/en/double-faced-jacket-p08281477.html
## 6  Zara https://www.zara.com/us/en/contrasting-collar-jacket-p06987331.html
##                sku                      name
## 1  272145190-250-2       BASIC PUFFER JACKET
## 2 324052738-800-46             TUXEDO JACKET
## 3 335342680-800-44      SLIM FIT SUIT JACKET
## 4 328303236-420-44       STRETCH SUIT JACKET
## 5  312368260-800-2       DOUBLE FACED JACKET
## 6  320298385-807-2 CONTRASTING COLLAR JACKET
##                                                                                                                                                                                              description
## 1          Puffer jacket made of tear-resistant ripstop fabric. High collar and adjustable long sleeves with adhesive straps. Welt pockets at hip. Adjustable hem with side elastics. Front zip closure.
## 2                               Straight fit blazer. Pointed lapel collar and long sleeves with buttoned cuffs. Welt pockets at hip and interior pocket. Central back vent at hem. Front button closure.
## 3                              Slim fit jacket. Notched lapel collar. Long sleeves with buttoned cuffs. Welt pocket at chest and flap pockets at hip. Interior pocket. Back vents. Front button closure.
## 4 Slim fit jacket made of viscose blend fabric. Notched lapel collar. Long sleeves with buttoned cuffs. Welt pocket at chest and flap pockets at hip. Interior pocket. Back vents. Front button closure.
## 5                                                             Jacket made of faux leather faux shearling with fleece interior. Tabbed lapel collar. Long sleeves. Zip pockets at hip. Front zip closure.
## 6                                             Relaxed fit jacket. Contrasting lapel collar and long sleeves with buttoned cuffs. Front pouch pockets. Interior pocket. Washed effect. Front zip closure.
##    price currency                 scraped_at   terms section
## 1  19.99      USD 2024-02-19T08:50:05.654618 jackets     MAN
## 2 169.00      USD 2024-02-19T08:50:06.590930 jackets     MAN
## 3 129.00      USD 2024-02-19T08:50:07.301419 jackets     MAN
## 4 129.00      USD 2024-02-19T08:50:07.882922 jackets     MAN
## 5 139.00      USD 2024-02-19T08:50:08.453847 jackets     MAN
## 6  79.90      USD 2024-02-19T08:50:09.140497 jackets     MAN
2.2 Definition of the unit of observation and the sample size (0.5 points)
  • Unit of observation: an item sold at Zara

  • Sample size: 252

  • Used variables:

      • price: Price of the product in USD

      • section: Specifies whether the product is intended for men or women

  • Other variables:

      • Product.ID: Identification number of each product

      • Product.Position: Position of the product in the store (e.g., Aisle, End-cap)

      • Promotion: Indicates whether the product is currently offered at a promotional price

      • Product.Category: Broad product group of the item

      • Seasonal: Indicates whether the product is sold seasonally

      • Sales.Volume: Number of units sold of the item

      • brand: Brand of the item

      • url: Web link to the item

      • sku: Stock keeping unit, the identification number used to manage the inventory of the product

      • name: Name of the product

      • description: Description of the product

      • currency: Currency of the product price

      • scraped_at: Time at which the data was scraped

      • terms: Subcategory of the product

2.3 Source of the data set (0.5 points)

Dataset from Kaggle (https://www.kaggle.com/datasets/xontoloyo/data-penjualan-zara). The dataset contains product sales data from Zara stores.

2.4 Basic descriptive statistics (1 point) - estimate a few parameters (e.g., functions summary, describe, etc.) and explanation
# Convert the categorical variable into a factor

mydata$section <- factor(mydata$section)

#Summary
summary(mydata$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    7.99   49.90   79.90   86.25  109.00  439.00
summary(mydata$section)
##   MAN WOMAN 
##   218    34
library(psych)
describe(mydata$price)
##    vars   n  mean    sd median trimmed   mad  min max  range skew kurtosis   se
## X1    1 252 86.25 52.08   79.9   80.92 43.14 7.99 439 431.01 2.36    10.99 3.28

The cheapest clothing item costs $7.99 and the most expensive costs $439.00, a range of $431.01. On average, an item sells for $86.25. Half of the items are priced at or below $79.90 (the median), and the other half at or above it. The mean exceeds the median and the skew is positive (2.36), so the price distribution is right-skewed. The sample contains 218 men’s items and 34 women’s items.

library(psych)
describeBy(x = mydata$price,
           group = mydata$section)
## 
##  Descriptive statistics by group 
## group: MAN
##    vars   n  mean    sd median trimmed   mad  min max  range skew kurtosis   se
## X1    1 218 91.82 53.01   89.9   86.99 29.65 9.99 439 429.01 2.34    10.91 3.59
## ------------------------------------------------------------ 
## group: WOMAN
##    vars  n  mean    sd median trimmed  mad  min max  range skew kurtosis   se
## X1    1 34 50.53 25.25   48.9    48.4 4.45 7.99 169 161.01 2.79    11.83 4.33

On average, men’s clothing ($91.82) is more expensive than women’s clothing ($50.53). The medians show the same pattern ($89.90 vs. $48.90).
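
The size of this gap can be worked out by hand from the group statistics printed above. The sketch below only reuses those printed values; the variable names are illustrative and not part of the original analysis.

```r
# Group statistics copied from the describeBy() output above
mean_man     <- 91.82
mean_woman   <- 50.53
median_man   <- 89.90
median_woman <- 48.90

# Absolute and relative gap between the group means
mean_diff  <- mean_man - mean_woman   # difference in USD
mean_ratio <- mean_man / mean_woman   # men's mean as a multiple of women's mean

# Gap between the group medians, which is less sensitive to outliers
median_diff <- median_man - median_woman

round(c(mean_diff = mean_diff,
        mean_ratio = mean_ratio,
        median_diff = median_diff), 2)
```

The mean gap and the median gap are of similar size (about $41), which suggests that the difference is not driven only by a few expensive items.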

3 Analysis (7.5 points)

3.1 Determine which statistical test to use and why (1 point)

Independent samples t-test, because we want to compare the means of two groups. Women’s and men’s clothing items form two separate groups that are independent of each other.

H0: Mean price of men’s clothing = mean price of women’s clothing

H1: Mean price of men’s clothing ≠ mean price of women’s clothing

3.2 Evaluate all assumptions (1.5 points)
  • Variable is numeric
  • Normal distribution
# Draw a plot to check the normality
library(ggplot2)
## 
## Attaching package: 'ggplot2'
## The following objects are masked from 'package:psych':
## 
##     %+%, alpha
# price is continuous, so geom_histogram() bins it into intervals
ggplot(mydata[mydata$section == "WOMAN",], aes(x = price)) +
          theme_linedraw() +
          geom_histogram(bins = 30, fill = "darkred") +
          ylab("Frequency") +
          ggtitle("Woman clothing")

ggplot(mydata[mydata$section == "MAN",], aes(x = price)) +
          theme_linedraw() +
          geom_histogram(bins = 30, fill = "darkblue") +
          ylab("Frequency") +
          ggtitle("Man clothing")

# Check with Shapiro test
#shapiro.test(mydata$price[mydata$section == "MAN"])
#shapiro.test(mydata$price[mydata$section == "WOMAN"])    

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(rstatix)
## 
## Attaching package: 'rstatix'
## The following object is masked from 'package:stats':
## 
##     filter
mydata %>%
  group_by(section) %>%
  shapiro_test(price)
## # A tibble: 2 × 4
##   section variable statistic        p
##   <fct>   <chr>        <dbl>    <dbl>
## 1 MAN     price        0.829 9.51e-15
## 2 WOMAN   price        0.662 1.33e- 7

The dependent variable (price) is numeric. The histograms show a few outliers. Both Shapiro-Wilk tests reject normality (p < 0.001 in each group), so the prices are not normally distributed.
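
A Q-Q plot is a common visual complement to the Shapiro-Wilk test. The following is a minimal sketch on simulated, right-skewed data, not on the Zara sample; `sim_price` and the parameters of `rlnorm()` are assumptions chosen only for illustration.

```r
# Simulated, right-skewed "prices" (hypothetical data, not the Zara sample)
set.seed(1)
sim_price <- rlnorm(200, meanlog = 4, sdlog = 0.6)

# Shapiro-Wilk test: a small p-value speaks against normality
sw <- shapiro.test(sim_price)
sw$p.value

# Q-Q plot: points bending away from the reference line indicate
# a departure from normality, here a long right tail
qqnorm(sim_price, main = "Q-Q plot of simulated prices")
qqline(sim_price)
```

Applied to the real data, the same two calls would be run separately on the prices of each section.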

3.3 Perform the appropriate statistical test based on the results of the assumption evaluation and its interpretation (2.5 points)

The normality assumption is not met, so a non-parametric alternative to the independent samples t-test is used: the Wilcoxon rank sum test.

wilcox.test(mydata$price ~ mydata$section,
            paired = FALSE,
            correct = FALSE,
            exact = FALSE,
            alternative = "two.sided")
## 
##  Wilcoxon rank sum test
## 
## data:  mydata$price by mydata$section
## W = 5996, p-value = 5.917e-09
## alternative hypothesis: true location shift is not equal to 0

The Wilcoxon rank sum test shows a significant result (W = 5996, p < 0.001), so the null hypothesis is rejected. The location shift between the two price distributions is not equal to 0: women’s and men’s clothing items are priced differently.
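
The test above reports only whether a location shift exists. With `conf.int = TRUE`, `wilcox.test()` also returns an estimate of the shift (the Hodges-Lehmann estimator) and a confidence interval. The sketch below uses hypothetical toy data (`toy`), not the Zara sample, so its numbers say nothing about the real prices.

```r
# Hypothetical toy data: two independent groups of skewed "prices"
set.seed(42)
toy <- data.frame(
  price   = c(rlnorm(40, meanlog = 4.4, sdlog = 0.5),
              rlnorm(20, meanlog = 3.9, sdlog = 0.5)),
  section = factor(rep(c("MAN", "WOMAN"), times = c(40, 20)))
)

# conf.int = TRUE adds the Hodges-Lehmann estimate of the location shift
# (first factor level minus second) and its confidence interval
res <- wilcox.test(price ~ section, data = toy, conf.int = TRUE)

res$estimate   # estimated shift in price units
res$conf.int   # 95% confidence interval for the shift
```

Reporting the estimated shift alongside the p-value states how large the price difference is, not only that one exists.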

3.4 Calculation of the effect size and its interpretation (2.5 points)
#install.packages("effectsize")
library(effectsize)
## 
## Attaching package: 'effectsize'
## The following objects are masked from 'package:rstatix':
## 
##     cohens_d, eta_squared
## The following object is masked from 'package:psych':
## 
##     phi
effectsize(wilcox.test(mydata$price ~ mydata$section),
           paired = FALSE,
           correct = FALSE,
           exact = FALSE,
           alternative ="two.sided")
## r (rank biserial) |       95% CI
## --------------------------------
## 0.62              | [0.47, 0.73]
interpret_rank_biserial(0.62)
## [1] "very large"
## (Rules: funder2019)

The effect size (r = 0.62, 95% CI [0.47, 0.73]) indicates a very large difference between the two price distributions.
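
The rank-biserial correlation can also be recovered by hand from the test statistic, which makes its meaning easier to see. The sketch below uses only values printed in the outputs above and the standard relation r = 2W / (n1 · n2) − 1; the variable names are illustrative.

```r
# Values copied from the outputs above
W       <- 5996   # Wilcoxon rank sum statistic (first group: MAN)
n_man   <- 218
n_woman <- 34

# Share of all MAN-WOMAN pairs in which the men's item is pricier
# (ties count as half a pair)
prop_favourable <- W / (n_man * n_woman)

# Rank-biserial correlation: favourable minus unfavourable share
r_rb <- 2 * prop_favourable - 1

round(c(prop_favourable = prop_favourable, r_rb = r_rb), 2)
```

This reproduces r = 0.62: in about 81% of all possible pairs of one men’s and one women’s item, the men’s item is the more expensive one.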

4 Conclusion (2.5 points)

Clear answer to your research question based on the results of the statistical test performed (2.5 points)

Are there differences in pricing between women’s and men’s clothing items at Zara? Yes. The Wilcoxon rank sum test indicates that the prices of women’s and men’s clothing differ significantly (W = 5996, p < 0.001), with a very large effect (rank-biserial r = 0.62). In this sample, men’s clothing at Zara is more expensive than women’s clothing (median $89.90 vs. $48.90).