ETW2001 Assessment 1

Author

Iris Pok 35019085

Section A

#load libraries 
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr)
library(tidyr)
library(reshape2)

Attaching package: 'reshape2'

The following object is masked from 'package:tidyr':

    smiths
# Set the working directory
setwd("C:\\Users\\USER\\OneDrive\\Desktop\\Monash\\ETW2001")

# Import the CSV file
bigbasket_data<- read.csv("bigbasket.csv", header = TRUE, stringsAsFactors = FALSE)
head(bigbasket_data)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating
1          220              Hair Oil & Serum    4.1
2          180        Water & Fridge Bottles    2.3
3          250               Lamp & Lamp Oil    3.4
4          176      Laundry, Storage Baskets    3.7
5          162          Bathing Bars & Soaps    4.4
6          199 Disinfectant Spray & Cleaners    3.3
view(bigbasket_data)

Question 1 & 2 : Data Cleaning

1. Handling missing values

Handling missing values correctly ensures complete and accurate data for analysis, leading to more reliable business insights. This is crucial because missing values can lead to incomplete or biased analyses, which in turn can misinform business decisions. For example, analyzing customer purchase patterns with missing data could lead to incorrect conclusions about customer behavior that leads to poor marketing or inventory decisions. By addressing missing data, we ensure that the data set is more accurate and reliable. In this dataset, the missing values are removed to ensure that the analysis is only made based on complete and meaningful records to avoid misinterpretation and distortion of the data that may potentially mask important variations. This helps to maintain the integrity of the dataset and allows analyst to focus on meaningful information rather than trying to account for gaps or inconsistencies.

# View data summary 
print("Summary before removing missing values: ")
[1] "Summary before removing missing values: "
summary(bigbasket_data)
     index         product            category         sub_category      
 Min.   :    1   Length:27561       Length:27561       Length:27561      
 1st Qu.: 6891   Class :character   Class :character   Class :character  
 Median :13781   Mode  :character   Mode  :character   Mode  :character  
 Mean   :13781                                                           
 3rd Qu.:20671                                                           
 Max.   :27555                                                           
                                                                         
    brand             sale_price        market_price       type          
 Length:27561       Min.   :    2.45   Min.   :    3   Length:27561      
 Class :character   1st Qu.:   95.00   1st Qu.:  100   Class :character  
 Mode  :character   Median :  190.00   Median :  220   Mode  :character  
                    Mean   :  322.49   Mean   :  382                     
                    3rd Qu.:  359.00   3rd Qu.:  425                     
                    Max.   :12500.00   Max.   :12500                     
                                                                         
     rating     
 Min.   :1.000  
 1st Qu.:3.700  
 Median :4.100  
 Mean   :3.943  
 3rd Qu.:4.300  
 Max.   :5.000  
 NA's   :8626   
# Determine the number of missing values in each column of the dataset.
colSums(is.na(bigbasket_data))
       index      product     category sub_category        brand   sale_price 
           0            0            0            0            0            0 
market_price         type       rating 
           0            0         8626 
# Remove rows with any missing values in the rating column.
bigbasket_data_clean <- bigbasket_data[!is.na(bigbasket_data$rating), ]

# Check the number of rows after the data is cleaned.
colSums(is.na(bigbasket_data_clean))
       index      product     category sub_category        brand   sale_price 
           0            0            0            0            0            0 
market_price         type       rating 
           0            0            0 
# Check data summary 
print("Summary after removing missing values: ")
[1] "Summary after removing missing values: "
summary(bigbasket_data_clean)
     index         product            category         sub_category      
 Min.   :    1   Length:18935       Length:18935       Length:18935      
 1st Qu.: 6868   Class :character   Class :character   Class :character  
 Median :13775   Mode  :character   Mode  :character   Mode  :character  
 Mean   :13798                                                           
 3rd Qu.:20679                                                           
 Max.   :27555                                                           
    brand             sale_price      market_price        type          
 Length:18935       Min.   :   3.0   Min.   :   3.0   Length:18935      
 Class :character   1st Qu.:  85.0   1st Qu.:  96.0   Class :character  
 Mode  :character   Median : 165.0   Median : 196.0   Mode  :character  
                    Mean   : 267.7   Mean   : 322.1                     
                    3rd Qu.: 306.0   3rd Qu.: 370.0                     
                    Max.   :6660.0   Max.   :7400.0                     
     rating     
 Min.   :1.000  
 1st Qu.:3.700  
 Median :4.100  
 Mean   :3.943  
 3rd Qu.:4.300  
 Max.   :5.000  
# View the first few rows of the cleaned dataset
head(bigbasket_data_clean)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating
1          220              Hair Oil & Serum    4.1
2          180        Water & Fridge Bottles    2.3
3          250               Lamp & Lamp Oil    3.4
4          176      Laundry, Storage Baskets    3.7
5          162          Bathing Bars & Soaps    4.4
6          199 Disinfectant Spray & Cleaners    3.3

Justification of the Output:

The initial summary shows that some columns in the rating category have missing values which can skew the analysis, as missing data can lead to inaccurate conclusions. The output of colSums(is.na(bigbasket_data)) will reveal which columns have missing values and how many. This count is essential for understanding the extent of the missing data problem in your dataset. After removing the missing values, the number of rows in the data set have decreased, as rows with missing rating values were removed and the data set will show no missing value in the rating column, as all rows with missing rating values are filtered out. The data set is now cleaner and more reliable for analysis. The head(bigbasket_data_clean) will display the first few rows of the cleaned dataset, confirming that no rows with missing rating values remain.

2. Data redundancy

Solving data redundancy helps to prevent skewed results by ensuring the analysis is based on unique data, making the insights more trustworthy. Data redundancy, such as duplicate records, can distort analysis results by over-representing certain observations. Eliminating redundant data ensures that the analysis is based on unique entries, leading to more accurate and trustworthy insights. In this dataset, redundancy might occur in product listings where the same item is entered multiple times. This may inflate the number of products and their ratings and distort the analysis of discount effectiveness, leading to incorrect conclusions about which products to discount further. Thus, duplicate rows in the dataset are identified and removed to improve data quality and enhance efficiency.

# Remove duplicate rows
bigbasket_data_clean_2 <- bigbasket_data_clean %>% distinct()

# Verify that duplicates have been removed
print(paste("Number of rows before removing duplicates: ", nrow(bigbasket_data_clean)))
[1] "Number of rows before removing duplicates:  18935"
print(paste("Number of rows after removing duplicates: ", nrow(bigbasket_data_clean_2))) 
[1] "Number of rows after removing duplicates:  18929"
# Check if there are any remaining duplicate rows
any(duplicated(bigbasket_data_clean_2))
[1] FALSE
# View the first few rows of the cleaned dataset
head(bigbasket_data_clean_2)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating
1          220              Hair Oil & Serum    4.1
2          180        Water & Fridge Bottles    2.3
3          250               Lamp & Lamp Oil    3.4
4          176      Laundry, Storage Baskets    3.7
5          162          Bathing Bars & Soaps    4.4
6          199 Disinfectant Spray & Cleaners    3.3

Justification of the Output:

The output shows that there is a reduction in rows where the number of rows after removing duplicates is less than before which indicates that duplicate rows were found and removed. This is a positive outcome, as it confirms that redundancy in the data has been addressed. The code any(duplicated(bigbasket_data_clean_2)) returns FALSE also confirms that no duplicate rows remain in the dataset. The dataset is now ready for analysis without the risk of redundancy affecting the results.

3. Removing outliers in prices

Removing the outliers provides a more accurate representation of the data and avoids misleading conclusions that could affect business strategy. Outliers can skew statistical analyses and affect the accuracy of models. By identifying and appropriately handling outliers, we ensure that the analysis reflects the typical trends in the data, leading to more reliable business insights. In this data, the outliers are detected and removed using the Interquartile Range (IQR) method. For instance, sale prices and market prices in this data that are far higher than the rest of the data is removed from the dataset.

# Calculate the IQR for sale_price and market_price
IQR_sale_price <- IQR(bigbasket_data_clean_2$sale_price, na.rm = TRUE)
IQR_market_price <- IQR(bigbasket_data_clean_2$market_price, na.rm = TRUE)

# Calculate the lower and upper bounds for outliers
lower_bound_sale_price <- quantile(bigbasket_data_clean_2$sale_price, 0.25, na.rm = TRUE) - (1.5 * IQR_sale_price)
upper_bound_sale_price <- quantile(bigbasket_data_clean_2$sale_price, 0.75, na.rm = TRUE) + (1.5 * IQR_sale_price)
lower_bound_market_price <- quantile(bigbasket_data_clean_2$market_price, 0.25, na.rm = TRUE) - 1.5 * IQR_market_price
upper_bound_market_price <- quantile(bigbasket_data_clean_2$market_price, 0.75, na.rm = TRUE) + 1.5 * IQR_market_price

# Remove outliers
bigbasket_data_no_outliers <- bigbasket_data_clean_2[as.numeric(bigbasket_data_clean_2$sale_price) >= as.numeric(lower_bound_sale_price) & as.numeric(bigbasket_data_clean_2$sale_price) <= as.numeric(upper_bound_sale_price) & as.numeric(bigbasket_data_clean_2$market_price) >= as.numeric(lower_bound_market_price) & as.numeric(bigbasket_data_clean_2$market_price) <= as.numeric(upper_bound_market_price), ]

# View the first few rows for the data with no outliers
head(bigbasket_data_no_outliers)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating
1          220              Hair Oil & Serum    4.1
2          180        Water & Fridge Bottles    2.3
3          250               Lamp & Lamp Oil    3.4
4          176      Laundry, Storage Baskets    3.7
5          162          Bathing Bars & Soaps    4.4
6          199 Disinfectant Spray & Cleaners    3.3
# Create boxplots for sale_price and market_price before removing outliers
boxplot(bigbasket_data_clean_2$sale_price, main = "Sale Price with Outliers", ylab = "Sale Price", col = "lightblue")

boxplot(bigbasket_data_clean_2$market_price, main = "Market Price with Outliers", ylab = "Market Price", col = "darkblue")

# Create boxplots for sale_price and market_price after removing outliers
boxplot(bigbasket_data_no_outliers$sale_price, main = "Sale Price without Outliers", ylab = "Sale Price", col = "lightblue",range = 0, ylim = c(min(bigbasket_data_no_outliers$sale_price),upper_bound_sale_price))

boxplot(bigbasket_data_no_outliers$market_price, main = "Market Price without Outliers", ylab = "Market Price", col = "darkblue",range = 0, ylim = c(min(bigbasket_data_no_outliers$market_price),upper_bound_market_price))

Justification of the Code:

Outliers can significantly distort statistical analyses by inflating measures like mean and variance, leading to incorrect conclusions. By calculating and applying the IQR-based method to remove outliers, the dataset is made more representative of the typical data points, thus enhancing the accuracy of subsequent analyses. The code ensures that only data within a reasonable range, defined by the IQR method, is retained. Boxplots are used for visualizing the distribution of data, particularly for identifying the spread and central tendency after outlier removal. By creating boxplots post-outlier removal, the code provides a visual check to ensure that the data is now free from extreme values, and the distribution appears more normalized. The chosen color schemes (lightblue and darkblue) help distinguish the two different price distributions. Comparing the boxplots before and after removing outliers shows a reduction in the spread of the data, confirming that extreme values have been successfully filtered out. This approach effectively cleans the dataset, ensuring that the analysis will be based on more consistent and reliable data. The use of boxplots provides a visual confirmation that the outliers have been appropriately removed, and the dataset is now ready for further analysis.

4. Creating new column

Creating a new column based on existing data can help with better segmentation or categorization and facilitates more detailed analysis, enabling better-targeted strategies and decisions, which is essential for targeted business strategies. This allows for more granular analysis and can provide deeper insights into different segments of the data. In this case, a “discount_percentage” column that shows the percentage of discount of the market price for different products is created to help in analyzing how discount percentage perform in terms of sales, customer preferences.

# Create the discount_percentage column
new_bigbasket_data <- bigbasket_data_no_outliers %>%
  mutate(discount_percentage = ((market_price - sale_price) / market_price) * 100)

# View the first few rows to check the new column
head(new_bigbasket_data)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating discount_percentage
1          220              Hair Oil & Serum    4.1             0.00000
2          180        Water & Fridge Bottles    2.3             0.00000
3          250               Lamp & Lamp Oil    3.4            52.40000
4          176      Laundry, Storage Baskets    3.7            15.34091
5          162          Bathing Bars & Soaps    4.4             0.00000
6          199 Disinfectant Spray & Cleaners    3.3            15.07538

Justification of the Code:

The “discount_percentage” column is crucial for analyzing pricing strategies, customer behavior, and marketing effectiveness. By quantifying how much of a discount is offered as a percentage of the market price, you can easily compare discounts across different products, regardless of their absolute prices. This code can be used to identify which products are heavily discounted and might attract more customers, as well as to understand the overall discounting strategy of the business.

Question 3 : Price variance analysis

1. Identify the top 100 products with the highest discount rate.

# Filter the top 100 products with the highest discount percentage
top_100_discounted_products <- new_bigbasket_data %>%
  arrange(desc(discount_percentage)) %>%
  head(100)

# Find the category with the most products in the top 100
top_category <- top_100_discounted_products %>%
  group_by(category) %>%
  summarise(count = n()) %>%
  arrange(desc(count))

# Display the category with the highest count
print(top_category)
# A tibble: 7 × 2
  category                 count
  <chr>                    <int>
1 Kitchen, Garden & Pets      67
2 Beauty & Hygiene            13
3 Cleaning & Household        11
4 Snacks & Branded Foods       4
5 Foodgrains, Oil & Masala     3
6 Beverages                    1
7 Gourmet & World Food         1

2. Which category does the majority of the top 100 products fall under?

The majority of the top 100 products fall under Kitchen, Garden and Pets.

3. Why do you think this category needs the highest discount rate compared to others?

The category needs the highest discount rate compared to others as it is likely to face higher competitive pressure as most of the products may be similar and have the same usage, so discounts are used to attract price-sensitive customers and outcompete other retailers. The discounts also help in rapid inventory turnover before the products become unsellable. Products in this category that are seasonal, such as gardening tools, which are more relevant during certain times of the year can use high discounts to clear out inventory before the season ends. Additionally, if the seller aims to establish a stronger presence in this category, aggressive discounting can drive market penetration by attracting new customers.

Question 4

As a business analyst, which product will you correlate with the category you found in (3) to increase the sales of the supermarket? Identify that and Explain why.

To boost sales, it’s important to choose a product that complements the items in the “Kitchen, Garden, and Pet” category. As a business analyst, I will correlate food and household products with the “Kitchen, Garden, and Pet” category. Items in the “Kitchen” segment, such as blenders, cookware, or storage containers, are directly tied to food preparation and storage. By correlating these with food products, I can create a natural pairing that meets the customers’ immediate needs. For example, a customer buying a blender is likely also interested in purchasing fruits, vegetables, or smoothie ingredients, which enhances the value of their purchase. Next, products like cleaning supplies, detergents, or storage solutions are essential for maintaining a well-organized and clean home. These household items naturally complement products in the “Kitchen, Garden, and Pet” category, especially when customers are purchasing kitchen appliances or garden tools that might require cleaning or organization. Thus, the food and household products will definitely correlate well with the “Kitchen, Garden, and Pet” category.

Question 5

Use the dataset produced in question 2. Write a conditional statement using the “rating” variable to classify the products into three categories: i) high, ii) average, and iii) poor and store the information in a new column in the same dataset.

# Initialize the new column
new_bigbasket_data$rating_category <- NA

# Categorize data based on rating
for (i in 1:nrow(new_bigbasket_data)) {
  if (new_bigbasket_data$rating[i] >= 4.5) {
    new_bigbasket_data$rating_category[i] <- "high"
  } else if (new_bigbasket_data$rating[i] >= 3.5 && new_bigbasket_data$rating[i] < 4.5) {
    new_bigbasket_data$rating_category[i] <- "average"
  } else {
    new_bigbasket_data$rating_category[i] <- "poor"
  }
}

# View the first few rows of the updated dataset
head(new_bigbasket_data)
  index                                                 product
1     1                  Garlic Oil - Vegetarian Capsule 500 mg
2     2                                   Water Bottle - Orange
3     3                          Brass Angle Deep - Plain, No.2
4     4 Cereal Flip Lid Container/Storage Jar - Assorted Colour
5     5                      Creme Soft Soap - For Hands & Body
6     6                       Germ - Removal Multipurpose Wipes
                category          sub_category             brand sale_price
1       Beauty & Hygiene             Hair Care Sri Sri Ayurveda         220
2 Kitchen, Garden & Pets Storage & Accessories        Mastercook        180
3   Cleaning & Household           Pooja Needs               Trm        119
4   Cleaning & Household  Bins & Bathroom Ware            Nakoda        149
5       Beauty & Hygiene      Bath & Hand Wash             Nivea        162
6   Cleaning & Household  All Purpose Cleaners    Nature Protect        169
  market_price                          type rating discount_percentage
1          220              Hair Oil & Serum    4.1             0.00000
2          180        Water & Fridge Bottles    2.3             0.00000
3          250               Lamp & Lamp Oil    3.4            52.40000
4          176      Laundry, Storage Baskets    3.7            15.34091
5          162          Bathing Bars & Soaps    4.4             0.00000
6          199 Disinfectant Spray & Cleaners    3.3            15.07538
  rating_category
1         average
2            poor
3            poor
4         average
5         average
6            poor

Section B

Question 1: List down three business strategies used by ZARA to be the number one fashion store worldwide.q

ZARA employs several effective business strategies to maintain its status as a leading fashion retailer:

  • ZARA uses fast fashion model and quickly designs, produces, and delivers new clothing collections, allowing the brand to respond rapidly to changing fashion trends and customer preferences. This strategy keeps the offerings fresh and encourages frequent visits from customers.

  • ZARA limits its production runs by producing limited quantities of each design to create a sense of exclusivity. This scarcity drives demand and prompts customers to purchase items quickly, fearing they might miss out.

  • ZARA also has strong supply chain management to ensure efficiency and speed. The company closely monitors inventory and employs a just-in-time production approach, allowing for quick restocking and reducing excess inventory costs.

Question 2: Assume that you are hired as a business analyst in ZARA to analyze their E-Commerce data. Your first task is to create a data frame and perform basic analysis using conditional statements.

Create a data frame with 5 columns, including productID (create integers from 1 – 20 by yourself), product name, category, subcategory, and price using the information from the website. You must create 20 observations.

# Creating sample data from ZARA's website)
ZARA_products <- data.frame(
  ProductID = 1:20,
  ProductName = c(
    "TAILORED BLAZER", "BLAZER WITH ROLLED-UP SLEEVES", "GODET MINI DRESS ZW COLLECTION", "DENIM MINI DRESS Z1975", "WASHED INTERLOCK TROUSERS",
    "ZW COLLECTION CUT-OUT WOOL TROUSERS", "BACKPACK WITH FLAP", "PUFFER JACKET", "ZW COLLECTION EMBROIDERED SHIRT WITH JEWELS", "RUFFLED BLOUSE WITH CONTRAST PIPING",
    "ABSTRACT PRINT SHIRT", "CHECK KNIT SHIRT", "BAGGY FIT JEANS", "WIDE-LEG JEANS WITH BELT", "SEERSUCKER BLAZER",
    "COMFORT SUIT BLAZER", "TIMELESZ - LINEN BLEND DIP-DYE TOP", "OVERSHIRT WITH EMBROIDERED POCKETS", "EMBROIDERED JOGGING TROUSERS", "BALLOON FIT TROUSERS WITH POCKET"
  ),
  Category = c(
    "Woman", "Woman", "Woman", "Woman", "Woman",
    "Woman", "Woman", "Woman", "Woman", "Woman",
    "Man", "Man", "Man", "Man", "Man",
    "Man", "Kids", "Kids", "Kids", "Kids"
  ),
  Subcategory = c(
    "Blazers", "Blazers", "Dresses", "Dresses", "Trousers",
    "Trousers", "Bags", "Jackets", "Shirts", "Shirts",
    "Shirts", "Shirts", "Jeans", "Jeans", "Blazers",
    "Blazers", "Shirts", "Shirts", "Trousers", "Trousers"
  ),
  Price = c(
    399.00, 269.00, 269.00, 249.00, 169.00,
    369.00, 189.00, 269.00, 369.00, 189.00,
    219.00, 249.00, 219.00, 219.00, 369.00,
    569.00, 129.00, 139.00, 89.90, 179.00
  )
)

# Print DataFrame
print("DataFrame:")
[1] "DataFrame:"
print(ZARA_products)
   ProductID                                 ProductName Category Subcategory
1          1                             TAILORED BLAZER    Woman     Blazers
2          2               BLAZER WITH ROLLED-UP SLEEVES    Woman     Blazers
3          3              GODET MINI DRESS ZW COLLECTION    Woman     Dresses
4          4                      DENIM MINI DRESS Z1975    Woman     Dresses
5          5                   WASHED INTERLOCK TROUSERS    Woman    Trousers
6          6         ZW COLLECTION CUT-OUT WOOL TROUSERS    Woman    Trousers
7          7                          BACKPACK WITH FLAP    Woman        Bags
8          8                               PUFFER JACKET    Woman     Jackets
9          9 ZW COLLECTION EMBROIDERED SHIRT WITH JEWELS    Woman      Shirts
10        10         RUFFLED BLOUSE WITH CONTRAST PIPING    Woman      Shirts
11        11                        ABSTRACT PRINT SHIRT      Man      Shirts
12        12                            CHECK KNIT SHIRT      Man      Shirts
13        13                             BAGGY FIT JEANS      Man       Jeans
14        14                    WIDE-LEG JEANS WITH BELT      Man       Jeans
15        15                           SEERSUCKER BLAZER      Man     Blazers
16        16                         COMFORT SUIT BLAZER      Man     Blazers
17        17          TIMELESZ - LINEN BLEND DIP-DYE TOP     Kids      Shirts
18        18          OVERSHIRT WITH EMBROIDERED POCKETS     Kids      Shirts
19        19                EMBROIDERED JOGGING TROUSERS     Kids    Trousers
20        20            BALLOON FIT TROUSERS WITH POCKET     Kids    Trousers
   Price
1  399.0
2  269.0
3  269.0
4  249.0
5  169.0
6  369.0
7  189.0
8  269.0
9  369.0
10 189.0
11 219.0
12 249.0
13 219.0
14 219.0
15 369.0
16 569.0
17 129.0
18 139.0
19  89.9
20 179.0

Write a conditional statement using the price of the products in your data frame as an independent variable and subcategory as your target variable. Identify the subcategories at the highest and lowest prices using the conditional statement.

# Initialize variables to store the highest and lowest prices and their corresponding subcategories
highest_price <- -Inf
lowest_price <- Inf
highest_price_subcategory <- NA
lowest_price_subcategory <- NA

# Loop through each row in the dataset
for (i in 1:nrow(ZARA_products)) {
  # Check for the highest price
  if (ZARA_products$Price[i] > highest_price) {
    highest_price <- ZARA_products$Price[i]
    highest_price_subcategory <- ZARA_products$Subcategory[i]
  }
  
  # Check for the lowest price
  if (ZARA_products$Price[i] < lowest_price) {
    lowest_price <- ZARA_products$Price[i]
    lowest_price_subcategory <- ZARA_products$Subcategory[i]
  }
}

# Print results
print(paste("The subcategory with the highest price is:", highest_price_subcategory))
[1] "The subcategory with the highest price is: Blazers"
print(paste("The subcategory with the lowest price is:", lowest_price_subcategory))
[1] "The subcategory with the lowest price is: Trousers"

Write a conditional statement to identify the products in Q1 and Q3 of the price range and explain the output.

# Calculate Q1 and Q3
Q1 <- quantile(ZARA_products$Price, 0.25, na.rm = TRUE)
Q3 <- quantile(ZARA_products$Price, 0.75, na.rm = TRUE)

# Identify products in Q1
products_Q1 <- ZARA_products %>%
  filter(Price <= Q1)

# Identify products in Q3
products_Q3 <- ZARA_products %>%
  filter(Price >= Q3)

# View results
print(paste("Products in the first quartile (Q1) with a price less than or equal to", Q1, ":"))
[1] "Products in the first quartile (Q1) with a price less than or equal to 186.5 :"
print(products_Q1)
  ProductID                        ProductName Category Subcategory Price
1         5          WASHED INTERLOCK TROUSERS    Woman    Trousers 169.0
2        17 TIMELESZ - LINEN BLEND DIP-DYE TOP     Kids      Shirts 129.0
3        18 OVERSHIRT WITH EMBROIDERED POCKETS     Kids      Shirts 139.0
4        19       EMBROIDERED JOGGING TROUSERS     Kids    Trousers  89.9
5        20   BALLOON FIT TROUSERS WITH POCKET     Kids    Trousers 179.0
print(paste("Products in the third quartile (Q3) with a price greater than or equal to", Q3, ":"))
[1] "Products in the third quartile (Q3) with a price greater than or equal to 294 :"
print(products_Q3)
  ProductID                                 ProductName Category Subcategory
1         1                             TAILORED BLAZER    Woman     Blazers
2         6         ZW COLLECTION CUT-OUT WOOL TROUSERS    Woman    Trousers
3         9 ZW COLLECTION EMBROIDERED SHIRT WITH JEWELS    Woman      Shirts
4        15                           SEERSUCKER BLAZER      Man     Blazers
5        16                         COMFORT SUIT BLAZER      Man     Blazers
  Price
1   399
2   369
3   369
4   369
5   569

Explanation of Output:

This code categorizes products into different price ranges, allowing for analysis of price distribution. Products in Q1 are the least expensive (bottom 25%), while products in Q3 are the most expensive (top 25%). The product in Q1 are generally more affordable and could be aimed at customers looking for budget-friendly options while the product in Q3 are generally more expensive and might be premium products, possibly offering higher quality, better materials, or exclusive designs. Any products not classified in Q1 or Q3 fall between Q1 and Q3.