Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: New York City Department of Consumer Affairs (2015).


Objective

The target audience for this visualisation is consumers in the US, especially those who pay higher prices for products marketed to women, a premium commonly referred to as the pink tax.

The objective of this visualisation is to show that, for certain items, women pay more on average than men, and that most of these items fall under the so-called pink tax in the US. Despite its name, the pink tax is not an actual tax: the term refers to the higher prices charged for products marketed to women, many of which are coloured pink. The main aim is to show how women face higher prices than men for similar products in the US, and so to raise awareness of gender-based pricing.

The visualisation chosen had the following three main issues:

  • AREA USED FOR DEPICTING NUMBERS - The areas in the visualisation do not accurately represent the underlying numbers. For example, Razor Cartridges have an average price of $17.3 for women and Razors an average price of $8.9, yet Razors do not occupy half the area of Razor Cartridges even though $8.9 is nearly half of $17.3. Values are generally difficult to judge when they are encoded as area.

  • CLUTTERING - The bottom part of the visualisation is cluttered, which makes the flow difficult to follow.

  • LEGENDS FOR TOTALS - The visualisation labels the percentage difference between men's and women's prices for each individual product, but not for the product groups. This makes the graph inconsistent and leaves out the overall percentage difference for each product group, even though the groups are included in the data.
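
The area issue can be illustrated with the two prices quoted in the first point. The sketch below is only an illustration under a stated assumption: it supposes the shapes are circles, which is not confirmed by the original, and it shows what happens when a circle's radius rather than its area is scaled by the value. All variable names are made up for this example.

```r
# Average prices for women quoted in the text above
razor_cartridges <- 17.3
razors <- 8.9

# Ratio a reader should perceive between the two shapes
value_ratio <- razors / razor_cartridges

# Assumption: circular shapes. If the radius is scaled by the value,
# the drawn area shrinks with the square of the ratio.
area_if_radius_scaled <- value_ratio^2

# Radius ratio that would be needed for the areas to match the values
radius_for_true_area <- sqrt(value_ratio)

round(c(value = value_ratio,
        area_if_radius_scaled = area_if_radius_scaled,
        radius_needed = radius_for_true_area), 2)
```

Under this assumption the smaller shape would cover about 26% of the larger one instead of about 51%, which is why bar length is a safer encoding for comparing prices.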

Code

The following code was used to fix the issues identified in the original.

library(readxl)
library(ggplot2)
library(tidyr)
library(dplyr)

data <- read_excel("C:/Users/Akshay/Desktop/ria/Data Source.xlsx")

head(data)
## # A tibble: 6 x 7
##   Products `Product Group` `Number of Prod~ `Women’s Averag~ `Men’s Average`
##   <chr>    <chr>                      <dbl>            <dbl>           <dbl>
## 1 Shampoo~ Personal Care ~               16             8.39            5.68
## 2 Underwe~ Adult Clothing                40             8.46           10.9 
## 3 Persona~ Senior/Home He~               12            11.3             9.32
## 4 Shirts   Adult Clothing                40            29.2            25.5 
## 5 Support~ Senior/Home He~               22            37.2            32.4 
## 6 Dress S~ Adult Clothing                40            58.1            51.5 
## # ... with 2 more variables: `Price Difference` <dbl>, `Percent
## #   Difference` <dbl>
# Group-level average prices by gender, with the percentage difference for each group
sum_df <- data.frame(
  AvgPrice = c(307.38, 57.18, 140.46, 222.43,   # Female
               285.85, 50.75, 130.08, 207.51),  # Male
  Gender = rep(c('Female', 'Male'), each = 4),
  Product_groups = rep(c('Adult Clothing', 'Personal Care Products',
                         'Senior/Home Health Care Products', 'Toys and Accessories'),
                       times = 2),
  # The four group percentages are repeated so they line up with both genders
  Percent_diff_groups = rep(c(8, 13, 8, 7), times = 2))


# The same 26 products appear once per gender, so each vector of names is
# written once and repeated with rep().
products <- data.frame(
  Product_group = rep(c(
    'Personal Care Products', 'Adult Clothing', 'Senior/Home Health Care Products',
    'Adult Clothing', 'Senior/Home Health Care Products', 'Adult Clothing',
    'Toys and Accessories', 'Senior/Home Health Care Products', 'Personal Care Products',
    'Personal Care Products', 'Personal Care Products', 'Toys and Accessories',
    'Adult Clothing', 'Toys and Accessories', 'Toys and Accessories',
    'Adult Clothing', 'Adult Clothing', 'Personal Care Products',
    'Toys and Accessories', 'Senior/Home Health Care Products', 'Personal Care Products',
    'Senior/Home Health Care Products', 'Adult Clothing', 'Personal Care Products',
    'Senior/Home Health Care Products', 'Toys and Accessories'), times = 2),

  Products = rep(c(
    'Shampoo and Conditioner (Hair Care)', 'Underwear', 'Personal Urinals',
    'Shirts', 'Supports and Braces', 'Dress Shirts', 'Helmets and Pads',
    'Canes', 'Lotion', 'Razor Cartridges', 'Razors', 'General Toys', 'Jeans',
    'Preschool Toys', 'Arts and Crafts', 'Dress Pants', 'Sweaters', 'Body Wash',
    'Bikes and Scooters', 'Digestive Health', 'Shaving Cream', 'Compression Socks',
    'Socks', 'Deodorant', 'Adult Diapers', 'Backpacks'), times = 2),

  Gender = rep(c('Female', 'Male'), each = 26),

  AvgPrice = c(
    # Female
    8.39, 8.46, 11.32, 29.23, 37.17, 58.11, 25.79, 21.99, 8.25, 17.3, 8.9, 29.49, 62.75,
    21.65, 32.79, 75.66, 63.19, 5.7, 86.72, 9.41, 3.73, 27.86, 9.98, 4.91, 32.71, 25.99,
    # Male
    5.68, 10.9, 9.32, 25.51, 32.43, 51.46, 22.89, 19.66, 7.43, 15.61, 7.99, 26.49, 57.09,
    19.85, 30.59, 71.71, 59.45, 5.4, 81.9, 9.84, 3.89, 26.77, 9.73, 4.75, 32.06, 25.79))



p <- ggplot(sum_df, aes(x = Product_groups, y = AvgPrice)) +
  geom_bar(aes(color = Gender, fill = Gender), stat = "identity",
           position = position_dodge(0.8), width = 0.7) +
  theme_minimal() +
  scale_x_discrete(
    limits = c("Adult Clothing", "Personal Care Products",
               "Senior/Home Health Care Products", "Toys and Accessories"),
    labels = c("Adult\nClothing", "Personal Care\nProducts",
               "Senior/ Home Health\nCare Products", "Toys and\nAccessories"),
    name = "Product Groups") +
  labs(title = "Average Price for Product Groups by Gender",
       caption = "Data Source: New York City Department of Consumer Affairs 2015",
       y = "Average Price ($)") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_color_manual(values = c("#F778A1", "#6698FF")) +
  scale_fill_manual(values = c("#F778A1", "#6698FF"))


# plot 1
p1 <- p + geom_text(
  aes(label = AvgPrice, group = Gender),
  position = position_dodge(0.8),
  vjust = -0.2, size = 3.5)

# plot 2
# Horizontal grouped bars, one pair of bars per product
p2 <- ggplot(data = products, aes(x = Products, y = AvgPrice)) +
  geom_bar(aes(color = Gender, fill = Gender), stat = "identity",
           position = position_dodge(0.8), width = 0.65) +
  coord_flip() +
  theme_minimal() +
  # unique() keeps first-appearance order, so products are plotted in data order;
  # the axis labels default to the same names
  scale_x_discrete(limits = unique(products$Products), name = "Products") +
  labs(title = "Average Price for Products by Gender",
       caption = "Data Source: New York City Department of Consumer Affairs 2015",
       y = "Average Price ($)") +
  scale_color_manual(values = c("#F778A1", "#6698FF")) +
  scale_fill_manual(values = c("#F778A1", "#6698FF"))


# plot 3
# Percentage difference between the women's and men's price, one bar per row of `data`
p3 <- ggplot(data = data, aes(x = Products, y = `Percent Difference`)) +
  geom_bar(stat = "identity", position = "identity", fill = "#F778A1") +
  theme_minimal() +
  geom_text(aes(label = `Percent Difference`), vjust = -0.25, size = 2.5) +
  labs(title = "Price Difference (%) by which Female Product Groups and Products\nare More Expensive than Male",
       x = "Product Groups and Products",
       y = "Price Difference (%)",
       caption = "Data Source: New York City Department of Consumer Affairs 2015") +
  # Rotate only the x-axis labels so the long product names stay readable
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
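
The group percentages hard-coded in sum_df (8, 13, 8, 7) can be cross-checked against the group prices in the same data frame. The sketch below assumes the percentage is measured relative to the men's price, which the source does not state explicitly; the data frame and column names are invented for this check.

```r
# Group prices copied from sum_df, one column per gender
group_totals <- data.frame(
  Product_group = c("Adult Clothing", "Personal Care Products",
                    "Senior/Home Health Care Products", "Toys and Accessories"),
  Female = c(307.38, 57.18, 140.46, 222.43),
  Male   = c(285.85, 50.75, 130.08, 207.51))

# Assumption: percent difference = (women's - men's) / men's * 100,
# rounded to the nearest whole percent
group_totals$Percent_diff <- round(
  (group_totals$Female - group_totals$Male) / group_totals$Male * 100)

group_totals
```

With this definition the computed column matches the hard-coded values, so the same expression could replace the manually typed percentages if the prices are updated.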

Reconstruction

The following plots fix the main issues in the original.