Click the Original, Code and Reconstruction tabs to read about the issues and how they were fixed.

Original


Source: LeanIn.Org and McKinsey (2022).


Objective

Explain the objective of the original data visualisation and the targeted audience.

Objective and target audience: The original data visualization appears to show how gender, age, and income relate to the likelihood of experiencing stress. The likely audience is researchers, policymakers, and other readers interested in the factors that contribute to stress.

Color issues: The use of color in the visualization appears to be appropriate. The colors used are distinct and easy to differentiate, and the color legend is clearly visible.

Perceptual issues: The visualization uses a bar chart, which can be useful for showing the relationship between two variables. In this case, however, the bar chart makes it difficult to see how all of the variables relate to one another.

Data source issues: It is unclear where the data for this visualization came from, which makes it hard to judge the reliability and accuracy of the information presented. Additionally, no information is provided on the sample size or the representativeness of the sample, which makes it difficult to draw meaningful conclusions from the data.

Reference

Code

The following code was used to fix the issues identified in the original.

# load required packages
library(ggplot2)
library(dplyr)
library(tidyr)

# create a data frame with the provided data
df <- data.frame(
  category = c("Entry level", "Manager", "Senior", "Vice President", "Senior Vice President", "C-Suite"),
  white_women = c(29, 27, 26, 24, 23, 21),
  women_of_color = c(19, 14, 10, 8, 6, 6),
  total_women_2022 = c(48, 41, 36, 32, 29, 27),
  total_women_2017 = c(47, 37, 33, 29, 21, 20),
  increase = c(1, 4, 3, 3, 8, 7)
)

# reshape the data to a long format
df_long <- df %>%
  pivot_longer(cols = c("white_women", "women_of_color"), names_to = "race", values_to = "count")

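# create a stacked bar chart of white women and women of color in each leadership role
p1 <- ggplot(df_long, aes(x = category, y = count, fill = race)) +
  geom_col(position = "stack") +
  # keep the roles in the order they were entered instead of the default alphabetical order
  scale_x_discrete(limits = df$category) +
  # the tallest stack is 48, so an upper limit of 40 would drop the segments that exceed it
  scale_y_continuous(limits = c(0, 50), expand = c(0, 0)) +
  labs(x = "Leadership Role", y = "Number of Women") +
  ggtitle("Number of Women by Leadership Role and Race/Ethnicity") +
  theme_bw()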
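
Because the values are typed in by hand, it may be worth checking that the columns agree with each other before plotting. The sketch below is an illustrative addition rather than part of the original code. It repeats the same data frame so that it runs on its own, and the names groups_match_total, increase_matches and max_stack are made up for this check.

```r
library(tidyr)

# same values as the data frame above, repeated so this sketch runs on its own
df <- data.frame(
  category = c("Entry level", "Manager", "Senior", "Vice President",
               "Senior Vice President", "C-Suite"),
  white_women = c(29, 27, 26, 24, 23, 21),
  women_of_color = c(19, 14, 10, 8, 6, 6),
  total_women_2022 = c(48, 41, 36, 32, 29, 27),
  total_women_2017 = c(47, 37, 33, 29, 21, 20),
  increase = c(1, 4, 3, 3, 8, 7)
)

# the two groups should add up to the 2022 total in every row
groups_match_total <- all(df$white_women + df$women_of_color == df$total_women_2022)

# the increase column should equal the 2022 total minus the 2017 total
increase_matches <- all(df$total_women_2022 - df$total_women_2017 == df$increase)

# height of the tallest stacked bar, useful when choosing a y-axis limit
max_stack <- max(df$white_women + df$women_of_color)

# long format: one row per leadership role and group
df_long <- pivot_longer(df, cols = c(white_women, women_of_color),
                        names_to = "race", values_to = "count")

# stop with an error if any of the checks fail
stopifnot(groups_match_total, increase_matches, nrow(df_long) == 2 * nrow(df))
```

If any check fails, stopifnot() halts with an error naming the failed condition, which points to a typing mistake in the data frame.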

Data Reference

Reconstruction

The following plot fixes the main issues in the original.