Assignment 2: Conceptual Plots

Author: Parsa Keyvani

Affiliation: Georgetown University

Data Question

Question

At what age does the combined male and female population drop below 3,000 (in thousands, i.e., 3 million) in the years 1900 and 2000?

Part 1 - Developing Your Conceptual Drawing

# Loading the packages used below (readr, dplyr, tidyr, ggplot2)
library(tidyverse)

# Importing the dataset
us_pop_data <- read_csv("/Users/parsakeyvani/Desktop/Adv Data viz/Assignments /spring-2024-a2-implementation-and-hats-keyvanip/us-pop.csv")

# Combining male and female counts for each year to answer my data question
manipulated_data <- us_pop_data %>%
  mutate(yr_1900 = Male1900 + Female1900,
         yr_2000 = Male2000 + Female2000) %>%
  select(Age, yr_1900, yr_2000)

# Defining my custom theme: theme_bw() with white backgrounds, bold titles, and no grid lines
my_theme <- theme_bw() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    axis.title = element_text(size = 12, face = "bold"),
    title = element_text(size = 12, face = "bold"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
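The crossing points marked in the plots (age 50 for 1900, age 85 for 2000) are hard-coded in the annotations. As a hedged sketch, the data question could also be answered directly in Python with pandas by finding the first age at which a combined-population column falls below the threshold. This assumes a numeric `Age` column and reads "drops below" as "first age whose value is strictly less than 3,000 (thousands)". The helper name `first_age_below` and the small sample frame are hypothetical, and the sample values are made up rather than taken from us-pop.csv.

```python
from typing import Optional

import pandas as pd


def first_age_below(df: pd.DataFrame, column: str, threshold: float = 3000) -> Optional[float]:
    """Return the first Age at which `column` is strictly below `threshold`.

    Hypothetical helper: rows are sorted by Age first, and None is returned
    if the column never drops below the threshold.
    """
    ordered = df.sort_values("Age")
    below = ordered.loc[ordered[column] < threshold, "Age"]
    return None if below.empty else below.iloc[0]


# Hypothetical toy values (in thousands), NOT the real us-pop.csv figures
sample = pd.DataFrame({
    "Age": [0, 1, 2, 3, 4],
    "yr_1900": [5000, 4000, 2900, 2500, 2000],
    "yr_2000": [9000, 8000, 7000, 3100, 2950],
})

for col in ["yr_1900", "yr_2000"]:
    print(col, first_age_below(sample, col))
```

If a series dips below the threshold and later recovers, this returns the first dip; finding the age after which the population stays below 3,000 would need a different rule.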

Plot 1

Hand-drawn Version

Software Version

# Software version of my first hand-drawn plot
ggplot(manipulated_data, aes(x = Age)) +
  geom_line(aes(y = yr_1900), color = "grey") +
  geom_line(aes(y = yr_2000), color = "blue") +
  labs(title = "U.S. Population (in thousands) in 1900 and 2000",
       x= "Age", 
       y = "Population (thousands)") + 
  my_theme +
  annotate("text", x = 5, y = 10000, label = "Year 1900", hjust = 1.2, color = "grey", size= 2.5) +
  annotate("text", x = 5, y = 20500, label = "Year 2000", hjust = 1.2, color = "blue", size= 2.5) +
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = 3000, fill = "red", alpha = 0.2) +
  annotate("text", x = 10, y = 1000, label = "Range of Concern (y <= 3000)", size = 3, angle = 0, color = "red") +
  annotate("pointrange", x = 50, y = 3000, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("pointrange", x = 85, y = 2950, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("text", x = 55, y = 5000, label = "Age = 50\nPopulation = 3000", hjust = 1.2, color = "grey", size= 3) +
  annotate("text", x = 90, y = 5000, label = "Age = 85\nPopulation = 2950", hjust = 1.2, color = "blue", size= 3) 

Plot 2

Hand-drawn Version

Software Version

# Converting the data from wide to long format (one row per Age-Year pair)
long_data <- manipulated_data %>%
  pivot_longer(cols = yr_1900:yr_2000, names_to = "Year", values_to = "Value")

# Plotting the second graph: side-by-side bars for each age
ggplot(long_data, aes(x = Age, y = Value, fill = Year)) +
  geom_col(position = "dodge") +
  labs(title = "U.S. Population (in thousands) in 1900 and 2000",
       x = "Age",
       y = "Population (thousands)") +
  scale_fill_manual(values = c("yr_1900" = "grey", "yr_2000" = "lightblue")) +
  my_theme +
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = 3000, fill = "red", alpha = 0.2) +
  annotate("text", x = 10, y = 1000, label = "Range of Concern (y <= 3000)", size = 3, angle = 0, color = "red") +
  annotate("pointrange", x = 50, y = 3000, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("pointrange", x = 85, y = 2950, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("text", x = 55, y = 5000, label = "Age = 50\nPopulation = 3000", hjust = 1, color = "darkgrey", size= 3) +
  annotate("text", x = 90, y = 5000, label = "Age = 85\nPopulation = 2950", hjust = 1, color = "blue", size= 3) 
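For readers following along in Python, the `pivot_longer()` reshape used for the second plot corresponds roughly to pandas `DataFrame.melt`. The sketch below uses a hypothetical three-row frame with made-up values shaped like `manipulated_data`; `var_name` and `value_name` play the roles of `names_to` and `values_to`.

```python
import pandas as pd

# Hypothetical wide-format frame shaped like manipulated_data (values are made up)
wide = pd.DataFrame({
    "Age": [0, 1, 2],
    "yr_1900": [10, 20, 30],
    "yr_2000": [40, 50, 60],
})

# Stack yr_1900 and yr_2000 into a single "Value" column keyed by "Year",
# mirroring names_to = "Year" and values_to = "Value" in the R code
long = wide.melt(
    id_vars="Age",
    value_vars=["yr_1900", "yr_2000"],
    var_name="Year",
    value_name="Value",
)
print(long)
```

One difference: `pivot_longer()` interleaves the two years within each age, while `melt()` stacks all `yr_1900` rows before all `yr_2000` rows. For a grouped bar chart this row order should not change the result.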

Plot 3

Hand-drawn Version

Software Version

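import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# manipulated_data was built in R, so rebuild it in Python from the same CSV
# (in a Quarto document using reticulate, r.manipulated_data would also work)
us_pop_data = pd.read_csv("/Users/parsakeyvani/Desktop/Adv Data viz/Assignments /spring-2024-a2-implementation-and-hats-keyvanip/us-pop.csv")
manipulated_data = us_pop_data.assign(
    yr_1900=us_pop_data["Male1900"] + us_pop_data["Female1900"],
    yr_2000=us_pop_data["Male2000"] + us_pop_data["Female2000"],
)[["Age", "yr_1900", "yr_2000"]]

# Plotting each year as a line with a semi-transparent filled area beneath it
plt.figure(figsize=(10, 6))
sns.lineplot(x='Age', y='yr_1900', data=manipulated_data, color="grey")
plt.fill_between(manipulated_data['Age'], manipulated_data['yr_1900'], color="grey", alpha=0.5)
sns.lineplot(x='Age', y='yr_2000', data=manipulated_data, color="blue")
plt.fill_between(manipulated_data['Age'], manipulated_data['yr_2000'], color="blue", alpha=0.5)

# Adding annotations (same positions as in the R versions)
plt.text(5, 10000, "Year 1900", horizontalalignment='right', color="grey", fontsize=10)
plt.text(5, 20500, "Year 2000", horizontalalignment='right', color="blue", fontsize=10)
plt.text(10, 1000, "Range of Concern (y <= 3000)", color="red", fontsize=12)
plt.text(55, 5000, "Age = 50\nPopulation = 3000", horizontalalignment='right', color="grey", fontsize=10)
plt.text(90, 5000, "Age = 85\nPopulation = 2950", horizontalalignment='right', color="blue", fontsize=10)

# Shading the range of concern and marking the two crossing ages
plt.axhspan(0, 3000, color='red', alpha=0.2)
plt.plot([50, 50], [0, 3000], color="red", lw=0.5)
plt.plot([85, 85], [0, 2950], color="red", lw=0.5)

# Setting labels and title
plt.xlabel("Age")
plt.ylabel("Population (thousands)")
plt.title("U.S. Population (in thousands) in 1900 and 2000")

plt.show()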
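In the code above, the 2000 area is filled after the 1900 area, and both fills use alpha=0.5, so wherever the two areas overlap their colors blend. One possible way to keep two distinct colors, sketched under assumptions rather than taken from the assignment, is to draw the larger series first with an opaque fill and place the smaller series on top with a higher `zorder`. The helper `plot_layered_areas` and the toy values are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_layered_areas(ages, back, front, back_label, front_label,
                       back_color="blue", front_color="grey"):
    """Fill `back` (the larger series) first, then `front` on top of it.

    Hypothetical helper: opaque fills with explicit zorder keep the two colors
    distinct instead of blending them the way two alpha=0.5 layers do.
    """
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.fill_between(ages, back, color=back_color, zorder=1, label=back_label)
    ax.fill_between(ages, front, color=front_color, zorder=2, label=front_label)
    ax.legend()
    return fig, ax


# Hypothetical toy values; in the assignment these would be yr_2000 and yr_1900
ages = [0, 1, 2, 3]
fig, ax = plot_layered_areas(ages, back=[20, 18, 15, 10], front=[8, 7, 5, 3],
                             back_label="Year 2000", front_label="Year 1900")
```

This only separates the colors cleanly where the front series lies entirely below the back series; where the curves cross, the front fill would hide part of the back one.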

A Short Writeup

Turning my plot sketches into R and Python visualizations took significantly longer than drawing them by hand, but the process produced valuable insights. While coding, I noticed ways to improve the sketches, most notably in the plot annotations, and implemented them in the software versions. I also ran into some unexpected issues. For example, in the third graph I had planned to use two distinct colors, as in my sketch. In the software version, however, the larger blue area (representing the year 2000) overlapped the grey area (representing 1900), so the colors in the rendered plot differed from those in the hand-drawn version. Overall, this experience taught me that sketching by hand before turning to software is highly beneficial: it exposes flaws in the hand-drawn designs early, so they can be fixed before they reach the final software-generated plots.