At what age does the combined male and female population drop below 3,000 (in thousands) in 1900 and in 2000?
Part 1 - Developing Your Conceptual Drawing
Code
# Loading the packages used below (read_csv, %>%, dplyr, tidyr, ggplot2)
library(tidyverse)

# Importing the dataset
us_pop_data <- read_csv("/Users/parsakeyvani/Desktop/Adv Data viz/Assignments /spring-2024-a2-implementation-and-hats-keyvanip/us-pop.csv")

# Manipulating the data to answer my data question:
# combine male and female counts into one total per year
manipulated_data <- us_pop_data %>%
  mutate(yr_1900 = Male1900 + Female1900,
         yr_2000 = Male2000 + Female2000) %>%
  select(Age, yr_1900, yr_2000)

# Importing my custom theme
my_theme <- theme_bw() +
  theme(
    plot.background = element_rect(fill = "white"),
    panel.background = element_rect(fill = "white"),
    axis.title = element_text(size = 12, face = "bold"),
    title = element_text(size = 12, face = "bold"),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank()
  )
Plot 1
Hand-drawn Version
Software Version
Code
# Plotting my first hand-drawn graph
ggplot(manipulated_data, aes(x = Age)) +
  geom_line(aes(y = yr_1900), color = "grey") +
  geom_line(aes(y = yr_2000), color = "blue") +
  labs(title = "U.S. Population (in thousands) in 1900 and 2000",
       x = "Age", y = "Population (thousands)") +
  my_theme +
  # Line labels
  annotate("text", x = 5, y = 10000, label = "Year 1900", hjust = 1.2, color = "grey", size = 2.5) +
  annotate("text", x = 5, y = 20500, label = "Year 2000", hjust = 1.2, color = "blue", size = 2.5) +
  # Shaded band marking populations at or below 3,000 (thousands)
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = 3000, fill = "red", alpha = 0.2) +
  annotate("text", x = 10, y = 1000, label = "Range of Concern (y <= 3000)", size = 3, angle = 0, color = "red") +
  # Markers at the ages where each year enters the band
  annotate("pointrange", x = 50, y = 3000, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("pointrange", x = 85, y = 2950, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("text", x = 55, y = 5000, label = "Age = 50\nPopulation = 3000", hjust = 1.2, color = "grey", size = 3) +
  annotate("text", x = 90, y = 5000, label = "Age = 85\nPopulation = 2950", hjust = 1.2, color = "blue", size = 3)
Plot 2
Hand-drawn Version
Software Version
Code
# Converting the data from wide to long format
long_data <- manipulated_data %>%
  pivot_longer(cols = yr_1900:yr_2000, names_to = "Year", values_to = "Value")

# Plotting the second graph
ggplot(long_data, aes(x = Age, y = Value, fill = Year)) +
  geom_col(position = "dodge") +
  labs(title = "U.S. Population (in thousands) in 1900 and 2000",
       x = "Age", y = "Population (thousands)") +
  scale_fill_manual(values = c("yr_1900" = "grey", "yr_2000" = "lightblue")) +
  # theme_classic() +
  my_theme +
  annotate("rect", xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = 3000, fill = "red", alpha = 0.2) +
  annotate("text", x = 10, y = 1000, label = "Range of Concern (y <= 3000)", size = 3, angle = 0, color = "red") +
  annotate("pointrange", x = 50, y = 3000, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("pointrange", x = 85, y = 2950, ymin = -Inf, ymax = 3000, colour = "red", size = 0.5, linewidth = 0.5) +
  annotate("text", x = 55, y = 5000, label = "Age = 50\nPopulation = 3000", hjust = 1, color = "darkgrey", size = 3) +
  annotate("text", x = 90, y = 5000, label = "Age = 85\nPopulation = 2950", hjust = 1, color = "blue", size = 3)
Plot 3
Hand-drawn Version
Software Version
Code
import seaborn as sns
import matplotlib.pyplot as plt

# Assumes manipulated_data is available in Python as a pandas DataFrame
# with the columns Age, yr_1900 and yr_2000 (the R data frame built above)

# Plotting
plt.figure(figsize=(10, 6))
sns.lineplot(x='Age', y='yr_1900', data=manipulated_data, color="grey")
plt.fill_between(manipulated_data['Age'], manipulated_data['yr_1900'], color="grey", alpha=0.5)
sns.lineplot(x='Age', y='yr_2000', data=manipulated_data, color="blue")
plt.fill_between(manipulated_data['Age'], manipulated_data['yr_2000'], color="blue", alpha=0.5)

# Adding annotations
plt.text(5, 10000, "Year 1900", horizontalalignment='right', color="grey", fontsize=10)
plt.text(5, 20500, "Year 2000", horizontalalignment='right', color="blue", fontsize=10)
plt.text(10, 1000, "Range of Concern (y <= 3000)", color="red", fontsize=12)
plt.text(55, 5000, "Age = 50\nPopulation = 3000", horizontalalignment='right', color="grey", fontsize=10)
plt.text(90, 5000, "Age = 85\nPopulation = 2950", horizontalalignment='right', color="blue", fontsize=10)

# Shaded band for populations up to 3,000 and markers at ages 50 and 85
plt.axhspan(0, 3000, color='red', alpha=0.2)
plt.plot([50, 50], [0, 3000], color="red", lw=0.5)
plt.plot([85, 85], [0, 2950], color="red", lw=0.5)

# Setting labels and title
plt.xlabel("Age")
plt.ylabel("Population (thousands)")
plt.title("U.S. Population (in thousands) in 1900 and 2000")
plt.show()
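The plots above hard-code the marker positions (age 50 for 1900, age 85 for 2000). As a minimal sketch, those ages could instead be looked up from the data. The helper `first_age_below` and the toy numbers below are hypothetical and only illustrate the idea; they are not values from us-pop.csv. The sketch also assumes the ages are numeric.

```python
def first_age_below(ages, values, threshold=3000, inclusive=False):
    """Return the youngest age whose value falls below the threshold.

    With inclusive=True, a value exactly equal to the threshold also counts,
    which matches the "y <= 3000" band drawn in the plots.
    Returns None if no age qualifies.
    """
    for age, value in sorted(zip(ages, values)):
        if value < threshold or (inclusive and value == threshold):
            return age
    return None


# Placeholder numbers for illustration only, NOT taken from us-pop.csv
toy_ages = [0, 5, 10, 15, 20]
toy_values = [9000, 7000, 4000, 2500, 1000]

print(first_age_below(toy_ages, toy_values))
```

With the data frame used in the plots, the call would presumably look like `first_age_below(manipulated_data['Age'], manipulated_data['yr_1900'])`, and likewise for `yr_2000`.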
A Short Writeup
Turning the sketches into R and Python visualizations took significantly longer than drawing them by hand, but the process was worthwhile. Working in software exposed weaknesses in the sketches, mainly in the plot annotations, and I improved these in the software versions. It also produced a surprise. In the third graph, I had envisioned two distinct colors, as in my sketch. In the software version, however, the larger blue area (year 2000) overlaps the grey area (year 1900), so the colors in the rendered plot differ from those in the hand-drawn version. Overall, the exercise taught me that sketching by hand before turning to software is very useful: flaws can be identified and fixed at the sketch stage, which improves the final software-generated plots.
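The color difference in the third graph can be reasoned about with simple alpha compositing. The sketch below is a rough approximation, assuming a white background, the 0.5 opacity used for both filled areas, and matplotlib's named colors "grey" (#808080) and "blue" (#0000FF). The helper `over` is hypothetical, and the sketch ignores the red band and the line strokes drawn on top.

```python
def over(top_rgb, alpha, bottom_rgb):
    """Composite a semi-transparent color over an opaque background color."""
    return tuple(alpha * t + (1 - alpha) * b for t, b in zip(top_rgb, bottom_rgb))


WHITE = (1.0, 1.0, 1.0)
GREY = (128 / 255, 128 / 255, 128 / 255)  # assumed RGB for the named color "grey"
BLUE = (0.0, 0.0, 1.0)                     # assumed RGB for the named color "blue"

grey_only = over(GREY, 0.5, WHITE)     # grey area drawn on the white background
blue_only = over(BLUE, 0.5, WHITE)     # blue area where nothing lies underneath
overlap = over(BLUE, 0.5, grey_only)   # blue area drawn on top of the grey area

print("grey only:", grey_only)
print("blue only:", blue_only)
print("overlap:  ", overlap)
```

Under these assumptions the overlap comes out as a darker, duller blue than the blue-only region, which is consistent with the rendered plot not showing the two clean colors from the sketch.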