I made several design decisions for the graph after considering the data and the most effective way to communicate it:
Variable Selection: I chose “year” and the natural logarithm of “mortality_rate” as the graph’s two key variables. The data summary showed that mortality rates span a wide range of values, which led to the decision to use the natural logarithm of “mortality_rate”. On a logarithmic scale, equal distances correspond to equal ratios, so relative differences are easier to see.
Categorical Encoding: I used the “shape” and “color” aesthetics to represent the “age_group” and “income_group” categories, respectively, in order to accommodate additional dimensions of the data. The structure of the data, which showed that these factors significantly affect death rates, served as the basis for this choice.
Interactive Features: To improve the viewing experience, I used Plotly to create an interactive plot. Hovering over a data point reveals specific information, including the year, income group, and death rate, so the static view is not overloaded with labels. This decision balances data richness with user-friendliness.
Width Adjustment: I made the plot’s width 800 pixels in order to make sure the graph is readable and visible in the stitched file output. This facilitates viewers’ exploration of the data and lessens visual clutter.
In terms of expressing all the information included in the data, the graph successfully conveys several aspects:
Time Trends: Plotting “year” on the x-axis shows how death rates vary over time. This is especially important for understanding how death rates have changed.
Group Comparisons: Distinct shapes and colors make it easy to distinguish and compare the “age_group” and “income_group” categories. Trends and differences between these groups are easily discernible to viewers.
Logarithmic Scaling: Using the natural logarithm of “mortality_rate” condenses the data range and makes variation across the whole dataset easier to observe.
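To illustrate why the log scale helps, here is a minimal sketch in R. The values in `rates` are hypothetical and are not taken from the dataset; they only show that a tenfold increase becomes the same step size after taking the natural logarithm.

```r
# Hypothetical mortality rates spanning several orders of magnitude
# (illustrative values only, not taken from the dataset).
rates <- c(2, 20, 200, 2000)

# On the raw scale, the gap between neighbours grows with the values.
raw_gaps <- diff(rates)

# On the natural-log scale, every tenfold increase is the same step, log(10).
log_gaps <- diff(log(rates))

print(raw_gaps)
print(log_gaps)
```

On the raw scale the smallest values would be squeezed together near the axis, while on the log scale each group occupies a comparable amount of vertical space.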
However, there is still room for improvement if the goal is a more thorough understanding of the data. A more detailed representation could use facets to separate the data by “age_group” and “income_group”, error bars to show uncertainty, and summary statistics to give a fuller picture of the distribution. These extra components would add information to the graph.
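The improvements described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `toy` data frame is made up (its group labels and values are hypothetical, and only the column names follow the plotting code), and the error bars show the mean plus or minus one standard error via ggplot2's `mean_se`, which may not be the most appropriate uncertainty measure for the real data.

```r
library(ggplot2)

# Hypothetical stand-in for the mortality table: column names follow the
# plotting code, but group labels and values are made up for illustration.
set.seed(1)
toy <- expand.grid(
  year = c(1990, 2005, 2010),
  age_group = c("Younger", "Older"),
  income_group = c("High", "Low and middle"),
  replicate = 1:5
)
toy$mortality_rate <- exp(rnorm(nrow(toy), mean = 4, sd = 0.5))

faceted_plot <- ggplot(toy, aes(x = year, y = log(mortality_rate))) +
  # Raw observations, jittered horizontally only
  geom_jitter(width = 0.5, height = 0, alpha = 0.4) +
  # Summary statistic: the mean of the logged rates per year
  stat_summary(fun = mean, geom = "point", size = 2) +
  # Error bars: mean plus or minus one standard error
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 1) +
  # One panel per age group and income group combination
  facet_grid(age_group ~ income_group) +
  labs(x = "Year", y = "log(mortality rate)")

faceted_plot
```

Faceting frees the shape and color aesthetics, so each panel can be read on its own without decoding a legend.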
Code:
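library(ggplot2)
library(plotly)

# Load the mortality table
data <- read.csv("feigin2014_table1_mortality.csv")

# Log mortality rate by year; shape encodes age group, color encodes income group.
# The text aesthetic builds the hover label so it can show the untransformed rate.
interactive_plot <- ggplot(data, aes(x = year, y = log(mortality_rate),
                                     shape = age_group, color = income_group,
                                     text = paste0("year: ", year,
                                                   "<br>income_group: ", income_group,
                                                   "<br>mortality_rate: ", mortality_rate))) +
  geom_jitter() +
  labs(x = "Year", y = "log(Mortality Rate)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Interactive Plot of Mortality Rate")

# Convert to an interactive Plotly widget, 800 pixels wide
interactive_plotly <- ggplotly(interactive_plot, tooltip = "text", width = 800)
interactive_plotly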