This graph, taken from the Our World in Data website, is a line graph of the share of the population using the internet. The x-axis shows the years from 1990 to 2022, while the y-axis shows the percentage of the population using the internet. Each line represents a region of the world, and the graph shows a positive relationship between year and the percentage of the population using the internet. One strength of this graph is that it uses a line graph rather than a scatter plot with lines of best fit. This is beneficial because it shows how the percentage changes from year to year instead of only the overall trend. Additionally, the graph labels each line directly instead of using a legend, so it is easier to tell which line represents which region. One potential weakness is the large number of regions, which makes the graph harder to process at a glance. Overall, the graph does a decent job of visualizing the data.
library(tidyverse)
library(ggrepel)

# Assumes df is already loaded with the columns Entity (region), Year,
# and it_net_user_zs (% of the population using the internet).

# One label per region, positioned at that region's most recent data point
text_df <- df %>%
  group_by(Entity) %>%
  filter(Year == max(Year)) %>%
  ungroup()

ggplot(df, aes(x = Year, y = it_net_user_zs, color = Entity)) +
  geom_line() +
  geom_point() +
  # Direct labels to the right of each line instead of a legend
  geom_text_repel(data = text_df,
                  aes(label = Entity),
                  box.padding = 0.5,
                  max.overlaps = Inf,
                  direction = "y",
                  nudge_x = 12,
                  size = 3) +
  labs(title = "Share of the population using the internet",
       subtitle = "Share of the population who used the Internet in the last three months.",
       y = "", x = "") +
  theme_light() +
  theme(legend.position = "none",
        panel.grid.minor = element_blank(),
        panel.grid.major.x = element_blank()) +
  scale_y_continuous(breaks = c(0, 20, 40, 60, 80, 100),
                     labels = c("0%", "20%", "40%", "60%", "80%", "100%"),
                     limits = c(0, 103)) +
  # Extend the x-axis past the data to leave room for the labels
  scale_x_continuous(limits = c(1990, 2040),
                     breaks = c(1990, 1995, 2000, 2005, 2010, 2015, 2022))
While the current graph has some strengths, several of its layers can be improved. First, the original graph uses colors that may not be accessible to color-blind readers, so we switched to the viridis palette, which is designed to remain distinguishable for people with common forms of color blindness. Second, we mapped point shape to region so that regions are easier to tell apart where lines overlap. Third, to make the presentation clearer, we added x- and y-axis labels. Finally, we added vertical grid lines to the background so that it is easier to trace each data point back to its year.
library(tidyverse)
library(ggrepel)

# One label per region, positioned at that region's most recent data point
text_df <- df %>%
  group_by(Entity) %>%
  filter(Year == max(Year)) %>%
  ungroup()

ggplot(df, aes(x = Year, y = it_net_user_zs, color = Entity, shape = Entity)) +
  geom_line() +
  geom_point() +
  geom_text_repel(data = text_df,
                  aes(label = Entity),
                  box.padding = 0.5,
                  max.overlaps = Inf,
                  direction = "y",
                  nudge_x = 12,
                  size = 3) +
  labs(title = "Share of the population using the internet",
       subtitle = "Share of the population who used the Internet in the last three months.",
       y = "Population using the internet (%)", x = "Year") +
  # Color-blind-friendly discrete palette
  scale_color_viridis_d() +
  # The default shape palette only handles 6 groups and drops points beyond
  # that, so supply enough shapes manually in case there are more regions
  scale_shape_manual(values = 0:14) +
  theme_light() +
  # Keep the major vertical grid lines (only minor grid lines are removed)
  theme(legend.position = "none",
        panel.grid.minor = element_blank()) +
  scale_y_continuous(breaks = c(0, 20, 40, 60, 80, 100),
                     labels = c("0%", "20%", "40%", "60%", "80%", "100%"),
                     limits = c(0, 103)) +
  scale_x_continuous(limits = c(1990, 2040),
                     breaks = c(1990, 1995, 2000, 2005, 2010, 2015, 2022))