library(gapminder)
library(ggplot2)
library(plotly)
library(dplyr)
# Average GDP per capita, life expectancy, and population for each country
# across all survey years (1952-2007). Grouping by continent as well keeps that
# column in the result, since each country belongs to a single continent.
by_country <- gapminder %>%
  group_by(country, continent) %>%
  summarise(avg_gdp = mean(gdpPercap),
            avg_life = mean(lifeExp),
            avg_pop = mean(pop),
            .groups = "drop")
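A quick way to confirm that the summary has exactly one row per country is sketched below. It is a self-contained check, not part of the original analysis, that rebuilds the summary with `group_by(country, continent)` on the assumption that each gapminder country is listed under a single continent; `stopifnot()` stops with an error if any of the conditions fails.

```r
library(gapminder)
library(dplyr)

# Rebuild the per-country averages so this check runs on its own
by_country <- gapminder %>%
  group_by(country, continent) %>%
  summarise(avg_gdp = mean(gdpPercap),
            avg_life = mean(lifeExp),
            avg_pop = mean(pop),
            .groups = "drop")

# One row per country, no repeated countries, and no missing continents
stopifnot(
  nrow(by_country) == n_distinct(gapminder$country),
  anyDuplicated(by_country$country) == 0,
  !anyNA(by_country$continent)
)
nrow(by_country)
```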
This first plot is a basic scatter plot of average life expectancy against average GDP per capita for each country over 1952-2007. My project asks where the ‘best’ place in the world to live is, and GDP per capita and life expectancy are two of the factors I want to consider, so this data is one piece of that answer.
# Basic scatter plot; shape = 17 draws filled triangles
plot1 <- ggplot(by_country, aes(x = avg_gdp, y = avg_life)) +
  geom_point(shape = 17) +
  ggtitle("Average GDP per Capita vs Average Life Expectancy (1952-2007)")
plot1
For this plot I added a trend line. The points seem to follow a pattern, even if the relationship is not linear, so I fit a trend line to see whether there is a calculable relationship and whether it gives more insight into potential trends between GDP and life expectancy.
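# Same scatter plot with a loess trend line; method and formula are stated
# explicitly (they match geom_smooth()'s default for fewer than 1,000 points)
plot2 <- ggplot(by_country, aes(x = avg_gdp, y = avg_life)) +
  geom_point(shape = 17) +
  geom_smooth(method = "loess", formula = y ~ x, se = FALSE) +
  ggtitle("Average GDP per Capita vs Average Life Expectancy (1952-2007)")
plot2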
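One way to look for a more calculable relationship is to compare the correlation on the raw GDP scale with the correlation on a log10 GDP scale, and to redraw the plot with `scale_x_log10()` and a straight-line (`lm`) fit. This is a sketch of a technique that the original analysis does not use, and the log transform is an assumption worth testing rather than a known result for these data. Pearson correlation measures linear association, so whichever scale gives the stronger correlation is the one on which the points lie closer to a straight line. The name `plot2_log` is illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

by_country <- gapminder %>%
  group_by(country, continent) %>%
  summarise(avg_gdp = mean(gdpPercap), avg_life = mean(lifeExp),
            avg_pop = mean(pop), .groups = "drop")

# Pearson correlation on the raw and on the log10 GDP scale
r_raw <- cor(by_country$avg_gdp, by_country$avg_life)
r_log <- cor(log10(by_country$avg_gdp), by_country$avg_life)
print(c(raw = r_raw, log10 = r_log))

# Same scatter on a log10 x axis; the lm line is fit on the transformed scale
plot2_log <- ggplot(by_country, aes(x = avg_gdp, y = avg_life)) +
  geom_point(shape = 17) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_log10() +
  ggtitle("Average GDP per Capita (log scale) vs Average Life Expectancy (1952-2007)")
plot2_log
```

Because ggplot2 computes statistics after the scale transformation, the straight line here corresponds to life expectancy modeled against log10 of GDP per capita.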
Because the trend line did not give me much more insight, I turned to a different aesthetic approach. The aesthetics I changed here were transparency, point size, and shape. Many of the points were clumped together, so I made each point more transparent to make the clusters easier to perceive, and I increased the point size to aid perception. I also switched back to the default circular points, which I think fit the look of the plot better.
# Larger, semi-transparent default (circular) points so overlapping clusters show up
plot3 <- ggplot(by_country, aes(x = avg_gdp, y = avg_life)) +
  geom_point(size = 3, alpha = .3) +
  ggtitle("Average GDP per Capita vs Average Life Expectancy (1952-2007)")
plot3
In this plot I changed the axis titles, the main title, and the color aesthetic. My aim is to show potential differences in the data across continents, so every point is now colored by its continent. The new title and axis labels are meant to make the intended data story clearer: that GDP per capita and life expectancy are factors in a country's quality of life.
# Color each point by continent and relabel the title and axes
plot4 <- ggplot(by_country, aes(x = avg_gdp, y = avg_life, color = continent)) +
  geom_point(size = 3, alpha = .3) +
  ggtitle("Average Global Quality of Life Factors (1952-2007)") +
  labs(x = "Average GDP per Capita (1952-2007)",
       y = "Average Life Expectancy (1952-2007)")
plot4 + scale_color_manual(values = c("green", "blue", "purple", "red", "yellow"))
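`scale_color_manual()` assigns an unnamed vector of colors to the factor levels in order, so the colors above follow the order of `levels(continent)` (Africa, Americas, Asia, Europe, Oceania). A sketch of a more explicit alternative, assuming the same color choices, names each color after its continent so the mapping does not depend on level order. The names `continent_colors` and `plot4_named` are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

by_country <- gapminder %>%
  group_by(country, continent) %>%
  summarise(avg_gdp = mean(gdpPercap), avg_life = mean(lifeExp),
            avg_pop = mean(pop), .groups = "drop")

# Named vector: each color is tied to a continent by name, not by position
continent_colors <- c(Africa = "green", Americas = "blue", Asia = "purple",
                      Europe = "red", Oceania = "yellow")

plot4_named <- ggplot(by_country, aes(x = avg_gdp, y = avg_life, color = continent)) +
  geom_point(size = 3, alpha = .3) +
  scale_color_manual(values = continent_colors) +
  labs(title = "Average Global Quality of Life Factors (1952-2007)",
       x = "Average GDP per Capita (1952-2007)",
       y = "Average Life Expectancy (1952-2007)",
       color = "Continent")
plot4_named
```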
This plot changes the size aesthetic: each point is now sized by the nation's average population, which adds context to the plotted data (the larger the population, the larger the point). I also added a theme change that moves the title, and I modified the legend labels to capitalize ‘Continent’ and to label the size scale with average population.
# Map point size to average population (in units of 10,000 people)
plot5 <- ggplot(by_country, aes(x = avg_gdp, y = avg_life,
                                size = avg_pop / 10000, color = continent)) +
  geom_point(alpha = .3) +
  ggtitle("Average Global Quality of Life Factors (1952-2007)") +
  labs(x = "Average GDP per Capita (1952-2007)",
       y = "Average Life Expectancy (1952-2007)") +
  # hjust = .3 places the title left of center (0.5 would center it)
  theme(plot.title = element_text(hjust = .3))
plot5 + scale_color_manual(values = c("green", "blue", "purple", "red", "yellow")) +
  labs(color = "Continent", size = "Avg. Population (10,000s)")
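Dividing `avg_pop` by 10,000 does not change the relative point sizes, because `scale_size()` rescales whatever values it receives onto its output range; it only changes the numbers printed in the legend. A sketch of an alternative (an assumption, not the original code) maps the raw average population and formats the legend with `scales::comma`, so the legend reads in people. The `scales` package is installed as a dependency of ggplot2; the name `plot5_pop` is illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

by_country <- gapminder %>%
  group_by(country, continent) %>%
  summarise(avg_gdp = mean(gdpPercap), avg_life = mean(lifeExp),
            avg_pop = mean(pop), .groups = "drop")

plot5_pop <- ggplot(by_country, aes(x = avg_gdp, y = avg_life,
                                    size = avg_pop, color = continent)) +
  geom_point(alpha = .3) +
  # scale_size() maps values to point area; comma labels add thousands separators
  scale_size(labels = scales::comma) +
  scale_color_manual(values = c("green", "blue", "purple", "red", "yellow")) +
  labs(title = "Average Global Quality of Life Factors (1952-2007)",
       x = "Average GDP per Capita (1952-2007)",
       y = "Average Life Expectancy (1952-2007)",
       color = "Continent", size = "Avg. Population") +
  theme(plot.title = element_text(hjust = .3))
plot5_pop
```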
My final modification makes the plot interactive. I carried all the aesthetic features from the previous plots over into an interactive plotly plot. Hovering over a point shows which country it corresponds to and that country's average population, which helps display global trends by region. As in the previous plot, points are sized by population, but in the interactive plot a user can zoom in on any area, which solves the problem of overlapping points.
# Continent colors, in the same order as plots 4 and 5
col <- c("green", "blue", "purple", "red", "yellow")
# Population rescaled to units of 10,000 for marker size
by_country$popred <- by_country$avg_pop / 10000
# Interactive plotly version; hovering shows the country and its average population
plot6 <- plot_ly(by_country,
                 x = ~avg_gdp,
                 y = ~avg_life,
                 type = "scatter",
                 mode = "markers",
                 color = ~continent,
                 colors = col,
                 size = ~popred,
                 marker = list(opacity = .3),
                 text = ~paste("Country: ", country,
                               "<br>Average Population: ",
                               formatC(round(avg_pop), format = "d", big.mark = ",")),
                 hoverinfo = "text") %>%
  layout(title = "Average Global Quality of Life Factors (1952-2007)",
         xaxis = list(title = "Avg. GDP per Capita (1952-2007)"),
         yaxis = list(title = "Avg. Life Expectancy (1952-2007)"))
plot6
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
## Warning: `line.width` does not currently support multiple values.
Plot 6 is easily the best of the six plots. It tells the most cohesive story and aligns most closely with my goal: summarizing different factors that contribute to quality of life. The points themselves show each country's average GDP per capita and average life expectancy over several decades, and plotting countries against one another makes relative quality of life around the world easier to perceive. The additional aesthetics (continent by color, size by population, and hover text showing each country and its average population) all contribute to the data story by adding context. A country's region, together with its GDP, hints at its stage of economic development. Population mostly provides context for scale. Being able to identify the specific country behind each point lets the viewer bring in even more context, whether historical events or a unique economic history. This plot also follows important principles we have discussed: displaying the data, starting with gray, reducing clutter, and integrating text into the plot.