Buttin Module 15

Author

Camille Buttin

Welcome to my lab report! Let’s get started by loading in the needed packages (which are already installed):

library(ggplot2)
library(ggpubr)

Warning: package 'ggpubr' was built under R version 4.5.1

library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(ggExtra)

Now, let’s load in the GapMinder data and keep only the observations from 2007, leaving out Oceania:

gap <- read.csv("gapminderData5.csv")
str(gap)

'data.frame':   1704 obs. of  6 variables:
 $ country  : chr  "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
 $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
 $ pop      : num  8425333 9240934 10267083 11537966 13079460 ...
 $ continent: chr  "Asia" "Asia" "Asia" "Asia" ...
 $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
 $ gdpPercap: num  779 821 853 836 740 ...

gap07 <- gap %>% 
  filter(year == 2007 & continent != "Oceania")
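For anyone following along without dplyr, the same kind of subset can be sketched in base R. The tiny data frame below is made up for illustration (it is not the GapMinder file), and `toy` and `toy07` are hypothetical names:

```r
# Made-up rows for illustration only; not the real GapMinder data
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  year      = c(2002, 2007, 2007, 2007),
  continent = c("Asia", "Asia", "Oceania", "Africa")
)

# Keep rows from 2007 that are not in Oceania
toy07 <- toy[toy$year == 2007 & toy$continent != "Oceania", ]
toy07$country
```

Both conditions have to hold for a row to be kept, which is what the `&` in the filter() call above does as well.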

Scatter Plots

Here, we will generate scatter plots with two different packages:

Scatter Plots with ggplot2

ggplot(gap07, aes(x = gdpPercap, y = lifeExp, col = continent)) + 
  geom_point() + 
  scale_x_log10("GDP per capita ($)") + 
  scale_y_continuous("Life Expectancy (yrs)") + 
  ggtitle("GapMinder Data 2007")

Now, I will do the same thing with ggpubr:

Scatter Plots with ggpubr

ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)", 
          main = "GapMinder Data 2007") + 
  xscale("log10", .format = TRUE)

As the lab details, we can also add labels for each of our data points:

ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          label = "country", repel = TRUE) + 
  xscale("log10", .format = TRUE)

This looks a bit chaotic, though. We can also select just a few points to label:

sel_countries <- c("United States", "China", "Germany")
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          label = "country", label.select = sel_countries, repel = TRUE) +
  xscale("log10", .format = TRUE)

There are a variety of other ways to show the distribution of these points as well, including a marginal histogram:

p <- ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent") + 
  xscale("log10", .format = TRUE)
ggMarginal(p, type = "histogram")

We can add a regression line:

ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          add = "reg.line", conf.int = TRUE) + 
  xscale("log10", .format = TRUE)

And, get this, we can add correlations too:

ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)", 
          main = "GapMinder Data 2007", add = "reg.line", conf.int = TRUE) + 
  xscale("log10", .format = TRUE) +
  stat_cor(aes(color = continent), method = "spearman")
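Because method = "spearman" works on ranks, the coefficient should be the same whether GDP is on the raw or the log10 scale, while Pearson's r changes with the transform. Here is a quick base-R check of that idea, using made-up numbers rather than the GapMinder values (`gdp` and `life` are hypothetical names):

```r
# Made-up values for illustration only
gdp  <- c(500, 1200, 3000, 9000, 25000, 40000)
life <- c(45, 58, 52, 70, 76, 74)

# Spearman uses ranks, so a monotone transform of x leaves it unchanged
rho_raw <- cor(gdp, life, method = "spearman")
rho_log <- cor(log10(gdp), life, method = "spearman")
all.equal(rho_raw, rho_log)

# Pearson depends on the actual values, so it changes with the transform
r_raw <- cor(gdp, life, method = "pearson")
r_log <- cor(log10(gdp), life, method = "pearson")
c(raw = r_raw, log10 = r_log)
```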

We can even add the regression line equations!

ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)", 
          main = "GapMinder Data 2007", add = "reg.line", conf.int = TRUE) + 
  xscale("log10", .format = TRUE) +
  stat_regline_equation(aes(color = continent))
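One thing to keep in mind when reading these equations: ggplot2 applies scale transformations before it computes statistics, so with a log10 x axis the fitted lines should describe life expectancy as a linear function of log10(GDP), not of raw GDP. Here is a minimal base-R sketch of that kind of fit on made-up numbers (the equation string is my own formatting, not ggpubr's):

```r
# Made-up values for illustration only
gdp  <- c(500, 1200, 3000, 9000, 25000, 40000)
life <- c(45, 58, 52, 70, 76, 74)

# Fit life expectancy against log10(GDP), matching a log10 x axis
fit <- lm(life ~ log10(gdp))
b <- unname(coef(fit))

# Build an equation string similar in spirit to the plot annotation
sprintf("y = %.1f + %.1f * log10(x)", b[1], b[2])
```

The slope here is the change in life expectancy for each tenfold increase in GDP per capita.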

Histograms

Now, let’s move to histograms. These graphs are generated with gghistogram(). See below for an example with the distribution of life expectancy values (using the fill argument to separate the continents):

gghistogram(gap07, x = "lifeExp", fill = "continent", main = "GapMinder Life Expectancy")

Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.
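The warning goes away once a bin count is chosen, for example by passing the bins argument to gghistogram(). To see what a bin count means, this base-R sketch splits made-up life expectancy values into 5 equal-width bins; ggplot2's exact bin edges may differ from this simple version:

```r
# Made-up values for illustration only
life <- c(42, 47, 51, 55, 58, 63, 66, 70, 72, 74, 76, 79, 80, 82)

n_bins <- 5
# Equal-width breaks spanning the data range
breaks <- seq(min(life), max(life), length.out = n_bins + 1)
bin_width <- diff(breaks)[1]

# Count the observations that fall in each bin
counts <- table(cut(life, breaks = breaks, include.lowest = TRUE))
counts
```

Fewer bins give wider bars and a smoother shape; more bins show finer detail but get noisy with only about 140 countries.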

Palettes

We can also remake these figures using different color palettes:

gghistogram(gap07, x = "lifeExp", fill = "continent", 
            main = "GapMinder Life Expectancy", palette = "npg")

Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.

Density Plots

As the lab details, density plots are a great alternative to histograms. See below (with some helpful additions):

ggdensity(gap07, x = "lifeExp", fill = "continent",
          main = "GapMinder Life Expectancy", palette = "jco",
          facet.by = "continent", add = "median", rug = TRUE)

Violin Plots

Aren’t these funky? They are an alternative to density plots: each violin is a mirrored density curve, drawn once per group. Let’s see an example with some fun, helpful additions:

ggviolin(gap07, x = "continent", y = "lifeExp", 
         fill = "continent", palette = "jco", 
         add = c("boxplot", "jitter"), ylab = "Life Expectancy (yrs)")

Horizontal Violin Plots

By adding the simple rotate argument, you can make your violins horizontal:

ggviolin(gap07, x = "continent", y = "lifeExp", 
         fill = "continent", palette = "jco", 
         add = c("boxplot", "jitter"), ylab = "Life Expectancy (yrs)", rotate = TRUE)

Isn’t that something?

Bar Plots

Check out this colorful bar plot by country! It includes a fill for each continent, rotated country labels and reduced font sizes, and axis labels:

ggbarplot(gap07, x = "country", y = "lifeExp", fill = "continent",
          palette = "jco", x.text.angle = 90,
          ylab = "Life Expectancy (yrs)", xlab = "Country") +
  font("x.text", size = 4)

Dot Plots

Dot plots are nifty alternatives to bar plots. Note (as the lab says) that adding segments forces the origin to zero:

ggdotchart(gap07,
           x = "country",
           y = "lifeExp",
           color = "continent",
           palette = "jco",
           sorting = "descending",
           rotate = TRUE,
           group = "continent",
           add = "segments",
           ylab = "Life expectancy (yrs)",
           xlab = "Country") +
  font("y.text", size = 4)

Group Comparisons

Here, I will make a new subset of the gap data (including only African and Asian countries for three of the years: 1957, 1982, and 2007), and make a boxplot of the life expectancy values for the two continents, colored by continent, with jittered observations overlaid, and a t-test to compare the means:

gap_sub <- gap %>% 
  filter(continent %in% c("Asia", "Africa"), year %in% c(1957, 1982, 2007))

ggboxplot(gap_sub, x = "continent", y = "lifeExp", ylab = "Years", col = "continent", add = "jitter") + 
  stat_compare_means(method = "t.test", label.y = 90)
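The p-value printed on the plot comes from a two-sample t-test. Here is a minimal base-R sketch of the same kind of test, using made-up values; t.test() defaults to Welch's unequal-variance version, and I am assuming stat_compare_means() relies on that same default:

```r
# Made-up values for illustration only
toy <- data.frame(
  continent = rep(c("Africa", "Asia"), each = 6),
  lifeExp   = c(41, 45, 50, 52, 56, 60, 48, 55, 61, 66, 70, 74)
)

# Welch two-sample t-test of mean life expectancy by continent
res <- t.test(lifeExp ~ continent, data = toy)
res$estimate   # group means
res$p.value    # p-value for the difference in means
```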

Multiple Comparisons

Finally, we can use these plots to compare multiple groups. First, we will make a list containing all the pairs of comparisons we want to test:

comps <- list(c("1957", "1982"), c("1957", "2007"), c("1982", "2007"))
comps

[[1]]
[1] "1957" "1982"

[[2]]
[1] "1957" "2007"

[[3]]
[1] "1982" "2007"

Then, we can make a boxplot using multiple stat_compare_means() calls, where we ask for t-tests between all the comparisons in the list comps and an ANOVA:

ggboxplot(gap_sub, x = "year", y = "lifeExp") + 
  stat_compare_means(method = "t.test", comparisons = comps, bracket.size = 0.6, size = 4) + 
  stat_compare_means(label.y = 110, method = "anova")
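Running several t-tests on the same data raises the chance of a false positive, so it can help to adjust the pairwise p-values. The base-R sketch below repeats the idea on made-up values: one t-test per pair in a comps-style list, a Holm adjustment (my own addition, not part of the plot above), and a one-way ANOVA. The names `toy`, `p_raw`, `p_adj`, and `anova_p` are hypothetical:

```r
# Made-up values for illustration only
toy <- data.frame(
  year    = factor(rep(c("1957", "1982", "2007"), each = 5)),
  lifeExp = c(40, 43, 47, 50, 55, 52, 56, 60, 63, 67, 58, 64, 69, 72, 77)
)
comps <- list(c("1957", "1982"), c("1957", "2007"), c("1982", "2007"))

# One Welch t-test per pair of years
p_raw <- sapply(comps, function(pair) {
  t.test(toy$lifeExp[toy$year == pair[1]],
         toy$lifeExp[toy$year == pair[2]])$p.value
})
names(p_raw) <- sapply(comps, paste, collapse = " vs ")

# Holm adjustment for the three comparisons
p_adj <- p.adjust(p_raw, method = "holm")

# One-way ANOVA across all three years
anova_p <- summary(aov(lifeExp ~ year, data = toy))[[1]][["Pr(>F)"]][1]
```

The adjusted p-values are never smaller than the raw ones, so a comparison that stays significant after adjustment is on firmer ground.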

Thanks for reading!