Welcome to my lab report! Let’s get started by loading in the needed packages (which are already installed):
library(ggplot2)
library(ggpubr)
Warning: package 'ggpubr' was built under R version 4.5.1
library(dplyr)
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
library(ggExtra)
Now, let’s load in the GapMinder data and keep only the observations for 2007, leaving out Oceania:
gap <- read.csv("gapminderData5.csv")
str(gap)
'data.frame': 1704 obs. of 6 variables:
$ country : chr "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
$ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
$ pop : num 8425333 9240934 10267083 11537966 13079460 ...
$ continent: chr "Asia" "Asia" "Asia" "Asia" ...
$ lifeExp : num 28.8 30.3 32 34 36.1 ...
$ gdpPercap: num 779 821 853 836 740 ...
gap07 <- gap %>%
  filter(year == 2007 & continent != "Oceania")
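A quick sanity check can confirm that the filter did what we wanted. This is only a sketch, and it assumes the gap07 object created above is in the workspace:

```r
# Assumes gap07 was created by the filter step above.
# Every remaining row should be from 2007 ...
stopifnot(all(gap07$year == 2007))
# ... and Oceania should no longer appear.
stopifnot(!("Oceania" %in% gap07$continent))

# Count how many countries are left per continent.
table(gap07$continent)
```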
Scatter Plots
Here, we will generate scatter plots with two different packages:
Scatter Plots with ggplot2
ggplot(gap07, aes(x = gdpPercap, y = lifeExp, col = continent)) +
  geom_point() +
  scale_x_log10("GDP per capita ($)") +
  scale_y_continuous("Life Expectancy (yrs)") +
  ggtitle("GapMinder Data 2007")
Now, I will do the same thing with ggpubr:
Scatter Plots with ggpubr
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007") +
  xscale("log10", .format = TRUE)
As the lab details, we can also add labels for each of our data points:
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          label = "country", repel = TRUE) +
  xscale("log10", .format = TRUE)
This looks a bit chaotic, though. We can instead label only a few selected points:
sel_countries <- c("United States", "China", "Germany")
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          label = "country", label.select = sel_countries, repel = TRUE) +
  xscale("log10", .format = TRUE)
There are a variety of other ways to show the distribution of these points as well, including a marginal histogram:
p <- ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent") +
  xscale("log10", .format = TRUE)
ggMarginal(p, type = "histogram")
We can add a regression line:
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          add = "reg.line", conf.int = TRUE) +
  xscale("log10", .format = TRUE)
And, get this, we can add correlations too (here, Spearman rank correlations computed separately for each continent):
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          add = "reg.line", conf.int = TRUE) +
  xscale("log10", .format = TRUE) +
  stat_cor(aes(color = continent), method = "spearman")
We can even add the regression line equations!
ggscatter(gap07, x = "gdpPercap", y = "lifeExp", col = "continent",
          xlab = "GDP per capita ($)", ylab = "Life expectancy (yrs)",
          main = "GapMinder Data 2007",
          add = "reg.line", conf.int = TRUE) +
  xscale("log10", .format = TRUE) +
  stat_regline_equation(aes(color = continent))
Histograms
Now, let’s move to histograms. These graphs can be generated by using the command: gghistogram(). See below for an example with the distribution of life expectancy values (using the fill argument to separate the continents):
gghistogram(gap07, x = "lifeExp", fill = "continent",
            main = "GapMinder Life Expectancy")
Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.
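The warning above appears because no bin count was given, so the default of 30 is used. A minimal sketch of one way to address it, assuming the gap07 subset from above is in the workspace; the value 15 is an arbitrary illustration, not a number taken from the lab:

```r
library(ggpubr)

# Assumes gap07 (the 2007 subset created earlier) is in the workspace.
# bins = 15 is an arbitrary choice for illustration; try a few values
# and keep the one that shows the shape of the distribution best.
p_hist <- gghistogram(gap07, x = "lifeExp", fill = "continent",
                      bins = 15,
                      main = "GapMinder Life Expectancy")
p_hist
```

Setting bins (or binwidth) explicitly also makes the figure reproducible if the default ever changes.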
Palettes
We can also remake these figures using different color palettes:
gghistogram(gap07, x = "lifeExp", fill = "continent",
            main = "GapMinder Life Expectancy", palette = "npg")
Warning: Using `bins = 30` by default. Pick better value with the argument
`bins`.
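Besides named palettes such as "npg" and "jco", the palette argument also accepts a vector of custom colors. A small sketch, assuming gap07 from above; the object name my_cols and the four hex colors are my own arbitrary picks (one per continent left after dropping Oceania):

```r
library(ggpubr)

# Assumes gap07 (the 2007 subset created earlier) is in the workspace.
# my_cols is a hypothetical name; the colors are arbitrary examples,
# one for each of the four continents remaining in gap07.
my_cols <- c("#1b9e77", "#d95f02", "#7570b3", "#e7298a")

p_pal <- gghistogram(gap07, x = "lifeExp", fill = "continent",
                     bins = 15,  # arbitrary bin count for illustration
                     main = "GapMinder Life Expectancy",
                     palette = my_cols)
p_pal
```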
Density Plots
As the lab details, density plots are a great alternative to histograms. See below (with some helpful additions):
ggdensity(gap07, x = "lifeExp", fill = "continent",
          main = "GapMinder Life Expectancy", palette = "jco",
          facet.by = "continent", add = "median", rug = TRUE)
Violin Plots
Aren’t these funky? They are an alternative to density plots (each violin is a mirrored density curve). Let’s see an example with some fun, helpful additions:
ggviolin(gap07, x = "continent", y = "lifeExp", fill = "continent",
         palette = "jco", add = c("boxplot", "jitter"),
         ylab = "Life Expectancy (yrs)")
Horizontal Violin Plots
By adding the simple rotate argument, you can make your violins horizontal:
ggviolin(gap07, x = "continent", y = "lifeExp", fill = "continent",
         palette = "jco", add = c("boxplot", "jitter"),
         ylab = "Life Expectancy (yrs)", rotate = TRUE)
Isn’t that something?
Bar Plots
Check out this colorful bar plot by country! It includes a fill for each continent, rotated country labels and reduced font sizes, and axis labels:
ggbarplot(gap07, x = "country", y = "lifeExp", fill = "continent",
          palette = "jco", x.text.angle = 90,
          ylab = "Life Expectancy (yrs)", xlab = "Country") +
  font("x.text", size = 4)
Dot Plots
Dot plots are nifty alternatives to bar plots. Note (as the lab says) that adding segments forces the origin to zero.
Here, I will make a new subset of the gap data (including only African and Asian countries for three of the years), and make a boxplot of the life expectancy values for the two continents, colored by continent, with jittered observations overlaid, and a t-test to compare the means:
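As a rough sketch of what such a dot plot could look like, here is one built with ggdotchart() from ggpubr. It assumes gap07 from above is in the workspace; the sorting order and font size are my own choices for illustration rather than settings taken from the lab:

```r
library(ggpubr)

# Assumes gap07 (the 2007 subset created earlier) is in the workspace.
# add = "segments" draws a line from zero to each dot, which is why
# the value axis starts at the origin.
p_dot <- ggdotchart(gap07, x = "country", y = "lifeExp",
                    color = "continent", palette = "jco",
                    sorting = "descending",   # illustrative choice
                    add = "segments",
                    ylab = "Life Expectancy (yrs)", xlab = "Country") +
  font("x.text", size = 4)                   # shrink crowded labels
p_dot
```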
gap_sub <- gap %>%
  filter(continent %in% c("Asia", "Africa"), year %in% c(1957, 1982, 2007))
ggboxplot(gap_sub, x = "continent", y = "lifeExp", ylab = "Years",
          col = "continent", add = "jitter") +
  stat_compare_means(method = "t.test", label.y = 90)
Multiple Comparisons
Finally, we can use these plots to compare multiple groups. First, we will make a list containing all the pairs of comparisons we want to test:
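One way to build that list is sketched below. It assumes the three years kept in gap_sub (1957, 1982, 2007) are the groups being compared, and it stores the result under the name comps used in the boxplot call:

```r
# Each element is one pair of groups to compare. The years are written
# as strings so they match the group labels on the plot's x axis.
comps <- list(c("1957", "1982"),
              c("1957", "2007"),
              c("1982", "2007"))
```

With three groups there are three possible pairs, so this list covers every pairwise comparison.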
Then, we can make a boxplot using multiple stat_compare_means() calls: one asks for t-tests between all the comparisons in the list comps, and the other runs an ANOVA across all the years:
ggboxplot(gap_sub, x = "year", y = "lifeExp") +
  stat_compare_means(method = "t.test", comparisons = comps,
                     bracket.size = 0.6, size = 4) +
  stat_compare_means(label.y = 110, method = "anova")