Data Visualization using ggplot2 and dplyr in R

Task One: Import packages & dataset

In this task, we will load the required packages and dataset into the R workspace. We will also explore the dataset.

1.1: Load the required packages

library(gapminder)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

1.2: Look at the gapminder dataset

library(gapminder)
library(dplyr)
library(ggplot2)
gapminder
## # A tibble: 1,704 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # … with 1,694 more rows

1.3: Create a subset of gapminder data set.

Create gapminder_1957

gapminder_1957<- gapminder %>%
  filter(year==1957)
gapminder_1957
## # A tibble: 142 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1957    30.3  9240934      821.
##  2 Albania     Europe     1957    59.3  1476505     1942.
##  3 Algeria     Africa     1957    45.7 10270856     3014.
##  4 Angola      Africa     1957    32.0  4561361     3828.
##  5 Argentina   Americas   1957    64.4 19610538     6857.
##  6 Australia   Oceania    1957    70.3  9712569    10950.
##  7 Austria     Europe     1957    67.5  6965860     8843.
##  8 Bahrain     Asia       1957    53.8   138655    11636.
##  9 Bangladesh  Asia       1957    39.3 51365468      662.
## 10 Belgium     Europe     1957    69.2  8989111     9715.
## # … with 132 more rows
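
Before plotting, it can help to check the structure of the subset. The sketch below is one possible way to explore it, assuming the gapminder and dplyr packages are installed. glimpse() and count() are dplyr functions that are not used elsewhere in this walkthrough, and continent_counts is an illustrative name.

```r
library(gapminder)
library(dplyr)

# Rebuild the 1957 subset so this example runs on its own
gapminder_1957 <- gapminder %>%
  filter(year == 1957)

# Column names, types, and the first few values of each column
glimpse(gapminder_1957)

# Number of countries per continent in 1957
continent_counts <- gapminder_1957 %>%
  count(continent)
continent_counts
```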

Task Two: Scatterplots

In this task, we will use dplyr to manipulate

the data set and plot a scatterplot using ggplot2

2.1: Plot a scatterplot with pop on the x-axis and lifeExp on the y-axis

ggplot(gapminder_1957,aes(x=pop,y=lifeExp))+geom_point()

2.2: Change to put pop on the x-axis and gdpPercap on the y-axis

ggplot(gapminder_1957,aes(x=pop,y=gdpPercap))+geom_point()

2.3 (Ex.): Create a scatter plot with gdpPercap on the x-axis

and lifeExp on the y-axis

ggplot(gapminder_1957,aes(x=gdpPercap,y=lifeExp))+geom_point()

Adding log Scales

2.4: Change this plot to put the x-axis on a log scale

ggplot(gapminder_1957,aes(x=gdpPercap,y=lifeExp))+geom_point()+scale_x_log10()
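
To see why a log scale helps here, one can look at how widely gdpPercap is spread. The following is a minimal sketch, assuming the same packages as above. The names gdp_range and p_log are illustrative, and transforming inside aes() is shown only as an alternative to scale_x_log10().

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1957 <- gapminder %>%
  filter(year == 1957)

# Smallest and largest GDP per capita in 1957, raw and on the log10 scale.
# A wide raw range squeezes most countries into one edge of a linear axis.
gdp_range <- range(gapminder_1957$gdpPercap)
gdp_range
log10(gdp_range)

# Alternative: transform the variable inside aes(). The points land in the
# same relative positions, but the axis is labelled in log10 units
# instead of the original values.
p_log <- ggplot(gapminder_1957, aes(x = log10(gdpPercap), y = lifeExp)) +
  geom_point()
p_log
```

scale_x_log10() is usually the more readable choice, because the axis labels stay in the original units.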

2.5 (Ex.): Scatter plot comparing pop and gdpPercap,

with both axes on a log scale

ggplot(gapminder_1957,aes(x=pop,y=gdpPercap))+
geom_point()+scale_x_log10()+scale_y_log10()

Task Three: Additional Aesthetics: Color & Size Aesthetics

In this task, we will add additional aesthetics like

color and size to the scatterplot

3.1: Scatter plot comparing pop and lifeExp,

with color representing continent

ggplot(gapminder,aes(x=pop,y=lifeExp, colour=continent))+
geom_point()+scale_x_log10()+scale_y_log10()

Size Aesthetics

3.2: Add the size aesthetic to represent a country’s gdpPercap

ggplot(gapminder,aes(x=pop,y=lifeExp, colour=continent,size=gdpPercap))+
geom_point()+scale_x_log10()+scale_y_log10()
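
Plotting every year at once puts many overlapping points on the same panel. One way to make the colour and size aesthetics easier to read is to restrict the data to a single year and make the points semi-transparent. This is only a sketch: the choice of 2007, the alpha value, and the names gapminder_2007 and p_size are assumptions made for illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Keep one year so each country appears exactly once
gapminder_2007 <- gapminder %>%
  filter(year == 2007)

p_size <- ggplot(gapminder_2007,
                 aes(x = pop, y = lifeExp,
                     colour = continent, size = gdpPercap)) +
  geom_point(alpha = 0.6) +   # semi-transparent points reduce overplotting
  scale_x_log10() +
  labs(colour = "Continent", size = "GDP per capita")  # legend titles
p_size
```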

Task Four: Faceting

In this task, we will add facets to draw multiple subplots on one page

4.1: Scatter plot comparing pop and lifeExp, faceted by continent

ggplot(gapminder,aes(x=pop,y=lifeExp))+
geom_point()+scale_x_log10()+scale_y_log10()+facet_wrap(~continent)

4.2: Scatter plot comparing gdpPercap and lifeExp, with color

representing continent and size representing population, faceted by year

ggplot(gapminder,aes(x=gdpPercap,y=lifeExp, colour=continent,size=pop))+
geom_point()+scale_x_log10()+scale_y_log10()+facet_wrap(~year)
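
facet_wrap() chooses the panel layout automatically. The sketch below shows how the layout could be controlled with the nrow argument, here placing the five continents side by side. The name p_facet and the single-row layout are illustrative choices.

```r
library(gapminder)
library(ggplot2)

p_facet <- ggplot(gapminder, aes(x = gdpPercap, y = lifeExp)) +
  geom_point() +
  scale_x_log10() +
  facet_wrap(~continent, nrow = 1)  # one row, one panel per continent
p_facet
```

All panels share the same axes by default, which keeps continents directly comparable.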

Task Five: Visualizing summarized data: Scatterplots

In this task, we will use the summarise verb to get summaries

of the data set and visualize it using ggplot2

5.1: Create a variable by_year that gets the median life expectancy

for each year

by_year <- gapminder %>%
  group_by(year) %>%
  summarise(medianLifeExp = median(lifeExp))
by_year

5.2: Create a scatter plot showing the change in medianLifeExp over time

by_year <- gapminder %>%
  group_by(year) %>%
  summarise(medianLifeExp = median(lifeExp))
ggplot(by_year,aes(x=year,y=medianLifeExp))+geom_point()+expand_limits(y=0)

5.3: Summarize medianGdpPercap within each continent within each year

by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))

5.4: Plot the change in medianGdpPercap in each continent over time

by_year_continent

by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
ggplot(by_year_continent,aes(x=year,y=medianGdpPercap,colour=continent))+geom_point()
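
The message printed by summarise() above says that the result is still grouped by year. If an ungrouped result is wanted, the .groups argument can state that explicitly, which also silences the message. A minimal sketch, assuming dplyr 1.0.0 or later:

```r
library(gapminder)
library(dplyr)

by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  # .groups = "drop" returns an ungrouped tibble and suppresses the message
  summarize(medianGdpPercap = median(gdpPercap), .groups = "drop")

by_year_continent
```

The plot is unaffected by this choice; it only matters if further dplyr verbs are applied to by_year_continent.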

5.5: Summarize the median GDP and median life expectancy

per continent in 2007

by_continent_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(medianLifeExp = median(lifeExp),
            medianGdpPercap = median(gdpPercap))

5.6: Use a scatter plot to compare the median GDP

and median life expectancy

by_continent_2007 <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(medianLifeExp = median(lifeExp),
            medianGdpPercap = median(gdpPercap))
ggplot(by_continent_2007,aes(x=medianLifeExp,y=medianGdpPercap,colour=continent))+geom_point()

Task Six: Visualizing summarized data: Line plots

In this task, we will visualise summarized data using line plots

6.1: Summarize the median gdpPercap by year,

then save it as by_year

by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianGdpPercap = median(gdpPercap))

6.2: Create a line plot showing the change in medianGdpPercap over time

by_year

by_year <- gapminder %>%
  group_by(year) %>%
  summarize(medianGdpPercap = median(gdpPercap))
ggplot(by_year,aes(x=year,y=medianGdpPercap))+geom_line()+expand_limits(y=0)

6.3: Summarize the median gdpPercap by year & continent,

save as by_year_continent

by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))

6.4: Create a line plot showing the change in

medianGdpPercap by continent over time

by_year_continent <- gapminder %>%
  group_by(year, continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))
## `summarise()` has grouped output by 'year'. You can override using the `.groups` argument.
ggplot(by_year_continent,aes(x=year,y=medianGdpPercap, color=continent))+geom_line()+expand_limits(y=0)

Task Seven: Visualizing summarized data: Bar Plots

In this task, we will visualise summarized data

using bar plots

7.1: Summarize the median gdpPercap by continent in 1957

by_continent <- gapminder %>%
  filter(year == 1957) %>%
  group_by(continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))

7.2: Create a bar plot showing medianGdpPercap by continent

by_continent <- gapminder %>%
  filter(year == 1957) %>%
  group_by(continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))
ggplot(by_continent,aes(x=continent,y=medianGdpPercap))+geom_col()
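
By default the bars follow the order of the continent factor levels. If a ranking is easier to read, the bars could be sorted by value with base R's reorder(). This is a sketch of one option. The name p_bar and the axis label are illustrative.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

by_continent <- gapminder %>%
  filter(year == 1957) %>%
  group_by(continent) %>%
  summarize(medianGdpPercap = median(gdpPercap))

# reorder() sorts the continent levels by medianGdpPercap (ascending)
p_bar <- ggplot(by_continent,
                aes(x = reorder(continent, medianGdpPercap),
                    y = medianGdpPercap)) +
  geom_col() +
  labs(x = "continent")  # replace the reorder(...) expression in the axis title
p_bar
```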

7.3: Visualizing GDP per capita by country in Oceania

Filter for observations in the Oceania continent in 1957

oceania_1957 <- gapminder %>%
  filter(continent == "Oceania", year == 1957)

7.4: Create a bar plot of gdpPercap by country

oceania_1957 <- gapminder %>%
  filter(continent == "Oceania", year == 1957)
ggplot(oceania_1957,aes(x=country,y=gdpPercap))+geom_col()

oceania_1957

Task Eight: Visualizing summarized data: Histograms

In this task, we will visualise summarized data

using histograms

8.1: Filter the dataset for the year 1957. Create a new column called

pop_by_mil. Save this in a new variable called gapminder_1957

gapminder_1957 <- gapminder %>%
  filter(year == 1957) %>%
  mutate(pop_by_mil = pop / 1000000)

8.2: Create a histogram of population (pop_by_mil)

gapminder_1957<- gapminder %>%
  filter(year==1957) %>%
  mutate(pop_by_mil=pop/1000000)
ggplot(gapminder_1957,aes(x=pop_by_mil))+geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

8.3: Recreate the gapminder_1957 and filter for the year 1957 only

gapminder_1957 <- gapminder %>% filter(year==1957)

8.4: Create a histogram of population (pop), with x on a log scale

gapminder_1957 <- gapminder %>%
  filter(year==1957)
ggplot(gapminder_1957,aes(x=pop))+geom_histogram()+scale_x_log10()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
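
The stat_bin() message above appears because no bin specification was given. Setting bins or binwidth explicitly removes the message and makes the choice visible in the code. The values below are illustrative rather than recommended. Note that with scale_x_log10(), binwidth is measured in log10 units.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1957 <- gapminder %>%
  filter(year == 1957) %>%
  mutate(pop_by_mil = pop / 1000000)

# Linear axis: ask for a fixed number of bins
p_bins <- ggplot(gapminder_1957, aes(x = pop_by_mil)) +
  geom_histogram(bins = 20)
p_bins

# Log axis: each bin covers a quarter of a power of ten
p_binwidth <- ggplot(gapminder_1957, aes(x = pop)) +
  geom_histogram(binwidth = 0.25) +
  scale_x_log10()
p_binwidth
```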

Task Nine: Visualizing summarized data: Boxplots

In this task, we will visualise summarized data

using boxplots

9.1: Create the gapminder_1957 and filter for the year 1957 only

gapminder_1957 <- gapminder %>%
  filter(year == 1957)
gapminder_1957
## # A tibble: 142 × 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1957    30.3  9240934      821.
##  2 Albania     Europe     1957    59.3  1476505     1942.
##  3 Algeria     Africa     1957    45.7 10270856     3014.
##  4 Angola      Africa     1957    32.0  4561361     3828.
##  5 Argentina   Americas   1957    64.4 19610538     6857.
##  6 Australia   Oceania    1957    70.3  9712569    10950.
##  7 Austria     Europe     1957    67.5  6965860     8843.
##  8 Bahrain     Asia       1957    53.8   138655    11636.
##  9 Bangladesh  Asia       1957    39.3 51365468      662.
## 10 Belgium     Europe     1957    69.2  8989111     9715.
## # … with 132 more rows
 ggplot(gapminder_1957,aes(x=gdpPercap,y=lifeExp))+geom_point()+
  scale_x_log10()+
  scale_y_log10()

9.2: Create a boxplot comparing gdpPercap among continents

ggplot(gapminder_1957,aes(x=continent,y=gdpPercap))+geom_boxplot()

9.3: Add a title to this graph:

“Comparing GDP per capita across continents”

 ggplot(gapminder_1957,aes(x=continent,y=gdpPercap))+geom_boxplot()+
   scale_y_log10()+
   ggtitle("Comparing GDP per capita across continents")
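
ggtitle() sets only the title. If axis labels are wanted as well, labs() can set them in the same call. The following is a sketch with illustrative label text. p_box is a hypothetical name.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

gapminder_1957 <- gapminder %>%
  filter(year == 1957)

p_box <- ggplot(gapminder_1957, aes(x = continent, y = gdpPercap)) +
  geom_boxplot() +
  scale_y_log10() +
  labs(title = "Comparing GDP per capita across continents",
       x = "Continent",
       y = "GDP per capita (log scale)")
p_box
```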