Instructions

For each exercise below, show your code. Once you’ve finished, don’t forget to enter your answers into the quiz on Canvas and to upload this document (knitted version please!) at the end of the quiz.


Q1: How many observations are there in the gapminder dataset?

1704

#install.packages("tidyverse")

library(tidyverse)
library(gapminder)
data(gapminder)
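
The count reported above can be checked directly once the data is loaded. This is a minimal sketch, assuming the gapminder package is installed; `n_obs` is just an illustrative name, and `nrow()` and `dim()` are base R functions.

```r
library(gapminder)

# Each row of gapminder is one country-year observation
n_obs <- nrow(gapminder)
n_obs

# dim() reports the number of rows and columns together
dim(gapminder)
```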

Q2: Show the class() of each variable in the gapminder dataset. Describe the difference between "numeric" and "integer".

country and continent are factor variables, year and pop are integers, and gdpPercap and lifeExp are double precision (class() reports doubles as "numeric"). An integer is a kind of numeric value that can only hold whole numbers, while a double can also hold fractional values.

What’s the class of year? year is an integer.

glimpse(gapminder)
## Rows: 1,704
## Columns: 6
## $ country   <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
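
glimpse() shows column types, but the question asks for class() specifically. One possible way to get the class of every column at once is sketched below; `col_classes` is an illustrative name, and the last three lines only demonstrate the integer versus double distinction on example values taken from the output above.

```r
library(gapminder)

# class() of every column, returned as a named character vector
col_classes <- sapply(gapminder, class)
col_classes

# Integers and doubles have different classes, but both count as numeric
class(1952L)       # integer literal (note the L suffix)
class(28.801)      # double, which class() reports as "numeric"
is.numeric(1952L)  # TRUE: integers are still numeric values
```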

Q3: How many unique countries are in the dataset?

142

Hint: Look at the length() function.

length(unique(gapminder$country))
## [1] 142
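
As an alternative to length(unique(...)), dplyr provides n_distinct(), which does the same count in one call. A small sketch, assuming dplyr and gapminder are installed; `n_countries` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# n_distinct() counts the unique values in a vector
n_countries <- n_distinct(gapminder$country)
n_countries
```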

Q4: What was the population of Oman in 2007?

3204897

Hint: Use filter().

filter(gapminder, country == "Oman", year == 2007)
## # A tibble: 1 × 6
##   country continent  year lifeExp     pop gdpPercap
##   <fct>   <fct>     <int>   <dbl>   <int>     <dbl>
## 1 Oman    Asia       2007    75.6 3204897    22316.
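
If only the single number is needed for the quiz, the filtered row can be reduced to the pop column with pull(). This is a sketch rather than the required approach; `oman_pop_2007` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep the one matching row, then extract pop as a plain vector
oman_pop_2007 <- gapminder %>%
  filter(country == "Oman", year == 2007) %>%
  pull(pop)
oman_pop_2007
```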

Q5: Which 5 countries have the highest GDP per capita in 2007? (Top 5 countries by GDP per capita in 2007)

Norway, Kuwait, Singapore, United States, Ireland

Hint: Use filter() and arrange().

gap2007 <- filter(gapminder, year == 2007)
arrange(gap2007, desc(gdpPercap))
## # A tibble: 142 × 6
##    country          continent  year lifeExp       pop gdpPercap
##    <fct>            <fct>     <int>   <dbl>     <int>     <dbl>
##  1 Norway           Europe     2007    80.2   4627926    49357.
##  2 Kuwait           Asia       2007    77.6   2505559    47307.
##  3 Singapore        Asia       2007    80.0   4553009    47143.
##  4 United States    Americas   2007    78.2 301139947    42952.
##  5 Ireland          Europe     2007    78.9   4109086    40676.
##  6 Hong Kong, China Asia       2007    82.2   6980412    39725.
##  7 Switzerland      Europe     2007    81.7   7554661    37506.
##  8 Netherlands      Europe     2007    79.8  16570613    36798.
##  9 Canada           Americas   2007    80.7  33390141    36319.
## 10 Iceland          Europe     2007    81.8    301931    36181.
## # ℹ 132 more rows
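
The filter() and arrange() approach prints all 142 rows in sorted order. To return only the top five, one option is slice_max(), assuming dplyr 1.0.0 or later; `top5_gdp` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep the 5 rows with the largest gdpPercap among the 2007 observations
top5_gdp <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(gdpPercap, n = 5) %>%
  select(country, gdpPercap)
top5_gdp
```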

Q6: Which 5 countries have the lowest average life expectancy over the period from 1952 to 2007?

Sierra Leone, Afghanistan, Angola, Guinea-Bissau, Mozambique

Hint: group_by() and summarize()!

gapLE <- gapminder %>% group_by(country) %>% summarise(avg = mean(lifeExp))
arrange(gapLE, avg)
## # A tibble: 142 × 2
##    country             avg
##    <fct>             <dbl>
##  1 Sierra Leone       36.8
##  2 Afghanistan        37.5
##  3 Angola             37.9
##  4 Guinea-Bissau      39.2
##  5 Mozambique         40.4
##  6 Somalia            41.0
##  7 Rwanda             41.5
##  8 Liberia            42.5
##  9 Equatorial Guinea  43.0
## 10 Guinea             43.2
## # ℹ 132 more rows
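
The same group_by() and summarise() steps can be combined with slice_min() to return just the five lowest averages in a single pipeline. A sketch assuming dplyr 1.0.0 or later; `lowest5_life` is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Average lifeExp per country across all years, then keep the 5 smallest
lowest5_life <- gapminder %>%
  group_by(country) %>%
  summarise(avg = mean(lifeExp)) %>%
  slice_min(avg, n = 5)
lowest5_life
```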

Q7: List the top three countries in terms of population in 2007.

China, India, United States

arrange(gap2007, desc(pop))
## # A tibble: 142 × 6
##    country       continent  year lifeExp        pop gdpPercap
##    <fct>         <fct>     <int>   <dbl>      <int>     <dbl>
##  1 China         Asia       2007    73.0 1318683096     4959.
##  2 India         Asia       2007    64.7 1110396331     2452.
##  3 United States Americas   2007    78.2  301139947    42952.
##  4 Indonesia     Asia       2007    70.6  223547000     3541.
##  5 Brazil        Americas   2007    72.4  190010647     9066.
##  6 Pakistan      Asia       2007    65.5  169270617     2606.
##  7 Bangladesh    Asia       2007    64.1  150448339     1391.
##  8 Nigeria       Africa     2007    46.9  135031164     2014.
##  9 Japan         Asia       2007    82.6  127467972    31656.
## 10 Mexico        Americas   2007    76.2  108700891    11978.
## # ℹ 132 more rows

Q8: Create a new variable called africa where observations located in the continent of Africa are coded as “Africa” and those not located in Africa as “Not Africa.” Use dplyr to compute the average life expectancy and GDP per capita in countries located within Africa and outside of Africa in 2007.

Quiz question: What is the average life expectancy in the continent of Africa in 2007?

54.8

gapAfr <- gapminder %>% mutate(africa = if_else(continent == "Africa", "Africa", "Not Africa"))
gapAfr2 <- filter(gapAfr, africa=="Africa")
summarise(gapAfr2, mean(lifeExp))
## # A tibble: 1 × 1
##   `mean(lifeExp)`
##             <dbl>
## 1            48.9
summarise(gapAfr2, mean(gdpPercap))
## # A tibble: 1 × 1
##   `mean(gdpPercap)`
##               <dbl>
## 1             2194.
gap_nonafr <- filter(gapAfr, africa=="Not Africa")
summarise(gap_nonafr, mean(lifeExp))
## # A tibble: 1 × 1
##   `mean(lifeExp)`
##             <dbl>
## 1            65.6
summarise(gap_nonafr, mean(gdpPercap))
## # A tibble: 1 × 1
##   `mean(gdpPercap)`
##               <dbl>
## 1            10117.
gapAfr2007 <- gapAfr2 %>% filter(year==2007)
summarise(gapAfr2007, mean(lifeExp))
## # A tibble: 1 × 1
##   `mean(lifeExp)`
##             <dbl>
## 1            54.8
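
Note that gapAfr2 and gap_nonafr above are not filtered by year, so their means pool every year from 1952 to 2007; only the last summary is restricted to 2007. A possible way to get both 2007 averages for both groups in one table is to filter first and then group by the new africa variable. This is a sketch; `africa_2007`, `avg_lifeExp`, and `avg_gdpPercap` are illustrative names.

```r
library(dplyr)
library(gapminder)

africa_2007 <- gapminder %>%
  filter(year == 2007) %>%                 # restrict to 2007 before averaging
  mutate(africa = if_else(continent == "Africa", "Africa", "Not Africa")) %>%
  group_by(africa) %>%                     # one group per value of africa
  summarise(
    avg_lifeExp = mean(lifeExp),
    avg_gdpPercap = mean(gdpPercap)
  )
africa_2007
```

The result has one row for "Africa" and one for "Not Africa", so the quiz answer can be read from the avg_lifeExp column of the "Africa" row.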

Knit your document and save it as an HTML file. You will upload it in the quiz.