Instructions

For each exercise below, show code. Once you’ve completed things, don’t forget to input everything into the quiz on Canvas and to upload this document (knitted version please!) at the end of the quiz. A few tips:


Q1: How many observations are there in the gapminder dataset?

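There are 1,704 observations (rows) in the gapminder dataset. The count is shown in the RStudio Environment pane once the data are loaded, and nrow() returns it directly.

library(tidyverse)
library(gapminder)
data(gapminder)
nrow(gapminder)
## [1] 1704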

Q2: Show the class() of each variable in the gapminder dataset. Describe the difference between "numeric" and "integer". What’s the class of year?

The head() function displays the first six rows of the dataset, along with each column's type (<fct>, <int>, <dbl>).

head(gapminder)
## # A tibble: 6 × 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

The sapply() function applies class() to every column and returns the class of each variable.

sapply(gapminder, class)
##   country continent      year   lifeExp       pop gdpPercap 
##  "factor"  "factor" "integer" "numeric" "integer" "numeric"

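Differences in class: factors are categorical variables stored as integer codes with text labels (levels), as with country and continent. "numeric" variables are double-precision numbers that can hold decimal values, such as lifeExp and gdpPercap, while "integer" variables hold whole numbers only, such as pop. Both count as numbers in R (is.numeric() returns TRUE for either). The class of year is "integer".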
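The distinction between the three classes can be checked directly in base R. This is a small sketch on toy vectors (the names years, life_exp, and continent are illustrative; the values are the rounded Afghanistan rows from the head() output above), not on gapminder itself.

```r
# Whole numbers written with an L suffix are stored as integers
years <- c(1952L, 1957L, 1962L)
# Plain numeric literals are stored as double-precision ("numeric") values
life_exp <- c(28.8, 30.3, 32.0)
# A factor stores category labels as integer codes plus a set of levels
continent <- factor(c("Asia", "Africa", "Asia"))

class(years)        # "integer"
class(life_exp)     # "numeric"
typeof(life_exp)    # "double": the storage type behind "numeric"
is.numeric(years)   # TRUE: integers are numeric values too
class(years / 2L)   # "numeric": dividing integers returns a double
class(continent)    # "factor"
typeof(continent)   # "integer": the underlying level codes
levels(continent)   # "Africa" "Asia"
```

The last check shows why gapminder reports country and continent as factors even though they are stored as integer codes underneath.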


Q3: How many unique countries are in the dataset? Hint: Look at the length() function.

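The length() function counts every element of a vector, so on its own it returns the number of rows. Combining it with unique() counts each country only once.

length(unique(gapminder$country))
## [1] 142

There are 142 unique countries in the dataset.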
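The difference between counting elements and counting distinct values is easiest to see on a short vector. The sketch below uses a hypothetical toy factor of repeated country names; dplyr's n_distinct() is another way to get the distinct count.

```r
# Hypothetical toy factor with repeated country names
countries <- factor(c("Oman", "Oman", "Norway", "Kuwait", "Norway"))

length(countries)           # 5: every element, duplicates included
length(unique(countries))   # 3: distinct values only
nlevels(countries)          # 3: number of factor levels

# Subsetting a factor keeps its original levels
subset_no_kuwait <- countries[countries != "Kuwait"]
nlevels(subset_no_kuwait)              # 3: "Kuwait" is still a level
nlevels(droplevels(subset_no_kuwait))  # 2: unused level removed
```

Because unused levels survive subsetting, nlevels() can overcount after a filter unless droplevels() is applied; length(unique()) does not have that problem.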

Q4: What was the population of Oman in 2007? Hint: Use filter().

The filter() function keeps only the rows that meet the given conditions (here, Oman in 2007), and select() keeps the country, year, and pop columns.

oman_pop_2007 <- gapminder %>%
  filter(country == "Oman", year == 2007) %>%
  select(country, year, pop)
oman_pop_2007
## # A tibble: 1 × 3
##   country  year     pop
##   <fct>   <int>   <int>
## 1 Oman     2007 3204897

The population of Oman in 2007 was 3,204,897.
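When only the number itself is needed, pull() returns a column as a plain vector instead of a one-row tibble. This sketch runs on a toy data frame built from values printed in this document (the name pops is illustrative) and assumes dplyr, which tidyverse attaches.

```r
library(dplyr)

# Toy rows using population values printed in this document
pops <- data.frame(
  country = c("Afghanistan", "Afghanistan", "Oman"),
  year    = c(1952L, 1957L, 2007L),
  pop     = c(8425333L, 9240934L, 3204897L)
)

oman_2007 <- pops %>%
  filter(country == "Oman", year == 2007) %>%  # comma-separated conditions are combined with AND
  pull(pop)                                     # extract the column as a plain vector
oman_2007
```

Comparing year with the number 2007 rather than the string "2007" avoids relying on R's silent conversion of the integer column to text.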

Q5: Which 5 countries have the highest GDP per capita in 2007? Hint: Use filter() and arrange().

The filter() function keeps only the 2007 rows, select() keeps the country and gdpPercap columns, and arrange(desc()) sorts the result from highest to lowest GDP per capita.

gapminder %>%
  filter(year == 2007) %>%
  select(country, gdpPercap) %>%
  arrange(desc(gdpPercap))
## # A tibble: 142 × 2
##    country          gdpPercap
##    <fct>                <dbl>
##  1 Norway              49357.
##  2 Kuwait              47307.
##  3 Singapore           47143.
##  4 United States       42952.
##  5 Ireland             40676.
##  6 Hong Kong, China    39725.
##  7 Switzerland         37506.
##  8 Netherlands         36798.
##  9 Canada              36319.
## 10 Iceland             36181.
## # … with 132 more rows

The five countries with the highest GDP per capita in 2007 were:

Country: GDP per capita
Norway: 49,357.1902
Kuwait: 47,306.9898
Singapore: 47,143.1796
United States: 42,951.6531
Ireland: 40,675.9964
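Instead of printing the whole sorted tibble and reading off the first five rows, slice_max() can keep just the top n rows. This is a sketch on a toy data frame built from GDP per capita values printed above (the name gdp_2007 is illustrative; the Switzerland and Hong Kong figures are the rounded values shown in the tibble output) and assumes dplyr 1.0 or later.

```r
library(dplyr)

# Toy 2007 rows using values printed above;
# the last two values are rounded as displayed in the tibble output
gdp_2007 <- data.frame(
  country   = c("Ireland", "Norway", "Kuwait", "United States",
                "Singapore", "Switzerland", "Hong Kong, China"),
  gdpPercap = c(40675.9964, 49357.1902, 47306.9898, 42951.6531,
                47143.1796, 37506, 39725)
)

top5 <- gdp_2007 %>%
  slice_max(gdpPercap, n = 5)  # keep the 5 largest values, sorted high to low
top5
```

On the real data the same call would follow filter(year == 2007).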


Q6: Which 5 countries have the lowest average life expectancy over the period from 1952 to 2007? (Hint: group_by() and summarize()!)

The data were grouped by country, summarize() computed each country's mean life expectancy across all years from 1952 to 2007, and arrange() sorted the results in ascending order.

gapminder %>%
  group_by(country) %>%
  summarize(mean_le = mean(lifeExp)) %>%
  arrange(mean_le)
## # A tibble: 142 × 2
##    country           mean_le
##    <fct>               <dbl>
##  1 Sierra Leone         36.8
##  2 Afghanistan          37.5
##  3 Angola               37.9
##  4 Guinea-Bissau        39.2
##  5 Mozambique           40.4
##  6 Somalia              41.0
##  7 Rwanda               41.5
##  8 Liberia              42.5
##  9 Equatorial Guinea    43.0
## 10 Guinea               43.2
## # … with 132 more rows

The five countries with the lowest average life expectancy from 1952 to 2007 were:

Country: Average life expectancy (years)
Sierra Leone: 36.76917
Afghanistan: 37.47883
Angola: 37.88350
Guinea-Bissau: 39.21025
Mozambique: 40.37950
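The group_by() and summarize() step collapses each country's yearly rows into one mean, and slice_min() can then keep the lowest n. This sketch uses a hypothetical toy panel (countries "A", "B", "C" and their values are invented so the means are easy to check by hand) and assumes dplyr 1.0 or later.

```r
library(dplyr)

# Hypothetical toy panel: three countries observed in three years each
le <- data.frame(
  country = rep(c("A", "B", "C"), each = 3),
  year    = rep(c(1952L, 1977L, 2007L), times = 3),
  lifeExp = c(30, 40, 50,   # A: mean 40
              50, 60, 70,   # B: mean 60
              35, 45, 55)   # C: mean 45
)

lowest2 <- le %>%
  group_by(country) %>%
  # one row per country; .groups = "drop" returns an ungrouped result
  summarize(mean_le = mean(lifeExp), .groups = "drop") %>%
  # keep the two smallest means, sorted low to high
  slice_min(mean_le, n = 2)
lowest2
```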


Q7: List the top three countries in terms of population in 2007.

The list was generated by filtering to 2007, selecting the country, pop, and year columns, and arranging the rows in descending order of population.

gapminder %>%
  filter(year == 2007) %>%
  select(country, pop, year) %>%
  arrange(desc(pop))
## # A tibble: 142 × 3
##    country              pop  year
##    <fct>              <int> <int>
##  1 China         1318683096  2007
##  2 India         1110396331  2007
##  3 United States  301139947  2007
##  4 Indonesia      223547000  2007
##  5 Brazil         190010647  2007
##  6 Pakistan       169270617  2007
##  7 Bangladesh     150448339  2007
##  8 Nigeria        135031164  2007
##  9 Japan          127467972  2007
## 10 Mexico         108700891  2007
## # … with 132 more rows

The three most populous countries in 2007 were:

Country: Population
China: 1,318,683,096
India: 1,110,396,331
United States: 301,139,947
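Two small additions make this answer easier to report: head(3) keeps only the top three rows after sorting, and formatC() with big.mark adds thousands separators. This sketch uses a toy data frame built from the 2007 populations printed above (the names pop_2007 and pop_label are illustrative) and assumes dplyr.

```r
library(dplyr)

# Toy rows using 2007 populations printed above
pop_2007 <- data.frame(
  country = c("Brazil", "China", "Indonesia", "India", "United States"),
  pop     = c(190010647L, 1318683096L, 223547000L, 1110396331L, 301139947L)
)

top3 <- pop_2007 %>%
  arrange(desc(pop)) %>%  # largest population first
  head(3)                  # keep the first three rows

# format = "d" prints whole numbers; big.mark inserts the separators
top3$pop_label <- formatC(top3$pop, format = "d", big.mark = ",")
top3
```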


Q8: Create a new variable called africa where observations located in the continent of Africa are coded as “Africa” and those not located in Africa as “Not Africa.” Use dplyr to compute the average life expectancy and GDP per capita in countries located within Africa and outside of Africa in 2007. (2 points)

The mutate() function was used with if_else() to add a variable called africa that labels each observation “Africa” or “Not Africa” according to its continent; all of gapminder’s original variables are kept. The filter(), group_by(), and summarize() functions were then used to compute the average life expectancy and GDP per capita for African and non-African countries in 2007. In 2007, the average life expectancy was about 55 years in Africa versus 74 years elsewhere, and the average GDP per capita was about $3,089 in African countries versus $16,644 in non-African countries. These are unweighted means, so each country counts equally regardless of its population.

africa <- gapminder %>%
  mutate(africa = if_else(continent == "Africa", "Africa", "Not Africa"))
head(africa)
## # A tibble: 6 × 7
##   country     continent  year lifeExp      pop gdpPercap africa    
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl> <chr>     
## 1 Afghanistan Asia       1952    28.8  8425333      779. Not Africa
## 2 Afghanistan Asia       1957    30.3  9240934      821. Not Africa
## 3 Afghanistan Asia       1962    32.0 10267083      853. Not Africa
## 4 Afghanistan Asia       1967    34.0 11537966      836. Not Africa
## 5 Afghanistan Asia       1972    36.1 13079460      740. Not Africa
## 6 Afghanistan Asia       1977    38.4 14880372      786. Not Africa
Africa_le_gdp <- africa %>%
  filter(year == 2007, africa == "Africa") %>%
  group_by(africa, year) %>%
  summarize(avg_le = mean(lifeExp),
            avg_gdp = mean(gdpPercap))
## `summarise()` has grouped output by 'africa'. You can override using the
## `.groups` argument.
Africa_le_gdp
## # A tibble: 1 × 4
## # Groups:   africa [1]
##   africa  year avg_le avg_gdp
##   <chr>  <int>  <dbl>   <dbl>
## 1 Africa  2007   54.8   3089.
Not_Africa_le_gdp <- africa %>%
  filter(year == 2007, africa == "Not Africa") %>%
  group_by(africa, year) %>%
  summarize(avg_le = mean(lifeExp),
            avg_gdp = mean(gdpPercap))
## `summarise()` has grouped output by 'africa'. You can override using the
## `.groups` argument.
Not_Africa_le_gdp
## # A tibble: 1 × 4
## # Groups:   africa [1]
##   africa      year avg_le avg_gdp
##   <chr>      <int>  <dbl>   <dbl>
## 1 Not Africa  2007   74.1  16644.
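The two separate pipelines above can also be written as one grouped summary that returns both rows at once. This sketch runs that single pipeline on a hypothetical four-country toy data frame (names, continents, and values are invented so the averages are easy to check by hand) and assumes dplyr; on the real data the same steps would follow filter(year == 2007).

```r
library(dplyr)

# Hypothetical toy rows for a single year
toy <- data.frame(
  country   = c("P", "Q", "R", "S"),
  continent = c("Africa", "Africa", "Europe", "Asia"),
  lifeExp   = c(50, 60, 80, 70),
  gdpPercap = c(2000, 4000, 30000, 10000)
)

summary_tbl <- toy %>%
  mutate(africa = if_else(continent == "Africa", "Africa", "Not Africa")) %>%
  # grouping by a single variable gives an ungrouped result,
  # so summarize() prints no .groups message
  group_by(africa) %>%
  summarize(avg_le      = mean(lifeExp),
            avg_gdp     = mean(gdpPercap),
            n_countries = n())
summary_tbl
```

Because mean() weights every country equally, a population-weighted figure would instead use weighted.mean(gdpPercap, pop) inside summarize().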