1: (10 points) Are there missing values in the data? If so, can you show how data is missing? (open-ended question)

colSums(is.na(country))
##          country             year infant_mortality  life_expectancy 
##                0                0             1453                0 
##        fertility       population              gdp        continent 
##              187              185             2972                0 
##           region 
##                0
missing_rows <- which(apply(is.na(country), 1, any))
head(country[missing_rows, ])
##                country year infant_mortality life_expectancy fertility
## 1              Albania 1960            115.4           62.87      6.19
## 3               Angola 1960            208.0           35.98      7.32
## 4  Antigua and Barbuda 1960               NA           62.97      4.43
## 6              Armenia 1960               NA           66.86      4.55
## 7                Aruba 1960               NA           65.66      4.82
## 10          Azerbaijan 1960               NA           61.33      5.57
##    population gdp continent          region
## 1     1636054  NA    Europe Southern Europe
## 3     5270844  NA    Africa   Middle Africa
## 4       54681  NA  Americas       Caribbean
## 6     1867396  NA      Asia    Western Asia
## 7       54208  NA  Americas       Caribbean
## 10    3897889  NA      Asia    Western Asia

Combining is.na() with colSums() shows that the data do contain missing values, in four variables: infant_mortality (1453), fertility (187), population (185), and gdp (2972), for 4797 missing values in total. To show how the data are missing, I use which() to locate the rows with at least one missing value and then subset the data set to those rows. In the first six such rows, every observation is from 1960 and is missing gdp, and four of the six are also missing infant_mortality.
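One way to look further into how the data are missing is to summarise the missingness by column and by year. The block below is only a minimal sketch in base R: `toy` is a small made-up stand-in for `country` (its values are illustrative, not taken from the real data), and the names `na_share`, `na_by_year`, and `incomplete` are hypothetical.

```r
# Toy stand-in for the `country` data; the values are made up for illustration.
toy <- data.frame(
  year      = c(1960, 1960, 1961, 1961),
  gdp       = c(NA, NA, 100, 120),
  fertility = c(6.1, 5.9, NA, 5.8)
)

# Share of missing values in each column.
na_share <- colMeans(is.na(toy))

# Number of missing gdp values in each year.
na_by_year <- tapply(is.na(toy$gdp), toy$year, sum)

# Indices of rows with at least one missing value;
# complete.cases() is an alternative to apply(is.na(...), 1, any).
incomplete <- which(!complete.cases(toy))

print(na_share)
print(na_by_year)
print(incomplete)
```

Applied to `country` itself, the same calls would show whether the missing gdp and infant_mortality values are concentrated in particular years rather than spread evenly.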

2: (5 points) How many unique countries are included in the data? How many years of observations are included in the data?

num_countries <- length(unique(country$country))
num_years <- length(unique(country$year))
cat("Number of unique countries:", num_countries, "\n")
## Number of unique countries: 185
cat("Number of years of observations:", num_years, "\n")
## Number of years of observations: 57

3: (5 points) In the data, create a new variable called GDP_per_capita, which equals GDP/population.

country <- country %>%
  mutate(GDP_per_capita = gdp / population)
head(country)
##               country year infant_mortality life_expectancy fertility
## 1             Albania 1960           115.40           62.87      6.19
## 2             Algeria 1960           148.20           47.50      7.65
## 3              Angola 1960           208.00           35.98      7.32
## 4 Antigua and Barbuda 1960               NA           62.97      4.43
## 5           Argentina 1960            59.87           65.39      3.11
## 6             Armenia 1960               NA           66.86      4.55
##   population          gdp continent          region GDP_per_capita
## 1    1636054           NA    Europe Southern Europe             NA
## 2   11124892  13828152297    Africa Northern Africa       1242.992
## 3    5270844           NA    Africa   Middle Africa             NA
## 4      54681           NA  Americas       Caribbean             NA
## 5   20619075 108322326649  Americas   South America       5253.501
## 6    1867396           NA      Asia    Western Asia             NA
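The NA entries in GDP_per_capita in the output above arise because division in R propagates missing values: whenever gdp is NA, gdp/population is NA as well. A minimal base-R sketch of this, using the Albania and Algeria rows from the output (the vector names `gdp`, `population`, and `gdp_pc` are only for illustration):

```r
# gdp and population for Albania and Algeria in 1960, copied from the output above.
gdp        <- c(NA, 13828152297)
population <- c(1636054, 11124892)

# Element-wise division; an NA in gdp yields an NA in the result.
gdp_pc <- gdp / population

print(gdp_pc)
```

Because of this, GDP_per_capita is expected to have at least as many missing values as gdp, which matters for any later summary that does not set na.rm = TRUE.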