1: (10 points) Are there missing values in the data? If so, can you
show how data is missing? (open-ended question)
colSums(is.na(country))
##          country             year infant_mortality  life_expectancy
##                0                0             1453                0
##        fertility       population              gdp        continent
##              187              185             2972                0
##           region
##                0
missing_rows <- which(apply(is.na(country), 1, any))
head(country[missing_rows, ])
##                country year infant_mortality life_expectancy fertility
## 1              Albania 1960            115.4           62.87      6.19
## 3               Angola 1960            208.0           35.98      7.32
## 4  Antigua and Barbuda 1960               NA           62.97      4.43
## 6              Armenia 1960               NA           66.86      4.55
## 7                Aruba 1960               NA           65.66      4.82
## 10          Azerbaijan 1960               NA           61.33      5.57
##    population gdp continent          region
## 1     1636054  NA    Europe Southern Europe
## 3     5270844  NA    Africa   Middle Africa
## 4       54681  NA  Americas       Caribbean
## 6     1867396  NA      Asia    Western Asia
## 7       54208  NA  Americas       Caribbean
## 10    3897889  NA      Asia    Western Asia
By combining the is.na() and colSums() functions, we can see that
there are missing values in four columns: infant_mortality (1453),
fertility (187), population (185), and gdp (2972). In total, there are
4797 missing values. To show how the data is missing, I use the which()
function to locate the rows that contain at least one missing value, and
then subset the data set to show only those rows. In the first six such
rows, all from 1960, gdp is always missing and infant_mortality is
missing in four of them.
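The row subset above shows individual cases; a summary by column and by year can make the pattern easier to see. The following is a minimal sketch of that idea, not part of the original answer. It uses a small made-up data frame named toy (a hypothetical stand-in for country, with invented values) so that it runs on its own.

```r
# Toy stand-in for the `country` data frame; all values are invented
toy <- data.frame(
  country          = c("A", "B", "A", "B", "A", "B"),
  year             = c(1960, 1960, 1961, 1961, 1962, 1962),
  gdp              = c(NA, NA, 10, NA, 12, 15),
  infant_mortality = c(NA, 50, 48, 47, NA, 45)
)

# Share of missing values in each column
missing_share <- colMeans(is.na(toy))

# Number of missing gdp values in each year
gdp_na_by_year <- tapply(is.na(toy$gdp), toy$year, sum)

# Cross-tabulation of the two missingness indicators
pattern <- table(gdp_missing = is.na(toy$gdp),
                 im_missing  = is.na(toy$infant_mortality))

print(missing_share)
print(gdp_na_by_year)
print(pattern)
```

Applied to the real data, the same calls with country in place of toy would show whether the missing gdp and infant_mortality values cluster in particular years.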
2: (5 points) How many unique countries are included in the data?
How many years of observations are included in the data?
num_countries <- length(unique(country$country))
num_years <- length(unique(country$year))
cat("Number of unique countries:", num_countries, "\n")
## Number of unique countries: 185
cat("Number of years of observations:", num_years, "\n")
## Number of years of observations: 57
3: (5 points) In the data, create a new variable called
GDP_per_capita which equals GDP/population.
country <- country %>%
  mutate(GDP_per_capita = gdp / population)
head(country)
##               country year infant_mortality life_expectancy fertility
## 1             Albania 1960           115.40           62.87      6.19
## 2             Algeria 1960           148.20           47.50      7.65
## 3              Angola 1960           208.00           35.98      7.32
## 4 Antigua and Barbuda 1960               NA           62.97      4.43
## 5           Argentina 1960            59.87           65.39      3.11
## 6             Armenia 1960               NA           66.86      4.55
##   population          gdp continent          region GDP_per_capita
## 1    1636054           NA    Europe Southern Europe             NA
## 2   11124892  13828152297    Africa Northern Africa       1242.992
## 3    5270844           NA    Africa   Middle Africa             NA
## 4      54681           NA  Americas       Caribbean             NA
## 5   20619075 108322326649  Americas   South America       5253.501
## 6    1867396           NA      Asia    Western Asia             NA
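The NA entries in GDP_per_capita follow from the division itself: any row with a missing gdp (or population) yields NA. As a small check, the sketch below recomputes the variable in base R for two rows copied from the output above; the data frame name sample_rows is introduced here only for illustration.

```r
# Two rows copied from the head(country) output, one with gdp missing
sample_rows <- data.frame(
  country    = c("Albania", "Algeria"),
  gdp        = c(NA, 13828152297),
  population = c(1636054, 11124892)
)

# Base R equivalent of the mutate() call
sample_rows$GDP_per_capita <- sample_rows$gdp / sample_rows$population

# Albania's value is NA because its gdp is NA
print(sample_rows)

# Share of rows for which the new variable could be computed
share_computed <- mean(!is.na(sample_rows$GDP_per_capita))
print(share_computed)
```

Algeria's value agrees with the 1242.992 shown in the output, and the same share calculation on the full data would indicate how much of GDP_per_capita is usable.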