This is my homework report for week 4, produced with R Markdown. In this homework I use the gapminder_unfiltered dataset for exploratory data analysis, combining data transformation and visualization techniques.
For this homework assignment, I used the following packages:
library(gapminder) # for using the gapminder_unfiltered dataset
library(ggplot2) # for creating graphs
library(dplyr) # for performing data transformation and manipulation tasks.
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(knitr) # for knitting R code to HTML files
The data fields in the data set are:
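country : factor with 187 levels
continent: factor with 6 levels (Africa, Americas, Asia, Europe, FSU, Oceania)
year : ranges from 1950 to 2007; coverage is uneven, with most observations falling on the 5-year marks (1952, 1957, ..., 2007)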
lifeExp : life expectancy at birth, in years
pop : population
gdpPercap: GDP per capita
The data is an excerpt of the Gapminder data on life expectancy, GDP per capita, and population by country.
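The field descriptions above can be checked against the loaded data. The following is a minimal sketch, assuming the gapminder package is installed; the name `field_check` is only illustrative and not part of the original analysis.

```r
library(gapminder)

# Collect the basic structure of gapminder_unfiltered in one small list
field_check <- list(
  n_countries  = nlevels(gapminder_unfiltered$country),    # number of country levels
  n_continents = nlevels(gapminder_unfiltered$continent),  # number of continent levels
  year_range   = range(gapminder_unfiltered$year)          # earliest and latest year
)

str(field_check)
```

Comparing these values with the list of fields is a quick way to confirm which version of the dataset (filtered or unfiltered) is actually in use.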
# number of rows and variables
dim(gapminder_unfiltered)
## [1] 3313 6
# names of variables
names(gapminder_unfiltered)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
head(gapminder_unfiltered)
tail(gapminder_unfiltered)
# count of missing values
sum(is.na(gapminder_unfiltered))
## [1] 0
# summary of data set
summary(gapminder_unfiltered)
## country continent year lifeExp
## Czech Republic: 58 Africa : 637 Min. :1950 Min. :23.60
## Denmark : 58 Americas: 470 1st Qu.:1967 1st Qu.:58.33
## Finland : 58 Asia : 578 Median :1982 Median :69.61
## Iceland : 58 Europe :1302 Mean :1980 Mean :65.24
## Japan : 58 FSU : 139 3rd Qu.:1996 3rd Qu.:73.66
## Netherlands : 58 Oceania : 187 Max. :2007 Max. :82.67
## (Other) :2965
## pop gdpPercap
## Min. :5.941e+04 Min. : 241.2
## 1st Qu.:2.680e+06 1st Qu.: 2505.3
## Median :7.560e+06 Median : 7825.8
## Mean :3.177e+07 Mean : 11313.8
## 3rd Qu.:1.961e+07 3rd Qu.: 17355.8
## Max. :1.319e+09 Max. :113523.1
##
table(gapminder_unfiltered$year)
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964
## 39 24 144 24 24 24 24 144 25 25 26 26 151 26 26
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979
## 27 27 156 27 27 27 27 168 32 27 27 27 171 27 27
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994
## 27 27 171 27 27 27 27 171 27 27 32 33 183 33 33
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
## 33 33 184 33 33 33 33 187 33 32 30 18 183
table(gapminder_unfiltered$country)
##
## Afghanistan Albania Algeria
## 12 12 12
## Angola Argentina Armenia
## 12 12 4
## Aruba Australia Austria
## 8 56 57
## Azerbaijan Bahamas Bahrain
## 4 10 12
## Bangladesh Barbados Belarus
## 12 10 18
## Belgium Belize Benin
## 57 11 12
## Bhutan Bolivia Bosnia and Herzegovina
## 8 12 12
## Botswana Brazil Brunei
## 12 12 8
## Bulgaria Burkina Faso Burundi
## 57 12 12
## Cambodia Cameroon Canada
## 12 12 57
## Cape Verde Central African Republic Chad
## 11 12 12
## Chile China Colombia
## 12 36 12
## Comoros Congo, Dem. Rep. Congo, Rep.
## 12 12 12
## Costa Rica Cote d'Ivoire Croatia
## 13 12 12
## Cuba Cyprus Czech Republic
## 13 8 58
## Denmark Djibouti Dominican Republic
## 58 12 12
## Ecuador Egypt El Salvador
## 12 12 12
## Equatorial Guinea Eritrea Estonia
## 12 12 18
## Ethiopia Fiji Finland
## 12 10 58
## France French Guiana French Polynesia
## 57 1 9
## Gabon Gambia Georgia
## 12 12 9
## Germany Ghana Greece
## 26 12 13
## Grenada Guadeloupe Guatemala
## 8 1 12
## Guinea Guinea-Bissau Guyana
## 12 12 10
## Haiti Honduras Hong Kong, China
## 12 12 12
## Hungary Iceland India
## 57 58 12
## Indonesia Iran Iraq
## 12 12 12
## Ireland Israel Italy
## 13 12 56
## Jamaica Japan Jordan
## 12 58 12
## Kazakhstan Kenya Korea, Dem. Rep.
## 4 12 12
## Korea, Rep. Kuwait Latvia
## 12 12 42
## Lebanon Lesotho Liberia
## 12 12 12
## Libya Lithuania Luxembourg
## 13 18 49
## Macao, China Madagascar Malawi
## 8 12 12
## Malaysia Maldives Mali
## 12 8 12
## Malta Martinique Mauritania
## 10 1 12
## Mauritius Mexico Micronesia, Fed. Sts.
## 12 13 8
## Moldova Mongolia Montenegro
## 5 12 12
## Morocco Mozambique Myanmar
## 12 12 12
## Namibia Nepal Netherlands
## 12 12 58
## Netherlands Antilles New Caledonia New Zealand
## 8 9 55
## Nicaragua Niger Nigeria
## 12 12 12
## Norway Oman Pakistan
## 58 12 12
## Panama Papua New Guinea Paraguay
## 12 10 12
## Peru Philippines Poland
## 12 12 52
## Portugal Puerto Rico Qatar
## 58 13 8
## Reunion Romania Russia
## 12 12 20
## Rwanda Samoa Sao Tome and Principe
## 12 7 12
## Saudi Arabia Senegal Serbia
## 12 12 12
## Sierra Leone Singapore Slovak Republic
## 12 12 58
## Slovenia Solomon Islands Somalia
## 32 9 12
## South Africa Spain Sri Lanka
## 12 58 13
## Sudan Suriname Swaziland
## 12 8 12
## Sweden Switzerland Syria
## 58 58 12
## Taiwan Tajikistan Tanzania
## 58 4 12
## Thailand Timor-Leste Togo
## 13 4 12
## Tonga Trinidad and Tobago Tunisia
## 7 12 12
## Turkey Turkmenistan Uganda
## 12 4 13
## Ukraine United Arab Emirates United Kingdom
## 20 8 13
## United States Uruguay Uzbekistan
## 57 12 4
## Vanuatu Venezuela Vietnam
## 7 12 12
## West Bank and Gaza Yemen, Rep. Zambia
## 12 12 12
## Zimbabwe
## 12
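The per-country table is long, so a sorted summary may be easier to scan. This is a sketch using dplyr's `count()`; the names `obs_per_country`, `n_obs`, and `n_countries` are illustrative choices of mine.

```r
library(gapminder)
library(dplyr)

# Number of observations per country, most-observed first
obs_per_country <- gapminder_unfiltered %>%
  count(country, sort = TRUE, name = "n_obs")

head(obs_per_country, 10)

# How many countries share each observation count
obs_per_country %>%
  count(n_obs, name = "n_countries") %>%
  arrange(desc(n_countries))
```

The second table shows at a glance how uneven the coverage is: some countries have close to annual data, while others have only a handful of rows.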
unique(gapminder_unfiltered$continent)
## [1] Asia Europe Africa Americas FSU Oceania
## Levels: Africa Americas Asia Europe FSU Oceania
For the year 2007, what is the distribution of GDP per capita across all countries?
gapminder_unfiltered %>%
  filter(year == 2007) %>%
  ggplot() +
  geom_histogram(mapping = aes(x = gdpPercap), bins = 30)
For the year 2007, how do the distributions differ across the different continents?
gapminder_unfiltered %>%
  filter(year == 2007) %>%
  ggplot() +
  geom_histogram(mapping = aes(x = gdpPercap), bins = 20) +
  facet_wrap(~continent, nrow = 3)
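A small numeric summary can complement the faceted histograms by putting figures to the differences between continents. The sketch below is one possible way to do this, not part of the original assignment; the column names (`n_countries`, `median_gdp`, `min_gdp`, `max_gdp`) are my own.

```r
library(gapminder)
library(dplyr)

# Per-continent summary of GDP per capita for 2007
gdp_by_continent_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarize(
    n_countries = n(),                # rows (countries) observed in 2007
    median_gdp  = median(gdpPercap),
    min_gdp     = min(gdpPercap),
    max_gdp     = max(gdpPercap)
  ) %>%
  arrange(desc(median_gdp))

gdp_by_continent_2007
```

The median is used here rather than the mean because GDP per capita tends to be right-skewed, so a few very rich countries would otherwise dominate the summary.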
For the year 2007, what are the top 10 countries with the largest GDP per capita?
gapminder_unfiltered %>%
  filter(year == 2007) %>%
  filter(rank(desc(gdpPercap)) <= 10) %>%
  arrange(desc(gdpPercap)) %>%
  select(country, gdpPercap)
Plot the GDP per capita for your country of origin for all years available.
gapminder_unfiltered %>%
  filter(country == 'India') %>%
  ggplot() +
  geom_point(mapping = aes(x = year, y = gdpPercap))
What was the percent growth (or decline) in GDP per capita in 2007?
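# lag() refers to the previous observed row for India, so the change is
# measured against the prior observation, not necessarily against 2006
gapminder_unfiltered %>%
  filter(country == 'India') %>%
  arrange(year) %>%
  mutate(change = (gdpPercap - lag(gdpPercap)) / lag(gdpPercap) * 100) %>%
  filter(year == 2007) %>%
  select(year, change)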
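Because `lag()` compares each row with the previous observed row, the percent change for 2007 covers however many years separate the two observations (expected to be 2002 to 2007 for India if its rows fall on the 5-year marks). The sketch below makes that gap explicit and adds an annualized rate; the column names `years_elapsed`, `pct_change`, and `annualized_pct` are illustrative, and the annualized figure assumes constant compound growth over the gap.

```r
library(gapminder)
library(dplyr)

india_growth <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  mutate(
    years_elapsed  = year - lag(year),                    # gap to the previous observation
    pct_change     = (gdpPercap / lag(gdpPercap) - 1) * 100,
    # compound annual rate implied by the change over the gap
    annualized_pct = ((gdpPercap / lag(gdpPercap))^(1 / years_elapsed) - 1) * 100
  ) %>%
  filter(year == 2007) %>%
  select(year, years_elapsed, pct_change, annualized_pct)

india_growth
```

Reporting `years_elapsed` next to the percentage avoids reading a multi-year change as a single-year growth rate.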
What has been the historical growth (or decline) in GDP per capita for your country?
# absolute change in GDP per capita between the first and last observed years
gapminder_unfiltered %>%
  filter(country == 'India') %>%
  arrange(year) %>%
  select(year, gdpPercap) %>%
  summarize(growth = last(gdpPercap) - first(gdpPercap))
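The absolute difference answers the question in currency units only. As a possible extension, the same summary can also express the historical change as a total percentage and as a compound annual growth rate. This is a sketch under the assumption of constant compound growth between the first and last observations; `india_summary` and its column names are illustrative.

```r
library(gapminder)
library(dplyr)

india_summary <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  summarize(
    first_year = first(year),
    last_year  = last(year),
    abs_growth = last(gdpPercap) - first(gdpPercap),            # change in GDP per capita
    pct_growth = (last(gdpPercap) / first(gdpPercap) - 1) * 100, # total percent change
    # compound annual growth rate over the whole observed span
    cagr_pct   = ((last(gdpPercap) / first(gdpPercap))^(1 / (last(year) - first(year))) - 1) * 100
  )

india_summary
```

Showing the first and last years alongside the growth figures makes clear which period the numbers refer to.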