Synopsis

This report explores transforming the gapminder data and combining those transformations with visualization for exploratory data analysis.

Packages used

ggplot2 for the visualizations
tidyverse for the dplyr functions
gapminder for using the gapminder data
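
The packages listed above can be attached at the top of the script. This is a minimal sketch, assuming all three are installed from CRAN. Note that tidyverse already attaches ggplot2 and dplyr, so the explicit ggplot2 call is redundant but harmless.

```r
# Attach the packages used in this report.
library(ggplot2)    # plotting
library(tidyverse)  # attaches dplyr (and ggplot2), among others
library(gapminder)  # provides gapminder and gapminder_unfiltered

# Confirm the data set is available by printing its dimensions.
dim(gapminder_unfiltered)
```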

Source code:

The gapminder_unfiltered data contains 6 variables. Country and continent are factors. Life expectancy is measured at birth. The years range from 1950 to 2007; most countries are recorded every five years (1952, 1957, ..., 2007), while some have annual observations.

Data Description

Structure

str(gapminder_unfiltered)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Missing Values

sapply(gapminder_unfiltered,function(x) (sum(is.na(x))))
##   country continent      year   lifeExp       pop gdpPercap 
##         0         0         0         0         0         0

Summary Statistics

sapply(gapminder_unfiltered,summary)
## $country
##           Czech Republic                  Denmark                  Finland 
##                       58                       58                       58 
##                  Iceland                    Japan              Netherlands 
##                       58                       58                       58 
##                   Norway                 Portugal          Slovak Republic 
##                       58                       58                       58 
##                    Spain                   Sweden              Switzerland 
##                       58                       58                       58 
##                   Taiwan                  Austria                  Belgium 
##                       58                       57                       57 
##                 Bulgaria                   Canada                   France 
##                       57                       57                       57 
##                  Hungary            United States                Australia 
##                       57                       57                       56 
##                    Italy              New Zealand                   Poland 
##                       56                       55                       52 
##               Luxembourg                   Latvia                    China 
##                       49                       42                       36 
##                 Slovenia                  Germany                   Russia 
##                       32                       26                       20 
##                  Ukraine                  Belarus                  Estonia 
##                       20                       18                       18 
##                Lithuania               Costa Rica                     Cuba 
##                       18                       13                       13 
##                   Greece                  Ireland                    Libya 
##                       13                       13                       13 
##                   Mexico              Puerto Rico                Sri Lanka 
##                       13                       13                       13 
##                 Thailand                   Uganda           United Kingdom 
##                       13                       13                       13 
##              Afghanistan                  Albania                  Algeria 
##                       12                       12                       12 
##                   Angola                Argentina                  Bahrain 
##                       12                       12                       12 
##               Bangladesh                    Benin                  Bolivia 
##                       12                       12                       12 
##   Bosnia and Herzegovina                 Botswana                   Brazil 
##                       12                       12                       12 
##             Burkina Faso                  Burundi                 Cambodia 
##                       12                       12                       12 
##                 Cameroon Central African Republic                     Chad 
##                       12                       12                       12 
##                    Chile                 Colombia                  Comoros 
##                       12                       12                       12 
##         Congo, Dem. Rep.              Congo, Rep.            Cote d'Ivoire 
##                       12                       12                       12 
##                  Croatia                 Djibouti       Dominican Republic 
##                       12                       12                       12 
##                  Ecuador                    Egypt              El Salvador 
##                       12                       12                       12 
##        Equatorial Guinea                  Eritrea                 Ethiopia 
##                       12                       12                       12 
##                    Gabon                   Gambia                    Ghana 
##                       12                       12                       12 
##                Guatemala                   Guinea            Guinea-Bissau 
##                       12                       12                       12 
##                    Haiti                 Honduras         Hong Kong, China 
##                       12                       12                       12 
##                    India                Indonesia                     Iran 
##                       12                       12                       12 
##                     Iraq                   Israel                  Jamaica 
##                       12                       12                       12 
##                   Jordan                    Kenya         Korea, Dem. Rep. 
##                       12                       12                       12 
##              Korea, Rep.                   Kuwait                  Lebanon 
##                       12                       12                       12 
##                  (Other) 
##                      871 
## 
## $continent
##   Africa Americas     Asia   Europe      FSU  Oceania 
##      637      470      578     1302      139      187 
## 
## $year
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1950    1967    1982    1980    1996    2007 
## 
## $lifeExp
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   23.60   58.33   69.61   65.24   73.66   82.67 
## 
## $pop
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## 5.941e+04 2.680e+06 7.560e+06 3.177e+07 1.961e+07 1.319e+09 
## 
## $gdpPercap
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    241.2   2505.0   7826.0  11310.0  17360.0 113500.0

Findings

The data is at the country-year level: each row describes one country in one year.

nrow(unique(gapminder_unfiltered[,c("country","year")]))
## [1] 3313

The number of unique country-year combinations (3313) is the same as the total number of rows, so each country-year pair appears exactly once.
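
The same check can be phrased as a search for duplicated keys. The sketch below uses a small made-up data frame (`toy` and `toy_dup` are hypothetical, not part of gapminder) so it runs in base R without any package. Replacing `toy` with `gapminder_unfiltered` should apply the same test to the real data.

```r
# Hypothetical toy data: one row per (country, year) pair.
toy <- data.frame(
  country = c("A", "A", "B", "B"),
  year    = c(1952, 1957, 1952, 1957),
  pop     = c(10, 12, 20, 21)
)

# anyDuplicated() returns 0 when no (country, year) pair repeats.
key_is_unique <- anyDuplicated(toy[, c("country", "year")]) == 0
key_is_unique

# Appending a copy of the first row breaks the key.
toy_dup <- rbind(toy, toy[1, ])
anyDuplicated(toy_dup[, c("country", "year")]) == 0
```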

gapminder_unfiltered %>% 
  group_by(country) %>% 
  summarize(years_for_country=n_distinct(year),num=n()) %>% 
  arrange(desc(years_for_country))
## # A tibble: 187 × 3
##            country years_for_country   num
##             <fctr>             <int> <int>
## 1   Czech Republic                58    58
## 2          Denmark                58    58
## 3          Finland                58    58
## 4          Iceland                58    58
## 5            Japan                58    58
## 6      Netherlands                58    58
## 7           Norway                58    58
## 8         Portugal                58    58
## 9  Slovak Republic                58    58
## 10           Spain                58    58
## # ... with 177 more rows

This gives the number of years captured for each country. The maximum is 58, i.e. for some countries every year from 1950 to 2007 is captured.
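
To see how coverage is spread across all countries, rather than only the top rows, the per-country counts can themselves be tabulated. This is a sketch, assuming dplyr and gapminder are installed; the names `coverage`, `n_years` and `n_countries` are introduced here for illustration only.

```r
library(dplyr)
library(gapminder)

# Count observations per country, then count how many countries
# share each number of observations.
coverage <- gapminder_unfiltered %>%
  count(country, name = "n_years") %>%
  count(n_years, name = "n_countries") %>%
  arrange(desc(n_years))

coverage
```

Each row of `coverage` says how many countries have a given number of yearly observations.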

gapminder_unfiltered %>% 
  group_by(year) %>% 
  summarize(country_count_for_years= n_distinct(country)) %>% 
  arrange(desc(country_count_for_years))
## # A tibble: 58 × 2
##     year country_count_for_years
##    <int>                   <int>
## 1   2002                     187
## 2   1997                     184
## 3   1992                     183
## 4   2007                     183
## 5   1977                     171
## 6   1982                     171
## 7   1987                     171
## 8   1972                     168
## 9   1967                     156
## 10  1962                     151
## # ... with 48 more rows

The year 2002 has the widest coverage, with all 187 countries captured.

Exploratory Data Analysis

1. Distribution of GDP per capita for 2007 - all countries

gapminder_unfiltered %>% 
  filter(year==2007) %>% 
  ggplot()+
  geom_histogram(mapping=aes(x=gdpPercap),col="red")

2. Distribution of GDP per capita for 2007 for different continents

gapminder_unfiltered %>% 
  filter(year==2007) %>% 
  ggplot()+
  geom_histogram(mapping=aes(x=gdpPercap),col="red") +
  facet_wrap(~continent , nrow=3)

3. Top 10 countries with largest GDP per capita in 2007

gapminder_unfiltered %>% 
  filter(year==2007) %>% 
  mutate(rank=dense_rank(desc(gdpPercap))) %>% 
  filter(rank<=10) %>% 
  select(country,gdpPercap) %>% 
  ggplot()+
  geom_bar(mapping=aes(y=gdpPercap, x=reorder(country,-gdpPercap)),stat="identity",col="green")
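
The rank-then-filter step above can also be written with `slice_max()` from dplyr (available from dplyr 1.0.0). The sketch below runs on a made-up tibble (`toy` is hypothetical and stands in for the 2007 rows) and keeps the top 3 instead of the top 10.

```r
library(dplyr)

# Hypothetical toy data standing in for the 2007 rows.
toy <- tibble(
  country   = c("A", "B", "C", "D", "E"),
  gdpPercap = c(500, 42000, 7300, 18000, 950)
)

# Keep the 3 rows with the largest gdpPercap, returned in descending order.
top3 <- toy %>%
  slice_max(gdpPercap, n = 3)

top3$country
```

The two approaches differ when values tie: `dense_rank()` followed by `filter(rank <= 10)` keeps every row among the ten largest distinct values, while `slice_max()` keeps the ten largest rows plus any rows tied with the last one.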

4. Variation of GDP per capita for India over time

gapminder_unfiltered %>% 
  filter(country=="India") %>% 
  ggplot(mapping=aes(x=year,y=gdpPercap))+
  geom_point(col="red",size=3)+
  geom_line()

5,6. Historical percent change in GDP per capita for India

gapminder_unfiltered %>% 
  filter(country=="India") %>% 
  arrange(year) %>% 
  # percent change relative to the previous observation (NA for the first year)
  mutate(percent_growth=100*(gdpPercap-lag(gdpPercap))/lag(gdpPercap)) %>% 
  ggplot()+
  geom_point(mapping=aes(x=year,y=gdpPercap,size=percent_growth),col="blue")
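
The growth calculation compares each observation with the previous one via `lag()`. A minimal sketch on made-up numbers (`toy`, `growth_frac` and `growth_pct` are hypothetical names, and the values are not taken from gapminder) shows that the first row has no predecessor and therefore yields NA, and how a fractional change relates to a percentage.

```r
library(dplyr)

# Hypothetical values, not taken from gapminder.
toy <- tibble(
  year      = c(1952, 1957, 1962),
  gdpPercap = c(100, 110, 99)
)

growth <- toy %>%
  arrange(year) %>%
  mutate(
    # Fractional change relative to the previous observation.
    growth_frac = (gdpPercap - lag(gdpPercap)) / lag(gdpPercap),
    # The same change expressed in percent.
    growth_pct  = 100 * growth_frac
  )

growth
```

Because the first row is NA, a plot that maps this column to point size will drop that point with a warning.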