Packages Required:

Gapminder Package: Data for Exploration

For each of 142 countries, the package provides values for life expectancy, GDP per capita, and population, every five years, from 1952 to 2007.

Tidyverse Package

The tidyverse is a set of packages that work in harmony because they share common data representations and API design. The tidyverse package is designed to make it easy to install and load core packages from the tidyverse in a single command.

Installing the tidyverse package installs many packages used for data import, manipulation and modeling. library(tidyverse) then loads only the core tidyverse packages that you are likely to use in almost every analysis. The other packages in the tidyverse need to be loaded explicitly.

Core tidyverse packages:

  ggplot2, for data visualisation.
  dplyr, for data manipulation.
  tidyr, for data tidying.
  readr, for data import.
  purrr, for functional programming.
  tibble, for tibbles, a modern re-imagining of data frames.
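The difference between installing and loading can be sketched as follows. This is a minimal illustration, not part of the analysis: the install step is commented out so the script can be re-run, and readxl is used only as an assumed example of a package that is installed with the tidyverse but not attached by library(tidyverse).

```r
# One-time step: download and install the tidyverse packages.
# Commented out so that re-running the script does not reinstall them.
# install.packages("tidyverse")

# Every session: attach the core packages (ggplot2, dplyr, tidyr, ...).
library(tidyverse)

# Non-core packages are installed but not attached; load them explicitly.
# readxl is an illustrative choice, this analysis does not need it.
library(readxl)

# Check that some of the core packages are now attached.
core_attached <- c("ggplot2", "dplyr", "tidyr") %in% .packages()
core_attached
```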

library(gapminder)
library(tidyverse)
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag():    dplyr, stats

Dataset Gapminder_unfiltered

Create a copy of the gapminder_unfiltered dataset to avoid damaging the original dataset.

datagap <- gapminder_unfiltered
datagap
## # A tibble: 3,313 × 6
##        country continent  year lifeExp      pop gdpPercap
##         <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
## 1  Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2  Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3  Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4  Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5  Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6  Afghanistan      Asia  1977  38.438 14880372  786.1134
## 7  Afghanistan      Asia  1982  39.854 12881816  978.0114
## 8  Afghanistan      Asia  1987  40.822 13867957  852.3959
## 9  Afghanistan      Asia  1992  41.674 16317921  649.3414
## 10 Afghanistan      Asia  1997  41.763 22227415  635.3414
## # ... with 3,303 more rows

The dataset has 6 variables and 3313 observations, sorted by country and year (ascending).
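The dimensions and the sort order stated above can be checked programmatically. The following is a small sketch using base R only; the variable names (n_obs, n_vars, already_sorted) are introduced here for illustration.

```r
library(gapminder)

datagap <- gapminder_unfiltered

# Dimensions: rows (observations) and columns (variables).
n_obs  <- nrow(datagap)
n_vars <- ncol(datagap)

# order() returns the row permutation that sorts by country, then year.
# If the rows are already in that order, the permutation is 1, 2, ..., n,
# which is.unsorted() reports as sorted.
row_order      <- order(datagap$country, datagap$year)
already_sorted <- !is.unsorted(row_order)

c(observations = n_obs, variables = n_vars)
already_sorted
```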

Source Code:

?gapminder_unfiltered

The supplemental data frame gapminder_unfiltered was not filtered on year or for complete data and has 3313 rows. Everything else is as documented in gapminder.

Our ambition with Gapminder World is to enable the display of data for all the countries and territories of the world. Therefore, the guiding principle has been to include as many entities as possible for which data might be available.

Please note that the inclusion of any geographical area in this data set is based solely on data availability and convenience for possible users. Our choice of names for any of the included countries and territories is likewise made solely for the convenience of users. The notes on international status are based on Wikipedia. Neither this nor the inclusion/exclusion of a specific country or territory implies a stated opinion of Gapminder regarding the legal or political status of the geographical area in question. Neither do the names imply a stated opinion of Gapminder on the correct naming of an entity.


The number of countries and territories to include is arbitrary, but we have decided to include the following entities:

  192 UN members (as of April 2008)
  51 other entities listed in the “List of countries” in Wikipedia (2008-05-13). These include the Vatican, dependent territories, special entities and disputed territories. We have excluded the two “sub-dependencies” Ascension Island and Tristan da Cunha, although they are listed by Wikipedia.
  4 French overseas territories (Guadeloupe, Martinique, Reunion and French Guyana), although they are considered an integral part of France
  10 former states
  2 ad-hoc areas: “Serbia excluding Kosovo” and “the Channel Islands”. The latter is the collective name of the two dependent territories Guernsey and Jersey.

country

factor with 187 levels

continent

factor with 6 levels (Africa, Americas, Asia, Europe, FSU, Oceania)

year

ranges from 1950 to 2007. Some countries have data only every fifth year starting in 1952. Other countries have data for every year.
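One way to see this mix of five-yearly and annual coverage is to count the rows recorded for each country. The sketch below assumes dplyr is available; the column names n_years and n_countries are introduced here for illustration.

```r
library(gapminder)
library(dplyr)

# Number of yearly observations recorded for each country.
obs_per_country <- gapminder_unfiltered %>%
  count(country, name = "n_years")

# How many countries share each observation count. Countries with only
# five-yearly data have few rows; countries with annual data have many.
coverage <- obs_per_country %>%
  count(n_years, name = "n_countries") %>%
  arrange(desc(n_years))

coverage
```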

lifeExp

life expectancy at birth, in years

The data in this file was combined from hundreds of sources, in four steps:
a) The period 1990 to 2015 uses data from IHME. Data after 1990 comes from: Global Burden of Disease Study 2015 (GBD 2015) Results. Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2016. Available from http://ghdx.healthdata.org/gbd-results-tool.
b) Data before 1990 uses Gapminder Historic Life Expectancy data:

  1. Data before 1970 are a collection of estimates of average life expectancy from different sources. The main sources used:
    1. Human Mortality Database, www.mortality.org
    2. World Population Prospects: The 2010 Revision / United Nations Population Division, with projections
    3. Publications and files by history prof. James C Riley
    4. Human Lifetable Database, www.lifetable.de
    5. Miscellaneous sources, see full documentation (link below)
  2. Where no estimates are available from any source before 1950, we constructed a simple model for showing levels and changes in historical life expectancy, mainly based on Infant Mortality Rate data.

WE DISCOURAGE THE USE OF THIS DATASET FOR STATISTICAL ANALYSIS
PLEASE CONSULT THE FULL DOCUMENTATION FOR MORE DETAILS

pop

population

gdpPercap (NA = blank field)

Note: There appears to be a discrepancy between the descriptions of this variable on the Gapminder site (list of indicators vs. documentation). First, one description references US$ for the year 2000 and the other references the year 2005. Second, one says that only inflation is taken into account (year 2000), while the other says the data is adjusted for both inflation and cost of living (year 2005). Both descriptions are reproduced below.

Gross Domestic Product per capita in constant 2000 US$. The inflation but not the differences in the cost of living between countries has been taken into account.

GDP per capita measures how much has been produced in a country during a year, divided by the number of people. The data is adjusted for inflation and differences in the cost of living between countries. Cross-country data for 2005 is mainly based on the 2005 round of the International Comparison Program. Real growth rates were linked to the 2005 levels. Several sources are used for these growth rates, such as the data of Angus Maddison. In addition we utilised a couple of cross-country comparisons for earlier years, which required that we adjusted the growth rates. The unit is in international dollars, fixed 2005 prices.

Life Expectancy

Life expectancy at birth. IHME data downloaded January 2015 from: http://ghdx.healthdata.org/record/global-burden-disease-study-2013-gbd-2013-age-sex-specific-all-cause-and-cause-specific
Contributor: Global Burden of Disease Study 2013
Publication year: 2014

The data in this file is estimated and was combined from hundreds of sources, following the steps listed under lifeExp above.

Data Description

Data Dimensions

dim(datagap)
## [1] 3313    6

Variable names

names(datagap)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"

Data types

str(datagap)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Missing Values:

sum(is.na(datagap))
## [1] 0
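The single total above does not show where missing values would sit if there were any. A per-column breakdown can be sketched with base R; missing_by_column is a name introduced here for illustration.

```r
library(gapminder)

datagap <- gapminder_unfiltered

# is.na() on a data frame gives a logical matrix;
# colSums() counts the TRUE values in each column.
missing_by_column <- colSums(is.na(datagap))
missing_by_column
```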

Summary Statistics

summary(datagap)
##            country        continent         year         lifeExp     
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60  
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33  
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61  
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24  
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66  
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67  
##  (Other)       :2965                                                 
##       pop              gdpPercap       
##  Min.   :5.941e+04   Min.   :   241.2  
##  1st Qu.:2.680e+06   1st Qu.:  2505.3  
##  Median :7.560e+06   Median :  7825.8  
##  Mean   :3.177e+07   Mean   : 11313.8  
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8  
##  Max.   :1.319e+09   Max.   :113523.1  
## 

Unique Values

unique(datagap$continent)
## [1] Asia     Europe   Africa   Americas FSU      Oceania 
## Levels: Africa Americas Asia Europe FSU Oceania
unique(datagap$year)
##  [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007 1950 1951
## [15] 1953 1954 1955 1956 1958 1959 1960 1961 1963 1964 1965 1966 1968 1969
## [29] 1970 1971 1973 1974 1975 1976 1978 1979 1980 1981 1983 1984 1985 1986
## [43] 1988 1989 1990 1991 1993 1994 1995 1996 1998 1999 2000 2001 2003 2004
## [57] 2005 2006
table(datagap$year)
## 
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962 1963 1964 
##   39   24  144   24   24   24   24  144   25   25   26   26  151   26   26 
## 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 
##   27   27  156   27   27   27   27  168   32   27   27   27  171   27   27 
## 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 
##   27   27  171   27   27   27   27  171   27   27   32   33  183   33   33 
## 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 
##   33   33  184   33   33   33   33  187   33   32   30   18  183

Exploratory Data Analysis

For the year 2007, what is the distribution of GDP per capita across all countries?

GDP2007 <- filter(datagap, year == 2007)
GDP2007
## # A tibble: 183 × 6
##        country continent  year lifeExp      pop  gdpPercap
##         <fctr>    <fctr> <int>   <dbl>    <int>      <dbl>
## 1  Afghanistan      Asia  2007  43.828 31889923   974.5803
## 2      Albania    Europe  2007  76.423  3600523  5937.0295
## 3      Algeria    Africa  2007  72.301 33333216  6223.3675
## 4       Angola    Africa  2007  42.731 12420476  4797.2313
## 5    Argentina  Americas  2007  75.320 40301927 12779.3796
## 6      Armenia       FSU  2007  71.965  2971650  4942.5439
## 7        Aruba  Americas  2007  74.239    72194 27230.6752
## 8    Australia   Oceania  2007  81.235 20434176 34435.3674
## 9      Austria    Europe  2007  79.829  8199783 36126.4927
## 10  Azerbaijan      Asia  2007  67.487  8017309  7708.6112
## # ... with 173 more rows
ggplot(GDP2007, aes(x = gdpPercap)) +
    # histogram on the density scale so it shares a y-axis with the density curve
    geom_histogram(aes(y = ..density..),
        binwidth = 2500,
        colour = "black", fill = "red") +
    geom_density(alpha = .2, fill = "#99CCFF") +
    # dashed vertical line at the mean GDP per capita
    geom_vline(aes(xintercept = mean(gdpPercap, na.rm = TRUE)),
           color = "blue", linetype = "dashed", size = 1) +
    labs(title = "Histogram of All Countries' GDP per Capita in 2007 with Mean",
         x = "GDP per Capita", y = "Density")

For the year 2007, how do the distributions differ across the different continents? The histogram below distinguishes continents by fill colour rather than by facets.

ggplot(GDP2007, aes(x=gdpPercap, fill=continent)) +
 geom_histogram() 
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
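A faceted version gives each continent its own panel, which can make the per-continent distributions easier to compare than stacked bars. This is a sketch rather than the original analysis: the binwidth of 5000 and the object name p_facets are assumptions chosen for illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

GDP2007 <- filter(gapminder_unfiltered, year == 2007)

# One histogram panel per continent; binwidth = 5000 is an arbitrary choice.
p_facets <- ggplot(GDP2007, aes(x = gdpPercap, fill = continent)) +
  geom_histogram(binwidth = 5000, colour = "black", show.legend = FALSE) +
  facet_wrap(~ continent) +
  labs(title = "GDP per capita in 2007 by continent",
       x = "GDP per Capita", y = "Number of countries")

p_facets
```

Setting binwidth explicitly also avoids the `stat_bin()` message shown above.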

For the year 2007, what are the top 10 countries with the largest GDP per capita?

GDP2007<-arrange(GDP2007, desc(gdpPercap))
GDP2007
## # A tibble: 183 × 6
##             country continent  year lifeExp       pop gdpPercap
##              <fctr>    <fctr> <int>   <dbl>     <int>     <dbl>
## 1             Qatar      Asia  2007  75.588    907229  82010.98
## 2      Macao, China      Asia  2007  80.718    456989  54589.82
## 3            Norway    Europe  2007  80.196   4627926  49357.19
## 4            Brunei      Asia  2007  77.118    386511  48014.59
## 5            Kuwait      Asia  2007  77.588   2505559  47306.99
## 6         Singapore      Asia  2007  79.972   4553009  47143.18
## 7     United States  Americas  2007  78.242 301139947  42951.65
## 8           Ireland    Europe  2007  78.885   4109086  40676.00
## 9  Hong Kong, China      Asia  2007  82.208   6980412  39724.98
## 10      Switzerland    Europe  2007  81.701   7554661  37506.42
## # ... with 173 more rows
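The printed tibble shows ten rows only because of the default print limit; the object still holds all 183 countries. To keep exactly the top ten, the sorted data can be truncated explicitly. A minimal sketch, with top10_2007 as an illustrative name:

```r
library(gapminder)
library(dplyr)

top10_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  arrange(desc(gdpPercap)) %>%            # largest GDP per capita first
  select(country, continent, gdpPercap) %>%
  head(10)                                # keep exactly the first ten rows

top10_2007
```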

Plot the GDP per capita for your country of origin for all years available.

Canada <- datagap %>%
          filter(country=="Canada") %>%
          select(year, gdpPercap)
Canada
## # A tibble: 57 × 2
##     year gdpPercap
##    <int>     <dbl>
## 1   1950  10581.27
## 2   1951  10932.47
## 3   1952  11367.16
## 4   1953  11586.61
## 5   1954  11173.26
## 6   1955  11901.51
## 7   1956  12555.55
## 8   1957  12489.95
## 9   1958  12384.41
## 10  1959  12590.80
## # ... with 47 more rows
 ggplot(Canada, aes(x=year, y=gdpPercap)) +
   geom_line() +
   geom_point() +
   labs(title="Canadian GDP per capita from 1950 to 2007") +
    labs(x="Year", y="GDP per Capita")

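What was the percent growth (or decline) in GDP per capita in 2007? 3.54%, measured against 2005, the previous year available for Canada (2006 is missing, so lag() compares 2007 with 2005).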

GDPChange<-Canada %>%
          mutate(GDPChange=(gdpPercap-lag(gdpPercap))/lag(gdpPercap))
tail(GDPChange)
## # A tibble: 6 × 3
##    year gdpPercap   GDPChange
##   <int>     <dbl>       <dbl>
## 1  2001  32570.57 0.003758522
## 2  2002  33328.97 0.023284784
## 3  2003  33635.25 0.009189884
## 4  2004  34346.97 0.021159677
## 5  2005  35078.00 0.021283816
## 6  2007  36319.24 0.035384999
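Because lag() looks at the previous row rather than the previous calendar year, the 2007 figure above spans two years. One way to make this visible is to record the gap between rows and convert the change to an average annual rate. This is a sketch under that assumption; the column names years_elapsed, total_change and annual_change are introduced here for illustration.

```r
library(gapminder)
library(dplyr)

canada_growth <- gapminder_unfiltered %>%
  filter(country == "Canada") %>%
  arrange(year) %>%
  mutate(
    # 1 for consecutive years, larger when a year is missing
    years_elapsed = year - lag(year),
    # change relative to the previous row (same quantity as GDPChange above)
    total_change  = gdpPercap / lag(gdpPercap) - 1,
    # compound average growth per year over the gap
    annual_change = (gdpPercap / lag(gdpPercap))^(1 / years_elapsed) - 1
  ) %>%
  select(year, gdpPercap, years_elapsed, total_change, annual_change)

tail(canada_growth)
```

For rows where years_elapsed is 1 the two change columns agree; they differ only where a year is missing.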

What has been the historical growth (or decline) in GDP per capita for your country?

ggplot(GDPChange, aes(x=year, y=GDPChange)) +
   geom_line() +
   geom_point() +
   labs(title="Change in Canadian GDP per capita from 1950 to 2007") +
    labs(x="Year", y="Change in GDP per Capita")
## Warning: Removed 1 rows containing missing values (geom_path).
## Warning: Removed 1 rows containing missing values (geom_point).