Synopsis

This is my homework report for week 4. This week, the focus is on exploring the Gapminder dataset: performing early exploratory data analysis and answering questions about the data by transforming and visualizing it.

We will do the following to explore our data:

  1. Use the dplyr package to perform common data transformation and manipulation tasks.
  2. Use the ggplot2 package to visually analyze our data and answer specific questions.

Packages Used

library(knitr)       # knitr renders R Markdown pages and provides kable() for tables
library(tidyverse)   # tidyverse loads dplyr (data transformation) and ggplot2 (plotting)
library(Hmisc)       # Hmisc provides describe(), which summarises every variable in a data set
library(gapminder)   # gapminder provides the data sets we are going to study today

Source Code

Details about the Data: The gapminder package gives us data on life expectancy, GDP per capita, and population by country. The main data frame, gapminder, has 1704 rows and 6 variables. The package also ships gapminder_unfiltered, a larger version with the same 6 variables (3313 rows, 187 countries, 6 continents).

The variables of gapminder are explained as follows:

  1. country - factor with 142 levels
  2. continent - factor with 5 levels
  3. year - ranges from 1952 to 2007 in increments of 5 years
  4. lifeExp - life expectancy at birth, in years
  5. pop - population
  6. gdpPercap - GDP per capita
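
The description above can be checked directly against the data frame. The following is a minimal sketch, assuming the gapminder package is installed; the variable names (dims, n_countries and so on) are only illustrative.

```r
library(gapminder)  # provides the gapminder data frame

# Number of rows and columns
dims <- dim(gapminder)

# Number of levels in the two factor columns
n_countries  <- nlevels(gapminder$country)
n_continents <- nlevels(gapminder$continent)

# Distinct years and the spacing between consecutive years
years     <- sort(unique(gapminder$year))
year_step <- unique(diff(years))

dims
n_countries
n_continents
range(years)
year_step
```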

Understanding the Data:

?gapminder
kable(head(gapminder))
country      continent  year  lifeExp       pop  gdpPercap
Afghanistan  Asia       1952   28.801   8425333   779.4453
Afghanistan  Asia       1957   30.332   9240934   820.8530
Afghanistan  Asia       1962   31.997  10267083   853.1007
Afghanistan  Asia       1967   34.020  11537966   836.1971
Afghanistan  Asia       1972   36.088  13079460   739.9811
Afghanistan  Asia       1977   38.438  14880372   786.1134

Data Description

Displaying the first and last 5 rows of gapminder_unfiltered, the data frame used in the rest of this report.

kable(head(gapminder_unfiltered, 5))
country      continent  year  lifeExp       pop  gdpPercap
Afghanistan  Asia       1952   28.801   8425333   779.4453
Afghanistan  Asia       1957   30.332   9240934   820.8530
Afghanistan  Asia       1962   31.997  10267083   853.1007
Afghanistan  Asia       1967   34.020  11537966   836.1971
Afghanistan  Asia       1972   36.088  13079460   739.9811

kable(tail(gapminder_unfiltered, 5))
country   continent  year  lifeExp       pop  gdpPercap
Zimbabwe  Africa     1987   62.351   9216418   706.1573
Zimbabwe  Africa     1992   60.377  10704340   693.4208
Zimbabwe  Africa     1997   46.809  11404948   792.4500
Zimbabwe  Africa     2002   39.989  11926563   672.0386
Zimbabwe  Africa     2007   43.487  12311143   469.7093

Displaying the Names of the Variables

names(gapminder_unfiltered)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"

Displaying the dimensions (number of rows and columns) of the dataset.

dim(gapminder_unfiltered)
## [1] 3313    6

Checking for missing values in the data. The count is 0 and no positions are returned, so the data has no missing values.

sum(is.na(gapminder_unfiltered))
## [1] 0
which(is.na(gapminder_unfiltered), arr.ind = TRUE)
##      row col
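
A per-column count makes the same check easier to read when a data set does contain gaps. The following is a minimal sketch, assuming the gapminder package is installed; the names missing_per_column and n_complete_rows are illustrative.

```r
library(gapminder)  # provides gapminder_unfiltered

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE values in each column
missing_per_column <- colSums(is.na(gapminder_unfiltered))
missing_per_column

# complete.cases() flags the rows that contain no missing value at all
n_complete_rows <- sum(complete.cases(gapminder_unfiltered))
n_complete_rows
```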

Displaying the count and the values of the unique countries, continents and years.

length(unique(gapminder_unfiltered$country))
## [1] 187
unique(gapminder_unfiltered$country)
##   [1] Afghanistan              Albania                 
##   [3] Algeria                  Angola                  
##   [5] Argentina                Armenia                 
##   [7] Aruba                    Australia               
##   [9] Austria                  Azerbaijan              
##  [11] Bahamas                  Bahrain                 
##  [13] Bangladesh               Barbados                
##  [15] Belarus                  Belgium                 
##  [17] Belize                   Benin                   
##  [19] Bhutan                   Bolivia                 
##  [21] Bosnia and Herzegovina   Botswana                
##  [23] Brazil                   Brunei                  
##  [25] Bulgaria                 Burkina Faso            
##  [27] Burundi                  Cambodia                
##  [29] Cameroon                 Canada                  
##  [31] Cape Verde               Central African Republic
##  [33] Chad                     Chile                   
##  [35] China                    Colombia                
##  [37] Comoros                  Congo, Dem. Rep.        
##  [39] Congo, Rep.              Costa Rica              
##  [41] Cote d'Ivoire            Croatia                 
##  [43] Cuba                     Cyprus                  
##  [45] Czech Republic           Denmark                 
##  [47] Djibouti                 Dominican Republic      
##  [49] Ecuador                  Egypt                   
##  [51] El Salvador              Equatorial Guinea       
##  [53] Eritrea                  Estonia                 
##  [55] Ethiopia                 Fiji                    
##  [57] Finland                  France                  
##  [59] French Guiana            French Polynesia        
##  [61] Gabon                    Gambia                  
##  [63] Georgia                  Germany                 
##  [65] Ghana                    Greece                  
##  [67] Grenada                  Guadeloupe              
##  [69] Guatemala                Guinea                  
##  [71] Guinea-Bissau            Guyana                  
##  [73] Haiti                    Honduras                
##  [75] Hong Kong, China         Hungary                 
##  [77] Iceland                  India                   
##  [79] Indonesia                Iran                    
##  [81] Iraq                     Ireland                 
##  [83] Israel                   Italy                   
##  [85] Jamaica                  Japan                   
##  [87] Jordan                   Kazakhstan              
##  [89] Kenya                    Korea, Dem. Rep.        
##  [91] Korea, Rep.              Kuwait                  
##  [93] Latvia                   Lebanon                 
##  [95] Lesotho                  Liberia                 
##  [97] Libya                    Lithuania               
##  [99] Luxembourg               Macao, China            
## [101] Madagascar               Malawi                  
## [103] Malaysia                 Maldives                
## [105] Mali                     Malta                   
## [107] Martinique               Mauritania              
## [109] Mauritius                Mexico                  
## [111] Micronesia, Fed. Sts.    Moldova                 
## [113] Mongolia                 Montenegro              
## [115] Morocco                  Mozambique              
## [117] Myanmar                  Namibia                 
## [119] Nepal                    Netherlands             
## [121] Netherlands Antilles     New Caledonia           
## [123] New Zealand              Nicaragua               
## [125] Niger                    Nigeria                 
## [127] Norway                   Oman                    
## [129] Pakistan                 Panama                  
## [131] Papua New Guinea         Paraguay                
## [133] Peru                     Philippines             
## [135] Poland                   Portugal                
## [137] Puerto Rico              Qatar                   
## [139] Reunion                  Romania                 
## [141] Russia                   Rwanda                  
## [143] Samoa                    Sao Tome and Principe   
## [145] Saudi Arabia             Senegal                 
## [147] Serbia                   Sierra Leone            
## [149] Singapore                Slovak Republic         
## [151] Slovenia                 Solomon Islands         
## [153] Somalia                  South Africa            
## [155] Spain                    Sri Lanka               
## [157] Sudan                    Suriname                
## [159] Swaziland                Sweden                  
## [161] Switzerland              Syria                   
## [163] Taiwan                   Tajikistan              
## [165] Tanzania                 Thailand                
## [167] Timor-Leste              Togo                    
## [169] Tonga                    Trinidad and Tobago     
## [171] Tunisia                  Turkey                  
## [173] Turkmenistan             Uganda                  
## [175] Ukraine                  United Arab Emirates    
## [177] United Kingdom           United States           
## [179] Uruguay                  Uzbekistan              
## [181] Vanuatu                  Venezuela               
## [183] Vietnam                  West Bank and Gaza      
## [185] Yemen, Rep.              Zambia                  
## [187] Zimbabwe                
## 187 Levels: Afghanistan Albania Algeria Angola Argentina Armenia ... Zimbabwe
length(unique(gapminder_unfiltered$continent))
## [1] 6
unique(gapminder_unfiltered$continent)
## [1] Asia     Europe   Africa   Americas FSU      Oceania 
## Levels: Africa Americas Asia Europe FSU Oceania
length(unique(gapminder_unfiltered$year))
## [1] 58
unique(gapminder_unfiltered$year)
##  [1] 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 2002 2007 1950 1951
## [15] 1953 1954 1955 1956 1958 1959 1960 1961 1963 1964 1965 1966 1968 1969
## [29] 1970 1971 1973 1974 1975 1976 1978 1979 1980 1981 1983 1984 1985 1986
## [43] 1988 1989 1990 1991 1993 1994 1995 1996 1998 1999 2000 2001 2003 2004
## [57] 2005 2006
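
These counts are larger than those of the main gapminder data frame (142 countries, 5 continents, 12 years). The following is a minimal sketch for listing what gapminder_unfiltered contains in addition, assuming the gapminder package is installed; the extra_* names are illustrative.

```r
library(gapminder)  # provides both gapminder and gapminder_unfiltered

# Values present in gapminder_unfiltered but absent from gapminder
extra_countries  <- setdiff(levels(gapminder_unfiltered$country),
                            levels(gapminder$country))
extra_continents <- setdiff(levels(gapminder_unfiltered$continent),
                            levels(gapminder$continent))
extra_years      <- setdiff(unique(gapminder_unfiltered$year),
                            unique(gapminder$year))

length(extra_countries)
extra_continents
length(extra_years)
```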

Structure of the Data-Set

str(gapminder_unfiltered)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

Basic summary statistics of the data set.

describe(gapminder_unfiltered)
## gapminder_unfiltered 
## 
##  6  Variables      3313  Observations
## ---------------------------------------------------------------------------
## country 
##        n  missing distinct 
##     3313        0      187 
## 
## lowest : Afghanistan        Albania            Algeria            Angola             Argentina         
## highest: Vietnam            West Bank and Gaza Yemen, Rep.        Zambia             Zimbabwe           
## ---------------------------------------------------------------------------
## continent 
##        n  missing distinct 
##     3313        0        6 
## 
## lowest : Africa   Americas Asia     Europe   FSU     
## highest: Americas Asia     Europe   FSU      Oceania  
## 
## Africa (637, 0.192), Americas (470, 0.142), Asia (578, 0.174), Europe
## (1302, 0.393), FSU (139, 0.042), Oceania (187, 0.056)
## ---------------------------------------------------------------------------
## year 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0       58    0.998     1980    19.52     1952     1957 
##      .25      .50      .75      .90      .95 
##     1967     1982     1996     2002     2007 
## 
## lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007 
## ---------------------------------------------------------------------------
## lifeExp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     2571        1    65.24    12.73    41.22    45.37 
##      .25      .50      .75      .90      .95 
##    58.33    69.61    73.66    77.12    78.68 
## 
## lowest : 23.599 28.801 30.000 30.015 30.331
## highest: 82.208 82.270 82.360 82.603 82.670 
## ---------------------------------------------------------------------------
## pop 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      3313         0      3312         1  31773251  50168977    235605 
##       .10       .25       .50       .75       .90       .95 
##    436150   2680018   7559776  19610538  56737055 121365965 
## 
## lowest :      59412      59461      60011      60427      61325
## highest: 1110396331 1164970000 1230075000 1280400000 1318683096 
## ---------------------------------------------------------------------------
## gdpPercap 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     3313        1    11314    11542    665.7    887.9 
##      .25      .50      .75      .90      .95 
##   2505.3   7825.8  17355.7  26592.7  31534.9 
## 
## lowest :    241.1659    277.5519    298.8462    299.8503    312.1884
## highest:  82010.9780  95458.1118 108382.3529 109347.8670 113523.1329 
## ---------------------------------------------------------------------------
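
The continent frequencies reported by describe() can also be reproduced with dplyr. This is a minimal sketch, assuming the dplyr and gapminder packages are installed. It uses summarise() rather than summarize(), because Hmisc also exports a function named summarize() that masks the dplyr one when Hmisc is loaded afterwards.

```r
library(dplyr)
library(gapminder)

# Rows and distinct countries per continent, plus each continent's share of all rows
continent_counts <- gapminder_unfiltered %>%
  group_by(continent) %>%
  summarise(rows = n(), countries = n_distinct(country)) %>%
  mutate(share = round(rows / sum(rows), 3))

continent_counts
```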

Exploratory Data Analysis

For the year 2007, what is the distribution of GDP per capita across all countries?

GDP_2007 <- gapminder_unfiltered %>% filter(year==2007) %>% select(continent, country, gdpPercap)
# In a histogram the x-axis carries GDP per capita and the y-axis the number of countries per bin,
# so the x-axis labels are kept and both axes are labelled accordingly.
ggplot(GDP_2007, aes(x=gdpPercap)) + geom_histogram(fill="cyan", bins=40) +
                                      xlab("GDP per Capita") +
                                      ylab("Number of Countries") +
                                      ggtitle("Distribution of GDP Per Capita for 2007 for all Countries")

# One point per country; the country names are hidden on the x-axis because there are too many to read.
ggplot(GDP_2007, aes(x=country, y=gdpPercap)) + geom_point(aes(color = continent)) +
                                              ylab("GDP per Capita") +
                                              ggtitle("GDP Per Capita for Countries grouped by Continents for 2007") +
                                              theme(axis.title.x=element_blank(),
                                                    axis.text.x=element_blank(),
                                                    axis.ticks.x=element_blank())
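
GDP per capita is strongly right-skewed (in the summary statistics above, the mean of 11314 is well above the median of 7825.8), so most countries end up in the first few bins of a histogram. One option, not used in the original analysis, is a logarithmic x-axis. This is a minimal sketch, assuming the dplyr, ggplot2 and gapminder packages are installed; gdp_2007 and p_log are illustrative names.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# Same 2007 subset as in the analysis
gdp_2007 <- gapminder_unfiltered %>% filter(year == 2007)

# scale_x_log10() spreads out the low-income countries that a linear axis squeezes together
p_log <- ggplot(gdp_2007, aes(x = gdpPercap)) +
  geom_histogram(fill = "cyan", bins = 40) +
  scale_x_log10() +
  xlab("GDP per Capita (log10 scale)") +
  ylab("Number of Countries") +
  ggtitle("Distribution of GDP Per Capita for 2007, log scale")

p_log
```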

For the year 2007, how do the distributions differ across the different continents?

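# A box plot shows the spread of GDP per capita within each continent; bars with stat = "identity"
# would stack the per-country values into one total per continent, which is not a distribution.
ggplot(GDP_2007, aes(x=continent, y=gdpPercap)) + geom_boxplot(fill="firebrick") +
                                                  xlab("Continents") +
                                                  ylab("GDP per Capita") +
                                                  ggtitle("GDP Per Capita vs Continents for 2007")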

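# color is a fixed value, so it is set outside aes(); height = 0 keeps the GDP values unjittered
ggplot(GDP_2007, aes(x=continent, y=gdpPercap)) + geom_jitter(color="firebrick", width=0.2, height=0) +
                                                  xlab("Continents") +
                                                  ylab("GDP per Capita") +
                                                  ggtitle("GDP Per Capita vs Continents for 2007")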
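
The plots can be backed by a few numbers per continent. This is a minimal sketch, assuming the dplyr and gapminder packages are installed; the choice of statistics (quartiles, median, maximum) and the column names are illustrative and not part of the original analysis.

```r
library(dplyr)
library(gapminder)

# Quartiles and maximum of GDP per capita for each continent in 2007,
# ordered from the highest to the lowest median
continent_summary <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    countries  = n(),
    q25_gdp    = quantile(gdpPercap, 0.25),
    median_gdp = median(gdpPercap),
    q75_gdp    = quantile(gdpPercap, 0.75),
    max_gdp    = max(gdpPercap)
  ) %>%
  arrange(desc(median_gdp))

continent_summary
```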

For the year 2007, what are the top 10 countries with the largest GDP per capita?

top_10_gdps <- GDP_2007[order(GDP_2007$gdpPercap, decreasing = TRUE),2:3][1:10,]
kable(top_10_gdps)
country            gdpPercap
Qatar               82010.98
Macao, China        54589.82
Norway              49357.19
Brunei              48014.59
Kuwait              47306.99
Singapore           47143.18
United States       42951.65
Ireland             40676.00
Hong Kong, China    39724.98
Switzerland         37506.42
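# reorder() sorts the bars by GDP per capita (largest first) instead of alphabetically
ggplot(top_10_gdps, aes(x=reorder(country, -gdpPercap), y=gdpPercap)) + geom_bar(fill="palegreen2", stat = "identity") +
                                                   xlab("Top 10 Countries") +
                                                   ylab("GDP per Capita") +
                                                   ggtitle("Top 10 GDP Per Capita vs Countries")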
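
The same top-10 table can be built with a dplyr pipeline instead of base R indexing. This is a minimal sketch, assuming the dplyr and gapminder packages are installed; top_10 is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Sort the 2007 rows by GDP per capita, largest first, and keep the first 10
top_10 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  select(country, gdpPercap) %>%
  arrange(desc(gdpPercap)) %>%
  head(10)

top_10
```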

Plot the GDP per capita for your country of origin for all years available.

gdp_India <- gapminder_unfiltered %>% filter(country=="India") %>% select(year,gdpPercap)
ggplot(gdp_India, aes(x=year, y=gdpPercap)) + geom_point() + geom_smooth()  +
                                              xlab("Year") +
                                              ylab("GDP per Capita") +
                                              ggtitle("GDP Per Capita vs Year for INDIA")

What was the percent growth (or decline) in GDP per capita in 2007?

# Growth is measured against India's previous observation, which is 5 years earlier (2002).
Percent.growth.India.2007 <- gapminder_unfiltered %>% filter(country=="India") %>% arrange(year) %>%
                             mutate(GDP_Growth = (gdpPercap - lag(gdpPercap))/lag(gdpPercap)*100) %>%
                             filter(year==2007)
Percent.growth.India.2007
## # A tibble: 1 x 7
##   country continent  year lifeExp        pop gdpPercap GDP_Growth
##    <fctr>    <fctr> <int>   <dbl>      <int>     <dbl>      <dbl>
## 1   India      Asia  2007  64.698 1110396331   2452.21   40.38546

The percent growth in GDP per capita for India from 2002 to 2007 is about 40.4%.
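
The growth formula can be followed step by step in base R. This is a minimal sketch that uses three GDP per capita values for India copied from the gapminder_unfiltered output in this report; the shifted vector stands in for dplyr's lag(), and the names india and previous are illustrative.

```r
# GDP per capita for India, copied from the India rows shown in this report
india <- data.frame(
  year      = c(1997, 2002, 2007),
  gdpPercap = c(1458.8174, 1746.7695, 2452.2104)
)

# Base R stand-in for dplyr::lag(): shift the vector down by one position,
# so each row sees the value of the previous observation
previous <- c(NA, head(india$gdpPercap, -1))

# Percent change relative to the previous observation (5 years earlier)
india$GDP_Growth <- (india$gdpPercap - previous) / previous * 100

india
```

The first row has no earlier observation, so its growth is NA; this is also why the full table for India starts with an NA.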

What has been the historical growth (or decline) in GDP per capita for your country?

Percent.growth.India.Historical <- gapminder_unfiltered %>% arrange(year) %>% filter(country=="India")%>% 
                                mutate(GDP_Growth = (gdpPercap - lag(gdpPercap))/lag(gdpPercap)*100)
kable(Percent.growth.India.Historical)
country  continent  year  lifeExp         pop  gdpPercap  GDP_Growth
India    Asia       1952   37.373   372000000   546.5657          NA
India    Asia       1957   40.249   409000000   590.0620    7.958100
India    Asia       1962   43.605   454000000   658.3472   11.572539
India    Asia       1967   47.193   506000000   700.7706    6.443935
India    Asia       1972   50.651   567000000   724.0325    3.319477
India    Asia       1977   54.208   634000000   813.3373   12.334362
India    Asia       1982   56.596   708000000   855.7235    5.211394
India    Asia       1987   58.553   788000000   976.5127   14.115440
India    Asia       1992   60.223   872000000  1164.4068   19.241341
India    Asia       1997   61.765   959000000  1458.8174   25.284173
India    Asia       2002   62.879  1034172547  1746.7695   19.738728
India    Asia       2007   64.698  1110396331  2452.2104   40.385464
ggplot(Percent.growth.India.Historical, aes(x=year, y=GDP_Growth)) + geom_line() +
                                              xlab("Year") +
                                              ylab("GDP per Capita Growth over Previous 5 Years (%)") +
                                              ggtitle("GDP per Capita Growth vs Year for INDIA")
## Warning: Removed 1 rows containing missing values (geom_path).

Total.Percent.Growth <- gapminder_unfiltered %>% arrange(year) %>% filter(country=="India")%>% 
                        mutate(GDP_Growth = (gdpPercap - first(gdpPercap))/first(gdpPercap)*100)%>%
                        filter(year==2007)
Total.Percent.Growth
## # A tibble: 1 x 7
##   country continent  year lifeExp        pop gdpPercap GDP_Growth
##    <fctr>    <fctr> <int>   <dbl>      <int>     <dbl>      <dbl>
## 1   India      Asia  2007  64.698 1110396331   2452.21   348.6579

The cumulative growth in GDP per capita for India from 1952 to 2007 has been close to 350% (348.7%).
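
A cumulative figure over 55 years is hard to compare with the 5-year figures. One common way to put both on the same footing is the compound annual growth rate (CAGR), which is not part of the original analysis. This is a minimal sketch in base R, using the 1952 and 2007 values for India from the table in this report; all names are illustrative.

```r
# GDP per capita for India in the first and last available year (from this report)
gdp_1952 <- 546.5657
gdp_2007 <- 2452.2104
n_years  <- 2007 - 1952

# Cumulative percent growth over the whole period
total_growth_pct <- (gdp_2007 - gdp_1952) / gdp_1952 * 100

# Compound annual growth rate: the constant yearly rate that would
# turn the 1952 value into the 2007 value after n_years years
cagr_pct <- ((gdp_2007 / gdp_1952)^(1 / n_years) - 1) * 100

round(total_growth_pct, 2)
round(cagr_pct, 2)
```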