Synopsis

The purpose of this Markdown file is to get acquainted with R packages such as dplyr, ggplot2, rworldmap, and gapminder.
Findings: countries such as the United States and Australia have a high GDP per capita, while Qatar has the highest GDP per capita in 2007. The majority of countries have a gdpPercap below 20000. India's GDP per capita has grown over the years.

Packages Required

The following packages are loaded:

library(gapminder) ## Getting the Data
library(rworldmap) ## plotting the data on World Map
library(countrycode) ## Converting the country name to Country code
library(dplyr) ## For manipulating, transforming, filtering, summarizing the data
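library(ggplot2) ## For the histograms and line charts (ggplot() is used below, so it is loaded explicitly)
library(Hmisc) ## For describe(), which prints summary statistics
library(printr) ## For printing data frames as tables in the knitted output
library(RColorBrewer) ## To choose different colors for the graph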

Source Code

gapminder_unfiltered is the dataset that holds GDP per capita (gross domestic product per capita) for countries around the globe, collected over the years 1950 to 2007.
The dataset has the following variables:
1. country : Name of the country
2. continent : Name of the continent the country belongs to
3. year : Year in which the observation was collected
4. lifeExp : Life expectancy at birth for people in that country, in years
5. pop : Population of that country in that year
6. gdpPercap : GDP per capita (gross domestic product divided by the population)

Data Description

dim(gapminder_unfiltered)
## [1] 3313    6
colnames(gapminder_unfiltered)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"
sum(complete.cases(gapminder)) ## Rows without missing values in the filtered gapminder dataset (all 1704 of its rows are complete)
## [1] 1704
describe(gapminder_unfiltered) # getting the summary statistics for gapminder_unfiltered data
## gapminder_unfiltered 
## 
##  6  Variables      3313  Observations
## ---------------------------------------------------------------------------
## country 
##        n  missing distinct 
##     3313        0      187 
## 
## lowest : Afghanistan        Albania            Algeria            Angola             Argentina         
## highest: Vietnam            West Bank and Gaza Yemen, Rep.        Zambia             Zimbabwe           
## ---------------------------------------------------------------------------
## continent 
##        n  missing distinct 
##     3313        0        6 
## 
## lowest : Africa   Americas Asia     Europe   FSU     
## highest: Americas Asia     Europe   FSU      Oceania  
## 
## Africa (637, 0.192), Americas (470, 0.142), Asia (578, 0.174), Europe
## (1302, 0.393), FSU (139, 0.042), Oceania (187, 0.056)
## ---------------------------------------------------------------------------
## year 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0       58    0.998     1980    19.52     1952     1957 
##      .25      .50      .75      .90      .95 
##     1967     1982     1996     2002     2007 
## 
## lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007 
## ---------------------------------------------------------------------------
## lifeExp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     2571        1    65.24    12.73    41.22    45.37 
##      .25      .50      .75      .90      .95 
##    58.33    69.61    73.66    77.12    78.68 
## 
## lowest : 23.599 28.801 30.000 30.015 30.331
## highest: 82.208 82.270 82.360 82.603 82.670 
## ---------------------------------------------------------------------------
## pop 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      3313         0      3312         1  31773251  50168977    235605 
##       .10       .25       .50       .75       .90       .95 
##    436150   2680018   7559776  19610538  56737055 121365965 
## 
## lowest :      59412      59461      60011      60427      61325
## highest: 1110396331 1164970000 1230075000 1280400000 1318683096 
## ---------------------------------------------------------------------------
## gdpPercap 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     3313        1    11314    11542    665.7    887.9 
##      .25      .50      .75      .90      .95 
##   2505.3   7825.8  17355.7  26592.7  31534.9 
## 
## lowest :    241.1659    277.5519    298.8462    299.8503    312.1884
## highest:  82010.9780  95458.1118 108382.3529 109347.8670 113523.1329 
## ---------------------------------------------------------------------------
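
The missing column in the describe() output reports 0 for every variable. As a quick cross-check, the following minimal sketch counts the missing values in each column directly. It assumes the gapminder package is installed; na_per_column is a name introduced here for illustration only.

```r
library(gapminder)

# Count the NA values in each column of the unfiltered data
# (na_per_column is an illustrative name, not part of the original analysis)
na_per_column <- colSums(is.na(gapminder_unfiltered))
na_per_column
```

A count of zero in every column agrees with the describe() summary.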

Exploratory Data Analysis

The following questions are answered.

For the year 2007, what is the distribution of GDP per capita across all countries?

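gapminder <- gapminder_unfiltered ## Work on the unfiltered data from here on (this masks the package's gapminder dataset)
gapminder$countrycode <- countrycode(gapminder$country, 'country.name', 'iso3c') ## Add ISO3 codes for the map join

## Join the 2007 rows to the world map by ISO3 code
sPDF <- joinCountryData2Map(gapminder %>% filter(year == 2007),
                            joinCode = "ISO3",
                            nameJoinColumn = "countrycode",
                            mapResolution = "coarse",
                            verbose = TRUE)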
## 181 codes from your data successfully matched countries in the map
## 2 codes from your data failed to match with a country code in the map
##      failedCodes
## [1,] "ANT"      
## [2,] "REU"      
## 62 codes from the map weren't represented in your data
colourPalette <- brewer.pal(7,'GnBu')

mapParams <- mapCountryData(sPDF,
                            nameColumnToPlot="gdpPercap",
                            addLegend=FALSE,
                            colourPalette=colourPalette )

do.call(addMapLegend
        ,c(mapParams
           ,legendLabels="all"
           ,legendWidth=0.5
           ,legendIntervals="data"
           ,legendMar = 2))
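
The join output above reports two codes, ANT and REU, that have no match in the map. The following minimal sketch shows one way to see which country names those codes come from. It assumes the gapminder, countrycode, and dplyr packages are installed; codes_2007 and unmatched are names introduced here for illustration, and the codes produced may differ with the installed countrycode version.

```r
library(gapminder)
library(countrycode)
library(dplyr)

# Rebuild the ISO3 codes for the 2007 rows
# (codes_2007 and unmatched are illustrative names, not part of the original analysis)
codes_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  mutate(countrycode = countrycode(as.character(country), "country.name", "iso3c"))

# Keep the rows whose code the map join reported as unmatched
unmatched <- codes_2007 %>%
  filter(countrycode %in% c("ANT", "REU")) %>%
  select(country, countrycode)
unmatched
```

Countries listed here are simply left off the map; they remain in the data used for the other questions.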

gap2007 <- gapminder %>% filter(year == 2007)
ggplot(data = gap2007, aes(x = gdpPercap)) + geom_histogram(fill = '#1ABC9C')
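
The histogram suggests that most countries fall below a GDP per capita of 20000. The following minimal sketch puts a number on that by computing the share of 2007 observations under the threshold. It assumes the gapminder and dplyr packages are installed; share_below_20k is a name introduced here for illustration.

```r
library(gapminder)
library(dplyr)

gap2007 <- gapminder_unfiltered %>% filter(year == 2007)

# The mean of a logical vector is the proportion of TRUE values,
# here the share of countries with gdpPercap under 20000
# (share_below_20k is an illustrative name)
share_below_20k <- mean(gap2007$gdpPercap < 20000)
share_below_20k
```

Multiplying the result by 100 gives the percentage of countries below the threshold.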

For the year 2007, how do the distributions differ across the different continents?

ggplot(gap2007, aes(x = gdpPercap)) + geom_histogram(fill = '#1ABC9C', binwidth = 1000) +
  facet_wrap(~continent, scales = 'free')

For the year 2007, what are the top 10 countries with the largest GDP per capita?

gap2007 %>% arrange(-gdpPercap) %>% head(10) %>% select(country, gdpPercap)
country            gdpPercap
Qatar               82010.98
Macao, China        54589.82
Norway              49357.19
Brunei              48014.59
Kuwait              47306.99
Singapore           47143.18
United States       42951.65
Ireland             40676.00
Hong Kong, China    39724.98
Switzerland         37506.42
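
The query above sorts the whole table and keeps the first ten rows. The following minimal alternative sketch selects the ten largest values directly with slice_max(). It assumes the gapminder package and dplyr 1.0.0 or later are installed (slice_max() is not available in older dplyr versions); top10 is a name introduced here for illustration.

```r
library(gapminder)
library(dplyr)

# Keep the ten rows with the largest gdpPercap in 2007, largest first
# (top10 is an illustrative name, not part of the original analysis)
top10 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  slice_max(gdpPercap, n = 10) %>%
  select(country, gdpPercap)
top10
```

Both approaches should return the same ten countries; slice_max() just states the intent more directly.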

Plot the GDP per capita for your country of origin for all years available.

gapminder %>% filter(country =='India') %>%  ggplot(.,aes(x =year, y= gdpPercap)) +
  geom_line(color ='#1ABC9C') + geom_point(color='#21618C')

# grid.arrange(gp, ncol=3, nrow =2)

What was the percent growth (or decline) in GDP per capita in 2007?

gapminder %>% filter(country == 'India') %>% arrange(year) %>%
  mutate(PercentGrowth = (gdpPercap - lag(gdpPercap))*100/lag(gdpPercap)) %>% ## growth relative to the previous observation for India
  filter(year == 2007) ## select 2007 directly rather than the row with the largest growth
country  continent  year  lifeExp  pop         gdpPercap  countrycode  PercentGrowth
India    Asia       2007  64.698   1110396331  2452.21    IND          40.38546
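
The PercentGrowth column is the change from the previous observation, expressed as a percentage of that previous value. The following minimal sketch reproduces the same arithmetic in base R on made-up numbers; pct_growth and toy_gdp are hypothetical names and values used only for illustration.

```r
# Percent change of each value relative to the one before it
# (pct_growth is a hypothetical helper; the values below are made up)
pct_growth <- function(x) {
  previous <- c(NA, head(x, -1))  # shift the series by one position, like dplyr::lag()
  (x - previous) * 100 / previous
}

toy_gdp <- c(100, 110, 121)
growth <- pct_growth(toy_gdp)
growth  # NA for the first value, then 10 and 10
```

The first value is NA because it has no previous observation to compare against.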

What has been the historical growth (or decline) in GDP per capita for your country?

gapminder %>% filter(country =='India') %>% 
           mutate(PercentGrowth = (gdpPercap - lag(gdpPercap))*100/lag(gdpPercap)) %>% 
           ggplot(.,aes(x=year, y=PercentGrowth)) + geom_line(color ='#1ABC9C') + geom_point(color='#21618C')
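
The percent growth plotted above compares each observation with the previous one, so its size depends on how many years separate the two. If the observations are more than one year apart, a compound annual rate can be easier to compare across periods. The following is a minimal sketch in base R; annual_growth is a hypothetical helper and the values are made up for illustration.

```r
# Compound annual growth rate (in percent) between consecutive observations
# (annual_growth is a hypothetical helper; the toy values are made up)
annual_growth <- function(year, x) {
  previous <- c(NA, head(x, -1))  # value at the previous observation
  gap <- c(NA, diff(year))        # number of years between observations
  ((x / previous)^(1 / gap) - 1) * 100
}

toy_year <- c(2002, 2007)
toy_gdp <- c(100, 100 * 1.02^5)   # 2 percent growth per year over 5 years
annual <- annual_growth(toy_year, toy_gdp)
annual
```

With the toy values the second element is 2, the yearly rate that compounds to the total change over the five-year gap.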