Synopsis
This Markdown file is an exercise in getting acquainted with several R packages, including dplyr, ggplot2, rworldmap, and gapminder.
Findings: countries such as the United States and Australia have a high GDP per capita, while Qatar had the highest GDP per capita in 2007. The majority of countries have a GDP per capita below 20,000. India's GDP per capita has grown over the years.
Packages Required
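The following packages are loaded:
library(gapminder) ## Getting the data
library(rworldmap) ## Plotting the data on a world map
library(countrycode) ## Converting country names to country codes
library(dplyr) ## For manipulating, transforming, filtering, and summarizing the data
library(ggplot2) ## For the histograms and line charts; loaded explicitly rather than relying on another package to attach it
library(Hmisc) ## describe() for summary statistics
library(printr) ## Prints data frames as tables in the knitted output
library(RColorBrewer) ## To choose different colors for the graph
Source Code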
gapminder_unfiltered is the dataset used here. It records GDP per capita (gross domestic product per capita), along with life expectancy and population, for countries around the world, with observations from 1950 to 2007.
The dataset has the following variables:
1. country : Name of the country
2. continent : Name of the continent the country belongs to
3. year : Year in which the observation was collected
4. lifeExp : Life expectancy at birth in that country, in years
5. pop : Population of that country in that year
6. gdpPercap : GDP per capita (gross domestic product divided by the population)
Data Description
dim(gapminder_unfiltered)
## [1] 3313 6
colnames(gapminder_unfiltered)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
sum(complete.cases(gapminder)) ## counts complete rows in the filtered gapminder data; all 1704 rows are complete, so no missing values
## [1] 1704
describe(gapminder_unfiltered) # summary statistics for the gapminder_unfiltered data
## gapminder_unfiltered
##
## 6 Variables 3313 Observations
## ---------------------------------------------------------------------------
## country
## n missing distinct
## 3313 0 187
##
## lowest : Afghanistan Albania Algeria Angola Argentina
## highest: Vietnam West Bank and Gaza Yemen, Rep. Zambia Zimbabwe
## ---------------------------------------------------------------------------
## continent
## n missing distinct
## 3313 0 6
##
## lowest : Africa Americas Asia Europe FSU
## highest: Americas Asia Europe FSU Oceania
##
## Africa (637, 0.192), Americas (470, 0.142), Asia (578, 0.174), Europe
## (1302, 0.393), FSU (139, 0.042), Oceania (187, 0.056)
## ---------------------------------------------------------------------------
## year
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 58 0.998 1980 19.52 1952 1957
## .25 .50 .75 .90 .95
## 1967 1982 1996 2002 2007
##
## lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007
## ---------------------------------------------------------------------------
## lifeExp
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 2571 1 65.24 12.73 41.22 45.37
## .25 .50 .75 .90 .95
## 58.33 69.61 73.66 77.12 78.68
##
## lowest : 23.599 28.801 30.000 30.015 30.331
## highest: 82.208 82.270 82.360 82.603 82.670
## ---------------------------------------------------------------------------
## pop
## n missing distinct Info Mean Gmd .05
## 3313 0 3312 1 31773251 50168977 235605
## .10 .25 .50 .75 .90 .95
## 436150 2680018 7559776 19610538 56737055 121365965
##
## lowest : 59412 59461 60011 60427 61325
## highest: 1110396331 1164970000 1230075000 1280400000 1318683096
## ---------------------------------------------------------------------------
## gdpPercap
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 3313 1 11314 11542 665.7 887.9
## .25 .50 .75 .90 .95
## 2505.3 7825.8 17355.7 26592.7 31534.9
##
## lowest : 241.1659 277.5519 298.8462 299.8503 312.1884
## highest: 82010.9780 95458.1118 108382.3529 109347.8670 113523.1329
## ---------------------------------------------------------------------------
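The missing-value check above can also be broken down per column. The following is a minimal sketch in base R on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from gapminder); it contrasts the row-level count from `complete.cases()` with a per-column count of `NA` values.

```r
# Hypothetical toy data; the values are made up for illustration only.
toy <- data.frame(
  country   = c("A", "B", "C", "D"),
  lifeExp   = c(70.1, NA, 65.3, 80.2),
  gdpPercap = c(1200, 3400, NA, 56000)
)

# Number of rows with no NA in any column
n_complete <- sum(complete.cases(toy))

# Number of NA values in each column
na_per_column <- colSums(is.na(toy))

print(n_complete)
print(na_per_column)
```

When the row count from `complete.cases()` equals `nrow()` of the data, as it does for the 1704 rows reported above, every per-column count is zero.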
Exploratory Data Analysis
The following questions are answered below.
For the year 2007, what is the distribution of GDP per capita across all countries?
gapminder <- gapminder_unfiltered
gapminder$countrycode <- countrycode(gapminder$country, 'country.name', 'iso3c')
sPDF <- joinCountryData2Map(gapminder %>% filter(year == 2007)
,joinCode = "ISO3"
,nameJoinColumn = "countrycode"
,mapResolution = "coarse"
, verbose = TRUE)
## 181 codes from your data successfully matched countries in the map
## 2 codes from your data failed to match with a country code in the map
## failedCodes
## [1,] "ANT"
## [2,] "REU"
## 62 codes from the map weren't represented in your data
colourPalette <- brewer.pal(7,'GnBu')
mapParams <- mapCountryData(sPDF,
nameColumnToPlot="gdpPercap",
addLegend=FALSE,
colourPalette=colourPalette )
do.call(addMapLegend
,c(mapParams
,legendLabels="all"
,legendWidth=0.5
,legendIntervals="data"
,legendMar = 2))
gap2007 <- gapminder %>% filter(year == 2007)
ggplot(data = gap2007, aes(x = gdpPercap)) + geom_histogram(fill = '#1ABC9C')
For the year 2007, how do the distributions differ across the different continents?
## map the column by name (not gap2007$gdpPercap) so that facet_wrap() can split the data by continent
ggplot(gap2007, aes(x = gdpPercap)) + geom_histogram(fill = '#1ABC9C') +
facet_wrap(~continent, scales = 'free')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
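The `stat_bin()` message above suggests choosing a bin width explicitly instead of relying on the default of 30 bins. As a rough illustration of what a fixed bin width does, here is a sketch in base R on made-up values; the `gdp` vector and the width of 5000 are assumptions chosen only for this example. In ggplot2 the corresponding setting is the `binwidth` argument of `geom_histogram()`.

```r
# Hypothetical GDP per capita values, made up for illustration only.
gdp <- c(500, 900, 1500, 2400, 3100, 4800, 9500, 12000, 26000, 41000)

bin_width <- 5000  # assumed width, chosen only for this sketch

# Bin edges from 0 up to the first multiple of bin_width above the maximum
breaks <- seq(0, ceiling(max(gdp) / bin_width) * bin_width, by = bin_width)

# plot = FALSE returns the bin counts without drawing anything
h <- hist(gdp, breaks = breaks, plot = FALSE)

print(data.frame(lower = head(h$breaks, -1),
                 upper = h$breaks[-1],
                 count = h$counts))
```

With a right-skewed variable such as GDP per capita, most observations fall into the first few bins, which is why the choice of width changes how much detail is visible at the low end.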
For the year 2007, what are the top 10 countries with the largest GDP per capita?
gap2007 %>% arrange(desc(gdpPercap)) %>% head(10) %>% select(country, gdpPercap)
| country | gdpPercap |
|---|---|
| Qatar | 82010.98 |
| Macao, China | 54589.82 |
| Norway | 49357.19 |
| Brunei | 48014.59 |
| Kuwait | 47306.99 |
| Singapore | 47143.18 |
| United States | 42951.65 |
| Ireland | 40676.00 |
| Hong Kong, China | 39724.98 |
| Switzerland | 37506.42 |
Plot the GDP per capita for your country of origin for all years available.
gapminder %>% filter(country == 'India') %>% ggplot(., aes(x = year, y = gdpPercap)) +
geom_line(color = '#1ABC9C') + geom_point(color = '#21618C')
What was the percent growth (or decline) in GDP per capita in 2007?
gapminder %>% filter(country == 'India') %>% arrange(year) %>% mutate(PercentGrowth =
(gdpPercap - lag(gdpPercap))*100/lag(gdpPercap)) %>%
filter(year == 2007) ## select the 2007 row directly; growth is relative to India's previous observation in the data
| country | continent | year | lifeExp | pop | gdpPercap | countrycode | PercentGrowth |
|---|---|---|---|---|---|---|---|
| India | Asia | 2007 | 64.698 | 1110396331 | 2452.21 | IND | 40.38546 |
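The growth figure above compares 2007 with the previous observation in the data, which is not necessarily the previous calendar year. The sketch below reproduces the same lag-based calculation in base R on a small made-up series (the `toy` data frame, its years, and its values are hypothetical) and, as an assumption about how one might adjust for the gap, adds a compound annual rate.

```r
# Hypothetical series; years and values are made up for illustration only.
toy <- data.frame(
  year      = c(1997, 2002, 2007),
  gdpPercap = c(1000, 1250, 1500)
)

# Previous observation (base R equivalent of dplyr::lag() with its default n = 1)
prev <- c(NA, head(toy$gdpPercap, -1))

# Percent change relative to the previous observation
toy$PercentGrowth <- (toy$gdpPercap - prev) * 100 / prev

# Compound annual rate over the gap between consecutive observations
gap_years <- c(NA, diff(toy$year))
toy$AnnualGrowth <- ((toy$gdpPercap / prev)^(1 / gap_years) - 1) * 100

print(toy)

# Growth for a specific year is selected by filtering on the year, not by sorting
growth_2007 <- toy$PercentGrowth[toy$year == 2007]
print(growth_2007)
```

The first row has no previous observation, so its growth is `NA`. Selecting the row by year keeps the answer tied to 2007 even when another period has a larger growth value.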
What has been the historical growth (or decline) in GDP per capita for your country?
gapminder %>% filter(country =='India') %>%
mutate(PercentGrowth = (gdpPercap - lag(gdpPercap))*100/lag(gdpPercap)) %>%
ggplot(.,aes(x=year, y=PercentGrowth)) + geom_line(color ='#1ABC9C') + geom_point(color='#21618C')