1. Synopsis

In this markdown file we analyze the Gapminder dataset, which was originally created by Hans Rosling and visualized with the Trendalyzer tool.

2. Packages

The required packages are:

library(gapminder)
library(tidyverse)

3. Source Code

There are 6 columns in the gapminder_unfiltered dataset:

  1. country: the name of the country
  2. continent: the name of the continent
  3. year: the year of observation, ranging from 1950 to 2007
  4. lifeExp: life expectancy at birth, in years
  5. pop: population of the country
  6. gdpPercap: GDP per capita

gu <- gapminder_unfiltered
dim(gu)
## [1] 3313    6
str(gu)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...

4. Data Description

  1. Number of observations: 3313
  2. Missing values: there are no missing values in gapminder_unfiltered.
  3. Number of countries: 187
  4. Number of continents: 6 (Africa, Americas, Asia, Europe, FSU and Oceania)
  5. Years: from 1950 to 2007
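The facts listed above can be checked directly from the data rather than read off the printed summaries. The snippet below is a minimal sketch, assuming the gapminder package is installed; the variable names (`n_obs`, `na_counts` and so on) are introduced here for illustration only.

```r
# Sketch: verify the data description from the data itself
# (assumes the gapminder package is installed)
library(gapminder)

gu <- gapminder_unfiltered

n_obs <- nrow(gu)                      # number of observations (rows)
na_counts <- colSums(is.na(gu))        # missing values in each column
n_countries <- nlevels(gu$country)     # levels of the country factor
n_continents <- nlevels(gu$continent)  # levels of the continent factor
year_range <- range(gu$year)           # earliest and latest year

print(n_obs)
print(na_counts)
print(n_countries)
print(n_continents)
print(year_range)
```

Counting `NA` values per column with `colSums(is.na(...))` makes the "no missing values" claim explicit instead of relying on the absence of an `NA's` row in `summary()`.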

summary(gu)
##            country        continent         year         lifeExp     
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60  
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33  
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61  
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24  
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66  
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67  
##  (Other)       :2965                                                 
##       pop              gdpPercap       
##  Min.   :5.941e+04   Min.   :   241.2  
##  1st Qu.:2.680e+06   1st Qu.:  2505.3  
##  Median :7.560e+06   Median :  7825.8  
##  Mean   :3.177e+07   Mean   : 11313.8  
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
gu$continent %>% summary()
##   Africa Americas     Asia   Europe      FSU  Oceania 
##      637      470      578     1302      139      187
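The continent counts above are numbers of rows (country-year observations), not numbers of countries, so Europe's 1302 partly reflects how many years each European country is observed. A minimal sketch for counting distinct countries per continent, assuming dplyr's `distinct()` and `count()`:

```r
# Sketch: number of distinct countries in each continent
library(gapminder)
library(dplyr)

countries_per_continent <- gapminder_unfiltered %>%
  distinct(country, continent) %>%  # one row per country
  count(continent)                  # adds a column n with the number of countries

print(countries_per_continent)
```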

5. Exploratory Data Analysis

In this section we answer the following questions using a combination of data transformation and visualization techniques.

1. For the year 2007, what is the distribution of GDP per capita across all countries?

cry07 <- filter(gu, year == 2007)
summary(cry07$gdpPercap)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   277.6  2147.0  6873.0 12400.0 19000.0 82010.0
boxplot(cry07$gdpPercap)
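In the 2007 summary the mean (12400) is well above the median (6873), which suggests a right-skewed distribution with a long tail of high-income countries. A log-scaled histogram is one common way to make such a distribution easier to read. This is a sketch using ggplot2's `scale_x_log10()`; the bin count of 20 is an arbitrary choice.

```r
# Sketch: skew check and log-scale histogram of 2007 GDP per capita
library(gapminder)
library(dplyr)
library(ggplot2)

cry07 <- filter(gapminder_unfiltered, year == 2007)

# A mean above the median is a rough sign of right skew
skew_check <- mean(cry07$gdpPercap) > median(cry07$gdpPercap)
print(skew_check)

# A log10 x-axis spreads out the many countries with low GDP per capita
p <- ggplot(cry07, aes(x = gdpPercap)) +
  geom_histogram(bins = 20) +
  scale_x_log10() +
  labs(x = "GDP per capita (log10 scale)", y = "Number of countries")
print(p)
```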

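2. For the year 2007, how do the distributions differ across the different continents?

con07 <- filter(gu, year == 2007)
# summary() ignores any grouping, so this repeats the overall 2007 summary;
# the faceted histogram below shows the per-continent distributions
summary(con07$gdpPercap)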
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   277.6  2147.0  6873.0 12400.0 19000.0 82010.0
ggplot(con07, aes(x = gdpPercap)) +
  geom_histogram() +
  facet_wrap(~ continent, ncol = 3)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
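Because `summary()` does not respect `group_by()`, per-continent statistics need a grouped `summarise()`. The following is a minimal sketch of that approach, plus a side-by-side boxplot; the column names such as `median_gdp` are chosen here for illustration.

```r
# Sketch: per-continent summary of 2007 GDP per capita
library(gapminder)
library(dplyr)
library(ggplot2)

con07 <- filter(gapminder_unfiltered, year == 2007)

con_summary <- con07 %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),
    min_gdp = min(gdpPercap),
    median_gdp = median(gdpPercap),
    mean_gdp = mean(gdpPercap),
    max_gdp = max(gdpPercap)
  )
print(con_summary)

# One box per continent makes the spread and outliers directly comparable
p_box <- ggplot(con07, aes(x = continent, y = gdpPercap)) +
  geom_boxplot()
print(p_box)
```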

3. For the year 2007, what are the top 10 countries with the largest GDP per capita?

top10cn <- gu %>%
  filter(year == 2007) %>%
  arrange(gdpPercap) %>%
  tail(n = 10)
ggplot(data = top10cn) +
  geom_point(mapping = aes(x = gdpPercap, y = country))
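An alternative sketch selects the top 10 directly with `slice_max()` (available in dplyr 1.0.0 and later) and orders the countries by GDP per capita in the plot with `reorder()`, so the ranking is visible at a glance. `with_ties = FALSE` guarantees exactly 10 rows even if values tie.

```r
# Sketch: top 10 countries by GDP per capita in 2007 (requires dplyr >= 1.0.0)
library(gapminder)
library(dplyr)
library(ggplot2)

top10 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  slice_max(gdpPercap, n = 10, with_ties = FALSE)  # sorted largest first

# reorder() sorts the country axis by GDP per capita
p_top <- ggplot(top10, aes(x = gdpPercap, y = reorder(country, gdpPercap))) +
  geom_col() +
  labs(x = "GDP per capita", y = NULL)
print(p_top)
```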

4. Plot the GDP per capita for your country of origin for all years available.

India <- filter(gu, country == "India")
nrow(India)
## [1] 12
ggplot(data = India, mapping = aes(x = year, y = gdpPercap)) +
  geom_point() +
  geom_smooth()
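Since the question asks about "your country of origin", the same plot may be useful for other countries as well. The sketch below wraps it in a small function; `plot_country()` is a hypothetical helper written for this example, not part of gapminder or ggplot2, and it uses a line through the points instead of a smoother.

```r
# Sketch: reusable GDP-per-capita plot for any country
library(gapminder)
library(dplyr)
library(ggplot2)

# plot_country() is a hypothetical helper defined here for illustration
plot_country <- function(data, country_name) {
  country_data <- filter(data, country == country_name)
  if (nrow(country_data) == 0) {
    stop("No rows found for country: ", country_name)
  }
  ggplot(country_data, aes(x = year, y = gdpPercap)) +
    geom_line() +
    geom_point() +
    labs(title = country_name, x = "Year", y = "GDP per capita")
}

p_india <- plot_country(gapminder_unfiltered, "India")
print(p_india)
```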

5. What was the percent growth (or decline) in GDP per capita in 2007? Because the data are recorded every five years, the growth for India in 2007 is computed below relative to 2002.

# India[12, 6] is gdpPercap in 2007
# India[11, 6] is gdpPercap in 2002
100 * (India[12, 6] - India[11, 6]) / India[11, 6]
##   gdpPercap
## 1  40.38546
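Indexing by row and column position depends on the row order and column layout staying fixed. A minimal sketch that selects the two years by value instead computes the same quantity, 100 × (gdpPercap in 2007 − gdpPercap in 2002) / gdpPercap in 2002:

```r
# Sketch: percent growth 2002 -> 2007, selecting rows by year rather than position
library(gapminder)
library(dplyr)

India <- filter(gapminder_unfiltered, country == "India")

gdp_2002 <- India$gdpPercap[India$year == 2002]
gdp_2007 <- India$gdpPercap[India$year == 2007]

growth_pct <- 100 * (gdp_2007 - gdp_2002) / gdp_2002
print(growth_pct)
```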

6. What has been the historical growth (or decline) in GDP per capita for your country? GDP per capita in India has grown over the period covered by the data.

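# Percent change in GDP per capita from one observation to the next
gpnew <- gu %>%
  filter(country == "India") %>%
  arrange(year) %>%
  mutate(percent = 100 * (gdpPercap - lag(gdpPercap)) / lag(gdpPercap))
# The first year has no earlier value, so its percent change is NA and is dropped
ggplot(data = filter(gpnew, !is.na(percent))) +
  geom_point(mapping = aes(x = year, y = percent))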
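To summarize the historical growth in a single number, one option (added here, not part of the original analysis) is the compound annual growth rate between the first and last observation, (last / first)^(1 / years) − 1, together with a count of the intervals in which GDP per capita fell. A minimal sketch:

```r
# Sketch: compound annual growth rate and number of declines for India
library(gapminder)
library(dplyr)

india <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year)

gdp <- india$gdpPercap
n_years <- max(india$year) - min(india$year)

# Compound annual growth rate between first and last observation, in percent
cagr_pct <- 100 * ((gdp[length(gdp)] / gdp[1])^(1 / n_years) - 1)

# Number of consecutive observations in which GDP per capita fell
n_declines <- sum(diff(gdp) < 0)

print(cagr_pct)
print(n_declines)
```

The growth rate smooths over the five-year steps, while the decline count shows whether that growth was uninterrupted.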