Gapminder_Unfiltered Analysis

Synopsis

This is my week 4 homework report, produced with R Markdown. It uses the gapminder_unfiltered dataset to answer the assigned questions and lists the R packages used. Initial findings:

  1. India's GDP per capita growth was highest in the period from 2002 to 2007.
  2. In 2007, Asia had the greatest variation in GDP per capita of any continent.
  3. India's GDP per capita shows an increasing trend from 1952 to 2007.

Packages Required

library(gapminder) # provides the gapminder_unfiltered dataset
library(Hmisc)     # describe() gives a detailed description of each variable
library(tidyverse) # includes dplyr (data manipulation) and ggplot2 (plots)

Source Code

gapminder_unfiltered is a dataset with life expectancy, population, and GDP per capita for 187 countries from 1950 to 2007. It has 6 variables and 3313 observations.

The dataset includes the following variables:

  1. country: country name (187 countries)
  2. continent: one of 6 continent groups: Africa, Americas, Asia, Europe, FSU (former Soviet Union), and Oceania
  3. year: year of the observation, from 1950 to 2007
  4. lifeExp: life expectancy at birth, in years (numeric)
  5. pop: total population of the country
  6. gdpPercap: GDP per capita (US$, inflation-adjusted)

Observation years vary by country: many countries are observed every 5 years starting in 1952, while some (for example Czech Republic, Denmark, Finland, Iceland, Japan, and Netherlands) have 58 annual observations from 1950 to 2007.

Data Description

gap <- gapminder::gapminder_unfiltered
names(gap)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"
# Structure and type of each variable
str(gap)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
# Summary statistics for every variable
summary(gap)
##            country        continent         year         lifeExp     
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60  
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33  
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61  
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24  
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66  
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67  
##  (Other)       :2965                                                 
##       pop              gdpPercap       
##  Min.   :5.941e+04   Min.   :   241.2  
##  1st Qu.:2.680e+06   1st Qu.:  2505.3  
##  Median :7.560e+06   Median :  7825.8  
##  Mean   :3.177e+07   Mean   : 11313.8  
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
# Hmisc::describe(): counts, missing values, distinct values, quantiles and extremes per variable
describe(gap)
## gap 
## 
##  6  Variables      3313  Observations
## ---------------------------------------------------------------------------
## country 
##        n  missing distinct 
##     3313        0      187 
## 
## lowest : Afghanistan        Albania            Algeria            Angola             Argentina         
## highest: Vietnam            West Bank and Gaza Yemen, Rep.        Zambia             Zimbabwe           
## ---------------------------------------------------------------------------
## continent 
##        n  missing distinct 
##     3313        0        6 
## 
## lowest : Africa   Americas Asia     Europe   FSU     
## highest: Americas Asia     Europe   FSU      Oceania  
## 
## Africa (637, 0.192), Americas (470, 0.142), Asia (578, 0.174), Europe
## (1302, 0.393), FSU (139, 0.042), Oceania (187, 0.056)
## ---------------------------------------------------------------------------
## year 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0       58    0.998     1980    19.52     1952     1957 
##      .25      .50      .75      .90      .95 
##     1967     1982     1996     2002     2007 
## 
## lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007 
## ---------------------------------------------------------------------------
## lifeExp 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     2571        1    65.24    12.73    41.22    45.37 
##      .25      .50      .75      .90      .95 
##    58.33    69.61    73.66    77.12    78.68 
## 
## lowest : 23.599 28.801 30.000 30.015 30.331
## highest: 82.208 82.270 82.360 82.603 82.670 
## ---------------------------------------------------------------------------
## pop 
##         n   missing  distinct      Info      Mean       Gmd       .05 
##      3313         0      3312         1  31773251  50168977    235605 
##       .10       .25       .50       .75       .90       .95 
##    436150   2680018   7559776  19610538  56737055 121365965 
## 
## lowest :      59412      59461      60011      60427      61325
## highest: 1110396331 1164970000 1230075000 1280400000 1318683096 
## ---------------------------------------------------------------------------
## gdpPercap 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##     3313        0     3313        1    11314    11542    665.7    887.9 
##      .25      .50      .75      .90      .95 
##   2505.3   7825.8  17355.7  26592.7  31534.9 
## 
## lowest :    241.1659    277.5519    298.8462    299.8503    312.1884
## highest:  82010.9780  95458.1118 108382.3529 109347.8670 113523.1329 
## ---------------------------------------------------------------------------
unique(gap$continent)
## [1] Asia     Europe   Africa   Americas FSU      Oceania 
## Levels: Africa Americas Asia Europe FSU Oceania
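The proportions that describe() prints next to each continent count are simply each count divided by the 3313 observations. A minimal base-R sketch of that arithmetic, using the counts copied by hand from the describe() output rather than computed from the data:

```r
# Continent counts copied from the describe() output for gapminder_unfiltered.
continent_counts <- c(Africa = 637, Americas = 470, Asia = 578,
                      Europe = 1302, FSU = 139, Oceania = 187)

# Share of all observations in each continent, rounded as describe() shows it.
continent_share <- round(prop.table(continent_counts), 3)

sum(continent_counts)  # total number of observations
continent_share
```

With the data loaded, `prop.table(table(gap$continent))` computes the same shares directly from the factor.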

Exploratory Data Analysis

Question 1: For the year 2007, what is the distribution of GDP per capita across all countries?

gap_2007 <- filter(gap, year == 2007)  # keep only the 2007 observations
ggplot(gap_2007, aes(x = country, y = gdpPercap)) +
  geom_col() +
  coord_flip() +
  ggtitle("GDP per capita by country, 2007")
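One bar per country shows every value but makes the overall shape of the distribution hard to read; a histogram instead counts how many countries fall into each GDP per capita range. The sketch below shows that tabulation in base R. `bin_counts` is a hypothetical helper name, and the bin width is an arbitrary choice.

```r
# Count how many values fall into each fixed-width interval, starting at 0.
# Intervals are right-closed, e.g. (500, 1000]; the first one also includes 0.
# Assumes all values are positive, as GDP per capita is.
bin_counts <- function(x, width) {
  upper  <- ceiling(max(x) / width) * width
  breaks <- seq(0, upper, by = width)
  table(cut(x, breaks = breaks, include.lowest = TRUE, dig.lab = 6))
}

# Example: Afghanistan 1952-1972 values, rounded as in the str() output.
bin_counts(c(779, 821, 853, 836, 740), width = 500)
```

For the report itself, `ggplot(gap_2007, aes(x = gdpPercap)) + geom_histogram(binwidth = 5000, boundary = 0)` would draw the same kind of right-closed bins as a plot; the 5000 bin width is only a suggestion.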

Question 2: For the year 2007, how do the distributions differ across the different continents?

# Map only fill to continent: mapping colour to the same variable would draw
# the median line in the fill colour and hide it inside the box.
ggplot(data = gap_2007, mapping = aes(x = continent, y = gdpPercap, fill = continent)) +
  geom_boxplot() +
  ggtitle("GDP per capita by continent, 2007")

The boxplots show outlying GDP per capita values in 2007 for four countries in the Americas, four in Africa, and one in Asia.
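Counting outlier points by eye is error-prone. By default, ggplot2's geom_boxplot() draws as separate points the values lying more than 1.5 times the interquartile range (IQR) below the first quartile or above the third. A minimal base-R sketch of that rule, with hypothetical helper names `iqr_outliers` and `group_spread`, reports the IQR and the number of outlying values in each group:

```r
# Flag values more than 1.5 * IQR below the first quartile or above the third,
# the rule geom_boxplot() uses by default to draw outlier points.
iqr_outliers <- function(x) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)
  fence <- 1.5 * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

# Spread (IQR) and number of outlying values within each group.
group_spread <- function(values, groups) {
  groups <- factor(groups)  # drops unused levels, so every group has data
  iqr    <- tapply(values, groups, IQR)
  n_out  <- tapply(values, groups, function(v) sum(iqr_outliers(v)))
  data.frame(group = names(iqr), iqr = as.vector(iqr),
             outliers = as.vector(n_out))
}

# Toy example with made-up values: group A has one outlier, group B none.
group_spread(c(1, 2, 3, 4, 100, 10, 11, 12, 13, 14),
             rep(c("A", "B"), each = 5))
```

On the report's data, `group_spread(gap_2007$gdpPercap, gap_2007$continent)` would give per-continent outlier counts to check against the figures quoted above, plus an IQR per continent as one measure of variation.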

Question 3: For the year 2007, what are the top 10 countries with the largest GDP per capita?

top_cntry <- gap_2007 %>%
  arrange(desc(gdpPercap)) %>%  # highest GDP per capita first
  select(country) %>%
  head(10)
print(top_cntry)
## # A tibble: 10 × 1
##             country
##              <fctr>
## 1             Qatar
## 2      Macao, China
## 3            Norway
## 4            Brunei
## 5            Kuwait
## 6         Singapore
## 7     United States
## 8           Ireland
## 9  Hong Kong, China
## 10      Switzerland
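The same selection can be written without dplyr by ordering the values from largest to smallest and keeping the first n names. A sketch with the hypothetical helper name `top_n_names`; like head() after arrange(), it may cut a group of tied values at position n.

```r
# Return the n names with the largest values, largest first.
top_n_names <- function(names, values, n = 10) {
  ord <- order(values, decreasing = TRUE)
  head(names[ord], n)
}

# Example: Afghanistan 1952-1972 (values rounded as in str()), top 2 years.
top_n_names(c(1952, 1957, 1962, 1967, 1972), c(779, 821, 853, 836, 740), n = 2)
```

In dplyr 1.0.0 and later, `slice_max(gap_2007, gdpPercap, n = 10)` does this in one step; it keeps tied rows by default (`with_ties = TRUE`), so it can return more than 10 rows.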

Question 4: Plot the GDP per capita for your country of origin for all years available.

gap_india <- filter(gap, country == "India")

ggplot(gap_india, aes(x = year, y = gdpPercap)) +
  geom_col() +
  scale_x_continuous(breaks = gap_india$year) +  # one tick per observed year
  ggtitle("GDP per capita for India, 1952 to 2007")
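The bar chart shows the trend visually. One common way to condense it into a single number, not used in the original analysis, is the compound annual growth rate, CAGR = (end / start)^(1 / years) - 1: the constant yearly rate that would take the start value to the end value. A minimal sketch; `cagr` is a hypothetical helper name:

```r
# Compound annual growth rate between two values observed `years` apart,
# returned as a fraction (0.05 means 5% per year).
cagr <- function(start, end, years) {
  (end / start)^(1 / years) - 1
}

# Example: a value that grows from 100 to 121 over 2 years grows 10% per year.
cagr(100, 121, 2)
```

For India, `cagr(gap_india$gdpPercap[gap_india$year == 1952], gap_india$gdpPercap[gap_india$year == 2007], 2007 - 1952)` would give the average yearly growth over the whole period; a positive value would be consistent with the increasing trend noted in the synopsis.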

Question 5: What was the percent growth (or decline) in GDP per capita in 2007?

Growth in 2007 is measured relative to 2002, the previous observation for India.

india_gdp2002 <- filter(gap, country == "India", year == 2002) %>% pull(gdpPercap)
india_gdp2007 <- filter(gap, country == "India", year == 2007) %>% pull(gdpPercap)

india_gdp_growth_2007 <- (india_gdp2007 - india_gdp2002) * 100 / india_gdp2002
print(india_gdp_growth_2007)
## [1] 40.38546
# India's GDP per capita grew by approximately 40.4% from 2002 to 2007.
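The calculation above can be wrapped in a reusable function that works for any country and pair of years, and that stops with an error when a year is missing instead of silently returning an empty result. A sketch with the hypothetical name `pct_growth_between`, using base-R subsetting:

```r
# Percent change in gdpPercap for one country between years `from` and `to`:
# (value_to - value_from) * 100 / value_from.
pct_growth_between <- function(df, country, from, to) {
  value_in <- function(yr) {
    v <- df$gdpPercap[df$country == country & df$year == yr]
    if (length(v) != 1) {
      stop("expected exactly one row for ", country, " in ", yr)
    }
    v
  }
  (value_in(to) - value_in(from)) * 100 / value_in(from)
}

# Example: Afghanistan 1952 and 1957, values rounded as in the str() output.
afg <- data.frame(country = "Afghanistan", year = c(1952, 1957),
                  gdpPercap = c(779, 821))
pct_growth_between(afg, "Afghanistan", 1952, 1957)
```

Called as `pct_growth_between(gap, "India", 2002, 2007)`, it applies the same formula to the same rows as the code above, so it should reproduce the printed value.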

Question 6: What has been the historical growth (or decline) in GDP per capita for your country?

# Percent change from the previous observation; lag() here is dplyr::lag(),
# which returns NA for the first year.
gdp_growth_india <- filter(gap, country == "India") %>%
  mutate(percent_growth = (gdpPercap - lag(gdpPercap)) * 100 / lag(gdpPercap),
         label_n = paste0(round(percent_growth, 2), "%"))

# Drop the first year, which has no previous observation to compare against.
dt <- filter(gdp_growth_india, !is.na(percent_growth))

ggplot(dt, aes(x = year, y = percent_growth)) +
  geom_col() +
  scale_x_continuous(breaks = dt$year) +
  ggtitle("Percent growth in GDP per capita for India from the previous observation (5 years earlier)")

# The largest five-year growth in GDP per capita is observed for 2007 (relative to 2002).
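The lag()-based growth column can also be computed in base R with diff(), which returns the change between consecutive values. A sketch with the hypothetical name `period_growth`, assuming the values are already sorted by year:

```r
# Percent change of each value from the one before it, i.e.
# (x[i] - x[i - 1]) * 100 / x[i - 1]; the first element has no predecessor, so NA.
period_growth <- function(x) {
  c(NA, diff(x) * 100 / head(x, -1))
}

# Example: Afghanistan 1952-1972, values rounded as in the str() output.
years  <- c(1952, 1957, 1962, 1967, 1972)
growth <- period_growth(c(779, 821, 853, 836, 740))
round(growth, 2)
years[which.max(growth)]  # year with the largest growth from the previous observation
```

Applied to `gap_india$gdpPercap` (sorted by year), it gives the same percentages as the `percent_growth` column, and `which.max()` (which skips the leading NA) picks out the year with the largest growth.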