1. Synopsis
In this R Markdown file we analyze the Gapminder dataset, which was originally created by Hans Rosling, with the visualization done in the Trendalyzer tool.
2. Packages
The required packages are:
library(gapminder)
library(tidyverse)
3. Source Code
The gapminder_unfiltered dataset has 6 columns.
gu <- gapminder_unfiltered
dim(gu)
## [1] 3313 6
str(gu)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3313 obs. of 6 variables:
## $ country : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
4. Data Description
Number of observations: 3313
Number of continents: 6
Year range: 1950 to 2007
Missing values: there are no missing values in gapminder_unfiltered.
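The figures above can be checked directly in R. The following is a minimal sketch, not part of the original analysis; the variable names `na_per_column`, `year_range`, and `n_continents` are introduced here for illustration only.

```r
library(gapminder)

gu <- gapminder_unfiltered

# Count missing values in each column; all zeros means the data has no NAs
na_per_column <- colSums(is.na(gu))
na_per_column

# Check the range of years and the number of continent levels
year_range <- range(gu$year)
n_continents <- nlevels(gu$continent)
year_range
n_continents
```

Counting NAs per column makes the "no missing values" statement reproducible instead of relying on reading the `summary()` output by eye.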
summary(gu)
## country continent year lifeExp
## Czech Republic: 58 Africa : 637 Min. :1950 Min. :23.60
## Denmark : 58 Americas: 470 1st Qu.:1967 1st Qu.:58.33
## Finland : 58 Asia : 578 Median :1982 Median :69.61
## Iceland : 58 Europe :1302 Mean :1980 Mean :65.24
## Japan : 58 FSU : 139 3rd Qu.:1996 3rd Qu.:73.66
## Netherlands : 58 Oceania : 187 Max. :2007 Max. :82.67
## (Other) :2965
## pop gdpPercap
## Min. :5.941e+04 Min. : 241.2
## 1st Qu.:2.680e+06 1st Qu.: 2505.3
## Median :7.560e+06 Median : 7825.8
## Mean :3.177e+07 Mean : 11313.8
## 3rd Qu.:1.961e+07 3rd Qu.: 17355.8
## Max. :1.319e+09 Max. :113523.1
##
gu$continent %>% summary()
## Africa Americas Asia Europe FSU Oceania
## 637 470 578 1302 139 187
5. Exploratory Data Analysis
In this section we answer the following questions using a combination of data transformation and visualization techniques.
1. For the year 2007, what is the distribution of GDP per capita across all countries?
cry07 <- filter(gu, year == 2007)
summary(cry07$gdpPercap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 277.6 2147.0 6873.0 12400.0 19000.0 82010.0
boxplot(cry07$gdpPercap)
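# Same 2007 data, grouped by continent for the faceted histogram
con07 <- filter(gu, year == 2007) %>% group_by(continent)
# summary() on the extracted column ignores the grouping, so this repeats the overall 2007 summary
summary(con07$gdpPercap)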
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 277.6 2147.0 6873.0 12400.0 19000.0 82010.0
ggplot(con07, aes(x = gdpPercap)) +
  geom_histogram() +
  facet_wrap(~ continent, ncol = 3)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
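Because `summary()` on a single column does not use the grouping, a per-continent breakdown needs `summarise()` after `group_by()`. The sketch below is one possible way to do this; the table name `by_continent` and its column names are assumptions made for illustration, not part of the original report.

```r
library(gapminder)
library(dplyr)

con07 <- gapminder_unfiltered %>%
  filter(year == 2007)

# summarise() respects group_by(), returning one row per continent
by_continent <- con07 %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),
    median_gdp  = median(gdpPercap),
    mean_gdp    = mean(gdpPercap),
    max_gdp     = max(gdpPercap)
  )
by_continent
```

The resulting table can be read alongside the faceted histogram to compare the centre and spread of GDP per capita between continents.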
3. For the year 2007, what are the top 10 countries with the largest GDP per capita?
# Sort ascending by GDP per capita and keep the last 10 rows
top10cn <- gu %>% filter(year == 2007) %>% arrange(gdpPercap) %>% tail(n = 10)
ggplot(data = top10cn) +
  geom_point(mapping = aes(x = gdpPercap, y = country))
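In the plot above the countries appear in factor-level order on the y-axis, not in order of GDP per capita. A possible variation, sketched here with the assumed name `p_top10`, sorts the axis with `reorder()` so the ranking is visible at a glance.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Top 10 countries by GDP per capita in 2007, largest first
top10cn <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  arrange(desc(gdpPercap)) %>%
  head(10)

# reorder() sorts the country factor by gdpPercap, so points rise along the y-axis
p_top10 <- ggplot(top10cn, aes(x = gdpPercap, y = reorder(country, gdpPercap))) +
  geom_point() +
  labs(x = "GDP per capita (2007)", y = "Country")
p_top10
```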
4. Plot the GDP per capita for your country of origin for all years available.
India <- filter(gu, country == "India")
nrow(India)
## [1] 12
ggplot(data = India, mapping = aes(x = year, y = gdpPercap)) +
  geom_point() +
  geom_smooth()
5. What was the percent growth (or decline) in GDP per capita in 2007? The percent growth for India over the 5 years from 2002 to 2007 is given below.
# India[12, 6] is gdpPercap for 2007
# India[11, 6] is gdpPercap for 2002
100 * (India[12, 6] - India[11, 6]) / India[11, 6]
## gdpPercap
## 1 40.38546
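The calculation is the usual percent change, 100 × (new − old) / old. Indexing by row and column position depends on the row order and column layout, so a slightly more robust sketch looks the two values up by year. The names `india`, `gdp_2002`, `gdp_2007`, and `growth_pct` are illustrative and not from the original code.

```r
library(gapminder)
library(dplyr)

india <- gapminder_unfiltered %>%
  filter(country == "India")

# Look the two values up by year instead of by row position
gdp_2002 <- india$gdpPercap[india$year == 2002]
gdp_2007 <- india$gdpPercap[india$year == 2007]

# Percent change from 2002 to 2007
growth_pct <- 100 * (gdp_2007 - gdp_2002) / gdp_2002
growth_pct
```

This returns a plain number and should agree with the 40.38546 shown above.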
6. What has been the historical growth (or decline) in GDP per capita for your country? GDP per capita in India has grown over the period covered by the data.
gpnew <- gu %>%
  group_by(country) %>%
  filter(country == "India") %>%
  # percent change in gdpPercap relative to the previous observation
  mutate(percent = 100 * (gdpPercap - lag(gdpPercap)) / lag(gdpPercap))
ggplot(data = gpnew) +
  geom_point(mapping = aes(x = year, y = gdpPercap))
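The `percent` column computed above is not used in the plot, which shows the GDP per capita level. One way to show the period-to-period growth itself is sketched below; the names `india_growth` and `p_growth` and the choice of a bar chart are assumptions for illustration.

```r
library(gapminder)
library(dplyr)
library(ggplot2)

india_growth <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  # percent change relative to the previous observation (5 years earlier)
  mutate(percent = 100 * (gdpPercap - dplyr::lag(gdpPercap)) / dplyr::lag(gdpPercap))

# The first row has no previous observation, so its percent is NA and is dropped here
p_growth <- ggplot(filter(india_growth, !is.na(percent)), aes(x = year, y = percent)) +
  geom_col() +
  labs(x = "Year", y = "Growth in GDP per capita vs. previous period (%)")
p_growth
```

Bars above zero indicate growth over the preceding period and bars below zero indicate decline.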