This is my homework report for week 4, produced with R Markdown. In this homework I work with gapminder_unfiltered to answer the questions asked. I have also listed the libraries that I used to answer them. Initial findings:
library(gapminder) # provides the gapminder_unfiltered dataset
library(Hmisc) # used to get a description of the dataset
library(tidyverse) # group of packages used to summarise and visualize data
gapminder_unfiltered is a dataset with information about life expectancy, population, and GDP per capita for different countries from 1950 to 2007. It includes 6 variables and 3313 observations.
The dataset includes the following variables:
1. country: the country name (187 countries).
2. continent: the continent name, one of 6 levels: Asia, Europe, Africa, Americas, FSU, Oceania.
3. year: the year of the observation, ranging from 1950 to 2007.
4. lifeExp: life expectancy at birth. It is a numeric variable.
5. pop: total population of a country. Observations are typically 5 years apart, though some countries have yearly records (up to 58 observations each).
6. gdpPercap: GDP per capita for each country and year from 1950 to 2007.
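Because gapminder_unfiltered keeps every available record, the number of observations can differ from country to country. The following is a minimal sketch for checking this with base R; the object name obs_per_country is only illustrative and not part of the original analysis.

```r
library(gapminder)

gap <- gapminder::gapminder_unfiltered

# Number of rows (one per recorded year) for each country
obs_per_country <- table(gap$country)

# How many countries have each number of observations
print(table(obs_per_country))

# Countries with the most recorded years
print(head(sort(obs_per_country, decreasing = TRUE)))
```

Countries with many more rows than the rest are the ones recorded yearly rather than every 5 years.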
gap <- gapminder::gapminder_unfiltered
names(gap)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
# Get the structure and content of the dataset
str(gap)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3313 obs. of 6 variables:
## $ country : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
# Give summary statistics for all the variables
summary(gap)
## country continent year lifeExp
## Czech Republic: 58 Africa : 637 Min. :1950 Min. :23.60
## Denmark : 58 Americas: 470 1st Qu.:1967 1st Qu.:58.33
## Finland : 58 Asia : 578 Median :1982 Median :69.61
## Iceland : 58 Europe :1302 Mean :1980 Mean :65.24
## Japan : 58 FSU : 139 3rd Qu.:1996 3rd Qu.:73.66
## Netherlands : 58 Oceania : 187 Max. :2007 Max. :82.67
## (Other) :2965
## pop gdpPercap
## Min. :5.941e+04 Min. : 241.2
## 1st Qu.:2.680e+06 1st Qu.: 2505.3
## Median :7.560e+06 Median : 7825.8
## Mean :3.177e+07 Mean : 11313.8
## 3rd Qu.:1.961e+07 3rd Qu.: 17355.8
## Max. :1.319e+09 Max. :113523.1
##
# Use the describe function of the Hmisc package for all variables
describe(gap)
## gap
##
## 6 Variables 3313 Observations
## ---------------------------------------------------------------------------
## country
## n missing distinct
## 3313 0 187
##
## lowest : Afghanistan Albania Algeria Angola Argentina
## highest: Vietnam West Bank and Gaza Yemen, Rep. Zambia Zimbabwe
## ---------------------------------------------------------------------------
## continent
## n missing distinct
## 3313 0 6
##
## lowest : Africa Americas Asia Europe FSU
## highest: Americas Asia Europe FSU Oceania
##
## Africa (637, 0.192), Americas (470, 0.142), Asia (578, 0.174), Europe
## (1302, 0.393), FSU (139, 0.042), Oceania (187, 0.056)
## ---------------------------------------------------------------------------
## year
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 58 0.998 1980 19.52 1952 1957
## .25 .50 .75 .90 .95
## 1967 1982 1996 2002 2007
##
## lowest : 1950 1951 1952 1953 1954, highest: 2003 2004 2005 2006 2007
## ---------------------------------------------------------------------------
## lifeExp
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 2571 1 65.24 12.73 41.22 45.37
## .25 .50 .75 .90 .95
## 58.33 69.61 73.66 77.12 78.68
##
## lowest : 23.599 28.801 30.000 30.015 30.331
## highest: 82.208 82.270 82.360 82.603 82.670
## ---------------------------------------------------------------------------
## pop
## n missing distinct Info Mean Gmd .05
## 3313 0 3312 1 31773251 50168977 235605
## .10 .25 .50 .75 .90 .95
## 436150 2680018 7559776 19610538 56737055 121365965
##
## lowest : 59412 59461 60011 60427 61325
## highest: 1110396331 1164970000 1230075000 1280400000 1318683096
## ---------------------------------------------------------------------------
## gdpPercap
## n missing distinct Info Mean Gmd .05 .10
## 3313 0 3313 1 11314 11542 665.7 887.9
## .25 .50 .75 .90 .95
## 2505.3 7825.8 17355.7 26592.7 31534.9
##
## lowest : 241.1659 277.5519 298.8462 299.8503 312.1884
## highest: 82010.9780 95458.1118 108382.3529 109347.8670 113523.1329
## ---------------------------------------------------------------------------
unique(gap$continent)
## [1] Asia Europe Africa Americas FSU Oceania
## Levels: Africa Americas Asia Europe FSU Oceania
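Question 1: For the year 2007, what is the distribution of GDP per capita across all countries?
# Keep only the 2007 observations
gap_2007 <- filter(gap, year == 2007)
# One horizontal bar per country, with bar length equal to gdpPercap
ggplot(gap_2007, aes(x = country, y = gdpPercap)) +
  geom_bar(stat = "identity") +
  coord_flip()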
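A bar per country shows the individual values, but the shape of the distribution is easier to read from a histogram. The following is a hedged sketch of one way to draw it; the log-scaled x axis and the choice of 30 bins are assumptions made here for readability and are not part of the original analysis.

```r
library(gapminder)
library(ggplot2)

# 2007 observations only
gap_2007 <- subset(gapminder::gapminder_unfiltered, year == 2007)

# Histogram of GDP per capita; the log10 x axis (an assumption of this
# sketch) spreads out the many low-income countries
p_hist <- ggplot(gap_2007, aes(x = gdpPercap)) +
  geom_histogram(bins = 30) +
  scale_x_log10() +
  labs(x = "GDP per capita (log scale)", y = "Number of countries",
       title = "Distribution of gdpPercap across countries, 2007")
print(p_hist)
```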
Question 2: For the year 2007, how do the distributions differ across the different continents?
ggplot(data = gap_2007, mapping = aes(x = continent, y = gdpPercap, fill = continent, color = continent)) +
  geom_boxplot() +
  ggtitle("Continent-wise gdpPercap distribution for year 2007")
The distribution shows that four countries in the Americas, four countries in Africa, and one country in Asia have extreme values of GDP per capita for the year 2007.
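The countries behind those extreme points can be listed by applying the 1.5 × IQR rule that geom_boxplot() uses by default to decide which points fall beyond the whiskers. This is a minimal sketch; the names outliers_2007, lower_fence, and upper_fence are illustrative, and the result is not shown because it was not part of the original output.

```r
library(gapminder)
library(dplyr)

outliers_2007 <- gapminder::gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  # Fences at 1.5 * IQR beyond the first and third quartiles, per continent
  mutate(
    lower_fence = quantile(gdpPercap, 0.25, names = FALSE) - 1.5 * IQR(gdpPercap),
    upper_fence = quantile(gdpPercap, 0.75, names = FALSE) + 1.5 * IQR(gdpPercap)
  ) %>%
  ungroup() %>%
  # Keep only the observations outside the fences
  filter(gdpPercap < lower_fence | gdpPercap > upper_fence) %>%
  select(continent, country, gdpPercap) %>%
  arrange(continent, desc(gdpPercap))

print(outliers_2007)
```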
Question 3: For the year 2007, what are the top 10 countries with the largest GDP per capita?
# Sort by gdpPercap in descending order and keep the first 10 country names
top_cntry <- gap_2007 %>%
  arrange(desc(gdpPercap)) %>%
  select(country) %>%
  head(10)
print(top_cntry)
## # A tibble: 10 × 1
## country
## <fctr>
## 1 Qatar
## 2 Macao, China
## 3 Norway
## 4 Brunei
## 5 Kuwait
## 6 Singapore
## 7 United States
## 8 Ireland
## 9 Hong Kong, China
## 10 Switzerland
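The table above lists only the country names. As a hedged variation, the sketch below keeps the continent and the GDP per capita value next to each name, using dplyr's slice_max(); the object name top10_2007 is illustrative.

```r
library(gapminder)
library(dplyr)

top10_2007 <- gapminder::gapminder_unfiltered %>%
  filter(year == 2007) %>%
  # Ten rows with the largest gdpPercap, ordered from largest to smallest
  slice_max(gdpPercap, n = 10) %>%
  select(country, continent, gdpPercap)

print(top10_2007)
```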
Question 4: Plot the GDP per capita for your country of origin for all years available.
gap_india <- filter(gap, country == "India")
ggplot(gap_india, aes(x = year, y = gdpPercap)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = gap_india$year) +
  ggtitle("gdpPercap for India from 1952 to 2007")
Question 5: What was the percent growth (or decline) in GDP per capita in 2007?
# GDP per capita for India in 2002 and in 2007
india_gdp2002 <- select(filter(gap, country == "India" & year == 2002), gdpPercap)
india_gdp2007 <- select(filter(gap, country == "India" & year == 2007), gdpPercap)
# Percent change from 2002 to 2007
india_gdp_growth_2007 <- (india_gdp2007 - india_gdp2002) * 100 / india_gdp2002
print(india_gdp_growth_2007)
## gdpPercap
## 1 40.38546
# India's GDP per capita grew by approximately 40.4% in 2007 with respect to the year 2002.
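Since this growth is measured over a 5-year gap, it can help to convert it to an equivalent compound annual rate. The sketch below does that arithmetic with the value printed above; the variable names are illustrative, and the assumption of a constant yearly rate between 2002 and 2007 is a simplification introduced here.

```r
# Percent growth from 2002 to 2007, taken from the output above
growth_5yr_pct <- 40.38546
n_years <- 5

# Compound annual growth rate: the constant yearly rate that compounds
# to the same total growth over n_years
annual_growth_pct <- ((1 + growth_5yr_pct / 100)^(1 / n_years) - 1) * 100
print(round(annual_growth_pct, 2))
```

Under that assumption the rate comes out at roughly 7% per year.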
Question 6: What has been the historical growth (or decline) in GDP per capita for your country?
# Percent change in gdpPercap relative to the previous observation (5 years earlier)
gdp_growth_india <- filter(gap, country == "India") %>%
  mutate(percent_growth = (gdpPercap - lag(gdpPercap)) * 100 / lag(gdpPercap),
         label_n = paste0(round(percent_growth, 2), "%"))
# Drop 1952, which has no previous observation to compare against
dt <- filter(gdp_growth_india, year != 1952)
ggplot(dt, aes(x = year, y = percent_growth)) +
  geom_bar(stat = "identity") +
  scale_x_continuous(breaks = dt$year) +
  ggtitle("Percentage growth in GDP per capita from the previous observation (5 years earlier) for India")
# The largest growth in GDP per capita is observed for the year 2007.
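The year with the largest growth can also be read off programmatically rather than from the chart. This is a minimal base R sketch, assuming (as the analysis above does) that India's rows are 5 years apart; the names india and growth are illustrative.

```r
library(gapminder)

# India's observations, ordered by year
india <- subset(gapminder::gapminder_unfiltered, country == "India")
india <- india[order(india$year), ]

# Percent change relative to the previous observation
growth <- diff(india$gdpPercap) / head(india$gdpPercap, -1) * 100
names(growth) <- india$year[-1]

# Year with the largest percent growth, and the growth value itself
print(names(growth)[which.max(growth)])
print(max(growth))
```

The year printed here should agree with the tallest bar in the chart.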