This is the assignment report for week 4. In this assignment, I worked with the “gapminder” data set, which records life expectancy, population, and GDP per capita for 187 countries from 1950 to 2007. By exploring and analyzing the data, I answered the six questions in the assignment. Working on the week-4 homework taught me several approaches to data exploration and the tasks that the “dplyr” package in R can perform.
To complete this assignment and run the code, I used the following packages:
library(gapminder) # Load the gapminder package to get the data set
library(tidyverse) # Provides ggplot2 and dplyr
The data set gapminder_unfiltered has 3313 observations and 6 variables described below:
country: The country name; the data cover 187 countries.
continent: The continent of the country; the factor has 6 levels (Africa, Americas, Asia, Europe, FSU, Oceania).
year: The year of the observation, ranging from 1950 to 2007.
lifeExp: The life expectancy at the time of birth expressed in years.
pop: The population of that particular country in a particular year.
gdpPercap: The per capita GDP of a country in that particular year.
This data set contains data on life expectancy, GDP per capita, and population by country from 1950 to 2007. The data were not filtered on year, so the number of observations differs from year to year and from country to country.
my_gap<-gapminder_unfiltered
ncol(my_gap)
## [1] 6
nrow(my_gap)
## [1] 3313
names(my_gap)
## [1] "country" "continent" "year" "lifeExp" "pop" "gdpPercap"
range(my_gap$year)
## [1] 1950 2007
# Count the missing values in gdpPercap
Number_of_missing_values <- sum(is.na(my_gap$gdpPercap))
Number_of_missing_values
## [1] 0
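The check above covers only gdpPercap. A minimal sketch of the same check across every column, assuming the gapminder package is installed (the name na_per_column is only illustrative):

```r
library(gapminder)

# Count the missing values in every column of the unfiltered data set.
my_gap <- gapminder_unfiltered
na_per_column <- colSums(is.na(my_gap))
na_per_column
```

Any column with a non-zero count would need handling before computing summaries or growth rates.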
str(my_gap)
## Classes 'tbl_df', 'tbl' and 'data.frame': 3313 obs. of 6 variables:
## $ country : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ year : int 1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
## $ lifeExp : num 28.8 30.3 32 34 36.1 ...
## $ pop : int 8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
## $ gdpPercap: num 779 821 853 836 740 ...
summary(my_gap)
## country continent year lifeExp
## Czech Republic: 58 Africa : 637 Min. :1950 Min. :23.60
## Denmark : 58 Americas: 470 1st Qu.:1967 1st Qu.:58.33
## Finland : 58 Asia : 578 Median :1982 Median :69.61
## Iceland : 58 Europe :1302 Mean :1980 Mean :65.24
## Japan : 58 FSU : 139 3rd Qu.:1996 3rd Qu.:73.66
## Netherlands : 58 Oceania : 187 Max. :2007 Max. :82.67
## (Other) :2965
## pop gdpPercap
## Min. :5.941e+04 Min. : 241.2
## 1st Qu.:2.680e+06 1st Qu.: 2505.3
## Median :7.560e+06 Median : 7825.8
## Mean :3.177e+07 Mean : 11313.8
## 3rd Qu.:1.961e+07 3rd Qu.: 17355.8
## Max. :1.319e+09 Max. :113523.1
##
summary(my_gap$gdpPercap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 241.2 2505.0 7826.0 11310.0 17360.0 113500.0
my_gap %>%
  group_by(year) %>%
  summarise(no_year = n())
## # A tibble: 58 × 2
## year no_year
## <int> <int>
## 1 1950 39
## 2 1951 24
## 3 1952 144
## 4 1953 24
## 5 1954 24
## 6 1955 24
## 7 1956 24
## 8 1957 144
## 9 1958 25
## 10 1959 25
## # ... with 48 more rows
my_gap %>%
  group_by(country) %>%
  summarise(no_country = n())
## # A tibble: 187 × 2
## country no_country
## <fctr> <int>
## 1 Afghanistan 12
## 2 Albania 12
## 3 Algeria 12
## 4 Angola 12
## 5 Argentina 12
## 6 Armenia 4
## 7 Aruba 8
## 8 Australia 56
## 9 Austria 57
## 10 Azerbaijan 4
## # ... with 177 more rows
my_gap %>%
  filter(year == 2007) %>%
  summarise(no_of_countries_for_2007 = n())
## # A tibble: 1 × 1
## no_of_countries_for_2007
## <int>
## 1 183
my_gap %>%
  summarise(No_of_countries = n_distinct(country))
## # A tibble: 1 × 1
## No_of_countries
## <int>
## 1 187
my_gap %>%
  summarise(No_of_continents = n_distinct(continent))
## # A tibble: 1 × 1
## No_of_continents
## <int>
## 1 6
my_gap %>%
  summarise(No_of_years = n_distinct(year))
## # A tibble: 1 × 1
## No_of_years
## <int>
## 1 58
head(my_gap)
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fctr> <fctr> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.801 8425333 779.4453
## 2 Afghanistan Asia 1957 30.332 9240934 820.8530
## 3 Afghanistan Asia 1962 31.997 10267083 853.1007
## 4 Afghanistan Asia 1967 34.020 11537966 836.1971
## 5 Afghanistan Asia 1972 36.088 13079460 739.9811
## 6 Afghanistan Asia 1977 38.438 14880372 786.1134
To analyze the data set and answer the questions, I used visualization and data transformation techniques.
For the year 2007, what is the distribution of GDP per capita across all countries?
my_gap %>%
  filter(year == 2007) %>%
  ggplot() +
  geom_histogram(mapping = aes(x = gdpPercap), binwidth = 1000, color = "blue") +
  ggtitle("Distribution of GDP/Capita for 2007") +
  labs(x = "GDP per Capita", y = "No. of countries")
Results: The histogram shows that in 2007 a large number of countries have a GDP per capita below 10,000, and the count falls off as GDP per capita rises. Since GDP per capita is a rough measure of how developed a nation is, this suggests that highly developed economies are in the minority: only 44 countries have a GDP per capita greater than 20,000.
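The counts quoted above can be computed directly rather than read off the histogram. A minimal sketch, assuming the gapminder and dplyr packages are installed; the name gdp_2007_counts is only illustrative, and the two cut-offs are the ones mentioned in the text:

```r
library(gapminder)
library(dplyr)

# Count the 2007 countries on either side of the two thresholds
# discussed in the text (10,000 and 20,000 GDP per capita).
gdp_2007_counts <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  summarise(
    n_countries = n(),
    below_10k   = sum(gdpPercap < 10000),
    above_20k   = sum(gdpPercap > 20000)
  )
gdp_2007_counts
```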
For the year 2007, how do the distributions differ across the different continents?
my_gap %>%
  filter(year == 2007) %>%
  ggplot() +
  geom_histogram(mapping = aes(x = gdpPercap), binwidth = 1000) +
  facet_wrap(~ continent, nrow = 6) +
  ggtitle("Distribution of GDP/Capita for Different Continents in 2007") +
  labs(x = "GDP per Capita", y = "No. of countries")
Results: The plot shows that in 2007 the continents made up mostly of developing or underdeveloped countries, such as Africa and Asia, have most of their countries concentrated at low GDP per capita. In contrast, the continents with more developed nations show GDP per capita spread more evenly across the range.
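The visual comparison can be backed by a numeric summary per continent. The sketch below is one possible way to do it, assuming gapminder and dplyr are installed; the choice of median and interquartile range, and the name continent_summary, are my own for illustration:

```r
library(gapminder)
library(dplyr)

# Summarise the 2007 GDP per capita within each continent.
# The median describes the typical level, the IQR the spread.
continent_summary <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),
    median_gdp  = median(gdpPercap),
    iqr_gdp     = IQR(gdpPercap)
  ) %>%
  arrange(median_gdp)
continent_summary
```

A low median with a small IQR would correspond to a histogram panel concentrated at the left, while a large IQR would correspond to a flatter, more spread-out panel.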
For the year 2007, what are the top 10 countries with the largest GDP per capita?
my_gap %>%
  filter(year == 2007) %>%
  filter(rank(desc(gdpPercap)) <= 10) %>%
  arrange(desc(gdpPercap)) %>%
  select(country, gdpPercap)
## # A tibble: 10 × 2
## country gdpPercap
## <fctr> <dbl>
## 1 Qatar 82010.98
## 2 Macao, China 54589.82
## 3 Norway 49357.19
## 4 Brunei 48014.59
## 5 Kuwait 47306.99
## 6 Singapore 47143.18
## 7 United States 42951.65
## 8 Ireland 40676.00
## 9 Hong Kong, China 39724.98
## 10 Switzerland 37506.42
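The same ranking can be written more compactly. A sketch assuming dplyr 1.0.0 or later, where slice_max() is available; it keeps the rows with the largest values and returns them in descending order, so the separate rank() filter and arrange() step are not needed:

```r
library(gapminder)
library(dplyr)

# Keep the ten 2007 rows with the largest GDP per capita.
top10_2007 <- gapminder_unfiltered %>%
  filter(year == 2007) %>%
  slice_max(gdpPercap, n = 10) %>%
  select(country, gdpPercap)
top10_2007
```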
Plot the GDP per capita for your country of origin for all years available.
my_gap %>%
  filter(country == "India") %>%
  ggplot() +
  geom_smooth(mapping = aes(x = year, y = gdpPercap), se = FALSE) +
  ggtitle("Trend in GDP per Capita for INDIA") +
  labs(x = "Year", y = "GDP per Capita")
Results: The plot shows that the GDP per capita of India rose continuously from the 1950s to 2007.
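Because geom_smooth() draws a fitted smooth curve rather than the raw values, a small dip between two observations could be hidden. The following sketch plots the observed values themselves and checks the "continuously rising" statement numerically; it assumes gapminder, dplyr, and ggplot2 are installed, and the names india and india_plot are only illustrative:

```r
library(gapminder)
library(dplyr)
library(ggplot2)

# Observed GDP per capita for India, in year order.
india <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year)

# Plot the actual observations instead of a smoothed curve.
india_plot <- ggplot(india, aes(x = year, y = gdpPercap)) +
  geom_line() +
  geom_point() +
  labs(title = "Observed GDP per Capita for INDIA",
       x = "Year", y = "GDP per Capita")
india_plot

# TRUE only if every observation is higher than the previous one.
all(diff(india$gdpPercap) > 0)
```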
What was the percent growth (or decline) in GDP per capita in 2007?
my_gap %>%
  group_by(country) %>%
  # lag() returns the previous available observation for the same country
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  filter(year == 2007) %>%
  select(country, percent_growth)
## Source: local data frame [183 x 2]
## Groups: country [183]
##
## country percent_growth
## <fctr> <dbl>
## 1 Afghanistan 34.104124
## 2 Albania 28.947795
## 3 Algeria 17.687593
## 4 Angola 72.979959
## 5 Argentina 45.259167
## 6 Armenia 83.580452
## 7 Aruba 2.882002
## 8 Australia 7.280281
## 9 Austria 5.917945
## 10 Azerbaijan 151.982929
## # ... with 173 more rows
# To get the percent growth for a single country, here my home country, India:
my_gap %>%
  group_by(country) %>%
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  filter(year == 2007, country == "India") %>%
  select(country, percent_growth)
## Source: local data frame [1 x 2]
## Groups: country [1]
##
## country percent_growth
## <fctr> <dbl>
## 1 India 40.38546
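Note that lag() picks up the previous available observation for each country, and the number of observations per country differs (Afghanistan has 12 rows, Australia has 56). The percent growth for 2007 may therefore span one year for some countries and several years for others. One possible way to make the figures comparable is to annualize them, assuming a constant compound growth rate between the two observations. The column names years_elapsed and annualized_growth below are hypothetical:

```r
library(gapminder)
library(dplyr)

growth_2007 <- gapminder_unfiltered %>%
  arrange(country, year) %>%
  group_by(country) %>%
  mutate(
    # Number of years between this observation and the previous one
    years_elapsed     = year - lag(year),
    # Growth over the whole gap, as computed in the report
    percent_growth    = (gdpPercap / lag(gdpPercap) - 1) * 100,
    # Equivalent constant yearly rate over that gap
    annualized_growth = ((gdpPercap / lag(gdpPercap))^(1 / years_elapsed) - 1) * 100
  ) %>%
  filter(year == 2007) %>%
  ungroup() %>%
  select(country, years_elapsed, percent_growth, annualized_growth)
growth_2007
```

When years_elapsed is 1, the two growth columns are identical; for longer gaps the annualized figure is smaller in magnitude than the raw percent growth.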
What has been the historical growth (or decline) in GDP per capita for your country?
my_gap %>%
  filter(country == "India") %>%
  mutate(growth = gdpPercap - lag(gdpPercap)) %>%
  select(year, growth) %>%
  ggplot() +
  geom_line(mapping = aes(x = year, y = growth), color = "red") +
  ggtitle("Absolute Growth in GDP per Capita of INDIA") +
  labs(x = "Year", y = "Growth in GDP/Capita")
We can see here that the GDP per capita was continuously rising. This graph shows the amount by which the GDP per capita rose between consecutive observations, and the rise becomes much more rapid after 1980.
my_gap %>%
  filter(country == "India") %>%
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  select(year, percent_growth) %>%
  ggplot() +
  geom_line(mapping = aes(x = year, y = percent_growth), color = "orange") +
  ggtitle("Percent Growth in GDP per Capita of INDIA") +
  labs(x = "Year", y = "Percent Growth in GDP/Capita")
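The two plots show growth between consecutive observations. As a complement, the overall change across the whole period can be reduced to a single figure. A minimal sketch, assuming gapminder and dplyr are installed; the name india_overall and its columns are only illustrative:

```r
library(gapminder)
library(dplyr)

# Compare the first and last available observations for India.
india_overall <- gapminder_unfiltered %>%
  filter(country == "India") %>%
  arrange(year) %>%
  summarise(
    first_year    = first(year),
    last_year     = last(year),
    first_gdp     = first(gdpPercap),
    last_gdp      = last(gdpPercap),
    # Total percent change from the first to the last observation
    total_percent = (last_gdp / first_gdp - 1) * 100
  )
india_overall
```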