Synopsis

This is the assignment report for week 4. In this assignment, I worked with the “gapminder” data set, which records life expectancy, population, and GDP per capita for 187 countries from 1950 to 2007. By exploring and analyzing the data, I answered the six questions in the assignment. Working on the week 4 homework, I learned several approaches to data exploration and the tasks that can be done with the “dplyr” package in R.

Packages Required

To complete this assignment and run the code, I used the following packages:

library(gapminder) # Load the gapminder package to get the data set
library(tidyverse) # Provides ggplot2 and dplyr

Source Code

The data set gapminder_unfiltered has 3313 observations and 6 variables described below:

country: The country name; the data covers 187 countries.

continent: The continent of the country; the data set has 6 continent levels.

year: The year of the observation, ranging from 1950 to 2007.

lifeExp: The life expectancy at birth, in years.

pop: The population of the country in that year.

gdpPercap: The GDP per capita of the country in that year.

Data Description

This data set contains data on life expectancy, GDP per capita, and population by country from 1950 to 2007. The data was not filtered on year, so the number of observations varies by country and by year.

my_gap<-gapminder_unfiltered
ncol(my_gap)
## [1] 6
nrow(my_gap)
## [1] 3313
names(my_gap)
## [1] "country"   "continent" "year"      "lifeExp"   "pop"       "gdpPercap"
range(my_gap$year)
## [1] 1950 2007
# Count the missing values in gdpPercap
Number_of_missing_values <- sum(is.na(my_gap$gdpPercap))
Number_of_missing_values
## [1] 0
str(my_gap)
## Classes 'tbl_df', 'tbl' and 'data.frame':    3313 obs. of  6 variables:
##  $ country  : Factor w/ 187 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 6 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
summary(my_gap)
##            country        continent         year         lifeExp     
##  Czech Republic:  58   Africa  : 637   Min.   :1950   Min.   :23.60  
##  Denmark       :  58   Americas: 470   1st Qu.:1967   1st Qu.:58.33  
##  Finland       :  58   Asia    : 578   Median :1982   Median :69.61  
##  Iceland       :  58   Europe  :1302   Mean   :1980   Mean   :65.24  
##  Japan         :  58   FSU     : 139   3rd Qu.:1996   3rd Qu.:73.66  
##  Netherlands   :  58   Oceania : 187   Max.   :2007   Max.   :82.67  
##  (Other)       :2965                                                 
##       pop              gdpPercap       
##  Min.   :5.941e+04   Min.   :   241.2  
##  1st Qu.:2.680e+06   1st Qu.:  2505.3  
##  Median :7.560e+06   Median :  7825.8  
##  Mean   :3.177e+07   Mean   : 11313.8  
##  3rd Qu.:1.961e+07   3rd Qu.: 17355.8  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
summary(my_gap$gdpPercap)
##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##    241.2   2505.0   7826.0  11310.0  17360.0 113500.0
my_gap %>% 
  group_by(year) %>% 
  summarise(no_year=n())
## # A tibble: 58 × 2
##     year no_year
##    <int>   <int>
## 1   1950      39
## 2   1951      24
## 3   1952     144
## 4   1953      24
## 5   1954      24
## 6   1955      24
## 7   1956      24
## 8   1957     144
## 9   1958      25
## 10  1959      25
## # ... with 48 more rows
my_gap %>% 
  group_by(country) %>% 
  summarise(no_country=n())
## # A tibble: 187 × 2
##        country no_country
##         <fctr>      <int>
## 1  Afghanistan         12
## 2      Albania         12
## 3      Algeria         12
## 4       Angola         12
## 5    Argentina         12
## 6      Armenia          4
## 7        Aruba          8
## 8    Australia         56
## 9      Austria         57
## 10  Azerbaijan          4
## # ... with 177 more rows
my_gap %>% 
  filter(year==2007) %>% 
  summarise(no_of_countries_for_2007=n())
## # A tibble: 1 × 1
##   no_of_countries_for_2007
##                      <int>
## 1                      183
my_gap %>% 
  summarise(No_of_countries=n_distinct(country))
## # A tibble: 1 × 1
##   No_of_countries
##             <int>
## 1             187
my_gap %>% 
  summarise(No_of_continents=n_distinct(continent))
## # A tibble: 1 × 1
##   No_of_continents
##              <int>
## 1                6
my_gap %>% 
  summarise(No_of_years=n_distinct(year))
## # A tibble: 1 × 1
##   No_of_years
##         <int>
## 1          58
head(my_gap)
## # A tibble: 6 × 6
##       country continent  year lifeExp      pop gdpPercap
##        <fctr>    <fctr> <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan      Asia  1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia  1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia  1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia  1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia  1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia  1977  38.438 14880372  786.1134
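
The per-country counts above range from 4 to 58 observations, which suggests that the spacing between observations is not the same for every country. A minimal sketch of how this could be checked is given below. It uses a small made-up data frame (`toy`, with hypothetical countries "X" and "Y") rather than the real data, so the numbers are illustrative only.

```r
library(dplyr)

# Toy stand-in for my_gap: "X" is observed every five years, "Y" every year.
# The values are invented for illustration.
toy <- data.frame(
  country = c(rep("X", 3), rep("Y", 3)),
  year    = c(1997L, 2002L, 2007L, 2005L, 2006L, 2007L)
)

spacing <- toy %>%
  arrange(country, year) %>%
  group_by(country) %>%
  summarise(
    n_obs       = n(),                # number of observations
    first_year  = min(year),          # first year observed
    last_year   = max(year),          # last year observed
    typical_gap = median(diff(year))  # typical number of years between observations
  )

spacing
```

Replacing `toy` with `my_gap` should give the same summary for the real data; this matters later because `lag()` compares each row with the previous available observation, whatever its year.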

Exploratory Data Analysis

To analyze the data set and answer the questions, I used visualization and data transformation techniques.

Q1.

For the year 2007, what is the distribution of GDP per capita across all countries?

my_gap %>%
  filter(year==2007) %>%
  ggplot +
  geom_histogram(mapping = aes(x = gdpPercap), binwidth = 1000, color = "blue")+
  ggtitle("Distribution of GDP/Capita for 2007")+
  labs(x="GDP per Capita", y="No. of countries")

Results: The plot shows that in 2007 a large number of countries had a GDP per capita below 10,000. Fewer countries had a high GDP per capita, which is commonly read as a rough measure of how developed a nation is. Only 44 countries had a GDP per capita greater than 20,000.
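
Counts like the ones quoted above can be computed directly rather than read off the histogram. The following is a minimal sketch under stated assumptions: `toy_2007` is a made-up stand-in for the 2007 rows of `my_gap`, and the name `threshold` is introduced only for this example.

```r
library(dplyr)

# Toy stand-in for the 2007 slice of my_gap (invented values).
toy_2007 <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  year      = 2007L,
  gdpPercap = c(950, 4200, 19500, 23000, 47000)
)

threshold <- 20000  # cut-off used in the text

counts <- toy_2007 %>%
  filter(year == 2007) %>%
  summarise(
    n_countries = n(),                         # countries observed in 2007
    n_above     = sum(gdpPercap > threshold),  # countries above the cut-off
    n_below_10k = sum(gdpPercap < 10000)       # countries below 10,000
  )

counts
```

Running the same pipeline on `my_gap` instead of `toy_2007` would give the counts for the real data.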

Q2.

For the year 2007, how do the distributions differ across the different continents?

my_gap %>%
  filter(year==2007) %>%
  ggplot +
  geom_histogram(mapping = aes(x = gdpPercap), binwidth = 1000) +
  facet_wrap(~ continent, nrow = 6)+
  ggtitle("Distribution of GDP/Capita for Different Continents in 2007")+
  labs(x="GDP per Capita", y="No. of countries")

Results: The plot shows that in 2007 the continents made up largely of developing or underdeveloped countries, such as Africa and Asia, had most of their countries concentrated at lower GDP per capita. In contrast, the continents with more developed nations had a more evenly spread distribution of GDP per capita.
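
A numeric summary per continent can complement the faceted histograms. The sketch below is illustrative only: `toy_2007` is a made-up data frame with two continents, and the choice of median, minimum and maximum as summary statistics is my own assumption, not part of the assignment.

```r
library(dplyr)

# Toy stand-in for the 2007 slice of my_gap (invented values).
toy_2007 <- data.frame(
  continent = c("Africa", "Africa", "Africa", "Europe", "Europe", "Europe"),
  gdpPercap = c(800, 1500, 5000, 20000, 30000, 40000)
)

by_continent <- toy_2007 %>%
  group_by(continent) %>%
  summarise(
    n_countries = n(),
    median_gdp  = median(gdpPercap),  # centre of the distribution
    min_gdp     = min(gdpPercap),
    max_gdp     = max(gdpPercap)      # min and max give the spread
  ) %>%
  arrange(desc(median_gdp))

by_continent
```

The median is used here instead of the mean because GDP per capita is strongly right-skewed, so a few very rich countries would otherwise dominate the summary.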

Q3.

For the year 2007, what are the top 10 countries with the largest GDP per capita?

my_gap %>%
  filter(year==2007) %>%
  filter(rank(desc(gdpPercap)) <= 10) %>%
  arrange(desc(gdpPercap)) %>%
  select(country,gdpPercap)
## # A tibble: 10 × 2
##             country gdpPercap
##              <fctr>     <dbl>
## 1             Qatar  82010.98
## 2      Macao, China  54589.82
## 3            Norway  49357.19
## 4            Brunei  48014.59
## 5            Kuwait  47306.99
## 6         Singapore  47143.18
## 7     United States  42951.65
## 8           Ireland  40676.00
## 9  Hong Kong, China  39724.98
## 10      Switzerland  37506.42
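
As an alternative to filtering on `rank(desc(gdpPercap))`, the same kind of selection can be sketched with `slice_max()`. This assumes dplyr 1.0.0 or later, which may be newer than the version used for this report, and it uses a made-up data frame `toy_2007` instead of the real data. The two approaches may also treat ties differently.

```r
library(dplyr)

# Toy stand-in for the 2007 slice of my_gap (invented values).
toy_2007 <- data.frame(
  country   = c("A", "B", "C", "D", "E"),
  gdpPercap = c(950, 47000, 19500, 23000, 4200)
)

# Keep the 3 rows with the largest gdpPercap; the result is sorted
# from largest to smallest, so no separate arrange() is needed.
top3 <- toy_2007 %>%
  slice_max(gdpPercap, n = 3) %>%
  select(country, gdpPercap)

top3
```

For the question above, `n = 3` would become `n = 10` and `toy_2007` would be replaced by `my_gap %>% filter(year == 2007)`.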

Q4.

Plot the GDP per capita for your country of origin for all years available.

my_gap %>%
  filter(country=="India") %>%
  ggplot +
  geom_smooth(mapping = aes(x = year , y = gdpPercap ), se=FALSE)+
  ggtitle("Distribution of GDP per Capita for INDIA")+
  labs(x="Year", y="GDP per Capita")

Results: The smoothed curve shows that India's GDP per capita rose continuously from the 1950s to 2007.

Q5.

What was the percent growth (or decline) in GDP per capita in 2007?

my_gap %>%
  group_by(country) %>%
  # Growth relative to each country's previous available observation
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  filter(year==2007) %>%
  select(country , percent_growth)
## Source: local data frame [183 x 2]
## Groups: country [183]
## 
##        country percent_growth
##         <fctr>          <dbl>
## 1  Afghanistan      34.104124
## 2      Albania      28.947795
## 3      Algeria      17.687593
## 4       Angola      72.979959
## 5    Argentina      45.259167
## 6      Armenia      83.580452
## 7        Aruba       2.882002
## 8    Australia       7.280281
## 9      Austria       5.917945
## 10  Azerbaijan     151.982929
## # ... with 173 more rows
# If we want the percent growth for just one country (here my country of origin, India):
my_gap %>%
  group_by(country) %>%
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  filter(year==2007, country=="India") %>%
  filter(year==2007, country=="India") %>%
  select(country , percent_growth)
## Source: local data frame [1 x 2]
## Groups: country [1]
## 
##   country percent_growth
##    <fctr>          <dbl>
## 1   India       40.38546
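
Because `lag()` takes the previous available row for each country, the 2007 figure may cover one year for some countries and several years for others. The sketch below shows one way to make that visible and to put the figures on a comparable per-year basis. It uses a made-up data frame `toy`, and the column names `year_gap` and `annual_growth` are introduced only for this example; the compound annual rate is my own addition, not part of the assignment.

```r
library(dplyr)

# Toy stand-in for my_gap: "X" was last observed in 2002, "Y" in 2006.
toy <- data.frame(
  country   = c("X", "X", "Y", "Y"),
  year      = c(2002L, 2007L, 2006L, 2007L),
  gdpPercap = c(1000, 1400, 2000, 2100)
)

growth <- toy %>%
  arrange(country, year) %>%          # make sure lag() sees years in order
  group_by(country) %>%
  mutate(
    year_gap       = year - lag(year),  # years since the previous observation
    percent_growth = (gdpPercap - lag(gdpPercap)) / lag(gdpPercap) * 100,
    # compound annual growth rate over the gap, in percent
    annual_growth  = ((gdpPercap / lag(gdpPercap))^(1 / year_gap) - 1) * 100
  ) %>%
  ungroup() %>%
  filter(year == 2007) %>%
  select(country, year_gap, percent_growth, annual_growth)

growth
```

In this toy example "X" grows 40 percent over five years, which is a little under 7 percent per year, while "Y" grows 5 percent over a single year, so the raw percentages are not directly comparable.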

Q6.

What has been the historical growth (or decline) in GDP per capita for your country?

my_gap %>%
  filter(country=="India") %>%
  mutate(growth = gdpPercap - lag(gdpPercap)) %>%
  select(year , growth) %>%
  ggplot +
  geom_line(mapping = aes(x = year , y = growth), color = "red")+
  ggtitle("Absolute Growth in GDP per Capita of INDIA")+
  labs(x="Year", y="Growth in GDP/Capita")

We can see here that the GDP per capita rose continuously. This graph shows the amount by which the GDP per capita rose from one observation to the next, and we can observe that the rise became much more rapid after 1980.

my_gap %>%
  filter(country=="India") %>%
  mutate(percent_growth = ((gdpPercap - lag(gdpPercap)) / lag(gdpPercap)) * 100) %>%
  select(year , percent_growth) %>%
  ggplot +
  geom_line(mapping = aes(x = year , y = percent_growth), color = "orange")+
  ggtitle("Percent Growth in GDP per Capita of INDIA")+
  labs(x="Year", y="Percent Growth in GDP/Capita")
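
The two plots above show growth from one observation to the next. Historical growth can also be expressed relative to the first year on record. The following is a minimal sketch, assuming a made-up data frame `toy_india` in place of the real India rows; the column `growth_since_start` is a name introduced only for this example.

```r
library(dplyr)

# Toy stand-in for the India rows of my_gap (invented values).
toy_india <- data.frame(
  year      = c(1952L, 1957L, 1962L),
  gdpPercap = c(500, 600, 1000)
)

cumulative <- toy_india %>%
  arrange(year) %>%
  mutate(
    # change since the previous observation; NA for the first row
    growth = gdpPercap - lag(gdpPercap),
    # percent change relative to the first observed year
    growth_since_start = (gdpPercap / first(gdpPercap) - 1) * 100
  )

cumulative
```

The first row has no previous observation, so `growth` is `NA` there; ggplot2 drops such rows with a warning when plotting, which is expected.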