Project Name: Demographic Data Analysis

Project Description:

You are employed as a data scientist by the World Bank and are working on a project to analyse the world's demographic trends. You are required to produce a scatter plot illustrating birth rate and internet usage statistics by country. The scatter plot needs to be categorised by each country's income group.

Step 1: Loading the data into RStudio.

Load the data into R

stats <- read.csv(file.choose())

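Step 2: Working Directory

Another way to load the data is by file name. A bare file name is looked up in the current working directory.

stats <- read.csv("DemographicData.csv")

Checking the working directory

getwd() # Returns the current working directory, where R looks for the file.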
## [1] "D:/KIAMS/SEM/SEM-V/R-For-Manager"
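
How the working directory affects a relative file name can be sketched as follows. This is only an illustration: the file name "demo.csv" and its two rows are hypothetical and are written to a temporary folder so the example runs on any machine.

```r
# Write a tiny CSV to a temporary folder so the example runs anywhere.
# "demo.csv" and its contents are for illustration only.
tmp_dir  <- tempdir()
csv_path <- file.path(tmp_dir, "demo.csv")
write.csv(
  data.frame(Country.Name = c("Aruba", "Afghanistan"),
             Birth.rate   = c(10.244, 35.253)),
  csv_path, row.names = FALSE
)

old_wd <- setwd(tmp_dir)        # setwd() changes the directory and returns the previous one
demo   <- read.csv("demo.csv")  # a relative name is resolved against getwd()
setwd(old_wd)                   # restore the original working directory

nrow(demo)                      # number of rows read from the file
```

Passing a full path, as in read.csv(csv_path), avoids changing the working directory at all.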

Step 3: Exploring Data Sets

To understand the data and its data types, use View() to open the data in a spreadsheet-style viewer and str() to display its structure.

View(stats) 
str(stats)
## 'data.frame':    195 obs. of  5 variables:
##  $ Country.Name  : chr  "Aruba" "Afghanistan" "Angola" "Albania" ...
##  $ Country.Code  : chr  "ABW" "AFG" "AGO" "ALB" ...
##  $ Birth.rate    : num  10.2 35.3 46 12.9 11 ...
##  $ Internet.users: num  78.9 5.9 19.1 57.2 88 ...
##  $ Income.Group  : chr  "High income" "Low income" "Upper middle income" "Upper middle income" ...
nrow(stats) # To check the number of rows.
## [1] 195
ncol(stats) # To check the number of columns.
## [1] 5
head(stats) # To check the first six data rows.
##           Country.Name Country.Code Birth.rate Internet.users
## 1                Aruba          ABW     10.244           78.9
## 2          Afghanistan          AFG     35.253            5.9
## 3               Angola          AGO     45.985           19.1
## 4              Albania          ALB     12.877           57.2
## 5 United Arab Emirates          ARE     11.044           88.0
## 6            Argentina          ARG     17.716           59.9
##          Income.Group
## 1         High income
## 2          Low income
## 3 Upper middle income
## 4 Upper middle income
## 5         High income
## 6         High income
tail(stats) # To check the bottom six data rows.
##         Country.Name Country.Code Birth.rate Internet.users        Income.Group
## 190            Samoa          WSM     26.172           15.3 Lower middle income
## 191      Yemen, Rep.          YEM     32.947           20.0 Lower middle income
## 192     South Africa          ZAF     20.850           46.5 Upper middle income
## 193 Congo, Dem. Rep.          COD     42.394            2.2          Low income
## 194           Zambia          ZMB     40.471           15.4 Lower middle income
## 195         Zimbabwe          ZWE     35.715           18.5          Low income

To see the top and bottom 10 rows

head(stats,10) # To check top 10 rows in the data.
##            Country.Name Country.Code Birth.rate Internet.users
## 1                 Aruba          ABW     10.244        78.9000
## 2           Afghanistan          AFG     35.253         5.9000
## 3                Angola          AGO     45.985        19.1000
## 4               Albania          ALB     12.877        57.2000
## 5  United Arab Emirates          ARE     11.044        88.0000
## 6             Argentina          ARG     17.716        59.9000
## 7               Armenia          ARM     13.308        41.9000
## 8   Antigua and Barbuda          ATG     16.447        63.4000
## 9             Australia          AUS     13.200        83.0000
## 10              Austria          AUT      9.400        80.6188
##           Income.Group
## 1          High income
## 2           Low income
## 3  Upper middle income
## 4  Upper middle income
## 5          High income
## 6          High income
## 7  Lower middle income
## 8          High income
## 9          High income
## 10         High income
tail(stats,n=10) # To check bottom 10 rows in the data.
##              Country.Name Country.Code Birth.rate Internet.users
## 186 Virgin Islands (U.S.)          VIR     10.700           45.3
## 187               Vietnam          VNM     15.537           43.9
## 188               Vanuatu          VUT     26.739           11.3
## 189    West Bank and Gaza          PSE     30.394           46.6
## 190                 Samoa          WSM     26.172           15.3
## 191           Yemen, Rep.          YEM     32.947           20.0
## 192          South Africa          ZAF     20.850           46.5
## 193      Congo, Dem. Rep.          COD     42.394            2.2
## 194                Zambia          ZMB     40.471           15.4
## 195              Zimbabwe          ZWE     35.715           18.5
##            Income.Group
## 186         High income
## 187 Lower middle income
## 188 Lower middle income
## 189 Lower middle income
## 190 Lower middle income
## 191 Lower middle income
## 192 Upper middle income
## 193          Low income
## 194 Lower middle income
## 195          Low income
summary(stats)
##  Country.Name       Country.Code         Birth.rate    Internet.users 
##  Length:195         Length:195         Min.   : 7.90   Min.   : 0.90  
##  Class :character   Class :character   1st Qu.:12.12   1st Qu.:14.52  
##  Mode  :character   Mode  :character   Median :19.68   Median :41.00  
##                                        Mean   :21.47   Mean   :42.08  
##                                        3rd Qu.:29.76   3rd Qu.:66.22  
##                                        Max.   :49.66   Max.   :96.55  
##  Income.Group      
##  Length:195        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
stats$Internet.users # Returns the Internet.users column of stats as a vector.
##   [1] 78.90000  5.90000 19.10000 57.20000 88.00000 59.90000 41.90000 63.40000
##   [9] 83.00000 80.61880 58.70000  1.30000 82.17020  4.90000  9.10000  6.63000
##  [17] 53.06150 90.00004 72.00000 57.79000 54.17000 33.60000 95.30000 36.94000
##  [25] 51.04000 73.00000 64.50000 29.90000 15.00000  3.50000 85.80000 86.34000
##  [33] 66.50000 45.80000  8.40000  6.40000  6.60000 51.70000  6.50000 37.50000
##  [41] 45.96000 27.93000 74.10000 65.45480 74.11040 84.17000  9.50000 94.62970
##  [49] 45.90000 16.50000 40.35368 29.40000  0.90000 71.63500 79.40000  1.90000
##  [57] 91.51440 37.10000 81.91980 27.80000  9.20000 89.84410 43.30000 12.30000
##  [65]  1.60000 14.00000  3.10000 16.40000 59.86630 35.00000 65.80000 19.70000
##  [73] 65.40000 35.00000 74.20000 17.80000 66.74760 10.60000 72.64390 14.94000
##  [81] 15.10000 78.24770 29.95000  9.20000 96.54680 70.80000 58.45930 37.10000
##  [89] 41.00000 89.71000 54.00000 39.00000 23.00000  6.80000 11.50000 84.77000
##  [97] 75.46000 12.50000 70.50000  3.20000 16.50000 46.20000 93.80000 21.90000
## [105]  5.00000 68.45290 93.77650 75.23440 65.80000 56.00000 45.00000  3.00000
## [113] 44.10000 43.46000 65.24000  3.50000 68.91380  1.60000 60.31000 20.00000
## [121]  5.40000  6.20000 39.00000  5.05000 66.97000 13.90000 66.00000  1.70000
## [129] 38.00000 15.50000 93.95640 95.05340 13.30000 82.78000 66.45000 10.90000
## [137] 44.03000 39.20000 37.00000  6.50000 62.84920 73.90000 62.09560 36.90000
## [145] 56.80000 85.30000 49.76450 67.97000  9.00000 60.50000 22.70000 13.10000
## [153] 81.00000  8.00000  1.70000 23.10930  1.50000 51.50000 14.10000 23.00000
## [161] 37.40000 77.88260 72.67560 94.78360 24.70000 50.40000 26.20000  2.30000
## [169]  4.50000 28.94000 16.00000  9.60000  1.10000 35.00000 63.80000 43.80000
## [177] 46.25000  4.40000 16.20000 41.00000 57.69000 84.20000 38.20000 52.00000
## [185] 54.90000 45.30000 43.90000 11.30000 46.60000 15.30000 20.00000 46.50000
## [193]  2.20000 15.40000 18.50000
stats$Income.Group <- as.factor(stats$Income.Group) # Converts Income.Group from character to factor.
stats$Income.Group # Returns the Income.Group column of stats.
levels(stats$Income.Group) # Lists the categories (levels) by which the data is segregated.
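
What the factor conversion gives you can be seen on a small vector. The sketch below uses only the income groups of the ten countries printed by head(stats, 10) above; the variable names income and income_f are made up for the example.

```r
# Income groups of the first ten countries shown by head(stats, 10).
income <- c("High income", "Low income", "Upper middle income",
            "Upper middle income", "High income", "High income",
            "Lower middle income", "High income", "High income", "High income")

income_f <- as.factor(income)   # character vector -> factor
levels(income_f)                # the distinct categories
counts <- table(income_f)       # number of countries per category
counts
```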

Subsetting

stats[1:10,] # To get the first 10 rows of all columns.
##            Country.Name Country.Code Birth.rate Internet.users
## 1                 Aruba          ABW     10.244        78.9000
## 2           Afghanistan          AFG     35.253         5.9000
## 3                Angola          AGO     45.985        19.1000
## 4               Albania          ALB     12.877        57.2000
## 5  United Arab Emirates          ARE     11.044        88.0000
## 6             Argentina          ARG     17.716        59.9000
## 7               Armenia          ARM     13.308        41.9000
## 8   Antigua and Barbuda          ATG     16.447        63.4000
## 9             Australia          AUS     13.200        83.0000
## 10              Austria          AUT      9.400        80.6188
##           Income.Group
## 1          High income
## 2           Low income
## 3  Upper middle income
## 4  Upper middle income
## 5          High income
## 6          High income
## 7  Lower middle income
## 8          High income
## 9          High income
## 10         High income
stats[3:10,] # To get the data from 3rd row to 10th row.
##            Country.Name Country.Code Birth.rate Internet.users
## 3                Angola          AGO     45.985        19.1000
## 4               Albania          ALB     12.877        57.2000
## 5  United Arab Emirates          ARE     11.044        88.0000
## 6             Argentina          ARG     17.716        59.9000
## 7               Armenia          ARM     13.308        41.9000
## 8   Antigua and Barbuda          ATG     16.447        63.4000
## 9             Australia          AUS     13.200        83.0000
## 10              Austria          AUT      9.400        80.6188
##           Income.Group
## 3  Upper middle income
## 4  Upper middle income
## 5          High income
## 6          High income
## 7  Lower middle income
## 8          High income
## 9          High income
## 10         High income
stats[c(4,100),] # To get the data of 4th and 100th rows and all columns.
##     Country.Name Country.Code Birth.rate Internet.users        Income.Group
## 4        Albania          ALB     12.877           57.2 Upper middle income
## 100      Liberia          LBR     35.521            3.2          Low income
stats[c(4,100),2] # To get the data of 4th and 100th row but only the second variable stored in the data.
## [1] "ALB" "LBR"
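
Selecting a single column returns a plain vector rather than a data frame, as the output above shows. A minimal sketch of how to keep the data frame shape, using a four-row frame built from the country names and codes printed earlier (the names demo, codes_vec, codes_df and by_name are hypothetical):

```r
demo <- data.frame(
  Country.Name = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code = c("ABW", "AFG", "AGO", "ALB"),
  stringsAsFactors = FALSE
)

codes_vec <- demo[c(1, 4), 2]                # one column: simplified to a vector
codes_df  <- demo[c(1, 4), 2, drop = FALSE]  # drop = FALSE keeps a data frame
by_name   <- demo[c(1, 4), "Country.Code"]   # columns can also be selected by name

is.data.frame(codes_vec)
is.data.frame(codes_df)
```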

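Create a new variable by assigning the sum of Birth.rate and Internet.users to a new column

stats$Mycale <- stats$Birth.rate + stats$Internet.users 

stats$MyCale # R is case-sensitive: the column was created as Mycale, so MyCale does not exist.
## NULL

Assigning NULL to a column removes it from the data frame: {stats$Mycale <- NULL}. This was not run here, so Mycale still appears in the outputs below.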
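
Creating, reading and removing a column can be sketched on three rows taken from the output above. The frame name demo and the helper variables are assumptions for the example only.

```r
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985),
  Internet.users = c(78.9, 5.9, 19.1)
)

demo$Mycale <- demo$Birth.rate + demo$Internet.users  # element-wise sum, one value per row
sums        <- demo$Mycale           # exact name: returns the three sums
wrong_case  <- is.null(demo$MyCale)  # different capitalization: no such column

demo$Mycale <- NULL                  # assigning NULL deletes the column
names(demo)                          # only the two original columns remain
```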

To check, row by row, whether Internet.users is less than 2

stats$Internet.users < 2
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [61] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [157]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE

Store the logical vector in a new variable and use it to select the countries where Internet.users is less than 2

filter <- stats$Internet.users < 2

filter
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE
##  [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
##  [61] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE
## [157]  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE
stats[filter,]
##     Country.Name Country.Code Birth.rate Internet.users        Income.Group
## 12       Burundi          BDI     44.151            1.3          Low income
## 53       Eritrea          ERI     34.800            0.9          Low income
## 56      Ethiopia          ETH     32.925            1.9          Low income
## 65        Guinea          GIN     37.337            1.6          Low income
## 118      Myanmar          MMR     18.119            1.6 Lower middle income
## 128        Niger          NER     49.661            1.7          Low income
## 155 Sierra Leone          SLE     36.729            1.7          Low income
## 157      Somalia          SOM     43.891            1.5          Low income
## 173  Timor-Leste          TLS     35.755            1.1 Lower middle income
##     Mycale
## 12  45.451
## 53  35.700
## 56  34.825
## 65  38.937
## 118 19.719
## 128 51.361
## 155 38.429
## 157 45.391
## 173 36.855
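
A logical vector can be counted and located as well as used for selection. A small sketch with four countries and values taken from the outputs above (demo and low_net are hypothetical names):

```r
demo <- data.frame(
  Country.Name   = c("Aruba", "Afghanistan", "Burundi", "Eritrea"),
  Internet.users = c(78.9, 5.9, 1.3, 0.9),
  stringsAsFactors = FALSE
)

low_net <- demo$Internet.users < 2  # logical vector, one value per row
n_low   <- sum(low_net)             # TRUE counts as 1, so this is the number of matches
pos_low <- which(low_net)           # row positions of the matches
demo[low_net, ]                     # keep only the matching rows
```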

List of countries with a birth rate above 40 and internet users below 2

filter2 <- stats$Birth.rate > 40

stats[filter2 & stats$Internet.users < 2,]
##     Country.Name Country.Code Birth.rate Internet.users Income.Group Mycale
## 12       Burundi          BDI     44.151            1.3   Low income 45.451
## 128        Niger          NER     49.661            1.7   Low income 51.361
## 157      Somalia          SOM     43.891            1.5   Low income 45.391
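
The same combined condition can also be written with subset(), which lets you refer to columns without the data frame prefix. A minimal sketch on four rows taken from the outputs above; demo, bracket and via_subset are names invented for the example.

```r
demo <- data.frame(
  Country.Name   = c("Burundi", "Eritrea", "Angola", "Niger"),
  Birth.rate     = c(44.151, 34.800, 45.985, 49.661),
  Internet.users = c(1.3, 0.9, 19.1, 1.7),
  stringsAsFactors = FALSE
)

# Bracket form: both conditions must be TRUE for a row to be kept.
bracket <- demo[demo$Birth.rate > 40 & demo$Internet.users < 2, ]

# subset() form: column names are looked up inside demo.
via_subset <- subset(demo, Birth.rate > 40 & Internet.users < 2)

bracket$Country.Name
via_subset$Country.Name
```

One practical difference: when a condition evaluates to NA, bracket indexing returns a row of NAs, while subset() drops that row.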

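Countries in the high income group

String comparison is case-sensitive. The level is spelled "High income", so the comparison with "High Income" below matches no rows. Use stats[stats$Income.Group == "High income",] to list the high income countries.

stats[ stats$Income.Group == "High Income",]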
## [1] Country.Name   Country.Code   Birth.rate     Internet.users Income.Group  
## [6] Mycale        
## <0 rows> (or 0-length row.names)
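
An empty result like this is easy to diagnose by checking the spelling of the levels before filtering. A sketch on a four-value factor (the names income, n_exact and n_ignore_case are hypothetical):

```r
income <- factor(c("High income", "Low income", "Upper middle income", "High income"))

has_wrong <- "High Income" %in% levels(income)  # capital "I": not an existing level
has_right <- "High income" %in% levels(income)  # exact spelling of the level

n_exact <- sum(income == "High income")         # rows matching the exact spelling

# Case-insensitive comparison, if the capitalization is uncertain.
n_ignore_case <- sum(tolower(as.character(income)) == tolower("High Income"))
```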

Step 4: Ensure you have the right package for the analysis

library(ggplot2)
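
library() stops with an error if the package is not installed. One way to check first is sketched below; has_ggplot2 is a hypothetical variable name, and the install step is left as a message so that nothing is downloaded without your say.

```r
# requireNamespace() returns TRUE or FALSE instead of raising an error.
has_ggplot2 <- requireNamespace("ggplot2", quietly = TRUE)

if (has_ggplot2) {
  library(ggplot2)
} else {
  message('ggplot2 is not installed; run install.packages("ggplot2") first.')
}
```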

Step 5: Scatter plot

qplot(data=stats, x = Internet.users, y=Birth.rate)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
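
Because qplot() is deprecated, the same plot can be built with ggplot() and geom_point(). The sketch below assumes ggplot2 is installed and uses only the ten rows printed by head(stats, 10), stored in a frame named demo for the example; with the full data you would pass stats instead.

```r
library(ggplot2)

# The ten rows shown by head(stats, 10).
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877, 11.044,
                     17.716, 13.308, 16.447, 13.200, 9.400),
  Internet.users = c(78.9, 5.9, 19.1, 57.2, 88.0,
                     59.9, 41.9, 63.4, 83.0, 80.6188)
)

# aes() maps columns to axes; geom_point() draws one point per row.
p <- ggplot(demo, aes(x = Internet.users, y = Birth.rate)) +
  geom_point()
print(p)
```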

The scatter plot shows Internet.users on the x-axis and Birth.rate on the y-axis.

Step 6: The scatter plot is not very clear.

So we color the points red and make them larger.

qplot(data=stats, x = Internet.users, y=Birth.rate, size=I(2), color = I("red"))

Step 7: Let us map the color to a new variable, Income.Group.

qplot(data=stats, x = Internet.users, y=Birth.rate, size=I(2), color=Income.Group)
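
In ggplot() terms, a fixed point size is set inside geom_point(), while a color that depends on the data is mapped inside aes(). A sketch under the same assumptions as before: ggplot2 is installed, and demo holds only the ten rows printed by head(stats, 10). The axis and legend labels are suggestions, not part of the original plot.

```r
library(ggplot2)

# The ten rows shown by head(stats, 10).
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877, 11.044,
                     17.716, 13.308, 16.447, 13.200, 9.400),
  Internet.users = c(78.9, 5.9, 19.1, 57.2, 88.0,
                     59.9, 41.9, 63.4, 83.0, 80.6188),
  Income.Group   = c("High income", "Low income", "Upper middle income",
                     "Upper middle income", "High income", "High income",
                     "Lower middle income", "High income", "High income",
                     "High income")
)

p <- ggplot(demo, aes(x = Internet.users, y = Birth.rate, color = Income.Group)) +
  geom_point(size = 2) +   # fixed size for every point, like size = I(2) in qplot()
  labs(x = "Internet users", y = "Birth rate", color = "Income group")
print(p)
```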

Interpretation:

The scatter plot of the demographic data places Internet.users on the x-axis and Birth.rate on the y-axis, with each color representing a different income group. Internet usage is highest in the high income group, followed by the upper middle income group, and lowest in the low income group. Countries with higher birth rates tend to have lower internet usage, while countries with lower birth rates tend to have higher internet usage. From this analysis we can state that there is an inverse relation between Internet.users and Birth.rate, and a positive relation between Internet.users and Income.Group: countries with higher income use the internet more than countries with lower income.
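
These relations can also be checked numerically. The sketch below uses only the ten rows printed by head(stats, 10), so it illustrates the method and is not the result for all 195 countries; demo, r and means are names chosen for the example.

```r
# The ten rows shown by head(stats, 10).
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877, 11.044,
                     17.716, 13.308, 16.447, 13.200, 9.400),
  Internet.users = c(78.9, 5.9, 19.1, 57.2, 88.0,
                     59.9, 41.9, 63.4, 83.0, 80.6188),
  Income.Group   = c("High income", "Low income", "Upper middle income",
                     "Upper middle income", "High income", "High income",
                     "Lower middle income", "High income", "High income",
                     "High income")
)

# Pearson correlation: a negative value indicates an inverse relation.
r <- cor(demo$Internet.users, demo$Birth.rate)
r

# Mean internet usage per income group.
means <- aggregate(Internet.users ~ Income.Group, data = demo, FUN = mean)
means
```

With the full dataset, replacing demo by stats gives the figures for all countries.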