You are employed as a data scientist by the World Bank and are working on a project to analyse the world's demographic trends. You are required to produce a scatter plot illustrating birth rate and internet usage statistics by country. The scatter plot needs to be categorized by each country's income group.
Load the data into R
stats <- read.csv(file.choose())
Another way to load the data
stats <- read.csv("DemographicData.csv") # The file must be in the working directory; use straight quotes, not curly ones.
Checking the working directory
getwd() # Returns the current working directory, where R looks for the dataset.
## [1] "D:/KIAMS/SEM/SEM-V/R-For-Manager"
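The loading step can be tried without the real file. The sketch below is only a stand-in: it writes three rows (values taken from the head(stats) output in this report) to a temporary CSV and reads them back. The header names with spaces are an assumption about the raw file, and `csv_path` and `demo` are illustrative names.

```r
# Hypothetical stand-in for the real CSV: three rows written to a temporary file.
# The header names with spaces are an assumption about the raw file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Country Name,Country Code,Birth rate,Internet users,Income Group",
  "Aruba,ABW,10.244,78.9,High income",
  "Afghanistan,AFG,35.253,5.9,Low income",
  "Angola,AGO,45.985,19.1,Upper middle income"
), csv_path)

# Confirm the file exists before reading it.
stopifnot(file.exists(csv_path))

# read.csv() turns spaces in header names into dots (check.names = TRUE by default).
demo <- read.csv(csv_path)
names(demo)
str(demo)
```

If the header really does contain spaces, this would explain why the columns appear as Country.Name, Birth.rate and so on in the str(stats) output.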
View(stats)
str(stats)
## 'data.frame': 195 obs. of 5 variables:
## $ Country.Name : chr "Aruba" "Afghanistan" "Angola" "Albania" ...
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Birth.rate : num 10.2 35.3 46 12.9 11 ...
## $ Internet.users: num 78.9 5.9 19.1 57.2 88 ...
## $ Income.Group : chr "High income" "Low income" "Upper middle income" "Upper middle income" ...
nrow(stats) # To check the number of rows.
## [1] 195
ncol(stats) # To check the number of columns.
## [1] 5
head(stats) # To check the first six data rows.
## Country.Name Country.Code Birth.rate Internet.users
## 1 Aruba ABW 10.244 78.9
## 2 Afghanistan AFG 35.253 5.9
## 3 Angola AGO 45.985 19.1
## 4 Albania ALB 12.877 57.2
## 5 United Arab Emirates ARE 11.044 88.0
## 6 Argentina ARG 17.716 59.9
## Income.Group
## 1 High income
## 2 Low income
## 3 Upper middle income
## 4 Upper middle income
## 5 High income
## 6 High income
tail(stats) # To check the bottom six data rows.
## Country.Name Country.Code Birth.rate Internet.users Income.Group
## 190 Samoa WSM 26.172 15.3 Lower middle income
## 191 Yemen, Rep. YEM 32.947 20.0 Lower middle income
## 192 South Africa ZAF 20.850 46.5 Upper middle income
## 193 Congo, Dem. Rep. COD 42.394 2.2 Low income
## 194 Zambia ZMB 40.471 15.4 Lower middle income
## 195 Zimbabwe ZWE 35.715 18.5 Low income
head(stats,10) # To check top 10 rows in the data.
## Country.Name Country.Code Birth.rate Internet.users
## 1 Aruba ABW 10.244 78.9000
## 2 Afghanistan AFG 35.253 5.9000
## 3 Angola AGO 45.985 19.1000
## 4 Albania ALB 12.877 57.2000
## 5 United Arab Emirates ARE 11.044 88.0000
## 6 Argentina ARG 17.716 59.9000
## 7 Armenia ARM 13.308 41.9000
## 8 Antigua and Barbuda ATG 16.447 63.4000
## 9 Australia AUS 13.200 83.0000
## 10 Austria AUT 9.400 80.6188
## Income.Group
## 1 High income
## 2 Low income
## 3 Upper middle income
## 4 Upper middle income
## 5 High income
## 6 High income
## 7 Lower middle income
## 8 High income
## 9 High income
## 10 High income
tail(stats,n=10) # To check bottom 10 rows in the data.
## Country.Name Country.Code Birth.rate Internet.users
## 186 Virgin Islands (U.S.) VIR 10.700 45.3
## 187 Vietnam VNM 15.537 43.9
## 188 Vanuatu VUT 26.739 11.3
## 189 West Bank and Gaza PSE 30.394 46.6
## 190 Samoa WSM 26.172 15.3
## 191 Yemen, Rep. YEM 32.947 20.0
## 192 South Africa ZAF 20.850 46.5
## 193 Congo, Dem. Rep. COD 42.394 2.2
## 194 Zambia ZMB 40.471 15.4
## 195 Zimbabwe ZWE 35.715 18.5
## Income.Group
## 186 High income
## 187 Lower middle income
## 188 Lower middle income
## 189 Lower middle income
## 190 Lower middle income
## 191 Lower middle income
## 192 Upper middle income
## 193 Low income
## 194 Lower middle income
## 195 Low income
summary(stats)
## Country.Name Country.Code Birth.rate Internet.users
## Length:195 Length:195 Min. : 7.90 Min. : 0.90
## Class :character Class :character 1st Qu.:12.12 1st Qu.:14.52
## Mode :character Mode :character Median :19.68 Median :41.00
## Mean :21.47 Mean :42.08
## 3rd Qu.:29.76 3rd Qu.:66.22
## Max. :49.66 Max. :96.55
## Income.Group
## Length:195
## Class :character
## Mode :character
##
##
##
stats$Internet.users # Extract the Internet.users column from the dataset stats as a vector.
## [1] 78.90000 5.90000 19.10000 57.20000 88.00000 59.90000 41.90000 63.40000
## [9] 83.00000 80.61880 58.70000 1.30000 82.17020 4.90000 9.10000 6.63000
## [17] 53.06150 90.00004 72.00000 57.79000 54.17000 33.60000 95.30000 36.94000
## [25] 51.04000 73.00000 64.50000 29.90000 15.00000 3.50000 85.80000 86.34000
## [33] 66.50000 45.80000 8.40000 6.40000 6.60000 51.70000 6.50000 37.50000
## [41] 45.96000 27.93000 74.10000 65.45480 74.11040 84.17000 9.50000 94.62970
## [49] 45.90000 16.50000 40.35368 29.40000 0.90000 71.63500 79.40000 1.90000
## [57] 91.51440 37.10000 81.91980 27.80000 9.20000 89.84410 43.30000 12.30000
## [65] 1.60000 14.00000 3.10000 16.40000 59.86630 35.00000 65.80000 19.70000
## [73] 65.40000 35.00000 74.20000 17.80000 66.74760 10.60000 72.64390 14.94000
## [81] 15.10000 78.24770 29.95000 9.20000 96.54680 70.80000 58.45930 37.10000
## [89] 41.00000 89.71000 54.00000 39.00000 23.00000 6.80000 11.50000 84.77000
## [97] 75.46000 12.50000 70.50000 3.20000 16.50000 46.20000 93.80000 21.90000
## [105] 5.00000 68.45290 93.77650 75.23440 65.80000 56.00000 45.00000 3.00000
## [113] 44.10000 43.46000 65.24000 3.50000 68.91380 1.60000 60.31000 20.00000
## [121] 5.40000 6.20000 39.00000 5.05000 66.97000 13.90000 66.00000 1.70000
## [129] 38.00000 15.50000 93.95640 95.05340 13.30000 82.78000 66.45000 10.90000
## [137] 44.03000 39.20000 37.00000 6.50000 62.84920 73.90000 62.09560 36.90000
## [145] 56.80000 85.30000 49.76450 67.97000 9.00000 60.50000 22.70000 13.10000
## [153] 81.00000 8.00000 1.70000 23.10930 1.50000 51.50000 14.10000 23.00000
## [161] 37.40000 77.88260 72.67560 94.78360 24.70000 50.40000 26.20000 2.30000
## [169] 4.50000 28.94000 16.00000 9.60000 1.10000 35.00000 63.80000 43.80000
## [177] 46.25000 4.40000 16.20000 41.00000 57.69000 84.20000 38.20000 52.00000
## [185] 54.90000 45.30000 43.90000 11.30000 46.60000 15.30000 20.00000 46.50000
## [193] 2.20000 15.40000 18.50000
stats$Income.Group <- as.factor(stats$Income.Group) # Convert Income.Group from character to factor.
stats$Income.Group # Print the Income.Group column of stats.
levels(stats$Income.Group) # List the categories (levels) by which the data is segregated.
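To see what the factor conversion does, here is a minimal sketch on a small character vector. The five values are the income groups of the first five rows shown earlier, and `income` and `income_factor` are illustrative names.

```r
# Income groups of the first five countries, stored as plain character strings.
income <- c("High income", "Low income", "Upper middle income",
            "Upper middle income", "High income")

# as.factor() stores the distinct values as levels, sorted alphabetically.
income_factor <- as.factor(income)

levels(income_factor)   # the distinct categories
nlevels(income_factor)  # how many categories there are
table(income_factor)    # number of observations per category
```

Once the column is a factor, summary() reports counts per category instead of just "Length" and "Class".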
stats[1:10,] # To get the data of first 10 rows of all column.
## Country.Name Country.Code Birth.rate Internet.users
## 1 Aruba ABW 10.244 78.9000
## 2 Afghanistan AFG 35.253 5.9000
## 3 Angola AGO 45.985 19.1000
## 4 Albania ALB 12.877 57.2000
## 5 United Arab Emirates ARE 11.044 88.0000
## 6 Argentina ARG 17.716 59.9000
## 7 Armenia ARM 13.308 41.9000
## 8 Antigua and Barbuda ATG 16.447 63.4000
## 9 Australia AUS 13.200 83.0000
## 10 Austria AUT 9.400 80.6188
## Income.Group
## 1 High income
## 2 Low income
## 3 Upper middle income
## 4 Upper middle income
## 5 High income
## 6 High income
## 7 Lower middle income
## 8 High income
## 9 High income
## 10 High income
stats[3:10,] # To get the data from 3rd row to 10th row.
## Country.Name Country.Code Birth.rate Internet.users
## 3 Angola AGO 45.985 19.1000
## 4 Albania ALB 12.877 57.2000
## 5 United Arab Emirates ARE 11.044 88.0000
## 6 Argentina ARG 17.716 59.9000
## 7 Armenia ARM 13.308 41.9000
## 8 Antigua and Barbuda ATG 16.447 63.4000
## 9 Australia AUS 13.200 83.0000
## 10 Austria AUT 9.400 80.6188
## Income.Group
## 3 Upper middle income
## 4 Upper middle income
## 5 High income
## 6 High income
## 7 Lower middle income
## 8 High income
## 9 High income
## 10 High income
stats[c(4,100),] # To get the data of 4th and 100th rows and all columns.
## Country.Name Country.Code Birth.rate Internet.users Income.Group
## 4 Albania ALB 12.877 57.2 Upper middle income
## 100 Liberia LBR 35.521 3.2 Low income
stats[c(4,100),2] # To get the data of 4th and 100th row but only the second variable stored in the data.
## [1] "ALB" "LBR"
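The row and column indexing used above follows the pattern data[rows, columns]. The sketch below repeats it on a four-row data frame built from the first rows of the output; `demo` is an illustrative name, not part of the original analysis.

```r
# Small data frame built from the first four rows of the dataset.
demo <- data.frame(
  Country.Name   = c("Aruba", "Afghanistan", "Angola", "Albania"),
  Country.Code   = c("ABW", "AFG", "AGO", "ALB"),
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877),
  Internet.users = c(78.9, 5.9, 19.1, 57.2),
  stringsAsFactors = FALSE
)

demo[2:3, ]                      # rows 2 to 3, all columns
demo[c(1, 4), ]                  # rows 1 and 4, all columns
demo[c(1, 4), 2]                 # rows 1 and 4, second column: returns a vector
demo[c(1, 4), "Country.Code"]    # same selection, column chosen by name
demo[c(1, 4), 2, drop = FALSE]   # drop = FALSE keeps the result as a data frame
```

Selecting a single column returns a plain vector, which is why the output above shows only the two country codes. Pass drop = FALSE when a data frame is needed.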
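stats$Mycale <- stats$Birth.rate + stats$Internet.users # Add a new column, Mycale, holding the sum of the two columns.
stats$MyCale # Returns NULL: R is case-sensitive, and the column is named Mycale, not MyCale.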
## NULL
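The NULL above comes from a capitalization mismatch between the column that was created and the column that was requested. A minimal sketch of the behaviour, using the first two rows of the data and an illustrative data frame named `demo`:

```r
# Two rows from the dataset, enough to show the behaviour.
demo <- data.frame(Birth.rate = c(10.244, 35.253),
                   Internet.users = c(78.9, 5.9))

# Create the column with a lower-case "c", as in the report.
demo$Mycale <- demo$Birth.rate + demo$Internet.users

is.null(demo$MyCale)        # TRUE: no column has this exact spelling
"MyCale" %in% names(demo)   # FALSE
demo$Mycale                 # the correctly spelled name returns the sums
```

Checking names(demo) before accessing a new column is a quick way to catch this kind of typo.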
stats$Internet.users < 2 # Logical vector: TRUE where Internet.users is below 2.
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [157] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE
filter <- stats$Internet.users < 2 # Store the logical vector so it can be reused.
filter # Print the stored logical vector.
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [61] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [73] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [85] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [109] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [121] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [133] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [157] TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [169] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [181] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE
stats[filter,] # Keep only the rows where filter is TRUE.
## Country.Name Country.Code Birth.rate Internet.users Income.Group
## 12 Burundi BDI 44.151 1.3 Low income
## 53 Eritrea ERI 34.800 0.9 Low income
## 56 Ethiopia ETH 32.925 1.9 Low income
## 65 Guinea GIN 37.337 1.6 Low income
## 118 Myanmar MMR 18.119 1.6 Lower middle income
## 128 Niger NER 49.661 1.7 Low income
## 155 Sierra Leone SLE 36.729 1.7 Low income
## 157 Somalia SOM 43.891 1.5 Low income
## 173 Timor-Leste TLS 35.755 1.1 Lower middle income
## Mycale
## 12 45.451
## 53 35.700
## 56 34.825
## 65 38.937
## 118 19.719
## 128 51.361
## 155 38.429
## 157 45.391
## 173 36.855
filter2 <- stats$Birth.rate > 40 # Logical vector: TRUE where Birth.rate is above 40.
stats[stats$Birth.rate > 40 & stats$Internet.users < 2,] # Rows that satisfy both conditions.
## Country.Name Country.Code Birth.rate Internet.users Income.Group Mycale
## 12 Burundi BDI 44.151 1.3 Low income 45.451
## 128 Niger NER 49.661 1.7 Low income 51.361
## 157 Somalia SOM 43.891 1.5 Low income 45.391
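The two conditions can also be stored as separate logical vectors and combined with &. The sketch below does this on five rows taken from the outputs above; `demo`, `low_internet` and `high_birth` are illustrative names.

```r
# Five rows taken from the outputs above.
demo <- data.frame(
  Country.Name   = c("Burundi", "Eritrea", "Myanmar", "Niger", "Angola"),
  Birth.rate     = c(44.151, 34.800, 18.119, 49.661, 45.985),
  Internet.users = c(1.3, 0.9, 1.6, 1.7, 19.1),
  stringsAsFactors = FALSE
)

# One logical vector per condition.
low_internet <- demo$Internet.users < 2
high_birth   <- demo$Birth.rate > 40

# & is TRUE only where both conditions hold.
demo[low_internet & high_birth, ]

# subset() expresses the same filter without repeating the data frame name.
subset(demo, Birth.rate > 40 & Internet.users < 2)

# Number of matching rows.
sum(low_internet & high_birth)
```

Naming each condition makes a longer filter easier to read and lets the pieces be reused.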
stats[stats$Income.Group == "High Income",] # Returns 0 rows: the level is spelled "High income", and the comparison is case-sensitive.
## [1] Country.Name Country.Code Birth.rate Internet.users Income.Group
## [6] Mycale
## <0 rows> (or 0-length row.names)
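The empty result above is a spelling issue, not a data issue. A small sketch, using three rows from the dataset in an illustrative data frame named `demo`, shows that the comparison must match the level exactly:

```r
# Three rows from the dataset; Income.Group is stored as a factor.
demo <- data.frame(
  Country.Name = c("Aruba", "Afghanistan", "Argentina"),
  Income.Group = factor(c("High income", "Low income", "High income"))
)

# Inspect the exact spelling of the levels before filtering.
levels(demo$Income.Group)

# Capital "I": matches no level, so no rows are returned.
nrow(demo[demo$Income.Group == "High Income", ])

# Exact spelling: returns the two high income rows.
demo[demo$Income.Group == "High income", ]
```

Running levels() first and copying the string from its output avoids this kind of mismatch.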
library(ggplot2)
qplot(data=stats, x = Internet.users, y=Birth.rate)
## Warning: `qplot()` was deprecated in ggplot2 3.4.0.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
Next, we color the points red and make them larger.
qplot(data=stats, x = Internet.users, y=Birth.rate, size=I(2), color = I("red"))
Finally, we map the color to Income.Group so that each income group gets its own color.
qplot(data=stats, x = Internet.users, y=Birth.rate, size=I(2), color=Income.Group)
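Because qplot() is deprecated, the same plot can be written with ggplot() and geom_point(). The sketch below is one possible equivalent, not the method used in this report. It runs on the first ten rows of the data (copied from the head(stats,10) output) so that it is self-contained. The data frame `demo` and the axis label text are illustrative choices.

```r
library(ggplot2)

# First ten rows of the dataset, copied from the head(stats, 10) output.
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877, 11.044,
                     17.716, 13.308, 16.447, 13.200, 9.400),
  Internet.users = c(78.9, 5.9, 19.1, 57.2, 88.0,
                     59.9, 41.9, 63.4, 83.0, 80.6188),
  Income.Group   = factor(c("High income", "Low income", "Upper middle income",
                            "Upper middle income", "High income", "High income",
                            "Lower middle income", "High income", "High income",
                            "High income"))
)

# aes() maps columns to axes and color; geom_point() draws the scatter plot.
p <- ggplot(demo, aes(x = Internet.users, y = Birth.rate, colour = Income.Group)) +
  geom_point(size = 2) +
  labs(x = "Internet users", y = "Birth rate", colour = "Income group")

print(p)
```

Replacing `demo` with `stats` should produce the full plot without the deprecation warning.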
In the scatter plot of the DemographicData, Internet.users is on the x-axis and Birth.rate is on the y-axis, and each color represents a different income group. Internet usage is highest in the high income group, followed by the upper middle income group, and lowest in the low income group. The plot also shows that countries with higher birth rates tend to have lower internet usage, while countries with lower birth rates tend to have higher internet usage. From this analysis we can state that there is an inverse relationship between Internet.users and Birth.rate, and a positive relationship between Internet.users and Income.Group: countries in higher income groups use the internet more than countries in lower income groups.
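The visual reading can be backed by numbers. The sketch below is an illustration only: it uses the first ten rows of the data (from the head(stats,10) output), so its values do not describe the full dataset. It computes the Pearson correlation with cor() and the median internet usage per income group with aggregate(). The names `demo`, `r` and `medians` are illustrative.

```r
# First ten rows of the dataset, copied from the head(stats, 10) output.
demo <- data.frame(
  Birth.rate     = c(10.244, 35.253, 45.985, 12.877, 11.044,
                     17.716, 13.308, 16.447, 13.200, 9.400),
  Internet.users = c(78.9, 5.9, 19.1, 57.2, 88.0,
                     59.9, 41.9, 63.4, 83.0, 80.6188),
  Income.Group   = c("High income", "Low income", "Upper middle income",
                     "Upper middle income", "High income", "High income",
                     "Lower middle income", "High income", "High income",
                     "High income"),
  stringsAsFactors = FALSE
)

# Pearson correlation: a negative value indicates an inverse relationship.
r <- cor(demo$Internet.users, demo$Birth.rate)
r

# Median internet usage within each income group.
medians <- aggregate(Internet.users ~ Income.Group, data = demo, FUN = median)
medians
```

Running the same two calls on `stats` would quantify the relationships for all 195 countries.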