Provide the packages required to reproduce the report. Make sure you fulfilled the minimum requirement #10.
library(mlr)
## Loading required package: ParamHelpers
## Warning: replacing previous import 'BBmisc::isFALSE' by
## 'backports::isFALSE' when loading 'mlr'
library(tidyr)
library(Hmisc)
## Loading required package: lattice
## Loading required package: survival
## Loading required package: Formula
## Loading required package: ggplot2
##
## Attaching package: 'Hmisc'
## The following object is masked from 'package:mlr':
##
## impute
## The following objects are masked from 'package:base':
##
## format.pval, units
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:Hmisc':
##
## src, summarize
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(gdata)
## gdata: read.xls support for 'XLS' (Excel 97-2004) files ENABLED.
##
## gdata: read.xls support for 'XLSX' (Excel 2007+) files ENABLED.
##
## Attaching package: 'gdata'
## The following objects are masked from 'package:dplyr':
##
## combine, first, last
## The following object is masked from 'package:mlr':
##
## resample
## The following object is masked from 'package:stats':
##
## nobs
## The following object is masked from 'package:utils':
##
## object.size
## The following object is masked from 'package:base':
##
## startsWith
library(editrules)
## Loading required package: igraph
##
## Attaching package: 'igraph'
## The following objects are masked from 'package:dplyr':
##
## as_data_frame, groups, union
## The following object is masked from 'package:tidyr':
##
## crossing
## The following objects are masked from 'package:stats':
##
## decompose, spectrum
## The following object is masked from 'package:base':
##
## union
##
## Attaching package: 'editrules'
## The following objects are masked from 'package:igraph':
##
## blocks, normalize
## The following object is masked from 'package:dplyr':
##
## contains
## The following object is masked from 'package:tidyr':
##
## separate
## The following object is masked from 'package:ParamHelpers':
##
## isFeasible
library(stringr)
library(MVN)
## sROC 0.1-2 loaded
library(forecast)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:igraph':
##
## %--%
## The following object is masked from 'package:base':
##
## date
The data preprocessing is now complete.
From the merged data, a dataset “north_america” was obtained. It contains population information for the North American countries United States and Canada, and was produced by merging and filtering three datasets: the world male population, the world female population, and the countries’ regional information.
The north_america dataset contains 10 variables: 2 character variables (“Country.Code” and “Indicator.Code”), 4 factor variables (“Country.Name”, “Indicator.Name”, “Region” and “IncomeGroup”), 3 numeric variables (“male_female_population”, “Population.percentage” and “transformation”) and 1 date variable (“year”). All 10 variables have been converted to appropriate data types. The factor variable “IncomeGroup” has been ordered by its levels: “Low income”, “Lower middle income”, “Upper middle income” and “High income”. The merged data have been converted from untidy to tidy format. All variables have been scanned for missing values and inconsistencies, and these have been dealt with.
Outliers in the numeric variable “male_female_population”, which holds the male and female population size of each country, have been scanned and handled with an optimal Box-Cox transformation.
The transformed variable, derived from “male_female_population”, is named “transformed.population”. The transformation also reduces skewness, bringing the variable closer to a normal distribution, which is useful for further population analysis and forecasting.
This analysis uses three datasets: the total male and the total female population of countries around the world from 1960 to 2016, and regional information on those countries.
The male and female population datasets contain the following variables. “Country Name” and “Country Code” record the name of each country and its abbreviation respectively. “Indicator Name” and “Indicator Code” record the gender that the population figures refer to and the corresponding indicator code respectively. The population size of each country from 1960 to 2016 is recorded in separate columns, one column per year.
The dataset “Metadata_Country_API_SP.POP.TOTL.FE.IN_DS2_en_csv_v2_9952571”, which contains regional and income-level information, is also used. It has five variables: “Country Code” records each country’s code, “Region” denotes the world region each country belongs to, “IncomeGroup” gives each country’s income level, “SpecialNotes” records country-specific notes, and “TableName” contains the country names.
All three datasets are openly available at the following websites: https://data.worldbank.org/indicator/SP.POP.TOTL.MA.IN https://data.worldbank.org/indicator/SP.POP.TOTL.FE.IN
A clear description of data sets, their sources, and variable descriptions should be provided. In this section, you must also provide the R codes with outputs (head of data sets) that you used to import/read/scrape the data set. You need to fulfil the minimum requirement #1 and merge at least two data sets to create the one you are going to work on. In addition to the R codes and outputs, you need to explain the steps that you have taken.
#Read the female population dataset.
fe<- read.csv("~/Desktop/population_female.csv", header =TRUE,skip=4)
#Read the male population dataset.
ma<- read.csv("~/Desktop/population_male.csv", header =TRUE,skip=4)
#Read the dataset containing the regional and income-level information.
region<- read.csv("~/Desktop/pupulation region.csv",header = TRUE)
#Before gathering the data, the column X2017 has to be deleted from both the "ma" and "fe" datasets because it contains no information.
fe <- select(fe, -X2017)
ma <- select(ma, -X2017)
head(fe)
## Country.Name Country.Code Indicator.Name Indicator.Code X1960
## 1 Aruba ABW Population, female SP.POP.TOTL.FE.IN 27637
## 2 Afghanistan AFG Population, female SP.POP.TOTL.FE.IN 4346990
## 3 Angola AGO Population, female SP.POP.TOTL.FE.IN 2900597
## 4 Albania ALB Population, female SP.POP.TOTL.FE.IN 780595
## 5 Andorra AND Population, female SP.POP.TOTL.FE.IN NA
## 6 Arab World ARB Population, female SP.POP.TOTL.FE.IN 45909546
## X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968
## 1 28254 28655 28907 29094 29268 29458 29631 29804
## 2 4437679 4532368 4631212 4734371 4842015 4952722 5066254 5185164
## 3 2956543 3014082 3072260 3129657 3185518 3239546 3292896 3348020
## 4 805305 830317 855238 880256 904823 928927 953600 981013
## 5 NA NA NA NA NA NA NA NA
## 6 47174491 48482890 49837622 51242480 52699695 54215933 55790717 57411074
## X1969 X1970 X1971 X1972 X1973 X1974 X1975 X1976
## 1 29988 30178 30401 30634 30889 31065 31150 31111
## 2 5313009 5451441 5599707 5753953 5908686 6056668 6191328 6315722
## 3 3408200 3475917 3552328 3637187 3729918 3829313 3934592 4045039
## 4 1009808 1035954 1061521 1088570 1114865 1141014 1167738 1193877
## 5 NA NA NA NA NA NA NA NA
## 6 59058941 60724221 62398727 64094688 65847194 67704063 69698805 71842855
## X1977 X1978 X1979 X1980 X1981 X1982 X1983 X1984
## 1 30989 30834 30753 30805 31047 31457 31912 32227
## 2 6427910 6510970 6543069 6511413 6411284 6255675 6072080 5898422
## 3 4161240 4285197 4419637 4566089 4726261 4898495 5076921 5253511
## 4 1220603 1246195 1271298 1297791 1324521 1353597 1383430 1413336
## 5 NA NA NA NA NA NA NA NA
## 6 74123776 76521599 79005258 81550074 84149697 86805054 89506982 92246331
## X1985 X1986 X1987 X1988 X1989 X1990 X1991
## 1 32303 32055 31571 31104 30993 31491 32693
## 2 5764972 5674753 5627943 5647569 5760223 5981147 6327356
## 3 5422852 5581981 5733487 5884261 6044247 6220388 6416266
## 4 1442682 1469816 1497724 1525329 1568794 1603298 1604803
## 5 NA NA NA NA NA NA NA
## 6 95015336 97806638 100615528 103440246 106280987 110113892 113029317
## X1992 X1993 X1994 X1995 X1996 X1997 X1998
## 1 34482 36633 38785 40708 42294 43615 44757
## 2 6785762 7304033 7809329 8250185 8603315 8885666 9134961
## 3 6629273 6852909 7077584 7296962 7508648 7716678 7928906
## 4 1610619 1617772 1621629 1618647 1607089 1588481 1566199
## 5 NA NA NA NA NA NA NA
## 6 115137501 118134381 121111744 124723010 127619185 130509484 133371678
## X1999 X2000 X2001 X2002 X2003 X2004 X2005
## 1 45856 47012 48252 49506 50707 51712 52456
## 2 9408613 9746770 10162500 10637846 11144964 11642547 12101551
## 3 8156453 8407379 8684468 8985544 9307609 9645607 9995796
## 4 1544994 1528046 1511707 1509000 1507666 1505935 1501423
## 5 NA NA NA NA NA NA NA
## 6 136236749 139113662 141998877 144904104 147871961 150957572 154198970
## X2006 X2007 X2008 X2009 X2010 X2011 X2012
## 1 52896 53083 53104 53112 53202 53404 53701
## 2 12511524 12884170 13239157 13606016 14005473 14444001 14912657
## 3 10357453 10731777 11119158 11520427 11936016 12366067 12809782
## 4 1493529 1482744 1470945 1460006 1451422 1445814 1441287
## 5 NA NA NA NA NA NA NA
## 6 157610509 161177796 164879183 168667331 172508052 176400834 180343747
## X2013 X2014 X2015 X2016
## 1 54060 54417 54743 55023
## 2 15398276 15881092 16346869 16791609
## 3 13265577 13731363 14205741 14688058
## 4 1436444 1431651 1426369 1423809
## 5 NA NA NA NA
## 6 184306583 188253495 192160815 196007048
head(ma)
## Country.Name Country.Code Indicator.Name Indicator.Code X1960
## 1 Aruba ABW Population, male SP.POP.TOTL.MA.IN 26574
## 2 Afghanistan AFG Population, male SP.POP.TOTL.MA.IN 4649361
## 3 Angola AGO Population, male SP.POP.TOTL.MA.IN 2742585
## 4 Albania ALB Population, male SP.POP.TOTL.MA.IN 828205
## 5 Andorra AND Population, male SP.POP.TOTL.MA.IN NA
## 6 Arab World ARB Population, male SP.POP.TOTL.MA.IN 46581386
## X1961 X1962 X1963 X1964 X1965 X1966 X1967 X1968
## 1 27184 27570 27788 27938 28092 28257 28424 28582
## 2 4729085 4813500 4902742 4996990 5096399 5199609 5306376 5419182
## 3 2796481 2851979 2908157 2963664 3017781 3070224 3122099 3175771
## 4 854495 881002 907383 933879 959968 985646 1011998 1041259
## 5 NA NA NA NA NA NA NA NA
## 6 47870006 49199404 50573454 51997422 53475293 55014660 56616215 58269091
## X1969 X1970 X1971 X1972 X1973 X1974 X1975 X1976
## 1 28738 28885 29039 29206 29354 29463 29507 29475
## 2 5541419 5674682 5818118 5967987 6119136 6264873 6398958 6524577
## 3 3234432 3300464 3374941 3457647 3548042 3645025 3747887 3855958
## 4 1071887 1099525 1126332 1154556 1181887 1209110 1237093 1264649
## 5 NA NA NA NA NA NA NA NA
## 6 59957601 61674153 63408692 65174687 67016222 68992698 71144493 73489523
## X1977 X1978 X1979 X1980 X1981 X1982 X1983 X1984
## 1 29377 29269 29227 29291 29520 29888 30289 30609
## 2 6639628 6726764 6763626 6736957 6642670 6493970 6317189 6148693
## 3 3969748 4090950 4221884 4363811 4518246 4683661 4854641 5023810
## 4 1292943 1320071 1346534 1374206 1401535 1430681 1460530 1491093
## 5 NA NA NA NA NA NA NA NA
## 6 76009278 78662125 81387230 84139416 86902253 89685030 92498845 95364425
## X1985 X1986 X1987 X1988 X1989 X1990 X1991
## 1 30723 30589 30262 29975 30039 30658 31929
## 2 6018078 5926288 5874818 5893319 6017386 6267967 6666301
## 3 5186190 5339056 5484781 5629707 5782990 5951053 6137180
## 4 1522080 1552819 1585881 1617007 1659149 1683244 1661987
## 5 NA NA NA NA NA NA NA
## 6 98294965 101287129 104327021 107404525 110506415 114621554 117800551
## X1992 X1993 X1994 X1995 X1996 X1997 X1998
## 1 33753 35871 37915 39616 40906 41836 42520
## 2 7195469 7791066 8363390 8849356 9219569 9495939 9729038
## 3 6339072 6550825 6763717 6972032 7173636 7372303 7575412
## 4 1636420 1609515 1585907 1569137 1560944 1559800 1562331
## 5 NA NA NA NA NA NA NA
## 6 119899678 123151710 126324186 130306661 133224277 136065591 138863468
## X1999 X2000 X2001 X2002 X2003 X2004 X2005
## 1 43149 43841 44646 45486 46310 47025 47575
## 2 9995063 10346986 10803963 11342077 11919887 12476432 12969247
## 3 7793313 8033545 8298798 8587105 8895760 9220109 9556746
## 4 1563784 1560981 1548466 1542010 1531950 1521004 1510064
## 5 NA NA NA NA NA NA NA
## 6 141726120 144718354 147851480 151122471 154562558 158204457 162065758
## X2006 X2007 X2008 X2009 X2010 X2011 X2012
## 1 47936 48137 48249 48341 48467 48649 48876
## 2 13381926 13732622 14054874 14398315 14797694 15264598 15784301
## 3 9904946 10265910 10640262 11029120 11433115 11852498 12286368
## 4 1499018 1487273 1476369 1467513 1461599 1459381 1459114
## 5 NA NA NA NA NA NA NA
## 6 166162755 170476001 174946300 179477763 184000856 188495044 192963246
## X2013 X2014 X2015 X2016
## 1 49127 49378 49598 49799
## 2 16333412 16876928 17389625 17864423
## 3 12732763 13189103 13653564 14125405
## 4 1458648 1457453 1454334 1452292
## 5 NA NA NA NA
## 6 197395503 201789533 206144145 210445642
head(region)
## Country.Code Region IncomeGroup
## 1 ABW Latin America & Caribbean High income
## 2 AFG South Asia Low income
## 3 AGO Sub-Saharan Africa Lower middle income
## 4 ALB Europe & Central Asia Upper middle income
## 5 AND Europe & Central Asia High income
## 6 ARB
## SpecialNotes
## 1 SNA data for 2000-2011 are updated from official government statistics; 1994-1999 from UN databases. Base year has changed from 1995 to 2000.
## 2 Fiscal year end: March 20; reporting period for national accounts data is calendar year, estimated to insure consistency between national accounts and fiscal data. National accounts data are sourced from the IMF and differ from the Central Statistics Organization numbers due to exclusion of the opium economy.
## 3
## 4
## 5 WB-3 code changed from ADO to AND to align with ISO code.
## 6 Arab World aggregate. Arab World is composed of members of the League of Arab States.
## TableName
## 1 Aruba
## 2 Afghanistan
## 3 Angola
## 4 Albania
## 5 Andorra
## 6 Arab World
#The "ma" and "fe" datasets are not tidy: year is a single variable and should occupy one column, but it is spread over 57 columns (X1960 to X2016), one per year. As a result, the population values, which should also occupy one column, are spread over the same 57 columns. The untidy data must therefore first be converted to tidy format.
#Create the tidy datasets "fetidy" and "matidy".
fetidy <-gather(fe,year,female_population,X1960:X2016)
head(fetidy)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Aruba ABW Population, female SP.POP.TOTL.FE.IN X1960
## 2 Afghanistan AFG Population, female SP.POP.TOTL.FE.IN X1960
## 3 Angola AGO Population, female SP.POP.TOTL.FE.IN X1960
## 4 Albania ALB Population, female SP.POP.TOTL.FE.IN X1960
## 5 Andorra AND Population, female SP.POP.TOTL.FE.IN X1960
## 6 Arab World ARB Population, female SP.POP.TOTL.FE.IN X1960
## female_population
## 1 27637
## 2 4346990
## 3 2900597
## 4 780595
## 5 NA
## 6 45909546
matidy <-gather(ma,year,male_population,X1960:X2016)
head(matidy)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Aruba ABW Population, male SP.POP.TOTL.MA.IN X1960
## 2 Afghanistan AFG Population, male SP.POP.TOTL.MA.IN X1960
## 3 Angola AGO Population, male SP.POP.TOTL.MA.IN X1960
## 4 Albania ALB Population, male SP.POP.TOTL.MA.IN X1960
## 5 Andorra AND Population, male SP.POP.TOTL.MA.IN X1960
## 6 Arab World ARB Population, male SP.POP.TOTL.MA.IN X1960
## male_population
## 1 26574
## 2 4649361
## 3 2742585
## 4 828205
## 5 NA
## 6 46581386
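The wide-to-long step above can also be sketched with `pivot_longer()`, which tidyr documents as the successor to `gather()`. This is only an illustrative sketch on a made-up toy table (the data frame `wide` and its values are assumptions, not part of the World Bank data); it additionally strips the "X" prefix and stores the year as an integer in the same step.

```r
library(tidyr)

# Toy wide table in the same shape as "fe"/"ma" (values are made up for illustration).
wide <- data.frame(
  Country.Code = c("AAA", "BBB"),
  X1960 = c(10, 20),
  X1961 = c(11, 21),
  X1962 = c(12, 22),
  stringsAsFactors = FALSE
)

# One row per country-year; the "X" prefix is dropped and year becomes an integer.
long <- pivot_longer(
  wide,
  cols = X1960:X1962,
  names_to = "year",
  names_prefix = "X",
  names_transform = list(year = as.integer),
  values_to = "population"
)

print(long)
```

Handling the prefix while reshaping would make a separate string-extraction step unnecessary, at the cost of requiring a reasonably recent tidyr version.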
#Rename "female_population" in "fetidy" and "male_population" in "matidy" to the common variable name "male_female_population".
colnames(fetidy)[6] <- "male_female_population"
colnames(matidy)[6] <- "male_female_population"
#Stack the two datasets "fetidy" and "matidy" into the dataset "mafemale".
mafemale<-bind_rows(fetidy,matidy)
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): Unequal factor levels: coercing to character
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
## Warning in bind_rows_(x, .id): binding character and factor vector,
## coercing into character vector
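The warnings above arise because the same column is a factor with different levels in the two inputs. One possible way to avoid them, sketched here on toy data, is to convert factor columns to character before stacking. The helper `factors_to_character()` and the toy data frames are hypothetical names introduced only for this illustration; reading the files with `read.csv(..., stringsAsFactors = FALSE)` would be another option.

```r
# Toy stand-ins for "fetidy" and "matidy" (values are made up).
fe_toy <- data.frame(Indicator.Name = factor("Population, female"), value = 1)
ma_toy <- data.frame(Indicator.Name = factor("Population, male"), value = 2)

# Hypothetical helper: turn every factor column of a data frame into character.
factors_to_character <- function(df) {
  is_fct <- vapply(df, is.factor, logical(1))
  df[is_fct] <- lapply(df[is_fct], as.character)
  df
}

# With character columns on both sides there is nothing to coerce.
combined <- dplyr::bind_rows(
  factors_to_character(fe_toy),
  factors_to_character(ma_toy)
)
str(combined)
```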
#Keep only the columns of "region" that are needed for the join with "mafemale".
region<-select(region,Country.Code,Region,IncomeGroup)
head(region)
## Country.Code Region IncomeGroup
## 1 ABW Latin America & Caribbean High income
## 2 AFG South Asia Low income
## 3 AGO Sub-Saharan Africa Lower middle income
## 4 ALB Europe & Central Asia Upper middle income
## 5 AND Europe & Central Asia High income
## 6 ARB
#Left join the region dataset into the mafemale dataset.
population<-mafemale%>%left_join(region,by="Country.Code")
## Warning: Column `Country.Code` joining factors with different levels,
## coercing to character vector
head(population)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Aruba ABW Population, female SP.POP.TOTL.FE.IN X1960
## 2 Afghanistan AFG Population, female SP.POP.TOTL.FE.IN X1960
## 3 Angola AGO Population, female SP.POP.TOTL.FE.IN X1960
## 4 Albania ALB Population, female SP.POP.TOTL.FE.IN X1960
## 5 Andorra AND Population, female SP.POP.TOTL.FE.IN X1960
## 6 Arab World ARB Population, female SP.POP.TOTL.FE.IN X1960
## male_female_population Region IncomeGroup
## 1 27637 Latin America & Caribbean High income
## 2 4346990 South Asia Low income
## 3 2900597 Sub-Saharan Africa Lower middle income
## 4 780595 Europe & Central Asia Upper middle income
## 5 NA Europe & Central Asia High income
## 6 45909546
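After a left join it can be useful to confirm that the row count is unchanged and to list the keys that found no match. The following is a minimal sketch on toy data; `left`, `right` and the code "ZZZ" are made-up stand-ins for "mafemale", "region" and an unmatched country code.

```r
library(dplyr)

# Toy stand-ins for "mafemale" and "region" (values are made up).
left <- data.frame(
  Country.Code = c("CAN", "USA", "ZZZ"),
  pop = c(1, 2, 3),
  stringsAsFactors = FALSE
)
right <- data.frame(
  Country.Code = c("CAN", "USA"),
  Region = c("North America", "North America"),
  stringsAsFactors = FALSE
)

joined    <- left_join(left, right, by = "Country.Code")
unmatched <- anti_join(left, right, by = "Country.Code")

nrow(joined) == nrow(left)  # a left join on a unique key keeps the row count
unmatched$Country.Code      # codes with no region information
```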
#The levels of the variable "Region" show that the data are grouped into "East Asia & Pacific", "Europe & Central Asia", "Latin America & Caribbean", "Middle East & North Africa", "North America", "South Asia" and "Sub-Saharan Africa", plus a blank level for rows that are not assigned to any region of the world.
levels(population$Region)
## [1] "" "East Asia & Pacific"
## [3] "Europe & Central Asia" "Latin America & Caribbean"
## [5] "Middle East & North Africa" "North America"
## [7] "South Asia" "Sub-Saharan Africa"
#Inspect the structure of "population".
str(population)
## 'data.frame': 30096 obs. of 8 variables:
## $ Country.Name : Factor w/ 264 levels "Afghanistan",..: 11 1 6 2 5 8 250 9 10 4 ...
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Indicator.Name : chr "Population, female" "Population, female" "Population, female" "Population, female" ...
## $ Indicator.Code : chr "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" ...
## $ year : chr "X1960" "X1960" "X1960" "X1960" ...
## $ male_female_population: num 27637 4346990 2900597 780595 NA ...
## $ Region : Factor w/ 8 levels "","East Asia & Pacific",..: 4 7 8 3 3 1 5 4 3 2 ...
## $ IncomeGroup : Factor w/ 5 levels "","High income",..: 2 3 4 5 2 1 2 5 4 5 ...
#The dataset can be filtered into subsets by region if a particular region needs to be investigated. In this analysis, the population of North America is analysed.
#Filter the dataset "population" to create a new subset that includes only the North America region, named "north_america".
north_america<-population%>%filter(Region=="North America")
head(north_america)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Bermuda BMU Population, female SP.POP.TOTL.FE.IN X1960
## 2 Canada CAN Population, female SP.POP.TOTL.FE.IN X1960
## 3 United States USA Population, female SP.POP.TOTL.FE.IN X1960
## 4 Bermuda BMU Population, female SP.POP.TOTL.FE.IN X1961
## 5 Canada CAN Population, female SP.POP.TOTL.FE.IN X1961
## 6 United States USA Population, female SP.POP.TOTL.FE.IN X1961
## male_female_population Region IncomeGroup
## 1 NA North America High income
## 2 8851741 North America High income
## 3 91167688 North America High income
## 4 NA North America High income
## 5 9040305 North America High income
## 6 92724776 North America High income
Summarise the types of variables and data structures, check the attributes in the data.
#Check the class of each variable in "population".
sapply(population, class)
## Country.Name Country.Code Indicator.Name
## "factor" "character" "character"
## Indicator.Code year male_female_population
## "character" "character" "numeric"
## Region IncomeGroup
## "factor" "factor"
#Check the levels of the factor variable "IncomeGroup" and order them from lowest to highest income. The blank level "" is not listed in the new levels, so those rows become NA.
levels(population$IncomeGroup)
## [1] "" "High income" "Low income"
## [4] "Lower middle income" "Upper middle income"
population$IncomeGroup<-factor(population$IncomeGroup,levels = c("Low income", "Lower middle income", "Upper middle income","High income"), ordered=TRUE)
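The effect of this re-levelling can be illustrated on a small made-up vector (the vector `income` below is an assumption used only for demonstration): values not listed in `levels` become `NA`, and the ordered factor supports comparisons between income groups.

```r
income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")

# Toy vector; the empty string mimics rows that have no income group.
income <- factor(c("High income", "", "Low income"),
                 levels = income_levels, ordered = TRUE)

is.na(income)                   # the "" entry is not a listed level, so it is NA
income[1] > income[3]           # ordered factors can be compared
table(income, useNA = "ifany")  # counts per level, including the NA
```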
#The "year" variable is a character variable and should be changed to a date type.
#It is a character variable because each value begins with a capital letter "X". The "X" has to be removed by extracting only the digits from the "year" variable.
population$year <- str_extract(population$year , "[0-9]+")
#Then convert the character variable "year" to the Date type.
population$year<-as.Date(as.character(population$year), format ="%Y")
class(population$year)
## [1] "Date"
#Now the variable "year" is of the Date type.
head(population$year)
## [1] "1960-06-23" "1960-06-23" "1960-06-23" "1960-06-23" "1960-06-23"
## [6] "1960-06-23"
#The Date values are printed in year-month-day format. Because only the year was supplied, as.Date() filled in the month and day from the date on which the code was run.
#Only the year part is needed, so it is extracted from the Date variable.
population$year <-format(population$year,"%Y")
head(population$year)
## [1] "1960" "1960" "1960" "1960" "1960" "1960"
class(population$year)
## [1] "character"
#Now only the year of each observation is shown. Note that format() returns a character vector, so "year" is of class character again.
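Parsing a bare year with `as.Date(..., format = "%Y")` fills in the month and day from the current date, so the stored dates depend on when the code runs. A hedged alternative, sketched on a toy vector (`year_raw` is a made-up example in the same pattern as the gathered column names), is either to keep the year as an integer or to pin it explicitly to 1 January.

```r
year_raw <- c("X1960", "X1961", "X2016")  # same pattern as the gathered column names

# Option 1: keep the year as an integer.
year_int <- as.integer(sub("^X", "", year_raw))

# Option 2: a true Date pinned to 1 January, independent of the run date.
year_date <- as.Date(paste0(year_int, "-01-01"))

year_int
format(year_date, "%Y")
```

Which option is preferable depends on the later analysis: an integer is convenient for plotting and modelling, while a Date works with date-aware functions.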
typeof(population$male_female_population)
## [1] "double"
#The numeric variable "male_female_population" is stored as type double.
#Check the class of each variable in "population" again.
sapply(population, class)
## $Country.Name
## [1] "factor"
##
## $Country.Code
## [1] "character"
##
## $Indicator.Name
## [1] "character"
##
## $Indicator.Code
## [1] "character"
##
## $year
## [1] "character"
##
## $male_female_population
## [1] "numeric"
##
## $Region
## [1] "factor"
##
## $IncomeGroup
## [1] "ordered" "factor"
#The classes of all variables after the conversions are shown above.
#The "Indicator.Name" variable denotes whether an observation is the male or the female population, so it should be a factor rather than a character variable. It is therefore converted to a factor and its levels are renamed "female" and "male".
population$Indicator.Name<-as.factor(population$Indicator.Name)
levels(population$Indicator.Name)[levels(population$Indicator.Name)=="Population, female"] <- "female"
levels(population$Indicator.Name)[levels(population$Indicator.Name)=="Population, male"] <- "male"
class(population$Indicator.Name)
## [1] "factor"
levels(population$Indicator.Name)
## [1] "female" "male"
#Finally, inspect the structure of the "population" data frame.
str(population)
## 'data.frame': 30096 obs. of 8 variables:
## $ Country.Name : Factor w/ 264 levels "Afghanistan",..: 11 1 6 2 5 8 250 9 10 4 ...
## $ Country.Code : chr "ABW" "AFG" "AGO" "ALB" ...
## $ Indicator.Name : Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...
## $ Indicator.Code : chr "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" "SP.POP.TOTL.FE.IN" ...
## $ year : chr "1960" "1960" "1960" "1960" ...
## $ male_female_population: num 27637 4346990 2900597 780595 NA ...
## $ Region : Factor w/ 8 levels "","East Asia & Pacific",..: 4 7 8 3 3 1 5 4 3 2 ...
## $ IncomeGroup : Ord.factor w/ 4 levels "Low income"<"Lower middle income"<..: 4 1 2 3 4 NA 4 3 2 3 ...
Check if the data conforms to the tidy data principles.
#The target dataset is the North America region. The tidy-format issue has already been dealt with: the "population" dataset is in tidy format, and so is its subset "north_america".
north_america<-population%>%filter(Region=="North America")
head(north_america)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Bermuda BMU female SP.POP.TOTL.FE.IN 1960
## 2 Canada CAN female SP.POP.TOTL.FE.IN 1960
## 3 United States USA female SP.POP.TOTL.FE.IN 1960
## 4 Bermuda BMU female SP.POP.TOTL.FE.IN 1961
## 5 Canada CAN female SP.POP.TOTL.FE.IN 1961
## 6 United States USA female SP.POP.TOTL.FE.IN 1961
## male_female_population Region IncomeGroup
## 1 NA North America High income
## 2 8851741 North America High income
## 3 91167688 North America High income
## 4 NA North America High income
## 5 9040305 North America High income
## 6 92724776 North America High income
sapply(north_america, class)
## $Country.Name
## [1] "factor"
##
## $Country.Code
## [1] "character"
##
## $Indicator.Name
## [1] "factor"
##
## $Indicator.Code
## [1] "character"
##
## $year
## [1] "character"
##
## $male_female_population
## [1] "numeric"
##
## $Region
## [1] "factor"
##
## $IncomeGroup
## [1] "ordered" "factor"
#Check which countries are included in the subset "north_america".
levels(droplevels(north_america$Country.Name))
## [1] "Bermuda" "Canada" "United States"
#It can be seen that 3 countries are included in North America: "Bermuda", "Canada" and "United States".
#Next, scan for and deal with missing values, inconsistencies and outliers in the dataset.
colSums(is.na(north_america))
## Country.Name Country.Code Indicator.Name
## 0 0 0
## Indicator.Code year male_female_population
## 0 0 114
## Region IncomeGroup
## 0 0
#It can be seen that no variable in the dataset has missing values except "male_female_population", which has 114 missing values.
#Count the missing population values for each country in North America.
ber<-north_america%>% filter(Country.Name=="Bermuda")
canada<-north_america%>% filter(Country.Name=="Canada")
us<-north_america%>% filter(Country.Name=="United States")
sum(is.na(ber$male_female_population))
## [1] 114
sum(is.na(canada$male_female_population))
## [1] 0
sum(is.na(us$male_female_population))
## [1] 0
#It can be seen that there are no missing values for the United States or Canada, whereas the population information for Bermuda is entirely missing.
#The original dataset contains no information on the population of Bermuda, so there is no way to recode these missing values other than to delete them.
#The Bermuda rows therefore contribute nothing to the population analysis and need to be deleted.
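The per-country counts above were obtained by filtering each country separately. As a minimal sketch, the same counts can be produced in one call with base R's `tapply()`; the data frame `toy` and its values are made up for illustration.

```r
# Toy stand-in for "north_america" (values are made up).
toy <- data.frame(
  Country.Name = c("Bermuda", "Bermuda", "Canada", "Canada"),
  male_female_population = c(NA, NA, 100, 110)
)

# Number of missing population values per country.
na_by_country <- tapply(is.na(toy$male_female_population), toy$Country.Name, sum)
na_by_country
```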
#Drop the Bermuda rows. Note that na.omit() removes every row that has a missing value in any column.
north_america<-na.omit(north_america)
#Check which countries remain in "north_america".
levels(droplevels(north_america$Country.Name))
## [1] "Canada" "United States"
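Because `na.omit()` drops a row when any column is missing, it can remove more than intended on other subsets of the data. The sketch below, on a made-up data frame `toy`, contrasts it with a filter that targets only the population column; in "north_america" the two approaches give the same result because no other column has missing values.

```r
# Toy data with missing values in two different columns (values are made up).
toy <- data.frame(
  country = c("A", "B", "C"),
  population = c(100, NA, 300),
  note = c(NA, "x", "y"),
  stringsAsFactors = FALSE
)

all_complete <- na.omit(toy)                   # drops rows with NA in ANY column
pop_complete <- toy[!is.na(toy$population), ]  # drops rows only where population is NA

nrow(all_complete)
nrow(pop_complete)
```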
north_america
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 2 Canada CAN female SP.POP.TOTL.FE.IN 1960
## 3 United States USA female SP.POP.TOTL.FE.IN 1960
## 5 Canada CAN female SP.POP.TOTL.FE.IN 1961
## 6 United States USA female SP.POP.TOTL.FE.IN 1961
## 8 Canada CAN female SP.POP.TOTL.FE.IN 1962
## 9 United States USA female SP.POP.TOTL.FE.IN 1962
## 11 Canada CAN female SP.POP.TOTL.FE.IN 1963
## 12 United States USA female SP.POP.TOTL.FE.IN 1963
## 14 Canada CAN female SP.POP.TOTL.FE.IN 1964
## 15 United States USA female SP.POP.TOTL.FE.IN 1964
## 17 Canada CAN female SP.POP.TOTL.FE.IN 1965
## 18 United States USA female SP.POP.TOTL.FE.IN 1965
## 20 Canada CAN female SP.POP.TOTL.FE.IN 1966
## 21 United States USA female SP.POP.TOTL.FE.IN 1966
## 23 Canada CAN female SP.POP.TOTL.FE.IN 1967
## 24 United States USA female SP.POP.TOTL.FE.IN 1967
## 26 Canada CAN female SP.POP.TOTL.FE.IN 1968
## 27 United States USA female SP.POP.TOTL.FE.IN 1968
## 29 Canada CAN female SP.POP.TOTL.FE.IN 1969
## 30 United States USA female SP.POP.TOTL.FE.IN 1969
## 32 Canada CAN female SP.POP.TOTL.FE.IN 1970
## 33 United States USA female SP.POP.TOTL.FE.IN 1970
## 35 Canada CAN female SP.POP.TOTL.FE.IN 1971
## 36 United States USA female SP.POP.TOTL.FE.IN 1971
## 38 Canada CAN female SP.POP.TOTL.FE.IN 1972
## 39 United States USA female SP.POP.TOTL.FE.IN 1972
## 41 Canada CAN female SP.POP.TOTL.FE.IN 1973
## 42 United States USA female SP.POP.TOTL.FE.IN 1973
## 44 Canada CAN female SP.POP.TOTL.FE.IN 1974
## 45 United States USA female SP.POP.TOTL.FE.IN 1974
## 47 Canada CAN female SP.POP.TOTL.FE.IN 1975
## 48 United States USA female SP.POP.TOTL.FE.IN 1975
## 50 Canada CAN female SP.POP.TOTL.FE.IN 1976
## 51 United States USA female SP.POP.TOTL.FE.IN 1976
## 53 Canada CAN female SP.POP.TOTL.FE.IN 1977
## 54 United States USA female SP.POP.TOTL.FE.IN 1977
## 56 Canada CAN female SP.POP.TOTL.FE.IN 1978
## 57 United States USA female SP.POP.TOTL.FE.IN 1978
## 59 Canada CAN female SP.POP.TOTL.FE.IN 1979
## 60 United States USA female SP.POP.TOTL.FE.IN 1979
## 62 Canada CAN female SP.POP.TOTL.FE.IN 1980
## 63 United States USA female SP.POP.TOTL.FE.IN 1980
## 65 Canada CAN female SP.POP.TOTL.FE.IN 1981
## 66 United States USA female SP.POP.TOTL.FE.IN 1981
## 68 Canada CAN female SP.POP.TOTL.FE.IN 1982
## 69 United States USA female SP.POP.TOTL.FE.IN 1982
## 71 Canada CAN female SP.POP.TOTL.FE.IN 1983
## 72 United States USA female SP.POP.TOTL.FE.IN 1983
## 74 Canada CAN female SP.POP.TOTL.FE.IN 1984
## 75 United States USA female SP.POP.TOTL.FE.IN 1984
## 77 Canada CAN female SP.POP.TOTL.FE.IN 1985
## 78 United States USA female SP.POP.TOTL.FE.IN 1985
## 80 Canada CAN female SP.POP.TOTL.FE.IN 1986
## 81 United States USA female SP.POP.TOTL.FE.IN 1986
## 83 Canada CAN female SP.POP.TOTL.FE.IN 1987
## 84 United States USA female SP.POP.TOTL.FE.IN 1987
## 86 Canada CAN female SP.POP.TOTL.FE.IN 1988
## 87 United States USA female SP.POP.TOTL.FE.IN 1988
## 89 Canada CAN female SP.POP.TOTL.FE.IN 1989
## 90 United States USA female SP.POP.TOTL.FE.IN 1989
## 92 Canada CAN female SP.POP.TOTL.FE.IN 1990
## 93 United States USA female SP.POP.TOTL.FE.IN 1990
## 95 Canada CAN female SP.POP.TOTL.FE.IN 1991
## 96 United States USA female SP.POP.TOTL.FE.IN 1991
## 98 Canada CAN female SP.POP.TOTL.FE.IN 1992
## 99 United States USA female SP.POP.TOTL.FE.IN 1992
## 101 Canada CAN female SP.POP.TOTL.FE.IN 1993
## 102 United States USA female SP.POP.TOTL.FE.IN 1993
## 104 Canada CAN female SP.POP.TOTL.FE.IN 1994
## 105 United States USA female SP.POP.TOTL.FE.IN 1994
## 107 Canada CAN female SP.POP.TOTL.FE.IN 1995
## 108 United States USA female SP.POP.TOTL.FE.IN 1995
## 110 Canada CAN female SP.POP.TOTL.FE.IN 1996
## 111 United States USA female SP.POP.TOTL.FE.IN 1996
## 113 Canada CAN female SP.POP.TOTL.FE.IN 1997
## 114 United States USA female SP.POP.TOTL.FE.IN 1997
## 116 Canada CAN female SP.POP.TOTL.FE.IN 1998
## 117 United States USA female SP.POP.TOTL.FE.IN 1998
## 119 Canada CAN female SP.POP.TOTL.FE.IN 1999
## 120 United States USA female SP.POP.TOTL.FE.IN 1999
## 122 Canada CAN female SP.POP.TOTL.FE.IN 2000
## 123 United States USA female SP.POP.TOTL.FE.IN 2000
## 125 Canada CAN female SP.POP.TOTL.FE.IN 2001
## 126 United States USA female SP.POP.TOTL.FE.IN 2001
## 128 Canada CAN female SP.POP.TOTL.FE.IN 2002
## 129 United States USA female SP.POP.TOTL.FE.IN 2002
## 131 Canada CAN female SP.POP.TOTL.FE.IN 2003
## 132 United States USA female SP.POP.TOTL.FE.IN 2003
## 134 Canada CAN female SP.POP.TOTL.FE.IN 2004
## 135 United States USA female SP.POP.TOTL.FE.IN 2004
## 137 Canada CAN female SP.POP.TOTL.FE.IN 2005
## 138 United States USA female SP.POP.TOTL.FE.IN 2005
## 140 Canada CAN female SP.POP.TOTL.FE.IN 2006
## 141 United States USA female SP.POP.TOTL.FE.IN 2006
## 143 Canada CAN female SP.POP.TOTL.FE.IN 2007
## 144 United States USA female SP.POP.TOTL.FE.IN 2007
## 146 Canada CAN female SP.POP.TOTL.FE.IN 2008
## 147 United States USA female SP.POP.TOTL.FE.IN 2008
## 149 Canada CAN female SP.POP.TOTL.FE.IN 2009
## 150 United States USA female SP.POP.TOTL.FE.IN 2009
## 152 Canada CAN female SP.POP.TOTL.FE.IN 2010
## 153 United States USA female SP.POP.TOTL.FE.IN 2010
## 155 Canada CAN female SP.POP.TOTL.FE.IN 2011
## 156 United States USA female SP.POP.TOTL.FE.IN 2011
## 158 Canada CAN female SP.POP.TOTL.FE.IN 2012
## 159 United States USA female SP.POP.TOTL.FE.IN 2012
## 161 Canada CAN female SP.POP.TOTL.FE.IN 2013
## 162 United States USA female SP.POP.TOTL.FE.IN 2013
## 164 Canada CAN female SP.POP.TOTL.FE.IN 2014
## 165 United States USA female SP.POP.TOTL.FE.IN 2014
## 167 Canada CAN female SP.POP.TOTL.FE.IN 2015
## 168 United States USA female SP.POP.TOTL.FE.IN 2015
## 170 Canada CAN female SP.POP.TOTL.FE.IN 2016
## 171 United States USA female SP.POP.TOTL.FE.IN 2016
## 173 Canada CAN male SP.POP.TOTL.MA.IN 1960
## 174 United States USA male SP.POP.TOTL.MA.IN 1960
## 176 Canada CAN male SP.POP.TOTL.MA.IN 1961
## 177 United States USA male SP.POP.TOTL.MA.IN 1961
## 179 Canada CAN male SP.POP.TOTL.MA.IN 1962
## 180 United States USA male SP.POP.TOTL.MA.IN 1962
## 182 Canada CAN male SP.POP.TOTL.MA.IN 1963
## 183 United States USA male SP.POP.TOTL.MA.IN 1963
## 185 Canada CAN male SP.POP.TOTL.MA.IN 1964
## 186 United States USA male SP.POP.TOTL.MA.IN 1964
## 188 Canada CAN male SP.POP.TOTL.MA.IN 1965
## 189 United States USA male SP.POP.TOTL.MA.IN 1965
## 191 Canada CAN male SP.POP.TOTL.MA.IN 1966
## 192 United States USA male SP.POP.TOTL.MA.IN 1966
## 194 Canada CAN male SP.POP.TOTL.MA.IN 1967
## 195 United States USA male SP.POP.TOTL.MA.IN 1967
## 197 Canada CAN male SP.POP.TOTL.MA.IN 1968
## 198 United States USA male SP.POP.TOTL.MA.IN 1968
## 200 Canada CAN male SP.POP.TOTL.MA.IN 1969
## 201 United States USA male SP.POP.TOTL.MA.IN 1969
## 203 Canada CAN male SP.POP.TOTL.MA.IN 1970
## 204 United States USA male SP.POP.TOTL.MA.IN 1970
## 206 Canada CAN male SP.POP.TOTL.MA.IN 1971
## 207 United States USA male SP.POP.TOTL.MA.IN 1971
## 209 Canada CAN male SP.POP.TOTL.MA.IN 1972
## 210 United States USA male SP.POP.TOTL.MA.IN 1972
## 212 Canada CAN male SP.POP.TOTL.MA.IN 1973
## 213 United States USA male SP.POP.TOTL.MA.IN 1973
## 215 Canada CAN male SP.POP.TOTL.MA.IN 1974
## 216 United States USA male SP.POP.TOTL.MA.IN 1974
## 218 Canada CAN male SP.POP.TOTL.MA.IN 1975
## 219 United States USA male SP.POP.TOTL.MA.IN 1975
## 221 Canada CAN male SP.POP.TOTL.MA.IN 1976
## 222 United States USA male SP.POP.TOTL.MA.IN 1976
## 224 Canada CAN male SP.POP.TOTL.MA.IN 1977
## 225 United States USA male SP.POP.TOTL.MA.IN 1977
## 227 Canada CAN male SP.POP.TOTL.MA.IN 1978
## 228 United States USA male SP.POP.TOTL.MA.IN 1978
## 230 Canada CAN male SP.POP.TOTL.MA.IN 1979
## 231 United States USA male SP.POP.TOTL.MA.IN 1979
## 233 Canada CAN male SP.POP.TOTL.MA.IN 1980
## 234 United States USA male SP.POP.TOTL.MA.IN 1980
## 236 Canada CAN male SP.POP.TOTL.MA.IN 1981
## 237 United States USA male SP.POP.TOTL.MA.IN 1981
## 239 Canada CAN male SP.POP.TOTL.MA.IN 1982
## 240 United States USA male SP.POP.TOTL.MA.IN 1982
## 242 Canada CAN male SP.POP.TOTL.MA.IN 1983
## 243 United States USA male SP.POP.TOTL.MA.IN 1983
## 245 Canada CAN male SP.POP.TOTL.MA.IN 1984
## 246 United States USA male SP.POP.TOTL.MA.IN 1984
## 248 Canada CAN male SP.POP.TOTL.MA.IN 1985
## 249 United States USA male SP.POP.TOTL.MA.IN 1985
## 251 Canada CAN male SP.POP.TOTL.MA.IN 1986
## 252 United States USA male SP.POP.TOTL.MA.IN 1986
## 254 Canada CAN male SP.POP.TOTL.MA.IN 1987
## 255 United States USA male SP.POP.TOTL.MA.IN 1987
## 257 Canada CAN male SP.POP.TOTL.MA.IN 1988
## 258 United States USA male SP.POP.TOTL.MA.IN 1988
## 260 Canada CAN male SP.POP.TOTL.MA.IN 1989
## 261 United States USA male SP.POP.TOTL.MA.IN 1989
## 263 Canada CAN male SP.POP.TOTL.MA.IN 1990
## 264 United States USA male SP.POP.TOTL.MA.IN 1990
## 266 Canada CAN male SP.POP.TOTL.MA.IN 1991
## 267 United States USA male SP.POP.TOTL.MA.IN 1991
## 269 Canada CAN male SP.POP.TOTL.MA.IN 1992
## 270 United States USA male SP.POP.TOTL.MA.IN 1992
## 272 Canada CAN male SP.POP.TOTL.MA.IN 1993
## 273 United States USA male SP.POP.TOTL.MA.IN 1993
## 275 Canada CAN male SP.POP.TOTL.MA.IN 1994
## 276 United States USA male SP.POP.TOTL.MA.IN 1994
## 278 Canada CAN male SP.POP.TOTL.MA.IN 1995
## 279 United States USA male SP.POP.TOTL.MA.IN 1995
## 281 Canada CAN male SP.POP.TOTL.MA.IN 1996
## 282 United States USA male SP.POP.TOTL.MA.IN 1996
## 284 Canada CAN male SP.POP.TOTL.MA.IN 1997
## 285 United States USA male SP.POP.TOTL.MA.IN 1997
## 287 Canada CAN male SP.POP.TOTL.MA.IN 1998
## 288 United States USA male SP.POP.TOTL.MA.IN 1998
## 290 Canada CAN male SP.POP.TOTL.MA.IN 1999
## 291 United States USA male SP.POP.TOTL.MA.IN 1999
## 293 Canada CAN male SP.POP.TOTL.MA.IN 2000
## 294 United States USA male SP.POP.TOTL.MA.IN 2000
## 296 Canada CAN male SP.POP.TOTL.MA.IN 2001
## 297 United States USA male SP.POP.TOTL.MA.IN 2001
## 299 Canada CAN male SP.POP.TOTL.MA.IN 2002
## 300 United States USA male SP.POP.TOTL.MA.IN 2002
## 302 Canada CAN male SP.POP.TOTL.MA.IN 2003
## 303 United States USA male SP.POP.TOTL.MA.IN 2003
## 305 Canada CAN male SP.POP.TOTL.MA.IN 2004
## 306 United States USA male SP.POP.TOTL.MA.IN 2004
## 308 Canada CAN male SP.POP.TOTL.MA.IN 2005
## 309 United States USA male SP.POP.TOTL.MA.IN 2005
## 311 Canada CAN male SP.POP.TOTL.MA.IN 2006
## 312 United States USA male SP.POP.TOTL.MA.IN 2006
## 314 Canada CAN male SP.POP.TOTL.MA.IN 2007
## 315 United States USA male SP.POP.TOTL.MA.IN 2007
## 317 Canada CAN male SP.POP.TOTL.MA.IN 2008
## 318 United States USA male SP.POP.TOTL.MA.IN 2008
## 320 Canada CAN male SP.POP.TOTL.MA.IN 2009
## 321 United States USA male SP.POP.TOTL.MA.IN 2009
## 323 Canada CAN male SP.POP.TOTL.MA.IN 2010
## 324 United States USA male SP.POP.TOTL.MA.IN 2010
## 326 Canada CAN male SP.POP.TOTL.MA.IN 2011
## 327 United States USA male SP.POP.TOTL.MA.IN 2011
## 329 Canada CAN male SP.POP.TOTL.MA.IN 2012
## 330 United States USA male SP.POP.TOTL.MA.IN 2012
## 332 Canada CAN male SP.POP.TOTL.MA.IN 2013
## 333 United States USA male SP.POP.TOTL.MA.IN 2013
## 335 Canada CAN male SP.POP.TOTL.MA.IN 2014
## 336 United States USA male SP.POP.TOTL.MA.IN 2014
## 338 Canada CAN male SP.POP.TOTL.MA.IN 2015
## 339 United States USA male SP.POP.TOTL.MA.IN 2015
## 341 Canada CAN male SP.POP.TOTL.MA.IN 2016
## 342 United States USA male SP.POP.TOTL.MA.IN 2016
## male_female_population Region IncomeGroup
## 2 8851741 North America High income
## 3 91167688 North America High income
## 5 9040305 North America High income
## 6 92724776 North America High income
## 8 9221831 North America High income
## 9 94190774 North America High income
## 11 9408033 North America High income
## 12 95586520 North America High income
## 14 9599279 North America High income
## 15 96963098 North America High income
## 17 9784951 North America High income
## 18 98236551 North America High income
## 20 9977130 North America High income
## 21 99448848 North America High income
## 23 10164729 North America High income
## 24 100623425 North America High income
## 26 10335090 North America High income
## 27 101724070 North America High income
## 29 10480782 North America High income
## 30 102805810 North America High income
## 32 10632184 North America High income
## 33 104076122 North America High income
## 35 10795924 North America High income
## 36 105443973 North America High income
## 38 10972537 North America High income
## 39 106604253 North America High income
## 41 11163418 North America High income
## 42 107643874 North America High income
## 44 11370241 North America High income
## 45 108655509 North America High income
## 47 11594419 North America High income
## 48 109771934 North America High income
## 50 11758334 North America High income
## 51 110880498 North America High income
## 53 11908985 North America High income
## 54 112077291 North America High income
## 56 12041878 North America High income
## 57 113350788 North America High income
## 59 12175105 North America High income
## 60 114675212 North America High income
## 62 12344789 North America High income
## 63 115822943 North America High income
## 65 12508542 North America High income
## 66 116977168 North America High income
## 68 12668611 North America High income
## 69 118084774 North America High income
## 71 12803432 North America High income
## 72 119144238 North America High income
## 74 12933351 North America High income
## 75 120160657 North America High income
## 77 13059560 North America High income
## 78 121228156 North America High income
## 80 13195995 North America High income
## 81 122374325 North America High income
## 83 13373841 North America High income
## 84 123510043 North America High income
## 86 13550793 North America High income
## 87 124677136 North America High income
## 89 13798061 North America High income
## 90 125885861 North America High income
## 92 14009770 North America High income
## 93 127314146 North America High income
## 95 14206688 North America High income
## 96 128993474 North America High income
## 98 14387814 North America High income
## 99 130735361 North America High income
## 101 14551739 North America High income
## 102 132392823 North America High income
## 104 14696788 North America High income
## 105 133942111 North America High income
## 107 14821658 North America High income
## 108 135465255 North America High income
## 110 14982770 North America High income
## 111 136974478 North America High income
## 113 15140827 North America High income
## 114 138561200 North America High income
## 116 15270066 North America High income
## 117 140117226 North America High income
## 119 15394068 North America High income
## 120 141669463 North America High income
## 122 15527833 North America High income
## 123 143190463 North America High income
## 125 15682901 North America High income
## 126 144552310 North America High income
## 128 15821913 North America High income
## 129 145840225 North America High income
## 131 15978061 North America High income
## 132 147043924 North America High income
## 134 16136529 North America High income
## 135 148361957 North America High income
## 137 16293676 North America High income
## 138 149693614 North America High income
## 140 16420913 North America High income
## 141 151110004 North America High income
## 143 16577614 North America High income
## 144 152526919 North America High income
## 146 16754869 North America High income
## 147 153952230 North America High income
## 149 16945444 North America High income
## 150 155280807 North America High income
## 152 17134047 North America High income
## 153 156551423 North America High income
## 155 17304245 North America High income
## 156 157681176 North America High income
## 158 17510939 North America High income
## 159 158813852 North America High income
## 161 17715017 North America High income
## 162 159877330 North America High income
## 164 17908981 North America High income
## 165 161017913 North America High income
## 167 18058362 North America High income
## 168 162149165 North America High income
## 170 18274152 North America High income
## 171 163233094 North America High income
## 173 9057268 North America High income
## 174 89503312 North America High income
## 176 9230695 North America High income
## 177 90966224 North America High income
## 179 9392169 North America High income
## 180 92347226 North America High income
## 182 9555967 North America High income
## 183 93655480 North America High income
## 185 9725721 North America High income
## 186 94925902 North America High income
## 188 9893049 North America High income
## 189 96066449 North America High income
## 191 10070870 North America High income
## 192 97111152 North America High income
## 194 10247271 North America High income
## 195 98088575 North America High income
## 197 10408910 North America High income
## 198 98981930 North America High income
## 200 10547218 North America High income
## 201 99871190 North America High income
## 203 10691816 North America High income
## 204 100975878 North America High income
## 206 10849611 North America High income
## 207 102217027 North America High income
## 209 11021094 North America High income
## 210 103291747 North America High income
## 212 11205990 North America High income
## 213 104265126 North America High income
## 215 11403846 North America High income
## 216 105198491 North America High income
## 218 11614581 North America High income
## 219 106201066 North America High income
## 221 11759666 North America High income
## 222 107154502 North America High income
## 224 11887015 North America High income
## 225 108161709 North America High income
## 227 11994122 North America High income
## 228 109234212 North America High income
## 230 12101895 North America High income
## 231 110379788 North America High income
## 233 12248211 North America High income
## 234 111402057 North America High income
## 236 12391458 North America High income
## 237 112488832 North America High income
## 239 12533389 North America High income
## 240 113579226 North America High income
## 242 12652568 North America High income
## 243 114647762 North America High income
## 245 12768649 North America High income
## 246 115664343 North America High income
## 248 12882440 North America High income
## 249 116695844 North America High income
## 251 13008005 North America High income
## 252 117758675 North America High income
## 254 13176159 North America High income
## 255 118778957 North America High income
## 257 13344207 North America High income
## 258 119821864 North America High income
## 260 13580939 North America High income
## 261 120933139 North America High income
## 263 13781230 North America High income
## 264 122308854 North America High income
## 266 13964994 North America High income
## 267 123987526 North America High income
## 269 14131783 North America High income
## 270 125778639 North America High income
## 272 14281671 North America High income
## 273 127526177 North America High income
## 275 14415118 North America High income
## 276 129183889 North America High income
## 278 14532342 North America High income
## 279 130812745 North America High income
## 281 14689130 North America High income
## 282 132419522 North America High income
## 284 14846373 North America High income
## 285 134095800 North America High income
## 287 14977834 North America High income
## 288 135736774 North America High income
## 290 15105132 North America High income
## 291 137370537 North America High income
## 293 15241867 North America High income
## 294 138971948 North America High income
## 296 15398999 North America High income
## 297 140416645 North America High income
## 299 15540087 North America High income
## 300 141784968 North America High income
## 302 15697939 North America High income
## 303 143064009 North America High income
## 305 15858471 North America High income
## 306 144443341 North America High income
## 308 16018324 North America High income
## 309 145822985 North America High income
## 311 16149592 North America High income
## 312 147269908 North America High income
## 314 16310314 North America High income
## 315 148704288 North America High income
## 317 16490904 North America High income
## 318 150141736 North America High income
## 320 16683127 North America High income
## 321 151490722 North America High income
## 323 16871227 North America High income
## 324 152796770 North America High income
## 326 17038535 North America High income
## 327 153982182 North America High income
## 329 17239606 North America High income
## 330 155184527 North America High income
## 332 17437353 North America High income
## 333 156327578 North America High income
## 335 17626367 North America High income
## 336 157545543 North America High income
## 338 17774151 North America High income
## 339 158747453 North America High income
## 341 17990452 North America High income
## 342 159894419 North America High income
#Now the north_america dataset contains only two countries, "Canada" and "United States", with no missing values in any variable of any observation.
#Like the tidy-format population dataset, the north_america dataset is also tidy: each variable has its own column, each observation has its own row, and each value has its own cell.
Create/mutate two variables from the existing variables to show the total population and the percentage of male and female population each year for each country.
#More information, such as the total population and the percentage of male and female population each year for both Canada and the United States, can be derived by creating new variables from the existing male and female population figures.
#Create a new dataset called "k1" that holds the total population each year for both Canada and the United States.
k1<-aggregate(male_female_population~year+Country.Name,north_america,sum)
#Rename the variable "male_female_population" to "total.population".
colnames(k1)[3] <- "total.population"
head(k1)
## year Country.Name total.population
## 1 1960 Canada 17909009
## 2 1961 Canada 18271000
## 3 1962 Canada 18614000
## 4 1963 Canada 18964000
## 5 1964 Canada 19325000
## 6 1965 Canada 19678000
#The total population (male plus female) for each year is now available for both Canada and the United States.
#Then left join the dataset "k1" to the dataset "north_america".
north_america<-north_america%>%left_join(k1)
## Joining, by = c("Country.Name", "year")
#The new variable "total.population", which holds the total population for each observation, has been created.
head(north_america)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Canada CAN female SP.POP.TOTL.FE.IN 1960
## 2 United States USA female SP.POP.TOTL.FE.IN 1960
## 3 Canada CAN female SP.POP.TOTL.FE.IN 1961
## 4 United States USA female SP.POP.TOTL.FE.IN 1961
## 5 Canada CAN female SP.POP.TOTL.FE.IN 1962
## 6 United States USA female SP.POP.TOTL.FE.IN 1962
## male_female_population Region IncomeGroup total.population
## 1 8851741 North America High income 17909009
## 2 91167688 North America High income 180671000
## 3 9040305 North America High income 18271000
## 4 92724776 North America High income 183691000
## 5 9221831 North America High income 18614000
## 6 94190774 North America High income 186538000
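#As an aside, the aggregate-then-join step can also be done in a single pass. The following is only a minimal sketch using base R's ave(); the data frame "toy" and its values are made up for illustration and are not the World Bank figures.

```r
# Toy data: two hypothetical countries, one year, female and male counts.
toy <- data.frame(
  Country.Name = c("A", "A", "B", "B"),
  year = c("1960", "1960", "1960", "1960"),
  Indicator.Name = c("female", "male", "female", "male"),
  male_female_population = c(40, 60, 300, 100)
)

# ave() returns one value per row: the sum within each country/year group.
toy$total.population <- ave(
  toy$male_female_population,
  toy$Country.Name, toy$year,
  FUN = sum
)

toy
```

#This avoids the intermediate dataset and the join, at the cost of being less explicit about the grouping keys.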
#Create a new variable "population.percentage", which represents the percentage of the total population of each country in each year that is male or female.
north_america<- north_america%>% mutate(population.percentage=male_female_population/total.population)
north_america$population.percentage <- paste(round((north_america$population.percentage)*100,digits=2),"%",sep="")
head(north_america)
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Canada CAN female SP.POP.TOTL.FE.IN 1960
## 2 United States USA female SP.POP.TOTL.FE.IN 1960
## 3 Canada CAN female SP.POP.TOTL.FE.IN 1961
## 4 United States USA female SP.POP.TOTL.FE.IN 1961
## 5 Canada CAN female SP.POP.TOTL.FE.IN 1962
## 6 United States USA female SP.POP.TOTL.FE.IN 1962
## male_female_population Region IncomeGroup total.population
## 1 8851741 North America High income 17909009
## 2 91167688 North America High income 180671000
## 3 9040305 North America High income 18271000
## 4 92724776 North America High income 183691000
## 5 9221831 North America High income 18614000
## 6 94190774 North America High income 186538000
## population.percentage
## 1 49.43%
## 2 50.46%
## 3 49.48%
## 4 50.48%
## 5 49.54%
## 6 50.49%
#Now the new variables "total.population" and "population.percentage" have been created, representing the total population and the percentage of male and female population each year for both Canada and the United States.
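#A possible refinement, sketched here with made-up numbers: keep the percentage as a numeric column and build the "%" label in a separate column used only for display. The column names "share" and "share_label" are hypothetical.

```r
# Toy values, not taken from the dataset.
toy <- data.frame(part = c(40, 60), total = c(100, 100))

# Numeric percentage, still usable in calculations and plots.
toy$share <- round(100 * toy$part / toy$total, digits = 2)

# Character label for printing only.
toy$share_label <- paste0(toy$share, "%")

toy
```

#With this layout no later conversion back to numeric is needed.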
Scan and deal with the data for missing values, inconsistencies and obvious errors.
#Next, the consistency of the information in each variable needs to be checked.
#List the type of each variable in the dataset.
sapply(north_america, class)
## $Country.Name
## [1] "factor"
##
## $Country.Code
## [1] "character"
##
## $Indicator.Name
## [1] "factor"
##
## $Indicator.Code
## [1] "character"
##
## $year
## [1] "character"
##
## $male_female_population
## [1] "numeric"
##
## $Region
## [1] "factor"
##
## $IncomeGroup
## [1] "ordered" "factor"
##
## $total.population
## [1] "numeric"
##
## $population.percentage
## [1] "character"
#First, the "year" variable is a character variable and should be converted.
#The year variable is character because each string starts with a capital letter "X". The "X" needs to be removed by extracting only the digits from the year variable.
north_america$year <- str_extract(north_america$year , "[0-9]+")
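#Then convert the character variable "year" to Date type. Because the format contains only "%Y", as.Date() fills in the month and day from the date on which the code is run, which is why every value in the final output ends in "-06-23".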
north_america$year<-as.Date(as.character(north_america$year), format ="%Y")
class(north_america$year)
## [1] "Date"
#now, the variable year is date type.
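#For comparison, a minimal base-R sketch of the same clean-up that keeps an integer year and, if a Date is wanted, pins it to 1 January so the result does not depend on the run date. The vector "raw_year" is a made-up example of the "X"-prefixed strings described above.

```r
# Hypothetical examples of the raw column names turned into values.
raw_year <- c("X1960", "X1961", "X2016")

# Drop the leading "X" and keep the year as an integer.
year_num <- as.integer(sub("^X", "", raw_year))

# Optional: a Date with an explicit month and day.
year_date <- as.Date(paste0(year_num, "-01-01"))

year_num
year_date
```

#An integer year is also easier to use in range checks than a Date.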
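#Convert the character variable "population.percentage" to numeric type. Note that the values still carry the "%" suffix, so as.numeric() cannot parse them and returns NA for every row (hence the warning below); the suffix needs to be stripped first.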
north_america$population.percentage<-as.numeric(north_america$population.percentage)
## Warning: NAs introduced by coercion
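#The coercion warning comes from the "%" character in the strings. A minimal sketch of a conversion that avoids it, using two of the values printed earlier as an example vector:

```r
# Example strings in the same format as population.percentage.
pct_chr <- c("49.43%", "50.46%")

# Remove the literal "%" before converting to numeric.
pct_num <- as.numeric(sub("%", "", pct_chr, fixed = TRUE))

pct_num
anyNA(pct_num)
```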
#Convert the variable "Country.Name" to a factor and check its consistency.
north_america$Country.Name<-as.factor(north_america$Country.Name)
levels(droplevels(north_america$Country.Name))
## [1] "Canada" "United States"
#The variable "Country.Name" is now a factor with two levels, "Canada" and "United States", which indicates that all observations in this variable are consistent and no other values occur.
#Set rules to check that the year lies between 1960 and 2016.
Rule1 <- editset(c("year >= 1960", "year <= 2016"))
summary(violatedEdits(Rule1, north_america))
## Edit violations, 228 observations, 0 completely missing (0%):
##
## editname freq rel
## num2 164 71.9%
## num1 60 26.3%
##
## Edit violations per record:
##
## errors freq rel
## 0 4 1.8%
## 1 224 98.2%
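#The summary reports 224 violations, but they are an artefact of the type conversion above rather than real errors: "year" is now a Date, which is stored as the number of days since 1970-01-01, so the rules compare day counts (not calendar years) with 1960 and 2016. Only the four 1975 observations happen to fall inside that range. All years in the data lie between 1960 and 2016, so the rule is satisfied when it is applied to the numeric year.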
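#A minimal base-R sketch of why a Date column behaves this way in a numeric range check, and of a check on the calendar year instead. The three dates are examples in the same form as the converted "year" column.

```r
year_date <- as.Date(c("1960-06-23", "1975-06-23", "2016-06-23"))

# A Date is stored as days since 1970-01-01, so these are day counts.
day_count <- as.numeric(year_date)

# Extract the calendar year and apply the intended rule to it.
year_num <- as.integer(format(year_date, "%Y"))
in_range <- year_num >= 1960 & year_num <= 2016

day_count
in_range
```

#The same idea applies to the editrules check: the rule should be evaluated on a numeric year column rather than on the Date.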
#Check the consistency of factor variable "Country.Name".
levels(droplevels(north_america$Country.Name))
## [1] "Canada" "United States"
#The consistency of the Country.Name variable can be checked simply by inspecting its levels. Every observation is assigned either Canada or United States, with no other levels or inconsistent values such as missing values.
class(north_america$Indicator.Name)
## [1] "factor"
#The Indicator.Name variable denotes whether the observation is the male or the female population, so it should be a factor variable.
levels(north_america$Indicator.Name)
## [1] "female" "male"
#The factor variable "Indicator.Name" has two different levels, female and male, which indicates that there are no missing values and every observation in this variable has been assigned a value.
#check consistency of IncomeGroup variable.
levels(droplevels(north_america$IncomeGroup))
## [1] "High income"
#All observations in the north_america dataset are classified as high income, without any inconsistency.
colSums(is.na(north_america))
## Country.Name Country.Code Indicator.Name
## 0 0 0
## Indicator.Code year male_female_population
## 0 0 0
## Region IncomeGroup total.population
## 0 0 0
## population.percentage
## 228
#Check the character variables Country.Code and Indicator.Code to see whether all observations in these variables are consistent.
all(north_america$Country.Code %in% c("CAN", "USA"))
## [1] TRUE
#The result is TRUE, which means the Country.Code variable is consistent: every observation takes either the value "CAN" or "USA".
all(north_america$Indicator.Code %in% c("SP.POP.TOTL.FE.IN", "SP.POP.TOTL.MA.IN"))
## [1] TRUE
#The result is TRUE, which means the Indicator.Code variable is consistent: every observation takes either the value "SP.POP.TOTL.FE.IN" or "SP.POP.TOTL.MA.IN".
#After scanning all variables and observations, the north_america dataset has no missing values and all variables are consistent, with one exception: "population.percentage" is NA in all 228 rows (see the colSums output above) because of the failed numeric conversion.
#Check the numeric variable "male_female_population" for outliers. It is the only numeric variable from the original dataset; the other two numeric variables, "total.population" and "population.percentage", are both derived from it. Univariate outlier detection methods are therefore applied.
#The population variable "male_female_population" is checked for outliers separately for Canada and the United States.
canada<-north_america%>% filter(Country.Name=="Canada")
boxplot(male_female_population~Indicator.Name,data=canada,main="Box Plot of Population for Male and Female of Canada", ylab="Population", col = "grey")
us<-north_america%>% filter(Country.Name=="United States")
boxplot(male_female_population~Indicator.Name,data=us,main="Box Plot of Population for Male and Female of United States", ylab="Population", col = "grey")
#The boxplots show no outliers in the male or female population of either the United States or Canada, and all observations behave normally.
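#The boxplot flags points that lie more than 1.5 times the interquartile range beyond the hinges. The same rule can be checked numerically with boxplot.stats(); the sketch below uses a made-up vector with one deliberately extreme value.

```r
# Toy vector: five ordinary values and one extreme value.
x <- c(10, 11, 12, 13, 14, 100)

# $out holds the points the boxplot would draw as outliers.
out <- boxplot.stats(x)$out

out
length(out)
```

#Applied to each country and sex subset, an empty $out confirms what the plot shows without relying on visual inspection.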
#When the two variables year and population in that year are considered together, the Chi-Square Q-Q plot shows 3 outliers.
#However, the data in this analysis come from population censuses of the countries, which are relatively accurate and are not a biased sample, so these outliers should not be altered.
#The outliers are probably caused by social factors, such as policy reforms on immigration and on child-birth social welfare. Altering them would probably move the data further away from the real figures.
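#The report does not show the code behind the Chi-Square Q-Q plot. One common way to obtain such a check is through squared Mahalanobis distances compared with a chi-square quantile; the sketch below assumes that approach and uses simulated data (population in millions), not the real series. The 0.975 cutoff is likewise an assumption.

```r
set.seed(1)

# Simulated year/population pairs, for illustration only.
xy <- cbind(
  year = 1960:2016,
  pop = seq(9, 18, length.out = 57) + rnorm(57, sd = 0.1)
)

# Squared Mahalanobis distance of each row from the centre of the data.
d2 <- mahalanobis(xy, center = colMeans(xy), cov = cov(xy))

# With two variables, d2 is compared with a chi-square on 2 degrees of freedom.
cutoff <- qchisq(0.975, df = 2)
flagged <- which(d2 > cutoff)

cutoff
flagged
```

#Rows in "flagged" correspond to the points that would sit above the reference line in a Chi-Square Q-Q plot.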
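#Split the data into subsets that contain the Canada-female, Canada-male, US-female and US-male population information.
#Note: the piped call below does not draw one histogram per group. The grouped data frame is passed to hist() as its first argument, which appears to dispatch to the data-frame method from Hmisc, and the population vector lands in the wrong argument; this is the likely source of the warnings that follow. The per-group histograms are drawn after the split() call further down.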
north_america %>%group_by(Indicator.Name, Country.Name) %>% hist(north_america$male_female_population)
## Warning in if (length(unique(w)) >= n.unique) {: the condition has length >
## 1 and only the first element will be used
## Warning in if (length(unique(w)) >= n.unique) {: the condition has length >
## 1 and only the first element will be used
## Warning in if (length(unique(w)) >= n.unique) {: the condition has length >
## 1 and only the first element will be used
## Warning in if (length(unique(w)) >= n.unique) {: the condition has length >
## 1 and only the first element will be used
b<-split(north_america, list(north_america$Country.Name, north_america$Indicator.Name),drop=TRUE)
Canada.female<-b$Canada.female
Canada.male<-b$Canada.male
US.male<-b$`United States.male`
US.female<-b$`United States.female`
#Draw a histogram for each of these population subsets.
hist(Canada.female$male_female_population)
hist(Canada.male$male_female_population)
hist(US.male$male_female_population)
hist(US.female$male_female_population)
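#The four separate hist() calls can also be written as a loop over the list returned by split(). This is a minimal sketch on made-up data with hypothetical column names; plot = FALSE is used so that only the bin counts are computed.

```r
# Toy data: two hypothetical countries, two sexes, three values each.
toy <- data.frame(
  country = rep(c("A", "B"), each = 6),
  sex = rep(rep(c("female", "male"), each = 3), times = 2),
  pop = c(1, 2, 3, 2, 3, 4, 10, 11, 12, 11, 12, 13)
)

# One data frame per country/sex combination.
groups <- split(toy, list(toy$country, toy$sex), drop = TRUE)

# Histogram bin counts for each subset, without drawing.
counts <- lapply(groups, function(g) hist(g$pop, plot = FALSE)$counts)

names(groups)
```

#Dropping plot = FALSE and adding main = name inside the loop would draw one titled histogram per subset.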
#The histograms show that the male and female populations of both Canada and the United States are clearly not normally distributed.
#Because population is highly correlated with its own past values, it is possible to forecast future population, and such forecasts generally work better when the data are closer to normally distributed.
#Hence, a Box-Cox transformed variable is created to bring the population closer to a normal distribution before forecasting.
#Filter the Canada female data subset and create a new variable named "transformed.population" by Box-Cox transformation.
Canada.female<-north_america%>%filter(Indicator.Name=="female",Country.Name=="Canada")
lambda=BoxCox.lambda(Canada.female$male_female_population)
Canada.female<-Canada.female%>%mutate(transformed.population=BoxCox(male_female_population,lambda = lambda))
#Filter the Canada male data subset and create a new variable named "transformed.population" by Box-Cox transformation.
Canada.male<-north_america%>%filter(Indicator.Name=="male",Country.Name=="Canada")
lambda=BoxCox.lambda(Canada.male$male_female_population)
Canada.male<-Canada.male%>%mutate(transformed.population=BoxCox(male_female_population,lambda = lambda))
#Filter the United States male data subset and create a new variable named "transformed.population" by boxcox transformation.
US.male<-north_america%>%filter(Indicator.Name=="male",Country.Name=="United States")
lambda=BoxCox.lambda(US.male$male_female_population)
US.male<-US.male%>%mutate(transformed.population=BoxCox(male_female_population,lambda = lambda))
#Filter the United States female data subset and create a new variable named "transformed.population" by boxcox transformation.
US.female<-north_america%>%filter(Indicator.Name=="female",Country.Name=="United States")
lambda=BoxCox.lambda(US.female$male_female_population)
US.female<-US.female%>%mutate(transformed.population=BoxCox(male_female_population,lambda = lambda))
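#For reference, the standard Box-Cox transformation for positive data is (x^lambda - 1)/lambda, with log(x) as the limiting case when lambda is 0. The helper below is a hypothetical base-R sketch of that formula, not the function used above, and it does not estimate lambda.

```r
# Box-Cox transform of a positive vector x for a given lambda.
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) {
    log(x)                      # limiting case lambda = 0
  } else {
    (x^lambda - 1) / lambda
  }
}

box_cox(c(1, 4, 9), lambda = 0.5)
box_cox(c(1, exp(1)), lambda = 0)
```

#Since lambda is estimated separately for each subset, the transformed values of different subsets are on different scales and are not directly comparable.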
#Combine these four data subsets by binding their rows together.
north_america<-bind_rows(Canada.male,Canada.female,US.male,US.female)
#Sort the dataset in ascending order of year and, within each year, by country name.
north_america<-north_america[order(north_america$year,north_america$Country.Name),]
#Finally the post-processed tidy dataset "north_america" has been obtained.
north_america
## Country.Name Country.Code Indicator.Name Indicator.Code year
## 1 Canada CAN male SP.POP.TOTL.MA.IN 1960-06-23
## 58 Canada CAN female SP.POP.TOTL.FE.IN 1960-06-23
## 115 United States USA male SP.POP.TOTL.MA.IN 1960-06-23
## 172 United States USA female SP.POP.TOTL.FE.IN 1960-06-23
## 2 Canada CAN male SP.POP.TOTL.MA.IN 1961-06-23
## 59 Canada CAN female SP.POP.TOTL.FE.IN 1961-06-23
## 116 United States USA male SP.POP.TOTL.MA.IN 1961-06-23
## 173 United States USA female SP.POP.TOTL.FE.IN 1961-06-23
## 3 Canada CAN male SP.POP.TOTL.MA.IN 1962-06-23
## 60 Canada CAN female SP.POP.TOTL.FE.IN 1962-06-23
## 117 United States USA male SP.POP.TOTL.MA.IN 1962-06-23
## 174 United States USA female SP.POP.TOTL.FE.IN 1962-06-23
## 4 Canada CAN male SP.POP.TOTL.MA.IN 1963-06-23
## 61 Canada CAN female SP.POP.TOTL.FE.IN 1963-06-23
## 118 United States USA male SP.POP.TOTL.MA.IN 1963-06-23
## 175 United States USA female SP.POP.TOTL.FE.IN 1963-06-23
## 5 Canada CAN male SP.POP.TOTL.MA.IN 1964-06-23
## 62 Canada CAN female SP.POP.TOTL.FE.IN 1964-06-23
## 119 United States USA male SP.POP.TOTL.MA.IN 1964-06-23
## 176 United States USA female SP.POP.TOTL.FE.IN 1964-06-23
## 6 Canada CAN male SP.POP.TOTL.MA.IN 1965-06-23
## 63 Canada CAN female SP.POP.TOTL.FE.IN 1965-06-23
## 120 United States USA male SP.POP.TOTL.MA.IN 1965-06-23
## 177 United States USA female SP.POP.TOTL.FE.IN 1965-06-23
## 7 Canada CAN male SP.POP.TOTL.MA.IN 1966-06-23
## 64 Canada CAN female SP.POP.TOTL.FE.IN 1966-06-23
## 121 United States USA male SP.POP.TOTL.MA.IN 1966-06-23
## 178 United States USA female SP.POP.TOTL.FE.IN 1966-06-23
## 8 Canada CAN male SP.POP.TOTL.MA.IN 1967-06-23
## 65 Canada CAN female SP.POP.TOTL.FE.IN 1967-06-23
## 122 United States USA male SP.POP.TOTL.MA.IN 1967-06-23
## 179 United States USA female SP.POP.TOTL.FE.IN 1967-06-23
## 9 Canada CAN male SP.POP.TOTL.MA.IN 1968-06-23
## 66 Canada CAN female SP.POP.TOTL.FE.IN 1968-06-23
## 123 United States USA male SP.POP.TOTL.MA.IN 1968-06-23
## 180 United States USA female SP.POP.TOTL.FE.IN 1968-06-23
## 10 Canada CAN male SP.POP.TOTL.MA.IN 1969-06-23
## 67 Canada CAN female SP.POP.TOTL.FE.IN 1969-06-23
## 124 United States USA male SP.POP.TOTL.MA.IN 1969-06-23
## 181 United States USA female SP.POP.TOTL.FE.IN 1969-06-23
## 11 Canada CAN male SP.POP.TOTL.MA.IN 1970-06-23
## 68 Canada CAN female SP.POP.TOTL.FE.IN 1970-06-23
## 125 United States USA male SP.POP.TOTL.MA.IN 1970-06-23
## 182 United States USA female SP.POP.TOTL.FE.IN 1970-06-23
## 12 Canada CAN male SP.POP.TOTL.MA.IN 1971-06-23
## 69 Canada CAN female SP.POP.TOTL.FE.IN 1971-06-23
## 126 United States USA male SP.POP.TOTL.MA.IN 1971-06-23
## 183 United States USA female SP.POP.TOTL.FE.IN 1971-06-23
## 13 Canada CAN male SP.POP.TOTL.MA.IN 1972-06-23
## 70 Canada CAN female SP.POP.TOTL.FE.IN 1972-06-23
## 127 United States USA male SP.POP.TOTL.MA.IN 1972-06-23
## 184 United States USA female SP.POP.TOTL.FE.IN 1972-06-23
## 14 Canada CAN male SP.POP.TOTL.MA.IN 1973-06-23
## 71 Canada CAN female SP.POP.TOTL.FE.IN 1973-06-23
## 128 United States USA male SP.POP.TOTL.MA.IN 1973-06-23
## 185 United States USA female SP.POP.TOTL.FE.IN 1973-06-23
## 15 Canada CAN male SP.POP.TOTL.MA.IN 1974-06-23
## 72 Canada CAN female SP.POP.TOTL.FE.IN 1974-06-23
## 129 United States USA male SP.POP.TOTL.MA.IN 1974-06-23
## 186 United States USA female SP.POP.TOTL.FE.IN 1974-06-23
## 16 Canada CAN male SP.POP.TOTL.MA.IN 1975-06-23
## 73 Canada CAN female SP.POP.TOTL.FE.IN 1975-06-23
## 130 United States USA male SP.POP.TOTL.MA.IN 1975-06-23
## 187 United States USA female SP.POP.TOTL.FE.IN 1975-06-23
## 17 Canada CAN male SP.POP.TOTL.MA.IN 1976-06-23
## 74 Canada CAN female SP.POP.TOTL.FE.IN 1976-06-23
## 131 United States USA male SP.POP.TOTL.MA.IN 1976-06-23
## 188 United States USA female SP.POP.TOTL.FE.IN 1976-06-23
## 18 Canada CAN male SP.POP.TOTL.MA.IN 1977-06-23
## 75 Canada CAN female SP.POP.TOTL.FE.IN 1977-06-23
## 132 United States USA male SP.POP.TOTL.MA.IN 1977-06-23
## 189 United States USA female SP.POP.TOTL.FE.IN 1977-06-23
## 19 Canada CAN male SP.POP.TOTL.MA.IN 1978-06-23
## 76 Canada CAN female SP.POP.TOTL.FE.IN 1978-06-23
## 133 United States USA male SP.POP.TOTL.MA.IN 1978-06-23
## 190 United States USA female SP.POP.TOTL.FE.IN 1978-06-23
## 20 Canada CAN male SP.POP.TOTL.MA.IN 1979-06-23
## 77 Canada CAN female SP.POP.TOTL.FE.IN 1979-06-23
## 134 United States USA male SP.POP.TOTL.MA.IN 1979-06-23
## 191 United States USA female SP.POP.TOTL.FE.IN 1979-06-23
## 21 Canada CAN male SP.POP.TOTL.MA.IN 1980-06-23
## 78 Canada CAN female SP.POP.TOTL.FE.IN 1980-06-23
## 135 United States USA male SP.POP.TOTL.MA.IN 1980-06-23
## 192 United States USA female SP.POP.TOTL.FE.IN 1980-06-23
## 22 Canada CAN male SP.POP.TOTL.MA.IN 1981-06-23
## 79 Canada CAN female SP.POP.TOTL.FE.IN 1981-06-23
## 136 United States USA male SP.POP.TOTL.MA.IN 1981-06-23
## 193 United States USA female SP.POP.TOTL.FE.IN 1981-06-23
## 23 Canada CAN male SP.POP.TOTL.MA.IN 1982-06-23
## 80 Canada CAN female SP.POP.TOTL.FE.IN 1982-06-23
## 137 United States USA male SP.POP.TOTL.MA.IN 1982-06-23
## 194 United States USA female SP.POP.TOTL.FE.IN 1982-06-23
## 24 Canada CAN male SP.POP.TOTL.MA.IN 1983-06-23
## 81 Canada CAN female SP.POP.TOTL.FE.IN 1983-06-23
## 138 United States USA male SP.POP.TOTL.MA.IN 1983-06-23
## 195 United States USA female SP.POP.TOTL.FE.IN 1983-06-23
## 25 Canada CAN male SP.POP.TOTL.MA.IN 1984-06-23
## 82 Canada CAN female SP.POP.TOTL.FE.IN 1984-06-23
## 139 United States USA male SP.POP.TOTL.MA.IN 1984-06-23
## 196 United States USA female SP.POP.TOTL.FE.IN 1984-06-23
## 26 Canada CAN male SP.POP.TOTL.MA.IN 1985-06-23
## 83 Canada CAN female SP.POP.TOTL.FE.IN 1985-06-23
## 140 United States USA male SP.POP.TOTL.MA.IN 1985-06-23
## 197 United States USA female SP.POP.TOTL.FE.IN 1985-06-23
## 27 Canada CAN male SP.POP.TOTL.MA.IN 1986-06-23
## 84 Canada CAN female SP.POP.TOTL.FE.IN 1986-06-23
## 141 United States USA male SP.POP.TOTL.MA.IN 1986-06-23
## 198 United States USA female SP.POP.TOTL.FE.IN 1986-06-23
## 28 Canada CAN male SP.POP.TOTL.MA.IN 1987-06-23
## 85 Canada CAN female SP.POP.TOTL.FE.IN 1987-06-23
## 142 United States USA male SP.POP.TOTL.MA.IN 1987-06-23
## 199 United States USA female SP.POP.TOTL.FE.IN 1987-06-23
## 29 Canada CAN male SP.POP.TOTL.MA.IN 1988-06-23
## 86 Canada CAN female SP.POP.TOTL.FE.IN 1988-06-23
## 143 United States USA male SP.POP.TOTL.MA.IN 1988-06-23
## 200 United States USA female SP.POP.TOTL.FE.IN 1988-06-23
## 30 Canada CAN male SP.POP.TOTL.MA.IN 1989-06-23
## 87 Canada CAN female SP.POP.TOTL.FE.IN 1989-06-23
## 144 United States USA male SP.POP.TOTL.MA.IN 1989-06-23
## 201 United States USA female SP.POP.TOTL.FE.IN 1989-06-23
## 31 Canada CAN male SP.POP.TOTL.MA.IN 1990-06-23
## 88 Canada CAN female SP.POP.TOTL.FE.IN 1990-06-23
## 145 United States USA male SP.POP.TOTL.MA.IN 1990-06-23
## 202 United States USA female SP.POP.TOTL.FE.IN 1990-06-23
## 32 Canada CAN male SP.POP.TOTL.MA.IN 1991-06-23
## 89 Canada CAN female SP.POP.TOTL.FE.IN 1991-06-23
## 146 United States USA male SP.POP.TOTL.MA.IN 1991-06-23
## 203 United States USA female SP.POP.TOTL.FE.IN 1991-06-23
## 33 Canada CAN male SP.POP.TOTL.MA.IN 1992-06-23
## 90 Canada CAN female SP.POP.TOTL.FE.IN 1992-06-23
## 147 United States USA male SP.POP.TOTL.MA.IN 1992-06-23
## 204 United States USA female SP.POP.TOTL.FE.IN 1992-06-23
## 34 Canada CAN male SP.POP.TOTL.MA.IN 1993-06-23
## 91 Canada CAN female SP.POP.TOTL.FE.IN 1993-06-23
## 148 United States USA male SP.POP.TOTL.MA.IN 1993-06-23
## 205 United States USA female SP.POP.TOTL.FE.IN 1993-06-23
## 35 Canada CAN male SP.POP.TOTL.MA.IN 1994-06-23
## 92 Canada CAN female SP.POP.TOTL.FE.IN 1994-06-23
## 149 United States USA male SP.POP.TOTL.MA.IN 1994-06-23
## 206 United States USA female SP.POP.TOTL.FE.IN 1994-06-23
## 36 Canada CAN male SP.POP.TOTL.MA.IN 1995-06-23
## 93 Canada CAN female SP.POP.TOTL.FE.IN 1995-06-23
## 150 United States USA male SP.POP.TOTL.MA.IN 1995-06-23
## 207 United States USA female SP.POP.TOTL.FE.IN 1995-06-23
## 37 Canada CAN male SP.POP.TOTL.MA.IN 1996-06-23
## 94 Canada CAN female SP.POP.TOTL.FE.IN 1996-06-23
## 151 United States USA male SP.POP.TOTL.MA.IN 1996-06-23
## 208 United States USA female SP.POP.TOTL.FE.IN 1996-06-23
## 38 Canada CAN male SP.POP.TOTL.MA.IN 1997-06-23
## 95 Canada CAN female SP.POP.TOTL.FE.IN 1997-06-23
## 152 United States USA male SP.POP.TOTL.MA.IN 1997-06-23
## 209 United States USA female SP.POP.TOTL.FE.IN 1997-06-23
## 39 Canada CAN male SP.POP.TOTL.MA.IN 1998-06-23
## 96 Canada CAN female SP.POP.TOTL.FE.IN 1998-06-23
## 153 United States USA male SP.POP.TOTL.MA.IN 1998-06-23
## 210 United States USA female SP.POP.TOTL.FE.IN 1998-06-23
## 40 Canada CAN male SP.POP.TOTL.MA.IN 1999-06-23
## 97 Canada CAN female SP.POP.TOTL.FE.IN 1999-06-23
## 154 United States USA male SP.POP.TOTL.MA.IN 1999-06-23
## 211 United States USA female SP.POP.TOTL.FE.IN 1999-06-23
## 41 Canada CAN male SP.POP.TOTL.MA.IN 2000-06-23
## 98 Canada CAN female SP.POP.TOTL.FE.IN 2000-06-23
## 155 United States USA male SP.POP.TOTL.MA.IN 2000-06-23
## 212 United States USA female SP.POP.TOTL.FE.IN 2000-06-23
## 42 Canada CAN male SP.POP.TOTL.MA.IN 2001-06-23
## 99 Canada CAN female SP.POP.TOTL.FE.IN 2001-06-23
## 156 United States USA male SP.POP.TOTL.MA.IN 2001-06-23
## 213 United States USA female SP.POP.TOTL.FE.IN 2001-06-23
## 43 Canada CAN male SP.POP.TOTL.MA.IN 2002-06-23
## 100 Canada CAN female SP.POP.TOTL.FE.IN 2002-06-23
## 157 United States USA male SP.POP.TOTL.MA.IN 2002-06-23
## 214 United States USA female SP.POP.TOTL.FE.IN 2002-06-23
## 44 Canada CAN male SP.POP.TOTL.MA.IN 2003-06-23
## 101 Canada CAN female SP.POP.TOTL.FE.IN 2003-06-23
## 158 United States USA male SP.POP.TOTL.MA.IN 2003-06-23
## 215 United States USA female SP.POP.TOTL.FE.IN 2003-06-23
## 45 Canada CAN male SP.POP.TOTL.MA.IN 2004-06-23
## 102 Canada CAN female SP.POP.TOTL.FE.IN 2004-06-23
## 159 United States USA male SP.POP.TOTL.MA.IN 2004-06-23
## 216 United States USA female SP.POP.TOTL.FE.IN 2004-06-23
## 46 Canada CAN male SP.POP.TOTL.MA.IN 2005-06-23
## 103 Canada CAN female SP.POP.TOTL.FE.IN 2005-06-23
## 160 United States USA male SP.POP.TOTL.MA.IN 2005-06-23
## 217 United States USA female SP.POP.TOTL.FE.IN 2005-06-23
## 47 Canada CAN male SP.POP.TOTL.MA.IN 2006-06-23
## 104 Canada CAN female SP.POP.TOTL.FE.IN 2006-06-23
## 161 United States USA male SP.POP.TOTL.MA.IN 2006-06-23
## 218 United States USA female SP.POP.TOTL.FE.IN 2006-06-23
## 48 Canada CAN male SP.POP.TOTL.MA.IN 2007-06-23
## 105 Canada CAN female SP.POP.TOTL.FE.IN 2007-06-23
## 162 United States USA male SP.POP.TOTL.MA.IN 2007-06-23
## 219 United States USA female SP.POP.TOTL.FE.IN 2007-06-23
## 49 Canada CAN male SP.POP.TOTL.MA.IN 2008-06-23
## 106 Canada CAN female SP.POP.TOTL.FE.IN 2008-06-23
## 163 United States USA male SP.POP.TOTL.MA.IN 2008-06-23
## 220 United States USA female SP.POP.TOTL.FE.IN 2008-06-23
## 50 Canada CAN male SP.POP.TOTL.MA.IN 2009-06-23
## 107 Canada CAN female SP.POP.TOTL.FE.IN 2009-06-23
## 164 United States USA male SP.POP.TOTL.MA.IN 2009-06-23
## 221 United States USA female SP.POP.TOTL.FE.IN 2009-06-23
## 51 Canada CAN male SP.POP.TOTL.MA.IN 2010-06-23
## 108 Canada CAN female SP.POP.TOTL.FE.IN 2010-06-23
## 165 United States USA male SP.POP.TOTL.MA.IN 2010-06-23
## 222 United States USA female SP.POP.TOTL.FE.IN 2010-06-23
## 52 Canada CAN male SP.POP.TOTL.MA.IN 2011-06-23
## 109 Canada CAN female SP.POP.TOTL.FE.IN 2011-06-23
## 166 United States USA male SP.POP.TOTL.MA.IN 2011-06-23
## 223 United States USA female SP.POP.TOTL.FE.IN 2011-06-23
## 53 Canada CAN male SP.POP.TOTL.MA.IN 2012-06-23
## 110 Canada CAN female SP.POP.TOTL.FE.IN 2012-06-23
## 167 United States USA male SP.POP.TOTL.MA.IN 2012-06-23
## 224 United States USA female SP.POP.TOTL.FE.IN 2012-06-23
## 54 Canada CAN male SP.POP.TOTL.MA.IN 2013-06-23
## 111 Canada CAN female SP.POP.TOTL.FE.IN 2013-06-23
## 168 United States USA male SP.POP.TOTL.MA.IN 2013-06-23
## 225 United States USA female SP.POP.TOTL.FE.IN 2013-06-23
## 55 Canada CAN male SP.POP.TOTL.MA.IN 2014-06-23
## 112 Canada CAN female SP.POP.TOTL.FE.IN 2014-06-23
## 169 United States USA male SP.POP.TOTL.MA.IN 2014-06-23
## 226 United States USA female SP.POP.TOTL.FE.IN 2014-06-23
## 56 Canada CAN male SP.POP.TOTL.MA.IN 2015-06-23
## 113 Canada CAN female SP.POP.TOTL.FE.IN 2015-06-23
## 170 United States USA male SP.POP.TOTL.MA.IN 2015-06-23
## 227 United States USA female SP.POP.TOTL.FE.IN 2015-06-23
## 57 Canada CAN male SP.POP.TOTL.MA.IN 2016-06-23
## 114 Canada CAN female SP.POP.TOTL.FE.IN 2016-06-23
## 171 United States USA male SP.POP.TOTL.MA.IN 2016-06-23
## 228 United States USA female SP.POP.TOTL.FE.IN 2016-06-23
## male_female_population Region IncomeGroup total.population
## 1 9057268 North America High income 17909009
## 58 8851741 North America High income 17909009
## 115 89503312 North America High income 180671000
## 172 91167688 North America High income 180671000
## 2 9230695 North America High income 18271000
## 59 9040305 North America High income 18271000
## 116 90966224 North America High income 183691000
## 173 92724776 North America High income 183691000
## 3 9392169 North America High income 18614000
## 60 9221831 North America High income 18614000
## 117 92347226 North America High income 186538000
## 174 94190774 North America High income 186538000
## 4 9555967 North America High income 18964000
## 61 9408033 North America High income 18964000
## 118 93655480 North America High income 189242000
## 175 95586520 North America High income 189242000
## 5 9725721 North America High income 19325000
## 62 9599279 North America High income 19325000
## 119 94925902 North America High income 191889000
## 176 96963098 North America High income 191889000
## 6 9893049 North America High income 19678000
## 63 9784951 North America High income 19678000
## 120 96066449 North America High income 194303000
## 177 98236551 North America High income 194303000
## 7 10070870 North America High income 20048000
## 64 9977130 North America High income 20048000
## 121 97111152 North America High income 196560000
## 178 99448848 North America High income 196560000
## 8 10247271 North America High income 20412000
## 65 10164729 North America High income 20412000
## 122 98088575 North America High income 198712000
## 179 100623425 North America High income 198712000
## 9 10408910 North America High income 20744000
## 66 10335090 North America High income 20744000
## 123 98981930 North America High income 200706000
## 180 101724070 North America High income 200706000
## 10 10547218 North America High income 21028000
## 67 10480782 North America High income 21028000
## 124 99871190 North America High income 202677000
## 181 102805810 North America High income 202677000
## 11 10691816 North America High income 21324000
## 68 10632184 North America High income 21324000
## 125 100975878 North America High income 205052000
## 182 104076122 North America High income 205052000
## 12 10849611 North America High income 21645535
## 69 10795924 North America High income 21645535
## 126 102217027 North America High income 207661000
## 183 105443973 North America High income 207661000
## 13 11021094 North America High income 21993631
## 70 10972537 North America High income 21993631
## 127 103291747 North America High income 209896000
## 184 106604253 North America High income 209896000
## 14 11205990 North America High income 22369408
## 71 11163418 North America High income 22369408
## 128 104265126 North America High income 211909000
## 185 107643874 North America High income 211909000
## 15 11403846 North America High income 22774087
## 72 11370241 North America High income 22774087
## 129 105198491 North America High income 213854000
## 186 108655509 North America High income 213854000
## 16 11614581 North America High income 23209000
## 73 11594419 North America High income 23209000
## 130 106201066 North America High income 215973000
## 187 109771934 North America High income 215973000
## 17 11759666 North America High income 23518000
## 74 11758334 North America High income 23518000
## 131 107154502 North America High income 218035000
## 188 110880498 North America High income 218035000
## 18 11887015 North America High income 23796000
## 75 11908985 North America High income 23796000
## 132 108161709 North America High income 220239000
## 189 112077291 North America High income 220239000
## 19 11994122 North America High income 24036000
## 76 12041878 North America High income 24036000
## 133 109234212 North America High income 222585000
## 190 113350788 North America High income 222585000
## 20 12101895 North America High income 24277000
## 77 12175105 North America High income 24277000
## 134 110379788 North America High income 225055000
## 191 114675212 North America High income 225055000
## 21 12248211 North America High income 24593000
## 78 12344789 North America High income 24593000
## 135 111402057 North America High income 227225000
## 192 115822943 North America High income 227225000
## 22 12391458 North America High income 24900000
## 79 12508542 North America High income 24900000
## 136 112488832 North America High income 229466000
## 193 116977168 North America High income 229466000
## 23 12533389 North America High income 25202000
## 80 12668611 North America High income 25202000
## 137 113579226 North America High income 231664000
## 194 118084774 North America High income 231664000
## 24 12652568 North America High income 25456000
## 81 12803432 North America High income 25456000
## 138 114647762 North America High income 233792000
## 195 119144238 North America High income 233792000
## 25 12768649 North America High income 25702000
## 82 12933351 North America High income 25702000
## 139 115664343 North America High income 235825000
## 196 120160657 North America High income 235825000
## 26 12882440 North America High income 25942000
## 83 13059560 North America High income 25942000
## 140 116695844 North America High income 237924000
## 197 121228156 North America High income 237924000
## 27 13008005 North America High income 26204000
## 84 13195995 North America High income 26204000
## 141 117758675 North America High income 240133000
## 198 122374325 North America High income 240133000
## 28 13176159 North America High income 26550000
## 85 13373841 North America High income 26550000
## 142 118778957 North America High income 242289000
## 199 123510043 North America High income 242289000
## 29 13344207 North America High income 26895000
## 86 13550793 North America High income 26895000
## 143 119821864 North America High income 244499000
## 200 124677136 North America High income 244499000
## 30 13580939 North America High income 27379000
## 87 13798061 North America High income 27379000
## 144 120933139 North America High income 246819000
## 201 125885861 North America High income 246819000
## 31 13781230 North America High income 27791000
## 88 14009770 North America High income 27791000
## 145 122308854 North America High income 249623000
## 202 127314146 North America High income 249623000
## 32 13964994 North America High income 28171682
## 89 14206688 North America High income 28171682
## 146 123987526 North America High income 252981000
## 203 128993474 North America High income 252981000
## 33 14131783 North America High income 28519597
## 90 14387814 North America High income 28519597
## 147 125778639 North America High income 256514000
## 204 130735361 North America High income 256514000
## 34 14281671 North America High income 28833410
## 91 14551739 North America High income 28833410
## 148 127526177 North America High income 259919000
## 205 132392823 North America High income 259919000
## 35 14415118 North America High income 29111906
## 92 14696788 North America High income 29111906
## 149 129183889 North America High income 263126000
## 206 133942111 North America High income 263126000
## 36 14532342 North America High income 29354000
## 93 14821658 North America High income 29354000
## 150 130812745 North America High income 266278000
## 207 135465255 North America High income 266278000
## 37 14689130 North America High income 29671900
## 94 14982770 North America High income 29671900
## 151 132419522 North America High income 269394000
## 208 136974478 North America High income 269394000
## 38 14846373 North America High income 29987200
## 95 15140827 North America High income 29987200
## 152 134095800 North America High income 272657000
## 209 138561200 North America High income 272657000
## 39 14977834 North America High income 30247900
## 96 15270066 North America High income 30247900
## 153 135736774 North America High income 275854000
## 210 140117226 North America High income 275854000
## 40 15105132 North America High income 30499200
## 97 15394068 North America High income 30499200
## 154 137370537 North America High income 279040000
## 211 141669463 North America High income 279040000
## 41 15241867 North America High income 30769700
## 98 15527833 North America High income 30769700
## 155 138971948 North America High income 282162411
## 212 143190463 North America High income 282162411
## 42 15398999 North America High income 31081900
## 99 15682901 North America High income 31081900
## 156 140416645 North America High income 284968955
## 213 144552310 North America High income 284968955
## 43 15540087 North America High income 31362000
## 100 15821913 North America High income 31362000
## 157 141784968 North America High income 287625193
## 214 145840225 North America High income 287625193
## 44 15697939 North America High income 31676000
## 101 15978061 North America High income 31676000
## 158 143064009 North America High income 290107933
## 215 147043924 North America High income 290107933
## 45 15858471 North America High income 31995000
## 102 16136529 North America High income 31995000
## 159 144443341 North America High income 292805298
## 216 148361957 North America High income 292805298
## 46 16018324 North America High income 32312000
## 103 16293676 North America High income 32312000
## 160 145822985 North America High income 295516599
## 217 149693614 North America High income 295516599
## 47 16149592 North America High income 32570505
## 104 16420913 North America High income 32570505
## 161 147269908 North America High income 298379912
## 218 151110004 North America High income 298379912
## 48 16310314 North America High income 32887928
## 105 16577614 North America High income 32887928
## 162 148704288 North America High income 301231207
## 219 152526919 North America High income 301231207
## 49 16490904 North America High income 33245773
## 106 16754869 North America High income 33245773
## 163 150141736 North America High income 304093966
## 220 153952230 North America High income 304093966
## 50 16683127 North America High income 33628571
## 107 16945444 North America High income 33628571
## 164 151490722 North America High income 306771529
## 221 155280807 North America High income 306771529
## 51 16871227 North America High income 34005274
## 108 17134047 North America High income 34005274
## 165 152796770 North America High income 309348193
## 222 156551423 North America High income 309348193
## 52 17038535 North America High income 34342780
## 109 17304245 North America High income 34342780
## 166 153982182 North America High income 311663358
## 223 157681176 North America High income 311663358
## 53 17239606 North America High income 34750545
## 110 17510939 North America High income 34750545
## 167 155184527 North America High income 313998379
## 224 158813852 North America High income 313998379
## 54 17437353 North America High income 35152370
## 111 17715017 North America High income 35152370
## 168 156327578 North America High income 316204908
## 225 159877330 North America High income 316204908
## 55 17626367 North America High income 35535348
## 112 17908981 North America High income 35535348
## 169 157545543 North America High income 318563456
## 226 161017913 North America High income 318563456
## 56 17774151 North America High income 35832513
## 113 18058362 North America High income 35832513
## 170 158747453 North America High income 320896618
## 227 162149165 North America High income 320896618
## 57 17990452 North America High income 36264604
## 114 18274152 North America High income 36264604
## 171 159894419 North America High income 323127513
## 228 163233094 North America High income 323127513
## population.percentage transformed.population
## 1 NA 564047.43
## 58 NA 6899526.15
## 115 NA 10246.06
## 172 NA 5669412.79
## 2 NA 572821.56
## 59 NA 7044034.35
## 116 NA 10323.15
## 173 NA 5750529.11
## 3 NA 580963.40
## 60 NA 7183101.55
## 117 NA 10395.31
## 174 NA 5826699.69
## 4 NA 589195.83
## 61 NA 7325703.78
## 118 NA 10463.13
## 175 NA 5899042.75
## 5 NA 597699.95
## 62 NA 7472120.10
## 119 NA 10528.51
## 176 NA 5970225.76
## 6 NA 606055.52
## 63 NA 7614222.62
## 120 NA 10586.81
## 177 NA 6035931.26
## 7 NA 614906.29
## 64 NA 7761258.02
## 121 NA 10639.88
## 178 NA 6098353.93
## 8 NA 623657.67
## 65 NA 7904743.86
## 122 NA 10689.25
## 179 NA 6158717.48
## 9 NA 631652.11
## 66 NA 8035006.99
## 123 NA 10734.15
## 180 NA 6215178.57
## 10 NA 638474.28
## 67 NA 8146379.13
## 124 NA 10778.63
## 181 NA 6270574.02
## 11 NA 645588.94
## 68 NA 8262088.93
## 125 NA 10833.59
## 182 NA 6335506.43
## 12 NA 653332.52
## 69 NA 8387197.30
## 126 NA 10894.95
## 183 NA 6405281.98
## 13 NA 661724.09
## 70 NA 8522106.16
## 127 NA 10947.75
## 184 NA 6464354.81
## 14 NA 670744.85
## 71 NA 8667873.30
## 128 NA 10995.33
## 185 NA 6517196.62
## 15 NA 680367.28
## 72 NA 8825767.88
## 129 NA 11040.72
## 186 NA 6568537.06
## 16 NA 690581.94
## 73 NA 8996857.88
## 130 NA 11089.25
## 187 NA 6625106.29
## 17 NA 697594.38
## 74 NA 9121921.00
## 131 NA 11135.16
## 188 NA 6681185.54
## 18 NA 703736.31
## 75 NA 9236838.45
## 132 NA 11183.43
## 189 NA 6741626.76
## 19 NA 708892.50
## 76 NA 9338189.92
## 133 NA 11234.56
## 190 NA 6805827.64
## 20 NA 714072.12
## 77 NA 9439777.45
## 134 NA 11288.87
## 191 NA 6872472.75
## 21 NA 721090.41
## 78 NA 9569137.26
## 135 NA 11337.09
## 192 NA 6930126.38
## 22 NA 727946.38
## 79 NA 9693947.49
## 136 NA 11388.09
## 193 NA 6988013.45
## 23 NA 734724.84
## 80 NA 9815923.58
## 137 NA 11438.99
## 194 NA 7043475.98
## 24 NA 740405.65
## 81 NA 9918640.24
## 138 NA 11488.61
## 195 NA 7096449.45
## 25 NA 745929.22
## 82 NA 10017605.19
## 139 NA 11535.60
## 196 NA 7147199.36
## 26 NA 751334.76
## 83 NA 10113728.24
## 140 NA 11583.04
## 197 NA 7200425.28
## 27 NA 757289.30
## 84 NA 10217622.23
## 141 NA 11631.69
## 198 NA 7257489.73
## 28 NA 765246.78
## 85 NA 10353023.54
## 142 NA 11678.18
## 199 NA 7313948.96
## 29 NA 773180.38
## 86 NA 10487714.50
## 143 NA 11725.47
## 200 NA 7371880.83
## 30 NA 784325.12
## 87 NA 10675879.17
## 144 NA 11775.62
## 201 NA 7431787.19
## 31 NA 793726.08
## 88 NA 10836939.76
## 145 NA 11837.36
## 202 NA 7502456.00
## 32 NA 802328.96
## 89 NA 10986711.56
## 146 NA 11912.19
## 203 NA 7585382.80
## 33 NA 810118.93
## 90 NA 11124441.81
## 147 NA 11991.44
## 204 NA 7671215.25
## 34 NA 817104.93
## 91 NA 11249067.37
## 148 NA 12068.18
## 205 NA 7752716.64
## 35 NA 823313.17
## 92 NA 11359322.80
## 149 NA 12140.45
## 206 NA 7828750.29
## 36 NA 828757.86
## 93 NA 11454225.16
## 150 NA 12210.98
## 207 NA 7903362.85
## 37 NA 836027.42
## 94 NA 11576652.16
## 151 NA 12280.09
## 208 NA 7977160.26
## 38 NA 843303.57
## 95 NA 11696736.43
## 152 NA 12351.71
## 209 NA 8054606.05
## 39 NA 849375.70
## 96 NA 11794910.65
## 153 NA 12421.36
## 210 NA 8130414.94
## 40 NA 855246.10
## 97 NA 11889093.68
## 154 NA 12490.26
## 211 NA 8205904.18
## 41 NA 861541.43
## 98 NA 11990677.86
## 155 NA 12557.36
## 212 NA 8279745.11
## 42 NA 868762.89
## 99 NA 12108421.81
## 156 NA 12617.54
## 213 NA 8345752.34
## 43 NA 875235.32
## 100 NA 12213957.90
## 157 NA 12674.24
## 214 NA 8408084.03
## 44 NA 882463.84
## 101 NA 12332485.05
## 158 NA 12726.96
## 215 NA 8466259.74
## 45 NA 889801.23
## 102 NA 12452753.55
## 159 NA 12783.54
## 216 NA 8529873.36
## 46 NA 897093.85
## 103 NA 12572000.09
## 160 NA 12839.85
## 217 NA 8594052.12
## 47 NA 903072.27
## 104 NA 12668536.27
## 161 NA 12898.59
## 218 NA 8662213.69
## 48 NA 910379.84
## 105 NA 12787410.04
## 162 NA 12956.52
## 219 NA 8730297.60
## 49 NA 918574.78
## 106 NA 12921853.63
## 163 NA 13014.27
## 220 NA 8798682.19
## 50 NA 927279.27
## 107 NA 13066373.74
## 164 NA 13068.19
## 221 NA 8862333.79
## 51 NA 935779.00
## 108 NA 13209371.80
## 165 NA 13120.16
## 222 NA 8923126.46
## 52 NA 943324.38
## 109 NA 13338392.80
## 166 NA 13167.12
## 223 NA 8977112.77
## 53 NA 952374.21
## 110 NA 13495051.77
## 167 NA 13214.55
## 224 NA 9031176.23
## 54 NA 961255.29
## 111 NA 13649697.83
## 168 NA 13259.45
## 225 NA 9081880.28
## 55 NA 969726.65
## 112 NA 13796652.26
## 169 NA 13307.11
## 226 NA 9136200.16
## 56 NA 976338.35
## 113 NA 13909810.89
## 170 NA 13353.95
## 227 NA 9190014.44
## 57 NA 985997.02
## 114 NA 14073248.01
## 171 NA 13398.47
## 228 NA 9241520.81
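The sorting step above can also be written with `dplyr::arrange()`, and the printed result lends itself to a quick consistency check. The sketch below is only an illustration: `toy` is a hypothetical data frame holding six rows copied from the output above (Canada and the United States in 1960, Canada in 1961), and the names `by_order`, `by_arrange`, `checked` and `sex_sum` are introduced here rather than taken from the report. It assumes that the male and female counts for a country and year should add up to `total.population`, which holds for the rows shown.

```r
library(dplyr)

# Hypothetical toy subset: six rows copied from the printed output, deliberately shuffled.
toy <- data.frame(
  Country.Name = c("United States", "Canada", "Canada",
                   "United States", "Canada", "Canada"),
  Indicator.Name = c("male", "male", "female", "female", "male", "female"),
  year = as.Date(c("1960-06-23", "1961-06-23", "1960-06-23",
                   "1960-06-23", "1960-06-23", "1961-06-23")),
  male_female_population = c(89503312, 9230695, 8851741,
                             91167688, 9057268, 9040305),
  total.population = c(180671000, 18271000, 17909009,
                       180671000, 17909009, 18271000),
  stringsAsFactors = FALSE
)

# Base R: the same approach as in the report (year first, then country name).
by_order <- toy[order(toy$year, toy$Country.Name), ]

# dplyr alternative: arrange() takes the sort keys as bare column names.
by_arrange <- dplyr::arrange(toy, year, Country.Name)

# Consistency check: male + female should equal the total for each country and year.
checked <- toy %>%
  group_by(Country.Name, year) %>%
  mutate(sex_sum = sum(male_female_population)) %>%
  ungroup()
all(checked$sex_sum == checked$total.population)

# Print only the first rows instead of the whole data frame.
head(by_order, 4)
```

For the full dataset, `head(north_america)` or `str(north_america)` would give a compact view in place of all 228 rows.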