DATA 607 Project 2

Description

This project involves transforming three untidy datasets into tidy datasets using dplyr and tidyr. Once the data is transformed a brief exploratory data analysis will be preformed. Since we are using dplyr and tidyr we will load them now (all other libraries will be loaded as needed).

library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(kableExtra)

American Community Survey

The American Community Survey (henceforth ACS), is a product of the U.S. Census Bureau and measures the current characteristics of the American population. It is the replacement for the Census long form. In this analysis I will be pulling data for all counties in the United States. The two tables contain the count of workers by occupation and gender and their median income.

Research Questions

Here are the questions I want to answer with this data:

  • What occupation has the the greatest pay gap between genders? And which occupations have the most pay equity?

  • How does the pay gap relate to concentration?

About the Data

The data we want to use for this is county level data. We want the most robust numbers so we will be using the latest 5 year estimates available.

Data Acquisition

In order to pull the data I will be using the ACS R package. I have previously registered for an API key, and have use the api.key.install() command to save it to my environment.

# C24010 SEX BY OCCUPATION FOR THE CIVILIAN EMPLOYED POPULATION 16 YEARS AND OVER
file_name <- "data/C24010_2016_5_est.csv"
if(!file.exists(file_name)){
  geos <- acs::geo.make(county = "*", state = "*")
  acs <- acs::acs.fetch(2016, geography = geos, table.number = "C24010", col.names = "pretty")
  acs_df_C24010 <- data.frame(acs::estimate(acs))
  write.csv(acs_df_C24010, file = file_name)
} else {
  acs_df_C24010 <- read.csv(file_name)
}

# B24022 SEX BY OCCUPATION AND MEDIAN EARNINGS IN THE PAST 12 MONTHS (IN 2016 INFLATION-ADJUSTED DOLLARS) FOR THE FULL-TIME, YEAR-ROUND CIVILIAN EMPLOYED POPULATION 16 YEARS AND OVER
file_name <- "data/B24022_2016_5_est.csv"
if(!file.exists(file_name)){
  geos <- acs::geo.make(county = "*", state = "*")
  acs <- acs::acs.fetch(2016, geography = geos, table.number = "B24022", col.names = "pretty")
  acs_df_B24022 <- data.frame(acs::estimate(acs))
  write.csv(acs_df_B24022, file = file_name)
} else {
  acs_df_B24022 <- read.csv(file_name)
}

Creating Tidy Data

The ACS data is broken out by gender and occupation. The rows are counties. It is an extremly wide dataset with 74 columns.

Table 1. C24010 Sex by Occupation for Civilain Employed Pop 16+
X SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Total. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations..Management.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations..Business.and.financial.operations.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Computer.and.mathematical.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Architecture.and.engineering.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Life..physical..and.social.science.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Community.and.social.service.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Legal.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Education..training..and.library.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Arts..design..entertainment..sports..and.media.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations..Health.diagnosing.and.treating.practitioners.and.other.technical.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations..Health.technologists.and.technicians SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Healthcare.support.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Protective.service.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Protective.service.occupations..Fire.fighting.and.prevention..and.other.protective.service.workers.including.supervisors SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Protective.service.occupations..Law.enforcement.workers.including.supervisors SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Food.preparation.and.serving.related.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Building.and.grounds.cleaning.and.maintenance.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Service.occupations..Personal.care.and.service.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Sales.and.office.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Sales.and.office.occupations..Sales.and.related.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Sales.and.office.occupations..Office.and.administrative.support.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Natural.resources..construction..and.maintenance.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Natural.resources..construction..and.maintenance.occupations..Farming..fishing..and.forestry.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Natural.resources..construction..and.maintenance.occupations..Construction.and.extraction.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Natural.resources..construction..and.maintenance.occupations..Installation..maintenance..and.repair.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Production..transportation..and.material.moving.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Production..transportation..and.material.moving.occupations..Production.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Production..transportation..and.material.moving.occupations..Transportation.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Male..Production..transportation..and.material.moving.occupations..Material.moving.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations..Management.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Management..business..and.financial.occupations..Business.and.financial.operations.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Computer.and.mathematical.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Architecture.and.engineering.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Computer..engineering..and.science.occupations..Life..physical..and.social.science.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Community.and.social.service.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Legal.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Education..training..and.library.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Education..legal..community.service..arts..and.media.occupations..Arts..design..entertainment..sports..and.media.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations..Health.diagnosing.and.treating.practitioners.and.other.technical.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Management..business..science..and.arts.occupations..Healthcare.practitioners.and.technical.occupations..Health.technologists.and.technicians SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Healthcare.support.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Protective.service.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Protective.service.occupations..Fire.fighting.and.prevention..and.other.protective.service.workers.including.supervisors SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Protective.service.occupations..Law.enforcement.workers.including.supervisors SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Food.preparation.and.serving.related.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Building.and.grounds.cleaning.and.maintenance.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Service.occupations..Personal.care.and.service.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Sales.and.office.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Sales.and.office.occupations..Sales.and.related.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Sales.and.office.occupations..Office.and.administrative.support.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Natural.resources..construction..and.maintenance.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Natural.resources..construction..and.maintenance.occupations..Farming..fishing..and.forestry.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Natural.resources..construction..and.maintenance.occupations..Construction.and.extraction.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Natural.resources..construction..and.maintenance.occupations..Installation..maintenance..and.repair.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Production..transportation..and.material.moving.occupations. SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Production..transportation..and.material.moving.occupations..Production.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Production..transportation..and.material.moving.occupations..Transportation.occupations SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..Female..Production..transportation..and.material.moving.occupations..Material.moving.occupations
Autauga County, Alabama 24262 12882 3836 1738 1356 382 1030 369 442 219 849 244 45 409 151 219 139 80 2067 69 781 238 543 552 555 110 1975 1066 909 1894 80 956 858 3110 1327 1061 722 11380 4369 1231 634 597 260 168 58 34 1864 229 374 986 275 1014 672 342 2175 475 91 61 30 581 170 858 3893 1329 2564 11 0 11 0 932 595 181 156
Baldwin County, Alabama 87753 46820 14147 6622 5304 1318 2719 816 1647 256 3089 656 504 1148 781 1717 1325 392 6977 168 1057 536 521 2786 2718 248 8378 5331 3047 8819 397 4812 3610 8499 4182 2971 1346 40933 15843 4863 2941 1922 700 238 275 187 6213 682 539 4024 968 4067 2939 1128 8038 1105 243 156 87 3227 1238 2225 15100 6905 8195 410 23 308 79 1542 1031 325 186
Barbour County, Alabama 8993 4791 884 499 402 97 97 41 56 0 250 41 25 160 24 38 13 25 691 0 269 72 197 114 291 17 772 346 426 1020 170 487 363 1424 821 369 234 4202 1518 468 337 131 36 9 16 11 675 74 22 553 26 339 164 175 704 177 27 19 8 261 137 102 1223 444 779 45 15 28 2 712 519 58 135
Bibb County, Alabama 8354 4816 788 357 254 103 139 45 94 0 229 149 0 66 14 63 58 5 631 0 130 36 94 155 346 0 499 267 232 1214 35 713 466 1684 737 442 505 3538 991 187 82 105 12 10 2 0 474 52 0 422 0 318 192 126 851 273 7 3 4 166 198 207 1100 287 813 175 20 155 0 421 102 57 262
Blount County, Alabama 21593 12169 2486 1291 1206 85 487 197 256 34 517 163 25 170 159 191 96 95 1202 81 445 177 268 223 426 27 1850 1162 688 3080 265 1421 1394 3551 1501 1237 813 9424 3851 1168 688 480 73 29 7 37 1145 177 137 702 129 1465 1020 445 1522 374 50 20 30 468 172 458 3271 781 2490 64 18 29 17 716 578 92 46
Bullock County, Alabama 4174 2369 352 244 185 59 11 0 11 0 97 36 61 0 0 0 0 0 356 0 84 0 84 23 249 0 198 145 53 630 316 200 114 833 446 162 225 1805 378 146 103 43 21 0 21 0 110 0 16 69 25 101 46 55 374 145 17 0 17 63 81 68 614 320 294 27 27 0 0 412 389 3 20
Butler County, Alabama 7810 3779 808 379 328 51 96 32 29 35 219 59 12 109 39 114 92 22 526 4 128 40 88 132 224 38 361 215 146 753 50 498 205 1331 648 366 317 4031 1390 440 349 91 52 5 47 0 463 57 23 324 59 435 256 179 811 254 60 20 40 295 90 112 1211 459 752 104 15 23 66 515 471 44 0
Calhoun County, Alabama 47010 24572 5736 2635 2125 510 1046 276 594 176 1486 232 143 728 383 569 363 206 3507 64 966 397 569 1161 937 379 3770 2165 1605 4551 216 2378 1957 7008 3884 1621 1503 22438 7572 1938 1371 567 233 183 7 43 3170 443 38 2533 156 2231 1346 885 4699 948 274 207 67 1542 839 1096 7408 3040 4368 357 57 137 163 2402 1745 332 325
Chambers County, Alabama 14031 7335 1422 645 512 133 253 48 128 77 317 41 36 180 60 207 146 61 922 0 281 168 113 260 358 23 886 470 416 1534 135 528 871 2571 1580 350 641 6696 1815 676 317 359 16 0 16 0 596 112 24 386 74 527 319 208 982 156 19 9 10 360 150 297 2655 951 1704 114 2 33 79 1130 909 67 154
Cherokee County, Alabama 10193 5672 1342 729 656 73 242 42 184 16 167 35 39 67 26 204 137 67 700 12 198 50 148 137 353 0 623 325 298 1243 111 582 550 1764 1053 414 297 4521 1816 335 235 100 83 0 10 73 528 37 21 437 33 870 499 371 915 282 62 62 0 334 30 207 1183 382 801 50 19 24 7 557 416 82 59

Developing a Workplan

In order to make a tidy data frame the first step will be to transform it from wide to long form. Then the gender and occupation needs to be split out from the column header. Spliting the gender will be simple enough but extracting the occupations will require a bit of cleanup. I will use the following function to strip out the non-occupational parts of the string, then replace the periods added to the columns by R with spaces, remove the duplicate spaces and strip off leading and trailing white space.

get_occupation <- function(string){
  string <- str_remove(string, "SEX.BY.OCCUPATION.FOR.THE.CIVILIAN.EMPLOYED.POPULATION.16.YEARS.AND.OVER..")
  string <- str_remove(string, "B24022..Sex.by.Occupation.and.Median.Earnings.in.the.Past.12.Months..in.2015.Inflation.Adjusted.Dollars..for.the.Full.Time..Year.Round.Civilian.Employed.Population.16.Years.and.Over")
  string <- str_remove(string, "Female")
  string <- str_remove(string, "Male")
  string <- str_replace_all(string, "[.]", " ")
  string <- str_replace_all(string, "  ", " ")
  string <- trimws(string)
  # Simplify occupations
  # Select everything after the last capital letter started sequence
  search_for <- str_extract(string, "[A-Z][[:alpha:] ]+[A-Z]")
  replace_with <- str_sub(search_for, start=-1)
  string <- str_replace_all(string, search_for, replace_with)
  string
}

Now that I have the get_occupation function I am ready to apply it to the dataset.

Execution of the Plan

I will start with the worker counts by sex and occupation.

acs_df_C24010 <- acs_df_C24010 %>%
  rename(County = X) %>%
  gather(key = Header, value = Workers, -County) %>%
  mutate(Occupation = get_occupation(Header),
         Gender = ifelse(grepl("Female", Header), "Female", ifelse(grepl("Male", Header), "Male", "Total"))) %>% 
  filter(Gender != "Total") %>%
  select(-Header) %>%
  drop_na()

Thre are 193200 rows in acs_df_C24010. Now I will apply the same prcess to the median earnings data.

acs_df_B24022 <- acs_df_B24022 %>%
  rename(County = X) %>%
  gather(key = Header, value = "Earnings", -County) %>%
  mutate(Occupation = get_occupation(Header),
         Gender = ifelse(grepl("Female", Header), "Female", ifelse(grepl("Male", Header), "Male", "Total"))) %>%
  filter(Gender != "Total") %>%
  select(-Header)

Thre are 231840 rows in acs_df_B24022. The final step in assembling the tidy dataset is to merge the two data frames togther.

acs_df <- merge(acs_df_C24010, acs_df_B24022)

There are 193200 rows in the combined data frame. Since I want to look at the gender concentrations withing the occupation I will need to create that varaible.

acs_df <- acs_df %>%
  select(Occupation, County, Workers) %>%
  group_by(Occupation, County) %>%
  summarise(Total = sum(Workers)) %>% 
  merge(acs_df) %>%
  mutate(Concentration = Workers / Total) %>%
  select(-Total)

Let’s do a quick sanity check on the data. Let’s examine the Workers, Earnigns and Concentration

Table 2. ACS Data Check
Variable min mean median max
Concentration NA NaN NA NA
Earnings -666666666 -1.579328e+08 35144.5 250001
Workers 0 1.073622e+03 145.0 417485

I see there are cases where the median earnings are -666666666. This is the Census Bureau’s equivalent of N/A, so we will remove those observations.

acs_df <- acs_df %>%
  filter(Earnings > 0)
Table 3. ACS Data After Cleanup
Variable min mean median max
Concentration 6.0044e-03 5.625129e-01 5.801944e-01 1
Earnings 2.4990e+03 4.551645e+04 4.161800e+04 250001
Workers 3.0000e+00 1.401674e+03 2.630000e+02 417485

The median earnings are reasonable and the concentration rates range from roughly 0% to 100%. Now I need to compute a female to male earning ratio.

acs_df <- acs_df %>%
  filter(Gender == "Male") %>%
  select(County, Occupation, Earnings) %>%
  rename(Male_Earnings = Earnings) %>%
  merge(acs_df) %>%
  filter(Gender == "Female") %>%
  mutate(Ratio = Earnings / Male_Earnings)

We now have 61145 observations ready for analysis.

Analysis

Occupation with the the Largest and Smallest Pay Gap

The female workers earn 0.04 to 12.79 dollars for every dollar a male worker earns. That’s a considerable range! The median is 0.78 as shown by the verticle line in the graph below.

Figure 1. Earning Ratio Density Plot (Median Verticle Line in Red)

So which occupations are the most inequitable in pay? There may be some highly unusual markest so let’s average the ratios by occupation and look at the results.

acs_avg_df <- acs_df %>%
  group_by(Occupation) %>%
  summarise(Ratio = mean(Ratio), Concentration = mean(Concentration), N = n())
Table 4. ACS Average Summary Statistics
Variable min mean median max
Concentration 0.0380905 0.4554622 0.4331017 0.8652890
N 856.0000000 2038.1666667 1965.0000000 3194.0000000
Ratio 0.5941787 0.8310052 0.8515281 0.9843413

There is no occupation group where women on average earn more than men. Let’s examine which occupations women earn less than their male counterparts.

Table 5. Top 5 Occupations with Income Inequality
Occupation Ratio Concentration N
Legal occupations 0.5941787 0.5457587 1238
Sales and related occupations 0.6559344 0.5361600 3087
Health diagnosing and treating practitioners and other technical occupations 0.6639209 0.7604075 2338
Production occupations 0.7246064 0.2742888 2795
Healthcare practitioners and technical occupations 0.7319898 0.7868184 2657

Legal, sales, and health diagnosing/treatment/technical occupations seem to be the top occpuations with a gender pay gap. It is interesting to note that most of these occupations have a higher than avergae female conentration. Now let’s see which occupations are closer to equal pay.

Table 6. Top 5 Occupations with Lowest Income Inequality
Occupation Ratio Concentration N
Community and social service occupations 0.9843413 0.6134946 2362
Food preparation and serving related occupations 0.9539764 0.6202989 2321
Installation maintenance and repair occupations 0.9472918 0.0458960 949
Computer and mathematical occupations 0.9261895 0.3051553 1597
Life physical and social science occupations 0.9180171 0.4485583 1282

It looks like community/social service, food preperation and serving, installation/repair, computer and science related occupations have the closest pay equality. The concentration levels are a mixed bag.

Pay Gap and Concentration Relationship

Is there any relationship between the equality of gender representation and pay? Our previous example looked at aggregate trends and it appeared to not have a strong relationship. Let’s examine all the observations.

Figure 2. Concentration and Earning Ratio Scatter Plot

Looking at the scatterplot it doesn’t appear to be much of a relationship. Let’s look at the Pearson’s R.

cor(acs_df$Concentration, acs_df$Ratio)
## [1] -0.02527091

This confirms that the relationship is not based on the level of concentration in the industry. Which suggests that if someone wants to improve the gender pay gap and thinks having more females representation in the occupations will solve that problem, they might be disappointed.

U.N. International Migrant Stock

The next dataset that will undergo the cleanup is the 2017 International Migrant Stock data produced by the United Nations Department of Economic and Social Affairs Population Division. They produce multiple datasets. This project will use the immigrants by origin and destination.

Research Questions

Here are the questions I want to answer with this data:

  • What are the top countries receiving/loosing migrants in absolute numbers by year?

  • How does the United States compare?

  • Do communities exist in this network?

About the Data

These estimates of the international migrants have been equated with the foreignborn population whenever this information is available, which is the case in most countries or areas. It is for the mid-point of each year.

Data Acquisition

We will download the migration stock by destination and origin data to our local system.

file_name <- "data/UN_MigrantStockByOriginAndDestination_2017.xlsx"
if(!file.exists(file_name)){
  download.file("http://www.un.org/en/development/desa/population/migration/data/estimates2/data/UN_MigrantStockByOriginAndDestination_2017.xlsx", file_name)
}

Creating Tidy Data

I will begin by reading in a few rows and examing them to determine what actions need to be taken.

Table 7. First Look at UN Data
X1 X2 X3 X4 United.Nations X6 X7 X8 X9 X10 X11 X12 X13 X14 X15 X16 X17 X18 X19 X20 X21 X22 X23 X24 X25 X26 X27 X28 X29 X30 X31 X32 X33 X34 X35 X36 X37 X38 X39 X40 X41 X42 X43 X44 X45 X46 X47 X48 X49 X50 X51 X52 X53 X54 X55 X56 X57 X58 X59 X60 X61 X62 X63 X64 X65 X66 X67 X68 X69 X70 X71 X72 X73 X74 X75 X76 X77 X78 X79 X80 X81 X82 X83 X84 X85 X86 X87 X88 X89 X90 X91 X92 X93 X94 X95 X96 X97 X98 X99 X100 X101 X102 X103 X104 X105 X106 X107 X108 X109 X110 X111 X112 X113 X114 X115 X116 X117 X118 X119 X120 X121 X122 X123 X124 X125 X126 X127 X128 X129 X130 X131 X132 X133 X134 X135 X136 X137 X138 X139 X140 X141 X142 X143 X144 X145 X146 X147 X148 X149 X150 X151 X152 X153 X154 X155 X156 X157 X158 X159 X160 X161 X162 X163 X164 X165 X166 X167 X168 X169 X170 X171 X172 X173 X174 X175 X176 X177 X178 X179 X180 X181 X182 X183 X184 X185 X186 X187 X188 X189 X190 X191 X192 X193 X194 X195 X196 X197 X198 X199 X200 X201 X202 X203 X204 X205 X206 X207 X208 X209 X210 X211 X212 X213 X214 X215 X216 X217 X218 X219 X220 X221 X222 X223 X224 X225 X226 X227 X228 X229 X230 X231 X232 X233 X234 X235 X236 X237 X238 X239 X240 X241
NA NA NA NA Population Division NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA Department of Economic and Social Affairs NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA Workbook: UN_MigrantStockByOriginAndDestination_2017 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA Table 1. Total migrant stock at mid-year by origin and by major area, region, country or area of destination, 1990-2017 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA December 2017 - Copyright © 2017 by United Nations. All rights reserved NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA Suggested citation: United Nations, Department of Economic and Social Affairs. Population Division (2017). Trends in International Migrant Stock: The 2017 revision (United Nations database, POP/DB/MIG/Stock/Rev.2017). NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
Year Sort order Major area, region, country or area of destination Notes Code Type of data (a) Country or area of origin NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
NA NA NA NA NA NA Total Other North Other South Afghanistan Albania Algeria American Samoa Andorra Angola Anguilla Antigua and Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin Bermuda Bhutan Bolivia (Plurinational State of) Bosnia and Herzegovina Botswana Brazil British Virgin Islands Brunei Darussalam Bulgaria Burkina Faso Burundi Cabo Verde Cambodia Cameroon Canada Caribbean Netherlands Cayman Islands Central African Republic Chad Channel Islands Chile China China, Hong Kong SAR China, Macao SAR Colombia Comoros Congo Cook Islands Costa Rica Côte d’Ivoire Croatia Cuba Curaçao Cyprus Czechia Dem. People’s Republic of Korea Democratic Republic of the Congo Denmark Djibouti Dominica Dominican Republic Ecuador Egypt El Salvador Equatorial Guinea Eritrea Estonia Ethiopia Faeroe Islands Falkland Islands (Malvinas) Fiji Finland France French Guiana French Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guadeloupe Guam Guatemala Guinea Guinea-Bissau Guyana Haiti Holy See Honduras Hungary Iceland India Indonesia Iran (Islamic Republic of) Iraq Ireland Isle of Man Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait Kyrgyzstan Lao People’s Democratic Republic Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall Islands Martinique Mauritania Mauritius Mayotte Mexico Micronesia (Fed. States of) Monaco Mongolia Montenegro Montserrat Morocco Mozambique Myanmar Namibia Nauru Nepal Netherlands New Caledonia New Zealand Nicaragua Niger Nigeria Niue Northern Mariana Islands Norway Oman Pakistan Palau Panama Papua New Guinea Paraguay Peru Philippines Poland Portugal Puerto Rico Qatar Republic of Korea Republic of Moldova Réunion Romania Russian Federation Rwanda Saint Helena Saint Kitts and Nevis Saint Lucia Saint Pierre and Miquelon Saint Vincent and the Grenadines Samoa San Marino Sao Tome and Principe Saudi Arabia Senegal Serbia Seychelles Sierra Leone Singapore Sint Maarten (Dutch part) Slovakia Slovenia Solomon Islands Somalia South Africa South Sudan Spain Sri Lanka State of Palestine Sudan Suriname Swaziland Sweden Switzerland Syrian Arab Republic Tajikistan TFYR Macedonia Thailand Timor-Leste Togo Tokelau Tonga Trinidad and Tobago Tunisia Turkey Turkmenistan Turks and Caicos Islands Tuvalu Uganda Ukraine United Arab Emirates United Kingdom United Republic of Tanzania United States of America United States Virgin Islands Uruguay Uzbekistan Vanuatu Venezuela (Bolivarian Republic of) Viet Nam Wallis and Futuna Islands Western Sahara Yemen Zambia Zimbabwe
1990 1990001 WORLD NA 900 NA 152542373 2144536 6342531 6724681 179490 906030 2041 3792 809942 2047 21747 430322 899683 10597 303406 505818 1634120 25172 12767 5451546 84917 1769029 365414 36114 233642 71703 28366 224687 863399 26037 500149 3094 26323 617155 1018868 337199 91681 354608 115311 997144 4325 373 46240 336916 18727 489955 4229860 551080 95648 1009148 40076 91302 17488 69724 370866 426201 835546 41878 174364 277514 38574 401962 202194 5281 42366 465022 213731 1321128 1241877 34972 170617 113928 1687517 7524 260 90156 250690 1197097 2848 3149 15208 36149 920441 3277677 373303 11994 1041455 9510 43250 5828 1376 343623 352590 55320 233660 527307 32 156553 386934 17635 6718862 1636326 629834 1506679 917639 10735 278956 3416421 588820 609032 313737 2972433 242119 4053 81482 522615 482094 215165 506600 191993 516719 76071 3430 341083 27828 58633 121365 562617 2192 647744 110735 1428 11041 134129 108385 1835 4394684 7714 4407 24155 78678 7188 1588218 2218009 685288 16057 1412 748046 728810 4151 388089 442037 132726 448460 5860 2525 138537 12534 3341574 2958 134703 3015 297925 313595 2029190 1533306 1880727 1200835 12204 1624797 625810 3087 813087 12664537 550719 884 20714 22006 485 37043 74861 1415 13941 107037 369263 708804 29376 60952 156201 14368 133461 91652 2212 846967 327661 514943 1449316 885836 1813068 587063 179657 35184 207067 323233 620868 536252 429555 309088 11261 193369 1684 32666 197522 453933 2530619 259991 2311 2350 311490 5549477 79540 3795662 204272 1736288 2362 237258 1429956 5060 185888 1235348 6484 168239 455442 83210 176697
1990 1990002 More developed regions b 901 NA 82391619 1094823 3412144 119338 177192 851351 1027 3737 166746 540 14555 222418 654149 4640 244911 471667 1056582 23365 1730 160805 79548 1571715 338069 32179 14192 71550 455 42741 855982 2827 282279 158 6623 122671 6192 3751 79896 236452 49649 937447 2843 269 8992 3500 18625 187964 1458177 516749 14382 363807 18939 56389 17441 48021 48925 422495 774343 38671 163971 268717 14424 60761 194766 3439 23743 386354 164957 268813 507444 7290 25579 110189 114456 7522 237 85987 247005 861985 61 324 8573 12554 802697 2456479 126156 11920 890987 8997 23826 162 43 240573 14544 16017 206793 293209 11 114272 359584 17545 1230030 308106 516925 145152 909294 10735 166683 2854508 574958 430357 70267 2834662 154331 989 16296 483080 250850 200685 367739 363 18187 25186 3252 310331 27464 42709 11568 188073 303 51228 110489 1117 316 12935 99294 .. 4349767 2764 4128 23989 78437 5174 1407705 78789 44676 1282 465 7163 665910 1268 371394 178897 3298 147701 5821 274 130212 713 445224 12 90071 1749 14467 226813 1345130 1346022 1477882 1180927 904 1382397 567292 125 661086 7568076 6563 539 9886 10108 433 18414 59558 1372 5848 24773 129700 705233 28945 18816 90287 13798 132902 89617 1115 66288 222195 1 926829 260005 35605 15018 160167 549 197343 285092 125340 471240 378026 203796 10514 18737 1523 29974 182904 404841 2439037 249217 221 1171 71017 4672151 5595 3463535 60911 885534 70 56562 1078599 1017 114635 1083015 884 333 11407 25922 40832
1990 1990003 Less developed regions c 902 NA 70150754 1049713 2930387 6605343 2298 54679 1014 55 643196 1507 7192 207904 245534 5957 58495 34151 577538 1807 11037 5290741 5369 197314 27345 3935 219450 153 27911 181946 7417 23210 217870 2936 19700 494484 1012676 333448 11785 118156 65662 59697 1482 104 37248 333416 102 301991 2771683 34331 81266 645341 21137 34913 47 21703 321941 3706 61203 3207 10393 8797 24150 341201 7428 1842 18623 78668 48774 1052315 734433 27682 145038 3739 1573061 2 23 4169 3685 335112 2787 2825 6635 23595 117744 821198 247147 74 150468 513 19424 5666 1333 103050 338046 39303 26867 234098 21 42281 27350 90 5488832 1328220 112909 1361527 8345 .. 112273 561913 13862 178675 243470 137771 87788 3064 65186 39535 231244 14480 138861 191630 498532 50885 178 30752 364 15924 109797 374544 1889 596516 246 311 10725 121194 9091 1835 44917 4950 279 166 241 2014 180513 2139220 640612 14775 947 740883 62900 2883 16695 263140 129428 300759 39 2251 8325 11821 2896350 2946 44632 1266 283458 86782 684060 187284 402845 19908 11300 242400 58518 2962 152001 5096461 544156 345 10828 11898 52 18629 15303 43 8093 82264 239563 3571 431 42136 65914 570 559 2035 1097 780679 105466 514942 522487 625831 1777463 572045 19490 34635 9724 38141 495528 65012 51529 105292 747 174632 161 2692 14618 49092 91582 10774 2090 1179 240473 877326 73945 332127 143361 850754 2292 180696 351357 4043 71253 152333 5600 167906 444035 57288 135865
1990 1990004 Least developed countries d 941 NA 11073226 250575 533018 .. .. 5587 .. .. 593741 .. .. 526 .. .. 1465 190 .. .. 89 2047 .. .. 974 .. 41733 .. 16805 .. .. 652 1631 .. .. .. 40895 330554 11440 14322 33380 1879 .. .. 26440 208385 .. 333 245928 54 .. 828 8549 25312 .. 274 313312 .. 1349 .. 5 .. 79 257417 195 904 .. 2406 .. 30538 .. 355 134471 .. 1432956 .. .. 721 .. 38686 .. .. 5273 13035 .. 4784 62598 .. 41 .. .. .. .. .. 200949 37630 .. .. .. .. .. .. 462759 75246 .. 5020 1 .. .. 2415 .. 9732 .. .. 75003 1054 29 .. 58843 .. 7437 3610 477733 3708 .. .. .. 10810 26559 92880 .. 132718 .. .. .. 107762 113 .. 306 .. .. .. .. .. 3711 1265161 226295 3080 921 10025 555 292 487 .. 39015 116441 .. .. 4240 .. 17673 .. .. 1160 .. 351 818 .. 13026 1546 .. 356 .. 957 .. 1475 537260 .. .. .. .. .. 21 .. 3219 2132 196096 .. 214 39039 10678 .. .. 15 83 757435 42457 498608 1780 115 2697 246795 .. 102 1141 686 2393 40537 .. 32076 .. 33039 .. 24 .. 221 6 .. .. 377 149308 84 62 15797 74016 38334 .. 286 2027 9 2510 71579 .. .. 357 25400 47578
1990 1990005 Less developed regions, excluding least developed countries NA 934 NA 59077528 799138 2397369 6605343 2298 49092 1014 55 49455 1507 7192 207378 245534 5957 57030 33961 577538 1807 10948 5288694 5369 197314 26371 3935 177717 153 11106 181946 7417 22558 216239 2936 19700 494484 971781 2894 345 103834 32282 57818 1482 104 10808 125031 102 301658 2525755 34277 81266 644513 12588 9601 47 21429 8629 3706 59854 3207 10388 8797 24071 83784 7233 938 18623 76262 48774 1021777 734433 27327 10567 3739 140105 2 23 3448 3685 296426 2787 2825 1362 10560 117744 816414 184549 74 150427 513 19424 5666 1333 103050 137097 1673 26867 234098 21 42281 27350 90 5026073 1252974 112909 1356507 8344 .. 112273 559498 13862 168943 243470 137771 12785 2010 65157 39535 172401 14480 131424 188020 20799 47177 178 30752 364 5114 83238 281664 1889 463798 246 311 10725 13432 8978 1835 44611 4950 279 166 241 2014 176802 874059 414317 11695 26 730858 62345 2591 16208 263140 90413 184318 39 2251 4085 11821 2878677 2946 44632 106 283458 86431 683242 187284 389819 18362 11300 242044 58518 2005 152001 5094986 6896 345 10828 11898 52 18629 15282 43 4874 80132 43467 3571 217 3097 55236 570 559 2020 1014 23244 63009 16334 520707 625716 1774766 325250 19490 34533 8583 37455 493135 24475 51529 73216 747 141593 161 2668 14618 48871 91576 10774 2090 802 91165 877242 73883 316330 69345 812420 2292 180410 349330 4034 68743 80754 5600 167906 443678 31888 88287
1990 1990006 High-income countries e 1503 NA 75239010 1129128 3750041 269865 130506 870696 68 3747 166230 1364 19767 295653 76183 5113 256890 471750 33547 24343 2339 1038610 82908 378283 341469 32345 13875 71625 458 49737 495784 2633 302533 2979 6624 104707 5665 3595 79847 236706 48942 955026 3230 295 8873 15899 18630 191920 3393997 543947 83390 371559 18926 55889 17443 48999 48823 364144 792038 39123 163358 253465 5006 60941 193341 3439 33314 428041 167368 988912 508196 7264 33720 33763 221817 7524 242 86020 244824 920276 67 3068 8575 12552 85436 2341943 125922 11924 850206 9002 41916 210 1258 241209 14110 15677 218018 307389 32 115686 345011 17522 3220840 1262444 558435 260814 909863 10735 160317 2865574 584177 447810 220953 58669 154006 988 37136 7158 251279 83242 438825 374 18194 43988 3254 197421 27374 42610 11559 433218 170 51042 109923 1302 326 12944 99283 .. 4354433 7613 3939 2797 55407 6971 1566509 78690 133895 1178 465 198216 676799 3016 374264 179710 3276 150030 5823 2140 129151 699 1364109 2577 92275 1752 16247 235120 1877166 1408466 1482462 1186217 964 1390454 61579 125 782241 1696178 6387 539 17764 18009 435 36004 59557 1346 5860 39846 129613 669796 28947 18818 98375 13931 127080 55158 1115 71322 231960 5082 978205 570988 62634 267235 160906 550 198343 287883 457665 9548 354377 232460 10514 18740 1523 29975 188544 447653 2462019 46257 2307 1138 70937 1013987 52769 3543224 60703 1243574 1567 61771 65706 4358 121817 1043808 6472 223 429974 25598 40644
1990 1990007 Middle-income countries e 1517 NA 68475181 863674 2156273 6454816 48984 30942 1973 45 174216 515 1487 134162 823500 1108 46271 33878 1600573 785 10428 4412312 1867 1390746 22708 3769 178525 78 11103 174950 367615 20379 190098 115 19699 512448 972308 3050 8213 117902 33349 39862 1095 78 28838 308855 97 297659 790296 6978 12258 635974 1872 10627 45 20451 8911 62057 41623 388 11001 24049 33509 138965 8676 938 633 22579 46363 332149 733681 27658 67216 80165 1004248 .. 18 4034 5866 121687 946 .. 1448 10562 835005 931166 185575 70 191208 508 1329 141 118 102414 139915 2557 12415 193114 .. 40867 41923 113 3117161 372978 71399 1245865 7775 .. 118639 548183 3973 160640 92784 2913764 19661 3065 44346 515457 229895 131923 60011 188009 20792 30887 176 143662 454 788 31753 129272 2022 471844 812 126 342 13141 7981 .. 39935 101 468 21358 23271 82 18304 436930 551393 14509 947 549830 44446 292 11294 262327 90570 205357 28 385 9384 11835 1965418 381 42428 1253 281678 77728 150908 124840 389487 13049 11240 228256 564231 1143 30846 10966911 7118 314 265 1012 50 955 15148 69 7974 65332 136902 39008 215 3095 57826 141 6381 36479 1097 156197 53289 11253 469393 314650 1750434 91681 1454 34583 7594 34586 162988 486167 75178 76396 747 141694 161 2643 8889 6280 68594 213734 4 1151 166212 4535406 26771 227178 76599 482311 551 175201 1362223 550 61107 190824 12 168016 25400 14442 123177
1990 1990008 Upper-middle-income countries e 1502 NA 33412332 217939 784944 3152582 48446 4728 1954 43 37435 515 1428 110950 767233 1099 19915 32649 1004974 742 1440 34282 1796 887449 20464 1718 16857 78 13 174543 367272 19646 177525 115 3259 509624 2817 2897 108 74902 19086 27958 1082 10 260 2432 97 291754 223910 6706 12258 633504 1441 8666 45 11459 3058 61914 39481 278 10879 23126 30696 15746 7873 695 627 22270 45229 110351 369556 26868 1929 67176 5405 .. 18 914 5212 70955 933 .. 375 141 741974 863749 9652 63 184100 239 1328 141 5 89633 2129 573 12414 193023 .. 9988 41018 111 60483 281978 49674 1206517 7426 .. 14331 537390 3893 131006 24495 2468541 4795 2074 8350 449222 165694 107830 47029 187710 309 3801 174 108135 447 738 14606 14863 191 18016 806 .. 342 588 5434 .. 22125 101 404 21314 23271 82 7692 347999 234764 11301 26 21400 41585 .. 7803 111840 332 28463 28 282 4837 1595 26627 .. 40905 84 280370 69612 141398 108806 381154 12678 945 213232 369602 1143 24002 3510798 2867 314 264 1012 50 955 15137 69 2387 23877 13402 38973 200 743 32914 109 6294 36479 690 8720 20288 5219 461125 20471 595184 55109 1425 34532 6677 31579 85496 436941 74811 40419 .. 7105 161 2572 8885 2713 60052 175728 4 774 5738 3963480 5402 174388 2888 372371 494 174515 1177624 387 60075 82891 12 168016 17333 13174 88436
1990 1990009 Lower-middle-income countries e 1501 NA 35062849 645735 1371329 3302234 538 26214 19 2 136781 .. 59 23212 56267 9 26356 1229 595599 43 8988 4378030 71 503297 2244 2051 161668 .. 11090 407 343 733 12573 .. 16440 2824 969491 153 8105 43000 14263 11904 13 68 28578 306423 .. 5905 566386 272 .. 2470 431 1961 .. 8992 5853 143 2142 110 122 923 2813 123219 803 243 6 309 1134 221798 364125 790 65287 12989 998843 .. .. 3120 654 50732 13 .. 1073 10421 93031 67417 175923 7 7108 269 1 .. 113 12781 137786 1984 1 91 .. 30879 905 2 3056678 91000 21725 39348 349 .. 104308 10793 80 29634 68289 445223 14866 991 35996 66235 64201 24093 12982 299 20483 27086 2 35527 7 50 17147 114409 1831 453828 6 126 .. 12553 2547 .. 17810 .. 64 44 .. .. 10612 88931 316629 3208 921 528430 2861 292 3491 150487 90238 176894 .. 103 4547 10240 1938791 381 1523 1169 1308 8116 9510 16034 8333 371 10295 15024 194629 .. 6844 7456113 4251 .. 1 .. .. .. 11 .. 5587 41455 123500 35 15 2352 24912 32 87 .. 407 147477 33001 6034 8268 294179 1155250 36572 29 51 917 3007 77492 49226 367 35977 747 134589 .. 71 4 3567 8542 38006 .. 377 160474 571926 21369 52790 73711 109940 57 686 184599 163 1032 107933 .. .. 8067 1268 34741

Developing a Workplan

There is a lot of data at the begining of the file that needs to be overlooked. When I read in the data I will skip down to the row after the one that begins with “year”. The country names are in that row. Thre are “..” that will need to be changed to NAs. The data will need to be renamed. I won’t need the sort order, code or type of data columns so I will drop them.

Execution of the Plan

un_df <- read.xlsx('data/UN_MigrantStockByOriginAndDestination_2017.xlsx', 'Table 1', startRow = 16, na.strings = "..") %>%
  rename(Year = X1,
         Sort.order = X2,
         Destination = X3,
         Notes = X4,
         Code = X5,
         Type.of.data = X6) %>%
  select(-Sort.order, -Code, -Notes)

Now that these steps have been taken let’s take a look at the data frame to determine what to do next.

Table 8. Untidy UN Data
Year Destination Type.of.data Total Other.North Other.South Afghanistan Albania Algeria American.Samoa Andorra Angola Anguilla Antigua.and.Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin Bermuda Bhutan Bolivia.(Plurinational.State.of) Bosnia.and.Herzegovina Botswana Brazil British.Virgin.Islands Brunei.Darussalam Bulgaria Burkina.Faso Burundi Cabo.Verde Cambodia Cameroon Canada Caribbean.Netherlands Cayman.Islands Central.African.Republic Chad Channel.Islands Chile China China,.Hong.Kong.SAR China,.Macao.SAR Colombia Comoros Congo Cook.Islands Costa.Rica Côte.d’Ivoire Croatia Cuba Curaçao Cyprus Czechia Dem..People’s.Republic.of.Korea Democratic.Republic.of.the.Congo Denmark Djibouti Dominica Dominican.Republic Ecuador Egypt El.Salvador Equatorial.Guinea Eritrea Estonia Ethiopia Faeroe.Islands Falkland.Islands.(Malvinas) Fiji Finland France French.Guiana French.Polynesia Gabon Gambia Georgia Germany Ghana Gibraltar Greece Greenland Grenada Guadeloupe Guam Guatemala Guinea Guinea-Bissau Guyana Haiti Holy.See Honduras Hungary Iceland India Indonesia Iran.(Islamic.Republic.of) Iraq Ireland Isle.of.Man Israel Italy Jamaica Japan Jordan Kazakhstan Kenya Kiribati Kuwait Kyrgyzstan Lao.People’s.Democratic.Republic Latvia Lebanon Lesotho Liberia Libya Liechtenstein Lithuania Luxembourg Madagascar Malawi Malaysia Maldives Mali Malta Marshall.Islands Martinique Mauritania Mauritius Mayotte Mexico Micronesia.(Fed..States.of) Monaco Mongolia Montenegro Montserrat Morocco Mozambique Myanmar Namibia Nauru Nepal Netherlands New.Caledonia New.Zealand Nicaragua Niger Nigeria Niue Northern.Mariana.Islands Norway Oman Pakistan Palau Panama Papua.New.Guinea Paraguay Peru Philippines Poland Portugal Puerto.Rico Qatar Republic.of.Korea Republic.of.Moldova Réunion Romania Russian.Federation Rwanda Saint.Helena Saint.Kitts.and.Nevis Saint.Lucia Saint.Pierre.and.Miquelon Saint.Vincent.and.the.Grenadines Samoa San.Marino Sao.Tome.and.Principe Saudi.Arabia Senegal Serbia Seychelles Sierra.Leone Singapore Sint.Maarten.(Dutch.part) Slovakia Slovenia Solomon.Islands Somalia South.Africa South.Sudan Spain Sri.Lanka State.of.Palestine Sudan Suriname Swaziland Sweden Switzerland Syrian.Arab.Republic Tajikistan TFYR.Macedonia Thailand Timor-Leste Togo Tokelau Tonga Trinidad.and.Tobago Tunisia Turkey Turkmenistan Turks.and.Caicos.Islands Tuvalu Uganda Ukraine United.Arab.Emirates United.Kingdom United.Republic.of.Tanzania United.States.of.America United.States.Virgin.Islands Uruguay Uzbekistan Vanuatu Venezuela.(Bolivarian.Republic.of) Viet.Nam Wallis.and.Futuna.Islands Western.Sahara Yemen Zambia Zimbabwe
1990 WORLD NA 152542373 2144536 6342531 6724681 179490 906030 2041 3792 809942 2047 21747 430322 899683 10597 303406 505818 1634120 25172 12767 5451546 84917 1769029 365414 36114 233642 71703 28366 224687 863399 26037 500149 3094 26323 617155 1018868 337199 91681 354608 115311 997144 4325 373 46240 336916 18727 489955 4229860 551080 95648 1009148 40076 91302 17488 69724 370866 426201 835546 41878 174364 277514 38574 401962 202194 5281 42366 465022 213731 1321128 1241877 34972 170617 113928 1687517 7524 260 90156 250690 1197097 2848 3149 15208 36149 920441 3277677 373303 11994 1041455 9510 43250 5828 1376 343623 352590 55320 233660 527307 32 156553 386934 17635 6718862 1636326 629834 1506679 917639 10735 278956 3416421 588820 609032 313737 2972433 242119 4053 81482 522615 482094 215165 506600 191993 516719 76071 3430 341083 27828 58633 121365 562617 2192 647744 110735 1428 11041 134129 108385 1835 4394684 7714 4407 24155 78678 7188 1588218 2218009 685288 16057 1412 748046 728810 4151 388089 442037 132726 448460 5860 2525 138537 12534 3341574 2958 134703 3015 297925 313595 2029190 1533306 1880727 1200835 12204 1624797 625810 3087 813087 12664537 550719 884 20714 22006 485 37043 74861 1415 13941 107037 369263 708804 29376 60952 156201 14368 133461 91652 2212 846967 327661 514943 1449316 885836 1813068 587063 179657 35184 207067 323233 620868 536252 429555 309088 11261 193369 1684 32666 197522 453933 2530619 259991 2311 2350 311490 5549477 79540 3795662 204272 1736288 2362 237258 1429956 5060 185888 1235348 6484 168239 455442 83210 176697
1990 More developed regions NA 82391619 1094823 3412144 119338 177192 851351 1027 3737 166746 540 14555 222418 654149 4640 244911 471667 1056582 23365 1730 160805 79548 1571715 338069 32179 14192 71550 455 42741 855982 2827 282279 158 6623 122671 6192 3751 79896 236452 49649 937447 2843 269 8992 3500 18625 187964 1458177 516749 14382 363807 18939 56389 17441 48021 48925 422495 774343 38671 163971 268717 14424 60761 194766 3439 23743 386354 164957 268813 507444 7290 25579 110189 114456 7522 237 85987 247005 861985 61 324 8573 12554 802697 2456479 126156 11920 890987 8997 23826 162 43 240573 14544 16017 206793 293209 11 114272 359584 17545 1230030 308106 516925 145152 909294 10735 166683 2854508 574958 430357 70267 2834662 154331 989 16296 483080 250850 200685 367739 363 18187 25186 3252 310331 27464 42709 11568 188073 303 51228 110489 1117 316 12935 99294 NA 4349767 2764 4128 23989 78437 5174 1407705 78789 44676 1282 465 7163 665910 1268 371394 178897 3298 147701 5821 274 130212 713 445224 12 90071 1749 14467 226813 1345130 1346022 1477882 1180927 904 1382397 567292 125 661086 7568076 6563 539 9886 10108 433 18414 59558 1372 5848 24773 129700 705233 28945 18816 90287 13798 132902 89617 1115 66288 222195 1 926829 260005 35605 15018 160167 549 197343 285092 125340 471240 378026 203796 10514 18737 1523 29974 182904 404841 2439037 249217 221 1171 71017 4672151 5595 3463535 60911 885534 70 56562 1078599 1017 114635 1083015 884 333 11407 25922 40832
1990 Less developed regions NA 70150754 1049713 2930387 6605343 2298 54679 1014 55 643196 1507 7192 207904 245534 5957 58495 34151 577538 1807 11037 5290741 5369 197314 27345 3935 219450 153 27911 181946 7417 23210 217870 2936 19700 494484 1012676 333448 11785 118156 65662 59697 1482 104 37248 333416 102 301991 2771683 34331 81266 645341 21137 34913 47 21703 321941 3706 61203 3207 10393 8797 24150 341201 7428 1842 18623 78668 48774 1052315 734433 27682 145038 3739 1573061 2 23 4169 3685 335112 2787 2825 6635 23595 117744 821198 247147 74 150468 513 19424 5666 1333 103050 338046 39303 26867 234098 21 42281 27350 90 5488832 1328220 112909 1361527 8345 NA 112273 561913 13862 178675 243470 137771 87788 3064 65186 39535 231244 14480 138861 191630 498532 50885 178 30752 364 15924 109797 374544 1889 596516 246 311 10725 121194 9091 1835 44917 4950 279 166 241 2014 180513 2139220 640612 14775 947 740883 62900 2883 16695 263140 129428 300759 39 2251 8325 11821 2896350 2946 44632 1266 283458 86782 684060 187284 402845 19908 11300 242400 58518 2962 152001 5096461 544156 345 10828 11898 52 18629 15303 43 8093 82264 239563 3571 431 42136 65914 570 559 2035 1097 780679 105466 514942 522487 625831 1777463 572045 19490 34635 9724 38141 495528 65012 51529 105292 747 174632 161 2692 14618 49092 91582 10774 2090 1179 240473 877326 73945 332127 143361 850754 2292 180696 351357 4043 71253 152333 5600 167906 444035 57288 135865
1990 Least developed countries NA 11073226 250575 533018 NA NA 5587 NA NA 593741 NA NA 526 NA NA 1465 190 NA NA 89 2047 NA NA 974 NA 41733 NA 16805 NA NA 652 1631 NA NA NA 40895 330554 11440 14322 33380 1879 NA NA 26440 208385 NA 333 245928 54 NA 828 8549 25312 NA 274 313312 NA 1349 NA 5 NA 79 257417 195 904 NA 2406 NA 30538 NA 355 134471 NA 1432956 NA NA 721 NA 38686 NA NA 5273 13035 NA 4784 62598 NA 41 NA NA NA NA NA 200949 37630 NA NA NA NA NA NA 462759 75246 NA 5020 1 NA NA 2415 NA 9732 NA NA 75003 1054 29 NA 58843 NA 7437 3610 477733 3708 NA NA NA 10810 26559 92880 NA 132718 NA NA NA 107762 113 NA 306 NA NA NA NA NA 3711 1265161 226295 3080 921 10025 555 292 487 NA 39015 116441 NA NA 4240 NA 17673 NA NA 1160 NA 351 818 NA 13026 1546 NA 356 NA 957 NA 1475 537260 NA NA NA NA NA 21 NA 3219 2132 196096 NA 214 39039 10678 NA NA 15 83 757435 42457 498608 1780 115 2697 246795 NA 102 1141 686 2393 40537 NA 32076 NA 33039 NA 24 NA 221 6 NA NA 377 149308 84 62 15797 74016 38334 NA 286 2027 9 2510 71579 NA NA 357 25400 47578
1990 Less developed regions, excluding least developed countries NA 59077528 799138 2397369 6605343 2298 49092 1014 55 49455 1507 7192 207378 245534 5957 57030 33961 577538 1807 10948 5288694 5369 197314 26371 3935 177717 153 11106 181946 7417 22558 216239 2936 19700 494484 971781 2894 345 103834 32282 57818 1482 104 10808 125031 102 301658 2525755 34277 81266 644513 12588 9601 47 21429 8629 3706 59854 3207 10388 8797 24071 83784 7233 938 18623 76262 48774 1021777 734433 27327 10567 3739 140105 2 23 3448 3685 296426 2787 2825 1362 10560 117744 816414 184549 74 150427 513 19424 5666 1333 103050 137097 1673 26867 234098 21 42281 27350 90 5026073 1252974 112909 1356507 8344 NA 112273 559498 13862 168943 243470 137771 12785 2010 65157 39535 172401 14480 131424 188020 20799 47177 178 30752 364 5114 83238 281664 1889 463798 246 311 10725 13432 8978 1835 44611 4950 279 166 241 2014 176802 874059 414317 11695 26 730858 62345 2591 16208 263140 90413 184318 39 2251 4085 11821 2878677 2946 44632 106 283458 86431 683242 187284 389819 18362 11300 242044 58518 2005 152001 5094986 6896 345 10828 11898 52 18629 15282 43 4874 80132 43467 3571 217 3097 55236 570 559 2020 1014 23244 63009 16334 520707 625716 1774766 325250 19490 34533 8583 37455 493135 24475 51529 73216 747 141593 161 2668 14618 48871 91576 10774 2090 802 91165 877242 73883 316330 69345 812420 2292 180410 349330 4034 68743 80754 5600 167906 443678 31888 88287
1990 High-income countries NA 75239010 1129128 3750041 269865 130506 870696 68 3747 166230 1364 19767 295653 76183 5113 256890 471750 33547 24343 2339 1038610 82908 378283 341469 32345 13875 71625 458 49737 495784 2633 302533 2979 6624 104707 5665 3595 79847 236706 48942 955026 3230 295 8873 15899 18630 191920 3393997 543947 83390 371559 18926 55889 17443 48999 48823 364144 792038 39123 163358 253465 5006 60941 193341 3439 33314 428041 167368 988912 508196 7264 33720 33763 221817 7524 242 86020 244824 920276 67 3068 8575 12552 85436 2341943 125922 11924 850206 9002 41916 210 1258 241209 14110 15677 218018 307389 32 115686 345011 17522 3220840 1262444 558435 260814 909863 10735 160317 2865574 584177 447810 220953 58669 154006 988 37136 7158 251279 83242 438825 374 18194 43988 3254 197421 27374 42610 11559 433218 170 51042 109923 1302 326 12944 99283 NA 4354433 7613 3939 2797 55407 6971 1566509 78690 133895 1178 465 198216 676799 3016 374264 179710 3276 150030 5823 2140 129151 699 1364109 2577 92275 1752 16247 235120 1877166 1408466 1482462 1186217 964 1390454 61579 125 782241 1696178 6387 539 17764 18009 435 36004 59557 1346 5860 39846 129613 669796 28947 18818 98375 13931 127080 55158 1115 71322 231960 5082 978205 570988 62634 267235 160906 550 198343 287883 457665 9548 354377 232460 10514 18740 1523 29975 188544 447653 2462019 46257 2307 1138 70937 1013987 52769 3543224 60703 1243574 1567 61771 65706 4358 121817 1043808 6472 223 429974 25598 40644
1990 Middle-income countries NA 68475181 863674 2156273 6454816 48984 30942 1973 45 174216 515 1487 134162 823500 1108 46271 33878 1600573 785 10428 4412312 1867 1390746 22708 3769 178525 78 11103 174950 367615 20379 190098 115 19699 512448 972308 3050 8213 117902 33349 39862 1095 78 28838 308855 97 297659 790296 6978 12258 635974 1872 10627 45 20451 8911 62057 41623 388 11001 24049 33509 138965 8676 938 633 22579 46363 332149 733681 27658 67216 80165 1004248 NA 18 4034 5866 121687 946 NA 1448 10562 835005 931166 185575 70 191208 508 1329 141 118 102414 139915 2557 12415 193114 NA 40867 41923 113 3117161 372978 71399 1245865 7775 NA 118639 548183 3973 160640 92784 2913764 19661 3065 44346 515457 229895 131923 60011 188009 20792 30887 176 143662 454 788 31753 129272 2022 471844 812 126 342 13141 7981 NA 39935 101 468 21358 23271 82 18304 436930 551393 14509 947 549830 44446 292 11294 262327 90570 205357 28 385 9384 11835 1965418 381 42428 1253 281678 77728 150908 124840 389487 13049 11240 228256 564231 1143 30846 10966911 7118 314 265 1012 50 955 15148 69 7974 65332 136902 39008 215 3095 57826 141 6381 36479 1097 156197 53289 11253 469393 314650 1750434 91681 1454 34583 7594 34586 162988 486167 75178 76396 747 141694 161 2643 8889 6280 68594 213734 4 1151 166212 4535406 26771 227178 76599 482311 551 175201 1362223 550 61107 190824 12 168016 25400 14442 123177
1990 Upper-middle-income countries NA 33412332 217939 784944 3152582 48446 4728 1954 43 37435 515 1428 110950 767233 1099 19915 32649 1004974 742 1440 34282 1796 887449 20464 1718 16857 78 13 174543 367272 19646 177525 115 3259 509624 2817 2897 108 74902 19086 27958 1082 10 260 2432 97 291754 223910 6706 12258 633504 1441 8666 45 11459 3058 61914 39481 278 10879 23126 30696 15746 7873 695 627 22270 45229 110351 369556 26868 1929 67176 5405 NA 18 914 5212 70955 933 NA 375 141 741974 863749 9652 63 184100 239 1328 141 5 89633 2129 573 12414 193023 NA 9988 41018 111 60483 281978 49674 1206517 7426 NA 14331 537390 3893 131006 24495 2468541 4795 2074 8350 449222 165694 107830 47029 187710 309 3801 174 108135 447 738 14606 14863 191 18016 806 NA 342 588 5434 NA 22125 101 404 21314 23271 82 7692 347999 234764 11301 26 21400 41585 NA 7803 111840 332 28463 28 282 4837 1595 26627 NA 40905 84 280370 69612 141398 108806 381154 12678 945 213232 369602 1143 24002 3510798 2867 314 264 1012 50 955 15137 69 2387 23877 13402 38973 200 743 32914 109 6294 36479 690 8720 20288 5219 461125 20471 595184 55109 1425 34532 6677 31579 85496 436941 74811 40419 NA 7105 161 2572 8885 2713 60052 175728 4 774 5738 3963480 5402 174388 2888 372371 494 174515 1177624 387 60075 82891 12 168016 17333 13174 88436
1990 Lower-middle-income countries NA 35062849 645735 1371329 3302234 538 26214 19 2 136781 NA 59 23212 56267 9 26356 1229 595599 43 8988 4378030 71 503297 2244 2051 161668 NA 11090 407 343 733 12573 NA 16440 2824 969491 153 8105 43000 14263 11904 13 68 28578 306423 NA 5905 566386 272 NA 2470 431 1961 NA 8992 5853 143 2142 110 122 923 2813 123219 803 243 6 309 1134 221798 364125 790 65287 12989 998843 NA NA 3120 654 50732 13 NA 1073 10421 93031 67417 175923 7 7108 269 1 NA 113 12781 137786 1984 1 91 NA 30879 905 2 3056678 91000 21725 39348 349 NA 104308 10793 80 29634 68289 445223 14866 991 35996 66235 64201 24093 12982 299 20483 27086 2 35527 7 50 17147 114409 1831 453828 6 126 NA 12553 2547 NA 17810 NA 64 44 NA NA 10612 88931 316629 3208 921 528430 2861 292 3491 150487 90238 176894 NA 103 4547 10240 1938791 381 1523 1169 1308 8116 9510 16034 8333 371 10295 15024 194629 NA 6844 7456113 4251 NA 1 NA NA NA 11 NA 5587 41455 123500 35 15 2352 24912 32 87 NA 407 147477 33001 6034 8268 294179 1155250 36572 29 51 917 3007 77492 49226 367 35977 747 134589 NA 71 4 3567 8542 38006 NA 377 160474 571926 21369 52790 73711 109940 57 686 184599 163 1032 107933 NA NA 8067 1268 34741
1990 Low-income countries NA 8533974 141127 419508 NA NA 4137 NA NA 469496 NA NA 466 NA NA 105 190 NA NA NA 624 NA NA 918 NA 41242 NA 16805 NA NA 3025 1263 NA NA NA 40895 330554 3621 NA 33020 1718 NA NA 8529 12162 NA 333 44515 97 NA 828 8549 24786 NA 274 313132 NA 1264 NA 5 NA 59 202056 177 904 NA 2406 NA 67 NA 50 69681 NA 461452 NA NA NA NA 38235 NA NA 5185 13035 NA 4443 61806 NA 41 NA NA NA NA NA 198070 36563 NA NA NA NA NA NA 380769 845 NA NA 1 NA NA 2352 NA 506 NA NA 68452 NA NA NA NA NA 7331 3610 477733 1196 NA NA NA 10810 78053 127 NA 124664 NA NA NA 107762 113 NA 306 NA NA NA NA NA 3405 1702389 NA 370 NA NA 532 NA NA NA 38880 93073 NA NA NA NA 12047 NA NA NA NA 438 1116 NA 7264 1546 NA 6087 NA 957 NA 1448 537214 NA NA NA NA NA NA NA 107 1859 101843 NA 214 39039 NA NA NA 15 NA 619448 42341 498608 1703 198 NA 228147 NA 51 1130 675 30 40537 NA 232 NA 32935 NA NA NA NA 6 NA NA NA 74341 84 NA 23637 66970 9357 NA 286 2027 NA 2510 716 NA NA 68 43170 12876

There are regional aggregates that I don’t want to include These aggregates dont have a type of data column. I will use that to remove them. I will then drop the Notes, Total, Other.North and Other.South columns.

un_df <- un_df %>%
  filter(str_length(Type.of.data)>0) %>%
  select(-Type.of.data:-Other.South)

Now we are ready to change the data from wide to long.

un_df <- un_df %>%
  gather(key = Origin, value = Migrants, -Year, -Destination) %>%
  drop_na(Migrants)

Now I need to clean up the Origins columns. There are periods that represent spaces. When there are two periods it needs to be changed into a period and space. An example of that is Micronesia.(Fed..States.of). I will also reorder the columns.

un_df <- un_df %>%
  mutate(Origin = str_replace_all(Origin, "[.]", " ")) %>%
  mutate(Origin = str_replace_all(Origin, "  ", ". ")) %>%
  mutate(Migrants = as.numeric(Migrants)) %>%
  mutate(Origin = as.vector(Origin)) %>%
  mutate(Destination = as.vector(Destination)) %>%
  select(Origin, Destination, Migrants, Year)

And here is the results:

Origin Destination Migrants Year
Afghanistan Egypt 237 1990
Afghanistan Libya 677 1990
Afghanistan Namibia 64 1990
Afghanistan South Africa 59 1990
Afghanistan Tajikistan 8485 1990
Afghanistan India 14159 1990
Afghanistan Iran (Islamic Republic of) 3123968 1990
Afghanistan Pakistan 3276673 1990
Afghanistan Malaysia 32 1990
Afghanistan Philippines 25 1990

Analysis

Top 5 Receivers of International Migrants

What countries are the top receiverd of migrants they are receiving by year? Does the contries on this list change over time? Note: These data are not adjusted for population.

top_destinations <- un_df %>%
  select(-Origin) %>%
  group_by(Year, Destination) %>%
  summarise(Migrants = sum(Migrants)) %>%
  rename(Country = Destination) %>%
  arrange(Year, -Migrants) %>%
  top_n(5) %>%
  mutate(Rank = 1:n())
Table 9. Top 5 Migrant Destinations by Year
Rank 1990 1995 2000 2005 2010 2015 2017
1 United States of America United States of America United States of America United States of America United States of America United States of America United States of America
2 Russian Federation Russian Federation Russian Federation Russian Federation Russian Federation Russian Federation Germany
3 India Germany Germany Germany Germany Saudi Arabia Saudi Arabia
4 Ukraine India India France Saudi Arabia Germany Russian Federation
5 Pakistan France France Saudi Arabia United Kingdom United Kingdom United Kingdom

The United States is consitently at the top of the list. I guess we continue to be a nation of immigrants. Russia, Germany and the United Kingdom look like other hot spots for immigrants.

The United States

I was originally going to look at how the United States stacks up in the ranking but we know the answer to that from the last analysis. I will look deeper into the United States. What countries do most of the immgigrants come from?

rankings_df <- un_df %>%
  filter(Destination == "United States of America") %>%
  group_by(Year, Origin) %>%
  summarise(Migrants = sum(Migrants)) %>%
  rename(Country = Origin) %>%
  arrange(Year, -Migrants) %>%
  top_n(10) %>%
  mutate(Rank = 1:n())
Table 10. Countries the United States Immigrants Come From
Rank 1990 1995 2000 2005 2010 2015 2017
1 Mexico Mexico Mexico Mexico Mexico Mexico Mexico
2 Puerto Rico Puerto Rico Puerto Rico Puerto Rico China China China
3 Philippines Philippines Philippines China Philippines India India
4 China China China Philippines India Philippines Philippines
5 Canada Cuba India India Puerto Rico Puerto Rico Puerto Rico
6 Cuba Canada Viet Nam Viet Nam Viet Nam Viet Nam Viet Nam
7 Germany Viet Nam Cuba El Salvador El Salvador El Salvador El Salvador
8 United Kingdom India Republic of Korea Republic of Korea Republic of Korea Cuba Cuba
9 Italy Republic of Korea Canada Cuba Cuba Republic of Korea Republic of Korea
10 Republic of Korea Germany El Salvador Canada Canada Dominican Republic Dominican Republic

Mexico seems to consistently be the highest source of immigrants to the United States. China and India have recently increased in popularity. Puerto Rico is often high on the list. The mix of top countries is pretty consistent year over year.

Network Communities

library(igraph)
un_graph_2017 <- un_df %>%
  filter(Year == 2017) %>%
  select(Origin, Destination) %>%
  graph_from_data_frame()

communities <- cluster_walktrap(un_graph_2017)

communities
## IGRAPH clustering walktrap, groups: 10, mod: 0.1
## + groups:
##   $`1`
##    [1] "American Samoa" "Cook Islands"   "Fiji"           "Kiribati"      
##    [5] "Nauru"          "Niue"           "Samoa"          "Tokelau"       
##    [9] "Tonga"          "Tuvalu"        
##   
##   $`2`
##     [1] "Afghanistan"                     
##     [2] "Albania"                         
##     [3] "Algeria"                         
##     [4] "Andorra"                         
##   + ... omitted several groups/vertices

There are 10 coomunities that emerge. I would like to visulize it but that would take too much time. I have one more dataset to look at.

NYS Adult Arrests by County

This is the datase featured in my discussion board post. It is produced by the NYS Division of Criminal Justics Services and is the count of adult arrests.

Research Questions

Here are the questions I want to answer with this data:

  • What is the general trend in total arrests?

About the Data

The counts of arrests are for fingerprintable offenses. An adult arrest is defined as an arrest of a person 16 years old or older or a juvenile offender prosecuted in adult court. Fingerprintable offenses (defined in Criminal Procedure Law §160.10) include any felony, a misdemeanor defined in the penal law, a misdemeanor defined outside the penal law which would constitute a felony if such a person had a previous judgment of conviction for a crime, or loitering for the purpose of engaging in prostitution as defined in subdivision two of Penal Law §240.37.

Data Acquisition

The dataset from the New York State Division of Criminal Justice Services was accessed via data.ny.gov. It was last updated on August 7, 2018 at the time of this pull.

file_name <- "data/NYS_Adult_Arrests_by_County.csv"
if(!file.exists(file_name)){
  download.file("https://data.ny.gov/api/views/rikd-mt35/rows.csv?accessType=DOWNLOAD", file_name)
}

arrest_df <- read.csv(file_name)

Creating Tidy Data

I will begin by reading in a few rows and examing them to determine what actions need to be taken.

Table 11. NYS Adult Arrests
County Year Total Felony.Total Drug.Felony Violent.Felony DWI.Felony Other.Felony Misdemeanor.Total Drug.Misd DWI.Misd Property.Misd Other.Misd
Albany 1970 1226 688 97 191 5 395 538 207 48 95 188
Albany 1971 1833 829 131 231 6 461 1004 204 111 272 417
Albany 1972 3035 1054 211 256 8 579 1981 285 297 541 858
Albany 1973 3573 1134 244 274 28 588 2439 369 497 668 905
Albany 1974 4255 1329 281 308 17 723 2926 437 619 885 985
Albany 1975 4173 1259 209 344 12 694 2914 398 463 977 1076
Albany 1976 4601 1435 201 434 26 774 3166 362 574 1011 1219
Albany 1977 4819 1342 122 403 45 772 3477 270 858 1133 1216
Albany 1978 5768 1485 85 433 58 909 4283 157 1540 1333 1253
Albany 1979 6534 1660 144 513 65 938 4874 224 1846 1418 1386
Albany 1980 6335 1519 99 508 79 833 4816 257 1670 1476 1413
Albany 1981 6453 1737 158 601 81 897 4716 221 1483 1502 1510
Albany 1982 6405 1684 180 642 122 740 4721 188 1789 1384 1360
Albany 1983 6317 1706 139 632 126 809 4611 158 1851 1361 1241
Albany 1984 5932 1702 150 592 116 844 4230 199 1602 1196 1233
Albany 1985 6261 1736 162 569 159 846 4525 195 1736 1346 1248
Albany 1986 6660 2026 174 721 144 987 4634 247 1479 1398 1510
Albany 1987 6952 1842 224 686 118 814 5110 300 1649 1637 1524
Albany 1988 7749 1975 332 686 138 819 5774 413 1647 1839 1875
Albany 1989 8768 2377 573 715 149 940 6391 402 1817 2075 2097
Albany 1990 9165 2339 370 754 158 1057 6826 255 1878 2247 2446
Albany 1991 9937 2743 603 826 140 1174 7194 589 1520 2289 2796
Albany 1992 10190 2786 598 842 182 1164 7404 765 1406 2462 2771
Albany 1993 10182 2918 760 860 149 1149 7264 797 1308 2250 2909
Albany 1994 10004 2988 736 876 132 1244 7016 880 1176 2309 2651

Developing a Workplan

This is a wide data set that needs to be transformed to a long dataset. The column headers will need to be cleaned up after the transformation.

Execution of the Plan

arrest_df <- arrest_df %>%
  gather(Type, Arrests, -County, -Year) %>%
  mutate(Type = gsub("[.]", " ", Type))

This is realitively straight forward and is ready for an analysis.

Table 12. Tidy NYS Adult Arrests
County Year Type Arrests
Albany 1970 Total 1226
Albany 1971 Total 1833
Albany 1972 Total 3035
Albany 1973 Total 3573
Albany 1974 Total 4255
Albany 1975 Total 4173
Albany 1976 Total 4601
Albany 1977 Total 4819
Albany 1978 Total 5768
Albany 1979 Total 6534
Albany 1980 Total 6335
Albany 1981 Total 6453
Albany 1982 Total 6405
Albany 1983 Total 6317
Albany 1984 Total 5932
Albany 1985 Total 6261
Albany 1986 Total 6660
Albany 1987 Total 6952
Albany 1988 Total 7749
Albany 1989 Total 8768
Albany 1990 Total 9165
Albany 1991 Total 9937
Albany 1992 Total 10190
Albany 1993 Total 10182
Albany 1994 Total 10004

Analysis

General Trend in Arrests

Crime rates have been declining generally across the country. Are the arrests rates decreasing in NYS?

Figure 3. Total Adult Arrests by County and Year

It’s difficult to tell. The difference in scale is making it really difficult. If I were continuing the analysis I would express the data on a per capita basis. I can do it as a percentage change from 1970

For both of these questions I want to look at the change in percentage terms. I will quickly add in the percentage change to the dataset. I will base all the changes from the earliest data, 1970.

arrest_df <- arrest_df %>%
  filter(Year == 1970) %>%
  select( -Year) %>%
  rename(Arrests_Base = Arrests) %>%
  merge(arrest_df) %>%
  mutate(Change = (Arrests / Arrests_Base)-1)
Figure 4. Percentage Change from 1970 Level (No Change in Red)

It appears that the number of arrests are up since 1970 which is unexpected.

arrest_df %>%
  group_by(Year) %>%
  filter(Type == "Total") %>%
  select(Arrests, Change) %>%
  summarise_all(funs(min=min, median=median, mean = mean, max=max)) %>%
  gather(stat, val, -Year)  %>%
  separate(stat, into = c("Variable", "stat"), "_") %>%
  spread(stat, val) %>%
  select(Variable, Year, min, mean, median, max) %>%
  arrange(Variable, Year) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("striped", "hover", "responsive")) %>%
  scroll_box(width = "100%", height = "200px")
## Adding missing grouping variables: `Year`
Variable Year min mean median max
Arrests 1970 23.0000000 2742.1250000 223.0000000 5.182400e+04
Arrests 1971 59.0000000 3292.3125000 433.5000000 5.584900e+04
Arrests 1972 88.0000000 3897.3593750 781.5000000 5.710200e+04
Arrests 1973 121.0000000 3809.6250000 937.0000000 4.945000e+04
Arrests 1974 122.0000000 4293.7031250 1055.0000000 5.269200e+04
Arrests 1975 140.0000000 4455.5312500 1124.0000000 5.379400e+04
Arrests 1976 106.0000000 4778.8125000 1144.5000000 6.292100e+04
Arrests 1977 91.0000000 4918.8593750 1111.0000000 7.692400e+04
Arrests 1978 86.0000000 4970.3593750 1208.5000000 7.585500e+04
Arrests 1979 69.0000000 5061.5625000 1333.5000000 6.598900e+04
Arrests 1980 102.0000000 5049.4687500 1449.5000000 5.746800e+04
Arrests 1981 98.0000000 5306.4375000 1402.0000000 6.784900e+04
Arrests 1982 92.0000000 5870.1718750 1574.5000000 8.524200e+04
Arrests 1983 84.0000000 6004.6093750 1507.0000000 9.397600e+04
Arrests 1984 77.0000000 6328.7031250 1442.0000000 9.907400e+04
Arrests 1985 100.0000000 6606.4062500 1511.5000000 1.054290e+05
Arrests 1986 86.0000000 7149.9843750 1593.0000000 1.137840e+05
Arrests 1987 86.0000000 7541.0312500 1675.5000000 1.127390e+05
Arrests 1988 93.0000000 7968.6562500 1861.5000000 1.113210e+05
Arrests 1989 113.0000000 8246.4218750 1972.0000000 1.059500e+05
Arrests 1990 35.0000000 8122.8437500 2126.5000000 1.046460e+05
Arrests 1991 13.0000000 7956.7031250 2210.5000000 1.035500e+05
Arrests 1992 31.0000000 7647.9062500 2028.5000000 9.736500e+04
Arrests 1993 28.0000000 7593.7812500 2059.0000000 9.687000e+04
Arrests 1994 45.0000000 8456.0156250 2071.0000000 1.126740e+05
Arrests 1995 23.0000000 8741.7968750 2113.5000000 1.143890e+05
Arrests 1996 46.0000000 8896.9843750 2160.5000000 1.093160e+05
Arrests 1997 40.0000000 9210.0937500 2181.0000000 1.144940e+05
Arrests 1998 39.0000000 9308.2500000 2074.5000000 1.126800e+05
Arrests 1999 98.0000000 8787.6190476 2055.0000000 9.890500e+04
Arrests 2000 80.0000000 9123.7619048 2178.0000000 1.001920e+05
Arrests 2001 1.0000000 8335.0625000 2021.5000000 9.186300e+04
Arrests 2002 74.0000000 8376.6507937 2145.0000000 8.459000e+04
Arrests 2003 79.0000000 8207.7936508 2045.0000000 7.942800e+04
Arrests 2004 1.0000000 8124.2343750 2023.0000000 8.391100e+04
Arrests 2005 77.0000000 8422.4444444 2058.0000000 8.196500e+04
Arrests 2006 93.0000000 8720.5873016 2154.0000000 8.522300e+04
Arrests 2007 90.0000000 9142.4126984 2199.0000000 9.259600e+04
Arrests 2008 67.0000000 9180.2063492 2153.0000000 9.180500e+04
Arrests 2009 58.0000000 9316.7619048 2002.0000000 9.540200e+04
Arrests 2010 95.0000000 9314.2857143 1950.0000000 9.548300e+04
Arrests 2011 80.0000000 9172.0476190 2066.0000000 9.563100e+04
Arrests 2012 102.0000000 8961.2857143 1997.0000000 9.132000e+04
Arrests 2013 75.0000000 8729.7301587 2005.0000000 8.652800e+04
Arrests 2014 66.0000000 8434.3333333 1879.0000000 8.234800e+04
Arrests 2015 67.0000000 7794.3492063 1725.0000000 7.430200e+04
Arrests 2016 66.0000000 7625.8571429 1824.0000000 7.384400e+04
Arrests 2017 51.0000000 7133.0952381 1840.0000000 6.596700e+04
Change 1970 0.0000000 0.0000000 0.0000000 0.000000e+00
Change 1971 -0.1038961 0.7169906 0.6075843 2.082192e+00
Change 1972 -0.7157490 1.8812727 1.5375921 6.258993e+00
Change 1973 -0.4711908 2.5372940 2.1667064 8.035971e+00
Change 1974 -0.6939821 2.9962392 2.5911565 9.702703e+00
Change 1975 -0.5505762 3.2688592 3.0252239 9.013699e+00
Change 1976 -0.3341869 3.3889699 2.9589547 1.150685e+01
Change 1977 0.0806658 3.2103357 2.8194958 9.520548e+00
Change 1978 -0.2970551 3.4745031 2.9726789 1.046575e+01
Change 1979 0.0501014 4.1127151 3.6325398 1.352055e+01
Change 1980 -0.0034991 4.6125421 4.0172231 1.656164e+01
Change 1981 0.0733230 4.6196455 4.2254792 1.546575e+01
Change 1982 0.0486111 4.7633325 4.2262675 1.779452e+01
Change 1983 -0.7887324 4.7577780 4.2568198 1.665753e+01
Change 1984 -0.8796415 4.8991264 4.2193356 1.656164e+01
Change 1985 -0.8425096 5.1121035 4.2336060 1.823288e+01
Change 1986 -0.8156210 5.3386590 4.6518420 1.764384e+01
Change 1987 -0.8591549 5.5961969 4.9073519 1.861644e+01
Change 1988 -0.8706786 6.1244213 5.3743184 1.846575e+01
Change 1989 -0.8527529 6.7085685 6.1282821 1.936986e+01
Change 1990 -0.9551857 7.1866145 6.7804582 1.969863e+01
Change 1991 -0.9833547 7.2096979 7.0042062 1.973973e+01
Change 1992 -0.9603073 6.8707563 6.4226012 1.957534e+01
Change 1993 -0.9641485 6.7130617 6.2063510 1.767164e+01
Change 1994 -0.9423816 6.6492056 6.3303807 1.775373e+01
Change 1995 -0.9705506 7.0171122 6.7958592 1.984932e+01
Change 1996 -0.9411012 7.4747759 7.0133136 2.234247e+01
Change 1997 -0.9487836 7.2161148 7.1190220 1.841096e+01
Change 1998 -0.9500640 7.2158654 7.0959047 1.867910e+01
Change 1999 -0.3194444 7.0638271 6.5513627 1.883562e+01
Change 2000 -0.4305556 7.0340923 6.4343607 2.105479e+01
Change 2001 -0.9987196 6.7430754 6.2890361 1.860448e+01
Change 2002 -0.4861111 7.0475491 6.6936620 2.039726e+01
Change 2003 -0.4513889 7.0313270 6.4239130 2.109589e+01
Change 2004 -0.9987196 6.8762596 5.9845039 1.846269e+01
Change 2005 -0.4652778 6.9483520 6.1793478 1.945522e+01
Change 2006 -0.2500000 7.0285105 6.5194805 1.789552e+01
Change 2007 -0.2777778 6.9550611 6.6141304 1.705970e+01
Change 2008 -0.2638889 7.0818360 6.4966887 2.096269e+01
Change 2009 -0.2013889 7.1599724 6.0857143 2.137313e+01
Change 2010 0.0625000 7.0762433 6.3380952 2.058209e+01
Change 2011 -0.0902778 6.9641257 6.1329527 2.271642e+01
Change 2012 -0.2361111 7.0960175 6.3076923 2.298507e+01
Change 2013 -0.1250000 6.7837454 6.0946746 2.160448e+01
Change 2014 -0.2222222 6.4165523 5.6867863 2.013699e+01
Change 2015 -0.2638889 6.1525654 5.7712509 2.043284e+01
Change 2016 -0.2500000 6.1381153 5.8641304 1.977612e+01
Change 2017 -0.3472222 5.9585375 5.5756738 1.838060e+01

It appears that there are some oddities with this dataset that would need to be better understood before carying out further analyses.

Mike Silva

October 1, 2018