In the wake of the Great Recession of 2007-2009, there has been a good deal of focus on employment statistics, some of the most important metrics policymakers use to gauge the overall strength of the economy. In the United States, the government measures unemployment using the Current Population Survey (CPS), which collects demographic and employment information from a wide range of Americans each month. In this exercise, we will apply the topics reviewed in the lectures, as well as a few new techniques, to the September 2013 version of this rich, nationally representative dataset (available online).

The observations in the dataset represent people surveyed in the September 2013 CPS who actually completed a survey. While the full dataset has 385 variables, in this exercise we will use a more compact version of the dataset, CPSData.csv, which has the following variables:

PeopleInHousehold: The number of people in the interviewee’s household.

Region: The census region where the interviewee lives.

State: The state where the interviewee lives.

MetroAreaCode: A code that identifies the metropolitan area in which the interviewee lives (missing if the interviewee does not live in a metropolitan area). The mapping from codes to names of metropolitan areas is provided in the file MetroAreaCodes.csv.

Age: The age, in years, of the interviewee. 80 represents people aged 80-84, and 85 represents people aged 85 and older.

Married: The marriage status of the interviewee.

Sex: The sex of the interviewee.

Education: The highest level of education attained by the interviewee.

Race: The race of the interviewee.

Hispanic: Whether the interviewee is of Hispanic ethnicity.

CountryOfBirthCode: A code identifying the country of birth of the interviewee. The mapping from codes to names of countries is provided in the file CountryCodes.csv.

Citizenship: The United States citizenship status of the interviewee.

EmploymentStatus: The status of employment of the interviewee.

Industry: The industry of employment of the interviewee (only available if they are employed).

Section 1 - Loading and Summarizing the Dataset

1.1

Load the dataset from CPSData.csv into a data frame called CPS, and view the dataset with the summary() and str() commands.

CPS <- read.csv("CPSData.csv")
summary(CPS)
##  PeopleInHousehold       Region               State       MetroAreaCode  
##  Min.   : 1.000    Midwest  :30684   California  :11570   Min.   :10420  
##  1st Qu.: 2.000    Northeast:25939   Texas       : 7077   1st Qu.:21780  
##  Median : 3.000    South    :41502   New York    : 5595   Median :34740  
##  Mean   : 3.284    West     :33177   Florida     : 5149   Mean   :35075  
##  3rd Qu.: 4.000                      Pennsylvania: 3930   3rd Qu.:41860  
##  Max.   :15.000                      Illinois    : 3912   Max.   :79600  
##                                      (Other)     :94069   NA's   :34238  
##       Age                 Married          Sex       
##  Min.   : 0.00   Divorced     :11151   Female:67481  
##  1st Qu.:19.00   Married      :55509   Male  :63821  
##  Median :39.00   Never Married:30772                 
##  Mean   :38.83   Separated    : 2027                 
##  3rd Qu.:57.00   Widowed      : 6505                 
##  Max.   :85.00   NA's         :25338                 
##                                                      
##                    Education                   Race       
##  High school            :30906   American Indian :  1433  
##  Bachelor's degree      :19443   Asian           :  6520  
##  Some college, no degree:18863   Black           : 13913  
##  No high school diploma :16095   Multiracial     :  2897  
##  Associate degree       : 9913   Pacific Islander:   618  
##  (Other)                :10744   White           :105921  
##  NA's                   :25338                            
##     Hispanic      CountryOfBirthCode               Citizenship    
##  Min.   :0.0000   Min.   : 57.00     Citizen, Native     :116639  
##  1st Qu.:0.0000   1st Qu.: 57.00     Citizen, Naturalized:  7073  
##  Median :0.0000   Median : 57.00     Non-Citizen         :  7590  
##  Mean   :0.1393   Mean   : 82.68                                  
##  3rd Qu.:0.0000   3rd Qu.: 57.00                                  
##  Max.   :1.0000   Max.   :555.00                                  
##                                                                   
##            EmploymentStatus                               Industry    
##  Disabled          : 5712   Educational and health services   :15017  
##  Employed          :61733   Trade                             : 8933  
##  Not in Labor Force:15246   Professional and business services: 7519  
##  Retired           :18619   Manufacturing                     : 6791  
##  Unemployed        : 4203   Leisure and hospitality           : 6364  
##  NA's              :25789   (Other)                           :21618  
##                             NA's                              :65060
str(CPS)
## 'data.frame':    131302 obs. of  14 variables:
##  $ PeopleInHousehold : int  1 3 3 3 3 3 3 2 2 2 ...
##  $ Region            : Factor w/ 4 levels "Midwest","Northeast",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ State             : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MetroAreaCode     : int  26620 13820 13820 13820 26620 26620 26620 33660 33660 26620 ...
##  $ Age               : int  85 21 37 18 52 24 26 71 43 52 ...
##  $ Married           : Factor w/ 5 levels "Divorced","Married",..: 5 3 3 3 5 3 3 1 1 3 ...
##  $ Sex               : Factor w/ 2 levels "Female","Male": 1 2 1 2 1 2 2 1 2 2 ...
##  $ Education         : Factor w/ 8 levels "Associate degree",..: 1 4 4 6 1 2 4 4 4 2 ...
##  $ Race              : Factor w/ 6 levels "American Indian",..: 6 3 3 3 6 6 6 6 6 6 ...
##  $ Hispanic          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CountryOfBirthCode: int  57 57 57 57 57 57 57 57 57 57 ...
##  $ Citizenship       : Factor w/ 3 levels "Citizen, Native",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ EmploymentStatus  : Factor w/ 5 levels "Disabled","Employed",..: 4 5 1 3 2 2 2 2 3 2 ...
##  $ Industry          : Factor w/ 14 levels "Agriculture, forestry, fishing, and hunting",..: NA 11 NA NA 11 4 14 4 NA 12 ...

1.2

Among the interviewees with a value reported for the Industry variable, what is the most common industry of employment? Please enter the name exactly as it appears in the output.

table(CPS$Industry)
## 
## Agriculture, forestry, fishing, and hunting 
##                                        1307 
##                                Armed forces 
##                                          29 
##                                Construction 
##                                        4387 
##             Educational and health services 
##                                       15017 
##                                   Financial 
##                                        4347 
##                                 Information 
##                                        1328 
##                     Leisure and hospitality 
##                                        6364 
##                               Manufacturing 
##                                        6791 
##                                      Mining 
##                                         550 
##                              Other services 
##                                        3224 
##          Professional and business services 
##                                        7519 
##                       Public administration 
##                                        3186 
##                                       Trade 
##                                        8933 
##                Transportation and utilities 
##                                        3260
max(table(CPS$Industry))
## [1] 15017
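
Note that max() returns only the largest count, not the industry it belongs to. A minimal sketch of recovering the name as well, using a small made-up vector (the toy values and variable names are assumptions for illustration, not CPS data):

```r
# Toy stand-in for CPS$Industry (hypothetical values; NA = no industry reported)
industry <- factor(c("Trade", "Mining", "Trade", NA, "Financial", "Trade", "Mining"))

counts <- table(industry)          # table() drops NA values by default
top <- names(which.max(counts))    # name of the level with the largest count
top
```

On the real data, the same pattern would be names(which.max(table(CPS$Industry))).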

1.3

Recall from the homework assignment “The Analytical Detective” that you can call the sort() function on the output of the table() function to obtain a sorted breakdown of a variable. For instance, sort(table(CPS$Region)) sorts the regions by the number of interviewees from that region.

Which state has the fewest interviewees?

table(CPS$State)
## 
##              Alabama               Alaska              Arizona 
##                 1376                 1590                 1528 
##             Arkansas           California             Colorado 
##                 1421                11570                 2925 
##          Connecticut             Delaware District of Columbia 
##                 2836                 2214                 1791 
##              Florida              Georgia               Hawaii 
##                 5149                 2807                 2099 
##                Idaho             Illinois              Indiana 
##                 1518                 3912                 2004 
##                 Iowa               Kansas             Kentucky 
##                 2528                 1935                 1841 
##            Louisiana                Maine             Maryland 
##                 1450                 2263                 3200 
##        Massachusetts             Michigan            Minnesota 
##                 1987                 3063                 3139 
##          Mississippi             Missouri              Montana 
##                 1230                 2145                 1214 
##             Nebraska               Nevada        New Hampshire 
##                 1949                 1856                 2662 
##           New Jersey           New Mexico             New York 
##                 2567                 1102                 5595 
##       North Carolina         North Dakota                 Ohio 
##                 2619                 1645                 3678 
##             Oklahoma               Oregon         Pennsylvania 
##                 1523                 1943                 3930 
##         Rhode Island       South Carolina         South Dakota 
##                 2209                 1658                 2000 
##            Tennessee                Texas                 Utah 
##                 1784                 7077                 1842 
##              Vermont             Virginia           Washington 
##                 1890                 2953                 2366 
##        West Virginia            Wisconsin              Wyoming 
##                 1409                 2686                 1624
sort(table(CPS$State)) 
## 
##           New Mexico              Montana          Mississippi 
##                 1102                 1214                 1230 
##              Alabama        West Virginia             Arkansas 
##                 1376                 1409                 1421 
##            Louisiana                Idaho             Oklahoma 
##                 1450                 1518                 1523 
##              Arizona               Alaska              Wyoming 
##                 1528                 1590                 1624 
##         North Dakota       South Carolina            Tennessee 
##                 1645                 1658                 1784 
## District of Columbia             Kentucky                 Utah 
##                 1791                 1841                 1842 
##               Nevada              Vermont               Kansas 
##                 1856                 1890                 1935 
##               Oregon             Nebraska        Massachusetts 
##                 1943                 1949                 1987 
##         South Dakota              Indiana               Hawaii 
##                 2000                 2004                 2099 
##             Missouri         Rhode Island             Delaware 
##                 2145                 2209                 2214 
##                Maine           Washington                 Iowa 
##                 2263                 2366                 2528 
##           New Jersey       North Carolina        New Hampshire 
##                 2567                 2619                 2662 
##            Wisconsin              Georgia          Connecticut 
##                 2686                 2807                 2836 
##             Colorado             Virginia             Michigan 
##                 2925                 2953                 3063 
##            Minnesota             Maryland                 Ohio 
##                 3139                 3200                 3678 
##             Illinois         Pennsylvania              Florida 
##                 3912                 3930                 5149 
##             New York                Texas           California 
##                 5595                 7077                11570

Which state has the largest number of interviewees?

The sorted output above also answers this question: sort() orders the counts from smallest to largest, so the last entry, California with 11570 interviewees, is the state with the most interviewees.
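
Because sort() arranges the counts in ascending order, the first and last entries of the sorted table are the smallest and largest groups. A small sketch on made-up data (the state values and variable names here are hypothetical) shows one way to pull them out without scanning the printout:

```r
# Toy stand-in for CPS$State (hypothetical values)
state <- c("Ohio", "Texas", "Texas", "Utah", "Texas", "Ohio")

by_size <- sort(table(state))                # ascending order of counts
fewest  <- names(by_size)[1]                 # first entry has the smallest count
most    <- names(by_size)[length(by_size)]   # last entry has the largest count
c(fewest = fewest, most = most)
```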

1.4

What proportion of interviewees are citizens of the United States?

table(CPS$Citizenship)
## 
##      Citizen, Native Citizen, Naturalized          Non-Citizen 
##               116639                 7073                 7590
# Native and naturalized citizens are both citizens of the United States
(116639+7073)/(116639+7073+7590)
## [1] 0.9421943
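
Both native-born and naturalized citizens are citizens of the United States, so both rows belong in the numerator. A minimal sketch of that arithmetic, with the counts copied from the table above (the variable names are illustrative, not part of the assignment):

```r
# Counts copied from the table(CPS$Citizenship) output above
citizenship <- c("Citizen, Native"      = 116639,
                 "Citizen, Naturalized" = 7073,
                 "Non-Citizen"          = 7590)

shares <- citizenship / sum(citizenship)   # each category as a share of all interviewees
citizen_share <- sum(shares[c("Citizen, Native", "Citizen, Naturalized")])
round(citizen_share, 4)
```

The result is roughly 0.942.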

1.5

The CPS differentiates between race (with possible values American Indian, Asian, Black, Pacific Islander, White, or Multiracial) and ethnicity. A number of interviewees are of Hispanic ethnicity, as captured by the Hispanic variable. For which races are there at least 250 interviewees in the CPS dataset of Hispanic ethnicity? (Select all that apply.)

  • American Indian
  • Asian
  • Black
  • Multiracial
  • Pacific Islander
  • White
table(CPS$Race, CPS$Hispanic)
##                   
##                        0     1
##   American Indian   1129   304
##   Asian             6407   113
##   Black            13292   621
##   Multiracial       2449   448
##   Pacific Islander   541    77
##   White            89190 16731
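
The answer can be read off the column for Hispanic equal to 1. As a sketch, the same check can be done with a logical filter; the counts below are copied from that column, and the variable names are assumptions for illustration:

```r
# Hispanic == 1 column copied from the table above
hispanic_by_race <- c("American Indian" = 304, "Asian" = 113, "Black" = 621,
                      "Multiracial" = 448, "Pacific Islander" = 77, "White" = 16731)

# Keep only the races with at least 250 Hispanic interviewees
qualifying <- names(hispanic_by_race)[hispanic_by_race >= 250]
qualifying
```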

Section 2 - Evaluating Missing Values

2.1

Which variables have at least one interviewee with a missing (NA) value? (Select all that apply.)

  • PeopleInHousehold
  • Region
  • State
  • MetroAreaCode
  • Age
  • Married
  • Sex
  • Education
  • Race
  • Hispanic
  • CountryOfBirthCode
  • Citizenship
  • EmploymentStatus
  • Industry
summary(CPS)
##  PeopleInHousehold       Region               State       MetroAreaCode  
##  Min.   : 1.000    Midwest  :30684   California  :11570   Min.   :10420  
##  1st Qu.: 2.000    Northeast:25939   Texas       : 7077   1st Qu.:21780  
##  Median : 3.000    South    :41502   New York    : 5595   Median :34740  
##  Mean   : 3.284    West     :33177   Florida     : 5149   Mean   :35075  
##  3rd Qu.: 4.000                      Pennsylvania: 3930   3rd Qu.:41860  
##  Max.   :15.000                      Illinois    : 3912   Max.   :79600  
##                                      (Other)     :94069   NA's   :34238  
##       Age                 Married          Sex       
##  Min.   : 0.00   Divorced     :11151   Female:67481  
##  1st Qu.:19.00   Married      :55509   Male  :63821  
##  Median :39.00   Never Married:30772                 
##  Mean   :38.83   Separated    : 2027                 
##  3rd Qu.:57.00   Widowed      : 6505                 
##  Max.   :85.00   NA's         :25338                 
##                                                      
##                    Education                   Race       
##  High school            :30906   American Indian :  1433  
##  Bachelor's degree      :19443   Asian           :  6520  
##  Some college, no degree:18863   Black           : 13913  
##  No high school diploma :16095   Multiracial     :  2897  
##  Associate degree       : 9913   Pacific Islander:   618  
##  (Other)                :10744   White           :105921  
##  NA's                   :25338                            
##     Hispanic      CountryOfBirthCode               Citizenship    
##  Min.   :0.0000   Min.   : 57.00     Citizen, Native     :116639  
##  1st Qu.:0.0000   1st Qu.: 57.00     Citizen, Naturalized:  7073  
##  Median :0.0000   Median : 57.00     Non-Citizen         :  7590  
##  Mean   :0.1393   Mean   : 82.68                                  
##  3rd Qu.:0.0000   3rd Qu.: 57.00                                  
##  Max.   :1.0000   Max.   :555.00                                  
##                                                                   
##            EmploymentStatus                               Industry    
##  Disabled          : 5712   Educational and health services   :15017  
##  Employed          :61733   Trade                             : 8933  
##  Not in Labor Force:15246   Professional and business services: 7519  
##  Retired           :18619   Manufacturing                     : 6791  
##  Unemployed        : 4203   Leisure and hospitality           : 6364  
##  NA's              :25789   (Other)                           :21618  
##                             NA's                              :65060
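
Reading NA rows out of a long summary() printout is easy to get wrong. One alternative, sketched here on a made-up data frame (the toy values are hypothetical), is to count missing values per column directly:

```r
# Toy data frame (hypothetical values) with NAs in two of its three columns
toy <- data.frame(Age      = c(12, 40, 67),
                  Married  = c(NA, "Married", "Widowed"),
                  Industry = c(NA, "Trade", NA))

na_counts <- colSums(is.na(toy))              # number of missing values per column
has_na    <- names(na_counts)[na_counts > 0]  # columns with at least one NA
has_na
```

Applied to the real data, colSums(is.na(CPS)) should report the same NA counts that summary(CPS) shows.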

2.2

When evaluating a new dataset, we often check whether its missing values follow a pattern. Here we will look for a pattern in the missing values of the Married variable. The expression

is.na(CPS$Married)

returns a vector of TRUE/FALSE values indicating whether the Married variable is missing for each interviewee. We can break down whether Married is missing by the reported value of the Region variable with the command

table(CPS$Region, is.na(CPS$Married))

Which of the following statements is the most accurate?

  • The Married variable being missing is related to the Region value for the interviewee.
  • The Married variable being missing is related to the Sex value for the interviewee.
  • The Married variable being missing is related to the Age value for the interviewee.
  • The Married variable being missing is related to the Citizenship value for the interviewee.
  • The Married variable being missing is not related to the Region, Sex, Age, or Citizenship value for the interviewee.
table(CPS$Region, is.na(CPS$Married))
##            
##             FALSE  TRUE
##   Midwest   24609  6075
##   Northeast 21432  4507
##   South     33535  7967
##   West      26388  6789
table(CPS$Sex, is.na(CPS$Married))
##         
##          FALSE  TRUE
##   Female 55264 12217
##   Male   50700 13121
table(CPS$Age, is.na(CPS$Married))
##     
##      FALSE TRUE
##   0      0 1283
##   1      0 1559
##   2      0 1574
##   3      0 1693
##   4      0 1695
##   5      0 1795
##   6      0 1721
##   7      0 1681
##   8      0 1729
##   9      0 1748
##   10     0 1750
##   11     0 1721
##   12     0 1797
##   13     0 1802
##   14     0 1790
##   15  1795    0
##   16  1751    0
##   17  1764    0
##   18  1596    0
##   19  1517    0
##   20  1398    0
##   21  1525    0
##   22  1536    0
##   23  1638    0
##   24  1627    0
##   25  1604    0
##   26  1643    0
##   27  1657    0
##   28  1736    0
##   29  1645    0
##   30  1854    0
##   31  1762    0
##   32  1790    0
##   33  1804    0
##   34  1653    0
##   35  1716    0
##   36  1663    0
##   37  1531    0
##   38  1530    0
##   39  1542    0
##   40  1571    0
##   41  1673    0
##   42  1711    0
##   43  1819    0
##   44  1764    0
##   45  1749    0
##   46  1665    0
##   47  1647    0
##   48  1791    0
##   49  1989    0
##   50  1966    0
##   51  1931    0
##   52  1935    0
##   53  1994    0
##   54  1912    0
##   55  1895    0
##   56  1935    0
##   57  1827    0
##   58  1874    0
##   59  1758    0
##   60  1746    0
##   61  1735    0
##   62  1595    0
##   63  1596    0
##   64  1519    0
##   65  1569    0
##   66  1577    0
##   67  1227    0
##   68  1130    0
##   69  1062    0
##   70  1195    0
##   71  1031    0
##   72   941    0
##   73   896    0
##   74   842    0
##   75   763    0
##   76   729    0
##   77   698    0
##   78   659    0
##   79   661    0
##   80  2664    0
##   85  2446    0
table(CPS$Citizenship, is.na(CPS$Married))
##                       
##                        FALSE  TRUE
##   Citizen, Native      91956 24683
##   Citizen, Naturalized  6910   163
##   Non-Citizen           7098   492
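
In the Age table, Married is missing for every interviewee aged 0 to 14 and for no one aged 15 or older, while the Region, Sex, and Citizenship tables show missing values spread across every category. A compact way to confirm a cutoff like this is to group on a logical condition; the sketch below uses made-up values, and the cutoff of 15 is taken from the table above:

```r
# Toy data (hypothetical): marital status is recorded only from age 15 up
age     <- c(3, 9, 14, 15, 22, 40, 80)
married <- c(NA, NA, NA, "Never Married", "Married", "Divorced", "Widowed")

# Share of missing Married values for under-15s (TRUE) and everyone else (FALSE)
missing_by_group <- tapply(is.na(married), age < 15, mean)
missing_by_group
```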

2.3

As mentioned in the variable descriptions, MetroAreaCode is missing if an interviewee does not live in a metropolitan area. Using the same technique as in the previous question, answer the following questions about people who live in non-metropolitan areas.

How many states had all interviewees living in a non-metropolitan area (that is, every interviewee has a missing MetroAreaCode value)? For this question, treat the District of Columbia as a state (even though it is not technically a state).

"2"
## [1] "2"

How many states had all interviewees living in a metropolitan area? Again, treat the District of Columbia as a state.

table(CPS$State, is.na(CPS$MetroAreaCode))
##                       
##                        FALSE  TRUE
##   Alabama               1020   356
##   Alaska                   0  1590
##   Arizona               1327   201
##   Arkansas               724   697
##   California           11333   237
##   Colorado              2545   380
##   Connecticut           2593   243
##   Delaware              1696   518
##   District of Columbia  1791     0
##   Florida               4947   202
##   Georgia               2250   557
##   Hawaii                1576   523
##   Idaho                  761   757
##   Illinois              3473   439
##   Indiana               1420   584
##   Iowa                  1297  1231
##   Kansas                1234   701
##   Kentucky               908   933
##   Louisiana             1216   234
##   Maine                  909  1354
##   Maryland              2978   222
##   Massachusetts         1858   129
##   Michigan              2517   546
##   Minnesota             2150   989
##   Mississippi            376   854
##   Missouri              1440   705
##   Montana                199  1015
##   Nebraska               816  1133
##   Nevada                1609   247
##   New Hampshire         1148  1514
##   New Jersey            2567     0
##   New Mexico             832   270
##   New York              5144   451
##   North Carolina        1642   977
##   North Dakota           432  1213
##   Ohio                  2754   924
##   Oklahoma              1024   499
##   Oregon                1519   424
##   Pennsylvania          3245   685
##   Rhode Island          2209     0
##   South Carolina        1139   519
##   South Dakota           595  1405
##   Tennessee             1149   635
##   Texas                 6060  1017
##   Utah                  1455   387
##   Vermont                657  1233
##   Virginia              2367   586
##   Washington            1937   429
##   West Virginia          344  1065
##   Wisconsin             1882   804
##   Wyoming                  0  1624
"3"
## [1] "3"
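
Both counts come from looking for zeros in the table: a zero in the FALSE column means every interviewee in that state is non-metropolitan, and a zero in the TRUE column means every interviewee is metropolitan. A sketch of counting those rows programmatically, on made-up data (state labels and codes are hypothetical):

```r
# Toy data (hypothetical): an NA metro code means a non-metropolitan interviewee
state <- c("A", "A", "B", "B", "C", "C")
metro <- c(NA, NA, 101, 102, NA, 103)

tab <- table(state, is.na(metro))          # columns "FALSE" (metro) and "TRUE" (non-metro)
all_nonmetro <- sum(tab[, "FALSE"] == 0)   # states with no metropolitan interviewees
all_metro    <- sum(tab[, "TRUE"] == 0)    # states with no non-metropolitan interviewees
c(all_nonmetro = all_nonmetro, all_metro = all_metro)
```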

2.4

Which region of the United States has the largest proportion of interviewees living in a non-metropolitan area?

  • Midwest
  • Northeast
  • South
  • West
table(CPS$Region, is.na(CPS$MetroAreaCode))
##            
##             FALSE  TRUE
##   Midwest   20010 10674
##   Northeast 20330  5609
##   South     31631  9871
##   West      25093  8084
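
The table gives counts, so each region's proportion still has to be computed as the TRUE count divided by the row total. A sketch of that arithmetic, with the counts copied from the table above (the vector names are illustrative):

```r
# Counts copied from the table above (FALSE = metropolitan, TRUE = non-metropolitan)
metro     <- c(Midwest = 20010, Northeast = 20330, South = 31631, West = 25093)
non_metro <- c(Midwest = 10674, Northeast =  5609, South =  9871, West =  8084)

# Non-metropolitan share of each region's interviewees
share <- non_metro / (metro + non_metro)
round(share, 3)
top_region <- names(which.max(share))
top_region
```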

2.5

While we were able to use the table() command to compute the proportion of interviewees from each region not living in a metropolitan area, it was somewhat tedious (it involved manually computing the proportion for each region) and would not scale to a variable with many more levels. There is a less tedious way to compute the proportion of values that are TRUE. The mean() function, which takes the average of the values passed to it, treats TRUE as 1 and FALSE as 0, so for a logical vector it returns the proportion of values that are TRUE. For instance, mean(c(TRUE, FALSE, TRUE, TRUE)) returns 0.75. Knowing this, use tapply() with the mean function to answer the following questions:
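
To make the idea concrete, here is a small sketch on made-up data (the vectors and their values are hypothetical): mean() on a logical vector gives the overall proportion of TRUE values, and tapply() applies the same calculation within each group.

```r
# Toy data (hypothetical): TRUE marks a non-metropolitan interviewee
non_metro <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
region    <- c("West", "West", "South", "South", "South", "West")

overall   <- mean(non_metro)                  # proportion of TRUE values overall
by_region <- tapply(non_metro, region, mean)  # proportion of TRUE within each region
by_region
```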

Which state has a proportion of interviewees living in a non-metropolitan area closest to 30%?

tapply(is.na(CPS$MetroAreaCode), CPS$State, mean)
##              Alabama               Alaska              Arizona 
##           0.25872093           1.00000000           0.13154450 
##             Arkansas           California             Colorado 
##           0.49049965           0.02048401           0.12991453 
##          Connecticut             Delaware District of Columbia 
##           0.08568406           0.23396567           0.00000000 
##              Florida              Georgia               Hawaii 
##           0.03923092           0.19843249           0.24916627 
##                Idaho             Illinois              Indiana 
##           0.49868248           0.11221881           0.29141717 
##                 Iowa               Kansas             Kentucky 
##           0.48694620           0.36227390           0.50678979 
##            Louisiana                Maine             Maryland 
##           0.16137931           0.59832081           0.06937500 
##        Massachusetts             Michigan            Minnesota 
##           0.06492199           0.17825661           0.31506849 
##          Mississippi             Missouri              Montana 
##           0.69430894           0.32867133           0.83607908 
##             Nebraska               Nevada        New Hampshire 
##           0.58132376           0.13308190           0.56874530 
##           New Jersey           New Mexico             New York 
##           0.00000000           0.24500907           0.08060769 
##       North Carolina         North Dakota                 Ohio 
##           0.37304315           0.73738602           0.25122349 
##             Oklahoma               Oregon         Pennsylvania 
##           0.32764281           0.21821925           0.17430025 
##         Rhode Island       South Carolina         South Dakota 
##           0.00000000           0.31302774           0.70250000 
##            Tennessee                Texas                 Utah 
##           0.35594170           0.14370496           0.21009772 
##              Vermont             Virginia           Washington 
##           0.65238095           0.19844226           0.18131868 
##        West Virginia            Wisconsin              Wyoming 
##           0.75585522           0.29932986           1.00000000
"Wisconsin"
## [1] "Wisconsin"
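
Rather than scanning the printed proportions by eye, the closest value to a target can be picked out programmatically. The following is a minimal sketch, not part of the original solution: `props` holds a handful of the proportions printed above (copied by hand), and `target` and `closest` are illustrative names. With the real data, `props` would be the full result of the tapply() call.

```r
# A few of the state proportions printed above, copied by hand for illustration
props <- c(Indiana = 0.29141717, Wisconsin = 0.29932986,
           "South Carolina" = 0.31302774, Minnesota = 0.31506849)
target <- 0.30

# which.min() returns the position of the smallest absolute distance;
# names() recovers the state label attached to that position
closest <- names(which.min(abs(props - target)))
closest
```

The same pattern works for any target value, since only the distance from `target` is being minimized.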

Which state has the largest proportion of non-metropolitan interviewees, ignoring states where all interviewees were non-metropolitan?

sort(tapply(is.na(CPS$MetroAreaCode), CPS$State, mean))
## District of Columbia           New Jersey         Rhode Island 
##           0.00000000           0.00000000           0.00000000 
##           California              Florida        Massachusetts 
##           0.02048401           0.03923092           0.06492199 
##             Maryland             New York          Connecticut 
##           0.06937500           0.08060769           0.08568406 
##             Illinois             Colorado              Arizona 
##           0.11221881           0.12991453           0.13154450 
##               Nevada                Texas            Louisiana 
##           0.13308190           0.14370496           0.16137931 
##         Pennsylvania             Michigan           Washington 
##           0.17430025           0.17825661           0.18131868 
##              Georgia             Virginia                 Utah 
##           0.19843249           0.19844226           0.21009772 
##               Oregon             Delaware           New Mexico 
##           0.21821925           0.23396567           0.24500907 
##               Hawaii                 Ohio              Alabama 
##           0.24916627           0.25122349           0.25872093 
##              Indiana            Wisconsin       South Carolina 
##           0.29141717           0.29932986           0.31302774 
##            Minnesota             Oklahoma             Missouri 
##           0.31506849           0.32764281           0.32867133 
##            Tennessee               Kansas       North Carolina 
##           0.35594170           0.36227390           0.37304315 
##                 Iowa             Arkansas                Idaho 
##           0.48694620           0.49049965           0.49868248 
##             Kentucky        New Hampshire             Nebraska 
##           0.50678979           0.56874530           0.58132376 
##                Maine              Vermont          Mississippi 
##           0.59832081           0.65238095           0.69430894 
##         South Dakota         North Dakota        West Virginia 
##           0.70250000           0.73738602           0.75585522 
##              Montana               Alaska              Wyoming 
##           0.83607908           1.00000000           1.00000000
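
The "ignoring states where all interviewees were non-metropolitan" condition can also be expressed directly as a filter before taking the maximum. This is a hedged sketch on four of the values printed above (copied by hand); `partial` and `top_state` are illustrative names rather than anything defined in the exercise.

```r
# Four of the largest proportions from the sorted output above
props <- c("West Virginia" = 0.75585522, Montana = 0.83607908,
           Alaska = 1.00000000, Wyoming = 1.00000000)

# Keep only states with at least one metropolitan interviewee (proportion < 1),
# then find the largest remaining proportion
partial <- props[props < 1]
top_state <- names(which.max(partial))
top_state
```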

Section 3 - Integrating Metropolitan Area Data

Codes like MetroAreaCode and CountryOfBirthCode are a compact way to encode factor variables with text as their possible values, and they are therefore quite common in survey datasets. In fact, all but one of the variables in this dataset were actually stored by a numeric code in the original CPS datafile.

When analyzing a variable stored by a numeric code, we will often want to convert it into the values the codes represent. To do this, we will use a dictionary, which maps the code to the actual value of the variable. We have provided dictionaries MetroAreaCodes.csv and CountryCodes.csv, which respectively map MetroAreaCode and CountryOfBirthCode into their true values. Read these two dictionaries into data frames MetroAreaMap and CountryMap.

3.1

How many observations (codes for metropolitan areas) are there in MetroAreaMap?

MetroAreaMap <- read.csv("MetroAreaCodes.csv")
CountryMap <- read.csv("CountryCodes.csv")
str(MetroAreaMap)
## 'data.frame':    271 obs. of  2 variables:
##  $ Code     : int  460 3000 3160 3610 3720 6450 10420 10500 10580 10740 ...
##  $ MetroArea: Factor w/ 271 levels "Akron, OH","Albany-Schenectady-Troy, NY",..: 12 92 97 117 122 195 1 3 2 4 ...

How many observations (codes for countries) are there in CountryMap?

str(CountryMap)
## 'data.frame':    149 obs. of  2 variables:
##  $ Code   : int  57 66 73 78 96 100 102 103 104 105 ...
##  $ Country: Factor w/ 149 levels "Afghanistan",..: 139 57 105 135 97 3 11 18 24 37 ...
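
The observation count can also be read off directly with nrow() instead of from the first line of the str() output. The sketch below uses a two-row inline CSV as a stand-in for MetroAreaCodes.csv (the two code/name pairs are taken from the str() output above); `csv_text` and `toy_map` are illustrative names.

```r
# Inline CSV text standing in for a dictionary file; names containing commas
# must be quoted so read.csv() keeps them in a single field
csv_text <- 'Code,MetroArea
10420,"Akron, OH"
10580,"Albany-Schenectady-Troy, NY"'

toy_map <- read.csv(text = csv_text)

n_codes <- nrow(toy_map)  # number of observations (codes)
n_vars  <- ncol(toy_map)  # number of variables
n_codes
```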

3.2

To merge in the metropolitan areas, we want to connect the field MetroAreaCode from the CPS data frame with the field Code in MetroAreaMap. The following command merges the two data frames on these columns, overwriting the CPS data frame with the result:

CPS = merge(CPS, MetroAreaMap, by.x="MetroAreaCode", by.y="Code", all.x=TRUE)

The first two arguments determine the data frames to be merged (they are called “x” and “y”, respectively, in the subsequent parameters to the merge function). by.x="MetroAreaCode" means we’re matching on the MetroAreaCode variable from the “x” data frame (CPS), while by.y="Code" means we’re matching on the Code variable from the “y” data frame (MetroAreaMap). Finally, all.x=TRUE means we want to keep all rows from the “x” data frame (CPS), even if some of the rows’ MetroAreaCode doesn’t match any codes in MetroAreaMap (for those familiar with database terminology, this parameter makes the operation a left outer join instead of an inner join).
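
To make the effect of all.x=TRUE concrete, here is a small sketch on made-up data (the ages and the code 99999 are invented for illustration; `x`, `y`, `left`, and `inner` are hypothetical names). One row of `x` carries a code that is absent from the dictionary and one has a missing code.

```r
# Toy stand-in for CPS: two matching codes, one unmatched code, one NA
x <- data.frame(MetroAreaCode = c(10420, 10580, 99999, NA),
                Age = c(30, 41, 52, 63))
# Toy stand-in for MetroAreaMap
y <- data.frame(Code = c(10420, 10580),
                MetroArea = c("Akron, OH", "Albany-Schenectady-Troy, NY"))

# Left outer join: every row of x is kept, unmatched rows get NA for MetroArea
left <- merge(x, y, by.x = "MetroAreaCode", by.y = "Code", all.x = TRUE)
# Inner join (the default): rows of x without a match in y are dropped
inner <- merge(x, y, by.x = "MetroAreaCode", by.y = "Code")

nrow(left)
nrow(inner)
sum(is.na(left$MetroArea))
```

In this toy case the left join keeps all four rows and the inner join keeps two, so the two rows that would have been lost are exactly the ones with a missing MetroArea after the left join.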

Review the new version of the CPS data frame with the summary() and str() functions. What is the name of the variable that was added to the data frame by the merge() operation?

str(CPS)
## 'data.frame':    131302 obs. of  14 variables:
##  $ PeopleInHousehold : int  1 3 3 3 3 3 3 2 2 2 ...
##  $ Region            : Factor w/ 4 levels "Midwest","Northeast",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ State             : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ MetroAreaCode     : int  26620 13820 13820 13820 26620 26620 26620 33660 33660 26620 ...
##  $ Age               : int  85 21 37 18 52 24 26 71 43 52 ...
##  $ Married           : Factor w/ 5 levels "Divorced","Married",..: 5 3 3 3 5 3 3 1 1 3 ...
##  $ Sex               : Factor w/ 2 levels "Female","Male": 1 2 1 2 1 2 2 1 2 2 ...
##  $ Education         : Factor w/ 8 levels "Associate degree",..: 1 4 4 6 1 2 4 4 4 2 ...
##  $ Race              : Factor w/ 6 levels "American Indian",..: 6 3 3 3 6 6 6 6 6 6 ...
##  $ Hispanic          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ CountryOfBirthCode: int  57 57 57 57 57 57 57 57 57 57 ...
##  $ Citizenship       : Factor w/ 3 levels "Citizen, Native",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ EmploymentStatus  : Factor w/ 5 levels "Disabled","Employed",..: 4 5 1 3 2 2 2 2 3 2 ...
##  $ Industry          : Factor w/ 14 levels "Agriculture, forestry, fishing, and hunting",..: NA 11 NA NA 11 4 14 4 NA 12 ...
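
The str() output above lists 14 variables and no MetroArea column, whereas a data frame produced by the merge would be expected to carry one extra column taken from MetroAreaMap. It may therefore be worth confirming that the merge was actually applied before relying on CPS$MetroArea. The sketch below, on made-up data with the hypothetical name `cps_toy`, shows why this matters: when a data frame has no MetroArea column, `$` partially matches the name to MetroAreaCode and returns the numeric codes without complaint.

```r
# Toy data frame that has MetroAreaCode but no MetroArea column
cps_toy <- data.frame(MetroAreaCode = c(10420, NA, 10580),
                      Age = c(30, 41, 52))

# Explicit check for the column by its exact name
has_metro_area <- "MetroArea" %in% names(cps_toy)

# `$` uses partial matching, so this silently returns MetroAreaCode
via_dollar <- cps_toy$MetroArea

# `[[` matches exactly by default and returns NULL for a missing column
via_brackets <- cps_toy[["MetroArea"]]

has_metro_area
identical(via_dollar, cps_toy$MetroAreaCode)
is.null(via_brackets)
```

If later tables are labeled by numeric codes instead of area names, this kind of partial match is one plausible cause; checking names(CPS) settles it.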

How many interviewees have a missing value for the new metropolitan area variable? Note that all of these interviewees would have been removed from the merged data frame if we did not include the all.x=TRUE parameter.

summary(CPS)
##  PeopleInHousehold       Region               State       MetroAreaCode  
##  Min.   : 1.000    Midwest  :30684   California  :11570   Min.   :10420  
##  1st Qu.: 2.000    Northeast:25939   Texas       : 7077   1st Qu.:21780  
##  Median : 3.000    South    :41502   New York    : 5595   Median :34740  
##  Mean   : 3.284    West     :33177   Florida     : 5149   Mean   :35075  
##  3rd Qu.: 4.000                      Pennsylvania: 3930   3rd Qu.:41860  
##  Max.   :15.000                      Illinois    : 3912   Max.   :79600  
##                                      (Other)     :94069   NA's   :34238  
##       Age                 Married          Sex       
##  Min.   : 0.00   Divorced     :11151   Female:67481  
##  1st Qu.:19.00   Married      :55509   Male  :63821  
##  Median :39.00   Never Married:30772                 
##  Mean   :38.83   Separated    : 2027                 
##  3rd Qu.:57.00   Widowed      : 6505                 
##  Max.   :85.00   NA's         :25338                 
##                                                      
##                    Education                   Race       
##  High school            :30906   American Indian :  1433  
##  Bachelor's degree      :19443   Asian           :  6520  
##  Some college, no degree:18863   Black           : 13913  
##  No high school diploma :16095   Multiracial     :  2897  
##  Associate degree       : 9913   Pacific Islander:   618  
##  (Other)                :10744   White           :105921  
##  NA's                   :25338                            
##     Hispanic      CountryOfBirthCode               Citizenship    
##  Min.   :0.0000   Min.   : 57.00     Citizen, Native     :116639  
##  1st Qu.:0.0000   1st Qu.: 57.00     Citizen, Naturalized:  7073  
##  Median :0.0000   Median : 57.00     Non-Citizen         :  7590  
##  Mean   :0.1393   Mean   : 82.68                                  
##  3rd Qu.:0.0000   3rd Qu.: 57.00                                  
##  Max.   :1.0000   Max.   :555.00                                  
##                                                                   
##            EmploymentStatus                               Industry    
##  Disabled          : 5712   Educational and health services   :15017  
##  Employed          :61733   Trade                             : 8933  
##  Not in Labor Force:15246   Professional and business services: 7519  
##  Retired           :18619   Manufacturing                     : 6791  
##  Unemployed        : 4203   Leisure and hospitality           : 6364  
##  NA's              :25789   (Other)                           :21618  
##                             NA's                              :65060

3.3

Which of the following metropolitan areas has the largest number of interviewees?

  • Atlanta-Sandy Springs-Marietta, GA
  • Baltimore-Towson, MD
  • Boston-Cambridge-Quincy, MA-NH
  • San Francisco-Oakland-Fremont, CA

table(CPS$MetroArea)
## 
## 10420 10500 10580 10740 10900 11020 11100 11300 11340 11460 11500 11540 
##   231    68   268   609   334    82    88    62    64    85    61   125 
## 11700 12020 12060 12100 12260 12420 12540 12580 12940 13140 13380 13460 
##   116    65  1552   111   161   516   245  1483   262   123    70   140 
## 13740 13780 13820 14020 14060 14260 14500 14540 14740 15180 15380 15940 
##   199    73   392   104    40   644   171    29    87    79   344   118 
## 15980 16300 16580 16620 16700 16740 16860 16980 17020 17140 17460 17660 
##   146   196   122   262   232   517   167  2772    60   719   681   117 
## 17820 17860 17900 17980 18140 18580 19100 19340 19380 19460 19500 19660 
##   372    47   291    59   551   132  1863   240   268    96    81   140 
## 19740 19780 19820 20100 20260 20500 20740 20940 21340 21500 21660 21780 
##  1504   501  1354   456   126   189   110    99   244    87   196    99 
## 22020 22140 22180 22220 22420 22460 22660 22900 23020 23060 23420 23540 
##   432    64    77   215   102    63   206   105    80   136   303    70 
## 24340 24540 24580 24660 24860 25060 25180 25420 25500 25860 26100 26180 
##   304   162   136   251   185    65    86   174    90    57    78  1576 
## 26420 26580 26620 26900 26980 27100 27140 27260 27340 27500 27740 27780 
##  1649    82   117   570   131    70   222   393    63    99    52    63 
## 27900 28020 28100 28140 28660 28700 28740 28940 29100 29180 29340 29460 
##    59   127    87   962   101    67    87   168   114   181    81   149 
## 29540 29620 29700 29740 29820 29940 30020 30460 30780 30980 31100 31140 
##   156   119    89   107  1299    98    97   198   404    65  4102   519 
## 31180 31340 31420 31460 31540 32580 32780 32820 32900 33100 33140 33260 
##    63    73    65    57   284   195    82   348   106  1554    77    51 
## 33340 33460 33660 33700 33740 33780 33860 34740 34820 34900 34940 34980 
##   714  1942   110   158   179    63   103    90   102    61    82   505 
## 35380 35620 35660 36100 36140 36260 36420 36500 36540 36740 36780 37100 
##   367  5409    51    76    30   423   604    99   957   610    85   267 
## 37340 37460 37860 37900 37980 38060 38300 38900 38940 39100 39140 39340 
##   168    59   107   112  2855   971   732  1089   109   201    54   309 
## 39380 39460 39540 39580 39740 39900 40060 40140 40220 40380 40420 40900 
##   130    48   119   336   142   310   490  1290    66   307   114   667 
## 40980 41060 41180 41420 41500 41540 41620 41700 41740 41860 41940 42020 
##    74    82   956   170   104    74   723   607   907  1386   670    77 
## 42060 42100 42140 42220 42260 42340 42540 42660 43340 43620 43780 43900 
##   132    66    52   129   192   202   176  1255   146   595    81    99 
## 44060 44100 44180 44220 44700 45060 45220 45300 45780 45820 45940 46060 
##   156    76   161    34   193   223    43   842   235   182    91   302 
## 46140 46220 46540 46660 46700 46940 47020 47220 47260 47300 47380 47580 
##   323    78    80    42   133    79   116    54   597   121    79    42 
## 47900 47940 48140 48620 49180 49420 49620 49660 70750 70900 71650 71950 
##  4177   156    96   427   127   112   117   153   208    75  2229   730 
## 72400 72850 73450 74500 75700 76450 76750 77200 77350 78100 78700 79600 
##   657   112   885    66   506   203   701  2284   262   155   157   144

3.4

Which metropolitan area has the highest proportion of interviewees of Hispanic ethnicity? Hint: Use tapply() with mean, as in 2.5. Calling sort() on the output of tapply() could also be helpful here.

tapply(CPS$Hispanic, CPS$MetroArea, mean)
##       10420       10500       10580       10740       10900       11020 
## 0.012987013 0.044117647 0.041044776 0.441707718 0.086826347 0.012195122 
##       11100       11300       11340       11460       11500       11540 
## 0.261363636 0.064516129 0.000000000 0.000000000 0.049180328 0.008000000 
##       11700       12020       12060       12100       12260       12420 
## 0.060344828 0.123076923 0.085695876 0.090090090 0.093167702 0.310077519 
##       12540       12580       12940       13140       13380       13460 
## 0.489795918 0.082265678 0.038167939 0.227642276 0.014285714 0.021428571 
##       13740       13780       13820       14020       14060       14260 
## 0.030150754 0.041095890 0.053571429 0.000000000 0.000000000 0.093167702 
##       14500       14540       14740       15180       15380       15940 
## 0.146198830 0.000000000 0.057471264 0.797468354 0.017441860 0.076271186 
##       15980       16300       16580       16620       16700       16740 
## 0.438356164 0.015306122 0.032786885 0.007633588 0.017241379 0.117988395 
##       16860       16980       17020       17140       17460       17660 
## 0.077844311 0.167388167 0.116666667 0.040333797 0.060205580 0.025641026 
##       17820       17860       17900       17980       18140       18580 
## 0.120967742 0.042553191 0.079037801 0.203389831 0.043557169 0.606060606 
##       19100       19340       19380       19460       19500       19660 
## 0.283950617 0.091666667 0.003731343 0.052083333 0.000000000 0.100000000 
##       19740       19780       19820       20100       20260       20500 
## 0.232047872 0.073852295 0.037666174 0.057017544 0.015873016 0.111111111 
##       20740       20940       21340       21500       21660       21780 
## 0.000000000 0.686868687 0.790983607 0.022988506 0.076530612 0.020202020 
##       22020       22140       22180       22220       22420       22460 
## 0.025462963 0.234375000 0.155844156 0.148837209 0.039215686 0.000000000 
##       22660       22900       23020       23060       23420       23540 
## 0.121359223 0.085714286 0.112500000 0.036764706 0.409240924 0.042857143 
##       24340       24540       24580       24660       24860       25060 
## 0.138157895 0.160493827 0.125000000 0.075697211 0.037837838 0.015384615 
##       25180       25420       25500       25860       26100       26180 
## 0.000000000 0.022988506 0.000000000 0.087719298 0.012820513 0.059644670 
##       26420       26580       26620       26900       26980       27100 
## 0.359005458 0.000000000 0.000000000 0.071929825 0.030534351 0.000000000 
##       27140       27260       27340       27500       27740       27780 
## 0.009009009 0.091603053 0.126984127 0.030303030 0.038461538 0.000000000 
##       27900       28020       28100       28140       28660       28700 
## 0.016949153 0.031496063 0.114942529 0.121621622 0.386138614 0.014925373 
##       28740       28940       29100       29180       29340       29460 
## 0.068965517 0.005952381 0.017543860 0.060773481 0.024691358 0.134228188 
##       29540       29620       29700       29740       29820       29940 
## 0.102564103 0.084033613 0.966292135 0.542056075 0.251732102 0.040816327 
##       30020       30460       30780       30980       31100       31140 
## 0.123711340 0.040404040 0.037128713 0.292307692 0.460263286 0.038535645 
##       31180       31340       31420       31460       31540       32580 
## 0.285714286 0.027397260 0.000000000 0.614035088 0.024647887 0.948717949 
##       32780       32820       32900       33100       33140       33260 
## 0.085365854 0.028735632 0.566037736 0.467824968 0.038961039 0.352941176 
##       33340       33460       33660       33700       33740       33780 
## 0.085434174 0.052008239 0.000000000 0.341772152 0.005586592 0.063492063 
##       33860       34740       34820       34900       34940       34980 
## 0.009708738 0.022222222 0.058823529 0.229508197 0.182926829 0.069306931 
##       35380       35620       35660       36100       36140       36260 
## 0.111716621 0.228508042 0.019607843 0.157894737 0.066666667 0.144208038 
##       36420       36500       36540       36740       36780       37100 
## 0.107615894 0.121212121 0.070010449 0.213114754 0.011764706 0.359550562 
##       37340       37460       37860       37900       37980       38060 
## 0.053571429 0.067796610 0.028037383 0.062500000 0.078458844 0.254376931 
##       38300       38900       38940       39100       39140       39340 
## 0.016393443 0.094582185 0.100917431 0.273631841 0.129629630 0.064724919 
##       39380       39460       39540       39580       39740       39900 
## 0.307692308 0.041666667 0.058823529 0.119047619 0.211267606 0.196774194 
##       40060       40140       40220       40380       40420       40900 
## 0.042857143 0.502325581 0.030303030 0.058631922 0.043859649 0.263868066 
##       40980       41060       41180       41420       41500       41540 
## 0.027027027 0.012195122 0.030334728 0.211764706 0.557692308 0.000000000 
##       41620       41700       41740       41860       41940       42020 
## 0.154910097 0.644151565 0.269018743 0.199855700 0.316417910 0.246753247 
##       42060       42100       42140       42220       42260       42340 
## 0.401515152 0.151515152 0.461538462 0.232558140 0.046875000 0.000000000 
##       42540       42660       43340       43620       43780       43900 
## 0.136363636 0.088446215 0.082191781 0.042016807 0.049382716 0.020202020 
##       44060       44100       44180       44220       44700       45060 
## 0.025641026 0.013157895 0.043478261 0.029411765 0.321243523 0.080717489 
##       45220       45300       45780       45820       45940       46060 
## 0.069767442 0.159144893 0.034042553 0.093406593 0.131868132 0.506622517 
##       46140       46220       46540       46660       46700       46940 
## 0.114551084 0.102564103 0.075000000 0.047619048 0.210526316 0.075949367 
##       47020       47220       47260       47300       47380       47580 
## 0.465517241 0.407407407 0.050251256 0.438016529 0.329113924 0.000000000 
##       47900       47940       48140       48620       49180       49420 
## 0.121378980 0.108974359 0.010416667 0.133489461 0.055118110 0.357142857 
##       49620       49660       70750       70900       71650       71950 
## 0.042735043 0.032679739 0.014423077 0.000000000 0.069537909 0.112328767 
##       72400       72850       73450       74500       75700       76450 
## 0.009132420 0.339285714 0.105084746 0.090909091 0.073122530 0.103448276 
##       76750       77200       77350       78100       78700       79600 
## 0.011412268 0.114273205 0.030534351 0.219354839 0.248407643 0.083333333
sort(tapply(CPS$Hispanic, CPS$MetroArea, mean))
##       11340       11460       14020       14060       14540       19500 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       20740       22460       25180       25500       26580       26620 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       27100       27780       31420       33660       41540       42340 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       47580       70900       19380       33740       28940       16620 
## 0.000000000 0.000000000 0.003731343 0.005586592 0.005952381 0.007633588 
##       11540       27140       72400       33860       48140       76750 
## 0.008000000 0.009009009 0.009132420 0.009708738 0.010416667 0.011412268 
##       36780       11020       41060       26100       10420       44100 
## 0.011764706 0.012195122 0.012195122 0.012820513 0.012987013 0.013157895 
##       13380       70750       28700       16300       25060       20260 
## 0.014285714 0.014423077 0.014925373 0.015306122 0.015384615 0.015873016 
##       38300       27900       16700       15380       29100       35660 
## 0.016393443 0.016949153 0.017241379 0.017441860 0.017543860 0.019607843 
##       21780       43900       13460       34740       21500       25420 
## 0.020202020 0.020202020 0.021428571 0.022222222 0.022988506 0.022988506 
##       31540       29340       22020       17660       44060       40980 
## 0.024647887 0.024691358 0.025462963 0.025641026 0.025641026 0.027027027 
##       31340       37860       32820       44220       13740       27500 
## 0.027397260 0.028037383 0.028735632 0.029411765 0.030150754 0.030303030 
##       40220       41180       26980       77350       28020       49660 
## 0.030303030 0.030334728 0.030534351 0.030534351 0.031496063 0.032679739 
##       16580       45780       23060       30780       19820       24860 
## 0.032786885 0.034042553 0.036764706 0.037128713 0.037666174 0.037837838 
##       12940       27740       31140       33140       22420       17140 
## 0.038167939 0.038461538 0.038535645 0.038961039 0.039215686 0.040333797 
##       30460       29940       10580       13780       39460       43620 
## 0.040404040 0.040816327 0.041044776 0.041095890 0.041666667 0.042016807 
##       17860       49620       23540       40060       44180       18140 
## 0.042553191 0.042735043 0.042857143 0.042857143 0.043478261 0.043557169 
##       40420       10500       42260       46660       11500       43780 
## 0.043859649 0.044117647 0.046875000 0.047619048 0.049180328 0.049382716 
##       47260       33460       19460       13820       37340       49180 
## 0.050251256 0.052008239 0.052083333 0.053571429 0.053571429 0.055118110 
##       20100       14740       40380       34820       39540       26180 
## 0.057017544 0.057471264 0.058631922 0.058823529 0.058823529 0.059644670 
##       17460       11700       29180       37900       33780       11300 
## 0.060205580 0.060344828 0.060773481 0.062500000 0.063492063 0.064516129 
##       39340       36140       37460       28740       34980       71650 
## 0.064724919 0.066666667 0.067796610 0.068965517 0.069306931 0.069537909 
##       45220       36540       26900       75700       19780       46540 
## 0.069767442 0.070010449 0.071929825 0.073122530 0.073852295 0.075000000 
##       24660       46940       15940       21660       16860       37980 
## 0.075697211 0.075949367 0.076271186 0.076530612 0.077844311 0.078458844 
##       17900       45060       43340       12580       79600       29620 
## 0.079037801 0.080717489 0.082191781 0.082265678 0.083333333 0.084033613 
##       32780       33340       12060       22900       10900       25860 
## 0.085365854 0.085434174 0.085695876 0.085714286 0.086826347 0.087719298 
##       42660       12100       74500       27260       19340       12260 
## 0.088446215 0.090090090 0.090909091 0.091603053 0.091666667 0.093167702 
##       14260       45820       38900       19660       38940       29540 
## 0.093167702 0.093406593 0.094582185 0.100000000 0.100917431 0.102564103 
##       46220       76450       73450       36420       47940       20500 
## 0.102564103 0.103448276 0.105084746 0.107615894 0.108974359 0.111111111 
##       35380       71950       23020       77200       46140       28100 
## 0.111716621 0.112328767 0.112500000 0.114273205 0.114551084 0.114942529 
##       17020       16740       39580       17820       36500       22660 
## 0.116666667 0.117988395 0.119047619 0.120967742 0.121212121 0.121359223 
##       47900       28140       12020       30020       24580       27340 
## 0.121378980 0.121621622 0.123076923 0.123711340 0.125000000 0.126984127 
##       39140       45940       48620       29460       42540       24340 
## 0.129629630 0.131868132 0.133489461 0.134228188 0.136363636 0.138157895 
##       36260       14500       22220       42100       41620       22180 
## 0.144208038 0.146198830 0.148837209 0.151515152 0.154910097 0.155844156 
##       36100       45300       24540       16980       34940       39900 
## 0.157894737 0.159144893 0.160493827 0.167388167 0.182926829 0.196774194 
##       41860       17980       46700       39740       41420       36740 
## 0.199855700 0.203389831 0.210526316 0.211267606 0.211764706 0.213114754 
##       78100       13140       35620       34900       19740       42220 
## 0.219354839 0.227642276 0.228508042 0.229508197 0.232047872 0.232558140 
##       22140       42020       78700       29820       38060       11100 
## 0.234375000 0.246753247 0.248407643 0.251732102 0.254376931 0.261363636 
##       40900       41740       39100       19100       31180       30980 
## 0.263868066 0.269018743 0.273631841 0.283950617 0.285714286 0.292307692 
##       39380       12420       41940       44700       47380       72850 
## 0.307692308 0.310077519 0.316417910 0.321243523 0.329113924 0.339285714 
##       33700       33260       49420       26420       37100       28660 
## 0.341772152 0.352941176 0.357142857 0.359005458 0.359550562 0.386138614 
##       42060       47220       23420       47300       15980       10740 
## 0.401515152 0.407407407 0.409240924 0.438016529 0.438356164 0.441707718 
##       31100       42140       47020       33100       12540       40140 
## 0.460263286 0.461538462 0.465517241 0.467824968 0.489795918 0.502325581 
##       46060       29740       41500       32900       18580       31460 
## 0.506622517 0.542056075 0.557692308 0.566037736 0.606060606 0.614035088 
##       41700       20940       21340       15180       32580       29700 
## 0.644151565 0.686868687 0.790983607 0.797468354 0.948717949 0.966292135

3.5

Remembering that CPS$Race == "Asian" returns a TRUE/FALSE vector of whether an interviewee is Asian, determine the number of metropolitan areas in the United States from which at least 20% of interviewees are Asian.

sort(tapply(CPS$Race == "Asian", CPS$MetroArea, mean))
##       10500       11020       11100       11300       11540       11700 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       13140       13740       13780       14020       14540       15940 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       16620       17020       17980       19500       20500       20740 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       21340       21500       22140       22460       25180       26620 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       27100       27140       27500       27740       27900       28100 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       28660       28700       28940       29180       29620       29700 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       30980       31180       31340       31420       31460       32580 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       33140       33260       33780       34740       34820       35660 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       36140       36780       38940       39100       39380       39460 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       39540       39740       40220       40420       40980       41060 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       41420       41540       42100       42140       42540       43340 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       43780       43900       44220       45220       46220       46540 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       46660       46940       47020       47220       47380       48140 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 0.000000000 
##       70900       74500       78100       78700       41180       35380 
## 0.000000000 0.000000000 0.000000000 0.000000000 0.002092050 0.002724796 
##       41700       16700       33740       16860       33700       13460 
## 0.003294893 0.004310345 0.005586592 0.005988024 0.006329114 0.007142857 
##       19380       42060       42220       45780       17660       49620 
## 0.007462687 0.007575758 0.007751938 0.008510638 0.008547009 0.008547009 
##       49420       24340       43620       21780       29940       17460 
## 0.008928571 0.009868421 0.010084034 0.010101010 0.010204082 0.010279001 
##       30020       14260       25420       28740       31140       32780 
## 0.010309278 0.010869565 0.011494253 0.011494253 0.011560694 0.012195122 
##       24540       44180       13820       47940       39340       49660 
## 0.012345679 0.012422360 0.012755102 0.012820513 0.012944984 0.013071895 
##       36100       10900       18580       20100       16740       42260 
## 0.013157895 0.014970060 0.015151515 0.015350877 0.015473888 0.015625000 
##       28020       49180       27780       17820       16580       34900 
## 0.015748031 0.015748031 0.015873016 0.016129032 0.016393443 0.016393443 
##       37460       32820       18140       39140       29740       37860 
## 0.016949153 0.017241379 0.018148820 0.018518519 0.018691589 0.018691589 
##       44060       22660       22420       42340       46060       20940 
## 0.019230769 0.019417476 0.019607843 0.019801980 0.019867550 0.020202020 
##       21660       19340       19660       45820       17140       30780 
## 0.020408163 0.020833333 0.021428571 0.021978022 0.022253129 0.022277228 
##       10580       12940       14740       70750       34940       26900 
## 0.022388060 0.022900763 0.022988506 0.024038462 0.024390244 0.024561404 
##       12260       26100       22180       36260       77350       47260 
## 0.024844720 0.025641026 0.025974026 0.026004728 0.026717557 0.026800670 
##       29460       17900       22020       13380       33860       36540 
## 0.026845638 0.027491409 0.027777778 0.028571429 0.029126214 0.029258098 
##       10420       48620       12020       25060       11340       19740 
## 0.030303030 0.030444965 0.030769231 0.030769231 0.031250000 0.031914894 
##       24860       37980       25500       15980       28140       79600 
## 0.032432432 0.032924694 0.033333333 0.034246575 0.034303534 0.034722222 
##       36420       25860       30460       33100       37340       41620 
## 0.034768212 0.035087719 0.035353535 0.035392535 0.035714286 0.035961272 
##       33660       26580       40060       23060       23020       19780 
## 0.036363636 0.036585366 0.036734694 0.036764706 0.037500000 0.037924152 
##       38060       38300       71950       77200       45300       20260 
## 0.038105046 0.038251366 0.038356164 0.038966725 0.039192399 0.039682540 
##       45060       10740       19460       76750       23540       19820 
## 0.040358744 0.041050903 0.041666667 0.042796006 0.042857143 0.043574594 
##       45940       75700       27340       33340       27260       72400 
## 0.043956044 0.047430830 0.047619048 0.047619048 0.048346056 0.048706240 
##       11500       46140       39580       36740       22220       42020 
## 0.049180328 0.049535604 0.050595238 0.050819672 0.051162791 0.051948052 
##       71650       12420       15380       44100       26980       37900 
## 0.052041274 0.052325581 0.052325581 0.052631579 0.053435115 0.053571429 
##       31540       32900       22900       34980       29540       12580 
## 0.056338028 0.056603774 0.057142857 0.057425743 0.057692308 0.057990560 
##       39900       16980       14500       26420       40140       72850 
## 0.058064516 0.058441558 0.058479532 0.061249242 0.062015504 0.062500000 
##       19100       17860       40380       16300       73450       38900 
## 0.062801932 0.063829787 0.065146580 0.066326531 0.066666667 0.069788797 
##       47900       12060       76450       29340       37100       14060 
## 0.070624850 0.072809278 0.073891626 0.074074074 0.074906367 0.075000000 
##       15180       33460       29820       24660       12540       11460 
## 0.075949367 0.076725026 0.078521940 0.079681275 0.081632653 0.082352941 
##       29100       24580       47300       42660       35620       41500 
## 0.087719298 0.088235294 0.090909091 0.099601594 0.104270660 0.125000000 
##       36500       31100       41740       40900       12100       44700 
## 0.131313131 0.135056070 0.142227122 0.142428786 0.144144144 0.155440415 
##       47580       23420       46700       41940       41860       26180 
## 0.166666667 0.184818482 0.203007519 0.241791045 0.246753247 0.501903553

3.6

Normally, we would look at the sorted proportion of interviewees from each metropolitan area who have not received a high school diploma with the command:

sort(tapply(CPS$Education == "No high school diploma", CPS$MetroArea, mean))

However, none of the interviewees aged 14 and younger have an education value reported, so the mean value is reported as NA for each metropolitan area. To get mean (and related functions, like sum) to ignore missing values, you can pass the parameter na.rm=TRUE. Passing na.rm=TRUE to the tapply function, determine which metropolitan area has the smallest proportion of interviewees who have received no high school diploma.
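
As a minimal sketch of how na.rm=TRUE changes the result, the following uses a small made-up data frame (the toy values and the names toy, with_na and without_na are illustrative assumptions, not part of the CPS data). Any extra argument given to tapply is forwarded to the summarizing function, so na.rm=TRUE reaches mean.

```r
# Toy data (hypothetical values, not the CPS data): NA marks a missing education value.
toy <- data.frame(
  MetroArea = c("A", "A", "A", "B", "B", "B"),
  Education = c("No high school diploma", "High school", NA,
                "Bachelor's degree", "High school", NA),
  stringsAsFactors = FALSE
)

# Without na.rm, a single NA in a group makes that group's mean NA.
with_na <- tapply(toy$Education == "No high school diploma", toy$MetroArea, mean)

# Extra arguments to tapply are forwarded to the function, so na.rm=TRUE reaches mean().
without_na <- sort(tapply(toy$Education == "No high school diploma",
                          toy$MetroArea, mean, na.rm = TRUE))

print(with_na)
print(without_na)
```

Because sort orders from smallest to largest by default, the first name in the sorted result is the area with the smallest proportion.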

Section 4 - Integrating Country of Birth Data

Just as we did with the metropolitan area information, merge in the country of birth information from the CountryMap data frame, replacing the CPS data frame with the result. If you accidentally overwrite CPS with the wrong values, remember that you can restore it by re-loading the data frame from CPSData.csv and then merging in the metropolitan area information using the command provided in the previous subproblem.
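
The merge described above can be sketched on miniature data frames. Everything named with a _toy suffix is hypothetical, and the codes and ages are made-up values; only the merge call mirrors the one used in this section. The sketch shows which column the merge adds and how unmatched codes turn into missing values.

```r
# Hypothetical miniature versions of the CPS and CountryMap data frames.
cps_toy <- data.frame(CountryOfBirthCode = c(57, 303, 57, 999),
                      Age = c(34, 51, 8, 62))
country_map_toy <- data.frame(Code = c(57, 303),
                              Country = c("United States", "Mexico"),
                              stringsAsFactors = FALSE)

# all.x=TRUE keeps every row of the left data frame, even when its code has no match.
merged <- merge(cps_toy, country_map_toy,
                by.x = "CountryOfBirthCode", by.y = "Code", all.x = TRUE)

# Columns present after the merge but not before it.
new_cols <- setdiff(names(merged), names(cps_toy))
print(new_cols)

# Rows whose code was not found in the map get NA in the new column.
print(sum(is.na(merged$Country)))
```

Counting missing values with sum(is.na(...)) is an alternative to reading the NA's row from summary.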

4.1

What is the name of the variable added to the CPS data frame by this merge operation?

"Country"
## [1] "Country"

How many interviewees have a missing value for the new country of birth variable?

CPS = merge(CPS, CountryMap, by.x="CountryOfBirthCode", by.y="Code", all.x=TRUE)
summary(CPS)
##  CountryOfBirthCode PeopleInHousehold       Region     
##  Min.   : 57.00     Min.   : 1.000    Midwest  :30684  
##  1st Qu.: 57.00     1st Qu.: 2.000    Northeast:25939  
##  Median : 57.00     Median : 3.000    South    :41502  
##  Mean   : 82.68     Mean   : 3.284    West     :33177  
##  3rd Qu.: 57.00     3rd Qu.: 4.000                     
##  Max.   :555.00     Max.   :15.000                     
##                                                        
##           State       MetroAreaCode        Age       
##  California  :11570   Min.   :10420   Min.   : 0.00  
##  Texas       : 7077   1st Qu.:21780   1st Qu.:19.00  
##  New York    : 5595   Median :34740   Median :39.00  
##  Florida     : 5149   Mean   :35075   Mean   :38.83  
##  Pennsylvania: 3930   3rd Qu.:41860   3rd Qu.:57.00  
##  Illinois    : 3912   Max.   :79600   Max.   :85.00  
##  (Other)     :94069   NA's   :34238                  
##           Married          Sex                          Education    
##  Divorced     :11151   Female:67481   High school            :30906  
##  Married      :55509   Male  :63821   Bachelor's degree      :19443  
##  Never Married:30772                  Some college, no degree:18863  
##  Separated    : 2027                  No high school diploma :16095  
##  Widowed      : 6505                  Associate degree       : 9913  
##  NA's         :25338                  (Other)                :10744  
##                                       NA's                   :25338  
##                Race           Hispanic                    Citizenship    
##  American Indian :  1433   Min.   :0.0000   Citizen, Native     :116639  
##  Asian           :  6520   1st Qu.:0.0000   Citizen, Naturalized:  7073  
##  Black           : 13913   Median :0.0000   Non-Citizen         :  7590  
##  Multiracial     :  2897   Mean   :0.1393                                
##  Pacific Islander:   618   3rd Qu.:0.0000                                
##  White           :105921   Max.   :1.0000                                
##                                                                          
##            EmploymentStatus                               Industry    
##  Disabled          : 5712   Educational and health services   :15017  
##  Employed          :61733   Trade                             : 8933  
##  Not in Labor Force:15246   Professional and business services: 7519  
##  Retired           :18619   Manufacturing                     : 6791  
##  Unemployed        : 4203   Leisure and hospitality           : 6364  
##  NA's              :25789   (Other)                           :21618  
##                             NA's                              :65060  
##           Country      
##  United States:115063  
##  Mexico       :  3921  
##  Philippines  :   839  
##  India        :   770  
##  China        :   581  
##  (Other)      :  9952  
##  NA's         :   176

4.2

Among all interviewees born outside of North America, which country was the most common place of birth?

sort(table(CPS$Country))
## 
##                         Cyprus                         Kosovo 
##                              0                              0 
##         Oceania, not specified       Other U. S. Island Areas 
##                              0                              0 
##                          Wales               Northern Ireland 
##                              0                              2 
##                       Tanzania                     Azerbaijan 
##                              2                              3 
##                 Czechoslovakia               St. Kitts--Nevis 
##                              3                              3 
##                        Georgia                       Barbados 
##                              5                              6 
##                        Denmark                         Latvia 
##                              6                              6 
##                          Samoa                        Senegal 
##                              6                              6 
##                      Singapore                       Slovakia 
##                              6                              6 
##                          Tonga                       Zimbabwe 
##                              6                              6 
##   South America, not specified                      St. Lucia 
##                              7                              7 
##                        Algeria        Americas, not specified 
##                              9                              9 
##                         Belize                           Fiji 
##                              9                              9 
## St. Vincent and the Grenadines                        Bahamas 
##                              9                             10 
##                        Finland                         Kuwait 
##                             10                             10 
##                      Lithuania                 Czech Republic 
##                             10                             11 
##                       Dominica                       Paraguay 
##                             11                             11 
##                        Croatia                      Macedonia 
##                             12                             12 
##                        Moldova            Antigua and Barbuda 
##                             12                             13 
##                        Belgium                        Bermuda 
##                             13                             13 
##                        Bolivia                        Grenada 
##                             13                             13 
##                          Sudan                     Cape Verde 
##                             13                             15 
##                        Eritrea                   Sierra Leone 
##                             15                             15 
##                         Uganda                        Austria 
##                             15                             17 
##                        Morocco                      Sri Lanka 
##                             17                             17 
##           U. S. Virgin Islands                        Uruguay 
##                             17                             17 
##                        Albania                         Norway 
##                             18                             18 
##          Europe, not specified                     Uzbekistan 
##                             19                             19 
##     West Indies, not specified                       Malaysia 
##                             19                             20 
##                         Serbia                         Azores 
##                             20                             22 
##                           USSR                    New Zealand 
##                             22                             23 
##                    Switzerland                          Yemen 
##                             23                             23 
##                        Belarus                       Scotland 
##                             24                             24 
##                     Yugoslavia                        Hungary 
##                             24                             25 
##                    Afghanistan                      Indonesia 
##                             26                             26 
##                    Netherlands                         Sweden 
##                             28                             28 
##                       Bulgaria                     Costa Rica 
##                             29                             29 
##                   Saudi Arabia                           Guam 
##                             29                             31 
##                       Cameroon                          Syria 
##                             32                             32 
##                        Armenia                         Jordan 
##                             35                             36 
##                          Chile            Asia, not specified 
##                             37                             39 
##                        Ireland                          Spain 
##                             39                             41 
##                     Bangladesh                      Australia 
##                             42                             43 
##                          Nepal                         Panama 
##                             44                             44 
##                        Lebanon                Myanmar (Burma) 
##                             45                             45 
##                   South Africa                         Turkey 
##                             48                             48 
##                       Cambodia                        Liberia 
##                             49                             52 
##                          Kenya                        Romania 
##                             55                             55 
##                         Greece                         Israel 
##                             56                             57 
##            Trinidad and Tobago           Bosnia & Herzegovina 
##                             60                             61 
##                      Venezuela                      Argentina 
##                             61                             64 
##                      Hong Kong                       Portugal 
##                             64                             64 
##                          Egypt                        Somalia 
##                             65                             72 
##                         France                    South Korea 
##                             73                             73 
##                          Ghana                      Nicaragua 
##                             76                             76 
##                       Ethiopia                      Elsewhere 
##                             80                             81 
##                        Nigeria                           Iraq 
##                             85                             97 
##                           Laos                         Taiwan 
##                             98                            102 
##                        Ukraine                         Guyana 
##                            104                            109 
##                       Pakistan                 United Kingdom 
##                            109                            111 
##                       Thailand          Africa, not specified 
##                            128                            129 
##                        Ecuador                           Peru 
##                            136                            136 
##                           Iran                          Italy 
##                            144                            149 
##                         Brazil                         Poland 
##                            159                            162 
##                          Haiti                         Russia 
##                            167                            173 
##                        England                          Japan 
##                            179                            187 
##                       Honduras                       Columbia 
##                            189                            206 
##                        Jamaica                      Guatemala 
##                            217                            309 
##             Dominican Republic                          Korea 
##                            330                            334 
##                         Canada                           Cuba 
##                            410                            426 
##                        Germany                        Vietnam 
##                            438                            458 
##                    El Salvador                    Puerto Rico 
##                            477                            518 
##                          China                          India 
##                            581                            770 
##                    Philippines                         Mexico 
##                            839                           3921 
##                  United States 
##                         115063
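
Reading the answer from the bottom of a long ascending table is error-prone. One possible alternative, sketched here on a made-up vector, is to sort in decreasing order and drop the North American entries before looking at the top. The vector country and the choice of which names count as North America are assumptions for illustration.

```r
# Hypothetical country vector standing in for CPS$Country.
country <- c("United States", "Mexico", "India", "Mexico", "United States",
             "Philippines", "Philippines", "Canada", "India", "Philippines")

# Sort counts from largest to smallest.
counts <- sort(table(country), decreasing = TRUE)

# Names to exclude; which countries count as North America is an assumption here.
north_america <- c("United States", "Mexico", "Canada")
outside <- counts[!(names(counts) %in% north_america)]

# Show the most common remaining countries.
print(head(outside, 3))
top_outside <- names(outside)[1]
print(top_outside)
```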

4.3

What proportion of the interviewees from the “New York-Northern New Jersey-Long Island, NY-NJ-PA” metropolitan area have a country of birth that is not the United States? For this computation, don’t include people from this metropolitan area who have a missing country of birth.

table(CPS$MetroArea == "New York-Northern New Jersey-Long Island, NY-NJ-PA", CPS$Country != "United States")
##        
##         FALSE  TRUE
##   FALSE 82493 14412
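
The table above has only a FALSE row, so no row matched the metropolitan area name and the proportion cannot be read from it. One possible explanation (an assumption, based on the summary output in 4.1 listing MetroAreaCode but no MetroArea column) is that CPS$MetroArea resolved to CPS$MetroAreaCode through the partial name matching that `$` performs on data frames. The sketch below reproduces that effect on made-up data and then shows one way to compute the proportion once a name column exists; the toy values and the code-to-name mapping are hypothetical.

```r
# Hypothetical toy data: a code column only, as if the name column was never merged in.
toy <- data.frame(MetroAreaCode = c(35620, 35620, 35620, 71650, NA),
                  Country = c("United States", "India", NA, "Brazil", "Mexico"),
                  stringsAsFactors = FALSE)

# `$` on a data frame partially matches names, so toy$MetroArea returns MetroAreaCode.
partial <- toy$MetroArea
print(identical(partial, toy$MetroAreaCode))

# Comparing codes with a name never matches, so the table has no TRUE entry.
print(table(partial == "New York-Northern New Jersey-Long Island, NY-NJ-PA"))

# With a name column present (hypothetical mapping), restrict to the area first.
toy$MetroArea <- ifelse(toy$MetroAreaCode == 35620,
                        "New York-Northern New Jersey-Long Island, NY-NJ-PA", "Other")
in_ny <- !is.na(toy$MetroArea) &
  toy$MetroArea == "New York-Northern New Jersey-Long Island, NY-NJ-PA"

# The mean of a logical vector is a proportion; na.rm=TRUE drops missing countries of birth.
prop_foreign <- mean(toy$Country[in_ny] != "United States", na.rm = TRUE)
print(prop_foreign)
```

With a two-way table that does contain a TRUE row, the same proportion is the TRUE/TRUE cell divided by the sum of the TRUE row.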

4.4

Which metropolitan area has the largest number (note – not proportion) of interviewees with a country of birth in India? Hint – remember to include na.rm=TRUE if you are using tapply() to answer this question.

  • Boston-Cambridge-Quincy, MA-NH
  • Minneapolis-St Paul-Bloomington, MN-WI
  • New York-Northern New Jersey-Long Island, NY-NJ-PA
  • Washington-Arlington-Alexandria, DC-VA-MD-WV
sort(tapply(CPS$Country == "India", CPS$MetroArea, sum, na.rm=TRUE))
## 10420 10500 10580 10900 11020 11100 11300 11460 11500 11540 11700 12020 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 12260 12940 13140 13380 13460 13740 13780 14020 14500 14540 14740 15380 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 15940 15980 16300 16580 16620 16860 17020 17660 17820 17860 17980 18140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 18580 19340 19380 19460 19500 19740 20100 20260 20500 20740 20940 21340 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 21500 21660 21780 22020 22140 22180 22420 22460 22660 22900 23020 23540 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 24340 24540 24580 24660 25060 25180 25500 25860 26100 26580 26620 27100 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 27140 27340 27500 27740 27780 27900 28020 28100 28660 28700 28740 28940 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 29100 29180 29340 29460 29540 29620 29700 29740 30020 30460 30980 31140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 31180 31340 31420 31460 32580 32780 32900 33140 33260 33660 33700 33740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 33780 33860 34740 34820 34900 35660 36100 36140 36780 37340 37460 37860 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 38940 39100 39140 39380 39460 39540 39580 39740 40060 40140 40220 40420 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 40980 41060 41180 41420 41500 41540 41700 42020 42060 42100 42140 42220 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 42260 42340 42540 43340 43620 43780 43900 44060 44180 44220 44700 45220 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 45780 45820 46220 46540 46660 46700 46940 47020 47220 47260 47380 47940 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 48140 48620 49420 49620 49660 70750 70900 72850 74500 76750 78100 78700 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 79600 11340 14060 14260 17140 17900 24860 25420 27260 29940 34940 35380 
##     0     1     1     1     1     1     1     1     1     1     1     1 
## 36500 39340 45060 46060 12100 12540 13820 16700 17460 19660 23060 29820 
##     1     1     1     1     2     2     2     2     2     2     2     2 
## 32820 33100 34980 36260 36420 37100 38060 40380 41620 44100 49180 72400 
##     2     2     2     2     2     2     2     2     2     2     2     2 
## 10740 26980 31540 39900 47300 76450 16740 26900 36540 37900 41740 45940 
##     3     3     3     3     3     3     4     4     4     4     4     4 
## 46140 77350 36740 42660 12420 15180 19780 30780 38900 47580 75700 45300 
##     4     4     5     5     6     6     6     6     6     6     6     7 
## 22220 40900 26180 28140 71650 33340 71950 77200 26420 12580 23420 38300 
##     8     8     9    11    11    12    12    14    15    16    16    16 
## 19100 31100 41940 33460 73450 12060 41860 19820 16980 37980 47900 35620 
##    18    19    19    23    26    27    27    30    31    32    50    96
sort(tapply(CPS$Country == "Brazil", CPS$MetroArea, sum, na.rm=TRUE))
## 10500 10580 10900 11020 11100 11300 11340 11460 11500 11540 11700 12020 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 12100 12260 12420 12540 12580 12940 13140 13380 13460 13740 13780 13820 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 14020 14060 14260 14500 14540 15180 15380 16300 16580 16620 16700 16860 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 17460 17660 17820 17860 17980 18140 18580 19380 19460 19500 19660 19780 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 19820 20100 20260 20500 20740 20940 21340 21500 21660 21780 22020 22140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 22180 22220 22420 22460 22660 22900 23020 23060 23420 23540 24340 24540 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 24580 24660 24860 25060 25180 25420 25500 25860 26100 26180 26420 26580 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 26620 26900 26980 27100 27140 27340 27500 27740 27780 27900 28020 28100 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 28660 28700 28740 28940 29100 29180 29340 29460 29540 29620 29700 29740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 29820 29940 30020 30460 30780 30980 31180 31340 31420 31460 31540 32580 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 32780 32820 32900 33140 33260 33340 33660 33700 33780 34740 34820 34900 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 34940 34980 35380 35660 36100 36140 36260 36420 36500 36540 36780 37340 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 37460 37900 38300 38900 38940 39100 39140 39340 39380 39460 39580 39740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 39900 40060 40140 40220 40420 40980 41060 41180 41500 41540 41700 41740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 42020 42060 42100 42140 42220 42260 42340 42540 43340 43620 43780 43900 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 44060 44100 44180 44220 44700 45060 45220 45780 45820 46060 46140 46220 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 46540 46660 46700 46940 47020 47220 47300 47380 47580 47940 48140 49180 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 49420 49620 49660 70750 72400 75700 76450 76750 77350 78100 79600 10420 
##     0     0     0     0     0     0     0     0     0     0     0     1 
## 10740 12060 14740 15980 17020 17140 19740 28140 31140 33460 33740 33860 
##     1     1     1     1     1     1     1     1     1     1     1     1 
## 37100 37860 39540 40380 41420 41940 42660 45300 45940 47260 48620 73450 
##     1     1     1     1     1     1     1     1     1     1     1     1 
## 74500 78700 16740 16980 17900 19100 27260 36740 40900 70900 15940 38060 
##     1     1     2     2     2     2     2     2     2     2     3     3 
## 41620 77200 19340 37980 72850 41860 35620 71950 47900 31100 33100 71650 
##     3     3     4     4     5     6     7     7     8     9    16    18
sort(tapply(CPS$Country == "Somalia", CPS$MetroArea, sum, na.rm=TRUE))
## 10420 10500 10580 10740 10900 11020 11100 11300 11340 11460 11500 11540 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 11700 12020 12060 12100 12260 12420 12540 12580 12940 13140 13380 13460 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 13740 13780 13820 14020 14060 14260 14500 14540 14740 15180 15380 15940 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 15980 16300 16580 16620 16700 16740 16860 16980 17020 17140 17460 17660 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 17820 17860 17900 17980 18580 19100 19340 19460 19500 19660 19740 19780 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 19820 20100 20260 20500 20740 20940 21340 21500 21660 21780 22140 22180 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 22220 22420 22460 22660 22900 23020 23060 23420 23540 24340 24540 24580 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 24660 24860 25060 25180 25420 25500 25860 26100 26180 26580 26620 26900 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 26980 27100 27140 27260 27340 27500 27740 27780 27900 28020 28100 28140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 28660 28700 28740 28940 29100 29180 29340 29460 29540 29620 29700 29740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 29820 29940 30020 30460 30780 30980 31100 31140 31180 31340 31420 31460 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 31540 32580 32780 32820 32900 33100 33140 33260 33340 33660 33700 33740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 33780 33860 34740 34820 34900 34940 34980 35380 35620 35660 36100 36140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 36260 36420 36500 36540 36740 36780 37100 37340 37460 37860 37900 37980 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 38300 38940 39100 39140 39340 39380 39460 39540 39580 39740 39900 40140 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 40220 40380 40420 40900 40980 41180 41420 41500 41540 41620 41700 41740 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 41860 41940 42020 42060 42100 42140 42220 42260 42340 42540 43340 43780 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 43900 44060 44100 44180 44220 44700 45060 45220 45300 45780 45820 45940 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 46060 46140 46220 46540 46660 46700 46940 47020 47220 47260 47300 47380 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 47580 47900 47940 48140 48620 49180 49420 49620 49660 70750 70900 71650 
##     0     0     0     0     0     0     0     0     0     0     0     0 
## 71950 72850 73450 74500 75700 76450 77200 77350 78100 78700 79600 19380 
##     0     0     0     0     0     0     0     0     0     0     0     1 
## 40060 26420 43620 38900 72400 76750 18140 22020 38060 41060 42660 33460 
##     1     2     2     3     3     3     5     5     7     7     7    17
"New York-Northern New Jersey-Long Island, NY-NJ-PA"
## [1] "New York-Northern New Jersey-Long Island, NY-NJ-PA"

In Brazil?

  • Boston-Cambridge-Quincy, MA-NH
  • Minneapolis-St Paul-Bloomington, MN-WI
  • New York-Northern New Jersey-Long Island, NY-NJ-PA
  • Washington-Arlington-Alexandria, DC-VA-MD-WV
"Boston-Cambridge-Quincy, MA-NH"
## [1] "Boston-Cambridge-Quincy, MA-NH"

In Somalia?

  • Boston-Cambridge-Quincy, MA-NH
  • Minneapolis-St Paul-Bloomington, MN-WI
  • New York-Northern New Jersey-Long Island, NY-NJ-PA
  • Washington-Arlington-Alexandria, DC-VA-MD-WV
"Minneapolis-St Paul-Bloomington, MN-WI"
## [1] "Minneapolis-St Paul-Bloomington, MN-WI"
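
The tapply outputs in this subproblem are keyed by metropolitan area code rather than by name, so the largest entry has to be translated back to a name. A minimal sketch of that lookup follows. The three counts are copied from the India output above, the code-to-name pairing is inferred from the answers given in this subproblem, and the column names Code and MetroArea in the lookup table are assumptions about MetroAreaCodes.csv.

```r
# Counts keyed by metro area code, shaped like a tapply() result (three entries only).
counts <- c("33460" = 23, "71650" = 11, "35620" = 96)

# Hypothetical lookup table; the column names Code and MetroArea are assumptions.
metro_map <- data.frame(
  Code = c(33460, 35620, 71650),
  MetroArea = c("Minneapolis-St Paul-Bloomington, MN-WI",
                "New York-Northern New Jersey-Long Island, NY-NJ-PA",
                "Boston-Cambridge-Quincy, MA-NH"),
  stringsAsFactors = FALSE
)

# which.max returns the position of the largest count; its name is the code.
top_code <- names(counts)[which.max(counts)]

# match() finds the row of the lookup table holding that code.
top_name <- metro_map$MetroArea[match(as.numeric(top_code), metro_map$Code)]
print(top_name)
```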