CSC 360 Module 2 Notes

Harold Nelson

10/31/2016

Starting Up

Replace the following paths with whatever is necessary on your system.

load("~/cdc.Rdata")

cC <- read.delim("~/Downloads/Data from openintro.org/countyComplete.txt")

An Example

The variable exerany could be made more readable. The following code illustrates how a programmer coming from another language might do this, looping over the rows one at a time.

for(i in 1:nrow(cdc)){
   if (cdc[i,'exerany'] == 0){
      cdc[i,'exerany2'] = 'No Exercise'
   } else {
      cdc[i,'exerany2'] = 'Yes Exercise'
   }
}

Let’s see if this worked.

table(cdc$exerany,cdc$exerany2)
##    
##     No Exercise Yes Exercise
##   0        5086            0
##   1           0        14914

The R Way

An experienced R user would do this differently.

cdc[cdc$exerany==0,'exerany3'] = 'No exercise'
cdc[cdc$exerany==1,'exerany3'] = 'Yes exercise'

Did this work?

table(cdc$exerany,cdc$exerany3)
##    
##     No exercise Yes exercise
##   0        5086            0
##   1           0        14914
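Another vectorized option is ifelse(), which builds the whole column in a single call. The sketch below is only an illustration: it uses a small made-up data frame named toy rather than the real cdc data, and the column name exerany4 is hypothetical.

```r
# Toy stand-in for cdc (assumption: exerany is coded 0/1 as in the notes)
toy <- data.frame(exerany = c(0, 1, 1, 0, 1))

# ifelse() tests every element at once and returns a vector of labels
toy$exerany4 <- ifelse(toy$exerany == 0, "No exercise", "Yes exercise")

# Cross-tabulate to confirm each code maps to exactly one label
table(toy$exerany, toy$exerany4)
```

As with the two subsetting assignments above, no explicit loop over rows is needed.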

Subsetting

This is an example of “subsetting,” a common programming technique in R.

Suppose DF is a dataframe.

DF[Row Spec,Col Spec]

Row Spec may be a logical vector or a vector of integers specifying row numbers.

Col Spec may be a vector of column names enclosed in quotes or a vector of integers specifying column numbers.

A subsetting expression may appear on the right side or left side of a replacement statement or be used alone to display the subset.
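The rules above can be tried out on a small made-up data frame before applying them to real data. Everything in this sketch (the data frame DF, its columns id, grp and x, and the result names) is invented for illustration.

```r
# Made-up data frame used only for illustration
DF <- data.frame(id = 1:5,
                 grp = c("a", "b", "a", "c", "b"),
                 x = c(10, 20, 30, 40, 50),
                 stringsAsFactors = FALSE)

# Integer row spec, integer column spec
first_two <- DF[1:2, c(1, 3)]

# Logical row spec, column names in quotes
big_x <- DF[DF$x > 25, c("id", "grp")]

# Empty column spec: the comma alone keeps every column
group_a <- DF[DF$grp == "a", ]

# Subset on the left side: replace values in the selected cells only
DF[DF$grp == "b", "x"] <- 0

# Subset used alone: display the result
DF
```

The first three statements use a subset on the right side, the fourth uses one on the left side, and the last simply displays the data frame.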

Examples of Subsetting

Simple integer vectors

cC[1:100,1:5]
##       state                              name FIPS pop2010 pop2000
## 1   Alabama                    Autauga County 1001   54571   43671
## 2   Alabama                    Baldwin County 1003  182265  140415
## 3   Alabama                    Barbour County 1005   27457   29038
## 4   Alabama                       Bibb County 1007   22915   20826
## 5   Alabama                     Blount County 1009   57322   51024
## 6   Alabama                    Bullock County 1011   10914   11714
## 7   Alabama                     Butler County 1013   20947   21399
## 8   Alabama                    Calhoun County 1015  118572  112249
## 9   Alabama                   Chambers County 1017   34215   36583
## 10  Alabama                   Cherokee County 1019   25989   23988
## 11  Alabama                    Chilton County 1021   43643   39593
## 12  Alabama                    Choctaw County 1023   13859   15922
## 13  Alabama                     Clarke County 1025   25833   27867
## 14  Alabama                       Clay County 1027   13932   14254
## 15  Alabama                   Cleburne County 1029   14972   14123
## 16  Alabama                     Coffee County 1031   49948   43615
## 17  Alabama                    Colbert County 1033   54428   54984
## 18  Alabama                    Conecuh County 1035   13228   14089
## 19  Alabama                      Coosa County 1037   11539   12202
## 20  Alabama                  Covington County 1039   37765   37631
## 21  Alabama                   Crenshaw County 1041   13906   13665
## 22  Alabama                    Cullman County 1043   80406   77483
## 23  Alabama                       Dale County 1045   50251   49129
## 24  Alabama                     Dallas County 1047   43820   46365
## 25  Alabama                     DeKalb County 1049   71109   64452
## 26  Alabama                     Elmore County 1051   79303   65874
## 27  Alabama                   Escambia County 1053   38319   38440
## 28  Alabama                     Etowah County 1055  104430  103459
## 29  Alabama                    Fayette County 1057   17241   18495
## 30  Alabama                   Franklin County 1059   31704   31223
## 31  Alabama                     Geneva County 1061   26790   25764
## 32  Alabama                     Greene County 1063    9045    9974
## 33  Alabama                       Hale County 1065   15760   17185
## 34  Alabama                      Henry County 1067   17302   16310
## 35  Alabama                    Houston County 1069  101547   88787
## 36  Alabama                    Jackson County 1071   53227   53926
## 37  Alabama                  Jefferson County 1073  658466  662047
## 38  Alabama                      Lamar County 1075   14564   15904
## 39  Alabama                 Lauderdale County 1077   92709   87966
## 40  Alabama                   Lawrence County 1079   34339   34803
## 41  Alabama                        Lee County 1081  140247  115092
## 42  Alabama                  Limestone County 1083   82782   65676
## 43  Alabama                    Lowndes County 1085   11299   13473
## 44  Alabama                      Macon County 1087   21452   24105
## 45  Alabama                    Madison County 1089  334811  276700
## 46  Alabama                    Marengo County 1091   21027   22539
## 47  Alabama                     Marion County 1093   30776   31214
## 48  Alabama                   Marshall County 1095   93019   82231
## 49  Alabama                     Mobile County 1097  412992  399843
## 50  Alabama                     Monroe County 1099   23068   24324
## 51  Alabama                 Montgomery County 1101  229363  223510
## 52  Alabama                     Morgan County 1103  119490  111064
## 53  Alabama                      Perry County 1105   10591   11861
## 54  Alabama                    Pickens County 1107   19746   20949
## 55  Alabama                       Pike County 1109   32899   29605
## 56  Alabama                   Randolph County 1111   22913   22380
## 57  Alabama                    Russell County 1113   52947   49756
## 58  Alabama                  St. Clair County 1115   83593   64742
## 59  Alabama                     Shelby County 1117  195085  143293
## 60  Alabama                     Sumter County 1119   13763   14798
## 61  Alabama                  Talladega County 1121   82291   80321
## 62  Alabama                 Tallapoosa County 1123   41616   41475
## 63  Alabama                 Tuscaloosa County 1125  194656  164875
## 64  Alabama                     Walker County 1127   67023   70713
## 65  Alabama                 Washington County 1129   17581   18097
## 66  Alabama                     Wilcox County 1131   11670   13183
## 67  Alabama                    Winston County 1133   24484   24843
## 68   Alaska            Aleutians East Borough 2013    3141    2697
## 69   Alaska        Aleutians West Census Area 2016    5561    5465
## 70   Alaska            Anchorage Municipality 2020  291826  260283
## 71   Alaska                Bethel Census Area 2050   17013   16006
## 72   Alaska               Bristol Bay Borough 2060     997    1258
## 73   Alaska                    Denali Borough 2068    1826    1893
## 74   Alaska            Dillingham Census Area 2070    4847    4922
## 75   Alaska      Fairbanks North Star Borough 2090   97581   82840
## 76   Alaska                    Haines Borough 2100    2508    2392
## 77   Alaska         Hoonah-Angoon Census Area 2105    2150    3436
## 78   Alaska           Juneau City and Borough 2110   31275   30711
## 79   Alaska           Kenai Peninsula Borough 2122   55400   49691
## 80   Alaska         Ketchikan Gateway Borough 2130   13477   14070
## 81   Alaska             Kodiak Island Borough 2150   13592   13913
## 82   Alaska        Lake and Peninsula Borough 2164    1631    1823
## 83   Alaska         Matanuska-Susitna Borough 2170   88995   59322
## 84   Alaska                  Nome Census Area 2180    9492    9196
## 85   Alaska               North Slope Borough 2185    9430    7385
## 86   Alaska          Northwest Arctic Borough 2188    7523    7208
## 87   Alaska            Petersburg Census Area 2195    3815    6684
## 88   Alaska Prince of Wales-Hyder Census Area 2198    5559    6146
## 89   Alaska            Sitka City and Borough 2220    8881    8835
## 90   Alaska              Skagway Municipality 2230     968      NA
## 91   Alaska   Southeast Fairbanks Census Area 2240    7029    6174
## 92   Alaska        Valdez-Cordova Census Area 2261    9636   10195
## 93   Alaska          Wade Hampton Census Area 2270    7459    7028
## 94   Alaska         Wrangell City and Borough 2275    2369      NA
## 95   Alaska          Yakutat City and Borough 2282     662     808
## 96   Alaska         Yukon-Koyukuk Census Area 2290    5588    6551
## 97  Arizona                     Apache County 4001   71518   69423
## 98  Arizona                    Cochise County 4003  131346  117755
## 99  Arizona                   Coconino County 4005  134421  116320
## 100 Arizona                       Gila County 4007   53597   51335

Logical specification of rows

PNW = cC[cC$state == "Washington" | cC$state == "Oregon", c(1:2, 10:15)]
PNW
##           state                name white black native asian pac_isl
## 2209     Oregon        Baker County  94.6   0.4    1.1   0.5      NA
## 2210     Oregon       Benton County  87.1   0.9    0.7   5.2     0.2
## 2211     Oregon    Clackamas County  88.2   0.8    0.8   3.7     0.2
## 2212     Oregon      Clatsop County  90.9   0.5    1.0   1.2     0.2
## 2213     Oregon     Columbia County  92.5   0.4    1.3   0.9     0.2
## 2214     Oregon         Coos County  89.8   0.4    2.5   1.0     0.2
## 2215     Oregon        Crook County  92.7   0.2    1.4   0.5      NA
## 2216     Oregon        Curry County  92.0   0.3    1.9   0.7     0.1
## 2217     Oregon    Deschutes County  92.2   0.4    0.9   0.9     0.1
## 2218     Oregon      Douglas County  92.4   0.3    1.8   1.0     0.1
## 2219     Oregon      Gilliam County  95.2   0.2    1.0   0.2     0.7
## 2220     Oregon        Grant County  95.0   0.2    1.2   0.3     0.1
## 2221     Oregon       Harney County  91.9   0.3    3.1   0.5     0.0
## 2222     Oregon   Hood River County  83.1   0.5    0.8   1.4     0.2
## 2223     Oregon      Jackson County  88.7   0.7    1.2   1.2     0.3
## 2224     Oregon    Jefferson County  69.0   0.6   16.9   0.4     0.1
## 2225     Oregon    Josephine County  92.4   0.4    1.4   0.8     0.2
## 2226     Oregon      Klamath County  85.9   0.7    4.1   0.9     0.1
## 2227     Oregon         Lake County  90.3   0.5    2.1   0.7     0.1
## 2228     Oregon         Lane County  88.3   1.0    1.2   2.4     0.2
## 2229     Oregon      Lincoln County  87.7   0.4    3.5   1.1     0.1
## 2230     Oregon         Linn County  90.6   0.5    1.3   1.0     0.1
## 2231     Oregon      Malheur County  77.5   1.2    1.2   1.7     0.1
## 2232     Oregon       Marion County  78.2   1.1    1.6   1.9     0.7
## 2233     Oregon       Morrow County  77.7   0.5    1.2   0.9     0.1
## 2234     Oregon    Multnomah County  76.5   5.6    1.1   6.5     0.5
## 2235     Oregon         Polk County  85.9   0.6    2.1   1.9     0.3
## 2236     Oregon      Sherman County  93.4   0.2    1.6   0.2     0.1
## 2237     Oregon    Tillamook County  91.5   0.3    1.0   0.9     0.2
## 2238     Oregon     Umatilla County  79.1   0.8    3.5   0.9     0.1
## 2239     Oregon        Union County  93.1   0.5    1.1   0.8     0.9
## 2240     Oregon      Wallowa County  96.0   0.4    0.6   0.3      NA
## 2241     Oregon        Wasco County  86.1   0.4    4.4   0.8     0.6
## 2242     Oregon   Washington County  76.6   1.8    0.7   8.6     0.5
## 2243     Oregon      Wheeler County  92.4   0.0    1.2   0.6     0.1
## 2244     Oregon      Yamhill County  85.4   0.9    1.5   1.5     0.2
## 2955 Washington        Adams County  62.5   0.6    1.9   0.7     0.0
## 2956 Washington       Asotin County  94.3   0.4    1.4   0.5      NA
## 2957 Washington       Benton County  82.4   1.3    0.9   2.7     0.1
## 2958 Washington       Chelan County  79.3   0.3    1.0   0.8     0.1
## 2959 Washington      Clallam County  87.0   0.8    5.1   1.4     0.1
## 2960 Washington        Clark County  85.4   2.0    0.9   4.1     0.6
## 2961 Washington     Columbia County  93.0   0.3    1.4   0.6      NA
## 2962 Washington      Cowlitz County  88.9   0.6    1.5   1.5     0.2
## 2963 Washington      Douglas County  79.6   0.3    1.1   0.7     0.1
## 2964 Washington        Ferry County  76.3   0.3   16.7   0.7     0.1
## 2965 Washington     Franklin County  60.5   1.9    0.7   1.8     0.1
## 2966 Washington     Garfield County  93.8   0.0    0.3   1.7      NA
## 2967 Washington        Grant County  72.8   1.1    1.2   0.9     0.1
## 2968 Washington Grays Harbor County  84.9   1.1    4.6   1.4     0.3
## 2969 Washington       Island County  86.1   2.2    0.8   4.4     0.5
## 2970 Washington    Jefferson County  91.0   0.8    2.3   1.6     0.2
## 2971 Washington         King County  68.7   6.2    0.8  14.6     0.8
## 2972 Washington       Kitsap County  82.6   2.6    1.6   4.9     0.9
## 2973 Washington     Kittitas County  89.3   0.9    1.0   2.0     0.1
## 2974 Washington    Klickitat County  87.7   0.2    2.4   0.6     0.1
## 2975 Washington        Lewis County  89.7   0.5    1.4   0.9     0.2
## 2976 Washington      Lincoln County  95.0   0.3    1.6   0.4     0.0
## 2977 Washington        Mason County  86.1   1.1    3.7   1.2     0.4
## 2978 Washington     Okanogan County  73.9   0.4   11.4   0.6     0.1
## 2979 Washington      Pacific County  87.4   0.4    2.3   2.0     0.1
## 2980 Washington Pend Oreille County  91.6   0.4    3.8   0.6     0.1
## 2981 Washington       Pierce County  74.2   6.8    1.4   6.0     1.3
## 2982 Washington     San Juan County  92.6   0.3    0.7   1.1     0.1
## 2983 Washington       Skagit County  83.4   0.7    2.2   1.8     0.2
## 2984 Washington     Skamania County  92.8   0.4    1.6   0.9     0.1
## 2985 Washington    Snohomish County  78.4   2.5    1.4   8.9     0.4
## 2986 Washington      Spokane County  89.2   1.7    1.5   2.1     0.4
## 2987 Washington      Stevens County  89.4   0.3    5.5   0.5     0.2
## 2988 Washington     Thurston County  82.4   2.7    1.4   5.2     0.8
## 2989 Washington    Wahkiakum County  94.0   0.3    1.3   0.6     0.2
## 2990 Washington  Walla Walla County  84.5   1.8    1.0   1.3     0.3
## 2991 Washington      Whatcom County  85.4   1.0    2.8   3.5     0.2
## 2992 Washington      Whitman County  84.6   1.7    0.7   7.8     0.2
## 2993 Washington       Yakima County  63.7   1.0    4.3   1.1     0.1
##      two_plus_races
## 2209            2.4
## 2210            3.6
## 2211            3.2
## 2212            2.8
## 2213            3.4
## 2214            4.3
## 2215            2.0
## 2216            3.7
## 2217            2.5
## 2218            3.2
## 2219            1.4
## 2220            2.3
## 2221            3.0
## 2222            3.2
## 2223            3.5
## 2224            3.8
## 2225            3.2
## 2226            4.1
## 2227            3.3
## 2228            4.2
## 2229            3.7
## 2230            3.3
## 2231            2.9
## 2232            3.9
## 2233            2.6
## 2234            4.6
## 2235            3.8
## 2236            1.8
## 2237            2.4
## 2238            3.1
## 2239            2.3
## 2240            2.0
## 2241            2.5
## 2242            4.3
## 2243            3.1
## 2244            3.3
## 2955            2.8
## 2956            2.4
## 2957            3.6
## 2958            2.7
## 2959            3.8
## 2960            4.0
## 2961            2.7
## 2962            3.7
## 2963            2.6
## 2964            4.8
## 2965            3.2
## 2966            1.9
## 2967            3.5
## 2968            3.9
## 2969            4.5
## 2970            3.4
## 2971            5.0
## 2972            5.8
## 2973            3.0
## 2974            3.3
## 2975            3.2
## 2976            2.2
## 2977            4.1
## 2978            3.5
## 2979            3.4
## 2980            2.9
## 2981            6.8
## 2982            2.5
## 2983            3.2
## 2984            3.0
## 2985            4.6
## 2986            3.8
## 2987            3.3
## 2988            5.3
## 2989            3.1
## 2990            3.1
## 2991            3.8
## 2992            3.6
## 2993            3.7

Dealing with Factors in Subsets

Get all of the variables for PNW.

PNW = cC[cC$state %in% c("Oregon","Washington"),]

Note the use of %in% and the placeholder comma: leaving the column spec empty after the comma keeps every column.
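To see why %in% is the right tool here, the following sketch compares it with the chained == version used earlier and with a tempting but incorrect use of == against a vector. The vector state below is made up for illustration and is not the cC data.

```r
# Made-up vector of state names for illustration
state <- c("Oregon", "Idaho", "Washington", "Oregon", "Utah", "Washington")

# Two equivalent logical row specs
by_or <- state == "Washington" | state == "Oregon"
by_in <- state %in% c("Oregon", "Washington")
identical(by_or, by_in)

# Pitfall: == recycles the right-hand vector element by element,
# so it compares position by position instead of testing membership
by_eq <- state == c("Oregon", "Washington")
by_eq
```

Comparing sum(by_in) with sum(by_eq) shows that the == version silently misses some matching rows.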

Let’s do a side-by-side boxplot of pop2010 by state.

boxplot(PNW$pop2010~PNW$state)

Why is our graph cluttered up with the names of all of the states? What can we do about this?

Check the structure of the large dataframe and examine the variable state.

str(cC)
## 'data.frame':    3143 obs. of  53 variables:
##  $ state                                    : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ name                                     : Factor w/ 1877 levels "Abbeville County",..: 83 90 101 151 166 227 237 250 298 320 ...
##  $ FIPS                                     : int  1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
##  $ pop2010                                  : int  54571 182265 27457 22915 57322 10914 20947 118572 34215 25989 ...
##  $ pop2000                                  : int  43671 140415 29038 20826 51024 11714 21399 112249 36583 23988 ...
##  $ age_under_5                              : num  6.6 6.1 6.2 6 6.3 6.8 6.5 6.1 5.7 5.3 ...
##  $ age_under_18                             : num  26.8 23 21.9 22.7 24.6 22.3 24.1 22.9 22.5 21.4 ...
##  $ age_over_65                              : num  12 16.8 14.2 12.7 14.7 13.5 16.7 14.3 16.7 17.9 ...
##  $ female                                   : num  51.3 51.1 46.9 46.3 50.5 45.8 53 51.8 52.2 50.4 ...
##  $ white                                    : num  78.5 85.7 48 75.8 92.6 23 54.4 74.9 58.8 92.7 ...
##  $ black                                    : num  17.7 9.4 46.9 22 1.3 70.2 43.4 20.6 38.7 4.6 ...
##  $ native                                   : num  0.4 0.7 0.4 0.3 0.5 0.2 0.3 0.5 0.2 0.5 ...
##  $ asian                                    : num  0.9 0.7 0.4 0.1 0.2 0.2 0.8 0.7 0.5 0.2 ...
##  $ pac_isl                                  : num  NA NA NA NA NA NA 0 0.1 0 0 ...
##  $ two_plus_races                           : num  1.6 1.5 0.9 0.9 1.2 0.8 0.8 1.7 1.1 1.5 ...
##  $ hispanic                                 : num  2.4 4.4 5.1 1.8 8.1 7.1 0.9 3.3 1.6 1.2 ...
##  $ white_not_hispanic                       : num  77.2 83.5 46.8 75 88.9 21.9 54.1 73.6 58.1 92.1 ...
##  $ no_move_in_one_plus_year                 : num  86.3 83 83 90.5 87.2 88.5 92.8 82.9 86.2 88.1 ...
##  $ foreign_born                             : num  2 3.6 2.8 0.7 4.7 1.1 1.1 2.5 0.9 0.5 ...
##  $ foreign_spoken_at_home                   : num  3.7 5.5 4.7 1.5 7.2 3.8 1.6 4.5 1.6 1.4 ...
##  $ hs_grad                                  : num  85.3 87.6 71.9 74.5 74.7 74.7 74.8 78.5 71.8 73.4 ...
##  $ bachelors                                : num  21.7 26.8 13.5 10 12.5 12 11 16.1 10.8 10.5 ...
##  $ veterans                                 : int  5817 20396 2327 1883 4072 943 1675 11757 2893 2172 ...
##  $ mean_work_travel                         : num  25.1 25.8 23.8 28.3 33.2 28.1 25.1 22.1 23.6 26.2 ...
##  $ housing_units                            : int  22135 104061 11829 8981 23887 4493 9964 53289 17004 16267 ...
##  $ home_ownership                           : num  77.5 76.7 68 82.9 82 76.9 69 70.7 71.4 77.5 ...
##  $ housing_multi_unit                       : num  7.2 22.6 11.1 6.6 3.7 9.9 13.7 14.3 8.7 4.3 ...
##  $ median_val_owner_occupied                : num  133900 177200 88200 81200 113700 ...
##  $ households                               : int  19718 69476 9795 7441 20605 3732 8019 46421 13681 11352 ...
##  $ persons_per_household                    : num  2.7 2.5 2.52 3.02 2.73 2.85 2.58 2.46 2.51 2.22 ...
##  $ per_capita_income                        : int  24568 26469 15875 19918 21070 20289 16916 20574 16626 21322 ...
##  $ median_household_income                  : int  53255 50147 33219 41770 45549 31602 30659 38407 31467 40690 ...
##  $ poverty                                  : num  10.6 12.2 25 12.6 13.4 25.3 25 19.5 20.3 17.6 ...
##  $ private_nonfarm_establishments           : int  877 4812 522 318 749 120 446 2444 568 350 ...
##  $ private_nonfarm_employment               : int  10628 52233 7990 2927 6968 1919 5400 38324 6241 3600 ...
##  $ percent_change_private_nonfarm_employment: num  16.6 17.4 -27 -14 -11.4 -18.5 2.1 -5.6 -45.8 5.4 ...
##  $ nonemployment_establishments             : int  2971 14175 1527 1192 3501 390 1180 6329 2074 1627 ...
##  $ firms                                    : int  4067 19035 1667 1385 4458 417 1769 8713 1981 2180 ...
##  $ black_owned_firms                        : num  15.2 2.7 NA 14.9 NA NA NA 7.2 NA NA ...
##  $ native_owned_firms                       : num  NA 0.4 NA NA NA NA NA NA NA NA ...
##  $ asian_owned_firms                        : num  1.3 1 NA NA NA NA 3.3 1.6 NA NA ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ hispanic_owned_firms                     : num  0.7 1.3 NA NA NA NA NA 0.5 NA NA ...
##  $ women_owned_firms                        : num  31.7 27.3 27 NA 23.2 38.8 NA 24.7 29.3 14.5 ...
##  $ manufacturer_shipments_2007              : int  NA 1410273 NA 0 341544 NA 399132 2679991 667283 307439 ...
##  $ mercent_whole_sales_2007                 : int  NA NA NA NA NA NA 56712 NA NA 62293 ...
##  $ sales                                    : int  598175 2966489 188337 124707 319700 43810 229277 1542981 264650 186321 ...
##  $ sales_per_capita                         : int  12003 17166 6334 5804 5622 3995 11326 13678 7620 7613 ...
##  $ accommodation_food_service               : int  88157 436955 NA 10757 20941 3670 28427 186533 23237 13948 ...
##  $ building_permits                         : int  191 696 10 8 18 1 3 107 10 6 ...
##  $ fed_spending                             : int  331142 1119082 240308 163201 294114 108846 195055 1830659 294718 184642 ...
##  $ area                                     : num  594 1590 885 623 645 ...
##  $ density                                  : num  91.8 114.6 31 36.8 88.9 ...

Now look at the small dataframe and examine state.

str(PNW)
## 'data.frame':    75 obs. of  53 variables:
##  $ state                                    : Factor w/ 51 levels "Alabama","Alaska",..: 38 38 38 38 38 38 38 38 38 38 ...
##  $ name                                     : Factor w/ 1877 levels "Abbeville County",..: 89 140 342 351 386 402 428 439 481 510 ...
##  $ FIPS                                     : int  41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
##  $ pop2010                                  : int  16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
##  $ pop2000                                  : int  16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
##  $ age_under_5                              : num  5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
##  $ age_under_18                             : num  20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
##  $ age_over_65                              : num  22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
##  $ female                                   : num  49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
##  $ white                                    : num  94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
##  $ black                                    : num  0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
##  $ native                                   : num  1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
##  $ asian                                    : num  0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
##  $ pac_isl                                  : num  NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
##  $ two_plus_races                           : num  2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
##  $ hispanic                                 : num  3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
##  $ white_not_hispanic                       : num  92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
##  $ no_move_in_one_plus_year                 : num  84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
##  $ foreign_born                             : num  0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
##  $ foreign_spoken_at_home                   : num  0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
##  $ hs_grad                                  : num  88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
##  $ bachelors                                : num  20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
##  $ veterans                                 : int  2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
##  $ mean_work_travel                         : num  14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
##  $ housing_units                            : int  8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
##  $ home_ownership                           : num  71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
##  $ housing_multi_unit                       : num  11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
##  $ median_val_owner_occupied                : num  142400 263200 331100 253100 219800 ...
##  $ households                               : int  6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
##  $ persons_per_household                    : num  2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
##  $ per_capita_income                        : int  21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
##  $ median_household_income                  : int  39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
##  $ poverty                                  : num  19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
##  $ private_nonfarm_establishments           : int  548 2077 10865 1480 964 1720 501 721 5951 2621 ...
##  $ private_nonfarm_employment               : int  3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
##  $ percent_change_private_nonfarm_employment: num  -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
##  $ nonemployment_establishments             : int  1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
##  $ firms                                    : int  1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
##  $ black_owned_firms                        : num  NA NA 0.6 NA NA NA NA NA 0.3 NA ...
##  $ native_owned_firms                       : num  NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
##  $ asian_owned_firms                        : num  NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA 0.1 NA ...
##  $ hispanic_owned_firms                     : num  NA NA 2.7 NA NA NA NA NA 1.8 NA ...
##  $ women_owned_firms                        : num  NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
##  $ manufacturer_shipments_2007              : int  137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
##  $ mercent_whole_sales_2007                 : int  19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
##  $ sales                                    : int  155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
##  $ sales_per_capita                         : int  9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
##  $ accommodation_food_service               : int  25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
##  $ building_permits                         : int  34 93 665 160 62 24 61 27 377 181 ...
##  $ fed_spending                             : int  171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
##  $ area                                     : num  3068 676 1870 829 657 ...
##  $ density                                  : num  5.3 126.6 201 44.7 75.1 ...
table(PNW$state)
## 
##              Alabama               Alaska              Arizona 
##                    0                    0                    0 
##             Arkansas           California             Colorado 
##                    0                    0                    0 
##          Connecticut             Delaware District of Columbia 
##                    0                    0                    0 
##              Florida              Georgia               Hawaii 
##                    0                    0                    0 
##                Idaho             Illinois              Indiana 
##                    0                    0                    0 
##                 Iowa               Kansas             Kentucky 
##                    0                    0                    0 
##            Louisiana                Maine             Maryland 
##                    0                    0                    0 
##        Massachusetts             Michigan            Minnesota 
##                    0                    0                    0 
##          Mississippi             Missouri              Montana 
##                    0                    0                    0 
##             Nebraska               Nevada        New Hampshire 
##                    0                    0                    0 
##           New Jersey           New Mexico             New York 
##                    0                    0                    0 
##       North Carolina         North Dakota                 Ohio 
##                    0                    0                    0 
##             Oklahoma               Oregon         Pennsylvania 
##                    0                   36                    0 
##         Rhode Island       South Carolina         South Dakota 
##                    0                    0                    0 
##            Tennessee                Texas                 Utah 
##                    0                    0                    0 
##              Vermont             Virginia           Washington 
##                    0                    0                   39 
##        West Virginia            Wisconsin              Wyoming 
##                    0                    0                    0

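The table shows that the factor state still carries all 51 levels, and str(PNW) shows that the factor name still has a level for every county name in the entire US (1877 levels). How do we fix this?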
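The behavior is easy to reproduce on a tiny made-up factor. The sketch below shows that subsetting keeps unused levels, and then two ways to get rid of them: converting to character, which is the approach these notes take for PNW, and droplevels(), a base R alternative that is not used in the notes. The names st, sub, sub_chr and sub_fac are hypothetical.

```r
# Made-up factor with three levels
st <- factor(c("Oregon", "Texas", "Washington", "Oregon"))

# Subsetting keeps all original levels, even those no longer present
sub <- st[st %in% c("Oregon", "Washington")]
levels(sub)
table(sub)

# Option 1: convert to character, which discards the level set
sub_chr <- as.character(sub)
table(sub_chr)

# Option 2: keep a factor but drop the unused levels
sub_fac <- droplevels(sub)
levels(sub_fac)
```

The same idea applied to the real data follows.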

PNW$state = as.character(PNW$state)
PNW$name = as.character(PNW$name)
str(PNW)
## 'data.frame':    75 obs. of  53 variables:
##  $ state                                    : chr  "Oregon" "Oregon" "Oregon" "Oregon" ...
##  $ name                                     : chr  "Baker County" "Benton County" "Clackamas County" "Clatsop County" ...
##  $ FIPS                                     : int  41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
##  $ pop2010                                  : int  16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
##  $ pop2000                                  : int  16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
##  $ age_under_5                              : num  5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
##  $ age_under_18                             : num  20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
##  $ age_over_65                              : num  22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
##  $ female                                   : num  49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
##  $ white                                    : num  94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
##  $ black                                    : num  0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
##  $ native                                   : num  1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
##  $ asian                                    : num  0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
##  $ pac_isl                                  : num  NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
##  $ two_plus_races                           : num  2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
##  $ hispanic                                 : num  3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
##  $ white_not_hispanic                       : num  92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
##  $ no_move_in_one_plus_year                 : num  84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
##  $ foreign_born                             : num  0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
##  $ foreign_spoken_at_home                   : num  0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
##  $ hs_grad                                  : num  88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
##  $ bachelors                                : num  20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
##  $ veterans                                 : int  2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
##  $ mean_work_travel                         : num  14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
##  $ housing_units                            : int  8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
##  $ home_ownership                           : num  71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
##  $ housing_multi_unit                       : num  11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
##  $ median_val_owner_occupied                : num  142400 263200 331100 253100 219800 ...
##  $ households                               : int  6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
##  $ persons_per_household                    : num  2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
##  $ per_capita_income                        : int  21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
##  $ median_household_income                  : int  39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
##  $ poverty                                  : num  19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
##  $ private_nonfarm_establishments           : int  548 2077 10865 1480 964 1720 501 721 5951 2621 ...
##  $ private_nonfarm_employment               : int  3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
##  $ percent_change_private_nonfarm_employment: num  -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
##  $ nonemployment_establishments             : int  1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
##  $ firms                                    : int  1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
##  $ black_owned_firms                        : num  NA NA 0.6 NA NA NA NA NA 0.3 NA ...
##  $ native_owned_firms                       : num  NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
##  $ asian_owned_firms                        : num  NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA 0.1 NA ...
##  $ hispanic_owned_firms                     : num  NA NA 2.7 NA NA NA NA NA 1.8 NA ...
##  $ women_owned_firms                        : num  NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
##  $ manufacturer_shipments_2007              : int  137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
##  $ mercent_whole_sales_2007                 : int  19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
##  $ sales                                    : int  155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
##  $ sales_per_capita                         : int  9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
##  $ accommodation_food_service               : int  25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
##  $ building_permits                         : int  34 93 665 160 62 24 61 27 377 181 ...
##  $ fed_spending                             : int  171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
##  $ area                                     : num  3068 676 1870 829 657 ...
##  $ density                                  : num  5.3 126.6 201 44.7 75.1 ...
table(PNW$state)
## 
##     Oregon Washington 
##         36         39

Suppose we want the state information stored as a factor. Some statistical procedures expect a categorical variable to be a factor. We can create a new factor with the factor() function; its optional labels argument supplies replacement names for the levels.

PNW$state_f1 = factor(PNW$state)
PNW$state_f2 = factor(PNW$state,labels=c("OR","WA"))
PNW$state_f2c = as.character(PNW$state_f2)
smallDF = PNW[c("state","state_f1","state_f2","state_f2c")]
smallDF
##           state   state_f1 state_f2 state_f2c
## 2209     Oregon     Oregon       OR        OR
## 2210     Oregon     Oregon       OR        OR
## 2211     Oregon     Oregon       OR        OR
## 2212     Oregon     Oregon       OR        OR
## 2213     Oregon     Oregon       OR        OR
## 2214     Oregon     Oregon       OR        OR
## 2215     Oregon     Oregon       OR        OR
## 2216     Oregon     Oregon       OR        OR
## 2217     Oregon     Oregon       OR        OR
## 2218     Oregon     Oregon       OR        OR
## 2219     Oregon     Oregon       OR        OR
## 2220     Oregon     Oregon       OR        OR
## 2221     Oregon     Oregon       OR        OR
## 2222     Oregon     Oregon       OR        OR
## 2223     Oregon     Oregon       OR        OR
## 2224     Oregon     Oregon       OR        OR
## 2225     Oregon     Oregon       OR        OR
## 2226     Oregon     Oregon       OR        OR
## 2227     Oregon     Oregon       OR        OR
## 2228     Oregon     Oregon       OR        OR
## 2229     Oregon     Oregon       OR        OR
## 2230     Oregon     Oregon       OR        OR
## 2231     Oregon     Oregon       OR        OR
## 2232     Oregon     Oregon       OR        OR
## 2233     Oregon     Oregon       OR        OR
## 2234     Oregon     Oregon       OR        OR
## 2235     Oregon     Oregon       OR        OR
## 2236     Oregon     Oregon       OR        OR
## 2237     Oregon     Oregon       OR        OR
## 2238     Oregon     Oregon       OR        OR
## 2239     Oregon     Oregon       OR        OR
## 2240     Oregon     Oregon       OR        OR
## 2241     Oregon     Oregon       OR        OR
## 2242     Oregon     Oregon       OR        OR
## 2243     Oregon     Oregon       OR        OR
## 2244     Oregon     Oregon       OR        OR
## 2955 Washington Washington       WA        WA
## 2956 Washington Washington       WA        WA
## 2957 Washington Washington       WA        WA
## 2958 Washington Washington       WA        WA
## 2959 Washington Washington       WA        WA
## 2960 Washington Washington       WA        WA
## 2961 Washington Washington       WA        WA
## 2962 Washington Washington       WA        WA
## 2963 Washington Washington       WA        WA
## 2964 Washington Washington       WA        WA
## 2965 Washington Washington       WA        WA
## 2966 Washington Washington       WA        WA
## 2967 Washington Washington       WA        WA
## 2968 Washington Washington       WA        WA
## 2969 Washington Washington       WA        WA
## 2970 Washington Washington       WA        WA
## 2971 Washington Washington       WA        WA
## 2972 Washington Washington       WA        WA
## 2973 Washington Washington       WA        WA
## 2974 Washington Washington       WA        WA
## 2975 Washington Washington       WA        WA
## 2976 Washington Washington       WA        WA
## 2977 Washington Washington       WA        WA
## 2978 Washington Washington       WA        WA
## 2979 Washington Washington       WA        WA
## 2980 Washington Washington       WA        WA
## 2981 Washington Washington       WA        WA
## 2982 Washington Washington       WA        WA
## 2983 Washington Washington       WA        WA
## 2984 Washington Washington       WA        WA
## 2985 Washington Washington       WA        WA
## 2986 Washington Washington       WA        WA
## 2987 Washington Washington       WA        WA
## 2988 Washington Washington       WA        WA
## 2989 Washington Washington       WA        WA
## 2990 Washington Washington       WA        WA
## 2991 Washington Washington       WA        WA
## 2992 Washington Washington       WA        WA
## 2993 Washington Washington       WA        WA
str(PNW)
## 'data.frame':    75 obs. of  56 variables:
##  $ state                                    : chr  "Oregon" "Oregon" "Oregon" "Oregon" ...
##  $ name                                     : chr  "Baker County" "Benton County" "Clackamas County" "Clatsop County" ...
##  $ FIPS                                     : int  41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
##  $ pop2010                                  : int  16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
##  $ pop2000                                  : int  16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
##  $ age_under_5                              : num  5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
##  $ age_under_18                             : num  20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
##  $ age_over_65                              : num  22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
##  $ female                                   : num  49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
##  $ white                                    : num  94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
##  $ black                                    : num  0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
##  $ native                                   : num  1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
##  $ asian                                    : num  0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
##  $ pac_isl                                  : num  NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
##  $ two_plus_races                           : num  2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
##  $ hispanic                                 : num  3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
##  $ white_not_hispanic                       : num  92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
##  $ no_move_in_one_plus_year                 : num  84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
##  $ foreign_born                             : num  0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
##  $ foreign_spoken_at_home                   : num  0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
##  $ hs_grad                                  : num  88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
##  $ bachelors                                : num  20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
##  $ veterans                                 : int  2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
##  $ mean_work_travel                         : num  14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
##  $ housing_units                            : int  8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
##  $ home_ownership                           : num  71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
##  $ housing_multi_unit                       : num  11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
##  $ median_val_owner_occupied                : num  142400 263200 331100 253100 219800 ...
##  $ households                               : int  6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
##  $ persons_per_household                    : num  2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
##  $ per_capita_income                        : int  21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
##  $ median_household_income                  : int  39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
##  $ poverty                                  : num  19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
##  $ private_nonfarm_establishments           : int  548 2077 10865 1480 964 1720 501 721 5951 2621 ...
##  $ private_nonfarm_employment               : int  3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
##  $ percent_change_private_nonfarm_employment: num  -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
##  $ nonemployment_establishments             : int  1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
##  $ firms                                    : int  1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
##  $ black_owned_firms                        : num  NA NA 0.6 NA NA NA NA NA 0.3 NA ...
##  $ native_owned_firms                       : num  NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
##  $ asian_owned_firms                        : num  NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
##  $ pac_isl_owned_firms                      : num  NA NA NA NA NA NA NA NA 0.1 NA ...
##  $ hispanic_owned_firms                     : num  NA NA 2.7 NA NA NA NA NA 1.8 NA ...
##  $ women_owned_firms                        : num  NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
##  $ manufacturer_shipments_2007              : int  137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
##  $ mercent_whole_sales_2007                 : int  19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
##  $ sales                                    : int  155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
##  $ sales_per_capita                         : int  9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
##  $ accommodation_food_service               : int  25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
##  $ building_permits                         : int  34 93 665 160 62 24 61 27 377 181 ...
##  $ fed_spending                             : int  171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
##  $ area                                     : num  3068 676 1870 829 657 ...
##  $ density                                  : num  5.3 126.6 201 44.7 75.1 ...
##  $ state_f1                                 : Factor w/ 2 levels "Oregon","Washington": 1 1 1 1 1 1 1 1 1 1 ...
##  $ state_f2                                 : Factor w/ 2 levels "OR","WA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ state_f2c                                : chr  "OR" "OR" "OR" "OR" ...
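
One detail worth checking when using labels: by default factor() sorts the distinct values to form the levels, and the labels are matched to those sorted levels in order, not to the order in which values appear in the data. The following is a minimal sketch on a made-up vector (st, f, f2 and f3 are illustrative names, not part of PNW).

```r
# Toy data: note that "Washington" appears first in the vector
st <- c("Washington", "Oregon", "Oregon", "Washington")

f <- factor(st)
levels(f)      # levels are sorted: "Oregon" "Washington"
as.integer(f)  # underlying integer codes: 2 1 1 2

# labels are paired with the sorted levels, so "OR" goes with "Oregon"
f2 <- factor(st, labels = c("OR", "WA"))
as.character(f2)

# Spelling out the levels makes the pairing explicit
f3 <- factor(st, levels = c("Oregon", "Washington"), labels = c("OR", "WA"))
table(f2, f3)
```

Writing the levels out explicitly, as in f3, is a little longer but protects against attaching a label to the wrong category.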

For a good tutorial on factors, see:
http://www.ats.ucla.edu/stat/r/modules/factor_variables.htm

Classic Control Structures

Most programming languages provide a similar set of standard control structures, such as conditionals and loops, for implementing the logic of algorithms.

If…Then…Else in R

Read the following:

http://www.programiz.com/r-programming/if-else-statement
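
As a quick illustration of the syntax, here is a minimal sketch using made-up values (x, parity and v are illustrative names). It shows the scalar if...else statement and the vectorized ifelse() function, which applies a test to every element of a vector at once.

```r
x <- 7

# if...else tests a single TRUE/FALSE condition.
# The else must sit on the same line as the closing brace of the if block.
if (x %% 2 == 0) {
  parity <- "even"
} else {
  parity <- "odd"
}
parity  # "odd"

# ifelse() is vectorized: one result per element of v
v <- c(1, 2, 3, 4)
ifelse(v %% 2 == 0, "even", "odd")  # "odd" "even" "odd" "even"
```

The plain if statement expects a condition of length one, so ifelse() or subsetting is usually the better fit when working on a whole column of a dataframe.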

A Task

In the countyComplete dataset, there is a variable density, probably people per square mile. I’d like to create a categorical variable that breaks this variable down into three categories: Low, Medium and High.

Your task is to create a character variable densityCat in countyComplete with these categorical values. After you create the character variable, create a factor version in a second variable, densityCatF. Low covers the first quartile of density values (the lowest 25%), Medium covers the second and third quartiles, and High covers the fourth quartile. You will need the quantile function. Use Google for help. It’s your best friend!
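
To get started, here is a minimal sketch of quantile() combined with logical subsetting. It runs on a made-up vector rather than countyComplete, and it assumes that a value exactly equal to a cutoff falls in the lower category. The names d, q, cat3 and catF are illustrative only.

```r
# Made-up values standing in for a density column
d <- c(2, 5, 9, 14, 20, 31, 47, 80)

# First and third quartile cutoffs.
# With real data containing NA values, add na.rm = TRUE.
q <- quantile(d, probs = c(0.25, 0.75))
q

# Start with everything as "Medium", then overwrite the two tails
cat3 <- rep("Medium", length(d))
cat3[d <= q[1]] <- "Low"
cat3[d > q[2]] <- "High"

# Factor version with the levels in a sensible order
catF <- factor(cat3, levels = c("Low", "Medium", "High"))
table(catF)
```

Listing the levels explicitly keeps Low, Medium and High in that order in tables and plots; the default would sort them alphabetically.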