Harold Nelson
10/31/2016
Replace the following file paths with whatever is necessary on your system.
load("~/cdc.Rdata")
cC <- read.delim("~/Downloads/Data from openintro.org/countyComplete.txt")
The variable exerany could be made more readable. The following code illustrates how a programmer coming from another language might do this, using an explicit loop over the rows.
for(i in 1:nrow(cdc)){
if (cdc[i,'exerany'] == 0){
cdc[i,'exerany2'] = 'No Exercise'
} else {
cdc[i,'exerany2'] = 'Yes Exercise'
}
}
Let’s see if this worked.
table(cdc$exerany,cdc$exerany2)
##
## No Exercise Yes Exercise
## 0 5086 0
## 1 0 14914
An experienced R user would do this differently.
cdc[cdc$exerany==0,'exerany3'] = 'No exercise'
cdc[cdc$exerany==1,'exerany3'] = 'Yes exercise'
Did this work?
table(cdc$exerany,cdc$exerany3)
##
## No exercise Yes exercise
## 0 5086 0
## 1 0 14914
This is an example of “subsetting,” a common programming technique in R.
Suppose DF is a dataframe.
DF[Row Spec,Col Spec]
Row Spec may be a logical vector or a vector of integers specifying row numbers.
Col Spec may be a vector of column names enclosed in quotes or a vector of integers specifying column numbers.
A subsetting expression may appear on the right side or the left side of an assignment statement, or it may be used alone to display the subset.
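The three uses just described can be sketched on a small made-up dataframe. The names DF, id, score and high below are hypothetical and are not part of the cdc or countyComplete data.

```r
# Toy dataframe for illustration only; id and score are made-up columns.
DF <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))

# Used alone: display rows 2 through 3 of the column named "score".
DF[2:3, "score"]

# Right side of an assignment: copy the rows where score exceeds 25.
high <- DF[DF$score > 25, c("id", "score")]

# Left side of an assignment: overwrite score in the row where id is 1.
DF[DF$id == 1, "score"] <- 0
```

Here the row spec is an integer vector in the first expression and a logical vector in the other two.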
Simple integer vectors
cC[1:100,1:5]
## state name FIPS pop2010 pop2000
## 1 Alabama Autauga County 1001 54571 43671
## 2 Alabama Baldwin County 1003 182265 140415
## 3 Alabama Barbour County 1005 27457 29038
## 4 Alabama Bibb County 1007 22915 20826
## 5 Alabama Blount County 1009 57322 51024
## 6 Alabama Bullock County 1011 10914 11714
## 7 Alabama Butler County 1013 20947 21399
## 8 Alabama Calhoun County 1015 118572 112249
## 9 Alabama Chambers County 1017 34215 36583
## 10 Alabama Cherokee County 1019 25989 23988
## 11 Alabama Chilton County 1021 43643 39593
## 12 Alabama Choctaw County 1023 13859 15922
## 13 Alabama Clarke County 1025 25833 27867
## 14 Alabama Clay County 1027 13932 14254
## 15 Alabama Cleburne County 1029 14972 14123
## 16 Alabama Coffee County 1031 49948 43615
## 17 Alabama Colbert County 1033 54428 54984
## 18 Alabama Conecuh County 1035 13228 14089
## 19 Alabama Coosa County 1037 11539 12202
## 20 Alabama Covington County 1039 37765 37631
## 21 Alabama Crenshaw County 1041 13906 13665
## 22 Alabama Cullman County 1043 80406 77483
## 23 Alabama Dale County 1045 50251 49129
## 24 Alabama Dallas County 1047 43820 46365
## 25 Alabama DeKalb County 1049 71109 64452
## 26 Alabama Elmore County 1051 79303 65874
## 27 Alabama Escambia County 1053 38319 38440
## 28 Alabama Etowah County 1055 104430 103459
## 29 Alabama Fayette County 1057 17241 18495
## 30 Alabama Franklin County 1059 31704 31223
## 31 Alabama Geneva County 1061 26790 25764
## 32 Alabama Greene County 1063 9045 9974
## 33 Alabama Hale County 1065 15760 17185
## 34 Alabama Henry County 1067 17302 16310
## 35 Alabama Houston County 1069 101547 88787
## 36 Alabama Jackson County 1071 53227 53926
## 37 Alabama Jefferson County 1073 658466 662047
## 38 Alabama Lamar County 1075 14564 15904
## 39 Alabama Lauderdale County 1077 92709 87966
## 40 Alabama Lawrence County 1079 34339 34803
## 41 Alabama Lee County 1081 140247 115092
## 42 Alabama Limestone County 1083 82782 65676
## 43 Alabama Lowndes County 1085 11299 13473
## 44 Alabama Macon County 1087 21452 24105
## 45 Alabama Madison County 1089 334811 276700
## 46 Alabama Marengo County 1091 21027 22539
## 47 Alabama Marion County 1093 30776 31214
## 48 Alabama Marshall County 1095 93019 82231
## 49 Alabama Mobile County 1097 412992 399843
## 50 Alabama Monroe County 1099 23068 24324
## 51 Alabama Montgomery County 1101 229363 223510
## 52 Alabama Morgan County 1103 119490 111064
## 53 Alabama Perry County 1105 10591 11861
## 54 Alabama Pickens County 1107 19746 20949
## 55 Alabama Pike County 1109 32899 29605
## 56 Alabama Randolph County 1111 22913 22380
## 57 Alabama Russell County 1113 52947 49756
## 58 Alabama St. Clair County 1115 83593 64742
## 59 Alabama Shelby County 1117 195085 143293
## 60 Alabama Sumter County 1119 13763 14798
## 61 Alabama Talladega County 1121 82291 80321
## 62 Alabama Tallapoosa County 1123 41616 41475
## 63 Alabama Tuscaloosa County 1125 194656 164875
## 64 Alabama Walker County 1127 67023 70713
## 65 Alabama Washington County 1129 17581 18097
## 66 Alabama Wilcox County 1131 11670 13183
## 67 Alabama Winston County 1133 24484 24843
## 68 Alaska Aleutians East Borough 2013 3141 2697
## 69 Alaska Aleutians West Census Area 2016 5561 5465
## 70 Alaska Anchorage Municipality 2020 291826 260283
## 71 Alaska Bethel Census Area 2050 17013 16006
## 72 Alaska Bristol Bay Borough 2060 997 1258
## 73 Alaska Denali Borough 2068 1826 1893
## 74 Alaska Dillingham Census Area 2070 4847 4922
## 75 Alaska Fairbanks North Star Borough 2090 97581 82840
## 76 Alaska Haines Borough 2100 2508 2392
## 77 Alaska Hoonah-Angoon Census Area 2105 2150 3436
## 78 Alaska Juneau City and Borough 2110 31275 30711
## 79 Alaska Kenai Peninsula Borough 2122 55400 49691
## 80 Alaska Ketchikan Gateway Borough 2130 13477 14070
## 81 Alaska Kodiak Island Borough 2150 13592 13913
## 82 Alaska Lake and Peninsula Borough 2164 1631 1823
## 83 Alaska Matanuska-Susitna Borough 2170 88995 59322
## 84 Alaska Nome Census Area 2180 9492 9196
## 85 Alaska North Slope Borough 2185 9430 7385
## 86 Alaska Northwest Arctic Borough 2188 7523 7208
## 87 Alaska Petersburg Census Area 2195 3815 6684
## 88 Alaska Prince of Wales-Hyder Census Area 2198 5559 6146
## 89 Alaska Sitka City and Borough 2220 8881 8835
## 90 Alaska Skagway Municipality 2230 968 NA
## 91 Alaska Southeast Fairbanks Census Area 2240 7029 6174
## 92 Alaska Valdez-Cordova Census Area 2261 9636 10195
## 93 Alaska Wade Hampton Census Area 2270 7459 7028
## 94 Alaska Wrangell City and Borough 2275 2369 NA
## 95 Alaska Yakutat City and Borough 2282 662 808
## 96 Alaska Yukon-Koyukuk Census Area 2290 5588 6551
## 97 Arizona Apache County 4001 71518 69423
## 98 Arizona Cochise County 4003 131346 117755
## 99 Arizona Coconino County 4005 134421 116320
## 100 Arizona Gila County 4007 53597 51335
Logical specification of rows
PNW = cC[cC$state == "Washington" | cC$state == "Oregon", c(1:2, 10:15)]
PNW
## state name white black native asian pac_isl
## 2209 Oregon Baker County 94.6 0.4 1.1 0.5 NA
## 2210 Oregon Benton County 87.1 0.9 0.7 5.2 0.2
## 2211 Oregon Clackamas County 88.2 0.8 0.8 3.7 0.2
## 2212 Oregon Clatsop County 90.9 0.5 1.0 1.2 0.2
## 2213 Oregon Columbia County 92.5 0.4 1.3 0.9 0.2
## 2214 Oregon Coos County 89.8 0.4 2.5 1.0 0.2
## 2215 Oregon Crook County 92.7 0.2 1.4 0.5 NA
## 2216 Oregon Curry County 92.0 0.3 1.9 0.7 0.1
## 2217 Oregon Deschutes County 92.2 0.4 0.9 0.9 0.1
## 2218 Oregon Douglas County 92.4 0.3 1.8 1.0 0.1
## 2219 Oregon Gilliam County 95.2 0.2 1.0 0.2 0.7
## 2220 Oregon Grant County 95.0 0.2 1.2 0.3 0.1
## 2221 Oregon Harney County 91.9 0.3 3.1 0.5 0.0
## 2222 Oregon Hood River County 83.1 0.5 0.8 1.4 0.2
## 2223 Oregon Jackson County 88.7 0.7 1.2 1.2 0.3
## 2224 Oregon Jefferson County 69.0 0.6 16.9 0.4 0.1
## 2225 Oregon Josephine County 92.4 0.4 1.4 0.8 0.2
## 2226 Oregon Klamath County 85.9 0.7 4.1 0.9 0.1
## 2227 Oregon Lake County 90.3 0.5 2.1 0.7 0.1
## 2228 Oregon Lane County 88.3 1.0 1.2 2.4 0.2
## 2229 Oregon Lincoln County 87.7 0.4 3.5 1.1 0.1
## 2230 Oregon Linn County 90.6 0.5 1.3 1.0 0.1
## 2231 Oregon Malheur County 77.5 1.2 1.2 1.7 0.1
## 2232 Oregon Marion County 78.2 1.1 1.6 1.9 0.7
## 2233 Oregon Morrow County 77.7 0.5 1.2 0.9 0.1
## 2234 Oregon Multnomah County 76.5 5.6 1.1 6.5 0.5
## 2235 Oregon Polk County 85.9 0.6 2.1 1.9 0.3
## 2236 Oregon Sherman County 93.4 0.2 1.6 0.2 0.1
## 2237 Oregon Tillamook County 91.5 0.3 1.0 0.9 0.2
## 2238 Oregon Umatilla County 79.1 0.8 3.5 0.9 0.1
## 2239 Oregon Union County 93.1 0.5 1.1 0.8 0.9
## 2240 Oregon Wallowa County 96.0 0.4 0.6 0.3 NA
## 2241 Oregon Wasco County 86.1 0.4 4.4 0.8 0.6
## 2242 Oregon Washington County 76.6 1.8 0.7 8.6 0.5
## 2243 Oregon Wheeler County 92.4 0.0 1.2 0.6 0.1
## 2244 Oregon Yamhill County 85.4 0.9 1.5 1.5 0.2
## 2955 Washington Adams County 62.5 0.6 1.9 0.7 0.0
## 2956 Washington Asotin County 94.3 0.4 1.4 0.5 NA
## 2957 Washington Benton County 82.4 1.3 0.9 2.7 0.1
## 2958 Washington Chelan County 79.3 0.3 1.0 0.8 0.1
## 2959 Washington Clallam County 87.0 0.8 5.1 1.4 0.1
## 2960 Washington Clark County 85.4 2.0 0.9 4.1 0.6
## 2961 Washington Columbia County 93.0 0.3 1.4 0.6 NA
## 2962 Washington Cowlitz County 88.9 0.6 1.5 1.5 0.2
## 2963 Washington Douglas County 79.6 0.3 1.1 0.7 0.1
## 2964 Washington Ferry County 76.3 0.3 16.7 0.7 0.1
## 2965 Washington Franklin County 60.5 1.9 0.7 1.8 0.1
## 2966 Washington Garfield County 93.8 0.0 0.3 1.7 NA
## 2967 Washington Grant County 72.8 1.1 1.2 0.9 0.1
## 2968 Washington Grays Harbor County 84.9 1.1 4.6 1.4 0.3
## 2969 Washington Island County 86.1 2.2 0.8 4.4 0.5
## 2970 Washington Jefferson County 91.0 0.8 2.3 1.6 0.2
## 2971 Washington King County 68.7 6.2 0.8 14.6 0.8
## 2972 Washington Kitsap County 82.6 2.6 1.6 4.9 0.9
## 2973 Washington Kittitas County 89.3 0.9 1.0 2.0 0.1
## 2974 Washington Klickitat County 87.7 0.2 2.4 0.6 0.1
## 2975 Washington Lewis County 89.7 0.5 1.4 0.9 0.2
## 2976 Washington Lincoln County 95.0 0.3 1.6 0.4 0.0
## 2977 Washington Mason County 86.1 1.1 3.7 1.2 0.4
## 2978 Washington Okanogan County 73.9 0.4 11.4 0.6 0.1
## 2979 Washington Pacific County 87.4 0.4 2.3 2.0 0.1
## 2980 Washington Pend Oreille County 91.6 0.4 3.8 0.6 0.1
## 2981 Washington Pierce County 74.2 6.8 1.4 6.0 1.3
## 2982 Washington San Juan County 92.6 0.3 0.7 1.1 0.1
## 2983 Washington Skagit County 83.4 0.7 2.2 1.8 0.2
## 2984 Washington Skamania County 92.8 0.4 1.6 0.9 0.1
## 2985 Washington Snohomish County 78.4 2.5 1.4 8.9 0.4
## 2986 Washington Spokane County 89.2 1.7 1.5 2.1 0.4
## 2987 Washington Stevens County 89.4 0.3 5.5 0.5 0.2
## 2988 Washington Thurston County 82.4 2.7 1.4 5.2 0.8
## 2989 Washington Wahkiakum County 94.0 0.3 1.3 0.6 0.2
## 2990 Washington Walla Walla County 84.5 1.8 1.0 1.3 0.3
## 2991 Washington Whatcom County 85.4 1.0 2.8 3.5 0.2
## 2992 Washington Whitman County 84.6 1.7 0.7 7.8 0.2
## 2993 Washington Yakima County 63.7 1.0 4.3 1.1 0.1
## two_plus_races
## 2209 2.4
## 2210 3.6
## 2211 3.2
## 2212 2.8
## 2213 3.4
## 2214 4.3
## 2215 2.0
## 2216 3.7
## 2217 2.5
## 2218 3.2
## 2219 1.4
## 2220 2.3
## 2221 3.0
## 2222 3.2
## 2223 3.5
## 2224 3.8
## 2225 3.2
## 2226 4.1
## 2227 3.3
## 2228 4.2
## 2229 3.7
## 2230 3.3
## 2231 2.9
## 2232 3.9
## 2233 2.6
## 2234 4.6
## 2235 3.8
## 2236 1.8
## 2237 2.4
## 2238 3.1
## 2239 2.3
## 2240 2.0
## 2241 2.5
## 2242 4.3
## 2243 3.1
## 2244 3.3
## 2955 2.8
## 2956 2.4
## 2957 3.6
## 2958 2.7
## 2959 3.8
## 2960 4.0
## 2961 2.7
## 2962 3.7
## 2963 2.6
## 2964 4.8
## 2965 3.2
## 2966 1.9
## 2967 3.5
## 2968 3.9
## 2969 4.5
## 2970 3.4
## 2971 5.0
## 2972 5.8
## 2973 3.0
## 2974 3.3
## 2975 3.2
## 2976 2.2
## 2977 4.1
## 2978 3.5
## 2979 3.4
## 2980 2.9
## 2981 6.8
## 2982 2.5
## 2983 3.2
## 2984 3.0
## 2985 4.6
## 2986 3.8
## 2987 3.3
## 2988 5.3
## 2989 3.1
## 2990 3.1
## 2991 3.8
## 2992 3.6
## 2993 3.7
Get all of the variables for PNW.
PNW = cC[cC$state %in% c("Oregon","Washington"),]
Note the use of %in% and the placeholder comma.
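As a small check on toy data (the vector states and the dataframe DF here are made up for illustration), %in% gives the same logical vector as chaining == comparisons with |, and the trailing comma is what tells R that the index selects rows.

```r
# Hypothetical toy data, not the countyComplete file.
states <- c("Oregon", "Idaho", "Washington", "Oregon")

# Two ways to build the same logical row spec.
by_or <- states == "Oregon" | states == "Washington"
by_in <- states %in% c("Oregon", "Washington")
identical(by_or, by_in)

DF <- data.frame(state = states, pop = c(10, 20, 30, 40))

# Empty column spec after the comma: keep every column of the matching rows.
DF[by_in, ]
```

Without the comma, DF[by_in] would treat the index as a column spec rather than a row spec.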
Let’s do a side-by-side boxplot of pop2010 by state.
boxplot(PNW$pop2010~PNW$state)
Why is our graph cluttered up with the names of all of the states? What can we do about this?
Check the structure of the large dataframe and examine the variable state.
str(cC)
## 'data.frame': 3143 obs. of 53 variables:
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ name : Factor w/ 1877 levels "Abbeville County",..: 83 90 101 151 166 227 237 250 298 320 ...
## $ FIPS : int 1001 1003 1005 1007 1009 1011 1013 1015 1017 1019 ...
## $ pop2010 : int 54571 182265 27457 22915 57322 10914 20947 118572 34215 25989 ...
## $ pop2000 : int 43671 140415 29038 20826 51024 11714 21399 112249 36583 23988 ...
## $ age_under_5 : num 6.6 6.1 6.2 6 6.3 6.8 6.5 6.1 5.7 5.3 ...
## $ age_under_18 : num 26.8 23 21.9 22.7 24.6 22.3 24.1 22.9 22.5 21.4 ...
## $ age_over_65 : num 12 16.8 14.2 12.7 14.7 13.5 16.7 14.3 16.7 17.9 ...
## $ female : num 51.3 51.1 46.9 46.3 50.5 45.8 53 51.8 52.2 50.4 ...
## $ white : num 78.5 85.7 48 75.8 92.6 23 54.4 74.9 58.8 92.7 ...
## $ black : num 17.7 9.4 46.9 22 1.3 70.2 43.4 20.6 38.7 4.6 ...
## $ native : num 0.4 0.7 0.4 0.3 0.5 0.2 0.3 0.5 0.2 0.5 ...
## $ asian : num 0.9 0.7 0.4 0.1 0.2 0.2 0.8 0.7 0.5 0.2 ...
## $ pac_isl : num NA NA NA NA NA NA 0 0.1 0 0 ...
## $ two_plus_races : num 1.6 1.5 0.9 0.9 1.2 0.8 0.8 1.7 1.1 1.5 ...
## $ hispanic : num 2.4 4.4 5.1 1.8 8.1 7.1 0.9 3.3 1.6 1.2 ...
## $ white_not_hispanic : num 77.2 83.5 46.8 75 88.9 21.9 54.1 73.6 58.1 92.1 ...
## $ no_move_in_one_plus_year : num 86.3 83 83 90.5 87.2 88.5 92.8 82.9 86.2 88.1 ...
## $ foreign_born : num 2 3.6 2.8 0.7 4.7 1.1 1.1 2.5 0.9 0.5 ...
## $ foreign_spoken_at_home : num 3.7 5.5 4.7 1.5 7.2 3.8 1.6 4.5 1.6 1.4 ...
## $ hs_grad : num 85.3 87.6 71.9 74.5 74.7 74.7 74.8 78.5 71.8 73.4 ...
## $ bachelors : num 21.7 26.8 13.5 10 12.5 12 11 16.1 10.8 10.5 ...
## $ veterans : int 5817 20396 2327 1883 4072 943 1675 11757 2893 2172 ...
## $ mean_work_travel : num 25.1 25.8 23.8 28.3 33.2 28.1 25.1 22.1 23.6 26.2 ...
## $ housing_units : int 22135 104061 11829 8981 23887 4493 9964 53289 17004 16267 ...
## $ home_ownership : num 77.5 76.7 68 82.9 82 76.9 69 70.7 71.4 77.5 ...
## $ housing_multi_unit : num 7.2 22.6 11.1 6.6 3.7 9.9 13.7 14.3 8.7 4.3 ...
## $ median_val_owner_occupied : num 133900 177200 88200 81200 113700 ...
## $ households : int 19718 69476 9795 7441 20605 3732 8019 46421 13681 11352 ...
## $ persons_per_household : num 2.7 2.5 2.52 3.02 2.73 2.85 2.58 2.46 2.51 2.22 ...
## $ per_capita_income : int 24568 26469 15875 19918 21070 20289 16916 20574 16626 21322 ...
## $ median_household_income : int 53255 50147 33219 41770 45549 31602 30659 38407 31467 40690 ...
## $ poverty : num 10.6 12.2 25 12.6 13.4 25.3 25 19.5 20.3 17.6 ...
## $ private_nonfarm_establishments : int 877 4812 522 318 749 120 446 2444 568 350 ...
## $ private_nonfarm_employment : int 10628 52233 7990 2927 6968 1919 5400 38324 6241 3600 ...
## $ percent_change_private_nonfarm_employment: num 16.6 17.4 -27 -14 -11.4 -18.5 2.1 -5.6 -45.8 5.4 ...
## $ nonemployment_establishments : int 2971 14175 1527 1192 3501 390 1180 6329 2074 1627 ...
## $ firms : int 4067 19035 1667 1385 4458 417 1769 8713 1981 2180 ...
## $ black_owned_firms : num 15.2 2.7 NA 14.9 NA NA NA 7.2 NA NA ...
## $ native_owned_firms : num NA 0.4 NA NA NA NA NA NA NA NA ...
## $ asian_owned_firms : num 1.3 1 NA NA NA NA 3.3 1.6 NA NA ...
## $ pac_isl_owned_firms : num NA NA NA NA NA NA NA NA NA NA ...
## $ hispanic_owned_firms : num 0.7 1.3 NA NA NA NA NA 0.5 NA NA ...
## $ women_owned_firms : num 31.7 27.3 27 NA 23.2 38.8 NA 24.7 29.3 14.5 ...
## $ manufacturer_shipments_2007 : int NA 1410273 NA 0 341544 NA 399132 2679991 667283 307439 ...
## $ mercent_whole_sales_2007 : int NA NA NA NA NA NA 56712 NA NA 62293 ...
## $ sales : int 598175 2966489 188337 124707 319700 43810 229277 1542981 264650 186321 ...
## $ sales_per_capita : int 12003 17166 6334 5804 5622 3995 11326 13678 7620 7613 ...
## $ accommodation_food_service : int 88157 436955 NA 10757 20941 3670 28427 186533 23237 13948 ...
## $ building_permits : int 191 696 10 8 18 1 3 107 10 6 ...
## $ fed_spending : int 331142 1119082 240308 163201 294114 108846 195055 1830659 294718 184642 ...
## $ area : num 594 1590 885 623 645 ...
## $ density : num 91.8 114.6 31 36.8 88.9 ...
Now look at the small dataframe and examine state.
str(PNW)
## 'data.frame': 75 obs. of 53 variables:
## $ state : Factor w/ 51 levels "Alabama","Alaska",..: 38 38 38 38 38 38 38 38 38 38 ...
## $ name : Factor w/ 1877 levels "Abbeville County",..: 89 140 342 351 386 402 428 439 481 510 ...
## $ FIPS : int 41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
## $ pop2010 : int 16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
## $ pop2000 : int 16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
## $ age_under_5 : num 5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
## $ age_under_18 : num 20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
## $ age_over_65 : num 22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
## $ female : num 49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
## $ white : num 94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
## $ black : num 0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
## $ native : num 1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
## $ asian : num 0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
## $ pac_isl : num NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
## $ two_plus_races : num 2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
## $ hispanic : num 3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
## $ white_not_hispanic : num 92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
## $ no_move_in_one_plus_year : num 84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
## $ foreign_born : num 0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
## $ foreign_spoken_at_home : num 0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
## $ hs_grad : num 88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
## $ bachelors : num 20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
## $ veterans : int 2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
## $ mean_work_travel : num 14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
## $ housing_units : int 8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
## $ home_ownership : num 71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
## $ housing_multi_unit : num 11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
## $ median_val_owner_occupied : num 142400 263200 331100 253100 219800 ...
## $ households : int 6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
## $ persons_per_household : num 2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
## $ per_capita_income : int 21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
## $ median_household_income : int 39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
## $ poverty : num 19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
## $ private_nonfarm_establishments : int 548 2077 10865 1480 964 1720 501 721 5951 2621 ...
## $ private_nonfarm_employment : int 3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
## $ percent_change_private_nonfarm_employment: num -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
## $ nonemployment_establishments : int 1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
## $ firms : int 1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
## $ black_owned_firms : num NA NA 0.6 NA NA NA NA NA 0.3 NA ...
## $ native_owned_firms : num NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
## $ asian_owned_firms : num NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
## $ pac_isl_owned_firms : num NA NA NA NA NA NA NA NA 0.1 NA ...
## $ hispanic_owned_firms : num NA NA 2.7 NA NA NA NA NA 1.8 NA ...
## $ women_owned_firms : num NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
## $ manufacturer_shipments_2007 : int 137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
## $ mercent_whole_sales_2007 : int 19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
## $ sales : int 155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
## $ sales_per_capita : int 9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
## $ accommodation_food_service : int 25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
## $ building_permits : int 34 93 665 160 62 24 61 27 377 181 ...
## $ fed_spending : int 171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
## $ area : num 3068 676 1870 829 657 ...
## $ density : num 5.3 126.6 201 44.7 75.1 ...
table(PNW$state)
##
## Alabama Alaska Arizona
## 0 0 0
## Arkansas California Colorado
## 0 0 0
## Connecticut Delaware District of Columbia
## 0 0 0
## Florida Georgia Hawaii
## 0 0 0
## Idaho Illinois Indiana
## 0 0 0
## Iowa Kansas Kentucky
## 0 0 0
## Louisiana Maine Maryland
## 0 0 0
## Massachusetts Michigan Minnesota
## 0 0 0
## Mississippi Missouri Montana
## 0 0 0
## Nebraska Nevada New Hampshire
## 0 0 0
## New Jersey New Mexico New York
## 0 0 0
## North Carolina North Dakota Ohio
## 0 0 0
## Oklahoma Oregon Pennsylvania
## 0 36 0
## Rhode Island South Carolina South Dakota
## 0 0 0
## Tennessee Texas Utah
## 0 0 0
## Vermont Virginia Washington
## 0 0 39
## West Virginia Wisconsin Wyoming
## 0 0 0
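The table shows that the factor state still carries all 51 levels, even though only two of them are in use. The str() output shows that the factor name likewise still has a level for every county name in the entire US. How do we fix this?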
PNW$state = as.character(PNW$state)
PNW$name = as.character(PNW$name)
str(PNW)
## 'data.frame': 75 obs. of 53 variables:
## $ state : chr "Oregon" "Oregon" "Oregon" "Oregon" ...
## $ name : chr "Baker County" "Benton County" "Clackamas County" "Clatsop County" ...
## $ FIPS : int 41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
## $ pop2010 : int 16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
## $ pop2000 : int 16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
## $ age_under_5 : num 5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
## $ age_under_18 : num 20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
## $ age_over_65 : num 22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
## $ female : num 49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
## $ white : num 94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
## $ black : num 0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
## $ native : num 1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
## $ asian : num 0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
## $ pac_isl : num NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
## $ two_plus_races : num 2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
## $ hispanic : num 3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
## $ white_not_hispanic : num 92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
## $ no_move_in_one_plus_year : num 84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
## $ foreign_born : num 0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
## $ foreign_spoken_at_home : num 0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
## $ hs_grad : num 88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
## $ bachelors : num 20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
## $ veterans : int 2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
## $ mean_work_travel : num 14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
## $ housing_units : int 8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
## $ home_ownership : num 71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
## $ housing_multi_unit : num 11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
## $ median_val_owner_occupied : num 142400 263200 331100 253100 219800 ...
## $ households : int 6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
## $ persons_per_household : num 2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
## $ per_capita_income : int 21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
## $ median_household_income : int 39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
## $ poverty : num 19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
## $ private_nonfarm_establishments : int 548 2077 10865 1480 964 1720 501 721 5951 2621 ...
## $ private_nonfarm_employment : int 3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
## $ percent_change_private_nonfarm_employment: num -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
## $ nonemployment_establishments : int 1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
## $ firms : int 1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
## $ black_owned_firms : num NA NA 0.6 NA NA NA NA NA 0.3 NA ...
## $ native_owned_firms : num NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
## $ asian_owned_firms : num NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
## $ pac_isl_owned_firms : num NA NA NA NA NA NA NA NA 0.1 NA ...
## $ hispanic_owned_firms : num NA NA 2.7 NA NA NA NA NA 1.8 NA ...
## $ women_owned_firms : num NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
## $ manufacturer_shipments_2007 : int 137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
## $ mercent_whole_sales_2007 : int 19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
## $ sales : int 155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
## $ sales_per_capita : int 9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
## $ accommodation_food_service : int 25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
## $ building_permits : int 34 93 665 160 62 24 61 27 377 181 ...
## $ fed_spending : int 171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
## $ area : num 3068 676 1870 829 657 ...
## $ density : num 5.3 126.6 201 44.7 75.1 ...
table(PNW$state)
##
## Oregon Washington
## 36 39
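Converting to character is one fix. Another option, sketched here on a made-up factor rather than on PNW itself, is the base R function droplevels(), which keeps the variable as a factor but discards the unused levels.

```r
# Hypothetical toy factor with three levels.
state <- factor(c("Oregon", "Idaho", "Washington", "Oregon"))

# Subsetting keeps all of the original levels, including the unused one.
sub <- state[state %in% c("Oregon", "Washington")]
nlevels(sub)

# droplevels() removes levels that no longer occur in the data.
sub2 <- droplevels(sub)
levels(sub2)
```

droplevels() also accepts a whole dataframe, in which case it is applied to every factor column. Note that in R 4.0.0 and later, read.delim() no longer converts strings to factors by default, so str() may report chr columns from the start.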
Suppose we want to have a factor with the state information. Some statistical procedures require a categorical variable to be in the form of a factor. We can create a new factor with the factor() function.
PNW$state_f1 = factor(PNW$state)
PNW$state_f2 = factor(PNW$state,labels=c("OR","WA"))
PNW$state_f2c = as.character(PNW$state_f2)
smallDF = PNW[c("state","state_f1","state_f2","state_f2c")]
smallDF
## state state_f1 state_f2 state_f2c
## 2209 Oregon Oregon OR OR
## 2210 Oregon Oregon OR OR
## 2211 Oregon Oregon OR OR
## 2212 Oregon Oregon OR OR
## 2213 Oregon Oregon OR OR
## 2214 Oregon Oregon OR OR
## 2215 Oregon Oregon OR OR
## 2216 Oregon Oregon OR OR
## 2217 Oregon Oregon OR OR
## 2218 Oregon Oregon OR OR
## 2219 Oregon Oregon OR OR
## 2220 Oregon Oregon OR OR
## 2221 Oregon Oregon OR OR
## 2222 Oregon Oregon OR OR
## 2223 Oregon Oregon OR OR
## 2224 Oregon Oregon OR OR
## 2225 Oregon Oregon OR OR
## 2226 Oregon Oregon OR OR
## 2227 Oregon Oregon OR OR
## 2228 Oregon Oregon OR OR
## 2229 Oregon Oregon OR OR
## 2230 Oregon Oregon OR OR
## 2231 Oregon Oregon OR OR
## 2232 Oregon Oregon OR OR
## 2233 Oregon Oregon OR OR
## 2234 Oregon Oregon OR OR
## 2235 Oregon Oregon OR OR
## 2236 Oregon Oregon OR OR
## 2237 Oregon Oregon OR OR
## 2238 Oregon Oregon OR OR
## 2239 Oregon Oregon OR OR
## 2240 Oregon Oregon OR OR
## 2241 Oregon Oregon OR OR
## 2242 Oregon Oregon OR OR
## 2243 Oregon Oregon OR OR
## 2244 Oregon Oregon OR OR
## 2955 Washington Washington WA WA
## 2956 Washington Washington WA WA
## 2957 Washington Washington WA WA
## 2958 Washington Washington WA WA
## 2959 Washington Washington WA WA
## 2960 Washington Washington WA WA
## 2961 Washington Washington WA WA
## 2962 Washington Washington WA WA
## 2963 Washington Washington WA WA
## 2964 Washington Washington WA WA
## 2965 Washington Washington WA WA
## 2966 Washington Washington WA WA
## 2967 Washington Washington WA WA
## 2968 Washington Washington WA WA
## 2969 Washington Washington WA WA
## 2970 Washington Washington WA WA
## 2971 Washington Washington WA WA
## 2972 Washington Washington WA WA
## 2973 Washington Washington WA WA
## 2974 Washington Washington WA WA
## 2975 Washington Washington WA WA
## 2976 Washington Washington WA WA
## 2977 Washington Washington WA WA
## 2978 Washington Washington WA WA
## 2979 Washington Washington WA WA
## 2980 Washington Washington WA WA
## 2981 Washington Washington WA WA
## 2982 Washington Washington WA WA
## 2983 Washington Washington WA WA
## 2984 Washington Washington WA WA
## 2985 Washington Washington WA WA
## 2986 Washington Washington WA WA
## 2987 Washington Washington WA WA
## 2988 Washington Washington WA WA
## 2989 Washington Washington WA WA
## 2990 Washington Washington WA WA
## 2991 Washington Washington WA WA
## 2992 Washington Washington WA WA
## 2993 Washington Washington WA WA
str(PNW)
## 'data.frame': 75 obs. of 56 variables:
## $ state : chr "Oregon" "Oregon" "Oregon" "Oregon" ...
## $ name : chr "Baker County" "Benton County" "Clackamas County" "Clatsop County" ...
## $ FIPS : int 41001 41003 41005 41007 41009 41011 41013 41015 41017 41019 ...
## $ pop2010 : int 16134 85579 375992 37039 49351 63043 20978 22364 157733 107667 ...
## $ pop2000 : int 16741 78153 338391 35630 43560 62779 19182 21137 115367 100399 ...
## $ age_under_5 : num 5.3 4.4 5.7 5.6 5.7 5.1 5.4 3.8 6.1 5.2 ...
## $ age_under_18 : num 20.3 17.8 23.7 20.5 23.5 18.9 21.9 15.7 23 20.5 ...
## $ age_over_65 : num 22 12 13.6 16.6 13.9 21.4 20 28 14.9 21 ...
## $ female : num 49.5 49.9 50.8 50.3 49.9 50.7 50.4 50.6 50.6 50.6 ...
## $ white : num 94.6 87.1 88.2 90.9 92.5 89.8 92.7 92 92.2 92.4 ...
## $ black : num 0.4 0.9 0.8 0.5 0.4 0.4 0.2 0.3 0.4 0.3 ...
## $ native : num 1.1 0.7 0.8 1 1.3 2.5 1.4 1.9 0.9 1.8 ...
## $ asian : num 0.5 5.2 3.7 1.2 0.9 1 0.5 0.7 0.9 1 ...
## $ pac_isl : num NA 0.2 0.2 0.2 0.2 0.2 NA 0.1 0.1 0.1 ...
## $ two_plus_races : num 2.4 3.6 3.2 2.8 3.4 4.3 2 3.7 2.5 3.2 ...
## $ hispanic : num 3.3 6.4 7.7 7.7 4 5.4 7 5.4 7.4 4.7 ...
## $ white_not_hispanic : num 92.6 83.6 84.5 87.2 90.3 87 89.4 88.7 88.4 89.5 ...
## $ no_move_in_one_plus_year : num 84.7 74.6 85.3 79.2 85.2 79.3 81.8 83.3 83.6 82.6 ...
## $ foreign_born : num 0.5 8.3 8.5 5 3.2 3.2 4.2 3.8 4.7 2.7 ...
## $ foreign_spoken_at_home : num 0.9 11 11.2 7.8 4.4 4.6 5.8 4.4 6.5 4.5 ...
## $ hs_grad : num 88.7 94.2 91.5 91.1 88.5 85.8 85.7 91.6 92.9 86.4 ...
## $ bachelors : num 20.5 47.9 31.4 21.6 16.8 18.3 15.4 18.5 29.1 15.5 ...
## $ veterans : int 2214 6263 31985 4447 5990 9268 2642 3536 15683 14861 ...
## $ mean_work_travel : num 14.8 17.9 26 17.5 31.1 18.9 20.6 14.1 18.6 20 ...
## $ housing_units : int 8826 36245 156945 21546 20698 30593 10202 12613 80139 48915 ...
## $ home_ownership : num 71.2 57.2 70.7 62 76.8 67 72.1 70.8 68 70.6 ...
## $ housing_multi_unit : num 11 30 21.3 22.7 10.6 12.9 7.2 10.4 14.9 12.1 ...
## $ median_val_owner_occupied : num 142400 263200 331100 253100 219800 ...
## $ households : int 6902 33471 143357 16267 19075 27247 8754 10473 63190 43916 ...
## $ persons_per_household : num 2.32 2.35 2.56 2.18 2.56 2.29 2.45 2.14 2.43 2.41 ...
## $ per_capita_income : int 21683 26177 31785 25347 24613 21981 22275 23842 27920 21342 ...
## $ median_household_income : int 39704 48012 62007 42223 55199 37491 46059 37469 53071 39711 ...
## $ poverty : num 19.9 19.1 9 12.8 10.3 16.4 14 13.9 10.5 15.6 ...
## $ private_nonfarm_establishments : int 548 2077 10865 1480 964 1720 501 721 5951 2621 ...
## $ private_nonfarm_employment : int 3840 25094 124588 13640 7284 17007 3917 4854 52759 28065 ...
## $ percent_change_private_nonfarm_employment: num -1.8 -7.9 7.7 19.8 -19.4 -6.5 -22.3 -0.7 23 -7.2 ...
## $ nonemployment_establishments : int 1161 5182 26594 2771 2719 3713 1410 1832 14497 5877 ...
## $ firms : int 1761 7157 36997 4221 4017 5034 2236 2490 21423 9160 ...
## $ black_owned_firms : num NA NA 0.6 NA NA NA NA NA 0.3 NA ...
## $ native_owned_firms : num NA NA 0.6 NA 1.1 NA NA 1.2 1 NA ...
## $ asian_owned_firms : num NA 3.4 3.6 2 NA NA NA NA 1.6 1.8 ...
## $ pac_isl_owned_firms : num NA NA NA NA NA NA NA NA 0.1 NA ...
## $ hispanic_owned_firms : num NA NA 2.7 NA NA NA NA NA 1.8 NA ...
## $ women_owned_firms : num NA 29.3 28.9 22.7 27.5 30.2 29.2 26.4 26.2 23.6 ...
## $ manufacturer_shipments_2007 : int 137989 590473 5668235 683627 826200 279107 209877 190654 897387 1506405 ...
## $ mercent_whole_sales_2007 : int 19141 387072 5292449 75515 96784 260759 100505 NA 1222598 NA ...
## $ sales : int 155456 685172 5095774 604480 357383 716036 139916 227007 2809059 1169216 ...
## $ sales_per_capita : int 9710 8398 13567 16299 7306 11339 6147 10555 18311 11276 ...
## $ accommodation_food_service : int 25659 134027 607444 160570 44223 83538 22704 56246 416425 228371 ...
## $ building_permits : int 34 93 665 160 62 24 61 27 377 181 ...
## $ fed_spending : int 171016 600481 2206421 356029 329573 643165 173511 225362 897936 1045528 ...
## $ area : num 3068 676 1870 829 657 ...
## $ density : num 5.3 126.6 201 44.7 75.1 ...
## $ state_f1 : Factor w/ 2 levels "Oregon","Washington": 1 1 1 1 1 1 1 1 1 1 ...
## $ state_f2 : Factor w/ 2 levels "OR","WA": 1 1 1 1 1 1 1 1 1 1 ...
## $ state_f2c : chr "OR" "OR" "OR" "OR" ...
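One caution about the labels argument: the labels are matched to the levels by position, and by default the levels are the sorted unique values. A minimal sketch on a made-up vector shows how naming the levels explicitly makes the pairing visible in the code.

```r
# Hypothetical toy vector; the order of the data does not matter.
state <- c("Washington", "Oregon", "Oregon")

# Pair each level with its label explicitly: Oregon -> OR, Washington -> WA.
f <- factor(state,
            levels = c("Oregon", "Washington"),
            labels = c("OR", "WA"))

as.character(f)
levels(f)
```

In the PNW example the labels come out right without levels= because "Oregon" sorts before "Washington".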
For a good tutorial on factors, see:
http://www.ats.ucla.edu/stat/r/modules/factor_variables.htm
Most computer languages come equipped with similar standard features to implement the logic of algorithms.
In the countyComplete dataset, we have a variable density, probably people per square mile. I’d like to create a categorical variable that breaks this variable down into categories High, Low and Medium.
Your task is to create a character variable densityCat in countyComplete with these categorical values. After you create the character variable, you should create a factor version in a second variable, densityCatF. Low covers the first quartile of density, Medium covers the second and third quartiles, and High covers the fourth quartile. You will need the quantile function. Use Google for help. It’s your best friend!
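As a starting point only, here is a sketch of how quantile() behaves on a made-up vector x. It is not the solution to the task, and the categorization itself is left to you.

```r
# Hypothetical toy vector, not the density variable.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# Ask for the first and third quartile cut points.
q <- quantile(x, probs = c(0.25, 0.75))
q

# The result is a named numeric vector; extract single values by name.
q[["25%"]]
q[["75%"]]
```

If the real variable contains missing values, quantile() needs na.rm = TRUE.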