state county total_employees
1 AE APO 2
2 AK ANCHORAGE 7
3 AK FAIRBANKS NORTH STAR 2
4 AK JUNEAU 3
5 AK MATANUSKA-SUSITNA 2
6 AK SITKA 1
7 AK SKAGWAY MUNICIPALITY 88
8 AL AUTAUGA 102
9 AL BALDWIN 143
10 AL BARBOUR 1
11 AL BIBB 25
12 AL BLOUNT 154
13 AL BULLOCK 13
14 AL BUTLER 29
15 AL CALHOUN 45
16 AL CHAMBERS 13
17 AL CHEROKEE 9
18 AL CHILTON 72
19 AL CHOCTAW 7
20 AL CLARKE 26
21 AL CLAY 10
22 AL CLEBURNE 7
23 AL COFFEE 14
24 AL COLBERT 199
25 AL CONECUH 11
Describing the Data
My interpretation is that the data was collected from railroad stations around the US, with the columns giving the location of each station and its number of employees. The dataset has 3 columns. The first column, state, holds the two-letter postal abbreviation for a state, stored as a character string. The second column, county, names the county within that state and is also stored as a character string. The third column, total_employees, holds the total number of employees for the corresponding state/county pair, stored as an integer.
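The column types described above can be checked directly. The sketch below is only illustrative: it builds a small stand-in `dataset` from three rows visible in the preview (the full data is not reproduced here) and assumes the real data frame has the same column names.

```r
# Stand-in for the full data: three rows copied from the preview above.
dataset <- data.frame(
  state = c("AE", "AK", "AK"),
  county = c("APO", "ANCHORAGE", "FAIRBANKS NORTH STAR"),
  total_employees = c(2L, 7L, 2L),
  stringsAsFactors = FALSE
)

# Class of each column: character for state and county, integer for the counts
col_types <- sapply(dataset, class)
print(col_types)

# Compact overview of dimensions, types, and the first few values
str(dataset)
```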
We can view the locations with most and least employees by computing the following:
dataset[which.max(dataset$total_employees), ] # maximum
state county total_employees
659 IL COOK 8207
We see that the mean number of employees per state/county pair is ~87.18 with a standard deviation of ~283.64. The minimum is 1, a value shared by several locations (Sitka, Alaska is the first to appear; Barbour, Alabama is another), and the maximum is 8207 in Cook County, Illinois.
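The summary statistics quoted above could be computed along the following lines. This is a minimal sketch on a stand-in `dataset` built from five rows of the preview, so its numbers will not match the full-data figures; the variable names `mean_emp`, `sd_emp`, `min_rows`, and `max_row` are illustrative only.

```r
# Stand-in for the full data: five rows copied from the preview above.
dataset <- data.frame(
  state = c("AK", "AK", "AL", "AL", "AL"),
  county = c("ANCHORAGE", "SITKA", "AUTAUGA", "BARBOUR", "COLBERT"),
  total_employees = c(7L, 1L, 102L, 1L, 199L),
  stringsAsFactors = FALSE
)

# Mean and sample standard deviation of the employee counts
mean_emp <- mean(dataset$total_employees)
sd_emp <- sd(dataset$total_employees)

# which.min() returns only the first minimum, so select every tied row explicitly
min_rows <- dataset[dataset$total_employees == min(dataset$total_employees), ]

# Row with the largest employee count
max_row <- dataset[which.max(dataset$total_employees), ]

print(c(mean = mean_emp, sd = sd_emp))
print(min_rows)
print(max_row)
```

Selecting on equality with `min()` rather than using `which.min()` makes ties visible, which matters here because the minimum value is shared by more than one location.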
We can also compute the number of times that each state appears in the dataframe:
state_counts <- table(dataset$state)
state_counts
AE AK AL AP AR AZ CA CO CT DC DE FL GA HI IA ID IL IN KS KY
1 6 67 1 72 15 55 57 8 1 3 67 152 3 99 36 103 92 95 119
LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH OK OR
63 12 24 16 78 86 115 78 53 94 49 89 10 21 29 12 61 88 73 33
PA RI SC SD TN TX UT VA VT WA WI WV WY
65 5 46 52 91 221 25 92 14 39 69 53 22
We can see each state and its corresponding number of occurrences in the dataframe. Three codes occur only a single time (AE, AP, and DC), while Texas occurs the most with 221 occurrences. We can also note that there are 53 distinct codes listed rather than the typical 50 states, because the dataset also treats DC (District of Columbia), AE ("Armed Forces Europe"), and AP ("Armed Forces Pacific") as states.
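These observations can also be read off the counts programmatically. The sketch below uses a hand-built stand-in for `state_counts` containing a few entries copied from the table above, and compares the codes against R's built-in `state.abb` vector (the 50 state abbreviations, which does not include DC); the names `singletons`, `most_common`, and `non_states` are illustrative. The same expressions should work on the real `table` object.

```r
# Stand-in for table(dataset$state): a few entries copied from the output above.
state_counts <- c(AE = 1, AK = 6, AL = 67, AP = 1, DC = 1, GA = 152, TX = 221)

# Codes that appear exactly once
singletons <- names(state_counts)[state_counts == 1]

# Code with the most occurrences
most_common <- names(state_counts)[which.max(state_counts)]

# Codes that are not among the 50 state abbreviations in state.abb
non_states <- setdiff(names(state_counts), state.abb)

print(singletons)
print(most_common)
print(non_states)
print(length(state_counts))  # number of distinct codes in the stand-in
```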