Loading Data

library("dslabs")
data("murders")
df <- murders

Q1: Use the function str to examine the structure of the murders object. We can see that this object is a data frame with 51 rows and five columns. Which of the following best describes the variables represented in this data frame?

str(df)
## 'data.frame':    51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
##  $ population: num  4779736 710231 6392017 2915918 37253956 ...
##  $ total     : num  135 19 232 93 1257 ...
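
As a complementary check, the class of each column can also be listed directly. The following is a minimal sketch, not part of the original exercise: `toy` is a hypothetical stand-in built by hand from the first four rows shown in the str output above (the region column is left out), so that it runs without dslabs installed.

```r
# Hypothetical stand-in for the first four rows of murders
# (values copied from the str() output above; region omitted).
toy <- data.frame(
  state = c("Alabama", "Alaska", "Arizona", "Arkansas"),
  abb = c("AL", "AK", "AZ", "AR"),
  population = c(4779736, 710231, 6392017, 2915918),
  total = c(135, 19, 232, 93),
  stringsAsFactors = FALSE
)

# One class per column, returned as a named character vector.
col_classes <- sapply(toy, class)
print(col_classes)

# Number of rows and columns, the same counts str() reports in its first line.
toy_dim <- dim(toy)
print(toy_dim)
```

Applied to the full data set, the same call would be sapply(murders, class).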

Q2: What are the column names used by the data frame for these five variables?

column_name <- names(murders)
column_name
## [1] "state"      "abb"        "region"     "population" "total"

Q3: Use the accessor $ to extract the state abbreviations and assign them to the object a. What is the class of this object?

a <- murders$abb
a
##  [1] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "DC" "FL" "GA" "HI" "ID" "IL" "IN"
## [16] "IA" "KS" "KY" "LA" "ME" "MD" "MA" "MI" "MN" "MS" "MO" "MT" "NE" "NV" "NH"
## [31] "NJ" "NM" "NY" "NC" "ND" "OH" "OK" "OR" "PA" "RI" "SC" "SD" "TN" "TX" "UT"
## [46] "VT" "VA" "WA" "WV" "WI" "WY"
class(a)
## [1] "character"

Q4: Now use the square brackets to extract the state abbreviations and assign them to the object b. Use the identical function to determine if a and b are the same.

b <- murders["abb"]
b
##    abb
## 1   AL
## 2   AK
## 3   AZ
## 4   AR
## 5   CA
## 6   CO
## 7   CT
## 8   DE
## 9   DC
## 10  FL
## 11  GA
## 12  HI
## 13  ID
## 14  IL
## 15  IN
## 16  IA
## 17  KS
## 18  KY
## 19  LA
## 20  ME
## 21  MD
## 22  MA
## 23  MI
## 24  MN
## 25  MS
## 26  MO
## 27  MT
## 28  NE
## 29  NV
## 30  NH
## 31  NJ
## 32  NM
## 33  NY
## 34  NC
## 35  ND
## 36  OH
## 37  OK
## 38  OR
## 39  PA
## 40  RI
## 41  SC
## 42  SD
## 43  TN
## 44  TX
## 45  UT
## 46  VT
## 47  VA
## 48  WA
## 49  WV
## 50  WI
## 51  WY
# Element-wise, every abbreviation in a matches the corresponding entry of b:
all(a == b)
## [1] TRUE
# identical also compares type and structure. Here a is a character vector
# and b is a one-column data frame, so the two objects are not the same:
identical(a, b)
## [1] FALSE
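
The three extraction styles differ in what they return, and identical is sensitive to that difference. The sketch below illustrates this on a small hand-built data frame; `toy` and the `by_*` names are hypothetical and only serve the illustration.

```r
# Hypothetical four-row stand-in for murders.
toy <- data.frame(
  abb = c("AL", "AK", "AZ", "AR"),
  total = c(135, 19, 232, 93),
  stringsAsFactors = FALSE
)

by_dollar <- toy$abb        # $ returns the column itself: a character vector
by_double <- toy[["abb"]]   # [[ ]] also returns the column itself
by_single <- toy["abb"]     # [ ] returns a one-column data frame

print(class(by_dollar))
print(class(by_double))
print(class(by_single))

# Same type and same values, so these two are identical.
same_vec <- identical(by_dollar, by_double)
print(same_vec)

# A vector and a data frame are never identical, even with equal values.
same_df <- identical(by_dollar, by_single)
print(same_df)
```

If the goal is for a and b to be the same object, double square brackets (murders[["abb"]]) are the form to use.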

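Q5: We saw that the region column stores a factor. You can corroborate this by typing:

class(murders$region)
## [1] "factor"

With one line of code, use the functions levels and length to determine the number of regions defined by this data set.

length(levels(murders$region))
## [1] 4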
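
A factor stores integer codes together with a fixed set of levels, which is why counting the levels gives the number of regions. The sketch below rebuilds a small factor by hand. The `region` vector is hypothetical, and the level names "North Central" and "West" are an assumption, since the str output above truncates the levels after "South".

```r
# Hypothetical factor mimicking the first four entries of murders$region.
# Level names beyond "Northeast" and "South" are assumed, not taken from the output above.
region <- factor(
  c("South", "West", "West", "South"),
  levels = c("Northeast", "South", "North Central", "West")
)

# The distinct categories, in their stored order.
print(levels(region))

# nlevels() is a base R shortcut for length(levels(x)).
n_regions <- nlevels(region)
print(n_regions)

# The underlying integer codes are positions within levels(region).
codes <- as.integer(region)
print(codes)

# Counts per level, including levels that do not occur in the data.
counts <- table(region)
print(counts)
```

Note that levels with no observations still count toward the total, so the number of levels can exceed the number of distinct values actually present.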