Section 4.4

1. Load the dplyr package and the murders dataset.

Use the function mutate to add a column named rate to murders containing the murder rate per 100,000 people. Make sure you redefine murders (murders <- [your code]) so we can keep using this variable.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(dslabs)
data(murders)
# Add the murder rate per 100,000 people and overwrite murders
murders <- mutate(murders, rate = total / (population / 100000))
murders
##                   state abb        region population total       rate
## 1               Alabama  AL         South    4779736   135  2.8244238
## 2                Alaska  AK          West     710231    19  2.6751860
## 3               Arizona  AZ          West    6392017   232  3.6295273
## 4              Arkansas  AR         South    2915918    93  3.1893901
## 5            California  CA          West   37253956  1257  3.3741383
## 6              Colorado  CO          West    5029196    65  1.2924531
## 7           Connecticut  CT     Northeast    3574097    97  2.7139722
## 8              Delaware  DE         South     897934    38  4.2319369
## 9  District of Columbia  DC         South     601723    99 16.4527532
## 10              Florida  FL         South   19687653   669  3.3980688
## 11              Georgia  GA         South    9920000   376  3.7903226
## 12               Hawaii  HI          West    1360301     7  0.5145920
## 13                Idaho  ID          West    1567582    12  0.7655102
## 14             Illinois  IL North Central   12830632   364  2.8369608
## 15              Indiana  IN North Central    6483802   142  2.1900730
## 16                 Iowa  IA North Central    3046355    21  0.6893484
## 17               Kansas  KS North Central    2853118    63  2.2081106
## 18             Kentucky  KY         South    4339367   116  2.6732010
## 19            Louisiana  LA         South    4533372   351  7.7425810
## 20                Maine  ME     Northeast    1328361    11  0.8280881
## 21             Maryland  MD         South    5773552   293  5.0748655
## 22        Massachusetts  MA     Northeast    6547629   118  1.8021791
## 23             Michigan  MI North Central    9883640   413  4.1786225
## 24            Minnesota  MN North Central    5303925    53  0.9992600
## 25          Mississippi  MS         South    2967297   120  4.0440846
## 26             Missouri  MO North Central    5988927   321  5.3598917
## 27              Montana  MT          West     989415    12  1.2128379
## 28             Nebraska  NE North Central    1826341    32  1.7521372
## 29               Nevada  NV          West    2700551    84  3.1104763
## 30        New Hampshire  NH     Northeast    1316470     5  0.3798036
## 31           New Jersey  NJ     Northeast    8791894   246  2.7980319
## 32           New Mexico  NM          West    2059179    67  3.2537239
## 33             New York  NY     Northeast   19378102   517  2.6679599
## 34       North Carolina  NC         South    9535483   286  2.9993237
## 35         North Dakota  ND North Central     672591     4  0.5947151
## 36                 Ohio  OH North Central   11536504   310  2.6871225
## 37             Oklahoma  OK         South    3751351   111  2.9589340
## 38               Oregon  OR          West    3831074    36  0.9396843
## 39         Pennsylvania  PA     Northeast   12702379   457  3.5977513
## 40         Rhode Island  RI     Northeast    1052567    16  1.5200933
## 41       South Carolina  SC         South    4625364   207  4.4753235
## 42         South Dakota  SD North Central     814180     8  0.9825837
## 43            Tennessee  TN         South    6346105   219  3.4509357
## 44                Texas  TX         South   25145561   805  3.2013603
## 45                 Utah  UT          West    2763885    22  0.7959810
## 46              Vermont  VT     Northeast     625741     2  0.3196211
## 47             Virginia  VA         South    8001024   250  3.1246001
## 48           Washington  WA          West    6724540    93  1.3829942
## 49        West Virginia  WV         South    1852994    27  1.4571013
## 50            Wisconsin  WI North Central    5686986    97  1.7056487
## 51              Wyoming  WY          West     563626     5  0.8871131
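
As a quick check of the formula, the rate is the number of murders divided by the population expressed in units of 100,000 people. The sketch below redoes that arithmetic in base R for two rows copied from the output above; the two-row data frame `toy` is a hypothetical stand-in for the full murders table, not part of the exercise.

```r
# Two rows copied from the murders output above (hypothetical toy subset)
toy <- data.frame(
  state = c("Alabama", "Alaska"),
  population = c(4779736, 710231),
  total = c(135, 19),
  stringsAsFactors = FALSE
)

# Same formula as the mutate() call: murders per 100,000 people
toy$rate <- toy$total / (toy$population / 100000)

# Alabama by hand: 135 / 47.79736 is about 2.8244
print(toy)
```

The printed rates should agree with the Alabama and Alaska rows of the table above.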
2. If rank(x) gives you the ranks of x from lowest to highest, rank(-x) gives you the ranks from highest to lowest. Use the function mutate to add a column named rank containing each state's rank, from highest to lowest murder rate. Make sure you redefine murders so we can keep using this variable.

# Rank 1 goes to the state with the highest murder rate
murders <- mutate(murders, rank = rank(-rate))
murders
##                   state abb        region population total       rate rank
## 1               Alabama  AL         South    4779736   135  2.8244238   23
## 2                Alaska  AK          West     710231    19  2.6751860   27
## 3               Arizona  AZ          West    6392017   232  3.6295273   10
## 4              Arkansas  AR         South    2915918    93  3.1893901   17
## 5            California  CA          West   37253956  1257  3.3741383   14
## 6              Colorado  CO          West    5029196    65  1.2924531   38
## 7           Connecticut  CT     Northeast    3574097    97  2.7139722   25
## 8              Delaware  DE         South     897934    38  4.2319369    6
## 9  District of Columbia  DC         South     601723    99 16.4527532    1
## 10              Florida  FL         South   19687653   669  3.3980688   13
## 11              Georgia  GA         South    9920000   376  3.7903226    9
## 12               Hawaii  HI          West    1360301     7  0.5145920   49
## 13                Idaho  ID          West    1567582    12  0.7655102   46
## 14             Illinois  IL North Central   12830632   364  2.8369608   22
## 15              Indiana  IN North Central    6483802   142  2.1900730   31
## 16                 Iowa  IA North Central    3046355    21  0.6893484   47
## 17               Kansas  KS North Central    2853118    63  2.2081106   30
## 18             Kentucky  KY         South    4339367   116  2.6732010   28
## 19            Louisiana  LA         South    4533372   351  7.7425810    2
## 20                Maine  ME     Northeast    1328361    11  0.8280881   44
## 21             Maryland  MD         South    5773552   293  5.0748655    4
## 22        Massachusetts  MA     Northeast    6547629   118  1.8021791   32
## 23             Michigan  MI North Central    9883640   413  4.1786225    7
## 24            Minnesota  MN North Central    5303925    53  0.9992600   40
## 25          Mississippi  MS         South    2967297   120  4.0440846    8
## 26             Missouri  MO North Central    5988927   321  5.3598917    3
## 27              Montana  MT          West     989415    12  1.2128379   39
## 28             Nebraska  NE North Central    1826341    32  1.7521372   33
## 29               Nevada  NV          West    2700551    84  3.1104763   19
## 30        New Hampshire  NH     Northeast    1316470     5  0.3798036   50
## 31           New Jersey  NJ     Northeast    8791894   246  2.7980319   24
## 32           New Mexico  NM          West    2059179    67  3.2537239   15
## 33             New York  NY     Northeast   19378102   517  2.6679599   29
## 34       North Carolina  NC         South    9535483   286  2.9993237   20
## 35         North Dakota  ND North Central     672591     4  0.5947151   48
## 36                 Ohio  OH North Central   11536504   310  2.6871225   26
## 37             Oklahoma  OK         South    3751351   111  2.9589340   21
## 38               Oregon  OR          West    3831074    36  0.9396843   42
## 39         Pennsylvania  PA     Northeast   12702379   457  3.5977513   11
## 40         Rhode Island  RI     Northeast    1052567    16  1.5200933   35
## 41       South Carolina  SC         South    4625364   207  4.4753235    5
## 42         South Dakota  SD North Central     814180     8  0.9825837   41
## 43            Tennessee  TN         South    6346105   219  3.4509357   12
## 44                Texas  TX         South   25145561   805  3.2013603   16
## 45                 Utah  UT          West    2763885    22  0.7959810   45
## 46              Vermont  VT     Northeast     625741     2  0.3196211   51
## 47             Virginia  VA         South    8001024   250  3.1246001   18
## 48           Washington  WA          West    6724540    93  1.3829942   37
## 49        West Virginia  WV         South    1852994    27  1.4571013   36
## 50            Wisconsin  WI North Central    5686986    97  1.7056487   34
## 51              Wyoming  WY          West     563626     5  0.8871131   43
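
The negation trick works because multiplying by -1 reverses the ordering, so the largest rate becomes the smallest value and receives rank 1. The base R sketch below uses three rates copied from the output above, plus a hypothetical vector `y` to show how rank() handles ties, which is worth knowing when you later filter on the rank column.

```r
# Murder rates for Louisiana, Maine and Missouri, copied from the output above
x <- c(7.7425810, 0.8280881, 5.3598917)

rank(x)   # lowest rate gets rank 1:  3 1 2
rank(-x)  # highest rate gets rank 1: 1 3 2

# Hypothetical values with a tie: by default rank() averages tied positions
y <- c(2, 5, 5)
rank(-y)                        # 3.0 1.5 1.5
rank(-y, ties.method = "min")   # 3 1 1
```

With the default averaging, two states tied for fifth place would both get rank 5.5 and a filter such as rank < 6 would keep them, while rank <= 5 would drop them; ties.method = "min" avoids that ambiguity.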
3. With dplyr, we can use select to show only certain columns. For example, select(murders, state, population) would show only the state names and population sizes. Use select to show the state names and abbreviations in murders. Do not redefine murders; just show the results.

# Show two columns without modifying murders
select(murders, state, abb)
##                   state abb
## 1               Alabama  AL
## 2                Alaska  AK
## 3               Arizona  AZ
## 4              Arkansas  AR
## 5            California  CA
## 6              Colorado  CO
## 7           Connecticut  CT
## 8              Delaware  DE
## 9  District of Columbia  DC
## 10              Florida  FL
## 11              Georgia  GA
## 12               Hawaii  HI
## 13                Idaho  ID
## 14             Illinois  IL
## 15              Indiana  IN
## 16                 Iowa  IA
## 17               Kansas  KS
## 18             Kentucky  KY
## 19            Louisiana  LA
## 20                Maine  ME
## 21             Maryland  MD
## 22        Massachusetts  MA
## 23             Michigan  MI
## 24            Minnesota  MN
## 25          Mississippi  MS
## 26             Missouri  MO
## 27              Montana  MT
## 28             Nebraska  NE
## 29               Nevada  NV
## 30        New Hampshire  NH
## 31           New Jersey  NJ
## 32           New Mexico  NM
## 33             New York  NY
## 34       North Carolina  NC
## 35         North Dakota  ND
## 36                 Ohio  OH
## 37             Oklahoma  OK
## 38               Oregon  OR
## 39         Pennsylvania  PA
## 40         Rhode Island  RI
## 41       South Carolina  SC
## 42         South Dakota  SD
## 43            Tennessee  TN
## 44                Texas  TX
## 45                 Utah  UT
## 46              Vermont  VT
## 47             Virginia  VA
## 48           Washington  WA
## 49        West Virginia  WV
## 50            Wisconsin  WI
## 51              Wyoming  WY
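
Besides listing columns by name, select accepts a range of adjacent columns with : and drops columns prefixed with a minus sign. The sketch below shows both on a hypothetical two-row data frame `toy` with the same columns as the original murders table; it assumes dplyr is installed.

```r
library(dplyr)

# A hypothetical two-row copy of murders with the same columns
toy <- data.frame(
  state = c("Alabama", "Alaska"),
  abb = c("AL", "AK"),
  region = c("South", "West"),
  population = c(4779736, 710231),
  total = c(135, 19),
  stringsAsFactors = FALSE
)

# Columns listed by name, as in the exercise
select(toy, state, abb)

# A range of adjacent columns, from state through region
select(toy, state:region)

# A minus sign drops columns instead of keeping them
select(toy, -population, -total)

# None of these calls change toy, because the results are not assigned
names(toy)
```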
4. The dplyr function filter is used to choose specific rows of the data frame to keep. Unlike select, which is for columns, filter is for rows. For example, you can show just the New York row like this:

filter(murders, state == "New York")

You can use other logical vectors to filter rows. Use filter to show the top 5 states with the highest murder rates. Now that murders includes the rate and rank columns, do not change it; just show the result. Remember that you can filter based on the rank column.

filter(murders, rank<6)
##                  state abb        region population total      rate rank
## 1 District of Columbia  DC         South     601723    99 16.452753    1
## 2            Louisiana  LA         South    4533372   351  7.742581    2
## 3             Maryland  MD         South    5773552   293  5.074866    4
## 4             Missouri  MO North Central    5988927   321  5.359892    3
## 5       South Carolina  SC         South    4625364   207  4.475323    5
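
Note that filter keeps the rows in their original order, which is why Maryland (rank 4) is printed before Missouri (rank 3). A minimal sketch of two ways to get the rows in rank order, on a hypothetical six-row subset copied from the output above (these are the six highest rates, so the ranks computed on the subset match the full table): arrange() sorts the filtered rows by rank, and slice_max(), available in dplyr 1.0.0 and later, picks the rows with the largest rates directly.

```r
library(dplyr)

# Six states copied from the output above, in the dataset's alphabetical order
toy <- data.frame(
  state = c("Delaware", "District of Columbia", "Louisiana",
            "Maryland", "Missouri", "South Carolina"),
  rate = c(4.2319369, 16.4527532, 7.7425810, 5.0748655, 5.3598917, 4.4753235),
  stringsAsFactors = FALSE
)
toy <- mutate(toy, rank = rank(-rate))

# filter() keeps rows in their original order; arrange() then sorts by rank
top5 <- arrange(filter(toy, rank <= 5), rank)
top5

# Alternative: take the 5 rows with the largest rate directly
slice_max(toy, rate, n = 5)
```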

5. We can remove rows using the != operator. For example, to remove Florida, we would do this:

no_florida <- filter(murders, state != "Florida")

Create a new data frame called no_south that removes states from the South region. How many states remain? You can use the function nrow to count them.

# Keep every row whose region is not South
no_south <- filter(murders, region != "South")
no_south
##            state abb        region population total      rate rank
## 1         Alaska  AK          West     710231    19 2.6751860   27
## 2        Arizona  AZ          West    6392017   232 3.6295273   10
## 3     California  CA          West   37253956  1257 3.3741383   14
## 4       Colorado  CO          West    5029196    65 1.2924531   38
## 5    Connecticut  CT     Northeast    3574097    97 2.7139722   25
## 6         Hawaii  HI          West    1360301     7 0.5145920   49
## 7          Idaho  ID          West    1567582    12 0.7655102   46
## 8       Illinois  IL North Central   12830632   364 2.8369608   22
## 9        Indiana  IN North Central    6483802   142 2.1900730   31
## 10          Iowa  IA North Central    3046355    21 0.6893484   47
## 11        Kansas  KS North Central    2853118    63 2.2081106   30
## 12         Maine  ME     Northeast    1328361    11 0.8280881   44
## 13 Massachusetts  MA     Northeast    6547629   118 1.8021791   32
## 14      Michigan  MI North Central    9883640   413 4.1786225    7
## 15     Minnesota  MN North Central    5303925    53 0.9992600   40
## 16      Missouri  MO North Central    5988927   321 5.3598917    3
## 17       Montana  MT          West     989415    12 1.2128379   39
## 18      Nebraska  NE North Central    1826341    32 1.7521372   33
## 19        Nevada  NV          West    2700551    84 3.1104763   19
## 20 New Hampshire  NH     Northeast    1316470     5 0.3798036   50
## 21    New Jersey  NJ     Northeast    8791894   246 2.7980319   24
## 22    New Mexico  NM          West    2059179    67 3.2537239   15
## 23      New York  NY     Northeast   19378102   517 2.6679599   29
## 24  North Dakota  ND North Central     672591     4 0.5947151   48
## 25          Ohio  OH North Central   11536504   310 2.6871225   26
## 26        Oregon  OR          West    3831074    36 0.9396843   42
## 27  Pennsylvania  PA     Northeast   12702379   457 3.5977513   11
## 28  Rhode Island  RI     Northeast    1052567    16 1.5200933   35
## 29  South Dakota  SD North Central     814180     8 0.9825837   41
## 30          Utah  UT          West    2763885    22 0.7959810   45
## 31       Vermont  VT     Northeast     625741     2 0.3196211   51
## 32    Washington  WA          West    6724540    93 1.3829942   37
## 33     Wisconsin  WI North Central    5686986    97 1.7056487   34
## 34       Wyoming  WY          West     563626     5 0.8871131   43
nrow(no_south)
## [1] 34
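
One detail of != worth knowing: if a column contains missing values, the comparison returns NA for those rows, and filter drops them, whereas base R bracket indexing returns an all-NA row instead. The sketch below uses a hypothetical four-row data frame with one missing region (including the made-up state "Unknown"); it is only meant to illustrate the behavior and does not imply that murders has missing regions.

```r
library(dplyr)

# Hypothetical data with one missing region
toy <- data.frame(
  state = c("Alabama", "Alaska", "Unknown", "Maine"),
  region = c("South", "West", NA, "Northeast"),
  stringsAsFactors = FALSE
)

keep <- toy$region != "South"   # FALSE TRUE NA TRUE

# filter() treats NA as "do not keep": 2 rows
nrow(filter(toy, region != "South"))

# Base R brackets turn the NA into an all-NA row: 3 rows
nrow(toy[keep, ])

# Counting without building a new data frame; na.rm skips the NA
sum(keep, na.rm = TRUE)
```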

6. We can also use %in% to filter with dplyr. For example, you can see the data from New York and Texas like this:

filter(murders, state %in% c("New York", "Texas"))

Create a new data frame called murders_nw with only the states from the Northeast and the West. How many states are in this category?

murders_nw <- filter(murders, region %in% c("Northeast", "West"))
murders_nw
##            state abb    region population total      rate rank
## 1         Alaska  AK      West     710231    19 2.6751860   27
## 2        Arizona  AZ      West    6392017   232 3.6295273   10
## 3     California  CA      West   37253956  1257 3.3741383   14
## 4       Colorado  CO      West    5029196    65 1.2924531   38
## 5    Connecticut  CT Northeast    3574097    97 2.7139722   25
## 6         Hawaii  HI      West    1360301     7 0.5145920   49
## 7          Idaho  ID      West    1567582    12 0.7655102   46
## 8          Maine  ME Northeast    1328361    11 0.8280881   44
## 9  Massachusetts  MA Northeast    6547629   118 1.8021791   32
## 10       Montana  MT      West     989415    12 1.2128379   39
## 11        Nevada  NV      West    2700551    84 3.1104763   19
## 12 New Hampshire  NH Northeast    1316470     5 0.3798036   50
## 13    New Jersey  NJ Northeast    8791894   246 2.7980319   24
## 14    New Mexico  NM      West    2059179    67 3.2537239   15
## 15      New York  NY Northeast   19378102   517 2.6679599   29
## 16        Oregon  OR      West    3831074    36 0.9396843   42
## 17  Pennsylvania  PA Northeast   12702379   457 3.5977513   11
## 18  Rhode Island  RI Northeast    1052567    16 1.5200933   35
## 19          Utah  UT      West    2763885    22 0.7959810   45
## 20       Vermont  VT Northeast     625741     2 0.3196211   51
## 21    Washington  WA      West    6724540    93 1.3829942   37
## 22       Wyoming  WY      West     563626     5 0.8871131   43
nrow(murders_nw)
## [1] 22
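
The %in% operator returns TRUE for each element that appears anywhere in the set on its right, so it is a compact way of chaining several == comparisons with |. The base R sketch below uses the region labels of Alabama, Alaska, Connecticut, Illinois and Washington, copied from the output above. (In the dslabs data, region is stored as a factor; both %in% and == compare its labels, so the idea carries over.)

```r
# Regions of Alabama, Alaska, Connecticut, Illinois and Washington
region <- c("South", "West", "Northeast", "North Central", "West")

# %in% checks membership in a set of values
in_nw <- region %in% c("Northeast", "West")
in_nw   # FALSE TRUE TRUE FALSE TRUE

# This is the same as chaining == with |
identical(in_nw, region == "Northeast" | region == "West")

# Unlike ==, %in% returns FALSE (not NA) for a missing value
NA %in% c("Northeast", "West")

# The number of matching rows is the sum of the logical vector
sum(in_nw)
```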

7. Suppose you want to live in the Northeast or West and want the murder rate to be less than 1. We want to see the data for the states satisfying both conditions. Note that you can use logical operators with filter. Here is an example in which we filter to keep only small states in the Northeast region:

filter(murders, population < 5000000 & region == "Northeast")

Make sure murders has been defined with rate and rank and still has all states. Create a table called my_states that contains rows for states satisfying both conditions: the state is in the Northeast or West, and its murder rate is less than 1. Use select to show only the state name, the rate, and the rank.

my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
select(my_states, state, rate, rank)
##           state      rate rank
## 1        Hawaii 0.5145920   49
## 2         Idaho 0.7655102   46
## 3         Maine 0.8280881   44
## 4 New Hampshire 0.3798036   50
## 5        Oregon 0.9396843   42
## 6          Utah 0.7959810   45
## 7       Vermont 0.3196211   51
## 8       Wyoming 0.8871131   43
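
Two variations on the answer above, sketched on a hypothetical five-row subset copied from the murders output: conditions separated by commas inside filter are combined with AND, so they give the same rows as &, and the pipe %>% (re-exported by dplyr) chains filter and select without creating the intermediate my_states object.

```r
library(dplyr)

# Five states copied from the output above (hypothetical toy subset)
toy <- data.frame(
  state = c("California", "Hawaii", "Iowa", "Maine", "Vermont"),
  region = c("West", "West", "North Central", "Northeast", "Northeast"),
  rate = c(3.3741383, 0.5145920, 0.6893484, 0.8280881, 0.3196211),
  rank = c(14, 49, 47, 44, 51),
  stringsAsFactors = FALSE
)

# Comma-separated conditions in filter() are combined with AND,
# so this matches the & version used in the exercise
a <- filter(toy, region %in% c("Northeast", "West") & rate < 1)
b <- filter(toy, region %in% c("Northeast", "West"), rate < 1)
identical(a, b)

# The same steps written with the pipe, without an intermediate object
res <- toy %>%
  filter(region %in% c("Northeast", "West"), rate < 1) %>%
  select(state, rate, rank)
res
```

Iowa has a rate below 1 but is in the North Central region, and California is in the West but has a rate above 1, so only Hawaii, Maine and Vermont satisfy both conditions.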