Section 4.4
1. Use the function mutate to add a murders column named rate with the per 100,000 murder rate, as in the example code above. Make sure you redefine murders as done in the example code above (murders <- [your code]) so we can keep using this variable.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dslabs)
data(murders)
murders <- mutate(murders, rate = total / (population / 100000))
murders
## state abb region population total rate
## 1 Alabama AL South 4779736 135 2.8244238
## 2 Alaska AK West 710231 19 2.6751860
## 3 Arizona AZ West 6392017 232 3.6295273
## 4 Arkansas AR South 2915918 93 3.1893901
## 5 California CA West 37253956 1257 3.3741383
## 6 Colorado CO West 5029196 65 1.2924531
## 7 Connecticut CT Northeast 3574097 97 2.7139722
## 8 Delaware DE South 897934 38 4.2319369
## 9 District of Columbia DC South 601723 99 16.4527532
## 10 Florida FL South 19687653 669 3.3980688
## 11 Georgia GA South 9920000 376 3.7903226
## 12 Hawaii HI West 1360301 7 0.5145920
## 13 Idaho ID West 1567582 12 0.7655102
## 14 Illinois IL North Central 12830632 364 2.8369608
## 15 Indiana IN North Central 6483802 142 2.1900730
## 16 Iowa IA North Central 3046355 21 0.6893484
## 17 Kansas KS North Central 2853118 63 2.2081106
## 18 Kentucky KY South 4339367 116 2.6732010
## 19 Louisiana LA South 4533372 351 7.7425810
## 20 Maine ME Northeast 1328361 11 0.8280881
## 21 Maryland MD South 5773552 293 5.0748655
## 22 Massachusetts MA Northeast 6547629 118 1.8021791
## 23 Michigan MI North Central 9883640 413 4.1786225
## 24 Minnesota MN North Central 5303925 53 0.9992600
## 25 Mississippi MS South 2967297 120 4.0440846
## 26 Missouri MO North Central 5988927 321 5.3598917
## 27 Montana MT West 989415 12 1.2128379
## 28 Nebraska NE North Central 1826341 32 1.7521372
## 29 Nevada NV West 2700551 84 3.1104763
## 30 New Hampshire NH Northeast 1316470 5 0.3798036
## 31 New Jersey NJ Northeast 8791894 246 2.7980319
## 32 New Mexico NM West 2059179 67 3.2537239
## 33 New York NY Northeast 19378102 517 2.6679599
## 34 North Carolina NC South 9535483 286 2.9993237
## 35 North Dakota ND North Central 672591 4 0.5947151
## 36 Ohio OH North Central 11536504 310 2.6871225
## 37 Oklahoma OK South 3751351 111 2.9589340
## 38 Oregon OR West 3831074 36 0.9396843
## 39 Pennsylvania PA Northeast 12702379 457 3.5977513
## 40 Rhode Island RI Northeast 1052567 16 1.5200933
## 41 South Carolina SC South 4625364 207 4.4753235
## 42 South Dakota SD North Central 814180 8 0.9825837
## 43 Tennessee TN South 6346105 219 3.4509357
## 44 Texas TX South 25145561 805 3.2013603
## 45 Utah UT West 2763885 22 0.7959810
## 46 Vermont VT Northeast 625741 2 0.3196211
## 47 Virginia VA South 8001024 250 3.1246001
## 48 Washington WA West 6724540 93 1.3829942
## 49 West Virginia WV South 1852994 27 1.4571013
## 50 Wisconsin WI North Central 5686986 97 1.7056487
## 51 Wyoming WY West 563626 5 0.8871131
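As a quick sanity check on the formula, the rate for a single row can be recomputed by hand. The sketch below uses base R only and copies Alabama's total and population from the output above; the variable names are illustrative and not part of the exercise.

```r
# Values copied from the Alabama row of the output above
total <- 135
population <- 4779736

# Murders per 100,000 people: the same expression used inside mutate()
rate <- total / (population / 100000)
round(rate, 4)  # 2.8244, matching the rate column for Alabama
```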
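2. Use mutate to add a column named rank containing the rank of each state, from highest to lowest murder rate. Redefine murders so we can keep using this variable.
murders <- mutate(murders, rank = rank(-rate))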
murders
## state abb region population total rate rank
## 1 Alabama AL South 4779736 135 2.8244238 23
## 2 Alaska AK West 710231 19 2.6751860 27
## 3 Arizona AZ West 6392017 232 3.6295273 10
## 4 Arkansas AR South 2915918 93 3.1893901 17
## 5 California CA West 37253956 1257 3.3741383 14
## 6 Colorado CO West 5029196 65 1.2924531 38
## 7 Connecticut CT Northeast 3574097 97 2.7139722 25
## 8 Delaware DE South 897934 38 4.2319369 6
## 9 District of Columbia DC South 601723 99 16.4527532 1
## 10 Florida FL South 19687653 669 3.3980688 13
## 11 Georgia GA South 9920000 376 3.7903226 9
## 12 Hawaii HI West 1360301 7 0.5145920 49
## 13 Idaho ID West 1567582 12 0.7655102 46
## 14 Illinois IL North Central 12830632 364 2.8369608 22
## 15 Indiana IN North Central 6483802 142 2.1900730 31
## 16 Iowa IA North Central 3046355 21 0.6893484 47
## 17 Kansas KS North Central 2853118 63 2.2081106 30
## 18 Kentucky KY South 4339367 116 2.6732010 28
## 19 Louisiana LA South 4533372 351 7.7425810 2
## 20 Maine ME Northeast 1328361 11 0.8280881 44
## 21 Maryland MD South 5773552 293 5.0748655 4
## 22 Massachusetts MA Northeast 6547629 118 1.8021791 32
## 23 Michigan MI North Central 9883640 413 4.1786225 7
## 24 Minnesota MN North Central 5303925 53 0.9992600 40
## 25 Mississippi MS South 2967297 120 4.0440846 8
## 26 Missouri MO North Central 5988927 321 5.3598917 3
## 27 Montana MT West 989415 12 1.2128379 39
## 28 Nebraska NE North Central 1826341 32 1.7521372 33
## 29 Nevada NV West 2700551 84 3.1104763 19
## 30 New Hampshire NH Northeast 1316470 5 0.3798036 50
## 31 New Jersey NJ Northeast 8791894 246 2.7980319 24
## 32 New Mexico NM West 2059179 67 3.2537239 15
## 33 New York NY Northeast 19378102 517 2.6679599 29
## 34 North Carolina NC South 9535483 286 2.9993237 20
## 35 North Dakota ND North Central 672591 4 0.5947151 48
## 36 Ohio OH North Central 11536504 310 2.6871225 26
## 37 Oklahoma OK South 3751351 111 2.9589340 21
## 38 Oregon OR West 3831074 36 0.9396843 42
## 39 Pennsylvania PA Northeast 12702379 457 3.5977513 11
## 40 Rhode Island RI Northeast 1052567 16 1.5200933 35
## 41 South Carolina SC South 4625364 207 4.4753235 5
## 42 South Dakota SD North Central 814180 8 0.9825837 41
## 43 Tennessee TN South 6346105 219 3.4509357 12
## 44 Texas TX South 25145561 805 3.2013603 16
## 45 Utah UT West 2763885 22 0.7959810 45
## 46 Vermont VT Northeast 625741 2 0.3196211 51
## 47 Virginia VA South 8001024 250 3.1246001 18
## 48 Washington WA West 6724540 93 1.3829942 37
## 49 West Virginia WV South 1852994 27 1.4571013 36
## 50 Wisconsin WI North Central 5686986 97 1.7056487 34
## 51 Wyoming WY West 563626 5 0.8871131 43
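The minus sign inside rank(-rate) is what makes rank 1 the highest rate: rank orders values from lowest to highest, so negating the input reverses the order. A minimal sketch on three rates copied from the table above (Alabama, District of Columbia, Hawaii):

```r
# Three rates copied from the table above: Alabama, District of Columbia, Hawaii
rate <- c(2.8244238, 16.4527532, 0.5145920)

rank(rate)   # 1 = lowest rate:  2 3 1
rank(-rate)  # 1 = highest rate: 2 1 3
```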
3. Use select to show the state names and abbreviations in murders. Do not redefine murders, just show the results.
select(murders, state, abb)
## state abb
## 1 Alabama AL
## 2 Alaska AK
## 3 Arizona AZ
## 4 Arkansas AR
## 5 California CA
## 6 Colorado CO
## 7 Connecticut CT
## 8 Delaware DE
## 9 District of Columbia DC
## 10 Florida FL
## 11 Georgia GA
## 12 Hawaii HI
## 13 Idaho ID
## 14 Illinois IL
## 15 Indiana IN
## 16 Iowa IA
## 17 Kansas KS
## 18 Kentucky KY
## 19 Louisiana LA
## 20 Maine ME
## 21 Maryland MD
## 22 Massachusetts MA
## 23 Michigan MI
## 24 Minnesota MN
## 25 Mississippi MS
## 26 Missouri MO
## 27 Montana MT
## 28 Nebraska NE
## 29 Nevada NV
## 30 New Hampshire NH
## 31 New Jersey NJ
## 32 New Mexico NM
## 33 New York NY
## 34 North Carolina NC
## 35 North Dakota ND
## 36 Ohio OH
## 37 Oklahoma OK
## 38 Oregon OR
## 39 Pennsylvania PA
## 40 Rhode Island RI
## 41 South Carolina SC
## 42 South Dakota SD
## 43 Tennessee TN
## 44 Texas TX
## 45 Utah UT
## 46 Vermont VT
## 47 Virginia VA
## 48 Washington WA
## 49 West Virginia WV
## 50 Wisconsin WI
## 51 Wyoming WY
4. The dplyr function filter keeps only the rows that satisfy a condition. For example, you can show just the New York row like this:
filter(murders, state == "New York")
You can use other logical vectors to filter rows. Use filter to show the top 5 states with the highest murder rates. After we add murder rate and rank, do not change the murders dataset, just show the result. Remember that you can filter based on the rank column.
filter(murders, rank < 6)
## state abb region population total rate rank
## 1 District of Columbia DC South 601723 99 16.452753 1
## 2 Louisiana LA South 4533372 351 7.742581 2
## 3 Maryland MD South 5773552 293 5.074866 4
## 4 Missouri MO North Central 5988927 321 5.359892 3
## 5 South Carolina SC South 4625364 207 4.475323 5
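Note that filter keeps rows in their original order, which is why Maryland (rank 4) is listed before Missouri (rank 3). If a sorted result is wanted, one option is to pass the filtered rows to dplyr's arrange. The sketch below assumes dplyr is installed and uses a hypothetical six-row stand-in for murders, with values copied from the output above.

```r
library(dplyr)

# Hypothetical stand-in for murders: six rows copied from the output above
small <- data.frame(
  state = c("Delaware", "District of Columbia", "Louisiana",
            "Maryland", "Missouri", "South Carolina"),
  rank  = c(6, 1, 2, 4, 3, 5),
  stringsAsFactors = FALSE
)

# filter() keeps the original row order; arrange() then sorts by rank
top5 <- arrange(filter(small, rank < 6), rank)
top5$state
```

With these rows, Delaware (rank 6) is dropped and the remaining five states come back ordered from rank 1 to rank 5.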
5. We can remove rows using the != operator. For example, to remove Florida, we would do this:
no_florida <- filter(murders, state != "Florida")
Create a new data frame called no_south that removes states from the South region. How many states are in this category? You can use the function nrow for this.
no_south <- filter(murders, region != "South")
no_south
## state abb region population total rate rank
## 1 Alaska AK West 710231 19 2.6751860 27
## 2 Arizona AZ West 6392017 232 3.6295273 10
## 3 California CA West 37253956 1257 3.3741383 14
## 4 Colorado CO West 5029196 65 1.2924531 38
## 5 Connecticut CT Northeast 3574097 97 2.7139722 25
## 6 Hawaii HI West 1360301 7 0.5145920 49
## 7 Idaho ID West 1567582 12 0.7655102 46
## 8 Illinois IL North Central 12830632 364 2.8369608 22
## 9 Indiana IN North Central 6483802 142 2.1900730 31
## 10 Iowa IA North Central 3046355 21 0.6893484 47
## 11 Kansas KS North Central 2853118 63 2.2081106 30
## 12 Maine ME Northeast 1328361 11 0.8280881 44
## 13 Massachusetts MA Northeast 6547629 118 1.8021791 32
## 14 Michigan MI North Central 9883640 413 4.1786225 7
## 15 Minnesota MN North Central 5303925 53 0.9992600 40
## 16 Missouri MO North Central 5988927 321 5.3598917 3
## 17 Montana MT West 989415 12 1.2128379 39
## 18 Nebraska NE North Central 1826341 32 1.7521372 33
## 19 Nevada NV West 2700551 84 3.1104763 19
## 20 New Hampshire NH Northeast 1316470 5 0.3798036 50
## 21 New Jersey NJ Northeast 8791894 246 2.7980319 24
## 22 New Mexico NM West 2059179 67 3.2537239 15
## 23 New York NY Northeast 19378102 517 2.6679599 29
## 24 North Dakota ND North Central 672591 4 0.5947151 48
## 25 Ohio OH North Central 11536504 310 2.6871225 26
## 26 Oregon OR West 3831074 36 0.9396843 42
## 27 Pennsylvania PA Northeast 12702379 457 3.5977513 11
## 28 Rhode Island RI Northeast 1052567 16 1.5200933 35
## 29 South Dakota SD North Central 814180 8 0.9825837 41
## 30 Utah UT West 2763885 22 0.7959810 45
## 31 Vermont VT Northeast 625741 2 0.3196211 51
## 32 Washington WA West 6724540 93 1.3829942 37
## 33 Wisconsin WI North Central 5686986 97 1.7056487 34
## 34 Wyoming WY West 563626 5 0.8871131 43
nrow(no_south)
## [1] 34
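The count can also be read directly from the logical vector that filter evaluates, since each TRUE counts as 1 when summed. A small base R sketch on a made-up four-element region vector (the vector is illustrative, not the real data):

```r
# Illustrative region labels; not the full murders data
region <- c("South", "West", "South", "Northeast")

# The same condition that filter() evaluates row by row
keep <- region != "South"
keep       # FALSE TRUE FALSE TRUE
sum(keep)  # 2 rows would be kept
```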
6. We can also use %in% to filter with dplyr. You can therefore see the data from New York and Texas like this:
filter(murders, state %in% c("New York", "Texas"))
Create a new data frame called murders_nw with only the states from the Northeast and the West. How many states are in this category?
murders_nw <- filter(murders, region %in% c("Northeast", "West"))
murders_nw
## state abb region population total rate rank
## 1 Alaska AK West 710231 19 2.6751860 27
## 2 Arizona AZ West 6392017 232 3.6295273 10
## 3 California CA West 37253956 1257 3.3741383 14
## 4 Colorado CO West 5029196 65 1.2924531 38
## 5 Connecticut CT Northeast 3574097 97 2.7139722 25
## 6 Hawaii HI West 1360301 7 0.5145920 49
## 7 Idaho ID West 1567582 12 0.7655102 46
## 8 Maine ME Northeast 1328361 11 0.8280881 44
## 9 Massachusetts MA Northeast 6547629 118 1.8021791 32
## 10 Montana MT West 989415 12 1.2128379 39
## 11 Nevada NV West 2700551 84 3.1104763 19
## 12 New Hampshire NH Northeast 1316470 5 0.3798036 50
## 13 New Jersey NJ Northeast 8791894 246 2.7980319 24
## 14 New Mexico NM West 2059179 67 3.2537239 15
## 15 New York NY Northeast 19378102 517 2.6679599 29
## 16 Oregon OR West 3831074 36 0.9396843 42
## 17 Pennsylvania PA Northeast 12702379 457 3.5977513 11
## 18 Rhode Island RI Northeast 1052567 16 1.5200933 35
## 19 Utah UT West 2763885 22 0.7959810 45
## 20 Vermont VT Northeast 625741 2 0.3196211 51
## 21 Washington WA West 6724540 93 1.3829942 37
## 22 Wyoming WY West 563626 5 0.8871131 43
nrow(murders_nw)
## [1] 22
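For intuition, %in% tests each element for membership in a set, which gives the same logical vector as chaining == comparisons with |. A minimal base R sketch on an illustrative region vector:

```r
# Illustrative region labels using the names that appear in the murders table
region <- c("South", "West", "Northeast", "North Central", "West")

# %in% checks membership in a set; it matches chaining == with |
by_in <- region %in% c("Northeast", "West")
by_or <- region == "Northeast" | region == "West"

by_in                    # FALSE TRUE TRUE FALSE TRUE
identical(by_in, by_or)  # TRUE
```

The %in% form tends to stay readable as the set of accepted values grows.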
7. Suppose you want to live in the Northeast or West and want the murder rate to be less than 1. We want to see the data for the states satisfying these options. Note that you can use logical operators with filter. Here is an example in which we filter to keep only small states in the Northeast region.
filter(murders, population < 5000000 & region == "Northeast")
Make sure murders has been defined with rate and rank and still has all states. Create a table called my_states that contains rows for states satisfying both the conditions: it is in the Northeast or West and the murder rate is less than 1. Use select to show only the state name, the rate, and the rank.
my_states <- filter(murders, region %in% c("Northeast", "West") & rate < 1)
select(my_states, state, rate, rank)
## state rate rank
## 1 Hawaii 0.5145920 49
## 2 Idaho 0.7655102 46
## 3 Maine 0.8280881 44
## 4 New Hampshire 0.3798036 50
## 5 Oregon 0.9396843 42
## 6 Utah 0.7959810 45
## 7 Vermont 0.3196211 51
## 8 Wyoming 0.8871131 43
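The same filter-then-select sequence can also be written as one chain with the pipe operator %>%, which dplyr makes available. This is an alternative style, not what the exercise asks for. The sketch assumes dplyr is installed and uses a hypothetical four-row stand-in for murders, with values copied from the tables above.

```r
library(dplyr)

# Hypothetical stand-in for murders: four rows copied from the output above
small <- data.frame(
  state  = c("Alabama", "Hawaii", "Vermont", "California"),
  region = c("South", "West", "Northeast", "West"),
  rate   = c(2.8244238, 0.5145920, 0.3196211, 3.3741383),
  rank   = c(23, 49, 51, 14),
  stringsAsFactors = FALSE
)

# Same two steps as above, chained instead of saving an intermediate table
result <- small %>%
  filter(region %in% c("Northeast", "West") & rate < 1) %>%
  select(state, rate, rank)
result
```

Only Hawaii and Vermont satisfy both conditions in this small example: Alabama fails the region test and California fails the rate test.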