Section 4.15
1. Load the murders
dataset. Which of the following is
true?
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dslabs)
murders
## state abb region population total
## 1 Alabama AL South 4779736 135
## 2 Alaska AK West 710231 19
## 3 Arizona AZ West 6392017 232
## 4 Arkansas AR South 2915918 93
## 5 California CA West 37253956 1257
## 6 Colorado CO West 5029196 65
## 7 Connecticut CT Northeast 3574097 97
## 8 Delaware DE South 897934 38
## 9 District of Columbia DC South 601723 99
## 10 Florida FL South 19687653 669
## 11 Georgia GA South 9920000 376
## 12 Hawaii HI West 1360301 7
## 13 Idaho ID West 1567582 12
## 14 Illinois IL North Central 12830632 364
## 15 Indiana IN North Central 6483802 142
## 16 Iowa IA North Central 3046355 21
## 17 Kansas KS North Central 2853118 63
## 18 Kentucky KY South 4339367 116
## 19 Louisiana LA South 4533372 351
## 20 Maine ME Northeast 1328361 11
## 21 Maryland MD South 5773552 293
## 22 Massachusetts MA Northeast 6547629 118
## 23 Michigan MI North Central 9883640 413
## 24 Minnesota MN North Central 5303925 53
## 25 Mississippi MS South 2967297 120
## 26 Missouri MO North Central 5988927 321
## 27 Montana MT West 989415 12
## 28 Nebraska NE North Central 1826341 32
## 29 Nevada NV West 2700551 84
## 30 New Hampshire NH Northeast 1316470 5
## 31 New Jersey NJ Northeast 8791894 246
## 32 New Mexico NM West 2059179 67
## 33 New York NY Northeast 19378102 517
## 34 North Carolina NC South 9535483 286
## 35 North Dakota ND North Central 672591 4
## 36 Ohio OH North Central 11536504 310
## 37 Oklahoma OK South 3751351 111
## 38 Oregon OR West 3831074 36
## 39 Pennsylvania PA Northeast 12702379 457
## 40 Rhode Island RI Northeast 1052567 16
## 41 South Carolina SC South 4625364 207
## 42 South Dakota SD North Central 814180 8
## 43 Tennessee TN South 6346105 219
## 44 Texas TX South 25145561 805
## 45 Utah UT West 2763885 22
## 46 Vermont VT Northeast 625741 2
## 47 Virginia VA South 8001024 250
## 48 Washington WA West 6724540 93
## 49 West Virginia WV South 1852994 27
## 50 Wisconsin WI North Central 5686986 97
## 51 Wyoming WY West 563626 5
class(murders)
## [1] "data.frame"
*murders
is in tidy format and is stored in a data
frame.
2. Use as_tibble
to convert
the murders
data table into a tibble and save it in an
object called murders_tibble
.
murders_tibble<-as_tibble(murders)
3. Use the group_by
function to
convert murders
into a tibble that is grouped by
region.
group_by(murders_tibble, region)
## # A tibble: 51 × 5
## # Groups: region [4]
## state abb region population total
## <chr> <chr> <fct> <dbl> <dbl>
## 1 Alabama AL South 4779736 135
## 2 Alaska AK West 710231 19
## 3 Arizona AZ West 6392017 232
## 4 Arkansas AR South 2915918 93
## 5 California CA West 37253956 1257
## 6 Colorado CO West 5029196 65
## 7 Connecticut CT Northeast 3574097 97
## 8 Delaware DE South 897934 38
## 9 District of Columbia DC South 601723 99
## 10 Florida FL South 19687653 669
## # ℹ 41 more rows
4. Write tidyverse code that is equivalent to this code:
exp(mean(log(murders$population)))
Write it using the pipe so that each function is called without
arguments. Use the dot operator to access the population. Hint: The code
should start with murders |>
.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
murders |> _$population |> log() |> mean() |> exp()
## [1] 3675209
5. Use the map_df
to create a data frame with three
columns named n
, s_n
, and s_n_2
.
The first column should contain the numbers 1 through 100. The second
and third columns should each contain the sum of 1 through \(n\) with \(n\) the row number.
library(purrr)
library(dplyr)
compute_s_n <- function(n){
x <- 1:n
tibble(sum = sum(x))
}
n<-c(seq(1, 100, by=1))
sno<-map_df(n, compute_s_n)
snox<-mutate(sno, n)
compute_s_n_2<-map_df(n, compute_s_n)
final<-full_join(snox, compute_s_n_2)
## Joining with `by = join_by(sum)`