Part 2 - Introduction to the Tidyverse

October 15, 2020

Before we get started

The student shared experience
Slowing down the class
Any issues running R at this point?
R Studio vs. R; .Rmd files vs. .R files

What we learned in the previous class

Getting up and running in RStudio
Core R concepts
- Data types
- Relationship between data and functions
How to import data
Regularly used data summary functions
Assignment and pipe operators
Troubleshooting basics

What we’ll learn today

Selecting variables
Creating variables
Updating values
Subsetting tibbles/data frames
Aggregation

Let’s begin by reading the Covid-19 and Seattle crime datasets

library(tidyverse)

covid <- read_csv('https://rb.gy/lzlylj')
crime <- read_csv('https://rb.gy/5zuayh')

Select, create, and rename variables

select() keeps the named variables from the dataset
You can also remove variables from the dataset with -
Let’s keep it simple for now and not nest functions within select()
- Exception: everything()
Use '``' for multi-word variables and variable names that begin with numbers

covid %>% 
  select(location, deaths, confirmed_infections, est_infections, date) %>% 
  head(3)
## # A tibble: 3 x 5
##   location deaths confirmed_infections est_infections date      
##   <chr>     <dbl>                <dbl>          <dbl> <date>    
## 1 Global      103                 4966             NA 2020-02-04
## 2 Global      111                 4749             NA 2020-02-05
## 3 Global      110                 3919             NA 2020-02-06

Select, create, and rename variables

select() keeps the named variables from the dataset
You can also remove variables from the dataset with -
Let’s keep it simple for now and not nest functions within select()
- Exception: everything()
Use '``' for multi-word variables and variable names that begin with numbers

covid %>% 
  select(location, deaths, confirmed_infections, est_infections, date) %>% 
  select(-est_infections) %>%
  head(3)
## # A tibble: 3 x 4
##   location deaths confirmed_infections date      
##   <chr>     <dbl>                <dbl> <date>    
## 1 Global      103                 4966 2020-02-04
## 2 Global      111                 4749 2020-02-05
## 3 Global      110                 3919 2020-02-06

Select, create, and rename variables

select() keeps the named variables from the dataset
You can also remove variables from the dataset with -
Let’s keep it simple for now and not nest functions within select()
- Exception: everything()
Use '``' for multi-word variables and variable names that begin with numbers

covid %>% 
  select(location, deaths, confirmed_infections, est_infections, date) %>% 
  select(location, date, everything())  %>% 
  head(3)
## # A tibble: 3 x 5
##   location date       deaths confirmed_infections est_infections
##   <chr>    <date>      <dbl>                <dbl>          <dbl>
## 1 Global   2020-02-04    103                 4966             NA
## 2 Global   2020-02-05    111                 4749             NA
## 3 Global   2020-02-06    110                 3919             NA

Select, create, and rename variables

mutate() creates variables
Create multiple variables in one go
Use functions you already learned within mutate()
- min(), max(), mean()
- as.character(), as.factor(), as.numeric()
Use ifelse() within mutate()

covid %>% 
  mutate(
    cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    )

Select, create, and rename variables

mutate() creates variables
Create multiple variables in one go
Use functions you already learned within mutate()
- min(), max(), mean()
- as.character(), as.factor(), as.numeric()
Use ifelse() within mutate()

x <- c(1, 2, 3)
y <- c(4, 5, 1)

ifelse(x > y, 'yes', 'no')

Select, create, and rename variables

mutate() creates variables
Create multiple variables in one go
Use functions you already learned within mutate()
- min(), max(), mean()
- as.character(), as.factor(), as.numeric()
Use ifelse() within mutate()

x <- c(1, 2, 3)
y <- c(4, 5, 1)

ifelse(x > y, 'yes', 'no')
## [1] "no"  "no"  "yes"

Select, create, and rename variables

mutate() creates variables
Create multiple variables in one go
Use functions you already learned within mutate()
- min(), max(), mean()
- as.character(), as.factor(), as.numeric()
Use ifelse() within mutate()

covid %>% 
  mutate(
    cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    , global_infections = ifelse(location %in% 'Global', confirmed_infections, NA) 
    , global_mobility = ifelse(location %in% 'Global', mobility_composite, NA) 
    )

Select, create, and rename variables

covid %>% 
  mutate(
    cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    , global_infections = ifelse(location %in% 'Global', confirmed_infections, NA) 
    , global_mobility = ifelse(location %in% 'Global', mobility_composite, NA) 
    ) %>%
  select(location, date, cases_deaths_ratio, tests_over_mean, global_infections, global_mobility) %>%
  head(5) 
## # A tibble: 5 x 6
##   location date       cases_deaths_ra… tests_over_mean global_infectio…
##   <chr>    <date>                <dbl>           <dbl>            <dbl>
## 1 Global   2020-02-04             0.02            1.54            4966 
## 2 Global   2020-02-05             0.02            1.68            4749 
## 3 Global   2020-02-06             0.03            1.82            3919 
## 4 Global   2020-02-07             0.03            1.97            4495.
## 5 Global   2020-02-08             0.04            2.11            3403.
## # … with 1 more variable: global_mobility <dbl>

Select, create, and rename variables

Exercise - 5 minutes
- With the crime dataset, create a variable called larceny
- In larceny, flag with a 1 all records that have a ‘LARCENY-THEFT’ value in offense_parent_group. All other records should have a 0
- Set neighborhood to NA when the new variable larceny is equal to 0
- Only report the following varilables in the output: neighborhood, offense_parent_group, larceny

crime %>% 
  mutate(
    larceny = ifelse(offense_parent_group %in% 'LARCENY-THEFT', 1, 0)
    , neighborhood = ifelse(larceny %in% 1, neighborhood, NA)
    ) %>%
  select(offense_parent_group, larceny, neighborhood) %>%
  tail(8)

Select, create, and rename variables

crime %>% 
  mutate(
    larceny = ifelse(offense_parent_group %in% 'LARCENY-THEFT', 1, 0)
    , neighborhood = ifelse(larceny %in% 1, neighborhood, NA)
    ) %>%
  select(offense_parent_group, larceny, neighborhood) %>%
  tail(8)
## # A tibble: 8 x 3
##   offense_parent_group larceny neighborhood     
##   <chr>                  <dbl> <chr>            
## 1 LARCENY-THEFT              1 CAPITOL HILL     
## 2 ASSAULT OFFENSES           0 <NA>             
## 3 ASSAULT OFFENSES           0 <NA>             
## 4 LARCENY-THEFT              1 ROOSEVELT/RAVENNA
## 5 ASSAULT OFFENSES           0 <NA>             
## 6 MOTOR VEHICLE THEFT        0 <NA>             
## 7 LARCENY-THEFT              1 MOUNT BAKER      
## 8 ASSAULT OFFENSES           0 <NA>

Select, create, and rename variables

transmute() = mutate() + select()

covid %>% 
  mutate(
    cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    , global_infections = ifelse(location %in% 'Global', confirmed_infections, NA) 
    , global_mobility = ifelse(location %in% 'Global', mobility_composite, NA) 
    ) %>%
  select(cases_deaths_ratio, tests_over_mean, global_infections, global_mobility)
## # A tibble: 103,366 x 4
##    cases_deaths_ratio tests_over_mean global_infections global_mobility
##                 <dbl>           <dbl>             <dbl>           <dbl>
##  1               0.02            1.54             4966             NA  
##  2               0.02            1.68             4749             NA  
##  3               0.03            1.82             3919             NA  
##  4               0.03            1.97             4495.            NA  
##  5               0.04            2.11             3403.           -23.7
##  6               0.04            2.25             3954            -23.9
##  7               0.05            2.39             3294.           -23.9
##  8               0.06            2.54             2591.           -23.4
##  9               0.01            2.68              134            -23.5
## 10               0.09            2.83             1728            -23.2
## # … with 103,356 more rows

Select, create, and rename variables

transmute() = mutate() + select()

covid %>% 
  transmute(
    cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    , global_infections = ifelse(location %in% 'Global', confirmed_infections, NA) 
    , global_mobility = ifelse(location %in% 'Global', mobility_composite, NA) 
    )
## # A tibble: 103,366 x 4
##    cases_deaths_ratio tests_over_mean global_infections global_mobility
##                 <dbl>           <dbl>             <dbl>           <dbl>
##  1               0.02            1.54             4966             NA  
##  2               0.02            1.68             4749             NA  
##  3               0.03            1.82             3919             NA  
##  4               0.03            1.97             4495.            NA  
##  5               0.04            2.11             3403.           -23.7
##  6               0.04            2.25             3954            -23.9
##  7               0.05            2.39             3294.           -23.9
##  8               0.06            2.54             2591.           -23.4
##  9               0.01            2.68              134            -23.5
## 10               0.09            2.83             1728            -23.2
## # … with 103,356 more rows

Select, create, and rename variables

Keep unmanipulated variables too if you would like

covid %>% 
  transmute(
    location
    , date
    , cases_deaths_ratio = (deaths / confirmed_infections) %>% round(2)
    , tests_over_mean = (total_tests / mean(total_tests, na.rm = TRUE)) %>% round(2)
    , global_infections = ifelse(location %in% 'Global', confirmed_infections, NA) 
    , global_mobility = ifelse(location %in% 'Global', mobility_composite, NA) 
    )
## # A tibble: 103,366 x 6
##    location date       cases_deaths_ra… tests_over_mean global_infectio…
##    <chr>    <date>                <dbl>           <dbl>            <dbl>
##  1 Global   2020-02-04             0.02            1.54            4966 
##  2 Global   2020-02-05             0.02            1.68            4749 
##  3 Global   2020-02-06             0.03            1.82            3919 
##  4 Global   2020-02-07             0.03            1.97            4495.
##  5 Global   2020-02-08             0.04            2.11            3403.
##  6 Global   2020-02-09             0.04            2.25            3954 
##  7 Global   2020-02-10             0.05            2.39            3294.
##  8 Global   2020-02-11             0.06            2.54            2591.
##  9 Global   2020-02-12             0.01            2.68             134 
## 10 Global   2020-02-13             0.09            2.83            1728 
## # … with 103,356 more rows, and 1 more variable: global_mobility <dbl>

Select, create, and rename variables

rename() assigns a new name to an existing variable
Renaming is possible with transmute() and mutate()as well
- rename(): renames selected variables, keeps all other variables
- transmute(): renames selected variables, drops all other variables
- mutate(): creates new variables (with new names) and keeps originals

Exercise - 3 minutes
- Rename confirmed_infections as cases, location as geography, and date as ymd with one of the methods

Select, create, and rename variables

rename() assigns a new name to an existing variable
Renaming is possible with transmute() and mutate()as well
- rename(): renames selected variables, keeps all other variables
- transmute(): renames selected variables, drops all other variables
- mutate(): creates new variables (with new names) and keeps originals

covid %>% 
  rename(
    geography = location
    , ymd = date
    , cases = confirmed_infections
    ) %>%
  head(1)
## # A tibble: 1 x 14
##   geography ymd        deaths mobility_data_t… mobility_compos… total_tests_dat…
##   <chr>     <date>      <dbl> <chr>                       <dbl> <chr>           
## 1 Global    2020-02-04    103 <NA>                           NA observed        
## # … with 8 more variables: total_tests <dbl>, cases <dbl>,
## #   est_infections <dbl>, population <dbl>, deaths_p100k <dbl>,
## #   confirmed_infections_p100k <dbl>, est_infections_p100k <dbl>, month <dbl>

Select, create, and rename variables

rename() assigns a new name to an existing variable
Renaming is possible with transmute() and mutate()as well
- rename(): renames selected variables, keeps all other variables
- transmute(): renames selected variables, drops all other variables
- mutate(): creates new variables (with new names) and keeps originals

covid %>% 
  transmute(
    geography = location
    , ymd = date
    , cases = confirmed_infections
    ) %>%
  head(1)
## # A tibble: 1 x 3
##   geography ymd        cases
##   <chr>     <date>     <dbl>
## 1 Global    2020-02-04  4966

Select, create, and rename variables

rename() assigns a new name to an existing variable
Renaming is possible with transmute() and mutate()as well
- rename(): renames selected variables, keeps all other variables
- transmute(): renames selected variables, drops all other variables
- mutate(): creates new variables (with new names) and keeps originals

covid %>% 
  mutate(
    geography = location
    , ymd = date
    , cases = confirmed_infections
    ) %>%
  select(location, geography, ymd, date, cases, confirmed_infections) %>%
  head(1)
## # A tibble: 1 x 6
##   location geography ymd        date       cases confirmed_infections
##   <chr>    <chr>     <date>     <date>     <dbl>                <dbl>
## 1 Global   Global    2020-02-04 2020-02-04  4966                 4966

Exercise - 5 minutes

With crime, create a data frame that meets the following conditions
- The data frame includes three variables: report_number, crime_against_category, offense_location
- Make no changes to report_number
- In crime_against_category, change NA values to UNKNOWN
- Rename neighborhood as offense_location

crime %>% 
  transmute(
    report_number
    , crime_against_category = ifelse(crime_against_category %in% NA
                                , 'UNKNOWN', crime_against_category)
    , offense_location = neighborhood
    )

Exercise - 5 minutes

crime %>% 
  transmute(
    report_number
    , crime_against_category = ifelse(crime_against_category %in% NA
                                , 'UNKNOWN', crime_against_category)
    , offense_location = neighborhood
    ) %>%
  head()
## # A tibble: 6 x 3
##   report_number crime_against_category offense_location        
##   <chr>         <chr>                  <chr>                   
## 1 2015-323587   PROPERTY               ROOSEVELT/RAVENNA       
## 2 2015-267992   PROPERTY               FIRST HILL              
## 3 2013-900207   PROPERTY               <NA>                    
## 4 2008-334626   PROPERTY               ROOSEVELT/RAVENNA       
## 5 2014-363634   PROPERTY               CENTRAL AREA/SQUIRE PARK
## 6 2010-019552   UNKNOWN                GREENWOOD

Filter data frames

Use filter() to subset your data
With filter(), you tell R to exclude/include data that meet certain conditions
Operators for conditions
- !, %in%, >, >=, <, <=, ==

covid %>% 
  filter(est_infections_p100k > 60) %>% 
  select(location, date, est_infections_p100k) %>% 
  head(2)
## # A tibble: 2 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Global   2020-12-15                 61.2
## 2 Global   2020-12-16                 62.4

Filter data frames

Use filter() to subset your data
With filter(), you tell R to exclude/include data that meet certain conditions
Operators for conditions
- !, %in%, >, >=, <, <=, ==

covid %>% 
  filter(est_infections_p100k <= 60) %>% 
  select(location, date, est_infections_p100k) %>% 
  head(2)
## # A tibble: 2 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Global   2020-08-13                 21.7
## 2 Global   2020-08-14                 21.6

Filter data frames

Use filter() to subset your data
With filter(), you tell R to exclude/include data that meet certain conditions
Operators for conditions
- !, %in%, >, >=, <, <=, ==

covid %>% 
  filter(location %in% 'Peru') %>% 
  select(location, date, est_infections_p100k) %>% 
  tail(2)
## # A tibble: 2 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Peru     2020-12-31                 2.67
## 2 Peru     2021-01-01                 2.63

Filter data frames

Use filter() to subset your data
With filter(), you tell R to exclude/include data that meet certain conditions
Operators for conditions
- !, %in%, >, >=, <, <=, ==

covid %>% 
  filter(! location %in% 'Peru') %>% 
  select(location, date, est_infections_p100k) %>% 
  tail(2)
## # A tibble: 2 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Zimbabwe 2020-12-31                0.581
## 2 Zimbabwe 2021-01-01                0.579

Filter data frames

You can stipulate multiple conditions with filter()
- Use & for ‘and’ expressions
- Use | for ‘or’ expressions
- If there are many conditions, wrap conditions in () or use more than one filter()
- If you want to filter on multiple vaues, use c()

covid %>% 
  filter(est_infections_p100k > 60 & location %in% 'Malta') %>% 
  select(location, est_infections_p100k, date) %>% 
  head(2)
## # A tibble: 2 x 3
##   location est_infections_p100k date      
##   <chr>                   <dbl> <date>    
## 1 Malta                    85.6 2020-09-29
## 2 Malta                    92.3 2020-09-30

Filter data frames

You can stipulate multiple conditions with filter()
- Use & for ‘and’ expressions
- Use | for ‘or’ expressions
- If there are many conditions, wrap conditions in () or use more than one filter()
- If you want to filter on multiple vaues, use c()

covid %>% 
  filter(est_infections_p100k > 434 | location %in% 'Malta') %>% 
  select(location, est_infections_p100k, date) %>% 
  head(2)
## # A tibble: 2 x 3
##   location    est_infections_p100k date      
##   <chr>                      <dbl> <date>    
## 1 Maharashtra                 435. 2020-11-30
## 2 Malta                        NA  2020-02-04

Filter data frames

You can stipulate multiple conditions with filter()
- Use & for ‘and’ expressions
- Use | for ‘or’ expressions
- If there are many conditions, wrap conditions in () or use more than one filter()
- If you want to filter on multiple vaues, use c()

covid %>% 
  filter(
    (est_infections_p100k > 60 & location %in% 'Malta') | 
    (est_infections_p100k <= 2.5 & location %in% 'Italy')
    ) %>% 
  select(location, est_infections_p100k, date) %>% 
  head(5)
## # A tibble: 5 x 3
##   location est_infections_p100k date      
##   <chr>                   <dbl> <date>    
## 1 Italy                    2.38 2020-09-29
## 2 Italy                    2.43 2020-09-30
## 3 Italy                    2.46 2020-10-01
## 4 Italy                    2.50 2020-10-02
## 5 Malta                   85.6  2020-09-29

Filter data frames

You can stipulate multiple conditions with filter()
- Use & for ‘and’ expressions
- Use | for ‘or’ expressions
- If there are many conditions, wrap conditions in () or use more than one filter()
- If you want to filter on multiple vaues, use c()

covid %>% 
  filter(location %in% c('Peru', 'Colombia') & est_infections_p100k > 50) %>% 
  select(location, date, est_infections_p100k) %>% 
  head(5)
## # A tibble: 5 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Colombia 2020-09-29                 101.
## 2 Colombia 2020-09-30                 101.
## 3 Colombia 2020-10-01                 101.
## 4 Colombia 2020-10-02                 100.
## 5 Colombia 2020-10-03                 100.

Filter data frames

Other functions for subsetting your data
- head()
- tail()
- sample_n()

covid %>% sample_n(3) %>% select(location, date, est_infections_p100k)
## # A tibble: 3 x 3
##   location date       est_infections_p100k
##   <chr>    <date>                    <dbl>
## 1 Romania  2020-10-29                 42.4
## 2 Oklahoma 2020-05-29                 NA  
## 3 Nagaland 2020-12-22                179.

covid %>% sample_n(3) %>% select(location, date, est_infections_p100k)
## # A tibble: 3 x 3
##   location     date       est_infections_p100k
##   <chr>        <date>                    <dbl>
## 1 Rhode Island 2020-06-29                NA   
## 2 Belgium      2020-02-25                NA   
## 3 Bulgaria     2020-11-13                 9.71

Exercise - 10 minutes

Produce a dataset that meets these conditions
- Contains total_tests_data_type values not equal to NA
- Contains location values equal to 'China' or 'Germany'
- Contains no dates prior to July 4, 2020
- Contains confirmed_infections greater than 130
- Contains a variable called high_cases. This variable should flag with a 1 observations that have confirmed_infections greater than 400
- Contains 5 variables in total in the following order: location, high_cases, date, confirmed_infections, total_tests_data_type
Lastly, determine how many records meet the above filtering conditions

Hint

covid %>% 
  filter(date >= '2020-07-04')

Exercise - 10 minutes

covid %>% 
  filter(
    ! total_tests_data_type %in% NA & 
    location %in% c('China', 'Germany') & 
    confirmed_infections > 130 & 
    date >= '2020-07-04'
    ) %>%  
  transmute(
    location
    , high_cases = ifelse(confirmed_infections > 400, 1, NA)
    , date
    , confirmed_infections
    , total_tests_data_type
  ) 
## # A tibble: 58 x 5
##    location high_cases date       confirmed_infections total_tests_data_type
##    <chr>         <dbl> <date>                    <dbl> <chr>                
##  1 China            NA 2020-07-22                 136  observed             
##  2 China            NA 2020-07-25                 137  observed             
##  3 China            NA 2020-07-27                 149  observed             
##  4 China            NA 2020-07-30                 152  observed             
##  5 China            NA 2020-08-01                 133. observed             
##  6 Germany           1 2020-07-04                 422. observed             
##  7 Germany          NA 2020-07-05                 239. observed             
##  8 Germany          NA 2020-07-06                 219. observed             
##  9 Germany          NA 2020-07-07                 390. observed             
## 10 Germany          NA 2020-07-08                 397. observed             
## # … with 48 more rows

Exercise - 10 minutes

covid %>% 
  filter(
    ! total_tests_data_type %in% NA & 
    location %in% c('China', 'Germany') & 
    confirmed_infections > 130 & 
    date >= '2020-07-04'
    ) %>%
  nrow()
## [1] 58

Aggregate data with group_by() and summarise()

What you can do with group_by() and summarise()

Count, sum, average, and identify max and min values
Example questions you could answer
- In covid, which month reports the most total_tests?
- In covid, which location reports the largest number of estimated infections in September, excluding 'Global' values?
- In crime, of all 'ASSAULT OFFENSES' values in offense_parent_group which offense value has the fewest incidences?
- In crime, how many crimes were reported (reported_year) one and two years after the year in which they occurred (occurred_year)?

How you use group_by() and summarise() to aggregate data

In group_by(), list the variables by which you want to aggregate data
In summarise(), create a variable and define the variable with an aggregation function
Use %>% to ‘link’ group_by() and summarise()
Aggregation functions
- n(), n_distinct(), sum(), mean(), max(), min(), etc.
Perform multiple aggregations in a single summarise() function

Example
In covid, which month reports the most total_tests?

covid %>% 
  group_by(month) %>% 
  summarise(total_tests = sum(total_tests, na.rm = TRUE))

How you use group_by() and summarise() to aggregate data

Example
In covid, which month reports the most total_tests?

covid %>% 
  group_by(month) %>% 
  summarise(total_tests = sum(total_tests, na.rm = TRUE)) %>%
  arrange(desc(total_tests))
## # A tibble: 12 x 2
##    month total_tests
##    <dbl>       <dbl>
##  1     8    1719495.
##  2     7    1689056.
##  3     5    1210343.
##  4     6    1182845.
##  5     9    1152910.
##  6     4     958562.
##  7     3     312733.
##  8     2      44480.
##  9     1          0 
## 10    10          0 
## 11    11          0 
## 12    12          0

How you use group_by() and summarise() to aggregate data

Example
In covid, which location reports the largest number of estimated infections in September, excluding 'Global' values?

covid %>% 
  group_by(month, location) %>% 
  summarise(est_infections = sum(est_infections, na.rm = TRUE)) %>%
  filter(month %in% 9 & ! location %in% 'Global') %>%
  arrange(desc(est_infections)) %>% 
  head()
## # A tibble: 6 x 3
## # Groups:   month [1]
##   month location                    est_infections
##   <dbl> <chr>                                <dbl>
## 1     9 Latin America and Caribbean       7757636.
## 2     9 Europe and Central Asia           3550230.
## 3     9 South Asia                        2481264.
## 4     9 India                             2330349.
## 5     9 East Asia and Pacific             1596947.
## 6     9 Maharashtra                        812265.

How you use group_by() and summarise() to aggregate data

How can you rewrite the code below and get the same output?

In covid, which location reports the largest number of estimated infections in September, excluding 'Global' values?

covid %>% 
  group_by(month, location) %>% 
  summarise(est_infections = sum(est_infections, na.rm = TRUE)) %>%
  filter(month %in% 9 & ! location %in% 'Global') %>%
  arrange(desc(est_infections)) %>% 
  head()

How you use group_by() and summarise() to aggregate data

How can you rewrite the code below and get the same output?

In covid, which location reports the largest number of estimated infections in September, excluding 'Global' values?

covid %>% 
  filter(month %in% 9 & ! location %in% 'Global') %>%
  group_by(location) %>% 
  summarise(est_infections = sum(est_infections, na.rm = TRUE)) %>%
  arrange(desc(est_infections)) %>% 
  head()

How you use group_by() and summarise() to aggregate data

Exercise #1 - 5 minutes
In crime, of all 'ASSAULT OFFENSES' values in offense_parent_group which offense value has the fewest incidences? (Exclude NAs from the output you produce.)

Exercise #2 - 7 minutes
In crime, how many crimes were reported (reported_year) one and two years after the year in which they occurred (occurred_year)?

Exercise #1 - 5 minutes

In crime, of all 'ASSAULT OFFENSES' values in offense_parent_group which offense value has the fewest incidences? (Exclude NAs from the output you produce.)

crime %>% 
  group_by(offense_parent_group, offense) %>%

Exercise #1 - 5 minutes

In crime, of all 'ASSAULT OFFENSES' values in offense_parent_group which offense value has the fewest incidences? (Exclude NAs from the output you produce.)

crime %>% 
  group_by(offense_parent_group, offense) %>% 
  summarise(incidence = n())

Exercise #1 - 5 minutes

In crime, of all 'ASSAULT OFFENSES' values in offense_parent_group which offense value has the fewest incidences? (Exclude NAs from the output you produce.)

crime %>% 
  group_by(offense_parent_group, offense) %>% 
  summarise(incidence = n()) %>%
  filter(offense_parent_group %in% 'ASSAULT OFFENSES' & ! offense %in% NA)
## # A tibble: 3 x 3
## # Groups:   offense_parent_group [1]
##   offense_parent_group offense            incidence
##   <chr>                <chr>                  <int>
## 1 ASSAULT OFFENSES     Aggravated Assault      3754
## 2 ASSAULT OFFENSES     Intimidation            3963
## 3 ASSAULT OFFENSES     Simple Assault          9527