The data set concerns species and weight of animals caught in plots in a study area in Arizona over time.

Each row holds information for a single animal, and the columns represent:


Chunk 1

Load the package tidyverse by using pacman.

Chunk 2

## Parsed with column specification:
## cols(
##   record_id = col_double(),
##   month = col_double(),
##   day = col_double(),
##   year = col_double(),
##   plot_id = col_double(),
##   species_id = col_character(),
##   sex = col_character(),
##   hindfoot_length = col_double(),
##   weight = col_double(),
##   genus = col_character(),
##   species = col_character(),
##   taxa = col_character(),
##   plot_type = col_character()
## )

Load the comma-delimited data set from its URL with read_csv{readr} and name it dta.

Chunk 3

## Rows: 34,786
## Columns: 13
## $ record_id       <dbl> 1, 72, 224, 266, 349, 363, 435, 506, 588, 661, …
## $ month           <dbl> 7, 8, 9, 10, 11, 11, 12, 1, 2, 3, 4, 5, 6, 8, 9…
## $ day             <dbl> 16, 19, 13, 16, 12, 12, 10, 8, 18, 11, 8, 6, 9,…
## $ year            <dbl> 1977, 1977, 1977, 1977, 1977, 1977, 1977, 1978,…
## $ plot_id         <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ species_id      <chr> "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL",…
## $ sex             <chr> "M", "M", NA, NA, NA, NA, NA, NA, "M", NA, NA, …
## $ hindfoot_length <dbl> 32, 31, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32,…
## $ weight          <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 218, NA, NA, 20…
## $ genus           <chr> "Neotoma", "Neotoma", "Neotoma", "Neotoma", "Ne…
## $ species         <chr> "albigula", "albigula", "albigula", "albigula",…
## $ taxa            <chr> "Rodent", "Rodent", "Rodent", "Rodent", "Rodent…
## $ plot_type       <chr> "Control", "Control", "Control", "Control", "Co…

Get basic information about dta, including its dimensions and variable names.

Chunk 4

## [1] 34786    13

Get the dimensions of dta (number of rows and number of columns).
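As a rough illustration of the two inspection steps above, the sketch below uses a tiny made-up stand-in for dta (three of the 13 columns, invented values), since the real file is not reproduced here. It assumes glimpse as re-exported by dplyr.

```r
library(dplyr)

# Toy stand-in for dta: made-up values, only three of the real columns
dta <- tibble(
  record_id  = 1:4,
  species_id = c("NL", "DM", "DM", "PF"),
  weight     = c(NA, 40, 48, 7)
)

glimpse(dta)  # rows, columns, and every variable with its type
dim(dta)      # number of rows, then number of columns
names(dta)    # variable names only
```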

Chunk 5

Use select{dplyr} to pick some variables in dta: plot_id, species_id, and weight. Then use head to display the first 6 rows.

Chunk 6

Use select{dplyr} to keep every variable in dta except record_id and species_id. Then use head to display the first 6 rows.
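The two uses of select can be sketched as follows. The tibble is a made-up stand-in for dta with only four of its columns; a leading minus sign is what turns "keep" into "drop".

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  record_id  = 1:3,
  plot_id    = c(2, 2, 3),
  species_id = c("NL", "DM", "PF"),
  weight     = c(NA, 40, 7)
)

# Keep only the named columns
head(select(dta, plot_id, species_id, weight))

# Keep everything except the named columns
head(select(dta, -record_id, -species_id))
```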

Chunk 7

Use filter{dplyr} to keep the rows that satisfy a specified condition (e.g., the value of year is 1995). Then use head to display the first 6 rows.
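A minimal sketch of the row filter, again on a made-up stand-in for dta. The name dta_1995 is only an illustrative label for the result.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  year       = c(1994, 1995, 1995, 1996),
  species_id = c("NL", "DM", "PF", "DM")
)

# Keep only the rows where year equals 1995
dta_1995 <- filter(dta, year == 1995)
head(dta_1995)
```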

Chunk 8

  1. Use filter{dplyr} to keep the rows whose weight is not larger than 5.
  2. Use select{dplyr} to pick the variables species_id, sex, and weight.
  3. Use head to display the first 6 rows.

Chunk 9

  1. Use filter{dplyr} to keep the rows whose weight is not larger than 5.
  2. Use select{dplyr} to pick the variables species_id, sex, and weight.
  3. Use head to display the first 6 rows, and name the result dta.
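The same three steps can be written either as nested calls or as a pipe. The sketch below shows both on a made-up stand-in for dta and assumes "not larger than 5" means `weight <= 5`. The result names nested and piped are illustrative only.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  species_id = c("PF", "DM", "PF", "RM", "NL"),
  sex        = c("F", "M", "M", NA, "M"),
  weight     = c(4, 40, 5, NA, 218),
  year       = c(1995, 1995, 1996, 1996, 1997)
)

# Nested calls: read from the inside out
nested <- head(select(filter(dta, weight <= 5), species_id, sex, weight))

# Pipe: one step per line, read top to bottom
piped <- dta %>%
  filter(weight <= 5) %>%
  select(species_id, sex, weight) %>%
  head()
```

Rows with a missing weight are dropped as well, because filter keeps only the rows where the condition is TRUE.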

Chunk 10

Use mutate to create two new variables: (a) weight_kg: the existing variable weight divided by 1000; (b) weight_lb: the newly created variable weight_kg multiplied by 2.2. (In other words, this step performs a unit conversion.) Then use head to display the first 6 rows.
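A small sketch of the conversion on a made-up stand-in for dta. Note that weight_lb can refer to weight_kg inside the same mutate call because the new columns are created in order.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  species_id = c("NL", "DM", "PF"),
  weight     = c(218, 40, NA)
)

converted <- dta %>%
  mutate(weight_kg = weight / 1000,     # divide the existing column by 1000
         weight_lb = weight_kg * 2.2)   # reuse the column created just above

head(converted)
```

Missing weights stay missing in both new columns.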

Chunk 11

  1. Use filter{dplyr} to keep the rows whose weight is not a missing value.
  2. Use group_by to group the data by the variables sex and species_id. There will be at most (no. of classes in sex) times (no. of classes in species_id) groups, one for each combination that occurs in the data.
  3. Use summarize to compute the mean weight of each group.
  4. Use arrange and desc to sort the data in descending order of mean weight.
  5. Use head to display the first 6 rows.
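The five steps above might look like this on a made-up stand-in for dta. The column name mean_weight and the result name mean_by_group are assumptions for the sketch, and `.groups = "drop"` (dplyr 1.0 or later) is added only to return an ungrouped result.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  sex        = c("F", "F", "M", "M", "M", NA),
  species_id = c("DM", "DM", "DM", "NL", "NL", "PF"),
  weight     = c(40, 44, 50, 200, NA, 8)
)

mean_by_group <- dta %>%
  filter(!is.na(weight)) %>%                 # 1. drop rows with missing weight
  group_by(sex, species_id) %>%              # 2. one group per observed combination
  summarize(mean_weight = mean(weight),      # 3. mean weight per group
            .groups = "drop") %>%
  arrange(desc(mean_weight))                 # 4. largest mean first

head(mean_by_group)                          # 5. first 6 rows
```

Only four groups appear here, fewer than the full cross of sex and species_id, because group_by creates groups only for combinations present in the data.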

Chunk 12

Group the data by the variable sex and count the total observations in each group. That is, count the observations in each class of sex.

Chunk 13

Count observations in each class of sex.
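Both counting approaches can be sketched side by side on a made-up stand-in for dta. count is dplyr's shorthand for group_by followed by summarize with n(); the names by_hand and shortcut are illustrative.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(sex = c("F", "M", "M", NA, "M"))

# Explicit version: group, then count the rows per group
by_hand <- dta %>%
  group_by(sex) %>%
  summarize(n = n())

# Shorthand version
shortcut <- dta %>% count(sex)
```

Missing values of sex form their own class in both results.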

Chunk 14

Group the data by the variable sex and create a new variable holding the total observations in each group. That is, count the observations in each class of sex.

Chunk 15

Group the data by the variable sex and create a new variable holding the number of non-missing values of year in each group.
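One possible reading of the two named-count steps, on a made-up stand-in for dta: both counts are written as named columns inside summarize. The column names count and n_year are assumptions, not names taken from the original chunks.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  sex  = c("F", "M", "M", "F"),
  year = c(1995, NA, 1996, 1997)
)

counts <- dta %>%
  group_by(sex) %>%
  summarize(count  = n(),                  # all rows in the group
            n_year = sum(!is.na(year)))    # rows with a non-missing year

counts
```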

Chunk 16

  1. Keep the rows with no missing value in weight.
  2. Group the data by the variables genus and plot_id.
  3. Compute the mean weight for each group.
  4. Save the data and name it dta_gw.
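A sketch of how dta_gw could be built, using a made-up stand-in for dta. The column name mean_weight matches the output shown for dta_gw. Leaving summarize at its default keeps the result grouped by genus and prints an informational message about that.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  genus   = c("Baiomys", "Baiomys", "Neotoma", "Neotoma", "Neotoma"),
  plot_id = c(1, 1, 1, 2, 2),
  weight  = c(6, 8, 150, 170, NA)
)

dta_gw <- dta %>%
  filter(!is.na(weight)) %>%               # rows with a recorded weight
  group_by(genus, plot_id) %>%             # one group per genus and plot
  summarize(mean_weight = mean(weight))    # mean weight per group

glimpse(dta_gw)
```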

Chunk 17

## Rows: 196
## Columns: 3
## Groups: genus [10]
## $ genus       <chr> "Baiomys", "Baiomys", "Baiomys", "Baiomys", "Baiomy…
## $ plot_id     <dbl> 1, 2, 3, 5, 18, 19, 20, 21, 1, 2, 3, 4, 5, 6, 7, 8,…
## $ mean_weight <dbl> 7.000000, 6.000000, 8.611111, 7.750000, 9.500000, 9…

Get basic information about dta_gw, including its dimensions and variable names.

Chunk 18

  1. Spread the data by the variable genus to get the wide format, with one column per class of genus holding the values of mean_weight.
  2. Save the data and name it dta_w.
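A minimal sketch of the long-to-wide step with spread{tidyr}, on a made-up stand-in for dta_gw. tidyr now recommends pivot_wider for the same job, shown here only as a comment.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_gw (invented values)
dta_gw <- tibble(
  genus       = c("Baiomys", "Baiomys", "Neotoma"),
  plot_id     = c(1, 2, 1),
  mean_weight = c(7, 6, 156)
)

# One row per plot_id, one column per genus
dta_w <- dta_gw %>%
  spread(key = genus, value = mean_weight)

# Equivalent with the newer interface:
# pivot_wider(dta_gw, names_from = genus, values_from = mean_weight)

glimpse(dta_w)
```

Combinations that never occur (here Neotoma in plot 2) become NA in the wide table.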

Chunk 19

## Rows: 24
## Columns: 11
## $ plot_id         <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ Baiomys         <dbl> 7.000000, 6.000000, 8.611111, NA, 7.750000, NA,…
## $ Chaetodipus     <dbl> 22.19939, 25.11014, 24.63636, 23.02381, 17.9827…
## $ Dipodomys       <dbl> 60.23214, 55.68259, 52.04688, 57.52454, 51.1135…
## $ Neotoma         <dbl> 156.2222, 169.1436, 158.2414, 164.1667, 190.037…
## $ Onychomys       <dbl> 27.67550, 26.87302, 26.03241, 28.09375, 27.0169…
## $ Perognathus     <dbl> 9.625000, 6.947368, 7.507812, 7.824427, 8.65853…
## $ Peromyscus      <dbl> 22.22222, 22.26966, 21.37037, 22.60000, 21.2317…
## $ Reithrodontomys <dbl> 11.375000, 10.680556, 10.516588, 10.263158, 11.…
## $ Sigmodon        <dbl> NA, 70.85714, 65.61404, 82.00000, 82.66667, 68.…
## $ Spermophilus    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Get basic information about dta_w, including its dimensions and variable names.

Chunk 20

  1. Spread the data by the variable genus to get the wide format, with one column per class of genus holding the values of mean_weight.
  2. Fill the missing values with 0.
  3. Display the first 6 rows.
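The fill step can be done inside spread itself through its fill argument. A sketch on a made-up stand-in for dta_gw follows; the result name filled is illustrative.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_gw (invented values)
dta_gw <- tibble(
  genus       = c("Baiomys", "Baiomys", "Neotoma"),
  plot_id     = c(1, 2, 1),
  mean_weight = c(7, 6, 156)
)

filled <- dta_gw %>%
  spread(key = genus, value = mean_weight, fill = 0)  # 0 instead of NA

head(filled)
```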

Chunk 21

  1. Stack (gather) the genus columns to get the long format, with a single column genus holding the classes and a single column holding the values of mean_weight.
  2. Exclude the column plot_id from the stacking, so that it stays as an identifier column.
  3. Save the data and name it dta_l.
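A sketch of the wide-to-long step with gather{tidyr}, on a made-up stand-in for dta_w. Writing `-plot_id` keeps plot_id out of the stacking, so it remains a column of the result, as in the output shown for dta_l.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_w (invented values)
dta_w <- tibble(
  plot_id = c(1, 2),
  Baiomys = c(7, 6),
  Neotoma = c(156, NA)
)

# Stack every column except plot_id into genus / mean_weight pairs
dta_l <- dta_w %>%
  gather(key = "genus", value = "mean_weight", -plot_id)

glimpse(dta_l)
```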

Chunk 22

## Rows: 240
## Columns: 3
## $ plot_id     <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ genus       <chr> "Baiomys", "Baiomys", "Baiomys", "Baiomys", "Baiomy…
## $ mean_weight <dbl> 7.000000, 6.000000, 8.611111, NA, 7.750000, NA, NA,…

Get basic information about dta_l, including its dimensions and variable names.

Chunk 23

  1. Select the columns from Baiomys to Spermophilus in dta_w.
  2. Stack (gather) those columns to get the long format, with a single column genus holding the classes and a single column holding the values of mean_weight.
  3. Display the first 6 rows.
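Instead of excluding plot_id, the columns to stack can be named as a range. The sketch below does this on a made-up stand-in for dta_w with three genus columns; the result name stacked is illustrative.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_w (invented values)
dta_w <- tibble(
  plot_id      = c(1, 2),
  Baiomys      = c(7, 6),
  Neotoma      = c(156, NA),
  Spermophilus = c(NA, NA)
)

# Stack the columns from Baiomys through Spermophilus, in column order
stacked <- dta_w %>%
  gather(key = "genus", value = "mean_weight", Baiomys:Spermophilus)

head(stacked)
```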

Chunk 24

  1. Keep the rows in dta that have no missing value in any of the columns weight, hindfoot_length, and sex.
  2. Save the data and name it dta_complete.
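A minimal sketch of the complete-case filter on a made-up stand-in for dta. Conditions separated by commas in filter are combined with "and", so a row survives only if all three columns are present.

```r
library(dplyr)

# Toy stand-in for dta (invented values)
dta <- tibble(
  sex             = c("M", NA, "F", "F"),
  hindfoot_length = c(32, 31, NA, 20),
  weight          = c(218, 40, 8, 7)
)

dta_complete <- dta %>%
  filter(!is.na(weight),
         !is.na(hindfoot_length),
         !is.na(sex))

dta_complete
```

tidyr's `drop_na(weight, hindfoot_length, sex)` would be an alternative way to express the same filter.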

Chunk 25

  1. Count the observations of each species in the complete data (dta_complete).
  2. Keep the species whose counts are not less than 50.
  3. Save the data and name it species_counts.

Chunk 26

Revise dta_complete: retain only the rows whose species_id appears in species_counts. That is, drop the rows of species with fewer than 50 observations.
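The counting step and the revision of dta_complete can be sketched together. The data below are a made-up stand-in with one common and one rare species, and it is assumed that species are identified by species_id in both steps.

```r
library(dplyr)

# Toy stand-in for dta_complete: 60 rows of "DM", 10 rows of "PF"
dta_complete <- tibble(
  species_id = rep(c("DM", "PF"), times = c(60, 10)),
  weight     = c(rep(40, 60), rep(8, 10))
)

# Species with at least 50 observations
species_counts <- dta_complete %>%
  count(species_id) %>%
  filter(n >= 50)

# Keep only the rows that belong to those species
dta_complete <- dta_complete %>%
  filter(species_id %in% species_counts$species_id)
```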