DataM: Inclass Exercise 0330 - 3
The data set concerns species and weight of animals caught in plots in a study area in Arizona over time.
Each row holds information for a single animal, and the columns represent:
- record_id: Unique id for the observation
- month: month of observation
- day: day of observation
- year: year of observation
- plot_id: ID of a particular plot
- species_id: 2-letter code
- sex: sex of animal ("M", "F")
- hindfoot_length: length of the hindfoot in mm
- weight: weight of the animal in grams
- genus: genus of animal
- species: species of animal
- taxa: e.g. Rodent, Reptile, Bird, Rabbit
- plot_type: type of plot
Chunk 2
## Parsed with column specification:
## cols(
## record_id = col_double(),
## month = col_double(),
## day = col_double(),
## year = col_double(),
## plot_id = col_double(),
## species_id = col_character(),
## sex = col_character(),
## hindfoot_length = col_double(),
## weight = col_double(),
## genus = col_character(),
## species = col_character(),
## taxa = col_character(),
## plot_type = col_character()
## )
Load the comma-delimited data set from the URL with read_csv{readr} and name the data set dta.
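A minimal sketch of this step, assuming a small temporary CSV file as a stand-in for the course URL (the rows below are made-up toy values, not the real data). read_csv takes a local path or a URL in the same argument.

```r
library(readr)

# Hypothetical stand-in for the data URL: write a tiny CSV to a temp file.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "record_id,month,day,year,plot_id,species_id,sex,hindfoot_length,weight",
  "1,7,16,1977,2,NL,M,32,",
  "72,8,19,1977,2,NL,M,31,",
  "588,2,18,1978,2,NL,M,,218"
), csv_path)

# read_csv() guesses a type for each column; empty fields become NA.
dta <- read_csv(csv_path)
dta
```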
Chunk 3
## Rows: 34,786
## Columns: 13
## $ record_id <dbl> 1, 72, 224, 266, 349, 363, 435, 506, 588, 661, …
## $ month <dbl> 7, 8, 9, 10, 11, 11, 12, 1, 2, 3, 4, 5, 6, 8, 9…
## $ day <dbl> 16, 19, 13, 16, 12, 12, 10, 8, 18, 11, 8, 6, 9,…
## $ year <dbl> 1977, 1977, 1977, 1977, 1977, 1977, 1977, 1978,…
## $ plot_id <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
## $ species_id <chr> "NL", "NL", "NL", "NL", "NL", "NL", "NL", "NL",…
## $ sex <chr> "M", "M", NA, NA, NA, NA, NA, NA, "M", NA, NA, …
## $ hindfoot_length <dbl> 32, 31, NA, NA, NA, NA, NA, NA, NA, NA, NA, 32,…
## $ weight <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 218, NA, NA, 20…
## $ genus <chr> "Neotoma", "Neotoma", "Neotoma", "Neotoma", "Ne…
## $ species <chr> "albigula", "albigula", "albigula", "albigula",…
## $ taxa <chr> "Rodent", "Rodent", "Rodent", "Rodent", "Rodent…
## $ plot_type <chr> "Control", "Control", "Control", "Control", "Co…
Get the basic info of dta, including its dimension and names of variables.
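One way to get this information, sketched on a toy tibble that stands in for dta (the rows are made up). The "Rows / Columns" listing above has the layout that glimpse prints; dim and names are base R alternatives.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows, a subset of the real column names).
dta <- tibble(
  record_id  = c(1, 72, 588),
  year       = c(1977, 1977, 1978),
  species_id = c("NL", "NL", "NL"),
  weight     = c(NA, NA, 218)
)

dim(dta)      # number of rows and number of columns
names(dta)    # variable names
glimpse(dta)  # rows, columns, and the type and first values of each variable
```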
Chunk 5
Use select{dplyr} to pick the variables plot_id, species_id, and weight from dta. Then use head to display the first 6 rows.
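A short sketch of the selection, using a made-up toy tibble in place of dta; the name picked is hypothetical.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  record_id  = 1:3,
  plot_id    = c(2, 2, 3),
  species_id = c("NL", "DM", "DM"),
  sex        = c("M", "F", NA),
  weight     = c(NA, 40, 48)
)

# Keep only the three named columns, in the order given.
picked <- dta %>% select(plot_id, species_id, weight)
head(picked)
```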
Chunk 6
Use select{dplyr} to pick all variables in dta except record_id and species_id. Then use head to display the first 6 rows.
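Excluding columns can be sketched with a minus sign in front of each name. The tibble and the name dropped are made up for illustration.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  record_id  = 1:3,
  plot_id    = c(2, 2, 3),
  species_id = c("NL", "DM", "DM"),
  weight     = c(NA, 40, 48)
)

# A leading minus drops the named column and keeps everything else.
dropped <- dta %>% select(-record_id, -species_id)
head(dropped)
```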
Chunk 7
Use filter{dplyr} to pick the rows that satisfy a specified condition (e.g., the value of year is 1995). Then use head to display the first 6 rows.
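A sketch of the row filter on made-up toy data; the name y1995 is hypothetical.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  record_id = 1:4,
  year      = c(1977, 1995, 1995, 2002),
  weight    = c(NA, 40, 48, 7)
)

# Keep only rows where the condition evaluates to TRUE.
y1995 <- dta %>% filter(year == 1995)
head(y1995)
```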
Chunk 8
- Use filter{dplyr} to pick the rows whose value of weight is not larger than 5.
- Use select{dplyr} to pick the variables species_id, sex, and weight.
- Use head to display the first 6 rows.
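The three steps above can be chained with the pipe. This is a sketch on made-up data; it reads "not larger than 5" as weight <= 5, and the name small is hypothetical. Note that filter also drops rows where weight is NA, because the condition is not TRUE for them.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  species_id = c("PF", "DM", "NL", "PF", "RM"),
  sex        = c("F", "M", NA, "M", "F"),
  weight     = c(4, 40, NA, 5, 7),
  year       = c(1990, 1990, 1991, 1992, 1992)
)

small <- dta %>%
  filter(weight <= 5) %>%            # rows with weight not larger than 5
  select(species_id, sex, weight)    # keep three columns
head(small)
```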
Chunk 9
- Use filter{dplyr} to pick the rows whose value of weight is not larger than 5.
- Use select{dplyr} to pick the variables species_id, sex, and weight.
- Use head to display the first 6 rows and name it dta.
Chunk 10
Use mutate to create two new variables: (a) weight_kg: the existing variable weight divided by 1000; (b) weight_lb: the newly created variable weight_kg multiplied by 2.2. (In other words, this step is a unit conversion from grams to kilograms and then to pounds.) Then use head to display the first 6 rows.
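A sketch of the conversion on made-up weights; the name converted is hypothetical. Within a single mutate call, a later expression can already use a column created earlier in the same call.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows); weight is in grams.
dta <- tibble(
  species_id = c("NL", "DM", "DM"),
  weight     = c(218, 40, NA)
)

converted <- dta %>%
  mutate(
    weight_kg = weight / 1000,     # grams to kilograms
    weight_lb = weight_kg * 2.2    # kilograms to pounds (approximate factor)
  )
head(converted)
```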
Chunk 11
dta %>%
  filter(!is.na(weight)) %>%
  group_by(sex, species_id) %>%
  summarize(mean_weight = mean(weight)) %>%
  arrange(desc(mean_weight)) %>%
  head()
- Use filter{dplyr} to pick the rows whose value of weight is not missing.
- Use group_by to group the data by the variables sex and species_id. There will be at most (# classes in sex) * (# classes in species_id) groups, because only the combinations that occur in the data form a group.
- Use summarize to compute the mean weight of each group.
- Use arrange and desc to sort the groups in descending order of mean weight.
- Use head to display the first 6 rows.
Chunk 12
Group the data by the variable sex and count the total observations in each group. That is, count the observations in each class of sex.
Chunk 14
Group the data by the variable sex and create a new variable holding the total number of observations in each group. That is, count the observations in each class of sex.
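Counting per group can be sketched in two equivalent ways, shown here on made-up data: count as a shortcut, and group_by followed by summarize with n(). The names by_count and by_summarize are hypothetical. Missing values of sex form their own group.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  sex    = c("M", "F", "M", NA, "F", "M"),
  weight = c(40, 48, NA, 7, 218, 36)
)

# Shortcut: one row per class of sex with its row count in column n.
by_count <- dta %>% count(sex)

# Explicit form: group, then create the count variable with n().
by_summarize <- dta %>%
  group_by(sex) %>%
  summarize(n = n())

by_count
by_summarize
```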
Chunk 15
Group the data by the variable sex and create a new variable holding the number of non-missing values of year in each group.
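A sketch of counting non-missing values per group on made-up data: !is.na(year) is TRUE for each non-missing entry, and sum counts the TRUE values. The names year_counts and n_year are hypothetical.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  sex  = c("M", "F", "M", NA, "F", "M"),
  year = c(1977, NA, 1978, 1979, NA, 1980)
)

year_counts <- dta %>%
  group_by(sex) %>%
  summarize(n_year = sum(!is.na(year)))  # non-missing years per class of sex
year_counts
```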
Chunk 16
dta_gw <- dta %>%
  filter(!is.na(weight)) %>%
  group_by(genus, plot_id) %>%
  summarize(mean_weight = mean(weight))
- Get the rows without a missing value in weight.
- Group the data by the variables genus and plot_id.
- Compute the mean weight of each group.
- Save the data and name it dta_gw.
Chunk 17
## Rows: 196
## Columns: 3
## Groups: genus [10]
## $ genus <chr> "Baiomys", "Baiomys", "Baiomys", "Baiomys", "Baiomy…
## $ plot_id <dbl> 1, 2, 3, 5, 18, 19, 20, 21, 1, 2, 3, 4, 5, 6, 7, 8,…
## $ mean_weight <dbl> 7.000000, 6.000000, 8.611111, 7.750000, 9.500000, 9…
Get the basic info of dta_gw, including its dimension and names of variables.
Chunk 18
- Spread the data by the variable genus to get the wide data format, which contains one column per class in genus holding the values of mean_weight.
- Save the data and name it dta_w.
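A sketch of the long-to-wide step with spread{tidyr}, on a made-up three-column table shaped like dta_gw. Combinations of plot_id and genus that do not occur in the long data become NA in the wide data. (In newer tidyr versions, pivot_wider covers the same task.)

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_gw (made-up means).
dta_gw <- tibble(
  genus       = c("Baiomys", "Baiomys", "Neotoma", "Neotoma", "Neotoma"),
  plot_id     = c(1, 2, 1, 2, 3),
  mean_weight = c(7, 6, 156, 169, 158)
)

# Each class of genus becomes a column; cells hold mean_weight.
dta_w <- dta_gw %>% spread(key = genus, value = mean_weight)
dta_w
```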
Chunk 19
## Rows: 24
## Columns: 11
## $ plot_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, …
## $ Baiomys <dbl> 7.000000, 6.000000, 8.611111, NA, 7.750000, NA,…
## $ Chaetodipus <dbl> 22.19939, 25.11014, 24.63636, 23.02381, 17.9827…
## $ Dipodomys <dbl> 60.23214, 55.68259, 52.04688, 57.52454, 51.1135…
## $ Neotoma <dbl> 156.2222, 169.1436, 158.2414, 164.1667, 190.037…
## $ Onychomys <dbl> 27.67550, 26.87302, 26.03241, 28.09375, 27.0169…
## $ Perognathus <dbl> 9.625000, 6.947368, 7.507812, 7.824427, 8.65853…
## $ Peromyscus <dbl> 22.22222, 22.26966, 21.37037, 22.60000, 21.2317…
## $ Reithrodontomys <dbl> 11.375000, 10.680556, 10.516588, 10.263158, 11.…
## $ Sigmodon <dbl> NA, 70.85714, 65.61404, 82.00000, 82.66667, 68.…
## $ Spermophilus <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
Get the basic info of dta_w, including its dimension and names of variables.
Chunk 20
- Spread the data by the variable genus to get the wide data format, which contains one column per class in genus holding the values of mean_weight.
- Fill the missing values with 0.
- Display the first 6 rows.
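The fill step can be sketched with the fill argument of spread, again on made-up data; the name filled is hypothetical. Note that a filled 0 is no longer distinguishable from a true mean of 0.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_gw (made-up means).
dta_gw <- tibble(
  genus       = c("Baiomys", "Baiomys", "Neotoma", "Neotoma", "Neotoma"),
  plot_id     = c(1, 2, 1, 2, 3),
  mean_weight = c(7, 6, 156, 169, 158)
)

# fill = 0 replaces cells for absent genus/plot combinations with 0.
filled <- dta_gw %>% spread(key = genus, value = mean_weight, fill = 0)
head(filled)
```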
Chunk 21
- Stack (gather) the data by the variable genus to get the long data format, which contains a single column genus with the different classes and a single column with the values of mean_weight.
- Exclude the column plot_id from the gathering, so that it is kept as an identifier column.
- Save the data and name it dta_l.
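A sketch of the wide-to-long step with gather{tidyr} on a made-up table shaped like dta_w. Writing -plot_id tells gather to stack every column except plot_id. (In newer tidyr versions, pivot_longer covers the same task.)

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_w (made-up means).
dta_w <- tibble(
  plot_id = c(1, 2, 3),
  Baiomys = c(7, 6, NA),
  Neotoma = c(156, 169, 158)
)

# Column names go to genus, cell values go to mean_weight.
dta_l <- dta_w %>% gather(key = "genus", value = "mean_weight", -plot_id)
dta_l
```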
Chunk 22
## Rows: 240
## Columns: 3
## $ plot_id <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
## $ genus <chr> "Baiomys", "Baiomys", "Baiomys", "Baiomys", "Baiomy…
## $ mean_weight <dbl> 7.000000, 6.000000, 8.611111, NA, 7.750000, NA, NA,…
Get the basic info of dta_l, including its dimension and names of variables.
Chunk 23
- Select the columns from Baiomys to Spermophilus in dta_w.
- Stack (gather) the data by the variable genus to get the long data format, which contains a single column genus with the different classes and a single column with the values of mean_weight.
- Display the first 6 rows.
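The same reshaping can be sketched by naming the range of columns to stack instead of the column to leave out. The toy table below is made up and has only three genus columns; the name long is hypothetical.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for dta_w (made-up means, fewer genus columns than the real data).
dta_w <- tibble(
  plot_id      = c(1, 2, 3),
  Baiomys      = c(7, 6, NA),
  Neotoma      = c(156, 169, 158),
  Spermophilus = c(NA, NA, 93)
)

# first:last selects every column from Baiomys through Spermophilus.
long <- dta_w %>%
  gather(key = "genus", value = "mean_weight", Baiomys:Spermophilus)
head(long)
```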
Chunk 24
- Get the rows in dta without missing values in the columns weight, hindfoot_length, and sex.
- Save the data and name it dta_complete.
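A sketch of the complete-case filter on made-up rows. Conditions separated by commas in filter are combined with a logical AND, so a row is kept only if all three values are present.

```r
library(dplyr)

# Toy stand-in for dta (made-up rows).
dta <- tibble(
  species_id      = c("NL", "DM", "DM", "NL"),
  sex             = c("M", "F", "M", NA),
  hindfoot_length = c(32, 36, NA, 33),
  weight          = c(NA, 40, 48, 218)
)

dta_complete <- dta %>%
  filter(!is.na(weight), !is.na(hindfoot_length), !is.na(sex))
dta_complete
```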
Chunk 25
- Count the number of observations of each species in the complete data (dta_complete).
- Get the rows whose species count is not less than 50.
- Save the data and name it species_counts.
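A sketch of the counting step, assuming the count is taken per species_id and using made-up data with one common and one rare species. count stores the number of rows in a column named n.

```r
library(dplyr)

# Toy stand-in for dta_complete (made-up rows): 60 "DM" and 10 "NL".
dta_complete <- tibble(
  species_id = c(rep("DM", 60), rep("NL", 10)),
  weight     = c(rep(40, 60), rep(200, 10))
)

species_counts <- dta_complete %>%
  count(species_id) %>%   # one row per species_id with its count n
  filter(n >= 50)         # keep species with at least 50 observations
species_counts
```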
Chunk 26
Revise dta_complete: retain the rows whose species id appears in species_counts. That is, drop the data for species whose count is less than 50.
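This last step can be sketched with %in%, which tests each species_id against the ids kept in species_counts. The data below are made up, and matching on the species_id column is an assumption about how the two tables are linked.

```r
library(dplyr)

# Toy stand-in for dta_complete (made-up rows): 60 "DM" and 10 "NL".
dta_complete <- tibble(
  species_id = c(rep("DM", 60), rep("NL", 10)),
  weight     = c(rep(40, 60), rep(200, 10))
)

# Species with at least 50 observations.
species_counts <- dta_complete %>%
  count(species_id) %>%
  filter(n >= 50)

# Keep only rows whose species_id is among the retained species.
dta_complete <- dta_complete %>%
  filter(species_id %in% species_counts$species_id)
dim(dta_complete)
```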