1) Store all data in long format
2) One table for each level of observation
3) Always start with raw data … no derived variables!
4) One data value per field
1) Store all data in long format
Wide format data:
| subject | mass2016 | mass2017 |
|---|---|---|
| A | 13.2 | 26.4 |
| B | 14.6 | 15.2 |
| C | 27.1 | 31.3 |
Long format data:
| subject | year | value |
|---|---|---|
| A | 2016 | 13.2 |
| B | 2016 | 14.6 |
| C | 2016 | 27.1 |
| A | 2017 | 26.4 |
| B | 2017 | 15.2 |
| C | 2017 | 31.3 |
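A wide table can be reshaped into long format in R rather than by hand. The following is a minimal sketch using pivot_longer from the tidyr package, which assumes tidyr 1.0.0 or later. The object names wideData and longData are made up for illustration, and the values are retyped from the tables above.

```r
library(tidyr)
library(tibble)

# Wide format: one mass column per year (values from the wide table)
wideData <- tibble(
  subject  = c('A', 'B', 'C'),
  mass2016 = c(13.2, 14.6, 27.1),
  mass2017 = c(26.4, 15.2, 31.3)
)

# Long format: one row per subject per year
longData <- pivot_longer(
  wideData,
  cols         = c(mass2016, mass2017),
  names_to     = 'year',
  names_prefix = 'mass',  # drop the 'mass' prefix so only the year remains
  values_to    = 'value'
)

longData
```

The rows come out grouped by subject rather than by year, and year is stored as text; both are easy to change afterwards if needed.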
2) One table for each level of observation
Don't do this:
| subject | year | value | site | canopy | precip |
|---|---|---|---|---|---|
| A | 2016 | 13.2 | 1 | 13.3 | 91.7 |
| B | 2016 | 14.6 | 1 | 13.3 | 91.7 |
| C | 2016 | 27.1 | 2 | 26.8 | 78.1 |
| A | 2017 | 26.4 | 1 | 13.3 | 91.7 |
| B | 2017 | 15.2 | 1 | 13.3 | 91.7 |
| C | 2017 | 31.3 | 2 | 26.8 | 78.1 |
Do this:
| subject | year | value | site |
|---|---|---|---|
| A | 2016 | 13.2 | 1 |
| B | 2016 | 14.6 | 1 |
| C | 2016 | 27.1 | 2 |
| A | 2017 | 26.4 | 1 |
| B | 2017 | 15.2 | 1 |
| C | 2017 | 31.3 | 2 |
Site-level table:
| site | canopy | precip |
|---|---|---|
| 1 | 13.3 | 91.7 |
| 2 | 26.8 | 78.1 |
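Keeping site-level data in its own table loses nothing, because the two tables can be recombined through the shared site column whenever an analysis needs them together. Here is a small sketch using left_join from dplyr; the object names observations, sites and combined are hypothetical.

```r
library(dplyr)
library(tibble)

# Observation-level table: one row per subject per year
observations <- tibble(
  subject = rep(c('A', 'B', 'C'), times = 2),
  year    = rep(c(2016, 2017), each = 3),
  value   = c(13.2, 14.6, 27.1, 26.4, 15.2, 31.3),
  site    = rep(c(1, 1, 2), times = 2)
)

# Site-level table: one row per site
sites <- tibble(
  site   = c(1, 2),
  canopy = c(13.3, 26.8),
  precip = c(91.7, 78.1)
)

# Attach the site-level columns to each observation by matching on site
combined <- left_join(observations, sites, by = 'site')

combined
```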
3) Always start with raw data … no derived variables!
Don't do this:
| subject | date | year | value |
|---|---|---|---|
| A | 2016-06-12 | 2016 | 13.2 |
| B | 2016-06-17 | 2016 | 14.6 |
| C | 2016-07-01 | 2016 | 27.1 |
| A | 2017-06-14 | 2017 | 26.4 |
| B | 2017-06-18 | 2017 | 15.2 |
| C | 2017-06-29 | 2017 | 31.3 |
Do this:
| subject | date | value |
|---|---|---|
| A | 2016-06-12 | 13.2 |
| B | 2016-06-17 | 14.6 |
| C | 2016-07-01 | 27.1 |
| A | 2017-06-14 | 26.4 |
| B | 2017-06-18 | 15.2 |
| C | 2017-06-29 | 31.3 |
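Because year can always be recomputed from date, it does not need to be stored. As a sketch, the year function from lubridate can derive it at analysis time while the stored table stays raw. The object names measurements and withYear are made up for this example.

```r
library(dplyr)
library(lubridate)
library(tibble)

# Raw data: the date is stored, the year is not
measurements <- tibble(
  subject = rep(c('A', 'B', 'C'), times = 2),
  date    = ymd(c('2016-06-12', '2016-06-17', '2016-07-01',
                  '2017-06-14', '2017-06-18', '2017-06-29')),
  value   = c(13.2, 14.6, 27.1, 26.4, 15.2, 31.3)
)

# Derive year only when an analysis needs it; measurements is left unchanged
withYear <- mutate(measurements, year = year(date))

withYear
```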
4) One data value per field
Don't do this:
| subject | value | sexYear |
|---|---|---|
| A | 13.2 | m2016 |
| B | 14.6 | f2016 |
| C | 27.1 | f2016 |
| A | 26.4 | m2017 |
| B | 15.2 | f2017 |
| C | 31.3 | f2017 |
Do this:
| subject | year | value | sex |
|---|---|---|---|
| A | 2016 | 13.2 | m |
| B | 2016 | 14.6 | f |
| C | 2016 | 27.1 | f |
| A | 2017 | 26.4 | m |
| B | 2017 | 15.2 | f |
| C | 2017 | 31.3 | f |
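A combined field such as sexYear can be split into one value per field. The sketch below uses str_sub from stringr and assumes that the first character is always the sex and the next four characters are always the year. The object names combinedField and oneValuePerField are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)

# Two values (sex and year) packed into a single field
combinedField <- tibble(
  subject = rep(c('A', 'B', 'C'), times = 2),
  value   = c(13.2, 14.6, 27.1, 26.4, 15.2, 31.3),
  sexYear = c('m2016', 'f2016', 'f2016', 'm2017', 'f2017', 'f2017')
)

# Character 1 is the sex; characters 2 to 5 are the year
split <- mutate(
  combinedField,
  sex  = str_sub(sexYear, 1, 1),
  year = as.numeric(str_sub(sexYear, 2, 5))
)

# Keep one value per field and drop the combined column
oneValuePerField <- select(split, subject, year, value, sex)

oneValuePerField
```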

# Install the packages used in this workshop (only needed once):
install.packages('tidyverse')
install.packages('stringr')
install.packages('lubridate')
install.packages('RCurl')
Tibbles:
A regular data frame prints like this:
  subject year value
1       A 2016  13.2
2       B 2016  14.6
3       C 2016  27.1
4       A 2017  26.4
5       B 2017  15.2
6       C 2017  31.3
The same data as a tibble, which also reports its dimensions and column types:
# A tibble: 6 × 3
  subject  year value
   <fctr> <chr> <dbl>
1       A  2016  13.2
2       B  2016  14.6
3       C  2016  27.1
4       A  2017  26.4
5       B  2017  15.2
6       C  2017  31.3
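To see the difference between the two print styles yourself, you can build a small data frame and convert it. This is a minimal sketch: the column types (factor, character, numeric) are chosen to match the tibble header shown in this section, and the object names dataFrame and tibbleData are made up. Newer versions of tibble abbreviate factor columns as fct rather than fctr.

```r
library(tibble)

# A regular data frame with a factor, a character and a numeric column
dataFrame <- data.frame(
  subject = factor(c('A', 'B', 'C', 'A', 'B', 'C')),
  year    = c('2016', '2016', '2016', '2017', '2017', '2017'),
  value   = c(13.2, 14.6, 27.1, 26.4, 15.2, 31.3),
  stringsAsFactors = FALSE
)

# Convert to a tibble; the data are unchanged, only the class and printing differ
tibbleData <- as_tibble(dataFrame)

print(dataFrame)   # plain data frame output
print(tibbleData)  # adds dimensions and column types
```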
The pipe operator (%>%) passes the output of one function to the next function as its first argument, so you do not have to name intermediate objects or nest function calls.
For example, we can use the tbl_df function and a pipe to turn a regular data frame into a tibble (recent versions of dplyr deprecate tbl_df in favor of as_tibble):
dataFrame %>%
  tbl_df
Note the convention of starting a new line after each pipe. This makes your code more readable.
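The benefit of the pipe is easier to see with more than one step. The sketch below writes the same calculation twice, once with nested calls and once with pipes. The data and the object names nested and piped are made up for illustration, and as_tibble is used in place of tbl_df.

```r
library(dplyr)

dataFrame <- data.frame(
  subject = c('A', 'B', 'C', 'A', 'B', 'C'),
  year    = c(2016, 2016, 2016, 2017, 2017, 2017),
  value   = c(13.2, 14.6, 27.1, 26.4, 15.2, 31.3)
)

# Nested calls: read from the inside out
nested <- summarise(
  filter(as_tibble(dataFrame), year == 2017),
  meanValue = mean(value)
)

# Piped calls: read from top to bottom, one step per line
piped <- dataFrame %>%
  as_tibble() %>%
  filter(year == 2017) %>%
  summarise(meanValue = mean(value))

piped
```

Both versions produce the same one-row tibble; the piped version lists the steps in the order they run.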
We can read a data table into R directly as a tibble using the readr function read_csv. For today's work, we will read in files from GitHub, using the RCurl function getURL to download each file's contents from the web.
# Load the packages used below:
library(RCurl)
library(tidyverse)
# Base URL of the raw files on GitHub:
gitSite <- 'https://raw.githubusercontent.com/bsevansunc/rWorkshop/master/'
# Build each file's URL and download its contents as a text string
# (despite the names, these objects hold the file contents, not URLs):
dirtyBirdURL <- getURL(paste0(gitSite, 'dirtyBirdData', '.csv'))
dirtyBandingURL <- getURL(paste0(gitSite, 'dirtyBandingData', '.csv'))
dirtyResightURL <- getURL(paste0(gitSite, 'dirtyResightData', '.csv'))
# Read the downloaded text into tibbles:
dirtyBird <- read_csv(dirtyBirdURL)
dirtyBanding <- read_csv(dirtyBandingURL)
dirtyResight <- read_csv(dirtyResightURL)
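As a possible alternative, read_csv also accepts a URL directly, so the download step could be folded into a small helper. This is only a sketch: gitUrl and readGitCsv are hypothetical helper names, not part of readr or RCurl, and the files are assumed to still be available at the address above.

```r
library(readr)

gitSite <- 'https://raw.githubusercontent.com/bsevansunc/rWorkshop/master/'

# Hypothetical helper: build the full URL for a file name given without extension
gitUrl <- function(fileName) {
  paste0(gitSite, fileName, '.csv')
}

# Hypothetical helper: read one of the workshop files straight from its URL
readGitCsv <- function(fileName) {
  read_csv(gitUrl(fileName))
}
```

With these helpers, a call such as readGitCsv('dirtyBirdData') would return the tibble in one step, provided there is a working internet connection.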