Working with the full tidyverse and the lubridate time-package
suppressWarnings(library(tidyverse))
## Loading tidyverse: ggplot2
## Loading tidyverse: tibble
## Loading tidyverse: tidyr
## Loading tidyverse: readr
## Loading tidyverse: purrr
## Loading tidyverse: dplyr
## Conflicts with tidy packages ----------------------------------------------
## filter(): dplyr, stats
## lag(): dplyr, stats
suppressWarnings(library(lubridate))
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
Read the master weight sheet into R and use the gather function to go from a wide data set to a tall data set.
### read master weight file into R
weights <- read_csv("share_weight.csv")
## Parsed with column specification:
## cols(
## .default = col_double(),
## `ANIMAL ID` = col_character(),
## DOB = col_character(),
## SEX = col_character(),
## GENOTYPE = col_character(),
## `April 4, 2014` = col_character(),
## `April 18, 2014` = col_character(),
## `January 2, 2015` = col_character(),
## `March 27, 2015` = col_character(),
## `July 3, 2015` = col_character(),
## `November 6, 2015` = col_character(),
## `December 18, 2015` = col_character(),
## `January 6, 2016` = col_character(),
## `February 5, 2016` = col_character(),
## `April 1, 2016` = col_character(),
## `August 5, 2016` = col_character(),
## `August 19, 2016` = col_character(),
## `September 2, 2016` = col_character(),
## `April 7, 2017` = col_character(),
## `July 7, 2017` = col_character()
## )
## See spec(...) for full column specifications.
head(weights)
## # A tibble: 6 x 117
## `ANIMAL ID` DOB SEX GENOTYPE `December 13, 2013`
## <chr> <chr> <chr> <chr> <dbl>
## 1 749162 7/31/2013 male wt 32.3
## 2 943492 7/31/2013 male wt 25.9
## 3 914784 7/31/2013 male wt 34.5
## 4 48711 7/31/2013 female wt 24.9
## 5 722836 7/31/2013 female wt 21.8
## 6 869852 7/31/2013 female wt 25.5
## # ... with 112 more variables: `December 27, 2013` <dbl>, `January 10,
## # 2014` <dbl>, `January 24, 2014` <dbl>, `February 7, 2014` <dbl>,
## # `February 21, 2014` <dbl>, `March 7, 2014` <dbl>, `March 21,
## # 2014` <dbl>, `April 4, 2014` <chr>, `April 18, 2014` <chr>, `May 2,
## # 2014` <dbl>, `May 15, 2014` <dbl>, `May 30, 2014` <dbl>, `June 13,
## # 2014` <dbl>, `June 27, 2014` <dbl>, `July 11, 2014` <dbl>, `July 25,
## # 2014` <dbl>, `August 4, 2014` <dbl>, `August 15, 2014` <dbl>, `August
## # 29, 2014` <dbl>, `September 5, 2014` <dbl>, `September 12,
## # 2014` <dbl>, `September 26, 2014` <dbl>, `October 3, 2014` <dbl>,
## # `October 10, 2014` <dbl>, `October 24, 2014` <dbl>, `November 7,
## # 2014` <dbl>, `November 21, 2014` <dbl>, `December 5, 2014` <dbl>,
## # `December 19, 2014` <dbl>, `January 2, 2015` <chr>, `January 16,
## # 2015` <dbl>, `January 30, 2015` <dbl>, `February 6, 2015` <dbl>,
## # `February 13, 2015` <dbl>, `February 27, 2015` <dbl>, `March 6,
## # 2015` <dbl>, `March 13, 2015` <dbl>, `March 27, 2015` <chr>, `April 3,
## # 2015` <dbl>, `April 10, 2015` <dbl>, `April 24, 2015` <dbl>, `May 3,
## # 2015` <dbl>, `May 8, 2015` <dbl>, `May 22, 2015` <dbl>, `June 5,
## # 2015` <dbl>, `June 19, 2015` <dbl>, `July 3, 2015` <chr>, `July 17,
## # 2015` <dbl>, `July 31, 2015` <dbl>, `August 7, 2015` <dbl>, `August
## # 14, 2015` <dbl>, `August 28, 2015` <dbl>, `September 4, 2015` <dbl>,
## # `September 11, 2015` <dbl>, `September 25, 2015` <dbl>, `October 2,
## # 2015` <dbl>, `October 9, 2015` <dbl>, `October 23, 2015` <dbl>,
## # `November 6, 2015` <chr>, `November 20, 2015` <dbl>, `December 4,
## # 2015` <dbl>, `December 18, 2015` <chr>, `January 6, 2016` <chr>,
## # `January 22, 2016` <dbl>, `February 5, 2016` <chr>, `February 19,
## # 2016` <dbl>, `March 4, 2016` <dbl>, `March 18, 2016` <dbl>, `April 1,
## # 2016` <chr>, `April 14, 2016` <dbl>, `April 28, 2016` <dbl>, `May 6,
## # 2016` <dbl>, `May 12, 2016` <dbl>, `May 27, 2016` <dbl>, `June 3,
## # 2016` <dbl>, `June 10, 2016` <dbl>, `June 24, 2016` <dbl>, `July 1,
## # 2016` <dbl>, `July 8, 2016` <dbl>, `July 22, 2016` <dbl>, `August 5,
## # 2016` <chr>, `August 19, 2016` <chr>, `September 2, 2016` <chr>,
## # `September 16, 2016` <dbl>, `September 29, 2016` <dbl>, `October 7,
## # 2016` <dbl>, `October 14, 2016` <dbl>, `October 28, 2016` <dbl>,
## # `November 4, 2016` <dbl>, `November 11, 2016` <dbl>, `December 2,
## # 2016` <dbl>, `December 9, 2016` <dbl>, `December 23, 2016` <dbl>,
## # `January 6, 2017` <dbl>, `January 20, 2017` <dbl>, `February 3,
## # 2017` <dbl>, `February 17, 2017` <dbl>, `March 3, 2017` <dbl>, `March
## # 17, 2017` <dbl>, `April 7, 2017` <chr>, ...
The gather() function is used to shift data that is a column header into a single row. This converts for example the column headers Dec 13, 2013, December 27th, 2013, January 12th, 2014, etc from column headers into a single column named “date.” The corresponding weights are matched to the appropriate animals and into a new column named “weight.” We can see that now the date and corresponding weights are matched into new columns named date/weight.
### Gather the data from columns into a single row for tidy data
wt_gather <- weights %>% gather(date, weight, 5:117)
head(wt_gather)
## # A tibble: 6 x 6
## `ANIMAL ID` DOB SEX GENOTYPE date weight
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 749162 7/31/2013 male wt December 13, 2013 32.3
## 2 943492 7/31/2013 male wt December 13, 2013 25.9
## 3 914784 7/31/2013 male wt December 13, 2013 34.5
## 4 48711 7/31/2013 female wt December 13, 2013 24.9
## 5 722836 7/31/2013 female wt December 13, 2013 21.8
## 6 869852 7/31/2013 female wt December 13, 2013 25.5
We can then convert the dates from the December 13th, 2013 format into a 12/13/13 (mdy) format that is machine-readable.
### Convert the dates into machine-readable month-day-year dates
wt_gather$date <- mdy(wt_gather$date)
wt_gather$DOB <- mdy(wt_gather$DOB)
Using the timediff function behind the scenes we can create a new column (age_wt) that calculates the number of days between the date of the weight and the date of birth. We can exclude the dates that are negative. Additionally we convert weight to numeric, as it was originally imported as a character (eg letter).
### Calculate the difference in days between the date of birth (DOB)
### and the specific date the weight was taken
### convert weight to numeric instead of character
wt_age <- wt_gather %>%
mutate(age_wt = date - DOB) %>%
filter(age_wt > 0)
wt_age$weight <- as.numeric(wt_age$weight)
## Warning: NAs introduced by coercion
head(wt_age)
## # A tibble: 6 x 7
## `ANIMAL ID` DOB SEX GENOTYPE date weight age_wt
## <chr> <date> <chr> <chr> <date> <dbl> <time>
## 1 749162 2013-07-31 male wt 2013-12-13 32.3 135 days
## 2 943492 2013-07-31 male wt 2013-12-13 25.9 135 days
## 3 914784 2013-07-31 male wt 2013-12-13 34.5 135 days
## 4 48711 2013-07-31 female wt 2013-12-13 24.9 135 days
## 5 722836 2013-07-31 female wt 2013-12-13 21.8 135 days
## 6 869852 2013-07-31 female wt 2013-12-13 25.5 135 days
To associate the days w/ number of months between weights we can pull the wt_age (time between weight date/DOB) and then “bin” it into 30 day increments. This is slightly messy for use in statistics, as 31 days = 2 months, while 30 days = 1 month, but it works perfectly for graphing. We then add a new column to our original dataframe that incorporates the matching time-bins to the animals. In this example, 135 days = 4 months + 15 days = 5 months.
### create a vector and then bin
age_vec <- wt_age$age_wt
bin_x <- seq(0, 1530, 30)
bin_vec <- .bincode(age_vec, bin_x, TRUE, TRUE)
wt_age["bin"] <- bin_vec
head(wt_age)
## # A tibble: 6 x 8
## `ANIMAL ID` DOB SEX GENOTYPE date weight age_wt bin
## <chr> <date> <chr> <chr> <date> <dbl> <time> <int>
## 1 749162 2013-07-31 male wt 2013-12-13 32.3 135 days 5
## 2 943492 2013-07-31 male wt 2013-12-13 25.9 135 days 5
## 3 914784 2013-07-31 male wt 2013-12-13 34.5 135 days 5
## 4 48711 2013-07-31 female wt 2013-12-13 24.9 135 days 5
## 5 722836 2013-07-31 female wt 2013-12-13 21.8 135 days 5
## 6 869852 2013-07-31 female wt 2013-12-13 25.5 135 days 5
Now that we have a nice tidy data set we can exclude animals w/ unknown genotypes, weights that don’t exist, or outliers (one animal has a bodyweight of 2708 g). Our summary stats are grouped by sex, genotype, and then time bin. The summary stats include the N and the mean weight. From the header we can see the new dataframe, which now has headers for sex, gt, time bin, n, and then weight. It is origin/systat ready, or we can keep going in R for preliminary graphs.
### start the summarize pipe, exclude unknown genotypes, NA weights or outliers (> 55)
### group by SEX, GENOTYPE, and by the age in days then calc the mean
wt_sum <- wt_age %>%
filter(GENOTYPE != "NA", GENOTYPE != "?", weight != "NA", weight < 55) %>%
group_by(SEX, GENOTYPE, bin) %>%
summarise(n = n(), weight = mean(weight, na.rm = T))
### view the end results
wt_sum
## # A tibble: 148 x 5
## # Groups: SEX, GENOTYPE [?]
## SEX GENOTYPE bin n weight
## <chr> <chr> <int> <int> <dbl>
## 1 female het 1 26 14.50769
## 2 female het 2 251 16.90837
## 3 female het 3 304 18.96283
## 4 female het 4 296 20.26081
## 5 female het 5 261 21.23908
## 6 female het 6 165 21.87152
## 7 female het 7 140 22.56571
## 8 female het 8 119 23.11597
## 9 female het 9 120 23.41667
## 10 female het 10 112 23.60357
## # ... with 138 more rows
We can graph the prelim data, and facet by sex/genotype. I applied a smoothed line of best fit, not a true correlation but a moving average equivalent.
### Faceted graph by genotype and sex
(g1 <- ggplot(data = wt_sum, aes(x = bin, y = weight, color = GENOTYPE)) +
geom_point(size = 6, alpha = 0.2) +
geom_smooth(alpha = .5, size = 1.5) +
facet_wrap(~SEX, nrow = 2) +
labs(x = "Age in months", y = "Body weight in grams") +
scale_x_continuous(breaks = c(0, 5, 10, 15, 20, 25, 30)))
## `geom_smooth()` using method = 'loess'
These last two steps would save the graph and the data file into excel.
ggsave(“weight_age.png”, width = 20, height = 20, units = “cm”)
write.csv(wt_age, “wt_age.csv”)