Introduction:

Part 1 had you follow my instructions for loading and working with data. This part will have you dive into a VERY useful R package called “dplyr”, which allows you to easily manipulate and rework data. To complete this lab, you will follow the instructions on this website:

https://datacarpentry.org/R-genomics/04-dplyr.html

***You will need to bring the “metadata.csv” file (found on Canvas) into R. NOTE: for the code you use later on to work, you NEED to name your data frame “metadata”.

Follow this guide step-by-step. This will be as simple as copying and pasting code from the site. You WILL need to perform the challenge outlined!

Place all of your code for this assignment below, then knit this document to an HTML file to save and upload to Canvas.

library("dplyr")  # load the dplyr package
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
metadata <- read.table('Ecoli_metadata.csv', sep = ',', header = TRUE)

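# Print the full data frame
metadata
# Keep only the listed columns
select(metadata, sample, clade, cit, genome_size)
# Keep only the rows where cit is "plus"
filter(metadata, cit == "plus")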
# Chain filter() and select() with pipes
metadata %>%
  filter(cit == "plus") %>%
  select(sample, generation, clade)
# Store the piped result in a new data frame
meta_citplus <- metadata %>%
  filter(cit == "plus") %>%
  select(sample, generation, clade)

meta_citplus
# Challenge: rows where clade is "Cit+", keeping sample, cit, and genome_size
metadata %>%
  filter(clade == "Cit+") %>%
  select(sample, cit, genome_size)
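# Add a genome_bp column equal to genome_size * 1e6
metadata %>%
  mutate(genome_bp = genome_size * 1e6)
# Same, but show only the first few rows
metadata %>%
  mutate(genome_bp = genome_size * 1e6) %>%
  head()
# Also drop rows where clade is missing (NA)
metadata %>%
  mutate(genome_bp = genome_size * 1e6) %>%
  filter(!is.na(clade)) %>%
  head()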
# Count the number of rows in each cit group
metadata %>%
  group_by(cit) %>%
  summarize(n())
# Mean genome size for each cit group
metadata %>%
  group_by(cit) %>%
  summarize(mean_size = mean(genome_size, na.rm = TRUE))
# Mean genome size for each combination of cit and clade
metadata %>%
  group_by(cit, clade) %>%
  summarize(mean_size = mean(genome_size, na.rm = TRUE))
## `summarise()` has grouped output by 'cit'. You can override using the `.groups`
## argument.
# Same summary, dropping rows where clade is missing (NA)
metadata %>%
  group_by(cit, clade) %>%
  summarize(mean_size = mean(genome_size, na.rm = TRUE)) %>%
  filter(!is.na(clade))
## `summarise()` has grouped output by 'cit'. You can override using the `.groups`
## argument.
# Summarize two values at once: mean genome size and earliest generation
metadata %>%
  group_by(cit, clade) %>%
  summarize(mean_size = mean(genome_size, na.rm = TRUE),
            min_generation = min(generation))
## `summarise()` has grouped output by 'cit'. You can override using the `.groups`
## argument.
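
The `summarise()` messages above are informational, not errors: after grouping by two columns, the summarized result is still grouped by `cit`. Below is a minimal sketch of how the `.groups` argument named in the message can be set so the result comes back ungrouped and the message is not printed. The `toy` data frame and its values are made up for illustration and are not taken from the lab data.

```r
library(dplyr)

# Hypothetical toy data standing in for the lab's metadata (values are invented)
toy <- data.frame(
  cit = c("plus", "plus", "minus", "minus"),
  clade = c("Cit+", "Cit+", "A", "B"),
  genome_size = c(4.6, 4.7, 4.5, 4.8)
)

# .groups = "drop" returns an ungrouped result, so the grouping message is not shown
toy_means <- toy %>%
  group_by(cit, clade) %>%
  summarize(mean_size = mean(genome_size, na.rm = TRUE), .groups = "drop")

toy_means
```

The same argument could be added to any of the two-column summaries above. Leaving it out is also fine for this lab, since the message does not change the numbers.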