The assignments for our online R training course will all take the form of R markdown documents like this one. These are files ending in ‘.Rmd’ that combine R code blocks with plain text, such as this paragraph that you’re reading now.
To run the R code blocks you need to have this R markdown file open in RStudio. If you’re reading this in RStudio right now then all is good and well. If not, please start RStudio and open this file using the ‘Open File…’ option from the ‘File’ menu.
Also, it might be a good idea to create a folder on your computer in which to store this assignment and the assignments for the coming weeks, along with some data files that will be provided for use in some of the future assignments.
In these assignments we describe each task in plain text, followed by an empty code block in which you write the R commands that answer the exercise.
Here’s an example. Suppose we ask you to add two numbers, 34 and 102, together. The empty code block where you would carry out this calculation would look like this:
It should appear as a grey box with three small grey and green icons at the top on the right-hand side.
You can fill in the answer as follows:
34 + 102
## [1] 136
You can add extra lines as necessary by hitting the ‘Return’ key.
To run this command, click on the green arrow or triangle icon, the one that is furthest to the right and points to the right. If you hover over this icon, you should get a tooltip saying ‘Run current chunk’. Click on it and see what happens.
You should see the answer printed just below the code block, much as you would have seen it in the Console, prefixed with the (hopefully) now familiar ‘[1]’. This prefix indicates that the result is a vector and that the line of output starts at its first element.
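As a small illustration of that ‘[1]’ prefix, the sketch below stores the result of the sum in a variable (the name `total` is just an illustrative choice) and checks that even a single number is a vector of length one. It then prints a longer vector so that you can see the index at the start of each output line.

```r
# Store the result of the calculation; 'total' is an illustrative name
total <- 34 + 102

# A single number in R is a numeric vector of length one
length(total)
is.vector(total)

# With a longer vector, each line of output starts with the index
# of the first element shown on that line
print(101:130)
```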
In this first assignment, we are deliberately going to restrict our use of the markdown format to just a few features. You’ll notice the header containing the title, author and date and also section headings starting with ‘#’. Other than that we will just use sections of plain text and R code blocks or chunks as they’re more properly known.
As we go through successive assignments during the course, we’ll introduce more features of R markdown. It’s a great way of writing R code for analyzing and visualizing your data that lets you present your work in beautiful, self-describing reports, a sure-fire way to impress your group leader and colleagues alike.
R markdown is really easy to learn and will let you do a lot of cool things. Our course website was created with R markdown documents that are not that much more complicated than the file you’re reading right now for this assignment.
Now on with this week’s exercises.
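# Convert a temperature from degrees Fahrenheit to degrees Celsius
fahrenheit.to.celsius <- function(x){
  x <- (5/9) * (x - 32)
  return(x)
}
# Avoid calling the vector 'list', which would mask the base R function list()
temps_f <- c(45,96,451)
lapply(temps_f, fahrenheit.to.celsius)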
## [[1]]
## [1] 7.222222
##
## [[2]]
## [1] 35.55556
##
## [[3]]
## [1] 232.7778
Hint: just do what you’d normally do if you can’t remember the formula for converting between Celsius and Fahrenheit (Google in my case).
If you like, you can experiment with getting your R code right in the Console window first and then copy it into the code chunk above when you’re happy with it. It’s not crucial and getting it wrong in the R markdown is no big deal. You can always fix any problems (the most likely being forgetting to use parentheses or brackets in the right place) and re-run your code using the green arrow/triangle icon.
Check you’ve got the right answer by finding a web page with a handy conversion tool.
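# Convert a temperature from degrees Celsius to degrees Fahrenheit
celsius.to.fahrenheit <- function(x){
  x <- 32 + (x * 1.8)
  return(x)
}
# Avoid calling the vector 'list', which would mask the base R function list()
temps_c <- c(-65,100,20)
lapply(temps_c, celsius.to.fahrenheit)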
## [[1]]
## [1] -85
##
## [[2]]
## [1] 212
##
## [[3]]
## [1] 68
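The conversions above go through `lapply()`, which returns a list. Because arithmetic in R is vectorised, the same functions can also be applied to a whole vector in one call. The following is a minimal sketch of that alternative, not the only way to do it; the variable names `temps_f` and `temps_c` are illustrative. Converting to Celsius and back again is also a quick sanity check that the two formulas are consistent with each other.

```r
# Same formulas as in the exercises, written as one-line functions
fahrenheit.to.celsius <- function(x) (5 / 9) * (x - 32)
celsius.to.fahrenheit <- function(x) 32 + (x * 1.8)

# Arithmetic in R is vectorised, so the functions accept a whole vector
temps_f <- c(45, 96, 451)   # illustrative variable name
temps_c <- fahrenheit.to.celsius(temps_f)
temps_c

# Converting back should recover the original values (up to rounding error)
all.equal(celsius.to.fahrenheit(temps_c), temps_f)
```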
obs <- seq(1, 365, by = 5)
obs
## [1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91
## [20] 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186
## [39] 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281
## [58] 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361
Your friendly neighbourhood statistician has suggested that there should be an R function to do that. What is the function, and how would you find out about it? What code will you use to create the sequence? Check the resulting vector.
new_obs <- function(x){
x <- seq(1, x, by = 5)
return(x)
}
new_obs(365)
## [1] 1 6 11 16 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91
## [20] 96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186
## [39] 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281
## [58] 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361
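One possible way of tracking down a function such as `seq()` and checking the vector it produces is sketched below. `apropos()` searches function names by pattern, and `args()` shows a function's arguments; the search pattern used here is only an example.

```r
# Search the names of available functions for anything starting with "seq"
found <- apropos("^seq")
found

# Inspect the arguments that seq() accepts without opening the help page
args(seq.default)

# In an interactive session, ?seq or help("seq") opens the full help page,
# and ??sequence searches the installed documentation by keyword

# Check the resulting vector: its length and its first and last elements
obs <- seq(1, 365, by = 5)
length(obs)
head(obs, 3)
tail(obs, 3)
```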
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")
typeof(tricky)
## [1] "character"
typeof(num_char)
## [1] "character"
typeof(num_logical)
## [1] "double"
typeof(char_logical)
## [1] "character"
class(tricky)
## [1] "character"
class(num_char)
## [1] "character"
class(num_logical)
## [1] "numeric"
class(char_logical)
## [1] "character"
Create a new code chunk to test each of the vectors in a separate block. You can do this by using the ‘Insert’ menu just at the top of the pane for this markdown file and selecting R for an R code chunk, or by using the keyboard shortcut (cmd-alt-i on a Mac, ctrl-alt-i on Windows and Linux).
You should find that R coerces the data to a lowest common denominator - can you work out the hierarchy?
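One way to explore the hierarchy is to combine values of progressively more general types and ask for the type of the result each time. The sketch below suggests the ordering logical, then integer, then double, then character; the example values are arbitrary.

```r
# Combine progressively more general types and look at the resulting type
typeof(c(TRUE, FALSE))          # logical only
typeof(c(TRUE, 1L))             # logical + integer
typeof(c(TRUE, 1L, 2.5))        # logical + integer + double
typeof(c(TRUE, 1L, 2.5, "a"))   # logical + integer + double + character

# Coercion changes the values too: TRUE becomes 1, and numbers become text
c(1, 2, 3, TRUE)
c(1, 2, 3, "4")

# Converting back has to be done explicitly
as.numeric(c(1, 2, 3, "4"))
```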
days <- c(1, 2, 4, 6, 8, 12, 16)
counts <- days ^ 2 + rnorm(days, mean = days)
# round the simulated counts to the nearest whole number
counts <- round(counts)
plot(days, counts, pch = 5, col = "purple")
Check out what we did in the above example for getting some example counts data points. Can you make sense of what is going on here? Look at the help page for the rnorm function.
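The sketch below breaks that line of code into steps. It relies on two documented behaviours of `rnorm()`: if the first argument is a vector, its length is used as the number of values to draw, and the `mean` argument can itself be a vector. The call to `set.seed()` and the variable name `noise` are additions for illustration only.

```r
set.seed(42)  # illustrative seed, only so that the sketch is reproducible

days <- c(1, 2, 4, 6, 8, 12, 16)

# When the first argument is a vector, rnorm() uses its length as the
# number of values to draw, so this returns 7 random numbers
noise <- rnorm(days, mean = days)
length(noise)

# 'mean' is also a vector, so the i-th value is drawn from a normal
# distribution centred on days[i] (the default standard deviation is 1)
counts <- days ^ 2 + noise
counts
```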
Our counts data don’t really look like counts as they are not whole numbers. Find the function in R that can round these up or down to the closest whole number and apply it in the above code chunk.
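Base R has several rounding functions, and a quick comparison on a few made-up values shows how they differ. The vector `x` below is purely illustrative.

```r
x <- c(2.3, 2.7, -1.2)  # illustrative values

round(x)     # nearest whole number
floor(x)     # always rounds down
ceiling(x)   # always rounds up
```

Of the three, `round()` is the one that goes to the closest whole number in either direction.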
| Day | LineA | LineB | LineC |
|---|---|---|---|
| 1 | 4 | 5 | 14 |
| 2 | 9 | 17 | 16 |
| 3 | 7 | 22 | 10 |
| 4 | 12 | 20 | 14 |
| 5 | 23 | 24 | 20 |
| 6 | 8 | 18 | 12 |
Create some R vectors to hold this data and provide summary statistics for the number of cells for each cell line. Plot some base R graphs if you like. Describe the data.
Day <- 1:6
LineA <- c(4,9,7,12,23,8)
LineB <- c(5,17,22,20,24,18)
LineC <- c(14,16,10,14,20,12)
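# plot() is called for its side effect of drawing, so there is no need to store its result
plot(Day, LineA, type="o", col="blue", pch="o", lty=1, ylim=c(0,30), ylab="Count")
# Add second curve to the same plot by calling points() and lines()
points(Day, LineB, col="red", pch="*")
lines(Day, LineB, col="red", lty=2)
# Add third curve to the same plot by calling points() and lines()
points(Day, LineC, col="dark red", pch="+")
lines(Day, LineC, col="dark red", lty=3)
# Add a legend so that the three cell lines can be told apart
legend("topleft", legend=c("LineA", "LineB", "LineC"), col=c("blue", "red", "dark red"), lty=1:3, pch=c("o", "*", "+"))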
summary(LineA)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.00 7.25 8.50 10.50 11.25 23.00
summary(LineB)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.00 17.25 19.00 17.67 21.50 24.00
summary(LineC)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 10.00 12.50 14.00 14.33 15.50 20.00
You are then provided with assay data that states that LineA had an activity of 4.2 per cell, LineB an activity of 3.4 and LineC of 1.3.
Use R to calculate the activities of each sample on each day and provide summary statistics of activity for each line.
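A minimal sketch of one way to approach this is shown below, assuming that the activity of a sample is simply its cell count multiplied by the per-cell activity of its line. The variable names `activityA`, `activityB` and `activityC` are illustrative.

```r
# Cell counts for each line on days 1 to 6, taken from the table above
LineA <- c(4, 9, 7, 12, 23, 8)
LineB <- c(5, 17, 22, 20, 24, 18)
LineC <- c(14, 16, 10, 14, 20, 12)

# Multiplying a vector by a single number scales every element,
# giving the activity of each sample on each day
activityA <- LineA * 4.2
activityB <- LineB * 3.4
activityC <- LineC * 1.3

# Summary statistics of activity for each line
summary(activityA)
summary(activityB)
summary(activityC)
```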
Click on the ‘Knit’ menu at the top of this file and select whichever option you prefer to create an HTML, PDF or Word document version of your assignment. This will run all the code chunks and “knit” the results with the surrounding text to produce a report.