A word about R markdown

The assignments for our online R training course will all take the form of R markdown documents like this one. These are files ending in ‘.Rmd’ that combine R code blocks with plain text, such as this paragraph that you’re reading now.

To run the R code blocks you need to have this R markdown file open in RStudio. If you’re reading this in RStudio right now then all is good and well. If not, please start RStudio and open this file using the ‘Open File…’ option from the ‘File’ menu.

Also, it might be a good idea to create a folder on your computer in which to store this assignment and the assignments for the coming weeks, along with some data files that will be provided for use in some of the future assignments.

In these assignments, we will describe each task in plain text, followed by an empty code block in which you write the R commands that answer the exercise.

Here’s an example. Suppose we ask you to add two numbers, 34 and 102, together. The empty code block where you would carry out this calculation would look like this:

It should appear as a grey box with three small grey and green icons at the top on the right-hand side.

You can fill in the answer as follows:

34 + 102
## [1] 136

You can add extra lines as necessary by hitting the ‘Return’ key.

Now, to run this command, click on the green arrow or triangle icon, the rightmost one, which points to the right. If you hover over this icon, you should get a tool tip saying ‘Run current chunk’. Click on this and see what happens.

You should see the answer printed just below the code block, much as you would have seen in the Console, prefixed with the (hopefully) now familiar ‘[1]’, which indicates that this is a vector and that you’re looking at its first element.
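
To make the ‘[1]’ prefix a little more concrete, here is a minimal sketch showing that a single number in R is simply a vector of length one. The variable name `answer` is only illustrative and is not part of the assignment.

```r
# Store the result of the calculation (hypothetical variable name)
answer <- 34 + 102

is.vector(answer)  # TRUE: even a single number is a vector
length(answer)     # 1: the vector has exactly one element
answer[1]          # 136: the element that the [1] prefix refers to
```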

In this first assignment, we are deliberately going to restrict our use of the markdown format to just a few features. You’ll notice the header containing the title, author and date and also section headings starting with ‘#’. Other than that we will just use sections of plain text and R code blocks or chunks as they’re more properly known.

As we go through successive assignments during the course, we’ll introduce more features of R markdown. It’s a great way of writing R code for analyzing and visualizing your data that lets you present your work in beautiful, self-describing reports, a sure-fire way to impress your group leader and colleagues alike.

R markdown is really easy to learn and will let you do a lot of cool things. Our course website was created with R markdown documents that are not that much more complicated than the file you’re reading right now for this assignment.

Now on with this week’s exercises.

Using R as a calculator

  1. Convert the following temperatures given in degrees Fahrenheit to Celsius: 45, 96, 451
fahrenheit.to.celsius <- function(x){
  x <- (5/9) * (x - 32)
  return(x)
}

# Avoid naming the vector `list`, which would mask the base R function list()
temps_f <- c(45, 96, 451)

lapply(temps_f, fahrenheit.to.celsius)
## [[1]]
## [1] 7.222222
## 
## [[2]]
## [1] 35.55556
## 
## [[3]]
## [1] 232.7778

Hint: just do what you’d normally do if you can’t remember the formula for converting between Celsius and Fahrenheit (Google in my case).

If you like, you can experiment with getting your R code right in the Console window first and then copy it into the code chunk above when you’re happy with it. It’s not crucial and getting it wrong in the R markdown is no big deal. You can always fix any problems (the most likely being forgetting to use parentheses or brackets in the right place) and re-run your code using the green arrow/triangle icon.

Check you’ve got the right answer by finding a web page with a handy conversion tool.

  2. Similarly, convert the following temperatures in degrees Celsius to Fahrenheit: -65, 100, 20
celsius.to.fahrenheit <- function(x){
  x <- 32 + (x * 1.8)
  return(x)
}

# Avoid naming the vector `list`, which would mask the base R function list()
temps_c <- c(-65, 100, 20)

lapply(temps_c, celsius.to.fahrenheit)
## [[1]]
## [1] -85
## 
## [[2]]
## [1] 212
## 
## [[3]]
## [1] 68
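
Because arithmetic in R works element by element on vectors, both conversions can also be written without `lapply()`. The sketch below is one possible alternative, not the required answer. The helper names `f_to_c` and `c_to_f` and the vector names are hypothetical. It ends with a round-trip check that converting to Celsius and back returns the original values.

```r
# Hypothetical helper names; the formulas are the standard conversions
f_to_c <- function(fahrenheit) (fahrenheit - 32) * 5 / 9
c_to_f <- function(celsius) celsius * 9 / 5 + 32

temps_f <- c(45, 96, 451)
temps_c <- c(-65, 100, 20)

# The arithmetic is applied to every element, so the result is a plain
# numeric vector rather than a list
f_to_c(temps_f)
c_to_f(temps_c)

# Round trip: all.equal() compares with a tolerance for floating-point error
all.equal(c_to_f(f_to_c(temps_f)), temps_f)
```

Returning a numeric vector rather than a list also makes the results easier to reuse in later calculations or plots.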

Generating sequence vectors

  1. Generate a sequence of numbers representing the days at which you take a measurement or a sample at 5-day intervals for about a year.
obs <- seq(1, 365, by = 5)
obs
##  [1]   1   6  11  16  21  26  31  36  41  46  51  56  61  66  71  76  81  86  91
## [20]  96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186
## [39] 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281
## [58] 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361

Your friendly neighbourhood statistician has suggested that there should be an R function to do that. What is the function, how do you find out about it, and what code will you use to create the sequence? Check the resulting vector.

new_obs <- function(x){
  x <- seq(1, x, by = 5)
  return(x)
}
new_obs(365)
##  [1]   1   6  11  16  21  26  31  36  41  46  51  56  61  66  71  76  81  86  91
## [20]  96 101 106 111 116 121 126 131 136 141 146 151 156 161 166 171 176 181 186
## [39] 191 196 201 206 211 216 221 226 231 236 241 246 251 256 261 266 271 276 281
## [58] 286 291 296 301 306 311 316 321 326 331 336 341 346 351 356 361
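
One way to approach the ‘how do you find out about it’ part is sketched below. In an interactive session you could search the help system with `help.search("sequence")` (or `??sequence`) and then open the page with `?seq`. The code itself only prints the argument list and then checks the resulting vector. The checks chosen here are suggestions, not the only sensible ones.

```r
# Show the arguments accepted by the default method of seq()
# (run ?seq interactively to read the full help page)
args(seq.default)

obs <- seq(from = 1, to = 365, by = 5)

length(obs)          # 73 sampling days
range(obs)           # first and last day: 1 and 361
all(diff(obs) == 5)  # TRUE: every gap between samples is 5 days
```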

Looking at types of objects

  1. Run the code below, then use the typeof() and/or class() function (check its help page) to see how R treats each newly created vector.
num_char <- c(1, 2, 3, "a")
num_logical <- c(1, 2, 3, TRUE)
char_logical <- c("a", "b", "c", TRUE)
tricky <- c(1, 2, 3, "4")

typeof(tricky)
## [1] "character"
typeof(num_char)
## [1] "character"
typeof(num_logical)
## [1] "double"
typeof(char_logical)
## [1] "character"
class(tricky)
## [1] "character"
class(num_char)
## [1] "character"
class(num_logical)
## [1] "numeric"
class(char_logical)
## [1] "character"

Create a new code chunk to test each of the vectors in a separate block. You can do this by using the ‘Insert’ menu just at the top of the pane for this markdown file and selecting R for an R code chunk, or by using the keyboard shortcut (on a Mac this is cmd-alt-i).

You should find that R coerces the data to a lowest common denominator - can you work out the hierarchy?
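
If you want to test your idea of the hierarchy, the sketch below combines two types at a time, which makes the ordering easier to see than the four-element vectors above. The final line shows that a number stored as text, as in `tricky`, can be converted back explicitly.

```r
# Combine two types at a time and see which one "wins"
typeof(c(TRUE, 2L))    # "integer":   logical is coerced to integer
typeof(c(2L, 3.5))     # "double":    integer is coerced to double
typeof(c(3.5, "a"))    # "character": double is coerced to character
typeof(c(TRUE, "a"))   # "character": logical is coerced to character

# TRUE becomes the string "TRUE" when mixed with character data
c(TRUE, "a")

# Numbers stored as text can be converted back explicitly
as.numeric(c(1, 2, 3, "4"))
```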

Plotting data

  1. You have been asked to plot a graph of counts data measured over several days. The Principal Investigator has requested that certain symbols be used for each dataset being plotted (he’s a bit like that!). What command would you use to find the parameter that sets the plotting symbol for the plot.default command, and what is the parameter’s name?
days <- c(1, 2, 4, 6, 8, 12, 16)
counts <- days ^ 2 + rnorm(days, mean = days)

# add your code here
plot.default(days, counts, pch = 5, col = "purple")

Check out what we did in the above example for getting some example counts data points. Can you make sense of what is going on here? Look at the help page for the rnorm function.
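
As a hint for reading the `rnorm` help page: when its first argument is a vector with more than one element, the length of that vector is used as the number of values to draw, and `mean` can itself be a vector giving one mean per value. The sketch below compares the call used above with a more explicit equivalent. The seed value and the names `noise` and `noise_explicit` are arbitrary choices made here for illustration.

```r
days <- c(1, 2, 4, 6, 8, 12, 16)

set.seed(1)  # arbitrary seed, used only so both calls draw the same numbers
noise <- rnorm(days, mean = days)  # same form as in the chunk above

set.seed(1)
# Explicit form: one draw per day, centred on that day's value, sd of 1
noise_explicit <- rnorm(n = length(days), mean = days, sd = 1)

length(noise)                     # 7, one value per element of days
identical(noise, noise_explicit)  # TRUE: the two calls are equivalent
```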

Our counts data don’t really look like counts as they are not whole numbers. Find the function in R that can round these up or down to the closest whole number and apply it in the above code chunk.
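
One possible answer uses `round()`, which rounds to the nearest whole number by default. The related functions `floor()` and `ceiling()` always round down and up respectively. In this sketch the seed is an arbitrary assumption added so that the simulated counts are repeatable. It is not part of the original chunk.

```r
days <- c(1, 2, 4, 6, 8, 12, 16)

set.seed(1)  # arbitrary seed (an assumption) for repeatable simulated data
counts <- round(days ^ 2 + rnorm(days, mean = days))
counts

# Every value is now a whole number
all(counts == floor(counts))

# How the three rounding functions differ
round(c(2.4, 2.6, -1.2))  # 2 3 -1
floor(2.6)                # 2
ceiling(2.4)              # 3
```

The rounded `counts` vector can then be passed to the plotting call in place of the unrounded values.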

Exploring and summarizing data

  1. Your colleague has supplied you with the following table of data (number of cells per sample volume):
Day LineA LineB LineC
1 4 5 14
2 9 17 16
3 7 22 10
4 12 20 14
5 23 24 20
6 8 18 12

Create some R vectors to hold this data and provide summary statistics for number of cells for each cell line. Plot some base R graphs if you like. Describe the data.

Day <- 1:6
LineA <- c(4,9,7,12,23,8)
LineB <- c(5,17,22,20,24,18)
LineC <- c(14,16,10,14,20,12)

 
# plot() draws the graph and returns NULL, so there is nothing useful to assign
plot(Day, LineA, type = "o", col = "blue", pch = "o", lty = 1, ylim = c(0, 30), ylab = "Count")

# Add second curve to the same plot by calling points() and lines()
points(Day, LineB, col="red", pch="*")
lines(Day, LineB, col="red",lty=2)

# Add third curve to the same plot by calling points() and lines()
points(Day, LineC, col="dark red",pch="+")
lines(Day, LineC, col="dark red", lty=3)

summary(LineA)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.00    7.25    8.50   10.50   11.25   23.00
summary(LineB)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    5.00   17.25   19.00   17.67   21.50   24.00
summary(LineC)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   12.50   14.00   14.33   15.50   20.00

You are then provided with assay data that states that LineA had an activity of 4.2 per cell, LineB an activity of 3.4 and LineC of 1.3.

Use R to calculate the activities of each sample on each day and provide summary statistics of activity for each line.
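
A minimal sketch of one way to do this is shown below. It assumes that the activity of a sample is the number of cells multiplied by the activity per cell for that line. This is an interpretation of the exercise rather than something stated in the assay data. The names `ActivityA`, `ActivityB` and `ActivityC` are illustrative.

```r
LineA <- c(4, 9, 7, 12, 23, 8)
LineB <- c(5, 17, 22, 20, 24, 18)
LineC <- c(14, 16, 10, 14, 20, 12)

# Assumption: sample activity = cell count * activity per cell.
# Multiplying a vector by a single number scales every element.
ActivityA <- LineA * 4.2
ActivityB <- LineB * 3.4
ActivityC <- LineC * 1.3

summary(ActivityA)
summary(ActivityB)
summary(ActivityC)
```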

Creating a report for your assignment

Click on the ‘Knit’ menu at the top of this file and select whichever option you prefer to create an HTML, PDF or Word document version of your assignment. This will run all the code chunks and “knit” the results with the surrounding text to produce a report.