R debut

My top ten tricks for new R users

I am a new-ish R user myself (most of my own computational projects are in Python). I look forward to learning more along with you. Here’s my current top ten list of things to know. If you’re new to R, I suggest that you carefully read each line and make sure you understand where the results come from.

0: Install new packages

Sometimes you’ll get an error saying that a package is unknown (in R the message usually reads “there is no package called ...”). In that case, click on the Packages tab in the lower right subwindow of RStudio. Look for the package you want, for example “ggplot2”. If it is listed, check the box next to its name to load it. If it is not listed, click on Install and follow the directions. In your R Markdown file, you should also put a line of code like this, so that the package is loaded whenever the file is run:

library(ggplot2)
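If you would rather check for a package from code than from the Packages tab, something like the following may work. This is only a sketch: the helper name “is_installed” is made up for illustration, and the “install.packages” call is left commented out so that nothing is downloaded by accident.

```r
# Hypothetical helper (not part of R): TRUE if the package can be loaded.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with R, so this should print TRUE.
print(is_installed("stats"))

# For a package that is missing, you could uncomment the next line:
# if (!is_installed("ggplot2")) install.packages("ggplot2")
```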

1: Use R as a calculator

You can use R for basic arithmetic like this:

(1+(4-2)*2)^2 / 5

## [1] 5

There’s a lot more beyond arithmetic! Two other neat basic operations are the integer quotient (“%/%”) and the remainder (“%%”). Seventeen divided by five is three with remainder two; compare these lines:

print(17 / 5)

## [1] 3.4

print(17 %/%  5)

## [1] 3

print(17 %%  5)

## [1] 2
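As a quick check of how the quotient and remainder fit together, here is a small sketch (the variable names are arbitrary). It also shows what happens with a negative number, where R rounds the quotient down rather than toward zero.

```r
n <- 17
d <- 5
quo <- n %/% d   # integer quotient
rem <- n %% d    # remainder
print(d * quo + rem == n)   # TRUE: quotient and remainder rebuild n

# Negative numbers: the quotient is rounded down (toward minus infinity),
# so the remainder takes the sign of the divisor.
print((-17) %/% 5)   # -4
print((-17) %% 5)    # 3
```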

Then there are many built-in mathematical functions and constants. For example:

print(sin(pi/2) + cos(pi/2))

## [1] 1

print(log10(100))

## [1] 2

print(log(100))

## [1] 4.60517

print(exp(-10))

## [1] 4.539993e-05

Please note in the previous lines that “log” is the logarithm with base e; if you want log base 10 you have to write “log10”. Also, to find e^x you type “exp(x)”. The last answer ends in a funny “e-05”, which is scientific notation: it means \(\times 10^{-5}\).
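A few more lines in the same spirit may help to pin down the relationship between “log”, “log10” and “exp”. This is just an illustrative sketch; the “base” argument of “log” and the function “log2” are standard parts of R.

```r
print(log(100, base = 10))   # same value as log10(100)
print(log2(8))               # base-2 logarithm
print(exp(1))                # the constant e itself
print(log(exp(3)))           # log undoes exp
```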

2: Store variables

In R there are two ways to assign variables, and for the simple assignments in this document they behave the same. The “<-” notation is supposed to look like an arrow. The “=” notation (which is distinct from the double equals “==”) is an alternative.

a <- 4
b = -1 
print(a^b)

## [1] 0.25

3: Write a function

A basic function takes in a number and returns another number. Here’s an example:

myfun <- function(x){
    a <- 3
    b <- x+2 
    c <- b^2 
    return(a+b+c)
}

That cell defines the function that takes in \(x\) and returns \(3 + (x+2) + (x+2)^2\). That is, we’ve just defined a quadratic function. Now we can use it in other code fragments:

print(myfun(-2))

## [1] 3

d = myfun(10.5) - myfun(9.5)
print(d)

## [1] 25

More complicated function types are possible. A function can even create and return another function. Careful function writing can make your code nice and clean and organized.
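To illustrate a function that creates and returns another function, here is a minimal sketch. The names “make_power”, “square” and “cube” are invented for this example.

```r
# make_power is a made-up name: it builds and returns a new function.
make_power <- function(p) {
  function(x) {
    x^p
  }
}

square <- make_power(2)
cube <- make_power(3)
print(square(5))   # 25
print(cube(2))     # 8
```

Each call to “make_power” produces a separate function that remembers its own value of p.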

4: Make a vector using the colon operator

I love this one. To make a list of all integers from 2 to 9, for example, we can just write:

v = 2:9
print(v)

## [1] 2 3 4 5 6 7 8 9

print(sum(v))

## [1] 44

Now the variable v is a vector (a list of numbers). The built-in function sum takes in a vector and returns a number, specifically the sum of all vector entries.
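One way to convince yourself that 44 is right is to compare against the arithmetic-series formula (number of entries times the average of the first and last entry). The sketch below assumes nothing beyond the built-in functions “length” and “sum”.

```r
v <- 2:9
n <- length(v)                      # number of entries: 8
print(n * (2 + 9) / 2)              # arithmetic-series formula: 44
print(sum(v) == n * (2 + 9) / 2)    # TRUE
```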

5: Make a vector by specifying each entry

Another way to make a vector is to type each entry individually. The R method is like this:

w = c(2,-1,4,0,-10)
print(w)

## [1]   2  -1   4   0 -10

print(sum(w))

## [1] -5

6: Manipulate each entry in a vector all at once

If you have a list of numbers and you want to do the same simple thing to each entry, R has really nice ways to do this! Here are some examples. Try to understand where every output number comes from.

a = 1:4
b = c(2,5,5,-1)
print(2*a^2) # a new vector containing twice the square of each entry in "a"

## [1]  2  8 18 32

print(a*b)   # a new vector containing the entry-by-entry products of "a" and "b"

## [1]  2 10 15 -4

print(0*b-1) # a new vector of the same size as "b", with entries all equal to -1

## [1] -1 -1 -1 -1

These operations make clean code (we didn’t write any for-loops) and they internally run very quickly, compared to a version where we write an explicit for-loop.
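For comparison, here is a sketch of what the first example would look like with an explicit for-loop, next to the one-line version. The variable names “loop_result” and “vec_result” are arbitrary.

```r
a <- 1:4

# Loop version: fill in the result one entry at a time.
loop_result <- numeric(length(a))
for (i in seq_along(a)) {
  loop_result[i] <- 2 * a[i]^2
}

# Vectorized version: one line, no loop.
vec_result <- 2 * a^2

print(loop_result)
print(all(loop_result == vec_result))   # TRUE: both give the same entries
```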

Suppose next that we want 100 equally spaced numbers starting from 2 and ending at 22. One way to do this is:

begin with the integers from 0 to 99 (that’s 100 integers)
divide by 99 to get a list of numbers from 0 to 1
multiply by 20 to get a list of numbers from 0 to 20
add 2 to get a list of numbers from 2 to 22, just as we wanted.

v = (0:99) / 99 * 20 + 2
print(v)

##   [1]  2.000000  2.202020  2.404040  2.606061  2.808081  3.010101  3.212121
##   [8]  3.414141  3.616162  3.818182  4.020202  4.222222  4.424242  4.626263
##  [15]  4.828283  5.030303  5.232323  5.434343  5.636364  5.838384  6.040404
##  [22]  6.242424  6.444444  6.646465  6.848485  7.050505  7.252525  7.454545
##  [29]  7.656566  7.858586  8.060606  8.262626  8.464646  8.666667  8.868687
##  [36]  9.070707  9.272727  9.474747  9.676768  9.878788 10.080808 10.282828
##  [43] 10.484848 10.686869 10.888889 11.090909 11.292929 11.494949 11.696970
##  [50] 11.898990 12.101010 12.303030 12.505051 12.707071 12.909091 13.111111
##  [57] 13.313131 13.515152 13.717172 13.919192 14.121212 14.323232 14.525253
##  [64] 14.727273 14.929293 15.131313 15.333333 15.535354 15.737374 15.939394
##  [71] 16.141414 16.343434 16.545455 16.747475 16.949495 17.151515 17.353535
##  [78] 17.555556 17.757576 17.959596 18.161616 18.363636 18.565657 18.767677
##  [85] 18.969697 19.171717 19.373737 19.575758 19.777778 19.979798 20.181818
##  [92] 20.383838 20.585859 20.787879 20.989899 21.191919 21.393939 21.595960
##  [99] 21.797980 22.000000

This could also be done quickly if you know the special command “seq”.

w = seq(from=2, to=22, length.out = 100)
print(w)

##   [1]  2.000000  2.202020  2.404040  2.606061  2.808081  3.010101  3.212121
##   [8]  3.414141  3.616162  3.818182  4.020202  4.222222  4.424242  4.626263
##  [15]  4.828283  5.030303  5.232323  5.434343  5.636364  5.838384  6.040404
##  [22]  6.242424  6.444444  6.646465  6.848485  7.050505  7.252525  7.454545
##  [29]  7.656566  7.858586  8.060606  8.262626  8.464646  8.666667  8.868687
##  [36]  9.070707  9.272727  9.474747  9.676768  9.878788 10.080808 10.282828
##  [43] 10.484848 10.686869 10.888889 11.090909 11.292929 11.494949 11.696970
##  [50] 11.898990 12.101010 12.303030 12.505051 12.707071 12.909091 13.111111
##  [57] 13.313131 13.515152 13.717172 13.919192 14.121212 14.323232 14.525253
##  [64] 14.727273 14.929293 15.131313 15.333333 15.535354 15.737374 15.939394
##  [71] 16.141414 16.343434 16.545455 16.747475 16.949495 17.151515 17.353535
##  [78] 17.555556 17.757576 17.959596 18.161616 18.363636 18.565657 18.767677
##  [85] 18.969697 19.171717 19.373737 19.575758 19.777778 19.979798 20.181818
##  [92] 20.383838 20.585859 20.787879 20.989899 21.191919 21.393939 21.595960
##  [99] 21.797980 22.000000

Both ways are fine. I like the first one because I don’t have to remember the syntax of the “seq” command, and I don’t mind a little mental math for scaling and shifting the numbers. But if I did this a lot, maybe “seq” would feel more natural.
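If you want to confirm that the two approaches agree, a check along these lines should do it. The comparison uses “all.equal”, which tolerates tiny rounding differences; the last line is an extra illustration of the “by” argument of “seq”.

```r
v <- (0:99) / 99 * 20 + 2
w <- seq(from = 2, to = 22, length.out = 100)
print(length(v))                 # 100
print(isTRUE(all.equal(v, w)))   # TRUE, up to tiny rounding differences

# seq can also take a step size instead of a count:
print(seq(from = 2, to = 4, by = 0.5))
```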

Note on getting help in R: Just now while writing this document, I could remember the name of the “seq” command but not how to use it. I typed “?seq” into the Console panel to see the help file for that command and that answered my question! Web searches can also be helpful for stuff like this.

7: Write a for-loop

Here is a basic program to add up the squares of all the integers from 10 to 20 (inclusive):

vec = 10:20
total = 0
for (x in vec) {
  total = total + x^2
}
print(total)

## [1] 2585

A much shorter way to do this is:

vec = 10:20
print(sum(vec^2))

## [1] 2585

If you have lots of for-loops, especially for-loops inside of other for-loops, your code can run slowly. They are a tool that experienced R users employ sparingly. However, I suggest using them whenever it’s convenient for this class. Here’s another example that might be more difficult to rewrite without a loop. We’ll start with the number 10. Then we’ll multiply by 2, then subtract 1, then multiply by 2, then subtract 1, and so on until we’ve subtracted 1 for the eighth time.

f = 10
for (j in 1:8){
  f = f*2 
  f = f-1
}
print(f)

## [1] 2305

In this example we don’t actually do anything with the index j as it increases from 1 to 8. That’s just a way to make the commands “f = f*2” and “f = f-1” run eight times. Here’s a more extensive discussion of loops: link.
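As a variation, the sketch below does use the index j, to record the value after each pass through the loop. The vector name “steps” is an arbitrary choice, and the last line checks the result against the closed form \(9 \times 2^8 + 1\), which follows because each pass doubles the distance between f and 1.

```r
f <- 10
steps <- numeric(8)   # one slot per pass through the loop
for (j in 1:8) {
  f <- f * 2 - 1
  steps[j] <- f       # here the index j is actually used
}
print(steps)
print(9 * 2^8 + 1)    # closed form for the final value: 2305
```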

8: Use if/else and TRUE/FALSE

The function “runif” produces random numbers between 0 and 1 (the name runif is a contraction of random uniform). Here’s a code block where:

we generate ten random numbers between 0 and 1
we find the maximum among those numbers
if the max is above 0.9, we print “hooray”; otherwise “too bad”

v = runif(10)
M = max(v)
print(M)

## [1] 0.7396858

if (M>0.9){ 
  print("hooray")
} else {
  print("too bad")
}

## [1] "too bad"
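Because the numbers are random, you will get a different maximum each time you run this. If you need the same numbers on every run, one option is to fix the random seed first. The sketch below assumes the seed value 42, which is an arbitrary choice, and also shows a compact one-line form of the if/else.

```r
set.seed(42)          # any fixed number works; 42 is an arbitrary choice
v1 <- runif(10)
set.seed(42)
v2 <- runif(10)
print(identical(v1, v2))   # TRUE: same seed, same "random" numbers

M <- max(v1)
msg <- if (M > 0.9) "hooray" else "too bad"
print(msg)
```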

The statement “M>0.9” produces a Boolean value (either “TRUE” or “FALSE”). You can create Booleans in a few ways. Here are some examples:

a <- 1 
b <- 3
print(a == b)

## [1] FALSE

print(a <= b-2)

## [1] TRUE

print(-a > -b && 10*a > b)

## [1] TRUE

print(a > 50 || b > 0)

## [1] TRUE

Note the difference between “a==b” and “a=b”. With double equals, you’re testing whether or not these things are the same. With a single equals, you’re copying the value of “b” into the variable “a”.

The “and” operator “&&” and the “or” operator “||” can be strung together into more complicated expressions; I suggest using lots of parentheses, since (A && B) || C is different from A && (B || C).
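Here is a small sketch with one particular choice of values where the placement of the parentheses changes the answer. The names A, B and C are just placeholders.

```r
A <- FALSE
B <- TRUE
C <- TRUE
print((A && B) || C)   # TRUE
print(A && (B || C))   # FALSE
print(A && B || C)     # TRUE: without parentheses, && is evaluated before ||
```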

9: Make a basic line plot

Let’s make a line plot illustrating the function \(f(x) = 1 + 2\sin(2x) + 2\sin(4x)\), between the bounds \(x=0\) and \(x = 2\pi\).

To do this, we start by making a vector of numbers varying from \(0\) to \(2\pi\). These are the values of the horizontal coordinate.

x = (0:100) / 100 * 2 * pi

Next we’ll build a vector of \(y\)-values corresponding to these \(x\)-values:

y = 1 + 2*sin(2*x) + 2*sin(4*x)

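Finally we’ll make a plot! You can just type “plot(x,y)”, but you get a prettier result if you load the “ggplot2” package and type the following (note that recent versions of ggplot2 mark “qplot” as deprecated in favor of “ggplot”, so you may see a warning):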

qplot(x,y)

You can see in this figure that we used 101 points. If you want them connected in a nice line, you can give another argument like this one:

qplot(x,y,geom="line")

In the preceding figure there are a few places where the graph has sharp corners. It would be better if we had used about a thousand points instead of 101.
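A version with a thousand points might look like the sketch below. It uses “seq” to build the x-values and the built-in “plot” function with type = "l" to draw a connected line, so it does not depend on ggplot2; the same x and y could be passed to “qplot” instead.

```r
x <- seq(from = 0, to = 2 * pi, length.out = 1000)
y <- 1 + 2 * sin(2 * x) + 2 * sin(4 * x)
plot(x, y, type = "l")   # type = "l" draws a connected line
```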