R is a program that was built for statisticians, by statisticians. But it is so flexible that people have put it to many, many other uses. In fact, R is one of the most powerful analytic and visualization tools available to you, and it is free. The only catch is that the learning curve can be a little steep at first.
So, our task here is to give you a chance to play around in R a little and get used to the language, syntax, and other quirks of using this program.
The biggest takeaway from this should be that R is a scripting language. If you have used R Commander, or a similar graphical user interface (GUI) for R, then you were using something that essentially ran R commands for you as you selected the options that you wanted. The scripts were running either in the background or as a part of the GUI, as in the case of R Commander, which uses the top window to display all the code that is being run in R. Once you step away from using the point-and-click GUIs, you will begin to realize the potential of scripting your analysis.
The caveat is, of course, that you will also realize the annoyance and frustration of dealing with scripts. Scripts can be a little finicky. So let’s start with a few things that you should know upfront:
R is case sensitive. It considers x and X to be two entirely different objects, rather than seeing them as different cases of the same letter.
If you need help with a function, use help() or ?. For example, try typing the following commands into the R console.
help(plot)
# or
?plot
# You should get the same result either way.
Comments begin with a hash (#), which tells R to ignore everything after the hash on that particular line.
It is good practice to annotate your code with a hash (#) followed by whatever note you would like to leave for yourself or others. That way, you will not have to guess what a particular chunk of code was meant to do, or even where you found it.
Another handy use of # is the ability to “turn off” chunks of your code by simply putting a hash in front of them.
Functions are written as a name followed by parentheses (as in function()).
R ignores the spacing around operators. It treats 2+2 the same way that it treats 2 + 2 or 2   +   2. Try it out and see what you think.
That should be enough background to get you started for now. Let’s move on to some simple uses of R.
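Before moving on, here is a short sketch of the ground rules above. The object names x and X and their values are made up for illustration, and the arrow (<-) used to create them is explained further down.

```r
# R is case sensitive: x and X are two separate objects.
x <- 1
X <- 2
x      # prints 1
X      # prints 2

# Spacing around operators does not change the result.
2+2
2   +   2

# A hash "turns off" a line, so the next line is never run.
# x <- 100
x      # still 1
```

Running this line by line in the console is a quick way to confirm each rule for yourself.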
As you have, no doubt, worked out by now, R can perform both simple and complex calculations. Just to belabor the obvious, try a few calculations using the arithmetic operators +, -, *, and /.
2+2
## [1] 4
2*2
## [1] 4
2/2
## [1] 1
2-2
## [1] 0
# You can also use a range of numbers.
2:5 # This will create a sequence of whole numbers from 2 to 5.
## [1] 2 3 4 5
# Multiply the range by some number.
2:5 * 5
## [1] 10 15 20 25
# Or put it all together...
(2+2/(2*2))-2
## [1] 0.5
# Surely, you get the point. Try this out with any values you like and get as complicated as you like.
Unlike Excel, you do not need to use the equal sign (=) to tell R that you are writing a formula. R assumes that you are. You can also ask R whether a statement is true, which is called a logical test.
2==2
## [1] TRUE
2==3
## [1] FALSE
Notice that the equal sign is doubled in the examples above. That is because a single equal sign has a very special function in R and other coding languages: assignment. From here on, any time that you are writing a logical test in R, you will use a double equal sign. Alternatively, you can ask R whether two items are different (not equal) by using !=. This is necessary since few keyboards have a “not equal” sign. Imagine the exclamation point as a slash through the equal sign.
2!=3
## [1] TRUE
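Logical tests also work on a range of numbers, in which case R answers once per value. The sketch below is only an illustration; the comparison operators < and >= are standard R but are not covered in the text above.

```r
# Compare each number in the range 2:5 with 3.
2:5 == 3   # FALSE TRUE FALSE FALSE
2:5 != 3   # TRUE FALSE TRUE TRUE

# Other comparison operators follow the same pattern.
2 < 3      # TRUE
2 >= 3     # FALSE
```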
Simple calculations are all well and good. But we want R to be able to do things for us if we fill in a few values. So, for that reason, we will frequently want to use placeholders that we’ll call “objects”. Such objects can represent a particular value, a series of values, or even an entire function. All you are essentially doing is “naming” a number, word, phrase, or procedure.
Note: You can use either a single equal sign (=) or <- for assignment. I strongly prefer <-, since I feel that it makes my code easier to read. But you are welcome to decide for yourself.
# You can create a data object that represents the value "26" with either:
x <- 26
# Then type:
x
## [1] 26
# or,
x = 26
x
## [1] 26
# Then try using the object:
2 + x
## [1] 28
You can also do the same with words (enclosed within quotation marks).
y <- "Try this now."
y
## [1] "Try this now."
I will no longer include the output in the examples below. Try them out for yourself in R to get a feel for what they can do.
You can also put several values into one object. To do this, you will use the c() function. The “c” stands for concatenate, meaning to link stuff together in a series. (And yes, I am using “stuff” as a technical term.)
Try some of the following.
A <- c(11, 22, 33, 44, 55) # Separate values with commas, or R will return an error.
# Then call it up:
A
# Or use non-numeric values.
B <- c("Try", "this", "list", "of", "words.") # Note that non-numeric values are enclosed in quotation marks.
I am assigning these values to letters here out of laziness. You can also use entire words to name objects.
numbers <- c(11, 22, 33, 44, 55)
words <- c("Try", "this", "list", "of", "words.")
# Make some vectors
numbers <- c(11, 22, 33, 44, 55) # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3) # Vector of characters and numbers
rnj <- c(3:8) # Vector of numbers from 3 to 8.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE) # Logical vector
Then, use typeof() to assess whether the object consists of numeric values, character values, or other values.
Note: When you have a mix of numeric and non-numeric values in a vector, R will see everything in that vector as a character string. That means that a single stray non-numeric entry is enough to make R treat the whole vector as characters.
# Try:
typeof(combo)
## [1] "character"
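To see how typeof() answers for each kind of vector, here is a small sketch that rebuilds the vectors from above so that it runs on its own. Note that R reports ordinary numbers as "double" (its name for decimal numbers) and ranges made with the colon as "integer".

```r
numbers <- c(11, 22, 33, 44, 55)
rnj     <- c(3:8)
logic   <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
combo   <- c(1, "word", 2, "more words", 3)

typeof(numbers)  # "double": numbers typed into c() are stored as decimals
typeof(rnj)      # "integer": the colon creates whole numbers
typeof(logic)    # "logical"
typeof(combo)    # "character"

# The numbers inside combo are now text, so they print with quotation marks.
combo[1]         # "1"
```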
Now test your skills a little. See if you can tell what is wrong with the following:
other <- c(o, 1, 2, 3, 4)
# You will see similar SNAFUs as you continue to use R.
There are a number of things that you can do with the objects that you create. You can apply a mathematical function to all values in the object, and create a new object based on those values.
# Multiply a vector of numbers (use the vector you named "numbers") by some value
numbers * 4
A * 4
# Divide by some number
numbers / 4
# Create a new object using the modified values.
New.Numbers <- numbers * 5
# Compare the two:
numbers
New.Numbers
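Arithmetic also works between two vectors of the same length, pairing the values up position by position. The sketch below is an illustration using the same example values; sum() and mean() are standard R functions that are not covered in the text above.

```r
numbers <- c(11, 22, 33, 44, 55)
rnj     <- c(3:7)

# Each value in numbers is paired with the value in the same position of rnj.
numbers + rnj   # 14 26 38 50 62
numbers * rnj   # 33 88 165 264 385

# Some functions collapse a whole vector into a single value.
sum(numbers)    # 165
mean(numbers)   # 33
```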
Each of the above objects may be combined to form a matrix, data frame, or other object.
To do this, you will need to use the cbind() command to bind the vectors into columns, or the rbind() command to bind them into rows. Try each to get a feel for what they can do.
You will also see a command for transposing (rotating) a matrix (t()) and for finding out what class an object is in R (class()).
# Again, start with some vectors.
numbers <- c(11, 22, 33, 44, 55) # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3) # Vector of characters and numbers
rnj <- c(3:7) # Vector of numbers from 3 to 7.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE) # Logical vector
# Then, combine them.
Numeric.Object <- cbind(numbers, rnj)
# Take a look at the object.
Numeric.Object
# Ask R what type of object it is.
class(Numeric.Object) # Another way of testing
# Try binding them into rows using rbind().
Numeric.Object2 <- rbind(numbers, rnj)
# Put them all together.
Mixed.Object <- rbind(numbers, words, combo, rnj, logic)
# Then look at it.
Mixed.Object
# If you put the object into rows by mistake, you can transpose the matrix using t().
New.Mixed.Object <- t(Mixed.Object)
# Now take a look at it.
New.Mixed.Object
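The examples above all produce matrices. As a rough sketch of the difference between a matrix and a data frame, the code below checks the shape of each result with dim() and then builds a data frame with data.frame(). The name My.Frame is made up for illustration, and stringsAsFactors = FALSE is set explicitly so that the words stay as text in any version of R.

```r
numbers <- c(11, 22, 33, 44, 55)
words   <- c("Try", "this", "list", "of", "words.")
rnj     <- c(3:7)

# cbind() gives 5 rows and 2 columns; rbind() gives 2 rows and 5 columns.
dim(cbind(numbers, rnj))       # 5 2
dim(rbind(numbers, rnj))       # 2 5

# A matrix holds one type only, so mixing in words turns everything into text.
typeof(cbind(numbers, words))  # "character"

# A data frame lets each column keep its own type.
My.Frame <- data.frame(numbers, words, rnj, stringsAsFactors = FALSE)
class(My.Frame)                # "data.frame"
str(My.Frame)                  # Lists each column along with its type
```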
Suppose you are interested in using just one of the elements in a particular data object. All you have to do is tell R which one you are interested in calling up. You can do so using the name of the data object followed by square brackets ([ ]).
If you are dealing with a vector (a string of numbers or characters/words), then you just use the number that corresponds with the element you are interested in calling. For example, consider the vector we just created called “combo”. If we are interested in calling up just the fourth element of that vector (in this case, “more words”), then we may type: combo[4].
This works the same way with matrices. But matrices have more coordinates: rows and columns. So inside the square brackets, you need to specify which row and which column, separated by a comma. The square brackets always list the row number first, and then the column number. For example, if you would like to call up or use the element in the fifth row and second column of what we called “Mixed.Object” above, then you would type: Mixed.Object[5,2]
You can do the same thing with arrays (stacked matrices), but that is something you can google. For now, let’s play around with using elements of vectors and matrices.
# Using the objects we created above, look at an element of the vectors.
words[1] # The first element of the vector "words"
words[2] # The second element of the vector "words"
words[3:5] # The third, fourth, and fifth elements of the vector "words"
# Then look at the elements in a matrix.
Numeric.Object[4, 1] # This calls the element in row 4, column 1.
Numeric.Object[1:5, 2] # This calls rows one through five in column 2.
Numeric.Object[ , 2] # If you would like to call up all the elements of a particular column, leave the rows blank.
Numeric.Object[4 , ] # Same goes for rows. Here is everything in row four of "Numeric.Object".
# Now to use these new skills:
Numeric.Object[1,1] * Numeric.Object[ , 2] # Multiply the 2nd column in the object by the first entry in column one.
New.Stuff <- Numeric.Object[1,1] * Numeric.Object[ , 2] # Do it again, but save it as a vector named "New.Stuff".
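Here is the same indexing with the expected values written out, as a self-contained sketch you can check your own results against. Calling a column by its name in quotation marks is an extra that the text above does not cover; it works here because cbind() keeps the vector names as column names.

```r
numbers <- c(11, 22, 33, 44, 55)
rnj     <- c(3:7)
Numeric.Object <- cbind(numbers, rnj)

Numeric.Object[4, 1]       # 44: row 4, column 1
Numeric.Object[ , "rnj"]   # 3 4 5 6 7: the column called "rnj"
Numeric.Object[2:3, ]      # Rows 2 and 3, both columns

# The first entry of column one (11) times every entry of column two.
New.Stuff <- Numeric.Object[1, 1] * Numeric.Object[ , 2]
New.Stuff                  # 33 44 55 66 77
```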
CSVs can provide an easy way to get your data into R. To practice loading data into R, first create a spreadsheet to upload.
Enter the information above into an Excel spreadsheet and save it as a CSV file. Because CSVs are essentially text files, they are much easier to load into R. For more information on this, use the R help file: help(read.csv)
The file.choose() function allows you to load your data without first specifying where the file is located.
data <- read.csv(file.choose(), header=TRUE) # R will let you search for
# the file on your computer.
data <- read.table(file.choose(), header=TRUE, sep=",") # Same as above, but loaded as a
# comma separated text file.
# And now some variations:
data <- read.csv("TryThis.csv", skip=1, header=FALSE) # If you don't want to use headers,
# tell R to ignore the first line.
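If you do not have a CSV file handy, the sketch below creates one inside R and reads it straight back in. The values are made up purely for practice, the column names ColumnOne and ColumnTwo are borrowed from the later examples, and the file is written to a temporary location rather than to a folder of your choosing.

```r
# Made-up values, only for practice.
practice <- data.frame(ColumnOne = c(1, 2, 3, 4),
                       ColumnTwo = c(2.5, 3.1, 4.8, 6.0))

# Write the practice data to a temporary CSV file.
csv.path <- tempfile(fileext = ".csv")
write.csv(practice, file = csv.path, row.names = FALSE)

# Read it back in. The file location is given as text, so file.choose() is not needed.
data <- read.csv(csv.path, header = TRUE)
data
names(data)   # "ColumnOne" "ColumnTwo"
nrow(data)    # 4
```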
Sometimes it will be necessary to change the class of object that you are working with in R. Thankfully, working with data sets in R can be very flexible.
When you loaded the CSV into R, it was saved as a data frame. You can use the class() function to check this in the R console. You can also use ls() or names() to see the contents of the data frame.
Data objects in R can be coerced into different forms as well. Data objects can be specified as being data frames (as.data.frame()), matrices (as.matrix()), or other forms.
# Using the data you just loaded from the CSV file, above:
class(data) # Check whether this is a data frame, matrix, or something else.
# Check the contents
names(data) # Variable names in the order they appear
ls(data) # Variable names in alphabetical order
# Use Absolute referencing to call up a particular variable
data$ColumnOne
# Change the data from a data frame to a matrix
data2 <- as.matrix(data)
# Compare the two data objects
data
data2
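One thing to watch for when coercing: a matrix can hold only one type of value, so a data frame with mixed columns becomes all text. The sketch below uses a small made-up data frame (Small.Frame and the other names are hypothetical) to show this, and one way of turning a column back into numbers with as.numeric().

```r
# A small made-up data frame with one numeric and one character column.
Small.Frame <- data.frame(ColumnOne = c(1, 2, 3),
                          ColumnTwo = c("a", "b", "c"),
                          stringsAsFactors = FALSE)

Small.Matrix <- as.matrix(Small.Frame)
class(Small.Frame)     # "data.frame"
typeof(Small.Matrix)   # "character": the numbers were converted to text

# Converting back gives a data frame again, but the numbers stay as text
# until you convert that column yourself.
Back.Again <- as.data.frame(Small.Matrix, stringsAsFactors = FALSE)
Back.Again$ColumnOne <- as.numeric(Back.Again$ColumnOne)
class(Back.Again$ColumnOne)   # "numeric"
```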
Two common ways to save data objects in R are to save them as CSV files, or R files. The CSVs are easy to work with outside of R. But the R files are best for easy and trouble-free loading of data at another time, or for saving more complex data objects like arrays, network objects, and others.
In either case, both of these procedures require the name of the data object in R followed by the name that you want the file saved as, followed by the proper suffix (.csv or .rda).
write.csv(data, file="data.csv") # Save as a CSV file
save(data, file="data.RDA") # Save as an R object
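To get the saved object back, use load(). The round trip below is a minimal sketch with a made-up data frame called practice, saved to a temporary file so that nothing on your computer is overwritten. Notice that load() restores the object under the name it had when it was saved.

```r
practice <- data.frame(ColumnOne = c(1, 2, 3), ColumnTwo = c(4, 5, 6))

rda.path <- tempfile(fileext = ".rda")
save(practice, file = rda.path)   # Save the object as an R file

rm(practice)                      # Remove the object from the workspace
exists("practice")                # FALSE

load(rda.path)                    # Bring it back under its original name
exists("practice")                # TRUE
practice
```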
There is way too much to cover in plotting data. The example below is therefore a little basic, and meant to introduce the idea of plotting data in R.
For a much more detailed treatment of how to start plotting data in R, check out the Quick-R website. http://www.statmethods.net/graphs/scatterplot.html
But keep in mind that even that is just a beginning overview. There are many more sites that go much further on this topic.
# Using the data set above, or any other with two numeric variables...
plot(data) # This will only work if there are just two numeric variables
# Alternatively, you can...
plot(data[, 1], data[, 2]) # specify data by column number
plot(data$ColumnOne, data$ColumnTwo) # use absolute referencing to identify variables
# To improve the look of the plot, you can
plot(data$ColumnOne, data$ColumnTwo,
main="This is a scatterplot", # Add a title
xlab="Label for the X Axis", # Label the axes
ylab="Label for the Y Axis",
col="red") # Color the points red
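If you have not loaded a CSV file, here is a self-contained sketch of the same idea with made-up values. It also saves the plot to a temporary PDF file using pdf() and dev.off(), which the text above does not cover; leave those two lines out if you just want the plot to appear on screen.

```r
# Made-up values, only for practice.
practice <- data.frame(ColumnOne = c(1, 2, 3, 4, 5),
                       ColumnTwo = c(2.1, 3.9, 6.2, 8.1, 9.8))

plot.path <- tempfile(fileext = ".pdf")
pdf(plot.path)                       # Open a PDF file to draw into
plot(practice$ColumnOne, practice$ColumnTwo,
     main = "Practice scatterplot",  # Add a title
     xlab = "ColumnOne",             # Label the axes
     ylab = "ColumnTwo",
     pch = 19,                       # Solid circles instead of open ones
     col = "blue")                   # Color the points blue
dev.off()                            # Close the file so the plot is written
file.exists(plot.path)               # TRUE
```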
That should be sufficient to get you started. I will be happy to add anything else that you still find confusing.