R is a program that was built for statisticians, by statisticians. But it is so flexible that people have put it to many, many other uses. In fact, R is one of the most powerful analytic and visualization tools available to you, and it is free. The only catch is that the learning curve can be a little steep at first.
So, our task here is to give you a chance to play around in R a little and get used to the language, syntax, and other quirks of using this program.
The biggest takeaway from this should be that R is a scripting language. If you have used R Commander, or a similar graphical user interface (GUI) in R, then you were using something that essentially ran R commands for you as you selected the various options that you wanted to perform. The scripts were running either in the background or as a part of the GUI, as in the case of R Commander, which uses the top window to display all the code that is being run in R. Once you step away from using the point-and-click GUIs, you will begin to realize the potential of scripting your analysis.
The caveat is, of course, that you will also realize the annoyance and frustration of dealing with scripts. Scripts can be a little finicky. So let’s start with a few things that you should know upfront:
R is case sensitive. It considers x and X to be two entirely different objects, rather than seeing them as different cases of the same letter.
You can get help with any function using help() or ?. For example, try typing the following commands into the R console.
help(plot)
# or
?plot
# You should get the same result either way.
A comment begins with #, which tells R to ignore everything after the hash on that particular line.
To annotate your code, type a hash (#) followed by whatever note you would like to leave for yourself or others. That way, you will not have to guess what a particular chunk of code was meant to do, or even where you found it.
Another handy use of # is the ability to “turn off” chunks of your code by simply putting a hash in front of them.
Functions are called by name, followed by parentheses (function()).
R ignores extra spaces. It treats 2+2 the same way that it treats 2 + 2 or 2   +   2. Try it out and see what you think.
That should be enough background to get you started for now. Let’s move on to some simple uses of R.
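Before moving on, here is a minimal sketch of those quirks in action. It borrows the assignment arrow (<-) and the equality test (==), both of which are explained further down, and the object names are just placeholders chosen for illustration.

```r
# Case sensitivity: x and X are two separate objects.
x <- 1
X <- 2
x == X            # FALSE, because they hold different values

# Spacing inside an expression does not change the result.
a <- 2+2
b <- 2   +   2
identical(a, b)   # TRUE

# A leading hash "turns off" a line, so this assignment never runs:
# a <- 100
a                 # still 4
```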
As you have, no doubt, worked out by now, R can perform both simple and complex calculations. Just to belabor the obvious, try a few mathematical functions using +, -, *, and /.
2+2
## [1] 4
2*2
## [1] 4
2/2
## [1] 1
2-2
## [1] 0
# You can also use a range of numbers.
2:5 # This will create a sequence of whole numbers from 2 to 5.
## [1] 2 3 4 5
# Multiply the range by some number.
2:5 * 5
## [1] 10 15 20 25
# Or put it all together...
(2+2/(2*2))-2
## [1] 0.5
# Surely, you get the point. Try this out with any values you like and get as complicated as you like.
Unlike Excel, you do not need to use the equal sign (=) to tell R that you are writing a formula. R assumes that you are, and will tell you if what you are writing is not logical.
2==2
## [1] TRUE
2==3
## [1] FALSE
Notice that the equal sign is doubled in the examples above. That is because a single equal sign has a very special function in R and other coding languages: assignment. From here on, any time that you are writing a logical test in R, you will use a double equal sign. Alternatively, you can ask R whether two items are different (not equal) by using !=. This is necessary since few keyboards have a “not equal” sign. Imagine the exclamation point as a slash through the equal sign.
2!=3
## [1] TRUE
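The same tests also work on a range of numbers, in which case R checks each value in turn. The short sketch below reuses the 2:5 range from earlier. The values compared against are arbitrary.

```r
# Compare every number in the range 2 to 5 against 3.
2:5 == 3
## [1] FALSE  TRUE FALSE FALSE

# The "not equal" test flips each answer.
2:5 != 3
## [1]  TRUE FALSE  TRUE  TRUE
```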
Simple calculations are all well and good. But we want R to be able to do things for us if we fill in a few values. So, for that reason, we will frequently want to use placeholders that we’ll call “objects”. Such objects can represent a particular value, a series of values, or even an entire function. All you are essentially doing is “naming” a number, word, phrase, or procedure.
Note: You can use either a single equal sign (=) or <- for assignment. I strongly prefer <-, since I feel that it makes my code easier to read. But you are welcome to decide for yourself.
# You can create a data object that represents the value "26" with either:
x <- 26
# Then type:
x
## [1] 26
# or,
x = 26
x
## [1] 26
# Then try using the object:
2 + x
## [1] 28
You can also do the same with words (enclosed within quotation marks).
y <- "Try this now."
y
## [1] "Try this now."
I will no longer include the output in the examples below. Try them out for yourself in R to get a feel for what they can do.
You can also put several values into one object. To do this, you will use the c() function. The “c” stands for concatenate, meaning to link stuff together in a series. (And yes, I am using “stuff” as a technical term.)
Try some of the following.
A <- c(11, 22, 33, 44, 55) # Separate values with commas, or R will return an error.
# Then call it up:
A
# Or use non-numeric values.
B <- c("Try", "this", "list", "of", "words.") # Note that non-numeric values are enclosed in quotation marks.
I am assigning these values to letters here out of laziness. You can also use entire words to name objects.
numbers <- c(11, 22, 33, 44, 55)
words <- c("Try", "this", "list", "of", "words.")
# Make some vectors
numbers <- c(11, 22, 33, 44, 55) # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3) # Vector of characters and numbers
rnj <- c(3:7) # Vector of numbers from 3 to 7.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE) # Logical vector
Then, use typeof() to assess whether the object consists of numeric values, character values, or other values.
Note: When you have a mix of numeric and non-numeric values in a vector, R will see everything in that vector as a character string. That means a single stray non-numeric entry is enough to make R treat the whole vector, numbers included, as character strings.
# Try:
typeof(combo)
## [1] "character"
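For comparison, here is a short sketch that runs typeof() on each of the vectors created above. The vectors are re-created so that the example stands on its own. Note that R reports ordinary numbers as "double" and whole-number ranges such as 3:7 as "integer".

```r
# Re-create the vectors from above.
numbers <- c(11, 22, 33, 44, 55)
words   <- c("Try", "this", "list", "of", "words.")
combo   <- c(1, "word", 2, "more words", 3)
rnj     <- c(3:7)
logic   <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

typeof(numbers) # "double"
typeof(words)   # "character"
typeof(rnj)     # "integer"
typeof(logic)   # "logical"

# Look at combo: the numbers now carry quotation marks, too.
combo
## [1] "1"          "word"       "2"          "more words" "3"
```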
Now test your skills a little. See if you can tell what is wrong with the following:
other <- c(o, 1, 2, 3, 4)
# You will see similar SNAFUs as you continue to use R.
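If you would like to check your answer: the sketch below assumes that no object named o exists in your session, in which case R stops with an error because the bare letter o is read as an object name. The tryCatch() wrapper is only there so the error message can be captured instead of halting the script, and the names errorMessage, other.letter, and other.zero are made up for this illustration.

```r
# Capture the error message rather than stopping.
errorMessage <- tryCatch(
  c(o, 1, 2, 3, 4),
  error = function(e) conditionMessage(e)
)
errorMessage # Complains that the object o cannot be found.

# If the letter "o" was intended, it needs quotation marks.
other.letter <- c("o", 1, 2, 3, 4)
typeof(other.letter) # "character"

# If a zero was intended, type the digit instead.
other.zero <- c(0, 1, 2, 3, 4)
typeof(other.zero)   # "double"
```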
There are a number of things that you can do with the objects that you create. You can apply a mathematical function to all values in the object, and create a new object based on those values.
# Multiply a vector of numbers (use the vector you named "numbers") by some value
numbers * 4
A * 4
# Divide by some number
numbers / 4
# Create a new object using the modified values.
New.Numbers <- numbers * 5
# Compare the two:
numbers
New.Numbers
Each of the above objects may be combined to form a matrix, data frame, or other object.
To do this, you will need to use the cbind() command to bind the vectors into columns, or the rbind() command to bind them into rows. Try each to get a feel for what they can do.
You will also see a command for transposing (rotating) a matrix (t()) and for finding out what class an object is in R (class()).
# Again, start with some vectors.
numbers <- c(11, 22, 33, 44, 55) # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3) # Vector of characters and numbers
rnj <- c(3:7) # Vector of numbers from 3 to 7.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE) # Logical vector
# Then, combine them.
Numeric.Object <- cbind(numbers, rnj)
# Take a look at the object.
Numeric.Object
# Ask R what type of object it is.
class(Numeric.Object) # Another way of testing
# Try binding them into rows using rbind().
Numeric.Object2 <- rbind(numbers, rnj)
# Put them all together.
Mixed.Object <- rbind(numbers, words, combo, rnj, logic)
# Then look at it.
Mixed.Object
# If you put the object into rows by mistake, you can transpose the matrix using t().
New.Mixed.Object <- t(Mixed.Object)
# Now take a look at it.
New.Mixed.Object
Suppose you are interested in using just one of the elements in a particular data object. All you have to do is tell R which one you are interested in calling up. You can do so using the name of the data object followed by square brackets ([ ]).
If you are dealing with a vector (a string of numbers or characters/words), then you just use the number that corresponds with the element you are interested in calling. For example, consider the vector we just created called “combo”. If we are interested in calling up just the fourth element of that vector (in this case, that is “more words”), then you may type: combo[4].
This works the same way with matrices. But matrices have more coordinates: rows and columns. So inside the square brackets, you need to specify which row and which column, separated by a comma. The square brackets will always list the row number first, and then the column number. For example, if you would like to call up or use the element in the fifth row and second column of what we called “Mixed.Object” above, then you would type: Mixed.Object[5,2]
You can do the same thing with arrays (stacked matrices), but that is something you can google. For now, let’s play around with using elements of vectors and matrices.
# Using the objects we created above, look at an element of the vectors.
words[1] # The first element of the vector "words"
words[2] # The second element of the vector "words"
words[3:5] # The third, fourth, and fifth elements of the vector "words"
# Then look at the elements in a matrix.
Numeric.Object[4, 1] # This calls the element in row 4, column 1.
Numeric.Object[1:5, 2] # This calls rows one through five in column 2.
Numeric.Object[ , 2] # If you would like to call up all the elements of a particular column, leave the rows blank.
Numeric.Object[4 , ] # Same goes for rows. Here is everything in row four of "Numeric.Object".
# Now to use these new skills:
Numeric.Object[1,1] * Numeric.Object[ , 2] # Multiply the second column in the object by the first entry in column one.
New.Stuff <- Numeric.Object[1,1] * Numeric.Object[ , 2] # Do it again, but save it as a vector named "New.Stuff".
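Because cbind() keeps the vector names as column names, a column can also be called by name rather than by number. This goes slightly beyond the examples above, so treat it as an optional sketch. The objects are re-created so the example runs on its own.

```r
# Re-create the two-column matrix from above.
numbers <- c(11, 22, 33, 44, 55)
rnj <- c(3:7)
Numeric.Object <- cbind(numbers, rnj)

colnames(Numeric.Object)      # The column names come from the vectors.
Numeric.Object[ , "rnj"]      # Same column as Numeric.Object[ , 2]
Numeric.Object[4, "numbers"]  # Same element as Numeric.Object[4, 1]
```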
There are many, many ways to get data into R. For the sake of simplicity, we will just focus on a couple here.
The first, and most obvious, is to simply type it in. We already did this above when we created some vectors. Provided all the vectors are the same length, you can make a data object from the vectors you created in the example above by binding them together by column (cbind()).
Be careful when you combine vectors. If they are not the same length, then R will try to make them the same length by repeating the pattern in the shorter vectors to fill in the blanks.
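To see that repeating behavior for yourself, try the small sketch below. The vectors short and long are made up for this illustration.

```r
short <- c(1, 2)            # Two values
long  <- c(10, 20, 30, 40)  # Four values

# R repeats "short" (1, 2, 1, 2) to match the length of "long".
recycled <- cbind(short, long)
recycled
dim(recycled)               # 4 rows, 2 columns

# If the longer length is not a multiple of the shorter one
# (for example, 3 values against 4), R still fills in the blanks
# but also prints a warning.
```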
dataObject <- cbind(numbers, words, combo, rnj, logic)
At this point, you have a matrix. To look at the matrix you have created, enter its name into R. To verify, ask R what class of object you have. You can also get an idea of the dimensions (dim) of the object to find out how many rows and columns you have.
class(dataObject) # For object type (R 4.0 and later will also list "array")
## [1] "matrix"
dim(dataObject) # For object dimensions
## [1] 5 5
As you can see, above, you have a matrix with 5 rows and 5 columns. (Rows are always given first and columns are given second.) To make this into a data frame, just tell R that is what you want. To do so, it is a good practice to create it under a new name so that you don’t overwrite your earlier work. Of course, this will only be necessary if you make a mistake in your coding. So, if you don’t make mistakes, then feel free to keep the same name.
Below, we are converting dataObject to a data frame and renaming the resulting data object “dataFrame”.
dataFrame <- as.data.frame(dataObject)
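One thing to watch for: because the matrix held a mix of numbers and words, every value in it was already stored as text, and the data frame made from it inherits that. As a sketch of an alternative, you can build the data frame directly from the vectors with data.frame(), which lets each column keep its own type. The name typedFrame is made up for this illustration.

```r
# Re-create the vectors and the matrix from above.
numbers <- c(11, 22, 33, 44, 55)
words   <- c("Try", "this", "list", "of", "words.")
combo   <- c(1, "word", 2, "more words", 3)
rnj     <- c(3:7)
logic   <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
dataObject <- cbind(numbers, words, combo, rnj, logic)

# Converting the matrix: no column is numeric any more.
dataFrame <- as.data.frame(dataObject)
sapply(dataFrame, class)

# Building from the vectors: numbers stay numbers, logicals stay logical.
typedFrame <- data.frame(numbers, words, combo, rnj, logic)
sapply(typedFrame, class)
```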
To save the data frame as R data that you can load again later, the command is save. But you will have to specify what it is you are saving, and you will have to specify what name you want to call the file when you save it. A common suffix for R data is .rda.
save(dataFrame, file="dataFrame.rda")
You may also choose to save the data as a CSV file so that you can open it later in a text editor, Excel, or similar. “CSV” stands for comma separated values. So, each value in the data set is separated by a comma. The command is write.csv, and like the save command, you will have to specify what you are saving and what you wish to call it once it is saved. Also, make sure to use the .csv suffix.
write.csv(dataFrame, file="dataFrame.csv")
Once you have started a new session, you can use the data you created by reading it back into R. The command you use to read data into R will depend on the type of data you are reading. For the two examples above, you will use either load, for R data, or read.csv or read.table for text files like a csv.
Note that the R data will remember whatever you had named it last time it was saved. The other two commands, on the other hand, will require you to name the data you are importing. Here, we are naming the data “d1” and “d2”.
load("dataFrame.rda")
d1 <- read.csv("dataFrame.csv", header=TRUE) # retain column headers
d2 <- read.table("dataFrame.csv", sep=",", header=TRUE) # point out that values are separated by commas
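Two details of this round trip are worth trying out. By default, write.csv also writes the row names, which come back as an extra first column when the file is read again; adding row.names=FALSE avoids that. And load() puts the object back under its original name. The sketch below uses a small made-up data frame (smallFrame) and temporary files from tempfile() so that it does not overwrite anything in your working directory.

```r
# A small example data frame (made up for this illustration).
smallFrame <- data.frame(numbers = c(11, 22, 33), words = c("a", "b", "c"))

# CSV with the default settings: the row names come back as column "X".
csvPath <- tempfile(fileext = ".csv")
write.csv(smallFrame, file = csvPath)
withRowNames <- read.csv(csvPath, header = TRUE)
names(withRowNames)       # "X" "numbers" "words"

# CSV without row names: only the original columns come back.
write.csv(smallFrame, file = csvPath, row.names = FALSE)
withoutRowNames <- read.csv(csvPath, header = TRUE)
names(withoutRowNames)    # "numbers" "words"

# R data: load() restores the object under the name it was saved with.
rdaPath <- tempfile(fileext = ".rda")
save(smallFrame, file = rdaPath)
rm(smallFrame)                # Remove it from the session...
restored <- load(rdaPath)     # ...and bring it back.
restored                      # "smallFrame"
```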
That is it for now. I will add to this as I can later. For more information on how this applies to social network analysis in R, check out Sean Everton’s examples. https://www.seaneverton.com/a-brief-introduction-to-sna