Gentle Intro to R

Starting with the Very Basics

R is a program that was built for statisticians, by statisticians. But it is so flexible that people have been able to use it for many, many other uses. In fact, R is one of the most powerful analytic and visualization tools available to you, and it is free. The only catch is that the learning curve can be a little steep at first.

So, our task here is to give you a chance to play around in R a little and get used to the language, syntax, and other quirks of using this program.

The biggest takeaway from this should be that R is a scripting language. If you have used R Commander, or a similar graphical user interface (GUI) in R, then you were using something that essentially ran R commands for you as you selected the various options that you wanted to perform. The scripts were running, either in the background, or as a part of the GUI - as in the case of R Commander, which uses the top window to display all the code that is being run in R. Once you step away from using the point-and-click GUIs, you will begin to realize the potential of scripting your analysis.

The caveat is, of course, that you will also realize the annoyance and frustration of dealing with scripts. Scripts can be a little finicky. So let’s start with a few things that you should know upfront:

Case matters!

For example, R does not read what is on the screen - or in its memory - the way a human would. So, R understands x and X to be two entirely different objects, rather than seeing them as different cases of the same letter.
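
A quick way to see this for yourself is to create two objects whose names differ only by case. The names and values below are arbitrary and chosen only for illustration.

```r
x <- 5         # lowercase x holds a number
X <- "five"    # uppercase X is a completely separate object

x              # shows the number stored in x
X              # shows the text stored in X

identical(x, X)   # FALSE: R keeps the two objects apart
```

If you ever get an "object not found" error for something you are sure you created, a mismatch in capitalization is a good first thing to check.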

There are packages to help you.

R is very flexible and can be made to do many things. But few people would want to tell it what to do from scratch. That is why a lot of what R does has been automated in pre-written scripts that you can use to carry out common tasks. Those pre-written scripts are called “packages”, and some of those packages are loaded into R every time you open R on your computer. Other packages are available either in the CRAN (Comprehensive R Archive Network) repository, or elsewhere on the web.
The network analysis package igraph is one such “package”. It was written to make it easier for people to use common network analysis routines without first having to spend hours programming a computer. Although igraph is powerful, and has many functions and features, it does not do everything that we need. We will therefore be using other packages, such as sna, network, foreign, and statnet in later modules. Those packages, however, were written by other people and will therefore have their own idiosyncrasies and styles.
For now, we will focus on the way igraph does things.
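
As a rough sketch of how packages are typically used: a package is downloaded once with install.packages() and then loaded with library() in each session that needs it. The install line is commented out below because it requires an internet connection, and the requireNamespace() check is just one cautious way to avoid an error when igraph has not been installed yet. The object name has_igraph is made up for this example.

```r
# One-time download from CRAN (remove the leading # to run it):
# install.packages("igraph")

# Load the package only if it is already installed on this computer.
has_igraph <- requireNamespace("igraph", quietly = TRUE)
if (has_igraph) {
  library(igraph)
} else {
  message("igraph is not installed yet; run install.packages(\"igraph\") first.")
}
```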

You are not alone!

You do not need to memorize everything about R in order to get a lot out of it. Getting help is fairly straightforward - both in R and on the web. There is a massive international community of R users who have likely seen whatever problem you are having and have proposed multiple solutions to it. For most problems, a simple web search - either by describing your problem, or by cutting and pasting whatever warning you are getting - should give you multiple results.
Though, if your problem is directly related to how to run a particular command, then R’s built-in help files should be able to give you what you need. To use the help function in R, you can use either help() or ?. For example, try typing the following commands into the R console.

help(plot)
# or
?plot
# You should get the same result either way.

Both commands do the same thing: they open the R help section. If you are working in the basic R console, then the help section should have opened in a separate window. Those who are using an IDE like RStudio will see the help section open in the help window - which defaults to the lower-right corner, unless you customize it to be somewhere else.

Comments

There are two comments in the example above. They are identified by the #, which tells R to ignore everything after the hash on that particular line.
One of the best things about working with codes and scripts is that you never have to remember what you clicked to get a particular result. But that doesn’t mean that you will always remember what you were thinking as you worked. This is why I always implore people who are starting out in R to please put comments in their work.
Comments allow you to keep track of what you are thinking as you work. As you are writing a script, simply add in a hash (#) followed by whatever note you would like to leave for yourself or others. That way, you will not have to guess what a particular chunk of code was meant to do, or even where you found it.
The added bonus to being able to use # is the ability to “turn off” chunks of your code by simply putting a hash in front of it.
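
Here is a small sketch of both uses of the hash: leaving notes, and switching a line of code off. The scores are made-up numbers for illustration.

```r
# Average of three quiz scores (invented values)
scores <- c(80, 90, 100)
mean(scores)      # a comment can also follow code on the same line

# The next line is "turned off": R skips it because it starts with a hash.
# scores <- scores * 2

scores            # unchanged, because the doubling line never ran
```

Deleting the hash in front of the doubling line turns that line back on.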

Spacing

Spacing within a code chunk is mostly unimportant. R will ignore extra spaces within your code, as long as the space does not split something that must stay together, such as the two characters of the assignment operator (<-) or of a logical test (==).
Glance through the code presented below. You will note that R treats 2+2 the same way whether or not you put spaces, or even several spaces, around the plus sign. Try it out and see what you think.
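
The following sketch shows spacing being ignored around a plus sign, along with one case where a space does change the meaning. The object names are arbitrary.

```r
a      <- 2+2
b      <- 2 + 2
c_wide <- 2    +    2
a == b         # TRUE
b == c_wide    # TRUE

# Spacing does matter inside a two-character operator:
v <- 10
v <- 5         # assignment: v is now 5
v < - 5        # comparison: "is v less than -5?"
v              # still 5, because the line above only asked a question
```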

Naming Conventions in R

When you name objects in R, as we do below, it is generally a good idea - though not necessary - to keep the name brief. Keep in mind, though, that names that you create for objects and functions should be in the form of a single word. Your options are to combine words using periods, underscores, camel case (capitalized words), or some shortened word that represents what you want to call the object. Hyphens will not work, because R reads a hyphen as a minus sign. For example, if you are creating a new network, you can name it “new.network”, “new_network”, “NewNetwork”, “newnet”, or any other single word or letter.
Also keep in mind that there are a lot of packages and functions that are already loaded into R. Be careful not to overwrite any of them with your new name. In other words, be careful not to use a word that is already in use.
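
The sketch below tries out the example names from this section and then uses exists() as one simple way to check whether a name is already taken. The values assigned, and the name my_unlikely_name, are placeholders.

```r
new.network <- 1   # words joined with a period
new_network <- 2   # words joined with an underscore
NewNetwork  <- 3   # capitalized words
newnet      <- 4   # shortened word

# exists() reports whether a name is already in use.
exists("mean")              # TRUE: mean() is a built-in function
exists("my_unlikely_name")  # FALSE, unless you have created such an object
```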

That should be enough background to get you started for now. Let’s move on to some simple uses of R.

Calculation

As you have, no doubt, worked out by now, R can perform both simple and complex calculations. Just to belabor the obvious, try a few mathematical functions using +, -, *, and /.

2+2

## [1] 4

2*2

## [1] 4

2/2

## [1] 1

2-2

## [1] 0

# You can also use a range of numbers.
2:5  # This will create a sequence of numbers from 2 to 5.

## [1] 2 3 4 5

  # Multiply the range by some number.
2:5 * 5

## [1] 10 15 20 25

# Or put it all together...
(2+2/(2*2))-2

## [1] 0.5

# Surely, you get the point. Try this out with any values you like and get as complicated as you like.

Unlike Excel, you do not need to use the equal sign (=) to tell R that you are writing a formula. R assumes that you are, and will tell you if what you are writing is not logical.

2==2

## [1] TRUE

2==3

## [1] FALSE

Notice that the equal sign is doubled in the examples above. That is because a single equal sign has a very special function in R and other coding languages: assignment. From here on, any time that you are writing a logical test in R, you will use a double equal sign. Alternatively, you can ask R whether two items are different (not equal) by using !=. This is necessary since few keyboards have a “not equal” sign. Imagine the exclamation point as a slash through the equal sign.

2!=3

## [1] TRUE

Assignment

Single Values

Simple calculations are all well and good. But we want R to be able to do things for us if we fill in a few values. So, for that reason, we will frequently want to use placeholders that we’ll call “objects”. Such objects can represent a particular value, a series of values, or even an entire function. All you are essentially doing is “naming” a number, word, phrase, or procedure.
Note: You can use either a single equal sign (=) or <- for assignment. I strongly prefer <-, since I feel that it makes my code easier to read. But you are welcome to decide for yourself.

  # You can create a data object that represents the value "26" with either: 
x <- 26
  # Then type:
x

## [1] 26

  # or, 
x = 26
x

## [1] 26

# Then try using the object:
2 + x

## [1] 28

You can also do the same with words (enclosed within quotation marks).

y <- "Try this now."
y

## [1] "Try this now."

I will no longer include the output in the examples below. Try them out for yourself in R to get a feel for what they can do.

Concatenation

You can also put several values into one object. To do this, you will use the c() function. The “c” stands for concatenate, meaning to link stuff together in a series. (And yes, I am using “stuff” as a technical term.)
Try some of the following.

A <- c(11, 22, 33, 44, 55) # Separate values with commas, or R will return an error.
  # Then call it up:
A
  # Or use non-numeric values.
B <- c("Try", "this", "list", "of", "words.") # Note that non-numeric values are enclosed in quotation marks.

I am assigning these values to letters here out of laziness. You can also use entire words to name objects.

numbers <- c(11, 22, 33, 44, 55)
words <- c("Try", "this", "list", "of", "words.")

Object Types

  # Make some vectors
numbers <- c(11, 22, 33, 44, 55)                  # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3)         # Vector of characters and numbers
rnj   <- c(3:8)                                   # Vector of numbers from 3 to 8.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE)       # Logical vector

Then, use typeof() to assess whether the object consists of numeric values, character values, or other values.
Note: When you have a mix of numeric and non-numeric values in a vector, R will see everything in that vector as a character string. That means that a single stray non-numeric entry is enough to make R treat the entire vector as character strings.

  # Try:
typeof(combo)

## [1] "character"
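
For comparison, here is the same check applied to the other vectors created above. The vectors are repeated so that the example runs on its own.

```r
numbers <- c(11, 22, 33, 44, 55)
words   <- c("Try", "this", "list", "of", "words.")
rnj     <- c(3:8)
logic   <- c(TRUE, FALSE, TRUE, FALSE, FALSE)

typeof(numbers)  # "double": R's default storage for numbers
typeof(words)    # "character"
typeof(rnj)      # "integer": the colon operator produces whole numbers
typeof(logic)    # "logical"
```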

Now test your skills a little. See if you can tell what is wrong with the following:

other <- c(o, 1, 2, 3, 4)
# You will see similar SNAFUs as you continue to use R.

Cool Stuff to do with Objects

There are a number of things that you can do with the objects that you create. You can apply a mathematical function to all values in the object, and create a new object based on those values.

  # Multiply a vector of numbers (use the vector you named "numbers") by some value
numbers * 4
A * 4

  # Divide by some number
numbers / 4

  # Create a new object using the modified values.
New.Numbers <- numbers * 5

  # Compare the two:
numbers
New.Numbers

Combining Vectors and Strings

Each of the above objects may be combined to form a matrix, data frame, or other object.

To do this, you will need to use the cbind() command to bind the vectors into columns, or the rbind() command to bind them into rows. Try each to get a feel for what they can do.

You will also see a command for transposing (rotating) a matrix (t()) and for finding out what class an object is in R (class()).

  # Again, start with some vectors.
numbers <- c(11, 22, 33, 44, 55)                  # Vector of numeric values
words <- c("Try", "this", "list", "of", "words.") # Vector of characters
combo <- c(1, "word", 2, "more words", 3)         # Vector of characters and numbers
rnj <- c(3:7)                                   # Vector of numbers from 3 to 7.
logic <- c(TRUE, FALSE, TRUE, FALSE, FALSE)       # Logical vector

# Then, combine them.
Numeric.Object <- cbind(numbers, rnj)
  # Take a look at the object.
Numeric.Object
  # Ask R what type of object it is.
class(Numeric.Object)  # Another way of testing

# Try binding them into rows using rbind(). 
Numeric.Object2 <- rbind(numbers, rnj)

# Put them all together.
Mixed.Object <- rbind(numbers, words, combo, rnj, logic)
  # Then look at it.
Mixed.Object

# If you put the object into rows by mistake, you can transpose the matrix using t().
New.Mixed.Object <- t(Mixed.Object)
  # Now take a look at it.
New.Mixed.Object
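
To see what binding does to the shape and contents of an object, dim() reports the number of rows and columns, and typeof() shows how the values are stored. This is only a sketch; the names by_columns, by_rows, and mixed are made up for the example.

```r
numbers <- c(11, 22, 33, 44, 55)
words   <- c("Try", "this", "list", "of", "words.")
rnj     <- c(3:7)

by_columns <- cbind(numbers, rnj)   # each vector becomes a column
by_rows    <- rbind(numbers, rnj)   # each vector becomes a row
dim(by_columns)                     # 5 2
dim(by_rows)                        # 2 5
dim(t(by_rows))                     # 5 2 again after transposing

# A matrix holds one type only, so mixing in words turns everything into text.
mixed <- rbind(numbers, words)
typeof(by_rows)                     # "double"
typeof(mixed)                       # "character"
```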

Calling up one or More Elements of an Object

Suppose you are interested in using just one of the elements in a particular data object. All you have to do is tell R which one you are interested in calling up. You can do so using the name of the data object followed by square brackets ([ ]).
If you are dealing with a vector (a string of numbers or characters/words), then you just use the number that corresponds with the element you are interested in calling. For example, consider the vector we just created called “combo”. If we are interested in calling up just the fourth element of that vector (in this case, “more words”), then we may type: combo[4].

This works the same way with matrices. But matrices have more coordinates: rows and columns. So inside the square brackets, you need to specify which row and which column, separated by a comma. The square brackets will always list the row number first, and then the column number. For example, if you would like to call up or use the fifth row of the second column of what we called “Mixed.Object” above, then you would type: Mixed.Object[5,2]

You can do the same thing with arrays (stacked matrices), but that is something you can google. For now, let’s play around with using elements of vectors and matrices.

# Using the objects we created above, look at an element of the vectors.
words[1]  # The first element of the vector "words"
words[2]  # The second element of the vector "words"
words[3:5]  # The third, fourth, and fifth elements of the vector "words"

# Then look at the elements in a matrix.
Numeric.Object[4, 1]    # This calls the element in row 4, column 1.
Numeric.Object[1:5, 2]  # This calls rows one through five in column 2.
Numeric.Object[ , 2]    # If you would like to call up all the elements of a particular column, leave the rows blank.
Numeric.Object[4 , ]    # Same goes for rows. Here is everything in row four of "Numeric.Object".

# Now to use these new skills:
Numeric.Object[1,1] * Numeric.Object[ , 2]  # Multiply the second column in the object by the first entry in column one.
New.Stuff <- Numeric.Object[1,1] * Numeric.Object[ , 2]  # Do it again, but save it as a vector named "New.Stuff".
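
For the curious, arrays (the stacked matrices mentioned above) are indexed the same way, with one more coordinate for the layer. This is a minimal sketch using a made-up array named arr.

```r
# Two rows, three columns, four layers, filled with the numbers 1 to 24.
arr <- array(1:24, dim = c(2, 3, 4))

arr[1, 2, 3]   # row 1, column 2, layer 3
arr[ , , 1]    # the whole first layer, which is a 2 x 3 matrix
dim(arr)       # 2 3 4
```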

Loading Data

CSVs can provide an easy way to get your data into R. To practice loading data into R, first create a spreadsheet to upload.

Enter your practice data into an Excel spreadsheet (the later examples assume two numeric columns named ColumnOne and ColumnTwo) and save it as a CSV file. Because CSVs are essentially text files, they are much easier to load into R. For more information on this, use the R help file: help(read.csv)

The file.choose() function allows you to load your data without first specifying where the file is located.

data <- read.csv(file.choose(), header=TRUE)  # R will let you search for 
                                              # the file on your computer.
data <- read.table(file.choose(), header=TRUE, sep=",")  # Same as above, but loaded as a
                                                         # comma separated text file.

# And now some variations: 
data <- read.csv("TryThis.csv", skip=1, header=FALSE) # If you don't want to use headers,
                                                      # tell R to ignore the first line.
                                                      # The file name must be in quotation marks.
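
If you do not have a spreadsheet handy, you can also build a small practice file from within R and read it back in. This is only a sketch: the column names ColumnOne and ColumnTwo match the ones used in later examples, but the values, the object names practice and csv_path, and the temporary file location are invented for illustration.

```r
# Build a tiny practice data set (values are invented).
practice <- data.frame(ColumnOne = c(1, 2, 3, 4, 5),
                       ColumnTwo = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Write it to a temporary CSV file, then read it back in.
csv_path <- file.path(tempdir(), "TryThis.csv")
write.csv(practice, file = csv_path, row.names = FALSE)

data <- read.csv(csv_path, header = TRUE)
data
```

Supplying the path directly, rather than using file.choose(), makes the script repeatable, since nobody has to click through a file browser each time it runs.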

Working with Data Objects a Little More

Sometimes it will be necessary to change the class of object that you are working with in R. Thankfully, working with data sets in R can be very flexible.

When you loaded the CSV into R, it was saved as a data frame. You can use the class() function to check this in the R console. You can also use ls() or names() to see the contents of the data frame.

Data objects in R can be coerced into different forms as well. Data objects can be specified as being data frames (as.data.frame()), matrices (as.matrix()), or other forms.

# Using the data you just loaded from the CSV file, above:
class(data)       # Check whether this is a data frame, matrix, or something else.

# Check the contents
names(data)  # Variable names in the order they appear 
ls(data)     # Variable names in alphabetical order  

# Use Absolute referencing to call up a particular variable
data$ColumnOne

# Change the data from a data frame to a matrix
data2 <- as.matrix(data)

# Compare the two data objects
data
data2
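
One thing to watch for when coercing: a data frame can hold a different type in each column, while a matrix holds a single type. The sketch below uses a small made-up data frame (the names small, m, and small2 are hypothetical) to show what happens when a text column is present.

```r
# A small invented data frame with one numeric and one text column.
small <- data.frame(id = c(1, 2, 3), label = c("a", "b", "c"))
class(small)               # "data.frame"

m <- as.matrix(small)      # the numbers are converted to text to match the labels
typeof(m)                  # "character"

small2 <- as.data.frame(m) # back to a data frame, but id is no longer numeric
str(small2)
```

If every column is numeric, as in a data set with only two numeric variables, the matrix stays numeric.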

Saving Data in R

Two common ways to save data objects in R are to save them as CSV files, or R files. The CSVs are easy to work with outside of R. But the R files are best for easy and trouble-free loading of data at another time, or for saving more complex data objects like arrays, network objects, and others.

In either case, both of these procedures require the name of the data object in R followed by the name that you want the file saved as, followed by the proper suffix (.csv or .rda).

write.csv(data, file="data.csv")  # Save as a CSV file

save(data, file="data.RDA")       # Save as an R object
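
To complete the round trip, here is a sketch of saving a small data frame both ways and then bringing it back. The values are invented, and the files are written to R's temporary folder so that nothing on your computer is overwritten; csv_file, rda_file, and from_csv are names made up for this example.

```r
data <- data.frame(ColumnOne = c(1, 2, 3), ColumnTwo = c(4, 5, 6))  # made-up values

csv_file <- file.path(tempdir(), "data.csv")
rda_file <- file.path(tempdir(), "data.RDA")

write.csv(data, file = csv_file, row.names = FALSE)  # leave out the row-number column
save(data, file = rda_file)

rm(data)                        # remove the object from memory
load(rda_file)                  # restores the object under its original name, "data"
from_csv <- read.csv(csv_file)  # a CSV must be assigned to a name when it is read back
```

Note the difference: load() recreates the object with the name it had when it was saved, while read.csv() returns the data for you to name.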

Plotting and Such

There is way too much to cover in plotting data. The example below is therefore a little basic, and meant to introduce the idea of plotting data in R.

For a much more detailed treatment of how to start plotting data in R, check out the Quick-R website: http://www.statmethods.net/graphs/scatterplot.html
But keep in mind that even that is just a beginning overview. There are many more sites that go much further on this topic.

# Using the data set above, or any other with two numeric variables...
plot(data)        # This will only work if there are just two numeric variables 

# Alternatively, you can...
plot(data[, 1], data[, 2])  # specify data by column number
plot(data$ColumnOne, data$ColumnTwo) # use absolute referencing to identify variables

# To improve the look of the plot, you can
plot(data$ColumnOne, data$ColumnTwo, 
     main="This is a scatterplot",    # Add a title
     xlab="Label for the X Axis",     # Label the axes
     ylab="Label for the Y Axis", 
     col="red")                       # Color the points red
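
If you have not loaded a data set of your own, the same ideas can be tried on cars, a small data set that ships with R and has two numeric columns, speed and dist. The title, axis labels, and the pch setting are choices made for this sketch rather than anything required.

```r
# cars is built into R: speed (mph) and dist (stopping distance in feet).
plot(cars$speed, cars$dist,
     main = "Stopping Distance by Speed",  # Add a title
     xlab = "Speed (mph)",                 # Label the axes
     ylab = "Stopping distance (ft)",
     col  = "red",                         # Color the points red
     pch  = 19)                            # Draw solid circles instead of open ones
```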

That should be sufficient to get you started. I will be happy to add anything else that you still find confusing.