Before we can start exploring data in R, there are some key concepts to understand first:
- What are R and RStudio?
- How do I code in R?
What are R and RStudio?
In this class, we will be using R via RStudio. First-time users often confuse the two. At its simplest, R is like a car’s engine while RStudio is like a car’s dashboard.
More precisely, R is a programming language that runs computations, while RStudio is an integrated development environment (IDE) that provides an interface with many convenient features and tools. Just as having access to a speedometer, rear-view mirrors, and a navigation system makes driving much easier, using RStudio’s interface makes using R much easier.
Since most of you have never used R, we are also using RStudio Cloud, meaning that you don’t have to install anything. Everything has already been set up for you so that you can just log in and get started.
How do I code in R?
Now that you’re set up with R and RStudio, you are probably asking yourself, “OK. Now how do I use R?” The first thing to note is that, unlike other statistical software programs like Excel, Stata, or SAS that provide point-and-click interfaces, R is an interpreted language. This means you have to type in commands written in R code. In other words, you have to code/program in R. Note that we’ll use the terms “coding” and “programming” interchangeably in this course.
While you do not need to be a seasoned coder/computer programmer to use R, there is still a set of basic programming concepts that linguists working with quantitative data need to understand. Consequently, while this course is not a course on programming, you will still learn just enough of these basic programming concepts to explore and analyze data effectively.
Tips on learning to code
Learning to code/program is very much like learning a foreign language. It can be very daunting and frustrating at first. Such frustrations are very common, and it is very normal to feel discouraged as you learn. However, just as with learning a foreign language, if you put in the effort and are not afraid to make mistakes, anybody can learn.
Here are a few useful tips to keep in mind as you learn to program:
- Remember that computers are not actually that smart: You may think your computer or smartphone is “smart,” but really people spent a lot of time and energy designing them to appear “smart.” In reality, you have to tell a computer everything it needs to do. Furthermore, the instructions you give your computer can’t have any mistakes in them, nor can they be ambiguous in any way.
- Take the “copy, paste, and tweak” approach: Especially when you learn your first programming language or you need to understand particularly complicated code, it is often much easier to take existing code that you know works and modify it to suit your ends. This is opposed to trying to type out the code from scratch. We call this the “copy, paste, and tweak” approach. So early on, we suggest not trying to write code from memory, but rather taking existing examples we have provided you, then copying, pasting, and tweaking them to suit your goals. After you start feeling more confident, you can slowly move away from this approach. Think of the “copy, paste, and tweak” approach as training wheels for a child learning to ride a bike. After getting comfortable, they won’t need them anymore.
- The best way to learn to code is by doing: Rather than learning to code for its own sake, we feel that learning to code goes much more smoothly when you have a goal in mind or when you are working on a particular project, like analyzing data that you are interested in.
- Practice is key: Just as the only way to improve your foreign language skills is through lots of practice, the only way to improve your coding skills is through lots of practice. Don’t worry, however: we’ll give you plenty of opportunities to do so!
Getting started
After you log in, you will see two main windows on the left of the screen.
- The window on top that contains the text and code from this section is called an R script (or sometimes an R notebook, but we’ll get to that later).
- The window below, where all of the output is listed, is called the R console.
The difference is that you can write and erase as much as you want inside the script, and it will not do anything until you transfer a line of code to the console. Inside the console window, you cannot change a line of code once it is entered; you instead have to re-enter the changed line of code. To run any line of code, you can do any of the following:
- Type the line directly into the R console window followed by a carriage return (i.e. ENTER).
- Use the keyboard or mouse to copy the line (including the carriage return at the end) from the script and paste it into the R console window.
- Highlight the part of the code you want to use from the script by clicking and dragging your mouse over it. Then, on a Mac, hold down the Command key (the swirlygig button, formerly the Apple button) on the left of the keyboard and, while still holding it, hit Return. If you use R on a PC running Windows (or Linux), highlight the code and press Ctrl and the letter “r”.
The last option is the fastest way of getting a line of code to work because it transfers it to the console and executes the command all at once. Even if you are writing the commands or functions yourself, you should write them in a script window first and then execute them from the script. That way, it is easier to go back and make changes to your code (which you will probably have to do often at first).
Code vs. text
If you are using the .Rmd file, you’ll notice that this text is just written like normal text. In a notebook like this, you tell RStudio that you’re writing code (not text) by inserting a code chunk like this.
7 - 2
The code above subtracts 2 from 7.
The output (i.e. 5) is also shown below the code.
Arithmetic operators
Now the easiest thing to do in R is basic arithmetic operations. Basic operator signs are as follows:
Adding:
459 + 51
Subtracting:
459 - 51
Multiplying:
459 * 51
Dividing:
459 / 51
Exponentiating:
51 ^ 3
51 * 51 * 51
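As a quick check on the operators above, the sketch below repeats each command and notes, as a comment, the value R should return (R prints each result with a leading [1], which the comments omit). The == comparison on the last line is an addition that is not used elsewhere in this section; it simply asks R whether the two sides are equal.

```r
459 + 51       # 510
459 - 51       # 408
459 * 51       # 23409
459 / 51       # 9
51 ^ 3         # 132651
51 * 51 * 51   # 132651

# Assumption for illustration: == tests whether two values are equal
# and returns TRUE or FALSE.
51 ^ 3 == 51 * 51 * 51   # TRUE
```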
Remember, to evaluate each of these expressions, you could
- type it into the console, then hit return,
- copy and paste it into the console and press return, or
- highlight it in the notebook and press Command + Return (or Ctrl + R for a PC running Windows).
Try executing the commands in each of the three ways.
If it has been a while since you have done a lot of this kind of arithmetic, it might be good to review some other basics, such as the difference between the following 2 commands.
(459 + 51) / 3
459 + (51 / 3)
The first line adds 51 to 459 and then divides the result by 3. The second line divides 51 by 3 and then adds the result to 459.
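To make the difference concrete, here is a small sketch with the intermediate steps written out as comments. The third command, without any parentheses, is an extra illustration: R follows the usual order of operations and divides before it adds.

```r
(459 + 51) / 3   # 510 / 3, which is 170
459 + (51 / 3)   # 459 + 17, which is 476

# Without parentheses, the division happens first,
# so this matches the second command, not the first.
459 + 51 / 3     # 476
```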
Functions
Addition can also be performed using the R function sum(). A function is a named command that can stand for an arbitrarily long sequence of operations. That is, it is like a shortcut for more complicated mathematical functions or processes, including operating the graphical device, that have been pre-programmed into R. A function name is followed by parentheses, in which you specify the further bits of information that R needs in order to perform the selected function. Each bit of information is an argument.
If there are several arguments, they are separated from each other by commas. For example, the sum() function adds its arguments together, so the following two commands return the same result:
459 + 51 + 327
sum(459, 51, 327)
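The sketch below shows the same idea with different numbers of arguments. The round() example is not part of this section's material; it is included only as an illustration of a function whose arguments play different roles (the number to round, then the number of decimal places to keep).

```r
# sum() accepts any number of arguments, separated by commas.
sum(459, 51)        # 510
sum(459, 51, 327)   # 837

# Illustration only: round() takes the number to round as its first
# argument and the number of decimal places as its second.
round(9.8765, 2)    # 9.88
```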
The R assignment operator
Another very important special symbol is =, the assignment operator. This operator tells R to assign the value on its right to the thing on its left. In the simplest case, this is just like giving a name to the value.
So for example:
x1 = 459 + 51
adds 51 to 459 and assigns the result the name x1.
x2 = sum(459, 51, 327)
stores the sum of these 3 numbers in x2.
x3 = 7 - 2
takes 2 from 7 and stores the result in x3.
x4 = sum(7, -2)
stores the sum of 7 and -2 in x4 (same as x3).
Once you have stored the result of an operation in this way, you can retrieve the computed value just by typing the “name” that you’ve given to the value. So, for example, if you type x1 or x2 or x3 or x4 in the R console window after running the above four commands, the next line on the R console window is the same value that you would have got by running the original command again. This is especially convenient if you want to store more complicated values, such as a vector of numbers instead of a single number.
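Putting the four assignments together, a minimal sketch of storing and retrieving values might look like this. The name y and the use of <- at the end are additions for illustration: many R users write assignment as <- rather than =, and at the top level of a script both store the value.

```r
x1 = 459 + 51
x2 = sum(459, 51, 327)
x3 = 7 - 2
x4 = sum(7, -2)

# Typing a name on its own returns the stored value.
x1   # 510
x2   # 837
x3   # 5
x4   # 5

# Stored values can be used in further calculations.
x1 + x3   # 515

# Aside (not used elsewhere in this section): <- also assigns.
# The name y is hypothetical.
y <- 7 - 2
y    # 5
```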
The R vector function
First of all, a vector is basically a single row of items. In R, you can specify that a set of values is a vector by typing them, separated by commas, as arguments to the c() function, like this:
c(459, 51, 327)
The c() part of this command is a function that tells R to “concatenate” the arguments, which means “to group these items together in order,” which is a simple definition of a vector. Again, a function is like a shortcut for more complicated mathematical functions or processes.
The name of the function will be followed by parentheses where you will specify further information that R needs in order to perform the selected function. In this case, the () parentheses enclose the items you want grouped together, and the items are separated by commas, to show where one item ends and the next begins. If you assign the vector a name, like this:
x5 = c(459, 51, 327)
you have a way of referring back to it, so that you don’t have to keep typing the same numbers in over and over again. So, after you run the above command, the following two lines of code return the same value.
sum(459, 51, 327)
sum(x5)
The length() function lets you count the number of items in a vector. So the value that is returned by the following command is the number of items in the vector x5 that you created earlier.
length(x5)
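As a small sketch of how sum() and length() can work together on a named vector, the commands below compute the average of x5 by hand. The mean() function is not introduced in this section; it is shown only to confirm that R's built-in average gives the same value.

```r
x5 = c(459, 51, 327)

sum(x5)      # 837
length(x5)   # 3

# Dividing the total by the number of items gives the average.
sum(x5) / length(x5)   # 279

# mean() is R's built-in function for the same calculation.
mean(x5)               # 279
```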
You can refer to a value at any position in a vector by following the vector with the position number enclosed in square brackets. So after you have defined x5 as above, the following three commands all return the same result.
# Just type the number and R will echo it.
51
# Specify the second item in the vector
c(459, 51, 327)[2]
x5[2]
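The same square-bracket notation works for every position. Note that R counts positions starting from 1, so the first item is x5[1]. The last command below is an extra illustration that combines indexing with length() to pick out the final item.

```r
x5 = c(459, 51, 327)

x5[1]   # 459
x5[2]   # 51
x5[3]   # 327

# length(x5) is 3, so this is the same as x5[3].
x5[length(x5)]   # 327
```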
The following three commands are also equivalent ways of adding 459 and 51.
459 + 51
x5[1] + x5[2]
sum(x5[1], x5[2])
This equivalence may seem a bit boring and trivial now, but wait until you see what this buys you when you’re dealing with longer vectors or more complicated items!
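To give a first taste of that payoff, here is a sketch with a slightly longer vector. The name counts and its values are invented purely for illustration.

```r
# Hypothetical values, made up for this example.
counts = c(12, 7, 30, 4, 18, 25, 9, 15)

# Adding the items one position at a time quickly gets tedious.
counts[1] + counts[2] + counts[3] + counts[4] +
  counts[5] + counts[6] + counts[7] + counts[8]   # 120

# sum() handles a vector of any length in one short command.
sum(counts)      # 120
length(counts)   # 8
```

With a named vector, the command stays the same length no matter how many items the vector holds.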
