Practical 1: Statistical Techniques for Research (FAB500)
Note: for this tutorial I have relied on two excellent resources:
The excellent introduction to R coding by Garrett Grolemund.
This great YouTube video on the very basics of R - by a South African guy
To maximize your learning, you can open up RStudio on your own computer and work through each of the instructions below in R to see how it works and get more used to using R yourself.
Install R and RStudio on your laptop or other PC you have access to, for free, using the instructions at this link. Then you can keep practicing and working with R at home and in your own time.
How does this help my Honours research project? Before you can start analysing your research project data in R, you will have to get to know the basics of R. This will take some time, but believe me it will be worth it :)
Need more convincing? Check out this 4 minute video
“Why should you use R?”
R works together with RStudio. RStudio is the software that makes it easy to work with R. Go ahead and open up RStudio on your computer, in the same way you open any other application.
You “speak” with R by typing code (code is like instructions) into R and then running/executing the code (sending it to the computer). That is where the term “computer coding” or “coding” comes from.
Here are the steps for running code:
Step 1: Type the code into the R script, which is the top left panel in RStudio.
Step 2: Run the code (send it to the computer) by clicking the “Run” button at the top of the script window, or by pressing Ctrl+Enter.
Step 3: R follows the instructions and sends back an “answer” in the R console.
Here is an example
Type 2+2 into the R script and then click “Run”
2+2
[1] 4
To make it easy for me to teach you R, I use grey boxes and white spaces in this tutorial when writing code and showing the results of the code
The grey box above with the “2+2” represents the R script (top left panel in RStudio) - the place where you write instructions to R. In this document, all grey boxes represent R script instructions. If you want to follow along and do this in your own RStudio, you can copy what you see in these grey boxes into your own R script in RStudio. Then click “Run” or hit “Ctrl+Enter” as per step 2 above.
The white space after the grey box (the part starting with the [1] and showing the answer 4) represents the R console in RStudio (bottom left panel). This is the place where the answers to your instructions are given.
This is the format we will use in this tutorial.
Now let's carry on with some other R basics.
##REMEMBER - these grey boxes represent "code scripts".
#You can copy the stuff in these boxes into your R script in RStudio
#(top left panel in RStudio) and then hit "Ctrl+Enter" to Run,
#or click "Run" button.
#Check if you get the same results as below.
# print a number:
2
[1] 2
# perform a simple calculation
2+3
[1] 5
# Multiplication
3 * 5
[1] 15
# Division
4/2
[1] 2
#Another
(5+3)/2
[1] 4
#Use a function
sqrt(4)
[1] 2
max(2,3,7)
[1] 7
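Functions can also be combined. The short sketch below is an illustration only (it is not part of the original exercises): it places one function inside another and uses round(), a built-in R function that has not been introduced yet.

```r
# The inner function runs first: max() returns 16, then sqrt() takes its square root
sqrt(max(4, 9, 16))

# round() takes a number and the number of decimal places to keep
round(10 / 3, 2)
```

R works from the inside out, so it helps to read nested functions starting from the innermost brackets.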
You can make a sequence of whole numbers in R using the : symbol. Try the code below:
1:3
[1] 1 2 3
1:10
 [1]  1  2  3  4  5  6  7  8  9 10
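The : symbol always counts in steps of one. As a small extra illustration, the sketch below counts downwards and uses the built-in seq() function for other step sizes. The numbers are arbitrary and chosen only for the example.

```r
# Count down instead of up
10:7

# seq() lets you choose the step size with the "by" argument
seq(from = 0, to = 10, by = 2)

# A sequence can be passed straight into a function
sum(1:10)
```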
R is an “object oriented” programming language: you store values in named objects and then work with those objects. Below we create an object called “x” and tell R that its value is 5.
You can view the value of any object by typing the object name.
##REMEMBER - these grey boxes represent "code scripts".
#You can copy the stuff in these boxes into your R script in RStudio
#(top left panel in RStudio) and then hit "Ctrl+Enter" to Run,
#or click "Run" button.
#Check if you get the same results as below.
#Create an object x (left of arrow) and assign it the value 5 (right of arrow)
x<-5
#Now ask R to tell you what x is
x
[1] 5
What just happened? To create an R object, choose a name and then use the less-than symbol, <, followed by a minus sign, -, to save data into it. This combination looks like an arrow, <-. R makes an object called x and then stores in it whatever follows the arrow, in this case it stores the number 5 in x.
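An object keeps its value until you assign a new one. The sketch below shows this with a separate, hypothetical object called total, so that x keeps the value 5 for the steps that follow.

```r
# "total" is a made-up name used only for this illustration
total <- 5

# The right-hand side is worked out first, then stored back into total
total <- total + 1

# total now holds the new value
total
```

Assigning to an existing name overwrites the old value without any warning, so choose object names carefully.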
Now you can do calculations on x
x+5
[1] 10
x-2
[1] 3
x*2
[1] 10
Capitalization
R is case-sensitive, so name and Name will refer to different objects:
Name <- 1
name <- 0
Name + 1
## 2
Adding two objects/variables.
Below we create a new variable called y, assign it the value 4, and then add it to x
y<-4
y
[1] 4
x
[1] 5
x+y
[1] 9
What if we assign more than one number to an object/variable? Let’s try assigning the numbers 1 to 3 to the object z
z<-1:3
z
[1] 1 2 3
Congratulations! You have created your first vector…
You can also create vectors in R by using the concatenate function c(), which combines the values you give it into one vector. Below we create a new vector called “goals” and assign it three values - the number of goals scored by three different soccer players, during their entire international careers. Can you guess their names?
goals<-c(123,106,79)
goals
[1] 123 106  79
We can also calculate the mean and SD of a vector using the functions mean() and sd() as below
#calculate the mean of the goals
mean(goals)
[1] 102.6667
#calculate the standard deviation of goals
sd(goals)
[1] 22.18859
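To see what mean() and sd() are doing, the sketch below repeats both calculations by hand. It assumes the sample standard deviation (dividing by n - 1), which is what sd() uses. The object names n, goal_mean and goal_sd are made up for this illustration.

```r
goals <- c(123, 106, 79)

# Number of observations
n <- length(goals)

# Mean: add up the values and divide by how many there are
goal_mean <- sum(goals) / n

# Sample SD: square root of the summed squared deviations divided by (n - 1)
goal_sd <- sqrt(sum((goals - goal_mean)^2) / (n - 1))

goal_mean
goal_sd

# Check that the hand calculation agrees with the built-in function
all.equal(goal_sd, sd(goals))
```

Notice that goals - goal_mean subtracts the mean from every element of the vector at once. R applies arithmetic to whole vectors without you having to write a loop.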
You can also apply other functions to a vector to get more information
#Calculate the number of observations in the vector
length(goals)
[1] 3
# Ask R what TYPE of variable "goals" is
class(goals)
[1] "numeric"
“goals” is a numeric vector. Now let’s create another vector, this time with the names of the soccer players:
players<-c("Ronaldo","Messi","Neymar")
players
[1] "Ronaldo" "Messi"   "Neymar"
#What type of variable is the vector "players"?
class(players)
[1] "character"
#What happens if you try and get the mean of a character vector?
mean(players)
Warning in mean.default(players): argument is not numeric or logical: returning NA
[1] NA
R sends you a warning that the variable is not numeric, so it cannot calculate a mean and returns NA (short for “not available”) instead.
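A related situation is a column of numbers that R has read in as text. The sketch below is an illustration under that assumption: goals_text is a hypothetical vector in which the goal counts are stored as characters, and as.numeric() converts them back to numbers.

```r
players <- c("Ronaldo", "Messi", "Neymar")

# Check the type before calculating: is.numeric() returns TRUE or FALSE
is.numeric(players)

# Hypothetical example: numbers that were stored as text
goals_text <- c("123", "106", "79")
class(goals_text)

# Convert the text to numbers, then the mean can be calculated
mean(as.numeric(goals_text))
```

Checking class() first is a quick way to find out why a calculation returns NA.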
In R, a data table or data sheet is called a “data frame”. A “tibble” (a weird name, I know) is a modern version of the data frame that comes from the tibble package, which is part of the tidyverse. What I do know is that tibbles are your friend. They are very handy and easy to use. Let’s create our first tibble.
We will do this by combining our two vectors (the players vector/variable and the goals vector/variable) into a tibble which we will call stars.
Notice in the R code below that you still use the assign arrow (<-) and then you use the tibble() function.
The tibble() function tells R to make a tibble out of whatever you put between its brackets. It expects the names of the COLUMNS that you want to make up the tibble, separated by commas.
Try the below in R yourself. We will re-create the vectors for clarity:
#Load the tibble package first (install it once with install.packages("tibble"))
library(tibble)
players<-c("Ronaldo","Messi","Neymar")
goals<-c(123,106,79)
stars<-tibble(players,goals)
stars
# A tibble: 3 × 2
  players goals
  <chr>   <dbl>
1 Ronaldo   123
2 Messi     106
3 Neymar     79
Well done! You just created your first tibble in R. You can see it has two columns and three rows.
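You can also build the same tibble in one step by naming the columns inside tibble(). This is a minimal sketch that assumes the tibble package is installed; it produces the same stars tibble as above.

```r
library(tibble)

# Each argument is written as column_name = values
stars <- tibble(
  players = c("Ronaldo", "Messi", "Neymar"),
  goals   = c(123, 106, 79)
)

# names() lists the column names; dim() gives the number of rows, then columns
names(stars)
dim(stars)
```

Naming the columns explicitly is useful when you do not already have the vectors saved as separate objects.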
First, let's ask R how many columns and rows our stars tibble has. We do this using the functions ncol() and nrow():
ncol(stars)
[1] 2
nrow(stars)
[1] 3
You can select a particular column in the tibble by using the $ symbol, followed by the column name
stars$players
[1] "Ronaldo" "Messi"   "Neymar"
stars$goals
[1] 123 106  79
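Because the $ symbol hands back an ordinary vector, you can use a selected column inside any function you already know. The sketch below re-creates stars so it can be run on its own, and uses which.max(), a built-in function that gives the position of the largest value.

```r
library(tibble)

stars <- tibble(
  players = c("Ronaldo", "Messi", "Neymar"),
  goals   = c(123, 106, 79)
)

# Mean of the goals column
mean(stars$goals)

# Position of the highest goal count, used to look up the matching player
top_position <- which.max(stars$goals)
stars$players[top_position]
```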
A super power - selecting particular rows or columns in a tibble
You can also select a particular column/row using the row and column number method as explained in the image below.
Step 1: type your tibble name and then open square brackets - stars[
Step 2: In the square brackets first type the row number you want followed by a comma - stars[2,
Step 3: then type the column number you want and close the bracket - stars[2,1]
Step 4: run your code. R should give you the element in the SECOND row and FIRST column.
Let’s try this in RStudio - type into your R script and then click “Run”
#First print the stars tibble to view it, to help you select the right rows and columns
stars
# A tibble: 3 × 2
  players goals
  <chr>   <dbl>
1 Ronaldo   123
2 Messi     106
3 Neymar     79
#Now select the element that is in the position of second row, first column
#REMEMBER - before the comma you specify the row, after the comma the column
stars[2,1]
# A tibble: 1 × 1
  players
  <chr>
1 Messi
See that R sends you the answer “Messi”. This is correct because Messi sits in the second row of the first column (players).
When you want to select a whole row or a whole column, you leave the row or column position blank, as per the Figure above.
#Select the full first row (remember leave blank after comma - select ALL columns)
stars[1,]
# A tibble: 1 × 2
  players goals
  <chr>   <dbl>
1 Ronaldo   123
#Select the full second column (remember leave blank before comma)
stars[,2]
# A tibble: 3 × 1
  goals
  <dbl>
1   123
2   106
3    79
You can also subset the tibble to include only the first two rows (players), but keep all the columns, like this:
stars[1:2,]
# A tibble: 2 × 2
  players goals
  <chr>   <dbl>
1 Ronaldo   123
2 Messi     106
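Row numbers are not the only way to pick rows. As a sketch that goes one step beyond the examples above, you can put a condition before the comma and R keeps only the rows where the condition is TRUE. The cut-off of 100 goals is an arbitrary value chosen for illustration.

```r
library(tibble)

stars <- tibble(
  players = c("Ronaldo", "Messi", "Neymar"),
  goals   = c(123, 106, 79)
)

# The condition gives TRUE or FALSE for each row
stars$goals > 100

# Keep the rows where the condition is TRUE, and keep all columns
high_scorers <- stars[stars$goals > 100, ]
high_scorers
```

This approach keeps working when the data change, whereas fixed row numbers such as 1:2 would need to be updated by hand.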
What if you wanted to do all of the coding (creating variables, vectors and tibbles in R) again? Or if you just wanted to change things a little bit?
You could go back and retype each line of code above, but it would be so much easier if you had a draft of the code to start from
This is where R scripts come in. You can create a draft of your code as you go by using an R script. An R script is just a plain text file that you save R code in. You can open an R script in RStudio by going to File > New File > R script in the menu bar. RStudio will then open a fresh script above your console pane, as shown in the Figure below.
I strongly encourage you to write and edit all of your R code in a script before you run it in the console. Why? This habit creates a reproducible record of your work. When you’re finished for the day, you can save your script and then use it to rerun your entire analysis the next day. Scripts are also very handy for editing and proofreading your code, and they make a nice copy of your work to share with others. To save a script, click the scripts pane, and then go to File > Save As in the menu bar.
Figure: When you open an R Script (File > New File > R Script in the menu bar), RStudio creates a fourth pane above the console where you can write and edit your code.
RStudio comes with many built-in features that make it easy to work with scripts. First, you can automatically execute a line of code in a script by clicking the Run button.
R will run whichever line of code your cursor is on. If you have a whole section highlighted, R will run the highlighted code. Alternatively, you can run the entire script by clicking the Source button. Don’t like clicking buttons? You can use Control + ENTER as a shortcut for the Run button. On Macs, that would be Command + Return.
Figure: You can run a highlighted portion of code in your script if you click the Run button at the top of the scripts pane. You can run the entire script by clicking the Source button.
If you’re not convinced about scripts, you soon will be. It becomes a pain to write multi-line code in the console’s single-line command line. Let’s avoid that headache and open your first script now before we move on.
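The Source button has a code equivalent, the built-in source() function, which runs every line of a saved script. The sketch below is an illustration only: it writes a tiny two-line script to a temporary file (so that nothing on your computer is overwritten) and then runs it. In practice you would pass the path of your own saved script instead.

```r
# Create a temporary file path ending in .R (used here only for the demonstration)
script_path <- tempfile(fileext = ".R")

# Write two lines of R code into that file
writeLines(
  c("goals <- c(123, 106, 79)",
    "goal_mean <- mean(goals)"),
  script_path
)

# Run the whole script, line by line
source(script_path)

# The objects created by the script are now available
goal_mean
```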
Extract function
RStudio comes with a tool that can help you build functions. To use it, highlight the lines of code in your R script that you want to turn into a function. Then click Code > Extract Function in the menu bar. RStudio will ask you for a function name to use and then wrap your code in a function call. It will scan the code for undefined variables and use these as arguments.
You may want to double-check RStudio’s work. It assumes that your code is correct, so if it does something surprising, you may have a problem in your code.
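To give a feel for the idea, the sketch below shows a piece of code before and after it is wrapped in a function. It is written by hand and is not RStudio's actual output, which may be laid out differently. The function name standardise and the argument name values are hypothetical.

```r
goals <- c(123, 106, 79)

# Before: a calculation written directly in the script
(goals - mean(goals)) / sd(goals)

# After: the same calculation wrapped in a function,
# with the vector supplied as an argument
standardise <- function(values) {
  (values - mean(values)) / sd(values)
}

# The function can now be reused on any numeric vector
standardise(goals)
```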
library(palmerpenguins)
head(penguins)
# A tibble: 6 × 8
  species island    bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
  <fct>   <fct>              <dbl>         <dbl>             <int>       <int>
1 Adelie  Torgersen           39.1          18.7               181        3750
2 Adelie  Torgersen           39.5          17.4               186        3800
3 Adelie  Torgersen           40.3          18                 195        3250
4 Adelie  Torgersen           NA            NA                  NA          NA
5 Adelie  Torgersen           36.7          19.3               193        3450
6 Adelie  Torgersen           39.3          20.6               190        3650
# ℹ 2 more variables: sex <fct>, year <int>
plot(penguins$body_mass_g,penguins$bill_length_mm)
There are several types of variables in R:
numeric vector (numbers)
character vector (words/letters)
factor vector/variable - this is for categorical variables (see below for an example)
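As a quick sketch of the three types side by side, the code below creates one vector of each and asks R for its class. The size vector and its categories are made up purely for illustration.

```r
# Numeric vector (numbers)
goals <- c(123, 106, 79)
class(goals)

# Character vector (words/letters)
players <- c("Ronaldo", "Messi", "Neymar")
class(players)

# Factor: a categorical variable with a fixed set of categories (levels)
# "size" is a hypothetical example variable
size <- factor(c("small", "large", "small"))
class(size)

# levels() lists the categories; table() counts how often each one occurs
levels(size)
table(size)
```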