1. Introduction
This is the first tutorial of the workshop. Its aim is to help you familiarise yourself with R, R Studio and the console.
2. Installing R and R Studio
You can go directly to Section 3 if you have already installed R and R Studio.
We will be using R Studio, but R Studio only works if R is already installed, so both need to be installed.
2.1. Installing R
R is free and can be installed on Windows, Mac and Linux. Go to the R project website. Once there, click on Download R and select a CRAN mirror (you can choose any).
Now follow the instructions and you will have R installed on your own computer.
2.2. Installing R Studio
Go to the R Studio website, choose the free version and download the file according to your operating system.
Once you have installed both R and R Studio, you don’t need to interact at all with R. You can do everything you need from within R Studio.
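Note: R Studio is an interface to R, so everything that R can do is available from within R Studio, whereas R on its own lacks the extra tools that R Studio adds, such as the “Environment” tab.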
3. Getting started with R Studio
3.1. Working directory
Set the working directory. This is the folder where R looks for files to read and where it saves output by default. In a real-life situation, this is where you saved your data.
setwd("copy/the/Path/to/your/folder/here")
If you are unsure where your current working directory is, you can use the function getwd(), which literally means: “get working directory”. The working directory will be printed on your console.
getwd()
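The following is a minimal sketch of switching folders safely. It assumes a stand-in folder: tempdir() plays the role of your own data folder, and the names old_wd and data_dir are only illustrative. Note that on Windows, paths copied from the file explorer contain backslashes, which need to be replaced with forward slashes (or doubled) inside an R string.

```r
# Remember where we started so we can come back afterwards
old_wd <- getwd()

# tempdir() is only a stand-in here; replace it with the path to your own folder
data_dir <- tempdir()

# Only change directory if the folder really exists
if (dir.exists(data_dir)) {
  setwd(data_dir)
}

getwd()        # prints the new working directory
list.files()   # lists the files R can see in that folder

# Go back to the original working directory
setwd(old_wd)
```

Checking with dir.exists() first avoids the error that setwd() raises when the path is mistyped.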
3.2. Open a new script
Go to the top left corner, underneath the “File” menu, click on the icon with a plus sign and choose “R Script”. That will open a new script for you to type your commands.
3.3. Installing packages
Packages can be installed with code or through the “Tools” menu in the R Studio menu bar. You only need to install a package once (unless you change computer or need an updated version).
The code for installing packages is as follows:
# One-time installation
install.packages('tidyverse')
install.packages('haven')
install.packages("foreign")
# You can also install more than one package at a time by wrapping the package names in c(), which combines (concatenates) them into a single vector
install.packages(c("tidyverse", "haven"))
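Because installation is only needed once, one possible approach is to install only the packages that are not yet on your computer. The sketch below is an illustration rather than a required step: the helper missing_packages and the object to_install are hypothetical names, and the interactive() check is an assumption added so that nothing is downloaded when the script runs unattended.

```r
# Hypothetical helper: returns the packages in 'pkgs' that are not installed
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "haven", "foreign"))

if (length(to_install) == 0) {
  message("All packages are already installed.")
} else if (interactive()) {
  # Only download when you are running the code yourself in R Studio
  install.packages(to_install)
} else {
  message("Still to install: ", paste(to_install, collapse = ", "))
}
```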
3.4. Loading packages
Installing packages is the first step, but we still need to add another step to actually be able to use them. Unfortunately, this has to be done each time you start a new session in R or R Studio.
Every time you start a new data analysis project, it is good practice to begin your script by loading the packages that you want to use. You don’t need to load them all at once, but it makes for tidier and more efficient coding if you do so.
When packages are installed, they are downloaded to a library (a folder on your computer), so when we want to load a package into R, we use the function library.
# Loading packages (load the packages in every session!)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.1
## -- Attaching packages --------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.2
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.1
## -- Conflicts ------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(haven)
## Warning: package 'haven' was built under R version 3.6.1
library(foreign)
## Warning: package 'foreign' was built under R version 3.6.1
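The “Conflicts” lines in the output above mean that dplyr defines functions called filter and lag, which now take precedence over the functions of the same name in the stats package. As a small sketch, the package::function notation lets you state explicitly which version you want. The object name moving_avg is only illustrative.

```r
# stats::filter() is the time-series filter that ships with R.
# Writing the package name in front guarantees we get this version,
# whether or not dplyr has been loaded.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))  # 3-point moving average
moving_avg
```

The same notation works the other way round: dplyr::filter() always refers to the dplyr version.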
3.5. Data types and structures
Create variables (vectors) of length 1, using the assignment arrow (<-):
a <- 3
b <- 5
c <- 9
We can perform basic operations with variables:
b*c
## [1] 45
We can assign the result to a new variable called “d”:
d <- b * c / a
and then see the value of the variable “d”:
d
## [1] 15
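The variables above are vectors of length 1. As a short sketch of how the same ideas extend to longer vectors and to other data types, the example below uses illustrative names (values, groups, flags) that are not part of the workshop data.

```r
values <- c(3, 5, 9)                # numeric vector of length 3
groups <- c("low", "mid", "high")   # character vector
flags  <- values > 4                # logical vector, one TRUE/FALSE per element

length(values)   # number of elements
class(values)    # "numeric"
class(groups)    # "character"
class(flags)     # "logical"

values * 2       # operations are applied element by element
```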
4. Loading/Importing data into R
The function we use to import data into R depends on the format of the data. Data in .dat, .txt or .csv formats can be read using base R functions (no additional package is needed), but data in Stata, SPSS or SAS format, to name a few, require a dedicated package.
The package haven can be used to easily import Stata and SPSS data files.
Each package has a specific function for each file format (for example, haven provides read_dta for Stata files and read_sav for SPSS files). You can check the package documentation or do a simple Google search to find the function you need.
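As a hedged illustration of reading a Stata file with haven, the sketch below first writes a tiny made-up dataset to a temporary .dta file and then reads it back, so that it runs without any external data. The objects tmp_dta and stata_data and the toy values are assumptions for illustration only; with your own data you would pass the path of your file to read_dta.

```r
# Only run if haven is installed (see Section 3.3)
if (requireNamespace("haven", quietly = TRUE)) {
  # Create a small temporary Stata file to have something to read
  tmp_dta <- tempfile(fileext = ".dta")
  haven::write_dta(data.frame(id = 1:3, score = c(2.5, 3.0, 4.5)), tmp_dta)

  # Stata files are read with read_dta()
  stata_data <- haven::read_dta(tmp_dta)
  print(stata_data)

  # For SPSS .sav files the matching haven function is read_sav(),
  # and for SAS files it is read_sas().
}
```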
The following example reads in a .dat file from: Wright, D. and London, K. (2011). Modern Regression Techniques Using R, London: Sage. (Chapter 8). We use the base R function read.delim, as follows:
mydata <- read.delim("26934_exercise.dat")
You can also read online data directly if you specify the link, as follows:
(This is from the companion website of Wright and London’s book. It is the same data as before, so you can use either of the two commands.)
mydata <- read.delim("https://us.sagepub.com/sites/default/files/upm-binaries/26934_exercise.dat")
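read.delim expects tab-separated values, while comma-separated files are usually read with read.csv. The sketch below shows both on tiny made-up files written to a temporary location, so it runs without downloading anything. The file contents and the names tmp_csv, tmp_tab, csv_data and tab_data are illustrative assumptions.

```r
# A small comma-separated file
tmp_csv <- tempfile(fileext = ".csv")
writeLines(c("id,score", "1,2.5", "2,3.0", "3,4.5"), tmp_csv)
csv_data <- read.csv(tmp_csv)      # read.csv() splits columns on commas
csv_data

# The same data, tab-separated
tmp_tab <- tempfile(fileext = ".dat")
writeLines(c("id\tscore", "1\t2.5", "2\t3.0", "3\t4.5"), tmp_tab)
tab_data <- read.delim(tmp_tab)    # read.delim() splits columns on tabs
tab_data
```

Both functions treat the first line of the file as the column names by default.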
5. Explore the data
If you want to browse the data to see what it contains, you can simply click on the “Environment” tab and then click on the corresponding object. Alternatively, you can use the following code:
View(mydata)
Or you can print the first six rows of data by typing:
head(mydata)
## wcond class w1sn02 w2sn02 w1int w2int w1att w2att z1pbc z2pbc sqw1
## 1 1 1 7 7 7.0 7.0 7.0 7.0 7.00 7.00 2.5495098
## 2 1 1 5 3 5.5 3.5 6.0 4.0 4.75 4.25 2.3452079
## 3 1 1 2 2 2.5 1.5 4.5 3.5 1.75 2.50 0.7071068
## 4 1 1 4 2 3.5 1.5 4.0 2.5 4.25 2.75 0.7071068
## 5 1 1 2 2 3.0 1.5 4.5 3.0 5.25 3.50 1.2247449
## 6 1 1 2 4 6.0 3.0 5.5 4.5 3.75 2.75 1.8708287
## sqw2
## 1 2.5495098
## 2 2.3452079
## 3 0.7071068
## 4 0.7071068
## 5 0.7071068
## 6 2.1213203
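Besides View and head, a few other base R functions give a quick overview of a data frame. The sketch below applies them to a small data frame called toy (an illustrative name) that copies two columns from the six rows printed above, so that it runs without the full dataset. With the real data you would call the same functions on mydata.

```r
# Toy data frame built from the w1int and w2int values shown in head(mydata)
toy <- data.frame(
  w1int = c(7.0, 5.5, 2.5, 3.5, 3.0, 6.0),
  w2int = c(7.0, 3.5, 1.5, 1.5, 1.5, 3.0)
)

dim(toy)     # number of rows and number of columns
names(toy)   # column (variable) names
str(toy)     # type of each column and its first few values
```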
To obtain descriptive statistics, you can type:
mean(mydata$sqw2, na.rm = TRUE) # na.rm = TRUE removes missing values before computing the mean
## [1] 1.78919
sd(mydata$sqw2, na.rm = TRUE) # sd() gives the standard deviation
## [1] 0.5633153
A much faster way of obtaining more detailed information about a variable is to use the summary function:
summary(mydata$sqw2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7071 1.5811 1.8708 1.7892 2.1213 3.2404
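To see why the na.rm argument matters, here is a minimal sketch on a made-up vector x containing one missing value. The vector is only an illustration and is not taken from the dataset.

```r
x <- c(1, 2, NA, 3)

mean(x)                 # NA, because one value is missing
mean(x, na.rm = TRUE)   # 2, the mean of the remaining values 1, 2 and 3
sd(x, na.rm = TRUE)     # standard deviation of the remaining values

summary(x)              # summary() also reports how many values are missing
```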
In the next practical, you will get to use more functions. This is just the first taster.
Bonus track
If this is the first time you’ve been coding, then you need to be “initiated”. Type the following in your R script:
myfirstprogramme <- function() {
  print("Hello World!")
}
And then type:
myfirstprogramme()
## [1] "Hello World!"
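As a small optional extension, functions can also take arguments. The sketch below uses a hypothetical function called greet, which builds its message from whatever name you give it.

```r
# 'name' is an argument: a value supplied when the function is called
greet <- function(name) {
  paste0("Hello ", name, "!")
}

greet("World")   # returns "Hello World!"
```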
Welcome to the world of coding!

Patricio Troncoso
patricio.troncoso@manchester.ac.uk