MCD Workshop 1

Ibrahim Inal

Hello World!

Meet the Team

Dr Ibrahim Inal

Lecturer in Economics

Prof Anurag Banerjee

Lecturer in Economics

Mr. George Cheatle

PhD in Economics

Meet each other!


This module

Data science and computation is a huge field, and it is impossible to master every aspect of it by taking one module or reading a single book.

  • Get to know some fundamental principles and tools
  • How to use R

You will have

  • two group projects
  • an individual project

details later…

You will probably be disappointed…

Instead I hope to convince you…

And yes I know it was hoping

Instead getting

Hello R!

R and RStudio

  • R is an open-source statistical programming language
  • R is also an environment for statistical computing and graphics
  • It’s easily extensible with packages

  • RStudio is a convenient interface for R called an IDE (integrated development environment), e.g. “I write R code in the RStudio IDE”
  • RStudio is not a requirement for programming with R, but it’s very commonly used by R programmers and data scientists

Help for R and RStudio

The RStudio Help menu contains links to many documents for help with both R (select R Help) and RStudio (see RStudio Docs and RStudio Community Forum).

I particularly like the Cheatsheets, which are compact documents crammed with useful information on how to use various products made by the RStudio group.

RStudio Cheatsheet

Open the cheatsheet for RStudio by selecting the Help menu -> Cheatsheets -> RStudio IDE Cheat Sheet. Note the cheatsheet will usually be downloaded in a web browser as a .pdf.

R Packages

  • Packages include reusable R functions, the documentation that describes how to use them and sample data
  • You can find more details and the total number of packages at https://cran.r-project.org/web/packages/
  • To use packages in R, we must first install them with the install.packages() function, which typically downloads the package from CRAN and installs it for use. Installing once is enough, but after upgrading R to a new version you may need to install your packages again
install.packages("tidyverse", dependencies = TRUE)
  • Once a package is installed, load it in each new R session with library() or require()
library(tidyverse)
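
A common pattern, sketched here on the assumption that a script should work whether or not a package is already present, is to install only when needed and then load. The package name is a placeholder: "stats" ships with R, so this snippet runs without downloading anything; swap in "tidyverse" or any other CRAN package.

```r
# Placeholder package name; "stats" ships with R, so nothing is downloaded here.
pkg <- "stats"

# requireNamespace() returns TRUE if the package is installed, without attaching it.
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, dependencies = TRUE)
}

# character.only = TRUE tells library() that pkg holds the package name as a string.
library(pkg, character.only = TRUE)

# Check that the package is now attached to the search path.
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)
```

Installing happens once per machine, while library() has to be called in every new session.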

Vignettes*

Many packages include vignettes – longer, tutorial style guides for a package.

vignette()

Tip

View the “Introduction to dplyr” vignette by issuing the command vignette("dplyr").

Potential Issues

  • You may want to check whether your operating system is 32-bit or 64-bit
    • For Windows: This PC > Properties
    • For Mac: System Profile/Info > Software > Contents
    • For Linux: run getconf LONG_BIT in the terminal
  • If you have an older system, then you may need to look for an older version of both R and RStudio

An Alternative Way

You could also use R and RStudio from


https://appsanywhere.durham.ac.uk/login

R operators

Arithmetic operators

1+1  #sum of two numbers
[1] 2
x<-1  # assignment of x variable
y<-1 #assignment of y variable
x+y #sum of variables
[1] 2

Below is a list of arithmetic operators in R.

Arithmetic Operators in R
Operator   Description            Example     Result
+          Addition               2 + 3       5
-          Subtraction            5 - 2       3
*          Multiplication         4 * 5       20
/          Division               10 / 2      5
^          Exponentiation         2^3         8
%%         Modulus (Remainder)    10 %% 3     1
%/%        Integer Division       10 %/% 3    3
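
The modulus and integer-division operators are easiest to understand together. The short sketch below uses made-up values a and b to show how they relate, and adds one precedence case that often surprises newcomers.

```r
a <- 17   # illustrative values, not from the workshop
b <- 5

quotient  <- a %/% b   # integer division: how many whole times b fits into a
remainder <- a %% b    # modulus: what is left over

quotient               # 3
remainder              # 2

# The two operators are linked: quotient * b + remainder recovers a.
recovered <- quotient * b + remainder
recovered == a         # TRUE

# Precedence: ^ binds more tightly than the unary minus.
neg_square <- -2^2     # read as -(2^2), so -4
pos_square <- (-2)^2   # 4
```

When in doubt about precedence, adding parentheses costs nothing and makes the intent explicit.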

Logical operators

1 == 1
[1] TRUE

Caution

\(=\) does the same thing as <-, i.e., it is an assignment operator. \(==\) tests whether two values are equal.

Logical Operators in R
Operator   Description                 Example         Result
&          Logical AND                 TRUE & FALSE    FALSE
|          Logical OR                  TRUE | FALSE    TRUE
!          Logical NOT                 !TRUE           FALSE
==         Equal to                    5 == 5          TRUE
!=         Not equal to                5 != 3          TRUE
>          Greater than                3 > 2           TRUE
>=         Greater than or equal to    3 >= 3          TRUE
<          Less than                   4 < 6           TRUE
<=         Less than or equal to       4 <= 4          TRUE
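
These operators work element by element on vectors, which is what makes them useful later for picking out parts of your data. A small sketch with a made-up vector of marks:

```r
scores <- c(35, 62, 48, 90)              # hypothetical marks

passed    <- scores >= 40                # FALSE  TRUE  TRUE  TRUE
mid_range <- scores >= 40 & scores < 70  # FALSE  TRUE  TRUE FALSE
extreme   <- scores < 40 | scores >= 90  #  TRUE FALSE FALSE  TRUE
failed    <- !passed                     #  TRUE FALSE FALSE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the TRUEs.
n_passed <- sum(passed)                  # 3

# Comparing decimals with == can mislead because of floating-point rounding.
exact_equal <- 0.1 + 0.2 == 0.3                   # FALSE
near_equal  <- isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
```

For decimal numbers, all.equal() is usually the safer comparison, since it allows a small numerical tolerance.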

In-built functions

sqrt(144)          # square root function
abs(-1)            # absolute value function
round(1.2)         # round to a whole number
exp(1)             # exponential function
log(1, base = 10)  # logarithm in base 10
rep(5, 3)          # repeat something
?log               # help function
q()                # quit function

Note

Note that in order to use package-specific functions you need to install the package and load it with library().
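
To see what these functions return, here is a short sketch with the results stored in variables. The variable names are only for illustration; the comments show what R prints.

```r
root     <- sqrt(144)                    # 12
absolute <- abs(-1)                      # 1
whole    <- round(1.2)                   # 1
two_dp   <- round(3.14159, digits = 2)   # 3.14
half     <- round(2.5)                   # 2: R rounds halves to the nearest even number

e_value  <- exp(1)                       # 2.718282
nat_log  <- log(exp(1))                  # 1: log() is the natural logarithm by default
log_base <- log(100, base = 10)          # 2
log_ten  <- log10(100)                   # 2: shortcut for base 10

fives    <- rep(5, 3)                    # 5 5 5
```

The optional arguments (digits, base) change the behaviour of a function; ?round and ?log list them.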

Practice

Open up a script file by clicking File > New File > R Script. Write #This is my first script at the top and carry out the following:

  1. Assign the number 11 to a variable called x and the number 5 to a variable called y. Find the sum of x and y.
  2. Now assign 11 to a variable called z. Find x+y. Find the remainder when you divide z by y.
  3. Remove the variable z by using rm(). Try getting help about rm().

R Objects

R works with objects.

Any object in R is of a particular type, is stored in a particular way, and belongs to a particular class. The first two relate to how R handles the object; the last is based on how the object is used. Note that in most texts the distinction between data and object types, storage and classes is not clear and depends on the context. You can read more on this at https://stackoverflow.com/questions/6258004/types-and-classes-of-variables.

You can think of class as the structure of the object from a programming perspective and of type as the structure from R’s perspective.

Caution

This discussion is specific to R and does not carry over to other programming languages.

Class Types in R
Class Type   Description                              Example
character    Character/String data                    name <- "John"
numeric      Numeric data (real numbers)              price <- 12.34
integer      Integer data (whole numbers)             age <- 30L
complex      Complex data (real + imaginary parts)    z <- 3 + 2i
logical      Boolean/Logical data (TRUE/FALSE)        is_valid <- TRUE

Caution

A class is a blueprint for the object. class() gives the class of a variable.
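
A quick way to get a feel for classes is to ask R directly. The sketch below calls class() on a few illustrative variables; note that a whole number is only stored as an integer when written with the L suffix.

```r
name     <- "John"
price    <- 12.34
age      <- 30L       # the L suffix asks R for an integer
count    <- 30        # without L, a whole number is still numeric
z        <- 3 + 2i
is_valid <- TRUE

class(name)       # "character"
class(price)      # "numeric"
class(age)        # "integer"
class(count)      # "numeric"
class(z)          # "complex"
class(is_valid)   # "logical"
```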

Types in R
Types         Description                                Example
logical       Boolean/Logical data (TRUE/FALSE)          is_valid <- TRUE
integer       Integer data (whole numbers)               age <- 30L
double        Numeric data (real numbers)                price <- 12.34
complex       Complex data (real + imaginary parts)      z <- 3 + 2i
character     Character/String data                      name <- "John"
raw           Raw bytes                                  binary_data <- charToRaw("hello")
list          List of objects                            my_list <- list(1, "hello", TRUE)
NULL          Null object (no value)                     x <- NULL
closure       Function/Closure                           add_numbers <- function(a, b) { return(a + b) }
special       Special function types                     special_func <- `if`
builtin       Basic functions and operators              sum_builtin <- sum
environment   Environment (symbol and value bindings)    env <- new.env()

Caution

We can check the type of a variable with the typeof() function. We can change the type of a variable to type x with the corresponding as.x function. This process is called “coercion”, e.g., as.numeric("123").
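
The difference between type and class, and the effect of coercion, can be seen in a few lines. This is a minimal sketch with made-up variable names; the comments show what R returns.

```r
price <- 12.34
typeof(price)        # "double": how R stores it
class(price)         # "numeric": how R treats it

df <- data.frame(a = 1:2)
typeof(df)           # "list": a data frame is stored as a list of columns
class(df)            # "data.frame"

typeof(sum)          # "builtin"
typeof(mean)         # "closure"

# Coercion with the as.x family
num  <- as.numeric("123")      # 123
chr  <- as.character(12.34)    # "12.34"
int  <- as.integer(3.9)        # 3: the decimal part is dropped, not rounded
lgl  <- as.logical("TRUE")     # TRUE

# Text that is not a number becomes NA (R also issues a warning, silenced here).
bad  <- suppressWarnings(as.numeric("abc"))
is.na(bad)                     # TRUE
```

Coercion that cannot succeed does not stop your script; it quietly produces NA with a warning, so it is worth checking the result.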

In addition to these abstract structures, there are native objects in R. Since R is primarily designed for statistical analysis, its objects are generally data.

Data Structures in R and Examples
Data Structure   Example
vector           x <- c(1, 2, 3, 4)
matrix           mat <- matrix(1:6, nrow = 2)
array            arr <- array(1:24, dim = c(2, 3, 4))
list             my_list <- list(1, "hello", TRUE)
data frame       df <- data.frame(Name = c("Alice", "Bob"), Age = c(25, 30))
factor           gender <- factor(c("Male", "Female", "Male"))
table            tbl <- table(c("A", "A", "B", "C", "C", "C"))
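
Factors, tables and arrays are probably the least familiar entries here, so the following sketch looks inside the three examples. The helper names (codes, count_c, last_cell) are only for illustration.

```r
# A factor stores categories as integer codes plus a set of labels (levels).
gender <- factor(c("Male", "Female", "Male"))
levels(gender)                   # "Female" "Male": alphabetical by default
codes <- as.integer(gender)      # 2 1 2

# A table counts how often each value occurs.
tbl <- table(c("A", "A", "B", "C", "C", "C"))
names(tbl)                       # "A" "B" "C"
count_c <- as.vector(tbl["C"])   # 3

# An array is a matrix with more than two dimensions.
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)                         # 2 3 4
last_cell <- arr[2, 3, 4]        # 24: row 2, column 3, layer 4
```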

Data structures: Vectors, matrices & others

Vectors can be generated by many functions.

x <- numeric()               # initiate an empty numeric vector
y <- c(5, 6, 7)               # generate a vector by connecting scalars via c() ("concatenate")
z <- c(5, "test", 7)          # does not work as intended: every element is coerced to character
typeof(x)
[1] "double"
typeof(y)
[1] "double"
typeof(z)
[1] "character"
is.numeric(y)                 # check whether y is numeric
[1] TRUE
is.integer(y)                 # ... but it's not integer
[1] FALSE
y <- as.integer(y)            # unless we declare it as such
is.integer(y)                 
[1] TRUE
is.numeric(z)                 # check whether z is numeric
[1] FALSE
# sequences
i <- 5:7                      # short hand for integer vectors
is.integer(i)                 # ... is by definition integer
[1] TRUE
mode(i)
[1] "numeric"
typeof(i)
[1] "integer"
i == y                        # are i and y the same?
[1] TRUE TRUE TRUE
i == as.numeric(y)            # still the same after converting y back to numeric
[1] TRUE TRUE TRUE
-5:5                          # also works with negative numbers
 [1] -5 -4 -3 -2 -1  0  1  2  3  4  5
5:-5                          # ... or backwards      
 [1]  5  4  3  2  1  0 -1 -2 -3 -4 -5
seq(from = 10, to = 12.5, by = .5)  # sequence with equal increments
[1] 10.0 10.5 11.0 11.5 12.0 12.5
seq(from = 10, to = 12.5, length.out = 10) # ... sequence with predefined length (implicit increment) 
 [1] 10.00000 10.27778 10.55556 10.83333 11.11111 11.38889 11.66667 11.94444
 [9] 12.22222 12.50000
# repetitions
rep(x = 5, times = 3)           # concatenates an argument some times with each other
[1] 5 5 5
rep(x = c(6,7), times=3)        # the argument can also be a vector
[1] 6 7 6 7 6 7
rep(x = c(6,7), each = 3)       # elementwise repetition
[1] 6 6 6 7 7 7

Matrices can be generated directly with the matrix() function or by binding vectors together.

x <- matrix(1:6, ncol=2)        # construct matrix with 2 columns filled columnwise (default)
matrix(1:6, ncol=2, byrow=TRUE) # ... or row by row
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
matrix(1:6, nrow=2)             # matrix with 2 rows
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
ncol(x)                         # number of columns
[1] 2
nrow(x)                         # number of rows
[1] 3
# constructing matrices from vectors 
x <- cbind(1:6, 2:7, 3:8)       # bind vectors column-wise 
dim(x)                          # reports dimensions of matrix
[1] 6 3
x <- rbind(1:6, 2:7, 3:8)       # row-wise binding
is.matrix(x)                    # check whether x is a matrix
[1] TRUE
# arrays
x <- array(1:12, dim=c(2,2,3))
dim(x)
[1] 2 2 3
# problem: mixing data types coerces every element to character
x <- cbind(1:3, c("rest", "test", "nest"))
is.matrix(x)
[1] TRUE
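
The last example deserves a closer look: the result is a matrix, but the numbers in it are no longer numbers. A small sketch of what happened and how to get the numbers back (the names m and first_col are illustrative):

```r
m <- cbind(1:3, c("rest", "test", "nest"))

typeof(m)       # "character": a matrix can hold only one type
m[1, 1]         # "1", a string rather than the number 1

# Arithmetic on the first column only works after converting it back.
first_col <- as.numeric(m[, 1])
first_col + 1   # 2 3 4
```

When columns genuinely need different types, a data frame is usually the better container.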

Most mathematical operations on vectors and matrices can be carried out in R.

# vectors
x <- 1:5          # initialize a vector
y <- 6:10
x - y             # subtraction
[1] -5 -5 -5 -5 -5
x * y             # elementwise multiplication
[1]  6 14 24 36 50
x / y             # elementwise division
[1] 0.1666667 0.2857143 0.3750000 0.4444444 0.5000000
x %*% y           # dot (or inner) product
     [,1]
[1,]  130
rev(y)            # reverse order
[1] 10  9  8  7  6
outer(x,y)        # outer product
     [,1] [,2] [,3] [,4] [,5]
[1,]    6    7    8    9   10
[2,]   12   14   16   18   20
[3,]   18   21   24   27   30
[4,]   24   28   32   36   40
[5,]   30   35   40   45   50
sum(x)            # sum of a vector
[1] 15
# matrices
x <- matrix(1:9, ncol = 3)
y <- matrix(10:18, ncol = 3)

x * y             # elementwise multiplication
     [,1] [,2] [,3]
[1,]   10   52  112
[2,]   22   70  136
[3,]   36   90  162
x %*% y           # matrix multiplication
     [,1] [,2] [,3]
[1,]  138  174  210
[2,]  171  216  261
[3,]  204  258  312
t(x)              # transpose matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
sum(x)            # sum of the matrix
[1] 45
#solve(x)         # inverse of a matrix (commented out: this x is singular, so solve(x) fails)
diag(x)           # extracts diagonal elements of x 
[1] 1 5 9
lower.tri(x)      # logical mask of the lower triangle of x (upper.tri() also exists)
      [,1]  [,2]  [,3]
[1,] FALSE FALSE FALSE
[2,]  TRUE FALSE FALSE
[3,]  TRUE  TRUE FALSE
rowSums(x)        # calculates sums of rows 
[1] 12 15 18
colSums(x)        # calculates sums of columns
[1]  6 15 24
rowMeans(x)       # calculates means of rows 
[1] 4 5 6
colMeans(x)       # calculates means of columns
[1] 2 5 8
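
The matrix built from 1:9 is singular, so it has no inverse. To see solve() at work, here is a sketch with a small invertible matrix; A and b are made-up values chosen only for illustration.

```r
A <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
det(A)                                 # 5: non-zero, so A can be inverted

A_inv <- solve(A)                      # inverse of A
A_inv %*% A                            # identity matrix, up to rounding error

# solve() with a second argument solves the linear system A %*% sol = b.
b   <- c(1, 2)
sol <- solve(A, b)
A %*% sol                              # reproduces b
```

Solving the system directly with solve(A, b) is generally preferable to computing the inverse and multiplying.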

Data frames are like matrices, but they can contain variables of multiple types.

x <- data.frame(a = 1:3, b = c("rest", "test", "nest"), c = c(T,F,T))     # a data frame with 3 columns named a, b, and c
is.matrix(x)
[1] FALSE
is.data.frame(x)
[1] TRUE
dim(x)            # dimension of data frame
[1] 3 3
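
Because each column keeps its own type, a data frame can be inspected and subset column by column. A short sketch using the same data, with a different variable name (df) and stringsAsFactors = FALSE set explicitly so that it behaves the same on older versions of R:

```r
df <- data.frame(a = 1:3,
                 b = c("rest", "test", "nest"),
                 c = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

col_classes <- sapply(df, class)   # a: "integer", b: "character", c: "logical"

df$a                     # one column as a vector: 1 2 3
df[["b"]]                # same idea, using the column name as a string
df[2, ]                  # the second row, still a data frame

# Rows where column c is TRUE, keeping only column b
kept <- df[df$c, "b"]    # "rest" "nest"

str(df)                  # compact overview of the structure
```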

Lists are one of the most flexible ways to handle data.

x <- list(a = 1:3, b = "nest", c = TRUE)  # a  simple list
x <- list(a = 1:3, b = "nest", c = list(d = "test", e = rep(x = c(TRUE,FALSE), each = 3)))  # a more complicated list
is.list(x)
[1] TRUE
length(x)  
[1] 3
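
Elements of a nested list are reached by chaining $ or [[ ]]. The sketch below rebuilds the more complicated list and extracts a few pieces; the names inner_d, inner_e and n_true are only for illustration.

```r
x <- list(a = 1:3,
          b = "nest",
          c = list(d = "test", e = rep(x = c(TRUE, FALSE), each = 3)))

names(x)                      # "a" "b" "c"

inner_d <- x$c$d              # "test"
inner_e <- x[["c"]][["e"]]    # TRUE TRUE TRUE FALSE FALSE FALSE

length(x$c)                   # 2: the inner list has two elements
n_true <- sum(inner_e)        # 3

str(x)                        # prints the whole structure, level by level
```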

Practice

  1. Open a new script or continue to your script from the previous practice part.
  2. Assign the character string "1" to x and the character string "2" to y.
  3. Find the result of x+y. Comment on your result.
  4. Is there a way to change the result of x+y? If you find a way, then implement it and call the new variable z.
  5. Find the square root of z.
  6. Remove all the variables you created.
  7. Print ECON1181.
  8. Create a vector of integers from \(-1\) to \(2\) (inclusive) and assign it to the variable u
  9. Create another vector which consists of the first \(5\) multiples of \(2\) (i.e., \((2,4,6,...)\)) and assign it to the variable w.
  10. What is the result of z <- c(1, 2, "3")? Why does R return this result?

Indexing

Indexing lets you extract elements from data structures.

vec<-10:20 #creating vector
vec[2]     # getting the second element of vec
[1] 11
vec[6:8]    # getting the elements with indices 6,7 and 8
[1] 15 16 17
vec[c(6,8)]  #getting the elements with indices 6 and 8
[1] 15 17
vec[-c(6,8)] #getting all the elements except with indices 6 and 8
[1] 10 11 12 13 14 16 18 19 20
M<-matrix(1:12,nrow=3)
M
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
M[1,2]
[1] 4
M[c(1,2),c(3,4)]
     [,1] [,2]
[1,]    7   10
[2,]    8   11
registry<-list( students=c("Jack","Jill","George"), attendance=c(0.2,0.6,0.5), marks=c("B", "B", "C"))
registry$students
[1] "Jack"   "Jill"   "George"
registry[["students"]]
[1] "Jack"   "Jill"   "George"
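
Two further points are worth a sketch: a logical condition can be used as an index, and single and double brackets behave differently on lists. The example reuses vec and a shortened registry from above; big, even, positions and regulars are illustrative names.

```r
vec <- 10:20

big       <- vec[vec > 17]         # 18 19 20: keep elements where the condition is TRUE
even      <- vec[vec %% 2 == 0]    # 10 12 14 16 18 20
positions <- which(vec > 17)       # 9 10 11: positions rather than values

registry <- list(students   = c("Jack", "Jill", "George"),
                 attendance = c(0.2, 0.6, 0.5))

class(registry["students"])        # "list": single brackets return a smaller list
class(registry[["students"]])      # "character": double brackets return the element itself

# A condition on one element can index another.
regulars <- registry$students[registry$attendance >= 0.5]   # "Jill" "George"
```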

Practice

  1. Create a list that contains name, age and techniques. In your list, have one person named John Wick who is 36 years old and uses Judo, Kali, Jujitsu, Gun-fu and Krav Maga.
  2. Get Kali by using indexing.
  3. What code can you use to find out how many techniques John Wick knows?
  4. Consider the following vectors:
person <- c(
  "Abe", "Bet", "Can", 
  "Dev", "Esme"
)
numberKids <- c(2, 1, 0, 2, 3)
yearsEducation <- c(12, 16, 13, 14, 18)
hasPets <- c(FALSE, FALSE, TRUE, TRUE, FALSE)
    1. Write a command that produces the names of people who have more than 1 child. (Hint: use indexing on the person vector)

    2. Write a command that produces the values of yearsEducation that are at least 13. (Hint: use indexing on the yearsEducation vector)

    3. Write a command that produces the names of people who have more than 1 child and at least 13 years of education. (Hint: use the previous two)

    4. Write a command that says whether or not there is someone who has more than 15 years of education and at least one child, but doesn’t have any pets. (Hint: you could use any(). Also note that ! means "not" in R)

    5. Write a command that says whether or not every person has more than 13 years of education. (Hint: you could use the all() function)

    6. Look up the which() function in the help. Write code that uses which() to extract some information from these vectors.

RStudio

RMarkdown anatomy

Within RStudio, click on the menu File -> New File -> R Markdown.... In the pop up window, give the document a ‘Title’ and enter the ‘Author’ information (your name) and select HTML as the default output. Note that HTML files can be opened with any browser.

R Projects

You can create a project to keep your files organised. A project folder can contain anything (images, data, scripts, etc.) related to your project.

To create a project, open RStudio and select File -> New Project… from the menu.

Alternatively, create a new project by clicking on the ‘Project’ button in the top right of RStudio and selecting ‘New Project…’

In the next window, select New Project

You can name your project and set its working directory, which is the place where R looks for things such as data and images. Try getwd().
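
Once a project sets the working directory, files can be referred to by paths relative to it. A minimal sketch, assuming a hypothetical data folder containing a file called survey.csv (neither is part of the workshop material):

```r
# Where is R currently looking for files?
wd <- getwd()
print(wd)

# Build a path relative to the working directory.
# "data" and "survey.csv" are hypothetical names used only for illustration.
data_path <- file.path("data", "survey.csv")
print(data_path)

# Check whether the file is there before trying to read it.
file.exists(data_path)
```

Relative paths like this keep working when the whole project folder is moved or shared, which is one of the main reasons to use projects.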

You could also see your projects within RStudio in the Files tab.

Main ideas

  • We have learned the basic environment of R and RStudio.
  • We have met the types and classes in R.
  • We have learned some operators and assigning variables with <-.
  • We learned about basic functions on vectors and matrices.
  • We learned indexing and using logical operators for subsetting vectors.
