Introduction

All the regular mathematical operations can be carried out. The usual symbols and order of operations apply.

4 + 5
4 * 5
4 / 5
4^5

sqrt(5)
log(5)

test <- 2*3
test

test = 2/3
test
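The order of operations can be checked with a few small calculations. This is a minimal sketch; the variable names are made up for illustration. Exponentiation is evaluated before multiplication and division, which come before addition and subtraction, and brackets override the default order.

```r
# Multiplication is evaluated before addition
no_brackets <- 2 + 3 * 4       # 14
# Brackets force the addition to happen first
with_brackets <- (2 + 3) * 4   # 20
# Exponentiation is evaluated before the unary minus
neg_power <- -2^2              # -4
# Brackets change the result
power_of_neg <- (-2)^2         # 4

no_brackets
with_brackets
neg_power
power_of_neg
```

When in doubt about how R will read an expression, adding brackets costs nothing and makes the intention explicit.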

Practice

  • Calculate the average of 2, 5, 6, 8, 9, 15, 1, 3
  • Calculate the difference between 225 and 200 as a percentage change

Objects

R uses objects. Among the main objects covered here are strings, logicals, vectors, matrices and data frames.

When things go wrong, it is often because an object has the wrong class. For example, you could be trying to add words (“strings”) together. Addition with + only works on an integer or a numeric.
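One way to find out which class an object has is the class function. The sketch below is illustrative only; the object names five_as_text and five_as_number are made up for the example.

```r
# class() reports the class of an object
class(5)      # "numeric"
class(5L)     # "integer" (the L suffix marks an integer)
class("5")    # "character", the class R uses for strings
class(TRUE)   # "logical"

# A string that holds a number must be converted before doing arithmetic
five_as_text <- "5"
five_as_number <- as.numeric(five_as_text)
five_as_number + 1   # 6
```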

String

For example, a string:

a <- "My name"
a
## [1] "My name"

Practice

  • Assign your first and last name to the objects a and b
  • Add a to b

Logical (boolean, true or false)

Logical expressions test whether a statement is true or false.

1 == 1
## [1] TRUE
1 == 2 
## [1] FALSE
1 & 2 == 1
## [1] FALSE
1 | 2 == 1
## [1] TRUE
1 > 2
## [1] FALSE
1 < 2
## [1] TRUE
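The results for 1 & 2 == 1 and 1 | 2 == 1 depend on operator precedence: == is evaluated before & and |, and a non-zero number counts as TRUE in a logical operation. The sketch below spells out the grouping with brackets; the variable names are made up for illustration.

```r
# R reads 1 & 2 == 1 as 1 & (2 == 1), that is TRUE & FALSE
implicit_and <- 1 & 2 == 1
explicit_and <- 1 & (2 == 1)
identical(implicit_and, explicit_and)   # TRUE

# R reads 1 | 2 == 1 as 1 | (2 == 1), that is TRUE | FALSE
implicit_or <- 1 | 2 == 1
explicit_or <- 1 | (2 == 1)
identical(implicit_or, explicit_or)     # TRUE

# To compare each number with 1, write both comparisons in full
both_equal_one <- (1 == 1) & (2 == 1)     # FALSE
either_equal_one <- (1 == 1) | (2 == 1)   # TRUE
```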

Vectors and matrices

Numerics and strings can be concatenated or combined to form a vector or matrix.

mynumbers <- c(3,5,6,7,9)
mynumbers
## [1] 3 5 6 7 9
mynumbers <- c(1:10)
mynumbers
##  [1]  1  2  3  4  5  6  7  8  9 10
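The c function works for strings as well as numbers, and it can join existing vectors. A short sketch follows; the vectors fruit, low, high and mixed are made up for illustration. Note that a single vector holds one class only.

```r
# c() also builds vectors of strings
fruit <- c("apple", "pear", "plum")
length(fruit)    # 3

# c() can join existing vectors together
low <- c(1, 2, 3)
high <- c(8, 9)
both <- c(low, high)
both             # 1 2 3 8 9

# Mixing numbers and strings converts everything to character
mixed <- c(1, "a")
class(mixed)     # "character"
```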

A number of functions can be used to create vectors. For example, to create a sequence, use the seq function.

a <- seq(0.5, 2.5, length.out = 100)
a
##   [1] 0.5000000 0.5202020 0.5404040 0.5606061 0.5808081 0.6010101 0.6212121
##   [8] 0.6414141 0.6616162 0.6818182 0.7020202 0.7222222 0.7424242 0.7626263
##  [15] 0.7828283 0.8030303 0.8232323 0.8434343 0.8636364 0.8838384 0.9040404
##  [22] 0.9242424 0.9444444 0.9646465 0.9848485 1.0050505 1.0252525 1.0454545
##  [29] 1.0656566 1.0858586 1.1060606 1.1262626 1.1464646 1.1666667 1.1868687
##  [36] 1.2070707 1.2272727 1.2474747 1.2676768 1.2878788 1.3080808 1.3282828
##  [43] 1.3484848 1.3686869 1.3888889 1.4090909 1.4292929 1.4494949 1.4696970
##  [50] 1.4898990 1.5101010 1.5303030 1.5505051 1.5707071 1.5909091 1.6111111
##  [57] 1.6313131 1.6515152 1.6717172 1.6919192 1.7121212 1.7323232 1.7525253
##  [64] 1.7727273 1.7929293 1.8131313 1.8333333 1.8535354 1.8737374 1.8939394
##  [71] 1.9141414 1.9343434 1.9545455 1.9747475 1.9949495 2.0151515 2.0353535
##  [78] 2.0555556 2.0757576 2.0959596 2.1161616 2.1363636 2.1565657 2.1767677
##  [85] 2.1969697 2.2171717 2.2373737 2.2575758 2.2777778 2.2979798 2.3181818
##  [92] 2.3383838 2.3585859 2.3787879 2.3989899 2.4191919 2.4393939 2.4595960
##  [99] 2.4797980 2.5000000
b <- seq(1,10,1)
b
##  [1]  1  2  3  4  5  6  7  8  9 10
mynumbers <- 1:12
m <- matrix(mynumbers, nrow = 4)
m
##      [,1] [,2] [,3]
## [1,]    1    5    9
## [2,]    2    6   10
## [3,]    3    7   11
## [4,]    4    8   12

Extract elements from the matrix by using [x, y], where x is the row and y is the column. Therefore m[1, 1] is the top left-hand corner, m[ , 1] selects all of the first column and m[2, ] selects all of the second row.
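The indexing rules can be tried on the 4 by 3 matrix from the example. This is a minimal sketch; the byrow argument and the name m_by_row are additions for illustration and show how to fill the matrix row by row instead of column by column.

```r
# Rebuild the 4 x 3 matrix; values fill column by column by default
m <- matrix(1:12, nrow = 4)

dim(m)     # 4 3 (rows, columns)
m[1, 1]    # 1, the top left-hand corner
m[, 1]     # 1 2 3 4, all of the first column
m[2, ]     # 2 6 10, all of the second row

# byrow = TRUE fills row by row instead
m_by_row <- matrix(1:12, nrow = 4, byrow = TRUE)
m_by_row[2, ]   # 4 5 6
```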

Practice

  • Create a matrix with 3 rows and 3 columns with numbers running from 1 to 9
  • Extract the top right and bottom left numbers

Data frames

Matrices can only contain one type of object. Data frames can group together vectors of various classes. These will be the most useful for us, as we want time series with dates and numbers.

Individual components of a vector, matrix or data frame can be identified by using square brackets and a pair of numbers, with the first giving the row and the second giving the column. A vector needs only one number.

Students <- c('Rob', 'James', 'Sam', 'Jane')
Marks <- c(20, 45, 65, 52)
myclass <- data.frame(Students, Marks)
myclass[, 1]
## [1] "Rob"   "James" "Sam"   "Jane"
myclass[, 2]
## [1] 20 45 65 52
mean(myclass[, 2])
## [1] 45.5
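To illustrate how a data frame keeps a different class in each column, here is a small sketch with a date column and a numeric column. The data frame prices, its column names and its values are all made up for illustration.

```r
# Hypothetical data: three dates and three made-up closing prices
prices <- data.frame(
  date  = as.Date(c("2024-01-02", "2024-01-03", "2024-01-04")),
  close = c(101.5, 102.25, 99.75)
)

# Each column keeps its own class
sapply(prices, class)   # date is "Date", close is "numeric"

# [row, column] picks out a single value
prices[2, 2]            # 102.25

# A vector needs only one index
prices$close[3]         # 99.75
```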

It is also possible to select the columns of a data frame by name. Use $ after the data frame name.

myclass$Students
## [1] "Rob"   "James" "Sam"   "Jane"
myclass$Marks
## [1] 20 45 65 52
myclass$Students == 'Rob'
## [1]  TRUE FALSE FALSE FALSE
myclass[myclass$Students == 'Rob', ]
##   Students Marks
## 1      Rob    20
myclass$Marks > 50
## [1] FALSE FALSE  TRUE  TRUE
myclass[myclass$Marks > 50, ]
##   Students Marks
## 3      Sam    65
## 4     Jane    52
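Row selection works because the comparison returns one TRUE or FALSE per row, and only the rows marked TRUE are kept. Conditions can be combined with & and |. A short sketch, which rebuilds myclass so that it runs on its own; the name keep is made up for illustration.

```r
Students <- c("Rob", "James", "Sam", "Jane")
Marks <- c(20, 45, 65, 52)
myclass <- data.frame(Students, Marks)

# One TRUE or FALSE per row: marks above 40 and below 60
keep <- myclass$Marks > 40 & myclass$Marks < 60
keep                      # FALSE TRUE FALSE TRUE

# Rows marked TRUE are kept
myclass[keep, ]           # the rows for James and Jane

# The same logical vector can index a single column
myclass$Students[keep]    # "James" "Jane"
```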

Practice

  • Extract the marks for Jane
  • Extract those marks that are below 50

Getting help

There is a lot of help available. Because R is open source, it has a large community where help can be found. You might try an AI assistant such as ChatGPT or a dedicated help forum such as Stack Overflow.

You can search online for functions and assistance. Help is also built in to R and RStudio.
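The built-in help can be reached from the console. The sketch below shows a few of the standard help functions; mean and the search terms are arbitrary choices for illustration, and how the help pages are displayed depends on whether you are working in RStudio or in a plain R session.

```r
# Open the help page for a function you know by name
?mean
help("mean")

# Search the installed documentation for a keyword
help.search("standard deviation")

# List attached functions and objects whose names contain "norm"
norm_names <- apropos("norm")
norm_names

# Run the examples from the bottom of a help page
example(mean)
```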

Practice

Plotting

The plot function creates graphs from your data. Plotting is the best way to understand the data that you have, and it can also reveal errors in the data.

The plot function needs two series of the same length. If you get an error, it is frequently because the series are of different lengths. You can add more lines by using the lines function. Here we create two series and plot them.

x <- 1:10
y1 <- x^2
y2 <- 2 * x^2
plot(x, y2)     # points for the larger series set the axis range
lines(x, y1)    # add the second series as a line on the same axes
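A quick length check before plotting can save time when an error appears. The sketch below is illustrative: y_short is a deliberately shortened series, and tryCatch is used only so that the script carries on after the error.

```r
x <- 1:10
y_full <- x^2
y_short <- y_full[1:9]   # one value missing on purpose

# Compare the lengths before plotting
length(x) == length(y_full)    # TRUE
length(x) == length(y_short)   # FALSE

# plot() stops with an error when the lengths differ;
# tryCatch() captures the error so the script keeps running
result <- tryCatch(plot(x, y_short), error = function(e) e)
inherits(result, "error")      # TRUE
conditionMessage(result)       # the text of the error message
```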

A number of parameters can be used to customise the plot, such as main, xlab, ylab, col, lwd and lty.

Practice

Using the same x, y1 and y2 variables, see if you can create a plot with a heading and alternative labels for the x and y axis. If you are adventurous you can change the colour, weight and type of lines.

Histogram

It is also possible to create histograms. Here we use the rnorm function to generate 100 normal random variables and calculate their mean and standard deviation. We then use those two values to draw the matching normal density curve over the histogram.

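z <- rnorm(100)                     # 100 draws from a standard normal
hist(z, probability = TRUE, col = 'cornflowerblue')   # density scale, not counts
mu <- mean(z)
sig <- sd(z)
x <- seq(-4, 4, length.out = 500)   # grid of points for the curve
y <- dnorm(x, mu, sig)              # normal density with the sample mean and sd
lines(x, y, col = 'red')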
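Because rnorm draws new random numbers on every call, the histogram and the estimated mean and standard deviation change from run to run. The sketch below shows one way to make the draws repeatable. The set.seed function is not used in the example above, and the seed value 123 is an arbitrary choice.

```r
# Fix the random number generator so the same draws are produced
set.seed(123)
z_first <- rnorm(100)

set.seed(123)
z_second <- rnorm(100)

# Both runs give exactly the same numbers
identical(z_first, z_second)   # TRUE

# The sample mean and standard deviation are close to,
# but not exactly, the theoretical values of 0 and 1
mean(z_first)
sd(z_first)
```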

Practice

  • Repeat the last exercise using 100,000 generated normal random variables. What do you notice?
  • Take a look at the function rnorm and work out what it does and the alternatives like dnorm, qnorm
  • Find the midpoint of a standard normal distribution using the qnorm function and then check the answer by using pnorm. Remember that the probability of being below the midpoint of a normal distribution is 50%, and that a standard normal distribution has a mean of zero and a standard deviation of one.