## What is R

• R is

“a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc”

• It has a 'base' system with thousands of additional packages
• You need to install the base system, and then any further packages
• The Comprehensive R Archive Network (CRAN) is at http://cran.r-project.org/index.html

• From CRAN click on
• Install R for the first time
• Both versions compatible
• If using Windows
• Right-click on the R icon to bring up properties
• Add --internet2 to the end of the Target string on the Shortcut tab
• Click on OK
• Installing Rstudio
• https://www.rstudio.com/products/rstudio/download/

## R Basics

• Inside R (most) data and results are stored in objects
• Objects can be scalars, vectors, matrices, text strings, lists or more generic objects
• Generic objects such as:
• data frames
• spatial objects similar to shapefiles
• graphs
• R is a programming language allowing new functions to be created

## Vectors and Functions

heights <- c(4.7, 5.3, 6.2, 3.8, 4.4, 7.1, 2.5)
heights
## [1] 4.7 5.3 6.2 3.8 4.4 7.1 2.5
length(heights)
## [1] 7
sum(heights)
## [1] 34
mean(heights)
## [1] 4.857143
maxHeight <- max(heights)
maxHeight
## [1] 7.1

## Subsets of Vectors - 1

heights
## [1] 4.7 5.3 6.2 3.8 4.4 7.1 2.5
heights[5]
## [1] 4.4
heights[3:6]
## [1] 6.2 3.8 4.4 7.1
heights[c(3,6,1)]
## [1] 6.2 7.1 4.7
heights[c(1,1,2,1,6)]
## [1] 4.7 4.7 5.3 4.7 7.1

## Subsets of Vectors - 2

heights
## [1] 4.7 5.3 6.2 3.8 4.4 7.1 2.5
heights[c(-1,-4)]
## [1] 5.3 6.2 4.4 7.1 2.5
heights > 6
## [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
which(heights > 6)
## [1] 3 6
heights[heights > 6]
## [1] 6.2 7.1

## Testing conditions

if (heights[1] > 6) cat("Large\n") else cat("Small\n")
## Small
# Can spread this over lines
if (heights[6] > 6)
cat("Large\n") else
cat("Small\n")
## Large
# Can do multiple instructions
if (heights[1] < 6) {
cat('Number is very small\n')
cat('Here are all the heights in order',sort(heights))
}
## Number is very small
## Here are all the heights in order 2.5 3.8 4.4 4.7 5.3 6.2 7.1

## Repetition - Loops 1

N <- length(heights)
for (i in 1:N) {
if (heights[i] > 6) {
cat(" Tree", i, "- tall.")
} else {
cat(" Tree", i, "- short.")
}
}
##  Tree 1 - short. Tree 2 - short. Tree 3 - tall. Tree 4 - short. Tree 5 - short. Tree 6 - tall. Tree 7 - short.

Determininistic loop - you know in advance how many cycles (N here)

## Repetition - Loops 2

z <- 1
count <- 0
repeat {
if (z > 100) break
z <- z * 2
count <- count + 1
cat(z,' ')
}
## 2  4  8  16  32  64  128
cat('\n It took',count,' doubles to exceed 100.')
##
##  It took 7  doubles to exceed 100.

Non-determininistic loop - you don't know in advance how many cycles

## Basic Types of Data

• Sometimes called levels of measurement
• Continuous (numerical)
• ratio data (has a well defined zero) - eg height
• interval data (no well defined zero) - eg date
• Categorical (factor)
• Ordinal data (rankable) - eg 'strongly disagree', … ,'strongly agree'
• Nominal data - (no intrinsic order) eg 'red', 'blue', 'yellow'
• Influences the type of analysis and visualisation

## Factors in R

x <- c("MacOS","Linux","Windows","Windows","MacOS")
xf <- factor(x)
xf # Nominal
## [1] MacOS   Linux   Windows Windows MacOS
## Levels: Linux MacOS Windows
y <- c("good","very good","good","average","poor","very poor","average")
yf <- factor(y,levels=c("very poor","poor","average","good","very good"),ordered=TRUE)
yf # Ordinal
## [1] good      very good good      average   poor      very poor average
## Levels: very poor < poor < average < good < very good
yf[1] > y[3]
## [1] FALSE
xf[1] > xf[3]
## Warning in Ops.factor(xf[1], xf[3]): '>' not meaningful for factors

## Graphics

• R has a rich range of tools for visualising data
• Visualisation is often an initial step in data exploration
• Here are some simple examples - first a demonstration data set
head(mtcars,n=6)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

mtcars is a built-in data frame: note that the columns are are different types: character, real, integer. We can use some simple exploratory visualisations…

hist(mtcars$mpg,main="Fuel Consumption") ## Boxplot boxplot(mtcars$mpg,main="Fuel Consumption")

• Shows median, quartiles, and the range

## Boxplot: Comparing Categories

boxplot(mpg~cyl,main='Fuel Consumption',xlab='Cylinders',data=mtcars)

## Boxplot: Nicer Horizontally

boxplot(mpg~cyl,main='Fuel Consumption',xlab='Cylinders',data=mtcars, horizontal=TRUE)

Note boxplot also finds outliers

## Relationships: Scatterplot

plot(mtcars$wt,mtcars$mpg,main="MPG vs Weight")

Heavier cars have poorer fuel consumption

## Space and Time Data

• There is an equally rich variety of approaches for handling spatial and temporal data in R
• These will be introduced over the next few weeks
• Data sets
• Rainfall records 1850 - 2014
• Crime data