dschneiderch R-Journal

GEOG 5023: Quantitative Methods In Geography

Your R-Journal should be a list of functions or techniques that you've used to complete labs and in-class assignments. It will be useful to include worked examples in code.

Setting up RStudio Environment

setwd("path")  #use this to set working directory of input files.
read.csv("filename")  #to import a csv file as a data frame; assign the result to a name to keep it. There is also read.table etc.
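As a small worked example, the sketch below writes a toy data frame to a temporary CSV file and reads it back. The column names (`id`, `value`) and the object names are made up for illustration and are not from the course data.

```r
# Hypothetical toy data; the column names are made up for this example.
toy <- data.frame(id = 1:3, value = c(2.5, 3.1, 4.8))

# Write to a temporary file so the example does not depend on any real path.
csv_path <- tempfile(fileext = ".csv")
write.csv(toy, csv_path, row.names = FALSE)

# read.csv() returns a data frame; assign it to a name to keep it.
dat <- read.csv(csv_path)
str(dat)  # shows the column names and types that were read in
```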

Use LaTeX syntax to write out equations nicely:

$$\bar{x} = \frac{\sum_{i=1}^N x_i}{N}$$

\[ \bar{x} = \frac{\sum_{i=1}^N x_i}{N} \]

Writing functions

mystd <- function(data) {
    d <- (data - mean(data))^2
    return(sqrt(mean(d)))
}

mystd(somedata)  #to execute the function on your data
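A quick check of the function above, assuming a small made-up vector for `somedata`. Note that `mystd` divides by N (the population standard deviation), while R's built-in `sd()` divides by N - 1, so the two differ slightly.

```r
# Population standard deviation (divides by N), repeated so the block runs alone.
mystd <- function(data) {
    d <- (data - mean(data))^2
    return(sqrt(mean(d)))
}

somedata <- c(2, 4, 4, 4, 5, 5, 7, 9)  # toy values chosen for illustration

mystd(somedata)  # population SD: 2
sd(somedata)     # sample SD (divides by N - 1), slightly larger

# The two differ only by a factor of sqrt((N - 1) / N).
n <- length(somedata)
all.equal(mystd(somedata), sd(somedata) * sqrt((n - 1) / n))  # TRUE
```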

Plotting

# In R Markdown, use the chunk options {r fig.width=7, fig.height=6} to set figure size
plot(x, y)
# or, with the formula interface
plot(y ~ x)

plot(y = MN$YCoord, x = MN$XCoord, col = MN$HD, pch = 16, cex = 0.5, main = "Correlation Example r= .8")
#'col' sets the color of each dot according to the value in the 'HD' column
#'pch = 16' sets the symbol to a solid dot
#'cex = 0.5' makes the dot half the normal size
#'main' gives the plot a title

plot(linearModel)  # plots 4 diagnostic graphs to help assess model fit visually. precede with par(mfrow=c(2,2)) to see all at once.
plot(linearModel, which = 1:2)  #use the 'which' argument to choose which diagnostic plots to draw, here the first two

layout(matrix(c(1, 2), 1, 2))  #similar to subplot in matlab. sets up a matrix with 1 row, 2 cols for plots.
par(mfrow = c(1, 2))  #alternative that gives the same 1 row, 2 col layout; use one or the other

dev.off()  #closes the current graphics device, which resets the layout so the next plot is a single plot
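The layout and diagnostic-plot notes above can be combined as in the sketch below. It uses R's built-in `cars` data as a stand-in for course data, and the names `fit` and `old_par` are chosen for this example. Saving the value returned by `par()` is one way to restore the layout without closing the device.

```r
# Built-in 'cars' data (speed, dist) stands in for course data.
fit <- lm(dist ~ speed, data = cars)

old_par <- par(mfrow = c(2, 2))  # 2 x 2 grid; par() returns the previous settings
plot(fit)                        # the four default diagnostic plots
par(old_par)                     # restore the previous single-plot layout
```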

Summary and Accessing Variables (Objects)

summary()  #min max mean etc
names()  #header names
object[, 2]  # all rows, 2nd column (square brackets index; parentheses call a function)
object$col2  # all rows, 2nd column by header name

c(1, 2, 3, 4)  # combine to make a vector. R vectors have no row or column orientation
matrix(c(1, 2, 3, 4, 5, 6), 3, 2)  # 3 rows, 2 columns, filled column by column

head(obj)  #displays the first 6 rows of an object by default; head(obj, n) shows the first n
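A short worked example of the indexing notes above, using a toy matrix and a hypothetical data frame (`m`, `df`, and the column names are made up for illustration).

```r
m <- matrix(c(1, 2, 3, 4, 5, 6), 3, 2)  # filled column by column
m[, 2]        # all rows, 2nd column: 4 5 6
m[2, ]        # 2nd row, all columns: 2 5

df <- data.frame(col1 = 1:3, col2 = c("a", "b", "c"))  # hypothetical toy data frame
df[, 2]       # same values as df$col2
names(df)     # "col1" "col2"
head(df, 2)   # first 2 rows only
```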

## The line below selects all rows of MN whose Block value appears in our
## vector of blocks and saves the result as a new object, HDB.
HDB <- MN[MN$Block %in% blocks, ]

## arithmetic on parts of an object selected by a condition: here AssessTot
## divided by BldgArea, using only rows where BldgArea > 0
HDB_in_sqft <- HDB_in[HDB_in$BldgArea > 0, "AssessTot"] /
    HDB_in[HDB_in$BldgArea > 0, "BldgArea"]
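The two subsetting patterns above can be tried on a toy table. The data frame below is a made-up stand-in for `MN` (all values are hypothetical), and `has_area` and `HDB_sqft` are names introduced for this sketch.

```r
# Hypothetical stand-in for the MN parcel table; all values are made up.
MN <- data.frame(
    Block     = c(10, 10, 20, 30, 40),
    AssessTot = c(500, 900, 1200, 300, 800),
    BldgArea  = c(100, 0, 400, 150, 200)
)
blocks <- c(10, 30)

# Keep rows whose Block value appears in 'blocks'.
HDB <- MN[MN$Block %in% blocks, ]

# Assessed value per unit of building area, skipping rows with zero
# building area to avoid dividing by zero.
has_area <- HDB$BldgArea > 0
HDB_sqft <- HDB[has_area, "AssessTot"] / HDB[has_area, "BldgArea"]
HDB_sqft  # 5 2
```

Storing the condition once in `has_area` avoids repeating it on both sides of the division.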

Conditionals

# comparison operators: < > <= >= == !=
inHD <- MN[MN$HD == 1, ]  #places all rows of MN with a 1 in the HD column into a new object, inHD
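One caveat worth a worked example: if the column contains missing values, a comparison returns NA for those rows and logical indexing keeps them as all-NA rows. The sketch below uses a made-up `MN` with one missing `HD` value; wrapping the condition in `which()` is one common way to drop such rows.

```r
# Toy data; HD is a 0/1 flag with one missing value.
MN <- data.frame(HD = c(1, 0, NA, 1), AssessTot = c(10, 20, 30, 40))

inHD <- MN[MN$HD == 1, ]
nrow(inHD)        # 3: the NA comparison yields an extra all-NA row

inHD_clean <- MN[which(MN$HD == 1), ]  # which() keeps only the TRUE positions
nrow(inHD_clean)  # 2

outHD <- MN[which(MN$HD != 1), ]       # rows outside the district
nrow(outHD)       # 1
```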

QCing data

is.na(MN[1:100, "HistDist"])  #returns TRUE for each of the first 100 rows of MN$HistDist that is NA (missing), FALSE otherwise
ifelse(is.na(MN[1:100, "HistDist"]), 0, 1)  #returns 0 where the value is NA and 1 where it is not. Usually assign to a new object
as.factor(obj)  #tells R obj is not numeric data but a categorical factor. summary() then gives counts per level
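The QC calls above, run on a short made-up vector (the district names and object names are hypothetical) to show what each one returns.

```r
HistDist <- c("Greenwich Village", NA, "SoHo", NA)  # hypothetical values

is.na(HistDist)                      # FALSE TRUE FALSE TRUE
sum(is.na(HistDist))                 # 2 missing values

HD <- ifelse(is.na(HistDist), 0, 1)  # 1 0 1 0
HD_factor <- as.factor(HD)

summary(HD)         # numeric summary: min, quartiles, mean, max
summary(HD_factor)  # counts per level instead
levels(HD_factor)   # "0" "1"
```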

Statistical Tests

mean()
sd()  #sample standard deviation (divides by N - 1)
sqrt()
shapiro.test()  #test for normality
wilcox.test()  # Wilcoxon rank-sum test to compare two distributions
t.test(x = inHD$AssessTot, y = outHD$AssessTot)  # 2 sample t test, two sided. Defaults to the Welch test (unequal variances)
cor(x, y)  #correlation coefficient (Pearson by default)
cor.test(x, y)  #correlations with more info
lm(y ~ x + I(x^n))  #linear regression. Wrap powers in I(), e.g. I(x^2), for higher order polynomial terms
anova(lm1, lm2)  #compare nested regression models with an F test. With a single model, anova(lm1) gives the sequential ANOVA table testing each term in lm1
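A compact worked example of several of the calls above, using the built-in `cars` and `sleep` data sets as stand-ins for course data. The object names (`lm1`, `lm2`, `g1`, `g2`) are chosen for this sketch.

```r
# Built-in 'cars' data: stopping distance (dist) against speed.
lm1 <- lm(dist ~ speed, data = cars)               # straight line
lm2 <- lm(dist ~ speed + I(speed^2), data = cars)  # adds a quadratic term

summary(lm1)$r.squared  # proportion of variance explained by lm1
anova(lm1, lm2)         # F test: does the quadratic term improve the fit?

cor(cars$speed, cars$dist)       # Pearson correlation coefficient
cor.test(cars$speed, cars$dist)  # adds a p-value and confidence interval

# Built-in 'sleep' data: compare the two groups, treated here as independent samples.
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]
t.test(x = g1, y = g2)  # Welch two-sample t test, two sided
shapiro.test(g1)        # normality check on one group
```

For a regression with one predictor, the squared correlation equals the model's R-squared, which gives a quick consistency check between `cor()` and `lm()`.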