In this practical you will use regression, probably the standard method for constructing predictive models. You will learn how to set up a regression, consider some of the key issues in model construction (selection), and develop local, kernel-based regression models using GWR (Brunsdon et al 2016; Fotheringham et al 2002).
You will find Chapters 5 and 8 of An Introduction to R for Spatial Analysis and Mapping by Chris Brunsdon and Lex Comber useful for this practical.
The aim of this practical is to develop your competency and expertise in generating inference from data and to understand some of the limitations of classic statistical inference in the analysis of big data. This practical provides a smooth introduction to the sessions next week on Machine Learning and Outlier detection.
To do this you will use the georgia dataset in the GISTools package and the GWmodel package, which is curated and supported by the original GWR team. The code below will make sure you have all of the R packages you will need installed. The tests check whether a given package is in the list of installed packages, and install it if not. Package installation only needs to be done once - not every time you use R - so if you are sure you have all of the packages installed already, you can skip this step.
# set a mirror
options(repos = c(CRAN = "http://cran.rstudio.com"))
# test for package existence and install
if (!is.element("tidyverse", installed.packages()))
  install.packages("tidyverse", dep = T)
if (!is.element("GISTools", installed.packages()))
  install.packages("GISTools", dep = T)
if (!is.element("GWmodel", installed.packages()))
  install.packages("GWmodel", dep = T)
if (!is.element("rgdal", installed.packages()))
  install.packages("rgdal", dep = T)
if (!is.element("tmap", installed.packages()))
  install.packages("tmap", dep = T)
if (!is.element("knitr", installed.packages()))
  install.packages("knitr", dep = T)
if (!is.element("car", installed.packages()))
  install.packages("car", dep = T)
if (!is.element("gclus", installed.packages()))
  install.packages("gclus", dep = T)
if (!is.element("kableExtra", installed.packages()))
  install.packages("kableExtra", dep = T)
Once we are sure all the packages are installed, you need to load them into the current session:
# load into the R session
library(tidyverse)
library(GISTools)
library(GWmodel)
library(rgdal)
library(tmap)
library(knitr)
library(car)
library(gclus)
library(kableExtra)
You should load the georgia data, examine it, and assign a selection of the variables to a data frame df:
library(GISTools)
data(georgia)
# have a look!
class(georgia)
head(georgia@data)
# assign selected variables to df
df <- as_tibble(georgia@data[, c(14,4:9)])
df$MedInc <- df$MedInc/1000
You will see that the df object is in tibble format and contains a number of variables for the counties in Georgia from the 1990 census, including the percentage of the population in each county that is rural (PctRural), has a degree (PctBach), is elderly (PctEld), is foreign born (PctFB), is classed as being in poverty (PctPov) or is black (PctBlack), and the median income of the county (MedInc, in 1000s of dollars). Below is a quick look at the first few records of the data:
MedInc | PctRural | PctBach | PctEld | PctFB | PctPov | PctBlack |
---|---|---|---|---|---|---|
32.152 | 75.6 | 8.2 | 11.43 | 0.64 | 19.9 | 20.76 |
27.657 | 100.0 | 6.4 | 11.77 | 1.58 | 26.0 | 26.86 |
29.342 | 61.7 | 6.6 | 11.11 | 0.27 | 24.1 | 15.42 |
29.610 | 100.0 | 9.4 | 13.17 | 0.11 | 24.8 | 51.67 |
36.414 | 42.7 | 13.3 | 8.64 | 1.43 | 17.5 | 42.39 |
41.783 | 100.0 | 6.4 | 11.37 | 0.34 | 15.1 | 3.49 |
To see this in R, enter
head(df)
Our aim in this practical is to build a model that describes the median income using the factors listed above. We will use this to understand a number of key features in regression.
We should have a look at the data to see what kinds of properties it has - we could use previously introduced functions such as summary to summarise the attributes. You may wish to do this now. But in terms of model selection, what we are interested in is the degree to which the different variables are good predictors of Median Income.
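For example, a quick numerical overview of each variable can be obtained with:
# summarise each of the variables in df
summary(df)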
In order to examine the relationships between the variables in df we can plot the continuous data against each other and use the cor function to examine collinearity amongst them:
plot(df, cex = 0.5, col = grey(0.145,alpha=0.5))
Correlations between numeric variables.
Another useful tool here is to show the upper triangle of the scatterplot matrix with smoothed trend lines. These are achieved with lowess curve fits (Cleveland, 1979) - smooth (but possibly curved) bivariate trend lines - and provide a good way of judging by eye whether there is collinearity in the predictor variables. Essentially, a straight-line shaped trend with not too much scattering of the individual points suggests collinearity might be an issue. When two predictors are correlated it can be difficult to identify whether it is one or the other (or both) that influences the quantity to be predicted. The code below does this. Note that as collinearity amongst the predictors is the key concern here, we focus on these, ignoring MedInc for this graphic. The upper.panel=panel.smooth argument causes the lowess curves to be added.
plot(df[,-1], cex = 0.5, col = grey(0.145,alpha=0.5), upper.panel=panel.smooth)
Visually, it seems that there may be some collinearity between PctPov, PctBlack and PctEld.
A numerical table of these relationships as correlations can be generated with the kable function:
library(knitr)
tab <- round(cor(df), 3)
kable(tab, booktabs = T, format = 'markdown')
MedInc | PctRural | PctBach | PctEld | PctFB | PctPov | PctBlack | |
---|---|---|---|---|---|---|---|
MedInc | 1.000 | -0.169 | 0.523 | -0.615 | 0.320 | -0.818 | -0.536 |
PctRural | -0.169 | 1.000 | -0.619 | 0.390 | -0.547 | 0.174 | -0.069 |
PctBach | 0.523 | -0.619 | 1.000 | -0.458 | 0.672 | -0.402 | -0.109 |
PctEld | -0.615 | 0.390 | -0.458 | 1.000 | -0.483 | 0.568 | 0.297 |
PctFB | 0.320 | -0.547 | 0.672 | -0.483 | 1.000 | -0.329 | -0.112 |
PctPov | -0.818 | 0.174 | -0.402 | 0.568 | -0.329 | 1.000 | 0.736 |
PctBlack | -0.536 | -0.069 | -0.109 | 0.297 | -0.112 | 0.736 | 1.000 |
What we are looking for in the plots, and to be confirmed in the table, are relationships that look like straight lines, showing that values in one variable change systematically with changes in another.
A final useful tool here is Catherine Hurley's gclus package. This is similar to the data frame plotting, but provides tools to reorder the variables so that the most strongly correlated pairs are closer to the diagonal in the plots. It can also colour-code the plots to show whether the correlations are positive (red), negative (blue) or near to zero (yellow). For this plot we focus on the predictors again. This is shown below.
library(gclus)
# correlation matrix of the predictors (MedInc excluded)
cor.mat <- cor(df[,-1])
# reorder the variables and colour the panels by correlation
cpairs(df[,-1], order.single(cor.mat), dmat.color(cor.mat))
The order of the variables along the diagonal generally puts positively correlated variables as near to each other as possible. The ordering here is based on a clustering method (single linkage) provided via the order.single function in gclus - the colouring is based on correlations (via the dmat.color function in gclus).
Note, you could also write the table out to a .csv file:
write.csv(tab, file = "tab.csv")
The plot and the table of correlations above show that some of the variables are highly correlated (< -0.8 or > +0.8) with each other and with the response variable MedInc, and others less so. This provides us with confidence that we can construct a reasonable model from this data - at least some of the variables are correlated with MedInc - but care may be needed in interpreting some of the regression parameters due to some collinearity in the predictors.
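If you want to pick out the strongly correlated pairs programmatically, a minimal sketch using the tab matrix created above (the 0.8 cutoff is just the rule of thumb mentioned here):
# identify pairs of variables with |correlation| > 0.8 (ignoring the diagonal)
strong <- abs(tab) > 0.8 & abs(tab) < 1
which(strong, arr.ind = TRUE)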
We are interested in a predictive model of income and it is useful to see how that value varies. The code below plots histograms of the variable MedInc. Note the use of the parameter prob: left at its default (prob = F) the histogram shows frequency counts, whereas prob = T shows densities for each histogram bin. The latter makes the \(y\)-axis compatible with a kernel density estimate (a smooth estimate of the probability distribution of \(x\)), so the two may be overlaid. Here this is done by using the lines function with a kernel density estimate for the distribution given by density. A rug plot is added to show the exact locations of the values on the \(x\)-axis.
The graphs were obtained with the following code:
hist(df$MedInc, xlab='Median Income ($1000\'s)', main='Histogram of Median Income', col = 'DarkRed')
hist(df$MedInc, prob=T, xlab='Median Income ($1000\'s)', main='Histogram of Median Income', col='NavyBlue')
lines(density(df$MedInc,na.rm=T),col='salmon',lwd=2)
rug(jitter(df$MedInc))
# qqPlot(df$MedInc,main='Normal QQ plot of Median Income')
You may wish to explore some of the other variables in the same way. For example:
hist(df$PctFB, prob=T, xlab='Proportion of Foreign-Born Population (%)', main='Histogram of % Foreign-Born',col='peru')
lines(density(df$PctFB,na.rm=T),col='olivedrab',lwd=2)
rug(jitter(df$PctFB))
# qqPlot(df$PctBach,main='Normal QQ plot of % with a Degree')
As an aside, you will see R offers quite a range of colours - including a few I had never heard of - ‘peru’ is a colour?! - these can be helpful in making plots more visible. Entering colors() gives the full list of named R colours - and is used in an addendum at the end of this session to create a visual guide.
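For example, to pick out just the named colours that contain ‘blue’:
# all named R colours containing 'blue'
grep("blue", colors(), value = TRUE)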
So we have some data. We have had a look at some of the variables, seen how some of them relate to the variable we are interested in predicting, MedInc, how they correlate with each other, and perhaps we have even started to think about hypotheses that we might wish to test in our model.
We will start with a hypothesized model of County Median Income in the state of Georgia that is associated with rurality (PctRural), whether people went to university (PctBach), how old the population is (PctEld), the number of people that are foreign born (PctFB), the percentage of the population classed as being in poverty (PctPov) and the percentage that is black (PctBlack).
In R we can construct a regression model using the function lm, which stands for linear model. This is a standard ordinary least squares regression function; although simple, it is very powerful and there are many variants and functions for manipulating lm outputs: remember R was originally a stats package.
The equation for a regression is:
\(y_i = \beta_{0} + \sum_{k=1}^{m}\beta_{k}x_{ik} + \epsilon_i \textsf{ for } i \in 1 \cdots n\)
where \(y_i\) is the response variable for observation \(i\), \(x_{ik}\) is the value of the \(k^{\textsf{th}}\) predictor variable for observation \(i\), \(m\) is the number of predictor variables, \(\beta_{0}\) is the intercept term, \(\beta_{k}\) is the regression coefficient for the \(k^{th}\) predictor variable, \(\epsilon_i\) is the random error term for observation \(i\) and \(n\) is the number of observations.
In our case, the response variable \(y\) is median income (MedInc), and the 6 socio-economic variables PctRural, PctBach, PctEld, PctFB, PctPov and PctBlack are the predictor variables, the \(m = 6\) different \(x_k\)'s.
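Written out in full for this data, the model we are about to fit is:
\(\textsf{MedInc}_i = \beta_0 + \beta_1 \textsf{PctRural}_i + \beta_2 \textsf{PctBach}_i + \beta_3 \textsf{PctEld}_i + \beta_4 \textsf{PctFB}_i + \beta_5 \textsf{PctPov}_i + \beta_6 \textsf{PctBlack}_i + \epsilon_i\)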
The basic formula in R to specify a linear regression model with lm is as follows:
model <- lm(variable_we_are_trying_to_predict ~ predictor_variable1 +
predictor_variable2...)
where the ~ sign (tilde) separates the variable we are trying to model (the response variable) from the variables we will use to predict it (the predictor variables). The predictor variables are usually separated by a + sign, meaning that they are linear terms in the model added together.
We can specify our first model.
m <- lm(MedInc~PctRural+PctBach+PctEld+PctFB+PctPov+PctBlack, data = df)
And look at its summary attributes
summary(m)
##
## Call:
## lm(formula = MedInc ~ PctRural + PctBach + PctEld + PctFB + PctPov +
## PctBlack, data = df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -12.4203 -2.9897 -0.6163 2.2095 25.8201
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 52.59895 3.10893 16.919 < 2e-16 ***
## PctRural 0.07377 0.02043 3.611 0.000414 ***
## PctBach 0.69726 0.11221 6.214 4.73e-09 ***
## PctEld -0.78862 0.17979 -4.386 2.14e-05 ***
## PctFB -1.29030 0.47388 -2.723 0.007229 **
## PctPov -0.95400 0.10459 -9.121 4.19e-16 ***
## PctBlack 0.03313 0.03717 0.891 0.374140
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.155 on 152 degrees of freedom
## Multiple R-squared: 0.7685, Adjusted R-squared: 0.7593
## F-statistic: 84.09 on 6 and 152 DF, p-value: < 2.2e-16
There are lots of things to look at in the summary of the model but we will focus on 3 aspects:
First, Significance. If we examine the model we can see that nearly all of the predictors are significant predictors of Median Income. The level of significance is indicated by the \(P\)-value in the column \(Pr(>|t|)\) and the asterisks \(*\). For each coefficient \(\beta_k\) these are the significance levels associated with the hypothesis \(\textsf{H}_0 : \beta_k = 0\) against the alternative \(\textsf{H}_1 : \beta_k \ne 0\) - this corresponds to the probability that one would obtain a \(t\)-statistic for \(\beta_k\) at least as large as the one observed, given that \(\textsf{H}_0\) is true.
It is a measure of how surprising the relationship is and indicative of whether the relationship between \(x_k\) and \(y\) found in this data could have been found by chance. Very small values suggest that the level of association found here would be very unlikely to arise in a random sample of data if \(\beta_k\) really were zero. Cutting to the chase, small \(P\)-values mean strong evidence for a non-zero regression coefficient.
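If you want the \(P\)-values on their own rather than reading them off the printed summary, they can be extracted from the summary object - a minimal sketch:
# extract the P-values for each coefficient of the fitted model m
round(summary(m)$coefficients[, "Pr(>|t|)"], 4)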
So in this case we can say:

- PctRural, PctBach, PctEld and PctPov are significantly associated with changes in MedInc at the \(<0.001\) level - that is, with a \(P\)-value of less than \(0.001\) - so we can be confident that the observed relationships between these variables and MedInc did not arise by chance, and can confidently reject the null hypotheses.
- PctBlack was not found to be significantly associated with MedInc.

Second, the Coefficient Estimates. These are the \(\beta_{k}\) values that tell you the rate of change between \(x_k\) and \(y\), given in the column ‘Estimate’ in the summary table. The first of these is the Intercept - this is \(\beta_0\) in the regression equation and tells you the value of \(y\) if all of the \(x_k\) are zero.
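The estimates themselves can also be pulled directly from the model object:
# the beta_k estimates, including the intercept
coef(m)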
Remember regression is like taking lots of \(x_k\) and \(y\) values and seeing how well they are correlated. The regression fits a line between them. So in the figure below a regression model with just one \(x\) is defined and summarised, and then the \(x\) and \(y\) values are plotted along with the modelled relationship.
m2 <- lm(df$MedInc~df$PctBach)
summary(m2)
##
## Call:
## lm(formula = df$MedInc ~ df$PctBach)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.124 -6.025 -1.905 3.416 39.969
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.5913 1.5476 17.182 < 2e-16 ***
## df$PctBach 0.9643 0.1255 7.684 1.57e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.986 on 157 degrees of freedom
## Multiple R-squared: 0.2733, Adjusted R-squared: 0.2687
## F-statistic: 59.04 on 1 and 157 DF, p-value: 1.566e-12
plot(df$PctBach, df$MedInc, xlab = "x = PctBach", ylab = "y = MedInc")
abline(m2, col = "red", lty = 2, lwd = 2)
Here we can see that the intercept of the relationship is 26.6 (the line shows this), and that the rate of change coefficient between MedInc and PctBach, when they are modelled on their own, is 0.964. This means that each increase of 1 in PctBach (each extra 1% of people with a degree) is associated with an increase of 0.964 in median income, i.e. about $964 per annum.
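As a quick check on this interpretation, consider a hypothetical county where 10% of the population holds a degree - the model m2 predicts a median income of roughly \(26.6 + 0.964 \times 10 \approx 36.2\) thousand dollars:
# predicted MedInc (in $1000s) for a hypothetical county with PctBach = 10
coef(m2)[1] + coef(m2)[2] * 10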
So returning to our model m, we can now see how changes of 1 unit in the different predictor variables are associated with changes in MedInc. The association of PctRural is positive but small, the association of PctBach is reduced due to interactions with the other \(x\)'s but is still positive, and the PctEld, PctFB and PctPov variables are strongly negatively associated with MedInc.
So the regression model m gives us an idea of how different factors are associated in different ways with Median Income in the counties of the state of Georgia.
Third, the \(R^2\) value. This provides a measure of model fit. It is calculated from the differences between the actual values of \(y\) (e.g. of MedInc) and the values predicted by the model - sometimes written as \(\hat{y}\) (called ‘\(y\)-hat’). A good description is here: http://blog.minitab.com/blog/adventures-in-statistics-2/regression-analysis-how-do-i-interpret-r-squared-and-assess-the-goodness-of-fit. Generally, \(R^2\) values of greater than 0.7 indicate a reasonably well fitting model. Some users of this method describe a low \(R^2\) as ‘bad’ - it is not - it just means you have a more imprecise model - the predictions are a bit more wrong! Essentially there is a sliding scale of reliability of the predictions - based on the relative amounts of deterministic and random variability in the system being analysed - and \(R^2\) is a measure of this.
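To see where the \(R^2\) value reported for m comes from, a minimal sketch comparing the observed values of MedInc with the fitted values \(\hat{y}\):
# R-squared by hand: 1 minus the residual sum of squares over the total sum of squares
y <- df$MedInc
y.hat <- fitted(m)
1 - sum((y - y.hat)^2) / sum((y - mean(y))^2)
# this should match the Multiple R-squared (about 0.77) reported by summary(m)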
Finally, we are also interested in outliers - these are records that the model does not describe very well, either vastly over-predicting Median Income or under-predicting it. When the bivariate regression was considered earlier, these were easy to find, since they were just the points lying a long way from the regression line. However, when there are multiple predictors, relationships cannot be encapsulated in two dimensional space.
For this situation, a useful diagnostic is the studentised residual. For different values of the predictors, the expected difference between the actual and fitted \(y\) values has a different variance, even if the theoretical model gives \(\epsilon_i\) a constant variance; this is because some observations have more influence (leverage) on the fitted model than others. Studentised residuals attempt to compensate for this. Plotting these and looking for notably high or low values is a helpful way of spotting outliers in the regression model.
A simple way of doing this is just to plot the residuals in order:
s.resids <- rstudent(m)
plot(s.resids,type='h')
abline(h=c(-2,2),col='lightskyblue')
Here we see some residuals are relatively high. Normally, one might expect studentised residuals to lie between -2 and 2. Adding lines to the plot helps to identify where this is not the case - as is done above.
Another useful job is to identify the outlying cases - here just done as a basic table (via kable):
df_out <- tibble(Name = georgia$Name, Residual = s.resids)
df_out <- df_out[abs(df_out$Residual) > 2, ]
kable(df_out)
Name | Residual |
---|---|
Charlton | -2.248484 |
Chattahoochee | -2.205354 |
Clarke | -2.918361 |
Fayette | 2.984996 |
Forsyth | 5.620108 |
Seminole | 2.954353 |
Wilkinson | -2.307853 |
Counties with negative values here have a much lower income than the model predicts; positive values suggest the income is much higher than predicted.
Finally, we need to check another model assumption in lm - that the error terms \(\epsilon_i\) are normally distributed. The best way to do this is via a qq-plot, which plots the empirical quantiles of a variable whose distribution we are interested in against the theoretical quantiles of the distribution we are testing for. If the variable is from this distribution, the results should yield a straight line - frequently a reference line is provided to test this. Here s.resids is the variable of interest - the studentised residuals - and these are compared to the \(t\) distribution. Although the \(\epsilon_i\) have a normal distribution, theory shows that the studentised residuals derived from a sample coming from this model have a \(t\) distribution. The qqPlot function from the package car does all the work, and requires only the regression model object to create the plot:
qqPlot(m)
This demonstrates that the residuals mostly do conform to this, with a couple of extreme outliers at the higher end - labelled in the plot. Consulting the earlier table suggests these are Fayette and Forsyth Counties.
It is also important to consider spatial patterns in the residuals, from a geographical viewpoint. The code below plots the outliers and their spatial distributions:
# determine residuals
s.resids = rstudent(m)
# 1.5 standard deviations
resid.shades = shading(c(-1.5,1.5),c("indianred1","antiquewhite2","powderblue"))
# define a color scheme
cols = resid.shades$cols[1 + findInterval(s.resids, resid.shades$breaks)]
# set plot parameters
par(mar = c(0,0,2,0))
# have a look at the spatial distribution of outliers
choropleth(georgia,s.resids,resid.shades, main = "Model Outliers")
This session finishes here, but you will need the results of this session later on. To save all of the variables, regression objects and so on from this session, enter
save.image(file='session2.RData')
This saves an image of all of the R objects currently defined in a file called session2.RData - later on you will be able to load this again via the load function.
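In a later session the saved objects can be restored with, for example:
# reload the objects saved above
load('session2.RData')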
As referred to earlier in the text, for those interested, the code used to create the colour guide is listed below. You'll need kableExtra installed to get this to work.
library(kableExtra)
### Matrix of rgb values for each named colour
rgbmat <- t(col2rgb(colors()))
### Turned into hex formats
hexes <- apply(rgbmat, 1, function(x) sprintf("#%02X%02X%02X",x[1],x[2],x[3]))
### These will be backgrounds - foreground either black or white depending on lightness of colour
textcol <- ifelse(rowMeans(rgbmat) > 156, 'black', 'white')
### Now create the cells with fg/bg colour spec
cols <- colours()
cols <- cell_spec(cols,background=hexes,color=textcol)
### Make this a grid
colour_grid <- matrix(c(cols,rep("",7)),83,8)
colnames(colour_grid)=rep(" ",8)
### Use kableExtra stuff to create html table
ktab <- kable(colour_grid,align='l',format='html',escape=FALSE)
kable_styling(ktab)
white | darkmagenta | gray14 | gray97 | grey72 | lightgray | orange1 | seashell3 |
aliceblue | darkolivegreen | gray15 | gray98 | grey73 | lightgreen | orange2 | seashell4 |
antiquewhite | darkolivegreen1 | gray16 | gray99 | grey74 | lightgrey | orange3 | sienna |
antiquewhite1 | darkolivegreen2 | gray17 | gray100 | grey75 | lightpink | orange4 | sienna1 |
antiquewhite2 | darkolivegreen3 | gray18 | green | grey76 | lightpink1 | orangered | sienna2 |
antiquewhite3 | darkolivegreen4 | gray19 | green1 | grey77 | lightpink2 | orangered1 | sienna3 |
antiquewhite4 | darkorange | gray20 | green2 | grey78 | lightpink3 | orangered2 | sienna4 |
aquamarine | darkorange1 | gray21 | green3 | grey79 | lightpink4 | orangered3 | skyblue |
aquamarine1 | darkorange2 | gray22 | green4 | grey80 | lightsalmon | orangered4 | skyblue1 |
aquamarine2 | darkorange3 | gray23 | greenyellow | grey81 | lightsalmon1 | orchid | skyblue2 |
aquamarine3 | darkorange4 | gray24 | grey | grey82 | lightsalmon2 | orchid1 | skyblue3 |
aquamarine4 | darkorchid | gray25 | grey0 | grey83 | lightsalmon3 | orchid2 | skyblue4 |
azure | darkorchid1 | gray26 | grey1 | grey84 | lightsalmon4 | orchid3 | slateblue |
azure1 | darkorchid2 | gray27 | grey2 | grey85 | lightseagreen | orchid4 | slateblue1 |
azure2 | darkorchid3 | gray28 | grey3 | grey86 | lightskyblue | palegoldenrod | slateblue2 |
azure3 | darkorchid4 | gray29 | grey4 | grey87 | lightskyblue1 | palegreen | slateblue3 |
azure4 | darkred | gray30 | grey5 | grey88 | lightskyblue2 | palegreen1 | slateblue4 |
beige | darksalmon | gray31 | grey6 | grey89 | lightskyblue3 | palegreen2 | slategray |
bisque | darkseagreen | gray32 | grey7 | grey90 | lightskyblue4 | palegreen3 | slategray1 |
bisque1 | darkseagreen1 | gray33 | grey8 | grey91 | lightslateblue | palegreen4 | slategray2 |
bisque2 | darkseagreen2 | gray34 | grey9 | grey92 | lightslategray | paleturquoise | slategray3 |
bisque3 | darkseagreen3 | gray35 | grey10 | grey93 | lightslategrey | paleturquoise1 | slategray4 |
bisque4 | darkseagreen4 | gray36 | grey11 | grey94 | lightsteelblue | paleturquoise2 | slategrey |
black | darkslateblue | gray37 | grey12 | grey95 | lightsteelblue1 | paleturquoise3 | snow |
blanchedalmond | darkslategray | gray38 | grey13 | grey96 | lightsteelblue2 | paleturquoise4 | snow1 |
blue | darkslategray1 | gray39 | grey14 | grey97 | lightsteelblue3 | palevioletred | snow2 |
blue1 | darkslategray2 | gray40 | grey15 | grey98 | lightsteelblue4 | palevioletred1 | snow3 |
blue2 | darkslategray3 | gray41 | grey16 | grey99 | lightyellow | palevioletred2 | snow4 |
blue3 | darkslategray4 | gray42 | grey17 | grey100 | lightyellow1 | palevioletred3 | springgreen |
blue4 | darkslategrey | gray43 | grey18 | honeydew | lightyellow2 | palevioletred4 | springgreen1 |
blueviolet | darkturquoise | gray44 | grey19 | honeydew1 | lightyellow3 | papayawhip | springgreen2 |
brown | darkviolet | gray45 | grey20 | honeydew2 | lightyellow4 | peachpuff | springgreen3 |
brown1 | deeppink | gray46 | grey21 | honeydew3 | limegreen | peachpuff1 | springgreen4 |
brown2 | deeppink1 | gray47 | grey22 | honeydew4 | linen | peachpuff2 | steelblue |
brown3 | deeppink2 | gray48 | grey23 | hotpink | magenta | peachpuff3 | steelblue1 |
brown4 | deeppink3 | gray49 | grey24 | hotpink1 | magenta1 | peachpuff4 | steelblue2 |
burlywood | deeppink4 | gray50 | grey25 | hotpink2 | magenta2 | peru | steelblue3 |
burlywood1 | deepskyblue | gray51 | grey26 | hotpink3 | magenta3 | pink | steelblue4 |
burlywood2 | deepskyblue1 | gray52 | grey27 | hotpink4 | magenta4 | pink1 | tan |
burlywood3 | deepskyblue2 | gray53 | grey28 | indianred | maroon | pink2 | tan1 |
burlywood4 | deepskyblue3 | gray54 | grey29 | indianred1 | maroon1 | pink3 | tan2 |
cadetblue | deepskyblue4 | gray55 | grey30 | indianred2 | maroon2 | pink4 | tan3 |
cadetblue1 | dimgray | gray56 | grey31 | indianred3 | maroon3 | plum | tan4 |
cadetblue2 | dimgrey | gray57 | grey32 | indianred4 | maroon4 | plum1 | thistle |
cadetblue3 | dodgerblue | gray58 | grey33 | ivory | mediumaquamarine | plum2 | thistle1 |
cadetblue4 | dodgerblue1 | gray59 | grey34 | ivory1 | mediumblue | plum3 | thistle2 |
chartreuse | dodgerblue2 | gray60 | grey35 | ivory2 | mediumorchid | plum4 | thistle3 |
chartreuse1 | dodgerblue3 | gray61 | grey36 | ivory3 | mediumorchid1 | powderblue | thistle4 |
chartreuse2 | dodgerblue4 | gray62 | grey37 | ivory4 | mediumorchid2 | purple | tomato |
chartreuse3 | firebrick | gray63 | grey38 | khaki | mediumorchid3 | purple1 | tomato1 |
chartreuse4 | firebrick1 | gray64 | grey39 | khaki1 | mediumorchid4 | purple2 | tomato2 |
chocolate | firebrick2 | gray65 | grey40 | khaki2 | mediumpurple | purple3 | tomato3 |
chocolate1 | firebrick3 | gray66 | grey41 | khaki3 | mediumpurple1 | purple4 | tomato4 |
chocolate2 | firebrick4 | gray67 | grey42 | khaki4 | mediumpurple2 | red | turquoise |
chocolate3 | floralwhite | gray68 | grey43 | lavender | mediumpurple3 | red1 | turquoise1 |
chocolate4 | forestgreen | gray69 | grey44 | lavenderblush | mediumpurple4 | red2 | turquoise2 |
coral | gainsboro | gray70 | grey45 | lavenderblush1 | mediumseagreen | red3 | turquoise3 |
coral1 | ghostwhite | gray71 | grey46 | lavenderblush2 | mediumslateblue | red4 | turquoise4 |
coral2 | gold | gray72 | grey47 | lavenderblush3 | mediumspringgreen | rosybrown | violet |
coral3 | gold1 | gray73 | grey48 | lavenderblush4 | mediumturquoise | rosybrown1 | violetred |
coral4 | gold2 | gray74 | grey49 | lawngreen | mediumvioletred | rosybrown2 | violetred1 |
cornflowerblue | gold3 | gray75 | grey50 | lemonchiffon | midnightblue | rosybrown3 | violetred2 |
cornsilk | gold4 | gray76 | grey51 | lemonchiffon1 | mintcream | rosybrown4 | violetred3 |
cornsilk1 | goldenrod | gray77 | grey52 | lemonchiffon2 | mistyrose | royalblue | violetred4 |
cornsilk2 | goldenrod1 | gray78 | grey53 | lemonchiffon3 | mistyrose1 | royalblue1 | wheat |
cornsilk3 | goldenrod2 | gray79 | grey54 | lemonchiffon4 | mistyrose2 | royalblue2 | wheat1 |
cornsilk4 | goldenrod3 | gray80 | grey55 | lightblue | mistyrose3 | royalblue3 | wheat2 |
cyan | goldenrod4 | gray81 | grey56 | lightblue1 | mistyrose4 | royalblue4 | wheat3 |
cyan1 | gray | gray82 | grey57 | lightblue2 | moccasin | saddlebrown | wheat4 |
cyan2 | gray0 | gray83 | grey58 | lightblue3 | navajowhite | salmon | whitesmoke |
cyan3 | gray1 | gray84 | grey59 | lightblue4 | navajowhite1 | salmon1 | yellow |
cyan4 | gray2 | gray85 | grey60 | lightcoral | navajowhite2 | salmon2 | yellow1 |
darkblue | gray3 | gray86 | grey61 | lightcyan | navajowhite3 | salmon3 | yellow2 |
darkcyan | gray4 | gray87 | grey62 | lightcyan1 | navajowhite4 | salmon4 | yellow3 |
darkgoldenrod | gray5 | gray88 | grey63 | lightcyan2 | navy | sandybrown | yellow4 |
darkgoldenrod1 | gray6 | gray89 | grey64 | lightcyan3 | navyblue | seagreen | yellowgreen |
darkgoldenrod2 | gray7 | gray90 | grey65 | lightcyan4 | oldlace | seagreen1 | |
darkgoldenrod3 | gray8 | gray91 | grey66 | lightgoldenrod | olivedrab | seagreen2 | |
darkgoldenrod4 | gray9 | gray92 | grey67 | lightgoldenrod1 | olivedrab1 | seagreen3 | |
darkgray | gray10 | gray93 | grey68 | lightgoldenrod2 | olivedrab2 | seagreen4 | |
darkgreen | gray11 | gray94 | grey69 | lightgoldenrod3 | olivedrab3 | seashell | |
darkgrey | gray12 | gray95 | grey70 | lightgoldenrod4 | olivedrab4 | seashell1 | |
darkkhaki | gray13 | gray96 | grey71 | lightgoldenrodyellow | orange | seashell2 |
Brunsdon, C. and Comber, L. (2015). An Introduction to R for Spatial Analysis and Mapping. Sage.
Cleveland, W. S. (1979). Robust locally weighted regression and smoothing scatterplots. Journal of the American Statistical Association, 74, 829–836.
Fotheringham, A. S., Brunsdon, C. and Charlton, M. (2002). Geographically Weighted Regression: The Analysis of Spatially Varying Relationships. John Wiley & Sons.
Hurley, C. B. (2004). Clustering visualisations of multidimensional data. Journal of Computational and Graphical Statistics, 13(4), 788–806.