library(foreign)
options(java.parameters = "-Xmx12g") # must be set before RWeka loads rJava
options(mc.cores=4)
library(RWeka)
library(tidyr)
library(dplyr)
library(purrr)
#library(pracma) # Savitzky-Golay filter, savgol(); see the commented-out alternative below
library(knitr) #kable
library(caret) # cross-validate R's lm()
# filename: advweka_soilsamples.Rmd
Can Total Organic Carbon of soil samples be predicted, quickly and easily, from Near-IR spectroscopy data?
The following problem set, the dataset selection and question design, was created by a researcher from New Zealand as an exercise for students of the Massive Open Online Course "Advanced Data Mining with Weka".
Application: Infrared data from soil samples
by Geoff Holmes
Department of Computer Science, University of Waikato, New Zealand
My (K.B.) objective was to reproduce the exercise in R, using RWeka in place of the Weka Explorer and caret to cross-validate R's lm().
We will examine a soil dataset that is described here.
It originates in Kenya and is supplied courtesy of the World Agroforestry Centre (ICRAF) and ISRIC, the International Soil Reference and Information Centre.
The dataset has been converted into an ARFF file called org_c_n. Load it into the Weka Explorer. The instances represent 4439 samples of soil that have been processed by a NIR (near-infrared) device. Most of the 220 attributes are wave bands, and contain the reflectance values produced by the device. For our purposes the dataset should contain only the wave bands plus the class we are interested in, and for this activity we will concentrate on organic carbon. Remove the unnecessary attributes from the dataset.
samp <- read.arff(file="org_c_n.arff")
samp <- samp %>%
select( -Batch_Labid, -ISO)
Answer: 218
The first five and last six columns of the dataset:
glimpse(samp[,c(1:5, ((ncol(samp) - 5):(ncol(samp))))])
## Observations: 4,439
## Variables: 11
## $ W350 <dbl> 0.08727, 0.09176, 0.08909, 0.09495, 0.09124, 0...
## $ W360 <dbl> 0.07229, 0.07082, 0.06935, 0.08900, 0.06571, 0...
## $ W370 <dbl> 0.06788, 0.06902, 0.06966, 0.08105, 0.06595, 0...
## $ W380 <dbl> 0.07128, 0.07013, 0.06820, 0.08351, 0.06622, 0...
## $ W390 <dbl> 0.07091, 0.07222, 0.07005, 0.08543, 0.06628, 0...
## $ W2470 <dbl> 0.3473, 0.3241, 0.2760, 0.2812, 0.3233, 0.3279...
## $ W2480 <dbl> 0.3393, 0.3262, 0.2687, 0.2807, 0.3206, 0.3255...
## $ W2490 <dbl> 0.3368, 0.3264, 0.2799, 0.2879, 0.3260, 0.3287...
## $ W2500 <dbl> 0.3428, 0.3309, 0.2756, 0.2903, 0.3245, 0.3240...
## $ OrganicNitrogen <dbl> 0.09, 0.06, 0.06, 0.05, NA, NA, NA, NA, 0.10, ...
## $ OrganicCarbon <dbl> 0.99, 0.65, 0.46, 0.47, 0.19, 0.15, 0.13, NA, ...
There is still a problem with this dataset. If you click on the class attribute, OrganicCarbon, you will see that 12% of the values are missing.
samp %>% select(OrganicNitrogen, OrganicCarbon) %>% summary()
## OrganicNitrogen OrganicCarbon
## Min. :0.0 Min. : 0.0
## 1st Qu.:0.0 1st Qu.: 0.2
## Median :0.1 Median : 0.5
## Mean :0.1 Mean : 1.3
## 3rd Qu.:0.1 3rd Qu.: 1.2
## Max. :7.0 Max. :62.8
## NA's :1555 NA's :528
These are samples for which there was no wet chemistry reference, and are useless for our purpose. Use an appropriate Weka instance filter to remove all instances whose class attribute is missing.
samp <- samp %>%
filter(!is.na(OrganicCarbon) )
Answer: 3911
samp %>% select(OrganicNitrogen, OrganicCarbon) %>% summary()
## OrganicNitrogen OrganicCarbon
## Min. :0.0 Min. : 0.00
## 1st Qu.:0.0 1st Qu.: 0.22
## Median :0.1 Median : 0.49
## Mean :0.1 Mean : 1.27
## 3rd Qu.:0.1 3rd Qu.: 1.20
## Max. :7.0 Max. :62.78
## NA's :1041
samp <- samp %>%
select(-OrganicNitrogen)
# too simple, not cross-validated: r² of lm() on its own training data
fit0 <- samp %>%
  lm(data = ., OrganicCarbon ~ .) %>%
  summary() %>%
  .$r.squared
# 10 fold cross-validation with caret
train_control <- trainControl(method="cv", number=10)
fit1 <- train(OrganicCarbon ~ ., data = samp,
              trControl = train_control, method = "lm")
fit2 <- samp %>%
  RWeka::LinearRegression(data = ., OrganicCarbon ~ .) %>%
  evaluate_Weka_classifier(.,
                           numFolds = 10, complexity = FALSE,
                           seed = 1, class = TRUE)
fit2.summ <- summary(fit2)
Answer:
lm() regression, not cross-validated: r² = 0.5222
lm() regression, 10-fold cross-validated with the caret package: r² = 0.4319
Next we investigate the performance of some more sophisticated classifiers: M5P, REPTree and RandomForest. (There are other possibilities, but they are all slower.) Run these three with default settings, and record the resulting correlation coefficients.
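A minimal sketch of where these two numbers come from, using the fit0 and fit1 objects above:
fit0                    # r² of lm() evaluated on its own training data
fit1$results$Rsquared   # mean r² across the 10 caret cross-validation folds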
# Build a Weka learner from its fully qualified class name, fit it to dfr,
# and return the cross-validated evaluation.
weka_summary <- function(classifier, dfr){
  learner <- make_Weka_classifier(name = classifier)
  fit <- dfr %>%
    learner(data = ., OrganicCarbon ~ .)
  # use 10-fold cross-validation
  e <- evaluate_Weka_classifier(fit,
                                numFolds = 10, complexity = FALSE,
                                seed = 1, class = TRUE)
  e
}
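For a single learner the helper can be called directly, for example:
# 10-fold CV evaluation of one classifier via the helper above
weka_summary("weka.classifiers.trees.M5P", samp)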
lst <- list(classifier=list(
"weka.classifiers.functions.LinearRegression",
"weka.classifiers.trees.M5P",
"weka.classifiers.trees.REPTree",
"weka.classifiers.trees.RandomForest"))
summaries <- pmap(.l=lst, .f=weka_summary, dfr=samp)
names(summaries) <- lst[["classifier"]]
summaries
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.3951
## Mean absolute error 1.1959
## Root mean squared error 2.9421
## Relative absolute error 93.3773 %
## Root relative squared error 91.8547 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6026
## Mean absolute error 0.9266
## Root mean squared error 2.5573
## Relative absolute error 72.3515 %
## Root relative squared error 79.8397 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6526
## Mean absolute error 0.9654
## Root mean squared error 2.4264
## Relative absolute error 75.3802 %
## Root relative squared error 75.7531 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6934
## Mean absolute error 0.8438
## Root mean squared error 2.3088
## Relative absolute error 65.8845 %
## Root relative squared error 72.0818 %
## Total Number of Instances 3911
# which.max() needs a numeric vector, so extract with map_dbl(), not map()
r2_max_idx <- which.max(
  map_dbl(summaries, function(x) x$details[[1]]))
ans <- list(learner = lst[["classifier"]][[r2_max_idx]],
            corr.coeff = summaries[[r2_max_idx]]$details[[1]])
Answer: weka.classifiers.trees.RandomForest, 0.6934
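Note that the details vector returned by evaluate_Weka_classifier() is named, so the coefficient can also be read by name rather than by position:
summaries[["weka.classifiers.trees.RandomForest"]]$details["correlationCoefficient"]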
We now examine the effect of preprocessing the data, using the results of these classifiers as a benchmark. We investigate three techniques commonly used for NIR data: 1. downsampling, 2. row normalization, and 3. a signal-smoothing method called Savitzky-Golay.
Downsampling is a simple method that can accelerate processing with little loss in accuracy (this may also allow slower classification methods to be applied without too much delay).
By hand, remove every second attribute, W350, W370, … W2490. The resulting dataset will have 109 attributes including the class (you may wish to save it).
Run the benchmark classifiers (again with default settings), along with 10-fold cross-validation. You will probably notice that they are faster than before. We will continue to use the correlation coefficient as our measure of success.
bad_cols <- seq(350,2490,20) %>% paste0("W", .)
bad_cols <- which(colnames(samp) %in% bad_cols)
samp_downsampled <- samp %>%
select(-bad_cols)
summaries_downs <- pmap(.l=lst, .f=weka_summary, dfr=samp_downsampled)
names(summaries_downs) <- lst[["classifier"]]
summaries_downs
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.4006
## Mean absolute error 1.1931
## Root mean squared error 2.9344
## Relative absolute error 93.1619 %
## Root relative squared error 91.6136 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6301
## Mean absolute error 0.8939
## Root mean squared error 2.4865
## Relative absolute error 69.8017 %
## Root relative squared error 77.6304 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6541
## Mean absolute error 0.9551
## Root mean squared error 2.4229
## Relative absolute error 74.5737 %
## Root relative squared error 75.6454 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6941
## Mean absolute error 0.8442
## Root mean squared error 2.3056
## Relative absolute error 65.9213 %
## Root relative squared error 71.9807 %
## Total Number of Instances 3911
r2_min_idx <- which.min(
  map_dbl(summaries_downs, function(x) x$details[[1]]))
(ans_downs <- list(learner = lst[["classifier"]][[r2_min_idx]],
                   corr.coeff = summaries_downs[[r2_min_idx]]$details[[1]]))
## $learner
## [1] "weka.classifiers.functions.LinearRegression"
##
## $corr.coeff
## [1] 0.4006
Downsampling has improved both speed and accuracy for all these classifiers. Let’s keep going: make the dataset half the size again! Construct a new dataset with 55 attributes: the class and wavebands W380, W420, W460, … W2500. Run the benchmark again.
# NB: this removes W380, W420, ..., W2500 and keeps W360, W400, ..., W2480
# plus the class: the complement of the bands named in the exercise text,
# but likewise 55 attributes.
bad_cols_2 <- seq(380, 2500, 40) %>% paste0("W", .)
bad_cols_2 <- which(colnames(samp_downsampled) %in% bad_cols_2)
samp_downsampled_2 <- samp_downsampled %>%
  select(-bad_cols_2)
summaries_downs_2 <- pmap(.l=lst, .f=weka_summary, dfr=samp_downsampled_2)
names(summaries_downs_2) <- lst[["classifier"]]
summaries_downs_2
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.3855
## Mean absolute error 1.1889
## Root mean squared error 2.9552
## Relative absolute error 92.8312 %
## Root relative squared error 92.2639 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.5776
## Mean absolute error 0.9249
## Root mean squared error 2.6159
## Relative absolute error 72.2181 %
## Root relative squared error 81.6698 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6151
## Mean absolute error 0.9775
## Root mean squared error 2.525
## Relative absolute error 76.3258 %
## Root relative squared error 78.8329 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6928
## Mean absolute error 0.8491
## Root mean squared error 2.311
## Relative absolute error 66.2989 %
## Root relative squared error 72.1519 %
## Total Number of Instances 3911
r2_downs <- map2_df(summaries_downs_2, summaries_downs,
                    function(x, y) data.frame(
                      r2_downsampled_2 = x$details[[1]],
                      r2_downsampled_1 = y$details[[1]],
                      diff = x$details[[1]] - y$details[[1]]))
r2_max_idx <- which.max(r2_downs$diff)
ans_downs_2 <- data.frame(
  learner = lst[["classifier"]][[r2_max_idx]],
  corr.coeff = summaries_downs_2[[r2_max_idx]]$details[[1]])
kable(cbind(names = names(summaries_downs_2),
            map2_df(summaries_downs, summaries_downs_2,
                    function(x, y) data.frame(
                      r2_before = x$details[[1]],
                      r2_after = y$details[[1]],
                      r2_diff = y$details[[1]] - x$details[[1]]))),
      caption = "Comparison of two downsampling steps, corr. coeff. r²")
| names | r2_before | r2_after | r2_diff |
|---|---|---|---|
| weka.classifiers.functions.LinearRegression | 0.4006 | 0.3855 | -0.0150 |
| weka.classifiers.trees.M5P | 0.6301 | 0.5776 | -0.0526 |
| weka.classifiers.trees.REPTree | 0.6541 | 0.6151 | -0.0390 |
| weka.classifiers.trees.RandomForest | 0.6941 | 0.6928 | -0.0014 |
Next we return to the full (non-downsampled) dataset and apply row normalization. This improves results for two of the methods, compared to the original benchmark. Which method gains most?
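The row-normalized file was prepared beforehand and is loaded below. As a rough illustration of the idea (an assumption: scaling each spectrum to unit Euclidean length; the prepared file may use a different variant):
# illustration only: rescale each row (spectrum), not each wave band (column)
bands <- setdiff(colnames(samp), "OrganicCarbon")
norms <- sqrt(rowSums(samp[, bands]^2))
samp_rn_sketch <- samp
samp_rn_sketch[, bands] <- samp[, bands] / norms  # recycles down each column, i.e. row-wise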
samp_norm <- read.arff(file="org_c_no_missing-rn.arff")
summaries_norm <- pmap(.l=lst, .f=weka_summary, dfr=samp_norm)
names(summaries_norm) <- lst[["classifier"]]
summaries_norm
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.513
## Mean absolute error 1.148
## Root mean squared error 2.7498
## Relative absolute error 89.6384 %
## Root relative squared error 85.8504 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.703
## Mean absolute error 0.8034
## Root mean squared error 2.2891
## Relative absolute error 62.7346 %
## Root relative squared error 71.4686 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.4408
## Mean absolute error 1.0717
## Root mean squared error 2.9172
## Relative absolute error 83.6822 %
## Root relative squared error 91.0777 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.633
## Mean absolute error 0.8185
## Root mean squared error 2.4872
## Relative absolute error 63.9111 %
## Root relative squared error 77.6511 %
## Total Number of Instances 3911
ans_norm_diffs <- cbind(
  algorithm = names(summaries_norm),
  map2_df(summaries_norm, summaries,
          function(x, y) data.frame(
            r2_orig = y$details[[1]],
            r2_normalized = x$details[[1]],
            difference = x$details[[1]] - y$details[[1]])))
kable(ans_norm_diffs,
      caption = "Comparison of unnormalized vs. normalized data, corr. coeff. r²")
| algorithm | r2_orig | r2_normalized | difference |
|---|---|---|---|
| weka.classifiers.functions.LinearRegression | 0.3951 | 0.5130 | 0.1179 |
| weka.classifiers.trees.M5P | 0.6026 | 0.7030 | 0.1004 |
| weka.classifiers.trees.REPTree | 0.6526 | 0.4408 | -0.2118 |
| weka.classifiers.trees.RandomForest | 0.6934 | 0.6330 | -0.0604 |
r2_max_idx <- which.max(ans_norm_diffs$difference)
(ans_norm <- list(learner=lst[["classifier"]][[r2_max_idx]],
corr.coeff= summaries_norm[[r2_max_idx]]$details[[1]]))
## $learner
## [1] "weka.classifiers.functions.LinearRegression"
##
## $corr.coeff
## [1] 0.513
Answer: weka.classifiers.functions.LinearRegression, 0.513
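Next we apply Savitzky-Golay smoothing. It fits a low-order polynomial by least squares to the points in a sliding window and replaces the centre point by the fitted value. For a quadratic fit over a five-point window this reduces to a fixed convolution with the classic weights (-3, 12, 17, 12, -3)/35; a minimal sketch on a single spectrum (the datasets below use windows of 7 and 11):
w <- c(-3, 12, 17, 12, -3) / 35  # classic quadratic 5-point smoothing weights
spectrum <- as.numeric(samp_norm[1, setdiff(colnames(samp_norm), "OrganicCarbon")])
smoothed <- stats::filter(spectrum, w, sides = 2)  # NAs at the window edges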
# In-R alternative (requires pracma::savgol(); unused below, since the
# smoothed datasets were prepared beforehand and are read from ARFF files):
# samp_savgol <- map_df(samp_norm %>% select(-OrganicCarbon),
#                       pracma::savgol, fl = 7, forder = 2, dorder = 0)
# samp_savgol <- cbind(samp_savgol, OrganicCarbon = samp_norm$OrganicCarbon)
samp_savgol_7 <- read.arff(file="org_c_no_missing-sg7.arff")
samp_savgol_11 <- read.arff(file="org_c_no_missing-sg11.arff")
summaries_savgol_7 <- pmap(.l=lst, .f=weka_summary, dfr=samp_savgol_7)
names(summaries_savgol_7) <- lst[["classifier"]]
summaries_savgol_7
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6436
## Mean absolute error 1.1741
## Root mean squared error 2.4534
## Relative absolute error 91.6766 %
## Root relative squared error 76.5982 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8455
## Mean absolute error 0.6531
## Root mean squared error 1.7293
## Relative absolute error 50.9957 %
## Root relative squared error 53.9885 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.7649
## Mean absolute error 1.0013
## Root mean squared error 2.0974
## Relative absolute error 78.1868 %
## Root relative squared error 65.4822 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8533
## Mean absolute error 0.6248
## Root mean squared error 1.7402
## Relative absolute error 48.7829 %
## Root relative squared error 54.3292 %
## Total Number of Instances 3911
summaries_savgol_11 <- pmap(.l=lst, .f=weka_summary, dfr=samp_savgol_11)
names(summaries_savgol_11) <- lst[["classifier"]]
ans_savgol_max <- unlist(map(c(summaries_savgol_7, summaries_savgol_11),
                             function(x) x$details[[1]]))
ans_savgol_max_df <- data.frame(
  classifier = names(ans_savgol_max),
  correl.coef = as.vector(ans_savgol_max),
  window_size = paste0("size ",
                       c(rep("7", length(summaries_savgol_7)),
                         rep("11", length(summaries_savgol_11)))))
# Output results as a table
kable(xtabs(correl.coef ~ classifier + window_size, ans_savgol_max_df),
      caption = "Correlation coefficients for Savitzky-Golay window sizes")
| classifier | size 11 | size 7 |
|---|---|---|
| weka.classifiers.functions.LinearRegression | 0.5771 | 0.6436 |
| weka.classifiers.trees.M5P | 0.8262 | 0.8455 |
| weka.classifiers.trees.RandomForest | 0.8499 | 0.8533 |
| weka.classifiers.trees.REPTree | 0.6936 | 0.7649 |
# More comprehensive:
summaries_savgol_11
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.5771
## Mean absolute error 1.201
## Root mean squared error 2.6163
## Relative absolute error 93.7764 %
## Root relative squared error 81.6833 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8262
## Mean absolute error 0.6729
## Root mean squared error 1.8072
## Relative absolute error 52.5421 %
## Root relative squared error 56.422 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6936
## Mean absolute error 0.9145
## Root mean squared error 2.3122
## Relative absolute error 71.4095 %
## Root relative squared error 72.1871 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8499
## Mean absolute error 0.6315
## Root mean squared error 1.7573
## Relative absolute error 49.3077 %
## Root relative squared error 54.8645 %
## Total Number of Instances 3911
As an example, check for yourself which window size gives the larger coefficient for linear regression:
(ans_savgol_diffs <- unlist(map2(summaries_savgol_11, summaries_savgol_7,
                                 function(x, y) list(savgol7 = y$details[[1]],
                                                     savgol11 = x$details[[1]]))))[1:2]
## weka.classifiers.functions.LinearRegression.savgol7
## 0.6436
## weka.classifiers.functions.LinearRegression.savgol11
## 0.5771
We have seen that preprocessing can make a big difference to the performance of a classifier. So far, three different techniques have been applied independently. What if we combine them? We downsampled the original file by removing every second attribute, then applied Savitzky-Golay, then row normalization, to produce org_c_no_missing-d2sg7rn. Load this dataset and re-run the benchmark.
samp_savgol_7_norm <- read.arff(file="org_c_no_missing-d2sg7rn.arff")
summaries_savgol_7_norm <- pmap(.l=lst, .f=weka_summary, dfr=samp_savgol_7_norm)
names(summaries_savgol_7_norm) <- lst[["classifier"]]
summaries_savgol_7_norm
## $weka.classifiers.functions.LinearRegression
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.6734
## Mean absolute error 1.0782
## Root mean squared error 2.3675
## Relative absolute error 84.1895 %
## Root relative squared error 73.9135 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.M5P
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8216
## Mean absolute error 0.6938
## Root mean squared error 1.8264
## Relative absolute error 54.1761 %
## Root relative squared error 57.02 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.REPTree
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.4821
## Mean absolute error 0.976
## Root mean squared error 2.8614
## Relative absolute error 76.2103 %
## Root relative squared error 89.3357 %
## Total Number of Instances 3911
##
## $weka.classifiers.trees.RandomForest
## === 10 Fold Cross Validation ===
##
## === Summary ===
##
## Correlation coefficient 0.8336
## Mean absolute error 0.6739
## Root mean squared error 1.8818
## Relative absolute error 52.623 %
## Root relative squared error 58.7518 %
## Total Number of Instances 3911
Note that we have not examined the effects of parameter changes in either the classifiers or the preprocessing techniques (except the Savitzky-Golay window size).
You can perform much more experimentation in search of a good model! One problem faced in all application development is knowing when a result is good enough to be used in practice.
In our experience, the correlation coefficient needs to reach 0.95-0.99 before a model for this problem is usable in practice.
Our best result in this activity is 0.8533 (RandomForest on the window-7 Savitzky-Golay data), still a long way off. Another important factor that we have not explored is the effect of outliers in regression problems. Filtering out outlier instances can make a huge difference to performance.
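For instance, a crude class-based trim (a sketch only: the 3 × IQR cutoff is an arbitrary illustration, not part of the exercise):
# drop instances whose OrganicCarbon lies far above the interquartile range;
# the 3 * IQR threshold is an arbitrary illustration, not a tuned value
q <- quantile(samp$OrganicCarbon, c(0.25, 0.75))
samp_trimmed <- samp %>%
  filter(OrganicCarbon <= q[2] + 3 * (q[2] - q[1]))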
Note that the RWeka package used here is backed by RWekajars 3.9.0.
sessionInfo()
## R version 3.2.2 (2015-08-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.1 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=de_DE.UTF-8
## [9] LC_ADDRESS=de_DE.UTF-8 LC_TELEPHONE=de_DE.UTF-8
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=de_DE.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] caret_6.0-70 ggplot2_2.1.0 lattice_0.20-33 knitr_1.13
## [5] purrr_0.2.2 dplyr_0.5.0 tidyr_0.5.1 RWeka_0.4-29
## [9] foreign_0.8-66 colorout_1.1-2
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.6 highr_0.6 nloptr_1.0.4
## [4] formatR_1.4 plyr_1.8.4 iterators_1.0.8
## [7] tools_3.2.2 RWekajars_3.9.0-1 lme4_1.1-12
## [10] digest_0.6.9 evaluate_0.9 tibble_1.1
## [13] gtable_0.2.0 nlme_3.1-128 mgcv_1.8-12
## [16] Matrix_1.2-6 foreach_1.4.3 DBI_0.4-1
## [19] parallel_3.2.2 yaml_2.1.13 SparseM_1.7
## [22] rJava_0.9-8 stringr_1.0.0 MatrixModels_0.4-1
## [25] stats4_3.2.2 grid_3.2.2 nnet_7.3-12
## [28] R6_2.1.2 rmarkdown_1.0 minqa_1.2.4
## [31] reshape2_1.4.1 car_2.1-2 magrittr_1.5
## [34] splines_3.2.2 scales_0.4.0 codetools_0.2-14
## [37] htmltools_0.3.5 MASS_7.3-45 assertthat_0.1
## [40] pbkrtest_0.4-4 colorspace_1.2-6 quantreg_5.26
## [43] stringi_1.1.1 lazyeval_0.2.0 munsell_0.4.3