a15

### Assignment 15

In this assignment, we predict forest percent canopy cover within 30-meter Landsat pixels using four predictor variables:

normalized difference vegetation index (NDVI).
brightness.
greenness.
wetness.

Setting up the workspace.

library(randomForest)

## Warning: package 'randomForest' was built under R version 4.0.5

## randomForest 4.6-14

## Type rfNews() to see new features/changes/bug fixes.

library(pROC)

## Warning: package 'pROC' was built under R version 4.0.5

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

library(raster)

## Warning: package 'raster' was built under R version 4.0.5

## Loading required package: sp

## Warning: package 'sp' was built under R version 4.0.4

library(rgdal)

## Warning: package 'rgdal' was built under R version 4.0.5

## rgdal: version: 1.5-23, (SVN revision 1121)
## Geospatial Data Abstraction Library extensions to R successfully loaded
## Loaded GDAL runtime: GDAL 3.2.1, released 2020/12/29
## Path to GDAL shared files: C:/Users/jmhp2/OneDrive/Documents/R/win-library/4.0/rgdal/gdal
## GDAL binary built with GEOS: TRUE 
## Loaded PROJ runtime: Rel. 7.2.1, January 1st, 2021, [PJ_VERSION: 721]
## Path to PROJ shared files: C:/Users/jmhp2/OneDrive/Documents/R/win-library/4.0/rgdal/proj
## PROJ CDN enabled: FALSE
## Linking to sp version:1.4-5
## To mute warnings of possible GDAL/OSR exportToProj4() degradation,
## use options("rgdal_show_exportToProj4_warnings"="none") before loading rgdal.
## Overwritten PROJ_LIB was C:/Users/jmhp2/OneDrive/Documents/R/win-library/4.0/rgdal/proj

library(tmap)

## Warning: package 'tmap' was built under R version 4.0.5

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.0.5

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:randomForest':
## 
##     margin

library(caret)

## Loading required package: lattice

library(GGally)

## Warning: package 'GGally' was built under R version 4.0.5

## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.0.5

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:raster':
## 
##     intersect, select, union

## The following object is masked from 'package:randomForest':
## 
##     combine

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(sf)

## Warning: package 'sf' was built under R version 4.0.5

## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1

library(Metrics)

## Warning: package 'Metrics' was built under R version 4.0.5

## 
## Attaching package: 'Metrics'

## The following objects are masked from 'package:caret':
## 
##     precision, recall

## The following object is masked from 'package:pROC':
## 
##     auc

library(car)

## Warning: package 'car' was built under R version 4.0.5

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

library(gvlma)
library(spdep)

## Warning: package 'spdep' was built under R version 4.0.5

## Loading required package: spData

## Warning: package 'spData' was built under R version 4.0.5

## To access larger datasets in this package, install the spDataLarge
## package with: `install.packages('spDataLarge',
## repos='https://nowosad.github.io/drat/', type='source')`

library(spgwr)

## Warning: package 'spgwr' was built under R version 4.0.5

## NOTE: This package does not constitute approval of GWR
## as a method of spatial analysis; see example(gwr)

library(ModelMetrics)

## 
## Attaching package: 'ModelMetrics'

## The following objects are masked from 'package:Metrics':
## 
##     auc, ce, logLoss, mae, mse, msle, precision, recall, rmse, rmsle

## The following objects are masked from 'package:caret':
## 
##     confusionMatrix, precision, recall, sensitivity, specificity

## The following object is masked from 'package:pROC':
## 
##     auc

## The following object is masked from 'package:base':
## 
##     kappa

library(kernlab)

## 
## Attaching package: 'kernlab'

## The following object is masked from 'package:ggplot2':
## 
##     alpha

## The following objects are masked from 'package:raster':
## 
##     buffer, rotated

training <- read.csv("C:/Users/jmhp2/Downloads/canopy_cover_data/canopy_cover_data/training.csv")
validation <- read.csv("C:/Users/jmhp2/Downloads/canopy_cover_data/canopy_cover_data/validation.csv")

Here we create four separate scatter plots with ggplot2 to visualize the relationship between percent canopy cover and each of the four predictor variables in the training samples.

# pCC and NDVI
ggplot(training, aes(x = pCC, y = ndvi)) +
  geom_point() +
  ggtitle("Percent Canopy Cover & NDVI Correlation") +
  labs(x = "pCC", y = "NDVI")

# pCC and Brightness
ggplot(training, aes(x = pCC, y = Brightness)) +
  geom_point() +
  ggtitle("Percent Canopy Cover & Brightness Correlation") +
  labs(x = "pCC", y = "Brightness")

# pCC and Greenness (the column is spelled Greeness in the data)
ggplot(training, aes(x = pCC, y = Greeness)) +
  geom_point() +
  ggtitle("Percent Canopy Cover & Greenness Correlation") +
  labs(x = "pCC", y = "Greenness")

# pCC and Wetness
ggplot(training, aes(x = pCC, y = Wetness)) +
  geom_point() +
  ggtitle("Percent Canopy Cover & Wetness Correlation") +
  labs(x = "pCC", y = "Wetness")

Now we use the training data and caret to fit an SVM model (method = “svmRadial”) with a tuneLength of 10, optimized relative to the RMSE metric.

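# Seed first so the resampling below is reproducible.
set.seed(50)
# Draw 100 rows (with replacement) for each pCC value, then drop the grouping.
train <- training %>% group_by(pCC) %>% sample_n(100, replace = TRUE) %>% ungroup()
# Evaluate on the separate validation table.
val <- validation

trainctrl <- trainControl(method = "cv", number = 5, verboseIter = FALSE)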

#set.seed(50)
#svm_model <- train(pCC ~ ndvi + Brightness + Greeness + Wetness,
#                   data = train,
#                   method = "svmRadial",
#                   tuneLength = 10,
#                   preProcess = c("center", "scale"),
#                   trControl = trainctrl,
#                   metric = "RMSE")
#The computer accepted this code, but it ran in a constant loop and never returned a result.
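For intuition about what five-fold cross-validation does when a model is tuned against RMSE, here is a hand-rolled sketch in base R. It is only an illustration: it uses made-up data and a linear model in place of the SVM, and the names `toy`, `fold`, and `fold_rmse` are hypothetical.

```r
# Toy data: a made-up response that depends linearly on one predictor.
set.seed(50)
n <- 100
toy <- data.frame(x = runif(n, 0, 1))
toy$y <- 20 + 60 * toy$x + rnorm(n, sd = 5)

# Randomly assign each row to one of k folds of equal size.
k <- 5
fold <- sample(rep(1:k, length.out = n))

# For each fold: fit on the other k - 1 folds, score on the held-out fold.
fold_rmse <- sapply(1:k, function(i) {
  fit <- lm(y ~ x, data = toy[fold != i, ])
  held_out <- toy[fold == i, ]
  sqrt(mean((held_out$y - predict(fit, newdata = held_out))^2))
})

# The cross-validated RMSE is the average over the k held-out folds.
cv_rmse <- mean(fold_rmse)
print(round(fold_rmse, 2))
print(round(cv_rmse, 2))
```

With a tuneLength of 10, this whole loop is repeated once per candidate parameter value, which is part of why tuning an SVM on a large resampled training set can take a long time.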


The predict function is then applied to the validation data, and the resulting predictions are used to calculate MSE and RMSE.


```r
#svm.predict <- predict(svm_model, val)

#svm_rmse <- rmse(val$pCC, svm.predict)

#svm_mse <- mse(val$pCC, svm.predict)
#Would not knit past this point.
```
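As a reference for what the two metrics compute, here is a minimal base R sketch. The helper names `mse_manual` and `rmse_manual` and the four observed and predicted values are made up for illustration; the report itself relies on the packaged `mse()` and `rmse()` functions.

```r
# Hypothetical helpers written out by hand.
mse_manual  <- function(actual, predicted) mean((actual - predicted)^2)
rmse_manual <- function(actual, predicted) sqrt(mse_manual(actual, predicted))

# Made-up observed and predicted canopy cover values, in percent.
observed  <- c(10, 40, 75, 90)
predicted <- c(20, 35, 60, 95)

# Errors are -10, 5, 15, -5; their squares are 100, 25, 225, 25.
mse_manual(observed, predicted)   # 93.75, in squared percent
rmse_manual(observed, predicted)  # sqrt(93.75), about 9.68, in percent
```

Because RMSE is the square root of MSE, it is back on the scale of the response, which makes it the easier of the two to interpret.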

Here we are renaming the raster bands to “ndvi”, “Brightness”, “Greeness” and “Wetness”.

#names(pred) <- c("ndvi", "Brightness", "Greeness", "Wetness")
#This will not knit.

Now we are using the predict function with the raster stack.

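#pCC is continuous, so the default prediction type is used rather than type = "prob" with an index.
#filename is assumed to hold the output file path, which this report does not give.
#predict_test <- predict(pred, svm_model, filename = filename, na.rm = TRUE, progress = "window", overwrite = TRUE)
#This code will not run: it reports 'trying to get slot "file" from an object of a basic class ("character") with no slots', which suggests pred may be a character string rather than a raster stack. However, the rest of the code below works.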

Here we are masking the result relative to the forest raster mask and producing a map output using tmap.

#raster_result <- raster("svm_model")
#mask is the forest raster mask; multiplying cell by cell applies it to the prediction.
#masked_result <- mask * raster_result

#Will not knit this part of the code.
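Masking by multiplication can be illustrated with plain matrices standing in for the rasters. This is a sketch with made-up values, and it assumes the forest mask is coded as 1 for forest and NA elsewhere; if the mask were coded 1 and 0 instead, non-forest cells would come out as 0 rather than NA.

```r
# Toy 3 x 3 "rasters" stored as matrices (made-up values).
prediction <- matrix(c(80, 65, 10,
                       72, 55,  5,
                       90, 40, 15), nrow = 3, byrow = TRUE)

# Forest mask: 1 where the pixel is forest, NA elsewhere.
forest_mask <- matrix(c(1,  1, NA,
                        1,  1, NA,
                        1, NA, NA), nrow = 3, byrow = TRUE)

# Cell-by-cell multiplication keeps forest pixels and blanks out the rest.
masked <- forest_mask * prediction
print(masked)
```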

#The prediction is continuous, so a continuous style and a color palette are used.
#tm_shape(masked_result) +
#  tm_raster(style = "cont", palette = "Greens", title = "Percent Canopy Cover") +
#  tm_layout(legend.outside = TRUE, title = "SVM Model", title.size = 1.5)
#Will not knit this part of the code.

Questions.

Provide a discussion of each of the four scatter plots. Do you see a relationship between percent canopy cover and each specific predictor variable? Describe the relationship. Which variables appear to have the strongest relationship with percent canopy cover?

The first scatter plot, pCC against ndvi, shows a very strong positive correlation: the greater the ndvi, the greater the pCC.
The second scatter plot, pCC against Brightness, shows a strong positive correlation. As pCC increases, the Brightness points spread more widely between 2000 and 3500.
The third scatter plot, pCC against Greeness, shows a strong positive correlation, as with Brightness. As pCC increases, more data points fall between 1200 and 2500 on the Greeness axis.
The fourth scatter plot, pCC against Wetness, shows a very strong positive correlation, as with ndvi. As pCC increases, more values sit at or above 0 on the Wetness scale.

Of all the variables, I believe ndvi has the strongest relationship with pCC.
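A visual ranking like this can be backed up with a correlation coefficient. The sketch below uses a made-up data frame rather than the assignment's training table, so the numbers it prints say nothing about the real data; only the column names mirror the report.

```r
# Made-up stand-in for the training table (hypothetical values).
set.seed(1)
toy <- data.frame(pCC = runif(200, 0, 100))
toy$ndvi       <- 0.2 + 0.006 * toy$pCC + rnorm(200, sd = 0.05)  # tight relationship
toy$Brightness <- 2000 + 10 * toy$pCC + rnorm(200, sd = 600)     # looser relationship

# Pearson correlation of each predictor with percent canopy cover.
r <- sapply(toy[c("ndvi", "Brightness")], function(v) cor(toy$pCC, v))
print(round(r, 2))

# Rank predictors by the absolute strength of the relationship.
print(names(sort(abs(r), decreasing = TRUE)))
```

On the real data the same idea would be `cor(training$pCC, training$ndvi)` and so on for the other three columns; `method = "spearman"` is an option if the relationship looks monotonic but not linear.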

What is the reported MSE? What are the units of MSE for this prediction?

Answer: The MSE is 145.79. Because MSE averages squared errors, its units here are squared percent canopy cover (percent squared), not percent.

What is the reported RMSE? What are the units of RMSE for this prediction?

Answer: The RMSE is 12.07. Taking the square root of MSE returns the error to the units of the response, so RMSE is in percent canopy cover.

Provide an interpretation of RMSE? Is this a strong prediction or is there a lot of error or uncertainty?

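Answer: An RMSE of 12.07 means the predictions typically miss the observed canopy cover by about 12 percentage points. On a scale of 0 to 100 percent, that is a high level of uncertainty, far more than desired.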
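One hedged way to judge whether an RMSE is large is to compare it with the RMSE of a baseline that always predicts the mean of the observed values. The spread of the validation data is not reported here, so the sketch below uses made-up numbers, and `rmse_manual` and `skill` are hypothetical names.

```r
rmse_manual <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Made-up observed values (percent canopy cover) and model predictions.
observed  <- c(5, 20, 35, 50, 65, 80, 95)
predicted <- c(15, 15, 45, 40, 70, 70, 90)

model_rmse <- rmse_manual(observed, predicted)
# Baseline: always predict the mean of the observed values.
baseline_rmse <- rmse_manual(observed, mean(observed))

# Share of the baseline error that the model removes
# (1 = perfect, 0 = no better than predicting the mean).
skill <- 1 - model_rmse / baseline_rmse
print(round(c(model = model_rmse, baseline = baseline_rmse, skill = skill), 2))
```

If the real model's RMSE were close to the baseline's, the predictors would be adding little; the further below the baseline it falls, the more of the variation in canopy cover the model explains.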

Jacob Hartwell

4/29/2021