###Author and date need to be changed; Name - student name and date; done when a new markdown is started. #students only need to submit in their html the answers and code for the questions they need to complete/finish.

Info about R Markdown

R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

Part 3A. Non-spatial Statistics

Import Data

If your R Markdown is NOT in the same folder as your data, please set your working directory using setwd() first. Here is an example: setwd("\\medusa\StudentWork\(Your UTOR ID)\GGR276\Lab1"). You will need to change the code to reflect your personal directory. Otherwise, you may skip this step and continue to import data by reading your WaterRetainingFacilities.csv file. You may view the data by clicking WaterRetaining in the Environment window or by running View(WaterRetaining). Click on the little green triangle on the right to run the current chunk.

  • To test whether there is an error in any line of code, run the script line by line and check carefully how each line works.
    • Any errors in your code will be displayed in the bottom left Console window of RStudio, which will help you pinpoint where the error lies.
    • If there is an error, try selecting each line of code individually and running the lines one by one. This will tell you which line of code is causing the error.
WaterRetaining <- read.csv("WaterRetainingFacilities.csv", sep = ',', header = TRUE)
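
As a minimal sketch of how an import can be checked, the example below writes a small stand-in CSV to a temporary file and reads it back with the same read.csv() arguments. The data frame toy, its values, and the temporary path are assumptions made for illustration; they are not the lab data.

```r
# Write a tiny stand-in CSV to a temporary file (toy values, not the lab data).
toy <- data.frame(
  ReservoirName     = c("A", "B", "C"),
  ImpoundmentVolume = c(0.5, 12.0, 3.2)
)
toy_path <- tempfile(fileext = ".csv")
write.csv(toy, toy_path, row.names = FALSE)

# Read it back with the same arguments used for WaterRetainingFacilities.csv.
toy_in <- read.csv(toy_path, sep = ",", header = TRUE)

# Quick checks that the import behaved as expected.
str(toy_in)    # column names and types
head(toy_in)   # first rows
nrow(toy_in)   # number of records
```

Running the same three checks on WaterRetaining right after the import is a quick way to confirm that the headers and column types were read as intended.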

Descriptive Statistics of ImpoundmentVolume

Now that we have data imported, we are ready to calculate the median, mean, range and quantiles of the Impoundment Volume in cubic meters (m^3).

summary(WaterRetaining$ImpoundmentVolume)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.02    0.20    0.96   17.85    9.20  385.00

Next, we will calculate the standard deviation of Impoundment Volume in cubic meters (m^3). If there are missing values in your dataset, simply add na.rm = TRUE to your code to tell R to remove NAs in the calculation, like this: sd(WaterRetaining$ImpoundmentVolume, na.rm=TRUE). Since there are no missing values in this dataset, this argument is not necessary here.

sd(WaterRetaining$ImpoundmentVolume)
## [1] 44.7084
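
The effect of na.rm can be seen on a small vector. This is only a sketch: toy_volume and its values are made up for illustration and do not come from the lab dataset.

```r
# Toy vector with one missing value (illustrative values only).
toy_volume <- c(0.02, 0.96, NA, 9.20)

anyNA(toy_volume)               # TRUE: at least one value is missing
sd(toy_volume)                  # NA, because the missing value propagates
sd(toy_volume, na.rm = TRUE)    # standard deviation of the three known values
```

Checking anyNA() on a column first shows whether na.rm = TRUE is needed at all.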

Descriptive Statistics of StorageLevel

Question 6: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of the Storage Level; review the variable information to find out the units. (2 marks)

#TODO
summary(WaterRetaining$StorageLevel)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   10.50   17.40   37.39   43.50  284.00
sd(WaterRetaining$StorageLevel)
## [1] 45.47832
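
summary() reports the minimum, maximum and quartiles, but not the range or interquartile range as single numbers. A possible way to obtain them directly is sketched below on a toy vector; toy_level and its values are assumptions for illustration, not the StorageLevel column.

```r
# Toy storage levels (illustrative values only).
toy_level <- c(2, 10, 17, 44, 284)

range(toy_level)                     # minimum and maximum: 2 284
diff(range(toy_level))               # range as a single number: 282
quantile(toy_level, c(0.25, 0.75))   # first and third quartiles
IQR(toy_level)                       # interquartile range: 34
```

The same calls could be applied to WaterRetaining$StorageLevel to report the range and interquartile range explicitly.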

Visualize

Visualize the spread of the Impoundment volume dataset, and answer Question 7: According to the boxplot, would you use median or mean as your central tendency measure? Explain and justify your choice? (2 marks)

I would use the median as my central tendency measure. The boxplot shows that the data is strongly skewed by high outliers, which pull the mean upward, whereas the median is not affected by them.

boxplot(WaterRetaining$ImpoundmentVolume)
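
To make the effect of outliers on the mean concrete, here is a small sketch on made-up values. The vector toy_volume is an assumption for illustration; boxplot.stats() is the base R function that computes the values a boxplot draws, including the points flagged as outliers.

```r
# Toy volumes with one extreme value (illustrative values only).
toy_volume <- c(0.1, 0.2, 0.5, 1, 2, 9, 385)

mean(toy_volume)                 # pulled far above most of the values
median(toy_volume)               # 1, unaffected by the extreme value
boxplot.stats(toy_volume)$out    # points the boxplot would draw as outliers
```

A large gap between mean and median, together with flagged outliers, is the pattern that supports choosing the median.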

Part 3B. Mapping Mean and Weighted Mean Centre

Visualize the locations of the water retaining facilities by Dam Height and the mean and weighted mean centres.

plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
n<-nrow(WaterRetaining[1])
mc_x<-sum(WaterRetaining$easting_m)/n
mc_y<-sum(WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$easting_m))/sum(WaterRetaining$DamHeight)
wmc_y<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$northing_m))/sum(WaterRetaining$DamHeight)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
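
The mean centre and weighted mean centre computed above can also be written with mean() and weighted.mean() from base R. The sketch below uses a toy data frame (made-up coordinates and heights, not the lab data) and checks that weighted.mean() matches the manual sum-of-products formula.

```r
# Toy facilities (illustrative values only).
toy <- data.frame(
  easting_m  = c(2600000, 2700000, 2800000),
  northing_m = c(1100000, 1200000, 1150000),
  DamHeight  = c(10, 20, 70)
)

# Mean centre: plain average of each coordinate.
mc <- c(x = mean(toy$easting_m), y = mean(toy$northing_m))

# Weighted mean centre: each coordinate weighted by dam height.
wmc <- c(
  x = weighted.mean(toy$easting_m,  w = toy$DamHeight),
  y = weighted.mean(toy$northing_m, w = toy$DamHeight)
)

# Same result as the manual formula used in the lab code.
manual_wmc_x <- sum(toy$DamHeight * toy$easting_m) / sum(toy$DamHeight)

mc    # 2700000 1150000
wmc   # 2760000 1155000
```

In this toy case the tallest dam lies furthest east, so the weighted mean centre is pulled east of the mean centre.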

Notice that this script plots all of the dams, including facilities that are non-hydroelectric. Let’s filter the data to keep only the facilities whose FacilityAim is “Hydroelectricity”.

filtered_WaterRetaining <- subset(WaterRetaining, (FacilityAim == "Hydroelectricity"))
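
A short sketch of how subset() behaves, using a toy data frame. The column values here, including "Irrigation" as a second FacilityAim category, are assumptions for illustration and may not match the categories in the lab data.

```r
# Toy facilities (illustrative values; "Irrigation" is a hypothetical category).
toy <- data.frame(
  DamName     = c("A", "B", "C", "D"),
  FacilityAim = c("Hydroelectricity", "Irrigation", "Hydroelectricity", NA)
)

# Count each category first, including missing values.
table(toy$FacilityAim, useNA = "ifany")

# subset() keeps rows where the condition is TRUE and drops NA rows.
by_subset <- subset(toy, FacilityAim == "Hydroelectricity")

# Equivalent logical indexing; which() also drops the NA row.
by_index <- toy[which(toy$FacilityAim == "Hydroelectricity"), ]

nrow(by_subset)   # 2
```

Tabulating the column before filtering shows the exact spelling of each category, which matters because the comparison is case-sensitive.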

Visualize the locations of the Water Retaining Facilities whose primary function is to produce hydroelectricity, including their mean and weighted mean centres, using the filtered data. Please adjust the range of the x and y axes as well as the symbols to best represent the data.

#TODO
plot(filtered_WaterRetaining$easting_m, filtered_WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Hydroelectricity Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
n<-nrow(filtered_WaterRetaining[1])
mc_x<-sum(filtered_WaterRetaining$easting_m)/n
mc_y<-sum(filtered_WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(filtered_WaterRetaining$DamHeight*filtered_WaterRetaining$easting_m))/sum(filtered_WaterRetaining$DamHeight)
wmc_y<-sum(as.numeric(filtered_WaterRetaining$DamHeight*filtered_WaterRetaining$northing_m))/sum(filtered_WaterRetaining$DamHeight)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))

Question 8: Produce comments (i.e., a detailed explanation for each parameter in the code) that describe the execution of the statements given to you in the WaterRetaining example. Make sure you provide a description for each set of statements and organize your answer in a manner similar to the following example. (16 marks)

Example: setwd ("\\medusa\StudentWork\(Your UTOR ID)\GGR276\Lab1")

The setwd() command tells R the current folder to look into when searching for data, saving outputs etc.

Explain what is occurring in each of the following lines of code (2 marks each):

  1. WaterRetaining <- read.csv("WaterRetainingFacilities.csv", sep = ',', header = TRUE)
  2. plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
  3. n<-nrow(WaterRetaining[1])
  4. mc_x<-sum(WaterRetaining$easting_m)/n; mc_y<-sum(WaterRetaining$northing_m)/n
  5. points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
  6. wmc_x<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$easting_m))/sum(WaterRetaining$DamHeight); wmc_y<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$northing_m))/sum(WaterRetaining$DamHeight)
  7. legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
  8. install.packages("rmarkdown"); library(rmarkdown)

a) - This code reads the data from the CSV file and assigns it to the variable WaterRetaining. The sep = ',' argument indicates that the values are separated by commas, and header = TRUE tells R that the first row contains the column names.

b) - The plot function creates a scatter plot. The $ sign selects the specific columns we want from the data frame, which here are the easting and northing coordinates. The xlab and ylab arguments label the axes, the xlim and ylim arguments control the range of each axis by giving its minimum and maximum values, and the main argument sets the title.

c) - This code counts the total number of rows in the data frame and assigns that number to n.

d) - This code calculates the sum of the easting coordinates and the sum of the northing coordinates from the data frame and divides each by the total number of rows, in order to obtain the mean easting and mean northing (the mean centre).

e) - This code adds a point to the scatter plot at the mean easting and mean northing. The 'p' sets the plot type to points, pch and cex set the shape and size of the symbol, and col makes it blue.

f) - The as.numeric function in this code makes sure that the products are treated as numeric values before they are summed. We are solving for the weighted mean centre, where we multiply each dam height by its respective coordinate, sum the products, and then divide by the sum of all dam heights in the data frame. We assign the results to wmc_x and wmc_y.

g) - This code adds a legend to the scatter plot. We specify its location (top right), the labels of the entries, and their symbol shape and colours, which are blue and red.

h) - This code installs the rmarkdown package and loads it into the current R session.

Question 9: What do the mean and weighted mean center tell you about the distribution of the filtered Water Retaining Facility locations and their dam heights? Please explain. (3 marks)

The mean center gives the average location of the filtered Water Retaining Facilities, with every facility counted equally. The weighted mean center also takes dam height into account, so facilities with taller dams pull it toward their locations. If the two centers differ substantially, the taller dams are concentrated in a particular part of the study area, in the direction in which the weighted mean center is shifted. If they are close together, dam height is distributed fairly evenly across the facility locations.
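
One way to quantify how far apart the two centres are is the straight-line distance between them. The sketch below uses a toy data frame; the coordinates, heights, and the names shift_m and direction are assumptions for illustration only.

```r
# Toy facilities (illustrative values only).
toy <- data.frame(
  easting_m  = c(2600000, 2700000, 2800000),
  northing_m = c(1100000, 1200000, 1150000),
  DamHeight  = c(10, 20, 70)
)

# Mean centre and dam-height-weighted mean centre as (x, y) vectors.
mc  <- c(mean(toy$easting_m), mean(toy$northing_m))
wmc <- c(weighted.mean(toy$easting_m,  toy$DamHeight),
         weighted.mean(toy$northing_m, toy$DamHeight))

# Offset of the weighted centre from the mean centre, in metres.
direction <- wmc - mc                 # positive x = east, positive y = north
shift_m   <- sqrt(sum(direction^2))   # Euclidean distance between the centres

direction
shift_m
```

A shift that is large relative to the extent of the study area suggests that the weighting variable is unevenly distributed in space.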

Part 3C. Create and Visualize Subsets

Create Subsets

There are different sources of energy. Do they have the same mean centre? Imagine we are interested in how the distribution of Hydroelectric Water Retaining Facilities with sm_DamHeights (Dam Heights less than the median) differs from those with lrg_DamHeights (Dam Heights greater than or equal to the median). To do this, we will split the filtered_WaterRetaining dataset into two subsets. One subset, sm_DamHeights, will contain only facilities whose Dam Height is below the median, and the other, lrg_DamHeights, will contain only those whose Dam Height is greater than or equal to the median.

#TODO:find the median using the summary function
summary(filtered_WaterRetaining)
##  ReservoirName        DamType          FacilityAim          DamHeight     
##  Length:196         Length:196         Length:196         Min.   :  2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 15.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 23.60  
##                                                           Mean   : 43.86  
##                                                           3rd Qu.: 53.25  
##                                                           Max.   :285.00  
##    CrestLevel      CrestLength      ImpoundmentVolume ImpoundmentLevel
##  Min.   : 255.0   Min.   :  20.00   Min.   :  0.02    Min.   : 254.2  
##  1st Qu.: 690.5   1st Qu.:  81.12   1st Qu.:  0.24    1st Qu.: 685.6  
##  Median :1306.7   Median : 145.00   Median :  1.60    Median :1304.9  
##  Mean   :1279.9   Mean   : 232.05   Mean   : 20.45    Mean   :1278.0  
##  3rd Qu.:1820.4   3rd Qu.: 325.50   3rd Qu.: 17.34    3rd Qu.:1819.5  
##  Max.   :2476.0   Max.   :1025.00   Max.   :385.00    Max.   :2474.6  
##   StorageLevel     Construction  StartSuperVision     DamName         
##  Min.   :  2.51   Min.   :1872   Length:196         Length:196        
##  1st Qu.: 10.50   1st Qu.:1937   Class :character   Class :character  
##  Median : 18.35   Median :1957   Mode  :character   Mode  :character  
##  Mean   : 40.20   Mean   :1952                                        
##  3rd Qu.: 56.50   3rd Qu.:1965                                        
##  Max.   :284.00   Max.   :2022                                        
##    easting_m         northing_m     
##  Min.   :2487028   Min.   :1086090  
##  1st Qu.:2616057   1st Qu.:1134542  
##  Median :2679456   Median :1162090  
##  Mean   :2669908   Mean   :1169896  
##  3rd Qu.:2722481   3rd Qu.:1198310  
##  Max.   :2831511   Max.   :1284130
sm_DamHeights<- subset(filtered_WaterRetaining, (filtered_WaterRetaining$DamHeight < 23.60))

lrg_DamHeights <- subset(filtered_WaterRetaining, (filtered_WaterRetaining$DamHeight >= 23.60))
#TODO lrg_DamHeights
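
Instead of copying the median from the summary() output, the cutoff can be computed with median() and reused in both subsets. This is a sketch on a toy data frame; toy, cutoff, and the values are assumptions for illustration, not the lab data.

```r
# Toy dam heights (illustrative values only); two dams sit exactly on the median.
toy <- data.frame(DamHeight = c(2, 15, 24, 24, 53, 285))

# Compute the cutoff once so both subsets use exactly the same value.
cutoff <- median(toy$DamHeight)   # 24

sm  <- subset(toy, DamHeight <  cutoff)
lrg <- subset(toy, DamHeight >= cutoff)

# Every row lands in exactly one subset; ties go to the "large" group.
nrow(sm)    # 2
nrow(lrg)   # 4
nrow(sm) + nrow(lrg) == nrow(toy)
```

Computing the cutoff in code avoids retyping a rounded value, which could misplace facilities whose height is very close to the median.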

Once created, a subset can be inspected by calling View() on the object, for example View(sm_DamHeights). Note that R is case-sensitive, so the object name must be typed exactly. If you want to know the number of Small Dam Heights, you can run: nrow(sm_DamHeights[1]).

Visualize Subsets

Question 10: Show the Switzerland Water Retaining Facilities scatter plot using the subset data that you created. Remember to overlay the subsets on the original data to see the distribution of the subsets. Use a different color for the subsets and the original dataset. Also plot the mean centres for the entire dataset and the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Please do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)

#TODO
plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities, (Amroop Bains, 1008063863)", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))

points(sm_DamHeights$easting_m, sm_DamHeights$northing_m, col = "green")
points(lrg_DamHeights$easting_m, lrg_DamHeights$northing_m, col = "orange")

n<-nrow(filtered_WaterRetaining[1])
mc_x<-sum(filtered_WaterRetaining$easting_m)/n
mc_y<-sum(filtered_WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")

n<-nrow(sm_DamHeights[1])
mc_x<-sum(sm_DamHeights$easting_m)/n
mc_y<-sum(sm_DamHeights$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="green")

n<-nrow(lrg_DamHeights[1])
mc_x<-sum(lrg_DamHeights$easting_m)/n
mc_y<-sum(lrg_DamHeights$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col='orange')

legend("topright", legend = c("Small Dam Heights", "Large Dam Heights", "Mean Centre", "Mean Centre Small Dam Heights", "Mean Centre Large Dam Heights"), pch = c(1,1,15,15,15), col = c("green", "orange", "blue","green", "orange"))
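
The mean-centre calculation is repeated three times above, once per dataset. A possible way to reduce the repetition is a small helper function, sketched here on toy data. The function name mean_centre, the data frame toy, and all values are assumptions for illustration.

```r
# Hypothetical helper: mean centre of any data frame with easting_m/northing_m.
mean_centre <- function(df) {
  c(x = mean(df$easting_m), y = mean(df$northing_m))
}

# Toy facilities (illustrative values only).
toy <- data.frame(
  easting_m  = c(2600000, 2650000, 2700000, 2800000),
  northing_m = c(1100000, 1150000, 1200000, 1250000),
  DamHeight  = c(10, 20, 40, 80)
)

# The full toy dataset plus its two median-split subsets.
groups <- list(
  all   = toy,
  small = subset(toy, DamHeight <  median(DamHeight)),
  large = subset(toy, DamHeight >= median(DamHeight))
)

# One row per group, columns x and y.
centres <- t(sapply(groups, mean_centre))
centres
```

Each row of centres could then be passed to points() with its own colour, which keeps the plotting code short and avoids reusing n, mc_x and mc_y for different datasets.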

Part 3D. Dispersion of Subsets

Standard Deviation of Subsets

So far, we have measured the central tendency of the spatial data. How about dispersion? Standard deviation is a measure of dispersion that can be used to assess the distribution of spatial data. To calculate the orthogonal dispersion (east-west, north-south) associated with the filtered_WaterRetaining dataset, we will apply the sd() command to the easting and northing coordinates, respectively. Please do the same for the subsets in Question 11.

sd(filtered_WaterRetaining$easting_m)
## [1] 70065.49
sd(filtered_WaterRetaining$northing_m)
## [1] 48814.57
sd(sm_DamHeights$easting_m)
## [1] 69253.81
sd(sm_DamHeights$northing_m)
## [1] 54962.51
sd(lrg_DamHeights$easting_m)
## [1] 70923.38
sd(lrg_DamHeights$northing_m)
## [1] 37380.44
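
The six sd() calls above can be collected into one table, rounded to one decimal place, which makes the comparison between groups easier to read. This is a sketch on toy data; orthogonal_sd, toy, and all values are assumptions for illustration and do not reproduce the lab results.

```r
# Hypothetical helper: east-west and north-south standard deviation in metres.
orthogonal_sd <- function(df) {
  c(sd_easting_m = sd(df$easting_m), sd_northing_m = sd(df$northing_m))
}

# Toy facilities (illustrative values only).
toy <- data.frame(
  easting_m  = c(2600000, 2650000, 2700000, 2750000, 2800000, 2820000),
  northing_m = c(1100000, 1250000, 1120000, 1240000, 1150000, 1160000),
  DamHeight  = c(5, 10, 15, 40, 80, 120)
)

cutoff <- median(toy$DamHeight)
groups <- list(
  filtered = toy,
  small    = subset(toy, DamHeight <  cutoff),
  large    = subset(toy, DamHeight >= cutoff)
)

# One row per group, rounded to one decimal place.
dispersion <- round(t(sapply(groups, orthogonal_sd)), 1)
dispersion
```

Reading across a row compares east-west with north-south spread for one group; reading down a column compares the groups along one axis.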

Question 11: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for the Water Retaining Facilities dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. Water Retaining Hydroelectric Facilities (filtered) vs. Small Dam Heights vs. Large Dam Heights). Remember to include units of measurement in your response and round to one decimal place. Finally, why was the median used to subset the data versus the mean? (6 marks)

#TODO
sd(filtered_WaterRetaining$DamHeight)
## [1] 48.62984
sd(lrg_DamHeights$DamHeight)
## [1] 54.02332
sd(sm_DamHeights$DamHeight)
## [1] 4.987907

For dam height, the filtered facilities have a standard deviation of 48.6, the small dams have the lowest standard deviation at 5.0, and the large dams have the highest at 54.0. Small dams are therefore tightly clustered around their mean height, while large dams are more dispersed than the filtered dataset as a whole. For the orthogonal dispersion reported above, the filtered facilities have a standard deviation of 70065.5 m in easting and 48814.6 m in northing, the small dams 69253.8 m and 54962.5 m, and the large dams 70923.4 m and 37380.4 m. East-west dispersion is similar for all three groups, whereas north-south dispersion is greatest for the small dams and smallest for the large dams. The median was used to subset the data rather than the mean because the dam heights are skewed by outliers (median 23.6 versus mean 43.9), and the median is not pulled toward those extreme values. Splitting at the median therefore gives a cutoff that is not distorted by outliers.