###Author and date need to be changed; Name - student name and date; done when a new markdown is started. #students only need to submit in their html the answers and code for the questions they need to complete/finish.
R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.
If your R Markdown is NOT in the same folder as your data, please set your working directory using setwd() first. Here is an example:
setwd("\\\\medusa\\StudentWork\\(Your UTOR ID)\\GGR276\\Lab1")
You will need to change the code to reflect your personal directory.
Otherwise, you may skip this step and continue to import the data by reading your WaterRetainingFacilities.csv file. You may view the data by clicking WaterRetaining in the Environment window or by typing View(WaterRetaining). Click on the little green triangle on the right to run the current chunk.
WaterRetaining <- read.csv("WaterRetainingFacilities.csv", sep = ',', header = TRUE)
Now that we have the data imported, we are ready to calculate the median, mean, range and quantiles of the Impoundment Volume in cubic meters (m^3):
summary(WaterRetaining$ImpoundmentVolume)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.02 0.20 0.96 17.85 9.20 385.00
Next, we will calculate the standard deviation of Impoundment Volume in cubic meters (m^3). If there are missing values in your dataset, simply add na.rm = TRUE to your code to tell R to remove NAs in the calculation, like this:
sd(WaterRetaining$ImpoundmentVolume, na.rm=TRUE)
Since there isn’t a missing value in this dataset, this argument is not necessary here.
sd(WaterRetaining$ImpoundmentVolume)
## [1] 44.7084
Question 6: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of the Storage Level; review the variable information to find out the units. (2 marks)
#TODO
summary(WaterRetaining$StorageLevel)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 10.50 17.40 37.39 43.50 284.00
sd(WaterRetaining$StorageLevel)
## [1] 45.47832
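summary() reports the quartiles, but it does not return the range or the interquartile range as single numbers. The sketch below shows one way those could be computed with base R. It uses a small made-up vector (`storage`, hypothetical values, not the lab data) in place of WaterRetaining$StorageLevel.

```r
# Hypothetical storage levels; in the lab, use WaterRetaining$StorageLevel instead.
storage <- c(2, 4, 6, 8, 100)

storage_median <- median(storage)        # middle value of the sorted data
storage_mean   <- mean(storage)          # arithmetic average
storage_range  <- diff(range(storage))   # maximum minus minimum
storage_iqr    <- IQR(storage)           # 3rd quartile minus 1st quartile
storage_sd     <- sd(storage)            # sample standard deviation

print(c(median = storage_median, mean = storage_mean,
        range = storage_range, iqr = storage_iqr, sd = storage_sd))
```

range() on its own returns the minimum and maximum as a pair; diff() turns that pair into a single width.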
Visualize the spread of the Impoundment Volume dataset, and answer Question 7: According to the boxplot, would you use median or mean as your central tendency measure? Explain and justify your choice. (2 marks)
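I would use the median as my central tendency measure. The boxplot shows that the data are strongly right-skewed by high outliers. These outliers pull the mean upward (mean 17.85 versus median 0.96 in the summary above), whereas the median is not affected by them: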
boxplot(WaterRetaining$ImpoundmentVolume)
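As a rough illustration of why outliers matter for this choice, the sketch below compares the mean and median of a small made-up vector (`volume`, hypothetical values) and lists the points a boxplot would draw as outliers. boxplot.stats() is the base R helper that boxplot() relies on for its statistics.

```r
# Hypothetical volumes with one very large value; not the lab data.
volume <- c(1, 2, 3, 4, 5, 100)

volume_mean   <- mean(volume)     # pulled upward by the large value
volume_median <- median(volume)   # stays near the bulk of the data

# Points beyond 1.5 times the box length from the box are flagged as outliers.
outliers <- boxplot.stats(volume)$out

print(c(mean = volume_mean, median = volume_median))
print(outliers)
```

When the mean sits far above the median, as it does here, the distribution is right-skewed and the median is usually the safer summary.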
Visualize the locations of the water retaining facilities by Dam Height and the mean and weighted mean centres.
plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
n<-nrow(WaterRetaining[1])
mc_x<-sum(WaterRetaining$easting_m)/n
mc_y<-sum(WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$easting_m))/sum(WaterRetaining$DamHeight)
wmc_y<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$northing_m))/sum(WaterRetaining$DamHeight)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
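The manual sums used for the mean centre and weighted mean centre have direct base R equivalents, mean() and weighted.mean(). The sketch below checks this on a tiny made-up data frame (`dams`, hypothetical coordinates and heights, not the lab data).

```r
# Hypothetical facilities: coordinates in metres and a dam height used as the weight.
dams <- data.frame(
  easting_m  = c(0, 10, 20),
  northing_m = c(0, 0, 30),
  DamHeight  = c(1, 1, 2)
)

# Mean centre: unweighted average of each coordinate.
mc <- c(x = mean(dams$easting_m), y = mean(dams$northing_m))

# Weighted mean centre: sum(w * coordinate) / sum(w), with dam height as w.
wmc <- c(
  x = weighted.mean(dams$easting_m,  w = dams$DamHeight),
  y = weighted.mean(dams$northing_m, w = dams$DamHeight)
)

# The manual formula from the lab code gives the same easting.
wmc_manual_x <- sum(dams$DamHeight * dams$easting_m) / sum(dams$DamHeight)

print(mc)
print(wmc)
```

In this toy example the tallest dam lies to the north-east, so the weighted mean centre moves north-east of the mean centre.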
Notice that this script plots every dam, including facilities that are not hydroelectric. Let’s filter the data to keep only the rows whose FacilityAim is “Hydroelectricity”.
filtered_WaterRetaining <- subset(WaterRetaining, (FacilityAim == "Hydroelectricity"))
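To see what subset() keeps and drops, here is a minimal sketch on a made-up data frame (`facilities`; the dam names and the non-hydroelectric aims are hypothetical). It also shows the equivalent bracket-indexing form.

```r
# Hypothetical facilities; only FacilityAim matters for the filter.
facilities <- data.frame(
  DamName     = c("A", "B", "C", "D"),
  FacilityAim = c("Hydroelectricity", "Irrigation",
                  "Hydroelectricity", "Flood protection"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the condition is TRUE.
hydro <- subset(facilities, FacilityAim == "Hydroelectricity")

# Equivalent logical indexing: rows selected before the comma, all columns after it.
hydro_alt <- facilities[facilities$FacilityAim == "Hydroelectricity", ]

print(hydro)
print(table(facilities$FacilityAim))   # how many facilities have each aim
```

Calling table() on the column before filtering is a quick way to confirm the exact spelling of the category being matched.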
Visualize the locations of the Water Retaining Facilities whose primary function is to produce hydroelectricity, including their mean and weighted mean centres, using the filtered data. Please adjust the range of x and y axes as well as the symbols to best represent the data.
#TODO
plot(filtered_WaterRetaining$easting_m, filtered_WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Hydroelectricity Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
n<-nrow(filtered_WaterRetaining[1])
mc_x<-sum(filtered_WaterRetaining$easting_m)/n
mc_y<-sum(filtered_WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(filtered_WaterRetaining$DamHeight*filtered_WaterRetaining$easting_m))/sum(filtered_WaterRetaining$DamHeight)
wmc_y<-sum(as.numeric(filtered_WaterRetaining$DamHeight*filtered_WaterRetaining$northing_m))/sum(filtered_WaterRetaining$DamHeight)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
Question 8: Produce comments (i.e., a detailed explanation for each parameter in the code) that describe the execution of the statements given to you in the WaterRetaining example. Make sure you provide a description for each set of statements and organize your answer in a manner similar to the following example. (16 marks)
Example:
setwd("\\\\medusa\\StudentWork\\(Your UTOR ID)\\GGR276\\Lab1")
The setwd() command tells R the current folder to look into when searching for data, saving outputs, etc.
Explain what is occurring in each of the following lines of code (2 marks each):
WaterRetaining <- read.csv("WaterRetainingFacilities.csv", sep = ',', header = TRUE)
plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
n<-nrow(WaterRetaining[1])
mc_x<-sum(WaterRetaining$easting_m)/n
mc_y<-sum(WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$easting_m))/sum(WaterRetaining$DamHeight)
wmc_y<-sum(as.numeric(WaterRetaining$DamHeight*WaterRetaining$northing_m))/sum(WaterRetaining$DamHeight)
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
install.packages("rmarkdown")
library(rmarkdown)
a) - This code reads the CSV file WaterRetainingFacilities.csv and assigns the resulting data frame to the object WaterRetaining. The sep = ',' argument indicates that the values are separated by commas, and header = TRUE tells R that the first row contains the column names.
b) - The plot function creates a scatter plot. The $ operator selects the columns we want from the data frame, here the easting and northing coordinates. The xlab and ylab arguments label the axes, xlim and ylim set the range of each axis by giving its minimum and maximum values, and the main argument sets the title.
c) - This code counts the total number of rows in the data frame and assigns that number to n.
d) - This code sums the easting and northing coordinates from the data frame and divides each sum by the total number of rows, which gives the mean easting and mean northing, i.e. the mean centre.
e) - This code adds a point to the scatter plot at the mean easting and northing coordinates. 'p' sets the plot type to points, pch = 15 chooses a filled square as the symbol, cex = 2 draws it at twice the default size, and col = "blue" colours it blue.
f) - This code calculates the weighted mean centre: each coordinate is multiplied by its dam height, the products are summed, and the sum is divided by the sum of all dam heights in the data frame. The as.numeric function converts the products to numeric (double) values before they are summed. The results are assigned to wmc_x and wmc_y.
g) - This code adds a legend to the scatter plot. It specifies the location (top right), the labels of the entries, and the symbol shape (pch) and colours (blue and red) shown for each entry.
h) - This code installs the rmarkdown package and loads it into the current R session.
Question 9: What do the mean and weighted mean center tell you about the distribution of the filtered Water Retaining Facility locations and their dam heights? Please explain. (3 marks)
The mean centre is the average location of the filtered Water Retaining Facilities, with every facility counted equally. The weighted mean centre additionally weights each location by its dam height, so it is pulled toward the areas where the taller dams are located. If the two centres are close together, dam heights are spread fairly evenly across the facilities; if they differ a lot, the taller dams are concentrated in the direction of the weighted mean centre.
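One way to put a number on how much the two centres differ is the straight-line distance between them. The lab does not ask for this step; the sketch below is only an illustration, and the coordinates in it are hypothetical, not the lab results.

```r
# Hypothetical centres in metres (not the lab results).
mc  <- c(x = 10,   y = 10)   # mean centre
wmc <- c(x = 12.5, y = 15)   # weighted mean centre

shift <- wmc - mc                      # east-west and north-south offsets
shift_distance <- sqrt(sum(shift^2))   # straight-line distance between the centres

print(shift)
print(shift_distance)
```

The sign of each offset gives the direction of the pull: positive x means the taller dams lie further east, and positive y means they lie further north.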
Do facilities with different dam heights have the same mean centre? Imagine we are interested in how the distribution of Hydroelectric Water Retaining Facilities with sm_DamHeights (Dam Heights less than the median) differs from those with lrg_DamHeights (Dam Heights greater than or equal to the median). To do this, we will split the filtered_WaterRetaining dataset into two subsets. One subset will contain only the facilities with Dam Heights below the median, and the other will contain only those with Dam Heights at or above the median.
#TODO:find the median using the summary function
summary(filtered_WaterRetaining)
## ReservoirName DamType FacilityAim DamHeight
## Length:196 Length:196 Length:196 Min. : 2.00
## Class :character Class :character Class :character 1st Qu.: 15.00
## Mode :character Mode :character Mode :character Median : 23.60
## Mean : 43.86
## 3rd Qu.: 53.25
## Max. :285.00
## CrestLevel CrestLength ImpoundmentVolume ImpoundmentLevel
## Min. : 255.0 Min. : 20.00 Min. : 0.02 Min. : 254.2
## 1st Qu.: 690.5 1st Qu.: 81.12 1st Qu.: 0.24 1st Qu.: 685.6
## Median :1306.7 Median : 145.00 Median : 1.60 Median :1304.9
## Mean :1279.9 Mean : 232.05 Mean : 20.45 Mean :1278.0
## 3rd Qu.:1820.4 3rd Qu.: 325.50 3rd Qu.: 17.34 3rd Qu.:1819.5
## Max. :2476.0 Max. :1025.00 Max. :385.00 Max. :2474.6
## StorageLevel Construction StartSuperVision DamName
## Min. : 2.51 Min. :1872 Length:196 Length:196
## 1st Qu.: 10.50 1st Qu.:1937 Class :character Class :character
## Median : 18.35 Median :1957 Mode :character Mode :character
## Mean : 40.20 Mean :1952
## 3rd Qu.: 56.50 3rd Qu.:1965
## Max. :284.00 Max. :2022
## easting_m northing_m
## Min. :2487028 Min. :1086090
## 1st Qu.:2616057 1st Qu.:1134542
## Median :2679456 Median :1162090
## Mean :2669908 Mean :1169896
## 3rd Qu.:2722481 3rd Qu.:1198310
## Max. :2831511 Max. :1284130
sm_DamHeights<- subset(filtered_WaterRetaining, (filtered_WaterRetaining$DamHeight < 23.60))
lrg_DamHeights <- subset(filtered_WaterRetaining, (filtered_WaterRetaining$DamHeight >= 23.60))
#TODO lrg_DamHeights
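The split above types the median in by hand after reading it from summary(). As a hedged alternative, the threshold could be computed with median() and reused, which avoids copying a rounded value. The data frame `hydro` and its heights below are made up for illustration.

```r
# Hypothetical dam heights; in the lab this would be filtered_WaterRetaining.
hydro <- data.frame(DamHeight = c(5, 12, 23.6, 40, 285))

# Compute the threshold once instead of typing it by hand.
height_median <- median(hydro$DamHeight)

small_dams <- subset(hydro, DamHeight <  height_median)   # below the median
large_dams <- subset(hydro, DamHeight >= height_median)   # at or above the median

print(height_median)
print(c(small = nrow(small_dams), large = nrow(large_dams)))
```

Because one subset uses < and the other uses >=, every facility lands in exactly one of the two groups.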
Once created, the subset can be viewed by calling the object (sm_DamHeights) in the Console or by using View(sm_DamHeights). If you want to know the number of Small Dam Heights, you can run:
nrow(sm_DamHeights[1])
Question 10: Show the Switzerland Water Retaining Facilities scatter plot using the subset data that you created. Remember to overlay the subsets on the original data to see the distribution of the subset. Use a different color for the subsets and the original dataset. Also plot the mean centres for the entire dataset and the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Please do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)
#TODO
plot(WaterRetaining$easting_m, WaterRetaining$northing_m, xlab="Easting (m)", ylab="Northing (m)", main = "Switzerland's Water Retaining Facilities, (Amroop Bains, 1008063863)", xlim=c(2488000, 2840000), ylim=c(1085000, 1285000))
points(sm_DamHeights$easting_m, sm_DamHeights$northing_m, col = "green")
points(lrg_DamHeights$easting_m, lrg_DamHeights$northing_m, col = "orange")
n<-nrow(filtered_WaterRetaining[1])
mc_x<-sum(filtered_WaterRetaining$easting_m)/n
mc_y<-sum(filtered_WaterRetaining$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
n<-nrow(sm_DamHeights[1])
mc_x<-sum(sm_DamHeights$easting_m)/n
mc_y<-sum(sm_DamHeights$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="green")
n<-nrow(lrg_DamHeights[1])
mc_x<-sum(lrg_DamHeights$easting_m)/n
mc_y<-sum(lrg_DamHeights$northing_m)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col='orange')
legend("topright", legend = c("All Facilities", "Small Dam Heights", "Large Dam Heights", "Mean Centre (Hydroelectric)", "Mean Centre Small Dam Heights", "Mean Centre Large Dam Heights"), pch = c(1,1,1,15,15,15), col = c("black", "green", "orange", "blue", "green", "orange"))
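The mean centre is computed three times above with the same three lines. A small helper function could remove that repetition. The sketch below is one possible way to write it; `mean_centre` is a hypothetical helper, not part of the lab, and the data are made up.

```r
# Hypothetical helper: returns the mean easting and northing of a data frame.
mean_centre <- function(df) {
  c(x = mean(df$easting_m), y = mean(df$northing_m))
}

# Hypothetical facilities (not the lab data).
all_dams <- data.frame(
  easting_m  = c(0, 10, 20, 30),
  northing_m = c(0, 20, 40, 60),
  DamHeight  = c(5, 10, 50, 100)
)
small <- subset(all_dams, DamHeight <  30)
large <- subset(all_dams, DamHeight >= 30)

# One column per dataset, with rows "x" and "y".
centres <- sapply(list(all = all_dams, small = small, large = large), mean_centre)

print(centres)
```

Each column of `centres` holds one centre, so the x and y rows could be passed to points() with a vector of colours to draw all three centres in a single call.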
So far, we have measured the central tendency of the spatial data. How about dispersion? Standard deviation is a measure of dispersion that can be used to assess the distribution of spatial data. To calculate the orthogonal dispersion (east-west, north-south) associated with the filtered_WaterRetaining dataset, we will use the sd() command applied to easting and northing, respectively. Please do the same for the subsets in Question 11.
sd(filtered_WaterRetaining$easting_m)
## [1] 70065.49
sd(filtered_WaterRetaining$northing_m)
## [1] 48814.57
sd(sm_DamHeights$easting_m)
## [1] 69253.81
sd(sm_DamHeights$northing_m)
## [1] 54962.51
sd(lrg_DamHeights$easting_m)
## [1] 70923.38
sd(lrg_DamHeights$northing_m)
## [1] 37380.44
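The six sd() calls above could also be written more compactly. The sketch below applies sd() to both coordinate columns at once and rounds to one decimal place, using a made-up data frame (`sites`, hypothetical coordinates in metres).

```r
# Hypothetical coordinates in metres (not the lab data).
sites <- data.frame(
  easting_m  = c(0, 10, 20),
  northing_m = c(0, 2, 4)
)

# Apply sd() to each coordinate column, then round to one decimal place.
dispersion <- round(sapply(sites[c("easting_m", "northing_m")], sd), 1)

print(dispersion)   # east-west and north-south dispersion, in metres
```

Here the east-west dispersion is five times the north-south dispersion, which means these made-up sites are stretched along the east-west axis.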
Question 11: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for Water Retaining Facilities dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. Water Retaining Hydroelectric Facilities (filtered) vs. Small Dam Heights vs. Large Dam Heights). Remember to include units of measurement in your response and round to one decimal place. Finally, why was median used to subset the data versus the mean? (6 marks)
#TODO
sd(filtered_WaterRetaining$DamHeight)
## [1] 48.62984
sd(lrg_DamHeights$DamHeight)
## [1] 54.02332
sd(sm_DamHeights$DamHeight)
## [1] 4.987907
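The orthogonal dispersion of the filtered hydroelectric facilities is 70065.5 m east-west and 48814.6 m north-south, so the facilities are spread more widely east-west than north-south. Small dams have a similar east-west dispersion (69253.8 m) and the largest north-south dispersion (54962.5 m). Large dams have the largest east-west dispersion (70923.4 m) but the smallest north-south dispersion (37380.4 m), so they are more tightly clustered in the north-south direction. For dam height itself, the filtered facilities have a standard deviation of 48.6, small dams have the lowest at 5.0, and large dams have the highest at 54.0, which is higher than the filtered value and indicates that heights at or above the median vary far more than heights below it. The median was used to subset the data rather than the mean because the dam heights are skewed by high outliers (mean 43.86 versus median 23.60). The median is not pulled by those outliers, and it divides the facilities into two groups of roughly equal size.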