R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.
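If your R Markdown file is NOT in the same folder as your data, set your working directory with setwd() first. Here is an example:
setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1")
You will need to change the path to reflect your personal directory. Inside an R string a single backslash starts an escape sequence, so write Windows paths with forward slashes (or with doubled backslashes).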
Otherwise, you may skip this step and continue to import the data by reading your stations.csv file. You can view the data by clicking stations in the Environment window or by running View(stations) in the Console. Click the small green triangle on the right of a chunk to run that chunk.
setwd("/Users/amroopbains/Downloads")
stations <- read.csv("stations.csv", sep = ',', header = TRUE)
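The two lines above set the working directory and read the file. The sketch below uses a made-up three-row file in a temporary folder as a stand-in for stations.csv (the file name stations_demo.csv and its values are hypothetical). It shows how a Windows-style path can be written without escape problems, that sep = ',' and header = TRUE are already the defaults of read.csv(), and how to check what was imported.

```r
# A doubled backslash inside an R string is ONE backslash character
nchar("\\")

# file.path() joins folder names with "/", which R also accepts on Windows;
# the folder names are the placeholder parts of the lab's example path
example_path <- file.path("//medusa", "StudentWork", "(Your UTOR ID)", "GGR276", "Lab1")
print(example_path)

# Work in a temporary folder; setwd() returns the previous directory so it can be restored
old_wd <- setwd(tempdir())

# Hypothetical stand-in for stations.csv (made-up values)
writeLines(c("Fuel_Type,Longitude,Latitude,CNG_Compression",
             "CNG,-79.4,43.7,250",
             "ELEC,-123.1,49.3,NA",
             "LPG,-73.6,45.5,NA"),
           "stations_demo.csv")

# Confirm the file is in the working directory before reading it
stopifnot(file.exists("stations_demo.csv"))

# sep = "," and header = TRUE are read.csv()'s defaults, so both calls give the same data frame
demo_explicit <- read.csv("stations_demo.csv", sep = ",", header = TRUE)
demo_default  <- read.csv("stations_demo.csv")
print(identical(demo_explicit, demo_default))

# Inspect the number of rows, the column names and the column types
str(demo_default)

# Restore the original working directory
setwd(old_wd)
```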
Now that the data have been imported, we can calculate the median, mean, range, and quartiles of CNG_Compression.
summary(stations$CNG_Compression)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 2.0 250.0 672.0 912.4 1182.8 10620.0 108521
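summary() bundles several statistics into one call. A minimal sketch, using a short made-up vector in place of stations$CNG_Compression (the values are hypothetical), computes the same quantities one at a time. na.rm = TRUE is needed because the real column contains NAs.

```r
# Made-up compression values with one missing entry (stand-in for stations$CNG_Compression)
compression <- c(2, 250, 672, NA, 1183, 10620)

# All-in-one: min, quartiles, median, mean, max and the NA count
summary(compression)

# The same statistics computed individually; na.rm = TRUE drops the NA first
median(compression, na.rm = TRUE)
mean(compression, na.rm = TRUE)
range(compression, na.rm = TRUE)                            # minimum and maximum
quantile(compression, probs = c(0.25, 0.75), na.rm = TRUE)  # 1st and 3rd quartiles
sum(is.na(compression))                                     # the NA's column of summary()
```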
Next, we calculate the standard deviation of CNG_Compression. If there are missing values in your dataset, add na.rm = TRUE to the function call to tell R to remove the NAs before the calculation, like this:
sd(stations$CNG_Compression, na.rm = TRUE)
## [1] 1067.735
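Two details about sd() are easy to check on a small made-up vector (hypothetical values, not the lab data): without na.rm = TRUE a single NA makes the result NA, and sd() uses the sample formula with n - 1 in the denominator.

```r
compression <- c(2, 250, 672, NA, 1183, 10620)

# One missing value propagates: the result is NA
sd(compression)

# Drop the NA before computing
sd(compression, na.rm = TRUE)

# Same value from the sample standard deviation formula: sqrt(sum((x - mean)^2) / (n - 1))
x <- compression[!is.na(compression)]
manual_sd <- sqrt(sum((x - mean(x))^2) / (length(x) - 1))
all.equal(manual_sd, sd(compression, na.rm = TRUE))
```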
Question 6: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of CNG_Dispensers. (2 marks)
summary(stations$CNG_Dispensers)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0 1 2 3 2 82 107907
sd(stations$CNG_Dispensers, na.rm = TRUE)
## [1] 7.15083
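Question 6 also asks for the range and the interquartile range, which summary() only gives implicitly (Max. minus Min., and 3rd Qu. minus 1st Qu.). A minimal sketch with made-up dispenser counts standing in for stations$CNG_Dispensers (hypothetical values): range() returns the minimum and maximum, and IQR() uses the same default quantile method (type 7) as summary().

```r
dispensers <- c(0, 1, 1, 2, 2, 2, 3, 82, NA)

range(dispensers, na.rm = TRUE)        # minimum and maximum
diff(range(dispensers, na.rm = TRUE))  # range as a single number: max - min
IQR(dispensers, na.rm = TRUE)          # 3rd quartile - 1st quartile

# IQR() matches the difference of the quartiles reported by quantile()
q <- quantile(dispensers, c(0.25, 0.75), na.rm = TRUE)
unname(q[2] - q[1])
```

On the lab data, the corresponding calls would be diff(range(stations$CNG_Dispensers, na.rm = TRUE)) and IQR(stations$CNG_Dispensers, na.rm = TRUE).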
Visualize the spread of CNG_Compression, and answer Question 7: According to the boxplot, would you use the median or the mean as your measure of central tendency? Explain and justify your choice. (2 marks)
Type your response here:
boxplot(stations$CNG_Compression)
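When reading the boxplot for Question 7, it can help to mark the mean next to the median line and to look at the numbers boxplot() draws. A sketch with made-up compression values (hypothetical, not the lab data): boxplot.stats() returns the whisker ends, hinges and median in $stats and the points drawn as outliers in $out, and the red cross marks the mean.

```r
compression <- c(2, 120, 250, 400, 672, 800, 1183, 1500, 10620, NA)

# Numbers behind the boxplot (NAs are omitted): lower whisker, lower hinge, median, upper hinge, upper whisker
box_stats <- boxplot.stats(compression)
box_stats$stats
box_stats$out   # values more than 1.5 box lengths beyond the box, drawn as outliers

# Draw the boxplot and mark the mean (red cross) for comparison with the median line
boxplot(compression, ylab = "CNG_Compression (made-up values)")
points(1, mean(compression, na.rm = TRUE), pch = 4, cex = 2, col = "red")
legend("topright", legend = "Mean", pch = 4, col = "red")

mean(compression, na.rm = TRUE)
median(compression, na.rm = TRUE)
```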
Visualize the locations of the CNG stations and the mean and weighted mean centres.
CNG_stations <- stations[stations$Fuel_Type == "CNG", ]
plot(CNG_stations$Longitude, CNG_stations$Latitude, xlab="Longitude", ylab="Latitude", main = "CNG Stations in US and Canada", xlim=c(-125, -63), ylim=c(25, 62))
n<-nrow(CNG_stations[1])
mc_x<-sum(CNG_stations$Longitude)/n
mc_y<-sum(CNG_stations$Latitude)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x <- sum(as.numeric(CNG_stations$CNG_Compression * CNG_stations$Longitude), na.rm = TRUE) / sum(CNG_stations$CNG_Compression, na.rm = TRUE)
wmc_y <- sum(as.numeric(CNG_stations$CNG_Compression * CNG_stations$Latitude), na.rm = TRUE) / sum(CNG_stations$CNG_Compression, na.rm = TRUE)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
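The sum-and-divide lines above can also be written with mean() and weighted.mean(). A minimal sketch on four made-up stations (hypothetical coordinates and compression values). Note that weighted.mean(x, w, na.rm = TRUE) only drops NAs in x, not in the weights w, so the sketch first keeps the rows where the coordinates and the weight are all present; this also keeps the numerator and the denominator over exactly the same stations.

```r
pts <- data.frame(
  Longitude       = c(-79.4, -73.6, -123.1, -114.1),
  Latitude        = c(43.7, 45.5, 49.3, 51.0),
  CNG_Compression = c(250, 1000, NA, 750)
)

# Mean centre: mean() gives sum / n (up to floating-point rounding)
mc_x <- mean(pts$Longitude)
mc_y <- mean(pts$Latitude)
all.equal(mc_x, sum(pts$Longitude) / nrow(pts))

# Weighted mean centre: keep only complete rows so each weight matches its coordinates
ok <- complete.cases(pts[, c("Longitude", "Latitude", "CNG_Compression")])
w  <- pts$CNG_Compression[ok]
wmc_x <- weighted.mean(pts$Longitude[ok], w)  # sum(w * x) / sum(w)
wmc_y <- weighted.mean(pts$Latitude[ok], w)

c(mc_x = mc_x, mc_y = mc_y, wmc_x = wmc_x, wmc_y = wmc_y)
```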
Question 8: Produce comments (i.e., a detailed explanation of each parameter in the code) that describe the execution of the statements given to you in the stations example. Make sure you provide a description for each set of statements, and organize your answer in a manner similar to the following example. (16 marks)
Example:
setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1")
The setwd() command tells R which folder to look in when searching for data, saving outputs, etc.
Explain what is occurring in each of the following lines of code:
stations <- read.csv("stations.csv", sep = ',', header = TRUE)
plot(CNG_stations$Longitude, CNG_stations$Latitude, xlab="Longitude", ylab="Latitude", main = "CNG Stations in US and Canada", xlim=c(-125, -63), ylim=c(25, 62))
n<-nrow(CNG_stations[1])
mc_x<-sum(CNG_stations$Longitude)/n
mc_y<-sum(CNG_stations$Latitude)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(CNG_stations$CNG_Compression*CNG_stations$Longitude))/sum(CNG_stations$CNG_Compression)
wmc_y<-sum(as.numeric(CNG_stations$CNG_Compression*CNG_stations$Latitude))/sum(CNG_stations$CNG_Compression)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
Type your response here:
a. read.csv() reads the stations.csv file into a data frame, which is assigned to the object stations; sep = ',' indicates that the values are comma-separated, and header = TRUE indicates that the first row contains the column names.
b. plot() creates a scatterplot of the CNG stations with Longitude on the x axis and Latitude on the y axis. xlab and ylab label the axes "Longitude" and "Latitude", main sets the title to "CNG Stations in US and Canada", xlim limits the x axis to -125 to -63, and ylim limits the y axis to 25 to 62.
c. CNG_stations[1] selects the first column of CNG_stations (as a one-column data frame), and nrow() counts its rows, i.e., the number of CNG stations; the result is assigned to n.
d. The mean longitude is calculated by summing all longitudes with sum() and dividing by the number of stations (n), and is assigned to mc_x. Likewise, the mean latitude is the sum of all latitudes divided by n, and is assigned to mc_y. Together, (mc_x, mc_y) is the mean centre.
e. points() adds the mean centre (mc_x, mc_y) to the existing plot: 'p' sets the plot type to points, pch = 15 draws a filled square, cex = 2 draws the symbol at twice the default size, and col = "blue" colours it blue.
f. The weighted mean centre of the CNG stations is calculated using each station's CNG_Compression value as its weight. Each station's longitude (for wmc_x) or latitude (for wmc_y) is multiplied by its compression value, as.numeric() converts these products to numeric values, sum() adds them up, and the total is divided by the sum of all compression values.
g. points() adds the weighted mean centre (wmc_x, wmc_y) to the plot as a red square, using the same type ('p'), symbol (pch = 15), and size (cex = 2) as the mean centre.
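h. legend() adds a legend in the top-right corner of the plot: legend = c(...) sets the labels "Mean centre" and "Weighted mean centre", pch = c(15, 15) shows a filled square for each, and col = c("blue", "red") matches the colours of the two centres.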
Question 9: What do the mean and weighted mean center tell you about the distribution of the CNG station locations and their compression? Please explain. (3 marks)
The mean centre represents the average location of all CNG stations, while the weighted mean centre also takes each station's compression value into account. As a result, the weighted mean centre is pulled toward stations with high compression values, so the offset between the two centres shows where high-compression stations are relatively concentrated.
There are different types of fuels. Do they have the same mean centre? Imagine we are interested in how the distribution of ELEC stations differs from that of LPG stations. To do this, we will split the stations dataset into two subsets: one containing only stations that use ELEC as fuel, and the other containing only stations that use LPG.
ELEC_stations <- subset(stations, Fuel_Type == "ELEC")
LPG_stations <- subset(stations, Fuel_Type == "LPG")
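subset() and bracket indexing (as used for CNG_stations earlier) treat missing values differently. A sketch on a made-up five-row data frame (hypothetical values): when the condition is NA, subset() drops the row, whereas df[condition, ] keeps it as a row of NAs. Wrapping the condition in which() gives the same rows as subset().

```r
df <- data.frame(Fuel_Type = c("ELEC", "LPG", NA, "CNG", "ELEC"),
                 Longitude = c(-79.4, -73.6, -100.0, -123.1, -114.1))

via_subset  <- subset(df, Fuel_Type == "ELEC")      # rows where the condition is NA are dropped
via_bracket <- df[df$Fuel_Type == "ELEC", ]         # the NA condition becomes an all-NA row
via_which   <- df[which(df$Fuel_Type == "ELEC"), ]  # which() keeps only TRUE positions

nrow(via_subset)
nrow(via_bracket)
nrow(via_which)
```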
Once created, a subset can be viewed by running View(ELEC_stations) in the Console. To find the number of ELEC stations, run nrow(ELEC_stations).
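To count every fuel type at once rather than one subset at a time, table() tabulates a column. A sketch on a made-up Fuel_Type vector (hypothetical values); on the lab data the equivalent call would be table(stations$Fuel_Type).

```r
fuel <- c("ELEC", "LPG", "ELEC", "CNG", "ELEC", "LPG")

# Number of stations per fuel type, sorted alphabetically by type
counts <- table(fuel)
counts

# The ELEC count agrees with nrow() on the corresponding subset
as.integer(counts["ELEC"]) == nrow(subset(data.frame(Fuel_Type = fuel), Fuel_Type == "ELEC"))
```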
Question 10: Show the stations scatter plot using the subset data that you created. Use a different color for the subsets. Also plot the mean centres for the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)
plot(ELEC_stations$Longitude, ELEC_stations$Latitude,
xlab="Longitude (°)", ylab="Latitude (°)",
main="ELEC vs LPG Stations in US and Canada (Amroop Bains, 1008063863)",
col="red", pch=16, cex=0.5, xlim=c(-125, -63), ylim=c(25, 62))
# Add the LPG stations to the plot (using blue color)
points(LPG_stations$Longitude, LPG_stations$Latitude,
col="blue", pch=16, cex=0.5)
# Calculate and plot the mean center for ELEC stations
mc_ELEC_x <- mean(ELEC_stations$Longitude)
mc_ELEC_y <- mean(ELEC_stations$Latitude)
points(mc_ELEC_x, mc_ELEC_y, pch=15, cex=2, col="red") # Square marker for mean center
# Calculate and plot the mean center for LPG stations
mc_LPG_x <- mean(LPG_stations$Longitude)
mc_LPG_y <- mean(LPG_stations$Latitude)
points(mc_LPG_x, mc_LPG_y, pch=15, cex=2, col="blue") # Square marker for mean center
# Add a legend to the plot
legend("topright",
legend = c("ELEC Stations", "LPG Stations", "Mean Centre ELEC", "Mean Centre LPG"),
pch = c(16, 16, 15, 15), # Use pch=15 (square) for mean centers
col = c("red", "blue", "red", "blue"),
cex = 0.8)
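The per-subset mean() calls above can be collapsed into one step with aggregate(), which computes the mean centre of every fuel type at once. A sketch on made-up coordinates (hypothetical values); the formula interface drops rows with missing values by default.

```r
df <- data.frame(Fuel_Type = c("ELEC", "ELEC", "LPG", "LPG", "CNG"),
                 Longitude = c(-80, -120, -75, -95, -100),
                 Latitude  = c(44, 50, 46, 40, 42))

# One row per fuel type: mean longitude (x) and mean latitude (y) of its stations
centres <- aggregate(cbind(Longitude, Latitude) ~ Fuel_Type, data = df, FUN = mean)
centres
```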
So far, we have measured the central tendency of the spatial data. What about dispersion? Standard deviation is a measure of dispersion that can be used to assess the distribution of spatial data. To calculate the orthogonal dispersion (east-west and north-south) of the CNG_stations dataset, we apply sd() to Longitude and Latitude, respectively. Please do the same for the subsets in Question 11.
sd(CNG_stations$Longitude)
## [1] 16.70479
sd(CNG_stations$Latitude)
## [1] 5.167414
Question 11: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for the stations dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. stations vs. ELEC_stations vs. LPG_stations). Remember to include units of measurement in your response. (6 marks)
sd(ELEC_stations$Longitude)
## [1] 19.80361
sd(ELEC_stations$Latitude)
## [1] 5.722009
sd(LPG_stations$Longitude)
## [1] 15.63939
sd(LPG_stations$Latitude)
## [1] 6.650902
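With several datasets, a small helper applied with sapply() puts all the orthogonal standard deviations in one table, which makes the comparison for Question 11 easier to read. A sketch on two made-up subsets; the helper name orthogonal_sd is hypothetical, and on the lab data the list would hold CNG_stations, ELEC_stations and LPG_stations. Both values are in degrees.

```r
# Hypothetical helper: east-west (Longitude) and north-south (Latitude) standard deviations
orthogonal_sd <- function(df) {
  c(Longitude = sd(df$Longitude, na.rm = TRUE),
    Latitude  = sd(df$Latitude,  na.rm = TRUE))
}

# Made-up subsets standing in for the real station data frames
subsets <- list(
  ELEC = data.frame(Longitude = c(-80, -120, -100), Latitude = c(44, 50, 47)),
  LPG  = data.frame(Longitude = c(-75, -95),        Latitude = c(46, 40))
)

# Rows: direction; columns: dataset; values in degrees
sd_table <- sapply(subsets, orthogonal_sd)
round(sd_table, 2)
```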
Type your response here: For the CNG stations, the standard deviation of longitude is 16.70 degrees, meaning the stations are widely spread east-west, while the standard deviation of latitude is 5.17 degrees, a much narrower north-south dispersion. The ELEC stations have the widest east-west dispersion, with a longitude standard deviation of 19.80 degrees, and a comparatively narrow north-south dispersion of 5.72 degrees. The LPG stations have a longitude standard deviation of 15.64 degrees, a moderate east-west spread, and a latitude standard deviation of 6.65 degrees, which, as in the other groups, is much narrower than their east-west spread. In all three groups, dispersion is much greater in longitude than in latitude. ELEC stations have the highest east-west dispersion, and LPG stations have the highest north-south dispersion.