rm(list=ls()); gc()

##          used (Mb) gc trigger (Mb) limit (Mb) max used (Mb)
## Ncells 528671 28.3    1175765 62.8         NA   669445 35.8
## Vcells 975057  7.5    8388608 64.0      16384  1851708 14.2

Part 3A. Non-spatial Statistics (Date: 29/01/2025)

Import Data

stations <- read.csv("station.csv", sep = ',', header = TRUE)

Descriptive Statistics of CNG_Compression

Now that the data are imported, we are ready to calculate the median, mean, range, and quartiles of CNG_Compression.

summary(stations$CNG_Compression)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     2.0   250.0   672.0   912.4  1182.8 10620.0  108907

Next, we will calculate the standard deviation of CNG_Compression. If there are missing values in your dataset, add na.rm = TRUE to your code to tell R to remove NAs before the calculation, like this: sd(stations$CNG_Compression, na.rm = TRUE).

sd(stations$CNG_Compression, na.rm = TRUE)

## [1] 1067.735
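As a minimal sketch of what na.rm changes, the following uses a small made-up vector (the values are illustrative and are not taken from the stations data):

```r
# Hypothetical toy vector; values are illustrative only
x <- c(2, 250, NA, 672, 1183)

sd_with_na <- sd(x)                 # a single missing value makes the result NA
sd_no_na   <- sd(x, na.rm = TRUE)   # missing values are dropped before computing
n_missing  <- sum(is.na(x))         # number of values that na.rm = TRUE drops

print(sd_with_na)
print(sd_no_na)
print(n_missing)
```

Checking sum(is.na(...)) alongside the statistic is a simple way to report how many observations the result is actually based on.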

Descriptive Statistics of CNG_Dispensers

Question 6: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of the CNG_Dispensers. (2 marks)

summary(stations$CNG_Dispensers)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##       0       1       2       3       2      82  108292

sd(stations$CNG_Dispensers, na.rm = TRUE)

## [1] 7.148115

range(stations$CNG_Dispensers, na.rm = TRUE)

## [1]  0 82

IQR(stations$CNG_Dispensers, na.rm = TRUE)

## [1] 1

Visualize

Visualize the spread of CNG_Compression, and answer Question 7: According to the boxplot, would you use the median or the mean as your central tendency measure? Explain and justify your choice. (2 marks)

Type your response here:
I would use the median as the central tendency measure. According to the boxplot, the data are positively skewed, with many outliers above the upper whisker; the summary agrees, since the mean (912.4) is well above the median (672.0). The mean is pulled toward extreme values while the median is not, so the median better represents a typical station when the distribution is skewed. The mode, by contrast, suits bimodal or multimodal data rather than skewed distributions.

boxplot(stations$CNG_Compression)
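The effect of a long upper tail on the two measures can be sketched with a small made-up sample (hypothetical values, not the stations data). boxplot.stats() returns the values that boxplot() would draw as individual outlier points under its default 1.5 × IQR rule:

```r
# Hypothetical right-skewed sample; values are illustrative only
x <- c(100, 200, 250, 300, 400, 500, 650, 700, 900, 10000)

centre_mean   <- mean(x)     # pulled upward by the single extreme value
centre_median <- median(x)   # depends only on the middle of the ordered data

# Values lying more than 1.5 * IQR beyond the hinges (the default coef)
outliers <- boxplot.stats(x)$out

print(centre_mean)
print(centre_median)
print(outliers)
```

Here one extreme value moves the mean far above the median, which is the same pattern the CNG_Compression summary shows on a larger scale.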

Part 3B. Mapping Mean and Weighted Mean Centre (Date: 29/01/2025)

Visualize the locations of the CNG stations and the mean and weighted mean centres.

CNG_stations <- stations[stations$Fuel_Type == "CNG", ]
plot(CNG_stations$Longitude, CNG_stations$Latitude, xlab="Longitude", ylab="Latitude", main = "CNG Stations in US and Canada", xlim=c(-125, -63), ylim=c(25, 62))
n<-nrow(CNG_stations[1])
mc_x<-sum(CNG_stations$Longitude)/n
mc_y<-sum(CNG_stations$Latitude)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x <- sum(as.numeric(CNG_stations$CNG_Compression * CNG_stations$Longitude), na.rm = TRUE) / sum(CNG_stations$CNG_Compression, na.rm = TRUE)
wmc_y <- sum(as.numeric(CNG_stations$CNG_Compression * CNG_stations$Latitude), na.rm = TRUE) / sum(CNG_stations$CNG_Compression, na.rm = TRUE)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))

Code for different colours in R can be found here: http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf
Symbol code in R can be found here: http://www.statmethods.net/advgraphs/parameters.html
Use xlim=c(,) or ylim=c(,) in plot() to change the scale of the dataset. Make sure that there isn’t much white space and that the legend does not cover the data points. You can also change the position of the legend.

Question 8: Produce comments (i.e., a detailed explanation for each parameter in the code) that describe the execution of the statements given to you in the stations example. Make sure you provide a description for each set of statements and organize your answer in a manner similar to the following example. (16 marks)

Example: setwd (“\medusaUTOR ID) The setwd() command tells R the current folder to look into when searching for data, saving outputs etc.

Explain what is occurring in each of the following lines of code:

stations <- read.csv("stations.csv", sep = ',', header = TRUE)
plot(CNG_stations$Longitude, CNG_stations$Latitude, xlab="Longitude", ylab="Latitude", main = "CNG Stations in US and Canada", xlim=c(-125, -63), ylim=c(25, 62))
n<-nrow(CNG_stations[1])
mc_x<-sum(CNG_stations$Longitude)/n
mc_y<-sum(CNG_stations$Latitude)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x<-sum(as.numeric(CNG_stations$CNG_Compression*CNG_stations$Longitude))/sum(CNG_stations$CNG_Compression)
wmc_y<-sum(as.numeric(CNG_stations$CNG_Compression*CNG_stations$Latitude))/sum(CNG_stations$CNG_Compression)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))

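Type your response here:
a) read.csv() reads the file “stations.csv” and stores it in the data frame stations; sep = ',' states that values are separated by commas, and header = TRUE states that the first row holds the column names.
b) plot() draws a scatter plot with Longitude on the x-axis and Latitude on the y-axis; xlab and ylab label the axes, main sets the title, and xlim and ylim restrict the plotted range of longitude and latitude.
c) nrow() counts the rows of CNG_stations (CNG_stations[1] is its first column, kept as a data frame) and stores the count in n.
d) mc_x and mc_y are the coordinates of the mean centre: the sum of the longitudes and the sum of the latitudes of CNG_stations, each divided by n.
e) points() adds the mean centre to the existing plot; 'p' draws it as a point, pch = 15 makes the symbol a filled square, cex = 2 doubles its size, and col = "blue" colours it blue.
f) wmc_x and wmc_y are the coordinates of the weighted mean centre: each longitude and latitude is multiplied by the station's CNG_Compression, as.numeric() ensures the products are treated as numbers, the products are summed, and the sum is divided by the total CNG_Compression.
g) points() adds the weighted mean centre to the plot as a filled red square at twice the default size.
h) legend() places a legend in the top-right corner of the plot; legend = gives the labels “Mean centre” and “Weighted mean centre”, pch = c(15,15) gives both entries a filled square, and col gives them blue and red, respectively.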

Question 9: What do the mean and weighted mean center tell you about the distribution of the CNG station locations and their compression? Please explain. (3 marks)

Type your response here: The mean centre is the average location of all CNG stations: their geographic centre, with every station counted equally. The weighted mean centre weights each station by its CNG_Compression, so stations with higher compression capacity have a larger influence. Both are measures of central tendency for spatial data, helping us understand the overall distribution of CNG stations. If the two centres are close together, compression capacity is spread fairly evenly across the station locations. If they are far apart, high-capacity stations are concentrated in the direction toward which the weighted mean centre is displaced from the mean centre.
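How a few high-capacity stations displace the weighted mean centre can be sketched with a made-up data frame. The toy values are hypothetical; the column names simply mirror those used in this lab, and weighted.mean() from base R stands in for the explicit sum-of-products formula:

```r
# Hypothetical stations: three low-capacity stations in the west,
# one high-capacity station in the east (values are illustrative only)
toy <- data.frame(
  Longitude       = c(-120, -118, -116, -80),
  Latitude        = c(35, 36, 37, 40),
  CNG_Compression = c(100, 100, 100, 1700)
)

# Keep only rows with a known weight, so that the numerator and the
# denominator of the weighted mean are built from the same stations
toy <- toy[!is.na(toy$CNG_Compression), ]

# Mean centre: every station counts equally
mc <- c(x = mean(toy$Longitude), y = mean(toy$Latitude))

# Weighted mean centre: each station counts in proportion to its compression
wmc <- c(x = weighted.mean(toy$Longitude, toy$CNG_Compression),
         y = weighted.mean(toy$Latitude,  toy$CNG_Compression))

print(mc)
print(wmc)
```

In this toy case the weighted mean centre lies well to the east of the mean centre, because the single eastern station carries most of the total compression.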

Part 3C. Create and Visualize Subsets (Date: 29/01/2025)

Create Subsets

There are different types of fuels. Do they have the same mean centre? Imagine we are interested in how the distribution of stations with ELEC differs from that of stations with LPG. To do this, we will split the stations dataset into two subsets. One subset will contain only stations using ELEC as fuel and the other will contain only those using LPG.

ELEC_stations <- subset(stations, (stations$Fuel_Type == "ELEC"))
LPG_stations <- subset(stations, (stations$Fuel_Type == "LPG"))

Once created, a subset can be printed in the Console by calling the object (ELEC_stations) or opened in the data viewer using View(ELEC_stations). If you want to know the number of ELEC stations, you can run: nrow(ELEC_stations).
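As a minimal sketch of checking subsets before plotting them, the following uses a made-up data frame (hypothetical rows; only the column names follow this lab). table() counts stations per fuel type, and split() is an alternative that builds every subset in one call:

```r
# Hypothetical stations; values are illustrative only
toy <- data.frame(
  Fuel_Type = c("ELEC", "LPG", "ELEC", "CNG", "ELEC"),
  Longitude = c(-79.4, -123.1, -73.6, -114.1, -75.7),
  Latitude  = c(43.7, 49.3, 45.5, 51.0, 45.4)
)

# Number of stations for each fuel type
counts <- table(toy$Fuel_Type)

# One subset: inside subset(), columns can be named without the toy$ prefix
elec <- subset(toy, Fuel_Type == "ELEC")

# All subsets at once, as a named list of data frames
by_fuel <- split(toy, toy$Fuel_Type)

print(counts)
print(nrow(elec))
print(names(by_fuel))
```

Comparing nrow() of each subset with the matching entry of table() is a quick check that the filter matched the intended fuel code.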

Visualize Subsets

Question 10: Show the stations scatter plot using the subset data that you created. Use a different color for the subsets. Also plot the mean centres for the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)

nrow(ELEC_stations)

## [1] 96202

nrow(LPG_stations)

## [1] 3549

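# ELEC stations are drawn first, then LPG stations are overlaid in a second colour
plot(ELEC_stations$Longitude, ELEC_stations$Latitude, 
     xlab="Longitude (°)", ylab="Latitude (°)", main = "ELEC and LPG Stations in US and Canada (MAN LAI MANNIE SUM 1008964952)",
     xlim=c(-125, -63), ylim=c(25, 62), col="lightblue")
points(LPG_stations$Longitude, LPG_stations$Latitude, col="orange")
# Mean centre of each subset: mean longitude and mean latitude
mELEC_x<-sum(ELEC_stations$Longitude)/nrow(ELEC_stations)
mELEC_y<-sum(ELEC_stations$Latitude)/nrow(ELEC_stations)
mLPG_x<-sum(LPG_stations$Longitude)/nrow(LPG_stations)
mLPG_y<-sum(LPG_stations$Latitude)/nrow(LPG_stations)
# Filled squares mark the two mean centres
points(mELEC_x,mELEC_y,'p',pch=15,cex=2,col="blue")
points(mLPG_x, mLPG_y,'p',pch=15,cex=2,col='red')
# The legend separates station points (open circles) from mean centres (filled squares)
legend("topright", legend = c("ELEC stations", "LPG stations", "ELEC mean centre", "LPG mean centre"), pch = c(1,1,15,15), col = c("lightblue","orange","blue","red"))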

Part 3D. Dispersion of Subsets (Date: 29/01/2025)

Standard Deviation of Subsets

So far, we have measured the central tendency of the spatial data. How about dispersion? Standard deviation is a measure of dispersion that can be used to assess the spread of spatial data. To calculate the orthogonal dispersion (east-west, north-south) of the CNG_stations dataset, we will apply the sd() command to Longitude and Latitude, respectively. Please do the same for the subsets in Question 11.

sd(CNG_stations$Longitude)

## [1] 16.70479

sd(CNG_stations$Latitude)

## [1] 5.167413

Question 11: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for the stations dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. stations vs. ELEC_stations vs. LPG_stations). Remember to include units of measurement in your response. (6 marks)

sd(ELEC_stations$Longitude)

## [1] 19.80548

sd(ELEC_stations$Latitude)

## [1] 5.720748

sd(LPG_stations$Longitude)

## [1] 15.63939

sd(LPG_stations$Latitude)

## [1] 6.650902

Type your response here: The longitude standard deviation of CNG_stations is 16.70479°, while the latitude standard deviation of CNG_stations is 5.167413°. The longitude standard deviation of ELEC_stations is 19.80548°, while the latitude standard deviation of ELEC_stations is 5.720748°. The longitude standard deviation of LPG_stations is 15.63939°, while the latitude standard deviation of LPG_stations is 6.650902°. East-west, ELEC_stations are the most dispersed: their longitude standard deviation is 3.10069° larger than that of CNG_stations and 4.16609° larger than that of LPG_stations, which are the least dispersed (1.06540° less than CNG_stations). North-south, LPG_stations are the most dispersed: their latitude standard deviation is 1.483489° larger than that of CNG_stations, which are the least dispersed, while ELEC_stations are in the middle. In every dataset the east-west dispersion is roughly two to three times the north-south dispersion. Standard deviation measures spread about each dataset's own mean centre, not position, so these values do not show whether one set of stations lies east, west, north, or south of another; comparing the mean centres would show that.
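The separate sd() calls above can also be gathered into one comparison table. This is a minimal sketch on a made-up data frame (hypothetical values; only the column names follow this lab), using aggregate() from base R to apply sd() to both coordinates within each fuel type:

```r
# Hypothetical stations; values are illustrative only
toy <- data.frame(
  Fuel_Type = c("ELEC", "ELEC", "ELEC", "LPG", "LPG", "LPG"),
  Longitude = c(-120, -100, -80, -105, -100, -95),
  Latitude  = c(40, 45, 50, 30, 45, 60)
)

# One row per fuel type: east-west (Longitude) and north-south (Latitude)
# standard deviation, both in degrees
dispersion <- aggregate(cbind(Longitude, Latitude) ~ Fuel_Type,
                        data = toy, FUN = sd)

print(dispersion)
```

In this toy table the ELEC rows are more spread out east-west and the LPG rows more spread out north-south. That describes spread only and says nothing about where either group is centred.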

GGR276 Lab 1 Part 3 Understanding the GEO in Geostatistics

MAN LAI MANNIE SUM

2025-01-29