Info about R Markdown

R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document.

Part 3A. Non-spatial Statistics

Import Data

If your R Markdown file is NOT in the same folder as your data, please set your working directory using setwd() first, for example setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1"). Use forward slashes in the path, because R treats a single backslash inside a string as an escape character. You will need to change the path to reflect your personal directory. Otherwise, you may skip this step and import the data directly by reading your aviationfacilities.csv file. You may view the data by clicking aviationfacilities in the Environment window or by typing View(aviationfacilities). Click the little green triangle on the right of a chunk to run the current chunk.

  • To find out whether any line of code contains an error, run the script line by line and check carefully how each line works.
    • Any errors in your code will be displayed in the Console window (bottom left in RStudio), which will help you pinpoint where the error lies.
    • If there is an error, select each line of code individually and run the lines one at a time. This will tell you which line is causing the error.
setwd("//medusa/StudentWork/mannja10")

aviationfacilities <- read.csv("aviationfacilitiesdata.csv", sep = ',', header = TRUE)
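To see what header = TRUE and sep = ',' do without needing the lab file, here is a minimal sketch that reads a small inline CSV through the text argument of read.csv(). The column names mirror the lab's fields, but the three rows are made-up values for illustration only.

```r
# Hypothetical three-row CSV standing in for aviationfacilitiesdata.csv (made-up values)
csv_text <- "STATE_CODE,SITE_TYPE_CODE,ELEV,LONG_DECIMAL,LAT_DECIMAL
FL,A,55,-81.3,28.4
FL,H,10,-80.2,25.8
GA,A,1026,-84.4,33.6"

# header = TRUE: the first line becomes the column names
# sep = ",":     fields are split on commas (the read.csv default)
demo <- read.csv(text = csv_text, sep = ",", header = TRUE)

str(demo)    # a data.frame with 3 rows and 5 columns
names(demo)  # "STATE_CODE" "SITE_TYPE_CODE" "ELEV" "LONG_DECIMAL" "LAT_DECIMAL"
```

With header = FALSE, the first line would instead be read as a data row and the columns would be named V1 to V5.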

Descriptive Statistics of aviation facilities

Now that the data are imported, we are ready to calculate the median, mean, range, and quartiles of the elevation (ELEV).

summary(aviationfacilities$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -223     275     755    1189    1273   12442

Next, we will calculate the standard deviation of the elevation (ELEV). If there are missing values in your dataset, add na.rm = TRUE to tell R to remove NAs before the calculation, like this: sd(aviationfacilities$ELEV, na.rm = TRUE).

sd(aviationfacilities$ELEV, na.rm = TRUE)
## [1] 1491.002
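A quick way to see why na.rm = TRUE matters is a toy vector with one missing value. The numbers below are made up; the point is that a single NA makes sd() return NA unless it is told to drop missing values.

```r
# Toy elevations with one missing value (made-up numbers)
elev <- c(10, 20, NA, 40)

sd(elev)                  # NA: one missing value propagates to the result
sd(elev, na.rm = TRUE)    # SD computed from 10, 20 and 40 only
mean(elev, na.rm = TRUE)  # the same argument works for mean(), median(), min(), max()
```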

Subset for Florida

Create a subset of aviation facilities in Florida.

aviationfacilities_fl <- subset(aviationfacilities, STATE_CODE == "FL")
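This lab uses subset() here but bracket indexing later (to select SITE_TYPE_CODE == "A"). The two behave differently when the condition contains NA, which is worth knowing before trusting a row count. A small sketch on a made-up data frame:

```r
# Made-up data frame with one missing state code
fac <- data.frame(STATE_CODE = c("FL", NA, "GA"), ELEV = c(55, 10, 1026))

# subset() treats an NA condition as FALSE and drops that row
with_subset <- subset(fac, STATE_CODE == "FL")

# Logical indexing with `[` keeps an all-NA row where the condition is NA
with_brackets <- fac[fac$STATE_CODE == "FL", ]

nrow(with_subset)    # 1
nrow(with_brackets)  # 2 (the real FL row plus a row of NAs)

# Bracket form that drops NAs explicitly, matching subset()
fac[which(fac$STATE_CODE == "FL"), ]
```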

Descriptive Statistics of Elevation

Question 7: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of the aviation facilities elevation in Florida (2 marks)

summary(aviationfacilities_fl$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   19.45   55.00   69.11   98.00  709.00
median(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 55
mean(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 69.10834
min(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 0
max(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 709
IQR(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 78.55
sd(aviationfacilities_fl$ELEV, na.rm = TRUE)
## [1] 64.22694

Type your response here (include units, round to 1 decimal place): The median is 55.0 ft and the mean is 69.1 ft. The minimum is 0.0 ft and the maximum is 709.0 ft, so the range is 709.0 - 0.0 = 709.0 ft. The interquartile range is Q3 - Q1 = 98.0 - 19.45 = 78.6 ft (78.55 before rounding), and the standard deviation is 64.2 ft.
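Two details in this answer are easy to mix up in R: range() returns two numbers (the minimum and maximum), whereas the statistical range is their difference, and IQR() is Q3 minus Q1 using the same default quantile definition (type 7) as summary(). A toy check with made-up values:

```r
# Made-up values for illustration
x <- c(2, 4, 4, 5, 7, 9)

range(x)                   # 2 9  (min and max, not the range itself)
diff(range(x))             # 7    (range as a single number: max - min)
quantile(x, c(0.25, 0.75)) # Q1 = 4, Q3 = 6.5
IQR(x)                     # 2.5  (Q3 - Q1)
```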

Question 8: According to the boxplot below, would you use the median or the mean as your central tendency measure? Explain and justify your choice. (2 marks)

Type your response here: According to the boxplot below, I would use the median as my central tendency measure because most of the data are clustered together but there are clear high-elevation outliers. The median is the middle value of the ordered data, so it is not affected by extreme values, whereas the mean includes every value and is pulled toward the outliers. This is visible in the Florida results: the mean (69.1 ft) is higher than the median (55.0 ft), which indicates a right-skewed distribution.

boxplot(aviationfacilities_fl$ELEV)
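The argument for the median can be checked on a toy vector. The elevations below are made up: five low values plus one extreme value, similar in spirit to the outliers in the boxplot.

```r
# Made-up elevations: mostly low values plus one high outlier
elev <- c(10, 20, 30, 40, 50, 700)

mean(elev)    # about 141.7, pulled up by the single outlier
median(elev)  # 35, the middle of the ordered values

# Dropping the outlier changes the mean a lot and the median only a little
mean(elev[-6])    # 30
median(elev[-6])  # 30
```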

Part 3B. Mapping Mean and Weighted Mean Centre

Visualize the locations of the Florida airports (A) and the mean and weighted mean centres according to the elevation.

FL_airports <- aviationfacilities_fl[aviationfacilities_fl$SITE_TYPE_CODE == "A", ]
plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude", ylab="Latitude", main = "Airports in Florida", xlim=c(-88, -80), ylim=c(24, 33))
n<-nrow(FL_airports[1])
mc_x<-sum(FL_airports$LONG_DECIMAL)/n
mc_y<-sum(FL_airports$LAT_DECIMAL)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x <- sum(as.numeric(FL_airports$ELEV * FL_airports$LONG_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
wmc_y <- sum(as.numeric(FL_airports$ELEV * FL_airports$LAT_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("bottomleft", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
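The chunk above computes the mean centre as sum(x)/n and the weighted mean centre as sum(w * x)/sum(w), with elevation as the weight w. Base R's mean() and stats::weighted.mean() implement the same formulas. The sketch below uses four made-up airports (not lab data) to check that they agree and to show how one high-elevation point pulls the weighted centre toward itself.

```r
# Made-up coordinates (decimal degrees) and elevations for four airports
long <- c(-82.0, -81.0, -80.0, -85.0)
lat  <- c( 28.0,  27.0,  26.0,  30.5)
elev <- c(  10,    10,    10,   100)

# Mean centre: plain average of each coordinate, as in sum(x)/n
mc_x <- mean(long)
mc_y <- mean(lat)

# Weighted mean centre: sum(w * x) / sum(w), with elevation as the weight
wmc_x <- weighted.mean(long, w = elev)
wmc_y <- weighted.mean(lat,  w = elev)

# Same result as the explicit formula used in the lab chunk
stopifnot(isTRUE(all.equal(wmc_x, sum(elev * long) / sum(elev))))

c(mc_x, mc_y)    # -82.0, 27.875
c(wmc_x, wmc_y)  # about -84.08, 29.69: pulled toward the high point at (-85, 30.5)
```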

Question 9: Produce comments (i.e., a detailed explanation of each parameter in the code) that describe the execution of the statements given to you in the above example. Make sure you provide a description for each set of statements and organize your answer in a manner similar to the following example. (16 marks)

Example: setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1")

The setwd() command tells R which folder to use as the working directory, i.e., where to look when reading data, saving outputs, etc.

Explain what is occurring in each of the following lines of code:

  1. aviationfacilities <- read.csv("aviationfacilities.csv", sep = ',', header = TRUE)
  2. plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude", ylab="Latitude", main = "Airports in Florida", xlim=c(-88, -80), ylim=c(24, 33))
  3. n<-nrow(FL_airports[1])
  4. mc_x<-sum(FL_airports$LONG_DECIMAL)/n
     mc_y<-sum(FL_airports$LAT_DECIMAL)/n
  5. points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
  6. wmc_x <- sum(as.numeric(FL_airports$ELEV * FL_airports$LONG_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
     wmc_y <- sum(as.numeric(FL_airports$ELEV * FL_airports$LAT_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
  7. points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
  8. legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))

Type your response here:

  1. The code tells R to import aviationfacilities.csv with read.csv() and store it as a data frame named aviationfacilities. sep = ',' specifies that the values in the file are separated by commas, and header = TRUE specifies that the first row of the file contains the column names.

  2. The code tells R to create a scatter plot of the spatial distribution of airport locations in Florida. Longitude (FL_airports$LONG_DECIMAL) is plotted on the x-axis and latitude (FL_airports$LAT_DECIMAL) on the y-axis; xlab and ylab label the axes "Longitude" and "Latitude", and main sets the plot title to "Airports in Florida". xlim = c(-88, -80) sets the x-axis range from -88 to -80 and ylim = c(24, 33) sets the y-axis range from 24 to 33, so the plot covers only the extent of Florida.

  3. The code tells R to count the Florida airports. FL_airports[1] selects the first column of the FL_airports data frame (still as a data frame), and nrow() returns its number of rows, which equals the total number of Florida airports. The result is stored as n.

  4. The code tells R to calculate the mean centre of Florida airports. mc_x is the sum of all airport longitudes divided by the number of airports (n), i.e., the mean longitude, and mc_y is the sum of all airport latitudes divided by n, i.e., the mean latitude. Together, (mc_x, mc_y) is the mean centre of Florida airports.

  5. The code tells R to add the mean centre (mc_x, mc_y) to the existing plot. 'p' sets the plot type to points, pch = 15 draws a filled square, cex = 2 draws it at twice the default size, and col = "blue" colours it blue.

  6. The code tells R to calculate the weighted mean centre of Florida airports, using elevation as the weight. For wmc_x, each airport's elevation is multiplied by its longitude, the products are converted to numbers with as.numeric() and summed, and that sum is divided by the total elevation of all Florida airports; wmc_y does the same with latitude. na.rm = TRUE tells sum() to ignore missing values. Together, (wmc_x, wmc_y) is the weighted mean centre.

  7. The code tells R to add the weighted mean centre (wmc_x, wmc_y) to the plot as a filled square (pch = 15) at twice the default size (cex = 2) in red (col = 'red').

  8. The code tells R to add a legend in the top-right corner of the plot. legend = c("Mean centre", "Weighted mean centre") sets the labels, pch = c(15, 15) shows a filled square next to each label, and col = c("blue", "red") colours those squares to match the mean centre and weighted mean centre points.

Question 10: What do the mean and weighted mean center tell you about the distribution of the aviation facilities in Florida and their elevation? Please explain. (3 marks)

Type your response here: In the plot, the black dots are the locations of airports in Florida, the blue square is the mean centre (the average location of all Florida airports), and the red square is the weighted mean centre (the average location after weighting each airport by its elevation). The mean centre lies toward the upper left (north-west) of the mainland, which tells us that airports are not evenly distributed across the state. The weighted mean centre is pulled toward the panhandle, which shows that higher-elevation airports are not evenly distributed either and are more common in the panhandle.

Part 3C. Create and Visualize Subsets

Create Subsets

There are different elevation restrictions for airplanes. Do low- and high-elevation airports have the same mean centre? Imagine we are interested in how the distribution of airports in sm_elev (elevation less than the median) differs from that of airports in lrg_elev (elevation greater than or equal to the median). To do this, we will split the Florida airports dataset into two subsets: one containing only the airports below the median elevation (sm_elev) and the other containing only those at or above it (lrg_elev).
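Splitting at the median with `<` for one subset and `>=` for the other gives two subsets that share no airports and together contain every airport. The sketch below checks this on made-up elevations; the variable names are illustrative only.

```r
# Made-up elevations; the split point is their median
elev <- c(3, 12, 23, 65, 65, 102, 295)
med  <- median(elev)    # 65

sm  <- elev[elev <  med]  # strictly below the median
lrg <- elev[elev >= med]  # at or above the median

length(sm)   # 3
length(lrg)  # 4
length(sm) + length(lrg) == length(elev)  # TRUE: no airport lost or duplicated
```

If both subsets used the same comparison (for example `<` twice), they would contain exactly the same airports and every statistic computed on them would be identical.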

summary(FL_airports$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    3.00   23.00   65.00   73.93  102.00  295.00
sm_elev <- subset(FL_airports, (FL_airports$ELEV <65))
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   4.0.1     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
lrg_elev <- subset(FL_airports, (FL_airports$ELEV >= 65))
view(sm_elev)
view(lrg_elev)

nrow(sm_elev[1])
## [1] 242
nrow(lrg_elev[1])
## [1] 242

Once created, a subset can be printed in the Console by typing its name (e.g., sm_elev) or opened in a spreadsheet-style viewer with View(sm_elev). To find the number of airports in sm_elev, run nrow(sm_elev[1]).
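The single bracket in sm_elev[1] keeps the first column as a data frame, which is why nrow() still works on it; calling nrow() on the whole data frame gives the same count. A small sketch with a made-up data frame:

```r
# Made-up data frame with three airports
demo <- data.frame(ELEV = c(3, 12, 23), LAT_DECIMAL = c(27.1, 28.4, 30.2))

class(demo[1])    # "data.frame": single brackets keep the table structure
class(demo[[1]])  # "numeric": double brackets extract the column as a vector

nrow(demo[1])     # 3
nrow(demo)        # 3, the same count without the extra indexing step
length(demo[[1]]) # 3, the equivalent count for a vector
```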

Visualize Subsets

Question 11: Show the Florida airports scatter plot using the subset data that you created. Use a different color for the subsets. Also plot the mean centres for the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)

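FL_airports <- aviationfacilities_fl[aviationfacilities_fl$SITE_TYPE_CODE == "A",]
plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude (decimal degrees)", ylab="Latitude (decimal degrees)", main = "Airports in Florida (Jasmine Mann 1005255028)", xlim=c(-88, -80), ylim=c(24, 33))
# Colour each elevation subset differently
points(sm_elev$LONG_DECIMAL, sm_elev$LAT_DECIMAL, pch = 20, col = "orange")

points(lrg_elev$LONG_DECIMAL, lrg_elev$LAT_DECIMAL, pch = 20, col = "purple")

# Mean centre of all Florida airports
n<-nrow(FL_airports[1])
mc_x<-sum(FL_airports$LONG_DECIMAL)/n
mc_y<-sum(FL_airports$LAT_DECIMAL)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
# Mean centre of each subset: divide by the subset's own number of rows, not by n
sm_x<-sum(sm_elev$LONG_DECIMAL)/nrow(sm_elev)
sm_y<-sum(sm_elev$LAT_DECIMAL)/nrow(sm_elev)
points(sm_x,sm_y,'p',pch=15,cex=2,col="darkorange3")
lrg_x<-sum(lrg_elev$LONG_DECIMAL)/nrow(lrg_elev)
lrg_y<-sum(lrg_elev$LAT_DECIMAL)/nrow(lrg_elev)
points(lrg_x,lrg_y,'p',pch=15,cex=2,col="darkorchid4")
# One legend entry per symbol/colour actually drawn
legend("bottomleft", legend = c("Small elevation (< 65 ft)", "Large elevation (>= 65 ft)", "Mean centre, all airports", "Mean centre, small elevation", "Mean centre, large elevation"), pch = c(20,20,15,15,15), col = c("orange","purple","blue","darkorange3","darkorchid4"))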
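Because each subset's mean centre must be divided by that subset's own size, it can help to wrap the calculation in a small function. The helper mean_centre() below is hypothetical (not part of the lab), and the two data frames are made-up stand-ins for sm_elev and lrg_elev.

```r
# Hypothetical helper: mean centre of any data frame with LONG_DECIMAL / LAT_DECIMAL
mean_centre <- function(df) {
  # mean() divides by the number of non-missing values in this data frame only
  c(x = mean(df$LONG_DECIMAL, na.rm = TRUE),
    y = mean(df$LAT_DECIMAL,  na.rm = TRUE))
}

# Made-up stand-ins for the two elevation subsets
sm_demo  <- data.frame(LONG_DECIMAL = c(-80.2, -81.4), LAT_DECIMAL = c(25.8, 27.0))
lrg_demo <- data.frame(LONG_DECIMAL = c(-84.3, -85.7), LAT_DECIMAL = c(30.4, 30.8))

# One row per subset, columns x (longitude) and y (latitude)
centres <- rbind(small = mean_centre(sm_demo), large = mean_centre(lrg_demo))
centres  # small: -80.8, 26.4; large: -85.0, 30.6
```

With the real subsets, mean_centre(sm_elev) and mean_centre(lrg_elev) would give the coordinates to pass to points().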

Part 3D. Dispersion of Subsets

Standard Deviation of Subsets

So far, we have measured the central tendency of the spatial data. How about dispersion? Standard deviation is a measure of dispersion that can be used to assess the spread of spatial data. To calculate the orthogonal dispersion (east-west and north-south) of the Florida airports dataset, we apply the sd() function to longitude and latitude, respectively. Please do the same for the two elevation subsets in Question 12.

sd(FL_airports$LONG_DECIMAL)
## [1] 1.785058
sd(FL_airports$LAT_DECIMAL)
## [1] 1.542124
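sd() uses the sample formula, with n - 1 in the denominator, and returns a value in the same unit as its input, here decimal degrees. A short check on made-up longitudes:

```r
# Made-up longitudes (decimal degrees) for five airports
long <- c(-87.2, -84.3, -82.5, -81.4, -80.2)
n <- length(long)

# Sample standard deviation written out by hand
manual_sd <- sqrt(sum((long - mean(long))^2) / (n - 1))

all.equal(sd(long), manual_sd)  # TRUE
```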

Question 12: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for the Florida aviation facilities dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. Florida airports vs. sm_elev vs. lrg_elev). Remember to include units of measurement in your response. (6 marks)

sd(sm_elev$LONG_DECIMAL)
## [1] 1.343617
sd(sm_elev$LAT_DECIMAL)
## [1] 1.533126
sd(lrg_elev$LONG_DECIMAL)
## [1] 1.343617
sd(lrg_elev$LAT_DECIMAL)
## [1] 1.533126

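Type your response here: The orthogonal dispersion of Florida airports is not equal in the two directions. For all Florida airports, the standard deviation is 1.8 decimal degrees of longitude (east-west) and 1.5 decimal degrees of latitude (north-south), so airports are spread out slightly more east-west than north-south. For sm_elev, the standard deviation is 1.3 decimal degrees of longitude and 1.5 decimal degrees of latitude: low-elevation airports are less dispersed east-west than the full set of airports, while their north-south dispersion is about the same. The values shown for lrg_elev are identical to those for sm_elev, which can only happen if both subsets contain the same airports; lrg_elev must contain the airports with ELEV >= 65 for this comparison to be meaningful. The mean centre of Florida airports lies in the upper left of the mainland, while the weighted mean centre lies toward the panhandle.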