R Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.
If your R Markdown file is NOT in the same folder as your
data, set your working directory using setwd()
first. Here is an example:
setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1").
You will need to change the path to reflect your personal directory.
Otherwise, you may skip this step and continue to import
the data by reading your aviationfacilities.csv file. You
may view the data by clicking aviationfacilities in the Environment
window or by typing View(aviationfacilities). Click the little
green triangle on the right to run the current chunk.
setwd("//medusa/StudentWork/mannja10")
aviationfacilities <- read.csv("aviationfacilitiesdata.csv", sep = ',', header = TRUE)
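The read.csv() arguments used above can be tried out on a small in-memory table before pointing them at a real file. This is only a sketch: the three rows and the SITE_ID column are made up for illustration, and only the column names STATE_CODE and ELEV are borrowed from the lab dataset.

```r
# Hypothetical three-row CSV held in a string (not the real lab data)
csv_text <- "SITE_ID,STATE_CODE,ELEV
A1,FL,12
A2,FL,55
A3,TX,640"

# text = lets read.csv() parse a string instead of a file on disk
toy <- read.csv(text = csv_text, sep = ",", header = TRUE)

str(toy)     # column types chosen by read.csv()
nrow(toy)    # number of data rows (the header line is not counted)
names(toy)   # column names taken from the first line
```

With header = TRUE the first line becomes the column names, so the toy table has three rows rather than four.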
Now that we have data imported, we are ready to calculate median, mean, range and quantiles of the elevation.
summary(aviationfacilities$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    -223     275     755    1189    1273   12442
Next, we will calculate the standard deviation of the elevation
(ELEV). If there are missing values in your dataset, simply add
na.rm = TRUE to your code to tell R to remove NAs from
the calculation, like this:
sd(aviationfacilities$ELEV, na.rm=TRUE).
sd(aviationfacilities$ELEV, na.rm = TRUE)
## [1] 1491.002
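To see what na.rm = TRUE changes, here is a minimal sketch on a made-up vector (the values are hypothetical, not taken from the lab data). Without the argument a single NA makes the whole result NA.

```r
# Hypothetical elevations with one missing value
elev <- c(10, 20, 30, NA)

sd_with_na <- sd(elev)                   # NA: the missing value propagates
sd_no_na   <- sd(elev, na.rm = TRUE)     # 10: NA dropped before the calculation
mean_no_na <- mean(elev, na.rm = TRUE)   # 20

sd_with_na
sd_no_na
mean_no_na
```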
Create a subset of aviation facilities in Florida.
aviationfacilities_fl <- subset(aviationfacilities, STATE_CODE == "FL")
Question 7: Now it is your turn to write the code to calculate the median, mean, range, interquartile range, and standard deviation of the aviation facilities elevation in Florida (2 marks)
summary(aviationfacilities_fl$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00   19.45   55.00   69.11   98.00  709.00
median(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 55
mean(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 69.10834
min(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 0
max(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 709
IQR(aviationfacilities_fl$ELEV, na.rm=TRUE)
## [1] 78.55
sd(aviationfacilities_fl$ELEV, na.rm = TRUE)
## [1] 64.22694
Type your response here (include units, round to 1 decimal place): The median is 55.0, the mean is 69.1, the minimum is 0 and the maximum is 709, which means the range is 0 to 709, the interquartile range is Q3 98.0 - Q1 19.4, which is 78.5, and the standard deviation is 64.2.
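The range and interquartile range can also be computed directly rather than read off summary(). The sketch below uses a made-up vector of five elevations (hypothetical values, not the Florida data) so the results are easy to check by hand.

```r
# Hypothetical elevations, already sorted for readability
elev <- c(2, 10, 40, 90, 500)

rng    <- range(elev)                          # minimum and maximum: 2 and 500
spread <- diff(rng)                            # width of the range: 498
q      <- quantile(elev, probs = c(0.25, 0.75))  # Q1 = 10, Q3 = 90
iqr    <- IQR(elev)                            # Q3 - Q1 = 80

rng
spread
q
iqr
```

IQR() is the difference between the two quartiles returned by quantile(), so either route gives the same answer.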
Question 8: According to the boxplot below, would you use median or mean as your central tendency measure? Explain and justify your choice? (2 marks)
Type your response here: According to the boxplot below I would use the median as my central tendency measure because most of the data are clustered together but there are clear outliers. The median is the better choice because it depends only on the middle of the ordered data and is not pulled by extreme values, whereas the mean gives every value, including the outliers, equal weight.
boxplot(aviationfacilities_fl$ELEV)
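The effect of outliers on the two measures can be checked on a small made-up example. The values below are hypothetical; boxplot.stats() is the base R helper that boxplot() relies on to decide which points are drawn as outliers.

```r
# Hypothetical elevations: five similar values, then one extreme value added
x_clean   <- c(10, 20, 30, 40, 50)
x_outlier <- c(x_clean, 700)

mean(x_clean);   median(x_clean)      # 30 and 30
mean(x_outlier); median(x_outlier)    # about 141.7 and 35

# Values that a boxplot would draw as individual outlier points
out <- boxplot.stats(x_outlier)$out   # 700
out
```

One extreme value moves the mean by more than 100 units but shifts the median by only 5, which is why the median is the safer summary for skewed data.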
Visualize the locations of the Florida airports (A) and the mean and weighted mean centres according to the elevation.
FL_airports <- aviationfacilities_fl[aviationfacilities_fl$SITE_TYPE_CODE == "A", ]
plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude", ylab="Latitude", main = "Airports in Florida", xlim=c(-88, -80), ylim=c(24, 33))
n<-nrow(FL_airports[1])
mc_x<-sum(FL_airports$LONG_DECIMAL)/n
mc_y<-sum(FL_airports$LAT_DECIMAL)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x <- sum(as.numeric(FL_airports$ELEV * FL_airports$LONG_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
wmc_y <- sum(as.numeric(FL_airports$ELEV * FL_airports$LAT_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("bottomleft", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
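The two centres computed above are an ordinary mean and an elevation-weighted mean of the coordinates. As a hedged sketch, the same calculation on three made-up airports (hypothetical coordinates and elevations) shows that base R's mean() and weighted.mean() give the same results as the sum-based formulas.

```r
# Three hypothetical airports: longitude, latitude, elevation
lon  <- c(-87, -85, -81)
lat  <- c(30, 30, 27)
elev <- c(100, 50, 50)

# Mean centre: plain average of the coordinates
mc <- c(x = mean(lon), y = mean(lat))            # about -84.33, 29

# Weighted mean centre: each coordinate weighted by elevation
wmc <- c(x = weighted.mean(lon, w = elev),
         y = weighted.mean(lat, w = elev))       # -85, 29.25

# Same result with the explicit formula used in the lab code
manual_x <- sum(elev * lon) / sum(elev)

mc
wmc
manual_x
```

The weighted centre is pulled toward the first airport because it carries half of the total elevation.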
Question 9: Produce comments (i.e., a detailed explanation for each parameter in the code) that describes the execution of the statements given to you in the above example. Make sure you provide a description for each set of statements and organize your answer in a manner similar to the following example. (16 marks)
Example:
setwd("//medusa/StudentWork/(Your UTOR ID)/GGR276/Lab1")
The setwd() command tells R the current folder to look
into when searching for data, saving outputs etc.
Explain what is occurring in each of the following lines of code:
aviationfacilities <- read.csv("aviationfacilities.csv", sep = ',', header = TRUE)
plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude", ylab="Latitude", main = "Airports in Florida", xlim=c(-88, -80), ylim=c(24, 33))
n<-nrow(FL_airports[1])
mc_x<-sum(FL_airports$LONG_DECIMAL)/n
mc_y<-sum(FL_airports$LAT_DECIMAL)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
wmc_x <- sum(as.numeric(FL_airports$ELEV * FL_airports$LONG_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
wmc_y <- sum(as.numeric(FL_airports$ELEV * FL_airports$LAT_DECIMAL), na.rm = TRUE) / sum(FL_airports$ELEV, na.rm = TRUE)
points(wmc_x, wmc_y,'p',pch=15,cex=2,col='red')
legend("topright", legend = c("Mean centre", "Weighted mean centre"), pch = c(15,15), col = c("blue","red"))
Type your response here: a) The code tells R to import aviationfacilities.csv into R: commas separate the values (sep = ','), the first row of the file holds the column names (header = TRUE), and the resulting data frame is named aviationfacilities.
b) The code tells R to create a scatter plot of the spatial distribution of airport locations in Florida. Longitude is plotted on the x-axis and latitude on the y-axis, and the axes are labelled “Longitude” and “Latitude” using xlab and ylab. The title of the plot is set to “Airports in Florida” using main. The x-axis range is set from -88 to -80 and the y-axis range from 24 to 33 using xlim and ylim, so that the map covers only the extent of Florida.
c) The code tells R to count the number of rows in the Florida airports data frame, which is the total number of Florida airports, and to store it as n.
d) The code tells R to calculate the mean centre of Florida airports. The sum of the airports' longitudes divided by the number of airports gives the mean centre longitude (mc_x), and the sum of the airports' latitudes divided by the number of airports gives the mean centre latitude (mc_y). Together mc_x and mc_y give the mean centre of Florida airports.
e) The code tells R to add the mean centre (mc_x, mc_y) to the plot as a point ('p'), drawn as a filled square (pch=15) at twice the default size (cex=2) in blue (col="blue").
f) The code tells R to calculate the weighted mean centre longitude and latitude of Florida airports. This is done by multiplying each airport's elevation by its longitude (or latitude), summing the products while ignoring missing values (na.rm = TRUE), and dividing by the total elevation of Florida airports.
g) The code tells R to add the weighted mean centre (wmc_x, wmc_y) to the plot as a point ('p'), drawn as a filled square (pch=15) at twice the default size (cex=2) in red (col='red').
h) The code tells R to create a legend for the map in the top right corner. It labels the points as “Mean centre” and “Weighted mean centre”, with the corresponding blue and red square symbols (pch = c(15,15)) next to the names.
Question 10: What do the mean and weighted mean center tell you about the distribution of the aviation facilities in Florida and their elevation? Please explain. (3 marks)
Type your response here: In the plot, the black points are the locations of airports in Florida, the blue square is the mean centre (the average location of all airports in Florida) and the red square is the weighted mean centre (the average location of airports in Florida after weighting by elevation). Together they tell us that airports are spatially clustered and not evenly distributed. The mean centre shows where most airports are located, which is toward the top left of the mainland, while the weighted mean centre shows that higher-elevation airports are not evenly distributed and are more common in the panhandle.
There are different elevation restrictions for airplanes. Do airports at different elevations have the same mean centre? Imagine we are interested in how the spatial distribution of airports in sm_elev (airport elevation less than the median) differs from that of airports in lrg_elev (airport elevation greater than or equal to the median). To do this, we will split the Florida airports dataset into two subsets: one containing only the airports below the median elevation (sm_elev) and the other containing only those at or above it (lrg_elev).
summary(FL_airports$ELEV)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    3.00   23.00   65.00   73.93  102.00  295.00
sm_elev <- subset(FL_airports, (FL_airports$ELEV <65))
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 4.0.1 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
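# lrg_elev keeps elevations at or above the median (65); a "< 65" filter here only duplicates sm_elev (identical row counts and standard deviations)
lrg_elev <- subset(FL_airports, (FL_airports$ELEV >= 65))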
view(sm_elev)
view(lrg_elev)
nrow(sm_elev[1])
## [1] 242
nrow(lrg_elev[1])
## [1] 242
Once created, a subset can be opened in the data viewer using
View(sm_elev), or printed in the Console by calling the object (sm_elev). If you want to know
the number of airports in sm_elev, you can run: nrow(sm_elev[1]).
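A median split should place every row in exactly one subset. The sketch below checks this on a small made-up data frame; the ID column and all values are hypothetical, and only the ELEV column name follows the lab dataset.

```r
# Hypothetical airports with made-up elevations
toy <- data.frame(ID = 1:6, ELEV = c(5, 12, 40, 65, 90, 200))

cutoff <- median(toy$ELEV)            # (40 + 65) / 2 = 52.5

sm  <- subset(toy, ELEV <  cutoff)    # below the median
lrg <- subset(toy, ELEV >= cutoff)    # at or above the median

# The two subsets together cover every row, with no overlap
nrow(sm) + nrow(lrg) == nrow(toy)
any(sm$ID %in% lrg$ID)
```

Using `<` for one subset and `>=` for the other is what guarantees that no airport is dropped or counted twice.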
Question 11: Show the Florida airports scatter plot using the subset data that you created. Use a different color for the subsets. Also plot the mean centres for the subsets. Be sure to include a legend, axis titles (with units), and a main title. Include your name and student number in brackets at the end of your main title. Do not show irrelevant information on the graph. Please also type your code in the code chunk below. (10 marks)
FL_airports <- aviationfacilities_fl[aviationfacilities_fl$SITE_TYPE_CODE == "A",]
plot(FL_airports$LONG_DECIMAL, FL_airports$LAT_DECIMAL, xlab="Longitude (decimal degrees)", ylab="Latitude (decimal degrees)", main = "Airports in Florida (Jasmine Mann 1005255028)", xlim=c(-88, -80), ylim=c(24, 33))
points(sm_elev$LONG_DECIMAL, sm_elev$LAT_DECIMAL, pch = 20, col = "orange")
points(lrg_elev$LONG_DECIMAL, lrg_elev$LAT_DECIMAL, pch = 20, col = "purple")
n<-nrow(FL_airports[1])
mc_x<-sum(FL_airports$LONG_DECIMAL)/n
mc_y<-sum(FL_airports$LAT_DECIMAL)/n
points(mc_x,mc_y,'p',pch=15,cex=2,col="blue")
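# Mean centre of each subset: divide by that subset's own row count, not by n
sm_mc_x<-sum(sm_elev$LONG_DECIMAL)/nrow(sm_elev)
sm_mc_y<-sum(sm_elev$LAT_DECIMAL)/nrow(sm_elev)
points(sm_mc_x,sm_mc_y,'p',pch=15,cex=2,col="orange")
lrg_mc_x<-sum(lrg_elev$LONG_DECIMAL)/nrow(lrg_elev)
lrg_mc_y<-sum(lrg_elev$LAT_DECIMAL)/nrow(lrg_elev)
points(lrg_mc_x,lrg_mc_y,'p',pch=15,cex=2,col="purple")
legend("bottomleft", legend = c("Mean centre (all airports)", "Small elevation airports", "Large elevation airports", "Small elevation mean centre", "Large elevation mean centre"), pch = c(15,20,20,15,15), col = c("blue","orange","purple","orange","purple"))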
So far, we have measured the central tendency of the spatial data. How about dispersion? Standard deviation is a measure of dispersion that can be used to assess the distribution of spatial data. To calculate the orthogonal dispersion (east-west, north-south) associated with Florida airports dataset, we will use sd() command applied on Longitude and Latitude, respectively. Please do the same for the two elevation subsets in Question 12.
sd(FL_airports$LONG_DECIMAL)
## [1] 1.785058
sd(FL_airports$LAT_DECIMAL)
## [1] 1.542124
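When the same dispersion measure is needed for several groups, the sd() calls can be applied per group in one step. This is a sketch on made-up points: the GROUP column and all coordinates are hypothetical, and aggregate() from base R is only one of several ways to do this.

```r
# Hypothetical points in two groups
pts <- data.frame(
  GROUP        = c("low", "low", "low", "high", "high", "high"),
  LONG_DECIMAL = c(-82, -81, -80, -87, -85, -83),
  LAT_DECIMAL  = c(26, 27, 28, 30, 30, 30)
)

# Standard deviation of longitude (east-west) and latitude (north-south) per group
disp <- aggregate(cbind(LONG_DECIMAL, LAT_DECIMAL) ~ GROUP, data = pts, FUN = sd)
disp
```

In this toy table the "high" group is spread only east-west (latitude standard deviation of 0), while the "low" group is spread equally in both directions.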
Question 12: Please show your code as well as the calculated standard deviation in your R Markdown. Provide a concise conclusion regarding the orthogonal dispersion for the Florida aviation facilities dataset and subsets. These conclusions should include a short description of the dispersion and a comparison (i.e. Florida airports vs. sm_elev vs. lrg_elev). Remember to include units of measurement in your response. (6 marks)
sd(sm_elev$LONG_DECIMAL)
## [1] 1.343617
sd(sm_elev$LAT_DECIMAL)
## [1] 1.533126
sd(lrg_elev$LONG_DECIMAL)
## [1] 1.343617
sd(lrg_elev$LAT_DECIMAL)
## [1] 1.533126
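Type your response here: The orthogonal dispersion for the Florida aviation facilities dataset and subsets is not equally spread out. For all Florida airports the standard deviation is about 1.8 decimal degrees east-west (longitude) and 1.5 decimal degrees north-south (latitude), while for sm_elev it is about 1.3 and 1.5 decimal degrees, so the lower-elevation airports are less dispersed east-west than the full set. The mean centre of the Florida airports is to the top left of the mainland while the weighted mean centre of the Florida airports is toward the panhandle.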