Download the assignment07.Rmd file from Canvas and open it in RStudio. Complete this assignment by filling in the answers below in the R Markdown Notebook document.
The University would like to determine whether they should move forward with a campus beautification initiative. You have been hired to determine the average size of trees near the Academic Building as measured by the diameter at breast height (DBH). The study area is shown in the map below. Note that tree locations are shown as green triangles.
Data
The provided CSV data were collected by a previous student using one of the
sampling approaches listed below.
Simple random sampling: each individual has an equal chance of being included in a sample. Random samples are, however, prone to error. For example, quite by chance a random sample might contain a disproportionately large number of individuals with specific characteristics.
Systematic sampling: every nth individual is selected or individuals are sampled every nth minute/hour etc.
Stratified random sampling: the population is divided into strata, and samples are drawn randomly (using one of the above methods) from each stratum. In this assignment, you and your partner must decide whether proportional or disproportional stratified sampling is more appropriate.
Cluster sampling: the population is grouped into clusters within specific areas, and the individuals in each cluster are drawn at random. Both a strength and a weakness of cluster sampling is that individuals within each cluster tend to be homogeneous.
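The four approaches above can be sketched in base R. This is only an illustration under stated assumptions: the `trees` data frame, its `zone` and `plot` columns, and the sample size `n` are hypothetical and are not part of the assignment data.

```r
# Hypothetical population of 100 trees (illustrative only, not the assignment data)
set.seed(42)
trees <- data.frame(
  id   = 1:100,
  zone = rep(c("north", "south"), times = c(70, 30)),  # assumed strata
  plot = rep(1:10, each = 10)                          # assumed clusters
)
n <- 10  # assumed sample size

# Simple random sampling: every tree has the same chance of selection
srs <- trees[sample(nrow(trees), n), ]

# Systematic sampling: random start, then every k-th tree
k <- nrow(trees) %/% n
start <- sample(k, 1)
sys <- trees[seq(start, by = k, length.out = n), ]

# Proportional stratified sampling: each zone contributes in proportion to its size
strat_rows <- unlist(lapply(split(trees$id, trees$zone), function(ids) {
  sample(ids, round(n * length(ids) / nrow(trees)))
}))
strat <- trees[strat_rows, ]

# Cluster sampling: pick 2 plots at random, then draw 5 trees at random in each
chosen <- sample(unique(trees$plot), 2)
clus_rows <- unlist(lapply(chosen, function(p) {
  sample(trees$id[trees$plot == p], 5)
}))
clus <- trees[clus_rows, ]
```

A disproportional stratified design would replace the proportional allocation inside `round(...)` with fixed per-stratum sample sizes.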
For this assignment, each person should submit the following items to Canvas:
1. A completed .Rmd file
2. A PDF (or Word) file knitted from the completed .Rmd file
Please work individually to answer the questions below.
df <- read.csv("C:/Users/lilli/OneDrive/Desktop/Spring25 Classes/GEOG_312/assignment07_data_final.csv")
str(df)
## 'data.frame': 32 obs. of 6 variables:
## $ Number : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Latitude_dd : num 30.6 30.6 30.6 30.6 30.6 ...
## $ Longitude_dd: num -96.3 -96.3 -96.3 -96.3 -96.3 ...
## $ DBH_in : num 0.318 0.637 0.955 1.274 1.592 ...
## $ DBH_cm : num 0.78 1.56 2.34 3.12 3.9 ...
## $ Notes : chr "one big tree" "one single trunk" "one single trunk" "multiple trunks" ...
library(ggplot2)
# df was already loaded with read.csv() above, so it is not re-read here.
# Point size maps to DBH in inches (DBH_in), so the legend is labeled in inches.
ggplot(df, aes(x = Longitude_dd, y = Latitude_dd, size = DBH_in)) +
  geom_point(alpha = 0.7, color = "forestgreen") +
  scale_size_continuous(name = "DBH (in)") +
  coord_cartesian(xlim = c(-96.342505, -96.340389), ylim = c(30.614661, 30.616921)) +
  labs(
    title = "Spatial Distribution of Tree DBH Measurements",
    x = "Longitude (decimal degrees)",
    y = "Latitude (decimal degrees)",
    caption = "DBH data collected on a sunny day (Feb 4, 2:16 PM - 2:36 PM).\nData measured using an infrared thermometer, GPS, and tape measure."
  ) +
  theme_minimal(base_size = 14)
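# First attempt: df has no column named DBH (the columns are DBH_in and DBH_cm),
# so df$DBH is NULL and mean() returns NA with a warning.
mean_dbh <- mean(df$DBH, na.rm = TRUE)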
## Warning in mean.default(df$DBH, na.rm = TRUE): argument is not numeric or
## logical: returning NA
print(mean_dbh)
## [1] NA
str(df)
## 'data.frame': 32 obs. of 6 variables:
## $ Number : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Latitude_dd : num 30.6 30.6 30.6 30.6 30.6 ...
## $ Longitude_dd: num -96.3 -96.3 -96.3 -96.3 -96.3 ...
## $ DBH_in : num 0.318 0.637 0.955 1.274 1.592 ...
## $ DBH_cm : num 0.78 1.56 2.34 3.12 3.9 ...
## $ Notes : chr "one big tree" "one single trunk" "one single trunk" "multiple trunks" ...
df$DBH_in <- as.numeric(df$DBH_in)
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
print(mean_dbh_in)
## [1] 5.254777
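The NA result followed by the working call can be reproduced on a small stand-in data frame. The sketch below assumes a hypothetical `demo` data frame that only mimics the two DBH column names; its values are made up for illustration.

```r
# Hypothetical stand-in with the same DBH column names as the assignment data
demo <- data.frame(DBH_in = c(2, 4, 6), DBH_cm = c(5.08, 10.16, 15.24))

# No column is called "DBH" (and the prefix matches two columns),
# so this lookup yields NULL rather than a numeric vector
missing_col <- is.null(demo$DBH)

# Listing the names first shows which columns are actually available
available <- names(demo)

# Using the full column name gives mean() a numeric vector to work with
mean_in <- mean(demo$DBH_in, na.rm = TRUE)
```

Checking `names(df)` or `str(df)` before computing a summary is a quick way to catch this kind of column-name mismatch.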
For the following questions, assume that your DBH sample is independent and normally distributed.
# Same issue as before: df$DBH does not exist, so the mean, sd, and z-score are all NA.
mean_dbh <- mean(df$DBH, na.rm = TRUE)
## Warning in mean.default(df$DBH, na.rm = TRUE): argument is not numeric or
## logical: returning NA
sd_dbh <- sd(df$DBH, na.rm = TRUE)
z_score <- (50 - mean_dbh) / sd_dbh
print(z_score)
## [1] NA
# Ensure DBH_in is numeric (it already is, per str(df)); writes back to DBH_in itself.
df$DBH_in <- as.numeric(as.character(df$DBH_in))
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
sd_dbh_in <- sd(df$DBH_in, na.rm = TRUE)
z_score <- (50 - mean_dbh_in) / sd_dbh_in
print(z_score)
## [1] 14.97735
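# Ensure DBH_in is numeric (it already is, per str(df)); writes back to DBH_in itself.
df$DBH_in <- as.numeric(as.character(df$DBH_in))
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
sd_dbh_in <- sd(df$DBH_in, na.rm = TRUE)
# Upper-tail probability P(DBH > 50) under the normal assumption.
# With a z-score near 15, pnorm() rounds to 1, so the difference prints as 0.
probability <- 1 - pnorm(50, mean = mean_dbh_in, sd = sd_dbh_in)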
print(probability)
## [1] 0
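A printed probability of exactly 0 is a floating-point effect rather than a true zero: for a z-score this large, `pnorm()` returns a value indistinguishable from 1, and the subtraction loses the tail. A minimal sketch of one way to keep the tail value is to ask `pnorm()` for the upper tail directly with `lower.tail = FALSE`. The mean, standard deviation, and threshold below are illustrative round numbers, not the assignment's values.

```r
# Illustrative values only (assumed for this sketch, not taken from the data)
x_bar     <- 5    # sample mean
s         <- 3    # sample standard deviation
threshold <- 50   # value of interest

# z-score: how many standard deviations the threshold lies above the mean
z <- (threshold - x_bar) / s

# Subtracting from 1 loses the tail: pnorm() is rounded to exactly 1 here
p_subtract <- 1 - pnorm(threshold, mean = x_bar, sd = s)

# Requesting the upper tail directly keeps the tiny but positive probability
p_upper <- pnorm(threshold, mean = x_bar, sd = s, lower.tail = FALSE)

# The same tail probability computed from the z-score on the standard normal
p_from_z <- pnorm(z, lower.tail = FALSE)
```

Either way the practical conclusion is the same: under the normality assumption, a value that many standard deviations above the mean is essentially never expected.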