Instructions

Download the assignment07.Rmd file from Canvas and open it in RStudio. Complete this assignment by filling in the answers below in the R Markdown Notebook document.

Research Question

The University would like to determine whether they should move forward with a campus beautification initiative. You have been hired to determine the average size of trees near the Academic Building as measured by the diameter at breast height (DBH). The study area is shown in the map below. Note that tree locations are shown as green triangles.

Data
The provided CSV data were collected by a previous student using one of the sampling approaches listed below.

  • Simple random sampling: each individual has an equal chance of being included in a sample. Random samples are, however, prone to error. For example, quite by chance a random sample might contain a disproportionately large number of individuals with specific characteristics.

  • Systematic sampling: every nth individual is selected, or individuals are sampled at every nth minute, hour, etc.

  • Stratified random sampling: the population is divided into strata, and samples are drawn from each stratum using one of the methods above. In this assignment, you and your partner must decide whether proportional or disproportional stratified sampling is more appropriate.

  • Cluster sampling: the population is divided into groups (clusters) of individuals within specific areas, and the individuals in each cluster are drawn at random. The tendency of individuals within a cluster to be homogeneous is both a strength and a weakness of this method.
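
The four approaches listed above can be sketched in base R. This is only an illustration under stated assumptions: the `trees` data frame, its `zone` strata, its `cluster` groups, and the sample sizes are all hypothetical and are not taken from the assignment data.

```r
# Hypothetical population: 100 trees, two made-up strata, ten made-up clusters.
set.seed(42)  # fixed seed so the sketch is reproducible
trees <- data.frame(
  id      = 1:100,
  zone    = rep(c("north", "south"), times = c(70, 30)),
  cluster = rep(1:10, each = 10)
)

# Simple random sampling: every tree has the same chance of selection.
srs <- trees[sample(nrow(trees), size = 10), ]

# Systematic sampling: random starting point, then every k-th tree.
k <- 10
start <- sample(k, size = 1)
systematic <- trees[seq(from = start, to = nrow(trees), by = k), ]

# Proportional stratified sampling: draw 10% of the trees within each zone.
stratified <- do.call(rbind, lapply(split(trees, trees$zone), function(s) {
  s[sample(nrow(s), size = round(0.1 * nrow(s))), ]
}))

# Cluster sampling: pick 2 clusters at random, then draw 5 trees from each.
chosen <- sample(1:10, size = 2)
in_chosen <- trees[trees$cluster %in% chosen, ]
cluster_sample <- do.call(rbind, lapply(split(in_chosen, in_chosen$cluster), function(cl) {
  cl[sample(nrow(cl), size = 5), ]
}))
```

Each approach returns 10 trees here, which makes the designs easy to compare. A disproportional stratified design would replace the fixed 10% with a per-zone sample size.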

Deliverables

For this assignment, each person should submit to Canvas the following items:
1. A completed .Rmd file
2. A PDF (or Word) file knitted from the completed .Rmd file

Questions

Please work individually to answer the questions below.

  1. Read in your DBH sample csv (assignment07_data.csv) and set it to the variable “df”. Print the structure of df.
df <- read.csv("C:/Users/lilli/OneDrive/Desktop/Spring25 Classes/GEOG_312/assignment07_data_final.csv")

str(df)
## 'data.frame':    32 obs. of  6 variables:
##  $ Number      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Latitude_dd : num  30.6 30.6 30.6 30.6 30.6 ...
##  $ Longitude_dd: num  -96.3 -96.3 -96.3 -96.3 -96.3 ...
##  $ DBH_in      : num  0.318 0.637 0.955 1.274 1.592 ...
##  $ DBH_cm      : num  0.78 1.56 2.34 3.12 3.9 ...
##  $ Notes       : chr  "one big tree" "one single trunk" "one single trunk" "multiple trunks" ...
  1. Using ggplot2, plot the spatial locations of the DBH measurements using decimal degrees. Set the plot boundary to the following coordinates: xmin: -96.342505 ymin: 30.614661 xmax: -96.340389 ymax: 30.616921. Scale the size of the symbols representing the trees to the DBH of the tree. Your plot will be graded based on the highest standards of data visualization. Remember to include a title and descriptive caption.
library(ggplot2)

df <- read.csv("C:/Users/lilli/OneDrive/Desktop/Spring25 Classes/GEOG_312/assignment07_data_final.csv")

# Map symbol size to DBH_cm so the data match the "DBH (cm)" legend title.
ggplot(df, aes(x = Longitude_dd, y = Latitude_dd, size = DBH_cm)) +
  geom_point(alpha = 0.7, color = "forestgreen") +
  scale_size_continuous(name = "DBH (cm)") +
  coord_cartesian(xlim = c(-96.342505, -96.340389), ylim = c(30.614661, 30.616921)) +
  labs(
    title = "Spatial Distribution of Tree DBH Measurements",
    x = "Longitude (decimal degrees)",
    y = "Latitude (decimal degrees)",
    caption = "DBH data collected on a sunny day (Feb 4, 2:16 PM - 2:36 PM).\nData measured using an infrared thermometer, GPS, and tape measure."
  ) +
  theme_minimal(base_size = 14)

  1. What is the mean of your DBH sample in cm?
# df has no column named DBH. df$DBH is an ambiguous partial match for both
# DBH_in and DBH_cm, so it returns NULL and mean() gives NA with a warning.
# The column name must be written out in full.
# DBH_in is already numeric (see the str() output), so no as.numeric() is needed.
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
print(mean_dbh_in)
## [1] 5.254777
# Note: this value is in inches. The mean in cm comes from df$DBH_cm.
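
The question asks for the mean in cm, whereas the value printed above is computed from DBH_in (inches). A minimal sketch of the cm calculation follows. It uses a hypothetical `toy` data frame holding only the first five DBH_cm values visible in the str() output, so its result is not the answer for the full 32-tree sample.

```r
# Toy data only: the first five DBH_cm values shown by str(df), not the full sample.
toy <- data.frame(DBH_cm = c(0.78, 1.56, 2.34, 3.12, 3.90))

# Write the column name out in full so that `$` does not fall back on partial matching.
mean_dbh_cm <- mean(toy$DBH_cm, na.rm = TRUE)
print(mean_dbh_cm)
```

With the real data, the same call on df$DBH_cm would give the sample mean in cm.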

For the following questions, assume that your DBH sample is independent and normally distributed.

  1. Using your sample, what is the Z-Score of a tree with a 50 cm DBH?
# Use the full column name: df$DBH is not a column and returns NULL.
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
sd_dbh_in <- sd(df$DBH_in, na.rm = TRUE)
# Note: 50 is in cm while DBH_in is in inches, so this z-score mixes units.
# A unit-consistent version would take the mean and standard deviation from df$DBH_cm.
z_score <- (50 - mean_dbh_in) / sd_dbh_in
print(z_score)
## [1] 14.97735
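
The z-score above compares 50 (a value in cm) against a mean and standard deviation computed from DBH_in (inches). The sketch below shows a unit-consistent calculation. The helper `z_score_of()` and the `toy_cm` values are hypothetical and are not taken from the assignment data.

```r
# Hypothetical helper: z-score of a value x relative to a sample of values.
z_score_of <- function(x, values) {
  (x - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
}

toy_cm <- c(10, 20, 30, 40, 50)  # made-up DBH values in cm

# Value and sample both in cm.
z_cm <- z_score_of(50, toy_cm)

# Value and sample both converted to inches (1 in = 2.54 cm).
z_in <- z_score_of(50 / 2.54, toy_cm / 2.54)

# The z-score does not depend on the unit as long as both sides use the same one.
print(all.equal(z_cm, z_in))
```

With the real data, either z_score_of(50, df$DBH_cm) or a 50 cm threshold converted to inches and compared against df$DBH_in keeps the units aligned.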
  1. Using your sample, calculate the likelihood that another tree measured in the study area will have a DBH greater than 50 cm.
mean_dbh_in <- mean(df$DBH_in, na.rm = TRUE)
sd_dbh_in <- sd(df$DBH_in, na.rm = TRUE)
# Note: 50 is in cm while DBH_in is in inches, so this comparison mixes units.
# This far into the upper tail, 1 - pnorm() rounds to exactly 0 in floating point.
probability <- 1 - pnorm(50, mean = mean_dbh_in, sd = sd_dbh_in)
print(probability)
## [1] 0
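
The printed probability is exactly 0 because 1 - pnorm(...) loses precision far into the upper tail: pnorm() returns a value that rounds to 1, and the subtraction then yields 0. A minimal sketch of the difference follows, using a z-score of 15 as an illustrative stand-in close to the value computed earlier.

```r
z <- 15  # illustrative z-score, close to the one computed above

# Subtracting from 1 rounds to exactly 0 in double precision.
naive <- 1 - pnorm(z)

# Asking pnorm() for the upper tail directly keeps the very small probability.
direct <- pnorm(z, lower.tail = FALSE)

print(naive)
print(direct)
```

The same lower.tail = FALSE argument works with the mean and sd arguments, so the probability of exceeding a threshold can be requested without the subtraction.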