Are there natural ways to divide our plots into three groups?
A kernel density estimate of plot area suggests there may not be a natural division between small, medium, and large plots:
library(ggplot2)
library(readxl)
# Rows 3 to 91 of column 2 of the imported sheet hold the plot areas.
sheet <- read_xlsx("~/Downloads/220114 - Garden Plot Data - Draft.xlsx")
plots <- sheet[3:91, 2]
colnames(plots) <- "area"
plots$area <- as.double(plots$area)
# Kernel density estimate of plot area, with one rug mark per plot.
ggplot(plots, aes(area)) +
  geom_density() +
  geom_rug() +
  labs(title="Density function of plot area", x="Area", y="Density")
Mark and Matthew propose that a small plot is 81 ft^2 or smaller, and a large plot is 130 ft^2 or larger.
One desirable property of a breakpoint between size classes is that there are not many plots with sizes near the breakpoint; otherwise, there may be disagreement about whether a garden plot falls on the “small” or “medium” side.
The horizontal segments in an empirical CDF plot represent intervals where no plots fall:
# Gray lines mark the proposed breakpoints (81, 130); the blue line marks the alternative lower breakpoint (67).
ggplot(plots, aes(area)) +
  stat_ecdf() +
  labs(title="Empirical cumulative distribution function", x="Plot area", y="Fraction of plots ≤ x") +
  geom_vline(xintercept=c(81, 130), alpha=0.3) +
  geom_vline(xintercept=67, color="blue", alpha=0.3) +
  scale_x_continuous(breaks=c(50, 100, 150, 200, 81, 130, 67), minor_breaks=c(25, 75, 125, 175))
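The "few plots near the breakpoint" criterion can also be checked numerically by counting how many plots fall within a small tolerance of each candidate. The sketch below is only an illustration: the `area` vector is made-up data rather than the garden spreadsheet, and the helper `count_near` and the 5 ft^2 tolerance are assumptions introduced here.

```r
# Hypothetical plot areas in ft^2, for illustration only (not the garden data).
area <- c(40, 55, 60, 64, 66, 72, 80, 81, 81, 84, 100, 120, 128, 135, 150, 200)

# Count how many plots lie within `tol` ft^2 of a candidate breakpoint.
# `count_near` is a helper defined here, not part of any package.
count_near <- function(area, breakpoint, tol = 5) {
  sum(abs(area - breakpoint) <= tol)
}

# Candidate breakpoints discussed in the text.
candidates <- c(67, 81, 130)
near <- sapply(candidates, count_near, area = area, tol = 5)
names(near) <- candidates
near
```

With the real data, `plots$area` would take the place of `area`; a lower count suggests fewer plots whose size class could be disputed.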
If we use 81 ft^2 and 130 ft^2 as our breakpoints, we find this many plots in each class:
# cut() closes intervals on the right by default: (0, 81], (81, 130], (130, Inf].
table(cut(plots$area, c(0, 81, 130, Inf), c("S", "M", "L")))
##
##  S  M  L
## 49 19 21
Some fees that work here are:
S: $45 M: $50 L: $65
for a total of $4520.
If we use 67 ft^2 and 130 ft^2 instead, we find this many plots in each class:
# cut() closes intervals on the right by default: (0, 67], (67, 130], (130, Inf].
table(cut(plots$area, c(0, 67, 130, Inf), c("S", "M", "L")))
##
##  S  M  L
## 38 30 21
Some fees that work here are:
S: $45 M: $50 L: $60
for a total of $4470.
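The two totals can be reproduced by multiplying each class count by its fee and summing. This is a minimal sketch: the counts and fees are copied from the text above, and the variable names are chosen here for illustration.

```r
# Class counts copied from the two table() outputs above.
counts_81 <- c(S = 49, M = 19, L = 21)
counts_67 <- c(S = 38, M = 30, L = 21)

# Fee schedules (dollars per plot) quoted in the text.
fees_81 <- c(S = 45, M = 50, L = 65)
fees_67 <- c(S = 45, M = 50, L = 60)

# Total revenue is the sum over classes of (number of plots) x (fee).
total_81 <- sum(counts_81 * fees_81)  # 4520
total_67 <- sum(counts_67 * fees_67)  # 4470
```

Keeping counts and fees as named vectors makes it easy to try other fee schedules against either set of breakpoints.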