At an experimental station, 100 fields are sown with wheat. Each field is divided into 16 plots of equal size (1/16 hectare). Ten of the 100 fields were selected by simple random sampling without replacement (SRSWOR), and from each selected field 4 plots were chosen, again by SRSWOR. The plot yields (kg/plot) are given below; each column of the matrix holds the 4 plots from one selected field.
# Data: 4 sampled plots (rows) by 10 sampled fields (columns)
yield_data <- matrix(c(
4.32, 4.16, 3.06, 4.00, 4.12, 4.08, 5.16, 4.40, 4.20, 4.38,
4.84, 4.36, 4.24, 4.84, 4.68, 3.96, 4.24, 4.72, 4.66, 4.36,
3.96, 3.50, 4.76, 4.32, 3.46, 4.42, 4.96, 4.04, 3.64, 3.00,
4.04, 5.00, 3.12, 3.72, 4.08, 3.08, 3.84, 3.98, 5.00, 3.92
), nrow = 4, byrow = TRUE)
yield_data
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4.32 4.16 3.06 4.00 4.12 4.08 5.16 4.40 4.20 4.38
[2,] 4.84 4.36 4.24 4.84 4.68 3.96 4.24 4.72 4.66 4.36
[3,] 3.96 3.50 4.76 4.32 3.46 4.42 4.96 4.04 3.64 3.00
[4,] 4.04 5.00 3.12 3.72 4.08 3.08 3.84 3.98 5.00 3.92
# Calculate mean yield per plot within each selected field (column means)
mean_yield_per_plot <- apply(yield_data, 2, mean)
mean_yield_per_plot
[1] 4.290 4.255 3.795 4.220 4.085 3.885 4.550 4.285 4.375 3.915
# Calculate overall mean yield per plot
overall_mean_yield_per_plot <- mean(mean_yield_per_plot)
overall_mean_yield_per_plot
[1] 4.1655
# Convert to yield per hectare (16 plots of 1/16 ha each make 1 ha per field)
yield_per_hectare <- overall_mean_yield_per_plot * 16
yield_per_hectare
[1] 66.648
# Standard error of the mean per plot under two-stage SRSWOR:
# v(ybar) = (1 - f1) * s1^2 / n  +  f1 * (1 - f2) * s2^2 / (n * m)
N <- 100; n <- 10   # fields in the population and in the sample
M <- 16;  m <- 4    # plots per field in the population and in the sample
f1 <- n / N; f2 <- m / M
# Between-field variance (variance among the 10 field means)
s1_sq <- var(mean_yield_per_plot)
# Pooled within-field variance (deviations from each field's own mean)
s2_sq <- sum(sweep(yield_data, 2, mean_yield_per_plot)^2) / (n * (m - 1))
standard_error <- sqrt((1 - f1) * s1_sq / n + f1 * (1 - f2) * s2_sq / (n * m))
# Output
cat("Wheat yield per hectare:", yield_per_hectare, "kg/hectare\n")
Wheat yield per hectare: 66.648 kg/hectare
cat("Standard error:", standard_error, "kg/plot;", 16 * standard_error, "kg/hectare\n")
Standard error: 0.0761492 kg/plot; 1.218388 kg/hectare
To compare the estimate obtained above with one based on a simple random sample (SRS) of 40 plots, we need to account for the difference in sampling designs, since the number of plots is the same. Consider the following points.
The estimate above is based on 10 fields selected by SRSWOR with 4 plots per field, i.e. 40 plots in total.
Those 40 plots come from a two-stage design: fields were selected first, then plots within the selected fields, both by SRSWOR. An SRS of 40 plots is instead a single-stage design.
Under an SRS, each of the 100 × 16 = 1600 plots in the population has the same chance of selection, regardless of which field it lies in.
Which design is more precise depends on how yield varies within and between fields. If plots within a field tend to be alike (positive intraclass correlation), the 40 clustered plots carry less information than 40 plots scattered over all fields, and the SRS estimate has the smaller variance. If most of the variability lies within fields, as here where the field means differ comparatively little, the two-stage estimate can match or even beat the SRS.
If both designs are carried out correctly, both estimators are unbiased for the population mean yield; bias enters only through faulty selection of fields or plots.
To compare the estimates statistically, compute a confidence interval for each (a range likely to contain the true mean), test whether the two estimates differ significantly, and compare the standard errors directly: the smaller standard error indicates the more precise estimate.
When interpreting any difference, keep in mind the context of the study and the strengths and limitations of each design. In short, comparing estimates from different sampling schemes requires attention to sample size, design, precision, bias, and measures such as standard errors and confidence intervals. A rough numerical comparison for these data is sketched below.
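A minimal sketch of that comparison, assuming the variance of the 40 sampled plot yields can stand in for the per-plot population variance (only an approximation, since these 40 plots were not themselves drawn by SRS):
# Approximate SRS comparison: treat the 40 plots as if they were an SRS
# of the 1600 plots and estimate the per-plot variance from them
n_plots <- 40
N_plots <- 100 * 16                     # plots in the population
s_sq    <- var(as.vector(yield_data))   # per-plot sample variance (approximation)
se_srs  <- sqrt((1 - n_plots / N_plots) * s_sq / n_plots)
# 95% confidence intervals (normal approximation) under each design
ci_srs      <- overall_mean_yield_per_plot + c(-1.96, 1.96) * se_srs
ci_twostage <- overall_mean_yield_per_plot + c(-1.96, 1.96) * standard_error
cat("SRS SE:", se_srs, " two-stage SE:", standard_error, "\n")
For these data this gives an SRS standard error of about 0.087 kg/plot against about 0.076 kg/plot for the two-stage design, so under this approximation the two-stage design is, if anything, slightly more precise, consistent with the field means varying less than the plots within fields.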
In R, such an optimization can be set up with functions like optim() or nlm(), minimising the cost function subject to the variance constraint (or the variance subject to a cost constraint). This question was solved manually; refer to the notes:
knitr::include_graphics("optimal.png")
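For completeness, a minimal sketch of the optim() approach with placeholder numbers only (the unit costs c1 and c2 and the budget below are assumptions, not the values from the notes): minimise the two-stage variance over n fields and m plots per field subject to a linear cost C = c1*n + c2*n*m.
# Illustrative sketch only -- c1, c2 and budget are made-up placeholders
s1_sq <- 0.0575; s2_sq <- 0.3317   # between- and within-field variances from above
M <- 16                            # plots available per field
c1 <- 10; c2 <- 1; budget <- 200   # assumed per-field cost, per-plot cost, budget
# Two-stage variance of the mean per plot (first-stage fpc ignored)
v_twostage <- function(n, m) (s1_sq - s2_sq / M) / n + s2_sq / (n * m)
# Minimise variance subject to cost <= budget via a quadratic penalty
objective <- function(par) {
  n <- par[1]; m <- par[2]
  cost <- c1 * n + c2 * n * m
  v_twostage(n, m) + 1e6 * max(0, cost - budget)^2
}
fit <- optim(c(5, 4), objective, method = "L-BFGS-B",
             lower = c(1, 1), upper = c(100, M))
round(fit$par)   # candidate (n, m); round and re-check the cost by hand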
Use the data below for sets A and B to compute the following.
Note: the solution for the jackknife was done manually; refer to the notes.
set_A <- c(56.71, 54.23, 48.54, 59.42, 67.12, 49.33)
# Number of bootstrap samples
num_samples <- 1000
# Bootstrap function: draw num_samples resamples with replacement and
# record the mean of each (results vary from run to run unless a seed
# is fixed beforehand with set.seed())
bootstrap <- function(data, num_samples) {
  replicate(num_samples, mean(sample(data, replace = TRUE)))
}
# Bootstrap statistics for Set A
bootstrap_means_A <- bootstrap(set_A, num_samples)
var_bootstrap_A <- var(bootstrap_means_A)
se_bootstrap_A <- sqrt(var_bootstrap_A)
confidence_interval_A <- quantile(bootstrap_means_A, c(0.025, 0.975))
# Print results for Set A
cat("Set A:\n")
Set A:
cat("Sample Mean (X̄_A):", mean(set_A), "\n")
Sample Mean (X̄_A): 55.89167
cat("Bootstrap Variance (σ^2 Bootstrap_A):", var_bootstrap_A, "\n")
Bootstrap Variance (σ^2 Bootstrap_A): 6.622186
cat("Standard Error (S.E) Bootstrap_A:", se_bootstrap_A, "\n")
Standard Error (S.E) Bootstrap_A: 2.573361
cat("95% Confidence Interval for X̄_A:", confidence_interval_A, "\n")
95% Confidence Interval for X̄_A: 51.11333 61.08333
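For comparison with the percentile interval above, a normal-approximation bootstrap interval can be formed directly from the bootstrap standard error (a standard alternative):
# Normal-approximation bootstrap CI: sample mean +/- 1.96 bootstrap SEs
mean(set_A) + c(-1.96, 1.96) * se_bootstrap_A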
set_B <- c(30.25, 33.14, 29.53, 31.70, 37.64, 38.89, 34.67, 41.56, 40.45, 33.67, 35.81, 28.41)
# Reuse num_samples and the bootstrap() function defined above for Set A
# Bootstrap statistics for Set B
bootstrap_means_B <- bootstrap(set_B, num_samples)
var_bootstrap_B <- var(bootstrap_means_B)
se_bootstrap_B <- sqrt(var_bootstrap_B)
confidence_interval_B <- quantile(bootstrap_means_B, c(0.025, 0.975))
# Print results for Set B
cat("\nSet B:\n")
Set B:
cat("Sample Mean (X̄_B):", mean(set_B), "\n")
Sample Mean (X̄_B): 34.64333
cat("Bootstrap Variance (σ^2 Bootstrap_B):", var_bootstrap_B, "\n")
Bootstrap Variance (σ^2 Bootstrap_B): 1.440769
cat("Standard Error (S.E) Bootstrap_B:", se_bootstrap_B, "\n")
Standard Error (S.E) Bootstrap_B: 1.20032
cat("95% Confidence Interval for X̄_B:", confidence_interval_B, "\n")
95% Confidence Interval for X̄_B: 32.16713 36.96398
# Set A data
set_A <- c(56.71, 54.23, 48.54, 59.42, 67.12, 49.33)
# Number of observations
n <- length(set_A)
# Jackknife function: compute the n leave-one-out means
jackknife <- function(data) {
  n <- length(data)
  jk_means <- numeric(n)
  for (i in 1:n) {
    jk_means[i] <- mean(data[-i])  # mean with observation i left out
  }
  return(jk_means)
}
# Jackknife statistics for Set A
jackknife_means_A <- jackknife(set_A)
# Jackknife variance of the mean: ((n - 1) / n) * sum of squared deviations
# of the leave-one-out means from their average
var_jackknife_A <- ((n - 1) / n) * sum((jackknife_means_A - mean(jackknife_means_A))^2)
se_jackknife_A <- sqrt(var_jackknife_A)
# t-based 95% CI (quantiles of only n = 6 leave-one-out means do not give a valid CI)
confidence_interval_A <- mean(set_A) + c(-1, 1) * qt(0.975, n - 1) * se_jackknife_A
# Print results for Set A
cat("Set A:\n")
Set A:
cat("Sample Mean (X̄_A):", mean(set_A), "\n")
Sample Mean (X̄_A): 55.89167
cat("Jackknife Variance (σ^2 Jackknife_A):", var_jackknife_A, "\n")
Jackknife Variance (σ^2 Jackknife_A): 7.968596
cat("Standard Error (S.E) Jackknife_A:", se_jackknife_A, "\n")
Standard Error (S.E) Jackknife_A: 2.82287
cat("95% Confidence Interval for X̄_A:", confidence_interval_A, "\n")
95% Confidence Interval for X̄_A: 48.63525 63.14809
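As a quick sanity check (a known identity): for the sample mean, the jackknife variance estimate reduces exactly to s²/n, which all.equal() confirms here:
# For the mean, the jackknife variance equals var(x) / n exactly
all.equal(var_jackknife_A, var(set_A) / length(set_A))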