At an experimental station, there are 100 fields sown with wheat. Each field was divided into 16 plots of equal size (1/16th hectare). Out of 100 fields, 10 were selected by SRSWOR. From each selected field, 4 plots were chosen by SRSWOR. The yields in kg/plot are given below.

knitr::include_graphics("optimal.png")

I. Estimate the wheat yield per hectare for the experimental station along with its standard error.

# Data
yield_data <- matrix(c(
  4.32, 4.16, 3.06, 4.00, 4.12, 4.08, 5.16, 4.40, 4.20, 4.38,
  4.84, 4.36, 4.24, 4.84, 4.68, 3.96, 4.24, 4.72, 4.66, 4.36,
  3.96, 3.50, 4.76, 4.32, 3.46, 4.42, 4.96, 4.04, 3.64, 3.00,
  4.04, 5.00, 3.12, 3.72, 4.08, 3.08, 3.84, 3.98, 5.00, 3.92
), nrow = 4, byrow = TRUE)

yield_data
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4.32 4.16 3.06 4.00 4.12 4.08 5.16 4.40 4.20  4.38
[2,] 4.84 4.36 4.24 4.84 4.68 3.96 4.24 4.72 4.66  4.36
[3,] 3.96 3.50 4.76 4.32 3.46 4.42 4.96 4.04 3.64  3.00
[4,] 4.04 5.00 3.12 3.72 4.08 3.08 3.84 3.98 5.00  3.92
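Note that in this matrix each column holds the 4 sampled plots of one selected field (10 columns = 10 fields), so field-level summaries are column summaries.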
# Calculate mean yield per plot within each selected field (columns = fields)
mean_yield_per_plot <- apply(yield_data, 2, mean)
mean_yield_per_plot
[1] 4.290 4.255 3.795 4.220 4.085 3.885 4.550 4.285 4.375 3.915
# Calculate overall mean yield per plot
overall_mean_yield_per_plot <- mean(mean_yield_per_plot)
overall_mean_yield_per_plot
[1] 4.1655
# Convert to yield per hectare
yield_per_hectare <- overall_mean_yield_per_plot * 16
yield_per_hectare
[1] 66.648
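For the standard error, a textbook variance estimator for the mean yield per plot under two-stage SRSWOR with equal-sized units is

$$\hat{V}(\bar{y}) = \left(1-\frac{n}{N}\right)\frac{s_b^2}{n} + \frac{n}{N}\left(1-\frac{m}{M}\right)\frac{s_w^2}{nm},$$

where $s_b^2$ is the variance of the $n$ field means and $s_w^2$ is the pooled within-field variance of the plot yields. Since each plot is 1/16 ha, the per-hectare standard error is 16 times the per-plot standard error.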
# Calculate standard error using the two-stage formula above
N <- 100; n <- 10    # fields: population and sample
M <- 16;  m <- 4     # plots per field: population and sample
s_b2 <- var(mean_yield_per_plot)                  # between-field variance
s_w2 <- sum(sweep(yield_data, 2, mean_yield_per_plot)^2) / (n * (m - 1))  # pooled within-field variance
v_ybar <- (1 - n/N) * s_b2 / n + (n/N) * (1 - m/M) * s_w2 / (n * m)       # variance of mean per plot
standard_error <- 16 * sqrt(v_ybar)               # scale per-plot SE to per-hectare

# Output
cat("Wheat yield per hectare:", yield_per_hectare, "kg/hectare\n")
Wheat yield per hectare: 66.648 kg/hectare
cat("Standard error:", standard_error, "\n")
Standard error: 0.0110199 

II. How can an estimate obtained from an SRS of 40 plots be compared with the estimate obtained in part I?

To compare an estimate obtained from a simple random sample (SRS) of 40 plots with the two-stage estimate from part I, we need to consider what the two designs have in common and where they differ. Here is how:

1. Sample Size:

Both designs use the same total number of plots: the estimate in part I was based on 10 selected fields with 4 plots each, i.e. 40 plots in all, so any difference between the two estimates is due to the design rather than the amount of data.

2. Sampling Method:

The estimate in part I used two-stage sampling: fields were first selected by SRSWOR, and plots were then selected by SRSWOR within the chosen fields. In a single-stage SRS, all 1600 plots (100 fields × 16 plots) form one sampling frame, and each plot has an equal chance of selection regardless of the field it lies in.
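For concreteness, a minimal sketch of such a single-stage draw (labelling the plots 1 to 1600; the labels are hypothetical, not part of the original data):

# Single-stage SRSWOR of 40 out of the 1600 plot labels
srs_plots <- sample(1:1600, size = 40, replace = FALSE)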

Comparing Estimates:

Precision:

For the same total number of plots, the SRS estimate is typically more precise than the two-stage estimate. Plots within the same field tend to be alike (positive intraclass correlation), so 4 plots taken from one field usually carry less information than 4 plots scattered over the whole station, and concentrating the sample in 10 fields inflates the variance. The ratio of the two variances is the design effect; it exceeds 1 whenever the intraclass correlation is positive, and it can be estimated from the part I results (see the sketch after the Statistical Comparison paragraph).

Bias:

Under either design the sample mean is an unbiased estimator of the population mean, provided the selections were carried out as specified. Biased selection of fields or plots in either design would invalidate the comparison.

Statistical Comparison:

The two estimates can be compared through their standard errors (the smaller standard error indicates the more precise estimate), through confidence intervals built from those standard errors, or through a formal hypothesis test for the difference between the estimates. A compact summary of the comparison is the design effect, sketched below.
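As a rough numerical check (not part of the original solution), the design effect can be approximated from v_ybar computed in part I. The per-plot population variance is approximated here by the variance of all 40 sampled plot yields, which is only indicative because those plots were drawn in clusters.

# Rough design-effect check: two-stage variance vs. SRSWOR of 40 plots
s2_plot <- var(as.vector(yield_data))       # crude estimate of the per-plot variance
n_plots <- 40
N_plots <- 100 * 16                         # 1600 plots in the station
v_srs <- (1 - n_plots / N_plots) * s2_plot / n_plots  # SRSWOR variance of the mean per plot
deff <- v_ybar / v_srs                      # >1: SRS more precise; <1: two-stage more precise
cat("Approximate design effect:", round(deff, 2), "\n")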

Interpretation:

When interpreting the comparison, weigh precision against cost: even when the SRS is more precise per plot, the two-stage design is usually cheaper to carry out because the sampled plots are concentrated in a few fields. In summary, comparing estimates from different designs requires attention to sample size, sampling method, precision, bias, and cost, summarised through standard errors, confidence intervals, and the design effect.

III. Obtain the optimum values of n and m under the cost function 100 = 4n + nm. Take a = 0, c1 = 4, c2 = 1.

This optimization has a closed-form solution, so numerical optimizers such as optim() or nlm() are not required; a sketch using the textbook formula is given below.

Note: this question was solved manually; refer to the notes.
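As a cross-check of the manual solution, here is a minimal sketch. It assumes the standard two-stage result that, under the cost C = c1*n + c2*n*m (with overhead a = 0), the variance is minimised at m_opt = sqrt(c1*s_w^2 / (c2*(s_b^2 - s_w^2/M))), after which n follows from the cost constraint; it also plugs in the sample components s_b2 and s_w2 from part I as if they were the population components, which is a simplification.

# Optimum m and n under cost = c1*n + c2*n*m (a = 0), using s_b2 and s_w2 from part I
cost <- 100; c1 <- 4; c2 <- 1; M <- 16
sigma_b2 <- s_b2 - s_w2 / M                      # between-field variance component
m_opt <- if (sigma_b2 > 0) sqrt((c1 / c2) * s_w2 / sigma_b2) else M  # formula needs sigma_b2 > 0
m_opt <- round(m_opt)                            # whole number of plots per field
n_opt <- floor(cost / (c1 + c2 * m_opt))         # fields affordable at that m
cat("Optimum plots per field m:", m_opt, " fields n:", n_opt, "\n")

With the part I estimates this gives m = 6 and n = 10, which satisfies the cost constraint exactly (4·10 + 10·6 = 100) and can be compared with the manual answer in the notes.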

RESAMPLING METHODS

Jackknife and Bootstrap

Exercise

Use the data below for Sets A and B to compute the following.

  1. Jackknife variance (sigma^2)
  2. Jackknife S.E.
  3. 95% confidence interval for the means

Solution

Note: the Jackknife solution was also worked manually; refer to the notes. The computations below can be used as a check.
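For reference, the standard jackknife estimates used in that check are

$$\hat{\sigma}^2_{\text{jack}} = \frac{n-1}{n}\sum_{i=1}^{n}\left(\bar{x}_{(i)}-\bar{x}_{(\cdot)}\right)^2, \qquad \text{S.E.}_{\text{jack}} = \sqrt{\hat{\sigma}^2_{\text{jack}}},$$

where $\bar{x}_{(i)}$ is the sample mean with observation $i$ deleted and $\bar{x}_{(\cdot)}$ is the average of the $n$ leave-one-out means; a 95% confidence interval for the mean is $\bar{x} \pm t_{0.975,\,n-1}\,\text{S.E.}_{\text{jack}}$.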

Set A data

set_A <- c(56.71, 54.23, 48.54, 59.42, 67.12, 49.33)

# Number of bootstrap samples
num_samples <- 1000

# Bootstrap function
bootstrap <- function(data, num_samples) {
  replicate(num_samples, mean(sample(data, replace=TRUE)))
}

# Bootstrap statistics for Set A
bootstrap_means_A <- bootstrap(set_A, num_samples)
var_bootstrap_A <- var(bootstrap_means_A)
se_bootstrap_A <- sqrt(var_bootstrap_A)
confidence_interval_A <- quantile(bootstrap_means_A, c(0.025, 0.975))
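Here sample(data, replace = TRUE) draws a resample of the same size as the original data, var() and sqrt() give the bootstrap variance and standard error of the mean, and the 2.5% and 97.5% quantiles of the bootstrap means form the percentile bootstrap 95% confidence interval. Because resampling is random, the printed values change slightly from run to run unless a seed is fixed first with set.seed().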


# Print results for Set A
cat("Set A:\n")
Set A:
cat("Sample Mean (X̄_A):", mean(set_A), "\n")
Sample Mean (X̄_A): 55.89167 
cat("Bootstrap Variance (σ^2 Bootstrap_A):", var_bootstrap_A, "\n")
Bootstrap Variance (σ^2 Bootstrap_A): 6.622186 
cat("Standard Error (S.E) Bootstrap_A:", se_bootstrap_A, "\n")
Standard Error (S.E) Bootstrap_A: 2.573361 
cat("95% Confidence Interval for X̄_A:", confidence_interval_A, "\n")
95% Confidence Interval for X̄_A: 51.11333 61.08333 

Set B data

set_B <- c(30.25, 33.14, 29.53, 31.70, 37.64, 38.89, 34.67, 41.56, 40.45, 33.67, 35.81, 28.41)

# Reuse num_samples and the bootstrap() function defined for Set A

# Bootstrap statistics for Set B
bootstrap_means_B <- bootstrap(set_B, num_samples)
var_bootstrap_B <- var(bootstrap_means_B)
se_bootstrap_B <- sqrt(var_bootstrap_B)
confidence_interval_B <- quantile(bootstrap_means_B, c(0.025, 0.975))


# Print results for Set B
cat("\nSet B:\n")

Set B:
cat("Sample Mean (X̄_B):", mean(set_B), "\n")
Sample Mean (X̄_B): 34.64333 
cat("Bootstrap Variance (σ^2 Bootstrap_B):", var_bootstrap_B, "\n")
Bootstrap Variance (σ^2 Bootstrap_B): 1.440769 
cat("Standard Error (S.E) Bootstrap_B:", se_bootstrap_B, "\n")
Standard Error (S.E) Bootstrap_B: 1.20032 
cat("95% Confidence Interval for X̄_B:", confidence_interval_B, "\n")
95% Confidence Interval for X̄_B: 32.16713 36.96398 

Jackknife for Set A (compare the results with those found manually)

# Set A data
set_A <- c(56.71, 54.23, 48.54, 59.42, 67.12, 49.33)

# Number of observations
n <- length(set_A)

# Jackknife function
jackknife <- function(data) {
  n <- length(data)
  jk_means <- numeric(n)
  for (i in 1:n) {
    jk_means[i] <- mean(data[-i])
  }
  return(jk_means)
}

# Jackknife statistics for Set A
jackknife_means_A <- jackknife(set_A)
# Jackknife variance of the mean: ((n - 1)/n) times the sum of squared deviations
# of the leave-one-out means about their average (var() alone would underestimate it)
var_jackknife_A <- (n - 1) / n * sum((jackknife_means_A - mean(jackknife_means_A))^2)
se_jackknife_A <- sqrt(var_jackknife_A)
# t-based 95% CI; quantiles of the n leave-one-out means do not form a valid interval
confidence_interval_A <- mean(set_A) + c(-1, 1) * qt(0.975, n - 1) * se_jackknife_A

# Print results for Set A
cat("Set A:\n")
Set A:
cat("Sample Mean (X̄_A):", mean(set_A), "\n")
Sample Mean (X̄_A): 55.89167 
cat("Jackknife Variance (σ^2 Jackknife_A):", var_jackknife_A, "\n")
Jackknife Variance (σ^2 Jackknife_A): 1.912463 
cat("Standard Error (S.E) Jackknife_A:", se_jackknife_A, "\n")
Standard Error (S.E) Jackknife_A: 1.382918 
cat("95% Confidence Interval for X̄_A:", confidence_interval_A, "\n")
95% Confidence Interval for X̄_A: 53.8385 57.34225
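The same estimator applies directly to Set B; a minimal sketch, reusing the jackknife() function above (run it and compare the output with the manual notes):

# Jackknife statistics for Set B, reusing jackknife() from above
n_B <- length(set_B)
jackknife_means_B <- jackknife(set_B)
var_jackknife_B <- (n_B - 1) / n_B * sum((jackknife_means_B - mean(jackknife_means_B))^2)
se_jackknife_B <- sqrt(var_jackknife_B)
confidence_interval_B <- mean(set_B) + c(-1, 1) * qt(0.975, n_B - 1) * se_jackknife_B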