STAT 403 A1

Author

HASAN MERCHANT

Assignment 1

Question 1


1. Grip strength is a key indicator of overall muscle health, hand function, and recovery
progress in rehabilitation settings. In kinesiology research, it is often measured using a
hand dynamometer in kilograms (kg) to assess athletes, patients recovering from injury,
or older adults in strength training programs. Variability arises due to factors like age,
training history, hand dominance, time of day, and fatigue.

Consider a survey to measure grip strength in a very large population of university
varsity athletes at a Canadian institution (e.g., rowers, swimmers, track athletes training,
basketball players, … on campus). If all athletes had the same grip strength, measuring
one would suffice. However, grip strength varies widely, so the average grip strength
across ALL varsity athletes provides a useful benchmark for program evaluation and
injury risk assessment.
A simple random sample of 10 athletes was selected, and their maximum grip strength
(averaged over both hands) was measured in kg. The data are below:
88 92 99 105 94 95 98 100 97 91

a. Compute the sample mean and sample standard deviation for these data.

A sample is the subset of the population of interest that is actually studied. Here the sample consists of 10 observations with a mean of 95.9 kg.

Sampled_Observations <- c(88,92,99,105,94,95,98,100,97,91)
mean(Sampled_Observations)
[1] 95.9

Calculations: sum of observations = 959; n = 10; mean = sum/n = 959/10 = 95.90 kg.

The sample standard deviation quantifies the dispersion of the sampled data about the sample mean. It divides by n - 1 rather than n because the mean has itself been estimated from the data. The sample standard deviation is approximately 4.954 kg.

sd(Sampled_Observations)
[1] 4.954235
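
As a check on `sd()`, the same value can be rebuilt from the definition, the square root of the sum of squared deviations divided by n - 1. This is a minimal sketch; the names `n_obs`, `x_bar`, `squared_devs` and `s_manual` are illustrative and not part of the original work.

```r
Sampled_Observations <- c(88, 92, 99, 105, 94, 95, 98, 100, 97, 91)

n_obs <- length(Sampled_Observations)            # number of athletes sampled
x_bar <- sum(Sampled_Observations) / n_obs       # sample mean
squared_devs <- (Sampled_Observations - x_bar)^2 # squared deviations from the mean

# Divide by n - 1 (not n) because the mean was estimated from the same data
s_manual <- sqrt(sum(squared_devs) / (n_obs - 1))
s_manual

# Should agree with the built-in function
all.equal(s_manual, sd(Sampled_Observations))
```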

b. Compute the standard error of the mean. Interpret this value in terms of repeated sampling.

If we repeatedly drew simple random samples of size 10 from the population of varsity athletes and computed the sample mean for each sample, the standard deviation of those sample means would be approximately 1.5667 kg. This quantifies the sampling variability: how much the sample mean would typically vary from one random sample to another.

Standard_Error <- sd(Sampled_Observations) / sqrt(length(Sampled_Observations))
Standard_Error
[1] 1.566667

c. Suppose that grip strength in varsity athletes can be well approximated by a normal distribution with a mean of 96 kg and a standard deviation of 4.2 kg. Draw a sample of size n = 10,000 from this distribution and plot the data on a histogram.

Population mean (μ): 96.0 kg; population standard deviation (σ): 4.2 kg.

Sample of 10,000 observations: sample mean = 96.04 kg; sample SD = 4.19 kg.

set.seed(123)

x <- rnorm(10000, mean = 96, sd = 4.2)

hist(x, main = "Simulated Grip Strength (n = 10,000)",
     xlab = "Grip strength (kg)")
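
The sample mean and SD quoted for this part can be computed directly from the simulated vector, and the population density can be drawn over the histogram as a visual check. This is a sketch only; the `breaks = 40` setting and the density overlay are additions for illustration, not part of the original answer.

```r
set.seed(123)
x <- rnorm(10000, mean = 96, sd = 4.2)

# Numerical summaries of the simulated individuals
mean(x)
sd(x)

# Density-scaled histogram so the N(96, 4.2) curve is on the same scale
hist(x, freq = FALSE, breaks = 40,
     main = "Simulated Grip Strength (n = 10,000)",
     xlab = "Grip strength (kg)")
curve(dnorm(x, mean = 96, sd = 4.2), add = TRUE, lwd = 2)
```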

d. Suppose that grip strength in varsity athletes follows a normal distribution with a mean of 96 kg and a standard deviation of 4.2 kg. Draw 10,000 samples of size n = 10 from this population and compute the sample mean for each. Construct a histogram of the 10,000 sample means. Compare this histogram to the one in part c. with respect to their centres, spread, and shape.

set.seed(123)
nsim <- 10000
n <- 10
pop_mu <- 96
pop_sd <- 4.2

sample_means <- numeric(nsim)
for (i in 1:nsim) {
  samp <- rnorm(n, mean = pop_mu, sd = pop_sd)
  sample_means[i] <- mean(samp)
}

# Histogram of sample means
hist(sample_means, breaks = 40, main = "Histogram: Sample means (n=10) 10,000 samples",
     xlab = "Sample mean grip strength (kg)")

# Summaries
mean(sample_means)
[1] 96.0041
sd(sample_means)  # empirical SD of sample means
[1] 1.31842

Comparison of Part (c) vs Part (d) Histograms:

  1. CENTER: Part (c), individual observations: mean ≈ 96.04 kg. Part (d), sample means: mean ≈ 96.00 kg. Both are centered on the population mean of 96 kg.

  2. SPREAD: Part (c), individual observations: SD ≈ 4.19 kg. Part (d), sample means: SD ≈ 1.32 kg. Part (d) has much less spread: the SD of the sample means is smaller than the SD of individual observations by a factor of roughly √10 ≈ 3.16. Sample means cluster more tightly around the population mean than individuals do.

  3. SHAPE: Part (c): approximately normal (bell-shaped), as expected since we sampled from a normal distribution. Part (d): also bell-shaped, but narrower. Because the population itself is normal, the sampling distribution of the mean is exactly normal for any sample size. The Central Limit Theorem would give approximate normality even for a non-normal population, provided n is large enough.
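
The comparison is easier to see when both histograms share one x-axis. The sketch below redraws the two simulations in a single script and stacks the plots. The names `individuals` and `x_range` are illustrative. Because the random draws are consumed in a different order than in the separate scripts above, the exact numbers will differ slightly from those reported.

```r
set.seed(123)
pop_mu <- 96
pop_sd <- 4.2

# Part (c): 10,000 individual observations
individuals <- rnorm(10000, mean = pop_mu, sd = pop_sd)

# Part (d): 10,000 means of samples of size 10
sample_means <- replicate(10000, mean(rnorm(10, mean = pop_mu, sd = pop_sd)))

# Common axis limits so the difference in spread is visible
x_range <- range(individuals, sample_means)

old_par <- par(mfrow = c(2, 1))
hist(individuals, breaks = 40, xlim = x_range,
     main = "Individual observations", xlab = "Grip strength (kg)")
hist(sample_means, breaks = 40, xlim = x_range,
     main = "Sample means (n = 10)", xlab = "Sample mean grip strength (kg)")
par(old_par)

# Ratio of spreads; theory suggests a value near sqrt(10)
sd(individuals) / sd(sample_means)
```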

e. Using your samples in part d., compute the mean of your sample means and the standard deviation of the sample means. How do these compare to the theoretical values?

Computed from simulation (part d output): mean of sample means = 96.0041 kg; SD of sample means = 1.3184 kg.

Theoretical values: expected value of x̄: μ = 96.0 kg; standard error: σ/√n = 4.2/√10 = 1.3282 kg.

Comparison: mean difference = 0.0041 kg (very close); SD difference = 0.0097 kg (very close).

The simulated values match the theoretical values very well, which confirms that the sampling distribution behaves as expected.
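
One way to judge whether these differences are "very close" is to compare them with the Monte Carlo error expected from 10,000 replications. This sketch assumes the usual large-sample approximation for the standard error of a sample SD under normality, σ/√(2(m − 1)) with m replications. The names `mc_se_of_mean` and `mc_se_of_sd` are illustrative.

```r
pop_sd <- 4.2
n <- 10        # size of each sample
nsim <- 10000  # number of simulated samples

# Theoretical standard error of the sample mean
theoretical_se <- pop_sd / sqrt(n)

# Expected simulation noise in the mean of the 10,000 sample means
mc_se_of_mean <- theoretical_se / sqrt(nsim)

# Approximate simulation noise in the SD of the 10,000 sample means
# (large-sample normal approximation; an assumption of this sketch)
mc_se_of_sd <- theoretical_se / sqrt(2 * (nsim - 1))

round(c(theoretical_se = theoretical_se,
        mc_se_of_mean = mc_se_of_mean,
        mc_se_of_sd = mc_se_of_sd), 4)
```

Differences of a few thousandths of a kg in the mean and about one hundredth of a kg in the SD are of the same order as these Monte Carlo errors, so they are consistent with simulation noise alone.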

f. For each of your 10,000 samples of size n = 10 in part d. from this population, compute a 95% confidence interval for the mean grip strength of varsity athletes. Find the percentage of your 95% confidence intervals that contain the true population mean value of 96 kg.

set.seed(123)
nsim <- 10000
n <- 10
pop_mu <- 96
pop_sd <- 4.2
tcrit <- qt(0.975, df = n-1)   # should be ~2.262157

contains <- logical(nsim)

for (i in 1:nsim) {
  samp <- rnorm(n, mean = pop_mu, sd = pop_sd)
  xbar <- mean(samp)
  s <- sd(samp)               # sample sd (n-1)
  margin <- tcrit * s / sqrt(n)
  lower <- xbar - margin
  upper <- xbar + margin
  contains[i] <- (lower <= pop_mu && pop_mu <= upper)
}

mean(contains)   # proportion of intervals that contain 96
[1] 0.9515

We would expect approximately 95% of confidence intervals to contain the true population mean. Our simulation yielded 95.15%, which is very close to the expected 95%. This demonstrates that 95% t-based confidence intervals have the correct coverage probability in repeated sampling when the population is normal.
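
How close is "very close"? Each interval either covers 96 or not, so the observed coverage is a binomial proportion, and its Monte Carlo standard error under the nominal 95% level can be computed directly. This is a sketch; `observed` is copied from the simulation output above and the other names are illustrative.

```r
nsim <- 10000
nominal <- 0.95
observed <- 0.9515  # proportion reported by the simulation above

# Standard error of a proportion based on nsim independent intervals
mc_se <- sqrt(nominal * (1 - nominal) / nsim)

# Distance between observed and nominal coverage, in Monte Carlo SE units
z <- (observed - nominal) / mc_se

round(c(mc_se = mc_se, z = z), 4)
```

The observed coverage sits well within one Monte Carlo standard error of 0.95, so the gap is consistent with simulation noise.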

Question 2

2. On October 6, 2025, I received the email below. This email was sent to SFU faculty, each of whom was given a link to fill out the survey. Dear SFU Faculty: As part of our ongoing efforts to enhance the university’s administrative services, we are pleased to announce the upcoming launch of the Service Effectiveness Survey (SES) on October 20, 2025. The SES occurs once a year in the Fall and is run in two parts, with each part examining a different set of services. Conducted by Nous Data Insights—a specialist in university administration—the survey aims to understand your satisfaction with our services and to assess their various aspects, including importance and performance. Your participation is crucial in helping our teams improve your experience at the university and continue to meet your needs effectively. The survey should take approximately 15 minutes to complete and we encourage you to share your feedback with us by November 14, 2025, at 11:59 p.m. All responses will remain confidential. We appreciate your support in making SFU a better place for everyone.

a. What type of survey is this?

Since respondents self-select whether to participate, this is a voluntary response survey.

b. Why would SFU conduct this type of survey (not the topic of the survey, but the type of survey)?

  • Because it is efficient, low-cost, quick, and easy to deploy university-wide. It allows centralized administration, rapid collection of structured feedback, and uses standard metrics (importance/performance) that are easy to compare across service units and across years.

c. Would you trust the results of such a survey? Explain why or why not in the context of this survey.

I would not fully trust the results of such a survey. Because this is a voluntary response survey, the results are subject to voluntary response bias. Faculty members who feel especially strongly about administrative services, either very dissatisfied or very satisfied, are more likely to complete the survey, while those with neutral opinions may be less likely to respond. As a result, the responses may not accurately represent the views of the entire faculty population.

Additionally, there is no guarantee of a high or uniform response rate across departments, ranks, or contract types. If certain groups of faculty respond at much higher rates than others, the results may be further biased.

While the survey may still provide useful qualitative feedback and highlight areas of concern, it should not be treated as an unbiased estimate of overall faculty satisfaction.

Question 3

3. Suppose we wish to estimate the average number of people/households for homes on
Burnaby Mountain. Also, suppose we do not have a list of all households from which we
can sample. Propose a sampling method to achieve this goal. This is conceptually more challenging than it appears (e.g., what is a household?). In presenting your solution, consider the following:
• What practical problems arise in establishing a frame from which to
sample? How will you do this?

  • No ready list of households: municipal lists may exist but could be incomplete (multi-unit buildings, student housing, dormitories, short-term rentals).

  • Defining “household”: does a student suite count as one household or multiple? Dorms and long-term care? Need a clear operational definition (people sharing meals and living space = one household).


• What is the sampling unit?

  • Sampling unit: a household (a dwelling unit).


• What is the observational unit?

  • Observational unit: the individuals living in that household (but the variable of interest is number of people per household).


    • How is the sample selection carried out?

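    • Stratified sampling with simple random sampling within strata. Stratifying by housing type should reduce the variance of the estimate, since household size is likely to differ between dwelling types, and it is easy to implement.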

      Steps: Stratify Burnaby Mountain by housing type (simple categories): Stratum 1: single-family houses; Stratum 2: low-rise/townhouses; Stratum 3: multi-unit apartment/condo buildings.

      Construct a quick frame for each stratum. Use municipal parcel/building records and Google Maps to list dwelling units for each stratum. If unit counts for buildings are not listed, do a short listing in those buildings chosen for sampling.

      Within each stratum, draw a simple random sample (SRS) of households. Choose sample sizes in strata proportional to stratum size (or use equal allocation if you want roughly equal precision across strata). Example: if the total desired sample size is $n = 300$, allocate $n_h = (N_h / N) \times 300$ households to stratum $h$, then draw an SRS of $n_h$ households from the frame in that stratum.

      Collect data: call/visit sampled households and record $y_i$ = number of people living there. Use multiple contact attempts to reduce nonresponse.
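
The allocation and selection steps above can be sketched in R. The stratum sizes in `N_h` are hypothetical placeholders chosen only for illustration, not real counts for Burnaby Mountain, and households in each frame are assumed to be labelled 1 to N_h.

```r
# Hypothetical stratum sizes (number of dwelling units in each frame)
N_h <- c(single_family = 400, townhouse = 1100, apartment = 4500)

n_total <- 300  # total desired sample size

# Proportional allocation: n_h = (N_h / N) * n
n_h <- round(n_total * N_h / sum(N_h))
n_h

# SRS without replacement within each stratum, using frame labels 1..N_h
set.seed(403)
sampled_ids <- Map(function(N, n) sort(sample.int(N, n)), N_h, n_h)

# Confirm each stratum received its allocated number of households
lengths(sampled_ids)
```

With real counts the rounded allocations may not sum exactly to the target, in which case one stratum's size would need a small manual adjustment.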


      • How would you estimate the mean and variance of household size?

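      • To estimate the average household size, we would calculate the sample mean by adding together the number of people in each sampled household and dividing by the total number of sampled households. This gives an estimate of the average number of people per household on Burnaby Mountain. Under proportional allocation this simple mean coincides with the stratified estimator; under any other allocation, the stratum means should be weighted by each stratum's share of all households.

        To estimate the variability in household size, we would compute the sample variance by measuring how much each household’s size differs from the sample mean. As a simple approximation, the variance of the estimated mean household size would then be estimated by dividing the sample variance by the sample size. Because the design is stratified, a more precise estimate is built stratum by stratum: each stratum's sample variance is divided by its sample size, weighted by the square of the stratum's population share, and the results are summed. Under proportional allocation the simple formula is typically conservative.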

        If the total number of households on Burnaby Mountain is known and the sample represents a large fraction of the population, a finite population correction could be applied to adjust the variance estimate.
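
A minimal sketch of the stratified estimate and its variance, including the finite population correction, is given below. All inputs are hypothetical: the stratum sizes `N_h`, the allocations `n_h`, and the simulated responses in `y` are placeholders for illustration, not survey data.

```r
set.seed(403)

# Hypothetical frame sizes and proportional allocation (n = 300)
N_h <- c(single_family = 400, townhouse = 1100, apartment = 4500)
n_h <- c(single_family = 20, townhouse = 55, apartment = 225)

# Simulated placeholder responses: household size = 1 + Poisson count
y <- list(
  single_family = 1 + rpois(n_h[["single_family"]], lambda = 2.5),
  townhouse     = 1 + rpois(n_h[["townhouse"]], lambda = 2.0),
  apartment     = 1 + rpois(n_h[["apartment"]], lambda = 1.0)
)

W_h    <- N_h / sum(N_h)  # stratum weights (population shares)
ybar_h <- sapply(y, mean) # stratum sample means
s2_h   <- sapply(y, var)  # stratum sample variances

# Stratified estimate of mean household size
ybar_st <- sum(W_h * ybar_h)

# Estimated variance of ybar_st, with finite population correction per stratum
var_st <- sum(W_h^2 * (1 - n_h / N_h) * s2_h / n_h)
se_st  <- sqrt(var_st)

c(estimate = ybar_st, se = se_st)
```

Because the allocation here is exactly proportional, `ybar_st` equals the plain mean of all 300 responses; the stratified formula matters mainly for the variance.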

    • Are there any obvious problems with the proposed method?

    • Frame incompleteness / coverage error: some units (basement suites, short-term rentals) may be missed by municipal or online listings.

    • Ambiguity in the “household” definition: inconsistent respondent interpretation causes measurement error. Fix by giving a short, clear definition in the survey.

    • Nonresponse bias: households that do not answer may differ in size from those that do. Mitigate with multiple contact attempts and nonresponse weighting if possible.