Statistical Inference Assignment

ID:20228034

(a) What is the point estimate of the population mean?

In statistics, a point estimate is a single value that we use to approximate an unknown population parameter based on information from a sample. Here, the unknown parameter is the population mean Category Fluency Test score for subjects with early Alzheimer’s disease.

To calculate the point estimate of the population mean, we use the sample mean. The sample mean is the sum of all the observed values (Category Fluency Test scores) divided by the number of subjects in the sample.

Here’s the step-by-step process:

Add up all the Category Fluency Test scores: 11 + 10 + 6 + 3 + 11 + 10 + 9 + 11 = 71
Count the number of subjects in the sample: 8
Divide the sum of scores by the number of subjects: 71 / 8 = 8.875

So, the point estimate of the population mean is x_bar = 8.875.

(b) What is the standard deviation of the sample?

The standard deviation of a sample measures how spread out the data points are around the sample mean. It is the square root of the sample variance, which is the sum of squared deviations from the sample mean divided by n − 1. Loosely, it describes how far a typical observation lies from the sample average.

Using R’s sd() function (which divides by n − 1), the standard deviation of the sample is s ≈ 2.900123.
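The value above can be reproduced directly from the definition. The sketch below sums the squared deviations from the sample mean, divides by n − 1 as R’s `sd()` does, takes the square root, and compares the result with `sd()`. The intermediate variable names are illustrative only.

```r
# Category Fluency Test scores from part (a)
scores <- c(11, 10, 6, 3, 11, 10, 9, 11)

n <- length(scores)
x_bar <- mean(scores)                 # 8.875

# Sum of squared deviations from the sample mean
ss <- sum((scores - x_bar)^2)         # 58.875

# Sample variance uses n - 1 in the denominator
s2 <- ss / (n - 1)                    # about 8.410714

# Sample standard deviation
s_manual <- sqrt(s2)                  # about 2.900123

# sd() applies the same n - 1 formula, so the two values agree
all.equal(s_manual, sd(scores))       # TRUE
```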

(c) What is the estimated standard error of the sample mean?

The estimated standard error of the sample mean estimates the standard deviation of the sampling distribution of the sample mean, that is, how much the sample mean would vary from one random sample of size n to another. It indicates how precisely the sample mean estimates the population mean: the smaller the standard error, the more precise the estimate.

The formula for estimating the standard error of the sample mean (SE) is:

SE = s / √n

Computing this in R gives an estimated standard error of the sample mean of 1.025348.
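For the fluency data, substituting s from part (b) and n = 8 into this formula gives the value reported above:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}} = \frac{2.900123}{\sqrt{8}} \approx \frac{2.900123}{2.828427} \approx 1.025348
```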

(d) Construct a 95% confidence interval for the population mean category fluency test score.

To construct a 95% confidence interval for the population mean Category Fluency Test score, we’ll use the sample mean and the estimated standard error of the sample mean. The 95% confidence interval will provide a range of values within which we can be 95% confident that the true population mean lies.

The formula for the confidence interval is:

Confidence Interval = Sample mean ± (Critical value) * (Standard error of the sample mean)

The critical value depends on the confidence level and on whether the population standard deviation is known. Here σ is unknown and is estimated by s, and the sample is small (n = 8, far below the usual rule of thumb of n > 30 for using the normal value 1.96). We therefore use the t distribution with n − 1 = 7 degrees of freedom, whose 95% critical value is t(0.975, 7) ≈ 2.365.

Let’s calculate the confidence interval:

Sample mean (x_bar) = 8.875 (calculated in part (a))
Standard error of the sample mean (SE) = 1.025348 (calculated in part (c))
Critical value (t with 7 df) = 2.364624

Confidence Interval = 8.875 ± 2.364624 * 1.025348
Confidence Interval ≈ 8.875 ± 2.425 (rounded to three decimal places)

Lower bound of the confidence interval ≈ 8.875 - 2.425 = 6.450
Upper bound of the confidence interval ≈ 8.875 + 2.425 = 11.300

So, the 95% confidence interval for the population mean Category Fluency Test score is approximately 6.450 to 11.300, which matches the interval returned by t.test() in R (6.450436 to 11.29956). This means that we can be 95% confident that the true population mean Category Fluency Test score lies within this interval.
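The hand calculation can be sketched in R as follows. It computes the t-based interval with `qt()`, checks it against `t.test()`, and, for contrast only, also computes the z-based interval with `qnorm(0.975)`. The z interval is narrower because it ignores the extra uncertainty from estimating σ with only 8 observations.

```r
# Category Fluency Test scores from part (a)
scores <- c(11, 10, 6, 3, 11, 10, 9, 11)

n     <- length(scores)
x_bar <- mean(scores)
se    <- sd(scores) / sqrt(n)

# t critical value: 97.5th percentile of t with n - 1 = 7 df
t_crit <- qt(0.975, df = n - 1)          # about 2.364624

# Normal critical value, shown only for comparison
z_crit <- qnorm(0.975)                   # about 1.959964

ci_t <- x_bar + c(-1, 1) * t_crit * se   # about 6.450 to 11.300
ci_z <- x_bar + c(-1, 1) * z_crit * se   # narrower: about 6.865 to 10.885

# The t-based interval matches the one returned by t.test()
all.equal(ci_t, as.numeric(t.test(scores, conf.level = 0.95)$conf.int))
```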

(e) What is the precision of the estimate?

The precision of the estimate describes how closely the sample mean is likely to approximate the true population mean. Here it is reported as the margin of error: half the width of the 95% confidence interval, i.e. the critical value times the standard error (t × SE). A smaller margin of error means a more precise estimate.

Computing this in R gives a precision (margin of error) of 2.424564.

(f) State the probabilistic and the practical interpretation of the confidence interval you constructed.

The 95% confidence interval for the population mean Category Fluency Test score can be interpreted in two ways.

Probabilistic interpretation: if we repeatedly drew random samples of size 8 from the same population and computed a 95% confidence interval from each, about 95% of those intervals would contain the true population mean. The 95% describes the long-run reliability of the procedure. It does not mean there is a 95% probability that the true mean lies in our particular interval; once computed, the interval 6.450 to 11.300 either contains the fixed population mean or it does not.

Practical interpretation: we are 95% confident that the true mean Category Fluency Test score for all subjects with early Alzheimer’s disease lies between 6.450 and 11.300. In other words, the sample data are consistent with a population mean anywhere in that range. The fairly wide interval reflects the small sample size (n = 8) and shows how much uncertainty remains in the estimate, which researchers and decision-makers should keep in mind when drawing conclusions from these data.
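The repeated-sampling interpretation can be illustrated with a small simulation. This is a sketch under a hypothetical setup: the population is assumed to be normal with mean 9 and standard deviation 3 (illustrative values, not estimates from the study), each sample has the same size as ours (n = 8), and each sample’s 95% t interval is checked for whether it contains the true mean.

```r
# Coverage simulation for the 95% t interval (hypothetical population)
set.seed(2024)   # fixed only for reproducibility

mu    <- 9       # hypothetical true mean (illustrative only)
sigma <- 3       # hypothetical true SD (illustrative only)
n     <- 8       # same sample size as the fluency data
reps  <- 10000   # number of repeated samples

covered <- replicate(reps, {
  x    <- rnorm(n, mean = mu, sd = sigma)
  half <- qt(0.975, df = n - 1) * sd(x) / sqrt(n)  # margin of error
  abs(mean(x) - mu) <= half                        # does this CI contain mu?
})

# Proportion of intervals that contain mu
coverage <- mean(covered)
coverage
```

The printed proportion should land near 0.95, while any single interval, like ours, either covers the true mean or misses it.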

# Category Fluency Test scores
scores <- c(11, 10, 6, 3, 11, 10, 9, 11)



# Point estimate of the population mean (sample mean)
point_estimate <- mean(scores)

# Standard deviation of the sample
sample_std <- sd(scores)

# Estimated standard error of the sample mean
se <- sample_std / sqrt(length(scores))

# 95% confidence interval for the population mean
confidence_interval <- t.test(scores, conf.level = 0.95)$conf.int

# Precision of the estimate (margin of error)
precision <- (confidence_interval[2] - confidence_interval[1]) / 2

# Probabilistic interpretation: In repeated sampling, approximately 95% of the confidence intervals would contain the true population mean.
# Practical interpretation: We are 95% confident that the true population mean category fluency test score falls within the calculated confidence interval.

# Output the results
cat("Point estimate of the population mean:", point_estimate, "\n")

## Point estimate of the population mean: 8.875

cat("Standard deviation of the sample:", sample_std, "\n")

## Standard deviation of the sample: 2.900123

cat("Estimated standard error of the sample mean:", se, "\n")

## Estimated standard error of the sample mean: 1.025348

cat("95% confidence interval for the population mean:", confidence_interval[1], "-", confidence_interval[2], "\n")

## 95% confidence interval for the population mean: 6.450436 - 11.29956

cat("Precision of the estimate:", precision, "\n")

## Precision of the estimate: 2.424564

1. Estimate the population mean (μGirth, μHeight, μVolume) for each of the variables.

# Read the dataset
data <- read.table("../Data/trees.txt", header = TRUE)
# Histograms to inspect the distribution of each variable
hist(data$Girth)
hist(data$Height)
hist(data$Volume)

# Sample mean for Girth
mean_girth <- mean(data$Girth)

# Sample mean for Height
mean_height <- mean(data$Height)

# Sample mean for Volume
mean_volume <- mean(data$Volume)

print(paste("Estimated population mean for Girth:", mean_girth))

## [1] "Estimated population mean for Girth: 13.2483870967742"

print(paste("Estimated population mean for Height:", mean_height))

## [1] "Estimated population mean for Height: 76"

print(paste("Estimated population mean for Volume:", mean_volume))

## [1] "Estimated population mean for Volume: 30.1709677419355"
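A more compact alternative, assuming `../Data/trees.txt` contains the same 31 observations as R’s built-in `trees` data set (the means printed above agree with it): `colMeans()` returns the sample mean of every column in one call.

```r
# R's built-in trees data: Girth, Height, Volume for 31 black cherry trees
data(trees)

# Sample means of every column serve as point estimates of the population means
mu_hat <- colMeans(trees)
round(mu_hat, 5)
# Girth 13.24839, Height 76.00000, Volume 30.17097
```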

2. Draw a random sample of size 7 and hence compute the point estimates from this sample.

#Set seed to ensure that the random sample remains the same every time you run the code:
set.seed(42)
#Draw a random sample of size 7 from the dataset:
sample_data <- data[sample(nrow(data), 7), ]


#Compute the point estimates for each variable from the random sample:
# Point estimate for Girth
point_estimate_girth <- mean(sample_data$Girth)

# Point estimate for Height
point_estimate_height <- mean(sample_data$Height)

# Point estimate for Volume
point_estimate_volume <- mean(sample_data$Volume)


#Print the point estimates:
print(paste("Point estimate for Girth:", point_estimate_girth))

## [1] "Point estimate for Girth: 11.8857142857143"

print(paste("Point estimate for Height:", point_estimate_height))

## [1] "Point estimate for Height: 78"

print(paste("Point estimate for Volume:", point_estimate_volume))

## [1] "Point estimate for Volume: 24.1714285714286"

3. Estimate the confidence intervals (CI) for the population mean of each variable.

# Sample mean and standard error for Girth
mean_girth <- mean(data$Girth)
se_girth <- sd(data$Girth) / sqrt(length(data$Girth))

# Sample mean and standard error for Height
mean_height <- mean(data$Height)
se_height <- sd(data$Height) / sqrt(length(data$Height))

# Sample mean and standard error for Volume
mean_volume <- mean(data$Volume)
se_volume <- sd(data$Volume) / sqrt(length(data$Volume))


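# Normal (z) approximation: the 95% critical value is about 1.96.
# With n = 31 and the population SDs unknown, the t critical value
# qt(0.975, df = 30), about 2.042, is more exact; it gives the slightly
# wider intervals reported by t.test() in the summary code further down.
critical_value <- 1.96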



# Confidence interval for Girth
ci_girth_lower <- mean_girth - critical_value * se_girth
ci_girth_upper <- mean_girth + critical_value * se_girth

# Confidence interval for Height
ci_height_lower <- mean_height - critical_value * se_height
ci_height_upper <- mean_height + critical_value * se_height

# Confidence interval for Volume
ci_volume_lower <- mean_volume - critical_value * se_volume
ci_volume_upper <- mean_volume + critical_value * se_volume


print(paste("95% Confidence Interval for Girth:", ci_girth_lower, "to", ci_girth_upper))

## [1] "95% Confidence Interval for Girth: 12.1436794819789 to 14.3530947115694"

print(paste("95% Confidence Interval for Height:", ci_height_lower, "to", ci_height_upper))

## [1] "95% Confidence Interval for Height: 73.7569536843405 to 78.2430463156595"

print(paste("95% Confidence Interval for Volume:", ci_volume_lower, "to", ci_volume_upper))

## [1] "95% Confidence Interval for Volume: 24.384411966645 to 35.957523517226"
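Since n = 31 and the population standard deviations are unknown, a t critical value with 30 degrees of freedom is the more exact choice. The sketch below, again assuming the built-in `trees` data matches `../Data/trees.txt`, computes t-based intervals for all three variables and checks them against `t.test()`. Because 2.042 > 1.96, these intervals are slightly wider than the z-based ones above.

```r
# Built-in copy of the data (assumed identical to ../Data/trees.txt)
data(trees)

n <- nrow(trees)                       # 31 trees
t_crit <- qt(0.975, df = n - 1)        # about 2.042, vs 1.96 for z

# t-based 95% CI for each column: mean +/- t * s / sqrt(n)
ci_t <- sapply(trees, function(x) {
  se <- sd(x) / sqrt(length(x))
  c(lower = mean(x) - t_crit * se, upper = mean(x) + t_crit * se)
})
round(ci_t, 5)

# The Girth column agrees with t.test()'s interval
all.equal(ci_t[, "Girth"], as.numeric(t.test(trees$Girth)$conf.int),
          check.attributes = FALSE)
```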

4. Draw two independent random samples of size 6 for the variables Height and Volume and hence compute the point estimates for each sample.

# Set a seed to ensure that the random samples remain the same every time you run the code:
set.seed(123)


# Draw two independent random samples of size 6 for "Height" and "Volume" from the dataset
sample_height <- sample(data$Height, 6)
sample_volume <- sample(data$Volume, 6)


#Compute the point estimates for each sample

# Point estimate for Height sample
point_estimate_height_sample <- mean(sample_height)

# Point estimate for Volume sample
point_estimate_volume_sample <- mean(sample_volume)

# Print the point estimates for each sample:
print("Point estimates for Height sample:")

## [1] "Point estimates for Height sample:"

print(point_estimate_height_sample)

## [1] 73.33333

print("Point estimates for Volume sample:")

## [1] "Point estimates for Volume sample:"

print(point_estimate_volume_sample)

## [1] 24.71667

5. Estimate the confidence interval (CI) for the difference between two population means μHeight and μVolume.

# Calculate the sample means for Height and Volume
mean_height <- mean(sample_height)
mean_volume <- mean(sample_volume)



# Calculate the sample standard deviations for Height and Volume
sd_height <- sd(sample_height)
sd_volume <- sd(sample_volume)


# Calculate the standard error for the difference between the means
se_difference <- sqrt((sd_height^2 / 6) + (sd_volume^2 / 6))
print(se_difference)

## [1] 3.769534

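# t critical value for a 95% CI with n1 + n2 - 2 = 6 + 6 - 2 = 10 degrees of freedom.
# Because n1 == n2, the standard error above equals the pooled-variance SE,
# so this is the classical pooled two-sample t interval.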
t_score <- qt(0.975, df = 10)

# Calculate the margin of error
margin_of_error <- t_score * se_difference

# Calculate the confidence interval for the difference between the means
confidence_interval_lower <- (mean_height - mean_volume) - margin_of_error
confidence_interval_upper <- (mean_height - mean_volume) + margin_of_error

# Print the confidence interval
print(paste("95% Confidence Interval for the difference between Height and Volume means:", confidence_interval_lower, "to", confidence_interval_upper))

## [1] "95% Confidence Interval for the difference between Height and Volume means: 40.2176208241608 to 57.0157125091725"
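With equal sample sizes (6 and 6), the unpooled standard error sqrt(s1²/6 + s2²/6) used above equals the pooled-variance standard error, so pairing it with n1 + n2 − 2 = 10 degrees of freedom gives exactly the classical pooled two-sample t interval. The sketch below checks this against `t.test(..., var.equal = TRUE)` and also computes the Welch interval (the `t.test()` default, used in Task 5 of the summary code), which uses Welch-Satterthwaite degrees of freedom instead. It assumes R’s built-in `trees` data stands in for `../Data/trees.txt` and draws the samples the same way as above; the short names `h` and `v` are illustrative.

```r
data(trees)

set.seed(123)
h <- sample(trees$Height, 6)
v <- sample(trees$Volume, 6)
n1 <- length(h)
n2 <- length(v)

# Pooled variance and its standard error for the difference in means
sp2 <- ((n1 - 1) * var(h) + (n2 - 1) * var(v)) / (n1 + n2 - 2)
se_pooled   <- sqrt(sp2 * (1 / n1 + 1 / n2))
se_unpooled <- sqrt(var(h) / n1 + var(v) / n2)
all.equal(se_pooled, se_unpooled)          # TRUE because n1 == n2

# Manual pooled interval with n1 + n2 - 2 = 10 df
ci_manual <- (mean(h) - mean(v)) +
  c(-1, 1) * qt(0.975, df = n1 + n2 - 2) * se_pooled

# Same interval from t.test() under the equal-variance assumption
ci_pooled <- as.numeric(t.test(h, v, var.equal = TRUE)$conf.int)

# Welch interval: same SE here, but Welch df (at most 10), so at least as wide
ci_welch <- as.numeric(t.test(h, v)$conf.int)

all.equal(ci_manual, ci_pooled)            # TRUE
```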

# Task 1: Estimate the population mean for each variable
population_mean_Girth <- mean(data$Girth)
population_mean_Height <- mean(data$Height)
population_mean_Volume <- mean(data$Volume)

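# Task 2: Draw a random sample of size 7 and compute point estimates
# (named sample_7 to avoid confusion with base R's sample() function; the seed
# is not reset here, so this sample differs from the one drawn in part 2)
sample_7 <- data[sample(nrow(data), 7), ]
point_estimate_Girth <- mean(sample_7$Girth)
point_estimate_Height <- mean(sample_7$Height)
point_estimate_Volume <- mean(sample_7$Volume)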

# Task 3: Estimate the confidence intervals for the population mean of each variable
confidence_interval_Girth <- t.test(data$Girth, conf.level = 0.95)$conf.int
confidence_interval_Height <- t.test(data$Height, conf.level = 0.95)$conf.int
confidence_interval_Volume <- t.test(data$Volume, conf.level = 0.95)$conf.int

# Task 4: Draw two independent random samples of size 6 for Height and Volume, and compute point estimates
sample_Height <- data[sample(nrow(data), 6), "Height"]
sample_Volume <- data[sample(nrow(data), 6), "Volume"]
point_estimate_Height_sample <- mean(sample_Height)
point_estimate_Volume_sample <- mean(sample_Volume)

# Task 5: Welch two-sample t interval (t.test default, var.equal = FALSE) for the difference between the population means of Height and Volume
confidence_interval_diff <- t.test(sample_Height, sample_Volume, conf.level = 0.95)$conf.int

# Output the results
cat("Population mean for Girth:", population_mean_Girth, "\n")

## Population mean for Girth: 13.24839

cat("Population mean for Height:", population_mean_Height, "\n")

## Population mean for Height: 76

cat("Population mean for Volume:", population_mean_Volume, "\n")

## Population mean for Volume: 30.17097

cat("Point estimate for Girth from sample:", point_estimate_Girth, "\n")

## Point estimate for Girth from sample: 15.38571

cat("Point estimate for Height from sample:", point_estimate_Height, "\n")

## Point estimate for Height from sample: 78.85714

cat("Point estimate for Volume from sample:", point_estimate_Volume, "\n")

## Point estimate for Volume from sample: 40.12857

cat("Confidence interval for Girth:", confidence_interval_Girth[1], "-", confidence_interval_Girth[2], "\n")

## Confidence interval for Girth: 12.09731 - 14.39947

cat("Confidence interval for Height:", confidence_interval_Height[1], "-", confidence_interval_Height[2], "\n")

## Confidence interval for Height: 73.6628 - 78.3372

cat("Confidence interval for Volume:", confidence_interval_Volume[1], "-", confidence_interval_Volume[2], "\n")

## Confidence interval for Volume: 24.14152 - 36.20042

cat("Point estimate for Height from sample (size 6):", point_estimate_Height_sample, "\n")

## Point estimate for Height from sample (size 6): 76

cat("Point estimate for Volume from sample (size 6):", point_estimate_Volume_sample, "\n")

## Point estimate for Volume from sample (size 6): 20.25

cat("Confidence interval for the difference between Height and Volume:", confidence_interval_diff[1], "-", confidence_interval_diff[2], "\n")

## Confidence interval for the difference between Height and Volume: 48.43089 - 63.06911