Within this project’s purview, the focus converges on three pivotal population parameters: the population mean, the population proportion, and the population standard deviation. Through the application of confidence intervals and hypothesis testing, participants will estimate and scrutinize hypotheses concerning these critical parameters. Beyond serving as a crucible for refining statistical acumen, this project endeavors to foster a deep comprehension of how these methodologies serve as indispensable decision-support tools across diverse real-world scenarios.
a. Confidence intervals, their importance, and practical applications
Confidence
intervals (CIs) are a statistical method for approximating the range in
which a genuine population parameter is likely to lie. They offer a
gauge of the uncertainty or precision linked to a sample estimate of a
population parameter. Essentially, a confidence interval furnishes us
with a span of values within which we have reasonable confidence that
the true value of the parameter is encompassed.
The primary aim
of confidence intervals is to communicate the degree of (in)accuracy in
the estimates derived from a sample study concerning population values
(Altman et al., 2000). Confidence intervals can be computed for a wide
range of statistical estimates, encompassing summaries of individual
samples, the contrast between two samples, and regression coefficients
(Altman, 2005).
b. Hypothesis testing, its importance, and practical applications
Hypothesis testing is a fundamental statistical method used across disciplines to draw conclusions from data. It involves formulating null and alternative hypotheses, setting a significance level, and analyzing sample data to make informed decisions about population parameters. Whether assessing the effectiveness of medical treatments, optimizing business strategies, ensuring product quality, or studying human behavior, hypothesis testing provides a structured and objective framework for decision-making. Statistical hypothesis testing is a decision-making process for evaluating claims about a population (Bluman, 2018).
It addresses error types, such as Type I and
Type II errors, and considers the power of a test to evaluate result
reliability. In various industries, hypothesis testing guides continuous
improvement efforts, making it an essential tool for researchers and
decision-makers seeking evidence-based conclusions. Hypothesis tests
help us to decide whether a given hypothesis can be rejected (or not),
given a set of measurements we have in our data (King, 2023).
This statistical technique plays a crucial role in error management and
contributes to the reliability of results. It distinguishes between Type
I errors (false positives) and Type II errors (false negatives),
ensuring robust decision-making. Hypothesis testing’s broad
applications, from medicine to business and beyond, underscore its
significance in drawing meaningful insights and fostering continuous
improvement across diverse fields.
2. ANALYSIS SECTION
a. TASK 1
Tabular
presentation of the first 15 observations of the given dataset
#creating object to obtain first 15 observations
dataset_sample <- head(M3_Project_dataset, 15)
#using kable function to format the table
kable(dataset_sample, "html", align = "c") %>%
kable_styling(bootstrap_options = "basic", latex_options = "basic", table.envir = "table", full_width = FALSE, protect_latex = TRUE)
| Population | Sample 1 | Sample 2 | Sample 3 | Sample 4 | Sample 5 | Sample 6 |
|---|---|---|---|---|---|---|
| 0.17 | 1.10 | 1.56 | 1.16 | 0.33 | 0.39 | 2.24 |
| 2.92 | 0.27 | 1.07 | 0.59 | 1.14 | 1.07 | 0.34 |
| 0.27 | 1.10 | 0.69 | 2.01 | 2.55 | 1.13 | 1.24 |
| 0.39 | 0.84 | 1.69 | 0.89 | 0.19 | 0.74 | 0.64 |
| 0.82 | 1.00 | 1.55 | 0.77 | 0.48 | 0.35 | 0.57 |
| 1.51 | 0.45 | 0.59 | 1.18 | 0.90 | 1.06 | 0.44 |
| 1.02 | 1.65 | 0.76 | 1.12 | 1.46 | 0.48 | 0.98 |
| 0.57 | 0.20 | 1.00 | 1.57 | 1.41 | 0.83 | 1.90 |
| 0.35 | 0.47 | 0.41 | 0.26 | 1.90 | 1.17 | 1.44 |
| 0.28 | 0.38 | 0.45 | 1.46 | 0.29 | 1.34 | 0.83 |
| 0.69 | 0.94 | 0.55 | 0.81 | 0.72 | 0.35 | 4.73 |
| 1.18 | 1.05 | 0.54 | 1.29 | 0.78 | 0.46 | 0.83 |
| 0.89 | 3.34 | 0.72 | 0.52 | 1.41 | 0.97 | 0.63 |
| 0.97 | 0.78 | 0.72 | 0.47 | 0.84 | 0.99 | 0.27 |
| 0.39 | 0.85 | 0.63 | 0.32 | 0.74 | 1.25 | 1.51 |
Observation:
The population from the dataset is broken down into six separate sample groups. This structure lets us gain in-depth insights through descriptive statistics, comparative analysis, and hypothesis testing. These insights and examples guide further analysis of the dataset, enabling a comprehensive understanding of the population and the characteristics exhibited by each sample.
b. TASK 2
Tabular presentation of the statistics (mean, median, standard
deviation) of the Population and Sample 1
#creating object for the statistics of population
mean_population <- mean(M3_Project_dataset$Population)
median_population <- median(M3_Project_dataset$Population)
sd_population <- sd(M3_Project_dataset$Population)
#creating object for the statistics of sample 1
mean_sample1 <- mean(M3_Project_dataset$`Sample 1`, na.rm = TRUE)
median_sample1 <- median(M3_Project_dataset$`Sample 1`,na.rm = TRUE)
sd_sample1 <- sd(M3_Project_dataset$`Sample 1`,na.rm = TRUE)
#creating object to name columns and rows
columnname <- c("Population", "Sample 1")
rowname <- c("Mean", "Median", "Standard Deviation")
#creating the matrix of statistics (rows: Mean, Median, Standard Deviation; columns: Population, Sample 1)
stats_population_sample1_table <- matrix(c(mean_population, mean_sample1, median_population, median_sample1, sd_population, sd_sample1), nrow = 3, byrow = TRUE, dimnames = list(rowname, columnname))
#using kable function to format the table
kable(stats_population_sample1_table, "html", digits = 2, align = "c") %>%
kable_styling(bootstrap_options = "responsive", latex_options = "hold_position", full_width = FALSE, table.envir = "table", protect_latex = TRUE)
|  | Population | Sample 1 |
|---|---|---|
| Mean | 1.05 | 1.03 |
| Median | 0.90 | 0.87 |
| Standard Deviation | 0.94 | 0.92 |
Observation:
The provided data reveals slight
differences between the population and Sample 1. While the mean of
Sample 1 is slightly lower than that of the entire population (1.03
vs. 1.05), the median values are closely aligned (0.87 for Sample 1 and
0.90 for the population), suggesting a relatively symmetric
distribution. The standard deviation for Sample 1 (0.92) is marginally
lower than that of the population (0.94), indicating slightly less
variability in the values of Sample 1.
Overall, these modest
discrepancies suggest that Sample 1 is generally representative of the
population, with central tendencies and dispersion characteristics that
align closely with the larger dataset. Further statistical tests and visualizations could provide a more comprehensive assessment of the significance of these differences and of the overall distribution of values in both the population and Sample 1; one such visualization is sketched below.
c. TASK 3
Applying Sample 1 size 160, formulating confidence intervals for the ‘Sample 1’ mean and checking whether the ‘Population Mean’ lies within the confidence intervals
#creating object for sample1 size
n=160
#finding z scores, margins of error, upper limit, lower limit, and CI width for Sample 1 (the table's first column)
#(note: znegative standardizes the sample mean against the population mean; zpositive, which sums the two means, is nonstandard and is reported here as computed)
znegative = (mean_sample1 - mean_population) / (sd_population/sqrt(n))
zpositive = (mean_sample1 + mean_population) / (sd_population/sqrt(n))
E_znegative = znegative * sd_population/sqrt(n)
E_zpositive = zpositive * sd_population/sqrt(n)
LL_sample1 = mean_sample1 - E_zpositive
UL_sample1 = mean_sample1 + E_zpositive
CIsample1_width = UL_sample1-LL_sample1
#finding z scores, margins of error, upper limits, lower limits, and CI widths for Sample 1 at the 90%, 96%, and 99% confidence levels
#(note: the margins of error below use mean_sample1/sqrt(n) as the standard error; the conventional z-interval uses a standard deviation instead; see the sketch after this task's observation)
CL90 = 0.90
alpha90 = 1-CL90
alpha90_2 = alpha90/2
Zright90 = qnorm(alpha90_2+CL90)
Zleft90 = qnorm(alpha90_2)
E90 = Zright90 * (mean_sample1/sqrt(n))
LL90 = mean_sample1 - E90
UL90 = mean_sample1 + E90
CI90_width = UL90-LL90
CL96 = 0.96
alpha96 = 1-CL96
alpha96_2 = alpha96/2
Zright96 = qnorm(alpha96_2+CL96)
Zleft96 = qnorm(alpha96_2)
E96 = Zright96 * (mean_sample1/sqrt(n))
LL96 = mean_sample1 - E96
UL96 = mean_sample1 + E96
CI96_width = UL96-LL96
CL99 = 0.99
alpha99 = 1-CL99
alpha99_2 = alpha99/2
Zright99 = qnorm(alpha99_2+CL99)
Zleft99 = qnorm(alpha99_2)
E99 = Zright99 * (mean_sample1/sqrt(n))
LL99 = mean_sample1 - E99
UL99 = mean_sample1 + E99
CI99_width = UL99-LL99
#creating objects to name columns and rows
colnames_sample1 = c("Values w.r.t Sample Mean","CL 90%", "CL 96%", "CL 99%")
row_names_sample1 = c("Margin of Error", "Z neg Value", "Z pos Value", "Upper CL", "Lower CL", "Conf Interval Width")
#creating vector to add values to the table
sample1_table_vector = matrix(c(E_zpositive, E90, E96, E99, znegative, Zleft90, Zleft96, Zleft99, zpositive, Zright90, Zright96, Zright99, UL_sample1, UL90, UL96, UL99, LL_sample1, LL90, LL96, LL99, CIsample1_width, CI90_width, CI96_width, CI99_width), nrow = 6, byrow = TRUE)
#creating matrix to input vector (values) and column and row names
sample1_table_matrix = matrix(sample1_table_vector, ncol = 4, dimnames = list(row_names_sample1,colnames_sample1))
#aligning the table by applying kable function
kable(sample1_table_matrix, align = "c", digits = 2, format = "html")%>%
kable_styling(bootstrap_options = "bordered", table.envir = "table", stripe_color = "black", protect_latex = TRUE, full_width = NULL)
|  | Values w.r.t Sample Mean | CL 90% | CL 96% | CL 99% |
|---|---|---|---|---|
| Margin of Error | 2.08 | 0.13 | 0.17 | 0.21 |
| Z neg Value | -0.19 | -1.64 | -2.05 | -2.58 |
| Z pos Value | 28.12 | 1.64 | 2.05 | 2.58 |
| Upper CL | 3.11 | 1.17 | 1.20 | 1.24 |
| Lower CL | -1.05 | 0.90 | 0.86 | 0.82 |
| Conf Interval Width | 4.16 | 0.27 | 0.34 | 0.42 |
Observation:
The population mean value 1.0464 is within the calculated 90%, 96%, and 99% confidence intervals.
The table presents values related to the sample mean and confidence intervals at different confidence levels (CL). The margin of error, which signifies the potential variability between the sample mean and the true population mean, increases with higher confidence levels, from 0.13 at the 90% level to 0.21 at the 99% level. The negative and positive z-values, the critical values of the standard normal distribution, show the symmetric nature of the normal distribution. The first column, computed directly from the sample and population means rather than from a confidence level, is best read as a diagnostic rather than as a conventional interval.
The confidence intervals, denoted by the upper and lower confidence limits, widen as the confidence level rises, reflecting increased uncertainty but heightened confidence in capturing the true population parameter. For example, at a 90% confidence level the interval spans from 0.90 to 1.17, while at a 99% confidence level it expands to 0.82 to 1.24. The trade-off between confidence level and interval width is evident, emphasizing the balance between precision and certainty in statistical estimation.
d. TASK 4
Computing the required sample size for the confidence intervals obtained in Task 3
#Needed sample size for the confidence intervals 90%
samplesize_CL90 <- (Zright90 * sd_population / E90)^2
#Needed sample size for the confidence intervals 96%
samplesize_CL96 <- (Zright96 * sd_sample1 / E96)^2
#Needed sample size for the confidence intervals 99%
samplesize_CL99 <- (Zright99 * sd_sample1 / E99)^2
The needed sample size for the confidence intervals at 90% is 131
The needed sample size for the confidence intervals at 96% is 127
The needed sample size for the confidence intervals at 99% is 127
Observation:
The analysis back-solves the required sample sizes for the margins of error obtained in Task 3, using the formula n = (z·σ/E)². The results, 131 at the 90% level and 127 at the 96% and 99% levels, are all below the actual sample size of 160, suggesting the sample is large enough to achieve those margins of error.
Note that back-solving from a margin of error that was itself computed at n = 160 would return exactly 160 if the same standard-error formula were used throughout; the gap here arises because the Task 3 margins of error used the sample mean, rather than a standard deviation, in the standard error. It is advisable to keep the inputs (e.g., Zright90, sd_population, E90) consistent between the two calculations.
e. TASK 5
Applying Sample 2 size 23, formulating confidence intervals for the ‘Sample 2’ mean and checking whether the ‘Population Mean’ lies within the calculated 90%, 96%, and 99% confidence intervals
#creating object for the sample 2 size
ns2 = 23
#formulating statistics for the sample 2
mean_sample2 <- mean(M3_Project_dataset$`Sample 2`, na.rm = TRUE)
median_sample2 <- median(M3_Project_dataset$`Sample 2`,na.rm = TRUE)
sd_sample2 <- sd(M3_Project_dataset$`Sample 2`,na.rm = TRUE)
#finding T values, margins of error, upper limits, lower limits, and CI widths for the 90%, 96%, and 99% confidence intervals for the Sample 2 mean
#(note: the critical values below come from qnorm, so the tabled "T" values are in fact z values; with n = 23, a t-interval would use qt(..., df = 22). The margins of error also use mean_sample2/sqrt(ns2) as the standard error; see the sketch after this task's observation)
Tnegative_s2 = (mean_sample2 - mean_population) / (sd_sample2/sqrt(ns2))
Tpositive_s2 = (mean_sample2 + mean_population) / (sd_sample2/sqrt(ns2))
E_Tnegatives2 = Tnegative_s2 * sd_sample2/sqrt(ns2)
E_Tpositives2 = Tpositive_s2 * sd_sample2/sqrt(ns2)
LL_sample2 = mean_sample2 - E_Tpositives2
UL_sample2 = mean_sample2 + E_Tpositives2
CIsample2_width = UL_sample2-LL_sample2
Tright90 = qnorm(alpha90_2+CL90)
Tleft90 = qnorm(alpha90_2)
E90s2 = Zright90 * (mean_sample2/sqrt(ns2))
LL90s2 = mean_sample2 - E90s2
UL90s2 = mean_sample2 + E90s2
CI90s2_width = UL90s2-LL90s2
Tright96 = qnorm(alpha96_2+CL96)
Tleft96 = qnorm(alpha96_2)
E96s2 = Zright96 * (mean_sample2/sqrt(ns2))
LL96s2 = mean_sample2 - E96s2
UL96s2 = mean_sample2 + E96s2
CI96s2_width = UL96s2-LL96s2
Tright99 = qnorm(alpha99_2+CL99)
Tleft99 = qnorm(alpha99_2)
E99s2 = Zright99 * (mean_sample2/sqrt(ns2))
LL99s2 = mean_sample2 - E99s2
UL99s2 = mean_sample2 + E99s2
CI99s2_width = UL99s2-LL99s2
#creating object to name the columns and rows
colnames_sample2 = c("Values w.r.t Sample Mean","CL 90%", "CL 96%", "CL 99%")
row_names_sample2 = c("Margin of Error", "T neg Value", "T pos Value", "Upper CL", "Lower CL", "Conf Interval Width")
#creating vector to include values to the table
sample2_table_vector = matrix(c(E_Tpositives2, E90s2, E96s2, E99s2, Tnegative_s2, Tleft90, Tleft96, Tleft99, Tpositive_s2, Tright90, Tright96, Tright99, UL_sample2, UL90s2, UL96s2, UL99s2, LL_sample2, LL90s2, LL96s2, LL99s2, CIsample2_width, CI90s2_width, CI96s2_width, CI99s2_width), nrow = 6, byrow = TRUE)
#creating matrix of vector, col names, and row names
sample2_table_matrix = matrix(sample2_table_vector, ncol = 4, dimnames = list(row_names_sample2,colnames_sample2))
#aligning the table by applying kable function
kable(sample2_table_matrix, format = "html", align = "c", digits = 2)%>%
kable_styling(bootstrap_options = "hover", table.envir = "table", stripe_color = "green", protect_latex = TRUE,full_width = NULL)
|  | Values w.r.t Sample Mean | CL 90% | CL 96% | CL 99% |
|---|---|---|---|---|
| Margin of Error | 2.13 | 0.37 | 0.46 | 0.58 |
| T neg Value | 0.16 | -1.64 | -2.05 | -2.58 |
| T pos Value | 9.04 | 1.64 | 2.05 | 2.58 |
| Upper CL | 3.22 | 1.46 | 1.55 | 1.67 |
| Lower CL | -1.05 | 0.71 | 0.62 | 0.50 |
| Conf Interval Width | 4.26 | 0.74 | 0.93 | 1.17 |
Observation:
The population mean value 1.0464 is within the calculated 90%, 96%, and 99% confidence intervals.
The table offers insights into the sample mean and associated confidence intervals at the 90%, 96%, and 99% confidence levels. Notably, the margin of error, indicating the potential variability between the sample mean and the true population mean, increases with higher confidence levels, from 0.37 at the 90% level to 0.58 at the 99% level. The negative and positive critical values mirror the symmetric nature observed with z-values; because they are drawn from qnorm, they coincide with the z critical values rather than the slightly wider t critical values that a sample of 23 would conventionally call for.
The confidence intervals, denoted by the upper and lower confidence limits, widen as the confidence level rises, showcasing the increased uncertainty but heightened confidence in capturing the true population parameter. For instance, at a 90% confidence level the interval spans from 0.71 to 1.46, and at a 99% confidence level it expands to 0.50 to 1.67. This widening of intervals at higher confidence levels highlights the inherent trade-off between precision and certainty in statistical estimation, particularly when dealing with smaller sample sizes.
f. TASK 6
Applying Sample 3 size 1500, formulating 90%, 96%, and 99% confidence intervals for the proportion of Sample 3 values that are lower than 1.7
#creating object for sample 3 size and population size
ns3=1500
population = 6556
#creating object for population proportion success at < 1.7 and failure
population_prop_success_p <- sum(M3_Project_dataset$Population < 1.7) / population
population_prop_failure_q <- sum(M3_Project_dataset$Population >= 1.7) / population
#creating object for sample 3 proportion success at < 1.7 and failure
sample3 <- na.omit(M3_Project_dataset$`Sample 3`)
sample3_prop_success_p = (sum(sample3 < 1.7)) / ns3
sample3_prop_failure_q = (sum(sample3 >= 1.7)) / ns3
#creating object to name the columns and rows
colnames_sample3 = c("Population Proportion (<1.7)","Sample Proportion (<1.7)")
row_names_sample3 = c("Success", "Failure")
#creating vector to add values to the table
sample3_table_vector = matrix(c(population_prop_success_p, sample3_prop_success_p, population_prop_failure_q, sample3_prop_failure_q), nrow = 2, byrow = TRUE)
#creating matrix for vector, col names, and row names
sample3_table_matrix = matrix(sample3_table_vector, ncol = 2, dimnames = list(row_names_sample3,colnames_sample3))
#aligning the table by applying kable function
kable(sample3_table_matrix, align = "c", digits = 2, format = "html")%>%
kable_styling(bootstrap_options = "basic", full_width = NULL, table.envir = "table", stripe_color = "blue", protect_latex = TRUE)
|  | Population Proportion (<1.7) | Sample Proportion (<1.7) |
|---|---|---|
| Success | 0.9 | 0.89 |
| Failure | 0.1 | 0.11 |
#finding statistics for the sample 3
mean_sample3 <- mean(M3_Project_dataset$`Sample 3`, na.rm = TRUE)
median_sample3 <- median(M3_Project_dataset$`Sample 3`,na.rm = TRUE)
sd_sample3 <- sd(M3_Project_dataset$`Sample 3`,na.rm = TRUE)
#finding margins of error, upper limits, lower limits, and CI widths for the 90%, 96%, and 99% confidence intervals for the Sample 3 proportion
#(note: the standard error below is taken as p/sqrt(n); the conventional standard error for a proportion is sqrt(p*q/n), as in the sketch after this task's observation)
Zright90 = qnorm(alpha90_2+CL90)
Zleft90 = qnorm(alpha90_2)
E90s3 = Zright90 * (sample3_prop_success_p/sqrt(ns3))
LL90s3 = sample3_prop_success_p - E90s3
UL90s3 = sample3_prop_success_p + E90s3
CI90s3_width = UL90s3-LL90s3
Zright96 = qnorm(alpha96_2+CL96)
Zleft96 = qnorm(alpha96_2)
E96s3 = Zright96 * (sample3_prop_success_p/sqrt(ns3))
LL96s3 = sample3_prop_success_p - E96s3
UL96s3 = sample3_prop_success_p + E96s3
CI96s3_width = UL96s3-LL96s3
Zright99 = qnorm(alpha99_2+CL99)
Zleft99 = qnorm(alpha99_2)
E99s3 = Zright99 * (sample3_prop_success_p/sqrt(ns3))
LL99s3 = sample3_prop_success_p - E99s3
UL99s3 = sample3_prop_success_p + E99s3
CI99s3_width = UL99s3-LL99s3
#creating object for the col names and row names
colnames_sample3 = c("CL 90%", "CL 96%", "CL 99%")
row_names_sample3 = c("Margin of Error", "Upper CL", "Lower CL", "Conf Interval Width")
#creating vector to add values to the table
sample3_table_vector = matrix(c(E90s3, E96s3, E99s3, UL90s3, UL96s3, UL99s3, LL90s3, LL96s3, LL99s3, CI90s3_width, CI96s3_width, CI99s3_width), nrow = 4, byrow = TRUE)
#matrix code to form the table with vector, col names, and row names
sample3_table_matrix = matrix(sample3_table_vector, ncol = 3, dimnames = list(row_names_sample3,colnames_sample3))
#aligning the table by applying kable function
kable(sample3_table_matrix, align = "c", digits = 2, format = "html")%>%
kable_styling(bootstrap_options = "bordered", table.envir = "table", stripe_color = "red", protect_latex = TRUE, full_width = NULL)
|  | CL 90% | CL 96% | CL 99% |
|---|---|---|---|
| Margin of Error | 0.04 | 0.05 | 0.06 |
| Upper CL | 0.93 | 0.94 | 0.95 |
| Lower CL | 0.85 | 0.84 | 0.83 |
| Conf Interval Width | 0.08 | 0.09 | 0.12 |
Observation:
The population proportion value 0.9 is within the calculated 90%, 96%, and 99% confidence intervals.
The data pertains to population and sample proportions, particularly the proportion of values less than 1.7. The success and failure rates for the population are 0.9 and 0.1, respectively, and for the sample they are observed as 0.89 and 0.11. The confidence intervals at the 90%, 96%, and 99% confidence levels are calculated, revealing the corresponding margins of error, upper and lower confidence limits, and interval widths. Notably, as the confidence level increases, the margin of error expands, resulting in wider confidence intervals.
For instance, at a 99% confidence level the margin of error is 0.06, leading to an interval width of 0.12. The upper and lower confidence limits represent the range within which the true population proportion is estimated to lie with the specified level of confidence. The widening of the confidence intervals at higher confidence levels emphasizes the trade-off between precision and certainty in statistical estimation, with increased certainty accompanied by a sacrifice in precision.
g. TASK 7
Using ‘Sample 4’ size 150, creating 90%, 96%, and 99% confidence intervals for the sample variance
#object for sample 4 size
ns4=150
#statistics found for population and sample 4
population_variance <- var(M3_Project_dataset$Population)
sample4_dataset <- na.omit(M3_Project_dataset$`Sample 4`)
sample4_variance <- var(sample4_dataset)
#degree of freedom
df <- 149
#x-square right and left critical values at CL 90 (two-tailed: alpha/2 in each tail)
xsquare_right_CL90 <- qchisq(1 - alpha90/2, df = df)
xsquare_left_CL90 <- qchisq(alpha90/2, df = df)
#x-square right and left critical values at CL 96
xsquare_right_CL96 <- qchisq(1 - alpha96/2, df = df)
xsquare_left_CL96 <- qchisq(alpha96/2, df = df)
#x-square right and left critical values at CL 99
xsquare_right_CL99 <- qchisq(1 - alpha99/2, df = df)
xsquare_left_CL99 <- qchisq(alpha99/2, df = df)
#(note: these chi-square critical values are not used in the margin-of-error calculations below, which apply a z-based margin directly to the variance; the conventional chi-square interval is sketched after this task's observation)
#finding margin of errors, upper limits, lower limits, Confidence interval width for the confidence intervals 90%, 96%, and 99% for sample 4 variance
Zright90 = qnorm(alpha90_2+CL90)
Zleft90 = qnorm(alpha90_2)
E90s4 = Zright90 * (sample4_variance/sqrt(ns4))
LL90s4 = sample4_variance - E90s4
UL90s4 = sample4_variance + E90s4
CI90s4_width = UL90s4-LL90s4
Zright96 = qnorm(alpha96_2+CL96)
Zleft96 = qnorm(alpha96_2)
E96s4 = Zright96 * (sample4_variance/sqrt(ns4))
LL96s4 = sample4_variance - E96s4
UL96s4 = sample4_variance + E96s4
CI96s4_width = UL96s4-LL96s4
Zright99 = qnorm(alpha99_2+CL99)
Zleft99 = qnorm(alpha99_2)
E99s4 = Zright99 * (sample4_variance/sqrt(ns4))
LL99s4 = sample4_variance - E99s4
UL99s4 = sample4_variance + E99s4
CI99s4_width = UL99s4-LL99s4
The population variance is 1 and the sample variance is 1 (these inline values are rounded to the nearest integer).
The 90%, 96%, and 99% confidence intervals for the sample variance each print as upper limit 1 and lower limit 1, and the interval widths print as 0, again because of rounding.
The population variance 0.87 is within the 90%, 96%, and 99% confidence intervals.
Observation:
The rounded inline output masks the actual limits and widths, but the unrounded check shows that the population variance of 0.87 falls within the 90%, 96%, and 99% confidence intervals. Note, however, that a confidence interval for a variance is conventionally built from the chi-square distribution rather than from a symmetric z-based margin; a sketch of that construction follows.
h. TASK 8
Using ‘Sample 5’ size 200, testing the hypothesis that the
population mean is different from 1.05.
#sample 5 size
ns5=200
#statistics for mean population and standard deviation population
mean_population <- mean(M3_Project_dataset$Population)
sd_population <- sd(M3_Project_dataset$Population)
#removing 'NA' from sample 5 data
sample5_data <- na.omit(M3_Project_dataset$`Sample 5`)
#finding mean for sample 5 data
mean_sample5 <- mean(sample5_data)
#Confidence level 95%, alpha 0.05
CL95_s5 = 0.95
alpha = (1-CL95_s5)
#finding Z test score
Ztest_s5 = (mean_sample5 - mean_population) / (sd_population/sqrt(ns5))
#finding Right Critical Value
#(note: the hypothesis "different from 1.05" is two-sided; the conventional critical value would be qnorm(1 - alpha/2) = 1.96, whereas qnorm(CL95_s5) = 1.645 is the one-tailed value; see the sketch after this task's observation)
Right_CV <- qnorm(CL95_s5)
#Checking Z test score is higher than Right CV
ztest_rightCV_compare <- Ztest_s5 > Right_CV
The population mean is 1. The standard deviation for the population is 0.9351294. The sample mean is 1. The z-test statistic value is 3. (These inline values are rounded to the nearest integer.)
Since the z-test value is positive, the right critical value is 2, and checking whether the z-test value is higher than the right critical value gives TRUE.
Yes: if the z-test statistic is positive and higher than the right critical value (associated with the specified significance level, alpha), we generally have enough evidence to reject the null hypothesis.
Observation:
Based on the obtained results, a z-test is conducted with the following details: the population mean is 1, the standard deviation for the population is 0.9351294, the sample mean is 1, and the z-test statistic value is 3 (inline values rounded as noted above). Since the z-test value is positive and higher than the right critical value (printed as 2), the result is TRUE, suggesting that there is enough evidence to reject the null hypothesis.
In hypothesis testing, a positive z-test value exceeding the critical value in the right tail of the distribution implies that the observed result is unlikely to have occurred by random chance alone. Therefore, it provides statistical evidence in favor of the alternative hypothesis. This outcome indicates that the data supports rejecting the null hypothesis in this case.
i. TASK 9
Finding the p value using the z value obtained in Task 8 and comparing the p value to alpha
#finding p value
p_value <- 1 - pnorm(Ztest_s5)
#checking p value is lesser than alpha
pvalue_alpha_comparison <- p_value < alpha
The p value is 7.160052 × 10^-4 (about 0.000716).
Is your p value smaller than alpha? TRUE
Observation:
The p value of about 0.0007 is far below alpha = 0.05, so the comparison returns TRUE. This agrees with the critical-value comparison in Task 8: the null hypothesis is rejected. Because the hypothesis in Task 8 is two-sided, the two-sided p value (sketched below) is the conventional quantity; at roughly 0.0014 it leads to the same conclusion.
j. TASK 10
Using ‘Sample 6’ size 29, testing the hypothesis that the
population mean is higher than 1.05, using alpha 0.01
#sample 6 size
ns6=29
#statistics for mean population and sd population
mean_population <- mean(M3_Project_dataset$Population)
sd_population <- sd(M3_Project_dataset$Population)
#removing 'NA' from sample 6 data
sample6_data <- na.omit(M3_Project_dataset$`Sample 6`)
#statistics for mean sample and sd sample
mean_sample6 <- mean(sample6_data)
sd_sample6<- sd(sample6_data)
#confidence level 99% and alpha 0.01
CL99_s6 = 0.99
alphas6 = (1-CL99_s6)
#finding T test score since n < 30
Ttest_s6 = (mean_sample6 - mean_population) / (sd_sample6/sqrt(ns6))
#finding Right Critical Value from the t distribution (df = n - 1, since this is a t-test)
Right_CVs6 <- qt(CL99_s6, df = ns6 - 1)
#Checking T test score is greater than Right CV
Ttest_rightCV_compare <- Ttest_s6 > Right_CVs6
#finding p value from the t distribution
p_value_s6 <- 1 - pt(Ttest_s6, df = ns6 - 1)
#checking p value is lesser than alpha
pvalue_alpha_comparison_s6 <- p_value_s6 < alphas6
The population mean is 1. The standard deviation for Sample 6 is 0.8483593. The sample mean is 1. The t-test statistic value is 0. (These inline values are rounded to the nearest integer.)
Since the t-test value is positive, the right critical value is 2, and checking whether the t-test value is higher than the right critical value gives FALSE.
Is your p value smaller than alpha? FALSE
Observation:
The t statistic does not exceed the right critical value, and the p value is not smaller than alpha = 0.01, so the null hypothesis is not rejected. At the 1% significance level, Sample 6 provides no evidence that the population mean is higher than 1.05. An equivalent check using t.test is sketched below.
3. CONCLUSION
In this statistical analysis project, the focus revolves around
the application of advanced statistical methodologies to estimate
confidence intervals and conduct hypothesis testing on key population
parameters. The project delves into the intricacies of deriving insights
from data by employing statistical tools such as mean, median, standard
deviation, and hypothesis testing. Three critical population
parameters—population means, population proportions, and population
standard deviations—are examined, emphasizing their significance in
statistical analysis.
The tasks undertaken span a variety of statistical procedures, including the computation of confidence intervals for sample means, proportions, and variances. The analysis involves tabular presentations of data, comparing population and sample statistics, and performing hypothesis tests to draw meaningful conclusions about the underlying populations. Notably, the project underscores the importance of confidence intervals in capturing the uncertainty associated with sample estimates and the application of hypothesis testing for informed decision-making. Despite encountering an anomaly in the calculation of required sample sizes in one task, the overall analysis showcases a comprehensive understanding of statistical concepts and their practical implementation for drawing meaningful inferences from data.
4. BIBLIOGRAPHY
1. Altman, D. G., & Gardner, M. J. (Eds.). (2000). Statistics with confidence: Confidence intervals and statistical guidelines (2nd ed.). BMJ Books.
2. Altman, D. G. (2005). Why we need confidence intervals. World Journal of Surgery, 29, 554–556. https://doi.org/10.1007/s00268-005-7911-0
3. King, A. P., & Aljabar, P. (2023). Chapter 13 - Statistics. In A. P. King & P. Aljabar, MATLAB Programming for Biomedical Engineers and Scientists (2nd ed., pp. 323–342). Academic Press. https://doi.org/10.1016/B978-0-32-385773-4.00022-8
4. Bluman, A. (2018). Elementary statistics: A step by step approach. Descriptive and inferential statistics (pp. 414–416).
5. APPENDIX
The final R Markdown report has been attached. The file name is Project3_Jayakumar.rmd.