Introduction

This project aims to investigate the relationship between daily weightlifting exercise and the ability to lift weight. Additionally, the project explores the effect of three different fertilizer mixtures on crop yield. The statistical techniques to be employed for analysis are the t-test and one-way ANOVA.

Number 1

To answer this question, a study with 100 subjects was created. On the first day, the number of squats each participant could do in one minute was measured while wearing a ten-pound vest (called pre data). Then each participant took part in an exercise program for a month. At the end of that month, the number of squats each participant could do in one minute was measured again while wearing a ten-pound vest (called post data). Please provide all descriptive statistics, inferential tests, and interpretations necessary to answer the research question.

To answer the research question “Is daily weightlifting exercise associated with a change in ability to lift weight?” using the provided data, we can perform several descriptive statistics and inferential tests.
dataset
WL<-data.frame(Pre= c(6.9, 4.8, 16.4, 19.8, 16.2, 21.1, 15, 14.9, 9.4, 16.5, 21.6, 19.3, 7.1, 31.7, 24.7, 9.8, 19.1, 3.3, 15.4, 17.9, 18.2, 19, 6.4, 3.7, 24.5, 15.9, 9.6, 29.9, 18.2, 30.6, 22.4, 21.7, 6.5, 1.4, 28.9, 13.9, 7.6, 12.9, 17, 35.8, 23.1, 14.9, 18.3, -0.1, 7.4, 10.6, 21.7, 4.1, 9.1, -1.9, 32, 3.9, 30.3, 23.5, 23.6, 21, 9, 29.6, 12, 15.6, 11.4, 11.3, 35.4, 20.1, 10.1, 15.2, 6.3, 14.3, 29.2, 15.5, 8.3, -8.2, 4.5, 11.7, 14.1, -1.2, 23.9, 10.7, 23.8, 17.2, 25.6, 8.4, 12.7, 15, 15.8, 34.4, 30, 19.6, 23, 13.2, 1, 31.7, 18.9, -1.5, 30.2, 8.8, 27.2, 22.5, 27, 8.8), Post = c(18.3, 13.4, 5.1, 8.4, 8.6, 26.2, 26.2, 21, 14, 9.1, 21.7, 9.5, 25.9, 26.7, 8.9, 9.4, 14.9, 13.6, 29.2, 10.4, 13.3, 18.4, 8, 8.2, 5.3, 27.7, 18.9, -0.9, 22, 11.5, 4.3, 26.9, 30, 13.6, 22, 15.4, 8.7, 11.7, 33, 31.3, 21.6, 25.7, 15.3, 17, 10, 32.8, 26.4, 19.4, 15.3, 25.4, 22.6, 10, 17.4, 27.9, 26, 24.7, 16.1, 21.3, 22, 5.6, 14.7, 23, 23.3, 31.1, 21.3, 24.5, 12.6, 13.9, 18.7, 18.5, 10, 17.5, 20.6, 22, 19.4, 14.3, 15.5, 12, 19.4, 15, 22.8, 9.2, 22.4, 22.9, 16.3, 23.2, 28.8, 15.3, 18, 13.6, 25.8, 28.7, 19.7, 17.5, 14.8, 9.5, 16, 29.1, 11.1, 18.2))
head(WL)
##    Pre Post
## 1  6.9 18.3
## 2  4.8 13.4
## 3 16.4  5.1
## 4 19.8  8.4
## 5 16.2  8.6
## 6 21.1 26.2

Let’s start by calculating the necessary descriptive statistics.

Descriptive Statistics
summary(WL)
##       Pre              Post      
##  Min.   :-8.200   Min.   :-0.90  
##  1st Qu.: 9.075   1st Qu.:13.12  
##  Median :15.700   Median :18.10  
##  Mean   :16.086   Mean   :18.02  
##  3rd Qu.:22.625   3rd Qu.:23.05  
##  Max.   :35.800   Max.   :33.00
# Calculate the standard deviation of each column
Std <- apply(WL, 2, sd)
Std
##      Pre     Post 
## 9.454164 7.309774
Descriptive statistics:
Pre-exercise mean: 16.09, standard deviation: 9.45
Post-exercise mean: 18.02, standard deviation: 7.31

Inferential Tests

To test whether daily weightlifting exercise is associated with a change in ability to lift weight, we can use a paired-samples t-test. This is appropriate because we are interested in comparing the means of two related samples (pre- and post-exercise).
Hypotheses:
Null hypothesis (H0): The mean number of squats performed in one minute is the same before and after the exercise program (the mean difference is 0).
Alternative hypothesis (Ha): The mean number of squats performed in one minute differs before and after the exercise program (the mean difference is not 0).
The paired-samples t-test assumes that the difference scores (post - pre) are normally distributed. To check this assumption, we can create a histogram and a normal probability plot of the difference scores.
# Compute difference scores
diff <- WL$Post - WL$Pre

# mean
Mean_diff<- mean(diff)
Mean_diff 
## [1] 1.939
# standard deviation
std_diff<- sd(diff)
std_diff
## [1] 10.87832
# Histogram
hist(diff)

# Normal probability plot
qqnorm(diff); qqline(diff)

The histogram and the normal probability plot show that the difference scores are approximately normally distributed.
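A formal test can complement the visual check. The sketch below wraps the Shapiro-Wilk test (`shapiro.test` from base R) in a small helper. The helper name `check_diff_normality` and the simulated vectors are illustrative assumptions and are not part of the original analysis.

```r
# Hypothetical helper: Shapiro-Wilk test on paired difference scores.
check_diff_normality <- function(pre, post) {
  stopifnot(length(pre) == length(post))
  d  <- post - pre
  sw <- shapiro.test(d)  # H0: the differences are normally distributed
  list(W = unname(sw$statistic), p_value = sw$p.value, n = length(d))
}

# Simulated data for illustration only (NOT the study data)
set.seed(1)
pre_sim  <- rnorm(100, mean = 16, sd = 9)
post_sim <- rnorm(100, mean = 18, sd = 7)

res <- check_diff_normality(pre_sim, post_sim)
res$p_value  # a large p-value gives no evidence against normality
```

With the study data, the call would be `check_diff_normality(WL$Pre, WL$Post)`.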
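Next, for comparison, we run the Welch two-sample t-test. Note that it treats the pre and post measurements as independent samples and therefore ignores the pairing: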
# Welch Two Sample t-test
t.test(WL$Pre, WL$Post, alternative = "two.sided")
## 
##  Welch Two Sample t-test
## 
## data:  WL$Pre and WL$Post
## t = -1.6225, df = 186.2, p-value = 0.1064
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -4.2965745  0.4185745
## sample estimates:
## mean of x mean of y 
##    16.086    18.025
The results of the test are as follows:
t-value: -1.6225
Degrees of freedom (df): 186.2
p-value: 0.1064
The alternative hypothesis is two-sided: the true difference in means is not equal to 0. The 95 percent confidence interval for the difference in means (pre minus post) ranges from -4.30 to 0.42, so it includes 0.
The sample estimates show that the mean pre-exercise squat count is 16.086, while the mean post-exercise squat count is 18.025.
Given the p-value of 0.1064, which is greater than the conventional significance level of 0.05, we do not have sufficient evidence to reject the null hypothesis. Based on this test, we cannot conclude that there is a statistically significant association between daily weightlifting exercise and a change in the ability to lift weight. Because the same participants were measured twice, however, the independence assumption behind this test does not hold for these data.
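The Welch statistic can be reproduced from the summary statistics reported earlier, which makes it clear that its standard error combines the two sample variances and does not use the pairing. This is a minimal sketch; the variable names are illustrative, and the inputs are the rounded values printed in the output.

```r
# Summary statistics as printed in the output above
n         <- 100
mean_pre  <- 16.086; sd_pre  <- 9.454164
mean_post <- 18.025; sd_post <- 7.309774

# Per-sample variance of the mean
v_pre  <- sd_pre^2  / n
v_post <- sd_post^2 / n

# Welch standard error and t statistic (pre minus post)
se_welch <- sqrt(v_pre + v_post)
t_welch  <- (mean_pre - mean_post) / se_welch

# Welch-Satterthwaite degrees of freedom
df_welch <- (v_pre + v_post)^2 /
  (v_pre^2 / (n - 1) + v_post^2 / (n - 1))

# Two-sided p-value
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

round(c(t = t_welch, df = df_welch, p = p_welch), 4)
```

The values agree with the `t.test()` output (t = -1.6225, df = 186.2, p-value = 0.1064) up to rounding.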
Because the pre and post measurements come from the same participants, we next conduct the paired-samples t-test, which is the appropriate test for this design.
# Paired t-test
t.test(WL$Pre, WL$Post, paired = TRUE)
## 
##  Paired t-test
## 
## data:  WL$Pre and WL$Post
## t = -1.7824, df = 99, p-value = 0.07774
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -4.0974939  0.2194939
## sample estimates:
## mean difference 
##          -1.939
The results of the test are as follows:
t-value: -1.7824
Degrees of freedom (df): 99
p-value: 0.07774
The alternative hypothesis is two-sided: the true mean difference is not equal to 0. The 95 percent confidence interval for the mean difference ranges from -4.10 to 0.22, so it includes 0.
The sample estimate of the mean difference (pre minus post) is -1.939; that is, participants performed on average 1.939 more squats after the program than before.
Since the p-value (0.07774) is greater than the conventional significance level of 0.05, we do not have sufficient evidence to reject the null hypothesis. Therefore, based on this paired t-test, we cannot conclude that there is a statistically significant association between daily weightlifting exercise and a change in the ability to lift weight.
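A paired t-test is a one-sample t-test on the difference scores, so its result can be checked by hand from the mean and standard deviation of the differences reported earlier. This is a minimal sketch with illustrative variable names, using the rounded values from the output.

```r
# Values printed in the output above
n         <- 100
mean_diff <- -1.939    # Pre - Post, the direction used by t.test(WL$Pre, WL$Post, paired = TRUE)
sd_diff   <- 10.87832  # sd of the differences (unaffected by the sign convention)

# Standard error of the mean difference and t statistic
se_diff  <- sd_diff / sqrt(n)
t_paired <- mean_diff / se_diff

# Two-sided p-value with n - 1 degrees of freedom
p_paired <- 2 * pt(-abs(t_paired), df = n - 1)

# 95% confidence interval for the mean difference
ci <- mean_diff + c(-1, 1) * qt(0.975, df = n - 1) * se_diff

round(c(t = t_paired, p = p_paired, lower = ci[1], upper = ci[2]), 4)
```

The values agree with the paired `t.test()` output (t = -1.7824, df = 99, p-value = 0.07774, interval -4.0975 to 0.2195) up to rounding.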

Both t-tests give p-values greater than the conventional significance level of 0.05, so neither leads us to reject the null hypothesis. The paired t-test is the appropriate test for this design, and on that basis we cannot conclude that there is a statistically significant association between daily weightlifting exercise and a change in the ability to lift weight.

Number 2

Test the effect of three different fertilizer mixtures on crop yield. Use a one-way ANOVA to find out if there is a difference in crop yields between the three groups.

ANOVA stands for Analysis of Variance. It is a statistical technique used to compare the means of two or more groups and determine if there are significant differences among them. ANOVA tests whether the variability between group means is greater than the variability within the groups.
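The comparison of between-group and within-group variability can be made concrete on a small example. The sketch below uses hypothetical toy values (not the crop data) chosen so that the arithmetic is easy to follow, and checks the hand-computed F statistic against `aov()`.

```r
# Toy data: hypothetical values, three groups of three observations
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 3)),
  y     = c(1, 2, 3,  2, 3, 4,  5, 6, 7)
)

grand_mean  <- mean(toy$y)
group_means <- tapply(toy$y, toy$group, mean)
group_sizes <- tapply(toy$y, toy$group, length)

# Between-group sum of squares: group means around the grand mean
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)

# Within-group sum of squares: observations around their own group mean
ss_within <- sum((toy$y - ave(toy$y, toy$group))^2)

k <- nlevels(toy$group)  # number of groups
N <- nrow(toy)           # total number of observations

ms_between <- ss_between / (k - 1)
ms_within  <- ss_within / (N - k)
f_manual   <- ms_between / ms_within

# The same F statistic from aov()
f_aov <- summary(aov(y ~ group, data = toy))[[1]][["F value"]][1]

c(manual = f_manual, aov = f_aov)
```

Here the between-group sum of squares is 26 and the within-group sum of squares is 6, giving F = 13 / 1 = 13.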
Loading the necessary library
library(tidyverse) # for data cleaning and EDA
Importing the data
Fert <- read.csv("C:/Users/user/Desktop/FACTUALS/crop.data_.anova_/crop.data.csv", header = TRUE)
head(Fert)
##   density block fertilizer    yield
## 1       1     1          1 177.2287
## 2       2     2          1 177.5500
## 3       1     3          1 176.4085
## 4       2     4          1 177.7036
## 5       1     1          1 177.1255
## 6       2     2          1 176.7783
Changing categorical variables into the appropriate datatype.
Fert$fertilizer<-as.factor(Fert$fertilizer)
Fert$density<-as.factor(Fert$density)
Fert$block<-as.factor(Fert$block)
glimpse(Fert)
## Rows: 96
## Columns: 4
## $ density    <fct> 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,…
## $ block      <fct> 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4,…
## $ fertilizer <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ yield      <dbl> 177.2287, 177.5500, 176.4085, 177.7036, 177.1255, 176.7783,…
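The conversion matters for the ANOVA itself. If the grouping variable is left numeric, `aov()` fits a straight-line trend with 1 degree of freedom instead of comparing group means. The sketch below illustrates this on hypothetical toy values, not the crop data.

```r
# Toy data: hypothetical values, grouping variable stored as integers 1, 2, 3
toy <- data.frame(
  fertilizer = rep(1:3, each = 3),
  yield      = c(1, 2, 3,  2, 3, 4,  5, 6, 7)
)

# Numeric predictor: a linear trend, 1 degree of freedom
df_numeric <- summary(aov(yield ~ fertilizer, data = toy))[[1]][["Df"]][1]

# Factor predictor: one mean per group, k - 1 = 2 degrees of freedom
df_factor <- summary(aov(yield ~ factor(fertilizer), data = toy))[[1]][["Df"]][1]

c(numeric = df_numeric, factor = df_factor)
```

Only the factor version corresponds to the one-way ANOVA hypothesis that all group means are equal.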
Descriptive statistics
summary(Fert)
##  density block  fertilizer     yield      
##  1:48    1:24   1:32       Min.   :175.4  
##  2:48    2:24   2:32       1st Qu.:176.5  
##          3:24   3:32       Median :177.1  
##          4:24              Mean   :177.0  
##                            3rd Qu.:177.4  
##                            Max.   :179.1
#standard deviation
sd_yield<-sd(Fert$yield)
sd_yield
## [1] 0.6645476
There are 96 observations of 4 variables: 3 categorical variables and 1 numerical variable. The observations are evenly distributed across the levels of each categorical variable (48 per density level, 24 per block, and 32 per fertilizer group).
The numerical variable, yield, has a mean of 177.0, a median of 177.1, and a standard deviation of 0.66.
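Since the ANOVA compares group means, descriptive statistics broken down by fertilizer group are a useful complement to the overall summary. The sketch below shows one way to compute them with `tapply()`; the data frame holds hypothetical toy values, not the crop data.

```r
# Toy data: hypothetical values, two observations per fertilizer group
toy <- data.frame(
  fertilizer = factor(rep(1:3, each = 2)),
  yield      = c(176, 177,  177, 178,  178, 179)
)

# Group size, mean, and standard deviation by fertilizer
group_n    <- tapply(toy$yield, toy$fertilizer, length)
group_mean <- tapply(toy$yield, toy$fertilizer, mean)
group_sd   <- tapply(toy$yield, toy$fertilizer, sd)

group_stats <- data.frame(
  fertilizer = names(group_mean),
  n          = as.vector(group_n),
  mean       = as.vector(group_mean),
  sd         = as.vector(group_sd)
)
group_stats
```

With the crop data, the same calls would use `Fert$yield` and `Fert$fertilizer`.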

To test the effect of three different fertilizer mixtures on crop yield, we will use a one-way analysis of variance (ANOVA).

H0: μ1 = μ2 = μ3 (The population means of crop yields for groups 1, 2, and 3 are equal)
Ha: At least one population mean of crop yields is different (There is a significant difference in crop yields between the groups)
attach(Fert)
mod<-aov(yield~fertilizer)
summary(mod)
##             Df Sum Sq Mean Sq F value Pr(>F)    
## fertilizer   2   6.07  3.0340   7.863  7e-04 ***
## Residuals   93  35.89  0.3859                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Based on the results of the one-way ANOVA test, the F-value is 7.863, and the corresponding p-value is 0.0007 (7e-04).
Since the p-value (0.0007) is less than the commonly chosen significance level of 0.05, we can reject the null hypothesis. This suggests that there is a significant difference in crop yields between the three groups treated with different fertilizer mixtures.
The results indicate that the fertilizer mixtures have a statistically significant effect on crop yield. The fertilizer Mean Square (3.0340) is the between-group Sum of Squares (6.07) divided by its 2 degrees of freedom, and the residual Mean Square (0.3859) is the within-group Sum of Squares (35.89) divided by its 93 degrees of freedom. The F-value is the ratio of these two mean squares.
In summary, the one-way ANOVA test supports the alternative hypothesis, providing evidence that the fertilizer mixtures have a significant impact on crop yields.
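The F-value and p-value can be checked directly from the mean squares and degrees of freedom in the ANOVA table. This is a minimal sketch with illustrative variable names, using the rounded values printed above.

```r
# Values from the ANOVA table above
ms_fertilizer <- 3.0340  # between-group mean square
ms_residual   <- 0.3859  # within-group (residual) mean square
df_fertilizer <- 2
df_residual   <- 93

# F statistic: ratio of between-group to within-group mean square
f_value <- ms_fertilizer / ms_residual

# Upper-tail probability of the F distribution
p_value <- pf(f_value, df_fertilizer, df_residual, lower.tail = FALSE)

c(F = round(f_value, 3), p = signif(p_value, 1))
```

The result agrees with the reported F-value of 7.863 and p-value of 7e-04, up to rounding of the inputs.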

To determine which specific fertilizer groups are statistically different from one another in a pairwise manner, we can conduct post-hoc tests. One commonly used post-hoc test for comparing multiple groups is the Tukey’s Honestly Significant Difference (HSD) test. It allows us to make pairwise comparisons while controlling for the family-wise error rate.

TukeyHSD(mod)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = yield ~ fertilizer)
## 
## $fertilizer
##          diff         lwr       upr     p adj
## 2-1 0.1761687 -0.19371896 0.5460564 0.4954705
## 3-1 0.5991256  0.22923789 0.9690133 0.0006125
## 3-2 0.4229569  0.05306916 0.7928445 0.0208735

Based on the post-hoc test using Tukey’s multiple comparisons of means, we can compare the means of each group to identify which ones are statistically different from one another.

The results indicate that the mean yield for group 3 is significantly different from that of group 1 (p-value = 0.0006) and that of group 2 (p-value = 0.0209). However, there is no statistically significant difference in mean yield between groups 1 and 2 (p-value = 0.4955).
Therefore, we can conclude that the fertilizer mixture used in group 3 produced the highest mean yield, about 0.60 above group 1 and 0.42 above group 2 on the yield scale. Meanwhile, the yields obtained from groups 1 and 2 are not significantly different from each other.
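The Tukey intervals can be reconstructed from the ANOVA table, which shows where the family-wise adjustment enters. The sketch below assumes equal group sizes of 32 (as in the summary output) and uses the rounded residual mean square; the variable names are illustrative.

```r
# Values from the outputs above
mse         <- 0.3859  # residual mean square
df_residual <- 93
k           <- 3       # number of fertilizer groups
n_per_group <- 32

# Observed differences in group means from the TukeyHSD output
diffs <- c("2-1" = 0.1761687, "3-1" = 0.5991256, "3-2" = 0.4229569)

# Standard error of a difference between two group means
se_diff <- sqrt(mse * (1 / n_per_group + 1 / n_per_group))

# Critical value from the studentized range distribution
q_crit <- qtukey(0.95, nmeans = k, df = df_residual)

# Half-width shared by all three intervals (equal group sizes)
half_width <- q_crit / sqrt(2) * se_diff

tukey_ci <- data.frame(
  diff = diffs,
  lwr  = diffs - half_width,
  upr  = diffs + half_width
)
round(tukey_ci, 4)
```

The interval for 2-1 contains 0, while those for 3-1 and 3-2 lie entirely above 0, which matches the adjusted p-values reported by `TukeyHSD()`.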

Summary

In our inferential statistics project, we used t-tests to investigate the association between daily weightlifting exercise and the ability to lift weight. Both the Welch two-sample t-test (p = 0.1064) and the paired t-test (p = 0.07774) gave p-values greater than the conventional significance level of 0.05. Therefore, we do not have sufficient evidence to reject the null hypothesis. Based on the paired t-test, which matches the study design, we cannot conclude that there is a statistically significant association between daily weightlifting exercise and a change in the ability to lift weight.

Additionally, we performed a one-way ANOVA to test the effect of three different fertilizer mixtures on crop yield. The ANOVA results indicated a statistically significant difference between the groups (F = 7.863, p = 0.0007). Post-hoc Tukey HSD comparisons showed that the fertilizer mixture used in group 3 had a significantly higher mean crop yield than groups 1 and 2, while the yields of groups 1 and 2 were not significantly different from each other. We therefore conclude that, among the three mixtures, the one used in group 3 is the most effective in increasing crop yield.

These findings provide insights into the relationship between weightlifting exercise and the ability to lift weight, as well as the impact of different fertilizer mixtures on crop yield. However, it is important to note that these conclusions are based on the specific data analyzed and the chosen significance level.
