NAME: Joseph Kehoe STUDENT ID: 920697315 DATE: 11-15-1998
For this assignment, we want to use the wheat.csv file that we used for assignment 6. You have a dataset containing the yields of wheat samples treated with two different types of fertilizer: a control (no fertilizer) and a nitrogen-enriched (N) fertilizer. The goal is to determine whether there is a statistically significant difference in the yields produced by these two treatments.
Load the dataset and, if not already done, change the fertilizer treatment to a factor. Then use a boxplot to visualize the yield under the different fertilizers.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#load the data
raw_wheat<- read.csv("C:/Users/Joey/Downloads/wheat_yield.csv")
#change the fertilizer treatment to a factor (stored back in the data frame)
raw_wheat$Fertilizer <- as.factor(raw_wheat$Fertilizer)
str(raw_wheat$Fertilizer)
## Factor w/ 2 levels "Control","N": 1 1 1 1 1 1 1 1 1 1 ...
#boxplot
ggplot(raw_wheat,aes(x=Fertilizer,y=Yield..kg.ha., fill=Fertilizer))+geom_boxplot()
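Question 1: What are the null and alternative hypotheses for comparing the yields from the two fertilizer treatments? H0: mean N yield - mean Control yield = 0 (the two treatments have the same mean yield). H1: mean N yield - mean Control yield ≠ 0 (the mean yields differ).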
Separate the dataset by treatment. Then, use a two-sample t-test to compare the yields (assume variances are not equal).
#separate the dataset by the treatments
control_df <- raw_wheat %>% filter(Fertilizer == "Control")
N_df <- raw_wheat %>% filter(Fertilizer == "N")
#perform t-test and print the result
t_test_wheatyield <- t.test(control_df$Yield..kg.ha., N_df$Yield..kg.ha.,
alternative = "two.sided", var.equal = FALSE)
print(t_test_wheatyield)
##
## Welch Two Sample t-test
##
## data: control_df$Yield..kg.ha. and N_df$Yield..kg.ha.
## t = -5.5969, df = 51.234, p-value = 8.61e-07
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1098.0120 -518.3065
## sample estimates:
## mean of x mean of y
## 4236.207 5044.367
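As a side note, the same Welch test can be run without splitting the data frame by using the formula interface of `t.test`. The sketch below uses simulated data, so `wheat_sim`, its column name `Yield`, and all of the numbers are hypothetical stand-ins rather than the assignment data.

```r
# Simulated stand-in for the wheat data; values and column names are hypothetical
set.seed(42)
wheat_sim <- data.frame(
  Fertilizer = factor(rep(c("Control", "N"), each = 30)),
  Yield = c(rnorm(30, mean = 4200, sd = 500), rnorm(30, mean = 5000, sd = 600))
)

# Formula interface: response on the left, two-level grouping factor on the right
welch_formula <- t.test(Yield ~ Fertilizer, data = wheat_sim, var.equal = FALSE)

# Equivalent call on two separate vectors (first factor level is x, second is y)
welch_vectors <- t.test(
  wheat_sim$Yield[wheat_sim$Fertilizer == "Control"],
  wheat_sim$Yield[wheat_sim$Fertilizer == "N"],
  var.equal = FALSE
)

print(welch_formula)
```

With the formula interface the groups are compared in factor-level order (here Control minus N), which matters when choosing a one-sided `alternative`.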
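Question 2: How would you interpret the p-value? (1 point) The p-value (8.61e-07) is the probability of observing a difference in mean yields at least as extreme as the one in our samples if the null hypothesis (no difference in mean yields) were true. Because this probability is extremely small, our data are very unlikely under the null hypothesis, which is strong evidence that the mean yields differ.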
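To see where the t statistic, degrees of freedom, and p-value in a Welch test come from, they can be recomputed by hand and compared against `t.test`. This is a minimal sketch on simulated yields: the vectors `control` and `nitrogen` are hypothetical and are not the assignment data.

```r
# Hypothetical yields (kg/ha); simulated, not the assignment data
set.seed(1)
control <- rnorm(30, mean = 4200, sd = 500)
nitrogen <- rnorm(30, mean = 5000, sd = 600)

# Squared standard error of each sample mean
se2_c <- var(control) / length(control)
se2_n <- var(nitrogen) / length(nitrogen)

# Welch t statistic: difference in sample means over its standard error
t_manual <- (mean(control) - mean(nitrogen)) / sqrt(se2_c + se2_n)

# Welch-Satterthwaite approximation for the degrees of freedom
df_manual <- (se2_c + se2_n)^2 /
  (se2_c^2 / (length(control) - 1) + se2_n^2 / (length(nitrogen) - 1))

# Two-sided p-value: probability in both tails beyond |t| under the null
p_manual <- 2 * pt(-abs(t_manual), df = df_manual)

# Reference result from the built-in test
welch <- t.test(control, nitrogen, alternative = "two.sided", var.equal = FALSE)
c(t = t_manual, df = df_manual, p = p_manual)
```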
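Question 3: What can you conclude about the yields from the two types of fertilizers based on this test? (1 point) We can conclude that the mean yields of the two treatments differ significantly. The control mean (4236.207 kg/ha) is lower than the nitrogen mean (5044.367 kg/ha), and the 95 percent confidence interval for the difference (control minus N), -1098.0120 to -518.3065, does not contain 0.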
Question 4: Can we reject the null hypothesis? Why or why not? (1 point) Yes. Since our p-value (8.61e-07) is less than our α value of 0.05, we can reject the null hypothesis.
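The decision rule can also be written out in code. The sketch below, again on simulated and purely hypothetical yields, compares the p-value stored in the `t.test` result with α and checks that the confidence interval leads to the same decision.

```r
# Hypothetical yields (kg/ha); simulated, not the assignment data
set.seed(7)
control <- rnorm(30, mean = 4200, sd = 500)
nitrogen <- rnorm(30, mean = 5000, sd = 600)

alpha <- 0.05
res <- t.test(control, nitrogen, alternative = "two.sided",
              var.equal = FALSE, conf.level = 1 - alpha)

# Rule 1: reject H0 when the p-value is below alpha
reject_by_p <- res$p.value < alpha

# Rule 2: reject H0 when the (1 - alpha) confidence interval excludes 0
ci <- res$conf.int
reject_by_ci <- ci[1] > 0 || ci[2] < 0

cat("Reject H0 by p-value:", reject_by_p, "\n")
cat("Reject H0 by confidence interval:", reject_by_ci, "\n")
```

For a two-sided test the two rules agree, which is why a 95 percent interval that excludes 0 goes together with a p-value below 0.05.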
Write code using a t-test to determine whether the yield from using nitrogen-enriched fertilizer is significantly higher than the yield from using the control fertilizer.
# Perform a one-sided t-test
onesidet_test_wheatyield <- t.test( N_df$Yield..kg.ha., control_df$Yield..kg.ha.,
alternative = "greater", var.equal = FALSE)
# Print the results
print(onesidet_test_wheatyield)
##
## Welch Two Sample t-test
##
## data: N_df$Yield..kg.ha. and control_df$Yield..kg.ha.
## t = 5.5969, df = 51.234, p-value = 4.305e-07
## alternative hypothesis: true difference in means is greater than 0
## 95 percent confidence interval:
## 566.277 Inf
## sample estimates:
## mean of x mean of y
## 5044.367 4236.207
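Question 5: Based on the result of the t-test, is the yield from using nitrogen-enriched fertilizer significantly higher than the yield from using the control fertilizer? Why or why not? (1 point) Yes. The one-sided p-value (4.305e-07) is far below α = 0.05, so we reject the null hypothesis that the mean nitrogen yield is not higher than the mean control yield. The positive t-value (5.5969) shows that the difference is in the direction of the first variable (Nitrogen) having the larger mean, and the lower bound of the one-sided 95 percent confidence interval (566.277) is above 0.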
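The one-sided p-value reported above (4.305e-07) is exactly half of the two-sided one (8.61e-07), which is expected when the observed difference lies in the hypothesized direction. A minimal sketch of that relationship, using simulated and purely hypothetical yields rather than the assignment data:

```r
# Hypothetical yields (kg/ha); simulated, not the assignment data
set.seed(3)
control <- rnorm(30, mean = 4200, sd = 500)
nitrogen <- rnorm(30, mean = 5000, sd = 600)

# Same data, three different alternative hypotheses
two_sided <- t.test(nitrogen, control, alternative = "two.sided", var.equal = FALSE)
greater <- t.test(nitrogen, control, alternative = "greater", var.equal = FALSE)
less <- t.test(nitrogen, control, alternative = "less", var.equal = FALSE)

# The smaller one-sided p-value is half of the two-sided p-value,
# and the two one-sided p-values add up to 1
c(two_sided = two_sided$p.value,
  greater = greater$p.value,
  less = less$p.value)
```

Which one-sided test gives the small p-value depends on the sign of the t statistic, so the order of the two samples in the call and the choice of `alternative` have to match the question being asked.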