Agronomists at UC Davis conducted a field experiment to evaluate the impact of a specific type of nitrogen fertilizer (N) on wheat yield, measured in kilograms per hectare. In the study, wheat was grown under two different conditions: one set of plots received N fertilizer at a rate of 50 kg/ha, while the other set served as a control group with no fertilizer applied.
#load necessary packages
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Load in the data, then str() it. (1 Point)
#load the data
raw_wheat <- read.csv("C:/Users/Joey/Downloads/wheat_yield.csv")
#if not already done, convert the fertilizer treatment column to a factor
raw_wheat$Fertilizer <- as.factor(raw_wheat$Fertilizer)
str(raw_wheat$Fertilizer)
## Factor w/ 2 levels "Control","N": 1 1 1 1 1 1 1 1 1 1 ...
What is the sample size for this experiment? How many samples do you have for each group? (1 Point) The sample size is 60, 27 in control and 33 in N.
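The sample sizes above can also be read off in code rather than counted by hand. The following is only a minimal sketch: `demo_wheat` is a hypothetical stand-in for `raw_wheat` with simulated yields, built to have the same group sizes reported in the answer (27 control, 33 N).

```r
# Hypothetical stand-in for raw_wheat: 27 control plots and 33 N plots,
# matching the group sizes reported above. The yield values are simulated.
set.seed(1)
demo_wheat <- data.frame(
  Fertilizer = factor(rep(c("Control", "N"), times = c(27, 33))),
  Yield..kg.ha. = rnorm(60, mean = 4700, sd = 650)
)

# Total sample size: one row per plot
n_total <- nrow(demo_wheat)

# Per-group sample sizes: tabulate the treatment factor
n_by_group <- table(demo_wheat$Fertilizer)

print(n_total)
print(n_by_group)
```

On the real data, the same two calls would be `nrow(raw_wheat)` and `table(raw_wheat$Fertilizer)`.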
Calculate the overall mean and standard deviation of yield for the entire dataset (2 Points)
#calculate the mean
mean_yield <- mean(raw_wheat$Yield..kg.ha.)
print(mean_yield)
## [1] 4680.695
#calculate the standard deviation
sd_yield <- sd(raw_wheat$Yield..kg.ha.)
print(sd_yield)
## [1] 677.4056
Calculate the z-score for the entire dataset (1 point).
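#set the alpha for a 95% confidence interval
alpha <- 0.05
#now find the critical z value for that confidence level (one value from the standard normal, not a z-score per observation)
n <- nrow(raw_wheat)
zyield <- qnorm(1 - alpha/2)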
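The value returned by `qnorm()` here is a critical value, which is different from standardizing each observation. A small sketch of the distinction, assuming a made-up vector `yield_demo` that is not part of the wheat data:

```r
# Critical z value for a two-sided 95% interval: the 97.5th percentile
# of the standard normal distribution
alpha <- 0.05
z_crit <- qnorm(1 - alpha / 2)
print(round(z_crit, 3))

# By contrast, a z-score standardizes each observation against the
# sample mean and standard deviation (yield_demo is hypothetical data)
yield_demo <- c(4100, 4550, 4700, 4900, 5300)
z_scores <- (yield_demo - mean(yield_demo)) / sd(yield_demo)
print(round(z_scores, 2))
```

The critical value depends only on `alpha`, so it is the same for any dataset; the per-observation z-scores depend on the data and always have mean 0 and standard deviation 1.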
Calculate the confidence interval of the mean for the entire data set. (3 points)
#first, calculate the standard error
SE_yield <- sd_yield/sqrt(n)
#next calculate the margin of error
ME_yield <- zyield*SE_yield
#now make the upper and lower bounds of the confidence interval
LB_yield <- mean_yield - ME_yield
UB_yield <- mean_yield + ME_yield
#finally, make a vector containing the lower bound, mean and upper bound to make the confidence interval.
Bound_interval <- c(LB_yield, mean_yield, UB_yield)
print(Bound_interval)
## [1] 4509.291 4680.695 4852.099
What does this confidence interval mean? What does it tell you about the effect of the fertilizer treatment on yield? (1 Point) This confidence interval means that, pooling all plots regardless of treatment, we are 95% confident that the true mean yield lies between 4509.291 and 4852.099 kg/ha. In other words, if the experiment were repeated many times, about 95% of the intervals built this way would contain the true mean. It doesn’t tell us about the effect of the fertilizer on yield because we haven’t yet separated the data into the control and treatment groups.
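As a cross-check, the same interval can be rebuilt from the summary statistics printed earlier. This is a sketch only: `z_ci_from_summary()` is a hypothetical helper written for illustration, and its inputs are the rounded mean, standard deviation, and sample size reported above.

```r
# Hypothetical helper: z-based confidence interval for a mean,
# computed from summary statistics instead of the raw data
z_ci_from_summary <- function(mean_x, sd_x, n, alpha = 0.05) {
  se <- sd_x / sqrt(n)              # standard error of the mean
  me <- qnorm(1 - alpha / 2) * se   # margin of error
  c(lower = mean_x - me, mean = mean_x, upper = mean_x + me)
}

# Rounded values taken from the output above
ci_all <- z_ci_from_summary(mean_x = 4680.695, sd_x = 677.4056, n = 60)
print(round(ci_all, 1))
```

Because the inputs are rounded, the result should land very close to, but not necessarily exactly on, the bounds printed above.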
Now let's look at the confidence intervals for the individual treatments. Separate the data set by treatment, calculate the means and standard deviation for each, then make two new confidence intervals.
#separate the dataset by the treatments
control_df <- raw_wheat %>% filter(Fertilizer == "Control")
N_df <- raw_wheat %>% filter(Fertilizer == "N")
#calculate the yield means for both treatments
mean_control <- mean(control_df$Yield..kg.ha.)
mean_N <- mean(N_df$Yield..kg.ha.)
#calculate the standard deviation and standard error for each
sd_control <- sd(control_df$Yield..kg.ha.)
sd_N <- sd(N_df$Yield..kg.ha.)
SE_control <- sd_control/sqrt(nrow(control_df))
print(SE_control)
## [1] 114.5185
SE_N <- sd_N/sqrt(nrow(N_df))
print(SE_N)
## [1] 87.95125
#calculate the margin of error for each of the treatments
#control
#degrees of freedom are the group sample size minus 1
DoFcontrol <- nrow(control_df) - 1
t_scorecontrol <- qt(1 - alpha/2, DoFcontrol)
ME_control <- t_scorecontrol*SE_control
#N
DofN <- nrow(N_df) - 1
t_scoreN <- qt(1 - alpha/2, DofN)
ME_N <- t_scoreN*SE_N
#calculate the upper and lower bounds for each of the treatment confidence intervals, and store them in a vector
#control
LB_control<- mean_control - ME_control
UB_control<- mean_control + ME_control
Vector_ME_control<- c(LB_control,UB_control)
print(Vector_ME_control)
## [1] 4000.811 4471.603
#N
LB_N<- mean_N - ME_N
UB_N<- mean_N + ME_N
Vector_ME_N <- c(LB_N, UB_N)
print(Vector_ME_N)
Is there a difference between the treatment and the control? Why or why not? Justify your answer using the results from your confidence intervals. (1 point) Yes, there is a difference between the treatment and the control. The mean of the N treatment is higher than the mean of the control, and the 95% confidence interval for the N treatment lies entirely above the control interval (4000.811 to 4471.603 kg/ha). Because the two intervals do not overlap, the higher yield under N is unlikely to be due to sampling variation alone.
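The per-treatment steps above could also be wrapped in a single function and applied to each group, with the overlap comparison done in code. This is a minimal sketch under stated assumptions: `demo_wheat` is simulated data with the same group sizes as the experiment, and `t_ci()` is a hypothetical helper, so the numbers it prints are not the experiment's results.

```r
# Simulated stand-in for the wheat data (not the real measurements)
set.seed(42)
demo_wheat <- data.frame(
  Fertilizer = factor(rep(c("Control", "N"), times = c(27, 33))),
  Yield = c(rnorm(27, mean = 4200, sd = 600),
            rnorm(33, mean = 5000, sd = 500))
)

# Hypothetical helper: t-based confidence interval for the mean of x
t_ci <- function(x, alpha = 0.05) {
  n <- length(x)
  se <- sd(x) / sqrt(n)                      # standard error
  me <- qt(1 - alpha / 2, df = n - 1) * se   # margin of error
  c(lower = mean(x) - me, mean = mean(x), upper = mean(x) + me)
}

# One column per treatment, rows are lower / mean / upper
ci_by_group <- sapply(split(demo_wheat$Yield, demo_wheat$Fertilizer), t_ci)
print(round(ci_by_group, 1))

# Two intervals overlap when each lower bound sits below the other's upper bound
intervals_overlap <-
  ci_by_group["lower", "N"] <= ci_by_group["upper", "Control"] &&
  ci_by_group["lower", "Control"] <= ci_by_group["upper", "N"]
print(intervals_overlap)
```

Non-overlapping 95% intervals point to a real difference between groups. The reverse does not hold: intervals that overlap slightly can still come from groups that differ, so a two-sample test is the more direct check in that case.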