QUESTION 1

1a. The histogram of student heights is roughly bell shaped and symmetric with a single peak, which is consistent with a normal distribution. The QQ plot supports this, since most of the data points lie along a straight line from the lower left to the upper right, with only minor deviations at the tails. Overall, the normality assumption appears reasonable for most statistical analyses.

1b. The histogram of student pulse rates is skewed to the right, and the QQ plot points diverge from the line at both ends (pulse rates below about 50 and above about 100). Together, the right-skewed histogram and the QQ plot deviations suggest that student pulse rates are not normally distributed.

1c. The histogram of student GPA is skewed to the left, and the QQ plot points diverge from the line at both ends, especially in the upper right corner near a theoretical quantile of 2. Overall, the data suggest that student GPA is not normally distributed.

## 95% Confidence Interval for Height: ( 67.89634 ,  68.30895 )

1d. We are 95% confident that the true mean student height is between 67.89634 and 68.30895. The interval is narrow (about 0.41 wide), so the sample mean estimates the true mean height with high precision.
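As a quick check on the precision claim, the sample mean and margin of error can be recovered from the reported interval alone. This is only a sketch using the two endpoints printed above; the names `ci_height`, `center`, and `margin` are introduced here for illustration.

```r
# Endpoints of the reported 95% confidence interval for height
ci_height <- c(67.89634, 68.30895)

# The sample mean sits at the midpoint of a t-based interval
center <- mean(ci_height)

# The margin of error is half the interval width
margin <- diff(ci_height) / 2

cat("Sample mean:", round(center, 2), " Margin of error:", round(margin, 2), "\n")
```

The margin of error is roughly 0.21 around a mean of roughly 68.10, which is what "high precision" refers to here.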

## 99% Confidence Interval for Pulse: ( 73.02185 ,  75.41071 )

1e. We are 99% confident that the true mean student pulse rate is between 73.02185 and 75.41071. This interval is wider than the 95% CI for height for two reasons: a higher confidence level requires a larger margin of error, and pulse rates appear to be more variable than heights.
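To see how much of an interval's width comes from the confidence level alone, the sketch below compares 95% and 99% intervals on the same sample. The `pulse_sim` values are simulated for illustration only and are not the student data; the seed, sample size, mean, and standard deviation are assumptions.

```r
# Hypothetical pulse rates, simulated only for illustration (not the student data)
set.seed(42)
pulse_sim <- rnorm(150, mean = 74, sd = 10)

# Same data, two confidence levels
ci95 <- as.numeric(t.test(pulse_sim, conf.level = 0.95)$conf.int)
ci99 <- as.numeric(t.test(pulse_sim, conf.level = 0.99)$conf.int)

width95 <- diff(ci95)
width99 <- diff(ci99)

# The ratio of widths depends only on the t critical values
width_ratio <- width99 / width95
crit_ratio  <- qt(0.995, df = 149) / qt(0.975, df = 149)

cat("Width ratio (99% vs 95%):", round(width_ratio, 3), "\n")
```

Any difference in width beyond this ratio has to come from the standard error, that is, from the spread of the data and the sample size.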

QUESTION 2

## 
##  Welch Two Sample t-test
## 
## data:  fertilized_height and control_height
## t = -2.9507, df = 53.503, p-value = 0.004696
## alternative hypothesis: true difference in means is not equal to 0
## 90 percent confidence interval:
##  -0.8507992 -0.2349151
## sample estimates:
## mean of x mean of y 
##  2.039286  2.582143

2a. The difference between the sample means is mean(x) - mean(y) = 2.039286 - 2.582143 = -0.542857. Because the 90% confidence interval (-0.8507992, -0.2349151) does not include 0, we can reject the null hypothesis that the two group means are equal at the 10% significance level, which agrees with the p-value of 0.004696. The entire interval is negative, which suggests that fertilized plants are actually shorter on average than control plants.
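The arithmetic can be double-checked from the printed t-test output: the difference in sample means should sit at the midpoint of the confidence interval. A minimal sketch, with the values copied from the output above and variable names chosen here for illustration:

```r
# Values copied from the Welch t-test output
mean_fert <- 2.039286
mean_ctrl <- 2.582143
ci90 <- c(-0.8507992, -0.2349151)

# Point estimate of the difference (fertilized minus control)
diff_means <- mean_fert - mean_ctrl

# A t-based interval is symmetric around the point estimate
midpoint <- mean(ci90)

cat("Difference in means:", diff_means, " CI midpoint:", midpoint, "\n")
cat("Interval excludes 0:", all(ci90 < 0) || all(ci90 > 0), "\n")
```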

2b. The fertilizer manufacturer claims that using their fertilizer increases plant growth. For the data to support this, the confidence interval for the difference (fertilized minus control) should be entirely positive. Instead the interval is entirely negative, and the fertilized plants are shorter on average than the control plants; therefore, the manufacturer's claim of greater average plant growth is not supported.
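The manufacturer's claim is directional, so it can also be examined with a one-sided p-value. The sketch below rebuilds the p-values from the reported t statistic and Welch degrees of freedom rather than from the raw data, so the results are only as precise as the rounded output. With the data loaded, `t.test(fertilized_height, control_height, alternative = "greater")` should give the same one-sided test.

```r
# Test statistic and degrees of freedom copied from the Welch t-test output
t_stat   <- -2.9507
df_welch <- 53.503

# Two-sided p-value, which should reproduce the reported 0.004696
p_two_sided <- 2 * pt(-abs(t_stat), df = df_welch)

# One-sided p-value for H1: mean(fertilized) > mean(control)
p_greater <- pt(t_stat, df = df_welch, lower.tail = FALSE)

cat("Two-sided p-value:", signif(p_two_sided, 4), "\n")
cat("One-sided p-value (greater):", signif(p_greater, 4), "\n")
```

A one-sided p-value this close to 1 means the data give no evidence that fertilized plants grow taller.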

2c. The largest difference in average growth corresponds to the most negative value in the confidence interval, which is -0.8508. At 90% confidence, therefore, the largest plausible negative impact of fertilization is a mean height about 0.85 cm shorter than the control.

Code Appendix

knitr::opts_chunk$set(
    echo = FALSE,
    message = TRUE,
    warning = TRUE
)
# Load the data
student_dt <- read.csv("C:/Users/hemanth/Documents/STA 100/Exams/student.csv")
# Histogram of student height
hist(student_dt$height, main = "Histogram: Student Heights", xlab = "Height (cm)", col = "lightblue")

# QQ plot of student height
qqnorm(student_dt$height, main = "QQ Plot: Student Heights")

# Add reference line
qqline(student_dt$height, col = "purple")
# Histogram of student pulse
hist(student_dt$pulse, main = "Histogram: Student Pulse Rates", xlab = "Pulse Rate", col = "lightgreen")

# QQ plot of student pulse
qqnorm(student_dt$pulse, main = "QQ Plot: Student Pulse Rates")

# Add reference line
qqline(student_dt$pulse, col = "blue")
# Histogram of student GPA
hist(student_dt$hsGPA, main = "Histogram: High School GPA", xlab = "High School GPA", col = "lightcoral")

# QQ plot of student GPA
qqnorm(student_dt$hsGPA, main = "QQ Plot: High School GPA")

# Add reference line
qqline(student_dt$hsGPA, col = "orange")
# Mean of student height 
m_height <- mean(student_dt$height, na.rm = TRUE)

#Standard deviation of student height
sd_height <- sd(student_dt$height, na.rm = TRUE)

#Sum of non-missing student height values - generates sample size
n_height <- sum(!is.na(student_dt$height))

# Margin of error: t critical value times the standard error of the mean
me_height <- qt(0.975, df = n_height - 1) * (sd_height / sqrt(n_height))

# Lower and upper bounds of the confidence interval
ci_lower <- m_height - me_height
ci_upper <- m_height + me_height

# Format interval and print (cat() returns NULL, so its result is not assigned)
cat("95% Confidence Interval for Height: (", ci_lower, ", ", ci_upper, ")")

# Mean of student pulse 
mn_pulse <- mean(student_dt$pulse, na.rm = TRUE)

#Standard deviation of student pulse
sd_pulse <- sd(student_dt$pulse, na.rm = TRUE)

# Count of non-missing student pulse values - generates sample size
n_pulse <- sum(!is.na(student_dt$pulse))

# Margin of error: t critical value times the standard error of the mean
me_pulse <- qt(0.995, df = n_pulse - 1) * (sd_pulse / sqrt(n_pulse))

# Lower and upper bounds of the confidence interval
ci_lower_pulse <- mn_pulse - me_pulse
ci_upper_pulse <- mn_pulse + me_pulse

# Format interval and print (cat() returns NULL, so its result is not assigned)
cat("99% Confidence Interval for Pulse: (", ci_lower_pulse, ", ", ci_upper_pulse, ")")

# Load the Radish data
radish_dt <- read.csv("C:/Users/hemanth/Documents/STA 100/Exams/Radish.csv")
# Create a control and treatment group
control_height <- radish_dt$Height[radish_dt$Treatment == "Control"]
fertilized_height <- radish_dt$Height[radish_dt$Treatment == "Fertilized"]

# Welch two-sample t-test with a 90% confidence interval
# for mean(fertilized) - mean(control)
t_test <- t.test(fertilized_height, control_height, conf.level = 0.90)
print(t_test)