ENV221 23-24 Final Exam
Task 1 Environmental Data Analysis with “airquality” Dataset. (60 points)
You are an environmental scientist analyzing air quality data for New York City during the summer of 1973 using the built-in R dataset “airquality”. Your goal is to perform various data analysis tasks and visualizations on this dataset to gain insights and draw meaningful conclusions.
Subtask 1
Load the airquality dataset. Calculate the average ozone concentration (Ozone) for the entire dataset. Store the result in the variable named mean_ozone. Print out mean_ozone. Note: NA values must be removed before averaging. (4 points)
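One possible approach to this subtask is sketched below. It assumes that dropping missing values through the `na.rm` argument of `mean()` is what the note asks for.

```r
# Load the built-in dataset (already available in base R; data() makes it explicit).
data(airquality)

# Average ozone concentration, ignoring missing (NA) readings.
mean_ozone <- mean(airquality$Ozone, na.rm = TRUE)
print(mean_ozone)
```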
Subtask 2
Find the row with the highest ozone concentration and store the result in the variable named max_ozone_day. Identify the corresponding weather conditions (Solar.R, Temp, Wind) of that day and store the results in the variable named max_ozone_conditions. Print out max_ozone_conditions. (8 points)
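A minimal sketch of the row lookup described above. It relies on `which.max()`, which skips NA values and returns the index of the first maximum, so ties (if any) would resolve to the earliest row.

```r
data(airquality)

# Row holding the highest recorded ozone concentration.
max_ozone_day <- airquality[which.max(airquality$Ozone), ]

# Weather conditions recorded on that day.
max_ozone_conditions <- max_ozone_day[, c("Solar.R", "Temp", "Wind")]
print(max_ozone_conditions)
```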
# Please insert your code here
Subtask 3
Calculate the correlation matrix between Ozone, Solar.R, Temp, and Wind. Store the result in the variable named cor_matrix. Print out cor_matrix. (6 points)
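A sketch of one way to build the matrix. The task does not say how to treat missing values, so the choice of `use = "complete.obs"` (keep only rows where all four variables are present) is an assumption; `use = "pairwise.complete.obs"` is an alternative that gives slightly different numbers.

```r
data(airquality)

# Pearson correlations among the four variables, using complete rows only.
cor_vars <- airquality[, c("Ozone", "Solar.R", "Temp", "Wind")]
cor_matrix <- cor(cor_vars, use = "complete.obs")
print(cor_matrix)
```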
# Please insert your code here
Subtask 4
Create a scatter plot matrix to visualize the relationships between Ozone and weather variables. Give the R code to create the scatter plot. (8 points)
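A scatter plot matrix can be drawn with base R's `pairs()`. The sketch below is one option; the title text is an illustrative choice, not something the task prescribes.

```r
data(airquality)

# Pairwise scatter plots of ozone against the three weather variables.
plot_vars <- airquality[, c("Ozone", "Solar.R", "Temp", "Wind")]
pairs(plot_vars,
      main = "Ozone and weather variables, New York, summer 1973")
```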
# Please insert your code here
Subtask 5
Calculate the 95% confidence interval for the mean Ozone concentration. (6 points)
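A sketch of a t-based interval, assuming the population standard deviation is unknown and the missing ozone readings are simply dropped. The same interval is available from `t.test(ozone)$conf.int`.

```r
data(airquality)

# Keep only the observed ozone readings.
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

n <- length(ozone)
standard_error <- sd(ozone) / sqrt(n)

# Two-sided 95% interval: mean +/- t(0.975, n - 1) * SE.
t_crit <- qt(0.975, df = n - 1)
ci_ozone <- mean(ozone) + c(-1, 1) * t_crit * standard_error
print(ci_ozone)
```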
# Please insert your code here
Subtask 6
Perform a hypothesis test to investigate whether there is a significant difference in ozone concentration between the months of June and July at the significance level of 0.05. (24 points)
What hypothesis test should you choose from those introduced in this module? (4 points)
Provide your answer here:
What are your null and alternative hypotheses? (4 points)
Provide your answer here:
Give the R code to apply the hypothesis test. (4 points)
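As an illustration only, the sketch below applies a two-sided two-sample t-test, which is one candidate for comparing two independent months. Whether this is the test the module intends is an assumption. R's `t.test()` runs the Welch (unequal variance) version by default; `var.equal = TRUE` would give the pooled Student version, and `wilcox.test()` is a rank-based alternative if normality is doubtful.

```r
data(airquality)

# Ozone readings for June (Month == 6) and July (Month == 7).
ozone_june <- airquality$Ozone[airquality$Month == 6]
ozone_july <- airquality$Ozone[airquality$Month == 7]

# Two-sided test of equal means; t.test() drops NA values itself.
test_result <- t.test(ozone_june, ozone_july,
                      alternative = "two.sided", conf.level = 0.95)
print(test_result)

# The test statistic and p-value can be extracted from the result object.
test_result$statistic
test_result$p.value
```

The decision then follows from comparing `test_result$p.value` with the 0.05 significance level.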
# Please insert your code here
Give the value of the test statistic. (4 points)
# Please insert your code here
Give the p-value. (4 points)
# Please insert your code here
Give the decision about the hypothesis. (4 points)
Provide your answer here:
Give the final conclusion. (4 points)
Provide your answer here:
Subtask 7
Create a histogram of Wind speed (Wind) with appropriate labels and titles. (4 points)
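A minimal sketch using base R's `hist()`. The unit (mph) comes from the dataset's help page; the title wording and colours are illustrative choices.

```r
data(airquality)

wind <- airquality$Wind

# Histogram of wind speed with a title and axis labels.
hist(wind,
     main = "Distribution of wind speed, New York, May to September 1973",
     xlab = "Wind speed (mph)",
     ylab = "Frequency",
     col = "lightblue",
     border = "white")
```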
# Please insert your code here
Task 2 Step by step hypothesis test (40 points)
Effluents from wastewater treatment plants contain nutrients and organic and inorganic pollutants, and are an important source of urban river pollution. This pollution affects organisms living in the rivers as well as human health. To determine whether effluent from wastewater treatment plants influences microorganisms in urban rivers, a research group collected water samples from a river in Suzhou, both upstream (before receiving effluent) and downstream (after receiving effluent) of a municipal wastewater treatment plant. The numbers of bacterial species measured in the water samples collected in different seasons are as follows.
Subtask 1
What hypothesis test should you choose to estimate how wastewater treatment plant effluent, season, and the interaction between these two factors affect the number of bacterial species?
Provide your answer here:
Subtask 2 (Continued)
What are your three null hypotheses?
Provide your answer here:
H01:
H02:
H03:
Subtask 3 (Continued)
Apply the hypothesis test STEP BY STEP by answering the following questions. Give the reproducible R code to calculate the between-group degrees of freedom for effluent (1 point), season (1 point), and the interaction between effluent and season (1 point), and the within-group degrees of freedom (1 point).
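For a balanced two-factor design with replication, the degrees of freedom depend only on the number of levels and replicates. The sketch below assumes such a design; `anova_df` is a hypothetical helper, and the values passed for `b` and `n` are placeholders to be replaced with the counts from the data table, not figures taken from the exam.

```r
# a = levels of effluent, b = levels of season, n = replicates per cell.
anova_df <- function(a, b, n) {
  c(effluent    = a - 1,
    season      = b - 1,
    interaction = (a - 1) * (b - 1),
    within      = a * b * (n - 1))
}

# a = 2 reflects upstream vs downstream; b and n are placeholder values.
df_all <- anova_df(a = 2, b = 4, n = 3)
print(df_all)
```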
# Insert your code here
Subtask 4 (Continued)
Give the reproducible R code to calculate the mean square (the mean of squared deviations from the mean) for effluent (2 points), season (2 points), and the interaction between effluent and season (2 points).
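The mean squares can be computed by hand from group means. The sketch below assumes a balanced design and a response vector `y` with two grouping vectors; `two_way_ms` and its argument names are hypothetical, since the data table is not reproduced here.

```r
# y: number of bacterial species; A: effluent factor; B: season factor.
two_way_ms <- function(y, A, B) {
  A <- factor(A)
  B <- factor(B)
  a <- nlevels(A)
  b <- nlevels(B)
  N <- length(y)

  grand   <- mean(y)
  mean_A  <- ave(y, A)      # mean of each A level, repeated per observation
  mean_B  <- ave(y, B)      # mean of each B level, repeated per observation
  mean_AB <- ave(y, A, B)   # mean of each A x B cell, repeated per observation

  # Sums of squares (these decompositions hold for balanced designs).
  ss <- c(effluent    = sum((mean_A - grand)^2),
          season      = sum((mean_B - grand)^2),
          interaction = sum((mean_AB - mean_A - mean_B + grand)^2),
          within      = sum((y - mean_AB)^2))

  df <- c(a - 1, b - 1, (a - 1) * (b - 1), N - a * b)

  # Mean square = sum of squares / degrees of freedom.
  ss / df
}
```

For a balanced dataset the result should agree with the "Mean Sq" column of `anova(aov(y ~ A * B))`.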
# Insert your code here
Subtask 5 (Continued)
Give the reproducible R code to calculate the value of the test statistic for effluent (1 point), season (1 point), and interaction between effluent and season (1 point).
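In a fixed-effects two-way analysis of variance, each test statistic is the ratio of an effect's mean square to the within-group mean square. A small sketch, with `f_statistic` as a hypothetical helper name:

```r
# ms_effect: mean square(s) for effluent, season and/or interaction.
# ms_within: within-group (residual) mean square.
f_statistic <- function(ms_effect, ms_within) {
  ms_effect / ms_within
}
```

The function is vectorised, so passing the three effect mean squares at once returns the three F values.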
# Insert your code here
Subtask 6 (Continued)
At the significance level of 0.05, give the critical value of the test statistic for effluent (1 point), season (1 point), and interaction between effluent and season (1 point).
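Critical values of the F distribution come from the quantile function `qf()`. The sketch below assumes a one-sided (upper-tail) rejection region, as is usual for analysis of variance; `f_critical` is a hypothetical helper name.

```r
# Upper-tail critical value of F for a given effect.
# df_effect: between-group degrees of freedom; df_within: within-group degrees of freedom.
f_critical <- function(df_effect, df_within, alpha = 0.05) {
  qf(1 - alpha, df1 = df_effect, df2 = df_within)
}
```

An effect's null hypothesis is rejected when its F statistic exceeds the corresponding critical value.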
# Insert your code here
Subtask 7 (Continued)
Give the reproducible R code to calculate the p-value for effluent (2 points), season (2 points), and the interaction between effluent and season (2 points).
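The p-value is the upper-tail probability of the F distribution at the observed statistic, which `pf()` returns with `lower.tail = FALSE`. A sketch, with `f_p_value` as a hypothetical helper name:

```r
# Probability of an F value at least as large as the observed one under H0.
f_p_value <- function(f_value, df_effect, df_within) {
  pf(f_value, df1 = df_effect, df2 = df_within, lower.tail = FALSE)
}
```

A p-value below 0.05 leads to the same decision as an F statistic above the critical value.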
# Insert your code here
Subtask 8 (Continued)
Give the decisions about your hypotheses.
Provide your answer here:
Subtask 9 (Continued)
What are your conclusions?
Provide your answer here: