presentation Group 7

We chose a dataset with 81 observations and focus on the factors affecting coral calcification in the light (Calc.light).

Code
knitr::opts_chunk$set(warning = FALSE)
Code
library(readxl)  
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Code
coral <- read_xlsx("coral .xlsx") 
ggplot(coral, aes(x = Production)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of Production")

Code
ggplot(coral, aes(x = pCO2.med)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of pCO2.med")

Code
ggplot(coral, aes(x = Calc.light)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of Calc.light")

Code
round(cor(coral[, c("Production", "Respiration", "pCO2.med", "pH.med","Calc.light")], use = "complete.obs"), 2)
            Production Respiration pCO2.med pH.med Calc.light
Production        1.00       -0.08     0.25  -0.33       0.16
Respiration      -0.08        1.00    -0.40   0.52       0.00
pCO2.med          0.25       -0.40     1.00  -0.75      -0.30
pH.med           -0.33        0.52    -0.75   1.00       0.10
Calc.light        0.16        0.00    -0.30   0.10       1.00
Code
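# Plot the raw variables: pairs() on the 5 x 5 correlation matrix would only plot its rows,
# and assigning the matrix to `cor` would mask the stats::cor() function.
vars <- c("Production", "Respiration", "pCO2.med", "pH.med", "Calc.light")
pairs(coral[, vars])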

EDA

The three variables examined in the boxplots are Production, pCO2.med, and Calc.light. Production and pCO2.med are physiological and environmental measurements relevant to coral calcification. Calc.light is the level of coral calcification in the light.

The central tendency of each variable is represented by the median line in each box.

“Production” has a median around 0.18 with a relatively narrow IQR, suggesting consistent production rates across samples. It appears slightly right-skewed, with a few outliers.

“pCO2.med” has a median around 600 and a wider IQR, indicating more variability. It is also slightly right-skewed, with some outliers.

“Calc.light” also shows a wider IQR and right skewness, with a median around 0.12.
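The medians, IQRs, and outliers read off the boxplots can also be computed directly. The sketch below uses a made-up right-skewed vector `x` (hypothetical values, not the coral data); in the real analysis `x` would be a column such as `coral$Production`.

```r
# Hypothetical right-skewed sample; NOT the coral measurements.
x <- c(0.10, 0.12, 0.15, 0.16, 0.17, 0.18, 0.18, 0.19, 0.20, 0.22, 0.45)

med <- median(x, na.rm = TRUE)      # centre line of the box
iqr <- IQR(x, na.rm = TRUE)         # spread of the middle 50% (Q3 - Q1)
outliers <- boxplot.stats(x)$out    # points more than 1.5 * IQR beyond the hinges

print(c(median = med, IQR = iqr))
print(outliers)
```

`boxplot.stats()` applies the same 1.5 × IQR rule that `geom_boxplot()` uses by default to flag outlying points.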

Among the correlations with Calc.light, those with pCO2.med (r = -0.30) and Production (r = 0.16) are the largest, although both are weak. The correlations with pH.med (r = 0.10) and Respiration (r = 0.00) are weaker still.

In the pairs plot, we can see a slight positive relationship between Production and Calc.light and a negative relationship between pCO2.med and Calc.light. All in all, Production and pCO2.med show some association with Calc.light, which warrants further statistical modeling. The strong inverse relationship between pCO2.med and pH.med (r = -0.75) indicates potential multicollinearity if both are included in a regression model. Therefore, we choose “Calc.light” as the response variable and “Production” and “pCO2.med” as the explanatory variables.
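The multicollinearity concern can be quantified with variance inflation factors (VIF). The sketch below is illustrative only: it uses simulated predictors (`x1`, `x2`, `x3` are hypothetical names, with `x2` built to be strongly negatively related to `x1`) and a small helper `vif_manual()` written for this example, not the coral data.

```r
set.seed(1)
n <- 80
# Hypothetical predictors, NOT the coral data: x2 is constructed to be strongly
# negatively related to x1, similar in spirit to the pCO2.med / pH.med pattern.
x1 <- rnorm(n)
x2 <- -0.75 * x1 + rnorm(n, sd = 0.6)
x3 <- rnorm(n)                        # unrelated predictor for comparison

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing x_j on the other predictors.
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_manual(x1, data.frame(x2, x3)),
  x2 = vif_manual(x2, data.frame(x1, x3)),
  x3 = vif_manual(x3, data.frame(x1, x2))
)
print(round(vifs, 2))
```

For a fitted model, `car::vif(model)` reports the same quantity. A VIF near 1 means a predictor is almost unrelated to the others, and larger values indicate inflated standard errors.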

Hypotheses:

Null hypothesis: Calcification (light) is not related to photosynthetic production or oceanic carbon dioxide. (H0: β1 = β2 = 0)

Alternative hypothesis: Photosynthetic production, oceanic carbon dioxide, or both have an effect on calcification (light). (H1: at least one βj ≠ 0)
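Written out for the two chosen predictors, the regression model and the hypotheses take the following form. The subscript i (indexing observations) and the normal error term are the standard assumptions of multiple linear regression rather than something stated explicitly in this report.

```latex
\mathrm{Calc.light}_i = \beta_0 + \beta_1\,\mathrm{Production}_i + \beta_2\,\mathrm{pCO2.med}_i + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2)

H_0:\ \beta_1 = \beta_2 = 0
\qquad
H_1:\ \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2\}
```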

Code
library(readxl)  
library(car)
Loading required package: carData

Attaching package: 'car'
The following object is masked from 'package:dplyr':

    recode
The following object is masked from 'package:purrr':

    some
Code
library(lm.beta)

model <- lm(Calc.light ~ Production + pCO2.med , data = coral)
par(mfrow = c(2, 2))
plot(model)

Code
summary(model)

Call:
lm(formula = Calc.light ~ Production + pCO2.med, data = coral)

Residuals:
      Min        1Q    Median        3Q       Max 
-0.094142 -0.028893 -0.007885  0.027725  0.104036 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  9.175e-02  2.384e-02   3.849 0.000248 ***
Production   2.767e-01  1.235e-01   2.241 0.027984 *  
pCO2.med    -4.602e-05  1.417e-05  -3.248 0.001740 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.04718 on 75 degrees of freedom
  (3 observations deleted due to missingness)
Multiple R-squared:  0.145, Adjusted R-squared:  0.1222 
F-statistic:  6.36 on 2 and 75 DF,  p-value: 0.00281
Code
lm.beta(model)

Call:
lm(formula = Calc.light ~ Production + pCO2.med, data = coral)

Standardized Coefficients::
(Intercept)  Production    pCO2.med 
         NA   0.2471812  -0.3582481 
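As a rough illustration of what the standardized coefficients mean, the sketch below rescales each raw slope by sd(x) / sd(y) and checks that the result matches a regression fitted on z-scored variables. It uses simulated data (`d`, `x1`, `x2`, `y` are hypothetical), not the coral dataset.

```r
set.seed(42)
n <- 60
# Hypothetical data, NOT the coral measurements.
d <- data.frame(x1 = rnorm(n, 10, 2), x2 = rnorm(n, 600, 150))
d$y <- 0.1 + 0.03 * d$x1 - 0.0005 * d$x2 + rnorm(n, sd = 0.05)

fit <- lm(y ~ x1 + x2, data = d)

# Standardized slope = raw slope * sd(predictor) / sd(response)
b <- coef(fit)[c("x1", "x2")]
beta_std <- b * sapply(d[c("x1", "x2")], sd) / sd(d$y)

# Same quantity obtained by refitting on z-scored variables
z <- as.data.frame(scale(d))
beta_check <- coef(lm(y ~ x1 + x2, data = z))[c("x1", "x2")]

print(round(rbind(beta_std, beta_check), 4))
```

Because standardized coefficients are unit-free, their magnitudes can be compared across predictors measured on very different scales, such as Production and pCO2.med.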

Check assumptions

The diagnostic plots suggest that the model reasonably meets the key assumptions: linearity, normality of residuals, and homoscedasticity. Although minor deviations exist, for example slight non-normality in the residuals and one high-leverage point, they are not severe enough to invalidate the model.
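The visual checks can be complemented with numerical ones. The sketch below is a minimal example on simulated data (`d` and `fit` are hypothetical stand-ins for `coral` and `model`). The 2p/n leverage cut-off is a common rule of thumb, not a threshold used in this report.

```r
set.seed(7)
n <- 78
# Hypothetical data standing in for the coral dataset.
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

# Normality of residuals: Shapiro-Wilk test (a small p-value suggests non-normality)
sw <- shapiro.test(residuals(fit))

# Leverage: flag observations whose hat value exceeds 2 * p / n
h <- hatvalues(fit)
p <- length(coef(fit))
high_leverage <- which(h > 2 * p / n)

# Influence: Cook's distance combines leverage and residual size
cd <- cooks.distance(fit)

print(sw$p.value)
print(high_leverage)
print(round(max(cd), 3))
```

A high-leverage point matters most when it also has a large Cook's distance, so it is worth reading the two together.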

Statistical conclusion:

The F-test indicates that the model is significant (F = 6.36 on 2 and 75 df, P = 0.003), allowing us to reject the null hypothesis and conclude that at least one partial regression coefficient is not equal to 0.
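The overall F statistic can be reproduced from the values printed in the model summary. This sketch only re-derives the reported numbers from the rounded R-squared, so small rounding differences are expected.

```r
# Values copied from the summary(model) output above
r2       <- 0.145   # multiple R-squared
k        <- 2       # number of predictors
df_resid <- 75      # residual degrees of freedom

# F = (R^2 / k) / ((1 - R^2) / df_resid)
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

# Upper-tail probability of the F distribution with (k, df_resid) degrees of freedom
p_val <- pf(f_stat, df1 = k, df2 = df_resid, lower.tail = FALSE)

print(round(f_stat, 2))
print(signif(p_val, 3))
```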

The t-tests on the individual partial regression coefficients support this: the P-value for Production was 0.028 and the P-value for pCO2.med was 0.002, both smaller than 0.05. Therefore, for each predictor we can reject the null hypothesis that its coefficient equals 0.
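Each t statistic is the estimate divided by its standard error, compared against a t distribution with the residual degrees of freedom. The sketch below re-derives the reported values from the rounded coefficient table, so it agrees with the summary only up to rounding.

```r
# Estimates and standard errors copied from the coefficient table above
est <- c(Production = 2.767e-01, pCO2.med = -4.602e-05)
se  <- c(Production = 1.235e-01, pCO2.med = 1.417e-05)
df_resid <- 75

t_val <- est / se                                          # t = estimate / SE
p_val <- 2 * pt(abs(t_val), df_resid, lower.tail = FALSE)  # two-sided p-value

print(round(t_val, 3))
print(signif(p_val, 3))
```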

The fit of this model is modest, with a residual standard error of 0.047 on 75 degrees of freedom and an adjusted R-squared of 0.12.
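The adjusted R-squared penalizes the multiple R-squared for the number of predictors. As a quick check using the summary values, where n = 78 is inferred from 75 residual degrees of freedom plus 3 estimated coefficients:

```r
r2 <- 0.145   # multiple R-squared from the summary
n  <- 78      # observations used in the fit (81 minus 3 dropped for missingness)
k  <- 2       # number of predictors

# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

print(round(adj_r2, 4))
```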

Scientific conclusion:

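Photosynthetic production and oceanic CO2 are significant predictors of coral calcification in the light (P < 0.05): calcification increases with production and decreases with pCO2. However, the model accounts for only about 12.2% of the variation in light calcification (adjusted R-squared = 0.122), so most of the variation remains unexplained.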