knitr::opts_chunk$set(warning = FALSE)
We chose a dataset with 81 observations, focusing on the factors that affect coral calcification in the light (Calc.light).
library(readxl)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
coral <- read_xlsx("coral .xlsx")
ggplot(coral, aes(x = Production)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplots of Production")
ggplot(coral, aes(x = pCO2.med)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplots of pCO2.med")
ggplot(coral, aes(x = Calc.light)) +
  geom_boxplot(fill = "lightblue") +
  labs(title = "Boxplot of Calc.light")
round(cor(coral[, c("Production", "Respiration", "pCO2.med", "pH.med", "Calc.light")], use = "complete.obs"), 2)
            Production Respiration pCO2.med pH.med Calc.light
Production        1.00       -0.08     0.25  -0.33       0.16
Respiration      -0.08        1.00    -0.40   0.52       0.00
pCO2.med          0.25       -0.40     1.00  -0.75      -0.30
pH.med           -0.33        0.52    -0.75   1.00       0.10
Calc.light        0.16        0.00    -0.30   0.10       1.00
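vars <- c("Production", "Respiration", "pCO2.med", "pH.med", "Calc.light")
# Store the matrix under a name that does not shadow the cor() function
cor_mat <- round(cor(coral[, vars], use = "complete.obs"), 2)
# Plot the raw variables; pairs() on the correlation matrix would only plot the coefficients
pairs(coral[, vars])
EDA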
The three variables examined in the boxplots are Production, pCO2.med, and Calc.light. Production and pCO2.med are physiological and environmental measurements relevant to coral calcification, and Calc.light is the coral’s level of calcification in the light.
The central tendency of each variable is represented by the median line in each box.
“Production” has a median around 0.18 with a relatively narrow IQR, suggesting consistent production rates across samples. It appears slightly right-skewed, with a few outliers.
“pCO2.med” has a median around 600 and a wider IQR, indicating more variability. It is also slightly right-skewed, with some outliers.
“Calc.light” also shows a wider IQR and right skewness, with a median around 0.12.
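The medians, IQRs, and outliers read off the boxplots can also be computed numerically. The following is a minimal sketch, not the code used in this report: `box_summary` is a hypothetical helper, and the vector `x` is made-up toy data rather than values from the coral dataset. It applies the 1.5 × IQR rule that geom_boxplot uses by default to flag outliers.

```r
# Hypothetical helper: numeric counterpart of a boxplot for one variable.
box_summary <- function(x) {
  x <- x[!is.na(x)]                                  # drop missing values
  q <- quantile(x, c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - 1.5 * iqr                          # lower outlier fence
  upper <- q[3] + 1.5 * iqr                          # upper outlier fence
  list(
    median = q[2],
    iqr = iqr,
    outliers = x[x < lower | x > upper]
  )
}

# Toy data for illustration only (NOT from the coral dataset).
x <- c(1, 2, 3, 4, 100, NA)
res <- box_summary(x)
print(res)
```

With the data loaded, the same helper could be applied as box_summary(coral$Production).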
Among the candidate predictors, Calc.light correlates most strongly with pCO2.med (r = -0.30) and Production (r = 0.16). Its correlations with Respiration (r = 0.00) and pH.med (r = 0.10) are much weaker.
In the pairs plot, we can see a slight positive relationship between Production and Calc.light and a negative relationship between pCO2.med and Calc.light. All in all, Production and pCO2.med show some association with Calc.light, which warrants further statistical modeling. The strong inverse relationship between pCO2.med and pH.med (r = -0.75) indicates potential multicollinearity if both are included in a regression model. Therefore, we choose “Calc.light” as the response variable and “Production” and “pCO2.med” as the explanatory variables.
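To put a rough number on the multicollinearity concern: in a model with exactly two predictors, the variance inflation factor (VIF) is 1 / (1 - r^2), where r is the correlation between them. The sketch below plugs in the rounded correlations from the matrix above. Because that matrix was computed on complete cases of all five variables, the results are approximate, and the function and variable names are illustrative.

```r
# VIF for a two-predictor model: 1 / (1 - r^2),
# where r is the correlation between the two predictors.
vif_two <- function(r) 1 / (1 - r^2)

r_prod_pco2 <- 0.25   # Production vs pCO2.med (from the correlation matrix)
r_ph_pco2   <- -0.75  # pH.med vs pCO2.med (from the correlation matrix)

vif_chosen <- vif_two(r_prod_pco2)  # predictor pair used in this report
vif_ph     <- vif_two(r_ph_pco2)    # if pH.med were paired with pCO2.med instead

round(c(Production_pCO2 = vif_chosen, pH_pCO2 = vif_ph), 2)
```

The chosen pair gives a VIF of about 1.07, versus about 2.29 for pCO2.med paired with pH.med. For a fitted model, car::vif() reports the same kind of quantity.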
Hypotheses:
Null hypothesis: There is no relationship between calcification (light) and either photosynthesis production or oceanic carbon dioxide. (H0: β1 = β2 = … = βk = 0)
Alternative hypothesis: Oceanic carbon dioxide, photosynthesis production, or both have an effect on calcification (light). (H1: at least one βj ≠ 0, j = 1, …, k)
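Written out for the two explanatory variables chosen in the EDA, the model and hypotheses take the following form. The observation index i and the error term ε_i are notation added here for illustration:

```latex
\begin{align*}
\text{Calc.light}_i &= \beta_0 + \beta_1\,\text{Production}_i + \beta_2\,\text{pCO2.med}_i + \varepsilon_i \\
H_0 &:\ \beta_1 = \beta_2 = 0 \\
H_1 &:\ \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2\}
\end{align*}
```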
library(readxl)
library(car)
Loading required package: carData
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
library(lm.beta)
model <- lm(Calc.light ~ Production + pCO2.med , data = coral)
par(mfrow = c(2, 2))
plot(model)
summary(model)
Call:
lm(formula = Calc.light ~ Production + pCO2.med, data = coral)
Residuals:
      Min        1Q    Median        3Q       Max
-0.094142 -0.028893 -0.007885  0.027725  0.104036
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  9.175e-02  2.384e-02   3.849 0.000248 ***
Production   2.767e-01  1.235e-01   2.241 0.027984 *
pCO2.med    -4.602e-05  1.417e-05  -3.248 0.001740 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.04718 on 75 degrees of freedom
(3 observations deleted due to missingness)
Multiple R-squared: 0.145, Adjusted R-squared: 0.1222
F-statistic: 6.36 on 2 and 75 DF, p-value: 0.00281
lm.beta(model)
Call:
lm(formula = Calc.light ~ Production + pCO2.med, data = coral)
Standardized Coefficients::
(Intercept) Production pCO2.med
NA 0.2471812 -0.3582481
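For reference, a standardized coefficient rescales each slope by the ratio of the predictor's standard deviation to the response's standard deviation, which makes the two predictors comparable despite their different units. In the notation below (symbols introduced here, not part of the output), b_j is the unstandardized estimate, s_{x_j} the standard deviation of predictor j, and s_y the standard deviation of Calc.light:

```latex
\beta_j^{*} = b_j \, \frac{s_{x_j}}{s_y}
```

By this measure, pCO2.med (-0.36) has a larger effect in absolute terms than Production (0.25).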
Check assumptions
The diagnostic plots suggest that the model reasonably meets the key assumptions: linearity, normality of residuals, and homoscedasticity. Although minor deviations exist (for example, slight non-normality in the residuals and one high-leverage point), they are not severe enough to invalidate the model.
Statistical conclusion:
The F-test indicates that the model is significant (F = 6.36 on 2 and 75 df, P = 0.003), so we reject the null hypothesis and conclude that at least one partial regression coefficient differs from 0.
The t-tests on the individual partial regression coefficients support this: the P-value for Production was 0.028 and the P-value for pCO2.med was 0.002, both smaller than 0.05. Therefore, for each predictor we reject the null hypothesis that its coefficient equals 0.
The fit of this model is modest, with a residual standard error of 0.047 and an adjusted R-squared of 0.12.
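As a consistency check, the test statistics quoted here can be reproduced from the rounded numbers printed by summary(model). This is a sketch under that assumption: the inputs are copied from the output, so the results match the printed values only up to rounding, and the variable names are illustrative.

```r
# Values copied from the summary(model) output (rounded as printed).
est <- c(Production = 2.767e-01, pCO2.med = -4.602e-05)
se  <- c(Production = 1.235e-01, pCO2.med = 1.417e-05)
r2  <- 0.145            # multiple R-squared
n   <- 78               # 81 observations minus 3 dropped for missingness
k   <- 2                # number of predictors
df_resid <- n - k - 1   # residual degrees of freedom (75)

# t statistic and two-sided P-value for each slope.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)

# Overall F statistic and its P-value, computed from R-squared.
f_stat <- (r2 / k) / ((1 - r2) / df_resid)
f_p    <- pf(f_stat, k, df_resid, lower.tail = FALSE)

# Adjusted R-squared penalizes for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

print(round(t_val, 3))
print(signif(p_val, 3))
print(round(c(F = f_stat, p = f_p, adj_R2 = adj_r2), 4))
```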
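Scientific conclusion:
Photosynthesis production and oceanic CO2 are significant predictors of coral’s light calcification (P < 0.05). Calcification in the light increases with production and decreases as pCO2 rises. The model accounts for 12.2% of the variation in coral’s light calcification (adjusted R-squared).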
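To illustrate what the fitted coefficients imply in practice, the sketch below evaluates the regression equation by hand. The coefficients are copied from the summary(model) output. The input values 0.18 and 600 are the approximate medians read from the boxplots, not exact sample medians, and `predict_calc` is a hypothetical helper.

```r
# Coefficients copied from the summary(model) output.
b0     <- 9.175e-02
b_prod <- 2.767e-01
b_pco2 <- -4.602e-05

# Hypothetical helper: evaluate the fitted regression equation.
predict_calc <- function(production, pco2) {
  b0 + b_prod * production + b_pco2 * pco2
}

# Approximate medians read from the boxplots (not exact sample medians).
pred_median <- predict_calc(production = 0.18, pco2 = 600)

# Change in predicted Calc.light for a 100-unit increase in pCO2.med.
delta_100 <- b_pco2 * 100

round(c(pred_at_medians = pred_median, change_per_100_pCO2 = delta_100), 4)
```

The prediction at these values is about 0.114, close to the Calc.light median of roughly 0.12, and each 100-unit rise in pCO2.med lowers predicted calcification by about 0.0046.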