The Great Barrier Reef (GBR) is among the most diverse and productive ecosystems in the world, providing a range of critical services to ecological and anthropogenic communities. However, damaging waves generated by cyclones present a major threat to reef health. Modelling and predicting cyclone impacts on reefs presents a range of empirical and theoretical challenges. Notably, models must merge high-resolution atmosphere and wave data with low-resolution empirical data from reef monitoring programs. Our model, based on cyclone-induced wave conditions over 33 years at 10 reefs in the Southern GBR, provides a novel and scalable approach to modelling cyclone-induced reef damage.
We find that cyclone duration and wave conditions are important drivers of coral cover change, with prolonged exposure to storm-generated waves generally associated with greater coral loss. Among the models tested, Random Forest provided the strongest predictive performance, revealing non-linear relationships between cyclone exposure, wave characteristics, and reef condition. This highlights the importance of considering storm intensity and exposure duration when assessing cyclone impacts on coral reefs.
Beyond providing a novel approach for modelling cyclone impacts, we developed an interactive exhibit for educators and community education, translating our complex model into a practical tool for engaging the non-scientific public. The app includes a ‘design-your-own cyclone’ feature based on our model results, allowing users to interactively explore the metrics that contribute to cyclone damage to reefs.
This project (Figure 1) presents a robust, scalable approach to future modelling and predicting cyclone impacts on reefs. Additionally, by translating complex modelling approaches into accessible, interactive platforms for education and science communication, our project framework represents a more inclusive approach to communicating results beyond the scientific community.
Figure 1: Flowchart showing the key steps taken during our project workflow. Identification of the problem, dataset sourcing, processing and modelling steps are shown.
1 Introduction
1.1 Project Aims
This project aims to model and predict the immediate impacts of tropical cyclone-induced waves on reef coral cover and to develop an interactive platform to communicate results to general audiences. Our research questions are:
a) How do wave conditions during tropical cyclones impact coral cover in the Capricorn-Bunker Group, Great Barrier Reef?
b) How can these changes be communicated through an accessible educational tool?
1.2 Background
Coral reefs support exceptional biodiversity and deliver essential ecosystem services, including coastal protection, fisheries, and tourism (Wu et al. 2025; Ferrario et al. 2014; Graham and Nash 2012). However, the GBR is highly sensitive to environmental stressors. Between 2024 and 2025, coral cover declined by 14–30% across the GBR, largely due to cumulative impacts of bleaching and cyclone damage (AIMS 2025a). Declines have cascading ecological effects on ecosystems, in particular reef-associated biota, threatening overall reef health, value, and resilience (Srinivasan et al. 2026; Graham and Nash 2012).
Tropical cyclones (TCs) are among the most severe acute disturbance events affecting the GBR. While cyclone damage extent is largely influenced by intensity, size, and duration (Dixon et al. 2022), impacts on reefs are ultimately controlled by the mechanical forces of wind-generated waves. Cyclone winds generate damaging waves that cause erosion, dislodgement, large-scale rubble production, and contribute to slope destabilisation (Srinivasan et al. 2026; Vila-Concejo and Kench 2017). Storm waves can propagate long distances, with outer-shelf reefs attenuating but not fully blocking wave energy (Callaghan, Mumby, and Mason 2020). High-resolution wave models demonstrate that near-bottom velocities of 2.5–3.1 m/s exceed the mechanical tolerance of branching and tabular corals, reliably predicting dislodgement (Cheung et al. 2025).
Understanding and predicting how reefs respond to cyclones is critical for effective management and conservation, particularly given forecast increases in extreme storm events (Knutson et al. 2020). However, cyclone damage is challenging to predict. Reef slopes oriented towards incoming swells attenuate the most wave energy and therefore experience greater damage (Ferrario et al. 2014). Reefs exposed to regular storm regimes may likewise be better adapted to absorb impacts, or may recover rapidly owing to adapted coral morphologies (Byrne et al. 2023; Zawada et al. 2019).
Quantifying cyclone impacts requires merging high-temporal-resolution climate data with low-temporal-resolution, inconsistent coral-cover surveys. Previous approaches included deriving annual descriptive wave climates to merge with coral cover data (De’ath et al. 2012; Gouezo et al. 2019), or correlating coral cover changes with the presence or absence of cyclones (Osborne et al. 2011). However, reducing high-resolution cyclone data to annual variables may omit important cyclone-specific predictors of damage. A higher-resolution approach is to measure individual reef conditions before and after a cyclone (Fabricius et al. 2008) to predict impacts in sequential modelling (Wolff et al. 2018). However, sampling around unpredictable TCs is not always feasible, and individual reef responses may not scale accurately.
We present a novel, scalable approach to modelling cyclone damage by deriving cyclone-induced wave climates and merging these conditions with available coral-cover data on a cyclone-frequency-dependent basis. This method provides a robust representation of damage impacts and predictors, compared to standard approaches that face trade-offs between data resolution and scalability. By developing an online exhibition to communicate cyclone-reef impacts, we aim to translate our complex modelling outputs into accessible formats for broader public audiences.
2 Methods
2.1 Study Site
The study is situated in the Capricorn-Bunker Group (CBG) (Figure 2). The CBG contains 17 islands, 10 km west of the continental shelf edge in the southernmost sector of the GBR (Jell and Flood 1978). It experiences fewer, but more temporally clustered cyclones than the Central GBR (Wolff et al. 2018). Notably, Southern reefs have been less affected by marine heatwaves and bleaching events than other regions (Hughes 2017), and the CBG has had no recorded severe outbreaks of crown-of-thorns starfish (AIMS 2025b). Thus, the CBG provides a robust case study for isolating cyclone impacts from other confounding impacts.
Ten reefs in the CBG have been monitored biannually since 2006 (for 4 reefs, monitoring started in 1992) through the Australian Institute of Marine Science Long-Term Monitoring Program (AIMS LTMP). The LTMP uses manta tow surveys, where a diver towed behind a boat records percentage estimates of coral cover around the reef perimeter. As cyclone damage is concentrated in the orientation of incoming waves (Vila-Concejo and Kench 2017), manta tow data provides a robust indicator of potentially heterogeneously distributed cyclone impacts across the reef. At each site, wave conditions were extracted and isolated to reflect cyclone windows.
Figure 2: Study site: 10 reefs in the Capricorn-Bunker Group, Southern Great Barrier Reef. a) Study context; the white circle represents the 1000 km cyclone radius used to constrain cyclone tracks. b) Higher-resolution view of the 10 reefs in this study: North Reef, Broomfield Reef, Wreck Island Reef, One Tree Island Reef, Erskine Reef, Mast Head Reef, Boult Reef, Hoskyn Islands Reef, Fairfax Islands Reef, Lady Musgrave Reef. Orange markers indicate the sites surveyed under the Australian Institute of Marine Science Long-Term Monitoring Program. Blue markers indicate the gridded locations used to collect wave hindcast data from the Centre for Australian Weather and Climate Research Wave Hindcast Model. Images from: AIMS (2025), Google Earth (2026).
2.2 Datasets
2.2.1 Extracting Wave Conditions
The Centre for Australian Weather and Climate Research (CAWCR) Wave Hindcast Model is a global model based on forcing conditions from the Climate Forecast System Reanalysis, producing hourly wave hindcast outputs at a 0.4° × 0.4° global resolution. Wave data from 1992 to present was collected from unique grid points adjacent to sites in NetCDF format (Figure 2). Each grid point was linked to its corresponding reef name to generate a combined dataset for hourly wave conditions across study sites since 1992.
From variables in the dataset, wave energy \(\textbf{E}\) and wave power \(\textbf{P}\) were calculated by the equations below to account for the cumulative impacts of height and speed. Deep-water Linear Wave Theory equations were used (Liu 1995), as depths at each wave data point ranged from 8-60 metres (Google Earth 2026). Calculations were performed in Microsoft Excel.
Wave Energy, \(\textbf{E}\) (J/m²)
\[
\textbf{E} = \frac{1}{16}\,\rho\,g\,\mathbf{H}_{\textbf{s}}^{2}
\]
Where:
\(\rho\) (seawater density) was 1025 kg/m³, in line with mean seawater density (Karnauskas 2020);
\(g\) (gravitational acceleration) was 9.81 m/s²;
\(\mathbf{H}_{\textbf{s}}\) was the mean significant wave height in metres; and
the coefficient was reduced from \(\frac{1}{8}\) to \(\frac{1}{16}\) (T. Salles, personal communication, 7 May 2026), as significant wave height values are derived from a spectrum of many irregular sea waves, rather than a single idealised wave height.
Wave Power, \(\textbf{P}\) (kW/m)
\[
\textbf{P} = \textbf{E}\,C\,n
\]
Where:
\(n=\frac{1}{2}\) to reflect deep water equation parameters (Liu 1995);
celerity (m/s): \(C=1.56\,T_p\); and
peak period \(T_p\) (s) is the inverse of peak wave frequency \(F_p\) (discretised from 0.038 to 0.5 Hz), determined as \(T_p =\frac{1}{F_p}\).
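The wave energy and wave power definitions above can be checked with a few lines of code. The following is a minimal Python sketch rather than the spreadsheet calculation used in the project: the function names are hypothetical, and the inputs (2 m and 0.116 Hz) are the dataset means quoted later in the report, used here purely for illustration. In SI units the product \(\textbf{E}\,C\,n\) is in W/m, so the sketch divides by 1000 to report kW/m; whether the same conversion was applied in the original spreadsheet is not stated.

```python
RHO = 1025.0  # seawater density in kg/m^3 (value from the text)
G = 9.81      # gravitational acceleration in m/s^2 (value from the text)


def wave_energy(hs_m):
    """Wave energy per unit sea-surface area (J/m^2), using the 1/16 coefficient."""
    return (1.0 / 16.0) * RHO * G * hs_m ** 2


def wave_power(hs_m, fp_hz):
    """Deep-water wave power per metre of wave crest (W/m)."""
    tp_s = 1.0 / fp_hz      # peak period is the inverse of peak frequency
    celerity = 1.56 * tp_s  # deep-water phase speed in m/s
    n = 0.5                 # deep-water ratio of group speed to phase speed
    return wave_energy(hs_m) * celerity * n


# Illustrative inputs only (hypothetical single observation)
energy_j_m2 = wave_energy(2.0)
power_kw_m = wave_power(2.0, 0.116) / 1000.0  # W/m converted to kW/m
print(round(energy_j_m2, 1), round(power_kw_m, 2))
```

Because energy scales with the square of wave height, doubling \(\mathbf{H}_{\textbf{s}}\) quadruples both energy and power at a fixed peak period.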
2.2.2 Identifying Cyclone Windows
Cyclone tracks were used to identify where cyclones entered a 1000 km radius of each reef. While the spatial extent of TC gales is 300-500 km (Grossmann-Matheson et al. 2024), propagated waves can travel up to 20,000 km (Hoeke et al. 2013). A radius of 1000 km was chosen as a conservative estimate to reduce confounding effects of regular sea-state conditions, whilst accounting for potential lag effects in wave arrival. Cyclone tracks were accessed from the International Best Track Archive for Climate Stewardship (IBTrACS, v04r01). For each TC, the dataset provides the cyclone eye location at 3-hour intervals. Cyclone data was merged with wave data by matching the date, hour, and minute across the datasets. The spatial distance between gridded reef wave points and cyclone locations was calculated using the Haversine formula (Robusto 1957), computing the distance between two locations using their longitude and latitude, and cyclone observations within 1000 km of the reefs were retained.
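The distance filter described above can be sketched as follows. This is an illustrative Python version, not the project's merging script: the Earth radius of 6371 km, the function names, and the example track fixes are assumptions, and the reef coordinate is the approximate CBG location given in the Google Earth reference.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # min() guards against rounding pushing a fractionally above 1
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))


def fixes_within_radius(reef_lat, reef_lon, track, radius_km=1000.0):
    """Keep only the cyclone track fixes that lie within radius_km of the reef."""
    return [
        fix for fix in track
        if haversine_km(reef_lat, reef_lon, fix["lat"], fix["lon"]) <= radius_km
    ]


# Hypothetical track fixes: one near the CBG, one far to the north-east
track = [
    {"time": "2011-01-30T00:00", "lat": -23.0, "lon": 153.0},
    {"time": "2011-01-28T00:00", "lat": -10.0, "lon": 165.0},
]
retained = fixes_within_radius(-23.5, 152.07, track)
print([fix["time"] for fix in retained])
```

Only the first fix is retained, since the second lies more than 13 degrees of latitude (roughly 1500 km) from the reef.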
2.2.3 Quantifying Changes in Coral Cover
To quantify the impacts of wave climates on coral cover, the combined wave and cyclone dataset was merged with AIMS LTMP coral cover observations for each cyclone. Dataset merging used reef name as the common key, matching the surveys in the manta tow dataset to every cyclone-reef row in the wave-cyclone dataset. New columns were created to calculate descriptive wave climate characteristics across the duration of each observed cyclone for each specific reef. Finally, live and dead coral cover data were extracted according to the nearest monitoring dates before and after each cyclone. Thus, the merged dataset is cyclone-frequency dependent, including reef-specific wave conditions and temporally related coral cover observations (for detailed methods, see Appendix Section 8.1).
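The step of pairing each cyclone with the nearest surveys before and after it can be sketched as below. This is a simplified Python illustration under stated assumptions: the function names and survey values are hypothetical, and the change is taken as post-cyclone minus pre-cyclone cover so that coral loss is negative, which is an assumed sign convention.

```python
from datetime import date


def bracket_surveys(surveys, cyclone_start, cyclone_end):
    """Return the latest survey before and the earliest survey after a cyclone.

    surveys is a list of (survey_date, live_cover_percent) tuples for one reef.
    Returns None when the cyclone is not bracketed by surveys on both sides.
    """
    before = [s for s in surveys if s[0] < cyclone_start]
    after = [s for s in surveys if s[0] > cyclone_end]
    if not before or not after:
        return None
    return max(before, key=lambda s: s[0]), min(after, key=lambda s: s[0])


def live_cover_change(surveys, cyclone_start, cyclone_end):
    """Post-cyclone minus pre-cyclone live coral cover (negative means loss)."""
    pair = bracket_surveys(surveys, cyclone_start, cyclone_end)
    if pair is None:
        return None
    (_, pre_cover), (_, post_cover) = pair
    return post_cover - pre_cover


# Hypothetical survey history for a single reef
example_surveys = [
    (date(2008, 3, 1), 40.0),
    (date(2010, 2, 15), 35.0),
    (date(2012, 3, 10), 20.0),
]
change = live_cover_change(example_surveys, date(2011, 1, 30), date(2011, 2, 3))
print(change)
```

A cyclone that falls before the first survey or after the last one returns None, so it cannot contribute an observation to the merged dataset.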
2.3 Exploratory Data Analysis
Dataset merging was performed using R and Python in RStudio, whereas all modelling and figures were produced in RStudio using R. The final product was deployed as a web application using Shiny (Chang et al. 2026).
Data cleaning and preprocessing were performed during dataset merging by removing observations with missing geographic coordinates, and keeping only cyclones after 1992 that passed within 1000 km of CBG reefs. The final dataset contained 152 observations across 10 reefs and 17 cyclones, with numeric, categorical, and temporal variables describing wave conditions, cyclone exposure, and coral cover. No missing values remained. The response variable was the change in live coral cover, calculated using the difference between pre- and post-cyclone measurements.
Code (Reading in Data)
library(readr)  # read_csv()
library(dplyr)  # select(), mutate(), across(), %>%

# Load the merged cyclone, wave, and coral cover dataset
data <- read_csv("merging datasets/merged_cyclone_coral_v4.csv", show_col_types = FALSE)

# All candidate wave and cyclone exposure predictors
all_vars <- c("Mean_hs", "Max_hs", "Intervals_hs_gt4", "Intervals_hs_gt3", "Intervals_hs_gt25",
              "Mean_fp", "Max_fp", "Mean_dir", "Std_dir", "Min_distance_km", "Duration_hours",
              "Mean_E", "Max_E", "Mean_P", "Max_P")
all_data <- data %>%
  dplyr::select(all_of(all_vars)) %>%
  mutate(across(everything(), as.numeric))

# Predictor subset used for the machine learning models
selected_vars <- c("Intervals_hs_gt3", "Mean_hs", "Mean_P", "Mean_fp", "Mean_dir",
                   "Min_distance_km", "Duration_hours")
cyclone_exposure <- data %>%
  dplyr::select(all_of(selected_vars))
Multicollinearity between predictor variables was checked using correlation analysis, where several wave-related variables showed strong correlations, such as mean and maximum significant wave height (Appendix Section 8.2). The final predictors included mean significant wave height, hours with significant wave height exceeding 3 metres, mean wave power, mean peak wave frequency, mean wave direction, minimum cyclone-to-reef distance, and cyclone duration. Predictably, mean significant wave height and mean wave power were highly correlated, since wave power is derived from the square of wave height. However, considering the importance of accounting for the cumulative effects of wave height, length, and speed on forces exerted on coral colonies during wave dissipation, both variables were retained (Ferrario et al. 2014). Detailed descriptions can be found in Appendix Section 8.3.
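To illustrate why mean significant wave height and mean wave power are expected to correlate strongly, the sketch below computes a Pearson correlation between a set of wave heights and their squares (power at a fixed peak period is proportional to height squared). This is a hypothetical Python example with made-up values, not the correlation analysis reported in the appendix.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical mean wave heights (m) and a quantity proportional to wave power
heights = [1.0, 1.5, 2.0, 2.5, 3.0]
power_proxy = [h ** 2 for h in heights]

r = pearson(heights, power_proxy)
print(round(r, 3))
```

Even though the relationship is quadratic rather than linear, the correlation over this range of heights is close to 1, which is the pattern the multicollinearity check would flag.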
2.4 Model Selection
Linear Regression was used as a baseline model due to its simplicity and interpretability, as it provided a direct assessment of linear relationships between environmental predictors and coral cover change. Backward stepwise feature selection was applied to the full set of environmental variables, iteratively removing less important predictors to improve model performance, resulting in a final simpler model.
However, relationships between wave and cyclone exposure variables and coral cover change were often complex and non-linear. To better capture these patterns, additional machine learning models were evaluated using a predefined subset of predictor variables selected based on environmental relevance and exploratory analysis. K-Nearest Neighbours (KNN) Regression was implemented as a distance-based algorithm, with predictor variables standardised to prevent variables with larger magnitudes from dominating the distance calculations. Hyperparameter tuning was then performed to identify the optimal number of neighbours.
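The role of standardisation in KNN regression can be sketched as follows. This is a minimal Python illustration with hypothetical training rows (minimum distance in km, mean significant wave height in m) and hypothetical coral cover changes; it is not the tuned model from the project, and it assumes every predictor column has non-zero variance.

```python
import math


def standardise(rows):
    """Z-score each column; return the scaled rows plus column means and SDs."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(p)
    ]
    scaled = [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]
    return scaled, means, sds


def knn_predict(train_x, train_y, query, k):
    """Predict as the mean response of the k nearest standardised neighbours."""
    scaled, means, sds = standardise(train_x)
    # Scale the query with the training means and SDs, not its own
    q = [(query[j] - means[j]) / sds[j] for j in range(len(query))]
    by_distance = sorted((math.dist(row, q), y) for row, y in zip(scaled, train_y))
    return sum(y for _, y in by_distance[:k]) / k


# Hypothetical rows: [Min_distance_km, Mean_hs]
train_x = [[100.0, 3.0], [120.0, 2.8], [900.0, 1.0], [950.0, 1.2]]
train_y = [-20.0, -18.0, 0.0, -2.0]  # hypothetical coral cover changes

prediction = knn_predict(train_x, train_y, [110.0, 2.9], k=2)
print(prediction)
```

Without scaling, a difference of a few hundred kilometres would swamp a difference of a metre or two in wave height, so distance would effectively be the only variable that determines the neighbours.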
Random Forest, Extra Trees Regression, and XGBoost served as ensemble tree-based or boosted tree approaches capable of capturing interaction effects and threshold-based environmental impacts. Hyperparameter tuning was conducted for all models using cross-validation. Unlike Random Forest, which built trees independently, Extra Trees introduced additional randomness when choosing split points to increase tree diversity and reduce overfitting (Geurts, Ernst, and Wehenkel 2006). XGBoost constructed trees sequentially, where at each training step the model fixed the trees already learned and added a new tree to improve the objective function, allowing subsequent trees to reduce the remaining prediction errors from earlier trees (Chen and Guestrin 2016).
For all tuned models, the best-performing parameter setting was selected based on the lowest cross-validated RMSE before the final model was fitted using the training dataset and evaluated on the testing dataset. RMSE was used as the primary tuning metric because it places greater emphasis on larger prediction errors, which was important for reliable prediction of coral cover change. Although the dataset was relatively small and the models showed some risk of overfitting, conservative hyperparameter tuning and modelling choices were used to reduce this risk and improve model stability.
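A small worked example shows how RMSE weights large errors more heavily than a metric such as mean absolute error (MAE). The values below are hypothetical and the MAE comparison is added only for contrast; the report itself uses RMSE and R-squared.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


actual = [-10.0, -5.0, 0.0, 5.0]           # hypothetical coral cover changes
even_errors = [-9.0, -4.0, 1.0, 6.0]       # every prediction is off by 1
one_large_error = [-10.0, -5.0, 0.0, 9.0]  # a single prediction is off by 4

print(mae(actual, even_errors), rmse(actual, even_errors))
print(mae(actual, one_large_error), rmse(actual, one_large_error))
```

Both prediction sets have the same MAE, but the set containing one large miss has twice the RMSE, so tuning on RMSE favours parameter settings that avoid badly mispredicting individual reef-cyclone observations.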
2.5 Model Assumptions
Several modelling assumptions were considered throughout the analysis. One key assumption was data independence, although this was inherently violated to some extent because geographically proximate reefs may have shared similar environmental conditions and cyclone exposure. Despite this unavoidable spatial dependence in ecological datasets, each reef–cyclone observation was treated as independent for modelling purposes. The training and testing datasets were also assumed to come from similar distributions, with the training data being representative of broader reef and cyclone exposure conditions. For linear regression, additional assumptions included linearity, homoscedasticity, and normality of residuals, along with multicollinearity among predictor variables checked before modelling.
3 Results
Five-fold cross-validation (CV) with 50 repeats was applied to evaluate model performance and stability, with performance metrics including Root Mean Square Error (RMSE) and \(R^2\) values calculated for each fold before averaging the results for comparison, summarised in Figure 3. Based on the overall model performance, Random Forest emerged as the strongest candidate model, achieving the lowest mean RMSE of 9.517 in Figure 3 (a) and the highest mean \(R^2\) value of 0.371 in Figure 3 (b). This indicates that the model was able to explain approximately 37% of the variance in coral cover change while also producing the lowest prediction error among all evaluated models.
Extra Trees also demonstrated relatively strong performance with a mean RMSE of 9.686 and an \(R^2\) value of 0.332, while XGBoost and KNN Regression showed moderate predictive ability. In contrast, Linear Regression produced the highest RMSE and the lowest \(R^2\) value, suggesting that the environmental variables shared complex non-linear relationships with coral cover change that a simple linear model was less capable of capturing.
The repeated cross-validation boxplots further showed that Random Forest maintained a comparatively stable spread in both RMSE and \(R^2\) values across folds compared with several other models, particularly Linear Regression and KNN Regression. This suggested more consistent predictive performance and reliable generalisation across validation subsets. Therefore, Random Forest was selected as the final model for this project.
(b) Coefficient of determination R-squared. Higher R-squared values indicate better model fit and explanatory performance.
Figure 3: Boxplot comparisons of our chosen candidate regression models using repeated 5-fold cross-validation.
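The repeated cross-validation scheme behind these comparisons can be sketched as a splitting routine. The Python below is an illustrative stand-in for the R tooling used in the project: the function name and seed are hypothetical, and it assumes simple random shuffling with no stratification or grouping by reef.

```python
import random


def repeated_kfold(n_obs, k=5, repeats=50, seed=0):
    """Return (repeat, train_indices, test_indices) for repeated k-fold CV."""
    rng = random.Random(seed)
    splits = []
    for rep in range(repeats):
        indices = list(range(n_obs))
        rng.shuffle(indices)  # a fresh random partition for every repeat
        folds = [indices[i::k] for i in range(k)]
        for held_out in range(k):
            test = folds[held_out]
            train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
            splits.append((rep, train, test))
    return splits


# 152 observations, 5 folds, 50 repeats, as described in the text
splits = repeated_kfold(152, k=5, repeats=50, seed=1)
print(len(splits))
```

Each repeat holds out every observation exactly once, so 5 folds over 50 repeats give 250 fold-level RMSE and R-squared estimates per model, which is what each boxplot summarises.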
To further interpret the Random Forest model, a SHAP (Shapley) beeswarm plot (Mayer 2025) was generated to explore the contribution of each predictor variable to predicting coral cover change (Figure 4). The sign of SHAP values refers to the direction of model contribution, indicating whether the predictor variable increases or decreases predicted coral cover change relative to the baseline prediction, while the SHAP magnitude indicates the strength of contribution (Lundberg et al. 2020). The colour gradient represents the feature values of the predictor variables. From Figure 4, higher values of Mean_hs generally contributed towards more positive predicted coral cover changes, while higher values of Duration_hours contributed towards greater coral loss. The SHAP values also illustrated that the influence of these variables varied across observations, suggesting complex non-linear relationships captured by the model.
Figure 4: SHAP (Shapley) beeswarm plot for the Random Forest model showing the contribution and direction of predictor variables on coral cover change predictions. Features with larger absolute SHAP values have greater influence on model output. Positive SHAP values indicate an increase in predicted coral cover change, while negative SHAP values indicate a decrease.
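To make the sign and magnitude of SHAP values concrete, the sketch below computes exact Shapley values for a two-feature toy model. Everything here is hypothetical: the toy model and its coefficients do not represent the fitted Random Forest, and absent features are simply set to a baseline value, which is a simplification of how SHAP implementations handle them in practice.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    model takes a dict of feature values; x and baseline share the same keys.
    """
    names = list(x)
    totals = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = x[name]  # switch this feature to its observed value
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orders) for name, total in totals.items()}


def toy_model(f):
    """Hypothetical predicted coral cover change with an interaction term."""
    return (-0.1 * f["Duration_hours"] + 1.5 * f["Mean_hs"]
            - 0.02 * f["Duration_hours"] * f["Mean_hs"])


baseline = {"Duration_hours": 20.0, "Mean_hs": 2.0}
observed = {"Duration_hours": 60.0, "Mean_hs": 3.0}
phi = shapley_values(toy_model, observed, baseline)
print(phi)
```

In this toy case the duration contribution is negative (pushing the prediction towards coral loss) and the wave height contribution is positive, and the two contributions sum to the difference between the prediction for the observation and the baseline prediction.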
Using our dataset and the Random Forest model, we designed an interactive museum exhibit that guides users through a Shiny app, suitable for the general public or as a teaching tool for educators. Users can learn about and explore past cyclones that have passed near the CBG, and predict how wave and cyclone parameters affect coral cover. Disciplinary content written by MARS students was used in the “Introduction” tab, providing information about coral reefs, cyclones, and the waves they cause.
Next, users are guided to “Explore Past Cyclones” (Figure 5 (a)).
Users can select a previous cyclone from a dropdown menu, revealing information in the panel below, such as the duration spent near the reef, its closest distance, and its associated wave parameters. Grey help icons reveal popups explaining the more complex variables. In the adjacent map, the full cyclone track (grey) and the highlighted portion where the cyclone approaches within 1000 km of the reef (blue) are displayed.
Users select to see the coral cover before or after the cyclone passed. This colour-coded tool serves as an easy comparison for exploring impacts of a cyclone. Clicking onto a reef data point will show a popup with detailed information for those more interested.
Users select whether to show live or dead coral, which provides more functionality for users interested in exploring different parameters.
The next tab encourages users to “Design Your Own Cyclone” (Figure 5 (b)). The final regression model was saved as an RDS file and imported into Shiny. Despite having seven modelling parameters, we decided to allow user inputs on only four parameters:
Minimum distance from reef (Min_distance_km);
Duration of cyclone near reef (Duration_hours);
Hours of damaging waves (Intervals_hs_gt3); and
Wave power (Mean_P), which was discretised into ordinal categories to improve usability: low (2000 kW/m), medium (3500 kW/m), high (7500 kW/m), and extreme (11000 kW/m).
The remaining three variables were fixed at their dataset means (Mean_hs = 2 m, Mean_fp = 0.116 Hz, Mean_dir = 98° [ESE]) to reduce cognitive load and improve interpretability. Fixing variables at mean values also avoids unrealistic or noisy combinations.
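The mapping from the four user inputs to the seven model predictors can be sketched as below. This is a hypothetical Python illustration of the logic, not the Shiny server code: the function and constant names are assumptions, while the category values, fixed means, and predictor names are those given in the text.

```python
# Ordinal wave power categories and their values, as listed in the text
MEAN_P_LEVELS = {"low": 2000.0, "medium": 3500.0, "high": 7500.0, "extreme": 11000.0}

# Predictors fixed at their dataset means, as listed in the text
FIXED_AT_MEAN = {"Mean_hs": 2.0, "Mean_fp": 0.116, "Mean_dir": 98.0}

# Order of the seven predictors (assumed to match the selected_vars vector)
FEATURE_ORDER = [
    "Intervals_hs_gt3", "Mean_hs", "Mean_P", "Mean_fp",
    "Mean_dir", "Min_distance_km", "Duration_hours",
]


def build_model_row(min_distance_km, duration_hours, intervals_hs_gt3, power_level):
    """Combine the four user inputs with the fixed means into one predictor row."""
    if power_level not in MEAN_P_LEVELS:
        raise ValueError(f"Unknown wave power level: {power_level!r}")
    row = dict(FIXED_AT_MEAN)
    row["Min_distance_km"] = float(min_distance_km)
    row["Duration_hours"] = float(duration_hours)
    row["Intervals_hs_gt3"] = float(intervals_hs_gt3)
    row["Mean_P"] = MEAN_P_LEVELS[power_level]
    return [row[name] for name in FEATURE_ORDER]


# Hypothetical user-designed cyclone
example_row = build_model_row(250, 48, 12, "high")
print(example_row)
```

The resulting row would then be passed to the saved model to obtain the predicted live coral cover change.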
The Random Forest model was used to output the predicted live coral cover change. A balance between user-friendliness and model accuracy was considered while designing the exhibit. For example, discretisation removes variability and can mask non-linear effects and complex interactions present in the full model. However, reducing dimensionality improves interpretability and enables users to explore the key environmental mechanisms influencing coral cover change without requiring specialist knowledge. While some ecological complexity is lost, the simplified interface retains the dominant patterns identified by the model and provides an effective tool for communicating reef disturbance processes to students and the wider public.
(a) Tab 2: Explore Past Cyclones. Users are prompted to: 1. Use the dropdown to select a past cyclone, displaying the cyclone track and some key statistics; 2. Choose a before/after comparison; and 3. Choose to see live or dead coral cover.
(b) Tab 3: Design Your Own Cyclone. Users are able to customise some provided wave/cyclone parameters. Using our Random Forest regression model, they explore how their chosen combination influences the predicted coral cover change across the reef. Users can also label their Cyclone with their own name!
Figure 5: Screenshots taken from the final deployed product (guided museum exhibit) designed in Shiny.
4 Discussion
By integrating long‑term coral cover monitoring with high-resolution cyclone and wave data, this study demonstrates the importance of significant wave height, cyclone duration, and wave power as key predictors of coral cover change due to TC exposure in the CBG.
4.1 Predictors of Cyclone Damage
Overall, our results support that wave exposure is the dominant mechanism of physical disturbance on reefs during TCs (Puotinen et al. 2016; Madin et al. 2014). Random Forest SHAP values indicated higher mean significant wave height \(\mathbf{H}_\mathbf{s}\) contributed to greater predicted coral loss, while longer cyclone (i.e. extreme wave) duration amplified these effects. The finding that wave power was among the most influential predictors supports previous work showing that energy flux, rather than height alone, determines the likelihood of colony dislodgement and rubble production (Madin and Connolly 2006; Puotinen et al. 2016).
Secondarily, minimum cyclone distance and duration contributed to predicting coral cover loss, though with more variable effects. Longer duration cyclones produced more hours of elevated wave conditions, increasing the likelihood of cumulative damage. This aligns with existing studies that characterise cyclone duration as a key indicator for predicting damage (Dixon et al. 2022). Notably, proximity alone was not a strong predictor of coral loss, supporting that cyclone characteristics do not necessarily translate to reef-scale damage unless accompanied by high wave energy (Puotinen et al. 2016). This reinforces the value of wave-based metrics over cyclone-based metrics for predicting ecological impacts on reefs.
4.2 Model Performance
The relatively low \(R^2\) values across all of the models tested highlight the inherent difficulty of predicting coral cover change from physical variables alone. Coral recovery trajectories, pre‑existing colony morphology, reef orientation, and biological interactions all influence post‑cyclone outcomes but were not included in the dataset (De’ath et al. 2012; Hughes 2017).
Nevertheless, the ability of Random Forest to explain approximately 37% of the variance in coral cover change using only wave and cyclone metrics is notable. It suggests that wave exposure metrics derived from hindcast models can serve as meaningful predictors of immediate cyclone damage, supporting the development of reef‑scale vulnerability assessments and early‑warning tools for managers and communities.
4.3 Limitations and Future Directions
Several limitations must be acknowledged. Firstly, the dataset contained only 152 cyclone‑reef observations, limiting model generalisability and increasing sensitivity to outliers. Secondly, repeated observations from the same reefs introduce potential non‑independence, which may influence variable importance estimates. Thirdly, coral cover was measured annually or biannually, meaning that the temporal resolution of ecological data is coarse relative to the hourly wave and cyclone data (AIMS 2025b). This mismatch may obscure short‑term recovery or delayed mortality.
Future work could incorporate:
Higher‑resolution ecological data, such as photogrammetry or colony‑level surveys
Additional environmental variables, including thermal stress, water quality, antecedent ecological condition (Hughes 2017), and biological disturbances such as crown-of-thorns starfish outbreaks
Morphology‑specific vulnerability metrics, given differing mechanical tolerances (Madin and Connolly 2006)
Hydrodynamic modelling to estimate near‑bottom orbital velocities directly (Lowe et al. 2009)
5 Conclusion
This study showed wave exposure (mean significant wave height, cyclone duration, and wave power) to be the primary predictor of cyclone‑related coral cover loss in the Capricorn-Bunker Group. By integrating long‑term ecological monitoring with IBTrACS cyclone data and CAWCR wave hindcasts, the project demonstrates that reef‑specific wave climates can meaningfully explain immediate disturbance impacts. Although predictive performance was moderate, the Random Forest model captured a substantial portion of the variance in coral cover change, highlighting the value of wave‑based metrics for forecasting cyclone impacts on reefs. Our results therefore indicate that this methodology may offer an effective, scalable way of modelling cyclone damage on reefs.
6 Student Contributions
All group members contributed collaboratively to the completion of the project. Claire was responsible for the original research question, and a further clarifying literature review was conducted by Camryn, Isobel, and Yindi to refine the direction of the question. Yindi, Laashan, and Sara were responsible for generating the data. Sara and Laashan performed the data merging, while Bin and Evelyn handled data processing. Sara, Laashan, Bin, and Evelyn began the initial modelling, after which Bin and Evelyn focused on finalising hyperparameter tuning, model comparisons, results, and accompanying figures. Daniel was responsible for conducting t-tests, developing the Shiny application, and managing the coding and deployment of the final Shiny app product, with the help of Claire, Isobel, Yindi, and Camryn for content and usage guidance. All members contributed to the writing and editing of the report, as well as discussions and decision-making, and agree that this statement accurately reflects their individual contributions.
AI acknowledgement
Microsoft Copilot under the University of Sydney enterprise subscription was used to improve aesthetics, keep consistent formatting and debug the Shiny application.
The Generative AI output was not directly used in any part of the report.
7 References
AIMS. 2025a. “Great Barrier Reef Annual Summary Report: Coral Reef Condition 2024/2025.” Australian Institute of Marine Science. https://doi.org/10.25845/CS9T-0K11.
AIMS. 2025b. “AIMS Long-Term Monitoring Program: Crown-of-Thorns Starfish and Benthos Manta Tow Data (Great Barrier Reef).” https://doi.org/10.25845/5c09b0abf315a.
Byrne, M., A. S. Foo, A. Vila-Concejo, and K. Wolfe. 2023. “Impacts of Climate Change Stressors on the Great Barrier Reef.” In Oceanographic Processes of Coral Reefs, 323–40.
Callaghan, David P., Peter J. Mumby, and Matthew S. Mason. 2020. “Near-Reef and Nearshore Tropical Cyclone Wave Climate in the Great Barrier Reef with and Without Reef Structure.” Coastal Engineering 157: 103652. https://doi.org/10.1016/j.coastaleng.2020.103652.
Chang, Winston, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Garrick Aden-Buie, Yihui Xie, et al. 2026. Shiny: Web Application Framework for R. https://shiny.posit.co/.
Chen, Tianqi, and Carlos Guestrin. 2016. “XGBoost: A Scalable Tree Boosting System.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–94. https://doi.org/10.1145/2939672.2939785.
Cheung, Mandy W. M., Milani Chaloupka, Karlo Hock, and Peter J. Mumby. 2025. “Moving Beyond Temperature Metrics in Coral Bleaching Prediction Using Interpretable Machine Learning.” Global Ecology and Biogeography 34 (8): e70105. https://doi.org/10.1111/geb.70105.
De’ath, Glenn, Katharina E. Fabricius, Hugh Sweatman, and Marji Puotinen. 2012. “The 27-Year Decline of Coral Cover on the Great Barrier Reef and Its Causes.” Proceedings of the National Academy of Sciences 109 (44): 17995–99. https://doi.org/10.1073/pnas.1208909109.
Dixon, Amy M., Marji Puotinen, Hamish A. Ramsay, and Maria Beger. 2022. “Coral Reef Exposure to Damaging Tropical Cyclone Waves in a Warming Climate.” Earth’s Future 10 (8). https://doi.org/10.1029/2021ef002600.
Fabricius, Katharina E., Glenn De’ath, Marji L. Puotinen, Terry Done, Timothy F. Cooper, and Sally C. Burgess. 2008. “Disturbance Gradients on Inshore and Offshore Coral Reefs Caused by a Severe Tropical Cyclone.” Limnology and Oceanography 53 (2): 690–704. https://doi.org/10.4319/lo.2008.53.2.0690.
Ferrario, Filippo, Michael W. Beck, Curt D. Storlazzi, Fiorenza Micheli, Christine C. Shepard, and Laura Airoldi. 2014. “The Effectiveness of Coral Reefs for Coastal Hazard Risk Reduction and Adaptation.” Nature Communications 5 (1). https://doi.org/10.1038/ncomms4794.
Google Earth. 2026. “Gladstone and the Capricorn Bunker Reef Group, Great Barrier Reef: 23.5 S, 152.07 E.” Satellite map. https://earth.google.com/web/.
Gouezo, Mimi, Yimnang Golbuu, Katharina Fabricius, Daniel Olsudong, Gregory Mereb, Victor Nestor, Eric Wolanski, Peter Harrison, and Costas Doropoulos. 2019. “Drivers of Recovery and Reassembly of Coral Reef Communities.” Proceedings of the Royal Society B: Biological Sciences 286 (1897): 20182908. https://doi.org/10.1098/rspb.2018.2908.
Graham, Nicholas A. J., and Kenneth L. Nash. 2012. “The Importance of Structural Complexity in Coral Reef Ecosystems.” Coral Reefs 32 (2): 315–26. https://doi.org/10.1007/s00338-012-0984-y.
Grossmann-Matheson, Georg, Ian R. Young, Andrea Meucci, and Jose-Henrique Alves. 2024. “Global Tropical Cyclone Extreme Wave Height Climatology.” Scientific Reports 14 (1): 4167. https://doi.org/10.1038/s41598-024-54691-9.
Hoeke, Ron K., Kathleen L. McInnes, Jonathan C. Kruger, Robin J. McNaught, John R. Hunter, and Scott G. Smithers. 2013. “Widespread Inundation of Pacific Islands Triggered by Distant-Source Wind-Waves.” Global and Planetary Change 108: 128–38. https://doi.org/10.1016/j.gloplacha.2013.06.006.
Hughes, Terry P. 2017. “Global Warming and Recurrent Mass Bleaching of Corals.” Nature 543 (7645): 373–77. https://doi.org/10.1038/nature21707.
Jell, J. S., and P. G. Flood. 1978. “Guide to the Geology of Reefs of the Capricorn and Bunker Groups, Great Barrier Reef Province, with Special Reference to Heron Reef.” Australasian Sedimentologists Group 8 (3): 1–85.
Karnauskas, Kristopher B. 2020. “Physical Diagnosis of the 2016 Great Barrier Reef Bleaching Event.” Geophysical Research Letters 47 (11). https://doi.org/10.1029/2019gl086177.
Knutson, Thomas, Suzana J. Camargo, Johnny C. L. Chan, Kerry Emanuel, Chun-Chih Ho, James Kossin, Mrutyunjay Mohapatra, et al. 2020. “Tropical Cyclones and Climate Change Assessment: Part II: Projected Response to Anthropogenic Warming.” Bulletin of the American Meteorological Society 101 (3): E303–22. https://doi.org/10.1175/bams-d-18-0194.1.
Liu, Philip L.-F. 1995. “MODEL EQUATIONS FOR WAVE PROPAGATIONS FROM DEEP TO SHALLOW WATER.” In Advances in Coastal and Ocean Engineering, 125–57. https://doi.org/10.1142/9789812797582_0003.
Lowe, Ryan J., James L. Falter, Stephen G. Monismith, and Mark J. Atkinson. 2009. “Wave-Driven Circulation of a Coastal Reef–Lagoon System.”Journal of Physical Oceanography 39 (4): 873–93. https://doi.org/10.1175/2008jpo3958.1.
Lundberg, Scott M., Gabriel Erion, H. Chen, A. DeGrave, Jordan M. Prutkin, B. Nair, R. Katz, Jonathan Himmelfarb, N. Bansal, and Su-In Lee. 2020. “From Local Explanations to Global Understanding with Explainable AI for Trees.”Nature Machine Intelligence 2: 56–67. https://doi.org/10.1038/s42256-019-0138-9.
Madin, Joshua S., Andrew H. Baird, Maria Dornelas, and Sean R. Connolly. 2014. “Mechanical Vulnerability Explains Size-Dependent Mortality of Reef Corals.”Ecology Letters 17 (8): 1008–15. https://doi.org/10.1111/ele.12306.
Madin, Joshua S., and Sean R. Connolly. 2006. “Ecological Consequences of Major Hydrodynamic Disturbances on Coral Reefs.”Nature 444 (7118): 477–80. https://doi.org/10.1038/nature05328.
Osborne, Kim, Andrew M. Dolman, Sally C. Burgess, and Kylie A. Johns. 2011. “Disturbance and the Dynamics of Coral Cover on the Great Barrier Reef (1995–2009).”PLOS ONE 6 (3): e17516. https://doi.org/10.1371/journal.pone.0017516.
Puotinen, Marji, Jeffrey A. Maynard, Roger Beeden, Ben Radford, and Gareth J. Williams. 2016. “A Robust Operational Model for Predicting Where Tropical Cyclone Waves Damage Coral Reefs.”Scientific Reports 6 (1): 26009. https://doi.org/10.1038/srep26009.
Robusto, Carl C. 1957. “The Cosine-Haversine Formula.”The American Mathematical Monthly 64 (1): 38–40. https://doi.org/10.2307/2309088.
Srinivasan, M., G. F. Galbraith, D. M. Ceccarelli, B. J. Cresswell, S. J. Strähl, and D. H. Williamson. 2026. “Long-Term Effects of a Severe Tropical Cyclone on Coral Reef Habitat and Fish Assemblages at the Whitsunday Islands, Central Great Barrier Reef.”PLOS ONE 21 (2): e0329995. https://doi.org/10.1371/journal.pone.0329995.
Vila-Concejo, Alexandra, and Paul Kench. 2017. “Storms in Coral Reefs: Processes and Impacts.” In Coastal Storms: Processes and Impacts, 127–49. https://doi.org/10.1002/9781118937099.ch7.
Wolff, Nicholas H., Peter J. Mumby, Marji Devlin, and Ken R. N. Anthony. 2018. “Vulnerability of the Great Barrier Reef to Climate Change and Local Pressures.”Global Change Biology 24 (5): 1978–91. https://doi.org/10.1111/gcb.14043.
Wu, Dong-Hai, Li-Yan Miao, Yu-Dan Song, Si Wang, Ting-Ting Wang, Hui-Lin Ou, Jun Xie, et al. 2025. “The Distribution, Diversity, and Indicator Species of Coral Communities Under the Influence of Environmental Changes in the Subtropical Peninsula of Southern China.”Ecology and Evolution 15: e72212. https://doi.org/10.1002/ece3.72212.
Zawada, K. J. A., Joshua S. Madin, Andrew H. Baird, Tim C. L. Bridge, and Maria Dornelas. 2019. “Morphological Traits Can Track Coral Reef Responses to the Anthropocene.”Functional Ecology 33 (6): 962–75. https://doi.org/10.1111/1365-2435.13358.
8 Appendix
8.1 Merging Datasets
All netCDF wave files were combined into a single dataset, with each wave point linked to its corresponding reef. This produced the final wave dataset, “combined_wave_data.csv”, which included significant wave height, peak wave frequency, wave direction, and related variables.
The wave dataset (“combined_wave_data.csv”) was then merged with the IBTrACS cyclone dataset (“ibtracs.since1980.list.v04r01.csv”) by matching observations on date, hour, and minute. The spatial distance between reef wave points and cyclone locations was calculated using the Haversine formula and stored as a new variable. Only cyclone observations within 1000 km of reef locations were retained, resulting in the final combined dataset, “combined_wave_cyclone.csv”.
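The timestamp match and distance filter described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the project's actual merging script: the toy data frames, the column names (`time`, `reef_lat`, `cyc_lat`, and so on), the helper `haversine_km`, and the mean Earth radius of 6371 km are all hypothetical choices made for the example.

```r
# Great-circle distance (km) between two points using the Haversine formula.
# Inputs are decimal degrees; 6371 km is an assumed mean Earth radius.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical toy inputs: one reef wave point and three cyclone fixes.
# Timestamps are strings truncated to the minute so they can act as a join key.
waves <- data.frame(
  Reef_ID  = "R1",
  time     = c("2015-02-19 00:00", "2015-02-19 03:00", "2015-02-19 06:00"),
  reef_lat = -23.5,
  reef_lon = 152.07,
  hs       = c(2.1, 3.4, 4.0)
)
cyclone <- data.frame(
  time    = c("2015-02-19 00:00", "2015-02-19 03:00", "2015-02-19 06:00"),
  cyc_lat = c(-10.0, -18.0, -22.0),
  cyc_lon = c(160.0, 154.0, 151.0)
)

# Match wave and cyclone records on the shared timestamp key.
merged <- merge(waves, cyclone, by = "time")

# Distance from the reef wave point to the cyclone centre at each time step.
merged$distance_km <- haversine_km(
  merged$reef_lat, merged$reef_lon,
  merged$cyc_lat, merged$cyc_lon
)

# Keep only cyclone observations within 1000 km of the reef.
within_radius <- merged[merged$distance_km <= 1000, ]
```

In this toy example the first cyclone fix lies well over 1000 km from the reef and is dropped, while the two later fixes are retained. The report's package list also loads `geosphere`, which provides an equivalent distance function; the hand-written version is used here only to keep the sketch self-contained.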
The dataset (“combined_wave_cyclone.csv”) was then merged with the AIMS manta tow coral cover dataset (“capricornbunkermantatowdata.csv”) for use in the final modelling analysis, producing “merged_coral_cover.csv”. The datasets were joined using Reef_ID as the common key, matching manta tow survey records to each cyclone–reef observation in the wave–cyclone dataset.
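As a rough sketch of this join, the base R example below attaches surveys to cyclone events by `Reef_ID` and then keeps the closest survey before and after each cyclone to compute the change in live coral cover. The toy data, the `cyclone`, `start`, `end`, `survey_date`, and `live_coral` columns, and the helper `nearest_pair` are assumptions for illustration; only `Reef_ID`, `Pre_MEAN_LIVE_CORAL`, `Post_MEAN_LIVE_CORAL`, and `live_coral_change` are names taken from the report.

```r
# Hypothetical cyclone windows per reef (one cyclone affecting two reefs).
events <- data.frame(
  Reef_ID = c("R1", "R2"),
  cyclone = c("A", "A"),
  start   = as.Date(c("2015-02-18", "2015-02-18")),
  end     = as.Date(c("2015-02-21", "2015-02-21"))
)

# Hypothetical manta tow survey records.
surveys <- data.frame(
  Reef_ID     = c("R1", "R1", "R1", "R2", "R2"),
  survey_date = as.Date(c("2013-11-01", "2014-12-01", "2016-01-15",
                          "2014-10-10", "2015-11-20")),
  live_coral  = c(40, 42, 30, 55, 50)
)

# Join every survey to every cyclone event on the same reef.
joined <- merge(events, surveys, by = "Reef_ID")

# For one reef-cyclone group, keep the latest survey before the cyclone
# and the earliest survey after it; skip groups missing either side.
nearest_pair <- function(g) {
  pre  <- g[g$survey_date < g$start, ]
  post <- g[g$survey_date > g$end, ]
  if (nrow(pre) == 0 || nrow(post) == 0) return(NULL)
  pre  <- pre[which.max(as.numeric(pre$survey_date)), ]
  post <- post[which.min(as.numeric(post$survey_date)), ]
  data.frame(
    Reef_ID              = pre$Reef_ID,
    cyclone              = pre$cyclone,
    Pre_MEAN_LIVE_CORAL  = pre$live_coral,
    Post_MEAN_LIVE_CORAL = post$live_coral,
    live_coral_change    = post$live_coral - pre$live_coral
  )
}

groups <- split(joined, list(joined$Reef_ID, joined$cyclone), drop = TRUE)
coral_change <- do.call(rbind, lapply(groups, nearest_pair))
```

Because the join is many-to-many within a reef, the pre/post selection step is what reduces the result to one row per reef–cyclone pair.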
Additional variables were then derived for analysis, including mean and maximum wave height, the number of hours during which wave height exceeded thresholds of 2.5 m, 3 m, and 4 m, wave direction, wave period, and duration of cyclone impact. Further calculations included minimum distance between cyclone centre and reef, as well as mean and maximum wave energy and wave power.
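The derived exposure variables can be illustrated with a short base R sketch. It assumes one row per hour within a cyclone window, so that counting rows gives hours; the toy data, the `hs` and `distance_km` columns, and the helper `summarise_window` are hypothetical, while the output column names follow those used in the report's modelling code.

```r
# Hypothetical hourly wave records inside one cyclone window for two reefs.
hourly <- data.frame(
  Reef_ID     = rep(c("R1", "R2"), each = 4),
  cyclone     = "A",
  hs          = c(2.0, 2.8, 3.5, 4.2, 1.5, 2.6, 3.1, 2.9),
  distance_km = c(900, 700, 450, 600, 950, 800, 500, 650)
)

# Summarise one reef-cyclone window into a single row of exposure metrics.
summarise_window <- function(g) {
  data.frame(
    Reef_ID           = g$Reef_ID[1],
    cyclone           = g$cyclone[1],
    Mean_hs           = mean(g$hs),
    Max_hs            = max(g$hs),
    Intervals_hs_gt25 = sum(g$hs > 2.5),  # hours above 2.5 m
    Intervals_hs_gt3  = sum(g$hs > 3),    # hours above 3 m
    Intervals_hs_gt4  = sum(g$hs > 4),    # hours above 4 m
    Min_distance_km   = min(g$distance_km),
    Duration_hours    = nrow(g)           # assumes one record per hour
  )
}

windows  <- split(hourly, list(hourly$Reef_ID, hourly$cyclone), drop = TRUE)
exposure <- do.call(rbind, lapply(windows, summarise_window))
```

The same pattern extends to the direction, frequency, energy, and power summaries by adding further columns inside `summarise_window`.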
Figure 6: Pearson correlation heatmap of cyclone exposure variables used in the modelling process. Red colours indicate positive correlations, while blue colours indicate negative correlations. Darker red cells indicate stronger positive correlations and highlight potential multicollinearity among several wave height and energy-related predictors.
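A correlation screen of this kind can be sketched with base R's `cor()`. The example below is illustrative only: the toy values are invented, and the 0.8 cut-off for flagging a pair is an assumed threshold, not one stated in the report.

```r
# Hypothetical exposure values for six reef-cyclone observations.
toy <- data.frame(
  Mean_hs  = c(1, 2, 3, 4, 5, 6),
  Max_hs   = c(1.5, 2.9, 4.4, 5.5, 7.2, 8.4),
  Mean_dir = c(90, 10, 200, 45, 300, 120)
)

# Pearson correlation matrix, as visualised in a heatmap.
corr <- cor(toy, method = "pearson")

# List each pair once (upper triangle) where |r| exceeds the assumed cut-off.
high <- which(abs(corr) > 0.8 & upper.tri(corr), arr.ind = TRUE)
flagged <- data.frame(
  var1 = rownames(corr)[high[, "row"]],
  var2 = colnames(corr)[high[, "col"]],
  r    = corr[high]
)
```

Here only the mean and maximum wave height pair is flagged, mirroring the kind of redundancy the heatmap highlights among height- and energy-related predictors.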
8.3 Final Variables and their Descriptions
Final wave and cyclone derived variables in the Random Forest regression model

| Variable Name | Description |
|---|---|
| Mean_hs | Mean significant wave height (m) |
| Intervals_hs_gt3 | Hours of waves > 3 metres significant wave height (hours) |
| Mean_fp | Mean peak frequency (Hz) |
| Mean_dir | Mean wave direction from North (°) |
| Min_distance_km | Minimum distance between cyclone and reef (km) |
| Duration_hours | Duration of time cyclone was in 1000 km radius of reef (hours) |
| Mean_P | Mean wave power (kW/m) |
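For readers who want to see how a quantity such as `Mean_P` in the table above could be computed, the sketch below applies the deep-water relations given in the methods: $E = \frac{1}{16}\rho g H_s^2$, $P = E\,C\,n$ with $n = \frac{1}{2}$, $C = 1.56\,T_p$, and $T_p = 1/F_p$. The report states that these calculations were performed in Microsoft Excel, so this R version is only an equivalent illustration. The function names are hypothetical, and the division by 1000 to express power in kW/m is an assumption about the unit conversion.

```r
rho <- 1025  # seawater density (kg/m^3), as stated in the methods
g   <- 9.81  # gravitational acceleration (m/s^2)

# Wave energy density (J/m^2) from significant wave height hs (m).
wave_energy <- function(hs) {
  rho * g * hs^2 / 16
}

# Wave power (kW/m) from significant wave height hs (m) and
# peak frequency fp (Hz), using deep-water linear wave theory.
wave_power_kw <- function(hs, fp) {
  tp       <- 1 / fp    # peak period (s)
  celerity <- 1.56 * tp # deep-water phase speed (m/s)
  n        <- 0.5       # deep-water group-to-phase speed ratio
  # Assumed conversion: E * C * n is in W/m, so divide by 1000 for kW/m.
  wave_energy(hs) * celerity * n / 1000
}

# Example: a 3 m significant wave height with a 0.1 Hz peak frequency.
example_energy <- wave_energy(3)
example_power  <- wave_power_kw(3, 0.1)
```

Both functions are vectorised, so they can be applied directly to hourly columns before averaging over a cyclone window.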
Source Code
---
title: "Modelling the Impacts of Cyclone-Induced Waves on Coral Cover in the Capricorn Bunker, Great Barrier Reef"
date: "`r Sys.Date()`"
author: "Reef 1"
format:
  html:
    embed-resources: true
    code-fold: true
    code-tools: true
    fig_caption: true
    theme: united
execute:
  cache: true
table-of-contents: true
number-sections: true
bibliography: references.bib
reference-location: section
---

```{r}
#| message: false
#| warning: false
#| code-summary: "Code (Loading Packages)"
library(tidyverse)
library(lubridate)
library(geosphere)
library(caret)
library(MASS)
library(randomForest)
library(FNN)
library(broom)
library(knitr)
library(tidymodels)
library(xgboost)
library(ranger)
library(fastshap)
library(shapviz)
library(patchwork)
```

# Executive Summary {.unnumbered}

The Great Barrier Reef (GBR) is among the most diverse and productive ecosystems in the world, providing a range of critical services to ecological and anthropogenic communities. However, damaging waves generated by cyclones present a major threat to reef health. Modelling and predicting cyclone impacts on reefs presents a range of empirical and theoretical challenges. Notably, models must merge high-resolution atmosphere and wave data with low-resolution empirical data from reef monitoring programs. Our model, based on cyclone-induced wave conditions over 33 years at 10 reefs in the Southern GBR, provides a novel and scalable approach to modelling cyclone-induced reef damage.

We find that cyclone duration and wave conditions are important drivers of coral cover change, with prolonged exposure to storm-generated waves generally associated with greater coral loss. Among the models tested, Random Forest provided the strongest predictive performance, revealing non-linear relationships between cyclone exposure, wave characteristics, and reef condition. This highlights the importance of considering storm intensity and exposure duration when assessing cyclone impacts on coral reefs.

Beyond providing a novel approach for modelling cyclone impacts, we developed an interactive exhibit for educators and community education, translating our complex model into a practical tool for engaging the non-scientific public. [The app](https://daniel--w.shinyapps.io/DATA3888-Reef-1/) includes a ‘design-your-own cyclone’ feature based on our model results, allowing users to interactively explore the metrics that contribute to cyclone damage to reefs.

This project ([@fig-workflow]) presents a robust, scalable approach to future modelling and predicting cyclone impacts on reefs. Additionally, by translating complex modelling approaches into accessible, interactive platforms for education and science communication, our project framework represents a more inclusive approach to communicating results beyond the scientific community.

```{r}
#| label: fig-workflow
#| echo: false
#| message: false
#| warning: false
#| code-summary: Code (Screenshots from Shiny exhibit)
#| fig-cap: "Flowchart showing the key steps taken during our project workflow. Identification of the problem, dataset sourcing, processing and modelling steps are shown."
knitr::include_graphics("www/workflow-diagram.png")
```

# Introduction

## Project Aims

This project aims to model and predict the immediate impacts of tropical cyclone-induced waves on reef coral cover and to develop an interactive platform to communicate results to general audiences. Our research questions are:

**a)** *How do wave conditions during tropical cyclones impact coral cover in Capricorn Bunker, Great Barrier Reef?*

**b)** *How can these changes be communicated as an accessible educational tool?*

## Background

Coral reefs support exceptional biodiversity and deliver essential ecosystem services, including coastal protection, fisheries, and tourism [@Wu2025; @Ferrario2014; @Graham2012]. However, the GBR is highly sensitive to environmental stressors.
Between 2024 and 2025, coral cover declined by 14–30% across the GBR, largely due to cumulative impacts of bleaching and cyclone damage [@AIMS2025Report]. Declines have cascading ecological effects on ecosystems, in particular reef-associated biota, threatening overall reef health, value, and resilience [@Srinivasan2026; @Graham2012].

Tropical cyclones (TCs) are among the most severe acute disturbance events affecting the GBR. While cyclone damage extent is largely influenced by intensity, size, and duration [@Dixon2022], impacts on reefs are ultimately controlled by the mechanical forces of wind-generated waves. Cyclone winds generate damaging waves that cause erosion, dislodgement, large-scale rubble production, and contribute to slope destabilisation [@Srinivasan2026; @VilaConcejo2017]. Storm waves can propagate long distances, with outer-shelf reefs attenuating but not fully blocking wave energy [@Callaghan2020]. High-resolution wave models demonstrate that near-bottom velocities of 2.5–3.1 m/s exceed the mechanical tolerance of branching and tabular corals, reliably predicting dislodgement [@Cheung2025].

Understanding and predicting how reefs respond to cyclones is critical for effective management and conservation, particularly given forecast increases in extreme storm events [@Knutson2020]. However, cyclone damage is challenging to predict. Reef slopes oriented towards incoming swells attenuate the most wave energy and experience greater damage [@Ferrario2014]. Reefs exposed to regular storm regimes may likewise be better adapted to absorb impacts, or may recover rapidly due to adapted coral morphologies [@Byrne2023; @Zawada2019].

Quantifying cyclone impacts requires merging high-temporal-resolution climate data with low-temporal-resolution, inconsistent coral-cover surveys.
Previous approaches included deriving annual descriptive wave climates to merge with coral cover data [@Death2012; @Gouezo2019], or correlating coral cover changes with the presence or absence of cyclones [@Osborne2011]. However, reducing high-resolution cyclone data to annual variables may omit important cyclone-specific predictors of damage. A higher-resolution approach is to measure individual reef conditions before and after a cyclone [@Fabricius2008] to predict impacts in sequential modelling [@Wolff2018]. However, sampling around unpredictable TCs is not always feasible, and individual reef responses may not scale accurately.

We present a novel, scalable approach to modelling cyclone damage by deriving cyclone-induced wave climates and merging these conditions with available coral-cover data on a cyclone-frequency-dependent basis. This method provides a robust representation of damage impacts and predictors, compared to standard approaches that face trade-offs between data resolution and scalability. By developing an online exhibition to communicate cyclone-reef impacts, we aim to translate our complex modelling outputs into accessible formats for broader public audiences.

# Methods

## Study Site

The study is situated in the Capricorn-Bunker Group (CBG) ([@fig-study-site]). The CBG contains 17 islands, 10 km west of the continental shelf edge in the southernmost sector of the GBR [@Jell1978]. It experiences fewer, but more temporally clustered cyclones than the Central GBR [@Wolff2018]. Notably, Southern reefs have been less affected by marine heatwaves and bleaching events than other regions [@Hughes2017], and the CBG has had no recorded severe outbreaks of crown-of-thorns starfish [@AIMS2025].
Thus, the CBG provides a robust case study for isolating cyclone impacts from other confounding impacts.

Ten reefs in the CBG have been monitored biannually since 2006 (for 4 reefs, monitoring started in 1992) through the Australian Institute of Marine Science Long-Term Monitoring Program (AIMS LTMP). The LTMP uses manta tow surveys, where a diver towed behind a boat records percentage estimations of coral cover around the reef perimeter. As cyclone damage is concentrated in the orientation of incoming waves [@VilaConcejo2017], manta tow data provides a robust indicator of potentially heterogeneously distributed cyclone impacts across the reef. At each site, wave conditions were extracted and isolated to reflect cyclone windows.

```{r}
#| label: fig-study-site
#| echo: false
#| message: false
#| warning: false
#| fig-align: center
#| fig-cap: "Study site: 10 reefs in the Capricorn-Bunker Group, Southern Great Barrier Reef. <br><b> a) </b>Study context, the white circle represents the 1000 km cyclone radius used to constrain cyclone tracks. <br><b> b) </b>Higher resolution view of the 10 reefs in this study: North Reef, Broomfield Reef, Wreck Island Reef, One Tree Island Reef, Erskine Reef, Mast Head Reef, Boult Reef, Hoskyn Islands Reef, Fairfax Islands Reef, Lady Musgrave Reef. Orange markers indicate the islands measured under the Australian Institute of Marine Science Long-Term Monitoring Program. Blue markers indicate the gridded locations used to collect wave hindcast data from the Centre for Australian Weather and Climate Research Wave Hindcast Model. Images from: AIMS (2025), Google Earth (2026)."
#| out-width: 100%
knitr::include_graphics("www/study-site.png")
```

## Datasets

### Extracting Wave Conditions

The Centre for Australian Weather and Climate Research (CAWCR) Wave Hindcast Model is a global model based on forcing conditions from the Climate Forecast System Reanalysis, producing hourly wave hindcast outputs at a 0.4° × 0.4° global resolution.
Wave data from 1992 to present was collected from unique grid points adjacent to sites in NetCDF format (@fig-study-site). Each grid point was linked to its corresponding reef name to generate a combined dataset for hourly wave conditions across study sites since 1992.

From variables in the dataset, wave energy $\mathbf{E}$ and wave power $\mathbf{P}$ were calculated by the equations below to account for the cumulative impacts of height and speed. Deep-water Linear Wave Theory equations were used [@Liu1995], as depths at each wave data point ranged from 8–60 metres [@GoogleEarth2026]. Calculations were performed in Microsoft Excel.

**Wave Energy,** $\mathbf{E}$ (J/m^2^)\
$$
\mathbf{E} = \frac{1}{16} \, \rho \, g \, \mathbf{H}_{\mathrm{s}}^{2}
$$

- Where:
  - $\rho$ (seawater density) was 1025 kg/m^3^ in line with mean seawater density [@Karnauskas2020];
  - $g$ (gravitational acceleration) was 9.81 m/s^2^ ;
  - $\mathbf{H}_{\mathrm{s}}$ was the mean significant wave height in metres; and
  - the coefficient was reduced to $\frac{1}{16}$ (T. Salles, personal communication, 7 May 2026), as significant wave height values are derived from a spectrum of many irregular sea waves, rather than a single idealised wave height.

**Wave Power,** $\mathbf{P}$ (kW/m)\
$$
\mathbf{P} = \mathbf{E}\,C\,n
$$

- Where:
  - $n=\frac{1}{2}$ to reflect deep water equation parameters [@Liu1995];
  - celerity: $C=1.56\,T_p$ ; and
  - peak period $T_p$ is the inverse of peak wave frequency $F_p$ (discretised from 0.038 to 0.5 Hz) and determined as: $T_p =\frac{1}{F_p}$.

### Identifying Cyclone Windows

Cyclone tracks were used to identify where cyclones entered a 1000 km radius of each reef. While the spatial extent of TC gales is 300–500 km [@GrossmannMatheson2024], propagated waves can travel up to 20,000 km [@Hoeke2013]. A radius of 1000 km was chosen as a conservative estimate to reduce confounding effects of regular sea-state conditions, whilst accounting for potential lag effects in wave arrival.
Cyclone tracks were accessed from the International Best Track Archive for Climate Stewardship (IBTrACS, v04r01). For each TC, the dataset provides the cyclone eye location at 3-hour intervals. Cyclone data was merged with wave data by matching the date, hour, and minute across the datasets. The spatial distance between gridded reef wave points and cyclone locations was calculated from their longitude and latitude using the Haversine formula [@Robusto1957], and cyclone observations within 1000 km of the reefs were retained.

### Quantifying Changes in Coral Cover

To quantify the impacts of wave climates on coral cover, the combined wave and cyclone dataset was merged with AIMS LTMP coral cover observations for each cyclone. Dataset merging used reef name as the common key, matching the manta tow surveys to every cyclone–reef row in the wave–cyclone dataset. New columns were created to calculate descriptive wave climate characteristics across the duration of each observed cyclone for each specific reef. Finally, live and dead coral cover data were extracted according to the nearest monitoring dates before and after each cyclone. Thus, the merged dataset is cyclone-frequency dependent, including reef-specific wave conditions and temporally related coral cover observations. (For detailed methods, see Appendix @sec-merging-datasets.)

## Exploratory Data Analysis

Dataset merging was performed using R and Python in RStudio, whereas all modelling and figures were produced in RStudio using R. The final product was deployed as a web application using Shiny [@Shiny].

Data cleaning and preprocessing were performed during dataset merging by removing observations with missing geographic coordinates, and keeping only cyclones after 1992 that passed within 1000 km of CBG reefs.
The final dataset contained 152 observations across 10 reefs and 17 cyclones, with numeric, categorical, and temporal variables describing wave conditions, cyclone exposure, and coral cover. No missing values remained. The response variable was the change in live coral cover, calculated using the difference between pre- and post-cyclone measurements.

```{r}
#| message: false
#| warning: false
#| code-summary: "Code (Reading in Data)"
data <- read_csv("merging datasets/merged_cyclone_coral_v4.csv", show_col_types = FALSE)

all_vars <- c(
  "Mean_hs", "Max_hs", "Intervals_hs_gt4", "Intervals_hs_gt3", "Intervals_hs_gt25",
  "Mean_fp", "Max_fp", "Mean_dir", "Std_dir", "Min_distance_km",
  "Duration_hours", "Mean_E", "Max_E", "Mean_P", "Max_P"
)

all_data <- data %>%
  dplyr::select(all_of(all_vars)) %>%
  mutate(across(everything(), as.numeric))

selected_vars <- c(
  "Intervals_hs_gt3", "Mean_hs", "Mean_P", "Mean_fp",
  "Mean_dir", "Min_distance_km", "Duration_hours"
)

cyclone_exposure <- data %>%
  dplyr::select(all_of(selected_vars))
```

Multicollinearity between predictor variables was checked using correlation analysis, where several wave-related variables showed strong correlations, such as mean and maximum significant wave height (Appendix [@sec-multicollinearity-heatmap]). The final predictors included mean significant wave height, hours with significant wave height exceeding 3 metres, mean wave power, mean peak wave frequency, mean wave direction, minimum cyclone to reef distance, and cyclone duration. Predictably, mean significant wave height and mean wave power were highly correlated since wave power is derived from the square of wave height.
However, considering the importance of accounting for the cumulative effects of wave height, length, and speed on forces exerted on coral colonies during wave dissipation, both variables were retained [@Ferrario2014]. Detailed descriptions can be found in Appendix @sec-final-vars.

## Model Selection

Linear Regression was used as a baseline model due to its simplicity and interpretability, as it provided a direct assessment of linear relationships between environmental predictors and coral cover change. Backward stepwise feature selection was applied to the full set of environmental variables, iteratively removing less important predictors to improve model performance, resulting in a final simpler model.

However, relationships between wave and cyclone exposure variables and coral cover change were often complex and non-linear. To better capture these patterns, additional machine learning models were evaluated using a predefined subset of predictor variables selected based on environmental relevance and exploratory analysis. K-Nearest Neighbours (KNN) Regression was implemented as a distance-based algorithm, with predictor variables standardised to prevent variables with larger magnitudes from dominating the distance calculations. Hyperparameter tuning was then performed to identify the optimal number of neighbours.

Random Forest, Extra Trees Regression, and XGBoost served as ensemble tree-based or boosted tree approaches capable of capturing interaction effects and threshold-based environmental impacts. Hyperparameter tuning was conducted for all models using cross-validation. Whereas Random Forest searched for the best split point for each candidate feature, Extra Trees drew split points at random to increase tree diversity and reduce overfitting [@Geurts2006].
XGBoost constructed trees sequentially, where at each training step the model fixed the trees already learned and added a new tree to improve the objective function, allowing subsequent trees to reduce the remaining prediction errors from earlier trees [@Chen2016XGBoost].

For all tuned models, the best-performing parameter setting was selected based on the lowest cross-validated RMSE before the final model was fitted using the training dataset and evaluated on the testing dataset. RMSE was used as the primary tuning metric because it places greater emphasis on larger prediction errors, which was important for reliable prediction of coral cover change. Although the dataset was relatively small and the models showed some risk of overfitting, conservative hyperparameter tuning and modelling choices were used to reduce this risk and improve model stability.

## Model Assumptions

Several modelling assumptions were considered throughout the analysis. One key assumption was data independence, although this was inherently violated to some extent because geographically proximate reefs may have shared similar environmental conditions and cyclone exposure. Despite this unavoidable spatial dependence in ecological datasets, each reef–cyclone observation was treated as independent for modelling purposes. The training and testing datasets were also assumed to come from similar distributions, with the training data being representative of broader reef and cyclone exposure conditions.
For linear regression, additional assumptions included linearity, homoscedasticity, and normality of residuals, along with the absence of strong multicollinearity among predictor variables, which was checked before modelling.

# Results

## Comparison of Models

```{r}
#| message: false
#| warning: false
#| label: setup-5-fold-CV
#| code-summary: Code (Setup 5-fold CV)
# Create target variable
model_data <- data %>%
  mutate(
    live_coral_change = Post_MEAN_LIVE_CORAL - Pre_MEAN_LIVE_CORAL
  )

selected_vars <- selected_vars[selected_vars %in% names(model_data)]

# Prepare modelling data
model_data <- model_data %>%
  dplyr::select(live_coral_change, all_of(selected_vars)) %>%
  drop_na()

# Train-test split
set.seed(3888)
train_index <- sample(1:nrow(model_data), size = 0.75 * nrow(model_data))
train_data <- model_data[train_index, ]
test_data <- model_data[-train_index, ]

# Evaluation function
evaluate_model <- function(actual, predicted) {
  tibble(
    RMSE = sqrt(mean((actual - predicted)^2)),
    MSE = mean((actual - predicted)^2),
    MAE = mean(abs(actual - predicted)),
    R_squared = 1 - sum((actual - predicted)^2) / sum((actual - mean(actual))^2)
  )
}

# Repeated 5-fold CV setting used by the caret models
set.seed(3888)
cv_control <- trainControl(
  method = "repeatedcv",
  number = 5,
  repeats = 50,
  savePredictions = "final"
)

# Single 5-fold partition shared by the tidymodels workflows
set.seed(3888)
shared_folds <- vfold_cv(train_data, v = 5)

# Average fold-level metrics and attach model metadata
summarise_cv_results <- function(cv_result, model_name, interpretability, notes) {
  cv_result %>%
    summarise(
      RMSE = mean(RMSE),
      MSE = mean(MSE),
      MAE = mean(MAE),
      R_squared = mean(R_squared),
      .groups = "drop"
    ) %>%
    mutate(
      Model = model_name,
      Interpretability = interpretability,
      Notes = notes
    )
}
```

```{r}
#| message: false
#| warning: false
#| label: linear-regression
#| code-summary: Code (Linear Regression)
# Linear Regression with backward selection using repeated 5-fold CV
set.seed(3888)
lm_cv_model <- caret::train(
  live_coral_change ~ .,
  data = train_data,
  method = "lmStepAIC",
  trControl = cv_control,
  metric = "RMSE",
  trace = FALSE
)

lm_cv_results <- lm_cv_model$pred %>%
  group_by(Resample) %>%
  summarise(
    RMSE = sqrt(mean((obs - pred)^2)),
    MSE = mean((obs - pred)^2),
    MAE = mean(abs(obs - pred)),
    R_squared = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2),
    .groups = "drop"
  )

lm_result <- summarise_cv_results(
  lm_cv_results,
  "Linear Regression",
  "High",
  "Backward selection evaluated with 5-fold CV"
)
```

```{r}
#| message: false
#| warning: false
#| label: knn-regression
#| code-summary: Code (k-NN Regression)
# KNN Regression using repeated 5-fold CV
set.seed(3888)
knn_grid <- expand.grid(k = c(3, 5, 7, 9, 11, 13, 15))

knn_cv_model <- caret::train(
  live_coral_change ~ .,
  data = train_data,
  method = "knn",
  trControl = cv_control,
  tuneGrid = knn_grid,
  metric = "RMSE",
  preProcess = c("center", "scale")
)

best_k <- knn_cv_model$bestTune$k

knn_cv_results <- knn_cv_model$pred %>%
  filter(k == best_k) %>%
  group_by(Resample) %>%
  summarise(
    RMSE = sqrt(mean((obs - pred)^2)),
    MSE = mean((obs - pred)^2),
    MAE = mean(abs(obs - pred)),
    R_squared = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2),
    .groups = "drop"
  )

knn_result <- summarise_cv_results(
  knn_cv_results,
  paste0("KNN Regression (k = ", best_k, ")"),
  "Low-Medium",
  "Tuned k and evaluated with 5-fold CV"
)
```

```{r}
#| message: false
#| warning: false
#| label: random-forest
#| code-summary: Code (Random Forest)
# Random Forest using repeated 5-fold CV
set.seed(3888)
rf_grid <- expand.grid(
  mtry = c(2, 3, 4, 5),
  splitrule = "variance",
  min.node.size = c(2, 5, 10)
)

tree_values <- c(300, 500, 700)
rf_models <- list()
rf_results <- data.frame()

# Tune mtry and min.node.size separately for each forest size
for (ntree_val in tree_values) {
  rf_temp <- caret::train(
    live_coral_change ~ .,
    data = train_data,
    method = "ranger",
    trControl = cv_control,
    tuneGrid = rf_grid,
    metric = "RMSE",
    num.trees = ntree_val,
    importance = "impurity"
  )

  temp_results <- rf_temp$results
  temp_results$num.trees <- ntree_val
  rf_results <- rbind(rf_results, temp_results)
  rf_models[[paste0("trees_", ntree_val)]] <- rf_temp
}

# Pick the setting with the lowest cross-validated RMSE
best_rf <- rf_results %>%
  arrange(RMSE) %>%
  slice(1)

best_mtry <- best_rf$mtry
best_min_node <- best_rf$min.node.size
best_num_trees <- best_rf$num.trees
best_rf_model <- rf_models[[paste0("trees_", best_num_trees)]]

rf_cv_results <- best_rf_model$pred %>%
  filter(mtry == best_mtry) %>%
  group_by(Resample) %>%
  summarise(
    RMSE = sqrt(mean((obs - pred)^2)),
    MSE = mean((obs - pred)^2),
    MAE = mean(abs(obs - pred)),
    R_squared = 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2),
    .groups = "drop"
  )

rf_result <- summarise_cv_results(
  rf_cv_results,
  "Random Forest",
  "Medium",
  "Tuned mtry and evaluated with 5-fold CV"
)

# Final Random Forest fitted on the training data for SHAP interpretation
rf_model <- randomForest(
  live_coral_change ~ .,
  data = train_data,
  ntree = 500,
  mtry = best_mtry,
  importance = TRUE
)
```

```{r}
#| message: false
#| warning: false
#| label: XGBoost
#| code-summary: Code (XGBoost)
# XGBoost tuned on the shared 5-fold CV splits (shared_folds, a single 5-fold partition)
set.seed(3888)
xgb_train <- train_data
xgb_test <- test_data
xgb_folds <- shared_folds

xgb_recipe <- recipe(live_coral_change ~ ., data = xgb_train) %>%
  step_zv(all_predictors())

xgb_model <- boost_tree(
  trees = 300,
  tree_depth = tune(),
  learn_rate = tune(),
  loss_reduction = tune(),
  sample_size = tune(),
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("regression")

xgb_grid <- expand_grid(
  tree_depth = c(1, 2, 3),
  learn_rate = c(0.01, 0.03, 0.05),
  loss_reduction = c(0, 0.01),
  sample_size = c(0.7, 0.9),
  mtry = c(3, 5, 7),
  min_n = c(5, 10)
)

xgb_workflow <- workflow() %>%
  add_recipe(xgb_recipe) %>%
  add_model(xgb_model)

xgb_tuned <- tune_grid(
  xgb_workflow,
  resamples = xgb_folds,
  grid = xgb_grid,
  metrics = yardstick::metric_set(
    yardstick::rmse,
    yardstick::mae,
    yardstick::rsq
  ),
  control = control_grid(save_pred = TRUE)
)

best_xgb <- select_best(xgb_tuned, metric = "rmse")

final_xgb_workflow <- finalize_workflow(
  xgb_workflow,
  best_xgb
)

final_xgb_fit <- fit(
  final_xgb_workflow,
  data = xgb_train
)

xgb_pred <- predict(final_xgb_fit, xgb_test) %>%
  pull(.pred)

# Fold-level metrics for the best parameter setting
xgb_cv_results <- collect_predictions(xgb_tuned) %>%
  inner_join(best_xgb, by = names(best_xgb)) %>%
  group_by(id) %>%
  summarise(
    RMSE = sqrt(mean((live_coral_change - .pred)^2)),
    MSE = mean((live_coral_change - .pred)^2),
    MAE = mean(abs(live_coral_change - .pred)),
    R_squared = 1 - sum((live_coral_change - .pred)^2) / sum((live_coral_change - mean(live_coral_change))^2),
    .groups = "drop"
  ) %>%
  rename(Resample = id)

xgb_result <- summarise_cv_results(
  xgb_cv_results,
  "XGBoost",
  "Medium",
  "Tuned boosted trees evaluated with 5-fold CV"
)
```

```{r}
#| message: false
#| warning: false
#| label: extra-trees
#| code-summary: Code (Extra Trees)
# Extra Trees tuned on the shared 5-fold CV splits (shared_folds, a single 5-fold partition)
extra_trees_recipe <- recipe(live_coral_change ~ ., data = xgb_train) %>%
  step_zv(all_predictors())

extra_trees_model <- rand_forest(
  trees = 500,
  mtry = tune(),
  min_n = tune()
) %>%
  set_engine("ranger", splitrule = "extratrees", importance = "impurity") %>%
  set_mode("regression")

extra_trees_workflow <- workflow() %>%
  add_recipe(extra_trees_recipe) %>%
  add_model(extra_trees_model)

extra_trees_grid <- expand_grid(
  mtry = c(2, 4, 7),
  min_n = c(2, 5, 10)
)

extra_trees_tuned <- tune_grid(
  extra_trees_workflow,
  resamples = xgb_folds,
  grid = extra_trees_grid,
  metrics = yardstick::metric_set(
    yardstick::rmse,
    yardstick::mae,
    yardstick::rsq
  ),
  control = control_grid(save_pred = TRUE)
)

best_extra_trees <- select_best(extra_trees_tuned, metric = "rmse")

final_extra_trees_workflow <- finalize_workflow(
  extra_trees_workflow,
  best_extra_trees
)

final_extra_trees_fit <- fit(
  final_extra_trees_workflow,
  data = xgb_train
)

extra_trees_pred <- predict(final_extra_trees_fit, xgb_test) %>%
  pull(.pred)

# Fold-level metrics for the best parameter setting
extra_trees_cv_results <- collect_predictions(extra_trees_tuned) %>%
  inner_join(best_extra_trees, by = names(best_extra_trees)) %>%
  group_by(id) %>%
  summarise(
    RMSE = sqrt(mean((live_coral_change - .pred)^2)),
    MSE = mean((live_coral_change - .pred)^2),
    MAE = mean(abs(live_coral_change - .pred)),
    R_squared = 1 - sum((live_coral_change - .pred)^2) / sum((live_coral_change - mean(live_coral_change))^2),
    .groups = "drop"
  ) %>%
  rename(Resample = id)

extra_trees_result <- summarise_cv_results(
  extra_trees_cv_results,
  "Extra Trees",
  "Medium",
  "Extremely randomized trees evaluated with 5-fold CV"
)
```

5-fold cross-validation (CV) with 50 repeats was applied to evaluate model performance
and stability. Performance metrics, including Root Mean Square Error (RMSE) and R^2^, were calculated for each fold and then averaged for comparison (@fig-boxplot-5-fold-CV). Random Forest emerged as the strongest candidate model, achieving the lowest mean RMSE of `{r} round(rf_result$RMSE, 3)` (@fig-boxplot-5-fold-CV-1) and the highest mean R^2^ of `{r} round(rf_result$R_squared, 3)` (@fig-boxplot-5-fold-CV-2). The model therefore explained approximately `{r} round(rf_result$R_squared*100, 0)`% of the variance in coral cover change while also producing the lowest prediction error among all evaluated models.

Extra Trees also performed relatively well, with a mean RMSE of `{r} round(extra_trees_result$RMSE, 3)` and a mean R^2^ of `{r} round(extra_trees_result$R_squared, 3)`, while XGBoost and KNN Regression showed moderate predictive ability. In contrast, Linear Regression produced the highest RMSE and the lowest R^2^, suggesting that the environmental variables share complex non-linear relationships with coral cover change that a simple linear model is less able to capture.

The repeated cross-validation boxplots further showed that Random Forest maintained a comparatively narrow spread in both RMSE and R^2^ across folds relative to several other models, particularly Linear Regression and KNN Regression. This suggests more consistent predictive performance and more reliable generalisation across validation subsets. Therefore, Random Forest was selected as the final model for this project.

```{r}
#| message: false
#| warning: false
#| label: fig-boxplot-5-fold-CV
#| code-summary: Code (Boxplot 5-fold CV performance)
#| fig-cap: "Boxplot comparisons of our chosen candidate regression models using repeated 5-fold cross-validation."
#| fig-subcap:
#|   - "Root Mean Square Error (RMSE).
#|     Lower RMSE values indicate better predictive performance."
#|   - "Coefficient of determination R-squared. Higher R-squared values indicate better model fit and explanatory performance."
#| layout-ncol: 2

# Prepare fold-level CV results for boxplots
cv_plot_data <- bind_rows(
  lm_cv_results %>% mutate(Model = "Linear Regression"),
  knn_cv_results %>% mutate(Model = paste0("KNN Regression (k = ", best_k, ")")),
  rf_cv_results %>% mutate(Model = "Random Forest"),
  xgb_cv_results %>% mutate(Model = "XGBoost"),
  extra_trees_cv_results %>% mutate(Model = "Extra Trees")
)

# Boxplot RMSE comparison using repeated 5-fold CV results
cv_plot_data %>%
  mutate(Model = reorder(Model, RMSE, median)) %>%
  ggplot(aes(x = Model, y = RMSE)) +
  geom_boxplot(
    width = 0.6,
    fill = "grey85",
    color = "black",
    outlier.shape = NA
  ) +
  geom_text(
    data = cv_plot_data %>%
      group_by(Model) %>%
      summarise(med = mean(RMSE)),
    aes(
      x = Model,
      y = min(cv_plot_data$RMSE) + 0.6,
      label = sprintf("%.3f", med)
    ),
    inherit.aes = FALSE,
    hjust = 1,
    fontface = "bold"
  ) +
  coord_flip() +
  labs(
    title = "Repeated 5-fold CV RMSE Comparison",
    x = "Model",
    y = "CV RMSE"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", size = 15),
    axis.title = element_text(face = "bold"),
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank()
  )

# Boxplot R-squared comparison using repeated 5-fold CV results
cv_plot_data %>%
  mutate(Model = reorder(Model, R_squared, median)) %>%
  ggplot(aes(x = Model, y = R_squared)) +
  geom_boxplot(
    width = 0.6,
    fill = "grey85",
    color = "black",
    outlier.shape = NA
  ) +
  geom_text(
    data = cv_plot_data %>%
      group_by(Model) %>%
      summarise(med = mean(R_squared)),
    aes(
      x = Model,
      y = min(cv_plot_data$R_squared) + 0.2,
      label = sprintf("%.3f", med)
    ),
    inherit.aes = FALSE,
    hjust = 1,
    fontface = "bold"
  ) +
  coord_flip() +
  labs(
    title = "R-squared Comparison using Repeated 5-fold CV",
    x = "Model",
    y = "CV R-squared"
  ) +
  theme_minimal(base_size = 13)
```

To further interpret the Random Forest model, a SHAP (Shapley) beeswarm plot [@SHAP] was generated to explore the contribution of each
predictor variable to the predicted coral cover change (@fig-SHAP-plot). The sign of a SHAP value gives the direction of a predictor's contribution, indicating whether it increases or decreases the predicted coral cover change relative to the baseline prediction, while the magnitude indicates the strength of that contribution [@Lundberg2020]. The colour gradient represents the feature values of the predictor variables. In @fig-SHAP-plot, higher values of `Mean_hs` generally contributed towards more positive predicted coral cover changes, while higher values of `Duration_hours` contributed towards greater coral loss. The SHAP values also showed that the influence of these variables varied across observations, suggesting complex non-linear relationships captured by the model.

```{r}
#| label: fig-SHAP-plot
#| message: false
#| warning: false
#| code-summary: Code (SHAP Beeswarm Plot)
#| fig-cap: "SHAP (Shapley) beeswarm plot for the Random Forest model showing the contribution and direction of predictor variables on coral cover change predictions. Features with larger absolute SHAP values have greater influence on model output.
#|   Positive SHAP values indicate an increase in predicted coral cover change, while negative SHAP values indicate a decrease."

# Random Forest SHAP beeswarm plot
X_train <- train_data %>%
  dplyr::select(-live_coral_change) %>%
  as.data.frame()

X_test <- test_data %>%
  dplyr::select(-live_coral_change) %>%
  as.data.frame()

pred_fun <- function(object, newdata) {
  predict(object, newdata = as.data.frame(newdata))
}

set.seed(3888)

shap_values <- fastshap::explain(
  object = rf_model,
  X = X_train,
  pred_wrapper = pred_fun,
  newdata = X_test,
  nsim = 100,
  adjust = TRUE
)

shap_object <- shapviz::shapviz(
  as.matrix(shap_values),
  X = X_test
)

shapviz::sv_importance(
  shap_object,
  kind = "bee"
) +
  ggplot2::scale_colour_viridis_c(
    option = "viridis",
    direction = 1,
    name = "Feature value"
  ) +
  ggplot2::ggtitle("Random Forest SHAP Beeswarm Plot")
```

## Product Deployment (Shiny App)

Link to our interactive exhibit: <https://daniel--w.shinyapps.io/DATA3888-Reef-1/>

Using our dataset and the Random Forest model, we designed an interactive museum exhibit that guides users through a Shiny app, suitable for the general public or for use as a teaching tool by educators. Users can learn about and explore past cyclones that have passed near the CBG, and predict how wave and cyclone parameters affect coral cover. Disciplinary content written by MARS students was used in the “Introduction” tab, providing information about coral reefs, cyclones and the waves they cause.

Next, users are guided to “Explore Past Cyclones” [@fig-shiny-1].

1. Users can **select a previous cyclone** from a dropdown menu, revealing information in the panel below, such as the duration spent near the reef, its closest distance and its associated wave parameters. Grey help icons reveal popups explaining what the more complex variables mean. In the adjacent map, the full cyclone track (grey) and the highlighted portion of the track where the cyclone approaches within 1000 km of the reef (blue) are displayed.
2. 
   Users **select to see the coral cover before or after** the cyclone passed. This colour-coded tool serves as an easy comparison for exploring the impacts of a cyclone. Clicking on a reef data point shows a popup with detailed information for those more interested.
3. Users **select whether to show live or dead coral**, which provides more functionality for users interested in exploring different parameters.

The next tab encourages users to “Design Your Own Cyclone” [@fig-shiny-2]. The final regression model was saved as an RDS file and imported into Shiny. Although the model has seven predictors, we decided to allow user inputs on only four:

1. Minimum distance from reef (`Min_distance_km`);
2. Duration of cyclone near reef (`Duration_hours`);
3. Hours of damaging waves (`Intervals_hs_gt3`); and
4. Wave power (`Mean_P`), which was discretised into ordinal categories to improve usability: low (2000 kW/m), medium (3500 kW/m), high (7500 kW/m), and extreme (11000 kW/m).

The remaining three variables were fixed at their dataset means (`Mean_hs` = 2 m, `Mean_fp` = 0.116 Hz, `Mean_dir` = 98° \[ESE\]) to reduce cognitive load and improve interpretability. Fixing variables at mean values also avoids unrealistic or noisy combinations.

The Random Forest model was used to output the predicted live coral cover change. A balance between user-friendliness and model accuracy was considered while designing the exhibit. For example, discretisation removes variability and can mask non-linear effects, ignoring complex interactions present in the full model. However, reducing dimensionality improves interpretability and enables users to explore the key environmental mechanisms influencing coral cover change without requiring specialist knowledge.
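
As an illustration only, the four user inputs and the three fixed means described above could be combined into a single model input row along the following lines. The helper `build_cyclone_input()`, its argument names, and the example input values are hypothetical and are not taken from the app's source; the wave power levels and fixed means are those stated above.

```r
# Wave power categories mapped to representative values (kW/m), as listed above
wave_power_levels <- c(low = 2000, medium = 3500, high = 7500, extreme = 11000)

# Predictors held at their dataset means in the exhibit
fixed_means <- list(Mean_hs = 2, Mean_fp = 0.116, Mean_dir = 98)

# Hypothetical helper: assemble a one-row data frame of the seven predictors
build_cyclone_input <- function(min_distance_km, duration_hours,
                                hours_hs_gt3, wave_power) {
  # Reject anything that is not one of the four category labels
  wave_power <- match.arg(wave_power, names(wave_power_levels))
  data.frame(
    Mean_hs = fixed_means$Mean_hs,
    Intervals_hs_gt3 = hours_hs_gt3,
    Mean_fp = fixed_means$Mean_fp,
    Mean_dir = fixed_means$Mean_dir,
    Min_distance_km = min_distance_km,
    Duration_hours = duration_hours,
    Mean_P = unname(wave_power_levels[wave_power])
  )
}

# Example "design-your-own" cyclone (made-up input values)
new_cyclone <- build_cyclone_input(
  min_distance_km = 250,
  duration_hours = 48,
  hours_hs_gt3 = 12,
  wave_power = "high"
)

# In the app, a row like this would be passed to the saved model, e.g.
# predict(rf_model, newdata = new_cyclone)
```

Keeping the category-to-value mapping in one named vector means the slider labels and the values sent to the model cannot drift apart.
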
While some ecological complexity is lost, the simplified interface retains the dominant patterns identified by the model and provides an effective tool for communicating reef disturbance processes to students and the wider public.

```{r}
#| label: fig-shiny
#| echo: false
#| message: false
#| warning: false
#| code-summary: Code (Screenshots from Shiny exhibit)
#| fig-cap: Screenshots taken from the final deployed product (guided museum exhibit) designed in Shiny.
#| fig-subcap:
#|   - "Tab 2: Explore Past Cyclones. Users are prompted to: 1. Use the dropdown to select a past cyclone, displaying the cyclone track and some key statistics; 2. Choose a before/after comparison; and 3. Choose to see live or dead coral cover."
#|   - "Tab 3: Design Your Own Cyclone. Users are able to customise some provided wave/cyclone parameters. Using our Random Forest regression model, they explore how their chosen combination influences the predicted coral cover change across the reef. Users can also label their cyclone with their own name!"
#| layout-nrow: 2

knitr::include_graphics("www/shiny-1.png")
knitr::include_graphics("www/shiny-2.png")
```

# Discussion

By integrating long‑term coral cover monitoring with high-resolution cyclone and wave data, this study demonstrates the importance of **significant wave height**, **cyclone duration**, and **wave power** as key predictors of coral cover change due to TC exposure in the CBG.

## Predictors of Cyclone Damage

Overall, our results support the view that wave exposure is the dominant mechanism of physical disturbance on reefs during TCs [@Puotinen2016; @Madin2014]. Random Forest SHAP values indicated higher mean significant wave height $\mathbf{H}_\mathbf{s}$ contributed to greater predicted coral loss, while longer cyclone (i.e. extreme wave) duration amplified these effects.
The finding that wave power was among the most influential predictors supports previous work showing that energy flux, rather than height alone, determines the likelihood of colony dislodgement and rubble production [@Madin2006; @Puotinen2016].

Secondarily, minimum cyclone distance and duration contributed to predicting coral cover loss, though with more variable effects. Longer duration cyclones produced more hours of elevated wave conditions, increasing the likelihood of cumulative damage. This aligns with existing studies that characterise cyclone duration as a key indicator for predicting damage [@Dixon2022]. Notably, proximity alone was not a strong predictor of coral loss, supporting the view that cyclone characteristics do not necessarily translate to reef-scale damage unless accompanied by high wave energy [@Puotinen2016]. This reinforces the value of wave-based metrics over cyclone-based metrics for predicting ecological impacts on reefs.

## Model Performance

The relatively low R^2^ values across all of the models tested highlight the inherent difficulty of predicting coral cover change from physical variables alone. Coral recovery trajectories, pre‑existing colony morphology, reef orientation, and biological interactions all influence post‑cyclone outcomes but were not included in the dataset [@Death2012; @Hughes2017].

Nevertheless, the ability of Random Forest to explain nearly 40% of the variance in coral cover change using only wave and cyclone metrics is notable. It suggests that wave exposure metrics derived from hindcast models can serve as meaningful predictors of immediate cyclone damage, supporting the development of reef‑scale vulnerability assessments and early‑warning tools for managers and communities.

## Limitations and Future Directions

Several limitations must be acknowledged. Firstly, the dataset contained only 152 cyclone‑reef observations, limiting model generalisability and increasing sensitivity to outliers.
Secondly, repeated observations from the same reefs introduce potential non‑independence, which may influence variable importance estimates. Thirdly, coral cover was measured annually or biannually, meaning that the temporal resolution of ecological data is coarse relative to the hourly wave and cyclone data [@AIMS2025]. This mismatch may obscure short‑term recovery or delayed mortality.

Future work could incorporate:

- Higher‑resolution ecological data, such as photogrammetry or colony‑level surveys
- Additional environmental variables, including thermal stress, water quality, antecedent ecological condition [@Hughes2017], and biological disturbances such as crown-of-thorns starfish outbreaks
- Morphology‑specific vulnerability metrics, given differing mechanical tolerances [@Madin2006]
- Hydrodynamic modelling to estimate near‑bottom orbital velocities directly [@Lowe2009]

# Conclusion

This study showed wave exposure (mean significant wave height, cyclone duration, and wave power) to be the primary predictor of cyclone‑related coral cover loss in the Capricorn Bunker Group. By integrating long‑term ecological monitoring with IBTrACS cyclone data and CAWCR wave hindcasts, the project demonstrates that reef‑specific wave climates can meaningfully explain immediate disturbance impacts. Although predictive performance was moderate, the Random Forest model captured a substantial portion of the variance in coral cover change, highlighting the value of wave‑based metrics for forecasting cyclone impacts on reefs. Our results therefore indicate that this methodology may offer an effective, scalable approach to modelling cyclone damage on reefs.

# Student Contributions

All group members contributed collaboratively to the completion of the project. Claire was responsible for the original research question, and a further clarifying literature review was conducted by Camryn, Isobel and Yindi to guide the direction of the question.
Yindi, Laashan, and Sara were responsible for generating the data. Sara and Laashan performed the data merging, while Bin and Evelyn handled data processing. Sara, Laashan, Bin, and Evelyn began the initial modelling, after which Bin and Evelyn focused on finalising hyperparameter tuning, model comparisons, results, and accompanying figures. Daniel was responsible for conducting t-tests, developing the Shiny application, and managing the coding and deployment of the final Shiny app product, with the help of Claire, Isobel, Yindi, and Camryn for content and use guidance. All members contributed to the writing and editing of the report, as well as discussions and decision-making, and agree that this statement accurately reflects their individual contributions.

**AI acknowledgement**

Microsoft Copilot under the University of Sydney enterprise subscription was used to improve aesthetics, keep consistent formatting and debug the Shiny application.

The Generative AI output was not directly used in any part of the report.

# References

::: {#refs}
:::

# Appendix

## Merging Datasets {#sec-merging-datasets .appendix}

All netCDF wave files were combined into a single dataset, with each wave point linked to its corresponding reef. This produced the final wave dataset, **“combined_wave_data.csv”**, which included significant wave height, peak wave frequency, wave direction, and related variables.

The wave dataset (**“combined_wave_data.csv”**) was then merged with the IBTrACS cyclone dataset (**“ibtracs.since1980.list.v04r01.csv”**) by matching observations on date, hour, and minute. The spatial distance between reef wave points and cyclone locations was calculated using the Haversine formula and stored as a new variable.
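
A minimal sketch of this distance step is given below, assuming coordinates in decimal degrees and a mean Earth radius of 6371 km. The function name `haversine_km()`, the column names, and the example coordinates are illustrative assumptions rather than the project's actual merging code.

```r
# Haversine great-circle distance in kilometres (illustrative sketch).
# Assumes decimal-degree coordinates and a mean Earth radius of 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical reef location and cyclone track points (not project data)
reef <- list(lat = -23.5, lon = 152.0)
track <- data.frame(
  lat = c(-15.0, -20.0, -23.0),
  lon = c(160.0, 155.0, 153.0)
)

# Distance from each track point to the reef, stored as a new variable
track$distance_km <- haversine_km(reef$lat, reef$lon, track$lat, track$lon)

# Keep only cyclone observations within 1000 km of the reef
near_reef <- track[track$distance_km <= 1000, ]
```

Because the function is vectorised, the same call works on a full merged table of reef and cyclone coordinates without an explicit loop.
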
Only cyclone observations within 1000 km of reef locations were retained, resulting in the final combined dataset, **“combined_wave_cyclone.csv”**.

The dataset (**“combined_wave_cyclone.csv”**) was then merged with the AIMS manta tow coral cover dataset (**“capricornbunkermantatowdata.csv”**) for use in the final modelling analysis, producing **“merged_coral_cover.csv”**. The datasets were joined using **Reef_ID** as the common key, matching manta tow survey records to each cyclone–reef observation in the wave–cyclone dataset.

Additional variables were then derived for analysis, including mean and maximum wave height, the number of hours during which wave height exceeded thresholds of 2.5 m, 3 m, and 4 m, wave direction, wave period, and duration of cyclone impact. Further calculations included minimum distance between cyclone centre and reef, as well as mean and maximum wave energy and wave power.

## Multicollinearity Heatmap Plot {#sec-multicollinearity-heatmap .appendix}

```{r fig.height=8, fig.width=10}
#| label: fig-correlation-heatmap
#| message: false
#| warning: false
#| code-summary: Code (Correlation Heatmap)
#| fig-cap: Pearson correlation heatmap of cyclone exposure variables used in the modelling process. Red colours indicate positive correlations, while blue colours indicate negative correlations.
#|   Darker red cells indicate stronger positive correlations and highlight potential multicollinearity among several wave height and energy-related predictors.

correlation_matrix <- cor(
  all_data,
  use = "pairwise.complete.obs",
  method = "pearson"
)

# Keep only lower triangle
correlation_matrix[upper.tri(correlation_matrix)] <- NA

correlation_long <- as.data.frame(as.table(correlation_matrix)) %>%
  as_tibble() %>%
  rename(
    variable_y = Var1,
    variable_x = Var2,
    correlation = Freq
  ) %>%
  filter(!is.na(correlation)) %>%
  mutate(
    variable_x = factor(variable_x, levels = all_vars),
    variable_y = factor(variable_y, levels = rev(all_vars)),
    correlation_label = sprintf("%.2f", correlation)
  )

ggplot(correlation_long, aes(x = variable_x, y = variable_y, fill = correlation)) +
  geom_tile(color = "white", linewidth = 0.5) +
  geom_text(aes(label = correlation_label), size = 3) +
  scale_fill_gradient2(
    low = "#2166AC",
    mid = "white",
    high = "#B2182B",
    midpoint = 0,
    limits = c(-1, 1),
    name = "Pearson\ncorrelation"
  ) +
  coord_fixed() +
  labs(
    title = "Correlation Heatmap of Cyclone Exposure Variables",
    x = NULL,
    y = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", size = 15),
    axis.text.x = element_text(angle = 45, hjust = 1, vjust = 1),
    axis.text.y = element_text(hjust = 1),
    panel.grid = element_blank()
  )
```

## Final Variables and their Descriptions {#sec-final-vars .appendix}

| Variable Name             | Description                                 |
|---------------------------|---------------------------------------------|
| **`Mean_hs`**             | Mean significant wave height (m) |
| **`Intervals_hs_gt3`**    | Hours of waves \> 3 metres significant wave height (hours) |
| **`Mean_fp`**             | Mean peak frequency (Hz) |
| **`Mean_dir`**            | Mean wave direction from North (°) |
| **`Min_distance_km`**     | Minimum distance between cyclone and reef (km) |
| **`Duration_hours`**      | Duration of time cyclone was in 1000 km radius of reef (hours) |
| **`Mean_P`**              | Mean wave power (kW/m) |

: Final wave and cyclone derived variables in the
Random Forest regression model
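
As a rough illustration of how several of the variables in the table above might be derived, the sketch below summarises hourly wave records for a single cyclone–reef pair. The input column names (`hs`, `fp`, `distance_km`), the toy values, and the assumption of one record per hour are hypothetical and do not come from the project's data pipeline. `Mean_dir` and `Mean_P` are omitted because the report does not state how direction was averaged or how wave power was computed.

```r
# Toy hourly records for one cyclone-reef pair (hypothetical values).
# Assumes one row per hour while the cyclone is within 1000 km of the reef.
hourly <- data.frame(
  hs = c(1.8, 2.6, 3.2, 3.5, 2.9, 2.0),          # significant wave height (m)
  fp = c(0.12, 0.11, 0.10, 0.10, 0.11, 0.12),    # peak frequency (Hz)
  distance_km = c(950, 700, 420, 380, 610, 900)  # cyclone-reef distance (km)
)

# Collapse the hourly records into one row of exposure metrics
exposure <- data.frame(
  Mean_hs = mean(hourly$hs),
  Intervals_hs_gt3 = sum(hourly$hs > 3),      # hours with Hs above 3 m
  Mean_fp = mean(hourly$fp),
  Min_distance_km = min(hourly$distance_km),
  Duration_hours = nrow(hourly)               # hours within the 1000 km radius
)
```
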