Executive Summary
This project uses forecasting techniques to predict commercial catch levels of sea scallops, with a specific focus on the scalloping industry in Massachusetts. The primary aim is to provide accurate predictions of the quantity of scallops landed, which is vital for stakeholders reliant on the fishery. Using historical data from the National Oceanic and Atmospheric Administration and relevant economic and regulatory variables, several models were developed and tested. The best-performing models relied on regulatory measures on catch and 1-year lagged revenues as predictors.
The utility of this forecast is tied to the clear economic importance of sea scallops in this region’s fishing sector. The commodity generated approximately $317 million in revenue in Massachusetts in 2023, more than 50% of all fishery revenues in the state that year. Historical data from 1950 to 2023 show substantial fluctuations in sea scallop landings, which underscores the need for robust forecasting methods that can cope with the uncertainties of the fishing industry.
The analysis begins by defining the dependent variable, which is the total metric tons of scallops landed. Both the volume of scallops caught and the average annual price per pound have grown over time, accelerating after 2000. Exports of scallops increased from 1990 to 2015 and then declined even as landings and prices rose, which suggests that exports might not be a reliable predictor of catch volume. Domestic demand was captured using input-output data from the Bureau of Economic Analysis: annual value added in the industries with the highest demand for fish products was used as an independent variable. Regulatory factors such as the number of days at sea and possession limits also play a significant role in shaping the scallop fishery. An index of restrictiveness was created to capture these regulations in a single measure that spans the full sample. Ultimately, the possession limit proved to be the best predictor among those considered.
Following the exploratory data analysis, the modeling phase uses several statistical techniques to forecast future landings from historical data. The initial models incorporate variables such as the previous year’s revenue total and daily harvest limits, with the second model achieving a notably higher R-squared value of 0.9, indicating a strong fit. The prediction of 2023 total catch was within about 10% of the actual data. For 2024, the forecast estimates that catch will decline by 5% to roughly 11.4k metric tons. This minor decrease is largely attributable to the possession limit (12,000 pounds) remaining the same as in 2023 and to slightly decreased revenues in the previous year.
This project has potentially useful implications for the planning capabilities of the scalloping fleet. Using a model like this, fleet operators can anticipate market conditions and adjust their fishing efforts accordingly, improving catch rates and profitability. Furthermore, insights into economic indicators and species revenue can support more informed resource allocation, so that fishing efforts are concentrated on the most lucrative species and the best timing. Regulators could better understand the effects of policy changes to measures like days at sea and catch limits. Finally, stakeholders reliant on a steady supply of the product (food manufacturers, restaurants, etc.) can plan their operations more effectively with an advance signal of future catch.
Introduction
The scalloping industry plays a pivotal role in the economy of northeast US coastal regions, particularly in Massachusetts, where sea scallops generated approximately $317 million in revenue in 2023—over half of the state’s total fishery revenues. The industry faces uncertainties due to fluctuating catch levels and complex market dynamics. This project uses predictive analytics to forecast commercial sea scallop catch levels. By leveraging historical data from the National Oceanic and Atmospheric Administration (NOAA), alongside economic and regulatory variables, the analysis aims to provide stakeholders with accurate and actionable estimates of future landings. Reliable predictions are critical for the fishing fleet, resource managers, and downstream industries reliant on a steady supply of scallops.
The value of this forecasting effort lies in its ability to inform strategic decisions for the scalloping fleet and the broader seafood supply chain. The models developed in this project use predictors like regulatory measures (e.g., possession limits and number of fishing days allowed), demand for fish products, and fishery revenues from one year prior to estimate future catch levels. Possession limits, or the total allowable weight on a given trip, emerged as a particularly strong predictor, highlighting the influence of regulatory factors on fishery outcomes. Using these models, the 2023 catch prediction came within about 10% of actual landings, demonstrating the models’ potential for informing decision-making.
Exploratory data analysis
Defining the dependent variable
Sea scallops are the highest-revenue commercial catch in Massachusetts, and this holds across multiple years. Scallops are therefore economically significant for the region, and predictions of the fleet’s performance are of particular interest.
ACF diagnostics show some autocorrelation in the raw data. After taking the first difference, the ACF looks much closer to that of white noise. This will be important when fitting an ARIMA specification in the modeling section.
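The effect of differencing on the ACF can be illustrated with a minimal Python sketch. The series below is a simulated random walk, not the NOAA landings data, and the helper names `acf` and `first_difference` are hypothetical; the report’s own diagnostics were produced in R.

```python
import random

def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]

def first_difference(series):
    """Year-over-year changes: y[t] - y[t-1]."""
    return [b - a for a, b in zip(series, series[1:])]

# Simulated random walk with drift standing in for 74 annual observations
# (an assumption for illustration, not the actual landings series).
random.seed(1)
levels = [10.0]
for _ in range(73):
    levels.append(levels[-1] + random.gauss(0.1, 1.0))

raw_acf = acf(levels, 5)
diff_acf = acf(first_difference(levels), 5)
print("raw lag-1 autocorrelation:        ", round(raw_acf[1], 2))
print("differenced lag-1 autocorrelation:", round(diff_acf[1], 2))
```

For a series of this kind, the raw lag-1 autocorrelation is typically large while the differenced one sits near zero, which is the pattern described above.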
Exports of scallops
Both the metric tons landed and average annual price per pound grow over the period and accelerate after 2000. One metric that can be indicative of total demand for scallops is the annual exports.
Interestingly, spending among the top importers of scallops from the US trended upward from 1990 to 2015 and then shifted sharply downward. This slowdown in exports coincides with a period of rising prices and greater volume of landings, which suggests that exports may not be a strong predictor of catch. Still, total exports are retained as a candidate predictor variable in the models.
Regulation of the industry
Major factors in the actions of the Massachusetts scallop fishing fleet are the number of days vessels are allowed at sea, the quantity of scallops they may possess, and the territories in which they are allowed to harvest. Days at sea (DAS) and possession limit data are available from regulatory bodies going back to 1999.
The regulatory data show a few interesting trends. While total days at sea have declined steadily over time, possession limits (the amount a vessel is allowed to hold onboard at one time) fluctuate. Together these factors drive the total annual allowed harvest, which has followed a rough “M” pattern. One important caveat is that these regulations took effect in the late 1990s, so using them in a model dramatically reduces the number of observations. To avoid that loss, an index of restrictiveness is later created in which all values prior to 1999 are 0.
Demand from other sectors
Input-output (IO) data from the BEA were used to identify domestic drivers of demand. Industries that account for high shares of total output in the fishing, hunting and trapping industry were isolated by sorting the total demand for that industry (contained in the rows of the IO table). The industries most reliant upon fishing, hunting and trapping were food manufacturing and food services establishments. These findings are intuitive: those industries would naturally make more use of products like fresh or frozen scallops. The performance of these industries could be used to predict the total demand for, and total commercial landings of, scallops. Final demand, as measured by personal consumption expenditures, is also considered.
Feature creation: restrictiveness index
The index of regulatory restrictiveness is a weighted average of days at sea and the possession limit, both factors in how much the scalloping fleet can harvest in a year. Using those variables alone limits the observations we can include in the model, as they were instituted beginning in 1999. The index is structured so the maximum number of observations (years 1950-2023) can be used in a model.
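One way such an index could be built is sketched below in Python. The report does not state the weights, the scaling, or the direction of the index, so the equal weights, the division by each measure’s maximum, and the function name `restrictiveness_index` are all assumptions, and the input values are made up for illustration.

```python
def restrictiveness_index(records, w_das=0.5, w_limit=0.5, first_regulated_year=1999):
    """Weighted average of scaled days at sea and possession limit.

    Years before `first_regulated_year` get 0, as described in the text.
    Scaling each measure by its maximum is an assumption made here so the
    two measures are on comparable units before averaging.
    """
    regulated = [r for r in records if r["year"] >= first_regulated_year]
    max_das = max(r["das"] for r in regulated)
    max_limit = max(r["limit"] for r in regulated)

    index = {}
    for r in records:
        if r["year"] < first_regulated_year:
            index[r["year"]] = 0.0
        else:
            index[r["year"]] = (
                w_das * r["das"] / max_das + w_limit * r["limit"] / max_limit
            )
    return index

# Hypothetical inputs, not actual regulatory values.
records = [
    {"year": 1997, "das": None, "limit": None},
    {"year": 1998, "das": None, "limit": None},
    {"year": 1999, "das": 120, "limit": 10_000},
    {"year": 2000, "das": 100, "limit": 12_000},
    {"year": 2001, "das": 80, "limit": 12_000},
]
index = restrictiveness_index(records)
for year, value in sorted(index.items()):
    print(year, round(value, 3))
```

Because pre-1999 years are coded as 0, any model that takes the logarithm of the index needs a rule for those zeros; the sketch does not attempt to reproduce how the report handled that.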
Correlations
Food manufacturing GDP and the possession limit are positively correlated with the total tons landed. The proxy variable for restrictiveness is negatively correlated. These are the variables used in the regression forecasts.
Modeling
Training
Training of the models used all years up to 2015.
The first model estimates landings using the previous year’s revenue total and the total harvest allowed per day. The previous year’s revenues, influenced by favorable prices for scallopers, could affect the number of market entrants from year to year. While the fit looks reasonable, the limited number of observations means the model may be overfitting and may not perform well on test data. The model does have a relatively high R-squared (0.84 in the regression output), and both predictors are significant at the 95% level.
\[\log(\text{Metric tons}) = \beta_0 + \beta_1 \log(\text{harvest per day}) + \beta_2 \log(\text{Dollars}_{t-1}) + \epsilon\]
The second model has a similar specification but uses the possession limit in place of the harvest allowed per day. Its R-squared is higher (0.9), and all of its variables are significant.
\[\log(\text{Metric tons}) = \beta_0 + \beta_1 \log(\text{possession limit}) + \beta_2 \log(\text{Dollars}_{t-1}) + \epsilon\]
The third model uses the restrictiveness index and the previous year’s total revenue.
\[\log(\text{Metric tons}) = \beta_0 + \beta_1 \log(\text{restrictiveness index}) + \beta_2 \log(\text{Dollars}_{t-1}) + \epsilon\]
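Because all three regressions are log-log, their slope coefficients can be read as elasticities. The Python sketch below uses the second model’s estimates as they appear in the regression output in this section (0.77609 on the possession limit, 0.25240 on lagged revenue); the 10% scenarios and the helper name `implied_pct_change` are hypothetical.

```python
# Coefficients copied from the second model's regression output.
BETA_POSS_LIMIT = 0.77609
BETA_LAGGED_REVENUE = 0.25240

def implied_pct_change(pct_change_x, beta):
    """Percent change in landings implied by a percent change in one predictor,
    holding the other predictor fixed, in a log-log model."""
    return ((1 + pct_change_x / 100) ** beta - 1) * 100

# Hypothetical scenarios: a 10% increase in each predictor, one at a time.
limit_effect = implied_pct_change(10, BETA_POSS_LIMIT)
revenue_effect = implied_pct_change(10, BETA_LAGGED_REVENUE)
print(f"+10% possession limit -> {limit_effect:.1f}% change in landings")
print(f"+10% lagged revenue   -> {revenue_effect:.1f}% change in landings")
```

Under these estimates, a 10% higher possession limit corresponds to roughly 7.7% more landings, and 10% higher lagged revenue to roughly 2.4% more, which matches the report’s finding that the possession limit is the stronger driver.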
The final model is an ARIMA specification. The fit appears very strong, which makes sense because ARIMA models are suited to autocorrelated series like this one. The series is differenced once because, as noted in the exploratory analysis, the ACF of the first-differenced outcome variable resembled white noise.
## Series: Metric_Tons
## Model: TSLM
## Transformation: log(Metric_Tons)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.21143 -0.05334  0.02108  0.09755  0.19650
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -0.21171    1.16409  -0.182 0.858494
## log(AA_Harvest)       0.30923    0.06711   4.608 0.000491 ***
## lag(log(Dollars), 1)  0.32776    0.04951   6.620 1.66e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1255 on 13 degrees of freedom
## Multiple R-squared: 0.8399, Adjusted R-squared: 0.8153
## F-statistic: 34.1 on 2 and 13 DF, p-value: 6.7334e-06
## Series: Metric_Tons
## Model: TSLM
## Transformation: log(Metric_Tons)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.17529 -0.02917  0.02147  0.05654  0.14995
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -2.89155    1.10232  -2.623   0.0211 *
## lag(log(Dollars), 1)  0.25240    0.03916   6.445 2.18e-05 ***
## log(Poss_Limit)       0.77609    0.11226   6.913 1.06e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09417 on 13 degrees of freedom
## Multiple R-squared: 0.9099, Adjusted R-squared: 0.896
## F-statistic: 65.61 on 2 and 13 DF, p-value: 1.6102e-07
## Series: Metric_Tons
## Model: TSLM
## Transformation: log(Metric_Tons)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -1.0909 -0.4095  0.1203  0.3615  0.8081
##
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)                         5.57002    1.01655   5.479 8.59e-07 ***
## lag(log(Dollars), 1)                0.20374    0.05382   3.786 0.000352 ***
## log(RestrictivenessIndex_weighted)  0.05056    0.02149   2.352 0.021890 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4837 on 61 degrees of freedom
## Multiple R-squared: 0.4068, Adjusted R-squared: 0.3874
## F-statistic: 20.92 on 2 and 61 DF, p-value: 1.2075e-07
## Series: Metric_Tons
## Model: ARIMA(1,1,0)
## Transformation: log(Metric_Tons)
##
## Coefficients:
##          ar1
##       0.3171
## s.e.  0.1188
##
## sigma^2 estimated as 0.05727: log likelihood=1.16
## AIC=1.69 AICc=1.88 BIC=6
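The ARIMA(1,1,0) output above implies a simple forecasting recursion on the log scale: each forecast change equals the `ar1` coefficient times the previous change. The Python sketch below applies that recursion with the estimated `ar1` of 0.3171; the two starting landings values are hypothetical placeholders, and the back-transform is a plain exponential with no bias adjustment.

```python
import math

AR1 = 0.3171  # ar1 estimate from the ARIMA(1,1,0) output

def forecast_arima_110(log_last, log_prev, phi, horizon):
    """Point forecasts on the log scale for an ARIMA(1,1,0) with no drift term:
    y[t+1] = y[t] + phi * (y[t] - y[t-1])."""
    forecasts = []
    prev, last = log_prev, log_last
    for _ in range(horizon):
        nxt = last + phi * (last - prev)
        forecasts.append(nxt)
        prev, last = last, nxt
    return forecasts

# Hypothetical last two annual landings in metric tons (illustration only).
log_prev, log_last = math.log(10_000), math.log(11_000)
log_path = forecast_arima_110(log_last, log_prev, AR1, horizon=3)
tons_path = [math.exp(v) for v in log_path]  # simple back-transform
print([round(t) for t in tons_path])
```

Because the coefficient is below 1 in absolute value, each forecast step is smaller than the one before it, so the forecast path flattens out rather than extrapolating the latest change indefinitely.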
Forecast/Test
Now that the models are trained and forecasts for the test period are complete, it is useful to see how each model’s predictions compare to the actual data in the test period.
While it is helpful to see the different models’ estimates in the test period alongside the actual results, it is not obvious which model tracks the data most closely. Model 1 appears to track the actual data more closely at the beginning of the forecast period, while models 2 and 3 are far closer from 2020 to 2023. To compare the models more clearly, we use the root mean squared error (RMSE). RMSE penalizes large errors more heavily than small ones, and because it is expressed in the same units as the original data (metric tons), it is intuitive and easy to interpret when assessing model performance.
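The RMSE comparison can be sketched in a few lines of Python. The actual and predicted values below are made-up numbers chosen to show how one large miss is penalized, not the report’s test-period results, and `rmse` is a hypothetical helper name.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error, in the same units as the inputs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical landings in metric tons (illustration only).
actual = [10_000, 12_000, 11_000]
model_a = [9_000, 12_500, 11_500]   # several moderate misses
model_b = [10_000, 12_000, 8_000]   # two exact hits, one large miss

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)
print(f"model A RMSE: {rmse_a:.0f} metric tons")
print(f"model B RMSE: {rmse_b:.0f} metric tons")
```

In this example model B is exact in two of three years, yet its single 3,000-ton miss gives it the higher RMSE, which is the penalty on large errors described above.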
RMSE using the test data is lowest for model 2, so it is the initial pick. Additional testing is needed to refine the model. Specifically, attempts at addressing autocorrelation are implemented below by adding an AR(1) term to model 2. Issues with multicollinearity stemming from adding the term complicate the choice.
## Series: Metric_Tons
## Model: LM w/ ARIMA(1,0,0) errors
## Transformation: log(Metric_Tons)
##
## Coefficients:
##           ar1  lag(log(Dollars), 1)  log(Poss_Limit)  intercept
##       -0.1615                0.2521           0.7806    -2.9274
## s.e.   0.2806                0.0313           0.0920     0.9002
##
## sigma^2 estimated as 0.001849: log likelihood=16.92
## AIC=-23.85 AICc=-22.83 BIC=-12.98
## [1] 5068.5452 3897.0556 787.1276 2616.5980 2792.3997 2275.0285 746.2515
## [8] 3596.0394 1177.9131
Model selection
Model 2.1 has improved RMSE, but the addition of the autoregressive term adds multicollinearity. The previous year’s catch, which the AR term brings in, is highly correlated with the previous year’s revenues. The variance inflation factor on the AR term is far higher than 5, a threshold often used for assessing multicollinearity. While Model 2.1’s predictive performance is better, it strains the assumption of linear independence among the independent variables. Model 2 in its original form is therefore the final choice. For reference, the Model 2.1 specification that was set aside is:
\[\log(\text{Metric tons}) = \beta_0 + \beta_1 \log(\text{possession limit}) + \beta_2 \log(\text{Dollars}_{t-1}) + AR(1) + \epsilon\]
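The variance inflation factor mentioned above is defined as 1 / (1 - R²), where R² comes from regressing one predictor on the others. With only two predictors, R² is the squared correlation between them, which the Python sketch below uses. The data are made up, the helper names are hypothetical, and a model with more than two predictors would need the full auxiliary regression instead.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def vif_two_predictors(x1, x2):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Hypothetical predictors (illustration only).
moderate = vif_two_predictors([1, 2, 3, 4], [1, 3, 2, 4])
near_collinear = vif_two_predictors(
    [1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
)
print(f"moderately correlated predictors: VIF = {moderate:.2f}")
print(f"nearly collinear predictors:      VIF = {near_collinear:.1f}")
```

The first pair has a correlation of 0.8 and a VIF of about 2.8, below the threshold of 5; the second pair moves almost in lockstep and lands far above it, the situation described for the AR term.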
Data is available on 2024 possession limits and revenues from 2023, so the model can predict the 2024 total catch in metric tons. According to the chosen model, the fishery will land an estimated 11,434 metric tons, a decline of 5% from 2023’s total of roughly 12,000 metric tons.
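As a rough cross-check, the second model’s coefficients from the training output can be applied by hand, as in the Python sketch below. It assumes the revenue variable is measured in dollars and uses the report’s approximate 2023 figure of $317 million with the 12,000-pound possession limit. The reported forecast may come from a refit or a bias-adjusted back-transform, so this is not expected to reproduce 11,434 exactly.

```python
import math

# Coefficients copied from the second model's regression output.
INTERCEPT = -2.89155
BETA_LAGGED_REVENUE = 0.25240
BETA_POSS_LIMIT = 0.77609

def predict_landings(lagged_revenue_dollars, possession_limit_lb):
    """Back-of-envelope point prediction in metric tons from the log-log model."""
    log_tons = (
        INTERCEPT
        + BETA_LAGGED_REVENUE * math.log(lagged_revenue_dollars)
        + BETA_POSS_LIMIT * math.log(possession_limit_lb)
    )
    return math.exp(log_tons)  # simple back-transform, no bias adjustment

# Assumed inputs: approximate 2023 revenue and the 2024 possession limit.
pred_2024 = predict_landings(317e6, 12_000)

# Percent change implied by the reported forecast versus roughly 12,000 tons in 2023.
pct_change = (11_434 - 12_000) / 12_000 * 100

print(f"hand-computed 2024 prediction: {pred_2024:,.0f} metric tons")
print(f"reported forecast vs 2023:     {pct_change:.1f}%")
```

The hand computation lands in the same neighborhood as the reported forecast, and the percent change works out to about -4.7%, consistent with the rounded 5% decline.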
Discussion of results
The findings indicate that the approach was reasonably accurate in estimating annual scallop landings, particularly when regulatory factors were included. Significant relationships were observed between catch levels and regulatory measures such as days at sea and harvest limits. While economic variables like national GDP and PCE were also expected to be useful predictors of demand and thereby of total catch, their effects were less pronounced than anticipated. The heavy regulations, aimed at sustaining the fishery in the long run, appear to be the biggest influence on fishing activity.
Factors like advancements in fishing technology and open fishing areas were not accounted for and could have influenced the forecasts. Changes to fishing techniques could influence the yield per trip, making fishing more efficient and offsetting the effects of fewer days at sea or lower possession limits. Furthermore, in addition to controlling the quantity of catch and the days allowed at sea, regulators control what areas the fleet is allowed to fish year to year. Areas are blocked off to allow the biomass to build back up after being harvested. The implications of opening and closing different areas of the scallop grounds are easy to grasp. Some areas are more populated with the target species than others, which can explain fluctuations in catch. When areas are opened for the first time in many years, total catch growth may accelerate. Unfortunately, data on these topics is not available in ways that are easy to incorporate into the model.
Another limitation of this modeling effort is the frequency of the data and the number of observations used. While the annual catch data stretch back to 1950, regulatory data begin only in 1999. Given the strong relationship between regulatory rules and catch totals, it is important to incorporate them into the predictions. But doing so comes at a cost, leaving only 25 or so observations in total. While the NOAA website states that monthly data exist on these same concepts (catch, revenues, etc.), they are not available through the data query tool. Budget limitations and confidentiality concerns have pushed the higher-frequency data out of public view. The data are available upon request, and NOAA officials did respond to my initial request, but by the time they were able to send the data, the project would have been past due.
Conclusion
This study offers an approach to forecasting sea scallop landings in Massachusetts. It emphasizes the important role of regulatory measures in determining catch levels and offers practical insights for policymakers and a variety of industry stakeholders.
To build on this work, future research could incorporate additional predictors such as advancements in fishing technology, open fishing areas, and environmental indicators like ocean temperatures. Higher-frequency data (e.g., monthly) could add reliability to the model. Another addition to the modeling effort could be scenario analysis. Given a range of economic and regulatory outcomes, how might the catch total deviate in the short and medium term? How might different degrees of environmental degradation affect the fishery? Answers to these questions could make the results of the analysis even more useful.
The prediction of 2023 total catch was within about 1,000 tons of the actual reported total. For 2024, the forecast estimates that catch will decline by 5% to roughly 11.4k metric tons. This minor decrease is largely attributable to the possession limit (12,000 pounds) remaining the same as in 2023 and to slightly decreased revenues in the previous year.