The Raleigh Metropolitan Statistical Area (MSA) is one of the fastest-growing MSAs in the United States. As seen below in Figure 1.1, the metro area has experienced tremendous population growth over the past 20 years, and it shows few signs of slowing. Between 2010 and 2019, the MSA grew by 23%, three times the national growth rate.1 Along with the state of North Carolina, the Raleigh MSA is also home to a significant acreage of forests, wetlands, and protected habitats, including the Appalachian Mountains. In the wake of such significant growth, it is critical that regional planning entities, such as the Capital Area Metropolitan Planning Organization (CAMPO), are proactive in their assessment of and response to the patterns and drivers that influence urban growth in the region, so as to ensure that development can find its place with minimal sprawl and destruction of ecological habitat.
The use of urban growth and land cover change modeling is of particular importance in a region where land has long been cheap and plentiful, and the use of it for human-led development has been voracious. From 1950 to 2000, the City of Raleigh’s land use grew 3.5 times faster than its population.2 Despite local attempts to rein in sprawl, including the adoption of a Unified Development Ordinance in 2016 and the recent replacement of the City’s traditional zoning code with a form-based code, the region has yet to curb the wanton consumption of undeveloped land. Figure 1.2 illustrates the City’s rapid outward growth by annexation since its establishment in the year 1792.3
Figure 1.2: Raleigh annexation since 1792. Credit: Nicole Pajor Moore, INDY Week.
An important way that planning entities can study and understand land changes in their region is through urban growth modeling (UGM). Using quantitative data and statistical models, UGM predicts spatial and temporal patterns of urban development in a given area. Growth models typically use inputs such as population growth rates, land use patterns, and infrastructure development to simulate and forecast future growth and change in urban areas. By tracing these inputs over time, urban growth models can help planners and policymakers understand how urban areas are likely to change in response to different scenarios, policies, or interventions.
For this analysis, we have used a ‘spatially-explicit’ (parcel scale) urban growth model to project and analyze growth in the Raleigh metro area over a 10-year period (2019-2029).4 Using a 4,000 ft. grid cell (fishnet) as our spatial unit of analysis, we assess land use change from 2009 to 2019 as a function of a set of explanatory variables intended to capture some of the core supply and demand drivers of development in the Raleigh MSA. Through a series of binomial (dichotomous outcome) logistic regressions, future development patterns in the MSA are predicted based on variables related to population size, land cover type, and transportation infrastructure.
Some highlights from our analysis include:
As CAMPO looks to the future and embarks on its next comprehensive planning process, the following analysis and forecast of urban growth are intended to facilitate more informed decisions and ensure that CAMPO’s future plans and investments align with the needs and priorities of the region’s residents, economy, and environment.
The following code prepared our workflows for this analysis. It includes definitions for the themes and color palettes used in visualizations and loads in several functions required for further data processing, analysis, and mapping.
This analysis is interested in land cover change over time and the various land use or demographic factors that may, or may not, influence land cover changes. Essential to this analysis is the processing of land cover data and the creation of independent variables that we hypothesize may impact development patterns over time.
Our core hypothesis assumes that land cover change is a function of both demand-side and supply-side factors, namely:
Land Cover Change
Population Growth
Highway Infrastructure
Location of Existing Development
Utilizing land cover rasters from 2009 and 2019, total population counts from 2010 and 2019, and highway infrastructure data, we establish the framework for our analysis and eventually forecast models of development patterns in the Raleigh MSA for the year 2029.
To start our analysis, we first pulled shapefiles of the geographic boundaries of the Raleigh MSA and its constituent counties: Franklin, Wake, and Johnston. Once loaded, we re-projected the data to the ‘NAD83(2011)/North Carolina (ftUS)’ coordinate reference system (CRS) to ensure uniformity, and created a spatial union of the MSA’s constituent counties to generate a study area extent and resultant map (Figure 2.1.1).
Second, we pulled in land cover raster data for the Raleigh MSA from 2009 and 2019, and cropped the rasters to the extent of the Raleigh MSA. We then plotted the cropped land cover data with each land cover type represented by a different color.
The land cover rasters were sourced from the Multi-Resolution Land Characteristics Consortium’s National Land Cover Database (NLCD) for our base years.
Using the land cover rasters from 2009 and 2019, we reclassified change in developed land by assigning values from 0 to 2 depending on whether land remained undeveloped, became developed, or was already developed between 2009 and 2019. Land cover was assigned either:
0 = Undeveloped in 2009 and in 2019;
1 = Undeveloped in 2009, but Developed in 2019; or
2 = Developed in 2009 and 2019.
The two land cover data sets are subsequently added together using a raster calculator function to create a new data set that shows the net change in land cover between 2009 and 2019. This effectively creates a binary un/developed dependent variable of development change.
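The raster addition described above can be illustrated with a small sketch. It is written in Python rather than the original workflow's language, and the two tiny rasters, the `add_rasters` helper, and the variable names are all hypothetical. It assumes each raster has first been recoded so that 1 means developed and 0 means undeveloped.

```python
# Hypothetical 2x3 binary rasters: 1 = developed, 0 = undeveloped.
developed_2009 = [[0, 0, 1],
                  [0, 1, 1]]
developed_2019 = [[0, 1, 1],
                  [1, 1, 1]]


def add_rasters(a, b):
    """Cell-by-cell sum of two equally sized rasters (lists of rows)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


change = add_rasters(developed_2009, developed_2019)
# 0 = undeveloped in both years, 1 = newly developed, 2 = developed in both years.
# Reading 1 as "newly developed" assumes no cell reverts from developed to undeveloped.

# Binary dependent variable: 1 where the cell converted, 0 otherwise.
new_development = [[1 if value == 1 else 0 for value in row] for row in change]
```

Under these assumptions the sum reproduces the 0/1/2 coding listed above, and keeping only the 1s yields the binary outcome used in the regressions.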
Figure 2.2.3 shows the resultant land cover change reclassification with only values of “1” displayed, identifying areas that were “developed” in 2019 from being “undeveloped” in 2009.
In order to better visualize the land cover development changes, we create a fishnet. Using the reclassified land cover data from 2009, we grouped the data by the fishnet grid cells, which shows the difference between developed and undeveloped land cover areas.
We then convert the raster land cover data to points and aggregate the points to the Raleigh MSA fishnet. We visualize the land cover change on the fishnet by creating a layer exclusively of areas that experienced changes in land use (i.e., have a corresponding value of ‘1’ from the previous workflow).
In order to simplify our land cover analysis, we grouped and reclassified the land cover categories using the following methodology:
| Old_Classification | New_Classification |
|---|---|
| Open Space as well as Low, Medium and High Intensity Development | Developed |
| Deciduous, Evergreen, and Mixed Forest | Forest |
| Pasture/Hay and Cultivated Crops | Farm |
| Woody and Emergent Herbaceous Wetlands | Wetlands |
| Barren Land, Dwarf Scrub, and Grassland/Herbaceous | Other Undeveloped |
| Water | Water |
In order to apply these new classifications, we utilized a function, aggregateRaster, which aggregates our list of reclassified rasters, converts them to points, and adds the resulting sum to the fishnet. The output is a single, large fishnet that summarizes all the rasters, as seen in Figure 2.3.3.
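To make the point-to-fishnet aggregation concrete, here is a minimal Python sketch. It is not the aggregateRaster function itself: `cell_index`, `aggregate_points`, the grid origin at (0, 0), and the sample points are assumptions for illustration. Only the 4,000 ft cell size comes from the text.

```python
from collections import Counter

CELL_SIZE_FT = 4000  # fishnet cell size stated in the text


def cell_index(x, y, origin_x=0.0, origin_y=0.0, size=CELL_SIZE_FT):
    """Return the (column, row) of the fishnet cell containing point (x, y)."""
    return (int((x - origin_x) // size), int((y - origin_y) // size))


def aggregate_points(points, size=CELL_SIZE_FT):
    """Sum point values per fishnet cell; points is an iterable of (x, y, value)."""
    totals = Counter()
    for x, y, value in points:
        totals[cell_index(x, y, size=size)] += value
    return dict(totals)


# Hypothetical raster points for one land cover class (1 = class present).
forest_points = [(100, 200, 1), (3900, 3999, 1), (4100, 50, 1), (8500, 4500, 0)]
forest_by_cell = aggregate_points(forest_points)
```

Running the same aggregation once per reclassified raster and joining the results by cell would give one column per land cover class, which is the shape of the summary fishnet described above.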
Total population and change in population over time are critical for modeling development demand. As population grows, so does the demand for resources. Using 5-year estimates from the American Community Survey (ACS) for 2010* and 2019, the following section visualizes changes in population over the ten-year period.
*We are using 2010 ACS data in order to maintain consistency with 2019, as census tract boundaries changed between 2009 and 2010 to the current (2019) boundaries.
By interpolating population data to the fishnet, we can create a more detailed picture of population patterns and changes over time that wouldn’t be possible with raw census data alone.
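One common way to move tract totals onto grid cells is areal weighting. The sketch below assumes that approach, which may differ from the weighting used in the original workflow. The `interpolate_population` helper, tract IDs, cell IDs, and overlap areas are all hypothetical.

```python
def interpolate_population(tract_pop, overlaps):
    """
    tract_pop: {tract_id: population}
    overlaps:  {(tract_id, cell_id): overlap_area}
    Returns {cell_id: estimated population}, splitting each tract's population
    across cells in proportion to the overlapping area.
    """
    tract_area = {}
    for (tract, _cell), area in overlaps.items():
        tract_area[tract] = tract_area.get(tract, 0.0) + area

    cell_pop = {}
    for (tract, cell), area in overlaps.items():
        share = area / tract_area[tract]
        cell_pop[cell] = cell_pop.get(cell, 0.0) + tract_pop[tract] * share
    return cell_pop


# Hypothetical tracts A and B overlapping three fishnet cells.
tract_pop = {"A": 1000, "B": 400}
overlaps = {("A", 1): 3.0, ("A", 2): 1.0, ("B", 2): 2.0, ("B", 3): 2.0}
cell_pop = interpolate_population(tract_pop, overlaps)
```

Because each tract's shares sum to one, the total population is preserved after interpolation, which is a useful sanity check on any real implementation.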
Comparing the 2010 census tract geometries with the population-weighted grid cells shows how the data change when aggregated from census tracts to the fishnet.
Transportation infrastructure serves as a vehicle (pun intended) for supply-side development and is an important indicator when studying development potential. It is also critical to study in this case, because the development of highways and the land uses that mushroom around them have long been associated with the destruction, fragmentation, and degradation of natural habitats and resources.
As a starting point in analyzing the relationship between highway development and other types of development, we plot highways and major roads in the Raleigh metro area to our fishnet.
In the following section, we calculate the distance from highways and major roads in the Raleigh area and aggregate those distances to the fishnet. The plot below shows the highway distance variable we will include in future models.
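A distance-to-highway variable of this kind can be sketched as the straight-line distance from each cell centroid to the nearest highway point. The coordinates, cell IDs, and the `nearest_distance` helper below are hypothetical, and a real workflow would measure distance to line geometries rather than to a handful of points.

```python
import math


def nearest_distance(point, targets):
    """Euclidean distance from point to the closest of the target points."""
    return min(math.dist(point, target) for target in targets)


# Hypothetical highway vertices and fishnet cell centroids, in feet.
highway_points = [(0, 0), (4000, 0), (8000, 0)]
cell_centroids = {"c1": (0, 3000), "c2": (6000, 4000)}

highway_dist = {
    cell_id: nearest_distance(xy, highway_points)
    for cell_id, xy in cell_centroids.items()
}
```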
Another important variable needed to develop our prediction model is the ‘spatial lag’ of development. This captures the propensity for development at a location based on its proximity to existing development. The spatial lag analysis is useful for understanding the spatial pattern of development in the Raleigh MSA and identifying areas with high and low levels of development. We maintain a distance parameter of ‘k=2’ to account for the variation in development density across the MSA.
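One plausible reading of the ‘k=2’ parameter is that the lag is the mean distance from each cell to its two nearest developed cells. The sketch below assumes that reading, which may not match the original code. The centroids and the `knn_mean_distance` helper are hypothetical.

```python
import math


def knn_mean_distance(point, neighbors, k=2):
    """Mean distance from point to its k nearest neighbors (assumes len(neighbors) >= k)."""
    distances = sorted(math.dist(point, neighbor) for neighbor in neighbors)
    return sum(distances[:k]) / k


# Hypothetical centroids of already-developed cells, in feet.
developed_centroids = [(0, 0), (4000, 0), (20000, 0)]

lag_near = knn_mean_distance((2000, 0), developed_centroids)
lag_far = knn_mean_distance((30000, 0), developed_centroids)
```

Under this definition a smaller value means the cell sits closer to existing development, so `lag_near` is lower than `lag_far`.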
The map below shows that the spatial lag of development sprawls across the MSA, a pattern that seems to encourage sprawl and leapfrog development and could threaten natural land in the absence of urban growth boundaries or other incentives to limit development to the city center.
Finally, we combine all the different geographic independent variables (land cover, population growth, distance to highways, and spatial lag) into a single data set for analysis. The resulting ‘dat’ data set can then be used for further logistic regression analysis to understand the relationships between different geographic features and their impact on development patterns in the Raleigh area.
For the next step, we explore how the variables impact development patterns in the Raleigh MSA. We first do this through a series of plots comparing existing and new development patterns as they relate to our independent variables.
Since our dependent variable is binary (no change or new development), the goal is to investigate whether there is a significant difference between the areas that experienced development and those that did not for each of the continuous features.
Interpreting each plot involves examining differences in the mean value of features between the two categories of development change. If the mean value is higher for “new development” compared to “no change”, then it suggests a positive association between the feature and the propensity for development. Conversely, a lower mean value for “new development” group suggests a negative association.
In this bar plot, we plotted the relationship between new development and population variables for the study area. Specifically, this evaluates the relationship between the total population in 2010, 2019, and the change in population.
We see that the change in total population observed by 2019 for the MSA is correlated with the highest comparative rate of conversion. Interestingly, distance from highways shows a stronger correlation to ‘no change’ in land use.
Next, a table of land cover conversion between 2009 and 2019 is created, providing an overview of the extent to which each land cover type has been converted to developed land. The land cover types with the highest conversion rates are most vulnerable to development and can be used to inform land management decisions. In this case, forested land cover appears to be the most susceptible to land conversions at just over 2%.
| Land_Cover_Type | Conversion_Rate |
|---|---|
| developed | 0.92% |
| farm | 1.05% |
| forest | 2.24% |
| otherUndeveloped | 0.44% |
| wetlands | 0.13% |
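A conversion-rate table like this one can be thought of as, for each 2009 land cover type, the share of cells that converted to development by 2019. The sketch below assumes that definition. The `conversion_rates` helper and the six sample cells are hypothetical and are not the data behind the table.

```python
def conversion_rates(cells):
    """
    cells: iterable of (land_cover_2009, converted) pairs, where converted is 0 or 1.
    Returns {land_cover: share of that cover's cells that converted}.
    """
    totals, converted = {}, {}
    for cover, flag in cells:
        totals[cover] = totals.get(cover, 0) + 1
        converted[cover] = converted.get(cover, 0) + flag
    return {cover: converted[cover] / totals[cover] for cover in totals}


# Hypothetical fishnet cells tagged with a dominant 2009 land cover.
sample = [("forest", 1), ("forest", 0), ("forest", 0), ("forest", 0),
          ("farm", 0), ("farm", 0)]
rates = conversion_rates(sample)
```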
In this step, we created six logistic regression models to predict development change between 2009 and 2019. The data set is split into a training set (50% of the data) and a test set (the remaining 50%) to evaluate the accuracy and generalizability of each model. The models build on each other, each more sophisticated than the last, with new variables added at every step.
First we divide the data set into two parts, a training set and a test set, for the purpose of the model evaluation. The training set is used to train the model and the test set is used to evaluate the model’s performance. By evaluating the model’s performance on data that it has not seen before (i.e., the test set), we can get a sense of how well the model will perform on new data.
## [1] 3914
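The 50/50 partition can be sketched as a seeded shuffle followed by a split at the midpoint. The `split_half` helper, the seed value, and the ten sample IDs are hypothetical. The original may instead use a stratified partition that balances the outcome across the two sets.

```python
import random


def split_half(rows, seed=42):
    """Shuffle rows reproducibly and split them into two halves."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Hypothetical fishnet cell IDs.
cell_ids = list(range(10))
train, test = split_half(cell_ids)
```

Fixing the seed keeps the split reproducible, so model comparisons are not confounded by a different partition on every run.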
The models build upon each other, starting with…
Model 1 includes only the 2009 land cover types,
Model 2 adds the lagDevelopment variable,
Model 3 adds population in 2010,
Model 4 adds the 2019 population,
Model 5 adds population change, and
Model 6 adds distance to highways.
By factoring population change between 2010 and 2019 into the regression as well as the supply-side parameter of distance to highways, the final model (Model 6) is well-specified to forecast 2029 development based on population change between 2019 and 2029 (pop_Change).
To determine the best model, the McFadden or “Pseudo” R Squared statistic is used to evaluate the goodness of fit for each model on the test set. The model with the highest McFadden R Squared value is chosen for prediction purposes.
The McFadden R-Squared tells us each model’s goodness of fit, with values closer to 1 indicating a better fit. While it works differently for a logistic regression model than for a linear regression, it is a useful measure to compare different models. In our case, we run the McFadden R-Squared values for Models 1 through 6 below, and learn that:
Overall, the McFadden R-Squared values are quite low for all models, but they at least give us a frame of reference for selecting Model 6 from our range of models, especially since it runs the gamut of both demand- and supply-side variables.
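For reference, McFadden's statistic compares the log-likelihood of a fitted model with that of an intercept-only model: one minus the ratio of the two. The sketch below computes it from observed outcomes and predicted probabilities. The helper names and the four-observation example are hypothetical, and the null model here simply predicts the sample base rate.

```python
import math


def log_likelihood(y_true, p_pred):
    """Bernoulli log-likelihood of observed 0/1 outcomes under predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1 - p)
               for y, p in zip(y_true, p_pred))


def mcfadden_r2(y_true, p_pred):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(intercept-only model)."""
    base_rate = sum(y_true) / len(y_true)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1 - log_likelihood(y_true, p_pred) / ll_null


# Hypothetical outcomes (1 = new development) and two sets of predictions.
y = [0, 0, 0, 1]
uninformative = mcfadden_r2(y, [0.25, 0.25, 0.25, 0.25])
informative = mcfadden_r2(y, [0.1, 0.1, 0.1, 0.7])
```

A model that only reproduces the base rate scores zero, and the score rises as predictions separate the two outcomes. The statistic is therefore most useful for ranking models on the same data rather than as an absolute measure of fit.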
Subsequently plotting the distribution of predicted probabilities in Model 6 reveals that the predicted probability of new development is mostly below 50%, which checks out with the low occurrence of new development in our Land Cover Change map.
The following sensitivity and specificity analysis looks at the number of False Positives and False Negatives emerging in our model. Given that the point of this model is to predict and allocate zones of future development, we want to be more certain that our model is predicting 1’s, i.e., sites of development with a greater accuracy so that it isn’t under or over predicting development.
We chose two probability thresholds, 5% and 17%, for this analysis, which correspond with the peaks observed for “new development” in the “Probability of New Development” graph above.
| Variable | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| predClass_05 | 0.77 | 0.64 | 0.76 |
| predClass_17 | 0.97 | 0.17 | 0.94 |
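The threshold comparison can be sketched as follows: predicted probabilities are converted to classes at each cutoff, then scored against observed outcomes. This sketch treats 1 (new development) as the positive class. The function names and the six sample cells are hypothetical and do not reproduce the table's values.

```python
def classify(probabilities, threshold):
    """Label a cell 1 (new development) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_metrics(observed, predicted):
    """Sensitivity, specificity, and accuracy with 1 (new development) as the positive class."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # share of observed new development predicted as such
        "specificity": tn / (tn + fp),  # share of observed no-change cells predicted as such
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical observed outcomes and predicted probabilities for six cells.
observed = [1, 1, 0, 0, 0, 0]
probs = [0.20, 0.06, 0.10, 0.04, 0.03, 0.18]
at_05 = confusion_metrics(observed, classify(probs, 0.05))
at_17 = confusion_metrics(observed, classify(probs, 0.17))
```

Lowering the threshold labels more cells as new development, which raises sensitivity at the cost of specificity. Note that if a tool treats 0 (no change) as the positive class instead, the sensitivity and specificity labels swap.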
Now, we create new variables for predictions of new development using the previously trained model (Model 6).
We then visualize the predictions for new development using the model trained earlier on historic land use data.
The code below evaluates the performance of our land cover prediction model through a confusion matrix. It produces both true positives (Sensitivity) and true negatives (Specificity) for each grid cell by threshold type. Notice how the spatial pattern of Sensitivity for both thresholds is relatively consistent, but the 5% threshold misses most of the study area with respect to Specificity.
For this case, we want to optimize for how accurately our model predicts new development areas as actually being developed, i.e. its Sensitivity rate.
It is important to note at the outset that all three counties considered in the Raleigh MSA are different in their land use and land cover characteristics. Conducting a spatial cross-validation test for all three counties allows us to test how applicable Model 6 is to all counties in spite of their differences, and how our selected thresholds of 5% and 17% probabilities predict across all counties.
The following section checks the generalizability of Model 6 across Wake, Johnston, and Franklin counties at a 17% probability threshold:
All counties exhibit a high accuracy, with Johnston County (97%) being the highest and Wake County (91%) the lowest.
Wake County has the lowest Sensitivity level at 0%, which is understandable given the concentration of development already in its boundary
The test affirms that the model can be applied to all three counties that experienced land use and land cover change in the Raleigh MSA.
| county | Observed_Change | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Franklin | 20 | 0.98 | 0.00 | 0.96 |
| Johnston | 35 | 0.99 | 0.03 | 0.97 |
| Wake | 101 | 0.96 | 0.25 | 0.91 |
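Spatial cross-validation by county is often set up as leave-one-group-out: each county is held out in turn while the model is fit on the others. The sketch below shows only that fold logic and assumes this design. The original may instead score a single fitted model county by county. The `leave_one_group_out` helper and the four sample cells are hypothetical.

```python
def leave_one_group_out(rows, group_key):
    """Yield (held_out_group, train_rows, test_rows) for each distinct group."""
    groups = sorted({row[group_key] for row in rows})
    for held_out in groups:
        train = [row for row in rows if row[group_key] != held_out]
        test = [row for row in rows if row[group_key] == held_out]
        yield held_out, train, test


# Hypothetical fishnet cells tagged with their county.
cells = [
    {"id": 1, "county": "Wake"}, {"id": 2, "county": "Wake"},
    {"id": 3, "county": "Johnston"}, {"id": 4, "county": "Franklin"},
]
folds = list(leave_one_group_out(cells, "county"))
```

Each fold's held-out rows would then be scored with the sensitivity, specificity, and accuracy measures reported in the county tables.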
What happens if we use a 5% threshold for the same analysis?
The overall accuracy for each county reduces from the 17% threshold, but the Sensitivity rates radically improve
Johnston County once again has the highest accuracy at 85% and the highest Specificity at 84%, but the lowest Sensitivity at 40%
Wake County has the lowest accuracy at 67% but also presents a high Sensitivity at 76%
This testing changes our use of the 17% probability threshold in favor of the 5% threshold, based on a higher level of Sensitivity observed with the 5% threshold. Such a selection doesn’t aim for perfection, but gives us a way to tune our model across a large MSA with varied land cover characteristics.
| county | Observed_Change | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Franklin | 20 | 0.84 | 0.55 | 0.84 |
| Johnston | 35 | 0.86 | 0.40 | 0.85 |
| Wake | 101 | 0.66 | 0.76 | 0.67 |
The population projections we utilized for 2029 are from the North Carolina Office of State Budget and Management.5 The projections show an overall increase in population for all three counties in the Raleigh MSA, the most dramatic being for Wake County, which sees a 32% increase in total population. This is consistent with the higher proportion of developed land and of new development from 2009-2019 in Wake County that we see in previous visualizations.
As the City of Raleigh, North Carolina’s state capital, is located in Wake County and, as we discussed in the introduction, has struggled with unchecked sprawl, we are not surprised by the distribution of predicted growth. The predicted concentration of sprawl seems reasonable and fitting with historic trends. Both Johnston and Franklin are significantly less developed than Wake, having no major cities. However, Johnston County is predicted to out-pace Franklin, which aligns with current development and population trends in both counties.
Notably, as can be seen in the development demands (Figure 4.5.1), the spatial pattern of predicted expansion almost perfectly traces the location of major highways in the area, which act as a facilitator, both for bridging formerly rural and undeveloped areas to the central city and for allowing development to sprawl outwards.
This section focuses on the supply-side factors of land use development, specifically from the perspective of environmental conservation. For an MSA like Raleigh where development is expanding outwards from a core county, what are the measures a local government or planning agency can take to curb the proliferation of human settlements? Can certain lands be deemed “unsuitable” for development? Is there a land ethic we, as planners, should hold ourselves accountable to in the pursuit of economic growth?
In this analysis, we apply a layer of suitability to check how our predictions for development in 2029 threaten natural habitats. We classify undeveloped and farm land as “suitable”, and forests and wetlands as “unsuitable” for new development. To see how this suitability plays out spatially, we compare Land Cover data for 2009 and 2019, and subsequently reclassify and aggregate 2019 Land Cover data to the Raleigh fishnet, exactly the way we previously did for 2009 Land Cover data.
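The suitability rule amounts to a lookup from reclassified land cover to a label. In the sketch below, the class names follow the reclassification table earlier in the report, while the `SUITABILITY` mapping and `suitability` helper are hypothetical. Classes the rule does not mention (Developed, Water) are left unlabeled rather than guessed.

```python
# Suitability labels as stated in the text; other classes are intentionally absent.
SUITABILITY = {
    "Other Undeveloped": "suitable",
    "Farm": "suitable",
    "Forest": "unsuitable",
    "Wetlands": "unsuitable",
}


def suitability(land_cover):
    """Return 'suitable', 'unsuitable', or None for classes the rule does not cover."""
    return SUITABILITY.get(land_cover)


labels = [suitability(cover) for cover in ["Farm", "Forest", "Water"]]
```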
The map below shows us how much sensitive land cover, i.e., forests and wetlands, was lost between 2009 and 2019.
We further convert wetlands and forests from the 2019 land cover raster to contiguous regions to understand exactly how they are spread out across the MSA.
Looking at how sensitive land is distributed (and lost) across counties in the plot below, we see that Franklin County contains the largest proportion of total sensitive lands and forests, while Johnston County contains the largest proportion of wetlands. Sensitive land loss has been greatest in Wake County, an unsurprising trend given that new development tends to agglomerate near existing areas of concentrated development, and Wake County also has a higher mean value of Development Demand.
What’s also interesting to note is that Johnston County sits somewhere in the middle of Wake and Franklin County (though closer to Franklin) in terms of its population growth (see Figure 7.3.1), and even surpasses Wake County in its Population Change Rate over the study time period. It simultaneously contains a high acreage of wetlands and farmland - implying that an ecologically sensitive growth policy can either make or break the future of wetlands in Raleigh.
From this graph, we can begin to formulate a preliminary framework to allocate new development rights in the Raleigh MSA.
In the final stage of our project, we check how the spatial location of predicted development demand and predicted population change correspond with the location of sensitive land. Our analysis looks at each county individually, because each of the three counties in the Raleigh MSA exhibits different patterns of land cover and growth and would therefore require different interventions and policies for conservation.
How to Read The Allocation Maps
The maps illustrate development demand and population growth in each county by fishnet cell. They are layered with land use data, where grey and yellow circles respectively mark which fishnet cells are already developed, and which contain land unsuitable for development (either forests or wetlands). The empty fishnets with a higher development demand / population growth are thus ideal to site new high-density properties.
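The reading rule for the maps can be expressed as a simple filter: keep cells with high predicted demand that are neither already developed nor flagged as sensitive. The field names, the 0.05 cutoff, the `candidate_cells` helper, and the four sample cells below are hypothetical illustrations, not values from the analysis.

```python
def candidate_cells(cells, demand_cutoff):
    """IDs of cells at or above the demand cutoff that are neither developed nor sensitive."""
    return [
        cell["id"] for cell in cells
        if cell["demand"] >= demand_cutoff
        and not cell["developed"]
        and not cell["sensitive"]
    ]


# Hypothetical fishnet cells with predicted demand and land use flags.
fishnet = [
    {"id": "a", "demand": 0.30, "developed": False, "sensitive": False},
    {"id": "b", "demand": 0.40, "developed": True, "sensitive": False},
    {"id": "c", "demand": 0.35, "developed": False, "sensitive": True},
    {"id": "d", "demand": 0.02, "developed": False, "sensitive": False},
]
sites = candidate_cells(fishnet, demand_cutoff=0.05)
```

Cells like "c" in this example, with high demand on sensitive land, are the ones the county discussions flag as points of concern.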
As the most developed county in the Raleigh MSA, Wake is mostly built out, with most of its existing properties along highways. A large proportion of “unsuitable” land also exists in Wake and should ideally not be razed for development. Parcels of land along highways towards the east contain potential for development, along with opportunities for densifying existing properties in areas projected to experience the highest population increase.
Johnston County has a substantially lower population than Wake County, and a substantially higher proportion of protected land. While the southeast portion of Johnston seems relatively safe from new development, the northwest part shows high demand for development and population growth without the availability of suitable land. This part of Johnston County directly abuts Wake County and is thus likely experiencing sprawl effects from Wake, exacerbated by the highways.
Franklin County contains the largest proportion of sensitive regions. Similar to Johnston County, the southwest edge of Franklin County that is adjacent to Wake County is at the highest risk of sensitive land loss from new development. Parcels within the county also highlight points of concern, especially those near highways where development demand is especially high and land is not suitable for development.
Elizabeth Ordonez, Carolina Demography, Oct. 30, 2020, https://www.ncdemography.org/2020/10/30/raleigh-is-the-second-fastest-growing-large-metro-in-the-united-states-behind-austin/.
Steve Goldberg, “Reigning in Sprawl-eigh,” Time Magazine, March 25, 2011. https://content.time.com/time/specials/packages/article/0,28804,2026474_2026675_2061559,00.html/.
Jasmine Gallup, “As Raleigh Grows, So Do Its City Limits,” INDY Week, March 1, 2023. https://indyweek.com/news/wake/as-raleigh-grows-so-do-its-city-limits/.
“Chapter 8. Urban Growth Models: State of the Art and Prospects” In Global Urbanization edited by Eugenie L. Birch and Susan M. Wachter, 126-150. Philadelphia: University of Pennsylvania Press, 2011. https://doi.org/10.9783/9780812204476.126.
https://www.osbm.nc.gov/facts-figures/population-demographics/state-demographer/countystate-population-projections.