The Atlanta Metropolitan Statistical Area (MSA) consists of 29 counties and grew from 2.6 million to 4.1 million people between 1990 and 2010 (1). Further, while population growth in the Atlanta area slowed during the Great Recession, between 2010 and 2019 the region added 730,000 residents (2). This growth makes the entire Atlanta MSA a critical place for planners to ensure sustainable, economically viable development as a function of available land and population growth. This analysis serves to inform the Atlanta Regional Commission (Atlanta’s metropolitan planning organization) about development trends within its 10-county region, as well as trends within the entire 29-county MSA.
Planners can gain an understanding of urban growth through an urban growth model. These typically take numerical inputs or forecasts and translate them to spatial forecasts. Urban growth models also allow planners to test different interventions against the same set of inputs to evaluate their success. The reduced form urban growth model results in the probability of land change based on a set of urban factors through a binomial logit model. Additionally, the reduced form urban growth model assumes the future will follow the trends of the short-term past, as it is calibrated by and used for two sequential time periods. Based on the population growth observed in Atlanta during the time periods in question, the reduced form model offers an opportunity to understand the development trends underpinning this change (3).
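As a point of reference, the binomial logit form mentioned above can be written as follows. The notation is an assumption introduced here for illustration: $y_i = 1$ if land observation $i$ changes from undeveloped to developed, $x_{ik}$ is the value of urban factor $k$ for that observation, and the $\beta$ terms are the fitted coefficients.

```latex
P(y_i = 1 \mid x_i) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{k} \beta_k x_{ik}\right)\right)}
```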
The following analysis considers development in the Atlanta MSA between 2001 and 2011, along with other land cover, proximity to major expressways, and population change, to build a predictive model for regional development in 2020. Once fitted, the model can generate predictions under both supply- and demand-side scenarios, based on ground-truthed patterns of development growth, to help planners determine the appropriate allocation of land for development.
This analysis will first provide a brief overview of the data gathered and cleaned. Exploratory analysis of these data against development change will offer context for the impacts of these inputs within the model itself. A brief overview of the modeling process is provided, along with a summary of the final predictive model, metrics for the model’s fit of the data, and steps taken to train the model to predict most accurately. After training, the model inputs are updated to reflect a 2010 baseline and used to predict development demand for 2020 from both demand- and supply-side perspectives. To understand the impacts of development demand change on sensitive land, the environmental sensitivity section tracks the sensitive lands lost between 2001 and 2011 and summarizes the relative development demand and available land for development by county. Finally, these summaries are used to understand the relative balance between development demand, derived from both supply and demand factors, and available land. These summaries are discussed in the allocation section, with the analysis highlighting a few key points in each county, but with the recommendation that allocation should ultimately be handled at the county level of government.
1. Lee Shearer, “Georgia Growth Super-Concentrated in Atlanta; Half State’s Counties Are Losing Population,” Savannah Morning News, accessed May 8, 2021, https://www.savannahnow.com/news/2016-06-22/georgia-growth-super-concentrated-atlanta-half-states-counties-are-losing-population.
2. Sean Richard Keenan, “Census: Metro Atlanta Packed on 730,000 More Residents in Nine Years,” Curbed Atlanta, March 31, 2020, https://atlanta.curbed.com/2020/3/31/21200613/atlanta-metro-population-census-data-growth.
3. John Landis, “Modeling Urban Growth & Land Use Change.”
Development Modeling Considerations
Many factors contribute to urban development patterns. Land developers, for example, likely consider a location’s accessibility to other developments, land availability, land prices, and proximity to transportation, among other things, when evaluating development locations. Of these, this analysis models only two developer preferences: accessibility to existing development and proximity to transportation.
This model uses the fishnet method to break the collected data into manageable units for analysis. The fishnet used contains 4,000-foot by 4,000-foot cells (referred to as “land”) covering the entirety of the Atlanta MSA. The Atlanta MSA comprises the following 29 counties: Barrow, Bartow, Butts, Carroll, Cherokee, Clayton, Cobb, Coweta, Dawson, DeKalb, Douglas, Fayette, Forsyth, Fulton, Gwinnett, Haralson, Heard, Henry, Jasper, Lamar, Meriwether, Morgan, Newton, Paulding, Pickens, Pike, Rockdale, Spalding, and Walton.
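To make the fishnet idea concrete, the sketch below builds a grid of square cells over a bounding box. It is a minimal Python illustration, not the code used in this analysis (which was run in R); the function name `make_fishnet`, the cell dictionary layout, and the example extent are all hypothetical, and coordinates are assumed to be in feet in a projected coordinate system.

```python
import math

def make_fishnet(xmin, ymin, xmax, ymax, cell_size=4000.0):
    """Return square cells covering the bounding box (units: feet, assumed).

    Each cell is a dict with an id and its (xmin, ymin, xmax, ymax) bounds.
    Cells on the top and right edges may extend past the bounding box.
    """
    n_cols = math.ceil((xmax - xmin) / cell_size)
    n_rows = math.ceil((ymax - ymin) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = xmin + col * cell_size
            y0 = ymin + row * cell_size
            cells.append({
                "id": row * n_cols + col,
                "bounds": (x0, y0, x0 + cell_size, y0 + cell_size),
            })
    return cells

# Hypothetical 10,000 ft x 8,000 ft extent: 3 columns by 2 rows of cells
fishnet = make_fishnet(0, 0, 10000, 8000)
print(len(fishnet))
```

In practice the grid would also be clipped to the MSA boundary, which this sketch does not attempt.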
Land Cover
This analysis uses land cover data from the Multi-Resolution Land Characteristics Consortium’s National Land Cover Database (NLCD) for both 2001 and 2011. These data are used to determine which land cover cells changed from undeveloped in 2001 to developed in 2011 (represented as the binary dependent variable in this analysis), with results shown in the plot below.
Population Change
Change in population is another critical input to modeling development demand. As the population grows, there will be more demand for housing and other services in the growing area. Population data from the 2000 and 2010 Decennial Census were collected and aggregated to the fishnet.
Proximity to Major Roads
Proximity to major transportation infrastructure is another indicator of development potential. This model uses Georgia Expressway data and assumes this major infrastructure has been constant since 2000 (4). The distance from each land observation to the nearest expressway was aggregated to the fishnet.
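A distance-to-expressway measure of this kind can be sketched as the shortest distance from a point to any segment of any road polyline. The Python below is only an illustration under assumptions: roads are represented as lists of (x, y) vertices in projected coordinates, each land observation is reduced to a single point such as its centroid, and the function names are hypothetical.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b (all (x, y) tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, then clamp to the segment ends
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distance_to_nearest_road(p, roads):
    """roads: list of polylines, each a list of at least two (x, y) vertices."""
    return min(
        point_segment_distance(p, line[i], line[i + 1])
        for line in roads
        for i in range(len(line) - 1)
    )

# Hypothetical straight expressway along the x-axis
roads = [[(0, 0), (10000, 0)]]
print(distance_to_nearest_road((5000, 3000), roads))
```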
Spatial Lag of Development
As mentioned previously, this analysis assumes proximity to existing development matters above other factors like price and existing land use. To measure proximity to development, a “spatial lag of development” is calculated. While this does not account for the existing land use(s), this analysis hypothesizes that the more accessible land is to existing development, the higher the likelihood it becomes a development candidate.
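The text does not spell out how the spatial lag is computed. One common formulation, assumed here purely for illustration, is the mean distance from a cell to its k nearest developed cells, so that a smaller value means closer to existing development. The function name and the choice of k in this Python sketch are hypothetical.

```python
import math

def lag_development(cell, developed, k=2):
    """Mean distance from `cell` to its k nearest developed cell centroids.

    cell: (x, y) centroid; developed: non-empty list of (x, y) centroids.
    """
    dists = sorted(math.dist(cell, d) for d in developed)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)

# Hypothetical centroids, in feet
developed = [(0, 0), (4000, 0), (20000, 0)]
print(lag_development((0, 3000), developed))
```

Under this assumed definition, a negative model coefficient on the lag would indicate that land farther from existing development is less likely to develop.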
With the observed land developed between 2001 and 2011 (lc_change), one can explore the relationship between these results and the continuous variables calculated in the data wrangling stage. Land that remained undeveloped in 2011 has a higher average distance to a highway than land that was developed. Similarly, undeveloped land has a much higher average distance from existing development (reflected in the spatial lag of development) than land that was developed between 2001 and 2011. These findings largely confirm the hypothesis that new development correlates with proximity to highways and existing development.
Regional population is a critical indicator of demand-side changes in housing and development. In the plot below, developed areas have a much higher average population than undeveloped areas in both 2000 and 2010. Further, when comparing the change in population between time periods, areas that developed had more population increase than areas that remained undeveloped.
Finally, the following table shows the 2001 land cover types and their conversion rates. The data suggest that approximately 3.11% of Atlanta’s forest was converted to development between 2001 and 2011, the highest rate of any land cover type considered.
| Land_Cover_Type | Conversion_Rate |
|---|---|
| developed | 0.36% |
| farm | 0.52% |
| forest | 3.11% |
| otherUndeveloped | 0.06% |
| wetlands | 0.05% |
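A conversion rate of this kind can be computed as the share of observations in each 2001 land cover class that changed to developed by 2011. The sketch below assumes that definition; the field names `cover_2001` and the toy records are hypothetical, and only `lc_change` is a name taken from this analysis.

```python
from collections import defaultdict

def conversion_rates(cells):
    """cells: iterable of dicts with 'cover_2001' and 'lc_change' (0 or 1)."""
    totals = defaultdict(int)
    converted = defaultdict(int)
    for c in cells:
        totals[c["cover_2001"]] += 1
        converted[c["cover_2001"]] += c["lc_change"]
    return {cover: converted[cover] / totals[cover] for cover in totals}

# Toy records (hypothetical), not the Atlanta data
cells = [
    {"cover_2001": "forest", "lc_change": 1},
    {"cover_2001": "forest", "lc_change": 0},
    {"cover_2001": "forest", "lc_change": 0},
    {"cover_2001": "forest", "lc_change": 0},
    {"cover_2001": "farm", "lc_change": 0},
    {"cover_2001": "farm", "lc_change": 0},
]
print(conversion_rates(cells))
```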
The 2011 development data are then split evenly into training and test datasets. A logistic regression model is trained on half of the observed changes, while the remaining half is reserved for testing the model. Evaluating the model on observations it was not trained on gives a more honest estimate of how accurately it predicts development change between 2001 and 2011.
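A 50/50 split can be sketched as a seeded shuffle followed by cutting the list in half. This Python version is illustrative only; the function name and seed are hypothetical, and the original split was performed in R and may have used a different (for example, stratified) procedure.

```python
import random

def split_half(observations, seed=42):
    """Shuffle reproducibly and return (training, test) halves."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    shuffled = list(observations)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]

# Hypothetical observation ids
train, test = split_half(range(100))
print(len(train), len(test))
```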
Modeling
Using the glm function in R, six different binary logistic models were built from the training data using different selections of variables. The fit of the models is compared using the McFadden R-Squared, a pseudo-R-squared metric often used to compare logit models, because the conventional R-squared measure applies only to ordinary least squares (OLS) models. The results of the comparison are shown in the figure below.
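For reference, the McFadden R-Squared compares the log-likelihood of a fitted model with that of an intercept-only (null) model: one minus the ratio of the two. The Python sketch below shows the calculation; the function names and the example log-likelihood values are hypothetical and are not the values from the six models compared here.

```python
import math

def null_log_likelihood(y):
    """Log-likelihood of an intercept-only logit model for 0/1 outcomes y."""
    n, n1 = len(y), sum(y)
    p = n1 / n  # the null model predicts the overall share of 1s everywhere
    if p in (0.0, 1.0):
        return 0.0
    return n1 * math.log(p) + (n - n1) * math.log(1 - p)

def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo-R-squared: 1 - (model log-likelihood / null log-likelihood)."""
    return 1.0 - ll_model / ll_null

# Hypothetical log-likelihoods
print(mcfadden_r2(-50.0, -100.0))
```

Values closer to 1 indicate a larger improvement over the null model, which is why the metric is useful for ranking candidate models fitted to the same data.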
When comparing the McFadden R-Squared values, it became clear that the population in 2000 and population in 2010 variables together were stronger predictors than population change alone (shown in Model 4). However, population change allows the model to predict future development scenarios more accurately. Because of multicollinearity concerns, population change cannot be included in the same model as the 2000 and 2010 population variables. Ultimately, population change is included in the final model, Model 6, despite its slightly lower McFadden R-Squared value than Model 4.
A summary of Model 6 is included below. All of the model inputs except wetlands are statistically significant at the p<0.01 level.
##
## =============================================
## Dependent variable:
## ---------------------------
## lc_change
## ---------------------------------------------
## wetlands1 0.585
## (0.615)
##
## forest1 2.533***
## (0.194)
##
## farm1 2.252***
## (0.258)
##
## otherUndeveloped1 2.592***
## (0.530)
##
## lagDevelopment -0.0002***
## (0.00001)
##
## pop_Change 0.002***
## (0.0002)
##
## distance_highways -0.00001***
## (0.00000)
##
## Constant -2.973***
## (0.188)
##
## ---------------------------------------------
## Observations 8,469
## Log Likelihood -1,079.371
## Akaike Inf. Crit. 2,174.742
## =============================================
## Note: *p<0.1; **p<0.05; ***p<0.01
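To show how these coefficients translate into a predicted probability, the sketch below applies the logit formula to one hypothetical land observation. The coefficients are the rounded values from the summary above, so the result is only approximate. The input values and their units are assumptions made for illustration, as is the treatment of the land cover terms as 0/1 indicators with developed land as the reference category.

```python
import math

# Coefficients as reported (rounded) in the Model 6 summary
MODEL_6 = {
    "Constant": -2.973,
    "wetlands1": 0.585,
    "forest1": 2.533,
    "farm1": 2.252,
    "otherUndeveloped1": 2.592,
    "lagDevelopment": -0.0002,
    "pop_Change": 0.002,
    "distance_highways": -0.00001,
}

def predict_probability(cell, coefs=MODEL_6):
    """Logistic probability for one observation; omitted inputs count as 0."""
    z = coefs["Constant"] + sum(
        coefs[name] * value for name, value in cell.items()
    )
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical forested cell (all values assumed for illustration)
cell = {
    "forest1": 1,
    "lagDevelopment": 5000,
    "pop_Change": 100,
    "distance_highways": 10000,
}
p = predict_probability(cell)
print(round(p, 3))
```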
Model Predicted Probabilities
Using Model 6, the probability of development is predicted for each observation in the test set, shown in the histogram below, and first classified at a 0.05 probability threshold (predClass_05 in the table below). Ultimately, this threshold is not granular enough for prediction, given how rarely development change is observed within the data.
Refining Probability Threshold
To improve the model’s accuracy, multiple probability thresholds are compared. The probability threshold impacts both the sensitivity (true positive rate of development) and the specificity (true negative rate), as well as overall accuracy. However, because there are far more instances of no land cover change within the data, the model is better at predicting where development will not occur. Knowing this, a slightly higher probability threshold is chosen to improve the specificity rate, and predict the general, rather than precise, locations of new development.
| Variable | Sensitivity | Specificity | Accuracy |
|---|---|---|---|
| predClass_05 | 0.85 | 0.79 | 0.79 |
| predClass_17 | 0.55 | 0.94 | 0.93 |
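The trade-off in the table can be reproduced on toy data by classifying the same predicted probabilities at two thresholds. The Python sketch below is illustrative only: the thresholds 0.05 and 0.17 are inferred from the variable names predClass_05 and predClass_17, the toy probabilities are hypothetical, and treating a probability equal to the threshold as "developed" is an assumption.

```python
def classify(probabilities, threshold):
    """Label an observation 1 (developed) when its probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]

def rates(observed, predicted):
    """Sensitivity, specificity, and accuracy from 0/1 observed and predicted labels."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# Toy data (hypothetical): 7 undeveloped cells followed by 3 developed cells
observed = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
probabilities = [0.01, 0.02, 0.03, 0.06, 0.08, 0.10, 0.20, 0.04, 0.12, 0.30]

low = rates(observed, classify(probabilities, 0.05))
high = rates(observed, classify(probabilities, 0.17))
print(low)
print(high)
```

As in the table, raising the threshold lowers sensitivity while raising specificity and, because undeveloped cells dominate, overall accuracy.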
Goodness of Fit Metrics for Final Model
The histogram below shows the probability of land development for each observation in the test data as predicted by Model 6. The majority of the predicted probabilities are clustered around zero, which makes sense given that most land in the dataset was not developed between 2001 and 2011.
Confusion Matrix
The confusion matrix highlighting the sensitivity and specificity rates of the logit model is included below.
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 7652 151
## 1 480 184
##
## Accuracy : 0.9255
## 95% CI : (0.9197, 0.931)
## No Information Rate : 0.9604
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.3333
##
## Mcnemar's Test P-Value : <0.0000000000000002
##
## Sensitivity : 0.54925
## Specificity : 0.94097
## Pos Pred Value : 0.27711
## Neg Pred Value : 0.98065
## Prevalence : 0.03957
## Detection Rate : 0.02173
## Detection Prevalence : 0.07842
## Balanced Accuracy : 0.74511
##
## 'Positive' Class : 1
##
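The headline statistics in this output follow directly from the four cell counts of the confusion matrix. The short Python check below recomputes them from those counts; the variable names are chosen here for illustration.

```python
# Counts from the confusion matrix above (rows: prediction, columns: reference)
tn, fn = 7652, 151   # predicted 0: reference 0, reference 1
fp, tp = 480, 184    # predicted 1: reference 0, reference 1
total = tn + fn + fp + tp

sensitivity = tp / (tp + fn)   # share of developed cells correctly predicted
specificity = tn / (tn + fp)   # share of undeveloped cells correctly predicted
accuracy = (tp + tn) / total
ppv = tp / (tp + fp)           # Pos Pred Value
npv = tn / (tn + fn)           # Neg Pred Value
prevalence = (tp + fn) / total

# Cohen's kappa: agreement beyond what the row and column totals imply by chance
expected = ((tn + fn) * (tn + fp) + (fp + tp) * (fn + tp)) / total ** 2
kappa = (accuracy - expected) / (1 - expected)

print(round(sensitivity, 5), round(specificity, 5), round(accuracy, 4))
```

Because only about 4% of test observations developed, always predicting "undeveloped" would score about 96% accuracy (the No Information Rate), which is why sensitivity and kappa are more informative here than accuracy alone.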
Receiver Operating Characteristic (ROC) Curve
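The final ROC curve, a plot of the true positive rate against the false positive rate across all probability thresholds for the final model, is shown below. The area under the curve is 0.901, meaning there is roughly a 90.1% chance that the model assigns a higher predicted probability to a randomly chosen developed observation than to a randomly chosen undeveloped one. This measures how well the model ranks observations; it is not a classification accuracy.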
## Area under the curve: 0.901
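One way to read the area under the curve is as the share of (developed, undeveloped) pairs in which the developed observation receives the higher predicted probability, with ties counted as half. The Python sketch below computes the statistic that way on toy data; it is a brute-force illustration with hypothetical names and values, not the routine that produced the figure above.

```python
def auc(observed, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for o, s in zip(observed, scores) if o == 1]
    neg = [s for o, s in zip(observed, scores) if o == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical)
observed = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(observed, scores))
```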
Development Predictions by Threshold
The following plots highlight the observed development change, predicted development change at the low threshold, and predicted development change at the higher threshold. It is clear that the higher threshold predicts development more conservatively and accurately than the low threshold. This confirms the notion that the higher threshold is able to predict the general location of development, rather than the specific location, with reasonable accuracy.
Confusion Matrix by Threshold
The four plots below illustrate a similar point. Though the lower threshold correctly predicts a higher percentage of developed land, it does this by over-predicting developed land in general, which reduces the true negative rate (the share of undeveloped land correctly predicted as undeveloped) and the overall accuracy rate. For purposes of this model, it is better to balance the two rates for an overall higher accuracy rate. Because the higher threshold is more reluctant to predict developed land, the predicted rate of undeveloped land is much closer to the observed rate, leading to a higher accuracy rate.
Spatial Cross Validation
A special case of spatial K-fold cross-validation known as leave-one-group-out cross-validation (here, leave-one-county-out) is used to gauge the generalizability of the model. In this form of spatial cross-validation, the procedure loops through the counties in the Atlanta MSA; in each iteration, the model is trained on all counties except one, which is reserved as the test county for which development is predicted. This method tests the model’s capabilities on land cover and development microcosms within the larger study area. The confusion matrix statistics by county are provided in the table below. These results highlight four counties (Heard, Jasper, Meriwether, and Pike) with no observed change between 2001 and 2011, for which sensitivity is undefined and reported as NA.
| county | Observed_Change | Sensitivity | Specificity | Accuracy |
|---|---|---|---|---|
| Barrow | 7 | 0.00 | 0.97 | 0.95 |
| Bartow | 14 | 0.43 | 0.96 | 0.95 |
| Butts | 1 | 0.00 | 1.00 | 1.00 |
| Carroll | 12 | 0.08 | 1.00 | 0.99 |
| Cherokee | 51 | 0.59 | 0.91 | 0.89 |
| Clayton | 34 | 0.50 | 0.87 | 0.83 |
| Cobb | 78 | 0.65 | 0.79 | 0.78 |
| Coweta | 14 | 0.64 | 0.95 | 0.94 |
| Dawson | 1 | 0.00 | 1.00 | 1.00 |
| DeKalb | 45 | 0.56 | 0.87 | 0.84 |
| Douglas | 18 | 0.61 | 0.85 | 0.84 |
| Fayette | 15 | 0.33 | 0.97 | 0.95 |
| Forsyth | 37 | 0.51 | 0.84 | 0.81 |
| Fulton | 92 | 0.62 | 0.83 | 0.81 |
| Gwinnett | 143 | 0.68 | 0.75 | 0.74 |
| Haralson | 2 | 0.00 | 1.00 | 0.99 |
| Heard | 0 | NA | 1.00 | 1.00 |
| Henry | 43 | 0.37 | 0.92 | 0.89 |
| Jasper | 0 | NA | 1.00 | 1.00 |
| Lamar | 1 | 0.00 | 1.00 | 1.00 |
| Meriwether | 0 | NA | 1.00 | 1.00 |
| Morgan | 1 | 0.00 | 1.00 | 1.00 |
| Newton | 19 | 0.47 | 0.96 | 0.94 |
| Paulding | 25 | 0.04 | 0.97 | 0.94 |
| Pickens | 4 | 0.25 | 1.00 | 0.99 |
| Pike | 0 | NA | 1.00 | 1.00 |
| Rockdale | 16 | 0.62 | 0.89 | 0.87 |
| Spalding | 4 | 0.00 | 1.00 | 0.99 |
| Walton | 11 | 0.09 | 1.00 | 0.98 |
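The county-by-county loop behind this table can be sketched as a generator that holds out one county per fold. The Python below shows only the splitting logic, under the assumption that each observation carries a `county` field; model fitting and scoring are omitted, and the names and toy records are hypothetical.

```python
def leave_one_county_out(cells):
    """Yield (county, train, test) with exactly one county held out per fold."""
    counties = sorted({c["county"] for c in cells})
    for held_out in counties:
        train = [c for c in cells if c["county"] != held_out]
        test = [c for c in cells if c["county"] == held_out]
        yield held_out, train, test

# Toy records (hypothetical)
cells = [
    {"county": "Cobb", "lc_change": 1},
    {"county": "Cobb", "lc_change": 0},
    {"county": "Fulton", "lc_change": 0},
    {"county": "Gwinnett", "lc_change": 1},
]
folds = list(leave_one_county_out(cells))
for county, train, test in folds:
    print(county, len(train), len(test))
```

A county with no observed change in its test fold has no true positives or false negatives, so its sensitivity cannot be computed.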
Estimated Population Change
Using Model 6, which has already been trained on observed data from 2000 and 2010, this analysis will update the model’s inputs with a 2010 baseline to predict development demand for 2020. This requires population estimates for 2020, which were collected via the US Census website for all 29 Atlanta MSA Counties (5). The projected change by county is shown in the plot below.
Predicting Development Change: Demand-side Change
The population estimates for 2020 are then aggregated to the land observations. This allows the model to generate predicted development demand in 2020, based on the change in demand-side factors (population).
Predicting Development Change: Supply-side Change
Separately, a supply-side change is modeled using a major infrastructure intervention - a new expressway. Because this analysis assumes that proximity to existing infrastructure will increase development appeal, adding a new, targeted expressway on suitable development land could encourage development away from sensitive areas and change the development landscape in 2020.
The new expressway connects three burgeoning suburban areas, Marietta, Alpharetta, and Suwanee. The proposed expressway is built on existing two-lane roads and will serve to relieve congestion, better connect residents, and eliminate circuitous travel.
The plots below show that the new expressway has relatively little effect on development demand MSA-wide. However, the following sections will show the expressway’s local impact on land development.
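A supply-side scenario of this kind can be sketched as updating each observation's distance to the nearest expressway and then looking at how that shifts the modeled odds of development. In the Python below, the coefficient is the rounded distance_highways value reported for Model 6; the distances, their units, and the function names are assumptions made for illustration.

```python
import math

DISTANCE_COEF = -0.00001  # distance_highways coefficient reported for Model 6 (rounded)

def updated_distance(existing_distance, distance_to_new_road):
    """Distance to the nearest expressway once the new road is added."""
    return min(existing_distance, distance_to_new_road)

def odds_ratio(old_distance, new_distance, coef=DISTANCE_COEF):
    """Multiplicative change in the odds of development, other inputs held fixed."""
    return math.exp(coef * (new_distance - old_distance))

# Hypothetical cell: 30,000 units from an existing expressway, 10,000 from the new one
old = 30000
new = updated_distance(old, 10000)
ratio = odds_ratio(old, new)
print(new, round(ratio, 3))
```

Cells that are already closer to an existing expressway than to the new one are unchanged, which is one reason the effect would be concentrated along the new corridor rather than spread across the MSA.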
2011 Land Cover Types
With an understanding of predicted development demand, planners can now balance these predictions with the supply of environmentally viable land to promote and plan sustainable development. The plots below show Atlanta’s 2011 land cover types.
Sensitive Land Cover Lost to Development
This analysis defines “sensitive land” as wetlands and farms. Other, similar analyses often classify forest as sensitive land, but given Atlanta’s growth potential and relative abundance of forests, treating forests as sensitive would constrain development unrealistically. The plot below highlights sensitive land lost to development between 2001 and 2011.
Sensitive Regions
Additionally, the plot below highlights areas with more than one contiguous acre of sensitive lands. These areas should be avoided, or developed sparingly, to ensure sustainable development.
Summarize by County
Finally, the supply and demand changes are summarized for each of Atlanta’s 29 counties. These summaries can generally be used to determine which counties have high development demand and high levels of developable land. For purposes of this analysis, allocation of developable land will be focused on Cobb, Fulton, and Gwinnett counties, where the new expressway is proposed.
The plots below show the 2020 development potential and projected population, current development status, sensitive land cover designations, and expressway infrastructure for Cobb, Fulton, and Gwinnett Counties. These plots illustrate the varying population projections and distribution of sensitive lands in the Counties. While these can be used to suggest allocation strategies for land development, much of the allocation will depend on the zoning and relative development appetite at the local level. However, these comparisons can assist in determining which, if any, sensitive lands should be developed given relative demand, and can also show which already-developed lands are projected to lose population. The following sections comment on the differences observed in the plots, though the final allocation of land should be determined at the County level, rather than the MPO level.
Allocation for Cobb County - Connecting Marietta to Alpharetta
Cobb County is a populated area with many existing developments. However, the new expressway addition connects already developed land to areas of high development demand. While projected population change is negative in some areas, it appears that the population is growing in the areas near existing roads. We recommend both developing the suitable land near the expressway and increasing the density of the existing developments experiencing population decline.
Allocation for Fulton County - Connecting Marietta to Alpharetta and Alpharetta to Suwanee
Fulton County is densely developed with unsuitable lands concentrated mostly in the southern portion of the County. The new expressway runs through the northern portion of the County, through some areas with high projected development demand. Similarly, the expressway runs through and near areas that are projected to experience the most population growth. Here, there is potential for additional development in the undeveloped areas where there are also few sensitive lands.
Allocation for Gwinnett County - Connecting Alpharetta to Suwanee
The northeastern portion of Gwinnett County has the highest development demand and relatively high projected population growth. Both the new expressway and existing expressway connect suitable, undeveloped areas for future development.