1 Introduction

The Pittsburgh Metropolitan Statistical Area (MSA) is composed of Allegheny, Armstrong, Beaver, Butler, Fayette, Washington, and Westmoreland Counties. Since its days as the “Forge of America”, Pittsburgh, together with its surrounding counties, has changed dramatically in the past 200 years¹. The development of sprawl in the region could be traced back to the industrial expansion in the city in the 1950s and the trend continues today. Nowadays, the Pittsburgh metropolitan area is still near the middle of the pack in terms of sprawl among the 221 U.S metropolitan areas, despite its suffering from the fleeing manufacturing jobs and the ongoing population loss. However, urban sprawl has slowed down in the last few years due to geographical barriers such as rivers and mountains, as well as the recent policies which reduced development and directed government funds more toward existing infrastructure².

Figure: Pittsburgh MSA Counties

We completed the growth modeling project in anticipation of the Southwestern Pennsylvania Commission (SPC)’s next large-scale comprehensive planning process for Pittsburgh Metropolitan Statistical Area (MSA). Our findings aim to help planners model the interplay between supply and demand-side insights and guide smart-growth development across to places where growth can have economic and sustainable impacts.

2 Data Wrangling and Feature Engineering

In this section, land cover, population, and highway network data will be aggregated into a 3,000 by 3,000 sq ft fishnet, as illustrated by the figure below.

Feature	Description	Source
Pittsburgh MSA and counties boundary	To be converted into 3000 by 3000 ft fishnet grid cells	IPUMS
Land cover	annual land cover in 2001 and 2011 for the Pittsburgh MSA; reclassification and distance to nearest 2 developed grid cells	the Multi-Resolution Land Characteristics Consortium’s National Land Cover (NLCD)
Population	Total population in 2000 and 2010 by tract; weighted interpolation into fishnet grid	Decennial Census
Highway network	Highway network covering the whole MSA In 2021; distance to the nearest highway	PennDOT

We forecasted change in development for 2021 with patterns from 2001-2011 through a binary logistic regression model per each 3000 foot fishnet grid.

2.1 Dependent Variable: Land Cover Change (Developed) from 2001 to 2011

To identify newly developed land between 2001 and 2011, we first identified changed land cover and then reclassified them to locate those that changed from undeveloped to developed. Please refer to the appendix (7.1.1 Land cover change) for details. From the map, we could see that the majority of development happened in the suburban area of Allegheny County along the border between Allegheny and its surrounding counties. While, in the outer region of Pittsburgh MSA, most of the lands that changed the land cover type from 2001 to 2011 changed to forest or crops.

2.2 Independent Variable: Land Cover in 2001

From the map below, we could see that forests occupied a huge amount of area, and developed land is mostly located within the City of Pittsburgh and in the urban cores of the surrounding counties dispersedly. Land cover types in 2001 could influence land cover changes in the future. They have been reclassified into six categories (7.1.2 Land Cover Classification) that are meaningful to the regression model.

2.3 Independent Variable: Population Change

Population features could be associated with development decisions. Tract-level data has been assigned to fishnet cell-level data (See appendix 7.1.3 Weighted interpolation for population). The maps below illustrate that some population has migrated from urban cores to the outer area, especially to Washington County.

2.4 Independent Variable: Distance to Highway

Highways have been an agent for sprawl and development for a lot of places, which includes Pittsburgh MSA, as we could see that most of the new development occurred near the highways.

Our model will account for the distance from each fishnet grid to its nearest highway segment. The lighter the color, the closer it is to the highway.

2.5 Independent Variable: Spatial lag of development

We assume that new development is more likely to happen near existing development due to the proximity to the jobs and opportunities and sufficient infrastructures like roads. From the map, we could see that most of the newly developed lands in 2011 are close to existing developed lands in 2001.

3 Exploratory Analysis

From exploring the features’ association with new land development, we confirmed that developments are generally closer to the highways and existing developments.

Compared with grid areas with no land cover changes, grid areas where new development occurs tend to have higher population. Moreover, grid areas with new development have seen increase in population, while the others have experienced the opposite, representing a dynamic process of migration.

Finally, the table of land cover conversion between 2001 and 2011 suggests that most of the new land development was converted from forests. Most of the lands are covered by forests in Pittsburgh MSA.

Land_Cover_Type	Conversion_Rate
Developed	0.19%
Farm	0.24%
Forest	0.69%
Other Undeveloped	0.01%
wetlands_2001	0%

4 Modeling for 2011

4.1 Model Selection

Six binomial logistic regression models are created to predict development change between 2001 and 2011.


	Dependent variable:

	lc_change
	(1)	(2)	(3)	(4)	(5)	(6)

wetlands_2001	-11.068	-6.658	-6.201	-6.133	-6.647	-6.513
	(441.372)	(372.022)	(374.784)	(374.557)	(371.744)	(363.132)

forest_2001	-1.000^***	1.267^***	1.755^***	1.802^***	1.257^***	1.352^***
	(0.255)	(0.285)	(0.332)	(0.341)	(0.287)	(0.287)

farm_2001	-0.532^*	2.087^***	2.634^***	2.688^***	2.080^***	2.282^***
	(0.299)	(0.348)	(0.396)	(0.404)	(0.350)	(0.356)

otherUndeveloped_2001	1.419	3.573^***	4.168^***	4.236^***	3.579^***	3.939^***
	(1.085)	(1.258)	(1.270)	(1.272)	(1.258)	(1.388)

lagDevelopment		-0.0003^***	-0.0003^***	-0.0003^***	-0.0003^***	-0.0002^***
		(0.00003)	(0.00003)	(0.00003)	(0.00003)	(0.00003)

pop_2000			0.001^***	-0.001
			(0.0002)	(0.001)

pop_2010				0.002^***
				(0.001)

pop_Change					0.002^**	0.002^**
					(0.001)	(0.001)

distance_highways						-0.0003^***
						(0.00005)

Constant	-3.498^***	-2.992^***	-3.790^***	-3.853^***	-2.974^***	-2.817^***
	(0.227)	(0.233)	(0.344)	(0.357)	(0.235)	(0.237)


Observations	9,143	9,143	9,143	9,143	9,143	9,143
Log Likelihood	-647.710	-529.250	-522.391	-518.822	-526.813	-506.872
Akaike Inf. Crit.	1,305.419	1,070.501	1,058.783	1,053.645	1,067.625	1,029.743

Note:	p<0.1; p<0.05; p<0.01

Model 6 has been selected as the final model for prediction. With the highest McFadden R-squared statistic value, it has the best goodness of fit. It also accounted for features that are theoretically important, though they may not be statistically significant.

4.2 Model Outcome

While the binomial model predicts developed or not-developed land, the MSA will choose a threshold to determine at what probability percentage would the land (grid area) be determined as developed. The density plot visualizes the distribution of predicted probability by observed class. Given how rare a new development occurred in 2011, most of the predicted probabilities are smaller than 1%.

4.2.1 Original plot

4.2.2 Zoomed-in plot

While a threshold of 0.02% best balances sensitivity and specificity (see appendix 7.1.4), it yields much lower overall accuracy. We thus compared different thresholds and found that while their specificity and accuracy do not differ much, the sensitivity of outcome at the 2% threshold is much larger than the rest.

Variable	Sensitivity	Specificity	Accuracy
predClass_002	0.96	0.55	0.55
predClass_02	0.84	0.78	0.79
predClass_03	0.76	0.84	0.84
predClass_05	0.54	0.93	0.93

It is obvious from the maps below that the 0.2% threshold could help predict a higher number of new development areas correctly than the 2% threshold, but it would predict a lot more no-change areas to be development areas, forecasting a false sense of growth. Comparing the predicted new developments to the observed new developments, we recommend the MSA to select the 2% threshold.

4.3 Model Validation

Ideally, our model should be suitable for every county in the MSA. In general, these confusion matrix metrics suggest that the model is generalizable to the counties that underwent significant development changes (see appendix 7.1.5 for more information).

county	Observed_Change	Sensitivity	Specificity	Accuracy
Allegheny	92	0.82	0.50	0.51
Armstrong	3	0.33	0.96	0.96
Beaver	10	0.40	0.82	0.81
Butler	18	0.72	0.88	0.88
Fayette	11	0.91	0.93	0.93
Washington	33	0.85	0.87	0.87
Westmoreland	40	0.68	0.87	0.87

5 2021 Forecast

To forecast for development in 2021, we use the model developed with 2001-2011 data and assumed that the patterns will hold.

5.1 Demande-side Change

5.1.1 Population change as a factor of development demand

With updated land cover data, the model also includes 2021 population estimates³ to predict demand-side change.

This map of probabilities suggests high demand in Allegheny County and its areas bordering the surrounding counties. It could help the MSA to guide development demand.

5.1.2 Demand and Environmental Sensitivity

However, demand must balanced with the supply of environmentally sensitive land. Wetlands and forests are defined as sensitive land cover, and thus are not suitable for development, while farmland and other undeveloped are suitable for development. The map shows there is several sensitive land lost from 2001 to 2011.

Although the new development would inevitably replace forests and wetlands, large natural habitations should be protected and conserved. The map below shows the continuous areas of wetlands or forests which are greater than 1 acre, and it suggests the continuous forests in the Pittsburgh MSA would prohibit further urban sprawl in the future. These will be accounted for in the allocation metrics.

5.1.3 Allocation Metrics by County

Allegheny County and Beaver County are the top two counties with the highest average demand. However, since the two counties have different natural conditions, their development strategies are different. For example:

Beaver County has the largest amount of sensitive land in the region, while it also has the highest proportion of suitable land for new development. Hence, Beaver County has development (and sprawling) potential, but will have to give thoughts to its natural resources for sustainable growth in the longer term;
Allegheny County, on the contrary, as the only county with positive population change and high demand for new development, its lack of suitable land and the constraint of conserved sensitive land could not support conventional growth. Instead, it could focus on infill development.

5.2 Supply-side Change

In this scenario, two new highways, as a part of the Southern Beltway projects⁴, were planned to connect the Pittsburgh International Airport with Washington County to facilitate the development of surrounding regions.

Figure: the Southern Beltway Project and selected highway

To project for the 2021 population after building the highway, we modeled population change with distance to highway feature, but the result coefficient is insignificant and positive, which conflicts with normal assumption.


	Dependent variable:

	pop_Change

wetlands_2001	43.426^***
	(15.394)

forest_2001	43.769^***
	(1.670)

farm_2001	44.853^***
	(1.876)

otherUndeveloped_2001	31.809^***
	(11.375)

lagDevelopment	-0.00002
	(0.00003)

distance_highways	0.0001
	(0.0001)

Constant	-44.391^***
	(1.518)


Observations	18,286
Log Likelihood	-99,286.150
Akaike Inf. Crit.	198,586.300

Note:	p<0.1; p<0.05; p<0.01

Hence, we referred to planning research that a 1% increase in transportation infrastructure stock will lead to 0.2% increase in the population⁵.

The maps below show the population change with and without the new highways, indicating that the new highways will attract slightly more population.

Predicting with our model, building the new highways will increase development potential, especially in the higher quintiles. Although the probabilities values are not high, they indicate that the new highways could bring new development to the MSA.

6 Allocation

The modeling process in the end is to aid allocation of development rights. We recommend the MSA to refer to our guidance while making development decisions, in order to achieve smart growth that balances development and sustainability. However, the final decision should be carefully made after evaluating opinions of different stakeholders.

Allocation of development will vary, depending on different conditions of each county. For example, Allegheny County has developed most of its land in the past years. With increased population and demand, it should focus on infill redevelopment in the urban core and consider a few new development in the outer area, such as areas near the new highway, to strengthen its connection with the surrounding counties.

Washington County, where the new highway development would be mainly constructed, should allocate more development rights to its northeastern area which is close to Allegheny County. With proximity to Allegheny County, highways, and increased population, the area is suitable for big-box commercial, which could increase the job opportunities and attract more people.

7 Appendices

7.1 Data Dictionary and Technical Explaination

7.1.1 Land cover change

To identify newly developed land between 2001 and 2011, we first found changed land cover change. To figure out the land cover change from 2001 to 2011, all the lands which were undeveloped in 2001 and developed in 2011 or have already developed in 2001 but changed to another developed land cover type in 2011 are reclassified as the new development in 2011. The map below illustrate the outcome of this step: while the points show location of land cover change, the color represents the change result (land cover in 2001). Then we reclassified them to find those that changed from undeveloped to developed.

Figure: Land Cover Change before Reclassification

7.1.2 Land cover change data

The table below shows descriptions of each categorical land cover type in the land cover data, which corresponds to the numeric values.

Definition	Value
Open Water - All areas of open water, generally with less than 25% cover or vegetation or soil.	11
Perennial Ice/Snow - All areas characterized by a perennial cover of ice and/or snow, generally greater than 25% of total cover.	12
Developed, Open Space - Includes areas with a mixture of some constructed materials, but mostly vegetation in the form of lawn grasses. Impervious surfaces account for less than 20 percent of total cover. These areas most commonly include large-lot single-family housing units, parks, golf courses, and vegetation planted in developed settings for recreation, erosion control, or aesthetic purposes.	21
Developed, Low Intensity -Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 20-49 percent of total cover. These areas most commonly include single-family housing units.	22
Developed, Medium Intensity - Includes areas with a mixture of constructed materials and vegetation. Impervious surfaces account for 50-79 percent of the total cover. These areas most commonly include single-family housing units.	23
Developed, High Intensity - Includes highly developed areas where people reside or work in high numbers. Examples include apartment complexes, row houses and commercial/industrial. Impervious surfaces account for 80 to 100 percent of the total cover.	24
Barren Land (Rock/Sand/Clay) - Barren areas of bedrock, desert pavement, scarps, talus, slides, volcanic material, glacial debris, sand dunes, strip mines, gravel pits and other accumulations of earthen material. Generally, vegetation accounts for less than 15% of total cover.	31
Deciduous Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species shed foliage simultaneously in response to seasonal change.	41
Evergreen Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. More than 75 percent of the tree species maintain their leaves all year. Canopy is never without green foliage.	42
Mixed Forest - Areas dominated by trees generally greater than 5 meters tall, and greater than 20% of total vegetation cover. Neither deciduous nor evergreen species are greater than 75 percent of total tree cover.	43
Dwarf Scrub - Alaska only areas dominated by shrubs less than 20 centimeters tall with shrub canopy typically greater than 20% of total vegetation. This type is often co-associated with grasses, sedges, herbs, and non-vascular vegetation.	51
Shrub/Scrub - Areas dominated by shrubs; less than 5 meters tall with shrub canopy typically greater than 20% of total vegetation. This class includes true shrubs, young trees in an early successional stage or trees stunted from environmental conditions.	52
Grassland/Herbaceous - Areas dominated by grammanoid or herbaceous vegetation, generally greater than 80% of total vegetation. These areas are not subject to intensive management such as tilling, but can be utilized for grazing.	71
Sedge/Herbaceous - Alaska only areas dominated by sedges and forbs, generally greater than 80% of total vegetation. This type can occur with significant other grasses or other grass like plants, and includes sedge tundra, and sedge tussock tundra.	72
Lichens - Alaska only areas dominated by fruticose or foliose lichens generally greater than 80% of total vegetation.	73
Moss - Alaska only areas dominated by mosses, generally greater than 80% of total vegetation.	74
Pasture/Hay - Areas of grasses, legumes, or grass-legume mixtures planted for livestock grazing or the production of seed or hay crops, typically on a perennial cycle. Pasture/hay vegetation accounts for greater than 20 percent of total vegetation.	81
Cultivated Crops - Areas used for the production of annual crops, such as corn, soybeans, vegetables, tobacco, and cotton, and also perennial woody crops such as orchards and vineyards. Crop vegetation accounts for greater than 20 percent of total vegetation. This class also includes all land being actively tilled.	82
Woody Wetlands - Areas where forest or shrub land vegetation accounts for greater than 20 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.	90
Emergent Herbaceous Wetlands - Areas where perennial herbaceous vegetation accounts for greater than 80 percent of vegetative cover and the soil or substrate is periodically saturated with or covered with water.	95
Source
https://www.mrlc.gov/data/nlcd-2001-2011-land-cover-change-conus

7.1.3 Land Cover Classification

The table below shows how we reclassify land cover data into more useful categories.

Old_Classification	New_Classification
Open Space as well as Low, Medium and High Intensity Development	Developed
Deciduous, Evergreen and Mixed Forest	Forest
Pasture/Hay and Cultivated Crops	Farm
Woody and Emergent Herbaceous Wetlands	Woodlands
Barren Land, Dwarf Scrub and Grassland/Herbaceous	Other Undeveloped
Water	Water

7.1.4 Weighted interpolation for population

Weighted interpolation function was used to assign a tract’s population into a fishnet grid based on the proportion of the tract that intersects with the grid. This method assumes that the tract population is evenly distributed across the tract.

Figure: Figure: Population tract and fishnet comparison

7.1.5 Probability threshold

We chose several different thresholds including 2%, 3%, and 5% to compare with 0.02%, which is the cutoff point where F-score is the highest. The F-score is defined as the harmonic mean of the model’s precision and recall. Among the thresholds 2%, 3%, and 5%, the specificity and accuracy of them do not differ much, however, the sensitivity of threshold 2% is much larger than the rest of the two. Hence, we compared the mapping outcome with thresholds at 0.2% and 2%.

7.1.6 Generalizability across counties

Across-space generalizability is tested for whether the model is suitable for each county in the Pittsburgh MSA considering possible differences in land use planning. Since Armstrong, Beaver, Butler, and Fayette Counties have few observed changes, the results for them are not quite meaningful. The results for the rest of the counties are consistent with the result of the entire Pittsburgh MSA. Among these three counties, Washington County has the highest sensitivity level and the highest specificity level.

7.2 Code

7.2.1 Preparation

library(tidyverse)
library(sf)
library(raster)
library(knitr)
library(formattable)  # I use formattable to replace the kableExtra
# library(kableExtra)
library(tidycensus)
library(tigris)
options(tigris_class = "sf")
options(tigris_use_cache = TRUE)
library(FNN)
library(QuantPsyc)
library(caret)
library(yardstick)
library(pscl)
library(plotROC) 
library(ggrepel)
library(pROC)
library(grid)
library(gridExtra)
library(viridis)
library(exactextractr)
library(stringr)
library(censusapi)
library(ROCR)
library(igraph)
library(stargazer)

plotTheme <- function(base_size = 10) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 12,colour = "black"),
    plot.subtitle = element_text(face="italic"),
    plot.caption = element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),
    panel.grid.major = element_line("grey80", size = 0.1),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    strip.background = element_rect(fill = "grey80", color = "white"),
    strip.text = element_text(size=11),
    axis.title = element_text(size=11),
    axis.text = element_text(size=10),
    plot.background = element_blank(),
    legend.background = element_blank(),
    legend.title = element_text(colour = "black", face = "italic"),
    legend.text = element_text(colour = "black", face = "italic"),
    strip.text.x = element_text(size = 12)
  )
}

mapTheme <- function(base_size = 10) {
  theme(
    text = element_text( color = "black"),
    plot.title = element_text(size = 12,colour = "black"),
    plot.subtitle=element_text(face="italic"),
    plot.caption=element_text(hjust=0),
    axis.ticks = element_blank(),
    panel.background = element_blank(),axis.title = element_blank(),
    axis.text = element_blank(),
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.border = element_blank(),
    strip.text.x = element_text(size = 12))
}

palette2 <- c("#f8edeb", "#83c5be")
palette2b <- c("#f8edeb", "#fd9235")
palette3 <- c("#f8edeb", "#83c5be", "#3b7d76")
palette4 <- c("#a1dab4","#41b6c4","#2c7fb8","#253494")
palette5 <- c("#f7f6de","#edeec9","#b8dedc","#83c5be","#42999b")
palette5b <- c("#42999b","#83c5be","#b8dedc","#edeec9","#f7f6de")
palette10 <- c("#f7fcf0","#e0f3db","#ccebc5","#a8ddb5","#7bccc4",
               "#4eb3d3","#2b8cbe","#0868ac","#084081","#f7fcf0")

quintileBreaks <- function(df,variable) {
    as.character(quantile(df[[variable]],
                          c(.01,.2,.4,.6,.8),na.rm=T))
}

#This function can be used to convert a polygon sf to centroids xy coords.
xyC <- function(aPolygonSF) {
  as.data.frame(
    cbind(x=st_coordinates(st_centroid(aPolygonSF))[,1],
          y=st_coordinates(st_centroid(aPolygonSF))[,2]))
} 

#this function convert a raster to a data frame so it can be plotted in ggplot
rast <- function(inRaster) {
  data.frame(
    xyFromCell(inRaster, 1:ncell(inRaster)), 
    value = getValues(inRaster)) }

nn_function <- function(measureFrom,measureTo,k) {
  #convert the sf layers to matrices
  measureFrom_Matrix <-
    as.matrix(measureFrom)
  measureTo_Matrix <-
    as.matrix(measureTo)
  nn <-   
    get.knnx(measureTo, measureFrom, k)$nn.dist
    output <-
    as.data.frame(nn) %>%
    rownames_to_column(var = "thisPoint") %>%
    gather(points, point_distance, V1:ncol(.)) %>%
    arrange(as.numeric(thisPoint)) %>%
    group_by(thisPoint) %>%
    summarize(pointDistance = mean(point_distance)) %>%
    arrange(as.numeric(thisPoint)) %>% 
    dplyr::select(-thisPoint) %>%
    pull()
  
  return(output)  
}

aggregateRaster <- function(inputRasterList, theFishnet) {
  #create an empty fishnet with the same dimensions as the input fishnet
  theseFishnets <- theFishnet %>% dplyr::select()
  #for each raster in the raster list
  for (i in inputRasterList) {
  #create a variable name corresponding to the ith raster
  varName <- names(i)
  #convert raster to points as an sf
    thesePoints <-
      rasterToPoints(i) %>%
      as.data.frame() %>%
      st_as_sf(coords = c("x", "y"), crs = st_crs(theFishnet)) %>%
      filter(.[[1]] == 1)
  #aggregate to the fishnet
    thisFishnet <-
      aggregate(thesePoints, theFishnet, length) %>%
      mutate(!!varName := ifelse(is.na(.[[1]]),0,1))
  #add to the larger fishnet
    theseFishnets <- cbind(theseFishnets,thisFishnet)
  }
  #output all aggregates as one large fishnet
   return(theseFishnets)
  }

7.2.2 MSA Data

# load pittsburgh boundary
Pittsburgh_MSA <-
  st_read("Pittsburgh_MSA/Pittsburgh_MSA.shp") %>%
  st_transform("ESRI:102741") #102003

Pittsburgh_MSA_boundary <- Pittsburgh_MSA %>%
  group_by(STATEFP) %>%
  summarise() %>%
  st_transform("ESRI:102741")

MSA_fishnet <- 
  st_make_grid(Pittsburgh_MSA_boundary, 3000) %>%
  st_sf()

MSA_fishnet <- 
  MSA_fishnet[Pittsburgh_MSA_boundary,]

# Examine the MSA boundary and counties within
ggplot()+
  labs(title="Pittsburgh MSA Boundary and Counties",
       subtitle="Illustration of Fishnet, 3000 foot resolution")+
  geom_sf(data=Pittsburgh_MSA, aes(fill=as.factor(NAME)))+
  geom_sf(data=MSA_fishnet, fill=NA, color="gray", alpha=0.5) +
  geom_sf(data=Pittsburgh_MSA, fill=NA, size=1.5, aes(colour=as.factor(NAME)))+
  scale_color_viridis(discrete = T,
                     name ="County")+
  scale_fill_viridis(discrete = T,
                     name ="County")+
  mapTheme()

7.2.3 Land Cover Change (Developed) from 2001 to 2011

# Load land cover data for both 2001 and 2011
lc_2001 <- raster("msa_2001lc.tif")
lc_2001 <-projectRaster(lc_2001,crs=crs(Pittsburgh_MSA))

lc_2011 <- raster ("msa_2011lc.tif")
lc_2011 <- projectRaster(lc_2011,crs=crs(Pittsburgh_MSA) )

# aggregate
lc_2001 <- raster::aggregate(lc_2001, fact = 30, fun = modal)
lc_2011 <- raster::aggregate(lc_2011, fact = 30, fun = modal)

# calculate land cover change
lc_change <- (lc_2001 != lc_2011) * lc_2011

# Reclass land cover change
reclassMatrix <- 
  matrix(c(
    0,12,0,
    12,24,1,
    24,Inf,0),
    ncol=3, byrow=T)
lc_change_reclass <- reclassify(lc_change, reclassMatrix)
lc_change_reclass[lc_change_reclass < 1] <- 0
names(lc_change_reclass) <- "lc_change_reclass"

# Rename land cover change for later analysis
lc_change_extract <- data.frame(exact_extract(lc_change_reclass, MSA_fishnet, fun = "mode"))
names(lc_change_extract) <- "developed_0111"

# Create fishnet with land cover change for 2001
MSA_fishnet01 <- cbind(data.frame(MSA_fishnet),lc_change_extract)

MSA_fishnet01 <- 
  MSA_fishnet01 %>%
  st_as_sf(.) %>%
  st_transform("ESRI:102741")

# Plot land cover change (new development between 01-11)
ggplot() +
  geom_sf(data=MSA_fishnet01,aes(fill=as.factor(developed_0111)),color='transparent')+
  scale_fill_manual(values=palette2b, name ="",labels=c("No Change","New Development"))+
  labs(title="Pittsburgh MSA Development Land Use Change (2001 - 2011)") +
  geom_sf(data=Pittsburgh_MSA, fill=NA, size=0.5, alpha=0.2, aes(colour=as.factor(NAME)))+
  scale_color_viridis(discrete = T, name ="County")+
  mapTheme()

# Plot land cover type from raster
ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_raster(data=rast(lc_2001) %>% na.omit %>% filter(value > 0), 
              aes(x,y,fill=as.factor(round(value,0)))) +
  scale_fill_viridis(discrete=TRUE,direction=-1, name ="") +
  labs(title = "Pittsburgh MSA Land Cover, 2001",
       subtitle = "Land cover value dictionary see appendix 6.1") +
  mapTheme()

7.2.4 Land Cover in 2001

# Reclass for different land cover types
developed <- lc_2001 == 21 | lc_2001 == 22 | lc_2001 == 23 | lc_2001 == 24
forest <- lc_2001 == 41 | lc_2001 == 42 | lc_2001 == 43 
farm <- lc_2001 == 81 | lc_2001 == 82 
wetlands <- lc_2001 == 90 | lc_2001 == 95 
otherUndeveloped <- lc_2001 == 52 | lc_2001 == 71 | lc_2001 == 31 
water <- lc_2001 == 11
names(developed) <- "developed"
names(forest) <- "forest"
names(farm) <- "farm"
names(wetlands) <- "wetlands"
names(otherUndeveloped) <- "otherUndeveloped"
names(water) <- "water"

layer_list <- list(developed, wetlands, forest, farm, otherUndeveloped, water)
names(layer_list) <- c("developed_2001", "wetlands_2001", "forest_2001", 
                       "farm_2001", "otherUndeveloped_2001", "water_2001")

fish_extract <- function(fishnet, layers) {
  extract_list <- exact_extract(layers, fishnet, fun = "mode")
  return(extract_list)
}


lc_2001_extracts <- lapply(layer_list, fish_extract, fishnet = MSA_fishnet01)
lc_2001_extracts <- data.frame(do.call(cbind, args = lc_2001_extracts))
MSA_fishnet01 <- cbind(data.frame(MSA_fishnet01), lc_2001_extracts) # join to fishnet

MSA_fishnet01 <- 
  MSA_fishnet01 %>%
  st_as_sf(.) %>%
  st_transform("ESRI:102741")

lc_2001_lcvar <-
  MSA_fishnet01 %>%
  gather(var,value,developed_2001:water_2001) %>%
  dplyr::select(var,value,geometry) %>%
  st_as_sf()

lc_2001_lcvar$var <- as.factor(str_sub(lc_2001_lcvar$var,1,-6))

# plot land cover by type
ggplot()+
  geom_sf(data=lc_2001_lcvar,aes(fill=as.factor(value)),color='transparent')+
  facet_wrap(~var)+
  scale_fill_manual(values=palette2, labels=c("Other","Land Cover"),name ="")+
  labs(title = "Land cover types, 2001",
       subtitle = "As fishnet centroids") +
  theme(legend.position = "bottom")+
  mapTheme()

7.2.5 Population change 2000-2010

# Wrangle population data
MSAPop10 <-
  get_decennial(geography = "tract",variables = "P001001",year=2010,
                state=42,county=c("Allegheny County","Armstrong County","Beaver County","Butler County","Fayette County",
                                  "Washington County","Westmoreland County"),geometry=TRUE) %>%
  rename(pop_2010 = value) %>%
  st_transform(st_crs(MSA_fishnet))

Sys.setenv(CENSUS_KEY = "2c3e9f9d2d65abab5f7b81fe418054415d363a43")
MSAPop00 <-getCensus(name = "dec/sf1", 
                  vintage = 2000, 
                  vars = c("NAME","P001001"),
                  region = "tract:*", 
                  regionin = "state:42+county:003,005,007,019,051,125,129")
MSAPop00$label <- paste0(MSAPop00$county,"0",MSAPop00$tract)
for (i in 1:721){
  if(nchar(MSAPop00[i,]$label)!=10){
    MSAPop00[i,]$label <-paste0(MSAPop00[i,]$label,"00")
  }
  
}

MSAtracts00 <- st_read("MSAtracts/MSAtracts.shp") %>%
  st_transform("ESRI:102741")

MSAtracts00$tract <-substr(MSAtracts00$GISJOIN2 ,4,13)
MSAtracts00$tract <- as.character(MSAtracts00$tract)

MSAPop00 <- left_join(MSAtracts00, MSAPop00,by=c("tract"="label"))
MSAPop00 <- MSAPop00 %>%
  dplyr::select(GISJOIN2,NAME,P001001,geometry) %>%
  rename(GEOID = GISJOIN2,pop_2000 = P001001)

# Plot pop 
grid.arrange(
ggplot() +
  geom_sf(data = MSAPop00, aes(fill=factor(ntile(pop_2000,5))), colour=NA) +
  scale_fill_manual(values = palette5,
                    labels=quintileBreaks(MSAPop00,"pop_2000"),
                   name="Quintile\nBreaks") +
  labs(title="Population, Pittsburgh MSA: 2000") +
  mapTheme(),

ggplot() +
  geom_sf(data = MSAPop10, aes(fill=factor(ntile(pop_2010,5))), colour=NA) +
  scale_fill_manual(values = palette5,
                    labels=quintileBreaks(MSAPop10,"pop_2010"),
                   name="Quintile\nBreaks") +
  labs(title="Population, Pittsburgh MSA: 2010") +
  mapTheme(), ncol=2)

MSA_fishnet01 <-
  MSA_fishnet01 %>%
  rownames_to_column("fishnetID") %>%
  mutate(fishnetID = as.numeric(fishnetID)) 

fishnetPop00 <-
  st_interpolate_aw(MSAPop00["pop_2000"],MSA_fishnet01,extensive=TRUE) %>%
  as.data.frame(.) %>%
  left_join(MSA_fishnet01,.,by="geometry") %>%
  mutate(pop_2000=replace_na(pop_2000,0)) %>%
  dplyr::select(fishnetID,pop_2000)

fishnetPop10 <-
  st_interpolate_aw(MSAPop10["pop_2010"],MSA_fishnet01,extensive=TRUE) %>%
  as.data.frame(.) %>%
  left_join(MSA_fishnet01,.,by="geometry") %>%
  mutate(pop_2010=replace_na(pop_2010,0)) %>%
  dplyr::select(pop_2010,fishnetID)

fishnetPop <- 
  cbind(fishnetPop00,fishnetPop10) %>%
  dplyr::select(pop_2000,pop_2010,fishnetID) %>%
  mutate(pop_Change = pop_2010 - pop_2000) %>% # calculate pop change
  st_drop_geometry()


MSA_fishnet01 <- left_join(MSA_fishnet01,data.frame(fishnetPop),by="fishnetID") # join to fishnet

# Compare tract / fishnet pop
grid.arrange(
ggplot() +
  geom_sf(data=MSAPop10, aes(fill=factor(ntile(pop_2010,5))),colour=NA) +
  scale_fill_manual(values = palette5,
                    labels=substr(quintileBreaks(MSAPop10,"pop_2010"),1,4),
                   name="Quintile\nBreaks") +
  labs(title="Population, Pittsburgh MSA: 2010",
       subtitle="Represented as tracts; Boundaries omitted") +
  mapTheme(),

ggplot() +
  geom_sf(data=MSA_fishnet01, aes(fill=factor(ntile(pop_2010,5))),colour=NA) +
  scale_fill_manual(values = palette5,
                   labels=substr(quintileBreaks(fishnetPop,"pop_2010"),1,4),
                   name="Quintile\nBreaks") +
  labs(title="Population, Pittsburgh MSA: 2010",
       subtitle="Represented as fishnet gridcells; Boundaries omitted") +
  mapTheme(), ncol=2)

# Plot pop change (fishnet) since it is not obvious
ggplot() +
  geom_sf(data = MSA_fishnet01, aes(fill=factor(ntile(pop_Change,5))), colour=NA) +
  scale_fill_manual(values = palette5,
                    labels=substr(quintileBreaks(MSA_fishnet01,"pop_Change"), 0,4),
                   name="Quintile\nBreaks") +
  labs(title="Population change, Pittsburgh MSA: 2000-2010") +
  mapTheme()

7.2.6 Distance to highway

# Wrangle highway data
PittsburghHighways <-
  st_read("MSAHighways/MSAHighways.shp") %>%
  st_transform(st_crs(Pittsburgh_MSA)) %>%
  st_intersection(Pittsburgh_MSA_boundary)

PittsburghHighways <- 
  PittsburghHighways %>%
  dplyr::select(CTY_CODE, DISTRICT_N,MAINT_FUNC,geometry)

palette2new <- c("NA", "#fd9235")

# plot highway
ggplot() +
  geom_sf(data=Pittsburgh_MSA, color=NA, fill="#f8edeb")+
  geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.5) +
  geom_sf(data=MSA_fishnet01, aes(fill=as.factor(developed_0111)),color='transparent') +
  scale_fill_manual(values=palette2new, name ="Land Cover\nChange",labels=c("No Change","New Development")) +
  labs(title = "Pittsburgh MSA New Development and Highways") +
  mapTheme()

# calculate dist to highway
emptyRaster <- lc_change
emptyRaster[] <- NA

highway_raster <- 
  as(PittsburghHighways,'Spatial') %>%
  rasterize(.,emptyRaster)

# writeRaster(highway_raster,"highway_raster.tif")

highway_raster_distance <- distance(highway_raster)
names(highway_raster_distance) <- "distance_highways"

highwayPoints <-
  rasterToPoints(highway_raster_distance) %>%
  as.data.frame() %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(MSA_fishnet01))

highwayPoints_fishnet <- 
  aggregate(highwayPoints, MSA_fishnet01, mean) %>%
  mutate(distance_highways = ifelse(is.na(distance_highways),0,distance_highways)) %>%
  rownames_to_column("fishnetID") %>%
  mutate(fishnetID = as.numeric(fishnetID)) 
  

highwayPoints_fishnet <-highwayPoints_fishnet %>%
  st_drop_geometry(.)

MSA_fishnet01 <- left_join(MSA_fishnet01,highwayPoints_fishnet,by="fishnetID") %>% # join to fishnet
  st_as_sf()

# plot dist to highway
ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_point(data=MSA_fishnet01 , aes(x=xyC(MSA_fishnet01 )[,1], 
                                             y=xyC(MSA_fishnet01)[,2], 
                 colour=factor(ntile(distance_highways,5))),size=1.5) +
  scale_colour_manual(values = palette5,
                      labels=substr(quintileBreaks(MSA_fishnet01 ,"distance_highways"),1,8),
                      name="Quintile\nBreaks") +
  geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.8) +
  labs(title = "Distance to Highways",
       subtitle = "As fishnet centroids; Highways visualized in gray") +
  mapTheme()

7.2.7 Spatial lag of development

# Calculate spatial lag of dev
MSA_fishnet01$lagDevelopment <-
  nn_function(xyC(MSA_fishnet01),
              xyC(filter(MSA_fishnet01,developed_2001==1)),
              2)

# plot spatial lag of dev
ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_point(data=MSA_fishnet01, 
             aes(x=xyC(MSA_fishnet01)[,1], y=xyC(MSA_fishnet01)[,2], 
                 colour=factor(ntile(lagDevelopment,5))), size=1.5) +
  scale_colour_manual(values = palette5,
                     labels=substr(quintileBreaks(MSA_fishnet01, "lagDevelopment"),1,7),
                     name="Quintile\nBreaks") +
  labs(title = "Spatial lag to 2001 development",
       subtitle = "As fishnet centroids",
       caption = "Accessibility to the existing development is measured by calculating the average distance from each grid to its 2 nearest developed neighboring grid cells in 2001") +
  mapTheme() +
  geom_sf(data=MSA_fishnet01, aes(fill=as.factor(developed_0111)),color='transparent', alpha=0.8) +
  scale_fill_manual(values=palette2new, name ="Land Cover\nChange",labels=c("No Change","New Development")) 

# rename land use change
colnames(MSA_fishnet01)[which (names(MSA_fishnet01)=="developed_0111")] <-"lc_change"
# create final dishnet with all features
if ( any("pop_2000.x" == names(MSA_fishnet01))) {
  MSA_fishnet01 <- 
    MSA_fishnet01 %>%
    mutate(pop_2000 = pop_2000.x,
         pop_2010 = pop_2010.x,
         pop_Change = pop_Change.x)
} # this happens sometimes

dat <-
  MSA_fishnet01 %>%
  dplyr::select(lc_change, developed_2001,wetlands_2001,forest_2001,farm_2001,otherUndeveloped_2001,water_2001,pop_2000,pop_2010, pop_Change,
                distance_highways, lagDevelopment)  %>%
  st_join(Pittsburgh_MSA) %>%
  mutate(developed11 = ifelse(lc_change==1 & developed_2001 == 1, 0, developed_2001)) %>%
  filter(water_2001 == 0)

dat$lc_change <- as.factor(dat$lc_change)

if (any( "geometry.x" == colnames(dat))) {
  dat <- dat %>%
  dplyr::mutate(geometry = geometry.x) %>%
  dplyr::select(-geometry.x)
}

7.2.8 Exploratory Analysis

# Create exploratory plot
dat %>%
  dplyr::select(distance_highways,lagDevelopment,lc_change) %>%
  gather(Variable, Value, -lc_change, -geometry) %>%
  ggplot(., aes(lc_change, Value, fill=lc_change)) + 
    geom_bar(position = "dodge", stat = "summary", fun = "mean") +
    facet_wrap(~Variable) +
    scale_fill_manual(values = palette2b,
                      labels=c("No Change","New Development"),
                      name="") +
    labs(title="New development as a function of the countinuous variables") +
    plotTheme() 

# Create exploratory plot
dat %>%
  dplyr::select(pop_2000,pop_2010,pop_Change,lc_change) %>%
  mutate(pop_Change=pop_Change) %>%
  gather(Variable, Value, -lc_change, -geometry) %>%
  ggplot(., aes(lc_change, Value, fill=lc_change)) + 
    geom_bar(position = "dodge", stat = "summary", fun = "mean") +
    facet_wrap(~Variable, scales="free") +
    scale_fill_manual(values = palette2b,
                      labels=c("No Change","New Development"),
                      name="") +
    labs(title="New development as a function of factor variables",
         subtitle = "Original Population Change") +
    plotTheme()

# Create exploratory table
table <-dat %>%
  dplyr::select(lc_change:otherUndeveloped_2001) %>%
  gather(Land_Cover_Type, Value, -lc_change, -geometry) %>%
   st_set_geometry(NULL) %>%
     group_by(lc_change, Land_Cover_Type) %>%
     summarize(n = sum(as.numeric(Value))) %>%
     ungroup() %>%
    mutate(Conversion_Rate = paste0(round(100 * n/sum(n), 2), "%")) %>%
    filter(lc_change == 1) %>%
  dplyr::select(Land_Cover_Type,Conversion_Rate) 
table <- data.frame(table)
table$Land_Cover_Type[1]<-"Developed"
table$Land_Cover_Type[2]<-"Farm"
table$Land_Cover_Type[3]<-"Forest"
table$Land_Cover_Type[4]<-"Other Undeveloped"
formattable(table,
            align=c("l","l"))

7.2.9 Modeling

# Start modeling process by create testing and training samples
addTaskCallback(function(...) {set.seed(2021);TRUE})

trainIndex <- 
  createDataPartition(dat$developed_2001, p = .50,
                                  list = FALSE,
                                  times = 1)
datTrain <- dat[ trainIndex,]
datTest  <- dat[-trainIndex,]

Model1 <- glm(lc_change ~ wetlands_2001 + forest_2001 + farm_2001 + otherUndeveloped_2001, 
              family="binomial"(link="logit"), data = datTrain)

Model2 <- glm(lc_change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment, 
              family="binomial"(link="logit"), data = datTrain)
              
Model3 <- glm(lc_change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment + pop_2000, 
              family="binomial"(link="logit"), data = datTrain)          
              
Model4 <- glm(lc_change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment + pop_2000 + 
              pop_2010, 
              family="binomial"(link="logit"), data = datTrain)              
            
Model5 <- glm(lc_change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment + pop_Change, 
              family="binomial"(link="logit"), data = datTrain)              
              
Model6 <- glm(lc_change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment + pop_Change + 
              distance_highways, 
              family="binomial"(link="logit"), data = datTrain) 

# Output model comparison table
stargazer(Model1, Model2, Model3, Model4, Model5, Model6, type='html')


# Prepare to compare McFadden r2
modelList <- paste0("Model", 1:6)
models <- map_dfc(modelList, function(x)pR2(get(x)))[4,] %>%
  setNames(paste0("Model",1:6)) %>%
  gather(Model,McFadden) %>%
  as.data.frame()

# Plot Mcfadden R2
ggplot(models, aes(Model, McFadden)) +
  geom_bar(stat="identity", fill="#b8dedc") +
  labs(title= "McFadden R-Squared by Model") +
  plotTheme()

# Get predicted prob with the best model
testSetProbs <- 
  data.frame(class = datTest$lc_change,
             probs = predict(Model6, datTest, type="response")) 

# Plot probability density
ggplot(testSetProbs, aes(probs)) +
  geom_density(aes(fill=class), alpha=0.5) +
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development")) +
  labs(title = "Density plot of test set predicted probabilities",
       x="Predicted Probabilities",y="Density") +
  plotTheme()

# zoom closer
ggplot(testSetProbs, aes(probs)) +
  geom_density(aes(fill=class), alpha=0.5) +
  xlim(0, 0.13)+
  ylim(0, 100)+
  scale_fill_manual(values = palette2,
                    labels=c("No Change","New Development")) +
  labs(title = "Density plot of test set predicted probabilities",
       x="Predicted Probabilities",y="Density") +
  plotTheme()

# Calculate the cutoff that balance sensitivity and specificity
options(yardstick.event_first = FALSE)

classProbs <- predict (Model6, datTest, type='response')

testProbs <- data.frame (obs = as.numeric(datTest$developed11), pred = classProbs)

pred <- prediction(testProbs[is.na(testProbs$pred)==FALSE,]$pred,testProbs[is.na(testProbs$pred)==FALSE,]$obs)
f.perf<-performance(pred,"f")
plot(f.perf)

F.score <-c(f.perf@y.values[[1]])
cutoff<-c(f.perf@x.values[[1]])
F.score_table<-data.frame(cbind(F.score,cutoff))
F.score_table[which.max(F.score_table$F.score),] #0.002;

testSetProbs <- 
  testSetProbs %>% 
  mutate(
        predClass_002 = as.factor(ifelse(testSetProbs$probs >= 0.002 ,1,0)),
        predClass_02 = as.factor(ifelse(testSetProbs$probs >= 0.02 ,1,0)),
        predClass_03 = as.factor(ifelse(testSetProbs$probs >= 0.03 ,1,0)), # V
        predClass_05 = as.factor(ifelse(testSetProbs$probs >= 0.05 ,1,0))
         ) 

# Output table that compares different threshold
testSetProbs %>%
  dplyr::select(-probs) %>%
  gather(Variable, Value, -class) %>%
  group_by(Variable) %>%
  summarize(Sensitivity = round(yardstick::sens_vec(class,factor(Value)),2),
            Specificity = round(yardstick::spec_vec(class,factor(Value)),2),
            Accuracy = round(yardstick::accuracy_vec(class,factor(Value)),2)) %>% 
  formattable(table,
            align=c("l","l"))

# Plot map outcome with different threshold
predsForMap <-         
  dat %>%
    mutate(probs = predict(Model6, dat, type="response") ,
           Threshold_0.2_Pct = as.factor(ifelse(probs >= 0.002 ,1,0)),
           Threshold_2_Pct =  as.factor(ifelse(probs >= 0.02, 1,0))) %>%
    dplyr::select(lc_change,Threshold_0.2_Pct,Threshold_2_Pct) %>%
    gather(Variable,Value, -geometry) %>%
    st_cast("POLYGON")

ggplot() +
  geom_point(data=predsForMap, aes(x=xyC(predsForMap)[,1], y=xyC(predsForMap)[,2], colour=Value)) +
  facet_wrap(~Variable) +
  scale_colour_manual(values = palette2b, labels=c("No Change","New Development"),
                      name="") +
  labs(title="Development predictions - Threshold comparison") + 
  theme(legend.position="bottom") +
  mapTheme()

# Plot map outcome with diff threshold for different Confusion Matrix measures
ConfusionMatrix.metrics <-
  dat %>%
    mutate(probs = predict(Model6, dat, type="response") ,
           Threshold_0.2_Pct = as.factor(ifelse(probs >= 0.002 ,1,0)),
           Threshold_2_Pct =  as.factor(ifelse(probs >= 0.02, 1,0))) %>%
    mutate(TrueP_0.2 = ifelse(lc_change  == 1 & Threshold_0.2_Pct == 1, 1,0),
           TrueN_0.2 = ifelse(lc_change  == 0 & Threshold_0.2_Pct == 0, 1,0),
           TrueP_2 = ifelse(lc_change  == 1 & Threshold_2_Pct == 1, 1,0),
           TrueN_2 = ifelse(lc_change  == 0 & Threshold_2_Pct == 0, 1,0)) %>%
    dplyr::select(., starts_with("True")) %>%
    gather(Variable, Value, -geometry) %>%
    st_cast("POLYGON") 

ggplot(data=ConfusionMatrix.metrics) +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_point(aes(x=xyC(ConfusionMatrix.metrics)[,1], 
                 y=xyC(ConfusionMatrix.metrics)[,2], colour = as.factor(Value))) +
  facet_wrap(~Variable) +
  scale_colour_manual(values = palette2, labels=c("Correct","Incorrect"),
                       name="") +
  labs(title="Development predictions - map by confusion matrix") + 
  theme(legend.position="bottom") +
  mapTheme()

# Spatial CV
spatialCV <- function(dataFrame, uniqueID, dependentVariable, modelName) {

#initialize a data frame 
endList <- list()

#create a list that is all the spatial group unqiue ids in the data frame (ie counties)    
  uniqueID_List <- unique(dataFrame[[uniqueID]])  
  x <- 1
  y <- length(uniqueID_List)
  
#create a counter and while it is less than the number of counties...  
  while(x <= y) 
  {
#call a current county    
    currentUniqueID <- uniqueID_List[x]
#create a training set comprised of units not in that county and a test set of units
#that are that county
    training <- dataFrame[ which(dataFrame[[uniqueID]] != uniqueID_List[x]),]
    testing <- dataFrame[ which(dataFrame[[uniqueID]] == uniqueID_List[x]),]
#create seperate xy vectors
    trainingX <- training[ , -which(names(training) %in% c(dependentVariable))]
    testingX <- testing[ , -which(names(testing) %in% c(dependentVariable))]
    
    trainY <- training[[dependentVariable]]
    testY <- testing[[dependentVariable]]
#Calculate predictions on the test county as part of a data frame including the observed
#outcome and the unique county ID    
   thisPrediction <- 
     data.frame(class = testY,
                probs = predict(modelName, testingX, type="response"),
                county = currentUniqueID) 

#Row bind the predictions to a data farme
   endList <- rbind(endList, thisPrediction)
#iterate counter    
    x <- x + 1 
  } 
#return the final list of counties and associated predictions  
  return (as.data.frame(endList))
}

spatialCV_counties <-
  spatialCV(dat,"NAME","lc_change", Model6) %>%
  mutate(predClass = as.factor(ifelse(probs >= 0.03 ,1,0)))

# Output table of measures for different counties
spatialCV_metrics <-
  spatialCV_counties %>% 
    group_by(county) %>% 
    summarize(Observed_Change = sum(as.numeric(as.character(class))),
              Sensitivity = round(yardstick::sens_vec(class,predClass),2),
              Specificity = round(yardstick::spec_vec(class,predClass),2),
              Accuracy = round(yardstick::accuracy_vec(class,predClass),2)) 

spatialCV_metrics %>%
  formattable(table,
            align=c("l","l"))

7.2.10 2021 Forecast

# Reclass 2011 land cover type
developed11 <- lc_2011 == 21 | lc_2011 == 22 | lc_2011 == 23 | lc_2011 == 24
forest11 <- lc_2011 == 41 | lc_2011 == 42 | lc_2011 == 43 
farm11 <- lc_2011 == 81 | lc_2011 == 82 
wetlands11 <- lc_2011 == 90 | lc_2011 == 95 
otherUndeveloped11 <- lc_2011 == 52 | lc_2011 == 71 | lc_2011 == 31 
water11 <- lc_2011 == 11

names(developed11) <- "developed11"
names(forest11) <- "forest11"
names(farm11) <- "farm11"
names(wetlands11) <- "wetlands11"
names(otherUndeveloped11) <- "otherUndeveloped11"
names(water11) <- "water11"

layer_list_11 <- list(developed11, wetlands11, forest11, farm11, otherUndeveloped11, water11)
names(layer_list_11) <- c("developed_2011", "wetlands_2011", "forest_2011", 
                       "farm_2011", "otherUndeveloped_2011", "water_2011")

fish_extract <- function(fishnet, layers) {
  extract_list <- exactextractr::exact_extract(layers, fishnet, fun = "mode")
  return(extract_list)
}


lc_2011_extracts <- lapply(layer_list_11, fish_extract, fishnet = MSA_fishnet)
lc_2011_extracts <- data.frame(do.call(cbind, args = lc_2011_extracts))
MSA_fishnet11 <- cbind(data.frame(MSA_fishnet), lc_2011_extracts)

if ( any("pop_2000.x" == names(MSA_fishnet11))) {
  MSA_fishnet11 <- 
    MSA_fishnet11 %>%
    mutate(pop_2000 = pop_2000.x,
         pop_2010 = pop_2010.x,
         pop_Change = pop_Change.x)
} 

lc_2011_lcvar <-
  MSA_fishnet11 %>%
  gather(var,value,developed_2011:water_2011) %>%
  dplyr::select(var,value,geometry) %>%
  st_as_sf()

lc_2011_lcvar$var <- as.factor(str_sub(lc_2011_lcvar$var,1,-6))

# Join with 2001 fishnet to collect pop features that were already there
MSA_fishnet11 <-
  MSA_fishnet11 %>%
  rownames_to_column("fishnetID") %>%
  mutate(fishnetID = as.numeric(fishnetID)) 

MSA_fishnet0111 <- left_join(MSA_fishnet11, MSA_fishnet01, by="fishnetID") %>%
  st_as_sf()

MSA_fishnet11 <-
  MSA_fishnet0111 %>%
  dplyr::select(-developed_2001, -wetlands_2001, -forest_2001, -otherUndeveloped_2001, -farm_2001, -water_2001)


MSA_fishnet11$lagDevelopment <-
  nn_function(xyC(MSA_fishnet11),
              xyC(filter(MSA_fishnet11,developed_2011==1)),
              2)

colnames(MSA_fishnet11)[which (names(MSA_fishnet11)=="developed_0111")] <-"lc_change"

MSA_fishnet11 <- MSA_fishnet11 %>%
  mutate(developed11 = ifelse(lc_change==1 & developed_2011 == 1, 0, developed_2011)) 

dat11 <-
  MSA_fishnet11 %>%
  dplyr::select(lc_change, developed_2011,wetlands_2011,forest_2011,farm_2011,otherUndeveloped_2011,water_2011,pop_2000,pop_2010, pop_Change,
                distance_highways, lagDevelopment, fishnetID)  %>%
  st_join(Pittsburgh_MSA) %>%
  filter(water_2011 == 0)

# Force variable names to be same with model *2011 features will be represented by 2001 in variable names*
dat11 <-
  dat11 %>%
  dplyr::mutate(developed_2001 = developed_2011,
                wetlands_2001 = wetlands_2011,
                forest_2001 = forest_2011,
                farm_2001 = farm_2011,
                otherUndeveloped_2001 = otherUndeveloped_2011) %>%
  dplyr::select(-developed_2011, -wetlands_2011, -forest_2011, -farm_2011, -otherUndeveloped_2011)

# Compare land cover between 2001 and 2011
ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_raster(data = rbind(rast(lc_2001) %>% mutate(label = "2001"),
                           rast(lc_2011) %>% mutate(label = "2011")) %>% 
              na.omit %>% filter(value > 0), 
              aes(x,y,fill=as.factor(value))) +
  facet_wrap(~label) +
  scale_fill_viridis(discrete=TRUE, direction=-1,name ="") +
  labs(title = "Land Cover, 2001 & 2011") +
  mapTheme() + theme(legend.position = "none")

# Plot 2011 land cover type
ggplot()+
  geom_sf(data=lc_2011_lcvar, aes(fill=as.factor(value)),color='transparent')+
  facet_wrap(~var)+
  scale_fill_manual(values=palette2, labels=c("Other","Land Cover"),name ="")+
  labs(title = "Land cover types, 2011",
         subtitle = "As fishnet centroids") +
  mapTheme()

 Write in population project for 2021 by county
countyPopulation_2021 <- 
  dat %>%
       st_set_geometry(NULL) %>%
       group_by(NAME) %>%
       summarize(county_population_2011 = sum(pop_2010)) %>%
  left_join(
    data.frame(
   NAME = 
     c("Fayette","Washington","Westmoreland","Allegheny","Beaver","Armstrong","Butler"),
   county_projection_2021 = 
     c(129274-1049*2, 206865-153*2, 348899-1560*2, 1216045-1236*2, 163929-653*2, 64735-617*2, 187853+215*2))
  )

# Data source: 2019 estimate from upitt https://www.ucsur.pitt.edu/perspectives.php?b=20221115818670

# Plot population projection without new highway
countyPopulation_2021 %>%
  gather(Variable,Value, -NAME) %>%
  ggplot(aes(reorder(NAME,-Value),Value)) +
  geom_bar(aes(fill=Variable), stat = "identity", position=position_dodge()) +
  scale_fill_manual(values = palette2,
                    labels=c("2011","2021"),
                    name="Population") +
  labs(title="Population Change by county: 2011 - 2021",
       x="County", y="Population",
       caption="*Allegheny County is projected to have the greatest population by 2021 in the Pittsburgh MSA, \nfollowed by Westmoreland and Washington Counties. Since most of Allegheny County has \nbeen developed, it suggests the development scenario in Allegheny County will involve more \ninfill development.") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  plotTheme()

# Demand scenario without any new development, continue growth from past pattern (2001-2011)
# However, the population data will be from upitt's 2021 projection instead of 2010 census

dat_infill <-
  dat11 %>%
  #calculate population change
    left_join(countyPopulation_2021) %>%
    mutate(proportion_of_county_pop = pop_2010 / county_population_2011,
           pop_2021.infill = proportion_of_county_pop * county_projection_2021,
           pop_Change = pop_2021.infill - pop_2010) %>%
    dplyr::select(-county_projection_2021, -county_population_2011, 
                  -proportion_of_county_pop, -pop_2021.infill) %>%
  #predict for 2021;
    mutate(predict_2021.infill = predict(Model6,. , type="response"))

dat_infill %>%
  ggplot() +  
  geom_point(aes(x=xyC(dat_infill)[,1], y=xyC(dat_infill)[,2], colour = factor(ntile(predict_2021.infill,5)))) +
  scale_colour_manual(values = palette5,
                    labels=substr(quintileBreaks(dat_infill, "predict_2021.infill"),1,8),
                    name="Predicted \nProbabilities \nin Quintile\nBreaks") +
  geom_sf(data=Pittsburgh_MSA, fill=NA, colour="black", size=1.5) +
  labs(title
       
       
dat2 <-
  MSA_fishnet0111 %>%
   mutate(sensitive_lost11 = ifelse(forest_2001 == 1 & forest_2011 == 0 |
                                    wetlands_2001 == 1 & wetlands_2011 == 0,1,0)) %>% # this includes lands that are not developed at all (farm)
  st_transform(st_crs(Pittsburgh_MSA))
                      
ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_point(data=dat2, aes(x=xyC(dat2)[,1], y=xyC(dat2)[,2], colour=as.factor(sensitive_lost11))) +
  scale_colour_manual(values = palette2,
                      labels=c("No Change","Sensitive Lost"),
                      name = "") +
  labs(title = "Sensitive lands lost: 2001 - 2011",
       subtitle = "As fishnet centroids") +
  mapTheme()

sensitiveRegions <- 
  raster::clump(wetlands11 + forest11) %>%
  rasterToPolygons() %>%
  st_as_sf() %>%
  group_by(clumps) %>% summarize() %>%
    mutate(Acres = as.numeric(st_area(.) * 0.0000229568)) %>%
    filter(Acres > 3954)  %>%
  dplyr::select() %>%
  rasterize(.,emptyRaster) 
sensitiveRegions[sensitiveRegions > 0] <- 1  
names(sensitiveRegions) <- "sensitiveRegions"

dat2 <-
  aggregateRaster(c(sensitiveRegions), dat2) %>%
  dplyr::select(sensitiveRegions) %>%
  st_set_geometry(NULL) %>%
  bind_cols(.,dat2) %>%
  st_sf()

ggplot() +
  geom_sf(data=Pittsburgh_MSA_boundary) +
  geom_point(data=dat2, aes(x=xyC(dat2)[,1], y=xyC(dat2)[,2], colour=as.factor(sensitiveRegions))) +
  scale_colour_manual(values = palette2,
                      labels=c("Other","Sensitive Regions"),
                      name="") +
  labs(title = "Sensitive regions",
       subtitle = "Continous areas of either wetlands or forests\ngreater than 1 acre") +
  mapTheme()

# Allocation metrics by county
# join demand and dat together
dat_sens <- dat2 %>%
  dplyr::select(sensitiveRegions, sensitive_lost11, fishnetID) %>%
  st_drop_geometry()

dat_infill <- 
  left_join(dat_infill, dat_sens) %>%
  mutate(Development_Demand = predict_2021.infill) %>%
  filter(is.na(Development_Demand)==F)

county_specific_metrics <- 
  dat_infill %>%
  left_join(st_set_geometry(dat, NULL) %>% group_by(NAME) %>% summarize(count = n())) %>%
  #calculate summary statistics by county
  group_by(NAME) %>%
  summarize(Total_Farmland = sum(farm_2001) / max(count),
            Total_Forest = sum(forest_2001) / max(count),
            Total_Wetlands = sum(wetlands_2001) / max(count),
            Total_Undeveloped = sum(otherUndeveloped_2001) / max(count),
            Sensitive_Land_Lost = sum(sensitive_lost11) / max(count),
            Sensitive_Regions = sum(sensitiveRegions) / max(count),
            Mean_Development_Demand = mean(Development_Demand)) %>%
  #get population data by county
  left_join(countyPopulation_2021 %>% 
            mutate(Population_Change = county_projection_2021 - county_population_2011,
                   Population_Change_Rate = Population_Change / county_projection_2021) %>%
            dplyr::select(NAME,Population_Change_Rate))

if ( any("geometry.x" == colnames(county_specific_metrics))) {
  county_specific_metrics <- county_specific_metrics %>%
    st_drop_geometry()
}

# Plot county allocation metrics
county_specific_metrics %>%
  gather(Variable, Value, -NAME) %>%
  mutate(Variable = factor(Variable, levels=c("Population_Change_Rate","Mean_Development_Demand",
                                              "Total_Farmland","Total_Undeveloped","Total_Forest",
                                              "Total_Wetlands","Sensitive_Land_Lost","Sensitive_Regions",
                                              ordered = TRUE))) %>%
  mutate(Planning_Designation = case_when(
    Variable == "Population_Change_Rate" | Variable == "Mean_Development_Demand" ~ "Demand-Side",
    Variable == "Total_Farmland" | Variable == "Total_Undeveloped"               ~ "Suitable",
    TRUE                                                                         ~ "Not Suitable")) %>%
  ggplot(aes(x=Variable, y=Value, fill=Planning_Designation)) +
    geom_bar(stat="identity", position=position_dodge(), colour="black") +
    facet_wrap(~NAME, ncol=4) +
    coord_flip() +
    scale_y_continuous(breaks = seq(0, 1, by = 0.5)) +
    geom_vline(xintercept = 2.5) + geom_vline(xintercept = 4.5) +
    scale_fill_manual(values=c("#3b7d64","#e29578","#42999b")) +
    labs(title= "County Specific Allocation Metrics", subtitle= "As rates", x="Indicator", y="Rate") +
    plotTheme() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position="bottom")


# New highway! Supply-side change
# Model relationship between highway and pop change
popProject <- glm(pop_Change ~ wetlands_2001 + forest_2001  + farm_2001 + otherUndeveloped_2001 + lagDevelopment + distance_highways,
  data = dat)

# Output model table
stargazer(popProject, type='html')

# Update pop projection if proposed highway was built
countyPopulation_2021_withHighway <- 
  countyPopulation_2021 %>%
  mutate(new_county_projection_2021 = ifelse( NAME == "Allegheny" | NAME == "Washington", county_projection_2021 * (1.22*0.002 + 1), county_projection_2021))

# Data source: 2019 estimate from upitt https://www.ucsur.pitt.edu/perspectives.php?b=20221115818670

# Plot pop comparisin, 2011, 2021, 2021 with highway
countyPopulation_2021_withHighway %>%
  gather(Variable,Value, -NAME) %>%
  ggplot(aes(reorder(NAME,-Value),Value)) +
  geom_bar(aes(fill=Variable), stat = "identity", position=position_dodge()) +
  scale_fill_manual(values = palette3,
                    labels=c("2011","2021","2021 with highway"),
                    name="Population") +
  labs(title="Population Change by county: 2011 - 2021",
       x="County", y="Population") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  plotTheme()

# Update distance to highway
newHighways <-
  st_read("New_Highways/New_Highways.shp") %>%
  st_transform(st_crs(Pittsburgh_MSA))

updatedHighways <-
  st_read("Updated_HWay/Updated_HWay.shp") %>%
  st_transform(st_crs(Pittsburgh_MSA))

updated_highway_raster <- 
  as(updatedHighways,'Spatial') %>%
  rasterize(.,emptyRaster)

updated_highway_raster_distance <- distance(updated_highway_raster)
names(updated_highway_raster_distance) <- "updated_distance_highways"

updated_highwayPoints <-
  rasterToPoints(updated_highway_raster_distance) %>%
  as.data.frame() %>%
  st_as_sf(coords = c("x", "y"), crs = st_crs(MSA_fishnet))

updated_highwayPoints_fishnet <- 
  aggregate(updated_highwayPoints, MSA_fishnet, mean) %>%
  mutate(updated_distance_highways = ifelse(is.na(updated_distance_highways), 0, updated_distance_highways)) %>%
  rownames_to_column("fishnetID") %>%
  mutate(fishnetID = as.numeric(fishnetID)) 

# change dist to highway in 11 dat
dat11_withHighway <- 
  dat11 %>%
  st_join(., updated_highwayPoints_fishnet) %>%
  dplyr::mutate(distance_highways = updated_distance_highways)

dat_infill_withHighway <-
  dat11_withHighway %>%
  #calculate population change
    left_join(countyPopulation_2021_withHighway) %>%
    mutate(proportion_of_county_pop = pop_2010 / county_population_2011,
           pop_2021.infill = proportion_of_county_pop * new_county_projection_2021,
           pop_Change = pop_2021.infill - pop_2010) %>%
    dplyr::select(-county_projection_2021, -county_population_2011, 
                  -proportion_of_county_pop, -pop_2021.infill) %>%
  #predict for 2021; only changed pop and lag
    mutate(predict_2021.infill = predict(Model6,. , type="response")) # calculate new dist to pop

# Plot pop change with highway map
grid.arrange(
  dat_infill_withHighway %>%
    ggplot() +  
    geom_point(aes(x=xyC(dat_infill_withHighway)[,1], y=xyC(dat_infill_withHighway)[,2], colour = factor(ntile(pop_Change,5)))) +
    scale_colour_manual(values = palette5,
                      labels=substr(quintileBreaks(dat_infill_withHighway, "pop_Change"),1,8),
                      name="Population\nChange\nin Quintile\nBreaks") +
    geom_sf(data=Pittsburgh_MSA, fill=NA, colour="black", size=1.5) +
    geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.8) +
    geom_sf(data=newHighways, fill=NA, colour="#e5989b", size=1) +
    labs(title= "Population Change in 2021\nwith Proposed Highway",
         subtitle="New highway shown in pink") +
    mapTheme(),
  
  dat_infill %>%
    ggplot() +  
    geom_point(aes(x=xyC(dat_infill)[,1], y=xyC(dat_infill)[,2], colour = factor(ntile(pop_Change,5)))) +
    scale_colour_manual(values = palette5,
                      labels=substr(quintileBreaks(dat_infill, "pop_Change"),1,8),
                      name="Population\nChange\nin Quintile\nBreaks") +
    geom_sf(data=Pittsburgh_MSA, fill=NA, colour="black", size=1) +
    geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.8) +
    labs(title= "Population Change in 2021\nwithout Proposed Highway",
         subtitle=" ") +
    mapTheme(), ncol=2
)

# Plot demand change with highway and comparison map
grid.arrange( 
  dat_infill_withHighway %>%
    ggplot() +  
    geom_point(aes(x=xyC(dat_infill_withHighway)[,1], y=xyC(dat_infill_withHighway)[,2], colour = factor(ntile(predict_2021.infill,5)))) +
    scale_colour_manual(values = palette5,
                      labels=substr(quintileBreaks(dat_infill_withHighway, "predict_2021.infill"),1,8),
                      name="Predicted\nProbabilities\nin Quintile\nBreaks") +
    geom_sf(data=Pittsburgh_MSA, fill=NA, colour="black", size=1.5) +
    geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.8) +
    geom_sf(data=newHighways, fill=NA, colour="#e5989b", size=1) +
    labs(title= "Development Potential in 2021\nwith Proposed Highway",
         subtitle="(New highway shown in pink)") +
    mapTheme(),
  
  dat_infill %>%
    ggplot() +  
    geom_point(aes(x=xyC(dat_infill)[,1], y=xyC(dat_infill)[,2], colour = factor(ntile(predict_2021.infill,5)))) +
    scale_colour_manual(values = palette5,
                      labels=substr(quintileBreaks(dat_infill, "predict_2021.infill"),1,8),
                      name="Predicted\nProbabilities\nin Quintile\nBreaks") +
    geom_sf(data=Pittsburgh_MSA, fill=NA, colour="black", size=1) +
    geom_sf(data=PittsburghHighways, color='#6c757d', size=1, alpha=0.8) +
    labs(title= "Development Potential in 2021\nwithout Proposed Highway",
         subtitle=" ") +
    mapTheme(), ncol=2
)

7.2.11 Allocation

# Allegheny allocation filter
Allegheny <-
  dat_infill_withHighway %>%
  filter(NAME == "Allegheny") %>%
  mutate(Development_Demand = predict_2021.infill)

Allegheny_landUse <- rbind(
  filter(Allegheny, forest_2001 == 1 | wetlands_2001 == 1 ) %>% #these are actually 2011.
  dplyr::select() %>% mutate(Land_Use = "Not Suitable"),
  filter(Allegheny, developed_2001 == 1) %>%
  dplyr::select() %>% mutate(Land_Use = "Developed"))

# Allegheny allocation guidance map
grid.arrange(
ggplot() +
  geom_sf(data=Allegheny, aes(fill=factor(ntile(Development_Demand,5))), colour=NA) +
  geom_point(data=Allegheny_landUse, aes(x=xyC(Allegheny_landUse)[,1], 
                                        y=xyC(Allegheny_landUse)[,2], colour=Land_Use),
                                        shape = 18, size = 1) +
  geom_sf(data=st_intersection(PittsburghHighways,filter(Pittsburgh_MSA, NAME=="Allegheny")), color="#6c757d", size=1) +
  geom_sf(data=st_intersection(newHighways,filter(Pittsburgh_MSA)), color="#e5989b", size=1) +
  scale_fill_manual(values = palette5, name="Development\nDemand",
                    labels=substr(quintileBreaks(Allegheny,"Development_Demand"),1,6)) +
  scale_colour_manual(values = c("#FD9235", "#9d8189")) + 
  labs(title = "Development Potential, \n2021: Allegheny",
       subtitle="(New highway shown in pink)") + mapTheme() +
  guides(fill = guide_legend(order = 1), colour = guide_legend(order = 2)),

ggplot() +
  geom_sf(data=Allegheny, aes(fill=factor(ntile(pop_Change,5))), colour=NA) +
  geom_point(data=Allegheny_landUse, aes(x=xyC(Allegheny_landUse)[,1], 
                                        y=xyC(Allegheny_landUse)[,2], colour=Land_Use),
                                        shape = 18, size = 1) +
  geom_sf(data=st_intersection(PittsburghHighways,filter(Pittsburgh_MSA, NAME=="Allegheny")), color="#6c757d", size=1) +
  geom_sf(data=st_intersection(newHighways,filter(Pittsburgh_MSA)), color="#e5989b", size=1) +
  scale_fill_manual(values = palette5, name="Population\nChange",
                    labels=substr(quintileBreaks(Allegheny,"pop_Change"),1,6)) +
  scale_colour_manual(values =c("#FD9235", "#9d8189")) + 
  labs(title = "Projected Population, \n2021: Allegheny",
       subtitle="(New highway shown in pink)") + mapTheme() +
  guides(fill = guide_legend(order = 1), colour = guide_legend(order = 2)), ncol=2)

# Washington filter for allocation and map
Washington <-
  dat_infill_withHighway %>%
  filter(NAME == "Washington") %>%
  mutate(Development_Demand = predict_2021.infill)

Washington_landUse <- rbind(
  filter(Washington, forest_2001 == 1 | wetlands_2001 == 1 ) %>%
  dplyr::select() %>% mutate(Land_Use = "Not Suitable"),
  filter(Washington, developed_2001 == 1) %>%
  dplyr::select() %>% mutate(Land_Use = "Developed"))

grid.arrange(
ggplot() +
  geom_sf(data=Washington, aes(fill=factor(ntile(Development_Demand,5))), colour=NA) +
  geom_point(data=Washington_landUse, aes(x=xyC(Washington_landUse)[,1], 
                                        y=xyC(Washington_landUse)[,2], colour=Land_Use),
                                        shape = 18, size = 1) +
  geom_sf(data=st_intersection(PittsburghHighways,filter(Pittsburgh_MSA, NAME=="Washington")), color="#6c757d", size=1) +
  geom_sf(data=st_intersection(newHighways,filter(Pittsburgh_MSA)), color="#e5989b", size=1) +
  scale_fill_manual(values = palette5, name="Development\nDemand",
                    labels=substr(quintileBreaks(Washington,"Development_Demand"),1,6)) +
  scale_colour_manual(values = c("#FD9235", "#9d8189")) + 
  labs(title = "Development Potential, \n2021: Washington",
       subtitle="(New highway shown in pink)") + mapTheme() +
  guides(fill = guide_legend(order = 1), colour = guide_legend(order = 2)),

ggplot() +
  geom_sf(data=Washington, aes(fill=factor(ntile(pop_Change,5))), colour=NA) +
  geom_point(data=Washington_landUse, aes(x=xyC(Washington_landUse)[,1], 
                                        y=xyC(Washington_landUse)[,2], colour=Land_Use),
                                        shape = 18, size = 1) +
  geom_sf(data=st_intersection(PittsburghHighways,filter(Pittsburgh_MSA, NAME=="Washington")), color="#6c757d", size=1) +
  geom_sf(data=st_intersection(newHighways,filter(Pittsburgh_MSA)), color="#e5989b", size=1) +
  scale_fill_manual(values = palette5, name="Population\nChange",
                    labels=substr(quintileBreaks(Washington,"pop_Change"),1,6)) +
  scale_colour_manual(values = c("#FD9235", "#9d8189")) + 
  labs(title = "Projected Population, \n2021: Washington",
       subtitle="(New highway shown in pink)") + mapTheme() +
  guides(fill = guide_legend(order = 1), colour = guide_legend(order = 2)), ncol=2)

Lamorella, J. (2013). Crisis + opportunity. Re*imagining Pittsburgh. http://pittsburghcriticalpoints.weebly.com/crisis.html ↩︎
Fraser, J. (2019, November 21). Growing smarter. Retrieved from https://pittsburghquarterly.com/articles/growing-smarter/↩︎
Center for Social & Urban Research, U. (n.d.). Perspectives: 2019 population estimates for the Pittsburgh region. Retrieved May 09, 2021, from https://www.ucsur.pitt.edu/perspectives.php?b=20221115818670 ↩︎
The Pennsylvania Turnpike Commission. (n.d.). Southern Beltway Projects. Retrieved May 9, 2021, from https://www.patpconstruction.com/southern_beltway/beltway.aspx ↩︎
Duranton, G., & Turner, M. A. (2012). Urban Growth and Transportation.The Review of Economic Studies, 79 (4), 1407-1440. http://dx.doi.org/10.1093/restud/rds010 ↩︎

Urban Growth Modeling for Pittsburgh Metropolitan Statistical Area

Planning Memo to Southwestern Pennsylvania Commission

Xinyi Qiu and Xuezhu (Gillian) Zhao

5/10/2021