This analysis uses a binomial logit model to identify determinants of change to urban land use in Chester County. First, I created a binary variable called “CHNG_URB” based on whether a given cell of land was in agricultural, pasture, or forest use in 1992 and was urbanized by 2001. If these criteria were met, CHNG_URB was given a value of 1; all other rows were given a value of 0. This is the dependent variable used in the binomial logit modeling process.
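As a minimal sketch of how such a flag could be built, the snippet below uses a toy data frame. The column names AG92, PAST92, FOR92, and URB01 are hypothetical stand-ins for the 1992 and 2001 land-use indicators, which are not shown in this write-up.

```r
# Toy data: hypothetical 0/1 land-use indicators (column names are assumptions)
dat_toy <- data.frame(
  AG92   = c(1, 0, 0, 0, 1),   # agricultural use in 1992
  PAST92 = c(0, 1, 0, 0, 0),   # pasture use in 1992
  FOR92  = c(0, 0, 1, 0, 0),   # forest use in 1992
  URB01  = c(1, 0, 1, 1, 0)    # urban use in 2001
)

# TRUE if the cell was in agricultural, pasture, or forest use in 1992
non_urban_92 <- dat_toy$AG92 == 1 | dat_toy$PAST92 == 1 | dat_toy$FOR92 == 1

# 1 if the cell was non-urban in 1992 AND urban in 2001, 0 otherwise
dat_toy$CHNG_URB <- as.integer(non_urban_92 & dat_toy$URB01 == 1)

dat_toy$CHNG_URB
```

Only the first and third toy cells meet both conditions, so only they are coded 1.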
T-Tests
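To select independent variables for the binomial logit model, I first conducted t-tests comparing the mean of “CHNG_URB” across the two groups of each potential binary independent variable. T-tests run on “1 if cell is within 300 meters of a four-lane road, 0 otherwise”, “1 if cell is within 300 meters of a SEPTA regional rail line, 0 otherwise”, and “1 if cell is within 1000 meters of a city or boro, 0 otherwise” all indicated that the null hypothesis of equal means can be rejected. Because CHNG_URB is coded 0/1, each group mean is the share of cells that urbanized: for example, about 14.3% of cells within 300 meters of a regional rail line changed to urban use, compared with about 2.2% of cells farther away.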
t.test(CHNG_URB ~ REGRAIL300, data = dat)
##
## Welch Two Sample t-test
##
## data: CHNG_URB by REGRAIL300
## t = -9.7627, df = 867.53, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.14425046 -0.09595852
## sample estimates:
## mean in group 0 mean in group 1
## 0.02240759 0.14251208
t.test(CHNG_URB ~ CITBORO_10, data = dat)
##
## Welch Two Sample t-test
##
## data: CHNG_URB by CITBORO_10
## t = -5.2009, df = 510.16, p-value = 0.0000002876
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.10074004 -0.04549873
## sample estimates:
## mean in group 0 mean in group 1
## 0.03160341 0.10472279
t.test(CHNG_URB ~ FOURLNE300, data = dat)
##
## Welch Two Sample t-test
##
## data: CHNG_URB by FOURLNE300
## t = -13.148, df = 4253.4, p-value < 0.00000000000000022
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.06769419 -0.05012552
## sample estimates:
## mean in group 0 mean in group 1
## 0.007447723 0.066357578
Chi-Squared Tests
To analyze potential continuous independent variables for inclusion, I ran chi-squared tests between “CHNG_URB” and population density in 1990, distance in meters to a regional rail station, percent white in 1990, and distance in meters to a 4-lane highway.
The results of the chi-squared tests were varied. Tests between CHNG_URB and each of POPDEN90, PCT_WHITE, and DIST_4LNE_ produced p-values close to 0, so we can reject the null hypothesis of no relationship: there is evidence that each of these variables is associated with CHNG_URB.
The test between CHNG_URB and DIST_RAILS produced a high p-value, so we fail to reject the null hypothesis and cannot conclude that the variables are associated. DIST_RAILS will not be included in the final model.
library(gmodels)  # provides CrossTable()
CrossTable(dat$CHNG_URB, dat$DIST_4LNE_, fisher = FALSE, chisq = TRUE,
           expected = TRUE, sresid = FALSE, format = "SPSS")
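A cross-tabulation against a continuous variable has one column per distinct value, which can be hard to read. One possible variation, sketched below, bins the continuous variable into quartiles before running the chi-squared test. This is an illustrative alternative on simulated data, not the procedure used in this analysis; the variable names and the simulated relationship are assumptions.

```r
set.seed(1)  # simulated data for illustration only, not the study data

n <- 2000
dist_4lane <- runif(n, min = 0, max = 10000)        # hypothetical distances in meters
p_change   <- plogis(-1 - 0.0004 * dist_4lane)      # assumed true relationship
chng_urb   <- rbinom(n, size = 1, prob = p_change)  # simulated 0/1 outcome

# Bin the continuous variable into quartiles
dist_bin <- cut(dist_4lane,
                breaks = quantile(dist_4lane, probs = seq(0, 1, by = 0.25)),
                include.lowest = TRUE)

# 2 x 4 contingency table and chi-squared test of independence
tab <- table(factor(chng_urb, levels = 0:1), dist_bin)
res <- chisq.test(tab)

tab
res
```

With four bins and a binary outcome the test has 3 degrees of freedom, and each cell of the table holds enough observations for the chi-squared approximation to be reasonable.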
Correlation Tests
Finally, I conducted correlation tests among the independent variables themselves. Each test produced a p-value small enough to reject the null hypothesis that the true correlation between the two variables is 0, so the independent variables are correlated with one another to some degree.
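A pairwise test of this kind can be run with `cor.test()`. The sketch below uses simulated data rather than the Chester County data set; the two distance variables are hypothetical and are generated to be correlated purely for illustration.

```r
set.seed(42)  # reproducible simulated data, not the study data

n <- 500
# Hypothetical distances in meters; the second is built to track the first
dist_rail  <- runif(n, min = 0, max = 10000)
dist_4lane <- 0.5 * dist_rail + rnorm(n, mean = 0, sd = 1000)

# Pearson correlation test; H0: the true correlation is 0
ct <- cor.test(dist_rail, dist_4lane)

ct$estimate  # sample correlation coefficient
ct$p.value   # a small p-value leads to rejecting H0
```

A strong correlation between two candidate predictors is a signal to consider keeping only one of them in the model.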
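This analysis considered a number of independent variables, including “within 300 meters of a SEPTA regional rail line”, “within 1000 meters of a city or boro”, “percentage of adults who are college graduates by census tract”, “white population percentage of the census tract”, and others. Ultimately, the most robust and statistically significant predictors of whether previously non-urbanized land became urbanized by 2001 were as follows:
REGRAIL300: 1 if cell is within 300 meters of a SEPTA regional rail line, 0 otherwise
CITBORO_10: 1 if cell is within 1000 meters of a city or boro, 0 otherwise
DIST_4LNE_: distance in meters to a 4-lane highway
POPDEN90: population density in 1990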
Each of these variables is highly statistically significant within the model (denoted by *** in the output below), as shown by the model summary and the analysis of deviance tests. The distance in meters to a regional rail station would have increased model fit; however, it would have introduced collinearity into the model, making the coefficient estimates unreliable.
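Based on the model results, a cell within 300 meters of a SEPTA regional rail line has log-odds of urbanization 1.314 higher than an otherwise identical cell, which multiplies the odds of urbanization by exp(1.314), or about 3.72. A cell within 1,000 meters of a city or boro has log-odds 0.838 higher, which multiplies the odds by about 2.31. Each additional meter of distance from a 4-lane highway lowers the log-odds by 0.00035, a decrease of about 0.035% in the odds. Finally, a one-unit increase in 1990 population density raises the log-odds by 0.00022, an increase of about 0.022% in the odds. These are changes in the odds, holding the other variables constant, rather than percentage-point changes in probability.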
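Because logit coefficients are on the log-odds scale, exponentiating them gives odds ratios. The sketch below does this by hand with the estimates copied from the model summary; with the fitted object available, `exp(coef(mod))` would give the same numbers.

```r
# Coefficient estimates copied from the model summary (log-odds scale)
b <- c(REGRAIL300 = 1.31391826,
       CITBORO_10 = 0.83771176,
       DIST_4LNE_ = -0.00034972,
       POPDEN90   = 0.00022022)

# Odds ratios: multiplicative change in the odds for a one-unit increase
odds_ratio <- exp(b)

# Percent change in the odds for a one-unit increase
pct_change <- 100 * (odds_ratio - 1)

round(odds_ratio, 4)
round(pct_change, 3)

# Odds ratio for a 1,000-meter increase in distance to a 4-lane highway
or_1km <- exp(1000 * b["DIST_4LNE_"])
or_1km
```

Read this way, proximity to a regional rail line multiplies the odds of urbanization by roughly 3.7, and a cell 1,000 meters farther from a 4-lane highway has odds about 30% lower, which is easier to interpret than the per-meter effect.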
These model results also make sense in practice. If a cell is within 300 meters of a SEPTA regional rail line, we would expect it to be one of the first places people would look to develop. Similarly, if a cell is within 1000 meters of a city or boro, it is a natural choice for urban expansion. Connectivity to highways is also important, especially in Pennsylvania. While people do not necessarily want to live adjacent to a 4-lane highway, it is unlikely that urban development will take place far from that kind of connectivity. Future models could strengthen this variable by determining an appropriate development buffer around 4-lane highways, while also recognizing that a location 5 miles from a 4-lane highway is not ideal for urban development. Finally, population density likely behaves much like the 4-lane highway variable. It is unlikely that very sparsely populated areas would suddenly become urbanized. However, if an area was already densely populated, it was likely already urban. This is another variable that could be studied and refined in future models to capture the turning point in population density and gain predictive power.
mod <- glm(CHNG_URB ~ REGRAIL300 + CITBORO_10 + DIST_4LNE_ + POPDEN90, data = dat, family = binomial)
summary(mod)
##
## Call:
## glm(formula = CHNG_URB ~ REGRAIL300 + CITBORO_10 + DIST_4LNE_ +
## POPDEN90, family = binomial, data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2730 -0.2736 -0.1907 -0.1218 3.4581
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.95397068 0.13061596 -22.616 < 0.0000000000000002 ***
## REGRAIL300 1.31391826 0.14168378 9.274 < 0.0000000000000002 ***
## CITBORO_10 0.83771176 0.17983905 4.658 0.00000319111192372 ***
## DIST_4LNE_ -0.00034972 0.00004404 -7.940 0.00000000000000201 ***
## POPDEN90 0.00022022 0.00005930 3.713 0.000204 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 2185.6 on 6941 degrees of freedom
## Residual deviance: 1861.4 on 6937 degrees of freedom
## AIC: 1871.4
##
## Number of Fisher Scoring iterations: 7
anova(mod, test="Chisq")
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: CHNG_URB
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 6941 2185.6
## REGRAIL300 1 195.814 6940 1989.8 < 0.00000000000000022 ***
## CITBORO_10 1 34.985 6939 1954.8 0.000000003323 ***
## DIST_4LNE_ 1 81.973 6938 1872.8 < 0.00000000000000022 ***
## POPDEN90 1 11.402 6937 1861.4 0.0007337 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
drop1(mod, test="Chisq")
## Single term deletions
##
## Model:
## CHNG_URB ~ REGRAIL300 + CITBORO_10 + DIST_4LNE_ + POPDEN90
## Df Deviance AIC LRT Pr(>Chi)
## <none> 1861.4 1871.4
## REGRAIL300 1 1943.7 1951.7 82.240 < 0.00000000000000022 ***
## CITBORO_10 1 1880.6 1888.6 19.142 0.00001213 ***
## DIST_4LNE_ 1 1940.6 1948.6 79.191 < 0.00000000000000022 ***
## POPDEN90 1 1872.8 1880.8 11.402 0.0007337 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(jitter(dat$CHNG_URB, factor=.7) ~ dat$POPDEN90,
pch= 4,
ylab="Change to Urban Land Use",
xlab="Population Density in 1990",
main="Probability of Land Use Change Based on Population Density in 1990")
points(dat$POPDEN90, fitted(mod), col = "red")  # fitted probabilities from the model
legend('topright', c('Observed', 'Predicted'),
       col = c("black", "red"), pch = c(4, 1))  # symbols match the plotted points
plot(jitter(dat$CHNG_URB, factor=.5) ~ dat$DIST_4LNE_,
pch= 4,
ylab="Change to Urban Land Use",
xlab="Distance in meters to 4-lane highway",
main="Probability of Land Use Change Based on Distance to 4-Lane Highway")
points(dat$DIST_4LNE_, fitted(mod), col = "red")  # fitted probabilities from the model
legend('topright', c('Observed', 'Predicted'),
       col = c("black", "red"), pch = c(4, 1))  # symbols match the plotted points
Scenario Plotting
The plot below shows how differing population densities affect the predicted probability that a given cell changed to urban use. These scenarios assume that the cell is within 300 meters of a regional rail line and within 1,000 meters of a city or boro. The plot illustrates that cells with a population density of 1,000 have a higher predicted probability of changing to urban use than cells with lower densities.
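A scenario plot of this kind can be sketched from the fitted coefficients alone. The example below fixes REGRAIL300 = 1 and CITBORO_10 = 1 and varies distance to a 4-lane highway along the x-axis. The choice of x-axis, the distance range, and the two lower density values (0 and 500) are illustrative assumptions, since only the 1,000 scenario is named in the text.

```r
# Coefficients copied from the model summary
b <- c("(Intercept)" = -2.95397068, REGRAIL300 = 1.31391826,
       CITBORO_10 = 0.83771176, DIST_4LNE_ = -0.00034972,
       POPDEN90 = 0.00022022)

# Predicted probability for a cell near rail (1) and near a city or boro (1)
scenario_prob <- function(dist_4lane, popden) {
  eta <- b["(Intercept)"] + b["REGRAIL300"] * 1 + b["CITBORO_10"] * 1 +
    b["DIST_4LNE_"] * dist_4lane + b["POPDEN90"] * popden
  unname(plogis(eta))  # inverse logit turns log-odds into a probability
}

dist_seq  <- seq(0, 10000, by = 100)  # assumed distance range, in meters
densities <- c(0, 500, 1000)          # 0 and 500 are assumed scenarios

# One column of predicted probabilities per density scenario
probs <- sapply(densities, function(d) scenario_prob(dist_seq, d))

matplot(dist_seq, probs, type = "l", lty = 1,
        col = c("black", "blue", "red"),
        xlab = "Distance in meters to 4-lane highway",
        ylab = "Predicted probability of change to urban",
        main = "Scenarios: near rail and near city or boro")
legend("topright", legend = paste("POPDEN90 =", densities),
       col = c("black", "blue", "red"), lty = 1)
```

Because the POPDEN90 coefficient is positive, the curve for the highest density sits above the others at every distance, and all curves fall as distance to the highway grows.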