In this project, we investigate, detect and predict an interesting urban phenomenon. Over the past 50 years or so, some of the nation’s biggest metropolitan areas have experienced a distinctive change resulting from a set of urban redevelopment and rehabilitation processes known as gentrification.
Gentrification is defined as ‘Middle class settlement in renovated or redeveloped properties on older, inner-city districts formerly occupied by lower-income population’ (Gregory et al., 2009). Although there are many alternative definitions, most researchers agree that gentrification takes place when socially and economically affluent new settlers move into rehabilitated and redeveloped low-income neighborhoods (Hammel & Wyly, 1996; Freeman, 2005). Over time, the influx of high-income residents puts pressure on housing prices and living expenses, which leads poor people to migrate out of their neighborhoods.
The nation’s capital, Washington, D.C., is among the best-known examples of gentrification. Over the years, the District has gone through an unprecedented urban transformation as young, educated and well-paid people continue to settle in (Arévalo et al., 2012). According to a report by the Census Bureau, DC has received at least 100,000 new settlers since 2000. The increase, especially in the population group that is better off than existing residents, caused housing prices to skyrocket. As a result, new settlers began to move into relatively affordable neighborhoods, causing low-income residents to migrate out (Guerrieri et al., 2010).
In addition to several case studies, empirical analyses by Hammel & Wyly (1996, 2001) and other researchers indicate that statistical methods can also be used to study gentrification. In this project, therefore, we attempt to empirically detect gentrification through the change in median household income that is associated with the influx of high-income residents.
To do so, we built two different kinds of regression models, examined their outputs and assessed the relationship between the dependent variable, income, and the explanatory variables. The project is organized into three parts. In the first, we introduce the data, lay out the null hypothesis, run exploratory data analysis and conduct feature selection. The second part is model building: we start with an Ordinary Least Squares (OLS) regression model and then move to logistic regression. The third part presents the results and a summary, where we also outline limitations and potential solutions for similar future projects.
The data for this project were collected from the Census Bureau. To choose the variables, in addition to considering the features of gentrification discussed above, we referred to similar work by other researchers (Heidkamp & Lucas, 2006; Arévalo et al., 2012; Hammel & Wyly, 2001). Although gentrification is taking place in many cities, our data collection focuses on changes in the context of Washington DC.
The variables are:
Key | Description |
---|---|
GEO.id | Geographic (unique) id |
p_chg_wh | Percentage change in white population |
p_chg_bk | Percentage change in black population |
p_chg_edc | Percentage change in the population aged 25 and above with a bachelor's degree or higher |
change_incm | Change in median household income (USD, adjusted for inflation) |
change_hhsval | Change in median house value (USD, adjusted for inflation) |
change_rent | Change in median gross rent (USD, adjusted for inflation) |
p_chg_ownd | Percentage change in the number of owner-occupied households |
p_chg_rentd | Percentage change in the number of renter-occupied households |
p_chg_pvt | Percentage change in the number of people below the poverty line |
change_vcri | Change in the number of Violent Crimes |
p_chg_chfm | Percentage change in the number of families with children |
chg_md_age | Change in median age |
Now, let us load the data in R and explore!
The dataset contains 179 observations, one for each census tract in DC, and 13 columns. Let’s take a look at the first few values of each column and the overall structure of the data.
'data.frame': 179 obs. of 13 variables:
$ GEO.id : num 1.1e+10 1.1e+10 1.1e+10 1.1e+10 1.1e+10 ...
$ p_chg_wh : num 1.9 -14.4 -9.14 -1.96 -10.71 ...
$ p_chg_bk : num -3.53 2.22 6.03 3.19 0.84 -0.66 -0.11 -3.28 1.94 -3.5 ...
$ p_chg_edc : num -4.51 -1.48 -6.94 1.86 -2.6 ...
$ change_incm : num 44567 14714 25884 21361 -3328 ...
$ change_hhsval: int 402630 309790 287069 300978 473559 352800 196087 139104 -203462 -164277 ...
$ change_rent : num 598 476 359 428 484 ...
$ p_chg_ownd : num -3.77 1.72 3.78 0.43 -5.13 2.05 -3.86 7.01 -3.25 2.69 ...
$ p_chg_rentd : num -6.98 -1.33 -9.71 -0.83 1.96 ...
$ p_chg_pvt : num -1.59 -1.52 -6.21 -0.18 6.89 0.69 -2.22 4.33 4.12 -6.5 ...
$ change_vcri : int -132 1 -54 -28 0 -53 -21 -60 -5 -9 ...
$ p_chg_chfm : num 10.21 0 10.33 19.04 -3.11 ...
$ chg_md_age : num -3.9 -0.2 -8 2 0.7 1.8 -1.1 -0.4 3.6 -2 ...
Except for the first column, which is a unique geo-id, the dataset contains values representing the social, economic and demographic factors associated with gentrification. As we shall see later, not all of them are equally relevant for building our models, but our initial data collection included as many predictors as the literature covered.
Next, we will run basic descriptive statistics, visualize relationships and detect some patterns in the dataset.
p_chg_wh p_chg_bk p_chg_edc change_incm
Min. :-14.400 Min. :-74.570 Min. :-15.38 Min. :-37259.1
1st Qu.: -0.155 1st Qu.:-21.255 1st Qu.: 3.40 1st Qu.: -823.7
Median : 3.870 Median : -8.800 Median : 8.00 Median : 10834.1
Mean : 9.720 Mean :-13.530 Mean : 12.51 Mean : 15897.3
3rd Qu.: 18.385 3rd Qu.: -1.275 3rd Qu.: 20.80 3rd Qu.: 31858.0
Max. : 66.850 Max. : 8.030 Max. : 56.07 Max. : 95491.5
change_hhsval change_rent p_chg_ownd p_chg_rentd
Min. :-373601 Min. : -33.39 Min. :-22.70 Min. :-67.820
1st Qu.: 86860 1st Qu.: 245.85 1st Qu.: -3.22 1st Qu.: -5.725
Median : 192824 Median : 421.40 Median : 0.21 Median : -1.270
Mean : 191139 Mean : 480.83 Mean : 1.72 Mean : -1.328
3rd Qu.: 319432 3rd Qu.: 637.17 3rd Qu.: 5.39 3rd Qu.: 3.755
Max. : 508399 Max. :1951.96 Max. : 68.39 Max. : 33.050
p_chg_pvt change_vcri p_chg_chfm chg_md_age
Min. :-50.460 Min. :-391.00 Min. :-45.83 Min. :-22.2000
1st Qu.: -7.830 1st Qu.:-141.00 1st Qu.: 8.03 1st Qu.: -2.5500
Median : -1.520 Median : -86.00 Median : 17.06 Median : -0.4000
Mean : -2.424 Mean : -97.79 Mean : 18.57 Mean : -0.3955
3rd Qu.: 3.680 3rd Qu.: -43.00 3rd Qu.: 28.23 3rd Qu.: 2.0500
Max. : 17.140 Max. : 30.00 Max. : 65.97 Max. : 12.5000
In the past 16 years, Washington DC has seen some interesting changes. At the census tract level, the percentage share of the white population increased by an average of about 10%, while the share of the black population declined by 13.5%. Similarly, the percentage share of educated people went up by 12.5% on average, while the share of the population below the poverty line declined by 2.4%. The median house value and gross rent also increased significantly. Interestingly, the median household income also went up, by $15,897 on average.
Let’s visualize some of the associations among the variables!
The changes in the percentage shares of the white and black populations are negatively associated, while the changes in income and in the share of educated people tend to be positively associated. Most census tracts saw a decline in the number of violent crimes, although the District still has a high crime rate due to theft-related incidents. Most places also saw an increase in rent, ranging from a few hundred dollars to a couple of thousand. The share of low-income people also declines as rent increases.
Gentrification takes many forms, depending on the location of the urban transformation, and can be seen from many angles. Most of the literature, however, agrees that gentrified neighborhoods can be distinctly identified by the settlement of high-income new residents in neighborhoods that were once considered poor and deteriorating (Ellen & Ding, 2016; Hammel & Wyly, 1996; Smith, 1982). Hence, a sharp increase in household income in low-income neighborhoods can be explained by factors that demystify the change.
The null hypothesis is that there is no statistically significant relationship between the change in income and the social, economic and demographic changes of a neighborhood. In line with this hypothesis, we would also like to assess the potential use of statistical methods to study gentrification in dynamic cities such as the District.
Before moving on, we would like to check the normality of the distribution of our dependent variable (change in income). To do so, we’ll examine its distribution, first using a set of four graphical outputs and then by computing skewness and kurtosis and running the Shapiro-Wilk test.
Accordingly, the change in income is slightly right-skewed. In the scatter plot, some extreme values are also visible. The QQ-plot also indicates a distribution that deviates from normal.
Shapiro-Wilk normality test
data: Income_change
W = 0.93917, p-value = 6.973e-07
Skewness Kurtosis
1 0.9339091 0.831103
The Shapiro-Wilk test rejects the null hypothesis of normality (p < 0.001), indicating that the variable is not normally distributed. The skewness and kurtosis values also show that the distribution of our dependent variable is skewed, with a longer right tail. We shall later see how the residuals from the OLS model behave, since the normality assumption of linear regression applies to the residuals rather than to the dependent variable itself.
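The normality checks described above can be sketched in base R as follows. This is only an illustration: the `income_change` vector is synthetic (a shifted gamma sample, not the census data), and the helper functions `sample_skewness()` and `sample_excess_kurtosis()` are hypothetical names using simple moment-based estimators, which may differ slightly from whichever package produced the values reported above.

```r
# Synthetic, right-skewed stand-in for the income-change variable (NOT the census data).
set.seed(42)
income_change <- rgamma(179, shape = 2, scale = 8000) - 5000

# Moment-based sample skewness: third central moment over variance^(3/2).
sample_skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment over variance^2, minus 3.
sample_excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Shapiro-Wilk test; the null hypothesis is that the sample is normally distributed.
sw <- shapiro.test(income_change)
skew <- sample_skewness(income_change)
kurt <- sample_excess_kurtosis(income_change)

print(sw)
cat("skewness:", round(skew, 3), " excess kurtosis:", round(kurt, 3), "\n")
```

A small p-value together with a positive skewness points to a right-skewed, non-normal variable, which is the pattern the report describes for the change in income.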
Now that we have examined the distribution of the dependent variable, let us first look at the relationships among the independent variables and then select the best predictors of income among them.
One of the underlying assumptions of an Ordinary Least Squares regression is that there is no multicollinearity among the predictor variables. In other words, we don’t want our independent variables to have moderate to high correlation with one another. If correlations exist, we shall use the Variance Inflation Factor (VIF) to identify the multicollinear variables we may need to exclude from our model.
It looks like there are some highly correlated variables. By running a VIF test, we can measure how much the variance of each estimated coefficient is inflated because of collinearity. Just to give you a heads-up, the VIF function in R is found in the car library.
vif.value.
p_chg_wh 11.638064
p_chg_bk 7.301797
p_chg_edc 6.649509
change_hhsval 1.253591
change_rent 2.013559
p_chg_ownd 3.040424
p_chg_rentd 2.882633
p_chg_pvt 1.982138
change_vcri 1.230919
p_chg_chfm 1.242959
chg_md_age 1.149341
In this case, three of our variables have a VIF value greater than 4 and are therefore moderately or highly multicollinear: the changes in the percentage share of the white population, the black population and educated people. Before hastily excluding any variables from our analysis, let us see which subset of them might actually be good predictors.
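To make the VIF values easier to interpret, here is a minimal base R sketch of how the statistic can be computed by hand: for each predictor, regress it on all the other predictors and take 1 / (1 - R²). The data frame `X` and its columns `x1`, `x2`, `x3` are synthetic assumptions for illustration, with `x2` built to be strongly collinear with `x1`.

```r
# Synthetic predictors (NOT the census data); x2 is deliberately collinear with x1.
set.seed(1)
n <- 179
x1 <- rnorm(n)
x2 <- -0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing predictor j
# on all remaining predictors.
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

print(round(vif_by_hand, 2))
```

In this sketch the two collinear predictors receive large VIF values while the independent one stays close to 1, mirroring the contrast seen between the race and education variables and the rest of the table.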
The multicollinearity we detected could be due to the presence of redundant or irrelevant variables, which can be identified through feature (variable) selection. By selecting a subset of predictors, we can eliminate redundant variables that might otherwise cause overfitting and biased estimates.
Based on the Bayesian Information Criterion (BIC), one subset of our variables is much more valuable to our model than the others. This subset includes the changes in white population, rent, house value, educated people and low-income people. We can also look at the R-squared values and reach the same conclusion. For our purpose, we’ll regress the change in median household income (change_incm) on the changes in rent (change_rent), poverty (p_chg_pvt) and education (p_chg_edc).
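The idea of BIC-based subset selection can be sketched with base R alone by fitting every candidate subset and keeping the one with the lowest BIC. This is an assumption-laden illustration, not the procedure used in the report: the data frame `d` and its columns (`rent`, `edc`, `pvt`, `noise`, `income`) are synthetic, and an exhaustive loop like this is only practical for a small number of predictors.

```r
# Synthetic data (NOT the census data): income depends on three predictors, not on "noise".
set.seed(2)
n <- 179
d <- data.frame(rent = rnorm(n), edc = rnorm(n), pvt = rnorm(n), noise = rnorm(n))
d$income <- 3 * d$rent + 2 * d$edc - 2 * d$pvt + rnorm(n)

predictors <- c("rent", "edc", "pvt", "noise")

# Enumerate all non-empty subsets of the predictors.
subsets <- unlist(
  lapply(seq_along(predictors), function(k) combn(predictors, k, simplify = FALSE)),
  recursive = FALSE
)

# Fit one linear model per subset and record its BIC (lower is better).
bic <- sapply(subsets, function(s) {
  BIC(lm(reformulate(s, response = "income"), data = d))
})

best <- subsets[[which.min(bic)]]
print(best)
```

Because BIC penalizes each extra coefficient by log(n), a variable has to improve the fit noticeably before it is kept, which is why it tends to favor compact subsets.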
We will begin with the Ordinary Least Squares (OLS) regression model and see whether the estimates hold true to the assumptions of a linear regression model.
Call:
lm(formula = change_incm ~ change_rent + p_chg_edc + p_chg_pvt,
data = sub_data)
Residuals:
Min 1Q Median 3Q Max
-32848 -8234 -1134 7007 36625
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -6129.280 1814.832 -3.377 0.000902 ***
change_rent 28.238 3.624 7.792 5.61e-13 ***
p_chg_edc 535.072 96.393 5.551 1.04e-07 ***
p_chg_pvt -724.475 136.765 -5.297 3.50e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12470 on 175 degrees of freedom
Multiple R-squared: 0.7061, Adjusted R-squared: 0.7011
F-statistic: 140.2 on 3 and 175 DF, p-value: < 2.2e-16
As you can see from the p-value of each variable we chose, the changes in rent, education and poverty are statistically significantly associated with the change in income. The model also has a relatively high R-squared (0.71), indicating that these variables can potentially explain much of the variation in income. Before accepting the validity of this model, let’s examine the residuals.
We’ll first look at their overall pattern and then check for heteroscedasticity using the Breusch–Pagan test.
Some of the observations in our data seem to skew the overall distribution of the residuals. Interestingly, Cook’s distance shows that the U-Street Corridor (64) and the Navy-Yard (91) neighborhoods deviate more from the rest. Since these two neighborhoods went through rapid urban transformation in the last 20 years or so, their deviation is consistent with a high level of gentrification.
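As an illustration of how influential observations like these can be flagged, the sketch below uses `cooks.distance()` from base R on synthetic data. The planted outlier at row 64 and the 4/n cutoff are assumptions chosen for the example; the cutoff is a common rule of thumb rather than a formal test.

```r
# Synthetic regression data (NOT the census data) with one planted unusual observation.
set.seed(3)
n <- 179
x <- rnorm(n)
y <- 2 * x + rnorm(n)
x[64] <- 4     # high leverage
y[64] <- -10   # far from the fitted line

fit <- lm(y ~ x)

# Cook's distance measures how much the fitted values change when a row is dropped.
cd <- cooks.distance(fit)

# Rule-of-thumb cutoff (an assumption, not a formal test).
threshold <- 4 / n
flagged <- which(cd > threshold)

print(head(sort(cd, decreasing = TRUE), 3))
```

Rows that stand out this way deserve a closer look before deciding whether to keep them, since in this study they may be exactly the tracts of greatest substantive interest.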
Let’s now test for heteroscedasticity and assess the statistical significance of the pattern in the residuals. To do so, we will use the Breusch–Pagan test.
Breusch-Pagan test
data: model1
BP = 10.401, df = 3, p-value = 0.01545
The result (p = 0.015) suggests that we should reject the null hypothesis of homoscedasticity. Since heteroscedasticity makes the standard errors of our model unreliable, a few potential solutions might help us correct the issue:
Remove or impute the highly deviating observations. This solution, however, might lead to misleading estimates because those neighborhoods are theoretically significant to our understanding of gentrification in Washington DC.
Fit the data using a different modeling technique. We chose this route to see if another technique, more specifically logistic regression, could help us detect the change in income better. In regional and econometric studies, an alternative technique known as the general IV/GMM model is also used. Since that technique is beyond the scope of this course, we’ll test whether logistic regression will do the magic.
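For readers who want to see what the Breusch–Pagan test used earlier is doing, here is a minimal base R sketch of its studentized (Koenker) form: regress the squared OLS residuals on the same predictors and compare n·R² with a chi-squared distribution whose degrees of freedom equal the number of predictors. The variables `x1`, `x2` and `y` are synthetic, with an error spread that grows with `x1` by construction.

```r
# Synthetic data (NOT the census data) whose error spread grows with x1.
set.seed(4)
n <- 179
x1 <- runif(n, 1, 10)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + x2 + rnorm(n, sd = x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of the squared residuals on the same predictors.
aux <- lm(residuals(fit)^2 ~ x1 + x2)

# Studentized Breusch-Pagan statistic: n * R^2 of the auxiliary regression.
bp_stat <- n * summary(aux)$r.squared
bp_df <- 2   # number of predictors in the auxiliary regression
bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

cat("BP =", round(bp_stat, 3), " df =", bp_df, " p-value =", signif(bp_p, 4), "\n")
```

A small p-value here means the residual variance is related to the predictors, which is the situation the report encountered with the OLS model.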
Our second approach is to use the K-Nearest Neighbor (KNN) algorithm to attempt to detect gentrification. First, we will look at the chance of correctly classifying a neighborhood based on income.
95491.51
0.005586592
Accordingly, we have roughly a 0.56% chance (1 in 179) of classifying a tract correctly. That is extremely low, but we would like to see if KNN will do a better job. First, we have to split our data into training and testing groups. We’ll use 80% of our dataset for training and 20% of it for testing the KNN algorithm.
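set.seed(1)  # make the random split reproducible
# Draw 80% of the row indices without replacement for training.
# Sampling from the same data frame that is indexed below keeps the indices valid.
data_train_rows = sample(seq_len(nrow(gd_v2)),
round(0.8 * nrow(gd_v2), 0),
replace = FALSE)
data_train = gd_v2[data_train_rows, ]
data_test = gd_v2[-data_train_rows, ]
# Report the size of each partition.
Training_data<-nrow(data_train)
Testing_data<-nrow(data_test)
data.frame(Training_data, Testing_data)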
Training_data Testing_data
1 143 36
Accordingly, 143 of the 179 observations will be used to train the model, while the remaining 36 observations will be used for testing.
We shall now use the algorithm to classify the neighborhoods. Later on, we will compare the classification result to the true class using a confusion matrix.
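library(class)  # provides knn()
set.seed(1)     # knn() breaks ties at random
# The class labels (cl) are the raw change_incm values, so every distinct
# income change is treated as its own class.
bank_3NN = knn(train = data_train[, c("p_chg_edc", "p_chg_pvt", "change_rent")],
test = data_test[, c("p_chg_edc", "p_chg_pvt", "change_rent")],
cl = data_train[, "change_incm"],
k = 5,  # five nearest neighbours, despite the object's name
use.all = TRUE)
# Cross-tabulate predicted against observed values.
kNN_res = table(bank_3NN,
data_test$`change_incm`)
# Accuracy: share of test tracts whose predicted label equals the observed value.
kNN_acc = mean(as.character(bank_3NN) == as.character(data_test$`change_incm`))
kNN_acc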
[1] 0
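As it turns out, the data we fed into the algorithm cannot be classified this way with KNN: the accuracy is 0. Because change_incm is a continuous variable, nearly every tract forms its own class, so a predicted label almost never matches a test value exactly. Our next decision is to fit the data using logistic regression.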
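To illustrate why the continuous target defeats KNN, and what a classification-friendly setup could look like, the sketch below implements a small nearest-neighbour classifier by hand in base R. It is not the approach taken in this report: the data are synthetic, the binary "high"/"low" label (top quartile versus the rest) is an assumption, and `knn_predict()` is a hypothetical helper written for this example.

```r
# Synthetic data (NOT the census data).
set.seed(5)
n <- 179
d <- data.frame(edc = rnorm(n), pvt = rnorm(n), rent = rnorm(n))
income <- 3 * d$edc - 2 * d$pvt + 2 * d$rent + rnorm(n)

# With a continuous target, every value is its own class.
n_classes <- length(unique(income))
cat("distinct income values:", n_classes, "of", n, "\n")

# Assumption for this sketch: a binary label, top quartile = "high".
label <- factor(ifelse(income > quantile(income, 0.75), "high", "low"),
                levels = c("low", "high"))

# 80/20 split, then scale features using the training means and SDs only.
train_idx <- sample(seq_len(n), round(0.8 * n))
train_x <- scale(as.matrix(d[train_idx, ]))
test_x <- scale(as.matrix(d[-train_idx, ]),
                center = attr(train_x, "scaled:center"),
                scale = attr(train_x, "scaled:scale"))

# Hypothetical helper: majority vote among the k nearest training rows.
knn_predict <- function(train_x, train_y, test_x, k = 5) {
  apply(test_x, 1, function(row) {
    dists <- sqrt(colSums((t(train_x) - row)^2))   # Euclidean distances
    votes <- table(train_y[order(dists)[seq_len(k)]])
    names(votes)[which.max(votes)]
  })
}

pred <- knn_predict(train_x, label[train_idx], test_x, k = 5)
acc <- mean(pred == as.character(label[-train_idx]))
cat("test accuracy with a binary label:", round(acc, 3), "\n")
```

With only two classes, an exact match between predicted and observed labels becomes possible, so the accuracy is no longer pinned at zero.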
In simple terms, a logistic regression model can be understood as an estimate of the probability of an outcome. Now that we know that some neighborhoods differ sharply from the others, we want to use logistic regression to identify which neighborhoods have seen a sharp increase in income compared to the rest.
The first task is to create a factor variable that categorizes income into different groups. We’ll begin by looking at the quantile distribution of income.
0% 25% 50% 75% 100%
-37259.10 -823.70 10834.06 31857.96 95491.51
About 75% of the census tracts saw a decline or a slight to moderate change, while the other 25% show a sharp increase. Let’s divide this variable between tracts showing a decline or a slight to moderate increase (low) and those showing a sharp increase (high). We will use the 75th percentile ($31,857.96) as the break point.
p_chg_wh p_chg_bk p_chg_edc change_hhsval change_rent p_chg_ownd
1 1.90 -3.53 -4.51 402630 597.79 -3.77
2 -14.40 2.22 -1.48 309790 475.50 1.72
3 -9.14 6.03 -6.94 287069 358.64 3.78
4 -1.96 3.19 1.86 300978 427.51 0.43
5 -10.71 0.84 -2.60 473559 483.63 -5.13
6 -1.35 -0.66 4.89 352800 443.84 2.05
p_chg_rentd p_chg_pvt change_vcri p_chg_chfm ch_md_age y
1 -6.98 -1.59 -132 10.21 -3.9 high
2 -1.33 -1.52 1 0.00 -0.2 low
3 -9.71 -6.21 -54 10.33 -8.0 low
4 -0.83 -0.18 -28 19.04 2.0 low
5 1.96 6.89 0 -3.11 0.7 low
6 -4.87 0.69 -53 1.72 1.8 low
Accordingly, 45 census tracts (about 25%) show a sharp increase. Before moving on, we need to make sure that this new categorical variable is recognized as a factor variable.
'data.frame': 179 obs. of 12 variables:
$ p_chg_wh : num 1.9 -14.4 -9.14 -1.96 -10.71 ...
$ p_chg_bk : num -3.53 2.22 6.03 3.19 0.84 -0.66 -0.11 -3.28 1.94 -3.5 ...
$ p_chg_edc : num -4.51 -1.48 -6.94 1.86 -2.6 ...
$ change_hhsval: int 402630 309790 287069 300978 473559 352800 196087 139104 -203462 -164277 ...
$ change_rent : num 598 476 359 428 484 ...
$ p_chg_ownd : num -3.77 1.72 3.78 0.43 -5.13 2.05 -3.86 7.01 -3.25 2.69 ...
$ p_chg_rentd : num -6.98 -1.33 -9.71 -0.83 1.96 ...
$ p_chg_pvt : num -1.59 -1.52 -6.21 -0.18 6.89 0.69 -2.22 4.33 4.12 -6.5 ...
$ change_vcri : int -132 1 -54 -28 0 -53 -21 -60 -5 -9 ...
$ p_chg_chfm : num 10.21 0 10.33 19.04 -3.11 ...
$ ch_md_age : num -3.9 -0.2 -8 2 0.7 1.8 -1.1 -0.4 3.6 -2 ...
$ y : Factor w/ 2 levels "low","high": 2 1 1 1 1 1 1 2 1 1 ...
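The construction of the two-level factor described above can be sketched in base R as follows. The `change_incm` vector here is synthetic (not the census data); the level order `c("low", "high")` is set explicitly so that `glm()` models the probability of "high", matching the `Factor w/ 2 levels "low","high"` structure shown above.

```r
# Synthetic stand-in for the income-change variable (NOT the census data).
set.seed(6)
change_incm <- rnorm(179, mean = 15000, sd = 25000)

# Break point at the 75th percentile.
cutoff <- quantile(change_incm, probs = 0.75)

# "high" above the cutoff, "low" otherwise; "low" is the reference level.
y <- factor(ifelse(change_incm > cutoff, "high", "low"),
            levels = c("low", "high"))

print(table(y))
print(is.factor(y))
```

Fixing the level order up front avoids relying on alphabetical ordering, which would otherwise make "high" the reference level.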
We will start with a logistic regression approach that searches over subsets of the inputs and identifies the best-fitting one. The following output shows the best model found by comparing the AIC of the candidate models. The last rows of the output give the exponentiated coefficients (odds ratios).
Morgan-Tatar search since family is non-gaussian.
Call:
glm(formula = y ~ ., family = family, data = Xi, weights = weights)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.77621 -0.36874 -0.17568 -0.02026 2.50161
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.837e+00 9.525e-01 -6.128 8.9e-10 ***
p_chg_edc 8.518e-02 2.618e-02 3.254 0.00114 **
change_hhsval 4.037e-06 1.805e-06 2.236 0.02533 *
change_rent 3.429e-03 1.107e-03 3.096 0.00196 **
p_chg_rentd -3.553e-02 2.471e-02 -1.438 0.15040
p_chg_pvt -1.186e-01 4.095e-02 -2.896 0.00377 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 201.863 on 178 degrees of freedom
Residual deviance: 96.622 on 173 degrees of freedom
AIC: 108.62
Number of Fisher Scoring iterations: 6
(Intercept) p_chg_edc change_hhsval change_rent p_chg_rentd
0.002917577 1.088912955 1.000004037 1.003434847 0.965094244
p_chg_pvt
0.888148009
The next test we would like to run is the Hosmer-Lemeshow goodness-of-fit test. This test, alongside others, helps examine whether the observed event rates match the expected event rates in subgroups of the model population.
Hosmer and Lemeshow goodness of fit (GOF) test
data: gentrification$y, fitted(income.bglm$BestModel)
X-squared = 179, df = 8, p-value < 2.2e-16
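To show what the Hosmer-Lemeshow statistic measures, here is a hand-rolled base R sketch on synthetic data. The function `hosmer_lemeshow()` is a hypothetical helper written for this example, and it assumes the response is coded as numeric 0/1. A statistic exactly equal to the number of observations, as in the output above, is unusual, so it may be worth confirming that the response was passed in numeric form rather than as a factor.

```r
# Synthetic logistic data (NOT the census data); y is numeric 0/1.
set.seed(7)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-1 + 1.5 * x))
fit <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# Hypothetical helper: group by quantiles of the fitted probability and
# compare observed with expected counts of events and non-events.
hosmer_lemeshow <- function(y, p, g = 10) {
  stopifnot(is.numeric(y), all(y %in% c(0, 1)), length(y) == length(p))
  breaks <- unique(quantile(p, probs = seq(0, 1, length.out = g + 1)))
  grp <- cut(p, breaks = breaks, include.lowest = TRUE)
  n_g  <- as.vector(table(grp))
  obs1 <- as.vector(tapply(y, grp, sum))   # observed events per group
  exp1 <- as.vector(tapply(p, grp, sum))   # expected events per group
  obs0 <- n_g - obs1
  exp0 <- n_g - exp1
  stat <- sum((obs1 - exp1)^2 / exp1 + (obs0 - exp0)^2 / exp0)
  df <- length(n_g) - 2
  list(statistic = stat, df = df,
       p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

hl <- hosmer_lemeshow(y, p_hat)
str(hl)
```

A large p-value indicates no evidence of poor calibration, while a very small one indicates that observed and expected event rates disagree across the groups.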
Our next task is to find the predicted probability for each response, calculate the hit rate and plot the ROC curve.
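library(pROC)  # provides roc()
# Predicted probability of the "high" class for every census tract.
income.prob.final <- predict(income.bglm$BestModel, type = "response")
View(income.prob.final)  # interactive viewer (e.g. in RStudio)
# Build the ROC object; printing it also reports the area under the curve.
income.hit.final <- roc(y ~ income.prob.final, data = gentrification)
income.hit.final
plot(income.hit.final)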
With the best glm in hand, we’ll now try to partition the data and test our model. This time we will also use only the variables known to be significant to the estimates.
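# Sequential split: the first 125 tracts train the model, the remaining 54 test it.
train.income.final <- gentrification[1:125,]
test.income.final <- gentrification[126:179,]
# Binomial GLM with a logit link, fitted on the training rows only.
income.model.final <- glm(y ~ p_chg_wh + change_rent + p_chg_rentd + p_chg_pvt + p_chg_chfm, family = binomial(link = "logit"), data = train.income.final)
summary(income.model.final)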
Call:
glm(formula = y ~ p_chg_wh + change_rent + p_chg_rentd + p_chg_pvt +
p_chg_chfm, family = binomial(link = "logit"), data = train.income.final)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5508 -0.4553 -0.2019 0.1867 2.4108
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.9948051 0.9616268 -4.154 3.26e-05 ***
p_chg_wh 0.0978332 0.0274229 3.568 0.00036 ***
change_rent 0.0032999 0.0012803 2.577 0.00995 **
p_chg_rentd 0.0008607 0.0326292 0.026 0.97896
p_chg_pvt -0.1261501 0.0541597 -2.329 0.01985 *
p_chg_chfm -0.0303358 0.0235003 -1.291 0.19675
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 153.554 on 124 degrees of freedom
Residual deviance: 75.989 on 119 degrees of freedom
AIC: 87.989
Number of Fisher Scoring iterations: 6
To express the coefficients as odds ratios, we’ll exponentiate them with the following code.
income.output.final <- exp(coef(income.model.final))
income.output.final
(Intercept) p_chg_wh change_rent p_chg_rentd p_chg_pvt p_chg_chfm
0.01841103 1.10277887 1.00330535 1.00086107 0.88148251 0.97011967
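As a short worked example of reading these odds ratios, the sketch below takes two coefficients from the model summary above and exponentiates them. The 100-dollar rent increment is an assumption chosen for illustration; the variable names are hypothetical.

```r
# Coefficients copied from the training-set model summary above.
b_rent <- 0.0032999    # change_rent (per dollar)
b_pvt  <- -0.1261501   # p_chg_pvt (per percentage unit)

# Odds ratio for a one-unit increase in each predictor.
or_rent_1 <- exp(b_rent)
or_pvt_1  <- exp(b_pvt)

# Odds ratio for a 100-dollar increase in rent: exponentiate 100 times the coefficient.
or_rent_100 <- exp(100 * b_rent)

cat("rent, +1 dollar:    ", round(or_rent_1, 5), "\n")
cat("rent, +100 dollars: ", round(or_rent_100, 3), "\n")
cat("poverty, +1 unit:   ", round(or_pvt_1, 5), "\n")
```

An odds ratio above 1 means the odds of a tract falling in the "high" group rise with the predictor, and one below 1 means they fall, holding the other variables fixed. Because rent is measured in dollars, its per-unit odds ratio looks close to 1 even though the effect over a realistic change in rent is sizeable.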
Now that we have a sense of our model, we’ll use it to predict probabilities for the test data. These values are the chances that each census tract falls into the higher or the lower income-change group.
high
low 0
high 1
126 127 128 129 130 131
1 1 1 0 0 0
# A tibble: 2 x 2
y no_rows
<fctr> <int>
1 low 47
2 high 7
[1] high high low low low low
Levels: low high
[1] 0.9444444
An object of class "performance"
Slot "x.name":
[1] "None"
Slot "y.name":
[1] "Area under the ROC curve"
Slot "alpha.name":
[1] "none"
Slot "x.values":
list()
Slot "y.values":
[[1]]
[1] 0.05471125
Slot "alpha.values":
list()
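The hit rate on the 54 test tracts is about 0.94, but the reported AUC of about 0.055 is far below 0.5. An AUC that low typically means that the direction of the scores or of the class labels was reversed when the curve was computed, so it should be rechecked before drawing conclusions from it. In addition, the small number of observations makes it difficult to cross-validate our model. Overall, however, the model has made workable progress, and further research can improve upon these findings and develop a model that performs well.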
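The relationship between the hit rate, the AUC and the direction of the class labels can be sketched in base R without any extra packages. Everything here is illustrative: the `truth` and `score` vectors are synthetic, and `auc_rank()` is a hypothetical helper that computes the AUC from ranks (the Mann-Whitney formulation).

```r
# Synthetic test set (NOT the census data): 1 = "high", 0 = "low".
set.seed(8)
n <- 54
truth <- rbinom(n, 1, 0.3)
score <- plogis(2 * truth - 1 + rnorm(n))   # higher scores for the "high" class

# Hypothetical helper: AUC as the probability that a random positive
# outranks a random negative.
auc_rank <- function(score, truth) {
  r <- rank(score)
  n1 <- sum(truth == 1)
  n0 <- sum(truth == 0)
  (sum(r[truth == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Hit rate at a 0.5 probability cutoff.
pred <- ifelse(score > 0.5, 1, 0)
hit_rate <- mean(pred == truth)

auc <- auc_rank(score, truth)
auc_flipped <- auc_rank(score, 1 - truth)   # same scores, labels reversed

cat("hit rate:", round(hit_rate, 3), "\n")
cat("AUC:", round(auc, 3), " AUC with flipped labels:", round(auc_flipped, 3), "\n")
```

The two AUC values always sum to 1, so a value near 0 with otherwise accurate predictions usually points to a reversed positive class rather than to a model that ranks tracts badly.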
In this project, we attempted to detect gentrification in Washington DC using empirical analysis. There are multiple pieces of evidence of gentrification, and our goal was to leverage statistical tools to assess its overall pattern. Some of our findings can be summarized as follows:
Overall, our analysis indicates that regression modeling techniques are indeed useful for studying gentrification. This is an important finding in that our attempt addresses the criticism that studies on gentrification lack macro-scale empirical analysis. The modeling techniques we used show their relevance for future research on this topic.
The OLS regression model did not give us reliable estimates and, because of the difference in the scale of changes we saw in Washington DC, the residuals failed to show a normal distribution. However, by introducing other modeling techniques such as Two-Stage Least Squares (2SLS), the Generalized Method of Moments (GMM) and instrumental variables, this linear regression technique can be improved to yield better results.
We also detected a statistically significant relationship between the change in household income and explanatory variables that include the changes in the share of people with higher education, the share of people living below the poverty line and the median gross rent.
We did not manage to cross-validate the outputs of the logistic regression model because the number of observations we used is relatively small. As a result, we recommend that future empirical studies on small cities such as the District use values at the census block level rather than the census tract level.
In summary, our attempt to study and detect gentrification in Washington DC through the various attributes and features associated with the change suggests that further research in this area will help us understand the pattern better.
References
Arévalo, J. C., Pető, B., Suaya, A., & Mann, L. M. (2012). Demographic changes and gentrification in Washington DC between 2000 and 2010. Papers of the Applied Geography Conferences.
Ellen, I., & Ding, L. (2016). Advancing our understanding of Gentrification. Cityscape: A Journal of Policy Development and Research, 18(3), 1-8.
Freeman, L. (2005). Displacement or Succession? Residential Mobility in Gentrifying Neighborhoods. Urban Affairs Review, 40(4), 463-491.
Guerrieri, V., Hartley, D., & Hurst, E. (2010). Endogenous Gentrification and housing price dynamics. National Bureau of Economic Research, Working Paper 16237. Retrieved from http://www.nber.org/papers/w16237 on Sun, 19 Nov 2017 18:59:21 UTC.
Gregory, D., Johnston, R., Pratt, G., Watts, M., & Whatmore, S. (2009). The Dictionary of Human Geography. Hoboken, NJ: Wiley-Blackwell.
Hammel, D. J., & Wyly, E. K. (1996). A model for identifying gentrified areas with census data. Urban Geography, 17(3), 248-268.
Hammel, D. J., & Wyly, E. K. (1996). Modeling the context and contingency of Gentrification. Urban Affairs, 20(3), 303-326.
Heidkamp, C. P., & Lucas, S. (2006). Finding the Gentrification Frontier Using Census Data: The Case of Portland, Maine. Urban Geography, 27(2), 101-125. DOI: 10.2747/0272-3638.27.2.101
Smith, N. (1982). Gentrification and Uneven Development. Economic Geography, 58(2), 139-155.
@Tidy Inisghts
Introduction to Data Science
The George Washington University
Dec-2017