Introduction to neighborhood change in Sacramento
Sacramento is the capital city of California and serves as the state’s governmental and political hub. Historically, Sacramento also played an influential role in the California Gold Rush and was a desirable destination for many newcomers to the state who were hoping to discover their fortune. Although the Gold Rush no longer draws travelers, the city continues to attract newcomers at an astonishing rate as the fastest growing big city in California and one of the nation’s most diverse cities. Notably, housing in Sacramento is far more affordable than in the San Francisco Bay Area, to the southwest of the Sacramento region.
However, some studies suggest that this may be changing as Bay Area residents move further inland and complaints of gentrification spread through several previously lower-income areas around the downtown and midtown regions of the city. Therefore, in this report, we use cluster analysis techniques to explore the changes in Sacramento’s housing prices and population during the first decade of the 2000s (2000 to 2010), a period that included the Great Recession and the rapid creation of new technologies in Silicon Valley, southwest of the city.
| | state | county | FIPS | name |
| 1 | 06 | 017 | 06017 | EL DORADO |
| 2 | 06 | 061 | 06061 | PLACER |
| 3 | 06 | 067 | 06067 | SACRAMENTO |
To begin our analysis, we use data from the U.S. Census Bureau. The variable data and census information (displayed in the table here) were sourced from the Diversity and Disparities Project at Brown University, and the shapefiles for mapping were sourced from the Bureau’s American Community Survey (ACS). Downloading data from the ACS requires a Census API Key.
Specifically, for the study, we will examine data from the three counties included in the Sacramento, California Metropolitan Statistical Area (MSA): El Dorado County, Placer County, and Sacramento County.
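As a minimal sketch of how the five-digit county FIPS codes in the table break down, the snippet below splits each code into its two-digit state part and three-digit county part with `substr()`. The vector `fips` is typed in by hand from the table for illustration.

```r
# County FIPS codes for the Sacramento MSA, copied from the table above
fips <- c("06017", "06061", "06067")

# Characters 1-2 identify the state; characters 3-5 identify the county
state.part  <- substr(fips, 1, 2)
county.part <- substr(fips, 3, 5)

data.frame(state = state.part, county = county.part, FIPS = fips)
```

All three counties share the state prefix `06`, which is California.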
After accessing data from the Diversity and Disparities project, we calculated the changes in House Price and demographic data for each of the tracts that belong to the Sacramento MSA.
For example, the first row of data indicates that for the first tract (from an area in El Dorado County), the median house price more than doubled from 2000 to 2010. In contrast, the number of residents who were foreign born, recent immigrants, living in poverty, Hispanic or Black, or self-employed remained approximately the same. The number of residents who spoke poor English decreased substantially, by almost 60%; the number of White residents living in poverty decreased by almost 50%; and the number of veterans decreased slightly, by 16%. The population working in professional jobs and the number of women in the labor force also decreased slightly, by 12% and 14%, respectively.
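Each change variable is the ratio of the 2010 value to the 2000 value, with a small constant added to the denominator so that a 2000 value of zero does not cause division by zero. A minimal sketch with made-up numbers follows; the helper `change_ratio` and the input values are hypothetical and not taken from the data.

```r
# Hypothetical helper: ratio of the 2010 value to the 2000 value.
# `offset` guards against division by zero when the 2000 value is 0.
change_ratio <- function(value10, value00, offset = 0.01) {
  value10 / (value00 + offset)
}

# Made-up tract values for illustration only
veterans00 <- 500
veterans10 <- 420

ratio <- change_ratio(veterans10, veterans00)
round(ratio, 3)              # a ratio below 1 means a decrease
round((ratio - 1) * 100, 1)  # the same change expressed in percent
```

A ratio near 1 means little change, a ratio near 2 means the value roughly doubled, and a ratio near 0.5 means it roughly halved.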
| Statistic | Mean | St. Dev. | Min | Max |
| HousePriceChange | 7.928 | 74.452 | 0.366 | 898.251 |
| ForeignBornChange | 1.759 | 4.431 | 0.132 | 54.061 |
| RecentImmigrantChange | 1.378 | 1.695 | 0.000 | 14.085 |
| PoorEnglishChange | 1.805 | 1.783 | 0.000 | 15.825 |
| VeteranChange | 0.718 | 0.224 | 0.130 | 1.607 |
| PovertyChange | 1.527 | 1.206 | 0.148 | 6.417 |
| PovertyBlackChange | 1.216 | 2.517 | 0 | 19 |
| PovertyWhiteChange | 1.203 | 1.077 | 0.029 | 7.337 |
| PovertyHispanicChange | 2.049 | 4.621 | 0.000 | 49.091 |
| PopBlackChange | 0.962 | 0.921 | 0.000 | 5.114 |
| PopHispanicChange | 1.264 | 0.598 | 0.000 | 3.888 |
| PopUnempChange | 2.317 | 2.260 | 0.319 | 24.390 |
| PopManufactChange | 0.804 | 0.763 | 0.000 | 7.414 |
| PopSelfEmpChange | 1.054 | 1.252 | 0.000 | 15.066 |
| PopProfChange | 1.008 | 0.372 | 0.185 | 2.638 |
| FemaleLaborForceChange | 1.071 | 0.196 | 0.242 | 1.978 |
This 5-point Data Summary provides an overview of all the variables across the entire Sacramento Region. The first variable, HousePriceChange, stands out for its extremely high standard deviation (74.452 against a mean of 7.928). Exploring the data table further, we find a tract in Sacramento County where the recorded median house price was $316.1721 in 2000 and $284,900 in 2010. This results in a house price change ratio of about 898. Compared with the other data points, this is an extreme outlier. Therefore, we remove this entry from the data set and re-analyze the data.
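The size of this ratio can be checked by hand. The sketch below assumes the house price ratio divides the 2010 value by the 2000 value plus one, which is the form used in the dashboard's source code.

```r
# Recorded median house values for the outlier tract (from the text above)
price00 <- 316.1721
price10 <- 284900

# House price change ratio: 2010 value over (2000 value + 1)
outlier.ratio <- price10 / (price00 + 1)
round(outlier.ratio, 3)
```

The result is several hundred times larger than the typical ratio in the table, which is why a single tract inflates the mean and standard deviation so heavily.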
| Statistic | Mean | St. Dev. | Min | Max |
| HousePriceChange | 1.745 | 0.402 | 0.366 | 3.862 |
| ForeignBornChange | 1.396 | 0.712 | 0.132 | 4.217 |
| RecentImmigrantChange | 1.290 | 1.325 | 0.000 | 8.134 |
| PoorEnglishChange | 1.707 | 1.349 | 0.000 | 7.127 |
| VeteranChange | 0.721 | 0.223 | 0.130 | 1.607 |
| PovertyChange | 1.534 | 1.207 | 0.148 | 6.417 |
| PovertyBlackChange | 1.208 | 2.524 | 0 | 19 |
| PovertyWhiteChange | 1.209 | 1.078 | 0.029 | 7.337 |
| PovertyHispanicChange | 2.061 | 4.635 | 0.000 | 49.091 |
| PopBlackChange | 0.960 | 0.923 | 0.000 | 5.114 |
| PopHispanicChange | 1.265 | 0.600 | 0.000 | 3.888 |
| PopUnempChange | 2.329 | 2.263 | 0.319 | 24.390 |
| PopManufactChange | 0.758 | 0.528 | 0.000 | 4.888 |
| PopSelfEmpChange | 0.956 | 0.443 | 0.000 | 2.892 |
| PopProfChange | 0.997 | 0.347 | 0.185 | 2.245 |
| FemaleLaborForceChange | 1.064 | 0.181 | 0.242 | 1.572 |
After removing the HousePriceChange outlier from the data set, we can reconsider the 5-point data summary with more accurate information for the Sacramento Region. The first variable, HousePriceChange, has a mean of 1.745, a standard deviation of 0.402, a minimum of 0.366, and a maximum of 3.862. This indicates that, on average, tract-level median house prices in 2010 were about 1.75 times their 2000 level, an increase of roughly 75%. There were exceptions: in some tracts house prices dropped by nearly two-thirds, and in others they nearly quadrupled, but overall the region’s house prices grew substantially. This is particularly concerning because, on average across tracts, the population in poverty increased by approximately 50%, the Hispanic population in poverty doubled, and the unemployed population more than doubled.
The unemployment numbers are relatively consistent with the job losses seen throughout the United States during the Great Recession. However, unlike other areas where home prices fell to accommodate lower levels of income, Sacramento’s house prices increased over the decade. This is a risky combination because people who are out of work need cheaper housing, not more expensive housing.
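Because each variable is a 2010-to-2000 ratio, subtracting 1 converts it to a proportional change. A brief sketch using the HousePriceChange values from the summary table (the vector names are illustrative):

```r
# Summary statistics for HousePriceChange (from the table above)
house.ratio <- c(mean = 1.745, min = 0.366, max = 3.862)

# Percent change relative to 2000: (ratio - 1) * 100
pct.change <- (house.ratio - 1) * 100
round(pct.change, 1)
```

Positive values are increases over the 2000 level and negative values are decreases.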
This trend may be responsible for the current high rate of homelessness in Sacramento. In 2019, Sacramento County recorded its highest number of homeless citizens ever. Researchers estimate that this was a 19% increase over previous years due to difficulties with employment and housing.
This histogram grid provides a visual of the same data displayed by the 5-point summary table. Most data points hover around 1, indicating that the variables did not change significantly. There are a few exceptions: the House Price variable clusters closer to 2, indicating that house prices approximately doubled; the poor English speaker, white poverty, Hispanic, self-employed, and recent immigrant populations experienced some growth; the veteran, professional, and female labor force populations skewed towards decreasing; and the unemployed population experienced significant growth.
Once again these characteristics are fairly consistent with the downturn of the economy following the 2008 Great Recession, with the exception of the housing market that surprisingly continued to increase, even while other areas suffered from losses in home value.
The correlation plot displays the positive and negative relationships amongst the variables. Many of the stronger relationships follow conventional assumptions: for example, recent immigrant change and foreign born change are positively correlated so that as the number of recent immigrants increases, so does the number of foreign born residents in the population. Similarly, poverty and house price are negatively correlated so that as the amount of poverty decreases, the price of houses increases.
One interesting finding is the fairly strong positive correlation between female labor force changes and professional population changes. In the past, female labor force participation was often correlated with lower-income areas where women were required to work due to family financial difficulties, but this data reflects a shift toward modern labor force trends in which women are involved in professional fields at ever-increasing rates. In this case, as the professional population increases, so does the level of female labor force participation. In turn, although female labor force changes are not very strongly correlated with house price changes, professional population changes are positively correlated with house prices: as the population of professionals increases, so do house prices.
Another notable finding, albeit less positive, is the fairly strong negative correlation between Hispanic population changes and house price. This indicates that as the Hispanic population decreases, house prices increase, which suggests disparities in Hispanic wealth in the Sacramento region, especially since this correlation is about the same as the negative correlation between total poverty and house price and stronger than the correlations between race/ethnicity-specific poverty and house price.
A final discovery to examine is the correlation between white poverty and total poverty in the region. All race/ethnic-specific poverty changes are positively correlated with total poverty, but white poverty changes are most strongly correlated. This is especially interesting due to the fact that as displayed in our 5-point summary, white and black rates of poverty remained fairly stable while Hispanic poverty doubled. One reasonable explanation for this, though, is the fact that in Sacramento County, whites accounted for 48.3% of the population in 2010 while Hispanics accounted for 21.7% of the population (the second largest demographic group). Both El Dorado County and Placer County, which are included in this study had white-majority populations of 80% and 76%, respectively. This suggests that this correlation is related to the higher number of individuals influencing the poverty rate instead of rates of change within racial/ethnic groups.
| | Dependent variable: | | |
| | HousePriceChange | | |
| | (1) | (2) | (3) |
| ForeignBornChange | -0.052 | 0.022 | -0.019 |
| | (0.044) | (0.065) | (0.063) |
| RecentImmigrantChange | | -0.031 | -0.020 |
| | | (0.034) | (0.033) |
| PoorEnglishChange | | -0.023 | -0.012 |
| | | (0.028) | (0.027) |
| VeteranChange | | -0.0001 | -0.082 |
| | | (0.148) | (0.146) |
| PovertyChange | -0.052** | -0.031 | -0.026 |
| | (0.026) | (0.035) | (0.034) |
| PovertyBlackChange | | -0.027* | -0.017 |
| | | (0.014) | (0.013) |
| PovertyWhiteChange | | -0.047 | -0.019 |
| | | (0.036) | (0.035) |
| PovertyHispanicChange | | -0.006 | -0.005 |
| | | (0.007) | (0.007) |
| PopBlackChange | | -0.011 | -0.018 |
| | | (0.037) | (0.036) |
| PopHispanicChange | -0.139*** | -0.160*** | -0.148*** |
| | (0.052) | (0.058) | (0.056) |
| PopUnempChange | | | 0.004 |
| | | | (0.014) |
| PopManufactChange | | | 0.049 |
| | | | (0.061) |
| PopSelfEmpChange | | | 0.137* |
| | | | (0.075) |
| PopProfChange | 0.364*** | | 0.396*** |
| | (0.090) | | (0.117) |
| FemaleLaborForceChange | | | -0.399* |
| | | | (0.218) |
| Constant | 1.711*** | 2.158*** | 2.028*** |
| | (0.134) | (0.157) | (0.260) |
| Observations | 144 | 144 | 144 |
| R2 | 0.223 | 0.179 | 0.293 |
| Adjusted R2 | 0.201 | 0.118 | 0.210 |
| Residual Std. Error | 0.359 (df = 139) | 0.377 (df = 133) | 0.357 (df = 128) |
| F Statistic | 9.995*** (df = 4; 139) | 2.904*** (df = 10; 133) | 3.536*** (df = 15; 128) |
| Note: | *p<0.1; **p<0.05; ***p<0.01 | | |
While the correlation plot gives us a visual idea of which variables appear to be correlated, the regression models show which variables have a statistically significant association with changes in house price once the other variables are held constant.
Our first model examines changes in the foreign born, Hispanic, and professional populations in addition to changes in poverty. It finds that both PopHispanicChange and PopProfChange have highly statistically significant relationships with HousePriceChange at the 0.01 level. PovertyChange also has a statistically significant relationship with HousePriceChange at the 0.05 level, while ForeignBornChange does not have a statistically significant relationship. Because every variable is a 2010-to-2000 ratio, the coefficients are read in ratio units: holding the other variables constant, a 1-unit increase in PovertyChange is associated with a 0.052 decrease in HousePriceChange (5.2 percentage points of the 2000 house price), a 1-unit increase in PopHispanicChange with a 0.139 decrease, and a 1-unit increase in PopProfChange with a 0.364 increase.
In our second model, PovertyChange loses its statistical significance. Instead, PovertyBlackChange has a marginally significant relationship (at the 0.1 level): a 1-unit increase in PovertyBlackChange is associated with a 0.027 decrease in HousePriceChange. PopHispanicChange remains highly statistically significant (at the 0.01 level), with a 1-unit increase associated with a 0.160 decrease in HousePriceChange, a larger effect than the Model 1 estimate.
In our third and final model, PovertyBlackChange loses its statistical significance, while PopHispanicChange once again remains highly significant (at the 0.01 level). Here, a 1-unit increase in PopHispanicChange is associated with a 0.148 decrease in HousePriceChange, which falls between the estimates of the other two models. Additionally, PopSelfEmpChange is marginally significant (at the 0.1 level), with a 1-unit increase associated with a 0.137 increase in HousePriceChange. PopProfChange remains highly statistically significant (at the 0.01 level), with a 1-unit increase associated with a 0.396 increase in HousePriceChange. FemaleLaborForceChange is also marginally significant (at the 0.1 level), with a 1-unit increase associated with a 0.399 decrease in HousePriceChange.
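To make the coefficient interpretation concrete, the sketch below plugs values into the fitted Model 1 equation using the coefficients reported in the regression table. The function name `predict_model1` and the input values are illustrative assumptions, not part of the original analysis.

```r
# Model 1 coefficients as reported in the regression table
b <- c(intercept = 1.711, foreign = -0.052, poverty = -0.052,
       hispanic = -0.139, prof = 0.364)

# Hypothetical helper: predicted HousePriceChange for given change ratios
predict_model1 <- function(foreign, poverty, hispanic, prof) {
  unname(b["intercept"] + b["foreign"] * foreign + b["poverty"] * poverty +
    b["hispanic"] * hispanic + b["prof"] * prof)
}

# A tract where nothing changed (all ratios equal to 1)
base <- predict_model1(foreign = 1, poverty = 1, hispanic = 1, prof = 1)

# The same tract, but with the professional population doubled (ratio of 2)
more.prof <- predict_model1(foreign = 1, poverty = 1, hispanic = 1, prof = 2)

base               # predicted house price ratio for the unchanged tract
more.prof - base   # difference equals the PopProfChange coefficient
```

The difference between the two predictions is the PopProfChange coefficient itself, which is what "a 1-unit increase" means when the other variables are held constant.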
This finding on female labor force participation differs between the regression model and the correlation plot. In the model, the negative coefficient is more consistent with the historical finding that women’s labor force participation is often associated with lower-income areas, while the variable on its own suggests a positive relationship with house prices. This highlights the risk of omitted variable bias and the importance of examining relationships in connection with other variables rather than in isolation.
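A sign change of this kind can be reproduced with simulated data. The sketch below is not based on the Sacramento data: all variable names and coefficients are made up to show how a predictor can look positive on its own yet turn negative once a correlated variable is controlled for.

```r
set.seed(529)
n <- 1000

# Simulated change variables (illustrative only)
prof   <- rnorm(n)                    # professional population change
female <- prof + rnorm(n, sd = 0.5)   # female labor force change, correlated with prof
price  <- 1.0 * prof - 0.4 * female + rnorm(n, sd = 0.1)

# Model that omits prof: female absorbs part of the positive prof effect
coef.alone <- coef(lm(price ~ female))["female"]

# Model that controls for prof: the female coefficient is close to -0.4
coef.joint <- coef(lm(price ~ female + prof))["female"]

c(alone = unname(coef.alone), joint = unname(coef.joint))
```

In this simulation the coefficient on `female` is positive when `prof` is left out and negative when it is included, mirroring the pattern discussed for FemaleLaborForceChange.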
Cluster 1: Diverse Lower-to-Middle Class: Lower poverty, ethnically diverse, large female labor force, high professionalism, large immigrant population
Cluster 2: Middle-to-Upper Class: Low poverty, predominantly white, large female labor force, high professionalism, sizable immigrant and veteran populations
Cluster 3: Lower Class: Higher poverty, ethnically diverse, large female labor force, large immigrant population, sizable unemployed population
Cluster 4: Struggling Class: High poverty, ethnically diverse, large female labor force, large immigrant population, large group of poor English speakers, sizable veteran and unemployed populations
To further explore how neighborhoods have changed in the Sacramento, California region we will now examine how our clusters shifted from 2000 to 2010.
| From (2000) / To (2010) | 1 | 2 | 3 | 4 |
| 1 | 0.53846154 | 0.11538462 | 0.23076923 | 0.11538462 |
| 2 | 0.22535211 | 0.63380282 | 0.11267606 | 0.02816901 |
| 3 | 0.30000000 | 0.00000000 | 0.46666667 | 0.23333333 |
| 4 | 0.00000000 | 0.00000000 | 0.05882353 | 0.94117647 |
Using the same variables that we analyzed in 2010, we utilized the pred() function to predict what clusters Sacramento neighborhoods would have belonged to in 2000.
As noted previously the clusters are defined as follows:
Cluster 1: Diverse Lower-to-Middle Class
Cluster 2: Middle-to-Upper Class
Cluster 3: Lower Class
Cluster 4: Struggling Class
The majority of neighborhoods remained in their respective clusters, but some neighborhoods improved or declined from 2000 to 2010.
Out of all of the clusters, Cluster 3 appears to have been the most susceptible to gentrification with less than half of the neighborhoods that were in the cluster in 2000 remaining in the cluster in 2010. Approximately a third of these neighborhoods transferred to Cluster 1 in a transition from the Lower Class up to the Diverse Lower-to-Middle Class. A little less than a fourth fell into the lower-income cluster of the Struggling Class (Cluster 4).
Over half of the neighborhoods that belonged to the Diverse Lower-to-Middle Class (Cluster 1) and Middle-to-Upper Class (Cluster 2) stayed in the same cluster, while over 90% of the neighborhoods in the Struggling Class (Cluster 4) remained in the Struggling Class.
This suggests that middle class neighborhoods and struggling neighborhoods remained more stable while the “up and coming” Lower Class neighborhoods teetered between improving and declining.
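A row-normalized transition matrix like the one above can be built from a table of counts with `prop.table()`. In the sketch below, the counts are an inference: they are the smallest whole numbers consistent with the printed proportions (and they sum to the 144 tracts in the regression sample), not values read from the source data.

```r
# Neighborhood counts by 2000 cluster (rows) and 2010 cluster (columns).
# Assumption: inferred from the printed proportions, not taken from the data.
counts <- matrix(
  c(14,  3,  6,  3,
    16, 45,  8,  2,
     9,  0, 14,  7,
     0,  0,  1, 16),
  nrow = 4, byrow = TRUE,
  dimnames = list(from2000 = as.character(1:4), to2010 = as.character(1:4))
)

# Divide each row by its total so every row sums to 1
transition <- prop.table(counts, margin = 1)
round(transition, 3)

# Share of neighborhoods that stayed in their 2000 cluster
round(diag(transition), 3)
```

The diagonal holds the share of neighborhoods that stayed put, so a low diagonal value (as for Cluster 3) marks an unstable cluster.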
Cluster 1: Diverse Lower-to-Middle Class
Cluster 2: Middle-to-Upper Class
Cluster 3: Lower Class
Cluster 4: Struggling Class
This Sankey Transition Plot is an excellent visual representation of the transition matrix on the previous tab.
As explained above, Cluster 3 is the cluster with the most neighborhood transfers, as evidenced by the three almost-equally-sized arrows that divide up its neighborhoods.
In addition, Cluster 4 has a steady flow of neighborhoods that remained in Cluster 4, except for a small subset that progressed upwards to the Lower Class.
Cluster 2 also had a steady flow of neighborhoods that remained in Cluster 2, except for a few neighborhoods that dropped down to the Lower-to-Middle Class (Cluster 1) and an even smaller subset that transitioned to the Lower or Struggling classes.
Besides Cluster 3, Cluster 1 had the second most diverse breakdown of its neighborhoods with some moving upwards to Cluster 2 and others moving to Clusters 3 & 4.
Thank you for exploring this dashboard on neighborhood change in Sacramento, California!
The data and visuals presented were created by Courtney Stowers as a final project for the CPP 529: Community Analytics Practicum in the Master of Science in Program Evaluation and Data Analytics program at Arizona State University on December 4, 2019.
Image Source: Pixabay UnboxScience
# R libraries used for this project
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( DT )
library( dplyr )
library( sf )
library( sp )
library( cartogram )
library( tmap )
All packages used in this dashboard are available for download from CRAN using the install.packages() function.
The tidycensus package is used to download data from the United States Census Bureau. It requires a Census API Key for authorization.
The tidyverse, dplyr, ggplot2, and purrr packages are all part of the “tidyverse” and, together with plyr (an older predecessor of dplyr), are used for data wrangling purposes. In addition, the ggplot2 package is used to create charts and graphs.
The stargazer package is used to create regression tables.
The corrplot package is used to create correlation matrix plots.
The flexdashboard package formats the output of R Markdown files into interactive presentations.
The leaflet package creates interactive maps.
The mclust package is used for cluster analysis.
The DT package creates interactive data tables using the JavaScript DataTables library.
The sf package works with shapefiles in simple features format.
The sp package works with shapefiles and spatial data.
The cartogram package creates cartogram style maps.
The tmap package creates thematic maps like the Dorling map included in this presentation.
---
title: "Community Analytics Practicum"
output:
  flexdashboard::flex_dashboard:
    social: menu
    source: embed
---
```{r setup, include=FALSE}
knitr::opts_chunk$set( message=F, warning=F, echo=F )
#Load in libraries
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( DT )
library( dplyr )
library( sf )
library( sp )
library( cartogram )
library( tmap )
```
```{r, quietly=T, include=F}
# Read the Census API key from the CENSUS_API_KEY environment variable instead of hardcoding it
census_key <- Sys.getenv("CENSUS_API_KEY")
census_api_key(census_key)
#Loading data
URL <- "https://github.com/DS4PS/cpp-529-master/raw/master/data/CensusData.rds"
census.dats <- readRDS(gzcon(url( URL )))
census.dats <- na.omit(census.dats)
```
Introduction {.storyboard}
=========================================
### Project Overview
```{r}
# Center the map on downtown Sacramento
leaflet() %>%
setView(-121.4944, 38.5815697, zoom = 13) %>%
addTiles() %>%
addMarkers(lng=-121.4944, lat=38.5815697, popup="Sacramento, California")
```
***
**Introduction to neighborhood change in Sacramento**
Sacramento is the capital city of California and serves as the state's governmental and political hub. Historically, Sacramento also played an influential role in the California Gold Rush and was a desirable destination for many newcomers to the state who were hoping to discover their fortune. Although the Gold Rush no longer draws travelers, the city continues to attract newcomers at an astonishing rate as the fastest growing big city in California and one of the nation's most diverse cities. Notably, housing in Sacramento is far more affordable than in the San Francisco Bay Area, to the southwest of the Sacramento region.
However, some studies suggest that this may be changing as Bay Area residents move further inland and complaints of gentrification spread through several previously lower-income areas around the downtown and midtown regions of the city. Therefore, in this report, we use cluster analysis techniques to explore the changes in Sacramento's housing prices and population during the first decade of the 2000s (2000 to 2010), a period that included the Great Recession and the rapid creation of new technologies in Silicon Valley, southwest of the city.
Data {.storyboard}
=========================================
### Empirical Framework
```{r, include=F}
# MSA
crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv", stringsAsFactors=F, colClasses="character" )
# search for city names by strings, use the ^ anchor for "begins with"
grep( "^SAC", crosswalk$msaname, value=TRUE )
```
```{r, echo=F}
these.sacramento <- crosswalk$msaname == "SACRAMENTO, CA"
these.fips <- crosswalk$fipscounty[ these.sacramento ]
these.fips <- na.omit( these.fips )
state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )
name.fips <- crosswalk$countyname[these.sacramento]
data.frame( state=state.fips, county=county.fips, FIPS=these.fips, name=name.fips)
```
```{r, echo=FALSE}
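# Filter Census Data to the three Sacramento MSA counties
# %in% tests membership; == would recycle the vector and silently drop matching rows
census.dats <- census.dats %>% filter( county %in% c("El Dorado County", "Placer County", "Sacramento County") )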
```
```{r, echo=FALSE}
#Calculating change Values for variables
censusChange1<-ddply(census.dats,"TRTID10",summarise,
HousePriceChange = Median.HH.Value10/(Median.HH.Value00+1),# Change variable
ForeignBornChange = Foreign.Born10/(Foreign.Born00 +.01),
RecentImmigrantChange = Recent.Immigrant10/(Recent.Immigrant00+.01),
PoorEnglishChange = Poor.English10/(Poor.English00+.01),
VeteranChange = Veteran10/(Veteran00+.01),
PovertyChange = Poverty10/(Poverty00+.01),
PovertyBlackChange = Poverty.Black10/(Poverty.Black00+.01),
PovertyWhiteChange = Poverty.White10/(Poverty.White00+.01),
PovertyHispanicChange = Poverty.Hispanic10/(Poverty.Hispanic00+.01),
PopBlackChange = Pop.Black10/(Pop.Black00+.01),
PopHispanicChange = Pop.Hispanic10/(Pop.Hispanic00+.01),
PopUnempChange = Pop.Unemp10/(Pop.Unemp00+.01),
PopManufactChange = Pop.Manufact10/(Pop.Manufact00+.01),
PopSelfEmpChange = Pop.SelfEmp10/(Pop.SelfEmp00+.01),
PopProfChange = Pop.Prof10/(Pop.Prof00+.01),
FemaleLaborForceChange = Female.LaborForce10/(Female.LaborForce00+.01)
)
#keep one row per tract ID (drop duplicated TRTID10 rows)
censusChange1<-censusChange1[!duplicated(censusChange1$TRTID10),]
```
***
To begin our analysis, we use data from the U.S. Census Bureau. The variable data and census information (displayed in the table here) were sourced from the Diversity and Disparities Project at Brown University, and the shapefiles for mapping were sourced from the Bureau's American Community Survey (ACS). Downloading data from the ACS requires a Census API Key.
Specifically, for the study, we will examine data from the three counties included in the Sacramento, California Metropolitan Statistical Area (MSA): El Dorado County, Placer County, and Sacramento County.
### View Data
```{r}
DT::datatable( head(censusChange1, 25) )
```
***
After accessing data from the Diversity and Disparities project, we calculated the changes in House Price and demographic data for each of the tracts that belong to the Sacramento MSA.
For example, the first row of data indicates that for the first tract (from an area in El Dorado County), the median house price more than doubled from 2000 to 2010. In contrast, the number of residents who were foreign born, recent immigrants, living in poverty, Hispanic or Black, or self-employed remained approximately the same. The number of residents who spoke poor English decreased substantially, by almost 60%; the number of White residents living in poverty decreased by almost 50%; and the number of veterans decreased slightly, by 16%. The population working in professional jobs and the number of women in the labor force also decreased slightly, by 12% and 14%, respectively.
### 5-point Summary One
```{r, results='asis',message=F, warning=F, fig.width = 15, fig.align='center', echo=F }
#Visualize 5-point summary
censusChange1 %>%
keep(is.numeric) %>%
select(-TRTID10) %>%
stargazer(
omit.summary.stat = c("p25", "p75"), nobs=F, type="html") # For a pdf document, replace html with latex
```
***
This 5-point Data Summary provides an overview of all the variables across the entire Sacramento Region. The first variable, HousePriceChange, stands out for its extremely high standard deviation (74.452 against a mean of 7.928). Exploring the data table further, we find a tract in Sacramento County where the recorded median house price was $316.1721 in 2000 and $284,900 in 2010. This results in a house price change ratio of about 898. Compared with the other data points, this is an extreme outlier. Therefore, we remove this entry from the data set and re-analyze the data.
### 5-point Summary Two
```{r, echo=F}
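#Remove outlier: drop the tract with the extreme HousePriceChange ratio (about 898)
censusChange2 <- censusChange1[ -which.max( censusChange1$HousePriceChange ), ]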
```
```{r, results='asis',message=F, warning=F, fig.width = 15, fig.align='center', echo=F}
#Visualize 5-point summary
censusChange2 %>%
keep(is.numeric) %>%
select(-TRTID10) %>%
stargazer(
omit.summary.stat = c("p25", "p75"), nobs=F, type="html")
```
***
After removing the HousePriceChange outlier from the data set, we can reconsider the 5-point data summary with more accurate information for the Sacramento Region. The first variable, HousePriceChange, has a mean of 1.745, a standard deviation of 0.402, a minimum of 0.366, and a maximum of 3.862. This indicates that, on average, tract-level median house prices in 2010 were about 1.75 times their 2000 level, an increase of roughly 75%. There were exceptions: in some tracts house prices dropped by nearly two-thirds, and in others they nearly quadrupled, but overall the region's house prices grew substantially. This is particularly concerning because, on average across tracts, the population in poverty increased by approximately 50%, the Hispanic population in poverty doubled, and the unemployed population more than doubled.
The unemployment numbers are relatively consistent with the job losses seen throughout the United States during the Great Recession. However, unlike other areas where home prices fell to accommodate lower levels of income, Sacramento's house prices increased over the decade. This is a risky combination because people who are out of work need cheaper housing, not more expensive housing.
This trend may be responsible for the current high rate of homelessness in Sacramento. In 2019, Sacramento County recorded its highest number of homeless citizens ever. Researchers estimate that this was a 19% increase over previous years due to difficulties with employment and housing.
### Histogram
```{r}
key.labs <- c("Female Labor Force", "Foreign Born", "House Price", "Poor English Speakers", "Black Population", "Hispanic Population", "Manufacturing Population", "Professional Population", "Self Employed Population", "Unemployed Population", "Black Poverty", "Total Poverty", "Hispanic Poverty", "White Poverty", "Recent Immigrant Population", "Veterans")
names(key.labs) <- c("FemaleLaborForceChange", "ForeignBornChange", "HousePriceChange","PoorEnglishChange", "PopBlackChange", "PopHispanicChange", "PopManufactChange", "PopProfChange", "PopSelfEmpChange","PopUnempChange", "PovertyBlackChange", "PovertyChange", "PovertyHispanicChange", "PovertyWhiteChange", "RecentImmigrantChange", "VeteranChange")
```
```{r,message=F, warning=F, echo=F, fig.width = 10}
#Histogram
censusChange2 %>%
keep(is.numeric) %>%
select(-TRTID10) %>%
gather() %>%
ggplot(aes(value)) +
facet_wrap(~ key, scales = "free", labeller = labeller(key = key.labs)) +
geom_histogram() +
labs( title = "Changes in Sacramento, California Demographics from 2000 to 2010", subtitle = "U.S. Census Bureau Data")
```
***
This histogram grid provides a visual of the same data displayed by the 5-point summary table. Most data points hover around 1, indicating that the variables did not change significantly. There are a few exceptions: the House Price variable clusters closer to 2, indicating that house prices approximately doubled; the poor English speaker, white poverty, Hispanic, self-employed, and recent immigrant populations experienced some growth; the veteran, professional, and female labor force populations skewed towards decreasing; and the unemployed population experienced significant growth.
Once again these characteristics are fairly consistent with the downturn of the economy following the 2008 Great Recession, with the exception of the housing market that surprisingly continued to increase, even while other areas suffered from losses in home value.
### Correlation Plot
```{r, message=F, warning=F, echo=F, fig.height=7}
##save correlations in train_cor
train_cor <- cor(censusChange2[,-1])
##Correlation Plot
corrplot(train_cor, type='lower')
```
***
The correlation plot displays the positive and negative relationships amongst the variables. Many of the stronger relationships follow conventional assumptions: for example, recent immigrant change and foreign born change are positively correlated so that as the number of recent immigrants increases, so does the number of foreign born residents in the population. Similarly, poverty and house price are negatively correlated so that as the amount of poverty decreases, the price of houses increases.
One interesting finding is the fairly strong positive correlation between female labor force changes and professional population changes. In the past, female labor force participation was often correlated with lower-income areas where women were required to work due to family financial difficulties, but this data reflects a shift towards modern labor force trends in which women are involved in professional fields at ever-increasing rates. In this case, as the professional population increases, so does the level of female labor force participation. In addition, although female labor force changes are not very strongly correlated with house price changes, professional population changes are positively correlated with house prices: as the population of professionals increases, so do house prices.
Another notable finding, albeit less positive, is the fairly strong negative correlation between Hispanic population changes and house price. This indicates that as the Hispanic population decreases, the house price increases, which suggests disparities in Hispanic wealth in the Sacramento region, especially since this correlation is about the same as the negative correlation between total poverty and house price and stronger than the correlations between race/ethnicity-specific poverty and house price.
A final discovery to examine is the correlation between white poverty and total poverty in the region. All race/ethnicity-specific poverty changes are positively correlated with total poverty, but white poverty changes are the most strongly correlated. This is especially interesting because, as displayed in our 5-point summary, white and black rates of poverty remained fairly stable while Hispanic poverty doubled. One reasonable explanation for this, though, is that in Sacramento County, whites accounted for 48.3% of the population in 2010 while Hispanics accounted for 21.7% of the population (the second largest demographic group). Both El Dorado County and Placer County, which are included in this study, had white-majority populations of 80% and 76%, respectively. This suggests that the correlation reflects the larger number of individuals influencing the overall poverty rate rather than rates of change within racial/ethnic groups.
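The population-share explanation can be illustrated with a small simulation. This is only a sketch under stated assumptions: the group poverty rates are drawn independently, and the weights (0.6, 0.2, 0.2) are hypothetical population shares, not the actual Sacramento figures.

```r
# Simulated data only; shares and rates are hypothetical
set.seed(42)
n <- 500
white_pov <- rnorm(n, mean = 0.10, sd = 0.03)
hisp_pov  <- rnorm(n, mean = 0.20, sd = 0.03)
other_pov <- rnorm(n, mean = 0.15, sd = 0.03)

# Total poverty as a population-weighted average of the group rates
total_pov <- 0.6 * white_pov + 0.2 * hisp_pov + 0.2 * other_pov

# The largest group dominates the total, so it correlates most strongly with it
cor_white <- cor(total_pov, white_pov)
cor_hisp  <- cor(total_pov, hisp_pov)
c(white = cor_white, hispanic = cor_hisp)
```

Even though each group's rate varies by the same amount in this sketch, the group with the largest share tracks the total most closely, which is the pattern the correlation plot shows for white poverty.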
Regressions
=========================================
### Regression Model Results
```{r, results='asis', fig.align='center'}
# Model 1: key population and poverty changes
reg1 <- lm(HousePriceChange ~ ForeignBornChange + PovertyChange + PopHispanicChange + PopProfChange,
data=censusChange2)
# Model 2: immigration, veteran, poverty, and race/ethnicity changes
reg2 <- lm(HousePriceChange ~ ForeignBornChange + RecentImmigrantChange + PoorEnglishChange + VeteranChange + PovertyChange + PovertyBlackChange + PovertyWhiteChange + PovertyHispanicChange + PopBlackChange + PopHispanicChange, data=censusChange2)
# Model 3: Model 2 plus employment-related changes (duplicate PopHispanicChange term removed)
reg3 <- lm(HousePriceChange ~ ForeignBornChange + RecentImmigrantChange + PoorEnglishChange + VeteranChange + PovertyChange + PovertyBlackChange + PovertyWhiteChange + PovertyHispanicChange + PopBlackChange + PopHispanicChange +
PopUnempChange + PopManufactChange + PopSelfEmpChange + PopProfChange + FemaleLaborForceChange, data=censusChange2)
# present results with stargazer
# library(stargazer)
stargazer( reg1, reg2, reg3,
title="Effect of Community Change on Housing Price Change",
type='html', align=TRUE )
```
***
While the correlation plot gives us a visual idea of which variables appear to be correlated, the regression models show which variables have a *statistically* significant relationship with changes in house price once the other variables in the model are held constant.
Our first model examines changes in the foreign born, Hispanic, and professional populations in addition to changes in poverty. In this model, both PopHispanicChange and PopProfChange have highly statistically significant relationships with HousePriceChange at the 0.01 (99%) level. PovertyChange also has a statistically significant relationship with HousePriceChange at the 0.05 (95%) level. ForeignBornChange does not have a statistically significant relationship. In Model 1, a 1-unit increase in PovertyChange is associated with a ~5.2% decrease in House Price (significant at the 95% level), a 1-unit increase in the Hispanic population is associated with a ~13.9% decrease in House Price (99% level), and a 1-unit increase in the professional population is associated with a ~36.4% increase in House Price (99% level).
In our second model, PovertyChange loses its statistical significance. Instead, changes in Black poverty have a marginally significant relationship: at the 0.10 (90%) level, a 1-unit increase in Black poverty is associated with a 2.7% decrease in House Price. Further, changes in the Hispanic population remain highly statistically significant: at the 99% level, a 1-unit increase in the Hispanic population is associated with a 16% decrease in House Price, which is even larger than our Model 1 estimate.
In our third and final model, PovertyBlackChange loses its statistical significance, while PopHispanicChange once again remains highly significant. In this model, a 1-unit increase in the Hispanic population is associated with a ~14.8% decrease in House Price (99% level), an estimate that falls between those of the other two models. Additionally, changes in the self-employed population are marginally significant: at the 90% level, a 1-unit increase in the self-employed population is associated with a ~13.7% increase in House Price. The professional population remains highly statistically significant, and a 1-unit increase in the professional population is associated with a ~39.6% increase in House Price (99% level). Changes in female participation in the labor force are also marginally significant: at the 90% level, a 1-unit increase in FemaleLaborForceChange is associated with a ~39.9% decrease in House Price.
The negative coefficient on female labor force participation in the regression model is more consistent with historical findings that women's labor force participation is often associated with lower-income areas, whereas the correlation plot alone suggested a positive relationship. This highlights the risk of omitted variable bias and the importance of examining relationships in connection with other variables rather than in isolation.
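The sign reversal described above can be reproduced with simulated data. The following is a minimal sketch, not a re-analysis of the census data: the variable names and effect sizes are hypothetical, chosen so that female labor force change moves together with professional population change while having a negative effect of its own.

```r
# Simulated data only; names and effect sizes are hypothetical
set.seed(1)
n <- 1000
prof_change   <- rnorm(n)
female_change <- prof_change + rnorm(n, sd = 0.5)   # moves with professionals
price_change  <- 2 * prof_change - 1 * female_change + rnorm(n, sd = 0.5)

# Pairwise correlation: positive, because female_change acts as a proxy
# for the omitted prof_change
pairwise <- cor(price_change, female_change)

# Multiple regression: the female_change coefficient turns negative once
# prof_change is held constant
fit     <- lm(price_change ~ prof_change + female_change)
partial <- coef(fit)["female_change"]

c(pairwise = pairwise, partial = unname(partial))
```

In this sketch the pairwise correlation is positive while the regression coefficient is negative, which mirrors how FemaleLaborForceChange can look favorable in the correlation plot yet carry a negative coefficient in Model 3.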
Clustering {.storyboard}
=========================================
```{r, echo=F}
#Remove outlier
census.dats2 <- census.dats[-c(118), ]
```
```{r ,message=F, warning=F, echo=F, fig.align='center'}
# Cluster analysis for 2010 Data
# library(mclust)
Census2010<-census.dats2
keep.these1 <-c("Foreign.Born10","Recent.Immigrant10","Poor.English10","Veteran10","Poverty10","Poverty.Black10","Poverty.White10","Poverty.Hispanic10","Pop.Black10","Pop.Hispanic10","Pop.Unemp10","Pop.Manufact10","Pop.SelfEmp10","Pop.Prof10","Female.LaborForce10")
#Run Cluster Analysis
mod2 <- Mclust(Census2010[keep.these1], G=4) # Set groups to 4, but you can remove this to let r split data into own groupings
#summary(mod2, parameters = TRUE)
#Add group classification to df
Census2010$cluster <- mod2$classification
#Visualize Data
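stats1 <-
Census2010 %>%
group_by( cluster ) %>%
# mean of each clustering variable by cluster (replaces deprecated summarise_each/funs)
dplyr::summarise( dplyr::across( dplyr::all_of(keep.these1), mean ) )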
t <- data.frame( t(stats1), stringsAsFactors=F )
names(t) <- paste0( "GROUP.", 1:4 )
t <- t[-1,]
```
### Cluster 1
```{r ,message=F, warning=F, echo=F, fig.align='center'}
plot( rep(1,15), 1:15, bty="n", xlim=c(-.5,1),
type="n", xaxt="n", yaxt="n",
xlab="Score", ylab="",
main=paste("GROUP",1) )
abline( v=seq(0,1,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,1], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,1,.1), col.axis="gray", col="gray" )
```
***
**Cluster 1: Diverse Lower-to-Middle Class: Lower poverty, ethnically diverse, large female labor force, high professionalism, large immigrant population**
### Cluster 2
```{r ,message=F, warning=F, echo=F, fig.align='center'}
plot( rep(1,15), 1:15, bty="n", xlim=c(-.5,1),
type="n", xaxt="n", yaxt="n",
xlab="Score", ylab="",
main=paste("GROUP",2) )
abline( v=seq(0,1,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,2], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,1,.1), col.axis="gray", col="gray" )
```
***
**Cluster 2: Middle-to-Upper Class: Low poverty, predominantly white, large female labor force, high professionalism, sizable immigrant and veteran populations**
### Cluster 3
```{r ,message=F, warning=F, echo=F, fig.align='center'}
plot( rep(1,15), 1:15, bty="n", xlim=c(-.5,1),
type="n", xaxt="n", yaxt="n",
xlab="Score", ylab="",
main=paste("GROUP",3) )
abline( v=seq(0,1,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,3], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,1,.1), col.axis="gray", col="gray" )
```
***
**Cluster 3: Lower Class: Higher poverty, ethnically diverse, large female labor force, large immigrant population, sizable unemployed population**
### Cluster 4
```{r ,message=F, warning=F, echo=F, fig.align='center'}
plot( rep(1,15), 1:15, bty="n", xlim=c(-.5,.9),
type="n", xaxt="n", yaxt="n",
xlab="Score", ylab="",
main=paste("GROUP",4) )
abline( v=seq(0,.9,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,4], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,.9,.1), col.axis="gray", col="gray" )
```
***
**Cluster 4: Struggling Class: High poverty, ethnically diverse, large female labor force, large immigrant population, large group of poor English speakers, sizable veteran and unemployed populations**
Neighborhoods {.storyboard}
=========================================
### Mapping Clusters
```{r, echo=F, include=F}
library( tidycensus )
# Read the Census API key from an environment variable instead of hard-coding it
census_api_key( Sys.getenv("CENSUS_API_KEY") )
options(tigris_use_cache = TRUE)
sacramento.pop <-
get_acs( geography = "tract", variables = "B01003_001",
state = "06", county = county.fips[state.fips=="06"], geometry = TRUE ) %>%
select( GEOID, estimate ) %>%
dplyr::rename(POP = estimate)
```
```{r, include=T, results="hide"}
# can merge an sf object and data.frame
sacramento.pop$GEOID<-substring(sacramento.pop$GEOID, 2)
sacramento <- merge( sacramento.pop, Census2010, by.x="GEOID", by.y="TRTID10" )
# make sure there are no empty polygons
sacramento2 <- sacramento[ ! st_is_empty( sacramento ) , ]
# convert sf map object to an sp version
sacramento.sp <- as_Spatial( sacramento2 )
class( sacramento.sp )
# project map and remove empty tracts
sacramento.sp <- spTransform( sacramento.sp, CRS("+init=epsg:3395"))
sacramento.sp <- sacramento.sp[ sacramento.sp$POP != 0 & (! is.na( sacramento.sp$POP )) , ]
# convert census tract polygons to dorling cartogram
sacramento.sp$pop.w <- sacramento.sp$POP / 9000 # scale tract population to use as the cartogram weight
sacramento_dorling <- cartogram_dorling( x=sacramento.sp, weight="pop.w", k=0.05 )
tm_shape( sacramento_dorling ) +
tm_polygons( size="POP", col="cluster", n=4, style="cat", palette="Spectral")
```
```{r ,message=F, warning=F, echo=F, fig.align='center'}
#Predicting cluster Grouping for 2000 census tracts
# Get 2000 data
Census2000 <-census.dats2
keep.these00 <-c("Foreign.Born00","Recent.Immigrant00","Poor.English00","Veteran00","Poverty00","Poverty.Black00","Poverty.White00","Poverty.Hispanic00","Pop.Black00","Pop.Hispanic00","Pop.Unemp00","Pop.Manufact00","Pop.SelfEmp00","Pop.Prof00","Female.LaborForce00")
pred00<-predict(mod2, Census2000[keep.these00])
Census2000$PredCluster <- pred00$classification
TransDF2000<-Census2000 %>%
select(TRTID10, PredCluster)
TransDF2010<-Census2010 %>%
select(TRTID10, cluster,Median.HH.Value10)
# merge() has no by.all argument; join on the shared tract ID with by=
TransDFnew<-merge(TransDF2000,TransDF2010,by="TRTID10",all.x=TRUE)
```
```{r, eval=FALSE}
TransDFnew
```
***
To further explore how neighborhoods have changed in the Sacramento, California region, we will now examine how our clusters shifted from 2000 to 2010.
Neighborhood Change {.storyboard}
=========================================
### Creating Transition Matrix
```{r ,message=F, warning=F, echo=F, fig.align='center'}
#Transition Matrix
prop.table( table( TransDFnew$PredCluster, TransDFnew$cluster ) , margin=1 )
```
***
Using the same variables that we analyzed in 2010, we utilized the **predict()** function to predict which clusters Sacramento neighborhoods would have belonged to in 2000.
As noted previously, the clusters are defined as follows:

- Cluster 1: Diverse Lower-to-Middle Class
- Cluster 2: Middle-to-Upper Class
- Cluster 3: Lower Class
- Cluster 4: Struggling Class

The majority of neighborhoods remained in their respective clusters, but some neighborhoods improved or declined from 2000 to 2010.
Out of all of the clusters, Cluster 3 appears to have been the most susceptible to gentrification with less than half of the neighborhoods that were in the cluster in 2000 remaining in the cluster in 2010. Approximately a third of these neighborhoods transferred to Cluster 1 in a transition from the Lower Class up to the Diverse Lower-to-Middle Class. A little less than a fourth fell into the lower-income cluster of the Struggling Class (Cluster 4).
Over half of the neighborhoods that belonged to the Diverse Lower-to-Middle Class (Cluster 1) and Middle-to-Upper Class (Cluster 2) stayed in the same cluster, while over 90% of the neighborhoods in the Struggling Class (Cluster 4) remained in the Struggling Class.
This suggests that middle class neighborhoods and struggling neighborhoods remained more stable while the "up and coming" Lower Class neighborhoods teetered between improving and declining.
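For readers who want to see how the row percentages are produced, here is a minimal sketch of a row-normalized transition matrix built with `table()` and `prop.table(margin = 1)`, the same calls used for the matrix on this tab. The `toy` data frame is hypothetical and only mirrors the column names of `TransDFnew`.

```r
# Hypothetical tract assignments, for illustration only
toy <- data.frame(
  PredCluster = c(1, 1, 1, 2, 2, 3, 3, 3, 3),  # predicted 2000 cluster
  cluster     = c(1, 1, 2, 2, 2, 1, 3, 3, 4)   # 2010 cluster
)

# Rows are 2000 clusters, columns are 2010 clusters
counts <- table(toy$PredCluster, toy$cluster)

# margin = 1 divides each row by its total, so every row sums to 1
trans <- prop.table(counts, margin = 1)
trans
```

Each row answers the question "of the neighborhoods that started in this cluster in 2000, what share ended up in each cluster in 2010?", so the diagonal entries are the shares that stayed in place.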
### Neighborhood Transitions
```{r, message=F, warning=F, echo=F, fig.align='center'}
# Sankey Transition Plot
trn_mtrx1 <-
with(TransDFnew,
table(PredCluster,
cluster))
library(Gmisc)
transitionPlot(trn_mtrx1,
type_of_arrow = "gradient")
```
***
- Cluster 1: Diverse Lower-to-Middle Class
- Cluster 2: Middle-to-Upper Class
- Cluster 3: Lower Class
- Cluster 4: Struggling Class

This Sankey Transition Plot is an excellent visual representation of the transition matrix on the previous tab.
As explained above, Cluster 3 is the cluster with the most neighborhood transfers, as evidenced by the three almost-equally-sized arrows that divide up its neighborhoods.
In addition, Cluster 4 has a steady flow of neighborhoods that remained in Cluster 4, except for a small subset that progressed upwards to the Lower Class.
Cluster 2 also had a steady flow of neighborhoods that remained in Cluster 2, except for a few neighborhoods that dropped down to the Lower-to-Middle Class (Cluster 1) and an even smaller subset that transitioned to the Lower or Struggling classes.
After Cluster 3, Cluster 1 had the most varied breakdown of its neighborhoods, with some moving upwards to Cluster 2 and others moving to Clusters 3 and 4.
About {.storyboard}
=========================================
### About the Developer

***
Thank you for exploring this dashboard on neighborhood change in Sacramento, California!
The data and visuals presented were created by Courtney Stowers as a final project for the CPP 529: Community Analytics Practicum in the Master of Science in Program Evaluation and Data Analytics program at Arizona State University on December 4, 2019.
Image Source: Pixabay UnboxScience
### Video Presentation
### Documentation {data-commentary-width=400}
```{r, eval=F, echo=T}
# R libraries used for this project
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( DT )
library( dplyr )
library( sf )
library( sp )
library( cartogram )
library( tmap )
library( Gmisc )
```
***
All packages used in this dashboard are available for download from CRAN using the _**install.packages()**_ function.
The **tidycensus** package is used to download data from the United States Census Bureau. It requires a Census API Key for authorization.
The **tidyverse** collection (which includes **dplyr, ggplot2,** and **purrr**) and the older **plyr** package are used for data wrangling purposes. In addition, the **ggplot2** package is used to create charts and graphs.
The **stargazer** package is used to create regression tables.
The **corrplot** package is used to create correlation matrix plots.
The **flexdashboard** package formats the output of R Markdown files into interactive presentations.
The **leaflet** package creates interactive maps.
The **mclust** package is used for cluster analysis.
The **DT** package creates interactive data tables using the JavaScript DataTables library.
The **sf** package works with shapefiles in simple features format.
The **sp** package works with shapefiles and spatial data.
The **cartogram** package creates cartogram style maps.
The **tmap** package creates thematic maps like the Dorling map included in this presentation.
The **Gmisc** package creates the transition plot shown on the Neighborhood Transitions tab.