Introduction

Project Overview

Spokane is the largest city and county seat of Spokane County, Washington, United States. It is in eastern Washington 92 miles (148 km) south of the Canadian border, 18 miles (30 km) west of the Washington–Idaho border, and 279 miles (449 km) east of Seattle, along I-90.

Spokane is the economic and cultural center of the Spokane metropolitan area. A 2021 estimate sets the population of the Spokane Metropolitan Area at 593,466. The Spokane metropolitan area includes Spokane County. Natural resources have historically been the foundation of Spokane’s economy, with the mining, logging, and agriculture industries providing much of the region’s economic activity. Spokane became an important rail and shipping center because of its location between mining and farming areas.

Spokane has seen an influx of immigrants and remote workers leaving higher-priced or lower-amenity metro areas. The Spokane area has also suffered from suburbanization and urban sprawl in past decades, despite Washington’s use of urban growth boundaries. In this report, we therefore use cluster analysis techniques to explore the changes in Spokane’s housing prices and population during the first decade of the 2000s (2000 to 2010), a period that includes the Great Recession.

The dashboard uses 2000 and 2010 Decennial Census data to display population transformations in Spokane County, Washington. It runs a regression analysis on demographic and socio-economic change variables and clusters census tracts into neighborhood groups. Changes from 2000 to 2010 are then analyzed numerically with a transition matrix and graphically with a Sankey transition plot.

Data

Empirical Framework

The data used in this analysis come from the archive of the decennial U.S. Census for the years 2000 and 2010. The dependent variable of interest is the change in median home/property value between the two censuses, which is measured in both absolute dollars and percentages. This analysis uses a wide range of explanatory variables from the Census data, all of which have been shown in studies to have some connection with gentrification and other forms of neighborhood change. Variables of particular interest include change in poverty rates, change in unemployment, change in recent immigrant population, and change in female labor force participation. All are measured both as absolute change in percentage points (e.g. a drop from 50% to 25% is a difference of 25 percentage points) and as relative percentage change (e.g. a drop from 50% to 25% is a 50% decrease).

View Data

The change variables are built from Census values for the years 2000 and 2010. Each one is the ratio of a tract's 2010 value to its 2000 value, so a value of 1 means no change. To avoid division by zero, 1.00 was added to the year 2000 house value and 0.01 was added to each year 2000 independent variable before it was used to divide the corresponding 2010 value for each tract in Spokane County.

The median house value column shows that values rose substantially from 2000 to 2010 (the mean ratio across tracts is about 1.66). In contrast, the shares of residents who were foreign born, recent immigrants, or poor English speakers decreased slightly in most tracts, as seen in the first 10 rows.

Tract 2, however, shows a notable population influx: roughly a 200% increase in foreign born residents, 300% in recent immigrants, and about 280% in poor English speakers, alongside a slight decrease in the veteran population. Similar increases are reported in tract 25, tract 16, and also tract 14, though no new immigrants were reported there.

5-point summary


Statistic	Mean	St. Dev.	Min	Max

TRTID10	53,063,008,145.000	4,878.636	53,063,000,200	53,063,014,500
HousePriceChange	1.658	0.239	0.305	3.024
ForeignBornChange	1.246	0.817	0.240	5.634
RecentImmigrantChange	1.097	1.112	0.000	6.049
PoorEnglishChange	1.191	1.033	0.000	5.739
VeteranChange	0.767	0.154	0.420	1.168
PovertyChange	1.275	0.678	0.262	4.888
PovertyBlackChange	0.976	2.465	0.000	20.088
PovertyWhiteChange	1.231	0.720	0.262	5.599
PovertyHispanicChange	1.146	1.842	0.000	9.981
PopBlackChange	0.678	0.913	0.000	4.464
PopHispanicChange	1.247	0.953	0.000	5.250
PopUnempChange	1.388	0.692	0.097	4.476
PopManufactChange	0.725	0.291	0.148	1.508
PopSelfEmpChange	0.826	0.359	0.180	2.237
PopProfChange	1.002	0.259	0.464	1.961
FemaleLaborForceChange	0.993	0.135	0.651	1.484

The summary statistics convey the variability of the change variables from 2000 to 2010.

The greatest variability is in Black poverty change, with a standard deviation of ~2.465, followed by Hispanic poverty change (~1.84), recent immigrant change (~1.11), and poor English change (~1.03). The standard deviation of most other variables is fairly low, meaning that changes were similar across tracts.

Each tract has its own value for each variable, and the min and max columns report the smallest and largest tract-level values. In this case we are using an increase in each variable from its 25th to 75th percentile to represent a large change. The gap between the minimum and maximum values indicates that some areas had no value or a low value in 2000. For example, House Price Change has a min of 0.305 and a max of 3.024, which may reflect the building and development of new homes as well as price increases for existing homes, which could be a result of gentrification.

Poverty Black Change has a min of 0.000 and a max of 20.088, the widest range of any variable. This points to a very large increase in the number of poor Black residents in at least one tract, in tandem with increases in unemployment, poverty, immigrants, and the Black population in some tracts.

We can also note that measures of community stability such as the female labor force, professional, manufacturing, and self-employed populations increased in some tracts (their maximum values are above 1), although their mean values are at or below 1. These are possible drivers of house price change.

Histogram

This histogram grid visualizes the variables shown in the summary table. As the summary statistics suggested, the variables with the widest ranges (Poverty Black Change, Poverty Hispanic Change, Population Black Change, Poor English Change, Recent Immigrant Change, and Foreign Born Change) are clearly right-skewed. This is a sign of outliers: values in one or two tracts that are far higher than in other tracts or than the mean.

The other variables are close to normally distributed around 1, indicating that they did not change substantially in most tracts. The histograms make this pattern easier to see than the table does.

Correlation Plot

The correlation plot displays the positive and negative relationships among the variables. Most of the stronger relationships follow conventional assumptions. For example, recent immigrant change and foreign born change are positively correlated, so that as the number of recent immigrants increases, so does the number of foreign born residents. Likewise, as the number of poor English speakers increases, so does the number of foreign born residents. House price change also rose along with changes in the self-employed and professional populations.

Foreign born changes are negatively correlated with veteran population changes. Immigrants may be moving into neighborhoods vacated by veterans, or veterans may be leaving neighborhoods that are seeing an influx of immigrants, which can be a sign of neighborhood decline: old, neglected buildings with cheaper rents close to amenities. The plot also shows a negative correlation between veteran changes and house prices.

Interestingly, there is a strong positive relationship between white poverty changes and overall poverty changes. All race/ethnicity-specific poverty changes are positively correlated with total poverty, but white poverty changes are most strongly correlated. This is especially interesting because, as displayed in our 5-point summary, Hispanic and white rates of poverty remained fairly stable while the Black poverty rate rose the most. One reasonable explanation is that in Spokane County, whites accounted for over 80% of the population in 2010, while Hispanics and Blacks accounted for a little over 10% of the population (the second largest demographic group). This suggests that the correlation reflects the larger number of individuals influencing the overall poverty rate rather than rates of change within racial/ethnic groups.

Female labor force changes are positively correlated with foreign born changes, manufacturing population changes, and professional population changes. This may reflect higher employment of foreign-born women in manufacturing jobs during a period when unemployment was rising.

Poverty Black Change is highly correlated with other independent variables, namely Poverty Hispanic Change, Population Black Change, and Population Hispanic Change. This could be a sign of multicollinearity, where multiple variables that represent different measures of the same underlying construct are in the same dataset. Black and Hispanic residents in Spokane are likely to experience the same conditions.

Regressions

Regression Model Results

**Effect of Community Change on Housing Price Change**

	Dependent variable:

	HousePriceChange
	(1)	(2)	(3)

ForeignBornChange	-0.008	-0.024	-0.014
	(0.029)	(0.038)	(0.036)

RecentImmigrantChange		-0.011	-0.025
		(0.026)	(0.025)

PoorEnglishChange		0.020	0.018
		(0.026)	(0.026)

VeteranChange		-0.296^*	-0.317^*
		(0.174)	(0.176)

PovertyChange	-0.008	0.174	0.117
	(0.036)	(0.121)	(0.119)

PovertyBlackChange		-0.001	-0.009
		(0.013)	(0.012)

PovertyWhiteChange		-0.176	-0.124
		(0.113)	(0.111)

PovertyHispanicChange		-0.025	-0.006
		(0.015)	(0.016)

PopBlackChange	0.006	0.002	0.008
	(0.027)	(0.034)	(0.032)

PopUnempChange	-0.019		0.006
	(0.035)		(0.035)

PopManufactChange			-0.076
			(0.087)

PopSelfEmpChange			0.224^***
			(0.071)

PopProfChange			0.089
			(0.098)

FemaleLaborForceChange			-0.144
			(0.209)

PopHispanicChange		-0.013	-0.004
		(0.028)	(0.028)

Constant	1.701^***	1.944^***	1.861^***
	(0.076)	(0.166)	(0.301)


Observations	104	104	104
R²	0.005	0.077	0.223
Adjusted R²	-0.035	-0.022	0.091
Residual Std. Error	0.243 (df = 99)	0.241 (df = 93)	0.228 (df = 88)
F Statistic	0.120 (df = 4; 99)	0.778 (df = 10; 93)	1.688^* (df = 15; 88)

Note:	^*p<0.1; ^**p<0.05; ^***p<0.01

While the correlation plot gives us a visual idea of which variables appear to be correlated, the regression models show which variables have a statistically significant association with changes in house price once the other variables are held constant.

Our first model (1), which examines foreign born change, poverty change, Black population change, and unemployment change, finds no significant relationship between those variables and change in house price at the 90%, 95%, or 99% confidence level.

In our second model (2), new variables are introduced, but none shows a significant relationship with House Price Change except Veteran Change, which is marginally significant at the 90% confidence level. Its coefficient of -0.296 means that a 1-unit increase in the veteran change ratio is associated with a 0.296 decrease in the house price change ratio (29.6 percentage points of the 2000 value).

In our third and final model (3), all variables are regressed on House Price Change. The negative effect of Veteran Change grows slightly: a 1-unit increase in the veteran change ratio is associated with a 0.317 decrease in the house price change ratio, significant at the 90% confidence level. Additionally, change in the self-employed population is highly statistically significant: we can state with 99% confidence that a 1-unit increase in the self-employed change ratio is associated with a 0.224 increase in the house price change ratio.

The lack of statistically significant relationships between most independent variables and House Price Change could reflect multicollinearity, with multiple variables that represent different measures of the same underlying construct in the same dataset.

Clustering

This cluster analysis of Spokane census tracts produced 4 distinct neighborhood groups based on their socio-economic and demographic makeup. Navigate through the adjacent tabs to see how each group is characterized.

Cluster 1

Cluster 1: DIVERSE LOWER-TO-MIDDLE CLASS: Lower unemployment, large female labor force, fair number of professionals, large white population in poverty, ethnically diverse

Cluster 2

Cluster 2: LOWER CLASS: Higher poverty, ethnically diverse, large female labor force, large immigrant/foreign born population, sizable number of professional employees

Cluster 3

Cluster 3: UPPER CLASS: Low poverty, predominantly white, large female labor force, many professionals, sizable veteran, self-employed, and foreign born populations

Cluster 4

Cluster 4: MIDDLE-TO-UPPER CLASS: Low unemployment, predominantly white with fairly low poverty, large female and professional labor force, sizable veteran population with a smaller foreign born population

Neighborhoods

Mapping Clusters

The high-value neighborhoods (Clusters 3 and 4) tend to be spread from west to east outside the city center and are more geographically dispersed. Median home values fall toward the city center, where the lower-value neighborhoods (Clusters 1 and 2) are concentrated north and south of the river. All groups, however, tend to develop along the river.

Neighborhood Change

Creating Transition Matrix

   
             1          2          3          4
  1 0.42857143 0.50000000 0.03571429 0.03571429
  2 0.31250000 0.62500000 0.00000000 0.06250000
  3 0.07407407 0.00000000 0.77777778 0.14814815
  4 0.24242424 0.12121212 0.09090909 0.54545455

Using the same variables that we analyzed for 2010, we predicted which clusters Spokane neighborhoods would have belonged to in 2000 using the predict() function. Each row of the matrix is a 2000 cluster and each column is a 2010 cluster, so each row sums to 1.

As noted previously, the clusters are defined as follows:

Cluster 1: Diverse Lower-to-Middle Class
Cluster 2: Lower Class
Cluster 3: Upper Class
Cluster 4: Middle-to-Upper Class

Notable shares of Clusters 2 and 4 shift to Cluster 1. Cluster 4 gained new tracts from Clusters 1, 2, and 3. There is little shift into Cluster 3. Group 4 shows signs of neighborhood transition.
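As a rough illustration of how a matrix like the one above can be built, the sketch below cross-tabulates cluster labels for two years and converts the counts to row proportions. The labels are made up for ten hypothetical tracts, and the sketch assumes rows hold the 2000 cluster and columns the 2010 cluster; it is not the dashboard's own code.

```r
# Hypothetical cluster labels for ten tracts in each year (illustrative only)
cluster_2000 <- c(1, 1, 2, 2, 2, 3, 3, 4, 4, 4)
cluster_2010 <- c(1, 2, 2, 2, 1, 3, 3, 4, 1, 4)

# Cross-tabulate: rows are the 2000 cluster, columns are the 2010 cluster
counts <- table(from_2000 = cluster_2000, to_2010 = cluster_2010)

# Convert counts to row proportions so that each row sums to 1
transition <- prop.table(counts, margin = 1)
round(transition, 2)
```

Reading across a row shows where the tracts of one 2000 group ended up in 2010; the diagonal holds the share that stayed in the same group.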

Neighborhood Transitions

The visual of this Sankey Transition plot provides a more descriptive and intuitive look at the neighborhood changes that took place in Spokane County between 2000 and 2010. One significant pattern that emerges from this graphic is that notable portions of 2000 neighborhood Groups 2 and 4 had transitioned to Group 1 by 2010.

For 2000’s Group 2 this is a transition towards greater economic health as well as less ethnic diversity, both potential signs of gentrification. For 2000’s Group 4 it is a trajectory into higher poverty and unemployment. A possible explanation for this transition is that these areas were slower to recover from the economic downturn of 2008.

There was also a heavy shift of neighborhoods from Group 1 in 2000 to Group 2 in 2010. This shift is characterized by increased poverty, as well as an influx of minority and immigrant residents. It could be a sign both that these neighborhoods are struggling to recover from the recent recession and that low-income residents displaced from their previous neighborhoods have moved in.

There was very little movement from other neighborhoods to Group 3 between 2000 and 2010, and there was also very little transition out of Group 3. This is an indication that the richest areas of Spokane were impacted least by the recession, and that these areas have a very high socio-economic barrier-to-entry.

Finally, Group 4 gained some new tracts in 2010 from each of the other 3 groups, though the influx was not substantial. Because the economic conditions in Group 4 are so mixed, movements into this group are largely a sign of neighborhoods in-transition, either gaining economic traction, or still struggling to recover.

Documentation

# R libraries used for this project
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( pander )
library( DT )
library( tmap )
library( cartogram )
library( maptools )
library( sp )
library( sf )

tidycensus: provides access to quick downloads of census data by selecting needed variables, requires Census API key

tidyverse: provides numerous interactive data tools, includes ggplot2, dplyr, and purrr

ggplot2: part of the tidyverse package suite, converts variable information to aesthetics through interactive “grammar of graphics” plots

plyr: provides comprehensive tools for splitting, applying, combining, and re-combining data

stargazer: displays detailed results of multiple regression models side-by-side including variable coefficients and standard errors, R-squared values, and F-statistics

corrplot: shows correlation between every combination of independent variables to check for cases of multicollinearity

purrr: part of the tidyverse package suite, provides complete and consistent set of tools for working with functions and vectors

flexdashboard: simple interactive dashboard for R markdown, includes unique features such as storyboard layouts, flexible row-column-based layouts, and support for a wide variety of components including htmlwidgets, tabular data, gauges and value boxes, and text annotations.

leaflet: add interactive maps to the dashboard

mclust: performs cluster analysis on specified data set

pander: renders R objects into Pandoc’s markdown

DT: create interactive data tables which can be searched and sorted on the dashboard

tmap: generates flexible thematic maps to display data outputs

cartogram: creates Dorling cartograms to visualize data by color and class

maptools: special tool box for handling spatial objects, particularly when converting tabular data to spatial data

sp: provides utility functions for plotting data as maps, spatial selection, and retrieving coordinates

sf: provides a standardized way to encode spatial vector data

---
title: "CAP Final Project"
output: 
  flexdashboard::flex_dashboard:
    social: menu
    source_code: embed
    vertical_layout: scroll
---

```{r setup, include=FALSE, message=F, warning=F}
knitr::opts_chunk$set(  message=F, warning=F, echo=F )

#Load in libraries
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( pander )
library( DT )
library( tmap )
library( cartogram )
library( maptools )
library( sp )
library( sf )
```




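```{r, include=F, message=F, warning=F}
# Read the Census API key from the environment rather than hard-coding it.
# Assumes CENSUS_API_KEY is set (for example in ~/.Renviron).
census_key <- Sys.getenv("CENSUS_API_KEY")
if (nzchar(census_key)) census_api_key(census_key)

# Load the course data set and drop rows with missing values
URL <- "https://github.com/DS4PS/cpp-529-master/raw/master/data/CensusData.rds"
census.dats <- readRDS(gzcon(url( URL )))
census.dats <- na.omit(census.dats)
```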




<style type="text/css"> .sidebar { overflow: auto; } </style>




Introduction {.storyboard}
=========================================



### Project Overview

```{r, message=F, warning=F, echo=F }
# EDIT ME: set coordinates to your city center
leaflet() %>%
  addTiles() %>%
  addMarkers(lng=-117.426, lat=47.659, popup="The Heart of the Inland Northwest")
```


***
Spokane  is the largest city and county seat of Spokane County, Washington, United States. It is in eastern Washington  92 miles (148 km) south of the Canadian border, 18 miles (30 km) west of the Washington–Idaho border, and 279 miles (449 km) east of Seattle, along I-90. 

Spokane is the economic and cultural center of the Spokane metropolitan area. A 2021 estimate sets the population of the Spokane Metropolitan Area at 593,466. The Spokane metropolitan area includes Spokane County. Natural resources have historically been the foundation of Spokane's economy, with the mining, logging, and agriculture industries providing much of the region's economic activity. Spokane became an important rail and shipping center because of its location between mining and farming areas.

Spokane has seen an influx of immigrants and remote workers leaving higher-priced or lower-amenity metro areas. The Spokane area has also suffered from suburbanization and urban sprawl in past decades, despite Washington's use of urban growth boundaries. In this report, we therefore use cluster analysis techniques to explore the changes in Spokane's housing prices and population during the first decade of the 2000s (2000 to 2010), a period that includes the Great Recession.

The dashboard uses 2000 and 2010 Decennial Census data to display population transformations in Spokane County, Washington. It runs a regression analysis on demographic and socio-economic change variables and clusters census tracts into neighborhood groups. Changes from 2000 to 2010 are then analyzed numerically with a transition matrix and graphically with a Sankey transition plot.

Data {.storyboard}
=========================================


### Empirical Framework 

```{r, message=F, warning=F, echo=F}
# At this point, census.dats contains census information for all of the US.
# Subset it to the area of interest: Spokane County.

census.dats <-
  census.dats %>%
  filter( county == "Spokane County" )


```




```{r, echo=F}
#Calculating change Values for variables 

censusChange1<-ddply(census.dats,"TRTID10",summarise, 
       HousePriceChange = Median.HH.Value10/(Median.HH.Value00+1),# Change variable
       ForeignBornChange = Foreign.Born10/(Foreign.Born00 +.01),
       RecentImmigrantChange = Recent.Immigrant10/(Recent.Immigrant00+.01),
       PoorEnglishChange = Poor.English10/(Poor.English00+.01),
       VeteranChange = Veteran10/(Veteran00+.01),
       PovertyChange = Poverty10/(Poverty00+.01),
       PovertyBlackChange = Poverty.Black10/(Poverty.Black00+.01),
       PovertyWhiteChange = Poverty.White10/(Poverty.White00+.01),
       PovertyHispanicChange = Poverty.Hispanic10/(Poverty.Hispanic00+.01),
       PopBlackChange = Pop.Black10/(Pop.Black00+.01),
       PopHispanicChange = Pop.Hispanic10/(Pop.Hispanic00+.01),
       PopUnempChange = Pop.Unemp10/(Pop.Unemp00+.01),
       PopManufactChange = Pop.Manufact10/(Pop.Manufact00+.01),
       PopSelfEmpChange = Pop.SelfEmp10/(Pop.SelfEmp00+.01),
       PopProfChange = Pop.Prof10/(Pop.Prof00+.01),
       FemaleLaborForceChange = Female.LaborForce10/(Female.LaborForce00+.01)
)

# Keep one row per tract by dropping duplicated tract IDs
censusChange1<-censusChange1[!duplicated(censusChange1$TRTID10),]
```


***
The data used in this analysis come from the archive of the decennial U.S. Census for the years 2000 and 2010. The dependent variable of interest is the change in median home/property value between the two censuses, which is measured in both absolute dollars and percentages. This analysis uses a wide range of explanatory variables from the Census data, all of which have been shown in studies to have some connection with gentrification and other forms of neighborhood change. Variables of particular interest include change in poverty rates, change in unemployment, change in recent immigrant population, and change in female labor force participation. All are measured both as absolute change in percentage points (e.g. a drop from 50% to 25% is a difference of 25 percentage points) and as relative percentage change (e.g. a drop from 50% to 25% is a 50% decrease).
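The distinction between absolute and relative change can be sketched in a few lines. The rates below are made up for three hypothetical tracts and are not taken from the Census file.

```r
# Hypothetical tract-level poverty rates as proportions (illustrative only)
rate_2000 <- c(0.50, 0.20, 0.10)
rate_2010 <- c(0.25, 0.22, 0.15)

# Absolute change: the difference in percentage points
abs_change <- (rate_2010 - rate_2000) * 100

# Relative change: the difference as a percentage of the 2000 value
rel_change <- (rate_2010 - rate_2000) / rate_2000 * 100

data.frame(rate_2000, rate_2010, abs_change, rel_change)
```

The first tract falls by 25 percentage points, which is a 50% relative decrease, while the third rises by only 5 percentage points but by 50% in relative terms.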



### View Data 

```{r, message=F, warning=F, echo=F}
DT::datatable( head(censusChange1, 25) )
```


***

The change variables are built from Census values for the years 2000 and 2010. Each one is the ratio of a tract's 2010 value to its 2000 value, so a value of 1 means no change. To avoid division by zero, 1.00 was added to the year 2000 house value and 0.01 was added to each year 2000 independent variable before it was used to divide the corresponding 2010 value for each tract in Spokane County.

The median house value column shows that values rose substantially from 2000 to 2010 (the mean ratio across tracts is about 1.66). In contrast, the shares of residents who were foreign born, recent immigrants, or poor English speakers decreased slightly in most tracts, as seen in the first 10 rows.

Tract 2, however, shows a notable population influx: roughly a 200% increase in foreign born residents, 300% in recent immigrants, and about 280% in poor English speakers, alongside a slight decrease in the veteran population. Similar increases are reported in tract 25, tract 16, and also tract 14, though no new immigrants were reported there.
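The effect of the small constant added to each 2000 value can be seen in a minimal sketch. It assumes the variables are stored as proportions between 0 and 1; `change_ratio` is a hypothetical helper written for this illustration, and the shares are made up.

```r
# Hypothetical helper mirroring the ratio 2010 / (2000 + offset)
change_ratio <- function(value_2010, value_2000, offset = 0.01) {
  value_2010 / (value_2000 + offset)
}

# Illustrative proportions, not values from the Census file
share_2000 <- c(0.15, 0.02, 0.00)
share_2010 <- c(0.30, 0.04, 0.02)

# With the offset, a zero baseline still gives a finite ratio
with_offset <- change_ratio(share_2010, share_2000)

# Without it, the tract with a zero baseline would produce Inf
without_offset <- share_2010 / share_2000

data.frame(share_2000, share_2010, with_offset, without_offset)
```

Under these assumptions the offset pulls the ratio down most where the 2000 value is small: a share that doubles from 0.02 to 0.04 is recorded as about 1.33 rather than 2, which is worth keeping in mind when reading ratios for small population groups.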



### 5-point summary  

```{r, results='asis', message=F, warning=F, fig.width = 9, fig.align='center', echo = F }
#Visualize 5-point summary
censusChange1 %>%
    keep(is.numeric) %>% 
stargazer(
          omit.summary.stat = c("p25", "p75"), nobs=F, type="html" ) #For  pdf document, replace html with latex
```

***
The summary statistics convey the variability of the change variables from 2000 to 2010.

The greatest variability is in Black poverty change, with a standard deviation of ~2.465, followed by Hispanic poverty change (~1.84), recent immigrant change (~1.11), and poor English change (~1.03). The standard deviation of most other variables is fairly low, meaning that changes were similar across tracts.

Each tract has its own value for each variable, and the min and max columns report the smallest and largest tract-level values. In this case we are using an increase in each variable from its 25th to 75th percentile to represent a large change. The gap between the minimum and maximum values indicates that some areas had no value or a low value in 2000. For example, House Price Change has a min of 0.305 and a max of 3.024, which may reflect the building and development of new homes as well as price increases for existing homes, which could be a result of gentrification.

Poverty Black Change has a min of 0.000 and a max of 20.088, the widest range of any variable. This points to a very large increase in the number of poor Black residents in at least one tract, in tandem with increases in unemployment, poverty, immigrants, and the Black population in some tracts.

We can also note that measures of community stability such as the female labor force, professional, manufacturing, and self-employed populations increased in some tracts (their maximum values are above 1), although their mean values are at or below 1. These are possible drivers of house price change.
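For readers who want to see what the table's columns measure, here is a small base R sketch that computes the same four statistics. The data frame holds made-up ratios for five hypothetical tracts; it is not the dashboard's data and does not use stargazer.

```r
# Hypothetical change ratios for five tracts (illustrative values only)
changes <- data.frame(
  HousePriceChange = c(1.4, 1.6, 1.7, 1.8, 3.0),
  PovertyChange    = c(0.3, 0.9, 1.1, 1.3, 4.9)
)

# One row per variable: mean, standard deviation, minimum, maximum
summary_stats <- t(sapply(changes, function(x) {
  c(Mean = mean(x), St.Dev = sd(x), Min = min(x), Max = max(x))
}))
round(summary_stats, 3)
```

Min and Max are simply the smallest and largest tract-level ratios, so a single extreme tract is enough to stretch the range even when most tracts sit close to 1.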



### Histogram 

```{r, message=F, warning=F, echo=F}
#Histogram
censusChange1 %>%
  keep(is.numeric) %>% 
  gather() %>% 
  ggplot(aes(value)) +
    facet_wrap(~ key, scales = "free") +
    geom_histogram()
```

***

This histogram grid visualizes the variables shown in the summary table. As the summary statistics suggested, the variables with the widest ranges (Poverty Black Change, Poverty Hispanic Change, Population Black Change, Poor English Change, Recent Immigrant Change, and Foreign Born Change) are clearly right-skewed. This is a sign of outliers: values in one or two tracts that are far higher than in other tracts or than the mean.

The other variables are close to normally distributed around 1, indicating that they did not change substantially in most tracts. The histograms make this pattern easier to see than the table does.
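One way to check a skewed variable for outliers numerically is Tukey's 1.5 × IQR rule, sketched below on made-up ratios. This is offered as an illustration of the idea, not as a step the dashboard performs.

```r
# Hypothetical change ratios with one extreme tract (illustrative values only)
x <- c(0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 20.1)

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles
q <- unname(quantile(x, c(0.25, 0.75)))
fence <- 1.5 * (q[2] - q[1])
is_outlier <- x < q[1] - fence | x > q[2] + fence

# Values flagged as outliers
x[is_outlier]

# A mean well above the median is another sign of right skew
c(mean = mean(x), median = median(x))
```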



### Correlation Plot 

```{r, message=F, warning=F, echo=F}
##save correlations in train_cor
train_cor <- cor(censusChange1[,-1])

##Correlation Plot
corrplot(train_cor, type='lower')

```


***

The correlation plot displays the positive and negative relationships among the variables. Most of the stronger relationships follow conventional assumptions. For example, recent immigrant change and foreign born change are positively correlated, so that as the number of recent immigrants increases, so does the number of foreign born residents. Likewise, as the number of poor English speakers increases, so does the number of foreign born residents. House price change also rose along with changes in the self-employed and professional populations.

Foreign born changes are negatively correlated with veteran population changes. Immigrants may be moving into neighborhoods vacated by veterans, or veterans may be leaving neighborhoods that are seeing an influx of immigrants, which can be a sign of neighborhood decline: old, neglected buildings with cheaper rents close to amenities. The plot also shows a negative correlation between veteran changes and house prices.

Interestingly, there is a strong positive relationship between white poverty changes and overall poverty changes. All race/ethnicity-specific poverty changes are positively correlated with total poverty, but white poverty changes are most strongly correlated. This is especially interesting because, as displayed in our 5-point summary, Hispanic and white rates of poverty remained fairly stable while the Black poverty rate rose the most. One reasonable explanation is that in Spokane County, whites accounted for over 80% of the population in 2010, while Hispanics and Blacks accounted for a little over 10% of the population (the second largest demographic group). This suggests that the correlation reflects the larger number of individuals influencing the overall poverty rate rather than rates of change within racial/ethnic groups.

Female labor force changes are positively correlated with foreign born changes, manufacturing population changes, and professional population changes. This may reflect higher employment of foreign-born women in manufacturing jobs during a period when unemployment was rising.

Poverty Black Change is highly correlated with other independent variables, namely Poverty Hispanic Change, Population Black Change, and Population Hispanic Change. This could be a sign of multicollinearity, where multiple variables that represent different measures of the same underlying construct are in the same dataset. Black and Hispanic residents in Spokane are likely to experience the same conditions.
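A common way to quantify multicollinearity is the variance inflation factor (VIF), which for each predictor equals 1 / (1 - R²) from regressing it on the other predictors. The sketch below computes it by hand on simulated data; the variable names are hypothetical stand-ins and the dashboard itself does not run this check.

```r
set.seed(1)  # simulated data, not the Census file
n <- 104
foreign_born <- rnorm(n)
# Two predictors built to track foreign_born closely (hypothetical names)
recent_immigrant <- foreign_born + rnorm(n, sd = 0.3)
poor_english     <- foreign_born + rnorm(n, sd = 0.3)
# One predictor that is unrelated to the others
veteran          <- rnorm(n)
X <- data.frame(foreign_born, recent_immigrant, poor_english, veteran)

# VIF for each predictor: 1 / (1 - R^2) from regressing it on the others
vif <- sapply(names(X), function(v) {
  f  <- reformulate(setdiff(names(X), v), response = v)
  r2 <- summary(lm(f, data = X))$r.squared
  1 / (1 - r2)
})
round(vif, 2)
```

Values near 1 indicate a predictor that carries independent information, while large values (a frequent rule of thumb is above 5 or 10) indicate that its coefficient will be estimated imprecisely.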




Regressions
=========================================


### Regression Model Results 

```{r, results='asis', fig.align='center', message=F, warning=F, echo=F}

# Model 1: a small set of demographic and economic change variables
reg1 <- lm( HousePriceChange ~ ForeignBornChange + PovertyChange + PopBlackChange + PopUnempChange,
            data=censusChange1 )

# Model 2: immigration, veteran, poverty, and population change variables
reg2 <- lm( HousePriceChange ~ ForeignBornChange + RecentImmigrantChange + PoorEnglishChange +
              VeteranChange + PovertyChange + PovertyBlackChange + PovertyWhiteChange +
              PovertyHispanicChange + PopBlackChange + PopHispanicChange,
            data=censusChange1 )

# Model 3: all change variables, each entered once
reg3 <- lm( HousePriceChange ~ ForeignBornChange + RecentImmigrantChange + PoorEnglishChange +
              VeteranChange + PovertyChange + PovertyBlackChange + PovertyWhiteChange +
              PovertyHispanicChange + PopBlackChange + PopHispanicChange +
              PopUnempChange + PopManufactChange + PopSelfEmpChange + PopProfChange +
              FemaleLaborForceChange,
            data=censusChange1 )

# present results with stargazer
# library(stargazer)
stargazer( reg1, reg2, reg3, 
           title="Effect of Community Change on Housing Price Change",
           type='html', align=TRUE )

```


***

While the correlation plot gives us a visual idea of what variables appear to be correlated, the regression model gives us a definitive understanding of which variables have a statistically significant correlation with changes in house price.

Our first model (1) which examines changes in the foreign born change, poverty change,black population change finds that at 90%, 95% and 99% confidence interval there is no significant relationship between those variables and change in House price. 


In our second model (2), New variables are introduced  into the model but show no significant relationship with  House Price Change excepts for Veteran change which has a slightly significant relationship with House Price Change at 90% confidence interval showing that a 1-unit decrease in Veteran population will result in 26.9% decrease in House Prices. 

In our third and final model (3), All variables are regressed on House Price change. Veteran change slightly increases its negative effect size on House Price with a 1 unit decrease in Veteran Population decreasing the house price value by 31.7% at 90% confidence interval. Additionally, changes in the self employed population is highly statistically significant and we can with 99% confidence state that a 1-unit increase in the self-employed population is associated with a 22.4% increase in House Price.


The scarcity of statistically significant coefficients may reflect multicollinearity: several of the independent variables measure the same underlying construct (for example, the overall and race-specific poverty rates), which inflates standard errors and makes individual effects hard to distinguish.
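
One way to check for this is the variance inflation factor (VIF). The sketch below computes it by hand in base R on simulated predictors (`x1`, `x2`, `x3` are hypothetical, not the census variables); it is an illustration of the diagnostic, not a step that was run for this dashboard.

```r
# Simulated predictors: x2 is built to be nearly collinear with x1
set.seed(1)
n  <- 300
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
# predictor j on all the other predictors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 1)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a warning sign. In this simulated example `x1` and `x2` land far above that range while `x3` stays close to 1.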



Clustering {.storyboard}
=========================================



```{r, message=F, warning=F, echo=F, fig.align='center'}
# Cluster analysis for 2010 Data
# library(mclust)

Census2010<-census.dats
keep.these1 <-c("Foreign.Born10","Recent.Immigrant10","Poor.English10","Veteran10","Poverty10","Poverty.Black10","Poverty.White10","Poverty.Hispanic10","Pop.Black10","Pop.Hispanic10","Pop.Unemp10","Pop.Manufact10","Pop.SelfEmp10","Pop.Prof10","Female.LaborForce10")

#Run Cluster Analysis
mod2 <- Mclust(Census2010[keep.these1],G=4) # Fix the number of groups at 4; remove G to let Mclust choose the number of groups by BIC

#summary(mod2, parameters = TRUE)

#Add group classification to df
Census2010$cluster <- mod2$classification
```





```{r, message=F, warning=F, echo=F, fig.align='center'}

#Visualize Data
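# Mean of each clustering variable within each cluster
# (across() replaces the deprecated summarise_each()/funs())
stats1 <- 
  Census2010 %>% 
  group_by( cluster ) %>% 
  summarise( across( all_of(keep.these1), mean ) )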

t <- data.frame( t(stats1), stringsAsFactors=F )
names(t) <- paste0( "GROUP.", 1:4 )
t <- t[-1,]

```


***


This cluster analysis of Spokane census tracts produced 4 distinct neighborhood groups based on their socio-economic and demographic makeup. Navigate through the adjacent tabs to see how each group is characterized.  
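
For readers unfamiliar with model-based clustering, the following is a minimal sketch of how `Mclust()` can pick the number of groups on its own. It runs on R's built-in `iris` measurements as a stand-in for the tract-level variables, and the range `G = 1:6` is an arbitrary choice for illustration; the dashboard itself fixes the number of groups at 4.

```r
library(mclust)

# Built-in example data, standing in for the tract-level variables
X <- iris[, 1:4]

# Fit Gaussian mixture models with 1 to 6 components and keep the
# one with the best BIC
mod <- Mclust(X, G = 1:6)

mod$G                       # number of groups selected by BIC
table(mod$classification)   # how many observations fall in each group
```

Fixing `G`, as done here for Spokane, trades this data-driven choice for a set of groups that is easier to present and compare.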



### Cluster 1

```{r, message=F, warning=F, echo=F, fig.align='center'}

plot( rep(1,15), 1:15, bty="n", xlim=c(-.2,.7), 
      type="n", xaxt="n", yaxt="n",
      xlab="Score", ylab="",
      main=paste("GROUP",1) )
abline( v=seq(0,.7,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,1], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,.7,.1), col.axis="gray", col="gray" )

```


***

Cluster 1: DIVERSE LOWER-TO-MIDDLE CLASS: Lower unemployment, large female labor force, fair number of professionals, large white population in poverty, ethnically diverse.


### Cluster 2

```{r, message=F, warning=F, echo=F, fig.align='center'}

plot( rep(1,15), 1:15, bty="n", xlim=c(-.2,.7), 
      type="n", xaxt="n", yaxt="n",
      xlab="Score", ylab="",
      main=paste("GROUP",2) )
abline( v=seq(0,.7,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,2], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,.7,.1), col.axis="gray", col="gray" )

```


***

Cluster 2: LOWER CLASS: Higher poverty, ethnically diverse, large female labor force, large immigrant/foreign-born population, sizable share of professional employees.



### Cluster 3 

```{r, message=F, warning=F, echo=F, fig.align='center'}

plot( rep(1,15), 1:15, bty="n", xlim=c(-.2,.7), 
      type="n", xaxt="n", yaxt="n",
      xlab="Score", ylab="",
      main=paste("GROUP",3) )
abline( v=seq(0,.7,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,3], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,.7,.1), col.axis="gray", col="gray" )

```


***

Cluster 3: UPPER CLASS: Low poverty, predominantly white, large female labor force, high share of professionals, sizable veteran, self-employed, and foreign-born populations.

### Cluster 4 

```{r, message=F, warning=F, echo=F, fig.align='center'}

plot( rep(1,15), 1:15, bty="n", xlim=c(-.2,.7), 
      type="n", xaxt="n", yaxt="n",
      xlab="Score", ylab="",
      main=paste("GROUP",4) )
abline( v=seq(0,.7,.1), lty=3, lwd=1.5, col="gray90" )
segments( y0=1:15, x0=0, x1=100, col="gray70", lwd=2 )
text( 0, 1:15, keep.these1, cex=0.85, pos=2 )
points( t[,4], 1:15, pch=19, col="Steelblue", cex=1.5 )
axis( side=1, at=seq(0,.7,.1), col.axis="gray", col="gray" )

```

***

Cluster 4: MIDDLE-TO-UPPER CLASS: Low unemployment, predominantly white with fairly low poverty, large female and professional labor force, sizable veteran population, smaller foreign-born population.



Neighborhoods {.storyboard}
=========================================


### Mapping Clusters

```{r message=FALSE, warning=FALSE, include=FALSE, echo=F }

# Link to Lab 4: https://ds4ps.org/cpp-529-master/labs/lab-04-instructions.html

crosswalk <- read.csv( "https://raw.githubusercontent.com/DS4PS/cpp-529-master/master/data/cbsatocountycrosswalk.csv",  stringsAsFactors=F, colClasses="character" )

these.msp <- crosswalk$msaname == "SPOKANE, WA"
these.fips <- crosswalk$fipscounty[ these.msp ]
these.fips <- na.omit( these.fips )

state.fips <- substr( these.fips, 1, 2 )
county.fips <- substr( these.fips, 3, 5 )

spokane.pop <-
  get_acs( geography = "tract", variables = "B01003_001", state = "53", county = county.fips[state.fips=="53"], geometry = TRUE ) %>%
  select( GEOID, estimate )

spokane <- merge(spokane.pop, Census2010, by.x="GEOID", by.y="TRTID10")
spokane2 <- spokane[ ! st_is_empty( spokane ) , ]

spokane.sp <- as_Spatial( spokane2 )

# project map and remove empty tracts
spokane.sp <- spTransform( spokane.sp, CRS( "+init=epsg:3395" ) )
spokane.sp <- spokane.sp[ spokane.sp$estimate != 0 & (! is.na( spokane.sp$estimate ) ) , ]

# convert census tract polygons to dorling cartogram
spokane_dorling <- cartogram_dorling( x=spokane.sp, weight="estimate", k=0.05 )
spokane_dorling
```




```{r, message=F, warning=F, echo=F}

tmap_mode("view")
tm_basemap( "Stamen.Watercolor" ) +
  tm_tiles( "Wikimedia" ) +
  tm_shape( spokane_dorling ) +
  # cluster is a categorical label, so map it with style="cat" rather than quantile breaks
  tm_polygons( col="cluster", style="cat", palette="Spectral", title="Cluster" ) +
  tm_layout( "Dorling Cartogram - Spokane MSA", title.position = c("right", "bottom") )

```






```{r, message=F, warning=F, echo=F, fig.align='center'}
#Predicting cluster Grouping for 2000 census tracts

# Get 2000 data
Census2000 <-census.dats


keep.these00 <-c("Foreign.Born00","Recent.Immigrant00","Poor.English00","Veteran00","Poverty00","Poverty.Black00","Poverty.White00","Poverty.Hispanic00","Pop.Black00","Pop.Hispanic00","Pop.Unemp00","Pop.Manufact00","Pop.SelfEmp00","Pop.Prof00","Female.LaborForce00")

pred00<-predict(mod2, Census2000[keep.these00])

Census2000$PredCluster <- pred00$classification

TransDF2000<-Census2000 %>%
  select(TRTID10, PredCluster)

TransDF2010<-Census2010 %>%
  select(TRTID10, cluster,Median.HH.Value10) 

# merge() has no by.all argument; join explicitly on the tract ID
TransDFnew<-merge(TransDF2000,TransDF2010,by="TRTID10",all.x=TRUE)
```



***

The higher-value tracts in Clusters 3 and 4 tend to spread from west to east outside the city center and appear more geographically dispersed. Median home values in Clusters 1 and 2 fall toward the city center, where these tracts concentrate north and south of the river.
Tracts in all four groups, however, tend to lie along the river.


Neighborhood Change {.storyboard}
=========================================


### Creating Transition Matrix

```{r, message=F, warning=F, echo=F, fig.align='center'}

#Transition Matrix
prop.table( table( TransDFnew$PredCluster, TransDFnew$cluster ) , margin=1 )
    
```


***

Using the same variables analyzed for 2010, we predicted which cluster each Spokane tract would have belonged to in 2000 by applying the `predict()` function to the 2010 cluster model. The transition matrix reports, for each 2000 group (rows), the share of its tracts that fall in each 2010 group (columns).

As noted previously, the clusters are defined as follows:

- Cluster 1: Diverse Lower-to-Middle Class
- Cluster 2: Lower Class
- Cluster 3: Upper Class
- Cluster 4: Middle-to-Upper Class

Tracts from Clusters 2 and 4 shift toward Cluster 1. Cluster 4 gains a few tracts from each of Clusters 1, 2, and 3, and there is little movement into Cluster 3. Group 4 therefore shows signs of neighborhood transition.
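
As a small worked illustration of how a row-normalized transition matrix is read, the sketch below builds one from made-up cluster labels for eight tracts. The vectors `cluster2000` and `cluster2010` are hypothetical and are not the Spokane results.

```r
# Hypothetical cluster labels for eight tracts in two census years
cluster2000 <- c(1, 1, 1, 2, 2, 3, 4, 4)
cluster2010 <- c(1, 1, 2, 1, 2, 3, 1, 4)

counts <- table(from2000 = cluster2000, to2010 = cluster2010)

# margin = 1 divides each row by its total, so every row sums to 1
trans <- prop.table(counts, margin = 1)
round(trans, 2)
```

In this toy example the row for Group 4 shows 0.5 under Group 1 and 0.5 under Group 4: half of the tracts that started in Group 4 ended in Group 1, and half stayed put.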





### Neighborhood Transitions

```{r, message=F, warning=F, echo=F, fig.align='center'}

# Sankey Transition Plot
trn_mtrx1 <-
  with(TransDFnew,
       table(PredCluster, 
             cluster))

library(Gmisc)
transitionPlot(trn_mtrx1, 
               type_of_arrow = "gradient")
```


***

The Sankey transition plot gives a more intuitive view of the neighborhood changes that took place in Spokane County between 2000 and 2010. One notable pattern is that sizable portions of 2000 neighborhood Groups 2 and 4 had transitioned to Group 1 by 2010.

For 2000's Group 2 this is a transition towards greater economic health as well as less ethnic diversity, both potential signs of gentrification. For 2000's Group 4 it is a trajectory into higher poverty and unemployment. A possible explanation for this transition is that these areas were slower to recover from the economic downturn of 2008. 

There was also a heavy shift of neighborhoods from Group 1 in 2000 to Group 2 in 2010. This shift is characterized by increased poverty, as well as an influx of minority and immigrant residents. It could be a sign both that a neighborhood is struggling to recover from the recent recession and that low-income residents displaced from their previous neighborhoods have moved in.

There was very little movement from other neighborhoods to Group 3 between 2000 and 2010, and there was also very little transition out of Group 3. This is an indication that the richest areas of Spokane were impacted least by the recession, and that these areas have a very high socio-economic barrier-to-entry. 

Finally, Group 4 gained some new tracts in 2010 from each of the other three groups, though the influx was not substantial. Because the economic conditions in Group 4 are so mixed, movements into this group are largely a sign of neighborhoods in transition, either gaining economic traction or still struggling to recover.
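
Statements about movement into and out of a group can be summarized with two simple rates. The sketch below computes them from a made-up table of counts (the numbers are hypothetical, chosen only so that one group is clearly stable); the names `retention` and `inflow` are labels introduced here for illustration.

```r
# Hypothetical counts of tracts moving between groups
# (rows: group in 2000, columns: group in 2010)
counts <- matrix(c(10, 4, 0, 2,
                    5, 6, 0, 1,
                    0, 0, 8, 1,
                    4, 0, 1, 5),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(from2000 = 1:4, to2010 = 1:4))

# Retention: share of each 2000 group still in the same group in 2010
retention <- diag(counts) / rowSums(counts)

# Inflow: share of each 2010 group that arrived from a different group
inflow <- 1 - diag(counts) / colSums(counts)

round(rbind(retention, inflow), 2)
```

A group with high retention and low inflow, like Group 3 in this toy table, is one that few tracts leave and few tracts enter.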



### Documentation {data-commentary-width=400}

```{r, eval=F, echo=T, message=F, warning=F}
# R libraries used for this project
library( tidycensus )
library( tidyverse )
library( ggplot2 )
library( plyr )
library( stargazer )
library( corrplot )
library( purrr )
library( flexdashboard )
library( leaflet )
library( mclust )
library( pander )
library( DT )
library( tmap )
library( cartogram )
library( maptools )
library( sp )
library( sf )
library( Gmisc )
```


***

tidycensus: provides access to quick downloads of census data by selecting needed variables, requires Census API key

tidyverse: provides numerous interactive data tools, includes ggplot2, dplyr, and purrr

ggplot2: part of the tidyverse package suite, converts variable information to aesthetics through interactive "grammar of graphics" plots

plyr: provides comprehensive tools for splitting, applying, combining, and re-combining data

stargazer: displays detailed results of multiple regression models side-by-side including variable coefficients and standard errors, R-squared values, and F-statistics

corrplot: shows the correlation between every pair of independent variables to check for multicollinearity

purrr: part of the tidyverse package suite, provides complete and consistent set of tools for working with functions and vectors

flexdashboard: simple interactive dashboard for R markdown, includes unique features such as storyboard layouts, flexible row-column-based layouts, and support for a wide variety of components including htmlwidgets, tabular data, gauges and value boxes, and text annotations

leaflet: add interactive maps to the dashboard

mclust: performs model-based cluster analysis on a specified data set

pander: renders R objects into Pandoc's markdown

DT: create interactive data tables which can be searched and sorted on the dashboard

tmap: generates flexible thematic maps to display data outputs

cartogram: creates Dorling cartograms to visualize data by color and class

maptools: special tool box for handling spatial objects, particularly when converting tabular data to spatial data

sp: provides utility functions for plotting data as maps, spatial selection, and retrieving coordinates

sf: provides a standardized way to encode spatial vector data

Gmisc: provides `transitionPlot()`, used to draw the neighborhood transition plot