Initial Data Cleaning and Distributions

As seen below, the original Land Value measure is heavily right-skewed, with a long tail of very high values and a large number of zeros. The same pattern appears in the Land Value per Acre measure and in both the Just Value and Just Value per Acre measures.

RAW Data

Minimum 25th Pct. Median Mean 75th Pct. Maximum Std. Dev.
0 7636 24000 54359.97 45000 237561253 538555.4

Minimum 25th Pct. Median Mean 75th Pct. Maximum Std. Dev.
0 16645.79 78150.77 159897.8 206242.5 439880454 559436.5

To account for this, three versions of the data were created: one with all outliers trimmed using the IQR method; one log transformed, to account for both the skew and a possible non-linear relationship; and one that was both log transformed and trimmed using the IQR method. The log-transformed and trimmed data provided the most nearly normal distribution and the strongest correlations, and are used going forward.
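The combined log-and-trim step can be sketched roughly as follows. This is a minimal illustration, not the code behind the report: the order of operations (log first, then trim), the natural logarithm, the conventional 1.5 x IQR fences, the handling of zeros, and the sample values are all assumptions.

```python
import math
import statistics

def iqr_bounds(values, k=1.5):
    """Return (low, high) fences placed k * IQR beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def log_then_trim(values, k=1.5):
    """Log-transform positive values, then drop points outside the IQR fences."""
    # Zeros have no logarithm; this sketch simply drops them (an assumption).
    logged = [math.log(v) for v in values if v > 0]
    low, high = iqr_bounds(logged, k)
    return [v for v in logged if low <= v <= high]

# Hypothetical land values: mostly moderate, two zeros, one extreme value.
land_values = [0, 0, 7600, 12000, 24000, 30000, 45000, 60000, 237_000_000]
cleaned = log_then_trim(land_values)
```

In this toy input the two zeros are dropped before the transform and the single extreme value falls outside the upper fence, so six logged values remain.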

While the transformed and trimmed data set removes extreme values that could be true outliers, it is also likely that some of these values are genuinely this extreme. Without further verification and cleaning of the parcel data set, this is impossible to confirm. The data set was also trimmed on the Per Acre measures, which accounts for extremely high values that result from geometry errors in the parcel data. The distributions of this data set are shown below.

Transformed and Trimmed

Minimum 25th Pct. Median Mean 75th Pct. Maximum Std. Dev.
6.315358 9.615872 10.30899 10.24571 10.77898 13.37494 0.9342481

Minimum 25th Pct. Median Mean 75th Pct. Maximum Std. Dev.
7.113558 10.73657 11.67554 11.50648 12.38057 14.84337 1.22396


Additional Cleaning

To account for internal variation within TAZs, accessibility values were adjusted based on a parcel's proximity to major roads (PTMR) score. This provides two benefits: it adds noise to the data in a logical way, since assigning 4,100 values to 2.2 million parcels is questionable, and it brings some of the suitability analysis into the accessibility score.

To add this noise, a spatially lagged accessibility score was calculated using a K nearest neighbors approach. For a given TAZ, the 10 closest TAZs were selected and their accessibility scores averaged. The standard deviation of this spatially lagged accessibility was then used to add noise using the following formula:

\[ acc_{j} = (acc - sd_{lag})+(PTMR * 2sd_{lag}) \]

Where:

  • \(acc_{j}\) is the adjusted access score
  • \(acc\) is the original access score
  • \(sd_{lag}\) is the spatially lagged access standard deviation
  • \(PTMR\) is the proximity to major road

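This adjusts the access score by up to +/- one standard deviation based on the PTMR score, which is scaled between 0 and 1: a PTMR of 0 subtracts one standard deviation, 0.5 leaves the score unchanged, and 1 adds one standard deviation.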
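The adjustment can be illustrated with a short sketch. The function names, the toy zone coordinates and scores, the use of Euclidean distance between zone centroids, and the choice to take a single standard deviation across all lagged scores are assumptions made for illustration; the text does not say whether the standard deviation is global or per TAZ.

```python
import math
import statistics

def knn_lagged_scores(centroids, scores, k=10):
    """Mean accessibility of each zone's k nearest neighbours (zone itself excluded)."""
    lagged = []
    for i, (xi, yi) in enumerate(centroids):
        # Distance from zone i to every other zone, nearest first.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(centroids)
            if j != i
        )
        neighbours = [scores[j] for _, j in dists[:k]]
        lagged.append(statistics.fmean(neighbours))
    return lagged

def adjust_access(acc, sd_lag, ptmr):
    """acc_j = (acc - sd_lag) + PTMR * 2 * sd_lag, with PTMR scaled to [0, 1]."""
    return (acc - sd_lag) + ptmr * 2 * sd_lag

# Hypothetical zones laid out on a line, with made-up accessibility scores.
centroids = [(float(i), 0.0) for i in range(6)]
scores = [100.0, 120.0, 90.0, 200.0, 210.0, 190.0]

lagged = knn_lagged_scores(centroids, scores, k=2)  # k=10 in the text
sd_lag = statistics.stdev(lagged)  # assumption: one SD over all lagged scores

low = adjust_access(scores[0], sd_lag, ptmr=0.0)   # one SD below the original
mid = adjust_access(scores[0], sd_lag, ptmr=0.5)   # unchanged
high = adjust_access(scores[0], sd_lag, ptmr=1.0)  # one SD above the original
```

With a brute-force neighbour search this is quadratic in the number of zones, which is workable for a few thousand TAZs; a spatial index would be the usual choice for anything larger.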

Correlations

The following scatter plots are drawn from a random 10% subset of the data, as 1.7 million points would make an illegible graph, but the correlations are calculated using the entire data set.
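A rough sketch of this split between plotting and computation is shown below. It assumes a Pearson correlation, which the text does not state explicitly, and uses synthetic stand-in data; the variable names and the fixed random seed are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(42)

# Synthetic stand-in for the parcel data: suitability in [0, 1] and a noisy
# logged value that rises with it.
suitability = [rng.random() for _ in range(10_000)]
log_value = [10 + 1.5 * s + rng.gauss(0, 1) for s in suitability]

# The correlation uses every observation.
r_full = pearson_r(suitability, log_value)

# The scatter plot uses a 10% random sample drawn without replacement.
plot_idx = rng.sample(range(len(suitability)), k=len(suitability) // 10)
plot_x = [suitability[i] for i in plot_idx]
plot_y = [log_value[i] for i in plot_idx]
```

Sampling indices rather than values keeps each x paired with its own y in the plotted subset.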

Land Value and Suitability

Overall

There is a slight linear relationship between suitability and Logged Land Value, though it does appear to taper off after suitability reaches 0.5.

## The overall correlation is 0.206 and is statistically significant with a p value of 0

Separating out by land use does not appear to increase the correlation in any meaningful way.

## Correlation for Commercial/Retail is 0.061 and is statistically significant with a p value of 0 
## Correlation for Industrial/Manufacturing is 0.206 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.16 and is statistically significant with a p value of 0 
## Correlation for Office is -0.034 and is statistically significant with a p value of 0 
## Correlation for Other is 0.221 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.146 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.303 and is statistically significant with a p value of 0
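A per-category breakdown like the one above can be sketched as a group-then-correlate step. This is an illustrative outline only: the rows are made up, the Pearson coefficient is an assumption, and returning `None` for groups that are too small or have no variance is a design choice of this sketch rather than something the report describes.

```python
import math
from collections import defaultdict

def pearson_r(pairs):
    """Pearson r for a list of (x, y) pairs; None when it is undefined."""
    n = len(pairs)
    if n < 3:
        return None
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    if sxx == 0 or syy == 0:
        return None
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    return sxy / math.sqrt(sxx * syy)

def correlation_by_group(rows):
    """rows: iterable of (group, x, y) tuples. Returns {group: r or None}."""
    groups = defaultdict(list)
    for group, x, y in rows:
        groups[group].append((x, y))
    return {group: pearson_r(pairs) for group, pairs in groups.items()}

# Hypothetical parcels: (land use, suitability, logged land value).
rows = [
    ("Multifamily", 0.2, 9.0), ("Multifamily", 0.4, 10.0),
    ("Multifamily", 0.6, 11.0), ("Multifamily", 0.8, 12.0),
    ("Office", 0.1, 11.0), ("Office", 0.5, 10.8),
    ("Office", 0.9, 11.1), ("Office", 0.3, 10.9),
    ("Other", 0.7, 10.2),
]
by_land_use = correlation_by_group(rows)
```

The same function applies unchanged to Place Type or Intensity by swapping the grouping column.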

Separating out by Place Type is similarly unhelpful, except for Rural, which shows a slightly stronger relationship than the overall data.

## Correlation for Commercial is -0.069 and is statistically significant with a p value of 0 
## Correlation for Residential is 0.245 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.126 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.153 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.107 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.17 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.321 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.128 and is statistically significant with a p value of 0

Separating out by Intensity does not appear to increase the correlation in any meaningful way.

## Correlation for intensity 0 is 0.319 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.224 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.158 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.248 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.281 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.222 and is statistically significant with a p value of 0

Per Acre

Normalizing the Land Value to Value per Acre offers a significant improvement to the overall correlation, though it still appears to taper off after suitability reaches 0.5. This plateau suggests that beyond a certain point, increases in suitability no longer relate to increased value and other factors take over.

## The overall correlation is 0.4 and is statistically significant with a p value of 0

Examining Value per Acre in the context of land use adds important context. Commercial and Office uses appear to have weaker relationships than Multifamily and Other land uses. Single-family rises steadily until a suitability of around 0.4, then plateaus. Overall, Office uses have the weakest relationship between Value and Suitability.

## Correlation for Commercial/Retail is 0.247 and is statistically significant with a p value of 0 
## Correlation for Industrial/Manufacturing is 0.391 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.559 and is statistically significant with a p value of 0 
## Correlation for Office is 0.077 and is statistically significant with a p value of 0 
## Correlation for Other is 0.581 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.347 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.444 and is statistically significant with a p value of 0

Separating out by Place Type does not appear to increase the correlation in any meaningful way.

## Correlation for Commercial is -0.069 and is statistically significant with a p value of 0 
## Correlation for Residential is 0.245 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.126 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.153 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.107 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.17 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.321 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.128 and is statistically significant with a p value of 0

Separating out by Intensity does show that lower intensities seem to have a stronger relationship between suitability and land value. It's possible that this is caused by a desire to develop less intense but highly suitable areas.

## Correlation for intensity 0 is 0.503 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.393 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.301 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.303 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.308 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.213 and is statistically significant with a p value of 0

Just Value and Suitability

Overall

Just Value and Suitability appear to have an overall pattern that is similar to Land Value and Suitability with values tapering off around 0.5 suitability.

## The overall correlation is 0.208 and is statistically significant with a p value of 0

Separating by land uses offers no meaningful changes to this relationship.

## Correlation for Commercial/Retail is 0.011 but is statistically insignificant with a p value of 0.06 
## Correlation for Industrial/Manufacturing is 0.131 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.086 and is statistically significant with a p value of 0 
## Correlation for Office is -0.032 and is statistically significant with a p value of 0 
## Correlation for Other is 0.156 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.065 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.304 and is statistically significant with a p value of 0

Separating out by Place Type offers no meaningful changes to this relationship.

## Correlation for Commercial is -0.008 but is statistically insignificant with a p value of 0.401 
## Correlation for Residential is 0.233 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.147 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.122 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.041 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.091 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.267 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.047 and is statistically significant with a p value of 0.016

Separating out by Intensity offers no meaningful changes to this relationship.

## Correlation for intensity 0 is 0.257 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.228 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.159 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.232 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.208 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.096 and is statistically significant with a p value of 0

Per Acre

Just like Land Value, normalizing Just Value by acre increases the strength of the correlation (though not as significantly as for Land Value). The pattern remains the same, with values plateauing after suitability reaches 0.5.

## The overall correlation is 0.366 and is statistically significant with a p value of 0

Separating by land use also shows a similar pattern to Land Value, with Commercial and Office uses having the weakest relationship and Multifamily and Other having the strongest. Industrial uses show the most noticeable change, with the Just Value relationship being noticeably weaker (0.277, down from 0.391 for Land Value).

## Correlation for Commercial/Retail is 0.204 
## Correlation for Industrial/Manufacturing is 0.277 
## Correlation for Multifamily is 0.555 
## Correlation for Office is 0.086 
## Correlation for Other is 0.56 
## Correlation for Single-family is 0.292 
## Correlation for Vacant/Undeveloped is 0.445

Separating out by Place Type offers no meaningful changes to this relationship.

## Correlation for Commercial is -0.008 but is statistically insignificant with a p value of 0.401 
## Correlation for Residential is 0.233 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.147 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.122 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.041 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.091 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.267 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.047 and is statistically significant with a p value of 0.016

Separating out by Intensity reveals a similar pattern to Land Value, with lower intensities having a stronger correlation. Again, this is likely because less developed but suitable parcels are prized by developers.

## Correlation for intensity 0 is 0.461 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.37 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.279 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.274 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.236 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.093 and is statistically significant with a p value of 0

Land Value and Accessibility

Overall

Overall, the relationship between Land Value and Accessibility is slightly stronger than that between Land Value and Suitability. Unlike with suitability, land value plateaus around 20,000 accessible jobs and then begins to decrease around 50,000 jobs, suggesting the relationship is curvilinear rather than linear.

## The overall correlation is 0.31 and is statistically significant with a p value of 0

Adding land use context does not seem to meaningfully change the relationship.

## Correlation for Commercial/Retail is 0.098 and is statistically significant with a p value of 0 
## Correlation for Industrial/Manufacturing is 0.359 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.392 and is statistically significant with a p value of 0 
## Correlation for Office is -0.118 and is statistically significant with a p value of 0 
## Correlation for Other is 0.279 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.293 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.397 and is statistically significant with a p value of 0

Separating out by Place Type does not seem to meaningfully change the relationship.

## Correlation for Commercial is -0.168 and is statistically significant with a p value of 0 
## Correlation for Residential is 0.353 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.232 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.28 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.286 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.23 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.369 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.187 and is statistically significant with a p value of 0

Separating out by Intensity shows that the highest and lowest intensities both have stronger relationships than middling intensities. The explanation for lower intensities is likely the same as before, while parcels that are both suitable for development and highly developed likely attracted the most expensive developments, raising the land value.

## Correlation for intensity 0 is 0.366 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.25 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.21 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.332 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.362 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.418 and is statistically significant with a p value of 0

Per Acre

Normalizing by acre slightly strengthens the relationship and brings back the plateau pattern seen in previous charts, suggesting a non-linear relationship.

## The overall correlation is 0.31 and is statistically significant with a p value of 0

The land use patterns are comparable to the suitability patterns seen previously, with multifamily having the most linear relationship and office having the least.

## Correlation for Commercial/Retail is 0.243 and is statistically significant with a p value of 0 
## Correlation for Industrial/Manufacturing is 0.409 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.582 and is statistically significant with a p value of 0 
## Correlation for Office is -0.085 and is statistically significant with a p value of 0 
## Correlation for Other is 0.479 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.406 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.488 and is statistically significant with a p value of 0

Separating out by Place Type does not seem to meaningfully change the relationship.

## Correlation for Commercial is -0.168 and is statistically significant with a p value of 0 
## Correlation for Residential is 0.353 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.232 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.28 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.286 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.23 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.369 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.187 and is statistically significant with a p value of 0

Separating out by Intensity reveals the same pattern: lower intensities have a stronger relationship between accessibility and land value.

## Correlation for intensity 0 is 0.563 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.355 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.349 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.392 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.403 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.419 and is statistically significant with a p value of 0

Just Value and Accessibility

Overall

Just Value and Accessibility have a pattern comparable to that of Land Value and Accessibility, albeit slightly weaker.

## The overall correlation is 0.269 and is statistically significant with a p value of 0

Land use appears to provide little additional context, except for the Vacant land use, which has a noticeably stronger relationship than any other land use.

## Correlation for Commercial/Retail is 0.07 and is statistically significant with a p value of 0 
## Correlation for Industrial/Manufacturing is 0.276 and is statistically significant with a p value of 0 
## Correlation for Multifamily is 0.213 and is statistically significant with a p value of 0 
## Correlation for Office is -0.113 and is statistically significant with a p value of 0 
## Correlation for Other is 0.165 and is statistically significant with a p value of 0 
## Correlation for Single-family is 0.188 and is statistically significant with a p value of 0 
## Correlation for Vacant/Undeveloped is 0.394 and is statistically significant with a p value of 0

Separating out by Place Type offers no meaningful change to the relationship.

## Correlation for Commercial is -0.029 and is statistically significant with a p value of 0.004 
## Correlation for Residential is 0.288 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.225 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.233 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.203 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.155 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.334 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.08 and is statistically significant with a p value of 0

Separating out by Intensity again reveals that lower intensities have a stronger relationship, though the relationship between Just Value and Accessibility is weaker than that for Land Value, likely because more developed areas have more buildings and therefore higher Just Values.

## Correlation for intensity 0 is 0.322 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.242 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.193 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.284 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.276 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.281 and is statistically significant with a p value of 0

Per Acre

As before, normalizing by acre offers modest improvements to the relationship, but continues to suggest a curvilinear or logarithmic relationship rather than a purely linear one.

## The overall correlation is 0.371 and is statistically significant with a p value of 0

The land use patterns are comparable to before: Vacant/Undeveloped, Multifamily, and Other are the strongest, and Office and Commercial/Retail are the weakest.

## Correlation for Commercial/Retail is 0.213 
## Correlation for Industrial/Manufacturing is 0.268 
## Correlation for Multifamily is 0.473 
## Correlation for Office is -0.067 
## Correlation for Other is 0.415 
## Correlation for Single-family is 0.314 
## Correlation for Vacant/Undeveloped is 0.485

Separating out by Place Type does not seem to meaningfully change the relationship.

## Correlation for Commercial is -0.168 and is statistically significant with a p value of 0 
## Correlation for Residential is 0.353 and is statistically significant with a p value of 0 
## Correlation for Mixed is 0.232 and is statistically significant with a p value of 0 
## Correlation for High Mixed is 0.28 and is statistically significant with a p value of 0 
## Correlation for Office/Institution is 0.286 and is statistically significant with a p value of 0 
## Correlation for Industrial is 0.23 and is statistically significant with a p value of 0 
## Correlation for Rural is 0.369 and is statistically significant with a p value of 0 
## Correlation for Developed OS is 0.187 and is statistically significant with a p value of 0

Separating out by Intensity reveals the same pattern as before, though stronger than in the non-normalized Just Value relationship.

## Correlation for intensity 0 is 0.536 and is statistically significant with a p value of 0 
## Correlation for intensity 1 is 0.331 and is statistically significant with a p value of 0 
## Correlation for intensity 2 is 0.311 and is statistically significant with a p value of 0 
## Correlation for intensity 3 is 0.333 and is statistically significant with a p value of 0 
## Correlation for intensity 4 is 0.316 and is statistically significant with a p value of 0 
## Correlation for intensity 5 is 0.283 and is statistically significant with a p value of 0

Conclusion

It's clear there is a relationship between Suitability and a parcel's value, especially land value, which had stronger relationships overall. However, this relationship is not purely linear and is likely curvilinear or logarithmic in nature. This suggests Suitability would be a strong predictor of value in a model, though that model would need to be non-linear.

It is possible that there is an underlying spatial component to the relationship that is unaccounted for in this analysis. Adding a context classification could be a worthwhile exercise, as urban and rural areas may have differing relationships between suitability and value.