library(ggplot2)   # plots and visualizations
library(readr)     # reading data
library(dplyr)     # data manipulation
library(ggpubr)    # ggarrange() for combining plots
library(tidyverse) # collection of data manipulation and visualization packages (includes ggplot2, readr, dplyr)
library(reshape2)  # melt(): reshape data from wide to long format
library(plotly)    # interactive visualizations
library(ggalt)     # geom_encircle()
library(sf)        # reading shapefiles and working with simple features (read_sf, st_as_sf)
library(sp)        # spatial classes used by spgwr
library(spgwr)     # geographically weighted regression (ggwr.sel, ggwr)
library(spdep)     # spatial dependence analysis
library(gridExtra) # arranging multiple plots
library(viridis)   # color-blind-friendly color scales
library(car)       # vif()

Data

Data Overview

- Source: Kaggle (https://www.kaggle.com/datasets/shivachandel/kc-house-data/data)
- Variables: 21 (id, date, price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade, sqft_above, sqft_basement, yr_built, yr_renovated, zipcode, lat, long, sqft_living15, sqft_lot15)
- Observations: 21,613
- Period: 2 May 2014 to 27 May 2015
- Geographic coverage: King County, including Seattle

house_data <- read.csv("kc_house_data.csv",header = TRUE, sep = ",")

Structure of Dataset

The dataset consists of 21,613 observations and 21 variables. These variables include details like id, date, price, bedrooms, bathrooms, sqft_living, sqft_lot, floors, waterfront, view, condition, grade, sqft_above, sqft_basement, yr_built, yr_renovated, zipcode, lat, long, sqft_living15, and sqft_lot15.

The id variable serves as a unique identifier for each observation. All variables are numeric or integer types, except for the date variable, which is currently stored as a character type. Converting the date variable to a date type will be beneficial for time-related analyses.
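A minimal sketch of that conversion, assuming the `YYYYMMDDT000000` layout shown in the `str()` output below. The time component in the rows shown is always `T000000`, so only the first eight characters are parsed. The three strings are copied from that output; the full-data line is left commented out because this analysis later drops `date`.

```r
# Example strings copied from the str(house_data) output
raw_dates <- c("20141013T000000", "20141209T000000", "20150225T000000")

# Keep the YYYYMMDD part and parse it as a Date
sale_date <- as.Date(substr(raw_dates, 1, 8), format = "%Y%m%d")

print(sale_date)
# [1] "2014-10-13" "2014-12-09" "2015-02-25"

# Applied to the full dataset, the same idea would be (not run here):
# house_data$date <- as.Date(substr(house_data$date, 1, 8), format = "%Y%m%d")
```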

It’s important to recognize that although all variables are stored as numbers, some are categorical in meaning (for example waterfront, view, condition, grade, and zipcode). For these variables the numeric codes denote categories or levels rather than quantities. This distinction is crucial for accurate data interpretation and analysis.
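As an illustration only, the following uses a small hypothetical data frame rather than the real data. Converting such codes to factors makes `lm()` estimate one coefficient per non-reference level instead of a single slope across the codes. The analysis in this document keeps the codes numeric; the sketch just shows what the alternative would look like.

```r
# Hypothetical toy data using the same coding as the KC dataset
toy <- data.frame(
  price      = c(221900, 538000, 180000, 604000, 510000, 350000),
  waterfront = c(0L, 0L, 1L, 0L, 1L, 0L),
  condition  = c(3L, 3L, 4L, 5L, 3L, 4L)
)

# Treat the numeric codes as categories
toy$waterfront <- factor(toy$waterfront, levels = c(0, 1), labels = c("no", "yes"))
toy$condition  <- factor(toy$condition)

# With factors, the design matrix gets one dummy column per non-reference level
colnames(model.matrix(price ~ waterfront + condition, data = toy))
# "(Intercept)" "waterfrontyes" "condition4" "condition5"
```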

str(house_data)
## 'data.frame':    21613 obs. of  21 variables:
##  $ id           : num  7.13e+09 6.41e+09 5.63e+09 2.49e+09 1.95e+09 ...
##  $ date         : chr  "20141013T000000" "20141209T000000" "20150225T000000" "20141209T000000" ...
##  $ price        : num  221900 538000 180000 604000 510000 ...
##  $ bedrooms     : int  3 3 2 4 3 4 3 3 3 3 ...
##  $ bathrooms    : num  1 2.25 1 3 2 4.5 2.25 1.5 1 2.5 ...
##  $ sqft_living  : int  1180 2570 770 1960 1680 5420 1715 1060 1780 1890 ...
##  $ sqft_lot     : int  5650 7242 10000 5000 8080 101930 6819 9711 7470 6560 ...
##  $ floors       : num  1 2 1 1 1 1 2 1 1 2 ...
##  $ waterfront   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ condition    : int  3 3 3 5 3 3 3 3 3 3 ...
##  $ grade        : int  7 7 6 7 8 11 7 7 7 7 ...
##  $ sqft_above   : int  1180 2170 770 1050 1680 3890 1715 1060 1050 1890 ...
##  $ sqft_basement: int  0 400 0 910 0 1530 0 0 730 0 ...
##  $ yr_built     : int  1955 1951 1933 1965 1987 2001 1995 1963 1960 2003 ...
##  $ yr_renovated : int  0 1991 0 0 0 0 0 0 0 0 ...
##  $ zipcode      : int  98178 98125 98028 98136 98074 98053 98003 98198 98146 98038 ...
##  $ lat          : num  47.5 47.7 47.7 47.5 47.6 ...
##  $ long         : num  -122 -122 -122 -122 -122 ...
##  $ sqft_living15: int  1340 1690 2720 1360 1800 4760 2238 1650 1780 2390 ...
##  $ sqft_lot15   : int  5650 7639 8062 5000 7503 101930 6819 9711 8113 7570 ...

Features & Summary Statistics

id - The id variable represents a unique identifier for each home sold.

date - The date variable contains the date of the house sale and spans from May 2, 2014, to May 27, 2015.

price:
- The price variable is the dependent variable and covers a wide range, from a minimum of $75,000 to a maximum of $7,700,000.
- The median house price is $450,000 and the mean is $540,088. The mean lies above the median, which indicates a right-skewed distribution.

bedrooms and bathrooms:
- The number of bedrooms ranges from 0 to 33, with a mean of approximately 3.37.
- The number of bathrooms ranges from 0 to 8, with a mean of approximately 2.12. Fractional values occur because 0.5 counts a room with a toilet but no shower.

sqft_living and sqft_lot:
- These variables describe the size of the property.
- sqft_living is the square footage of the home's interior living space. It ranges from 290 to 13,540 square feet, with a mean of 2,080.
- sqft_lot is the lot size. It ranges from 520 to 1,651,359 square feet, with a mean of 15,107.

floors:
- The floors variable gives the number of levels in the house. It ranges from 1 to 3.5, with a median of 1.5 and a mean of approximately 1.494, so at least half of the houses have 1 or 1.5 floors.
- Fractional values such as 1.5 suggest a partial upper level, for example a half story or a split-level design.

waterfront:
- The waterfront variable is a dummy variable: 0 means the property has no waterfront view and 1 means it has one. Most values are 0; the mean of 0.0075 means that about 0.75% of houses have a waterfront view.

view and condition:
- view is the overall view rating (0 to 4), with a mean of 0.23.
- condition is the overall condition rating (1 to 5), with a mean of 3.41.

grade:
- grade is the overall grade given to the housing unit and ranges from 1 to 13. Grades 1-3 fall short of building construction and design standards, 7 indicates an average level of construction and design, and 11-13 indicate a high-quality level of construction and design.

sqft_above and sqft_basement:
- sqft_above and sqft_basement give the square footage above ground level and below ground level (in the basement), respectively.

yr_built and yr_renovated:
- Houses were built between 1900 and 2015 (yr_built), mostly in the mid to late 20th century (quartiles 1951, 1975, and 1997).
- yr_renovated gives the year of the last renovation, and 0 means no recorded renovation. At least 75% of the values are 0 (the third quartile is 0), so its mean of 84.4 is not meaningful as a year.

Geographical Information (lat, long, zipcode):
- lat and long give the latitude and longitude of each house, respectively.
- zipcode is the ZIP code of the house location.

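sqft_living15 and sqft_lot15:
- sqft_living15 and sqft_lot15 are commonly described as the interior living area and lot size of the 15 nearest neighboring houses. Some sources instead describe them as the house's own living area and lot size as of 2015, which would reflect renovations or other changes. We could not confirm either definition in the original data source.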

These summary statistics provide an overview of the distribution and characteristics of each numeric variable in the dataset, with a specific focus on understanding the relationships with the dependent variable, ‘price.’

summary(house_data)
##        id                date               price            bedrooms     
##  Min.   :1.000e+06   Length:21613       Min.   :  75000   Min.   : 0.000  
##  1st Qu.:2.123e+09   Class :character   1st Qu.: 321950   1st Qu.: 3.000  
##  Median :3.905e+09   Mode  :character   Median : 450000   Median : 3.000  
##  Mean   :4.580e+09                      Mean   : 540088   Mean   : 3.371  
##  3rd Qu.:7.309e+09                      3rd Qu.: 645000   3rd Qu.: 4.000  
##  Max.   :9.900e+09                      Max.   :7700000   Max.   :33.000  
##                                                                           
##    bathrooms      sqft_living       sqft_lot           floors     
##  Min.   :0.000   Min.   :  290   Min.   :    520   Min.   :1.000  
##  1st Qu.:1.750   1st Qu.: 1427   1st Qu.:   5040   1st Qu.:1.000  
##  Median :2.250   Median : 1910   Median :   7618   Median :1.500  
##  Mean   :2.115   Mean   : 2080   Mean   :  15107   Mean   :1.494  
##  3rd Qu.:2.500   3rd Qu.: 2550   3rd Qu.:  10688   3rd Qu.:2.000  
##  Max.   :8.000   Max.   :13540   Max.   :1651359   Max.   :3.500  
##                                                                   
##    waterfront            view          condition         grade       
##  Min.   :0.000000   Min.   :0.0000   Min.   :1.000   Min.   : 1.000  
##  1st Qu.:0.000000   1st Qu.:0.0000   1st Qu.:3.000   1st Qu.: 7.000  
##  Median :0.000000   Median :0.0000   Median :3.000   Median : 7.000  
##  Mean   :0.007542   Mean   :0.2343   Mean   :3.409   Mean   : 7.657  
##  3rd Qu.:0.000000   3rd Qu.:0.0000   3rd Qu.:4.000   3rd Qu.: 8.000  
##  Max.   :1.000000   Max.   :4.0000   Max.   :5.000   Max.   :13.000  
##                                                                      
##    sqft_above   sqft_basement       yr_built     yr_renovated   
##  Min.   : 290   Min.   :   0.0   Min.   :1900   Min.   :   0.0  
##  1st Qu.:1190   1st Qu.:   0.0   1st Qu.:1951   1st Qu.:   0.0  
##  Median :1560   Median :   0.0   Median :1975   Median :   0.0  
##  Mean   :1788   Mean   : 291.5   Mean   :1971   Mean   :  84.4  
##  3rd Qu.:2210   3rd Qu.: 560.0   3rd Qu.:1997   3rd Qu.:   0.0  
##  Max.   :9410   Max.   :4820.0   Max.   :2015   Max.   :2015.0  
##  NA's   :2                                                      
##     zipcode           lat             long        sqft_living15 
##  Min.   :98001   Min.   :47.16   Min.   :-122.5   Min.   : 399  
##  1st Qu.:98033   1st Qu.:47.47   1st Qu.:-122.3   1st Qu.:1490  
##  Median :98065   Median :47.57   Median :-122.2   Median :1840  
##  Mean   :98078   Mean   :47.56   Mean   :-122.2   Mean   :1987  
##  3rd Qu.:98118   3rd Qu.:47.68   3rd Qu.:-122.1   3rd Qu.:2360  
##  Max.   :98199   Max.   :47.78   Max.   :-121.3   Max.   :6210  
##                                                                 
##    sqft_lot15    
##  Min.   :   651  
##  1st Qu.:  5100  
##  Median :  7620  
##  Mean   : 12768  
##  3rd Qu.: 10083  
##  Max.   :871200  
## 
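Two of the observations above can be checked directly: how common each floor value is, and the zero coding in yr_renovated. The sketch below uses small hypothetical vectors. On the real data you would pass `house_data$floors` and `house_data$yr_renovated` instead.

```r
# Hypothetical values in the same coding as the KC dataset
floors       <- c(1, 1, 1.5, 2, 1, 2, 1.5, 3, 1, 2)
yr_renovated <- c(0, 1991, 0, 0, 2005, 0, 0, 0, 0, 0)

# Share of houses at each floor value
round(prop.table(table(floors)), 2)

# 0 means "no recorded renovation", so averaging the year over all rows is misleading
renovated <- yr_renovated > 0
mean(renovated)                 # 0.2: share of renovated houses
mean(yr_renovated[renovated])   # 1998: mean renovation year among renovated houses only
```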

Data Pre-Processing

Checking Duplicates

The analysis revealed two points about duplicates. First, no row is fully duplicated across all variables. Second, 177 rows repeat an id that already appears earlier in the data, which indicates that some homes were sold more than once. To avoid giving repeatedly sold houses extra weight in the models, we kept only the first occurrence of each id and removed these 177 rows.

house_data[duplicated(house_data), ]
##  [1] id            date          price         bedrooms      bathrooms    
##  [6] sqft_living   sqft_lot      floors        waterfront    view         
## [11] condition     grade         sqft_above    sqft_basement yr_built     
## [16] yr_renovated  zipcode       lat           long          sqft_living15
## [21] sqft_lot15   
## <0 rows> (or 0-length row.names)
duplicates <- house_data[duplicated(house_data$id), ]
dim(duplicates)
## [1] 177  21
# Remove rows with duplicate 'id' values
data_manipulated <- house_data[!duplicated(house_data$id), ]
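`duplicated()` flags every occurrence after the first. The code above therefore keeps the first row for each id in file order, which is not necessarily the earliest sale. The sketch below uses a hypothetical id vector to contrast the default with `fromLast = TRUE`, which keeps the last occurrence instead. To keep the most recent sale, one could first order the rows by the parsed sale date.

```r
ids <- c(101, 202, 101, 303, 101)

duplicated(ids)                    # FALSE FALSE  TRUE FALSE  TRUE
duplicated(ids, fromLast = TRUE)   #  TRUE FALSE  TRUE FALSE FALSE

# Rows kept by each rule
which(!duplicated(ids))                    # 1 2 4
which(!duplicated(ids, fromLast = TRUE))   # 2 4 5
```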

Dropping Unrelated Features

We drop id because it is only a unique identifier with no predictive value. We drop date because this analysis does not model time effects.

# Drop less important variables
data_manipulated <- data_manipulated %>%
  select(-id, -date)

Handling Missing Values

Most variables are complete, but sqft_above has two missing values. Given the size of the dataset, we removed these two observations.

# Remove observations with missing values in 'sqft_above'
data_manipulated <- data_manipulated[complete.cases(data_manipulated$sqft_above), ]

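Handling Zero Values in Bedrooms & Bathrooms

Some houses are recorded with 0 bedrooms or 0 bathrooms (the minimum in the summary above). We treat these zeros as likely recording errors and replace them with the median of the non-zero values.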

# Replace zero values in bedrooms with the median of the non-zero values
data_manipulated <- data_manipulated %>%
  mutate(bedrooms = ifelse(bedrooms == 0, median(bedrooms[bedrooms != 0], na.rm = TRUE), bedrooms))

# Replace zero values in bathrooms with the median of the non-zero values
data_manipulated <- data_manipulated %>%
  mutate(bathrooms = ifelse(bathrooms == 0, median(bathrooms[bathrooms != 0], na.rm = TRUE), bathrooms))

Handling the Outlier in Bedrooms

Our exploratory data analysis (EDA) in previous studies identified an anomaly in the bedrooms variable: a value of 33, far above the remaining values (the maximum after this fix is 11). We replaced this value with the median number of bedrooms to keep the data consistent.

# Find indices where 'bedrooms' is 33
index_bedrooms_33 <- which(data_manipulated$bedrooms == 33)

# Replace the 'bedrooms' value of 33 with the median value of bedrooms
data_manipulated$bedrooms[index_bedrooms_33] <- median(data_manipulated$bedrooms, na.rm = TRUE)

Scaling the Data

We standardized the continuous variables (price, sqft_living, sqft_living15, sqft_lot, sqft_lot15, sqft_above, and sqft_basement) to z-scores (mean 0, standard deviation 1) with scale(). This puts the size variables on a common scale, which can improve numerical stability. Because price is standardized as well, their coefficients can be read as the change in price, in standard deviations, per standard-deviation change in the predictor. We left the remaining variables (discrete counts, categorical codes, years, and geographic variables) unchanged so that their coefficients keep their original units.

# Define variables according to their types
cont_vars <- c("price", "sqft_living", "sqft_living15", "sqft_lot", "sqft_lot15", "sqft_above", "sqft_basement")
disc_vars <- c("bedrooms", "floors", "bathrooms")
cat_vars <- c("waterfront", "view", "condition", "grade")
date_vars <- c("yr_built", "yr_renovated")  # 'date' itself was already dropped above
geo_vars <- c("lat", "long", "zipcode")

# Scale the continuous numeric variables
scaled_cont_vars <- data_manipulated %>%
  select(all_of(cont_vars)) %>%
  mutate(across(everything(), scale))  # scale() returns one-column matrices, hence the ".V1" suffixes in the summary below

# Combine scaled continuous variables with the rest of the original dataset
data_scaled <- data_manipulated %>%
  select(-all_of(cont_vars)) %>%
  bind_cols(scaled_cont_vars)

# Verify the scaling
summary(data_scaled)
##     bedrooms        bathrooms         floors        waterfront      
##  Min.   : 1.000   Min.   :0.500   Min.   :1.000   Min.   :0.000000  
##  1st Qu.: 3.000   1st Qu.:1.750   1st Qu.:1.000   1st Qu.:0.000000  
##  Median : 3.000   Median :2.250   Median :1.500   Median :0.000000  
##  Mean   : 3.372   Mean   :2.118   Mean   :1.496   Mean   :0.007605  
##  3rd Qu.: 4.000   3rd Qu.:2.500   3rd Qu.:2.000   3rd Qu.:0.000000  
##  Max.   :11.000   Max.   :8.000   Max.   :3.500   Max.   :1.000000  
##       view          condition        grade           yr_built   
##  Min.   :0.0000   Min.   :1.00   Min.   : 1.000   Min.   :1900  
##  1st Qu.:0.0000   1st Qu.:3.00   1st Qu.: 7.000   1st Qu.:1952  
##  Median :0.0000   Median :3.00   Median : 7.000   Median :1975  
##  Mean   :0.2351   Mean   :3.41   Mean   : 7.662   Mean   :1971  
##  3rd Qu.:0.0000   3rd Qu.:4.00   3rd Qu.: 8.000   3rd Qu.:1997  
##  Max.   :4.0000   Max.   :5.00   Max.   :13.000   Max.   :2015  
##   yr_renovated        zipcode           lat             long       
##  Min.   :   0.00   Min.   :98001   Min.   :47.16   Min.   :-122.5  
##  1st Qu.:   0.00   1st Qu.:98033   1st Qu.:47.47   1st Qu.:-122.3  
##  Median :   0.00   Median :98065   Median :47.57   Median :-122.2  
##  Mean   :  84.74   Mean   :98078   Mean   :47.56   Mean   :-122.2  
##  3rd Qu.:   0.00   3rd Qu.:98117   3rd Qu.:47.68   3rd Qu.:-122.1  
##  Max.   :2015.00   Max.   :98199   Max.   :47.78   Max.   :-121.3  
##       price.V1         sqft_living.V1     sqft_living15.V1      sqft_lot.V1    
##  Min.   :-1.266031   Min.   :-1.950390   Min.   :-2.317726   Min.   :-0.35186  
##  1st Qu.:-0.594161   1st Qu.:-0.710084   1st Qu.:-0.726708   1st Qu.:-0.24305  
##  Median :-0.246192   Median :-0.176970   Median :-0.216299   Median :-0.18109  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.00000  
##  3rd Qu.: 0.284124   3rd Qu.: 0.508462   3rd Qu.: 0.556606   3rd Qu.:-0.10688  
##  Max.   :19.470684   Max.   :12.465446   Max.   : 6.156522   Max.   :39.38863  
##     sqft_lot15.V1       sqft_above.V1     sqft_basement.V1  
##  Min.   :-0.443280   Min.   :-1.810435   Min.   :-0.658904  
##  1st Qu.:-0.280770   1st Qu.:-0.712811   1st Qu.:-0.658904  
##  Median :-0.188720   Median :-0.278587   Median :-0.658904  
##  Mean   : 0.000000   Mean   : 0.000000   Mean   : 0.000000  
##  3rd Qu.:-0.098580   3rd Qu.: 0.517492   3rd Qu.: 0.606080  
##  Max.   :31.355656   Max.   : 9.189922   Max.   :10.228987
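`scale()` computes the z-score (x - mean(x)) / sd(x). The sketch below uses a hypothetical price vector to confirm this. It also shows that `scale()` returns a one-column matrix, which explains why the summary above labels the scaled columns `price.V1`, `sqft_living.V1`, and so on; wrapping the result in `as.numeric()` drops that matrix structure.

```r
price <- c(221900, 538000, 180000, 604000, 510000)

z_scale  <- scale(price)                         # one-column matrix with attributes
z_manual <- (price - mean(price)) / sd(price)    # z-score by hand

all.equal(as.numeric(z_scale), z_manual)   # TRUE
dim(z_scale)                               # 5 1
round(c(mean = mean(z_manual), sd = sd(z_manual)), 10)

# Inside the pipeline above, plain numeric columns could be kept with:
# mutate(across(everything(), ~ as.numeric(scale(.x))))
```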

Modelling

We first created general_model with all variables.

general_model <- lm(price ~ ., data = data_scaled)

Collinearity

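sqft_basement is perfectly collinear with sqft_living and sqft_above. The alias() output below expresses it as an exact linear combination of the two, which is consistent with sqft_living = sqft_above + sqft_basement. As a result, lm() cannot estimate a separate coefficient for sqft_basement and reports it as NA. Since sqft_living is the more comprehensive measure, we removed sqft_basement from the model.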
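The sketch below uses hypothetical simulated values in which living area is exactly above-ground plus basement area. Under that identity, `lm()` reports the last of the three terms as `NA`, and `alias()` recovers the relationship (coefficient +1 on sqft_living and -1 on sqft_above).

```r
# Hypothetical simulated data with an exact identity between the three areas
set.seed(1)
n <- 50
sqft_above    <- round(runif(n, 800, 3000))
sqft_basement <- round(runif(n, 0, 1000))
sqft_living   <- sqft_above + sqft_basement          # exact identity
price         <- 150 * sqft_living + rnorm(n, sd = 20000)

fit <- lm(price ~ sqft_living + sqft_above + sqft_basement)

coef(fit)["sqft_basement"]   # NA: the term is aliased
alias(fit)$Complete          # sqft_basement expressed as sqft_living - sqft_above
```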

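Next, we calculated variance inflation factors (VIFs) to check whether any remaining variables still show high multicollinearity. Under the common rule of thumb (VIF > 4 or 5), such variables are candidates for removal as well.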
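For reference, the VIF of predictor j is 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The following base-R sketch applies that definition to hypothetical simulated predictors. For models with only numeric single-coefficient terms, `car::vif()` (used below) should give the same values.

```r
# Hypothetical simulated predictors: x2 is strongly related to x1, x3 is unrelated
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)
x3 <- rnorm(n)
predictors <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other predictors
vif_manual <- sapply(names(predictors), function(j) {
  others <- setdiff(names(predictors), j)
  r2 <- summary(lm(reformulate(others, response = j), data = predictors))$r.squared
  1 / (1 - r2)
})

round(vif_manual, 2)   # x1 and x2 inflated (around 10), x3 close to 1
```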

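The VIF results show high multicollinearity for sqft_living (VIF about 8.75) and sqft_above (about 6.97), which inflates the variance of their coefficient estimates. We therefore also removed sqft_above, again keeping sqft_living as the more comprehensive measure. After this step, the highest VIF is 5.09 (sqft_living), close to the threshold, and all other VIFs are below 3.4.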

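Finally, we defined general_model_reduced with an explicit formula. Note that this formula also leaves out sqft_lot15, even though its VIF (about 2.13) was below the threshold.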

# Checking for collinearity
collinear_vars <- alias(general_model)$Complete

print(collinear_vars)
##               (Intercept)    bedrooms       bathrooms      floors        
## sqft_basement              0              0              0              0
##               waterfront     view           condition      grade         
## sqft_basement              0              0              0              0
##               yr_built       yr_renovated   zipcode        lat           
## sqft_basement              0              0              0              0
##               long           sqft_living    sqft_living15  sqft_lot      
## sqft_basement              0   196979/94874              0              0
##               sqft_lot15     sqft_above    
## sqft_basement              0 -239387/127825
# Rebuild the general model without `sqft_basement`
general_model_reduced <- lm(price ~ . -sqft_basement, data = data_scaled)
# Calculate VIF for the new model
general_vif <- vif(general_model_reduced)
print(general_vif)
##      bedrooms     bathrooms        floors    waterfront          view 
##      1.712835      3.366588      2.011579      1.205482      1.436426 
##     condition         grade      yr_built  yr_renovated       zipcode 
##      1.253355      3.416502      2.433057      1.151969      1.663138 
##           lat          long   sqft_living sqft_living15      sqft_lot 
##      1.180312      1.826732      8.749416      2.974701      2.099874 
##    sqft_lot15    sqft_above 
##      2.133043      6.965209
# Rebuild the general model without `sqft_above`
general_model_reduced <- lm(price ~ . - sqft_basement - sqft_above, data = data_scaled)

# Calculate VIF for the new model
general_vif2 <- vif(general_model_reduced)
print(general_vif2)
##      bedrooms     bathrooms        floors    waterfront          view 
##      1.712611      3.285189      1.619245      1.203934      1.398059 
##     condition         grade      yr_built  yr_renovated       zipcode 
##      1.245115      3.345730      2.432563      1.151866      1.662303 
##           lat          long   sqft_living sqft_living15      sqft_lot 
##      1.164572      1.771062      5.092426      2.913130      2.097854 
##    sqft_lot15 
##      2.132471
# Final reduced general model with an explicit formula (sqft_lot15 is not included)
general_model_reduced <- lm(price ~ bedrooms + bathrooms + floors + waterfront + view +
                              condition + grade + yr_built + yr_renovated + zipcode +
                              lat + long + sqft_living + sqft_living15 + sqft_lot,
                            data = data_scaled)

Correlation Analysis

We then performed a correlation analysis to identify variables strongly correlated with price. Using a cutoff of |r| ≥ 0.5, price is correlated with bathrooms, sqft_living, grade, sqft_above, and sqft_living15.

However, sqft_living, sqft_above, and sqft_living15 are also highly correlated with each other. Based on the earlier VIF results, we dropped sqft_above and kept sqft_living15 in the model.

We then built highcor_model from the remaining four predictors and checked its VIFs. The highest is sqft_living at 4.05, around the lower threshold of 4; the others are between 2.4 and 2.8.

# Calculate correlation for all variables
cor_matrix <- cor(data_scaled, use = "complete.obs")

# Keep only the lower triangle (set the upper triangle to NA) for a clearer heatmap
cor_matrix[upper.tri(cor_matrix)] <- NA

# Reshape to long format for the heatmap; only values with |r| >= 0.5 are labeled below
melted_correlation <- melt(cor_matrix, na.rm = TRUE)

p1 <- ggplot(melted_correlation, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", 
                       midpoint = 0, limit = c(-1, 1), space = "Lab", 
                       name = "Correlation") +
  geom_text(aes(label = ifelse(abs(value) >= 0.5, round(value, 2), "")), vjust = 1,
            size = 2) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  coord_fixed()

p1
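The same 0.5 cutoff can also be applied programmatically to list the predictors whose absolute correlation with price reaches the threshold. The sketch below uses hypothetical simulated data. On the real data, this needs the full correlation matrix, because the code above sets the upper triangle of `cor_matrix` to NA.

```r
# Hypothetical simulated data: only sqft_living is strongly related to price
set.seed(7)
n   <- 300
toy <- data.frame(sqft_living = rnorm(n), bedrooms = rnorm(n), lat = rnorm(n))
toy$price <- 0.8 * toy$sqft_living + 0.2 * toy$bedrooms + rnorm(n, sd = 0.5)

cor_full <- cor(toy)                           # full matrix, no triangle masking
r_price  <- cor_full[, "price"]
r_price  <- r_price[names(r_price) != "price"]

# Predictors with |r| >= 0.5, ordered by absolute correlation
selected <- r_price[abs(r_price) >= 0.5]
selected <- selected[order(abs(selected), decreasing = TRUE)]
selected
```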

# Preliminary model to check VIF
highcor_model <- lm(price ~ bathrooms + sqft_living + grade + sqft_living15, data = data_scaled)

# Calculate VIF
vif(highcor_model)
##     bathrooms   sqft_living         grade sqft_living15 
##      2.447412      4.047698      2.800989      2.619964

OLS Results

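The highcor_model explains about 54% of the variability in house prices (R² = 0.537), which is not sufficient. All predictors are significant. sqft_living has the largest coefficient (0.495 standard deviations of price per standard deviation of living area). grade, which was left unscaled, adds about 0.277 standard deviations of price per grade level, and sqft_living15 has a small positive effect (0.022). Surprisingly, bathrooms have a negative coefficient (-0.099) when the other three predictors are held fixed. In general_model_reduced, however, the bathrooms coefficient is positive (0.110), so this relationship depends on which other variables are included and needs further investigation.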

To potentially improve the model, we compared it with general_model_reduced, which includes more variables. general_model_reduced explains about 70% of the variability in house prices (R² = 0.6995, adjusted R² = 0.6993), indicating a better fit. Besides the four predictors of highcor_model, it includes further significant predictors: bedrooms, floors, waterfront, view, condition, yr_built, yr_renovated, zipcode, lat, and long. sqft_lot is not significant (p = 0.32).

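The strong location effects in this model (t = 55.5 for lat and -15.7 for long) suggest that location matters. We therefore turned to spatial analysis to try to improve the model further.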

summary(highcor_model)
## 
## Call:
## lm(formula = price ~ bathrooms + sqft_living + grade + sqft_living15, 
##     data = data_scaled)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.6925 -0.3716 -0.0620  0.2744 13.1447 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -1.909080   0.050459 -37.835  < 2e-16 ***
## bathrooms     -0.099240   0.009459 -10.492  < 2e-16 ***
## sqft_living    0.495180   0.009349  52.967  < 2e-16 ***
## grade          0.276610   0.006623  41.767  < 2e-16 ***
## sqft_living15  0.022181   0.007521   2.949  0.00319 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6803 on 21429 degrees of freedom
## Multiple R-squared:  0.5373, Adjusted R-squared:  0.5372 
## F-statistic:  6221 on 4 and 21429 DF,  p-value: < 2.2e-16
summary(general_model_reduced)
## 
## Call:
## lm(formula = price ~ bedrooms + bathrooms + floors + waterfront + 
##     view + condition + grade + yr_built + yr_renovated + zipcode + 
##     lat + long + sqft_living + sqft_living15 + sqft_lot, data = data_scaled)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4726 -0.2708 -0.0249  0.2124 11.7798 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    2.562e+01  7.937e+00   3.229  0.00125 ** 
## bedrooms      -1.064e-01  5.421e-03 -19.628  < 2e-16 ***
## bathrooms      1.098e-01  8.830e-03  12.440  < 2e-16 ***
## floors         5.007e-02  8.816e-03   5.680 1.36e-08 ***
## waterfront     1.589e+00  4.730e-02  33.588  < 2e-16 ***
## view           1.366e-01  5.773e-03  23.668  < 2e-16 ***
## condition      6.911e-02  6.427e-03  10.754  < 2e-16 ***
## grade          2.670e-01  5.832e-03  45.783  < 2e-16 ***
## yr_built      -7.177e-03  1.988e-04 -36.103  < 2e-16 ***
## yr_renovated   5.343e-05  9.988e-06   5.350 8.91e-08 ***
## zipcode       -1.616e-03  9.032e-05 -17.894  < 2e-16 ***
## lat            1.618e+00  2.915e-02  55.488  < 2e-16 ***
## long          -5.549e-01  3.525e-02 -15.742  < 2e-16 ***
## sqft_living    4.281e-01  8.440e-03  50.720  < 2e-16 ***
## sqft_living15  4.447e-02  6.384e-03   6.966 3.36e-12 ***
## sqft_lot      -3.888e-03  3.932e-03  -0.989  0.32280    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5484 on 21418 degrees of freedom
## Multiple R-squared:  0.6995, Adjusted R-squared:  0.6993 
## F-statistic:  3324 on 15 and 21418 DF,  p-value: < 2.2e-16
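Every predictor in highcor_model also appears in general_model_reduced, and both models are fitted to the same rows of data_scaled. The two models are therefore nested and can be compared with a partial F-test via `anova()`, in addition to comparing adjusted R². The sketch below shows this on hypothetical simulated data, with `small` and `large` standing in for the two models.

```r
# Hypothetical simulated data where x3 genuinely adds information
set.seed(123)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + 0.4 * x3 + rnorm(n)
toy <- data.frame(y, x1, x2, x3)

small <- lm(y ~ x1 + x2, data = toy)        # plays the role of highcor_model
large <- lm(y ~ x1 + x2 + x3, data = toy)   # plays the role of general_model_reduced

# Partial F-test: does adding x3 reduce the residual sum of squares significantly?
cmp <- anova(small, large)
cmp

c(small = summary(small)$adj.r.squared, large = summary(large)$adj.r.squared)
```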

Spatial Analyses

Shape Data

Acquisition and Integration

To begin the spatial exploration, we obtained the ZIP code shapefile for King County from the King County GIS open data website (https://gis-kingcounty.opendata.arcgis.com/). We merged it by ZIP code with our manipulated dataset, data_scaled, and this merged dataset serves as the foundation for the geospatial visualizations. After the first merge, the number of rows increased from 21,434 to 23,125. This indicates that some ZIP codes appear more than once in the shapefile, so houses in those ZIP codes were matched to several shape records. We therefore kept a single shape record per ZIP code and merged again.

names(data_scaled)
##  [1] "bedrooms"      "bathrooms"     "floors"        "waterfront"   
##  [5] "view"          "condition"     "grade"         "yr_built"     
##  [9] "yr_renovated"  "zipcode"       "lat"           "long"         
## [13] "price"         "sqft_living"   "sqft_living15" "sqft_lot"     
## [17] "sqft_lot15"    "sqft_above"    "sqft_basement"
shape <- read_sf("zipcodeSHP/")
head(shape, 3)
## Simple feature collection with 3 features and 8 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 1266412 ymin: 97164.57 xmax: 1303262 ymax: 134079.6
## Projected CRS: NAD83(HARN) / Washington North (ftUS)
## # A tibble: 3 × 9
##     ZIP ZIPCODE COUNTY ZIP_TYPE COUNTY_NAM  PREFERRED_  Shape_STAr Shape_STLe
##   <dbl> <chr>   <chr>  <chr>    <chr>       <chr>            <dbl>      <dbl>
## 1 98001 98001   033    Standard King County AUBURN      525368923.    147537.
## 2 98002 98002   033    Standard King County AUBURN      205302741     104440.
## 3 98003 98003   033    Standard King County FEDERAL WAY 316942614.    123734.
## # ℹ 1 more variable: geometry <MULTIPOLYGON [US_survey_foot]>
names(shape)
## [1] "ZIP"        "ZIPCODE"    "COUNTY"     "ZIP_TYPE"   "COUNTY_NAM"
## [6] "PREFERRED_" "Shape_STAr" "Shape_STLe" "geometry"
# Ensure 'zipcode' fields are of the same type
shape$ZIPCODE <- as.character(shape$ZIPCODE)
data_scaled$zipcode <- as.character(data_scaled$zipcode)
merged_data <- merge(data_scaled, shape, by.x = "zipcode", by.y = "ZIPCODE", all.x = TRUE)

str(merged_data)
## 'data.frame':    23125 obs. of  27 variables:
##  $ zipcode      : chr  "98001" "98001" "98001" "98001" ...
##  $ bedrooms     : num  3 3 3 4 3 3 3 5 4 4 ...
##  $ bathrooms    : num  2 2.75 2 2.5 1 2 1.75 2 2.5 2.5 ...
##  $ floors       : num  1 1 1 2 1 1 1 1 2 1 ...
##  $ waterfront   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ view         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ condition    : int  3 4 3 3 4 3 3 5 3 3 ...
##  $ grade        : int  7 7 7 7 6 7 7 7 8 7 ...
##  $ yr_built     : int  1985 1958 1963 1993 1975 1974 1987 1959 2001 1987 ...
##  $ yr_renovated : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lat          : num  47.3 47.3 47.3 47.3 47.3 ...
##  $ long         : num  -122 -122 -122 -122 -122 ...
##  $ price        : num [1:23125, 1] -0.79 -0.777 -0.899 -0.872 -0.967 ...
##  $ sqft_living  : num [1:23125, 1] -0.4598 -0.1987 -0.6992 0.0841 -0.623 ...
##  $ sqft_living15: num [1:23125, 1] -0.479 -0.508 -0.595 -0.289 -0.683 ...
##  $ sqft_lot     : num [1:23125, 1] -0.0494 0.0236 -0.121 -0.1717 0.0458 ...
##  $ sqft_lot15   : num [1:23125, 1] -0.1829 0.233 -0.0945 -0.1743 0.0443 ...
##  $ sqft_above   : num [1:23125, 1] -0.942 0.132 -0.423 0.445 -0.339 ...
##  $ sqft_basement: num [1:23125, 1] 0.809 -0.659 -0.659 -0.659 -0.659 ...
##  $ ZIP          : num  98001 98001 98001 98001 98001 ...
##  $ COUNTY       : chr  "033" "033" "033" "033" ...
##  $ ZIP_TYPE     : chr  "Standard" "Standard" "Standard" "Standard" ...
##  $ COUNTY_NAM   : chr  "King County" "King County" "King County" "King County" ...
##  $ PREFERRED_   : chr  "AUBURN" "AUBURN" "AUBURN" "AUBURN" ...
##  $ Shape_STAr   : num  5.25e+08 5.25e+08 5.25e+08 5.25e+08 5.25e+08 ...
##  $ Shape_STLe   : num  147537 147537 147537 147537 147537 ...
##  $ geometry     :sfc_MULTIPOLYGON of length 23125; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:1563, 1:2] 1279285 1279733 1280459 1281079 1281153 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
# Keep one shape record per ZIP code so that houses are not duplicated in the merge
shape_unique <- shape[!duplicated(shape$ZIPCODE), ]

merged_data <- merge(data_scaled, shape_unique, by.x = "zipcode", by.y = "ZIPCODE", all.x = TRUE)

# Average every variable within each ZIP code
data_avg <- data_scaled %>%
  group_by(zipcode) %>%
  summarise_all(mean, na.rm = TRUE)

# Merge the averaged datasets
merged_data_avg <- data_avg %>%
  left_join(shape_unique, by = c("zipcode" = "ZIPCODE"))
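The row increase after the first merge (23,125 rows in the `str()` output above) is typical of a many-to-one join in which the lookup table repeats a key: each house is matched to every shapefile row that shares its ZIP code. The hypothetical toy tables below reproduce the effect. Dropping the repeated rows keeps only the first polygon per ZIP code. If the repeats are separate parts of one ZIP code area, unioning their geometries per ZIP code (for example with sf's `st_union()`) would preserve the full area instead.

```r
# Hypothetical toy tables: ZIP 98001 appears twice in the "shapefile" table
houses <- data.frame(id = 1:4, zipcode = c("98001", "98001", "98002", "98003"))
shapes <- data.frame(ZIPCODE = c("98001", "98001", "98002", "98003"),
                     part    = c("a", "b", "a", "a"))

# Each of the two 98001 houses matches both 98001 rows: 2 * 2 + 1 + 1 = 6 rows
nrow(merge(houses, shapes, by.x = "zipcode", by.y = "ZIPCODE", all.x = TRUE))   # 6

# Keeping one row per key restores one row per house
shapes_unique <- shapes[!duplicated(shapes$ZIPCODE), ]
nrow(merge(houses, shapes_unique, by.x = "zipcode", by.y = "ZIPCODE", all.x = TRUE))   # 4
```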

Comparing Geometries: King County vs. House Data

The map shows all King County ZIP code polygons in light blue. It overlays, in yellow with blue borders, the ZIP codes that contain house data (taken from the ZIP-level averages in merged_data_avg). A legend clarifies the colors. Using contrasting colors for the base map and the overlay makes ZIP codes without house data easy to spot.

# Use a single plotting panel
par(mfrow = c(1, 1))

# Plot all King County ZIP code polygons in light blue
plot(shape$geometry, col = "lightblue", main = "King County Geometry")

# Overlay the ZIP codes that contain house data in yellow with blue borders
plot(merged_data_avg$geometry, col = "yellow", border = "blue", add = TRUE)

# Add a legend
legend("topright", legend = c("King County", "House Data"), 
       fill = c("lightblue", "yellow"))

# ZIP code polygons that contain house data
ggplot(merged_data_avg) +
  geom_sf(data = merged_data_avg$geometry)

# House locations colored by ZIP code
ggplot() +
  geom_point(aes(x = long, y = lat, color = factor(zipcode)), data = data_scaled, show.legend = FALSE)

GWR

We used the same four predictors as highcor_model, which were selected in the correlation analysis above.

We grouped the data by ZIP code. The dataset covers 70 distinct ZIP codes, each with at least 49 observations (median 281.5), which we considered sufficient for Geographically Weighted Regression (GWR). Grouping by ZIP code allows us to fit a separate GWR within each ZIP code while keeping a reasonable number of observations in each local model.

model_formula <- price ~ bathrooms + sqft_living + grade + sqft_living15  # same predictors as highcor_model
# Group data by zipcode and summarize
grouped_data <- merged_data %>%
  group_by(zipcode) %>%
  summarize(count = n())

dim(grouped_data)
## [1] 70  2
summary(grouped_data)
##    zipcode              count      
##  Length:70          Min.   : 49.0  
##  Class :character   1st Qu.:202.5  
##  Mode  :character   Median :281.5  
##                     Mean   :306.2  
##                     3rd Qu.:403.8  
##                     Max.   :599.0

Only for zipcode “98001”

To better understand the GWR process, we first focused on zip code 98001, which has 359 observations.

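We selected the bandwidth with ggwr.sel, which searches for the bandwidth that minimizes the cross-validation (CV) score. The selected bandwidth is about 4.61 (4.605267). Because longlat = TRUE, distances, and therefore the bandwidth, are measured in kilometers. This fixed bandwidth is used with the Gaussian kernel (gwr.Gauss) reported in the model output.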
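The following is a simplified, hand-rolled illustration of the two ideas behind this step. It is not the spgwr implementation, whose internals (for example the GLM-based score in ggwr.sel) may differ.

- Gaussian kernel: an observation at distance d from the regression point gets weight exp(-0.5 (d / b)^2) for bandwidth b.
- CV score: the sum of squared leave-one-out prediction errors.

The sketch assumes hypothetical data with planar coordinates already in kilometers; spgwr with `longlat = TRUE` uses great-circle distances instead.

```r
# Hypothetical toy data: 80 points with planar coordinates in km
set.seed(2024)
n      <- 80
coords <- cbind(x = runif(n, 0, 10), y = runif(n, 0, 10))
liv    <- rnorm(n)
# The true slope on liv drifts from west to east, so local models should help
price  <- (0.2 + 0.03 * coords[, "x"]) * liv + rnorm(n, sd = 0.2)
toy    <- data.frame(price, liv)

# Gaussian kernel: weight exp(-0.5 * (d / b)^2) for distance d and bandwidth b
gauss_w <- function(d, b) exp(-0.5 * (d / b)^2)

# Weighted least-squares fit centered on observation i
local_fit <- function(i, b, drop_self = FALSE) {
  d <- sqrt(colSums((t(coords) - coords[i, ])^2))  # Euclidean distances to point i
  w <- gauss_w(d, b)
  if (drop_self) w[i] <- 0                          # leave observation i out
  lm(price ~ liv, data = toy, weights = w)
}

# CV score: sum of squared leave-one-out prediction errors over all points
cv_score <- function(b) {
  errs <- sapply(seq_len(n), function(i) {
    fit <- local_fit(i, b, drop_self = TRUE)
    toy$price[i] - predict(fit, newdata = toy[i, ])
  })
  sum(errs^2)
}

# One-dimensional search for the bandwidth (km) that minimizes the CV score
bw_opt <- optimize(cv_score, interval = c(1, 20))$minimum

# Local slope on liv at every point, using the selected bandwidth
local_slopes <- sapply(seq_len(n), function(i) coef(local_fit(i, bw_opt))[["liv"]])
bw_opt
summary(local_slopes)
```

A fixed bandwidth is used here to mirror the output below; an adaptive bandwidth (a fixed share of nearest neighbors) is a common alternative when point density varies strongly.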

The GWR results for ZIP code 98001 show that the relationship between house prices and the predictors varies across locations. The local intercepts range from about -1.60 to -1.19, indicating different baseline price levels in different parts of the ZIP code. The bathrooms coefficient is mostly small and negative (from -0.050 to 0.002). Larger living spaces (sqft_living, 0.218 to 0.261) and higher quality (grade) have a positive effect. The size of nearby living spaces (sqft_living15) has a small, slightly negative effect. These results indicate that local context matters in real estate: the same factor can affect prices differently in different places.

The coefficient maps show how each predictor's influence varies across locations within ZIP code 98001, with the colors changing gradually from the north to the west.

data_scaled <- st_as_sf(data_scaled, coords = c("long", "lat"), crs = 4326)

# Subset the data for zip code "98001"
zip_code_to_check <- '98001'
data_subset <- data_scaled %>% filter(zipcode == zip_code_to_check)

# Check the number of observations in the selected zip code
message("# of obs for ", zip_code_to_check, ": ", nrow(data_subset))
## # of obs for 98001: 359
# filter() keeps the sf class, so this conversion is only a safeguard
data_subset <- st_as_sf(data_subset)
# Extract coordinates (X = longitude, Y = latitude)
crds <- st_coordinates(data_subset)
data_subset_df <- as.data.frame(data_subset)
data_subset_df$X <- crds[, 1]
data_subset_df$Y <- crds[, 2]
# Find the CV-optimal bandwidth with ggwr.sel(); with longlat = TRUE, distances are in km
bw <- ggwr.sel(model_formula, data = data_subset_df, coords = crds, family = gaussian(), longlat = TRUE)
## Bandwidth: 4.623624 CV score: 7.269422 
## Bandwidth: 7.473711 CV score: 7.294329 
## Bandwidth: 2.862173 CV score: 7.309594 
## Bandwidth: 5.409885 CV score: 7.273245 
## Bandwidth: 3.950809 CV score: 7.273628 
## Bandwidth: 4.697589 CV score: 7.269484 
## Bandwidth: 4.586302 CV score: 7.269422 
## Bandwidth: 4.605349 CV score: 7.26942 
## Bandwidth: 4.605308 CV score: 7.26942 
## Bandwidth: 4.605267 CV score: 7.26942 
## Bandwidth: 4.598023 CV score: 7.26942 
## Bandwidth: 4.605227 CV score: 7.26942 
## Bandwidth: 4.605267 CV score: 7.26942
# Fit the GWR model with ggwr() using the selected bandwidth
model.ggwr <- ggwr(model_formula, data_subset_df, crds, family = gaussian(), longlat = TRUE, bandwidth = bw)
model.ggwr
## Call:
## ggwr(formula = model_formula, data = data_subset_df, coords = crds, 
##     bandwidth = bw, family = gaussian(), longlat = TRUE)
## Kernel function: gwr.Gauss 
## Fixed bandwidth: 4.605267 
## Summary of GWR coefficient estimates at data points:
##                     Min.    1st Qu.     Median    3rd Qu.       Max.  Global
## X.Intercept.  -1.5966517 -1.5424884 -1.4717525 -1.2964942 -1.1868254 -1.3918
## bathrooms     -0.0500051 -0.0398909 -0.0178073 -0.0069147  0.0019236 -0.0252
## sqft_living    0.2175299  0.2203236  0.2254762  0.2456898  0.2614657  0.2377
## grade          0.0851893  0.0977804  0.1159407  0.1226024  0.1277208  0.1071
## sqft_living15 -0.0148450 -0.0100417 -0.0072986 -0.0034986  0.0078127 -0.0038
data_subset_sf <- st_as_sf(data_subset_df, coords = c("X", "Y"), crs = 4326)

# Adding GWR coefficients to the spatial data frame
data_subset_sf$GWR_bathrooms <- model.ggwr$SDF$bathrooms
data_subset_sf$GWR_sqft_living <- model.ggwr$SDF$sqft_living
data_subset_sf$GWR_grade <- model.ggwr$SDF$grade
data_subset_sf$GWR_sqft_living15 <- model.ggwr$SDF$sqft_living15

# Plotting the GWR coefficients
ggplot(data_subset_sf) +
  geom_sf(aes(color = GWR_bathrooms)) +
  scale_color_viridis_c() +
  labs(title = "GWR Coefficient for Bathrooms") +
  theme_minimal()

ggplot(data_subset_sf) +
  geom_sf(aes(color = GWR_sqft_living)) +
  scale_color_viridis_c() +
  labs(title = "GWR Coefficient for Living Space") +
  theme_minimal()

ggplot(data_subset_sf) +
  geom_sf(aes(color = GWR_grade)) +
  scale_color_viridis_c() +
  labs(title = "GWR Coefficient for Grade") +
  theme_minimal()

ggplot(data_subset_sf) +
  geom_sf(aes(color = GWR_sqft_living15)) +
  scale_color_viridis_c() +
  labs(title = "GWR Coefficient for Nearby Living Space") +
  theme_minimal()
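The four plotting calls above differ only in the coefficient column. One way to avoid the repetition while keeping a separate color scale per coefficient (useful here because the coefficient ranges differ a lot) is to generate the plots in a loop and arrange them in one figure with gridExtra, which is already loaded in this report. The sketch below uses geom_point() on a plain data frame with X/Y columns. With the sf object from this section, geom_sf() with the same color mapping would take its place. The helper name make_coef_plots and the toy data are hypothetical.

```r
# Hypothetical helper: one ggplot per coefficient column, each with its own color scale
make_coef_plots <- function(df, coef_cols, x_col = "X", y_col = "Y") {
  lapply(coef_cols, function(col) {
    ggplot2::ggplot(df, ggplot2::aes(x = .data[[x_col]], y = .data[[y_col]],
                                     colour = .data[[col]])) +
      ggplot2::geom_point() +
      ggplot2::scale_color_viridis_c() +
      ggplot2::labs(title = paste("GWR coefficient:", col), colour = NULL) +
      ggplot2::theme_minimal()
  })
}

# Synthetic coefficients for illustration
set.seed(3)
toy <- data.frame(X = runif(50), Y = runif(50),
                  GWR_bathrooms = rnorm(50, -0.02, 0.01),
                  GWR_grade = rnorm(50, 0.11, 0.01))
plots <- list()
if (requireNamespace("ggplot2", quietly = TRUE)) {
  plots <- make_coef_plots(toy, c("GWR_bathrooms", "GWR_grade"))
  # Arrange all coefficient maps in a single figure
  if (requireNamespace("gridExtra", quietly = TRUE)) {
    gridExtra::grid.arrange(grobs = plots, ncol = 2)
  }
}
```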

GWR for All Zip Codes Separately

We wrote a loop that applies the GWR model to each zip code separately. For each unique zip code, the loop subsets the data, selects the optimal bandwidth with ggwr.sel, fits the GWR model with ggwr using that bandwidth, and stores the fitted model in the gwr_results list under the zip code's name.
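The loop below stops at the first zip code where ggwr.sel() or ggwr() throws an error. A common pattern for per-group model fitting is to skip groups that are too small, wrap each fit in tryCatch(), and collect the results in a named list. The sketch shows that pattern with base R's split() and lapply(). Here fit_one is a hypothetical stand-in (an ordinary lm() fit) for the bandwidth selection and GWR fit, and min_obs and the toy data are assumptions.

```r
# Hypothetical stand-in for "select bandwidth + fit GWR" on one zip code
fit_one <- function(d) lm(price ~ sqft_living, data = d)

# Fit one model per group, skipping small groups and capturing errors
fit_by_group <- function(df, group_col, fit_fun, min_obs = 49) {
  groups <- split(df, df[[group_col]])
  lapply(groups, function(d) {
    if (nrow(d) < min_obs) return(NULL)          # too few observations
    tryCatch(fit_fun(d), error = function(e) {
      message("Fit failed for ", d[[group_col]][1], ": ", conditionMessage(e))
      NULL                                       # record the failure and keep going
    })
  })
}

# Synthetic data: the third zip code is below the minimum size
set.seed(11)
toy <- data.frame(zipcode = rep(c("98001", "98002", "98003"), c(60, 55, 20)),
                  sqft_living = rnorm(135))
toy$price <- 1 + 0.5 * toy$sqft_living + rnorm(135, sd = 0.2)

results <- fit_by_group(toy, "zipcode", fit_one)
fitted_ok <- Filter(Negate(is.null), results)
names(fitted_ok)
```

In the report's loop, fit_one would wrap the ggwr.sel() and ggwr() calls for one zip code.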

# Convert the data to an sf object only if it is not one already
if (!inherits(data_scaled, "sf")) {
  data_scaled <- st_as_sf(data_scaled, coords = c("long", "lat"), crs = 4326)
}

# Get a list of unique zip codes
unique_zipcodes <- unique(data_scaled$zipcode)

# Initialize a list to store results
gwr_results <- list()

# Loop through each zip code and perform GWR
for (zip_code_to_check in unique_zipcodes) {
  # Subset the data for the current zip code
  data_subset <- data_scaled %>% filter(zipcode == zip_code_to_check)
  
  # Check the number of observations
  message("# of obs for ", zip_code_to_check, ": ", nrow(data_subset))
  
  if (nrow(data_subset) > 0) {
    # Convert to sf object if not already
    data_subset <- st_as_sf(data_subset)
    
    # Extract coordinates
    crds <- st_coordinates(data_subset)
    data_subset_df <- as.data.frame(data_subset)
    data_subset_df$X <- crds[, 1]
    data_subset_df$Y <- crds[, 2]
    
    # Find the optimum bandwidth using ggwr.sel() function
    bw <- ggwr.sel(model_formula, data=data_subset_df, coords=crds, family=gaussian(), longlat=TRUE)
    
    # Fit the GWR model using ggwr() function
    model.ggwr <- ggwr(model_formula, data_subset_df, crds, family=gaussian(), longlat=TRUE, bandwidth=bw)
    
    # Store the results
    gwr_results[[zip_code_to_check]] <- model.ggwr
  }
}
## # of obs for 98178: 258
## Bandwidth: 2.139113 CV score: 30.62107 
## Bandwidth: 3.457702 CV score: 30.97033 
## Bandwidth: 1.32418 CV score: 30.17638 
## Bandwidth: 0.8205241 CV score: 31.03534 
## Bandwidth: 1.635457 CV score: 30.33901 
## Bandwidth: 1.131801 CV score: 30.20402 
## Bandwidth: 1.282309 CV score: 30.16777 
## Bandwidth: 1.258933 CV score: 30.16572 
## Bandwidth: 1.246481 CV score: 30.16556 
## Bandwidth: 1.249682 CV score: 30.16554 
## Bandwidth: 1.249853 CV score: 30.16554 
## Bandwidth: 1.249812 CV score: 30.16554 
## Bandwidth: 1.249771 CV score: 30.16554 
## Bandwidth: 1.249812 CV score: 30.16554
## # of obs for 98125: 403
## Bandwidth: 2.244435 CV score: 66.57481 
## Bandwidth: 3.627947 CV score: 68.42635 
## Bandwidth: 1.389378 CV score: 64.21351 
## Bandwidth: 0.8609237 CV score: 64.27465 
## Bandwidth: 1.152965 CV score: 63.78816 
## Bandwidth: 1.133973 CV score: 63.7772 
## Bandwidth: 1.083166 CV score: 63.77166 
## Bandwidth: 0.9982773 CV score: 63.85166 
## Bandwidth: 1.100438 CV score: 63.76945 
## Bandwidth: 1.100843 CV score: 63.76945 
## Bandwidth: 1.100566 CV score: 63.76945 
## Bandwidth: 1.100526 CV score: 63.76945 
## Bandwidth: 1.100607 CV score: 63.76945 
## Bandwidth: 1.100566 CV score: 63.76945
## # of obs for 98028: 282
## Bandwidth: 2.371563 CV score: 16.06558 
## Bandwidth: 3.833438 CV score: 16.53027 
## Bandwidth: 1.468074 CV score: 15.04832 
## Bandwidth: 0.9096874 CV score: 14.43437 
## Bandwidth: 0.5645854 CV score: 15.39927 
## Bandwidth: 1.061376 CV score: 14.46406 
## Bandwidth: 0.9414933 CV score: 14.4246 
## Bandwidth: 0.9621992 CV score: 14.42322 
## Bandwidth: 0.9591447 CV score: 14.42319 
## Bandwidth: 0.9594527 CV score: 14.42319 
## Bandwidth: 0.959412 CV score: 14.42319 
## Bandwidth: 0.9594934 CV score: 14.42319 
## Bandwidth: 0.9594527 CV score: 14.42319
## # of obs for 98136: 263
## Bandwidth: 1.905238 CV score: 44.17777 
## Bandwidth: 3.079661 CV score: 44.82874 
## Bandwidth: 1.179404 CV score: 43.1133 
## Bandwidth: 0.7308138 CV score: 41.30855 
## Bandwidth: 0.4535699 CV score: 39.74352 
## Bandwidth: 0.2822238 CV score: 42.80279 
## Bandwidth: 0.538312 CV score: 40.07978 
## Bandwidth: 0.3881215 CV score: 39.90809 
## Bandwidth: 0.4499739 CV score: 39.73907 
## Bandwidth: 0.4415742 CV score: 39.73337 
## Bandwidth: 0.4211571 CV score: 39.75137 
## Bandwidth: 0.4384973 CV score: 39.73305 
## Bandwidth: 0.4389775 CV score: 39.73303 
## Bandwidth: 0.4390285 CV score: 39.73303 
## Bandwidth: 0.4390692 CV score: 39.73303 
## Bandwidth: 0.4390285 CV score: 39.73303
## # of obs for 98074: 435
## Bandwidth: 3.966702 CV score: 109.4319 
## Bandwidth: 6.411851 CV score: 108.9804 
## Bandwidth: 7.923035 CV score: 108.8893 
## Bandwidth: 8.127523 CV score: 108.8809 
## Bandwidth: 8.98338 CV score: 108.8521 
## Bandwidth: 9.512328 CV score: 108.8383 
## Bandwidth: 9.839236 CV score: 108.8309 
## Bandwidth: 10.04128 CV score: 108.8267 
## Bandwidth: 10.16614 CV score: 108.8242 
## Bandwidth: 10.24332 CV score: 108.8227 
## Bandwidth: 10.29101 CV score: 108.8218 
## Bandwidth: 10.32049 CV score: 108.8213 
## Bandwidth: 10.33871 CV score: 108.821 
## Bandwidth: 10.34997 CV score: 108.8208 
## Bandwidth: 10.35692 CV score: 108.8206 
## Bandwidth: 10.36123 CV score: 108.8205 
## Bandwidth: 10.36388 CV score: 108.8205 
## Bandwidth: 10.36553 CV score: 108.8205 
## Bandwidth: 10.36654 CV score: 108.8204 
## Bandwidth: 10.36717 CV score: 108.8204 
## Bandwidth: 10.36756 CV score: 108.8204 
## Bandwidth: 10.3678 CV score: 108.8204 
## Bandwidth: 10.36794 CV score: 108.8204 
## Bandwidth: 10.36804 CV score: 108.8204 
## Bandwidth: 10.36809 CV score: 108.8204 
## Bandwidth: 10.36813 CV score: 108.8204 
## Bandwidth: 10.36813 CV score: 108.8204
## # of obs for 98053: 402
## Bandwidth: 6.767849 CV score: 34.99621 
## Bandwidth: 10.93968 CV score: 35.44976 
## Bandwidth: 4.189518 CV score: 34.15008 
## Bandwidth: 2.596021 CV score: 33.46203 
## Bandwidth: 1.611186 CV score: 36.92217 
## Bandwidth: 3.204683 CV score: 33.63277 
## Bandwidth: 1.987066 CV score: 34.33478 
## Bandwidth: 2.363421 CV score: 33.5639 
## Bandwidth: 2.736119 CV score: 33.46495 
## Bandwidth: 2.6576 CV score: 33.45895 
## Bandwidth: 2.654503 CV score: 33.45893 
## Bandwidth: 2.652421 CV score: 33.45892 
## Bandwidth: 2.65221 CV score: 33.45892 
## Bandwidth: 2.65225 CV score: 33.45892 
## Bandwidth: 2.652169 CV score: 33.45892 
## Bandwidth: 2.65221 CV score: 33.45892
## # of obs for 98003: 276
## Bandwidth: 6.101307 CV score: 6.664922 
## Bandwidth: 9.862266 CV score: 6.687406 
## Bandwidth: 3.776907 CV score: 6.627613 
## Bandwidth: 2.340348 CV score: 6.63392 
## Bandwidth: 3.462474 CV score: 6.6225 
## Bandwidth: 3.177887 CV score: 6.619764 
## Bandwidth: 2.857976 CV score: 6.621793 
## Bandwidth: 3.138065 CV score: 6.619662 
## Bandwidth: 3.117532 CV score: 6.619644 
## Bandwidth: 3.113073 CV score: 6.619644 
## Bandwidth: 3.113661 CV score: 6.619644 
## Bandwidth: 3.113702 CV score: 6.619644 
## Bandwidth: 3.113742 CV score: 6.619644 
## Bandwidth: 3.113702 CV score: 6.619644
## # of obs for 98198: 275
## Bandwidth: 3.665006 CV score: 21.14421 
## Bandwidth: 5.924183 CV score: 21.5609 
## Bandwidth: 2.268757 CV score: 20.55329 
## Bandwidth: 1.405828 CV score: 19.65085 
## Bandwidth: 0.8725087 CV score: 19.08421 
## Bandwidth: 0.5428991 CV score: 20.30357 
## Bandwidth: 1.042899 CV score: 19.18772 
## Bandwidth: 0.7154143 CV score: 19.21805 
## Bandwidth: 0.8895483 CV score: 19.08699 
## Bandwidth: 0.8529042 CV score: 19.08396 
## Bandwidth: 0.861137 CV score: 19.08365 
## Bandwidth: 0.8612261 CV score: 19.08365 
## Bandwidth: 0.8610963 CV score: 19.08365 
## Bandwidth: 0.8611777 CV score: 19.08365 
## Bandwidth: 0.861137 CV score: 19.08365
## # of obs for 98146: 281
## Bandwidth: 2.152787 CV score: 30.26216 
## Bandwidth: 3.479806 CV score: 33.05096 
## Bandwidth: 1.332645 CV score: 26.4451 
## Bandwidth: 0.8257693 CV score: 22.87962 
## Bandwidth: 0.5125028 CV score: 21.00645 
## Bandwidth: 0.3188935 CV score: 24.36936 
## Bandwidth: 0.6042329 CV score: 21.26694 
## Bandwidth: 0.4581777 CV score: 21.18126 
## Bandwidth: 0.5241332 CV score: 21.00845 
## Bandwidth: 0.5166411 CV score: 21.00583 
## Bandwidth: 0.516306 CV score: 21.00582 
## Bandwidth: 0.5162653 CV score: 21.00582 
## Bandwidth: 0.5162246 CV score: 21.00582 
## Bandwidth: 0.5162653 CV score: 21.00582
## # of obs for 98038: 587
## Bandwidth: 6.385508 CV score: 24.0127 
## Bandwidth: 10.32165 CV score: 25.20735 
## Bandwidth: 3.952836 CV score: 22.03946 
## Bandwidth: 2.449362 CV score: 19.81225 
## Bandwidth: 1.520164 CV score: 19.20535 
## Bandwidth: 1.025549 CV score: 20.88914 
## Bandwidth: 1.870162 CV score: 19.18393 
## Bandwidth: 1.719976 CV score: 19.13498 
## Bandwidth: 1.710959 CV score: 19.13436 
## Bandwidth: 1.694278 CV score: 19.13402 
## Bandwidth: 1.697121 CV score: 19.134 
## Bandwidth: 1.697231 CV score: 19.134 
## Bandwidth: 1.69719 CV score: 19.134 
## Bandwidth: 1.697272 CV score: 19.134 
## Bandwidth: 1.697231 CV score: 19.134
## # of obs for 98115: 576
## Bandwidth: 2.257114 CV score: 72.73039 
## Bandwidth: 3.64844 CV score: 74.69182 
## Bandwidth: 1.397226 CV score: 69.09036 
## Bandwidth: 0.8657869 CV score: 65.05394 
## Bandwidth: 0.5373392 CV score: 62.1136 
## Bandwidth: 0.3343473 CV score: 63.75416 
## Bandwidth: 0.5619148 CV score: 62.30859 
## Bandwidth: 0.4932583 CV score: 61.83735 
## Bandwidth: 0.4325597 CV score: 61.78461 
## Bandwidth: 0.4544775 CV score: 61.73937 
## Bandwidth: 0.4571654 CV score: 61.73968 
## Bandwidth: 0.4551687 CV score: 61.73934 
## Bandwidth: 0.4550958 CV score: 61.73934 
## Bandwidth: 0.4550551 CV score: 61.73934 
## Bandwidth: 0.4550958 CV score: 61.73934
## # of obs for 98107: 264
## Bandwidth: 1.849837 CV score: 40.25904 
## Bandwidth: 2.99011 CV score: 47.04667 
## Bandwidth: 1.145109 CV score: 30.33254 
## Bandwidth: 0.709563 CV score: 24.56627 
## Bandwidth: 0.4403809 CV score: 21.334 
## Bandwidth: 0.2740172 CV score: 20.64538 
## Bandwidth: 0.242639 CV score: 20.85132 
## Bandwidth: 0.3189593 CV score: 20.58401 
## Bandwidth: 0.3065147 CV score: 20.57872 
## Bandwidth: 0.3088824 CV score: 20.57851 
## Bandwidth: 0.3085998 CV score: 20.5785 
## Bandwidth: 0.3085591 CV score: 20.5785 
## Bandwidth: 0.3086405 CV score: 20.5785 
## Bandwidth: 0.3085998 CV score: 20.5785
## # of obs for 98126: 352
## Bandwidth: 3.101315 CV score: 19.95358 
## Bandwidth: 5.013024 CV score: 20.65518 
## Bandwidth: 1.919815 CV score: 19.03802 
## Bandwidth: 1.189607 CV score: 17.95679 
## Bandwidth: 0.7383139 CV score: 16.85971 
## Bandwidth: 0.4593994 CV score: 16.28299 
## Bandwidth: 0.2870207 CV score: 16.64155 
## Bandwidth: 0.486369 CV score: 16.30232 
## Bandwidth: 0.4473419 CV score: 16.27929 
## Bandwidth: 0.4387371 CV score: 16.27868 
## Bandwidth: 0.4399543 CV score: 16.27866 
## Bandwidth: 0.4400552 CV score: 16.27866 
## Bandwidth: 0.4400145 CV score: 16.27866 
## Bandwidth: 0.4400959 CV score: 16.27866 
## Bandwidth: 0.4400552 CV score: 16.27866
## # of obs for 98019: 190
## Bandwidth: 14.99476 CV score: 7.257663 
## Bandwidth: 24.23782 CV score: 7.288782 
## Bandwidth: 9.282245 CV score: 7.199146 
## Bandwidth: 5.751713 CV score: 7.171767 
## Bandwidth: 3.569725 CV score: 7.197574 
## Bandwidth: 6.385845 CV score: 7.175466 
## Bandwidth: 5.603686 CV score: 7.171391 
## Bandwidth: 5.376422 CV score: 7.171249 
## Bandwidth: 5.42855 CV score: 7.171233 
## Bandwidth: 5.430627 CV score: 7.171233 
## Bandwidth: 5.43018 CV score: 7.171233 
## Bandwidth: 5.430221 CV score: 7.171233 
## Bandwidth: 5.430139 CV score: 7.171233 
## Bandwidth: 5.43018 CV score: 7.171233
## # of obs for 98002: 197
## Bandwidth: 5.415827 CV score: 1.245589 
## Bandwidth: 8.754243 CV score: 1.2544 
## Bandwidth: 3.352572 CV score: 1.232598 
## Bandwidth: 2.077411 CV score: 1.224007 
## Bandwidth: 1.289317 CV score: 1.236699 
## Bandwidth: 2.410701 CV score: 1.225346 
## Bandwidth: 1.776386 CV score: 1.224913 
## Bandwidth: 2.062693 CV score: 1.223992 
## Bandwidth: 2.034486 CV score: 1.223978 
## Bandwidth: 1.935901 CV score: 1.22409 
## Bandwidth: 2.027097 CV score: 1.223978 
## Bandwidth: 2.027863 CV score: 1.223978 
## Bandwidth: 2.027935 CV score: 1.223978 
## Bandwidth: 2.027976 CV score: 1.223978 
## Bandwidth: 2.027935 CV score: 1.223978
## # of obs for 98133: 485
## Bandwidth: 3.472267 CV score: 18.09628 
## Bandwidth: 5.612636 CV score: 19.01026 
## Bandwidth: 2.149446 CV score: 17.12706 
## Bandwidth: 1.331897 CV score: 16.70556 
## Bandwidth: 0.8266243 CV score: 16.68689 
## Bandwidth: 1.028211 CV score: 16.65728 
## Bandwidth: 1.048747 CV score: 16.65817 
## Bandwidth: 1.013174 CV score: 16.65694 
## Bandwidth: 0.9419184 CV score: 16.65954 
## Bandwidth: 1.000384 CV score: 16.65686 
## Bandwidth: 1.001972 CV score: 16.65686 
## Bandwidth: 1.002189 CV score: 16.65686 
## Bandwidth: 1.002148 CV score: 16.65686 
## Bandwidth: 1.00223 CV score: 16.65686 
## Bandwidth: 1.002189 CV score: 16.65686
## # of obs for 98040: 282
## Bandwidth: 3.170305 CV score: 297.8511 
## Bandwidth: 5.124539 CV score: 298.3054 
## Bandwidth: 1.962521 CV score: 295.3757 
## Bandwidth: 1.21607 CV score: 295.8933 
## Bandwidth: 1.8363 CV score: 295.1126 
## Bandwidth: 1.666694 CV score: 294.9176 
## Bandwidth: 1.494571 CV score: 294.9966 
## Bandwidth: 1.629378 CV score: 294.9079 
## Bandwidth: 1.623676 CV score: 294.9077 
## Bandwidth: 1.621971 CV score: 294.9077 
## Bandwidth: 1.622088 CV score: 294.9077 
## Bandwidth: 1.622129 CV score: 294.9077 
## Bandwidth: 1.622047 CV score: 294.9077 
## Bandwidth: 1.622088 CV score: 294.9077
## # of obs for 98092: 351
## Bandwidth: 7.571111 CV score: 11.6993 
## Bandwidth: 12.23809 CV score: 12.27371 
## Bandwidth: 4.686763 CV score: 10.77095 
## Bandwidth: 2.904138 CV score: 10.14205 
## Bandwidth: 1.802415 CV score: 10.59895 
## Bandwidth: 3.132539 CV score: 10.18126 
## Bandwidth: 2.823615 CV score: 10.13455 
## Bandwidth: 2.433551 CV score: 10.16104 
## Bandwidth: 2.727786 CV score: 10.13069 
## Bandwidth: 2.70857 CV score: 10.13063 
## Bandwidth: 2.713784 CV score: 10.13062 
## Bandwidth: 2.71398 CV score: 10.13062 
## Bandwidth: 2.713939 CV score: 10.13062 
## Bandwidth: 2.714021 CV score: 10.13062 
## Bandwidth: 2.71398 CV score: 10.13062
## # of obs for 98030: 253
## Bandwidth: 3.340405 CV score: 2.741191 
## Bandwidth: 5.399493 CV score: 2.824794 
## Bandwidth: 2.067819 CV score: 2.610783 
## Bandwidth: 1.281317 CV score: 2.5233 
## Bandwidth: 0.7952328 CV score: 2.537542 
## Bandwidth: 1.170933 CV score: 2.518967 
## Bandwidth: 1.118554 CV score: 2.518274 
## Bandwidth: 1.103383 CV score: 2.518255 
## Bandwidth: 1.107526 CV score: 2.518252 
## Bandwidth: 1.107638 CV score: 2.518252 
## Bandwidth: 1.107598 CV score: 2.518252 
## Bandwidth: 1.107679 CV score: 2.518252 
## Bandwidth: 1.107638 CV score: 2.518252
## # of obs for 98119: 184
## Bandwidth: 1.329883 CV score: 82.06058 
## Bandwidth: 2.149647 CV score: 83.96159 
## Bandwidth: 0.8232404 CV score: 80.04106 
## Bandwidth: 0.5101183 CV score: 79.16133 
## Bandwidth: 0.3165982 CV score: 81.79627 
## Bandwidth: 0.6233488 CV score: 79.33851 
## Bandwidth: 0.4411028 CV score: 79.38574 
## Bandwidth: 0.5371294 CV score: 79.16536 
## Bandwidth: 0.5190831 CV score: 79.15859 
## Bandwidth: 0.5206657 CV score: 79.15856 
## Bandwidth: 0.5202594 CV score: 79.15855 
## Bandwidth: 0.5203001 CV score: 79.15855 
## Bandwidth: 0.5202187 CV score: 79.15855 
## Bandwidth: 0.5202594 CV score: 79.15855
## # of obs for 98112: 268
## Bandwidth: 1.597327 CV score: 122.2153 
## Bandwidth: 2.581948 CV score: 126.7759 
## Bandwidth: 0.9887969 CV score: 114.5684 
## Bandwidth: 0.6127048 CV score: 107.3335 
## Bandwidth: 0.3802671 CV score: 104.6611 
## Bandwidth: 0.2366128 CV score: 114.8006 
## Bandwidth: 0.4690504 CV score: 105.0654 
## Bandwidth: 0.3775447 CV score: 104.6774 
## Bandwidth: 0.4048722 CV score: 104.6141 
## Bandwidth: 0.4293861 CV score: 104.7085 
## Bandwidth: 0.3989858 CV score: 104.6106 
## Bandwidth: 0.3996998 CV score: 104.6106 
## Bandwidth: 0.3994788 CV score: 104.6106 
## Bandwidth: 0.3994381 CV score: 104.6106 
## Bandwidth: 0.3995195 CV score: 104.6106 
## Bandwidth: 0.3994788 CV score: 104.6106
## # of obs for 98052: 571
## Bandwidth: 5.883075 CV score: 53.5907 
## Bandwidth: 9.509512 CV score: 54.05454 
## Bandwidth: 3.641814 CV score: 52.71886 
## Bandwidth: 2.256638 CV score: 51.60824 
## Bandwidth: 1.400553 CV score: 52.00963 
## Bandwidth: 2.242102 CV score: 51.6008 
## Bandwidth: 2.0298 CV score: 51.53561 
## Bandwidth: 1.789449 CV score: 51.58543 
## Bandwidth: 2.000836 CV score: 51.53397 
## Bandwidth: 1.992598 CV score: 51.53385 
## Bandwidth: 1.990891 CV score: 51.53385 
## Bandwidth: 1.99098 CV score: 51.53385 
## Bandwidth: 1.991021 CV score: 51.53385 
## Bandwidth: 1.990939 CV score: 51.53385 
## Bandwidth: 1.99098 CV score: 51.53385
## # of obs for 98027: 411
## Bandwidth: 8.456717 CV score: 69.91598 
## Bandwidth: 13.66959 CV score: 72.97544 
## Bandwidth: 5.234981 CV score: 64.86197 
## Bandwidth: 3.243839 CV score: 60.63672 
## Bandwidth: 2.013246 CV score: 55.5442 
## Bandwidth: 1.252698 CV score: 55.86104 
## Bandwidth: 1.724029 CV score: 54.75554 
## Bandwidth: 1.664204 CV score: 54.70317 
## Bandwidth: 1.611595 CV score: 54.69512 
## Bandwidth: 1.625984 CV score: 54.69363 
## Bandwidth: 1.626496 CV score: 54.69363 
## Bandwidth: 1.626369 CV score: 54.69363 
## Bandwidth: 1.62641 CV score: 54.69363 
## Bandwidth: 1.626328 CV score: 54.69363 
## Bandwidth: 1.626369 CV score: 54.69363
## # of obs for 98117: 548
## Bandwidth: 1.745136 CV score: 44.9993 
## Bandwidth: 2.820869 CV score: 46.06447 
## Bandwidth: 1.080295 CV score: 42.67325 
## Bandwidth: 0.6694016 CV score: 40.0665 
## Bandwidth: 0.4154552 CV score: 38.8079 
## Bandwidth: 0.2585078 CV score: 39.25813 
## Bandwidth: 0.4123002 CV score: 38.79828 
## Bandwidth: 0.3742635 CV score: 38.71016 
## Bandwidth: 0.3300487 CV score: 38.70741 
## Bandwidth: 0.3510211 CV score: 38.69095 
## Bandwidth: 0.3513026 CV score: 38.69097 
## Bandwidth: 0.3501515 CV score: 38.69091 
## Bandwidth: 0.3500337 CV score: 38.69091 
## Bandwidth: 0.3500744 CV score: 38.69091 
## Bandwidth: 0.349993 CV score: 38.69091 
## Bandwidth: 0.3500337 CV score: 38.69091
## # of obs for 98058: 451
## Bandwidth: 5.038931 CV score: 15.86107 
## Bandwidth: 8.145021 CV score: 16.1923 
## Bandwidth: 3.119261 CV score: 15.32794 
## Bandwidth: 1.93284 CV score: 14.55631 
## Bandwidth: 1.199592 CV score: 14.03179 
## Bandwidth: 0.7464193 CV score: 14.58491 
## Bandwidth: 1.347017 CV score: 14.09556 
## Bandwidth: 1.026495 CV score: 14.009 
## Bandwidth: 1.042914 CV score: 14.00624 
## Bandwidth: 1.078601 CV score: 14.00507 
## Bandwidth: 1.124815 CV score: 14.01096 
## Bandwidth: 1.067085 CV score: 14.00481 
## Bandwidth: 1.067887 CV score: 14.00481 
## Bandwidth: 1.067502 CV score: 14.00481 
## Bandwidth: 1.067461 CV score: 14.00481 
## Bandwidth: 1.067542 CV score: 14.00481 
## Bandwidth: 1.067502 CV score: 14.00481
## # of obs for 98001: 359
## Bandwidth: 4.623624 CV score: 7.269422 
## Bandwidth: 7.473711 CV score: 7.294329 
## Bandwidth: 2.862173 CV score: 7.309594 
## Bandwidth: 5.409885 CV score: 7.273245 
## Bandwidth: 3.950809 CV score: 7.273628 
## Bandwidth: 4.697589 CV score: 7.269484 
## Bandwidth: 4.586302 CV score: 7.269422 
## Bandwidth: 4.605349 CV score: 7.26942 
## Bandwidth: 4.605308 CV score: 7.26942 
## Bandwidth: 4.605267 CV score: 7.26942 
## Bandwidth: 4.598023 CV score: 7.26942 
## Bandwidth: 4.605227 CV score: 7.26942 
## Bandwidth: 4.605267 CV score: 7.26942
## # of obs for 98056: 404
## Bandwidth: 3.100179 CV score: 66.68987 
## Bandwidth: 5.011187 CV score: 69.91465 
## Bandwidth: 1.919111 CV score: 61.46301 
## Bandwidth: 1.189171 CV score: 55.16134 
## Bandwidth: 0.7380434 CV score: 46.3251 
## Bandwidth: 0.4592311 CV score: 34.81213 
## Bandwidth: 0.2869156 CV score: 27.66438 
## Bandwidth: 0.1804188 CV score: 23.20766 
## Bandwidth: 0.1146001 CV score: 28.22047 
## Bandwidth: 0.203114 CV score: 23.93448 
## Bandwidth: 0.1552783 CV score: 23.32137 
## Bandwidth: 0.1708085 CV score: 23.0974 
## Bandwidth: 0.1700442 CV score: 23.0959 
## Bandwidth: 0.169358 CV score: 23.09557 
## Bandwidth: 0.1694569 CV score: 23.09556 
## Bandwidth: 0.1694976 CV score: 23.09556 
## Bandwidth: 0.1694162 CV score: 23.09556 
## Bandwidth: 0.1694569 CV score: 23.09556
## # of obs for 98166: 250
## Bandwidth: 4.446213 CV score: 72.23377 
## Bandwidth: 7.186941 CV score: 72.02124 
## Bandwidth: 8.880804 CV score: 71.98673 
## Bandwidth: 8.824098 CV score: 71.98751 
## Bandwidth: 9.927669 CV score: 71.97494 
## Bandwidth: 10.57467 CV score: 71.9696 
## Bandwidth: 10.97453 CV score: 71.96683 
## Bandwidth: 11.22167 CV score: 71.96529 
## Bandwidth: 11.3744 CV score: 71.9644 
## Bandwidth: 11.4688 CV score: 71.96386 
## Bandwidth: 11.52714 CV score: 71.96354 
## Bandwidth: 11.56319 CV score: 71.96334 
## Bandwidth: 11.58548 CV score: 71.96322 
## Bandwidth: 11.59925 CV score: 71.96315 
## Bandwidth: 11.60776 CV score: 71.96311 
## Bandwidth: 11.61302 CV score: 71.96308 
## Bandwidth: 11.61627 CV score: 71.96306 
## Bandwidth: 11.61828 CV score: 71.96305 
## Bandwidth: 11.61952 CV score: 71.96304 
## Bandwidth: 11.62029 CV score: 71.96304 
## Bandwidth: 11.62076 CV score: 71.96304 
## Bandwidth: 11.62106 CV score: 71.96303 
## Bandwidth: 11.62124 CV score: 71.96303 
## Bandwidth: 11.62135 CV score: 71.96303 
## Bandwidth: 11.62142 CV score: 71.96303 
## Bandwidth: 11.62146 CV score: 71.96303 
## Bandwidth: 11.62146 CV score: 71.96303
## # of obs for 98023: 492
## Bandwidth: 3.540699 CV score: 15.9262 
## Bandwidth: 5.723252 CV score: 16.24571 
## Bandwidth: 2.191808 CV score: 15.31003 
## Bandwidth: 1.358147 CV score: 14.50245 
## Bandwidth: 0.8429157 CV score: 14.5527 
## Bandwidth: 1.162223 CV score: 14.36193 
## Bandwidth: 1.119643 CV score: 14.34845 
## Bandwidth: 1.046643 CV score: 14.34702 
## Bandwidth: 0.9688263 CV score: 14.38368 
## Bandwidth: 1.079338 CV score: 14.3439 
## Bandwidth: 1.079711 CV score: 14.34391 
## Bandwidth: 1.078928 CV score: 14.3439 
## Bandwidth: 1.078887 CV score: 14.3439 
## Bandwidth: 1.078846 CV score: 14.3439 
## Bandwidth: 1.078887 CV score: 14.3439
## # of obs for 98007: 138
## Bandwidth: 4.13383 CV score: 7.350672 
## Bandwidth: 6.682 CV score: 7.695116 
## Bandwidth: 2.558975 CV score: 6.886225 
## Bandwidth: 1.585661 CV score: 6.677077 
## Bandwidth: 0.9841193 CV score: 7.213001 
## Bandwidth: 1.919302 CV score: 6.691713 
## Bandwidth: 1.67044 CV score: 6.666075 
## Bandwidth: 1.721045 CV score: 6.665478 
## Bandwidth: 1.70251 CV score: 6.665248 
## Bandwidth: 1.703543 CV score: 6.665247 
## Bandwidth: 1.70319 CV score: 6.665247 
## Bandwidth: 1.703149 CV score: 6.665247 
## Bandwidth: 1.70323 CV score: 6.665247 
## Bandwidth: 1.70319 CV score: 6.665247
## # of obs for 98070: 117
## Bandwidth: 8.290611 CV score: 19.87716 
## Bandwidth: 13.4011 CV score: 19.75173 
## Bandwidth: 16.55955 CV score: 19.71402 
## Bandwidth: 18.89769 CV score: 19.69603 
## Bandwidth: 19.95663 CV score: 19.68969 
## Bandwidth: 20.6111 CV score: 19.68621 
## Bandwidth: 21.01558 CV score: 19.6842 
## Bandwidth: 21.26556 CV score: 19.68302 
## Bandwidth: 21.42006 CV score: 19.6823 
## Bandwidth: 21.51554 CV score: 19.68187 
## Bandwidth: 21.57455 CV score: 19.6816 
## Bandwidth: 21.61103 CV score: 19.68144 
## Bandwidth: 21.63357 CV score: 19.68134 
## Bandwidth: 21.6475 CV score: 19.68128 
## Bandwidth: 21.65611 CV score: 19.68124 
## Bandwidth: 21.66143 CV score: 19.68122 
## Bandwidth: 21.66472 CV score: 19.6812 
## Bandwidth: 21.66675 CV score: 19.68119 
## Bandwidth: 21.66801 CV score: 19.68119 
## Bandwidth: 21.66878 CV score: 19.68118 
## Bandwidth: 21.66926 CV score: 19.68118 
## Bandwidth: 21.66956 CV score: 19.68118 
## Bandwidth: 21.66974 CV score: 19.68118 
## Bandwidth: 21.66985 CV score: 19.68118 
## Bandwidth: 21.66992 CV score: 19.68118 
## Bandwidth: 21.66997 CV score: 19.68118 
## Bandwidth: 21.66997 CV score: 19.68118
## # of obs for 98148: 56
## Bandwidth: 1.864819 CV score: 1.082701 
## Bandwidth: 3.014328 CV score: 1.066196 
## Bandwidth: 3.724763 CV score: 1.067549 
## Bandwidth: 3.260663 CV score: 1.066543 
## Bandwidth: 2.575254 CV score: 1.066725 
## Bandwidth: 2.952858 CV score: 1.066155 
## Bandwidth: 2.843691 CV score: 1.066149 
## Bandwidth: 2.741158 CV score: 1.066249 
## Bandwidth: 2.891717 CV score: 1.06614 
## Bandwidth: 2.892466 CV score: 1.06614 
## Bandwidth: 2.891039 CV score: 1.06614 
## Bandwidth: 2.890987 CV score: 1.06614 
## Bandwidth: 2.890946 CV score: 1.06614 
## Bandwidth: 2.890987 CV score: 1.06614
## # of obs for 98105: 229
## Bandwidth: 2.270165 CV score: 66.3921 
## Bandwidth: 3.669536 CV score: 68.50575 
## Bandwidth: 1.405305 CV score: 62.9178 
## Bandwidth: 0.870793 CV score: 60.52174 
## Bandwidth: 0.5404462 CV score: 60.81852 
## Bandwidth: 0.7778164 CV score: 60.25985 
## Bandwidth: 0.7343206 CV score: 60.17024 
## Bandwidth: 0.6602672 CV score: 60.12512 
## Bandwidth: 0.6726093 CV score: 60.11853 
## Bandwidth: 0.6808455 CV score: 60.11797 
## Bandwidth: 0.6782519 CV score: 60.11784 
## Bandwidth: 0.678366 CV score: 60.11784 
## Bandwidth: 0.6783253 CV score: 60.11784 
## Bandwidth: 0.6783253 CV score: 60.11784
## # of obs for 98042: 547
## Bandwidth: 6.189961 CV score: 15.14125 
## Bandwidth: 10.00557 CV score: 15.2306 
## Bandwidth: 3.831786 CV score: 14.89443 
## Bandwidth: 2.374354 CV score: 14.57689 
## Bandwidth: 1.473611 CV score: 14.55926 
## Bandwidth: 1.807615 CV score: 14.51918 
## Bandwidth: 1.884253 CV score: 14.51917 
## Bandwidth: 1.846053 CV score: 14.5188 
## Bandwidth: 1.846093 CV score: 14.5188 
## Bandwidth: 1.845845 CV score: 14.5188 
## Bandwidth: 1.831243 CV score: 14.51885 
## Bandwidth: 1.845886 CV score: 14.5188 
## Bandwidth: 1.845804 CV score: 14.5188 
## Bandwidth: 1.845845 CV score: 14.5188
## # of obs for 98008: 283
## Bandwidth: 3.407424 CV score: 144.4033 
## Bandwidth: 5.507823 CV score: 150.6673 
## Bandwidth: 2.109306 CV score: 135.2436 
## Bandwidth: 1.307024 CV score: 123.6705 
## Bandwidth: 0.8111875 CV score: 112.0347 
## Bandwidth: 0.5047433 CV score: 114.2853 
## Bandwidth: 0.753584 CV score: 111.063 
## Bandwidth: 0.6957055 CV score: 110.477 
## Bandwidth: 0.6227645 CV score: 110.5915 
## Bandwidth: 0.6680152 CV score: 110.3872 
## Bandwidth: 0.6666158 CV score: 110.3866 
## Bandwidth: 0.6650781 CV score: 110.3864 
## Bandwidth: 0.6652173 CV score: 110.3864 
## Bandwidth: 0.665258 CV score: 110.3864 
## Bandwidth: 0.6651766 CV score: 110.3864 
## Bandwidth: 0.6652173 CV score: 110.3864
## # of obs for 98059: 465
## Bandwidth: 4.982406 CV score: 26.44046 
## Bandwidth: 8.053654 CV score: 28.40016 
## Bandwidth: 3.084271 CV score: 23.03457 
## Bandwidth: 1.911158 CV score: 20.11424 
## Bandwidth: 1.186135 CV score: 19.19564 
## Bandwidth: 0.564959 CV score: 19.54374 
## Bandwidth: 1.081962 CV score: 19.13494 
## Bandwidth: 1.002271 CV score: 19.12142 
## Bandwidth: 1.004374 CV score: 19.1213 
## Bandwidth: 1.012876 CV score: 19.12111 
## Bandwidth: 1.01217 CV score: 19.12111 
## Bandwidth: 1.012211 CV score: 19.12111 
## Bandwidth: 1.012252 CV score: 19.12111 
## Bandwidth: 1.012211 CV score: 19.12111
## # of obs for 98122: 289
## Bandwidth: 1.313575 CV score: 49.02044 
## Bandwidth: 2.123287 CV score: 52.8947 
## Bandwidth: 0.8131456 CV score: 43.51038 
## Bandwidth: 0.5038631 CV score: 41.04574 
## Bandwidth: 0.312716 CV score: 43.4432 
## Bandwidth: 0.5612931 CV score: 41.14103 
## Bandwidth: 0.5180579 CV score: 41.04565 
## Bandwidth: 0.5110456 CV score: 41.04362 
## Bandwidth: 0.5110049 CV score: 41.04362 
## Bandwidth: 0.5109642 CV score: 41.04362 
## Bandwidth: 0.5110049 CV score: 41.04362
## # of obs for 98144: 340
## Bandwidth: 2.37842 CV score: 94.44856 
## Bandwidth: 3.844522 CV score: 95.74156 
## Bandwidth: 1.472319 CV score: 91.51972 
## Bandwidth: 0.9123178 CV score: 86.13584 
## Bandwidth: 0.566218 CV score: 80.60572 
## Bandwidth: 0.3523165 CV score: 83.97333 
## Bandwidth: 0.5982318 CV score: 80.981 
## Bandwidth: 0.5297474 CV score: 80.32387 
## Bandwidth: 0.4619748 CV score: 80.46981 
## Bandwidth: 0.5072195 CV score: 80.25686 
## Bandwidth: 0.5053609 CV score: 80.2558 
## Bandwidth: 0.503375 CV score: 80.25547 
## Bandwidth: 0.5036091 CV score: 80.25547 
## Bandwidth: 0.5036498 CV score: 80.25547 
## Bandwidth: 0.5035684 CV score: 80.25547 
## Bandwidth: 0.5036091 CV score: 80.25547
## # of obs for 98004: 315
## Bandwidth: 4.457731 CV score: 345.9478 
## Bandwidth: 7.20556 CV score: 353.9717 
## Bandwidth: 2.75948 CV score: 332.8351 
## Bandwidth: 1.709903 CV score: 318.9122 
## Bandwidth: 1.061229 CV score: 298.5248 
## Bandwidth: 0.6603259 CV score: 268.4929 
## Bandwidth: 0.4125544 CV score: 257.2144 
## Bandwidth: 0.2594232 CV score: 310.3894 
## Bandwidth: 0.5071947 CV score: 258.4176 
## Bandwidth: 0.4302031 CV score: 256.7035 
## Bandwidth: 0.4481268 CV score: 256.6741 
## Bandwidth: 0.4402358 CV score: 256.6391 
## Bandwidth: 0.4405233 CV score: 256.6392 
## Bandwidth: 0.4401951 CV score: 256.6391 
## Bandwidth: 0.4401544 CV score: 256.6391 
## Bandwidth: 0.4401951 CV score: 256.6391
## # of obs for 98005: 168
## Bandwidth: 3.373722 CV score: 30.84123 
## Bandwidth: 5.453348 CV score: 31.15963 
## Bandwidth: 2.088443 CV score: 30.71395 
## Bandwidth: 1.294097 CV score: 31.971 
## Bandwidth: 2.579376 CV score: 30.64924 
## Bandwidth: 2.560694 CV score: 30.64655 
## Bandwidth: 2.446737 CV score: 30.6361 
## Bandwidth: 2.309881 CV score: 30.64172 
## Bandwidth: 2.417085 CV score: 30.63539 
## Bandwidth: 2.412213 CV score: 30.63537 
## Bandwidth: 2.41018 CV score: 30.63536 
## Bandwidth: 2.410335 CV score: 30.63536 
## Bandwidth: 2.410376 CV score: 30.63536 
## Bandwidth: 2.410294 CV score: 30.63536 
## Bandwidth: 2.410335 CV score: 30.63536
## # of obs for 98034: 543
## Bandwidth: 4.443407 CV score: 166.1253 
## Bandwidth: 7.182405 CV score: 172.2033 
## Bandwidth: 2.750613 CV score: 157.4911 
## Bandwidth: 1.704408 CV score: 151.7815 
## Bandwidth: 1.057819 CV score: 151.1929 
## Bandwidth: 1.211678 CV score: 150.613 
## Bandwidth: 1.333186 CV score: 150.6083 
## Bandwidth: 1.273871 CV score: 150.5712 
## Bandwidth: 1.274256 CV score: 150.5712 
## Bandwidth: 1.272011 CV score: 150.5712 
## Bandwidth: 1.272151 CV score: 150.5712 
## Bandwidth: 1.272192 CV score: 150.5712 
## Bandwidth: 1.27211 CV score: 150.5712 
## Bandwidth: 1.272151 CV score: 150.5712
## # of obs for 98075: 358
## Bandwidth: 3.919863 CV score: 124.1556 
## Bandwidth: 6.336139 CV score: 128.8196 
## Bandwidth: 2.426522 CV score: 119.6011 
## Bandwidth: 1.503587 CV score: 117.883 
## Bandwidth: 0.9331812 CV score: 119.0616 
## Bandwidth: 1.611176 CV score: 117.9176 
## Bandwidth: 1.511718 CV score: 117.8828 
## Bandwidth: 1.510605 CV score: 117.8828 
## Bandwidth: 1.510441 CV score: 117.8828 
## Bandwidth: 1.510401 CV score: 117.8828 
## Bandwidth: 1.510482 CV score: 117.8828 
## Bandwidth: 1.510441 CV score: 117.8828
## # of obs for 98116: 329
## Bandwidth: 1.813754 CV score: 38.23056 
## Bandwidth: 2.931786 CV score: 38.95199 
## Bandwidth: 1.122773 CV score: 37.34285 
## Bandwidth: 0.6957225 CV score: 36.74894 
## Bandwidth: 0.431791 CV score: 36.34296 
## Bandwidth: 0.2686723 CV score: 36.43208 
## Bandwidth: 0.4061942 CV score: 36.29292 
## Bandwidth: 0.3652493 CV score: 36.24235 
## Bandwidth: 0.3283602 CV score: 36.25728 
## Bandwidth: 0.3564079 CV score: 36.23955 
## Bandwidth: 0.3546684 CV score: 36.23944 
## Bandwidth: 0.3541252 CV score: 36.23943 
## Bandwidth: 0.3541833 CV score: 36.23943 
## Bandwidth: 0.354224 CV score: 36.23943 
## Bandwidth: 0.3541833 CV score: 36.23943
## # of obs for 98010: 99
## Bandwidth: 6.485223 CV score: 7.922646 
## Bandwidth: 10.48284 CV score: 7.897692 
## Bandwidth: 12.9535 CV score: 7.889331 
## Bandwidth: 15.54836 CV score: 7.883779 
## Bandwidth: 14.55721 CV score: 7.885614 
## Bandwidth: 16.08416 CV score: 7.882905 
## Bandwidth: 16.41531 CV score: 7.882401 
## Bandwidth: 16.61996 CV score: 7.882103 
## Bandwidth: 16.74645 CV score: 7.881923 
## Bandwidth: 16.82462 CV score: 7.881814 
## Bandwidth: 16.87294 CV score: 7.881747 
## Bandwidth: 16.90279 CV score: 7.881706 
## Bandwidth: 16.92125 CV score: 7.88168 
## Bandwidth: 16.93265 CV score: 7.881665 
## Bandwidth: 16.9397 CV score: 7.881655 
## Bandwidth: 16.94406 CV score: 7.881649 
## Bandwidth: 16.94675 CV score: 7.881646 
## Bandwidth: 16.94842 CV score: 7.881643 
## Bandwidth: 16.94944 CV score: 7.881642 
## Bandwidth: 16.95008 CV score: 7.881641 
## Bandwidth: 16.95047 CV score: 7.88164 
## Bandwidth: 16.95071 CV score: 7.88164 
## Bandwidth: 16.95086 CV score: 7.88164 
## Bandwidth: 16.95096 CV score: 7.88164 
## Bandwidth: 16.95101 CV score: 7.88164 
## Bandwidth: 16.95106 CV score: 7.88164 
## Bandwidth: 16.95106 CV score: 7.88164
## # of obs for 98118: 499
## Bandwidth: 2.745356 CV score: 79.96594 
## Bandwidth: 4.437644 CV score: 83.5 
## Bandwidth: 1.699464 CV score: 75.09034 
## Bandwidth: 1.053067 CV score: 70.18162 
## Bandwidth: 0.6535724 CV score: 66.6023 
## Bandwidth: 0.4066709 CV score: 66.96416 
## Bandwidth: 0.5755576 CV score: 66.2329 
## Bandwidth: 0.5500812 CV score: 66.17389 
## Bandwidth: 0.5132663 CV score: 66.1614 
## Bandwidth: 0.4725505 CV score: 66.28156 
## Bandwidth: 0.5263306 CV score: 66.15469 
## Bandwidth: 0.5269531 CV score: 66.15469 
## Bandwidth: 0.526523 CV score: 66.15469 
## Bandwidth: 0.5264823 CV score: 66.15469 
## Bandwidth: 0.5265637 CV score: 66.15469 
## Bandwidth: 0.526523 CV score: 66.15469
## # of obs for 98199: 316
## Bandwidth: 2.142695 CV score: 96.81772 
## Bandwidth: 3.463493 CV score: 98.40552 
## Bandwidth: 1.326398 CV score: 93.62247 
## Bandwidth: 0.8218982 CV score: 89.9377 
## Bandwidth: 0.5101003 CV score: 92.06727 
## Bandwidth: 0.8632319 CV score: 90.17867 
## Bandwidth: 0.7612556 CV score: 89.70494 
## Bandwidth: 0.6653228 CV score: 89.80288 
## Bandwidth: 0.7297382 CV score: 89.66219 
## Bandwidth: 0.7271203 CV score: 89.66157 
## Bandwidth: 0.7247875 CV score: 89.66141 
## Bandwidth: 0.7250341 CV score: 89.66141 
## Bandwidth: 0.7250748 CV score: 89.66141 
## Bandwidth: 0.7249934 CV score: 89.66141 
## Bandwidth: 0.7250341 CV score: 89.66141
## # of obs for 98032: 123
## Bandwidth: 3.197334 CV score: 1.272637 
## Bandwidth: 5.16823 CV score: 1.272365 
## Bandwidth: 6.38631 CV score: 1.271113 
## Bandwidth: 7.139126 CV score: 1.270519 
## Bandwidth: 7.604391 CV score: 1.270213 
## Bandwidth: 7.891941 CV score: 1.270045 
## Bandwidth: 8.069656 CV score: 1.269948 
## Bandwidth: 8.179491 CV score: 1.26989 
## Bandwidth: 8.247372 CV score: 1.269855 
## Bandwidth: 8.289325 CV score: 1.269834 
## Bandwidth: 8.315253 CV score: 1.269821 
## Bandwidth: 8.331278 CV score: 1.269814 
## Bandwidth: 8.341182 CV score: 1.269809 
## Bandwidth: 8.347302 CV score: 1.269806 
## Bandwidth: 8.351085 CV score: 1.269804 
## Bandwidth: 8.353423 CV score: 1.269803 
## Bandwidth: 8.354868 CV score: 1.269802 
## Bandwidth: 8.355761 CV score: 1.269802 
## Bandwidth: 8.356313 CV score: 1.269801 
## Bandwidth: 8.356654 CV score: 1.269801 
## Bandwidth: 8.356865 CV score: 1.269801 
## Bandwidth: 8.356995 CV score: 1.269801 
## Bandwidth: 8.357076 CV score: 1.269801 
## Bandwidth: 8.357126 CV score: 1.269801 
## Bandwidth: 8.357126 CV score: 1.269801
## # of obs for 98045: 219
## Bandwidth: 13.0077 CV score: 15.24183 
## Bandwidth: 21.02589 CV score: 15.34234 
## Bandwidth: 8.052189 CV score: 15.00116 
## Bandwidth: 4.989513 CV score: 14.52276 
## Bandwidth: 3.096675 CV score: 13.79862 
## Bandwidth: 1.926837 CV score: 13.30721 
## Bandwidth: 1.203838 CV score: 14.26849 
## Bandwidth: 2.284533 CV score: 13.35825 
## Bandwidth: 1.896785 CV score: 13.31323 
## Bandwidth: 2.02499 CV score: 13.30048 
## Bandwidth: 2.124126 CV score: 13.31127 
## Bandwidth: 2.014051 CV score: 13.30032 
## Bandwidth: 2.011651 CV score: 13.30031 
## Bandwidth: 2.011243 CV score: 13.30031 
## Bandwidth: 2.011283 CV score: 13.30031 
## Bandwidth: 2.011202 CV score: 13.30031 
## Bandwidth: 2.011243 CV score: 13.30031
## # of obs for 98102: 105
## Bandwidth: 4.35743 CV score: 180.4485 
## Bandwidth: 7.04343 CV score: 180.4134 
## Bandwidth: 8.70347 CV score: 180.4015 
## Bandwidth: 10.5171 CV score: 180.3931 
## Bandwidth: 9.824356 CV score: 180.3959 
## Bandwidth: 10.85032 CV score: 180.3919 
## Bandwidth: 11.05626 CV score: 180.3912 
## Bandwidth: 11.18353 CV score: 180.3908 
## Bandwidth: 11.26219 CV score: 180.3905 
## Bandwidth: 11.31081 CV score: 180.3904 
## Bandwidth: 11.34085 CV score: 180.3903 
## Bandwidth: 11.35942 CV score: 180.3902 
## Bandwidth: 11.3709 CV score: 180.3902 
## Bandwidth: 11.37799 CV score: 180.3902 
## Bandwidth: 11.38238 CV score: 180.3902 
## Bandwidth: 11.38509 CV score: 180.3901 
## Bandwidth: 11.38676 CV score: 180.3901 
## Bandwidth: 11.3878 CV score: 180.3901 
## Bandwidth: 11.38844 CV score: 180.3901 
## Bandwidth: 11.38883 CV score: 180.3901 
## Bandwidth: 11.38907 CV score: 180.3901 
## Bandwidth: 11.38923 CV score: 180.3901 
## Bandwidth: 11.38932 CV score: 180.3901 
## Bandwidth: 11.38938 CV score: 180.3901 
## Bandwidth: 11.38942 CV score: 180.3901 
## Bandwidth: 11.38942 CV score: 180.3901
## # of obs for 98077: 196
## Bandwidth: 3.877691 CV score: 17.65709 
## Bandwidth: 6.267972 CV score: 17.81999 
## Bandwidth: 2.400416 CV score: 17.53554 
## Bandwidth: 1.48741 CV score: 17.8299 
## Bandwidth: 2.964685 CV score: 17.55144 
## Bandwidth: 2.444693 CV score: 17.53242 
## Bandwidth: 2.608197 CV score: 17.52901 
## Bandwidth: 2.744363 CV score: 17.5339 
## Bandwidth: 2.58152 CV score: 17.5288 
## Bandwidth: 2.57557 CV score: 17.52879 
## Bandwidth: 2.573683 CV score: 17.52878 
## Bandwidth: 2.573804 CV score: 17.52878 
## Bandwidth: 2.573845 CV score: 17.52878 
## Bandwidth: 2.573763 CV score: 17.52878 
## Bandwidth: 2.573804 CV score: 17.52878
## # of obs for 98103: 599
## Bandwidth: 2.453038 CV score: 65.11655 
## Bandwidth: 3.965136 CV score: 68.12954 
## Bandwidth: 1.51851 CV score: 61.5875 
## Bandwidth: 0.9409397 CV score: 57.43801 
## Bandwidth: 0.5839818 CV score: 53.54285 
## Bandwidth: 0.3633697 CV score: 51.75194 
## Bandwidth: 0.2270239 CV score: 54.45618 
## Bandwidth: 0.4218406 CV score: 51.97478 
## Bandwidth: 0.3208655 CV score: 51.92622 
## Bandwidth: 0.3682842 CV score: 51.75434 
## Bandwidth: 0.3633059 CV score: 51.75194 
## Bandwidth: 0.3627799 CV score: 51.75191 
## Bandwidth: 0.3628206 CV score: 51.75191 
## Bandwidth: 0.3627392 CV score: 51.75192 
## Bandwidth: 0.3627799 CV score: 51.75191
## # of obs for 98108: 185
## Bandwidth: 2.395536 CV score: 6.919921 
## Bandwidth: 3.872189 CV score: 7.267746 
## Bandwidth: 1.482914 CV score: 6.34701 
## Bandwidth: 0.9188832 CV score: 5.869257 
## Bandwidth: 0.5702927 CV score: 6.143134 
## Bandwidth: 0.9641681 CV score: 5.895301 
## Bandwidth: 0.8582933 CV score: 5.84287 
## Bandwidth: 0.7482868 CV score: 5.832319 
## Bandwidth: 0.779197 CV score: 5.828455 
## Bandwidth: 0.786119 CV score: 5.828486 
## Bandwidth: 0.7819998 CV score: 5.828433 
## Bandwidth: 0.7819198 CV score: 5.828433 
## Bandwidth: 0.7818791 CV score: 5.828433 
## Bandwidth: 0.7819198 CV score: 5.828433
## # of obs for 98168: 264
## Bandwidth: 2.917327 CV score: 4.461453 
## Bandwidth: 4.715621 CV score: 4.524063 
## Bandwidth: 1.80592 CV score: 4.405966 
## Bandwidth: 1.119032 CV score: 4.492929 
## Bandwidth: 2.107332 CV score: 4.416331 
## Bandwidth: 1.543552 CV score: 4.411538 
## Bandwidth: 1.782355 CV score: 4.405722 
## Bandwidth: 1.754953 CV score: 4.405586 
## Bandwidth: 1.74538 CV score: 4.405579 
## Bandwidth: 1.746766 CV score: 4.405578 
## Bandwidth: 1.746884 CV score: 4.405578 
## Bandwidth: 1.746844 CV score: 4.405578 
## Bandwidth: 1.746925 CV score: 4.405578 
## Bandwidth: 1.746884 CV score: 4.405578
## # of obs for 98177: 254
## Bandwidth: 3.406073 CV score: 95.32039 
## Bandwidth: 5.50564 CV score: 99.01006 
## Bandwidth: 2.10847 CV score: 91.24717 
## Bandwidth: 1.306506 CV score: 89.2247 
## Bandwidth: 0.8108659 CV score: 87.33984 
## Bandwidth: 0.5045433 CV score: 87.04048 
## Bandwidth: 0.5190219 CV score: 86.80609 
## Bandwidth: 0.6493974 CV score: 86.34103 
## Bandwidth: 0.7110729 CV score: 86.66041 
## Bandwidth: 0.6233763 CV score: 86.27117 
## Bandwidth: 0.5835165 CV score: 86.28773 
## Bandwidth: 0.6078613 CV score: 86.25715 
## Bandwidth: 0.6072808 CV score: 86.25709 
## Bandwidth: 0.6065532 CV score: 86.25707 
## Bandwidth: 0.6066104 CV score: 86.25707 
## Bandwidth: 0.6066511 CV score: 86.25707 
## Bandwidth: 0.6066104 CV score: 86.25707
## # of obs for 98065: 307
## Bandwidth: 6.679261 CV score: 17.96779 
## Bandwidth: 10.79648 CV score: 17.66076 
## Bandwidth: 13.34106 CV score: 17.66171 
## Bandwidth: 12.05217 CV score: 17.65825 
## Bandwidth: 11.96715 CV score: 17.65817 
## Bandwidth: 11.824 CV score: 17.6581 
## Bandwidth: 11.43153 CV score: 17.65837 
## Bandwidth: 11.76175 CV score: 17.6581 
## Bandwidth: 11.77152 CV score: 17.6581 
## Bandwidth: 11.77257 CV score: 17.6581 
## Bandwidth: 11.77249 CV score: 17.6581 
## Bandwidth: 11.77245 CV score: 17.6581 
## Bandwidth: 11.77249 CV score: 17.6581
## # of obs for 98029: 320
## Bandwidth: 2.654807 CV score: 19.95116 
## Bandwidth: 4.291279 CV score: 20.20305 
## Bandwidth: 1.643412 CV score: 19.57275 
## Bandwidth: 1.018335 CV score: 19.96266 
## Bandwidth: 1.842332 CV score: 19.65507 
## Bandwidth: 1.404653 CV score: 19.5239 
## Bandwidth: 1.310059 CV score: 19.54471 
## Bandwidth: 1.443711 CV score: 19.52418 
## Bandwidth: 1.422089 CV score: 19.5235 
## Bandwidth: 1.421663 CV score: 19.5235 
## Bandwidth: 1.421372 CV score: 19.5235 
## Bandwidth: 1.421331 CV score: 19.5235 
## Bandwidth: 1.421413 CV score: 19.5235 
## Bandwidth: 1.421372 CV score: 19.5235
## # of obs for 98006: 490
## Bandwidth: 3.254803 CV score: 209.6333 
## Bandwidth: 5.261124 CV score: 227.3669 
## Bandwidth: 2.014828 CV score: 175.1974 
## Bandwidth: 1.248482 CV score: 140.1648 
## Bandwidth: 0.7748539 CV score: 118.009 
## Bandwidth: 0.4821356 CV score: 120.0371 
## Bandwidth: 0.677927 CV score: 116.4475 
## Bandwidth: 0.6579364 CV score: 116.345 
## Bandwidth: 0.6406173 CV score: 116.3227 
## Bandwidth: 0.6430411 CV score: 116.322 
## Bandwidth: 0.6432422 CV score: 116.322 
## Bandwidth: 0.6432015 CV score: 116.322 
## Bandwidth: 0.6431608 CV score: 116.322 
## Bandwidth: 0.6432015 CV score: 116.322
## # of obs for 98109: 109
## Bandwidth: 4.603416 CV score: 38.66481 
## Bandwidth: 7.441046 CV score: 39.22783 
## Bandwidth: 2.849663 CV score: 38.1152 
## Bandwidth: 1.765785 CV score: 37.0517 
## Bandwidth: 1.095911 CV score: 34.67509 
## Bandwidth: 0.6819062 CV score: 30.87317 
## Bandwidth: 0.4260372 CV score: 28.32355 
## Bandwidth: 0.2679014 CV score: 27.73301 
## Bandwidth: 0.2228889 CV score: 28.21519 
## Bandwidth: 0.3207125 CV score: 27.77545 
## Bandwidth: 0.2880735 CV score: 27.70959 
## Bandwidth: 0.2876309 CV score: 27.70943 
## Bandwidth: 0.2855611 CV score: 27.70905 
## Bandwidth: 0.2788157 CV score: 27.71201 
## Bandwidth: 0.2851765 CV score: 27.70904 
## Bandwidth: 0.2852172 CV score: 27.70904 
## Bandwidth: 0.2852579 CV score: 27.70904 
## Bandwidth: 0.2852172 CV score: 27.70904
## # of obs for 98022: 234
## Bandwidth: 13.93232 CV score: 12.35501 
## Bandwidth: 22.52046 CV score: 12.40142 
## Bandwidth: 8.624558 CV score: 12.22707 
## Bandwidth: 5.344179 CV score: 11.85864 
## Bandwidth: 3.316794 CV score: 11.00234 
## Bandwidth: 2.063801 CV score: 10.21035 
## Bandwidth: 1.289409 CV score: 10.06907 
## Bandwidth: 1.265298 CV score: 10.07365 
## Bandwidth: 1.481085 CV score: 10.05208 
## Bandwidth: 1.479484 CV score: 10.05203 
## Bandwidth: 1.455107 CV score: 10.0517 
## Bandwidth: 1.391816 CV score: 10.05468 
## Bandwidth: 1.457784 CV score: 10.0517 
## Bandwidth: 1.457743 CV score: 10.0517 
## Bandwidth: 1.457824 CV score: 10.0517 
## Bandwidth: 1.457784 CV score: 10.0517
## # of obs for 98033: 431
## Bandwidth: 3.396635 CV score: 216.6281 
## Bandwidth: 5.490385 CV score: 227.3363 
## Bandwidth: 2.102627 CV score: 193.7634 
## Bandwidth: 1.302886 CV score: 163.393 
## Bandwidth: 0.8086192 CV score: 137.2835 
## Bandwidth: 0.5031453 CV score: 114.7577 
## Bandwidth: 0.314352 CV score: 113.1158 
## Bandwidth: 0.3757059 CV score: 111.2319 
## Bandwidth: 0.3946847 CV score: 111.1426 
## Bandwidth: 0.3924638 CV score: 111.1401 
## Bandwidth: 0.3919374 CV score: 111.14 
## Bandwidth: 0.3918967 CV score: 111.14 
## Bandwidth: 0.391978 CV score: 111.14 
## Bandwidth: 0.3919374 CV score: 111.14
## # of obs for 98155: 442
## Bandwidth: 2.557324 CV score: 128.1342 
## Bandwidth: 4.133705 CV score: 133.555 
## Bandwidth: 1.583066 CV score: 117.6912 
## Bandwidth: 0.9809418 CV score: 106.8429 
## Bandwidth: 0.6088086 CV score: 102.4296 
## Bandwidth: 0.3788176 CV score: 109.7516 
## Bandwidth: 0.7131633 CV score: 103.1319 
## Bandwidth: 0.5209598 CV score: 102.7246 
## Bandwidth: 0.5968741 CV score: 102.4024 
## Bandwidth: 0.5874787 CV score: 102.3922 
## Bandwidth: 0.5825212 CV score: 102.3912 
## Bandwidth: 0.5834094 CV score: 102.3911 
## Bandwidth: 0.5834871 CV score: 102.3911 
## Bandwidth: 0.5835278 CV score: 102.3911 
## Bandwidth: 0.5834871 CV score: 102.3911
## # of obs for 98024: 80
## Bandwidth: 4.765694 CV score: 22.80874 
## Bandwidth: 7.703356 CV score: 22.22945 
## Bandwidth: 9.518931 CV score: 22.13069 
## Bandwidth: 9.516428 CV score: 22.13078 
## Bandwidth: 10.64102 CV score: 22.09534 
## Bandwidth: 11.33451 CV score: 22.07894 
## Bandwidth: 11.76311 CV score: 22.07032 
## Bandwidth: 12.02799 CV score: 22.06548 
## Bandwidth: 12.1917 CV score: 22.06266 
## Bandwidth: 12.29288 CV score: 22.06097 
## Bandwidth: 12.35541 CV score: 22.05995 
## Bandwidth: 12.39406 CV score: 22.05933 
## Bandwidth: 12.41795 CV score: 22.05895 
## Bandwidth: 12.43271 CV score: 22.05871 
## Bandwidth: 12.44183 CV score: 22.05857 
## Bandwidth: 12.44747 CV score: 22.05848 
## Bandwidth: 12.45095 CV score: 22.05842 
## Bandwidth: 12.45311 CV score: 22.05839 
## Bandwidth: 12.45444 CV score: 22.05837 
## Bandwidth: 12.45526 CV score: 22.05836 
## Bandwidth: 12.45577 CV score: 22.05835 
## Bandwidth: 12.45608 CV score: 22.05834 
## Bandwidth: 12.45628 CV score: 22.05834 
## Bandwidth: 12.4564 CV score: 22.05834 
## Bandwidth: 12.45647 CV score: 22.05834 
## Bandwidth: 12.45652 CV score: 22.05834 
## Bandwidth: 12.45652 CV score: 22.05834
## # of obs for 98011: 194
## Bandwidth: 2.922014 CV score: 5.503633 
## Bandwidth: 4.723198 CV score: 5.552828 
## Bandwidth: 1.808822 CV score: 5.355322 
## Bandwidth: 1.12083 CV score: 5.122763 
## Bandwidth: 0.6956286 CV score: 5.448172 
## Bandwidth: 1.294302 CV score: 5.170189 
## Bandwidth: 0.9584178 CV score: 5.153721 
## Bandwidth: 1.108615 CV score: 5.121662 
## Bandwidth: 1.090606 CV score: 5.120848 
## Bandwidth: 1.040115 CV score: 5.124517 
## Bandwidth: 1.084429 CV score: 5.120807 
## Bandwidth: 1.085413 CV score: 5.120805 
## Bandwidth: 1.085497 CV score: 5.120805 
## Bandwidth: 1.085538 CV score: 5.120805 
## Bandwidth: 1.085457 CV score: 5.120805 
## Bandwidth: 1.085497 CV score: 5.120805
## # of obs for 98031: 272
## Bandwidth: 5.277303 CV score: 5.11857 
## Bandwidth: 8.53033 CV score: 5.113478 
## Bandwidth: 10.54081 CV score: 5.112241 
## Bandwidth: 11.23822 CV score: 5.111949 
## Bandwidth: 12.21438 CV score: 5.111616 
## Bandwidth: 12.81768 CV score: 5.111445 
## Bandwidth: 13.19054 CV score: 5.11135 
## Bandwidth: 13.42098 CV score: 5.111295 
## Bandwidth: 13.5634 CV score: 5.111262 
## Bandwidth: 13.65142 CV score: 5.111243 
## Bandwidth: 13.70582 CV score: 5.111231 
## Bandwidth: 13.73944 CV score: 5.111223 
## Bandwidth: 13.76022 CV score: 5.111219 
## Bandwidth: 13.77306 CV score: 5.111216 
## Bandwidth: 13.781 CV score: 5.111214 
## Bandwidth: 13.7859 CV score: 5.111213 
## Bandwidth: 13.78893 CV score: 5.111213 
## Bandwidth: 13.79081 CV score: 5.111212 
## Bandwidth: 13.79197 CV score: 5.111212 
## Bandwidth: 13.79268 CV score: 5.111212 
## Bandwidth: 13.79312 CV score: 5.111212 
## Bandwidth: 13.7934 CV score: 5.111212 
## Bandwidth: 13.79357 CV score: 5.111212 
## Bandwidth: 13.79367 CV score: 5.111212 
## Bandwidth: 13.79374 CV score: 5.111212 
## Bandwidth: 13.79378 CV score: 5.111212 
## Bandwidth: 13.79378 CV score: 5.111212
## # of obs for 98106: 330
## Bandwidth: 3.384394 CV score: 11.42128 
## Bandwidth: 5.470598 CV score: 11.81277 
## Bandwidth: 2.09505 CV score: 10.82058 
## Bandwidth: 1.298191 CV score: 10.36701 
## Bandwidth: 0.805705 CV score: 10.11311 
## Bandwidth: 0.501332 CV score: 9.803307 
## Bandwidth: 0.3132191 CV score: 9.934303 
## Bandwidth: 0.5073076 CV score: 9.809176 
## Bandwidth: 0.4475344 CV score: 9.764544 
## Bandwidth: 0.3962305 CV score: 9.768744 
## Bandwidth: 0.4272444 CV score: 9.760021 
## Bandwidth: 0.4260484 CV score: 9.759985 
## Bandwidth: 0.4249232 CV score: 9.759975 
## Bandwidth: 0.4250417 CV score: 9.759975 
## Bandwidth: 0.4250824 CV score: 9.759975 
## Bandwidth: 0.425001 CV score: 9.759975 
## Bandwidth: 0.4250417 CV score: 9.759975
## # of obs for 98072: 272
## Bandwidth: 4.952095 CV score: 24.94 
## Bandwidth: 8.004659 CV score: 25.14499 
## Bandwidth: 3.065507 CV score: 24.3775 
## Bandwidth: 1.899532 CV score: 23.10768 
## Bandwidth: 1.178919 CV score: 22.05427 
## Bandwidth: 0.7335563 CV score: 23.9101 
## Bandwidth: 1.387822 CV score: 22.21296 
## Bandwidth: 1.006591 CV score: 22.27918 
## Bandwidth: 1.213242 CV score: 22.05611 
## Bandwidth: 1.189454 CV score: 22.05351 
## Bandwidth: 1.190992 CV score: 22.0535 
## Bandwidth: 1.190804 CV score: 22.0535 
## Bandwidth: 1.190845 CV score: 22.0535 
## Bandwidth: 1.190763 CV score: 22.0535 
## Bandwidth: 1.190804 CV score: 22.0535
## # of obs for 98188: 135
## Bandwidth: 2.220828 CV score: 3.349983 
## Bandwidth: 3.589787 CV score: 3.277773 
## Bandwidth: 4.435851 CV score: 3.267356 
## Bandwidth: 4.350055 CV score: 3.268075 
## Bandwidth: 4.958747 CV score: 3.26392 
## Bandwidth: 5.281914 CV score: 3.262386 
## Bandwidth: 5.481643 CV score: 3.261596 
## Bandwidth: 5.605082 CV score: 3.261158 
## Bandwidth: 5.681371 CV score: 3.260904 
## Bandwidth: 5.728521 CV score: 3.260753 
## Bandwidth: 5.757661 CV score: 3.260661 
## Bandwidth: 5.77567 CV score: 3.260606 
## Bandwidth: 5.786801 CV score: 3.260572 
## Bandwidth: 5.79368 CV score: 3.260551 
## Bandwidth: 5.797931 CV score: 3.260538 
## Bandwidth: 5.800559 CV score: 3.26053 
## Bandwidth: 5.802182 CV score: 3.260525 
## Bandwidth: 5.803186 CV score: 3.260522 
## Bandwidth: 5.803806 CV score: 3.26052 
## Bandwidth: 5.80419 CV score: 3.260519 
## Bandwidth: 5.804427 CV score: 3.260518 
## Bandwidth: 5.804573 CV score: 3.260518 
## Bandwidth: 5.804664 CV score: 3.260518 
## Bandwidth: 5.80472 CV score: 3.260518 
## Bandwidth: 5.80476 CV score: 3.260517 
## Bandwidth: 5.80476 CV score: 3.260517
## # of obs for 98014: 123
## Bandwidth: 19.18704 CV score: 16.13944 
## Bandwidth: 31.01428 CV score: 16.32168 
## Bandwidth: 11.8774 CV score: 15.50768 
## Bandwidth: 7.359791 CV score: 14.39603 
## Bandwidth: 4.567757 CV score: 12.66188 
## Bandwidth: 2.842185 CV score: 11.16204 
## Bandwidth: 1.775723 CV score: 10.81563 
## Bandwidth: 1.475956 CV score: 12.05743 
## Bandwidth: 2.259285 CV score: 10.73751 
## Bandwidth: 2.114311 CV score: 10.67781 
## Bandwidth: 2.065209 CV score: 10.66745 
## Bandwidth: 1.954635 CV score: 10.67187 
## Bandwidth: 2.022651 CV score: 10.664 
## Bandwidth: 2.021164 CV score: 10.66398 
## Bandwidth: 2.018802 CV score: 10.66398 
## Bandwidth: 2.018997 CV score: 10.66397 
## Bandwidth: 2.019038 CV score: 10.66397 
## Bandwidth: 2.018956 CV score: 10.66397 
## Bandwidth: 2.018997 CV score: 10.66397
## # of obs for 98055: 260
## Bandwidth: 3.20555 CV score: 6.626513 
## Bandwidth: 5.18151 CV score: 6.718694 
## Bandwidth: 1.984339 CV score: 6.515938 
## Bandwidth: 1.229589 CV score: 6.384751 
## Bandwidth: 0.7631284 CV score: 6.327045 
## Bandwidth: 0.4748397 CV score: 6.69746 
## Bandwidth: 0.9413006 CV score: 6.323349 
## Bandwidth: 0.8729123 CV score: 6.317269 
## Bandwidth: 0.8625981 CV score: 6.316985 
## Bandwidth: 0.8501111 CV score: 6.316912 
## Bandwidth: 0.8532948 CV score: 6.316901 
## Bandwidth: 0.853425 CV score: 6.316901 
## Bandwidth: 0.8533843 CV score: 6.316901 
## Bandwidth: 0.8533436 CV score: 6.316901 
## Bandwidth: 0.8533843 CV score: 6.316901
## # of obs for 98039: 49
## Bandwidth: 1.420567 CV score: 74.28302 
## Bandwidth: 2.29623 CV score: 73.34184 
## Bandwidth: 2.83742 CV score: 73.11766 
## Bandwidth: 3.011033 CV score: 73.07181 
## Bandwidth: 3.279193 CV score: 73.01634 
## Bandwidth: 3.444924 CV score: 72.98913 
## Bandwidth: 3.547352 CV score: 72.97444 
## Bandwidth: 3.610656 CV score: 72.96607 
## Bandwidth: 3.64978 CV score: 72.96115 
## Bandwidth: 3.67396 CV score: 72.95819 
## Bandwidth: 3.688904 CV score: 72.9564 
## Bandwidth: 3.69814 CV score: 72.9553 
## Bandwidth: 3.703848 CV score: 72.95463 
## Bandwidth: 3.707376 CV score: 72.95422 
## Bandwidth: 3.709556 CV score: 72.95396 
## Bandwidth: 3.710904 CV score: 72.95381 
## Bandwidth: 3.711736 CV score: 72.95371 
## Bandwidth: 3.712251 CV score: 72.95365 
## Bandwidth: 3.712569 CV score: 72.95361 
## Bandwidth: 3.712766 CV score: 72.95359 
## Bandwidth: 3.712887 CV score: 72.95357 
## Bandwidth: 3.712962 CV score: 72.95357 
## Bandwidth: 3.713009 CV score: 72.95356 
## Bandwidth: 3.713009 CV score: 72.95356

Clustering

We decided to perform clustering on the GWR results. For this purpose, we first extracted the x and y coordinates together with the local coefficients of each variable into a data frame called “all_coefficients”.

We then used the elbow method to find the best number of clusters. During that process, however, we noticed that some of the extracted coefficients contain missing values, so we removed those rows before further analysis.

# Function to extract coefficients from GWR results
extract_coefficients <- function(gwr_result) {
  coefficients <- gwr_result$SDF %>% as.data.frame()
  selected_columns <- coefficients %>% select(X, Y, bathrooms, sqft_living, grade, sqft_living15)
  selected_columns <- na.omit(selected_columns) # Remove rows with NA values
  return(selected_columns)
}

# Create data frame to combine coefficients for all zip codes
all_coefficients <- lapply(gwr_results, extract_coefficients) %>% bind_rows()
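Because `na.omit()` drops incomplete rows silently, it may be worth recording how many rows each zip code loses before the tables are combined. The sketch below assumes, hypothetically, that the per-zip-code coefficient tables sit in a named list similar to what `lapply(gwr_results, ...)` returns; the toy list `coef_list` and all its values are invented for illustration.

```r
# Hypothetical stand-in for the per-zip-code coefficient tables
coef_list <- list(
  "98001" = data.frame(X = c(-122.3, -122.2, -122.1),
                       Y = c(47.30, 47.31, 47.32),
                       bathrooms = c(0.01, NA, 0.03),
                       sqft_living = c(0.20, 0.21, NA)),
  "98002" = data.frame(X = c(-122.2, -122.1),
                       Y = c(47.40, 47.41),
                       bathrooms = c(0.02, 0.04),
                       sqft_living = c(0.30, 0.31))
)

# Count rows with at least one NA in each zip code
na_rows <- vapply(coef_list, function(df) sum(!complete.cases(df)), integer(1))
print(na_rows)

# Drop incomplete rows and stack the tables, keeping the zip code as a column
combined <- do.call(rbind, Map(function(df, zip) {
  df <- df[complete.cases(df), ]
  df$zipcode <- rep(zip, nrow(df))  # rep() also works if no rows remain
  df
}, coef_list, names(coef_list)))
rownames(combined) <- NULL
print(combined)
```

With dplyr, `bind_rows(..., .id = "zipcode")` on a named list gives the same stacking with an identifier column.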

Choosing the Number of Clusters: Elbow Method

Instead of the fviz_nbclust function, which is more memory-intensive and much slower and therefore less suited to large datasets, we used a custom function. This approach computes the WSS manually for a range of cluster numbers and then plots the results; it needs more lines of code but runs much faster.

Based on the elbow graph, the optimal number of clusters is 2 (shown as the red point). The elbow is located by calculating the second differences (a discrete second derivative) of the WSS (within-cluster sum of squares) values and choosing the k at which the curve bends most sharply.
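The quantity the custom `wss()` function collects is `tot.withinss`: the sum, over all points, of the squared Euclidean distance to the centre of the cluster each point is assigned to. A small sketch on simulated data (not the housing data) that recomputes it by hand:

```r
set.seed(1)
# Simulated two-group data standing in for the coefficient table
toy <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 5), ncol = 2))

fit <- kmeans(toy, centers = 2, nstart = 10)

# Squared distance of every point to its own cluster centre, summed
manual_wss <- sum((toy - fit$centers[fit$cluster, ])^2)

# Compare with the value reported by kmeans()
all.equal(manual_wss, fit$tot.withinss)
```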
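A worked example of the second-difference rule, using a hypothetical WSS vector for k = 1 to 6 (invented values, not results from this analysis), shows why the code adds 1 to the index returned by `which.max()`:

```r
# Hypothetical WSS values for k = 1..6
wss_toy <- c(100, 40, 30, 25, 22, 20)

first_diff  <- diff(wss_toy)     # -60 -10  -5  -3  -2
second_diff <- diff(first_diff)  #  50   5   2   1

# second_diff[i] is centred on k = i + 1, hence the + 1
elbow_k <- which.max(second_diff) + 1
elbow_k
```

Here `elbow_k` is 2. The shift maps an index back to k only because `k.values` starts at 1; with another starting value the index would need to be looked up in `k.values`. The rule also cannot select the first or last k, since no second difference is defined there.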

# Compute WSS for a range of k
wss <- function(k) {
  kmeans(all_coefficients, centers = k, nstart = 25, iter.max = 2000)$tot.withinss
}
k.values <- 1:15
wss_values <- map_dbl(k.values, wss)
## Warning: Quick-TRANSfer stage steps exceeded maximum (= 1071450)
# Plot WSS vs. k to find the elbow point
plot(k.values, wss_values, type = "b", pch = 19, frame = FALSE,
     xlab = "Number of clusters K", ylab = "Total within-clusters sum of squares")

# Approximate the second derivative with second differences of the WSS curve
wss_diff <- diff(wss_values)
wss_diff2 <- diff(wss_diff)
# wss_diff2[i] is centred on k = i + 1 (valid because k.values starts at 1)
elbow <- which.max(wss_diff2) + 1

# Mark the elbow point on the existing WSS plot
points(elbow, wss_values[elbow], col = "red", pch = 19, cex = 2)

Perform Clustering

We performed k-means clustering with the chosen number of clusters, 2, using set.seed() to make the result reproducible. We then assigned the cluster labels to the coefficient data and printed the centers of each cluster. Cluster 1 stands out: its bathrooms coefficient has the opposite (negative) sign, and its coefficients for sqft_living, grade and sqft_living15 are higher, indicating that it has distinct characteristics compared to the other cluster.

set.seed(12356)

optimal_k <- 2
kmeans_result <- kmeans(all_coefficients, centers = optimal_k, nstart = 25, iter.max = 100)

# Add cluster assignments to the original data
all_coefficients$cluster <- kmeans_result$cluster

# Print cluster centers
print(kmeans_result$centers)
##           X        Y   bathrooms sqft_living     grade sqft_living15
## 1 -122.3049 47.64305 -0.06720048   0.5506481 0.2043207     0.1593478
## 2 -122.1746 47.52456  0.02859060   0.2296444 0.1214938     0.0669485
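`set.seed()` fixes the random starting centres, so repeating the call with the same seed reproduces the same assignments. The cluster numbers themselves are arbitrary, though: another seed can return the same groups with the labels swapped, so a cluster is best identified by its centre rather than its number. A sketch on simulated data (the helper `run_kmeans()` is introduced here for illustration):

```r
# Simulated, well-separated data standing in for all_coefficients
set.seed(42)
toy <- rbind(matrix(rnorm(60, mean = 0), ncol = 3),
             matrix(rnorm(60, mean = 4), ncol = 3))

# Hypothetical helper: fix the seed, then run k-means
run_kmeans <- function(seed) {
  set.seed(seed)
  kmeans(toy, centers = 2, nstart = 25)
}

a <- run_kmeans(12356)
b <- run_kmeans(12356)
identical(a$cluster, b$cluster)  # same seed, same assignments

# Compare partitions from different seeds independently of label numbers
c_run <- run_kmeans(1)
table(a$cluster, c_run$cluster)
```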

Plotting the Cluster Results

Finally, we plotted the clustering results on the map. The map shows that the first cluster is concentrated in the northern part of the county, while the second cluster is spread across both the northern and southern parts. This separation suggests a clear geographic, and potentially socioeconomic, distinction between the two clusters.

However, two clusters may not be enough to capture all of the differences in our dataset, so we increased the number of clusters to 4 to check whether a finer partition reveals more detailed variation.

set.seed(123456)
# Create a data frame to hold the cluster assignments
cluster_assignments <- data.frame( X = all_coefficients$X, Y = all_coefficients$Y, cluster = as.factor(kmeans_result$cluster))

# Convert to sf object for plotting
sf_clusters <- st_as_sf(cluster_assignments, coords = c("X", "Y"), crs = 4326)

# Plot the clusters
ggplot(data = sf_clusters) +
  geom_sf(aes(color = cluster), size = 0.5) +
  labs(title = "GWR Coefficient Clusters", color = "Cluster") +
  theme_minimal()

Increasing the Number of Clusters

When we increased the number of clusters to 4, we still saw a clear difference between the northern and southern regions. The southern cluster stayed almost the same, but in the north three separate clusters formed from east to west, which means that the northern part shows more detailed variation than the south.

However, some clusters looked widely spread out on the map. To see the distribution of each cluster more clearly, we also plotted the clusters in separate panels using facet_wrap(), which gives a clearer view of their individual distributions.
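How the two-cluster solution splits into four can be read off a cross-tabulation of the two label vectors; on the real data this would be `table(all_coefficients$cluster, all_coefficients$cluster2)`. A sketch with hypothetical label vectors for eight points:

```r
# Hypothetical labels from the 2-cluster and 4-cluster runs
k2 <- c(1, 1, 1, 1, 2, 2, 2, 2)
k4 <- c(2, 4, 4, 2, 1, 1, 3, 1)

# Rows: 2-cluster label, columns: 4-cluster label
split_tab <- table(k2 = k2, k4 = k4)
split_tab

# Share of each 2-cluster group that falls into each 4-cluster group
prop.table(split_tab, margin = 1)
```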
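As a numeric complement to the faceted map, the size and coordinate spread of each cluster can be summarised directly. A sketch on a hypothetical table with the same `X`, `Y` and `cluster` columns as `cluster_assignments2` (all values invented):

```r
# Hypothetical coordinates and 4-cluster labels
pts <- data.frame(
  X = c(-122.33, -122.31, -122.24, -122.22, -122.12, -122.10, -122.25, -122.20),
  Y = c(47.65, 47.64, 47.62, 47.63, 47.66, 47.64, 47.40, 47.44),
  cluster = factor(c(4, 4, 2, 2, 3, 3, 1, 1))
)

# Number of points per cluster
sizes <- table(pts$cluster)
sizes

# Standard deviation of longitude and latitude within each cluster;
# larger values flag clusters whose points are widely dispersed
spread <- aggregate(cbind(X, Y) ~ cluster, data = pts, FUN = sd)
spread
```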

set.seed(12356)

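increased_k <- 4
# Use only the original feature columns; `cluster` (the two-cluster label) was
# added to all_coefficients above. The centers printed below contain a `cluster`
# column because they were produced by a run that passed the whole table.
kmeans_result2 <- kmeans(dplyr::select(all_coefficients, -cluster), centers = increased_k, nstart = 25, iter.max = 100)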

# Add cluster assignments to the original data
all_coefficients$cluster2 <- kmeans_result2$cluster

# Print cluster centers
print(kmeans_result2$centers)
##           X        Y   bathrooms sqft_living     grade sqft_living15 cluster
## 1 -122.2257 47.41691  0.01135620   0.1883257 0.1007914    0.05626240       2
## 2 -122.2376 47.62250 -0.06190791   0.9174346 0.2442467    0.08458150       1
## 3 -122.1139 47.65234  0.04904907   0.2786927 0.1460691    0.07963366       2
## 4 -122.3206 47.64785 -0.06843470   0.4651137 0.1950099    0.17678320       1
# Create a data frame to hold the cluster assignments
cluster_assignments2 <- data.frame( X = all_coefficients$X, Y = all_coefficients$Y, cluster = as.factor(all_coefficients$cluster2))

# Convert to sf object for plotting
sf_clusters2 <- st_as_sf(cluster_assignments2, coords = c("X", "Y"), crs = 4326)

# Plot the clusters
ggplot(data = sf_clusters2) +
  geom_sf(aes(color = cluster), size = 0.5) +
  labs(title = "GWR Coefficient Clusters", color = "Cluster") +
  theme_minimal()

# Plot each cluster in its own panel with facet_wrap()
ggplot(data = sf_clusters2) +
  geom_sf(aes(color = cluster), size = 1, alpha = 0.6) +
  labs(title = "GWR Coefficient Clusters",
       subtitle = "Clustering of GWR Coefficients by Zip Code",
       color = "Cluster") +
  theme_minimal() +
  facet_wrap(~ cluster)
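Because `cluster` is added to `all_coefficients` before the four-cluster step, passing the whole table to `kmeans()` treats the two-cluster label as a feature, which is what the `cluster` column in the printed four-cluster centers reflects. A minimal sketch of keeping model inputs separate from label columns, using a hypothetical toy table and a `feature_cols` vector introduced here for illustration:

```r
set.seed(7)
# Hypothetical table: feature columns plus a previously added label
coefs <- data.frame(X = rnorm(20), Y = rnorm(20),
                    bathrooms = rnorm(20), sqft_living = rnorm(20))
coefs$cluster <- sample(1:2, 20, replace = TRUE)

# Columns that should enter k-means; labels are kept out explicitly
feature_cols <- c("X", "Y", "bathrooms", "sqft_living")

fit4 <- kmeans(coefs[, feature_cols], centers = 4, nstart = 25)
colnames(fit4$centers)  # only the feature columns

# Store the new labels alongside, without feeding them back in
coefs$cluster2 <- fit4$cluster
```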

Conclusion

In this study, we examined the variation in housing prices in King County, Washington, using Geographically Weighted Regression (GWR). First, we carried out a detailed Exploratory Data Analysis (EDA) to identify the important factors that affect house prices and estimated an OLS model.

We then grouped the data by zip code and performed a GWR for each zip code. The GWR analysis showed how the relationships between these factors and prices change across locations.

Even though we received some warnings during the GWR modeling, which need further investigation, we continued with k-means clustering on the resulting coefficients. Based on the elbow graph we chose two clusters, and we then also examined four. The first cluster had distinct features, with higher coefficient values and an opposite (negative) sign for the bathrooms coefficient.

We plotted the clustering results and observed distinct spatial patterns in the data, which highlights the importance of accounting for local differences in real estate analysis. In future work, we should resolve the warnings and refine the clustering method to better understand the relationships in housing markets.

Resources

Lecture Notes and Codes: Spatial econometrics in R [2400-ZEWW780] / prof. Kopczewska

Own GitHub Repositories:

https://github.com/gizemguleli/House-Prices-in-King-County-Washington

https://github.com/gizemguleli/RR_Project-Housing-Price-Dynamics-in-King-Cunty-WA

Tools and Software: R Studio and associated libraries

Data Sources:

https://www.kaggle.com/datasets/shivachandel/kc-house-data/data

https://gis-kingcounty.opendata.arcgis.com/