Property Data Analysis: School Performance, Pricing, and Sales Velocity

This document provides R code and analysis to explore the relationships between various property features, school performance, pricing, and sales dynamics within a Property_Schools dataset.

1. Setup and Data Loading

First, we need to load the necessary R libraries and the dataset:

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
## corrplot 0.92 loaded
## 
## Loading required package: viridisLite
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 16104 Columns: 85
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (46): Address, Street Name, City/Location, School Name, Tax ID, Bed Room...
## dbl (32): UniqueID, Zip Code, Close Price, Tax Rate, Tax Year, Baths Full, B...
## lgl  (7): HOA Mandatory, Maintenance Fee Yes No, Management Co YN, Pool Area...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 6 × 85
##   UniqueID Address        `Street Name` `City/Location` `Zip Code` `School Name`
##      <dbl> <chr>          <chr>         <chr>                <dbl> <chr>        
## 1 18577356 185 Capetown   Capetown      Conroe               77356 Montgomery I…
## 2  1177382 11 LANSDOWNE   LANSDOWNE     The Woodlands        77382 Conroe Indep…
## 3       NA 203 N Vershir… Vershire      Magnolia             77354 <NA>         
## 4       NA 27037 Renly R… Renly Ridge   Magnolia             77354 <NA>         
## 5       NA 119 Keechie C… Keechie Creek Conroe               77304 <NA>         
## 6  2377380 23 Olmstead R… Olmstead      The Woodlands        77380 Conroe Indep…
## # ℹ 79 more variables: `Close Price` <dbl>, `Tax ID` <chr>, `Tax Rate` <dbl>,
## #   `Tax Year` <dbl>, `Baths Full` <dbl>, `Baths Half` <dbl>,
## #   `Baths Total` <dbl>, `Bed Room Description` <chr>, Bedrooms <dbl>,
## #   `Beds Max` <dbl>, `Building SqFt` <dbl>, CDOM <dbl>, `Close Date` <chr>,
## #   Country <chr>, County <chr>, `Current Price` <dbl>, `Current $/SqFt` <dbl>,
## #   `(Current $/SqFT) / (List $/SqFt)` <dbl>, `Garage Carport` <chr>,
## #   `Garage Desc` <chr>, `Heat System` <chr>, `HOA Mandatory` <lgl>, …
##     UniqueID           Address          Street Name        City/Location     
##  Min.   :2.774e+05   Length:16104       Length:16104       Length:16104      
##  1st Qu.:2.527e+08   Class :character   Class :character   Class :character  
##  Median :6.618e+08   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :      Inf                                                           
##  3rd Qu.:1.724e+09                                                           
##  Max.   :      Inf                                                           
##  NA's   :6743                                                                
##     Zip Code     School Name         Close Price        Tax ID         
##  Min.   :77002   Length:16104       Min.   :  1250   Length:16104      
##  1st Qu.:77316   Class :character   1st Qu.:  2050   Class :character  
##  Median :77386   Mode  :character   Median :  2400   Mode  :character  
##  Mean   :77346                      Mean   : 41033                     
##  3rd Qu.:77459                      3rd Qu.:  2900                     
##  Max.   :77598                      Max.   :600000                     
##  NA's   :3                          NA's   :4057                       
##     Tax Rate           Tax Year       Baths Full      Baths Half    
##  Min.   :   0.233   Min.   :2011    Min.   :2.000   Min.   :0.0000  
##  1st Qu.:   2.580   1st Qu.:2021    1st Qu.:2.000   1st Qu.:0.0000  
##  Median :   2.914   Median :2021    Median :2.000   Median :1.0000  
##  Mean   :   3.451   Mean   :2021    Mean   :2.368   Mean   :0.5794  
##  3rd Qu.:   3.200   3rd Qu.:2021    3rd Qu.:3.000   3rd Qu.:1.0000  
##  Max.   :2513.000   Max.   :3312    Max.   :5.000   Max.   :5.0000  
##  NA's   :11504      NA's   :12159                                   
##   Baths Total    Bed Room Description    Bedrooms        Beds Max     
##  Min.   :2.000   Length:16104         Min.   :3.000   Min.   : 0.000  
##  1st Qu.:2.000   Class :character     1st Qu.:3.000   1st Qu.: 3.000  
##  Median :2.100   Mode  :character     Median :4.000   Median : 4.000  
##  Mean   :2.426                        Mean   :3.693   Mean   : 3.845  
##  3rd Qu.:3.000                        3rd Qu.:4.000   3rd Qu.: 4.000  
##  Max.   :5.100                        Max.   :8.000   Max.   :32.000  
##                                                       NA's   :12434   
##  Building SqFt       CDOM         Close Date          Country         
##  Min.   :1700   Min.   : -7.00   Length:16104       Length:16104      
##  1st Qu.:1967   1st Qu.:  9.00   Class :character   Class :character  
##  Median :2264   Median : 21.00   Mode  :character   Mode  :character  
##  Mean   :2365   Mean   : 34.08                                        
##  3rd Qu.:2654   3rd Qu.: 42.00                                        
##  Max.   :4000   Max.   :687.00                                        
##                                                                       
##     County          Current Price    Current $/SqFt    
##  Length:16104       Min.   :  1250   Min.   :  0.4883  
##  Class :character   1st Qu.:  2166   1st Qu.:  0.9758  
##  Mode  :character   Median :  2695   Median :  1.1350  
##                     Mean   :131044   Mean   : 55.3024  
##                     3rd Qu.:325000   3rd Qu.:144.0497  
##                     Max.   :600000   Max.   :317.1892  
##                                                        
##  (Current $/SqFT) / (List $/SqFt) Garage Carport     Garage Desc       
##  Min.   :0.6412                   Length:16104       Length:16104      
##  1st Qu.:1.0000                   Class :character   Class :character  
##  Median :1.0000                   Mode  :character   Mode  :character  
##  Mean   :0.9994                                                        
##  3rd Qu.:1.0000                                                        
##  Max.   :2.5714                                                        
##                                                                        
##  Heat System        HOA Mandatory    HOA Phone         HOA Website       
##  Length:16104       Mode :logical   Length:16104       Length:16104      
##  Class :character   FALSE:411       Class :character   Class :character  
##  Mode  :character   TRUE :4828      Mode  :character   Mode  :character  
##                     NA's :10865                                          
##                                                                          
##                                                                          
##                                                                          
##     Legal           Legal Subdivision    List Price      List $/SqFt    
##  Length:16104       Length:16104       Min.   :  1250   Min.   :  0.49  
##  Class :character   Class :character   1st Qu.:  2175   1st Qu.:  0.98  
##  Mode  :character   Mode  :character   Median :  2695   Median :  1.13  
##                                        Mean   :131469   Mean   : 55.48  
##                                        3rd Qu.:325000   3rd Qu.:144.34  
##                                        Max.   :690201   Max.   :317.56  
##                                                                         
##  Lot Description    Lot Price Sq Ft List Lot Price Sq Ft Sold    Lot Size     
##  Length:16104       Min.   :    0.00     Min.   :  0.000      Min.   :     0  
##  Class :character   1st Qu.:    0.33     1st Qu.:  0.320      1st Qu.:  5352  
##  Mode  :character   Median :    0.42     Median :  0.370      Median :  6386  
##                     Mean   :   27.86     Mean   :  6.735      Mean   :  6821  
##                     3rd Qu.:   46.14     3rd Qu.:  0.480      3rd Qu.:  7674  
##                     Max.   :12355.36     Max.   :341.490      Max.   :701721  
##                     NA's   :2707         NA's   :5926         NA's   :2668    
##  Lot Size Source    Maint Fee Amt    Maint Fee Includes Maint Fee Pay Schedule
##  Length:16104       Min.   :   0.0   Length:16104       Length:16104          
##  Class :character   1st Qu.: 500.0   Class :character   Class :character      
##  Mode  :character   Median : 770.0   Mode  :character   Mode  :character      
##                     Mean   : 808.1                                            
##                     3rd Qu.:1050.0                                            
##                     Max.   :3317.0                                            
##                     NA's   :11307                                             
##  Maintenance Fee Yes No Management Co YN Management Company Market Area       
##  Mode:logical           Mode :logical    Length:16104       Length:16104      
##  NA's:16104             FALSE:6622       Class :character   Class :character  
##                         TRUE :4240       Mode  :character   Mode  :character  
##                         NA's :5242                                            
##                                                                               
##                                                                               
##                                                                               
##  Master Planned Community Original List Price Other Mandatory Fee
##  Length:16104             Min.   :      2     Length:16104       
##  Class :character         1st Qu.:   2200     Class :character   
##  Mode  :character         Median :   2715     Mode  :character   
##                           Mean   : 135034                        
##                           3rd Qu.: 330000                        
##                           Max.   :4150000                        
##                           NA's   :1                              
##  Pool Area       Pool Private    Pool Private Desc    Price Sale     
##  Mode :logical   Mode :logical   Length:16104       Min.   :      0  
##  FALSE:5840      FALSE:16104     Class :character   1st Qu.: 340000  
##  TRUE :8277                      Mode  :character   Median : 434990  
##  NA's :1987                                         Mean   : 463654  
##                                                     3rd Qu.: 540000  
##                                                     Max.   :4300000  
##                                                     NA's   :15915    
##  Property Description Property Type      School District    School District KW
##  Length:16104         Length:16104       Length:16104       Length:16104      
##  Class :character     Class :character   Class :character   Class :character  
##  Mode  :character     Mode  :character   Mode  :character   Mode  :character  
##                                                                               
##                                                                               
##                                                                               
##                                                                               
##  School Elementary   EM (lower)        EM - District      EM - Great Schools
##  Length:16104       Length:16104       Length:16104       Min.   : 0.000    
##  Class :character   Class :character   Class :character   1st Qu.: 4.000    
##  Mode  :character   Mode  :character   Mode  :character   Median : 5.000    
##                                                           Mean   : 5.632    
##                                                           3rd Qu.: 7.000    
##                                                           Max.   :10.000    
##                                                           NA's   :6730      
##    EM - Score     School Middle      School High        Section Num       
##  Min.   :  0.00   Length:16104       Length:16104       Length:16104      
##  1st Qu.:100.00   Class :character   Class :character   Class :character  
##  Median :100.00   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 97.92                                                           
##  3rd Qu.:100.00                                                           
##  Max.   :100.00                                                           
##  NA's   :6730                                                             
##   Sold Terms         SP/LP Ratio    Sales Velocity State Or Province 
##  Length:16104       Min.   :0.641   Mode:logical   Length:16104      
##  Class :character   1st Qu.:1.000   NA's:16104     Class :character  
##  Mode  :character   Median :1.000                  Mode  :character  
##                     Mean   :0.999                                    
##                     3rd Qu.:1.000                                    
##                     Max.   :2.571                                    
##                     NA's   :4057                                     
##     Status             Stories      Street Dir Prefix  Street Dir Suffix 
##  Length:16104       Min.   :1.000   Length:16104       Length:16104      
##  Class :character   1st Qu.:1.000   Class :character   Class :character  
##  Mode  :character   Median :2.000   Mode  :character   Mode  :character  
##                     Mean   :1.686                                        
##                     3rd Qu.:2.000                                        
##                     Max.   :4.000                                        
##                     NA's   :713                                          
##  Street Number   Street Suffix      Street Surface     Subdivision       
##  Min.   :    0   Length:16104       Length:16104       Length:16104      
##  1st Qu.: 2913   Class :character   Class :character   Class :character  
##  Median : 7811   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :10702                                                           
##  3rd Qu.:18106                                                           
##  Max.   :42561                                                           
##  NA's   :24                                                              
##    Tax Amount     Utilities         Utility District Water Sewer       
##  Min.   :    0   Length:16104       Mode :logical    Length:16104      
##  1st Qu.: 5505   Class :character   FALSE:983        Class :character  
##  Median : 6906   Mode  :character   TRUE :4699       Mode  :character  
##  Mean   : 6806                      NA's :10422                        
##  3rd Qu.: 8645                                                         
##  Max.   :23307                                                         
##  NA's   :13473                                                         
##    Year Built    Year Built Src      Zip Code + 4  Welcome Version
##  Min.   : 2000   Length:16104       Min.   :   1   Min.   :1.000  
##  1st Qu.: 2006   Class :character   1st Qu.:2266   1st Qu.:1.000  
##  Median : 2014   Mode  :character   Median :3828   Median :3.000  
##  Mean   : 2015                      Mean   :4079   Mean   :2.931  
##  3rd Qu.: 2021                      3rd Qu.:5619   3rd Qu.:4.000  
##  Max.   :20222                      Max.   :9865   Max.   :5.000  
##                                     NA's   :5278

2. Impact of School Performance on Property Value & Sales

This section explores how school performance ratings influence property pricing and how quickly properties sell. Indicators for property selling performance include: CDOM (cumulative days on the market), listing price, and sales velocity.

Question 1: How do School Performance Ratings correlate with Listing Price?

Analysis: We’ll examine the relationship between school performance (EM - Great Schools) and Listing Price. If school performance is numeric (0 - 10), we can calculate a correlation coefficient. If it’s categorical (e.g., A, B, C), we’ll use box plots to compare price distributions across different rating categories. We can also change the class of the school performance variable from numeric to categorical and construct a boxplot.

## Warning: Removed 6730 rows containing non-finite outside the scale range
## (`stat_bin()`).

## Warning in viridisLite::viridis(256, alpha, begin, end, direction, option):
## Option 'YlGnBu' does not exist. Defaulting to 'viridis'.
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 6730 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: The following aesthetics were dropped during statistical transformation: size.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## Warning: Removed 6730 rows containing missing values or values outside the scale range
## (`geom_point()`).

Results: Higher Performing Schools have a higher Listing Price.

The boxplot suggests that schools with higher performance have a greater median (50th percentile) list price and properties located in lower performing school zones have a lower list price. There could be other factors that influence the list price such as the building square footage, number of bedrooms, and number of bathrooms.

Question 2: Do properties in areas with higher School Performance Ratings have a faster Sales Velocity (lower CDOM)?

Analysis: We’ll investigate the relationship between School Performance Ratings and CDOM (Cumulative Days on Market). A lower CDOM indicates faster sales velocity where sales velocity is calculated by taking the list price and dividing by the CDOM.

## Correlation between School Performance Ratings and CDOM: -0.05094555
## `geom_smooth()` using formula = 'y ~ x'

## `geom_smooth()` using formula = 'y ~ x'

Observation:

Single-Family Homes in high and low performing school zones sell faster. Higher performing school zones have higher quality homes and lower performing school zones have lower listing prices.

#Question 3: Is there a premium on Listing Price for properties within top-rated school districts, even when controlling for other features like Building Square Footage and Number of Bedrooms/Bathrooms? We can generate a linear regression model to test for influence on the features on the list price of the properties. One observation is that the list price is not normally distributed suggesting their might be higher variability at the tails of the distribution.

## 
## --- Regression Model: Listing Price Premium by School Performance ---
## 
## Call:
## lm(formula = `List Price` ~ `EM - Great Schools` + `Building SqFt` + 
##     Bedrooms + `Baths Full`, data = df_reg)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -348140 -125212  -93388  201863  512913 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          -33045.040  13609.253  -2.428   0.0152 *  
## `EM - Great Schools`  -9222.467    967.042  -9.537  < 2e-16 ***
## `Building SqFt`          26.943      5.188   5.194  2.1e-07 ***
## Bedrooms               -870.997   3751.254  -0.232   0.8164    
## `Baths Full`          66868.912   4379.326  15.269  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 187500 on 9369 degrees of freedom
## Multiple R-squared:  0.05887,    Adjusted R-squared:  0.05847 
## F-statistic: 146.5 on 4 and 9369 DF,  p-value: < 2.2e-16

## Warning: package 'rgl' was built under R version 4.4.1
## Warning in rgl.planes(a = coef(model_price_premium)[2], b = coef(model_price_premium)[3], : 'rgl.planes' is deprecated.
## Use 'planes3d' instead.
## See help("Deprecated")

The regression model shows the theoretical quantiles are not normally distributed and the variability of the distribution is greater at the lower fitted values.

Question 4: How does the presence of a pool within higher performing schools affect the listing price or sales velocity?

We can filter out the poor performing schools and with the remaining high performing schools, we can evaludate the CDOM and Listing Price for properties with and without a pool.

## 
## Attaching package: 'scales'
## The following object is masked from 'package:viridis':
## 
##     viridis_pal
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor

## `geom_smooth()` using formula = 'y ~ x'
## Warning: The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?
## The following aesthetics were dropped during statistical transformation:
## colour.
## ℹ This can happen when ggplot fails to infer the correct grouping structure in
##   the data.
## ℹ Did you forget to specify a `group` aesthetic or to convert a numerical
##   variable into a factor?

# Observation: Our results indicate that properties with pool areas have a much higher listing price (on average). Interestingly, as the school performance increases, the probability that a property will have a pool also increases suggesting a positive correlation between school performance and quality in amenities. For properties that have a pool, the distribution is highly skewed as the the third quartile (top of the box and whisker plot with teal color) is much higher in the listing price compared to the median (50th percentile) and 25th percentile. This suggests that there might be other factors that influence the list price such as the number of bedrooms, bathrooms, and building square footage.