What Demographic Variables May Influence Median House Value for North Carolina in 2020?

Author: David Parr, daveparr@unc.edu

Date: March 2025

Introduction

In this Spatial Data Science research project, I’m examining which demographic factors relate to median home value across North Carolina counties in 2020. In particular, this research examines whether the percentage of people working from home is related to home value and, if so, how large the association is. Other Census demographic variables (the percentage of people identifying as white alone, median age, and median income) are also incorporated into the model.

The project will explore the data, build a model, and discuss how well the model fits the data statistically and spatially using residuals.

Background

Home prices and cost of living display geographic relationships, but they aren’t a common topic among geographers or spatial data scientists (Campbell and James, 2021). Campbell and James (2021) developed a Housing Cost Index, the ratio of median home value to median household income, as a proxy for cost of living. In their paper, they used population density, 5-year population growth, property tax rates, and Boolean variables for regional variation to examine spatial patterns in cost of living for North Carolina, primarily using the 2020 Census. One of their hypotheses was that second (vacation) homes in the mountainous west and along the beachfront east of North Carolina increase home values and cost of living disproportionately.
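As a rough illustration of the Housing Cost Index described above, the ratio can be computed directly from two columns. This is a minimal sketch, not Campbell and James's code: the `housing_cost_index()` helper and the `toy` data frame are hypothetical, the three value pairs are the first three counties shown in the glimpse() output in the Data and Methods section, and it assumes this dataset's median.incomeE is an acceptable stand-in for median household income, which it may not be.

```r
# Hypothetical helper: ratio of median home value to median income.
housing_cost_index <- function(home_value, income) {
  home_value / income
}

# First three counties from the glimpse() output (illustration only).
toy <- data.frame(
  median.house.valueE = c(85800, 254100, 258200),
  median.incomeE      = c(20777, 35468, 31054)
)

toy$hci <- housing_cost_index(toy$median.house.valueE, toy$median.incomeE)
print(round(toy$hci, 2))
```

A higher ratio means homes cost more relative to local income.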

Data and Methods

For this research, the R code is embedded below. The libraries used include tidyverse, sf, and tmap.

# Load the libraries we'll use in our research

library(tidyverse)
library(sf)
library(tmap)

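The primary dataset is the 2020 Census data for North Carolina counties. It has 100 rows (one per county) and 97 columns: GEOID, NAME, geometry, and an E and M column for each of 47 Census variables.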

# load the 2020 NC counties

ncCounties <- read_sf("https://drive.google.com/uc?export=view&id=1Ks38Gmdlp58GBhOgb6FBmx8tIAfu0QDw")
glimpse(ncCounties)
## Rows: 100
## Columns: 97
## $ GEOID                                    <chr> "37083", "37179", "37129", "3…
## $ NAME                                     <chr> "Halifax County, North Caroli…
## $ popE                                     <dbl> 50678, 235767, 231448, 63284,…
## $ popM                                     <dbl> NA, NA, NA, NA, NA, 48, NA, N…
## $ maleE                                    <dbl> 24445, 116112, 110218, 31128,…
## $ maleM                                    <dbl> 170, 46, 271, 60, 127, 110, 1…
## $ femaleE                                  <dbl> 26233, 119655, 121230, 32156,…
## $ femaleM                                  <dbl> 170, 46, 271, 60, 127, 105, 1…
## $ white.aloneE                             <dbl> 20078, 187121, 186681, 38439,…
## $ white.aloneM                             <dbl> 301, 1207, 1030, 951, 238, 37…
## $ black.aloneE                             <dbl> 26439, 27297, 30436, 16115, 3…
## $ black.aloneM                             <dbl> 390, 786, 772, 367, 266, 2073…
## $ native.aloneE                            <dbl> 1771, 574, 544, 1318, 212, 34…
## $ native.aloneM                            <dbl> 154, 147, 201, 259, 91, 576, …
## $ asian.aloneE                             <dbl> 367, 7804, 3145, 370, 786, 80…
## $ asian.aloneM                             <dbl> 128, 306, 363, 33, 116, 1104,…
## $ hawaiian.aloneE                          <dbl> 98, 68, 205, 0, 45, 470, 426,…
## $ hawaiian.aloneM                          <dbl> 92, 48, 14, 31, 59, 107, 126,…
## $ two.or.more.racesE                       <dbl> 1597, 7390, 6418, 2110, 2520,…
## $ two.or.more.racesM                       <dbl> 462, 987, 863, 552, 382, 2332…
## $ single.ancestryE                         <dbl> 33651, 111739, 95807, 43714, …
## $ single.ancestryM                         <dbl> 1153, 3271, 2451, 1248, 1236,…
## $ multiple.ancestryE                       <dbl> 5871, 52267, 54366, 6690, 211…
## $ multiple.ancestryM                       <dbl> 906, 2163, 2653, 764, 1119, 5…
## $ born.in.usE                              <dbl> 50678, 235767, 231448, 63284,…
## $ born.in.usM                              <dbl> NA, NA, NA, NA, NA, 48, NA, N…
## $ us.citizen.born.abroadE                  <dbl> 166, 1763, 1922, 191, 742, 12…
## $ us.citizen.born.abroadM                  <dbl> 76, 250, 290, 98, 193, 1038, …
## $ not.a.us.citizenE                        <dbl> 574, 11383, 5464, 4773, 887, …
## $ not.a.us.citizenM                        <dbl> 223, 947, 825, 625, 198, 3204…
## $ drives.alone.to.workE                    <dbl> 14885, 90394, 89347, 21792, 2…
## $ drives.alone.to.workM                    <dbl> 670, 1522, 2280, 819, 700, 39…
## $ carpoolE                                 <dbl> 1995, 10325, 7871, 2934, 2211…
## $ carpoolM                                 <dbl> 403, 868, 787, 424, 297, 2025…
## $ public.transportationE                   <dbl> 15, 477, 444, 89, 93, 5652, 1…
## $ public.transportationM                   <dbl> 22, 138, 161, 68, 53, 828, 11…
## $ bicycleE                                 <dbl> 9, 49, 421, 6, 194, 1058, 670…
## $ bicycleM                                 <dbl> 12, 54, 157, 7, 92, 205, 157,…
## $ works.from.homeE                         <dbl> 781, 11668, 11817, 813, 1571,…
## $ works.from.homeM                         <dbl> 202, 1044, 1276, 230, 251, 28…
## $ walks.to.workE                           <dbl> 237, 991, 1996, 323, 312, 680…
## $ walks.to.workM                           <dbl> 113, 227, 387, 115, 92, 857, …
## $ travels.more.than.1.hour.for.workE       <dbl> 849, 6755, 6519, 1359, 1387, …
## $ travels.more.than.1.hour.for.workM       <dbl> 204, 797, 862, 380, 304, 2112…
## $ householdsE                              <dbl> 50678, 235767, 231448, 63284,…
## $ householdsM                              <dbl> NA, NA, NA, NA, NA, 48, NA, N…
## $ hh.male.lives.aloneE                     <dbl> 2621, 5875, 13955, 2638, 3678…
## $ hh.male.lives.aloneM                     <dbl> 379, 548, 1088, 408, 314, 145…
## $ hh.female.lives.aloneE                   <dbl> 3935, 7072, 19844, 3612, 5567…
## $ hh.female.lives.aloneM                   <dbl> 425, 549, 1100, 398, 340, 214…
## $ hh.opposite.sex.spouseE                  <dbl> 8242, 50815, 40838, 10804, 15…
## $ hh.opposite.sex.spouseM                  <dbl> 556, 928, 1287, 596, 535, 308…
## $ hh.same.sex.spouseE                      <dbl> 59, 162, 713, 78, 74, 1708, 1…
## $ hh.same.sex.spouseM                      <dbl> 53, 97, 280, 83, 44, 265, 395…
## $ hh.opposite.sex.partnerE                 <dbl> 1308, 3511, 6531, 1904, 1780,…
## $ hh.opposite.sex.partnerM                 <dbl> 351, 439, 855, 395, 279, 1451…
## $ hh.same.sex.partnerE                     <dbl> 7, 148, 707, 32, 37, 1308, 15…
## $ hh.same.sex.partnerM                     <dbl> 14, 81, 259, 45, 21, 254, 245…
## $ hh.child.in.householdE                   <dbl> 14018, 81466, 54828, 19737, 1…
## $ hh.child.in.householdM                   <dbl> 705, 1232, 1373, 745, 659, 33…
## $ hh.grandchild.in.houesholdE              <dbl> 2211, 5058, 3684, 2224, 1158,…
## $ hh.grandchild.in.houesholdM              <dbl> 438, 635, 798, 437, 238, 1584…
## $ women.who.had.a.birth.in.past.12.monthsE <dbl> 397, 2656, 2064, 622, 618, 13…
## $ women.who.had.a.birth.in.past.12.monthsM <dbl> 149, 355, 361, 195, 190, 875,…
## $ enrolled.in.schoolE                      <dbl> 10119, 67405, 55815, 14938, 1…
## $ enrolled.in.schoolM                      <dbl> 523, 949, 1212, 430, 427, 328…
## $ undergraduateE                           <dbl> 1636, 11603, 19452, 2312, 252…
## $ undergraduateM                           <dbl> 300, 897, 1108, 357, 316, 228…
## $ only.speak.englishE                      <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ only.speak.englishM                      <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ spanish.speakersE                        <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ spanish.speakersM                        <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ mandarin.speakersE                       <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ mandarin.speakersM                       <chr> NA, NA, NA, NA, NA, NA, NA, N…
## $ below.poverty.levelE                     <dbl> 12654, 17290, 32805, 13541, 6…
## $ below.poverty.levelM                     <dbl> 1151, 1775, 2475, 1615, 638, …
## $ median.ageE                              <dbl> 43.9, 38.6, 39.0, 40.3, 49.2,…
## $ median.ageM                              <dbl> 0.4, 0.3, 0.3, 0.4, 0.2, 0.1,…
## $ median.incomeE                           <dbl> 20777, 35468, 31054, 23317, 2…
## $ median.incomeM                           <dbl> 915, 842, 786, 1107, 1132, 39…
## $ median.house.valueE                      <dbl> 85800, 254100, 258200, 97500,…
## $ median.house.valueM                      <dbl> 4548, 5638, 6414, 5710, 8514,…
## $ median.rentE                             <dbl> 705, 1078, 1060, 682, 921, 12…
## $ median.rentM                             <dbl> 25, 45, 21, 42, 40, 10, 9, 43…
## $ pop.under.18E                            <dbl> 10829, 63318, 42710, 15490, 1…
## $ pop.under.18M                            <dbl> 26, 18, 21, NA, NA, 90, NA, 5…
## $ military.veteranE                        <dbl> 2462, 12026, 14736, 3261, 784…
## $ military.veteranM                        <dbl> 293, 683, 1095, 354, 383, 174…
## $ receives.food.stampsE                    <dbl> 6005, 5483, 8421, 4257, 2673,…
## $ receives.food.stampsM                    <dbl> 443, 561, 701, 447, 306, 1531…
## $ housing.unitsE                           <dbl> 25953, 82547, 113125, 27885, …
## $ housing.unitsM                           <dbl> 115, 151, 299, 114, 109, 442,…
## $ occupiedE                                <dbl> 21061, 77791, 97998, 23162, 3…
## $ occupiedM                                <dbl> 443, 574, 957, 603, 573, 1748…
## $ vacantE                                  <dbl> 4892, 4756, 15127, 4723, 2053…
## $ vacantM                                  <dbl> 460, 583, 940, 591, 541, 1586…
## $ geometry                                 <MULTIPOLYGON [°]> MULTIPOLYGON (((…
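Each variable appears twice in the output above, once with an E suffix and once with an M suffix. Assuming these follow the American Community Survey convention of estimate and 90% margin of error (an assumption; the source does not state it), a coefficient of variation gives a quick sense of how reliable each estimate is. The `moe_to_cv()` helper below is hypothetical, and the values are the first three counties from the output above.

```r
# Hypothetical helper: convert an estimate and its margin of error into a
# coefficient of variation (in percent). Assumes the margin of error is at
# the 90% level, so the standard error is MOE / 1.645.
moe_to_cv <- function(estimate, moe, z = 1.645) {
  (moe / z) / estimate * 100
}

house_value     <- c(85800, 254100, 258200)  # median.house.valueE
house_value_moe <- c(4548, 5638, 6414)       # median.house.valueM

cv <- moe_to_cv(house_value, house_value_moe)
print(round(cv, 1))
```

Smaller values indicate more precise estimates; counties with small populations will usually have larger values.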

For the analysis, two derived variables are calculated: the percentage of the total population that is white alone and the percentage of the total population that works from home.

# calculate the percentage of people that work from home and the percent white

ncCounties <- ncCounties |>
  mutate(works.from.home.percent = works.from.homeE / popE * 100,
         white.percent = white.aloneE / popE * 100)
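To make the calculation concrete, here is the same arithmetic in base R on the first three counties from the glimpse() output. It is a sketch on a hypothetical `toy` data frame, not the full dataset. Note that the denominator is total population (popE), not the number of workers, so the work-from-home percentages are lower than a share-of-workers figure would be.

```r
# First three counties from the glimpse() output (illustration only).
toy <- data.frame(
  popE             = c(50678, 235767, 231448),
  works.from.homeE = c(781, 11668, 11817),
  white.aloneE     = c(20078, 187121, 186681)
)

# Same formulas as the mutate() call, written in base R.
toy$works.from.home.percent <- toy$works.from.homeE / toy$popE * 100
toy$white.percent           <- toy$white.aloneE / toy$popE * 100

print(round(toy[, c("works.from.home.percent", "white.percent")], 2))
```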

Below, each of the following variables is explored by mapping it across North Carolina and reporting its minimum, maximum, mean, and variance. The variables are:

  • Median House Value

  • Median Income

  • Median Age

  • % of people that work from home

  • % of people that claim white (alone) ancestry
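The five summaries that follow all compute the same statistics. As a sketch of how that repetition could be factored out, the hypothetical `summarise_var()` helper below does the same work in base R. It is not part of the original analysis; the input vector holds the first five median ages from the glimpse() output plus one made-up missing value.

```r
# Hypothetical helper: count, minimum, mean, maximum, and variance of a
# numeric vector after dropping missing values.
summarise_var <- function(x) {
  x <- x[!is.na(x)]  # plays the same role as drop_na()
  c(count    = length(x),
    lowest   = min(x),
    average  = mean(x),
    highest  = max(x),
    variance = var(x))
}

ages <- c(43.9, 38.6, 39.0, 40.3, 49.2, NA)
age_summary <- summarise_var(ages)
print(age_summary)
```

With the county data loaded, a call such as `summarise_var(ncCounties$median.ageE)` would give the same numbers as the corresponding summarize() block.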

Median Home Value

as.data.frame(ncCounties) |>
  drop_na(median.house.valueE) |>
  summarize(count = n(),
            "lowest home price"=min(median.house.valueE),
            "average (median) home price"=mean(median.house.valueE),
            "largest home price"=max(median.house.valueE),
            "variance"=var(median.house.valueE)) |>
  glimpse()
## Rows: 1
## Columns: 5
## $ count                         <int> 100
## $ `lowest home price`           <dbl> 75600
## $ `average (median) home price` <dbl> 158571
## $ `largest home price`          <dbl> 331800
## $ variance                      <dbl> 3259471171
tm_shape(ncCounties) + 
  tm_polygons(fill="median.house.valueE",
              fill.scale= tm_scale_intervals(style="jenks",
                        label.format=list(fun=function(x) paste0("$", formatC(x,big.mark=",",digits=0,format="f")))),
              fill.legend = tm_legend("Median House Value 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("Median Home Value, NC Counties 2020")

The highest median home values fall along the I-85 corridor and along the coastal regions.

Median Income

as.data.frame(ncCounties) |>
  drop_na(median.incomeE) |>
  summarize(count = n(),
            "lowest income"=min(median.incomeE),
            "average (median) income"=mean(median.incomeE),
            "largest income"=max(median.incomeE),
            "variance"=var(median.incomeE)) |>
  glimpse()
## Rows: 1
## Columns: 5
## $ count                     <int> 100
## $ `lowest income`           <dbl> 19033
## $ `average (median) income` <dbl> 26780.8
## $ `largest income`          <dbl> 41189
## $ variance                  <dbl> 17775049
tm_shape(ncCounties) + 
  tm_polygons(fill="median.incomeE",
              fill.scale= tm_scale_intervals(style="jenks",
                          label.format=list(fun=function(x) paste0("$", formatC(x,big.mark=",",digits=0,format="f")))),
              fill.legend = tm_legend("Median Income 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("Median Income, NC Counties 2020")

Median Age

as.data.frame(ncCounties) |>
  drop_na(median.ageE) |>
  summarize(count = n(),
            "lowest age"=min(median.ageE),
            "average (median) age"=mean(median.ageE),
            "highest age"=max(median.ageE),
            "variance"=var(median.ageE)) |>
  glimpse()
## Rows: 1
## Columns: 5
## $ count                  <int> 100
## $ `lowest age`           <dbl> 26.5
## $ `average (median) age` <dbl> 43.007
## $ `highest age`          <dbl> 54.7
## $ variance               <dbl> 27.02349
tm_shape(ncCounties) + 
  tm_polygons(fill="median.ageE",
              fill.scale= tm_scale_intervals(style="jenks"),
              fill.legend = tm_legend("Median Age 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("Median Age, NC Counties 2020")

The youngest counties in North Carolina include those around Chapel Hill, Durham, and Charlotte.

Works from Home Percent

as.data.frame(ncCounties) |>
  drop_na(works.from.home.percent) |>
  summarize(count = n(),
            "lowest % work from home"=min(works.from.home.percent),
            "average (median) % work from home"=mean(works.from.home.percent),
            "largest % work from home"=max(works.from.home.percent),
            "variance"=var(works.from.home.percent)) |>
  glimpse()
## Rows: 1
## Columns: 5
## $ count                               <int> 100
## $ `lowest % work from home`           <dbl> 0.6061051
## $ `average (median) % work from home` <dbl> 2.313695
## $ `largest % work from home`          <dbl> 6.9549
## $ variance                            <dbl> 1.806035
tm_shape(ncCounties) + 
  tm_polygons(fill="works.from.home.percent",
              fill.scale= tm_scale_intervals(style="jenks",
                                        label.format=list(fun=function(x) paste0(formatC(x,digits=1,format="f"),"%"))),
              fill.legend = tm_legend("% Works from Home 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("% Works from Home, NC Counties 2020")

The highest percentages of work-from-home are in the suburban counties around Raleigh-Durham and the Charlotte metro area, with another grouping in Asheville.

White Alone Percent

as.data.frame(ncCounties) |>
  drop_na(white.percent) |>
  summarize(count = n(),
            "lowest % white alone"=min(white.percent),
            "average % white alone"=mean(white.percent),
            "highest % white alone"=max(white.percent),
            "variance"=var(white.percent)) |>
  glimpse()
## Rows: 1
## Columns: 5
## $ count                   <int> 100
## $ `lowest % white alone`  <dbl> 27.26651
## $ `average % white alone` <dbl> 71.52054
## $ `highest % white alone` <dbl> 96.44843
## $ variance                <dbl> 314.1411
tm_shape(ncCounties) + 
  tm_polygons(fill="white.percent",
              fill.scale= tm_scale_intervals(style="jenks",
                                             label.format=list(fun=function(x) paste0(formatC(x,digits=0,format="f"),"%"))),
              fill.legend = tm_legend("% White Alone 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("% White Alone, NC Counties 2020")

The mountain regions of western North Carolina have a higher percentage of white (alone) compared to other regions in North Carolina.

Results

First Model: House Value vs. % Works from Home

The initial linear regression model regresses median house value in 2020 on the percentage of people who work from home.

# build the first model

first.model <- lm(median.house.valueE ~ works.from.home.percent,data=ncCounties)
summary(first.model)
## 
## Call:
## lm(formula = median.house.valueE ~ works.from.home.percent, data = ncCounties)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -60097 -19670  -5395  13763 102485 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                73974       5839   12.67   <2e-16 ***
## works.from.home.percent    36564       2185   16.73   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 29220 on 98 degrees of freedom
## Multiple R-squared:  0.7408, Adjusted R-squared:  0.7381 
## F-statistic:   280 on 1 and 98 DF,  p-value: < 2.2e-16

The results above show that % works from home is positively associated with median house value, and that this variable alone explains about 74% of the variance in median house value across counties (R-squared 0.7408, adjusted 0.7381). For each additional percentage point of the population that works from home, the predicted median house value rises by $36,564, although other factors not accounted for here may drive part of this relationship.
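The fitted line can be used directly for a quick check. The sketch below hard-codes the rounded coefficients from the summary above in a hypothetical `predict_first()` function (with the fitted object, `predict(first.model, newdata = ...)` would do the same without rounding). Evaluated at the mean work-from-home percentage reported earlier (2.313695), it should land very close to the mean of the county median house values (158,571), because a least-squares line passes through the point of means.

```r
# Hypothetical helper built from the rounded coefficients printed by
# summary(first.model): intercept 73974, slope 36564.
predict_first <- function(wfh_percent) {
  73974 + 36564 * wfh_percent
}

# Prediction at the mean work-from-home percentage.
at_mean <- predict_first(2.313695)
print(round(at_mean))

# Difference between two counties one percentage point apart.
one_point <- predict_first(3) - predict_first(2)
print(one_point)
```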

Second Model: Multivariate analysis

In the second model, we’ll also include other variables in the analysis - median age, median income, and % white.

second.model <- lm(median.house.valueE ~ 
                     works.from.home.percent + 
                     median.ageE +
                     median.incomeE +
                     white.percent,
                   data=ncCounties)
summary(second.model)
## 
## Call:
## lm(formula = median.house.valueE ~ works.from.home.percent + 
##     median.ageE + median.incomeE + white.percent, data = ncCounties)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -50373 -17516  -2844  14865  63733 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             -7.949e+04  2.596e+04  -3.062  0.00286 ** 
## works.from.home.percent  2.434e+04  2.324e+03  10.476  < 2e-16 ***
## median.ageE              1.839e+02  4.794e+02   0.384  0.70218    
## median.incomeE           4.313e+00  7.233e-01   5.964 4.19e-08 ***
## white.percent            8.154e+02  1.432e+02   5.695 1.37e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 21620 on 95 degrees of freedom
## Multiple R-squared:  0.8624, Adjusted R-squared:  0.8567 
## F-statistic: 148.9 on 4 and 95 DF,  p-value: < 2.2e-16

The second model suggests that median income, % white, and % works from home are all significantly associated with median house value (p < 0.001 for each). Median age is not statistically significant (p = 0.70). Together, the four predictors give an adjusted R-squared of 0.86.
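The adjusted R-squared penalises the ordinary R-squared for the number of predictors. A minimal sketch of the standard formula, using the rounded R-squared values from the two model summaries and n = 100 counties; `adj_r2()` is a hypothetical helper name, and small differences from the printed values come from rounding the inputs.

```r
# Adjusted R-squared from R-squared, sample size n, and number of
# predictors p (not counting the intercept).
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

adj_first  <- adj_r2(0.7408, n = 100, p = 1)  # first model
adj_second <- adj_r2(0.8624, n = 100, p = 4)  # second model

print(round(c(first = adj_first, second = adj_second), 3))
```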

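The % works from home still has the largest coefficient: each additional percentage point working from home is associated with about $24,300 in median house value, holding the other variables constant. Because the predictors are measured in different units, the raw coefficients are not directly comparable in size.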
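One common way to compare predictors measured in different units (percentage points, years, dollars) is to rescale each coefficient by sd(x) / sd(y). The sketch below does this from the rounded coefficients in the model summary and the variances reported in the exploratory section. It is an approximation built from printed values, not output from the original analysis.

```r
# Rounded coefficients from summary(second.model).
coefs <- c(works.from.home.percent = 24340,
           median.ageE             = 183.9,
           median.incomeE          = 4.313,
           white.percent           = 815.4)

# Variances reported in the exploratory summaries, in the same order.
vars  <- c(1.806035, 27.02349, 17775049, 314.1411)
var_y <- 3259471171  # variance of median.house.valueE

# Standardized coefficient: change in y (in SDs) per one-SD change in x.
std_beta <- coefs * sqrt(vars) / sqrt(var_y)
print(round(sort(std_beta, decreasing = TRUE), 2))
```

On this scale the predictors can be ranked against each other; the largest standardized value belongs to the strongest predictor in the model.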

# add the residuals from the analysis to the counties data
ncCounties$resids <- second.model$residuals

# map the residuals

tm_shape(ncCounties) + 
  tm_polygons(fill="resids",
              fill.scale = tm_scale_intervals(midpoint=0),
              fill.legend = tm_legend("Residuals 2020",
                                      position=tm_pos_out(pos.h = "right"))) + 
  tm_title("Model Residuals, NC Counties 2020")

The residuals of the second model show that how well the model fits varies considerably across the state. Counties with residuals close to 0 fit the model well, whereas counties with large negative or large positive residuals fit poorly. Some rural and coastal areas fit less well, which suggests that additional variables are needed to capture these differences.
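The visual impression of clustering in the residual map could be quantified with a spatial autocorrelation statistic such as Moran's I. The original analysis does not do this; the sketch below is a base R illustration on a hypothetical row of four areas with made-up values, where each area neighbours the one next to it. For the county data, a package such as spdep would normally build the neighbour weights from the polygons.

```r
# Moran's I for values x and a binary neighbour weights matrix w.
morans_i <- function(x, w) {
  z <- x - mean(x)                       # deviations from the mean
  n <- length(x)
  (n / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Hypothetical layout: four areas in a row, each adjacent to the next.
w <- matrix(0, 4, 4)
w[cbind(1:3, 2:4)] <- 1
w <- w + t(w)                            # make the weights symmetric

clustered   <- morans_i(c(1, 2, 3, 4), w)    # similar values sit together
alternating <- morans_i(c(1, -1, 1, -1), w)  # neighbours are opposites

print(c(clustered = clustered, alternating = alternating))
```

Values near zero would suggest the residuals are spatially random, while clearly positive values would support the clustering seen in the map.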

Conclusion

The research here suggests that three variables (% work from home, % white, and median income) are associated with median house value for North Carolina counties in 2020. Each additional percentage point of people working from home is associated with more than $20,000 in median home value.

While the model generally holds true for North Carolina, there is a great deal of spatial variability in the residuals of the model. Some areas in the mountains, some coastal areas, and some rural areas don’t follow the model as well, which suggests that there are additional variables needed to complete the model.

There are limitations in the data. Median house value, median age, and median income are summary statistics that don’t capture the range and variability of the underlying data. The percentage of people who work from home is small, but it could still have an impact on house prices. The direction of the relationship between median house value and % works from home is also unclear. It could be that people working from home drive up home prices, or it could be that people working from home have higher incomes than those who need to work in person (retail, manufacturing, medicine, farming) and therefore drive up home prices.

Future work should consider finer scales of analysis, such as Census tracts, and should include additional related variables such as job type and education. Future work might also look at changes over time or compare North Carolina with a similar region.

Overall, this model explains about 86% of the variance in median house value across North Carolina counties in 2020.

References

Campbell Jr, H. S., & James, R. D. (2021). Place, Space, and Cost of Living in North Carolina: A Research Note. Southeastern Geographer, 61(1), 81-104.