1 Scope and Goals of the Analysis

At Texas Realty Insights, we continuously strive to provide data-driven insights that guide our strategic decisions in the real estate market. This analysis focuses on historical sales data across Texas cities, examining trends in property prices, sales volume, and listing activity over time.

By combining descriptive statistics with clear visualizations, our goal is to identify patterns that reveal market dynamics, seasonal fluctuations, and city-level differences, allowing us to optimize marketing strategies and resource allocation.

2 Dataset Description and Variable Analysis

The dataset used for this analysis contains historical real estate sales information for four Texas cities between 2010 and 2014. It includes metrics such as the total number of sales, total sales volume, median sale price, number of active listings, and months of inventory.

In this section, we first provide a preview of the dataset and then describe each variable in terms of type, meaning, and statistical classification.

2.1 Dataset Preview

The dataset includes 240 monthly records covering four metropolitan areas in Texas (Beaumont, Bryan-College Station, Tyler, and Wichita Falls) from 2010 to 2014.

The following table shows the first few rows of the dataset, providing a quick glimpse of the cities, time frame, and the key numerical metrics.

Table 1. Dataset Preview
city year month sales volume median_price listings months_inventory
Beaumont 2010 1 83 14.162 163800 1533 9.5
Beaumont 2010 2 108 17.690 138200 1586 10.0
Beaumont 2010 3 182 28.701 122400 1689 10.6
Beaumont 2010 4 200 26.819 123200 1708 10.6
Beaumont 2010 5 202 28.833 123100 1771 10.9
Beaumont 2010 6 189 27.219 122800 1803 11.1

2.2 Variable Description

Below is a detailed description of each variable, including its type and statistical classification.

Table 2. Variable Description
Variable Name | R Type | Meaning | Classification | Possible Analysis
city | Character | Reference city | Qualitative nominal | Frequency, mode
year | Integer | Reference year | Qualitative ordinal | Frequency, mode, temporal trends
month | Integer | Reference month | Qualitative ordinal | Frequency, mode, seasonal trends
sales | Integer | Total number of sales | Quantitative discrete | Position, variability, and shape indexes
volume | Numeric | Total value of sales (million USD) | Quantitative continuous | Position, variability, and shape indexes
median_price | Numeric | Median sale price (USD) | Quantitative continuous | Position, variability, and shape indexes
listings | Integer | Total number of active listings | Quantitative discrete | Position, variability, and shape indexes
months_inventory | Numeric | Months of inventory | Quantitative continuous | Position, variability, and shape indexes

In this dataset, city, year, and month are treated as qualitative variables. While year and month are stored as integers, arithmetic operations such as averaging are not meaningful for these variables. Instead, they serve to group and organize the data over time, enabling the examination of temporal and seasonal trends across cities.

For all qualitative variables, descriptive analysis focuses on frequencies, proportions, and modes, while preserving the natural order of ordinal variables like year and month for clear visualizations and summaries.

Conversely, for all quantitative variables, it is possible to compute the full range of descriptive statistics — including measures of position, dispersion, and shape — to better understand their distributional characteristics.

3 Descriptive Statistics

This chapter summarizes the key statistical characteristics of the quantitative and qualitative variables in the dataset. For quantitative variables, measures of position, variability, and shape are calculated. For qualitative variables, frequency distributions are created to understand the distribution of observations across categories.

3.1 Measures of Position and Central Tendency

This section presents the position indexes of the quantitative variables:

  • Mean: the arithmetic average of the variable, representing its central value.

  • Min: the smallest observed value in the dataset.

  • Q1 (1st quartile): the value below which 25% of the observations fall.

  • Median (Q2 / 2nd quartile): the value below which 50% of the observations fall, representing central tendency.

  • Q3 (3rd quartile): the value below which 75% of the observations fall.

  • Max: the largest observed value in the dataset.

Table 3. Position indexes: mean, median, and quartiles
Variable Mean Min Q1 Median Q3 Max
sales 192.29 79.00 127.00 175.50 247.00 423.00
volume 31.01 8.17 17.66 27.06 40.89 83.55
median_price 132,665.42 73,800.00 117,300.00 134,500.00 150,050.00 180,000.00
listings 1,738.02 743.00 1,026.50 1,618.50 2,056.00 3,296.00
months_inventory 9.19 3.40 7.80 8.95 10.95 14.90
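
As an illustration of the definitions above, the following Python sketch computes the same position indexes for the six preview rows of Table 1 only, so its output is not expected to reproduce Table 3. The report's own figures were produced in R; the choice of the "inclusive" interpolation rule is an assumption, since the quantile rule used for Table 3 is not stated.

```python
from statistics import mean, quantiles

# Monthly sales from the six preview rows in Table 1 (Beaumont, Jan-Jun 2010).
sales = [83, 108, 182, 200, 202, 189]

# method="inclusive" interpolates linearly between order statistics,
# treating the sample min and max as the 0th and 100th percentiles.
q1, q2, q3 = quantiles(sales, n=4, method="inclusive")

position = {
    "mean": mean(sales),
    "min": min(sales),
    "q1": q1,
    "median": q2,
    "q3": q3,
    "max": max(sales),
}
print(position)
```

With only six observations the quartiles depend noticeably on the interpolation rule, which is why the rule is worth stating alongside any quartile table.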

3.2 Measures of Variability

This section presents measures of dispersion for the quantitative variables, providing insight into the spread and relative variability of the data:

  • Standard Deviation (SD): the square root of the average squared deviation from the mean, indicating how far observations typically lie from the mean.

  • Range: the difference between the maximum and minimum values, showing the total spread of the data.

  • Interquartile Range (IQR): the difference between the third quartile (Q3) and the first quartile (Q1), representing the spread of the middle 50% of the data.

  • Coefficient of Variation (CV%): the standard deviation expressed as a percentage of the mean, allowing comparison of variability between variables with different units or scales.

Table 4. Variability Indexes
Variable StdDev Range IQR CV
sales 79.65 344.00 120.00 41.42%
volume 16.65 75.38 23.23 53.71%
median_price 22,662.15 106,200.00 32,750.00 17.08%
listings 752.71 2,553.00 1,029.50 43.31%
months_inventory 2.30 11.50 3.15 25.06%

To compare variability across variables measured in different units, the coefficient of variation (CV) is used. In this dataset:

  • The variable volume exhibits the highest relative variability, with a CV of 53.71%, indicating that the total sales value fluctuates substantially from month to month and across cities.

  • In contrast, median_price has the lowest CV at 17.08%, suggesting that median property prices are relatively less dispersed over time and cities compared to other metrics.
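
The ranking by relative variability can be checked directly from the published means and standard deviations. The Python sketch below is illustrative only (the report's computations were done in R) and recomputes CV% from the rounded values in Tables 3 and 4, so the last digit may differ slightly from the CV column.

```python
# Mean and standard deviation copied from Tables 3 and 4.
stats = {
    "sales": (192.29, 79.65),
    "volume": (31.01, 16.65),
    "median_price": (132665.42, 22662.15),
    "listings": (1738.02, 752.71),
    "months_inventory": (9.19, 2.30),
}

# CV% = standard deviation / mean * 100 (unit-free, so comparable across variables).
cv = {name: sd / mean * 100 for name, (mean, sd) in stats.items()}

most_variable = max(cv, key=cv.get)
least_variable = min(cv, key=cv.get)
for name, value in cv.items():
    print(f"{name}: {value:.2f}%")
print(most_variable, least_variable)
```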

3.3 Measures of Distribution Shape

This section examines the asymmetry and peakedness of the quantitative variables using skewness and kurtosis, providing insight into the shape of their distributions:

  • Skewness measures the asymmetry of a distribution. Positive skewness indicates a longer right tail, while negative skewness indicates a longer left tail. Values near 0 suggest a roughly symmetric distribution.

  • Kurtosis, reported here as excess kurtosis, measures peakedness and tail weight relative to a normal distribution. A value of 0 corresponds to a normal-like shape; positive values indicate a more peaked distribution with heavier tails (leptokurtic), and negative values indicate a flatter distribution with lighter tails (platykurtic).

Table 5. Shape Indexes
Variable Skewness Kurtosis
sales 0.72 -0.31
volume 0.88 0.18
median_price -0.36 -0.62
listings 0.65 -0.79
months_inventory 0.04 -0.17

In this dataset, the shape indexes of the quantitative variables indicate the following:

  • sales is moderately positively skewed (0.72) and slightly platykurtic (-0.31).

  • volume is positively skewed (0.88) and slightly leptokurtic (0.18).

  • median_price is slightly negatively skewed (-0.36) and platykurtic (-0.62).

  • listings is positively skewed (0.65) and platykurtic (-0.79).

  • months_inventory is nearly symmetric (0.04) and slightly platykurtic (-0.17), indicating a distribution quite similar to a normal distribution.

Of these variables, volume exhibits the most pronounced asymmetry, with a positive skew. In real estate, this implies that, although the total sales value is generally moderate, there are occasional months with exceptionally high sales values. These spikes may be due to seasonal effects, large transactions or market opportunities.
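
For reference, a minimal Python sketch of moment-based shape indexes is given below. It assumes the population-moment definitions (skewness as m3 / m2^1.5 and excess kurtosis as m4 / m2^2 - 3); the exact estimator behind Table 5 is not stated in this report, and the helper name shape_indexes is hypothetical. It is applied to the six preview rows of Table 1 only, so it does not reproduce Table 5.

```python
def shape_indexes(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((x - mu) ** 2 for x in values) / n
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

# Total sales value (million USD) from the six preview rows in Table 1.
volume = [14.162, 17.690, 28.701, 26.819, 28.833, 27.219]
skew, kurt = shape_indexes(volume)
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```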

3.4 Frequency Distribution for Qualitative Variables

This section summarizes the distribution of observations across categorical variables using frequency counts, relative frequencies, and cumulative frequencies to describe how data are distributed among categories.

A frequency distribution is a summary of how often each category occurs in a dataset:

  • Frequency (ni): the number of observations in each category.

  • Relative frequency (fi): the proportion of observations in each category, calculated as the count divided by the total number of observations.

  • Cumulative count (Ni) and cumulative frequency (Fi): the running total of frequencies as categories are ordered, showing how many observations fall below or within a given category.

Frequency distributions are essential for categorical variables because arithmetic measures like mean or standard deviation do not apply.

Table 6. Frequency distribution: city
city ni fi Ni Fi
Beaumont 60 25.00% 60 25.00%
Bryan-College Station 60 25.00% 120 50.00%
Tyler 60 25.00% 180 75.00%
Wichita Falls 60 25.00% 240 100.00%

Table 7. Frequency distribution: year
year ni fi Ni Fi
2010 48 20.00% 48 20.00%
2011 48 20.00% 96 40.00%
2012 48 20.00% 144 60.00%
2013 48 20.00% 192 80.00%
2014 48 20.00% 240 100.00%

Table 8. Frequency distribution: month
month ni fi Ni Fi
1 20 8.33% 20 8.33%
2 20 8.33% 40 16.67%
3 20 8.33% 60 25.00%
4 20 8.33% 80 33.33%
5 20 8.33% 100 41.67%
6 20 8.33% 120 50.00%
7 20 8.33% 140 58.33%
8 20 8.33% 160 66.67%
9 20 8.33% 180 75.00%
10 20 8.33% 200 83.33%
11 20 8.33% 220 91.67%
12 20 8.33% 240 100.00%

For each qualitative variable, all categories show the same relative frequency: 25% for city, 20% for year, and 8.33% for month. This uniform distribution reflects the balanced structure of the dataset and confirms that no record is missing: every city has exactly one row for each of the 60 months.
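
The four columns of a frequency table follow mechanically from the absolute counts. The Python sketch below is illustrative (the report itself was produced in R) and rebuilds the columns of Table 7 from its ni values.

```python
from itertools import accumulate

# Absolute frequencies (ni) by year, copied from Table 7.
counts = {2010: 48, 2011: 48, 2012: 48, 2013: 48, 2014: 48}

total = sum(counts.values())
cumulative = list(accumulate(counts.values()))  # Ni: running total of ni

table = [
    {"category": cat, "ni": ni, "fi": ni / total, "Ni": Ni, "Fi": Ni / total}
    for (cat, ni), Ni in zip(counts.items(), cumulative)
]
for row in table:
    print(f'{row["category"]}  {row["ni"]:>3}  {row["fi"]:.2%}  {row["Ni"]:>3}  {row["Fi"]:.2%}')
```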

4 Frequency Distribution and Gini Index of Sales

In this chapter, the frequency distribution for the variable sales is constructed to identify how monthly sales counts are distributed across different ranges and to highlight potential concentration patterns.

A frequency distribution groups data values into intervals (classes) and shows how often observations fall within each range. This approach helps summarize large datasets and reveals where observations are more concentrated, making it easier to interpret overall trends and variability in the data.

Table 9. Frequency distribution for total sales classes
class ni fi Ni Fi
[79,148] 84 35.00% 84 35.00%
(148,217] 77 32.08% 161 67.08%
(217,285] 41 17.08% 202 84.17%
(285,354] 27 11.25% 229 95.42%
(354,423] 11 4.58% 240 100.00%

The distribution of sales classes shows a higher concentration of observations in the lower and mid-range intervals, with only a few months recording an exceptionally high number of sales. This pattern reflects the positive skewness reported for this variable in Section 3.3: most months have moderate market activity, while a smaller number of months experience unusually strong sales performance. Such periods may correspond to seasonal peaks or particularly favorable market conditions.
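
The class limits in Table 9 are consistent with five equal-width, right-closed classes spanning the observed range of sales (79 to 423, width about 68.8). Under that assumption, which is inferred from the table rather than stated in the report, class assignment can be sketched in Python as follows; the helper name class_index is hypothetical, and only the six preview rows of Table 1 are classified.

```python
from math import ceil

def class_index(x, lo, hi, k):
    """Index (0-based) of the equal-width, right-closed class containing x."""
    width = (hi - lo) / k
    if x == lo:  # the first class also includes its lower bound
        return 0
    return min(k - 1, ceil((x - lo) / width) - 1)

# Range of sales from Table 3 and the five classes of Table 9.
LO, HI, K = 79, 423, 5
# Monthly sales from the six preview rows in Table 1.
sales = [83, 108, 182, 200, 202, 189]

counts = [0] * K
for s in sales:
    counts[class_index(s, LO, HI, K)] += 1
print(counts)
```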

The Gini Index is a measure of heterogeneity used to assess how evenly observations are distributed across classes. In its normalized form it ranges between 0 and 1:

  • A value close to 0 indicates low heterogeneity, that is, a highly concentrated distribution (most observations fall in one class).

  • A value close to 1 indicates high heterogeneity, meaning observations are spread more evenly across classes.

For the variable sales, the Gini index is 0.913.

This value suggests that the distribution of sales across the defined classes is highly heterogeneous. Even though the classes are not perfectly balanced—reflecting the positive skewness of the distribution—no single class dominates, indicating that observations are spread across multiple sales ranges.
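
The reported value is consistent with the normalized Gini heterogeneity index, G = (1 - sum of fi^2) * k / (k - 1), applied to the five classes of Table 9. The formula is an assumption here, since the report does not state it explicitly; the Python sketch below is illustrative and uses the class counts from Table 9.

```python
# Class counts (ni) copied from Table 9.
counts = [84, 77, 41, 27, 11]

n = sum(counts)
k = len(counts)
rel_freq = [c / n for c in counts]

# Gini heterogeneity: 1 - sum(fi^2); its maximum is (k - 1) / k when all classes are equal.
gini = 1 - sum(f ** 2 for f in rel_freq)
gini_normalized = gini / ((k - 1) / k)

print(round(gini, 3), round(gini_normalized, 3))
```

Dividing by the maximum (k - 1) / k is what rescales the index to the 0 to 1 interval, so that values from tables with different numbers of classes remain comparable.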

5 Probability Calculations

This section calculates specific probabilities for selected events in the dataset: the probability of a randomly chosen observation corresponding to the city Beaumont, the probability of it being in July, and the joint probability of being in December 2012. These calculations help quantify the likelihood of individual and combined events.

Table 10. Specific Probability Calculations
P(city == 'Beaumont') | P(month == 7) | P(month == 12 & year == 2012)
25.00% | 8.33% | 1.67%

As shown in Table 10, the probability that a randomly selected observation belongs to the city Beaumont is 25.00%, and the probability that it falls in July is 8.33%, consistent with the frequency distributions presented in Section 3.4.

The joint probability of an observation being in December 2012 is 1.67%, since 4 of the 240 rows (one per city) fall in that month. This value is the same for every combination of year and month, because the dataset contains 12 months × 5 years = 60 month-year combinations per city, each equally represented.
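
Because the dataset is balanced, these probabilities reduce to simple counting. The Python sketch below is illustrative (the report's own calculations were done in R) and derives the three values of Table 10 from the dataset dimensions given in Section 2.1.

```python
from fractions import Fraction

N_CITIES, N_YEARS, N_MONTHS = 4, 5, 12
total = N_CITIES * N_YEARS * N_MONTHS  # 240 rows, one per city-year-month

p_beaumont = Fraction(N_YEARS * N_MONTHS, total)  # 60 Beaumont rows
p_july = Fraction(N_CITIES * N_YEARS, total)      # 20 July rows
p_dec_2012 = Fraction(N_CITIES, total)            # 4 rows: one per city

for label, p in [("Beaumont", p_beaumont), ("July", p_july), ("Dec 2012", p_dec_2012)]:
    print(f"{label}: {p} = {float(p):.2%}")
```

In this balanced design the joint probability also equals the product of the marginals, 1/12 × 1/5 = 1/60.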

6 Creation of New Variables

Two new variables were created to enrich the analysis:

  • average_unit_price estimates the average price per property (in USD) by dividing the total sales volume, converted from million USD to USD, by the number of transactions.

  • listings_effectiveness measures the efficiency of property listings by calculating the ratio of sales to active listings.
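
The two definitions above can be sketched in Python on the first three preview rows of Table 1. This is an illustration only (the report's variables were created in R); the conversion factor of 1,000,000 follows from volume being expressed in million USD.

```python
# Preview rows from Table 1: (sales, volume in million USD, active listings).
rows = [
    (83, 14.162, 1533),
    (108, 17.690, 1586),
    (182, 28.701, 1689),
]

derived = []
for sales, volume, listings in rows:
    derived.append({
        # volume is in million USD, so convert to USD before dividing
        "average_unit_price": volume * 1_000_000 / sales,
        # share of active listings that turned into a sale that month
        "listings_effectiveness": sales / listings,
    })

for d in derived:
    print(f'{d["average_unit_price"]:,.0f} USD  {d["listings_effectiveness"]:.2%}')
```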

Table 11 summarizes these new variables with key statistics, including mean, median, variability, skewness, and excess kurtosis, providing insights into typical prices and the effectiveness of sales listings across the dataset.

Table 11. Summary of New Variables
Statistic | Average Property Price (USD) | Listings effectiveness (%)
Mean | 154,320.37 | 11.87%
Median | 156,588.48 | 10.96%
StdDev | 27,147.46 | 4.69%
Min | 97,010.20 | 5.01%
Max | 213,233.94 | 38.71%
CV (%) | 17.59% | 39.50%
Skewness | -0.07 | 2.09
Kurtosis | -0.78 | 6.88

For average_unit_price, the mean (154,320 USD) is higher than the mean of the median_price variable (132,665 USD) reported in Table 3. On average, the computed average property price therefore exceeds the median property price. This is consistent with the positively skewed price distributions often observed in real estate markets, where occasional high-value transactions pull the average above the median.

The listings_effectiveness variable has a mean of 11.87%, indicating that, on average, 11.87% of active property listings are converted into actual sales monthly. This metric reflects the efficiency of listings in generating transactions across the market. The relatively low average effectiveness rate highlights opportunities to optimize marketing and sales strategies. Actions such as targeting high-demand periods, enhancing listing visibility, or refining pricing approaches could improve overall effectiveness and transaction volume.

Additionally, the coefficient of variation (CV) for average_unit_price is 17.59%, indicating moderate variability relative to the mean, similar to the CV of the previously analyzed median_price variable (17.08%). In contrast, listings_effectiveness shows a CV of 39.50%, highlighting substantially greater relative variability.

Regarding distribution shape, average_unit_price shows a negligible negative skew (-0.07) and a platykurtic shape (-0.78), indicating a roughly symmetric and somewhat flat distribution. By comparison, listings_effectiveness has a very high excess kurtosis (6.88), reflecting heavy tails driven by a few outlying months, and a positive skewness (2.09), suggesting that most months exhibit relatively low effectiveness, while occasional periods achieve unusually high sales efficiency.

Overall, these statistics suggest that while property prices remain relatively stable across cities and time, the effectiveness of listings is highly variable, with occasional peaks likely driven by successful marketing campaigns or favorable market conditions.

The following boxplots illustrate how the newly created variables—average_unit_price and listings_effectiveness—are distributed across the four cities, highlighting differences in central tendency, variability, and the presence of outliers in each location.

The boxplots for average_unit_price reveal notable variability in both central tendency and interquartile range (IQR) across cities. The cities ranked by decreasing median average property price are: Bryan-College Station, Tyler, Beaumont, and Wichita Falls. The IQRs for most cities do not overlap, with only a slight overlap between Bryan-College Station and Tyler, suggesting that the four cities cover different market price ranges. Bryan-College Station stands out with a visibly larger IQR, indicating a wider spread of average property prices within this city.

The medians of listings_effectiveness are relatively similar across cities, ranging approximately between 9% and 13%. Bryan-College Station stands out with a notably larger interquartile range (IQR) and more pronounced outliers compared to the other cities.

To complement the exploratory analysis of listings_effectiveness by city, the boxplot below illustrates the distribution of months_inventory, which represents the number of months it would take to sell all active listings at the current pace of sales. The results reveal substantial variability across cities, with the median values decreasing in the following order: Tyler, Beaumont, Bryan-College Station, and Wichita Falls. This indicates that listings in Tyler generally take the longest time to sell, whereas properties in Wichita Falls tend to sell the fastest. Furthermore, the city ordering of months_inventory does not simply mirror that of listings_effectiveness, so the boxplots alone do not establish a clear relationship between the two variables.

7 Conditional Analysis

This chapter transitions from the exploratory analysis of the newly created variables, initially examined through boxplots, to a more detailed conditional analysis.

Conditional analysis consists of calculating summary statistics, such as the mean and standard deviation, separately for different subgroups—here defined by city, year, and month. This approach allows for understanding how the variables, in this case average_unit_price and listings_effectiveness, vary across different locations and time periods, highlighting trends and patterns that may not be visible in the overall data.
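
A minimal Python sketch of this group-wise summary is shown below. It is illustrative only (the report's tables were produced in R) and uses the six preview rows of Table 1, all of which belong to Beaumont, so a single group is returned; on the full dataset the same loop would yield one entry per city. The use of the sample standard deviation (n - 1 denominator) is an assumption.

```python
from collections import defaultdict
from statistics import mean, stdev

# Preview rows from Table 1: (city, year, sales, volume in million USD).
rows = [
    ("Beaumont", 2010, 83, 14.162),
    ("Beaumont", 2010, 108, 17.690),
    ("Beaumont", 2010, 182, 28.701),
    ("Beaumont", 2010, 200, 26.819),
    ("Beaumont", 2010, 202, 28.833),
    ("Beaumont", 2010, 189, 27.219),
]

# Collect average_unit_price (USD) per city; swap the key for year or month.
groups = defaultdict(list)
for city, year, sales, volume in rows:
    groups[city].append(volume * 1_000_000 / sales)

summary = {}
for key, values in groups.items():
    m, sd = mean(values), stdev(values)  # stdev uses the n - 1 denominator
    summary[key] = {"n": len(values), "mean": m, "sd": sd, "cv_pct": sd / m * 100}
print(summary)
```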

Table 12. Conditional Analysis of New Variables by City
city | Mean Average Price (USD) | StdDev Average Price (USD) | CV Average Price (%) | Mean Listings Effectiveness (%) | StdDev Listings Effectiveness (%) | CV Listings Effectiveness (%)
Beaumont | 146,640.41 | 11,232.13 | 7.66% | 10.61% | 2.67% | 25.14%
Bryan-College Station | 183,534.29 | 15,149.35 | 8.25% | 14.73% | 7.29% | 49.44%
Tyler | 167,676.76 | 12,350.51 | 7.37% | 9.35% | 2.35% | 25.09%
Wichita Falls | 119,430.00 | 11,398.48 | 9.54% | 12.80% | 2.47% | 19.31%

The conditional analysis by city highlights notable differences in both the average property price and the effectiveness of listings:

  • Average Property Price (USD): Bryan-College Station records the highest mean average price at $183,534, followed by Tyler ($167,677) and Beaumont ($146,640), while Wichita Falls has the lowest ($119,430). The coefficients of variation are relatively consistent across cities, indicating similar levels of price dispersion within each market.

  • Listings Effectiveness (%): Bryan-College Station also stands out with the highest mean effectiveness (14.73%) and the largest coefficient of variation (49.44%), suggesting substantial variability in the performance of listings. Wichita Falls follows with a mean effectiveness of 12.80%, then Beaumont with 10.61%, while Tyler has the lowest (9.35%).
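
As a consistency check, the city-level means can be tied back to the overall means in Table 11: with equal group sizes (60 rows per city), the overall mean is the unweighted average of the group means. The Python sketch below is illustrative and uses the rounded values from Table 12, so agreement is expected only up to rounding.

```python
# City-level means copied from Table 12: (average price in USD, effectiveness in %).
city_means = {
    "Beaumont": (146640.41, 10.61),
    "Bryan-College Station": (183534.29, 14.73),
    "Tyler": (167676.76, 9.35),
    "Wichita Falls": (119430.00, 12.80),
}

# With equal group sizes (60 rows per city) the overall mean is the
# unweighted average of the group means.
n = len(city_means)
overall_price = sum(price for price, _ in city_means.values()) / n
overall_effectiveness = sum(eff for _, eff in city_means.values()) / n

print(f"{overall_price:,.2f} USD, {overall_effectiveness:.2f}%")
```

Both results agree with the overall means reported in Table 11 (154,320.37 USD and 11.87%) up to rounding.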

Table 13. Conditional Analysis of New Variables by Year
year | Mean Average Price (USD) | StdDev Average Price (USD) | CV Average Price (%) | Mean Listings Effectiveness (%) | StdDev Listings Effectiveness (%) | CV Listings Effectiveness (%)
2010 | 150,188.58 | 23,279.55 | 15.50% | 9.97% | 3.37% | 33.84%
2011 | 148,250.63 | 24,938.38 | 16.82% | 9.27% | 2.32% | 25.04%
2012 | 150,898.68 | 26,438.50 | 17.52% | 10.97% | 2.81% | 25.59%
2013 | 158,705.25 | 26,523.81 | 16.71% | 13.46% | 4.48% | 33.28%
2014 | 163,558.70 | 31,740.53 | 19.41% | 15.70% | 6.18% | 39.34%

The conditional analysis by year reveals clear temporal patterns in both average property price and listings effectiveness:

  • Average Property Price (USD): After a slight dip in 2011 ($148,251), the mean rises gradually from $150,189 in 2010 to $163,559 in 2014. The coefficient of variation also increases overall, from 15.50% in 2010 to 19.41% in 2014, indicating that price variability widened over the period.

  • Listings Effectiveness (%): Effectiveness trends upward, from 9.97% in 2010 to 15.70% in 2014, with a slight dip in 2011 (9.27%), suggesting that listings become more successful over time. The coefficient of variation also peaks in 2014 (39.34%, with a standard deviation of 6.18%), indicating that, although listings have become more successful overall, the disparity in effectiveness across cities and months has grown as well.
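
The size of these changes can be quantified from Table 13. The Python sketch below is illustrative (the report's computations were done in R) and derives the relative change in mean price, the change in mean effectiveness in percentage points, and the years in which the mean price fell.

```python
# Yearly means copied from Table 13: year -> (average price in USD, effectiveness in %).
by_year = {
    2010: (150188.58, 9.97),
    2011: (148250.63, 9.27),
    2012: (150898.68, 10.97),
    2013: (158705.25, 13.46),
    2014: (163558.70, 15.70),
}

first, last = min(by_year), max(by_year)
price_change_pct = (by_year[last][0] / by_year[first][0] - 1) * 100
effectiveness_change_pp = by_year[last][1] - by_year[first][1]  # percentage points

# Years in which the mean price fell relative to the previous year.
years = sorted(by_year)
price_dips = [y for prev, y in zip(years, years[1:]) if by_year[y][0] < by_year[prev][0]]

print(f"{price_change_pct:.1f}% / {effectiveness_change_pp:.2f} pp / dips: {price_dips}")
```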

Table 14. Conditional Analysis of New Variables by Month
month | Mean Average Price (USD) | StdDev Average Price (USD) | CV Average Price (%) | Mean Listings Effectiveness (%) | StdDev Listings Effectiveness (%) | CV Listings Effectiveness (%)
1 | 145,640.42 | 29,819.11 | 20.47% | 8.31% | 2.30% | 27.71%
2 | 148,840.48 | 25,120.42 | 16.88% | 8.78% | 2.19% | 24.97%
3 | 151,136.54 | 23,237.92 | 15.38% | 11.60% | 3.46% | 29.82%
4 | 151,461.33 | 26,174.30 | 17.28% | 12.53% | 3.80% | 30.30%
5 | 158,235.03 | 25,787.19 | 16.30% | 14.15% | 5.03% | 35.53%
6 | 161,545.82 | 23,470.46 | 14.53% | 14.24% | 5.76% | 40.44%
7 | 156,881.00 | 27,220.12 | 17.35% | 14.35% | 7.40% | 51.60%
8 | 156,455.56 | 28,253.21 | 18.06% | 14.19% | 5.26% | 37.09%
9 | 156,522.32 | 29,669.41 | 18.96% | 11.17% | 3.48% | 31.16%
10 | 155,897.37 | 32,527.29 | 20.86% | 11.19% | 3.60% | 32.14%
11 | 154,233.00 | 29,684.87 | 19.25% | 10.25% | 2.93% | 28.60%
12 | 154,995.52 | 27,008.87 | 17.43% | 11.73% | 3.79% | 32.29%

The conditional analysis by month highlights seasonal patterns in both average property price and listings effectiveness:

  • Average Property Price (USD): Prices start relatively lower in January ($145,640) and gradually increase, peaking in June ($161,546), before settling between roughly $154,000 and $157,000 from July to December. This suggests a mid-year peak in property values. The coefficient of variation does not show any particular seasonal pattern.

  • Listings Effectiveness (%): Effectiveness is lowest at the beginning of the year (8.31% in January), rises steadily to a peak in July (14.35%), and then declines toward the end of the year, with a small rebound in December (11.73%). The coefficient of variation also rises mid-year, particularly in July (51.60%), suggesting more variability in listing success during peak months.

Overall, the data shows a clear seasonal trend, with both prices and listing effectiveness peaking around mid-year, reflecting higher market activity in the summer months.
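
The peak months can be read off Table 14 programmatically. The Python sketch below is illustrative (the report's analysis was done in R); the helper name peak_month is hypothetical, and the "swing" is simply the gap between the highest and lowest monthly mean relative to the lowest.

```python
# Monthly means copied from Table 14 (index 0 = January).
avg_price = [145640.42, 148840.48, 151136.54, 151461.33, 158235.03, 161545.82,
             156881.00, 156455.56, 156522.32, 155897.37, 154233.00, 154995.52]
effectiveness = [8.31, 8.78, 11.60, 12.53, 14.15, 14.24,
                 14.35, 14.19, 11.17, 11.19, 10.25, 11.73]

def peak_month(values):
    """Return the 1-based month with the highest value."""
    return max(range(len(values)), key=values.__getitem__) + 1

price_peak = peak_month(avg_price)
effectiveness_peak = peak_month(effectiveness)
# Spread between the best and worst month, relative to the worst month.
price_swing_pct = (max(avg_price) / min(avg_price) - 1) * 100

print(price_peak, effectiveness_peak, round(price_swing_pct, 1))
```

The seasonal swing in mean price is modest compared with the swing in effectiveness, whose July mean is well above its January mean.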

Below are the plots illustrating the conditional analysis of listings_effectiveness and average_unit_price, showing how these variables vary across city-year combinations and city-month combinations. The graphs complement the tables by providing a visual representation of trends, differences between cities, and seasonal patterns throughout the year.

Average property price by city and year: the increasing trend in average property price over the years is clearly visible across all cities.

Average property price by city and month: the peak in June is not immediately visible in the graph, as the bars across months are quite similar. The peak is likely driven by smaller cities, which reduces the visual impact of the increase.

Listings effectiveness by city and year: the increase in listings effectiveness over time is very noticeable. Initially, Wichita Falls had the highest effectiveness, but over the years Bryan-College Station became clearly the most effective city.

Listings effectiveness by city and month: the peak in July is prominent, mainly due to Bryan-College Station, which consistently shows the highest effectiveness among the cities.

9 Outcomes of the Analysis and Key Takeaways

This chapter summarizes the main findings of the Texas real estate market analysis, highlighting insights relevant for investors, developers, and market professionals.

Trends in Average Property Prices

  • From 2010 to 2014, the average property price rose from $150,189 to $163,559, with a slight dip in 2011.

  • The coefficient of variation (CV) also slightly increased over the period, peaking at 19.41% in 2014, indicating modest growth in price variability.

Market Segmentation Across Cities

  • Bryan-College Station: Premium market, with the highest median and average prices.

  • Tyler: Slightly lower prices but highly dynamic, with a strong number of sales and total sales value.

  • Beaumont: Moderate pricing with balanced sales activity.

  • Wichita Falls: Most affordable market, with the lowest prices and small volumes.

  • Price ranges across cities rarely overlap, indicating clear geographic segmentation and market differentiation.

Seasonal Trends in Sales

  • Total sales value and number of sales exhibit a strong seasonal pattern, peaking in late spring and summer (May–August).

  • Median property prices, in contrast, do not show marked seasonal variation, suggesting that the timing of sales has little influence on pricing levels.

City-Level Sales Performance

  • Tyler: Consistently shows the highest total sales value and transaction volume, making it the most dynamic market. Its strong performance is driven primarily by volume rather than property price, as Bryan-College Station has higher median prices.

  • Bryan-College Station: Second in total sales value and transaction volume. Despite lower transaction numbers than Tyler, its high listings effectiveness (highest among the cities) makes it a dynamic and highly responsive market, attractive for premium listings and high-value investments.

  • Beaumont: Balanced in both price and sales activity, representing a stable mid-tier market.

  • Wichita Falls: Smallest market in terms of sales and volume, but with affordable pricing for entry-level buyers; limited market dynamism.

Listings Effectiveness

  • Mean listings effectiveness remains below 20% overall (11.87%) and in every city, year, and month, suggesting room for improvement via targeted campaigns.

  • Effectiveness has increased over time, from 9.97% in 2010 to a peak of 15.70% in 2014.

  • There is a seasonal effect, with higher effectiveness during the summer months, mirroring the total sales peak.

  • Bryan-College Station stands out with the highest effectiveness, reinforcing its status as a dynamic premium market.

Market Supply and Active Listings

  • The combination of high active listings and high transaction volumes confirms Tyler as the most liquid and dynamic market.

  • Bryan-College Station, while having fewer active listings than Tyler, achieves high sales efficiency through superior listings effectiveness, highlighting its market responsiveness.

Overall Market Insights for Investors

  • Bryan-College Station: Premium pricing, high listings effectiveness, dynamic market despite lower volume than Tyler—ideal for high-value investments.

  • Tyler: High total sales value and transaction volume, abundant active listings—ideal for investors seeking market activity and liquidity.

  • Beaumont: Balanced pricing and moderate dynamism—suitable for mid-tier investment strategies.

  • Wichita Falls: Low pricing, small volumes, and limited activity—suitable for entry-level buyers or small-scale investors, but with careful attention to the market’s limited dynamism.

10 Appendix

This analysis was conducted using R version 4.5.1 and relied exclusively on the provided dataset.