List of required packages
Package	Version
moments	0.14.1
knitr	1.51
kableExtra	1.4.0
ggplot2	4.0.3
tidyr	1.3.2
dplyr	1.2.1
scales	1.4.0
stringr	1.6.0

# load the dataset stored in the same folder as the .Rmd file
df <- read.csv("realestate_texas.csv")
# Randomly select row indices, then sort them to keep original order
idx <- sort(sample(nrow(df), min(10, nrow(df))))
kable(df[idx, ])

	city	year	month	sales	volume	median_price	listings	months_inventory
10	Beaumont	2010	10	150	23.904	138500	1779	11.5
34	Beaumont	2012	10	193	27.350	121800	1671	9.9
42	Beaumont	2013	6	232	36.275	134100	1675	9.0
74	Bryan-College Station	2011	2	101	16.125	148500	1562	9.3
95	Bryan-College Station	2012	11	159	28.882	149100	1442	7.3
119	Bryan-College Station	2014	11	169	34.903	172800	973	3.8
165	Tyler	2013	9	287	51.099	147600	2917	10.2
199	Wichita Falls	2011	7	127	13.594	102300	1029	9.2
222	Wichita Falls	2013	6	121	15.547	104700	923	7.9
233	Wichita Falls	2014	5	140	17.833	115700	899	7.6

1. Analysis of the variables

Description of the study variables
Variable	Description	Type
city	City or market area observed	Qualitative nominal
year	Year of the observation	Quantitative discrete
month	Month of the observation	Quantitative discrete
sales	Number of sales in that city–month	Quantitative discrete
volume	Total sales volume in millions of US dollars	Quantitative continuous
median_price	Median sale price (US dollars)	Quantitative continuous
listings	Active for-sale listings (inventory)	Quantitative discrete
months_inventory	Months needed to clear inventory at the current sales pace	Quantitative continuous

Time dimension: year and month index the repeated monthly observations for each city. For time-based analysis, a period variable (e.g., the first day of the month) can be created from year and month. This enables chronological ordering, moving averages to analyze price trends, trend and seasonality analysis (on aggregated or city-level series), and cross-city comparisons within the same period.
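As a minimal sketch of this idea, assuming only base R (the toy data frame `toy` and the 3-month window are hypothetical choices for illustration), a `period` column can be built from `year` and `month` and used to order each city's series before computing a trailing moving average:

```r
# Hypothetical toy panel: two cities observed over the same four months
toy <- data.frame(
  city = rep(c("A", "B"), each = 4),
  year = 2010,
  month = rep(1:4, times = 2),
  median_price = c(100, 110, 120, 130, 200, 190, 180, 170)
)

# Build a Date for the first day of each month (zero-padded ISO string)
toy$period <- as.Date(sprintf("%04d-%02d-01", as.integer(toy$year), as.integer(toy$month)))

# Sort chronologically within each city before smoothing
toy <- toy[order(toy$city, toy$period), ]

# Trailing 3-month moving average computed separately for each city;
# stats::filter is called explicitly because dplyr::filter masks it when dplyr is attached
toy$ma3 <- ave(
  toy$median_price, toy$city,
  FUN = function(v) as.numeric(stats::filter(v, rep(1 / 3, 3), sides = 1))
)
print(toy)
```

Sorting before smoothing matters because `stats::filter()` works on positions, not dates: a missing month would silently shift the window.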

2. Measures of location, variability and shape

Comprehensive descriptive analysis involves using measures of position (central tendency), variability (dispersion) and shape to summarize distributions. For quantitative variables, the relevant measures include the mean, median, 1st and 3rd quartiles, standard deviation, interquartile range, variance and skewness/kurtosis indices.

In the code chunk below, the function compute_indices() is defined to produce a comprehensive set of descriptive statistics for a quantitative variable. The function returns measures of position (mean, median, first and third quartiles) together with minimum and maximum values, dispersion (variance, standard deviation, interquartile range and coefficient of variation), and distributional shape (skewness and excess kurtosis).

The function is then applied to the selected quantitative variables (sales, volume, median_price, listings, and months_inventory), and the resulting statistics are assembled into a single summary table and rounded to improve readability in the final report.

# Function to compute descriptive statistics for a quantitative variable
# (skewness() and kurtosis() come from the moments package)
compute_indices <- function(x) {
  # Convert to numeric and remove missing values
  x <- na.omit(as.numeric(x))
  
  # If there are no valid observations, return a vector of NAs with the same names
  if (length(x) == 0) { 
    return(c(mean = NA, median = NA, q1 = NA, q3 = NA, min = NA, max = NA,
             variance = NA, sd = NA, iqr = NA, cv = NA, skewness = NA, kurtosis_excess = NA))
  }
  
  # Store mean and standard deviation for reuse
  m <- mean(x)
  s <- sd(x)
  
  # Shape indices are undefined when sd is NA (single observation)
  # or 0 (all observations identical)
  shape_defined <- !is.na(s) && s > 0
  
  # Return a named vector of descriptive indices
  c(
    # --- position indices ---
    mean   = m,
    median = median(x),
    # names = FALSE avoids composite names such as "q1.25%"
    q1     = quantile(x, 0.25, names = FALSE),
    q3     = quantile(x, 0.75, names = FALSE),
    
    # --- extreme values ---
    min = min(x),
    max = max(x),
    
    # --- variability indices ---
    variance = var(x),
    sd       = s,
    iqr      = IQR(x),
    cv       = if (m == 0) NA else s / m,
    
    # --- shape indices ---
    skewness = if (shape_defined) moments::skewness(x) else NA,
    # moments::kurtosis() returns Pearson kurtosis; excess kurtosis subtracts 3
    kurtosis_excess = if (shape_defined) moments::kurtosis(x) - 3 else NA
  )
}

# Quantitative variables of interest
vars <- c("sales", "volume", "median_price", "listings", "months_inventory")

# Apply the function to each variable and combine results in a table
results <- t(sapply(vars, function(v) compute_indices(df[[v]])))
results <- as.data.frame(results)

# round values for cleaner reporting
results_rounded <- round(results, 2)
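For reference, the shape indices can be written out explicitly. The sketch below assumes, to our understanding of the moments package, that `skewness()` and `kurtosis()` use population (divide-by-n) moments, i.e. $m_3/m_2^{3/2}$ and $m_4/m_2^2$; the helper names are hypothetical and the toy vector is chosen so the results can be checked by hand:

```r
# Central moment of order r using the population (divide-by-n) convention
central_moment <- function(x, r) mean((x - mean(x))^r)

# Moment-based skewness: m3 / m2^(3/2)
skew_pop <- function(x) central_moment(x, 3) / central_moment(x, 2)^1.5

# Excess kurtosis: m4 / m2^2 - 3 (0 for a normal distribution)
kurt_excess_pop <- function(x) central_moment(x, 4) / central_moment(x, 2)^2 - 3

# Toy vector: mean 4, deviations -3, -2, -1, 0, 6
x <- c(1, 2, 3, 4, 10)
skew_pop(x)         # m3 = 36, m2 = 10, so 36 / 10^1.5 (about 1.138)
kurt_excess_pop(x)  # m4 = 278.8, so 278.8 / 100 - 3 = -0.212
```

Under this assumption, the standardization inside the shape indices uses the divide-by-n variance, whereas `sd()` and `var()` in `compute_indices()` divide by n - 1; with 240 observations the difference is negligible.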

Then, the summary statistics are transformed into a report-ready table.

# Build final table from computed results
# names of variables are moved from row names into first column (Variable)
table_results <- cbind(Variable = rownames(results_rounded), results_rounded)
rownames(table_results) <- NULL

# Rename columns to report-friendly labels
# (the first column header is intentionally left blank)
colnames(table_results) <- c("", "Mean", "Median", "Q1", "Q3", "Min", "Max",
                             "Variance", "SD", "IQR", "CV", "Skewness", "Excess Kurtosis")

# Create formatted table
kable(
  table_results,
  caption = "Descriptive statistics for quantitative variables",
  align = "lrrrrrrrrrrrr",
  booktabs = TRUE
) %>%
  # Add headers above for grouping indices
  add_header_above(c(" " = 1, "Position" = 4, "Extremes" = 2, "Variability" = 4, "Shape" = 2)) %>%
  # Add style for HTML
  kable_styling(full_width = FALSE, bootstrap_options = c("striped", "hover", "condensed"))

Descriptive statistics for quantitative variables
	Position				Extremes		Variability				Shape
	Mean	Median	Q1	Q3	Min	Max	Variance	SD	IQR	CV	Skewness	Excess Kurtosis
sales	192.29	175.50	127.00	247.00	79.00	423.00	6.34430e+03	79.65	120.00	0.41	0.72	-0.31
volume	31.01	27.06	17.66	40.89	8.17	83.55	2.77270e+02	16.65	23.23	0.54	0.88	0.18
median_price	132665.42	134500.00	117300.00	150050.00	73800.00	180000.00	5.13573e+08	22662.15	32750.00	0.17	-0.36	-0.62
listings	1738.02	1618.50	1026.50	2056.00	743.00	3296.00	5.66569e+05	752.71	1029.50	0.43	0.65	-0.79
months_inventory	9.19	8.95	7.80	10.95	3.40	14.90	5.31000e+00	2.30	3.15	0.25	0.04	-0.17

Since descriptive statistics have been computed for the quantitative variables (sales, volume, median_price, listings, and months_inventory), frequency distributions are more appropriate for the remaining categorical or discrete time-related variables, starting here with year (city and month are tabulated in Section 5):

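# Compute a frequency table for one variable:
# ni = absolute, fi = relative, Ni = cumulative absolute, Fi = cumulative relative
freq_dist_1var <- function(x) {
  ni <- table(x)
  # divide by the number of non-missing values, since table() drops NAs
  fi <- ni / sum(ni)
  Ni <- cumsum(ni)
  Fi <- cumsum(fi)
  cbind(ni, fi, Ni, Fi)
}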

# apply function to variable year
freq_year  <- freq_dist_1var(df$year)
# render table
kable(freq_year, caption = "Frequency distribution - Year", align = "lrrrr", booktabs = TRUE)

Frequency distribution - Year
	ni	fi	Ni	Fi
2010	48	0.2	48	0.2
2011	48	0.2	96	0.4
2012	48	0.2	144	0.6
2013	48	0.2	192	0.8
2014	48	0.2	240	1.0
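A small, hypothetical example shows why relative frequencies are safer to compute as `ni / sum(ni)` than as `ni / length(x)`: `table()` drops missing values by default, while `length()` still counts them. The year table above has no missing values (its `fi` column sums to 1), so the distinction only matters when the data contain `NA`s:

```r
# Hypothetical vector with one missing value
x <- c(2010, 2010, 2011, NA)

ni <- table(x)              # NA is excluded: counts are 2 and 1
fi_length <- ni / length(x) # divides by 4, so the shares sum to 0.75
fi_valid  <- ni / sum(ni)   # divides by 3, so the shares sum to 1

sum(fi_length)
sum(fi_valid)
```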

3. Identification of variables with the highest variability and skewness

Based on the computed summary table, the conclusions are as follows:

## 1 - Highest variability (CV): volume - 0.54

## 2 - Highest skewness (absolute): volume - 0.88

The first conclusion is based on the coefficient of variation (CV = sd / mean), which is the appropriate measure for comparing dispersion across variables expressed on different scales. In the reported results, volume has the largest CV (0.54), therefore volume exhibits the greatest relative variability.

The second conclusion is based on skewness. Considering asymmetry in absolute value, volume shows the largest skewness (0.88). Since the skewness is positive, the distribution of volume is right-skewed, indicating a longer upper tail and the presence of relatively high observations.
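Both rankings can be double-checked from the rounded values in the summary table above. The means, standard deviations and skewness values below are copied from that table, so the CV is recomputed only up to rounding:

```r
# Rounded means, standard deviations and skewness copied from the summary table
means <- c(sales = 192.29, volume = 31.01, median_price = 132665.42,
           listings = 1738.02, months_inventory = 9.19)
sds   <- c(sales = 79.65, volume = 16.65, median_price = 22662.15,
           listings = 752.71, months_inventory = 2.30)
skew  <- c(sales = 0.72, volume = 0.88, median_price = -0.36,
           listings = 0.65, months_inventory = 0.04)

# Coefficient of variation: scale-free, so it can compare counts with dollar amounts
cv <- sds / means
round(cv, 2)

# Variable with the largest relative dispersion and the largest absolute skewness
names(which.max(cv))
names(which.max(abs(skew)))
```

Comparing raw standard deviations instead would rank median_price first purely because it is measured in dollars.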

4. Creating class intervals for a quantitative variable

The quantitative variable median_price was selected and partitioned into class intervals in order to construct a frequency distribution and visualize the resulting frequencies using a bar chart.

# breakpoints for the class intervals (pretty() returns round, equally spaced values covering the data range)
breaks_price <- pretty(df$median_price)

# Human-readable numeric labels (no scientific notation)
labels_price <- paste0(
  "[",
  formatC(breaks_price[-length(breaks_price)], format = "f", digits = 0, big.mark = ""),
  " - ",
  formatC(breaks_price[-1],                   format = "f", digits = 0, big.mark = ""),
  ")"
)

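# right = FALSE gives left-closed classes [a, b); include.lowest = TRUE
# closes the last class on the right so that the maximum value is counted
df$median_price_class <- cut(
  df$median_price,
  breaks = breaks_price,
  labels = labels_price,
  include.lowest = TRUE,
  right = FALSE
)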
# apply function to variable median_price_class
freq_median_price  <- freq_dist_1var(df$median_price_class)
# render table
kable(freq_median_price, caption = "Frequency distribution - Median price", align = "lrrrr", booktabs = TRUE)

Frequency distribution - Median price
	ni	fi	Ni	Fi
[60000 - 80000)	1	0.0041667	1	0.0041667
[80000 - 100000)	23	0.0958333	24	0.1000000
[100000 - 120000)	41	0.1708333	65	0.2708333
[120000 - 140000)	74	0.3083333	139	0.5791667
[140000 - 160000)	80	0.3333333	219	0.9125000
[160000 - 180000)	21	0.0875000	240	1.0000000
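The interaction of `right = FALSE` and `include.lowest = TRUE` in `cut()` determines where boundary values land. In the toy example below (hypothetical breaks and values), classes are left-closed, and the top break is kept in the last class instead of becoming `NA`. This is why the largest observed median_price (180000) is counted in the last class even though its label reads `[160000 - 180000)`:

```r
breaks <- c(60000, 80000, 100000)
x <- c(60000, 79999, 80000, 100000)

# Left-closed classes [a, b); include.lowest = TRUE closes the last one on the right
with_lowest <- cut(x, breaks = breaks, right = FALSE, include.lowest = TRUE)
as.integer(with_lowest)  # class index of each value: 1 1 2 2

# Without include.lowest, the top break falls outside every class
without_lowest <- cut(x, breaks = breaks, right = FALSE)
is.na(without_lowest)    # FALSE FALSE FALSE TRUE
```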

# Convert matrix to data frame and keep class labels
freq_median_price_df <- as.data.frame(freq_median_price)
freq_median_price_df$Class <- rownames(freq_median_price_df)
# Preserve original order of classes
freq_median_price_df$Class <- factor(freq_median_price_df$Class, levels = freq_median_price_df$Class)

# Relative frequency bar chart
ggplot(freq_median_price_df, aes(x = Class, y = fi)) +
  geom_col(fill = "#2c7fb8", width = 0.5, color = "grey25") +
  labs(
    title = "Relative frequency distribution for median price classes",
    x = "Median price",
    y = "Relative frequency"
  ) +
  theme_bw(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    panel.grid.minor = element_blank()
  )

The Gini heterogeneity index was computed from the relative class frequencies fi.

The unnormalized index is

\[\begin{equation} G = 1 - \sum_{i=1}^{k} f_i^2 \end{equation}\]

while the normalized version is

\[\begin{equation} G^{*} = \frac{k}{k-1}\left(1 - \sum_{i=1}^{k} f_i^2\right) \end{equation}\] where $f_i$ is the relative frequency of class $i$ and $k$ is the number of non-empty classes.

p <- freq_median_price_df$fi
# remove possible zero/NA classes
p <- p[!is.na(p) & p > 0]

# number of non empty classes
k <- length(p)

# gini heterogeneity (unnormalized)
G <- 1 - sum(p^2)

# normalized Gini heterogeneity in [0,1]
G_norm <- if (k > 1) (k / (k - 1)) * G else NA_real_

gini_df <- data.frame(
  k = k,
  G = G,
  G_norm = G_norm
)

kable(
  gini_df,
  caption = "Gini heterogeneity indices based on class relative frequencies",
  digits = 4,
  col.names = c("Number of non empty classes (k)",
                "Gini heterogeneity (G)",
                "Normalized Gini heterogeneity (G*)"),
  align = "lll",
  booktabs = TRUE
)

Gini heterogeneity indices based on class relative frequencies
Number of non empty classes (k)	Gini heterogeneity (G)	Normalized Gini heterogeneity (G*)
6	0.7478	0.8973
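As a cross-check, the same indices follow directly from the absolute frequencies in the median_price table, since $f_i = n_i/N$ implies $\sum_i f_i^2 = \sum_i n_i^2 / N^2$:

```r
# Absolute class frequencies copied from the median_price frequency table
ni <- c(1, 23, 41, 74, 80, 21)
N  <- sum(ni)     # 240 observations
k  <- length(ni)  # 6 non-empty classes

# sum(ni^2) = 1 + 529 + 1681 + 5476 + 6400 + 441 = 14528
G      <- 1 - sum(ni^2) / N^2
G_norm <- k / (k - 1) * G

round(c(G = G, G_norm = G_norm), 4)
```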

Based on the frequency distribution and the Gini heterogeneity indices, the empirical distribution of median_price across the six class intervals shows a relatively high heterogeneity index (G = 0.7478). With k = 6 non-empty classes, the normalized index (G* = 0.8973) is close to its upper bound of 1, indicating that observations are spread across several price bands rather than concentrated in a single one.

At the same time, the class frequencies show a clear concentration in the mid-to-upper market segments: the intervals [120,000–140,000) and [140,000–160,000) together account for approximately 64% of the sample (relative frequencies of about 0.31 and 0.33, respectively). By contrast, the lowest interval [60,000–80,000) is extremely sparse (ni = 1, fi ≈ 0.0042), indicating negligible representation at the bottom of the defined price bands.

5. Probability calculation

Probabilities are estimated empirically from observed frequencies. For an event $A$, the estimated probability is \[ \widehat{P}(A)=\frac{n_A}{N}, \] where $n_A$ denotes the absolute frequency of event $A$, and $N$ is the total number of observations.
For two events $A$ and $B$, the joint probability is \[ \widehat{P}(A \cap B)=\frac{n_{A,B}}{N}, \] while the conditional probability (for $n_B>0$) is \[ \widehat{P}(A \mid B)=\frac{n_{A,B}}{n_B}. \] Percent values are obtained as $100 \times \widehat{P}(\cdot)$.
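The year, city, month and period frequency tables in this report (48 rows per year, 60 per city, 20 per month, 4 per period) imply a balanced panel of 4 cities × 5 years × 12 months. The sketch below rebuilds only that design skeleton with `expand.grid()` (city names from the data, no prices or sales) to illustrate the three estimators, including a conditional probability that this section does not itself compute:

```r
# Design skeleton of the balanced panel: 4 cities x 5 years x 12 months = 240 rows
panel <- expand.grid(
  city  = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  year  = 2010:2014,
  month = 1:12
)
N <- nrow(panel)

# Marginal: P(city = Beaumont) = n_A / N
p_beaumont <- sum(panel$city == "Beaumont") / N

# Joint: P(year = 2012 and month = 12) = n_{A,B} / N
p_dec2012 <- sum(panel$year == 2012 & panel$month == 12) / N

# Conditional: P(city = Beaumont | year = 2012) = n_{A,B} / n_B
n_B <- sum(panel$year == 2012)
p_beaumont_given_2012 <- sum(panel$city == "Beaumont" & panel$year == 2012) / n_B

c(p_beaumont, p_dec2012, p_beaumont_given_2012)
```

Because the design is balanced, conditioning on the year leaves the city share unchanged (0.25): city and year are independent by construction.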

5.1 Probability that a randomly selected row from this dataset corresponds to the city of Beaumont

The probability is estimated using an empirical frequency approach by computing a frequency distribution for city, obtaining absolute frequencies (ni) and relative frequencies (fi) for each category.

# apply function to variable city
freq_city  <- freq_dist_1var(df$city)
# render table
kable(freq_city, caption = "Frequency distribution - City", align = "lrrrr", booktabs = TRUE)

Frequency distribution - City
	ni	fi	Ni	Fi
Beaumont	60	0.25	60	0.25
Bryan-College Station	60	0.25	120	0.50
Tyler	60	0.25	180	0.75
Wichita Falls	60	0.25	240	1.00

The estimated probability of selecting a row corresponding to Beaumont is then derived from the relative frequency of that category:

\[ \hat{P}(\text{City}=\text{Beaumont}) = f_{\text{Beaumont}} \]

# Estimated probability P(city = "Beaumont")
p_beaumont <- freq_city["Beaumont", "fi"]
# Build report table
prob_table <- data.frame(Event = "City = Beaumont", Probability = p_beaumont, Percentage = 100 * p_beaumont)
# Print with kable
knitr::kable(prob_table, caption = "Probability of selecting Beaumont",
             digits = c(0, 2, 2), align = "lrr", booktabs = TRUE)

Probability of selecting Beaumont
Event	Probability	Percentage
City = Beaumont	0.25	25

Based on the frequency distribution, the estimate is:

\[ \hat{P}(\text{City}=\text{Beaumont}) = 0.25 \]

which corresponds to 25% of the sample.

5.2 Probability that a randomly selected row corresponds to the month of July

Following the same approach described in section 5.1, the probability of selecting a row corresponding to July is obtained from the relative frequency of month = 7 in the distribution of month:

\[ \hat{P}(\text{Month}=7)=f_{7} \]

# apply function to variable month
freq_month  <- freq_dist_1var(df$month)

# render table
kable(freq_month, row.names = TRUE, caption = "Frequency distribution - Month", 
      digits = c(0, 4, 0, 4), align = "lrrrr", booktabs = TRUE)

Frequency distribution - Month
	ni	fi	Ni	Fi
1	20	0.0833	20	0.0833
2	20	0.0833	40	0.1667
3	20	0.0833	60	0.2500
4	20	0.0833	80	0.3333
5	20	0.0833	100	0.4167
6	20	0.0833	120	0.5000
7	20	0.0833	140	0.5833
8	20	0.0833	160	0.6667
9	20	0.0833	180	0.7500
10	20	0.0833	200	0.8333
11	20	0.0833	220	0.9167
12	20	0.0833	240	1.0000

# Estimated probability P(month = 7), selected by row name rather than position
p_month_july <- freq_month["7", "fi"]
# Build report table
prob_table <- data.frame(Event = "Month = 7", Probability = p_month_july, Percentage = 100 * p_month_july)
# Print with kable
knitr::kable(prob_table,caption = "Probability of selecting July",
             digits = c(0, 4, 4), align = "lrr", booktabs = TRUE)

Probability of selecting July
Event	Probability	Percentage
Month = 7	0.0833	8.3333

The estimated probability is:

\[ \hat{P}(\text{Month}=7)=0.0833 \]

which corresponds to approximately 8.33% of the sample.
The frequency table shows that every month has exactly 20 observations (4 cities × 5 years), so the panel is balanced over months and each month contributes exactly one-twelfth of the sample.

5.3 Probability that a randomly selected row corresponds to December 2012

Consistent with the empirical procedure adopted in Sections 5.1 and 5.2, the probability of this event is obtained from the relative frequency of the category period = 2012-12-01 in the frequency distribution of the combined variable period.

# define a period column (first day of each month) as a Date
# sprintf zero-pads the month so the string is a valid ISO date (YYYY-MM-DD)
df$period <- as.Date(sprintf("%04d-%02d-01", as.integer(df$year), as.integer(df$month)))


# apply function to variable period
freq_period  <- freq_dist_1var(df$period)

# filter only 2012 rows (2012-01-01 ... 2012-12-01) to avoid printing full table
# Ni and Fi remain cumulative with respect to the original full period ordering
freq_period_2012 <- freq_period[grepl("^2012-", rownames(freq_period)), , drop = FALSE]

# render table 
kable(freq_period_2012, caption = "Frequency distribution - Period (filtered for 2012)", 
      digits = c(0, 4, 0, 4), align = "lrrrr", booktabs = TRUE)

Frequency distribution - Period (filtered for 2012)
	ni	fi	Ni	Fi
2012-01-01	4	0.0167	100	0.4167
2012-02-01	4	0.0167	104	0.4333
2012-03-01	4	0.0167	108	0.4500
2012-04-01	4	0.0167	112	0.4667
2012-05-01	4	0.0167	116	0.4833
2012-06-01	4	0.0167	120	0.5000
2012-07-01	4	0.0167	124	0.5167
2012-08-01	4	0.0167	128	0.5333
2012-09-01	4	0.0167	132	0.5500
2012-10-01	4	0.0167	136	0.5667
2012-11-01	4	0.0167	140	0.5833
2012-12-01	4	0.0167	144	0.6000

# Estimated probability P(month = 12 and year = 2012)
p_event_dec2012 <- freq_period["2012-12-01", "fi"]
# Build report table
prob_table <- data.frame(Event = "December 2012",
                         Probability = p_event_dec2012, Percentage = 100 * p_event_dec2012)
# Print with kable
knitr::kable(prob_table, caption = "Probability of selecting December 2012",
             digits = c(0, 4, 4), align = "lrr", booktabs = TRUE)

Probability of selecting December 2012
Event	Probability	Percentage
December 2012	0.0167	1.6667

Therefore, the estimated probability is:

\[ \hat{P}(\text{December 2012})=0.0167 \]

which corresponds to approximately 1.67% of the full dataset.

6. Creation of new variables

6.1 Define estimated average sale price

The estimated average sale price is computed as \[ \mathrm{avg\_sale\_price}=\frac{\mathrm{volume}\times 10^{6}}{\mathrm{sales}} \] since, as stated in the variable descriptions in Section 1, volume is measured in millions of US dollars, while sales is the number of transactions per city-month.

# volume is in millions of USD, so multiply by 10^6
# then divide by number of sales
df$avg_sale_price <- ifelse(df$sales > 0, (df$volume * 1e+06) / df$sales, NA)

# print examples
# randomly select row indices, then sort them to keep original order
idx <- sort(sample(nrow(df), min(5, nrow(df))))
kable(df[idx, c("city", "year", "month", "volume", "sales", "avg_sale_price")])

	city	year	month	volume	sales	avg_sale_price
40	Beaumont	2013	4	29.433	198	148651.5
93	Bryan-College Station	2012	9	28.434	149	190832.2
206	Wichita Falls	2012	2	10.697	90	118855.6
209	Wichita Falls	2012	5	12.451	102	122068.6
229	Wichita Falls	2014	1	9.626	89	108157.3

The avg_sale_price provides a mean value per transaction and complements the existing median-based price indicator (median_price).
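A minimal check of the unit conversion, using volume and sales copied from two of the sample rows above plus one hypothetical zero-sales row to exercise the `NA` guard:

```r
volume <- c(29.433, 28.434, 5)   # millions of USD; the third row is hypothetical
sales  <- c(198, 149, 0)         # number of transactions

# Convert volume to dollars, then divide by the number of sales;
# rows with zero sales get NA instead of Inf
avg_sale_price <- ifelse(sales > 0, volume * 1e6 / sales, NA)
round(avg_sale_price, 1)
```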

6.2 Define listing effectiveness

Listing effectiveness is defined as the extent to which available inventory is converted into sales.
A simple monthly absorption proxy is: \[ \mathrm{listing\_effectiveness}=\frac{\mathrm{sales}}{\mathrm{listings}} \] An alternative turnover-based indicator is: \[ \mathrm{inventory\_turnover\_ratio}=\frac{1}{\mathrm{months\_inventory}} \]

While months_inventory describes how long the current stock would last at the current sales pace, its inverse expresses the share of that stock absorbed per month, so higher values indicate faster market turnover. If months_inventory were defined exactly as listings divided by the same month's sales, the two indicators would coincide; the sample values below differ, so they are related but not identical.
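The relationship between the two indicators can be made concrete. Assuming, purely for illustration, that months of inventory were computed as listings divided by the same month's sales, the inverse would equal listing_effectiveness exactly. The Beaumont row for January 2012 in the sample table below shows that the reported months_inventory does not follow that definition:

```r
sales    <- 110
listings <- 1647
reported_months_inventory <- 11.4  # value reported in the dataset for this row

# Hypothetical same-month definition: months of inventory = listings / sales
naive_months_inventory <- listings / sales   # about 14.97 months

# Under that definition the inverse coincides with listing effectiveness
all.equal(1 / naive_months_inventory, sales / listings)

# For this row the reported value implies faster turnover than the same-month ratio
c(listing_effectiveness    = sales / listings,              # about 0.0668
  inventory_turnover_ratio = 1 / reported_months_inventory) # about 0.0877
```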

# listing effectiveness (ratio of sales to active listings)
df$listing_effectiveness <- ifelse(df$listings > 0, df$sales / df$listings, NA)

# inverse of months_inventory (higher means faster inventory turnover)
df$inventory_turnover_ratio <- ifelse(df$months_inventory > 0, 1 / df$months_inventory, NA)

# print examples
# randomly select row indices, then sort them to keep original order
idx <- sort(sample(nrow(df), min(5, nrow(df))))
kable(df[idx, c("city", "year", "month", "sales", "listings", "months_inventory",
                "listing_effectiveness", "inventory_turnover_ratio")])

	city	year	month	sales	listings	months_inventory	listing_effectiveness	inventory_turnover_ratio
25	Beaumont	2012	1	110	1647	11.4	0.0667881	0.0877193
92	Bryan-College Station	2012	8	296	1518	8.1	0.1949934	0.1234568
133	Tyler	2011	1	143	2852	12.6	0.0501403	0.0793651
163	Tyler	2013	7	369	2998	10.7	0.1230821	0.0934579
233	Wichita Falls	2014	5	140	899	7.6	0.1557286	0.1315789

# aggregate data by period across all cities
monthly_trend <- df %>%
  group_by(period) %>%
  summarise(
    # calculate means for both indicators
    listing_effectiveness = mean(listing_effectiveness, na.rm = TRUE),
    inventory_turnover_ratio = mean(inventory_turnover_ratio, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  pivot_longer(cols = c(listing_effectiveness, inventory_turnover_ratio),
               names_to = "Metric", values_to = "Value")

# plot monthly trends of effectiveness metrics
ggplot(monthly_trend, aes(x = period, y = Value, color = Metric)) +
  geom_line(linewidth = 0.9) +
  theme_bw(base_size = 11) +
  labs(title = "Monthly trend of effectiveness metrics", x = NULL, y = "Mean value", color = NULL)

The monthly trend shows a general increase in market absorption over the sample period for both indicators, consistent with their conceptual link (both proxy the speed at which inventory is absorbed):

inventory_turnover_ratio follows a relatively smooth upward trajectory, rising from about 0.10–0.12 in the early years to around 0.17–0.18 by the end of the period
listing_effectiveness is more volatile, with pronounced short-term oscillations and several peaks (up to ~0.20), suggesting higher sensitivity to short-run fluctuations in sales relative to listings; it nevertheless exhibits a clear positive trend

7. Conditional analysis

Conditional summary statistics are descriptive measures computed on subsets of the data defined by specific criteria (strata). Typical measures include the mean, median, total, and standard deviation; in particular, the mean within a stratum is an empirical estimate of the conditional expectation of the variable given that stratum. These summaries characterize how a variable behaves across different sub-populations or time segments.

This section reports conditional analyses stratified by city, year, and month, and by the city–year and city–month combinations. The analysis below is implemented with dplyr, although base R offers equivalent tools. For each stratum, the mean and standard deviation are estimated and then visualized to compare cross-sectional differences and temporal patterns.
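For comparison with the dplyr pipeline below, here is a minimal base R sketch of the same idea on a hypothetical toy data frame: `aggregate()` with a formula gives the conditional mean per stratum, and `tapply()` gives the conditional standard deviation:

```r
# Hypothetical toy data: two strata with two observations each
toy <- data.frame(city = c("A", "A", "B", "B"), price = c(1, 3, 10, 20))

# Conditional mean of price given city (one row per stratum)
cond_mean <- aggregate(price ~ city, data = toy, FUN = mean)
cond_mean

# Conditional standard deviation (n - 1 denominator, as in sd())
cond_sd <- tapply(toy$price, toy$city, sd)
cond_sd
```

As with the dplyr version, `sd()` returns `NA` for any stratum that contains a single observation.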

# quantitative variables and derived indicators
num_vars <- c("avg_sale_price","listing_effectiveness", "inventory_turnover_ratio")

# compute mean and standard deviation of every variable in num_vars
summarise_mean_sd <- function(data) {
  data %>%
    summarise(
      # mean of each selected column within the current group
      across(all_of(num_vars),\(x) mean(x, na.rm = TRUE), .names = "{.col}_mean"),
      # standard deviation of each selected column within the current group
      across(all_of(num_vars), \(x) sd(x, na.rm = TRUE), .names = "{.col}_sd"),
      .groups = "drop"  # return an ungrouped data frame
    )
}

# conditional summaries for different stratifications
summary_by_city        <- df %>% group_by(city)        %>% summarise_mean_sd()
summary_by_year        <- df %>% group_by(year)        %>% summarise_mean_sd()
summary_by_month       <- df %>% group_by(month)       %>% summarise_mean_sd()
summary_by_city_year   <- df %>% group_by(city, year)  %>% summarise_mean_sd()
summary_by_city_month  <- df %>% group_by(city, month) %>% summarise_mean_sd()

# print summary tables
kable(summary_by_city,       caption = "Conditional summary by city (mean, sd)",       digits = 4, booktabs = TRUE)

Conditional summary by city (mean, sd)
city	avg_sale_price_mean	listing_effectiveness_mean	inventory_turnover_ratio_mean	avg_sale_price_sd	listing_effectiveness_sd	inventory_turnover_ratio_sd
Beaumont	146640.4	0.1061	0.1032	11232.13	0.0267	0.0182
Bryan-College Station	183534.3	0.1473	0.1453	15149.35	0.0729	0.0533
Tyler	167676.8	0.0935	0.0909	12350.51	0.0235	0.0164
Wichita Falls	119430.0	0.1280	0.1293	11398.48	0.0247	0.0136

kable(summary_by_year,       caption = "Conditional summary by year (mean, sd)",       digits = 4, booktabs = TRUE)

Conditional summary by year (mean, sd)
year	avg_sale_price_mean	listing_effectiveness_mean	inventory_turnover_ratio_mean	avg_sale_price_sd	listing_effectiveness_sd	inventory_turnover_ratio_sd
2010	150188.6	0.0997	0.1046	23279.55	0.0337	0.0217
2011	148250.6	0.0927	0.0950	24938.38	0.0232	0.0181
2012	150898.7	0.1097	0.1040	26438.50	0.0281	0.0178
2013	158705.2	0.1346	0.1288	26523.81	0.0448	0.0314
2014	163558.7	0.1570	0.1534	31740.53	0.0618	0.0496

kable(summary_by_month,      caption = "Conditional summary by month (mean, sd)",      digits = 4, booktabs = TRUE)

Conditional summary by month (mean, sd)
month	avg_sale_price_mean	listing_effectiveness_mean	inventory_turnover_ratio_mean	avg_sale_price_sd	listing_effectiveness_sd	inventory_turnover_ratio_sd
1	145640.4	0.0831	0.1190	29819.11	0.0230	0.0290
2	148840.5	0.0878	0.1160	25120.42	0.0219	0.0284
3	151136.5	0.1160	0.1119	23237.92	0.0346	0.0283
4	151461.3	0.1253	0.1091	26174.30	0.0380	0.0297
5	158235.0	0.1415	0.1102	25787.19	0.0503	0.0314
6	161545.8	0.1424	0.1104	23470.46	0.0576	0.0337
7	156881.0	0.1435	0.1127	27220.12	0.0740	0.0386
8	156455.6	0.1419	0.1154	28253.21	0.0526	0.0394
9	156522.3	0.1117	0.1188	29669.41	0.0348	0.0417
10	155897.4	0.1119	0.1218	32527.29	0.0360	0.0418
11	154233.0	0.1025	0.1259	29684.87	0.0293	0.0437
12	154995.5	0.1173	0.1351	27008.87	0.0379	0.0496

kable(summary_by_city_year,  caption = "Conditional summary by city/year (mean, sd)",  digits = 4, booktabs = TRUE)

Conditional summary by city/year (mean, sd)
city	year	avg_sale_price_mean	listing_effectiveness_mean	inventory_turnover_ratio_mean	avg_sale_price_sd	listing_effectiveness_sd	inventory_turnover_ratio_sd
Beaumont	2010	146582.5	0.0898	0.0920	13960.173	0.0195	0.0062
Beaumont	2011	145921.9	0.0823	0.0855	12655.337	0.0117	0.0052
Beaumont	2012	141475.9	0.1015	0.0933	10345.771	0.0158	0.0079
Beaumont	2013	150079.0	0.1225	0.1142	6245.121	0.0215	0.0069
Beaumont	2014	149142.7	0.1346	0.1311	11234.169	0.0218	0.0064
Bryan-College Station	2010	174601.8	0.1056	0.1163	11964.068	0.0396	0.0111
Bryan-College Station	2011	173689.0	0.1027	0.1033	11645.001	0.0315	0.0117
Bryan-College Station	2012	179360.6	0.1215	0.1141	9072.876	0.0423	0.0167
Bryan-College Station	2013	187315.8	0.1708	0.1613	12931.505	0.0649	0.0372
Bryan-College Station	2014	202704.3	0.2362	0.2318	8625.369	0.0768	0.0312
Tyler	2010	159537.5	0.0745	0.0795	8554.899	0.0151	0.0059
Tyler	2011	160248.0	0.0773	0.0747	8949.978	0.0126	0.0064
Tyler	2012	165533.0	0.0902	0.0866	12271.146	0.0134	0.0052
Tyler	2013	174501.8	0.1012	0.0986	8939.224	0.0143	0.0065
Tyler	2014	178563.5	0.1242	0.1152	10805.818	0.0199	0.0122
Wichita Falls	2010	120032.5	0.1290	0.1306	12351.214	0.0302	0.0084
Wichita Falls	2011	113143.6	0.1085	0.1166	8247.222	0.0159	0.0084
Wichita Falls	2012	117225.3	0.1255	0.1222	13981.539	0.0154	0.0072
Wichita Falls	2013	122924.3	0.1439	0.1413	8760.490	0.0283	0.0133
Wichita Falls	2014	123824.3	0.1331	0.1355	10994.397	0.0187	0.0135

kable(summary_by_city_month, caption = "Conditional summary by city/month (mean, sd)", digits = 4, booktabs = TRUE)

Conditional summary by city/month (mean, sd)
city	month	avg_sale_price_mean	listing_effectiveness_mean	inventory_turnover_ratio_mean	avg_sale_price_sd	listing_effectiveness_sd	inventory_turnover_ratio_sd
Beaumont	1	142059.2	0.0760	0.1050	20363.512	0.0201	0.0151
Beaumont	2	146503.0	0.0826	0.1030	12974.719	0.0197	0.0146
Beaumont	3	149918.4	0.1037	0.1029	5398.706	0.0132	0.0198
Beaumont	4	142949.1	0.1118	0.1000	5511.596	0.0141	0.0176
Beaumont	5	146873.9	0.1208	0.0993	6495.480	0.0303	0.0187
Beaumont	6	148591.7	0.1183	0.0990	4913.971	0.0252	0.0186
Beaumont	7	153993.7	0.1061	0.0981	15215.577	0.0179	0.0190
Beaumont	8	150966.9	0.1278	0.1012	6549.042	0.0352	0.0198
Beaumont	9	144663.8	0.1043	0.1038	13874.571	0.0352	0.0238
Beaumont	10	148133.6	0.1137	0.1051	9899.859	0.0319	0.0213
Beaumont	11	134896.1	0.0966	0.1074	11773.634	0.0173	0.0223
Beaumont	12	150135.5	0.1119	0.1139	10028.542	0.0229	0.0228
Bryan-College Station	1	179365.7	0.0862	0.1403	13494.092	0.0256	0.0355
Bryan-College Station	2	169985.7	0.0867	0.1330	18446.113	0.0305	0.0389
Bryan-College Station	3	174920.3	0.1226	0.1246	8552.149	0.0546	0.0435
Bryan-College Station	4	182128.2	0.1443	0.1200	14123.928	0.0573	0.0468
Bryan-College Station	5	181804.4	0.1950	0.1294	18412.798	0.0620	0.0480
Bryan-College Station	6	181582.2	0.2164	0.1363	18298.850	0.0701	0.0530
Bryan-College Station	7	183344.8	0.2228	0.1447	16508.899	0.1132	0.0612
Bryan-College Station	8	184104.9	0.1943	0.1506	16633.849	0.0737	0.0608
Bryan-College Station	9	191815.7	0.1236	0.1578	9544.628	0.0520	0.0618
Bryan-College Station	10	193938.3	0.1214	0.1614	13905.882	0.0587	0.0614
Bryan-College Station	11	192760.5	0.1167	0.1661	11943.247	0.0436	0.0659
Bryan-College Station	12	186660.8	0.1381	0.1802	15651.209	0.0618	0.0775
Tyler	1	154935.3	0.0669	0.0929	6400.878	0.0161	0.0126
Tyler	2	164516.8	0.0768	0.0921	8645.045	0.0132	0.0135
Tyler	3	161441.0	0.0947	0.0899	11066.124	0.0114	0.0126
Tyler	4	162962.8	0.0971	0.0868	10856.908	0.0148	0.0133
Tyler	5	178711.5	0.1042	0.0866	6087.930	0.0233	0.0155
Tyler	6	180028.9	0.1071	0.0854	11050.260	0.0258	0.0151
Tyler	7	170866.7	0.1040	0.0852	8333.915	0.0225	0.0149
Tyler	8	173738.0	0.1028	0.0871	11343.693	0.0213	0.0159
Tyler	9	169106.3	0.0955	0.0896	17250.045	0.0248	0.0180
Tyler	10	167987.0	0.0950	0.0927	15113.128	0.0300	0.0199
Tyler	11	166102.4	0.0826	0.0975	7061.601	0.0267	0.0223
Tyler	12	161724.3	0.0952	0.1053	14740.546	0.0302	0.0260
Wichita Falls	1	106201.5	0.1032	0.1379	9788.224	0.0169	0.0156
Wichita Falls	2	114356.4	0.1052	0.1359	7397.539	0.0152	0.0120
Wichita Falls	3	118266.5	0.1428	0.1301	12167.279	0.0263	0.0067
Wichita Falls	4	117805.3	0.1481	0.1294	7684.451	0.0286	0.0116
Wichita Falls	5	125550.3	0.1459	0.1256	5015.104	0.0285	0.0136
Wichita Falls	6	135980.5	0.1278	0.1208	13412.726	0.0119	0.0092
Wichita Falls	7	119318.8	0.1411	0.1229	7206.987	0.0288	0.0120
Wichita Falls	8	117012.4	0.1428	0.1226	5664.009	0.0211	0.0128
Wichita Falls	9	120503.5	0.1235	0.1241	6905.672	0.0213	0.0162
Wichita Falls	10	113530.6	0.1176	0.1281	13971.742	0.0165	0.0166
Wichita Falls	11	123173.0	0.1141	0.1325	12234.014	0.0145	0.0154
Wichita Falls	12	121461.4	0.1242	0.1412	12532.343	0.0176	0.0144

7.1 Comparison of average sale price by city (and year)

The city-level chart summarizes conditional means of average sale price across all months and years within each market. Bryan–College Station and Tyler exhibit the highest central price levels, Wichita Falls records the lowest mean, and Beaumont occupies an intermediate position. The error bars (±1 SD) reflect within-city dispersion across all months and years, which combines seasonal and trend variation; Bryan–College Station shows the widest spread (SD ≈ 15,100 USD versus about 11,200–12,400 USD elsewhere), so cities differ not only in price level but also in variability.

# average sale price by city (mean ± SD)
ggplot(summary_by_city, 
       aes(x = city, y = avg_sale_price_mean)) +
  geom_col(col="black", fill = "#2c7fb8", width = 0.7) +
  geom_errorbar(
    aes(
      ymin = avg_sale_price_mean - avg_sale_price_sd,
      ymax = avg_sale_price_mean + avg_sale_price_sd
    ),
    width = 0.3
  ) +
  labs(
    title = "Average sale price by city (mean ± SD)",
    x = "City",
    y = "Average sale price (USD)"
  ) +
  theme_bw(base_size = 11)

The grouped bar chart refines the comparison by conditioning on both city and year, so each bar represents the mean price within a city–year stratum and each error bar captures dispersion within that stratum. Bryan–College Station shows the highest levels and a positive temporal gradient from 2011 to 2014, consistent with strengthening market conditions in that sub-market. Tyler also displays a steady increase over the sample period. Beaumont appears comparatively stable, with limited evidence of a sustained upward shift. Wichita Falls remains the lowest-priced market throughout, with only modest growth after 2011.

# average sale price by city and year (mean ± SD)
ggplot(summary_by_city_year, 
       aes(x = city, y = avg_sale_price_mean, fill = factor(year))) +
  geom_col(color = "black", position = position_dodge(width = 0.8), width = 0.7) +
  geom_errorbar(
    aes(
      ymin = avg_sale_price_mean - avg_sale_price_sd,
      ymax = avg_sale_price_mean + avg_sale_price_sd
    ),
    position = position_dodge(width = 0.8),
    width = 0.2
  ) +
  labs(
    title = "Average sale price by city and year (mean ± SD)",
    x = "City",
    y = "Average sale price (USD)",
    fill = "Year"
  ) +
  theme_bw(base_size = 11)
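The "positive temporal gradient" can be quantified as the least-squares slope of each city's city–year means on year. The sketch below uses hypothetical cities A and B with made-up means. On the real data, summary_by_city_year would take the place of cy.

```r
# Sketch: per-city linear slope of city-year mean price on year (toy values, USD).
cy <- data.frame(
  city = rep(c("A", "B"), each = 4),
  year = rep(2011:2014, times = 2),
  avg_sale_price_mean = c(140000, 145000, 150000, 160000,   # A: rising
                          120000, 121000, 119000, 120000)   # B: flat
)

# Fit lm(mean ~ year) separately for each city and keep the year coefficient,
# i.e. the average change in the city-year mean per calendar year.
slopes <- sapply(split(cy, cy$city), function(d) {
  unname(coef(lm(avg_sale_price_mean ~ year, data = d))["year"])
})
print(slopes)   # approximately A = 6500, B = -200 (USD per year)
```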

Overall, the figures point to marked cross-sectional heterogeneity in the Texas panel: geographic stratification is a primary source of variation in average transaction values.
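One way to make "primary source of variation" measurable is the correlation ratio η² (eta squared). It is the share of total variance explained by group means, here city means. The sketch below is a generic base-R helper applied to toy values. Applying it to the report's avg_sale_price column is an assumption, since that column is defined in an earlier section.

```r
# Sketch: correlation ratio eta^2 = SS_between / SS_total.
eta_squared <- function(y, group) {
  grand <- mean(y)
  group_means <- ave(y, group)                 # each value replaced by its group mean
  ss_between <- sum((group_means - grand)^2)   # variation explained by the groups
  ss_total   <- sum((y - grand)^2)             # total variation around the grand mean
  ss_between / ss_total
}

# Toy example: two hypothetical cities with clearly different price levels.
price <- c(150, 160, 140, 100, 110, 90)        # thousands of USD
city  <- c("A", "A", "A", "B", "B", "B")
eta <- eta_squared(price, city)
print(round(eta, 3))

# With the real panel (assuming avg_sale_price exists in df):
# eta_squared(df$avg_sale_price, df$city)
```

Values near 1 mean city membership explains most of the variance. Values near 0 mean the city distributions overlap heavily.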

7.2 Comparison of listing effectiveness over years by city

The line chart displays conditional means of listing effectiveness (sales relative to active listings) for each city–year stratum. The series share a common upward shift after an early dip around 2011, consistent with an improvement in the conversion of active listings into sales. The magnitude and timing of that improvement, however, differ across cities:

Bryan–College Station stands out from 2012 onward: effectiveness rises from roughly 0.10 in the early years to about 0.24 by 2014, the highest level in the plot.
Wichita Falls begins with the highest effectiveness in 2010 (near 0.13), declines in 2011, recovers to a local peak around 2013 (≈0.144), and then falls slightly in 2014 (≈0.133).
Beaumont follows a comparatively smooth upward path after 2011, from about 0.09 to roughly 0.135 in 2014, converging toward the levels of Wichita Falls by the end of the period.
Tyler shows a steady, roughly linear improvement, reaching approximately 0.124 in 2014 and narrowing the gap with Beaumont and Wichita Falls.

# listing effectiveness (sales/listings) over time, by city
ggplot(
  summary_by_city_year,
  aes(x = year, y = listing_effectiveness_mean, color = city, group = city)
) +
  geom_line(linewidth = 0.8) +
  geom_point(size = 1.8) +
  labs(
    title = "Listing Effectiveness by year and city",
    x = "Year",
    y = "Listing effectiveness (mean)",
    color = "City"
  ) +
  theme_bw(base_size = 11)
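Listing effectiveness is described here as sales relative to listings. The following sketch shows the city–year aggregation. It assumes the ratio is computed row by row as sales / listings and then averaged; the toy data frame is hypothetical. The sketch also shows how the mean of monthly ratios can differ from a pooled ratio of totals.

```r
# Sketch: listing effectiveness = sales / listings, averaged by city and year.
toy <- data.frame(
  city     = rep(c("A", "B"), each = 4),
  year     = rep(c(2013, 2013, 2014, 2014), times = 2),
  sales    = c(10, 30, 20, 30,    5, 15, 12, 18),
  listings = c(100, 200, 100, 150, 100, 100, 120, 120)
)
toy$listing_effectiveness <- toy$sales / toy$listings

# Mean of the monthly ratios (each month weighted equally)
eff_mean <- aggregate(listing_effectiveness ~ city + year, data = toy, FUN = mean)

# Pooled ratio of totals (busier months weigh more)
totals <- aggregate(cbind(sales, listings) ~ city + year, data = toy, FUN = sum)
totals$pooled_ratio <- totals$sales / totals$listings

print(merge(eff_mean, totals[, c("city", "year", "pooled_ratio")]))
```

For city A in 2013, the mean of ratios is 0.125 while the pooled ratio is 40/300 ≈ 0.133. The choice of definition matters when listings vary strongly within a year.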

The figure supports two main conclusions for the report.

First, city-specific dynamics dominate: Bryan–College Station diverges upward while the other three markets move in a narrower band.

Second, by 2014 there is partial convergence among Beaumont, Tyler and Wichita Falls (effectiveness roughly 0.12–0.14), whereas Bryan–College Station remains an outlier on the high side.
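The convergence claim can be checked with a simple dispersion measure computed year by year across the three non-outlier cities. One option is the range (max minus min) of city-level means, where a shrinking range indicates convergence. The sketch uses hypothetical cities C1 to C3 and made-up values.

```r
# Sketch: cross-city range of mean listing effectiveness, by year (toy values).
eff <- data.frame(
  year = rep(c(2011, 2014), each = 3),
  city = rep(c("C1", "C2", "C3"), times = 2),
  effectiveness_mean = c(0.09, 0.07, 0.12,   # 2011: wide spread
                         0.13, 0.12, 0.14)   # 2014: narrower spread
)
spread_by_year <- tapply(eff$effectiveness_mean, eff$year,
                         function(x) diff(range(x)))
print(spread_by_year)   # range shrinks from about 0.05 to about 0.02
```

A standard deviation across cities (sd instead of diff(range(x))) is a common alternative that is less sensitive to a single extreme city.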

7.3 Seasonal patterns of inventory turnover and listing effectiveness by month and city

The chart reports conditional means of inventory turnover by calendar month within each city, aggregated across years (summary_by_city_month):

Bryan–College Station shows the strongest seasonal swing: turnover falls to about 0.12 around month 4, then rises steadily to roughly 0.18 in December, the highest value on the plot, which is consistent with faster clearing of listings in the second half of the year.
Wichita Falls starts close to Bryan–College Station (≈0.138), drifts down to a trough near 0.12 in month 6, and recovers to about 0.14 by December, ranking second at year-end.
Beaumont and Tyler follow a milder U-shaped pattern, with mid-year lows (Tyler near 0.085 in month 7, the lowest overall) and modest year-end gains (Tyler ≈0.105, Beaumont ≈0.115).

Shared features include a mid-year dip (months 4–7) and a fourth-quarter rise, which supports interpreting turnover as driven by both seasonality and persistent city effects.

# seasonal pattern of inventory turnover by month and city
ggplot(
  summary_by_city_month,
  aes(x = month, y = inventory_turnover_ratio_mean, color = city, group = city)
) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 1.6) +
  scale_x_continuous(breaks = 1:12) +
  labs(
    title = "Inventory turnover ratio by month and city",
    x = "Month",
    y = "Inventory turnover ratio (mean)",
    color = "City"
  ) +
  theme_bw(base_size = 11)
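The ranking of seasonal swings can be made explicit. For each city, compute the months of the lowest and highest monthly mean and their difference, the seasonal amplitude. The sketch below assumes a long table with one row per city and calendar month, like summary_by_city_month. The column name turnover and the two toy cities are hypothetical.

```r
# Sketch: trough month, peak month and amplitude of a monthly profile, per city.
sm <- data.frame(
  city  = rep(c("A", "B"), each = 12),
  month = rep(1:12, times = 2),
  turnover = c(0.15, 0.12, 0.10, 0.11, 0.13, 0.14,   # city A
               0.15, 0.16, 0.17, 0.18, 0.20, 0.19,
               0.09, 0.09, 0.08, 0.08, 0.07, 0.05,   # city B
               0.06, 0.07, 0.08, 0.09, 0.10, 0.11)
)

seasonality <- do.call(rbind, lapply(split(sm, sm$city), function(d) {
  data.frame(
    city         = d$city[1],
    trough_month = d$month[which.min(d$turnover)],
    peak_month   = d$month[which.max(d$turnover)],
    amplitude    = max(d$turnover) - min(d$turnover)
  )
}))
print(seasonality)
```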

The second figure plots conditional means of listing effectiveness by month and city. Seasonality is again evident, with most cities showing higher absorption in spring–summer and lower values in late autumn, then a small December uptick.

Bryan–College Station exhibits the largest amplitude: effectiveness rises from about 0.09 early in the year to a peak near 0.22 in July, then falls sharply through August–September and stabilizes around 0.12–0.14. That pattern indicates a concentrated summer selling season in that market.

Wichita Falls peaks earlier, around 0.15 in April, with a secondary high near 0.14 in August, and ends the year near 0.12.

Beaumont fluctuates between roughly 0.08 and 0.13, with a high near 0.13 in August.

Tyler stays below the other cities all year (about 0.07–0.11), with a gentle peak near 0.11 in June and a November low near 0.08.

# seasonal pattern of listing effectiveness by month and city
ggplot(
  summary_by_city_month,
  aes(x = month, y = listing_effectiveness_mean, color = city, group = city)
) +
  geom_line(linewidth = 0.9) +
  geom_point(size = 1.6) +
  scale_x_continuous(breaks = 1:12) +
  labs(
    title = "Seasonal pattern of listing effectiveness by month and city",
    x = "Month",
    y = "Listing effectiveness (mean)",
    color = "City"
  ) +
  theme_bw(base_size = 11)
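Because the four cities sit at different effectiveness levels, their absolute swings are hard to compare. A seasonal index divides each month's mean by that city's overall mean. This puts every city on a common scale where 1 marks an average month. The following is a minimal sketch with two hypothetical cities and four sample months.

```r
# Sketch: relative seasonal index = monthly mean / city-wide mean (toy values).
le <- data.frame(
  city  = rep(c("A", "B"), each = 4),
  month = rep(c(1, 4, 7, 10), times = 2),
  effectiveness = c(0.10, 0.15, 0.20, 0.15,   # A: higher level, larger swing
                    0.08, 0.10, 0.12, 0.10)   # B: lower level, smaller swing
)
le$seasonal_index <- le$effectiveness / ave(le$effectiveness, le$city)

# Peak index per city: about 1.33 for A versus 1.2 for B
peak_index <- tapply(le$seasonal_index, le$city, max)
print(round(peak_index, 2))
```

An index well above 1 in a few months flags a concentrated selling season, independently of the city's overall level.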

8. Creating visualizations with ggplot2

In this section, customized graphics are produced using ggplot2 to support comparative analysis of the Texas real estate data. The visualizations address three objectives:

compare the distribution of median sale price across cities (boxplots);
compare total sales by month and city (bar charts);
examine sales dynamics over different historical periods (line charts).

Together, these figures facilitate assessment of cross-sectional heterogeneity, seasonal patterns, and temporal trends in market activity.

8.1 Distribution of median sale price by city

The boxplots summarize the empirical distribution of monthly median sale prices within each city (all years pooled). The median line and interquartile range (IQR) describe the central level and spread of typical prices; whiskers and points indicate the remaining range and upper-tail outliers.

Bryan–College Station shows the highest price level (mean ≈ $157,000), followed by Tyler (≈ $141,000), Beaumont (≈ $130,000), and Wichita Falls (≈ $102,000). The gaps between cities indicate strong cross-sectional heterogeneity in price levels.

Mean and IQR of median sale price by city
City	Mean (USD)	IQR (USD)
Beaumont	129988.3	11525
Bryan-College Station	157488.3	11175
Tyler	141441.7	13700
Wichita Falls	101743.3	16375

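In absolute terms, box heights (IQR) are similar across cities (about $11,200–$16,400), with Wichita Falls and Tyler slightly wider. Relative to each city's mean, however, Wichita Falls is noticeably more dispersed (IQR ≈ 16% of the mean, versus roughly 7–10% elsewhere). Even so, the gap between the highest and lowest city means (about $55,700) is several times any within-city IQR, so differences between cities are driven mainly by level shifts rather than by market volatility.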

ggplot(df, aes(x = city, y = median_price, fill = city)) +
  geom_boxplot(width = 0.6, color = "black") +
  labs(
    title = "Distribution of median sale price by city",
    x = "City",
    y = "Median sale price (USD)"
  ) +
  theme_bw(base_size = 11) +
  theme(legend.position = "none")
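The relative-spread comparison can be reproduced from the mean and IQR table above. The only inputs are the reported values. The ratio IQR / mean is a simple scale-free measure chosen here for illustration.

```r
# IQR relative to the mean, using the values in the "Mean and IQR" table.
price_spread <- data.frame(
  city     = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  mean_usd = c(129988.3, 157488.3, 141441.7, 101743.3),
  iqr_usd  = c(11525, 11175, 13700, 16375)
)
price_spread$iqr_to_mean <- round(price_spread$iqr_usd / price_spread$mean_usd, 3)
print(price_spread)
# iqr_to_mean: 0.089, 0.071, 0.097, 0.161

# With the raw panel, a similar table could be rebuilt from df, for example:
# aggregate(median_price ~ city, data = df,
#           FUN = function(x) c(mean = mean(x), iqr = IQR(x)))
# IQR() uses quantile type 7 by default, which may differ slightly from the
# method behind the reported table.
```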

Disaggregating by year refines the pooled comparison and highlights temporal dynamics within each market.

The ordering by price level is preserved in most years: Bryan–College Station remains highest and Wichita Falls lowest, with Tyler and Beaumont in between. By 2014, Tyler's distribution has shifted upward and overtaken Beaumont's. Tyler also shows the most regular year-over-year rise in median level, consistent with sustained local price growth.

ggplot(df, aes(x = city, y = median_price, fill = factor(year))) +
  geom_boxplot(
    position = position_dodge(width = 0.8),
    width = 0.7,
    color = "black"
  ) +
  labs(
    title = "Distribution of median sale price by city and year",
    x = "City",
    y = "Median sale price (USD)",
    fill = "Year"
  ) +
  theme_bw(base_size = 11)
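The claim of a regular year-over-year rise can be tested directly. Check whether a city's yearly median of median_price increases in every consecutive year. The following is a base-R sketch on toy data, with hypothetical cities A and B and values in thousands of USD.

```r
# Sketch: does the yearly median price rise in every consecutive year? (toy data)
toy <- data.frame(
  city = rep(c("A", "B"), each = 6),
  year = rep(rep(2012:2014, each = 2), times = 2),
  median_price = c(100, 104, 106, 110, 112, 118,   # A: rises every year
                   120, 122, 119, 121, 123, 125)   # B: dips in 2013
)
yearly <- aggregate(median_price ~ city + year, data = toy, FUN = median)

is_monotone <- sapply(split(yearly, yearly$city), function(d) {
  d <- d[order(d$year), ]
  all(diff(d$median_price) > 0)
})
print(is_monotone)   # A TRUE, B FALSE
```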

8.2 Total sales by month and city

The stacked bar chart aggregates sales counts by calendar month and city, pooling observations across all years in the panel.

Total market activity shows a pronounced seasonal cycle. Sales counts are lowest in January (2548 sales), rise through spring, and peak in June (4871 sales). They then decline through autumn to a secondary low in November (3137), with a modest recovery in December. Overall, the selling season is weighted toward summer.

ggplot(total_sales_month_city, aes(x = factor(month), y = total_sales, fill = city)) +
  geom_col(position = "stack", width = 0.7, color = "black") +
  scale_x_discrete(breaks = 1:12) +
  labs(title = "Total sales by month and city (absolute)", x = "Month", y = "Total sales (count)", fill = "City") +
  theme_bw(base_size = 11)
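The chart relies on total_sales_month_city, which is built earlier in the report. Below is a sketch of that aggregation on toy data, with two hypothetical cities and one year of made-up counts. It also looks up the peak and trough months of the pooled total.

```r
# Sketch: monthly totals by city, then peak and trough of the pooled total.
toy <- data.frame(
  city  = rep(c("A", "B"), each = 12),
  month = rep(1:12, times = 2),
  sales = c(10, 12, 15, 18, 20, 25, 22, 20, 16, 14, 11, 13,
             5,  6,  8,  9, 10, 12, 11, 10,  8,  7,  6,  7)
)

# Equivalent of total_sales_month_city: one row per month and city
toy_month_city <- aggregate(sales ~ month + city, data = toy, FUN = sum)

# Pooled monthly total across cities, then its peak and trough months
monthly_total <- tapply(toy_month_city$sales, toy_month_city$month, sum)
peak_month   <- as.integer(names(which.max(monthly_total)))
trough_month <- as.integer(names(which.min(monthly_total)))
cat("peak:", peak_month, "trough:", trough_month, "\n")   # peak: 6 trough: 1
```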

The normalized chart displays each city's share of that month's aggregate sales, and relative composition is fairly stable:

- Tyler accounts for about 33–39% of monthly sales in every month.
- Beaumont accounts for 20–26%.
- Bryan–College Station accounts for 21–28% outside the summer.
- Wichita Falls accounts for 13–17%.

The main shift is seasonal. Bryan–College Station's share rises to 32–33% in months 5–7, offset by small declines for Tyler, Beaumont, and Wichita Falls.

This view supports the conclusion that geographic structure in the four-city panel is persistent. Apart from the summer gain for Bryan–College Station, there is only limited evidence of month-specific reallocation among cities.

Share of monthly sales by city (all years pooled)
Month	Beaumont	Bryan-College Station	Tyler	Wichita Falls
1	0.24	0.23	0.36	0.17
2	0.24	0.22	0.38	0.16
3	0.23	0.25	0.35	0.17
4	0.22	0.28	0.34	0.16
5	0.22	0.32	0.33	0.14
6	0.21	0.33	0.34	0.13
7	0.20	0.32	0.34	0.14
8	0.23	0.28	0.34	0.15
9	0.24	0.22	0.39	0.16
10	0.26	0.21	0.38	0.15
11	0.25	0.23	0.36	0.16
12	0.26	0.23	0.36	0.15

ggplot(total_sales_month_city) +
  geom_col(aes(x = month, y = total_sales, fill = factor(city)), position = "fill") +
  scale_x_continuous(breaks = 1:12) +
  # position = "fill" rescales each bar to 1, so the y-axis shows shares, not counts
  scale_y_continuous(labels = scales::percent) +
  labs(title = "Total sales by month and city (normalized)", x = "Month", y = "Share of monthly sales", fill = "City") +
  theme_bw(base_size = 11)
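position = "fill" rescales each month's bar so that the city segments sum to one. The same shares can also be computed explicitly, which is one way to obtain a table like the monthly-share table above. Below is a sketch on toy totals, with hypothetical cities A to C.

```r
# Sketch: each city's share of its month's total sales (toy totals).
tot <- data.frame(
  month = rep(1:2, each = 3),
  city  = rep(c("A", "B", "C"), times = 2),
  total_sales = c(50, 30, 20,  40, 40, 20)
)
# Divide each count by the sum of its month (ave() broadcasts the group sum)
tot$share <- tot$total_sales / ave(tot$total_sales, tot$month, FUN = sum)
print(tot)   # month 1: 0.5, 0.3, 0.2; month 2: 0.4, 0.4, 0.2
```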

The chart below displays absolute monthly sales counts stacked by city within each calendar year. Each bar's height is the total number of transactions in that month–year, and segment heights show each city's contribution in units.

Total monthly sales generally increase over the sample period, and the city segments grow in parallel in many months. The rise therefore reflects broad volume gains rather than a single city driving the entire increase.

In every year, sales tend to be lower in the early months (especially January), peak in mid-summer (June–August), and then ease toward year-end. This pattern matches the pooled monthly stacked chart: summer is the dominant selling season in these markets.

# total sales per city and month and years
total_sales_year_month_city <- df %>%
  group_by(city, year, month) %>%
  summarise(total_sales = sum(sales, na.rm = TRUE), .groups = "drop")

ggplot(total_sales_year_month_city) +
  geom_bar(aes(x = month, y = total_sales, fill = factor(city)), stat = "identity", position = "stack") +
  scale_x_continuous(breaks = 1:12) +
  facet_wrap(~ year, ncol = 2) +
  labs(title = "Total sales by year, month and city", x = "Month", y = "Total sales (count)", fill = "City") +
  theme_bw(base_size = 11)
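The "parallel growth" reading can be checked numerically by computing the year-over-year growth of annual sales per city. Similar growth rates across cities support broad-based gains. Below is a sketch with hypothetical cities A and B and made-up annual totals.

```r
# Sketch: year-over-year growth of annual sales, per city (toy totals).
toy <- data.frame(
  city  = rep(c("A", "B"), times = 3),
  year  = rep(2012:2014, each = 2),
  total_sales = c(100, 50, 110, 55, 121, 60.5)
)
# Matrix of annual totals: rows = cities, columns = years
annual <- tapply(toy$total_sales, list(toy$city, toy$year), sum)

# Growth relative to the previous year, per city
growth <- annual[, -1] / annual[, -ncol(annual)] - 1
print(round(growth, 3))   # 10% per year for both cities
```

With the real data, total_sales_year_month_city could be summed to annual totals in the same way.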

8.3 Sales dynamics over historical periods

The city-specific line charts (month on the x-axis, one line per year, 2010–2014) show strong seasonality and generally rising sales over the period, but with clear cross-city differences.

Tyler, Beaumont, and Bryan–College Station all record higher volumes in the last two years (2013–2014), with strong counts in spring–summer and lower counts in winter. Bryan–College Station has the most regular pattern (a sharp June–July peak followed by a fast drop) and the highest peak levels.

# plot monthly sales for one city with one line per calendar year.
plot_sales_by_city_period <- function(city_name) {
  df %>%
    # case insensitive comparison
    filter(str_to_upper(city) == str_to_upper(city_name)) %>%
    # one line for each year
    ggplot(aes(x = month, y = sales, color = factor(year), group = year)) +
    geom_line(linewidth = 0.9) +
    geom_point(size = 1.5) +
    scale_x_continuous(breaks = 1:12) +
    labs(
      title = paste("Seasonal sales by year —", city_name),
      x = "Month",
      y = "Sales (count)",
      color = "Year"
    ) +
    theme_bw(base_size = 11)
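}

# draw one seasonal chart per city
for (ct in unique(df$city)) print(plot_sales_by_city_period(ct))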

Wichita Falls is smaller in scale and less predictable: the year lines overlap, peaks shift across months, and there is no clear upward trend over 2010–2014, making it the most volatile market in the report.
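The claim that peaks shift across months can be quantified. Find the peak month of each city–year series and measure how far it moves across years, here via the range of peak months. Below is a base-R sketch. The toy series and the make_year helper are hypothetical.

```r
# Hypothetical helper: a flat year of 100 sales with a single peak month at 150
make_year <- function(peak) {
  s <- rep(100, 12)
  s[peak] <- 150
  s
}
toy <- data.frame(
  city  = rep(c("A", "B"), each = 36),
  year  = rep(rep(2012:2014, each = 12), times = 2),
  month = rep(1:12, times = 6),
  sales = c(make_year(6), make_year(6), make_year(6),   # A: always June
            make_year(3), make_year(8), make_year(5))   # B: peak moves
)

# Peak month for every city-year combination
peaks <- do.call(rbind, lapply(split(toy, list(toy$city, toy$year)), function(d) {
  data.frame(city = d$city[1], year = d$year[1],
             peak_month = d$month[which.max(d$sales)])
}))

# Spread of the peak month across years, per city (0 = perfectly regular)
peak_shift <- tapply(peaks$peak_month, peaks$city, function(m) diff(range(m)))
print(peak_shift)   # A 0, B 5
```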

The figure plots monthly sales against period (2010–2014), faceted by city with free y-scales so each panel shows local level and variation. Observed series (blue) are overlaid with a linear trend (red dashed).

Beaumont shows an upward trend with clear seasonal oscillation.

Bryan–College Station combines a steep positive trend with regular, amplifying seasonality. Peak months rise markedly in later years (above 400 units) while troughs remain near 100, so this market exhibits both the strongest growth and the largest seasonal swings.

Tyler displays a steady upward trend and consistent seasonal cycles, with a higher baseline than Beaumont and Wichita Falls (exceeding 400 sales at peak in 2014). Tyler is the largest market overall, with the highest share of monthly sales in every calendar month.

For Wichita Falls the linear trend is essentially flat, implying little net growth over the period, and the seasonal pattern is less smooth than in the other three cities.

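# create the monthly period variable (first day of each month) if it was not built earlier
if (!"period" %in% names(df)) {
  df$period <- as.Date(sprintf("%d-%02d-01", df$year, df$month))
}

ggplot(df, aes(x = period, y = sales)) +
  geom_line(linewidth = 0.8, color = "#2c7fb8") +
  geom_point(size = 1.2, color = "#2c7fb8") +
  # linear trend per facet; formula = y ~ x is the lm default, stated to silence the message
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "red",
              linewidth = 0.8, linetype = "dashed") +
  facet_wrap(~ city, ncol = 2, scales = "free_y") +
  labs(
    title = "Sales dynamics over time by city",
    x = NULL,
    y = "Sales (count)"
  ) +
  theme_bw(base_size = 11)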
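geom_smooth(method = "lm") draws the trend but does not report it. The sketch below fits the same kind of linear trend per city and returns the slope in sales per year. It builds a decimal-year index t from year and month alongside the first-of-month period. The decimal-year index is an assumption, chosen so the slope reads in units per year. The series are toy data for hypothetical cities A and B.

```r
# Sketch: linear trend slope per city, in sales per year (toy series).
toy <- expand.grid(month = 1:12, year = 2012:2014, city = c("A", "B"))
toy$period <- as.Date(sprintf("%d-%02d-01", toy$year, toy$month))  # first day of month
toy$t <- toy$year + (toy$month - 1) / 12                             # decimal year

# A: pure linear growth of 10 sales per year; B: flat level with a symmetric seasonal bump
season_b <- c(-10, 0, 0, 0, 0, 10, 10, 0, 0, 0, 0, -10)
toy$sales <- ifelse(toy$city == "A",
                    100 + 10 * (toy$t - 2012),
                    50 + season_b[toy$month])

# Slope of lm(sales ~ t) per city = average change in monthly sales per year
trend_per_year <- sapply(split(toy, toy$city), function(d) {
  unname(coef(lm(sales ~ t, data = d))["t"])
})
print(round(trend_per_year, 2))   # A 10, B 0
```

Using period directly in lm() would also work, but the slope would then be expressed per day.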

9. Conclusions

This report presents a descriptive statistical analysis of the Texas real estate panel in realestate_texas.csv with monthly market indicators for four cities (Beaumont, Bryan–College Station, Tyler, Wichita Falls) over 2010–2014 (N = 240 city–month observations).

Pooled descriptive statistics show that volume has the greatest relative dispersion (CV ≈ 0.54) and the strongest right skewness (≈ 0.88), while median_price is comparatively stable (CV ≈ 0.17):

volume is therefore the most heterogeneous and asymmetric variable
median_price is the most suitable for central-tendency comparisons

Classifying median_price into six intervals yields substantial heterogeneity across bands (Gini G = 0.7478, normalized G = 0.8973). Most observations nevertheless fall in the mid-to-upper segments ([120,000–140,000) and [140,000–160,000), ≈ 64% combined), with negligible mass in the lowest band.
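The two Gini values are consistent with the usual heterogeneity index for categorical data. This index is G = 1 − Σ f_i² over the k class relative frequencies, normalized by its maximum (k − 1)/k. With k = 6, 0.7478 × 6/5 ≈ 0.8974, matching the reported 0.8973 up to rounding. Assuming that definition, a small helper:

```r
# Gini heterogeneity index for a categorical (or binned) variable.
# ASSUMPTION: this is the definition behind the reported G values.
gini_heterogeneity <- function(counts) {
  f <- counts / sum(counts)              # relative frequencies of the k classes
  k <- length(counts)
  G <- 1 - sum(f^2)                      # 0 = all mass in one class
  c(G = G, G_norm = G / ((k - 1) / k))   # 1 = mass spread evenly over k classes
}

print(gini_heterogeneity(rep(10, 6)))            # maximal heterogeneity: G_norm = 1
print(gini_heterogeneity(c(60, 0, 0, 0, 0, 0)))  # all in one band: G = 0
# With the report's classes, e.g. gini_heterogeneity(table(price_class)),
# where price_class is a hypothetical binned median_price factor.
```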

Conditional summaries by city, year, and month confirm that geography, calendar year, and season jointly shape outcomes:

city effects dominate price levels
year effects capture growth in several markets
month effects capture recurring summer peaks and winter troughs in sales

Graphically, boxplots of median_price reveal a stable city ranking (Bryan–College Station and Tyler highest, Wichita Falls lowest), with appreciation in 2013–2014 for the upper-tier cities.

Stacked and normalized sales charts show strong seasonality in aggregate volume (peak around June) but stable city shares across months and years.

Line charts of sales over time and by year within each city indicate positive trends in Beaumont, Bryan–College Station, and Tyler, with pronounced and regular seasonality in the larger markets. Wichita Falls, by contrast, shows stagnant dynamics.

Overall, the Texas submarkets in this sample are characterized by:

persistent cross-city heterogeneity in prices and absorption
similar seasonal cycles in transaction sales
moderate upward drift in activity and prices in three of four cities over 2010–2014
concentration of median prices in middle-to-upper brackets

Wichita Falls behaves as a smaller and less trending market.

Texas Real Estate Market Analysis

Texas Realty Insights — Statistical Report

Francesco Meloni

2026-04-12