| city | year | month | sales | volume | median_price | listings | months_inventory |
|---|---|---|---|---|---|---|---|
| Beaumont | 2010 | 1 | 83 | 14.162 | 163800 | 1533 | 9.5 |
| Beaumont | 2010 | 2 | 108 | 17.690 | 138200 | 1586 | 10.0 |
| Beaumont | 2010 | 3 | 182 | 28.701 | 122400 | 1689 | 10.6 |
| Beaumont | 2010 | 4 | 200 | 26.819 | 123200 | 1708 | 10.6 |
| Beaumont | 2010 | 5 | 202 | 28.833 | 123100 | 1771 | 10.9 |
The imported dataset realestate_texas consists of eight variables, covering both qualitative and quantitative data types. Understanding the nature of these variables is crucial for determining appropriate statistical and analytical approaches.
## city
## Beaumont Bryan-College Station Tyler
## 60 60 60
## Wichita Falls
## 60
Since city is a qualitative nominal variable, analyzing its frequency distribution provides insights into how the dataset is structured. The frequency table shows that each city (Beaumont, Bryan-College Station, Tyler, and Wichita Falls) appears exactly 60 times. This indicates that the dataset is evenly distributed across cities, meaning no single city is overrepresented or underrepresented. As a result, the distribution has no single mode: all four categories share the same frequency. This equal representation across locations allows for comparative analysis among cities.
## year
## 2010 2011 2012 2013 2014
## 48 48 48 48 48
The year variable represents the time period over which sales data were collected. The dataset spans five years, covering the period from January 2010 to December 2014. The frequency distribution shows that each year contains exactly 48 observations, confirming that the data collection was evenly distributed over time. Given that the dataset covers four cities, each city contributes 12 records per year (one per month), allowing for consistent year-over-year comparisons and reliable trend analysis.
## month
## 1 2 3 4 5 6 7 8 9 10 11 12
## 20 20 20 20 20 20 20 20 20 20 20 20
Examining the frequency distribution of the month variable confirms the structured nature of the dataset. Each month appears exactly 20 times; divided across the four cities, this gives five records per month for each city, one for every year from 2010 to 2014.
Furthermore, this pattern aligns with the yearly distribution, reinforcing that data were systematically recorded every month from January 2010 to December 2014 without gaps.
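The balanced structure described above can be checked mechanically. The sketch below does not load the original file; it rebuilds a grid with the same shape (4 cities × 5 years × 12 months), so the object name `panel` is an assumption used only for illustration.

```r
# Rebuild the design of the dataset: one row per city, year and month.
# `panel` is a hypothetical stand-in for realestate_texas (identifier columns only).
panel <- expand.grid(
  month = 1:12,
  year  = 2010:2014,
  city  = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  stringsAsFactors = FALSE
)

# Marginal counts: 60 rows per city, 48 per year, 20 per month
city_counts  <- table(panel$city)
year_counts  <- table(panel$year)
month_counts <- table(panel$month)

# A complete panel has exactly one row in every city-year-month cell
cell_counts <- table(panel$city, panel$year, panel$month)
is_balanced <- all(cell_counts == 1)
```

Running the same three `table()` calls and the cell check on the real data frame would confirm that no city-month is missing or duplicated.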
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 79.0 127.0 175.5 192.3 247.0 423.0
The minimum value of sales is 79, the maximum is 423. The mean (192.3) is higher than the median (175.5), which suggests a right-skewed (positively skewed) distribution. This implies that while most values are concentrated in the lower range, there are some larger values (potential outliers) pulling the mean upward.
To further confirm the findings highlighted by the measures of position, an analysis of variability is essential. This allows us to understand the spread and fluctuations in the sales data.
| | Range | IQR | Variance | Sd | CV |
|---|---|---|---|---|---|
| Sales | 344 | 120 | 6344.3 | 79.65 | 41.42 |
Range: 344
The difference between the maximum and minimum sales values shows substantial spread in the dataset, reinforcing the observation that sales are highly variable across cities and months.
Interquartile Range: 120
The difference between the 75th percentile (Q3) and the 25th percentile (Q1), which captures the middle 50% of the data, is much smaller than the range. This could indicate that extreme values are pulling the range outward.
Variance: 6344.3
The variance measures how much individual sales values deviate from the mean. A large variance (6344.3) suggests a wide spread of data points, indicating high variability in the number of properties sold.
Standard Deviation: 79.65
The standard deviation (79.65) measures the typical deviation of sales from the mean. Monthly sales differ from the mean by roughly 80 properties, which again shows the significant fluctuations in sales activity.
Coefficient of Variation (CV): 41.42%
The Coefficient of Variation (CV) is calculated as the ratio of the standard deviation to the mean, expressed as a percentage. With a CV of 41.42%, we can conclude that there is high relative variability in sales, meaning that sales fluctuate significantly across different locations (cities) and periods (months and years). This suggests that market conditions are highly volatile and may be influenced by external factors.
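The five measures discussed above can be gathered in one helper. This is a minimal sketch: the function name `variability` is hypothetical, and the input is only the five Beaumont sales values visible in the preview table, not the full 240-row column.

```r
# Spread measures for a numeric vector; CV is reported as a percentage.
variability <- function(x) {
  c(
    range    = max(x) - min(x),
    iqr      = IQR(x),
    variance = var(x),                 # sample variance (n - 1 denominator)
    sd       = sd(x),
    cv       = sd(x) / mean(x) * 100   # standard deviation relative to the mean
  )
}

# Illustration only: the five Beaumont sales values shown in the preview table
sales_preview  <- c(83, 108, 182, 200, 202)
spread_preview <- variability(sales_preview)
round(spread_preview, 2)

# The CV reported for the full sales column follows from the reported sd and mean
cv_sales <- 79.65 / 192.3 * 100
```

Applying `variability()` to each quantitative column of the real data frame should yield the rows of the variability tables in this report.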
Conclusion: The analysis of both measures of position and measures of variability confirms the substantial variability in the real estate market. The right-skewed distribution, large range, and high coefficient of variation all point to a dynamic market where the number of properties sold can vary greatly across different locations and time periods.
To strengthen these findings, a graphical representation and an in-depth analysis of skewness and kurtosis are essential.
| | Skewness | Kurtosis |
|---|---|---|
| Sales | 0.72 | -0.31 |
Skewness: 0.72
As expected, the skewness is positive (right skew). The boxplot confirms that the median is closer to the first quartile than to the third.
Kurtosis: -0.31
Kurtosis describes the shape of the frequency distribution, in particular how peaked it is and how heavy its tails are relative to a normal distribution. The value of -0.31 indicates slightly platykurtic behavior, meaning that the curve has a lower, flatter peak than the normal distribution.
Boxplot: Looking at the boxplot, it is possible to notice that there are no outliers. However, the length of the upper whisker, which is noticeably longer than the lower whisker, implies a right-skewed distribution. This suggests that while most values are concentrated in the lower range, there are some higher values extending the distribution without being classified as outliers.
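The skewness and kurtosis values can be computed from central moments. The report does not show which function produced them, so the sketch below is an assumption: it uses population moments and reports excess kurtosis (normal distribution = 0), which matches the sign convention used in the interpretation. The function names are hypothetical.

```r
# Moment-based shape measures (population moments, denominator n).
# Assumption: the reported kurtosis is excess kurtosis (normal distribution = 0).
skewness_moment <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# A symmetric, flat sample: zero skewness and negative (platykurtic) kurtosis
flat <- 1:5
skewness_moment(flat)
excess_kurtosis(flat)

# A sample with a long right tail has positive skewness
right_tailed <- c(1, 1, 2, 2, 3, 10)
skewness_moment(right_tailed)
```

Negative excess kurtosis corresponds to a flatter curve than the normal, positive to a more peaked one.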
summary(volume)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.166 17.660 27.062 31.005 40.893 83.547
The minimum value for volume is 8.166 million dollars, the maximum is 83.547, whereas the mean is 31.005 million dollars. Median (27.062) is slightly lower than the mean (31.005). This suggests a right (positive) skew, meaning there are higher values (outliers) pulling the mean upward.
| | Range | IQR | Variance | Sd | CV |
|---|---|---|---|---|---|
| Volume | 75.38 | 23.23 | 277.27 | 16.65 | 53.71 |
Range: 75.38
The difference between the maximum (83.547) and the minimum (8.166) is 75.38 million dollars, which at first sight seems large.
Interquartile Range: 23.23
The interquartile range is much smaller than the range. This suggests that while the middle 50% of the data is relatively concentrated, some extreme values or high variability exist in the dataset.
Variance: 277.27
A variance of 277.27 is relatively high, indicating a substantial spread.
Standard Deviation: 16.65
The standard deviation measures the typical deviation of volume from the mean. Monthly sales volumes differ from the mean by roughly 16.65 million dollars.
Coefficient of Variation: 53.71%
A Coefficient of Variation above 50% signals high relative variability, meaning sales amounts are inconsistent. This suggests potential market volatility, with some regions/times experiencing significantly higher/lower sales.
To confirm the evidence highlighted by the measures of variability, a graphical representation and an in-depth analysis of skewness and kurtosis are essential.
| | Skewness | Kurtosis |
|---|---|---|
| Volume | 0.88 | 0.18 |
Skewness: 0.88
The value, higher than 0, confirms a positive skewness (right skew). The boxplot confirms that the median is closer to the first quartile than to the third.
Kurtosis: 0.18
A value of 0.18 indicates a slightly leptokurtic distribution, meaning that the curve has a higher peak than the normal distribution, with somewhat more concentration of observations near the central value.
Boxplot:
The boxplot confirms the presence of outliers, as there are four values that exceed the black line, which represents the upper threshold (Q3 + 1.5 × IQR). The upper whisker is noticeably longer than the lower whisker, indicating a right-skewed distribution. This suggests that while most values are concentrated in the lower range, some higher values extend the distribution, with a few classified as outliers.
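The outlier threshold mentioned above can be reproduced from the quartiles printed by `summary()`. The sketch below is an approximation: `boxplot()` works with hinges, which can differ slightly from these quartiles, and the helper name `tukey_fences` is hypothetical.

```r
# Tukey fences from the quartiles printed by summary()
tukey_fences <- function(q1, q3) {
  iqr <- q3 - q1
  c(lower = q1 - 1.5 * iqr, upper = q3 + 1.5 * iqr)
}

# volume: the maximum (83.547) lies above the upper fence, so outliers exist
volume_fences <- tukey_fences(q1 = 17.660, q3 = 40.893)
volume_has_upper_outliers <- 83.547 > volume_fences[["upper"]]

# sales: the maximum (423) lies below the upper fence, so no point is flagged
sales_fences <- tukey_fences(q1 = 127, q3 = 247)
sales_has_upper_outliers <- 423 > sales_fences[["upper"]]
```

This agrees with the two boxplots: volume has values beyond the upper threshold, while sales has none.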
summary(median_price)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 73800 117300 134500 132665 150050 180000
The minimum value is 73800 dollars, while the maximum is 180000. Median (134500) is slightly higher than the Mean (132665). This suggests a slight left (negative) skew, meaning there may be some lower values pulling the mean down.
| | Range | IQR | Variance | Sd | CV |
|---|---|---|---|---|---|
| Median Price | 106200 | 32750 | 513572983 | 22662.15 | 17.08 |
Range: 106200
The range, namely the difference between the maximum (180000) and the minimum (73800), indicates a wide spread in the data.
Interquartile range: 32750
The interquartile range is much smaller than the range. This suggests that while the middle 50% of the data is relatively concentrated, some extreme values or high variability exist in the dataset.
Variance: 513572983
A variance of 513572983 is relatively high, indicating a substantial spread.
Standard Deviation: 22662.15
The typical deviation from the mean price is approximately $22,662, indicating a noticeable spread.
Coefficient of Variation: 17.08%
A Coefficient of Variation lower than 20% indicates that the variability is moderate, meaning median home prices do not fluctuate excessively relative to their mean.
| | Skewness | Kurtosis |
|---|---|---|
| Median Price | -0.36 | -0.62 |
Skewness: -0.36
A skewness value of -0.36 indicates a negative skewness (left-skewed).
Kurtosis: -0.62
A value of -0.62 indicates a platykurtic distribution, that is, a curve with a lower peak than the normal curve and less concentration of observations around the central value.
Boxplot:
The boxplot shows that there are no outliers. It also confirms that the data are slightly left-skewed (negative skew): the lower whisker is longer and the median sits slightly closer to Q3. Even so, the median is almost centered within the box, indicating that the middle 50% of the data is quite symmetrically distributed. Looking at the whiskers, the data are more spread out below the first quartile, indicating mild left-tail skewness. This pattern implies that while the core data is quite balanced, there are a few lower values causing greater dispersion at the bottom. These values are not extreme enough to be classified as outliers.
summary(listings)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 743 1026 1618 1738 2056 3296
The minimum number of listings is 743, whereas the maximum is 3,296. The mean (1,738) is higher than the median (1,618). This suggests a right (positive) skew, meaning there are higher values (possible outliers) pulling the mean upward. The gap between Q3 (2,056) and the maximum (3,296) is large, indicating potential outliers on the higher end.
| | Range | IQR | Variance | Sd | CV |
|---|---|---|---|---|---|
| Listings | 2553 | 1029.5 | 566569 | 752.71 | 43.31 |
Range: 2553
The range, the difference between the maximum value (3,296) and the minimum value (743), is 2,553, indicating a wide spread in the data.
Interquartile Range: 1029.5
The interquartile range is much smaller than the range. This suggests that while the middle 50% of the data is relatively concentrated, some extreme values or high variability exist in the dataset.
Variance: 566569
A variance of 566,569 indicates a significant dispersion in the number of listings.
Standard Deviation: 752.71
A standard deviation of 752.71 means that the number of active listings typically deviates by about 753 listings from the mean.
Coefficient of Variation: 43.31%
A coefficient of variation greater than 40% indicates high relative variability. The number of active listings fluctuates significantly, suggesting market instability or differences across cities.
| | Skewness | Kurtosis |
|---|---|---|
| Listings | 0.65 | -0.79 |
Skewness: 0.65
A skewness value higher than 0 indicates a positive skewness (right skew).
Kurtosis: -0.79
A kurtosis value of -0.79 indicates a platykurtic distribution, that is, a curve with a lower peak than the normal curve and less concentration of observations around the central value.
Boxplot:
The median is almost centered within the box, indicating that the middle 50% of the data is quite symmetrically distributed. Looking at the whiskers, the data are more spread out above the third quartile, which indicates a mild right-tail skewness. This pattern implies that while the core data is quite balanced, there are a few higher values causing greater dispersion at the top. However, these values are not extreme enough to be classified as outliers.
summary(months_inventory)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.400 7.800 8.950 9.193 10.950 14.900
The minimum value of months inventory is 3.400, the maximum is 14.900. The mean (9.193) is slightly higher than the median (8.950). This suggests a mild right (positive) skew, with some higher values pulling the mean upward.
| | Range | IQR | Variance | Sd | CV |
|---|---|---|---|---|---|
| Months Inventory | 11.5 | 3.15 | 5.31 | 2.3 | 25.06 |
Range: 11.5
A range of 11.5 months represents a wide spread in the data.
Interquartile range: 3.15
The interquartile range is lower than the range. This suggests that while the middle 50% of the data is relatively concentrated, some extreme values or high variability exist in the dataset.
Variance: 5.31
A variance of 5.31 is relatively low, indicating a moderate spread.
Standard Deviation: 2.3
A standard deviation of 2.30 indicates that the months required to sell the listings typically deviate by about 2.30 months from the mean.
Coefficient of Variation: 25.06%
A coefficient of variation between 20% and 30% means that the relative variability is moderate. This suggests that inventory levels are somewhat stable, though regional or seasonal factors may still influence fluctuations.
## [1] 0.04
## [1] -0.17
Skewness: 0.04
A skewness value of 0.04 indicates an almost symmetric distribution, since it is very close to zero.
Kurtosis: -0.17
A kurtosis value of -0.17 indicates an almost mesokurtic distribution, meaning that the peak of the curve is close to that of the normal curve.
Boxplot:
The median is not centered in the box; it lies nearer the first quartile, indicating that the middle 50% of the data is not symmetrically distributed. The whiskers of equal length imply that extreme values are evenly distributed at both ends, even if the density of observations is greater in the lower range.
| | Sales | Volume | Listings | Median Price | Months Inventory |
|---|---|---|---|---|---|
| CV | 41.42 | 53.71 | 43.31 | 17.08 | 25.06 |
| Skewness | 0.72 | 0.88 | 0.65 | -0.36 | 0.04 |
Comparing the five quantitative variables, Volume has the highest relative variability (CV: 53.71), indicating significant fluctuations in total sales value. Listings (CV: 43.31) and Sales (CV: 41.42) also show substantial variation, reflecting changes in market activity. In contrast, Median Price (CV: 17.08) is the most stable, with relatively minor fluctuations. Months Inventory (CV: 25.06) falls in between, suggesting moderate variability in the time required to sell available properties.
In terms of distribution, Volume is the most skewed (0.88), highlighting the presence of extreme sales values. Sales (0.72) and Listings (0.65) are also moderately right-skewed, indicating an asymmetrical distribution with a tail toward higher values. Median Price (-0.36) is slightly left-skewed. Months Inventory (0.04) is nearly symmetric.
Looking at the whole picture, total sales value (Volume) appears to vary with factors such as Listings and seasonality, while Months Inventory is comparatively stable. Because these are univariate summaries, such relationships remain hypotheses to be checked in the conditional analysis.
# Define the number of bins (6 equal-width intervals)
sales_bins <- cut(sales,
breaks = seq(min(sales), max(sales), length.out = 7),
include.lowest = TRUE)
# Compute frequency distribution
freq_table <- table(sales_bins)
# Compute relative frequencies
rel_freq <- prop.table(freq_table)
# Compute cumulative frequencies
cum_freq <- cumsum(freq_table)
# Compute cumulative relative frequencies
cum_rel_freq <- cumsum(rel_freq)
# Combine into a distribution table
distribution_table <- data.frame(
Sales_Category = names(freq_table),
Frequency = as.vector(freq_table),
Relative_Frequency = round(as.vector(rel_freq), 4),
Cumulative_Frequency = as.vector(cum_freq),
Cumulative_Relative_Frequency = round(as.vector(cum_rel_freq), 4)
)
# Print the distribution table
print(distribution_table)
## Sales_Category Frequency Relative_Frequency Cumulative_Frequency
## 1 [79,136] 74 0.3083 74
## 2 (136,194] 67 0.2792 141
## 3 (194,251] 40 0.1667 181
## 4 (251,308] 36 0.1500 217
## 5 (308,366] 15 0.0625 232
## 6 (366,423] 8 0.0333 240
## Cumulative_Relative_Frequency
## 1 0.3083
## 2 0.5875
## 3 0.7542
## 4 0.9042
## 5 0.9667
## 6 1.0000
# Bar plot of sales distribution
barplot(freq_table,
col = c("lightcoral","lightpink","lightblue","lightgreen","lightgoldenrodyellow","lightyellow"),
main = "Sales Distribution (Equal Bins)",
xlab = "Sales Categories",
ylab = "Frequency")
# Compute and print Gini Index
gini_index_sales <- round(gini.index(as.numeric(sales_bins)), 2)
gini_index_sales_matrix <- matrix(c(gini_index_sales),
nrow = 1, byrow = TRUE
)
# Adding names to rows and columns
colnames(gini_index_sales_matrix) <- c("**Gini Index**")
rownames(gini_index_sales_matrix) <- c("**Sales**")
# Pretty print the matrix with bold column and row names
kable(gini_index_sales_matrix, format = "markdown") %>%
kable_styling() %>%
row_spec(0, bold = TRUE) %>% # Bold column names
column_spec(1, bold = TRUE) %>% # Bold first column (row names)
column_spec(2:ncol(gini_index_sales_matrix), bold = FALSE) # Keep matrix values in normal font
| Gini Index | |
|---|---|
| Sales | 0.93 |
Histogram:
As with the boxplot, the histogram with 6 bins confirms the right-skewed distribution of Sales. The first three bins, representing lower sales counts, account for about 75% of the observations, indicating that most city-months register sales between 79 and 251 properties. Higher sales figures (beyond 251) may be influenced by seasonal trends or market anomalies. To further investigate these patterns, a conditional analysis is recommended to assess potential external drivers.
Gini Index: 0.93
The Gini index computed here measures heterogeneity across the six sales classes. A value of 0.93, very close to 1, indicates that observations are spread over many classes rather than concentrated in a single one. The spread is not perfectly even, however: the majority of the observations (181 of 240) fall in the first three classes.
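The definition of `gini.index` is not shown in this report. If it is the normalized Gini heterogeneity index (an assumption), the reported value can be reproduced from the class frequencies in the distribution table. The function name `gini_heterogeneity` below is hypothetical.

```r
# Assumed definition (not shown in the report): normalized Gini heterogeneity index
gini_heterogeneity <- function(counts) {
  f <- counts / sum(counts)   # relative frequencies
  g <- 1 - sum(f^2)           # Gini heterogeneity
  j <- length(counts)         # number of classes
  g / ((j - 1) / j)           # rescale so that the maximum is 1
}

# Class frequencies taken from the distribution table above
sales_class_counts <- c(74, 67, 40, 36, 15, 8)
gini_sales <- round(gini_heterogeneity(sales_class_counts), 2)

# Reference cases: equal classes give 1, a single occupied class gives 0
gini_equal  <- gini_heterogeneity(rep(40, 6))
gini_single <- gini_heterogeneity(c(240, 0, 0, 0, 0, 0))
```

Under this definition the index is 1 when all classes are equally populated and 0 when every observation falls in one class, which is the scale against which 0.93 should be read.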
The probability that a randomly selected row of this dataset reports the city “Beaumont” can be calculated as follows:
tot_num_observations <- length(city)
tot_num_observations
## [1] 240
num_observations_Beaumont <- sum(city == "Beaumont")
num_observations_Beaumont
## [1] 60
Considering there are 4 cities and 240 observations, with 60 observations per city, the probability that a randomly selected row reports the city “Beaumont” is:
probability_Beaumont <- num_observations_Beaumont/tot_num_observations
probability_Beaumont
## [1] 0.25
The probability that a randomly selected row of this dataset reports the month of July can be calculated as follows:
tot_num_observations_month <- length(month)
tot_num_observations_month
## [1] 240
num_observations_July <- sum(month == 7) # July is month 7
num_observations_July
## [1] 20
The probability is:
probability_July <- num_observations_July/tot_num_observations_month
probability_July
## [1] 0.08333333
The probability that it reports December 2012 can be calculated as follows:
num_observations_December_2012 <- sum(month == 12 & year == 2012)
num_observations_December_2012
## [1] 4
probability_december_2012 <- num_observations_December_2012/tot_num_observations_month
probability_december_2012
## [1] 0.01666667
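The three probabilities can also be obtained in one step each, since the mean of a logical vector is the share of rows that satisfy the condition. The sketch below uses a hypothetical balanced grid named `panel` in place of the real data frame and checks the December 2012 result against the product of the marginal probabilities.

```r
# Identifier columns of a balanced panel (hypothetical stand-in for the dataset)
panel <- expand.grid(
  month = 1:12,
  year  = 2010:2014,
  city  = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  stringsAsFactors = FALSE
)

# The mean of a logical vector is the share of TRUE values
p_beaumont <- mean(panel$city == "Beaumont")
p_july     <- mean(panel$month == 7)
p_dec_2012 <- mean(panel$month == 12 & panel$year == 2012)

# In a balanced panel month and year are independent, so the joint
# probability equals the product of the marginals: (1/12) * (1/5) = 1/60
p_product <- mean(panel$month == 12) * mean(panel$year == 2012)
```

The product rule holds here only because every month appears equally often in every year; with missing months it would no longer apply.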
# Create a new column for average price (in millions of dollars, because volume is in millions)
realestate_texas$avg_price <- volume / sales
kable(head(realestate_texas,5))
| city | year | month | sales | volume | median_price | listings | months_inventory | avg_price |
|---|---|---|---|---|---|---|---|---|
| Beaumont | 2010 | 1 | 83 | 14.162 | 163800 | 1533 | 9.5 | 0.1706265 |
| Beaumont | 2010 | 2 | 108 | 17.690 | 138200 | 1586 | 10.0 | 0.1637963 |
| Beaumont | 2010 | 3 | 182 | 28.701 | 122400 | 1689 | 10.6 | 0.1576978 |
| Beaumont | 2010 | 4 | 200 | 26.819 | 123200 | 1708 | 10.6 | 0.1340950 |
| Beaumont | 2010 | 5 | 202 | 28.833 | 123100 | 1771 | 10.9 | 0.1427376 |
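Because volume is expressed in millions of dollars, avg_price is in millions of dollars per sale. The sketch below rescales it to dollars so it can be read next to median_price; it uses only the five preview rows, and the column name `avg_price_usd` is an assumption.

```r
# First five rows of the preview table (Beaumont, January to May 2010)
preview <- data.frame(
  sales  = c(83, 108, 182, 200, 202),
  volume = c(14.162, 17.690, 28.701, 26.819, 28.833)  # millions of dollars
)

# volume / sales is in millions of dollars per sale; rescale to dollars
preview$avg_price     <- preview$volume / preview$sales
preview$avg_price_usd <- preview$avg_price * 1e6

round(preview$avg_price_usd)
```

In January 2010, for example, the average price is about $170,600, slightly above the median price of $163,800 for the same month.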
# Create a new column for ad effectiveness
realestate_texas$ad_effectiveness <- sales / listings
# View the first few rows to check the new columns
kable(head(realestate_texas,5))
| city | year | month | sales | volume | median_price | listings | months_inventory | avg_price | ad_effectiveness |
|---|---|---|---|---|---|---|---|---|---|
| Beaumont | 2010 | 1 | 83 | 14.162 | 163800 | 1533 | 9.5 | 0.1706265 | 0.0541422 |
| Beaumont | 2010 | 2 | 108 | 17.690 | 138200 | 1586 | 10.0 | 0.1637963 | 0.0680958 |
| Beaumont | 2010 | 3 | 182 | 28.701 | 122400 | 1689 | 10.6 | 0.1576978 | 0.1077561 |
| Beaumont | 2010 | 4 | 200 | 26.819 | 123200 | 1708 | 10.6 | 0.1340950 | 0.1170960 |
| Beaumont | 2010 | 5 | 202 | 28.833 | 123100 | 1771 | 10.9 | 0.1427376 | 0.1140599 |
round(summary(realestate_texas$ad_effectiveness),2)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05 0.09 0.11 0.12 0.13 0.39
Ad effectiveness
The calculated sales-to-listings ratio provides insight into how effectively listings convert into actual property sales. The values range from 0.05 (minimum) to 0.39 (maximum), indicating significant variation across different periods or locations.
Median: 0.11
The median suggests that in a typical month, around 11% of listed properties are sold.
Mean: 0.12
The mean is slightly higher than the median, indicating a right-skewed distribution, where a few periods or markets exhibit higher listing effectiveness.
1st quartile: 0.09 3rd quartile: 0.13
The 1st quartile (0.09) and 3rd quartile (0.13) show that the middle 50% of observations fall between 9% and 13%, reinforcing that most of the time, a relatively small portion of listings translates into sales.
Max: 0.39
The maximum value (0.39) suggests that in some cases, 39% of listings resulted in sales, possibly due to high demand, lower inventory, or seasonal effects.
Min: 0.05
The minimum ratio (0.05) implies that during certain times or in specific markets, only 5% of listed properties were sold, which may indicate an oversupply of listings or weaker demand.
The effectiveness of advertising is generally low, with most values clustering below 15%, meaning a large portion of properties remain unsold in a given period. This suggests potential challenges such as oversupply, pricing mismatches, or seasonal fluctuations in buyer demand. Further analysis could explore whether certain months, cities, or price ranges exhibit better conversion rates, helping real estate professionals refine their pricing and marketing strategies.
To gain deeper insight into the market dynamics, a conditional
analysis is necessary. The following sections will therefore focus
on sales by city, sales by city and year, and median price by city.
In addition, the effectiveness of listings, calculated earlier as the
ratio of sales to listings, will be analyzed in more detail. This
analysis will provide a clearer understanding of how efficiently
different cities convert their property listings into actual sales,
offering valuable insights for market strategies and inventory
management.
##
##
## |city | total_sales| mean_sales_year| sd_sales_year|
## |:---------------------|-----------:|---------------:|-------------:|
## |Beaumont | 10643| 177.38| 41.48|
## |Bryan-College Station | 12358| 205.97| 84.98|
## |Tyler | 16185| 269.75| 61.96|
## |Wichita Falls | 6964| 116.07| 22.15|
# Create the boxplot using ggplot2
ggplot(realestate_texas, aes(x = city, y = sales, fill = city)) +
geom_boxplot() + # Create the boxplot
labs(title = "Boxplot of Sales by City",
x = "City",
y = "Sales") +
theme_minimal() + # Minimal theme for better clarity
theme(
axis.text.x = element_text(angle = 45, hjust = 1), # Rotate x-axis labels for readability
plot.title = element_text(hjust = 0.5) # Center the plot title
)
The boxplot highlights that Tyler is the city with the highest
number of properties sold, as indicated by its position in the upper
range of the plot. In addition to higher sales, Tyler stands out for its
market strength and stability, evidenced by its relatively
narrow interquartile range, suggesting consistent performance over
time.
On the other hand, Wichita Falls shows the lowest volume
of property sales, with its boxplot situated near the bottom of the
graph. Notably, the lower whisker drops below 100, indicating that in
some months, the number of sales was particularly low.
Bryan-College Station exhibits a very wide IQR, reflecting
high variability in monthly sales. This irregular pattern may be
linked to seasonal factors, such as the academic calendar or local
events, which could cause fluctuations in housing demand.
Lastly,
Beaumont presents a moderate sales volume compared to the other
cities. Its boxplot indicates a fairly consistent market, with a
smaller spread and fewer extreme values. This suggests a relatively
stable housing market without dramatic month-to-month changes in
property sales.
##
##
## | year|city | total_sales| mean_sales_year| sd_sales_year|
## |----:|:---------------------|-----------:|---------------:|-------------:|
## | 2010|Beaumont | 1874| 156.17| 36.92|
## | 2010|Bryan-College Station | 2011| 167.58| 70.75|
## | 2010|Tyler | 2730| 227.50| 48.98|
## | 2010|Wichita Falls | 1481| 123.42| 26.62|
## | 2011|Beaumont | 1728| 144.00| 22.66|
## | 2011|Bryan-College Station | 2009| 167.42| 62.19|
## | 2011|Tyler | 2866| 238.83| 49.62|
## | 2011|Wichita Falls | 1275| 106.25| 19.76|
## | 2012|Beaumont | 2063| 171.92| 28.39|
## | 2012|Bryan-College Station | 2361| 196.75| 74.28|
## | 2012|Tyler | 3162| 263.50| 46.40|
## | 2012|Wichita Falls | 1349| 112.42| 14.25|
## | 2013|Beaumont | 2414| 201.17| 37.73|
## | 2013|Bryan-College Station | 2854| 237.83| 95.85|
## | 2013|Tyler | 3449| 287.42| 53.05|
## | 2013|Wichita Falls | 1455| 121.25| 26.00|
## | 2014|Beaumont | 2564| 213.67| 36.49|
## | 2014|Bryan-College Station | 3123| 260.25| 86.69|
## | 2014|Tyler | 3978| 331.50| 56.85|
## | 2014|Wichita Falls | 1404| 117.00| 21.09|
# Create bar chart
ggplot(total_sales_year, aes(x = year, y = total_sales, fill = city)) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "Total Sales by Year and City",
x = NULL,
y = "Total Sales"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom",
plot.title = element_text(hjust = 0.5) # Center the title
)
# Create line chart with clean data labels
ggplot(total_sales_year, aes(x = year, y = total_sales, color = city, group = city)) +
geom_line(linewidth = 1) +
geom_point(size = 3) +
geom_text(
aes(label = round(total_sales, 0)),
position = position_dodge(width = 0.5),
vjust = -0.7,
size = 3,
show.legend = FALSE # Prevents labels from showing in legend
) +
labs(
x = NULL,
y = "Total Sales",
title = "Time Series – Total Sales per City and Year"
) +
theme_minimal() +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom",
plot.title = element_text(hjust = 0.5)
)
To provide a clearer overview of annual property sales across
different cities, both a bar chart and a time series line
chart were used.
From both visualizations, it is evident that
Tyler has the most active real estate market. The number of
properties sold in this city increased steadily over time, rising from
2,730 in 2010 to 3,978 in 2014.
The cities of Bryan-College Station and Beaumont also experienced
a general upward trend in property sales throughout the period. However,
both dipped between 2010 and 2011 (Beaumont from 1,874 to 1,728,
Bryan-College Station only marginally, from 2,011 to 2,009) before resuming growth.
On the other hand, Wichita Falls appears to have the weakest
real estate market among the cities analyzed. In every year of the
dataset, the number of properties sold remained below 1,500. Notably,
there was a sharp decline from 1,481 in 2010 to 1,275 in 2011. Although
sales improved slightly afterward, they had not returned to 2010 levels by
the end of 2014.
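The trends described above can be quantified as year-over-year percentage changes. The sketch below transcribes the yearly totals from the year-by-city table; it is a reconstruction for illustration, not the original `total_sales_year` object, and the helper `yoy_change` is hypothetical.

```r
# Yearly totals transcribed from the year-by-city table above
total_sales_year <- data.frame(
  year = rep(2010:2014, each = 4),
  city = rep(c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
             times = 5),
  total_sales = c(1874, 2011, 2730, 1481,
                  1728, 2009, 2866, 1275,
                  2063, 2361, 3162, 1349,
                  2414, 2854, 3449, 1455,
                  2564, 3123, 3978, 1404)
)

# Five-year totals per city (should match the city-level table)
city_totals <- tapply(total_sales_year$total_sales, total_sales_year$city, sum)

# Year-over-year percentage change within each city (first year has no change)
yoy_change <- function(x) c(NA, round(diff(x) / head(x, -1) * 100, 1))
total_sales_year$yoy_pct <- ave(total_sales_year$total_sales,
                                total_sales_year$city,
                                FUN = yoy_change)

total_sales_year[total_sales_year$year == 2011, c("city", "yoy_pct")]
```

The five-year sums also serve as a consistency check against the city-level summary table shown earlier.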
Both the time series and the stacked bar chart of total
property sales per city from 2010 to 2014 reveal distinct patterns in
market behavior. Among all cities, Tyler consistently leads the
market, showing the highest number of properties sold across the
years. Notably, Tyler also displays a strong seasonal trend, with
sales peaking between April and August each year, suggesting
heightened market activity during spring and summer months.
Bryan-College Station, while showing generally lower sales than
Tyler, is marked by high volatility, potentially influenced by
external factors. This city also follows a similar seasonal pattern,
though with more irregular peaks.
Beaumont demonstrates a stable
and moderate sales trend, with visible but less pronounced seasonal
increases in the middle of the year. This stability may point to a more
consistent real estate demand throughout the year.
On the other
hand, Wichita Falls exhibits the lowest and flattest sales trend,
with limited seasonal fluctuation and consistently lower sales volume.
Its modest peaks during mid-year months are less defined compared to the
other cities, reflecting a relatively less dynamic housing market.
Overall, the plot highlights a recurring seasonal effect, where the
period from April to August consistently registers higher sales across
all cities, underlining the importance of these months in the Texas real
estate market.
## # A tibble: 4 × 3
## city mean_effectiveness sd_effectiveness
## <chr> <dbl> <dbl>
## 1 Beaumont 0.106 0.0267
## 2 Bryan-College Station 0.147 0.0729
## 3 Tyler 0.0935 0.0235
## 4 Wichita Falls 0.128 0.0247
The chart illustrates the mean advertisement effectiveness
(sales/listings) by city, along with the corresponding standard
deviations as error bars.
Bryan-College Station stands out with
the highest average ad effectiveness, suggesting that listings in
this city tend to convert into property sales more efficiently than in
other locations. The higher variability (large error bar) may
indicate fluctuating market responsiveness, possibly influenced by
seasonal or event-driven factors (e.g., university cycles).
Wichita Falls also shows relatively strong effectiveness, with
more consistency (smaller standard deviation) than Bryan-College
Station, implying steady performance from its listings.
Beaumont
and Tyler have lower average effectiveness. Interestingly, Tyler,
despite having the highest total sales (as shown in the time series and
boxplots), has one of the lowest ad effectiveness rates. This suggests
that while a large number of properties are being sold, the number of
listings is disproportionately high, possibly due to oversupply,
less targeted advertising, or listing redundancy.
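The consistency comparison above can be made scale-free by dividing each city's standard deviation by its mean. The sketch below uses the rounded values printed in the summary tibble, so the percentages are approximate; the column name `cv_pct` is an assumption.

```r
# Values transcribed from the summary tibble above
effectiveness <- data.frame(
  city = c("Beaumont", "Bryan-College Station", "Tyler", "Wichita Falls"),
  mean_effectiveness = c(0.106, 0.147, 0.0935, 0.128),
  sd_effectiveness   = c(0.0267, 0.0729, 0.0235, 0.0247)
)

# Coefficient of variation: spread relative to the city's own mean, in percent
effectiveness$cv_pct <- round(
  effectiveness$sd_effectiveness / effectiveness$mean_effectiveness * 100, 1
)

# Rank cities from most to least consistent
effectiveness[order(effectiveness$cv_pct), c("city", "cv_pct")]
```

On this measure Wichita Falls is the most consistent city and Bryan-College Station the least, in line with the error bars described above.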
##
##
## |city | tot_median_price| min_median_price| max_median_price| mean_median_price| sd_median_price|
## |:---------------------|----------------:|----------------:|----------------:|-----------------:|---------------:|
## |Beaumont | 7799300| 106700| 163800| 129988.3| 10104.993|
## |Bryan-College Station | 9449300| 140700| 180000| 157488.3| 8852.235|
## |Tyler | 8486500| 120600| 161600| 141441.7| 9336.538|
## |Wichita Falls | 6104600| 73800| 135300| 101743.3| 11320.034|
The aggregated table indicates that Bryan-College Station has
the highest median property prices among the four cities. This is
visually confirmed by the boxplot, where its box is positioned highest
on the y-axis and includes an outlier, suggesting the presence of
particularly high property values in some months. Tyler ranks second
in terms of price levels, with its median property prices
ranging between $120,600 and $161,600 (mean of about $141,400). This city
shows a relatively tight interquartile range (IQR), implying price
consistency.
In contrast, Wichita Falls stands out as the city with the lowest property prices. Its distribution spans from $73,800 to $135,300, although the upper bound is influenced by an outlier. Excluding that outlier, the general price range is noticeably lower than in the other cities, reflecting a more affordable housing market.
Beaumont occupies a middle position, with moderate prices and
a fairly symmetric distribution, suggesting stable property valuation
over time.
Considering the above analysis, Tyler emerges as the city
with the most dynamic market, both in terms of property sales and house
prices. However, advertising in this area appears to be less effective,
and sales trends follow a clear seasonal pattern, peaking in mid-spring
and summer. Texas Realty Insights should maintain focus on Tyler,
strategically intensifying listings during these high-activity
periods.
Bryan-College Station stands out with the highest average advertising effectiveness and the highest median property prices. Despite a high variability in market responsiveness—likely influenced by seasonal factors or local events—this city presents strong potential. It represents a promising opportunity for strategic investment by Texas Realty Insights.
Beaumont shows increasing property sales over the years and a stable median price range that, while not as high as Bryan-College Station or Tyler, remains above that of Wichita Falls. This indicates emerging potential, and the city may offer an attractive investment opportunity for Texas Realty Insights.
On the other hand, Wichita Falls reflects a less dynamic and less prosperous market. While it shows relatively strong advertising effectiveness, it consistently ranks lower in terms of both property sales and median prices. Texas Realty Insights may consider prioritizing investment in the other three cities, where growth and profitability appear more promising.