| city | year | month | sales | volume | median_price | listings | months_inventory |
|---|---|---|---|---|---|---|---|
| Beaumont | 2010 | 1 | 83 | 14.162 | 163800 | 1533 | 9.5 |
| Beaumont | 2010 | 2 | 108 | 17.690 | 138200 | 1586 | 10.0 |
| Beaumont | 2010 | 3 | 182 | 28.701 | 122400 | 1689 | 10.6 |
| Beaumont | 2010 | 4 | 200 | 26.819 | 123200 | 1708 | 10.6 |
| Beaumont | 2010 | 5 | 202 | 28.833 | 123100 | 1771 | 10.9 |
| n | f | |
|---|---|---|
| Beaumont | 60 | 0.25 |
| Bryan-College Station | 60 | 0.25 |
| Tyler | 60 | 0.25 |
| Wichita Falls | 60 | 0.25 |
| n | f | |
|---|---|---|
| 2010 | 48 | 0.2 |
| 2011 | 48 | 0.2 |
| 2012 | 48 | 0.2 |
| 2013 | 48 | 0.2 |
| 2014 | 48 | 0.2 |
From the frequency distribution is possible to see that the dataset is evenly distributed in the different city and in the different year with 60 row per city and 48 for every year.
Below is possible to have an overview of the positional indexes.
| volume | sales | median_price | listings | months_inventory | |
|---|---|---|---|---|---|
| Min. : 8.166 | Min. : 79.0 | Min. : 73800 | Min. : 743 | Min. : 3.400 | |
| 1st Qu.:17.660 | 1st Qu.:127.0 | 1st Qu.:117300 | 1st Qu.:1026 | 1st Qu.: 7.800 | |
| Median :27.062 | Median :175.5 | Median :134500 | Median :1618 | Median : 8.950 | |
| Mean :31.005 | Mean :192.3 | Mean :132665 | Mean :1738 | Mean : 9.193 | |
| 3rd Qu.:40.893 | 3rd Qu.:247.0 | 3rd Qu.:150050 | 3rd Qu.:2056 | 3rd Qu.:10.950 | |
| Max. :83.547 | Max. :423.0 | Max. :180000 | Max. :3296 | Max. :14.900 |
Grouped by City
| city | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| Beaumont | 177.38 | 176.5 | 273 | 83 | 10643 |
| Bryan-College Station | 205.97 | 186.5 | 403 | 89 | 12358 |
| Tyler | 269.75 | 271.0 | 423 | 143 | 16185 |
| Wichita Falls | 116.07 | 114.5 | 167 | 79 | 6964 |
| city | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| Beaumont | 26.13 | 25.6185 | 42.028 | 13.496 | 1567.896 |
| Bryan-College Station | 38.19 | 33.5925 | 83.547 | 15.151 | 2291.496 |
| Tyler | 45.77 | 45.0835 | 80.814 | 21.050 | 2746.043 |
| Wichita Falls | 13.93 | 13.7110 | 20.881 | 8.166 | 835.810 |
| city | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| Beaumont | 129988.3 | 130750 | 163800 | 106700 | 7799300 |
| Bryan-College Station | 157488.3 | 155400 | 180000 | 140700 | 9449300 |
| Tyler | 141441.7 | 142200 | 161600 | 120600 | 8486500 |
| Wichita Falls | 101743.3 | 102300 | 135300 | 73800 | 6104600 |
It’s possible to see above that Wichita Falls has the lowest sales by at least 30% and Tyler has the highest sales in all of the value (max, min, sum, mean and median). Beaumont and Bryan-College Station are in the middle with similar value, with Bryan-College Station with greater value than Beaumont. In terms of volume the same distribution appear with Wichita Falls at the last place, with the lowest market and the less dinamic one. Instead on the Median Price Level Bryan-College Station is at first place with the highest market value, but not the most dinamic one as seen in the field above.
Grouped by Year
| year | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 2010 | 168.67 | 162.0 | 316 | 83 | 8096 |
| 2011 | 164.12 | 144.5 | 313 | 79 | 7878 |
| 2012 | 186.15 | 171.0 | 322 | 90 | 8935 |
| 2013 | 211.92 | 193.5 | 402 | 79 | 10172 |
| 2014 | 230.60 | 215.0 | 423 | 89 | 11069 |
| year | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 2010 | 25.68 | 23.6840 | 49.914 | 8.951 | 1232.443 |
| 2011 | 25.16 | 23.2400 | 52.319 | 8.166 | 1207.575 |
| 2012 | 29.27 | 25.8475 | 57.388 | 9.695 | 1404.843 |
| 2013 | 35.15 | 32.6760 | 76.116 | 9.666 | 1687.315 |
| 2014 | 39.77 | 36.8275 | 83.547 | 9.626 | 1909.069 |
| year | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 2010 | 130191.7 | 133050 | 165300 | 86400 | 6249200 |
| 2011 | 127854.2 | 131650 | 157300 | 73800 | 6137000 |
| 2012 | 130077.1 | 134200 | 170000 | 82100 | 6243700 |
| 2013 | 135722.9 | 140250 | 167300 | 85900 | 6514700 |
| 2014 | 139481.2 | 142300 | 180000 | 90000 | 6695100 |
From 2010 and 2014 sales, volume and median_price increase year by year with the only drop in 2011. for sales and volume mean and median values tells us that the distribution has always outliers in the highest side of the curve, infact the mean value is greater than the median. Opposite behavior can be seen for the median_price.
Grouped by month
| month | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 1 | 127.40 | 112.5 | 238 | 79 | 2548 |
| 2 | 140.85 | 124.5 | 244 | 79 | 2817 |
| 3 | 189.45 | 175.5 | 298 | 102 | 3789 |
| 4 | 211.70 | 199.0 | 323 | 111 | 4234 |
| 5 | 238.85 | 246.0 | 388 | 102 | 4777 |
| 6 | 243.55 | 258.0 | 423 | 111 | 4871 |
| 7 | 235.75 | 209.0 | 403 | 104 | 4715 |
| 8 | 231.45 | 228.0 | 357 | 123 | 4629 |
| 9 | 182.35 | 165.5 | 361 | 95 | 3647 |
| 10 | 179.90 | 163.5 | 369 | 97 | 3598 |
| 11 | 156.85 | 157.0 | 300 | 93 | 3137 |
| 12 | 169.40 | 155.5 | 332 | 81 | 3388 |
| month | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 1 | 19.00 | 17.1385 | 36.916 | 8.166 | 380.015 |
| 2 | 21.65 | 19.3315 | 42.553 | 8.747 | 433.030 |
| 3 | 29.38 | 26.7675 | 50.404 | 13.104 | 587.694 |
| 4 | 33.30 | 30.9325 | 60.581 | 13.524 | 666.089 |
| 5 | 39.70 | 38.7930 | 71.456 | 12.451 | 794.042 |
| 6 | 41.30 | 41.2580 | 80.814 | 13.031 | 826.063 |
| 7 | 39.12 | 32.8395 | 83.547 | 12.355 | 782.438 |
| 8 | 38.01 | 36.8970 | 67.244 | 14.003 | 760.283 |
| 9 | 29.60 | 26.2430 | 68.744 | 11.792 | 591.983 |
| 10 | 29.08 | 26.5180 | 65.316 | 9.507 | 581.572 |
| 11 | 24.81 | 24.3195 | 52.314 | 10.844 | 496.143 |
| 12 | 27.09 | 25.2870 | 61.032 | 9.400 | 541.893 |
| month | Mean | Median | Max | Min | Sum |
|---|---|---|---|---|---|
| 1 | 124250 | 128400 | 163800 | 82100 | 2485000 |
| 2 | 130075 | 132000 | 168500 | 89400 | 2601500 |
| 3 | 127415 | 129800 | 161000 | 85900 | 2548300 |
| 4 | 131490 | 132300 | 169500 | 92200 | 2629800 |
| 5 | 134485 | 135300 | 165200 | 97500 | 2689700 |
| 6 | 137620 | 137350 | 169600 | 95000 | 2752400 |
| 7 | 134750 | 137450 | 172600 | 96700 | 2695000 |
| 8 | 136675 | 143850 | 172200 | 96000 | 2733500 |
| 9 | 134040 | 133100 | 180000 | 90000 | 2680800 |
| 10 | 133480 | 136150 | 176100 | 73800 | 2669600 |
| 11 | 134305 | 140050 | 172800 | 86400 | 2686100 |
| 12 | 133400 | 134250 | 177300 | 87500 | 2668000 |
During the year we can assert that the distribution of the sales, volume and median_price has a slow start with the lowest values in the first month of the year a peak in June and a descent after with a little bump in December. median_price and volume also has a bump in August, maybe caused by the end of the summer.
Below it’s possible to find a summary of the variable indexes for the columns for which they can be evaluated:
| volume | sales | median_price | listings | months_inventory | |
|---|---|---|---|---|---|
| Range | 75.38 | 344.00 | 106200.00 | 2553.00 | 11.50 |
| Interquartile Range | 23.23 | 120.00 | 32750.00 | 1029.50 | 3.15 |
| Variance | 277.27 | 6344.30 | 513572983.09 | 566568.97 | 5.31 |
| Standard Deviation | 16.65 | 79.65 | 22662.15 | 752.71 | 2.30 |
| Variance Coefficient | 53.71 | 41.42 | 17.08 | 43.31 | 25.06 |
| Gini Index | 1.00 | 1.00 | 1.00 | 1.00 | 0.99 |
“Volume” has the most variance coefficient, so has the most variability, instead the most stable variable is “median_price”. From the Gini index can be seen that for all of the columns the index is equal to one (or very close to one), so the distribution is uniform for all of the columns.
Grouped by city:
| city | Range | IQR | Standard_Deviation | Variance | VC |
|---|---|---|---|---|---|
| Beaumont | 190 | 52.00 | 41.48 | 1720.92 | 23.39 |
| Bryan-College Station | 314 | 147.75 | 84.98 | 7222.24 | 41.26 |
| Tyler | 280 | 86.75 | 61.96 | 3839.51 | 22.97 |
| Wichita Falls | 88 | 33.00 | 22.15 | 490.71 | 19.09 |
Grouping by city “sales” tells has that Wichita Falls has the most stable data; Bryan-College Station instead is the most unstable city. Tyler and Beaumont has very similar stability. Throungh standard deviation and interquartile range it’s possible to list the city from the most stable to the most unstable in term of number of sales and it’s: Wichita, Tyler, Beaumont, Bryan-College.
It’s also possible to study this stability in term of volume to have a better view on the stability, because maybe an house in Bryan-College create more volume than one in Wichita.
| city | Range | IQR | Standard_Deviation | Variance | VC |
|---|---|---|---|---|---|
| Beaumont | 28.53 | 8.21850 | 6.97 | 48.59 | 26.67 |
| Bryan-College Station | 68.40 | 23.71700 | 17.25 | 297.51 | 45.16 |
| Tyler | 59.76 | 17.57025 | 13.11 | 171.80 | 28.64 |
| Wichita Falls | 12.71 | 4.80175 | 3.24 | 10.50 | 23.26 |
In terms of volume Beaumont and Tyler switch position, infact can be seen that Beaumont it’s more stable than Tyler. So from a strategic perspective Beaumont it’s better if the goal is to create more value on the total sales of the company.
Grouped by year
| year | Range | IQR | Standard_Deviation | Variance | VC |
|---|---|---|---|---|---|
| 2010 | 233 | 81.25 | 60.54 | 3664.74 | 35.89 |
| 2011 | 234 | 86.75 | 63.87 | 4079.43 | 38.92 |
| 2012 | 232 | 114.00 | 70.91 | 5027.53 | 38.09 |
| 2013 | 323 | 124.50 | 84.00 | 7055.40 | 39.64 |
| 2014 | 334 | 151.25 | 95.51 | 9123.10 | 41.42 |
From a sales perspective 2011, 2012 and 2013 were very similar year. 2014, instead, has the most unstable month, with a Variance coefficient greater than the other years.
| year | Range | IQR | Standard_Deviation | Variance | CV |
|---|---|---|---|---|---|
| 2010 | 40.96 | 14.43950 | 10.80 | 116.53 | 42.04 |
| 2011 | 44.15 | 17.15750 | 12.20 | 148.93 | 48.51 |
| 2012 | 47.69 | 20.02350 | 14.52 | 210.91 | 49.62 |
| 2013 | 66.45 | 30.75825 | 17.93 | 321.65 | 51.02 |
| 2014 | 73.92 | 37.32250 | 21.19 | 448.86 | 53.27 |
Also from a “volume” view can be seen the same trend, with 2014 as the most unstable year.
This historical trend can be seen also in terms of company growth, because we know that from 2010 to 2014 the company increase the amount of sales and volume (with the exception of 2011), so more instability in the year mean simply mean selling more house in some month due to the growth of the company.
| volume | sales | median_price | listings | months_inventory | |
|---|---|---|---|---|---|
| Fischer Index | 0.88 | 0.72 | -0.36 | 0.65 | 0.04 |
| Kurtosis Index | 3.18 | 2.69 | 2.38 | 2.21 | 2.83 |
Fischer: “median_price” has a negative asymmetric distibution, meanwhile “listings”, “sales” and “volume” has a positive asymmetric distibution with mean greater than median greater than mode. “months_inventory” has the most simmetric distribution with a Fischer index almost equal to zero.
Kurtosis index: “volume” is the only one with a value greater than 3 so his distribution is leptokutic. Instead the other variable has a value smaller than 3 so they are platikurtic with a wider look than the normal distribution. “months_inventory”, as expected, is the most similar to a normal distribution.
The variability can be define through the variance coefficient, because is a normalize parameter that let compare different variable with differen scale, so the variable with the highest variance coefficient has the most variability. Here a list of the variable from the one with the most variability (greatest variance coefficient) to the most stable (smaller variance coefficient):
Here a list of the variable from the most sharp (greater kurtosis index) to the most flat (smaller kurtosis index):
Here a list of the variable with the most positve asymmetrical distribution (greater Fischer Index), so that the peak of the distribution is on the right side of the distribution, to the negative asymettrical distribution (smaller Fischer Index), with the peak of the distribution to the left side:
| Volume Class | Count |
|---|---|
| 0-20 Milion | 78 |
| 20-35 Milion | 81 |
| 35-50 Milion | 45 |
| 50+ Milion | 36 |
The distribution of the class is towards the lower class, in fact both first range have almost the same count than the last two together. From the bar graph it’s possible to see this behavior. The Gini index is equal to 0.96 very close to 1 and tells us that the distribution is very heterogeneous, in fact there isn’t one single class that has all the values.
Here an example of a class distribution that correspond to a Gini index close to 0
The probability is calculated as number of rows with Beaumont as city divided by total number of rows. From the table below it’s possible to calculate with the same process even more, in fact due to the fact the the number of rows by city is equal for every city evaluating the probability for one city it’s the same for every city (and it’s going to be equal to 25%)
| City | Count |
|---|---|
| Beaumont | 60 |
| Bryan-College Station | 60 |
| Tyler | 60 |
| Wichita Falls | 60 |
| City | Count | Probability |
|---|---|---|
| Beaumont | 60 | 25 |
| Bryan-College Station | 60 | 25 |
| Tyler | 60 | 25 |
| Wichita Falls | 60 | 25 |
| # Month | Count |
|---|---|
| April | 20 |
| August | 20 |
| December | 20 |
| February | 20 |
| January | 20 |
| July | 20 |
| June | 20 |
| March | 20 |
| May | 20 |
| November | 20 |
| October | 20 |
| September | 20 |
It’s possible to apply the same logic as before:
| # Month | Count | Probability |
|---|---|---|
| April | 20 | 8.33 |
| August | 20 | 8.33 |
| December | 20 | 8.33 |
| February | 20 | 8.33 |
| January | 20 | 8.33 |
| July | 20 | 8.33 |
| June | 20 | 8.33 |
| March | 20 | 8.33 |
| May | 20 | 8.33 |
| November | 20 | 8.33 |
| October | 20 | 8.33 |
| September | 20 | 8.33 |
| 2010 | 2011 | 2012 | 2013 | 2014 | |
|---|---|---|---|---|---|
| April | 4 | 4 | 4 | 4 | 4 |
| August | 4 | 4 | 4 | 4 | 4 |
| December | 4 | 4 | 4 | 4 | 4 |
| February | 4 | 4 | 4 | 4 | 4 |
| January | 4 | 4 | 4 | 4 | 4 |
| July | 4 | 4 | 4 | 4 | 4 |
| June | 4 | 4 | 4 | 4 | 4 |
| March | 4 | 4 | 4 | 4 | 4 |
| May | 4 | 4 | 4 | 4 | 4 |
| November | 4 | 4 | 4 | 4 | 4 |
| October | 4 | 4 | 4 | 4 | 4 |
| September | 4 | 4 | 4 | 4 | 4 |
The same logic can be apply as before. So it will be 4 divided by 240.
| The probability for every Year-Month combination is equal to 1.67 % |
So summarizing:
The mean price can be evaluated through “sales”, that is the number of sales done and “volume” that is the total value of the sales in milions of dollar. The result will be converted in the same scale of the “median_price”
Let’s visualize some Mean Price:
| city | Mean | Sum | Max | Min |
|---|---|---|---|---|
| Beaumont | 146640 | 8798426 | 174724 | 120713 |
| Bryan-College Station | 183534 | 11012058 | 213234 | 151816 |
| Tyler | 167677 | 10060604 | 193787 | 143464 |
| Wichita Falls | 119430 | 7165803 | 148775 | 97010 |
Bryan-College has the highest value, so we can assert that it’s an high value neighborhood. The opposite can be said for Wichita Falls that has the smallest values.
| year | Mean | Sum | Max | Min |
|---|---|---|---|---|
| 2010 | 150189 | 7209054 | 201000 | 100573 |
| 2011 | 148251 | 7116026 | 195387 | 97010 |
| 2012 | 150899 | 7243139 | 191798 | 97848 |
| 2013 | 158705 | 7617852 | 200085 | 105126 |
| 2014 | 163559 | 7850820 | 213234 | 108157 |
The historical trend confirm what was found before. The market increase it’s value, with the only drop in 2011.
Listing efficiency can be evaluate, for example, relating number of sales and number of listings.
Dividing the number of sales by number of listing we will have a number of sales for every listings for that specific row. This efficiency can be also normalize to have a number between 0 and 1 where 1 correspond with the max efficiency and 0 correspond to the opposite.
| city | year | month | sales | volume | median_price | listings | months_inventory | vol_class | vol_class_new | month_name | mean_price | list_efficiency | list_efficiency_norm | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 115 | Bryan-College Station | 2014 | 7 | 403 | 83.547 | 172600 | 1041 | 4.1 | 50+ Milion | 80+ Milion | July | 207313 | 0.3871278 | 1 |
| city | year | month | sales | volume | median_price | listings | months_inventory | vol_class | vol_class_new | month_name | mean_price | list_efficiency | list_efficiency_norm | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 133 | Tyler | 2011 | 1 | 143 | 21.05 | 120600 | 2852 | 12.6 | 20-35 Milion | 9-70 Milion | January | 147203 | 0.0501403 | 0 |
The same logic can be applied to “volume”, where the efficiency will be equal to the milions of dollar generated per listing.
| city | year | month | sales | volume | median_price | listings | months_inventory | vol_class | vol_class_new | month_name | mean_price | list_efficiency | list_efficiency_norm | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 115 | Bryan-College Station | 2014 | 7 | 403 | 83.547 | 172600 | 1041 | 4.1 | 50+ Milion | 80+ Milion | July | 207313 | 0.0802565 | 1 |
| city | year | month | sales | volume | median_price | listings | months_inventory | vol_class | vol_class_new | month_name | mean_price | list_efficiency | list_efficiency_norm | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 133 | Tyler | 2011 | 1 | 143 | 21.05 | 120600 | 2852 | 12.6 | 20-35 Milion | 9-70 Milion | January | 147203 | 0.0073808 | 0 |
The two way to evaluate the efficiency return the same rows, confirming that efficiency of the listing was at it’s peak and it’s bottom in the two scenarios describe by the two rows.
The minimum value happen in 2011, very consistently with the fact that 2011 was a bad year for the house market, and in January as seen, confirming that the start of the year correspond to worst month of the year. The maximum efficiency correspond to Bryan-College, confirming the high value neighborhood, during 2014, as expected due to the increase of the market during the years, and in July, proving the fact that June and July are the best month of the year for the house market.
With the same logic as before it’s possible to do some conditional analysis grouping by city, year and month.
Mean and Median price by city
The boxplot reflect what we have seen before: - Wichita has the lowest values. In median price term is also the most variable city. - Beaumont and Tyler are in between of Wichita and Bryan-College with Tyler that has highest value, but is more unstable with a larger range than Beaumont. - Bryan-College has the highest values, but in terms of mean price is the most variable.
Total Sales by city
As seen before Tyler has the greater number of sales, follow by Bryan-College and Beaumont. Wichita Falls is at the bottom with almost half the sales of the second to last city in term of number of sales.
Total Sales by time period
The two bar chart above confirm the historical trend analyze before: - In terms of year only 2011 falls behind the year before, otherwise every year has more sales than the one before. - In terms of month June has the highest number of sales, january and December are at the bottom, with a bell-shaped trend during the year.
Volume
As expected Bryan-College and Tyler are the two best city. Tyler is more consistent from in the end of the year, with very high values in relation with the other city. Beaumont is very consistent during the year, opposite of that Bryan-College is the most variable with the highest peak and a lowest point similar to Beaumont. Wichita Falls confirm the low value market during the entire year.
The same behavior can be seen in this chart.
To deep dive, is also possible to see use also the year to study the difference between the city during the month.
The has the same behavior seen above and is very clear that during the 5 year Wichita remains at the same level, instead Tyler is the city with the greatest growth.
Volume Historical Trend
The line charts confirm the historical trends seen before, also from the charts it’s possible to visualize the relation betweenn “sales” and “volume”.
Historical trends by city
From here is very clear the growth done by Tyler, that is almost above Bryan-College in the last timeframe. It’s also very clear that Wichita is a very stable market, but it’s not a good investment in terms of growth for the company.
The same trend can be seen in terms of number of sales
From the analysis can be deducted various things: