Data wrangling
limits<-cut(mieszkania$price_PLN,seq(350000,1350000,by=100000))
table1<-table(limits)
transform(table1)
## limits Freq
## 1 (3.5e+05,4.5e+05] 9
## 2 (4.5e+05,5.5e+05] 21
## 3 (5.5e+05,6.5e+05] 33
## 4 (6.5e+05,7.5e+05] 36
## 5 (7.5e+05,8.5e+05] 31
## 6 (8.5e+05,9.5e+05] 36
## 7 (9.5e+05,1.05e+06] 21
## 8 (1.05e+06,1.15e+06] 10
## 9 (1.15e+06,1.25e+06] 2
## 10 (1.25e+06,1.35e+06] 1
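The interval labels above come out in scientific notation because cut() formats break points with three significant digits by default. Below is a minimal sketch of how plain labels could be obtained with the dig.lab argument. The short `prices` vector is only an illustrative stand-in for mieszkania$price_PLN, and `limits_plain` is a name made up for this example.

```r
# Illustrative stand-in for mieszkania$price_PLN (not the real data set)
prices <- c(359769, 415834, 678704, 800693, 1090444, 1277691)
breaks <- seq(350000, 1350000, by = 100000)

# dig.lab sets how many significant digits are used in the interval labels
limits_plain <- cut(prices, breaks, dig.lab = 7)

# Frequency table with labels such as (350000,450000]
table(limits_plain)
```

The counts do not change; only the labels become easier to read.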
TAI
## # classes Goodness of fit Tabular accuracy
## 10.0000000 0.9780872 0.8508467
The intervals used above turned out to be well chosen, as the jenks.tests measures show. Goodness of fit is very high (about 0.98), and that is our aim: not to lie with statistics.
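To make the two measures less of a black box, here is a rough base R sketch of how they can be computed by hand. As far as I understand, goodness of variance fit compares squared deviations from class means with squared deviations from the overall mean, and the tabular accuracy index does the same with absolute deviations. This is my own sketch on simulated prices, not the code of jenks.tests, and the `prices` vector is made up.

```r
# Simulated prices; in the report this would be mieszkania$price_PLN
set.seed(1)
prices <- runif(200, min = 360000, max = 1340000)
breaks <- seq(350000, 1350000, by = 100000)
classes <- cut(prices, breaks)

# For every observation, the mean of the class it falls into
class_means <- ave(prices, classes)

# Goodness of variance fit: 1 - within-class / total squared deviations
gvf <- 1 - sum((prices - class_means)^2) / sum((prices - mean(prices))^2)

# Tabular accuracy index: the same idea with absolute deviations
tai <- 1 - sum(abs(prices - class_means)) / sum(abs(prices - mean(prices)))

c(gvf = gvf, tai = tai)
```

Both values approach 1 when observations lie close to the mean of their own class, which is why high values suggest that the intervals describe the data well.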
Basic plots
From the chart above we can suspect that the price distribution is positively skewed.
ggplot2 plots
From the chart above we can see that the smallest flats occur most frequently.
From the chart above we can see that prices are highest for the niski blok building type, and that the mean is almost equal to the median.
Using facets
From the chart above we can see that prices are highest in the Biskupin district, which was unexpected for me.
Descriptive statistics #1
Summary tables with ‘kable’
OK, now we finally summarize the basic central tendency measures for prices, along with flat sizes and room counts, by building type using kable.
| | kamienica | niski blok | wiezowiec |
|---|---|---|---|
| Prices in PLN | | | |
| min | 415834 | 496390 | 359769 |
| max | 1230848 | 1277691 | 1090444 |
| mean (sd) | 770,333 \(\pm\) 184,388 | 815,577 \(\pm\) 176,390 | 705,729 \(\pm\) 182,503 |
| median (Q1, Q3) | 800,693 (647,756, 896,186) | 807,895 (692,926, 939,853) | 678,704 (555,798, 870,753) |
| skewness | 0 | 0.22 | 0.21 |
| kurtosis | -0.61 | -0.45 | -0.97 |
| Flat sizes | | | |
| min | 17 | 17.4 | 17.4 |
| max | 87.5 | 87.7 | 85.7 |
| mean (sd) | 48.37 \(\pm\) 20.92 | 49.13 \(\pm\) 18.99 | 42.02 \(\pm\) 19.82 |
| median (Q1, Q3) | 46.20 (32.40, 62.50) | 49.60 (37.40, 61.30) | 41.85 (21.35, 54.83) |
| skewness | 0.13 | 0.15 | 0.36 |
| kurtosis | -1.09 | -0.7 | -1.03 |
| Rooms number | | | |
| Less than 3 | 28 (45.90%) | 27 (42.86%) | 39 (51.32%) |
| More than 2 | 33 (54.10%) | 36 (57.14%) | 37 (48.68%) |
From the Prices in PLN part we can see that, in general, the lowest flat prices are in the wiezowiec building type. Interestingly, the skewness of the kamienica price distribution is 0 after rounding, so by this measure it is practically symmetric. The kurtosis of all three categories is negative (platykurtic), which means that prices are spread fairly evenly across the range rather than concentrated around the mean.
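For readers who want to see what stands behind the skewness and kurtosis rows, here is a small sketch with moment-based formulas in base R. I am assuming the simple population-moment versions here; the function that produced the table may use a slightly different estimator, so the numbers need not match to the last digit. The helper names and the toy vectors are made up for illustration.

```r
# Moment-based skewness: third central moment over the cubed standard deviation
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Excess kurtosis: fourth central moment over the squared variance, minus 3
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

x_sym <- c(1, 2, 3, 4, 5)      # symmetric and flat toy data
x_right <- c(1, 1, 2, 2, 10)   # toy data with a long right tail

skewness(x_sym)          # zero for symmetric data
excess_kurtosis(x_sym)   # negative, i.e. platykurtic
skewness(x_right)        # positive
mean(x_right) > median(x_right)
```

In the right-tailed example the mean is pulled above the median, which is the usual sign of positive skew.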
From the Flat sizes part we can read that the minimum flat area (about 17) is claustrophobic; it is hard to imagine that someone is willing to buy such a flat. A second observation is that flat sizes in wiezowce are the most strongly positively skewed (0.36). This means that most flats there are smaller than the mean, with a tail of larger flats. So although the range of sizes is almost the same in all three categories, the distributions differ noticeably, as the medians and quartiles show.
Finally, from the Rooms number part of the table we can see that only in wiezowce do most flats have fewer than 3 rooms.
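The percentages in the Rooms number rows can be checked directly from the counts. The sketch below rebuilds the counts from the table above as a matrix (the object names `rooms` and `shares` are made up for this example) and computes the share of each room category within every building type.

```r
# Counts copied from the "Rooms number" rows of the summary table
rooms <- matrix(
  c(28, 33,   # kamienica
    27, 36,   # niski blok
    39, 37),  # wiezowiec
  nrow = 2,
  dimnames = list(
    rooms = c("Less than 3", "More than 2"),
    building = c("kamienica", "niski blok", "wiezowiec")
  )
)

# Shares within each building type (columns), as percentages
shares <- round(100 * prop.table(rooms, margin = 2), 2)
shares

# Total number of flats across all building types
sum(rooms)
```

With margin = 2 the proportions are computed within columns, so each building type sums to 100%.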