DS Lab #1

Karol Flisikowski/Jan Kowalina

2021-03-29

Data wrangling

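# Bin prices into ten 100,000 PLN classes (right-closed intervals)
limits <- cut(mieszkania$price_PLN, seq(350000, 1350000, by = 100000))
table1 <- table(limits)
# Show the frequency table as a data frame
transform(table1)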
##                 limits Freq
## 1    (3.5e+05,4.5e+05]    9
## 2    (4.5e+05,5.5e+05]   21
## 3    (5.5e+05,6.5e+05]   33
## 4    (6.5e+05,7.5e+05]   36
## 5    (7.5e+05,8.5e+05]   31
## 6    (8.5e+05,9.5e+05]   36
## 7   (9.5e+05,1.05e+06]   21
## 8  (1.05e+06,1.15e+06]   10
## 9  (1.15e+06,1.25e+06]    2
## 10 (1.25e+06,1.35e+06]    1

TAI

##        # classes  Goodness of fit Tabular accuracy 
##       10.0000000        0.9780872        0.8508467

The intervals used above turned out to be well chosen, as the jenks.tests measures show. The goodness of fit is very high (about 0.98) and the tabular accuracy index is about 0.85. That is our aim: not to lie with statistics.
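
The two measures reported above can also be recomputed by hand. This is a minimal sketch, assuming the usual definitions: goodness of fit compares squared deviations around the class means with squared deviations around the overall mean, and tabular accuracy does the same with absolute deviations. `fit_measures` is a hypothetical helper written for illustration, and the toy vector is made up, not taken from mieszkania.

```r
# Hypothetical helper (not part of any package): recomputes both measures
# for a numeric vector x classified by the given breaks.
fit_measures <- function(x, breaks) {
  cls <- cut(x, breaks)                 # right-closed classes, as in the lab code
  stopifnot(!anyNA(cls))                # every value must fall inside the breaks
  sq_dev  <- function(v) sum((v - mean(v))^2)
  abs_dev <- function(v) sum(abs(v - mean(v)))
  c(
    classes = nlevels(cls),
    # na.rm = TRUE skips empty classes, for which tapply() returns NA
    gvf = 1 - sum(tapply(x, cls, sq_dev),  na.rm = TRUE) / sq_dev(x),
    tai = 1 - sum(tapply(x, cls, abs_dev), na.rm = TRUE) / abs_dev(x)
  )
}

# Illustrative toy data: two tight clusters split by one break at 5
toy <- c(1, 2, 3, 10, 11, 12)
fit_measures(toy, breaks = c(0, 5, 15))
```

Both values approach 1 when the classes are internally homogeneous, which is why a high goodness of fit supports the chosen intervals. For the lab data the call would presumably be `fit_measures(mieszkania$price_PLN, seq(350000, 1350000, by = 100000))`.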

Basic plots

From the chart above we can suspect that the price distribution is positively skewed.

ggplot2 plots

From the chart above we can see that the smallest flats occur most frequently.

From the chart above we can see that prices are highest for the niski blok building type, and that the mean is almost equal to the median.

Using facets

From the chart above we can see that prices are highest in the Biskupin district, which I did not expect.

Descriptive statistics #1

Summary tables with ‘kable’

Finally, we summarize the basic central tendency measures for prices by building type using kable.

Summary statistics table about mieszkania data frame

                   kamienica                    niski blok                   wiezowiec
Prices in PLN
  min              415834                       496390                       359769
  max              1230848                      1277691                      1090444
  mean (sd)        770,333 ± 184,388            815,577 ± 176,390            705,729 ± 182,503
  median (Q1, Q3)  800,693 (647,756, 896,186)   807,895 (692,926, 939,853)   678,704 (555,798, 870,753)
  skewness         0                            0.22                         0.21
  kurtosis         -0.61                        -0.45                        -0.97
Flat sizes
  min              17                           17.4                         17.4
  max              87.5                         87.7                         85.7
  mean (sd)        48.37 ± 20.92                49.13 ± 18.99                42.02 ± 19.82
  median (Q1, Q3)  46.20 (32.40, 62.50)         49.60 (37.40, 61.30)         41.85 (21.35, 54.83)
  skewness         0.13                         0.15                         0.36
  kurtosis         -1.09                        -0.7                         -1.03
Rooms number
  Less than 3      28 (45.90%)                  27 (42.86%)                  39 (51.32%)
  More than 2      33 (54.10%)                  36 (57.14%)                  37 (48.68%)

From the Prices in PLN part we can see that, in general, the lowest flat prices are in the wiezowiec building type. Interestingly, the kamienica price distribution has a skewness of 0, so it is essentially symmetric. The kurtosis of all three categories is negative, so the distributions are platykurtic: flatter than a normal distribution, with prices spread broadly rather than concentrated around the mean.
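
The skewness and kurtosis rows can be read with the moment-based formulas in mind. This is a sketch only: the source does not say which function produced the table values, so the two helpers below are hypothetical and use the simple population-moment definitions, with kurtosis reported as excess kurtosis (0 for a normal distribution, negative for a platykurtic one). The toy vectors are illustrative, not taken from mieszkania.

```r
# Hypothetical helpers using population (1/n) central moments.
skewness_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5      # third moment scaled by sd cubed
}

kurtosis_excess <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3    # subtract 3 so a normal distribution gives 0
}

symmetric    <- c(1, 2, 3, 4, 5)   # evenly spread around the mean
right_skewed <- c(1, 1, 2, 2, 10)  # one large value pulls the tail right

skewness_m(symmetric)       # zero for a symmetric sample
skewness_m(right_skewed)    # positive for a right tail
kurtosis_excess(symmetric)  # negative: flatter than a normal distribution
```

Packaged implementations often apply small-sample corrections, so their results may differ slightly from these formulas.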

From the Flat sizes part we can read that the minimum flat area is claustrophobic; it is hard to imagine that anyone is willing to buy such a flat. The second observation is that flat sizes in wiezowce are positively skewed, which means that most flats are smaller than the mean size, with a tail of larger ones. So although the size ranges of the three categories are almost the same, the distributions differ noticeably, as the medians and quartiles show.

Finally, from the Rooms number part of the table we can see that only in wiezowce do most flats (51.32%) have fewer than 3 rooms.
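
The percentages in the Rooms number rows can be checked directly from the counts. This is a small sketch that re-enters the counts from the table by hand; the object names are illustrative.

```r
# Counts copied from the Rooms number rows of the summary table.
rooms <- matrix(
  c(28, 33,    # kamienica
    27, 36,    # niski blok
    39, 37),   # wiezowiec
  nrow = 2,
  dimnames = list(
    rooms    = c("Less than 3", "More than 2"),
    building = c("kamienica", "niski blok", "wiezowiec")
  )
)

# Column-wise shares in percent (each building type sums to 100)
shares <- round(100 * prop.table(rooms, margin = 2), 2)
shares

# Building types where flats with fewer than 3 rooms are the majority
majority_small <- colnames(rooms)[rooms["Less than 3", ] > rooms["More than 2", ]]
majority_small
```

With the original data frame, the same table would come from `table()` applied to the room category and the building type, followed by the same `prop.table()` call.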