DS Lab #1

Karol Flisikowski

2021-03-30

Data wrangling

As you can see, not all of our variables are stored in appropriate formats. We need to convert each variable to a format that matches its measurement scale and its future use.

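# Nominal variables: district and building type
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
# Number of rooms is ordinal, so store it as an ordered factor
mieszkania$rooms <- factor(mieszkania$rooms, ordered = TRUE)
# Prices are ratio-scale variables; each column is converted separately
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
# Attach only after all conversions, so the attached copy is up to date
attach(mieszkania)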
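Why the measurement scale matters can be seen on a small example. This is a sketch on a hypothetical four-row data frame `toy`, not on the `mieszkania` data: an ordered factor supports order comparisons and `max()`, while a plain factor only records categories.

```r
# Hypothetical toy data, standing in for the real `mieszkania` data frame
toy <- data.frame(
  district = c("Krzyki", "Biskupin", "Krzyki", "Srodmiescie"),
  rooms    = c(2, 4, 1, 3)
)

# Nominal scale: categories without a natural order
toy$district <- as.factor(toy$district)

# Ordinal scale: categories with a natural order (1 < 2 < 3 < 4)
toy$rooms <- factor(toy$rooms, ordered = TRUE)

# Order comparisons and max() are meaningful for the ordered factor
toy$rooms > "2"        # FALSE TRUE FALSE TRUE
max(toy$rooms)         # 4
levels(toy$district)   # "Biskupin" "Krzyki" "Srodmiescie"
```

The same comparison on an unordered factor such as `toy$district` returns `NA` with a warning, which is why ordinal variables are better stored with `ordered = TRUE`.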

Frequency table

Price range          Frequency
(350000,450000]      9
(450000,550000]      21
(550000,650000]      33
(650000,750000]      36
(750000,850000]      31
(850000,950000]      36
(950000,1050000]     21
(1050000,1150000]    10
(1150000,1250000]    2
(1250000,1350000]    1
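A frequency table of this form can be produced with `cut()` and `table()`. The sketch below assumes 100,000-wide, right-closed classes from 350,000 to 1,350,000, as read off the table above; the variable `prices` is simulated (hypothetical), so its counts will not match the lab data.

```r
set.seed(123)
# Simulated prices (hypothetical), standing in for the real price variable
prices <- round(rnorm(200, mean = 800000, sd = 170000))
# Keep the simulated values inside the class limits used in the table
prices <- pmin(pmax(prices, 350001), 1350000)

# Class limits: (350000,450000], (450000,550000], ..., (1250000,1350000]
breaks <- seq(350000, 1350000, by = 100000)

# cut() assigns each price to a right-closed interval; table() counts them
freq <- table(cut(prices, breaks = breaks, dig.lab = 7))
freq_df <- as.data.frame(freq)
names(freq_df) <- c("Range", "Frequency")
print(freq_df)
```

`dig.lab = 7` keeps the interval labels in plain notation, e.g. `(350000,450000]`, instead of scientific notation.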

TAI (tabular accuracy index)

##        # classes  Goodness of fit Tabular accuracy 
##       10.0000000        0.9780872        0.8508467
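The `##` lines above look like the output of `jenks.tests()` from the classInt package (an assumption, since the code is not shown). Both indices compare within-class deviations with deviations from the overall mean: goodness of variance fit (GVF) uses squared deviations, GVF = 1 - Σ(x_i - class mean)² / Σ(x_i - overall mean)², and the tabular accuracy index (TAI) uses absolute deviations, TAI = 1 - Σ|x_i - class mean| / Σ|x_i - overall mean|. Values close to 1 mean that the classes describe the data well. A base-R sketch of these common definitions, with a hypothetical helper `class_fit()` and simulated prices, might look like this; it is not guaranteed to reproduce the package output to the last digit.

```r
# Hypothetical helper: GVF and TAI for a classification given by `breaks`
class_fit <- function(x, breaks) {
  classes <- cut(x, breaks = breaks, include.lowest = TRUE)
  # Mean of its own class, assigned to every observation
  class_mean <- ave(x, classes, FUN = mean)
  overall_mean <- mean(x)

  # GVF: based on squared deviations
  gvf <- 1 - sum((x - class_mean)^2) / sum((x - overall_mean)^2)
  # TAI: the same idea with absolute deviations
  tai <- 1 - sum(abs(x - class_mean)) / sum(abs(x - overall_mean))

  c("# classes" = nlevels(classes),
    "Goodness of fit" = gvf,
    "Tabular accuracy" = tai)
}

set.seed(123)
prices <- runif(200, 350000, 1350000)          # simulated, hypothetical
breaks <- seq(350000, 1350000, by = 100000)    # 10 equal-width classes
class_fit(prices, breaks)
```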

Basic plots

In this section we should present our data using basic graphics (pre-installed with R). Choose the most appropriate plots for the measurement scales of the chosen variables. Investigate the heterogeneity of the distribution by presenting the data by groups (e.g. by district, building type etc.). Do not forget main titles, axis labels and legends.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
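A minimal base-graphics sketch with a title, axis labels and grouped boxplots follows. The data frame `flats` and its district-level price differences are simulated (hypothetical); in the lab the same calls would use the real price and district variables.

```r
set.seed(42)
# Hypothetical data: simulated prices in three districts
flats <- data.frame(
  district = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 50)),
  price    = c(rnorm(50, 950000, 120000),
               rnorm(50, 700000, 150000),
               rnorm(50, 720000, 100000))
)

# Histogram of the whole price distribution
h <- hist(flats$price, breaks = 10,
          main = "Distribution of flat prices",
          xlab = "Price", ylab = "Number of flats", col = "lightblue")

# Boxplots of price by district, to compare the groups side by side
b <- boxplot(price ~ district, data = flats,
             main = "Flat prices by district",
             xlab = "District", ylab = "Price", col = "lightgreen")
```

Both `hist()` and `boxplot()` also return their computed values (class counts, quartiles, group sizes), which is handy for checking what the plot shows.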

Conclusions

The histogram shows that most flats cost between about 600k and 1M, while flats priced above 1.1M are rare.
The boxplot shows that apartments in Biskupin are considerably more expensive than in the other districts, while flats in Krzyki are the cheapest.

ggplot2 plots
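A ggplot2 version of the comparison of prices by number of rooms could look like the sketch below. It assumes the ggplot2 package is installed; the data are simulated (hypothetical), with prices built to rise with the number of rooms, so the plot only illustrates the technique and is not evidence for the conclusion that follows.

```r
library(ggplot2)

set.seed(7)
# Hypothetical data: prices rise with the number of rooms (simulated)
flats <- data.frame(rooms = factor(sample(1:4, 200, replace = TRUE), ordered = TRUE))
flats$price <- 400000 + 150000 * as.integer(flats$rooms) + rnorm(200, 0, 80000)

# One boxplot per number of rooms, with title, axis labels and legend
p <- ggplot(flats, aes(x = rooms, y = price, fill = rooms)) +
  geom_boxplot() +
  labs(title = "Flat prices by number of rooms",
       x = "Number of rooms", y = "Price", fill = "Rooms")
print(p)
```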

Conclusions

The results are not surprising at all: the more rooms an apartment has, the more expensive it tends to be.

Using facets

Faceting generates small multiples each showing a different subset of the data. Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. Read more about facets here.
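With ggplot2, small multiples are created by adding `facet_wrap()` (or `facet_grid()`) to a plot. In this sketch, which assumes ggplot2 is installed and uses simulated (hypothetical) prices, each district gets its own panel with shared axes, so the panels can be compared directly.

```r
library(ggplot2)

set.seed(11)
# Hypothetical data: 15 simulated flats for every district x rooms combination
flats <- expand.grid(district = c("Biskupin", "Krzyki", "Srodmiescie"),
                     rooms = factor(1:4), rep = 1:15)
flats$price <- 350000 + 150000 * as.integer(flats$rooms) +
  ifelse(flats$district == "Biskupin", 200000, 0) +
  rnorm(nrow(flats), 0, 70000)

# One small panel per district
p <- ggplot(flats, aes(x = rooms, y = price)) +
  geom_boxplot() +
  facet_wrap(~ district) +
  labs(title = "Flat prices by number of rooms, per district",
       x = "Number of rooms", y = "Price")
print(p)
```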

Conclusions

Krzyki has the largest number of 4-room apartments, but the most expensive 4-room apartments are in Biskupin.

Descriptive statistics #1

Before reporting the full summary table of descriptive statistics automatically, this time your goal is to measure the central tendency of the price distribution. Compare the mean, median and mode together with positional measures (quantiles) by district and by building type or number of rooms per apartment.
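Base R has no function for the statistical mode: `mode()` returns an object's storage mode. The sketch below therefore defines a hypothetical helper `stat_mode()` and computes the mean, median, mode and quartiles of simulated (hypothetical) prices by building type. Prices are rounded to the nearest 10,000 first, because for a continuous variable the mode is only meaningful after grouping values.

```r
# Hypothetical helper: all values with the highest frequency (ties allowed)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

set.seed(5)
# Hypothetical data: simulated prices, rounded to the nearest 10,000
flats <- data.frame(
  building_type = rep(c("kamienica", "niski blok", "wiezowiec"), each = 40),
  price = round(c(rnorm(40, 180000, 40000),
                  rnorm(40, 190000, 40000),
                  rnorm(40, 165000, 40000)), -4)
)

# Mean, median and (first) mode of prices in each building type
central <- aggregate(price ~ building_type, data = flats,
                     FUN = function(x) c(mean = mean(x), median = median(x),
                                         mode = stat_mode(x)[1]))
print(central)

# Positional measures: quartiles of prices in each building type
tapply(flats$price, flats$building_type, quantile, probs = c(0.25, 0.5, 0.75))
```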

Conclusions

Biskupin is the most expensive district, while prices in Srodmiescie and Krzyki are almost the same, except for the price range, which is considerably wider in Krzyki.

Summary tables with ‘kable’

Using the kable and kableExtra packages, we can easily create summary tables with graphics and/or statistics.
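A basic summary table can be produced with `knitr::kable()` alone; kableExtra functions such as `kable_styling()` can then be chained onto an HTML or LaTeX table for styling. The sketch assumes knitr is installed and uses simulated (hypothetical) prices; `format = "pipe"` is chosen only so that the table also prints as plain text in the console.

```r
library(knitr)

set.seed(3)
# Hypothetical data: simulated prices for flats with 1-4 rooms
flats <- data.frame(rooms = sample(1:4, 100, replace = TRUE))
flats$price <- 400000 + 150000 * flats$rooms + rnorm(100, 0, 60000)

# One row of summary statistics per number of rooms
summary_by_rooms <- do.call(rbind, lapply(split(flats$price, flats$rooms), function(x) {
  data.frame(n = length(x), mean = mean(x), median = median(x), sd = sd(x))
}))
summary_by_rooms <- cbind(rooms = rownames(summary_by_rooms), summary_by_rooms)

tab <- kable(summary_by_rooms, format = "pipe", digits = 0, row.names = FALSE,
             caption = "Flat prices by number of rooms (simulated data)")
print(tab)
```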

Table: inline mini-plots by number of rooms (1 to 4), with columns boxplot, histogram, line1, line2 and points1.

Finally, we summarize the basic central tendency measures for prices by district or building type using the kable packages. You can customize your final report. See some hints here.

                  kamienica (N = 61)        niski blok (N = 63)       wiezowiec (N = 76)
Prices:
  min price       96258                     114905                    83280
  median price    185346                    187013                    157107.5
  max price       284918                    295762                    252418
  mean (sd)       178,317.70 ± 42,682.42    188,790.90 ± 40,831.08    163,363.18 ± 42,246.11
  kurtosis        -0.61                     -0.45                     -0.97
  skewness        0                         0.22                      0.21
Size:
  min size        17                        17.4                      17.4
  median size     46.2                      49.6                      41.85
  max size        87.5                      87.7                      85.7
  mean (sd)       48.37 ± 20.92             49.13 ± 18.99             42.02 ± 19.82
  kurtosis        -1.09                     -0.7                      -1.03
  skewness        0.13                      0.15                      0.36
Rooms:
  One room        11 (18.03%)               9 (14.29%)                24 (31.58%)
  Two rooms       17 (27.87%)               18 (28.57%)               15 (19.74%)
  Three rooms     17 (27.87%)               20 (31.75%)               21 (27.63%)
  Four rooms      16 (26.23%)               16 (25.40%)               16 (21.05%)
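The statistics in this table can also be computed directly in base R. The sketch uses the moment-based definitions of skewness and excess kurtosis (one common convention; the original table may have been produced with a package that uses another estimator, so values can differ slightly). The helper names `skewness_m()`, `kurtosis_excess()` and `describe()` are hypothetical, and the data are simulated with the same group sizes as the table.

```r
# Moment-based skewness: third central moment over sd^3 (population form)
skewness_m <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}
# Excess kurtosis: fourth central moment over variance^2, minus 3
kurtosis_excess <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}
# Statistics listed in the summary table for one numeric variable
describe <- function(x) {
  c(min = min(x), median = median(x), max = max(x),
    mean = mean(x), sd = sd(x),
    kurtosis = kurtosis_excess(x), skewness = skewness_m(x))
}

set.seed(9)
# Hypothetical data with the group sizes of the table (61, 63, 76)
flats <- data.frame(
  building_type = rep(c("kamienica", "niski blok", "wiezowiec"), times = c(61, 63, 76)),
  price = runif(200, 85000, 295000),
  size  = runif(200, 17, 88),
  rooms = sample(1:4, 200, replace = TRUE)
)

# One column per building type, one row per statistic
price_stats <- round(sapply(split(flats$price, flats$building_type), describe), 2)
size_stats  <- round(sapply(split(flats$size,  flats$building_type), describe), 2)
# Percentage of each number of rooms within each building type (columns sum to 100)
room_pct <- round(100 * prop.table(table(flats$rooms, flats$building_type), margin = 2), 2)
print(price_stats); print(size_stats); print(room_pct)
```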

Conclusions

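On average, apartments in niskie bloki (low-rise blocks) are the most expensive, while those in wieżowce (high-rise blocks) are the cheapest. This may be partly due to differences in apartment size between these building types: the mean size is 49.13 in niskie bloki but only 42.02 in wieżowce.
One-room apartments are the least common category in niskie bloki (14.29%) and kamienice (tenement houses, 18.03%), whereas in wieżowce they are the most common category (31.58%). The largest share of 4-room flats is found in kamienice (26.23%).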