DS Lab #1

Daria Skarbek 184869

2021-03-24

Data wrangling

As you can see, not all variables have the appropriate format yet. We need to convert each variable to a type that matches its measurement scale and its intended use.

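# Nominal variables become factors
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
# Optional: treat the number of rooms as an ordered factor
# mieszkania$rooms <- factor(mieszkania$rooms, ordered = TRUE)
# Prices are ratio-scale, so store them as numeric
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
# Put the columns on the search path so they can be referenced by name
attach(mieszkania)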
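
The commented-out line above hints at treating the number of rooms as an ordinal variable. The following is a minimal sketch of what these conversions do, run on a small hypothetical data frame (`toy` and its values are made up for illustration and are not part of the lab data).

```r
# Hypothetical toy data; prices arrive as text, as often happens after import
toy <- data.frame(
  district = c("Krzyki", "Biskupin", "Krzyki"),
  rooms = c(3, 1, 2),
  price_PLN = c("650000", "420000", "540000"),
  stringsAsFactors = FALSE
)

toy$district <- as.factor(toy$district)             # nominal scale
toy$rooms <- factor(toy$rooms, ordered = TRUE)      # ordinal scale
toy$price_PLN <- as.numeric(toy$price_PLN)          # ratio scale

# First class of every column after conversion
classes <- sapply(toy, function(col) class(col)[1])
print(classes)

# An ordered factor supports comparisons between its levels
rooms_over_one <- toy$rooms > "1"
print(rooms_over_one)
```

An ordered factor keeps the ranking of the levels, which is what makes the comparison in the last step meaningful; a plain factor would return NA with a warning.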

Frequency table

##           Price.in.PLN Number.of.flats Proportion
## 1    (3.5e+05,4.5e+05]               9      0.045
## 2    (4.5e+05,5.5e+05]              21      0.105
## 3    (5.5e+05,6.5e+05]              33      0.165
## 4    (6.5e+05,7.5e+05]              36      0.180
## 5    (7.5e+05,8.5e+05]              31      0.155
## 6    (8.5e+05,9.5e+05]              36      0.180
## 7   (9.5e+05,1.05e+06]              21      0.105
## 8  (1.05e+06,1.15e+06]              10      0.050
## 9  (1.15e+06,1.25e+06]               2      0.010
## 10 (1.25e+06,1.35e+06]               1      0.005
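
The code that produced this table is not shown in the report. As a rough sketch, a table with the same layout could be built in base R with cut() and table(). The prices below are simulated, and the break points are an assumption read off the interval labels above.

```r
# Simulated prices (hypothetical data, not the lab data set)
set.seed(1)
price <- runif(200, min = 350001, max = 1350000)

# Class limits assumed from the interval labels: 350k to 1.35M in 100k steps
breaks <- seq(350000, 1350000, by = 100000)
bins <- cut(price, breaks = breaks)   # right-closed intervals (a, b]

freq <- table(bins)
freq_table <- data.frame(
  Price.in.PLN = names(freq),
  Number.of.flats = as.vector(freq),
  Proportion = as.vector(prop.table(freq))
)
print(freq_table)
```

cut() uses right-closed intervals by default, which matches the (a, b] labels in the output above.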

TAI

##        # classes  Goodness of fit Tabular accuracy 
##       10.0000000        0.9780872        0.8508467
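
The labels in this output resemble what jenks.tests() from the classInt package prints, although the report does not show which function was used, so that is an assumption. The sketch below computes both indices by hand in base R on simulated prices, using the usual definitions: tabular accuracy compares absolute deviations from class means with absolute deviations from the overall mean, and goodness of fit does the same with squared deviations.

```r
# Simulated prices and assumed class limits (hypothetical, for illustration)
set.seed(1)
price <- runif(200, min = 350001, max = 1350000)
breaks <- seq(350000, 1350000, by = 100000)
class_id <- droplevels(cut(price, breaks = breaks))  # drop any empty classes

sum_abs_dev <- function(x) sum(abs(x - mean(x)))
sum_sq_dev <- function(x) sum((x - mean(x))^2)

# Tabular accuracy index: 1 - (within-class abs. deviation / total abs. deviation)
tai <- 1 - sum(tapply(price, class_id, sum_abs_dev)) / sum_abs_dev(price)

# Goodness of variance fit: same idea with squared deviations
gvf <- 1 - sum(tapply(price, class_id, sum_sq_dev)) / sum_sq_dev(price)

print(c("# classes" = nlevels(class_id),
        "Goodness of fit" = gvf,
        "Tabular accuracy" = tai))
```

Both indices approach 1 when the classes are internally homogeneous, so higher values indicate that the chosen intervals summarize the data with little loss.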

Basic plots

In this section we present the data using base R graphics (the graphics package that ships with R). Choose the plot type that suits the measurement scale of each variable. Investigate the heterogeneity of the distribution by presenting the data by groups (e.g. by district or building type). Do not forget main titles, axis labels and a legend.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
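
The plotting code itself is hidden in this report. As one possible approach, the sketch below draws a grouped boxplot with base graphics, including a title, axis labels and a legend. The data frame `flats` is simulated and its column names are assumptions modelled on the variables used earlier.

```r
# Simulated flats (hypothetical data, not the lab data set)
set.seed(7)
flats <- data.frame(
  district = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 30)),
  price_PLN = round(rnorm(90, mean = 750000, sd = 180000))
)

fill_cols <- c("lightblue", "lightgreen", "lightsalmon")

# Boxplot of a numeric variable split by a factor
bp <- boxplot(
  price_PLN ~ district,
  data = flats,
  main = "Flat prices by district (simulated data)",
  xlab = "District",
  ylab = "Price (PLN)",
  col = fill_cols
)

legend("topright", legend = levels(flats$district),
       fill = fill_cols, title = "District")
```

A boxplot suits a numeric variable compared across the levels of a factor; for a single factor on its own, barplot(table(...)) would be the natural base-graphics choice.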

ggplot2 plots

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
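
The message above appears when geom_histogram() is called without a bin specification. A minimal sketch of setting the bin width explicitly follows; the data are simulated, and the 100,000 PLN width is an assumption chosen to mirror the frequency table classes.

```r
library(ggplot2)

# Simulated prices (hypothetical data, not the lab data set)
set.seed(3)
flats <- data.frame(price_PLN = round(rnorm(200, mean = 750000, sd = 180000)))

p <- ggplot(flats, aes(x = price_PLN)) +
  # An explicit binwidth replaces the default of 30 bins
  geom_histogram(binwidth = 100000, boundary = 350000,
                 fill = "steelblue", colour = "white") +
  labs(title = "Distribution of flat prices (simulated data)",
       x = "Price (PLN)", y = "Number of flats")

print(p)
```

With binwidth given, ggplot2 no longer falls back to its default and the message is not printed.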

Using facets

Faceting generates small multiples, each showing a different subset of the data. Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. Read more about facets in the ggplot2 documentation.

## `summarise()` has grouped output by 'district', 'rooms'. You can override using the `.groups` argument.
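
The faceted plot is not reproduced in this text version. As an illustration of the idea, the sketch below splits a price histogram into one panel per district with facet_wrap(). The data and column names are simulated assumptions, not the lab data.

```r
library(ggplot2)

# Simulated flats (hypothetical data, not the lab data set)
set.seed(5)
flats <- data.frame(
  district = rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 60),
  price_PLN = round(rnorm(180, mean = 750000, sd = 180000))
)

p <- ggplot(flats, aes(x = price_PLN)) +
  geom_histogram(binwidth = 100000, fill = "steelblue", colour = "white") +
  # One panel per district, stacked vertically so the x axes line up
  facet_wrap(~ district, ncol = 1) +
  labs(title = "Flat prices by district (simulated data)",
       x = "Price (PLN)", y = "Number of flats")

print(p)
```

Stacking the panels in one column keeps a shared price axis, which makes shifts in location between districts easier to see.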

Descriptive statistics #1

Before reporting the full summary table of descriptive statistics, the goal here is to measure the central tendency of the price distribution. Compare the mean, median and mode, together with positional measures (quantiles), by district, building type or number of rooms per apartment.
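
One way to carry out this comparison in base R is sketched below on simulated data. The helper `stat_mode()` is a hypothetical name introduced here, since base R has no built-in function for the statistical mode. Because price is continuous, the mode is taken over 100,000 PLN classes, which is an assumption about how one might define it.

```r
# Simulated flats (hypothetical data, not the lab data set)
set.seed(42)
flats <- data.frame(
  district = rep(c("Biskupin", "Krzyki", "Srodmiescie"), length.out = 90),
  rooms = sample(1:4, 90, replace = TRUE)
)
flats$price_PLN <- 400000 + 150000 * flats$rooms + rnorm(90, sd = 60000)

# Hypothetical helper: most frequent value (first one in case of ties)
stat_mode <- function(x) {
  counts <- table(x)
  names(counts)[which.max(counts)]
}

# Mean, median and quartiles of price for each district
central <- sapply(split(flats$price_PLN, flats$district), function(p) {
  c(Mean = mean(p), Median = median(p), quantile(p, c(0.25, 0.75)))
})
print(round(central))

# Modal price class per district, using 100k-wide classes
modal_class <- tapply(flats$price_PLN, flats$district, function(p) {
  stat_mode(cut(p, breaks = seq(0, 2000000, by = 100000)))
})
print(modal_class)
```

Replacing `flats$district` with `flats$rooms` in split() and tapply() gives the same comparison by number of rooms.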

Summary tables with ‘kable’

Using kable() from the knitr package together with the kableExtra package, we can easily create summary tables that contain inline graphics and/or statistics.

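[Table: one row per number of rooms (1 to 4), with columns rooms, boxplot, histogram, line1, line2 and points1. The inline plots are not reproduced in this text version.]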

Now we can finally summarize the basic central tendency measures for prices by district, number of rooms and building type using kable() and kableExtra. You can customize your final report; see the kableExtra documentation for some hints.

Flats in Wroclaw by district (prices in PLN)

            Biskupin        Krzyki          Srodmiescie
Min         519652          359769          448196
Max         1277691         1090444         1062054
Mean        818614          726507          739340
Sd          175598          195015          171428
IQR         249723          276126          278465
Q1          676751          600180.5        592287.75
Median      817736          716726          727477.5
Q3          926474          876306.5        870752.5
Min Size    17.1            17.4            17
Max Size    87.7            86.6            83.3
Mean Size   47.05 ± 19.57   46.86 ± 20.95   44.27 ± 19.63

Flats in Wroclaw by number of rooms (prices in PLN)

            1 room          2 rooms         3 rooms         4 rooms
Min         359769          590286          632770          736669
Max         657146          888634          965829          1277691
Mean        515518          683568          833706          974810
Sd          66951           65073           86944           113819
IQR         75340           82971           131395          141605
Q1          479684.75       634757.25       769683.75       909371.5
Median      520507          677260          846303.5        964338.5
Q3          555024.75       717728.5        901078.75       1050976.75
Min Size    17              29.6            41.2            53.3
Max Size    21.9            43.7            65.2            87.7
Mean Size   19.28 ± 1.46    36.80 ± 4.46    53.33 ± 7.21    72.05 ± 10.18

Flats in Wroclaw by building type (prices in PLN)

            Kamienica       Wiezowiec       Niski blok
Min         415834          496390          359769
Max         1230848         1277691         1090444
Mean        770333          815577          705729
Sd          184388          176390          182503
IQR         248430          246927          314954
Q1          647756          692925.5        555798.25
Median      800693          807895          678704
Q3          896186          939852.5        870752.5
Min Size    17              17.4            17.4
Max Size    87.5            87.7            85.7
Mean Size   48.37 ± 20.92   49.13 ± 18.99   42.02 ± 19.82
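
The code behind these tables is not included in the report. A minimal sketch of how a similar price summary could be assembled and passed to knitr's kable() is shown below. The data are simulated, and the styling added with kableExtra in the original is left out.

```r
library(knitr)

# Simulated flats (hypothetical data, not the lab data set)
set.seed(11)
flats <- data.frame(
  district = rep(c("Biskupin", "Krzyki", "Srodmiescie"), each = 40),
  price_PLN = round(rnorm(120, mean = 750000, sd = 180000))
)

# One column of statistics per district
stats <- sapply(split(flats$price_PLN, flats$district), function(p) {
  c(Min = min(p), Max = max(p), Mean = mean(p), Sd = sd(p), IQR = IQR(p),
    Q1 = unname(quantile(p, 0.25)),
    Median = median(p),
    Q3 = unname(quantile(p, 0.75)))
})

# kable() turns the matrix into a table for the rendered report
tab <- kable(stats, digits = 0, caption = "Flats in Wroclaw (simulated data)")
print(tab)
```

Rounding with digits = 0 would avoid the mix of integer and fractional values that appears in the quantile rows above.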