Data wrangling
As you can see, not all variables have an appropriate format yet. We need to convert each variable to a type that matches its measurement scale and its intended use.
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
Frequency table
## Price.in.PLN Number.of.flats Proportion
## 1 (3.5e+05,4.5e+05] 9 0.045
## 2 (4.5e+05,5.5e+05] 21 0.105
## 3 (5.5e+05,6.5e+05] 33 0.165
## 4 (6.5e+05,7.5e+05] 36 0.180
## 5 (7.5e+05,8.5e+05] 31 0.155
## 6 (8.5e+05,9.5e+05] 36 0.180
## 7 (9.5e+05,1.05e+06] 21 0.105
## 8 (1.05e+06,1.15e+06] 10 0.050
## 9 (1.15e+06,1.25e+06] 2 0.010
## 10 (1.25e+06,1.35e+06] 1 0.005
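A table with this layout can be built in base R with `cut()` and `table()`. The sketch below is only an illustration: the prices are synthetic (randomly generated, not the real `mieszkania` data), and the class limits are read off the interval labels above.

```r
# Synthetic stand-in for mieszkania$price_PLN (made-up values, not the real data)
set.seed(1)
price_PLN <- round(runif(200, min = 360000, max = 1340000))

# Class limits every 100 000 PLN, from 350 000 to 1 350 000
breaks <- seq(350000, 1350000, by = 100000)

# cut() assigns each price to a right-closed interval such as (3.5e+05,4.5e+05]
price_class <- cut(price_PLN, breaks = breaks)

# Count the flats per class and convert the counts to proportions
counts <- table(price_class)
freq_table <- data.frame(
  Price.in.PLN    = names(counts),
  Number.of.flats = as.vector(counts),
  Proportion      = as.vector(counts) / length(price_PLN)
)
print(freq_table)
```

Because the intervals are right-closed, a flat priced at exactly 450 000 PLN is counted in the first class, not the second.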
TAI
## # classes Goodness of fit Tabular accuracy
## 10.0000000 0.9780872 0.8508467
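Both measures compare the deviations inside the classes with the deviations around the overall mean: the goodness of variance fit (GVF) uses squared deviations, and the tabular accuracy index (TAI) uses absolute deviations. The base R sketch below assumes these standard definitions; the helper `class_fit()` is hypothetical, the prices are synthetic, and the original output may well have come from a package function instead.

```r
# Synthetic prices (made-up values standing in for mieszkania$price_PLN)
set.seed(1)
price_PLN <- round(runif(200, min = 360000, max = 1340000))
breaks <- seq(350000, 1350000, by = 100000)

# Hypothetical helper: replace every observation by the mean of its class,
# then compare within-class deviations with deviations around the overall mean.
class_fit <- function(x, breaks) {
  cls <- cut(x, breaks = breaks, include.lowest = TRUE)
  class_mean <- ave(x, cls)  # ave() uses FUN = mean by default
  c(
    classes = nlevels(cls),
    GVF = 1 - sum((x - class_mean)^2) / sum((x - mean(x))^2),
    TAI = 1 - sum(abs(x - class_mean)) / sum(abs(x - mean(x)))
  )
}

fit <- class_fit(price_PLN, breaks)
print(fit)
```

Values close to 1 mean that the class means describe the data almost as well as the individual observations; a single class covering all the data gives 0 for both measures.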
Basic plots
In this section we present our data using basic graphics (those pre-installed with R). Choose the most appropriate plots according to the scale of the chosen variables. Investigate the heterogeneity of the distribution by presenting the data by groups (e.g. by district, building type etc.). Do not forget about main titles, axis labels and a legend.
The first plot is a histogram. It shows the spread of flat prices in Wroclaw: values range from about 350 000 up to 1 300 000 PLN, and prices are concentrated around 700 000 to 900 000 PLN. The density lines show the distribution of prices in the separate districts of Wroclaw. They follow a similar pattern; however, Biskupin shows a higher density around 900 000 PLN than the other districts.
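A histogram with per-district density lines like the one described here could be drawn with base graphics roughly as follows. This is a sketch on synthetic data: the `flats` data frame, the colours and the titles are assumptions, not the code behind the original figure.

```r
# Synthetic stand-in for the mieszkania data (made-up prices)
set.seed(2)
flats <- data.frame(
  district  = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"),
                         times = c(65, 79, 56))),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# Histogram on the density scale so that density lines can be overlaid
h <- hist(flats$price_PLN, freq = FALSE, breaks = 10,
          main = "Prices of flats in Wroclaw",
          xlab = "Price (PLN)", ylab = "Density", col = "grey90")

# One kernel density line per district
cols <- c("red", "darkgreen", "blue")
for (i in seq_along(levels(flats$district))) {
  in_district <- flats$district == levels(flats$district)[i]
  lines(density(flats$price_PLN[in_district]), col = cols[i], lwd = 2)
}

legend("topright", legend = levels(flats$district), col = cols, lwd = 2)
```

Setting `freq = FALSE` matters here: with the default count scale, the density lines would be drawn near zero and be invisible.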
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
The boxplot of prices by district gives a more precise picture of each distribution. It makes it easier to read off the prices within each district separately, whereas the histogram is better at showing the general pattern. Mean prices are very similar across districts, as are the medians (represented by a blue star). Biskupin has a slightly higher mean and median. It also covers a wider price range, with flats there reaching higher prices than in the other districts. None of the districts has any outliers: even the higher prices in Biskupin fall within the fourth quartile of prices.
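A grouped boxplot with a star marker can be sketched in base R as below. The data are synthetic, and the star is placed at the group mean, which is an assumption (the box already draws the median as a line); use `median` in `tapply()` if the star should mark the median instead.

```r
# Synthetic stand-in for the mieszkania data (made-up prices)
set.seed(2)
flats <- data.frame(
  district  = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"),
                         times = c(65, 79, 56))),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# One box per district; boxplot() returns the statistics it has drawn
b <- boxplot(price_PLN ~ district, data = flats,
             main = "Prices of flats by district",
             xlab = "District", ylab = "Price (PLN)")

# Mark the group means with a blue star (pch = 8)
means <- tapply(flats$price_PLN, flats$district, mean)
points(seq_along(means), means, pch = 8, col = "blue")

# Points beyond the whiskers (1.5 * IQR rule) are reported in b$out
cat("Number of outliers:", length(b$out), "\n")
```

An empty `b$out` corresponds to the statement that no district records any outliers.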
ggplot2 plots
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Using facets
Faceting generates small multiples each showing a different subset of the data. Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. Read more about facets here.
## `summarise()` has grouped output by 'district', 'rooms'. You can override using the `.groups` argument.
Descriptive statistics #1
Before automatically reporting the full summary table of descriptive statistics, your goal this time is to measure the central tendency of the distribution of prices. Compare the mean, median and mode together with positional measures (quantiles) by district, building type or number of rooms per apartment.
## [1] 359769
## [1] 1277691
## [1] 760035
## [1] 755719.5
## [1] 186099.8
## [1] 34633125960
## [1] 282686.5
## 25%
## 619073.8
## 75%
## 901760.2
## [1] 141343.2
## [1] 0.2448568
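Judging by the values, the output above appears to list the minimum, maximum, mean, median, standard deviation, variance, interquartile range, first and third quartiles, quartile deviation (IQR / 2) and coefficient of variation (sd / mean). The sketch below computes the same measures in base R on synthetic prices, so the numbers will differ from those above.

```r
# Synthetic prices (made-up values standing in for mieszkania$price_PLN)
set.seed(3)
price_PLN <- round(rnorm(200, mean = 760000, sd = 186000))

stats <- c(
  min    = min(price_PLN),
  max    = max(price_PLN),
  mean   = mean(price_PLN),
  median = median(price_PLN),
  sd     = sd(price_PLN),
  var    = var(price_PLN),
  IQR    = IQR(price_PLN),
  Q1     = unname(quantile(price_PLN, 0.25)),
  Q3     = unname(quantile(price_PLN, 0.75)),
  # Quartile deviation: half of the interquartile range
  quartile_dev = IQR(price_PLN) / 2,
  # Coefficient of variation: spread relative to the mean
  cv     = sd(price_PLN) / mean(price_PLN)
)
print(stats)
```

The coefficient of variation is unit-free, which makes it convenient for comparing the spread of prices in PLN and in EUR.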
Summary tables with ‘kable’
Using kable (from knitr) and the kableExtra package, we can easily create summary tables with graphics and/or statistics.

| Rooms | Price.histogram | Price.boxplot | Price.lineplot |
|---|---|---|---|
| 1 | | | |
| 2 | | | |
| 3 | | | |
| 4 | | | |

| District | Price.histogram | Price.boxplot | Price.lineplot |
|---|---|---|---|
| Biskupin | | | |
| Krzyki | | | |
| Srodmiescie | | | |

| Building type | Price.histogram | Price.boxplot | Price.lineplot |
|---|---|---|---|
| Kamienica | | | |
| Niski blok | | | |
| Wiezowiec | | | |

Finally, we summarize the basic central tendency measures of prices by district, number of rooms and building type using kable and kableExtra. You can customize your final report. See some hints here.

FLATS IN DISTRICTS OF WROCLAW

| | Biskupin (n = 65) | Krzyki (n = 79) | Srodmiescie (n = 56) |
|---|---|---|---|
| Price of flat (PLN) | | | |
| Min | 519652 | 359769 | 448196 |
| Max | 1277691 | 1090444 | 1062054 |
| Mean | 818614 | 726507 | 739340 |
| Sd | 175598 | 195015 | 171428 |
| IQR | 249723 | 276126 | 278465 |
| Q1 | 676751 | 600180.5 | 592287.75 |
| Median | 817736 | 716726 | 727477.5 |
| Q3 | 926474 | 876306.5 | 870752.5 |
| Size of flat | | | |
| Min Size | 17.1 | 17.4 | 17 |
| Max Size | 87.7 | 86.6 | 83.3 |
| Mean Size | 47.05 \(\pm\) 19.57 | 46.86 \(\pm\) 20.95 | 44.27 \(\pm\) 19.63 |

FLATS PER ROOMS IN WROCLAW

| | 1 Room (n = 44) | 2 Rooms (n = 50) | 3 Rooms (n = 58) | 4 Rooms (n = 48) |
|---|---|---|---|---|
| Price of flat (PLN) | | | | |
| Min | 359769 | 590286 | 632770 | 736669 |
| Max | 657146 | 888634 | 965829 | 1277691 |
| Mean | 515518 | 683568 | 833706 | 974810 |
| Sd | 66951 | 65073 | 86944 | 113819 |
| IQR | 75340 | 82971 | 131395 | 141605 |
| Q1 | 479684.75 | 634757.25 | 769683.75 | 909371.5 |
| Median | 520507 | 677260 | 846303.5 | 964338.5 |
| Q3 | 555024.75 | 717728.5 | 901078.75 | 1050976.75 |
| Size of flat | | | | |
| Min Size | 17 | 29.6 | 41.2 | 53.3 |
| Max Size | 21.9 | 43.7 | 65.2 | 87.7 |
| Mean Size | 19.28 \(\pm\) 1.46 | 36.80 \(\pm\) 4.46 | 53.33 \(\pm\) 7.21 | 72.05 \(\pm\) 10.18 |

FLATS IN BUILDINGS OF WROCLAW

| | Kamienica (n = 61) | Niski blok (n = 63) | Wiezowiec (n = 76) |
|---|---|---|---|
| Price of flat (PLN) | | | |
| Min | 415834 | 496390 | 359769 |
| Max | 1230848 | 1277691 | 1090444 |
| Mean | 770333 | 815577 | 705729 |
| Sd | 184388 | 176390 | 182503 |
| IQR | 248430 | 246927 | 314954 |
| Q1 | 647756 | 692925.5 | 555798.25 |
| Median | 800693 | 807895 | 678704 |
| Q3 | 896186 | 939852.5 | 870752.5 |
| Size of flat | | | |
| Min Size | 17 | 17.4 | 17.4 |
| Max Size | 87.5 | 87.7 | 85.7 |
| Mean Size | 48.37 \(\pm\) 20.92 | 49.13 \(\pm\) 18.99 | 42.02 \(\pm\) 19.82 |
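
The numeric part of such grouped tables can be prepared in base R before it is passed to a table formatter. The sketch below uses synthetic data and a hypothetical helper `summarise_price()`; it only illustrates the idea and is not the code that produced the tables above.

```r
# Synthetic stand-in for the mieszkania data (made-up prices)
set.seed(4)
flats <- data.frame(
  district  = factor(rep(c("Biskupin", "Krzyki", "Srodmiescie"),
                         times = c(65, 79, 56))),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# Hypothetical helper: the same measures as the rows of the tables above
summarise_price <- function(x) {
  c(
    Min    = min(x),
    Max    = max(x),
    Mean   = mean(x),
    Sd     = sd(x),
    IQR    = IQR(x),
    Q1     = unname(quantile(x, 0.25)),
    Median = median(x),
    Q3     = unname(quantile(x, 0.75))
  )
}

# Split prices by district and summarise each group;
# the result is a matrix with one column per district
by_district <- sapply(split(flats$price_PLN, flats$district), summarise_price)
print(round(by_district))
```

The resulting matrix has the measures in rows and the districts in columns, so it can be handed to `knitr::kable()` for rendering; replacing `flats$district` with another factor gives the same summary for a different grouping.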