Wrangling data
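```r
# Assumes `mieszkania` has already been read from mieszkania.csv
# Categorical variables become factors; rooms is ordinal, so it is an ordered factor
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
mieszkania$rooms <- factor(mieszkania$rooms, ordered = TRUE)
# Prices must be numeric before any statistics are computed
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
# attach() works on a copy of the data frame, so call it only after all conversions
attach(mieszkania)
```
Frequency table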
The first part of the analysis groups the apartment prices (price_PLN) from the mieszkania.csv file into a frequency table. The prices are grouped into bins of width 100,000 PLN.
## limits Freq Rel_Freq Cum_Freq
## 1 (3.5e+05,4.5e+05] 9 0.045 9
## 2 (4.5e+05,5.5e+05] 21 0.105 30
## 3 (5.5e+05,6.5e+05] 33 0.165 63
## 4 (6.5e+05,7.5e+05] 36 0.180 99
## 5 (7.5e+05,8.5e+05] 31 0.155 130
## 6 (8.5e+05,9.5e+05] 36 0.180 166
## 7 (9.5e+05,1.05e+06] 21 0.105 187
## 8 (1.05e+06,1.15e+06] 10 0.050 197
## 9 (1.15e+06,1.25e+06] 2 0.010 199
## 10 (1.25e+06,1.35e+06] 1 0.005 200
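A frequency table of this shape can be produced in base R with `cut()` and `table()`. The sketch below assumes bins of width 100,000 PLN starting at 350,000 (read off the printed limits) and uses a synthetic price vector in place of `mieszkania$price_PLN`, so its counts will not match the table above.

```r
# Synthetic stand-in for mieszkania$price_PLN (NOT the real data)
set.seed(1)
price_PLN <- round(runif(200, min = 359769, max = 1277691))

# Bin edges every 100,000 PLN, matching the printed limits
breaks <- seq(350000, 1350000, by = 100000)
limits <- cut(price_PLN, breaks = breaks)  # right-closed intervals (a, b]

# Absolute, relative and cumulative frequencies
freq_table <- as.data.frame(table(limits))
freq_table$Rel_Freq <- freq_table$Freq / sum(freq_table$Freq)
freq_table$Cum_Freq <- cumsum(freq_table$Freq)
print(freq_table)
```

Because `cut()` uses right-closed intervals by default, a price exactly equal to a bin edge is counted in the lower bin.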
The next tables show the number of apartments in each district and of each building type, respectively.
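Counts like these come from `table()` on a factor. The sketch uses synthetic factors with the district and building-type labels that appear later in this report; the counts are illustrative only.

```r
# Synthetic stand-in for the factor columns of mieszkania (NOT the real data)
set.seed(2)
districts <- c("Biskupin", "Krzyki", "Srodmiescie")
types <- c("kamienica", "niski blok", "wiezowiec")
district <- factor(sample(districts, 200, replace = TRUE), levels = districts)
building_type <- factor(sample(types, 200, replace = TRUE), levels = types)

# Number of apartments per district and per building type
district_counts <- table(district)
type_counts <- table(building_type)
print(district_counts)
print(type_counts)

# The same district counts as shares of all apartments
print(prop.table(district_counts))
```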
Tabular accuracy
## # classes Goodness of fit Tabular accuracy
## 10.0000000 0.9780872 0.8508467
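Both indices compare the spread of prices inside the classes with their overall spread. Under the usual definitions, the goodness of variance fit is GVF = 1 - Σ(x - class mean)² / Σ(x - overall mean)² and the tabular accuracy index is TAI = 1 - Σ|x - class mean| / Σ|x - overall mean|; values close to 1 mean the classes describe the data well. The code that produced the output above is not shown, so the helper `class_fit()` below is a hypothetical sketch of these formulas and may differ in detail from the package that was used.

```r
# Hypothetical helper: goodness of variance fit (GVF) and tabular accuracy
# index (TAI) of the classification of x by the given break points
class_fit <- function(x, breaks) {
  cls <- cut(x, breaks = breaks)
  # Replace every value by the mean of its own class
  class_mean <- ave(x, cls, FUN = mean)
  gvf <- 1 - sum((x - class_mean)^2) / sum((x - mean(x))^2)   # squared deviations
  tai <- 1 - sum(abs(x - class_mean)) / sum(abs(x - mean(x)))  # absolute deviations
  c(classes = nlevels(cls), GVF = gvf, TAI = tai)
}

# Toy example: two well-separated groups
# (GVF = 1 - 4/125.5, TAI = 1 - 4/27)
x <- c(1, 2, 3, 10, 11, 12)
class_fit(x, breaks = c(0, 5, 15))
```

Since TAI uses absolute rather than squared deviations, it is less dominated by a few extreme prices, which is why it is usually lower than GVF for the same classes.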
Basic plots
This section shows some basic plots.
The first plot is a histogram whose bins are chosen with the Freedman-Diaconis rule (laboratory materials, p. 120: an “algorithm that chooses bin widths and locations automatically, based on the sample size and the spread of the data”).
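The Freedman-Diaconis rule sets the bin width to h = 2 · IQR(x) / n^(1/3), so bins get narrower as the sample grows and wider as the data spread out. Base R exposes the rule as `nclass.FD()` and through `hist(breaks = "FD")`. The sketch uses a synthetic price vector in place of the real data.

```r
# Synthetic stand-in for mieszkania$price_PLN (NOT the real data)
set.seed(1)
price_PLN <- rnorm(200, mean = 760000, sd = 186000)

# Freedman-Diaconis bin width: h = 2 * IQR / n^(1/3)
n <- length(price_PLN)
h <- 2 * IQR(price_PLN) / n^(1/3)
n_bins <- ceiling(diff(range(price_PLN)) / h)

# Base R's implementation of the same rule (minor implementation details
# can make its count differ by one)
n_bins_builtin <- nclass.FD(price_PLN)
print(c(width = h, bins = n_bins, builtin = n_bins_builtin))

# Histogram with Freedman-Diaconis breaks (hist() may round them to "pretty" values)
hist(price_PLN, breaks = "FD", main = "Prices, Freedman-Diaconis bins",
     xlab = "Price [PLN]")
```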
The next plot is a scatter plot of price against apartment size, with a fitted regression line (price = f(size)).
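A minimal sketch of such a plot, assuming a simple linear least-squares fit with `lm()`. The data are synthetic: sizes span the 17 to 88 range seen in the summary table, while the intercept, slope and noise level are made up for illustration.

```r
# Synthetic stand-in data (NOT the real data); the linear trend is assumed
set.seed(42)
size <- runif(200, min = 17, max = 88)
price_PLN <- 150000 + 13000 * size + rnorm(200, sd = 50000)

# Least-squares fit of price = a + b * size
fit <- lm(price_PLN ~ size)
print(coef(fit))

# Scatter plot with the fitted regression line
plot(size, price_PLN, xlab = "size", ylab = "Price [PLN]", main = "Price vs. size")
abline(fit, col = "red", lwd = 2)
```

The slope `coef(fit)[["size"]]` is the estimated price increase per unit of size.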
Finally, boxplots show the price distribution by district and by number of rooms.
ggplot2 plots
An interactive heatmap shows what share of the apartments in each district has each number of rooms.
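The boxplots use the formula interface of `boxplot()`. The heatmap is built on row proportions of the district × rooms table, so each district's shares sum to 1. The sketch draws a static `ggplot2` tile plot only if the package is installed; making it interactive, for example with `plotly::ggplotly()`, is one option and is an assumption about how the original was done. The data are synthetic.

```r
# Synthetic stand-in for mieszkania (NOT the real data)
set.seed(5)
n <- 200
districts <- c("Biskupin", "Krzyki", "Srodmiescie")
mieszkania <- data.frame(
  district = factor(sample(districts, n, replace = TRUE), levels = districts),
  rooms = factor(sample(1:4, n, replace = TRUE), levels = 1:4, ordered = TRUE),
  price_PLN = rnorm(n, mean = 760000, sd = 186000)
)

# Price distribution by district and by number of rooms
boxplot(price_PLN ~ district, data = mieszkania, ylab = "Price [PLN]")
boxplot(price_PLN ~ rooms, data = mieszkania, xlab = "rooms", ylab = "Price [PLN]")

# Share of apartments with each number of rooms within every district
# (each row of the proportion table sums to 1)
room_share <- prop.table(table(mieszkania$district, mieszkania$rooms), margin = 1)
print(round(room_share, 2))

# Static heatmap of those shares, drawn only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  shares <- as.data.frame(room_share)
  names(shares) <- c("district", "rooms", "share")
  p <- ggplot2::ggplot(shares, ggplot2::aes(x = rooms, y = district, fill = share)) +
    ggplot2::geom_tile()
  print(p)
}
```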
Descriptive statistics
Before reporting the full summary table of descriptive statistics, the goal here is to measure the central tendency of the price distribution: the mean, median and mode are compared together with positional measures (quantiles), by district and by building type or number of rooms per apartment.
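| Statistic | price_PLN |
|---|---|
| Mean | 760,035 |
| Median | 755,719.5 |
| Standard deviation | 186,099.8 |
| Variance | 34,633,125,960 |
| Interquartile range (IQR) | 282,686.5 |
| Minimum | 359,769 |
| Maximum | 1,277,691 |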
## 0% 5% 25% 50% 75% 95% 100%
## 359769.0 477175.4 619073.8 755719.5 901760.2 1054250.8 1277691.0
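The statistics and quantiles above map onto base R functions. R has no built-in statistical mode (`mode()` returns an object's storage mode), so the sketch takes the peak of a kernel density estimate as the mode of the continuous prices; the helper `kde_mode()` is hypothetical and this is only one of several reasonable choices. The data are synthetic.

```r
# Synthetic stand-in (NOT the real data)
set.seed(7)
n <- 200
price_PLN <- rnorm(n, mean = 760000, sd = 186000)
district <- factor(sample(c("Biskupin", "Krzyki", "Srodmiescie"), n, replace = TRUE))

# Mean, median, standard deviation, variance, IQR, minimum and maximum
stats <- c(mean = mean(price_PLN), median = median(price_PLN),
           sd = sd(price_PLN), var = var(price_PLN), IQR = IQR(price_PLN),
           min = min(price_PLN), max = max(price_PLN))
print(stats)

# Positional measures at the same probabilities as the quantile output
probs <- c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1)
print(quantile(price_PLN, probs = probs))

# Hypothetical helper: mode as the peak of a kernel density estimate
kde_mode <- function(x) {
  d <- density(x)
  d$x[which.max(d$y)]
}
mode_kde <- kde_mode(price_PLN)
print(mode_kde)

# Mean, median, mode and quantiles per district (one column per district)
by_district <- sapply(split(price_PLN, district), function(x)
  c(mean = mean(x), median = median(x), mode = kde_mode(x),
    quantile(x, probs = probs)))
print(round(by_district))
```

Replacing `district` with a building-type or rooms factor gives the other comparisons.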
Facet
Summary tables with ‘kable’
Below I summarize basic central tendency measures of prices by district and building type using the knitr function kable together with the kableExtra package.
## Warning in if (drop) f <- factor(f): the condition has length > 1 and only
## the first element will be used
## Warning in ensure_len_html(image, nrows, "image"): The number of provided values
## in image does not equal to the number of rows.
| districts_and_types | boxplot | histogram | line1 | points1 |
|---|---|---|---|---|
| Krzyki | ||||
| Biskupin | ||||
| Srodmiescie | ||||
| wiezowiec | ||||
| kamienica | ||||
| niski blok | | | | |
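The `if (drop) f <- factor(f)` warning is what R emits inside `split()` when a whole vector is passed as its third positional argument, `drop`, which must be a single logical value. The original code is not shown, so a call such as `split(price, district, building_type)` is an assumption. Such a call would also return only one group per district, which would explain the `ensure_len_html` warning: kableExtra received fewer images than the six table rows. A sketch of a fix splits by each variable separately and concatenates the two lists, so there is one group per row, then adds inline plots with kableExtra's `column_spec()`, `spec_boxplot()` and `spec_hist()`. The kableExtra part runs only if the package is installed, and the data are synthetic.

```r
# Synthetic stand-in for mieszkania (NOT the real data)
set.seed(3)
n <- 200
mieszkania <- data.frame(
  district = factor(sample(c("Biskupin", "Krzyki", "Srodmiescie"), n, replace = TRUE)),
  building_type = factor(sample(c("kamienica", "niski blok", "wiezowiec"), n, replace = TRUE)),
  price_PLN = rnorm(n, mean = 760000, sd = 186000)
)

# One list element per table row: three districts followed by three building types
price_groups <- c(split(mieszkania$price_PLN, mieszkania$district),
                  split(mieszkania$price_PLN, mieszkania$building_type))
summary_df <- data.frame(districts_and_types = names(price_groups),
                         boxplot = "", histogram = "")

# Inline boxplots and histograms, one per row (only if kableExtra is installed)
if (requireNamespace("kableExtra", quietly = TRUE)) {
  tab <- kableExtra::kbl(summary_df, format = "html")
  tab <- kableExtra::column_spec(tab, 2, image = kableExtra::spec_boxplot(price_groups))
  tab <- kableExtra::column_spec(tab, 3, image = kableExtra::spec_hist(price_groups))
  print(tab)
}
```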
Summary
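A simple summary of the dataset by district: median, mean, minimum, maximum, interquartile range (IQR) and standard deviation (SD) for the numeric variables, and counts (percentages) for the categorical ones.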
| Characteristic | Biskupin, N = 65 | Krzyki, N = 79 | Srodmiescie, N = 56 |
|---|---|---|---|
| price_PLN | |||
| Median | 817,736 | 716,726 | 727,478 |
| Mean | 818,614 | 726,507 | 739,340 |
| Minimum | 519,652 | 359,769 | 448,196 |
| Maximum | 1,277,691 | 1,090,444 | 1,062,054 |
| iqr | 249,723 | 276,126 | 278,465 |
| SD | 175,598 | 195,015 | 171,428 |
| price_EUR | |||
| Median | 189,291 | 165,909 | 168,398 |
| Mean | 189,494 | 168,173 | 171,143 |
| Minimum | 120,290 | 83,280 | 103,749 |
| Maximum | 295,762 | 252,418 | 245,846 |
| iqr | 57,807 | 63,918 | 64,460 |
| SD | 40,648 | 45,142 | 39,682 |
| rooms | |||
| 1 | 12 (18%) | 18 (23%) | 14 (25%) |
| 2 | 16 (25%) | 19 (24%) | 15 (27%) |
| 3 | 24 (37%) | 18 (23%) | 16 (29%) |
| 4 | 13 (20%) | 24 (30%) | 11 (20%) |
| size | |||
| Median | 45 | 44 | 43 |
| Mean | 47 | 47 | 44 |
| Minimum | 17 | 17 | 17 |
| Maximum | 88 | 87 | 83 |
| iqr | 26 | 31 | 32 |
| SD | 20 | 21 | 20 |
| building_type | |||
| kamienica | 26 (40%) | 21 (27%) | 14 (25%) |
| niski blok | 17 (26%) | 24 (30%) | 22 (39%) |
| wiezowiec | 22 (34%) | 34 (43%) | 20 (36%) |
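The layout of this table resembles the output of a summary-table package such as gtsummary, but that is an assumption. The sketch below reproduces the same kinds of statistics in base R: six numeric summaries per district for a continuous variable, and counts with column percentages (within each district) for a categorical one. The helper `six_stats()` is hypothetical and the data are synthetic.

```r
# Synthetic stand-in for mieszkania (NOT the real data)
set.seed(9)
n <- 200
mieszkania <- data.frame(
  district = factor(sample(c("Biskupin", "Krzyki", "Srodmiescie"), n, replace = TRUE)),
  rooms = factor(sample(1:4, n, replace = TRUE), levels = 1:4),
  price_PLN = rnorm(n, mean = 760000, sd = 186000)
)

# Hypothetical helper: the six statistics reported for each numeric variable
six_stats <- function(x) c(Median = median(x), Mean = mean(x), Minimum = min(x),
                           Maximum = max(x), IQR = IQR(x), SD = sd(x))
price_summary <- sapply(split(mieszkania$price_PLN, mieszkania$district), six_stats)
print(round(price_summary))

# Categorical variable: counts and column percentages within each district
room_counts <- table(mieszkania$rooms, mieszkania$district)
room_pct <- round(100 * prop.table(room_counts, margin = 2))
print(room_counts)
print(room_pct)
```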