Lab Report

Krystian Opala

2021-03-27

Wrangling data

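# Treat the categorical columns as factors; rooms is ordinal
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
mieszkania$rooms <- factor(mieszkania$rooms, ordered = TRUE)
# Convert prices to numeric before attach(): attach() copies the columns,
# so later changes to mieszkania would not reach the attached copies
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
attach(mieszkania)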

Frequency table

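The first part of the analysis groups the apartment prices (price_PLN) from the mieszkania.csv file into a frequency table. The prices are grouped into bins of width 100,000 PLN, from 350,000 to 1,350,000 PLN. Freq is the number of apartments in each bin, Rel_Freq is their share of all 200 apartments, and Cum_Freq is the cumulative count.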

##                 limits Freq Rel_Freq Cum_Freq
## 1    (3.5e+05,4.5e+05]    9    0.045        9
## 2    (4.5e+05,5.5e+05]   21    0.105       30
## 3    (5.5e+05,6.5e+05]   33    0.165       63
## 4    (6.5e+05,7.5e+05]   36    0.180       99
## 5    (7.5e+05,8.5e+05]   31    0.155      130
## 6    (8.5e+05,9.5e+05]   36    0.180      166
## 7   (9.5e+05,1.05e+06]   21    0.105      187
## 8  (1.05e+06,1.15e+06]   10    0.050      197
## 9  (1.15e+06,1.25e+06]    2    0.010      199
## 10 (1.25e+06,1.35e+06]    1    0.005      200
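
A table of this shape can be built in base R with cut() and table(). The interval labels that cut() prints have the same form as the limits column above, which suggests (but does not confirm) that cut() was used. A minimal sketch on made-up prices, not the report's data:

```r
# Hypothetical prices in PLN (made up; the report uses mieszkania$price_PLN)
prices <- c(359769, 412000, 480500, 530000, 615000, 702300, 755719, 820000, 901760, 1277691)

# Bins of width 100,000 PLN from 350,000 to 1,350,000; cut() builds right-closed intervals (a,b]
breaks <- seq(350000, 1350000, by = 100000)
bins <- cut(prices, breaks = breaks)

# Absolute, relative and cumulative frequencies per bin (empty bins are kept as 0)
freq <- as.vector(table(bins))
freq_table <- data.frame(
  limits   = levels(bins),
  Freq     = freq,
  Rel_Freq = freq / length(prices),
  Cum_Freq = cumsum(freq)
)
print(freq_table)
```

Keeping the empty bins matters here, because Cum_Freq must reach the total number of apartments in the last row.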

The next tables show the number of apartments in each district and of each building type, respectively.
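
Counts of this kind are one-way frequency tables. A minimal sketch with table(), assuming a made-up set of 12 apartments in place of the district and building_type columns:

```r
# Hypothetical districts and building types for 12 apartments (made-up data)
district <- factor(rep(c("Krzyki", "Biskupin", "Srodmiescie"), each = 4))
building_type <- factor(c("kamienica", "niski blok", "wiezowiec", "wiezowiec",
                          "kamienica", "kamienica", "niski blok", "wiezowiec",
                          "niski blok", "niski blok", "kamienica", "wiezowiec"))

# Number of apartments per district and per building type
district_counts <- table(district)
type_counts <- table(building_type)
print(district_counts)
print(type_counts)
```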

Tabular accuracy

##        # classes  Goodness of fit Tabular accuracy 
##       10.0000000        0.9780872        0.8508467
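
Both indices measure how well the 10 classes summarize the prices; values closer to 1 mean more homogeneous classes. A minimal sketch of the usual definitions, assuming goodness of (variance) fit compares squared deviations from the class means with squared deviations from the overall mean, and tabular accuracy does the same with absolute deviations. The function used in the report is not shown, and the prices are made up:

```r
# Hypothetical prices and the same 100,000 PLN classes (illustrative only)
prices <- c(359769, 412000, 480500, 530000, 615000, 702300, 755719, 820000, 901760, 1277691)
classes <- cut(prices, breaks = seq(350000, 1350000, by = 100000))

# Goodness of variance fit: 1 - within-class squared deviations / total squared deviations
ss_total <- sum((prices - mean(prices))^2)
ss_within <- sum(tapply(prices, classes, function(x) sum((x - mean(x))^2)), na.rm = TRUE)
gvf <- 1 - ss_within / ss_total

# Tabular accuracy: the same ratio with absolute deviations
ad_total <- sum(abs(prices - mean(prices)))
ad_within <- sum(tapply(prices, classes, function(x) sum(abs(x - mean(x)))), na.rm = TRUE)
tai <- 1 - ad_within / ad_total

print(c(gvf = gvf, tai = tai))
```

tapply() returns NA for empty classes, hence na.rm = TRUE in the outer sums.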

Basic plots

This section shows some basic plots.

The first plot is a histogram whose bins are chosen by the Freedman-Diaconis rule (laboratory materials, p. 120: “Algorithm that chooses bin widths and locations automatically, based on the sample size and the spread of the data”).
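
The Freedman-Diaconis rule sets the bin width to h = 2 * IQR / n^(1/3), so the bins widen with the spread of the data and narrow as the sample grows. A minimal sketch on made-up prices using base R's nclass.FD() and hist(breaks = "FD"):

```r
# Hypothetical prices in PLN (made up for illustration)
prices <- c(359769, 412000, 480500, 530000, 615000, 702300, 755719, 820000, 901760, 1277691)

# Freedman-Diaconis bin width: h = 2 * IQR / n^(1/3)
fd_width <- 2 * IQR(prices) / length(prices)^(1/3)

# Base R applies the rule via nclass.FD() and breaks = "FD" in hist()
n_bins <- nclass.FD(prices)
h <- hist(prices, breaks = "FD", plot = FALSE)
plot(h, main = "Prices with Freedman-Diaconis bins", xlab = "price_PLN")
print(c(fd_width = fd_width, n_bins = n_bins))
```

hist() passes the suggested number of classes through pretty(), so the drawn bins may be slightly wider than fd_width.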

The next plot is a scatter plot of price against apartment size, with a regression line showing the fitted relationship price = f(size).
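
A minimal sketch of such a plot, assuming a straight-line least-squares fit with lm() and made-up sizes and prices:

```r
# Hypothetical sizes and prices for 12 apartments (made-up data)
size <- c(25, 38, 52, 70, 27, 41, 55, 80, 22, 36, 49, 66)
price_PLN <- c(420000, 560000, 690000, 850000, 530000, 700000,
               880000, 1100000, 470000, 610000, 760000, 920000)

# Least-squares fit of price = b0 + b1 * size
fit <- lm(price_PLN ~ size)

# Scatter plot with the fitted regression line drawn on top
plot(size, price_PLN, xlab = "size", ylab = "price_PLN", main = "Price vs. size")
abline(fit, col = "red")
print(coef(fit))
```

The slope coefficient is the estimated price increase per unit of size.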

Finally, a boxplot shows the price distribution by district and number of rooms.

ggplot2 plots

An interactive heatmap shows what share of the apartments in each district have a given number of rooms.
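
The boxplot groups prices by every district and rooms combination, and the heatmap colours a district-by-rooms matrix of shares. A minimal base-R sketch of both inputs on made-up data; the ggplot2 and interactive layers are not reproduced here:

```r
# Hypothetical data for 12 apartments (made up; not the report's data)
apts <- data.frame(
  district  = factor(rep(c("Krzyki", "Biskupin", "Srodmiescie"), each = 4)),
  rooms     = factor(c(1, 2, 2, 3, 1, 3, 4, 4, 1, 1, 2, 3), ordered = TRUE),
  price_PLN = c(420000, 560000, 690000, 850000, 530000, 700000,
                880000, 1100000, 470000, 610000, 760000, 920000)
)

# One box per district x rooms combination
boxplot(price_PLN ~ district + rooms, data = apts, las = 2, xlab = "", ylab = "price_PLN")

# Row-wise shares: fraction of each district's apartments with 1, 2, 3 or 4 rooms
shares <- prop.table(table(apts$district, apts$rooms), margin = 1)
print(round(shares, 2))
```

Using margin = 1 makes each row (district) sum to 1, which matches the question of how large a part of each district a given room count makes up.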

Descriptive statistics

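Before reporting the full summary table of descriptive statistics, the goal here is to measure the central tendency of the price distribution: to compare the mean, median and mode together with positional measures (quantiles) by district and by building type or number of rooms per apartment. The outputs below are computed for all 200 apartments (price_PLN) and give, in order, the mean, median, standard deviation, variance, interquartile range, minimum and maximum, followed by selected quantiles.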

## [1] 760035
## [1] 755719.5
## [1] 186099.8
## [1] 34633125960
## [1] 282686.5
## [1] 359769
## [1] 1277691
##        0%        5%       25%       50%       75%       95%      100% 
##  359769.0  477175.4  619073.8  755719.5  901760.2 1054250.8 1277691.0
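
These measures can be computed in base R as below. Base R has no mode function for continuous data, so this sketch estimates the mode as the peak of a kernel density estimate (an assumption; other estimators exist). The data are made up:

```r
# Hypothetical data for 12 apartments (made up; not the report's data)
price_PLN <- c(420000, 560000, 690000, 850000, 530000, 700000,
               880000, 1100000, 470000, 610000, 760000, 920000)
district <- factor(rep(c("Krzyki", "Biskupin", "Srodmiescie"), each = 4))

# The same measures, in the same order as the printed output above
stats <- c(mean = mean(price_PLN), median = median(price_PLN), sd = sd(price_PLN),
           var = var(price_PLN), iqr = IQR(price_PLN),
           min = min(price_PLN), max = max(price_PLN))
q <- quantile(price_PLN, probs = c(0, 0.05, 0.25, 0.5, 0.75, 0.95, 1))

# Mode estimate: location of the highest point of the kernel density estimate
dens <- density(price_PLN)
mode_est <- dens$x[which.max(dens$y)]

# Central tendency by group, e.g. median price per district
median_by_district <- tapply(price_PLN, district, median)
print(stats); print(q); print(mode_est); print(median_by_district)
```

The same tapply() call with building_type or rooms as the grouping factor gives the other breakdowns.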

Facet

Summary tables with ‘kable’

Below I summarize basic central tendency measures of prices by district and building type using kable() together with the kableExtra package.

## Warning in if (drop) f <- factor(f): the condition has length > 1 and only
## the first element will be used
## Warning in ensure_len_html(image, nrows, "image"): The number of provided values
## in image does not equal to the number of rows.

Table districts_and_types: inline boxplot, histogram, line1 and points1 plots for each district (Krzyki, Biskupin, Srodmiescie) and each building type (wiezowiec, kamienica, niski blok).
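
Each row of such a table needs one vector of prices, so the plot column is fed a list with one element per row. In kableExtra this list can be passed to, for example, column_spec(..., image = spec_boxplot(districts_and_types)); when the list length differs from the number of rows, kableExtra emits the ensure_len_html warnings seen above. The if (drop) warning is consistent with a second grouping vector being passed positionally to split(), whose third argument is drop; that diagnosis is an assumption, since the report's code is not shown. A minimal sketch of building the list safely, on made-up data:

```r
# Hypothetical data for 12 apartments (made up; not the report's data)
price_PLN <- c(420000, 560000, 690000, 850000, 530000, 700000,
               880000, 1100000, 470000, 610000, 760000, 920000)
district <- factor(rep(c("Krzyki", "Biskupin", "Srodmiescie"), each = 4))
building_type <- factor(c("kamienica", "niski blok", "wiezowiec", "wiezowiec",
                          "kamienica", "kamienica", "niski blok", "wiezowiec",
                          "niski blok", "niski blok", "kamienica", "wiezowiec"))

# split(x, f, drop): a third positional argument is taken as `drop`, so split
# by each factor separately and concatenate the two lists instead
districts_and_types <- c(split(price_PLN, district), split(price_PLN, building_type))

# One element per table row: 3 districts + 3 building types
print(lengths(districts_and_types))
```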

Summary

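A simple summary of the dataset by district: median, mean, minimum, maximum, interquartile range (IQR) and standard deviation (SD) for the numeric variables, and counts with percentages, n (%), for rooms and building type.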

## Warning: The `.dots` argument of `group_by()` is deprecated as of dplyr 1.0.0.
Characteristic    Biskupin, N = 65  Krzyki, N = 79  Srodmiescie, N = 56
price_PLN
  Median                   817,736         716,726              727,478
  Mean                     818,614         726,507              739,340
  Minimum                  519,652         359,769              448,196
  Maximum                1,277,691       1,090,444            1,062,054
  IQR                      249,723         276,126              278,465
  SD                       175,598         195,015              171,428
price_EUR
  Median                   189,291         165,909              168,398
  Mean                     189,494         168,173              171,143
  Minimum                  120,290          83,280              103,749
  Maximum                  295,762         252,418              245,846
  IQR                       57,807          63,918               64,460
  SD                        40,648          45,142               39,682
rooms
  1                       12 (18%)        18 (23%)             14 (25%)
  2                       16 (25%)        19 (24%)             15 (27%)
  3                       24 (37%)        18 (23%)             16 (29%)
  4                       13 (20%)        24 (30%)             11 (20%)
size
  Median                        45              44                   43
  Mean                          47              47                   44
  Minimum                       17              17                   17
  Maximum                       88              87                   83
  IQR                           26              31                   32
  SD                            20              21                   20
building_type
  kamienica               26 (40%)        21 (27%)             14 (25%)
  niski blok              17 (26%)        24 (30%)             22 (39%)
  wiezowiec               22 (34%)        34 (43%)             20 (36%)
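
The package that produced the table above is not shown. A minimal base-R sketch of the same per-district statistics, on made-up data, with numeric summaries by sapply() over split() and categorical rows as n (%) by prop.table():

```r
# Hypothetical data for 12 apartments (made up; not the report's data)
apts <- data.frame(
  district  = factor(rep(c("Krzyki", "Biskupin", "Srodmiescie"), each = 4)),
  rooms     = factor(c(1, 2, 2, 3, 1, 3, 4, 4, 1, 1, 2, 3), ordered = TRUE),
  price_PLN = c(420000, 560000, 690000, 850000, 530000, 700000,
                880000, 1100000, 470000, 610000, 760000, 920000)
)

# Numeric rows: one row per district, one column per statistic
price_stats <- function(x) {
  c(Median = median(x), Mean = mean(x), Minimum = min(x),
    Maximum = max(x), IQR = IQR(x), SD = sd(x))
}
summary_by_district <- t(sapply(split(apts$price_PLN, apts$district), price_stats))
print(round(summary_by_district))

# Categorical rows: counts and column percentages, n (%), per district
rooms_n <- table(apts$rooms, apts$district)
rooms_pct <- round(100 * prop.table(rooms_n, margin = 2))
print(rooms_n)
print(rooms_pct)
```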