Lab Report

Krystian Opala

2021-03-24

Data wrangling

As you can see, not all variables are stored in a suitable format yet. We need to convert each variable to the type that matches its measurement scale and its intended use.

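# Nominal variables: unordered factors
mieszkania$district <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
# Number of rooms is ordinal: ordered factor
mieszkania$rooms <- factor(mieszkania$rooms, ordered = TRUE)
# Prices are quantitative: numeric
mieszkania$price_PLN <- as.numeric(mieszkania$price_PLN)
mieszkania$price_EUR <- as.numeric(mieszkania$price_EUR)
# Attach only after all conversions: attach() exposes a copy, so later changes to mieszkania would not be visible through it
attach(mieszkania)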
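A quick way to confirm that the conversions took effect is to inspect the class of every column. The sketch below uses a small invented data frame as a stand-in for `mieszkania` (the values are hypothetical, not taken from the real sample).

```r
# Toy stand-in for the report's data; all values below are invented.
mieszkania <- data.frame(
  district      = c("Krzyki", "Biskupin", "Srodmiescie", "Krzyki"),
  building_type = c("kamienica", "wiezowiec", "niski blok", "kamienica"),
  rooms         = c(2, 1, 3, 4),
  price_PLN     = c("650000", "480000", "820000", "990000"),
  stringsAsFactors = FALSE
)

# Nominal variables become unordered factors
mieszkania$district      <- as.factor(mieszkania$district)
mieszkania$building_type <- as.factor(mieszkania$building_type)
# Number of rooms is ordinal, so it becomes an ordered factor
mieszkania$rooms         <- factor(mieszkania$rooms, ordered = TRUE)
# Prices become numeric
mieszkania$price_PLN     <- as.numeric(mieszkania$price_PLN)

# Compact overview of the resulting types
str(mieszkania)
classes <- sapply(mieszkania, function(col) class(col)[1])
print(classes)
```

An ordered factor reports `"ordered"` as its first class, which makes it easy to tell apart from a nominal factor.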

Frequency table

In the first stage of our analysis we group the data into a simple frequency table.

First, let’s take a look at the distribution of prices of apartments in our sample:

##                 limits Freq Rel_Freq Cum_Freq
## 1    (3.5e+05,4.5e+05]    9    0.045        9
## 2    (4.5e+05,5.5e+05]   21    0.105       30
## 3    (5.5e+05,6.5e+05]   33    0.165       63
## 4    (6.5e+05,7.5e+05]   36    0.180       99
## 5    (7.5e+05,8.5e+05]   31    0.155      130
## 6    (8.5e+05,9.5e+05]   36    0.180      166
## 7   (9.5e+05,1.05e+06]   21    0.105      187
## 8  (1.05e+06,1.15e+06]   10    0.050      197
## 9  (1.15e+06,1.25e+06]    2    0.010      199
## 10 (1.25e+06,1.35e+06]    1    0.005      200
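A table with this layout can be built in base R with `cut()` and `table()`. The following is a minimal sketch, not necessarily the code used for the output above; the prices are simulated and the class width of 100,000 PLN is read off the printed limits.

```r
# Simulated prices standing in for price_PLN (invented values)
set.seed(1)
price <- round(runif(200, min = 360000, max = 1270000))

# Right-closed classes (a, b] of width 100,000 PLN
breaks  <- seq(350000, 1350000, by = 100000)
classes <- cut(price, breaks = breaks)

# Absolute, relative and cumulative frequencies
freq <- as.data.frame(table(classes), responseName = "Freq")
names(freq)[1] <- "limits"
freq$Rel_Freq <- freq$Freq / sum(freq$Freq)
freq$Cum_Freq <- cumsum(freq$Freq)
print(freq)
```

The last value of `Cum_Freq` should equal the sample size, which is a convenient sanity check that no observation fell outside the breaks.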

TAI

Now let’s check the tabular accuracy index (TAI).

##        # classes  Goodness of fit Tabular accuracy 
##       10.0000000        0.9780872        0.8508467
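The labels in this output resemble those printed by `jenks.tests()` from the classInt package; that attribution is an assumption. To make the two measures concrete, here is a base R sketch using one common definition: goodness of fit compares squared deviations from class means with squared deviations from the overall mean, and tabular accuracy does the same with absolute deviations. The helper `fit_measures()` and the simulated prices are hypothetical.

```r
# Hypothetical helper: goodness of variance fit and tabular accuracy
# for a numeric vector x grouped by the given class breaks.
fit_measures <- function(x, breaks) {
  cls <- cut(x, breaks = breaks, include.lowest = TRUE)
  class_mean <- ave(x, cls)  # mean of the class each value belongs to
  gvf <- 1 - sum((x - class_mean)^2) / sum((x - mean(x))^2)
  tai <- 1 - sum(abs(x - class_mean)) / sum(abs(x - mean(x)))
  c("# classes" = length(breaks) - 1,
    "Goodness of fit" = gvf,
    "Tabular accuracy" = tai)
}

# Simulated prices (invented values), grouped into 10 equal-width classes
set.seed(1)
price <- round(runif(200, min = 360000, max = 1270000))
res <- fit_measures(price, seq(350000, 1350000, by = 100000))
print(res)

# A single class explains nothing, so both measures drop to 0
one_class <- fit_measures(price, range(price))
print(one_class)
```

Both measures approach 1 as the classes become more homogeneous, so they are useful for comparing alternative numbers of classes.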

Basic plots

In this section we present our data using base R graphics (the graphics package shipped with R). Choose the most appropriate plots according to the measurement scale of the chosen variables. Investigate the heterogeneity of the distribution by presenting the data by groups (e.g. by district or building type). Do not forget about main titles, axis labels and legends. Read more about graphical parameters here.

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
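As an illustration of base graphics with titles, labels and a legend, the sketch below draws a histogram of prices and a boxplot of prices by district. The data are simulated, and the colours are arbitrary choices.

```r
# Simulated stand-in for the report's data (invented values)
set.seed(2)
mieszkania <- data.frame(
  district  = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                            200, replace = TRUE)),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# Histogram: suitable for a single quantitative variable
hist(mieszkania$price_PLN,
     breaks = 10,
     main   = "Distribution of apartment prices",
     xlab   = "Price (PLN)",
     col    = "grey80")

# Boxplot by group: shows how the distribution differs between districts
cols <- c("tomato", "steelblue", "darkseagreen")
bp <- boxplot(price_PLN ~ district, data = mieszkania,
              main = "Apartment prices by district",
              xlab = "District",
              ylab = "Price (PLN)",
              col  = cols)
legend("topright", legend = levels(mieszkania$district),
       fill = cols, title = "District")
```

`boxplot()` invisibly returns the statistics it drew, so `bp$stats` can be reused when the five-number summaries are needed in the text.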

ggplot2 plots

An interactive heatmap showing, for each district, the share of apartments with each number of rooms.
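One possible way to build such a heatmap is to compute row-wise shares with `prop.table()`, draw them with `geom_tile()`, and pass the plot to `plotly::ggplotly()`. This is a sketch on simulated data and may differ from the code used in the report; the plotly step is skipped when that package is not installed.

```r
library(ggplot2)

# Simulated stand-in for the report's data (invented values)
set.seed(3)
mieszkania <- data.frame(
  district = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                           200, replace = TRUE)),
  rooms    = factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)
)

# Share of each room count within a district (each district sums to 1)
counts <- table(district = mieszkania$district, rooms = mieszkania$rooms)
shares <- as.data.frame(prop.table(counts, margin = 1),
                        responseName = "share")

p <- ggplot(shares, aes(x = rooms, y = district, fill = share)) +
  geom_tile() +
  labs(title = "Share of apartments by number of rooms",
       x = "Number of rooms", y = "District", fill = "Share")

# Interactive version, only if plotly is available
if (requireNamespace("plotly", quietly = TRUE)) {
  p_interactive <- plotly::ggplotly(p)
}
```

Normalising within districts rather than over the whole sample keeps small districts readable next to large ones.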


ggplot2 can show the average value of each group with the stat_summary() function, so there is no need to calculate the group means before plotting.
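A minimal sketch of this idea, on simulated data: the boxplots show the distribution per district and `stat_summary()` overlays the group mean as a red diamond. The `fun` argument assumes ggplot2 3.3.0 or later.

```r
library(ggplot2)

# Simulated stand-in for the report's data (invented values)
set.seed(4)
mieszkania <- data.frame(
  district  = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                            200, replace = TRUE)),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

p <- ggplot(mieszkania, aes(x = district, y = price_PLN)) +
  geom_boxplot() +
  # The mean is computed per group at draw time
  stat_summary(fun = mean, geom = "point",
               shape = 18, size = 4, colour = "red") +
  labs(title = "Apartment prices by district",
       x = "District", y = "Price (PLN)")
print(p)
```

Plotting the mean next to the median line of the boxplot makes any skewness within a district visible at a glance.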

Using facets

Faceting generates small multiples each showing a different subset of the data. Small multiples are a powerful tool for exploratory data analysis: you can rapidly compare patterns in different parts of the data and see whether they are the same or different. Read more about facets here.
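For example, `facet_wrap()` can split a histogram of prices into one panel per district. The data below are simulated and the number of bins is an arbitrary choice.

```r
library(ggplot2)

# Simulated stand-in for the report's data (invented values)
set.seed(5)
mieszkania <- data.frame(
  district  = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                            200, replace = TRUE)),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# One histogram panel per district, sharing the same axes
p <- ggplot(mieszkania, aes(x = price_PLN)) +
  geom_histogram(bins = 15, fill = "steelblue", colour = "white") +
  facet_wrap(~ district) +
  labs(title = "Distribution of prices by district",
       x = "Price (PLN)", y = "Number of apartments")
print(p)
```

Shared axes are the default and make the panels directly comparable; `scales = "free"` can be passed to `facet_wrap()` when the groups differ too much in range.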

Descriptive statistics #1

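Before automatically reporting the full summary table of descriptive statistics, this time your goal is to measure the central tendency of the distribution of prices. Compare the mean, median and mode together with positional measures (quantiles) by district and building type or by number of rooms per apartment.

The output below lists, in order, the mean, median, standard deviation, variance, interquartile range, minimum and maximum of price_PLN, followed by selected quantiles.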

## [1] 760035
## [1] 755719.5
## [1] 186099.8
## [1] 34633125960
## [1] 282686.5
## [1] 359769
## [1] 1277691
##        0%        5%       25%       50%       75%       95%      100% 
##  359769.0  477175.4  619073.8  755719.5  901760.2 1054250.8 1277691.0
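The same measures can be computed per group with `split()` and `sapply()`. The sketch below uses simulated data; `stat_mode()` is a hypothetical helper, since base R has no function for the statistical mode. With prices that are almost all distinct the mode is not informative, so it is applied to the number of rooms instead.

```r
# Hypothetical helper: most frequent value of a vector
stat_mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Simulated stand-in for the report's data (invented values)
set.seed(6)
mieszkania <- data.frame(
  district  = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                            200, replace = TRUE)),
  rooms     = sample(1:4, 200, replace = TRUE),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# Mean, median and quartiles of price for every district
by_district <- t(sapply(
  split(mieszkania$price_PLN, mieszkania$district),
  function(x) c(mean = mean(x), median = median(x),
                quantile(x, probs = c(0.25, 0.75)))
))
print(round(by_district))

# Most common number of rooms in every district
rooms_mode <- tapply(mieszkania$rooms, mieszkania$district, stat_mode)
print(rooms_mode)
```

A mean noticeably above the median within a group points to a right-skewed price distribution in that group.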

Summary tables with ‘kable’

Using the kable() function from knitr together with the kableExtra package, we can easily create summary tables with graphics and/or statistics.

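[Table: one row per number of rooms (1, 2, 3, 4) with the columns boxplot, histogram, line1, line2 and points1 holding inline plots; the images are not reproduced here.]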

Now we will summarize the basic measures of central tendency for prices by district and building type using kable and kableExtra. You can customize your final report. See some hints here.
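A plain version of such a summary table, without inline graphics, might look as follows. The data are simulated, and only `knitr::kable()` is used; styling with kableExtra would be added on top of the returned object.

```r
# Simulated stand-in for the report's data (invented values)
set.seed(7)
mieszkania <- data.frame(
  district  = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                            200, replace = TRUE)),
  price_PLN = round(rnorm(200, mean = 760000, sd = 186000))
)

# One row per district with basic measures of central tendency
groups <- split(mieszkania$price_PLN, mieszkania$district)
tab <- data.frame(
  district = names(groups),
  mean     = sapply(groups, mean),
  median   = sapply(groups, median),
  q25      = sapply(groups, quantile, probs = 0.25),
  q75      = sapply(groups, quantile, probs = 0.75),
  row.names = NULL
)

# Render the table; digits = 0 rounds prices to whole PLN
out <- knitr::kable(tab, digits = 0,
                    caption = "Price (PLN) by district")
print(out)
```

Building the data frame first and formatting it last keeps the computed statistics available for later use in the report.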

## Warning in if (drop) f <- factor(f): the condition has length > 1 and only
## the first element will be used
## Warning in ensure_len_html(image, nrows, "image"): The number of provided values
## in image does not equal to the number of rows.

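[Table: one row per district (Krzyki, Biskupin, Srodmiescie) and per building type (wiezowiec, kamienica, niski blok) with the columns boxplot, histogram, line1 and points1 holding inline plots; the images are not reproduced here.]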

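We can also use other formats to produce summary tables. One of these is a data frame summary (the layout below matches the output of dfSummary() from the summarytools package):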

## price_PLN was converted to a data frame
## Data Frame Summary  
## price_PLN  
## Dimensions: 200 x 1  
## Duplicates: 0  
## 
## +----+------------+--------------------------------+---------------------+-----------------------+----------+---------+
## | No | Variable   | Stats / Values                 | Freqs (% of Valid)  | Graph                 | Valid    | Missing |
## +====+============+================================+=====================+=======================+==========+=========+
## | 1  | price_PLN  | Mean (sd) : 760035 (186099.8)  | 200 distinct values |       : . .           | 200      | 0       |
## |    | [integer]  | min < med < max:               |                     |     : : : : :         | (100.0%) | (0.0%)  |
## |    |            | 359769 < 755719.5 < 1277691    |                     |   . : : : : :         |          |         |
## |    |            | IQR (CV) : 282686.5 (0.2)      |                     | . : : : : : : :       |          |         |
## |    |            |                                |                     | : : : : : : : :   .   |          |         |
## +----+------------+--------------------------------+---------------------+-----------------------+----------+---------+

Descriptive statistics - summary table for quantitative variable using reporttools

We can also produce summary tables in LaTeX format (available only for PDF reports). Let’s see the summary table by district for both the price in PLN and the price in EUR.

% latex table generated in R 4.0.4 by xtable 1.8-4 package % Wed Mar 24 01:32:34 2021

Contingency tables using reporttools

We can easily construct contingency tables in LaTeX. Let’s print a summary table of districts and building types by number of rooms per apartment:

% latex table generated in R 4.0.4 by xtable 1.8-4 package % Wed Mar 24 01:32:34 2021
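Since the LaTeX tables only render in PDF output, a contingency table can also be inspected directly in the console. The sketch below uses base R `table()` on simulated data instead of reporttools, so it is an alternative rather than the method used above.

```r
# Simulated stand-in for the report's data (invented values)
set.seed(8)
mieszkania <- data.frame(
  district = factor(sample(c("Krzyki", "Biskupin", "Srodmiescie"),
                           200, replace = TRUE)),
  rooms    = factor(sample(1:4, 200, replace = TRUE), ordered = TRUE)
)

# Counts of apartments by number of rooms (rows) and district (columns)
tab <- table(rooms = mieszkania$rooms, district = mieszkania$district)

# Counts with row and column totals
print(addmargins(tab))

# Column percentages: distribution of room counts within each district
col_share <- prop.table(tab, margin = 2)
print(round(col_share, 3))
```

Column percentages answer how apartments in a given district split by number of rooms; `margin = 1` would instead show where apartments of a given size are located.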