library(data.table)
library(ggplot2)

df <- data.table(readRDS('flats.rds'))
# colors used in the plots: "#006D77", "#66B7B0", "#EDF6F9"

Reproduce the plots below:

Task 1

Task 2

Task 3

Task 4

Task 5

Task 6

Compute the mean and standard deviation (sd) of all numeric variables, grouped by District. The result should look something like this:

str(districts)

## Classes 'data.table' and 'data.frame':   23 obs. of  19 variables:
##  $ District                 : num  8 13 9 14 11 16 5 2 7 21 ...
##  $ District.mean            : num  8 13 9 14 11 16 5 2 7 21 ...
##  $ District.sd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Area.mean                : num  51 53.8 53.2 50.8 56 ...
##  $ Area.sd                  : num  24.4 24.3 26.3 19.4 27.1 ...
##  $ Num_whole_rooms.mean     : num  1.78 1.8 1.76 1.66 1.87 ...
##  $ Num_whole_rooms.sd       : num  0.935 0.878 0.917 0.832 0.935 ...
##  $ Num_half_rooms.mean      : num  1.15 1.14 1.11 1.17 1.18 ...
##  $ Num_half_rooms.sd        : num  0.43 0.364 0.353 0.385 0.431 ...
##  $ Price.mean               : num  152614 184411 168140 138125 189766 ...
##  $ Price.sd                 : num  74911 113971 101429 48557 119575 ...
##  $ Floor.mean               : num  2.85 3.43 2.92 2.77 2.76 ...
##  $ Floor.sd                 : num  1.85 2.17 1.91 2.09 1.87 ...
##  $ Floors_in_bdg.mean       : num  5.04 5.91 5.28 4.66 4.98 ...
##  $ Floors_in_bdg.sd         : num  2.18 2.12 2.04 2.51 2.29 ...
##  $ Overhead.mean            : num  24386 24527 25439 23193 23285 ...
##  $ Overhead.sd              : num  12597 13420 15204 13834 14062 ...
##  $ Parking_fee(monthly).mean: num  22744 392160 536695 195405 218329 ...
##  $ Parking_fee(monthly).sd  : num  5265 2659223 3567959 1211490 2106101 ...
##  - attr(*, ".internal.selfref")=<externalptr>
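Here is a minimal data.table sketch of one way to build such a table. It runs on a made-up toy table: the values are hypothetical, and only the column names District, Area and Price come from the output above. The code applies mean() and sd() to every numeric column by calling lapply over .SD. It assumes missing values should be dropped (na.rm = TRUE). Unlike the reference output, it leaves out District.mean and District.sd, which are just the district id and 0.

```r
library(data.table)

# Hypothetical toy data standing in for flats.rds: the column names come from
# the str() output above, but the values are made up for illustration.
flats <- data.table(
  District = c(1, 1, 2, 2, 2),
  Area     = c(40, 60, 50, 55, 45),
  Price    = c(100000, 140000, 120000, 130000, NA)
)

# All numeric columns except the grouping column itself
num_cols <- setdiff(names(flats)[sapply(flats, is.numeric)], "District")

# One mean and one sd per numeric column and district; na.rm = TRUE is an
# assumption about how missing values should be handled
districts <- flats[, c(
  setNames(lapply(.SD, mean, na.rm = TRUE), paste0(num_cols, ".mean")),
  setNames(lapply(.SD, sd,   na.rm = TRUE), paste0(num_cols, ".sd"))
), by = District, .SDcols = num_cols]

# Interleave the columns as in the reference output: Area.mean, Area.sd, ...
setcolorder(districts, c("District",
  as.vector(rbind(paste0(num_cols, ".mean"), paste0(num_cols, ".sd")))))

str(districts)
```

On the real data, replace the toy table with df. num_cols will then also pick up Floor, Overhead, Parking_fee(monthly) and the other numeric columns automatically.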

Then apply multidimensional scaling (MDS) to this dataset and visualize the similarities between the Budapest districts:
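Below is a minimal sketch of classical (metric) MDS with stats::cmdscale(). It assumes Euclidean distances on standardized columns. The class may have used another variant, for example MASS::isoMDS(). The small summary table is hypothetical toy data. Zero-variance columns such as District.sd have to be dropped before standardizing, because scale() turns them into NaN.

```r
library(data.table)
library(ggplot2)

# Hypothetical toy stand-in for the per-district summary table (made-up values)
districts <- data.table(
  District    = c(1, 2, 3, 4),
  District.sd = c(0, 0, 0, 0),
  Area.mean   = c(50, 55, 70, 45),
  Price.mean  = c(150000, 180000, 250000, 130000)
)

# Feature matrix without the district id
m <- as.matrix(districts[, setdiff(names(districts), "District"), with = FALSE])
# Drop zero-variance columns (scale() would divide by 0), then standardize so
# that large-valued columns such as Price do not dominate the distances
m <- m[, apply(m, 2, sd) > 0, drop = FALSE]
m <- scale(m)

# Euclidean distances between districts, embedded in 2 dimensions
d <- dist(m)
coords <- cmdscale(d, k = 2)

mds <- data.table(District = districts$District, x = coords[, 1], y = coords[, 2])

# Districts that end up close together have similar summary statistics
p <- ggplot(mds, aes(x = x, y = y, label = District)) +
  geom_text(colour = "#006D77", size = 5) +
  labs(x = NULL, y = NULL, title = "Similarity of Budapest districts (classical MDS)") +
  theme_minimal()
```

Print p to draw the plot. The MDS axes have no units of their own. Only the relative distances between the labels carry meaning.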

Bonus exercises

Geocode the 23 districts of Budapest (as we did in class this week) and show them on a map:
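Here is a minimal sketch, assuming the tidygeocoder package with its OpenStreetMap (Nominatim) backend. This may differ from the geocoding service used in class. The query format "Budapest XIII. kerület, Hungary" is also an assumption, so check the returned coordinates. geocode_districts() and plot_districts() are hypothetical helper names. The geocoding call needs internet access, and Nominatim's usage policy allows roughly one request per second.

```r
library(data.table)

# One free-text query per district, with Roman numerals as in Hungarian usage
# (assumed format: "Budapest XIII. kerület, Hungary")
queries <- data.table(District = 1:23)
queries[, address := paste0("Budapest ", as.character(as.roman(District)),
                            ". kerület, Hungary")]

# Hypothetical helper: geocode with tidygeocoder's "osm" method; adds lat and
# long columns to the input table (requires network access)
geocode_districts <- function(queries) {
  as.data.table(tidygeocoder::geocode(queries, address = address, method = "osm"))
}

# Hypothetical helper: plot the points with an approximately correct
# lat/long aspect ratio (no basemap tiles)
plot_districts <- function(coords) {
  ggplot2::ggplot(coords, ggplot2::aes(x = long, y = lat, label = District)) +
    ggplot2::geom_point(colour = "#006D77") +
    ggplot2::geom_text(vjust = -0.8, size = 3) +
    ggplot2::coord_quickmap()
}

# Usage (needs network access):
# coords <- geocode_districts(queries)
# plot_districts(coords)
```

A basemap, such as map tiles, can be added underneath the points if a plain scatter of coordinates is not enough.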

Now use the location data from above, but instead of points, place small pie charts (!) on the map using the scatterpie package to show the distribution of comfort levels in each district. You might need to pivot your long table into a wide one using dcast. Don’t forget to order the levels of Comfort_lev first:
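Here is a minimal sketch of the reshaping step, run on hypothetical toy data. The comfort level names "low", "medium" and "high" are made up, so use the actual levels of Comfort_lev in their intended order. Setting the factor levels first makes the dcast columns follow that order instead of alphabetical order, and so do the pie slices and the legend. plot_comfort_pies() is a hypothetical helper. It assumes a coordinate table with District, lat and long columns, such as the geocoding output, and that scatterpie::geom_scatterpie() takes the slice columns through its cols argument.

```r
library(data.table)

# Hypothetical toy data: same column names as flats.rds, made-up values
flats <- data.table(
  District    = c(1, 1, 1, 2, 2),
  Comfort_lev = c("high", "low", "low", "medium", "high")
)

# 1. Fix the level order first (hypothetical level names)
comfort_order <- c("low", "medium", "high")
flats[, Comfort_lev := factor(Comfort_lev, levels = comfort_order)]

# 2. Count flats per district and comfort level (long format) ...
counts <- flats[, .N, by = .(District, Comfort_lev)]

# 3. ... and pivot to wide: one row per district, one column per level,
#    with 0 for combinations that do not occur
wide <- dcast(counts, District ~ Comfort_lev, value.var = "N", fill = 0)

# 4. Hypothetical helper: join the coordinates and draw one pie per district
plot_comfort_pies <- function(wide, coords, cols = comfort_order) {
  pie_data <- merge(wide, as.data.table(coords), by = "District")
  ggplot2::ggplot() +
    scatterpie::geom_scatterpie(
      ggplot2::aes(x = long, y = lat, group = District),
      data = pie_data, cols = cols
    ) +
    ggplot2::coord_equal()  # keeps the pies circular
}
```

coord_equal() keeps the pies round but slightly stretches the map horizontally, because a degree of longitude is shorter than a degree of latitude at Budapest's latitude. With coord_quickmap() the trade-off goes the other way.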

Note that the echo = FALSE parameter was added to the code chunks above to suppress printing of the R code that generated the plots, but you should not do this: we want to see how you solved each exercise.