library(data.table)
library(ggplot2)
df <- data.table(readRDS('flats.rds'))
# colors that I used: "#006D77", "#66B7B0", "#EDF6F9"
Reproduce the plots below:
Task 1
Task 2
Task 3
Task 4
Task 5
Task 6
Compute the mean and standard deviation of all numeric
variables, grouped by District, to get something like:
str(districts)
## Classes 'data.table' and 'data.frame': 23 obs. of 19 variables:
## $ District : num 8 13 9 14 11 16 5 2 7 21 ...
## $ District.mean : num 8 13 9 14 11 16 5 2 7 21 ...
## $ District.sd : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Area.mean : num 51 53.8 53.2 50.8 56 ...
## $ Area.sd : num 24.4 24.3 26.3 19.4 27.1 ...
## $ Num_whole_rooms.mean : num 1.78 1.8 1.76 1.66 1.87 ...
## $ Num_whole_rooms.sd : num 0.935 0.878 0.917 0.832 0.935 ...
## $ Num_half_rooms.mean : num 1.15 1.14 1.11 1.17 1.18 ...
## $ Num_half_rooms.sd : num 0.43 0.364 0.353 0.385 0.431 ...
## $ Price.mean : num 152614 184411 168140 138125 189766 ...
## $ Price.sd : num 74911 113971 101429 48557 119575 ...
## $ Floor.mean : num 2.85 3.43 2.92 2.77 2.76 ...
## $ Floor.sd : num 1.85 2.17 1.91 2.09 1.87 ...
## $ Floors_in_bdg.mean : num 5.04 5.91 5.28 4.66 4.98 ...
## $ Floors_in_bdg.sd : num 2.18 2.12 2.04 2.51 2.29 ...
## $ Overhead.mean : num 24386 24527 25439 23193 23285 ...
## $ Overhead.sd : num 12597 13420 15204 13834 14062 ...
## $ Parking_fee(monthly).mean: num 22744 392160 536695 195405 218329 ...
## $ Parking_fee(monthly).sd : num 5265 2659223 3567959 1211490 2106101 ...
## - attr(*, ".internal.selfref")=<externalptr>
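One possible way to get a summary of this shape is sketched below. It is not necessarily the intended solution: the toy table and its values are made up so the snippet runs without `flats.rds`, and the names `toy`, `num_cols` and `districts_toy` are hypothetical. Unlike the output above, the sketch leaves the grouping column out of the summarized columns.

```r
library(data.table)

# Hypothetical toy data standing in for flats.rds (all values are made up)
toy <- data.table(
  District    = c(1, 1, 2, 2, 3, 3),
  Area        = c(40, 60, 55, 65, 30, 50),
  Price       = c(100, 140, 120, 160, 90, 110),
  Comfort_lev = c("low", "high", "low", "low", "high", "high")
)

# Numeric columns only, excluding the grouping column itself
num_cols <- setdiff(names(toy)[sapply(toy, is.numeric)], "District")

# For every numeric column compute mean and sd per district;
# unlist() flattens the nested list into names like "Area.mean", "Area.sd"
districts_toy <- toy[
  ,
  as.list(unlist(lapply(.SD, function(x) {
    list(mean = mean(x, na.rm = TRUE), sd = sd(x, na.rm = TRUE))
  }))),
  by = District,
  .SDcols = num_cols
]

str(districts_toy)
```

On the real data the same pattern would be applied to `df`. Note that `sd()` returns `NA` for a district with a single flat.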
Then apply MDS on this dataset and visualize the similarities of the Budapest districts:
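As a rough illustration of the MDS step, the sketch below runs classical MDS with `cmdscale()` on a standardized distance matrix and plots the result with district labels. The summary table `summ` and its numbers are made up, and the names `feats`, `mds` and `p_mds` are hypothetical. Standardizing with `scale()` is an assumption about how the variables should be weighted, not something the exercise prescribes.

```r
library(ggplot2)

# Hypothetical per-district summary (all numbers are made up)
summ <- data.frame(
  District   = c(1, 2, 3, 4),
  Area.mean  = c(50, 60, 40, 55),
  Price.mean = c(120, 140, 100, 150)
)

# Standardize the features so that no single variable dominates the distances
feats <- scale(summ[, c("Area.mean", "Price.mean")])

# Classical multidimensional scaling into 2 dimensions
mds <- as.data.frame(cmdscale(dist(feats), k = 2))
names(mds) <- c("x", "y")
mds$District <- summ$District

# Districts that end up close together have similar summary statistics
p_mds <- ggplot(mds, aes(x, y, label = District)) +
  geom_text()
p_mds
```

If a constant column such as `District.sd` is kept, `scale()` divides by a zero standard deviation and yields `NaN`, so such columns would need to be dropped before computing distances.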
Bonus exercises
Geocode the 23 districts of Budapest (as we did in class this week), and show them on a map:
Now use the location data from above, but instead of points, place
small pie charts (!) on the map using the scatterpie package to show
the distribution of comfort level in each district. You might need to
pivot your long table into a wide one using dcast. Don’t forget to
order the labels of Comfort_lev first:
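The pivoting step might look roughly like the sketch below. The comfort labels and all values are made up, since the real levels of `Comfort_lev` are not shown here, and the names `flats_toy`, `lev`, `counts` and `wide` are hypothetical.

```r
library(data.table)

# Hypothetical long table: one row per flat (all values are made up)
flats_toy <- data.table(
  District    = c(1, 1, 1, 2, 2, 3),
  Comfort_lev = c("low", "high", "low", "average", "high", "low")
)

# Order the labels first so that columns and legend follow a sensible order
lev <- c("low", "average", "high")
flats_toy[, Comfort_lev := factor(Comfort_lev, levels = lev)]

# Count flats per district and comfort level, then pivot long -> wide;
# combinations that do not occur are filled with 0
counts <- flats_toy[, .N, by = .(District, Comfort_lev)]
wide <- dcast(counts, District ~ Comfort_lev, value.var = "N", fill = 0L)
wide
```

A wide table like this, joined with the district coordinates, has one column per pie slice, which is the kind of layout `geom_scatterpie()` takes through its `cols` argument.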
Note that the echo = FALSE parameter was added to the
above code chunks to prevent printing the R code that generated the
plots … but you should not do that, as we want to see how you solved the
exercise.