Setting things up
The following code sets up options and reads in the listings.csv file. It currently filters out listings with apparently zero days of availability. At the moment I’m unsure what zero implies here: a missing value, a listing that is no longer active, some mixture of these, or something entirely different. Until this is resolved, the focus will be on non-zero values.
knitr::opts_chunk$set(
echo = TRUE,
message = FALSE,
warning = FALSE
)
library(tidyverse) # Data manipulation tool (autoloads ggplot2 as well)
library(readr) # Fast reading of listings data
library(dplyr) # Pipelining
library(rgdal) # For mapping later on
library(ggmap) # For mapping later on
# The following reads in the data, makes 'room_type' and 'neighbourhood' into factors
# and stores in a table called 'listings'
# Now I delete the zero days available data - remove the 'filter' command to keep them
"http://data.insideairbnb.com/ireland/leinster/dublin/2016-08-07/visualisations/" -> stem
stem %>% paste0("listings.csv") -> fullname
read_csv(fullname) %>%
mutate(room_type=as.factor(room_type),neighbourhood=as.factor(neighbourhood)) %>%
filter(availability_365 > 0) ->
listings
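One way to start working out what zero means would be to count the zero-availability listings within each room type before they are filtered out. The following is only a minimal sketch: it uses a small invented table in place of the real listings.csv, and the names toy_listings and zero_summary are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in for the unfiltered listings table (all values are invented)
toy_listings <- tibble(
  room_type = factor(c("Entire home/apt", "Private room", "Private room",
                       "Shared room", "Entire home/apt", "Private room")),
  availability_365 = c(0, 120, 0, 365, 300, 0)
)

# Count zero-availability listings and their share within each room type
toy_listings %>%
  group_by(room_type) %>%
  summarise(n_listings = n(),
            n_zero = sum(availability_365 == 0),
            prop_zero = mean(availability_365 == 0)) ->
  zero_summary

zero_summary
```

If the share of zeros differed sharply between room types in the real data, that might hint that zero is more than a simple missing-value code, although it would not settle the question.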
Does room type link with availability?
A box plot of availability (days per year) against the type of room (or whole dwelling) listed:
ggplot(listings,aes(y=availability_365,x=room_type)) + labs(x='Room Type',y='Days per year available') + geom_boxplot()
This is probably better viewed as a density plot, since the data may not be unimodal:
ggplot(listings,aes(x=availability_365,fill=room_type)) + geom_density(alpha=0.4) + labs(fill='Room Type',x='Days per year available') + scale_fill_discrete(h.start=135)
Also a double box-plot, splitting the over 200 day listings from the ones below:
listings %>% mutate(below_200 = availability_365 < 200) %>% ggplot(aes(x=below_200,y=availability_365)) + geom_boxplot() + coord_flip() + labs(x='Below 200 Days',y='Days per year available') -> double_box
double_box
This plot splits the data into two groups, listings available for fewer than 200 days a year and those available for 200 days or more, and draws a box plot for each group. This reveals two distinct distributions.
Splitting the group by room type suggests this occurs for all types of listing:
double_box + facet_wrap(~room_type)
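The two groups drawn in the box plots can also be summarised numerically. The sketch below applies the same below_200 split to a handful of invented availability values. The table toy and the result split_summary are hypothetical names, not part of the original analysis.

```r
library(dplyr)

# Invented availability values standing in for listings$availability_365
toy <- tibble(
  availability_365 = c(10, 45, 90, 150, 199, 200, 280, 330, 350, 365)
)

toy %>%
  mutate(below_200 = availability_365 < 200) %>%  # same split as the box plot
  group_by(below_200) %>%
  summarise(n = n(),
            lower_quartile = quantile(availability_365, 0.25, names = FALSE),
            median = median(availability_365),
            upper_quartile = quantile(availability_365, 0.75, names = FALSE)) ->
  split_summary

split_summary
```

Adding room_type to the group_by() call would give the per-room-type version of the same table.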
Availability vs Price
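This is essentially a scatter plot with a superimposed trend. Because of the large number of points, hexagonal binning is used to handle heavily overlaid points, and price is shown on a log scale. The trend is the default geom_smooth() fit (Loess for fewer than 1,000 observations, a GAM otherwise) and suggests little correlation.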
ggplot(listings,aes(x=availability_365,y=price)) + geom_hex(bins=50) + labs(y='Price',x='Days per year available') + scale_fill_continuous(low='wheat',high='red') + scale_y_log10() + geom_smooth()
ggplot(listings,aes(x=availability_365,y=price)) + geom_hex(bins=50) + labs(y='Price',x='Days per year available') + scale_fill_continuous(low='wheat',high='red') + scale_y_log10() + facet_wrap(~room_type,ncol=1) + geom_smooth()
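The visual impression of little correlation could be checked with a rank correlation, which has the convenient property of being unchanged by the log scaling of price. The following is a sketch on simulated data only (the data frame toy and its values are assumptions, not the Dublin listings), so the numbers it produces say nothing about the real data.

```r
set.seed(1)
n <- 500

# Simulated stand-in: availability and price generated independently
toy <- data.frame(
  availability_365 = sample(1:365, n, replace = TRUE),
  price = round(exp(rnorm(n, mean = 4.3, sd = 0.5)))
)

# Spearman's rank correlation between availability and price
rho <- cor(toy$availability_365, toy$price, method = "spearman")

# The same statistic on log(price): ranks are preserved, so it is identical
rho_log <- cor(toy$availability_365, log(toy$price), method = "spearman")

# Approximate test of the null hypothesis of no monotone association
rank_test <- cor.test(toy$availability_365, toy$price,
                      method = "spearman", exact = FALSE)

rho
rank_test$p.value
```

On the real table the same calls would take listings$availability_365 and listings$price, possibly repeated within each room type.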
Geographical Patterns
Firstly, get a backdrop map. Here a monochrome backdrop is chosen so that contextual information doesn’t clash with the data visualisation, which will be shown in colour.
get_map('Dublin, Ireland', zoom=12, color='bw', source='osm') %>% ggmap -> backdrop
backdrop
The overall distribution of listings:
backdrop + geom_point(mapping=aes(x=longitude,y=latitude),data=listings,alpha=0.1,size=0.5,color='indianred')
However, there is some degree of jittering in the recorded locations, so the apparent precision of the points may be deceptive. A more honest visualisation may be to work with kernel density plots instead of direct point representations. Below is a comparison of over- and under-200-day listings using density plots:
backdrop + geom_density2d(mapping=aes(x=longitude,y=latitude,color=availability_365 > 200),data=listings) + scale_color_discrete(labels=c("Up to 200 days per year","More than 200 days per year")) + labs(color='Listing Time')
The map above suggests that the patterns are similar, but the listings available for more than 200 days a year are slightly more spread out, and there is a small peak near DCU.
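The density contours rest on a two-dimensional kernel density estimate (ggplot2 uses MASS::kde2d for this). The sketch below illustrates, on simulated points only, why such an estimate should be fairly insensitive to small location jitter: the surfaces computed from the original and the jittered points are compared on a common grid. The coordinates, the jitter size and all variable names are assumptions chosen for illustration, and the jitter applied to the real data is not known here.

```r
library(MASS)  # provides kde2d()

set.seed(42)
n <- 300

# Simulated "true" locations around an arbitrary centre (not real listings)
lon <- rnorm(n, mean = -6.26, sd = 0.02)
lat <- rnorm(n, mean = 53.35, sd = 0.01)

# Jittered copies, mimicking locations that have been deliberately perturbed
lon_j <- lon + runif(n, -0.002, 0.002)
lat_j <- lat + runif(n, -0.002, 0.002)

# Evaluate both densities on the same 50 x 50 grid so they are comparable
grid_lims <- c(range(lon, lon_j), range(lat, lat_j))
dens_true <- kde2d(lon, lat, n = 50, lims = grid_lims)
dens_jit  <- kde2d(lon_j, lat_j, n = 50, lims = grid_lims)

# Correlation between the two estimated density surfaces
surface_cor <- cor(as.vector(dens_true$z), as.vector(dens_jit$z))
surface_cor
```

When the jitter is small relative to the kernel bandwidth, as it is in this simulation, the two surfaces should be nearly identical, which is the sense in which the density view does not overstate the precision of the locations.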
Doing it Tufte’s way, with small multiples, a pair of side-by-side density maps possibly illustrates the distinctions better.
listings %>% mutate(over_200 = if_else(availability_365 > 200,"More than 200 days per year","Up to 200 days per year")) -> listings
get_map('Dublin, Ireland', zoom=12, color='bw', source='osm') %>% ggmap(base_layer=ggplot(aes(x=longitude,y=latitude),data=listings)) -> backdrop
backdrop + geom_polygon(stat='density2d',alpha=0.3,fill='indianred',color='darkred') + facet_wrap(~over_200)