Follow the instructions below for each question and make the appropriate changes to the R script.
The script to load the packages and import the data is given below.
The following three questions contain R plots that do not align with the principles of Data Visualization, as discussed in our class and in the reading Data Points. Correct the plots using your knowledge of ggplot2.
EXAMPLE
Question
The following is an example of an initial graph. Three primary lines need to change in order to improve it.
Answer
The following is an example of a corrected answer. Three lines were changed to improve the readability of the graph, and each one has a comment explaining the rationale behind the change.
q0 <- listingsAll %>%
  filter(!is.na(bed_type)) %>%
  filter(price <= 500)
# The boxplots need to be sorted in a meaningful way in order to
# draw attention to the important insight: The bed type with the
# highest median price is "real beds."
ggplot(data = q0, aes(x = reorder(bed_type, -price, FUN = median), y = price)) +
  # It was redundant to use size or color as a visual cue for price when price is already
  # on the y-axis, so I eliminated them.
  geom_point(color = "grey50") +
  geom_boxplot(fill = "royalblue3", alpha = 0.5) +
  theme_minimal() +
  theme(axis.title.x = element_text(size = 10, face = "bold"),
        axis.title.y = element_text(size = 10, face = "bold"),
        axis.text.x = element_text(size = 10),
        legend.position = "bottom",
        panel.grid.major.x = element_blank(),
        panel.grid.major.y = element_line(color = "grey95"),
        panel.grid.minor = element_blank(),
        plot.caption = element_text(size = 8)) +
  # The y-axis should have labels to make it easier to read.
  labs(x = "Bed Type", y = "Price ($)",
       title = "Airbnb Bed Types by D.C. Neighborhood",
       subtitle = "Real beds have a much higher median price than the other bed types, focusing on rooms with a price under or at $500.",
       caption = "Source: Inside Airbnb")
1. Listings per neighborhood
This plot is supposed to show the number of listings for each neighborhood. The goal of the graphic is to provide a heatmap of the number of listings for each room type, so that the distribution of room types per neighborhood is visible, as described in the plot subtitle.
There are at least three lines that need corrections in the script below. You should correct those lines, and explain why you made those changes. The data set, q1, does not need any changes.
(6 pts)
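As a rough illustration of the kind of heatmap described above (not the graded answer), the sketch below maps a count to tile fill with `geom_tile()`. The data frame `toy_q1`, its column names, and its values are made up for this example and are only assumed to resemble `q1`.

```r
library(ggplot2)

# Hypothetical stand-in for q1: one row per neighborhood / room type pair.
toy_q1 <- data.frame(
  neighbourhood = rep(c("A", "B", "C"), each = 2),
  room_type     = rep(c("Entire home/apt", "Private room"), times = 3),
  num_listings  = c(120, 45, 80, 60, 30, 10)
)

# In a heatmap, position encodes the two categories and fill encodes the count.
p1 <- ggplot(toy_q1, aes(x = room_type, y = neighbourhood, fill = num_listings)) +
  geom_tile(color = "white") +
  # A single-hue sequential scale suits an ordered quantity such as a count.
  scale_fill_gradient(low = "grey95", high = "royalblue3") +
  theme_minimal() +
  labs(x = "Room Type", y = "Neighborhood", fill = "Number of listings")

print(p1)
```

A sequential fill scale is used here because the count has a natural order; a rainbow or categorical palette would hide that order.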
2. Neighborhoods and price
This plot is supposed to show that the most popular neighborhoods, as measured by the number of listings, do not necessarily have the highest prices for available rooms. The goal is to combine information about the most popular neighborhoods with information about the price and the number of available listings.
There are at least three lines that need corrections in the script below. You should correct those lines, and explain your rationale for making those changes. The data set, q2, does not need any changes.
(6 pts)
3. Room types and price
This graph is supposed to show the average price for each property type. For this graph in particular, consider the type of data that you want to plot; you may want to refer to Chapter 4 of Data Points for the types of plots that are appropriate.
There are at least three corrections to make to the graph below. You should make the corrections to the code and add comments in the R code below explaining why you made those changes. The data set, q3, does not need any changes.
(6 pts)
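For orientation only, here is a minimal sketch of one common way to plot a mean per category: summarise first, then draw sorted bars. The data frame `toy_q3`, its columns, and its prices are invented for this example and are not the assignment data; a sorted bar chart is one reasonable choice among several, not necessarily the intended answer.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for q3: one row per listing.
toy_q3 <- data.frame(
  property_type = c("Apartment", "Apartment", "House", "House",
                    "Condominium", "Condominium"),
  price = c(100, 140, 150, 250, 90, 110)
)

# Collapse to one mean price per property type.
avg_price <- toy_q3 %>%
  group_by(property_type) %>%
  summarise(mean_price = mean(price), .groups = "drop")

# Bars sorted by the plotted value make the ranking easy to read.
p3 <- ggplot(avg_price,
             aes(x = reorder(property_type, mean_price), y = mean_price)) +
  geom_col(fill = "royalblue3") +
  coord_flip() +
  theme_minimal() +
  labs(x = "Property Type", y = "Mean Price ($)")

print(p3)
```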
The following two questions contain errors that prevent the script from executing. Correct the errors in order to have a functional script that creates the described graph.
4. Room types with ggvis
This graph is supposed to be a tool to help explore the relationship between num_listings, room_type, property type, and neighborhood that were discussed in the first three graphs.
There are three errors that you should correct in order to make the script below run. Make the corrections to the code and explain, with comments in the script, why the original code does not work and why you made those changes.
(6 pts)
5. Host network
The following script plots the co-stay network of hosts. Each node is a host, and the ties represent hosts with common reviewers. Put differently, some reviewers stay with different hosts, and those hosts may have common attributes (e.g., apartments, similar neighborhoods, etc.). To understand the relationships among hosts, you should plot the network of hosts.
The host network has been restricted to 2013 visits. The nodes are colored by their degree (or the number of hosts that they are connected to by reviewers). The edges are colored by the number of reviewers that the connected hosts have in common.
(6 pts)
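The coloring scheme described above can be sketched on a toy network. This example assumes the igraph package, which may not be the package the assignment script uses; the edge list `toy_edges`, the host names, and the `common_reviewers` values are all made up for illustration.

```r
library(igraph)

# Hypothetical edge list: each row ties two hosts that share reviewers.
toy_edges <- data.frame(
  from = c("h1", "h1", "h2", "h1"),
  to   = c("h2", "h3", "h3", "h4"),
  common_reviewers = c(3, 1, 2, 1)
)

# Extra columns in the edge list become edge attributes.
g <- graph_from_data_frame(toy_edges, directed = FALSE)

# Degree: the number of other hosts each host is tied to.
deg <- degree(g)

# Color nodes by degree using a light-to-dark ramp.
pal <- colorRampPalette(c("grey80", "royalblue3"))(max(deg))
V(g)$color <- pal[deg]

# Color edges by how many reviewers the two hosts have in common.
E(g)$color <- ifelse(E(g)$common_reviewers >= 2, "firebrick", "grey60")

plot(g, vertex.label = V(g)$name)
```

The threshold of 2 shared reviewers for highlighting an edge is arbitrary and chosen only to keep the example small.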
***
### Part 3. Insights
Given what you learned in the five preceding exercises, tell me something interesting about this data related to the price, the review scores, the most desirable neighborhood (as a host), and/or the network of hosts. You are welcome to use any package, including Shiny. If you choose to create a Shiny app, please note that below and submit that R script separately.
(10 pts)
This is a treemap, with the color representing the number of listings of each property type and the size representing the mean price of a listing of that property type. As one can see, property types fall into clear tiers by number of listings on Airbnb: apartments and houses have the most listings, followed by townhouses and condominiums. The graph above is interactive.
Here is another graph comparing the mean price of property types and room types by their mean rating. The above graph is also interactive; hover to see which property type and how many listings each point represents. Entire homes/apartments tend to cost more, while private rooms are cheaper and have about the same ratings. Shared rooms are the cheapest, as expected, but their ratings vary widely.