Graphics and Visualization Homework #3

Jerry Zhang

setwd("C:/Users/Jerry Zhang/Dropbox/School/Graphics and Visualization/PS3")
load("phoenix-amfood.RData")
library(UScensus2010)
## Loading required package: maptools Loading required package: sp Checking
## rgeos availability: FALSE Note: when rgeos is not available, polygon
## geometry computations in maptools depend on gpclib, which has a restricted
## licence. It is disabled by default; to enable gpclib, type gpclibPermit()
## Loading required package: foreign
## 
## Package UScensus2010: US Census 2010 Suite of R Packages Version 0.11
## created on 2011-11-18.
## 
## Zack Almquist, University of California-Irvine
## 
## For citation information, type citation("UScensus2010"). Type
## help(package=UScensus2010) to get started.
library(ggplot2)
library(grid)
library(gridExtra)
names(amfood.rest) = c("longitude", "latitude", "stars", "review.count", "american.new", 
    "american.traditional", "italian", "mexican", "pizza", "funny", "useful", 
    "cool")
attach(amfood.rest)

1) American (New) restaurants tend to be distributed to the East of the median longitude of -111.9746. American (Traditional) restaurants are distributed in a similar fashion. Mexican restaurants are very centrally distributed and are slightly more common West of the median longitude. Italian restaurants are once again more common on the East side of the city. Pizza restaurants are distributed throughout the city; the peak to the East of the median corresponds to an area with a high density of all restaurants.

data.american.new = subset(amfood.rest, american.new == 1)
data.american.trad = subset(amfood.rest, american.traditional == 1)
data.mexican = subset(amfood.rest, mexican == 1)
data.italian = subset(amfood.rest, italian == 1)
data.pizza = subset(amfood.rest, pizza == 1)

plot.an = ggplot(data.american.new) + geom_histogram(aes(x = longitude), binwidth = 0.05) + 
    ggtitle("American (New) Restaurants by Longitude") + xlab("Longitude") + 
    ylab("Count") + geom_vline(xintercept = median(longitude), colour = "red")
plot.at = ggplot(data.american.trad) + geom_histogram(aes(x = longitude), binwidth = 0.05) + 
    ggtitle("American (Traditional) Restaurants by Longitude") + xlab("Longitude") + 
    ylab("Count") + geom_vline(xintercept = median(longitude), colour = "red")
plot.me = ggplot(data.mexican) + geom_histogram(aes(x = longitude), binwidth = 0.05) + 
    ggtitle("Mexican Restaurants by Longitude") + xlab("Longitude") + ylab("Count") + 
    geom_vline(xintercept = median(longitude), colour = "red")
plot.it = ggplot(data.italian) + geom_histogram(aes(x = longitude), binwidth = 0.05) + 
    ggtitle("Italian Restaurants by Longitude") + xlab("Longitude") + ylab("Count") + 
    geom_vline(xintercept = median(longitude), colour = "red")
plot.pi = ggplot(data.pizza) + geom_histogram(aes(x = longitude), binwidth = 0.05) + 
    ggtitle("Pizza Restaurants by Longitude") + xlab("Longitude") + ylab("Count") + 
    geom_vline(xintercept = median(longitude), colour = "red")

plot.overall = ggplot(amfood.rest, aes(x = longitude)) + geom_histogram(binwidth = 0.05) + 
    ggtitle("Overall Distribution of Restaurants by Longitude") + xlab("Longitude") + 
    ylab("Count")

grid.arrange(plot.overall, plot.an, plot.at, plot.me, plot.it, plot.pi, nrow = 6)
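
The east/west comparisons above are read off the histograms by eye. As a rough numeric cross-check, one could compute the share of each category that lies east of the overall median longitude. The sketch below is not part of the original analysis: `toy` is a made-up stand-in for `amfood.rest`, and `share_east` is a hypothetical helper name.

```r
# Toy stand-in for amfood.rest; the coordinates and flags below are made up.
toy <- data.frame(
  longitude = c(-112.30, -112.10, -112.00, -111.95, -111.90, -111.80),
  mexican   = c(1, 1, 1, 0, 0, 1),
  italian   = c(0, 0, 0, 1, 1, 1)
)

# Share of restaurants in one category that lie east of the overall median
# longitude (a larger, i.e. less negative, longitude is further east).
share_east <- function(df, category) {
  overall_median <- median(df$longitude)
  in_category <- df[df[[category]] == 1, ]
  mean(in_category$longitude > overall_median)
}

share_east(toy, "mexican")
share_east(toy, "italian")
```

A share well above 0.5 would support "more common East of the median", and a share near 0.5 would suggest an even split. The same function applied to `latitude` would cover the north/south comparison.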


The median latitude is 33.4825. American (New) restaurants are distributed more toward the Northern side of the town. American (Traditional) restaurants are very centrally distributed. Mexican restaurants are more common on the South side of the town. Italian restaurants are more common on the North side of the town. Pizza restaurants are once again everywhere, but there are more pizza restaurants on the South side.

plot.an2 = ggplot(data.american.new) + geom_histogram(aes(x = latitude), binwidth = 0.05) + 
    ggtitle("American (New) Restaurants by Latitude") + xlab("Latitude") + ylab("Count") + 
    geom_vline(xintercept = median(latitude), colour = "red")
plot.at2 = ggplot(data.american.trad) + geom_histogram(aes(x = latitude), binwidth = 0.05) + 
    ggtitle("American (Traditional) Restaurants by Latitude") + xlab("Latitude") + 
    ylab("Count") + geom_vline(xintercept = median(latitude), colour = "red")
plot.me2 = ggplot(data.mexican) + geom_histogram(aes(x = latitude), binwidth = 0.05) + 
    ggtitle("Mexican Restaurants by Latitude") + xlab("Latitude") + ylab("Count") + 
    geom_vline(xintercept = median(latitude), colour = "red")
plot.it2 = ggplot(data.italian) + geom_histogram(aes(x = latitude), binwidth = 0.05) + 
    ggtitle("Italian Restaurants by Latitude") + xlab("Latitude") + ylab("Count") + 
    geom_vline(xintercept = median(latitude), colour = "red")
plot.pi2 = ggplot(data.pizza) + geom_histogram(aes(x = latitude), binwidth = 0.05) + 
    ggtitle("Pizza Restaurants by Latitude") + xlab("Latitude") + ylab("Count") + 
    geom_vline(xintercept = median(latitude), colour = "red")

plot.overall2 = ggplot(amfood.rest, aes(x = latitude)) + geom_histogram(binwidth = 0.05) + 
    ggtitle("Overall Distribution of Restaurants by Latitude") + xlab("Latitude") + 
    ylab("Count")

grid.arrange(plot.overall2, plot.an2, plot.at2, plot.me2, plot.it2, plot.pi2, 
    nrow = 6)


2) The number of reviews a restaurant has does not seem to determine its average number of useful or funny votes. The highest density areas are marked by the red contours. Useful votes seem to be more common than funny ones.

plot.useful = ggplot(amfood.rest, aes(x = review.count, y = useful)) + geom_point(alpha = 0.2) + 
    geom_density2d(alpha = 0.3, colour = "red") + 
    ylab("Useful Rating") + xlab("Review Count") + ggtitle("Useful Rating versus Review Count") + 
    theme_bw() + geom_smooth(method = "loess", se = FALSE, colour = "red") + 
    theme(legend.position = "none")

plot.funny = ggplot(amfood.rest, aes(x = review.count, y = funny)) + geom_point(alpha = 0.2) + 
    geom_density2d(alpha = 0.3, colour = "red") + 
    ylab("Funny Rating") + xlab("Review Count") + ggtitle("Funny Rating versus Review Count") + 
    theme_bw() + geom_smooth(method = "loess", se = FALSE, colour = "red") + 
    theme(legend.position = "none")

grid.arrange(plot.useful, plot.funny, nrow = 2)
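
The claim that review count does not determine the vote averages rests on the scatterplots. A rank correlation gives a rough numeric summary of the same relationship. The sketch below uses made-up values in place of the `review.count` and `useful` columns and is not part of the original analysis. Spearman's coefficient is chosen here as an assumption, because review counts are likely to be heavily skewed.

```r
# Made-up values standing in for review.count and the useful column.
toy <- data.frame(
  review.count = c(3, 10, 25, 40, 80, 150),
  useful       = c(1.2, 0.4, 1.0, 0.7, 1.1, 0.6)
)

# Spearman's rho compares ranks, so a few restaurants with very large
# review counts do not dominate the result.
rho <- cor(toy$review.count, toy$useful, method = "spearman")
rho
```

A value of `rho` close to zero would be consistent with the flat loess curves. A value near 1 or -1 would contradict them.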


3) General map of restaurants in the Phoenix area

load("arizona2010.RData")
library(MASS)
limits = data.frame(y = c(min(latitude), max(latitude)), x = c(min(longitude), 
    max(longitude)))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(amfood.rest$longitude, amfood.rest$latitude, pch = 17, cex = 1.5, col = "red")
title("Location of All Restaurants in Phoenix")


data.italiannp = subset(amfood.rest, (italian == 1) & (pizza == 0))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.italiannp$longitude, data.italiannp$latitude, pch = 16, cex = 1.5, 
    col = "green")
points(data.pizza$longitude, data.pizza$latitude, pch = 16, cex = 1.5, col = "red")
contour(kde2d(data.italiannp$longitude, data.italiannp$latitude), add = TRUE, 
    col = "green")
contour(kde2d(data.pizza$longitude, data.pizza$latitude), add = TRUE, col = "red")
title("Italian and Pizza Restaurants in Phoenix (Green = Italian, Red = Pizza)")


a) Italian restaurants that do not also identify as pizza restaurants tend to be located in clusters. Pizza restaurants are more evenly distributed throughout the city. Areas with higher concentrations of Italian restaurants also seem to have higher concentrations of pizza restaurants.
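
The clusters described above are read from the `kde2d` contours. To report where a cluster sits, one option is to pull the grid cell with the highest estimated density out of the `kde2d` result. The sketch below uses made-up coordinates rather than the restaurant data, and the grid size `n = 50` is an arbitrary choice.

```r
library(MASS)  # provides kde2d

# Made-up coordinates: a tight cluster plus a few scattered points.
lon <- c(-112.00, -112.01, -111.99, -112.00, -112.02, -111.70, -112.30, -111.85)
lat <- c(33.50, 33.51, 33.49, 33.52, 33.50, 33.30, 33.65, 33.40)

# kde2d returns grid coordinates x and y plus a density matrix z,
# where z[i, j] is the estimate at (x[i], y[j]).
dens <- kde2d(lon, lat, n = 50)

# Row and column index of the grid cell with the highest density.
peak <- which(dens$z == max(dens$z), arr.ind = TRUE)[1, ]
peak_location <- c(longitude = unname(dens$x[peak[1]]),
                   latitude  = unname(dens$y[peak[2]]))
peak_location
```

Running this separately on `data.italiannp` and `data.pizza` would give one peak per category, which could then be compared to check whether the densest Italian and pizza areas coincide.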

plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.mexican$longitude, data.mexican$latitude, pch = 16, cex = 1.5, col = "green")
points(data.pizza$longitude, data.pizza$latitude, pch = 16, cex = 1, col = "red")
contour(kde2d(data.mexican$longitude, data.mexican$latitude), add = TRUE, col = "green")
contour(kde2d(data.pizza$longitude, data.pizza$latitude), add = TRUE, col = "red")
title("Mexican and Pizza Restaurants in Phoenix (Green = Mexican, Red = Pizza)")


b) There are 2 modes for Mexican restaurants. Pizza restaurants seem to be more evenly distributed within the city.

data.americannewhigh = subset(amfood.rest, (american.new == 1) & (stars >= 4.5))
data.americannewlow = subset(amfood.rest, (american.new == 1) & (stars <= 2))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.americannewhigh$longitude, data.americannewhigh$latitude, pch = 16, 
    cex = 1.5, col = "green")
points(data.americannewlow$longitude, data.americannewlow$latitude, pch = 16, 
    cex = 1.5, col = "red")
contour(kde2d(data.americannewhigh$longitude, data.americannewhigh$latitude), 
    add = TRUE, col = "green")
contour(kde2d(data.americannewlow$longitude, data.americannewlow$latitude), 
    add = TRUE, col = "red")
title("Highest and Lowest Rated New American Restaurants in Phoenix (Green = Highest, Red = Lowest)")


data.americantradhigh = subset(amfood.rest, (american.traditional == 1) & (stars >= 
    4.5))
data.americantradlow = subset(amfood.rest, (american.traditional == 1) & (stars <= 
    2))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.americantradhigh$longitude, data.americantradhigh$latitude, pch = 16, 
    cex = 1.5, col = "green")
points(data.americantradlow$longitude, data.americantradlow$latitude, pch = 16, 
    cex = 1.5, col = "red")
contour(kde2d(data.americantradhigh$longitude, data.americantradhigh$latitude), 
    add = TRUE, col = "green")
contour(kde2d(data.americantradlow$longitude, data.americantradlow$latitude), 
    add = TRUE, col = "red")
title("Highest and Lowest Rated Traditional American Restaurants in Phoenix (Green = Highest, Red = Lowest)")


c) The best rated New American restaurants are located along the Eastern side of the town. The worst rated ones are on the Western side. There doesn't seem to be much of a pattern to the distribution of Traditional American restaurants in Phoenix. The best and worst restaurants are scattered throughout the city.

d)

data.mostfunny = subset(amfood.rest, funny > 3)
data.leastfunny = subset(amfood.rest, (funny > 0) & (funny < 0.1))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.mostfunny$longitude, data.mostfunny$latitude, pch = 16, cex = 1.5, 
    col = "green")
points(data.leastfunny$longitude, data.leastfunny$latitude, pch = 16, cex = 1.5, 
    col = "red")
contour(kde2d(data.mostfunny$longitude, data.mostfunny$latitude), add = TRUE, 
    col = "green")
contour(kde2d(data.leastfunny$longitude, data.leastfunny$latitude), add = TRUE, 
    col = "red")
title("Restaurants with Most and Least Funny Reviews (Green = Most, Red = Least)")


data.mostuseful = subset(amfood.rest, useful > 4)
data.leastuseful = subset(amfood.rest, (useful > 0) & (useful < 0.2))
plot(arizona.blkgrp10, xlim = limits$x, ylim = limits$y)
points(data.mostuseful$longitude, data.mostuseful$latitude, pch = 16, cex = 1.5, 
    col = "green")
points(data.leastuseful$longitude, data.leastuseful$latitude, pch = 16, cex = 1.5, 
    col = "red")
contour(kde2d(data.mostuseful$longitude, data.mostuseful$latitude), add = TRUE, 
    col = "green")
contour(kde2d(data.leastuseful$longitude, data.leastuseful$latitude), add = TRUE, 
    col = "red")
title("Restaurants with Most and Least Useful Reviews (Green = Most, Red = Least)")


Restaurants with funnier and more useful reviews are located more on the Northern side of the town. Restaurants with less funny and less useful reviews are more common on the Southern side. Based on the plots, reviews of restaurants in Northern Phoenix are probably more insightful. Geographical location seems to have an impact on the quality of Yelp reviews.
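
The north/south pattern above comes from comparing contours by eye. A simple way to probe it would be to compare the latitudes of the two groups directly. The sketch below uses made-up latitudes for two hypothetical groups, not the actual `data.mostfunny` and `data.leastfunny` subsets. The Wilcoxon rank-sum test is one possible choice here, not a method used in the original analysis.

```r
# Made-up latitudes for two hypothetical groups of restaurants.
lat_most  <- c(33.52, 33.55, 33.61, 33.48, 33.58, 33.64)
lat_least <- c(33.41, 33.45, 33.50, 33.38, 33.47, 33.43)

# Difference in median latitude (positive means the "most" group
# sits further north).
median_gap <- median(lat_most) - median(lat_least)

# Rank-based test of whether the two groups differ in location.
test <- wilcox.test(lat_most, lat_least)

median_gap
test$p.value
```

A clear median gap with a small p-value would support the north/south reading. It would not by itself show that location causes review quality, since neighborhoods also differ in restaurant type and reviewer population.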