Shige
library(Zelig)
data(turnout)
hist(turnout$educate)
library(lattice)
histogram(~ educate, data=turnout)
ggplot2 is based on the book The Grammar of Graphics, hence the "gg" in its name:
library(ggplot2)
p <- ggplot(turnout, aes(x=educate)) + geom_histogram()
print(p)
library(ggthemes)
p1 <- ggplot(turnout, aes(educate)) + geom_histogram() + theme_tufte()
print(p1)
Here are some more themes:
p2 <- ggplot(turnout, aes(educate)) + geom_histogram() + theme_stata()
print(p2)
p3 <- ggplot(turnout, aes(educate)) + geom_histogram() + theme_economist()
print(p3)
p4 <- ggplot(turnout, aes(educate)) + geom_histogram() + theme_excel()
print(p4)
Faceting splits the histogram into panels by race, by vote, or by both:
ggplot(turnout, aes(educate)) + geom_histogram() + facet_grid(race ~ .)
ggplot(turnout, aes(educate)) + geom_histogram() + facet_grid(vote ~ .)
ggplot(turnout, aes(educate)) + geom_histogram() + facet_grid(race ~ vote)
If you like a frequency polygon:
ggplot(turnout, aes(educate)) + geom_freqpoly() + facet_grid(race ~ vote)
Education by race:
ggplot(turnout, aes(y=educate, x=race)) + geom_boxplot()
Let's create the data:
dat <- data.frame(xval=1:4, yval=c(3,5,6,9), group=c("A","B","A","B"))
names(dat)
[1] "xval" "yval" "group"
A basic ggplot() specification looks like the following. It creates a ggplot object from the data frame dat and sets the default aesthetic mappings within aes():
ggplot(dat, aes(x = xval, y = yval))
We also need to tell ggplot() what geometric objects (e.g., bars, points, lines) to put there. Let's begin with a scatter plot:
ggplot(dat, aes(x = xval, y = yval)) + geom_point()
We can just as easily turn it into a line chart:
ggplot(dat, aes(x=xval, y=yval)) + geom_line()
Drawing both points and lines is even better:
ggplot(dat, aes(x=xval, y=yval)) + geom_point() + geom_line()
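Since every call above repeats the same ggplot() specification, it can be stored once and reused. The following is a minimal sketch of that idea; the object names base, scatter, line, and both are illustrative and not part of the original example.

```r
library(ggplot2)

# Same toy data as above, recreated so the block runs on its own
dat <- data.frame(xval = 1:4, yval = c(3, 5, 6, 9), group = c("A", "B", "A", "B"))

# Store the data and default aesthetics once (object names are illustrative)
base <- ggplot(dat, aes(x = xval, y = yval))

# Each geom is added as a separate layer on top of the same base
scatter <- base + geom_point()
line    <- base + geom_line()
both    <- base + geom_point() + geom_line()

print(both)
```

Nothing is drawn until the object is printed, so the base object can be extended in different directions without retyping the mappings.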
ggplot(dat, aes(x=xval, y=yval)) + geom_point(aes(colour=group))
ggplot(dat, aes(x=xval, y=yval)) + geom_point(aes(colour=group)) + geom_line(aes(colour=group))
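When several layers share the same colour mapping, it can be declared once in ggplot() instead, because layers inherit the aesthetics set there. A short sketch, assuming the same dat data frame (the name p_col is illustrative):

```r
library(ggplot2)

# Same toy data as above, recreated so the block runs on its own
dat <- data.frame(xval = 1:4, yval = c(3, 5, 6, 9), group = c("A", "B", "A", "B"))

# colour = group is set once and inherited by both the point and line layers
p_col <- ggplot(dat, aes(x = xval, y = yval, colour = group)) +
  geom_point() +
  geom_line()

print(p_col)
```

Putting the mapping inside an individual geom, as in the calls above, is still useful when only one layer should be coloured.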
ggplot(dat, aes(x = xval, y = yval)) + geom_point() + geom_smooth(method = "lm")
As you can see, it is just another layer.
ggplot(dat, aes(x = xval, y = yval)) + geom_point() + stat_smooth(method = "lm")
It turns out that geom_smooth() and stat_smooth() are effectively interchangeable: both add the same smoothing layer. stat_smooth() is one of the many statistical transformations that can be used with ggplot().
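With method = "lm", the line drawn by the smoother is an ordinary least-squares fit. As a rough check, the same fit can be computed directly with base R's lm() on the dat data frame (recreated here so the block runs on its own; the name fit is illustrative):

```r
# Same toy data as above
dat <- data.frame(xval = 1:4, yval = c(3, 5, 6, 9), group = c("A", "B", "A", "B"))

# Ordinary least-squares fit of yval on xval
fit <- lm(yval ~ xval, data = dat)

# Intercept and slope of the fitted line: 1.0 and 1.9
coef(fit)

# Fitted values at xval = 1, 2, 3, 4: 2.9, 4.8, 6.7, 8.6
predict(fit)
```

The grey band that ggplot2 draws around the line is a confidence interval for this fit.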
ggplot(turnout, aes(x=educate,y=income)) + geom_point() + geom_smooth(method="lm")
ggplot(turnout, aes(x=educate,y=income)) + geom_point() + geom_smooth()
Linear smoother
ggplot(turnout, aes(x=educate,y=income, colour=race)) + geom_point() + scale_colour_manual(values=c("red", "blue")) + geom_smooth(method="lm")
Nonlinear smoother
ggplot(turnout, aes(x=educate,y=income, colour=race)) + geom_point() + scale_colour_manual(values=c("red", "blue")) + geom_smooth()
Mean
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(turnout, aes(x = educate, y = income, colour = race)) + scale_colour_manual(values = c("red", "blue")) + stat_summary(fun = mean, geom = "point")
Median
# fun replaces fun.y, which is deprecated since ggplot2 3.3.0
ggplot(turnout, aes(x = educate, y = income, colour = race)) + scale_colour_manual(values = c("red", "blue")) + stat_summary(fun = median, geom = "point")
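What stat_summary() computes in these two plots is the mean (or median) of income at each value of educate, separately for each race. A rough base-R sketch of the same calculation, using a small made-up data frame (toy and its values are hypothetical, not taken from the turnout data):

```r
# Hypothetical toy data standing in for turnout
toy <- data.frame(
  educate = c(10, 10, 12, 12, 12),
  race    = c("white", "white", "white", "others", "others"),
  income  = c(2, 4, 3, 1, 6)
)

# Mean income for each combination of educate and race
means <- aggregate(income ~ educate + race, data = toy, FUN = mean)

# Median income for the same groups
medians <- aggregate(income ~ educate + race, data = toy, FUN = median)

print(means)
print(medians)
```

Each row of means corresponds to one point that stat_summary() would draw for that group.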
| race | age | educate | income | vote |
|---|---|---|---|---|
| white | 60 | 14 | 3.3458 | 1 |
| white | 51 | 10 | 1.8561 | 0 |
| white | 24 | 12 | 0.6304 | 0 |
| white | 38 | 8 | 3.4183 | 1 |
| white | 25 | 12 | 2.7852 | 1 |
These are the first 5 cases of the original data. Suppose we want to summarize the data by group, with one row of means per race:
| race | educate | income | vote |
|---|---|---|---|
| others | 11.04 | 2.927 | 0.6267 |
| white | 12.24 | 4.051 | 0.7664 |
Or
| age | educate | income | vote |
|---|---|---|---|
| 17 | 14.0 | 6.78 | 1.000 |
| 18 | 11.2 | 2.87 | 0.364 |
| 19 | 12.5 | 3.43 | 0.714 |
| 20 | 13.1 | 3.15 | 0.467 |
| 21 | 12.3 | 3.07 | 0.638 |
| 22 | 12.1 | 2.57 | 0.477 |
| 23 | 12.5 | 2.44 | 0.535 |
| 24 | 13.0 | 3.44 | 0.600 |
| 25 | 12.9 | 3.92 | 0.628 |
| 26 | 13.0 | 3.57 | 0.755 |
| 27 | 12.9 | 4.10 | 0.722 |
| 28 | 12.8 | 3.70 | 0.692 |
| 29 | 13.1 | 3.93 | 0.698 |
| 30 | 13.5 | 4.48 | 0.595 |
| 31 | 13.1 | 4.07 | 0.712 |
| 32 | 12.9 | 4.21 | 0.780 |
| 33 | 13.3 | 3.86 | 0.766 |
| 34 | 12.7 | 4.27 | 0.806 |
| 35 | 12.9 | 4.60 | 0.771 |
| 36 | 13.2 | 4.53 | 0.688 |
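The reshape2 package has two main commands, “melt” and “cast”: melt stacks the measured variables into long format, and cast aggregates the long data back into the shape we want.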
For our example, the first step looks like this:
library(reshape2)
new <- melt(turnout, id = c("race", "age"))
This creates the following data:
| race | age | variable | value |
|---|---|---|---|
| white | 17 | educate | 14.0000 |
| white | 17 | income | 6.7838 |
| white | 17 | vote | 1.0000 |
| others | 18 | educate | 10.0000 |
| others | 18 | income | 0.9457 |
| others | 18 | vote | 0.0000 |
| white | 18 | educate | 12.0000 |
| white | 18 | educate | 12.0000 |
| white | 18 | educate | 12.0000 |
| white | 18 | educate | 12.0000 |
| white | 18 | educate | 12.0000 |
| white | 18 | educate | 9.0000 |
| white | 18 | educate | 13.0000 |
| white | 18 | educate | 10.0000 |
| white | 18 | educate | 10.0000 |
| white | 18 | educate | 11.0000 |
| white | 18 | income | 7.0036 |
| white | 18 | income | 0.1936 |
| white | 18 | income | 4.6690 |
| white | 18 | income | 3.7972 |
| white | 18 | income | 6.2740 |
| white | 18 | income | 0.7294 |
| white | 18 | income | 0.9214 |
| white | 18 | income | 3.1356 |
| white | 18 | income | 0.2364 |
Now we can “cast” the data into the form that we want:
new_cast <- dcast(new, race ~ variable, mean)
produces the following:
| race | educate | income | vote |
|---|---|---|---|
| others | 11.04 | 2.927 | 0.6267 |
| white | 12.24 | 4.051 | 0.7664 |
And
new_cast <- dcast(new, age + race ~ variable, mean)
produces:
| age | race | educate | income | vote |
|---|---|---|---|---|
| 17 | white | 14.00 | 6.7838 | 1.0000 |
| 18 | others | 10.00 | 0.9457 | 0.0000 |
| 18 | white | 11.30 | 3.0608 | 0.4000 |
| 19 | others | 12.00 | 2.8296 | 0.3333 |
| 19 | white | 12.64 | 3.5957 | 0.8182 |
| 20 | others | 14.00 | 2.8769 | 0.2500 |
| 20 | white | 12.73 | 3.2490 | 0.5455 |
| 21 | others | 12.09 | 2.2023 | 0.7273 |
| 21 | white | 12.31 | 3.3388 | 0.6111 |
| 22 | others | 11.70 | 1.6117 | 0.5000 |
| 22 | white | 12.21 | 2.8552 | 0.4706 |
| 23 | others | 12.29 | 1.2820 | 0.2857 |
| 23 | white | 12.53 | 2.6662 | 0.5833 |
| 24 | others | 11.75 | 3.6401 | 0.3750 |
| 24 | white | 13.24 | 3.3950 | 0.6486 |
| 25 | others | 11.71 | 3.6925 | 0.4286 |
| 25 | white | 13.17 | 3.9697 | 0.6667 |
| 26 | others | 11.73 | 2.6948 | 0.7273 |
| 26 | white | 13.33 | 3.8026 | 0.7619 |
| 27 | others | 12.80 | 3.1278 | 0.8000 |
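The melt and cast steps above can be tried end to end on a tiny made-up data frame. This is only a sketch: toy and its values are hypothetical, and the last call shows a formula (age ~ variable) that would give one row of means per age, like the age table shown earlier.

```r
library(reshape2)

# Hypothetical toy data with the same columns as turnout
toy <- data.frame(
  race    = c("white", "white", "others", "others"),
  age     = c(20, 30, 20, 30),
  educate = c(12, 14, 10, 12),
  income  = c(3, 5, 2, 4),
  vote    = c(1, 1, 0, 1)
)

# Step 1: melt to long format, keeping race and age as identifiers
long <- melt(toy, id.vars = c("race", "age"))

# Step 2: cast back to wide format, averaging within each group
by_race <- dcast(long, race ~ variable, fun.aggregate = mean)
by_age  <- dcast(long, age ~ variable, fun.aggregate = mean)

print(by_race)
print(by_age)
```

The left-hand side of the formula names the grouping columns, so changing it is all that is needed to switch between the summaries.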
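ggmap is a handy map-making package built on ggplot2. Note that recent versions of ggmap need a Google Maps API key, registered with register_google(), before qmap() can fetch Google map tiles.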
library(ggmap)
nyc <- "new york city"
qmap(nyc, zoom=12)