This question uses the lattice package. Show the code you used to generate the graphs. Use the state.region and state.x77 data sets.
Investigate the population density of USA states within the four regions: Northeast, South, North Central and West. Observe the graph of State Population as a Function of Area. Create Figure 1.
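library("lattice")
# state.x77 is a matrix; data.frame() converts it and renames "HS Grad" to HS.Grad
df77 <- data.frame(state.x77)
oldPar <- par(no.readonly=TRUE)
par(mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0))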
drawPanel <- function(x, y, ...) {
panel.xyplot(x, y, ...)
panel.lmline(x, y, col = 2)
}
xyplot(Area ~ Population | state.region,
df77,
layout = c(4, 1),
main = "State Population as a Function of Area",
panel = drawPanel)
Figure 1: xyplot(Area ~ Population | state.region)
There is one state in the West which has a relatively dense population (its area is relatively small compared to the number of people). Which state is that?
Answer: In Figure 1 this state is the outlier in the West panel, with by far the largest population. We can therefore identify it by finding the state with the maximum population in the West:
max_west <- max(df77$Population[state.region=="West"])
row.names(df77[df77$Population==max_west, ])
## [1] "California"
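As a cross-check, density can also be computed directly instead of being inferred from the maximum population. The following is a minimal sketch that assumes only the built-in state.x77 and state.region objects; the names west and west_density are illustrative and not part of the original solution.

```r
# Population is recorded in thousands and Area in square miles
west <- state.region == "West"
west_density <- state.x77[west, "Population"] / state.x77[west, "Area"]

# Western states ranked from most to least dense
west_density <- sort(west_density, decreasing = TRUE)
head(west_density, 3)
```

The first name in the sorted vector is the densest Western state under this definition.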
Use the cloud() function from the lattice package to create the 3D plot of States (Figure 2). Can you make this plot look more informative or interesting?
cloud(Income ~ Illiteracy * HS.Grad,
data = df77,
xlab="Illiteracy",
ylab="HSgrad",
zlab="Income",
xlim=range(df77$Illiteracy),
ylim=range(df77$HS.Grad),
zlim=range(df77$Income),
scales = list(distance = 1, arrows = FALSE))
Figure 2: cloud(Income ~ Illiteracy * HS.Grad)
Here is a slightly modified version of the same plot:
cloud(Income ~ Illiteracy * HS.Grad,
data = df77,
screen = list(z = 105, x = -70),
panel.aspect = 0.75,
xlab="Illiteracy",
ylab="HSgrad",
zlab="Income",
xlim=range(df77$Illiteracy),
ylim=range(df77$HS.Grad),
zlim=range(df77$Income),
col="red",
pch=20,
scales = list(distance = 1.25, arrows = FALSE))
Figure 2a: cloud(Income ~ Illiteracy * HS.Grad) with different cosmetics
For this question the HS Grad column has been divided into three groups. In the first group, the HS Grad percentage is less than 50. In the second group, the HS Grad percentage is between 50 and 57, while in the third group it is greater than 57. Use the coplot() function (part of base graphics rather than lattice) to create Figure 3.
HSGradGroup <- cut(df77$HS.Grad, c(0, 50, 57, Inf), labels = c(1, 2, 3))
coplot(Income ~ Illiteracy | HSGradGroup,
data = df77,
panel = panel.smooth,
rows = 1)
Figure 3: coplot(Income ~ Illiteracy | HSGradGroup)
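To see how the grouping used for Figure 3 splits the states, the bins can be tabulated. This is a small sketch that works from state.x77 directly; the names hs_grad and hs_group are illustrative.

```r
hs_grad <- state.x77[, "HS Grad"]

# right = TRUE (the default) gives the intervals (0,50], (50,57], (57,Inf]
hs_group <- cut(hs_grad, breaks = c(0, 50, 57, Inf), labels = c(1, 2, 3))

# Number of states that fall into each group
table(hs_group)
```

Because the intervals are closed on the right, a value of exactly 50 would land in group 1 and a value of exactly 57 in group 2.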
coplot(Murder ~ HS.Grad | state.region, df77,
rows = 1,
pch = 20,
col = state.region)
Figure 3a: coplot(Murder ~ HS.Grad | state.region)
Create the graphs for this question using the ggplot2 package.
library(ggplot2)
Recreate the graph of State Population as a Function of Area from Question 1a using the qplot() function from the ggplot2 package (Figure 4). Can you make the Population Axis look better?
# Reorganize the data so it is easier to work with.
df <- data.frame(state = state.name,
region = state.region,
state.x77[, c("Population", "Area")],
row.names = NULL)
qplot(Population, Area, data = df) +
labs(title = "State Population as a Function of Area") +
facet_grid(.~region) +
scale_x_continuous(breaks = seq(0, max(df$Population), 10000), limits = c(0, max(df$Population))) +
geom_smooth(method = "lm",
formula = y~x,
se = FALSE,
colour = "blue",
na.rm = TRUE,
fullrange = TRUE
)
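qplot() is deprecated as of ggplot2 3.4.0, so the same figure can also be built with ggplot(). The following is a sketch of one possible equivalent, not the assignment's required form; the object names states and p and the axis labels are illustrative.

```r
library(ggplot2)

# One row per state, with the region as a faceting variable
states <- data.frame(region = state.region,
                     state.x77[, c("Population", "Area")])

p <- ggplot(states, aes(x = Population, y = Area)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  facet_grid(. ~ region) +
  labs(title = "State Population as a Function of Area",
       x = "Population (thousands)",
       y = "Area (square miles)")
p
```

Spelling out the units in the axis labels is one way to make the Population axis easier to read.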
Question 2a
Investigate the land areas of USA regions. State regions can be found in the state.region data set. Use the qplot() function from the ggplot2 package. Show the code you used to generate the graphs. Use the data sets state.x77 and state.region. Notice the axes of the charts. Use the grid.arrange() function from the gridExtra package to put two plots on the same page. Create the graphs and charts exactly as in Figure 5.
library(gridExtra)
df <- data.frame(state = state.name,
region = state.region,
state.x77[, c("Illiteracy", "Income", "HS Grad")],
row.names = NULL)
p1 <- qplot(Illiteracy, Income, data = df, geom = "jitter", asp = 1) +
aes(colour = as.factor(region)) +
labs(title = "Illiteracy vs Income")
p2 <- qplot(HS.Grad, Income, data = df, geom = "jitter", asp = 1) +
aes(colour = as.factor(region)) +
labs(title = "Percent of HS graduation vs Income")
grid.arrange(p1, p2, nrow=2)
Two Plots on the Same Figure
The figure for this question is a replica of one from Wickham’s ggplot2 text. To produce the figure, use the following data frame:
df <- data.frame(
x=c(3,1,5),
y=c(2,4,6),
label=c("a","b","c")
)
Next create a variable named myPlot as follows:
myPlot <- ggplot(df, aes(x, y, label = label)) +
xlab(NULL) + ylab(NULL)
Now create eight graphs. Each of these starts with myPlot and adds only one geom layer along with a title. The first graph adds a geom_point() layer and sets the title with ggtitle("geom_point"). Recall that you need an extra line of code (i.e., the line pP1) to get the plot to print.
pP1 <- myPlot + geom_point() + ggtitle("geom_point")
pP1
Each of the remaining seven graphs needs to have one of the following layers added: geom_bar(stat = "identity"), geom_line(), geom_area(), geom_path(), geom_text(), geom_tile(), geom_polygon(). Exactly replicate the graph as shown in Figure 6. Show your code and explain the charts.
pP1 <- myPlot + geom_point() + ggtitle("geom_point")
pP2 <- myPlot + geom_bar(stat = "identity") + ggtitle("geom_bar")
pP3 <- myPlot + geom_line() + ggtitle("geom_line")
pP4 <- myPlot + geom_area() + ggtitle("geom_area")
pP5 <- myPlot + geom_path() + ggtitle("geom_path")
pP6 <- myPlot + geom_text() + ggtitle("geom_text")
pP7 <- myPlot + geom_tile() + ggtitle("geom_tile")
pP8 <- myPlot + geom_polygon() + ggtitle("geom_polygon")
grid.arrange(pP1, pP2, pP3, pP4, pP5, pP6, pP7, pP8, nrow=2, ncol=4)
Example: geom_xxxx()
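One difference worth noting when explaining these charts is that geom_line() and geom_path() connect the same three points in different orders. The sketch below reproduces that ordering in base R without drawing anything; the names pts, path_order and line_order are illustrative.

```r
pts <- data.frame(x = c(3, 1, 5), y = c(2, 4, 6),
                  label = c("a", "b", "c"),
                  stringsAsFactors = FALSE)

# geom_path() joins points in the order they appear in the data
path_order <- pts$label

# geom_line() joins points in order of increasing x
line_order <- pts$label[order(pts$x)]

path_order  # "a" "b" "c"
line_order  # "b" "a" "c"
```

This is why the path panel starts at point a (x = 3), while the line panel starts at point b (x = 1).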
The following few lines of code will be useful:
# install.packages("maps")
library(maps)
stateMap <- map_data("state")
Create a population density map of the United States in the year 1977. Make the map look exactly like the one in Figure 7. Notice that keywidth is set to 3 in order to make the legend wider. Check out the annotate() function described in the geom_text() documentation. The state with the maximum population density at that time was New Jersey, while the minimum was in Alaska.
df <- data.frame(state = tolower(state.name),
region = state.region,
state.x77[, c("Population", "Area")],
row.names = NULL)
names(df) <- tolower(names(df))
stateMap <- map_data("state")
# rename `region` to `state` (by name, not position) so that `merge` can join on it.
names(stateMap)[names(stateMap) == "region"] <- "state"
df <- merge(stateMap, df)
df$density <- df$population / df$area
df <- df[order(df$order),]
qplot(long, lat, data=df, group=group, fill=density, geom="polygon") +
labs(title = "Population Density (1000 people/square miles) of the USA in 1977") +
borders("state", colour = "white", size = 0.1) +
coord_fixed(1.3)
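The maximum and minimum densities quoted in the question can be checked against state.x77 directly. This is a minimal sketch; the names density, densest and sparsest are illustrative.

```r
# Population is in thousands and Area in square miles
density <- state.x77[, "Population"] / state.x77[, "Area"]

# Names of the states with the highest and lowest density
densest  <- names(which.max(density))
sparsest <- names(which.min(density))
c(densest = densest, sparsest = sparsest)
```

Note that the "state" map from the maps package covers only the contiguous states, so Alaska and Hawaii drop out of the merged data and do not appear on the map itself.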
What about using the data at http://www.census.gov/popest/data/state/totals/2013/index.html to create a more current version of the map in Figure 7?
Let’s set up a data frame that will make plotting easier:
remote_data <- read.csv("http://www.census.gov/popest/data/national/totals/2013/files/NST-EST2013-popchg2010_2013.csv")
# we are using the remote data for only name and the most recent
# population estimation.
df <- subset(remote_data,
STATE > 0 & NAME %in% state.name,
select = c(NAME, POPESTIMATE2013))
rownames(df) <- NULL
names(df) <- c("state", "population")
df$state <- tolower(df$state)
# Area comes from state.x77; match by state name rather than by row position
idx <- match(df$state, tolower(state.name))
df$area <- unname(state.x77[idx, "Area"])
# density in thousands of people per square mile
df$density <- (df$population / 1000) / df$area
df$division <- state.division[idx]
df$region <- state.region[idx]
# unify column name case (i.e., all lower case)
names(df) <- tolower(names(df))
# replace `region` with `state` so that `merge` could happen.
colnames(stateMap)[5] <- "state"
df <- merge(stateMap, df)
df <- df[order(df$order),]
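Combining the census figures with state.x77 only gives correct densities if each area is paired with the right state. A minimal sketch of name-based alignment with match() follows, using a hypothetical three-row data frame (est) whose rows are deliberately out of state.name order.

```r
# Hypothetical input whose rows are not in state.name order
est <- data.frame(state = c("texas", "alaska", "ohio"),
                  stringsAsFactors = FALSE)

# Find each state's position in state.name, then index the built-in data by it
idx <- match(est$state, tolower(state.name))
est$area   <- unname(state.x77[idx, "Area"])
est$region <- state.region[idx]
est
```

Matching by name keeps the result correct even if the downloaded file changes its row order or includes extra rows.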
Here are the first six rows of the data frame we will use:
head(df, n = 6L)
## state long lat group order subregion population area
## 1 alabama -87.46201 30.38968 1 1 <NA> 4833722 50708
## 2 alabama -87.48493 30.37249 1 2 <NA> 4833722 50708
## 6 alabama -87.52503 30.37249 1 3 <NA> 4833722 50708
## 7 alabama -87.53076 30.33239 1 4 <NA> 4833722 50708
## 8 alabama -87.57087 30.32665 1 5 <NA> 4833722 50708
## 9 alabama -87.58806 30.32665 1 6 <NA> 4833722 50708
## density division region
## 1 0.09532464 East South Central South
## 2 0.09532464 East South Central South
## 6 0.09532464 East South Central South
## 7 0.09532464 East South Central South
## 8 0.09532464 East South Central South
## 9 0.09532464 East South Central South
qplot(long, lat, data=df, group=group, fill=density, geom="polygon") +
labs(title = "Population Density (1000 people/square miles) of the USA in 2013") +
borders("state", colour = "white", size = 0.1) +
coord_fixed(1.3)
Use the ggplot2 package. Show the code you used to generate the graphs. What about coloring a USA map using either the state.division or state.region from the state data set?
Building on the data set we created in Question 4e:
qplot(long, lat, data=df, group=group, fill=division, geom="polygon") +
labs(title = "USA state divisions") +
borders("state", colour = "white", size = 0.1) +
coord_fixed(1.3)
qplot(long, lat, data=df, group=group, fill=region, geom="polygon") +
labs(title = "USA state regions") +
borders("state", colour = "white", size = 0.1) +
coord_fixed(1.3)