Question 1

This question uses the lattice package. Show the code you used to generate the graphs. Use the state.region and state.x77 data sets.

Question 1a

Investigate the population density of USA states within the four regions: Northeast, South, North Central and West. Observe the graph of State Population as a Function of Area. Create Figure 1.

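library("lattice")
# Assumption: df77 is state.x77 (a matrix) converted to a data frame,
# so that the column "HS Grad" becomes HS.Grad.
df77 <- data.frame(state.x77)
oldPar <- par(no.readonly=TRUE)
par(mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0))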

drawPanel <- function(x, y, ...) {
  panel.xyplot(x, y, ...)
  panel.lmline(x, y, col = 2)
}

xyplot(Area ~ Population | state.region, 
       df77, 
       layout = c(4, 1),
       main = "State Population as a Function of Area", 
       panel = drawPanel)
Figure 1: xyplot(Area ~ Population | state.region)

Question 1b

There is one state in the West which has a relatively dense population (its area is small relative to the number of people). Which state is that?

Answer: In the West panel this state stands out because of its very large population, so we can identify it by finding the most populous state in the West:

max_west <- max(df77$Population[state.region=="West"])
row.names(df77[df77$Population==max_west, ])
## [1] "California"
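
The answer above relies on the densest Western state also being the most populous one. As a cross-check, here is a minimal sketch that computes the density directly. It assumes df77 is simply state.x77 converted to a data frame, and the names west and densest_west are introduced only for this illustration.

```r
# Assumption: df77 is state.x77 converted to a data frame.
df77 <- data.frame(state.x77)

# Keep only the states in the West region.
west <- df77[state.region == "West", ]

# Population is recorded in thousands and Area in square miles,
# so Density is in thousands of people per square mile.
west$Density <- west$Population / west$Area

# Name of the Western state with the highest density.
densest_west <- rownames(west)[which.max(west$Density)]
densest_west
```

Sorting west by Density shows how close the runner-up is, which the population-only shortcut cannot reveal.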

Question 1c

Use the cloud() function from the lattice package to create the 3D plot of States (Figure 2). Can you make this plot look more informative or interesting?

cloud(Income ~ Illiteracy * HS.Grad, 
      data = df77,
      xlab="Illiteracy",
      ylab="HSgrad",
      zlab="Income",
      xlim=range(df77$Illiteracy), 
      ylim=range(df77$HS.Grad), 
      zlim=range(df77$Income), 
      scales = list(distance = 1, arrows = FALSE))
Figure 2: cloud(Income ~ Illiteracy * HS.Grad)

Here is a slightly modified version of the same plot:

cloud(Income ~ Illiteracy * HS.Grad, 
      data = df77,
      screen = list(z = 105, x = -70), 
      panel.aspect = 0.75,
      xlab="Illiteracy",
      ylab="HSgrad",
      zlab="Income",
      xlim=range(df77$Illiteracy), 
      ylim=range(df77$HS.Grad), 
      zlim=range(df77$Income),
      col="red",
      pch=20,
      scales = list(distance = 1.25, arrows = FALSE))
Figure 2a: cloud(Income ~ Illiteracy * HS.Grad) with different cosmetics

Question 1d

For this question the HS Grad column has been divided into three groups. For the first group, the HS Grad percentage is less than 50. In the second group the HS Grad percentage is between 50 and 57, while the HS Grad percentage for the third group is greater than 57. Use the lattice coplot() function to create Figure 3.

HSGradGroup <- cut(df77$HS.Grad, c(0, 50, 57, Inf), labels = c(1, 2, 3))
coplot(Income ~ Illiteracy | HSGradGroup, 
       data = df77,
       panel = panel.smooth,
       rows = 1)
Figure 3: coplot(Income ~ Illiteracy | HSGradGroup)
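
To see how cut() assigns states to the three groups, the following sketch tabulates the group sizes and the HS.Grad range inside each group. It assumes df77 is state.x77 converted to a data frame, and group_sizes is a name used only here. Note that cut() builds right-closed intervals by default, so a value of exactly 50 would fall into group 1 and a value of exactly 57 into group 2.

```r
# Assumption: df77 is state.x77 converted to a data frame.
df77 <- data.frame(state.x77)

# Default intervals are right-closed: (0,50], (50,57], (57,Inf].
HSGradGroup <- cut(df77$HS.Grad, c(0, 50, 57, Inf), labels = c(1, 2, 3))

# Number of states in each group.
group_sizes <- table(HSGradGroup)
group_sizes

# Smallest and largest HS.Grad value in each group, to confirm the boundaries.
tapply(df77$HS.Grad, HSGradGroup, range)
```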

Question 1e

coplot(Murder ~ HS.Grad | state.region, df77,
       rows = 1,
       pch = 20,
       col = state.region)
Figure 3a: coplot(Murder ~ HS.Grad | state.region)


Question 2

Create the graphs for this question using the ggplot2 package.

library(ggplot2)

Question 2a

Recreate the graph of State Population as a Function of Area from Question 1a using the qplot() function from the ggplot2 package (Figure 4). Can you make the Population Axis look better?

# Reorganize the data so it is easier to work with.
df <- data.frame(state = state.name, 
                 region = state.region,
                 state.x77[, c("Population", "Area")],
                 row.names = NULL)

qplot(Population, Area, data = df) +
  labs(title = "State Population as a Function of Area") +
  facet_grid(.~region) +
  scale_x_continuous(breaks = seq(0, max(df$Population), 10000), limits = c(0, max(df$Population))) +
  geom_smooth(method = "lm", 
              formula = y~x, 
              se = FALSE, 
              colour = "blue", 
              na.rm = TRUE,
              fullrange = TRUE
              )
Question 2a

Question 2b

Investigate the land areas of USA regions. State regions can be found in the state.region data set. Use the qplot() function from the ggplot2 package. Show the code you used to generate the graphs. Use the data sets state.x77 and state.region. Notice the axes of the charts. Use the grid.arrange() function from the gridExtra package to put two plots on the same page. Create the graphs and charts exactly as in Figure 5.

library(gridExtra)

df <- data.frame(state = state.name, 
                 region = state.region,
                 state.x77[, c("Illiteracy", "Income", "HS Grad")],
                 row.names = NULL)

p1 <- qplot(Illiteracy, Income, data = df, geom = "jitter", asp = 1) +
  aes(colour = as.factor(region)) +
  labs(title = "Illiteracy vs Income")

p2 <- qplot(HS.Grad, Income, data = df, geom = "jitter", asp = 1) +
  aes(colour = as.factor(region)) +
  labs(title = "Percent of HS graduation vs Income")

grid.arrange(p1, p2, nrow=2)
Two Plots on the Same Figure


Question 3

The figure for this question is a replica of one from Wickham’s ggplot2 text. To produce the figure, use the following data frame:

df <- data.frame(
  x=c(3,1,5),
  y=c(2,4,6),
  label=c("a","b","c")
)

Next create a variable named myPlot as follows:

myPlot <- ggplot(df, aes(x, y, label = label)) +
  xlab(NULL) + ylab(NULL)

Now create eight graphs. Each of these starts with myPlot and adds only one geom layer along with a title. The first graph adds a geom_point() layer and sets the title with ggtitle("geom_point"). Recall that you need an extra line of code (i.e. the line pP1) to get the plot to print.

pP1 <- myPlot + geom_point() + ggtitle("geom_point")
pP1

Each of the remaining seven graphs needs one of the following layers added: geom_bar(stat="identity"), geom_line(), geom_area(), geom_path(), geom_text(), geom_tile(), geom_polygon(). Exactly replicate the graph shown in Figure 6. Show your code and explain the charts.

pP1 <- myPlot + geom_point() + ggtitle("geom_point")
pP2 <- myPlot + geom_bar(stat = "identity") + ggtitle("geom_bar")
pP3 <- myPlot + geom_line() + ggtitle("geom_line")
pP4 <- myPlot + geom_area() + ggtitle("geom_area")
pP5 <- myPlot + geom_path() + ggtitle("geom_path")
pP6 <- myPlot + geom_text() + ggtitle("geom_text")
pP7 <- myPlot + geom_tile() + ggtitle("geom_tile")
pP8 <- myPlot + geom_polygon() + ggtitle("geom_polygon")
grid.arrange(pP1, pP2, pP3, pP4, pP5, pP6, pP7, pP8, nrow=2, ncol=4)
Example: geom xxxx()
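
As a partial explanation of the charts, the geom_line and geom_path panels differ only in the order in which the three points are connected: geom_path() joins observations in the order they appear in the data, while geom_line() joins them in order of x. The sketch below reproduces both orders in base R for the data frame from this question. The names path_order and line_order are introduced only for this illustration.

```r
# Same data frame as in the question.
df <- data.frame(
  x = c(3, 1, 5),
  y = c(2, 4, 6),
  label = c("a", "b", "c")
)

# geom_path() connects points in data (row) order.
path_order <- df$label

# geom_line() connects points sorted by their x value.
line_order <- df$label[order(df$x)]

path_order
line_order
```

Because x is not sorted in df, the two orders differ, which is why the two panels show different lines through the same points.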


Question 4

The following few lines of code will be useful:

# install.packages("maps")
library(maps)
stateMap <- map_data("state")

Question 4a

Create a population density map of the United States in the year 1977. Make the map look exactly like the one in Figure 7. Notice that keywidth is set to 3, in order to make the legend wider. Check out the annotate() function described in the geom_text() documentation. The state with the maximum population density at that time was New Jersey, while the minimum was in Alaska.

df <- data.frame(state = tolower(state.name), 
                 region = state.region,
                 state.x77[, c("Population", "Area")],
                 row.names = NULL)
names(df) <- tolower(names(df))

stateMap <- map_data("state")
# Rename `region` to `state` so that merge() can join on that column.
colnames(stateMap)[5] <- "state"

df <- merge(stateMap, df)
df$density <- df$population / df$area
df <- df[order(df$order),]

qplot(long, lat, data=df, group=group, fill=density, geom="polygon") +
  labs(title = "Population Density (1000 people/square miles) of the USA in 1977") +
  borders("state", colour = "white", size = 0.1) +
  coord_fixed(1.3)
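
The statement in the question about the densest and sparsest states can be checked straight from state.x77, without the map data. This is a small sketch; the names density, densest and sparsest are used only here. Note that the merged map data frame above contains only the states present in map_data("state"), so the check is done on the full matrix instead.

```r
# Population is in thousands and Area in square miles,
# so density is in thousands of people per square mile.
density <- state.x77[, "Population"] / state.x77[, "Area"]

# State names are carried along as the names of the vector.
densest  <- names(which.max(density))
sparsest <- names(which.min(density))

c(max = densest, min = sparsest)
```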

Question 4b

What about using the data at http://www.census.gov/popest/data/state/totals/2013/index.html to create a more current version of the map in Figure 7?

Let’s set up a data frame that will make plotting easier:

remote_data <- read.csv("http://www.census.gov/popest/data/national/totals/2013/files/NST-EST2013-popchg2010_2013.csv")

# We use the remote data only for the state name and the most recent
# population estimate.
df <- subset(remote_data,
             STATE > 0 & NAME %in% state.name,
             select = c(NAME, POPESTIMATE2013))
rownames(df) <- NULL
names(df) <- c("state", "population")
df$state <- tolower(df$state)
# Area will come from the state.x77 data
df <- cbind(df, area=NA, density=NA, division = state.division, region=state.region)

for (i in 1:50) {
  df$area[i] <- df77$Area[i]
  df$density[i] <- (df$population[i] / 1000) / df$area[i]
}
# Unify column-name case (all names become lower case).
names(df) <- tolower(names(df))


# Rename `region` to `state` so that merge() can join on that column.
colnames(stateMap)[5] <- "state"

df <- merge(stateMap, df)
df <- df[order(df$order),]
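
The loop above copies areas by row position, which only works if the downloaded rows are in the same order as state.x77. A more defensive variant looks up each state by name with match(). The sketch below is an illustration only: census is a hypothetical stand-in for the downloaded table (the same 50 state names in shuffled order), and df77 is assumed to be state.x77 converted to a data frame.

```r
# Assumption: df77 is state.x77 converted to a data frame.
df77 <- data.frame(state.x77)

# Hypothetical stand-in for the downloaded table: same states, shuffled rows.
set.seed(1)
census <- data.frame(state = sample(state.name))

# Find the row of df77 that corresponds to each state name.
idx <- match(census$state, rownames(df77))

# Copy the area by name rather than by position.
census$area <- df77$Area[idx]

head(census)
```

Any state name missing from df77 would show up as NA in idx, which makes mismatches visible instead of silently assigning the wrong area.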

Here are the first six rows of the data frame we are going to use:

head(df, n = 6L)
##     state      long      lat group order subregion population  area
## 1 alabama -87.46201 30.38968     1     1      <NA>    4833722 50708
## 2 alabama -87.48493 30.37249     1     2      <NA>    4833722 50708
## 6 alabama -87.52503 30.37249     1     3      <NA>    4833722 50708
## 7 alabama -87.53076 30.33239     1     4      <NA>    4833722 50708
## 8 alabama -87.57087 30.32665     1     5      <NA>    4833722 50708
## 9 alabama -87.58806 30.32665     1     6      <NA>    4833722 50708
##      density           division region
## 1 0.09532464 East South Central  South
## 2 0.09532464 East South Central  South
## 6 0.09532464 East South Central  South
## 7 0.09532464 East South Central  South
## 8 0.09532464 East South Central  South
## 9 0.09532464 East South Central  South

qplot(long, lat, data=df, group=group, fill=density, geom="polygon") +
  labs(title = "Population Density (1000 people/square miles) of the USA in 2013") +
  borders("state", colour = "white", size = 0.1) +
  coord_fixed(1.3)


Question 5

Use the ggplot2 package. Show the code you used to generate the graphs. What about coloring a USA map using either the state.division or state.region from the state data set?

Building on the data frame we created in Question 4b:

qplot(long, lat, data=df, group=group, fill=division, geom="polygon") +
  labs(title = "USA state divisions") +
  borders("state", colour = "white", size = 0.1) +
  coord_fixed(1.3)

qplot(long, lat, data=df, group=group, fill=region, geom="polygon") +
  labs(title = "USA state regions") +
  borders("state", colour = "white", size = 0.1) +
  coord_fixed(1.3)