Name: Lars Olson

if (!require("knitr")) {
    install.packages("knitr")  # do this once per lifetime
    require("knitr")  # do this once per session
}
require("MASS")

## Loading required package: MASS

Convert the “chas” variable to a factor with labels “off” and “on” (referring to the Charles river).

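# Make a modifiable copy of the MASS data set in the workspace.
Boston = Boston
# Recode the 0/1 indicator as a factor labeled "off"/"on".
Boston$chas = factor(Boston$chas, levels = c(0, 1), labels = c("off", "on"))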
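
The recoding above can be checked on a small vector. This is only a sketch: `chas_raw` is a made-up 0/1 vector standing in for `Boston$chas`, not the real data. It also shows why the conversion should be run only once: applying it again to the already labeled factor finds no 0/1 values and yields `NA`s.

```r
# Hypothetical 0/1 indicator standing in for Boston$chas (made-up values).
chas_raw <- c(0, 1, 0, 0, 1)

# Same call as in the answer: 0 becomes "off", 1 becomes "on".
chas_fac <- factor(chas_raw, levels = c(0, 1), labels = c("off", "on"))
print(levels(chas_fac))
print(table(chas_fac))

# Re-running the conversion on the labeled factor matches nothing,
# because its values are now "off"/"on" rather than 0/1.
chas_twice <- factor(chas_fac, levels = c(0, 1), labels = c("off", "on"))
print(all(is.na(chas_twice)))
```

If the chunk has to be re-run in the same session, starting again from a fresh copy of the data avoids the all-`NA` result.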

How many rows are in the Boston data frame? How many columns?

columns = ncol(Boston)
cat(sep = "", "The number of columns is ", columns, "\n")

## The number of columns is 14

rows = nrow(Boston)
cat(sep = "", "The number of rows is ", rows, "\n")

## The number of rows is 506
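
As an alternative to separate `ncol()` and `nrow()` calls, `dim()` returns both counts at once. A minimal sketch, using a made-up data frame `toy` in place of `Boston`:

```r
# Hypothetical toy data frame standing in for Boston (made-up values).
toy <- data.frame(tax = c(296, 242, 666), chas = c(0, 0, 1))

# dim() returns c(number of rows, number of columns).
size <- dim(toy)
cat(sep = "", "rows: ", size[1], ", columns: ", size[2], "\n")
```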

What does a row represent?

Each row in this data frame represents a unique suburb in the Boston area, containing statistics related to that suburb.

What does a column represent?

Each column in this data frame represents a different statistic that is being measured for each suburb. Abstractly, a column is the set of all observations for a particular variable.

Make a density plot of tax rates. Add the data points to the plot via the function call, `rug(Boston$tax)`. (I neglected to include `rug()` in the handout.)

plot(density(Boston$tax), main = "Tax Rates")
rug(Boston$tax)

Describe the shape of the distribution of tax rates.

The distribution is bimodal, with peaks at approximately 300 and 700.

Note that the distribution shape doesn't make sense in light of the rug representation of the data. Make a histogram of the tax rates.

hist(Boston$tax, breaks = 100, freq = TRUE, main = "Histogram of Tax Rates", 
    ylim = c(0, 140), xlim = c(100, 800))

Why is the second peak of the density plot so large? In what way is the rug representation of the data inadequate? Write a line or two of code to figure it out, and then explain it.

# Count the observations tied at the maximum tax rate.
# A separate name avoids masking the max() function.
maxcount = sum(Boston$tax == max(Boston$tax))
cat(sep = "", "There are ", maxcount, " occurrences of the maximum value, ", max(Boston$tax), 
    ", in this histogram.", "\n")

## There are 5 occurrences of the maximum value, 711, in this histogram.

# Skip past the ties at the maximum to reach the second-highest distinct value.
second = sort(Boston$tax, decreasing = TRUE)[maxcount + 1]
secondcount = sum(Boston$tax == second)
cat(sep = "", "There are ", secondcount, " occurrences of the second highest value, ", 
    second, ", in this histogram.", "\n")

## There are 132 occurrences of the second highest value, 666, in this histogram.

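The tax rates 666 and 711 each occur many times (132 and 5 observations, respectively). The rug draws all tied observations at a value as a single tick, so it hides how many points share these rates. The 132 observations at 666 are what make the second peak of the density plot so large.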
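
Tabulating the values is another way to expose ties that the rug hides. The sketch below uses a made-up vector `tax` with heavy ties rather than the real `Boston$tax`:

```r
# Hypothetical tax values with heavy ties (made-up, not the Boston data).
tax <- c(300, 300, 666, 666, 666, 666, 711)

# Count how often each distinct value occurs, most frequent first.
tie_counts <- sort(table(tax), decreasing = TRUE)
print(head(tie_counts, 2))
```

On the real data, `sort(table(Boston$tax), decreasing = TRUE)` should list the most heavily tied rates first, and `rug(jitter(Boston$tax))` is one possible way to spread overlapping ticks so that the crowding becomes visible.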

Make a barplot of “chas”.

counts = table(Boston$chas)
barplot(counts, names.arg = NULL, main = "Charles River Count (chas)", ylim = c(0, 
    500))

How many neighborhoods are on the Charles river?

Rivercount = length(which(Boston$chas == "on"))
cat(sep = "", "There are ", Rivercount, " neighborhoods on the Charles river.", 
    "\n")

## There are 35 neighborhoods on the Charles river.

Make a single graph consisting of three plots:

* a scatterplot of “nox” on the y-axis vs. “dis” on the x-axis
* a boxplot of “nox” left of the scatterplot's y-axis
* a boxplot of “dis” below the scatterplot's x-axis

m = matrix(data = c(1, 3, 3, 3, 1, 3, 3, 3, 1, 3, 3, 3, 0, 2, 2, 2), nrow = 4, 
    ncol = 4, byrow = TRUE)
layout(m)
boxplot(Boston$nox)
boxplot(Boston$dis, horizontal = TRUE)
plot(Boston$dis, Boston$nox)
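
The layout matrix is easier to read when written one grid row at a time. The sketch below builds the same matrix with `rbind()` (the names `m_original` and `m_by_row` are introduced here for illustration) and checks that the two agree. Each number says which plot, in drawing order, fills that cell, and 0 leaves the cell empty.

```r
# The matrix as written in the answer above.
m_original <- matrix(data = c(1, 3, 3, 3, 1, 3, 3, 3, 1, 3, 3, 3, 0, 2, 2, 2),
                     nrow = 4, ncol = 4, byrow = TRUE)

# The same grid, one row per line:
# plot 1 (nox boxplot) fills the left column, plot 2 (dis boxplot) the
# bottom row, plot 3 (scatterplot) the 3 x 3 block, and 0 stays empty.
m_by_row <- rbind(
  c(1, 3, 3, 3),
  c(1, 3, 3, 3),
  c(1, 3, 3, 3),
  c(0, 2, 2, 2)
)

same_layout <- identical(dim(m_by_row), dim(m_original)) &&
  all(m_by_row == m_original)
print(same_layout)
```

Calling `layout(m_by_row)` followed by `layout.show(3)` should outline the three regions, which can help confirm the arrangement before drawing the plots.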
