We are only working Problem 9.

Preliminary Considerations

Sometimes a zip code has no state assigned to it, as happens (for example) with zips in Puerto Rico. In these cases State is the empty string "". This is hard to spot with a call to View(ZipGeography) alone, but it shows up if you ask for the “levels” of the factor variable State:

# First ten levels of State:
levels(ZipGeography$State)[1:10]
##  [1] ""              "Massachusetts" "New Hampshire" "New York"     
##  [5] "Rhode Island"  "Maine"         "Vermont"       "Connecticut"  
##  [9] "New Jersey"    "Pennsylvania"

Also, some zips do not belong to any particular timezone. In this case Timezone is a string consisting of a single space: " ". You can see this with:

# the levels of Timezone:
levels(ZipGeography$Timezone)
##  [1] " "     "EST"   "EST+1" "CST"   "MST"   "PST"   "PST-2" "PST-3"
##  [9] "PST-4" "PST-5" "PST-6" "PST-7" "PST-1"

Finally, some zips do not belong to any particular city. The missing city is also represented by " " (a single space). You can see this with:

# the first ten levels of CityName:
levels(ZipGeography$CityName)[1:10]
##  [1] " "          "Abington"   "Accord"     "Acton"      "Acushnet"  
##  [6] "Adams"      "Adamsville" "Adjuntas"   "Agawam"     "Aguada"
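To gauge how many rows carry each of these "missing" codes, you can compare the factor against the code and sum the result. The following is only a minimal sketch on a small made-up data frame: `toy` and its values are hypothetical stand-ins for ZipGeography, chosen so the block runs on its own in base R.

```r
# Hypothetical stand-in for ZipGeography; the values are invented for illustration.
toy <- data.frame(
  State    = factor(c("", "Maine", "Vermont", "")),
  Timezone = factor(c(" ", "EST", "EST", "EST+1")),
  CityName = factor(c("Adjuntas", "Acton", " ", "Aguada"))
)

# Comparing a factor with a string yields a logical vector;
# summing it counts the rows that carry the "missing" code.
n_blank_state <- sum(toy$State == "")      # empty string
n_blank_tz    <- sum(toy$Timezone == " ")  # single space
n_blank_city  <- sum(toy$CityName == " ")  # single space
```

The same three comparisons should work on ZipGeography itself by replacing `toy` with `ZipGeography`.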

The upshot is that in some of the problems below it will be necessary to filter out the problematic zip codes, with lines like this:

filter(State != "")

and this:

filter(CityName != " ")

and this:

filter(Timezone != " ")
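The three lines above are fragments meant to sit inside a dplyr pipeline. As a rough sketch of how they might fit together, the block below chains them on a small made-up data frame (`toy` and its values are hypothetical, standing in for ZipGeography). The final `droplevels()` call is an addition that the original fragments do not include: `filter()` removes the rows, but the unused "" or " " level would otherwise stay in each factor.

```r
library(dplyr)

# Hypothetical stand-in for ZipGeography; the values are invented for illustration.
toy <- data.frame(
  State    = factor(c("", "Maine", "Vermont", "Maine")),
  Timezone = factor(c("EST+1", "EST", " ", "EST")),
  CityName = factor(c("Adjuntas", "Acton", "Adams", " "))
)

clean <- toy %>%
  filter(State != "") %>%       # drop zips with no state
  filter(CityName != " ") %>%   # drop zips with no city
  filter(Timezone != " ") %>%   # drop zips with no timezone
  droplevels()                  # assumption: also discard the now-unused levels
```

In a given problem you would normally keep only the filter that matters for the variable being summarized, so that rows are not discarded unnecessarily.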