Question 1

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv

fileUrl1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv"
download.file(fileUrl1, destfile = "./Dataset/Quiz1-01.csv", method = "curl")


and load the data into R. The code book, describing the variable names is here: https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf

quiz1Data <- read.csv("./Dataset/Quiz1-01.csv")
head(quiz1Data)

How many properties are worth $1,000,000 or more?


Answer

sum(quiz1Data$VAL == 24, na.rm = TRUE)
[1] 53


Question 2

Use the data you loaded from Question 1. Consider the variable FES in the code book. Which of the “tidy data” principles does this variable violate?

Answer

Tidy data has one variable per column.



Question 3

Download the Excel spreadsheet on Natural Gas Aquisition Program here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx (original data source: http://catalog.data.gov/dataset/natural-gas-acquisition-program)

fileUrl1 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx"
download.file(fileUrl1, destfile = "./Dataset/Quiz1-03.xlsx", method = "curl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed

  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:02 --:--:--     0
100 16197  100 16197    0     0   5436      0  0:00:02  0:00:02 --:--:--  5435
dateDownloaded <- date()
dateDownloaded
[1] "Wed Jun 20 21:33:55 2018"


Read rows 18-23 and columns 7-15 into R and assign the result to a variable called: dat

library(xlsx)
col <- 7:15
row <- 18:23
dat <- read.xlsx("./Dataset/Quiz1-03.xlsx", sheetIndex=1, colIndex = col, rowIndex = row)
dat


What is the value of:

Answer

sum(dat$Zip*dat$Ext, na.rm=T)
[1] 36534720


Question 4

Read the XML data on Baltimore restaurants from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml

library(XML)
fileUrl3 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
BalResto <- xmlTreeParse(sub("s", "", fileUrl3), useInternal=TRUE)
rootNode <- xmlRoot(BalResto)


How many restaurants have zipcode 21231?

Answer

zip <- xpathSApply(rootNode, "//zipcode", xmlValue)
sum(zip == 21231)
[1] 127


Question 5

The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here:

https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv

fileUrl4 <- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv"
download.file(fileUrl4, destfile = "./Dataset/Quiz1-05.csv", method = "curl")


using the fread() command load the data into an R object: DT

library(data.table)
DT <- fread("./Dataset/Quiz1-05.csv")
DT


The following are ways to calculate the average value of the variable: pwgtp15

broken down by sex. Using the data.table package, which will deliver the fastest user time?

Answer


  • option a: rowMeans(DT[DT$SEX==1]); rowMeans(DT[DT$SEX==2])
system.time(rowMeans(DT[DT$SEX==1]), rowMeans(DT[DT$SEX==2]))
Error in rowMeans(DT[DT$SEX == 2]) : 'x' must be numeric
  • option b: DT[DT$SEX==1,]\(pwgtp15), mean(DT[DT\)SEX==2,]$pwgtp15))
system.time(mean(DT[DT$SEX==1,]$pwgtp15), mean(DT[DT$SEX==2,]$pwgtp15))
   user  system elapsed 
  0.052   0.000   0.050 
  • option c: DT[,mean(pwgtp15),by=SEX])
system.time(DT[,mean(pwgtp15),by=SEX])
   user  system elapsed 
  0.004   0.000   0.003 
  • option d: sapply(split(DT\(pwgtp15,DT\)SEX),mean))
system.time(sapply(split(DT$pwgtp15,DT$SEX),mean))
   user  system elapsed 
  0.000   0.000   0.002 
  • option e: tapply(DT\(pwgtp15,DT\)SEX,mean))
system.time(tapply(DT$pwgtp15,DT$SEX,mean))
   user  system elapsed 
  0.000   0.000   0.002 
  • option f: mean(DT\(pwgtp15,by=DT\)SEX))
system.time(mean(DT$pwgtp15,by=DT$SEX))
   user  system elapsed 
  0.000   0.000   0.001 



END

LS0tCnRpdGxlOiAiR2V0dGluZyBhbmQgQ2xlYW5pbmcgRGF0YTogV2VlayAxIFF1aXoiCm91dHB1dDogaHRtbF9ub3RlYm9vawotLS0KCjxiciAvPgo8YnIgLz4KCi0tLS0tLS0tLQojIyBRdWVzdGlvbiAxCgpUaGUgQW1lcmljYW4gQ29tbXVuaXR5IFN1cnZleSBkaXN0cmlidXRlcyBkb3dubG9hZGFibGUgZGF0YSBhYm91dCBVbml0ZWQgU3RhdGVzIGNvbW11bml0aWVzLiBEb3dubG9hZCB0aGUgMjAwNiBtaWNyb2RhdGEgc3VydmV5IGFib3V0IGhvdXNpbmcgZm9yIHRoZSBzdGF0ZSBvZiBJZGFobyB1c2luZyBkb3dubG9hZC5maWxlKCkgZnJvbSBoZXJlOgoKaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRnNzMDZoaWQuY3N2CgpgYGB7cn0KZmlsZVVybDEgPC0gImh0dHBzOi8vZDM5NnF1c3phNDBvcmMuY2xvdWRmcm9udC5uZXQvZ2V0ZGF0YSUyRmRhdGElMkZzczA2aGlkLmNzdiIKZG93bmxvYWQuZmlsZShmaWxlVXJsMSwgZGVzdGZpbGUgPSAiLi9EYXRhc2V0L1F1aXoxLTAxLmNzdiIsIG1ldGhvZCA9ICJjdXJsIikKYGBgCjxiciAvPgphbmQgbG9hZCB0aGUgZGF0YSBpbnRvIFIuIFRoZSBjb2RlIGJvb2ssIGRlc2NyaWJpbmcgdGhlIHZhcmlhYmxlIG5hbWVzIGlzIGhlcmU6Cmh0dHBzOi8vZDM5NnF1c3phNDBvcmMuY2xvdWRmcm9udC5uZXQvZ2V0ZGF0YSUyRmRhdGElMkZQVU1TRGF0YURpY3QwNi5wZGYKCmBgYHtyfQpxdWl6MURhdGEgPC0gcmVhZC5jc3YoIi4vRGF0YXNldC9RdWl6MS0wMS5jc3YiKQpoZWFkKHF1aXoxRGF0YSkKYGBgCgoKSG93IG1hbnkgcHJvcGVydGllcyBhcmUgd29ydGggJDEsMDAwLDAwMCBvciBtb3JlPwoKPGJyIC8+CgojIyMgQW5zd2VyCgpgYGB7cn0Kc3VtKHF1aXoxRGF0YSRWQUwgPT0gMjQsIG5hLnJtID0gVFJVRSkKCmBgYAoKLS0tCgo8YnIgLz4KCiMjIFF1ZXN0aW9uIDIKClVzZSB0aGUgZGF0YSB5b3UgbG9hZGVkIGZyb20gUXVlc3Rpb24gMS4gQ29uc2lkZXIgdGhlIHZhcmlhYmxlIEZFUyBpbiB0aGUgY29kZSBib29rLiBXaGljaCBvZiB0aGUgInRpZHkgZGF0YSIgcHJpbmNpcGxlcyBkb2VzIHRoaXMgdmFyaWFibGUgdmlvbGF0ZT8KCiMjIyBBbnN3ZXIKCipUaWR5IGRhdGEgaGFzIG9uZSB2YXJpYWJsZSBwZXIgY29sdW1uLioKCgotLS0KCjxiciAvPgoKIyMgUXVlc3Rpb24gMwoKRG93bmxvYWQgdGhlIEV4Y2VsIHNwcmVhZHNoZWV0IG9uIE5hdHVyYWwgR2FzIEFxdWlzaXRpb24gUHJvZ3JhbSBoZXJlOgoKaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRkRBVEEuZ292X05HQVAueGxzeAoob3JpZ2luYWwgZGF0YSBzb3VyY2U6IGh0dHA6Ly9jYXRhbG9nLmRhdGEuZ292L2RhdGFzZXQvbmF0dXJhbC1nYXMtYWNxdWlzaXRpb24tcHJvZ3JhbSkKCmBgYHtyfQpmaWxlVXJsMiA8LSAiaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRkRBVEEuZ292X05HQVAueGxzeCIKZG93bmxvYWQuZmlsZShmaWxlVXJsMiwgZGVzdGZpbGUgPSAiLi9EYXRhc2V0L1F1aXoxLTAzLnhsc3giLCBtZXRob2QgPSAiY3VybCIpCgpkYXRlRG93bmxvYWRlZCA8LSBkYXRlKCkKZGF0ZURvd25sb2FkZWQKYGBgCgo8YnIgLz4KUmVhZCByb3dzIDE4LTIzIGFuZCBjb2x1bW5zIDctMTUgaW50byBSIGFuZCBhc3NpZ24gdGhlIHJlc3VsdCB0byBhIHZhcmlhYmxlIGNhbGxlZDogZGF0CgpgYGB7cn0KbGlicmFyeSh4bHN4KQoKY29sIDwtIDc6MTUKcm93IDwtIDE4OjIzCmRhdCA8LSByZWFkLnhsc3goIi4vRGF0YXNldC9RdWl6MS0wMy54bHN4Iiwgc2hlZXRJbmRleD0xLCBjb2xJbmRleCA9IGNvbCwgcm93SW5kZXggPSByb3cpCmRhdApgYGAKCjxiciAvPgpXaGF0IGlzIHRoZSB2YWx1ZSBvZjoKCiMjIyBBbnN3ZXIKCmBgYHtyfQpzdW0oZGF0JFppcCpkYXQkRXh0LCBuYS5ybT1UKQpgYGAKCi0tLQoKPGJyIC8+CgojIyBRdWVzdGlvbiA0CgpSZWFkIHRoZSBYTUwgZGF0YSBvbiBCYWx0aW1vcmUgcmVzdGF1cmFudHMgZnJvbSBoZXJlOgoKaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRnJlc3RhdXJhbnRzLnhtbAoKYGBge3J9CmxpYnJhcnkoWE1MKQpmaWxlVXJsMyA8LSAiaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRnJlc3RhdXJhbnRzLnhtbCIKQmFsUmVzdG8gPC0geG1sVHJlZVBhcnNlKHN1YigicyIsICIiLCBmaWxlVXJsMyksIHVzZUludGVybmFsPVRSVUUpCnJvb3ROb2RlIDwtIHhtbFJvb3QoQmFsUmVzdG8pCmBgYAoKCjxiciAvPgpIb3cgbWFueSByZXN0YXVyYW50cyBoYXZlIHppcGNvZGUgMjEyMzE/IAoKIyMjIEFuc3dlcgoKYGBge3J9CnppcCA8LSB4cGF0aFNBcHBseShyb290Tm9kZSwgIi8vemlwY29kZSIsIHhtbFZhbHVlKQpzdW0oemlwID09IDIxMjMxKQpgYGAKCi0tLQoKPGJyIC8+CgojIyBRdWVzdGlvbiA1CgpUaGUgQW1lcmljYW4gQ29tbXVuaXR5IFN1cnZleSBkaXN0cmlidXRlcyBkb3dubG9hZGFibGUgZGF0YSBhYm91dCBVbml0ZWQgU3RhdGVzIGNvbW11bml0aWVzLiBEb3dubG9hZCB0aGUgMjAwNiBtaWNyb2RhdGEgc3VydmV5IGFib3V0IGhvdXNpbmcgZm9yIHRoZSBzdGF0ZSBvZiBJZGFobyB1c2luZyBkb3dubG9hZC5maWxlKCkgZnJvbSBoZXJlOgoKaHR0cHM6Ly9kMzk2cXVzemE0MG9yYy5jbG91ZGZyb250Lm5ldC9nZXRkYXRhJTJGZGF0YSUyRnNzMDZwaWQuY3N2CgpgYGB7cn0KZmlsZVVybDQgPC0gImh0dHBzOi8vZDM5NnF1c3phNDBvcmMuY2xvdWRmcm9udC5uZXQvZ2V0ZGF0YSUyRmRhdGElMkZzczA2cGlkLmNzdiIKZG93bmxvYWQuZmlsZShmaWxlVXJsNCwgZGVzdGZpbGUgPSAiLi9EYXRhc2V0L1F1aXoxLTA1LmNzdiIsIG1ldGhvZCA9ICJjdXJsIikKYGBgCgo8YnIgLz4KdXNpbmcgdGhlIGZyZWFkKCkgY29tbWFuZCBsb2FkIHRoZSBkYXRhIGludG8gYW4gUiBvYmplY3Q6IERUCgpgYGB7cn0KbGlicmFyeShkYXRhLnRhYmxlKQpEVCA8LSBmcmVhZCgiLi9EYXRhc2V0L1F1aXoxLTA1LmNzdiIpCkRUCmBgYAoKPGJyIC8+ClRoZSBmb2xsb3dpbmcgYXJlIHdheXMgdG8gY2FsY3VsYXRlIHRoZSBhdmVyYWdlIHZhbHVlIG9mIHRoZSB2YXJpYWJsZTogcHdndHAxNQoKYnJva2VuIGRvd24gYnkgc2V4LiBVc2luZyB0aGUgZGF0YS50YWJsZSBwYWNrYWdlLCB3aGljaCB3aWxsIGRlbGl2ZXIgdGhlIGZhc3Rlc3QgdXNlciB0aW1lPwo8YnIgLz4KCiMjIyBBbnN3ZXIKCjxiciAvPgoKKiBvcHRpb24gYTogcm93TWVhbnMoRFRbRFQkU0VYPT0xXSk7IHJvd01lYW5zKERUW0RUJFNFWD09Ml0pCmBgYHtyfQpzeXN0ZW0udGltZShyb3dNZWFucyhEVFtEVCRTRVg9PTFdKSwgcm93TWVhbnMoRFRbRFQkU0VYPT0yXSkpCiMgUHJvZHVjZSBlcnJvcgpgYGAKKiBvcHRpb24gYjogRFRbRFQkU0VYPT0xLF0kcHdndHAxNSksIG1lYW4oRFRbRFQkU0VYPT0yLF0kcHdndHAxNSkpCmBgYHtyfQpzeXN0ZW0udGltZShtZWFuKERUW0RUJFNFWD09MSxdJHB3Z3RwMTUpLCBtZWFuKERUW0RUJFNFWD09MixdJHB3Z3RwMTUpKQpgYGAKKiBvcHRpb24gYzogRFRbLG1lYW4ocHdndHAxNSksYnk9U0VYXSkKYGBge3J9CnN5c3RlbS50aW1lKERUWyxtZWFuKHB3Z3RwMTUpLGJ5PVNFWF0pCmBgYAoqIG9wdGlvbiBkOiBzYXBwbHkoc3BsaXQoRFQkcHdndHAxNSxEVCRTRVgpLG1lYW4pKQpgYGB7cn0Kc3lzdGVtLnRpbWUoc2FwcGx5KHNwbGl0KERUJHB3Z3RwMTUsRFQkU0VYKSxtZWFuKSkKYGBgCiogb3B0aW9uIGU6IHRhcHBseShEVCRwd2d0cDE1LERUJFNFWCxtZWFuKSkKYGBge3J9CnN5c3RlbS50aW1lKHRhcHBseShEVCRwd2d0cDE1LERUJFNFWCxtZWFuKSkKYGBgCiogb3B0aW9uIGY6IG1lYW4oRFQkcHdndHAxNSxieT1EVCRTRVgpKQpgYGB7cn0Kc3lzdGVtLnRpbWUobWVhbihEVCRwd2d0cDE1LGJ5PURUJFNFWCkpCmBgYAo8YnIgLz4KCi0tLS0tLS0tLQo8Y2VudGVyPioqRU5EKio8L2NlbnRlcj4KKioqKioqKioq