The American Community Survey distributes downloadable data about United States communities. Download the 2006 microdata survey about housing for the state of Idaho using download.file() from here: https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv and load the data into R. The code book, describing the variable names, is here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FPUMSDataDict06.pdf
Following the code book, VAL is the variable that records property value, and code 24 means a value of $1,000,000 or more, so the task is to count the rows where VAL equals 24.
getwd()
## [1] "D:/data/hopkins-clean"
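download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06hid.csv", destfile = "getdata-data-ss06hid.csv", mode = "wb")
housing<-read.csv('getdata-data-ss06hid.csv')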
str(housing)
## 'data.frame': 6496 obs. of 188 variables:
## $ RT : Factor w/ 1 level "H": 1 1 1 1 1 1 1 1 1 1 ...
## $ SERIALNO: int 186 306 395 506 835 989 1861 2120 2278 2428 ...
## $ DIVISION: int 8 8 8 8 8 8 8 8 8 8 ...
## $ PUMA : int 700 700 100 700 800 700 700 200 400 500 ...
## $ REGION : int 4 4 4 4 4 4 4 4 4 4 ...
## $ ST : int 16 16 16 16 16 16 16 16 16 16 ...
## $ ADJUST : int 1015675 1015675 1015675 1015675 1015675 1015675 1015675 1015675 1015675 1015675 ...
## $ WGTP : int 89 310 106 240 118 115 0 35 47 51 ...
## $ NP : int 4 1 2 4 4 4 1 1 2 2 ...
## $ TYPE : int 1 1 1 1 1 1 2 1 1 1 ...
## $ ACR : int 1 NA 1 1 2 1 NA 1 1 1 ...
## $ AGS : int NA NA NA NA 1 NA NA NA NA NA ...
## $ BDS : int 4 1 3 4 5 3 NA 2 3 2 ...
## $ BLD : int 2 7 2 2 2 2 NA 1 2 1 ...
## $ BUS : int 2 NA 2 2 2 2 NA 2 2 2 ...
## $ CONP : int NA NA NA NA NA NA NA NA NA NA ...
## $ ELEP : int 180 60 70 40 250 130 NA 40 2 20 ...
## $ FS : int 0 0 0 0 0 0 0 0 0 0 ...
## $ FULP : int 2 2 2 2 2 2 NA 480 2 2 ...
## $ GASP : int 3 3 30 80 3 3 NA 3 3 140 ...
## $ HFL : int 3 3 1 1 3 3 NA 4 3 1 ...
## $ INSP : int 600 NA 200 200 700 250 NA NA 770 120 ...
## $ KIT : int 1 1 1 1 1 1 NA 1 1 1 ...
## $ MHP : int NA NA NA NA NA NA NA NA NA 220 ...
## $ MRGI : int 1 NA NA 1 1 1 NA NA 1 NA ...
## $ MRGP : int 1300 NA NA 860 1900 700 NA NA 750 NA ...
## $ MRGT : int 1 NA NA 1 1 1 NA NA 1 NA ...
## $ MRGX : int 1 NA 3 1 1 1 NA NA 1 3 ...
## $ PLM : int 1 1 1 1 1 1 NA 1 1 1 ...
## $ RMS : int 9 2 7 6 7 6 NA 4 6 5 ...
## $ RNTM : int NA 2 NA NA NA NA NA NA NA NA ...
## $ RNTP : int NA 600 NA NA NA NA NA NA NA NA ...
## $ SMP : int NA NA NA 400 650 400 NA NA NA NA ...
## $ TEL : int 1 1 1 1 1 1 NA 1 1 1 ...
## $ TEN : int 1 3 2 1 1 1 NA 4 1 2 ...
## $ VACS : int NA NA NA NA NA NA NA NA NA NA ...
## $ VAL : int 17 NA 18 19 20 15 NA NA 13 1 ...
## $ VEH : int 3 1 2 3 5 2 NA 1 2 2 ...
## $ WATP : int 840 1 50 500 2 1200 NA 650 660 2 ...
## $ YBL : int 5 3 5 2 3 5 NA 5 3 5 ...
## $ FES : int 2 NA 7 1 1 2 NA NA 2 NA ...
## $ FINCP : int 105600 NA 9400 66000 93000 61000 NA NA 209000 NA ...
## $ FPARC : int 2 NA 2 1 2 1 NA NA 4 NA ...
## $ GRNTP : int NA 660 NA NA NA NA NA NA NA NA ...
## $ GRPIP : int NA 23 NA NA NA NA NA NA NA NA ...
## $ HHL : int 1 1 1 1 1 1 NA 1 1 2 ...
## $ HHT : int 1 4 3 1 1 1 NA 6 1 5 ...
## $ HINCP : int 105600 34000 9400 66000 93000 61000 NA 10400 209000 35400 ...
## $ HUGCL : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ HUPAC : int 2 4 2 1 2 1 NA 4 4 4 ...
## $ HUPAOC : int 2 4 2 1 2 1 NA 4 4 4 ...
## $ HUPARC : int 2 4 2 1 2 1 NA 4 4 4 ...
## $ LNGI : int 1 1 1 1 1 1 NA 1 1 2 ...
## $ MV : int 4 3 2 3 1 4 5 5 1 1 ...
## $ NOC : int 2 0 1 2 1 2 NA 0 0 0 ...
## $ NPF : int 4 NA 2 4 4 4 NA NA 2 NA ...
## $ NPP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ NR : int 0 0 0 0 0 0 NA 0 0 1 ...
## $ NRC : int 2 0 1 2 1 2 NA 0 0 0 ...
## $ OCPIP : int 18 NA 23 26 36 26 NA NA 5 7 ...
## $ PARTNER : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ PSF : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ R18 : int 1 0 1 1 1 1 NA 0 0 0 ...
## $ R60 : int 0 0 0 0 0 0 NA 1 1 0 ...
## $ R65 : int 0 0 0 0 0 0 NA 1 1 0 ...
## $ RESMODE : int 1 2 1 2 1 2 NA 2 1 1 ...
## $ SMOCP : int 1550 NA 179 1422 2800 1330 NA NA 805 196 ...
## $ SMX : int 3 NA NA 1 1 2 NA NA 3 NA ...
## $ SRNT : int 0 1 0 0 0 0 NA 1 0 0 ...
## $ SVAL : int 1 0 1 1 1 1 NA 0 1 0 ...
## $ TAXP : int 24 NA 16 31 25 7 NA NA 22 4 ...
## $ WIF : int 3 NA 1 2 3 1 NA NA 1 NA ...
## $ WKEXREL : int 2 NA 13 2 1 7 NA NA 6 NA ...
## $ WORKSTAT: int 3 NA 13 1 1 3 NA NA 3 NA ...
## $ FACRP : int 0 0 0 0 0 0 NA 0 0 1 ...
## $ FAGSP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FBDSP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FBLDP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FBUSP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FCONP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FELEP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FFSP : int 0 0 0 0 0 0 0 0 0 0 ...
## $ FFULP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FGASP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FHFLP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FINSP : int 0 0 0 0 0 1 NA 0 0 0 ...
## $ FKITP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMHP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMRGIP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMRGP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMRGTP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMRGXP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FMVYP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FPLMP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FRMSP : int 0 0 0 0 0 0 NA 0 0 1 ...
## $ FRNTMP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FRNTP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FSMP : int 0 0 0 0 0 0 NA 0 0 0 ...
## $ FSMXHP : int 0 0 0 0 0 0 NA 0 0 0 ...
## [list output truncated]
nrow(housing[housing$VAL==24 & !is.na(housing$VAL),])
## [1] 53
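VAL is NA for many rows (see the NA entries in the str() output above), and comparing NA with 24 yields NA rather than FALSE. That is why the filter needs the !is.na() guard: without it, logical indexing returns an all-NA row for every missing value and inflates the count. A minimal base-R sketch on a made-up vector (the values are invented for illustration, not taken from the survey) compares the guarded filter, an unguarded filter, and the shorter sum(..., na.rm = TRUE) idiom.

```r
# Made-up VAL codes for illustration only; NA stands in for a missing value.
val <- c(17, NA, 24, 18, 24, NA, 13, 24)
housing_demo <- data.frame(VAL = val)

# Guarded filter, as in the nrow() expression above.
# drop = FALSE keeps a one-column data frame from collapsing to a vector.
n_filter <- nrow(housing_demo[housing_demo$VAL == 24 & !is.na(housing_demo$VAL), , drop = FALSE])

# Shortcut: TRUE counts as 1, FALSE as 0, and the NA comparisons are dropped.
n_sum <- sum(housing_demo$VAL == 24, na.rm = TRUE)

# Unguarded filter: each NA comparison adds an all-NA row to the result.
n_naive <- nrow(housing_demo[housing_demo$VAL == 24, , drop = FALSE])

print(c(filter = n_filter, sum = n_sum, naive = n_naive))
```

On the real data, sum(housing$VAL == 24, na.rm = TRUE) should give the same count as the nrow() expression above, since both discard the NA comparisons.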
Download the Excel spreadsheet on the Natural Gas Acquisition Program here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx
Read rows 18-23 and columns 7-15 into R and assign the result to a variable called dat.
library(xlsx)
## Warning: package 'xlsx' was built under R version 3.1.2
## Loading required package: rJava
## Warning: package 'rJava' was built under R version 3.1.2
## Loading required package: xlsxjars
## Warning: package 'xlsxjars' was built under R version 3.1.2
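# mode = "wb" keeps the binary .xlsx file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2FDATA.gov_NGAP.xlsx", destfile = "getdata-data-DATA.gov_NGAP.xlsx", mode = "wb")
# with header = TRUE (the default), row 18 supplies the column names and rows 19-23 become data
dat <- read.xlsx('getdata-data-DATA.gov_NGAP.xlsx', sheetIndex = 1, rowIndex = 18:23, colIndex = 7:15)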
sum(dat$Zip*dat$Ext,na.rm=T)
## [1] 36534720
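sum(dat$Zip*dat$Ext, na.rm=T) multiplies the two columns element by element and then adds up the products; a row with a missing Zip or Ext produces an NA product, which na.rm drops. The sketch below uses made-up numbers (not the NGAP spreadsheet) to show what na.rm changes and an equivalent complete-case filter. It spells out TRUE because T is an ordinary variable that merely defaults to TRUE and can be reassigned.

```r
# Made-up stand-ins for the Zip and Ext columns; not the spreadsheet's values.
dat_demo <- data.frame(
  Zip = c(21201, 21202, NA, 21204),
  Ext = c(2, NA, 5, 1)
)

# Element-wise products; a missing value on either side gives NA.
products <- dat_demo$Zip * dat_demo$Ext

# Without na.rm, a single NA makes the whole sum NA.
total_with_na <- sum(products)

# With na.rm = TRUE, only rows with both values present contribute.
total <- sum(products, na.rm = TRUE)

# Same result via an explicit complete-case filter.
ok <- complete.cases(dat_demo)
total_cc <- sum(dat_demo$Zip[ok] * dat_demo$Ext[ok])

print(total_with_na)
print(total)
```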
Read the XML data on Baltimore restaurants from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml
How many restaurants have zipcode 21231?
library(XML)
## Warning: package 'XML' was built under R version 3.1.2
library(RCurl)
## Loading required package: bitops
##
## Attaching package: 'RCurl'
##
## The following object is masked from 'package:rJava':
##
## clone
urlXML<- "https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Frestaurants.xml"
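#doc<- xmlTreeParse(urlXML, useInternal=TRUE)   # does not work here: the XML package does not fetch https URLs itself
# so fetch the text with RCurl first; note that ssl.verifypeer = FALSE turns off certificate checking
xData <- getURL(urlXML, ssl.verifypeer = FALSE)
doc <- xmlParse(xData)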
roots<-xmlRoot(doc)
xmlName(roots)
## [1] "response"
names(roots)
## row
## "row"
zips<-xpathSApply(roots, "//zipcode",xmlValue)
length(zips[zips=="21231"])
## [1] 127
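xpathSApply() returns the zipcode text as a character vector, so the answer is simply the number of elements equal to the string "21231"; sum(zips == "21231") is an equivalent and slightly more direct spelling. The base-R sketch below runs without the XML package on a tiny made-up snippet; the regular expression is only a stand-in for the XPath query and is not a robust XML parser.

```r
# Tiny made-up document, loosely shaped like a feed of <row> records.
xml_text <- paste0(
  "<response><row>",
  "<row><name>A</name><zipcode>21231</zipcode></row>",
  "<row><name>B</name><zipcode>21230</zipcode></row>",
  "<row><name>C</name><zipcode>21231</zipcode></row>",
  "<row><name>D</name><zipcode>21226</zipcode></row>",
  "</row></response>"
)

# Grab each <zipcode>...</zipcode> element, then strip the tags.
matches <- regmatches(xml_text, gregexpr("<zipcode>[^<]*</zipcode>", xml_text))[[1]]
zips <- gsub("</?zipcode>", "", matches)

# Two equivalent ways to count exact matches.
n_subset <- length(zips[zips == "21231"])
n_sum <- sum(zips == "21231")

print(zips)
print(n_sum)
```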
The American Community Survey distributes downloadable data about United States communities. Download the 2006 person-level microdata survey for the state of Idaho using download.file() from here:
https://d396qusza40orc.cloudfront.net/getdata%2Fdata%2Fss06pid.csv
Using fread(), load the data into an R object called DT, then time several ways of computing the average of pwgtp15 broken down by SEX.
library(data.table)
DT<-fread("getdata-data-ss06pid.csv")
system.time(tapply(DT$pwgtp15,DT$SEX,mean))
## user system elapsed
## 0.02 0.00 0.01
system.time(sapply(split(DT$pwgtp15,DT$SEX),mean))
## user system elapsed
## 0 0 0
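# skipped: rowMeans() averages across the columns of each row rather than within a SEX group, and it fails on DT's non-numeric columns
#system.time({rowMeans(DT)[DT$SEX==1]; rowMeans(DT)[DT$SEX==2]})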
system.time({mean(DT[DT$SEX==1,]$pwgtp15); mean(DT[DT$SEX==2,]$pwgtp15)})
## user system elapsed
## 0.11 0.00 0.11
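# mean() has no by argument: by is absorbed by ... and ignored, so this times the overall mean, not a per-SEX mean
system.time(mean(DT$pwgtp15,by=DT$SEX))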
## user system elapsed
## 0 0 0
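A fast timing says nothing about correctness. mean() has no by argument, so in mean(DT$pwgtp15, by=DT$SEX) the grouping vector is absorbed by ... and silently ignored, and the call returns a single overall mean. A small base-R sketch with made-up values makes this visible by comparing it against tapply().

```r
# Made-up weights and sexes; not the survey data.
pwgtp15 <- c(10, 20, 30, 40, 50, 60)
SEX <- c(1, 2, 1, 2, 1, 2)

# 'by' is not an argument of mean(); it lands in '...' and is ignored,
# so this is the mean of all six values.
overall <- mean(pwgtp15, by = SEX)

# A real per-group mean needs a grouping function such as tapply().
by_group <- tapply(pwgtp15, SEX, mean)

print(overall)
print(by_group)
```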
system.time(DT[,mean(pwgtp15),by=SEX])
## user system elapsed
## 0 0 0
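Timings of 0 or 0.01 seconds sit at the resolution of system.time(), so on their own they cannot rank these approaches. A rough base-R sketch: check that two approaches agree, then repeat each one many times on made-up data and divide by the repetition count. The data size and the number of repetitions below are arbitrary assumptions, and for careful benchmarking a dedicated package such as microbenchmark is the usual choice.

```r
set.seed(1)

# Made-up data of an arbitrary size; not the survey file.
n <- 1e5
pwgtp15 <- sample(0:200, n, replace = TRUE)
SEX <- sample(1:2, n, replace = TRUE)

# Both approaches must agree before their speed is worth comparing.
same <- isTRUE(all.equal(as.vector(tapply(pwgtp15, SEX, mean)),
                         as.vector(sapply(split(pwgtp15, SEX), mean))))

# Repeat each expression so the total rises well above the timer resolution.
reps <- 50
t_tapply <- system.time(for (i in seq_len(reps)) tapply(pwgtp15, SEX, mean))
t_split  <- system.time(for (i in seq_len(reps)) sapply(split(pwgtp15, SEX), mean))

# Approximate elapsed seconds per single call.
per_call <- c(tapply = t_tapply[["elapsed"]], split = t_split[["elapsed"]]) / reps

print(same)
print(per_call)
```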