The following report analyzes tax parcel data from Syracuse, New York (USA).
View the “Data Dictionary” here: Syracuse City Tax Parcel Data
The following code imports the Syracuse, NY tax parcel data using a URL.
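# Build the URL in two pieces to keep the line short
url <- paste0("https://raw.githubusercontent.com/DS4PS/Data",
              "-Science-Class/master/DATA/syr_parcels.csv")
# Read the CSV; keep text columns as character vectors rather than factors
dat <- read.csv(url,
                stringsAsFactors = FALSE)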
Several exploratory functions help us better understand our new dataset.
We can inspect the first few rows of these data using function head(), which returns six rows by default.
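As a minimal sketch of how head() behaves, the example below uses a hypothetical stand-in data frame named `toy` (its values are made up and are not the Syracuse data). Argument n controls how many rows are returned.

```r
# Hypothetical stand-in data; values are made up for illustration only
toy <- data.frame(id    = 1:10,
                  acres = (1:10) / 10)

first_default <- head(toy)         # no 'n' given: the first 6 rows
first_five    <- head(toy, n = 5)  # 'n = 5': the first 5 rows

nrow(first_default)
nrow(first_five)
```

With the real data, the equivalent call would be head(dat, n = 5).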
Functions names() or colnames() will print all variable names in a dataset.
## [1] "tax_id" "neighborhood" "stnum" "stname" "zip"
## [6] "owner" "frontfeet" "depth" "sqft" "acres"
## [11] "yearbuilt" "age" "age_range" "land_use" "units"
## [16] "residential" "rental" "vacantbuil" "assessedla" "assessedva"
## [21] "tax.exempt" "countytxbl" "schooltxbl" "citytaxabl" "star"
## [26] "amtdelinqu" "taxyrsdeli" "totint" "overduewater"
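For a data frame, names() and colnames() return the same character vector. A short sketch, again with a hypothetical data frame `toy` whose columns only borrow two names from the listing above:

```r
# Hypothetical two-column data frame; values are made up
toy <- data.frame(tax_id = c(101L, 102L),
                  acres  = c(0.05, 0.15))

names(toy)      # column names
colnames(toy)   # same result for a data frame

# Confirm that both functions agree
same <- identical(names(toy), colnames(toy))
same
```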
We can also inspect the values of a variable by extracting it with $.
The extracted variable is called a “vector”.
## [1] "CLARMIN BUILDERS ONON COR" "JOHNSTON LEE R"
## [3] "CHRISTO CRAIG S" "HAWKINS FARMS INC"
## [5] "PETERS LYNNETTE" "MITCHELL LOTAN G"
## [7] "WHALEN GIOVANNA A" "BERGH GARY D"
## [9] "CITY OF SYRACUSE TD" "DOUGHERTY ROBERT K JR"
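The sketch below illustrates extraction with $ on a hypothetical data frame `toy` (owner names are placeholders). The result is an ordinary vector, so vector functions such as head() and length() apply to it directly.

```r
# Hypothetical data; owner names are placeholders
toy <- data.frame(owner = c("OWNER A", "OWNER B", "OWNER C"),
                  acres = c(0.05, 0.15, 0.25),
                  stringsAsFactors = FALSE)

owners <- toy$owner   # extract one variable as a vector

is.vector(owners)     # the extracted variable is a vector
class(owners)         # here, a character vector
head(owners, 2)       # the first two values
```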
Function unique() helps us determine what values exist in a variable.
## [1] "Vacant Land" "Single Family" "Commercial"
## [4] "Parking" "Two Family" "Three Family"
## [7] "Apartment" "Schools" "Parks"
## [10] "Multiple Residence" "Cemetery" "Religious"
## [13] "Recreation" "Community Services" "Utilities"
## [16] "Industrial"
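As a small illustration, unique() drops repeated values and keeps the remaining ones in order of first appearance. The vector `land_use` below is a made-up sample that reuses three category labels from the output above.

```r
# Made-up sample of land-use labels with repeats
land_use <- c("Vacant Land", "Single Family", "Single Family",
              "Commercial", "Vacant Land")

kinds <- unique(land_use)  # distinct values, in order of first appearance
kinds
length(kinds)              # how many distinct values exist
```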
Function str() provides an overview of total rows and columns (dimensions), variable classes, and a preview of values.
## 'data.frame': 41502 obs. of 29 variables:
## $ tax_id : int 1393130501 1393130500 1437100600 1425100900 1425101000 ...
## $ neighborhood: chr "South Valley" "South Valley" ...
## $ stnum : chr "2655" "2635" ...
## $ stname : chr "VALLEY DR" "VALLEY DR" ...
## $ zip : chr "13215" "13120" ...
## $ owner : chr "CLARMIN BUILDERS ONON COR" "JOHNSTON LEE R" ...
## $ frontfeet : num 67.2 104.8 ...
## $ depth : num 50 46.5 ...
## $ sqft : num 2149 6370 ...
## $ acres : num 0.0493 0.1462 ...
## $ yearbuilt : int NA 1925 1957 1958 1965 ...
## $ age : int NA 90 58 57 50 ...
## $ age_range : chr NA "81-90" ...
## $ land_use : chr "Vacant Land" "Single Family" ...
## $ units : int 0 0 0 0 0 ...
## $ residential : logi FALSE TRUE TRUE ...
## $ rental : logi FALSE FALSE FALSE ...
## $ vacantbuil : logi FALSE FALSE FALSE ...
## $ assessedla : int 475 10800 20200 18000 18000 ...
## $ assessedva : int 500 69300 88300 70500 74000 ...
## $ tax.exempt : logi TRUE FALSE FALSE ...
## $ countytxbl : int 500 69300 88300 70500 74000 ...
## $ schooltxbl : int 500 69300 88300 70500 74000 ...
## $ citytaxabl : int 500 69300 88300 70500 74000 ...
## $ star : logi NA TRUE TRUE ...
## $ amtdelinqu : num 0 0 0 0 0 ...
## $ taxyrsdeli : int 0 0 0 0 0 ...
## $ totint : num 0 0 0 0 0 ...
## $ overduewater: num 0 178 ...
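The individual pieces that str() summarizes can also be requested one at a time. The sketch below assumes a hypothetical three-column data frame `toy` and uses dim() for dimensions and class() for variable classes.

```r
# Hypothetical data frame with one integer, one numeric, one logical column
toy <- data.frame(tax_id = c(101L, 102L, 103L),
                  acres  = c(0.05, 0.15, 0.25),
                  vacant = c(FALSE, TRUE, FALSE))

str(toy)                       # dimensions, classes, and a preview of values

dims    <- dim(toy)            # c(rows, columns)
classes <- sapply(toy, class)  # class of each variable

dims
classes
```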
Instructions: Provide the code for each solution in the following “chunks”.
Remember to modify the text to show your answer in human-readable terms.
Question: How many tax parcels are in Syracuse, NY?
Answer: There are 41469 tax parcels in Syracuse, NY.
## [1] "integer"
## [1] 41502 29
## [1] 41469
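The code that produced the three outputs above is not shown in the report. One plausible reading, offered only as an assumption, is that they came from class(), dim(), and a count of unique tax_id values, which would explain why the final count (41469) is smaller than the number of rows (41502). The sketch below shows that pattern on a hypothetical data frame `toy` containing one repeated ID.

```r
# Hypothetical data with one repeated ID (102)
toy <- data.frame(tax_id = c(101L, 102L, 102L, 103L),
                  acres  = c(0.05, 0.15, 0.15, 0.25))

class(toy$tax_id)   # class of the ID variable
dim(toy)            # rows and columns

n_rows   <- nrow(toy)                    # every row, repeats included
n_unique <- length(unique(toy$tax_id))   # distinct IDs only

n_rows
n_unique
```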
Question: How many acres of land are in Syracuse, NY?
Answer: There are 12510.49 acres of land in Syracuse, NY.
## [1] 12510.49
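A total like this is typically computed with sum(). The sketch below, on a made-up vector `acres`, shows why argument na.rm = TRUE matters when a variable may contain missing values; whether the original chunk used it is not shown.

```r
# Made-up acreage values with one missing entry
acres <- c(0.05, 0.15, NA, 0.25)

sum(acres)                         # NA: one missing value hides the total
total <- sum(acres, na.rm = TRUE)  # drop missing values before summing
total
```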
Question: How many vacant buildings are there in Syracuse, NY?
Answer: There are 1888 vacant buildings in Syracuse, NY.
## [1] 1888
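Counting TRUE values works because R treats TRUE as 1 and FALSE as 0 in arithmetic. A minimal sketch on a made-up logical vector `vacant` (the original chunk's code is not shown, so this is one possible approach rather than the author's):

```r
# Made-up logical vector with one missing value
vacant <- c(FALSE, TRUE, NA, TRUE, FALSE)

n_vacant <- sum(vacant, na.rm = TRUE)  # TRUE counts as 1, FALSE as 0
n_vacant

# A frequency table gives the same count and also reports missing values
table(vacant, useNA = "ifany")
```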
Question: What proportion of parcels are tax-exempt?
Answer: 10.7% of parcels are tax-exempt.
## [1] 10.70069
# Pass a logical ('TRUE' or 'FALSE') variable to function 'mean()', with argument 'na.rm = TRUE'
mean(dat$tax.exempt, na.rm = TRUE)
## [1] 0.1070069
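The mean of a logical variable is the share of TRUE values, so multiplying by 100 gives a percentage. A short sketch with a made-up vector `exempt`:

```r
# Made-up logical vector: 1 TRUE out of 4 non-missing values
exempt <- c(TRUE, FALSE, FALSE, FALSE, NA)

prop <- mean(exempt, na.rm = TRUE)  # proportion of TRUE values
pct  <- round(prop * 100, 1)        # percentage, rounded to one decimal

prop
pct
```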
Question: Which neighborhood contains the most tax parcels?
Answer: Eastwood contains the most tax parcels.
#class(dat$neighborhood)
# Pass the appropriate variable to function 'table()'
tab <- table(dat$neighborhood)
tab
##
##                Brighton          Court-Woodlawn                Downtown
##                    2302                    2402                     389
##                Eastwood                 Elmwood            Far Westside
##                    4889                    1444                    1027
##         Franklin Square            Hawley-Green               Lakefront
##                      89                     367                     312
##            Lincoln Hill             Meadowbrook           Near Eastside
##                    1123                    1878                     441
##           Near Westside            North Valley               Northside
##                    1772                    1531                    3261
##          Outer Comstock               Park Ave.           Prospect Hill
##                     990                     942                     365
##            Salt Springs                Sedgwick              Skunk City
##                    1414                    1138                     713
##            South Campus            South Valley               Southside
##                      36                    1925                    1370
##               Southwest              Strathmore               Tipp Hill
##                    1150                    1822                    1468
##         University Hill University Neighborhood       Washington Square
##                     505                    1259                    1180
##                Westcott               Winkworth
##                    1540                     452
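Scanning a long frequency table by eye is error-prone. As a sketch under stated assumptions, sort() and which.max() can rank the counts and return the label with the largest one. The vector `neighborhood` and its labels below are hypothetical.

```r
# Hypothetical neighborhood labels (not the Syracuse data)
neighborhood <- c("North", "South", "North", "East", "North", "South")

tab <- table(neighborhood)

sort(tab, decreasing = TRUE)   # counts from largest to smallest

top <- names(which.max(tab))   # label with the largest count
top
```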
# Optional: Use additional functions to narrow your results
#dat$neighborhood <- as.factor(dat$neighborhood)
#levels(dat$neighborhood)
#summary(dat$neighborhood)
Question: Which neighborhood contains the most vacant lots?
Answer: Northside contains the most vacant buildings (264), as recorded in variable vacantbuil.
# Pass two variables to function 'table()', separated by a comma
tab2 <- table(dat$neighborhood, dat$vacantbuil)
tab2
##
##                         FALSE TRUE
## Brighton                 2048  243
## Court-Woodlawn           2320   56
## Downtown                  357   15
## Eastwood                 4716   93
## Elmwood                  1305  122
## Far Westside              951   43
## Franklin Square            71    5
## Hawley-Green              343   16
## Lakefront                 272    5
## Lincoln Hill             1049   56
## Meadowbrook              1822   15
## Near Eastside             409   24
## Near Westside            1559  176
## North Valley             1441   75
## Northside                2978  264
## Outer Comstock            895   25
## Park Ave.                 847   75
## Prospect Hill             321   37
## Salt Springs             1342   44
## Sedgwick                 1114   16
## Skunk City                670   35
## South Campus               32    0
## South Valley             1836   40
## Southside                1239  120
## Southwest                1028   87
## Strathmore               1769   38
## Tipp Hill                1402   44
## University Hill           478    6
## University Neighborhood  1242   11
## Washington Square        1102   68
## Westcott                 1474   24
## Winkworth                 421   10
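With a two-way table, the column of interest can be pulled out by name and ranked instead of being read row by row. The sketch below assumes a hypothetical data frame `toy` with placeholder neighborhood labels; the column name "TRUE" comes from tabulating a logical variable.

```r
# Hypothetical data: neighborhood label and a logical vacancy flag
toy <- data.frame(
  neighborhood = c("North", "North", "South", "South", "South", "East"),
  vacant       = c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

tab2 <- table(toy$neighborhood, toy$vacant)

vacant_counts <- tab2[, "TRUE"]          # keep only the TRUE column
sort(vacant_counts, decreasing = TRUE)   # rank neighborhoods by that count

top <- names(which.max(vacant_counts))   # neighborhood with the most TRUEs
top
```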