Source Data

The following report analyzes tax parcel data from Syracuse, New York (USA).

View the “Data Dictionary” here: Syracuse City Tax Parcel Data



Importing the Data

The following code imports the Syracuse, NY tax parcel data using a URL.

url <- paste0("https://raw.githubusercontent.com/DS4PS/Data",
              "-Science-Class/master/DATA/syr_parcels.csv")

dat <- read.csv(url, 
                stringsAsFactors = FALSE)   # Keep text columns as character strings
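
The second argument tells read.csv() to keep text columns as character strings rather than converting them to factors. As a minimal sketch of the difference, the example below reads a hypothetical two-row CSV supplied inline through the text argument (the values are invented, not taken from the parcel data) and spells the argument name out in full as stringsAsFactors.

```r
# Hypothetical inline CSV; values are illustrative, not from syr_parcels.csv
csv_text <- "owner,acres
SMITH JANE,0.12
DOE JOHN,0.08"

# Text columns stay as character strings
toy_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Text columns are converted to factors
toy_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

class(toy_chr$owner)      # Class of the text column when kept as strings
class(toy_fct$owner)      # Class of the same column when converted
```

Character columns are usually the simpler choice for free-text fields such as owner names, since every value is distinct and gains nothing from factor levels.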



Previewing the Data

Several exploratory functions help us better understand our new dataset.

We can inspect the first 5 rows of the data using function head().


head(dat, 5)              # Preview a dataset with 'head()'


Listing All Variables

Functions names() or colnames() will print all variable names in a dataset.


names(dat)                # List all variables with 'names()'
##  [1] "tax_id"       "neighborhood" "stnum"        "stname"       "zip"         
##  [6] "owner"        "frontfeet"    "depth"        "sqft"         "acres"       
## [11] "yearbuilt"    "age"          "age_range"    "land_use"     "units"       
## [16] "residential"  "rental"       "vacantbuil"   "assessedla"   "assessedva"  
## [21] "tax.exempt"   "countytxbl"   "schooltxbl"   "citytaxabl"   "star"        
## [26] "amtdelinqu"   "taxyrsdeli"   "totint"       "overduewater"


Previewing Specific Variables

We can also inspect the values of a variable by extracting it with $.

The extracted variable is called a “vector”.


head(dat$owner, 10)       # Preview a variable, or "vector"
##  [1] "CLARMIN BUILDERS ONON COR" "JOHNSTON LEE R"           
##  [3] "CHRISTO CRAIG S"           "HAWKINS FARMS INC"        
##  [5] "PETERS LYNNETTE"           "MITCHELL LOTAN G"         
##  [7] "WHALEN GIOVANNA A"         "BERGH GARY D"             
##  [9] "CITY OF SYRACUSE TD"       "DOUGHERTY ROBERT K JR"
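
As a small sketch of what extraction returns, the example below uses a hypothetical three-row data frame named toy (invented values standing in for dat). It compares $ with the double-bracket and single-bracket forms, which are standard base R alternatives not used elsewhere in this report.

```r
# Hypothetical toy data frame; values are illustrative only
toy <- data.frame(owner = c("SMITH JANE", "CITY OF SYRACUSE", "DOE JOHN"),
                  acres = c(0.12, 1.50, 0.08),
                  stringsAsFactors = FALSE)

owner_vec <- toy$owner        # '$' returns the column as a vector
same_vec  <- toy[["owner"]]   # '[[ ]]' with the column name does the same
one_col   <- toy["owner"]     # '[ ]' keeps a one-column data frame instead

is.vector(owner_vec)          # Check that the result is a plain vector
identical(owner_vec, same_vec)
class(one_col)
```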


Listing Unique Values

Function unique() helps us determine what values exist in a variable.


unique(dat$land_use)      # Print each distinct value with 'unique()'
##  [1] "Vacant Land"        "Single Family"      "Commercial"        
##  [4] "Parking"            "Two Family"         "Three Family"      
##  [7] "Apartment"          "Schools"            "Parks"             
## [10] "Multiple Residence" "Cemetery"           "Religious"         
## [13] "Recreation"         "Community Services" "Utilities"         
## [16] "Industrial"
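
To count the distinct values rather than print them, unique() can be wrapped in length(). The sketch below assumes a short hypothetical vector named land_use with one missing value, and shows one way to keep NA out of the count.

```r
# Hypothetical vector; values are illustrative only
land_use <- c("Single Family", "Vacant Land", "Single Family", "Commercial", NA)

unique(land_use)                      # Distinct values, including NA

# Drop missing values first so NA is not counted as a category
n_types <- length(unique(land_use[!is.na(land_use)]))
n_types

table(land_use, useNA = "ifany")      # Frequency of each value, NA included
```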


Examining Data Structure

Function str() provides an overview of total rows and columns (dimensions), variable classes, and a preview of values.


str(object = dat,
    vec.len = 2)          # Examine data structure with 'str()'
## 'data.frame':    41502 obs. of  29 variables:
##  $ tax_id      : int  1393130501 1393130500 1437100600 1425100900 1425101000 ...
##  $ neighborhood: chr  "South Valley" "South Valley" ...
##  $ stnum       : chr  "2655" "2635" ...
##  $ stname      : chr  "VALLEY DR" "VALLEY DR" ...
##  $ zip         : chr  "13215" "13120" ...
##  $ owner       : chr  "CLARMIN BUILDERS ONON COR" "JOHNSTON LEE R" ...
##  $ frontfeet   : num  67.2 104.8 ...
##  $ depth       : num  50 46.5 ...
##  $ sqft        : num  2149 6370 ...
##  $ acres       : num  0.0493 0.1462 ...
##  $ yearbuilt   : int  NA 1925 1957 1958 1965 ...
##  $ age         : int  NA 90 58 57 50 ...
##  $ age_range   : chr  NA "81-90" ...
##  $ land_use    : chr  "Vacant Land" "Single Family" ...
##  $ units       : int  0 0 0 0 0 ...
##  $ residential : logi  FALSE TRUE TRUE ...
##  $ rental      : logi  FALSE FALSE FALSE ...
##  $ vacantbuil  : logi  FALSE FALSE FALSE ...
##  $ assessedla  : int  475 10800 20200 18000 18000 ...
##  $ assessedva  : int  500 69300 88300 70500 74000 ...
##  $ tax.exempt  : logi  TRUE FALSE FALSE ...
##  $ countytxbl  : int  500 69300 88300 70500 74000 ...
##  $ schooltxbl  : int  500 69300 88300 70500 74000 ...
##  $ citytaxabl  : int  500 69300 88300 70500 74000 ...
##  $ star        : logi  NA TRUE TRUE ...
##  $ amtdelinqu  : num  0 0 0 0 0 ...
##  $ taxyrsdeli  : int  0 0 0 0 0 ...
##  $ totint      : num  0 0 0 0 0 ...
##  $ overduewater: num  0 178 ...
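
Each piece of information that str() reports can also be retrieved on its own. The following sketch uses a hypothetical three-row data frame named toy (invented values) to show the individual functions for dimensions, column classes, and missing-value counts.

```r
# Hypothetical toy data frame; values are illustrative only
toy <- data.frame(tax_id = c(101L, 102L, 103L),
                  acres  = c(0.05, 0.15, NA),
                  rental = c(FALSE, TRUE, FALSE))

dim(toy)                              # Rows, then columns
nrow(toy)                             # Rows only
ncol(toy)                             # Columns only

col_classes <- sapply(toy, class)     # Class of each variable
col_classes

na_counts <- colSums(is.na(toy))      # Missing values per variable
na_counts
```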



Questions & Solutions

Instructions: Provide the code for each solution in the following “chunks”.

Remember to modify the text to show your answer in human-readable terms.
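
One possible way to turn raw output into human-readable text is to format the numbers in code. The sketch below hard-codes three results reported later in this document and formats them with base R functions format() and sprintf(). The variable names are hypothetical.

```r
# Values copied from the results reported in this document
parcels_txt <- format(41502, big.mark = ",")                 # Thousands separator
acres_txt   <- format(12510.49, big.mark = ",", nsmall = 2)  # Keep two decimals
exempt_txt  <- sprintf("%.2f%%", 100 * 0.1070069)            # Proportion as percent

parcels_txt
acres_txt
exempt_txt
```

If the report is written in R Markdown (an assumption based on the reference to "chunks"), the same calls could be applied directly to nrow(dat) or sum(dat$acres, na.rm = TRUE) so the text updates when the data change.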


Question 1: Total Parcels

Question: How many tax parcels are in Syracuse, NY?

Answer: There are 41,502 tax parcels in Syracuse, NY.


# Use an exploratory function like 'dim()', 'nrow()', or 'str()'

dim(dat)
## [1] 41502    29


Question 2: Total Acres

Question: How many acres of land are in Syracuse, NY?

Answer: There are 12,510.49 acres of land in Syracuse, NY.


# Pass a numeric variable to function 'sum()', with argument 'na.rm = TRUE'

sum(dat$acres, na.rm = TRUE)
## [1] 12510.49


Question 3: Vacant Buildings

Question: How many vacant buildings are there in Syracuse, NY?

Answer: There are 1,888 vacant buildings in Syracuse, NY.


# Pass a logical variable to function 'sum()', with argument 'na.rm = TRUE' (each TRUE counts as 1)

sum(dat$vacantbuil, na.rm = TRUE)
## [1] 1888
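
Summing works here because R treats TRUE as 1 and FALSE as 0. A minimal sketch with a hypothetical five-element logical vector shows the coercion and why na.rm = TRUE matters.

```r
# Hypothetical logical vector with one missing value
vacant <- c(TRUE, FALSE, NA, TRUE, FALSE)

as.numeric(vacant)                      # TRUE becomes 1, FALSE becomes 0, NA stays NA

sum(vacant)                             # NA: a single missing value hides the total

n_vacant <- sum(vacant, na.rm = TRUE)   # Missing values dropped before summing
n_vacant
```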

Question 4: Tax-Exempt Parcels

Question: What proportion of parcels are tax-exempt?

Answer: 10.70% of parcels (4,441 of 41,502) are tax-exempt.


# Pass a logical ('TRUE' or 'FALSE') variable to function 'mean()', with argument 'na.rm = TRUE'

table(dat$tax.exempt)
## 
## FALSE  TRUE 
## 37061  4441
mean(dat$tax.exempt, na.rm = TRUE)
## [1] 0.1070069
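
The mean of a logical variable equals the share of TRUE values, so it matches the counts from table(). The sketch below checks this on a hypothetical vector, then repeats the arithmetic with the counts printed above. Function prop.table() is a base R alternative not used elsewhere in this report.

```r
# Hypothetical logical vector with one missing value
exempt <- c(TRUE, FALSE, FALSE, FALSE, NA)

p_mean <- mean(exempt, na.rm = TRUE)    # Share of TRUE among non-missing values
p_tab  <- prop.table(table(exempt))     # 'table()' drops NA by default

p_mean
p_tab

# Same calculation by hand, using the counts from 'table(dat$tax.exempt)'
p_parcels <- 4441 / (37061 + 4441)
p_parcels
```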


Question 5: Neighborhoods & Parcels

Question: Which neighborhood contains the most tax parcels?

Answer: Eastwood contains the most tax parcels with 4,889.


# Pass the appropriate variable to function 'table()'

# Optional: Use additional functions to narrow your results

table(dat$neighborhood)
## 
##                Brighton          Court-Woodlawn                Downtown 
##                    2302                    2402                     389 
##                Eastwood                 Elmwood            Far Westside 
##                    4889                    1444                    1027 
##         Franklin Square            Hawley-Green               Lakefront 
##                      89                     367                     312 
##            Lincoln Hill             Meadowbrook           Near Eastside 
##                    1123                    1878                     441 
##           Near Westside            North Valley               Northside 
##                    1772                    1531                    3261 
##          Outer Comstock               Park Ave.           Prospect Hill 
##                     990                     942                     365 
##            Salt Springs                Sedgwick              Skunk City 
##                    1414                    1138                     713 
##            South Campus            South Valley               Southside 
##                      36                    1925                    1370 
##               Southwest              Strathmore               Tipp Hill 
##                    1150                    1822                    1468 
##         University Hill University Neighborhood       Washington Square 
##                     505                    1259                    1180 
##                Westcott               Winkworth 
##                    1540                     452
head(sort(table(dat$neighborhood), decreasing = TRUE)) 
## 
##       Eastwood      Northside Court-Woodlawn       Brighton   South Valley 
##           4889           3261           2402           2302           1925 
##    Meadowbrook 
##           1878
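
When only the top category is needed, which.max() offers a shorter route than sorting. This sketch assumes a hypothetical vector named hood whose counts are invented for illustration.

```r
# Hypothetical vector of neighborhood labels; counts are illustrative only
hood <- c("Eastwood", "Northside", "Eastwood", "Brighton", "Eastwood", "Northside")

counts <- table(hood)

top_name <- names(which.max(counts))    # Label of the largest count
top_n    <- max(counts)                 # The largest count itself

top_name
top_n
```

Note that which.max() returns only the first maximum, so the sorted table remains the safer choice when ties are possible.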


Question 6: Neighborhoods & Vacant Lots

Question: Which neighborhood contains the most vacant lots?

Answer: Near Westside contains the most vacant lots with 425.


# Pass two variables to function 'table()', separated by a comma

# Optional: Use additional functions to narrow your results

head(table(dat$neighborhood, dat$land_use))
##                 
##                  Apartment Cemetery Commercial Community Services Industrial
##   Brighton              26        0         49                 10          0
##   Court-Woodlawn        18        4         52                  2          1
##   Downtown               6        0        209                 17          4
##   Eastwood             139        0        149                  6          2
##   Elmwood               13        3         41                  4          0
##   Far Westside          32        0         82                  2          2
##                 
##                  Multiple Residence Parking Parks Recreation Religious Schools
##   Brighton                        9       2     1          2        16       3
##   Court-Woodlawn                  4       7     3          0         2       4
##   Downtown                        0      78     8          5         6       4
##   Eastwood                       13      15     3          4         7       2
##   Elmwood                         5       8     6          2         7       2
##   Far Westside                   14      11     1          4         5       1
##                 
##                  Single Family Three Family Two Family Utilities Vacant Land
##   Brighton                1398           38        436         0         312
##   Court-Woodlawn          1859           11        370         1          64
##   Downtown                   1            0          0         6          45
##   Eastwood                3605           50        718         2         174
##   Elmwood                  909           18        240         0         186
##   Far Westside             471           23        271         7         101
table1 <- table(dat$neighborhood, dat$land_use)
head(sort(table1[, "Vacant Land"], decreasing = TRUE))
## Near Westside     Southwest     Southside      Brighton       Elmwood 
##           425           367           341           312           186 
##      Eastwood 
##           174
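
An alternative to building the full two-way table is to filter for vacant land first and then tabulate a single variable. The sketch below assumes a hypothetical six-row data frame named toy with placeholder neighborhood names, and confirms that both approaches agree.

```r
# Hypothetical toy data frame; neighborhoods and values are placeholders
toy <- data.frame(
  neighborhood = c("Hood A", "Hood A", "Hood B", "Hood B", "Hood B", "Hood C"),
  land_use     = c("Vacant Land", "Single Family", "Vacant Land",
                   "Vacant Land", "Commercial", "Single Family"),
  stringsAsFactors = FALSE)

# Approach 1: keep only vacant lots, then count by neighborhood
is_vacant      <- toy$land_use == "Vacant Land"
vacant_by_hood <- sort(table(toy$neighborhood[is_vacant]), decreasing = TRUE)
vacant_by_hood

# Approach 2: two-way table, then extract the "Vacant Land" column
crosstab <- table(toy$neighborhood, toy$land_use)
crosstab[, "Vacant Land"]
```

The filtered version omits neighborhoods with no vacant lots, while the two-way table lists them with a count of zero.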