NY State Total Income And Tax Liability

Data Overview

This project explores “Total Income And Tax Liability By Place Of Residence” dataset from NY State. Dataset was obtained from US Government open data website (data.gov): https://catalog.data.gov/dataset/total-income-and-tax-liability-by-place-of-residence-beginning-tax-year-1999. The data includes tax liability information for full-year resident, full-year nonresident, and part-year resident returns for 1999 through 2014 tax years. The data includes income, deductions, dependent exemptions, and tax liability and information is provided by income and county. The project will attempt to determine whether there is any impactful difference between resident and non-resident tax lability. This data is relevant to current tax laws and policies and it might expose a pattern of tax loopholes or indications for a need of a policy change.

Data Clean-up

Reading in the data:

df <- read.csv("https://raw.githubusercontent.com/che10vek/Data608/master/Total_Income_And_Tax_Liability_By_Place_Of_Residence__Beginning_Tax_Year_1999.csv", header= TRUE)

And lets preview this data:

knitr::kable(head(df),format = "markdown")
Tax.Year Resident.Type Place.of.Residence Country State County Disclosure Number.of.All.Returns NY.AGI.of.All.Returns..in.thousands… Tax.Liability.of.All.Returns..in.thousands…. Number.of.Taxable.Returns NY.AGI.of.Taxable.Returns..in.thousands… Tax.Liability.of.Taxable.Returns..in.thousands…. Number.of.Nontaxable.Returns NY.AGI.of.Nontaxable.Returns..in.thousands… Tax.Liability.of.Nontaxable.Returns..in.thousands…. Average.NY.AGI.of.All.Returns Average.Tax.of.All.Returns Average.NY.AGI.of.Taxable.Returns Average.Tax.of.Taxable.Returns Average.NY.AGI.of.Nontaxable.Returns County.Sort.Order
2011 Full-Year Resident New York City - Bronx United States New York Bronx NA 588674 17626369 433593 299820 14719525 598802 288854 2906845 -165209 29942 737 49095 1997 10063 1
2011 Full-Year Resident New York City - Kings United States New York Kings NA 1078520 48145626 1989456 636200 44304145 2239988 442320 3841481 -250532 44640 1845 69639 3521 8685 2
2011 Full-Year Resident New York City - Manhattan United States New York Manhattan NA 808917 129181676 9465286 575612 132401862 9552543 233305 -3220186 -87257 159697 11701 230019 16595 -13802 3
2011 Full-Year Resident New York City - Queens United States New York Queens NA 1028443 41248512 1512781 627276 37521694 1694491 401167 3726818 -181709 40108 1471 59817 2701 9290 4
2011 Full-Year Resident New York City - Richmond United States New York Richmond NA 204587 11442144 494959 140940 11001652 518404 63647 440492 -23445 55928 2419 78059 3678 6921 5
2011 Full-Year Resident Total, New York City United States New York Total, New York City NA 3709141 247644328 13896076 2279848 239948878 14604228 1429293 7695450 -708153 66766 3746 105248 6406 5384 6
summary(df)
##     Tax.Year                  Resident.Type                Place.of.Residence
##  Min.   :1999   Full-Year Nonresident: 760   Washington             :  32    
##  1st Qu.:2003   Full-Year Resident   :1072   Delaware               :  31    
##  Median :2007   Grand Total, Total   :  16   Wyoming                :  20    
##  Mean   :2007   Part-Year Resident   :  16   Albany                 :  16    
##  3rd Qu.:2011                                All Other Countries +++:  16    
##  Max.   :2014                                Allegany               :  16    
##                                              (Other)                :1733    
##                     Country                          State     
##  United States          :1668   New York                :1072  
##  All                    :  48   Not Applicable          : 144  
##  All Other Countries +++:  16   All - Excluding New York:  32  
##  Canada                 :  16   All                     :  20  
##  Residence Unknown ++++ :  16   California              :  16  
##  Germany                :  15   Colorado                :  16  
##  (Other)                :  85   (Other)                 : 564  
##             County     Disclosure     Number.of.All.Returns
##  All           : 600   Mode:logical   Min.   :      52     
##  Not Applicable: 144   NA's:1864      1st Qu.:    3807     
##  Albany        :  16                  Median :   22008     
##  Allegany      :  16                  Mean   :  311807     
##  Bronx         :  16                  3rd Qu.:   67982     
##  Broome        :  16                  Max.   :10273274     
##  (Other)       :1056                                       
##  NY.AGI.of.All.Returns..in.thousands...
##  Min.   :3.740e+03                     
##  1st Qu.:7.028e+05                     
##  Median :1.536e+06                     
##  Mean   :2.472e+07                     
##  3rd Qu.:6.244e+06                     
##  Max.   :1.109e+09                     
##                                        
##  Tax.Liability.of.All.Returns..in.thousands.... Number.of.Taxable.Returns
##  Min.   :     147                               Min.   :     31          
##  1st Qu.:   10494                               1st Qu.:   2585          
##  Median :   35656                               Median :  14784          
##  Mean   :  986693                               Mean   : 214842          
##  3rd Qu.:  140765                               3rd Qu.:  49587          
##  Max.   :41911953                               Max.   :7027102          
##                                                                          
##  NY.AGI.of.Taxable.Returns..in.thousands...
##  Min.   :3.591e+03                         
##  1st Qu.:6.265e+05                         
##  Median :1.355e+06                         
##  Mean   :2.300e+07                         
##  3rd Qu.:5.451e+06                         
##  Max.   :1.022e+09                         
##                                            
##  Tax.Liability.of.Taxable.Returns..in.thousands....
##  Min.   :     147                                  
##  1st Qu.:   11244                                  
##  Median :   38915                                  
##  Mean   : 1040505                                  
##  3rd Qu.:  153005                                  
##  Max.   :43342022                                  
##  NA's   :34                                        
##  Number.of.Nontaxable.Returns NY.AGI.of.Nontaxable.Returns..in.thousands...
##  Min.   :     21              Min.   :-3220186                             
##  1st Qu.:   1062              1st Qu.:   60758                             
##  Median :   7309              Median :  139968                             
##  Mean   :  97017              Mean   : 1718891                             
##  3rd Qu.:  19581              3rd Qu.:  575504                             
##  Max.   :3258100              Max.   :94923870                             
##                                                                            
##  Tax.Liability.of.Nontaxable.Returns..in.thousands....
##  Min.   :-1458748.0                                   
##  1st Qu.:   -7536.8                                   
##  Median :   -2637.5                                   
##  Mean   :  -75963.5                                   
##  3rd Qu.:     -11.8                                   
##  Max.   :       0.0                                   
##  NA's   :1272                                         
##  Average.NY.AGI.of.All.Returns Average.Tax.of.All.Returns
##  Min.   :    6627              Min.   :  122             
##  1st Qu.:   38492              1st Qu.: 1325             
##  Median :   57676              Median : 2024             
##  Mean   :  267333              Mean   : 2800             
##  3rd Qu.:  335807              3rd Qu.: 3230             
##  Max.   :32714136              Max.   :32720             
##                                                          
##  Average.NY.AGI.of.Taxable.Returns Average.Tax.of.Taxable.Returns
##  Min.   :   32554                  Min.   :  695                 
##  1st Qu.:   53920                  1st Qu.: 2166                 
##  Median :   78996                  Median : 2938                 
##  Mean   :  303776                  Mean   : 4096                 
##  3rd Qu.:  384735                  3rd Qu.: 4623                 
##  Max.   :47070012                  Max.   :45523                 
##                                    NA's   :34                    
##  Average.NY.AGI.of.Nontaxable.Returns County.Sort.Order
##  Min.   :-188756                      Min.   :  1.00   
##  1st Qu.:   8083                      1st Qu.: 30.00   
##  Median :   9976                      Median : 59.00   
##  Mean   : 198480                      Mean   : 82.33   
##  3rd Qu.: 163643                      3rd Qu.:128.00   
##  Max.   :9855074                      Max.   :299.00   
## 

Certain rows include “Grand Total” information, I will remove those to avoid double counting, I will accomplish that by subseting the data and filtering out those rows.

df1 <- df %>% filter((Resident.Type != "Grand Total, Total"))
df1 <- df1 %>% filter(Place.of.Residence != "Total, New York City")
df1 <- df1 %>% filter(Place.of.Residence != "Total, Counties Outside of New York City")
df1 <- df1 %>% filter(Place.of.Residence != "Grand Total, Full-Year Resident")
df1 <- df1 %>% filter(Place.of.Residence != "Grand Total, Full-Year Nonresident")

summary(df1)
##     Tax.Year                  Resident.Type                Place.of.Residence
##  Min.   :1999   Full-Year Nonresident: 744   Washington             :  32    
##  1st Qu.:2003   Full-Year Resident   :1024   Delaware               :  31    
##  Median :2007   Grand Total, Total   :   0   Wyoming                :  20    
##  Mean   :2007   Part-Year Resident   :  16   Albany                 :  16    
##  3rd Qu.:2011                                All Other Countries +++:  16    
##  Max.   :2014                                Allegany               :  16    
##                                              (Other)                :1653    
##                     Country                          State     
##  United States          :1620   New York                :1024  
##  All                    :  16   Not Applicable          : 144  
##  All Other Countries +++:  16   All - Excluding New York:  16  
##  Canada                 :  16   California              :  16  
##  Residence Unknown ++++ :  16   Colorado                :  16  
##  Germany                :  15   Connecticut             :  16  
##  (Other)                :  85   (Other)                 : 552  
##             County    Disclosure     Number.of.All.Returns
##  All           :600   Mode:logical   Min.   :     52      
##  Not Applicable:144   NA's:1784      1st Qu.:   3381      
##  Albany        : 16                  Median :  20633      
##  Allegany      : 16                  Mean   :  84070      
##  Bronx         : 16                  3rd Qu.:  48991      
##  Broome        : 16                  Max.   :1138249      
##  (Other)       :976                                       
##  NY.AGI.of.All.Returns..in.thousands...
##  Min.   :     3740                     
##  1st Qu.:   686220                     
##  Median :  1440429                     
##  Mean   :  7092545                     
##  3rd Qu.:  4532468                     
##  Max.   :177432814                     
##                                        
##  Tax.Liability.of.All.Returns..in.thousands.... Number.of.Taxable.Returns
##  Min.   :     147                               Min.   :    31           
##  1st Qu.:    9857                               1st Qu.:  2378           
##  Median :   32686                               Median : 13865           
##  Mean   :  270107                               Mean   : 58264           
##  3rd Qu.:  117261                               3rd Qu.: 34264           
##  Max.   :12317382                               Max.   :690968           
##                                                                          
##  NY.AGI.of.Taxable.Returns..in.thousands...
##  Min.   :     3591                         
##  1st Qu.:   601187                         
##  Median :  1266724                         
##  Mean   :  6556208                         
##  3rd Qu.:  3858014                         
##  Max.   :179562053                         
##                                            
##  Tax.Liability.of.Taxable.Returns..in.thousands....
##  Min.   :     147                                  
##  1st Qu.:   10517                                  
##  Median :   35410                                  
##  Mean   :  283542                                  
##  3rd Qu.:  124265                                  
##  Max.   :12415536                                  
##  NA's   :34                                        
##  Number.of.Nontaxable.Returns NY.AGI.of.Nontaxable.Returns..in.thousands...
##  Min.   :    21.0             Min.   :-3220186                             
##  1st Qu.:   965.8             1st Qu.:   58820                             
##  Median :  6925.5             Median :  130467                             
##  Mean   : 25859.4             Mean   :  537065                             
##  3rd Qu.: 14625.2             3rd Qu.:  451021                             
##  Max.   :449793.0             Max.   :13304043                             
##                                                                            
##  Tax.Liability.of.Nontaxable.Returns..in.thousands....
##  Min.   :-269143                                      
##  1st Qu.:  -5922                                      
##  Median :  -1926                                      
##  Mean   : -10474                                      
##  3rd Qu.:     -7                                      
##  Max.   :      0                                      
##  NA's   :1248                                         
##  Average.NY.AGI.of.All.Returns Average.Tax.of.All.Returns
##  Min.   :    6627              Min.   :  122             
##  1st Qu.:   38158              1st Qu.: 1308             
##  Median :   55342              Median : 1958             
##  Mean   :  273961              Mean   : 2765             
##  3rd Qu.:  353794              3rd Qu.: 3182             
##  Max.   :32714136              Max.   :32720             
##                                                          
##  Average.NY.AGI.of.Taxable.Returns Average.Tax.of.Taxable.Returns
##  Min.   :   32554                  Min.   :  695                 
##  1st Qu.:   53464                  1st Qu.: 2137                 
##  Median :   76072                  Median : 2866                 
##  Mean   :  311003                  Mean   : 4048                 
##  3rd Qu.:  398284                  3rd Qu.: 4438                 
##  Max.   :47070012                  Max.   :45523                 
##                                    NA's   :34                    
##  Average.NY.AGI.of.Nontaxable.Returns County.Sort.Order
##  Min.   :-188756                      Min.   :  1.00   
##  1st Qu.:   8094                      1st Qu.: 29.00   
##  Median :   9976                      Median : 57.00   
##  Mean   : 204307                      Mean   : 80.19   
##  3rd Qu.: 167031                      3rd Qu.:125.25   
##  Max.   :9855074                      Max.   :250.00   
## 

Based on the summry table above - the data looks very clean - there is only 1 significant column with NAs. Tax.Liability.of.Nontaxable.Returns column includes NAs due to the fact that detailed breakdown information is not avaiilable by county for prior years, that information is only available for several of the more recent years. I will use Mice to impute using random forest imputation since I believe the data will be a good candidate for this method.

df1_imputed <- mice(df1, m = 1, method = "rf", print = F) %>% mice::complete()

Data Visualization

Let’s begin by creating a visualizaiton for Average NY AGI for all returns and seeing what we can learn from that data.

inc1 <- df1_imputed %>%
  group_by(Tax.Year) %>%
  summarize(AGI = mean(Average.NY.AGI.of.All.Returns))
inc1<-arrange(inc1, desc(Tax.Year))
ggplot(inc1, aes(x = Tax.Year, y = AGI)) + 
  geom_bar(stat = "identity", fill = 'light blue')+
  labs(title="Average AGI of NY Returns by Year (Thousands)", x = "", y = "") +
  geom_text(aes(label=round(AGI,0)), size = 2, color = "black") +
  theme_bw()

From the above visualizaiton we can observe the Adjusted Gross Income trends over the years. The effect of the recent economic crisis is very apparent from this plot.

Next, we will explore the Average Tax of all returns.

inc2 <- df1_imputed %>%
  group_by(Tax.Year) %>%
  summarize(Tax_Liability = mean(Average.Tax.of.All.Returns))
inc2<-arrange(inc2, desc(Tax.Year))
ggplot(inc2, aes(x = Tax.Year, y = Tax_Liability)) + 
  geom_bar(stat = "identity", fill = 'light blue')+
  labs(title="Average Tax Liability for All NY Returns by Year (Thousands)", x = "", y = "") +
  geom_text(aes(label=round(Tax_Liability,0)), size = 2, color = "black") +
  theme_bw()

It is intersting to observe that the average tax doesn’t really follow the same trends and patterns we are observing in the AGI data.

Next, I will create an interactive visualization app which will allow the user to interact with the data and view the details by place of residence.

Link to the App:

https://che10vek.shinyapps.io/FPApp/