Introduction

In this analysis of NYC parking and speed camera tickets, we answer three key questions:

  1. Do certain agencies issue higher payments?
  2. Do drivers from different states (NY, NJ, CT) pay more?
  3. Do certain counties tend to have higher payment amounts?

Dataset:
NYC Open Parking and Camera Violations (NYC Open Data, resource nc67-uf89)


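library(tidyverse)  # dplyr verbs and ggplot2
library(httr)       # GET(), content()
library(jsonlite)   # fromJSON()
library(mosaic)     # favstats()

# Request up to 99,999 rows from the NYC Open Data endpoint
endpoint <- "https://data.cityofnewyork.us/resource/nc67-uf89.json"
resp <- GET(endpoint, query = list("$limit" = 99999))
stop_for_status(resp)  # stop early if the request failed
camera <- fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)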


# Dollar-amount columns to convert to numeric before summarizing
num_vars <- c("fine_amount","interest_amount","reduction_amount",
              "payment_amount","amount_due","penalty_amount")

camera[num_vars] <- lapply(camera[num_vars], as.numeric)
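
The conversion above silently turns any unparseable entry into NA. A minimal sketch of how one might count those cases is shown here; the vector `raw` is made-up toy data standing in for one amount column, not values from the dataset.

```r
# Toy stand-in for one amount column (values are invented for illustration)
raw <- c("50", "115.00", "", "N/A", NA)

# suppressWarnings() hides the "NAs introduced by coercion" warning
# so the failures can be counted explicitly instead
converted <- suppressWarnings(as.numeric(raw))

# Entries that were present as text but could not be parsed as numbers
newly_missing <- sum(is.na(converted) & !is.na(raw))
newly_missing
```

Running the same check on each column in `num_vars` would show whether the later `!is.na(payment_amount)` filters are dropping genuinely missing payments or parse failures.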

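# Expand the single-letter county codes. Codes not listed here
# (e.g. NY, BX, BK, QN, ST, MN) are left unchanged by recode().
camera <- camera %>%
  mutate(county = recode(county,
                         "K" = "Kings County",
                         "Q" = "Queens County",
                         "B" = "Bronx",
                         "M" = "Manhattan",
                         "R" = "Richmond"))

# Convert the grouping variables to factors for plotting and ANOVA
camera <- camera %>%
  mutate(
    agency = factor(issuing_agency),
    plate_state = factor(state),
    county = factor(county)
  )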

1. Do Certain Agencies Issue Higher Payments?

camera_agency <- camera %>% filter(!is.na(payment_amount), !is.na(agency))

ggplot(camera_agency, aes(x = agency, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by Agency",
       x = "Issuing Agency",
       y = "Payment Amount ($)")

Agencies like the Department of Sanitation and the Department of Transportation show narrow distributions, indicating that their payments are generally low and do not vary much. The Police Department has a higher median payment, and both the Police Department and Traffic show a long right tail with high outliers (over $500), indicating high-cost violations.

favstats(payment_amount ~ agency, data = camera_agency) %>%
  arrange(desc(mean))
##                                 agency min    Q1 median       Q3    max
## 1                      FIRE DEPARTMENT  95 95.00     95 117.5000 125.28
## 2                     PARKS DEPARTMENT   0 62.50     95 125.0000 175.73
## 3                    POLICE DEPARTMENT   0 45.00    100 115.0000 530.00
## 4                             CON RAIL   0 71.25     95 101.3175 120.27
## 5                              TRAFFIC   0 40.00     65 115.0000 525.00
## 6               OTHER/UNKNOWN AGENCIES   0 35.00     70 115.0000 218.59
## 7                    BOARD OF ESTIMATE  65 65.00     65  65.0000  65.00
## 8                       PORT AUTHORITY   0 60.00     60  75.0000 115.00
## 9         DEPARTMENT OF TRANSPORTATION   0 50.00     50  75.0000 232.51
## 10            DEPARTMENT OF SANITATION   0 45.00     45  55.0000 182.31
## 11                   TRANSIT AUTHORITY   0  0.00      0   0.0000   0.00
## 12 TRIBOROUGH BRIDGE AND TUNNEL POLICE   0  0.00      0   0.0000   0.00
##         mean       sd     n missing
## 1  105.04667 15.56448     6       0
## 2   86.34817 47.85326    71       0
## 3   82.04333 49.48397  2278       0
## 4   77.56750 53.06601     4       0
## 5   70.39783 44.55697 81929       0
## 6   69.32329 50.45006   149       0
## 7   65.00000       NA     1       0
## 8   62.00000 41.32191     5       0
## 9   59.35326 18.06863  6942       0
## 10  51.25982 25.80064  1225       0
## 11   0.00000       NA     1       0
## 12   0.00000  0.00000     2       0

Board of Estimate and Transit Authority each contribute a single payment, and Triborough Bridge and Tunnel Police contributes two payments of $0, so their lack of variance reflects tiny samples rather than set fees. The Fire, Parks, and Police departments and Con Rail have the highest medians ($95 to $100). The Police Department has the highest single payment ($530.00), followed by Traffic ($525.00). Among agencies with more than 1,000 payments, the Police Department shows the most variability (SD of about $49).

1.3 ANOVA + Supernova

agency_model <- aov(payment_amount ~ agency, data = camera_agency)
summary(agency_model)
##                Df    Sum Sq Mean Sq F value Pr(>F)    
## agency         11   1588336  144394    77.8 <2e-16 ***
## Residuals   92601 171863554    1856                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows a highly significant effect of agency on payment amounts: F(11, 92601) = 77.8, p < .001. That is, mean payment amounts differ significantly across agencies.
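
To keep the reported F statistic in step with the model output, the degrees of freedom and F value can be pulled from the ANOVA table rather than typed by hand. The sketch below uses simulated toy data (`toy`, `group`, and `y` are hypothetical names), not the ticket data.

```r
# Simulated three-group data, for illustration only
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 10)),
  y     = c(rnorm(10, 50, 5), rnorm(10, 55, 5), rnorm(10, 60, 5))
)

fit <- aov(y ~ group, data = toy)

# summary() on an aov fit returns a list; its first element is the ANOVA table
tab <- summary(fit)[[1]]

df1     <- as.integer(tab[1, "Df"])   # between-groups degrees of freedom
df2     <- as.integer(tab[2, "Df"])   # residual degrees of freedom
f_value <- tab[1, "F value"]

# Build the label used in the write-up directly from the model
label <- sprintf("F(%d, %d) = %.1f", df1, df2, f_value)
label
```

Applied to `agency_model`, the same lines would reproduce the degrees of freedom and F value printed by `summary(agency_model)`.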

y <- camera_agency$payment_amount
ss_total <- sum((y - mean(y))^2)

ss_between <- anova(agency_model)["agency", "Sum Sq"]
pre_agency <- ss_between / ss_total
round(pre_agency, 3)
## [1] 0.009
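
The ratio computed above (between-group sum of squares over total sum of squares) is the proportional reduction in error, which for a one-way model should match the R-squared of the equivalent linear model. A small check on simulated data, with hypothetical names `toy`, `group`, and `y`:

```r
# Simulated data with small group differences and a large spread
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 20)),
  y     = rnorm(60, mean = rep(c(60, 65, 70), each = 20), sd = 40)
)

fit <- aov(y ~ group, data = toy)

# Sum-of-squares column: between-group first, residual second
ss  <- anova(fit)[["Sum Sq"]]
pre <- ss[1] / sum(ss)

# R-squared from the same model fitted with lm()
r2 <- summary(lm(y ~ group, data = toy))$r.squared

all.equal(pre, r2)
```

A value near 0.009 therefore means agency accounts for roughly 0.9% of the variation in payment amounts, even though the F test is highly significant with a sample this large.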

2. Do Drivers from Different States (NY, NJ, CT) Pay More?

camera_states <- camera %>%
  filter(plate_state %in% c("NY", "NJ", "CT"),
         !is.na(payment_amount)) %>%
  # Drop the unused state levels so summaries list only NY, NJ, and CT
  mutate(plate_state = droplevels(plate_state))

ggplot(camera_states, aes(x = plate_state, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by Driver State (NY, NJ, CT)",
       x = "Plate State",
       y = "Payment Amount ($)")

Although median payments are the same for each state, New York has much higher outlier payments. Connecticut has the lowest maximum payment, with New Jersey in the middle.

favstats(payment_amount ~ plate_state, data = camera_states) %>%
  arrange(desc(mean))
##    plate_state min Q1 median  Q3    max     mean       sd     n missing
## 1           NJ   0 45     65 115 302.56 72.45103 47.77582  7217       0
## 2           CT   0 45     65 115 252.87 71.78621 43.07605  1099       0
## 3           NY   0 45     65 105 530.00 69.33661 42.43176 76071       0

New Jersey drivers have the highest mean payment ($72.45), Connecticut is slightly lower at $71.79, and New York has the lowest mean ($69.34). All three states share a median of $65, but New York has the most extreme high payment, at $530.00.

state_model <- aov(payment_amount ~ plate_state, data = camera_states)
summary(state_model)
##                Df    Sum Sq Mean Sq F value   Pr(>F)    
## plate_state     2     69090   34545   18.75 7.22e-09 ***
## Residuals   84384 155468652    1842                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

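The ANOVA shows a significant effect of plate state on payment amounts: F(2, 84384) = 18.75, p < .001. For comparison, in the agency analysis the Police Department, Traffic, and Other/Unknown Agencies show high variability, with medians between $65 and $100 and very high maximum payments (up to $530 in the Police Department and $525 in Traffic).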

y <- camera_states$payment_amount
ss_total <- sum((y - mean(y))^2)

ss_between <- anova(state_model)["plate_state", "Sum Sq"]
pre_state <- ss_between / ss_total
round(pre_state, 3)
## [1] 0
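
Rounding to three decimals hides the size of the state effect. As a rough cross-check, the same ratio can be recomputed from the sums of squares printed in the ANOVA table above; those printed values are themselves rounded, so treat the result as approximate.

```r
# Sums of squares copied from the summary(state_model) output above
ss_between  <- 69090
ss_residual <- 155468652

# Proportion of total variation attributable to plate state
pre_state_check <- ss_between / (ss_between + ss_residual)

# signif() keeps two significant digits instead of three decimal places
signif(pre_state_check, 2)

# The same quantity expressed as a percentage of total variability
pct_state <- 100 * pre_state_check
pct_state
```

On these figures plate state accounts for well under 0.1% of the variation in payments, which is why `round(pre_state, 3)` prints 0.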

Even though payment amounts differ across agencies, states, and counties, the differences are small: agency explains about 0.9% of the variability in payments, and plate state explains less than 0.1%. Some agencies, like Traffic and the Police Department, issue tickets with higher and more variable payments, while others have consistently low amounts. New York plates have much higher outlier payments than New Jersey or Connecticut plates, so the law firm should focus more on NY drivers to help them navigate the high costs.

3. Do Certain Counties Tend to Have Higher Payment Amounts?

camera_county <- camera %>% filter(!is.na(payment_amount), !is.na(county))

3.2 Boxplot

ggplot(camera_county, aes(x = county, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by County",
       x = "County",
       y = "Payment Amount ($)")

Most counties have medians between about $50 and $75, and their outliers are very high across the board. Tickets coded NY (New York County, i.e., Manhattan) show the highest outlier payments, but these differences are small compared to the overall spread. Overall, county does not meaningfully distinguish how much drivers pay.

favstats(payment_amount ~ county, data = camera_county) %>%
  arrange(desc(mean))
##           county min Q1 median      Q3    max     mean       sd     n missing
## 1       Richmond   0 65     75 115.000 278.31 88.34831 46.09455   673       0
## 2             NY   0 35     75 115.000 530.00 73.06468 47.42286 26689       0
## 3             BX   0 45     65 115.000 309.21 72.71335 46.15045 14325       0
## 4   Kings County   0 45     60 108.735 296.65 70.52878 40.94555 19532       0
## 5  Queens County   0 35     60 105.000 296.40 65.44313 42.01516 25399       0
## 6          Kings  65 65     65  65.000  65.00 65.00000  0.00000     3       0
## 7             BK   0 50     50  75.040 181.02 61.06303 17.31954  2323       0
## 8             QN   0 50     50  75.000 170.55 57.73859 15.26909  2013       0
## 9             ST  50 50     50  50.000 150.60 55.23266 11.61537   538       0
## 10            MN   0 50     50  50.000 232.51 54.80428 17.71390   857       0
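
The table lists what appear to be the same boroughs under several codes (for example BK, Kings, and Kings County). One way to consolidate them before summarizing is a lookup vector, sketched below. The borough assigned to each code is an assumption made for illustration and should be checked against the dataset documentation; `borough_lookup` and `toy_county` are hypothetical names.

```r
# Hypothetical mapping from county codes to boroughs (ASSUMED, not from the source)
borough_lookup <- c(
  "NY" = "Manhattan", "MN" = "Manhattan",
  "BX" = "Bronx",
  "Kings County" = "Brooklyn", "Kings" = "Brooklyn", "BK" = "Brooklyn",
  "Queens County" = "Queens", "QN" = "Queens",
  "Richmond" = "Staten Island", "ST" = "Staten Island"
)

# Toy county values, including one unknown code ("ZZ")
toy_county <- c("NY", "BK", "Kings County", "QN", "ST",
                "Richmond", "MN", "BX", "Kings", "ZZ")

# as.character() matters when the input is a factor: indexing by a factor
# would use its integer codes rather than its labels
borough <- unname(borough_lookup[as.character(toy_county)])

# Codes missing from the lookup come back as NA instead of being guessed
table(borough, useNA = "ifany")
```

Refitting the county summaries on a consolidated column like this would change the group sizes and means, so the statistics reported here apply only to the codes as they appear in the table.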

Overall, payment amounts are similar across counties. Richmond has the highest mean payment ($88.35), and tickets coded NY (New York County, i.e., Manhattan) have the highest maximum ($530); both groups have a median of $75. The BK, QN, ST, and MN codes all cluster around a median of $50.

county_model <- aov(payment_amount ~ county, data = camera_county)
summary(county_model)
##                Df    Sum Sq Mean Sq F value Pr(>F)    
## county          9   1902722  211414   114.1 <2e-16 ***
## Residuals   92342 171040908    1852                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA shows a significant effect of county on payment amounts: F(9, 92342) = 114.1, p < .001.

y_county <- camera_county$payment_amount
ss_total_county <- sum((y_county - mean(y_county))^2)

ss_between_county <- anova(county_model)["county", "Sum Sq"]
pre_county <- ss_between_county / ss_total_county
round(pre_county, 3)
## [1] 0.011

County explains about 1.1% of the total variability.

Based on these findings, the law firm should prioritize marketing to New York drivers, particularly those receiving tickets in Manhattan, because this group faces the most extreme ticket costs and therefore has the strongest financial reason to fight violations. The data show that tickets coded NY (New York County, i.e., Manhattan) tie for the highest median payment ($75) and have the largest maximum payment (up to $530), and NY drivers as a whole experience more extreme ticket amounts than NJ or CT drivers.