This analysis of parking and speed camera tickets answers three key questions: do payment amounts differ by issuing agency, by the driver's plate state, and by county?
Dataset: NYC Parking Camera Violations
library(tidyverse)
library(httr)
library(jsonlite)
library(mosaic)
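endpoint <- "https://data.cityofnewyork.us/resource/nc67-uf89.json"
# Request up to 99,999 rows and stop if the API returns an HTTP error
resp <- GET(endpoint, query = list("$limit" = 99999))
stop_for_status(resp)
camera <- fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)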
# Convert the monetary columns to numeric
num_vars <- c("fine_amount", "interest_amount", "reduction_amount",
              "payment_amount", "amount_due", "penalty_amount")
camera[num_vars] <- lapply(camera[num_vars], as.numeric)
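Because `as.numeric()` silently turns unparseable text into `NA`, it can be worth counting how many missing values the conversion itself introduces. The following is a minimal sketch on made-up strings; `raw_amounts` is a hypothetical stand-in for one monetary column, not data from the API.

```r
# Toy stand-in for one monetary column (assumed to arrive as text)
raw_amounts <- c("65", "115.50", "", "N/A", NA, "0")

# as.numeric() maps unparseable strings to NA, so compare missingness
# before and after the conversion
converted <- suppressWarnings(as.numeric(raw_amounts))
n_new_na <- sum(is.na(converted) & !is.na(raw_amounts))

n_new_na  # 2 in this toy example: "" and "N/A"
```

Running the same comparison on each column in `num_vars` would show whether the later `!is.na(payment_amount)` filters are dropping rows that originally held text.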
# Expand single-letter county codes to full names
camera <- camera %>%
  mutate(county = recode(county,
                         "K" = "Kings County",
                         "Q" = "Queens County",
                         "B" = "Bronx",
                         "M" = "Manhattan",
                         "R" = "Richmond"))
# Store the grouping variables as factors
camera <- camera %>%
  mutate(
    agency = factor(issuing_agency),
    plate_state = factor(state),
    county = factor(county)
  )
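The recode step above expands only the single-letter codes. The county summary later in this analysis also lists values such as NY, BX, BK, QN, ST, MN, and Kings, which suggests the same borough can appear under several spellings. Below is a minimal sketch of one way to collapse them. The `borough_lookup` table and `harmonize_county()` helper are hypothetical, and the code-to-borough mapping is an assumption that should be checked against the dataset's documentation.

```r
# Hypothetical lookup from raw county codes to borough names.
# The mapping is an assumption based on the codes printed in this analysis.
borough_lookup <- c(
  "K" = "Brooklyn", "BK" = "Brooklyn", "Kings" = "Brooklyn",
  "Q" = "Queens", "QN" = "Queens",
  "B" = "Bronx", "BX" = "Bronx",
  "M" = "Manhattan", "MN" = "Manhattan", "NY" = "Manhattan",
  "R" = "Staten Island", "ST" = "Staten Island"
)

harmonize_county <- function(code) {
  code <- as.character(code)
  out <- unname(borough_lookup[code])
  # Keep codes that are not in the lookup unchanged instead of turning them into NA
  ifelse(is.na(out), code, out)
}

toy_codes <- c("K", "BK", "Kings", "NY", "MN", "R", "ST", "ZZ", NA)
table(harmonize_county(toy_codes), useNA = "ifany")
```

Applying a mapping like this before `factor(county)` would put each borough in a single group, which would change the county summaries and ANOVA reported below.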
camera_agency <- camera %>% filter(!is.na(payment_amount), !is.na(agency))
ggplot(camera_agency, aes(x = agency, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by Agency",
       x = "Issuing Agency",
       y = "Payment Amount ($)")
Agencies such as the Department of Sanitation and the Department of
Transportation show narrow distributions, indicating that their payments
are generally low and vary little. The Police Department has a higher
median payment ($100) than most agencies, and both it and Traffic show a
long right tail with high outliers (over $300), indicating high-cost
violations.
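For context on what counts as an outlier in these plots: `geom_boxplot()` draws whiskers out to the most extreme value within 1.5 times the interquartile range of the box, and plots anything beyond that as a separate point. The sketch below applies that rule to the quartiles reported in the agency summary table of this analysis. `upper_fence()` is a hypothetical helper, and the result is approximate because the boxplot's hinges can differ slightly from the tabulated quartiles.

```r
# Upper outlier fence under the 1.5 * IQR rule: Q3 + 1.5 * (Q3 - Q1)
upper_fence <- function(q1, q3, k = 1.5) q3 + k * (q3 - q1)

# Quartiles copied from the agency summary table in this analysis
police_fence  <- upper_fence(q1 = 45, q3 = 115)  # 115 + 1.5 * 70 = 220
traffic_fence <- upper_fence(q1 = 40, q3 = 115)  # 115 + 1.5 * 75 = 227.5

c(police = police_fence, traffic = traffic_fence)
```

Under this rule, payments above roughly $220 to $228 would already be drawn as outlier points for these two agencies, so the $300+ payments sit well beyond the whiskers.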
favstats(payment_amount ~ agency, data = camera_agency) %>%
  arrange(desc(mean))
## agency min Q1 median Q3 max
## 1 FIRE DEPARTMENT 95 95.00 95 117.5000 125.28
## 2 PARKS DEPARTMENT 0 62.50 95 125.0000 175.73
## 3 POLICE DEPARTMENT 0 45.00 100 115.0000 530.00
## 4 CON RAIL 0 71.25 95 101.3175 120.27
## 5 TRAFFIC 0 40.00 65 115.0000 525.00
## 6 OTHER/UNKNOWN AGENCIES 0 35.00 70 115.0000 218.59
## 7 BOARD OF ESTIMATE 65 65.00 65 65.0000 65.00
## 8 PORT AUTHORITY 0 60.00 60 75.0000 115.00
## 9 DEPARTMENT OF TRANSPORTATION 0 50.00 50 75.0000 232.51
## 10 DEPARTMENT OF SANITATION 0 45.00 45 55.0000 182.31
## 11 TRANSIT AUTHORITY 0 0.00 0 0.0000 0.00
## 12 TRIBOROUGH BRIDGE AND TUNNEL POLICE 0 0.00 0 0.0000 0.00
## mean sd n missing
## 1 105.04667 15.56448 6 0
## 2 86.34817 47.85326 71 0
## 3 82.04333 49.48397 2278 0
## 4 77.56750 53.06601 4 0
## 5 70.39783 44.55697 81929 0
## 6 69.32329 50.45006 149 0
## 7 65.00000 NA 1 0
## 8 62.00000 41.32191 5 0
## 9 59.35326 18.06863 6942 0
## 10 51.25982 25.80064 1225 0
## 11 0.00000 NA 1 0
## 12 0.00000 0.00000 2 0
The Board of Estimate, Transit Authority, and Triborough Bridge and Tunnel Police show no variation in payments, but each has only one or two tickets in this sample, so this says little about their fee schedules. The Police and Fire departments have median payments of $100 and $95, respectively. The Police Department has the highest single payment ($530.00), just above Traffic ($525.00), and among agencies with more than 1,000 tickets it shows the most variability (SD of about $49).
## 1.3 ANOVA + Supernova
agency_model <- aov(payment_amount ~ agency, data = camera_agency)
summary(agency_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## agency 11 1588336 144394 77.8 <2e-16 ***
## Residuals 92601 171863554 1856
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA shows a highly significant effect of agency on payment amounts: F(11, 92601) = 77.8, p < .001. That is, mean payment amounts are not all equal across agencies.
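The F statistic can be reproduced by hand from the printed ANOVA table: it is the ratio of the mean square for agency to the residual mean square. The sketch below uses only the sums of squares and degrees of freedom shown in that table. The variable names are illustrative.

```r
# Values copied from the ANOVA table for payment_amount ~ agency
ss_between <- 1588336
df_between <- 11
ss_within  <- 171863554
df_within  <- 92601

ms_between <- ss_between / df_between  # mean square for agency
ms_within  <- ss_within / df_within    # mean square for residuals
f_stat     <- ms_between / ms_within

# Upper-tail probability of the F distribution with these degrees of freedom
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

round(f_stat, 1)
p_value < 0.001
```

With more than 92,000 residual degrees of freedom, even small differences between group means produce a large F, which is why the effect-size calculation that follows matters.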
y <- camera_agency$payment_amount
ss_total <- sum((y - mean(y))^2)
ss_between <- anova(agency_model)["agency", "Sum Sq"]
pre_agency <- ss_between / ss_total
round(pre_agency, 3)
## [1] 0.009
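A PRE of 0.009 means agency accounts for about 0.9% of the variation in payment amounts. PRE computed this way is the same quantity that `lm()` reports as R-squared for a one-way model. The sketch below checks that on a small made-up data set. The `toy` data frame is hypothetical and unrelated to the ticket data.

```r
# Toy data (hypothetical): payments for three made-up groups
toy <- data.frame(
  group   = factor(rep(c("A", "B", "C"), each = 4)),
  payment = c(50, 55, 60, 65, 60, 65, 70, 75, 40, 95, 100, 115)
)

model <- aov(payment ~ group, data = toy)

# PRE = SS_between / SS_total, following the same steps as the analysis code
ss_total   <- sum((toy$payment - mean(toy$payment))^2)
ss_between <- anova(model)["group", "Sum Sq"]
pre        <- ss_between / ss_total

# The same proportion, reported as R-squared by lm()
r_squared <- summary(lm(payment ~ group, data = toy))$r.squared

all.equal(pre, r_squared)
```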
camera_states <- camera %>%
  filter(plate_state %in% c("NY", "NJ", "CT"),
         !is.na(payment_amount))
ggplot(camera_states, aes(x = plate_state, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by Driver State (NY, NJ, CT)",
       x = "Plate State",
       y = "Payment Amount ($)")
Although median payments are similar for each state, New York has much
higher outlier payments. Connecticut's highest payments are the lowest of
the three states, with New Jersey in the middle.
# droplevels() removes the plate states excluded by the filter,
# so they do not appear as empty rows in the summary
favstats(payment_amount ~ plate_state, data = droplevels(camera_states)) %>%
  arrange(desc(mean))
## plate_state min Q1 median Q3 max mean sd n missing
## 1 NJ 0 45 65 115 302.56 72.45103 47.77582 7217 0
## 2 CT 0 45 65 115 252.87 71.78621 43.07605 1099 0
## 3 NY 0 45 65 105 530.00 69.33661 42.43176 76071 0
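Filtering rows of a data frame does not remove unused levels from a factor column, which is why a grouped summary can list levels that have no rows. The sketch below shows the effect and the `droplevels()` remedy on made-up data. The `toy` data frame is hypothetical.

```r
# Toy data (hypothetical): a factor with five plate states
toy <- data.frame(
  plate_state = factor(c("NY", "NY", "NJ", "CT", "PA", "FL")),
  payment     = c(65, 115, 45, 50, 75, 35)
)

kept <- toy[toy$plate_state %in% c("NY", "NJ", "CT"), ]

# The filtered-out states are still levels and are tabulated with a count of 0
n_before <- nlevels(kept$plate_state)
table(kept$plate_state)

# droplevels() keeps only the levels that still occur in the data
kept$plate_state <- droplevels(kept$plate_state)
n_after <- nlevels(kept$plate_state)

c(before = n_before, after = n_after)
```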
Median payments are identical across the three states ($65). Mean payments are also close: New Jersey is highest ($72.45), Connecticut is slightly lower ($71.79), and New York is lowest ($69.34). New York has the lowest third quartile ($105 versus $115) but the most extreme payment ($530.00).
state_model <- aov(payment_amount ~ plate_state, data = camera_states)
summary(state_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## plate_state 2 69090 34545 18.75 7.22e-09 ***
## Residuals 84384 155468652 1842
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
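The ANOVA shows a significant effect of plate state on payment amounts: F(2, 84384) = 18.75, p < .001. By agency, for comparison, the Police Department, Traffic, and Other/Unknown Agencies show high variability, with medians between $65 and $100 and very high maximum payments (up to $525.00 in Traffic and $530.00 in the Police Department).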
y <- camera_states$payment_amount
ss_total <- sum((y - mean(y))^2)
ss_between <- anova(state_model)["plate_state", "Sum Sq"]
pre_state <- ss_between / ss_total
round(pre_state, 3)
## [1] 0
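The printed value of 0 is a rounding artifact: the proportion is small but not exactly zero. It can be recovered from the sums of squares in the ANOVA table, since for a one-way model the total sum of squares is the between-group plus the residual sum of squares. A minimal sketch, with illustrative variable names:

```r
# Sums of squares copied from the ANOVA table for payment_amount ~ plate_state
ss_between <- 69090
ss_within  <- 155468652

pre_state <- ss_between / (ss_between + ss_within)

round(pre_state, 3)    # rounds to 0, as in the output above
signif(pre_state, 2)   # about 0.00044
pre_state * 100        # as a percentage: about 0.044
```

Plate state therefore accounts for roughly 0.04% of the variation in payment amounts, despite the statistically significant F test.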
Although payment amounts differ significantly across agencies, states, and counties, the differences are small. Some agencies, such as Traffic and the Police Department, collect higher and more variable payments, while others collect consistently low amounts. New York plates account for the most extreme payments, so the law firm should focus its marketing on NY drivers to help them navigate these high costs.
camera_county <- camera %>% filter(!is.na(payment_amount), !is.na(county))
ggplot(camera_county, aes(x = county, y = payment_amount)) +
  geom_boxplot() +
  coord_flip() +
  theme_minimal() +
  labs(title = "Payment Amounts by County",
       x = "County",
       y = "Payment Amount ($)")
Most counties have medians between about $50 and $75, and high outliers
appear in nearly every group. New York County (NY) shows the highest
outlier payments, but these differences are small compared to the overall
spread. Overall, county does not meaningfully distinguish how much drivers
pay.
favstats(payment_amount ~ county, data = camera_county) %>%
  arrange(desc(mean))
## county min Q1 median Q3 max mean sd n missing
## 1 Richmond 0 65 75 115.000 278.31 88.34831 46.09455 673 0
## 2 NY 0 35 75 115.000 530.00 73.06468 47.42286 26689 0
## 3 BX 0 45 65 115.000 309.21 72.71335 46.15045 14325 0
## 4 Kings County 0 45 60 108.735 296.65 70.52878 40.94555 19532 0
## 5 Queens County 0 35 60 105.000 296.40 65.44313 42.01516 25399 0
## 6 Kings 65 65 65 65.000 65.00 65.00000 0.00000 3 0
## 7 BK 0 50 50 75.040 181.02 61.06303 17.31954 2323 0
## 8 QN 0 50 50 75.000 170.55 57.73859 15.26909 2013 0
## 9 ST 50 50 50 50.000 150.60 55.23266 11.61537 538 0
## 10 MN 0 50 50 50.000 232.51 54.80428 17.71390 857 0
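Overall, payment amounts are similar across counties. Tickets coded NY (New York County, i.e. Manhattan) show slightly higher typical payments (median of $75 and maximum of $530.00), while the BK, QN, ST, and MN codes cluster around a median of $50. The county codes are not fully harmonized (Brooklyn, for example, appears as Kings County, Kings, and BK, and Manhattan as both NY and MN), so each borough is split across several rows.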
county_model <- aov(payment_amount ~ county, data = camera_county)
summary(county_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## county 9 1902722 211414 114.1 <2e-16 ***
## Residuals 92342 171040908 1852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA shows a significant effect of county on payment amounts: F(9, 92342) = 114.1, p < .001.
y_county <- camera_county$payment_amount
ss_total_county <- sum((y_county - mean(y_county))^2)
ss_between_county <- anova(county_model)["county", "Sum Sq"]
pre_county <- ss_between_county / ss_total_county
round(pre_county, 3)
## [1] 0.011
County explains about 1.1% of the total variability in payment amounts.
Based on these findings, the law firm should prioritize marketing to New York drivers, particularly those receiving tickets in Manhattan, because this group faces the most extreme ticket costs and therefore has the strongest financial incentive to fight violations. The data show that tickets coded NY (Manhattan) have the joint-highest median payment ($75, tied with Richmond) and the largest maximum payment ($530.00), and NY-plated drivers as a whole experience more extreme payment amounts than NJ or CT drivers.