library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 4.0.0 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.1.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(supernova)
library(AICcmodavg)
library(mosaic)
## Registered S3 method overwritten by 'mosaic':
## method from
## fortify.SpatialPolygonsDataFrame ggplot2
##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected by this.
##
## Attaching package: 'mosaic'
##
## The following object is masked from 'package:Matrix':
##
## mean
##
## The following objects are masked from 'package:dplyr':
##
## count, do, tally
##
## The following object is masked from 'package:purrr':
##
## cross
##
## The following object is masked from 'package:ggplot2':
##
## stat
##
## The following objects are masked from 'package:stats':
##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var
##
## The following objects are masked from 'package:base':
##
## max, mean, min, prod, range, sample, sum
library(httr)
library(jsonlite)
##
## Attaching package: 'jsonlite'
##
## The following object is masked from 'package:purrr':
##
## flatten
library(ggplot2)
I am a data scientist working for a law firm that specializes in fighting parking and camera tickets. The firm wants me to uncover patterns in NYC violation data, such as:
Do certain agencies issue tickets with higher payment amounts?
Do drivers from different states (NY, NJ, CT) pay more?
Do certain counties tend to have higher payment amounts?
The analysis uses the Open Parking and Camera Violations dataset from NYC OpenData, which can be found at this link: https://data.cityofnewyork.us/City-Government/Open-Parking-and-Camera-Violations/nc67-uf89/about_data
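# JSON endpoint for the Open Parking and Camera Violations dataset
endpoint <- "https://data.cityofnewyork.us/resource/nc67-uf89.json"
# Request 99,999 rows, ordered by issue date (descending)
resp <- GET(endpoint, query = list(
  "$limit" = 99999,
  "$order" = "issue_date DESC"
))
stop_for_status(resp)  # fail early if the request did not succeed
camera <- fromJSON(content(resp, as = "text", encoding = "UTF-8"), flatten = TRUE)
View(camera)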
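The `Auto-converting character to numeric` warnings printed by `favstats()` in this report suggest that `payment_amount` arrives from the API as text rather than as a number. A minimal sketch of converting the column once, up front, is shown here on a small made-up data frame (`toy` and its values are hypothetical stand-ins for `camera`):

```r
# Hypothetical stand-in for the API result: amounts stored as strings.
toy <- data.frame(
  issuing_agency = c("TRAFFIC", "TRAFFIC", "POLICE DEPARTMENT"),
  payment_amount = c("115", "65.5", NA),
  stringsAsFactors = FALSE
)

# Convert once so that plots, summaries, and models all see a numeric column.
toy$payment_amount <- as.numeric(toy$payment_amount)

is.numeric(toy$payment_amount)
mean(toy$payment_amount, na.rm = TRUE)
```

Applied to the real data, the same idea would be `camera$payment_amount <- as.numeric(camera$payment_amount)`, run before any plotting or modelling.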
ggplot(camera, aes(x = issuing_agency, y = payment_amount)) + geom_boxplot() + labs(
title= "Agency and Payment Amount",
x = "Agency",
y = "Payment Amount")+ theme_minimal() + coord_flip()
favstats(payment_amount ~ issuing_agency, data=camera) %>% arrange(desc(mean))
## Warning in (function (x, ..., na.rm = TRUE, type = 7) : Auto-converting
## character to numeric.
## issuing_agency min Q1 median Q3 max
## 1 HEALTH DEPARTMENT POLICE 243.81 243.810 243.81 243.8100 243.81
## 2 SEA GATE ASSOCIATION POLICE 190.00 190.000 190.00 190.0000 190.00
## 3 FIRE DEPARTMENT 180.00 180.000 180.00 180.0000 180.00
## 4 NYS OFFICE OF MENTAL HEALTH POLICE 0.00 180.000 180.00 190.0000 210.00
## 5 ROOSEVELT ISLAND SECURITY 0.00 135.000 180.00 190.0000 246.68
## 6 PORT AUTHORITY 0.00 180.000 180.00 190.0000 242.76
## 7 NYS PARKS POLICE 0.00 45.000 180.00 190.0000 242.58
## 8 PARKS DEPARTMENT 0.00 90.000 180.00 190.0000 245.28
## 9 TAXI AND LIMOUSINE COMMISSION 125.00 125.000 125.00 125.0000 125.00
## 10 HEALTH AND HOSPITAL CORP. POLICE 0.00 0.000 180.00 190.0000 245.64
## 11 POLICE DEPARTMENT 0.00 0.000 180.00 190.0000 260.00
## 12 CON RAIL 0.00 0.000 95.00 228.8875 243.87
## 13 DEPARTMENT OF TRANSPORTATION 0.00 50.000 75.00 125.0000 690.04
## 14 TRAFFIC 0.00 65.000 115.00 115.0000 245.79
## 15 OTHER/UNKNOWN AGENCIES 0.00 40.115 80.23 120.3450 160.46
## 16 TRANSIT AUTHORITY 0.00 0.000 75.00 125.0000 190.00
## 17 SUNY MARITIME COLLEGE 65.00 65.000 65.00 65.0000 65.00
## 18 NYC OFFICE OF THE SHERIFF 0.00 28.750 57.50 86.2500 115.00
## 19 DEPARTMENT OF SANITATION 0.00 0.000 65.00 105.0000 115.00
## 20 LONG ISLAND RAILROAD 0.00 0.000 0.00 0.0000 0.00
## mean sd n missing
## 1 243.81000 NA 1 0
## 2 190.00000 0.00000 2 0
## 3 180.00000 NA 1 0
## 4 161.33333 65.99423 15 0
## 5 149.16083 90.57967 24 0
## 6 147.35792 82.58394 48 0
## 7 143.86176 89.24158 34 0
## 8 128.47736 78.92728 144 0
## 9 125.00000 NA 1 0
## 10 124.71373 98.60130 51 0
## 11 123.93855 88.00388 214 0
## 12 112.62000 124.87146 6 0
## 13 99.52822 82.88394 87273 0
## 14 94.59362 44.47453 12091 0
## 15 80.23000 113.46235 2 0
## 16 78.00000 82.05181 5 0
## 17 65.00000 NA 1 0
## 18 57.50000 81.31728 2 0
## 19 56.78571 48.26239 14 0
## 20 0.00000 NA 1 0
anova_model_agency<- aov(payment_amount ~ issuing_agency, data=camera)
summary(anova_model_agency)
## Df Sum Sq Mean Sq F value Pr(>F)
## issuing_agency 19 937675 49351 7.858 <2e-16 ***
## Residuals 99910 627464684 6280
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 69 observations deleted due to missingness
supernova(anova_model_agency)
## Refitting to remove 69 cases with missing value(s)
## ℹ aov(formula = payment_amount ~ issuing_agency, data = listwise_delete(camera,
## c("payment_amount", "issuing_agency")))
## Analysis of Variance Table (Type III SS)
## Model: payment_amount ~ issuing_agency
##
## SS df MS F PRE p
## ----- --------------- | ------------- ----- --------- ----- ----- -----
## Model (error reduced) | 937675.432 19 49351.339 7.858 .0015 .0000
## Error (from model) | 627464683.951 99910 6280.299
## ----- --------------- | ------------- ----- --------- ----- ----- -----
## Total (empty model) | 628402359.383 99929 6288.488
**Interpretation**
The sum of squares shows that very little of the variance in payment amount is related to the issuing agency. The F-value (7.858) is modest, and the p-value (<2e-16) indicates that the differences between agencies are statistically significant, which is unsurprising with nearly 100,000 tickets in the sample. However, the PRE of .0015 means that the issuing agency explains only 0.15% of the variance in payment amount. Several agencies also appear only once or twice in the data, so their means are unreliable. I would not recommend that the law firm use the issuing agency in its marketing strategy, because the effect is so small despite being statistically significant.
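The F-value and the share of variance explained can both be recomputed by hand from the sums of squares and degrees of freedom reported in the agency ANOVA table. The sketch below copies those numbers from the output; the variable names are chosen here for illustration only.

```r
# Values copied from the supernova table for payment_amount ~ issuing_agency.
ss_model <- 937675.432
ss_error <- 627464683.951
df_model <- 19
df_error <- 99910

ss_total <- ss_model + ss_error   # total (empty model) sum of squares
ms_model <- ss_model / df_model   # mean square for the model
ms_error <- ss_error / df_error   # mean square for the error

f_value <- ms_model / ms_error    # F is the ratio of the two mean squares
pre     <- ss_model / ss_total    # PRE: proportion of total error removed

round(f_value, 3)
round(pre, 4)
```

Because PRE divides the model sum of squares by the total, it does not grow with sample size the way the p-value shrinks, which is why it is the more useful number for judging practical importance here.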
ggplot(camera, aes(x = state, y = payment_amount)) + geom_boxplot() + labs(
title= "Plate State and Payment Amount",
x = "State",
y = "Payment Amount")+ theme_minimal() + coord_flip()
favstats(payment_amount ~ state, data=camera) %>% arrange(desc(mean))
## Warning in (function (x, ..., na.rm = TRUE, type = 7) : Auto-converting
## character to numeric.
## state min Q1 median Q3 max mean sd n missing
## 1 OK 0.00 50.00 200.00 250.0000 250.00 162.19719 88.522638 160 0
## 2 ON 115.00 115.00 120.00 130.0000 145.00 125.00000 14.142136 4 0
## 3 QB 115.00 115.00 115.00 125.0000 125.00 118.75000 5.175492 8 0
## 4 NB 115.00 115.00 115.00 115.0000 115.00 115.00000 NA 1 0
## 5 AR 50.00 50.00 100.00 150.0000 250.00 113.30731 72.563803 67 0
## 6 WA 0.00 50.00 50.00 125.0000 275.00 109.09091 92.114522 33 0
## 7 TX 0.00 50.00 75.04 126.4025 277.06 104.12010 69.855661 312 0
## 8 DC 50.00 75.43 115.00 117.6800 145.00 102.66700 29.610797 20 0
## 9 NJ 0.00 50.00 75.00 115.0000 682.35 101.57462 89.971702 8654 3
## 10 NY 0.00 50.00 75.00 125.0000 690.04 101.09015 80.930148 79541 10
## 11 IN 0.00 67.50 115.00 115.0000 250.00 99.16667 50.520663 42 0
## 12 MN 0.00 50.00 75.00 107.5000 250.00 91.05847 68.580471 59 0
## 13 OH 0.00 50.00 75.00 115.0000 281.80 90.77151 65.548205 299 0
## 14 MT 50.00 50.00 87.50 100.0000 225.00 90.62500 43.671513 24 0
## 15 AL 0.00 50.00 75.00 115.0000 277.06 89.53567 56.218191 97 0
## 16 NC 0.00 50.00 75.00 115.0000 275.89 88.74886 57.680647 484 1
## 17 IL 0.00 50.00 75.00 100.0000 275.00 86.22200 54.900047 265 0
## 18 PA 0.00 50.00 75.00 100.0000 283.57 85.92090 53.933428 2977 2
## 19 IA 50.00 50.00 75.00 93.7600 175.00 85.00400 44.408710 10 0
## 20 VA 0.00 50.00 50.00 115.0000 275.00 82.70679 53.216823 527 0
## 21 SC 0.00 50.00 75.02 100.0000 250.00 82.61794 41.265398 194 0
## 22 GA 0.00 50.00 50.00 100.0000 275.62 82.57126 63.360707 302 0
## 23 MD 0.00 50.00 50.00 100.0000 250.00 81.02126 46.705884 413 0
## 24 CT 0.00 50.00 75.00 100.0000 276.57 80.66270 46.078493 1457 2
## 25 DE 0.00 50.00 75.00 75.4625 275.00 79.71512 49.576008 84 1
## 26 FL 0.00 50.00 50.00 100.0000 276.10 79.26281 50.883529 1654 2
## 27 AZ 0.00 50.00 50.00 100.0000 250.00 79.14683 50.917069 556 0
## 28 MO 0.00 50.00 50.00 75.1900 250.00 78.81636 57.999183 33 0
## 29 MA 0.00 50.00 50.00 100.0000 278.02 78.02744 48.262245 735 0
## 30 VT 0.00 50.00 75.00 75.7550 200.00 77.40515 41.129903 68 0
## 31 MS 0.00 50.00 75.16 115.0000 125.87 76.78111 42.988707 9 0
## 32 AK 75.95 75.95 75.95 75.9500 75.95 75.95000 NA 1 0
## 33 NH 50.00 50.00 50.00 100.0000 178.39 75.04704 31.790066 54 0
## 34 LA 50.00 50.00 50.00 76.4375 241.31 73.36333 41.807692 24 0
## 35 CA 0.00 50.00 50.00 100.0000 275.00 73.04461 52.607199 128 0
## 36 WI 0.00 50.00 50.00 115.0000 125.00 70.62500 44.460840 24 0
## 37 ME 0.00 50.00 50.00 75.4950 250.00 69.10433 37.054284 67 0
## 38 MI 0.00 50.00 50.00 75.0300 225.06 68.87076 35.774572 118 1
## 39 RI 0.00 50.00 50.00 75.5925 241.36 68.77096 36.502474 104 0
## 40 WV 50.00 50.00 50.00 75.6900 125.72 66.91444 25.274199 9 0
## 41 NV 50.00 50.00 50.00 75.0000 125.00 66.47059 26.325172 17 0
## 42 TN 50.00 50.00 50.00 75.0000 180.00 66.27884 30.075361 95 0
## 43 NE 0.00 50.00 50.00 85.0000 180.00 66.25000 51.527795 12 0
## 44 CO 0.00 50.00 50.00 75.0000 125.00 64.51613 28.992954 31 0
## 45 KY 50.00 50.00 50.00 75.0000 125.00 63.41818 25.188157 33 0
## 46 OR 50.00 50.00 50.00 61.2500 125.00 63.01793 23.969258 58 0
## 47 NM 50.00 50.00 50.00 63.1050 76.21 58.73667 15.132351 3 0
## 48 SD 0.00 50.00 62.50 75.0000 125.00 55.36929 35.604580 14 0
## 49 KS 0.00 12.50 50.00 87.5000 115.00 52.50000 48.347699 6 0
## 50 ID 50.00 50.00 50.00 50.0000 50.00 50.00000 NA 1 0
## 51 ND 50.00 50.00 50.00 50.0000 50.00 50.00000 NA 1 0
## 52 DP 0.00 0.00 0.00 115.0000 115.00 49.28571 61.470086 7 0
## 53 UT 0.00 50.00 50.00 50.0000 50.00 38.88889 22.047928 9 0
## 54 99 0.00 0.00 0.00 0.0000 190.00 20.51724 46.605196 29 43
anova_model_state<- aov(payment_amount ~ state, data=camera)
summary(anova_model_state)
## Df Sum Sq Mean Sq F value Pr(>F)
## state 53 4867057 91831 14.71 <2e-16 ***
## Residuals 99880 623567686 6243
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 65 observations deleted due to missingness
supernova(anova_model_state)
## Refitting to remove 65 cases with missing value(s)
## ℹ aov(formula = payment_amount ~ state, data = listwise_delete(camera,
## c("payment_amount", "state")))
## Analysis of Variance Table (Type III SS)
## Model: payment_amount ~ state
##
## SS df MS F PRE p
## ----- --------------- | ------------- ----- --------- ------ ----- -----
## Model (error reduced) | 4867056.569 53 91831.256 14.709 .0077 .0000
## Error (from model) | 623567685.704 99880 6243.169
## ----- --------------- | ------------- ----- --------- ------ ----- -----
## Total (empty model) | 628434742.273 99933 6288.561
**Interpretation**
The sum of squares shows that very little of the variance in payment amount is related to the plate state. The F-value (14.709) is modest, and the p-value (<2e-16) indicates that the differences between states are statistically significant. However, the PRE of .0077 means that state explains only 0.77% of the variance in payment amount. The three states of interest have nearly identical medians (75.00 for NY, NJ, and CT), and the highest means belong to states and provinces with few tickets in the sample. I would not recommend that the law firm use state in its marketing strategy: the result is statistically significant, but the effect is too small to act on.
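The research question names only NY, NJ, and CT, while the model above compares all 54 plate codes. One way to focus on the question as posed would be to restrict the data to those three states before fitting the model. The sketch below shows the idea on a small made-up data frame (`toy` and its values are hypothetical; the real analysis would use `camera`).

```r
# Hypothetical stand-in for `camera`, with a few out-of-scope plate codes.
toy <- data.frame(
  state = c("NY", "NY", "NJ", "NJ", "CT", "CT", "OK", "99"),
  payment_amount = c(75, 125, 50, 115, 75, 100, 200, 0)
)

# Keep only the three states named in the research question.
tri_state <- subset(toy, state %in% c("NY", "NJ", "CT"))

# Group means for a quick side-by-side comparison.
state_means <- aggregate(payment_amount ~ state, data = tri_state, FUN = mean)
state_means

# One-way ANOVA on the reduced data.
summary(aov(payment_amount ~ state, data = tri_state))
```

Whether the firm cares about all plate states or only these three is a scoping decision; the filtered model simply answers the narrower question.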
camera<- camera %>%
mutate(
county_clean= str_replace(county, "Q", "Queens County"),
county_clean= str_replace(county_clean, "K", "Kings County")
)
ggplot(camera, aes(x = county_clean, y = payment_amount)) + geom_boxplot() + labs(
title= "County and Payment Amount",
x = "County",
y = "Payment Amount")+ theme_minimal() + coord_flip()
favstats(payment_amount ~ county_clean, data=camera) %>% arrange(desc(mean))
## Warning in (function (x, ..., na.rm = TRUE, type = 7) : Auto-converting
## character to numeric.
## county_clean min Q1 median Q3 max mean sd n
## 1 RICH 180 180 180 180.00 180.00 180.00000 NA 1
## 2 R 0 65 180 180.00 245.79 139.67920 80.35405 863
## 3 Bronx 115 115 115 115.00 115.00 115.00000 NA 1
## 4 Queens Countyns 115 115 115 115.00 115.00 115.00000 NA 1
## 5 BKings County 0 50 75 100.00 690.04 113.54971 131.50278 14560
## 6 Queens County 0 65 115 125.00 244.46 101.70729 53.07962 992
## 7 MN 0 50 50 125.06 281.80 100.54274 73.46670 14518
## 8 BX 0 65 75 145.00 245.64 99.59634 67.66429 246
## 9 NY 0 65 115 115.00 260.00 92.89794 38.39107 8961
## 10 Kings County 0 65 65 115.00 243.81 85.99174 49.27722 1551
## 11 Queens CountyN 0 50 50 100.00 283.03 82.35782 60.30923 16373
## 12 ST 0 50 50 75.00 250.00 69.66361 45.80596 485
## 13 Kings Countyings 0 0 0 0.00 0.00 0.00000 NA 1
## missing
## 1 0
## 2 0
## 3 0
## 4 0
## 5 0
## 6 0
## 7 0
## 8 0
## 9 0
## 10 0
## 11 0
## 12 0
## 13 0
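Labels such as `BKings County`, `Queens CountyN`, and `Kings Countyings` in the table above suggest that the replacement pattern matched the letters `Q` and `K` inside longer codes, not only the standalone codes. The sketch below, using base R's `sub()` (which, like `str_replace()`, replaces the first match), contrasts an unanchored pattern with an anchored one and with an explicit lookup table. The sample codes are inferred from the table, and the mapping of codes to counties is an assumption that should be checked against the dataset documentation.

```r
# Sample codes inferred from the table labels (hypothetical sample).
county <- c("Q", "QN", "Qns", "K", "BK", "Kings", "MN", NA)

# Unanchored pattern: replaces the letter wherever it first occurs.
unanchored <- sub("Q", "Queens County", county)

# Anchored pattern: replaces only values that are exactly "Q".
anchored <- sub("^Q$", "Queens County", county)

# Explicit lookup table (assumed mapping) that merges the variant spellings.
lookup <- c(Q = "Queens County", QN = "Queens County", Qns = "Queens County",
            K = "Kings County", BK = "Kings County", Kings = "Kings County")
county_clean <- ifelse(county %in% names(lookup),
                       unname(lookup[county]),
                       county)

unanchored
anchored
county_clean
```

Merging the variants would change the number of groups in the model, so the ANOVA would need to be rerun after any such recoding.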
anova_model_county<- aov(payment_amount ~ county_clean, data=camera)
summary(anova_model_county)
## Df Sum Sq Mean Sq F value Pr(>F)
## county_clean 12 9980943 831745 116.7 <2e-16 ***
## Residuals 58540 417135006 7126
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 41446 observations deleted due to missingness
supernova(anova_model_county)
## Refitting to remove 41446 cases with missing value(s)
## ℹ aov(formula = payment_amount ~ county_clean, data = listwise_delete(camera,
## c("payment_amount", "county_clean")))
## Analysis of Variance Table (Type III SS)
## Model: payment_amount ~ county_clean
##
## SS df MS F PRE p
## ----- --------------- | ------------- ----- ---------- ------- ----- -----
## Model (error reduced) | 9980942.982 12 831745.249 116.726 .0234 .0000
## Error (from model) | 417135005.877 58540 7125.641
## ----- --------------- | ------------- ----- ---------- ------- ----- -----
## Total (empty model) | 427115948.859 58552 7294.643
**Interpretation**
The sum of squares shows that a small share of the variance in payment amount is related to county, though a larger share than for agency or state. The F-value (116.726) is large, and the p-value (<2e-16) indicates that the differences between counties are statistically significant. County explains 2.34% of the variance in payment amount (PRE = .0234). Note that 41,446 rows were dropped because of missing values, so this model rests on 58,553 tickets. I would recommend that the law firm use county in its marketing strategy, because of the large F-value and the comparatively larger share of variance explained.
I think the firm should prioritize county in its marketing efforts. Of the three variables (agency, state, and county), county has the largest F-value and explains the most variance in payment amount: 2.34%, compared with 0.77% for state and 0.15% for agency.
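The comparison across the three models can be laid out side by side using the sums of squares and total degrees of freedom from the three ANOVA tables. This is only a sketch; the data frame and column names are chosen here for illustration.

```r
# Model and total sums of squares copied from the three supernova tables.
models <- data.frame(
  predictor = c("issuing_agency", "state", "county_clean"),
  ss_model  = c(937675.432, 4867056.569, 9980942.982),
  ss_total  = c(628402359.383, 628434742.273, 427115948.859),
  df_total  = c(99929, 99933, 58552)
)

# PRE is the model sum of squares as a share of the total.
models$pre <- round(models$ss_model / models$ss_total, 4)

# Rows used by each model: total degrees of freedom plus one.
models$rows_used <- models$df_total + 1

# Rank the predictors by share of variance explained.
models[order(-models$pre), c("predictor", "pre", "rows_used")]
```

The county model is fitted to far fewer rows than the other two, so the three PRE values describe somewhat different samples; refitting all three models on the rows with a known county would make the comparison stricter.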