Analysis

Load the libraries

I am loading two libraries for this project: dplyr, to help with manipulating the data frame, and ggplot2, for plotting the data.

install.packages("dplyr")
install.packages("ggplot2")
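Because installation only needs to happen once per machine, it can help to install a package only when it is missing. The following is a minimal sketch of that idea, not part of the original analysis; `missing_packages` is a hypothetical helper name, and the install call is left commented out so the block has no side effects.

```r
# Hypothetical helper (not part of the original analysis): returns the
# subset of `pkgs` that cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!available]
}

# Uncomment to install only what is missing:
# to_install <- missing_packages(c("dplyr", "ggplot2"))
# if (length(to_install) > 0) install.packages(to_install)
```

With this pattern, re-running the document does not reinstall packages that are already present.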

library("dplyr")

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library("ggplot2")

Load the data

The data set I chose was the Chicago, IL data set. I chose it because it has a large number of rows, which I believe makes for a more interesting analysis. The data set contains 24 columns and 1515 records. The data can be found in the same directory as this file, as chicago_edited.csv.

chicagoMisconductPayouts <- read.csv(file = 'chicago_edited.csv')
head(chicagoMisconductPayouts)

##   calendar_year    city state incident_date incident_year filed_date filed_year
## 1          2019 Chicago    IL            NA            NA         NA         NA
## 2          2018 Chicago    IL            NA            NA         NA         NA
## 3          2019 Chicago    IL            NA            NA         NA         NA
## 4          2010 Chicago    IL            NA            NA         NA         NA
## 5          2010 Chicago    IL            NA            NA         NA         NA
## 6          2010 Chicago    IL            NA            NA         NA         NA
##   closed_date amount_awarded other_expenses collection total_incurred
## 1  2019-01-29         160000             NA         NA             NA
## 2  2018-10-02         100000             NA         NA             NA
## 3  2019-05-30          70000             NA         NA             NA
## 4  2010-09-24          60000             NA         NA             NA
## 5  2010-03-18          99000             NA         NA             NA
## 6  2010-03-31          22500             NA         NA             NA
##                                        case_outcome docket_number claim_number
## 1 Civil Litigation - General:Dismissed:Settlement\n  2016 C 08033           NA
## 2 Civil Litigation - General:Dismissed:Settlement\n  2016 C 08236           NA
## 3 Civil Litigation - General:Dismissed:Settlement\n  2018 C 04020           NA
## 4 Civil Litigation - General:Dismissed:Settlement\n     00L005230           NA
## 5 Civil Litigation - General:Dismissed:Settlement\n     00L007137           NA
## 6 Civil Litigation - General:Dismissed:Settlement\n     06L003486           NA
##   court plaintiff_name plaintiff_attorney
## 1    NA             NA                 NA
## 2    NA             NA                 NA
## 3    NA             NA                 NA
## 4    NA             NA                 NA
## 5    NA             NA                 NA
## 6    NA             NA                 NA
##                                                                                                                              matter_name
## 1        Itemid Al Matar v PO D.R. Borchardt #16806; PO T.P. Hansen #3833; Sgt. Lucid #2361; PO M. Walter #4118; and the City of Chicago
## 2                            Anthony Hawks v Chicago Police Department 18th Precinct, Officer Lawrence Gade, Jr., Officers John Doe 1-10
## 3 Hasin Ramadan and Dwight Gamble v. Sergeant Xavier Elizondo, Star No. 1340; Officer David Salgado, Star No. 16347; and City of Chicago
## 4                                                                                                   Pauline Underdown v. City of Chicago
## 5                                                                                                        PATSY MCCALL V. CITY OF CHICAGO
## 6                                  CAMILLE GILLIAM V. CITY OF CHICAGO, AND THE CHICAGO POLICE DEPT., AND UNKNOWN CHICAGO POLICE OFFICERS
##   location                                    summary_allegations status
## 1       NA   Dispute:General:Police Matters:Excessive Force Minor Closed
## 2       NA   Dispute:General:Police Matters:Excessive Force Minor Closed
## 3       NA            Dispute:General:Police Matters:False Arrest Closed
## 4       NA Dispute:General:Police Matters:Excessive Force Serious Closed
## 5       NA Dispute:General:Police Matters:Excessive Force Serious Closed
## 6       NA            Dispute:General:Police Matters:False Arrest Closed
##        role            flag_dept
## 1 Defendant Chicago Police Board
## 2 Defendant Chicago Police Board
## 3 Defendant Chicago Police Board
## 4 Defendant  Chicago Police Dept
## 5 Defendant  Chicago Police Dept
## 6 Defendant  Chicago Police Dept

nrow(chicagoMisconductPayouts)

## [1] 1515
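The `head()` output shows several columns that are `NA` in every displayed row, so a quick missing-value count is a reasonable follow-up to the row count. The sketch below is illustrative only: `sample_rows` is a hypothetical stand-in built from four columns of the six rows printed above, not the full file, so it says nothing about the other 1509 records.

```r
# Four columns of the first six rows, copied from the head() output above.
sample_rows <- data.frame(
  calendar_year  = c(2019, 2018, 2019, 2010, 2010, 2010),
  incident_date  = rep(NA, 6),
  closed_date    = c("2019-01-29", "2018-10-02", "2019-05-30",
                     "2010-09-24", "2010-03-18", "2010-03-31"),
  amount_awarded = c(160000, 100000, 70000, 60000, 99000, 22500)
)

# Rows and columns in the sample.
dim(sample_rows)

# Count missing values per column, then list columns that are entirely NA.
na_counts <- colSums(is.na(sample_rows))
all_na_columns <- names(na_counts)[na_counts == nrow(sample_rows)]
all_na_columns
```

Running the same two `colSums` lines against `chicagoMisconductPayouts` would show which of the 24 columns carry usable data.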

Subset the data

I subsetted the data down to just two columns: the year the misconduct settlement was awarded and the dollar amount awarded. Those two values are the most important variables in this data set.

payoutsWithYear <- subset(chicagoMisconductPayouts, select = c("calendar_year", "amount_awarded"))
head(payoutsWithYear)

##   calendar_year amount_awarded
## 1          2019         160000
## 2          2018         100000
## 3          2019          70000
## 4          2010          60000
## 5          2010          99000
## 6          2010          22500
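For comparison, the same column selection can be written with bracket indexing in base R. This is a small sketch on a hypothetical three-row data frame (`payouts`), using values taken from the first rows shown above; it is meant only to show that the two forms keep the same columns.

```r
# Hypothetical three-row stand-in for the full data frame.
payouts <- data.frame(
  calendar_year  = c(2019, 2018, 2019),
  city           = "Chicago",
  amount_awarded = c(160000, 100000, 70000)
)

# Column selection with subset(), as in the analysis.
with_subset <- subset(payouts, select = c("calendar_year", "amount_awarded"))

# The same selection with bracket indexing.
with_brackets <- payouts[, c("calendar_year", "amount_awarded")]

names(with_subset)
names(with_brackets)
```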

Aggregate the data

I then grouped the rows by calendar year and summarized each group three ways: the mean payout, the total payout, and the number of settlements.

summary <- payoutsWithYear %>%
  group_by(calendar_year) %>%
  summarise(
    Average_Amount_Rewarded = mean(amount_awarded),
    Total_Amount_Rewarded = sum(amount_awarded),
    Number_Of_Settlements = n()
  )
summary

## # A tibble: 10 × 4
##    calendar_year Average_Amount_Rewarded Total_Amount_Rewarded Number_Of_Settle…
##            <int>                   <dbl>                 <dbl>             <int>
##  1          2010                 305778.             24462226.                80
##  2          2011                 203483.             22790143.               112
##  3          2012                 390242.             60487581.               155
##  4          2013                 506050.             92607082.               183
##  5          2014                 186598.             29669161.               159
##  6          2015                 150062.             22509331.               150
##  7          2016                 160494.             31296263.               195
##  8          2017                 534597.             94623599.               177
##  9          2018                 314646.             56950918.               181
## 10          2019                 261709.             32190160.               123
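The settlement counts in the table add up to 1515, the row count reported earlier, and no year shows `NA`, which suggests `amount_awarded` has no missing values here. In general, `mean()` and `sum()` return `NA` for a group containing a missing value unless `na.rm = TRUE` is passed. As a sketch of how the grouped summary could be cross-checked in base R, the block below applies `tapply()` to a hypothetical six-row sample (`payouts`) copied from the first rows of the data; the column names `average`, `total`, and `n` are illustrative.

```r
# Six rows copied from the head() output of the subsetted data.
payouts <- data.frame(
  calendar_year  = c(2019, 2018, 2019, 2010, 2010, 2010),
  amount_awarded = c(160000, 100000, 70000, 60000, 99000, 22500)
)

amounts <- payouts$amount_awarded
years   <- payouts$calendar_year

# tapply() returns one value per year, ordered by year.
by_year <- data.frame(
  calendar_year = sort(unique(years)),
  average = as.vector(tapply(amounts, years, mean, na.rm = TRUE)),
  total   = as.vector(tapply(amounts, years, sum, na.rm = TRUE)),
  n       = as.vector(tapply(amounts, years, length))
)
by_year
```

On this sample, 2010 has three settlements totalling 181500, which is easy to confirm by hand against the rows above.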

Graph the data

I graphed the total settlement dollars per year because I thought it might show settlement amounts increasing over time. Instead, I found two peak years, 2013 (about $92.6 million) and 2017 (about $94.6 million), with every other year at roughly $60 million or less. Another interesting insight is that 2017 had fewer payouts than both 2016 and 2018, which suggests there may have been an uncharacteristically large payout in 2017 that pushed its total above the other years.

ggplot(data = summary, aes(x = calendar_year, y = Total_Amount_Rewarded)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(
    labels = scales::comma_format(big.mark = ","),
    breaks = seq(0, floor(max(summary$Total_Amount_Rewarded)), by = 10000000)
  ) +
  scale_x_continuous(breaks = seq(min(summary$calendar_year), max(summary$calendar_year), by = 1)) +
  ylab("Total Amount Awarded ($)") +
  xlab("Year")
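The observation about 2017 can be partly checked from the summary table alone. The sketch below re-enters the printed totals (whole dollars, as displayed) and settlement counts into a hypothetical data frame named `yearly`, then recomputes the average payout per settlement. A high average with a lower count is consistent with a few very large payouts, but it does not prove it; confirming that would require looking at the largest individual 2017 rows in the raw data.

```r
# Totals and counts re-entered from the printed summary table.
yearly <- data.frame(
  calendar_year = 2010:2019,
  total = c(24462226, 22790143, 60487581, 92607082, 29669161,
            22509331, 31296263, 94623599, 56950918, 32190160),
  settlements = c(80, 112, 155, 183, 159, 150, 195, 177, 181, 123)
)

# Average payout per settlement in each year.
yearly$average <- yearly$total / yearly$settlements

# Years ranked from highest to lowest average payout.
yearly[order(-yearly$average), ]

# Year with the highest average and year with the most settlements.
top_average_year <- yearly$calendar_year[which.max(yearly$average)]
most_settlements_year <- yearly$calendar_year[which.max(yearly$settlements)]
```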

DATA607 HW1 Loading Data Into A Data Frame