INTRODUCTION:

This paper aims to find out whether there is any association between the different kinds of crimes committed in a given country in a given year by applying association rules to crime report data. First, the crime data was gathered from the FBI website, from the Uniform Crime Reporting statistics. The website contains a record of all violent and property crimes from 1985 to 2014. You can find the website and more detailed information on the dataset here: links. Here, however, I have analysed only the data for 1985. After the data was downloaded, it had to be processed so that association rules could be applied to it. The data looked something like this:

LocalCrimeOneYearofData <- read.csv("C:/Users/PC-CATHERINE/Downloads/LocalCrimeOneYearofData.csv", stringsAsFactors=FALSE)
LocalCrimeOneYearofData[1:5,]

##   State Months Population Violent.crime.total
## 1    TX     12     111317                 355
## 2    ID     12         NA                 111
## 3    SC     12         NA                 612
## 4    OH     12     226704                1873
## 5    CA     12         NA                 427
##   Murder.and.nonnegligent.Manslaughter Legacy.rape..1 Revised.rape..2 Robbery
## 1                                    8             36              NA      96
## 2                                    3              8              NA      10
## 3                                   10             32              NA      49
## 4                                   17            158              NA     511
## 5                                    3             27              NA     166
##   Aggravated.assault Property.crime.total Burglary Larceny.theft
## 1                215                 6156     1623          4116
## 2                 90                 1797      533          1190
## 3                521                 2218      946          1117
## 4               1187                13261     3197          9126
## 5                231                 3964     1483          2128
##   Motor.vehicle.theft
## 1                 417
## 2                  74
## 3                 155
## 4                 938
## 5                 353

Processing the Data:

By looking at the data, we understand that each cell value indicates the total number of crimes of a particular kind that were committed in a given jurisdiction/agency in 1985. But since this paper is not exploring each state and is instead trying to take a composite look at the crime scenario of the country as a whole, we will not take into consideration the number of times each crime was committed in a given jurisdiction. Since association rules are commonly used to find associations between goods bought in a supermarket, let’s draw an analogy between our data and supermarket transaction data to clarify what we are trying to achieve here. We will look at each jurisdiction as a transaction ID and each kind of crime as an item that was purchased in the supermarket. Usually, in simple market basket analysis, we don’t take into account the quantity of each item purchased in a particular transaction, so we are going to do the same with our crime data. We don’t care how many times a particular crime was committed in a given jurisdiction; we only care whether it happened. So, we will transform the data in the following way:

install.packages("arulesViz",repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/PC-CATHERINE/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)

## package 'arulesViz' successfully unpacked and MD5 sums checked
## 
## The downloaded binary packages are in
##  C:\Users\PC-CATHERINE\AppData\Local\Temp\RtmpiMevA8\downloaded_packages

install.packages("arules",repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/PC-CATHERINE/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)

## package 'arules' successfully unpacked and MD5 sums checked

## Warning: cannot remove prior installation of package 'arules'

## Warning in file.copy(savedcopy, lib, recursive = TRUE):
## problem copying C:\Users\PC-CATHERINE\Documents\R\win-
## library\3.6\00LOCK\arules\libs\x64\arules.dll to C:\Users\PC-
## CATHERINE\Documents\R\win-library\3.6\arules\libs\x64\arules.dll: Permission
## denied

## Warning: restored 'arules'

## 
## The downloaded binary packages are in
##  C:\Users\PC-CATHERINE\AppData\Local\Temp\RtmpiMevA8\downloaded_packages

install.packages("dplyr",repos = "http://cran.us.r-project.org")

## Installing package into 'C:/Users/PC-CATHERINE/Documents/R/win-library/3.6'
## (as 'lib' is unspecified)

## 
##   There is a binary version available but the source version is later:
##       binary source needs_compilation
## dplyr  0.8.3  0.8.4              TRUE
## 
##   Binaries will be installed
## package 'dplyr' successfully unpacked and MD5 sums checked

## Warning: cannot remove prior installation of package 'dplyr'

## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying C:
## \Users\PC-CATHERINE\Documents\R\win-library\3.6\00LOCK\dplyr\libs\x64\dplyr.dll
## to C:\Users\PC-CATHERINE\Documents\R\win-library\3.6\dplyr\libs\x64\dplyr.dll:
## Permission denied

## Warning: restored 'dplyr'

## 
## The downloaded binary packages are in
##  C:\Users\PC-CATHERINE\AppData\Local\Temp\RtmpiMevA8\downloaded_packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(arules)

## Warning: package 'arules' was built under R version 3.6.2

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following object is masked from 'package:dplyr':
## 
##     recode

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)

## Warning: package 'arulesViz' was built under R version 3.6.2

## Loading required package: grid

## Registered S3 method overwritten by 'seriation':
##   method         from 
##   reorder.hclust gclus

#Removing the Months column as we know the data is for the entire year
LocalCrimeOneYearofData<-LocalCrimeOneYearofData[,-c(2)]

#Transforming the columns
LocalCrimeOneYearofData$Murder.and.nonnegligent.Manslaughter<-ifelse(is.na(LocalCrimeOneYearofData$Murder.and.nonnegligent.Manslaughter) | LocalCrimeOneYearofData$Murder.and.nonnegligent.Manslaughter==0, "", "Murder.and.nonnegligent.Manslaughter")
LocalCrimeOneYearofData$Legacy.rape..1<-ifelse(is.na(LocalCrimeOneYearofData$Legacy.rape..1) | LocalCrimeOneYearofData$Legacy.rape..1==0, "", "Legacy.rape_1")
LocalCrimeOneYearofData$Revised.rape..2<-ifelse(is.na(LocalCrimeOneYearofData$Revised.rape..2) | LocalCrimeOneYearofData$Revised.rape..2==0, "", "Revised.rape_2")
LocalCrimeOneYearofData$Robbery<-ifelse(is.na(LocalCrimeOneYearofData$Robbery) | LocalCrimeOneYearofData$Robbery==0, "", "Robbery")
LocalCrimeOneYearofData$Aggravated.assault<-ifelse(is.na(LocalCrimeOneYearofData$Aggravated.assault) | LocalCrimeOneYearofData$Aggravated.assault==0, "", "Aggravated.assault")
LocalCrimeOneYearofData$Burglary<-ifelse(is.na(LocalCrimeOneYearofData$Burglary) | LocalCrimeOneYearofData$Burglary==0, "", "Burglary")
LocalCrimeOneYearofData$Larceny.theft<-ifelse(is.na(LocalCrimeOneYearofData$Larceny.theft) | LocalCrimeOneYearofData$Larceny.theft==0, "", "Larceny.theft")
LocalCrimeOneYearofData$Motor.vehicle.theft<-ifelse(is.na(LocalCrimeOneYearofData$Motor.vehicle.theft) | LocalCrimeOneYearofData$Motor.vehicle.theft==0, "", "Motor.vehicle.theft")


#Removing the population and the count of total number of violent and property crimes committed
LocalCrimeOneYearofData<-LocalCrimeOneYearofData[,-c(2,3,9)]
#Saving the state column and then removing it
State<-c(LocalCrimeOneYearofData$State)
LocalCrimeOneYearofData<-LocalCrimeOneYearofData[,-c(1)]

# Writing the current data frame as a table without column names and then reading it back as transactions data
write.table(LocalCrimeOneYearofData, file = "crime.csv", col.names = FALSE, row.names = FALSE, sep = ",")
trans <- read.transactions("crime.csv", sep = ",", format = "basket", rm.duplicates = TRUE)
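
The presence/absence idea behind this transformation can be sketched on a toy data frame. The counts and the object names `crimes`, `present` and `baskets` below are made up for illustration and are not part of the FBI file or of the code above; this is only a minimal sketch in base R.

```r
# Toy counts (hypothetical values, not taken from the FBI data)
crimes <- data.frame(
  Robbery  = c(96, 0, NA),
  Burglary = c(1623, 533, 0)
)

# TRUE where a crime was reported at least once; NA and 0 both count as absent
present <- !is.na(crimes) & crimes > 0

# One "basket" of crime names per jurisdiction (row)
baskets <- lapply(seq_len(nrow(present)), function(i) {
  colnames(present)[present[i, ]]
})

print(baskets)
```

The arules package also offers a coercion of a logical matrix to transactions, `as(present, "transactions")`, which would avoid the round trip through a CSV file.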

After we have done that, let’s take a look at what we have:

items

[1] {Aggravated.assault,
Burglary,
Larceny.theft,
Legacy.rape_1,
Motor.vehicle.theft,
Murder.and.nonnegligent.Manslaughter, Robbery}
[2] {Aggravated.assault,
Burglary,
Larceny.theft,
Legacy.rape_1,
Motor.vehicle.theft,
Murder.and.nonnegligent.Manslaughter, Robbery}
[3] {Aggravated.assault,
Burglary,
Larceny.theft,
Legacy.rape_1,
Motor.vehicle.theft,
Murder.and.nonnegligent.Manslaughter, Robbery}
[4] {Aggravated.assault,
Burglary,
Larceny.theft,
Legacy.rape_1,
Motor.vehicle.theft,
Murder.and.nonnegligent.Manslaughter, Robbery}
[5] {Aggravated.assault,
Burglary,
Larceny.theft,
Legacy.rape_1,
Motor.vehicle.theft,
Murder.and.nonnegligent.Manslaughter, Robbery}

Now that we have the data in transaction form, we can easily apply association rules to it.

#Let's start by mining the frequent itemsets
freq_items<-eclat(trans, parameter=list(supp=0.50, maxlen=8))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target   ext
##     FALSE     0.5      1      8 frequent itemsets FALSE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 227 
## 
## create itemset ... 
## set transactions ...[7 item(s), 455 transaction(s)] done [0.00s].
## sorting and recoding items ... [7 item(s)] done [0.00s].
## creating bit matrix ... [7 row(s), 455 column(s)] done [0.00s].
## writing  ... [127 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

inspect(freq_items[1:5,])

##     items                                    support count
## [1] {Aggravated.assault,                                  
##      Burglary,                                            
##      Larceny.theft,                                       
##      Legacy.rape_1,                                       
##      Motor.vehicle.theft,                                 
##      Murder.and.nonnegligent.Manslaughter,                
##      Robbery}                              0.9098901   414
## [2] {Aggravated.assault,                                  
##      Burglary,                                            
##      Legacy.rape_1,                                       
##      Motor.vehicle.theft,                                 
##      Murder.and.nonnegligent.Manslaughter,                
##      Robbery}                              0.9098901   414
## [3] {Aggravated.assault,                                  
##      Larceny.theft,                                       
##      Legacy.rape_1,                                       
##      Motor.vehicle.theft,                                 
##      Murder.and.nonnegligent.Manslaughter,                
##      Robbery}                              0.9098901   414
## [4] {Burglary,                                            
##      Larceny.theft,                                       
##      Legacy.rape_1,                                       
##      Motor.vehicle.theft,                                 
##      Murder.and.nonnegligent.Manslaughter,                
##      Robbery}                              0.9098901   414
## [5] {Burglary,                                            
##      Legacy.rape_1,                                       
##      Motor.vehicle.theft,                                 
##      Murder.and.nonnegligent.Manslaughter,                
##      Robbery}                              0.9098901   414

Analysing the Data

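Eclat reports an absolute minimum support count of 227: with 455 jurisdictions and a minimum support of 0.50, an itemset has to be reported in at least about half of the jurisdictions (0.50 × 455 = 227.5) to count as frequent. The first five itemsets shown above are each reported in 414 jurisdictions, that is, in about 91% of them.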
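
The relation between relative and absolute support can be checked by hand. This is a small sketch using the numbers printed in the Eclat log and in the itemset table; the variable names are my own.

```r
n_transactions <- 455   # jurisdictions, from the Eclat log
min_support    <- 0.50  # the supp parameter passed to eclat()

# Minimum support expressed as a number of jurisdictions
threshold <- min_support * n_transactions
print(threshold)

# Relative support of an itemset reported in 414 jurisdictions
support_414 <- 414 / n_transactions
print(round(support_414, 7))
```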

freq_rules<-ruleInduction(freq_items, trans, confidence=0.95)
freq_rules<-sort(freq_rules, by="lift", decreasing=TRUE)
inspect(freq_rules[1:5,])

##     lhs                                       rhs               support confidence     lift itemset
## [1] {Aggravated.assault,                                                                           
##      Burglary,                                                                                     
##      Larceny.theft,                                                                                
##      Murder.and.nonnegligent.Manslaughter,                                                         
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098      16
## [2] {Aggravated.assault,                                                                           
##      Burglary,                                                                                     
##      Murder.and.nonnegligent.Manslaughter,                                                         
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098      17
## [3] {Aggravated.assault,                                                                           
##      Larceny.theft,                                                                                
##      Murder.and.nonnegligent.Manslaughter,                                                         
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098      18
## [4] {Burglary,                                                                                     
##      Larceny.theft,                                                                                
##      Murder.and.nonnegligent.Manslaughter,                                                         
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098      19
## [5] {Burglary,                                                                                     
##      Murder.and.nonnegligent.Manslaughter,                                                         
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098      20

Results

Violent crimes consist of murder and nonnegligent manslaughter, legacy rape, revised rape, robbery and aggravated assault, whereas property crimes consist of burglary, larceny theft and motor vehicle theft. Looking at the results, we can see that aggravated assault, burglary, larceny theft, murder and robbery are the crimes most likely to occur together with legacy rape, as these rules have the highest lift, although a lift of about 1.005 is only marginally above 1, the value expected if the crimes were independent. The high support and confidence values further reinforce the hypothesis that these crimes tend to be reported together. But what does this mean given our data? When I say these crimes are more likely to happen together, I don’t mean that they will happen at the same time or be reported in one crime incident report. This is because, even though we drew the analogy of the jurisdiction being a transaction ID, it is not one. A high lift value therefore has a more aggregate interpretation: it implies that if a jurisdiction has reported the crimes on the lhs, then it is more likely (or probable) that the same jurisdiction has also reported the crime on the rhs. So, we are not talking about individual incident reports, but about a jurisdiction as a whole. Let’s look deeper and see what the situation is for each of these crimes when it is on the rhs:

rules_Legacy.rape_1<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                      appearance=list(default="lhs", rhs="Legacy.rape_1"), control=list(verbose=F)) 
rules_rape_byconf<-sort(rules_Legacy.rape_1, by="confidence", decreasing=TRUE)
inspect(rules_rape_byconf[1:5])

##     lhs                                       rhs               support confidence     lift count
## [1] {Murder.and.nonnegligent.Manslaughter} => {Legacy.rape_1} 0.9142857  0.9719626 1.005098   416
## [2] {Murder.and.nonnegligent.Manslaughter,                                                       
##      Robbery}                              => {Legacy.rape_1} 0.9142857  0.9719626 1.005098   416
## [3] {Aggravated.assault,                                                                         
##      Murder.and.nonnegligent.Manslaughter} => {Legacy.rape_1} 0.9142857  0.9719626 1.005098   416
## [4] {Burglary,                                                                                   
##      Murder.and.nonnegligent.Manslaughter} => {Legacy.rape_1} 0.9142857  0.9719626 1.005098   416
## [5] {Larceny.theft,                                                                              
##      Murder.and.nonnegligent.Manslaughter} => {Legacy.rape_1} 0.9142857  0.9719626 1.005098   416

Looking at the confidence, support and lift values, we can say that the first 5 rules state that the presence of crimes like murder, robbery, assault, burglary and larceny theft in a jurisdiction implies the prevalence of legacy rape in that jurisdiction as well, in the USA in 1985.
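
To make the three measures concrete, the first rule, {Murder.and.nonnegligent.Manslaughter} => {Legacy.rape_1}, can be recomputed from the counts printed in the rule tables of this report (455 jurisdictions in total, 428 reporting murder, 440 reporting legacy rape, 416 reporting both). The variable names are my own and this is only a sketch of the standard definitions.

```r
n        <- 455  # all jurisdictions
n_murder <- 428  # jurisdictions reporting murder (lhs)
n_rape   <- 440  # jurisdictions reporting legacy rape (rhs)
n_both   <- 416  # jurisdictions reporting both

# support: share of jurisdictions that report lhs and rhs together
support <- n_both / n

# confidence: share of lhs jurisdictions that also report the rhs
confidence <- n_both / n_murder

# lift: confidence relative to the overall share of the rhs
lift <- confidence / (n_rape / n)

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

A lift of exactly 1 would mean that reporting murder tells us nothing about reporting legacy rape, so values this close to 1 point to a weak association.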

rules_Murder<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                             appearance=list(default="lhs", rhs="Murder.and.nonnegligent.Manslaughter"), control=list(verbose=F)) 
rules_murder_byconf<-sort(rules_Murder, by="confidence", decreasing=TRUE)
inspect(rules_murder_byconf[1:5])

##     lhs                     rhs                                      support confidence     lift count
## [1] {Legacy.rape_1,                                                                                   
##      Robbery}            => {Murder.and.nonnegligent.Manslaughter} 0.9142857  0.9476082 1.007387   416
## [2] {Aggravated.assault,                                                                              
##      Legacy.rape_1,                                                                                   
##      Robbery}            => {Murder.and.nonnegligent.Manslaughter} 0.9142857  0.9476082 1.007387   416
## [3] {Burglary,                                                                                        
##      Legacy.rape_1,                                                                                   
##      Robbery}            => {Murder.and.nonnegligent.Manslaughter} 0.9142857  0.9476082 1.007387   416
## [4] {Larceny.theft,                                                                                   
##      Legacy.rape_1,                                                                                   
##      Robbery}            => {Murder.and.nonnegligent.Manslaughter} 0.9142857  0.9476082 1.007387   416
## [5] {Aggravated.assault,                                                                              
##      Burglary,                                                                                        
##      Legacy.rape_1,                                                                                   
##      Robbery}            => {Murder.and.nonnegligent.Manslaughter} 0.9142857  0.9476082 1.007387   416

The support, confidence and lift values again show that rape, robbery, assault, burglary and larceny theft imply the prevalence of murder and manslaughter in a jurisdiction as well. But notice that, while the support is the same as in the previous case (0.914), the confidence is lower (0.948 against 0.972), even though the lift is marginally higher (1.007 against 1.005). In terms of confidence, this rule is not as strong as the previous one. Hence we can’t say that the rule of murder being implied by rape is as strong as that of rape being implied by murder.
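
The difference between the two directions comes from the fact that confidence is not symmetric. As a sketch, the single-item rules in both directions can be compared using counts printed in this report. Note that {Legacy.rape_1} => {Murder.and.nonnegligent.Manslaughter} is not among the five printed rules, so its confidence is derived here and not read from the output; the variable names are my own.

```r
n        <- 455  # all jurisdictions
n_murder <- 428  # jurisdictions reporting murder
n_rape   <- 440  # jurisdictions reporting legacy rape
n_both   <- 416  # jurisdictions reporting both

# Confidence depends on which crime is on the lhs
conf_murder_to_rape <- n_both / n_murder
conf_rape_to_murder <- n_both / n_rape

# Lift is the same in both directions
lift_murder_to_rape <- conf_murder_to_rape / (n_rape / n)
lift_rape_to_murder <- conf_rape_to_murder / (n_murder / n)

print(round(c(conf_murder_to_rape, conf_rape_to_murder), 4))
print(round(c(lift_murder_to_rape, lift_rape_to_murder), 6))
```

Because fewer jurisdictions report murder than legacy rape, the same 416 shared jurisdictions make up a larger share of the murder group, which is why the rule with murder on the lhs has the higher confidence.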

rules_Burglary<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                             appearance=list(default="lhs", rhs="Burglary"), control=list(verbose=F)) 
rules_burglary_byconf<-sort(rules_Burglary, by="confidence", decreasing=TRUE)
inspect(rules_burglary_byconf[1:5])

##     lhs                                       rhs        support   confidence
## [1] {}                                     => {Burglary} 1.0000000 1         
## [2] {Murder.and.nonnegligent.Manslaughter} => {Burglary} 0.9406593 1         
## [3] {Legacy.rape_1}                        => {Burglary} 0.9670330 1         
## [4] {Motor.vehicle.theft}                  => {Burglary} 0.9956044 1         
## [5] {Robbery}                              => {Burglary} 0.9956044 1         
##     lift count
## [1] 1    455  
## [2] 1    428  
## [3] 1    440  
## [4] 1    453  
## [5] 1    453

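The first rule has an empty set on the lhs, which means it makes no assumption about other crimes: it simply states that burglary was reported in all 455 jurisdictions (support 1). Because burglary is present everywhere, every rule with burglary on the rhs has a confidence of 1, and the lift of 1 tells us that knowing about any other crime does not change the likelihood of a burglary report. So this is a very strong rule in the sense that it holds in every jurisdiction, but it does not reveal an association specific to any other crime.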

rules_Robbery<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                             appearance=list(default="lhs", rhs="Robbery"), control=list(verbose=F)) 
rules_robbery_byconf<-sort(rules_Robbery, by="confidence", decreasing=TRUE)
inspect(rules_robbery_byconf[1:5])

##     lhs                                       rhs         support confidence     lift count
## [1] {Murder.and.nonnegligent.Manslaughter} => {Robbery} 0.9406593          1 1.004415   428
## [2] {Legacy.rape_1,                                                                        
##      Murder.and.nonnegligent.Manslaughter} => {Robbery} 0.9142857          1 1.004415   416
## [3] {Motor.vehicle.theft,                                                                  
##      Murder.and.nonnegligent.Manslaughter} => {Robbery} 0.9362637          1 1.004415   426
## [4] {Aggravated.assault,                                                                   
##      Murder.and.nonnegligent.Manslaughter} => {Robbery} 0.9406593          1 1.004415   428
## [5] {Burglary,                                                                             
##      Murder.and.nonnegligent.Manslaughter} => {Robbery} 0.9406593          1 1.004415   428

This is an interesting and plausible result. The support, confidence and lift values tell us that robbery is most likely to be reported along with murder and manslaughter. This is still prevalent today, as is evident from the news. This too is a very strong rule, as the confidence is 1: every jurisdiction that reported murder also reported robbery.

rules_Larceny.theft<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                             appearance=list(default="lhs", rhs="Larceny.theft"), control=list(verbose=F)) 
rules_Larceny_byconf<-sort(rules_Larceny.theft, by="confidence", decreasing=TRUE)
inspect(rules_Larceny_byconf[1:5])

##     lhs                                       rhs               support confidence lift count
## [1] {}                                     => {Larceny.theft} 1.0000000          1    1   455
## [2] {Murder.and.nonnegligent.Manslaughter} => {Larceny.theft} 0.9406593          1    1   428
## [3] {Legacy.rape_1}                        => {Larceny.theft} 0.9670330          1    1   440
## [4] {Motor.vehicle.theft}                  => {Larceny.theft} 0.9956044          1    1   453
## [5] {Robbery}                              => {Larceny.theft} 0.9956044          1    1   453

Like burglary, larceny theft was reported in every jurisdiction: the rule with the empty lhs has support and confidence 1, and the lift of 1 shows that its presence does not depend on any of the other crimes. Let’s look at assault next:

rules_Aggravated.assault<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                             appearance=list(default="lhs", rhs="Aggravated.assault"), control=list(verbose=F)) 
rules_assault_byconf<-sort(rules_Aggravated.assault, by="confidence", decreasing=TRUE)
inspect(rules_assault_byconf[1:5])

##     lhs                                       rhs                    support confidence     lift count
## [1] {Murder.and.nonnegligent.Manslaughter} => {Aggravated.assault} 0.9406593          1 1.002203   428
## [2] {Legacy.rape_1}                        => {Aggravated.assault} 0.9670330          1 1.002203   440
## [3] {Robbery}                              => {Aggravated.assault} 0.9956044          1 1.002203   453
## [4] {Legacy.rape_1,                                                                                   
##      Murder.and.nonnegligent.Manslaughter} => {Aggravated.assault} 0.9142857          1 1.002203   416
## [5] {Motor.vehicle.theft,                                                                             
##      Murder.and.nonnegligent.Manslaughter} => {Aggravated.assault} 0.9362637          1 1.002203   426

These results too are intuitive and understandable. They say that aggravated assault is most likely to be reported in a jurisdiction where murder and rape have been reported or are prevalent.

rules_Motor.vehicle.theft<-apriori(data=trans, parameter=list(supp=0.001,conf = 0.08), 
                                  appearance=list(default="lhs", rhs="Motor.vehicle.theft"), control=list(verbose=F)) 
rules_vehicle_theft_byconf<-sort(rules_Motor.vehicle.theft, by="confidence", decreasing=TRUE)
inspect(rules_vehicle_theft_byconf[1:5])

##     lhs                         rhs                   support   confidence
## [1] {}                       => {Motor.vehicle.theft} 0.9956044 0.9956044 
## [2] {Burglary}               => {Motor.vehicle.theft} 0.9956044 0.9956044 
## [3] {Larceny.theft}          => {Motor.vehicle.theft} 0.9956044 0.9956044 
## [4] {Burglary,Larceny.theft} => {Motor.vehicle.theft} 0.9956044 0.9956044 
## [5] {Aggravated.assault}     => {Motor.vehicle.theft} 0.9934066 0.9955947 
##     lift      count
## [1] 1.0000000 453  
## [2] 1.0000000 453  
## [3] 1.0000000 453  
## [4] 1.0000000 453  
## [5] 0.9999903 452

Lastly, let’s look at motor vehicle theft. This theft is very similar to burglary and larceny theft and has similar implications. From this we can say that property crimes are most likely not determined by any other crimes and can happen in jurisdictions where violent crimes are not present at all. But it’s not the other way around: the presence of violent crime does imply the presence of property crime.

Now, we know that these crimes can be classified into two types: violent crimes and property crimes. Let’s add that as a level and see how the results look then.

names<-c("Aggravated.assault","Burglary","Larceny.theft","Legacy.rape_1","Motor.vehicle.theft","Murder.and.nonnegligent.Manslaughter","Robbery")
level_label<-c("Violent Crime","Property Crime" , "Property Crime","Violent Crime","Property Crime","Violent Crime","Violent Crime")
itemInfo(trans) <- data.frame(labels = names, level1 = level_label)
trans_level2<-aggregate(trans, by="level1")
freq_items_2<-eclat(trans_level2, parameter=list(supp=0.50, maxlen=8))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target   ext
##     FALSE     0.5      1      8 frequent itemsets FALSE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 227 
## 
## create itemset ... 
## set transactions ...[2 item(s), 455 transaction(s)] done [0.00s].
## sorting and recoding items ... [2 item(s)] done [0.00s].
## creating bit matrix ... [2 row(s), 455 column(s)] done [0.00s].
## writing  ... [3 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

freq_rules_l2<-ruleInduction(freq_items_2, trans_level2, confidence=0.95)
freq_rules_l2<-sort(freq_rules_l2, by="lift", decreasing=TRUE)
inspect(freq_rules_l2)

##     lhs                 rhs              support   confidence lift itemset
## [1] {Violent Crime}  => {Property Crime} 0.9978022 1.0000000  1    1      
## [2] {Property Crime} => {Violent Crime}  0.9978022 0.9978022  1    1

From the support and confidence values, we can see that 99.8% of all the agencies/jurisdictions reported both property and violent crimes. This is not surprising because we have just these two classifications for the entire set of crimes.
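
The two aggregated rules can be turned back into jurisdiction counts. This sketch only uses the support and confidence values printed above and the 455 transactions from the Eclat log; the counts it prints are derived, not read from the output, and the variable names are my own.

```r
n            <- 455        # all jurisdictions
support_both <- 0.9978022  # support of both rules
conf_v_to_p  <- 1.0000000  # {Violent Crime}  => {Property Crime}
conf_p_to_v  <- 0.9978022  # {Property Crime} => {Violent Crime}

# Jurisdictions reporting both classes of crime
n_both <- round(support_both * n)

# confidence = n_both / n_lhs, so n_lhs = n_both / confidence
n_violent  <- round(n_both / conf_v_to_p)
n_property <- round(n_both / conf_p_to_v)

print(c(both = n_both, violent = n_violent, property = n_property))
```

Under these figures, every jurisdiction that reported a violent crime also reported a property crime, while one jurisdiction reported property crime only.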

Lastly, let’s take the very frequently occurring crimes (by that I mean crimes which happen in a larger number of jurisdictions, not crimes which happen a larger number of times) and see if we have a hierarchy amongst them.

# 'stats' is a base package and 'arules' is already installed and loaded above, so neither needs to be installed here

library(arules)
library(stats)
trans_sel<-trans[,itemFrequency(trans)>0.70]
d_jac_item<-dissimilarity(trans_sel, which="items") 
plot(hclust(d_jac_item, method = "ward.D2"), main = "Dendrogram for items")

Violent crimes like rape and murder and manslaughter are on separate branches, but assault and robbery along with burglary and larceny theft form the last branches. This is consistent with our results and also with the occurrence of these crimes in real life. We have all heard numerous reports of robberies gone wrong where the inhabitants were assaulted or injured. This suggests that association rules can help in mapping out which crimes are more likely to happen in a particular place, given the crimes already reported there.
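
The distances behind the dendrogram can be illustrated by hand for one pair of crimes. This sketch assumes that `dissimilarity()` used its default Jaccard measure, which the object name `d_jac_item` suggests; the counts are the ones printed in the rule tables of this report, and the variable names are my own.

```r
n_murder <- 428  # jurisdictions reporting murder
n_rape   <- 440  # jurisdictions reporting legacy rape
n_both   <- 416  # jurisdictions reporting both

# Jurisdictions reporting at least one of the two crimes
n_either <- n_murder + n_rape - n_both

# Jaccard dissimilarity: 1 - |intersection| / |union|
jaccard_dissimilarity <- 1 - n_both / n_either

print(round(jaccard_dissimilarity, 4))
```

Crimes reported in almost exactly the same set of jurisdictions have a dissimilarity close to 0 and are merged early by the hierarchical clustering.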

Further Scope

This paper can be extended to analyse not just the “crime basket” of a place but also that of criminals. We all know that a past criminal record implies a potential future one. But for which crimes exactly? This question can be answered by collecting data on the criminal records of offenders and checking whether there is any combination of crimes that is more likely to imply the commission of another crime by a particular criminal.

Association Rules in FBI Crime Data

Catherine Sunil

1/31/2020
