library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.1
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.3.0 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(odbc)
## Warning: package 'odbc' was built under R version 4.2.3
library(DBI)
library(RSQLite)
## Warning: package 'RSQLite' was built under R version 4.2.3
library(readr)
lapd_crime_2020_to_20231112 <- read_csv("C:/Users/joel.torres/Documents/Data Analytics/LAPD_crime/Crime_Data_from_2020_to_Present_20231112.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
## Rows: 834320 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): DR_NO, Date Rptd, DATE OCC, TIME OCC, AREA, AREA NAME, Rpt Dist No...
## dbl (10): Part 1-2, Crm Cd, Vict Age, Premis Cd, Weapon Used Cd, Crm Cd 1, C...
## lgl (1): Crm Cd 4
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
I would like to look into what age group victims are more susceptible to crime in general. Are certain age groups more prone to specific type of crimes. What are the differences in crime by gender and victim age. Does victim descent play a role in the counts of crime. What geographic area based off of police district zones have higher counts of crime.
First focus on victim age groups, gender, and types of crime associated with each. I’m thinking of classifying crimes into one category of violent crimes which include murder, manslaughter, rape, robbery, and assault. The other group would then just be classified as non-violent crimes.
Basic Data Exploration
lapd_crime_2020_to_20231112
## # A tibble: 834,320 × 28
## DR_NO Date …¹ DATE …² TIME …³ AREA AREA …⁴ Rpt D…⁵ Part …⁶ Crm C…⁷ Crm C…⁸
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl> <chr>
## 1 010304… 01/08/… 01/08/… 2230 03 Southw… 0377 2 624 BATTER…
## 2 190101… 01/02/… 01/01/… 0330 01 Central 0163 2 624 BATTER…
## 3 200110… 04/14/… 02/13/… 1200 01 Central 0155 2 845 SEX OF…
## 4 191501… 01/01/… 01/01/… 1730 15 N Holl… 1543 2 745 VANDAL…
## 5 191921… 01/01/… 01/01/… 0415 19 Mission 1998 2 740 VANDAL…
## 6 200100… 01/02/… 01/01/… 0030 01 Central 0163 1 121 RAPE, …
## 7 200100… 01/02/… 01/02/… 1315 01 Central 0161 1 442 SHOPLI…
## 8 200100… 01/04/… 01/04/… 0040 01 Central 0155 2 946 OTHER …
## 9 200100… 01/04/… 01/04/… 0200 01 Central 0101 1 341 THEFT-…
## 10 201710… 06/19/… 05/26/… 1925 17 Devons… 1708 1 341 THEFT-…
## # … with 834,310 more rows, 18 more variables: Mocodes <chr>, `Vict Age` <dbl>,
## # `Vict Sex` <chr>, `Vict Descent` <chr>, `Premis Cd` <dbl>,
## # `Premis Desc` <chr>, `Weapon Used Cd` <dbl>, `Weapon Desc` <chr>,
## # Status <chr>, `Status Desc` <chr>, `Crm Cd 1` <dbl>, `Crm Cd 2` <dbl>,
## # `Crm Cd 3` <dbl>, `Crm Cd 4` <lgl>, LOCATION <chr>, `Cross Street` <chr>,
## # LAT <dbl>, LON <dbl>, and abbreviated variable names ¹`Date Rptd`,
## # ²`DATE OCC`, ³`TIME OCC`, ⁴`AREA NAME`, ⁵`Rpt Dist No`, ⁶`Part 1-2`, …
Based off of this crime summary tibble I can conclude there are at least 834,320 rows of data and/or crime incidents.
The lapd_crime main dataset is too large to view all columns within a tibble in markdown. To view full dataset click the following link
Here is an iframe of the LA City LAPD crime dataset in Power BI
knitr::include_url("https://app.powerbi.com/view?r=eyJrIjoiMzRiZGM2Y2MtOTBmNS00YTRkLTljY2EtMDJhODRiNzY4ZmQzIiwidCI6IjFjMmFhOTkyLTgzNjItNGNiZi1hOTYxLTk1Njk5NWMzNDM3MyIsImMiOjZ9&embedImagePlaceholder=true")
To view the Power BI visualization on another page click the following
lapd_crime_summary <- read.csv("C:/Users/joel.torres/Documents/Data Analytics/Rstudio/LAPD_crime/crime_basic_summary.csv")
lapd_crime_summary
## total_ct_crime.avg_age.min_vict_age.max_vict_age
## 1 834320;29;0;120
library(readxl)
crime_basic_summary_2 <- read_excel("C:/Users/joel.torres/Documents/Data Analytics/Rstudio/LAPD_crime/crime_basic_summary_2.xls")
crime_basic_summary_2
## # A tibble: 1 × 4
## total_ct_crime avg_age min_vict_age max_vict_age
## <dbl> <dbl> <dbl> <dbl>
## 1 834320 29 0 120
An average age of a victim to be 29 years old. Minimum age for victim at 0 years of age and maximum age of a victim at 120 years of age.
con <- dbConnect(SQLite(), ":memory:")
copy_to(con, crime_basic_summary_2)
dbDisconnect(con)
con <- dbConnect(SQLite(), ":memory:")
copy_to(con, lapd_crime_2020_to_20231112)
dbGetQuery(con, '
SELECT "Vict Age", "Crm Cd Desc"
FROM lapd_crime_2020_to_20231112
LIMIT 50
')
## Vict Age Crm Cd Desc
## 1 36 BATTERY - SIMPLE ASSAULT
## 2 25 BATTERY - SIMPLE ASSAULT
## 3 0 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 4 76 VANDALISM - MISDEAMEANOR ($399 OR UNDER)
## 5 31 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 6 25 RAPE, FORCIBLE
## 7 23 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 8 0 OTHER MISCELLANEOUS CRIME
## 9 23 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 10 0 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 11 29 BURGLARY FROM VEHICLE
## 12 35 CRIMINAL THREATS - NO WEAPON DISPLAYED
## 13 41 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 14 0 ARSON
## 15 24 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 16 24 INTIMATE PARTNER - SIMPLE ASSAULT
## 17 34 INTIMATE PARTNER - SIMPLE ASSAULT
## 18 29 THEFT PLAIN - PETTY ($950 & UNDER)
## 19 46 THEFT OF IDENTITY
## 20 66 ROBBERY
## 21 40 THEFT OF IDENTITY
## 22 31 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 23 27 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 24 46 THEFT OF IDENTITY
## 25 62 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT
## 26 34 THEFT OF IDENTITY
## 27 43 BURGLARY FROM VEHICLE
## 28 71 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT
## 29 50 BATTERY - SIMPLE ASSAULT
## 30 19 RAPE, FORCIBLE
## 31 51 BURGLARY
## 32 27 THEFT OF IDENTITY
## 33 33 INTIMATE PARTNER - SIMPLE ASSAULT
## 34 0 VEHICLE - STOLEN
## 35 69 CRIMINAL THREATS - NO WEAPON DISPLAYED
## 36 39 THEFT OF IDENTITY
## 37 19 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 38 57 BATTERY - SIMPLE ASSAULT
## 39 0 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 40 78 BATTERY - SIMPLE ASSAULT
## 41 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 42 24 BURGLARY FROM VEHICLE
## 43 52 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT
## 44 38 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT
## 45 55 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 46 0 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
## 47 66 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 48 34 BRANDISH WEAPON
## 49 50 INTIMATE PARTNER - AGGRAVATED ASSAULT
## 50 35 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
At first glance age value 0 for “Vict Age” column seems like a problem in data reporting. The crime description does not fit the age.
dbGetQuery(con, '
SELECT "Vict Age", "Crm Cd Desc"
FROM lapd_crime_2020_to_20231112
ORDER BY "Vict Age" Asc
LIMIT 100
')
## Vict Age Crm Cd Desc
## 1 -3 EMBEZZLEMENT, PETTY THEFT ($950 & UNDER)
## 2 -2 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 3 -2 THEFT OF IDENTITY
## 4 -2 THEFT OF IDENTITY
## 5 -2 OTHER MISCELLANEOUS CRIME
## 6 -2 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 7 -2 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 8 -2 BURGLARY
## 9 -2 CRIMINAL THREATS - NO WEAPON DISPLAYED
## 10 -2 THEFT OF IDENTITY
## 11 -2 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 12 -2 VEHICLE - STOLEN
## 13 -2 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 14 -2 SEX,UNLAWFUL(INC MUTUAL CONSENT, PENETRATION W/ FRGN OBJ
## 15 -1 BURGLARY
## 16 -1 BURGLARY
## 17 -1 THEFT OF IDENTITY
## 18 -1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 19 -1 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 20 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 21 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 22 -1 VANDALISM - MISDEAMEANOR ($399 OR UNDER)
## 23 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 24 -1 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 25 -1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 26 -1 BURGLARY
## 27 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 28 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 29 -1 THEFT OF IDENTITY
## 30 -1 DOCUMENT FORGERY / STOLEN FELONY
## 31 -1 THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)
## 32 -1 THEFT OF IDENTITY
## 33 -1 BURGLARY
## 34 -1 THEFT PLAIN - PETTY ($950 & UNDER)
## 35 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 36 -1 THEFT PLAIN - PETTY ($950 & UNDER)
## 37 -1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 38 -1 THEFT OF IDENTITY
## 39 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 40 -1 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 41 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 42 -1 TRESPASSING
## 43 -1 DOCUMENT FORGERY / STOLEN FELONY
## 44 -1 BURGLARY
## 45 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 46 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 47 -1 THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)
## 48 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 49 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 50 -1 THEFT PLAIN - PETTY ($950 & UNDER)
## 51 -1 THEFT OF IDENTITY
## 52 -1 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 53 -1 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 54 -1 BUNCO, GRAND THEFT
## 55 -1 BRANDISH WEAPON
## 56 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 57 -1 THEFT PLAIN - PETTY ($950 & UNDER)
## 58 -1 UNAUTHORIZED COMPUTER ACCESS
## 59 -1 BUNCO, GRAND THEFT
## 60 -1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 61 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 62 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 63 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 64 -1 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 65 -1 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 66 -1 THEFT PLAIN - PETTY ($950 & UNDER)
## 67 -1 THEFT OF IDENTITY
## 68 -1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 69 -1 THEFT OF IDENTITY
## 70 -1 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 71 -1 THEFT OF IDENTITY
## 72 -1 BUNCO, GRAND THEFT
## 73 0 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 74 0 OTHER MISCELLANEOUS CRIME
## 75 0 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 76 0 ARSON
## 77 0 VEHICLE - STOLEN
## 78 0 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 79 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 80 0 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
## 81 0 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 82 0 BATTERY POLICE (SIMPLE)
## 83 0 BATTERY POLICE (SIMPLE)
## 84 0 VANDALISM - MISDEAMEANOR ($399 OR UNDER)
## 85 0 BURGLARY
## 86 0 VEHICLE - STOLEN
## 87 0 ROBBERY
## 88 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 89 0 VEHICLE - STOLEN
## 90 0 VEHICLE - STOLEN
## 91 0 VANDALISM - MISDEAMEANOR ($399 OR UNDER)
## 92 0 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
## 93 0 BURGLARY FROM VEHICLE
## 94 0 VEHICLE - STOLEN
## 95 0 VEHICLE - STOLEN
## 96 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 97 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 98 0 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 99 0 VEHICLE - STOLEN
## 100 0 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
In RMarkdown the NULL values are coming in as negative numbers. I know this because in SQL Server those same negative values are NULL.
Upon looking further at the data set with just the first top 100 rows and in ascending order by age I could see there are quite a few values for Null/negative values and 0 values.Taking a closer look at the victim age columns and Crime descriptions we can gather that crimes that have NULL, negative (-) values, and 0 value have crime descriptions which are victimless or where the crime committed for the age group of 0 seems to be out of the norm.For example crimes of burglary, theft, trespassing, and out of compliance are a few of the crimes for which there is no direct victim. If there was a victim but the office reporting could not get victims age then reporting a 0 was possible. Due to all these various possibilities I will be removing value 0 for victim age column.
This data set is going to come with quite a bit of reporting errors. The data set as described was inputted from crime incident reports and was not cleaned up for any type of formatting errors. When exploring and summarizing for age I will not be including the null/negative and 0 values. The first summary for table 1.1 is therefore inaccurate.
dbGetQuery(con, '
SELECT COUNT("Vict Age") AS "Count_of_0age_victim_crime"
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" <= 0
ORDER BY "Vict Age" ASC
')
## Count_of_0age_victim_crime
## 1 206407
dbGetQuery(con, '
SELECT COUNT("Vict Age") AS "Count_of_victim_crime_wAge"
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" > 0
ORDER BY "Vict Age" ASC
')
## Count_of_victim_crime_wAge
## 1 627913
library(tidyverse)
crime_by_vict_age <- read_xlsx("C:/Users/joel.torres/Documents/Data Analytics/LAPD_crime/crime_by_vict_age.xlsx")
lapd_crime_df <- data.frame(lapd_crime_2020_to_20231112)
lapd_victim_crime_df <- data.frame(crime_by_vict_age)
ggplot(data = lapd_victim_crime_df, aes(x = Vict_Age, y= "")) +
geom_boxplot()
The dataset for this boxplot did not include Nulls/Negative numbers nor 0 values. I was curious about the IQR range and if they’ll be any outliers. Based on the results there are definitely quite a few outliers that I’d like to address.
q_marks <- quantile(crime_by_vict_age$`Vict_Age`)
iqr <- IQR(crime_by_vict_age$Vict_Age)
q_marks[2]
## 25%
## 28
q_marks[4]
## 75%
## 50
iqr
## [1] 22
up_range <- q_marks[4] + 1.5*iqr # Upper Range
up_range
## 75%
## 83
Our results indicate that quartile 1 is 28, quartile 3 is 50. The Interquartile Range (IQR) is 22. When calculating the quartile range outside of the IQR box but within the whisker range I get a result of 83. This means that any values 84 and above are considered outliers.
dbGetQuery(con, '
SELECT COUNT("Vict Age") as vict_total_above83
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" > 83
')
## vict_total_above83
## 1 3785
Summary for victim_age column
834,220 total crime cases. 206407 total crime for incidents where victim age of 0 is not applicable. Which leaves 627913 crimes with victim ages are greater than 1 after filtering out 0 values. We also see that there are 3785 outliers in our dataset. These outliers are victim age values that are greater than 83. I will have to further analyse the outlier values to see if there is any merit to keeping the values or removing the values for the sake of doing a summary of statistics on the victim age. To do so we’ll need to analyze the crime description for the age outliers.
dbGetQuery(con, '
SELECT COUNT(DISTINCT "Crm Cd Desc") AS Desc_crime_count_greater_than_age_83
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" > 83
')
## Desc_crime_count_greater_than_age_83
## 1 76
dbGetQuery(con, '
SELECT DISTINCT "Crm Cd Desc" AS Crime_description
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" > 83
')
## Crime_description
## 1 THEFT, PERSON
## 2 ATTEMPTED ROBBERY
## 3 CHILD NEGLECT (SEE 300 W.I.C.)
## 4 THEFT PLAIN - PETTY ($950 & UNDER)
## 5 THEFT OF IDENTITY
## 6 BATTERY - SIMPLE ASSAULT
## 7 BURGLARY
## 8 THEFT FROM MOTOR VEHICLE - GRAND ($950.01 AND OVER)
## 9 ROBBERY
## 10 BURGLARY, ATTEMPTED
## 11 BUNCO, GRAND THEFT
## 12 BURGLARY FROM VEHICLE
## 13 THEFT-GRAND ($950.01 & OVER)EXCPT,GUNS,FOWL,LIVESTK,PROD
## 14 VANDALISM - MISDEAMEANOR ($399 OR UNDER)
## 15 ASSAULT WITH DEADLY WEAPON, AGGRAVATED ASSAULT
## 16 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 17 BIKE - STOLEN
## 18 DOCUMENT FORGERY / STOLEN FELONY
## 19 THEFT FROM MOTOR VEHICLE - PETTY ($950 & UNDER)
## 20 CREDIT CARDS, FRAUD USE ($950 & UNDER
## 21 OTHER MISCELLANEOUS CRIME
## 22 KIDNAPPING
## 23 VIOLATION OF RESTRAINING ORDER
## 24 VIOLATION OF COURT ORDER
## 25 SEXUAL PENETRATION W/FOREIGN OBJECT
## 26 STALKING
## 27 INDECENT EXPOSURE
## 28 TRESPASSING
## 29 BUNCO, PETTY THEFT
## 30 DISTURBING THE PEACE
## 31 BRANDISH WEAPON
## 32 INTIMATE PARTNER - SIMPLE ASSAULT
## 33 BURGLARY FROM VEHICLE, ATTEMPTED
## 34 CHILD ANNOYING (17YRS & UNDER)
## 35 VEHICLE, STOLEN - OTHER (MOTORIZED SCOOTERS, BIKES, ETC)
## 36 INTIMATE PARTNER - AGGRAVATED ASSAULT
## 37 OTHER ASSAULT
## 38 LETTERS, LEWD - TELEPHONE CALLS, LEWD
## 39 SHOTS FIRED AT INHABITED DWELLING
## 40 CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT
## 41 CRIMINAL THREATS - NO WEAPON DISPLAYED
## 42 VEHICLE - ATTEMPT STOLEN
## 43 PICKPOCKET
## 44 UNAUTHORIZED COMPUTER ACCESS
## 45 ARSON
## 46 THEFT PLAIN - ATTEMPT
## 47 BATTERY WITH SEXUAL CONTACT
## 48 CONTEMPT OF COURT
## 49 THREATENING PHONE CALLS/LETTERS
## 50 CRIMINAL HOMICIDE
## 51 PROWLER
## 52 BUNCO, ATTEMPT
## 53 VIOLATION OF TEMPORARY RESTRAINING ORDER
## 54 CHILD STEALING
## 55 SHOPLIFTING - PETTY THEFT ($950 & UNDER)
## 56 RAPE, FORCIBLE
## 57 HUMAN TRAFFICKING - INVOLUNTARY SERVITUDE
## 58 KIDNAPPING - GRAND ATTEMPT
## 59 VEHICLE - STOLEN
## 60 DISHONEST EMPLOYEE - GRAND THEFT
## 61 DOCUMENT WORTHLESS ($200.01 & OVER)
## 62 BATTERY POLICE (SIMPLE)
## 63 COUNTERFEIT
## 64 EXTORTION
## 65 CREDIT CARDS, FRAUD USE ($950.01 & OVER)
## 66 SEX OFFENDER REGISTRANT OUT OF COMPLIANCE
## 67 CRM AGNST CHLD (13 OR UNDER) (14-15 & SUSP 10 YRS OLDER)
## 68 EMBEZZLEMENT, GRAND THEFT ($950.01 & OVER)
## 69 DISCHARGE FIREARMS/SHOTS FIRED
## 70 SODOMY/SEXUAL CONTACT B/W PENIS OF ONE PERS TO ANUS OTH
## 71 SHOPLIFTING-GRAND THEFT ($950.01 & OVER)
## 72 SHOTS FIRED AT MOVING VEHICLE, TRAIN OR AIRCRAFT
## 73 THEFT FROM PERSON - ATTEMPT
## 74 THEFT FROM MOTOR VEHICLE - ATTEMPT
## 75 WEAPONS POSSESSION/BOMBING
## 76 PURSE SNATCHING
Summary of results:
Based off of the crime description for victim ages 84 and above you see that some crimes are what might be considered crimes with no particular individual but that of against a group. However, the number of crimes equal to or greater than 84 are 3,785 compared to that of crimes equal to 0 or null are 206,335.It is apparent that the 0 value for age was meant as a way to signify that a crime according to the police report either was a victimless crime, had no value to input (null), or also a group crime, or was just missing. Either way when calculating a mean summary statistics for the victim_age variable we cannot use those 206,335 rows for 0 age values.
At first I was going to use the victim_age column as a way to differentiate between crimes with victims and non-victim crimes. However, the way this column value was reported in the data set makes it difficult to distinguish. For one instead of using null for crimes with no victim they used the value of 0. Also, the interpretation of a victim is defined differently in the reporting as seen by the data. For example for the same crime description of auto theft some reports have a value of 0 indicating the officer felt this was a victimless crime, however, in other incidences of the same crime it was reported with an age of an adult. This means that other officers felt that the person who’s car got stolen was a victim. For the purposes of this study I will evaluate all crime as having a victim when age is reported greater than 0. When making summary calculations for age variable I will rule out the age 0 value.
The idea here is that crimes that were reported as having an age value of 99 were seen as crimes against a group of people. Some crimes for example theft/vandalism to a vehicle would be easier to report an age value to because you can pinpoint the crime to an individual. However, a crime of theft/burglary to a church is a crime that does have a victim, not an individual, but to a group of people.
When I calculate for mean age for crimes I need to take out the 0 values for reasons stated earlier. But what about for the values that are outliers? I believe that we can rationaly make the conclusion that most of the outliers (84 and above), definitely age value 99, were values used for crimes against groups.
For clarification, I am summarizing that all crimes have a victim, however, when summarizing the mean I will not be using the 0 age values nor the outliers. That being said using the mean to describe these crimes might not be as appropriate compared to using the boxplot IQR, median, and mode values.
I will first analyse the crime descriptions and crime codes to check how many crime categories we have. I will check those categories for crime descriptions that are more applicable to crimes which might not apply to a singular victim but more so have an effect on a group.
Crimes that would effect a group or collective whole
Crimes that involve vandalism and burglary can possibly be distinguished as being crimes that in our specific requirements fall under crimes that effect a group as whole. Burglary for example is typical of crimes to property whereas crimes involving a robbery are crimes that require a threat or physical nature. However, for our intent and purposes even a crime such as burglary can in fact indirectly have an individual victim. For example if a persons home is burglarized then they have become a victim even though they might not have directly been threatened physically or verbally or have been present but in general said crime can in fact effect one person. If this is the case for such crimes then it’s possible the officer reporting the crime might have put the age of the victim that reported/called in the crime or who it effected indirectly. Vandalism therefore as well can fall under the same rationale as that of burglary. About the only crime for victim age outliers that I can definitively say the age reported by the officer is not representative of the individual or the group is VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS). But lets look at the Crm_Cd_Desc and see how it matches with Premis_Desc.
dbGetQuery(con, '
SELECT "Crm Cd Desc" AS Crime_description, "Premis Desc", "Vict Age"
FROM lapd_crime_2020_to_20231112
WHERE "Vict Age" > 83
AND "Crm Cd Desc" LIKE "%CHURCH VANDALISMS%"
ORDER BY "Vict Age" DESC
LIMIT 50
')
## Crime_description
## 1 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 2 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 3 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 4 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 5 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 6 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 7 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 8 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 9 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 10 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 11 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 12 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 13 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 14 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 15 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 16 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 17 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 18 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 19 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 20 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 21 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 22 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 23 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 24 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 25 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 26 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 27 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 28 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 29 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 30 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 31 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 32 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 33 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 34 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 35 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 36 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 37 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 38 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 39 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 40 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 41 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 42 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 43 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 44 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 45 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 46 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 47 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 48 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 49 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## 50 VANDALISM - FELONY ($400 & OVER, ALL CHURCH VANDALISMS)
## Premis Desc Vict Age
## 1 SINGLE FAMILY DWELLING 99
## 2 VEHICLE, PASSENGER/TRUCK 99
## 3 VEHICLE, PASSENGER/TRUCK 99
## 4 VEHICLE, PASSENGER/TRUCK 99
## 5 MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) 99
## 6 MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) 99
## 7 VEHICLE, PASSENGER/TRUCK 99
## 8 STREET 99
## 9 DEPARTMENT STORE 99
## 10 RESTAURANT/FAST FOOD 99
## 11 STREET 99
## 12 RESTAURANT/FAST FOOD 99
## 13 VEHICLE, PASSENGER/TRUCK 99
## 14 OTHER BUSINESS 99
## 15 STREET 99
## 16 STREET 99
## 17 SINGLE FAMILY DWELLING 98
## 18 OTHER STORE 98
## 19 VEHICLE, PASSENGER/TRUCK 98
## 20 OTHER BUSINESS 97
## 21 SINGLE FAMILY DWELLING 97
## 22 MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) 96
## 23 PARKING LOT 96
## 24 VEHICLE, PASSENGER/TRUCK 95
## 25 SINGLE FAMILY DWELLING 95
## 26 OTHER BUSINESS 94
## 27 SINGLE FAMILY DWELLING 94
## 28 VEHICLE, PASSENGER/TRUCK 94
## 29 VEHICLE, PASSENGER/TRUCK 94
## 30 STREET 94
## 31 SINGLE FAMILY DWELLING 93
## 32 SINGLE FAMILY DWELLING 93
## 33 SINGLE FAMILY DWELLING 93
## 34 SINGLE FAMILY DWELLING 92
## 35 SINGLE FAMILY DWELLING 92
## 36 SINGLE FAMILY DWELLING 92
## 37 OTHER BUSINESS 92
## 38 SINGLE FAMILY DWELLING 91
## 39 SINGLE FAMILY DWELLING 91
## 40 SINGLE FAMILY DWELLING 91
## 41 OTHER RESIDENCE 91
## 42 SINGLE FAMILY DWELLING 91
## 43 ALLEY 91
## 44 SINGLE FAMILY DWELLING 90
## 45 STREET 90
## 46 DRIVEWAY 90
## 47 SINGLE FAMILY DWELLING 90
## 48 VEHICLE, PASSENGER/TRUCK 90
## 49 MULTI-UNIT DWELLING (APARTMENT, DUPLEX, ETC) 90
## 50 SINGLE FAMILY DWELLING 90
I was trying to make the assumption that the outlier Victim Age values which are values greater than 83 are prone to values which do not represent the correct age of a victim. If I could surmise that the crime description was a crime for a non specific individual, like a group, then I could remove such data from creating a statistal summary. If the age values are not valid then that would effect the mean age values from crime. However, upon inspecting the Premise Description along with it’s crime code descriptions there doesn’t seem to be a correlation that crimes to property such as vandalism are only effecting the general group and not individuals (that might have a legitimate age value). There are vandalism crimes with Premis Desc that are happening in a single family dwelling or vehicle which can imply that the victim age is possibly correct.
For this dataset because there are so few crimes above the age of 83, and because of the possibility that a higher age was used to denote an unknown age, or a crime to public property, and because with the use of standard deviation the calculated values greater than 83 are outliers. Therefore I will not include these outliers, along with value age 0, to calculate statistical summaries.
dbGetQuery(con, '
SELECT
"DR_NO", "Vict Age",
IIF("Vict Age" between 1 and 4, "Toddler",
IIF("Vict Age" between 5 and 12, "Kid",
IIF("Vict Age" between 13 and 17, "Teenager",
IIF("Vict Age" between 18 and 25, "Young Adult",
IIF("Vict Age" between 26 and 39, "Adult",
IIF("Vict Age" between 40 and 65, "Middle Age", "Senior")))))) as Age_Group
FROM lapd_crime_2020_to_20231112
where "Vict Age" between 1 and 83
order by "Vict Age" asc
LIMIT 10;
')
## DR_NO Vict Age Age_Group
## 1 210816791 2 Toddler
## 2 200907120 2 Toddler
## 3 211304234 2 Toddler
## 4 200715304 2 Toddler
## 5 221806711 2 Toddler
## 6 211219616 2 Toddler
## 7 201320179 2 Toddler
## 8 200516029 2 Toddler
## 9 200307249 2 Toddler
## 10 200108589 2 Toddler
A csv file was created from the sql query above but for the whole data set result not just limited to the 10 rows. Which gives me a category age grouping for every victim age except the outliers and the 0 values. Lets take a look at the distinct category age groups and their respective age range.
lapd_vict_age_bins <- read_excel("C:/Users/joel.torres/Documents/Data Analytics/LAPD_crime/lapd_vict_age_groups_4.xlsx")
dbDisconnect(con)
con <- dbConnect(SQLite(), ":memory:")
copy_to(con, lapd_vict_age_bins)
dbGetQuery(con, '
SELECT DISTINCT "Age_Group",
IIF("Age_Group" = "Toddler", "a) 1-4",
IIF("Age_Group" = "Kid", "b) 5-12",
IIF("Age_Group" = "Teenager", "c) 13-17",
IIF("Age_Group" = "Young Adult", "d) 18-25",
IIF("Age_Group" = "Adult", "e) 26-39",
IIF("Age_Group" = "Middle Age", "f) 40-65", "g) 66-plus")))))) as Age_Range
FROM lapd_vict_age_bins
')
## Age_Group Age_Range
## 1 Toddler a) 1-4
## 2 Kid b) 5-12
## 3 Teenager c) 13-17
## 4 Young Adult d) 18-25
## 5 Adult e) 26-39
## 6 Middle Age f) 40-65
## 7 Senior g) 66-plus
I have 7 different categories with their respective age ranges.
Lets take a look at total count of crime for each age group
dbGetQuery(con, '
SELECT "Age_Group", COUNT("Age_Group") as Age_Group_Total
FROM lapd_vict_age_bins
GROUP BY "Age_Group"
ORDER BY Age_Group_Total ASC;
')
## Age_Group Age_Group_Total
## 1 Toddler 1246
## 2 Kid 5964
## 3 Teenager 14935
## 4 Senior 38578
## 5 Young Adult 96117
## 6 Middle Age 231759
## 7 Adult 235528
Here we can see that most crime in Los Angeles effects the middle age and Adult populations.