Synopsis

For my final project I will analyze homicide data in the United States from 1980 to 2014. The data comes from the Murder Accountability Project, which aims to use data to predict criminal activity and decrease the rate of unsolved homicides in the United States. This data set is one of the most comprehensive public records of homicide data. It contains the following variables of interest: State, City, Year, Incident, Crime Type, Crime Solved, Victim Sex, Victim Age, Victim Race, Perpetrator Sex, Perpetrator Age, Perpetrator Race, Relationship to Victim and Weapon Used. Using this data, I would like to track national homicide activity from 1980 to 2014 in order to answer the following questions: How have homicide rates changed over time, both nationally and by state? What factors are most important in predicting victim profiles? Are investigators becoming more effective at solving and closing homicide cases? Through these questions I would like to better identify and summarize what criminal activity looks like in the U.S.

Analysis and Techniques

The data set includes nearly 639,000 rows, so my primary concern will be keeping the scope of the project tight and manageable. This analysis will require summary statistics of the variables of interest, substantial data filtering, regression analysis and visualization in ggplot2. The packages I tentatively expect to need come from the tidyverse, including ggplot2, dplyr and tidyr.
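As a rough illustration of the filtering and summary steps, the sketch below uses dplyr to count incidents per year and compute the share marked as solved. The toy rows and the column names (year, state, crime_solved) are hypothetical stand-ins, not the actual file layout, and the real data would be read from disk instead. Note that these are counts, not rates: a true homicide rate would also need population figures, which are not among the variables listed above.

```r
library(dplyr)

# Hypothetical toy rows standing in for the real data set;
# the column names are assumptions made for illustration.
homicides <- tibble(
  year         = c(1980, 1980, 1980, 2014, 2014),
  state        = c("Ohio", "Ohio", "Texas", "Ohio", "Texas"),
  crime_solved = c("Yes", "No", "Yes", "Yes", "No")
)

# For each year, count incidents and compute the share marked as solved.
yearly <- homicides %>%
  group_by(year) %>%
  summarise(
    incidents    = n(),
    solved_share = mean(crime_solved == "Yes"),
    .groups      = "drop"
  ) %>%
  arrange(year)

print(yearly)
```

Adding state to the group_by() call would give the same summary broken down by state, which is one way the by-state question could be approached.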

Why This Matters

Reducing homicide rates is far more layered and complex than simply analyzing historical trends. Socioeconomic, political, psychological, educational and a slew of other issues complicate that objective. Building quantitative profiles, however, helps reduce the uncertainty surrounding why homicides occur and improves our understanding of how to assess a case.