PLS 120 Assignment 1

Assignment 1

Name: [Joseph Kehoe] ID: [920697315] Date: [Oct 4 2023]

You will be writing code into what are called “chunks”. This is where you can actually run code. Chunks are delineated by the ``` marks you see around the large grey boxes. Written answers should be given outside of the chunks. You will be turning in the .Rmd with your name in the title, as well as a knit html file.

For this assignment, and the next several assignments, use the following data set, which examines crime statistics in LA. This dataset, LA_Crime.xlsx, reflects incidents of crime in the City of Los Angeles on the 1st of September, 2023 (i.e. only one single day). The source of the data can be found here: https://catalog.data.gov/dataset/crime-data-from-2020-to-present

The original dataset comprises more than 800,000 entries. However, I have narrowed it down to data from just one day, September 1, 2023, focusing solely on individuals identified as female (F) or male (M). The trimmed dataset consists of two columns: “Vict Age,” representing the age of the victim, and “Vict Sex,” indicating the victim’s gender. This reduced Excel file contains a total of 555 rows. Utilize this dataset to accomplish the following tasks.

RUN THE FOLLOWING CHUNK BEFORE STARTING ONLY IF YOU HAVE FRESHLY INSTALLED R ON YOUR OWN COMPUTER.

Otherwise, you can delete this chunk.

PART 1: Load the data into R for use in the rest of the assignment

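# load the data into R (this path points to the author's computer; change it to wherever you saved la_crime.csv)
# read.csv() converts the column names "Vict Age" and "Vict Sex" to Vict.Age and Vict.Sex
La.crime <- read.csv("C:/Users/Joey/Downloads/la_crime.csv")

# examine the data
str(La.crime)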

The str() function will tell you a lot about your data, such as the type of each variable, how many levels a factor variable has, and the dimensions of your data frame. Using str(), answer the following questions:

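What type of data is “Vict Age”? How many levels does it have? Integer (int). It has no levels, because only factor variables have levels.
What type of data is “Vict Sex”? How many levels does it have? Character (chr). As a character variable it has no levels in str(), but it takes two distinct values, "F" and "M", which would become two levels if converted to a factor.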
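To see what str() reports and why only factors have levels, here is a minimal sketch on a small made-up data frame. The values and the name `toy` are hypothetical, not rows from LA_Crime, and the sketch assumes R 4.0 or later, where `data.frame()` and `read.csv()` keep text columns as character rather than factor by default.

```r
# Hypothetical toy data mimicking the two columns after read.csv()
# (made-up values, NOT the LA_Crime data)
toy <- data.frame(
  Vict.Age = c(34L, 27L, 45L, 19L),
  Vict.Sex = c("F", "M", "M", "F")
)

# str() lists each column's type: Vict.Age is int, Vict.Sex is chr
str(toy)

# Character and integer columns have no levels; only factors do
levels(toy$Vict.Sex)    # NULL

# Converting to a factor creates one level per distinct value
toy$Vict.Sex <- factor(toy$Vict.Sex)
levels(toy$Vict.Sex)    # returns c("F", "M")
nlevels(toy$Vict.Sex)   # 2
```

On the real data, str(La.crime) reports the same kind of information for every column.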

PART 2: Create a frequency table and histogram for victim age

Visualization is often a good way to observe macro-level trends in data sets. A common method is the histogram, which shows how many observations fall into each range (bin) of values. In this exercise, you will be making a histogram based on the ages of the victims in the dataset. A frequency table presents the same kind of information as a table: the number of victims in each age group.
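As an illustration of what a frequency table counts, the sketch below splits a short vector of made-up ages (hypothetical values, not from the dataset) into ten equal-width groups with `cut()` and counts each group with `table()`.

```r
# Hypothetical ages, for illustration only (not the LA_Crime data)
ages <- c(5, 18, 22, 25, 31, 33, 38, 47, 52, 64, 70, 81)

# Ten equal-width groups: 11 breakpoints from 0 to 100 in steps of 10.
# include.lowest = TRUE keeps an age of exactly 0 in the first group.
age_groups <- cut(ages, breaks = seq(0, 100, by = 10), include.lowest = TRUE)

# table() counts how many ages fall into each group (empty groups show 0)
freq <- table(age_groups)
freq

# The group with the largest count is the most common age range
names(which.max(freq))   # "(30,40]" for these made-up ages
```

Ages outside the break range become NA in `cut()`, and `table()` drops NA values by default, so the breakpoints should span the full range of the data.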

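# create a frequency table of sex by age, with ages divided into ten equal-width groups
age_breaks <- seq(min(La.crime$Vict.Age), max(La.crime$Vict.Age), length.out = 11)
frequency_table <- table(La.crime$Vict.Sex, cut(La.crime$Vict.Age, breaks = age_breaks, include.lowest = TRUE))
frequency_table

# histogram of victim age with R's default bins
hist(La.crime$Vict.Age)
# create a histogram with about 10 bins (hist() treats breaks = 10 as a suggestion)
hist(La.crime$Vict.Age, breaks = 10)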

Now that you have created a frequency table and histograms describing the crime statistics, what do they tell you about the age of victims of crimes? At what age is someone most likely to be a victim of crime, based on the data? Answer - Someone is most likely to be the victim of crime between the ages of 20 and 40.

PART 3: Create Male and Female tables

Now that we have observed trends in the overall dataset, we can dig a little deeper. We know the gender of each person in the dataset, so we may be able to learn more and make new inferences by understanding how each group is affected differently.
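One way to put numbers on the group comparison is to summarize age separately for each sex. The sketch below uses base R (`subset()` and `tapply()`) rather than dplyr, on a made-up data frame; the values and the names `toy`, `female`, and `male` are hypothetical.

```r
# Hypothetical toy data, for illustration only (not the LA_Crime data)
toy <- data.frame(
  Vict.Age = c(19, 24, 28, 35, 22, 41, 50, 63, 30, 27),
  Vict.Sex = c("F", "F", "F", "F", "F", "M", "M", "M", "M", "M")
)

# Base R equivalent of dplyr's filter(Vict.Sex == "F") / filter(Vict.Sex == "M")
female <- subset(toy, Vict.Sex == "F")
male   <- subset(toy, Vict.Sex == "M")

# Number of victims in each group
table(toy$Vict.Sex)

# Median victim age in each group: F 24, M 41 for these made-up values
tapply(toy$Vict.Age, toy$Vict.Sex, median)
```

Comparing a summary such as the median alongside the histograms makes it easier to judge whether one group skews younger than the other.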

#create two new dataframes: one for male victims and one for female victims

library(tidyverse)


# filter() from dplyr (loaded with tidyverse) keeps only the rows that match a condition.
# We'll learn more about this later; here it selects male and female victims.
malecrime <- La.crime %>%
  filter(Vict.Sex == "M")
femalecrime <- La.crime %>%
  filter(Vict.Sex == "F")

# create a histogram for each data set with about 10 bins
hist(malecrime$Vict.Age, breaks = 10, main = "Male victims", xlab = "Victim age")
hist(femalecrime$Vict.Age, breaks = 10, main = "Female victims", xlab = "Victim age")

# adjust the bins to about 20
hist(malecrime$Vict.Age, breaks = 20, main = "Male victims", xlab = "Victim age")
hist(femalecrime$Vict.Age, breaks = 20, main = "Female victims", xlab = "Victim age")

Based on the histograms for the male and female victims, how does the trend in the data change? What new inferences can you make now that you have separated the data? Answer - Separating the data shows that female victims of crime tend to skew younger: among women, victims are concentrated in the younger age ranges, while there is much less of a distinction in victim age for men.
What does changing the bins do? How may this affect your interpretation of the data? Answer - Increasing the number of bins gives a more detailed picture of which age groups are most affected by crime. However, narrower bins each hold fewer victims, so random variation can look like a real pattern, while too few bins can hide differences between age groups. The same data can therefore suggest different conclusions depending on the bin width.
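To see what changing the bins does, the sketch below computes histogram counts without drawing them (`plot = FALSE`) on simulated ages; the simulated data are an assumption for illustration only. R's documentation notes that a single number passed to `breaks` is only a suggestion: `hist()` rounds it to "pretty" breakpoints, so `breaks = 10` and `breaks = 20` may produce the same bins. Passing an explicit vector of breakpoints fixes the bins exactly.

```r
# Simulated ages for illustration only (not the LA_Crime data)
set.seed(1)
ages <- round(runif(200, min = 18, max = 80))

# A single number is only a suggestion: compare the breakpoints hist() actually used
h10 <- hist(ages, breaks = 10, plot = FALSE)
h20 <- hist(ages, breaks = 20, plot = FALSE)
h10$breaks
h20$breaks

# An explicit vector of breakpoints gives exactly the bins you ask for
h_wide   <- hist(ages, breaks = seq(15, 80, by = 5), plot = FALSE)    # 13 bins
h_narrow <- hist(ages, breaks = seq(15, 80, by = 2.5), plot = FALSE)  # 26 bins

# Narrower bins hold fewer cases each, so bar heights vary more from bin to bin
length(h_wide$counts)
length(h_narrow$counts)
```

Printing `h10$breaks` and `h20$breaks` is a quick way to check whether a change in the requested number of bins actually changed the histogram.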


2024-09-30