Assignment 1

Name: Sydney Goodman ID: 919224391 Date: 10/04/2024

PART 1: Load the data into R for use in the rest of the assignment

file.exists("~/Downloads/la_crime.csv")
## [1] TRUE
setwd("~/Downloads") 
getwd() 
## [1] "/Users/syd/Downloads"
LA_Crime <- read.csv("la_crime.csv")
?read.csv

str(LA_Crime)
## 'data.frame':    555 obs. of  2 variables:
##  $ Vict.Age: int  23 23 56 40 27 63 30 18 32 54 ...
##  $ Vict.Sex: chr  "F" "M" "F" "M" ...

The str() function tells you a lot about your data, such as the type of each variable, how many levels a factor has, and the dimensions of your data frame. Using str(), answer the following questions:

  1. What type of data is “Vict Age”? How many levels does it have?

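Vict.Age is an integer (int) variable. Integers are not factors, so it has no levels; it does take 70 distinct values, which the frequency table in Part 2 lists.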

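  2. What type of data is “Vict Sex”? How many levels does it have?

Vict.Sex is stored as a character (chr) variable, not a factor, so it has no levels as loaded. The values shown by str() are “F” and “M”; a column holding only those two values would have 2 levels once converted with factor().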
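The two answers above can be checked directly instead of being read off str(). The following is a minimal sketch, assuming a small made-up data frame named toy that stands in for LA_Crime (its values are illustrative and are not taken from la_crime.csv):

```r
# Hypothetical stand-in for LA_Crime; the values are made up for illustration.
# stringsAsFactors = FALSE is spelled out so text stays character,
# which is also the default in R 4.0 and later.
toy <- data.frame(
  Vict.Age = c(23L, 23L, 56L, 40L, 27L),
  Vict.Sex = c("F", "M", "F", "M", "F"),
  stringsAsFactors = FALSE
)

class(toy$Vict.Age)           # storage type of the age column
length(unique(toy$Vict.Age))  # number of distinct ages
class(toy$Vict.Sex)           # storage type of the sex column
levels(toy$Vict.Sex)          # NULL, because a character vector has no levels

# Levels exist only for factors, so convert first and then count them
sex_factor <- factor(toy$Vict.Sex)
levels(sex_factor)
nlevels(sex_factor)
```

Running the same calls on LA_Crime$Vict.Age and LA_Crime$Vict.Sex would confirm the types and counts for the real data.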

PART 2: Create a frequency table and histogram for victim age

#create a frequency table of victim age (one count per distinct age)
table(LA_Crime$Vict.Age)
## 
##  8 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 
##  2  1  2  2  3  5  3  5  8  6  5  9 11 12 15 12 14 13 24 18 12 16 12 22 16 15 
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 
## 19 12 13 11 12 11 11  8 11  8  6 10  8  4 10  9  7 10  4  7  5  9 12 10  3  5 
## 63 64 65 66 67 68 69 70 71 73 74 75 76 77 78 80 85 91 
##  7  4  4  1  4  4  4  1  5  7  5  1  2  1  4  1  1  1
#create a histogram with 10 bins
hist(LA_Crime$Vict.Age, breaks = 10)

Now that you have created a frequency table and a histogram describing the crime statistics, what do they tell you about the age of victims of crimes? At what age is someone most likely to be a victim of crime, based on the data?

Based on the histogram, victims are most often between 30 and 40 years old, and the 20 to 30 group is the second most common. In the frequency table, the single most frequent age is 29, with 24 victims.
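A frequency table with ages divided into groups, rather than one count per age, can be built with cut(). This is a sketch under stated assumptions: the ages vector is made up for illustration (it is not from la_crime.csv), and the 10-year group boundaries are one reasonable choice among several.

```r
# Hypothetical ages, made up for illustration (not from la_crime.csv)
ages <- c(8, 15, 19, 22, 25, 29, 29, 31, 34, 36, 38, 40, 45, 52, 67, 91)

# Ten fixed 10-year groups; right = TRUE makes each group (lower, upper],
# the same right-closed convention hist() uses by default
age_breaks <- seq(0, 100, by = 10)
age_groups <- cut(ages, breaks = age_breaks, right = TRUE)

# Frequency table by age group
group_counts <- table(age_groups)
group_counts

# Label of the group with the most victims
names(group_counts)[which.max(group_counts)]
```

Replacing ages with LA_Crime$Vict.Age would give the grouped counts for the real data, which can then be compared against the histogram bars.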

PART 3: Create Male and Female tables

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
#keep only the rows for male victims
crime_male <- LA_Crime %>%
  filter(Vict.Sex == "M")

#keep only the rows for female victims
crime_female <- LA_Crime %>%
  filter(Vict.Sex == "F")

hist(crime_male$Vict.Age, breaks = 10, xlim = c(0,100))

hist(crime_female$Vict.Age, breaks = 10, xlim = c(0,100))

hist(crime_male$Vict.Age, breaks = 20, xlim = c(0,100))

hist(crime_female$Vict.Age, breaks = 20, xlim = c(0,100))
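As a numeric cross-check on the histograms, age group can be cross-tabulated against sex so that the peak group for each sex is read from counts rather than from bar heights. This is a sketch only: the toy data frame is made up for illustration (it is not from la_crime.csv), and the 5-year groups are an assumption.

```r
# Hypothetical records, made up for illustration (not from la_crime.csv)
toy <- data.frame(
  Vict.Age = c(26, 28, 29, 33, 41, 36, 38, 39, 27, 52),
  Vict.Sex = c("F", "F", "F", "F", "F", "M", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# 5-year age groups, right-closed like hist()'s default bins
toy$age_group <- cut(toy$Vict.Age, breaks = seq(0, 100, by = 5))

# Rows are age groups, columns are sexes
by_sex <- table(toy$age_group, toy$Vict.Sex)

# Row index of the largest count in each column, then its group label
peak_index <- apply(by_sex, 2, which.max)
peak_group <- setNames(rownames(by_sex)[peak_index], colnames(by_sex))
peak_group
```

The same steps applied to LA_Crime would show whether the peak age group really differs between male and female victims.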

  1. Based on the histograms for the male and female victims, how does the trend in the data change? What new inferences can you make now that you have separated the data?

The peak of the age distribution shifts by sex. Female victims are most often between 25 and 30 years old, while male victims are most often between 35 and 40. Separating the data shows that the single peak in the combined histogram hides two different patterns.

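  2. What does changing the bins do? How may this affect your interpretation of the data?

Increasing breaks from 10 to 20 split the same age range into narrower bins, so more bars fell within each 20-year division on the axis. Narrower bins made it easier to pin down a more exact age group, although each bar then covers fewer victims, so small differences between neighboring bars should be read with caution.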
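The effect of the breaks argument can be inspected without drawing anything. When breaks is a single number, hist() treats it as a suggestion and picks rounded ("pretty") bin edges, so the actual number of bins may differ from the number requested. The sketch below assumes a made-up ages vector (not from la_crime.csv) and compares the bins chosen for the two settings:

```r
# Hypothetical ages, made up for illustration (not from la_crime.csv)
ages <- c(8, 12, 15, 18, 19, 22, 23, 25, 27, 29, 29, 30, 31, 34, 34,
          37, 40, 42, 45, 48, 52, 54, 59, 60, 63, 71, 73, 78, 85, 91)

# plot = FALSE returns the bin edges and counts without drawing a plot
h10 <- hist(ages, breaks = 10, plot = FALSE)
h20 <- hist(ages, breaks = 20, plot = FALSE)

# Width of the first bin under each setting
diff(h10$breaks)[1]
diff(h20$breaks)[1]

# Number of bins actually used under each setting
length(h10$counts)
length(h20$counts)
```

Every observation is still counted exactly once in both cases; only the width of the bins, and therefore the level of detail, changes.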