Assignment 1
Name: Sydney Goodman ID: 919224391 Date: 10/04/2024
PART 1: Load the data into R for use in the rest of the assignment
# check that the data file is in Downloads before reading it
file.exists("~/Downloads/la_crime.csv")
## [1] TRUE
# make Downloads the working directory so the file can be read by name
setwd("~/Downloads")
getwd()
## [1] "/Users/syd/Downloads"
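# read the CSV into a data frame
LA_Crime <- read.csv("la_crime.csv")
# open the help page for read.csv()
?read.csv
# inspect the variables, their types, and the dimensions of the data frame
str(LA_Crime)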
## 'data.frame': 555 obs. of 2 variables:
## $ Vict.Age: int 23 23 56 40 27 63 30 18 32 54 ...
## $ Vict.Sex: chr "F" "M" "F" "M" ...
The str() function tells you a lot about your data, such as the type of each variable, the levels of any factor, and the dimensions of your data frame. Using str(), answer the following questions:
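Victim age (Vict.Age) is stored as an integer (int). Because it is not a factor, it has no levels; instead it takes 70 distinct values (ages), a count that comes from the frequency table in Part 2 rather than from str(). The data frame has 555 observations of 2 variables.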
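str() reports the storage type but not the number of distinct values. The sketch below checks both in base R. Because la_crime.csv is not included here, it rebuilds a hypothetical `ages` vector from the counts printed in the Part 2 frequency table (same counts, different row order); on the real data, use `LA_Crime$Vict.Age` in place of `ages`.

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
# (counts match the printed table; row order does not match the CSV).
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
stopifnot(length(counts) == length(ages_seen))
ages <- rep(ages_seen, times = counts)

is.integer(ages)      # TRUE: stored as integer, like Vict.Age
nlevels(ages)         # 0: only factors have levels
length(unique(ages))  # 70 distinct ages
length(ages)          # 555 victims
```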
PART 2: Create a frequency table and histogram for victim age
# create a frequency table: one count per distinct victim age
table(LA_Crime$Vict.Age)
##
## 8 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## 2 1 2 2 3 5 3 5 8 6 5 9 11 12 15 12 14 13 24 18 12 16 12 22 16 15
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
## 19 12 13 11 12 11 11 8 11 8 6 10 8 4 10 9 7 10 4 7 5 9 12 10 3 5
## 63 64 65 66 67 68 69 70 71 73 74 75 76 77 78 80 85 91
## 7 4 4 1 4 4 4 1 5 7 5 1 2 1 4 1 1 1
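table() counts each distinct age, so the table above has one column per age rather than ten groups. To actually divide ages into ten groups, one option is cut() with ten-year break points; the break points are an assumption chosen to cover the observed range (8 to 91). As before, the hypothetical `ages` vector is rebuilt from the printed counts because the CSV is not included here.

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
ages <- rep(ages_seen, times = counts)

# Ten left-closed intervals: [0,10), [10,20), ..., [90,100)
age_groups <- cut(ages, breaks = seq(0, 100, by = 10), right = FALSE)
table(age_groups)
```

With these break points the 30 to 39 group is the largest (155 victims), even though the single most common age is 29.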
# create a histogram; breaks = 10 requests about 10 bins (R adjusts this to "pretty" break points)
hist(LA_Crime$Vict.Age, breaks = 10)
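hist() treats a single number for `breaks` as a suggestion, so the number of bars can differ from 10. Calling hist() with `plot = FALSE` returns the break points and counts without drawing, and passing a vector of break points fixes the bins exactly. A sketch on the same hypothetical rebuilt `ages` vector:

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
ages <- rep(ages_seen, times = counts)

# Let R choose the bins, then inspect what it picked
h_auto <- hist(ages, breaks = 10, plot = FALSE)
h_auto$breaks          # break points R actually used
length(h_auto$counts)  # number of bars that would be drawn

# Force exactly ten ten-year bins: [0,10], (10,20], ..., (90,100]
h_fixed <- hist(ages, breaks = seq(0, 100, by = 10), plot = FALSE)
h_fixed$counts
```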
Now that you have summarized victim age with a frequency table and a histogram, what do they tell you about the ages of crime victims? At what age is someone most likely to be a victim of crime, based on the data?
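One way to answer the second question numerically is to take the tallest entry of the frequency table with which.max(). On the real data this would be `table(LA_Crime$Vict.Age)`; here the hypothetical `ages` vector is again rebuilt from the printed counts.

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
ages <- rep(ages_seen, times = counts)

# The modal age is the name of the largest count in the frequency table
age_counts <- table(ages)
names(which.max(age_counts))  # "29"
max(age_counts)               # 24 victims
```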
PART 3: Create Male and Female tables
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
# keep only male victims
crime_male <- LA_Crime %>%
  filter(Vict.Sex == "M")
# keep only female victims
crime_female <- LA_Crime %>%
  filter(Vict.Sex == "F")
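filter() keeps the rows where the condition is TRUE. Base R's subset() behaves the same way (both drop rows where the condition is NA), and a quick row count shows whether the two subsets cover every victim. If Vict.Sex contains codes other than "M" and "F" (worth checking with `table(LA_Crime$Vict.Sex)`), those victims appear in neither table. The `toy` data frame below is illustrative only: it holds the first four rows displayed by str(LA_Crime).

```r
# Toy data: the first four rows shown by str(LA_Crime) above
toy <- data.frame(Vict.Age = c(23L, 23L, 56L, 40L),
                  Vict.Sex = c("F", "M", "F", "M"))

# Base-R equivalents of the dplyr::filter() calls
crime_male   <- subset(toy, Vict.Sex == "M")
crime_female <- subset(toy, Vict.Sex == "F")

# Rows with any other sex code (or NA) would be missing from both subsets
n_missing <- nrow(toy) - nrow(crime_male) - nrow(crime_female)
n_missing  # 0 for this toy data
```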
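# victim-age histograms by sex, about 10 bins each, on a shared 0-100 x-axis
hist(crime_male$Vict.Age, breaks = 10, xlim = c(0,100))
hist(crime_female$Vict.Age, breaks = 10, xlim = c(0,100))
# the same histograms with about 20 bins each, for a finer view
hist(crime_male$Vict.Age, breaks = 20, xlim = c(0,100))
hist(crime_female$Vict.Age, breaks = 20, xlim = c(0,100))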
The age distribution of victims differs by sex. In these histograms, female victims are most concentrated between ages 25 and 30, while male victims are most concentrated between ages 35 and 40.
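The peak age ranges above are read off the plots; hist(..., plot = FALSE) returns the underlying counts, so the reading can be checked numerically. The helper `peak_bin()` below is hypothetical and reports the tallest bin. Per-sex counts are not printed in this document, so the sketch runs on all victims using the hypothetical `ages` vector rebuilt from the Part 2 table; on the real data you would call `peak_bin(crime_female$Vict.Age, breaks = seq(0, 100, by = 5))` and the same for `crime_male`.

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
ages <- rep(ages_seen, times = counts)

# Hypothetical helper: the tallest histogram bin for a vector of ages.
# hist() bins are right-closed, so lower = 25, upper = 30 means ages 26-30.
peak_bin <- function(x, breaks) {
  h <- hist(x, breaks = breaks, plot = FALSE)
  i <- which.max(h$counts)
  c(lower = h$breaks[i], upper = h$breaks[i + 1], count = h$counts[i])
}

res <- peak_bin(ages, breaks = seq(0, 100, by = 5))
res  # lower 25, upper 30, count 81
```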
Increasing breaks from 10 to 20 produced more, narrower bars between each 20-year tick mark on the x-axis, so each bar covers a smaller age range. With more bins it was easier to narrow down a more exact age group.
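The effect of changing `breaks` can be measured directly by comparing the bin width R chooses for each setting. A sketch on the hypothetical `ages` vector rebuilt from the Part 2 table; the exact widths on the male and female subsets may differ because their age ranges differ.

```r
# Hypothetical reconstruction of Vict.Age from the Part 2 frequency table
ages_seen <- c(8L, 12:71, 73:78, 80L, 85L, 91L)
counts <- c(2, 1, 2, 2, 3, 5, 3, 5, 8, 6, 5, 9, 11,
            12, 15, 12, 14, 13, 24, 18, 12, 16, 12, 22, 16, 15,
            19, 12, 13, 11, 12, 11, 11, 8, 11, 8, 6, 10, 8,
            4, 10, 9, 7, 10, 4, 7, 5, 9, 12, 10, 3, 5,
            7, 4, 4, 1, 4, 4, 4, 1, 5, 7, 5, 1, 2, 1, 4, 1, 1, 1)
ages <- rep(ages_seen, times = counts)

# Width (in years) of the bins R picks for each breaks setting
w10 <- diff(hist(ages, breaks = 10, plot = FALSE)$breaks)[1]
w20 <- diff(hist(ages, breaks = 20, plot = FALSE)$breaks)[1]
c(breaks_10 = w10, breaks_20 = w20)
```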