Note: For this exam, don’t worry about including the correct labels and titles for the plots.
Question:
Load the data called riverton-crime.csv using the proper R command and file path.
Code (5 points):
library(tidyverse)
datriv <- read_csv("/Users/sujeyvilleda/Desktop/crim1200/Exam Practice/riverton-crime.csv")
Question:
Read the codebook. What types of (stat) variables are these? Variables: id, gender, offense_type, num_prior_arrests, and time_served_months.
Text answer (5 points):
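One way to start on this answer is to look at how R stores each column and then compare that with the codebook. The sketch below uses a small made-up data frame: the `toy` object and all of its values are hypothetical, chosen only to mirror the column names in the question. Storage type is a hint, not the answer; for example, an `id` stored as a number is still an identifier rather than a quantity.

```r
library(dplyr)

# Hypothetical stand-in for the exam data: same column names, made-up values.
toy <- tibble(
  id = 1:4,
  gender = c("Male", "Female", "Male", "Male"),
  offense_type = c("Drug", "Property", "Violent", "Public Order"),
  num_prior_arrests = c(0, 3, 5, 2),
  time_served_months = c(35.2, 28.9, 54.0, 17.5)
)

# Storage class of each column (integer, character, numeric).
storage_types <- sapply(toy, class)
storage_types

# Number of distinct values per column; few distinct values often
# points to a categorical variable.
distinct_values <- sapply(toy, n_distinct)
distinct_values
```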
Question:
How many observations and variables are there in the data?
Code (3 points):
dim(datriv)
## [1] 500 6
Text answer (2 points):
There are 500 observations and 6 variables in the data.
Question:
What percentage of the sample is male?
Code (3 points):
datriv %>%
  count(gender) %>%
  mutate(percent = round(100 * n / sum(n), 1))
## # A tibble: 2 × 3
## gender n percent
## <chr> <int> <dbl>
## 1 Female 122 24.4
## 2 Male 378 75.6
Text answer (2 points):
75.6% of the sample is male.
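The percentage can be double-checked from the two counts in the output above. A minimal sketch, assuming only those counts (the `gender` vector is rebuilt here for illustration and is not the original data):

```r
# Rebuild a gender vector from the reported counts: 122 Female, 378 Male.
gender <- rep(c("Female", "Male"), times = c(122, 378))

# The mean of a logical vector is the proportion of TRUE values.
pct_male <- round(100 * mean(gender == "Male"), 1)
pct_male
```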
Question:
Make a histogram for number of prior arrests and look for an appropriate number of bins. Describe this histogram.
Code (3 points):
datriv %>% ggplot(aes(x = num_prior_arrests)) + geom_histogram(binwidth = 1)
Text answer (2 points):
The histogram for number of prior arrests is unimodal and very slightly skewed right with a short tail. There aren't any visible outliers.
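Because the number of prior arrests is a whole-number count, one reasonable choice is a bin width of 1, so that each bar covers a single value. The sketch below shows this on made-up values. The `toy` data and the `boundary = -0.5` centering are illustrative assumptions, not part of the exam solution.

```r
library(ggplot2)

# Hypothetical arrest counts, used only to illustrate the bin choice.
toy <- data.frame(
  num_prior_arrests = c(0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 10)
)

# binwidth = 1 gives one bar per count value;
# boundary = -0.5 places bin edges at half-integers so bars sit on integers.
p <- ggplot(toy, aes(x = num_prior_arrests)) +
  geom_histogram(binwidth = 1, boundary = -0.5)
p

# Bar heights as computed by ggplot2, handy for checking the binning.
bin_counts <- layer_data(p)$count
bin_counts
```

Trying a few widths (for example 1 and 2) and keeping the one that shows the shape without gaps or spikes is a common way to settle on the bins.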
Question:
Find statistics to describe the centrality and spread of the variable number of prior arrests. Are these numbers surprising?
Code (3 points):
summary_stats2 <- datriv %>%
  summarise(
    count = n(),
    mean = mean(num_prior_arrests, na.rm = TRUE),
    median = median(num_prior_arrests, na.rm = TRUE),
    sd = sd(num_prior_arrests, na.rm = TRUE),
    IQR = IQR(num_prior_arrests, na.rm = TRUE),
    min = min(num_prior_arrests, na.rm = TRUE),
    max = max(num_prior_arrests, na.rm = TRUE))
summary_stats2
## # A tibble: 1 × 7
## count mean median sd IQR min max
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 500 3.43 3 2.02 3 0 10
Text answer (2 points):
The center of the number of prior arrests is a median of 3 (mean 3.43), and the spread is an IQR of 3 (standard deviation 2.02). I think it is a bit surprising that the median and the IQR are the same number.
Question:
Now split up the histogram by gender. Does it look like males tend to have more or fewer prior arrests than females?
Code (3 points):
datriv %>% ggplot(aes(x = num_prior_arrests)) + geom_histogram(binwidth = 1) + facet_grid(~gender)
Text answer (2 points):
It looks like males tend to have more prior arrests than females.
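The visual impression from the split histogram can be checked with grouped summary statistics. A minimal sketch on made-up data (the `toy` values are hypothetical; on the exam data the same pipeline would start from `datriv`):

```r
library(dplyr)

# Hypothetical data with the same two columns as the exam data.
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  num_prior_arrests = c(1, 2, 3, 2, 4, 5, 5)
)

# Center of prior arrests within each gender, plus group size.
by_gender <- toy %>%
  group_by(gender) %>%
  summarise(
    mean_arrests = mean(num_prior_arrests, na.rm = TRUE),
    median_arrests = median(num_prior_arrests, na.rm = TRUE),
    count = n()
  )
by_gender
```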
Question:
Make a barplot for offense_type and describe it.
Code (3 points):
datriv %>% ggplot(aes(x = offense_type)) + geom_bar()
Text answer (2 points):
Property offenses are the most common in this data, followed by drug offenses and then violent offenses. Public order offenses are the least common.
Question:
Now split the barplot by gender. Does it look like males commit more violent crimes than females?
Code (3 points):
datriv %>% ggplot(aes(x = offense_type)) + geom_bar() + facet_grid(~gender)
Text answer (2 points):
It does look like males commit more violent crimes than females.
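Raw bar heights can mislead here because the sample has about three times as many males as females (378 vs. 122), so males will tend to show taller bars for every offense type. One way to compare the groups fairly is the share of each offense type within each gender. The sketch below uses made-up data; the `toy` values and the `prop` column name are illustrative assumptions.

```r
library(dplyr)

# Hypothetical offenses: 4 females and 8 males.
toy <- tibble(
  gender = c(rep("Female", 4), rep("Male", 8)),
  offense_type = c("Drug", "Property", "Property", "Violent",
                   "Drug", "Drug", "Property", "Property",
                   "Violent", "Violent", "Violent", "Public Order")
)

# Count each gender/offense pair, then divide by the gender total
# so the proportions sum to 1 within each gender.
within_gender <- toy %>%
  count(gender, offense_type) %>%
  group_by(gender) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
within_gender
```

A plot-based version of the same idea is a stacked bar chart with `geom_bar(position = "fill")`, which rescales each bar to proportions.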
Question:
Make a histogram for time served. Split it up by offense type. Does it look like public order offenses have shorter or longer time served?
Code (3 points):
datriv %>% ggplot(aes(x=time_served_months)) + geom_histogram() + facet_grid(~offense_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Text answer (2 points):
It looks like public order has a shorter time served.
Question:
Now find a measure of centrality to determine whether your guess to #10 is correct.
Note: I have given you part of the code. You need to fill in the rest: Write the statistic you want to see in the empty round brackets. Next in the class, we will learn how to test these differences statistically.
Code (3 points):
datriv %>%
  group_by(offense_type) %>%
  summarise(
    mean_time_served = mean(time_served_months, na.rm = TRUE),
    median_time_served = median(time_served_months, na.rm = TRUE),
    count = n())
## # A tibble: 4 × 4
## offense_type mean_time_served median_time_served count
## <chr> <dbl> <dbl> <int>
## 1 Drug 35.1 35.7 150
## 2 Property 29.1 29.1 160
## 3 Public Order 17.6 17.7 73
## 4 Violent 52.9 54.2 117
Text answer (2 points):
My guess is right. The mean (17.6 months) and median (17.7 months) of time served for public order offenses are lower than those for every other offense type.
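To see how large the gap is, the medians in the summary output above can be compared directly. A small sketch that re-enters the reported values by hand (the `reported` table and the `diff_from_public_order` column name are introduced here for illustration only):

```r
library(dplyr)

# Means and medians copied from the summary output above (months).
reported <- tibble(
  offense_type = c("Drug", "Property", "Public Order", "Violent"),
  mean_time = c(35.1, 29.1, 17.6, 52.9),
  median_time = c(35.7, 29.1, 17.7, 54.2)
)

public_order_median <-
  reported$median_time[reported$offense_type == "Public Order"]

# Difference in median months served relative to public order offenses,
# sorted from shortest to longest median.
gaps <- reported %>%
  mutate(diff_from_public_order = round(median_time - public_order_median, 1)) %>%
  arrange(median_time)
gaps
```

These are descriptive differences only; whether they are statistically meaningful is a separate question from what the table shows.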