Note: For this exam, don’t worry about including the correct labels and titles for the plots.

1. Loading data (5 points)

Question:

Load the data called riverton-crime.csv using the proper R command and file path.

Code (5 points):

library(tidyverse)  # provides read_csv(), %>%, and ggplot()
datriv <- read_csv("/Users/sujeyvilleda/Desktop/crim1200/Exam Practice/riverton-crime.csv")
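The absolute path above only works on the computer where it was written. As a minimal sketch, assuming a hypothetical toy file (toy-crime.csv, written to a temporary folder, not the exam data), the same read_csv() call can be paired with file.path() and a file.exists() check so a wrong path is caught early:

```r
library(readr)

# Hypothetical toy file so the example runs anywhere;
# the real exam file is riverton-crime.csv.
toy_path <- file.path(tempdir(), "toy-crime.csv")
write_csv(data.frame(id = 1:3, gender = c("Male", "Female", "Male")), toy_path)

# Stop with an error if the path is wrong, then load the file
stopifnot(file.exists(toy_path))
toy <- read_csv(toy_path, show_col_types = FALSE)
toy
```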

2. Variable types (5 points)

Question:

Read the codebook. What types of statistical variables are these? Variables: id, gender, offense_type, num_prior_arrests, and time_served_months.

Text answer (5 points):

3. Data dimensions (5 points)

Question:

How many observations and variables are there in the data?

Code (3 points):

dim(datriv)
## [1] 500   6

Text answer (2 points):

There are 500 observations and 6 variables in the data.

4. Quantitative EDA (5 points)

Question:

What percentage of the sample is male?

Code (3 points):

datriv %>%
  count(gender) %>%
  mutate(percent = round(100 * n / sum(n), 1))
## # A tibble: 2 × 3
##   gender     n percent
##   <chr>  <int>   <dbl>
## 1 Female   122    24.4
## 2 Male     378    75.6

Text answer (2 points):

75.6% of the sample is male.
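The same percentage can also be computed in one step, because the mean of a TRUE/FALSE vector is the proportion of TRUE values. This is only a sketch on hypothetical toy data (the `toy` tibble is made up), not the exam data:

```r
library(dplyr)

# Hypothetical toy data: 3 males and 1 female
toy <- tibble(gender = c("Male", "Male", "Female", "Male"))

# gender == "Male" is TRUE/FALSE; its mean is the share of males
pct_male <- toy %>%
  summarise(percent_male = round(100 * mean(gender == "Male"), 1)) %>%
  pull(percent_male)
pct_male
```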

5. Visual EDA (5 points)

Question:

Make a histogram for number of prior arrests and look for an appropriate number of bins. Describe this histogram.

Code (3 points):

view(datriv)
datriv %>% ggplot(aes(x=num_prior_arrests)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Text answer (2 points):

The histogram for number of prior arrests is unimodal and very slightly skewed right with a short tail. There are no visible outliers.
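The message about `bins = 30` means the default was used. Since the number of prior arrests is a whole-number count, one reasonable choice is one bar per count. The sketch below assumes hypothetical toy values (not the exam data) and uses `binwidth = 1`; `center = 0` is an extra choice made here so the bars sit on whole numbers:

```r
library(ggplot2)

# Hypothetical toy counts standing in for num_prior_arrests
toy <- data.frame(num_prior_arrests = c(0, 1, 1, 2, 3, 3, 3, 4, 5, 7))

# binwidth = 1 gives one bar per integer value;
# center = 0 centers the bars on whole numbers
p <- ggplot(toy, aes(x = num_prior_arrests)) +
  geom_histogram(binwidth = 1, center = 0)
p
```

Trying a few values of `binwidth` (or `bins`) and comparing the shapes is a simple way to pick an appropriate number.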

6. Quantitative EDA (5 points)

Question:

Find statistics to describe the centrality and spread of the variable number of prior arrests. Are these numbers surprising?

Code (3 points):

summary_stats2 <- datriv %>%
  summarise(
    count = n(),
    mean = mean(num_prior_arrests, na.rm = TRUE),
    median = median(num_prior_arrests, na.rm = TRUE),
    sd = sd(num_prior_arrests, na.rm = TRUE),
    IQR = IQR(num_prior_arrests, na.rm = TRUE),
    min = min(num_prior_arrests, na.rm = TRUE),
    max = max(num_prior_arrests, na.rm = TRUE))
summary_stats2
## # A tibble: 1 × 7
##   count  mean median    sd   IQR   min   max
##   <int> <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>
## 1   500  3.43      3  2.02     3     0    10

Text answer (2 points):

The center of number of prior arrests is a median of 3, and the spread is an IQR of 3 as well. I think it is a bit surprising that they are the same number.
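The median measures center and the IQR measures spread, so the two do not have to match. As a small sketch on hypothetical toy values (not the exam data), the IQR is the distance between the third and first quartiles, and comparing the mean with the median gives a rough check on skew:

```r
# Hypothetical toy values
x <- c(0, 1, 2, 3, 3, 4, 5, 10)

# IQR is Q3 minus Q1
q <- quantile(x, probs = c(0.25, 0.75))
iqr_by_hand <- unname(q[2] - q[1])
iqr_by_hand
IQR(x)  # built-in function, same value

# A mean above the median is consistent with a right skew
mean(x)
median(x)
```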

7. Visual EDA (5 points)

Question:

Now split up the histogram by gender. Does it look like males tend to have more or fewer prior arrests than females?

Code (3 points):

datriv %>% ggplot(aes(x=num_prior_arrests)) + geom_histogram() + facet_grid(~gender)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Text answer (2 points):

It looks like males tend to have more prior arrests than females.
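Because the sample has many more males (378) than females (122), the male bars are taller for almost every value, which can make the histograms hard to compare by eye. A numeric summary by group is one way to check the visual impression. The sketch below uses hypothetical toy data (the `toy` tibble is made up), not the exam data:

```r
library(dplyr)

# Hypothetical toy data with unequal group sizes
toy <- tibble(
  gender = c("Female", "Female", "Male", "Male", "Male", "Male"),
  num_prior_arrests = c(1, 3, 2, 4, 4, 6)
)

# Center of prior arrests within each gender, plus group size
by_gender <- toy %>%
  group_by(gender) %>%
  summarise(
    mean_arrests = mean(num_prior_arrests),
    median_arrests = median(num_prior_arrests),
    n = n())
by_gender
```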

8. Visual EDA (5 points)

Question:

Make a barplot for offense_type and describe it.

Code (3 points):

datriv %>% ggplot(aes(x = offense_type)) + geom_bar() 

Text answer (2 points):

Property offenses are the most common in this data, followed by drug offenses and then violent offenses. Public order offenses are the least common.

9. Visual EDA (5 points)

Question:

Now split the barplot by gender. Does it look like males commit more violent crimes than females?

Code (3 points):

datriv %>% ggplot(aes(x = offense_type)) + geom_bar() + facet_grid(~gender)

Text answer (2 points):

It does look like males commit more violent crimes than females.
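Bar heights are raw counts, and the sample is 75.6% male, so male bars will tend to be taller for every offense type. One way to compare fairly is the share of each offense type within each gender. This is a sketch on hypothetical toy data (the `toy` tibble is made up), not the exam data:

```r
library(dplyr)

# Hypothetical toy data: 4 females and 8 males
toy <- tibble(
  gender = c(rep("Female", 4), rep("Male", 8)),
  offense_type = c("Violent", "Drug", "Property", "Property",
                   "Violent", "Violent", "Drug", "Drug",
                   "Property", "Property", "Property", "Property")
)

# Count each offense type per gender, then divide by the gender total
props <- toy %>%
  count(gender, offense_type) %>%
  group_by(gender) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
props
```

In this toy example males have more violent offenses by count (2 versus 1), but the share of violent offenses is the same in both groups (25%).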

10. Visual EDA (5 points)

Question:

Make a histogram for time served. Split it up by offense type. Does it look like public order offenses have shorter or longer time served?

Code (3 points):

datriv %>% ggplot(aes(x=time_served_months)) + geom_histogram() + facet_grid(~offense_type)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Text answer (2 points):

It looks like public order has a shorter time served.

Extra credit. Quantitative EDA (5 points extra credit)

Question:

Now find a measure of centrality to determine whether your guess in #10 is correct.

Note: I have given you part of the code. You need to fill in the rest: Write the statistic you want to see in the empty round brackets. Next in class, we will learn how to test these differences statistically.

Code (3 points):

datriv %>%
  group_by(offense_type) %>%
  summarise(
    # columns hold time served in months, so name them accordingly
    mean_months = mean(time_served_months, na.rm = TRUE),
    median_months = median(time_served_months, na.rm = TRUE),
    count = n())
## # A tibble: 4 × 4
##   offense_type mean_months median_months count
##   <chr>              <dbl>         <dbl> <int>
## 1 Drug                35.1          35.7   150
## 2 Property            29.1          29.1   160
## 3 Public Order        17.6          17.7    73
## 4 Violent             52.9          54.2   117

Text answer (2 points):

My guess is right. The mean (17.6 months) and median (17.7 months) of time served for public order offenses are lower than those for the other offense types.