Note: For this exam, don’t worry about including the correct labels and titles for the plots.
Question:
Load the data called riverton-crime.csv using the proper R command and file path.
Code (5 points):
library(tidyverse)  # provides %>%, count(), mutate(), summarize(), and ggplot() used below
dat_riverton <- read.csv("C:/Users/evely/AppData/Roaming/Microsoft/Windows/Network Shortcuts/Penn Classes/CRIM/Data/riverton-crime.csv")
Question:
Read the codebook. What statistical types of variables are these? Variables: id, gender, offense_type, num_prior_arrests, and time_served_months.
Text answer (5 points):
Question:
How many observations and variables are there in the data?
Code (3 points):
dim(dat_riverton)
## [1] 500 6
Text answer (2 points):
There are 500 observations of 6 variables.
Question:
What percentage of the sample is male?
Code (3 points):
dat_riverton %>%
count(gender) %>%
mutate(prop = prop.table(n))
##   gender   n  prop
## 1 Female 122 0.244
## 2   Male 378 0.756
Text answer (2 points):
75.6% of the sample is male.
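As a cross-check on the table above, the percentage can also be computed directly. This is a minimal sketch, assuming a toy `gender` vector rebuilt from the reported counts (122 Female, 378 Male) rather than the actual `dat_riverton` data; the name `pct_male` is introduced here for illustration only.

```r
# Toy vector rebuilt from the counts reported above (an assumption;
# with the real data this would be dat_riverton$gender)
gender <- rep(c("Female", "Male"), times = c(122, 378))

# The mean of a logical vector is the proportion of TRUE values,
# so multiplying by 100 gives the percentage of males
pct_male <- mean(gender == "Male") * 100
pct_male
```

With the real data, the same expression would use `dat_riverton$gender` in place of the toy vector.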
Question:
Make a histogram for number of prior arrests and look for an appropriate number of bins. Describe this histogram.
Code (3 points):
dat_riverton %>%
ggplot(aes(x = num_prior_arrests)) + geom_histogram(binwidth = 1)
Text answer (2 points):
The histogram is unimodal and slightly right-skewed, with a short right tail and no noticeable outliers.
Question:
Find statistics to describe the centrality and spread of the variable number of prior arrests. Are these numbers surprising?
Code (3 points):
summ_stats_num_prior_arrest <- dat_riverton %>% summarize(median=median(num_prior_arrests),
IQR = IQR(num_prior_arrests))
summ_stats_num_prior_arrest
##   median IQR
## 1      3   3
Text answer (2 points):
It’s surprising that the median is 3, because this means at least half of the individuals have 3 or more prior arrests, rather than clustering at 0 or 1 as you might expect. The IQR of 3 shows that the middle half of the sample falls within a span of 3 arrests.
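To make the two statistics concrete, here is a small sketch showing that the IQR is the distance between the third and first quartiles, and that a mean above the median is consistent with right skew. The vector `x` is hypothetical toy data, not values from the Riverton file.

```r
# Hypothetical right-skewed counts (NOT taken from riverton-crime.csv)
x <- c(0, 1, 2, 2, 3, 3, 4, 5, 6, 9)

# First and third quartiles, using R's default quantile method
q <- quantile(x, probs = c(0.25, 0.75), names = FALSE)

# The IQR is the width of the middle half of the data: Q3 - Q1
iqr_manual <- q[2] - q[1]

median(x)    # center of the distribution
iqr_manual   # should agree with IQR(x)
mean(x)      # pulled above the median by the long right tail
```

Because the median and IQR depend only on the ordering of the middle values, they are less sensitive to a long tail than the mean and standard deviation.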
Question:
Now split up the histogram by gender. Does it look like males tend to have more or fewer prior arrests than females?
Code (3 points):
dat_riverton %>%
ggplot(aes(x=num_prior_arrests, fill=gender)) + geom_histogram(binwidth = 1)
Text answer (2 points):
For the most part, males tend to have more prior arrests than females.
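Because the sample contains far more males than females, bar heights in a stacked histogram partly reflect group size. One way to check the visual impression is a per-group summary. The sketch below uses hypothetical toy values (not the Riverton data), and `med_by_gender` is a name introduced for illustration.

```r
# Hypothetical toy data with unequal group sizes (NOT the real data)
toy <- data.frame(
  gender = c(rep("Female", 5), rep("Male", 8)),
  num_prior_arrests = c(0, 1, 1, 2, 3,            # Female
                        1, 2, 3, 3, 4, 5, 6, 7)   # Male
)

# Median prior arrests within each gender; unlike raw bar heights,
# this does not depend on how many people are in each group
med_by_gender <- tapply(toy$num_prior_arrests, toy$gender, median)
med_by_gender
```

The same idea applies to the real data by replacing `toy` with `dat_riverton`.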
Question:
Make a barplot for offense_type and describe it.
Code (3 points):
dat_riverton %>%
ggplot(aes(x=offense_type)) + geom_bar()
Text answer (2 points):
Property offenses are the most common, followed by drug, violent, and public order offenses.
Question:
Now split the barplot by gender. Does it look like males commit more violent crimes than females?
Code (3 points):
dat_riverton %>%
ggplot(aes(x=offense_type, fill=gender)) + geom_bar()
Text answer (2 points):
The barplot illustrates that males do commit more violent crimes than females do.
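Raw counts can differ simply because one group is larger, so it may help to compare the share of each offense type within each gender as well. The sketch below is a minimal illustration on hypothetical toy counts (not the Riverton data), chosen so that males have more violent offenses in absolute terms but the same share as females.

```r
# Hypothetical toy data (NOT the real data): 40 females, 120 males
toy <- data.frame(
  gender = rep(c("Female", "Male"), times = c(40, 120)),
  offense_type = c(rep(c("Violent", "Property"), times = c(10, 30)),   # Female
                   rep(c("Violent", "Property"), times = c(30, 90)))   # Male
)

# Counts: males have three times as many violent offenses
counts <- table(toy$gender, toy$offense_type)
counts

# Row proportions (margin = 1): share of each offense type within a gender
props <- prop.table(counts, margin = 1)
props
```

In a ggplot2 barplot, `geom_bar(position = "fill")` draws the same within-group proportions instead of stacked counts.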
Question:
Make a histogram for time served. Split it up by offense type. Does it look like public order offenses have shorter or longer time served?
Code (3 points):
dat_riverton %>%
ggplot(aes(x=time_served_months, fill=offense_type)) + geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Text answer (2 points):
The histogram illustrates that relative to other offenses, public order offenses have shorter time served.
Question:
Now find a measure of centrality to determine whether your guess to #10 is correct.
Note: I have given you part of the code. You need to fill in the rest: write the statistic you want to see in the empty parentheses. Later in the class, we will learn how to test these differences statistically.
Code (3 points):
dat_riverton %>%
group_by(offense_type) %>%
summarize(median = median(time_served_months),
mean = mean(time_served_months))
## # A tibble: 4 × 3
##   offense_type median  mean
##   <chr>         <dbl> <dbl>
## 1 Drug           35.7  35.1
## 2 Property       29.1  29.1
## 3 Public Order   17.7  17.6
## 4 Violent        54.2  52.9
Text answer (2 points):
Both the mean and the median confirm that my answer to #10 was correct: public order offenses have the shortest time served (median 17.7 months, mean 17.6 months), below every other offense type.