Exam 1 Questions

Note: For this exam, don’t worry about including the correct labels and titles for the plots.

1. Loading data (5 points)

Question:

Load the data called riverton-crime.csv using the proper R command and file path.

Code (5 points):

dat_riverton <- read.csv("C:/Users/evely/AppData/Roaming/Microsoft/Windows/Network Shortcuts/Penn Classes/CRIM/Data/riverton-crime.csv")

2. Variable types (5 points)

Question:

Read the codebook. What types of (stat) variables are these? Variables: id, gender, offense_type, num_prior_arrests, and time_served_months.

Text answer (5 points):

id: Identifier variable
gender: Categorical variable
offense_type: Categorical variable
num_prior_arrests: Quantitative variable
time_served_months: Quantitative variable

3. Data dimensions (5 points)

Question:

How many observations and variables are there in the data?

Code (3 points):

dim(dat_riverton)

## [1] 500   6

Text answer (2 points):

There are 500 observations of 6 variables.

4. Quantitative EDA (5 points)

Question:

What percentage of the sample is male?

Code (3 points):

library(dplyr)  # provides %>%, count(), and mutate()

dat_riverton %>%
  count(gender) %>%
  mutate(prop = prop.table(n))

##   gender   n  prop
## 1 Female 122 0.244
## 2   Male 378 0.756

Text answer (2 points):

75.6% of the sample is male.
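
As a quick arithmetic check, the percentage can be recomputed directly from the counts printed above. This is a minimal base-R sketch; the vector name gender_counts is made up for illustration, and the two counts are copied from the output table rather than read from the data file.

```r
# Counts copied from the count(gender) output above
gender_counts <- c(Female = 122, Male = 378)

# prop.table() divides each count by the total (500);
# multiplying by 100 converts proportions to percentages
gender_pct <- prop.table(gender_counts) * 100
gender_pct
```

The Male entry should agree with the 75.6% reported in the text answer.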

5. Visual EDA (5 points)

Question:

Make a histogram for number of prior arrests and look for an appropriate number of bins. Describe this histogram.

Code (3 points):

library(ggplot2)  # provides ggplot(), aes(), and geom_histogram()

dat_riverton %>%
  ggplot(aes(x = num_prior_arrests)) +
  geom_histogram(binwidth = 1)  # one bin per arrest count

Text answer (2 points):

The histogram appears to be unimodal and slightly skewed to the right, with a small right tail and no noticeable outliers.

6. Quantitative EDA (5 points)

Question:

Find statistics to describe the centrality and spread of the variable number of prior arrests. Are these numbers surprising?

Code (3 points):

summ_stats_num_prior_arrest <- dat_riverton %>%
  summarize(median = median(num_prior_arrests),
            IQR = IQR(num_prior_arrests))
summ_stats_num_prior_arrest

##   median IQR
## 1      3   3

Text answer (2 points):

It’s surprising that the median is 3, because this means at least half of the individuals have 3 or more prior arrests, rather than clustering at 0 or 1 as you might expect. The IQR of 3 shows that the middle half of individuals falls within a span of 3 arrests.
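
To make the interpretation of the IQR concrete, here is a small base-R sketch that computes the quartiles by hand. The vector toy_arrests is hypothetical data chosen for illustration, not the Riverton data; it happens to give the same median and IQR as the output above.

```r
# Hypothetical arrest counts, NOT taken from riverton-crime.csv
toy_arrests <- c(0, 1, 2, 3, 3, 4, 5, 6, 8)

# First and third quartiles (R's default quantile method)
q <- quantile(toy_arrests, probs = c(0.25, 0.75))

# The IQR is the distance between the quartiles: Q3 - Q1
toy_iqr <- unname(q[2] - q[1])
toy_median <- median(toy_arrests)

c(median = toy_median, IQR = toy_iqr)
```

The built-in IQR() function returns the same Q3 minus Q1 difference, so the middle half of the values lies between the two quartiles.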

7. Visual EDA (5 points)

Question:

Now split up the histogram by gender. Does it look like males tend to have more or fewer prior arrests than females?

Code (3 points):

dat_riverton %>%
  ggplot(aes(x=num_prior_arrests, fill=gender)) + geom_histogram(binwidth = 1)

Text answer (2 points):

For the most part, males tend to have more prior arrests than females.
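
Because about three quarters of the sample is male, the male bars in a stacked histogram can be taller at almost every value simply because there are more males. One way to check the visual impression is to compare a per-group summary. The sketch below uses base R on a hypothetical toy data frame (toy_gender is not the Riverton data) to show the idea.

```r
# Hypothetical toy data, NOT the Riverton data: 6 males and 3 females
toy_gender <- data.frame(
  gender = c(rep("Male", 6), rep("Female", 3)),
  num_prior_arrests = c(1, 2, 3, 4, 5, 6, 1, 2, 3)
)

# Median number of prior arrests within each gender;
# unlike raw bar heights, this does not depend on group size
median_by_gender <- tapply(toy_gender$num_prior_arrests,
                           toy_gender$gender,
                           median)
median_by_gender
```

Applied to dat_riverton, the same call with dat_riverton$num_prior_arrests and dat_riverton$gender would give a group-size-independent comparison.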

8. Visual EDA (5 points)

Question:

Make a barplot for offense_type and describe it.

Code (3 points):

dat_riverton %>%
  ggplot(aes(x=offense_type)) + geom_bar()

Text answer (2 points):

Property offenses are the most common, followed by drug, violent, and public order offenses.

9. Visual EDA (5 points)

Question:

Now split the barplot by gender. Does it look like males commit more violent crimes than females?

Code (3 points):

dat_riverton %>%
  ggplot(aes(x=offense_type, fill=gender)) + geom_bar()

Text answer (2 points):

The barplot illustrates that males do commit more violent crimes than females do.
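
A stacked barplot compares counts, and counts depend on how many males and females are in the sample. The share of each gender's offenses that are violent is a separate question. The base-R sketch below illustrates the difference on a hypothetical toy data frame (toy_offense is invented for illustration and is not the Riverton data).

```r
# Hypothetical toy data, NOT the Riverton data: 6 males and 2 females
toy_offense <- data.frame(
  gender = c(rep("Male", 6), rep("Female", 2)),
  offense_type = c("Violent", "Violent", "Violent",
                   "Property", "Property", "Drug",
                   "Violent", "Property")
)

# Raw counts: rows are genders, columns are offense types
counts <- table(toy_offense$gender, toy_offense$offense_type)

# Row proportions: each row sums to 1, so genders are comparable
row_props <- prop.table(counts, margin = 1)

counts
row_props
```

In this toy example males have more violent offenses in raw counts, yet the proportion of violent offenses is the same for both genders. In ggplot2, geom_bar(position = "fill") draws the proportion version of the same plot.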

10. Visual EDA (5 points)

Question:

Make a histogram for time served. Split it up by offense type. Does it look like public order offenses have shorter or longer time served?

Code (3 points):

dat_riverton %>%
  ggplot(aes(x=time_served_months, fill=offense_type)) + geom_histogram()

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Text answer (2 points):

The histogram illustrates that relative to other offenses, public order offenses have shorter time served.

Extra credit. Quantitative EDA (5 points extra credit)

Question:

Now find a measure of centrality to determine whether your guess to #10 is correct.

Note: I have given you part of the code. You need to fill in the rest: Write the statistic you want to see in the empty round brackets. Next in the class, we will learn how to test these differences statistically.

Code (3 points):

dat_riverton %>%
  group_by(offense_type) %>%
  summarize(median = median(time_served_months),
            mean = mean(time_served_months))

## # A tibble: 4 × 3
##   offense_type median  mean
##   <chr>         <dbl> <dbl>
## 1 Drug           35.7  35.1
## 2 Property       29.1  29.1
## 3 Public Order   17.7  17.6
## 4 Violent        54.2  52.9

Text answer (2 points):

Both the mean (17.6 months) and the median (17.7 months) for public order offenses are the lowest of the four offense types, which confirms my answer to #10 that public order offenses have shorter time served.

Evely Carbonell

2024-09-30
