Overview

This quiz is meant to give you and me a sense of how well you understand the materials up to this point.

It will be worth the same number of points as a regular assignment.

Problem 1

Question 1a

First, install a couple of packages. You don’t need to do this in this R Markdown document. Just make sure that “tidyverse” and “dslabs” appear in the list of packages in the bottom-right window.

If they aren’t, run install.packages("") in the Console in the bottom-left window, with the package name between the quotes.
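
The check-then-install step can also be scripted. This is only a minimal sketch, assuming you want to install whatever is missing; the names `needed` and `to_install` are illustrative and not part of the assignment.

```r
# Packages this quiz relies on
needed <- c("tidyverse", "dslabs")

# Keep only the ones that are not installed yet
to_install <- setdiff(needed, rownames(installed.packages()))

# Install the missing ones, but only in an interactive session (the Console),
# so that nothing is installed while the document is being knitted
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If both packages are already installed, `to_install` is empty and nothing happens.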

Question 1b

Load the packages you have just installed.

library(dslabs)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Question 1c

View the different datasets available in dslabs.

data(package = "dslabs")
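
The listing returned by data() can also be stored and inspected as a table. The sketch below uses the built-in "datasets" package only because it ships with every R installation; swapping in "dslabs" should list that package's datasets instead. The name `catalog` is illustrative.

```r
# List the datasets that ship with a package
listing <- data(package = "datasets")

# The results matrix has one row per dataset: its name (Item) and a short Title
catalog <- listing$results[, c("Item", "Title")]
head(catalog)
```

The Title column is a natural place to look for a short description of a dataset, such as the period it covers.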

Question 1d

What year are the data in murders from?

2010 (US gun murders by state for 2010)

Question 1e

Load the dataset “murders”.

data("murders")

Problem 2

Question 2a

How many columns (variables) are there in “murders”, and what type or class is each variable? HINT: A single function gives you this information, and murders does not need to be in quotes. Put your code in the code chunk. If it runs correctly, it will show all of the information I’m asking for, and you don’t need to write it out separately.

str(murders)
## 'data.frame':    51 obs. of  5 variables:
##  $ state     : chr  "Alabama" "Alaska" "Arizona" "Arkansas" ...
##  $ abb       : chr  "AL" "AK" "AZ" "AR" ...
##  $ region    : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
##  $ population: num  4779736 710231 6392017 2915918 37253956 ...
##  $ total     : num  135 19 232 93 1257 ...
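
For comparison, the same two pieces of information can be pulled out separately. This sketch uses a small made-up data frame (`toy` is not the real murders data) so that it runs without any packages.

```r
# A small made-up data frame (not the real murders data)
toy <- data.frame(
  state = c("A", "B", "C"),
  region = factor(c("West", "South", "West")),
  total = c(10, 25, 3),
  stringsAsFactors = FALSE
)

n_vars <- ncol(toy)                # number of columns (variables)
var_classes <- sapply(toy, class)  # class of each column

n_vars
var_classes
str(toy)                           # both in a single call
```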

Question 2b

List the state abbreviations in order from lowest number of murders to highest.

HINT: In the book, Introduction to Data Science, we were taught to first list the total murders in order and assign that a name. Then use the abb variable with this new name. Feel free to use the book.

ind <- order(murders$total)
murders$abb[ind]
##  [1] "VT" "ND" "NH" "WY" "HI" "SD" "ME" "ID" "MT" "RI" "AK" "IA" "UT" "WV" "NE"
## [16] "OR" "DE" "MN" "KS" "CO" "NM" "NV" "AR" "WA" "CT" "WI" "DC" "OK" "KY" "MA"
## [31] "MS" "AL" "IN" "SC" "TN" "AZ" "NJ" "VA" "NC" "MD" "OH" "MO" "LA" "IL" "GA"
## [46] "MI" "PA" "NY" "FL" "TX" "CA"
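
To see why indexing with order() works, here is a tiny sketch with made-up values. order() returns the positions of the elements from smallest to largest, and those positions can then be used to index any vector of the same length.

```r
totals <- c(31, 4, 15)           # made-up counts
labels <- c("AA", "BB", "CC")    # made-up abbreviations

ind <- order(totals)             # positions from smallest to largest: 2 3 1
sorted_labels <- labels[ind]     # "BB" "CC" "AA"

ind
sorted_labels
labels[order(totals, decreasing = TRUE)]   # highest first
```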

Question 2c

Create a new value that provides the crime rate (murders per 100,000 people) for each state. Finish the R code in the chunk below.

murder_rate <- (murders$total/murders$population)*100000
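
As a quick check of the formula, the calculation can be done by hand for one state. The values below are Alabama's, taken from the first entries of the str(murders) output above.

```r
# Alabama's values as shown in the str(murders) output
total <- 135
population <- 4779736

# Murders per 100,000 residents
rate <- total / population * 100000
round(rate, 2)   # 2.82
```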

Question 2d

Now make that value appear as a new variable in murders. HINT: You will be using mutate(). Check out the example in Basic Statistics using R, p. 28.

murders_plus <- murders %>%
  mutate(murder_rate = total / population * 100000)
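
Inside mutate(), the columns of the piped data frame can be referred to by their bare names. A minimal sketch on made-up data (`toy` and `rate` are illustrative names):

```r
library(dplyr)

# Made-up data, not the real murders table
toy <- data.frame(
  state = c("A", "B"),
  population = c(200000, 50000),
  total = c(4, 3)
)

# total and population are looked up in the data frame being piped in
toy_plus <- toy %>%
  mutate(rate = total / population * 100000)

toy_plus$rate   # 2 6
```

Bare names always refer to the data as it stands at that point in the pipe, which matters once rows are filtered or grouped in an earlier step.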

Question 2e

Show a summary of murders_plus.

summary(murders_plus)
##     state               abb                      region     population      
##  Length:51          Length:51          Northeast    : 9   Min.   :  563626  
##  Class :character   Class :character   South        :17   1st Qu.: 1696962  
##  Mode  :character   Mode  :character   North Central:12   Median : 4339367  
##                                        West         :13   Mean   : 6075769  
##                                                           3rd Qu.: 6636084  
##                                                           Max.   :37253956  
##      total         murder_rate     
##  Min.   :   2.0   Min.   : 0.3196  
##  1st Qu.:  24.5   1st Qu.: 1.2526  
##  Median :  97.0   Median : 2.6871  
##  Mean   : 184.4   Mean   : 2.7791  
##  3rd Qu.: 268.0   3rd Qu.: 3.3861  
##  Max.   :1257.0   Max.   :16.4528

Problem 3

Question 3a

Import the dataset 2018.UCR.PA.xlsx. You can copy the code provided by the “import dataset” option in Environment.

library(readxl)
X2018_UCR_PA <- read_excel("/Users/ingridellis/Desktop/CJS 310/Week 3/2018.UCR.PA.xlsx")
head(X2018_UCR_PA)
## # A tibble: 6 × 12
##   City        Population `Violent\r\ncrime` Murder and\r\nnonneg…¹  Rape Robbery
##   <chr>            <dbl>              <dbl>                  <dbl> <dbl>   <dbl>
## 1 Abington T…      55631                 44                      1     6      12
## 2 Adamstown         1857                  3                      0     0       0
## 3 Adams Town…      14105                  3                      0     0       0
## 4 Adams Town…       5581                  0                      0     0       0
## 5 Akron             4015                  7                      0     1       0
## 6 Albion            1466                  0                      0     0       0
## # ℹ abbreviated name: ¹​`Murder and\r\nnonnegligent\r\nmanslaughter`
## # ℹ 6 more variables: `Aggravated\r\nassault` <dbl>, `Property\r\ncrime` <dbl>,
## #   Burglary <dbl>, `Larceny-\r\ntheft` <dbl>,
## #   `Motor\r\nvehicle\r\ntheft` <dbl>, Arson <dbl>

Question 3b

Show the names of each variable. Check for strange spellings and correct them. Give the corrected 2018.UCR.PA data a new name to reflect this change.

HINT: We did this in week 3.

# Replace the column names that contain line breaks (\r\n) with clean ones
df <- X2018_UCR_PA %>%
  rename(
    violent.crime = "Violent\r\ncrime",
    murder.manslaughter = "Murder and\r\nnonnegligent\r\nmanslaughter",
    aggravated.assault = "Aggravated\r\nassault",
    property.crime = "Property\r\ncrime",
    larceny.theft = "Larceny-\r\ntheft",
    motor.theft = "Motor\r\nvehicle\r\ntheft"
  )

summary(df)
##      City             Population      violent.crime      murder.manslaughter
##  Length:989         Min.   :    132   Min.   :    0.00   Min.   :  0.0000   
##  Class :character   1st Qu.:   2066   1st Qu.:    1.00   1st Qu.:  0.0000   
##  Mode  :character   Median :   4320   Median :    5.00   Median :  0.0000   
##                     Mean   :  10054   Mean   :   34.16   Mean   :  0.6977   
##                     3rd Qu.:   9088   3rd Qu.:   15.00   3rd Qu.:  0.0000   
##                     Max.   :1586916   Max.   :14420.00   Max.   :351.0000   
##       Rape             Robbery         aggravated.assault property.crime   
##  Min.   :   0.000   Min.   :   0.000   Min.   :   0.00    Min.   :    0.0  
##  1st Qu.:   0.000   1st Qu.:   0.000   1st Qu.:   1.00    1st Qu.:    9.0  
##  Median :   0.000   Median :   0.000   Median :   4.00    Median :   40.0  
##  Mean   :   2.971   Mean   :   9.449   Mean   :  21.05    Mean   :  164.6  
##  3rd Qu.:   1.000   3rd Qu.:   2.000   3rd Qu.:  11.00    3rd Qu.:  105.0  
##  Max.   :1095.000   Max.   :5262.000   Max.   :7712.00    Max.   :49145.0  
##     Burglary       larceny.theft      motor.theft          Arson        
##  Min.   :   0.00   Min.   :    0.0   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:   1.00   1st Qu.:    7.0   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :   5.00   Median :   32.0   Median :   1.00   Median :  0.000  
##  Mean   :  21.42   Mean   :  131.3   Mean   :  11.84   Mean   :  1.147  
##  3rd Qu.:  12.00   3rd Qu.:   89.0   3rd Qu.:   4.00   3rd Qu.:  0.000  
##  Max.   :6497.00   Max.   :36968.0   Max.   :5680.00   Max.   :430.000
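
The question also asks to show the variable names, which names() does directly. The sketch below uses a made-up two-column data frame whose second name contains a line break, similar to what a spreadsheet import can produce; `toy` and `toy_clean` are illustrative names.

```r
library(dplyr)

# Made-up data with a line break inside a column name
toy <- data.frame(a = 1:2, b = 3:4)
names(toy) <- c("City", "Violent\r\ncrime")

names(toy)   # show the current names, including the odd one

# rename(new_name = "old name") replaces one name and leaves the rest alone
toy_clean <- toy %>%
  rename(violent.crime = "Violent\r\ncrime")

names(toy_clean)   # "City" "violent.crime"
```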

Question 3c

Create a new variable called violent_crime_rate as you did in questions 2c and 2d.

violent_crime_rate <- (df$violent.crime/df$Population)*100000
df_rate <- df %>%
  mutate(violent_crime_rate = violent.crime / Population * 100000)
head(df_rate)
## # A tibble: 6 × 13
##   City                Population violent.crime murder.manslaughter  Rape Robbery
##   <chr>                    <dbl>         <dbl>               <dbl> <dbl>   <dbl>
## 1 Abington Township,…      55631            44                   1     6      12
## 2 Adamstown                 1857             3                   0     0       0
## 3 Adams Township, Bu…      14105             3                   0     0       0
## 4 Adams Township, Ca…       5581             0                   0     0       0
## 5 Akron                     4015             7                   0     1       0
## 6 Albion                    1466             0                   0     0       0
## # ℹ 7 more variables: aggravated.assault <dbl>, property.crime <dbl>,
## #   Burglary <dbl>, larceny.theft <dbl>, motor.theft <dbl>, Arson <dbl>,
## #   violent_crime_rate <dbl>

Question 3d

Create a histogram based on violent_crime_rate. HINT: See page 31 in Basic Statistics using R.

# after_stat(count) replaces the deprecated ..count.. notation
crime.rate.table <- df_rate %>%
  ggplot(aes(x = violent_crime_rate, fill = after_stat(count))) +
  geom_histogram() +
  labs(x = "Crime rates", y = "Frequency", title = "Distribution of Crime Rates") +
  theme_minimal()
crime.rate.table
## `stat_bin()` using `bins = 30`. Pick better value `binwidth`.
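
The message about `bins = 30` appears whenever geom_histogram() is called without a bin setting. A minimal sketch of choosing one explicitly, using simulated data: the values in `toy` are made up, and the bin width of 100 is an arbitrary illustration, not a recommended value for the real data.

```r
library(ggplot2)

set.seed(1)
# Made-up, right-skewed rates (not the real UCR data)
toy <- data.frame(rate = rexp(200, rate = 1 / 300))

# binwidth sets how wide each bar is; boundary = 0 makes bars start at 0
p <- ggplot(toy, aes(x = rate)) +
  geom_histogram(binwidth = 100, boundary = 0) +
  labs(x = "Rate per 100,000", y = "Frequency")
p
```

With `binwidth` (or `bins`) supplied, the message no longer appears, and each bar covers a range you can state in words.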

Problem 4

Knit this R Markdown file and submit it via the link in the week 6 folder.