This quiz is meant to give you and me a sense of how well you understand the material covered up to this point.
It will be worth the same number of points as a regular assignment.
First install a couple of packages. You don’t need to do this in this Rmarkdown document. Just be sure that “tidyverse” and “dslabs” are in your list of packages in the window on the bottom right.
If they aren’t, run install.packages("tidyverse") or install.packages("dslabs") directly in the Console in the window on the bottom left.
Load the packages you have just installed.
library(dslabs)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
View the different datasets available in dslabs.
data(package ="dslabs")
What year are the data in murders from?
US gun murders by state for 2010.
Load the dataset “murders”.
data("murders")
How many columns (variables) are there in “murders”, and what type or class is each variable? HINT: There is a single function that you can use to get this information, and murders does not need to be in quotes. Put your code in the code chunk. If it runs correctly, it will show all of the information I’m asking for and you don’t need to write it out separately.
str(murders)
## 'data.frame': 51 obs. of 5 variables:
## $ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
## $ abb : chr "AL" "AK" "AZ" "AR" ...
## $ region : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
## $ population: num 4779736 710231 6392017 2915918 37253956 ...
## $ total : num 135 19 232 93 1257 ...
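If only the column count or only the classes are needed, base R has helpers that return each piece separately. A minimal sketch on a small made-up data frame (`toy` and its values are hypothetical, not part of murders):

```r
# Hypothetical toy data frame, for illustration only
toy <- data.frame(
  name  = c("a", "b", "c"),
  group = factor(c("x", "y", "x")),
  count = c(10, 20, 30),
  stringsAsFactors = FALSE
)

n_cols <- ncol(toy)                # number of columns (variables)
col_classes <- sapply(toy, class)  # class of each column, as a named character vector

print(n_cols)
print(col_classes)
```

str() remains the quickest way to see both at once; these helpers are useful when the result has to be stored or reused.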
List the state abbreviations in order from the lowest number of murders to the highest.
HINT: In the book, Introduction to Data Science, we were taught to first list the total murders in order and assign that a name. Then use the abb variable with this new name. Feel free to use the book.
ind <- order(murders$total)
murders$abb[ind]
## [1] "VT" "ND" "NH" "WY" "HI" "SD" "ME" "ID" "MT" "RI" "AK" "IA" "UT" "WV" "NE"
## [16] "OR" "DE" "MN" "KS" "CO" "NM" "NV" "AR" "WA" "CT" "WI" "DC" "OK" "KY" "MA"
## [31] "MS" "AL" "IN" "SC" "TN" "AZ" "NJ" "VA" "NC" "MD" "OH" "MO" "LA" "IL" "GA"
## [46] "MI" "PA" "NY" "FL" "TX" "CA"
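It may help to see what order() returns on a tiny example: it gives the positions of the values from smallest to largest, not the sorted values themselves. The vectors below are made up for illustration.

```r
# Hypothetical values, for illustration only
totals <- c(50, 5, 20)
labels <- c("A", "B", "C")

ind <- order(totals)          # positions of the values, smallest first
sorted_labels <- labels[ind]  # use those positions to reorder another vector

print(ind)
print(sorted_labels)

# decreasing = TRUE reverses the direction (largest first)
largest_first <- labels[order(totals, decreasing = TRUE)]
print(largest_first)
```

Because the index comes from one vector and is applied to another, both vectors must be in the same row order, as the columns of a data frame are.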
Create a new value that provides the crime rate (murders per 100,000 people) for each state. Finish the R code in the chunk below.
murder_rate <- (murders$total/murders$population)*100000
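As a quick check of the arithmetic, the same formula can be applied by hand to a single state. This sketch uses the first state's figures as they appear in the str(murders) output above; the variable names are made up for illustration.

```r
# Figures for the first state listed in the str(murders) output
total_al <- 135
population_al <- 4779736

# Murders per 100,000 people
rate_al <- total_al / population_al * 100000
print(round(rate_al, 2))
```

The result is a little under 3 murders per 100,000 people, which is far easier to compare across states than the raw counts.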
Now make that value appear as a new variable in murders. HINT: You will be using mutate(); check out the example in Basic Statistics using R, p. 28.
murders_plus <- murders %>%
  # inside mutate(), columns can be referred to by name, without murders$
  mutate(murder_rate = total / population * 100000)
Show a summary of murders_plus.
summary(murders_plus)
## state abb region population
## Length:51 Length:51 Northeast : 9 Min. : 563626
## Class :character Class :character South :17 1st Qu.: 1696962
## Mode :character Mode :character North Central:12 Median : 4339367
## West :13 Mean : 6075769
## 3rd Qu.: 6636084
## Max. :37253956
## total murder_rate
## Min. : 2.0 Min. : 0.3196
## 1st Qu.: 24.5 1st Qu.: 1.2526
## Median : 97.0 Median : 2.6871
## Mean : 184.4 Mean : 2.7791
## 3rd Qu.: 268.0 3rd Qu.: 3.3861
## Max. :1257.0 Max. :16.4528
Import the dataset 2018.UCR.PA.xlsx. You can copy the code provided by the “Import Dataset” option in the Environment pane.
library(readxl)
X2018_UCR_PA <- read_excel("/Users/ingridellis/Desktop/CJS 310/Week 3/2018.UCR.PA.xlsx")
head(X2018_UCR_PA)
## # A tibble: 6 × 12
## City Population `Violent\r\ncrime` Murder and\r\nnonneg…¹ Rape Robbery
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abington T… 55631 44 1 6 12
## 2 Adamstown 1857 3 0 0 0
## 3 Adams Town… 14105 3 0 0 0
## 4 Adams Town… 5581 0 0 0 0
## 5 Akron 4015 7 0 1 0
## 6 Albion 1466 0 0 0 0
## # ℹ abbreviated name: ¹`Murder and\r\nnonnegligent\r\nmanslaughter`
## # ℹ 6 more variables: `Aggravated\r\nassault` <dbl>, `Property\r\ncrime` <dbl>,
## # Burglary <dbl>, `Larceny-\r\ntheft` <dbl>,
## # `Motor\r\nvehicle\r\ntheft` <dbl>, Arson <dbl>
Show the name of each variable. Check for strange spellings and correct them. Save the result under a new name so that it is clear the corrected data differ from 2018.UCR.PA.
HINT: We did this in week 3.
# A single rename() call can take several new = old pairs
df <- X2018_UCR_PA %>%
  rename(
    violent.crime = 'Violent\r\ncrime',
    murder.manslaughter = 'Murder and\r\nnonnegligent\r\nmanslaughter',
    aggravated.assault = 'Aggravated\r\nassault',
    property.crime = 'Property\r\ncrime',
    larceny.theft = 'Larceny-\r\ntheft',
    motor.theft = 'Motor\r\nvehicle\r\ntheft'
  )
summary(df)
## City Population violent.crime murder.manslaughter
## Length:989 Min. : 132 Min. : 0.00 Min. : 0.0000
## Class :character 1st Qu.: 2066 1st Qu.: 1.00 1st Qu.: 0.0000
## Mode :character Median : 4320 Median : 5.00 Median : 0.0000
## Mean : 10054 Mean : 34.16 Mean : 0.6977
## 3rd Qu.: 9088 3rd Qu.: 15.00 3rd Qu.: 0.0000
## Max. :1586916 Max. :14420.00 Max. :351.0000
## Rape Robbery aggravated.assault property.crime
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 9.0
## Median : 0.000 Median : 0.000 Median : 4.00 Median : 40.0
## Mean : 2.971 Mean : 9.449 Mean : 21.05 Mean : 164.6
## 3rd Qu.: 1.000 3rd Qu.: 2.000 3rd Qu.: 11.00 3rd Qu.: 105.0
## Max. :1095.000 Max. :5262.000 Max. :7712.00 Max. :49145.0
## Burglary larceny.theft motor.theft Arson
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 32.0 Median : 1.00 Median : 0.000
## Mean : 21.42 Mean : 131.3 Mean : 11.84 Mean : 1.147
## 3rd Qu.: 12.00 3rd Qu.: 89.0 3rd Qu.: 4.00 3rd Qu.: 0.000
## Max. :6497.00 Max. :36968.0 Max. :5680.00 Max. :430.000
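When many column names share the same problem, such as line breaks carried over from a spreadsheet header, they can also be cleaned with a pattern instead of one at a time. This is a sketch in base R on a made-up data frame (`toy` is hypothetical); it is an alternative to rename(), not the method used above.

```r
# Hypothetical data frame whose column names contain embedded line breaks
toy <- data.frame(a = 1:2, b = 3:4, c = 5:6)
names(toy) <- c("City", "Violent\r\ncrime", "Motor\r\nvehicle\r\ntheft")

# Replace each run of carriage returns / newlines with a single dot
clean <- gsub("[\r\n]+", ".", names(toy))
# Lower-case everything for consistency (this also changes "City" to "city")
clean <- tolower(clean)

names(toy) <- clean
print(names(toy))
```

A pattern only fixes what it matches: a name like `Larceny-\r\ntheft` would keep its hyphen, so it is worth printing the names afterwards and correcting any leftovers by hand.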
Create a new variable called violent_crime_rate as you did in questions 2c and 2d.
violent_crime_rate <- (df$violent.crime/df$Population)*100000
df_rate <- df %>%
  # columns are referred to by name inside mutate(), without df$
  mutate(violent_crime_rate = violent.crime / Population * 100000)
head(df_rate)
## # A tibble: 6 × 13
## City Population violent.crime murder.manslaughter Rape Robbery
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Abington Township,… 55631 44 1 6 12
## 2 Adamstown 1857 3 0 0 0
## 3 Adams Township, Bu… 14105 3 0 0 0
## 4 Adams Township, Ca… 5581 0 0 0 0
## 5 Akron 4015 7 0 1 0
## 6 Albion 1466 0 0 0 0
## # ℹ 7 more variables: aggravated.assault <dbl>, property.crime <dbl>,
## # Burglary <dbl>, larceny.theft <dbl>, motor.theft <dbl>, Arson <dbl>,
## # violent_crime_rate <dbl>
Create a histogram based on violent_crime_rate. HINT: See page 31 in Basic Statistics using R
# after_stat(count) replaces the deprecated ..count.. notation;
# bins = 30 is the default, stated explicitly here
crime.rate.table <- df_rate %>%
  ggplot(aes(x = violent_crime_rate, fill = after_stat(count))) +
  geom_histogram(bins = 30) +
  labs(x = "Crime rates", y = "Frequency", title = "Distribution of Crime Rates") +
  theme_minimal()
crime.rate.table
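How a histogram looks depends heavily on the bin width. The sketch below uses base R's hist() with plot = FALSE, rather than ggplot2, only so that the counts per bin can be printed and compared. The rates are made up for illustration, and ggplot2 may place its bin edges differently.

```r
# Hypothetical rates, for illustration only
rates <- c(0, 50, 120, 180, 240, 310, 400, 950)

# Count the values in bins of width 250, without drawing a plot
wide <- hist(rates, breaks = seq(0, 1000, by = 250), plot = FALSE)

# Same data, bins of width 100
narrow <- hist(rates, breaks = seq(0, 1000, by = 100), plot = FALSE)

print(wide$counts)
print(narrow$counts)
```

Wide bins hide the gap between the bulk of the values and the single large one, while narrow bins make it visible. In ggplot2 the corresponding choice is made with the bins or binwidth argument of geom_histogram().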
Knit this Rmarkdown file and submit it via the link in the week 6 folder.