This quiz is meant to give you and me a sense of how well you understand the materials up to this point.
It will be worth the same number of points as a regular assignment.
First install a couple of packages. You don’t need to do this in this Rmarkdown document. Just be sure that “tidyverse” and “dslabs” are in your list of packages in the window on the bottom right.
If they aren’t, use install.packages("") directly in the Console in the window on the bottom left.
Load the packages you have just installed.
library(dslabs)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
View the different datasets available in dslabs.
data(package ="dslabs")
What year are the data in murders from? THEY ARE ALL FROM 2010
Load the dataset “murders”.
data("murders")
view(murders)
How many columns (variables) are there in “murders”, and what type or class is each variable? HINT: There is a single function that you can use to get this information, and murders does not need to be in quotes. Put your code in the code chunk. If it runs correctly, it will show all of the information I’m asking for and you don’t need to write it out separately.
str(murders)
## 'data.frame': 51 obs. of 5 variables:
## $ state : chr "Alabama" "Alaska" "Arizona" "Arkansas" ...
## $ abb : chr "AL" "AK" "AZ" "AR" ...
## $ region : Factor w/ 4 levels "Northeast","South",..: 2 4 4 2 4 4 1 2 2 2 ...
## $ population: num 4779736 710231 6392017 2915918 37253956 ...
## $ total : num 135 19 232 93 1257 ...
List the state abbreviations in order from lowest number of murders to highest.
HINT: In the book, Introduction to Data Science, we were taught to first list the total murders in order and assign that a name. Then use the abb variable with this new name. Feel free to use the book.
ind <- order(murders$total)
murders$abb[ind]
## [1] "VT" "ND" "NH" "WY" "HI" "SD" "ME" "ID" "MT" "RI" "AK" "IA" "UT" "WV" "NE"
## [16] "OR" "DE" "MN" "KS" "CO" "NM" "NV" "AR" "WA" "CT" "WI" "DC" "OK" "KY" "MA"
## [31] "MS" "AL" "IN" "SC" "TN" "AZ" "NJ" "VA" "NC" "MD" "OH" "MO" "LA" "IL" "GA"
## [46] "MI" "PA" "NY" "FL" "TX" "CA"
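To see why indexing with the result of order() works, here is a minimal sketch on made-up values. The names toy_totals, toy_labels, and toy_ind are hypothetical and are not part of the murders dataset.

```r
# Toy data, invented for illustration only (not from murders)
toy_totals <- c(30, 5, 12)
toy_labels <- c("A", "B", "C")

# order() returns the positions that would sort the vector:
# the smallest value (5) is in position 2, then 12 in position 3, then 30 in position 1
toy_ind <- order(toy_totals)
toy_ind              # 2 3 1

# Indexing another vector with those positions reorders it the same way
toy_labels[toy_ind]  # "B" "C" "A"
```

The same idea applies above: ind holds row positions sorted by total, and murders$abb[ind] reads the abbreviations in that order.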
Create a new value that provides the crime rate for each state. Finish the R code in the chunk below.
murder_rate <- murders$total / murders$population * 100000
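As a quick check of the formula, the calculation can be done by hand for a single state. This sketch uses the first values shown in the str() output above (Alabama: total 135, population 4779736). The variable names are hypothetical.

```r
# Values copied from the first row of the str(murders) output (Alabama)
al_total      <- 135
al_population <- 4779736

# Murders per 100,000 residents: divide the count by the population,
# then scale up so the number is easier to read and compare
al_rate <- al_total / al_population * 100000
round(al_rate, 2)  # about 2.82
```

Because R arithmetic is vectorized, the one-line version in the chunk above applies this same calculation to all 51 rows at once.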
Now make that value appear as a new variable in murders. HINT: You will be using mutate(); check out the example in Basic Statistics using R, p. 28.
murders_plus <- mutate(murders,
murder_rate = total / population * 100000)
Show a summary of murders_plus.
summary(murders_plus)
## state abb region population
## Length:51 Length:51 Northeast : 9 Min. : 563626
## Class :character Class :character South :17 1st Qu.: 1696962
## Mode :character Mode :character North Central:12 Median : 4339367
## West :13 Mean : 6075769
## 3rd Qu.: 6636084
## Max. :37253956
## total murder_rate
## Min. : 2.0 Min. : 0.3196
## 1st Qu.: 24.5 1st Qu.: 1.2526
## Median : 97.0 Median : 2.6871
## Mean : 184.4 Mean : 2.7791
## 3rd Qu.: 268.0 3rd Qu.: 3.3861
## Max. :1257.0 Max. :16.4528
Import the dataset 2018.UCR.PA.xlsx. You can copy the code provided from the “import dataset” option in Environment.
library(readxl)
X2018_UCR_PA <- read_excel("2018.UCR.PA.xlsx")
Show the names of each variable. Check for strange spellings and correct them. Rename 2018.UCR.PA to reflect this change.
HINT: We did this in week 3.
names(X2018_UCR_PA)
## [1] "City"
## [2] "Population"
## [3] "Violent\r\ncrime"
## [4] "Murder and\r\nnonnegligent\r\nmanslaughter"
## [5] "Rape"
## [6] "Robbery"
## [7] "Aggravated\r\nassault"
## [8] "Property\r\ncrime"
## [9] "Burglary"
## [10] "Larceny-\r\ntheft"
## [11] "Motor\r\nvehicle\r\ntheft"
## [12] "Arson"
X2018_UCR_PA_cleaned <- X2018_UCR_PA %>%
  # Replace column names that contain embedded line breaks (\r\n)
  rename(
    violent.crime       = "Violent\r\ncrime",
    murder.manslaughter = "Murder and\r\nnonnegligent\r\nmanslaughter",
    aggravated.assault  = "Aggravated\r\nassault",
    property.crime      = "Property\r\ncrime",
    larceny.theft       = "Larceny-\r\ntheft",
    motor.theft         = "Motor\r\nvehicle\r\ntheft"
  ) %>%
  # Overall crime rate per 100,000 residents
  mutate(crime.rate = (violent.crime + property.crime) / Population * 100000)
summary(X2018_UCR_PA_cleaned)
## City Population violent.crime murder.manslaughter
## Length:989 Min. : 132 Min. : 0.00 Min. : 0.0000
## Class :character 1st Qu.: 2066 1st Qu.: 1.00 1st Qu.: 0.0000
## Mode :character Median : 4320 Median : 5.00 Median : 0.0000
## Mean : 10054 Mean : 34.16 Mean : 0.6977
## 3rd Qu.: 9088 3rd Qu.: 15.00 3rd Qu.: 0.0000
## Max. :1586916 Max. :14420.00 Max. :351.0000
## Rape Robbery aggravated.assault property.crime
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.0
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 9.0
## Median : 0.000 Median : 0.000 Median : 4.00 Median : 40.0
## Mean : 2.971 Mean : 9.449 Mean : 21.05 Mean : 164.6
## 3rd Qu.: 1.000 3rd Qu.: 2.000 3rd Qu.: 11.00 3rd Qu.: 105.0
## Max. :1095.000 Max. :5262.000 Max. :7712.00 Max. :49145.0
## Burglary larceny.theft motor.theft Arson
## Min. : 0.00 Min. : 0.0 Min. : 0.00 Min. : 0.000
## 1st Qu.: 1.00 1st Qu.: 7.0 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 5.00 Median : 32.0 Median : 1.00 Median : 0.000
## Mean : 21.42 Mean : 131.3 Mean : 11.84 Mean : 1.147
## 3rd Qu.: 12.00 3rd Qu.: 89.0 3rd Qu.: 4.00 3rd Qu.: 0.000
## Max. :6497.00 Max. :36968.0 Max. :5680.00 Max. :430.000
## crime.rate
## Min. : 0.0
## 1st Qu.: 456.3
## Median : 904.8
## Mean : 1261.1
## 3rd Qu.: 1701.4
## Max. :17757.0
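The renaming above lists each column by hand. An alternative, shown here only as a sketch and not as the method this quiz expects, is to clean all names at once with a pattern. The vector raw_names is a hypothetical sample of the names printed earlier. Note that this approach also lowercases names such as City, so the results differ from the hand-written renames.

```r
# A sample of the column names shown by names(), including embedded line breaks
raw_names <- c("City",
               "Violent\r\ncrime",
               "Murder and\r\nnonnegligent\r\nmanslaughter",
               "Larceny-\r\ntheft")

# Lowercase everything, then collapse every run of non-letter characters
# (line breaks, spaces, hyphens) into a single period
clean_names <- gsub("[^a-z]+", ".", tolower(raw_names))
clean_names
```

Assigning the result back with names(df) <- clean_names would apply the cleaned names to a data frame called df (a placeholder name).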
Create a new variable called violent_crime_rate as you did in questions 2c and 2d.
X2018_UCR_PA_cleaned <- X2018_UCR_PA_cleaned %>%
mutate(violent_crime_rate = violent.crime / Population * 100000)
Create a histogram based on violent_crime_rate. HINT: See page 31 in Basic Statistics using R.
hist(X2018_UCR_PA_cleaned$violent_crime_rate,
main = "Histogram of Violent Crime Rate",
xlab = "Violent Crime Rate (per 100,000)",
ylab = "Frequency")
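To see what hist() counts before it draws anything, here is a small sketch on made-up rates. The values in toy_rates and the choice of bins are assumptions for illustration and are not taken from the UCR data.

```r
# Invented rates per 100,000, for illustration only
toy_rates <- c(0, 50, 120, 130, 480)

# plot = FALSE returns the binning without drawing;
# breaks sets explicit bin edges at 0, 100, 200, 300, 400, 500
h <- hist(toy_rates, breaks = seq(0, 500, by = 100), plot = FALSE)

# Number of observations falling in each bin
h$counts  # 2 2 0 0 1
```

A long right tail like this one, where most values sit in the lowest bins and a few are far larger, is the kind of shape the summary output above suggests for the real data.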
Knit this Rmarkdown file and submit it via the link in the week 6 folder.