Diandra Dzib
In this assignment, you will analyze the USArrests dataset, which contains statistics on violent crime rates in the United States for each of the 50 states in 1973. You will apply the R skills you learned in this module, including importing data, summarizing information, creating visualizations, and using tidyverse functions.
For this assignment, and most others in this class, you will need the
tidyverse and GGally libraries. If you have not already installed them
on the version of R you are currently using, install them before
loading them. You can do that by calling install.packages() in your
console or by using the Tools dropdown menu. You will not need to do
this again for future assignments as long as you are using the same
computer.
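The installation step described above can be sketched as a single console call. This assumes a working internet connection and a default CRAN mirror.

```r
# Run once per R installation; not needed again on the same computer.
install.packages(c("tidyverse", "GGally"))
```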
Load the necessary libraries for data analysis and visualization.
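A minimal sketch of the loading step, assuming both packages are already installed:

```r
library(tidyverse)  # attaches dplyr, ggplot2, readr, and related packages
library(GGally)     # ggplot2 extensions such as ggpairs()
```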
The USArrests dataset contains the following variables:

- Murder: Murder arrests (per 100,000 residents)
- Assault: Assault arrests (per 100,000 residents)
- UrbanPop: Percent of the population living in urban areas
Use the following commands to explore the dataset content:
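The exploration commands are most likely head() and glimpse(), judging by the output shown. The sketch below assumes the data frame is named us_arrests, as in the plotting code later in this document. Because the import step is not shown, it rebuilds the data from R's built-in USArrests as a stand-in.

```r
library(dplyr)
library(tibble)

# Assumption: the source's import step is not shown, so us_arrests is rebuilt
# here from R's built-in USArrests, keeping only the four columns used below.
us_arrests <- USArrests %>%
  rownames_to_column("State") %>%
  select(State, Murder, Assault, UrbanPop)

head(us_arrests)     # first six rows
glimpse(us_arrests)  # row and column counts, column types, leading values
```

Column types reported by glimpse() may differ from the printed output depending on how the data was imported.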
| State | Murder | Assault | UrbanPop |
|---|---|---|---|
| Alabama | 13.2 | 236 | 58 |
| Alaska | 10.0 | 263 | 48 |
| Arizona | 8.1 | 294 | 80 |
| Arkansas | 8.8 | 190 | 50 |
| California | 9.0 | 276 | 91 |
| Colorado | 7.9 | 204 | 78 |
## Rows: 50
## Columns: 4
## $ State <chr> "Alabama", "Alaska", "Arizona", "Arkansas", "California", "Co…
## $ Murder <dbl> 13.2, 10.0, 8.1, 8.8, 9.0, 7.9, 3.3, 5.9, 15.4, 17.4, 5.3, 2.…
## $ Assault <dbl> 236, 263, 294, 190, 276, 204, 110, 238, 335, 211, 46, 120, 24…
## $ UrbanPop <dbl> 58, 48, 80, 50, 91, 78, 77, 72, 80, 60, 83, 54, 83, 65, 57, 6…
Use the following tasks to practice subsetting data using
filter() to select specific rows and select()
to choose specific columns.
Retrieve all rows where Murder is greater than 10 per
100,000 residents.
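One way to write this filter is sketched below. The construction of us_arrests is an assumption, since the import step is not shown.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# Keep rows whose murder arrest rate is strictly above 10 per 100,000.
high_murder_states <- us_arrests %>%
  filter(Murder > 10)
high_murder_states
```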
| State | Murder | Assault | UrbanPop |
|---|---|---|---|
| Alabama | 13.2 | 236 | 58 |
| Florida | 15.4 | 335 | 80 |
| Georgia | 17.4 | 211 | 60 |
| Illinois | 10.4 | 249 | 83 |
| Louisiana | 15.4 | 249 | 66 |
| Maryland | 11.3 | 300 | 67 |
| Michigan | 12.1 | 255 | 74 |
| Mississippi | 16.1 | 259 | 44 |
| Nevada | 12.2 | 252 | 81 |
| New Mexico | 11.4 | 285 | 70 |
| New York | 11.1 | 254 | 86 |
| North Carolina | 13.0 | 337 | 45 |
| South Carolina | 14.4 | 279 | 48 |
| Tennessee | 13.2 | 188 | 59 |
| Texas | 12.7 | 201 | 80 |
Retrieve all rows where UrbanPop is less than 50%.
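The same pattern works for the urban population condition. As before, how us_arrests is built here is an assumption.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# Keep rows where fewer than 50% of residents live in urban areas.
low_urban_states <- us_arrests %>%
  filter(UrbanPop < 50)
low_urban_states
```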
| State | Murder | Assault | UrbanPop |
|---|---|---|---|
| Alaska | 10.0 | 263 | 48 |
| Mississippi | 16.1 | 259 | 44 |
| North Carolina | 13.0 | 337 | 45 |
| North Dakota | 0.8 | 45 | 44 |
| South Carolina | 14.4 | 279 | 48 |
| South Dakota | 3.8 | 86 | 45 |
| Vermont | 2.2 | 48 | 32 |
| West Virginia | 5.7 | 81 | 39 |
Find the data for the states of California, Texas, and New York.
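No output is shown for this task, so the following is only a sketch of one possible approach: matching State against a vector of names with %in%. The construction of us_arrests is an assumption.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# %in% tests each State value for membership in the vector of names.
three_states <- us_arrests %>%
  filter(State %in% c("California", "Texas", "New York"))
three_states
```

Rows come back in the dataset's original order, not in the order the names are listed.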
Select only the State and Murder
columns.
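A sketch of the column selection, under the same assumption about how us_arrests is built:

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# select() keeps the named columns, in the order given, for all rows.
state_murder <- us_arrests %>%
  select(State, Murder)
state_murder
```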
| State | Murder |
|---|---|
| Alabama | 13.2 |
| Alaska | 10.0 |
| Arizona | 8.1 |
| Arkansas | 8.8 |
| California | 9.0 |
| Colorado | 7.9 |
| Connecticut | 3.3 |
| Delaware | 5.9 |
| Florida | 15.4 |
| Georgia | 17.4 |
| Hawaii | 5.3 |
| Idaho | 2.6 |
| Illinois | 10.4 |
| Indiana | 7.2 |
| Iowa | 2.2 |
| Kansas | 6.0 |
| Kentucky | 9.7 |
| Louisiana | 15.4 |
| Maine | 2.1 |
| Maryland | 11.3 |
| Massachusetts | 4.4 |
| Michigan | 12.1 |
| Minnesota | 2.7 |
| Mississippi | 16.1 |
| Missouri | 9.0 |
| Montana | 6.0 |
| Nebraska | 4.3 |
| Nevada | 12.2 |
| New Hampshire | 2.1 |
| New Jersey | 7.4 |
| New Mexico | 11.4 |
| New York | 11.1 |
| North Carolina | 13.0 |
| North Dakota | 0.8 |
| Ohio | 7.3 |
| Oklahoma | 6.6 |
| Oregon | 4.9 |
| Pennsylvania | 6.3 |
| Rhode Island | 3.4 |
| South Carolina | 14.4 |
| South Dakota | 3.8 |
| Tennessee | 13.2 |
| Texas | 12.7 |
| Utah | 3.2 |
| Vermont | 2.2 |
| Virginia | 8.5 |
| Washington | 4.0 |
| West Virginia | 5.7 |
| Wisconsin | 2.6 |
| Wyoming | 6.8 |
Select the State, Assault, and
UrbanPop columns.
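Selecting three columns follows the same form. The construction of us_arrests is again an assumption.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

state_assault_urban <- us_arrests %>%
  select(State, Assault, UrbanPop)
state_assault_urban
```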
| State | Assault | UrbanPop |
|---|---|---|
| Alabama | 236 | 58 |
| Alaska | 263 | 48 |
| Arizona | 294 | 80 |
| Arkansas | 190 | 50 |
| California | 276 | 91 |
| Colorado | 204 | 78 |
| Connecticut | 110 | 77 |
| Delaware | 238 | 72 |
| Florida | 335 | 80 |
| Georgia | 211 | 60 |
| Hawaii | 46 | 83 |
| Idaho | 120 | 54 |
| Illinois | 249 | 83 |
| Indiana | 113 | 65 |
| Iowa | 56 | 57 |
| Kansas | 115 | 66 |
| Kentucky | 109 | 52 |
| Louisiana | 249 | 66 |
| Maine | 83 | 51 |
| Maryland | 300 | 67 |
| Massachusetts | 149 | 85 |
| Michigan | 255 | 74 |
| Minnesota | 72 | 66 |
| Mississippi | 259 | 44 |
| Missouri | 178 | 70 |
| Montana | 109 | 53 |
| Nebraska | 102 | 62 |
| Nevada | 252 | 81 |
| New Hampshire | 57 | 56 |
| New Jersey | 159 | 89 |
| New Mexico | 285 | 70 |
| New York | 254 | 86 |
| North Carolina | 337 | 45 |
| North Dakota | 45 | 44 |
| Ohio | 120 | 75 |
| Oklahoma | 151 | 68 |
| Oregon | 159 | 67 |
| Pennsylvania | 106 | 72 |
| Rhode Island | 174 | 87 |
| South Carolina | 279 | 48 |
| South Dakota | 86 | 45 |
| Tennessee | 188 | 59 |
| Texas | 201 | 80 |
| Utah | 120 | 80 |
| Vermont | 48 | 32 |
| Virginia | 156 | 63 |
| Washington | 145 | 73 |
| West Virginia | 81 | 39 |
| Wisconsin | 53 | 66 |
| Wyoming | 161 | 60 |
Retrieve only the states where Assault is greater than
200, but display only the State and Assault
columns.
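Combining the two verbs in one pipeline might look like the sketch below. The construction of us_arrests is an assumption.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# Filter rows first, then keep only the two columns to display.
high_assault <- us_arrests %>%
  filter(Assault > 200) %>%
  select(State, Assault)
high_assault
```

Filtering before selecting matters when the filter column is not one of the displayed columns. Here either order works because Assault is kept.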
| State | Assault |
|---|---|
| Alabama | 236 |
| Alaska | 263 |
| Arizona | 294 |
| California | 276 |
| Colorado | 204 |
| Delaware | 238 |
| Florida | 335 |
| Georgia | 211 |
| Illinois | 249 |
| Louisiana | 249 |
| Maryland | 300 |
| Michigan | 255 |
| Mississippi | 259 |
| Nevada | 252 |
| New Mexico | 285 |
| New York | 254 |
| North Carolina | 337 |
| South Carolina | 279 |
| Texas | 201 |
Retrieve only the states where Murder is below 5 and
UrbanPop is above 70, but display only State,
Murder, and UrbanPop.
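Two conditions can be passed to filter() separated by a comma, which combines them with a logical AND. This is a sketch under the same assumption about us_arrests.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# Comma-separated conditions in filter() must all be TRUE for a row to be kept.
low_murder_urban <- us_arrests %>%
  filter(Murder < 5, UrbanPop > 70) %>%
  select(State, Murder, UrbanPop)
low_murder_urban
```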
| State | Murder | UrbanPop |
|---|---|---|
| Connecticut | 3.3 | 77 |
| Massachusetts | 4.4 | 85 |
| Rhode Island | 3.4 | 87 |
| Utah | 3.2 | 80 |
| Washington | 4.0 | 73 |
Create a new variable high_murder that equals
1 if Murder is greater than the median
murder rate and 0 otherwise.
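A possible implementation uses mutate() with if_else(). The construction of us_arrests is an assumption, since the import step is not shown.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

# median(Murder) is computed over all 50 states, then each row is compared to it.
us_arrests <- us_arrests %>%
  mutate(high_murder = if_else(Murder > median(Murder), 1, 0))

count(us_arrests, high_murder)  # how many states fall in each class
```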
Create a scatter plot to show the relationship between the number of
assault arrests (Assault) and murder arrests
(Murder).

ggplot(us_arrests, aes(x = Assault, y = Murder)) +
  geom_point(alpha = 0.6, color = "blue") +
  labs(x = "Assault", y = "Murder")

Add a linear trend line to the plot.

ggplot(us_arrests, aes(x = Assault, y = Murder)) +
  geom_point(alpha = 0.6, color = "blue") +
  labs(x = "Assault", y = "Murder") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  theme_minimal()

Create histograms for Murder and UrbanPop to understand their distributions.
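
# Murder ranges from about 1 to 17, so bins one unit wide show the shape;
# a binwidth of 0.1 would leave most bins empty with only 50 states.
ggplot(us_arrests, aes(x = Murder)) +
  geom_histogram(binwidth = 1, fill = "blue", color = "black", alpha = 0.7) +
  labs(x = "Murder", y = "Count") +
  theme_minimal()

# UrbanPop is a whole-number percentage, so wider bins of 5 points are used.
ggplot(us_arrests, aes(x = UrbanPop)) +
  geom_histogram(binwidth = 5, fill = "blue", color = "black", alpha = 0.7) +
  labs(x = "UrbanPop", y = "Count") +
  theme_minimal()

Calculate the mean murder rate for states classified as
high_murder (1) and not high_murder (0).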
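The group means can be sketched with group_by() and summarise(). The block recreates high_murder so that it runs on its own. The construction of us_arrests and the names murder_by_group and mean_murder are assumptions made for illustration.

```r
library(dplyr)

# Assumption: us_arrests is rebuilt from R's built-in USArrests.
us_arrests <- tibble::rownames_to_column(USArrests, "State") %>%
  select(-Rape)

murder_by_group <- us_arrests %>%
  mutate(high_murder = if_else(Murder > median(Murder), 1, 0)) %>%
  group_by(high_murder) %>%                # one group per class (0 and 1)
  summarise(mean_murder = mean(Murder))    # mean murder rate within each class
murder_by_group
```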