DS Labs - Jason Laucel

library(tidyverse)

library(ggplot2)
library(highcharter)

library(readr)
library(dslabs)

# Load data manipulation library
library(dplyr)
# Keep the four hand-picked states, one from each region
murders_filtered <- murders %>%
  filter(state %in% c("Texas", "California", "New York", "Missouri"),
         region %in% c("South", "West", "Northeast","North Central"))
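The four states above were chosen by hand. The selection can also be done in code with dplyr's `group_by()` and `slice_max()`, which keep the state with the highest total murders in each region. This is a minimal sketch, not the method used for the chart. The name `top_by_region` is hypothetical. Ranking by `total` rather than `population` is an assumption, and the two criteria can pick different states within a region.

```r
library(dplyr)
library(dslabs)

# For each region, keep the single state with the most total murders
# (ranking by total is an assumption; population is another option)
top_by_region <- murders %>%
  group_by(region) %>%
  slice_max(order_by = total, n = 1, with_ties = FALSE) %>%
  ungroup() %>%
  select(state, region, population, total)

print(top_by_region)
```

To rank by population instead, replace `order_by = total` with `order_by = population`.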

# Interactive Scatterplot
highchart() %>%
  
  # Add one scatter series per region/state; all four calls follow the same format
  hc_add_series(data = filter(murders_filtered, region == "South"),
                type = "scatter",
                hcaes(x = population, y = total, group = state),
                name = "South: Texas",
                color = "Black",
                # Marker shape and size for each point
                marker = list(symbol = "circle", radius = 5)) %>%
  hc_add_series(data = filter(murders_filtered, region == "West"),
                type = "scatter",
                hcaes(x = population, y = total, group = state),
                name = "West: California",
                color = "Green",
                marker = list(symbol = "circle", radius = 5)) %>%
  hc_add_series(data = filter(murders_filtered, region == "Northeast"),
                type = "scatter",
                hcaes(x = population, y = total, group = state),
                name = "Northeast: New York",
                color = "Red",
                marker = list(symbol = "circle", radius = 5)) %>%
  hc_add_series(data = filter(murders_filtered, region == "North Central"),
                type = "scatter",
                hcaes(x = population, y = total, group = state),
                name = "North Central: Missouri",
                color = "Purple",  
                marker = list(symbol = "circle", radius = 5)) %>%
  # Place x-axis ticks every 5 million people
  hc_xAxis(title = list(text = "Population"), tickInterval = 5000000) %>%
  hc_yAxis(title = list(text = "Total Murders")) %>%
  hc_legend(layout = "vertical", align = "right", verticalAlign = "middle") %>%
  hc_title(text = "Largest Total Murders and Population Count by Region and State")
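A similar chart can be built with a single `hc_add_series()` call. Putting `group` inside `hcaes()` tells highcharter to split the data into one series per group. This is a sketch under a few assumptions:

- The `label` column is a hypothetical helper that recreates the "Region: State" legend names.
- `hc_colors()` assigns the palette in series order, so the state-to-color pairing may not match the chart above.
- The `hc_tooltip()` format string is illustrative only.

```r
library(dplyr)
library(dslabs)
library(highcharter)

# Same four hand-picked states; each state belongs to exactly one region,
# so no separate region filter is needed
murders_filtered <- murders %>%
  filter(state %in% c("Texas", "California", "New York", "Missouri")) %>%
  # Hypothetical helper column used as the legend name, e.g. "South: Texas"
  mutate(label = paste0(region, ": ", state))

# group = label makes highcharter create one scatter series per state
chart <- highchart() %>%
  hc_add_series(data = murders_filtered, type = "scatter",
                mapping = hcaes(x = population, y = total, group = label),
                marker = list(symbol = "circle", radius = 5)) %>%
  # Colors are applied to the series in the order they were created
  hc_colors(c("Black", "Green", "Red", "Purple")) %>%
  hc_xAxis(title = list(text = "Population"), tickInterval = 5000000) %>%
  hc_yAxis(title = list(text = "Total Murders")) %>%
  # Hover text; {point.x:,.0f} formats population with thousands separators
  hc_tooltip(pointFormat = "Population: {point.x:,.0f}<br>Total murders: {point.y}") %>%
  hc_legend(layout = "vertical", align = "right", verticalAlign = "middle") %>%
  hc_title(text = "Largest Total Murders and Population Count by Region and State")

# Print `chart` in an interactive session or R Markdown chunk to render it
```

The grouped form avoids repeating four nearly identical `hc_add_series()` blocks. The trade-off is less direct control over which color goes to which state.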

I chose the murders.csv data set for this assignment. I wanted to focus on the data points that were largest and stood out from the rest, so I hand-picked one state from each of the four regions with the highest population and total murders. I was surprised to see Michigan as the highest for North Central; I had Illinois in mind, but the data said otherwise. I used an interactive Highcharts (highcharter) scatter plot to organize the visualization. I like how it came out because it is interactive and lets users see the specific numbers for each region and state. While working on this, I thought about one of the visualizations about deaths that we went over in class. In the future, I want to give an assignment like this more weight, especially when the topic is as sensitive as this one.