Tutorial 1: Dynamic Data Visualization

Data in Use

This tutorial explores the interactive visualization capabilities of animated graphics in R using the plotly library. Before visualizing the data, a basic exploratory data analysis (EDA) is conducted to assess the quality of the data and get familiar with the dataset.

The dataset comes from the Kaggle community and covers crime rates across the US from 1975 to 2015. Historical data is particularly well suited to interactive data visualization, because an animated timeline can show how trends changed over the period in a way that is effective and easy for the end user to understand. The dataset was put together by The Marshall Project and contains more than 40 years of data on the four major crimes the FBI classifies as violent (homicide, rape, robbery and assault) in 68 police jurisdictions with populations of 250,000 or greater.

Dataset insight

Libraries and the Dataset in use

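# Loading the libraries used throughout the tutorial
library(dplyr)         # %>%, filter(), select(), distinct()
library(modelsummary)  # datasummary_skim()
library(ggplot2)       # static plots
library(ggpubr)        # ggarrange()
library(RColorBrewer)  # brewer.pal()
library(plotly)        # interactive and animated plots
library(crosstalk)     # SharedData, bscols(), filter_slider(), filter_checkbox()

# Disabling scientific notation
options(scipen=999)

# Loading the dataset (NA values are kept for the EDA)
data1 <- read.csv("report.csv")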

# Quick look at the data inside the dataframe provided using different basic R commands
str(data1) # Returns names of the columns, variable type and several observations
## 'data.frame':    2829 obs. of  15 variables:
##  $ report_year        : int  1975 1975 1975 1975 1975 1975 1975 1975 1975 1975 ...
##  $ agency_code        : chr  "NM00101" "TX22001" "GAAPD00" "CO00101" ...
##  $ agency_jurisdiction: chr  "Albuquerque, NM" "Arlington, TX" "Atlanta, GA" "Aurora, CO" ...
##  $ population         : int  286238 112478 490584 116656 300400 642154 864100 616120 422276 262103 ...
##  $ violent_crimes     : int  2383 278 8033 611 1215 1259 16086 11386 3350 1937 ...
##  $ homicides          : int  30 5 185 7 33 25 259 119 63 68 ...
##  $ rapes              : int  181 28 443 44 190 137 463 453 192 71 ...
##  $ assaults           : int  1353 132 3518 389 463 347 6309 3036 755 976 ...
##  $ robberies          : int  819 113 3887 171 529 750 9055 7778 2340 822 ...
##  $ months_reported    : int  12 12 12 12 12 12 12 12 12 12 ...
##  $ crimes_percapita   : num  833 247 1637 524 404 ...
##  $ homicides_percapita: num  10.48 4.45 37.71 6 10.99 ...
##  $ rapes_percapita    : num  63.2 24.9 90.3 37.7 63.2 ...
##  $ assaults_percapita : num  473 117 717 333 154 ...
##  $ robberies_percapita: num  286 100 792 147 176 ...
colnames(data1) # Returns all column names of the dataset
##  [1] "report_year"         "agency_code"         "agency_jurisdiction"
##  [4] "population"          "violent_crimes"      "homicides"          
##  [7] "rapes"               "assaults"            "robberies"          
## [10] "months_reported"     "crimes_percapita"    "homicides_percapita"
## [13] "rapes_percapita"     "assaults_percapita"  "robberies_percapita"
summary(data1) # Returns a more detailed summary for each variable in the dataset
##   report_year   agency_code        agency_jurisdiction   population     
##  Min.   :1975   Length:2829        Length:2829         Min.   : 100763  
##  1st Qu.:1985   Class :character   Class :character    1st Qu.: 377931  
##  Median :1995   Mode  :character   Mode  :character    Median : 536614  
##  Mean   :1995                                          Mean   : 795698  
##  3rd Qu.:2005                                          3rd Qu.: 816856  
##  Max.   :2015                                          Max.   :8550861  
##                                                        NA's   :69       
##  violent_crimes      homicides           rapes           assaults    
##  Min.   :    154   Min.   :    1.0   Min.   :  15.0   Min.   :   15  
##  1st Qu.:   3015   1st Qu.:   32.0   1st Qu.: 176.2   1st Qu.: 1467  
##  Median :   5136   Median :   64.0   Median : 291.0   Median : 2597  
##  Mean   :  29632   Mean   :  398.4   Mean   : 416.3   Mean   : 4405  
##  3rd Qu.:   9058   3rd Qu.:  131.0   3rd Qu.: 465.0   3rd Qu.: 4556  
##  Max.   :1932274   Max.   :24703.0   Max.   :3899.0   Max.   :71030  
##  NA's   :35        NA's   :34        NA's   :75       NA's   :76     
##    robberies      months_reported crimes_percapita  homicides_percapita
##  Min.   :    83   Min.   : 0.00   Min.   :  16.49   Min.   : 0.210     
##  1st Qu.:  1032   1st Qu.:12.00   1st Qu.: 625.08   1st Qu.: 6.955     
##  Median :  1940   Median :12.00   Median : 949.68   Median :11.980     
##  Mean   :  4000   Mean   :11.87   Mean   :1093.05   Mean   :15.373     
##  3rd Qu.:  3610   3rd Qu.:12.00   3rd Qu.:1409.51   3rd Qu.:20.230     
##  Max.   :107475   Max.   :12.00   Max.   :4352.83   Max.   :94.740     
##  NA's   :75       NA's   :137     NA's   :35        NA's   :34         
##  rapes_percapita  assaults_percapita robberies_percapita
##  Min.   :  1.64   Min.   :   1.61    Min.   :  11.46    
##  1st Qu.: 35.77   1st Qu.: 319.09    1st Qu.: 210.24    
##  Median : 55.90   Median : 487.48    Median : 374.40    
##  Mean   : 59.31   Mean   : 566.60    Mean   : 459.97    
##  3rd Qu.: 77.80   3rd Qu.: 728.24    3rd Qu.: 612.00    
##  Max.   :199.30   Max.   :2368.22    Max.   :2337.52    
##  NA's   :75       NA's   :76         NA's   :75
nlevels(as.factor(data1$agency_jurisdiction)) # Check the number of jurisdiction areas
## [1] 69
nlevels(as.factor(data1$report_year)) # Check the number of years reported
## [1] 41
  • The dataset contains a total of 2829 observations
  • Data is available on 15 variables, 2 of which are categorical and the rest numeric
  • 41 years of observations from 69 different jurisdiction areas result in 2829 total observations
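
The last point (rows = years x areas) is what one would expect from a balanced panel. A minimal base R sketch of how such a check could look is given below; the toy data frame and its values are hypothetical and only mimic the two relevant columns of the dataset.

```r
# Toy panel: 3 hypothetical jurisdiction areas observed over 4 years
toy <- expand.grid(
  report_year = 1975:1978,
  agency_jurisdiction = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

n_years <- length(unique(toy$report_year))          # number of distinct years
n_areas <- length(unique(toy$agency_jurisdiction))  # number of distinct areas

# In a balanced panel the number of rows equals years x areas
nrow(toy) == n_years * n_areas

# Every area-year combination should occur exactly once
all(table(toy$agency_jurisdiction, toy$report_year) == 1)
```

Applied to the tutorial data, the same two checks would use the ‘report_year’ and ‘agency_jurisdiction’ columns of data1.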

Dataset in focus

We will now use the ‘modelsummary’ library to showcase its capabilities for summarizing data (descriptive statistics) and, later on, statistical models.

As can be observed, ‘report_year’ is the only numeric variable without missing values, which is reasonable given the dataset structure. For the ‘population’ variable, only two jurisdiction areas report NAs: ‘United States’ and ‘Louisville, KY’. For ‘Louisville, KY’, the population data is missing up to and including 2002 (data is present for 2003 - 2015), while for ‘United States’ the ‘population’ data is missing entirely. Without population data the ‘per capita’-type variables cannot be calculated from this dataset, which results in NA values for them (demonstrated using the ‘homicides_percapita’ variable). However, for the ‘United States’ jurisdiction area the ‘percapita’-type variables are still present.

As stated in the dataset description provided by the data source regarding this calculation: “[The authors] calculated the rate of crime in each category and for all violent crime, per 100,000 residents in the jurisdiction, based on the FBI’s estimated population for that year. [The authors] used the 2014 estimated population to calculate 2015 crime rates per capita”.
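
The "per 100,000 residents" calculation can be illustrated with a short base R sketch. The three rows below are the first 1975 values printed by str(data1) earlier; the column name ‘homicides_rate’ is introduced here purely for illustration.

```r
# First three 1975 rows, values taken from the str(data1) output above
toy <- data.frame(
  agency_jurisdiction = c("Albuquerque, NM", "Arlington, TX", "Atlanta, GA"),
  population = c(286238, 112478, 490584),
  homicides  = c(30, 5, 185)
)

# Rate per 100,000 residents, rounded to two decimals
toy$homicides_rate <- round(toy$homicides / toy$population * 100000, 2)
toy$homicides_rate
# 10.48  4.45 37.71
```

These values agree with the first three entries of ‘homicides_percapita’ shown in the str(data1) output.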

A more in-depth analysis of the jurisdiction areas mentioned above could be conducted because of their unusual number of missing values. The need for it is noted, but it is not conducted in the present study since the main focus is visualization.

# Quick summary with default parameters 
datasummary_skim(data1) 
                     Unique (#)  Missing (%)  Mean      SD         Min       Median    Max
report_year          41          0            1995.0    11.8       1975.0    1995.0    2015.0
population           2741        2            795698.1  1012450.6  100763.0  536614.5  8550861.0
violent_crimes       2527        1            29632.5   172863.0   154.0     5135.5    1932274.0
homicides            522         1            398.4     2281.3     1.0       64.0      24703.0
rapes                879         3            416.3     479.8      15.0      291.0     3899.0
assaults             2281        3            4405.1    6977.3     15.0      2597.0    71030.0
robberies            2149        3            4000.2    8653.9     83.0      1940.0    107475.0
months_reported      13          5            11.9      1.1        0.0       12.0      12.0
crimes_percapita     2782        1            1093.0    676.9      16.5      949.7     4352.8
homicides_percapita  1873        1            15.4      12.4       0.2       12.0      94.7
rapes_percapita      2431        3            59.3      32.0       1.6       55.9      199.3
assaults_percapita   2724        3            566.6     369.4      1.6       487.5     2368.2
robberies_percapita  2707        3            460.0     340.9      11.5      374.4     2337.5
data1[!complete.cases(data1$population), ] %>%
  distinct(agency_jurisdiction)
##   agency_jurisdiction
## 1      Louisville, KY
## 2       United States
# Louisville and United States jurisdiction areas: missing values by year (report years are limited to those after 2001 for brevity; data for the other years was checked beforehand)
data1 %>%
  filter(agency_jurisdiction == 'Louisville, KY' & report_year > 2001) %>%
  select(population, report_year)
##    population report_year
## 1          NA        2002
## 2      623771        2003
## 3      624697        2004
## 4      623735        2005
## 5      626018        2006
## 6      624030        2007
## 7      629679        2008
## 8      631260        2009
## 9      660582        2010
## 10     665152        2011
## 11     666200        2012
## 12     671120        2013
## 13     677710        2014
## 14     680550        2015
data1 %>%
  filter(agency_jurisdiction == 'United States' & report_year > 2001) %>%
  select(population, report_year)
##    population report_year
## 1          NA        2002
## 2          NA        2003
## 3          NA        2004
## 4          NA        2005
## 5          NA        2006
## 6          NA        2007
## 7          NA        2008
## 8          NA        2009
## 9          NA        2010
## 10         NA        2011
## 11         NA        2012
## 12         NA        2013
## 13         NA        2014
## 14         NA        2015
# Presence/absence of the 'percapita'-type variables due to missing values
data1 %>%
  filter(agency_jurisdiction == 'Louisville, KY' & report_year > 2001) %>%
  select(report_year, homicides, homicides_percapita)
##    report_year homicides homicides_percapita
## 1         2002        NA                  NA
## 2         2003        50                8.02
## 3         2004        66               10.57
## 4         2005        55                8.82
## 5         2006        50                7.99
## 6         2007        71               11.38
## 7         2008        71               11.28
## 8         2009        62                9.82
## 9         2010        52                7.87
## 10        2011        48                7.22
## 11        2012        62                9.31
## 12        2013        48                7.15
## 13        2014        56                8.26
## 14        2015        81               11.90
data1 %>%
  filter(agency_jurisdiction == 'United States' & report_year > 2001) %>%
  select(report_year, homicides, homicides_percapita)
##    report_year homicides homicides_percapita
## 1         2002     16229                 5.6
## 2         2003     16528                 5.7
## 3         2004     16148                 5.5
## 4         2005     16740                 5.6
## 5         2006     17309                 5.8
## 6         2007     17128                 5.7
## 7         2008     16465                 5.4
## 8         2009     15399                 5.0
## 9         2010     14722                 4.8
## 10        2011     14661                 4.7
## 11        2012     14827                 4.7
## 12        2013     14319                 4.5
## 13        2014     14164                 4.4
## 14        2015     15696                 4.9

Brief look at NAs

Although the data has only a limited number of NA cases (no more than 5% of the total observations per variable), which are unlikely to affect further analysis for the purposes of the present tutorial, a brief additional EDA of the NA cases is conducted in what follows.
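
A per-variable share of missing values, such as the 5% figure quoted above, can be checked directly in base R. The sketch below uses a small hypothetical data frame rather than the tutorial data, so its percentages differ from those of the real dataset.

```r
# Toy data: 10 rows, with 1 missing population value and 2 missing rape counts
toy <- data.frame(
  report_year = 1975:1984,
  population  = c(NA, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  rapes       = c(NA, NA, 3, 4, 5, 6, 7, 8, 9, 10)
)

# Share of missing values per variable, in percent
na_share <- round(colMeans(is.na(toy)) * 100, 1)
na_share

# Does any variable exceed a 5% threshold?
any(na_share > 5)
```

Running colMeans(is.na(data1)) on the tutorial data would give the shares that datasummary_skim() reports in its ‘Missing (%)’ column, up to rounding.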

dataNA <- data1[!complete.cases(data1), ]

str(dataNA) # The basic dataset structure can be observed, including the total number of rows containing NAs from the original dataset that have been filtered into a new one
## 'data.frame':    141 obs. of  15 variables:
##  $ report_year        : int  1975 1975 1976 1976 1977 1977 1978 1978 1979 1979 ...
##  $ agency_code        : chr  "KY05680" "" "KY05680" "" ...
##  $ agency_jurisdiction: chr  "Louisville, KY" "United States" "Louisville, KY" "United States" ...
##  $ population         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ violent_crimes     : int  NA 1039710 NA 1004210 NA 1029580 NA 1085550 NA 1208030 ...
##  $ homicides          : int  NA 20510 NA 18780 NA 19120 NA 19560 NA 21460 ...
##  $ rapes              : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ assaults           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ robberies          : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ months_reported    : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ crimes_percapita   : num  NA 488 NA 468 NA ...
##  $ homicides_percapita: num  NA 9.6 NA 8.7 NA 8.8 NA 9 NA 9.8 ...
##  $ rapes_percapita    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ assaults_percapita : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ robberies_percapita: num  NA NA NA NA NA NA NA NA NA NA ...
datasummary_skim(dataNA) # Basic stats using the same package 
                     Unique (#)  Missing (%)  Mean      SD         Min       Median    Max
report_year          41          0            2003.3    13.7       1975.0    2013.0    2015.0
population           73          49           891732.2  1092370.1  191992.0  651480.0  8550861.0
violent_crimes       107         25           552730.0  707929.3   626.0     9149.5    1932274.0
homicides            100         24           7250.0    9318.6     8.0       145.0     24703.0
rapes                66          53           454.5     433.6      68.0      346.5     2244.0
assaults             66          54           3691.8    4365.8     234.0     2469.0    30546.0
robberies            67          53           2337.8    2784.0     270.0     1447.5    16946.0
months_reported      2           97           0.0       0.0        0.0       0.0       0.0
crimes_percapita     107         25           683.7     346.0      88.4      595.2     1817.1
homicides_percapita  97          24           10.7      10.0       1.2       8.1       59.3
rapes_percapita      67          53           55.2      28.3       5.9       53.1      151.6
assaults_percapita   66          54           437.2     247.4      35.1      382.6     1163.2
robberies_percapita  67          53           275.3     177.1      40.2      222.9     784.3

As can be seen from the basic statistics, 141 rows of the original dataset contain NA values; some rows have several variables missing and some only one. Count variables and their ‘percapita’ counterparts are matched in pairs, which naturally results in repeated NA shares: e.g. the share of NA cases for ‘assaults’ in the new data split equals 54 %, and the same 54 % is observed for the ‘assaults_percapita’ variable. In the summary table of the full dataset, the minimum of the ‘assaults’ variable is 15 cases, and the other variables of the same kind likewise never take the value 0. It is therefore unclear whether an NA indicates that no crimes were registered in the period in question or that no information on the metric is available at all.

table(dataNA[dataNA$agency_jurisdiction == 'Louisville, KY' | dataNA$agency_jurisdiction == 'United States', ]$agency_jurisdiction) 
## 
## Louisville, KY  United States 
##             29             41
# Alternative way to count rows with missing values by jurisdiction area and identify the areas that require additional analysis
# NB: only the two areas identified earlier are kept by the filter

As mentioned above, ‘Louisville, KY’ and ‘United States’ are the jurisdiction areas contributing most of the missing data (29 and 41 rows with at least one NA, respectively), while each of the remaining jurisdiction areas registered no more than 3 rows with NAs. Since these two areas account for most of the NA cases, further analysis of the missing data could be limited to them.
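
Counting incomplete rows for every jurisdiction area at once, rather than only for areas already known to be problematic, could be sketched as follows. The toy data frame and the object names ‘incomplete’ and ‘na_rows’ are hypothetical and used only for illustration.

```r
# Toy data: area A has two incomplete rows, B has one, C has none
toy <- data.frame(
  agency_jurisdiction = c("A", "A", "A", "B", "B", "C"),
  population = c(NA, NA, 10, 20, NA, 30),
  homicides  = c(1, NA, 3, 4, 5, 6)
)

# Flag rows that contain at least one NA
incomplete <- !complete.cases(toy)

# Number of incomplete rows per area, largest first
na_rows <- sort(table(toy$agency_jurisdiction[incomplete]), decreasing = TRUE)
na_rows
```

Areas without any incomplete rows (C in this toy example) do not appear in the result, so the output lists only the areas that may need a closer look.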

Static Visualizations

# Further visualizations are built on the dataset with NAs excluded 
data2 <- na.omit(data1)

# A quick glance at the distribution of the population variable
a1 <- ggplot(data2, aes(x = population)) +
  geom_histogram(fill="darkred") +
  labs(y="Frequency", x="Population") +
  theme_bw() 

# Taking a closer look at the most 'populated' area of the graph
a2 <- ggplot(data2, aes(x = population)) +
  geom_histogram(fill="red") +
  labs(y="Frequency", x="Population") +
  theme_bw() +
  xlim(0, 1000000)

ggarrange(a1, a2, 
          ncol = 2, nrow = 1)

Descriptive statistics for the ‘population’ variable (NAs excluded):

vars  n     mean      sd       median  trimmed   mad       min     max      range    skew      kurtosis  se
1     2688  793125.7  1010316  534129  596294.3  266661.9  100763  8473938  8373175  5.097839  30.26807  19486.9

A quick analysis of the distribution reveals that the values range from 100,763 to 8,473,938, with a mean of 793,125.7. The median (534,129) is smaller than the mean, indicating a clear right skew: most observations come from less populated areas, with a long tail of very large ones. This is also indicated by the positive skew value (5.097839). Since the kurtosis is large and positive (30.26807), the distribution is leptokurtic, i.e. heavy-tailed. Apart from that, the part of the distribution concentrated on the left side of the histogram looks surprisingly close to normal and resembles a bell curve.
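
The reasoning above (mean above median, positive skew, positive kurtosis) can be reproduced on a toy sample with plain moment estimators in base R. This is only a sketch: the sample is hypothetical, and dedicated packages may apply small-sample corrections, so their skew and kurtosis values can differ slightly from the ones computed this way.

```r
# Toy right-skewed sample: mostly small values plus one large outlier
x <- c(1, 2, 2, 3, 3, 3, 4, 4, 20)

m  <- mean(x)
s2 <- mean((x - m)^2)               # second central moment

skew <- mean((x - m)^3) / s2^1.5    # moment-based skewness
kurt <- mean((x - m)^4) / s2^2 - 3  # excess kurtosis (0 for a normal distribution)

c(mean = m, median = median(x), skew = skew, excess_kurtosis = kurt)
```

For this sample the mean exceeds the median and both skewness and excess kurtosis are positive, the same pattern as described for the ‘population’ variable.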

# Using the knowledge of the population distribution,
# the data can be divided into three approximately equal groups for the purposes of data visualization
data2$population <- as.numeric(as.character(data2$population))

data2$populationCH <- rep(NA, length(data2$population))

data2$populationCH[data2$population < 400000] <- "Less than 400k"

# >= is used so that boundary values (exactly 400000 or 650000) are not left as NA
data2$populationCH[data2$population >= 400000 & data2$population < 650000] <- "400k - 650k"

data2$populationCH[data2$population >= 650000]  <- "More than 650k"

# Explicit level order keeps the groups sorted by size in the plots
data2$populationCH <- factor(data2$populationCH, levels = c("Less than 400k", "400k - 650k", "More than 650k"))

# Visualization by separate population groups
b1 <- ggplot(data2, aes(x = populationCH, fill = populationCH)) +
  geom_bar() + 
  scale_fill_brewer(palette = "YlOrRd") +
  labs(x = "Population group", y = "N", title = "Distribution of population groups") + 
  theme_bw() +
  theme(legend.position = "none")

# after_stat() replaces the deprecated ..count.. notation
b2 <- ggplot(data2, aes(x = populationCH, y = after_stat(count / sum(count) * 100), fill = populationCH)) +
  geom_bar() + 
  scale_fill_brewer(palette = "YlOrRd") + 
  labs(x = "Population group", y = "%", title = "Distribution of population groups") + 
  theme_bw() + 
  theme(legend.position = "none") +
  # Percentage labels on top of the bars
  stat_count(aes(label = after_stat(round(count / sum(count) * 100, digits = 1))), vjust = 0,
             geom = "text", position = "identity")

ggarrange(b1, b2, 
          ncol = 2, nrow = 1)
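
As an alternative to assigning the population groups one condition at a time, the binning could be sketched with base R's cut(). The toy vector, the object name ‘population_group’ and the exact labels are illustrative assumptions; the break points follow the 400k and 650k thresholds used in this tutorial.

```r
# Toy populations, including values that sit exactly on the group boundaries
population <- c(150000, 400000, 520000, 650000, 900000)

population_group <- cut(
  population,
  breaks = c(-Inf, 400000, 650000, Inf),
  labels = c("Less than 400k", "400k - 650k", "More than 650k"),
  right  = FALSE  # intervals are [lower, upper), so 400000 falls into "400k - 650k"
)

table(population_group)
```

cut() returns a factor whose levels follow the order of the breaks, and with right = FALSE every value, including those exactly on a boundary, is assigned to a group.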

Ideally, in a full version of the EDA, similar static visualizations would be built for all the studied variables. They are omitted in the present tutorial.

Dynamic Visualizations

The following section of the tutorial contains various dynamic visualization plots built on the dataset in use, with the sole purpose of showcasing how helpful dynamic plots can be compared to their static counterparts when it comes to historical data such as the data studied in the present tutorial. Interactive plots

  • Can be updated based on actions performed by the user
  • Allow the user to hover over an object, zoom in etc.
  • Increase one’s ability to explore the story the data are telling

Dynamic plots included in data analysis reports, by contrast, do not require user input once the code to generate the graphics is written: they change automatically.

Note that in all the graphs below, the markers contain information on the jurisdiction area that the data point belongs to. The rest of the variables used in the graph are indicated on the axes.

The visualizations showcased in what follows could be used to report on the fluctuation of various crime-related trends over the studied years and to present them to various stakeholders in an easy-to-understand visual way.
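
Before turning to the crime data, the basic mechanics of a plotly animation can be sketched on a toy example. The data frame below is hypothetical. The ‘frame’ argument assigns each row to an animation frame, ‘ids’ tells plotly which points represent the same object across frames, and animation_opts() is used here, as one possible setting, to slow the animation down.

```r
library(plotly)

# Toy data: two hypothetical areas tracked over three years
toy <- data.frame(
  year = rep(2001:2003, times = 2),
  area = rep(c("A", "B"), each = 3),
  x    = c(1, 2, 3, 3, 2, 1),
  y    = c(1, 3, 2, 2, 1, 3)
)

p <- toy %>%
  plot_ly(x = ~x, y = ~y, hoverinfo = "text", text = ~area) %>%
  add_markers(frame = ~year, ids = ~area) %>%
  animation_opts(frame = 1000, transition = 500)  # durations in milliseconds

p
```

The plots in the rest of this section follow the same pattern, with ‘report_year’ as the frame variable and ‘agency_code’ as the id.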

Simple dynamic visualization with no parameters changed

#The original full dataset is used
data1 %>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(frame = ~report_year, ids = ~agency_code)%>%
  layout(title = "Relationship between crime types in police jurisdiction areas: robbery and homicide",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  hide_legend()

Updated slider: color and text size

a1 <- data1%>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(frame = ~report_year, ids = ~agency_code, marker = list(color = "darkred")) %>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  hide_legend() 
  
a1 %>%
  animation_slider(currentvalue = list(prefix = NULL, font = list(color = "darkred", size = 30))) 

Coloring according to population size as a way to add another variable to the plot

a2 <- data1%>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction)) %>%
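  # The YlOrRd palette provides at most 9 colors; plotly interpolates between them for the numeric scale
  add_markers(frame = ~report_year, ids = ~agency_code, color =~population, colors = brewer.pal(9, "YlOrRd")) %>%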
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  hide_legend()
  
a2 %>%
  animation_slider(currentvalue = list(prefix = NULL, font = list(color = "darkred", size = 30)))

Hidden slider, added text layer

data1%>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_text(x = 1250, y = 50, text = ~report_year, frame = ~report_year, 
           textfont = list(color = toRGB("gray80"), size = 200)) %>%
  add_markers(frame = ~report_year, ids = ~agency_code, marker = list(color = "darkred"))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita")) %>%
  animation_slider(hide = TRUE) %>%
  hide_legend()

New layer of markers

data1975 <- data1 %>%
  filter(report_year == 1975) 

data1 %>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_text(x = 1250, y = 50, text = ~report_year, frame = ~report_year, 
           textfont = list(color = toRGB("gray95"), size = 200)) %>%
  add_markers(data = data1975, marker = list(color = toRGB("gray20"), opacity = 0.5)) %>%
  add_markers(frame = ~report_year, ids = ~agency_code, data = data1, marker = list(color = "darkred"))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  animation_slider(hide = TRUE)%>%
  hide_legend()

Linking plots and Filtering

Clean base plot

shared_crime <- SharedData$new(data1975)

cols <- toRGB(RColorBrewer::brewer.pal(3, "PRGn"))

p1 <- shared_crime%>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(marker = list(color = "red",size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  hide_legend()
  
p2 <- shared_crime%>%
  plot_ly(x = ~robberies_percapita, y = ~rapes_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(marker = list(color = "darkred",size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and rape, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Rapes per capita"))%>%
  hide_legend()

subplot(p1, p2, titleY = TRUE, titleX = TRUE, shareX = TRUE)%>%
  hide_legend()

Manual selectizer of a group

data1975N <- data2 %>%
  filter(report_year == 1975)

shared_crime2 <- SharedData$new(data1975N, key =~ populationCH)

p3 <- shared_crime2 %>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  group_by(populationCH)%>%
  add_markers(marker = list(color = "red",size = 15)) %>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita"))%>%
  hide_legend() 
  
p4 <- shared_crime2%>%
  plot_ly(x = ~robberies_percapita, y = ~rapes_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(marker = list(color = "darkred", size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and rape, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Rapes per capita"))%>%
  hide_legend()

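# selectize = TRUE adds a dropdown for selecting a population group manually
subplot(p3, p4, titleY = TRUE, titleX = TRUE, shareX = TRUE) %>%
  hide_legend() %>%
  highlight(selectize = TRUE)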

Filtering using a slider

p5 <- shared_crime %>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction))%>%
  add_markers(marker = list(color = "red", size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide, 1975",
         xaxis = list(title = "Robberies per capita", range = c(0, 1500)),
         yaxis = list(title = "Homicides per capita", range = c(0, 50))) %>%
  hide_legend()
  
p6 <- shared_crime%>%
  plot_ly(x = ~robberies_percapita, y = ~rapes_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction)) %>%
  add_markers(marker = list(color = "darkred", size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and rape, 1975",
         xaxis = list(title = "Robberies per capita", range = c(0, 1500)),
         yaxis = list(title = "Rapes per capita", range = c(0, 100))) %>%
  hide_legend() 

bscols(list(p5, p6, filter_slider(id = "robberies_percapita", label = "Robberies", sharedData = shared_crime, column = ~robberies_percapita)))

Filtering using a checkbox

shared_crime2 <- SharedData$new(data1975N, key =~ populationCH)

p7 <- shared_crime2 %>%
  plot_ly(x = ~robberies_percapita, y = ~homicides_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction), color =~ populationCH)%>%
  add_markers(marker = list(color = "red", size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and homicide, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Homicides per capita")) %>%
  hide_legend()
  
p8 <- shared_crime2 %>%
  plot_ly(x = ~robberies_percapita, y = ~rapes_percapita, hoverinfo = "text",
          text = ~paste("Agency jurisdiction:", agency_jurisdiction), color =~ populationCH)%>%
  add_markers(marker = list(color = "darkred", size = 15))%>%
  layout(title = "Relationship between crimes in police jurisdiction areas: robbery and rape, 1975",
         xaxis = list(title = "Robberies per capita"),
         yaxis = list(title = "Rapes per capita"))%>%
  hide_legend()

bscols(list(p7, p8, filter_checkbox(id = "populationCH",  label = "Area population",  sharedData = shared_crime2, group =~ populationCH)))