It comes as no surprise that Australians love their beaches. The country’s second-largest state, Queensland, is home to some of the world’s best beaches. These beaches are great for swimming and surfing, but it is important to be aware that sharks inhabit Australia’s coastlines.
In 1962, the Queensland Government initiated a Shark Control Program (SCP) to provide swimmer protection at the state’s most popular swimming beaches. Between 1916 and 1962, before the program was in place, Queensland recorded 36 shark attacks, 19 of them fatal. Since the SCP was implemented in 1962, there has been only one fatal shark attack on a protected beach.
The program relies on nets, drumlines (baited hooks) or a combination of both to minimise the threat of shark attacks to humans. The equipment is set offshore, adjacent to the beach, and is intended to reduce the likelihood of shark attacks by catching large, potentially dangerous sharks that might be moving through the area.
The dataset has been sourced from the Queensland Government through its Open Data Portal (https://www.data.qld.gov.au/dataset/shark-control-program-shark-catch-statistics/resource/5c6be990-3938-4125-8cca-dac0cd734263). The dataset contains the species of sharks caught from 2001 to 2016, the date, area, location, length and number of sharks caught by the program.
1.1 One of the biggest challenges is that the original presentation of the data is not in a format that can be readily used. As shown in the screenshot below, each year is represented as a separate tab. The data is categorized by Species Name and there are totals for each Species Name, which will not be needed for our purposes. Lastly, the latitude and longitude values are in a degree format (°), which R cannot read as numbers directly.
The proposed solution is to select five years’ worth of data, from 2012 to 2016, since manual effort will be required to restructure the data. The data for each year is to be copied onto one tab and a new column will be created to reflect the year. The totals in between each Species Name will also be removed. The Species Name will also be dragged down so each row reflects the name of the species. To address the longitude and latitude values, the degree values will be converted to decimal format in Excel by treating the remaining digits as minutes and seconds. For instance, 145°39.78 will be converted to 145.6716667 by using the formula 145 + (39/60) + (78/3600).
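The same conversion can be scripted instead of typed into Excel. Below is a minimal sketch in R; the helper name dms_to_decimal is hypothetical and not part of the original workflow. It follows the formula above in reading the digits after the decimal point as seconds. If the source values are in fact degrees and decimal minutes, the second call is the relevant one.

```r
# Hypothetical helper: degrees, minutes and seconds -> decimal degrees.
# Assumes non-negative degrees; a negative (southern or western) value
# would need its sign handled separately.
dms_to_decimal <- function(deg, min, sec = 0) {
  deg + min / 60 + sec / 3600
}

# 145°39.78 read as 145 degrees, 39 minutes, 78 seconds (the formula in the text)
dms_to_decimal(145, 39, 78)

# Alternative reading: 145 degrees and 39.78 decimal minutes
dms_to_decimal(145, 39.78)
```

The two readings differ by less than 0.01 of a degree, but that is roughly a kilometre on the ground, so it is worth confirming which format the source uses.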
After making the above-mentioned adjustments, the formatted data appears as follows.
1.2 Another challenge is the high cardinality of the categorical variables. There are 35 different species of sharks found in 93 different locations. Plotting this number of species and locations in a single graph will make the visualization cluttered and harder to interpret. To address this, only the top 5-10 sharks/locations will be analyzed for simplicity. Also, instead of plotting each shark at a given location, clusters of sharks will be plotted for easier interpretability, and the user can interactively delve deeper into each cluster.
1.3 Lastly, basic ggplot2 does not provide much room for interactivity. The plotly package can be used to allow user engagement, but its maps are very basic and do not show terrain and bodies of water. To create maps that are more visually appealing and interactive, the proposed solution is to use the leaflet package, an R interface to Leaflet, an open-source JavaScript library for interactive map making.
The sketched visualizations display maps showing shark species and the locations they were caught.
To begin with, all necessary packages including sp, tidyverse, leaflet, htmltools, ggplot2, ggridges, forcats, and viridis are installed and loaded into the R environment.
packages <- c('sp', 'tidyverse', 'leaflet', 'htmltools', 'ggplot2', 'ggridges', 'forcats', 'viridis')
for (p in packages) {
  # install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
Next, the data is imported into R using the read_csv function. As previously mentioned, formatting and restructuring of the raw data was already completed using Excel so the formatted dataset was uploaded to R.
sharks <- read_csv("Formatted Data_July29.csv")
The Longitude and Latitude columns are converted to numeric values using the mutate() function, and the result is assigned back to sharks.
sharks <- sharks %>%
  mutate(`Longitude (deg)` = as.numeric(`Longitude (deg)`),
         `Latitude (deg)` = as.numeric(`Latitude (deg)`))
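A quick sanity check after a numeric conversion like this can catch values that were left in degree format. The sketch below uses made-up toy values rather than the real dataset, and the longitude bounds are a rough assumption for Queensland, not something taken from the source.

```r
# Toy values standing in for a coordinate column read as text (assumed data)
lon_text <- c("153.43", "145.67", "145°39.78")

# as.numeric() returns NA (with a warning) for anything it cannot parse
lon <- suppressWarnings(as.numeric(lon_text))

# Positions of values that failed to convert and could not be placed on a map
bad_rows <- which(is.na(lon))
bad_rows

# Rough plausibility check on the values that did convert (bounds are assumed)
all(lon[!is.na(lon)] > 135 & lon[!is.na(lon)] < 160)
```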
Initial exploratory analysis is performed to understand the variables and their distribution better. Firstly, the number of sharks caught over time is examined. Only the relevant variables being presented in the visualization are selected (Number Caught and Year). The dataframe is then grouped by Year, and the Number Caught values are aggregated.
line <- sharks %>%
  select(`Number Caught`, Year) %>%
  group_by(Year) %>%
  summarise(Total_Count = sum(`Number Caught`))
Once the appropriate dataframe is created, ggplot2 is used to create a line chart showing the trend in sharks caught from 2012 to 2016. geom_line() is used to plot the line and the y-axis scale is set from 0 to 1000. It can be seen from Figure 1 that the number of sharks caught has been declining overall, except in 2015 when it increased from the prior year.
ggplot(line, aes(x=Year, y=Total_Count))+
geom_line(color="lightblue", size=2)+ylim(0, 1000)+geom_text(aes(label=Total_Count), hjust=0,vjust=0)+theme_classic()+labs(title="Figure 1: Total Number of Sharks Caught, 2012-2016", x="Year", y="Total Count", caption="Source: Queensland Government Open Data Portal")+theme(plot.caption = element_text(hjust = 1, face = "italic"))
Since there is data on the species type, it would be interesting to know which species of sharks have been caught the most. A new dataframe called lollipop is constructed by selecting Species Name and Number Caught. Similar to the previous visual, the dataframe is grouped by the Species Name and Total Number Caught is aggregated. The dataframe is then sorted in descending order by Total Count.
lollipop <- sharks %>%
select(`Species Name`, `Number Caught`) %>%
group_by(`Species Name`) %>%
summarise(Total_Count = sum(`Number Caught`)) %>%
arrange(desc(Total_Count)) %>%
mutate(`Species Name`=fct_reorder(`Species Name`, (Total_Count)))
A lollipop chart, which essentially conveys the same information as a bar chart, is created by using geom_point() and geom_segment(). Each lollipop is labeled with the total count for its species. The coord_flip() function is used to swap the x and y axes so that the species names are easier to read on the vertical axis.
ggplot(lollipop, aes(x=`Species Name`, y=Total_Count, label=Total_Count))+geom_segment(aes(x=`Species Name`, xend=`Species Name`, y=0, yend=Total_Count), color="grey")+geom_point(size=7, color="skyblue")+coord_flip()+geom_text(color="black", size=3)+labs(title="Figure 2: Species of Sharks Caught, 2012-2016", y="Total Count", caption="Source: Queensland Government Open Data Portal")+theme(plot.caption = element_text(hjust = 1, face = "italic"))
By far, the Tiger Shark has been caught most often, followed by the Bull Whaler and the Blacktip Reef Whaler. The Tiger Shark and Bull Whaler are among the deadliest species found in Australian waters and have clearly been targeted by Queensland’s SCP. By comparison, the Blacktip Reef Whaler is not usually dangerous but may attack the legs or feet of people wading in shallow water.
The Great White Shark (also known as the White Shark) is the largest and deadliest of all predatory fish, yet only 42 were captured from 2012 to 2016. Possible explanations include its massive size and weight and its declining population; it has been declared a vulnerable species.
Next, a new dataframe called top10sharks is created by filtering for the 10 species caught most often by the SCP in Queensland.
top10sharks <- sharks %>%
  # %in% tests membership; == would recycle the vector and silently drop rows
  filter(`Species Name` %in% c('TIGER SHARK', 'BULL WHALER', 'BLACKTIP REEF WHALER', 'LONG NOSE WHALER', 'TAWNY SHARK', 'SPOT-TAIL WHALER', 'GREAT HAMMERHEAD', 'SCALLOPED HAMMERHEAD', 'PIGEYE WHALER', 'DUSKY WHALER'))
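When filtering a column against several allowed values, the choice between == and %in% matters. The toy vectors below are made up for illustration; they show that == compares element by element, recycling the shorter vector, whereas %in% checks each element for membership.

```r
species <- c("TIGER SHARK", "BULL WHALER", "TIGER SHARK", "BULL WHALER")
wanted  <- c("BULL WHALER", "TIGER SHARK")

# == recycles `wanted` along `species`, so every pair happens to mismatch
species == wanted
# FALSE FALSE FALSE FALSE

# %in% asks, for each element, whether it appears anywhere in `wanted`
species %in% wanted
# TRUE TRUE TRUE TRUE
```

With ==, a filter can return far fewer rows than expected without raising an error, which is why membership tests are usually written with %in%.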
This visual compares the size distribution of the top 10 sharks caught. The geom_density_ridges() function is used to calculate and plot the density estimate of shark length in meters. Additional formatting changes the color palette of the visual and removes the grid lines.
ggplot(top10sharks, aes(x = `Length (m)`, y = `Species Name`, fill = `Species Name`)) +
  geom_density_ridges() +
  scale_fill_viridis(discrete = TRUE) +
  theme_ridges(grid = FALSE, center_axis_labels = TRUE) +
  labs(title = "Figure 3: Distribution of Size for Top 10 Sharks Caught, 2012-2016", x = "Length (m)", y = " ", fill = "Shark Species", caption = "Source: Queensland Government Open Data Portal") +
  theme(plot.caption = element_text(hjust = 1, face = "italic"))
As illustrated, certain sharks are relatively small, such as the Blacktip Reef Whaler and Spot-tail Whaler. The fact that the Blacktip Reef Whaler is not typically dangerous is likely due to its smaller size. The sizes of the Tiger Shark and Great Hammerhead vary widely, with some reaching 4 meters and beyond!
Furthermore, we will look at another dimension, namely location of where the sharks were captured. A new dataframe called barchart is created by selecting the Location and Number Caught variables - the dataframe is then grouped by Location and the number of sharks caught for each location is summed and presented in descending order. The slice function is used to show data for the top 9 locations.
barchart <- sharks %>%
select(Location, `Number Caught`) %>%
group_by(Location) %>%
summarise(Total_Count = sum(`Number Caught`)) %>%
arrange(desc(Total_Count)) %>%
slice(1:9) %>%
mutate(Location=fct_reorder(Location, (Total_Count)))
To plot a barchart, geom_bar() is used and the x and y axes are swapped for easier readability of location names. Labels have been included so the count is easier to compare.
ggplot(barchart, aes(x = Location, y = Total_Count, fill = Location)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_brewer(palette = "Blues") +
  theme_classic() +
  geom_text(aes(label = Total_Count, hjust = -0.2), colour = "black", size = 3.5) +
  labs(title = "Figure 4: Locations by Total Number of Sharks Captured, 2012-2016", y = "Total Count", x = " ", caption = "Source: Queensland Government Open Data Portal") +
  theme(plot.caption = element_text(hjust = 1, face = "italic"))
As seen in Figure 4, Tannum Sands and Rainbow Beach are the locations with the highest number of sharks caught. Tannum Sands is a local hot spot with small variation in seasonal water temperatures, making it perfect for year-round swimming. The government has recently installed additional drumlines to ensure swimmer safety at Tannum Sands.
We will now delve further into Location. The leaflet package will be used to generate maps with the use of longitude and latitude values of locations sharks were caught at. Icons are used as markers to show these locations. To customize the icon, a shark image is externally sourced and the makeIcon() function is used to create an icon.
sharkIcon <- makeIcon(
  iconUrl = "http://vidoukin.com/wp-content/uploads/leaflet-maps-marker-icons/shark-export.png",
  iconWidth = 20, iconHeight = 20,
  # anchor at the centre of the 20x20 icon so it sits on the catch location
  iconAnchorX = 10, iconAnchorY = 10
)
The original sharks dataframe is used to plot the capture locations on a map. The leaflet package allows a tile layer to be added to the map, and “Esri.NatGeoWorldMap” was selected. Markers are added at the longitude and latitude values and displayed as shark icons. The map is interactive and allows the user to zoom into locations. Hovering over a shark icon displays the location where the shark was caught. It can be seen from the visual that the Shark Control Program stretches along the eastern coast of Queensland, all the way from Cairns to the Gold Coast.
leaflet(data = sharks) %>%
  addProviderTiles("Esri.NatGeoWorldMap") %>%
  addMarkers(~`Longitude (deg)`, ~`Latitude (deg)`, icon = sharkIcon, label = ~htmlEscape(Location))
The next map clusters the markers from the previous map. This is done using the Leaflet.markercluster plug-in. As the user clicks and digs deeper into each cluster, details on the area, location and shark species are displayed.
sharks$X <- paste0("<strong>Area: </strong>",
sharks$Area,
"<br><strong>Location: </strong>",
sharks$Location,
"<br><strong>Species Name: </strong>",
sharks$`Species Name`,
"<br>","<strong>Number Caught:</strong>",
sharks$`Number Caught`)
labs <- as.list(sharks$X)
leaflet(data = sharks) %>% addProviderTiles("Esri.NatGeoWorldMap") %>%
addMarkers(~sharks$`Longitude (deg)`, ~sharks$`Latitude (deg)`, icon=sharkIcon, label = lapply(labs, HTML), clusterOptions = markerClusterOptions())
The last visual focuses on the top 5 sharks caught. These sharks are filtered into a new dataframe called top5sharks, and a color is assigned to each shark species.
top5sharks <- sharks %>%
  # %in% tests membership; == would recycle the vector and silently drop rows
  filter(`Species Name` %in% c('TIGER SHARK', 'BULL WHALER', 'BLACKTIP REEF WHALER', 'LONG NOSE WHALER', 'TAWNY SHARK'))
pal <- colorFactor(c("navy", "lightgreen", "purple", "magenta", "maroon"), domain = c('TIGER SHARK', 'BULL WHALER', 'BLACKTIP REEF WHALER', 'LONG NOSE WHALER', 'TAWNY SHARK'))
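The species list above is typed by hand from the earlier ranking. As an alternative, the top species could be derived from the data itself. The sketch below uses a made-up toy data frame with hypothetical column names (species, caught) standing in for Species Name and Number Caught; it is not the original workflow.

```r
# Toy data standing in for the sharks data frame (values are made up)
toy <- data.frame(
  species = c("A", "B", "A", "C", "B", "A"),
  caught  = c(2, 1, 3, 1, 4, 1)
)

# Total caught per species, largest first
totals <- sort(tapply(toy$caught, toy$species, sum), decreasing = TRUE)

# Names of the top 2 species, usable on the right-hand side of %in%
top_n_species <- names(totals)[1:2]

# Keep only the rows belonging to those species
toy_top <- toy[toy$species %in% top_n_species, ]
```

Deriving the list this way keeps the filter in step with the data if the selected years change.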
The locations of where the top 5 sharks were caught have been marked by circle markers and a legend shows which color corresponds to each shark species. Similar to the previous visual, the user can hover over the circle markers and important details on the area, location and shark species will be displayed.
top5sharks$X <- paste0("<strong>Area: </strong>",
top5sharks$Area,
"<br><strong>Location: </strong>",
top5sharks$Location,
"<br><strong>Species Name: </strong>",
top5sharks$`Species Name`,
"<br>","<strong>Number Caught:</strong>",
top5sharks$`Number Caught`)
labs2 <- as.list(top5sharks$X)
leaflet(data = top5sharks) %>% addProviderTiles("Esri.NatGeoWorldMap") %>%
addCircleMarkers(~top5sharks$`Longitude (deg)`, ~top5sharks$`Latitude (deg)`, color=~pal(top5sharks$`Species Name`), stroke=FALSE, fillOpacity=1, radius=6, label = lapply(labs2, HTML)) %>% addLegend(pal=pal, values=~top5sharks$`Species Name`, title="Color Legend", opacity=1) %>% setView(153, -23, zoom = 5)
The following two maps have been selected for the final visualization.
From the above visuals, the following insights can be drawn.
Queensland’s Shark Control Program has been controversial. Although it has been successful in reducing shark attack incidents at protected beaches, drumlines can result in the death of sharks. While the program does not directly lead to extinction, it may leave little room for endangered shark populations to recover. Sharks play an important role in the functioning of marine ecosystems and are needed for healthy oceans. The program has killed not only sharks but also other marine animals such as dolphins and turtles. As a result, the Queensland Government needs to carefully balance the protection of marine life against the threat to human life.