The aim of this report is to investigate shark attacks recorded between 2010 and 2018, looking at when and where people are most likely to be attacked and which gender is attacked most often. We formulated three research questions: which year had the most recorded shark attacks, which country had the most recorded attacks, and which gender was more commonly attacked.

Our first research question was ‘What year had the most shark attacks recorded?’. We discovered that 2015 had the most incidents reported, with 143 shark attacks globally, closely followed by 2017 with 137 attacks recorded. Whilst looking into this question we also found that the number of attacks each year was fairly consistent, with an average of 125.6 and a sample standard deviation of about 12.8 (not including 2018, as the year had not been completed when the data was collected). When we represented the findings visually, we could see a fairly constant trend through the use of a straight line of best fit.

Our second research question was ‘Which country had the most attacks reported between 2010 and 2018?’, which revealed two countries accounting for the majority of the attacks in the period. Our findings showed that the USA was responsible for 509 of the 1,085 recorded attacks, making it alone accountable for almost half (about 47%) of the attacks. We also looked at how many attacks Australia had reported for comparison, finding 223 attacks within the range, which equates to fewer than 30 per year on average. We looked further into the data to see why this occurred and found that the majority of incidents involved surfing and swimming. These activities are popular in both the USA and Australia, and both countries host major surfing competitions.

Finally, we looked at which gender was more commonly attacked between 2010 and 2018. Our findings were striking: males were attacked more than four times as often as females over the period. This could be due to a number of reasons, such as male predominance in water activities including surfing, fishing, swimming and sailing. However, the graph revealed four categories in the gender column. This was initially an issue, before further inspection revealed that the first, untitled column was in fact incidents where the gender had not been reported, and the fourth column was a data-entry mistake in the data set (an ‘M’ recorded with a trailing space). This shows that the data set was not fully reliable, as it contained some issues in the recording of the gender of the individual attacked.

Overall, our report produced some interesting findings. As a group we worked well together to write the code needed to display the data visually and report the findings within it.
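# LOAD DATA v2 - load data from a local file
# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which matches the str() output and the factor plots further down
sharkattacks <- read.csv("~/Desktop/Shark Attack Data 2.csv", stringsAsFactors = TRUE)
# Quick look at the first 6 rows of data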
head(sharkattacks)
##   Year Country Activity Sex Age Fatal..Y.N.
## 1 2018 IRELAND  Fishing   M 40s           N
## 2 2018 BAHAMAS   Diving   M               N
## 3 2018     USA Swimming   M  14           N
## 4 2018     USA  Surfing   M  14           N
## 5 2018 BAHAMAS            F               N
## 6 2018     USA            M               N
## Size of data
# For the shark attack dataset, there are 1085 rows (one per incident) and 6 variables (properties of each incident).
dim(sharkattacks)
## [1] 1085 6
## R's classification of data
class(sharkattacks)
## [1] "data.frame"
## R's classification of variables
str(sharkattacks)
## 'data.frame': 1085 obs. of 6 variables:
## $ Year : int 2018 2018 2018 2018 2018 2018 2018 2018 2018 2018 ...
## $ Country : Factor w/ 76 levels "","ANTIGUA","ARUBA",..: 33 7 75 75 7 75 75 75 75 24 ...
## $ Activity : Factor w/ 164 levels "","2 boats capsized",..: 44 26 135 131 1 1 135 115 148 114 ...
## $ Sex : Factor w/ 4 levels "","F","M","M ": 3 3 3 3 2 3 3 3 2 3 ...
## $ Age : Factor w/ 84 levels "","10","11","12",..: 38 1 6 6 1 1 63 40 82 40 ...
## $ Fatal..Y.N.: Factor w/ 4 levels "","N","UNKNOWN",..: 2 2 2 2 2 2 2 2 2 4 ...
# sapply(sharkattacks, class)
Summary: The source of the data is the Shark Attack File website, http://www.sharkattackfile.net/incidentlog.htm . The data has been compiled directly from reported shark incidents throughout history. It has been put together by a team of marine biologists to further understand common factors in shark attack incidents. The data is classified into many variables, stating where, why and when these incidents happened and who was affected.
We consider this data reasonably valid and reliable, as it is collected by a marine biology team that specifically records shark attacks. However, some fields are missing for some incidents, such as the gender of the person attacked, which limits the detail of the data.
Possible issues with the data include non-reported attacks: shark attacks may occur more frequently than they are reported. Unreported attacks may be more common in countries with limited global communication, where reporting shark attacks might not be a high priority. As a result, the data set may not include every attack that happens globally.
Each column in the data represents a different variable: the leftmost column is the year in which the attack took place, the second is the country where the attack occurred, the third is the activity involved in the attack, the fourth is the sex of the individual attacked, the fifth is the age of the individual, and the last shows whether or not the incident was fatal. Each row of the data represents a single incident.
The table below shows the number of shark attacks recorded in each year. As the numbers show, 2015 had the highest number of shark attacks globally. (1, Matthews, 2018)
table(sharkattacks$Year)
##
## 2010 2011 2012 2013 2014 2015 2016 2017 2018
## 101 128 117 122 127 143 130 137 79
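The consistency of the yearly counts can be checked numerically. The following is a minimal sketch that re-enters the counts from the table above by hand (the vector `counts` is our own illustrative name, not part of the original analysis) and leaves out 2018 because that year is incomplete.

```r
# Yearly counts copied from the table above; 2018 is excluded as incomplete
counts <- c(`2010` = 101, `2011` = 128, `2012` = 117, `2013` = 122,
            `2014` = 127, `2015` = 143, `2016` = 130, `2017` = 137)

mean_attacks <- mean(counts)  # average attacks per complete year
sd_attacks   <- sd(counts)    # sample standard deviation (n - 1 denominator)

round(mean_attacks, 1)
## [1] 125.6
round(sd_attacks, 2)
## [1] 12.83

# Year with the most recorded attacks
names(counts)[which.max(counts)]
## [1] "2015"
```

With the full data frame loaded, the same vector could be obtained from `table(sharkattacks$Year)` after dropping the 2018 entry.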
hist(sharkattacks$Year, col='red')
Summary: The histogram provides a visual representation of the data above. The number of shark attacks recorded each year around the world is fairly consistent, as shown by the relatively level bars on the graph. The bar for 2018, however, is much smaller because the record for that year was incomplete when the data was collected. 2015 shows the highest number of recorded shark attacks, at 143.
The information below shows how many attacks have been recorded in each country since 2010. The table includes every country with a recorded shark attack in that period. The USA and Australia have had the highest numbers of attacks since 2010, with the USA at the top with 509 attacks, followed by Australia with 223. Some countries appear under more than one label (for example ‘Fiji’ and ‘FIJI’), so a few counts are split across rows. (1, Matthews, 2018)
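One way to check how level the yearly trend is would be to fit a straight line of best fit to the complete years and draw it over a bar chart. This is only a sketch under our own assumptions: the data frame `yearly` is re-entered by hand from the year table, 2018 is left out as incomplete, and `barplot()` is used in place of `hist()` so that each year gets exactly one bar.

```r
# Counts per complete year, copied from the year table (2018 left out)
yearly <- data.frame(
  year  = 2010:2017,
  count = c(101, 128, 117, 122, 127, 143, 130, 137)
)

# Straight line of best fit: recorded attacks as a linear function of year
fit   <- lm(count ~ year, data = yearly)
slope <- unname(coef(fit)["year"])  # fitted change in attacks per year

# One bar per year; barplot() returns the bar midpoints on the x axis
bp <- barplot(yearly$count, names.arg = yearly$year, col = "red",
              xlab = "Year", ylab = "Recorded attacks")

# Overlay the fitted values at the bar midpoints
lines(bp, fitted(fit), lwd = 2)
```

A slope close to zero would support the reading of a level trend; a clearly positive or negative slope would suggest a gradual rise or fall.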
table(sharkattacks$Country)
##
##                                         2
## ANTIGUA                                 1
## ARUBA                                   1
## ATLANTIC OCEAN                          1
## AUSTRALIA                             223
## AZORES                                  1
## BAHAMAS                                38
## BELIZE                                  1
## BRAZIL                                 19
## CANADA                                  1
## CAPE VERDE                              1
## CAYMAN ISLANDS                          2
## CHILE                                   1
## CHINA                                   2
## COLOMBIA                                1
## COLUMBIA                                3
## COMOROS                                 1
## COSTA RICA                              5
## CROATIA                                 1
## CUBA                                    1
## DIEGO GARCIA                            1
## DOMINICAN REPUBLIC                      3
## ECUADOR                                 4
## EGYPT                                  12
## ENGLAND                                 1
## Fiji                                    1
## FIJI                                    6
## FRANCE                                  2
## FRENCH POLYNESIA                       14
## GREECE                                  1
## GUAM                                    2
## INDONESIA                               6
## IRELAND                                 1
## ISRAEL                                  1
## ITALY                                   2
## JAMAICA                                 3
## JAPAN                                   5
## KENYA                                   1
## KIRIBATI                                1
## LIBYA                                   1
## MADAGASCAR                              1
## MALAYSIA                                2
## MALDIVES                                1
## MALTA                                   1
## MAURITIUS                               2
## MEXICO                                 16
## MOZAMBIQUE                              3
## NEW CALEDONIA                          12
## NEW ZEALAND                            15
## NIGERIA                                 1
## PALESTINIAN TERRITORIES                 1
## PAPUA NEW GUINEA                        3
## PHILIPPINES                             3
## PUERTO RICO                             1
## REUNION                                23
## RUSSIA                                  3
## SAMOA                                   2
## SAUDI ARABIA                            1
## SCOTLAND                                2
## SEYCHELLES                              3
## SOLOMON ISLANDS                         2
## SOUTH AFRICA                           63
## SOUTH KOREA                             1
## SPAIN                                  14
## ST HELENA, British overseas territory   2
## ST. MARTIN                              1
## TAIWAN                                  2
## THAILAND                                3
## TONGA                                   2
## TRINIDAD & TOBAGO                       2
## TURKS & CAICOS                          2
## UNITED ARAB EMIRATES                    2
## UNITED ARAB EMIRATES (UAE)              2
## UNITED KINGDOM                          2
## USA                                   509
## VIETNAM                                 6
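Several countries in this table appear under more than one spelling, so their counts are split. The following is a hedged sketch of how such labels could be merged before counting. It uses a small hand-built vector (`raw_country`, our own name) containing only the duplicated labels and their counts from the table, and it assumes that ‘COLUMBIA’ is a misspelling of ‘COLOMBIA’, which we have not confirmed against the source records.

```r
# Rebuild only the duplicated labels, repeated by their counts in the table
raw_country <- rep(
  c("Fiji", "FIJI", "UNITED ARAB EMIRATES", "UNITED ARAB EMIRATES (UAE)",
    "COLOMBIA", "COLUMBIA"),
  times = c(1, 6, 2, 2, 1, 3)
)

# Normalise case and surrounding whitespace
clean_country <- toupper(trimws(raw_country))

# Merge the two labels used for the United Arab Emirates
clean_country[clean_country == "UNITED ARAB EMIRATES (UAE)"] <- "UNITED ARAB EMIRATES"

# Assumption: "COLUMBIA" is a misspelling of "COLOMBIA"
clean_country[clean_country == "COLUMBIA"] <- "COLOMBIA"

cleaned_counts <- table(clean_country)
cleaned_counts
```

The same steps could be applied to `as.character(sharkattacks$Country)`. None of the merged totals comes close to the USA or Australia, so the ranking at the top of the table is unaffected.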
plot(sharkattacks$Country)
title('Shark attacks per country (2010-2018)')
Summary: As shown on the graph, the countries are in alphabetical order on the x axis, with the number of attacks on the y axis. The tall spike on the left-hand side of the graph is Australia and the tall spike on the right is the USA. Together these two countries account for the majority of attacks recorded globally since 2010.
Shown below is the data outlining which gender has been attacked more since 2010. The column without a title counts attacks where the gender was not specified, which makes the data somewhat unreliable. The second column named M, on the right, is a mistake in the data set: one entry was recorded as ‘M ’ with a trailing space, so R treats it as its own category. The numbers show that males are attacked much more often than females. This could be due to many reasons; however, males have been attacked more than 4 times as frequently as females. (1, Matthews, 2018)
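Because the full chart has 76 categories on the x axis, a chart restricted to the largest countries may be easier to read. The sketch below is illustrative only: `top_counts` holds the five largest counts copied by hand from the country table, and `total_attacks` is the row count reported by `dim(sharkattacks)`.

```r
# Five largest country counts, copied from the country table
top_counts <- c(USA = 509, AUSTRALIA = 223, `SOUTH AFRICA` = 63,
                BAHAMAS = 38, REUNION = 23)

total_attacks <- 1085  # number of rows reported by dim(sharkattacks)

# Percentage of all recorded attacks for each of these countries
share <- round(100 * top_counts / total_attacks, 1)
share

# Bar chart of the top five only; las = 2 turns the labels so they fit
barplot(sort(top_counts, decreasing = TRUE), las = 2, cex.names = 0.8,
        ylab = "Recorded attacks",
        main = "Countries with the most recorded attacks (2010-2018)")
```

With the data frame loaded, the same five values could be taken from `sort(table(sharkattacks$Country), decreasing = TRUE)[1:5]` rather than typed in.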
table(sharkattacks$Sex)
##
##       F   M  M
##  35 201 848   1
plot(sharkattacks$Sex)
title('Gender attacks since 2010')
Summary: The graph provides a visual comparison between the attacks on males and females, showing the drastic difference between the two. The male bar is more than 4 times as tall as the female one, showing that recorded attacks on males were more than 4 times as frequent. The small bar on the left shows incidents where the gender was unspecified, whilst the single-entry bar on the right is the mis-entered ‘M ’ value.
The data provided was mostly clear and concise, with a few minor issues. The analysis of attacks per year was clear and was supported by a chart showing which years had the greatest number of attacks since 2010. The table of attacks per country is easy to read, giving the exact number of attacks within each country. The graph also shows which countries had the highest and lowest numbers of attacks; however, because so many countries are plotted, the x axis is hard to read, which creates some issues when interpreting the data. Finally, the difference between attacks on each gender since 2010 is shown through both a table and a graph, which show that males are attacked more often than females. Overall, the data set had some issues; however, the data presented is clear and interpretable, with visual aids to further represent it.
In our group, we worked together to investigate data that we were both interested in and that contained enough information to support an interesting discussion. We both looked at the data separately to begin with, before coming together to decide which parts of the data we most wanted to investigate, which allowed us to put together our three research questions. We divided the work equally and worked on all parts of the assignment together, with collaboration taking place particularly within the coding sections. Overall, I found the task to be an enjoyable and interesting assignment where we worked well as a team.
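The two problem categories can be tidied before comparing males and females. As a minimal sketch, the vector `raw_sex` below is rebuilt by hand from the counts in the table above (it is not the original column), the trailing space is removed with `trimws()`, and unreported values are treated as missing.

```r
# Rebuild the Sex column from the table counts: "", "F", "M" and "M "
raw_sex <- rep(c("", "F", "M", "M "), times = c(35, 201, 848, 1))

# Remove stray whitespace so "M " is counted together with "M"
clean_sex <- trimws(raw_sex)

# Treat unreported genders as missing rather than as a category of their own
clean_sex[clean_sex == ""] <- NA

sex_counts <- table(clean_sex)  # table() drops NA values by default
sex_counts

# Ratio of recorded attacks on males to recorded attacks on females
ratio <- sex_counts[["M"]] / sex_counts[["F"]]
round(ratio, 2)
## [1] 4.22
```

Applied to the real column, `trimws(as.character(sharkattacks$Sex))` would play the role of `clean_sex`, leaving two bars on the plot instead of four.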