1 Executive Summary

  • The aim of this report is to …
  • The main discoveries are …


2 Full Report

2.1 Initial Data Analysis (IDA)

# Load the data directly from its URL; "#NULL!" entries are read as missing (NA)
cape = read.csv("http://www.maths.usyd.edu.au/u/UG/OL/OLEO1631/r/livelabs/shark_bites_attitudes/CapeTown021012.csv",na.strings="#NULL!")

# Quick look at the first rows of the data (head() shows 6 rows by default)
head(cape)
##   id sex age race beachattend Seals Dolphins Sharks Lifsavers NSRI
## 1  1   1  26    1           3    10       10     10         5    6
## 2  2   2  25    1           1    10       10     10        NA   NA
## 3  3   1  40    2           3     6        6      6         4   NA
## 4  4   1  60    1           3    10       10     10         7   10
## 5  5   2  23    2           1    10       10     10         7   10
## 6  6   1  61    1           1    10       10     10         9   10
##   SSpotters beach sharkbite SealPride2 SharkPride2 SharkSpot2 Lifesavers2
## 1         5     1         1          3           3          2           2
## 2         7     1         1          3           3          3          NA
## 3         5     1         1          3           3          2           2
## 4         7     1         1          3           3          3           3
## 5         7     1         1          3           3          3           3
## 6         9     1         1          3           3          3           3
##   NSRI2 Dolphin2
## 1     3        3
## 2    NA        3
## 3    NA        3
## 4     3        3
## 5     3        3
## 6     3        3
# Size of data
dim(cape)
## [1] 100  19
# R's classification of data
class(cape)
## [1] "data.frame"
# R's classification of variables
str(cape)
## 'data.frame':    100 obs. of  19 variables:
##  $ id         : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ sex        : int  1 2 1 1 2 1 2 1 1 2 ...
##  $ age        : int  26 25 40 60 23 61 61 22 21 30 ...
##  $ race       : int  1 1 2 1 2 1 1 2 2 2 ...
##  $ beachattend: int  3 1 3 3 1 1 1 3 1 1 ...
##  $ Seals      : int  10 10 6 10 10 10 10 8 6 1 ...
##  $ Dolphins   : int  10 10 6 10 10 10 10 4 8 1 ...
##  $ Sharks     : int  10 10 6 10 10 10 10 3 2 1 ...
##  $ Lifsavers  : int  5 NA 4 7 7 9 9 5 1 1 ...
##  $ NSRI       : int  6 NA NA 10 10 10 10 3 3 1 ...
##  $ SSpotters  : int  5 7 5 7 7 9 9 10 3 1 ...
##  $ beach      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ sharkbite  : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ SealPride2 : int  3 3 3 3 3 3 3 3 3 1 ...
##  $ SharkPride2: int  3 3 3 3 3 3 3 1 1 1 ...
##  $ SharkSpot2 : int  2 3 2 3 3 3 3 3 1 1 ...
##  $ Lifesavers2: int  2 NA 2 3 3 3 3 2 1 1 ...
##  $ NSRI2      : int  3 NA NA 3 3 3 3 1 1 1 ...
##  $ Dolphin2   : int  3 3 3 3 3 3 3 2 3 1 ...
# Alternative (not run): sapply(cape, class) returns the class of every column
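The `na.strings = "#NULL!"` argument in `read.csv()` turns the `#NULL!` entries (apparently a missing-value marker from the software that exported the survey, which is an assumption) into `NA`. Counting the missing values per column can help with the "Possible issues include …" bullet in the summary that follows. On the full data this is `colSums(is.na(cape))`. The minimal sketch below applies the same idea to a small hand-typed data frame, `cape10` (a hypothetical name), which holds the first ten values of a few columns as printed by `str(cape)` above. This lets it run without the download.

```r
# Hypothetical mini data frame: first 10 values of selected columns,
# copied from the str(cape) output above (not the full data set)
cape10 <- data.frame(
  id        = 1:10,
  sex       = c(1, 2, 1, 1, 2, 1, 2, 1, 1, 2),
  age       = c(26, 25, 40, 60, 23, 61, 61, 22, 21, 30),
  Sharks    = c(10, 10, 6, 10, 10, 10, 10, 3, 2, 1),
  Lifsavers = c(5, NA, 4, 7, 7, 9, 9, 5, 1, 1),
  NSRI      = c(6, NA, NA, 10, 10, 10, 10, 3, 3, 1)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(cape10))
na_per_column

# Number of rows with no missing values in any of these columns
n_complete <- sum(complete.cases(cape10))
n_complete
```

In these ten rows only `Lifsavers` and `NSRI` have gaps. Running the same two calls on `cape` shows how many of the 100 responses each later analysis can actually use.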

Summary:

  • The data came from …
  • The data is/is not valid because …
  • Possible issues include …
  • Each row represents …
  • Each column represents …
  • The total number of people surveyed about their attitude to sharks before and after a shark attack was …


2.2 Research Questions

  • In the full survey, what type of people were surveyed?
  • Was there any difference between the type of people surveyed before and after the shark attack (on 28/9/11)?
  • How often did the people surveyed visit the beach?
  • Did people’s attitude to sharks change after the shark attack?
  • Is there a difference in the level of confidence in local beach safety measures offered by surf lifesavers before and after the shark attack?

Other possible Research Questions include …


2.3 In the full survey, what type of people were surveyed?

  • First consider the gender of people

This is an analysis of 1 qualitative (categorical) variable.

# Select 1 qualitative variable
sex = cape$sex

# Produce frequency table
counts = table(sex)
counts
## sex
##  1  2 
## 49 51
# Produce barplot
barplot(counts, col = c("lightpink", "lavender"), names.arg = c("Male", "Female"), main = "Sex of Participants")
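Because `sex` is stored as an integer code, the labels exist only in `names.arg`. One alternative is sketched below. It converts the code to a labelled factor once, assuming 1 = Male and 2 = Female as the barplot above does, and then uses `prop.table()` to turn counts into percentages. So that the example runs on its own, the vector is rebuilt from the frequency table above (49 ones and 51 twos). The names `sexF` and `pct` are hypothetical.

```r
# Rebuild the sex codes from the frequency table above (49 x 1, 51 x 2)
sex <- rep(c(1, 2), times = c(49, 51))

# Assumed coding, matching names.arg in the barplot: 1 = Male, 2 = Female
sexF <- factor(sex, levels = c(1, 2), labels = c("Male", "Female"))

counts <- table(sexF)
# Percentage of participants of each sex, rounded to 1 decimal place
pct <- round(100 * prop.table(counts), 1)
pct
```

With the factor in place, `barplot(counts)` picks up the Male/Female labels without `names.arg`.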

Summary: In terms of gender, the sample was split almost evenly: 49 of the 100 people surveyed at the beach (49%) were male and 51 (51%) were female.

  • Next consider the gender of people divided by race.

This is called facetting (or filtering or dividing) 1 qualitative variable by a 2nd qualitative variable.

# Produce contingency table of 2 qualitative variables
counts2 = table(cape$sex, cape$race)
counts2
##    
##      1  2  3
##   1 19 18 12
##   2 24 15 12
# Produce stacked barplot
barplot(counts2, col=c("lightpink","lavender"),main="Gender of Participants By Race",xlab="Race", names.arg=c("White","Black","People of Colour"))

# Add legend
legend("topright",c("Male","Female"),fill=c("lightpink","lavender"),title="Gender")
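The race groups differ in size (43, 33 and 24 people), so proportions compare more fairly than raw counts. The sketch below rebuilds the contingency table from the output above. It then uses `prop.table()` with `margin = 2` to give the percentage of men and women within each race group. The race labels follow the author's `names.arg`, and the sex labels assume 1 = Male and 2 = Female as in the legend. `col_pct` is a hypothetical name.

```r
# Contingency table copied from the output above (rows = sex, columns = race);
# matrix() fills column by column
counts2 <- matrix(c(19, 24, 18, 15, 12, 12), nrow = 2,
                  dimnames = list(sex  = c("Male", "Female"),
                                  race = c("White", "Black", "People of Colour")))

# Number of participants in each race group
colSums(counts2)

# Column percentages: share of each gender within each race group
col_pct <- round(100 * prop.table(counts2, margin = 2), 1)
col_pct
```

Each column of `col_pct` sums to 100. This makes it easy to see whether the near-even gender split holds within every race group.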

Summary: In terms of gender across race, the number of men and women in the sample was …

  • Next consider the age of people.

This is an analysis of 1 quantitative variable. We will analyse the data using a histogram and boxplots.

# Select 1 quantitative variable
age = cape$age

# Produce histogram
hist(age, col = "lightyellow", border = "black", main = "Histogram of Age")

# Produce boxplot
boxplot(age, col = "lightblue", medcol = "red", main = "Ages of Participants")

# Experiment with customising
boxplot(age, col="lightblue",horizontal=TRUE,main="Ages of Participants")

Mean Age of Participants:

mean(age)
## [1] 34.36

Age Range of Participants:

range(age)
## [1] 18 75
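The statement that most participants were aged 20 to 30 can also be checked numerically rather than read off the histogram. On the full data, `summary(age)` gives the quartiles. The call `mean(age >= 20 & age <= 30)` gives the share of participants in that band, because the mean of a logical vector is the proportion of `TRUE` values. The sketch applies these calls to the first ten ages printed by `str(cape)` above. Its numbers therefore describe only those ten people, and `age10` is a hypothetical name.

```r
# First ten ages from the str(cape) output above (not the full sample)
age10 <- c(26, 25, 40, 60, 23, 61, 61, 22, 21, 30)

# Minimum, quartiles, median, mean and maximum
summary(age10)

# The median is less affected than the mean by the few older participants
med10  <- median(age10)
mean10 <- mean(age10)

# Proportion of these participants aged 20 to 30 inclusive
share_20_30 <- mean(age10 >= 20 & age10 <= 30)
share_20_30
```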

Summary: Participants were predominantly aged between 20 and 30 years old. The mean age of participants in this data set is 34.36 years. The youngest participant was 18 years old and the oldest was 75, so the sample covers a wide range of ages. A wide age range on its own does not show that the sample was random, however; that depends on how people were chosen for the survey.

  • Now consider the age of people divided by gender.

This is an analysis of 1 quantitative variable facetted by 1 qualitative variable.

# Produce 2 comparative boxplots
boxplot(age~sex, col=c("lightblue","lightgreen"),horizontal=TRUE,main="Age of people surveyed, by gender")
legend("topright",c("Male","Female"),fill=c("lightblue","lightgreen"),title="Gender")
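The comparative boxplots can be paired with numbers. `tapply()` applies a summary function to `age` separately for each gender, and on the full data the call would be `tapply(age, sex, median)`. The sketch uses the first ten `sex` and `age` values printed by `str(cape)` above, so its medians describe only those ten people. It assumes 1 = Male and 2 = Female as in the legend, and the variable names are hypothetical.

```r
# First ten rows of sex and age from the str(cape) output above
sex10 <- c(1, 2, 1, 1, 2, 1, 2, 1, 1, 2)
age10 <- c(26, 25, 40, 60, 23, 61, 61, 22, 21, 30)

# Assumed coding, as in the legend above: 1 = Male, 2 = Female
sex10F <- factor(sex10, levels = c(1, 2), labels = c("Male", "Female"))

# Median and inter-quartile range of age for each gender
med_by_sex <- tapply(age10, sex10F, median)
iqr_by_sex <- tapply(age10, sex10F, IQR)
med_by_sex
iqr_by_sex
```

Using a labelled factor in the formula (e.g. `boxplot(age ~ sex10F)`) also prints Male and Female on the axis, so the separate legend is no longer needed.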

Summary: The age of people surveyed by gender was …

Overall Summary: The type of people in the full survey was …


2.4 Was there any difference between the type of people surveyed before and after the shark attack?

  • First consider the size of the 2 survey groups (before and after shark attack).
sharkbite = cape$sharkbite
counts3=table(sharkbite)
counts3
## sharkbite
##  1  2 
## 50 50

Summary: The size of the 2 survey groups was …

  • Next consider the age of the people in the 2 survey groups.
boxplot(age~sharkbite, col=c("lightblue","lightgreen"),horizontal=TRUE,main="Age of people surveyed, for the 2 survey periods")
legend("bottomright",c("Before Attack","After Attack"),fill=c("lightblue","lightgreen"),title="Time of Survey")

Summary: Comparing the 2 survey groups, the age of people was …

Overall Summary:


2.5 How often did the people surveyed visit the beach?

  • First consider the beach attendance of people.
# Select the variable
beachattend = cape$beachattend

# Produce frequency table
counts4 = table(beachattend)
counts4
## beachattend
##  1  2  3 
## 30 20 50
# Produce barplot
barplot(counts4, col = "lightblue", names.arg = c("More than once a week", "Once a week", "Once a month or less"), main="Beach Attendance of people surveyed")
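Percentages make the attendance pattern easier to state in the summary. The sketch assumes the codes 1 to 3 mean what the `names.arg` labels above say. It rebuilds `beachattend` from the frequency table, turns it into an ordered factor and applies `prop.table()`. With these counts, 50% of the people surveyed visit the beach at least once a week (the first two categories). `attend_labels`, `attend_pct` and `weekly_pct` are hypothetical names.

```r
# Rebuild beachattend from the frequency table above (30 x 1, 20 x 2, 50 x 3)
beachattend <- rep(1:3, times = c(30, 20, 50))

# Assumed coding, matching names.arg in the barplot above
attend_labels <- c("More than once a week", "Once a week", "Once a month or less")
beachattendF <- factor(beachattend, levels = 1:3, labels = attend_labels, ordered = TRUE)

# Percentage of people in each attendance category
attend_pct <- 100 * prop.table(table(beachattendF))
attend_pct

# Share of people who visit at least once a week (first two categories)
weekly_pct <- sum(attend_pct[1:2])
weekly_pct
```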

Overall Summary: The people surveyed …


2.6 Did people’s attitude to sharks change after the shark attack?

  • First consider how people’s pride about their local sharks is recorded.

Note that the ‘Sharks’ variable is a number 1-10, but represents 3 levels of pride: (1-3) Little Pride, (4-6) Average Pride, (7-10) A lot of Pride.

So first we use a bit of data wrangling (extension) to replace each number with its level.

# This function converts each 1-10 score to its pride level
ClassPride= function(x)
{
  y = x
  y[x %in% 1:3] = "Little Pride"
  y[x %in% 4:6] = "Average Pride"
  y[x %in% 7:10] = "A lot of Pride"
  return(y)
}

# We apply the function to the Sharks variable, creating a new variable SharkPride
SharkPride = ClassPride(cape$Sharks)

Summarise the 2 variables now:

table(cape$Sharks)
## 
##  1  2  3  4  5  6  7  8  9 10 
## 20  7  9  1  6  9  3  8  6 31
table(SharkPride)
## SharkPride
## A lot of Pride  Average Pride   Little Pride 
##             48             16             36
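`table()` sorts character values alphabetically. That is why "A lot of Pride" is listed first and "Little Pride" last above, and the same order carries into the contingency table and stacked barplot that follow. One possible fix is sketched below: store the levels as an ordered factor with an explicit level order. So that the code runs on its own, the `Sharks` vector is rebuilt from the frequency table above. `pride_levels` and `SharkPrideF` are hypothetical names.

```r
# Rebuild the Sharks scores from the frequency table above (values 1-10 with their counts)
Sharks <- rep(1:10, times = c(20, 7, 9, 1, 6, 9, 3, 8, 6, 31))

# ClassPride as defined above
ClassPride <- function(x) {
  y <- x
  y[x %in% 1:3] <- "Little Pride"
  y[x %in% 4:6] <- "Average Pride"
  y[x %in% 7:10] <- "A lot of Pride"
  y
}

SharkPride <- ClassPride(Sharks)

# Ordered factor: levels kept in their natural order instead of alphabetical order
pride_levels <- c("Little Pride", "Average Pride", "A lot of Pride")
SharkPrideF <- factor(SharkPride, levels = pride_levels, ordered = TRUE)

pride_counts <- table(SharkPrideF)
pride_counts
```

If the factor built from the real `cape$Sharks` were stored in `cape2` instead of the character vector, later tables and barplots would list Little, Average and A lot of Pride in that order.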

Now replace the old variable (Sharks) with the values of the new variable (SharkPride):

# Create a new data frame by replacing Sharks with the levels in SharkPride
cape2 = cape
cape2[,"Sharks"] = SharkPride

# Check the new data
dim(cape2)
## [1] 100  19
head(cape2)
##   id sex age race beachattend Seals Dolphins         Sharks Lifsavers NSRI
## 1  1   1  26    1           3    10       10 A lot of Pride         5    6
## 2  2   2  25    1           1    10       10 A lot of Pride        NA   NA
## 3  3   1  40    2           3     6        6  Average Pride         4   NA
## 4  4   1  60    1           3    10       10 A lot of Pride         7   10
## 5  5   2  23    2           1    10       10 A lot of Pride         7   10
## 6  6   1  61    1           1    10       10 A lot of Pride         9   10
##   SSpotters beach sharkbite SealPride2 SharkPride2 SharkSpot2 Lifesavers2
## 1         5     1         1          3           3          2           2
## 2         7     1         1          3           3          3          NA
## 3         5     1         1          3           3          2           2
## 4         7     1         1          3           3          3           3
## 5         7     1         1          3           3          3           3
## 6         9     1         1          3           3          3           3
##   NSRI2 Dolphin2
## 1     3        3
## 2    NA        3
## 3    NA        3
## 4     3        3
## 5     3        3
## 6     3        3
  • Next analyse the ‘SharkPride’ levels (now stored in the ‘Sharks’ column of cape2) facetted by ‘sharkbite’
# Produce contingency table of 2 qualitative variables
counts5 = table(cape2$Sharks, cape2$sharkbite)
counts5
##                 
##                   1  2
##   A lot of Pride 26 22
##   Average Pride   5 11
##   Little Pride   19 17
# Produce stacked barplot
barplot(counts5, col=c("lightblue","lightgreen","lightyellow"),main="Pride in sharks before and after shark attack",xlab="Time of survey", names.arg=c("Before","After"))

# Add legend (labels taken from the table rows, so they match the bar colours)
legend("topright",rownames(counts5),fill=c("lightblue","lightgreen", "lightyellow"),title="Shark pride")
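A formal test was not part of the original analysis, but it can put a number on "appears to be". The sketch rebuilds `counts5` from the output above and compares the distribution of pride within each survey group with `prop.table(..., margin = 2)`. It then runs Pearson's chi-squared test of independence with `chisq.test()`. With these counts, 52% of the before group and 44% of the after group reported a lot of pride. The test statistic is about 2.69 on 2 degrees of freedom (p ≈ 0.26). The test assumes the two survey groups contain different people, so it compares two independent samples rather than tracking any individual's change of attitude. `pride_prop` and `pride_test` are hypothetical names.

```r
# Contingency table copied from the counts5 output above; matrix() fills column by column
counts5 <- matrix(c(26, 5, 19, 22, 11, 17), nrow = 3,
                  dimnames = list(pride  = c("A lot of Pride", "Average Pride", "Little Pride"),
                                  survey = c("Before", "After")))

# Column proportions: distribution of pride within each survey group
pride_prop <- prop.table(counts5, margin = 2)
round(pride_prop, 2)

# Chi-squared test of independence between pride level and survey time
pride_test <- chisq.test(counts5)
pride_test$expected   # all expected counts are at least 5 here
pride_test
```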

Summary: The effect of shark attack on shark pride appears to be …


2.7 Is there a difference in the level of confidence in local beach safety measures offered by surf lifesavers before and after the shark attack?

  • First consider how people’s confidence in the surf lifesavers is recorded.

Note that the ‘Lifsavers’ variable (spelled this way in the data) is again a number 1-10, which represents 3 levels of confidence in beach safety measures: (1-3) Not Confident, (4-6) Somewhat Confident, (7-10) Very Confident.

So again we use a bit of data wrangling (extension) to replace each number with its level.

# This function converts each 1-10 score to its confidence level
ClassConfidence = function(x)
{
  y = x
  y[x %in% 1:3] = "Not Confident"
  y[x %in% 4:6] = "Somewhat Confident"
  y[x %in% 7:10] = "Very Confident"
  return(y)
}
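`ClassConfidence` repeats `ClassPride` with different labels. The sketch below is a single reusable helper, assuming the same 1-3 / 4-6 / 7-10 cut-points apply to every 1-10 item in the survey. `cut()` does the grouping and returns an ordered factor, so tables keep the natural order. `classify_1to10` and `conf_labels` are hypothetical names. The sample scores are the first ten `Lifsavers` values printed by `str(cape)` above.

```r
# Hypothetical generic helper: group a 1-10 score into three ordered levels
# (1-3, 4-6, 7-10). Values <= 0 or > 10 become NA; NA stays NA.
classify_1to10 <- function(x, labels) {
  stopifnot(length(labels) == 3)
  cut(x, breaks = c(0, 3, 6, 10), labels = labels, ordered_result = TRUE)
}

conf_labels <- c("Not Confident", "Somewhat Confident", "Very Confident")

# First ten Lifsavers scores printed by str(cape) above
lifsavers10 <- c(5, NA, 4, 7, 7, 9, 9, 5, 1, 1)
conf10 <- classify_1to10(lifsavers10, conf_labels)
conf10
```

The same helper covers the pride scale with `classify_1to10(cape$Sharks, c("Little Pride", "Average Pride", "A lot of Pride"))`.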

# We apply the function to the Lifsavers variable, creating a new variable LifesaversConf
LifesaversConf = ClassConfidence(cape2$Lifsavers)

Now replace the old variable (Lifsavers) with the values of the new variable (LifesaversConf):

# Create a new data frame by replacing Lifsavers with the levels in LifesaversConf
cape3 = cape2
cape3[,"Lifsavers"] = LifesaversConf

# Check the new data
dim(cape3)
## [1] 100  19
head(cape3)
##   id sex age race beachattend Seals Dolphins         Sharks
## 1  1   1  26    1           3    10       10 A lot of Pride
## 2  2   2  25    1           1    10       10 A lot of Pride
## 3  3   1  40    2           3     6        6  Average Pride
## 4  4   1  60    1           3    10       10 A lot of Pride
## 5  5   2  23    2           1    10       10 A lot of Pride
## 6  6   1  61    1           1    10       10 A lot of Pride
##            Lifsavers NSRI SSpotters beach sharkbite SealPride2 SharkPride2
## 1 Somewhat Confident    6         5     1         1          3           3
## 2               <NA>   NA         7     1         1          3           3
## 3 Somewhat Confident   NA         5     1         1          3           3
## 4     Very Confident   10         7     1         1          3           3
## 5     Very Confident   10         7     1         1          3           3
## 6     Very Confident   10         9     1         1          3           3
##   SharkSpot2 Lifesavers2 NSRI2 Dolphin2
## 1          2           2     3        3
## 2          3          NA    NA        3
## 3          2           2    NA        3
## 4          3           3     3        3
## 5          3           3     3        3
## 6          3           3     3        3
  • Next analyse the LifesaversConf variable by time of survey.
# Produce contingency table of 2 qualitative variables
counts6 = table(cape3$Lifsavers, cape3$sharkbite)
counts6
##                     
##                       1  2
##   Not Confident       2  2
##   Somewhat Confident 14 13
##   Very Confident     33 34
# Produce stacked barplot
barplot(counts6, col=c("lightblue","lightgreen","lightyellow"),main="Confidence in lifesavers before and after shark attack",xlab="Time of survey", names.arg=c("Before","After"))

# Add legend
legend("topright",c("Not confident","Somewhat confident", "Very confident"),fill=c("lightblue","lightgreen", "lightyellow"),title="Confidence in Lifesavers")
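A similar check can support the summary here, with one change. The "Not Confident" row has only 4 responses in total, so its expected count is 2 per cell. That is too small for the chi-squared approximation, and Fisher's exact test (`fisher.test()`) is a common alternative. Note also that `table()` drops the `NA` responses, so each column totals 49 rather than 50. The formal test is an addition to the original analysis, sketched on counts copied from the output above. `conf_prop` and `conf_test` are hypothetical names.

```r
# Contingency table copied from the counts6 output above; matrix() fills column by column
counts6 <- matrix(c(2, 14, 33, 2, 13, 34), nrow = 3,
                  dimnames = list(confidence = c("Not Confident", "Somewhat Confident", "Very Confident"),
                                  survey     = c("Before", "After")))

# Column totals (NA responses were already excluded by table())
colSums(counts6)

# Distribution of confidence within each survey group
conf_prop <- prop.table(counts6, margin = 2)
round(conf_prop, 2)

# Expected counts in the "Not Confident" row are only 2, so use Fisher's exact test
conf_test <- fisher.test(counts6)
conf_test$p.value
```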

Summary: