# Load data directly from the URL
cape = read.csv("http://www.maths.usyd.edu.au/u/UG/OL/OLEO1631/r/livelabs/shark_bites_attitudes/CapeTown021012.csv",na.strings="#NULL!")
# Quick look at the first 6 rows of data
head(cape)
## id sex age race beachattend Seals Dolphins Sharks Lifsavers NSRI
## 1 1 1 26 1 3 10 10 10 5 6
## 2 2 2 25 1 1 10 10 10 NA NA
## 3 3 1 40 2 3 6 6 6 4 NA
## 4 4 1 60 1 3 10 10 10 7 10
## 5 5 2 23 2 1 10 10 10 7 10
## 6 6 1 61 1 1 10 10 10 9 10
## SSpotters beach sharkbite SealPride2 SharkPride2 SharkSpot2 Lifesavers2
## 1 5 1 1 3 3 2 2
## 2 7 1 1 3 3 3 NA
## 3 5 1 1 3 3 2 2
## 4 7 1 1 3 3 3 3
## 5 7 1 1 3 3 3 3
## 6 9 1 1 3 3 3 3
## NSRI2 Dolphin2
## 1 3 3
## 2 NA 3
## 3 NA 3
## 4 3 3
## 5 3 3
## 6 3 3
## Size of data
dim(cape)
## [1] 100 19
## R's classification of data
class(cape)
## [1] "data.frame"
## R's classification of variables
str(cape)
## 'data.frame': 100 obs. of 19 variables:
## $ id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ sex : int 1 2 1 1 2 1 2 1 1 2 ...
## $ age : int 26 25 40 60 23 61 61 22 21 30 ...
## $ race : int 1 1 2 1 2 1 1 2 2 2 ...
## $ beachattend: int 3 1 3 3 1 1 1 3 1 1 ...
## $ Seals : int 10 10 6 10 10 10 10 8 6 1 ...
## $ Dolphins : int 10 10 6 10 10 10 10 4 8 1 ...
## $ Sharks : int 10 10 6 10 10 10 10 3 2 1 ...
## $ Lifsavers : int 5 NA 4 7 7 9 9 5 1 1 ...
## $ NSRI : int 6 NA NA 10 10 10 10 3 3 1 ...
## $ SSpotters : int 5 7 5 7 7 9 9 10 3 1 ...
## $ beach : int 1 1 1 1 1 1 1 1 1 1 ...
## $ sharkbite : int 1 1 1 1 1 1 1 1 1 1 ...
## $ SealPride2 : int 3 3 3 3 3 3 3 3 3 1 ...
## $ SharkPride2: int 3 3 3 3 3 3 3 1 1 1 ...
## $ SharkSpot2 : int 2 3 2 3 3 3 3 3 1 1 ...
## $ Lifesavers2: int 2 NA 2 3 3 3 3 2 1 1 ...
## $ NSRI2 : int 3 NA NA 3 3 3 3 1 1 1 ...
## $ Dolphin2 : int 3 3 3 3 3 3 3 2 3 1 ...
#sapply(cape, class)
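Because the file was read with `na.strings="#NULL!"`, some cells are stored as `NA` (for example in `Lifsavers` and `NSRI` above). A minimal sketch of how the missing values could be counted per column follows; the small data frame `toy` is a hypothetical stand-in built from the first six rows shown by `head(cape)`, so the code runs without the download.

```r
# Hypothetical stand-in for `cape`: three columns from the first six rows of head(cape)
toy = data.frame(
  id        = 1:6,
  Lifsavers = c(5, NA, 4, 7, 7, 9),
  NSRI      = c(6, NA, NA, 10, 10, 10)
)
# Count missing values in each column
na_counts = colSums(is.na(toy))
na_counts
# Number of rows with no missing values at all
n_complete = sum(complete.cases(toy))
n_complete
```

On the full data, the same two calls with `cape` in place of `toy` should give the per-column counts for all 19 variables.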
Summary:
Other possible Research Questions include …
This is an analysis of 1 qualitative (categorical) variable.
## Select 1 qualitative variable
sex = cape$sex
## Produce frequency table
counts = table(sex)
counts
## sex
## 1 2
## 49 51
# Produce barplot
barplot(counts, col = c("light pink", "lavender"), names.arg = c("Male", "Female"), main = "Sex of Participants")
Summary: In terms of gender, the sample was split almost evenly between men and women: 49% of the people surveyed at the beach were male and 51% were female.
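The percentages quoted in the summary can be computed rather than read off the table. A minimal sketch, rebuilding the codes from the counts printed above (49 and 51) so that it runs on its own; the name `sex_codes` is introduced here only for illustration:

```r
# Rebuild the sex codes from the printed frequency table
# (1 = Male, 2 = Female, as in the barplot labels)
sex_codes = rep(c(1, 2), times = c(49, 51))
# Convert counts to percentages
pct = 100 * prop.table(table(sex_codes))
pct
```

With the real data, `100 * prop.table(counts)` would do the same job.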
This is called facetting (or filtering or dividing) 1 qualitative variable by a 2nd qualitative variable.
## Produce contingency table of 2 qualitative variables
counts2 = table(cape$sex, cape$race)
counts2
##
## 1 2 3
## 1 19 18 12
## 2 24 15 12
## Produce stacked barplot
barplot(counts2, col=c("lightpink","lavender"),main="Gender of Participants By Race",xlab="Race", names.arg=c("White","Black","People of Colour"))
# Add legend
legend("topright",c("Male","Female"),fill=c("lightpink","lavender"),title="Gender")
Summary: In terms of gender across race, the number of men and women in the sample was …
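Because the race groups differ in size (43, 33 and 24 participants), raw counts can be hard to compare across bars. One option is to look at the proportion of each sex within each race group. A sketch, rebuilding the contingency table printed above under the hypothetical name `counts_sr`:

```r
# Rebuild the contingency table printed above (rows = sex, columns = race)
counts_sr = as.table(matrix(c(19, 24, 18, 15, 12, 12), nrow = 2,
                            dimnames = list(sex = c("1", "2"),
                                            race = c("1", "2", "3"))))
# Proportion of each sex within each race group (each column sums to 1)
col_props = prop.table(counts_sr, margin = 2)
round(col_props, 2)
```

Using `margin = 1` instead would give the race breakdown within each sex.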
This is an analysis of 1 quantitative variable. We will analyse the data using a histogram and boxplots.
## Select 1 quantitative variable
age = cape$age
## Produce histogram
hist(age, col = "lightyellow", border = "black", main = "Histogram of Age")
# Produce boxplot
boxplot(age, col = "lightblue", medcol = "red", main = "Ages of Participants")
# Experiment with customising
boxplot(age, col="lightblue",horizontal=TRUE,main="Ages of Participants")
Mean Age of Participants:
mean(age)
## [1] 34.36
Age Range of Participants:
range(age)
## [1] 18 75
Summary: Participants were predominantly aged between 20 and 30 years old. The mean age of participants in this data set is 34.36 years. The youngest participant was 18 years old and the oldest was 75 years old. This shows that the sample covers a wide range of adult ages, although the age range alone cannot show whether the sample was random.
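Since a few older participants can pull the mean upwards, it may help to compare the mean with the median and quartiles. A small sketch using only the first ten ages shown in the `str(cape)` output, so the numbers differ from the full-sample values above; `age10` is a name introduced for this illustration:

```r
# First ten ages from the str(cape) output; the full data has 100 values
age10 = c(26, 25, 40, 60, 23, 61, 61, 22, 21, 30)
mean(age10)      # 36.9
median(age10)    # 28
summary(age10)   # minimum, quartiles, median, mean and maximum
IQR(age10)       # spread of the middle 50% of ages
```

Here the mean (36.9) sits well above the median (28), which is the pattern expected when the ages are skewed to the right.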
This is an analysis of 1 quantitative variable facetted by 1 qualitative variable.
# Produce 2 comparative boxplots
boxplot(age~sex, col=c("lightblue","lightgreen"),horizontal=TRUE,main="Age of people surveyed, by gender")
legend("topright",c("Male","Female"),fill=c("lightblue","lightgreen"),title="Gender")
Summary: The age of people surveyed by gender was …
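To back up a comparison of the two boxplots with numbers, a summary statistic can be computed within each group using `tapply()`. A sketch on the first ten participants from the `str(cape)` output only; `sex10` and `age10` are illustrative names, and the results will differ from those for the full sample:

```r
# First ten participants from the str(cape) output (sex: 1 = Male, 2 = Female)
sex10 = c(1, 2, 1, 1, 2, 1, 2, 1, 1, 2)
age10 = c(26, 25, 40, 60, 23, 61, 61, 22, 21, 30)
# Median age within each sex group
med_by_sex = tapply(age10, sex10, median)
med_by_sex
```

For these ten rows the medians are 33 (group 1) and 27.5 (group 2). On the full data the equivalent call would be `tapply(cape$age, cape$sex, median)`.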
Overall Summary: The type of people in the full survey was …
sharkbite = cape$sharkbite
counts3=table(sharkbite)
counts3
## sharkbite
## 1 2
## 50 50
Summary: The size of the 2 survey groups was …
boxplot(age~sharkbite, col=c("lightblue","lightgreen"),horizontal=TRUE,main="Age of people surveyed, for the 2 survey periods")
legend("bottomright",c("Before Attack","After Attack"),fill=c("lightblue","lightgreen"),title="Time of Survey")
Summary: Comparing the 2 survey groups, the age of people was …
Overall Summary:
## Select the variable
beachattend = cape$beachattend
## Produce frequency table
counts4 = table(beachattend)
counts4
## beachattend
## 1 2 3
## 30 20 50
# Produce barplot
barplot(counts4, col = "lightblue", names.arg = c("More than once a week", "Once a week", "Once a month or less"), main="Beach Attendance of people surveyed")
Overall Summary: The people surveyed …
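The numeric codes can also be labelled once with `factor()`, so that tables and barplots pick up the labels automatically instead of relying on `names.arg`. A sketch, rebuilding the codes from the counts above and assuming the code-to-label mapping used in the barplot; `attend_codes` and `attend` are illustrative names:

```r
# Rebuild the attendance codes from the printed frequency table
attend_codes = rep(1:3, times = c(30, 20, 50))
# Attach labels (mapping assumed from the names.arg used in the barplot)
attend = factor(attend_codes, levels = 1:3,
                labels = c("More than once a week", "Once a week",
                           "Once a month or less"))
attend_counts = table(attend)
attend_counts
# The bar labels now come from the factor levels
barplot(attend_counts, col = "lightblue",
        main = "Beach Attendance of people surveyed")
```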
Note that the ‘Sharks’ variable is a number 1-10, but represents 3 levels of pride: (1-3) Little Pride, (4-6) Average Pride, (7-10) A lot of Pride.
So first we use a bit of data wrangling (extension) to replace the number by the level.
# This is a function converting the number to level
ClassPride = function(x)
{
  y = x
  # Assigning a string converts y to character; NA values stay NA
  y[x %in% 1:3] = "Little Pride"
  y[x %in% 4:6] = "Average Pride"
  y[x %in% 7:10] = "A lot of Pride"
  return(y)
}
# We apply the function to the Sharks variable, creating a new variable SharkPride
SharkPride = ClassPride(cape$Sharks)
Summarise the 2 variables now:
table(cape$Sharks)
##
## 1 2 3 4 5 6 7 8 9 10
## 20 7 9 1 6 9 3 8 6 31
table(SharkPride)
## SharkPride
## A lot of Pride  Average Pride   Little Pride
##             48             16             36
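Note that `table()` lists the character values alphabetically, not from least to most pride. One possible alternative (not the approach used in this lab) is `cut()`, which returns a factor whose levels stay in the order given. A sketch, rebuilding the scores from the table of `cape$Sharks` above; `scores` and `pride` are illustrative names:

```r
# Rebuild the 1-10 scores from the printed frequency table
scores = rep(1:10, times = c(20, 7, 9, 1, 6, 9, 3, 8, 6, 31))
# Bin into (0,3], (3,6], (6,10] and label the bins in increasing order of pride
pride = cut(scores, breaks = c(0, 3, 6, 10),
            labels = c("Little Pride", "Average Pride", "A lot of Pride"))
pride_counts = table(pride)
pride_counts
```

The counts are the same as above (36, 16 and 48), but now appear in the order Little, Average, A lot.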
Now replace the values of the old variable (Sharks) with those of the new variable (SharkPride):
## Create a new dataframe by replacing Sharks with the levels in SharkPride
cape2 = cape
cape2[,"Sharks"] = SharkPride
# Check the new data
dim(cape2)
## [1] 100 19
head(cape2)
## id sex age race beachattend Seals Dolphins Sharks Lifsavers NSRI
## 1 1 1 26 1 3 10 10 A lot of Pride 5 6
## 2 2 2 25 1 1 10 10 A lot of Pride NA NA
## 3 3 1 40 2 3 6 6 Average Pride 4 NA
## 4 4 1 60 1 3 10 10 A lot of Pride 7 10
## 5 5 2 23 2 1 10 10 A lot of Pride 7 10
## 6 6 1 61 1 1 10 10 A lot of Pride 9 10
## SSpotters beach sharkbite SealPride2 SharkPride2 SharkSpot2 Lifesavers2
## 1 5 1 1 3 3 2 2
## 2 7 1 1 3 3 3 NA
## 3 5 1 1 3 3 2 2
## 4 7 1 1 3 3 3 3
## 5 7 1 1 3 3 3 3
## 6 9 1 1 3 3 3 3
## NSRI2 Dolphin2
## 1 3 3
## 2 NA 3
## 3 NA 3
## 4 3 3
## 5 3 3
## 6 3 3
## Produce contingency table of 2 qualitative variables
counts5 = table(cape2$Sharks, cape2$sharkbite)
counts5
##
##                   1  2
##   A lot of Pride 26 22
##   Average Pride   5 11
##   Little Pride   19 17
## Produce stacked barplot
barplot(counts5, col=c("lightblue","lightgreen","lightyellow"),main="Pride in sharks before and after shark attack",xlab="Time of survey", names.arg=c("Before","After"))
# Add legend (labels must follow the row order of counts5, which is alphabetical)
legend("topright",rownames(counts5),fill=c("lightblue","lightgreen", "lightyellow"),title="Shark pride")
Summary: The effect of shark attack on shark pride appears to be …
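In a stacked barplot the colours are assigned to the rows of the table in the order they appear there, so the legend labels have to follow that same order. One way to avoid a mismatch is to let `barplot()` build the legend from the row names with `legend.text = TRUE`. A sketch, rebuilding the table printed above under the hypothetical name `pride_by_time`:

```r
# Rebuild the contingency table printed above
# (rows = pride level, columns = survey period)
pride_by_time = matrix(c(26, 5, 19, 22, 11, 17), nrow = 3,
                       dimnames = list(c("A lot of Pride", "Average Pride",
                                         "Little Pride"),
                                       c("Before", "After")))
# legend.text = TRUE takes the legend labels from the row names,
# so each label is paired with the colour of its own row
mids = barplot(pride_by_time,
               col = c("lightblue", "lightgreen", "lightyellow"),
               main = "Pride in sharks before and after shark attack",
               xlab = "Time of survey", legend.text = TRUE,
               args.legend = list(x = "topright", title = "Shark pride"))
```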
Note that the ‘Lifsavers’ variable (confidence in lifesavers) is again a number 1-10, which represents 3 levels of confidence in beach safety measures: (1-3) Not Confident, (4-6) Somewhat Confident, (7-10) Very Confident.
So again we use a bit of data wrangling (extension) to replace the number by the level.
ClassConfidence = function(x)
{
  y = x
  # Assigning a string converts y to character; NA values stay NA
  y[x %in% 1:3] = "Not Confident"
  y[x %in% 4:6] = "Somewhat Confident"
  y[x %in% 7:10] = "Very Confident"
  return(y)
}
# We apply the function to the Lifsavers variable, creating a new variable LifesaversConf
LifesaversConf = ClassConfidence(cape2$Lifsavers)
Now replace the values of the old variable (Lifsavers) with those of the new variable (LifesaversConf):
## Create a new dataframe by replacing Lifsavers with the levels in LifesaversConf
cape3 = cape2
cape3[,"Lifsavers"] = LifesaversConf
# Check the new data
dim(cape3)
## [1] 100 19
head(cape3)
## id sex age race beachattend Seals Dolphins Sharks
## 1 1 1 26 1 3 10 10 A lot of Pride
## 2 2 2 25 1 1 10 10 A lot of Pride
## 3 3 1 40 2 3 6 6 Average Pride
## 4 4 1 60 1 3 10 10 A lot of Pride
## 5 5 2 23 2 1 10 10 A lot of Pride
## 6 6 1 61 1 1 10 10 A lot of Pride
## Lifsavers NSRI SSpotters beach sharkbite SealPride2 SharkPride2
## 1 Somewhat Confident 6 5 1 1 3 3
## 2 <NA> NA 7 1 1 3 3
## 3 Somewhat Confident NA 5 1 1 3 3
## 4 Very Confident 10 7 1 1 3 3
## 5 Very Confident 10 7 1 1 3 3
## 6 Very Confident 10 9 1 1 3 3
## SharkSpot2 Lifesavers2 NSRI2 Dolphin2
## 1 2 2 3 3
## 2 3 NA NA 3
## 3 2 2 NA 3
## 4 3 3 3 3
## 5 3 3 3 3
## 6 3 3 3 3
## Produce contingency table of 2 qualitative variables
counts6 = table(cape3$Lifsavers, cape3$sharkbite)
counts6
##
##                       1  2
##   Not Confident       2  2
##   Somewhat Confident 14 13
##   Very Confident     33 34
## Produce stacked barplot
barplot(counts6, col=c("lightblue","lightgreen","lightyellow"),main="Confidence in life savers before and after shark attack",xlab="Time of survey", names.arg=c("Before","After"))
# Add legend
legend("topright",c("Not confident","Somewhat confident", "Very confident"),fill=c("lightblue","lightgreen", "lightyellow"),title="Confidence in Lifesavers")
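The counts in the table above add up to 98 rather than 100, because `table()` drops missing values by default. A sketch of how the `NA`s could be shown explicitly with `useNA = "ifany"`, using the first six rows from `head(cape3)` as a stand-in; `conf6` and `period6` are illustrative names:

```r
# First six rows from head(cape3): confidence level and survey period
conf6 = c("Somewhat Confident", NA, "Somewhat Confident",
          "Very Confident", "Very Confident", "Very Confident")
period6 = c(1, 1, 1, 1, 1, 1)
# Default: the row with the missing value is silently dropped
sum(table(conf6, period6))                     # 5
# useNA = "ifany" adds a row for the missing value
with_na = table(conf6, period6, useNA = "ifany")
with_na
```

On the full data the equivalent call would be `table(cape3$Lifsavers, cape3$sharkbite, useNA = "ifany")`.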
Summary: