hapsur <- read.table("http://nathanieldphillips.com/wp-content/uploads/2016/04/hapsur.txt")
In this WPA, you’ll analyze data from a survey about happiness. In the survey, 500 people were asked the following questions:
age - How old are you?
sex - What is your sex?
exercise - How many hours a week do you exercise on average?
relationship - Are you in a long-term romantic relationship?
drinks - How many alcoholic drinks do you consume a week on average?
happiness - On a scale of 0 to 100, how happy are you with your life?
A. You’ll need to use the yarrr package in this WPA. It has been updated in the past few days so you should reinstall it.
library(devtools)
install_github("ndphillips/yarrr")
B. Now that you’ve installed the latest version, you’re good to go. However, as always, you need to load the package before you can use it. Load the package with the library() function:
library(yarrr)
C. Create a new project called wpa.Rproj. You’ll use this project for this and all future WPAs.
D. Navigate to the folder containing wpa.Rproj, and create two new folders called data and scripts. You’ll save all future datasets and WPAs in these two folders.
E. Now it’s time to download the data for this WPA. You can find the data stored as a tab-delimited text file at http://nathanieldphillips.com/wp-content/uploads/2016/04/hapsur.txt. Download the file and save it to the data folder in your wpa project.
F. Open a new R script and save it in the scripts folder under the name WPA5.
G. Using read.table(), load the happiness survey data into R as a new object called hapsur.
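As a minimal sketch of how read.table() handles a tab-delimited file, the code below writes a tiny made-up table to a temporary file and reads it back. The toy data, the temporary path, and the assumption that the first row holds column names (header = TRUE) are illustrative only; check the downloaded file before relying on them.

```r
# Toy data standing in for the survey file (values are made up)
toy <- data.frame(age = c(25, 31, 47),
                  sex = c("m", "f", "f"),
                  happiness = c(70, 82, 64))

# Write it as a tab-delimited text file in a temporary location
toy.path <- tempfile(fileext = ".txt")
write.table(toy,
            file = toy.path,
            sep = "\t",
            row.names = FALSE,
            quote = FALSE)

# Read it back:
#   sep = "\t"     columns are separated by tabs
#   header = TRUE  the first row contains the column names
toy.in <- read.table(file = toy.path,
                     sep = "\t",
                     header = TRUE)

# Inspect the result
head(toy.in)
str(toy.in)
```

In a project laid out as in steps D and E, the file argument would typically be a path relative to the project folder, such as "data/hapsur.txt".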
Open the help menu for the histogram function
Create a boring histogram of the exercise data by only specifying the exercise data without any additional arguments.
Now, create a more interesting histogram by making the following changes
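To illustrate how additional arguments change a histogram, here is a rough sketch using simulated data. The vector ex and the particular argument values are assumptions chosen for illustration, not the specific changes asked for in this exercise.

```r
# Simulated "hours of exercise" (made-up data, not the survey)
set.seed(1)
ex <- rexp(500, rate = 1 / 4)

# hist() returns an object describing the bins, so we can store it
h <- hist(x = ex,
          main = "Weekly exercise (simulated)",  # plot title
          xlab = "Hours per week",               # x-axis label
          col = "skyblue",                       # fill color of the bars
          border = "white",                      # color of the bar borders
          breaks = 20)                           # suggested number of bins

# The bin boundaries and counts are stored in the returned object
h$breaks
h$counts
```

Note that breaks = 20 is only a suggestion to R; the actual number of bins may differ slightly.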
Now, create two separate histograms of the exercise data on top of each other. One for men and one for women.
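One way to stack two plots is to split the plotting window with par(mfrow = ...). The sketch below uses a simulated data frame called toy; its column coding ("m" and "f") is an assumption and may not match the coding in hapsur.

```r
# Simulated data: 100 men and 100 women (made-up values)
set.seed(2)
toy <- data.frame(sex = rep(c("m", "f"), each = 100),
                  exercise = c(rnorm(100, mean = 5, sd = 2),
                               rnorm(100, mean = 6, sd = 2)))

# Split the plotting window into 2 rows and 1 column.
# par() returns the previous settings, so we can restore them later.
old.par <- par(mfrow = c(2, 1))

# Use the same x-axis limits in both plots so they are comparable
x.range <- range(toy$exercise)

h.m <- hist(toy$exercise[toy$sex == "m"],
            xlim = x.range,
            main = "Men",
            xlab = "Exercise (hours per week)")

h.f <- hist(toy$exercise[toy$sex == "f"],
            xlim = x.range,
            main = "Women",
            xlab = "Exercise (hours per week)")

# Restore the original plotting layout
par(old.par)
```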
The abline() function allows you to add lines to plots. For example:
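A minimal sketch with simulated data (the vector x is made up for illustration): abline() takes v for vertical lines and h for horizontal lines, plus the usual col, lwd, and lty arguments for appearance.

```r
# Simulated data (not the survey)
set.seed(3)
x <- rnorm(200, mean = 50, sd = 10)

# Draw a histogram first; abline() adds to an existing plot
hist(x,
     main = "abline() demo",
     xlab = "x")

# Vertical line at the mean
abline(v = mean(x), col = "red", lwd = 2)

# Dashed vertical line at the median
abline(v = median(x), col = "blue", lty = 2)

# Dotted horizontal line at a frequency of 10
abline(h = 10, col = "gray", lty = 3)
```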
Using abline(), add a line showing the mean of each of the two histograms. Feel free to change the color, width, and type of lines using additional arguments. As always, check the help menu for details.
Open the help menu for plot. Then open the help menu for par. Here you can see all the different arguments you can specify in plots.
Create a really boring scatterplot of exercise and happiness by only specifying the data without additional arguments
Now create a prettier scatterplot by making the following changes
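As a rough illustration of how plotting arguments affect a scatterplot, the sketch below uses simulated vectors ex and hap. The data and the argument values are assumptions for demonstration, not the specific changes requested here.

```r
# Simulated data (made-up values, not the survey)
set.seed(4)
ex <- runif(100, min = 0, max = 10)
hap <- 50 + 3 * ex + rnorm(100, mean = 0, sd = 10)

plot(x = ex,
     y = hap,
     main = "Exercise and happiness (simulated)",  # title
     xlab = "Exercise (hours per week)",           # x-axis label
     ylab = "Happiness",                           # y-axis label
     pch = 16,                                     # filled circles
     col = gray(.3, alpha = .5),                   # semi-transparent gray
     cex = 1.2,                                    # point size
     xlim = c(0, 10),                              # x-axis limits
     ylim = c(0, 100))                             # y-axis limits
```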
You can easily add a regression line to a plot using a combination of abline() and lm(). Add a regression line to your plot using the following code:
# Define the model
mod <- lm(happiness ~ exercise,
          data = hapsur)
# Add model line to the plot
abline(reg = mod,
       lty = 2)
Repeat the previous plot, but now create separate points for people in a relationship and those not in a relationship.
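One common pattern for plotting groups separately is to set up an empty plot and then call points() once per group. The sketch below uses a simulated data frame toy in which relationship is coded "yes" or "no"; that coding is an assumption and may differ from the coding in hapsur.

```r
# Simulated data (made-up values, not the survey)
set.seed(5)
toy <- data.frame(exercise = runif(100, min = 0, max = 10),
                  relationship = rep(c("yes", "no"), times = 50))
toy$happiness <- 50 + 3 * toy$exercise + rnorm(100, mean = 0, sd = 10)

# Logical vector: TRUE for people in a relationship
in.rel <- toy$relationship == "yes"

# Set up an empty plotting space that fits all the data
plot(x = 1,
     type = "n",
     xlim = range(toy$exercise),
     ylim = range(toy$happiness),
     xlab = "Exercise (hours per week)",
     ylab = "Happiness",
     main = "Happiness by exercise and relationship (simulated)")

# Add one set of points per group
points(toy$exercise[in.rel], toy$happiness[in.rel],
       pch = 16, col = "red")
points(toy$exercise[!in.rel], toy$happiness[!in.rel],
       pch = 16, col = "blue")

# Add a legend so the colors can be interpreted
legend("topleft",
       legend = c("In a relationship", "Not in a relationship"),
       pch = 16,
       col = c("red", "blue"),
       bty = "n")
```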
Open the help menu for barplot
Using aggregate(), calculate the mean number of drinks separately for men and women and assign the result to an object called drinks.agg
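To show what aggregate() returns, here is a small sketch on a made-up data frame called toy (the values and the "m"/"f" coding are assumptions). The formula drinks ~ sex reads as "drinks as a function of sex".

```r
# Tiny made-up dataset
toy <- data.frame(sex = c("m", "m", "f", "f", "f"),
                  drinks = c(4, 6, 2, 3, 4))

# Calculate the mean of drinks for each level of sex.
# The formula is passed unnamed, as the first argument.
toy.agg <- aggregate(drinks ~ sex,
                     data = toy,
                     FUN = mean)

# The result is a data frame with one row per group
toy.agg
```

The result has one column for the grouping variable and one for the summary statistic, so toy.agg$drinks contains the group means.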
Using barplot(), create a barplot of the average drink data for men and women (using the drinks.agg object you just created)
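A minimal sketch of barplot() on aggregated data: the height argument takes the bar heights and names.arg takes the labels. The data frame toy.agg below is made up for illustration and simply mimics the shape of an aggregate() result.

```r
# Made-up aggregated data (one row per group)
toy.agg <- data.frame(sex = c("f", "m"),
                      drinks = c(3, 5))

# barplot() returns the x-coordinates of the bar midpoints,
# which is handy if you want to add text or lines later
bar.mids <- barplot(height = toy.agg$drinks,
                    names.arg = toy.agg$sex,
                    main = "Average drinks per week (made-up data)",
                    xlab = "Sex",
                    ylab = "Mean drinks",
                    col = c("orchid", "steelblue"),
                    ylim = c(0, 6))
```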
Open the help menu for pirateplot
?pirateplot
Now create a pirateplot showing the relationship between sex and drinks
Change the look of the plot by changing the pal, point.pch, point.o, and bean.o arguments
Create a pirateplot showing the relationship between sex AND relationship status on happiness. That is, include both independent variables sex and relationship in your plot. Try using the ‘basel’ or ‘google’ palette!
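As a sketch of how a formula with two independent variables looks, the code below builds a simulated data frame toy and passes it to pirateplot(). The data, the "yes"/"no" and "m"/"f" codings, and the plot title are assumptions for illustration. The call is wrapped in a check so that the code still runs if yarrr is not installed.

```r
# Simulated data (made-up values, not the survey)
set.seed(6)
toy <- data.frame(sex = rep(c("m", "f"), each = 40),
                  relationship = rep(c("yes", "no"), times = 40),
                  happiness = round(rnorm(80, mean = 60, sd = 15)))

# Two independent variables are combined with + in the formula
if (requireNamespace("yarrr", quietly = TRUE)) {
  yarrr::pirateplot(formula = happiness ~ sex + relationship,
                    data = toy,
                    pal = "basel",
                    main = "Happiness by sex and relationship (simulated)")
}
```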
Create a pirateplot showing the relationship between drinks and happiness.
N.balloons <- 100 # Try 10, 100, 1000
my.palette <- "basel" # try "basel", "google", "nemo"
# x.loc - horizontal locations of balloons
x.loc <- rnorm(N.balloons, mean = 100, sd = 15)
# y.loc - vertical locations of balloons
y.loc <- rnorm(N.balloons, mean = 100, sd = 10)
# size - size of the balloons
size <- runif(N.balloons, min = 0, max = 3)
# Set up the plotting space
# Remove margins
par(mar = c(0, 0, 0, 0))
plot(1,
     xaxt = "n",
     yaxt = "n",
     bty = "n",
     xlab = "",
     ylab = "",
     xlim = c(70, 130),
     ylim = c(70, 130),
     type = "n"
)
# Add Strings
segments(x0 = x.loc + rnorm(N.balloons, mean = 0, sd = .3),
         y0 = y.loc - (size * 1.5),
         x1 = x.loc,
         y1 = y.loc,
         col = transparent("black", 1 - size / 10),
         lwd = size / 3
)
# Add balloons
points(x.loc,
       y.loc,
       cex = size, # Size of the balloons
       pch = 21,
       col = "white", # white border
       bg = piratepal(palette = my.palette, trans = .5))
class.initials <- c("AA", "RA", "CB", "GB", "TB", "GaBr", "EE", "VF", "SF", "SH",
                    "WH", "SeHu", "LK", "TK", "LL", "SM", "JM", "LN", "KO", "NP",
                    "SR", "AS", "SS", "CS", "GS", "LS", "ShSt", "KS", "SaSt", "Bv",
                    "LW")
N.balloons <- length(class.initials)
my.palette <- "basel" # try "basel", "google", "nemo"
# x.loc - horizontal locations of balloons
x.loc <- rnorm(N.balloons, mean = 100, sd = 15)
# y.loc - vertical locations of balloons
y.loc <- x.loc + rnorm(N.balloons, mean = 0, sd = 15)
# size - size of the balloons
size <- rexp(N.balloons, rate = .7)
# Set up the plotting space
# Regular margins
par(mar = c(5, 4, 4, 1))
plot(1,
     xlab = "Drinks",
     ylab = "Happiness",
     main = "R Pirate drinks per day and Happiness",
     xlim = c(60, 140),
     ylim = c(60, 140),
     type = "n")
mtext("Point size indicates love of R", line = .5)
# Add gray background and gridlines
rect(-1000, -1000, 1000, 1000, col = gray(.96))
abline(h = seq(60, 140, 5), lwd = c(.5, 1), col = "white",
       v = seq(60, 140, 5))
# Add regression line
abline(lm(y.loc ~ x.loc,
          data = as.data.frame(cbind(y.loc, x.loc))),
       lty = 2)
# Add initials below each balloon string
text(x = x.loc,
     y = y.loc - (size * 2),
     labels = class.initials,
     cex = size / 3,
     pos = 1
)
# Add Strings
segments(x0 = x.loc + rnorm(N.balloons, mean = 0, sd = .3),
         y0 = y.loc - (size * 1.5),
         x1 = x.loc,
         y1 = y.loc,
         col = transparent("black", 1 - size / max(size)),
         lwd = size / 5
)
# Add balloons
points(x.loc,
       y.loc,
       cex = size, # Size of the balloons
       pch = 21,
       col = "white", # white border
       bg = piratepal(palette = my.palette, trans = .5))