This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. R code chunks look like this:
# This is a comment line, which we will often use as a question prompt.
# Do not change the name of the chunk because it is used when grading.
note <- "This is a line of R code that sets a variable to a value"
note # This line will print the value to the knitted document.
## [1] "This is a line of R code that sets a variable to a value"
We will also use multiple choice questions within this assignment.
You will use an X to indicate you choice, as shown
below.
Which of the following is a shade of red?
Gradescope is the website that we will be using to grade your RMarkdown exercises, and you will access it via Canvas. To submit, you will upload your RMarkdown file to Gradescope, and Gradescope will run your file and its output through many different unit tests to validate your code. Each unit test will check something specific about your code and if something fails, you will received a report about the failure. You may submit as many times as you want up until the deadline for the assignment.
There is a learning curve attached to Gradescope, but once you get the hang of using unit testing to fix your code, it will improve your productivity. Due to limitations of the unit tests, some unit tests will be difficult to pass on first submission, but in these cases the failure messages will help you fix your code to match what the unit test expects.
Unit testing is new to BIO 415/514 this semester. There may be errors with the assignments or the unit tests, please reach out to us with any issues or feedback. We have tried our best to make the unit test output understandable by novice R uses, if you have problems understanding your unit test reports, please reach out to us for help.
setwd() in your RMarkdown, as your local
directory structure will not match the server’s directory
structure.Once you have completed all of the exercises below, use the Knit button to generate an html file with all of your code and output. Then submit both the html file and this R Markdown file on Canvas.
What is the sex ratio at birth of red deer? You have data from six deer mothers on the sex of all their offspring, as follows:
Use the prompts in the code chunk below to write a script to analyze these data. For each comment, write the corresponding code immediately below, as in the first example.
# Store the data in two vectors, one for males and one for females.
male <- c(2, 5, 3, 4, 1, 2)
female <- c(3, 2, 1, 5, 4, 3)
# Calculate the sex ratio for Deer 1. That is, what proportion of her
# offspring are female?
female[1] / (male[1] + female[1])
## [1] 0.6
# Calculate the sex ratio for all of the deer mothers.
sex_ratio <- female / (female + male)
print (sex_ratio)
## [1] 0.6000000 0.2857143 0.2500000 0.5555556 0.8000000 0.6000000
# Do this in one single R command.
# Do not separately calculate the ratio for each mother.
# Calculate the average sex ratio across the six mothers.
average_sex_ratio <- mean(sex_ratio)
print(average_sex_ratio)
## [1] 0.5152116
Do red deer appear to have an equal sex ratio at birth?
Body mass index (BMI) is a measure of body weight designed to account for differences in height. It is equal to weight divided by height squared, with weight measured in kilograms and height in meters. You have data on the height and weight of ten people, as follows:
Normally, you would input these two columns of data into a spreadsheet via Excel or Google Sheets and save it as a CSV file (comma separated values) with appropriate column labels, e.g. “height” and “weight.” However, Gradescope will not have access to your local CSV unless you upload it alongside your script. Because this data is small enough, we will input create a data frame to hold it directly in R.
Use the prompts in the code chunk below to write a script to analyze this data.
# We can construct a data frame to hold the first three values
data.frame(
height = c(167, 175, 180),
weight = c(64, 72, 73)
)
## height weight
## 1 167 64
## 2 175 72
## 3 180 73
# Construct a data frame to hold all values of this data.
# Save it to a variable called `body_data`.
height <- c(167, 175, 180, 155, 173, 185, 181, 163, 179, 170)
weight <- c(64, 72, 73, 65, 75, 74, 82, 69, 79, 72)
body_data <- data.frame(height, weight)
# Show the size of the data frame.
dim(body_data)
## [1] 10 2
# Show the names of the data frame's variables.
names(body_data)
## [1] "height" "weight"
# Calculate the average weight of the ten people.
average_weight <- mean(body_data$weight)
print(average_weight)
## [1] 72.5
# Calculate the BMI of each person and store it in the data frame.
# Do this with one single command; do not separately calculate the BMI for each
# person. Don't forget that BMI expects height in meters, not centimeters.
body_data$bmi <- body_data$weight / ((body_data$height / 100)^2)
print(body_data)
## height weight bmi
## 1 167 64 22.94812
## 2 175 72 23.51020
## 3 180 73 22.53086
## 4 155 65 27.05515
## 5 173 75 25.05931
## 6 185 74 21.62162
## 7 181 82 25.02976
## 8 163 69 25.97012
## 9 179 79 24.65591
## 10 170 72 24.91349
# Make a scatterplot of weight vs BMI. Be sure to label the plot axes.
plot(body_data$weight, body_data$bmi,
xlab = "Weight (kg)", ylab = "Body mass index",
main = "Scatterplot of Weight vs. Body Mass Index")
Does BMI appear to depend on weight?
How fast does the concentration of a toxin in the bloodstream
decrease? A typical pattern is that the concentration decreases by a
fixed proportion each unit time (e.g., it goes down by half every 2
hours). File toxin.csv contains data on the concentration
(in parts per million) of a toxin in the bloodstream of a rat, measured
every hour for eight hours. Use the prompts in the code chunk below to
write a script to analyze these data.
# Read the data into a data frame called `toxin_data`.
toxin_data <- read.csv("toxin.csv")
print(toxin_data)
## concentration time
## 1 112.0 1
## 2 53.0 2
## 3 22.0 3
## 4 9.0 4
## 5 3.0 5
## 6 1.0 6
## 7 0.6 7
## 8 0.2 8
str(toxin_data)
## 'data.frame': 8 obs. of 2 variables:
## $ concentration: num 112 53 22 9 3 1 0.6 0.2
## $ time : int 1 2 3 4 5 6 7 8
# Plot toxin concentration over time with appropriate axis labels.
plot(toxin_data$time,
log(toxin_data$concentration),
xlab = "Time (hours)",
ylab = "Toxin concentration (ppm)",
main = "Toxin Concentration Over Time")
# Plot the logarithm of concentration over time with appropriate axis labels.
plot(toxin_data$time, log(toxin_data$concentration),
xlab = "Time (hours)",
ylab = "Log toxin concentration (ppm)",
main = "Log of Toxin Concentration Over Time")
How do the two plots compare?
How do the weights of male and female monitor lizards compare? File
lizards.csv contains the weights of ten male and ten female
lizards (in kilograms). Use the prompts in the code chunk below to write
a script to analyze these data.
# Read the data into a data frame called `lizard_data`.
lizard_data <- read.csv("lizards.csv")
print(lizard_data)
## sex weight
## 1 male 0.8
## 2 male 1.0
## 3 male 0.7
## 4 male 0.6
## 5 male 1.2
## 6 male 0.8
## 7 male 0.8
## 8 male 1.3
## 9 male 1.3
## 10 male 0.6
## 11 female 1.0
## 12 female 1.4
## 13 female 1.5
## 14 female 1.6
## 15 female 1.3
## 16 female 1.3
## 17 female 1.1
## 18 female 1.2
## 19 female 1.3
## 20 female 1.5
# Calculate the average weight of males and the average weight of females.
# Create a vector called `averages` that contains the male average and the
# female average in that order.
# Hint: You can manually create a vector using the `c()` function:
# `c(some_thing, other_thing)`.
male_avg <- mean(lizard_data$weight[lizard_data$sex == "male"])
female_avg <- mean(lizard_data$weight[lizard_data$sex == "female"])
averages <- c(male_avg, female_avg)
print(averages)
## [1] 0.91 1.32
# Make a bar plot showing the average weight of each sex with appropriate axis
# labels, bar labels, and main title.
barplot(averages, names.arg = c("Male", "Female"),
xlab = "Sex",
ylab = "Weight (g)",
main = "Average weights of male and female lizards")
Does one sex seem bigger?
Can lion age be told by the amount of black pigmentation on the nose?
File lions.csv contains the age (in years) and proportion
of black pigmentation on the nose for 20 male lions. Use the prompts in
the code chunk below to write a script to analyze these data.
# Read the data into a data frame called `lion_data`.
lion_data <- read.csv("lions.csv")
print(lion_data)
## age black
## 1 2.3 0.03
## 2 3.5 0.13
## 3 2.0 0.37
## 4 11.0 0.85
## 5 10.6 1.00
## 6 6.5 0.63
## 7 5.7 0.55
## 8 1.1 0.26
## 9 0.6 0.04
## 10 12.7 0.96
## 11 11.3 0.91
## 12 8.8 0.47
## 13 10.9 0.65
## 14 11.1 0.70
## 15 7.6 0.51
# Make a scatter plot of the relation between age and proportion of black
# pigmentation on the nose. Use appropriate axis labels.
plot(lion_data$age, lion_data$black_proportion,
xlab = "Age (years)",
ylab = "Proportion of black pigmentation on nose",
main = "Age vs. Proportion of Black Pigmentation")
Based on this plot, what can you say about the usefulness of nose pigmentation for estimating the age of male lions?
Please watch the video lectures about the normal distribution before doing these exercises! Also, please first watch the tutorial video about working with normal distributions in R.
In many animals, island populations diverge in size from mainland populations of the same species. You are interested in finding out if this is the case for monitor lizards, which are widespread on the Australian continent and surrounding islands. Based on earlier studies, you believe that the mean and variance for the lengths of adult mainland lizards are as follows: Mean = 100 cm, Variance = 400. You also make the reasonable assumption that length is a normal random variable. Before collecting any data on the Kangaroo Island lizards, you will use this prior knowledge to say something about their expected probability distribution. Use the prompts in the code chunk below to write a script that plots the PDF and the CDF.
We will create a plots of the probability and cumulative density functions of lizard lengths. The plots will include informative axis titles and an overall titles.
# First enter parameter values using the variables `lizard_mu` and `lizard_var`.
lizard_mu <- 100
lizard_var <- 400
lizard_sd <- sqrt(lizard_var)
# Create a vector of X values centered on lizard_mu and covering 6 standard
# deviations. Save this vector into a variable called `lizard_x`.
lizard_x <- seq(lizard_mu - 3 * lizard_sd,
lizard_mu + 3 * lizard_sd, length = 100)
# Using `lizard_x`, plot a line graph of the PDF for lizard lengths.
# Include informative axis titles and an overall title.
plot(lizard_x, dnorm(lizard_x, mean = lizard_mu, sd = lizard_sd),
type = "l",
xlab = "Length (cm)",
ylab = "Probability density",
main = "Probability Density Function of Lizard Lengths")
# Using `lizard_x`, plot a line graph of the CDF for lizard lengths.
# Include informative axis titles and an overall title.
plot(lizard_x, pnorm(lizard_x, mean = lizard_mu, sd = lizard_sd),
type = "l",
xlab = "Length (cm)",
ylab = "Cumulative probability",
main = "Cumulative Probability Function of Lizard Lengths")
Write another script to answer the following questions about what you expect to find when you measure Kangaroo island lizards, assuming that they follow the same probability distribution as mainland lizards. Inspect the plots you made in 1 to check if your answers make sense.
# What is the probability density for a length of 75 cm?
dnorm(75, mean = lizard_mu, sd = lizard_sd)
## [1] 0.009132454
# What is the probability that a lizard will be...
# Less than or equal to 75 cm?
pnorm(75, mean = lizard_mu, sd = lizard_sd)
## [1] 0.1056498
# What is the probability that a lizard will be...
# Greater than 120 cm?
1 - pnorm(120, mean = lizard_mu, sd = lizard_sd)
## [1] 0.1586553
# What is the probability that a lizard will be...
# Between 95 and 115cm?
pnorm(115, mean = lizard_mu, sd = lizard_sd) - pnorm(95, mean = lizard_mu, sd = lizard_sd)
## [1] 0.372079
# What is the probability that a lizard will be...
# At least 40 cm different from the mean, in either direction?
pnorm(60, mean = lizard_mu, sd = lizard_sd) + (1 - pnorm(140, mean = lizard_mu, sd = lizard_sd))
## [1] 0.04550026
# What is the probability that a lizard will be...
# Further than 0.7 standard deviations below the mean?
cutoff <- lizard_mu - 0.7 * lizard_sd
pnorm(cutoff, mean = lizard_mu, sd = lizard_sd)
## [1] 0.2419637
# What is the probability that a lizard will be...
# Closer than 1.3 standard deviations to the mean?
lower <- lizard_mu - 1.3 * lizard_sd
upper <- lizard_mu + 1.3 * lizard_sd
pnorm(upper, lizard_mu, lizard_sd) - pnorm(lower, lizard_mu, lizard_sd)
## [1] 0.806399
# What is the probability that a lizard will be...
# Further than 1.5 standard deviations from the mean?
lower <- lizard_mu - 1.5 * lizard_sd
upper <- lizard_mu + 1.5 * lizard_sd
pnorm(lower, lizard_mu, lizard_sd) + (1 - pnorm(upper, lizard_mu, lizard_sd))
## [1] 0.1336144
# What are the quartiles of the distribution? That is, what are the 0.25, 0.5,
# and 0.75 quantiles? Return the values as a vector of quartiles.
# Hint: You can manually create a vector using the `c()` function:
# `c(some_thing, other_thing)`.
quartiles <- qnorm(c(0.25, 0.5, 0.75), mean = lizard_mu, sd = lizard_sd)
print(quartiles)
## [1] 86.5102 100.0000 113.4898
# 2/3 of observations are expected to lie below what value?
value_2thirds <- round(qnorm(2/3, mean = lizard_mu, sd = lizard_sd))
print(value_2thirds)
## [1] 109
# 80% of observations are expected to lie above what value?
qnorm(0.2, mean = lizard_mu, sd = lizard_sd)
## [1] 83.16758
Britain’s domestic intelligence service MI5 places an upper limit on the height of its spies, on the assumption that people who are too tall do not blend in well with the crowd. To be a spy, men must be no taller than 180 cm (~5 feet 11 inches) and women no taller than 173 cm (~5 feet 8 inches). Write a script to answer the questions in the code chunk below.
# If the mean height of British men is 177 cm, with a standard deviation of
# 7.1 cm, what proportion of British men are excluded from being spies by this
# height restriction? Assume that height follows a normal distribution.
1 - pnorm(180, mean = 177, sd = 7.1)
## [1] 0.3363172
# The mean height of British women is 163.3 cm, with a standard deviation of
# 6.4 cm. Assuming a normal distribution of female height, what fraction of
# women meet MI5's height standard?
pnorm(173, mean = 163.3, sd = 6.4)
## [1] 0.9351929
# Imagine that MI5 wants to change its maximum height for female spies. Its
# goal is to exclude the same proportion of women as men. What should the new
# maximum height for women be?
male_cutoff <- 1 - pnorm(180, mean = 177, sd = 7.1)
qnorm(1 - male_cutoff, mean = 163.3, sd = 6.4)
## [1] 166.0042
# Sean Connery, the original James Bond, was 183 cm tall. By how many standard
# deviations did he exceed the height limit for spies?
(183 - 180) / 7.1
## [1] 0.4225352