R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. R code chunks look like this:

# Your code goes here.

Basic R exercises

Once you have completed all of the exercises below, use the Knit button to generate an html file with all of your code and output. Then submit both the html file and this R Markdown file on Canvas.

Question 1

What is the sex ratio at birth of red deer? You have data from six deer mothers on the sex of all their offspring, as follows:
Deer 1: 3 female, 2 male
Deer 2: 2 female, 5 male
Deer 3: 1 female, 3 male
Deer 4: 5 female, 4 male
Deer 5: 4 female, 1 male
Deer 6: 3 female, 2 male
Use the prompts in the code chunk below to write a script to analyze these data. For each comment, write the corresponding code immediately below, as in the first example.

# Store the data in two vectors, one for males and one for females.
male <- c(2, 5, 3, 4, 1, 2)
female <- c(3, 2, 1, 5, 4, 3)

# Calculate the sex ratio for Deer 1. That is, what proportion of her offspring are female?
female[1] / (male[1] + female[1])
## [1] 0.6
# Calculate the sex ratio for all of the deer mothers. Do this in one single R command; do not separately calculate the ratio for each mother.
female[1:6] / (male[1:6] + female[1:6])
## [1] 0.6000000 0.2857143 0.2500000 0.5555556 0.8000000 0.6000000
# Calculate the average sex ratio across the six mothers.
ratio.data <- c(female/(male + female))
mean (ratio.data)
## [1] 0.5152116

Do red deer appear to have an equal sex ratio at birth?
Yes, they do, since the ratio is 0.5152116, which is close to 0.5, which is a perfectly equal sex ratio.

Question 2

Body mass index (BMI) is a measure of body weight designed to account for differences in height. It is equal to weight divided by height squared, with weight measured in kilograms and height in meters. You have data on the height and weight of ten people, as follows:
167 cm, 64 kg
175 cm, 72 kg
180 cm, 73 kg
155 cm, 65 kg
173 cm, 75 kg
185 cm, 74 kg
181 cm, 82 kg
163 cm, 69 kg
179 cm, 79 kg
170 cm, 72 kg
Put these data in an Excel spreadsheet and save it as a CSV file (comma separated values). Put the data in two columns, one for height and one for weight. Put variable labels in the first row of each column (i.e., ‘height’ and ‘weight’).
Use the prompts in the code chunk below to write a script to analyze these data.

# Show the current working directory.
getwd()
## [1] "/Users/ashley/Downloads"
# Change the working directory to where the CSV file is located. You can do this either with the setwd() function or by using RStudio's Session menu.
setwd("/Users/ashley/Desktop")

# Clear the workspace of any previously defined variables.
rm(list=ls())

# Read the data into a data frame.
BMI.data <- read.csv("Module1-Q2.csv")

# Show the size of the data frame.
dim(BMI.data)
## [1] 10  2
# Show the names of the data frame's variables.
names(BMI.data)
## [1] "height" "weight"
# Calculate the average weight of the ten people.
mean(BMI.data$weight)
## [1] 72.5
# Calculate the BMI of each person and store it in the data frame. Do this with one single command; do not separately calculate the BMI for each person. Don't forget that BMI expects height in meters, not centimeters.
calc_BMI <- BMI.data$weight/((BMI.data$height/100)^2)

# Make a scatterplot of BMI vs. weight. Be sure to label the plot axes.
plot(calc_BMI~BMI.data$weight,
     xlab = "weight (kg)",
     ylab = "BMI (kg/m^2)",
     main = "Plot of BMI versus Weight",
     )

Does BMI appear to depend on weight?
No, there does not appear to be a trend of BMI with weight.

Question 3

How fast does the concentration of a toxin in the bloodstream decrease? A typical pattern is that the concentration decreases by a fixed proportion each unit time (e.g., it goes down by half every 2 hours). File toxin.csv contains data on the concentration (in parts per million) of a toxin in the bloodstream of a rat, measured every hour for eight hours. Use the prompts in the code chunk below to write a script to analyze these data.

# Read the data into a data frame.
toxin.data <- read.csv ("toxin.csv")

# Plot toxin concentration over time.
plot(toxin.data$concentration~toxin.data$time,
     xlab = "time (hours)",
     ylab = "concentration (ppm)",
     main = "Toxin Concentration Over Time",
     )

# Plot the logarithm of concentration over time.
toxin.data$concentration.log <- log(toxin.data$concentration)
plot(toxin.data$concentration.log~toxin.data$time,
     xlab = "time (hours)",
     ylab = "log(concentration (ppm) )",
     main = "Log of Toxin Concentration Over Time",
     )

How do the two plots compare?
The first plot shows a first order logarithmic decay plot that when the logarithm of the concentration is taken, becomes a linear trend.

Question 4

How do the weights of male and female monitor lizards compare? File lizards.csv contains the weights of ten male and ten female lizards (in kilograms). Use the prompts in the code chunk below to write a script to analyze these data.

# Read the data into a data frame.
lizards.data <- read.csv ("lizards.csv")

# Calculate the average weight of males and the average weight of females.
male.data <- subset(lizards.data, lizards.data$sex == "male")
female.data <- subset(lizards.data, lizards.data$sex == "female")
male.avg <- mean(male.data$weight)
female.avg <-mean(female.data$weight)
averages <- c(male.avg, female.avg)

# Make a bar plot showing the average weight of each sex.
barplot(averages,
        xlab = "Sex",
        ylab = "Average Weight (kg)",
        names.arg = c("Male", "Female"),
        main = "Comparison of Lizard Average Weight by Sex",
        )

Does one sex seem bigger?
Yes, females seem bigger.

Question 5

Can lion age be told by the amount of black pigmentation on the nose? File lions.csv contains the age (in years) and proportion of black pigmentation on the nose for 20 male lions. Use the prompts in the code chunk below to write a script to analyze these data.

# Read the data into a data frame.
lions.data <- read.csv("lions.csv")

# Make a scatter plot of the relation between age and proportion of black pigmentation on the nose
plot(lions.data$black~lions.data$age,
     ylab = "proportion black pigmentation",
     xlab = "age (years)",
     main = "Relationship between Lion Age and Proportion of Nose Black Pigmentation",
     )

Based on this plot, what can you say about the usefulness of nose pigmentation for estimating the age of male lions?
There appears to be an upward trend of increased proportion of nose black pigmentation as lions get older.

Normal probability exercises

Please watch the video lectures about the normal distribution before doing these exercises! Also, please first watch the tutorial video about working with normal distributions in R.

Question 6

In many animals, island populations diverge in size from mainland populations of the same species. You are interested in finding out if this is the case for monitor lizards, which are widespread on the Australian continent and surrounding islands. Based on earlier studies, you believe that the mean and variance for the lengths of adult mainland lizards are as follows: Mean = 100 cm, Variance = 400. You also make the reasonable assumption that length is a normal random variable. Before collecting any data on the Kangaroo Island lizards, you will use this prior knowledge to say something about their expected probability distribution. Use the prompts in the code chunk below to write a script that plots the PDF and the CDF.

# Plot the probability density function of lizard lengths. The plot must include informative axis titles and an overall title.
# Enter parameter values.
mu <- 100
variance <- 400
stdev <- sqrt(variance)
# Set the range of X values.
x <- seq(mu - 3*stdev, mu + 3*stdev, by= 0.01)

# Calculate probability densities.
pdf <- dnorm (x, mu, stdev)
# Plot the PDF.
plot(pdf ~ x, 
     type = "l",
     xlab = "Adult Lizard Lengths (cm)",
     ylab = "Probability Density",
     main = "Probability Density Function of Adult Lizard Lengths",
     )

# Plot the cumulative distribution function of lizard lengths, again including informative axis titles and an overall title.
# Calculate probabilities.
cdf <- pnorm(x,mu,stdev)
# Plot the CDF.
plot(cdf~x,
     type ="l",
     xlab = "Adult Lizard Length (cm)",
     ylab = "Cumulative Probability",
     main = "Cumulative Distribution Function of Lizard Lengths",
     )

Question 7

Write another script to answer the following questions about what you expect to find when you measure Kangaroo island lizards, assuming that they follow the same probability distribution as mainland lizards. Inspect the plots you made in 1 to check if your answers make sense.

#   What is the probability density for a length of 75 cm?
0 # the width here is zero, so i dont believe there is a script to be run when the multiplier is zero / since area under curve is zero? only intervals will have probability greater than zero. 
## [1] 0
#   What is the probability that a lizard will be...
# Less than or equal to 75 cm?
pnorm (75,mu,stdev) # 0.106 - this makes sense, since it is the area under the CDF from that point on 
## [1] 0.1056498
#   Greater than 120 cm?
1 - pnorm(120,mu,stdev) #0.159; this makes sense, since in the CDF, 120 is above 0.8 on the y-axis. 
## [1] 0.1586553
#   Between 95 and 115cm?
pnorm(115,mu,stdev) - pnorm(95,mu,stdev) # 0.372; this makes sense this is the bulk of the measurements and clse the mean of 100
## [1] 0.372079
#   At least 40 cm different from the mean, in either direction?
pnorm(mu-40,mu,stdev) #0.0227
## [1] 0.02275013
1-pnorm(mu+40,mu,stdev) #0.0227 - this was a clever question to remind us of the 2-sigma/95% rule, since the stdev was 20 and 40 is two sigma; pnorm is used when looking at values below mean, 1 - pnorm is used when looking at values above mean; since symmetric and normally distributed, ~ 5% is present total outside of the 2-sigma range, which is shown by these calculations. 
## [1] 0.02275013
#   Further than 0.7 standard deviations below the mean?
#can just use Z because it is a standard normal distribution for pnorm
pnorm(-0.7) #0.242; this was the same as when i calculated it using pnorm mu-0.7*stdev with mu and stdev define. 
## [1] 0.2419637
# Closer than 1.3 standard deviations to the mean?
pnorm(1.3) - pnorm(0) #0.4032; for a range, subtract the two items; for stdev, can just use Z for standard normal distribution
## [1] 0.4031995
#   Further than 1.5 standard deviations from the mean?
1-pnorm(1.5) #0.0668 (this makes sense, as you are approaching 2-sigma/95%)
## [1] 0.0668072
#   What are the quartiles of the distribution? That is, what are the 0.25, 0.5, and 0.75 quantiles?
qnorm(0.25) #-0.674 standard deviations
## [1] -0.6744898
qnorm(0.5) #0 standard deviations (this makes sense for a standard normal distribution)
## [1] 0
qnorm(0.75) #0.675 standard deviations (equivalent magnitude to 0.25, which makes sense for standard normal distribution)
## [1] 0.6744898
#   2/3 of observations are expected to lie below what value?
qnorm(2/3,mu,stdev)# 108.6145 cm
## [1] 108.6145
qnorm(2/3) #0.431 standard deviations above mean, which is 100 + 8.62 = 108.62
## [1] 0.4307273
#   80% of observations are expected to lie above what value?
#if 80 percent are above, 20 percent are below
qnorm(0.2,mu,stdev) #83.16758 cm
## [1] 83.16758
qnorm(0.2) #-0.8416212*stdev(20)=-16.83+100=83.16758 cm
## [1] -0.8416212

Question 8

Britain’s domestic intelligence service MI5 places an upper limit on the height of its spies, on the assumption that people who are too tall do not blend in well with the crowd. To be a spy, men must be no taller than 180 cm (~5 feet 11 inches) and women no taller than 173 cm (~5 feet 8 inches). Write a script to answer the questions in the code chunk below.

# If the mean height of British men is 177 cm, with a standard deviation of 7.1 cm, what proportion of British men are excluded from being spies by this height restriction? Assume that height follows a normal distribution.
MImu <- 177
MIstdev <- 7.1
rm(MIvariance)
## Warning in rm(MIvariance): object 'MIvariance' not found
1-pnorm(180,MImu,MIstdev) #33.63172% are excluded
## [1] 0.3363172
MIx <- seq(MImu - 3*MIstdev, MImu + 3*MIstdev, by = 0.01)
MIpdf <- dnorm(MIx, MImu, MIstdev)
plot(MIpdf~MIx,
     type ="l",
     xlab ='height(cm)',
     ylab ='probability density',
     )

MIcdf <- pnorm(MIx, MImu, MIstdev)
plot(MIcdf~MIx,
     type = 'l',
     xlab = 'height(cm)',
     ylab = 'cumulative probability',
     )

# The mean height of British women is 163.3 cm, with a standard deviation of 6.4 cm. Assuming a normal distribution of female height, what fraction of women meet MI5’s height standard?

MIWmu <- 163.3
rm(MiWvariance)
## Warning in rm(MiWvariance): object 'MiWvariance' not found
MIWstdev <- 6.4
pnorm(173,MIWmu,MIWstdev) #93.51929% of women meet the standard. 
## [1] 0.9351929
MIWx <-seq(MIWmu -3*MIWstdev, MIWmu +3*MIWstdev, by = 0.01)
MIWpdf <- dnorm(MIWx, MIWmu, MIWstdev)
plot(MIWpdf~MIWx,
        type='l',
        xlab = 'height(cm)',
        ylab = 'probability density',
        )

# Imagine that MI5 wants to change its maximum height for female spies. Its goal is to exclude the same proportion of women as men. What should the new maximum height for women be?

qnorm((1-0.3363172),MIWmu,MIWstdev) #166.0042 would exclude 33.6%
## [1] 166.0042
# Sean Connery, the original James Bond, was 183 cm tall. By how many standard deviations did he exceed the height limit for spies?

(183 - 180)/MIstdev #0.4225352 standard deviations. Height limit was 180; he was 183. Height limit is not a function of the distribution, so it didn't make sense to answer in terms of another calculation /function. 
## [1] 0.4225352