Elements of Programming: Loops, Functions & Conditionals

Heike Hofmann
Stat 579, Fall 2013

Homework

No homework this week - but make sure to look at the practice exams!

Please submit Homework #4 online to the Blackboard system.

Grading Scheme C/S/O - each area gives 3.5 points, you could potentially over-score

Best answers for homework #3 are now online

Outline

  • Functions
  • Conditional Clauses
  • (Loops)

Functions in R

  • Have been using functions a lot, now we want to write them ourselves!
  • Idea: avoid repetitive coding (errors will creep in, DRY principle)
  • Instead: extract common core, wrap it in a function, make it reusable

Basic Structure of a function

  • Name
  • Input arguments
    • names
    • default values
  • Body
  • Output values

A first function

mymean <- function(x) {
  return(sum(x)/length(x))
}

mymean(1:15)
[1] 8
mymean(c(1:15, NA))
[1] NA

We would like to be able to specify na.rm=TRUE to be able to ignore missing values

First extension

mymean <- function(x, na.rm) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then

  return(sum(x)/length(x))
}

mymean(1:15)
[1] 8
mymean(c(1:15, NA))
[1] NA

Conditionals

if (condition) {
  statement
} [else {   ## else is optional
  statement
}]
  • condition is a length one logical value, i.e. either TRUE or FALSE
  • use & and | to combine several conditions
  • ! negates condition

First extension of first example

mymean <- function(x, na.rm) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then
  if (na.rm == TRUE) x <- na.omit(x)

  return(sum(x)/length(x))
}
mymean(c(1:15, NA), na.rm=TRUE)
[1] 8

But a regular call without missing values doesn't work anymore. This is what we get for mymean(1:15)

Error in na.rm == TRUE : 'na.rm' is missing

We need to either specify na.rm for each call or use a default value

Function with default value

mymean <- function(x, na.rm=TRUE) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then
  if (na.rm == TRUE) x <- na.omit(x)

  return(sum(x)/length(x))
}

mymean(1:15)
[1] 8
mymean(c(1:15, NA))
[1] 8
mymean(c(1:15, NA), na.rm=T)
[1] 8

Function mymean

  • Name: mymean
  • Input arguments: x, na.rm=T
    • names,
    • default values
  • Body: if(na.rm) x <- na.omit(x)
  • Output values: return(sum(x)/length(x))

Successful function writing:

  • Start simple, then extend
  • Test out each step of the way
  • Don’t try too much at once

Your Turn

  • Similar to what we did with mean, write a function mysd that computes the standard deviation of variable x from scratch'. Include a parameter na.rm in it.

  • \( L(\lambda; x) = - n\lambda + \log(\lambda) \sum_i x_i \) is the log likelihood function of a Poisson variable \( X \) with parameter \( \lambda > 0 \). Write a function loglikpois with parameters lambda and (vector) x, that computes the log likelihood value for \( \lambda \) given observations \( x \). Make sure to check that lambda is positive, return an error message (stop) if not. Plot the likelihood values for lambda between 0 and 10 for x = c(1, 3, 2, 3, 0, 1, 0, 1, 3, 3)

mysd <- function(x, na.rm=TRUE) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute standard deviation then
  if (na.rm == TRUE) x <- na.omit(x)
  if (length(x) == 1) return(0)
  tss <- sum((x - mean(x))^2)

  return(sqrt(tss/(length(x)-1)))
}

x <- rnorm(10)
mysd(x)
[1] 1.177
## compare with 
sd(x)
[1] 1.177

mysd(c(x, NA), na.rm=T)
[1] 1.177
## compare with 
sd(x,na.rm=T)
[1] 1.177
loglikpois <- function(x, lambda) {
  stopifnot(lambda>0, all(x>=0))

  n <- length(x)
  return(- n*lambda + log(lambda)*sum(x))
}

x <- c(1, 3, 2, 3, 0, 1, 0, 1, 3, 3)
lambda <- seq(0.01, 10, by=0.01)
library(ggplot2)
qplot(lambda, loglikpois(x, lambda))

plot of chunk unnamed-chunk-7

When things go wrong

Debugging functions is an art – very much like finding errors in a proof. What you can do yourself:

  • check your code step by step
  • include print statements to check intermediate results (and assumptions)
  • use browser()
  • investigate all warnings
  • ask a friend to look over your code

Iterations

Want to run the same block of code multiple times:

players <- unique(baseball$id)

for (i in players) {
  player <- subset(baseball, id == i)
  avg <- with(player, mean(h/ab, na.rm=T))

  print(avg)
}

Loop or iteration

Your Turn

Re-do the first part of the homework without ddply; i.e. use for-loops to gather all the player information.