Elements of Programming: Loops, Functions & Conditionals

Heike Hofmann
Stat 579, Fall 2013

Homework

No homework this week - but make sure to look at the practice exams!

Please submit Homework #4 online to the Blackboard system.

Grading Scheme C/S/O - each area gives 3.5 points, you could potentially over-score

Best answers for homework #3 are now online

Outline

Functions
Conditional Clauses
(Loops)

Functions in R

Have been using functions a lot, now we want to write them ourselves!
Idea: avoid repetitive coding (errors will creep in, DRY principle)
Instead: extract common core, wrap it in a function, make it reusable

Basic Structure of a function

Name
Input arguments
- names
- default values
Body
Output values

A first function

mymean <- function(x) {
  return(sum(x)/length(x))
}

mymean(1:15)

[1] 8

mymean(c(1:15, NA))

[1] NA

We would like to be able to specify na.rm=TRUE to be able to ignore missing values

First extension

mymean <- function(x, na.rm) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then

  return(sum(x)/length(x))
}

mymean(1:15)

[1] 8

mymean(c(1:15, NA))

[1] NA

Conditionals

if (condition) {
  statement
} [else {   ## else is optional
  statement
}]

condition is a length one logical value, i.e. either TRUE or FALSE
use & and | to combine several conditions
! negates condition

First extension of first example

mymean <- function(x, na.rm) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then
  if (na.rm == TRUE) x <- na.omit(x)

  return(sum(x)/length(x))
}

mymean(c(1:15, NA), na.rm=TRUE)

[1] 8

But a regular call without missing values doesn't work anymore. This is what we get for mymean(1:15)

Error in na.rm == TRUE : 'na.rm' is missing

We need to either specify na.rm for each call or use a default value

Function with default value

mymean <- function(x, na.rm=TRUE) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute average then
  if (na.rm == TRUE) x <- na.omit(x)

  return(sum(x)/length(x))
}

mymean(1:15)

[1] 8

mymean(c(1:15, NA))

[1] 8

mymean(c(1:15, NA), na.rm=T)

[1] 8

Function mymean

Name: mymean
Input arguments: x, na.rm=T
- names,
- default values
Body: if(na.rm) x <- na.omit(x)
Output values: return(sum(x)/length(x))

Successful function writing:

Start simple, then extend
Test out each step of the way
Don’t try too much at once

Your Turn

Similar to what we did with mean, write a function mysd that computes the standard deviation of variable x from scratch'. Include a parameter na.rm in it.
\( L(\lambda; x) = - n\lambda + \log(\lambda) \sum_i x_i \) is the log likelihood function of a Poisson variable \( X \) with parameter \( \lambda > 0 \). Write a function loglikpois with parameters lambda and (vector) x, that computes the log likelihood value for \( \lambda \) given observations \( x \). Make sure to check that lambda is positive, return an error message (stop) if not. Plot the likelihood values for lambda between 0 and 10 for x = c(1, 3, 2, 3, 0, 1, 0, 1, 3, 3)

mysd <- function(x, na.rm=TRUE) {
  ## if na.rm is true, we would like to ignore missing values and 
  ## compute standard deviation then
  if (na.rm == TRUE) x <- na.omit(x)
  if (length(x) == 1) return(0)
  tss <- sum((x - mean(x))^2)

  return(sqrt(tss/(length(x)-1)))
}

x <- rnorm(10)
mysd(x)

[1] 1.177

## compare with 
sd(x)

[1] 1.177


mysd(c(x, NA), na.rm=T)

[1] 1.177

## compare with 
sd(x,na.rm=T)

[1] 1.177

loglikpois <- function(x, lambda) {
  stopifnot(lambda>0, all(x>=0))

  n <- length(x)
  return(- n*lambda + log(lambda)*sum(x))
}

x <- c(1, 3, 2, 3, 0, 1, 0, 1, 3, 3)
lambda <- seq(0.01, 10, by=0.01)
library(ggplot2)
qplot(lambda, loglikpois(x, lambda))

plot of chunk unnamed-chunk-7

When things go wrong

Debugging functions is an art – very much like finding errors in a proof. What you can do yourself:

check your code step by step
include print statements to check intermediate results (and assumptions)
use browser()
investigate all warnings
ask a friend to look over your code

Iterations

Want to run the same block of code multiple times:

players <- unique(baseball$id)

for (i in players) {
  player <- subset(baseball, id == i)
  avg <- with(player, mean(h/ab, na.rm=T))

  print(avg)
}

Loop or iteration

Your Turn

Re-do the first part of the homework without ddply; i.e. use for-loops to gather all the player information.