Finally some statistics...

Matthias Bannert
May 14th

Wait, some R first

Try to do the following. What is remarkable about the behavior? How can we find out what happens here?

# draw plots of two datasets
plot(AirPassengers)
plot(swiss)

Hint: What are the differences between both objects from an R perspective?

Generic Methods

plot() is a generic function: it has multiple methods, each of which is specific to a particular class.

# show some methods of plot
# (the index is just for presentation reasons;
# you should try it without the index)
methods(plot)[c(2,4,15,26)]
[1] "plot.data.frame" "plot.default"    "plot.lm"         "plot.ts"        

Note: You can look at a function's source code by typing its name without parentheses ().
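
A minimal sketch of how one might investigate this, using only base R: class() shows the class of each object, and getS3method() retrieves the method that plot() dispatches to for that class. The variable names ts_method and df_method are illustrative only.

```r
# The two objects differ in class, so plot() dispatches to different methods
class(AirPassengers)  # "ts"
class(swiss)          # "data.frame"

# Retrieve the class-specific methods without drawing anything
ts_method <- getS3method("plot", "ts")
df_method <- getS3method("plot", "data.frame")

# Printing a function (no parentheses) shows its source code
ts_method
```

getS3method() also finds methods that are not exported from their package, which typing the method name directly may not.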

Applied Statistics

From scatterplots and histograms to quantiles and t-tests: Example

Recap: What We've Seen in the Examples

  1. functions
  2. factors (categorical variables)
  3. apply family

Functions

You can define your own function by

fname <- function(arg1,arg2){
  res <- arg1+arg2
  res # the last evaluated expression is returned, same as return(res)
}

And call it by

fname(1,2) # returns 3
[1] 3

Factors

load("../data/mlb.RData")
# position is a factor, i.e. a categorical var.
class(mlb$position)
[1] "factor"
levels(mlb$position) # reference level is usually the first level
[1] "Catcher"           "Designated_Hitter" "First_Baseman"    
[4] "Outfielder"        "Relief_Pitcher"    "Second_Baseman"   
[7] "Shortstop"         "Starting_Pitcher"  "Third_Baseman"    
levels(relevel(mlb$position,"Starting_Pitcher"))
[1] "Starting_Pitcher"  "Catcher"           "Designated_Hitter"
[4] "First_Baseman"     "Outfielder"        "Relief_Pitcher"   
[7] "Second_Baseman"    "Shortstop"         "Third_Baseman"    
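
Since mlb.RData may not be at hand, here is a small self-contained sketch of the same idea. The vector position is a made-up stand-in for mlb$position, not the actual data.

```r
# Hypothetical stand-in for mlb$position (not the real dataset)
position <- factor(c("Shortstop", "Catcher", "Outfielder", "Catcher"))

# Levels are sorted alphabetically by default
levels(position)   # "Catcher" "Outfielder" "Shortstop"

# relevel() moves the chosen level to the front, making it the reference level
reordered <- relevel(position, ref = "Shortstop")
levels(reordered)  # "Shortstop" "Catcher" "Outfielder"

# The observations themselves are unchanged; only the level order differs
as.character(reordered)
table(position)
```

The reference level matters as soon as the factor is used in a model, because the other levels are compared against it.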

Apply Family

lapply() applies a function over a list or vector and returns a list. sapply() does the same, but simplifies the result to a vector or matrix where possible.

# do not run
lapply(somelist,function(element_of_list) do_something(element_of_list))

sapply(somelist,function(element_of_list) do_something(element_of_list))

See also: apply(), tapply(), mapply().
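
A runnable sketch of the functions named above, using a small made-up list and the built-in datasets mtcars and swiss. All object names here are illustrative.

```r
# A small example list (made up for illustration)
somelist <- list(a = 1:3, b = 4:6)

# lapply() always returns a list
as_list <- lapply(somelist, mean)    # list with a = 2, b = 5

# sapply() simplifies the result to a named vector here
as_vector <- sapply(somelist, mean)  # c(a = 2, b = 5)

# mapply() walks over several vectors in parallel
sums <- mapply(function(x, y) x + y, 1:3, 4:6)  # 5 7 9

# tapply() applies a function within groups: mean mpg per number of cylinders
mpg_by_cyl <- tapply(mtcars$mpg, mtcars$cyl, mean)

# apply() works over the rows (1) or columns (2) of a matrix
col_means <- apply(as.matrix(swiss), 2, mean)
```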

Your Questions

  • What did you consider useful?
  • Are there any applications you want to see?
  • Are there any datasets that you want to explore, e.g. Google Trends, Twitter, seminar papers, datasets from psychology?