Finally some statistics...

Matthias Bannert
May 14th

Wait, some R first

Try to do the following. What is remarkable about the behavior? How can we find out what happens here?

# draw plots of two datasets
plot(AirPassengers)
plot(swiss)

Hint: What are the differences between both objects from an R perspective?

Generic Methods

plot() is a generic function: it has multiple methods, each specific to a particular class. When you call plot(x), R looks at the class of x and dispatches to the matching method.

# show a few methods of plot
# (the index is only there to keep the slide short; try it without the index,
# the positions depend on your R version and loaded packages)
methods(plot)[c(2,4,15,26)]

[1] "plot.data.frame" "plot.default"    "plot.lm"         "plot.ts"

Note: You can look at a function's source code by typing its name without the parentheses ().
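To make the dispatch idea concrete, here is a minimal sketch of a home-made generic. The name describe() and its three methods are hypothetical and exist only for this illustration; AirPassengers and swiss are the datasets used above.

```r
# Both objects ship with R's datasets package.
class(AirPassengers)  # "ts"
class(swiss)          # "data.frame"

# describe() is a hypothetical generic, made up for this example.
describe <- function(x, ...) UseMethod("describe")

# One method per class, named <generic>.<class>
describe.ts <- function(x, ...) {
  paste("time series with frequency", frequency(x))
}
describe.data.frame <- function(x, ...) {
  paste("data frame with", ncol(x), "columns")
}
# Fallback for every class without its own method
describe.default <- function(x, ...) {
  paste("object of class", class(x)[1])
}

describe(AirPassengers)  # dispatches to describe.ts
describe(swiss)          # dispatches to describe.data.frame
describe(1:10)           # falls back to describe.default

# Methods that are not exported can still be inspected:
getS3method("plot", "lm")
```

plot() works the same way: plot(AirPassengers) ends up in plot.ts, while plot(swiss) ends up in plot.data.frame.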

Applied Statistics

From scatterplots and histograms to quantiles and t-tests: Example

Recap: What We've Seen in the Examples

functions
factors (categorical variables)
apply family
…

Functions

You can define your own function by

fname <- function(arg1,arg2){
  res <- arg1+arg2
  res # the last evaluated expression is returned, same as return(res)
}

And call it by

fname(1,2) # returns 3

[1] 3
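Two further features are often handy: default argument values and an explicit early return(). The sketch below uses a hypothetical function add_scaled(), which is not part of the examples above.

```r
# add_scaled() is a made-up name; 'factor' defaults to 1 if not supplied.
add_scaled <- function(arg1, arg2, factor = 1) {
  if (!is.numeric(arg1) || !is.numeric(arg2)) {
    return(NA)  # explicit early return for non-numeric input
  }
  (arg1 + arg2) * factor  # last evaluated expression is the return value
}

r1 <- add_scaled(1, 2)               # uses the default factor = 1
r2 <- add_scaled(1, 2, factor = 10)  # overrides the default by name
r3 <- add_scaled("a", 2)             # triggers the early return
```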

Factors

load("../data/mlb.RData")
# position is a factor, i.e. a categorical var.
class(mlb$position)

[1] "factor"

levels(mlb$position) # reference level is usually the first level

[1] "Catcher"           "Designated_Hitter" "First_Baseman"    
[4] "Outfielder"        "Relief_Pitcher"    "Second_Baseman"   
[7] "Shortstop"         "Starting_Pitcher"  "Third_Baseman"

levels(relevel(mlb$position,"Starting_Pitcher"))

[1] "Starting_Pitcher"  "Catcher"           "Designated_Hitter"
[4] "First_Baseman"     "Outfielder"        "Relief_Pitcher"   
[7] "Second_Baseman"    "Shortstop"         "Third_Baseman"
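If you do not have mlb.RData at hand, the same behavior can be reproduced with a small made-up vector. The values below are invented for illustration and are not taken from the mlb data.

```r
# A toy character vector, converted to a factor
position <- factor(c("Catcher", "Shortstop", "Catcher", "Outfielder"))

levels(position)  # levels are sorted alphabetically by default
table(position)   # counts per level

# Make "Shortstop" the reference (first) level; the other levels keep their order
position2 <- relevel(position, ref = "Shortstop")
levels(position2)

# The underlying data are unchanged, only the level order differs
as.character(position2)
```

The reference level matters in regression models, where the coefficients of the other levels are interpreted relative to it.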

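Apply Family

lapply() applies a function to each element of a list or vector and returns a list. sapply() does the same, but simplifies the result to a vector or matrix where possible.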

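# do not run: somelist and do_something() are placeholders
# lapply() always returns a list
lapply(somelist, function(element_of_list) do_something(element_of_list))

# sapply() simplifies the result to a vector or matrix where possible
sapply(somelist, function(element_of_list) do_something(element_of_list))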
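A runnable version of the pattern above might look as follows. The list contents are made up, and mean() stands in for do_something().

```r
# A small named list with elements of different lengths (invented values)
somelist <- list(a = 1:3, b = c(10, 20), c = 5)

# lapply() returns a list with one result per element
res_list <- lapply(somelist, function(element_of_list) mean(element_of_list))
str(res_list)

# sapply() simplifies the same results to a named numeric vector
res_vec <- sapply(somelist, function(element_of_list) mean(element_of_list))
res_vec

# vapply() is a stricter variant: you state the expected type and length
res_strict <- vapply(somelist, mean, FUN.VALUE = numeric(1))
```

For a plain function such as mean() the anonymous wrapper is optional: sapply(somelist, mean) gives the same result.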

Your Questions

What did you consider useful?
Are there any applications you want to see?
Are there any datasets that you want to explore, e.g. Google Trends, Twitter, seminar papers, datasets from psychology?