Statistical Modeling Notes: Day 2

Organization of data into tables: Variables and Cases

Creating their own spreadsheet

Do the introductions!

One person from each group introduces another person from that group.

Bureaucracy

Starting R

Starting up

Log in to RStudio and make sure the “mosaic” box on the “Packages” tab is checked. You'll probably need to do this only once for the semester, but if you happen to restart R, you'll need to do it again.

Basic Syntax

Arithmetic

Arithmetic operators and grouping: + - * / ^ ( )

2 + 7
## [1] 9
2 * 7
## [1] 14
2^7
## [1] 128
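
The examples above use one operator at a time. As a small sketch of how the operators combine, the lines below show that ^ is applied before * and /, which are applied before + and -, and that parentheses override that order. The numbers are arbitrary illustrations.

```r
2 + 7 * 3      # multiplication happens first: 2 + 21
## [1] 23
(2 + 7) * 3    # parentheses force the addition first: 9 * 3
## [1] 27
-2^2           # ^ is applied before the minus sign
## [1] -4
(-2)^2         # parentheses make the base negative
## [1] 4
```

When in doubt about the order, adding parentheses costs nothing and makes the intent clear.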

Functions: name( argument1, argument2, ... )

sqrt(4)
## [1] 2
sin(2)
## [1] 0.9093

Named arguments: When there is more than one argument, it's convenient to be able to refer to them by their names rather than by their positions.

seq(from = 1, to = 10, by = 2)
## [1] 1 3 5 7 9
seq(from = 1, to = 10, length = 5)
## [1]  1.00  3.25  5.50  7.75 10.00

Sometimes you'll mix named and unnamed arguments. The typical pattern is that you give the first one or two arguments by position and refer to the remaining ones by name. The basic principle is that things should be unambiguous.

seq(1, 10, by = 2)
## [1] 1 3 5 7 9
seq(10, from = 1, length = 2)
## [1]  1 10

But better not to do this.

seq(from = 1, to = 10, 2)
## [1] 1 3 5 7 9

Is the third argument length= or by=? You don't really know until you try. So use the name!
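
To make the point about names concrete, here is a short sketch showing that named arguments can be given in any order, while unnamed ones are matched strictly by position (for seq(), that order is from, to, by). It also spells out length.out=, the full name of the argument that the earlier examples abbreviate as length=.

```r
seq(from = 1, to = 10, by = 3)
## [1]  1  4  7 10
seq(by = 3, to = 10, from = 1)    # same names, different order, same result
## [1]  1  4  7 10
seq(1, 10, 3)                     # positional: from, to, by
## [1]  1  4  7 10
seq(from = 1, to = 10, length.out = 4)   # full argument name
## [1]  1  4  7 10
```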

Chaining

You can use the output of one computation as an input to the next calculation:

sqrt((sqrt(9))^2)
## [1] 3

QUESTIONS:

Assignment

You can store data and results for later use. You refer to the stored object by a name. The name should begin with a letter and not have any punctuation other than . or _. Best to keep your names short, but not so short that you can't remember what's stored where.

Here's a pure math example (since we haven't learned any statistics yet!)

legx = 7
legy = 9
hypot = sqrt(legx^2 + legy^2)
legx/hypot  # Is this a sine or a cosine?
## [1] 0.6139
ang = atan2(legx, legy)  # the angle in radians
sin(ang)
## [1] 0.6139
cos(ang)
## [1] 0.7894

For those interested, this technique of writing statements with quantities referred to by name is called abstraction. Just by changing the values of legx and legy, we can repeat the calculation for a different right triangle. So the calculations can be thought of as representing the general properties of a right triangle, rather than a particular calculation for one triangle.
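
As an illustration of that idea, here are the same statements run again with only the two leg lengths changed. The values 3 and 4 are chosen for illustration because they give the familiar 3-4-5 right triangle.

```r
legx = 3
legy = 4
hypot = sqrt(legx^2 + legy^2)   # same formula as before, new inputs
hypot
## [1] 5
legx/hypot
## [1] 0.6
ang = atan2(legx, legy)         # the angle in radians
sin(ang)
## [1] 0.6
```

Nothing after the first two lines had to be rewritten, which is the practical payoff of referring to quantities by name.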

Getting Data

Typically you'll use the function fetchData(). It takes as an argument the name of the dataset you want to read.

kids = fetchData("KidsFeet")
## Data KidsFeet found in package.

These data sets come from a web site set up specifically for Project MOSAIC. This allows us to access textbook data with a very concise command.

You can also read in your own data. Store it as a CSV file on your computer, upload it to RStudio, and then use the command

mydata = fetchData()  # no argument, but you still need parens.

The object that is created, which I've stored under the name mydata, is generically called a “data frame.”
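
If you want to see the CSV-to-data-frame step without uploading anything, the following sketch uses base R's write.csv() and read.csv() rather than fetchData(). The table toy and its values are made up for illustration, and the temporary file stands in for a CSV file you created yourself.

```r
# A tiny made-up table: each row is a case, each column is a variable
toy = data.frame(name = c("Ann", "Bo", "Cy"),
                 length = c(24.1, 25.3, 23.8),
                 sex = c("G", "B", "B"))

# Write it to a temporary CSV file (a stand-in for your own file)
path = tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the CSV back in; the result is a data frame
mydata = read.csv(path)
class(mydata)
## [1] "data.frame"
nrow(mydata)
## [1] 3
```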

Basic operations on data in data frames

names(kids)
## [1] "name"       "birthmonth" "birthyear"  "length"     "width"     
## [6] "sex"        "biggerfoot" "domhand"   
nrow(kids)
## [1] 39
mean(length, data = kids)  # Note: just one variable is named
## [1] 24.72
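
The data= form of mean() shown above relies on the mosaic package. As a rough base-R counterpart, assuming a small made-up data frame called toy in place of kids, the same kind of calculation can be sketched like this:

```r
# Made-up values for illustration only; not the KidsFeet data
toy = data.frame(length = c(24.1, 25.3, 23.8, 24.8),
                 sex = c("G", "B", "B", "G"))

names(toy)
nrow(toy)
## [1] 4
mean(toy$length)          # $ pulls one variable out of the data frame
## [1] 24.5
with(toy, mean(length))   # evaluate mean(length) using toy's variables
## [1] 24.5
```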

The tilde punctuation

We'll use the “tilde” punctuation throughout the semester. It can be interpreted in several different but related ways:

Examples:

mean(length ~ sex, data = kids)
##     B     G 
## 25.11 24.32 
median(length ~ sex, data = kids)
##     B     G 
## 24.95 24.20 
xyplot(width ~ length, data = kids)
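
The tilde is not specific to the mosaic functions. As a sketch of the same "length broken down by sex" idea in base R, the aggregate() function also accepts a formula. The data frame toy below is made up for illustration and is not the KidsFeet data.

```r
# Made-up values for illustration only
toy = data.frame(length = c(24.1, 25.3, 23.8, 24.8),
                 sex = c("G", "B", "B", "G"))

# Read the formula as "length, broken down by sex"
by_sex = aggregate(length ~ sex, data = toy, FUN = mean)
by_sex   # one row per group: mean length is 24.55 for B and 24.45 for G
```

The result comes back as a small data frame with one row per group rather than as a named vector, but the formula is read the same way.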

Going a little bit further than we need to, just for fun:

xyplot(width ~ length | sex, data = kids)

xyplot(width ~ length, groups = sex, data = kids)

And some statistical graphics:

histogram(~width, data = kids)

densityplot(~width, groups = sex, data = kids)

bwplot(~width, data = kids)

bwplot(width ~ sex, data = kids)
