January 28, 2013

Questions for the students

  1. What is statistics? (Shouldn't they know if they are taking the course?) Write down their suggestions (typically, “science of data”) and then move them toward “The explanation of variation in the context of what remains unexplained.” This will remain abstract for a few weeks, but by the end of the semester it will hopefully seem very concrete.
  2. Why are you taking this course? Students will hem and haw, trying to be polite, but knowing the real answer is, “Because I'm required to take it for my major.” I'm not troubled by this. My goal for the semester is to make the students agree that the people who require stats for the major were right to do so.
  3. What statistics have you had before? Ferret out the AP students and ask them what techniques they covered. Explain that there will be very little overlap with the AP course. They will recognize some terminology, but any advantage they think they have is offset by their need to overcome the misconceptions that often stem from an AP course.

Outline of the Course

  1. Description of data. We'll start with simple things: means, standard deviations, distributions. Then we'll quickly move on to describing the relationships among multiple variables. You'll learn a formal language for constructing models that's very widely used in the natural and social sciences. You'll also learn about the importance and uses of subjectivity and incorporating expert knowledge into your models.
  2. Randomness. This is the core of a traditional statistics course, such as the AP course. But we will be using a much richer system for describing data, so you won't see this until about mid-course. (Although we'll do an example today so that you won't feel we're neglecting it.)
  3. Bringing modeling and randomness together.

What's different about this course

  1. We will be using techniques that are generally considered to be graduate level.
  2. We'll take computation very seriously. That's one of the ways that we'll be able to make graduate-level statistics accessible in a first course.
  3. The major themes. In a conventional stat course, “tests” are at the center. But for us …

Doing Some Statistics

What are grades like at Macalester?

library(mosaic)  # provided fetchData() and tally() when these notes were written
g = fetchData("grades.csv")
## Retrieving from http://www.mosaic-web.org/go/datasets/grades.csv
head(g)
##      sid grade   sessionID
## 1 S31185    D+ session1784
## 2 S31185    B+ session1785
## 3 S31185    A- session1791
## 4 S31185    B+ session1792
## 5 S31185    B- session1794
## 6 S31185    C+ session1795
tally(~grade, data = g, format = "percent")
## 
##        A       A-       AU        B       B-       B+        C       C- 
##  25.7032  23.7377   0.4744  13.7072   5.6591  17.2145   2.3382   0.8811 
##       C+        D       D-       D+       NC        S    Total 
##   2.8295   0.3050   0.1017   0.1355   0.2880   6.6249 100.0000
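
For anyone following along without the mosaic package, here is a minimal base-R sketch of the same kind of tabulation. It uses only the six rows printed by head(g) above, typed in by hand, so the percentages will not match the full data set. The name g_head is made up for this illustration.

```r
# The six rows shown by head(g), entered by hand (NOT the full grades file).
g_head <- data.frame(
  sid       = rep("S31185", 6),
  grade     = c("D+", "B+", "A-", "B+", "B-", "C+"),
  sessionID = paste0("session", c(1784, 1785, 1791, 1792, 1794, 1795))
)

# Count how often each letter grade occurs ...
grade_counts <- table(g_head$grade)

# ... and convert the counts to percentages of all rows.
grade_percent <- 100 * prop.table(grade_counts)
round(grade_percent, 1)
```

table() plus prop.table() gives the proportions in each category, which is the same idea as tally() with format = "percent", though without the Total column.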

Not every grade is the same. An obvious question: Why is this? What factors influence grades?

Smoking

Is smoking bad for you?

Death Rate per 1000 Person-Years
  Smoking.Group     Canadian  British   U.S.
1 Non-smokers          20.20    11.30  13.50
2 Cigarettes only      20.50    14.10  13.50
3 Cigars, pipes        35.50    20.70  17.70
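
To compare the groups directly, the table above can be typed into R and each row expressed as a difference from the non-smokers in the same study. This is only a sketch: the numbers are copied from the table, while the names rates and excess are made up for this illustration.

```r
# Death rates per 1000 person-years, copied from the table above.
rates <- rbind(
  "Non-smokers"     = c(Canadian = 20.2, British = 11.3, U.S. = 13.5),
  "Cigarettes only" = c(Canadian = 20.5, British = 14.1, U.S. = 13.5),
  "Cigars, pipes"   = c(Canadian = 35.5, British = 20.7, U.S. = 17.7)
)

# Within each study (column), subtract the non-smokers' rate from every row.
excess <- sweep(rates, 2, rates["Non-smokers", ], FUN = "-")
round(excess, 1)
```

Each entry of excess is the number of additional deaths per 1000 person-years relative to non-smokers in that study, before any adjustment.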

Conclusion:

A covariate …

Mean Ages (years)
  Smoking.Group     Canadian  British   U.S.
1 Non-smokers          54.90    49.10  57.00
2 Cigarettes only      50.50    49.80  53.20
3 Cigars, pipes        65.90    55.70  59.70

Age Adjustment …

Age Adjusted Death Rate per 1000 Person-Years
  Smoking.Group     Canadian  British   U.S.
1 Non-smokers          20.20    11.30  13.50
2 Cigarettes only      28.30    12.80  17.70
3 Cigars, pipes        21.20    12.00  14.20

Source: W.G. Cochran (1968) Biometrics 24(2):295-313
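
The notes do not spell out how the adjustment was done. One common way to adjust for age is to compute the death rate within age bands and then weight every group by the same reference age mix. The sketch below illustrates that idea only. All of the numbers in it are HYPOTHETICAL, invented for illustration and not taken from Cochran, and the three age bands and the choice of the non-smokers as the reference group are assumptions.

```r
# HYPOTHETICAL death rates per 1000 person-years within three age bands.
rate_by_age <- rbind(
  "Non-smokers"   = c(young = 5, middle = 15, old = 40),
  "Cigars, pipes" = c(young = 6, middle = 16, old = 42)
)

# HYPOTHETICAL share of person-years in each age band (each row sums to 1).
# The cigar and pipe group is made older on purpose.
age_share <- rbind(
  "Non-smokers"   = c(young = 0.4, middle = 0.4, old = 0.2),
  "Cigars, pipes" = c(young = 0.1, middle = 0.4, old = 0.5)
)

# Crude rate: each group's age-specific rates weighted by its OWN age mix.
crude <- rowSums(rate_by_age * age_share)

# Adjusted rate: every group weighted by the SAME reference age mix
# (assumed here to be the non-smokers'), so age no longer differs by group.
reference <- age_share["Non-smokers", ]
adjusted  <- as.vector(rate_by_age %*% reference)
names(adjusted) <- rownames(rate_by_age)

rbind(crude = crude, adjusted = adjusted)
```

With these made-up inputs the crude rates are 16 and 28, while the adjusted rates are 16 and 17.2: most of the crude gap comes from the older group simply being older. The reference group's own rate is unchanged by the adjustment.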

Are Coins Fair?

The magic coin exercise

Detecting Fraud

Coin flips
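
The notes do not record how the coin-flip exercise was run in class. As a rough sketch, a sequence of fair coin flips can be simulated in base R and summarized. The number of flips, the seed, and the choice of summaries (proportion of heads and longest streak) are all assumptions made for this illustration.

```r
set.seed(2013)  # arbitrary seed, used only to make the run repeatable

n_flips <- 100  # assumed sequence length
flips <- sample(c("H", "T"), size = n_flips, replace = TRUE)

# Proportion of heads in the simulated sequence.
prop_heads <- mean(flips == "H")

# rle() collapses the sequence into runs of identical outcomes;
# the longest run is the longest streak of heads or of tails.
runs <- rle(flips)
longest_run <- max(runs$lengths)

prop_heads
longest_run
```

Rerunning without set.seed() gives a different sequence each time, which shows how much the proportion of heads and the longest streak vary from one set of genuinely random flips to the next.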

Administrative Matters

The Pace of the Class

After giving you some time to settle back in for the semester, I will try to move pretty fast until about ¾ of the way through the semester. Then I will throttle back and go pretty easy for the last quarter of the semester.

I do this because

It's very important to remember

Grades

Components of the Grade

Will I get an A?

That depends on you. Here's my overall policy:

The typical grade in the class has been about a B+. There have not been many Cs, but they do happen.

Assignment Grading Policy

Assignments are due at midnight BEFORE class. We use an electronic system for handling assignments so that I don't have to wait until class for you to hand in your work.

Often the assignments will cover material that will be taken up in the following class. This means that, in order to do the assignment successfully, you have to learn the material on your own, or by working with your friends and classmates.

BUT … I would be out of a job if you could always do this. I recognize that there are some things that you might not be able to learn on your own with the materials that are provided. So, you are free to go back and correct your answers AFTER class, if you learned something in class that changed your mind. You'll get FULL CREDIT for such changed answers.

BUT … If that were the whole policy, many of you would wait until after class to hand in your work, and then I wouldn't know what I need to talk about during class. So, every once in a while, I will give you a grade based on your submissions before class. I won't worry about whether you have the answers right, just whether you submitted them. But I will notice if you are just randomly hitting buttons. That's easy to spot, because there is a time stamp on each submission. So, be honest.

There are lots of assignments. It doesn't matter if you miss a few or hand them in late. The main point of the assignments is to exercise and develop your skills. They are calisthenics for statistics. Do them regularly and you will be in good shape.