May 29, 2015
All Reed freshmen are required to take Humanities 110.
Only prereq was intro stats. No serious programming experience necessary.
18 students, mostly juniors and seniors.
| Major | Count |
|---|---|
| Mathematics | 4 |
| Biological Science: Biology & Biochem and Molecular Biology | 4 |
| Other Science: Chemistry, Environmental Studies, Physics | 4 |
| Social Science: Political Science, Sociology | 2 |
| Economics | 2 |
| Misc: Psychology, Linguistics | 2 |
- dplyr package for data wrangling/manipulation
- ggplot2 package for data visualization

A statistical graphic consists of a mapping of data variables to aesthetic attributes of geometric objects that we can observe.
Minard's map of Napoleon's Russian campaign of 1812:
| Data (Variable) | Geometric Object | Aesthetic Attribute of Geo Obj |
|---|---|---|
| longitude | points | x position |
| latitude | points | y position |
| army size | bars | width |
| army direction | bars | color |
| date | text | (x,y) position |
| temperature | lines | (x,y) position |
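
The mapping in the table above can be sketched in ggplot2. This is a minimal illustration, not a reproduction of Minard's map: the `troops` data frame and every value in it are hypothetical, and the `linewidth` aesthetic assumes ggplot2 3.4 or later.

```r
library(ggplot2)

# Hypothetical toy values for illustration only; NOT Minard's actual figures.
troops <- data.frame(
  longitude = c(24, 28, 32, 37, 32, 28, 24),
  latitude  = c(55.0, 54.8, 54.9, 55.7, 54.6, 54.2, 54.4),
  army_size = c(400, 300, 200, 100, 80, 40, 10),
  direction = c("advance", "advance", "advance", "advance",
                "retreat", "retreat", "retreat")
)

# Each data variable is mapped to one aesthetic attribute of a geometric object:
#   longitude -> x position, latitude -> y position,
#   army_size -> width of the path, direction -> colour of the path.
p <- ggplot(troops, aes(x = longitude, y = latitude)) +
  geom_path(
    aes(linewidth = army_size, colour = direction, group = direction),
    lineend = "round"
  )
```

Printing `p` draws the plot. Swapping `geom_path()` for another geom changes the geometric object while the data-to-aesthetic mapping stays the same.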
All 222,540 songs played on the Reed pool hall jukebox from 2003 to 2009, c/o Noah Pepper '09
| date_time | artist | album | track |
|---|---|---|---|
| Sun Dec 7 05:12:57 2003 | Tom Petty and the Heartbreakers | Into the Great Wide Open | |
| Sun Dec 7 05:15:56 2003 | Jefferson Airplane | Somebody To Love | |
| Sun Dec 7 05:23:04 2003 | Led Zeppelin | Led Zeppelin IV | 08 When The Levee Breaks |
quandl.com is a great source for economic data
"A web application framework for R. Turn your analyses into interactive web applications. No HTML, CSS, or JavaScript knowledge required."
Features
- %>% command, pronounced "THEN"

Info on all domestic flights leaving Houston (IAH) in 2011:

- flights: info on 227,496 flights
- planes: info on 2,853 airplanes

What are the top 5 carriers using the oldest planes (averaged over all flights)?
The flights dataset:
| date | dep | arr | carrier | flight | dest | plane |
|---|---|---|---|---|---|---|
| 2011-01-01 | 1400 | 1500 | AA | 428 | DFW | N576AA |
| 2011-01-02 | 1401 | 1501 | AA | 428 | DFW | N557AA |
| 2011-01-03 | 1352 | 1502 | AA | 428 | DFW | N541AA |
| 2011-01-04 | 1403 | 1513 | AA | 428 | DFW | N403AA |
| 2011-01-05 | 1405 | 1507 | AA | 428 | DFW | N492AA |
The planes dataset:
| plane | year | model | mfr | no.seats |
|---|---|---|---|---|
| N576AA | 1991 | DC-9-82(MD-82) | MCDONNELL DOUGLAS | 172 |
| N557AA | 1993 | KITFOX IV | MARZ BARRY | 2 |
| N403AA | 1974 | S55A | RAVEN | 1 |
| N492AA | 1989 | DC-9-82(MD-82) | MCDONNELL DOUGLAS | 172 |
| N262AA | 1985 | DC-9-82(MD-82) | MCDONNELL DOUGLAS | 172 |
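
To see how the two tables relate, here is a small sketch that joins them on the shared `plane` column. It uses only the sample rows shown above (a subset of the columns), not the full datasets, so the result is illustrative only.

```r
library(dplyr)

# Sample rows copied from the flights and planes tables above.
flights <- data.frame(
  date    = c("2011-01-01", "2011-01-02", "2011-01-03",
              "2011-01-04", "2011-01-05"),
  carrier = "AA",
  plane   = c("N576AA", "N557AA", "N541AA", "N403AA", "N492AA"),
  stringsAsFactors = FALSE
)
planes <- data.frame(
  plane = c("N576AA", "N557AA", "N403AA", "N492AA", "N262AA"),
  year  = c(1991, 1993, 1974, 1989, 1985),
  stringsAsFactors = FALSE
)

# left_join keeps every row of flights and looks up the matching plane.
# N541AA has no match in this planes sample, so its year becomes NA;
# N262AA never flew in this flights sample, so it is dropped.
joined <- left_join(flights, planes, by = "plane")
```

Because a left join keeps unmatched flights with a missing `year`, any later average of plane age has to decide how to treat those `NA` values.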
The following sequence of verbs wrangles/manipulates the data:

    left_join(flights, planes, by='plane') %>%
      select(carrier, plane, year) %>%
      mutate(age = 2011 - year) %>%
      group_by(carrier) %>%
      summarise(avg_age = mean(age)) %>%
      arrange(desc(avg_age)) %>%
      top_n(5)

| carrier | avg_age |
|---|---|
| MQ | 29.421 |
| AA | 24.325 |
| DL | 20.760 |
| US | 19.078 |
| UA | 14.635 |
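
The sketch below shows why `%>%` is read as "THEN": the piped form and the nested form compute the same thing, but the piped form reads in the order the steps happen. It runs on the five sample AA flights from the tables above, already joined to their plane years, so it does not reproduce the full-data averages. The `na.rm = TRUE` argument is an assumption added here because one sample plane has no recorded year.

```r
library(dplyr)

# Sample flights with plane build years, taken from the tables above.
# N541AA has no row in the planes sample, hence the NA.
joined <- data.frame(
  carrier = "AA",
  plane   = c("N576AA", "N557AA", "N541AA", "N403AA", "N492AA"),
  year    = c(1991, 1993, NA, 1974, 1989),
  stringsAsFactors = FALSE
)

# Piped: take joined, THEN compute age, THEN group, THEN average, ...
piped <- joined %>%
  mutate(age = 2011 - year) %>%
  group_by(carrier) %>%
  summarise(avg_age = mean(age, na.rm = TRUE)) %>%
  arrange(desc(avg_age)) %>%
  top_n(5, avg_age)

# Nested: the same calls, read inside-out.
nested <- top_n(
  arrange(
    summarise(
      group_by(mutate(joined, age = 2011 - year), carrier),
      avg_age = mean(age, na.rm = TRUE)
    ),
    desc(avg_age)
  ),
  5, avg_age
)
```

On these sample rows both forms give a single row for AA with `avg_age` of 24.25, the mean of ages 20, 18, 37, and 22. In current dplyr releases `top_n()` is superseded by `slice_max()`, though it still works.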