This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:
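A minimal sketch of the R code such a chunk might contain, using R's built-in `cars` data set purely as an illustration (the choice of data set is an assumption). When the document is knit, the printed summary appears directly below the code:

```r
# cars ships with base R (datasets package): 50 rows, columns speed and dist
summary(cars)
```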

You can also embed plots, for example:

[Figure: example plot]

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.

Births in 1978

Let’s begin by looking at a dataset called Births78. This is a data frame containing the number of births in the United States for each day in 1978.

We can use the head() function to look at the first several rows of this dataset:

head(Births78)
##         date births dayofyear
## 1 1978-01-01   7701         1
## 2 1978-01-02   7527         2
## 3 1978-01-03   8825         3
## 4 1978-01-04   8859         4
## 5 1978-01-05   9043         5
## 6 1978-01-06   9208         6

Now you’ve seen your first example of a code chunk (with R code) and its corresponding output.
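head() shows six rows by default, and its `n` argument changes that number; tail() does the same for the end of a data frame, and str() summarizes each column's type. A minimal sketch, using a small stand-in data frame built from the six rows printed above rather than the full Births78 data set:

```r
# Stand-in for the first six rows of Births78, copied from the output above
births_head <- data.frame(
  date      = as.Date(c("1978-01-01", "1978-01-02", "1978-01-03",
                        "1978-01-04", "1978-01-05", "1978-01-06")),
  births    = c(7701, 7527, 8825, 8859, 9043, 9208),
  dayofyear = 1:6
)

head(births_head, n = 3)  # only the first three rows
tail(births_head, n = 2)  # only the last two rows
str(births_head)          # column names, storage types, and first values
```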

Q: What are the variables in Births78? Label each as categorical or quantitative.

SOLUTION:

head(Births78, n=4)
##         date births dayofyear
## 1 1978-01-01   7701         1
## 2 1978-01-02   7527         2
## 3 1978-01-03   8825         3
## 4 1978-01-04   8859         4
  1. Date - Categorical (here, but usually Quantitative)
  2. Births - Quantitative
  3. Day of Year - Categorical
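R's storage class for each column gives a first hint about its type, although whether a variable is treated as categorical or quantitative is ultimately an analysis choice, not something R decides. A minimal sketch, using a stand-in data frame built from the four rows printed above (in the real Births78 the numeric columns may be stored as integers):

```r
# Stand-in for head(Births78, n=4), copied from the output above
b4 <- data.frame(
  date      = as.Date(c("1978-01-01", "1978-01-02",
                        "1978-01-03", "1978-01-04")),
  births    = c(7701, 7527, 8825, 8859),
  dayofyear = 1:4
)

# Storage class of each column in this stand-in: "Date", "numeric", "integer"
sapply(b4, class)
```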

We can also look at a histogram of the births per day.

histogram(~births, data=Births78)

[Figure: histogram of births per day in 1978]

Q: In 1-2 sentences, comment on the center and shape of the histogram.

SOLUTION: The center of the histogram is around 9000 births per day. The distribution is unimodal and somewhat skewed to the left; however, the bars on the right side are generally taller than those on the left, so it may be closer to symmetric than it first appears.
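Comparing the mean and the median gives a quick numerical check on shape: in a left-skewed distribution the low tail usually pulls the mean below the median. A minimal sketch, using only the first 28 days of 1978 (copied from the Births78 rows listed in this document) as stand-in data rather than the full year:

```r
# Stand-in data: births for 1-28 January 1978, one row of seven days per line
births <- c(7701, 7527, 8825, 8859, 9043, 9208, 8084,
            7611, 9172, 9089, 9210, 9259, 9138, 8299,
            7771, 9458, 9339, 9120, 9226, 9305, 7954,
            7560, 9252, 9416, 9090, 9387, 8983, 7946)

mean(births)    # sensitive to the cluster of low values
median(births)  # middle value, less affected by the low tail
```

On the full data set, the same check would compare `mean(Births78$births)` with `median(Births78$births)`.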


Finally, consider a time plot of the number of births over the entire year, ordered by day.

xyplot(births ~ dayofyear, data=Births78)

[Figure: time plot of births versus day of year, 1978]
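Joining consecutive days with a line can make a repeating pattern in a time plot easier to see. A minimal sketch with lattice (the package that provides xyplot()), using the first 28 days listed in this document as stand-in data; `type = c("p", "l")` asks lattice to draw both points and lines:

```r
library(lattice)

# Stand-in data: the first 28 days of Births78, copied from the rows listed in this document
jan <- data.frame(
  dayofyear = 1:28,
  births    = c(7701, 7527, 8825, 8859, 9043, 9208, 8084,
                7611, 9172, 9089, 9210, 9259, 9138, 8299,
                7771, 9458, 9339, 9120, 9226, 9305, 7954,
                7560, 9252, 9416, 9090, 9387, 8983, 7946)
)

# Points plus connecting lines, drawn in day order
p <- xyplot(births ~ dayofyear, data = jan, type = c("p", "l"))
print(p)
```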

head(Births78, n=100)
##           date births dayofyear
## 1   1978-01-01   7701         1
## 2   1978-01-02   7527         2
## 3   1978-01-03   8825         3
## 4   1978-01-04   8859         4
## 5   1978-01-05   9043         5
## 6   1978-01-06   9208         6
## 7   1978-01-07   8084         7
## 8   1978-01-08   7611         8
## 9   1978-01-09   9172         9
## 10  1978-01-10   9089        10
## 11  1978-01-11   9210        11
## 12  1978-01-12   9259        12
## 13  1978-01-13   9138        13
## 14  1978-01-14   8299        14
## 15  1978-01-15   7771        15
## 16  1978-01-16   9458        16
## 17  1978-01-17   9339        17
## 18  1978-01-18   9120        18
## 19  1978-01-19   9226        19
## 20  1978-01-20   9305        20
## 21  1978-01-21   7954        21
## 22  1978-01-22   7560        22
## 23  1978-01-23   9252        23
## 24  1978-01-24   9416        24
## 25  1978-01-25   9090        25
## 26  1978-01-26   9387        26
## 27  1978-01-27   8983        27
## 28  1978-01-28   7946        28
## 29  1978-01-29   7527        29
## 30  1978-01-30   9184        30
## 31  1978-01-31   9152        31
## 32  1978-02-01   9159        32
## 33  1978-02-02   9218        33
## 34  1978-02-03   9167        34
## 35  1978-02-04   8065        35
## 36  1978-02-05   7804        36
## 37  1978-02-06   9225        37
## 38  1978-02-07   9328        38
## 39  1978-02-08   9139        39
## 40  1978-02-09   9247        40
## 41  1978-02-10   9527        41
## 42  1978-02-11   8144        42
## 43  1978-02-12   7950        43
## 44  1978-02-13   8966        44
## 45  1978-02-14   9859        45
## 46  1978-02-15   9285        46
## 47  1978-02-16   9103        47
## 48  1978-02-17   9238        48
## 49  1978-02-18   8167        49
## 50  1978-02-19   7695        50
## 51  1978-02-20   9021        51
## 52  1978-02-21   9252        52
## 53  1978-02-22   9335        53
## 54  1978-02-23   9268        54
## 55  1978-02-24   9552        55
## 56  1978-02-25   8313        56
## 57  1978-02-26   7881        57
## 58  1978-02-27   9262        58
## 59  1978-02-28   9705        59
## 60  1978-03-01   9132        60
## 61  1978-03-02   9304        61
## 62  1978-03-03   9431        62
## 63  1978-03-04   8008        63
## 64  1978-03-05   7791        64
## 65  1978-03-06   9294        65
## 66  1978-03-07   9573        66
## 67  1978-03-08   9212        67
## 68  1978-03-09   9218        68
## 69  1978-03-10   9583        69
## 70  1978-03-11   8144        70
## 71  1978-03-12   7870        71
## 72  1978-03-13   9022        72
## 73  1978-03-14   9525        73
## 74  1978-03-15   9284        74
## 75  1978-03-16   9327        75
## 76  1978-03-17   9480        76
## 77  1978-03-18   7965        77
## 78  1978-03-19   7729        78
## 79  1978-03-20   9135        79
## 80  1978-03-21   9663        80
## 81  1978-03-22   9307        81
## 82  1978-03-23   9159        82
## 83  1978-03-24   9157        83
## 84  1978-03-25   7874        84
## 85  1978-03-26   7589        85
## 86  1978-03-27   9100        86
## 87  1978-03-28   9293        87
## 88  1978-03-29   9195        88
## 89  1978-03-30   8902        89
## 90  1978-03-31   9318        90
## 91  1978-04-01   8069        91
## 92  1978-04-02   7691        92
## 93  1978-04-03   9114        93
## 94  1978-04-04   9439        94
## 95  1978-04-05   8852        95
## 96  1978-04-06   8969        96
## 97  1978-04-07   9077        97
## 98  1978-04-08   7890        98
## 99  1978-04-09   7445        99
## 100 1978-04-10   8870       100

Bonus Q: Why do there appear to be two distinct groups of points in the plot?

SOLUTION: (optional) The two groups have matching shapes: when births rise during one part of the year, the rise shows up in both groups. It is more likely, then, that something else is consistently affecting the data; for instance, the true number of births may have varied much less than this graph shows, and perhaps about 1000 more or fewer births were recorded on some days than on the day before.
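Another explanation worth checking is the day of the week: the rows listed above show noticeably lower counts on 1, 7, 8, 14 and 15 January, and 1 January 1978 was a Sunday. A minimal sketch of that check, using only the first 28 days (copied from those rows) as stand-in data rather than the full year; it labels each date as a weekday or a weekend day, compares mean births, and colors the time plot by group. A point such as 2 January (a Monday, plausibly the observed New Year's holiday) also falls in the lower group, so not every low day is a weekend.

```r
library(lattice)

# Stand-in data: 1-28 January 1978, copied from the Births78 rows above
jan <- data.frame(
  date   = seq(as.Date("1978-01-01"), by = "day", length.out = 28),
  births = c(7701, 7527, 8825, 8859, 9043, 9208, 8084,
             7611, 9172, 9089, 9210, 9259, 9138, 8299,
             7771, 9458, 9339, 9120, 9226, 9305, 7954,
             7560, 9252, 9416, 9090, 9387, 8983, 7946)
)
jan$dayofyear <- 1:28

# POSIXlt$wday is 0 for Sunday and 6 for Saturday, regardless of locale
wday <- as.POSIXlt(jan$date)$wday
jan$daytype <- ifelse(wday %in% c(0, 6), "weekend", "weekday")

# Mean births per day for each group
group_means <- tapply(jan$births, jan$daytype, mean)
group_means

# Time plot with weekdays and weekend days in different colors
print(xyplot(births ~ dayofyear, data = jan, groups = daytype,
             auto.key = TRUE))
```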