transform-birth-rate-tidyr.R

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit HTML button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document.

In this document we will work with the birth-rate.csv data file.

First, read in the wide-format birth-rate.csv file and make a stem-and-leaf plot of the 2008 birth rates.

birth <- read.csv("http://datasets.flowingdata.com/birth-rate.csv")

stem(birth$X2008)

## 
##   The decimal point is at the |
## 
##    8 | 2371334468999
##   10 | 01223455566999001222334555777889
##   12 | 00011111356789993789
##   14 | 0034566788991237
##   16 | 227779123677889
##   18 | 00233677888900448
##   20 | 0024445688912455679
##   22 | 0057834579
##   24 | 11456677771347
##   26 | 31335667
##   28 | 014999
##   30 | 124234
##   32 | 1449069
##   34 | 556049
##   36 | 8890
##   38 | 023455823468
##   40 | 23125
##   42 | 699
##   44 | 17
##   46 | 252
##   48 | 
##   50 | 
##   52 | 5

Next make some histograms.

hist(birth$X2008)

hist(birth$X2008,breaks=5)

hist(birth$X2008,breaks=20)
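A single number passed to `breaks` is only a suggestion: `hist()` rounds it to "pretty" cut points, so the number of bars drawn may differ from the number requested. The sketch below uses a small made-up vector (not the birth-rate data) to contrast a suggested bin count with an explicit vector of break points, which `hist()` uses exactly.

```r
# Hypothetical values chosen for illustration; not taken from birth-rate.csv.
x <- c(8.2, 10.1, 12.5, 14.3, 18.0, 20.4, 22.7, 26.3, 31.2, 38.5, 46.2, 52.5)

# A single number is a suggestion; hist() picks "pretty" break points near it.
h_suggested <- hist(x, breaks = 5, plot = FALSE)

# A vector of break points is used as given: bins (5,15], (15,25], ..., (45,55].
h_exact <- hist(x, breaks = seq(5, 55, by = 10), plot = FALSE)

h_suggested$breaks   # the break points hist() actually chose
h_exact$counts       # number of values falling in each explicit bin
```

With `plot = FALSE` the function returns the bin edges and counts without drawing anything, which is a convenient way to check what a given `breaks` setting does before plotting.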

Second, the missing values are removed from the 2008 data, and the remaining values are used to make a density plot.

birth2008 <- birth$X2008[!is.na(birth$X2008)]
head(birth2008)

## [1] 11.71600 46.53800 42.87500 14.64900 13.28100 26.32405

d2008 <- density(birth2008)

# d2008$x
# d2008$y

d2008frame <- data.frame(d2008$x,d2008$y)

plot(d2008)

plot(d2008, type="n")
polygon(d2008, col="#821122", border="#cccccc")
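To make the structure of the density object more concrete, here is a minimal sketch on a made-up vector (the values and the names `rates`, `dens_df`, and `area` are assumptions for illustration). It shows that `density()` returns paired `x` and `y` vectors, 512 points each by default, and that naming the columns explicitly gives a tidier data frame than `data.frame(d$x, d$y)`.

```r
# Hypothetical birth rates with two missing values; not the real data.
rates <- c(11.7, NA, 46.5, 42.9, 14.6, 13.3, 26.3, NA, 20.1, 35.8)

# density() stops with an error on NA input, so drop missing values first
# (density(rates, na.rm = TRUE) is an equivalent alternative).
rates_ok <- rates[!is.na(rates)]

d <- density(rates_ok)

# Name the columns explicitly instead of relying on d.x / d.y defaults.
dens_df <- data.frame(rate = d$x, density = d$y)

# Trapezoid-rule area under the curve; it should be close to 1.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area
```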

Now make the Histogram Matrix.

Let’s reshape the data from wide to long format using the R package tidyr, the successor to the reshape2 package. We will do this instead of running the Python code the author uses.

# birth_yearly <- read.csv("http://datasets.flowingdata.com/birth-rate-yearly.csv")

library(tidyr)    # this package has the gather() function
library(lattice)  # this package has the histogram() function

dim(birth)

## [1] 234  50

birth <- birth[2:50]  # remove the country names.

birth_yearly <- gather(birth, "year", "rate")

histogram(~ rate | year, data=birth_yearly, layout=c(10,5))
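In recent tidyr releases (1.0.0 and later), `gather()` still works but is superseded by `pivot_longer()`. As a sketch of the same wide-to-long reshape, the toy data frame below (made-up numbers, with column names mimicking the `X1960` style that `read.csv()` produces) is stacked into `year` and `rate` columns, and the `names_prefix` argument strips the leading "X" along the way.

```r
library(tidyr)

# Hypothetical wide data: one column per year, one row per country.
wide <- data.frame(
  X1960 = c(36.4, 17.2, NA),
  X1961 = c(36.1, 17.0, 50.2)
)

# Stack all columns into a year/rate pair; drop the "X" prefix from the names.
long <- pivot_longer(
  wide,
  cols         = everything(),
  names_to     = "year",
  names_prefix = "X",
  values_to    = "rate"
)

long
```

Unlike `gather()`, which stacks column by column, `pivot_longer()` emits the rows in row-major order, so the row order of the two results differs even though the content is the same.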

For a nicer plot, remove the outlier (the maximum of 132 shown in the summary below), show the year in the title bar of each panel, and reorder the panels so that the years run from 1960 to 2008.

summary(birth_yearly$rate)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    6.90   18.06   29.62   29.94   41.91  132.00    1596

# which() keeps only rows where the comparison is TRUE, dropping rows with a missing rate
birth_yearly.new <- birth_yearly[which(birth_yearly$rate < 132),]
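One detail worth checking when filtering a column that contains missing values: a comparison such as `rate < 132` evaluates to `NA` for a missing rate, and indexing a data frame with an `NA` produces a row filled with `NA` rather than dropping it. The sketch below uses a four-row toy data frame (made-up values) to compare plain logical indexing with `which()` and `subset()`, both of which keep only rows where the condition is `TRUE`.

```r
# Hypothetical long-format data with one missing rate and one outlier.
toy <- data.frame(
  year = c("X1960", "X1960", "X1961", "X1961"),
  rate = c(10, NA, 132, 25)
)

# Plain logical indexing: the NA comparison yields an all-NA row.
naive <- toy[toy$rate < 132, ]

# which() returns only the positions where the test is TRUE.
by_which <- toy[which(toy$rate < 132), ]

# subset() likewise treats NA in the condition as FALSE.
by_subset <- subset(toy, rate < 132)

nrow(naive)      # 3: two real rows plus one all-NA row
nrow(by_which)   # 2
nrow(by_subset)  # 2
```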

birth_yearly.new$year <- as.character(birth_yearly.new$year)

h <- histogram(~ rate | year, data=birth_yearly.new, layout=c(10,5))

# reorder the rows of panels; each block of ten indices is one row of the 10 x 5 layout
update(h, index.cond=list(c(41:50, 31:40, 21:30, 11:20, 1:10)))
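As an alternative to reordering panels by index, lattice accepts an `as.table = TRUE` argument that fills the layout left to right, top to bottom, so the earliest year lands in the top-left panel. The sketch below is a hedged illustration on made-up data, not the document's own approach; it also strips the "X" prefix with `sub()` so that each strip shows a plain year.

```r
library(lattice)

# Hypothetical long-format data: four years, five rates each.
toy <- data.frame(
  year = rep(c("X1960", "X1961", "X1962", "X1963"), each = 5),
  rate = c(41, 38, 45, 22, 30,
           40, 37, 44, 21, 29,
           39, 36, 43, 20, 28,
           38, 35, 42, 19, 27)
)

# Drop the "X" that read.csv() prepends to numeric column names.
toy$year <- sub("^X", "", toy$year)

# as.table = TRUE orders panels like a table: top-left first.
h <- histogram(~ rate | year, data = toy, layout = c(2, 2), as.table = TRUE)

# At the console, typing h draws the plot; in a script use print(h).
```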

Prof. Eric A. Suess

March 1, 2016