This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit HTML button, a document is generated that includes both the content and the output of any embedded R code chunks.
In this document we will work with the birth-rate.csv data file.
First, read in the wide-format birth-rate.csv file and make a stem-and-leaf plot of the 2008 rates.
birth <- read.csv("http://datasets.flowingdata.com/birth-rate.csv")
stem(birth$X2008)
##
## The decimal point is at the |
##
## 8 | 2371334468999
## 10 | 01223455566999001222334555777889
## 12 | 00011111356789993789
## 14 | 0034566788991237
## 16 | 227779123677889
## 18 | 00233677888900448
## 20 | 0024445688912455679
## 22 | 0057834579
## 24 | 11456677771347
## 26 | 31335667
## 28 | 014999
## 30 | 124234
## 32 | 1449069
## 34 | 556049
## 36 | 8890
## 38 | 023455823468
## 40 | 23125
## 42 | 699
## 44 | 17
## 46 | 252
## 48 |
## 50 |
## 52 | 5
Next, make some histograms of the 2008 rates: one with the default bins, then one each with roughly 5 and 20 bins.
hist(birth$X2008)
hist(birth$X2008,breaks=5)
hist(birth$X2008,breaks=20)
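When `breaks` is a single number, `hist()` treats it as a suggestion rather than an exact bin count, and rounds the break points to "pretty" values. The following is a minimal sketch of how to inspect the bins that were actually used. The vector `rates` is simulated for illustration and is not the birth-rate data.

```r
# Simulated stand-in for birth$X2008 (hypothetical values, not the real data)
set.seed(1)
rates <- runif(200, min = 8, max = 53)

# plot = FALSE returns the histogram object without drawing it
h5  <- hist(rates, breaks = 5,  plot = FALSE)
h20 <- hist(rates, breaks = 20, plot = FALSE)

# Break points actually chosen; their number may differ from the request
h5$breaks
length(h20$breaks) - 1   # number of bins used when breaks = 20

# Every observation lands in exactly one bin
sum(h5$counts) == length(rates)
```

Comparing `h5$breaks` with `h20$breaks` shows how the bin width, not just the bin count, changes between the two plots.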
Second, the missing values are removed from the 2008 data, and the remaining values are used to make a density plot.
birth2008 <- birth$X2008[!is.na(birth$X2008)]
head(birth2008)
## [1] 11.71600 46.53800 42.87500 14.64900 13.28100 26.32405
d2008 <- density(birth2008)
# d2008$x
# d2008$y
d2008frame <- data.frame(d2008$x,d2008$y)
plot(d2008)
plot(d2008, type="n")
polygon(d2008, col="#821122", border="#cccccc")
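`density()` returns the evaluation grid in `$x` and the estimated density in `$y`, which is why the object can be handed to `plot()` and `polygon()` directly. Below is a small sketch on simulated values that drops the missing entries and checks that the curve integrates to roughly 1. The vector `rates` is an assumption standing in for the 2008 column.

```r
# Simulated rates with two missing values (hypothetical, not the real data)
set.seed(2)
rates <- c(rnorm(150, mean = 20, sd = 8), NA, NA)

clean <- rates[!is.na(rates)]   # keep only the non-missing values
d <- density(clean)             # Gaussian kernel, default bandwidth

length(d$x)                     # 512 grid points by default

# Trapezoidal approximation of the area under the estimated curve
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
area

# Filled density plot, drawn the same way as in the text
plot(d, type = "n")
polygon(d, col = "#821122", border = "#cccccc")
```

The area should come out very close to 1, since the default grid extends a few bandwidths beyond the data on each side.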
Now make the Histogram Matrix.
Let’s reshape the data from wide to long format using the R package tidyr, the successor to the reshape2 package. We will do this instead of running the Python code the author uses.
# birth_yearly <- read.csv("http://datasets.flowingdata.com/birth-rate-yearly.csv")
library(tidyr) # this package has the gather() function
library(lattice) # this package has the histogram() function
dim(birth)
## [1] 234 50
birth <- birth[2:50] # remove the country names.
birth_yearly <- gather(birth, "year", "rate")
histogram(~ rate | year, data=birth_yearly, layout=c(10,5))
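In current tidyr releases `gather()` is superseded by `pivot_longer()`. Here is a hedged sketch of the same wide-to-long reshape on a tiny made-up data frame. The table `wide` and its values are invented for illustration, and `names_prefix` is used to strip the `X` that `read.csv()` prepends to numeric column names.

```r
library(tidyr)

# Tiny made-up wide table: one column per year, named as read.csv() would
wide <- data.frame(X1960 = c(45.1, 20.3, NA),
                   X1961 = c(44.8, 19.9, 31.0),
                   X1962 = c(44.2, NA,   30.5))

# One row per (year, rate) pair; names_prefix strips the leading "X"
long <- pivot_longer(wide, cols = everything(),
                     names_to = "year", values_to = "rate",
                     names_prefix = "X")

dim(long)            # 9 rows, 2 columns
unique(long$year)    # "1960" "1961" "1962"
```

Unlike `gather()`, `pivot_longer()` emits the rows in row-wise order, which makes no difference to a histogram conditioned on `year`.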
For a nicer plot, remove the outlier (the maximum rate of 132), show the year in the title bar of each panel, and reorder the panels so that the years run from 1960 to 2008.
summary(birth_yearly$rate)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 6.90 18.06 29.62 29.94 41.91 132.00 1596
birth_yearly.new <- birth_yearly[birth_yearly$rate < 132,]
birth_yearly.new$year <- as.character(birth_yearly.new$year)
h <- histogram(~ rate | year, data=birth_yearly.new, layout=c(10,5))
update(h, index.cond=list(c(41:50, 31:40, 21:30, 11:20, 1:10))) # 11:20 restored so no row of panels is dropped
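Reordering panels by hand with `index.cond` is easy to get wrong. One possible alternative, which is not the author's approach, is lattice's `as.table = TRUE` argument, which fills panels left to right and top to bottom. The sketch below uses simulated data. The data frame `long` and its planted outlier are assumptions made for illustration. It also uses `subset()`, which drops rows where the comparison is `NA`, whereas logical indexing with `[` would keep them as all-`NA` rows.

```r
library(lattice)

# Made-up long-format data: 49 years, 30 rates each (not the real data)
set.seed(3)
long <- data.frame(year = rep(as.character(1960:2008), each = 30),
                   rate = runif(49 * 30, min = 7, max = 55))
long$rate[1] <- 132    # planted outlier
long$rate[2] <- NA     # planted missing value

# subset() keeps only rows where the condition is TRUE, so NA rates drop out
trimmed <- subset(long, rate < 132)

# as.table = TRUE places the first year in the top-left panel
h <- histogram(~ rate | year, data = trimmed,
               layout = c(10, 5), as.table = TRUE)
print(h)
```

Because `year` holds plain strings such as "1960", the strip above each panel shows the year itself and the panels sort chronologically.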