R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any R code chunks embedded in the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

Including Plots

You can also embed plots, for example:

Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
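The plot itself did not survive in this copy. As a minimal sketch, the chunk below shows what such a plotting chunk might contain, assuming the document still uses RStudio's default template (the chunk label `pressure` and the built-in `pressure` dataset are assumptions, not taken from the text above). In the .Rmd source, the option goes in the chunk header, as noted in the comment.

```r
# Assumed chunk header in the .Rmd source:  {r pressure, echo=FALSE}
# With echo = FALSE, only the plot appears in the knitted document,
# not the code that produced it.

# 'pressure' is a dataset shipped with base R (temperature vs. vapor pressure of mercury).
plot(pressure)
```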

Purpose

  • summarizing data

  • graphing data

Data

  • Data
library(LearnBayes)
#studentdata = read.table("studentdata.txt", sep = "\t", header = TRUE)
data(studentdata)
head(studentdata)
##   Student Height Gender Shoes Number Dvds ToSleep WakeUp Haircut  Job Drink
## 1       1     67 female    10      5   10    -2.5    5.5      60 30.0 water
## 2       2     64 female    20      7    5     1.5    8.0       0 20.0   pop
## 3       3     61 female    12      2    6    -1.5    7.5      48  0.0  milk
## 4       4     61 female     3      6   40     2.0    8.5      10  0.0 water
## 5       5     70   male     4      5    6     0.0    9.0      15 17.5   pop
## 6       6     63 female    NA      3    5     1.0    8.5      25  0.0 water

Data summary

  • summarizing a categorical variable: Drink
attach(studentdata)
table(Drink)
## Drink
##  milk   pop water 
##   113   178   355
barplot(table(Drink),xlab="Drink",ylab="Count")
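Counts are easier to compare across classes when converted to proportions. The sketch below does this with `prop.table()`, using the three counts copied from the `table(Drink)` output above so that it runs without the dataset; the variable names `drink_counts` and `drink_props` are illustrative.

```r
# Counts copied from the table(Drink) output above.
drink_counts <- c(milk = 113, pop = 178, water = 355)

# prop.table() divides each count by the total, giving relative frequencies.
drink_props <- prop.table(drink_counts)
print(round(drink_props, 3))

# Same bar plot as before, but on the proportion scale.
barplot(drink_props, xlab = "Drink", ylab = "Proportion")
```

With the dataset attached, `prop.table(table(Drink))` should give the same proportions directly.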

Exercise 1.1, Albert (2009)

The variable Dvds in the student dataset 'studentdata' contains the number of movie DVDs owned by students in the class.

  1. Construct a histogram of this variable using the hist() command.
hist(Dvds,main="")

  2. Summarize this variable using the summary() command.
summary(Dvds)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00   10.00   20.00   30.93   30.00 1000.00      16
  3. Use the table() command to construct a frequency table of the individual values of Dvds that were observed. Use the barplot() command to construct a barplot of these tabled values.
table(Dvds)
## Dvds
##    0    1    2  2.5    3    4    5    6    7    8    9   10   11   12   13   14 
##   26   10   13    1   18    9   27   14   12   12    7   78    3   20    7    4 
##   15   16   17 17.5   18   20   21   22 22.5   23   24   25 27.5   28   29   30 
##   46    1    3    1    4   83    3    3    1    3    2   31    3    1    1   45 
##   31   33   35   36   37   40   41   42   45   46   48   50   52   53   55   60 
##    1    1   12    4    1   26    1    1    5    1    2   26    1    2    1    7 
##   62   65   67   70   73   75   80   83   85   90   97  100  120  122  130  137 
##    1    2    1    4    1    3    4    1    1    1    1   10    2    1    2    1 
##  150  152  157  175  200  250  500  900 1000 
##    6    1    1    1    8    1    1    1    1
barplot(table(Dvds),xlab="Dvds",ylab="Count")

Exercise 1.2, Albert (2009)

The variable Height in the student dataset 'studentdata' contains the height (in inches) of each student in the class.

  1. Construct parallel boxplots of the heights using the Gender variable.
boxplot(Height~Gender,ylab="Height")

  2. If one assigns the boxplot output to a variable, output=boxplot(Height~Gender), then output is a list that contains the statistics used in constructing the boxplots. Print output to see the statistics that are stored.
female.Height=Height[Gender=="female"]
male.Height=Height[Gender=="male"]
summary(female.Height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   54.00   63.00   64.50   64.76   67.00   84.00       7
summary(male.Height)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   59.00   69.00   71.00   70.51   72.00   79.00       3
  3. On average, how much taller are male students than female students?

Comparing the sample means, male students are on average 70.51 - 64.76 = 5.75 inches taller than female students.
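The second part of this exercise asks for the list returned by boxplot(), whereas the code above uses summary() on each group instead. The sketch below shows what that list looks like. So that it runs on its own, it uses only the six rows printed by head(studentdata) earlier, so the numbers will not match the full dataset; the data frame name `students` is illustrative.

```r
# The six rows shown by head(studentdata) above, not the full dataset.
students <- data.frame(
  Height = c(67, 64, 61, 61, 70, 63),
  Gender = factor(c("female", "female", "female", "female", "male", "female"))
)

# boxplot() draws the plot and also returns the statistics it used.
output <- boxplot(Height ~ Gender, data = students, ylab = "Height")
print(output)

# stats has one column per group:
# lower whisker, lower hinge, median, upper hinge, upper whisker.
print(output$stats)
print(output$n)      # number of non-missing observations per group
print(output$names)  # group labels, in column order
```

On the full data, with studentdata attached as above, `output=boxplot(Height~Gender)` followed by `output` prints the same structure; the third row of `output$stats` holds the group medians.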

Exercise 1.3, Albert (2009)

  1. Construct a scatterplot of ToSleep and WakeUp.
plot(ToSleep,WakeUp)

plot(jitter(ToSleep),jitter(WakeUp))

  2. Find a least-squares fit to these data using the lm() command.
fit=lm(WakeUp~ToSleep)
fit
## 
## Call:
## lm(formula = WakeUp ~ ToSleep)
## 
## Coefficients:
## (Intercept)      ToSleep  
##      7.9628       0.4247
  3. Place the least-squares fit on the scatterplot using the abline() command.
plot(jitter(ToSleep),jitter(WakeUp))
fit=lm(WakeUp~ToSleep)
fit
## 
## Call:
## lm(formula = WakeUp ~ ToSleep)
## 
## Coefficients:
## (Intercept)      ToSleep  
##      7.9628       0.4247
abline(fit)

  4. Use the line to predict the wake-up time for a student who went to bed at midnight.
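No answer is given for this last part. A minimal sketch of the calculation follows, assuming ToSleep is recorded in hours relative to midnight, so that midnight corresponds to ToSleep = 0, and that WakeUp is in hours after midnight. The coefficients are copied from the printed fit above; the variable names are illustrative.

```r
# Coefficients copied from the printed lm() fit above.
intercept <- 7.9628
slope <- 0.4247

# Assumption: ToSleep is measured in hours relative to midnight,
# so going to bed at midnight means ToSleep = 0.
to_sleep <- 0

# Predicted wake-up time in decimal hours.
predicted_wakeup <- intercept + slope * to_sleep

# Convert decimal hours to a clock time for readability.
wake_hour <- as.integer(floor(predicted_wakeup))
wake_minute <- as.integer(round((predicted_wakeup - wake_hour) * 60))
wake_clock <- sprintf("%d:%02d", wake_hour, wake_minute)

print(predicted_wakeup)
print(wake_clock)
```

Under these assumptions the prediction is simply the intercept, about 7.96 hours, or roughly 7:58 in the morning. With the fitted object in the workspace, `predict(fit, newdata = data.frame(ToSleep = 0))` should return the same value up to rounding of the printed coefficients.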