This document is just a test of the RPubs platform for publishing R Markdown. I'll use the survey dataset from the MASS package to explore a few features.
library(MASS)
library(psych)
library(Hmisc)
library(xtable)
data(survey)
So let's have a look at the first few rows of this dataset.
head(survey)
## Sex Wr.Hnd NW.Hnd W.Hnd Fold Pulse Clap Exer Smoke Height
## 1 Female 18.5 18.0 Right R on L 92 Left Some Never 173.0
## 2 Male 19.5 20.5 Left R on L 104 Left None Regul 177.8
## 3 Male 18.0 13.3 Right L on R 87 Neither None Occas NA
## 4 Male 18.8 18.9 Right R on L NA Neither None Never 160.0
## 5 Male 20.0 20.0 Right Neither 35 Right Some Never 165.0
## 6 Female 18.0 17.7 Right L on R 64 Right Some Never 172.7
## M.I Age
## 1 Metric 18.25
## 2 Imperial 17.58
## 3 <NA> 16.92
## 4 Metric 20.33
## 5 Metric 23.67
## 6 Imperial 21.00
As you can see, it contains an assortment of measurements for 237 individuals.
Hmisc::describe(survey)
## survey
##
## 12 Variables 237 Observations
## ---------------------------------------------------------------------------
## Sex
## n missing unique
## 236 1 2
##
## Female (118, 50%), Male (118, 50%)
## ---------------------------------------------------------------------------
## Wr.Hnd
## n missing unique Mean .05 .10 .25 .50 .75
## 236 1 60 18.67 16.00 16.50 17.50 18.50 19.80
## .90 .95
## 21.15 22.05
##
## lowest : 13.0 14.0 15.0 15.4 15.5, highest: 22.5 22.8 23.0 23.1 23.2
## ---------------------------------------------------------------------------
## NW.Hnd
## n missing unique Mean .05 .10 .25 .50 .75
## 236 1 68 18.58 15.50 16.30 17.50 18.50 19.72
## .90 .95
## 21.00 22.22
##
## lowest : 12.5 13.0 13.3 13.5 15.0, highest: 22.7 23.0 23.2 23.3 23.5
## ---------------------------------------------------------------------------
## W.Hnd
## n missing unique
## 236 1 2
##
## Left (18, 8%), Right (218, 92%)
## ---------------------------------------------------------------------------
## Fold
## n missing unique
## 237 0 3
##
## L on R (99, 42%), Neither (18, 8%), R on L (120, 51%)
## ---------------------------------------------------------------------------
## Pulse
## n missing unique Mean .05 .10 .25 .50 .75
## 192 45 43 74.15 59.55 60.00 66.00 72.50 80.00
## .90 .95
## 90.00 92.00
##
## lowest : 35 40 48 50 54, highest: 96 97 98 100 104
## ---------------------------------------------------------------------------
## Clap
## n missing unique
## 236 1 3
##
## Left (39, 17%), Neither (50, 21%), Right (147, 62%)
## ---------------------------------------------------------------------------
## Exer
## n missing unique
## 237 0 3
##
## Freq (115, 49%), None (24, 10%), Some (98, 41%)
## ---------------------------------------------------------------------------
## Smoke
## n missing unique
## 236 1 4
##
## Heavy (11, 5%), Never (189, 80%), Occas (19, 8%), Regul (17, 7%)
## ---------------------------------------------------------------------------
## Height
## n missing unique Mean .05 .10 .25 .50 .75
## 209 28 67 172.4 157.0 160.0 165.0 171.0 180.0
## .90 .95
## 185.4 189.6
##
## lowest : 150.0 152.0 152.4 153.5 154.9
## highest: 191.8 193.0 195.0 196.0 200.0
## ---------------------------------------------------------------------------
## M.I
## n missing unique
## 209 28 2
##
## Imperial (68, 33%), Metric (141, 67%)
## ---------------------------------------------------------------------------
## Age
## n missing unique Mean .05 .10 .25 .50 .75
## 237 0 88 20.37 17.08 17.22 17.67 18.58 20.17
## .90 .95
## 23.58 30.68
##
## lowest : 16.75 16.92 17.00 17.08 17.17
## highest: 41.58 43.83 44.25 70.42 73.00
## ---------------------------------------------------------------------------
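The row count and the missing-value counts reported by `describe` can be cross-checked in base R. This is only a minimal sketch; the name `n_missing` is one I've introduced here and is not part of the original analysis.

```r
library(MASS)   # provides the survey data frame
data(survey)

# Number of rows (individuals) and columns (variables)
dim(survey)

# Count missing values per column, then show only the columns that have any
n_missing <- colSums(is.na(survey))
n_missing[n_missing > 0]
```

Pulse has by far the most missing values, which is worth keeping in mind before modelling it.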
boxplot(Height ~ Sex, survey)
fit1 <- lm(Height ~ Sex, survey)
summary(fit1)
##
## Call:
## lm(formula = Height ~ Sex, data = survey)
##
## Residuals:
## Min 1Q Median 3Q Max
## -23.89 -5.67 1.17 4.36 21.17
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 165.69 0.73 227.0 <2e-16 ***
## SexMale 13.14 1.02 12.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.37 on 206 degrees of freedom
## (29 observations deleted due to missingness)
## Multiple R-squared: 0.445, Adjusted R-squared: 0.442
## F-statistic: 165 on 1 and 206 DF, p-value: <2e-16
##
Yes, as expected, males are taller. The expected difference is 13.1 centimetres or 5.2 inches.
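The inches figure is just the `SexMale` coefficient divided by 2.54 cm per inch. Here is a small sketch of that arithmetic, refitting the same model so the snippet runs on its own; `diff_cm` and `diff_in` are names I've made up for illustration.

```r
library(MASS)
data(survey)

# Refit the model from above; rows with a missing Height or Sex are dropped
fit1 <- lm(Height ~ Sex, data = survey)

# Estimated male-female difference in mean height, in centimetres
diff_cm <- unname(coef(fit1)["SexMale"])

# Convert to inches (1 inch = 2.54 cm)
diff_in <- diff_cm / 2.54
round(c(cm = diff_cm, inches = diff_in), 1)
```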
survey$ExerSome <- survey$Exer %in% c("Freq", "Some")
tab1 <- table(smoke = survey$Smoke, exercise = survey$ExerSome)
print(xtable(tab1, caption = "Cell counts"), "html")
| | FALSE | TRUE |
|---|---|---|
| Heavy | 1 | 10 |
| Never | 18 | 171 |
| Occas | 3 | 16 |
| Regul | 1 | 16 |
print(xtable(round(prop.table(tab1, 1), 2), caption = "Proportion within smoker category"),
"html")
| | FALSE | TRUE |
|---|---|---|
| Heavy | 0.09 | 0.91 |
| Never | 0.10 | 0.90 |
| Occas | 0.16 | 0.84 |
| Regul | 0.06 | 0.94 |
chisq.test(tab1)
## Warning message: Chi-squared approximation may be incorrect
##
## Pearson's Chi-squared test
##
## data: tab1
## X-squared = 1.093, df = 3, p-value = 0.7787
##
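No, it doesn't look like it: the proportion who exercise is similar across the smoking categories, and the chi-squared test gives no evidence of an association (p = 0.78).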
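The warning from `chisq.test` appears when some expected cell counts are small (below 5), which makes the chi-squared approximation unreliable; with only 11 heavy smokers that is the case here. One common alternative, sketched below and not part of my original analysis, is Fisher's exact test or a Monte Carlo p-value. The names `exer_any`, `tab` and `expected` are introduced just for this example, and the seed is arbitrary.

```r
library(MASS)
data(survey)

# Same 4 x 2 table as above: smoking status by any exercise (Freq or Some)
exer_any <- survey$Exer %in% c("Freq", "Some")
tab <- table(smoke = survey$Smoke, exercise = exer_any)

# Expected counts under independence; values below 5 trigger the warning
expected <- suppressWarnings(chisq.test(tab))$expected
round(expected, 1)

# Fisher's exact test does not rely on the large-sample approximation
fisher.test(tab)

# Alternatively, a Monte Carlo p-value for the chi-squared statistic
set.seed(1)  # arbitrary seed, only for reproducibility
chisq.test(tab, simulate.p.value = TRUE, B = 2000)
```

If either of these agrees with the approximate test, the conclusion can be taken with a bit more confidence.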
pairs.panels(survey[, c("Wr.Hnd", "NW.Hnd", "Height")])
The figure above shows the Pearson correlation coefficients between writing-hand span (Wr.Hnd), non-writing-hand span (NW.Hnd) and height (Height).
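The same coefficients can be obtained as a plain matrix with `cor`. This is a sketch only: I'm assuming pairwise deletion of missing values, which may or may not match exactly what `pairs.panels` does internally, and `vars` and `cor_mat` are names I've introduced here.

```r
library(MASS)
data(survey)

vars <- c("Wr.Hnd", "NW.Hnd", "Height")

# Pearson correlations, using every complete pair of observations
# for each pair of variables
cor_mat <- cor(survey[, vars], use = "pairwise.complete.obs", method = "pearson")
round(cor_mat, 2)
```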
As I said at the start, this is just my first test of the RPubs publishing platform for R Markdown.