This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document.

About Rmd file

Text

Text can be decorated with bold or italics. It is also possible to

* create links
* include mathematics like $e=mc^2$ or \[y = \beta_0 + \beta_1 x_1 + \beta_2 x_2\]

Be sure to put a space after the * when you are creating bullets and a space after # when creating section headers, but not between $ and the mathematical formulas.

You can embed an R code chunk like this:

summary(cars)

##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00

You can also embed plots, for example:

Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.

head(cars)

##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
## 4     7   22
## 5     8   16
## 6     9   10

knitr settings to control how R chunks work.

require(knitr)
opts_chunk$set(
  tidy = FALSE,     # display code as typed
  size = "small"    # slightly smaller font for code
)
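As a small sketch of how such settings behave, the lines below set two chunk options globally and then read one back. It assumes the knitr package is installed; `echo = FALSE` is used here only to illustrate the option mentioned earlier and is not one of the settings in the original call.

```r
library(knitr)

# Set defaults that apply to every chunk that follows in the document.
opts_chunk$set(
  tidy = FALSE,   # display code as typed
  echo = FALSE    # hide the R code, show only its output
)

# Read a setting back to confirm it took effect.
opts_chunk$get("echo")
```

An option given in an individual chunk header should take precedence over these global defaults for that chunk.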

The most important template is

goal(y~x,data=mydata)

* What do you want R to do? This is the goal; it determines the function to use (favstats, mean, sd, lm).
* What must R know to do that? This determines the inputs to the function: the variables and the data frame.

The same template produces

* single- and multiple-variable graphical summaries
* single- and multiple-variable numerical summaries
* linear models
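To make the template concrete, here is a minimal sketch that keeps the formula and the data fixed and changes only the goal. It assumes the mosaic, mosaicData, and lattice packages are installed; the packages are loaded explicitly because this handout never shows the loading step.

```r
library(mosaic)      # formula-based summaries such as favstats()
library(mosaicData)  # provides the HELPrct data frame
library(lattice)     # provides bwplot() and the other lattice plots

# Same formula and data each time; only the goal (the function name) changes.
mean(age ~ sex, data = HELPrct)       # numerical summary: mean age by sex
favstats(age ~ sex, data = HELPrct)   # numerical summary: several statistics by sex
bwplot(age ~ sex, data = HELPrct)     # graphical summary: boxplots of age by sex
lm(age ~ sex, data = HELPrct)         # linear model of age on sex
```

Reading each call as "goal(y ~ x, data = mydata)" is usually enough to guess what it will produce.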

Univariate Summaries

Numerical Summaries for one variable

favstats(~ age, data=HELPrct)

##  min Q1 median Q3 max     mean       sd   n missing
##   19 30     35 40  60 35.65342 7.710266 453       0

tally(~ sex, data=HELPrct)

## sex
## female   male 
##    107    346
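Counts are often easier to compare as proportions. The following is a minimal sketch, assuming the `format` argument of mosaic's tally() is available in your installed version:

```r
library(mosaic)      # provides tally() with a formula interface
library(mosaicData)  # provides the HELPrct data frame

# Same tally as above, reported as proportions instead of counts.
sex_props <- tally(~ sex, data = HELPrct, format = "proportion")
sex_props
```

The proportions are the counts above divided by the total of 453 subjects.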

Graphical Summaries for one variable

#graphing quantitative numeric variable
histogram(~age,data=HELPrct)

densityplot(~age,data=HELPrct)

bwplot(~age,data=HELPrct)

qqmath(~age,data=HELPrct)

freqpolygon(~age,data=HELPrct)

bargraph(~age,data=HELPrct)

bargraph(~sex, data=HELPrct) #graphing categorical variable

Bivariate Summaries

Categorical variable vs. categorical variable

tally(homeless~sex,data=HELPrct)

##           sex
## homeless   female male
##   homeless     40  169
##   housed       67  177

bargraph(~sex,group=homeless, data=HELPrct,auto.key=TRUE)

Numerical summaries of two variables

Quantitative variable vs. quantitative variable

i1 is the average number of drinks consumed per day in the past 30 days.

cor(i1~age, data=HELPrct)

## [1] 0.2069538

xyplot(i1~age, data=HELPrct)
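The formula version of cor() can be checked against base R. This sketch assumes mosaic and mosaicData are installed and compares the formula call with the ordinary two-vector call; the correlation is symmetric, so swapping the two sides of the formula should not change the value.

```r
library(mosaic)      # provides the formula interface for cor()
library(mosaicData)  # provides the HELPrct data frame

r_formula <- cor(i1 ~ age, data = HELPrct)   # mosaic formula interface
r_base    <- with(HELPrct, cor(i1, age))     # base R: two numeric vectors
r_swapped <- cor(age ~ i1, data = HELPrct)   # sides of the formula swapped

c(formula = r_formula, base = r_base, swapped = r_swapped)
```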

Categorical Variable vs. Quantitative Variable

a1<-favstats(age~substance|sex,data=HELPrct)
a1

##              sex min Q1 median   Q3 max     mean       sd   n missing
## 1 alcohol.female  23 33   37.0 45.0  58 39.16667 7.980333  36       0
## 2 cocaine.female  24 31   34.0 38.0  49 34.85366 6.195002  41       0
## 3  heroin.female  21 29   34.0 39.0  55 34.66667 8.035839  30       0
## 4   alcohol.male  20 32   38.0 42.0  58 37.95035 7.575644 141       0
## 5   cocaine.male  23 30   33.0 37.0  60 34.36036 6.889772 111       0
## 6    heroin.male  19 27   32.5 39.0  53 33.05319 7.973568  94       0
## 7         female  21 31   35.0 40.5  58 36.25234 7.584858 107       0
## 8           male  19 30   35.0 40.0  60 35.46821 7.750110 346       0

a2<-favstats(age~ racegrp, data=HELPrct)
a2

##    racegrp min    Q1 median    Q3 max     mean       sd   n missing
## 1    black  20 31.00     35 39.00  60 35.68246 7.083759 211       0
## 2 hispanic  21 28.25     32 36.25  55 33.20000 7.989789  50       0
## 3    other  22 30.00     34 40.50  48 34.96154 7.660187  26       0
## 4    white  19 30.00     36 42.00  58 36.46386 8.281152 166       0

bwplot(age~racegrp, data=HELPrct)#boxplot

a3<-mean(age~substance|sex,data=HELPrct,.format="table") #tabular form
a3

##   substance    sex     mean
## 1   alcohol female 39.16667
## 2   alcohol   male 37.95035
## 3   cocaine female 34.85366
## 4   cocaine   male 34.36036
## 5    heroin female 34.66667
## 6    heroin   male 33.05319

Categorical Variable vs. Categorical Variable

Numerical summaries

Cross Tabulations

tally(sex~substance,data=HELPrct)

##         substance
## sex      alcohol cocaine heroin
##   female      36      41     30
##   male       141     111     94

summary(sex~substance,data=HELPrct) # no formula method here: this summarizes the formula object itself, not the data

##  Length   Class    Mode 
##       3 formula    call

Graphical summaries of two variables

xyplot(i1~age,data=HELPrct)

bwplot(age~substance,data=HELPrct)

Tips

Replace the name of the summary function with the name of a plot function

bwplot(age~substance|sex,data=HELPrct, .format="table")

Add groups = group to overlay the groups on one plot

Use y~x|z to create multipanel plots

densityplot(~age|sex,data=HELPrct,groups=substance, auto.key=TRUE)

Some other generic functions that will come in handy as we progress through the course

The mosaic package also comes with datasets (through its companion package mosaicData) and extras such as xchisq.test(data name), xpnorm(), mplot(), and xqqmath().

xpnorm( 700, mean=500, sd=100)

##
## If X ~ N(500, 100), then
##  P(X <= 700) = P(Z <= 2) = 0.9772
##  P(X >  700) = P(Z >  2) = 0.02275
##
## [1] 0.9772499

xpnorm( c(300, 700), mean=500, sd=100)

##
## If X ~ N(500, 100), then
##  P(X <= 300) = P(Z <= -2) = 0.02275  P(X <= 700) = P(Z <=  2) = 0.97725
##  P(X >  300) = P(Z >  -2) = 0.97725  P(X >  700) = P(Z >   2) = 0.02275
##
## [1] 0.02275013 0.97724987
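The value returned by xpnorm() is the same lower-tail probability that base R's pnorm() gives. As a sketch using only base R, the lines below reproduce that value and derive two related probabilities; the variable names are made up for illustration.

```r
# Lower-tail probability P(X <= 700) for X ~ N(mean = 500, sd = 100).
p_lower <- pnorm(700, mean = 500, sd = 100)

# Upper-tail probability P(X > 700).
p_upper <- pnorm(700, mean = 500, sd = 100, lower.tail = FALSE)

# Probability of falling between 300 and 700 (within 2 sd of the mean).
p_between <- diff(pnorm(c(300, 700), mean = 500, sd = 100))

round(c(lower = p_lower, upper = p_upper, between = p_between), 4)
```

Taking the difference of the two values in the second xpnorm() call gives the same "between" probability.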

Modelling

Fit linear models with lm() and generalized linear models with glm().

a<-lm(age~substance*sex, data=HELPrct)
plot(a)
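Beyond plot(), a fitted model can be inspected with a few standard base R functions. This is a minimal sketch that assumes the mosaicData package is installed for the HELPrct data; the 2-by-2 layout is just one convenient way to view the four default diagnostic plots together.

```r
library(mosaicData)  # provides the HELPrct data frame

a <- lm(age ~ substance * sex, data = HELPrct)

coef(a)      # estimated coefficients, including the interaction terms
anova(a)     # sequential F tests for substance, sex, and their interaction

# plot(a) draws four diagnostic plots; show them together on one page.
old_par <- par(mfrow = c(2, 2))
plot(a)
par(old_par)  # restore the previous plotting layout
```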

***************************

Your turn

****************************

Refer to the Quiz handout given in class and follow the instructions to complete it.

Intro to Stats with bare minimum R using package Mosaic

TZaihra

Sept, 2019
