July 17, 2019

Introduction

  • History and Overview of R
  • RStudio
  • Sample R usage
  • Swirl Package in R
  • Incorporation of R in Statistics and Probability Lessons

The R Environment

  • R is an independent, open-source implementation of a statistical analysis system, developed by Ross Ihaka and Robert Gentleman at the University of Auckland in 1995.
  • R can be used both as a programming language and as a software environment for data manipulation, calculation, and graphical display.
  • One of the biggest advantages of R is that it can be distributed for free.
  • R can be freely downloaded from the Internet.

The R Installation

  • Obtain a copy of an R language installer from a dependable source or directly from the Internet. The URL is http://cran.r-project.org/
  • As of this writing (July 2019), the latest version of R is 3.6.1.
  • Once the installation is done, start R by clicking the Desktop icon for R.
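
Once R is running, the installed version can be checked from the console. This is a minimal sketch; the comparison against 3.6.1 only mirrors the version named above and is not a requirement of these lessons.

```r
# Print the full version string of the running R installation
R.version.string

# TRUE if the running R is at least the version mentioned in the slides
getRversion() >= "3.6.1"
```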

The R Console

  • Along the top of the window is a limited set of menus, which can be used for various tasks including opening, loading and saving script windows, loading and saving your workspace, and installing packages.
  • When you open an R session (i.e. start the R program), the R console opens and you are presented with a screen like this:

[Figure: the logo of R]

The RStudio

  • RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging, and workspace management.
  • RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).
  • You can download the latest version of RStudio at https://www.rstudio.com/products/rstudio/

The “swirl” Package

  • swirl stands for Statistics with Interactive R Learning.
  • Working through the swirl package illustrates some key concepts of R.
  • The swirl package turns the R console into an interactive learning environment.
  • Using swirl also gives you the opportunity to be completely immersed in an authentic R programming environment.

swirl in R

  • Install swirl: install.packages("swirl")
  • Load swirl: library("swirl")

Install the swirl courses

  • install_from_swirl("R Programming")
  • install_from_swirl("Exploratory Data Analysis")
  • install_from_swirl("Getting and Cleaning Data")
  • install_from_swirl("Statistical Inference")
  • install_from_swirl("Regression Models")

swirl in R

library("swirl")
## 
## | Hi! Type swirl() when you are ready to begin.
  • swirl()

Statistics and Probability Lessons with R

*Lesson: Mean and Variance of a Random Variable

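Question: What is the expected number of heads when a fair coin is tossed twice?

x <- c(0, 1, 2)        # possible numbers of heads
p <- c(.25, .5, .25)   # probability of each outcome
coin <- rbind(x, p)    # distribution table: outcomes over probabilities
xbar <- sum(x * p)     # expected value E[X] = sum of x * P(X = x)
xbar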
## [1] 1
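
The probabilities above do not have to be typed by hand. As a sketch, assuming the number of heads is modeled as a binomial random variable with 2 trials and success probability 0.5, dbinom can generate them (the name expected_heads is illustrative):

```r
# Number of heads in 2 tosses of a fair coin: Binomial(size = 2, prob = 0.5)
x <- 0:2
p <- dbinom(x, size = 2, prob = 0.5)   # gives 0.25, 0.50, 0.25

# Expected value as the probability-weighted sum of the outcomes
expected_heads <- sum(x * p)
expected_heads
## [1] 1
```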

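Question: What is the variance of the number of heads when a fair coin is tossed twice?

x <- c(0, 1, 2)                    # possible numbers of heads
p <- c(.25, .5, .25)               # probability of each outcome
coin <- rbind(x, p)                # distribution table: outcomes over probabilities
xbar <- sum(x * p)                 # expected value, equal to 1
variance <- sum(p * (x - xbar)^2)  # Var(X) = sum of P(X = x) * (x - E[X])^2
variance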
## [1] 0.5
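
The same variance can be cross-checked in two other ways. This is a sketch, assuming the binomial model with 2 trials and success probability 0.5; the variable names are illustrative.

```r
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)

# Shortcut formula: Var(X) = E[X^2] - (E[X])^2
var_shortcut <- sum(x^2 * p) - sum(x * p)^2
var_shortcut
## [1] 0.5

# Binomial formula: Var(X) = n * prob * (1 - prob)
var_binomial <- 2 * 0.5 * (1 - 0.5)
var_binomial
## [1] 0.5
```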

*Lesson: Binomial Probability Distribution

Question: If you flip a fair coin 5 times, what is the probability of getting 4 or 5 heads?

bn4 <- dbinom(4, 5, 0.5)   # P(X = 4) for 5 tosses with P(heads) = 0.5
bn5 <- dbinom(5, 5, 0.5)   # P(X = 5)
bntotal <- bn4 + bn5       # P(X >= 4)
bntotal
## [1] 0.1875

Or using the pbinom function:

bntotalalt <- 1 - pbinom(3, 5, .5)   # P(X >= 4) = 1 - P(X <= 3)
bntotalalt
## [1] 0.1875
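
Two further equivalent forms, shown as a sketch: dbinom accepts a vector of outcomes, and pbinom can return the upper tail directly through its lower.tail argument. The variable names are illustrative.

```r
# Sum the point probabilities P(X = 4) and P(X = 5) in one call
p_vec <- sum(dbinom(4:5, size = 5, prob = 0.5))
p_vec
## [1] 0.1875

# Upper tail P(X > 3), which equals P(X >= 4) for a discrete variable
p_upper <- pbinom(3, size = 5, prob = 0.5, lower.tail = FALSE)
p_upper
## [1] 0.1875
```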

*Lesson: Normal Distribution

Question: Suppose that diastolic blood pressures (DBPs) of men aged 30-44 are normally distributed with a mean of 85 mmHg and a standard deviation of 10 mmHg. What is the probability that a randomly selected man aged 30-44 has a DBP less than 80 mmHg?

pnorm(80, mean = 85, sd = 10, lower.tail = TRUE)
## [1] 0.3085375
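
The same probability can be obtained by standardizing first, which is how the problem is usually solved with a z-table. A minimal sketch (the names z and p_z are illustrative):

```r
# Convert 80 mmHg to a z-score: (value - mean) / sd
z <- (80 - 85) / 10
z
## [1] -0.5

# Look up the z-score in the standard normal distribution
p_z <- pnorm(z)
p_z
## [1] 0.3085375
```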

Question: Brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. What brain volume represents the 95th percentile?

qnorm(0.95, mean = 1100, sd = 70, lower.tail = TRUE)
## [1] 1215.14
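
Since qnorm is the inverse of pnorm, the answer can be checked with a round trip. This sketch also shows the equivalent z-score form; the variable names are illustrative.

```r
# 95th percentile of Normal(mean = 1100, sd = 70)
q95 <- qnorm(0.95, mean = 1100, sd = 70)

# Round trip: the probability below q95 should be 0.95
p_check <- pnorm(q95, mean = 1100, sd = 70)

# Equivalent form: mean + z * sd, with z the standard normal 95th percentile
q95_z <- 1100 + qnorm(0.95) * 70
```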

Refer to the previous example: Brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. Consider the sample mean of 100 random adult men from this population. What is the 95th percentile of the distribution of the sample mean?

Note: Because the population is normally distributed (and n = 100 is large in any case), the sample mean follows a normal distribution with mean 1100 and standard deviation equal to the standard error, 70/sqrt(100) = 7.

qnorm(0.95, mean = 1100, sd = 70 / sqrt(100), lower.tail = TRUE)
## [1] 1111.514
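
The result can also be checked by simulation. This is only a sketch: the seed and the number of replications are arbitrary choices, and the simulated percentile will be close to, not exactly equal to, the theoretical value.

```r
set.seed(2019)  # arbitrary seed, only for reproducibility

# Draw 10,000 samples of 100 men each and record every sample mean
sample_means <- replicate(10000, mean(rnorm(100, mean = 1100, sd = 70)))

# Empirical 95th percentile of the simulated sample means
sim_q95 <- quantile(sample_means, probs = 0.95)

# Theoretical value using the standard error sigma / sqrt(n)
theory_q95 <- qnorm(0.95, mean = 1100, sd = 70 / sqrt(100))

sim_q95
theory_q95
```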

*Estimation and Hypothesis Testing

Question: In a population of interest, a sample of 9 men yielded a sample average brain volume of 1,100 cc and a standard deviation of 30 cc. What is a 95% Student's t confidence interval for the mean brain volume in this new population?

n <- 9                # sample size
mu <- 1100            # sample mean (cc)
st.dev <- 30          # sample standard deviation (cc)
quantile <- 0.975     # a 95% interval leaves 2.5% in each tail
conf <- mu + c(-1, 1) * qt(quantile, df = n - 1) * st.dev / sqrt(n)
conf
## [1] 1076.94 1123.06
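
The calculation above can be wrapped in a small reusable function. This is a sketch; t_ci and its argument names are hypothetical and not part of base R.

```r
# Hypothetical helper: t confidence interval from summary statistics
#   xbar  = sample mean, s = sample standard deviation, n = sample size
#   level = confidence level (0.95 gives a 95% interval)
t_ci <- function(xbar, s, n, level = 0.95) {
  alpha <- 1 - level
  t_crit <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical value
  margin <- t_crit * s / sqrt(n)           # margin of error
  c(lower = xbar - margin, upper = xbar + margin)
}

ci95 <- t_ci(xbar = 1100, s = 30, n = 9)
round(ci95, 2)
```

Passing a different level, for example t_ci(1100, 30, 9, level = 0.99), gives a wider interval for the same data.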

Using the mtcars data set, as shown below:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Question: Is there a significant difference in mpg between manual and automatic transmissions at the 5% level of significance?

t.test(mpg~am, data=mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group 0 mean in group 1 
##        17.14737        24.39231
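
The decision can be read off the test object directly instead of from the printed output. A minimal sketch (res and alpha are illustrative names); in mtcars, am = 0 is automatic and am = 1 is manual.

```r
# Store the Welch two-sample t-test result
res <- t.test(mpg ~ am, data = mtcars)

alpha <- 0.05          # significance level from the question

res$p.value            # p-value of the test
res$p.value < alpha    # TRUE means reject the null hypothesis of equal means
## [1] TRUE

res$conf.int           # 95% confidence interval for the difference in means
```

Since the p-value is below 0.05, the difference in mean mpg between automatic and manual cars is statistically significant at the 5% level.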

Thank you and God bless