August 29, 2023

Objectives

  • Provide a history and overview of R
  • Introduce basic commands in R
  • Introduce R Script and R Markdown
  • Install some R packages
  • Illustrate generating data in R, working with data in R, and exporting Excel data in R

History and Overview of R

  • R is a free, open-source implementation of the S language for statistical computing, developed by Ross Ihaka and Robert Gentleman at the University of Auckland and released as free software in 1995.
  • R is both a programming language and a software environment. It can be used for data manipulation, calculation, and graphical display.
  • One of the biggest advantages of R is that it is free to use and redistribute.
  • R can be downloaded free of charge from the Internet.

The R Installation

  • Download the R installer from a trusted source, preferably directly from the Comprehensive R Archive Network (CRAN): http://cran.r-project.org/
  • These notes refer to R version 4.0.5; newer releases are available on CRAN.
  • Once installation is complete, start R by clicking the R icon on the desktop.

The R Console

  • Along the top of the window is a limited set of menus, which can be used for various tasks including opening, loading and saving script windows, loading and saving your workspace, and installing packages.
  • When you open an R session (i.e. start the R program), the R console opens and you are presented with a screen like this:

[Figure: the R Console window]

[Figure: the R logo]

RStudio

  • RStudio is an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, and tools for plotting, history, debugging, and workspace management.
  • RStudio is available in open source and commercial editions and runs on the desktop (Windows, Mac, and Linux).
  • You can download the latest version of RStudio at https://www.rstudio.com/products/rstudio/

[Figure: the RStudio interface]

Statistics and Probability Lessons with R

Lesson: Mean and Variance of a Random Variable

Question: What is the expected number of heads when tossing a fair coin twice?

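x <- c(0, 1, 2)          # possible numbers of heads
p <- c(.25, .5, .25)     # P(X = x) for each value
coin <- rbind(x, p)      # probability table: row 1 = x, row 2 = P(X = x)
xbar <- sum(x * p)       # expected value E[X] = sum of x * P(X = x)
xbar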
## [1] 1
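As a cross-check, the same expected value can be computed by generating the probabilities with dbinom() instead of typing them in. This sketch assumes the two tosses are independent and the coin is fair (probability 0.5 of heads); the variable name ev is illustrative only.

```r
# Assumption: X = number of heads in 2 independent tosses of a fair coin
x <- 0:2
p <- dbinom(x, size = 2, prob = 0.5)  # 0.25, 0.50, 0.25
ev <- sum(x * p)                      # E[X] = sum of x * P(X = x)
ev
## [1] 1
weighted.mean(x, p)                   # same result, since the probabilities sum to 1
## [1] 1
```

The result also matches the binomial shortcut E[X] = np = 2 × 0.5 = 1.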

Question: What is the variance of the number of heads when tossing a fair coin twice?

x <- c(0, 1, 2)                      # possible numbers of heads
p <- c(.25, .5, .25)                 # P(X = x)
xbar <- sum(x * p)                   # E[X] = 1
variance <- sum(p * (x - xbar)^2)    # Var(X) = sum of P(X = x) * (x - E[X])^2
variance
## [1] 0.5
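The variance can also be computed with the shortcut formula Var(X) = E[X^2] - (E[X])^2, which avoids subtracting the mean from every value. The sketch below reuses the same probability table (the names ex, ex2 and var_short are illustrative); for a binomial variable the result should also agree with np(1 - p) = 2 × 0.5 × 0.5 = 0.5.

```r
x <- c(0, 1, 2)
p <- c(0.25, 0.5, 0.25)
ex  <- sum(x * p)            # E[X]   = 1
ex2 <- sum(x^2 * p)          # E[X^2] = 0(0.25) + 1(0.5) + 4(0.25) = 1.5
var_short <- ex2 - ex^2      # Var(X) = E[X^2] - (E[X])^2
var_short
## [1] 0.5
sqrt(var_short)              # standard deviation of the number of heads
## [1] 0.7071068
```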

Lesson: Binomial Probability Distribution

Question: You flip a fair coin 5 times. What is the probability of getting 4 or 5 heads?

bn4 <- dbinom(4, size = 5, prob = 0.5)   # P(X = 4)
bn5 <- dbinom(5, size = 5, prob = 0.5)   # P(X = 5)
bntotal <- bn4 + bn5                     # P(X = 4 or X = 5)
bntotal
## [1] 0.1875
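Because dbinom() is vectorised over its first argument, the two probabilities can also be computed in one call and summed. The sketch below also builds the full distribution of the number of heads (the names k and probs are illustrative), which is a convenient way to confirm that the probabilities add up to 1.

```r
# P(X = 4) + P(X = 5) in a single vectorised call
sum(dbinom(4:5, size = 5, prob = 0.5))
## [1] 0.1875

# Full probability table for X ~ Binomial(n = 5, p = 0.5)
k <- 0:5
probs <- dbinom(k, size = 5, prob = 0.5)
names(probs) <- k
probs
sum(probs)   # total probability
## [1] 1
```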

Alternatively, use the cumulative distribution function pbinom(), since P(X ≥ 4) = 1 - P(X ≤ 3):

bntotalalt <- 1 - pbinom(3, size = 5, prob = 0.5)   # P(X >= 4)
bntotalalt
## [1] 0.1875
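pbinom() also accepts lower.tail = FALSE, which returns the upper tail P(X > q) directly, so P(X ≥ 4) = P(X > 3) can be obtained without subtracting from 1. Note that the cut-off is 3, not 4, because the upper tail excludes q itself; the name bnupper is illustrative.

```r
# Upper tail: P(X > 3) = P(X >= 4) for X ~ Binomial(5, 0.5)
bnupper <- pbinom(3, size = 5, prob = 0.5, lower.tail = FALSE)
bnupper
## [1] 0.1875

# Using 4 as the cut-off would give only P(X = 5)
pbinom(4, size = 5, prob = 0.5, lower.tail = FALSE)
## [1] 0.03125
```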

Lesson: Normal Distribution

Question: Suppose that diastolic blood pressures (DBPs) of men aged 30-44 are normally distributed with a mean of 85 mmHg and a standard deviation of 10 mmHg. What is the probability that a randomly selected man aged 30-44 has a DBP below 80 mmHg?

pnorm(80, mean = 85, sd = 10, lower.tail = TRUE)   # P(DBP < 80)
## [1] 0.3085375
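The same probability can be obtained by first converting 80 mmHg to a z-score, z = (80 - 85) / 10 = -0.5, and then using the standard normal distribution (pnorm() defaults to mean = 0 and sd = 1). Setting lower.tail = FALSE gives the complementary probability of a DBP above 80 mmHg.

```r
z <- (80 - 85) / 10          # standardise: z = (x - mean) / sd
z
## [1] -0.5
pnorm(z)                     # P(Z < -0.5), same as pnorm(80, mean = 85, sd = 10)
## [1] 0.3085375
pnorm(80, mean = 85, sd = 10, lower.tail = FALSE)   # P(DBP > 80)
## [1] 0.6914625
```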

Question: Brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. What brain volume represents the 95th percentile?

qnorm(0.95, mean = 1100, sd = 70, lower.tail = TRUE)   # volume with 95% of men below it
## [1] 1215.14
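qnorm() is the inverse of pnorm(), so the answer can be checked by plugging it back in. It can also be written as the mean plus the standard normal 95th percentile (about 1.645) times the standard deviation; the names q95 and z95 are illustrative.

```r
q95 <- qnorm(0.95, mean = 1100, sd = 70)
z95 <- qnorm(0.95)                 # standard normal 95th percentile, about 1.645
1100 + z95 * 70                    # same value as q95
## [1] 1215.14
pnorm(q95, mean = 1100, sd = 70)   # round trip back to the probability
## [1] 0.95
```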

Refer to the previous example: brain volume for adult men is normally distributed with a mean of about 1,100 cc and a standard deviation of 70 cc. Consider the sample mean of 100 randomly selected adult men from this population. What is the 95th percentile of the distribution of the sample mean?

Note: Because the population is normal (and, in any case, n = 100 is large enough for the Central Limit Theorem to apply), the sample mean is normally distributed with mean 1100 cc and standard error σ/√n = 70/√100 = 7 cc.

qnorm(0.95, mean = 1100, sd = 70 / sqrt(100), lower.tail = TRUE)   # sd of the sample mean = 7
## [1] 1111.514
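The standard-error argument can be checked by simulation. The sketch below draws many samples of 100 men from the assumed N(1100, 70²) population, takes each sample's mean, and compares the empirical 95th percentile of those means with the theoretical value. The seed, the 10,000 replications, and the variable names are arbitrary choices for illustration; the simulated and theoretical values should agree closely but not exactly, because of random error.

```r
set.seed(123)                       # arbitrary seed for reproducibility
n_men <- 100                        # sample size
n_reps <- 10000                     # number of simulated samples (arbitrary)
draws <- matrix(rnorm(n_men * n_reps, mean = 1100, sd = 70), nrow = n_reps)
sample_means <- rowMeans(draws)     # one sample mean per row (per simulated sample)

sd(sample_means)                    # should be close to 70 / sqrt(100) = 7
quantile(sample_means, 0.95)        # simulated 95th percentile
qnorm(0.95, mean = 1100, sd = 70 / sqrt(n_men))   # theoretical value
## [1] 1111.514
```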

Lesson: Estimation and Hypothesis Testing

Question: In a population of interest, a sample of 9 men yielded a sample average brain volume of 1,100 cc and a standard deviation of 30 cc. What is a 95% Student’s t confidence interval for the mean brain volume in this new population?

n <- 9                 # sample size
xbar <- 1100           # sample mean (cc)
s <- 30                # sample standard deviation (cc)
q <- 0.975             # 95% interval: 2.5% in each tail
conf <- xbar + c(-1, 1) * qt(q, df = n - 1) * s / sqrt(n)
conf
## [1] 1076.94 1123.06
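The calculation can be wrapped in a small helper function; the name t_ci() is hypothetical and not part of base R. When the raw observations are available, base R's t.test() reports the same interval. Since the original data are not given, the cross-check below uses a simulated (hypothetical) sample of 9 values.

```r
# Hypothetical helper: t-based confidence interval from summary statistics
t_ci <- function(xbar, s, n, level = 0.95) {
  alpha <- 1 - level
  tcrit <- qt(1 - alpha / 2, df = n - 1)   # two-sided critical value
  xbar + c(-1, 1) * tcrit * s / sqrt(n)
}
t_ci(1100, 30, 9)
## [1] 1076.94 1123.06

# Cross-check against t.test() on a simulated (hypothetical) sample
set.seed(1)
y <- rnorm(9, mean = 1100, sd = 30)
ci_formula <- t_ci(mean(y), sd(y), length(y))
ci_ttest <- as.numeric(t.test(y, conf.level = 0.95)$conf.int)
all.equal(ci_formula, ci_ttest)
## [1] TRUE
```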

Basic R commands

R can be used as an interactive calculator.

Addition

5+7
## [1] 12

Subtraction

10-5
## [1] 5

Storing a result in a variable

x<-5+7

Now, typing the variable name prints its stored value:

x
## [1] 12
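A few more operators and functions that make R useful as a calculator are sketched below; arithmetic also works element by element on vectors. The name y is illustrative.

```r
6 * 7                 # multiplication
## [1] 42
20 / 8                # division
## [1] 2.5
2^10                  # exponentiation
## [1] 1024
17 %/% 5              # integer division
## [1] 3
17 %% 5               # remainder (modulo)
## [1] 2
sqrt(16)              # square root
## [1] 4
y <- c(1, 2, 3) * 2   # vectorised arithmetic
y
## [1] 2 4 6
```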

Introduction to R Script and R Markdown

Installing packages

Working with data in R, generated data, and Excel data

Thank you and God bless