R Programming 101

UTSA Department of Management Science & Statistics
March 5, 2016

What is R?

  • R is a free programming language and software environment for statistical computing and graphics
  • Developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand in 1993.

Why Learn R?

  • One of the most widely used statistical software programs
  • FREE!
  • Many online resources available
  • Comprehensive R library and powerful visualization tools
  • All-in-one functionality (presentations, reports, interactive web applications, etc.)

Visualizations in R: Plots


Visualizations in R: Volcano


Visualizations in R: Word Clouds


Visualizations in R: Maps


Simulation: Coin Flipping

  #Step 1: Flip coin ten times
  num=sample(0:1,10,replace=TRUE)
  num
  #Step 2: Total number of heads
  total=sum(num)
  total
  #Step 3: Compute proportion
  prop=mean(num)
  prop
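
The three steps above can be wrapped in a small helper so the experiment is easy to repeat. This is only a sketch: the function name flip_coins and the seed value are assumptions made for illustration and are not part of the original slides. Calling set.seed() first makes the random flips reproducible.

```r
# Hypothetical helper: flip a fair coin n_flips times (1 = heads, 0 = tails)
flip_coins <- function(n_flips = 10) {
  num <- sample(0:1, n_flips, replace = TRUE)
  list(flips = num, total = sum(num), prop = mean(num))
}

set.seed(101)            # arbitrary seed so the run can be reproduced
result <- flip_coins(10)
result$total             # number of heads
result$prop              # proportion of heads (total / 10)
```

Running the last three lines again without set.seed() will usually give a different result, which is the point of the simulation.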

Normal Distribution


Simulation: Coin Flipping (n=25)

#Step 0: The number of students
n=25
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
mean(props)
[1] 0.464
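
The loop above can also be written more compactly with replicate(), which evaluates an expression n times and collects the results in a vector. A minimal sketch, assuming the same setup of 25 students who each flip a coin ten times. The seed is an arbitrary choice for reproducibility, so the mean will generally differ from the 0.464 shown above.

```r
set.seed(2016)   # arbitrary seed, not from the original slides
n <- 25          # number of students

# Each student flips a coin ten times; keep the proportion of heads
props <- replicate(n, mean(sample(0:1, 10, replace = TRUE)))

length(props)    # one proportion per student
mean(props)      # average proportion across students
```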

Simulation: Coin Flipping (n=25)

hist(props)

Does your histogram look normal?

Simulation: Coin Flipping (n=100)

#Step 0: The number of students
n=100
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
mean(props)
[1] 0.497

Simulation: Coin Flipping (n=100)

hist(props)

Does your histogram look normal?

Simulation: Coin Flipping (n=2000)

#Step 0: The number of students
n=2000
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
mean(props)
[1] 0.5016

Simulation: Coin Flipping (n=2000)

#Center each bar on its proportion (0, 0.1, ..., 1) so the curve lines up
hist(props,prob=TRUE,breaks=seq(-0.05,1.05,by=0.1))
curve(dnorm(x,mean=mean(props),sd=sd(props)),col="blue",add=TRUE)

What about this one?
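
One way to answer this is to compare the simulated proportions with what probability theory predicts. For a fair coin flipped ten times, the proportion of heads has mean 0.5 and standard deviation sqrt(0.5 * 0.5 / 10), roughly 0.158. The sketch below reruns the simulation with an arbitrary seed (an assumption, not from the slides) and puts the simulated and theoretical values side by side.

```r
set.seed(2016)                      # arbitrary seed for reproducibility
n <- 2000                           # number of students
props <- replicate(n, mean(sample(0:1, 10, replace = TRUE)))

# Theoretical values for the proportion of heads in 10 fair flips
theory_mean <- 0.5
theory_sd <- sqrt(0.5 * (1 - 0.5) / 10)

c(simulated = mean(props), theory = theory_mean)
c(simulated = sd(props), theory = theory_sd)
```

With 2000 students the simulated values should sit close to the theoretical ones, which is why the normal curve fits this histogram better than the ones for n=25 and n=100.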


Data Analysis About You!

We collected your data in the morning session. Let's explore it!

Your Turn: What Will You Do?

  • Explore the collected data
  • Compute the summary statistics
  • Create simple plots
  • Do simple linear regression

Data Dictionary

  • gender : Male or Female
  • shoe : Shoe size
  • height : Height (in centimeters)

Reading in the Data

data=read.csv("http://pastebin.com/raw/MFYcN8Rk")

Exploring Data: View the Data

fix(data)

Typing the object name “data” prints the data to the R console.

The “fix” function opens the data in a spreadsheet-like editor, much like Excel. Any changes made in the editor are saved back to “data”.
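
For a quick look without opening an editor, head(), str(), and dim() print a preview, the column types, and the dimensions. The sketch below uses a tiny made-up data frame named demo with the same three columns as the data dictionary. Its values are invented for illustration and are not the data collected in the session.

```r
# Made-up example rows (NOT the data collected in the session)
demo <- data.frame(
  gender = c("Male", "Female", "Female", "Male"),
  shoe   = c(10.5, 7, 8, 9.5),
  height = c(180, 162, 168, 175)
)

head(demo, 3)   # first three rows
str(demo)       # column names and types
dim(demo)       # number of rows and columns
```

The same three calls work on the real data by replacing demo with data.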

Exploring Data: Summary Statistics

summary(data)

Exploring Data: Distribution of Shoe Size (Histogram)

hist(data$shoe)

Exploring Data: Distribution of Shoe Size (Histogram)

hist(data$shoe,col="red",xlab="Shoe Size (in cm)",
     main="Histogram of Shoe Size")

Exploring Data: Distribution of Height (Histogram)

hist(data$height)

Exploring Data: Distribution of Height (Histogram)

hist(data$height,col="green",xlab="Height (in cm)",
     main="Histogram of Height")

Analysis: Who Is Taller? (Male or Female)

boxplot(data$height~data$gender,main="Boxplot",
        col=rainbow(2),xlab="Gender",ylab="Height")

Analysis: Who Has Bigger Feet? (Male or Female)

boxplot(data$shoe~data$gender,main="Boxplot",
        col=rainbow(2),xlab="Gender",ylab="Shoe Size")
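
The boxplots compare the two groups visually, and tapply() gives the matching numbers by applying a function within each group. A minimal sketch on a made-up data frame named demo. The values are invented for illustration and are not the collected data.

```r
# Made-up example rows (NOT the data collected in the session)
demo <- data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female", "Male"),
  shoe   = c(10.5, 7, 8, 9.5, 7.5, 11),
  height = c(180, 162, 168, 175, 160, 185)
)

# Mean and median height within each gender
mean_height <- tapply(demo$height, demo$gender, mean)
median_height <- tapply(demo$height, demo$gender, median)
mean_height
median_height
```

Swapping demo$height for demo$shoe gives the same comparison for shoe size.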

Analysis: Are Shoe Size and Height Correlated?

#Fit the regression model (stored as "fit" so the lm function is not masked)
fit=lm(height~shoe,data=data)

#Plot the relationship
plot(x=data$shoe,y=data$height,xlab="Shoe Size",
     ylab="Height",main="Height vs. Shoe Size")

#Add the regression line
abline(fit,col="red")
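
The plot shows the relationship, and cor() and coef() put numbers on it. A minimal sketch on a made-up data frame named demo. The values are invented for illustration and are not the collected data, and the object names r and fit are assumptions.

```r
# Made-up example rows (NOT the data collected in the session)
demo <- data.frame(
  shoe   = c(7, 7.5, 8, 9.5, 10.5, 11),
  height = c(162, 160, 168, 175, 180, 185)
)

r <- cor(demo$shoe, demo$height)       # Pearson correlation, between -1 and 1
fit <- lm(height ~ shoe, data = demo)  # simple linear regression

r
coef(fit)                 # intercept and slope of the fitted line
summary(fit)$r.squared    # equals r^2 in simple linear regression
```

A correlation near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates a weak one.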

Example: Michael Bay Films


Question: Is the number of explosions in a movie correlated with its profit?

Example: Michael Bay Films

   Movie.Title                          Number.of.Explosions   Profit
1  The Island                                             16   162.95
2  Bad Boys                                               18   141.41
3  The Rock                                               22   335.06
4  Bad Boys 2                                             31   273.34
5  Armageddon                                            121   553.71
6  Pearl Harbor                                          162   449.22
7  Transformers                                          128   709.71
8  Transformers: Revenge of the Fallen                   211   836.30
9  Transformers: Dark of the Moon                        283  1119.13

Linear Regression: Profit vs. Number of Explosions

[Plot: Profit vs. Number of Explosions]
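
The plot and the fitted line can be reproduced from the nine films in the table. A minimal sketch: the data frame name bay and the object name fit are assumptions, while the column names and values are taken from the table.

```r
# Data typed in from the table of Michael Bay films
bay <- data.frame(
  Movie.Title = c("The Island", "Bad Boys", "The Rock", "Bad Boys 2",
                  "Armageddon", "Pearl Harbor", "Transformers",
                  "Transformers: Revenge of the Fallen",
                  "Transformers: Dark of the Moon"),
  Number.of.Explosions = c(16, 18, 22, 31, 121, 162, 128, 211, 283),
  Profit = c(162.95, 141.41, 335.06, 273.34, 553.71, 449.22,
             709.71, 836.30, 1119.13)
)

# Fit profit as a linear function of the number of explosions
fit <- lm(Profit ~ Number.of.Explosions, data = bay)

# Scatterplot with the fitted regression line
plot(bay$Number.of.Explosions, bay$Profit,
     xlab = "Number of Explosions", ylab = "Profit",
     main = "Profit vs. Number of Explosions")
abline(fit, col = "red")

coef(fit)   # intercept and slope
```

Calling summary(fit) prints the full coefficient table, including standard errors and p-values.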

Linear Regression: Interpretation of Results

                     Estimate Std. Error t value Pr(>|t|)
(Intercept)          154.3877    60.7288    2.54   0.0385
Number.of.Explosions   3.2171     0.4248    7.57   0.0001

Our analysis shows that each additional explosion in a Michael Bay movie is associated with an increase in profit of about 3.22 million dollars.

The effect is statistically significant (p = 0.0001).
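
To see what the estimates mean in practice, plug a number of explosions into the fitted line, Profit = 154.3877 + 3.2171 × explosions. The sketch below does this arithmetic. The helper name predict_profit is hypothetical, and 100 explosions is an arbitrary example input.

```r
# Coefficients copied from the regression table
intercept <- 154.3877
slope <- 3.2171

# Hypothetical helper: predicted profit for a given number of explosions
predict_profit <- function(explosions) {
  intercept + slope * explosions
}

predict_profit(100)                        # 154.3877 + 3.2171 * 100
predict_profit(101) - predict_profit(100)  # one extra explosion adds the slope
```

The films in the data have between 16 and 283 explosions, so predictions far outside that range should be treated with caution.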

Recap

  • Import and view data
  • Summary statistics
  • Visualizations (histogram, boxplot, scatterplot)
  • Linear regression

Motivating Examples - 05 ShinyApp

How Do You Learn More?


Questions?
