R Programming 101

UTSA Department of Management Science & Statistics
March 5, 2016

What is R?

R is a free programming language and software environment for statistical computing and graphics
Developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand in 1993.

Why Learn R?

One of the most widely used statistical software programs
FREE!
Many online resources available
Comprehensive R library and powerful visualization tools
All-in-one functionality (presentations, reports, interactive web applications, etc.)

Visualizations in R: Plots


Visualizations in R: Volcano


Visualizations in R: Word Clouds


Visualizations in R: Maps


Simulation: Coin Flipping

  #Step 1: Flip coin ten times
  num=sample(0:1,10,replace=TRUE)
  num
  #Step 2: Total number of heads
  total=sum(num)
  total
  #Step 3: Compute proportion
  prop=mean(num)
  prop
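
Because sample() draws at random, the three steps above give different flips on every run. A minimal sketch of how to make a run repeatable, assuming you want the same flips each time (for a handout, say). The seed value 101 and the names flips_a and flips_b are arbitrary choices for illustration, not part of the original exercise.

```r
# Setting a seed makes the "random" flips repeatable.
# The seed value 101 is arbitrary (an assumption for this sketch).
set.seed(101)
flips_a = sample(0:1, 10, replace = TRUE)

# Resetting the same seed replays the same ten flips
set.seed(101)
flips_b = sample(0:1, 10, replace = TRUE)

# TRUE: same seed, same flips
identical(flips_a, flips_b)

# Proportion of heads (1 = heads, 0 = tails)
mean(flips_a)
```

Leave the seed out when every student should get their own result.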

Normal Distribution


Simulation: Coin Flipping (n=25)

#Step 0: The number of students
n=25
#Start with an empty vector to hold each student's proportion
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
#Average proportion of heads across all students
mean(props)

[1] 0.464
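
The loop above can also be sketched more compactly with replicate(), which repeats an expression and collects the results. This is an alternative under stated assumptions, not the method used in the session: flip_props() is a hypothetical helper name, and its arguments n_students and n_flips are introduced here for illustration.

```r
# flip_props() is a hypothetical helper, not part of the original slides.
flip_props = function(n_students, n_flips = 10) {
  # Each student flips n_flips times; keep each student's proportion of heads
  replicate(n_students, mean(sample(0:1, n_flips, replace = TRUE)))
}

props = flip_props(25)
length(props)  # one proportion per student
mean(props)    # changes from run to run; should be near 0.5
```

Calling flip_props(100) or flip_props(2000) repeats the experiment with more students without copying the loop.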

Simulation: Coin Flipping (n=25)

hist(props)

Does your histogram look normal?

Simulation: Coin Flipping (n=100)

#Step 0: The number of students
n=100
#Start with an empty vector to hold each student's proportion
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
#Average proportion of heads across all students
mean(props)

[1] 0.497

Simulation: Coin Flipping (n=100)

hist(props)

Does your histogram look normal?

Simulation: Coin Flipping (n=2000)

#Step 0: The number of students
n=2000
#Start with an empty vector to hold each student's proportion
props=c()
for (i in 1:n){
  #Step 1: Each student flips coin ten times
  num=sample(0:1,10,replace=TRUE)
  #Step 2: Total number of heads per student
  total=sum(num)
  #Step 3: Compute proportion
  prop=mean(num)
  #Step 4: Combine all students' results
  props=append(props,prop)
}
#Average proportion of heads across all students
mean(props)

[1] 0.5016

Simulation: Coin Flipping (n=2000)

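#Center each histogram bar on a possible proportion (0, 0.1, ..., 1)
hist(props,prob=TRUE,breaks=seq(-0.05,1.05,by=0.1))
#Overlay a normal curve with the simulated mean and standard deviation
curve(dnorm(x,mean=mean(props),sd=sd(props)),col="blue",add=TRUE)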

What about this one?
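
One way to judge the fit is to compare the simulated mean and standard deviation with what theory predicts for a fair coin. For a proportion based on 10 flips with probability of heads p = 0.5, the mean is p and the standard deviation is sqrt(p(1-p)/10). The sketch below assumes a fair coin; the names p, flips, theory_mean, and theory_sd are introduced here for illustration.

```r
p = 0.5        # probability of heads for a fair coin (assumption)
flips = 10     # flips per student

# Theoretical mean and standard deviation of the proportion of heads
theory_mean = p
theory_sd = sqrt(p * (1 - p) / flips)
theory_sd      # about 0.158

# Simulate 2000 students and compare with the theory
props = replicate(2000, mean(sample(0:1, flips, replace = TRUE)))
mean(props)    # should be close to theory_mean
sd(props)      # should be close to theory_sd
```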

Simulation: Coin Flipping (n=2000)

[Figure: histogram of props with the normal curve overlaid]

Data Analysis About You!

We collected your data in the morning session. Let's explore it!

Your Turn: What Will You Do?

Explore the collected data
Compute the summary statistics
Create simple plots
Do simple linear regression

Data Dictionary

gender : Male or Female
shoe : Shoe size
height : Height (in centimeters)

Reading in the Data

data=read.csv("http://pastebin.com/raw/MFYcN8Rk")

Exploring Data: View the Data

fix(data)

Typing the object name “data” prints the data in the R console.

The “fix” function opens the data in a spreadsheet-like editor, similar to Excel. Changes made in the editor are saved back to “data”.
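
For a quick look without opening an editor, a few base R functions print a compact summary to the console. The sketch below uses a small hypothetical data frame with the same three columns as the data dictionary, so it runs without the class file; the values are made up for illustration.

```r
# Hypothetical stand-in for the class data (values are made up)
data = data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  shoe   = c(27, 24, 23.5, 28, 25),
  height = c(178, 165, 160, 183, 170)
)

head(data, 3)   # first three rows
str(data)       # column names and types
dim(data)       # number of rows and columns
```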

Exploring Data: Summary Statistics

summary(data)

Exploring Data: Distribution of Shoe Size (Histogram)

hist(data$shoe)

Exploring Data: Distribution of Shoe Size (Histogram)

hist(data$shoe,col="red",xlab="Shoe Size (in cm)",
     main="Histogram of Shoe Size")

Exploring Data: Distribution of Height (Histogram)

hist(data$height)

Exploring Data: Distribution of Height (Histogram)

hist(data$height,col="green",xlab="Height (in cm)",
     main="Histogram of Height")

Analysis: Who Is Taller? (Male or Female)

boxplot(data$height~data$gender,main="Height by Gender",
        col=rainbow(2),xlab="Gender",ylab="Height (in cm)")

Analysis: Who Has Bigger Feet? (Male or Female)

boxplot(data$shoe~data$gender,main="Shoe Size by Gender",
        col=rainbow(2),xlab="Gender",ylab="Shoe Size")
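
The boxplots compare the two groups visually; the same comparison can be sketched numerically with tapply() and aggregate(). The data frame below is a hypothetical stand-in with made-up values, and the names means and med are introduced here for illustration.

```r
# Hypothetical stand-in for the class data (values are made up)
data = data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  shoe   = c(27, 24, 23.5, 28, 25),
  height = c(178, 165, 160, 183, 170)
)

# Mean height for each gender
means = tapply(data$height, data$gender, mean)
means

# Median height and shoe size for each gender
# (the median is the thick line inside each box of a boxplot)
med = aggregate(cbind(height, shoe) ~ gender, data = data, FUN = median)
med
```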

Analysis: Are Shoe Size and Height Correlated?

#Fit the regression model
#(store it as "fit" so the lm function name is not reused)
fit=lm(height~shoe,data=data)

#Plot the relationship
plot(x=data$shoe,y=data$height,xlab="Shoe Size",
     ylab="Height",main="Height vs. Shoe Size")

#Add the regression line
abline(fit,col="red")
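
The plot shows the relationship; the correlation coefficient puts a number on it. A minimal sketch, using a hypothetical data frame with made-up values in place of the class data. The names r and fit are chosen here for illustration.

```r
# Hypothetical stand-in for the class data (values are made up)
data = data.frame(
  shoe   = c(27, 24, 23.5, 28, 25),
  height = c(178, 165, 160, 183, 170)
)

# Correlation coefficient: always between -1 and 1
r = cor(data$shoe, data$height)
r

# Fit the line and pull out the intercept and slope
fit = lm(height ~ shoe, data = data)
coef(fit)

# For simple linear regression, R-squared equals the squared correlation
summary(fit)$r.squared
r^2
```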

Example: Michael Bay Films


Example: Michael Bay Films


Question: Is the number of explosions in a movie correlated with its profit?

Example: Michael Bay Films

	Movie.Title	Number.of.Explosions	Profit
1	The Island	16	162.95
2	Bad Boys	18	141.41
3	The Rock	22	335.06
4	Bad Boys 2	31	273.34
5	Armageddon	121	553.71
6	Pearl Harbor	162	449.22
7	Transformers	128	709.71
8	Transformers: Revenge of the Fallen	211	836.30
9	Transformers: Dark of the Moon	283	1119.13

Linear Regression: Profit vs. Number of Explosions

[Figure: scatterplot of Profit vs. Number of Explosions with the fitted regression line]
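
The plot and fit can be sketched from the nine films in the table above. The column names and values are copied from that table; the data frame name bay and the model name fit are assumptions for illustration, and the axis label assumes profit is recorded in millions of dollars.

```r
# Data copied from the Michael Bay films table
bay = data.frame(
  Movie.Title = c("The Island", "Bad Boys", "The Rock", "Bad Boys 2",
                  "Armageddon", "Pearl Harbor", "Transformers",
                  "Transformers: Revenge of the Fallen",
                  "Transformers: Dark of the Moon"),
  Number.of.Explosions = c(16, 18, 22, 31, 121, 162, 128, 211, 283),
  Profit = c(162.95, 141.41, 335.06, 273.34, 553.71,
             449.22, 709.71, 836.30, 1119.13)
)

# Fit the regression model
fit = lm(Profit ~ Number.of.Explosions, data = bay)

# Plot the relationship and add the regression line
plot(bay$Number.of.Explosions, bay$Profit,
     xlab = "Number of Explosions", ylab = "Profit (millions of dollars)",
     main = "Profit vs. Number of Explosions")
abline(fit, col = "red")

# Intercept and slope of the fitted line
coef(fit)
```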

Linear Regression: Interpretation of Results

	Estimate	Std. Error	t value	Pr(>|t|)
(Intercept)	154.3877	60.7288	2.54	0.0385
Number.of.Explosions	3.2171	0.4248	7.57	0.0001

Our analysis estimates that the profit of a Michael Bay movie increases by about 3.22 million dollars for each added explosion.

The effect is statistically significant (p = 0.0001).
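
As a worked example of reading the table, the sketch below plugs the reported estimates into the fitted line and recomputes the t value for the slope. The film with 100 explosions is hypothetical, and the variable names are chosen here for illustration.

```r
# Estimates copied from the regression table
intercept = 154.3877
slope = 3.2171
slope_se = 0.4248

# Predicted profit (millions of dollars) for a hypothetical film
# with 100 explosions: intercept + slope * explosions
explosions = 100
predicted_profit = intercept + slope * explosions
predicted_profit   # about 476.1

# The t value is the estimate divided by its standard error
t_value = slope / slope_se
t_value            # about 7.57, matching the table
```

With only nine films, a prediction like this is an illustration of how the line works rather than a forecast.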

Recap

Import and view data
Summary statistics
Visualizations (histogram, boxplot, scatterplot)
Linear regression

Motivating Examples - 05 ShinyApp

Shiny Application Example

How Do You Learn More?


Questions?
