Module 2 Probability

As mentioned previously, each time you open a new RStudio session, you need to run the following three commands.

require(mosaic)
require(openintro)
require(MASS)

In Module 2 we work with nominal variables and build contingency tables, which organize the counts for two nominal variables at once.

First, we’ll read in the flightNYC dataset. This is a subset of the data available in the nycflights13 package.

flightNYC <- read.csv("http://www.math.usu.edu/cfairbourn/Stat2300/RStudioFiles/data/flightNYC.csv")

Use the View() function to look at the variables in the dataset. The data record six variables for a sample of 600 flights that departed from the JFK and LaGuardia (LGA) airports in New York City during 2013. Three of the variables are nominal and three are interval.

Variable    Type      Description
origin      nominal   Originating airport (JFK or LGA)
airline     nominal   Carrier (Delta or United Airlines)
ontime      nominal   Did the flight depart on time (yes or no)
distance    interval  Distance of flight in miles
dep_delay   interval  Departure delay in minutes (actual minus scheduled departure time)
arr_delay   interval  Arrival delay in minutes (actual minus scheduled arrival time)

View(flightNYC)

We can easily create a barchart for each of the nominal variables: origin, airline, and ontime.

Watch how the code and the chart change as we add arguments to the barchart() function.

Basic barchart

barchart(flightNYC$origin)

Barchart with vertical bars

barchart(flightNYC$origin, # Specify the variable to graph
         horizontal=FALSE) # Change to vertical bars

Add a chart title

barchart(flightNYC$origin, # Specify the variable to graph
         horizontal = FALSE, # Change to vertical bars
         main = "Originating Airport") # Add chart title

Change the color of the bars

barchart(flightNYC$origin, # Specify the variable to graph
         horizontal = FALSE, # Change to vertical bars
         main = "Originating Airport", # Add chart title
         col = "skyblue4") # Change bar color

Modify the y-axis label

barchart(flightNYC$origin, # Specify the variable to graph
         horizontal = FALSE, # Change to vertical bars
         main = "Originating Airport", # Add chart title
         col = "skyblue4", # Change bar color
         ylab = "Count") # Change vertical axis label

Make a barchart for airline

barchart(flightNYC$airline, horizontal=FALSE, 
         main="Airline", col="skyblue4", ylab="Count")
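The height of each bar is the number of observations in that category. As a minimal sketch of how those counts can be tabulated directly, the base R function table() is applied below to a small made-up vector. The vector origin_toy and its values are hypothetical and are not taken from flightNYC; they only keep the example runnable on its own.

```r
# Hypothetical toy vector (not the flightNYC data), used only to show
# how category counts are tabulated.
origin_toy <- c("JFK", "LGA", "JFK", "JFK", "LGA")

# Number of observations in each category
counts <- table(origin_toy)
counts

# The same counts expressed as proportions of the total
prop.table(counts)
```

Applying table() to a nominal column such as flightNYC$airline should give the counts that the corresponding bar chart displays.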

Creating a contingency table

To create a contingency table, we use the xtabs() function. We first give a formula naming the variables we want R to tabulate, then indicate the dataset. In the code below we create a contingency table for airline and ontime and store the result as contable.

contable <- xtabs(~airline + ontime, data = flightNYC)
# Print the table to read the counts
contable
##         ontime
## airline   no yes
##   Delta  145 328
##   United  42  77
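
Since this module is about probability, a natural next step is to turn the counts into proportions. The sketch below shows one possible way to do that with the base R functions addmargins() and prop.table(); it is not part of the original handout. To keep it runnable without downloading the CSV file, the table is re-entered by hand from the output above under the assumed name tab. The same calls should work on contable.

```r
# Assumption: counts re-entered by hand from the xtabs() output above,
# so the sketch runs without the CSV file. matrix() fills column by column.
tab <- as.table(matrix(c(145, 42, 328, 77), nrow = 2,
                       dimnames = list(airline = c("Delta", "United"),
                                       ontime  = c("no", "yes"))))

# Add row and column totals to the table
addmargins(tab)

# Joint proportions: each cell divided by the grand total
joint <- prop.table(tab)
round(joint, 3)

# Marginal proportion of flights that departed on time
p_ontime <- sum(tab[, "yes"]) / sum(tab)
p_ontime

# Conditional proportions of ontime given airline (each row sums to 1)
cond <- prop.table(tab, margin = 1)
round(cond, 3)
```

Here margin = 1 divides each row by its row total, so each row of cond gives the proportions of late and on-time departures for one airline.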