Denver 2013 Offense, Standard Deviations, R Markdown

This is a quick post on Denver's 2013 offense, standard deviations, and R Markdown.

All of the data used in these plots come from this page, a great source of standard NFL stats as well as point spreads and totals.

I'll let the comments in the code describe most of what's going on, but first read the data into the df data frame:

# read file into data frame
df <- read.csv("http://www.repole.com/sun4cast/stats/nfl2013stats.csv", stringsAsFactors = FALSE)

# limit to columns needed
df <- df[, c("Date", "TeamName", "ScoreOff")]

# for plot create group of Denver and Other
df$group <- c("Other")
df[df$TeamName == "Denver Broncos", "group"] <- c("Denver")
head(df)
##         Date          TeamName ScoreOff  group
## 1 09/05/2013  Baltimore Ravens       27  Other
## 2 09/05/2013    Denver Broncos       49 Denver
## 3 09/08/2013 Arizona Cardinals       24  Other
## 4 09/08/2013   Atlanta Falcons       17  Other
## 5 09/08/2013     Buffalo Bills       21  Other
## 6 09/08/2013 Carolina Panthers        7  Other

Using the plyr package, create a data frame, scores, showing regular-season point totals by team:

# load plyr package to sum scores for each team
library("plyr")

# sum scores by team to get regular season total
scores <- ddply(df, .(TeamName), summarize, ScoreOff = sum(ScoreOff))

# apply Denver and Other grouping
scores$group <- c("Other")
scores[scores$TeamName == "Denver Broncos", "group"] <- c("Denver")

# create a rank column (sorted ascending) for plot sorting
scores <- scores[order(scores$ScoreOff), ]
scores$scoreRank <- 1:nrow(scores)
tail(scores)
##                TeamName ScoreOff  group scoreRank
## 16   Kansas City Chiefs      430  Other        27
## 9        Dallas Cowboys      439  Other        28
## 24  Philadelphia Eagles      442  Other        29
## 19 New England Patriots      444  Other        30
## 6         Chicago Bears      445  Other        31
## 10       Denver Broncos      606 Denver        32

Now use the ggplot2 package to look at a simple bar plot showing total points by team:

library("ggplot2")
## Warning: package 'ggplot2' was built under R version 2.15.2
p1 <- ggplot(scores, aes(x = factor(scoreRank), y = ScoreOff, fill = factor(group))) +
    geom_bar(stat = "identity") +
    # dashed line at the mean of the team totals, labeled "mean"
    geom_hline(yintercept = mean(scores$ScoreOff), linetype = "dashed",
        colour = "#6666FF", size = 1) +
    geom_text(x = 3, y = mean(scores$ScoreOff), label = "mean",
        colour = "#6666FF", angle = 270, vjust = 1) +
    # flip the axes and label each bar with its team name
    coord_flip() +
    scale_x_discrete(labels = scores$TeamName) +
    theme(axis.title.y = element_blank(), axis.title.x = element_blank(),
        legend.position = "none") +
    ggtitle("2013 Total Points by Team")
p1

[Plot: 2013 Total Points by Team]

A record-breaking year from Denver. Next, here is every score of the season, with Denver highlighted. The solid line is the mean of the 512 scores in the data set; the dashed lines are one standard deviation above and below the mean:

p2 <- ggplot(df, aes(x = Date, y = ScoreOff, colour = factor(group))) +
    geom_jitter(position = position_jitter(width = 0.5, height = 0),
        size = 4, alpha = 7/10) +
    # solid line: mean of all scores; dashed lines: one standard deviation either side
    geom_hline(yintercept = mean(df$ScoreOff)) +
    geom_hline(yintercept = mean(df$ScoreOff) + sd(df$ScoreOff), linetype = "dashed") +
    geom_hline(yintercept = mean(df$ScoreOff) - sd(df$ScoreOff), linetype = "dashed") +
    theme(axis.text.x = element_text(angle = 90), legend.title = element_blank(),
        axis.title.y = element_blank()) +
    ggtitle("2013 Total Scores by Date")
p2

[Plot: 2013 Total Scores by Date]
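
To make the three horizontal lines concrete, here is a small sketch of how their heights are computed and how to list the scores that fall outside the dashed band. It uses only the six scores shown in the `head(df)` output earlier, so the numbers are not the full-season values; `scores_demo`, `band`, and `outside` are hypothetical names.

```r
# The six scores from the head(df) output (a tiny subset, not the whole season)
scores_demo <- c(Baltimore = 27, Denver = 49, Arizona = 24,
                 Atlanta = 17, Buffalo = 21, Carolina = 7)

m <- mean(scores_demo)
s <- sd(scores_demo)  # sample standard deviation (divides by n - 1)

# Heights of the solid line and the two dashed lines
band <- c(lower = m - s, mean = m, upper = m + s)

# Names of the scores that fall outside the one-standard-deviation band
outside <- names(scores_demo)[scores_demo < band["lower"] |
                              scores_demo > band["upper"]]
band
outside
```

With the real data, replacing `scores_demo` with `df$ScoreOff` should give the line positions used in the plot.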

Each Denver point on the plot should be clear; if not, here's the breakdown:

Lastly, here's a look at just Denver's scores, with mean and standard deviation lines:

# data frame for denver
den <- df[df$TeamName == "Denver Broncos", ]

p3 <- ggplot(den, aes(x = Date, y = ScoreOff)) +
    geom_point(size = 4, colour = "#F8766D") +
    # solid line: Denver's mean; dashed lines: one standard deviation either side
    geom_hline(yintercept = mean(den$ScoreOff)) +
    geom_hline(yintercept = mean(den$ScoreOff) + sd(den$ScoreOff), linetype = "dashed") +
    geom_hline(yintercept = mean(den$ScoreOff) - sd(den$ScoreOff), linetype = "dashed") +
    ylim(c(0, 60)) +
    theme(axis.text.x = element_text(angle = 90), legend.title = element_blank(),
        axis.title.y = element_blank(), axis.title.x = element_blank()) +
    ggtitle("Denver 2013 Points by Week")
p3

[Plot: Denver 2013 Points by Week]

When comparing Denver to itself, the mean and standard deviation picture is much different. An obvious point perhaps, but a good reminder if (like me) you frequently deal with data sets where only one team, organization, or object is being measured, with no external benchmarks or peer comparisons readily available.
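
Here is a rough sketch of that baseline effect. The `z_score()` helper is hypothetical, and both score vectors are invented for illustration (they are not taken from the data set); the point is only that the same game looks very different depending on what it is measured against.

```r
# Hypothetical helper: how many standard deviations is x from the baseline mean?
z_score <- function(x, baseline) {
  (x - mean(baseline)) / sd(baseline)
}

# Invented scores for illustration only; not taken from the data set
league_scores <- c(10, 17, 20, 23, 24, 27, 31, 38)
team_scores   <- c(31, 35, 38, 41, 45)

# The same 38-point game, judged against two different baselines
z_vs_league <- z_score(38, league_scores)
z_vs_team   <- z_score(38, team_scores)
c(league = z_vs_league, team = z_vs_team)
```

Against the made-up league baseline the 38-point game sits well above the mean, while against the team's own scores it is exactly average.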