This is a quick post to:
All of the data used in these plots come from this page. It is a great source of standard NFL stats, as well as point spreads and totals.
I'll let the comments in the code describe most of what's going on, but first read the data into the df data frame:
## read file into data frame
df <- read.csv("http://www.repole.com/sun4cast/stats/nfl2013stats.csv", stringsAsFactors = FALSE)
# limit to columns needed
df <- df[, c("Date", "TeamName", "ScoreOff")]
# for plot create group of Denver and Other
df$group <- c("Other")
df[df$TeamName == "Denver Broncos", "group"] <- c("Denver")
head(df)
## Date TeamName ScoreOff group
## 1 09/05/2013 Baltimore Ravens 27 Other
## 2 09/05/2013 Denver Broncos 49 Denver
## 3 09/08/2013 Arizona Cardinals 24 Other
## 4 09/08/2013 Atlanta Falcons 17 Other
## 5 09/08/2013 Buffalo Bills 21 Other
## 6 09/08/2013 Carolina Panthers 7 Other
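One thing worth noting about the output above: the Date column is read in as character strings in MM/DD/YYYY form, so any plot with Date on the x axis orders it alphabetically. That matches chronological order as long as every date falls in the same calendar year, but it would break across a year boundary. Below is a minimal sketch of converting to Date class, using only the six rows printed above so it runs offline. The names `sample_df` and `GameDate` are mine and not part of the original code.

```r
# rebuild the six rows shown by head(df) so the example needs no download
sample_df <- data.frame(
  Date = c("09/05/2013", "09/05/2013", "09/08/2013",
           "09/08/2013", "09/08/2013", "09/08/2013"),
  TeamName = c("Baltimore Ravens", "Denver Broncos", "Arizona Cardinals",
               "Atlanta Falcons", "Buffalo Bills", "Carolina Panthers"),
  ScoreOff = c(27, 49, 24, 17, 21, 7),
  stringsAsFactors = FALSE
)
# parse the MM/DD/YYYY strings into Date objects (new column, original kept)
sample_df$GameDate <- as.Date(sample_df$Date, format = "%m/%d/%Y")
# Date objects sort chronologically rather than alphabetically
str(sample_df$GameDate)
range(sample_df$GameDate)
```

With a Date column, ggplot2 would use a continuous date axis instead of a discrete one, so axis labels and jitter widths would behave differently from the plots in this post.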
Using the plyr package, create a data frame, scores, showing regular-season cumulative point totals by team:
## load plyr package to sum scores for each team
library("plyr")
# sum scores by team to get regular season total
scores <- ddply(df, .(TeamName), summarize, ScoreOff = sum(ScoreOff))
# apply Denver and Other grouping
scores$group <- c("Other")
scores[scores$TeamName == "Denver Broncos", "group"] <- c("Denver")
# create a rank column (sorted ascending) for plot sorting
scores <- scores[order(scores$ScoreOff), ]
scores$scoreRank <- 1:nrow(scores)
tail(scores)
## TeamName ScoreOff group scoreRank
## 16 Kansas City Chiefs 430 Other 27
## 9 Dallas Cowboys 439 Other 28
## 24 Philadelphia Eagles 442 Other 29
## 19 New England Patriots 444 Other 30
## 6 Chicago Bears 445 Other 31
## 10 Denver Broncos 606 Denver 32
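If plyr is not installed, roughly the same per-team total can be sketched in base R with aggregate(). The toy data frame below is made up for illustration (two hypothetical teams, three games each) and is not taken from the 2013 file. The names `toy` and `totals` are also mine.

```r
# toy data: hypothetical teams and scores, for illustration only
toy <- data.frame(
  TeamName = rep(c("Team A", "Team B"), each = 3),
  ScoreOff = c(10, 20, 30, 7, 14, 21),
  stringsAsFactors = FALSE
)
# sum ScoreOff within each TeamName (base R analogue of the ddply call)
totals <- aggregate(ScoreOff ~ TeamName, data = toy, FUN = sum)
# sort ascending and add a rank column, mirroring the steps in the post
totals <- totals[order(totals$ScoreOff), ]
totals$scoreRank <- seq_len(nrow(totals))
print(totals)
```

seq_len(nrow(totals)) is used here instead of 1:nrow(totals) only because it behaves sensibly on an empty data frame. For the 32-team data the two are equivalent.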
Now use the ggplot2 package to look at a simple bar plot showing total points by team:
library("ggplot2")
## Warning: package 'ggplot2' was built under R version 2.15.2
p1 <- ggplot(scores, aes(x = factor(scoreRank), y = ScoreOff, fill = factor(group))) +
  geom_bar(stat = "identity") +
  theme(axis.title.y = element_blank(), axis.title.x = element_blank(), legend.position = "none") +
  coord_flip() +
  scale_x_discrete(labels = scores$TeamName) +
  # dashed line at the mean of the 32 team totals
  geom_hline(yintercept = mean(scores$ScoreOff), linetype = "dashed", colour = "#6666FF", size = 1) +
  # annotate() draws the label once; geom_text() here would redraw it for every row of scores
  annotate("text", x = 3, y = mean(scores$ScoreOff), label = "mean", colour = "#6666FF", angle = 270, vjust = 1) +
  ggtitle("2013 Total Points by Team")
p1
A record-breaking year for Denver. Next, here is every score of the season, with Denver highlighted. The solid line is the mean of the 512 scores in the data set (32 teams, 16 games each). The dashed lines mark one standard deviation above and below the mean:
p2 <- ggplot(df, aes(x = Date, y = ScoreOff, colour = factor(group))) +
  geom_jitter(position = position_jitter(width = 0.5, height = 0), size = 4, alpha = 7/10) +
  theme(axis.text.x = element_text(angle = 90), legend.title = element_blank(), axis.title.y = element_blank()) +
  # solid line: mean of all scores
  geom_hline(yintercept = mean(df$ScoreOff)) +
  # dashed lines: one standard deviation above and below the mean
  geom_hline(yintercept = mean(df$ScoreOff) + sd(df$ScoreOff), linetype = "dashed") +
  geom_hline(yintercept = mean(df$ScoreOff) - sd(df$ScoreOff), linetype = "dashed") +
  ggtitle("2013 Total Scores by Date")
p2
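The three horizontal lines in p2 are simply the mean and the mean plus or minus one standard deviation. As a rough sketch of that arithmetic, here are the same quantities for the six scores printed by head(df) earlier. This is a tiny sample, so the values will differ from those for the full season. The names `scores_sample`, `center`, `spread`, `band`, and `outside` are illustrative only.

```r
# the six ScoreOff values from the head(df) output
scores_sample <- c(27, 49, 24, 17, 21, 7)
center <- mean(scores_sample)
spread <- sd(scores_sample)   # sample standard deviation (n - 1 denominator)
# the band between the two dashed lines
band <- c(lower = center - spread, upper = center + spread)
# scores falling outside the one-standard-deviation band
outside <- scores_sample[scores_sample < band["lower"] | scores_sample > band["upper"]]
print(band)
print(outside)
```

Note that sd() in R uses the n - 1 denominator, which is also what the dashed lines in the plot are based on.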
Each Denver point on the plot should be clear; if not, here's the breakdown:
Lastly, here's a look at just Denver's scores, with mean and standard deviation lines:
# data frame for denver
den <- df[df$TeamName == "Denver Broncos", ]
p3 <- ggplot(den, aes(x = Date, y = ScoreOff)) +
  geom_point(size = 4, colour = "#F8766D") +
  theme(axis.text.x = element_text(angle = 90), legend.title = element_blank(),
        axis.title.y = element_blank(), axis.title.x = element_blank()) +
  # solid line: mean of Denver's scores only
  geom_hline(yintercept = mean(den$ScoreOff)) +
  ylim(c(0, 60)) +
  # dashed lines: one standard deviation above and below Denver's own mean
  geom_hline(yintercept = mean(den$ScoreOff) + sd(den$ScoreOff), linetype = "dashed") +
  geom_hline(yintercept = mean(den$ScoreOff) - sd(den$ScoreOff), linetype = "dashed") +
  ggtitle("Denver 2013 Points by Week")
p3
When comparing Denver to itself, the mean and standard deviation picture is much different. An obvious point perhaps, but a good reminder if (like me) you frequently deal with data sets where only one team, organization, or object is being measured, with no external benchmarks or peer comparisons readily available.
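To make the point concrete, here is a small sketch that standardizes the same score against two different baselines. All numbers are hypothetical and chosen for illustration, not taken from the 2013 file. The helper `z_score` and the vectors `league` and `team` are my own names.

```r
# hypothetical scores, for illustration only (not from the 2013 data)
league <- c(10, 17, 20, 24, 27, 31, 38, 45)  # all scores, including the team of interest
team   <- c(31, 38, 45)                      # one high-scoring team's games
# number of standard deviations x sits from the mean of a reference set
z_score <- function(x, reference) (x - mean(reference)) / sd(reference)
# the same 38-point game, judged against each baseline
z_league <- z_score(38, league)
z_team   <- z_score(38, team)
print(c(league = z_league, team = z_team))
```

Against the league-wide baseline the 38-point game sits above the mean, while against the team's own games it is exactly average. Without the external baseline, only the second view is available.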