Michael S. Czahor

Baseball Homework 10

Introduction

In class we saw that the XML package makes it easy to import data from websites. For this homework we will write a script, based on code we used in class, that analyzes statistics for different players over multiple seasons. With the help of the ggplot2 package we will be able to create a clear visualization of the data over the course of each season.

Install and Load the XML Package

install.packages("XML")
library("XML")

Choose Favorite Team

For this homework I will use data from the Philadelphia Phillies, my favorite baseball team.

Import Team Data

PHI11<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2011.shtml")$team_batting
PHI12<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2012.shtml")$team_batting
PHI13<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2013.shtml")$team_batting

Create a Year Column for Each Data Set

PHI11$Year=2011
PHI12$Year=2012
PHI13$Year=2013

Use rbind to combine Data Sets

PHI<-rbind(PHI11,PHI12,PHI13)

A Different Method Using Functions

Wrapping the import and year-tagging steps in a function avoids repeating the same code for every season, as we did in the binding section above.

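searchseason<-function(year, team){
  # Build the season page URL from the team abbreviation and the year
  url<-sprintf("http://www.baseball-reference.com/teams/%s/%d.shtml", team, year)
  # Read the team batting table from that page
  tab<-readHTMLTable(url)$team_batting
  # Tag every row with its season
  tab$Year<-year
  return(tab)
}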

searchseason(2011, "PHI") #Outputs the 2011 Statistics for the Phillies
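To see which page the function requests without touching the network, the URL-building step can be run on its own. This is only a minimal sketch: build_url is a hypothetical helper name introduced here for illustration and is not part of the homework code.

```r
# Hypothetical helper: builds the same URL that searchseason() requests
build_url <- function(year, team) {
  # %s is filled with the team abbreviation, %d with the whole-number year
  sprintf("http://www.baseball-reference.com/teams/%s/%d.shtml", team, year)
}

build_url(2011, "PHI")
# "http://www.baseball-reference.com/teams/PHI/2011.shtml"

# sprintf() is vectorized, so several seasons can be generated at once
urls <- build_url(2011:2013, "PHI")
length(urls)
```

Checking the generated URL first can make it easier to tell a bad address apart from a page whose table layout has changed.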

1986 until 2013

The homework directions ask us to use a larger set of seasons. Using the same approach as above, the code below collects statistics for every season from 1986 through 2013.

Binding Code

team<-NULL
for (year in 1986:2013){
  team<-rbind(team, searchseason(year, team="PHI"))
}

summary(team)
head(team)   # First rows of the combined data
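Growing a data frame with rbind inside a loop works, but it copies the accumulated table on every pass. One common alternative is to collect each season in a list and bind everything once at the end. The sketch below uses fake_season, a made-up stand-in for searchseason that returns a tiny data frame, so it runs without an internet connection. The column names and values are invented for illustration.

```r
# Hypothetical stand-in for searchseason(): returns two made-up rows per call
fake_season <- function(year, team) {
  tab <- data.frame(Rk = c("1", "2"),
                    Name = c("Player A", "Player B"),
                    stringsAsFactors = FALSE)
  tab$Year <- year
  tab
}

# Build one data frame per season, then bind them all in a single call
seasons <- lapply(1986:2013, fake_season, team = "PHI")
team_demo <- do.call(rbind, seasons)

nrow(team_demo)         # 28 seasons x 2 rows = 56
range(team_demo$Year)   # 1986 2013
```

With the real function, the same pattern would presumably be lapply(1986:2013, searchseason, team = "PHI"), assuming every season's table has the same columns.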

Left Handed, Right Handed, or Both

The next part of the assignment deals with the handedness of each player, using the approach discussed in class.

Represent Handedness Using the Marker Characters

team$Handedness<-"right"
team$Handedness[grep("\\*", team[,3])]<-"left"
team$Handedness[grep("#", team[,3])]<-"both"

Verify That the Above Handedness Code Worked

The summary below checks the result. A player marked with “*” bats left-handed, a player with no marker bats right-handed, and a player marked with “#” can bat from both sides of the plate.

summary(team)
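Since Handedness is assigned as a character column at this point, summary will typically report only its length and class. A frequency table is one way to inspect the labels directly. The sketch below applies the same three assignments to a small invented data frame; demo and the player names are hypothetical.

```r
# Invented example rows: "*" marks a left-handed batter, "#" a switch hitter
demo <- data.frame(Name = c("Player A*", "Player B#", "Player C", "Player D*"),
                   stringsAsFactors = FALSE)

# Same logic as above: default to "right", then overwrite the marked rows
demo$Handedness <- "right"
demo$Handedness[grep("\\*", demo$Name)] <- "left"
demo$Handedness[grep("#", demo$Name)] <- "both"

# Count how many players fall into each category
table(demo$Handedness)
```

On the real data, table(team$Handedness) would give the same kind of count.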

Removing String

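We remove the repeated header rows, write the data to a CSV file and read it back so that the column types are inferred again, rename the third column to Name, and move Handedness and Year next to it.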

team<-subset(team, Rk!="Rk")
write.csv(team, "team2.csv", row.names=FALSE)
team<-read.csv("team2.csv")
summary(team)
names(team)[3]<-"Name"
team<-team[,c(1:3,30,29,4:28)]
summary(team)
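The write-and-read round trip appears to be what turns the scraped text columns into numbers: once the repeated header rows are gone, read.csv can infer a numeric type for each column. Here is a small sketch with invented values, using a temporary file in place of team2.csv.

```r
# Invented scraped-style table: every column is text, with a repeated header row
raw <- data.frame(Rk = c("1", "2", "Rk", "3"),
                  OPS = c(".850", ".712", "OPS", ".901"),
                  stringsAsFactors = FALSE)

# Drop the repeated header row, as in subset(team, Rk != "Rk")
clean <- subset(raw, Rk != "Rk")

# Round trip through a CSV file so read.csv can infer the column types
tmp <- tempfile(fileext = ".csv")
write.csv(clean, tmp, row.names = FALSE)
clean <- read.csv(tmp)

str(clean)   # Rk should now be integer and OPS numeric
```

Without this step OPS would remain text, and the plots in the next section would not treat it as a continuous value.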

Using ggplot2 Graphics to Display Data

library(ggplot2)
qplot(Year, OPS, data=team, na.rm=TRUE)
ggplot(team, aes(x=Year, y=OPS, color=Name)) + geom_point() + geom_line()
ggplot(team, aes(x=Year, y=OPS, group=Name)) + geom_point() + geom_line()
ggplot(team, aes(x=Year, y=OPS, color=Name, group=Name)) + geom_point() + geom_line()

Part 2

For this part I will look at the best hitters of the 2013 MLB season.

Top<-readHTMLTable("http://espn.go.com/mlb/stats/batting/_/year/2013/seasontype/2")[[1]]
head(Top)

Now that we have extracted the best hitters of the 2013 season, let's do the same for every season since 2000.

newyear<-function(year){
  url<-sprintf("http://espn.go.com/mlb/stats/batting/_/year/%d/seasontype/2", year)
  tab<-readHTMLTable(url)[[1]]
  tab$Year<-year
  return(tab)
}

Topplayers<-NULL
for (year in 2000:2013){
  Topplayers<-rbind(Topplayers,newyear(year))
}

summary(Topplayers)

write.csv(Topplayers, "team3.csv", row.names=FALSE)   # row.names=FALSE avoids an extra index column on re-read
Topplayers<-read.csv("team3.csv")
summary(Topplayers)