Michael S. Czahor
Introduction
In class we saw that the XML package is well suited to importing data from websites. For this homework we will write a script, based on code we used in class, that analyzes statistics for different players over multiple seasons. With the help of the ggplot2 package we will be able to create a visualization of the data over the course of each season.
Install Proper Package and Library
install.packages("XML")
library("XML")
Choose Favorite Team
For this homework I will use data from the Philadelphia Phillies, my favorite baseball team.
Import Team Data
PHI11<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2011.shtml")$team_batting
PHI12<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2012.shtml")$team_batting
PHI13<- readHTMLTable("http://www.baseball-reference.com/teams/PHI/2013.shtml")$team_batting
Create a Year for Each set of Data
PHI11$Year=2011
PHI12$Year=2012
PHI13$Year=2013
Use rbind to combine Data Sets
PHI<-rbind(PHI11,PHI12,PHI13)
A Different Method Using Functions
Wrapping the import in a function avoids repeating the same code for every season, which makes it a cleaner way to do what we just did in the binding section above.
searchseason <- function(year, team) {
  url <- sprintf("http://www.baseball-reference.com/teams/%s/%d.shtml", team, year)
  tab <- readHTMLTable(url)$team_batting
  tab$Year <- year
  return(tab)
}
searchseason(2011, "PHI") #Outputs the 2011 Statistics for the Phillies
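The URL pattern inside searchseason can be checked on its own, without downloading anything. The sketch below uses a hypothetical helper, season_url, that is not part of the homework code; it only illustrates how sprintf fills in the team abbreviation (%s) and the year (%d).

```r
# Hypothetical helper (not in the original script): builds the season URL
# without contacting the website.
season_url <- function(year, team) {
  # %s receives the team abbreviation, %d receives the year as an integer
  sprintf("http://www.baseball-reference.com/teams/%s/%d.shtml",
          team, as.integer(year))
}

# sprintf is vectorized, so several seasons can be built in one call
urls <- season_url(2011:2013, "PHI")
print(urls)
```

Printing the URLs first is a quick way to catch a mistyped team code or year before any table is imported.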
1986 until 2013
The homework directions ask for a larger set of seasons. Using the same approach as above, the code below collects statistics for every season from 1986 through 2013.
Binding Code
team <- NULL
for (year in 1986:2013) {
  team <- rbind(team, searchseason(year, team="PHI"))
}
summary(team)
head(team) # First rows of the combined data
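The loop above grows the data frame one season at a time. A possible alternative, sketched here under the assumption that each season comes back with the same columns, is to collect the seasons in a list and bind them once. The function fake_season is a made-up stand-in for searchseason that returns placeholder rows, so the example runs offline.

```r
# Made-up stand-in for searchseason(): returns two placeholder rows per
# season instead of downloading a table.
fake_season <- function(year, team) {
  data.frame(Team = team, Player = c("A", "B"), Year = year,
             stringsAsFactors = FALSE)
}

years <- 1986:1988

# Build one data frame per season, then bind them all in a single call
seasons <- lapply(years, fake_season, team = "PHI")
combined <- do.call(rbind, seasons)

# Counting rows per year shows whether every season made it into the result
print(table(combined$Year))
```

Counting rows per year with table is a more direct check than head, which only displays the first few rows of the earliest season.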
Left Handed, Right Handed, or Both
The next part of the assignment deals with the handedness of each player, using the approach discussed in class.
Represent Handedness through character Operators
team$Handedness<-"right"
team$Handedness[grep("\\*", team[,3])]<-"left"
team$Handedness[grep("#", team[,3])]<-"both"
Verify That the Above Handedness Code Worked
The summary statement below is used to check the result. A player marked with “*” is left-handed, a player marked with “#” bats from both sides of the plate, and a player with no marker is right-handed.
summary(team)
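The marker logic can also be tried on a tiny example. The sketch below uses placeholder names rather than real data, and it adds one step that the code above does not perform: stripping the marker from the name afterwards, which is shown only as an optional assumption.

```r
# Placeholder names following the markers described above:
# "*" = left-handed, "#" = bats from both sides, no marker = right-handed
players <- data.frame(Name = c("Player A*", "Player B#", "Player C"),
                      stringsAsFactors = FALSE)

# Default to right-handed, then overwrite where a marker is found
players$Handedness <- "right"
players$Handedness[grepl("*", players$Name, fixed = TRUE)] <- "left"
players$Handedness[grepl("#", players$Name, fixed = TRUE)] <- "both"

# Optional extra step (an assumption, not in the script above):
# remove the marker so the name itself is clean
players$Name <- gsub("[*#]", "", players$Name)

# table() gives a count per category
print(table(players$Handedness))
```

Here fixed = TRUE treats "*" as a literal character, so it does not need to be escaped as it is in the grep calls above.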
Removing String
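We will now drop the repeated header rows (those where Rk equals "Rk"), write the data to a CSV file and read it back so that read.csv can re-infer the column types, rename the third column to Name, and reorder the columns so that Handedness and Year come directly after Name.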
team<-subset(team, Rk!="Rk")
write.csv(team, "team2.csv", row.names=FALSE)
team<-read.csv("team2.csv")
summary(team)
names(team)[3]<-"Name"
team<-team[,c(1:3,30,29,4:28)]
summary(team)
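The cleaning steps above can be illustrated on a small made-up table. The sketch below drops a repeated header row in the same way and then converts the column types in memory with type.convert, which is offered here as a possible alternative to the CSV round trip rather than as the method the homework requires. The values are invented for the example.

```r
# Invented table: every column is text and one row repeats the header
raw <- data.frame(Rk  = c("1", "2", "Rk", "3"),
                  OPS = c(".850", ".700", "OPS", ".910"),
                  stringsAsFactors = FALSE)

# Drop the repeated header row, as in the script above
clean <- subset(raw, Rk != "Rk")

# Re-infer each column's type in memory instead of writing and reading a file
clean[] <- lapply(clean, type.convert, as.is = TRUE)

str(clean)
```

After the conversion, Rk is stored as an integer and OPS as a number, so both can be summarized and plotted as numeric values.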
Using ggplot2 Graphics to Display Data
library(ggplot2)
qplot(Year, OPS, data=team, na.rm=TRUE)
ggplot(team, aes(x=Year, y=OPS, color=Name)) + geom_point() + geom_line()
ggplot(team, aes(x=Year, y=OPS, group=Name)) + geom_point() + geom_line()
ggplot(team, aes(x=Year, y=OPS, color=Name, group=Name)) + geom_point() + geom_line()
Part 2
For this part I will be looking at the best hitters of the 2013 MLB season.
Top<-readHTMLTable("http://espn.go.com/mlb/stats/batting/_/year/2013/seasontype/2")[[1]]
head(Top)
Now that we have extracted the best hitters of the 2013 season, let's do the same for all seasons since 2000.
newyear <- function(year) {
  url <- sprintf("http://espn.go.com/mlb/stats/batting/_/year/%d/seasontype/2", year)
  tab <- readHTMLTable(url)[[1]]
  tab$Year <- year
  return(tab)
}
Topplayers <- NULL
for (year in 2000:2013) {
  Topplayers <- rbind(Topplayers, newyear(year))
}
summary(Topplayers)
write.csv(Topplayers, "team3.csv", row.names=FALSE) # row.names=FALSE avoids an extra index column on re-import
Topplayers<-read.csv("team3.csv")
summary(Topplayers)