Assignment 4

By Cherish Ashby

Question 1

A. In the IMDB, there are listings of full cast members for movies. Navigate to http://www.imdb.com/title/tt1201607/fullcredits?ref_=tt_ql_1. Feel free to View Source to get a good idea of what the page looks like in code.

Right-clicked the page and chose “View Page Source” to display the HTML behind the URL.

B. Scrape the page with any R package that makes things easy for you. Of particular interest is the table of the Cast in order of crediting. Please scrape this table (you might have to fish it out of several of the tables from the page) and make it a data.frame() of the Cast in your R environment

Install “SelectorGadget”, open the URL, and click the cast table so it is highlighted in green; click any unwanted elements so they turn red, then copy the resulting XPath.

Or

Right-click the page and choose “Inspect”; under “Elements”, find the code for the cast table, right-click it, and choose “Copy XPath”.
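The scraping call itself is not shown above, so here is a minimal sketch of how a table could be pulled out by XPath with rvest. It runs against a small inline HTML stand-in rather than the live IMDb page; the `cast_list` class name, the four-column layout, and the object name `page` are assumptions made for illustration.

```r
library(rvest)

# Hypothetical stand-in for the IMDb page: a tiny cast table with an assumed
# four-column layout (photo, actor, "...", character).
page <- read_html('
<table class="cast_list">
  <tr><td></td><td>Ralph Fiennes</td><td>...</td><td>Lord Voldemort</td></tr>
  <tr><td></td><td>Michael Gambon</td><td>...</td><td>Professor Albus Dumbledore</td></tr>
</table>')

# html_table() on a set of nodes returns a list with one table per match,
# so the first (and here only) match is taken explicitly.
fullcredits <- page %>%
  html_nodes(xpath = '//table[@class="cast_list"]') %>%
  html_table()
fullcredits <- fullcredits[[1]]
fullcredits
```

For the real page, `read_html()` would be given the URL, and the XPath copied from SelectorGadget or the browser inspector would replace the one assumed here.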

C. Clean up Table in RStudio

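library(data.table)
HPCast <- data.table(fullcredits)
HPCast
# Drop columns 1 and 3; the comma matters, since data.table reads a single index as rows
HPCast <- HPCast[, -c(1, 3)]
HPCast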

Converted to a data.frame and renamed the columns:

HPCastUpdate = data.frame(HPCast)
names(HPCastUpdate) = c("Actor", "Character")
HPCastUpdate

Removed the blank first row and the “Rest of cast listed alphabetically” divider row (row 93):

HPCastUpdate <- HPCastUpdate[-c(1, 93),]
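Hard-coded row numbers such as 1 and 93 only hold for as long as the page keeps the same length. As a hedged alternative, the same two kinds of rows can be dropped by their content. The sketch below uses a hypothetical four-row miniature of the table; `cast`, `cast_clean`, and the exact divider text are illustrative assumptions.

```r
# Hypothetical miniature of the scraped cast table, including a blank row
# and a divider row like the ones described above.
cast <- data.frame(
  Actor = c("", "Ralph Fiennes", "Rest of cast listed alphabetically:", "Michael Gambon"),
  Character = c("", "Lord Voldemort", "", "Professor Albus Dumbledore"),
  stringsAsFactors = FALSE
)

# Flag rows with no actor name and rows that hold the divider text.
is_blank   <- is.na(cast$Actor) | trimws(cast$Actor) == ""
is_divider <- grepl("^Rest of cast", cast$Actor)

# Keep everything else and renumber the rows.
cast_clean <- cast[!(is_blank | is_divider), ]
rownames(cast_clean) <- NULL
cast_clean
```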

D. Split the Actor’s name into two columns

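library(tidyr); library(dplyr)  # separate() and %>%
HPCastUpdate <- HPCastUpdate %>%
  separate(Actor, c("FirstName", "Surname"),
           sep = " ", extra = "merge")  # split at the first space; remaining name parts stay in Surname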
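Names with more than two parts need a decision about where the middle part goes. One possible convention, sketched here in base R, keeps everything before the last space in FirstName and only the final word in Surname. The vector `actors` and the helper names are illustrative, and single-word names are given an `NA` surname by assumption.

```r
# Illustrative actor names, one of them with three parts.
actors <- c("Ralph Fiennes", "Helena Bonham Carter", "Michael Gambon")

# Split at the LAST space: everything before it is FirstName, the rest Surname.
has_space <- grepl(" ", actors)
FirstName <- ifelse(has_space, sub(" [^ ]+$", "", actors), actors)
Surname   <- ifelse(has_space, sub("^.* ", "", actors), NA_character_)

names_split <- data.frame(FirstName, Surname, stringsAsFactors = FALSE)
names_split
```

Here "Helena Bonham Carter" ends up as FirstName "Helena Bonham" and Surname "Carter"; splitting at the first space instead would put "Bonham Carter" in Surname.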

E. First 10 rows of data.frame

MyHPCastFirst10 <- head(HPCastUpdate, 10)  # first 10 rows of the cleaned cast table
MyHPCastFirst10

Question 2

A. On the ESPN website, there are statistics of each NBA player. Navigate to the San Antonio Spurs current statistics (likely http://www.espn.com/nba/team/stats/_/name/sa/san-antonio-spurs). You are interested in the Shooting Statistics table.

Right-clicked the area of interest and selected “Inspect”; under “Elements”, selected the code that corresponds to the “Shooting Statistics” table.

B. Scrape the page with any R package that makes things easy for you. There are a few tables on the page, so make sure you are targeting specifically the Shooting Statistics table.

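library(rvest)
URL <- "http://www.espn.com/nba/team/stats/_/name/sa/san-antonio-spurs"
SpursPage <- read_html(URL)
# Select only the Shooting Statistics table, then parse it
Spurs <- SpursPage %>%
  html_nodes(xpath = '//*[@id="my-players-table"]/div[3]/div[3]/table') %>%
  html_table()
# html_table() returns a list of matches; check.names = FALSE keeps names such as FG% intact
SpursDataFrame <- data.frame(Spurs[[1]], check.names = FALSE)
SpursDataFrame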

C. Clean up the table (You might get some warnings if you’re working with tibbles)

Deleted rows 1 and 17 from the data.frame:

SpursDataFrame <- SpursDataFrame[-c(1,17),]

New column for Position, initialized from the Player column:

SpursGoalPercent$Position <- SpursGoalPercent$Player
SpursGoalPercent$Position
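Copying the Player column is only a first step if the position is embedded in the player string. As a sketch, and assuming a hypothetical "Name, POS" format (the real layout of the scraped column may differ), the two parts could be separated in base R like this. The vector `players` and the data frame `spurs` are illustrative names.

```r
# Illustrative player strings in an ASSUMED "Name, POS" format.
players <- c("LaMarcus Aldridge, PF", "Patty Mills, PG")

# Split each string at the comma (plus any following whitespace).
parts <- strsplit(players, ",\\s*")

spurs <- data.frame(
  Player   = vapply(parts, function(p) p[1], character(1)),
  Position = vapply(parts, function(p) p[2], character(1)),
  stringsAsFactors = FALSE
)
spurs
```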

D. Create a colorful bar chart that shows the Field Goals Percentage Per Game for each person. It will be graded on the following criteria.

x <- SpursChart$PLAYER
y <- SpursChart$`FG%`
SpursGoalPercent <- data.frame(x, y)  # build the two-column frame from x and y
names(SpursGoalPercent) <- c("Player", "FieldGoal%")

Final Chart “Spurs Field Goal Per Player”

barplot(SpursGoalPercent$`FieldGoal%`,
        main = "Spurs Field Goal%", ylab = "FG%", xlab = "Players",
        names.arg = SpursGoalPercent$Player, col = rainbow(nrow(SpursGoalPercent)))
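`barplot()` needs numeric heights, and a scraped percentage column can arrive as character. The sketch below shows the conversion step and vertical axis labels on a hypothetical three-row table; the player labels and values in `chart_data` are placeholders, not real statistics.

```r
# Placeholder data: the values are illustrative only, stored as character
# the way a scraped column might be.
chart_data <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  FGpct  = c(".512", ".447", ".389"),
  stringsAsFactors = FALSE
)

# Convert to numeric before plotting.
chart_data$FGpct <- as.numeric(chart_data$FGpct)

# las = 2 turns the player labels vertical so long names stay readable;
# barplot() returns the bar midpoints, captured here as mids.
mids <- barplot(chart_data$FGpct,
                names.arg = chart_data$Player,
                main = "Spurs Field Goal%", ylab = "FG%",
                col = rainbow(nrow(chart_data)), las = 2)
```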