First we load the appropriate packages as follows and then download the csv file from github. Then, we replace all of the empty spaces with NA and remove all of the rows that contain the column names as well as the bottom row which contains the total for each team.
library(RCurl)
## Loading required package: bitops
library(tidyr)
## Warning: package 'tidyr' was built under R version 3.2.3
##
## Attaching package: 'tidyr'
##
## The following object is masked from 'package:RCurl':
##
## complete
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.2.2
##
## Attaching package: 'dplyr'
##
## The following objects are masked from 'package:stats':
##
## filter, lag
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
url = "https://raw.githubusercontent.com/cyadusha/NBA/master/leagues_NBA_wins_active.csv"
x = getURL(url)
NBA = read.csv(file = textConnection(x), header = TRUE, na.strings = c("", "NA"))
NBA = subset(NBA, Rk != "Rk")
NBA = subset(NBA, Season != "Total")
Because we do not need those two columns, we remove the columns containing the Rank and the League Names.
NBA = subset(NBA, select = -Rk)
NBA = subset(NBA, select = -Lg)
Then we replace the initials for each team with the appropriate team name. To facilitate this, we get the number of columns. The first column is titled Season. This means that we do not need to replace that column title with anything. Therefore, we come up with new column names for columns 2 through 31. We will use column names that do not have any spaces.
ncol(NBA)
## [1] 31
colnames(NBA)[2:31] = c("Hawks", "Celtics", "Nets", "Bulls", "Hornets", "Cavaliers", "Mavericks", "Nuggets", "Pistons", "Warriors", "Rockets", "Pacers", "Clippers", "Lakers", "Grizzlies", "Heat", "Bucks", "Timberwolves", "Pelicans", "Knicks", "Thunder", "Magic", "76ers", "Suns", "Blazers", "Kings", "Spurs", "Raptors", "Jazz", "Wizards")
Now, we gather all of the team names into one column and title our new dataset as follows. We replace all of the null values with 0. For the last column to be read as a vector of integers, we use the as.integer command as follows.
NBA = NBA %>% gather(Season, Team, Hawks:Wizards, na.rm = F)
## Warning: attributes are not identical across measure variables; they will
## be dropped
colnames(NBA) = c("Season", "Team", "Winnings")
NBA[is.na(NBA)] = 0
NBA$Winnings = c(as.integer(NBA$Winnings))
Now, we sum the number of winnings for each team and group them as follows.
totalNBA = NBA %>% group_by(Team) %>% summarise(Winnings = sum(Winnings))
In order for all of the appropriate information to be displayed, me convert the data set into a data frame as follows.
data.frame(totalNBA)
## Team Winnings
## 1 76ers 2724
## 2 Blazers 1980
## 3 Bucks 1975
## 4 Bulls 2103
## 5 Cavaliers 1713
## 6 Celtics 3210
## 7 Clippers 1456
## 8 Grizzlies 688
## 9 Hawks 2635
## 10 Heat 1156
## 11 Hornets 900
## 12 Jazz 1803
## 13 Kings 2455
## 14 Knicks 2694
## 15 Lakers 3230
## 16 Magic 1054
## 17 Mavericks 1473
## 18 Nets 1356
## 19 Nuggets 1554
## 20 Pacers 1593
## 21 Pelicans 521
## 22 Pistons 2603
## 23 Raptors 718
## 24 Rockets 2041
## 25 Spurs 1990
## 26 Suns 2114
## 27 Thunder 2126
## 28 Timberwolves 837
## 29 Warriors 2571
## 30 Wizards 1993
In order to obtain the maximum total number of winnings in the franchise history, we have to use the which.max command. We also want the name of the team who has had the maximum total number of winnings. For that reason we use square brackets as well as a comma so that we can get the entire row of data.
totalNBA[which.max(totalNBA$Winnings),]
## Source: local data frame [1 x 2]
##
## Team Winnings
## (chr) (int)
## 1 Lakers 3230
The team with the maximum total number of winnings in the franchise history is the Lakers. They have won 3,320 games thus far.
Now if we want the maximum wins by a team over the time span, we use the following command. This will take the number of wins for each team during each season and select the maximum number of wins as well as return the season where the team won the most games.
totalNBA2 = NBA %>% group_by(Team) %>% summarise(Season = Season[which.max(Winnings)], Winnings = max(Winnings))
We use the data frame command again as follows.
data.frame(totalNBA2)
## Team Season Winnings
## 1 76ers 1966-67 68
## 2 Blazers 1990-91 63
## 3 Bucks 1970-71 66
## 4 Bulls 1995-96 72
## 5 Cavaliers 2008-09 66
## 6 Celtics 1972-73 68
## 7 Clippers 2013-14 57
## 8 Grizzlies 2012-13 56
## 9 Hawks 2014-15 60
## 10 Heat 2012-13 66
## 11 Hornets 1996-97 54
## 12 Jazz 1996-97 64
## 13 Kings 2001-02 61
## 14 Knicks 1992-93 60
## 15 Lakers 1971-72 69
## 16 Magic 1995-96 60
## 17 Mavericks 2006-07 67
## 18 Nets 2001-02 52
## 19 Nuggets 2012-13 57
## 20 Pacers 2003-04 61
## 21 Pelicans 2007-08 56
## 22 Pistons 2005-06 64
## 23 Raptors 2014-15 49
## 24 Rockets 1993-94 58
## 25 Spurs 2005-06 63
## 26 Suns 2004-05 62
## 27 Thunder 1995-96 64
## 28 Timberwolves 2003-04 58
## 29 Warriors 2014-15 67
## 30 Wizards 1974-75 60
Now if we want to obtain the maximum number of wins by a team over the time span, all we have to do is just obtain the row from the original dataset where the number of winnings is maximum.
NBA[which.max(NBA$Winnings),]
## Season Team Winnings
## 231 1995-96 Bulls 72
The team who won the maximum number of games over the time span is the Chicago Bulls in 1995-1996. They won 72 games in that season.