Watching NBA games is one of the best ways to spend a weekend with friends. Without a doubt, we admire NBA players not only for their basketball skills but also for their high incomes. But how do teams decide what to pay those players? By their winning record? Their shots? It is too complicated to attribute salary to a single factor. Different teams and different positions may also weigh things differently. So, if I were a team manager, how would I set salaries in a reasonable way? To find out, I am going to collect data from the official NBA website and look for the major factors that affect a player's salary.

I took a first look at the data from the official NBA website. As most NBA fans know, LeBron James ranks at the top in performance among all NBA players. I also browsed the top 10 players: they are famous (which means most of them have signed contracts with sporting goods companies and earn extra income outside of basketball), and their salaries reflect that standing. Still, I cannot directly measure salary just by observing this season's performance. So I would like to understand the relationship between salary and all the performance variables, and I hope to be able to predict future salaries.

I acquired the data set from the official NBA website. The data set contains 509 observations and 32 variables. It shows only abbreviated variable names, so we need to look up the definition of each variable. I also acquired the salary data from the website.

First, I selected only the numerical data to reduce the size of the data set and make it clearer. Second, I conducted an exploratory data analysis. I plotted each independent variable against the dependent variable (salary). Comparing the paired plots, I kept the variables that showed a more linear relationship with salary. For example, plot number 6 is more scattered than plot number 12, so I consider the 12th variable to be more strongly related to salary.
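
The visual comparison of the paired plots can be backed up with a number. Below is a minimal sketch that ranks candidate variables by the absolute Pearson correlation with salary. The column names ("tight", "scattered") and all values are made-up toy data, not taken from the NBA data set, and the helper names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_by_linearity(columns, target):
    """Sort variables by |r| against the target, strongest first."""
    scores = {name: pearson_r(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


# Toy data only (assumption): one variable that tracks salary closely
# and one that is widely scattered.
salary = [1.0, 2.0, 3.0, 4.0, 5.0]
columns = {
    "tight": [10, 20, 30, 40, 50],
    "scattered": [3, 1, 4, 1, 5],
}

ranking = rank_by_linearity(columns, salary)
for name, r in ranking:
    print(f"{name}: r = {r:.3f}")
```

A variable whose plot looks more scattered should end up lower in this ranking. Correlation only captures linear association, so it complements the plots rather than replacing them.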

I also conducted a hypothesis test on salary at the 95% confidence level. The null hypothesis H0 is that the true mean salary equals 5555738, and the alternative hypothesis H1 is that the true mean is not equal to 5555738. The test gave t = 3.43 with df = 508 and a reported p-value of 1. With that p-value H0 cannot be rejected, and rejecting it anyway would risk a Type I error.
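
For reference, here is a minimal sketch of how a one-sample t statistic and its degrees of freedom are computed. The sample values and the reference mean are toy numbers, not the salary data, and the function name is hypothetical. The p-value uses a normal approximation, which is an assumption that is only reasonable for a large df (such as 508). An exact value would need the t distribution.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, df, approximate two-sided p) for H0: true mean == mu0."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 denominator
    t = (mean(sample) - mu0) / standard_error
    df = n - 1
    # Normal approximation to the t distribution (assumption: large df).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Toy sample only (assumption), tested against a reference mean of 5.
t_value, df_value, p_value = one_sample_t([4, 5, 6, 7, 8], mu0=5)
print(f"t = {t_value:.3f}, df = {df_value}, approx. p = {p_value:.3f}")
```

The same function can be applied to the salary column to cross-check the t value, the df, and the p-value against each other.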

After the comparisons and tests above, I selected a few variables and applied “gmodel” to find the best multiple linear regression model. The model is Salary = 919066 + 17896 FG + 7123 TRB + 11433 ast - 19909 tov. We can look at the ANOVA table for the percentage of explained variance. However, after several attempts, I found that this model does not accurately reflect salary because of the heteroscedasticity of the data. In conclusion, I should collect more supporting information to refine the model.
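
To make the fitted equation concrete, here is a minimal sketch that evaluates it for one player. The coefficients are the ones reported above. The input statistics are hypothetical, and the units of FG, TRB, ast and tov (season totals or per-game values) are not stated in this write-up, so the numbers passed in are an assumption for illustration only.

```python
# Coefficients of the fitted model as reported in the text.
INTERCEPT = 919066
COEFFICIENTS = {"FG": 17896, "TRB": 7123, "ast": 11433, "tov": -19909}


def predict_salary(fg, trb, ast, tov):
    """Evaluate Salary = 919066 + 17896 FG + 7123 TRB + 11433 ast - 19909 tov."""
    stats = {"FG": fg, "TRB": trb, "ast": ast, "tov": tov}
    return INTERCEPT + sum(COEFFICIENTS[name] * value for name, value in stats.items())


# Hypothetical player statistics (assumption, not from the data set).
predicted = predict_salary(fg=500, trb=400, ast=300, tov=150)
print(f"Predicted salary: {predicted}")
```

Because of the heteroscedasticity noted above, a prediction like this should be read as a rough estimate: the spread of the errors is not the same across the range of salaries.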