Summary Statistics
df = read.csv("~/Desktop/Nascar_data.csv")
attach(df)
summary(df)
## Driver Points Poles Wins
## Length:35 Min. : 192.0 Min. :0.0000 Min. :0
## Class :character 1st Qu.: 807.5 1st Qu.:0.0000 1st Qu.:0
## Mode :character Median : 937.0 Median :1.0000 Median :0
## Mean :1304.2 Mean :0.9429 Mean :1
## 3rd Qu.:2284.0 3rd Qu.:2.0000 3rd Qu.:1
## Max. :2403.0 Max. :3.0000 Max. :5
## Top.5 Top.10 Top2.5 Top6.10
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 1.500 1st Qu.: 4.50 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 4.000 Median :10.00 Median : 3.000 Median : 5.000
## Mean : 5.114 Mean :10.23 Mean : 4.114 Mean : 5.114
## 3rd Qu.: 8.500 3rd Qu.:16.00 3rd Qu.: 5.500 3rd Qu.: 8.000
## Max. :19.000 Max. :26.00 Max. :18.000 Max. :12.000
## Winnings....
## Min. :2271890
## 1st Qu.:3867200
## Median :4579860
## Mean :4705510
## 3rd Qu.:5517570
## Max. :8485990
str(df)
## 'data.frame': 35 obs. of 9 variables:
## $ Driver : chr "Tony Stewart" "Carl Edwards" "Kevin Harvick" "Matt Kenseth" ...
## $ Points : int 2403 2403 2345 2330 2319 2304 2290 2287 2284 2284 ...
## $ Poles : int 1 3 0 3 1 0 1 1 0 3 ...
## $ Wins : int 5 1 4 3 3 2 0 3 1 1 ...
## $ Top.5 : int 9 19 9 12 10 14 4 13 5 9 ...
## $ Top.10 : int 19 26 19 20 14 21 12 18 14 17 ...
## $ Top2.5 : int 4 18 5 9 7 12 4 10 4 8 ...
## $ Top6.10 : int 10 7 10 8 4 7 8 5 9 8 ...
## $ Winnings....: int 6529870 8485990 6197140 6183580 5087740 6296360 4163690 5912830 5401190 5303020 ...
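The ten rows printed by str() are enough to check how the two range columns relate to the cumulative ones. The sketch below retypes only those printed values (not the full 35-row file) and checks that Top2.5 equals Top.5 minus Wins and that Top6.10 equals Top.10 minus Top.5. The name `first10` is made up for illustration.

```r
# First ten rows as printed by str(df) above; `first10` is an illustrative name.
first10 <- data.frame(
  Wins    = c(5, 1, 4, 3, 3, 2, 0, 3, 1, 1),
  Top.5   = c(9, 19, 9, 12, 10, 14, 4, 13, 5, 9),
  Top.10  = c(19, 26, 19, 20, 14, 21, 12, 18, 14, 17),
  Top2.5  = c(4, 18, 5, 9, 7, 12, 4, 10, 4, 8),
  Top6.10 = c(10, 7, 10, 8, 4, 7, 8, 5, 9, 8)
)

# Top2.5 should count top-5 finishes that were not wins;
# Top6.10 should count top-10 finishes that were not top-5 finishes.
top2_5_ok  <- all(first10$Top2.5  == first10$Top.5  - first10$Wins)
top6_10_ok <- all(first10$Top6.10 == first10$Top.10 - first10$Top.5)
print(c(top2_5_ok = top2_5_ok, top6_10_ok = top6_10_ok))
```

Both checks hold for the rows shown. Whether they hold for the remaining 25 drivers would need the full file, although the column means in summary(df) (5.114 - 1 = 4.114 and 10.23 - 5.114 = 5.114) are consistent with it.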
1) Predict winnings (dependent variable). Independent variables: Poles, Wins, Top 5, Top 10
lm1 = lm(Winnings.... ~ Poles+Wins+Top.5+Top.10)
summary(lm1)
##
## Call:
## lm(formula = Winnings.... ~ Poles + Wins + Top.5 + Top.10)
##
## Residuals:
## Min 1Q Median 3Q Max
## -868477 -383997 -147456 396959 1059512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3140367 184229 17.046 < 2e-16 ***
## Poles -12939 107205 -0.121 0.90474
## Wins 13545 111226 0.122 0.90389
## Top.5 71629 50667 1.414 0.16773
## Top.10 117071 33433 3.502 0.00147 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 581400 on 30 degrees of freedom
## Multiple R-squared: 0.8205, Adjusted R-squared: 0.7966
## F-statistic: 34.28 on 4 and 30 DF, p-value: 8.619e-11
From the output above, Top 10 is the only predictor whose p-value (0.00147) is below the 0.05 significance level, so within this model the number of top-ten finishes is the best single predictor of winnings.
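The overall fit statistics in the summary(lm1) output can be cross-checked from R-squared alone. The sketch below is a rough check, not part of the original analysis. It recomputes the adjusted R-squared and the F-statistic from the printed values, so small rounding differences are expected.

```r
# Values copied from the summary(lm1) output above.
r2 <- 0.8205   # Multiple R-squared
n  <- 35       # observations
k  <- 4        # predictors (Poles, Wins, Top.5, Top.10)

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F-statistic: explained variance per predictor over
# unexplained variance per residual degree of freedom.
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
f_pval <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
print(f_pval)
```

The recomputed values should agree with the reported 0.7966 and 34.28 up to rounding, with 4 and 30 degrees of freedom.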
2) Individual significance:
From the regression output in part 1, the p-values are 0.905 for Poles, 0.904 for Wins, 0.168 for Top 5, and 0.0015 for Top 10. At the 0.05 significance level, only Top 10 is significant, since it is the only variable whose p-value is below alpha.
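To make the individual tests concrete, the p-values can be reproduced from the estimates and standard errors in the coefficient table. This sketch retypes those printed numbers, forms each t value as estimate divided by standard error, and applies a two-sided t test with the 30 residual degrees of freedom reported in the output.

```r
# Estimates and standard errors copied from the summary(lm1) coefficient table.
est <- c(Poles = -12939, Wins = 13545, Top.5 = 71629, Top.10 = 117071)
se  <- c(Poles = 107205, Wins = 111226, Top.5 = 50667, Top.10 = 33433)
df_resid <- 30  # residual degrees of freedom from the output

# t value = estimate / standard error; two-sided p-value from the t distribution.
t_val <- est / se
p_val <- 2 * pt(abs(t_val), df = df_resid, lower.tail = FALSE)
print(round(cbind(t_val, p_val), 5))

# Variables that are individually significant at alpha = 0.05.
alpha <- 0.05
significant <- names(p_val)[p_val < alpha]
print(significant)
```

Because the inputs are rounded, the recomputed p-values may differ from the printed ones in the last digit.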
3) Regression using the new variables Top 2-5 and Top 6-10
lm2 = lm(Winnings.... ~Poles+Wins+Top2.5+Top6.10)
summary(lm2)
##
## Call:
## lm(formula = Winnings.... ~ Poles + Wins + Top2.5 + Top6.10)
##
## Residuals:
## Min 1Q Median 3Q Max
## -868477 -383997 -147456 396959 1059512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3140367 184229 17.046 < 2e-16 ***
## Poles -12939 107205 -0.121 0.90474
## Wins 202245 90226 2.242 0.03254 *
## Top2.5 188700 34586 5.456 6.43e-06 ***
## Top6.10 117071 33433 3.502 0.00147 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 581400 on 30 degrees of freedom
## Multiple R-squared: 0.8205, Adjusted R-squared: 0.7966
## F-statistic: 34.28 on 4 and 30 DF, p-value: 8.619e-11
The estimated multiple regression equation is: Winnings = 3140367 - 12939(Poles) + 202245(Wins) + 188700(Top 2-5) + 117071(Top 6-10).
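As a worked example of using the fitted equation, the sketch below plugs in the first driver listed in the str() output (Poles 1, Wins 5, Top2.5 4, Top6.10 10, winnings 6529870). The coefficients are copied from the summary(lm2) output; the variable names are illustrative.

```r
# Coefficients copied from the summary(lm2) output above.
coefs <- c(Intercept = 3140367, Poles = -12939, Wins = 202245,
           Top2.5 = 188700, Top6.10 = 117071)

# First driver's values as printed by str(df); the leading 1 multiplies the intercept.
x <- c(Intercept = 1, Poles = 1, Wins = 5, Top2.5 = 4, Top6.10 = 10)

predicted <- sum(coefs * x)      # fitted winnings from the equation
actual    <- 6529870             # observed winnings for the same driver
residual  <- actual - predicted  # positive means the model under-predicts

print(c(predicted = predicted, residual = residual))
```

The fitted value is 6064163, so the model under-predicts this driver by 465707, which is inside the residual range reported in the output.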
4) From the regression output in part 3, at the 0.05 level of significance, Top 2-5 and Top 6-10 both have a significant relationship with winnings (p-values of 6.43e-06 and 0.00147). Wins (p-value = 0.0325) also has a significant relationship with winnings.
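One way to see why Wins is significant here but not in part 1 is that the two models appear to be the same fit written in different variables. Assuming Top.5 = Wins + Top2.5 and Top.10 = Top.5 + Top6.10 (which holds for the ten rows printed by str()), substituting into the part 1 equation gives the part 3 coefficients. The sketch below checks that arithmetic on the printed estimates.

```r
# Coefficients copied from the two summary() outputs above.
lm1_coef <- c(Wins = 13545, Top.5 = 71629, Top.10 = 117071)
lm2_coef <- c(Wins = 202245, Top2.5 = 188700, Top6.10 = 117071)

# Under the assumed identities, each finish type picks up the lm1
# coefficients of every cumulative count it contributes to.
implied <- c(
  Wins    = sum(lm1_coef),                                # win: also a top-5 and a top-10
  Top2.5  = lm1_coef[["Top.5"]] + lm1_coef[["Top.10"]],   # 2nd-5th: top-5 and top-10
  Top6.10 = lm1_coef[["Top.10"]]                          # 6th-10th: top-10 only
)
print(implied)

matches <- all(implied == lm2_coef)
print(matches)
```

This is consistent with the two outputs sharing the same residuals, R-squared, and F-statistic. In part 3 the Wins coefficient measures the full value of a win, whereas in part 1 it measures only the extra value of a win over another top-5 finish.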