Summary Statistics

df = read.csv("~/Desktop/Nascar_data.csv")
attach(df)
summary(df)
##     Driver              Points           Poles             Wins  
##  Length:35          Min.   : 192.0   Min.   :0.0000   Min.   :0  
##  Class :character   1st Qu.: 807.5   1st Qu.:0.0000   1st Qu.:0  
##  Mode  :character   Median : 937.0   Median :1.0000   Median :0  
##                     Mean   :1304.2   Mean   :0.9429   Mean   :1  
##                     3rd Qu.:2284.0   3rd Qu.:2.0000   3rd Qu.:1  
##                     Max.   :2403.0   Max.   :3.0000   Max.   :5  
##      Top.5            Top.10          Top2.5          Top6.10      
##  Min.   : 0.000   Min.   : 0.00   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 1.500   1st Qu.: 4.50   1st Qu.: 1.000   1st Qu.: 2.000  
##  Median : 4.000   Median :10.00   Median : 3.000   Median : 5.000  
##  Mean   : 5.114   Mean   :10.23   Mean   : 4.114   Mean   : 5.114  
##  3rd Qu.: 8.500   3rd Qu.:16.00   3rd Qu.: 5.500   3rd Qu.: 8.000  
##  Max.   :19.000   Max.   :26.00   Max.   :18.000   Max.   :12.000  
##   Winnings....    
##  Min.   :2271890  
##  1st Qu.:3867200  
##  Median :4579860  
##  Mean   :4705510  
##  3rd Qu.:5517570  
##  Max.   :8485990
str(df)
## 'data.frame':    35 obs. of  9 variables:
##  $ Driver      : chr  "Tony Stewart" "Carl Edwards" "Kevin Harvick" "Matt Kenseth" ...
##  $ Points      : int  2403 2403 2345 2330 2319 2304 2290 2287 2284 2284 ...
##  $ Poles       : int  1 3 0 3 1 0 1 1 0 3 ...
##  $ Wins        : int  5 1 4 3 3 2 0 3 1 1 ...
##  $ Top.5       : int  9 19 9 12 10 14 4 13 5 9 ...
##  $ Top.10      : int  19 26 19 20 14 21 12 18 14 17 ...
##  $ Top2.5      : int  4 18 5 9 7 12 4 10 4 8 ...
##  $ Top6.10     : int  10 7 10 8 4 7 8 5 9 8 ...
##  $ Winnings....: int  6529870 8485990 6197140 6183580 5087740 6296360 4163690 5912830 5401190 5303020 ...
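The first ten rows printed by str(df) suggest that Top2.5 and Top6.10 are derived columns: Top2.5 looks like Top.5 minus Wins, and Top6.10 looks like Top.10 minus Top.5. The sketch below checks this on those ten rows only. The vectors are typed in by hand from the str() output, and their lowercase names are illustrative rather than columns of df; the remaining 25 rows are not shown, so the relationship is an assumption for the full data set.

```r
# First ten values of each column, copied from the str(df) output above
wins    <- c(5, 1, 4, 3, 3, 2, 0, 3, 1, 1)
top5    <- c(9, 19, 9, 12, 10, 14, 4, 13, 5, 9)
top10   <- c(19, 26, 19, 20, 14, 21, 12, 18, 14, 17)
top2_5  <- c(4, 18, 5, 9, 7, 12, 4, 10, 4, 8)
top6_10 <- c(10, 7, 10, 8, 4, 7, 8, 5, 9, 8)

# Finishes in positions 2-5 = top-five finishes that were not wins
check_2_5 <- all(top5 - wins == top2_5)

# Finishes in positions 6-10 = top-ten finishes that were not top-five finishes
check_6_10 <- all(top10 - top5 == top6_10)

print(c(check_2_5 = check_2_5, check_6_10 = check_6_10))
```

On the full data frame the same check could be written as all(df$Top.5 - df$Wins == df$Top2.5), assuming the column names shown by str(df).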

1) Predict winnings (dependent variable) from the independent variables Poles, Wins, Top 5, and Top 10

lm1 = lm(Winnings.... ~ Poles+Wins+Top.5+Top.10)
summary(lm1)
## 
## Call:
## lm(formula = Winnings.... ~ Poles + Wins + Top.5 + Top.10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -868477 -383997 -147456  396959 1059512 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3140367     184229  17.046  < 2e-16 ***
## Poles         -12939     107205  -0.121  0.90474    
## Wins           13545     111226   0.122  0.90389    
## Top.5          71629      50667   1.414  0.16773    
## Top.10        117071      33433   3.502  0.00147 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 581400 on 30 degrees of freedom
## Multiple R-squared:  0.8205, Adjusted R-squared:  0.7966 
## F-statistic: 34.28 on 4 and 30 DF,  p-value: 8.619e-11

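From the output, the model as a whole is significant (F = 34.28, p-value = 8.619e-11) and explains about 82% of the variation in winnings (R-squared = 0.8205). Among the predictors, Top.10 contributes the most: it is the only coefficient whose p-value (0.00147) is below the 0.05 significance level.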
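Alongside the individual p-values, the overall fit figures at the bottom of the summary(lm1) output can be reproduced from the reported R-squared. The sketch below is a rough check rather than part of the original analysis: it starts from the rounded value 0.8205, so the results agree with the printed output only up to rounding. The variable names are illustrative.

```r
# Values read off the summary(lm1) output
r2 <- 0.8205   # Multiple R-squared (rounded as printed)
n  <- 35       # observations
k  <- 4        # predictors: Poles, Wins, Top.5, Top.10

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Overall F statistic on k and n - k - 1 degrees of freedom
f_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))

# Upper-tail p-value of the F test
f_p_value <- pf(f_stat, k, n - k - 1, lower.tail = FALSE)

print(round(c(adj_r2 = adj_r2, f_stat = f_stat), 4))
print(f_p_value)
```

The adjusted R-squared and F statistic computed this way should land close to the 0.7966 and 34.28 reported by R.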

2) Individual significance:

From the regression output in part 1, the p-values are 0.90474 for Poles, 0.90389 for Wins, 0.16773 for Top.5, and 0.00147 for Top.10. At the 0.05 significance level, only Top.10 is individually significant, because it is the only predictor whose p-value is below alpha.
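Each p-value in part 1 comes from a two-sided t test of a coefficient against zero on 30 residual degrees of freedom. As a minimal sketch, the test can be redone by hand from the estimates and standard errors printed in the summary(lm1) output. The data frame coefs and its column names are made up for this illustration, and the inputs are the rounded printed values, so small differences from R's own figures are expected.

```r
# Estimates and standard errors copied from the summary(lm1) output
coefs <- data.frame(
  term      = c("Poles", "Wins", "Top.5", "Top.10"),
  estimate  = c(-12939, 13545, 71629, 117071),
  std_error = c(107205, 111226, 50667, 33433)
)
resid_df <- 30   # residual degrees of freedom reported by R
alpha    <- 0.05

# t statistic = estimate / standard error
coefs$t_value <- coefs$estimate / coefs$std_error

# Two-sided p-value from the t distribution
coefs$p_value <- 2 * pt(-abs(coefs$t_value), df = resid_df)

# Flag the coefficients that are significant at the chosen level
coefs$significant <- coefs$p_value < alpha

print(coefs)
```

With the fitted model available, the same table can be pulled directly with coef(summary(lm1)).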

3) Regression using the Top 2-5 and Top 6-10 variables

lm2 = lm(Winnings.... ~Poles+Wins+Top2.5+Top6.10)
summary(lm2)
## 
## Call:
## lm(formula = Winnings.... ~ Poles + Wins + Top2.5 + Top6.10)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -868477 -383997 -147456  396959 1059512 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3140367     184229  17.046  < 2e-16 ***
## Poles         -12939     107205  -0.121  0.90474    
## Wins          202245      90226   2.242  0.03254 *  
## Top2.5        188700      34586   5.456 6.43e-06 ***
## Top6.10       117071      33433   3.502  0.00147 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 581400 on 30 degrees of freedom
## Multiple R-squared:  0.8205, Adjusted R-squared:  0.7966 
## F-statistic: 34.28 on 4 and 30 DF,  p-value: 8.619e-11

The estimated regression equation is: Winnings = 3140367 - 12939(Poles) + 202245(Wins) + 188700(Top 2-5) + 117071(Top 6-10).
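The residuals, R-squared, and F statistic in part 3 are identical to those in part 1. A likely explanation, assuming Top.5 = Wins + Top2.5 and Top.10 = Wins + Top2.5 + Top6.10 for every driver (this holds for the rows displayed by str(df), but is an assumption for the rest), is that the second model is the first one rewritten in different variables. Writing b_0, ..., b_4 for the part 1 coefficients, a sketch of the substitution is:

```latex
\begin{aligned}
\hat{Y} &= b_0 + b_1\,\text{Poles} + b_2\,\text{Wins} + b_3\,\text{Top.5} + b_4\,\text{Top.10} \\
        &= b_0 + b_1\,\text{Poles} + b_2\,\text{Wins}
           + b_3(\text{Wins} + \text{Top2.5})
           + b_4(\text{Wins} + \text{Top2.5} + \text{Top6.10}) \\
        &= b_0 + b_1\,\text{Poles} + (b_2 + b_3 + b_4)\,\text{Wins}
           + (b_3 + b_4)\,\text{Top2.5} + b_4\,\text{Top6.10} \\[4pt]
b_2 + b_3 + b_4 &= 13545 + 71629 + 117071 = 202245 \\
b_3 + b_4 &= 71629 + 117071 = 188700 \\
b_4 &= 117071
\end{aligned}
```

These sums match the Wins, Top2.5, and Top6.10 estimates in the part 3 output. The fit is the same in both models; what changes is the interpretation of each coefficient and therefore its standard error and p-value.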

4) From the regression output in part 3, at the 0.05 level of significance, Top 2-5 and Top 6-10 both have a significant relationship with winnings (p-values of 6.43e-06 and 0.00147). Wins (p-value = 0.03254) is also significant, while Poles (p-value = 0.90474) is not.