The F-statistic is a good indicator of whether there is a relationship between our predictor and response variables. The further the F-statistic is above 1, the stronger the evidence of a relationship. How much larger than 1 it needs to be, however, depends on both the number of data points and the number of predictors. Generally, when the number of data points is large, an F-statistic only modestly larger than 1 can already be sufficient to reject the null hypothesis (H0: there is no relationship between x1 and y). Conversely, when the number of data points is small, a much larger F-statistic is required before we can conclude that there may be a relationship between the predictor and response variables. In our example, the F-statistic for the model with x1 is 1990 on 1 and 498 degrees of freedom, which is far larger than 1 given the size of our data (500 observations).
a1 <- lm(y ~ x1, data = df1)
summary(a1)
##
## Call:
## lm(formula = y ~ x1, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
##  -156.06   -33.13    -1.23    36.09   139.83
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 25.94506    6.19828   4.186 3.36e-05 ***
## x1           1.38705    0.03109  44.610  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 51.8 on 498 degrees of freedom
## Multiple R-squared: 0.7998, Adjusted R-squared: 0.7994
## F-statistic: 1990 on 1 and 498 DF, p-value: < 2.2e-16
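As a rough check on where the F-statistic in this output comes from, it can be recomputed from the reported R-squared and degrees of freedom. The sketch below uses the rounded values printed by `summary(a1)`, so it should only match 1990 approximately; the variable names (`r2`, `p`, `df_resid`) are illustrative and not part of the original analysis.

```r
# Values copied from the summary(a1) output above (rounded as printed)
r2       <- 0.7998  # Multiple R-squared
p        <- 1       # number of predictors in the model
df_resid <- 498     # residual degrees of freedom (n - p - 1)

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / df_resid)

# The p-value is the upper-tail probability of the F distribution
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)

print(f_stat)   # close to the 1990 reported by summary(a1)
print(p_value)  # far below the 2.2e-16 display limit
```

The same formula shows why the number of predictors and the number of data points both matter: for a fixed R-squared, the F-statistic shrinks as `p` grows and grows as `df_resid` grows.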
a2 <- lm(y ~ x2, data = df1)
summary(a2)
##
## Call:
## lm(formula = y ~ x2, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -127.963  -28.312    0.004   29.609  123.508
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.5783     5.2574   -0.30    0.764
## x2            4.7608     0.0824   57.78   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.72 on 498 degrees of freedom
## Multiple R-squared: 0.8702, Adjusted R-squared: 0.8699
## F-statistic: 3338 on 1 and 498 DF, p-value: < 2.2e-16
a3 <- lm(y ~ x3, data = df1)
summary(a3)
##
## Call:
## lm(formula = y ~ x3, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -268.758  -91.529   -3.845   92.356  258.653
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 277.6493     5.2840  52.546  < 2e-16 ***
## x3            0.5805     0.1624   3.575 0.000385 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 114.3 on 498 degrees of freedom
## Multiple R-squared: 0.02502, Adjusted R-squared: 0.02306
## F-statistic: 12.78 on 1 and 498 DF, p-value: 0.0003849
a4 <- lm(y ~ x1 + x2, data = df1)
summary(a4)
##
## Call:
## lm(formula = y ~ x1 + x2, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -131.941  -28.231    0.108   30.473  123.991
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.17846    5.27027  -0.413    0.680
## x1          -0.13253    0.09543  -1.389    0.166
## x2           5.18163    0.31402  16.501   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.68 on 497 degrees of freedom
## Multiple R-squared: 0.8707, Adjusted R-squared: 0.8702
## F-statistic: 1673 on 2 and 497 DF, p-value: < 2.2e-16
a5 <- lm(y ~ x1 + x2 + x3, data = df1)
summary(a5)
##
## Call:
## lm(formula = y ~ x1 + x2 + x3, data = df1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -131.672  -28.472    0.597   30.258  123.630
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.09331    5.28770  -0.396    0.692
## x1          -0.13048    0.09592  -1.360    0.174
## x2           5.17188    0.31705  16.313   <2e-16 ***
## x3           0.01418    0.06032   0.235    0.814
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 41.72 on 496 degrees of freedom
## Multiple R-squared: 0.8707, Adjusted R-squared: 0.8699
## F-statistic: 1113 on 3 and 496 DF, p-value: < 2.2e-16
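To make concrete the point that the required size of the F-statistic depends on the number of data points and predictors, the sketch below compares each observed F-statistic with the critical value from `qf()`. The 5% significance level and the data frame and column names (`models`, `num_df`, `den_df`) are assumptions made for illustration; the F-statistics and degrees of freedom are copied from the five summaries above.

```r
# Observed F-statistics and degrees of freedom from the summaries above
models <- data.frame(
  model  = c("a1", "a2", "a3", "a4", "a5"),
  f_obs  = c(1990, 3338, 12.78, 1673, 1113),
  num_df = c(1, 1, 1, 2, 3),          # number of predictors
  den_df = c(498, 498, 498, 497, 496) # residual degrees of freedom
)

alpha <- 0.05  # assumed significance level

# Critical value: the F-statistic must exceed this to reject H0
models$f_crit    <- qf(1 - alpha, models$num_df, models$den_df)
models$reject_h0 <- models$f_obs > models$f_crit
print(models)

# Same single predictor, but with only 4 observations (2 residual df):
# the threshold is much higher
f_crit_small_n <- qf(1 - alpha, 1, 2)
print(f_crit_small_n)
```

With roughly 500 observations every model clears its threshold, including a3 with its F-statistic of 12.78. With only 4 observations the critical value for a single predictor rises to about 18.5, so an F-statistic of 12.78 would no longer be enough to reject the null hypothesis at the 5% level.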