Task 1

  1. Root mean squared error: $\sqrt{RSS / (n-1)} = 1.05$

  2. $\widehat{SE}(\hat\beta_1)$: $\text{Root MSE} / \sqrt{(n-1)\, s_X^2} = 1.05 / \sqrt{9 \times 1.15} \approx 0.32$

  3. t value for the null hypothesis that $\beta_1 = 0$: the t statistic against the null equals $\hat\beta_1 / \widehat{SE}(\hat\beta_1) = 0.92 / 0.32 = 2.875$. The two-sided p-value $P(|t| > 2.875)$ lies between 0.01 and 0.02, and $P(t \le 2.875) \approx 0.99$.

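  4. Confidence interval for $\beta_1$: $\hat\beta_1 \pm t_{crit} \cdot \widehat{SE}(\hat\beta_1)$, where $t_{crit}$ is the critical value of the t distribution, not the observed t statistic. For a 95% interval with 8 or 9 residual degrees of freedom (as $n - 1 = 9$ suggests), $t_{crit} \approx 2.3$, which gives $0.92 \pm 2.3 \times 0.32 \approx [0.18; 1.66]$.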
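The steps above can be checked numerically. The following is a minimal sketch in R, assuming n = 10 (inferred from n - 1 = 9) and n - 2 residual degrees of freedom for a simple regression with an intercept; neither is stated in the task. It uses the unrounded standard error, so the results differ slightly from the hand calculation.

```r
# Inputs taken from Task 1
root_mse <- 1.05   # root mean squared error
s2_x     <- 1.15   # sample variance of X
beta1    <- 0.92   # estimated slope
n        <- 10     # ASSUMPTION: inferred from (n - 1) = 9

# Standard error of the slope
se_beta1 <- root_mse / sqrt((n - 1) * s2_x)

# t statistic for H0: beta_1 = 0
t_value <- beta1 / se_beta1

# ASSUMPTION: simple regression, so residual df = n - 2
df_resid <- n - 2

# Two-sided p-value
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

# 95% confidence interval, built from the critical value of the t distribution
t_crit <- qt(0.975, df = df_resid)
ci <- beta1 + c(-1, 1) * t_crit * se_beta1

print(c(se = se_beta1, t = t_value, p = p_value))
print(ci)
```

The interval is centred on the estimate 0.92, and its half-width is the critical value times the standard error.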

Task 2

Generating population and sample

set.seed(516)

population <- rnorm(n = 10000, 10, 2)

ran_sample <- sample(population, 100)


par(
  mfrow=c(1,2),
  mar=c(4,4,1,0)
)
hist(population, xlab = "Random value (X)", col = "#404080",
     main = "The distribution of the population", cex.lab = 1, cex.axis = 1, cex.main = 1)
hist(ran_sample, xlab = "Random value (X)", col = "#69b3a2",
     main = "The distribution of the sample", cex.lab = 1, cex.axis = 1, cex.main = 1)

Sample estimations

# Ten predictors and an outcome, all drawn independently of each other,
# so every true slope coefficient is zero.
# rnorm() only uses the length of a vector passed as n, so pass the count explicitly.
n_sample <- length(ran_sample)  # 100

x1 <- rnorm(n = n_sample, mean = 3, sd = 2.1)
x2 <- rnorm(n = n_sample, mean = 2, sd = 2.8)
x3 <- rnorm(n = n_sample, mean = 2.9, sd = 9)
x4 <- rnorm(n = n_sample, mean = 14, sd = 10.5)
x5 <- rnorm(n = n_sample, mean = 8.2, sd = 4.4)
x6 <- rnorm(n = n_sample, mean = 5.6, sd = 10.3)
x7 <- rnorm(n = n_sample, mean = 0.5, sd = 1.5)
x8 <- rnorm(n = n_sample, mean = 3.3, sd = 1)
x9 <- rnorm(n = n_sample, mean = 8.1, sd = 2.1)
x10 <- rnorm(n = n_sample, mean = 4.5, sd = 3)

y <- rnorm(n = n_sample, mean = 5, sd = 2)

mod0 <- lm(y ~ 1)
mod1 <- lm(y ~ x1)
mod2 <- lm(y ~ x1 + x2)
mod3 <- lm(y ~ x1 + x2 + x3)
mod4 <- lm(y ~ x1 + x2 + x3 + x4)
mod5 <- lm(y ~ x1 + x2 + x3 + x4 + x5)
mod6 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6)
mod7 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
mod8 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)
mod9 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)
mod10 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)

library(sjPlot)  # provides tab_model()

tab_model(
  mod0, mod1, mod2, mod3, mod4, mod5, mod6, mod7, mod8, mod9, mod10,
  p.style = "stars", show.ci = FALSE, show.se = TRUE, show.fstat = TRUE,
  CSS = list(
    css.depvarhead = 'color: red;',
    css.centeralign = 'text-align: left;',
    css.firsttablecol = 'font-weight: bold;',
    css.summary = 'color: blue;'
  ),
  dv.labels = paste("Model", 0:10)
)
Predictors    Model 0   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7   Model 8   Model 9   Model 10
(Intercept)   4.95***   4.85***   4.84***   4.79***   5.17***   5.26***   5.13***   5.14***   5.98***   6.27***   6.68***
              (0.20)    (0.30)    (0.33)    (0.35)    (0.45)    (0.68)    (0.70)    (0.70)    (0.96)    (1.32)    (1.40)
x1                      0.04      0.04      0.04      0.05      0.05      0.06      0.06      0.04      0.04      0.05
                        (0.09)    (0.09)    (0.09)    (0.09)    (0.09)    (0.10)    (0.10)    (0.10)    (0.10)    (0.10)
x2                                0.01      0.00      0.01      0.01      -0.00     -0.01     0.00      0.01      -0.00
                                  (0.08)    (0.08)    (0.08)    (0.08)    (0.08)    (0.08)    (0.08)    (0.08)    (0.08)
x3                                          0.01      0.02      0.02      0.02      0.02      0.02      0.02      0.02
                                            (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)
x4                                                    -0.03     -0.03     -0.03     -0.03     -0.03     -0.03     -0.03
                                                      (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)    (0.02)
x5                                                              -0.01     -0.01     -0.01     0.00      0.00      0.01
                                                                (0.05)    (0.05)    (0.05)    (0.05)    (0.05)    (0.05)
x6                                                                        0.02      0.02      0.01      0.01      0.02
                                                                          (0.02)    (0.02)    (0.02)    (0.02)    (0.02)
x7                                                                                  0.04      -0.01     -0.01     -0.00
                                                                                    (0.15)    (0.15)    (0.15)    (0.15)
x8                                                                                            -0.27     -0.28     -0.26
                                                                                              (0.21)    (0.21)    (0.21)
x9                                                                                                      -0.04     -0.07
                                                                                                        (0.12)    (0.12)
x10                                                                                                               -0.07
                                                                                                                  (0.08)
Observations  100       100       100       100       100       100       100       100       100       100       100
R2            0.000     0.002     0.002     0.006     0.023     0.023     0.029     0.030     0.047     0.048     0.057
R2 adjusted   0.000     -0.008    -0.019    -0.025    -0.018    -0.028    -0.034    -0.044    -0.037    -0.047    -0.049
Standard errors in parentheses.   * p<0.05   ** p<0.01   *** p<0.001

F-statistics for sample models

##      Models Fstatistics
## 1    Model1  0.18421557
## 2   Model 2  0.09793601
## 3   Model 3  0.18005965
## 4   Model 4  0.56366095
## 5   Model 5  0.45187139
## 6   Model 6  0.46270030
## 7   Model 7  0.40122589
## 8   Model 8  0.56391772
## 9   Model 9  0.50744888
## 10 Model 10  0.53645863

Root mean squared error for sample models

##      Models     RMSE
## 1    Model1 1.991549
## 2   Model 2 1.991410
## 3   Model 3 1.987835
## 4   Model 4 1.970178
## 5   Model 5 1.969886
## 6   Model 6 1.964316
## 7   Model 7 1.963671
## 8   Model 8 1.945772
## 9   Model 9 1.944689
## 10 Model 10 1.935928
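The code that produced the F-statistic and RMSE tables is not shown. Below is a minimal sketch of how such values could be obtained, assuming RMSE is the square root of the mean squared residual, which appears consistent with the reported values falling as predictors are added. The simulated data, the seed, and the names `sim`, `fit_nested` and `fit_stats` are hypothetical and chosen only for this illustration.

```r
set.seed(1)  # arbitrary seed for this sketch, not the one used in the report

# Hypothetical data: 10 independent predictors and an unrelated outcome
n <- 100
sim <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
names(sim) <- paste0("x", 1:10)
sim$y <- rnorm(n, mean = 5, sd = 2)

# Fit the nested model y ~ x1 + ... + xk
fit_nested <- function(k, data) {
  lm(reformulate(paste0("x", seq_len(k)), response = "y"), data = data)
}
models <- lapply(1:10, fit_nested, data = sim)

# Collect the overall F statistic and the in-sample RMSE of each model
fit_stats <- data.frame(
  Models      = paste("Model", 1:10),
  Fstatistics = sapply(models, function(m) unname(summary(m)$fstatistic["value"])),
  RMSE        = sapply(models, function(m) sqrt(mean(residuals(m)^2)))
)
print(fit_stats)
```

Because the models are nested, the RMSE column computed this way can never increase from one row to the next.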

Comments:

Looking at R2, we see that the proportion of explained variance in the dependent variable increases every time a predictor is added. Models with more terms therefore appear to fit better, but this improvement is an artifact: all variables were generated independently, so none of the predictors is truly related to y.

The adjusted R2 is negative for every model with predictors. Once the number of predictors is penalized, these models do no better than simply predicting the mean of y.
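The penalty can be made explicit with the standard formula for adjusted R2, 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of predictors. A short sketch that reproduces some of the adjusted values from the rounded R2 values in the table; the function name `adj_r2` is introduced here for illustration.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# R2 values copied from the sample regression table (n = 100)
adj_model1  <- round(adj_r2(0.002, n = 100, p = 1), 3)   # -0.008
adj_model4  <- round(adj_r2(0.023, n = 100, p = 4), 3)   # -0.018
adj_model10 <- round(adj_r2(0.057, n = 100, p = 10), 3)  # -0.049

print(c(adj_model1, adj_model4, adj_model10))
```

Adjusted R2 becomes negative whenever R2 is smaller than p / (n - 1), which is the case for every model here.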

The F statistics are all below 1, so none of the models is jointly significant. Likewise, none of the individual slope coefficients is significant.
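The overall F statistic is tied to R2 through F = (R2 / p) / ((1 - R2) / (n - p - 1)). The sketch below recomputes the Model 10 value from the rounded R2 in the table, so it matches the reported 0.536 only approximately. It then obtains the p-value with `pf()`. The function name `f_from_r2` is an assumption made for this example.

```r
# Overall F statistic from R-squared, sample size n and number of predictors p
f_from_r2 <- function(r2, n, p) {
  (r2 / p) / ((1 - r2) / (n - p - 1))
}

# Model 10 of the sample analysis: R2 = 0.057, n = 100, p = 10
f10 <- f_from_r2(r2 = 0.057, n = 100, p = 10)

# Upper-tail probability of the F distribution with (p, n - p - 1) degrees of freedom
p10 <- pf(f10, df1 = 10, df2 = 100 - 10 - 1, lower.tail = FALSE)

print(c(F = f10, p_value = p10))
```

An F statistic below 1 means the predictors explain less variance per degree of freedom than the residuals contain, so the p-value is necessarily large.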

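The root mean squared error tells us how concentrated the data are around the line of best fit. It decreases slightly as predictors are added (from 1.99 to 1.94), but, as with R2, the in-sample fit of nested models can only improve when terms are added, so this decrease does not indicate genuinely better models.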

Task 3

Population estimations

# Same design as in the sample analysis, now with one value per population unit.
# rnorm() only uses the length of a vector passed as n, so pass the count explicitly.
n_pop <- length(population)  # 10000

x_1 <- rnorm(n = n_pop, mean = 3, sd = 2.1)
x_2 <- rnorm(n = n_pop, mean = 2, sd = 2.8)
x_3 <- rnorm(n = n_pop, mean = 2.9, sd = 9)
x_4 <- rnorm(n = n_pop, mean = 14, sd = 10.5)
x_5 <- rnorm(n = n_pop, mean = 8.2, sd = 4.4)
x_6 <- rnorm(n = n_pop, mean = 5.6, sd = 10.3)
x_7 <- rnorm(n = n_pop, mean = 0.5, sd = 1.5)
x_8 <- rnorm(n = n_pop, mean = 3.3, sd = 1)
x_9 <- rnorm(n = n_pop, mean = 8.1, sd = 2.1)
x_10 <- rnorm(n = n_pop, mean = 4.5, sd = 3)
y1 <- rnorm(n = n_pop, mean = 5, sd = 2)

mod_0 <- lm(y1 ~ 1)
mod_1 <- lm(y1 ~ x_1)
mod_2 <- lm(y1 ~ x_1 + x_2)
mod_3 <- lm(y1 ~ x_1 + x_2 + x_3)
mod_4 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4)
mod_5 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5)
mod_6 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6)
mod_7 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7)
mod_8 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8)
mod_9 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9)
mod_10 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_10)

tab_model(
  mod_0, mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7, mod_8, mod_9, mod_10,
  p.style = "stars", show.ci = FALSE, show.se = TRUE, show.fstat = TRUE,
  CSS = list(
    css.depvarhead = 'color: red;',
    css.centeralign = 'text-align: left;',
    css.firsttablecol = 'font-weight: bold;',
    css.summary = 'color: blue;'
  ),
  dv.labels = paste("Model", 0:10)
)
Predictors    Model 0   Model 1   Model 2   Model 3   Model 4   Model 5   Model 6   Model 7   Model 8   Model 9   Model 10
(Intercept)   5.00***   5.02***   5.03***   5.04***   5.04***   5.08***   5.07***   5.07***   5.05***   5.01***   5.05***
              (0.02)    (0.04)    (0.04)    (0.04)    (0.05)    (0.06)    (0.06)    (0.06)    (0.09)    (0.12)    (0.12)
x_1                     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01
                        (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
x_2                               -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01     -0.01
                                  (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)    (0.01)
x_3                                         -0.00     -0.00     -0.00     -0.00     -0.00     -0.00     -0.00     -0.00
                                            (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
x_4                                                   -0.00     -0.00     -0.00     -0.00     -0.00     -0.00     -0.00
                                                      (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
x_5                                                             -0.00     -0.00     -0.00     -0.00     -0.00     -0.00
                                                                (0.00)    (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
x_6                                                                       0.00      0.00      0.00      0.00      0.00
                                                                          (0.00)    (0.00)    (0.00)    (0.00)    (0.00)
x_7                                                                                 0.01      0.01      0.01      0.01
                                                                                    (0.01)    (0.01)    (0.01)    (0.01)
x_8                                                                                           0.01      0.01      0.01
                                                                                              (0.02)    (0.02)    (0.02)
x_9                                                                                                     0.00      0.00
                                                                                                        (0.01)    (0.01)
x_10                                                                                                              -0.01
                                                                                                                  (0.01)
Observations  10000     10000     10000     10000     10000     10000     10000     10000     10000     10000     10000
R2            0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.000     0.001
R2 adjusted   0.000     -0.000    -0.000    -0.000    -0.000    -0.000    -0.000    -0.000    -0.000    -0.000    -0.000
Standard errors in parentheses.   * p<0.05   ** p<0.01   *** p<0.001

F-statistics for general population models

##      Models Fstatistics
## 1    Model1   0.3562914
## 2   Model 2   0.5751030
## 3   Model 3   0.6008714
## 4   Model 4   0.4564295
## 5   Model 5   0.5387954
## 6   Model 6   0.4807844
## 7   Model 7   0.5631786
## 8   Model 8   0.5012680
## 9   Model 9   0.4711361
## 10 Model 10   0.5604867

Root mean squared error for general population models

##      Models     RMSE
## 1    Model1 2.002989
## 2   Model 2 2.002909
## 3   Model 3 2.002844
## 4   Model 4 2.002841
## 5   Model 5 2.002754
## 6   Model 6 2.002735
## 7   Model 7 2.002629
## 8   Model 8 2.002622
## 9   Model 9 2.002599
## 10 Model 10 2.002462

Comments

The smaller the sample, the higher the variance of the estimates, so the 100-case analysis shows relatively large coefficients that reflect spurious relationships. In a large sample, or in the general population, the estimates are much less dispersed and lie closer to the true values (0 for every slope in my models). Accordingly, when we analyze the general population, the RMSE is very close to the true error standard deviation of 2 and barely changes across models, while the F statistics remain far from significance.
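The shrinking dispersion can be illustrated with a small simulation. This is a sketch, assuming a single predictor and an independent outcome drawn from the same distributions as the first predictor and the outcome above. The function name `slope_sd`, the seed and the number of replications are choices made for this illustration. For a true slope of 0, the theoretical standard error is roughly sd(y) / (sd(x) * sqrt(n)), about 0.095 for n = 100 and 0.0095 for n = 10000, in line with the standard errors shown in the two regression tables.

```r
set.seed(42)  # arbitrary seed for this illustration

# Standard deviation of the OLS slope across repeated samples of size n
slope_sd <- function(n, reps = 200) {
  slopes <- replicate(reps, {
    x <- rnorm(n, mean = 3, sd = 2.1)
    y <- rnorm(n, mean = 5, sd = 2)  # y is independent of x, so the true slope is 0
    cov(x, y) / var(x)               # OLS slope of y on x
  })
  sd(slopes)
}

sd_small <- slope_sd(100)    # sample-sized data
sd_large <- slope_sd(10000)  # population-sized data

print(c(n_100 = sd_small, n_10000 = sd_large, ratio = sd_small / sd_large))
```

Increasing n by a factor of 100 should shrink the spread of the slope estimates by a factor of about 10.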

Similarly, with only 100 observations, both R-squared and adjusted R-squared vary widely around the population value. In the general population these statistics are far less biased: the proportion of variance in the dependent variable that is explained by the independent variables is essentially zero. Our predictors do not explain anything about how the dependent variable changes.
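The upward bias of R-squared can be quantified. When the outcome is unrelated to p predictors and the errors are normal, the expected R-squared of a model with an intercept is p / (n - 1). The sketch below compares this value with a small simulation; the names `expected_r2` and `sim_r2`, the seed and the number of replications are assumptions made for this example.

```r
set.seed(7)  # arbitrary seed for this illustration

# Expected R-squared under the null with p predictors and n observations
expected_r2 <- function(p, n) p / (n - 1)

exp_small <- expected_r2(p = 10, n = 100)    # about 0.101
exp_large <- expected_r2(p = 10, n = 10000)  # about 0.001

# Simulated R-squared for n = 100 with 10 pure-noise predictors
sim_r2 <- replicate(500, {
  d <- as.data.frame(matrix(rnorm(100 * 10), ncol = 10))
  d$y <- rnorm(100, mean = 5, sd = 2)
  summary(lm(y ~ ., data = d))$r.squared
})

print(c(expected = exp_small, simulated = mean(sim_r2)))
print(exp_large)
```

With 10 000 observations the expected value drops to roughly 0.001, which agrees with the R2 reported for Model 10 in the population table.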