Task 1

Root mean squared error: √(RSS / (n − 1)) = 1.05
Standard error of the slope: SE(β̂_1) = Root MSE / √[(n − 1) s²_X] = 1.05 / √(9 × 1.15) ≈ 0.32
t value for the null hypothesis that β_1 = 0: t = β̂_1 / SE(β̂_1) = 0.92 / 0.32 = 2.875. The two-sided p-value P(|t| > 2.875) lies between 0.01 and 0.02, so P(|t| ≤ 2.875) lies between 0.98 and 0.99.
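95% confidence interval for β_1: β̂_1 ± t_crit × SE(β̂_1), where t_crit = 2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom: 0.92 ± 2.262 × 0.32 = [0.20; 1.64]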
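
The Task 1 arithmetic can be re-run in R as a quick check. This is only a sketch: the inputs (root MSE 1.05, n = 10, s²_X = 1.15, β̂_1 = 0.92) are read off the lines above, the variable names are illustrative, and the degrees of freedom are assumed to be n − 1 = 9, following the divisor used in the root MSE formula. The interval is built from the critical value of the t distribution, not from the t statistic itself. Because the standard error is not rounded before dividing, the t value comes out slightly below the hand-computed 2.875.

```r
# Inputs taken from the Task 1 figures
root_mse <- 1.05   # root mean squared error
n        <- 10     # implied by (n - 1) = 9
s2_x     <- 1.15   # sample variance of X
beta1    <- 0.92   # estimated slope

# Assumption: degrees of freedom follow the (n - 1) divisor used in the text
df <- n - 1

se_beta1 <- root_mse / sqrt((n - 1) * s2_x)
t_value  <- beta1 / se_beta1
p_value  <- 2 * pt(abs(t_value), df = df, lower.tail = FALSE)

# 95% interval: estimate +/- critical value * standard error
t_crit <- qt(0.975, df = df)
ci     <- beta1 + c(-1, 1) * t_crit * se_beta1

print(round(c(se = se_beta1, t = t_value, p = p_value), 3))
print(round(ci, 2))
```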

Task 2

Generating population and sample

library(sjPlot)  # provides tab_model(), used for the regression tables below

set.seed(516)

population <- rnorm(n = 10000, 10, 2)

ran_sample <- sample(population, 100)


par(
  mfrow=c(1,2),
  mar=c(4,4,1,0)
)
hist(population, xlab = "Random value (X)", col = "#404080",
     main = "The distribution of the population", cex.lab = 1, cex.axis = 1, cex.main = 1)
hist(ran_sample, xlab = "Random value (X)", col = "#69b3a2",
     main = "The distribution of the sample", cex.lab = 1, cex.axis = 1, cex.main = 1)

Sample estimations

# Only the length of ran_sample is used here: each predictor is an
# independent normal draw, unrelated to the sampled values and to y.
n_obs <- length(ran_sample)

x1 <- rnorm(n = n_obs, mean = 3, sd = 2.1)
x2 <- rnorm(n = n_obs, mean = 2, sd = 2.8)
x3 <- rnorm(n = n_obs, mean = 2.9, sd = 9)
x4 <- rnorm(n = n_obs, mean = 14, sd = 10.5)
x5 <- rnorm(n = n_obs, mean = 8.2, sd = 4.4)
x6 <- rnorm(n = n_obs, mean = 5.6, sd = 10.3)
x7 <- rnorm(n = n_obs, mean = 0.5, sd = 1.5)
x8 <- rnorm(n = n_obs, mean = 3.3, sd = 1)
x9 <- rnorm(n = n_obs, mean = 8.1, sd = 2.1)
x10 <- rnorm(n = n_obs, mean = 4.5, sd = 3)

# Outcome generated independently of every predictor (true slopes are 0)
y <- rnorm(n = n_obs, mean = 5, sd = 2)

mod0 <- lm(y ~ 1)
mod1 <- lm(y ~ x1)
mod2 <- lm(y ~ x1 + x2)
mod3 <- lm(y ~ x1 + x2 + x3)
mod4 <- lm(y ~ x1 + x2 + x3 + x4)
mod5 <- lm(y ~ x1 + x2 + x3 + x4 + x5)
mod6 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6)
mod7 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7)
mod8 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8)
mod9 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9)
mod10 <- lm(y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 + x8 + x9 + x10)

tab_model(mod0, mod1, mod2, mod3, mod4, mod5, mod6, mod7, mod8, mod9, mod10, p.style = "stars", show.ci = FALSE, show.se = TRUE, show.fstat = TRUE, CSS = list(
    css.depvarhead = 'color: red;',
    css.centeralign = 'text-align: left;', 
    css.firsttablecol = 'font-weight: bold;', 
    css.summary = 'color: blue;'
  ),
  dv.labels = c('Model 0', 'Model 1', 'Model 2', 'Model 3', 'Model 4', 'Model 5', 'Model 6', 'Model 7', 'Model 8', 'Model 9', 'Model 10'))

	Model 0		Model 1		Model 2		Model 3		Model 4		Model 5		Model 6		Model 7		Model 8		Model 9		Model 10
Predictors	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error
(Intercept)	4.95 ***	0.20	4.85 ***	0.30	4.84 ***	0.33	4.79 ***	0.35	5.17 ***	0.45	5.26 ***	0.68	5.13 ***	0.70	5.14 ***	0.70	5.98 ***	0.96	6.27 ***	1.32	6.68 ***	1.40
x1			0.04	0.09	0.04	0.09	0.04	0.09	0.05	0.09	0.05	0.09	0.06	0.10	0.06	0.10	0.04	0.10	0.04	0.10	0.05	0.10
x2					0.01	0.08	0.00	0.08	0.01	0.08	0.01	0.08	-0.00	0.08	-0.01	0.08	0.00	0.08	0.01	0.08	-0.00	0.08
x3							0.01	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02	0.02
x4									-0.03	0.02	-0.03	0.02	-0.03	0.02	-0.03	0.02	-0.03	0.02	-0.03	0.02	-0.03	0.02
x5											-0.01	0.05	-0.01	0.05	-0.01	0.05	0.00	0.05	0.00	0.05	0.01	0.05
x6													0.02	0.02	0.02	0.02	0.01	0.02	0.01	0.02	0.02	0.02
x7															0.04	0.15	-0.01	0.15	-0.01	0.15	-0.00	0.15
x8																	-0.27	0.21	-0.28	0.21	-0.26	0.21
x9																			-0.04	0.12	-0.07	0.12
x10																					-0.07	0.08
Observations	100		100		100		100		100		100		100		100		100		100		100
R² / R² adjusted	0.000 / 0.000		0.002 / -0.008		0.002 / -0.019		0.006 / -0.025		0.023 / -0.018		0.023 / -0.028		0.029 / -0.034		0.030 / -0.044		0.047 / -0.037		0.048 / -0.047		0.057 / -0.049
* p<0.05   ** p<0.01   *** p<0.001

F-statistics for sample models

##      Models Fstatistics
## 1    Model1  0.18421557
## 2   Model 2  0.09793601
## 3   Model 3  0.18005965
## 4   Model 4  0.56366095
## 5   Model 5  0.45187139
## 6   Model 6  0.46270030
## 7   Model 7  0.40122589
## 8   Model 8  0.56391772
## 9   Model 9  0.50744888
## 10 Model 10  0.53645863
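
The code that produced this table is not shown. One way such a table could be assembled is to read the `fstatistic` element of each `summary.lm` object. The sketch below runs on freshly simulated data, not on the models above, and `dat`, `fits` and `f_table` are illustrative names.

```r
set.seed(1)  # arbitrary seed for this illustration only

# Illustrative data: y is unrelated to all ten predictors
n   <- 100
dat <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
names(dat) <- paste0("x", 1:10)
dat$y <- rnorm(n, mean = 5, sd = 2)

# Fit the nested models y ~ x1, y ~ x1 + x2, ..., y ~ x1 + ... + x10
fits <- lapply(1:10, function(k) {
  lm(reformulate(paste0("x", 1:k), response = "y"), data = dat)
})

# summary()$fstatistic holds the F value plus its two degrees of freedom
f_table <- data.frame(
  Models      = paste("Model", 1:10),
  Fstatistics = sapply(fits, function(m) summary(m)$fstatistic[["value"]])
)
print(f_table)
```

The intercept-only model has no overall F statistic (`summary()` returns `NULL` for it), which is why the table starts at Model 1.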

Root mean squared error for sample models

##      Models     RMES
## 1    Model1 1.991549
## 2   Model 2 1.991410
## 3   Model 3 1.987835
## 4   Model 4 1.970178
## 5   Model 5 1.969886
## 6   Model 6 1.964316
## 7   Model 7 1.963671
## 8   Model 8 1.945772
## 9   Model 9 1.944689
## 10 Model 10 1.935928
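
The code behind this table is not shown either. The values appear consistent with RMSE computed as the square root of the mean squared residual, with no degrees-of-freedom correction, but that definition is an assumption. A sketch on freshly simulated data (`rmse` and `rmse_table` are illustrative names):

```r
set.seed(2)  # arbitrary seed for this illustration only

# Illustrative data: y is unrelated to all ten predictors
n   <- 100
dat <- as.data.frame(matrix(rnorm(n * 10), nrow = n))
names(dat) <- paste0("x", 1:10)
dat$y <- rnorm(n, mean = 5, sd = 2)

# Assumed definition: RMSE = sqrt(mean(residual^2)), no df correction
rmse <- function(model) sqrt(mean(residuals(model)^2))

# RMSE for the nested models y ~ x1, ..., y ~ x1 + ... + x10
rmse_table <- data.frame(
  Models = paste("Model", 1:10),
  RMSE   = sapply(1:10, function(k) {
    rmse(lm(reformulate(paste0("x", 1:k), response = "y"), data = dat))
  })
)
print(rmse_table)
```

Under this definition the in-sample RMSE of an OLS model cannot increase when a predictor is added, so a falling RMSE on its own does not show that the new predictor is useful.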

Comments:

Looking at R², we can see that as predictors are added to the model, the proportion of explained variance in the dependent variable increases (from 0.002 for Model 1 to 0.057 for Model 10). Models with more terms may therefore appear to fit better, but the improvement is spurious: every predictor was generated independently of y.

As for adjusted R², its negative values mean that, once the number of predictors is penalised, the models explain no more of the variance in y than its simple mean does.
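
The negative values follow from the usual adjustment formula, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where p is the number of predictors. A small sketch follows. The helper `adj_r2` is an illustrative name, and the inputs are the rounded R² values from the table, so the results agree with it only to about three decimals.

```r
# Adjusted R-squared from R-squared, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

adj_model1  <- adj_r2(r2 = 0.002, n = 100, p = 1)
adj_model10 <- adj_r2(r2 = 0.057, n = 100, p = 10)

print(round(c(model1 = adj_model1, model10 = adj_model10), 3))
```

Both results reproduce the table entries for Model 1 and Model 10 (−0.008 and −0.049).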

As for the F statistic, every value is below 1, so none of the models is jointly significant: we cannot reject the null hypothesis that all slope coefficients are zero.
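
The reported F values can be converted to p-values with `pf()`. This sketch assumes that Model k has k numerator and 100 − k − 1 denominator degrees of freedom (k predictors plus an intercept, 100 observations). The vector `f_stats` is copied from the output above.

```r
# F statistics for Models 1 to 10, copied from the table above
f_stats <- c(0.18421557, 0.09793601, 0.18005965, 0.56366095, 0.45187139,
             0.46270030, 0.40122589, 0.56391772, 0.50744888, 0.53645863)

k <- seq_along(f_stats)  # number of predictors in Model k
n <- 100                 # observations in the sample

# Upper-tail probability of the F distribution under the null hypothesis
p_values <- pf(f_stats, df1 = k, df2 = n - k - 1, lower.tail = FALSE)
print(round(p_values, 3))
```

All ten p-values are far above conventional significance levels.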

The root mean squared error tells us how concentrated the data are around the line of best fit. It falls slightly as predictors are added (from 1.99 for Model 1 to 1.94 for Model 10), so the in-sample fit appears to improve, although the gain is negligible.

Task 3

Population estimations

# Only the length of population is used here: each predictor is an
# independent normal draw, unrelated to the population values and to y1.
n_pop <- length(population)

x_1 <- rnorm(n = n_pop, mean = 3, sd = 2.1)
x_2 <- rnorm(n = n_pop, mean = 2, sd = 2.8)
x_3 <- rnorm(n = n_pop, mean = 2.9, sd = 9)
x_4 <- rnorm(n = n_pop, mean = 14, sd = 10.5)
x_5 <- rnorm(n = n_pop, mean = 8.2, sd = 4.4)
x_6 <- rnorm(n = n_pop, mean = 5.6, sd = 10.3)
x_7 <- rnorm(n = n_pop, mean = 0.5, sd = 1.5)
x_8 <- rnorm(n = n_pop, mean = 3.3, sd = 1)
x_9 <- rnorm(n = n_pop, mean = 8.1, sd = 2.1)
x_10 <- rnorm(n = n_pop, mean = 4.5, sd = 3)
y1 <- rnorm(n = n_pop, mean = 5, sd = 2)

mod_0 <- lm(y1 ~ 1)
mod_1 <- lm(y1 ~ x_1)
mod_2 <- lm(y1 ~ x_1 + x_2)
mod_3 <- lm(y1 ~ x_1 + x_2 + x_3)
mod_4 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4)
mod_5 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5)
mod_6 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6)
mod_7 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7)
mod_8 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8)
mod_9 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9)
mod_10 <- lm(y1 ~ x_1 + x_2 + x_3 + x_4 + x_5 + x_6 + x_7 + x_8 + x_9 + x_10)

tab_model(mod_0, mod_1, mod_2, mod_3, mod_4, mod_5, mod_6, mod_7, mod_8, mod_9, mod_10, p.style = "stars", show.ci = FALSE, show.se = TRUE, show.fstat = TRUE, CSS = list(
    css.depvarhead = 'color: red;',
    css.centeralign = 'text-align: left;', 
    css.firsttablecol = 'font-weight: bold;', 
    css.summary = 'color: blue;'
  ),
  dv.labels = c('Model 0', 'Model 1', 'Model 2', 'Model 3', 'Model 4', 'Model 5', 'Model 6', 'Model 7', 'Model 8', 'Model 9', 'Model 10'))

	Model 0		Model 1		Model 2		Model 3		Model 4		Model 5		Model 6		Model 7		Model 8		Model 9		Model 10
Predictors	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error	Estimates	std. Error
(Intercept)	5.00 ***	0.02	5.02 ***	0.04	5.03 ***	0.04	5.04 ***	0.04	5.04 ***	0.05	5.08 ***	0.06	5.07 ***	0.06	5.07 ***	0.06	5.05 ***	0.09	5.01 ***	0.12	5.05 ***	0.12
x_1			-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01
x_2					-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01	-0.01	0.01
x_3							-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00
x_4									-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00
x_5											-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00	-0.00	0.00
x_6													0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00	0.00
x_7															0.01	0.01	0.01	0.01	0.01	0.01	0.01	0.01
x_8																	0.01	0.02	0.01	0.02	0.01	0.02
x_9																			0.00	0.01	0.00	0.01
x_10																					-0.01	0.01
Observations	10000		10000		10000		10000		10000		10000		10000		10000		10000		10000		10000
R² / R² adjusted	0.000 / 0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.000 / -0.000		0.001 / -0.000
* p<0.05   ** p<0.01   *** p<0.001

F-statistics for general population models

##      Models Fstatistics
## 1    Model1   0.3562914
## 2   Model 2   0.5751030
## 3   Model 3   0.6008714
## 4   Model 4   0.4564295
## 5   Model 5   0.5387954
## 6   Model 6   0.4807844
## 7   Model 7   0.5631786
## 8   Model 8   0.5012680
## 9   Model 9   0.4711361
## 10 Model 10   0.5604867

Root mean squared error for general population models

##      Models     RMES
## 1    Model1 2.002989
## 2   Model 2 2.002909
## 3   Model 3 2.002844
## 4   Model 4 2.002841
## 5   Model 5 2.002754
## 6   Model 6 2.002735
## 7   Model 7 2.002629
## 8   Model 8 2.002622
## 9   Model 9 2.002599
## 10 Model 10 2.002462

Comments

The smaller the sample, the higher the variance of the estimates, so the 100-case analysis shows relatively large coefficients that reflect spurious relationships. In a large sample, or in the general population, the estimates are much less dispersed and lie closer to the true values (0 for every slope in these models). Accordingly, when we analyse the general population, the standard errors shrink, the RMSE is almost identical across models (about 2.00, the standard deviation used to generate y), and the F statistics remain below 1.
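
A small simulation illustrates the point about the variance of the estimates. It is only a sketch: the helper `slope_once`, the seed, the 200 replications and the single-predictor setup are choices made for this illustration and are not part of the analysis above.

```r
set.seed(3)  # arbitrary seed for this illustration only

# Slope from regressing y on one unrelated predictor, for a sample of size n
slope_once <- function(n) {
  x <- rnorm(n, mean = 3, sd = 2.1)
  y <- rnorm(n, mean = 5, sd = 2)
  cov(x, y) / var(x)  # OLS slope in simple regression
}

slopes_100   <- replicate(200, slope_once(100))
slopes_10000 <- replicate(200, slope_once(10000))

# Spread of the slope estimates around the true value of 0
sd_100   <- sd(slopes_100)
sd_10000 <- sd(slopes_10000)
print(c(sd_n100 = sd_100, sd_n10000 = sd_10000, ratio = sd_100 / sd_10000))
```

The true slope is 0 in both cases. Since the standard error scales with 1/√n, the spread should shrink by roughly a factor of √(10000/100) = 10.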

In the same vein, when we analyse only 100 observations, both R² and adjusted R² vary widely around the population value. In the general population these statistics are less biased: the proportion of variance in the dependent variable that is explained by the independent variables is practically zero. Our predictors do not explain anything about how the dependent variable changes.
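
With normally distributed y generated independently of the p predictors, as in this setup, the expected value of R² is p/(n − 1). This is why it is not exactly zero in a small sample. A sketch of the calculation (`expected_r2` is an illustrative helper name):

```r
# Expected R-squared when y is unrelated to all p predictors
expected_r2 <- function(p, n) p / (n - 1)

r2_sample     <- expected_r2(p = 10, n = 100)
r2_population <- expected_r2(p = 10, n = 10000)

print(round(c(n100 = r2_sample, n10000 = r2_population), 3))
```

For ten predictors this gives about 0.101 with 100 observations and 0.001 with 10000. The second figure matches the R² reported for Model 10 on the general population. The sample value of 0.057 is a single draw from a distribution centred near 0.10.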

Week3 Assignment

Eleonora Minaeva

2022-10-19