Harold Nelson
2025-12-02
Create x and y as vectors of length 100 by sampling from normal distributions with different means and standard deviations.
Note that x and y are completely independent by construction. Fit a linear model of y on x to verify: the slope on x should not be significant.
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3246 -0.6310  0.0063  0.6159  2.8186
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 15.73946    0.50295   31.29   <2e-16 ***
## x           -0.13018    0.09941   -1.31    0.193
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.028 on 98 degrees of freedom
## Multiple R-squared: 0.0172, Adjusted R-squared: 0.007169
## F-statistic: 1.715 on 1 and 98 DF, p-value: 0.1934
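The code that produced the output above is not shown. Here is a minimal sketch of it. It assumes means of 5 and 15, which agree with the time slopes reported later. The standard deviations of 1 and 2 are our own choice, and the seed is hypothetical, so the numbers will not match the output exactly.

```r
# Illustrative sketch: the author's seed and parameters are not shown.
# Means 5 and 15 are inferred from the later time slopes; sds 1 and 2 are assumptions.
set.seed(123)  # hypothetical seed, for reproducibility only
x <- rnorm(100, mean = 5, sd = 1)
y <- rnorm(100, mean = 15, sd = 2)

# y is generated without any reference to x, so the fitted slope should be near zero
fit_xy <- lm(y ~ x)
summary(fit_xy)
```

Look at the `x` row of the coefficient table. In most runs its p-value will be well above 0.05.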
Create sumx and sumy as the partial (cumulative) sums of x and y. Also create time, containing the integers 1 to 100.
Fit a linear model of sumy on sumx to see whether they are correlated.
##
## Call:
## lm(formula = sumy ~ sumx)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -22.727  -6.678  -0.396   7.139  26.689
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.64715    2.19931  -0.749    0.456
## sumx         3.00862    0.00755 398.494   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.89 on 98 degrees of freedom
## Multiple R-squared: 0.9994, Adjusted R-squared: 0.9994
## F-statistic: 1.588e+05 on 1 and 98 DF, p-value: < 2.2e-16
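Here is a sketch of this step. It regenerates x and y with the same assumed parameters and hypothetical seed so that it runs on its own. `cumsum()` computes the running (partial) sums.

```r
# Regenerate the independent series (assumed parameters, hypothetical seed)
set.seed(123)
x <- rnorm(100, mean = 5, sd = 1)
y <- rnorm(100, mean = 15, sd = 2)

# cumsum() gives partial sums: sumx[i] = x[1] + x[2] + ... + x[i]
sumx <- cumsum(x)
sumy <- cumsum(y)
time <- 1:100

# Regress one partial sum on the other
fit_sums <- lm(sumy ~ sumx)
summary(fit_sums)
```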
Fit linear models of sumx and sumy on time to show that both depend on time.
##
## Call:
## lm(formula = sumx ~ time)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.1843 -1.9542  0.0365  2.0181  5.8354
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.93235    0.59511   1.567     0.12
## time         4.99375    0.01023 488.101   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.953 on 98 degrees of freedom
## Multiple R-squared: 0.9996, Adjusted R-squared: 0.9996
## F-statistic: 2.382e+05 on 1 and 98 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = sumy ~ time)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.1508 -3.2456 -0.4177  3.0869  7.7560
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.79513    0.74977    1.06    0.292
## time        15.03149    0.01289 1166.15   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.721 on 98 degrees of freedom
## Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
## F-statistic: 1.36e+06 on 1 and 98 DF, p-value: < 2.2e-16
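Here is a sketch of the two time regressions, using the same assumed parameters and hypothetical seed. Each partial sum grows by roughly its mean at every step, so the fitted time slopes should land near the assumed means of 5 and 15. The output above shows slopes of about 4.99 and 15.03.

```r
# Regenerate the series (assumed parameters, hypothetical seed)
set.seed(123)
x <- rnorm(100, mean = 5, sd = 1)
y <- rnorm(100, mean = 15, sd = 2)
sumx <- cumsum(x)
sumy <- cumsum(y)
time <- 1:100

# Both partial sums trend almost linearly with time
fit_sumx_time <- lm(sumx ~ time)
fit_sumy_time <- lm(sumy ~ time)
summary(fit_sumx_time)
summary(fit_sumy_time)

# The slope on time estimates the average step size, i.e. the mean of x or y
coef(fit_sumx_time)[["time"]]
coef(fit_sumy_time)[["time"]]
```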
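When two variables A and B are both highly correlated with a third variable C, A and B will also be highly correlated with each other statistically, even when there is no real relationship between them. Here, sumx and sumy both grow almost linearly with time (R-squared of 0.9996 and 0.9999), so the regression of sumy on sumx reports an R-squared of 0.9994 and a t value of about 398, even though x and y are independent.
This is a real problem in economics because many economic time series are highly correlated with time.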
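How often does this trap spring? The sketch below is our own illustration, not part of the original analysis. It repeats the experiment 500 times with the assumed means and standard deviations used above and a hypothetical seed. Each time it records whether the slope is "significant" at the 5% level. It does this first for the raw independent series and then for their partial sums.

```r
# Monte Carlo sketch (assumed parameters, hypothetical seed)
set.seed(42)
n_sims <- 500

# Extract the slope's p-value: row 2 is the slope, column 4 is Pr(>|t|)
slope_p_value <- function(fit) {
  summary(fit)$coefficients[2, 4]
}

p_raw <- numeric(n_sims)
p_sums <- numeric(n_sims)
for (i in seq_len(n_sims)) {
  x <- rnorm(100, mean = 5, sd = 1)
  y <- rnorm(100, mean = 15, sd = 2)
  p_raw[i] <- slope_p_value(lm(y ~ x))                     # independent raw series
  p_sums[i] <- slope_p_value(lm(cumsum(y) ~ cumsum(x)))    # trending partial sums
}

# Share of regressions that look "significant" at the 5% level
mean(p_raw < 0.05)
mean(p_sums < 0.05)
```

With the raw series the rejection rate should sit near the nominal 5%. With the partial sums, nearly every regression reports a "significant" slope, even though x and y are never related.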