The model should have at least 3 independent variables.
Data
Choose your data.
I will continue with the data for my OLS point estimates in OLS_matrixVSlm in W1.
remove(list=ls())
# install.packages("MASS")
library(MASS)
help(Boston)
str(Boston) # 506 rows and 14 columns.
## 'data.frame': 506 obs. of 14 variables:
## $ crim : num 0.00632 0.02731 0.02729 0.03237 0.06905 ...
## $ zn : num 18 0 0 0 0 0 12.5 12.5 12.5 12.5 ...
## $ indus : num 2.31 7.07 7.07 2.18 2.18 2.18 7.87 7.87 7.87 7.87 ...
## $ chas : int 0 0 0 0 0 0 0 0 0 0 ...
## $ nox : num 0.538 0.469 0.469 0.458 0.458 0.458 0.524 0.524 0.524 0.524 ...
## $ rm : num 6.58 6.42 7.18 7 7.15 ...
## $ age : num 65.2 78.9 61.1 45.8 54.2 58.7 66.6 96.1 100 85.9 ...
## $ dis : num 4.09 4.97 4.97 6.06 6.06 ...
## $ rad : int 1 2 2 3 3 3 5 5 5 5 ...
## $ tax : num 296 242 242 222 222 222 311 311 311 311 ...
## $ ptratio: num 15.3 17.8 17.8 18.7 18.7 18.7 15.2 15.2 15.2 15.2 ...
## $ black : num 397 397 393 395 397 ...
## $ lstat : num 4.98 9.14 4.03 2.94 5.33 ...
## $ medv : num 24 21.6 34.7 33.4 36.2 28.7 22.9 27.1 16.5 18.9 ...
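Before modelling, a quick sanity check can confirm that the data loaded as expected. This minimal sketch assumes only the MASS package; it checks the dimensions reported by str() and whether any values are missing.

```r
# Minimal sanity check on the Boston data (needs only the MASS package)
Boston <- MASS::Boston

# Should be 506 rows (towns) and 14 columns (variables), as reported by str()
dim(Boston)

# FALSE means there are no missing values anywhere in the data frame
anyNA(Boston)
```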
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::select() masks MASS::select()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
glimpse(Boston)
## Rows: 506
## Columns: 14
## $ crim <dbl> 0.00632, 0.02731, 0.02729, 0.03237, 0.06905, 0.02985, 0.08829,…
## $ zn <dbl> 18.0, 0.0, 0.0, 0.0, 0.0, 0.0, 12.5, 12.5, 12.5, 12.5, 12.5, 1…
## $ indus <dbl> 2.31, 7.07, 7.07, 2.18, 2.18, 2.18, 7.87, 7.87, 7.87, 7.87, 7.…
## $ chas <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ nox <dbl> 0.538, 0.469, 0.469, 0.458, 0.458, 0.458, 0.524, 0.524, 0.524,…
## $ rm <dbl> 6.575, 6.421, 7.185, 6.998, 7.147, 6.430, 6.012, 6.172, 5.631,…
## $ age <dbl> 65.2, 78.9, 61.1, 45.8, 54.2, 58.7, 66.6, 96.1, 100.0, 85.9, 9…
## $ dis <dbl> 4.0900, 4.9671, 4.9671, 6.0622, 6.0622, 6.0622, 5.5605, 5.9505…
## $ rad <int> 1, 2, 2, 3, 3, 3, 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ tax <dbl> 296, 242, 242, 222, 222, 222, 311, 311, 311, 311, 311, 311, 31…
## $ ptratio <dbl> 15.3, 17.8, 17.8, 18.7, 18.7, 18.7, 15.2, 15.2, 15.2, 15.2, 15…
## $ black <dbl> 396.90, 396.90, 392.83, 394.63, 396.90, 394.12, 395.60, 396.90…
## $ lstat <dbl> 4.98, 9.14, 4.03, 2.94, 5.33, 5.21, 12.43, 19.15, 29.93, 17.10…
## $ medv <dbl> 24.0, 21.6, 34.7, 33.4, 36.2, 28.7, 22.9, 27.1, 16.5, 18.9, 15…
# install.packages("psych")
library(psych)
##
## Attaching package: 'psych'
##
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
describe(Boston)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## crim 1 506 3.61 8.60 0.26 1.68 0.33 0.01 88.98 88.97 5.19 36.60 0.38
## zn 2 506 11.36 23.32 0.00 5.08 0.00 0.00 100.00 100.00 2.21 3.95 1.04
## indus 3 506 11.14 6.86 9.69 10.93 9.37 0.46 27.74 27.28 0.29 -1.24 0.30
## chas 4 506 0.07 0.25 0.00 0.00 0.00 0.00 1.00 1.00 3.39 9.48 0.01
## nox 5 506 0.55 0.12 0.54 0.55 0.13 0.38 0.87 0.49 0.72 -0.09 0.01
## rm 6 506 6.28 0.70 6.21 6.25 0.51 3.56 8.78 5.22 0.40 1.84 0.03
## age 7 506 68.57 28.15 77.50 71.20 28.98 2.90 100.00 97.10 -0.60 -0.98 1.25
## dis 8 506 3.80 2.11 3.21 3.54 1.91 1.13 12.13 11.00 1.01 0.46 0.09
## rad 9 506 9.55 8.71 5.00 8.73 2.97 1.00 24.00 23.00 1.00 -0.88 0.39
## tax 10 506 408.24 168.54 330.00 400.04 108.23 187.00 711.00 524.00 0.67 -1.15 7.49
## ptratio 11 506 18.46 2.16 19.05 18.66 1.70 12.60 22.00 9.40 -0.80 -0.30 0.10
## black 12 506 356.67 91.29 391.44 383.17 8.09 0.32 396.90 396.58 -2.87 7.10 4.06
## lstat 13 506 12.65 7.14 11.36 11.90 7.11 1.73 37.97 36.24 0.90 0.46 0.32
## medv 14 506 22.53 9.20 21.20 21.56 5.93 5.00 50.00 45.00 1.10 1.45 0.41
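The describe() table covers all 14 columns. To focus on the variables that enter the regression specification of this section (medv, crim, indus and black), a base-R sketch like the following reproduces the main summary columns for just those variables. The object names model_vars and desc are illustrative, not part of the original analysis.

```r
boston <- MASS::Boston
model_vars <- c("medv", "crim", "indus", "black")

# sapply() returns one column per variable; t() puts variables in rows
desc <- t(sapply(boston[model_vars], function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x),
    min = min(x), max = max(x))
}))

# Rounded to 2 decimals; values should match the corresponding describe() rows
round(desc, 2)
```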
# save the Boston data from the MASS package as a data frame
Boston <- MASS::Boston
Main Regression Model
Specification
Specify your linear regression:
\(\text{Median Home Value}_{i} = \beta_0 + \beta_1 \, \text{Per Capita Crime}_{i} + \beta_2 \, \text{Prop. of Non-retail Business}_{i} + \beta_3 \, \text{Prop. of Black}_{i} + \epsilon_{i}\)
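The specification maps onto the Boston columns medv (median value of owner-occupied homes in $1000s), crim (per capita crime rate by town), indus (proportion of non-retail business acres per town) and black. Note that, according to help(Boston), black is not a raw proportion but the index 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town. The sketch below assumes that column mapping, estimates the model with lm(), and, in the spirit of OLS_matrixVSlm, checks the point estimates against the matrix formula \(\hat{\beta} = (X'X)^{-1}X'y\).

```r
# Assumed column mapping (from help(Boston)):
#   medv  = median home value ($1000s)    crim  = per capita crime rate
#   indus = prop. non-retail business acres  black = 1000(Bk - 0.63)^2
boston <- MASS::Boston

# OLS via lm()
fit <- lm(medv ~ crim + indus + black, data = boston)
summary(fit)

# OLS via matrices: model.matrix() adds the column of ones for the intercept
X <- model.matrix(medv ~ crim + indus + black, data = boston)
y <- boston$medv
beta_hat <- solve(crossprod(X), crossprod(X, y))  # solves (X'X) b = X'y

# Side-by-side comparison; the two should agree up to floating-point error
cbind(lm = coef(fit), matrix = drop(beta_hat))
all.equal(unname(coef(fit)), unname(drop(beta_hat)))
```

Solving the normal equations with solve(A, b) avoids forming the inverse of X'X explicitly, which is numerically preferable to solve(crossprod(X)) %*% crossprod(X, y) while giving the same estimates here.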
where \(i\) indexes towns. From the subscripts you can already tell that the data are cross-sectional. Type help(Boston) to read the exact variable descriptions if you wish.
RECOMMENDATION: Try using equation mode to type out your specification with subscripts. See Typesetting Equations for details. If short on time, just tell us what your dependent variable is and what your independent variables are.
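The dependent and independent variables can also be read directly off a model formula. This sketch assumes the specification above corresponds to the formula medv ~ crim + indus + black; it extracts the variable names with all.vars() (left-hand side first) and checks the requirement of at least 3 independent variables. The names spec, dependent and independent are illustrative.

```r
# Model formula assumed to match the specification above
spec <- medv ~ crim + indus + black

# all.vars() lists the left-hand-side variable first, then the regressors
dependent   <- all.vars(spec)[1]
independent <- all.vars(spec)[-1]

dependent    # "medv"
independent  # "crim" "indus" "black"

# Stops with an error if fewer than 3 independent variables are specified
stopifnot(length(independent) >= 3)
```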