The selected data set, MurderRates (from the AER package), contains cross-sectional data from 1950 on 44 U.S. states (the states are not named) and is used to analyze the determinants of murder rates in the United States. The data frame consists of 44 observations on 8 variables:
rate: murder rate per 100,000 (FBI estimate, 1950)
convictions: number of convictions divided by number of murders in 1950
executions: average number of executions during 1946-1950 divided by convictions in 1950
time: median time served (in months) of convicted murderers released in 1951
income: median family income in 1949 (in 1,000 USD)
lfp: labor force participation rate in 1950 (in percent)
noncauc: proportion of population that is non-Caucasian in 1950
southern: factor indicating whether the state is in the South, with levels "no" and "yes" (I converted this variable to a dummy variable where 1 = yes and 0 = no)
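Because southern is a factor, lm() does not actually need the manual conversion: with R's default treatment contrasts, a two-level factor is encoded as one 0/1 column, which is why the regression output labels the coefficient southernyes. A minimal base-R sketch on a hypothetical four-row data frame (not MurderRates) suggests that this column is identical to an ifelse() dummy:

```r
# Hypothetical four-row data frame with the same factor levels as southern
toy <- data.frame(southern = factor(c("yes", "no", "no", "yes"),
                                    levels = c("no", "yes")))

# model.matrix() builds the design matrix that lm() uses; with the default
# treatment contrasts, "no" is the baseline and "yes" becomes a 0/1 column
mm <- model.matrix(~ southern, data = toy)

# Manual dummy coding, as done later in this analysis
manual <- ifelse(toy$southern == "yes", 1, 0)

print(mm)
all(mm[, "southernyes"] == manual)
```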
Our target variable is the murder rate per 100,000 (FBI estimate, 1950), the column titled rate. The model to be estimated is:
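\[
\begin{aligned}
\text{MurderRate}_i ={}& \beta_0 + \beta_1\,\text{convictions}_i + \beta_2\,\text{executions}_i + \beta_3\,\text{time}_i + \beta_4\,\text{income}_i \\
&+ \beta_5\,\text{lfp}_i + \beta_6\,\text{noncauc}_i + \beta_7\,\text{southern}_i + \epsilon_i
\end{aligned}
\]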
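In the lm() call used for this model, rate ~ . is shorthand for the equation above: the dot expands to every column of the data frame except the response. A small sketch, assuming a one-row placeholder data frame that only borrows MurderRates' column names (its values are meaningless), shows the expansion with terms():

```r
# Column names of MurderRates; the one-row data frame is only a placeholder
cols <- c("rate", "convictions", "executions", "time",
          "income", "lfp", "noncauc", "southern")
placeholder <- as.data.frame(matrix(0, nrow = 1, ncol = length(cols),
                                    dimnames = list(NULL, cols)))

# terms() expands the dot using the columns of `data`, excluding the response
tt <- terms(rate ~ ., data = placeholder)
attr(tt, "term.labels")
```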
library("AER")
library(rsconnect)
library(knitr)
data("MurderRates")
str(MurderRates)
## 'data.frame': 44 obs. of 8 variables:
## $ rate : num 19.25 7.53 5.66 3.21 2.8 ...
## $ convictions: num 0.204 0.327 0.401 0.318 0.35 0.283 0.204 0.232 0.199 0.138 ...
## $ executions : num 0.035 0.081 0.012 0.07 0.062 0.1 0.05 0.054 0.086 0 ...
## $ time : int 47 58 82 100 222 164 161 70 219 81 ...
## $ income : num 1.1 0.92 1.72 2.18 1.75 2.26 2.07 1.43 1.92 1.82 ...
## $ lfp : num 51.2 48.5 50.8 54.4 52.4 56.7 54.6 52.7 52.3 53 ...
## $ noncauc : num 0.321 0.224 0.127 0.063 0.021 0.027 0.139 0.218 0.008 0.012 ...
## $ southern : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 2 1 1 ...
model1 <- lm(rate ~ ., data = MurderRates)
summary(model1)
##
## Call:
## lm(formula = rate ~ ., data = MurderRates)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.9913 -1.1943 -0.3538 1.2383 6.5574
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.44436 9.96694 0.045 0.9647
## convictions -4.33938 2.78313 -1.559 0.1277
## executions 2.85276 6.12313 0.466 0.6441
## time -0.01547 0.00705 -2.194 0.0348 *
## income -2.50013 1.68519 -1.484 0.1466
## lfp 0.19357 0.20614 0.939 0.3540
## noncauc 10.39903 5.40610 1.924 0.0623 .
## southernyes 3.26216 1.32980 2.453 0.0191 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 36 degrees of freedom
## Multiple R-squared: 0.7459, Adjusted R-squared: 0.6965
## F-statistic: 15.1 on 7 and 36 DF, p-value: 5.105e-09
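The test statistics in this table follow from the estimates: each t value is Estimate divided by Std. Error, its p-value is a two-sided tail probability of a t distribution with 36 residual degrees of freedom (44 observations minus 8 coefficients), and both the adjusted R-squared and the F-statistic are functions of the multiple R-squared. A short check, using the rounded values printed above for time:

```r
# Rounded values copied from the summary(model1) output
est_time <- -0.01547
se_time  <- 0.00705
n <- 44                 # observations
k <- 7                  # regressors, excluding the intercept
df_resid <- n - k - 1   # 36 residual degrees of freedom

# t statistic and two-sided p-value for the time coefficient
t_time <- est_time / se_time
p_time <- 2 * pt(-abs(t_time), df = df_resid)

# Adjusted R-squared and overall F-statistic from the multiple R-squared
r2 <- 0.7459
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

round(c(t = t_time, p = p_time, adj_r2 = adj_r2, F = f_stat), 4)
```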
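To check these estimates, we compute the OLS coefficients directly from the closed-form solution, where X is the 44-by-8 matrix made of a column of ones and the seven regressors, and y is the vector of murder rates:
\[ \hat{\boldsymbol{\beta}} = (X'X)^{-1}X'y \]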
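For reference, the closed form comes from minimizing the sum of squared residuals: setting the gradient to zero gives the normal equations, which have a unique solution when X has full column rank (so that X'X is invertible):
\[
\begin{aligned}
S(\boldsymbol{\beta}) &= (y - X\boldsymbol{\beta})'(y - X\boldsymbol{\beta}) \\
\frac{\partial S}{\partial \boldsymbol{\beta}} &= -2X'y + 2X'X\boldsymbol{\beta} = 0 \\
X'X\hat{\boldsymbol{\beta}} &= X'y \quad\Longrightarrow\quad \hat{\boldsymbol{\beta}} = (X'X)^{-1}X'y
\end{aligned}
\]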
Setup
#make southern column numeric (1 = "yes", 0 = "no")
MurderRates$southern <- ifelse(MurderRates$southern == "yes", 1, 0)
#dependent variable
y <- as.vector(MurderRates$rate)
#matrix of regressors: every column except the first one (rate)
X <- as.matrix(MurderRates[, -1])
#column of ones for the intercept, one entry per observation
int <- rep(1, times = length(y))
#add the column of ones as the first column of X
X <- cbind(int, X)
rm(int)
str(X)
## num [1:44, 1:8] 1 1 1 1 1 1 1 1 1 1 ...
## - attr(*, "dimnames")=List of 2
## ..$ : chr [1:44] "1" "2" "3" "4" ...
## ..$ : chr [1:8] "int" "convictions" "executions" "time" ...
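The inverse (X'X)^{-1} only exists when the columns of X are linearly independent. This is also why southern enters as a single dummy: with an intercept, adding both a "south" and a "not south" column would make those two columns sum to the intercept column. A sketch on simulated data (the variables are hypothetical, not from MurderRates) checks this with the rank of the QR decomposition:

```r
set.seed(42)
n <- 44
# Simulated regressors (hypothetical): one continuous variable and a 0/1 dummy
south <- rbinom(n, 1, 0.35)
X_ok  <- cbind(int = 1, x = rnorm(n), south = south)

# Adding the complementary dummy creates perfect collinearity: int = south + north
X_bad <- cbind(X_ok, north = 1 - south)

rank_ok  <- qr(X_ok)$rank    # equals ncol(X_ok), so X'X is invertible
rank_bad <- qr(X_bad)$rank   # one less than ncol(X_bad), so X'X is singular
c(rank_ok = rank_ok, ncol_ok = ncol(X_ok),
  rank_bad = rank_bad, ncol_bad = ncol(X_bad))
```

Applied to the X built above, qr(X)$rank should return 8, its number of columns.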
#closed form-solution
betas <- solve(t(X) %*% X) %*% t(X) %*% y
betas
## [,1]
## int 0.44435992
## convictions -4.33937650
## executions 2.85275974
## time -0.01547016
## income -2.50013325
## lfp 0.19356759
## noncauc 10.39903157
## southern 3.26216530
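Forming (X'X)^{-1} explicitly is fine for a small, well-conditioned problem like this one, but numerical code usually avoids the explicit inverse. A sketch on simulated data (hypothetical regressors and coefficients, not MurderRates) compares the explicit inverse with two equivalent alternatives: solving the normal equations with solve(a, b), and a QR-based least-squares solve, which is also how lm() fits models internally:

```r
set.seed(1)
n <- 44
# Simulated design matrix and response (hypothetical, for illustration only)
X <- cbind(int = 1, x1 = rnorm(n), x2 = runif(n), d = rbinom(n, 1, 0.4))
beta_true <- c(0.5, -2, 3, 1)
y <- drop(X %*% beta_true + rnorm(n))

b_inverse <- solve(t(X) %*% X) %*% t(X) %*% y       # explicit inverse, as above
b_normal  <- solve(crossprod(X), crossprod(X, y))   # solves X'X b = X'y directly
b_qr      <- qr.solve(X, y)                         # least squares via QR
b_lm      <- coef(lm(y ~ X - 1))                    # X already has an intercept column

cbind(inverse = drop(b_inverse), normal = drop(b_normal),
      qr = unname(b_qr), lm = unname(b_lm))
```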
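The coefficients from the matrix-algebra solution match the lm() estimates up to rounding in the printed output (for example, 0.44436 vs. 0.44435992 for the intercept and 2.85276 vs. 2.85275974 for executions), and the manually coded southern dummy reproduces the southernyes coefficient from the factor-based fit. This confirms that lm() computes the OLS estimator \((X'X)^{-1}X'y\), although internally it uses a QR decomposition rather than an explicit matrix inverse.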
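The comparison can also be made in code rather than by eye. This sketch uses the two sets of coefficients exactly as printed above; with the model objects still in memory, the direct check would be all.equal(unname(coef(model1)), unname(drop(betas))), which should return TRUE.

```r
# Coefficients as printed by summary(model1), in the same order as betas
lm_printed <- c(0.44436, -4.33938, 2.85276, -0.01547,
                -2.50013, 0.19357, 10.39903, 3.26216)
# Coefficients from the closed-form solution, as printed for betas
closed_form <- c(0.44435992, -4.33937650, 2.85275974, -0.01547016,
                 -2.50013325, 0.19356759, 10.39903157, 3.26216530)

# Largest absolute gap; it only reflects rounding in the printed output
max_gap <- max(abs(lm_printed - closed_form))
max_gap
```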
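Rounding to two decimals (three for time, whose coefficient is small), the fitted regression equation is:
\[
\begin{aligned}
\widehat{\text{MurderRate}}_i ={}& 0.44 - 4.34\,\text{convictions}_i + 2.85\,\text{executions}_i - 0.015\,\text{time}_i - 2.50\,\text{income}_i \\
&+ 0.19\,\text{lfp}_i + 10.40\,\text{noncauc}_i + 3.26\,\text{southern}_i
\end{aligned}
\]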
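As a worked example of using the fitted equation, take the first state in the data, with the values displayed by str(MurderRates): rate 19.25, convictions 0.204, executions 0.035, time 47, income 1.1, lfp 51.2, noncauc 0.321 and southern = "yes". Plugging these into the unrounded closed-form coefficients gives a fitted rate of about 12.69 per 100,000 and a residual of about 6.557, which agrees with the Max residual (6.5574) reported by summary(model1). A sketch of that arithmetic:

```r
# Closed-form coefficients (betas) printed above
b <- c(int = 0.44435992, convictions = -4.33937650, executions = 2.85275974,
       time = -0.01547016, income = -2.50013325, lfp = 0.19356759,
       noncauc = 10.39903157, southern = 3.26216530)

# First observation as displayed by str(MurderRates); southern "yes" coded as 1
x1 <- c(int = 1, convictions = 0.204, executions = 0.035, time = 47,
        income = 1.1, lfp = 51.2, noncauc = 0.321, southern = 1)

fitted_1   <- sum(b * x1[names(b)])   # x_1' beta-hat
residual_1 <- 19.25 - fitted_1        # observed rate minus fitted rate
round(c(fitted = fitted_1, residual = residual_1), 4)
```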