1. Data Set: MurderRates

The selected data set, MurderRates (from the AER package), contains cross-sectional data on 44 U.S. states in 1950 (the states are not named) and is used to analyze the determinants of murder rates. The data frame consists of 44 observations on 8 variables.

Variables:

  • rate: murder rate per 100,000 (FBI estimate, 1950)

  • convictions: number of convictions divided by number of murders in 1950

  • executions: average number of executions during 1946-1950 divided by convictions in 1950

  • time: median time served (in months) by convicted murderers released in 1951

  • income: median family income in 1949 (in 1,000 USD)

  • lfp: labor force participation rate in 1950 (in percent)

  • noncauc: proportion of population that is non-Caucasian in 1950

  • southern: factor indicating region (I converted this variable to a dummy variable where 1 = yes and 0 = no)

Target variable: rate, the murder rate per 100,000 (FBI estimate, 1950).

2. Estimating the Multivariate Regression Equation

Model Equation:

\[ \text{MurderRate}_i = \beta_0 + \beta_1\,\text{convictions}_i + \beta_2\,\text{executions}_i + \beta_3\,\text{time}_i + \beta_4\,\text{income}_i + \beta_5\,\text{lfp}_i \\ + \beta_6\,\text{noncauc}_i + \beta_7\,\text{southern}_i + \epsilon_i \]

library("AER")
library(rsconnect)
library(knitr)
data("MurderRates")
str(MurderRates)
## 'data.frame':    44 obs. of  8 variables:
##  $ rate       : num  19.25 7.53 5.66 3.21 2.8 ...
##  $ convictions: num  0.204 0.327 0.401 0.318 0.35 0.283 0.204 0.232 0.199 0.138 ...
##  $ executions : num  0.035 0.081 0.012 0.07 0.062 0.1 0.05 0.054 0.086 0 ...
##  $ time       : int  47 58 82 100 222 164 161 70 219 81 ...
##  $ income     : num  1.1 0.92 1.72 2.18 1.75 2.26 2.07 1.43 1.92 1.82 ...
##  $ lfp        : num  51.2 48.5 50.8 54.4 52.4 56.7 54.6 52.7 52.3 53 ...
##  $ noncauc    : num  0.321 0.224 0.127 0.063 0.021 0.027 0.139 0.218 0.008 0.012 ...
##  $ southern   : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 2 1 1 ...
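
Because southern is stored as a factor with levels "no" and "yes", a regression on it is fitted through a 0/1 indicator for the second level. The following is a minimal sketch of that encoding, assuming a hand-typed vector built from the first ten level codes shown by str() above (2 = "yes", 1 = "no"); the names `southern_demo`, `mm`, and `dummy` are illustrative and not part of the original analysis.

```r
# First ten values of southern, retyped from the str() output (assumption:
# level code 2 corresponds to "yes" and 1 to "no", as the level order suggests)
southern_demo <- factor(
  c("yes", "yes", "no", "no", "no", "no", "yes", "yes", "no", "no"),
  levels = c("no", "yes")
)

# model.matrix() shows the columns a formula with this factor would produce:
# an intercept plus one indicator column named after the non-reference level
mm <- model.matrix(~ southern_demo)
colnames(mm)

# The same indicator built by hand
dummy <- as.integer(southern_demo == "yes")
cbind(mm, dummy)
```

The indicator column takes its name from the variable plus the level, which is why a factor called southern appears in regression output as `southernyes`.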

3. Run lm() command

model1 <- lm(rate ~ ., data = MurderRates)
summary(model1)
## 
## Call:
## lm(formula = rate ~ ., data = MurderRates)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.9913 -1.1943 -0.3538  1.2383  6.5574 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.44436    9.96694   0.045   0.9647  
## convictions -4.33938    2.78313  -1.559   0.1277  
## executions   2.85276    6.12313   0.466   0.6441  
## time        -0.01547    0.00705  -2.194   0.0348 *
## income      -2.50013    1.68519  -1.484   0.1466  
## lfp          0.19357    0.20614   0.939   0.3540  
## noncauc     10.39903    5.40610   1.924   0.0623 .
## southernyes  3.26216    1.32980   2.453   0.0191 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 36 degrees of freedom
## Multiple R-squared:  0.7459, Adjusted R-squared:  0.6965 
## F-statistic:  15.1 on 7 and 36 DF,  p-value: 5.105e-09
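
The summary statistics above are linked by simple formulas, and reproducing a few by hand is a useful sanity check. The sketch below uses only rounded numbers copied from the printout, so the results should agree with the table up to rounding; the variable names are illustrative.

```r
# Values copied from the summary(model1) printout (rounded as displayed)
n  <- 44       # observations
k  <- 7        # predictors, excluding the intercept
r2 <- 0.7459   # Multiple R-squared

est_time <- -0.01547   # Estimate for time
se_time  <- 0.00705    # Std. Error for time

# Residual degrees of freedom: n minus the number of estimated coefficients
df_resid <- n - k - 1

# t value = estimate / standard error; two-sided p-value from the t distribution
t_time <- est_time / se_time
p_time <- 2 * pt(-abs(t_time), df = df_resid)

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

# Overall F statistic expressed through R-squared
f_stat <- (r2 / k) / ((1 - r2) / df_resid)

round(c(df = df_resid, t = t_time, p = p_time, adj_r2 = adj_r2, F = f_stat), 4)
```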

4. Confirm point estimates/coefficients/betas with matrix algebra

Matrix Equation:

\[ \hat{\beta} = (X'X)^{-1}X'y \]
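
For reference, this formula follows from minimizing the sum of squared residuals. A short sketch of the derivation, assuming X has full column rank so that X'X is invertible:

\[ S(\beta) = (y - X\beta)'(y - X\beta) \]

\[ \frac{\partial S}{\partial \beta} = -2X'(y - X\beta) = 0 \;\Rightarrow\; X'X\hat{\beta} = X'y \;\Rightarrow\; \hat{\beta} = (X'X)^{-1}X'y \]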

Setup

#make southern column numeric
MurderRates$southern <- ifelse(MurderRates$southern == "yes", 1, 0)
#dependent variable
y <- as.vector(MurderRates$rate)

#matrix of predictors: drop the dependent variable (rate) from MurderRates
X <- as.matrix(MurderRates[, names(MurderRates) != "rate"])
#create vector of ones with same length as rows in MurderRates
int <- rep(x=1, 
           times = length(y)
           )

#add column of ones to matrix 
X <- cbind(int, X)
remove(int)
str(X)
##  num [1:44, 1:8] 1 1 1 1 1 1 1 1 1 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : chr [1:44] "1" "2" "3" "4" ...
##   ..$ : chr [1:8] "int" "convictions" "executions" "time" ...
#closed form-solution
betas <- solve(t(X) %*% X) %*% t(X) %*% y
betas
##                    [,1]
## int          0.44435992
## convictions -4.33937650
## executions   2.85275974
## time        -0.01547016
## income      -2.50013325
## lfp          0.19356759
## noncauc     10.39903157
## southern     3.26216530
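
The closed-form line above inverts X'X explicitly, which is fine for a small, well-conditioned problem like this one. As a hedged aside, the same estimates can be obtained without forming the inverse. The sketch below uses simulated data (not MurderRates) so that it runs on its own; `X_sim`, `y_sim`, and the true coefficients are made up for illustration.

```r
# Simulated design matrix with an intercept column (illustrative data only)
set.seed(1)
n <- 44
X_sim <- cbind(int = 1, x1 = rnorm(n), x2 = runif(n))
y_sim <- drop(X_sim %*% c(2, -1, 0.5)) + rnorm(n)

# 1. Explicit inverse, as in the text
b_inverse <- solve(t(X_sim) %*% X_sim) %*% t(X_sim) %*% y_sim

# 2. Solve the normal equations X'X b = X'y directly
b_solve <- solve(crossprod(X_sim), crossprod(X_sim, y_sim))

# 3. Least squares via a QR decomposition of X
b_qr <- qr.solve(X_sim, y_sim)

cbind(inverse = as.vector(b_inverse),
      solve   = as.vector(b_solve),
      qr      = unname(b_qr))
```

All three columns should agree up to floating-point error; the QR route is generally preferred when the predictors are close to collinear.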

The coefficients returned by lm() match those from the matrix calculation to the displayed precision, which confirms that the point estimates are the closed-form ordinary least squares solution.
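
Comparing two printouts by eye works here, but the check can also be done programmatically. The following is a minimal sketch on simulated data (the data frame `sim` and its coefficients are invented for illustration): it pulls the design matrix out of a fitted model with model.matrix(), recomputes the estimates from the normal equations, and compares them with all.equal().

```r
# Simulated data with one numeric predictor and one yes/no factor
set.seed(42)
n <- 44
sim <- data.frame(
  time     = round(runif(n, 40, 220)),
  southern = factor(rep(c("no", "yes"), times = n / 2), levels = c("no", "yes"))
)
sim$rate <- 5 - 0.015 * sim$time + 3 * (sim$southern == "yes") + rnorm(n, sd = 2)

fit <- lm(rate ~ time + southern, data = sim)

# Design matrix exactly as lm() used it (intercept and dummy column included)
X_fit <- model.matrix(fit)
beta_hat <- solve(crossprod(X_fit), crossprod(X_fit, sim$rate))

# all.equal() compares within a numerical tolerance; names are dropped first
# because the two objects label their entries differently
same <- all.equal(unname(coef(fit)), as.vector(beta_hat))
same
```

Using model.matrix() avoids converting the factor and binding the column of ones by hand.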

Resulting Equation:

\[ \widehat{\text{MurderRate}}_i = 0.44 - 4.34\,\text{convictions}_i + 2.85\,\text{executions}_i - 0.015\,\text{time}_i \\ - 2.50\,\text{income}_i + 0.19\,\text{lfp}_i + 10.40\,\text{noncauc}_i + 3.26\,\text{southern}_i \]
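
To see the equation in use, it can be evaluated for a single state. The sketch below plugs in the first observation as displayed by str(MurderRates) together with the unrounded coefficients from the matrix output; the displayed data values may themselves be rounded, so the result is approximate. The names `betas_demo` and `x_first` are illustrative.

```r
# Coefficients copied from the matrix-algebra output
betas_demo <- c(int = 0.44435992, convictions = -4.33937650,
                executions = 2.85275974, time = -0.01547016,
                income = -2.50013325, lfp = 0.19356759,
                noncauc = 10.39903157, southern = 3.26216530)

# First observation as shown by str(); southern "yes" is coded as 1
x_first <- c(int = 1, convictions = 0.204, executions = 0.035, time = 47,
             income = 1.1, lfp = 51.2, noncauc = 0.321, southern = 1)

# Fitted value is the inner product of the regressors and the coefficients
fitted_first <- sum(betas_demo * x_first)

# Residual = observed rate minus fitted rate
resid_first <- 19.25 - fitted_first

round(c(fitted = fitted_first, residual = resid_first), 4)
```

With these inputs the fitted rate is about 12.69 against an observed 19.25, a residual of about 6.56, which lines up with the largest residual reported in the summary output.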