2024-11-17

## Intro

This presentation uses the Guns dataset from the AER package. The dataset contains crime and population information for all 50 states plus Washington, DC (treated as a 51st), for 1977-1999. All crime rates are per 100,000 residents. We will look specifically at Tennessee (TN).

Topics:

- Linear Regression

Linear Regression

Linear regression is a statistical method that fits a straight-line relationship to observed data so that the fitted line can be used to predict new values. In other words, it estimates a y variable from a given x variable.

This is the simple linear regression line:

\(\hat{y}=a+bx\)

Here, b is the slope: the change in \(\hat{y}\) for a one-unit increase in x. a is the intercept: the value of \(\hat{y}\) when x is 0.
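As a rough illustration, the least-squares values of a and b can be computed by hand and checked against R's built-in `lm()`. The x and y values below are made-up toy numbers, not taken from the Guns dataset.

```r
# Toy data (made up for illustration; not from the Guns dataset)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Least-squares slope and intercept from the textbook formulas
b <- cov(x, y) / var(x)        # slope: change in y-hat per unit of x
a <- mean(y) - b * mean(x)     # intercept: y-hat when x = 0

# The same fit using lm(); (Intercept) corresponds to a, x corresponds to b
fit <- lm(y ~ x)
coef(fit)

# Predicted y-hat for a new x value of 6
y_hat <- a + b * 6
```

Computing the coefficients both ways is only meant to show that `lm()` is doing nothing more mysterious than finding the a and b in \(\hat{y}=a+bx\).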

More generally:

\(Y_i=f(X_i,\beta)+e_i\)

Here, \(Y_i\) is the dependent variable, the one you’re trying to predict.

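\(f(X_i,\beta)\) is a function of X (the independent variable) and unknown parameters \((\beta)\). For simple linear regression, \(f(X_i,\beta)=\beta_0+\beta_1X_i\), and a and b in the line above are the estimates of \(\beta_0\) and \(\beta_1\).

Finally, \(e_i\) is the “noise” or error: the part of \(Y_i\) that the function does not explain.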
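To make the pieces of this model concrete, the sketch below simulates data from a straight-line f with known parameters plus random noise, then splits each observation into a fitted value and a residual. The parameter values (`beta0 = 2`, `beta1 = 0.5`), the sample size, and the noise level are arbitrary assumptions chosen only for illustration.

```r
set.seed(1)                              # reproducible random noise
n     <- 50
beta0 <- 2                               # assumed "true" intercept
beta1 <- 0.5                             # assumed "true" slope
X     <- runif(n, min = 0, max = 10)     # independent variable
e     <- rnorm(n, mean = 0, sd = 1)      # noise term e_i
Y     <- beta0 + beta1 * X + e           # Y_i = f(X_i, beta) + e_i

fit      <- lm(Y ~ X)                    # estimate beta from the data
fitted_Y <- fitted(fit)                  # estimated f(X_i, beta)
res      <- resid(fit)                   # estimated e_i

# Each observation is exactly fitted value + residual
all.equal(unname(fitted_Y + res), Y)
```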

Example: Does prison deter violent crime?

Prison populations in this dataset are given for the prior year. The orange line is our linear regression line. It indicates that, even as more individuals were imprisoned, violent crime rates continued to rise.
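The code for this slide is not shown, so the following is only a sketch of how such a plot could be built. It assumes the AER and ggplot2 packages are installed, that `df_tn` is the Tennessee subset of `Guns`, and that the plot puts the `prisoners` column (incarceration rate) on the x-axis and `violent` on the y-axis. The object names `g_prison` and `fit_prison` are hypothetical, and the original code may differ.

```r
library(ggplot2)
data("Guns", package = "AER")            # requires the AER package to be installed

# Assumed construction of the Tennessee subset used in this presentation
df_tn <- subset(Guns, state == "Tennessee")

# Incarceration rate (prior year) vs. violent crime rate, with a fitted line
g_prison <- ggplot(df_tn, aes(x = prisoners, y = violent)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, color = "orange")

# The same regression as numbers: intercept and slope
fit_prison <- lm(violent ~ prisoners, data = df_tn)
coef(fit_prison)
```

A positive slope on `prisoners` describes an association in these years only; on its own it does not show whether prison deters crime.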

Example 2: Murder Rates per capita

This example shows that linear regression can’t always give us an accurate way to predict outcomes. The plotted data points are widely spread and lie mostly away from the linear regression line. In cases such as this, we would need to choose another method.
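One way to back up the visual impression with a number is the coefficient of determination, R². The slide does not say which variable was plotted against the murder rate, so the sketch below assumes `prisoners` as the predictor. Both that choice and the object names are assumptions.

```r
data("Guns", package = "AER")            # requires the AER package to be installed
df_tn <- subset(Guns, state == "Tennessee")

# Assumed predictor: the slide does not state which x variable was used
fit_murder <- lm(murder ~ prisoners, data = df_tn)

# R-squared: share of the variation in the murder rate explained by the line
r2_murder <- summary(fit_murder)$r.squared

# For comparison, the same measure for the violent crime rate
r2_violent <- summary(lm(violent ~ prisoners, data = df_tn))$r.squared
c(murder = r2_murder, violent = r2_violent)
```

An R² near 0 means the line explains little of the variation, which is what widely scattered points look like numerically.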

Do stricter gun laws deter violent crime?

We will use this code to answer this question:

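library(ggplot2)
# Violent crime rate vs. population for TN; points colored by whether a carry law was in effect
g1 <- ggplot(data = df_tn, aes(x = population, y = violent, color = law)) +
  geom_point() +
  geom_smooth(method = "lm", color = "lightyellow")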

This code plots the violent crime rate against population. The yellow linear regression line fits these data better than the line in the murder-rate example, indicating that population may help predict the violent crime rate. The gray band is the confidence interval, and the point color shows whether each observation comes from a year in which TN had a gun-carry law in effect.

Results: Crime Types with/without Gun Laws

How does TN compare?

Using the original dataset, which includes all states, we can compare TN with other states. We will use Kentucky (KY) for this.
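The comparison code is not shown in the slides. A minimal sketch, assuming the same plot specification as `g1` is reused for each state, could look like this. `plot_state`, `g_tn`, and `g_ky` are hypothetical names introduced here.

```r
library(ggplot2)
data("Guns", package = "AER")            # requires the AER package to be installed

df_tn <- subset(Guns, state == "Tennessee")
df_ky <- subset(Guns, state == "Kentucky")

# Hypothetical helper: builds the same plot for any one-state subset
plot_state <- function(df, title) {
  ggplot(df, aes(x = population, y = violent, color = law)) +
    geom_point() +
    geom_smooth(method = "lm", formula = y ~ x, color = "lightyellow") +
    ggtitle(title)
}

g_tn <- plot_state(df_tn, "Tennessee")
g_ky <- plot_state(df_ky, "Kentucky")
```

Printing `g_tn` and `g_ky` one after the other shows the two states with identical axis variables, so the fits can be compared directly.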

Tennessee/Kentucky

We can see that, for KY, the linear regression doesn’t fit very well. The correlation between population and violent crime is not as strong as it is in TN. This also explains the larger gray area: because the KY points are more scattered around the line, achieving 95% confidence requires a wider range of possible values of \(\hat{y}\).
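The two observations above (weaker correlation, wider band) can also be checked numerically. The sketch below is one way to do so, assuming the same `violent ~ population` model for both states. `ci_width` is a hypothetical helper, and no particular output values are claimed here.

```r
data("Guns", package = "AER")            # requires the AER package to be installed
df_tn <- subset(Guns, state == "Tennessee")
df_ky <- subset(Guns, state == "Kentucky")

# Hypothetical helper: mean width of the 95% confidence interval for y-hat
ci_width <- function(df) {
  fit <- lm(violent ~ population, data = df)
  ci  <- predict(fit, interval = "confidence", level = 0.95)
  mean(ci[, "upr"] - ci[, "lwr"])
}

# Correlation between population and violent crime rate, per state
cors <- c(TN = cor(df_tn$population, df_tn$violent),
          KY = cor(df_ky$population, df_ky$violent))

# Average confidence-band width, per state
widths <- c(TN = ci_width(df_tn), KY = ci_width(df_ky))

cors
widths
```

The gray band drawn by `geom_smooth()` is this same kind of confidence interval, evaluated along the x-axis.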