class: center, middle, inverse, title-slide .title[ #
HTML Presentation
] .subtitle[ ##
Data set: x08
] .author[ ###
Zack Shin
] .institute[ ###
West Chester University of Pennsylvania
] .date[ ###
September 29, 2022
Prepared for
STA490: Capstone Statistics
]

---

## Description of the Data

These data were taken from a repository of linear regression example data sets. This particular data set contains 20 observations (presumably states) and covers three independent variables (the number of inhabitants, the percentage of families making under $5,000, and the percentage unemployed) and one dependent variable (the number of murders per 1,000,000 inhabitants).

## Research Question

The motivation for this analysis is to learn to what extent, if any, the poverty rate and the unemployment rate affect the murder rate in a given area. Multiple linear regression was used to test the null hypothesis that the poverty rate and the unemployment rate do not affect the murder rate in a given area.

---

## Reading in the Data

Here, I read in the data set directly from the website and rename the variables so they're easier to understand. I then place the observations into a table.

| Index| Inhabitants| percentBelow5000| percentUnemployed| murderPerMillion|
|-----:|-----------:|----------------:|-----------------:|----------------:|
|     1|      587000|             16.5|               6.2|             11.2|
|     2|      643000|             20.5|               6.4|             13.4|
|     3|      635000|             26.3|               9.3|             40.7|
|     4|      692000|             16.5|               5.3|              5.3|
|     5|     1248000|             19.2|               7.3|             24.8|
|     6|      643000|             16.5|               5.9|             12.7|
|     7|     1964000|             20.2|               6.4|             20.9|
|     8|     1531000|             21.3|               7.6|             35.7|
|     9|      713000|             17.2|               4.9|              8.7|
|    10|      749000|             14.3|               6.4|              9.6|
|    11|     7895000|             18.1|               6.0|             14.5|
|    12|      762000|             23.1|               7.4|             26.9|
|    13|     2793000|             19.1|               5.8|             15.7|
|    14|      741000|             24.7|               8.6|             36.2|
|    15|      625000|             18.6|               6.5|             18.1|
|    16|      854000|             24.9|               8.3|             28.9|
|    17|      716000|             17.9|               6.7|             14.9|
|    18|      921000|             22.4|               8.6|             25.8|
|    19|      595000|             20.2|               8.4|             21.7|
|    20|     3353000|             16.9|               6.7|             25.7|

---

## Descriptive Statistics (Scatterplot)

<!-- -->

---

## Linear Model

```
##
## Call:
## lm(formula = murderPerMillion ~ Inhabitants + percentBelow5000 +
##     percentUnemployed, data = x08)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.7174 -3.3233  0.4031  1.7684 10.0329
##
## Coefficients:
##                         Estimate     Std. Error t value  Pr(>|t|)
## (Intercept)       -36.7649252820   7.0109257716  -5.244 0.0000803 ***
## Inhabitants         0.0000007629   0.0000006363   1.199   0.24798
## percentBelow5000    1.1921742108   0.5616539095   2.123   0.04974 *
## percentUnemployed   4.7198213719   1.5304754671   3.084   0.00712 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.59 on 16 degrees of freedom
## Multiple R-squared:  0.8183, Adjusted R-squared:  0.7843
## F-statistic: 24.02 on 3 and 16 DF,  p-value: 0.000003629
```

`Inhabitants` is not statistically significant (p = 0.248), so I take it out and rebuild the model.

---

## Linear Model V2

```
##
## Call:
## lm(formula = murderPerMillion ~ percentBelow5000 + percentUnemployed,
##     data = x08)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -5.9019 -2.8101  0.1569  1.7788 10.2709
##
## Coefficients:
##                   Estimate Std. Error t value  Pr(>|t|)
## (Intercept)       -34.0725     6.7265  -5.065 0.0000956 ***
## percentBelow5000    1.2239     0.5682   2.154    0.0459 *
## percentUnemployed   4.3989     1.5262   2.882    0.0103 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.648 on 17 degrees of freedom
## Multiple R-squared:  0.802, Adjusted R-squared:  0.7787
## F-statistic: 34.43 on 2 and 17 DF,  p-value: 0.000001051
```

Linear model: murderPerMillion = -34.07 + 1.22(percentBelow5000) + 4.40(percentUnemployed)

---

## VIF Values & Correlation Matrix

Our two predictors measure related things, so I check whether they are overly correlated with each other.

<!-- -->

No variable has a VIF above 4, so we do not have to worry about multicollinearity.

---

## Descriptive Statistics (Normality)

<!-- -->

The response variable is normal enough.

---

## Residual Analysis

To test the assumption of normality of the residuals, I use a histogram of the residuals.

<!-- -->

The residuals look normal enough to meet the condition.

---

## Constant Variance

<!-- -->

The residuals are randomly scattered with no visible pattern, so the condition of constant variance is met.
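
---

## Reproducing the Models (Sketch)

The modeling steps on the earlier slides can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the code that produced the slides: the data are re-typed by hand from the table rather than read from the website, the object names `fit1`, `fit2`, `vif2`, `res2`, and `fitted2` are illustrative, and the VIF is computed with the two-predictor shortcut 1 / (1 - r^2) instead of a package function.

```r
# Data re-entered by hand from the table on the "Reading in the Data" slide
# (assumption: the slides read the same values from the website).
x08 <- data.frame(
  Inhabitants = c(587000, 643000, 635000, 692000, 1248000, 643000, 1964000,
                  1531000, 713000, 749000, 7895000, 762000, 2793000, 741000,
                  625000, 854000, 716000, 921000, 595000, 3353000),
  percentBelow5000 = c(16.5, 20.5, 26.3, 16.5, 19.2, 16.5, 20.2, 21.3, 17.2,
                       14.3, 18.1, 23.1, 19.1, 24.7, 18.6, 24.9, 17.9, 22.4,
                       20.2, 16.9),
  percentUnemployed = c(6.2, 6.4, 9.3, 5.3, 7.3, 5.9, 6.4, 7.6, 4.9, 6.4,
                        6.0, 7.4, 5.8, 8.6, 6.5, 8.3, 6.7, 8.6, 8.4, 6.7),
  murderPerMillion = c(11.2, 13.4, 40.7, 5.3, 24.8, 12.7, 20.9, 35.7, 8.7,
                       9.6, 14.5, 26.9, 15.7, 36.2, 18.1, 28.9, 14.9, 25.8,
                       21.7, 25.7)
)

# Full model with all three predictors.
fit1 <- lm(murderPerMillion ~ Inhabitants + percentBelow5000 + percentUnemployed,
           data = x08)

# Reduced model after dropping Inhabitants.
fit2 <- lm(murderPerMillion ~ percentBelow5000 + percentUnemployed, data = x08)

# With exactly two predictors, both have the same VIF: 1 / (1 - r^2),
# where r is the correlation between the two predictors.
r <- cor(x08$percentBelow5000, x08$percentUnemployed)
vif2 <- 1 / (1 - r^2)

# Residuals and fitted values used by the normality and constant-variance checks.
res2 <- resid(fit2)
fitted2 <- fitted(fit2)

print(round(coef(fit2), 4))
print(round(vif2, 2))
```

Calling `summary(fit1)` and `summary(fit2)` prints tables in the same layout as the two model slides, and `hist(res2)` and `plot(fitted2, res2)` draw a residual histogram and a residuals-versus-fitted plot for the last two checks.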
---

## Conclusion

Linear model: murderPerMillion = -34.07 + 1.22(percentBelow5000) + 4.40(percentUnemployed)

This means that, holding the other predictor constant, each one-unit increase in percentBelow5000 is associated with an increase of 1.22 in the murder rate per million, and each one-unit increase in percentUnemployed with an increase of 4.40.

Because the model was found to be statistically significant [R-squared = 0.80, adjusted R-squared = 0.78, F(2, 17) = 34.43, p < 0.001], we reject the null hypothesis and accept the alternative hypothesis that the poverty rate and unemployment rate do affect the murder rate of a given area.
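
---

## Worked Example (Sketch)

To make the interpretation of the coefficients concrete, the rounded equation above can be evaluated directly. This is a minimal sketch: the helper name `predict_murder_rate` and the input values (20% of families under $5,000, 7% unemployed) are hypothetical and chosen only for illustration.

```r
# Hypothetical helper that applies the rounded coefficients from the final model.
predict_murder_rate <- function(percentBelow5000, percentUnemployed) {
  -34.07 + 1.22 * percentBelow5000 + 4.40 * percentUnemployed
}

# Illustrative area: 20% of families under $5,000 and 7% unemployed.
base <- predict_murder_rate(20, 7)

# Same area with unemployment one percentage point higher.
higher_unemployment <- predict_murder_rate(20, 8)

print(round(base, 2))                        # 21.13
print(round(higher_unemployment - base, 2))  # 4.4
```

The second printed value equals the percentUnemployed coefficient, which is what "holding the other predictor constant" means in practice. Inputs far outside the observed range of the data can give meaningless results, such as a negative predicted rate.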