About: This document is also available at http://rpubs.com/sherloconan/466232
The goal is to identify correlations between violent and non-violent crime rates (total number of crimes per 100,000 population) in 1995 and five demographic factors of communities in 1990: population, ethnicity percentage, age group percentage, average income per capita, and education level.
Violent crime includes but is not limited to murder, criminal homicide (voluntary manslaughter), forcible rape, aggravated assault, and robbery.
Non-violent crime includes but is not limited to property crime: theft, embezzlement, arson of personal property, fraud, tax crimes, drug and alcohol-related crimes, prostitution, gambling and racketeering, bribery.
Note: VC=Violent Crime, NVC=Non-violent Crime
The “Communities and Crime Unnormalized Data Set” is a cross-sectional data set created by Michael Redmond, uploaded on March 2nd, 2011, and archived in the UCI Machine Learning Repository. Link: http://archive.ics.uci.edu/ml/datasets/communities+and+crime+unnormalized.
Error in make.names(col.names, unique = TRUE) : invalid multibyte string at ‘
Reading the raw file fails with the error above. Open the file as plain text (.txt), correct the typo in the first column header, and then read the data set again.
crime <- read.csv("~/Documents/HU/ANLY 506-51-B/Week 1, Oct 27- Nov 2nd/crimedata.csv")
The data set contains 147 attributes and 2215 instances, with missing values coded as “?”. Demographic statistics are in columns 6-129 and were recorded in 1990. Crime rates are in columns 130-147 and were recorded in 1995. By definition, the first five columns should be factors, and the rest are numeric (some integer, others double). A warning message indicates that 41 columns contain NA. Hawaii (HI), Montana (MT), and Nebraska (NE) are not included in the data set, while one instance from Washington, District of Columbia (DC) is recorded.
crimeSub <- crime
crimeSub[,5] <- as.factor(crimeSub[,5])
crimeSub[,-c(1,2,5)] <- sapply(crimeSub[,-c(1,2,5)],as.character)
crimeSub[,-c(1,2,5)] <- sapply(crimeSub[,-c(1,2,5)],as.numeric)
crime <- crimeSub; rm(crimeSub)
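The manual character-to-numeric conversion above can often be avoided by declaring the missing-value code when the file is read. The sketch below is not the author's code: it writes a tiny made-up file (the rows and values are invented for illustration) and reads it back with `na.strings`, a standard `read.csv` argument.

```r
# Toy file standing in for crimedata.csv; the rows are made up for illustration
tmp <- tempfile(fileext = ".csv")
writeLines(c("communityname,state,population,ViolentCrimesPerPop",
             "Acity,NJ,11980,41.02",
             "Bcity,PA,23123,?"), tmp)

# Treat "?" as NA at read time so that numeric columns stay numeric
toy <- read.csv(tmp, na.strings = "?", stringsAsFactors = FALSE)
str(toy)
colSums(is.na(toy))  # number of missing values per column
```

With this approach the columns that contain “?” are parsed as numbers directly, so no intermediate `as.character` step is needed.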
library(usmap)
library(ggplot2) #provides scale_fill_continuous(), ggtitle(), and theme() used below
us <- as.data.frame(table(crime$state)); colnames(us) <- c("state","count")
plot_usmap(data=us,values="count")+scale_fill_continuous(low="white",high="red",name="Count")+ggtitle("Frequency Count of States in Dataset")+theme(legend.position="right",plot.title=element_text(hjust=0.5,size=15,face="bold"))
Select the dependent variable:
ViolentCrimesPerPop – total number of violent crimes per 100K population
nonViolPerPop – total number of non-violent crimes per 100K population
Select the independent variables (five demographic factors):
1, Population: population – population for community
2, Ethnicity: racePctWhite – percentage of population that is Caucasian
crime$raceSum <- crime$racepctblack+crime$racePctWhite+crime$racePctAsian+crime$racePctHisp
c(sum(crime$raceSum==100),sum(crime$raceSum<100),sum(crime$raceSum>100),range(crime$raceSum))
## [1] 2.00 164.00 2049.00 72.95 175.41
crime[crime$raceSum==min(crime$raceSum),c(1:2)]
## communityname state
## 940 Tahlequahcity OK
crime[crime$raceSum==max(crime$raceSum),c(1:2)]
## communityname state
## 1042 Brownsvillecity TX
QUESTION: Why is the sum of the race percentages as low as 73%?
[Hint] http://worldpopulationreview.com/us-cities/tahlequah-ok-population/
3, Age: agePct65up / agePct12t29
– agePct65up: percentage of population that is 65 and over in age
– agePct12t29: percentage of population that is 12-29 in age
crime$primaryKey <- paste(crime$communityname,crime$state,sep="_")
if(!(0 %in% crime$agePct12t29)){
crime$age <- round(crime$agePct65up/crime$agePct12t29,4)}
4, Income: medIncome / householdsize
– householdsize: mean people per household
– medIncome: median household income
if(!(0 %in% crime$householdsize)){
crime$income <- round(crime$medIncome/crime$householdsize,2)}
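The two guards above skip the whole column if any denominator is zero. A per-row alternative, shown here only as a sketch with a hypothetical helper name (`safe_ratio` is not part of the original analysis), returns NA for the offending rows and keeps the rest.

```r
# Hypothetical helper: element-wise ratio that yields NA where the denominator is 0
safe_ratio <- function(num, den, digits) {
  round(ifelse(den == 0, NA_real_, num / den), digits)
}

# Toy values, not taken from the data set
safe_ratio(c(10, 20, 30), c(2, 0, 4), 2)  # 5, NA, 7.5
```

In this data set neither denominator contains a zero, so both approaches would give the same columns.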
5, Education: PctNotHSGrad – percentage of people 25 and over that are not high school graduates
Below is a table of descriptive statistics for the selected variables. Note that the dependent variables contain numerous missing values. Moreover, ViolentCrimesPerPop contains one null value, i.e., a rate of “0”. Rows with missing values are removed for modeling but could be completed later and used as a validation set.
crimeNA <- crime[,c(149,6,9,150,151,36,146,147)] #subset of raw data
crimeOmit <- na.omit(crimeNA) #omit rows containing NA in ViolentCrimesPerPop or nonViolPerPop
colnames(crimeNA) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","NVC Rate")
colnames(crimeOmit) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","NVC Rate")
library(magrittr); library(knitr); library(kableExtra) #provide %>%, kable(), and kable_styling()
options(scipen=100,digits=2)
pastecs::stat.desc(crimeNA[,-1]) %>% kable() %>% kable_styling()
Statistic | Population | Ethnicity | Age | Income | Education | VC Rate | NVC Rate |
---|---|---|---|---|---|---|---|
nbr.val | 2215.0 | 2215.00 | 2215.00 | 2215.00 | 2215.00 | 1994 | 2118.00 |
nbr.null | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 1 | 0.00 |
nbr.na | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 221 | 97.00 |
min | 10005.0 | 2.68 | 0.04 | 2995.27 | 1.46 | 0 | 116.79 |
max | 7322564.0 | 99.63 | 5.15 | 42049.32 | 73.66 | 4877 | 27119.76 |
range | 7312559.0 | 96.95 | 5.12 | 39054.05 | 72.20 | 4877 | 27002.97 |
sum | 117656335.0 | 186015.30 | 1032.67 | 27915545.03 | 49405.84 | 1174623 | 10395656.14 |
median | 22792.0 | 90.35 | 0.43 | 11770.40 | 21.38 | 374 | 4425.45 |
mean | 53118.0 | 83.98 | 0.47 | 12602.95 | 22.31 | 589 | 4908.24 |
SE.mean | 4347.7 | 0.35 | 0.01 | 101.25 | 0.23 | 14 | 59.53 |
CI.mean.0.95 | 8526.0 | 0.68 | 0.01 | 198.56 | 0.46 | 27 | 116.74 |
var | 41869447877.7 | 269.59 | 0.10 | 22707332.49 | 120.77 | 377960 | 7506004.86 |
std.dev | 204620.2 | 16.42 | 0.31 | 4765.22 | 10.99 | 615 | 2739.71 |
coef.var | 3.9 | 0.20 | 0.66 | 0.38 | 0.49 | 1 | 0.56 |
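Several rows of the table are derived from the others. The sketch below recomputes them in base R on a made-up vector, assuming the usual definitions (nbr.null counts zeros, SE.mean is the standard deviation over the square root of the number of non-missing values, CI.mean.0.95 is the half-width of a t-based interval, and coef.var is the standard deviation over the mean). These definitions are an assumption about `pastecs::stat.desc`, although they agree with the figures in the table (for Population, 204620.2 / sqrt(2215) is about 4347.7).

```r
# Toy vector with one missing value and one zero; not taken from the data set
x <- c(12, 15, NA, 9, 0, 21)

n  <- sum(!is.na(x))              # nbr.val: number of non-missing values
m  <- mean(x, na.rm = TRUE)
s  <- sd(x, na.rm = TRUE)
se <- s / sqrt(n)                 # SE.mean
ci <- qt(0.975, df = n - 1) * se  # CI.mean.0.95 (half-width of the interval)

stats <- c(nbr.val = n,
           nbr.null = sum(x == 0, na.rm = TRUE),  # zeros, not NAs
           nbr.na = sum(is.na(x)),
           mean = m, SE.mean = se, CI.mean.0.95 = ci,
           coef.var = s / m)
round(stats, 3)
```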
subset(crimeNA,`VC Rate`==0) %>% kable() %>% kable_styling()
Row | Community | Population | Ethnicity | Age | Income | Education | VC Rate | NVC Rate |
---|---|---|---|---|---|---|---|---|
1397 | Spencercity_IA | 11066 | 99 | 0.58 | 10196 | 15 | 0 | NA |
ggplot(subset(crimeNA,Population<1000000),aes(Population))+geom_histogram(binwidth =35000,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Population")
ggplot(crimeNA,aes(Ethnicity))+geom_histogram(binwidth=8,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Ethnicity")+xlab("Ethnicity (%)")
ggplot(crimeNA,aes(Age))+geom_histogram(binwidth=0.1,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Age")
ggplot(crimeNA,aes(Income))+geom_histogram(binwidth=1000,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Income")+xlab("Income (USD)")
ggplot(crimeNA,aes(Education))+geom_histogram(binwidth=1,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Education")+xlab("Education (%)")
ggplot(crimeNA,aes(Population,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Population and VC Rate")+ylab("VC Rate")
ggplot(crimeNA,aes(Ethnicity,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Ethnicity and VC Rate")+xlab("Ethnicity (%)")+ylab("VC Rate")
ggplot(crimeNA,aes(Age,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Age and VC Rate")+ylab("VC Rate")
ggplot(crimeNA,aes(Income,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Income and VC Rate")+xlab("Income (USD)")+ylab("VC Rate")
ggplot(crimeNA,aes(Education,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Education and VC Rate")+xlab("Education (%)")+ylab("VC Rate")
ggplot(crimeNA,aes(Population,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Population and NVC Rate")+ylab("NVC Rate")
ggplot(crimeNA,aes(Ethnicity,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Ethnicity and NVC Rate")+xlab("Ethnicity (%)")+ylab("NVC Rate")
ggplot(crimeNA,aes(Age,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Age and NVC Rate")+ylab("NVC Rate")
ggplot(crimeNA,aes(Income,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Income and NVC Rate")+xlab("Income (USD)")+ylab("NVC Rate")
ggplot(crimeNA,aes(Education,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Education and NVC Rate")+xlab("Education (%)")+ylab("NVC Rate")
The correlations between each pair of variables are shown below.
correlations <- round(cor(crimeOmit[,-1]),2)
corrplot::corrplot(correlations,method="color",type="lower",addCoef.col="black",tl.col="black",tl.srt=45,diag=F)
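The correlation matrix above is computed on `crimeOmit`, so a community is dropped from every pair if either crime rate is missing (listwise deletion). As an illustration on made-up numbers, the sketch below contrasts this with the `use="pairwise.complete.obs"` option of `cor`, which keeps all rows that are complete for the pair being correlated.

```r
# Toy data with scattered missing values; not taken from the data set
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 6, 8, NA),
                  c = c(5, 3, NA, 1, 0))

listwise <- cor(na.omit(toy))                       # only rows complete in all columns
pairwise <- cor(toy, use = "pairwise.complete.obs") # rows complete for each pair

nrow(na.omit(toy))  # rows that survive listwise deletion
round(listwise, 2)
round(pairwise, 2)
```

Listwise deletion keeps every coefficient based on the same set of communities, which is the choice made in this document; the pairwise option uses more data per coefficient but mixes different subsets.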
The idea is to partition the communities into clusters by running k-means clustering (k=10) on the five demographic factors to see how communities vary from each other. A separate model is then built within each sufficiently large cluster. In the raw data, the column indices for population, ethnicity, age, income, and education are 6, 9, 150, 151, and 36, respectively.
#distance matrix of communities
dmCommunities <- cluster::daisy(crimeNA[,-c(1,7,8)])
set.seed(506)
crime$Cluster <- as.factor(kmeans(dmCommunities,10)$cluster)
#NOTE: the cluster label numbers may differ when this is rerun
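The call above hands the dissimilarity object to `kmeans`, and the factors enter the distance on their raw scales, so population (tens of thousands and more) is likely to dominate percentages below 100. A more conventional variant, given here only as a sketch on simulated data and not as the author's method, standardizes the features and passes them to `kmeans` directly.

```r
set.seed(506)

# Simulated communities: two groups that differ in population and income
toy <- data.frame(
  Population = c(rnorm(20, 2e4, 3e3), rnorm(20, 5e5, 5e4)),
  Income     = c(rnorm(20, 1.2e4, 1e3), rnorm(20, 1.5e4, 1e3))
)

# Standardize each column, then cluster the feature matrix itself
km <- kmeans(scale(toy), centers = 2, nstart = 25)
table(km$cluster)  # cluster sizes
```

The `nstart` argument reruns the algorithm from several random starts and keeps the best solution, which makes the result less sensitive to initialization.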
crimeVio <- na.omit(crime[,c(149,6,9,150,151,36,146,152)]) #omit rows containing NA in ViolentCrimesPerPop
crimeNon <- na.omit(crime[,c(149,6,9,150,151,36,147,152)]) #omit rows containing NA in nonViolPerPop
crimeVioS <- crimeVio; crimeNonS <- crimeNon
#standardize the numeric variables (five factors plus the crime rate) in each data frame, i.e., mean=0, sd=1
crimeVioS[,-c(1,8)] <- scale(crimeVioS[,-c(1,8)])
crimeNonS[,-c(1,8)] <- scale(crimeNonS[,-c(1,8)])
colnames(crimeVioS) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","Cluster")
colnames(crimeNonS) <- c("Community","Population","Ethnicity","Age","Income","Education","NVC Rate","Cluster")
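As a small worked example of what `scale` does in the step above, the sketch below standardizes a made-up vector and checks it against the manual formula (value minus mean, divided by the standard deviation).

```r
# Toy vector; not taken from the data set
x <- c(10, 20, 30, 40, 50)

z        <- as.numeric(scale(x))    # scale() returns a matrix, so flatten it
z_manual <- (x - mean(x)) / sd(x)   # the same transformation written out

all.equal(z, z_manual)
c(mean = mean(z), sd = sd(z))       # mean 0 and sd 1 up to rounding
```

Because the standardization is applied to the whole data frame before it is split by cluster, the variables inside a single cluster generally do not have mean 0, which is one reason the fitted intercepts are not zero.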
#the count table of 10 clusters
table(crimeVio$Cluster)
##
## 1 2 3 4 5 6 7 8 9 10
## 384 4 1 21 1329 24 11 1 59 160
colnames(crimeVio) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","Cluster")
DT::datatable(crimeVio,filter="top")
#the count table of 10 clusters
table(crimeNon$Cluster)
##
## 1 2 3 4 5 6 7 8 9 10
## 410 6 2 22 1408 22 10 1 63 174
colnames(crimeNon) <- c("Community","Population","Ethnicity","Age","Income","Education","NVC Rate","Cluster")
DT::datatable(crimeNon,filter="top")
p1 <- ggplot(subset(crime,Cluster %in% c(5)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p2 <- ggplot(subset(crime,Cluster %in% c(1,10)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p3 <- ggplot(subset(crime,Cluster %in% c(4,6,9)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p4 <- ggplot(subset(crime,Cluster %in% c(2,3,7,8)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
gridExtra::grid.arrange(p1,p2,p3,p4,nrow=2,bottom="Cluster Analysis")
For the larger clusters, a separate multiple linear regression model is fitted within each cluster. In every model the p-value of the overall F-test is close to 0 and is statistically significant at an alpha level of 0.05. Hence, we reject the null hypothesis that none of the independent variables has an effect on the dependent variable. In other words, population, ethnicity, age, income, and education are jointly related to violent and non-violent crime rates, although not every coefficient is significant in every model. However, the adjusted R-squared ranges from 0.239 to 0.571, suggesting that the linear model does not fit especially well.
Take the coefficients in the Cluster 5 scenario as an example: population, age, and education are positively related to the dependent variable, while ethnicity and income are negatively related to it. Given the variable definitions, a lower crime rate is associated with a smaller population, a higher Caucasian percentage, a lower ratio of seniors to young people, a higher income, or a higher education level (that is, a lower percentage of people who did not graduate from high school).
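The per-cluster models in this document are written out one by one. As a compact illustration of the same idea, the sketch below fits one `lm` per cluster with `split` and `lapply` and collects the adjusted R-squared and coefficient signs. The data are simulated and the two-predictor formula is a simplification chosen for the example, so none of the numbers correspond to the results reported here.

```r
set.seed(506)
n <- 300

# Simulated, already standardized predictors with a made-up cluster label
toy <- data.frame(
  Cluster   = factor(sample(c(1, 5, 10), n, replace = TRUE)),
  Ethnicity = rnorm(n),
  Income    = rnorm(n)
)
toy$Rate <- -0.5 * toy$Ethnicity - 0.2 * toy$Income + rnorm(n, sd = 0.6)

# One linear model per cluster
fits <- lapply(split(toy, toy$Cluster),
               function(d) lm(Rate ~ Ethnicity + Income, data = d))

adjR2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
signs <- sapply(fits, function(f) sign(coef(f)[-1]))  # drop the intercept

round(adjR2, 3)
signs
```

A negative sign for a predictor means that, holding the other predictor fixed, higher values go with a lower fitted rate, which is how the coefficient tables in this document are read.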
fitVio5 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==5))
summary(fitVio5)
##
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeVioS, Cluster == 5))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.824 -0.295 -0.098 0.156 3.772
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2274 0.0902 2.52 0.012 *
## Population 2.0880 0.5279 3.96 0.000080511 ***
## Ethnicity -0.4718 0.0217 -21.73 < 0.0000000000000002 ***
## Age 0.0968 0.0175 5.52 0.000000041 ***
## Income -0.0886 0.0223 -3.97 0.000076205 ***
## Education 0.1176 0.0258 4.57 0.000005451 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.59 on 1323 degrees of freedom
## Multiple R-squared: 0.46, Adjusted R-squared: 0.458
## F-statistic: 226 on 5 and 1323 DF, p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio5)
fitNon5 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==5))
summary(fitNon5)
##
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeNonS, Cluster == 5))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.354 -0.459 -0.139 0.299 8.593
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2503 0.1231 2.03 0.0422 *
## Population 2.0565 0.7231 2.84 0.0045 **
## Ethnicity -0.3965 0.0287 -13.83 < 0.0000000000000002 ***
## Age 0.1508 0.0233 6.46 0.00000000015 ***
## Income -0.2661 0.0300 -8.88 < 0.0000000000000002 ***
## Education 0.0150 0.0345 0.43 0.6637
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.81 on 1402 degrees of freedom
## Multiple R-squared: 0.312, Adjusted R-squared: 0.31
## F-statistic: 127 on 5 and 1402 DF, p-value: <0.0000000000000002
plot(fitNon5)
fitVio1 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==1))
summary(fitVio1)
##
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeVioS, Cluster == 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.956 -0.387 -0.109 0.193 4.867
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0908 0.0483 1.88 0.0606 .
## Population 0.3768 0.8626 0.44 0.6625
## Ethnicity -0.5223 0.0470 -11.11 <0.0000000000000002 ***
## Age 0.0512 0.0330 1.55 0.1212
## Income -0.1569 0.0598 -2.63 0.0090 **
## Education 0.1670 0.0621 2.69 0.0075 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.79 on 378 degrees of freedom
## Multiple R-squared: 0.476, Adjusted R-squared: 0.469
## F-statistic: 68.5 on 5 and 378 DF, p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio1)
fitNon1 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==1))
summary(fitNon1)
##
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeNonS, Cluster == 1))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.064 -0.415 -0.098 0.284 6.918
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.1142 0.0514 2.22 0.027 *
## Population 0.5025 0.9020 0.56 0.578
## Ethnicity -0.2700 0.0487 -5.55 0.000000052 ***
## Age 0.0465 0.0335 1.39 0.166
## Income -0.3244 0.0606 -5.36 0.000000143 ***
## Education -0.0212 0.0633 -0.33 0.738
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.81 on 404 degrees of freedom
## Multiple R-squared: 0.249, Adjusted R-squared: 0.239
## F-statistic: 26.7 on 5 and 404 DF, p-value: <0.0000000000000002
plot(fitNon1)
fitVio10 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==10))
summary(fitVio10)
##
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeVioS, Cluster == 10))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.979 -0.385 -0.141 0.301 2.251
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2757 0.1505 1.83 0.069 .
## Population -0.1145 0.7751 -0.15 0.883
## Ethnicity -0.6514 0.0598 -10.89 < 0.0000000000000002 ***
## Age 0.4447 0.1001 4.44 0.000017 ***
## Income -0.4037 0.0930 -4.34 0.000025 ***
## Education -0.0718 0.0849 -0.85 0.399
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.71 on 154 degrees of freedom
## Multiple R-squared: 0.585, Adjusted R-squared: 0.571
## F-statistic: 43.4 on 5 and 154 DF, p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio10)
fitNon10 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==10))
summary(fitNon10)
##
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeNonS, Cluster == 10))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5131 -0.4190 -0.0492 0.3629 2.3331
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2412 0.1338 1.80 0.0734 .
## Population 0.6452 0.7317 0.88 0.3792
## Ethnicity -0.2692 0.0567 -4.75 0.000004431 ***
## Age 0.5224 0.0933 5.60 0.000000086 ***
## Income -0.5205 0.0873 -5.96 0.000000014 ***
## Education -0.2425 0.0814 -2.98 0.0033 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.69 on 168 degrees of freedom
## Multiple R-squared: 0.352, Adjusted R-squared: 0.333
## F-statistic: 18.3 on 5 and 168 DF, p-value: 0.0000000000000187
plot(fitNon10)
fitVio9 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==9))
summary(fitVio9)
##
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeVioS, Cluster == 9))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.502 -0.543 -0.055 0.237 3.337
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.694 0.572 1.21 0.231
## Population -0.877 1.064 -0.82 0.414
## Ethnicity -0.576 0.185 -3.11 0.003 **
## Age 0.237 0.333 0.71 0.479
## Income -0.658 0.316 -2.08 0.042 *
## Education -0.108 0.268 -0.40 0.690
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.91 on 53 degrees of freedom
## Multiple R-squared: 0.354, Adjusted R-squared: 0.293
## F-statistic: 5.81 on 5 and 53 DF, p-value: 0.00024
par(mfrow=c(2,2)); plot(fitVio9)
fitNon9 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==9))
summary(fitNon9)
##
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income +
## Education, data = subset(crimeNonS, Cluster == 9))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.3306 -0.4625 0.0593 0.3922 2.1699
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.949 0.421 2.25 0.0281 *
## Population -1.001 0.812 -1.23 0.2232
## Ethnicity -0.469 0.139 -3.37 0.0013 **
## Age 0.585 0.247 2.37 0.0214 *
## Income -1.051 0.242 -4.34 0.000059 ***
## Education -0.573 0.204 -2.81 0.0067 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7 on 57 degrees of freedom
## Multiple R-squared: 0.397, Adjusted R-squared: 0.344
## F-statistic: 7.5 on 5 and 57 DF, p-value: 0.0000182
plot(fitNon9)
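The adjusted R-squared values in the summaries above follow from the multiple R-squared, the number of observations n, and the number of predictors p through 1 - (1 - R²)(n - 1)/(n - p - 1). The sketch below uses a hypothetical helper, `adj_r2`, and checks it against two of the reported models; n is taken as the residual degrees of freedom plus six (five predictors and the intercept).

```r
# Hypothetical helper implementing the standard adjusted R-squared formula
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Cluster 9, violent crime: R-squared 0.354, 53 residual df, so n = 59
adj_r2(0.354, n = 59, p = 5)   # close to the reported 0.293

# Cluster 10, violent crime: R-squared 0.585, 154 residual df, so n = 160
adj_r2(0.585, n = 160, p = 5)  # close to the reported 0.571
```

The penalty is largest for the small clusters, which is why Cluster 9 loses about six points between the multiple and the adjusted R-squared while Cluster 5 loses almost nothing.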
Applied the data analysis techniques learnt in the course such as descriptive analysis, exploratory data analysis (EDA), principal component analysis (PCA), cluster analysis, linear regression, hypothesis test, and variance analysis, though some may not be covered in this document.
Conducted k-means clustering (k=10) on each community’s demographic factors, namely population, ethnicity, age, income, and education.
Built linear regression models within clusters to assess crime rates by the same factors, namely population, ethnicity, age, income, and education.