About: This document is also available at http://rpubs.com/sherloconan/466232

Objective

To identify the correlations between violent / non-violent crime rates (total number of crimes per 100,000 population) in 1995 and five demographic factors of communities in 1990: population, ethnicity percentage, age group percentage, average income per capita, and education level.

 

Literature Review

Violent crime includes but is not limited to murder, criminal homicide (voluntary manslaughter), forcible rape, aggravated assault, and robbery.
Non-violent crime includes but is not limited to property crime: theft, embezzlement, arson of personal property, fraud, tax crimes, drug and alcohol-related crimes, prostitution, gambling and racketeering, bribery.
Note: VC=Violent Crime, NVC=Non-violent Crime

1968

1996

2002

2004

2018

Reference

 

Data Source

“Communities and Crime Unnormalized Data Set” is a cross-sectional dataset created by Michael Redmond, uploaded on March 2nd, 2011, and archived at the UCI Machine Learning Repository. Link: http://archive.ics.uci.edu/ml/datasets/communities+and+crime+unnormalized.

Error in make.names(col.names, unique = TRUE) : invalid multibyte string at ‘com<6d>unityname’.
Reading the raw file fails with the error above. Open the file as plain text (.txt), correct the garbled header of the first column, and then read the dataset again.

crime <- read.csv("~/Documents/HU/ANLY 506-51-B/Week 1, Oct 27- Nov 2nd/crimedata.csv")

 

Data Cleaning

The dataset contains 147 attributes and 2215 instances, with missing values coded as “?”. Demographic statistics are in columns 6 to 129 and were recorded in 1990. Crime rates are in columns 130 to 147 and were recorded in 1995. By definition, the first five columns should be factors and the rest numeric (some integer, others double). The warning messages raised during conversion indicate that 41 columns contain NA. Hawaii (HI), Montana (MT), and Nebraska (NE) are not included in the dataset, while Washington, District of Columbia (DC) contributes a single instance.

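crimeSub <- crime
crimeSub[,5] <- as.factor(crimeSub[,5]) #column 5 is categorical
#convert every column except 1, 2, and 5 to numeric;
#going through character first turns "?" into NA (with coercion warnings)
crimeSub[,-c(1,2,5)] <- sapply(crimeSub[,-c(1,2,5)],as.character)
crimeSub[,-c(1,2,5)] <- sapply(crimeSub[,-c(1,2,5)],as.numeric)
crime <- crimeSub; rm(crimeSub)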
library(usmap)
library(ggplot2) #needed for scale_fill_continuous(), ggtitle(), and theme()
us <- as.data.frame(table(crime$state)); colnames(us) <- c("state","count")
plot_usmap(data=us,values="count")+
  scale_fill_continuous(low="white",high="red",name="Count")+
  ggtitle("Frequency Count of States in Dataset")+
  theme(legend.position="right",plot.title=element_text(hjust=0.5,size=15,face="bold"))
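As an alternative to the two-step character-to-numeric conversion, the “?” placeholders can be declared as missing while the file is read. The sketch below is a minimal illustration on a made-up three-row CSV; the values and the object names raw and toy are hypothetical and not taken from the dataset. It relies only on base R's read.csv() and its na.strings argument.

```r
# Toy CSV text standing in for crimedata.csv (values are made up for illustration)
raw <- "communityname,state,population,ViolentCrimesPerPop
Acity,NJ,11980,41.02
Btownship,PA,23123,?
Ccity,OR,29344,218.59"

# na.strings = "?" turns every "?" field into NA while the text is parsed,
# so the affected column is read as numeric rather than character
toy <- read.csv(text = raw, na.strings = "?", stringsAsFactors = FALSE)

str(toy)              # column types after parsing
colSums(is.na(toy))   # number of NA values per column
```

With this approach no coercion warnings are raised, because the placeholder never reaches as.numeric().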

Select the dependent variable:
ViolentCrimesPerPop – total number of violent crimes per 100K population

nonViolPerPop – total number of non-violent crimes per 100K population

 

Select the independent variables (five demographic factors):
1, Population: population – population for community

2, Ethnicity: racePctWhite – percentage of population that is Caucasian

crime$raceSum <- crime$racepctblack+crime$racePctWhite+crime$racePctAsian+crime$racePctHisp
#number of communities with a sum ==100, <100, and >100, followed by the minimum and maximum sum
c(sum(crime$raceSum==100),sum(crime$raceSum<100),sum(crime$raceSum>100),range(crime$raceSum))
## [1]    2.00  164.00 2049.00   72.95  175.41
crime[crime$raceSum==min(crime$raceSum),c(1:2)]
##     communityname state
## 940 Tahlequahcity    OK
crime[crime$raceSum==max(crime$raceSum),c(1:2)]
##        communityname state
## 1042 Brownsvillecity    TX

QUESTION: Why is the sum of the race percentages as low as 73%?
[Hint] http://worldpopulationreview.com/us-cities/tahlequah-ok-population/

3, Age: agePct65up / agePct12t29
– agePct65up: percentage of population that is 65 and over in age
– agePct12t29: percentage of population that is 12-29 in age

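crime$primaryKey <- paste(crime$communityname,crime$state,sep="_") #community label: name_state
#ratio of seniors to the 12-29 age group; computed only if no community has agePct12t29 equal to 0
if(!(0 %in% crime$agePct12t29)){
  crime$age <- round(crime$agePct65up/crime$agePct12t29,4)}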

4, Income: medIncome / householdsize
– householdsize: mean people per household
– medIncome: median household income

#median household income per household member; computed only if no householdsize equals 0
if(!(0 %in% crime$householdsize)){
  crime$income <- round(crime$medIncome/crime$householdsize,2)}
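The two guards above skip the whole derived column if a single denominator is 0. A hedged alternative, sketched below on made-up values, computes the ratio row by row and returns NA only where the denominator is 0 or missing. The helper safe_ratio() and the toy data frame are hypothetical names introduced for illustration; they are not part of the original analysis.

```r
# Divide num by den, returning NA wherever the denominator is 0 or missing
safe_ratio <- function(num, den, digits = 4) {
  out <- rep(NA_real_, length(num))
  ok <- !is.na(den) & den != 0
  out[ok] <- round(num[ok] / den[ok], digits)
  out
}

# Toy values (hypothetical); the second row has a zero denominator
toy <- data.frame(agePct65up  = c(11.5, 20.0, 8.3),
                  agePct12t29 = c(23.0, 0, 41.5))
toy$age <- safe_ratio(toy$agePct65up, toy$agePct12t29)
toy
```

The same helper could be applied to medIncome and householdsize with digits = 2.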

5, Education: PctNotHSGrad – percentage of people 25 and over that are not high school graduates

 

Exploratory Data Analysis (EDA)

The table below gives descriptive statistics for the selected variables. Both dependent variables contain numerous missing values (221 in the VC rate and 97 in the NVC rate). Moreover, one community has a “null” value in ViolentCrimesPerPop, i.e., a rate of exactly 0. Rows with missing values are removed for modeling, but they could later be completed and used as a validation set.

crimeNA <- crime[,c(149,6,9,150,151,36,146,147)] #subset of raw data
crimeOmit <- na.omit(crimeNA) #omit rows containing NA in ViolentCrimesPerPop or nonViolPerPop
colnames(crimeNA) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","NVC Rate")
colnames(crimeOmit) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","NVC Rate")

library(knitr); library(kableExtra) #kable(), kable_styling(), and the %>% pipe
options(scipen=100,digits=2)
pastecs::stat.desc(crimeNA[,-1]) %>% kable() %>% kable_styling()
                 Population  Ethnicity      Age       Income  Education  VC Rate     NVC Rate
nbr.val              2215.0    2215.00  2215.00      2215.00    2215.00     1994      2118.00
nbr.null                0.0       0.00     0.00         0.00       0.00        1         0.00
nbr.na                  0.0       0.00     0.00         0.00       0.00      221        97.00
min                 10005.0       2.68     0.04      2995.27       1.46        0       116.79
max               7322564.0      99.63     5.15     42049.32      73.66     4877     27119.76
range             7312559.0      96.95     5.12     39054.05      72.20     4877     27002.97
sum             117656335.0  186015.30  1032.67  27915545.03   49405.84  1174623  10395656.14
median              22792.0      90.35     0.43     11770.40      21.38      374      4425.45
mean                53118.0      83.98     0.47     12602.95      22.31      589      4908.24
SE.mean              4347.7       0.35     0.01       101.25       0.23       14        59.53
CI.mean.0.95         8526.0       0.68     0.01       198.56       0.46       27       116.74
var           41869447877.7     269.59     0.10  22707332.49     120.77   377960   7506004.86
std.dev            204620.2      16.42     0.31      4765.22      10.99      615      2739.71
coef.var                3.9       0.20     0.66         0.38       0.49        1         0.56
subset(crimeNA,`VC Rate`==0) %>% kable() %>% kable_styling()
      Community       Population  Ethnicity   Age  Income  Education  VC Rate  NVC Rate
1397  Spencercity_IA       11066         99  0.58   10196         15        0        NA
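The three counts at the top of the descriptive table can be reproduced by hand. The sketch below uses a made-up vector and base R only. It mirrors how the nbr.val, nbr.null, and nbr.na rows read in the table (nbr.val plus nbr.na equals 2215 in every column) and is not the pastecs implementation; the object names x and counts are hypothetical.

```r
# Hypothetical crime-rate vector with one zero and two missing values
x <- c(0, 12.5, NA, 374, NA, 589)

counts <- c(
  nbr.val  = sum(!is.na(x)),             # non-missing values
  nbr.null = sum(x == 0, na.rm = TRUE),  # values exactly equal to 0
  nbr.na   = sum(is.na(x))               # missing values
)
counts
```

Note that a zero is a recorded value, not a missing one, so it is counted in both nbr.val and nbr.null.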

 

Histogram: Population

ggplot(subset(crimeNA,Population<1000000),aes(Population))+geom_histogram(binwidth =35000,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Population")

Ethnicity

ggplot(crimeNA,aes(Ethnicity))+geom_histogram(binwidth=8,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Ethnicity")+xlab("Ethnicity (%)")

Age

ggplot(crimeNA,aes(Age))+geom_histogram(binwidth=0.1,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Age")

Income

ggplot(crimeNA,aes(Income))+geom_histogram(binwidth=1000,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Income")+xlab("Income (USD)")

Education

ggplot(crimeNA,aes(Education))+geom_histogram(binwidth=1,fill="black",color="white",alpha=0.5)+ggtitle("Histogram of Education")+xlab("Education (%)")

Scatter plot: Population - VC Rate

ggplot(crimeNA,aes(Population,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Population and VC Rate")+ylab("VC Rate")

Ethnicity - VC Rate

ggplot(crimeNA,aes(Ethnicity,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Ethnicity and VC Rate")+xlab("Ethnicity (%)")+ylab("VC Rate")

Age - VC Rate

ggplot(crimeNA,aes(Age,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Age and VC Rate")+ylab("VC Rate")

Income - VC Rate

ggplot(crimeNA,aes(Income,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Income and VC Rate")+xlab("Income (USD)")+ylab("VC Rate")

Education - VC Rate

ggplot(crimeNA,aes(Education,`VC Rate`))+geom_point()+ggtitle("Scatter Plot of Education and VC Rate")+xlab("Education (%)")+ylab("VC Rate")

Scatter plot: Population - NVC Rate

ggplot(crimeNA,aes(Population,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Population and NVC Rate")+ylab("NVC Rate")

Ethnicity - NVC Rate

ggplot(crimeNA,aes(Ethnicity,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Ethnicity and NVC Rate")+xlab("Ethnicity (%)")+ylab("NVC Rate")

Age - NVC Rate

ggplot(crimeNA,aes(Age,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Age and NVC Rate")+ylab("NVC Rate")

Income - NVC Rate

ggplot(crimeNA,aes(Income,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Income and NVC Rate")+xlab("Income (USD)")+ylab("NVC Rate")

Education - NVC Rate

ggplot(crimeNA,aes(Education,`NVC Rate`))+geom_point()+ggtitle("Scatter Plot of Education and NVC Rate")+xlab("Education (%)")+ylab("NVC Rate")

 

The pairwise correlations between the variables are shown below.

correlations <- round(cor(crimeOmit[,-1]),2)
corrplot::corrplot(correlations,method="color",type="lower",addCoef.col="black",tl.col="black",tl.srt=45,diag=F)
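The correlation matrix is computed on crimeOmit rather than crimeNA because cor() propagates missing values by default. The sketch below shows this on a small made-up data frame (the object toy and its columns a, b, c are hypothetical) and compares two base R ways of restricting the computation to complete rows.

```r
# Toy data: column c has one missing value (values are made up)
toy <- data.frame(a = c(1, 2, 3, 4, 5),
                  b = c(2, 4, 6, 8, 10),
                  c = c(5, 3, NA, 2, 1))

cor(toy)                                   # every entry involving c is NA

r_omit <- cor(na.omit(toy))                # drop incomplete rows first
r_use  <- cor(toy, use = "complete.obs")   # let cor() drop them instead

round(r_omit, 2)
all.equal(r_omit, r_use)                   # both approaches give the same matrix
```

Either way, every correlation is computed on the same set of rows, which keeps the entries of the matrix comparable.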

 

Clustering

The idea is to group the communities with k-means clustering (k=10) on the five demographic factors to see how communities differ from each other, and then to build a separate model within individual clusters. The column indices of population, ethnicity, age, income, and education in the crime data frame are 6, 9, 150, 151, and 36, respectively.

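#dissimilarity (distance) matrix of communities on the five unstandardized factors
dmCommunities <- cluster::daisy(crimeNA[,-c(1,7,8)])
set.seed(506)
#kmeans() coerces the dissimilarity object to a full matrix, so each community is clustered on its row of distances
crime$Cluster <- as.factor(kmeans(dmCommunities,10)$cluster)
#NOTICE: cluster labels are arbitrary, so the numbering may differ between runs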
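Because the five factors are on very different scales (population in the tens of thousands, the age ratio below 6), distances on raw values are dominated by population, which may explain the very uneven cluster sizes reported below. A common variant, shown here purely as a sketch on simulated data and not as the method used in this analysis, standardizes the features and passes them to kmeans() directly. The data frame toy, the two simulated groups, and the choice nstart = 10 are all assumptions made for illustration.

```r
set.seed(506)

# Simulated communities: two well-separated groups (all values hypothetical)
toy <- data.frame(
  Population = c(rnorm(20, 2e4, 2e3), rnorm(20, 5e5, 5e4)),
  Education  = c(rnorm(20, 15, 2),    rnorm(20, 30, 2))
)

# Standardize each feature, then run k-means on the feature matrix itself;
# nstart = 10 repeats the random initialization and keeps the best solution
km <- kmeans(scale(toy), centers = 2, nstart = 10)

table(km$cluster)   # cluster sizes
km$centers          # cluster centers in standardized units
```

After standardization each factor contributes on a comparable scale, so no single variable determines the partition by magnitude alone.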

crimeVio <- na.omit(crime[,c(149,6,9,150,151,36,146,152)]) #omit rows containing NA in ViolentCrimesPerPop
crimeNon <- na.omit(crime[,c(149,6,9,150,151,36,147,152)]) #omit rows containing NA in nonViolPerPop
crimeVioS <- crimeVio; crimeNonS <- crimeNon
#standardize the five factors and the crime rate in each data frame over all communities, i.e., mean=0, sd=1
crimeVioS[,-c(1,8)] <- scale(crimeVioS[,-c(1,8)])
crimeNonS[,-c(1,8)] <- scale(crimeNonS[,-c(1,8)])

colnames(crimeVioS) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","Cluster")
colnames(crimeNonS) <- c("Community","Population","Ethnicity","Age","Income","Education","NVC Rate","Cluster")
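scale() standardizes each column over all rows passed to it, here all communities rather than each cluster separately, which is why a model fitted on a single cluster can still have a nonzero intercept. The sketch below checks the transformation by hand on four population values taken from the descriptive table (minimum, median, mean, maximum); the object names x, z, and manual are introduced for illustration only.

```r
# Four population values from the descriptive table (min, median, mean, max)
x <- c(10005, 22792, 53118, 7322564)

z <- as.numeric(scale(x))          # scale() returns a one-column matrix
manual <- (x - mean(x)) / sd(x)    # the same z-score computed by hand

all.equal(z, manual)               # both approaches agree
c(mean = mean(z), sd = sd(z))      # mean is 0 and sd is 1, up to rounding
```

On standardized data a regression coefficient reads as the change in the response, in standard deviations, per one standard deviation change in the predictor.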

 

VC Rate

#the count table of 10 clusters
table(crimeVio$Cluster)
## 
##    1    2    3    4    5    6    7    8    9   10 
##  384    4    1   21 1329   24   11    1   59  160

 

colnames(crimeVio) <- c("Community","Population","Ethnicity","Age","Income","Education","VC Rate","Cluster")
DT::datatable(crimeVio,filter="top")

NVC Rate

#the count table of 10 clusters
table(crimeNon$Cluster)
## 
##    1    2    3    4    5    6    7    8    9   10 
##  410    6    2   22 1408   22   10    1   63  174

 

colnames(crimeNon) <- c("Community","Population","Ethnicity","Age","Income","Education","NVC Rate","Cluster")
DT::datatable(crimeNon,filter="top")

 

p1 <- ggplot(subset(crime,Cluster  %in% c(5)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p2 <- ggplot(subset(crime,Cluster  %in% c(1,10)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p3 <- ggplot(subset(crime,Cluster  %in% c(4,6,9)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
p4 <- ggplot(subset(crime,Cluster  %in% c(2,3,7,8)),aes(ViolentCrimesPerPop,nonViolPerPop))+geom_point(aes(color=Cluster))+xlab("VC Rate")+ylab("NVC Rate")+geom_abline(slope=1,color="black",linetype="dashed")
gridExtra::grid.arrange(p1,p2,p3,p4,nrow=2,bottom="Cluster Analysis")

 

Modeling

For each of the larger clusters, a separate multiple linear regression is fitted. In every model the p-value of the overall F-test is close to 0 and therefore statistically significant at an alpha level of 0.05. Hence, we reject the null hypothesis that none of the independent variables has an effect on the dependent variable. In other words, population, ethnicity, age, income, and education are jointly related to violent / non-violent crime rates, although not every factor is individually significant in every cluster. However, the adjusted R-squared ranges only from 0.239 to 0.571, suggesting that the linear model fits only moderately well.

Take the coefficients of the Cluster 5 violent crime model as an example: population, age, and education relate positively to the dependent variable, while ethnicity and income relate negatively. Given the variable definitions, a lower crime rate is associated with a smaller population, a higher Caucasian percentage, a lower ratio of seniors to the 12-29 age group, a higher income, or a higher education level (i.e., a lower percentage of non-graduates).
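The per-cluster fits that follow repeat the same lm() call for each cluster. As a hedged sketch of how such fits could be produced and compared in one pass, the example below splits a simulated data frame by cluster, fits one model per piece, and collects the slope, its p-value, and the adjusted R-squared. The data, the single predictor x, and the object names toy, fits, and res are hypothetical.

```r
set.seed(1)

# Simulated data: two clusters with different slopes (all values hypothetical)
toy <- data.frame(Cluster = factor(rep(1:2, each = 50)), x = rnorm(100))
toy$y <- ifelse(toy$Cluster == 1, 0.5, -0.3) * toy$x + rnorm(100, sd = 0.5)

# Fit one linear model per cluster
fits <- lapply(split(toy, toy$Cluster), function(d) lm(y ~ x, data = d))

# Collect the slope, its p-value, and the adjusted R-squared for each cluster
res <- t(sapply(fits, function(f) {
  s <- summary(f)
  c(slope         = unname(coef(f)["x"]),
    p.value       = s$coefficients["x", "Pr(>|t|)"],
    adj.r.squared = s$adj.r.squared)
}))
res
```

A table like res makes it easier to see at a glance which factors keep their sign and significance across clusters.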

Cluster 5

fitVio5 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==5))
summary(fitVio5)
## 
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeVioS, Cluster == 5))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.824 -0.295 -0.098  0.156  3.772 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)   0.2274     0.0902    2.52                0.012 *  
## Population    2.0880     0.5279    3.96          0.000080511 ***
## Ethnicity    -0.4718     0.0217  -21.73 < 0.0000000000000002 ***
## Age           0.0968     0.0175    5.52          0.000000041 ***
## Income       -0.0886     0.0223   -3.97          0.000076205 ***
## Education     0.1176     0.0258    4.57          0.000005451 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.59 on 1323 degrees of freedom
## Multiple R-squared:  0.46,   Adjusted R-squared:  0.458 
## F-statistic:  226 on 5 and 1323 DF,  p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio5)

fitNon5 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==5))
summary(fitNon5)
## 
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeNonS, Cluster == 5))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.354 -0.459 -0.139  0.299  8.593 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)   0.2503     0.1231    2.03               0.0422 *  
## Population    2.0565     0.7231    2.84               0.0045 ** 
## Ethnicity    -0.3965     0.0287  -13.83 < 0.0000000000000002 ***
## Age           0.1508     0.0233    6.46        0.00000000015 ***
## Income       -0.2661     0.0300   -8.88 < 0.0000000000000002 ***
## Education     0.0150     0.0345    0.43               0.6637    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.81 on 1402 degrees of freedom
## Multiple R-squared:  0.312,  Adjusted R-squared:  0.31 
## F-statistic:  127 on 5 and 1402 DF,  p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitNon5)

Cluster 1

fitVio1 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==1))
summary(fitVio1)
## 
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeVioS, Cluster == 1))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.956 -0.387 -0.109  0.193  4.867 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)   0.0908     0.0483    1.88              0.0606 .  
## Population    0.3768     0.8626    0.44              0.6625    
## Ethnicity    -0.5223     0.0470  -11.11 <0.0000000000000002 ***
## Age           0.0512     0.0330    1.55              0.1212    
## Income       -0.1569     0.0598   -2.63              0.0090 ** 
## Education     0.1670     0.0621    2.69              0.0075 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.79 on 378 degrees of freedom
## Multiple R-squared:  0.476,  Adjusted R-squared:  0.469 
## F-statistic: 68.5 on 5 and 378 DF,  p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio1)

fitNon1 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==1))
summary(fitNon1)
## 
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeNonS, Cluster == 1))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -2.064 -0.415 -0.098  0.284  6.918 
## 
## Coefficients:
##             Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)   0.1142     0.0514    2.22       0.027 *  
## Population    0.5025     0.9020    0.56       0.578    
## Ethnicity    -0.2700     0.0487   -5.55 0.000000052 ***
## Age           0.0465     0.0335    1.39       0.166    
## Income       -0.3244     0.0606   -5.36 0.000000143 ***
## Education    -0.0212     0.0633   -0.33       0.738    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.81 on 404 degrees of freedom
## Multiple R-squared:  0.249,  Adjusted R-squared:  0.239 
## F-statistic: 26.7 on 5 and 404 DF,  p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitNon1)

Cluster 10

fitVio10 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==10))
summary(fitVio10)
## 
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeVioS, Cluster == 10))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.979 -0.385 -0.141  0.301  2.251 
## 
## Coefficients:
##             Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)   0.2757     0.1505    1.83                0.069 .  
## Population   -0.1145     0.7751   -0.15                0.883    
## Ethnicity    -0.6514     0.0598  -10.89 < 0.0000000000000002 ***
## Age           0.4447     0.1001    4.44             0.000017 ***
## Income       -0.4037     0.0930   -4.34             0.000025 ***
## Education    -0.0718     0.0849   -0.85                0.399    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.71 on 154 degrees of freedom
## Multiple R-squared:  0.585,  Adjusted R-squared:  0.571 
## F-statistic: 43.4 on 5 and 154 DF,  p-value: <0.0000000000000002
par(mfrow=c(2,2)); plot(fitVio10)

fitNon10 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==10))
summary(fitNon10)
## 
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeNonS, Cluster == 10))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.5131 -0.4190 -0.0492  0.3629  2.3331 
## 
## Coefficients:
##             Estimate Std. Error t value    Pr(>|t|)    
## (Intercept)   0.2412     0.1338    1.80      0.0734 .  
## Population    0.6452     0.7317    0.88      0.3792    
## Ethnicity    -0.2692     0.0567   -4.75 0.000004431 ***
## Age           0.5224     0.0933    5.60 0.000000086 ***
## Income       -0.5205     0.0873   -5.96 0.000000014 ***
## Education    -0.2425     0.0814   -2.98      0.0033 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.69 on 168 degrees of freedom
## Multiple R-squared:  0.352,  Adjusted R-squared:  0.333 
## F-statistic: 18.3 on 5 and 168 DF,  p-value: 0.0000000000000187
par(mfrow=c(2,2)); plot(fitNon10)

Cluster 9

fitVio9 <- lm(`VC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeVioS,Cluster==9))
summary(fitVio9)
## 
## Call:
## lm(formula = `VC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeVioS, Cluster == 9))
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.502 -0.543 -0.055  0.237  3.337 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    0.694      0.572    1.21    0.231   
## Population    -0.877      1.064   -0.82    0.414   
## Ethnicity     -0.576      0.185   -3.11    0.003 **
## Age            0.237      0.333    0.71    0.479   
## Income        -0.658      0.316   -2.08    0.042 * 
## Education     -0.108      0.268   -0.40    0.690   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.91 on 53 degrees of freedom
## Multiple R-squared:  0.354,  Adjusted R-squared:  0.293 
## F-statistic: 5.81 on 5 and 53 DF,  p-value: 0.00024
par(mfrow=c(2,2)); plot(fitVio9)

fitNon9 <- lm(`NVC Rate`~Population+Ethnicity+Age+Income+Education,data=subset(crimeNonS,Cluster==9))
summary(fitNon9)
## 
## Call:
## lm(formula = `NVC Rate` ~ Population + Ethnicity + Age + Income + 
##     Education, data = subset(crimeNonS, Cluster == 9))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.3306 -0.4625  0.0593  0.3922  2.1699 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    0.949      0.421    2.25   0.0281 *  
## Population    -1.001      0.812   -1.23   0.2232    
## Ethnicity     -0.469      0.139   -3.37   0.0013 ** 
## Age            0.585      0.247    2.37   0.0214 *  
## Income        -1.051      0.242   -4.34 0.000059 ***
## Education     -0.573      0.204   -2.81   0.0067 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7 on 57 degrees of freedom
## Multiple R-squared:  0.397,  Adjusted R-squared:  0.344 
## F-statistic:  7.5 on 5 and 57 DF,  p-value: 0.0000182
par(mfrow=c(2,2)); plot(fitNon9)

 

Conclusion