605-Discussion 11

y <- data.frame(read.table("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/DATA605/poverty.csv"))

df <- y[,-c(1,3,4,6)]
df[1][is.na(df[1])] <- 0
head(df)

##       V2        V5
## 1 PovPct ViolCrime
## 2   20.1      11.2
## 3    7.1       9.1
## 4   16.1      10.4
## 5   14.9      10.4
## 6   16.7      11.2
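
The `head(df)` output above shows the column names (`PovPct`, `ViolCrime`) sitting in row 1 as data, which is why the later code needs `as.numeric(as.character(...))` and why the fit reports one observation deleted. A minimal sketch of one way to avoid that is to pass `header = TRUE` to `read.table`. The separator used by the original file is not shown here, so this sketch is an assumption: it uses inline whitespace-separated text built from the five data rows printed above, and `txt` and `demo` are illustrative names.

```r
# First five data rows, copied from the head(df) output above
txt <- "PovPct ViolCrime
20.1 11.2
7.1 9.1
16.1 10.4
14.9 10.4
16.7 11.2"

# header = TRUE uses the first line as column names instead of as a data row
demo <- read.table(text = txt, header = TRUE)
str(demo)

# The columns are parsed as numeric, so log() can be applied directly
log(demo$PovPct)
```

With the header parsed this way, the conversion through `as.character` should no longer be needed for these two columns.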

plot(df, xlab = "Poverty Percent", ylab = "Violent Crime", las = 1)
lines(lowess(df[[1]], df[[2]], f = 2/3, iter = 3), col = "red")
title(main = "Poverty Index")

Poverty_Percent <- log(as.numeric(as.character(df[[1]])))
Violation_Crime <- log(as.numeric(as.character(df[[2]])))
lregression <- lm(Violation_Crime ~ Poverty_Percent, data = df)
summary(lregression)

## 
## Call:
## lm(formula = Violation_Crime ~ Poverty_Percent, data = df)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.83801 -0.33373  0.05525  0.37489  1.77400 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      -1.0048     0.7292  -1.378  0.17446    
## Poverty_Percent   1.1016     0.2866   3.843  0.00035 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6434 on 49 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2316, Adjusted R-squared:  0.2159 
## F-statistic: 14.77 on 1 and 49 DF,  p-value: 0.00035
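
As a quick consistency check on the summary above: in a regression with a single predictor, the F-statistic equals the square of the slope's t value, and R-squared can be recovered from F and the residual degrees of freedom. The sketch below only redoes that arithmetic with the rounded numbers printed in the output, so small rounding differences are expected. The names `t_slope`, `f_stat`, `df_res`, `r2` and `p_val` are illustrative.

```r
# Values copied from the summary() output above
t_slope <- 3.843   # t value for Poverty_Percent
f_stat  <- 14.77   # F-statistic
df_res  <- 49      # residual degrees of freedom

# With one predictor, F is the squared slope t value
t_slope^2

# R-squared recovered from F and the residual degrees of freedom
r2 <- f_stat / (f_stat + df_res)
r2

# Upper-tail p-value of the F test on 1 and 49 degrees of freedom
p_val <- pf(f_stat, df1 = 1, df2 = df_res, lower.tail = FALSE)
p_val
```

This is also why the slope's p-value and the overall F-test p-value are the same number in the output.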

par(mfrow = c(2, 2))
plot(lregression)

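The R-squared is low (23.2%), so the predictor explains less than a quarter of the variation in the response. Even so, the slope's p-value is small (0.00035), which is statistically significant evidence of a real, positive relationship between the response variable and the predictor. We also need to keep in mind that the dataset is small: only 52 rows as read, one of which is the header row, leaving 51 observations in the fit.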
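
Because both variables were log-transformed (natural log, the default for `log()` in R) before fitting, the slope can be read as an elasticity: a proportional change in the poverty percentage is associated with a proportional change in the fitted crime value. The sketch below works through that arithmetic for a hypothetical 10% increase in poverty. The 10% scenario and the names `slope`, `ratio` and `pct_change` are assumptions for illustration, and the result is a point estimate of an association, not a causal effect.

```r
# Slope of the log-log fit, copied from the summary() output
slope <- 1.1016

# Hypothetical scenario: the poverty percentage rises by 10%
ratio <- 1.10

# On the log-log scale the fitted response is multiplied by ratio^slope
pct_change <- (ratio^slope - 1) * 100
pct_change
```

Given the slope's standard error of 0.2866, this figure carries considerable uncertainty.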

Ohannes (Hovig) Ohannessian

11/6/2018