library(tibble)
library(ggplot2)
library(readxl)
read_excel("TexasCountyPoverty.xlsx")
## # A tibble: 7 × 4
## `Label (Grouping)` Population for whom …¹ pct_decimal county_vote_turnout …²
## <chr> <dbl> <dbl> <dbl>
## 1 Bexar County, Texas… 0.157 0.157 44.2
## 2 Dallas County, Texa… 0.142 0.142 44.2
## 3 El Paso County, Tex… 0.213 0.213 32.9
## 4 Harris County, Texa… 0.165 0.165 43.5
## 5 Hidalgo County, Tex… 0.276 0.276 34.4
## 6 Tarrant County, Tex… 0.106 0.106 47.0
## 7 Webb County, Texas!… 0.201 0.201 31.2
## # ℹ abbreviated names: ¹`Population for whom poverty status is determined`,
## # ²`county_vote_turnout ?`
TexasCountyPoverty <- read_excel("TexasCountyPoverty.xlsx")
# x: poverty rate as a decimal
x <- TexasCountyPoverty$`Population for whom poverty status is determined`
# y: county voter turnout
y <- TexasCountyPoverty$`county_vote_turnout ?`
texas_data <- data.frame(x = x, y = y)
ggplot(texas_data, aes(x = x, y = y)) + geom_point() + geom_smooth(method = 'lm', color = 'red')
## `geom_smooth()` using formula = 'y ~ x'
# Regress poverty rate (x) on voter turnout (y)
poverty_model <- lm(x ~ y, data = texas_data)
summary(poverty_model)
##
## Call:
## lm(formula = x ~ y, data = texas_data)
##
## Residuals:
## 1 2 3 4 5 6 7
## 0.008651 -0.005788 -0.014070 0.012367 0.058902 -0.022055 -0.038008
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.458385 0.086014 5.329 0.00312 **
## y -0.007022 0.002145 -3.274 0.02211 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.03423 on 5 degrees of freedom
## Multiple R-squared: 0.6819, Adjusted R-squared: 0.6183
## F-statistic: 10.72 on 1 and 5 DF, p-value: 0.02211
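Poverty is associated with lower turnout in these counties. The slope on y (voter turnout) is -0.007022 and is significant at the 95% confidence level (p = 0.022), and about 68% of the variation in x (poverty level) is explained by y (R-squared = 0.6819). Y has a negative effect on X: each one-unit increase in turnout lowers the predicted poverty rate by about 0.007.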
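As a quick consistency check, the fitted line and the reported statistics can be written out from the summary output. This is only a sketch using the printed, rounded values. The identity linking F and R-squared holds for a regression with one predictor, here with 5 residual degrees of freedom.

```latex
\begin{align*}
\hat{x} &= 0.458385 - 0.007022\,y \\
t &= \frac{-0.007022}{0.002145} \approx -3.274 \\
F &= t^2 \approx 10.72 \\
R^2 &= \frac{F}{F + 5} = \frac{10.72}{15.72} \approx 0.682
\end{align*}
```

Read this way, a ten-unit rise in turnout corresponds to a predicted poverty rate about 0.07 lower.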
plot(poverty_model,which=1)
No. The residuals are close to a linear pattern, but the final two points depart from it, so the linearity assumption is not fully met.
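To see which counties sit behind each point of the residual plot, the residuals can be tabulated next to the fitted values. The sketch below is an illustration under assumptions: it retypes the rounded values from the printed table instead of reading the spreadsheet, and the names texas_sketch, sketch_model, and diag_table are hypothetical. Its numbers will therefore differ slightly from the output above.

```r
# Assumption: values retyped from the rounded table printed earlier,
# not read from TexasCountyPoverty.xlsx.
texas_sketch <- data.frame(
  county  = c("Bexar", "Dallas", "El Paso", "Harris", "Hidalgo", "Tarrant", "Webb"),
  poverty = c(0.157, 0.142, 0.213, 0.165, 0.276, 0.106, 0.201),
  turnout = c(44.2, 44.2, 32.9, 43.5, 34.4, 47.0, 31.2)
)

# Same direction as poverty_model: poverty regressed on turnout.
sketch_model <- lm(poverty ~ turnout, data = texas_sketch)

# One row per county: fitted value and residual, rounded for reading.
diag_table <- data.frame(
  county   = texas_sketch$county,
  fitted   = round(fitted(sketch_model), 3),
  residual = round(resid(sketch_model), 3)
)

# Sort by fitted value so the rows follow the residual plot left to right.
diag_table[order(diag_table$fitted), ]
```

With only seven observations, a table like this is often easier to read than the plot when deciding which points break the pattern.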