I hypothesize that there is a relationship between the type of neighborhood voters live in (city, suburb, town, or rural area) and how they view immigrants' contributions to society (mostly contribute, mostly a drain, neither, or not sure). To test this hypothesis, I analyze responses in the voter dataset.
library(readr)
library(dplyr)
library(ggplot2)
voterdata <- read.csv("/Users/Nazija/Downloads/Abbreviated Voter Dataset Labeled.csv")
head(voterdata)
## gender race education familyincome children region
## 1 Female White 4-year Prefer not to say No West
## 2 Female White Some College $60K-$69,999 No West
## 3 Male White High School Graduate $50K-$59,999 No Midwest
## 4 Male White Some College $70K-$79,999 No South
## 5 Male White 4-year $40K-$49,999 No South
## 6 Female White 2-year $30K-$39,999 No West
## urbancity Vote2012 Vote2016 TrumpSanders
## 1 Suburb Barack Obama Hillary Clinton Bernie Sanders
## 2 Rural Area Mitt Romney Donald Trump Donald Trump
## 3 City Mitt Romney Hillary Clinton Bernie Sanders
## 4 City Barack Obama Gary Johnson Bernie Sanders
## 5 Suburb Mitt Romney Donald Trump Donald Trump
## 6 Suburb Barack Obama Hillary Clinton Bernie Sanders
## PartyRegistration PartyIdentification PartyIdentification2
## 1 <NA> Democrat Not very strong Democrat
## 2 Republican Republican Strong Republican
## 3 <NA> Republican Strong Republican
## 4 Decline/No Party/Independent Independent Independent
## 5 <NA> Republican Strong Republican
## 6 Democrat Democrat Strong Democrat
## PartyIdentification3 NewsPublicAffairs DemPrimary RepPrimary
## 1 Moderate Most of the time Hillary Clinton <NA>
## 2 Conservative Most of the time <NA> Donald Trump
## 3 Moderate Most of the time Hillary Clinton <NA>
## 4 Moderate Most of the time Someone Else <NA>
## 5 Conservative Most of the time <NA> Marco Rubio
## 6 Very Liberal Most of the time Hillary Clinton <NA>
## ImmigrantContributions ImmigrantNaturalization ImmigrationShouldBe
## 1 Mostly Contribute Favor Slightly Easier
## 2 Mostly a Drain Not Sure No change
## 3 Mostly Contribute Favor Much Easier
## 4 Mostly Contribute Favor Much Easier
## 5 Mostly a Drain Not Sure Slightly Easier
## 6 Mostly Contribute Favor Slightly Harder
## Abortion GayMarriage DeathPenalty
## 1 Legal in all cases Favor Oppose
## 2 Legal in some cases and Illegal in others Oppose Favor
## 3 Legal in all cases Favor Favor
## 4 Legal in some cases and Illegal in others Favor Favor
## 5 Illegal in all cases Oppose Favor
## 6 Legal in all cases Favor Not Sure
## DeathPenaltyFreq TaxWealthy Healthcare GlobWarmExist
## 1 Too Often Favor Yes Definitely is happening
## 2 Not Often Enough Oppose No Definitely not happening
## 3 Not Often Enough Favor Yes Definitely is happening
## 4 About Right Favor Yes Definitely is happening
## 5 Not Often Enough Oppose No Definitely not happening
## 6 Not Sure Favor Yes Definitely is happening
## GlobWarmingSerious AffirmativeAction Religion
## 1 Very Serious Favor Roman Catholic
## 2 <NA> Oppose Mormon
## 3 Very Serious Favor Agnostic
## 4 Somewhat Serious Favor Nothing in Particular
## 5 <NA> Oppose Mormon
## 6 Very Serious Favor Agnostic
## ReligiousImportance ChurchAttendance PrayerFrequency NumChildren
## 1 Somewhat Important Seldom Once a day 0
## 2 Very Important More than once a week Several times a day 0
## 3 Not at all Important Seldom Never 0
## 4 Not at all Important Seldom A few times a month 0
## 5 Very Important Once a week A few times a week 0
## 6 Not at all Important Never Never 0
## areatype GunOwnership EconomyBetterWorse Immigr_Economy_GiveTake
## 1 Suburb No Gun in Household Getting Better 7
## 2 Rural Area Gun in Household About the Same 10
## 3 City Gun in Household Getting Better 8
## 4 City No Gun in Household Getting Better NA
## 5 Suburb No Gun in Household Getting Worse 7
## 6 Suburb Gun in Household About the Same 10
## ft_fem_2017 ft_immig_2017 ft_police_2017 ft_dem_2017 ft_rep_2017
## 1 99 95 76 88 21
## 2 65 96 95 86 96
## 3 74 77 78 91 20
## 4 NA NA NA NA NA
## 5 25 91 94 22 83
## 6 100 100 28 99 NA
## ft_evang_2017 ft_muslim_2017 ft_jew_2017 ft_christ_2017 ft_gays_2017
## 1 50 50 50 50 50
## 2 96 61 100 98 82
## 3 2 49 25 50 77
## 4 NA NA NA NA NA
## 5 70 80 91 94 71
## 6 NA 100 100 28 100
## ft_unions_2017 ft_altright_2017 ft_black_2017 ft_white_2017 ft_hisp_2017
## 1 80 1 51 50 79
## 2 62 50 98 90 95
## 3 100 0 87 90 91
## 4 NA NA NA NA NA
## 5 20 50 90 85 90
## 6 100 NA 100 50 100
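Select the variables needed for the analysis, filter to keep only the categories of interest, and store the prepared data in a new object.
voter <- voterdata %>%
  select(urbancity, ImmigrantContributions) %>%   # keep only the two variables of interest
  filter(!is.na(ImmigrantContributions),          # drop missing responses (read.csv stores "NA" as a real NA)
         urbancity != "Other")                    # drop the "Other" neighborhood category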
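urbancity is stored as a factor, which is why the tibble output further down lists it as `<fct>`. Filtering out "Other" removes those rows but keeps the factor level, so an all-zero "Other" row still appears in the tables below. The sketch uses a small hypothetical data frame in place of the real voter file, and its values are made up. It shows how base R's `droplevels()` removes the unused level.

```r
# Hypothetical toy data standing in for the voter file (values are made up).
toy <- data.frame(
  urbancity = factor(c("City", "Other", "Suburb", "Rural Area", "City")),
  ImmigrantContributions = factor(c("Mostly Contribute", "Neither", NA,
                                    "Mostly a Drain", "Mostly a Drain"))
)

# Base-R equivalent of the dplyr filter: drop missing answers and "Other".
kept <- toy[!is.na(toy$ImmigrantContributions) & toy$urbancity != "Other", ]

levels(kept$urbancity)  # "Other" (and the now-empty "Suburb") are still levels
table(kept$urbancity)   # so table() reports them with a count of 0

# droplevels() removes factor levels that no longer occur in the data.
kept <- droplevels(kept)
levels(kept$urbancity)
table(kept$urbancity)
```

In the original pipeline, the same effect would come from appending `%>% droplevels()` after `filter()`. The "Other" rows would then disappear from the tables that follow, so their printed output would change.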
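The table below shows the proportion of responses in each category of urbancity. "Other" still appears, with a share of 0.00, because filtering removed its rows but not its factor level.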
table(voter$urbancity)%>%
prop.table()%>%
round(2)
##
## City Other Rural Area Suburb Town
## 0.29 0.00 0.19 0.38 0.14
The table below shows the proportion of responses in each category of ImmigrantContributions.
table(voter$ImmigrantContributions)%>%
prop.table()%>%
round(2)
##
## Mostly a Drain Mostly Contribute Neither Not Sure
## 0.52 0.27 0.12 0.08
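Below are the proportions we would expect in each cell of the crosstab if the two variables were completely independent of each other. This is what we might consider the “null hypothesis”. Under independence, the expected share of a cell is the product of its row share and its column share, taken from the two tables above.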
City * Mostly Contribute = 0.29 * 0.27 = 0.08
City * Neither = 0.29 * 0.12 = 0.03
City * Mostly a Drain = 0.29 * 0.52 = 0.15
City * Not Sure = 0.29 * 0.08 = 0.02
Rural Area * Mostly Contribute = 0.19 * 0.27 = 0.05
Rural Area * Neither = 0.19 * 0.12 = 0.02
Rural Area * Mostly a Drain = 0.19 * 0.52 = 0.10
Rural Area * Not Sure = 0.19 * 0.08 = 0.02
Suburb * Mostly Contribute = 0.38 * 0.27 = 0.10
Suburb * Neither = 0.38 * 0.12 = 0.05
Suburb * Mostly a Drain = 0.38 * 0.52 = 0.20
Suburb * Not Sure = 0.38 * 0.08 = 0.03
Town * Mostly Contribute = 0.14 * 0.27 = 0.04
Town * Neither = 0.14 * 0.12 = 0.02
Town * Mostly a Drain = 0.14 * 0.52 = 0.07
Town * Not Sure = 0.14 * 0.08 = 0.01
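The sixteen products above can also be computed in one step. The sketch below does this with base R's `outer()`, using the rounded marginal proportions from the two one-way tables as inputs. "Other" is left out because its share rounds to 0.00.

```r
# Marginal proportions copied from the one-way tables above.
urban <- c(City = 0.29, `Rural Area` = 0.19, Suburb = 0.38, Town = 0.14)
views <- c(`Mostly a Drain` = 0.52, `Mostly Contribute` = 0.27,
           Neither = 0.12, `Not Sure` = 0.08)

# Under independence, each cell = (row share) * (column share).
# outer() forms every such product and uses the names as dimnames.
expected_prop <- outer(urban, views)
round(expected_prop, 2)
```

Each rounded entry matches the corresponding line of the list above. For example, City by Mostly a Drain is 0.29 * 0.52, which rounds to 0.15.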
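The table below shows the observed proportion of all responses in each combination of categories (table percentages). Most cells are close to the values expected under the null hypothesis. The City and Rural Area rows, however, depart from them in opposite directions. City voters answered "Mostly a Drain" less often than expected (0.12 vs. 0.15) and "Mostly Contribute" more often (0.10 vs. 0.08). Rural Area voters answered "Mostly a Drain" more often (0.12 vs. 0.10) and "Mostly Contribute" less often (0.03 vs. 0.05).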
table(voter$urbancity, voter$ImmigrantContributions)%>%
prop.table()%>%
round(2)
##
## Mostly a Drain Mostly Contribute Neither Not Sure
## City 0.12 0.10 0.03 0.03
## Other 0.00 0.00 0.00 0.00
## Rural Area 0.12 0.03 0.02 0.02
## Suburb 0.20 0.10 0.05 0.03
## Town 0.08 0.04 0.02 0.01
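The observed-versus-expected gaps can be measured directly. The sketch below copies in by hand the observed counts that `chisq.test()` prints later in this analysis and subtracts the independence baseline from the observed table percentages. It works with unrounded values, so figures may differ slightly from the two-decimal numbers above. It is an illustration, not part of the original workflow.

```r
# Observed counts, copied from the chisq.test() output in this analysis.
observed <- matrix(
  c( 973, 761, 269, 242,
     918, 264, 173, 120,
    1582, 799, 384, 221,
     639, 283, 133,  80),
  nrow = 4, byrow = TRUE,
  dimnames = list(urbancity = c("City", "Rural Area", "Suburb", "Town"),
                  ImmigrantContributions = c("Mostly a Drain", "Mostly Contribute",
                                             "Neither", "Not Sure"))
)

obs_prop <- prop.table(observed)                         # table percentages
exp_prop <- outer(rowSums(obs_prop), colSums(obs_prop))  # independence baseline

# Positive: more responses than independence predicts; negative: fewer.
round(obs_prop - exp_prop, 3)
```

The largest deviations, about 2 to 3 percentage points of the whole sample, fall in the City and Rural Area cells for "Mostly a Drain" and "Mostly Contribute".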
The table below shows row percentages. The independent variable (urbancity) is in the rows, so each row sums to 1. Each row therefore shows how views on immigrant contributions are distributed within one type of neighborhood, which is the comparison the hypothesis is about.
table(voter$urbancity, voter$ImmigrantContributions)%>%
prop.table(1)%>%
round(2)
##
## Mostly a Drain Mostly Contribute Neither Not Sure
## City 0.43 0.34 0.12 0.11
## Other
## Rural Area 0.62 0.18 0.12 0.08
## Suburb 0.53 0.27 0.13 0.07
## Town 0.56 0.25 0.12 0.07
voter%>%
group_by(urbancity, ImmigrantContributions)%>%
summarize(n = n())%>%
mutate(percentage = n/sum(n))
## `summarise()` regrouping output by 'urbancity' (override with `.groups` argument)
## # A tibble: 16 x 4
## # Groups: urbancity [4]
## urbancity ImmigrantContributions n percentage
## <fct> <fct> <int> <dbl>
## 1 City Mostly a Drain 973 0.433
## 2 City Mostly Contribute 761 0.339
## 3 City Neither 269 0.120
## 4 City Not Sure 242 0.108
## 5 Rural Area Mostly a Drain 918 0.622
## 6 Rural Area Mostly Contribute 264 0.179
## 7 Rural Area Neither 173 0.117
## 8 Rural Area Not Sure 120 0.0814
## 9 Suburb Mostly a Drain 1582 0.530
## 10 Suburb Mostly Contribute 799 0.268
## 11 Suburb Neither 384 0.129
## 12 Suburb Not Sure 221 0.0740
## 13 Town Mostly a Drain 639 0.563
## 14 Town Mostly Contribute 283 0.249
## 15 Town Neither 133 0.117
## 16 Town Not Sure 80 0.0705
voter%>%
group_by(urbancity, ImmigrantContributions)%>%
summarize(n = n())%>%
mutate(percentage = n/sum(n))%>%
ggplot()+
geom_col(aes(x = urbancity, y = percentage, fill = ImmigrantContributions ))
## `summarise()` regrouping output by 'urbancity' (override with `.groups` argument)
The row percentages and the bar chart show that the type of neighborhood voters live in is associated with their views on immigrant contributions to society. City voters were the most likely to say that immigrants mostly contribute (34%) and the least likely to say they are mostly a drain (43%). Rural voters were the least likely to say immigrants mostly contribute (18%) and the most likely to call them mostly a drain (62%). Suburban and town voters held very similar views (27% and 25% "Mostly Contribute"; 53% and 56% "Mostly a Drain") and fell between city and rural voters.
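The sketch below turns the city/rural contrast into percentage points. It rebuilds the row percentages from the observed counts reported by `chisq.test()` in this analysis, copied in by hand, and subtracts the Rural Area row from the City row.

```r
# Observed counts, copied from the chisq.test() output in this analysis.
observed <- matrix(
  c( 973, 761, 269, 242,
     918, 264, 173, 120,
    1582, 799, 384, 221,
     639, 283, 133,  80),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("City", "Rural Area", "Suburb", "Town"),
                  c("Mostly a Drain", "Mostly Contribute", "Neither", "Not Sure"))
)

# Row percentages: each neighborhood type's responses sum to 100%.
row_pct <- prop.table(observed, margin = 1)
round(100 * row_pct, 1)

# City minus Rural Area, in percentage points, for every response category.
gap <- 100 * (row_pct["City", ] - row_pct["Rural Area", ])
round(gap, 1)
```

City voters come out about 16 points more likely than rural voters to answer "Mostly Contribute" and about 19 points less likely to answer "Mostly a Drain".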
Below are the results of a chi-squared test of independence. The test asks whether the association between the two variables is larger than we would plausibly see by chance if the variables were independent.
The results below indicate a statistically significant relationship between the type of neighborhood a voter lives in (urbancity) and their views on immigrant contributions (ImmigrantContributions).
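For reference, the Pearson statistic reported by `chisq.test()` compares each observed count O_ij with the count E_ij expected under independence. Here n_i· and n_·j are the row and column totals and n is the number of respondents. The test has r = 4 neighborhood types and c = 4 response categories, because `chisq.test()` drops the empty "Other" level, as the four-row expected table shows. The degrees of freedom are therefore (4 - 1)(4 - 1) = 9.

```latex
X^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{n},
\qquad
\mathrm{df} = (r - 1)(c - 1)
```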
chisq.test(voter$urbancity, voter$ImmigrantContributions)["expected"]
## $expected
## voter$ImmigrantContributions
## voter$urbancity Mostly a Drain Mostly Contribute Neither Not Sure
## City 1177.3294 603.2668 274.5766 189.82719
## Rural Area 773.5238 396.3557 180.4011 124.71942
## Suburb 1565.9268 802.3852 365.2052 252.48285
## Town 595.2200 304.9923 138.8171 95.97054
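Each expected count above equals (row total × column total) / n. For example, City by Mostly a Drain is 2245 × 4112 / 7841 ≈ 1177.33. As a check, the sketch below rebuilds the expected table with `outer()` from the observed counts that `chisq.test()` prints next, copied in by hand.

```r
# Observed counts, copied from the chisq.test() output in this analysis.
observed <- matrix(
  c( 973, 761, 269, 242,
     918, 264, 173, 120,
    1582, 799, 384, 221,
     639, 283, 133,  80),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("City", "Rural Area", "Suburb", "Town"),
                  c("Mostly a Drain", "Mostly Contribute", "Neither", "Not Sure"))
)

n <- sum(observed)  # 7841 respondents after filtering

# Expected count for every cell: row total * column total / n.
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 2)
```

The expected table keeps the same row and column totals as the observed one. Only the way the counts are spread inside the table changes.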
chisq.test(voter$urbancity, voter$ImmigrantContributions)["observed"]
## $observed
## voter$ImmigrantContributions
## voter$urbancity Mostly a Drain Mostly Contribute Neither Not Sure
## City 973 761 269 242
## Rural Area 918 264 173 120
## Suburb 1582 799 384 221
## Town 639 283 133 80
chisq.test(voter$urbancity, voter$ImmigrantContributions)["p.value"]
## $p.value
## [1] 4.195969e-33
chisq.test(voter$urbancity, voter$ImmigrantContributions)
##
## Pearson's Chi-squared test
##
## data: voter$urbancity and voter$ImmigrantContributions
## X-squared = 175.6, df = 9, p-value < 2.2e-16
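The p-value is far below the conventional 0.05 threshold, so we reject the null hypothesis of independence: there is a statistically significant relationship between neighborhood type and views on immigrant contributions. The exact p-value is about 4.2e-33. The printed test shows "< 2.2e-16" because R reports any p-value below machine precision that way. With 7,841 respondents, even modest differences reach significance. The size of the gaps in the row percentages, such as 34% versus 18% "Mostly Contribute" for city and rural voters, is therefore the better guide to how strong the relationship is.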
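To show where X-squared = 175.6, df = 9 and the p-value come from, the sketch below computes the Pearson statistic by hand from the observed counts (copied in from the output above). It then takes the upper-tail probability with `pchisq()` and cross-checks against `chisq.test()` run on the same matrix.

```r
# Observed counts, copied from the chisq.test() output above.
observed <- matrix(
  c( 973, 761, 269, 242,
     918, 264, 173, 120,
    1582, 799, 384, 221,
     639, 283, 133,  80),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("City", "Rural Area", "Suburb", "Town"),
                  c("Mostly a Drain", "Mostly Contribute", "Neither", "Not Sure"))
)

# Expected counts under independence.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum of (O - E)^2 / E over all 16 cells.
x2  <- sum((observed - expected)^2 / expected)
dof <- (nrow(observed) - 1) * (ncol(observed) - 1)
p   <- pchisq(x2, df = dof, lower.tail = FALSE)  # upper-tail probability
c(statistic = x2, df = dof, p.value = p)

# Cross-check against R's implementation on the same table.
chisq.test(observed)$statistic
```

Almost all of the statistic comes from the City and Rural Area rows, which matches the pattern in the row percentages.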