Final Project

Introduction:

In this report, I investigate whether there is a statistically significant relationship between Party Identification and three measures of attitudes toward immigrants: Immigrant Contributions, Immigrant Naturalization, and feelings toward immigrants (ft_immig_2017).

Data Prep:

dataset<-read.csv("/Users/apple/Downloads/Abbreviated Voter Dataset Labeled.csv")

dataset<- dataset[,c("PartyIdentification", "ImmigrantContributions", "ImmigrantNaturalization", "ft_immig_2017")]
dataset<-na.omit(dataset)

head(dataset, 10)

##    PartyIdentification ImmigrantContributions ImmigrantNaturalization
## 1             Democrat      Mostly Contribute                   Favor
## 2           Republican         Mostly a Drain                Not Sure
## 3           Republican      Mostly Contribute                   Favor
## 5           Republican         Mostly a Drain                Not Sure
## 6             Democrat      Mostly Contribute                   Favor
## 7             Democrat      Mostly Contribute                   Favor
## 8          Independent         Mostly a Drain                  Oppose
## 9             Democrat      Mostly Contribute                   Favor
## 10            Democrat      Mostly Contribute                   Favor
## 12         Independent      Mostly Contribute                   Favor
##    ft_immig_2017
## 1             95
## 2             96
## 3             77
## 5             91
## 6            100
## 7            100
## 8              1
## 9             90
## 10            80
## 12            75

Analysis of Categorical Variable #1 (ImmigrantContributions):

Crosstab & Visualization:

suppressMessages(suppressWarnings(library(gmodels)))
table(dataset$PartyIdentification, dataset$ImmigrantContributions)

##              
##               Mostly a Drain Mostly Contribute Neither Not Sure
##   Democrat               559               770     243      207
##   Independent            799               396     200      115
##   Not Sure                38                 9       3        5
##   Other                   31                 9       8        5
##   Republican            1049               107     133       59

round(prop.table(table(dataset$PartyIdentification, dataset$ImmigrantContributions))*100,2)

##              
##               Mostly a Drain Mostly Contribute Neither Not Sure
##   Democrat             11.78             16.23    5.12     4.36
##   Independent          16.84              8.35    4.21     2.42
##   Not Sure              0.80              0.19    0.06     0.11
##   Other                 0.65              0.19    0.17     0.11
##   Republican           22.11              2.26    2.80     1.24

chisq.test(dataset$PartyIdentification, dataset$ImmigrantContributions)

## Warning in chisq.test(dataset$PartyIdentification,
## dataset$ImmigrantContributions): Chi-squared approximation may be incorrect

## 
##  Pearson's Chi-squared test
## 
## data:  dataset$PartyIdentification and dataset$ImmigrantContributions
## X-squared = 740.94, df = 12, p-value < 2.2e-16
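
The warning above typically appears when some cells have small expected counts. A minimal sketch of how this could be checked, re-entering the counts from the crosstab above by hand (the object names `tab` and `expected` are illustrative and not part of the original analysis):

```r
# Counts copied from the PartyIdentification x ImmigrantContributions crosstab.
tab <- matrix(
  c( 559, 770, 243, 207,
     799, 396, 200, 115,
      38,   9,   3,   5,
      31,   9,   8,   5,
    1049, 107, 133,  59),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Party = c("Democrat", "Independent", "Not Sure", "Other", "Republican"),
    Contributions = c("Mostly a Drain", "Mostly Contribute", "Neither", "Not Sure")
  )
)

# Expected counts under independence. The warning is suppressed here only
# because it is the thing being investigated.
expected <- suppressWarnings(chisq.test(tab))$expected
round(expected, 2)

# Cells with an expected count below 5 (a common rule of thumb).
which(expected < 5, arr.ind = TRUE)
```

With these counts, the only cells below 5 fall in the Not Sure and Other rows, which are the two smallest groups. If that is a concern, `chisq.test()` also accepts `simulate.p.value = TRUE` to compute a Monte Carlo p-value instead of relying on the chi-squared approximation.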

barplot(prop.table(table(dataset$ImmigrantContributions, dataset$PartyIdentification))*100,
        ylab='Percentage', xlab='Party', legend.text=TRUE,
        col=c("lightgreen", "pink", "lightblue", "yellow"),
        args.legend = list(x = "topright", inset = c(0.23, 0)))

Interpretation of Results:

According to the chi-squared test, there is a statistically significant relationship between Immigrant Contributions and Party Identification, as the p-value (< 2.2e-16) is less than .05. Democrats most often say that immigrants mostly contribute (770 of 1,779), while most Republicans (1,049 of 1,348) and Independents (799 of 1,510) say they are mostly a drain.
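
The percentage table above is computed over the whole sample, so it mixes party size with opinion. A minimal sketch of within-party (row) percentages, which match the comparison made in the interpretation more directly; the counts are re-entered from the crosstab, and `tab` and `row_pct` are illustrative names:

```r
# Counts copied from the PartyIdentification x ImmigrantContributions crosstab.
tab <- matrix(
  c( 559, 770, 243, 207,
     799, 396, 200, 115,
      38,   9,   3,   5,
      31,   9,   8,   5,
    1049, 107, 133,  59),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Party = c("Democrat", "Independent", "Not Sure", "Other", "Republican"),
    Contributions = c("Mostly a Drain", "Mostly Contribute", "Neither", "Not Sure")
  )
)

# margin = 1 normalizes within each row, so every party sums to 100%.
row_pct <- round(prop.table(tab, margin = 1) * 100, 2)
row_pct
```

Read this way, about 43% of Democrats choose Mostly Contribute, while about 78% of Republicans and 53% of Independents choose Mostly a Drain.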

Analysis of Categorical Variable #2 (ImmigrantNaturalization):

Crosstab & Visualization:

table(dataset$PartyIdentification, dataset$ImmigrantNaturalization)

##              
##               Favor Not Sure Oppose
##   Democrat     1079      326    374
##   Independent   626      283    601
##   Not Sure       20       12     23
##   Other          20       15     18
##   Republican    285      276    787

round(prop.table(table(dataset$PartyIdentification, dataset$ImmigrantNaturalization))*100,2)

##              
##               Favor Not Sure Oppose
##   Democrat    22.74     6.87   7.88
##   Independent 13.19     5.96  12.67
##   Not Sure     0.42     0.25   0.48
##   Other        0.42     0.32   0.38
##   Republican   6.01     5.82  16.59

chisq.test(dataset$PartyIdentification, dataset$ImmigrantNaturalization)

## 
##  Pearson's Chi-squared test
## 
## data:  dataset$PartyIdentification and dataset$ImmigrantNaturalization
## X-squared = 570.35, df = 8, p-value < 2.2e-16

barplot(prop.table(table(dataset$ImmigrantNaturalization, dataset$PartyIdentification))*100,
        ylab='Percentage', xlab='Party', legend.text=TRUE,
        col=c("lightgreen", "pink", "lightblue"),
        args.legend = list(x = "topright", inset = c(0.23, 0)))

Interpretation of Results:

According to the chi-squared test, there is a statistically significant relationship between Party Identification and Immigrant Naturalization, as the p-value (< 2.2e-16) is less than .05. According to the table, Democrats mostly favor naturalization (1,079 of 1,779), Republicans mostly oppose it (787 of 1,348), and Independents are almost evenly split between favoring (626) and opposing (601).
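
With a sample this large, a very small p-value says little about how strong the association is. One common way to gauge strength is Cramér's V; the sketch below is an optional add-on, not part of the original analysis, and re-enters the counts from the crosstab (`tab_nat`, `chi_sq`, and `cramers_v` are illustrative names):

```r
# Counts copied from the PartyIdentification x ImmigrantNaturalization crosstab.
tab_nat <- matrix(
  c(1079, 326, 374,
     626, 283, 601,
      20,  12,  23,
      20,  15,  18,
     285, 276, 787),
  nrow = 5, byrow = TRUE,
  dimnames = list(
    Party = c("Democrat", "Independent", "Not Sure", "Other", "Republican"),
    Naturalization = c("Favor", "Not Sure", "Oppose")
  )
)

chi_sq <- unname(chisq.test(tab_nat)$statistic)  # Pearson chi-squared statistic
n <- sum(tab_nat)                                # total respondents
k <- min(dim(tab_nat)) - 1                       # min(rows, cols) - 1

# Cramér's V = sqrt(chi-squared / (n * k)), ranging from 0 to 1.
cramers_v <- sqrt(chi_sq / (n * k))
round(cramers_v, 3)
```

With these counts V comes out to roughly 0.25; how to label that strength depends on the convention being followed.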

Analysis of Continuous Variable #1 (ft_immig_2017):

suppressMessages(suppressWarnings(library(psych)))
suppressMessages(suppressWarnings(library(ggpubr)))

describeBy(dataset$ft_immig_2017, dataset$PartyIdentification)

## 
##  Descriptive statistics by group 
## group: Democrat
##    vars    n  mean    sd median trimmed  mad min max range  skew kurtosis   se
## X1    1 1779 70.27 24.79     76   73.21 25.2   0 100   100 -0.89     0.25 0.59
## ------------------------------------------------------------ 
## group: Independent
##    vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 1510 61.28 26.87     61   63.32 28.17   0 100   100 -0.51    -0.45 0.69
## ------------------------------------------------------------ 
## group: Not Sure
##    vars  n  mean   sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 55 48.13 29.8     50   47.71 35.58   0 100   100 -0.04    -1.03 4.02
## ------------------------------------------------------------ 
## group: Other
##    vars  n  mean sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 53 59.62 27     63   60.93 26.69   0 100   100 -0.33    -0.75 3.71
## ------------------------------------------------------------ 
## group: Republican
##    vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 1348 52.42 26.93     51   53.09 31.13   0 100   100 -0.22    -0.74 0.73

summary(aov(ft_immig_2017 ~ PartyIdentification, data = dataset))

##                       Df  Sum Sq Mean Sq F value Pr(>F)    
## PartyIdentification    4  256973   64243   93.83 <2e-16 ***
## Residuals           4740 3245209     685                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ggplot(dataset, aes(x=PartyIdentification, y=ft_immig_2017, fill=PartyIdentification)) + 
  geom_boxplot(alpha=0.3) +
  theme(legend.position="none") +
  scale_fill_brewer(palette="Dark2")

Interpretation of Results:

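According to the one-way ANOVA table, the p-value (< 2e-16) is less than .05. Therefore, the results are statistically significant: mean ft_immig_2017 scores differ across the Party Identification groups, ranging from 48.13 (Not Sure) to 70.27 (Democrat).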
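
As a worked check on the ANOVA table, the F value and a simple effect size (eta squared) can be recomputed from the reported sums of squares. This is a sketch using only the numbers printed above; the variable names are illustrative:

```r
# Values copied from the ANOVA table above.
ss_between <- 256973    # Sum Sq for PartyIdentification
ss_within  <- 3245209   # Sum Sq for Residuals
df_between <- 4
df_within  <- 4740

# F = mean square between groups / mean square within groups.
f_value <- (ss_between / df_between) / (ss_within / df_within)

# Eta squared = share of total variance explained by party.
eta_sq <- ss_between / (ss_between + ss_within)

round(f_value, 2)
round(eta_sq, 3)
```

On these numbers, party identification accounts for roughly 7% of the variance in ft_immig_2017. The ANOVA itself does not say which parties differ from each other; `TukeyHSD()` applied to the fitted `aov` object is one way to compare pairs.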

Conclusion

- There is a statistically significant relationship between Party Identification and both Immigrant Contributions and Immigrant Naturalization.
- Mean scores for feelings toward immigrants (ft_immig_2017) differ significantly across Party Identification groups: Democrats have the highest mean (70.27) and the Not Sure group has the lowest (48.13).