Assignment 9: Chi Square Test for Independence
The two categorical variables used for this analysis were Gender and Area Lived. The purpose of the analysis was to test whether the two variables are associated or are independent of one another. The data, summary tables, and test results follow.
library(readr)
library(dplyr)
library(knitr)
library(ggplot2)
VoterData <- read_csv("/Users/safiesaf/Downloads/VOTER_Survey_July17_Release1-csv.csv")
# Rename the two survey items and recode their numeric codes as labels;
# any other code becomes NA.
New3Data <- VoterData %>%
  rename("Gender" = gender_baseline,
         "AreaLived" = urbancity_baseline) %>%
  mutate(Gender = ifelse(Gender == 1, "Male",
                  ifelse(Gender == 2, "Female", NA)),
         AreaLived = ifelse(AreaLived == 1, "City",
                     ifelse(AreaLived == 2, "Suburb",
                     ifelse(AreaLived == 3, "Town",
                     ifelse(AreaLived == 4, "Rural Area",
                     ifelse(AreaLived == 5, "Other", NA)))))) %>%
  select(Gender, AreaLived)
head(New3Data)
## # A tibble: 6 x 2
## Gender AreaLived
## <chr> <chr>
## 1 Female Suburb
## 2 Female Rural Area
## 3 Male City
## 4 Male City
## 5 Male Suburb
## 6 Female Suburb
kable(head(New3Data))
| Gender | AreaLived |
|---|---|
| Female | Suburb |
| Female | Rural Area |
| Male | City |
| Male | City |
| Male | Suburb |
| Female | Suburb |
Of the respondents, 51% are female and 49% are male. By area, 28% live in a city, 38% in a suburb, 19% in a rural area, 14% in a town, and under 1% in some other type of area. The cross-tabulation reports each gender-by-area cell as a share of all respondents: about 14% are women living in a city and about 14% are men living in a city; 10% are rural women and 9% are rural men; 19% each are suburban women and suburban men; and 7% each are women and men living in a town. Because the sample is split almost evenly by gender, these near-equal cell shares mean that women and men are distributed across area types in almost the same way. Pearson’s chi-square test gives X-squared = 3.6742 with 4 degrees of freedom and a p-value of 0.4519. Since the p-value is greater than 0.05, the association is not statistically significant, and we fail to reject the null hypothesis that Gender and Area Lived are independent.
GenderTable <- table(New3Data$Gender)
prop.table(GenderTable)
##
## Female Male
## 0.5075 0.4925
AreaLivedTable <- table(New3Data$AreaLived)
prop.table(AreaLivedTable)
##
## City Other Rural Area Suburb Town
## 0.284652218 0.006426411 0.186869960 0.378276210 0.143775202
CrossTabTable <- table(New3Data$Gender, New3Data$AreaLived)
# With no margin argument, each cell is a share of all respondents
prop.table(CrossTabTable)
##
## City Other Rural Area Suburb Town
## Female 0.142767137 0.003654234 0.098538306 0.190524194 0.072328629
## Male 0.141885081 0.002772177 0.088331653 0.187752016 0.071446573
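The cell proportions above are shares of the whole sample, so a direct comparison of women and men is easier with row-conditional proportions. The following is a minimal sketch that assumes only the joint proportions printed above, re-entered by hand into a matrix named `joint` (a name introduced here for illustration). With the original data, `prop.table(CrossTabTable, margin = 1)` should give the same result directly.

```r
# Joint proportions copied from the cross-tabulation output above
joint <- matrix(
  c(0.142767137, 0.003654234, 0.098538306, 0.190524194, 0.072328629,
    0.141885081, 0.002772177, 0.088331653, 0.187752016, 0.071446573),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  AreaLived = c("City", "Other", "Rural Area", "Suburb", "Town"))
)

# margin = 1 rescales each row to sum to 1, giving the distribution
# of area types within each gender
cond <- prop.table(joint, margin = 1)
round(cond, 3)
```

If the two variables are independent, the Female and Male rows should look alike apart from sampling noise. Here roughly 28% of women and 29% of men live in a city, which is consistent with little or no association.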
chisq.test(New3Data$Gender,New3Data$AreaLived)
##
## Pearson's Chi-squared test
##
## data: New3Data$Gender and New3Data$AreaLived
## X-squared = 3.6742, df = 4, p-value = 0.4519
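As a check on the reported result, the degrees of freedom and the p-value can be reproduced from the printed statistic alone. This is a sketch using base R's `pchisq` and `qchisq`; the variable names are introduced here for illustration, and 0.05 is the significance level used in the discussion above.

```r
# Degrees of freedom for a test of independence: (rows - 1) * (columns - 1)
n_rows <- 2   # Female, Male
n_cols <- 5   # City, Other, Rural Area, Suburb, Town
df <- (n_rows - 1) * (n_cols - 1)

x_squared <- 3.6742   # statistic reported by chisq.test above

# Upper-tail probability of the chi-square distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

# Critical value at the 0.05 significance level
critical <- qchisq(0.95, df = df)

round(p_value, 4)
round(critical, 3)
x_squared > critical   # TRUE would mean rejecting independence
```

The statistic falls well below the critical value of about 9.49, which agrees with the decision not to reject the null hypothesis.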