Violence Against Women Survey Analysis

Abstract:

I analyzed survey responses regarding violence against women; the data were downloaded from Kaggle (Kaggle, 2020). The survey answers were extracted from the Demographic and Health Surveys (DHS) conducted by the United States Agency for International Development (USAID, n.d.). These surveys are conducted periodically to gauge health issues such as breastfeeding practices, children’s health, and HIV infections.

The analysis of this survey shows that higher percentages of women than men believe that a husband is justified in beating his wife. Percentages are lower among the higher-educated group; however, even higher-educated women are more likely than higher-educated men to believe that a husband is justified in beating his wife.

This report also includes a correlation analysis between the survey responses of higher-educated women and each country’s freedom of religion index, GDP, and fertility rate. None of the three indicators appears to have a strong correlation with the responses, but fertility rate scored the highest, at 0.36, which means that countries with higher fertility rates tend to have a higher percentage of respondents who accept violence against women. This does not mean that fertility rate causes an increase in gender violence; it only indicates a moderate correlation between fertility rate and the acceptance of gender violence.

Problem

According to the United Nations (UN Women, n.d.), one in three women (30 per cent of women aged 15 and older) have been subjected to intimate partner violence, non-partner sexual violence, or both at least once in their lives. Furthermore, 137 women are killed by a member of their family every day. Gender violence not only harms women but also has negative effects on communities, including substance abuse, lost work days, and expenditures on medical, protection, judicial and social services. Children who witness domestic violence are at increased risk of anxiety, depression, low self-esteem and poor school performance. In Nicaragua, 63 percent of children of abused women had to repeat a school year, and they left school on average 4 years earlier than other children (UN Women, 2010).

For this report, I will study the gender violence survey responses, which include quantitative and qualitative variables. The survey has responses from males and females aged 15-49 of various marital, employment, education and residence statuses. Respondents were asked whether they agree that a husband is justified in beating his wife for at least one specific reason, and for each of the following reasons: if she burns the food, if she argues with him, if she goes out without telling him, if she neglects the children, and if she refuses to have sex with him. This report will also analyze factors such as freedom of religion, GDP, and fertility rate to look for factors that may be associated with gender violence.

Analysis

To find trends, I analyzed the two most distinct participant groups, respondents with no education and respondents with higher education, because I wanted to know how education affected participants’ responses. For this purpose I created boxplots comparing the two groups by gender. I chose to analyze the response “…for at least one specific reason” because it includes everyone who agrees that a husband is justified in beating his wife for any of the listed reasons.

Graph 1 (higher education) and Graph 2 (no education) both show that higher percentages of women than men believe that a husband is justified in beating his wife. Percentages are lower among the higher-educated group in Graph 1 than among the group with no education in Graph 2; however, even higher-educated women are more likely than higher-educated men to believe that a husband is justified in beating his wife. Taken together, the two graphs show that women’s greater acceptance of gender violence holds regardless of education level. In both graphs, the women’s interquartile range is larger than the men’s. These findings are surprising because women’s responses seem to go against their own interest.

This analysis also compared the same groups (higher education vs. no education) by country. Graph 3 shows that among respondents with higher education, Timor-Leste had the highest percentage of agreement among women, at 74.6%. Graph 4 shows that among women with no education, Mali had the highest percentage of agreement, at 82%.

Per the initial analysis, education alone does not seem to eliminate women’s acceptance of violence against women; therefore, the next phase of this analysis focuses only on female respondents with higher education. It attempts to test whether there are significant relationships between the acceptance of violence against women and GDP, freedom of religion, and fertility rate (total births per woman). These indicators were obtained from the World Bank and Gapminder and use data from 2010.

The correlation between the women’s survey responses and the freedom of religion index for all countries (Graph 5) was -0.362. An interesting outlier is Timor-Leste, which has a high freedom of religion index (0.88) but also a high acceptance of violence (74.6%). The correlation between the women’s survey responses and each country’s GDP (Graph 6) was the lowest, at 0.016. An interesting outlier in this analysis is India, which has a high GDP (1.68 trillion dollars) yet also a relatively high violence acceptance percentage (33.8%). Lastly, the correlation between the women’s survey responses and the fertility rate (number of births per woman, Graph 7) was the highest, at 0.364. In this analysis, Timor-Leste is again an outlier because it has a relatively lower fertility rate (4.8) but a high violence acceptance percentage (74.6%).

Based on these analyses, the freedom of religion index, GDP and fertility rate do not appear to have a strong correlation with the acceptance of gender violence, but among these factors fertility rate had the highest correlation, at 0.36, with the freedom of religion index close behind in magnitude (-0.36). Countries with higher fertility rates tend to have higher percentages of women who accept violence; however, this analysis cannot establish that fertility rate causes an increase in gender violence.

Conclusion

This analysis shows that no single factor explains gender violence. We cannot say that education, GDP, freedom of religion or fertility alone can solve gender violence. For example, Timor-Leste seems to have favorable indicators (a high freedom of religion index and a relatively lower fertility rate), yet its female respondents had a high acceptance of violence. Programs that aim to address gender violence should therefore consider all indicators rather than focusing on only one. Focusing on one is not only inefficient but also does not solve the gender violence problem.

References

Kaggle. (2020). Your Machine Learning and Data Science Community. https://www.kaggle.com/andrewmvd/violence-against-women-and-girls

USAID. (n.d.). The Demographic and Health Surveys. https://dhsprogram.com/Methodology/Survey-Types/DHS-Questionnaires.cfm

UN Women. (n.d.). Facts and Figures: Ending Violence Against Women. https://www.unwomen.org/en/what-we-do/ending-violence-against-women/facts-and-figures

UN Women. (2010). Virtual Knowledge Centre to End Violence Against Women and Girls. https://www.endvawnow.org/en/articles/301-consequences-and-costs-.html

library(tidyverse) # loads ggplot2, dplyr, tidyr, readr and the other core tidyverse packages

library(dplyr) # loads dplyr (already attached by tidyverse)
library(plotly) # makes interactive versions of the ggplot2 graphs with ggplotly()

library(htmlwidgets)

setwd("C:/Users/Dano/Documents/") # sets the working directory; change it to the folder that holds the CSV files
violence <- read_csv("Violencedata.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   RecordID = col_double(),
##   Country = col_character(),
##   Gender = col_character(),
##   `Demographics Question` = col_character(),
##   `Demographics Response` = col_character(),
##   Question = col_character(),
##   `Survey Year` = col_character(),
##   Value = col_double()
## )

religion <- read_csv("religion_index.csv")

## 
## -- Column specification --------------------------------------------------------
## cols(
##   Country = col_character(),
##   Religion_index = col_double()
## )

world_data <- read_csv('world_data.csv')

## 
## -- Column specification --------------------------------------------------------
## cols(
##   Country = col_character(),
##   `Country Code` = col_character(),
##   `Series Name` = col_character(),
##   `2010 [YR2010]` = col_character()
## )

# the violence, religion and world_data data sets are now loaded

violence1 <- na.omit(violence) # removes rows with missing values
names(violence1) <- gsub(" ", "_", names(violence1)) # replaces spaces in column names with underscores
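Because `na.omit()` drops a whole row whenever any column is missing, it can be worth checking which columns actually contain missing values before relying on it. The following is a minimal base R sketch; the helper names `missing_per_column()` and `rows_dropped_by_na_omit()` are hypothetical and not part of the original analysis.

```r
# Minimal base R sketch (hypothetical helpers, not part of the original analysis).
# na.omit() removes a whole row when ANY column is NA, so it helps to see
# which columns are responsible before dropping rows.

# Number of missing values in each column, keeping only columns that have some
missing_per_column <- function(df) {
  counts <- colSums(is.na(df))
  counts[counts > 0]
}

# Number of rows that na.omit() would remove from df
rows_dropped_by_na_omit <- function(df) {
  nrow(df) - nrow(na.omit(df))
}
```

For example, `missing_per_column(violence)` would list the affected columns, and `rows_dropped_by_na_omit(violence)` the number of rows that `na.omit()` removes.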

# Selects all female and male respondents with higher education or no education
# for the question "... for at least one specific reason". This is the main
# target group that I want to study and compare.
violence_onereason <- filter(violence1,
                             Demographics_Response %in% c("Higher", "No education"),
                             Question %in% c("... for at least one specific reason"))

# Keeps only the respondents with higher education (females and males) from the group above
violence_onereason1 <- filter(violence_onereason, Demographics_Response == "Higher")

# Boxplot of the higher-education group, split into females versus males
violence_onereason1 %>%
  ggplot() +
  geom_boxplot(aes(x = Gender, y = Value, group = Gender, fill = Gender)) +
  labs(title = "Graph 1. Percentage Of Respondents With Higher Education Who Agree That A Husband Is\nJustified In Hitting Or Beating His Wife For At Least One Specific Reason",
       x = "Gender", y = "Percentage") +
  scale_fill_brewer(palette = "Dark2") +
  scale_x_discrete(labels = c("Female", "Male")) + # relabels the Gender codes F and M
  theme_classic() +
  theme(legend.position = "none")

# Keeps only the respondents with no education (females and males), another target group
violence_onereason2 <- filter(violence_onereason, Demographics_Response == "No education")

# Boxplot of the no-education group, split into females versus males
violence_onereason2 %>%
  ggplot() +
  geom_boxplot(aes(x = Gender, y = Value, group = Gender, fill = Gender)) +
  labs(title = "Graph 2. Percentage Of Respondents With No Education Who Agree That A Husband Is\nJustified In Hitting Or Beating His Wife For At Least One Specific Reason",
       x = "Gender", y = "Percentage") +
  scale_fill_brewer(palette = "Dark2") +
  scale_x_discrete(labels = c("Female", "Male")) + # relabels the Gender codes F and M
  theme_classic() +
  theme(legend.position = "none")
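The boxplots summarize each group visually; a small table of the same statistics makes claims such as the larger interquartile range for women easier to check. This is a minimal base R sketch, assuming the cleaned column names `Gender`, `Demographics_Response` and `Value`; `summarise_boxplot_stats()` is a hypothetical helper.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# Assumes the cleaned columns Gender, Demographics_Response and Value.
# For each education level and gender it returns the number of rows and the
# median and interquartile range (IQR) of Value, which a boxplot draws as the
# middle line and the height of the box.
summarise_boxplot_stats <- function(df) {
  # One numeric vector per education-gender combination, named e.g. "Higher.F"
  groups <- split(df$Value, list(df$Demographics_Response, df$Gender), drop = TRUE)
  data.frame(
    group  = names(groups),
    n      = vapply(groups, length, integer(1)),
    median = vapply(groups, median, numeric(1)),
    iqr    = vapply(groups, IQR, numeric(1)),
    row.names = NULL
  )
}
```

For example, `summarise_boxplot_stats(violence_onereason)` would return one row per education level and gender.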

library(readr) 
library(ggplot2)
library(scales)

library(highcharter)

library(RColorBrewer)
# the packages above are needed to create the highcharter graphs

# I now want to compare female and male respondents with higher education across all countries

violence_onereasonhec <- filter(violence1, Demographics_Response %in% c("Higher"), Question %in% c("... for at least one specific reason" ))  
#this is to select all rows that have higher education which selects all female and male respondents

cols <- brewer.pal(4, "Set2") # sets the color palette
graph1 <- violence_onereasonhec[order(-violence_onereasonhec$Value), ] # sorts rows by percentage, highest first
graph1 %>%
  hchart("bar", hcaes(x = Country, y = Value, group = Gender), name = c("Female", "Male")) %>% # bar plot
  hc_title(text = "Graph 3. Respondents With Higher Education",
           style = list(fontWeight = "bold", fontSize = "14px"),
           align = "center") %>% # title
  hc_subtitle(text = "Percentage of Respondents Who Agree That A Husband Is Justified In Beating His Wife For At Least One Specific Reason",
              align = "center") %>% # subtitle
  hc_xAxis(title = list(text = "Countries")) %>%
  hc_yAxis(title = list(text = "Percentage of Respondents"), minorTickInterval = "auto") %>% # axis titles
  hc_size(height = 1300, width = 600) %>% # elongates the graph vertically so that all countries show up
  hc_add_theme(hc_theme_google()) %>% # Google theme
  hc_legend(align = "right", verticalAlign = "top") %>% # places the legend at the top right
  hc_colors(cols) %>% # applies the color palette
  hc_tooltip(pointFormat = "<b>{point.Gender}</b>: {point.Value}%") # tooltip shows gender and percentage

#I now want to compare all countries for all female and male respondents with no education

violence_onereasonne <- filter(violence1, Demographics_Response %in% c("No education"), Question %in% c("... for at least one specific reason" ))  # this is to select all rows that have no education which selects all females and males.

cols <- brewer.pal(4, "Set2") # sets the color palette
graph2 <- violence_onereasonne[order(-violence_onereasonne$Value), ] # sorts rows by percentage, highest first
graph2 %>%
  hchart("bar", hcaes(x = Country, y = Value, group = Gender), name = c("Female", "Male")) %>% # bar plot
  hc_title(text = "Graph 4. Respondents with No Education",
           style = list(fontWeight = "bold", fontSize = "14px"),
           align = "center") %>% # title
  hc_subtitle(text = "Percentage Of Respondents Who Agree That A Husband Is Justified In Beating His Wife For At Least One Specific Reason",
              align = "center") %>% # subtitle
  hc_xAxis(title = list(text = "Countries")) %>%
  hc_yAxis(title = list(text = "Percentage Of Respondents"), minorTickInterval = "auto") %>% # axis titles
  hc_size(height = 1300, width = 600) %>% # elongates the graph vertically so that all countries show up
  hc_add_theme(hc_theme_google()) %>% # Google theme
  hc_legend(align = "right", verticalAlign = "top") %>% # places the legend at the top right
  hc_colors(cols) # applies the color palette
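The country-level maxima quoted in the analysis can also be extracted directly instead of being read off the bar charts. This is a minimal base R sketch; `top_country_by_gender()` is a hypothetical helper that assumes the `Country`, `Gender` and `Value` columns used in the filters above.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# Assumes the Country, Gender and Value columns used in the filters above.
# Returns one row per gender with the country that has the highest Value;
# which.max() keeps the first row if there is a tie.
top_country_by_gender <- function(df) {
  per_gender <- lapply(split(df, df$Gender), function(g) {
    g[which.max(g$Value), c("Gender", "Country", "Value")]
  })
  do.call(rbind, per_gender)
}
```

Applied to `violence_onereasonhec` or `violence_onereasonne`, it would return one row per gender.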

violence_onerefem <- filter(violence1, Demographics_Response %in% c("Higher"), Gender %in% c("F"), Question %in% c("... for at least one specific reason")) # this is to select ALL females who have higher education

str(world_data)

## tibble [3,038 x 4] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Country      : chr [1:3038] "Afghanistan" "Afghanistan" "Afghanistan" "Afghanistan" ...
##  $ Country Code : chr [1:3038] "AFG" "AFG" "AFG" "AFG" ...
##  $ Series Name  : chr [1:3038] "Adolescent fertility rate (births per 1,000 women ages 15-19)" "Contraceptive prevalence, any methods (% of women ages 15-49)" "Fertility rate, total (births per woman)" "GDP (current US$)" ...
##  $ 2010 [YR2010]: chr [1:3038] "113.715" "21.8" "5.977" "15856574731" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   Country = col_character(),
##   ..   `Country Code` = col_character(),
##   ..   `Series Name` = col_character(),
##   ..   `2010 [YR2010]` = col_character()
##   .. )

# str() shows that every column, including the 2010 values, was read as text; the indicator columns are converted to numeric below

world_data1 <- world_data %>%
  pivot_wider(names_from = "Series Name", values_from= "2010 [YR2010]") 

world_data2 <- world_data1 %>%
  select("Country", "GDP (current US$)", "Fertility rate, total (births per woman)")

# converts the 2010 values from text to numbers; entries that are not numbers become NA
world_data2$`GDP (current US$)` <- as.numeric(world_data2$`GDP (current US$)`)

## Warning: NAs introduced by coercion

world_data2$`Fertility rate, total (births per woman)` <- as.numeric(world_data2$`Fertility rate, total (births per woman)`)

## Warning: NAs introduced by coercion
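The coercion warnings above mean that some 2010 entries were not numbers. A small sketch for listing the offending raw strings before they become `NA`; `non_numeric_values()` is a hypothetical helper.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# Lists the distinct text values that as.numeric() cannot parse, i.e. the
# entries behind the "NAs introduced by coercion" warning. Values that were
# already NA are not reported.
non_numeric_values <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  unique(x[is.na(parsed) & !is.na(x)])
}
```

Calling it on the still-textual GDP or fertility column of `world_data1` would show which values were turned into `NA`.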

str(world_data2)

## tibble [217 x 3] (S3: tbl_df/tbl/data.frame)
##  $ Country                                 : chr [1:217] "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ GDP (current US$)                       : num [1:217] 1.59e+10 1.19e+10 1.61e+11 5.76e+08 3.45e+09 ...
##  $ Fertility rate, total (births per woman): num [1:217] 5.98 1.66 2.86 NA 1.27 ...

data <- inner_join(violence_onerefem, religion, by = "Country")
data1 <- inner_join(data, world_data2, by = "Country")
names(data1)[names(data1) == "GDP (current US$)"] <- "GDP"
names(data1)[names(data1) == "Fertility rate, total (births per woman)"] <- "Fertility_rate"
names(data1)[names(data1) == "Value"] <- "Survey_violence"

str(data1)

## tibble [61 x 11] (S3: tbl_df/tbl/data.frame)
##  $ RecordID             : num [1:61] 351 352 353 354 355 356 357 358 359 360 ...
##  $ Country              : chr [1:61] "Afghanistan" "Albania" "Angola" "Armenia" ...
##  $ Gender               : chr [1:61] "F" "F" "F" "F" ...
##  $ Demographics_Question: chr [1:61] "Education" "Education" "Education" "Education" ...
##  $ Demographics_Response: chr [1:61] "Higher" "Higher" "Higher" "Higher" ...
##  $ Question             : chr [1:61] "... for at least one specific reason" "... for at least one specific reason" "... for at least one specific reason" "... for at least one specific reason" ...
##  $ Survey_Year          : chr [1:61] "1/1/2015" "1/1/2017" "1/1/2015" "1/1/2015" ...
##  $ Survey_violence      : num [1:61] 61.1 1.4 7 6.6 22.5 13.4 4.7 5.7 6.3 22.2 ...
##  $ Religion_index       : num [1:61] 0.497 0.908 0.704 0.653 0.484 0.695 0.868 0.889 0.837 0.759 ...
##  $ GDP                  : num [1:61] 1.59e+10 1.19e+10 8.38e+10 9.26e+09 5.29e+10 ...
##  $ Fertility_rate       : num [1:61] 5.98 1.66 6.19 1.72 1.92 ...
##  - attr(*, "na.action")= 'omit' Named int [1:1413] 1 16 82 85 88 91 94 157 160 163 ...
##   ..- attr(*, "names")= chr [1:1413] "1" "16" "82" "85" ...
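An inner join silently drops any country whose name is spelled differently in two files (for example, a hyphenated and an unhyphenated spelling of the same country). This is a minimal sketch for listing those names with base R `setdiff()`; `unmatched_countries()` is a hypothetical helper.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# Lists the values of the join column that appear in `left` but not in
# `right`; these are the rows an inner join on that column silently drops.
unmatched_countries <- function(left, right, by = "Country") {
  left_names  <- as.character(unique(left[[by]]))
  right_names <- as.character(unique(right[[by]]))
  sort(setdiff(left_names, right_names))
}
```

`unmatched_countries(violence_onerefem, religion)` and `unmatched_countries(violence_onerefem, world_data2)` would list the survey countries missing from each indicator file; dplyr's `anti_join()` returns the same information as full rows.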

totaldata <- data1 %>%
  mutate(GDP_tn = GDP / 1e12) # GDP in trillions of US dollars

correlationplot1 <- totaldata %>%
  ggplot(aes(Religion_index, Survey_violence, color = Country)) +
  geom_point(shape = 1, size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", size = 0.3, se = FALSE) + # linear trend line
  labs(x = "Freedom of Religion Index", y = "Survey's Violence Acceptance Percentage",
       title = "Graph 5. Correlation Between Freedom of Religion Index and Females' Violence Acceptance per Country") +
  theme_light() +
  theme(legend.position = "none", title = element_text(size = 7, face = "bold"))
fig1 <- ggplotly(correlationplot1) # interactive version of Graph 5
fig1

cor(totaldata$Religion_index,totaldata$Survey_violence)

## [1] -0.3620064
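Because Timor-Leste stands out in Graph 5, one way to gauge how much a single country drives the correlation is to recompute it with that country left out. This is a minimal leave-one-out sketch; `cor_without()` and `cor_leave_one_out()` are hypothetical helpers that assume a `Country` column.

```r
# Minimal base R sketch (hypothetical helpers, not part of the original analysis).
# Assumes a data frame with a Country column plus the two numeric columns
# named by x and y.

# Pearson correlation of x and y with every row for one country left out
cor_without <- function(df, x, y, country) {
  keep <- df$Country != country
  cor(df[[x]][keep], df[[y]][keep])
}

# Leave-one-out: change in r when each country is dropped in turn.
# Large negative or positive shifts point to influential countries.
cor_leave_one_out <- function(df, x, y) {
  r_all <- cor(df[[x]], df[[y]])
  countries <- unique(as.character(df$Country))
  shifts <- vapply(countries, function(cn) cor_without(df, x, y, cn) - r_all,
                   numeric(1))
  sort(shifts)
}
```

For example, `cor_without(totaldata, "Religion_index", "Survey_violence", "Timor-Leste")` would give the correlation without Timor-Leste; the exact spelling of the country name in the data is an assumption.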

correlationplot2 <- totaldata %>%
  ggplot(aes(GDP_tn, Survey_violence, color = Country)) +
  geom_point(shape = 1, size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", size = 0.3, se = FALSE) + # linear trend line
  labs(x = "GDP in Trillions", y = "Survey's Violence Acceptance Percentage",
       title = "Graph 6. Correlation Between GDP and Females' Violence Acceptance per Country") +
  theme_light() +
  theme(legend.position = "none", title = element_text(size = 7, face = "bold"))
fig2 <- ggplotly(correlationplot2) # interactive version of Graph 6
fig2

cor(totaldata$GDP_tn,totaldata$Survey_violence)

## [1] 0.01568478
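GDP spans several orders of magnitude, so a few large economies such as India can weigh heavily on Pearson’s r. As a robustness check that the original analysis does not use, Spearman’s rank correlation works on ranks instead of raw values. This is a minimal sketch; `compare_correlations()` is a hypothetical helper.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# Spearman's rank correlation is a swapped-in robustness check: it correlates
# the ranks of x and y, so a few extreme values (such as very large GDPs)
# weigh less than they do in Pearson's r.
compare_correlations <- function(x, y) {
  c(
    pearson  = cor(x, y, method = "pearson"),
    spearman = cor(x, y, method = "spearman")
  )
}
```

For example, `compare_correlations(totaldata$GDP_tn, totaldata$Survey_violence)` would return both coefficients side by side.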

correlationplot3 <- totaldata %>%
  ggplot(aes(Fertility_rate, Survey_violence, color = Country)) +
  geom_point(shape = 1, size = 2) +
  geom_smooth(method = "lm", formula = y ~ x, color = "black", size = 0.3, se = FALSE) + # linear trend line
  labs(x = "Fertility Rate", y = "Survey's Violence Acceptance Percentage",
       title = "Graph 7. Correlation Between Fertility Rate and Females' Violence Acceptance per Country") +
  theme_light() +
  theme(legend.position = "none", title = element_text(size = 7, face = "bold"))
fig3 <- ggplotly(correlationplot3) # interactive version of Graph 7
fig3

cor(totaldata$Fertility_rate,totaldata$Survey_violence)

## [1] 0.3636467
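The analysis sets out to test whether the relationships are significant, but `cor()` returns only the coefficient. Base R’s `cor.test()` also reports a p-value and a 95% confidence interval for Pearson’s r. This is a minimal sketch; `correlation_tests()` is a hypothetical helper.

```r
# Minimal base R sketch (hypothetical helper, not part of the original analysis).
# For each predictor column, runs cor.test() against the outcome column and
# collects Pearson's r, its p-value and its 95% confidence interval.
correlation_tests <- function(df, outcome, predictors) {
  rows <- lapply(predictors, function(p) {
    ct <- cor.test(df[[p]], df[[outcome]], method = "pearson")
    data.frame(
      predictor = p,
      r         = unname(ct$estimate),
      p_value   = ct$p.value,
      ci_low    = ct$conf.int[1],
      ci_high   = ct$conf.int[2]
    )
  })
  do.call(rbind, rows)
}
```

For example, `correlation_tests(totaldata, "Survey_violence", c("Religion_index", "GDP_tn", "Fertility_rate"))` would test the three indicators behind Graphs 5 to 7. A small p-value would suggest that a correlation is unlikely to be zero, but it would still not show that the indicator causes the acceptance of violence.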

Final Project 101