FiveThirtyEight asked more than 1,600 men whether they felt the #MeToo movement had changed their perception of masculinity. The study was an effort to understand how #MeToo affects how men feel about being men. It raised important questions about male identity: for example, participants were asked whether society puts unhealthy pressure on men. The study also investigated male perceptions of gender in the workplace, among many other topics. More information about the study can be found in this FiveThirtyEight article.
This demo will explore the dataset, ‘masculinity-survey’, associated with this study. We will start by accessing the data and loading it as an R dataframe as follows:
#Load the packages used in this demo: readr for read_csv(), dplyr for the pipe and data verbs, ggplot2 for plotting
library( readr )
library( dplyr )
library( ggplot2 )
#Access the data from FiveThirtyEight & use the dim() function to get a sense of the size of our dataframe
#the url to the raw data from FiveThirtyEight's GitHub repository
url <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/masculinity-survey/raw-responses.csv"
#use the read_csv() function to read the csv file into an R dataframe
masculinity_DF <- read_csv( url )
#use the dim() function to get a sense of the size of the df
dim( masculinity_DF )
## [1] 1615 98
This dataframe has 1615 rows, one for each of the 1615 participants in the survey conducted by FiveThirtyEight. Each participant's record holds 98 columns, so there are up to 98 features to analyze in this dataset.
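As a quick illustration of how to read the output of dim(), here is a minimal sketch on a toy dataframe. The name `toy_df` and its values are made up for this example and are not part of the survey data.

```r
# Toy dataframe standing in for the survey data (all values are made up)
toy_df <- data.frame(
  id  = 1:4,
  age = c("18 - 34", "35 - 64", "35 - 64", "65 and up"),
  ans = c("Yes", "No", "Yes", NA)
)

# dim() returns a vector: c(number of rows, number of columns)
toy_dim <- dim( toy_df )
print( toy_dim )

# nrow() and ncol() return the same two numbers individually
n_rows <- nrow( toy_df )
n_cols <- ncol( toy_df )
```

The first number counts records (participants, in our case) and the second counts columns.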
Let’s view the column names:
## [1] "X1" "StartDate" "EndDate" "q0001" "q0002"
## [6] "q0004_0001" "q0004_0002" "q0004_0003" "q0004_0004" "q0004_0005"
## [11] "q0004_0006" "q0005" "q0007_0001" "q0007_0002" "q0007_0003"
## [16] "q0007_0004" "q0007_0005" "q0007_0006" "q0007_0007" "q0007_0008"
## [21] "q0007_0009" "q0007_0010" "q0007_0011" "q0008_0001" "q0008_0002"
## [26] "q0008_0003" "q0008_0004" "q0008_0005" "q0008_0006" "q0008_0007"
## [31] "q0008_0008" "q0008_0009" "q0008_0010" "q0008_0011" "q0008_0012"
## [36] "q0009" "q0010_0001" "q0010_0002" "q0010_0003" "q0010_0004"
## [41] "q0010_0005" "q0010_0006" "q0010_0007" "q0010_0008" "q0011_0001"
## [46] "q0011_0002" "q0011_0003" "q0011_0004" "q0011_0005" "q0012_0001"
## [51] "q0012_0002" "q0012_0003" "q0012_0004" "q0012_0005" "q0012_0006"
## [56] "q0012_0007" "q0013" "q0014" "q0015" "q0017"
## [61] "q0018" "q0019_0001" "q0019_0002" "q0019_0003" "q0019_0004"
## [66] "q0019_0005" "q0019_0006" "q0019_0007" "q0020_0001" "q0020_0002"
## [71] "q0020_0003" "q0020_0004" "q0020_0005" "q0020_0006" "q0021_0001"
## [76] "q0021_0002" "q0021_0003" "q0021_0004" "q0022" "q0024"
## [81] "q0025_0001" "q0025_0002" "q0025_0003" "q0026" "q0028"
## [86] "q0029" "q0030" "q0034" "q0035" "q0036"
## [91] "race2" "racethn4" "educ3" "educ4" "age3"
## [96] "kids" "orientation" "weight"
We can see the column names in the output above. Aside from a few columns whose names suggest demographic information or record identifiers, most columns are labelled with opaque codes that refer to the corresponding survey questions the participants answered.
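To get a feel for how the question columns could be separated from the rest by name, here is a minimal sketch on a toy dataframe. The dataframe `toy_df` and its values are made up. The sketch also assumes that columns sharing a prefix such as `q0004_` belong to the same survey question, which is a guess from the naming pattern and should be checked against the survey itself.

```r
# Toy dataframe that mimics the naming pattern of the survey columns
# (the values are placeholders, not real responses)
toy_df <- data.frame(
  X1         = 1:2,
  q0001      = c("a", "b"),
  q0004_0001 = c("a", "b"),
  q0004_0002 = c("a", "b"),
  age3       = c("35 - 64", "65 and up")
)

all_cols <- colnames( toy_df )

# flag columns whose names start with 'q' followed by four digits
is_question   <- grepl( "^q[0-9]{4}", all_cols )
question_cols <- all_cols[ is_question ]
other_cols    <- all_cols[ !is_question ]

# drop a trailing '_0001'-style suffix to get one label per question
# (assumption: the suffix marks sub-items of a single question)
question_ids <- unique( sub( "_[0-9]{4}$", "", question_cols ) )
print( question_ids )
```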
To understand what the data represents, we will have to take a look at the actual FiveThirtyEight Masculinity Survey.
The questions are broad and cover a variety of topics, from perspectives on dating to opinions about professional working environments. However, for the sake of this exercise, we will create a subset of the dataframe that focuses on just a few of the questions about masculinity & #MeToo. The selection was based on personal interest; I think these particular questions would be interesting to look at as a function of age range. They are also, subjectively, broader in scope and have relatively simple categorical answers (e.g. yes or no). If you find another survey question more thought-provoking, please use this code to pursue your own analysis!
This next block of code will create a new dataframe that holds a subset of the masculinity_DF corresponding to the columns of interest. We will also rename the columns with more intuitive labels.
#create a dataframe that is a subset of masculinity_DF, holding the columns we are interested in for our analysis, and assign the columns new names
subsetMasc_DF <- masculinity_DF %>%
  select( Age = age3,
          How_Manly = q0001,
          Important = q0002,
          Unhealthy = q0005,
          MeToo_Aware = q0014,
          MeToo_Wake = q0015 )
#display the first several rows of the new dataframe 'subsetMasc_DF'
head( subsetMasc_DF )
## # A tibble: 6 x 6
## Age How_Manly Important Unhealthy MeToo_Aware MeToo_Wake
## <chr> <chr> <chr> <chr> <chr> <chr>
## 1 35 - 64 Somewhat masculi… Somewhat import… Yes <NA> <NA>
## 2 65 and up Somewhat masculi… Somewhat import… Yes <NA> <NA>
## 3 35 - 64 Very masculine Not too importa… No A lot No
## 4 65 and up Very masculine Not too importa… No <NA> <NA>
## 5 35 - 64 Very masculine Very important Yes A lot Yes
## 6 65 and up Very masculine Somewhat import… Yes Only a litt… No
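The select-and-rename step above can be tried out on a toy dataframe first. This is a minimal sketch, assuming the dplyr package is installed; `toy_df` and its values are made up for illustration.

```r
library( dplyr )

# Toy dataframe with survey-style column names (values are made up)
toy_df <- tibble(
  X1    = 1:3,
  q0001 = c("Very masculine", "Somewhat masculine", "Not very masculine"),
  q0005 = c("Yes", "No", "Yes"),
  age3  = c("18 - 34", "35 - 64", "65 and up")
)

# select() keeps only the listed columns, in the order given;
# writing new_name = old_name renames a column as it is selected
toy_subset <- toy_df %>%
  select( Age = age3, How_Manly = q0001, Unhealthy = q0005 )

print( colnames( toy_subset ) )
```

Columns that are not listed (here `X1`) are dropped, so the result has three columns.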
We have just selected a subset of data from a much larger dataset. This subset keeps the specific features (columns) of the data that we are interested in analyzing. In the next block of code, we will look at how men from different age groups perceive how ‘manly’ they feel.
#aggregate the data subset by the 'Age' and 'How_Manly' columns. calculate the frequency of each combination
freq_AgeManly <- subsetMasc_DF %>%
  group_by( Age, How_Manly ) %>%
  summarize( Freq = n() )
#now we need to find the frequency of the Age groups
freq_Age <- subsetMasc_DF %>%
  group_by( Age ) %>%
  summarize( Freq = n() )
#we would like to find the relative proportion of the frequencies of perceived manliness per age group, so we will populate a new column with the frequency values for the corresponding age group
index <- freq_Age[["Age"]]
values <- freq_Age[["Freq"]]
freq_AgeManly$AgeFreq <- values[match(freq_AgeManly$Age, index)]
#take a peek at the first few rows to see if things worked...
head( freq_AgeManly )
## # A tibble: 6 x 4
## # Groups: Age [2]
## Age How_Manly Freq AgeFreq
## <chr> <chr> <int> <int>
## 1 18 - 34 Not at all masculine 9 133
## 2 18 - 34 Not very masculine 17 133
## 3 18 - 34 Somewhat masculine 62 133
## 4 18 - 34 Very masculine 45 133
## 5 35 - 64 No answer 9 855
## 6 35 - 64 Not at all masculine 13 855
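The match() lookup used above may be easier to follow on a few hand-written values. This is a minimal sketch in base R; the counts in `values` and the entries in `ages` are made up.

```r
# lookup table: one count per age group (counts are made up)
index  <- c("18 - 34", "35 - 64", "65 and up")
values <- c(10L, 30L, 20L)

# the age group recorded on each row of some other table
ages <- c("35 - 64", "18 - 34", "35 - 64", "65 and up")

# match() returns, for each element of 'ages', its position in 'index'
positions <- match( ages, index )

# indexing 'values' by those positions copies the right count onto each row
looked_up <- values[ positions ]
print( looked_up )

# an age group that is missing from 'index' yields NA rather than an error
missing_pos <- match( "unknown", index )
```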
#Great, now to make a new column in our aggregation that holds the percentage of each age group that gave each answer. This will normalize the data so that we can make a more meaningful comparison between the age groups.
freq_AgeManly$Percent <- round( ( freq_AgeManly$Freq/freq_AgeManly$AgeFreq )*100 )
head( freq_AgeManly )
## # A tibble: 6 x 5
## # Groups: Age [2]
## Age How_Manly Freq AgeFreq Percent
## <chr> <chr> <int> <int> <dbl>
## 1 18 - 34 Not at all masculine 9 133 7
## 2 18 - 34 Not very masculine 17 133 13
## 3 18 - 34 Somewhat masculine 62 133 47
## 4 18 - 34 Very masculine 45 133 34
## 5 35 - 64 No answer 9 855 1
## 6 35 - 64 Not at all masculine 13 855 2
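For comparison, the same counts and percentages could also be computed in a single dplyr pipeline. This is an alternative sketch, not the method used above; it assumes the dplyr package is installed and runs on a made-up toy dataframe `toy_df`.

```r
library( dplyr )

# Toy responses: 4 men aged 18 - 34 and 5 men aged 35 - 64 (made up)
toy_df <- tibble(
  Age = c( rep("18 - 34", 4), rep("35 - 64", 5) ),
  How_Manly = c("Very masculine", "Very masculine",
                "Somewhat masculine", "Not very masculine",
                "Very masculine", "Very masculine", "Very masculine",
                "Somewhat masculine", "Somewhat masculine")
)

toy_percent <- toy_df %>%
  count( Age, How_Manly, name = "Freq" ) %>%          # frequency of each Age/answer pair
  group_by( Age ) %>%
  mutate( AgeFreq = sum( Freq ),                      # total responses in the age group
          Percent = round( 100 * Freq / AgeFreq ) ) %>%
  ungroup()

print( toy_percent )
```

Because the totals are computed inside each group, no separate lookup table is needed.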
#Now we will prepare the data to be plotted as a bar graph.
#This will implement an example graph that can be found at:
#https://www.r-graph-gallery.com/48-grouped-barplot-with-ggplot2.html
# create a dataset to plot
data2plot <- data.frame(HM = freq_AgeManly$How_Manly, A = freq_AgeManly$Age, P = freq_AgeManly$Percent)
#head( data2plot )
# Grouped
ggplot( data2plot, aes(fill=A, y=P, x=HM)) +
  geom_bar(position="dodge", stat="identity") +
  xlab("Perceived Manliness") + ylab("%") +
  ggtitle("Perceived Manliness by Age Group") +
  labs(fill = "Age Group")
The figure above plots the percentage of men in each age group who self-reported each level of how ‘Manly’ they feel. For all age groups, the majority of men reported feeling “Somewhat masculine” or “Very masculine”. Comparatively fewer men self-reported as feeling “Not very” or “Not at all” masculine. The distributions for the “35 - 64” and “65 and up” age groups were very similar. However, the “18 - 34” age group deviated from the other age groups. For example, the “18 - 34” group had comparatively higher percentages of men who identified as “Not at all” or “Not very” masculine. The trend reverses for the “Somewhat” and “Very” masculine responses.
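If you would rather compare the age groups as numbers than as bar heights, a row-normalized cross-tabulation is one option. This is a minimal sketch in base R on a made-up toy dataframe `toy_df`, not the survey data.

```r
# Toy responses (made up): 4 men aged 18 - 34 and 5 men aged 35 - 64
toy_df <- data.frame(
  Age = c( rep("18 - 34", 4), rep("35 - 64", 5) ),
  How_Manly = c("Very masculine", "Very masculine",
                "Somewhat masculine", "Not very masculine",
                "Very masculine", "Very masculine", "Very masculine",
                "Somewhat masculine", "Somewhat masculine")
)

# cross-tabulate the counts: one row per age group, one column per answer
counts <- table( toy_df$Age, toy_df$How_Manly )

# margin = 1 divides each row by its own total, so each row is a distribution
row_percent <- round( 100 * prop.table( counts, margin = 1 ) )
print( row_percent )
```

Each row describes one age group, so reading down a column compares the groups on a single answer.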
We just examined how perceived feelings of manliness vary across the age groups. There are several other questions we can ask of our data subset: