I took a great deal of time to choose a dataset for my visualisation that both meant a lot to me and could be visualised in a pleasing way. This is why the current data was chosen, as women have struggled throughout history to be seen as equal to men in employment and within the economy.
This visualisation shows the percentage of the female population aged 15 and over that was economically active through participation in the workforce. Figures prior to 1960 also include females aged 14. The data covers Germany, the UK and the USA from the 1920s up until 2016.
The data was taken from the website Our World in Data, under the subcategory ‘poverty and economic development’. The references for the original published data are as follows: Long, C. D. (1958). The labor force under changing income and employment. Princeton University Press. Heckman, J. and Killingsworth, M. (1986). Female labor supply: A survey. In O. Ashenfelter and R. Layard (Eds.), Handbook of Labor Economics, Volume I. OECD.Stat.
1- When was employment among women the highest?
2- What trends can be seen in the data?
3- Which country has the lowest/highest employment rate of women?
#load the needed libraries for project
library(here)
library(tidyverse)
library(dplyr)
library(ggplot2)
library(RColorBrewer)
library(plotly)
#importing of the data
df<- read_csv(here("data", "female data.csv"))
#display the data
head(df)
## # A tibble: 6 × 4
## country Code year women
## <chr> <chr> <dbl> <dbl>
## 1 Australia AUS 1966 36.3
## 2 Australia AUS 1967 37.2
## 3 Australia AUS 1968 37.7
## 4 Australia AUS 1969 38.1
## 5 Australia AUS 1970 39.6
## 6 Australia AUS 1971 40.0
#data preparation
df2 <- df[-c(1:396, 449:1307, 1347:1349),] #as the data set was so large, three countries were picked and the rows for all other countries were deleted by position.
df2 <- select(df2, country, year, women) #drops the 'Code' column as it was not needed.
arrange(df2, desc(women)) #sorts by 'women' from highest to lowest; the first 10 rows printed are the highest percentages of working women.
## # A tibble: 152 × 3
## country year women
## <chr> <dbl> <dbl>
## 1 United States 1999 60.0
## 2 United States 2000 59.9
## 3 United States 1998 59.8
## 4 United States 1997 59.8
## 5 United States 2001 59.8
## 6 United States 2002 59.6
## 7 United States 2003 59.5
## 8 United States 2008 59.5
## 9 United States 2006 59.4
## 10 United States 2007 59.3
## # … with 142 more rows
arrange(df2, women) #sorts by 'women' from lowest to highest; the first 10 rows printed are the lowest percentages of working women.
## # A tibble: 152 × 3
## country year women
## <chr> <dbl> <dbl>
## 1 United States 1920 23.3
## 2 United States 1930 24.3
## 3 United States 1940 25.4
## 4 United States 1950 28.6
## 5 United Kingdom 1921 32.3
## 6 United Kingdom 1931 34.2
## 7 United Kingdom 1951 34.7
## 8 United Kingdom 1961 37.4
## 9 United States 1963 38.0
## 10 Germany 1946 38
## # … with 142 more rows
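The row positions used in the data preparation step depend on the exact row order of the CSV file, so they would silently select the wrong rows if the file changed. A possible alternative, shown here only as a sketch, is to select the three countries by name with `filter()`. The small tibble `df_demo` is an illustrative stand-in built from a few rows printed above, not the full file, and the country codes other than AUS are assumed ISO codes rather than values taken from the data.

```r
library(dplyr)

# Illustrative subset only: a few rows copied from the outputs shown above.
# The codes "USA", "GBR" and "DEU" are assumptions, not read from the file.
df_demo <- tibble(
  country = c("Australia", "Australia", "United States", "United Kingdom", "Germany"),
  Code    = c("AUS", "AUS", "USA", "GBR", "DEU"),
  year    = c(1966, 1967, 1999, 1921, 1946),
  women   = c(36.3, 37.2, 60.0, 32.3, 38)
)

# The three countries used in this project
countries <- c("Germany", "United Kingdom", "United States")

df2_demo <- df_demo %>%
  filter(country %in% countries) %>%  # keep rows by country name, not by position
  select(country, year, women)        # drop the unused Code column

df2_demo
```

Applied to the full data frame `df`, the same two steps should give the same three countries as the index-based version, provided the country names in the file are spelled as above.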
Codebook
The column ‘country’ is the name of the country the data in that row refers to: Germany, the USA or the UK. The column ‘year’ is the year the data is from. Lastly, the column ‘women’ is the percentage of women who were economically active through participation in the workforce in that year and country.
Research question 3
Interestingly, this shows that the USA has both the lowest and the highest employment rates of women. The highest rates across the three countries were in the USA in the late 1990s and early 2000s. The USA also had the lowest percentages of women in the workforce, in the years 1920-1950. These are followed by the UK from 1921-1961 and Germany in 1946.
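The same answer can be pulled out directly instead of being read off the sorted tables. The sketch below assumes dplyr 1.0.0 or later (for `slice_max()` and `slice_min()`) and uses a small illustrative tibble, `df2_demo`, built from four rows printed above rather than the full `df2`.

```r
library(dplyr)

# Illustrative subset only: four rows copied from the sorted outputs above
df2_demo <- tibble(
  country = c("United States", "United States", "United Kingdom", "Germany"),
  year    = c(1999, 1920, 1921, 1946),
  women   = c(60.0, 23.3, 32.3, 38)
)

# Single highest and single lowest observation across all countries
highest <- df2_demo %>% slice_max(women, n = 1)
lowest  <- df2_demo %>% slice_min(women, n = 1)

# Lowest and highest value within each country
by_country <- df2_demo %>%
  group_by(country) %>%
  summarise(lowest = min(women), highest = max(women), .groups = "drop")

highest
lowest
by_country
```

With this subset, both the highest and the lowest rows belong to the United States, which matches the pattern described above.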
Attempt 1
#scatterplot
df2 %>%
ggplot (aes(
x = year, #assigns the column "year" to the x axis
y = women, #assigns the column "women" to the y axis
colour = country))+ #use country as colour as it will be easy to visualise on the graph.
geom_jitter(size = 1,
alpha = 0.5, #alpha makes the graph easier to read as the points are slightly transparent
height = 0,
width = .3)+ #jitters the points horizontally so overlapping points are easier to see; used on its own so each point is drawn only once
geom_smooth (formula = y ~ x,
method = "loess", #loess was used instead of lm as it is more visually appealing.
se = FALSE)+ #the standard error is not shown on the graph
scale_color_brewer(
palette ="Set3")+ #this colour palette is more appealing than the default colours.
theme_bw()+
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.margin = margin (1,1,1,1, "mm"))+ #this changes the background lines to make the graph look cleaner and easier to read.
labs( #assigns labels to the graph
title = "Female labour force participation rates, 1920-2016", #assigns a title
subtitle = "Proportion of female population active in the workforce from age 15",
x = "Year", #name for x axis
y = "Percentage of women in the workforce", #name for y axis
caption = "Prior to 1960 data includes females from age 14")
I was very happy with this graph. However, I wanted to push myself further, as the building blocks for this graph were taught in class. Therefore, I took to the internet to see what else I could do with this data.
Attempt 2
plot_ly(df2,
x = ~year, #assigns the column "year" to the x axis
y = ~women, #assigns the column "women" to the y axis
color = ~factor(country))%>% #this creates a more interactive graph using the package plotly
add_lines(
alpha = 0.8)%>% #add lines to the graph using alpha, which makes the lines slightly transparent.
layout( #add labels to the graph
title = "Female labour force participation rates, 1920-2016",
xaxis = list(
title = "Year", #label for x axis
range = c(1920, 2020) #range for the x axis
),
yaxis = list(
title = "Percentage of women in the workforce", #name for the y axis
range = c(20, 60)) #the lower and upper limits of the y axis.
)%>%
layout(
xaxis = list(
rangeslider = list(visible = TRUE)) #rangeslider makes the graph more interactive as it allows you to focus on specific time points.
)
I was very happy with how this graph turned out and I was proud that I was able to push my coding abilities. However, I still preferred the look of the first attempt. Therefore, I attempted to combine the two.
Attempt 3
p <- ggplot(df2,
aes(x = year, #assigns the column "year" to the x axis
y = women, #assigns the column "women" to the y axis
colour = country))+ #use country as colour as it will be easy to visualise on the graph.
geom_line(#adds a line to the graph
size = 1,
alpha = 0.8)+ #alpha makes the lines slightly transparent to make for easier reading.
scale_color_brewer(
palette ="RdYlBu")+ #this colour palette is more appealing than the default colours.
theme_bw()+ #adds a black and white theme
theme(
panel.grid.minor = element_blank(),
panel.grid.major.x = element_blank(),
plot.margin = margin (1,1,1,1, "mm"))+ #this changes the background lines to make the graph look cleaner and easier to read.
#assign graph labels
labs(title = "Female labour force participation rates, 1920-2016", #assigns a title
x = "Year", #name for x axis
y = "Percentage of women in the workforce")+ #name for y axis
theme(
title = element_text(face = "bold", color = "black"),
axis.title = element_text(face = "bold", color = "black")) #makes the title and axis titles bold so that they stand out more
ggplotly(p) #uses the code above to plot a more interactive graph
A scatterplot was picked to represent the data as it is a good way to show two numeric variables and one categorical variable.
#saving the figure
ggsave(here("figures", "my figure.pdf"), plot = p) #saves the ggplot object p as a PDF in the folder called figures.
The mean percentage of women in the workforce between 1920 and 2016 was approximately 50.0%. There is a general positive trend across all three countries, with growth starting at different points in time and proceeding at different rates. Despite the overall positive trend, Germany showed a dip at the beginning of World War 2, and growth in the USA slowed down considerably at the turn of the 21st century. Overall, the highest point of participation in the workforce in the USA was 1999, whereas in the UK and Germany it was 2016 (the most recent time point shown).
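The summary figures in this paragraph could be computed along the following lines. This is a sketch rather than the exact code used for the report: `df2_demo` is a small illustrative tibble built from six rows printed earlier, so its results differ from those of the full data. Note also that `mean()` here is an unweighted mean over all rows, so countries and periods with more observations contribute more to it.

```r
library(dplyr)

# Illustrative subset only: six rows copied from the sorted outputs shown earlier
df2_demo <- tibble(
  country = c("United States", "United States", "United States",
              "United Kingdom", "United Kingdom", "Germany"),
  year    = c(1920, 1999, 2000, 1921, 1961, 1946),
  women   = c(23.3, 60.0, 59.9, 32.3, 37.4, 38)
)

# Unweighted mean of the 'women' column across all rows
overall_mean <- mean(df2_demo$women)

# Year with the highest participation rate within each country
peak_years <- df2_demo %>%
  group_by(country) %>%
  slice_max(women, n = 1, with_ties = FALSE) %>%
  ungroup()

overall_mean
peak_years
```

Run on the full `df2`, `peak_years` would be expected to show the peak years reported above for each country.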
This research aimed to address how women have contributed to the workforce over the past 100 years. As attitudes to women within society have changed, participation in the workforce has increased. However, this chart also shows that change was not always positive. For example, US growth in participation slowed down considerably at the beginning of the 21st century.
One limitation to be considered is that only three countries were included in the analysis, in order to improve the visualisation. Additionally, the data points are unevenly spread, as there are more data points for recent years than for the 1920s. Future research may want to investigate further by examining the reasons behind these trends, for example the impact of World War 2 or employment laws. Additionally, future research might consider a wider range of countries. This could be made easier by grouping countries into continents. Alternatively, the most recent year could be visualised for every country to look at trends today.
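The uneven spread of data points mentioned above can be checked by counting observations per country and decade. The sketch below again uses a small illustrative tibble, `df2_demo`, built from rows printed earlier, so the counts are not those of the full data.

```r
library(dplyr)

# Illustrative subset only: six rows copied from the sorted outputs shown earlier
df2_demo <- tibble(
  country = c("United States", "United States", "United Kingdom",
              "United States", "United States", "United States"),
  year    = c(1920, 1930, 1921, 1997, 1998, 1999),
  women   = c(23.3, 24.3, 32.3, 59.8, 59.8, 60.0)
)

# Number of observations per country and decade
per_decade <- df2_demo %>%
  mutate(decade = floor(year / 10) * 10) %>%  # e.g. 1997 becomes 1990
  count(country, decade, name = "n_obs")

per_decade
```

In this subset the early decades have a single observation each, while the 1990s have several for the United States, which illustrates the kind of imbalance described above.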
The End