Steph Brunner and Gena Nelson, University of Minnesota
Data for this tutorial came from Table 3 in King, V. (1994). Nonresident father involvement and child well-being: Can dads make a difference? Journal of Family Issues 15(1), 78-96.
# Enter the following data set into an Excel sheet, and save as a .csv file.
percent | category | frequency |
---|---|---|
30.7 | children | Never |
32.6 | BPI | Never |
37.8 | rank | Never |
8.8 | children | Once in past year |
9.2 | BPI | Once in past year |
10.5 | rank | Once in past year |
12.6 | children | About 2-6 times in the past year |
12.8 | BPI | About 2-6 times in the past year |
16.8 | rank | About 2-6 times in the past year |
5.3 | children | About 7-11 times in the last year |
5.1 | BPI | About 7-11 times in the last year |
3.9 | rank | About 7-11 times in the last year |
15.4 | children | About 1-3 times a month |
15.5 | BPI | About 1-3 times a month |
12.2 | rank | About 1-3 times a month |
10.3 | children | About once a week |
10.8 | BPI | About once a week |
9.1 | rank | About once a week |
9.1 | children | About 2-5 times a week |
8.0 | BPI | About 2-5 times a week |
5.9 | rank | About 2-5 times a week |
7.8 | children | Almost every day |
6.0 | BPI | Almost every day |
3.7 | rank | Almost every day |
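As an alternative to typing the table into Excel, the same data can be built directly in R and written to a .csv file. The sketch below is one way to do that. It assumes the lowercase column names `percent`, `category`, and `frequency` that the plotting commands later in this tutorial refer to, and it uses one spelling per visitation level. The file name `visits.csv` and the use of a temporary directory are placeholders, not the authors' choices.

```r
# Frequency labels, ordered from least to most visitation (taken from the table above).
freq_labels <- c("Never", "Once in past year",
                 "About 2-6 times in the past year",
                 "About 7-11 times in the last year",
                 "About 1-3 times a month", "About once a week",
                 "About 2-5 times a week", "Almost every day")

# Each visitation level has three rows, in the order children, BPI, rank.
visits <- data.frame(
  percent = c(30.7, 32.6, 37.8,
              8.8, 9.2, 10.5,
              12.6, 12.8, 16.8,
              5.3, 5.1, 3.9,
              15.4, 15.5, 12.2,
              10.3, 10.8, 9.1,
              9.1, 8.0, 5.9,
              7.8, 6.0, 3.7),
  category = rep(c("children", "BPI", "rank"), times = 8),
  frequency = rep(freq_labels, each = 3)
)

# Write the data to a .csv file (placeholder location and name).
csv_path <- file.path(tempdir(), "visits.csv")
write.csv(visits, csv_path, row.names = FALSE)

# Percentages within each category should add up to about 100.
category_totals <- tapply(visits$percent, visits$category, sum)
print(category_totals)
```

Checking that each category's percentages sum to roughly 100 is a quick way to catch a mistyped value before plotting.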
# Import the data set into RStudio.
visits <- read.csv("~/Spring 2013/8252/Homeworks/REAL FINAL HOMEWORK 1 DATA.csv")
# Examine the data to ensure that it transferred into R appropriately.
head(visits)
# Further explore the structure of the data set (shows which variables are numeric, which are being treated as factors, etc.).
str(visits)
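Because the frequency labels are typed by hand, it can help to confirm that each label is spelled the same way in every row. The sketch below uses a small made-up data frame (`toy`, not part of the tutorial's data file) with a deliberate "past" versus "last" mismatch to show how `table()` exposes such a typo.

```r
# Toy data with a deliberate inconsistency ("past" vs. "last"), for illustration only.
toy <- data.frame(
  percent = c(12.6, 12.8, 16.8),
  category = c("children", "BPI", "rank"),
  frequency = c("About 2-6 times in the past year",
                "About 2-6 times in the past year",
                "About 2-6 times in the last year")
)

# Count how often each label occurs. Every visitation level should appear
# once per category, so a count other than 3 points to a typo.
label_counts <- table(toy$frequency)
print(label_counts)

# Names of the labels that do not occur exactly three times.
suspect <- names(label_counts)[label_counts != 3]
print(suspect)
```

Running the same `table()` call on `visits$frequency` should show eight labels with a count of three each when the data were entered consistently.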
# Load ggplot2 library.
library(ggplot2)
# Begin to make the bar plot – frequency of visitation across the x axis, percent of children along the y axis, with bars filled based on child category (all children, subsample of ages 4 and up, subsample of ages 10 and up).
ggplot(data = visits, aes(x = frequency, y = percent, fill = category)) + geom_bar(stat = "identity", position = position_dodge()) + xlab("Frequency of Father Visits") + ylab("Percent of Children")
# Rotate the text on the x axis so that the labels can be read. Add this layer, and the layers in the steps that follow, to the end of the ggplot() command above.
+ theme(axis.text.x = element_text(angle = 90))
# Flip the plot so that the x axis becomes the y axis – this will further increase readability of the variable names and also facilitate the visual comparison across visitation categories.
+ coord_flip()
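In R, a `+` at the start of a line does not continue the previous command, so each layer has to be joined to the plot with a trailing `+` or added to a stored plot object. The sketch below shows both patterns on a few rows taken from the table above; the object names `toy` and `p` are illustrative only.

```r
library(ggplot2)

# A few rows from the table above, for illustration only.
toy <- data.frame(
  frequency = c("Never", "Never", "Almost every day", "Almost every day"),
  category = c("children", "BPI", "children", "BPI"),
  percent = c(30.7, 32.6, 7.8, 6.0)
)

# A trailing + tells R that the command continues on the next line.
p <- ggplot(data = toy, aes(x = frequency, y = percent, fill = category)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  xlab("Frequency of Father Visits") +
  ylab("Percent of Children")

# Later steps can then be added to the stored plot one at a time.
p <- p + coord_flip()

# Draw the plot.
print(p)
```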
# Create an ordered factor of the frequency variable to put the frequency of visits categories in a logical order ranging from least to most visitation. The levels must match the labels in the data exactly; any label that does not match becomes NA.
visits$Freq2 <- factor(visits$frequency, levels = c("Never", "Once in past year", "About 2-6 times in the past year", "About 7-11 times in the last year", "About 1-3 times a month", "About once a week", "About 2-5 times a week", "Almost every day"), ordered = TRUE)
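Any label in the data that is missing from `levels` is silently converted to `NA` by `factor()`, and those rows then fall out of their intended bar. The sketch below, using made-up vectors `toy_freq` and `freq_levels`, shows one way to detect that situation.

```r
# Toy labels; the third contains a double space, so it does not match any level.
toy_freq <- c("Never", "About once a week", "About once  a week")
freq_levels <- c("Never", "About once a week")

f <- factor(toy_freq, levels = freq_levels, ordered = TRUE)

# Number of rows that did not match any level.
n_unmatched <- sum(is.na(f))
print(n_unmatched)

# The labels responsible for the mismatch.
unmatched_labels <- setdiff(unique(toy_freq), freq_levels)
print(unmatched_labels)
```

Applying the same check to `visits$Freq2` right after it is created should report zero unmatched rows before the plot is rebuilt.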
# Reconstruct the bar plot using this new factor variable. Note that the only difference is x = Freq2 within the aes argument.
ggplot(data = visits, aes(x = Freq2, y = percent, fill = category)) + geom_bar(stat = "identity", position = position_dodge()) + xlab("Frequency of Father Visits") + ylab("Percent of Children") + theme(axis.text.x = element_text(angle = 90)) + coord_flip()
# Add a title to the plot. As before, add this layer and the ones that follow to the end of the ggplot() command above.
+ labs(title = "Frequency of Father Visitation, 1988")
# Change the color scheme of the bars to use the “Paired” palette from colorbrewer2.org.
+ scale_fill_brewer(palette = "Paired")
# Employ the black-and-white theme to give the plot a white background – this will help viewers see the contrast between colors in palette.
+ theme_bw()
# Change the text size of the axis scales to size 10, the axis titles to size 15, and the plot title to size 20.
+ theme(axis.text = element_text(size = 10), axis.title = element_text(size = 15), plot.title = element_text(size = 20))
# Remove the legend title.
+ labs(fill = "")
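For reference, the steps above can be combined into a single command. The sketch below is one possible assembly, not necessarily the authors' exact final script: it rebuilds the data in R rather than reading the .csv file, places `theme_bw()` before the `theme()` adjustments (a complete theme replaces theme settings added before it), and leaves out the 90-degree rotation because the flipped plot already shows the category labels horizontally. The object name `final_plot` is illustrative.

```r
library(ggplot2)

# Visitation levels, ordered from least to most visitation.
freq_levels <- c("Never", "Once in past year",
                 "About 2-6 times in the past year",
                 "About 7-11 times in the last year",
                 "About 1-3 times a month", "About once a week",
                 "About 2-5 times a week", "Almost every day")

# Data from the table above: three rows (children, BPI, rank) per level.
visits <- data.frame(
  percent = c(30.7, 32.6, 37.8, 8.8, 9.2, 10.5, 12.6, 12.8, 16.8,
              5.3, 5.1, 3.9, 15.4, 15.5, 12.2, 10.3, 10.8, 9.1,
              9.1, 8.0, 5.9, 7.8, 6.0, 3.7),
  category = rep(c("children", "BPI", "rank"), times = 8),
  frequency = rep(freq_levels, each = 3)
)
visits$Freq2 <- factor(visits$frequency, levels = freq_levels, ordered = TRUE)

final_plot <- ggplot(data = visits, aes(x = Freq2, y = percent, fill = category)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  xlab("Frequency of Father Visits") +
  ylab("Percent of Children") +
  coord_flip() +
  labs(title = "Frequency of Father Visitation, 1988", fill = "") +
  scale_fill_brewer(palette = "Paired") +
  theme_bw() +
  theme(axis.text = element_text(size = 10),
        axis.title = element_text(size = 15),
        plot.title = element_text(size = 20))

# Draw the plot.
print(final_plot)
```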
Enjoy your plot!