The following tutorial explains how to obtain a specific data set, prepare the data so that it can be read into RStudio, and create a final plot displaying the information in the table.
The data set that will be used for this tutorial can be found in the following journal article:
Espelage, D. L., Aragon, S. R., & Birkett, M. J. (2008). Homophobic teasing, psychological outcomes, and sexual orientation among high school students: What influence do parents and school have?. School Psychology Review, 37(2), 202-216.
To make the data set available for this tutorial, it was exported with the dput() function. Running the following code recreates it in R as a data frame named lgbtq:
lgbtq = structure(list(Reported.Behavior = structure(c(3L, 6L, 2L, 1L, 8L, 7L,
4L, 5L), .Label = c("Alcohol-marijuana", "Depression/Suicidal Ideation",
"Homophobic Teasing", "Parent Communication", "Parent Support", "Peer Victimization",
"Racism", "School climate"), class = "factor"), Heterosexual.Mean = c(0.2,
0.45, 0.63, 0.8, 1.79, 0.61, 1.89, 3.31), Heterosexual.SD = c(0.66, 0.75,
0.67, 0.97, 0.49, 0.67, 0.95, 0.65), Questioning.Mean = c(0.84, 0.95, 1.07,
1.36, 1.63, 1.03, 1.79, 2.83), Questioning.SD = c(1.33, 1.18, 0.95, 1.51,
0.65, 0.82, 1.13, 0.93), LGB.Mean = c(0.57, 0.56, 0.77, 1, 1.72, 0.82, 1.84,
3.14), LGB.SD = c(1.13, 0.9, 0.82, 1.16, 0.56, 0.76, 1.07, 0.8), ANOVA.F.Value = c(375.94,
166.54, 176.54, 138.82, 49.13, 193.31, 5.63, 231.73), ANOVA.Effect.Size = c(0.05,
0.03, 0.03, 0.02, 0.01, 0.03, 0, 0.03)), .Names = c("Reported.Behavior",
"Heterosexual.Mean", "Heterosexual.SD", "Questioning.Mean", "Questioning.SD",
"LGB.Mean", "LGB.SD", "ANOVA.F.Value", "ANOVA.Effect.Size"), class = "data.frame",
row.names = c(NA, -8L))
If you would like practice creating an Excel file from the original table, the data are listed below in a simple format, one column at a time. Remember to save the file as a CSV (comma-separated values) file before importing the data set into RStudio.
Column 1 = Reported Behavior [Homophobic Teasing, Peer Victimization, Depression/Suicidal Ideation, Alcohol-Marijuana, School Climate, Racism, Parent Communication, Parent Support]
Column 2 = Heterosexual Mean [.2, .45, .63, .8, 1.79, .61, 1.89, 3.31]
Column 3 = Heterosexual Standard Deviation [.66, .75, .67, .97, .49, .67, .95, .65]
Column 4 = Questioning Mean [.84, .95, 1.07, 1.36, 1.63, 1.03, 1.79, 2.83]
Column 5 = Questioning Standard Deviation [1.33, 1.18, .95, 1.51, .65, .82, 1.13, .93]
Column 6 = LGB Mean [.57, .56, .77, 1, 1.72, .82, 1.84, 3.14]
Column 7 = LGB Standard Deviation [1.13, .9, .82, 1.16, .56, .76, 1.07, .8]
Column 8 = ANOVA F Value [375.94, 166.54, 176.54, 138.82, 49.13, 193.31, 5.63, 231.73]
Column 9 = ANOVA Effect Size [.05, .03, .03, .02, .01, .03, 0, .03]
The data set consists of eight observations of nine variables (i.e., 8 rows and 9 columns).
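You can also skip the spreadsheet and type the column lists above directly into R. The sketch below builds the same table with data.frame(). The name lgbtq_manual is hypothetical and was chosen so that it does not overwrite the lgbtq object created with dput(). dim() confirms the 8 rows and 9 columns.

```r
# Build the table directly from the column lists (no spreadsheet needed).
lgbtq_manual <- data.frame(
  Reported.Behavior = factor(c("Homophobic Teasing", "Peer Victimization",
                               "Depression/Suicidal Ideation", "Alcohol-marijuana",
                               "School climate", "Racism",
                               "Parent Communication", "Parent Support")),
  Heterosexual.Mean = c(0.2, 0.45, 0.63, 0.8, 1.79, 0.61, 1.89, 3.31),
  Heterosexual.SD   = c(0.66, 0.75, 0.67, 0.97, 0.49, 0.67, 0.95, 0.65),
  Questioning.Mean  = c(0.84, 0.95, 1.07, 1.36, 1.63, 1.03, 1.79, 2.83),
  Questioning.SD    = c(1.33, 1.18, 0.95, 1.51, 0.65, 0.82, 1.13, 0.93),
  LGB.Mean          = c(0.57, 0.56, 0.77, 1, 1.72, 0.82, 1.84, 3.14),
  LGB.SD            = c(1.13, 0.9, 0.82, 1.16, 0.56, 0.76, 1.07, 0.8),
  ANOVA.F.Value     = c(375.94, 166.54, 176.54, 138.82, 49.13, 193.31, 5.63, 231.73),
  ANOVA.Effect.Size = c(0.05, 0.03, 0.03, 0.02, 0.01, 0.03, 0, 0.03)
)

dim(lgbtq_manual)  # 8 rows, 9 columns
```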
If you created an Excel file from the data above, you can import it by clicking “Import Dataset” in the Workspace panel on the right-hand side of RStudio. You will be offered two options. Select “From Text File” and locate the .csv file you just created. Before importing, RStudio asks you to describe the file. Indicate that it has a header row, that values are separated by commas, and that decimals are marked with a period. Name the data set “lgbtq”, then import it.
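The same import can be done in code with read.csv(). Its header, sep, and dec arguments correspond to the three questions RStudio asks. So that the sketch runs on its own, it first writes a two-row excerpt to a temporary file. For your own data, pass the path of the CSV you saved instead. The file name lgbtq.csv in the final comment is only an example.

```r
# A temporary file stands in for the CSV you saved from your spreadsheet.
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(Reported.Behavior = c("Homophobic Teasing", "Peer Victimization"),
                     Heterosexual.Mean = c(0.20, 0.45)),
          csv_path, row.names = FALSE)

# header = TRUE: the first row holds column names.
# sep = ",": values are separated by commas.
# dec = ".": decimals are marked with a period.
imported <- read.csv(csv_path, header = TRUE, sep = ",", dec = ".")
str(imported)

# For your own file, something like: lgbtq <- read.csv("lgbtq.csv")
```

By default, read.csv() also converts spaces in column headers to periods (check.names = TRUE). A spreadsheet column titled “Reported Behavior” therefore becomes Reported.Behavior, which matches the names used in the dput() version.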
Before getting started, several packages need to be installed and loaded. Use the following R commands:
install.packages("ggplot2")
library(ggplot2)
install.packages("reshape2")
library(reshape2)
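install.packages() downloads and reinstalls a package every time it runs, so scripts often install a package only when it is missing. The sketch below does this with requireNamespace(). The repos URL (the CRAN cloud mirror) is an assumption that matters only when R runs non-interactively; you can omit it in an interactive RStudio session. If installation fails because the default library is not writable, install.packages() also accepts a lib argument pointing to a folder you can write to.

```r
# Install each package only if it cannot already be loaded, then attach it.
for (pkg in c("ggplot2", "reshape2")) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  library(pkg, character.only = TRUE)
}
```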
Now that ggplot2 and reshape2 are installed and loaded, you can begin exploring the data.
Following the purpose of the study by Espelage et al. (2008), this tutorial constructs a graph comparing the mean values of heterosexual, questioning, and LGB (lesbian, gay, or bisexual) students, organized by reported behavior. The original results table also reports standard deviations and ANOVA results, but this tutorial focuses on the mean values.
To view the first six observations of the data set, run the following command:
head(lgbtq)
## Reported.Behavior Heterosexual.Mean Heterosexual.SD
## 1 Homophobic Teasing 0.20 0.66
## 2 Peer Victimization 0.45 0.75
## 3 Depression/Suicidal Ideation 0.63 0.67
## 4 Alcohol-marijuana 0.80 0.97
## 5 School climate 1.79 0.49
## 6 Racism 0.61 0.67
## Questioning.Mean Questioning.SD LGB.Mean LGB.SD ANOVA.F.Value
## 1 0.84 1.33 0.57 1.13 375.94
## 2 0.95 1.18 0.56 0.90 166.54
## 3 1.07 0.95 0.77 0.82 176.54
## 4 1.36 1.51 1.00 1.16 138.82
## 5 1.63 0.65 1.72 0.56 49.13
## 6 1.03 0.82 0.82 0.76 193.31
## ANOVA.Effect.Size
## 1 0.05
## 2 0.03
## 3 0.03
## 4 0.02
## 5 0.01
## 6 0.03
The ggplot() function builds a plot incrementally. You start with the data and the aesthetic mappings, then add layers with the + operator. Layers are drawn in the order they are added, so order matters.
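To see why layer order matters, the sketch below draws the same three-point data set twice. The first plot draws bars first and puts the points on top. The second reverses the order, so the bars are drawn over the points and partly hide them. The data frame toy is made up for illustration, and geom_col() requires ggplot2 2.2.0 or later.

```r
library(ggplot2)

# Hypothetical toy data: one value per group.
toy <- data.frame(group = c("a", "b", "c"), value = c(1, 3, 2))

# Bars are added first, so the points are drawn on top of them.
bars_then_points <- ggplot(toy, aes(x = group, y = value)) +
  geom_col(fill = "grey70") +
  geom_point(size = 4, color = "red")

# Reversed order: the bars are drawn last and partly cover the points.
points_then_bars <- ggplot(toy, aes(x = group, y = value)) +
  geom_point(size = 4, color = "red") +
  geom_col(fill = "grey70")

# The layers list stores the layers in drawing order.
geoms <- sapply(bars_then_points$layers, function(l) class(l$geom)[1])
geoms

print(bars_then_points)
print(points_then_bars)
```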
The most basic use of ggplot can be summarized in the following R code:
basicggplot = ggplot(data = lgbtq, aes(x = Reported.Behavior, y = Heterosexual.Mean,
color = Reported.Behavior)) + geom_point()
In the code above, lgbtq is the data set of interest. Reported.Behavior and Heterosexual.Mean are column names, mapped to the x and y axes. geom_point() specifies that the data should be drawn as points, producing a scatterplot. Because the plot was assigned to basicggplot, it is displayed only when you print it with the following command (note: without further editing, the x-axis labels are difficult to read):
plot(basicggplot)
Notice that Reported Behavior is shown on the x-axis and Heterosexual Mean on the y-axis. Each reported behavior is drawn in a distinct color, as specified in the aesthetic mapping (color = Reported.Behavior).
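One way to fix the crowded x-axis labels is to rotate them with theme(). The sketch below rebuilds basicggplot from the two columns it uses and tilts the labels 45 degrees. The angle, and the choice to hide the color legend (which only repeats the axis labels), are illustrative and not part of the original tutorial.

```r
library(ggplot2)

# The two lgbtq columns used by basicggplot, rebuilt so the example is self-contained.
behaviors <- data.frame(
  Reported.Behavior = factor(c("Homophobic Teasing", "Peer Victimization",
                               "Depression/Suicidal Ideation", "Alcohol-marijuana",
                               "School climate", "Racism",
                               "Parent Communication", "Parent Support")),
  Heterosexual.Mean = c(0.2, 0.45, 0.63, 0.8, 1.79, 0.61, 1.89, 3.31)
)

basicggplot <- ggplot(data = behaviors, aes(x = Reported.Behavior, y = Heterosexual.Mean,
                                            color = Reported.Behavior)) +
  geom_point() +
  # Tilt the labels 45 degrees; hjust = 1 right-aligns each label at its tick mark.
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        # The color legend only repeats the x-axis labels, so hide it.
        legend.position = "none")

plot(basicggplot)
```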
At first glance, it may seem logical to draw a separate bar chart for each behavior with facet_wrap(), as in the following code. By default, geom_bar() counts the cases in each group. Because we want the bar heights to show the values in Heterosexual.Mean, we set stat = "identity". Without it, older versions of ggplot2 printed a deprecation warning once per panel, and recent versions stop with an error.
ggplot(data = lgbtq, aes(x = Reported.Behavior, y = Heterosexual.Mean)) +
geom_bar(stat = "identity", color = "red", fill = "red") + facet_wrap(~Reported.Behavior)
However, we want to compare the mean values for each of the eight behaviors across heterosexual, questioning, and LGB students. The plots above show only the means for heterosexual students.
The data set is currently in wide format: each group's mean and standard deviation occupy separate columns. To get the desired graph, we use the melt() function from the reshape2 package to convert the data from wide format to long format:
mdat = melt(lgbtq[, 1:7])
## Using Reported.Behavior as id variables
In this code, lgbtq[, 1:7] selects columns 1 through 7 of the data set, which drops the ANOVA F value and effect size columns (columns 8 and 9). Because no ID variable is specified, melt() uses the factor column Reported.Behavior as the ID variable. It then “stacks” the remaining six numeric columns (the means and standard deviations) into two columns: variable, which holds the original column name, and value, which holds the number. We have named this new data set mdat.
After running this function, the console should display the message “Using Reported.Behavior as id variables”.
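To check what melt() produced, you can inspect the size and first rows of the result. So that it runs on its own, the sketch below rebuilds columns 1 to 7 of lgbtq as a data frame named wide (a hypothetical name). Eight behaviors times six measured columns should give 48 rows in three columns.

```r
library(reshape2)

# Columns 1-7 of lgbtq (everything except the two ANOVA columns).
wide <- data.frame(
  Reported.Behavior = factor(c("Homophobic Teasing", "Peer Victimization",
                               "Depression/Suicidal Ideation", "Alcohol-marijuana",
                               "School climate", "Racism",
                               "Parent Communication", "Parent Support")),
  Heterosexual.Mean = c(0.2, 0.45, 0.63, 0.8, 1.79, 0.61, 1.89, 3.31),
  Heterosexual.SD   = c(0.66, 0.75, 0.67, 0.97, 0.49, 0.67, 0.95, 0.65),
  Questioning.Mean  = c(0.84, 0.95, 1.07, 1.36, 1.63, 1.03, 1.79, 2.83),
  Questioning.SD    = c(1.33, 1.18, 0.95, 1.51, 0.65, 0.82, 1.13, 0.93),
  LGB.Mean          = c(0.57, 0.56, 0.77, 1, 1.72, 0.82, 1.84, 3.14),
  LGB.SD            = c(1.13, 0.9, 0.82, 1.16, 0.56, 0.76, 1.07, 0.8)
)

mdat <- melt(wide)  # message: Using Reported.Behavior as id variables
dim(mdat)           # 48 rows, 3 columns
head(mdat)          # columns: Reported.Behavior, variable, value
```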
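Next, colsplit() splits each entry of the variable column at the period into two new columns: type (the group, such as Heterosexual) and val (the statistic, Mean or SD). The pattern "\\." is a regular expression matching a literal period. Converting variable to character first is a precaution, because melt() stores it as a factor.
mdat <- data.frame(mdat, colsplit(as.character(mdat$variable), "\\.", c("type", "val")))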
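The melt() and dcast() functions work as a pair: melt() converts wide data to long format, and dcast() casts long data back out to a wider layout. The following command produces one row for each combination of Reported.Behavior and type (group), with the values of val (Mean and SD) as separate columns. This is the layout needed to draw one bar per group: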
cdat <- dcast(mdat, Reported.Behavior + type ~ val)
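You can check the whole reshaping pipeline end to end. This sketch repeats the melt(), colsplit(), and dcast() steps on a rebuilt copy of columns 1 to 7 (the name wide is hypothetical). It passes id.vars explicitly so that melt() does not need to guess. The result should have 24 rows (8 behaviors times 3 groups) and the columns Reported.Behavior, type, Mean, and SD.

```r
library(reshape2)

wide <- data.frame(
  Reported.Behavior = factor(c("Homophobic Teasing", "Peer Victimization",
                               "Depression/Suicidal Ideation", "Alcohol-marijuana",
                               "School climate", "Racism",
                               "Parent Communication", "Parent Support")),
  Heterosexual.Mean = c(0.2, 0.45, 0.63, 0.8, 1.79, 0.61, 1.89, 3.31),
  Heterosexual.SD   = c(0.66, 0.75, 0.67, 0.97, 0.49, 0.67, 0.95, 0.65),
  Questioning.Mean  = c(0.84, 0.95, 1.07, 1.36, 1.63, 1.03, 1.79, 2.83),
  Questioning.SD    = c(1.33, 1.18, 0.95, 1.51, 0.65, 0.82, 1.13, 0.93),
  LGB.Mean          = c(0.57, 0.56, 0.77, 1, 1.72, 0.82, 1.84, 3.14),
  LGB.SD            = c(1.13, 0.9, 0.82, 1.16, 0.56, 0.76, 1.07, 0.8)
)

# id.vars is given explicitly, so melt() does not print a guessing message.
mdat <- melt(wide, id.vars = "Reported.Behavior")

# Split names such as "Heterosexual.Mean" at the literal period into type and val.
mdat <- data.frame(mdat, colsplit(as.character(mdat$variable), "\\.", c("type", "val")))

# One row per behavior and group; Mean and SD become separate columns.
cdat <- dcast(mdat, Reported.Behavior + type ~ val)

dim(cdat)    # 24 rows, 4 columns
names(cdat)  # "Reported.Behavior" "type" "Mean" "SD"
```

Each row of cdat now describes one bar: a behavior, a group, and that group's mean (plus its standard deviation).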
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip()
The coord_flip() function swaps the x and y axes, so the long behavior names are listed along the vertical axis, where they are easier to read. Setting position = "dodge" in geom_bar() places the three bars for each behavior side by side. Without it, geom_bar() stacks them on top of one another. stat = "identity" tells geom_bar() to use the Mean values as bar heights rather than counting cases.
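In ggplot2 2.2.0 and later, geom_col() is shorthand for geom_bar(stat = "identity"). The sketch below draws the same dodged, flipped bars both ways so that you can compare them. It uses a two-behavior excerpt of cdat whose values are copied from the table; the name cdat_excerpt is hypothetical.

```r
library(ggplot2)

# Hypothetical two-behavior excerpt of cdat (values copied from the table).
cdat_excerpt <- data.frame(
  Reported.Behavior = rep(c("Homophobic Teasing", "Racism"), each = 3),
  type = rep(c("Heterosexual", "LGB", "Questioning"), times = 2),
  Mean = c(0.20, 0.57, 0.84, 0.61, 0.82, 1.03)
)

# geom_col() always uses the y values as bar heights ...
with_col <- ggplot(cdat_excerpt, aes(x = Reported.Behavior, y = Mean, fill = type)) +
  geom_col(position = "dodge") + coord_flip()

# ... which is what geom_bar() does when stat = "identity" is set.
with_bar <- ggplot(cdat_excerpt, aes(x = Reported.Behavior, y = Mean, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") + coord_flip()

print(with_col)
```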
It also helps to give the axes simpler titles than the raw column names. The xlab() function sets the x-axis title:
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + xlab("Reported Behavior")
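After the previous command, the axis listing the behaviors should be titled “Reported Behavior”. Because of coord_flip(), this x axis is drawn vertically.
Similarly, ylab() sets the y-axis title. After coord_flip(), the y axis runs horizontally: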
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + xlab("Reported Behavior") + ylab("Mean Values")
We can also customize the colors with a ColorBrewer palette. ColorBrewer is a collection of color schemes chosen to work well in maps and graphs, and scale_fill_brewer() applies its palettes to the bars. To view the palettes, go to the following link:
For this tutorial, we have selected the palette named “Dark2”:
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + xlab("Reported Behavior") + ylab("Mean Values") +
scale_fill_brewer(palette = "Dark2")
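To preview the palettes without leaving R, the RColorBrewer package can draw them and list their hex codes. This sketch assumes RColorBrewer is installed; install it if it is not already present. With three groups, scale_fill_brewer(palette = "Dark2") uses the first three Dark2 colors.

```r
library(RColorBrewer)

# Draw every qualitative palette (Dark2 is one of them).
display.brewer.all(type = "qual")

# Hex codes of the first three Dark2 colors, one per group in this tutorial.
dark2 <- brewer.pal(3, "Dark2")
dark2  # "#1B9E77" "#D95F02" "#7570B3"
```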
The ggtitle() function adds a title to the plot:
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + scale_fill_brewer(palette = "Dark2") +
xlab("Reported Behavior") + ylab("Mean Value") + ggtitle("A Comparison of Behaviors by Sexuality")
To add a title to the legend, use the following code:
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + scale_fill_brewer(palette = "Dark2",
name = "Sexual Orientation") + xlab("Reported Behavior") + ylab("Mean Value") +
ggtitle("A Comparison of Behaviors by Sexuality")
The code above adds name = "Sexual Orientation" to scale_fill_brewer(). The name argument sets the legend title to whatever text you choose.
We can also change the appearance of the legend title with theme(), as in the following command:
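As an alternative to separate xlab(), ylab(), and ggtitle() calls and the name argument, labs() can set all of these titles in one place. The legend title is set through the aesthetic it describes, here fill. The sketch uses a hypothetical two-behavior excerpt of cdat so that it runs on its own.

```r
library(ggplot2)

# Hypothetical two-behavior excerpt of cdat (values copied from the table).
cdat_excerpt <- data.frame(
  Reported.Behavior = rep(c("Homophobic Teasing", "Racism"), each = 3),
  type = rep(c("Heterosexual", "LGB", "Questioning"), times = 2),
  Mean = c(0.20, 0.57, 0.84, 0.61, 0.82, 1.03)
)

labelled <- ggplot(cdat_excerpt, aes(x = Reported.Behavior, y = Mean, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") + coord_flip() +
  scale_fill_brewer(palette = "Dark2") +
  # x and y refer to the aesthetics, so after coord_flip() "x" labels the vertical axis.
  labs(x = "Reported Behavior", y = "Mean Value",
       title = "A Comparison of Behaviors by Sexuality",
       fill = "Sexual Orientation")  # legend title for the fill aesthetic

print(labelled)
```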
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + scale_fill_brewer(palette = "Dark2",
name = "Sexual Orientation") + xlab("Reported Behavior") + ylab("Mean Value") +
ggtitle("A Comparison of Behaviors by Sexuality") + theme(legend.title = element_text(colour = "black",
size = 12, face = "bold"))
The code above makes the legend title bold and sets it in 12-point font. Both settings can be adjusted as needed, and the colour argument likewise sets the color of the legend title.
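The legend labels (the group names) are styled the same way, through legend.text. The sketch below again uses a hypothetical two-behavior excerpt of cdat. It keeps the bold 12-point title and sets the labels in smaller gray italics; the specific sizes and colors are arbitrary choices.

```r
library(ggplot2)

# Hypothetical two-behavior excerpt of cdat (values copied from the table).
cdat_excerpt <- data.frame(
  Reported.Behavior = rep(c("Homophobic Teasing", "Racism"), each = 3),
  type = rep(c("Heterosexual", "LGB", "Questioning"), times = 2),
  Mean = c(0.20, 0.57, 0.84, 0.61, 0.82, 1.03)
)

styled <- ggplot(cdat_excerpt, aes(x = Reported.Behavior, y = Mean, fill = type)) +
  geom_bar(stat = "identity", position = "dodge") + coord_flip() +
  scale_fill_brewer(palette = "Dark2", name = "Sexual Orientation") +
  theme(legend.title = element_text(colour = "black", size = 12, face = "bold"),
        # legend.text controls the group labels next to each color key.
        legend.text = element_text(colour = "gray30", size = 10, face = "italic"))

print(styled)
```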
To add a box around the legend, add a second theme() call to the previous command. Setting colour explicitly in element_rect() ensures the dotted border is actually drawn:
ggplot(cdat, aes(x = Reported.Behavior, y = Mean, fill = type)) + geom_bar(stat = "identity",
position = "dodge") + coord_flip() + scale_fill_brewer(palette = "Dark2",
name = "Sexual Orientation") + xlab("Reported Behavior") + ylab("Mean Value") +
ggtitle("A Comparison of Behaviors by Sexuality") + theme(legend.title = element_text(colour = "black",
size = 12, face = "bold")) + theme(legend.background = element_rect(fill = "gray90",
colour = "black", linewidth = 0.5, linetype = "dotted"))
# linewidth was introduced in ggplot2 3.4.0; in older versions, use size = 0.5 instead.
The command above has placed a box around the legend and shaded the box gray. These are only a few examples of the ways in which ggplot2 allows for customization of plots!
You have now transformed a table of results into a graph that makes it easy to compare the reported behaviors of heterosexual, questioning, and LGB students.