library(tidyverse)
library(readxl)I saved the Excel file to a folder in my project called data. I removed the second row of the Excel file, since it contained no useful information, and I renamed the file to vets.xlsx.
Apparently we are interested in the first sheet. The R-function read_xlsx() reads in the first sheet by default.
We only want to read in the first six columns of that first sheet, ignoring all of the analysis stuck into subsequent columns, so we use the range parameter of the function, as follows:
vets <- read_xlsx(
"data/vets.xlsx",
range = "A1:F912"
)Here is a quick bar graph of the ranks of the subjects:
ggplot(vets, aes(x = Rank)) +
geom_bar(color = "black", fill = "skyblue")The ranks, without the NA-values:
vets %>%
drop_na(Rank) %>%
ggplot(aes(x = Rank)) +
geom_bar(color = "black", fill = "skyblue")Suppose you want the Warrant officers to come in between the enlisted and commissioned officers. In that case, you need to order your ranks. First, find out what your ranks are:
vets %>%
pull(Rank) %>%
unique() %>%
sort()## [1] "E1" "E2" "E3" "E4" "E5" "E6" "E7" "E8" "E9" "O2" "O3" "O4" "W2" "W3" "W4"
## [16] "W5"
One approach to re-ordering is to make Rank into a factor variable, with the "levels of the factor in the order you want, like this:
vets2 <-
vets %>%
mutate(
ordered_rank = factor(
Rank,
levels = c(
"E1", "E2", "E3", "E4", "E5", "E6", "E7", "E8", "E9",
"W3", "W4", "W5",
"O2", "O3", "O4"
)
)
)Now try the bar graph:
vets2 %>%
drop_na(Rank) %>%
ggplot(aes(x = ordered_rank)) +
geom_bar(color = "black", fill = "skyblue") +
labs(x = "Rank")Note that when give a factor, ggplot2 takes account of the ordering on the levels.
You are interested in comparing officers with non-officers, and it seems you are willing to lump the officers into one group. For this, create a new variable:
vets2 <-
vets2 %>%
mutate(rank_status = recode(
Rank,
E1 = "enlisted",
E2 = "enlisted",
E3 = "enlisted",
E4 = "enlisted",
E5 = "enlisted",
E6 = "enlisted",
E7 = "enlisted",
E8 = "enlisted",
E9 = "enlisted",
O2 = "officer",
O3 = "officer",
O4 = "officer",
W2 = "officer",
W3 = "officer",
W4 = "officer",
W5 = "officer"
)
)Note: That’s kinda cumbersome. As you gain programming experience you’ll learn use case_when() with some regular expressions, like this:
vets2 <-
vets2 %>%
mutate(rank_status =
case_when(
str_detect(Rank, pattern = "[OW]") ~ "officer",
str_detect(Rank, pattern = "E") ~ "enlisted"
))Let’s tally that variable:
vets2 %>%
count(rank_status)| rank_status | n |
|---|---|
| enlisted | 811 |
| officer | 19 |
| NA | 81 |
You are right: not very many officers!
But let’s make a bar chart of the DiagA variable, broken down by rank status:
vets2 %>%
drop_na(rank_status, DiagA) %>%
ggplot(aes(x = rank_status)) +
geom_bar(color = "black", aes(fill = DiagA))You can improve the look of the bar graph in various ways:
vets2 %>%
drop_na(rank_status, DiagA) %>%
ggplot(aes(x = rank_status)) +
geom_bar(color = "black", aes(fill = DiagA)) +
labs(
## title for the x-axis:
x = "Rank",
## title for the graph:
title = "Hey, we can make a title!",
## subtitle for the graph:
subtitle = "(and even a sub-title, if we like ...)",
## give the legend a better title:
fill = "PTSD?"
)A jazzier bar graph!
Make a variable that records whether or not the person was exposed to a blast:
vets3 <-
vets2 %>%
mutate(Blast = ifelse(
Ref == "B",
"blast",
"non-blast")
)Now a table:
vets3 %>%
drop_na(rank_status, DiagA, Blast) %>%
count(rank_status, DiagA, Blast)| rank_status | DiagA | Blast | n |
|---|---|---|---|
| enlisted | No PTSD | blast | 233 |
| enlisted | No PTSD | non-blast | 130 |
| enlisted | PTSD | blast | 379 |
| enlisted | PTSD | non-blast | 69 |
| officer | No PTSD | blast | 10 |
| officer | No PTSD | non-blast | 2 |
| officer | PTSD | blast | 6 |
| officer | PTSD | non-blast | 1 |
Now for a bar chart:
vets3 %>%
drop_na(rank_status, DiagA, Blast) %>%
ggplot(aes(x = rank_status)) +
geom_bar(color = "black", aes(fill = DiagA)) +
facet_wrap(~ Blast) +
labs(x = "Rank", fill = "PTSD?")But you might want to show percentages:
ptsd_table <-
vets3 %>%
drop_na(rank_status, DiagA, Blast) %>%
group_by(rank_status, Blast) %>%
count(DiagA) %>%
mutate(perc = n / sum(n) * 100)ptsd_table| Diagnosis | Count | Percentage |
|---|---|---|
| Enlisted, exposed to blast | ||
| No PTSD | 233 | 38.07190 |
| PTSD | 379 | 61.92810 |
| Enlisted, not exposed to blast | ||
| No PTSD | 130 | 65.32663 |
| PTSD | 69 | 34.67337 |
| Officer, exposed to blast | ||
| No PTSD | 10 | 62.50000 |
| PTSD | 6 | 37.50000 |
| Officer, exposed to blast | ||
| No PTSD | 2 | 66.66667 |
| PTSD | 1 | 33.33333 |
Now make the bar graph from the table, instead of making it from the original data. The key is to use the argument stat = "identity" to make the height of the bars equal to the percentages:
ptsd_table %>%
ggplot(aes(x = Blast, y = perc)) +
geom_bar(
color = "black",
aes(fill = DiagA),
position = "dodge",
stat = "identity") +
facet_wrap(~ rank_status) +
labs(x = "Whether or not exposed to blast",
y = "Percentage", fill = "PTSD?")This file was made with R Markdown. For more on R Markdown, consult the following two resources:
This file was made from the material template provided by the `rmdformats`` package. In order to reproduce it:
Press the Code button near the top of the document and download the .Rmd file, saving it to your project directory at the same level as your data folder.
Modify your Excel file as I described above.
Install some packages that the file uses:
install.packages(c("tidyverse", "readxl")) ## you already did thid
install.packages("kableExtra")
install.packages("remotes")
remotes::install_github("juba/rmdformats")Open the file in RStudio and press the Knit button.
Suggestion: There are settings for knitting: press the down arrow next to the cog above the source file to see them. Choose the option Preview in Viewer Pane. Then whenever you knit, the document shows up in a Viewer Pane in the lower right of RStudio. If you want to pop it up into your browser for a larger view, there is a button to do so.
You can publish your documents for to RPubs:
You can always update the document by pressing the publish button on a future version.