To further describe the data for the modelling project on Schistosoma mansoni, I needed to make bar plots of some variables in the data set.
First, I loaded the necessary libraries.
library(ggplot2)
library(dplyr)
library(forcats)
library(readxl)
Then I loaded the data into R.
Raw_data <- read_excel("C:\\Users\\user\\Documents\\Rodiyah R\\Raw data.xlsx")
View(Raw_data)
To plot a bar chart of gender against source of water, I started by labelling the numeric codes in the source of water column with the factor() function, storing the result in a new column.
Raw_data$`What is your source of water at home/school?_labelled` <- factor(
  Raw_data$`What is your source of water at home/school?`,
  levels = c(1, 2, 3, 4, 5, 6, 7),
  labels = c("Well/Rain", "Borehole", "Tapwater", "River", "Well/Rain/River",
             "Well/Rain/Borehole/Rivers", "Borehole/Rivers")
)
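As a minimal sketch of how factor() maps codes to labels, here is the same idea on a toy vector. The values below are made up for illustration and are not from the survey data. Any code that is not listed in levels, as well as any existing NA, ends up as NA in the labelled result.

```r
# Toy vector of numeric codes (hypothetical values, not the survey data)
codes <- c(1, 4, 2, 9, NA)

water <- factor(
  codes,
  levels = c(1, 2, 3, 4),
  labels = c("Well/Rain", "Borehole", "Tapwater", "River")
)

# The code 9 is not in `levels`, so it becomes NA, just like the existing NA
print(water)
print(table(water, useNA = "ifany"))
```

Checking table(..., useNA = "ifany") on the real column is a quick way to see how many entries are missing before plotting.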
Then I plotted the column using ggplot() and geom_bar(), with gender as the fill. Because no y aesthetic is given, geom_bar() counts the number of occurrences and uses that count as the y-axis.
ggplot(Raw_data, aes(x = `What is your source of water at home/school?_labelled`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Gender by Source of Water",
    x = "Source of Water",
    y = "Count"
  ) +
  theme_minimal()
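To see the numbers behind the bars, the same counts can be computed directly with count() from dplyr. This is only a sketch on a toy data frame; the column name Source and the values are hypothetical stand-ins for the real columns.

```r
library(dplyr)

# Toy data (hypothetical values, not the survey data)
toy <- data.frame(
  Source = c("Borehole", "Borehole", "River", "River", "River"),
  Gender = c("Female", "Male", "Female", "Female", "Male")
)

# One row per Source/Gender combination, with the number of rows in column n.
# These are the heights geom_bar() would draw for the dodged bars.
counts <- toy %>% count(Source, Gender)
print(counts)
```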
After plotting this, I noticed that the data had some missing values, so I removed the rows with missing values in my variables (columns) of interest. The !is.na() condition keeps only the rows where those columns are not missing. I used the pipe operator (%>%), available through the dplyr library, to pass the data to filter(), then plotted the data again. I also added theme(axis.text.x = ...) to rotate the x-axis labels so that they are easier to read.
Raw_data2 <- Raw_data %>%
  filter(!is.na(`What is your source of water at home/school?_labelled`) & !is.na(Gender))
ggplot(Raw_data2, aes(x = `What is your source of water at home/school?_labelled`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    #title = "Gender by Source of Water",
    x = NULL,
    y = "Number of individuals"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
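The filtering step can be checked on a small example. The sketch below uses a toy data frame with made-up values and a hypothetical column name Source. Separating the conditions in filter() with commas combines them with AND, which is equivalent to joining them with &.

```r
library(dplyr)

# Toy data with missing values in both columns (hypothetical values)
toy <- data.frame(
  Source = c("Borehole", NA, "River", "River"),
  Gender = c("Female", "Male", NA, "Male")
)

# Keep only rows where neither column is missing
complete_rows <- toy %>%
  filter(!is.na(Source), !is.na(Gender))

print(complete_rows)
```

Comparing nrow() before and after the filter shows how many rows were dropped.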
To plot the bar chart of age against source of water, I first grouped the age entries into age groups using the pipe operator together with the mutate(), case_when() and factor() functions.
Raw_data3 <- Raw_data2 %>%
  mutate(
    Age_Group = case_when(
      Age >= 5 & Age <= 10 ~ "5-10",
      Age >= 11 & Age <= 14 ~ "11-14",
      Age >= 15 & Age <= 40 ~ "15-40",
      Age > 40 ~ "40+",
      TRUE ~ "Other/Unspecified"
    )
  ) %>%
  mutate(
    # Level names must match the case_when() labels exactly, including capitalisation
    Age_Group = factor(Age_Group,
      levels = c("5-10", "11-14", "15-40", "40+", "Other/Unspecified"))
  )
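The grouping logic can be tried on a few toy ages first. The values below are made up and not from the survey. Ages below 5 and missing ages fall through to the TRUE branch. The strings listed in levels have to match the case_when() labels exactly, including capitalisation, because any label that is not listed in levels is turned into NA by factor().

```r
library(dplyr)

# Toy ages, including one below 5 and one missing (hypothetical values)
toy <- data.frame(Age = c(7, 12, 30, 55, 3, NA))

toy <- toy %>%
  mutate(
    Age_Group = case_when(
      Age >= 5 & Age <= 10 ~ "5-10",
      Age >= 11 & Age <= 14 ~ "11-14",
      Age >= 15 & Age <= 40 ~ "15-40",
      Age > 40 ~ "40+",
      TRUE ~ "Other/Unspecified"
    ),
    # Ordered levels control the order of the bars and legend entries
    Age_Group = factor(Age_Group,
      levels = c("5-10", "11-14", "15-40", "40+", "Other/Unspecified"))
  )

print(toy)
```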
I then plotted the data using the ggplot() function, this time with the age group as the fill.
ggplot(Raw_data3, aes(x = `What is your source of water at home/school?_labelled`, fill = Age_Group)) +
  geom_bar(position = "dodge") +
  labs(
    #title = "Age_Group by Source of Water",
    x = NULL,
    y = "Number of individuals"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))