Data description using bar plots

To further describe the data for the modelling project on Schistosoma mansoni, I needed to make bar plots of some variables in the data set.

First, I loaded the necessary libraries.

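library(ggplot2)  # plotting (ggplot, geom_bar)
library(dplyr)    # data manipulation (filter, mutate, case_when, %>%)
library(forcats)  # helpers for working with factors
library(readxl)   # reading Excel (.xlsx) files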

Then I loaded the data into R.

Raw_data <- read_excel("C:\\Users\\user\\Documents\\Rodiyah R\\Raw data.xlsx")
View(Raw_data)

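To plot a bar chart of gender against source of water, I started by converting the numeric codes in the source of water column into readable labels with the factor() function. The levels argument lists the codes used in the data, labels gives the name for each code in the same order, and the result is stored in a new column ending in _labelled.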

Raw_data$`What is your source of water at home/school?_labelled` <- factor(
  Raw_data$`What is your source of water at home/school?`,
  levels=c(1, 2, 3, 4, 5, 6, 7),
  labels=c("Well/Rain", "Borehole", "Tapwater", "River", "Well/Rain/River", 
           "Well/Rain/Borehole/Rivers", "Borehole/Rivers")
)
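The short sketch below shows how factor() maps codes to labels, using a few hypothetical codes (water_codes and water_labelled are illustrative names, not columns from the real data set). It also shows a behaviour worth knowing: any value that is not listed in levels, as well as any missing value, becomes NA.

```r
# Hypothetical toy codes standing in for the survey column (not the real data)
water_codes <- c(1, 3, 3, 4, 7, NA, 8)

# Same levels/labels mapping as used for the source of water column
water_labelled <- factor(
  water_codes,
  levels = c(1, 2, 3, 4, 5, 6, 7),
  labels = c("Well/Rain", "Borehole", "Tapwater", "River", "Well/Rain/River",
             "Well/Rain/Borehole/Rivers", "Borehole/Rivers")
)

print(water_labelled)

# Codes not listed in `levels` (here 8) and missing values both end up as NA
print(table(water_labelled, useNA = "ifany"))
```

Checking the NA count this way after relabelling is a quick guard against a code that was left out of levels by mistake.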

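Then I plotted the labelled column with ggplot() and geom_bar(), using Gender as the fill. geom_bar() counts the number of rows in each category, so the y-axis shows these counts without needing a separate y variable, and position = "dodge" places the bars for each gender side by side.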

ggplot(Raw_data, aes(x = `What is your source of water at home/school?_labelled`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    title = "Gender by Source of Water",
    x = "Source of Water",
    y = "Count"
  ) +
  theme_minimal()
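Because geom_bar() simply counts rows, the bar heights can be reproduced as a table with dplyr's count(), which is handy for checking the plot or reporting the numbers. This is a minimal sketch on hypothetical toy data; the data frame toy and its columns Water and Gender are illustrative and not from the original data set.

```r
library(dplyr)

# Hypothetical toy data: a few respondents with a water source and a gender
toy <- data.frame(
  Water  = c("River", "River", "Borehole", "River", "Tapwater", "Borehole"),
  Gender = c("Male", "Female", "Female", "Male", "Female", "Male")
)

# With fill = Gender and position = "dodge", geom_bar() draws one bar per
# Water x Gender combination; its height is the number of rows in that
# combination. count() returns the same numbers as a data frame.
bar_heights <- toy %>%
  count(Water, Gender, name = "Count")

print(bar_heights)
```

Applied to the real data, the same call would be count() on the labelled source of water column and Gender.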

After plotting this, I noticed that the data has some missing values, so I removed the rows with missing values in the two variables (columns) of interest. Inside dplyr's filter(), I used !is.na() on both columns, so the new data set, Raw_data2, keeps only the rows where neither the source of water nor the gender is missing. I passed Raw_data into filter() with the pipe operator (%>%), which dplyr makes available, and then plotted the data again. I also added theme(axis.text.x = element_text(angle = 45, hjust = 1)) to rotate the x-axis labels by 45 degrees so that they are easier to read, and set x = NULL to drop the x-axis title.

Raw_data2 <- Raw_data %>%
  filter(!is.na(`What is your source of water at home/school?_labelled`) & !is.na(Gender))
ggplot(Raw_data2, aes(x = `What is your source of water at home/school?_labelled`, fill = Gender)) +
  geom_bar(position = "dodge") +
  labs(
    #title = "Gender by Source of Water",
    x = NULL,
    y = "Number of individuals"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
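To see exactly which rows the filter keeps, here is a small sketch on hypothetical toy data (toy, Water and Gender are illustrative names). Combining the two conditions with & means a row is dropped if either value is missing.

```r
library(dplyr)

# Hypothetical toy data with missing values in one or both columns
toy <- data.frame(
  Water  = c("River", NA, "Borehole", "Tapwater", NA),
  Gender = c("Male", "Female", NA, "Female", NA)
)

# Keep only the rows where BOTH columns are present
complete_rows <- toy %>%
  filter(!is.na(Water) & !is.na(Gender))

print(complete_rows)
```

If the tidyr package is installed, tidyr::drop_na(toy, Water, Gender) is an equivalent shorthand for this filter.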

To plot the bar chart of age against source of water, I first grouped the ages into age groups using the pipe operator, mutate() with case_when(), and factor(). The factor() step fixes the order in which the groups appear in the legend and within each cluster of bars.

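Raw_data3 <- Raw_data2 %>%
  mutate(
    # Assign each age to a band; anything not covered (ages under 5,
    # ages between bands, or missing ages) falls through to "Other/Unspecified"
    Age_Group = case_when(
      Age >= 5 & Age <= 10  ~ "5-10",
      Age >= 11 & Age <= 14 ~ "11-14",
      Age >= 15 & Age <= 40 ~ "15-40",
      Age > 40              ~ "40+",
      TRUE                  ~ "Other/Unspecified"
    )
  ) %>%
  mutate(
    # Set the display order of the groups; each level must match a label
    # above exactly (including capitalisation), or those rows become NA
    Age_Group = factor(Age_Group,
                       levels = c("5-10", "11-14", "15-40", "40+", "Other/Unspecified"))
  )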
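The sketch below runs the same case_when() banding on a handful of hypothetical ages (the data frame ages and the helper vector group_levels are illustrative, not from the real data set). It shows where edge values such as 40, 41, an age under 5 and a missing age end up, and why the spelling in levels matters: a label that does not exactly match any level silently becomes NA.

```r
library(dplyr)

# Hypothetical toy ages, including a missing value and an age below 5
ages <- data.frame(Age = c(4, 7, 12, 15, 40, 41, NA))

group_levels <- c("5-10", "11-14", "15-40", "40+", "Other/Unspecified")

grouped <- ages %>%
  mutate(
    Age_Group = case_when(
      Age >= 5 & Age <= 10  ~ "5-10",
      Age >= 11 & Age <= 14 ~ "11-14",
      Age >= 15 & Age <= 40 ~ "15-40",
      Age > 40              ~ "40+",
      TRUE                  ~ "Other/Unspecified"  # NA conditions count as FALSE
    ),
    Age_Group = factor(Age_Group, levels = group_levels)
  )

print(table(grouped$Age_Group, useNA = "ifany"))

# With a mismatched spelling in `levels`, the second value becomes NA
bad <- factor(c("5-10", "Other/Unspecified"),
              levels = c("5-10", "Other/unspecified"))
print(bad)
```

Running anyNA(Raw_data3$Age_Group) after the grouping step is a cheap way to confirm that no group was lost this way.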

I then plotted the age groups against source of water with ggplot(), this time using Age_Group as the fill.

ggplot(Raw_data3, aes(x = `What is your source of water at home/school?_labelled`, fill = Age_Group)) +
  geom_bar(position = "dodge") +
  labs(
    #title = "Age_Group by Source of Water",
    x = NULL,
    y = "Number of individuals"
  ) +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))