Introduction

This challenge is about writing a reusable function for an analysis task.

Dataset

We first load the libraries.

library(readr)
library(here)
## here() starts at C:/Users/SHAURYA/Desktop/Studies/Winter 2024 601/Challenges/challenge 9
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)

The dataset is then loaded.

weight <- read_csv("animal_weight.csv", show_col_types = FALSE)
weight
## # A tibble: 9 × 17
##   `IPCC Area`   `Cattle - dairy` `Cattle - non-dairy` Buffaloes `Swine - market`
##   <chr>                    <dbl>                <dbl>     <dbl>            <dbl>
## 1 Indian Subco…              275                  110       295               28
## 2 Eastern Euro…              550                  391       380               50
## 3 Africa                     275                  173       380               28
## 4 Oceania                    500                  330       380               45
## 5 Western Euro…              600                  420       380               50
## 6 Latin America              400                  305       380               28
## 7 Asia                       350                  391       380               50
## 8 Middle east                275                  173       380               28
## 9 Northern Ame…              604                  389       380               46
## # ℹ 12 more variables: `Swine - breeding` <dbl>, `Chicken - Broilers` <dbl>,
## #   `Chicken - Layers` <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
## #   Goats <dbl>, Horses <dbl>, Asses <dbl>, Mules <dbl>, Camels <dbl>,
## #   Llamas <dbl>

We see that the dataset has an IPCC Area column followed by 16 columns, one for each animal category.

Functions

This function selects the area column and one animal category, cleans the data, and draws a bar chart of the weights by area. The function returns the plot.

func <- function(data, category) {
  # Keep only the area column and the requested animal category
  cols <- c("IPCC Area", category)
  new_data <- data[, cols]
  
  # Clean data: coerce the category column to numeric
  new_data[[category]] <- as.numeric(new_data[[category]])
  
  # Remove rows with missing values, if any
  new_data <- na.omit(new_data)
  
  # Bar chart of weight by area; .data[[category]] avoids the ggplot2 warning
  ggplot(new_data, aes(x = `IPCC Area`, y = .data[[category]])) +
    geom_bar(stat = "identity", fill = "skyblue", color = "black", alpha = 0.7) +
    labs(title = paste("Weight of", category, "by Area"),
         x = "IPCC Area", y = category) +
    theme_minimal() +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
}

func(weight, "Cattle - non-dairy")

The function above selects the area column and the chosen animal category. It then cleans the data by converting the category to numeric and removing any missing values. Finally, it draws a bar chart, which shows that Western Europe has the heaviest non-dairy cattle, followed by Asia and Eastern Europe (tied), and then Northern America. The Indian subcontinent has the lightest cattle.
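
The ranking read off the chart can also be checked numerically. The sketch below is one possible way to do that, not part of the original analysis: `clean_category` is a hypothetical helper that applies the same select-and-clean steps but returns the sorted data instead of a plot, and `sample_weight` is a small subset of rows copied from the printed tibble (only the areas whose names are not truncated in the output).

```r
library(dplyr)

# Hypothetical helper (not in the original report): same cleaning steps as
# func(), but returns the cleaned data sorted from heaviest to lightest.
clean_category <- function(data, category) {
  data %>%
    select(all_of(c("IPCC Area", category))) %>%   # area + one category
    mutate(across(all_of(category), as.numeric)) %>% # coerce to numeric
    na.omit() %>%                                    # drop missing rows
    arrange(desc(.data[[category]]))                 # heaviest first
}

# Assumption: an illustrative subset of the dataset, with values copied
# from the printed output above.
sample_weight <- tibble(
  `IPCC Area` = c("Africa", "Oceania", "Latin America", "Asia", "Middle east"),
  `Cattle - non-dairy` = c(173, 330, 305, 391, 173)
)

ranked <- clean_category(sample_weight, "Cattle - non-dairy")
ranked
```

Separating the cleaning step from the plotting step in this way would also let the same cleaned data feed other summaries, at the cost of one more function to maintain.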

Conclusion

We created a function that selects, cleans, and plots the data for any animal category in a single call.