Load required libraries

library(conflicted)
library(dplyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.4     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.0
library(ggplot2)

Read data

The dataset records livestock weights across different IPCC areas. Each row is an area and each column is an animal category, so every cell holds the weight for one area-animal combination.

animal_weights <- read.csv("challenge_datasets/animal_weight.csv")
as_tibble(animal_weights)
## # A tibble: 9 × 17
##   IPCC.Area           Cattle...dairy Cattle...non.dairy Buffaloes Swine...market
##   <chr>                        <int>              <int>     <int>          <int>
## 1 Indian Subcontinent            275                110       295             28
## 2 Eastern Europe                 550                391       380             50
## 3 Africa                         275                173       380             28
## 4 Oceania                        500                330       380             45
## 5 Western Europe                 600                420       380             50
## 6 Latin America                  400                305       380             28
## 7 Asia                           350                391       380             50
## 8 Middle east                    275                173       380             28
## 9 Northern America               604                389       380             46
## # ℹ 12 more variables: Swine...breeding <int>, Chicken...Broilers <dbl>,
## #   Chicken...Layers <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
## #   Goats <dbl>, Horses <int>, Asses <int>, Mules <int>, Camels <int>,
## #   Llamas <int>

The dataset contains 9 rows and 17 columns. The first column lists the IPCC areas (geographical regions or countries), and the remaining 16 columns hold weights for different categories of animals: dairy cattle, non-dairy cattle, buffaloes, market and breeding swine, chickens (broilers and layers), ducks, turkeys, sheep, goats, horses, asses, mules, camels, and llamas. The weight columns are all numeric but not all integers: ten are stored as integers, while the two chicken columns, ducks, turkeys, sheep, and goats are stored as doubles.
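The mix of integer and double columns can be checked directly instead of read off the tibble header. The snippet below is a minimal sketch on a one-row toy data frame (`toy` is a hypothetical name; its values are copied from the Indian Subcontinent row shown in this report). On the real data, the same `vapply()` call would be applied to `animal_weights`.

```r
# One row copied from the report's output; enough to show the storage types
toy <- data.frame(
  IPCC.Area = "Indian Subcontinent",
  Cattle...dairy = 275L,      # stored as integer
  Chicken...Broilers = 0.9    # stored as double
)

# Class of every column, returned as a named character vector
col_types <- vapply(toy, class, character(1))
col_types

# Confirm that every column except the area label is numeric
is_weight <- names(toy) != "IPCC.Area"
all_numeric <- all(vapply(toy[is_weight], is.numeric, logical(1)))
all_numeric
```

Here `class()` reports `"integer"` for the cattle column and `"numeric"` for the broiler column, which is how R labels doubles.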

Pivot the data

The dataset is in a wide format, which is not considered “tidy” because each type of animal occupies a separate column, so the values of one variable (animal type) are spread across the column headers. Since the focus of the study is the weight of the livestock, it is more useful to store the animal type as a variable in its own column. After pivoting, the data should have three columns: the IPCC area, the animal type, and the weight.

# Dimension before pivoting
dim(animal_weights)
## [1]  9 17

To pivot the data, we use the pivot_longer() function, which transforms a dataset from a wide format to a long format. Each of the 16 animal columns is stacked into rows, so the row count goes from 9 to 9 × 16 = 144. The first column (IPCC.Area) is kept, and the 16 animal columns are replaced by two new ones (Animal_Type and Weight). Thus, the total number of columns is 3.
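The expected dimensions can be worked out before running the pivot, which gives something to compare the result against. This is a small sketch of that arithmetic; the variable names are illustrative, and the inputs are the values reported by dim() above.

```r
n_rows    <- 9    # rows before pivoting
n_cols    <- 17   # columns before pivoting
n_id_cols <- 1    # columns kept as identifiers (IPCC.Area)

# Every non-identifier column contributes one row per original row;
# the pivoted columns collapse into a names column and a values column.
expected_dim <- c(
  n_rows * (n_cols - n_id_cols),  # 9 * 16 = 144 rows
  n_id_cols + 2                   # 1 + 2 = 3 columns
)
expected_dim
```

If dim() on the pivoted data returns anything other than 144 and 3, the `cols` selection is the first thing to check.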

# Pivot longer the data
animal_weights <- pivot_longer(
  data = animal_weights,
  cols = -IPCC.Area, # Exclude the area column from pivoting
  names_to = "Animal_Type", # The new column for animal types
  values_to = "Weight" # The new column for animal weights
)
# New data frame
as_tibble(animal_weights)
## # A tibble: 144 × 3
##    IPCC.Area           Animal_Type        Weight
##    <chr>               <chr>               <dbl>
##  1 Indian Subcontinent Cattle...dairy      275  
##  2 Indian Subcontinent Cattle...non.dairy  110  
##  3 Indian Subcontinent Buffaloes           295  
##  4 Indian Subcontinent Swine...market       28  
##  5 Indian Subcontinent Swine...breeding     28  
##  6 Indian Subcontinent Chicken...Broilers    0.9
##  7 Indian Subcontinent Chicken...Layers      1.8
##  8 Indian Subcontinent Ducks                 2.7
##  9 Indian Subcontinent Turkeys               6.8
## 10 Indian Subcontinent Sheep                28  
## # ℹ 134 more rows
# Dimension after pivoting
dim(animal_weights)
## [1] 144   3

The dataset was successfully transformed from a wide to a long format. The result is tidy: each row represents a unique combination of an IPCC area and an animal type, along with its corresponding weight, which makes it straightforward to filter, group, or summarize weights by animal type. Because the original weight columns mixed integers and doubles, the combined Weight column is stored as a double.
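One way to confirm the "one row per area-animal combination" property is to look for duplicated pairs after pivoting. The sketch below does this on a small toy data frame (`toy` and `toy_long` are hypothetical names; the four values are copied from the first two rows of the output earlier in this report). It assumes the tidyr package is installed. The same check could be run on `animal_weights` itself.

```r
library(tidyr)

# Two areas and two animal columns taken from the report's output
toy <- data.frame(
  IPCC.Area = c("Indian Subcontinent", "Eastern Europe"),
  Cattle...dairy = c(275L, 550L),
  Buffaloes = c(295L, 380L)
)

# Same call shape as the pivot used on the full dataset
toy_long <- pivot_longer(
  data = toy,
  cols = -IPCC.Area,
  names_to = "Animal_Type",
  values_to = "Weight"
)

# 2 areas * 2 animal columns = 4 rows, and 3 columns
dim(toy_long)

# anyDuplicated() returns 0 when no (area, animal) pair is repeated
n_dup <- anyDuplicated(toy_long[c("IPCC.Area", "Animal_Type")])
n_dup
```

A non-zero result from anyDuplicated() would point to a repeated area in the source file, not to a problem with the pivot itself.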