library(conflicted)
library(dplyr)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.4 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.0
library(ggplot2)
The dataset contains information on various animal populations across different IPCC areas. The data is structured with the area names as rows and animal categories as columns, with weights for each area-animal combination.
animal_weights <- read.csv("challenge_datasets/animal_weight.csv")
as_tibble(animal_weights)
## # A tibble: 9 × 17
## IPCC.Area Cattle...dairy Cattle...non.dairy Buffaloes Swine...market
## <chr> <int> <int> <int> <int>
## 1 Indian Subcontinent 275 110 295 28
## 2 Eastern Europe 550 391 380 50
## 3 Africa 275 173 380 28
## 4 Oceania 500 330 380 45
## 5 Western Europe 600 420 380 50
## 6 Latin America 400 305 380 28
## 7 Asia 350 391 380 50
## 8 Middle east 275 173 380 28
## 9 Northern America 604 389 380 46
## # ℹ 12 more variables: Swine...breeding <int>, Chicken...Broilers <dbl>,
## # Chicken...Layers <dbl>, Ducks <dbl>, Turkeys <dbl>, Sheep <dbl>,
## # Goats <dbl>, Horses <int>, Asses <int>, Mules <int>, Camels <int>,
## # Llamas <int>
The dataset contains 9 rows and 17 columns. The first column lists the IPCC areas (geographical regions), and the remaining 16 columns hold weights for different categories of animals: dairy cattle, non-dairy cattle, buffaloes, swine (market and breeding), chickens (broilers and layers), ducks, turkeys, sheep, goats, horses, asses, mules, camels, and llamas. All weight columns are numeric: most are integers, while the chicken, duck, turkey, sheep, and goat columns are doubles.
The dataset is in a wide format, which is not considered "tidy" because animal type is a variable whose values are stored in the column names, one column per animal. Since the quantity of interest is the livestock weight, a single weight column plus a column naming the animal type is enough. After pivoting, the data should have three columns: IPCC.Area, Animal_Type, and Weight.
# Dimension before pivoting
dim(animal_weights)
## [1] 9 17
To pivot the data, we use the pivot_longer() function, which transforms a dataset from a wide format to a long format. Each of the 16 animal columns becomes one row per area, so the row count goes from 9 to 9 * 16 = 144. The first column (IPCC.Area) is kept, and the 16 animal columns are replaced by two new columns (Animal_Type and Weight). Thus, the total number of columns is 3.
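The row and column arithmetic can be checked on a small table before running it on the real data. The sketch below is only an illustration: the `toy` data frame, its `Area` column, and the helper names `n_id` and `n_pivoted` are hypothetical and not part of the challenge dataset.

```r
library(tidyr)

# Hypothetical toy table: 3 areas x 2 animal columns (not the real data)
toy <- data.frame(
  Area   = c("A", "B", "C"),
  Cattle = c(275L, 550L, 600L),  # integer column
  Ducks  = c(2.7, 2.7, 2.7)      # double column
)

n_id      <- 1                 # identifier columns kept as-is
n_pivoted <- ncol(toy) - n_id  # columns folded into rows

toy_long <- pivot_longer(
  toy,
  cols      = -Area,
  names_to  = "Animal_Type",
  values_to = "Weight"
)

# Expected shape: rows = original rows * pivoted columns; columns = id + 2
expected <- as.integer(c(nrow(toy) * n_pivoted, n_id + 2))
stopifnot(identical(dim(toy_long), expected))

# Integer and double columns are combined into a single double column
stopifnot(is.double(toy_long$Weight))
```

The same rule applied to the real data gives 9 * 16 = 144 rows and 1 + 2 = 3 columns. Mixing integer and double inputs also explains why Weight is printed as a double column after the pivot.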
# Pivot longer the data
animal_weights <- pivot_longer(
data = animal_weights,
cols = -IPCC.Area, # Exclude the area column from pivoting
names_to = "Animal_Type", # The new column for animal types
values_to = "Weight" # The new column for animal weights
)
# New data frame
as_tibble(animal_weights)
## # A tibble: 144 × 3
## IPCC.Area Animal_Type Weight
## <chr> <chr> <dbl>
## 1 Indian Subcontinent Cattle...dairy 275
## 2 Indian Subcontinent Cattle...non.dairy 110
## 3 Indian Subcontinent Buffaloes 295
## 4 Indian Subcontinent Swine...market 28
## 5 Indian Subcontinent Swine...breeding 28
## 6 Indian Subcontinent Chicken...Broilers 0.9
## 7 Indian Subcontinent Chicken...Layers 1.8
## 8 Indian Subcontinent Ducks 2.7
## 9 Indian Subcontinent Turkeys 6.8
## 10 Indian Subcontinent Sheep 28
## # ℹ 134 more rows
# Dimension after pivoting
dim(animal_weights)
## [1] 144 3
The provided dataset was successfully transformed from a wide to a long format, enabling more efficient data analysis of the weights per animal type. The pivot operation resulted in a tidy dataset with each row representing a unique combination of an IPCC area and animal type, along with its corresponding weight.
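As one possible illustration of the analysis the long format enables, the sketch below summarises weight per animal type with group_by() and summarise(). It assumes a small hand-built data frame named `long` (a hypothetical name) holding six rows copied from the table printed earlier, rather than the full pivoted dataset.

```r
library(dplyr)

# Six rows in the pivoted layout; values copied from the printed table above
long <- data.frame(
  IPCC.Area   = rep(c("Indian Subcontinent", "Eastern Europe", "Africa"), each = 2),
  Animal_Type = rep(c("Cattle...dairy", "Buffaloes"), times = 3),
  Weight      = c(275, 295, 550, 380, 275, 380)
)

# One summary row per animal type, computed across areas
weight_summary <- long %>%
  group_by(Animal_Type) %>%
  summarise(
    mean_weight = mean(Weight),
    min_weight  = min(Weight),
    max_weight  = max(Weight),
    .groups     = "drop"
  )

print(weight_summary)
```

In the wide format the same summary would require naming or looping over each animal column. Here the animal type is an ordinary grouping variable.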