library(tidyverse)
library(ggplot2)
library(dplyr)
library(here)
library(ggsci)
library(ggthemes)
library(lubridate) # provides my(); attached by tidyverse >= 2.0.0, loaded explicitly for older versions
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
Challenge 9 Instructions
Challenge Overview
Today’s challenge is simple. Create a function, and use it to perform a data analysis / cleaning / visualization task:
Examples of such functions are:
1) A function that reads in and cleans a dataset.
2) A function that computes summary statistics (e.g., computes the z score for a variable).
3) A function that plots a histogram.
That’s it!
This is an intentionally straightforward challenge. You can use any dataset in the challenge datasets folder.
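The second example, a z-score function, might look like the following minimal sketch. The name z_score and the input vector are illustrative assumptions, not part of the challenge materials.

```r
# Hypothetical helper: standardise a numeric vector to z-scores.
# na.rm = TRUE ignores missing values when computing the mean and sd.
z_score <- function(x, na.rm = TRUE) {
  (x - mean(x, na.rm = na.rm)) / sd(x, na.rm = na.rm)
}

# Illustrative input: mean is 4 and sd is 2
z_score(c(2, 4, 6))  # -1 0 1
```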
Solutions
Reading the Data
The working directory for RStudio has been set with setwd() so that “eggs_tidy.csv” sits at its root. The file is then located with here(), which builds the path from the project root.
eggs <- read_csv(here("eggs_tidy.csv"))
eggs
# A tibble: 120 × 6
month year large_half_dozen large_dozen extra_large_half_dozen
<chr> <dbl> <dbl> <dbl> <dbl>
1 January 2004 126 230 132
2 February 2004 128. 226. 134.
3 March 2004 131 225 137
4 April 2004 131 225 137
5 May 2004 131 225 137
6 June 2004 134. 231. 137
7 July 2004 134. 234. 137
8 August 2004 134. 234. 137
9 September 2004 130. 234. 136.
10 October 2004 128. 234. 136.
# ℹ 110 more rows
# ℹ 1 more variable: extra_large_dozen <dbl>
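The read step could also be wrapped in a small function, in the spirit of the first example in the challenge. This is a minimal sketch: read_eggs is a hypothetical name, and the temporary CSV only stands in for the real file so that the example runs on its own.

```r
library(readr)

# Hypothetical helper: stop with a clear message if the file is missing,
# otherwise read it as a tibble.
read_eggs <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  read_csv(path, show_col_types = FALSE)
}

# Demo with a temporary CSV (illustrative content, not the real dataset)
tmp <- tempfile(fileext = ".csv")
writeLines(c("month,year,large_dozen", "January,2004,230"), tmp)
demo <- read_eggs(tmp)
```

With the challenge data in place, the call would be read_eggs(here("eggs_tidy.csv")).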
Data Description
High Level Description
The data set comprises 120 rows and 6 columns.
The data set has one <chr> column (month); the remaining five columns are <dbl>. The month and year variables give the month and year of observation. large_half_dozen, large_dozen, extra_large_half_dozen and extra_large_dozen each correspond to one type of egg carton, defined by egg size and carton quantity. Each case holds the value recorded for each of the four types in that month and year; the values are read here as counts, although the tidying step below stores them in a column named price.
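The column types described above can be checked with a short function. This is a sketch only: column_types is a hypothetical name and the one-row data frame is illustrative. Note that base R reports <dbl> columns as class "numeric".

```r
# Hypothetical helper: return the (first) class of every column.
column_types <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# Illustrative one-row data frame with the same kinds of columns
toy <- data.frame(month = "January", year = 2004, large_dozen = 230,
                  stringsAsFactors = FALSE)
column_types(toy)
```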
How was the Data likely collected?
The dataset seems to provide a count of the total number of eggs for each of the 4 types collected for a month and year combination. It appears to be pre-cleaned, since no NA values are visible in the rows shown. The data was likely collected from official or unofficial sources reporting egg counts for a poultry facility.
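The printed rows show only the first ten observations, so the absence of NA values is better confirmed across the whole table. A minimal sketch, assuming a hypothetical helper named count_missing and an illustrative toy data frame:

```r
# Hypothetical helper: number of NA values in each column.
count_missing <- function(df) {
  colSums(is.na(df))
}

# Illustrative data with one missing value per column
toy <- data.frame(month = c("January", "February", NA),
                  large_dozen = c(230, NA, 225),
                  stringsAsFactors = FALSE)
count_missing(toy)
```

Applied to the real data, count_missing(eggs) returning all zeros would support the pre-cleaned reading.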
Tidying the Data
The dataset needs a date variable and must be pivoted to a long, narrow form for ease of analysis. The following pipeline does both and stores the result as eggs_tidy.
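eggs_tidy <- eggs %>%
  # Stack the four egg columns (positions 3 to 6) into name/price pairs
  pivot_longer(cols = 3:6,
               values_to = "price") %>%
  # Replace the underscores inside "extra_large" and "half_dozen" with spaces
  # so that only the underscore between size and amount remains
  mutate(name = str_replace(name, "extra_large", "extra large"),
         name = str_replace(name, "half_dozen", "half dozen")) %>%
  separate(name, into = c("size", "amount"), sep = "_") %>%
  # Build a Date (first day of the month) from month and year
  mutate(date = str_c(month, year, sep = " "),
         date = my(date))
eggs_tidy
# A tibble: 480 × 6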
month year size amount price date
<chr> <dbl> <chr> <chr> <dbl> <date>
1 January 2004 large half dozen 126 2004-01-01
2 January 2004 large dozen 230 2004-01-01
3 January 2004 extra large half dozen 132 2004-01-01
4 January 2004 extra large dozen 230 2004-01-01
5 February 2004 large half dozen 128. 2004-02-01
6 February 2004 large dozen 226. 2004-02-01
7 February 2004 extra large half dozen 134. 2004-02-01
8 February 2004 extra large dozen 230 2004-02-01
9 March 2004 large half dozen 131 2004-03-01
10 March 2004 large dozen 225 2004-03-01
# ℹ 470 more rows
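Since the challenge asks for a function, the tidying pipeline above could itself be wrapped in one. This is a minimal sketch: tidy_eggs is a hypothetical name, the two-row toy tibble is illustrative and stands in for the real file, and the columns are selected by name rather than by position.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(lubridate)

# Hypothetical wrapper around the tidying steps shown above.
tidy_eggs <- function(df) {
  df %>%
    # Pivot every column except month and year
    pivot_longer(cols = -c(month, year), values_to = "price") %>%
    mutate(name = str_replace(name, "extra_large", "extra large"),
           name = str_replace(name, "half_dozen", "half dozen")) %>%
    separate(name, into = c("size", "amount"), sep = "_") %>%
    # Assumes English month names, as in the source data
    mutate(date = my(str_c(month, year, sep = " ")))
}

# Illustrative toy input with the same column layout
toy <- tibble(
  month = c("January", "February"),
  year = c(2004, 2004),
  large_half_dozen = c(126, 128),
  large_dozen = c(230, 226),
  extra_large_half_dozen = c(132, 134),
  extra_large_dozen = c(230, 230)
)
tidy_toy <- tidy_eggs(toy)
```

Each input row yields four output rows, one per egg type, which matches the growth from 120 to 480 rows seen above.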
Creating a Function
The following function plots a dataframe as a grouped bar chart. It takes as parameters the dataframe to use, the variables for the X and Y axes, the fill variable, and the variable to create facets on.
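plot_grouped_barchart <- function(dataframe, x_axis, y_axis, fill_variable, facet_wrap_var){
  # Column names arrive as strings, so they are looked up with the .data pronoun
  # (aes_string() is deprecated in current ggplot2)
  ggplot(dataframe, aes(x = .data[[x_axis]], y = .data[[y_axis]], fill = .data[[fill_variable]])) +
    geom_col(position = "dodge") +  # equivalent to geom_bar(stat = "identity")
    scale_x_date() +  # assumes the x column is a Date
    scale_y_continuous(labels = scales::label_dollar(), limits = c(0, 300)) +
    facet_wrap(as.formula(paste("~", facet_wrap_var))) +
    # The bars are coloured through fill, so the fill scale is needed;
    # scale_color_rickandmorty() had no effect on this plot
    ggsci::scale_fill_rickandmorty() +
    labs(title = paste0("Grouped Bar Plot by ", facet_wrap_var, "\n", y_axis, " vs ", x_axis),
         x = x_axis, y = y_axis, fill = fill_variable) +
    ggthemes::theme_few() +
    theme(plot.title = element_text(hjust = 0.5),
          axis.text.x = element_text(angle = 90))
}
The function defined above can now be called to create a grouped bar plot from the eggs_tidy dataframe.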
plot_grouped_barchart(dataframe=eggs_tidy, x_axis="date", y_axis="price", fill_variable="amount", facet_wrap_var="size")
The function also keeps the plot title and axis labels dynamic, since they are built from the arguments passed to it.
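An alternative design takes unquoted column names instead of strings, using tidy evaluation. The sketch below is an assumption-laden variant, not the function used above: plot_grouped_barchart_tidy is a hypothetical name, the ggsci and ggthemes styling is left out to keep it short, and the toy data frame reuses the four January 2004 rows printed in the eggs_tidy output.

```r
library(ggplot2)

# Hypothetical variant: columns are passed unquoted and forwarded with {{ }}.
plot_grouped_barchart_tidy <- function(dataframe, x, y, fill, facet) {
  # Turn the unquoted arguments into strings for the dynamic labels
  x_lab <- rlang::as_label(rlang::enquo(x))
  y_lab <- rlang::as_label(rlang::enquo(y))
  facet_lab <- rlang::as_label(rlang::enquo(facet))

  ggplot(dataframe, aes(x = {{ x }}, y = {{ y }}, fill = {{ fill }})) +
    geom_col(position = "dodge") +
    facet_wrap(vars({{ facet }})) +
    labs(title = paste0("Grouped Bar Plot by ", facet_lab, "\n", y_lab, " vs ", x_lab),
         x = x_lab, y = y_lab)
}

# The four January 2004 rows from the eggs_tidy printout
toy <- data.frame(
  date = as.Date(rep("2004-01-01", 4)),
  size = c("large", "large", "extra large", "extra large"),
  amount = c("half dozen", "dozen", "half dozen", "dozen"),
  price = c(126, 230, 132, 230),
  stringsAsFactors = FALSE
)
p <- plot_grouped_barchart_tidy(toy, date, price, amount, size)
```

The string-based version is easier to drive from a vector of column names, while the unquoted version reads more like ordinary ggplot2 code; which one fits better depends on how the function will be called.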