This section uses the dataset that Barakat Adigun shared on the discussion board. The following code loads the dataset and prints it for inspection:
product_sales <- read_csv(
  "https://raw.githubusercontent.com/GullitNa/DATA607-Project2/main/BarakatAdigunProduct.csv")
## Rows: 9 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Product Name, Region
## dbl (6): Jan Sales, Feb Sales, Mar Sales, Apr Sales, May Sales, Jun Sales
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
product_sales
## # A tibble: 9 × 8
## `Product Name` Region `Jan Sales` `Feb Sales` `Mar Sales` `Apr Sales`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Product A North 100 110 120 130
## 2 Product A South 200 210 220 230
## 3 Product A East 300 310 320 330
## 4 Product B North 150 160 170 180
## 5 Product B South 250 260 270 280
## 6 Product B East 350 360 370 380
## 7 Product C North 50 55 60 65
## 8 Product C South 100 105 110 115
## 9 Product C East 150 155 160 165
## # ℹ 2 more variables: `May Sales` <dbl>, `Jun Sales` <dbl>
This dataset loads more cleanly than the other datasets I explore in this project. As a simple cleaning step, I plan to remove the word “Sales” from the month names for clarity. Because those names are currently column headers, I pivot the data to a long format first and clean the resulting Month values afterward.
product_sales_long <- product_sales %>%
  pivot_longer(
    cols = ends_with("Sales"),
    names_to = "Month",
    values_to = "Sales"
  )
product_sales_long
## # A tibble: 54 × 4
## `Product Name` Region Month Sales
## <chr> <chr> <chr> <dbl>
## 1 Product A North Jan Sales 100
## 2 Product A North Feb Sales 110
## 3 Product A North Mar Sales 120
## 4 Product A North Apr Sales 130
## 5 Product A North May Sales 140
## 6 Product A North Jun Sales 150
## 7 Product A South Jan Sales 200
## 8 Product A South Feb Sales 210
## 9 Product A South Mar Sales 220
## 10 Product A South Apr Sales 230
## # ℹ 44 more rows
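As an aside, the “ Sales” suffix can also be dropped during the pivot itself. The following is a minimal sketch rather than the approach used in this project. It assumes a tidyr version whose `pivot_longer()` supports `names_pattern`. The `sales_demo` tibble is a hypothetical stand-in built from the first two rows and the first two month columns printed above, so the block runs without the download.

```r
library(tidyr)
library(tibble)

# Hypothetical stand-in: first two rows, Jan and Feb only,
# with values copied from the printed tibble above
sales_demo <- tribble(
  ~`Product Name`, ~Region, ~`Jan Sales`, ~`Feb Sales`,
  "Product A",     "North", 100,          110,
  "Product A",     "South", 200,          210
)

# names_pattern keeps only the captured group,
# so the header "Jan Sales" becomes the value "Jan"
sales_demo_long <- pivot_longer(
  sales_demo,
  cols = ends_with("Sales"),
  names_to = "Month",
  names_pattern = "(.*) Sales",
  values_to = "Sales"
)

sales_demo_long
```

This folds the rename into the reshape, at the cost of a regular expression that has to match every selected column name.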
The data is now in a long format, with the six month columns combined into a single Month column, but there is still room for tidying: every Month value still carries the “ Sales” suffix. I remove it with mutate() and sub() so that only the month abbreviation remains.
# Assign the result back so that later steps see the cleaned Month values
product_sales_long <- product_sales_long %>%
  mutate(Month = sub(" Sales", "", Month))
product_sales_long
## # A tibble: 54 × 4
## `Product Name` Region Month Sales
## <chr> <chr> <chr> <dbl>
## 1 Product A North Jan 100
## 2 Product A North Feb 110
## 3 Product A North Mar 120
## 4 Product A North Apr 130
## 5 Product A North May 140
## 6 Product A North Jun 150
## 7 Product A South Jan 200
## 8 Product A South Feb 210
## 9 Product A South Mar 220
## 10 Product A South Apr 230
## # ℹ 44 more rows
head(product_sales_long)
## # A tibble: 6 × 4
## `Product Name` Region Month Sales
## <chr> <chr> <chr> <dbl>
## 1 Product A North Jan 100
## 2 Product A North Feb 110
## 3 Product A North Mar 120
## 4 Product A North Apr 130
## 5 Product A North May 140
## 6 Product A North Jun 150
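Once the suffix is gone, Month is still a plain character column, so any sort or plot axis would order it alphabetically instead of by calendar. The sketch below shows one possible fix, which is not part of the original analysis: converting Month to a factor whose levels follow base R's built-in `month.abb` vector. The `monthly_demo` tibble is a hypothetical stand-in that reuses the six Product A / North values shown above.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-in: the Product A / North rows after the suffix is removed
monthly_demo <- tibble(
  Month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  Sales = c(100, 110, 120, 130, 140, 150)
)

# As plain text the months sort alphabetically, not chronologically
sort(monthly_demo$Month)

# month.abb is base R's vector "Jan" ... "Dec"; using it as the level order
# makes arrange() and plot axes follow the calendar
monthly_demo <- monthly_demo %>%
  mutate(Month = factor(Month, levels = month.abb))

# Latest month first, based on the factor levels
monthly_demo %>%
  arrange(desc(Month))
```

This only matters for month-level analysis. The regional summary that follows groups by Region alone, so it is unaffected either way.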
With the data transformed, I compare the regional performance of the products. Based on the summary below, the East region recorded the highest total sales from January through June at 5175, the South region came second at 3675, and the North region had the lowest at 2175.
total_by_region <- product_sales_long %>%
  group_by(Region) %>%
  summarize(TotalSales = sum(Sales)) %>%
  arrange(desc(TotalSales))
total_by_region
## # A tibble: 3 × 2
## Region TotalSales
## <chr> <dbl>
## 1 East 5175
## 2 South 3675
## 3 North 2175
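To put these totals in proportion, each region's share of overall sales can be computed from the same summary. This is a small sketch, not a step from the original analysis. The `region_totals` tibble re-enters the three totals printed above so the block runs on its own, and the `SharePct` column name is my own choice.

```r
library(dplyr)
library(tibble)

# Totals copied from the summary printed above
region_totals <- tibble(
  Region = c("East", "South", "North"),
  TotalSales = c(5175, 3675, 2175)
)

# Share of overall sales contributed by each region, as a percentage
region_share <- region_totals %>%
  mutate(SharePct = round(100 * TotalSales / sum(TotalSales), 1))

region_share
```

The three totals add up to 11025, so East accounts for roughly 47% of sales, South for about a third, and North for roughly 20%.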
The following bar chart visualizes these regional totals. It also serves as an example for further analysis of other categories in this dataset, such as product or month.
ggplot(total_by_region, aes(x = reorder(Region, -TotalSales), y = TotalSales)) +
  geom_col() +
  labs(
    title = "Regional Performance",
    x = "Region",
    y = "Total Sales"
  )
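The same pattern extends to other categories. As a hedged sketch of what that could look like, the block below breaks sales down by product within each region. It is self-contained and uses only the January through April values visible in the printed tibble, because the printout truncates May and June. The totals are therefore four-month figures and will not match the six-month summary. The names `sales_jan_apr`, `product_region_totals`, and `product_region_plot` are my own.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Jan-Apr values copied from the printed tibble; May and Jun are omitted
# because the printout truncates them
sales_jan_apr <- tribble(
  ~Product,    ~Region, ~Jan, ~Feb, ~Mar, ~Apr,
  "Product A", "North", 100,  110,  120,  130,
  "Product A", "South", 200,  210,  220,  230,
  "Product A", "East",  300,  310,  320,  330,
  "Product B", "North", 150,  160,  170,  180,
  "Product B", "South", 250,  260,  270,  280,
  "Product B", "East",  350,  360,  370,  380,
  "Product C", "North", 50,   55,   60,   65,
  "Product C", "South", 100,  105,  110,  115,
  "Product C", "East",  150,  155,  160,  165
)

# Pivot to long format, then total sales per product and region
product_region_totals <- sales_jan_apr %>%
  pivot_longer(cols = Jan:Apr, names_to = "Month", values_to = "Sales") %>%
  group_by(Product, Region) %>%
  summarize(TotalSales = sum(Sales), .groups = "drop")

# Side-by-side bars: one group per product, one bar per region
product_region_plot <- ggplot(
  product_region_totals,
  aes(x = Product, y = TotalSales, fill = Region)
) +
  geom_col(position = "dodge") +
  labs(
    title = "Jan-Apr Sales by Product and Region",
    x = "Product",
    y = "Total Sales (Jan-Apr)"
  )

product_region_plot
```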
With Barakat’s dataset, I loaded the wide data with the intention of pivoting it into a long format from the outset. The goal was to tidy the data by gathering the month columns into a single column and shortening the month names, both for improved clarity. I then summarized total sales per region in descending order and finished with a bar chart to demonstrate how analyses of this dataset can be visualized.