This R Markdown document uses the dataset that Barakat Adigun provided on the discussion board. The following code loads and displays the dataset:
library(tidyverse)  # provides read_csv(), pivot_longer(), the dplyr verbs and ggplot()
product_sales <- read_csv(
  "https://raw.githubusercontent.com/GullitNa/DATA607-Project2/main/BarakatAdigunProduct.csv")
## Rows: 9 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Product Name, Region
## dbl (6): Jan Sales, Feb Sales, Mar Sales, Apr Sales, May Sales, Jun Sales
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
product_sales
## # A tibble: 9 × 8
## `Product Name` Region `Jan Sales` `Feb Sales` `Mar Sales` `Apr Sales`
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Product A North 100 110 120 130
## 2 Product A South 200 210 220 230
## 3 Product A East 300 310 320 330
## 4 Product B North 150 160 170 180
## 5 Product B South 250 260 270 280
## 6 Product B East 350 360 370 380
## 7 Product C North 50 55 60 65
## 8 Product C South 100 105 110 115
## 9 Product C East 150 155 160 165
## # ℹ 2 more variables: `May Sales` <dbl>, `Jun Sales` <dbl>
This dataset loads more cleanly than the other datasets I explore in this project. There is still room to tidy it, however: the six month columns are laid out in wide format, so I pivot them into a long data frame. I also remove the word “Sales” from the month column names and gather those columns into a single `Month` column, with the values in a single `Sales` column.
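product_sales_long <- product_sales %>%
  pivot_longer(
    cols = ends_with("Sales"),  # the six "<Month> Sales" columns
    names_to = "Month",         # former column names are stored here
    values_to = "Sales"         # the cell values are stored here
  ) %>%
  mutate(Month = sub(" Sales", "", Month))  # "Jan Sales" becomes "Jan"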
head(product_sales_long)
## # A tibble: 6 × 4
## `Product Name` Region Month Sales
## <chr> <chr> <chr> <dbl>
## 1 Product A North Jan 100
## 2 Product A North Feb 110
## 3 Product A North Mar 120
## 4 Product A North Apr 130
## 5 Product A North May 140
## 6 Product A North Jun 150
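Note that `Month` in `product_sales_long` is a character column, so anything that sorts it (`arrange(Month)`, or a ggplot axis mapped to `Month`) would put the months in alphabetical order (Apr, Feb, Jan, ...) rather than calendar order. The sketch below is a minimal base R version of the same reshape, shown for illustration and not as the method used here (which is `pivot_longer()`). It runs on a hypothetical two-row, three-month subset copied from the preview; `wide`, `long` and `row_id` are illustrative names. It stacks each `... Sales` column, strips the suffix, and stores `Month` as a factor whose levels follow R's built-in `month.abb`, so sorting stays chronological.

```r
# Hypothetical subset copied from the printed preview: two rows, three months
wide <- data.frame(
  `Product Name` = c("Product A", "Product A"),
  Region = c("North", "South"),
  `Jan Sales` = c(100, 200),
  `Feb Sales` = c(110, 210),
  `Mar Sales` = c(120, 220),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Every column whose name ends in "Sales" is a month to be stacked
sales_cols <- grep("Sales$", names(wide), value = TRUE)

# Build one block of rows per month column, then bind the blocks together
long <- do.call(rbind, lapply(sales_cols, function(col) {
  data.frame(
    `Product Name` = wide[["Product Name"]],
    Region = wide$Region,
    row_id = seq_len(nrow(wide)),       # remembers each row's original position
    Month = sub(" Sales", "", col),     # "Jan Sales" becomes "Jan"
    Sales = wide[[col]],
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}))

# Calendar-ordered factor levels, so sorting follows Jan, Feb, Mar, ...
long$Month <- factor(long$Month, levels = month.abb)

# Same row order as pivot_longer(): each original row, then its months
long <- long[order(long$row_id, long$Month), ]
long$row_id <- NULL
rownames(long) <- NULL
print(long)
```

With the factor in place, `order()` and plotting functions follow the level order instead of the alphabet; on the full data, the same effect could be had by converting `Month` with `factor(Month, levels = month.abb)` after the pivot.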
With the data in long format, I compare the regional performance of the products by totaling sales across all products and months for each region. The summary shows that the East region has the highest total sales (5175), followed by South (3675), while North has the lowest (2175).
total_by_region <- product_sales_long %>%
  group_by(Region) %>%
  summarize(TotalSales = sum(Sales)) %>%
  arrange(desc(TotalSales))
total_by_region
## # A tibble: 3 × 2
## Region TotalSales
## <chr> <dbl>
## 1 East 5175
## 2 South 3675
## 3 North 2175
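Because a sum can be taken in either order, the regional ranking can also be cross-checked directly from the wide table: total each row across its month columns, then add those row totals within each region. The base R sketch below does this with `rowSums()`, `split()` and `vapply()`. To stay within the values actually printed in the preview, it uses only the Jan–Apr columns, so its totals are partial sums and are smaller than the Jan–Jun figures in `total_by_region`. The data frame `wide` is a hypothetical, hand-copied stand-in for `product_sales`.

```r
# Jan–Apr values hand-copied from the printed preview (May and Jun omitted)
wide <- data.frame(
  Region = rep(c("North", "South", "East"), times = 3),
  Jan = c(100, 200, 300, 150, 250, 350, 50, 100, 150),
  Feb = c(110, 210, 310, 160, 260, 360, 55, 105, 155),
  Mar = c(120, 220, 320, 170, 270, 370, 60, 110, 160),
  Apr = c(130, 230, 330, 180, 280, 380, 65, 115, 165),
  stringsAsFactors = FALSE
)

# Step 1: total each row across the month columns
row_totals <- rowSums(wide[, c("Jan", "Feb", "Mar", "Apr")])

# Step 2: add the row totals within each region (named numeric vector)
by_region <- vapply(split(row_totals, wide$Region), sum, numeric(1))

# Step 3: largest first, mirroring arrange(desc(TotalSales))
by_region <- sort(by_region, decreasing = TRUE)
print(by_region)  # East 3350, South 2350, North 1350 (Jan–Apr only)
```

The ranking (East, then South, then North) matches the Jan–Jun summary, which is a quick sanity check that the pivot did not drop or duplicate any rows.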
The following bar chart demonstrates how the data can be visualized, and it can serve as an example for further analysis of other categories in this data frame.
ggplot(total_by_region, aes(x = reorder(Region, -TotalSales), y = TotalSales)) +
  geom_col() +
  labs(
    title = "Regional Performance",
    x = "Region",
    y = "Total Sales"
  )
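In the `ggplot()` call, `reorder(Region, -TotalSales)` is what puts the region with the highest total first: ggplot2 draws a discrete axis in the order of the factor levels, and a plain character column falls back to alphabetical order. The small base R check below shows the two level orders side by side, using the three totals from `total_by_region` typed in by hand (`totals`, `alphabetical` and `by_sales` are illustrative names).

```r
# Regional totals from the summary table, entered by hand
totals <- data.frame(
  Region = c("East", "South", "North"),
  TotalSales = c(5175, 3675, 2175),
  stringsAsFactors = FALSE
)

# Default factor levels are alphabetical; this is the axis order a bare
# character Region column would produce
alphabetical <- levels(factor(totals$Region))

# reorder() sorts the levels by -TotalSales, i.e. from largest to smallest total
by_sales <- levels(reorder(totals$Region, -totals$TotalSales))

print(alphabetical)  # East, North, South
print(by_sales)      # East, South, North
```

Negating `TotalSales` is a compact way to get descending order, because `reorder()` sorts its summary scores in ascending order by default.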
# Conclusion

Working with Barakat’s dataset, I loaded the wide table with the intention of pivoting it into a long format from the start. Tidying consisted of gathering the six month columns into a single `Month` column and shortening the month names by removing “Sales”, both for clarity. Finally, I summarized total sales per region in descending order and followed this with a bar chart to demonstrate how the tidied data can be visualized for further analysis of this data frame.