Loading Dataset

Barakat Adigun

This R Markdown document uses the dataset that Barakat Adigun shared on the discussion board. The following code loads and displays the dataset:

product_sales <- read_csv(
  "https://raw.githubusercontent.com/GullitNa/DATA607-Project2/main/BarakatAdigunProduct.csv")
## Rows: 9 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Product Name, Region
## dbl (6): Jan Sales, Feb Sales, Mar Sales, Apr Sales, May Sales, Jun Sales
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
product_sales
## # A tibble: 9 × 8
##   `Product Name` Region `Jan Sales` `Feb Sales` `Mar Sales` `Apr Sales`
##   <chr>          <chr>        <dbl>       <dbl>       <dbl>       <dbl>
## 1 Product A      North          100         110         120         130
## 2 Product A      South          200         210         220         230
## 3 Product A      East           300         310         320         330
## 4 Product B      North          150         160         170         180
## 5 Product B      South          250         260         270         280
## 6 Product B      East           350         360         370         380
## 7 Product C      North           50          55          60          65
## 8 Product C      South          100         105         110         115
## 9 Product C      East           150         155         160         165
## # ℹ 2 more variables: `May Sales` <dbl>, `Jun Sales` <dbl>

Transformation and Initial Thoughts

This dataset loads more cleanly than the other datasets I explore in this project, but there is still room for tidying. The monthly sales are spread across six wide columns, so I will pivot them into a long data frame with a single Month column and a single Sales column. I will also strip the word “Sales” from the month names for clarity.

product_sales_long <- product_sales %>%
  pivot_longer(
    cols = ends_with("Sales"),
    names_to = "Month",
    values_to = "Sales"
  ) %>%
  mutate(Month = sub(" Sales", "", Month))
head(product_sales_long)
## # A tibble: 6 × 4
##   `Product Name` Region Month Sales
##   <chr>          <chr>  <chr> <dbl>
## 1 Product A      North  Jan     100
## 2 Product A      North  Feb     110
## 3 Product A      North  Mar     120
## 4 Product A      North  Apr     130
## 5 Product A      North  May     140
## 6 Product A      North  Jun     150
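
As a quick sanity check on the reshape, the sketch below reruns the same pivot on a one-row toy tibble and confirms that each wide row yields one long row per month column and that the total is unchanged. The names `toy_wide`, `toy_long`, and `n_month_cols` are hypothetical, and the row is the only one whose six monthly values are all visible in the printed output above. Converting `Month` to a factor is an optional extra that is not part of the original code; it assumes the month labels match the built-in `month.abb` abbreviations, so that later grouping or plotting follows calendar order rather than alphabetical order.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Toy input (hypothetical name): the Product A / North row from the output above
toy_wide <- tibble(
  `Product Name` = "Product A",
  Region = "North",
  `Jan Sales` = 100, `Feb Sales` = 110, `Mar Sales` = 120,
  `Apr Sales` = 130, `May Sales` = 140, `Jun Sales` = 150
)

# Same pivot as above, plus an optional calendar-ordered factor for Month
toy_long <- toy_wide %>%
  pivot_longer(
    cols = ends_with("Sales"),
    names_to = "Month",
    values_to = "Sales"
  ) %>%
  mutate(
    Month = sub(" Sales", "", Month),
    Month = factor(Month, levels = month.abb[1:6])
  )

# Each wide row should become one long row per month column
n_month_cols <- sum(endsWith(names(toy_wide), "Sales"))
stopifnot(nrow(toy_long) == nrow(toy_wide) * n_month_cols)

# The reshape should leave the overall total unchanged
stopifnot(sum(toy_long$Sales) == sum(unlist(select(toy_wide, ends_with("Sales")))))
```

Applied to the full dataset, the same check would expect 9 rows × 6 month columns = 54 rows in `product_sales_long`.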

Analysis

Compare Regional Performance

After the transformation, I compare the regional performance of the products. Based on the summary below, the East region had the highest total sales at 5175, the South region was second at 3675, and the North region had the lowest at 2175.

total_by_region <- product_sales_long %>%
  group_by(Region) %>%
  summarize(TotalSales = sum(Sales)) %>%
  arrange(desc(TotalSales))
total_by_region
## # A tibble: 3 × 2
##   Region TotalSales
##   <chr>       <dbl>
## 1 East         5175
## 2 South        3675
## 3 North        2175
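
To put the regional totals in proportion, the sketch below expresses each region's total as a percentage of the overall total. It is an illustrative addition rather than part of the original analysis: `region_totals`, `region_share`, and `SharePct` are hypothetical names, and the three totals are copied from the summary output above so that the block runs on its own.

```r
library(dplyr)
library(tibble)

# Regional totals copied from the summary output above (hypothetical object name)
region_totals <- tibble(
  Region = c("East", "South", "North"),
  TotalSales = c(5175, 3675, 2175)
)

# Each region's share of the overall total, as a percentage
region_share <- region_totals %>%
  mutate(SharePct = round(100 * TotalSales / sum(TotalSales), 1))

region_share
```

With an overall total of 11025, this works out to roughly 46.9% for East, 33.3% for South, and 19.7% for North.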

Plotting

This section demonstrates how to visualize the data. The bar chart below serves as an example that can be adapted to analyze other categories in this dataset.

ggplot(total_by_region, aes(x = reorder(Region, -TotalSales), y = TotalSales)) +
  geom_col() +
  labs(
    title = "Regional Performance",
    x = "Region",
    y = "Total Sales"
  )

Conclusion

With Barakat’s dataset, I loaded the wide data with the intention of pivoting it into a long format from the start. The goal was to tidy the data by collecting the month columns into a single column and shortening the month names for clarity. I then summarized total sales per region in descending order and finished with a bar graph to demonstrate how analyses of this dataset can be visualized.