Data Loading

Barakat Adigun’s data

This R Markdown document uses the dataset that Barakat Adigun shared on the discussion board. The following code loads and displays the dataset:

library(tidyverse)  # provides read_csv(), %>%, pivot_longer(), and ggplot()
product_sales <- read_csv(
  "https://raw.githubusercontent.com/GullitNa/DATA607-Project2/main/BarakatAdigunProduct.csv")

## Rows: 9 Columns: 8
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Product Name, Region
## dbl (6): Jan Sales, Feb Sales, Mar Sales, Apr Sales, May Sales, Jun Sales
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

product_sales

## # A tibble: 9 × 8
##   `Product Name` Region `Jan Sales` `Feb Sales` `Mar Sales` `Apr Sales`
##   <chr>          <chr>        <dbl>       <dbl>       <dbl>       <dbl>
## 1 Product A      North          100         110         120         130
## 2 Product A      South          200         210         220         230
## 3 Product A      East           300         310         320         330
## 4 Product B      North          150         160         170         180
## 5 Product B      South          250         260         270         280
## 6 Product B      East           350         360         370         380
## 7 Product C      North           50          55          60          65
## 8 Product C      South          100         105         110         115
## 9 Product C      East           150         155         160         165
## # ℹ 2 more variables: `May Sales` <dbl>, `Jun Sales` <dbl>

Data Cleaning

This dataset loads more cleanly than the other datasets I explore in this project. The only cleaning it needs is removing the word “Sales” from the month labels for clarity. I pivot the data first so that the month labels sit in a single column, and then strip the suffix in the Data Transformation step.

product_sales_long <- product_sales %>%
  pivot_longer(
    cols = ends_with("Sales"),
    names_to = "Month",
    values_to = "Sales"
  )
product_sales_long

## # A tibble: 54 × 4
##    `Product Name` Region Month     Sales
##    <chr>          <chr>  <chr>     <dbl>
##  1 Product A      North  Jan Sales   100
##  2 Product A      North  Feb Sales   110
##  3 Product A      North  Mar Sales   120
##  4 Product A      North  Apr Sales   130
##  5 Product A      North  May Sales   140
##  6 Product A      North  Jun Sales   150
##  7 Product A      South  Jan Sales   200
##  8 Product A      South  Feb Sales   210
##  9 Product A      South  Mar Sales   220
## 10 Product A      South  Apr Sales   230
## # ℹ 44 more rows
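As a possible shortcut, the suffix could also be dropped during the pivot itself. The sketch below is not part of the original workflow: it rebuilds two rows of the table inline (Jan through Apr values copied from the output above) and uses the `names_pattern` argument of `pivot_longer()` to keep only the month abbreviation. The names `sales_sample` and `sales_sample_long` are illustrative.

```r
library(tidyr)
library(tibble)

# Two rows copied from the printed table (Jan-Apr only), for illustration
sales_sample <- tribble(
  ~`Product Name`, ~Region, ~`Jan Sales`, ~`Feb Sales`, ~`Mar Sales`, ~`Apr Sales`,
  "Product A",     "North", 100,          110,          120,          130,
  "Product A",     "South", 200,          210,          220,          230
)

# names_pattern keeps only the text captured before " Sales"
sales_sample_long <- pivot_longer(
  sales_sample,
  cols = ends_with("Sales"),
  names_to = "Month",
  names_pattern = "(.*) Sales",
  values_to = "Sales"
)
sales_sample_long
```

With this approach the Month column would hold Jan, Feb, Mar, and Apr directly, so a separate `mutate()` step would not be required.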

Data Transformation

The pivot above moved the data from its wide structure into a long data frame, combining the six month columns into a single Month column, but there is still room for tidying. Now that the data is in long format, I remove the “ Sales” suffix from the Month values.

# Assign the result back so the cleaned month labels persist
product_sales_long <- product_sales_long %>%
  mutate(Month = sub(" Sales", "", Month))
product_sales_long

## # A tibble: 54 × 4
##    `Product Name` Region Month Sales
##    <chr>          <chr>  <chr> <dbl>
##  1 Product A      North  Jan     100
##  2 Product A      North  Feb     110
##  3 Product A      North  Mar     120
##  4 Product A      North  Apr     130
##  5 Product A      North  May     140
##  6 Product A      North  Jun     150
##  7 Product A      South  Jan     200
##  8 Product A      South  Feb     210
##  9 Product A      South  Mar     220
## 10 Product A      South  Apr     230
## # ℹ 44 more rows

head(product_sales_long)

## # A tibble: 6 × 4
##   `Product Name` Region Month Sales
##   <chr>          <chr>  <chr> <dbl>
## 1 Product A      North  Jan     100
## 2 Product A      North  Feb     110
## 3 Product A      North  Mar     120
## 4 Product A      North  Apr     130
## 5 Product A      North  May     140
## 6 Product A      North  Jun     150
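One further tidying step that may be worth considering: month labels stored as plain text sort alphabetically (Apr, Feb, Jan, ...) rather than chronologically. The sketch below, which is an optional addition and not part of the original workflow, converts the labels to a factor using base R's built-in `month.abb` vector. It uses the six Product A / North values shown above, and the name `north_a` is illustrative.

```r
library(dplyr)
library(tibble)

# Product A / North values copied from the long table above
north_a <- tibble(
  Month = c("Jan", "Feb", "Mar", "Apr", "May", "Jun"),
  Sales = c(100, 110, 120, 130, 140, 150)
)

# As plain text, the months sort alphabetically
sort(north_a$Month)

# month.abb is base R's vector of abbreviated month names (Jan to Dec);
# using its first six entries as levels puts the months in calendar order
north_a <- north_a %>%
  mutate(Month = factor(Month, levels = month.abb[1:6]))

levels(north_a$Month)

# Sorting now follows the calendar, so the latest month comes first here
arrange(north_a, desc(Month))
```

The same `mutate()` call could be applied to a Month column in any long table whose labels are three-letter abbreviations.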

Analysis

Summary

With the data transformed, I compare how the products performed in each region. Based on the summary below, the East region had the highest total sales at 5175, the South region was second at 3675, and the North region had the lowest at 2175.

total_by_region <- product_sales_long %>%
  group_by(Region) %>%
  summarize(TotalSales = sum(Sales)) %>%
  arrange(desc(TotalSales))
total_by_region

## # A tibble: 3 × 2
##   Region TotalSales
##   <chr>       <dbl>
## 1 East         5175
## 2 South        3675
## 3 North        2175
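To put the regional totals in proportion, each region's share of overall sales can be computed from the three totals alone. The sketch below copies the totals from the output above into a small tibble; the names `region_totals`, `region_share`, and `SharePct` are illustrative and do not appear in the original analysis.

```r
library(dplyr)
library(tibble)

# Regional totals copied from the summary output above
region_totals <- tibble(
  Region = c("East", "South", "North"),
  TotalSales = c(5175, 3675, 2175)
)

# Each region's share of overall sales, as a percentage rounded to one decimal
region_share <- region_totals %>%
  mutate(SharePct = round(100 * TotalSales / sum(TotalSales), 1))
region_share
```

The three totals sum to 11025, so East accounts for a little under half of all sales, South for about a third, and North for about a fifth.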

Plotting

This plot visualizes the regional totals. It also serves as an example for further analysis of other categories in this data frame.

ggplot(total_by_region, aes(x = reorder(Region, -TotalSales), y = TotalSales)) +
  geom_col() +
  labs(
    title = "Regional Performance",
    x = "Region",
    y = "Total Sales"
  )
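The same pattern could be extended to another category, for example by splitting each product's sales by region. The following is a minimal sketch under stated assumptions: it rebuilds the table inline using only the Jan through Apr values visible in the printed output (May and Jun are truncated there, so they are left out), and the shortened column names and the objects `sales_jan_apr`, `by_product_region`, and `p` are illustrative.

```r
library(dplyr)
library(tidyr)
library(tibble)
library(ggplot2)

# Jan-Apr values copied from the printed table; May and Jun are omitted
# because the printed output truncates them
sales_jan_apr <- tribble(
  ~Product,    ~Region, ~Jan, ~Feb, ~Mar, ~Apr,
  "Product A", "North",  100,  110,  120,  130,
  "Product A", "South",  200,  210,  220,  230,
  "Product A", "East",   300,  310,  320,  330,
  "Product B", "North",  150,  160,  170,  180,
  "Product B", "South",  250,  260,  270,  280,
  "Product B", "East",   350,  360,  370,  380,
  "Product C", "North",   50,   55,   60,   65,
  "Product C", "South",  100,  105,  110,  115,
  "Product C", "East",   150,  155,  160,  165
)

# Pivot to long format, then total the four months per product and region
by_product_region <- sales_jan_apr %>%
  pivot_longer(cols = Jan:Apr, names_to = "Month", values_to = "Sales") %>%
  group_by(Product, Region) %>%
  summarize(TotalSales = sum(Sales), .groups = "drop")

# Side-by-side bars: one group per product, one bar per region
p <- ggplot(by_product_region, aes(x = Product, y = TotalSales, fill = Region)) +
  geom_col(position = "dodge") +
  labs(
    title = "Jan-Apr Sales by Product and Region",
    x = "Product",
    y = "Total Sales (Jan-Apr)"
  )
p
```

Because only four of the six months are included, the bar heights here will not match the six-month regional totals reported in the summary.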

Conclusion

With Barakat’s dataset, I loaded the wide data with the intention of pivoting it into a long format from the start. The goal was to tidy the data by gathering the month columns into a single column and shortening the month labels for clarity. I then summarized total sales per region in descending order and finished with a bar graph to demonstrate how analyses of this data frame can be visualized.

DATA607 Project 2c

Gullit Navarrete

2025-03-09