Data Analytics Assignment

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.6
✔ forcats   1.0.1     ✔ stringr   1.6.0
✔ ggplot2   4.0.1     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.2.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)

Attaching package: 'janitor'

The following objects are masked from 'package:stats':

    chisq.test, fisher.test
library(furniture)
library(kableExtra)

Attaching package: 'kableExtra'

The following object is masked from 'package:dplyr':

    group_rows
#install.packages("janitor")
#install.packages("furniture") 

hogsmeade <- read_csv("hogsmeade_sales.csv")
Rows: 3000 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): Order ID, Order Date, Customer Name, Customer Type, Region, Deliver...
dbl (5): Sales, Profit, Discount, Unit Price, Units Sold

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
hogsmeade_clean <- hogsmeade %>%
  remove_empty("rows") %>%            
  remove_empty("cols") %>%
  clean_names()
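
To make the cleaning steps concrete, here is a minimal sketch on a small stand-in data frame. The column names "Order ID" and "Unit Price" come from the read_csv() message above; the values and the empty "Notes" column are hypothetical and only there to show what each step removes.

```r
library(dplyr)
library(janitor)

# Hypothetical three-row stand-in for the raw file (values are invented).
raw <- data.frame(
  `Order ID`   = c("A1", NA, "A3"),
  `Unit Price` = c(2.5, NA, 4.0),
  Notes        = c(NA, NA, NA),   # hypothetical column that is empty in every row
  check.names  = FALSE
)

cleaned <- raw %>%
  remove_empty("rows") %>%   # drops row 2, where every value is NA
  remove_empty("cols") %>%   # drops Notes, which is NA in every row
  clean_names()              # "Order ID" -> order_id, "Unit Price" -> unit_price

names(cleaned)
nrow(cleaned)
```

After clean_names() every column can be referred to in snake_case without backticks, which is why the plots below use names such as product_category and order_date.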

Introduction

The data below covers 3,000 sales records from a magical supplies store in Hogsmeade in Hogsworth. We analyse it to identify trends in sales.

ggplot(data = hogsmeade_clean) +
  geom_bar(mapping = aes(x = product_category, fill = product_category)) +
  scale_fill_manual(values = c("#800000", "#0b6623", "#663399", "#daa520", "#2b547e", "#808588", "black")) +
  labs(title = "Count of Products Per Category",
       x = "Type of Products",
       y = NULL)

The graph above counts the sales records in each product category. Brooms are the best-performing category by this count. Each category has its own colour so the bars are easy to compare.
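
The bar heights can also be read off as numbers. The sketch below uses a small hypothetical stand-in for hogsmeade_clean (the category labels and counts are invented) to show how count() gives the number of sales per category and each category's share.

```r
library(dplyr)

# Hypothetical stand-in for hogsmeade_clean; labels and counts are invented.
toy_sales <- tibble(
  product_category = c("Brooms", "Brooms", "Brooms", "Potions", "Potions", "Wands")
)

category_counts <- toy_sales %>%
  count(product_category, sort = TRUE, name = "n_sales") %>%  # one row per category, largest first
  mutate(share = n_sales / sum(n_sales))                      # proportion of all sales records

category_counts
```

Running the same pipeline on hogsmeade_clean would give the exact counts behind the chart.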

ggplot(data = hogsmeade_clean) +
  geom_point(mapping = aes(x = customer_type, y = region)) +
  facet_wrap(~ region, nrow = 2) +
  labs(title = "Which region consists of the most Customers",
       x = "Customer Type",
       y = "Region") +
  theme(
    plot.title = element_text(color = "#0b6623"),
    axis.title = element_text(color = "#0b6623"),
    axis.text  = element_text(color = "#0b6623"),
    strip.text = element_text(color = "#0b6623")
  )

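The graph above plots customer type against region, with one panel per region. It shows which customer types appear in each region. Because points for the same combination sit on top of each other, it does not show how many customers each region has, so counts are needed to see where the majority of your customers are based.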
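
One possible way to show where most customers are based is to count distinct customers per region and customer type, then draw the counts as bars. This is a sketch on hypothetical data: the names, customer types and region labels below are invented, and only the column names match hogsmeade_clean.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for hogsmeade_clean (all values are invented).
toy_sales <- tibble(
  customer_name = c("Ann", "Ann", "Bob", "Cat", "Dan", "Eve"),
  customer_type = c("Student", "Student", "Student", "Professor", "Student", "Student"),
  region        = c("North", "North", "North", "North", "South", "South")
)

customers_by_region <- toy_sales %>%
  group_by(region, customer_type) %>%
  summarise(
    n_customers = n_distinct(customer_name),  # each customer counted once
    n_orders    = n(),                        # every sales record counted
    .groups = "drop"
  )

# Bar heights now carry the counts that overlapping points cannot show.
region_plot <- ggplot(customers_by_region) +
  geom_col(mapping = aes(x = customer_type, y = n_customers)) +
  facet_wrap(~ region, nrow = 2) +
  labs(x = "Customer Type", y = "Number of Customers")
```

Counting distinct names rather than rows matters when the same customer places several orders.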

most_prof_cust <- hogsmeade_clean %>%
  group_by(customer_name) %>%
  summarise(sales_by_cust = sum(sales)) %>%
  arrange(desc(sales_by_cust)) %>%
  filter(sales_by_cust >= 1000)
knitr::kable(most_prof_cust, 
             format = "html",
             digits = c(0,2), 
             align = "lr", 
             col.names = c("Customer Name", "Sales"),
             caption = "Sales per Customer",
             table.attr = 'data-quarto-disable-processing = "true"') %>% 
  kable_styling(full_width = FALSE)
Sales per Customer

Customer Name        Sales
Donald Brown       1486.40
Mitchell Lee       1452.87
Sean Smith         1367.64
Hailey Johnson     1272.93
Robert Smith       1249.18
Anne Griffin       1191.06
David Weber        1163.18
Christina Jones    1083.61
Gary Smith         1048.62
Michael Martin     1031.23

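This table lists the customers whose total sales are at least €1,000, ordered from highest to lowest. Donald Brown has the highest total sales, at €1,486.40. Note that the ranking is based on sales, not profit, so it shows the biggest spenders rather than the most profitable customers.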
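
Ranking by profit instead of sales can give a different answer. The sketch below, on invented numbers, totals both sales and profit per customer and picks the top customer under each measure. The column names match hogsmeade_clean, but the customers and values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for hogsmeade_clean (values are invented).
toy_sales <- tibble(
  customer_name = c("Ann", "Ann", "Bob", "Cat"),
  sales         = c(600, 500, 1200, 300),
  profit        = c(50, 40, -20, 60)
)

by_customer <- toy_sales %>%
  group_by(customer_name) %>%
  summarise(
    total_sales  = sum(sales),
    total_profit = sum(profit),
    .groups = "drop"
  )

top_by_sales  <- by_customer %>% slice_max(total_sales, n = 1)   # biggest spender
top_by_profit <- by_customer %>% slice_max(total_profit, n = 1)  # most profitable
```

In this toy data the biggest spender makes a loss overall, which is the kind of case a sales-only table would hide.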

ggplot(data = hogsmeade_clean) + 
  geom_point(mapping = aes(x = product_category, y = product_name), colour = "#2b547e") +
  labs(title = "Which products are in each Category",
       x = "Product Category",
       y = "Product Type") +
  theme(
    plot.title = element_text(color = "#663399", size = 14, face = "bold")
  )

The above graph shows which product is in each category. Each dot represents an individual product, placed according to its category. This makes it easy to see how products are grouped and what items are included in each category.

ggplot(data = hogsmeade_clean) +
  geom_boxplot(mapping = aes(x = sales, y = region)) +
  labs(title = "Sales per Region",
       x = "Sales",
       y = "Region")

ggplot(data = hogsmeade_clean) +
  geom_point(mapping = aes(x = order_date, y = sales)) +
  geom_line(aes(x = order_date, y = sales)) +
  labs(title = "Sales Over Time",
       x = "Date",
       y = "Sales") + 
  theme(
    plot.title = element_text(color = "#daa520", size = 14, face = "bold"))

This line graph shows sales over time. Each order's sales value is plotted against its order date.
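
The read_csv() message lists Order Date among the chr columns, so ggplot treats each date as a separate category rather than a point on a timeline. A minimal sketch of one way to get a true time axis is to convert the column to a Date and total the sales per month. The date format is an assumption: the toy data uses year-month-day strings parsed with ymd(), and the real file may need dmy() or mdy() instead.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical stand-in for hogsmeade_clean; the date format is an assumption.
toy_sales <- tibble(
  order_date = c("2024-01-05", "2024-01-20", "2024-02-03", "2024-02-03"),
  sales      = c(10, 20, 5, 15)
)

monthly_sales <- toy_sales %>%
  mutate(order_date = ymd(order_date)) %>%               # text -> Date (swap the parser to match the file)
  group_by(month = floor_date(order_date, "month")) %>%  # first day of each order's month
  summarise(total_sales = sum(sales), .groups = "drop")

# With a Date on the x-axis, geom_line() connects the months in time order.
monthly_plot <- ggplot(monthly_sales, aes(x = month, y = total_sales)) +
  geom_line() +
  geom_point() +
  labs(x = "Month", y = "Total Sales")
```

Aggregating by month is a design choice: with 3,000 individual orders, a line through every single sale is hard to read, while monthly totals show the trend.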

ggplot(data = hogsmeade_clean) +
  geom_point(mapping = aes(x = sales, y = profit), color = "#663399") +
  labs(title = "Profit vs Sales",
       x = "Sales",
       y = "Profit")

This graph plots profit against sales. Each point is one transaction, with its sales value on the x-axis and its profit on the y-axis.
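
The relationship in the scatter plot can be summarised with a couple of numbers. This sketch, on invented values, computes the correlation between sales and profit and the share of transactions that made a loss. The choice of these two summaries is an assumption about what is useful here, not part of the original analysis.

```r
library(dplyr)

# Hypothetical stand-in for hogsmeade_clean (values are invented).
toy_sales <- tibble(
  sales  = c(100, 200, 300, 400),
  profit = c(10, 30, 20, -40)
)

profit_summary <- toy_sales %>%
  summarise(
    correlation       = cor(sales, profit),  # Pearson correlation, between -1 and 1
    share_loss_making = mean(profit < 0)     # proportion of transactions with negative profit
  )

profit_summary
```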

ggplot(data = hogsmeade_clean) +
  geom_point(mapping = aes(x = region, y = profit, colour = region)) +
  labs(title = "Profit Distribution by Region",
       x = "Region",
       y = "Profit") +
  theme(
    plot.title = element_text(color = "#daa520", size = 14, face = "bold"))

This chart shows how profit values are distributed across the three regions. Each dot represents a single transaction. The plot makes it easy to compare profit ranges between regions and see how spread out or concentrated the profits are in each area.
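
The spread described above can be put into numbers with a per-region summary. The sketch below uses invented values and hypothetical region labels. It reports the median, interquartile range, minimum and maximum profit for each region.

```r
library(dplyr)

# Hypothetical stand-in for hogsmeade_clean (regions and values are invented).
toy_sales <- tibble(
  region = c("North", "North", "North", "South", "South", "South"),
  profit = c(10, 20, 30, -5, 0, 50)
)

profit_by_region <- toy_sales %>%
  group_by(region) %>%
  summarise(
    median_profit = median(profit),  # typical transaction
    iqr_profit    = IQR(profit),     # spread of the middle half
    min_profit    = min(profit),
    max_profit    = max(profit),
    .groups = "drop"
  )

profit_by_region
```

A wide gap between the minimum and maximum with a small interquartile range would suggest a few extreme transactions rather than a generally spread-out region.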