Assignment

Question 1:

  • As a data analyst, you are assigned to analyze customer data for a retail company. The dataset includes customer IDs, the product category they frequently purchase, the number of purchases made in a year, and their total spending.

(i) Write R code to create a dataframe from the given customer data.

library(rsconnect)
df <- data.frame(
  CustomerID      = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
  ProductCategory = c("Electronics", "Groceries", "Clothing", "Groceries",
                     "Electronics",  "Clothing",  "Electronics", "Groceries",
                     "Clothing",     "Electronics"),
  Purchases       = c(12, 14, 15, 30, 18, 20, 25, 28, 22, 14),
  TotalSpending   = c(1200, 800, 600, 900, 1500, 700, 1800, 950, 620, 1400)
)

# Display dataframe
df
##    CustomerID ProductCategory Purchases TotalSpending
## 1         101     Electronics        12          1200
## 2         102       Groceries        14           800
## 3         103        Clothing        15           600
## 4         104       Groceries        30           900
## 5         105     Electronics        18          1500
## 6         106        Clothing        20           700
## 7         107     Electronics        25          1800
## 8         108       Groceries        28           950
## 9         109        Clothing        22           620
## 10        110     Electronics        14          1400

(ii) Calculate the average total spending and the median number of purchases

Average:

avg_spending <- mean(df$TotalSpending)
avg_spending
## [1] 1047

Median:

med_purchases <- median(df$Purchases)
med_purchases
## [1] 19

(iii) Total spending by “Groceries” customers

total_groceries_spending <- sum(df$TotalSpending[df$ProductCategory == "Groceries"])
total_groceries_spending
## [1] 2650

(iv) Filter dataframe to include customers with more than 20 purchases

df_over20 <- df[df$Purchases > 20, ]
df_over20
##   CustomerID ProductCategory Purchases TotalSpending
## 4        104       Groceries        30           900
## 7        107     Electronics        25          1800
## 8        108       Groceries        28           950
## 9        109        Clothing        22           620

(v) Extract rows where category = “Electronics” AND total spending > 1000

df_elec_high <- df[df$ProductCategory == "Electronics" & df$TotalSpending > 1000, ]
df_elec_high
##    CustomerID ProductCategory Purchases TotalSpending
## 1         101     Electronics        12          1200
## 5         105     Electronics        18          1500
## 7         107     Electronics        25          1800
## 10        110     Electronics        14          1400

(vi) Create a boxplot of Total Spending by Product Category

boxplot(
  TotalSpending ~ ProductCategory,
  data    = df,
  main    = "Total Spending by Product Category",
  xlab    = "Product Category",
  ylab    = "Total Spending (USD)"
)

Question 2:

You are given data on the monthly revenue (in dollars) of four departments (HR, IT, Marketing, and Finance) for five months (May, June, July, August, September). HR: 5000, 5200, 4900, 5100, 5300 IT: 4500, 4800, 4700, 4900, 5100 Marketing: 6000, 6100, 5900, 6300, 6500 Finance: 7000, 7100, 6900, 7300, 7500

(i) Create a matrix called departmentmatrix with rows = departments and columns = months

# Define each department’s revenue
HR        <- c(5000, 5200, 4900, 5100, 5300)
IT        <- c(4500, 4800, 4700, 4900, 5100)
Marketing <- c(6000, 6100, 5900, 6300, 6500)
Finance   <- c(7000, 7100, 6900, 7300, 7500)

# Stack them by row
departmentmatrix <- rbind(
  HR        = HR,
  IT        = IT,
  Marketing = Marketing,
  Finance   = Finance
)

# Assign column names (months)
colnames(departmentmatrix) <- c("May", "June", "July", "August", "September")

# Display matrix
departmentmatrix
##            May June July August September
## HR        5000 5200 4900   5100      5300
## IT        4500 4800 4700   4900      5100
## Marketing 6000 6100 5900   6300      6500
## Finance   7000 7100 6900   7300      7500

(ii) Ccalculate the total revenue for each department across all months.

total_by_department <- rowSums(departmentmatrix)
total_by_department
##        HR        IT Marketing   Finance 
##     25500     24000     30800     35800

(iii) Calculate the average revenue for each month.

average_by_month <- colMeans(departmentmatrix)
average_by_month
##       May      June      July    August September 
##      5625      5800      5600      5900      6100

(iv) Apply the incentive percentages (3%, 4%, 5%, 6%, 4%) to the revenue matrix

# Define incentive percentages (May to September)
incentive_percentages <- c(0.03, 0.04, 0.05, 0.06, 0.04)

# Create incentive matrix (replicate vector for each department)
incentive_matrix <- matrix(
  incentive_percentages, 
  nrow = nrow(departmentmatrix), 
  ncol = ncol(departmentmatrix), 
  byrow = TRUE
)

# Calculate incentive amounts (element-wise multiplication)
incentive_amounts <- departmentmatrix * incentive_matrix

# Total incentives per department (sum rows)
rowSums(incentive_amounts)
##        HR        IT Marketing   Finance 
##      1121      1060      1357      1577