This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
Note: this analysis was performed using the open-source software R and RStudio.
library(readr)
data <- read_csv('purchase_behavior_data.csv')
## Rows: 150 Columns: 9
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (6): Customer_ID, Gender, Region, Product_Category, Payment_Method, Loy...
## dbl (2): Age, Purchase_Amount
## date (1): Purchase_Date
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
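The message above suggests specifying the column types. A minimal sketch of how that might look, assuming the nine columns named in the output; `purchase_types` and `read_purchases()` are hypothetical names introduced here, and the date column is left to readr's default date parsing:

```r
library(readr)

# Column names and types are taken from the specification message and the
# tibble preview; the object and function names are illustrative only.
purchase_types <- cols(
  Customer_ID      = col_character(),
  Age              = col_double(),
  Gender           = col_character(),
  Region           = col_character(),
  Product_Category = col_character(),
  Purchase_Amount  = col_double(),
  Payment_Method   = col_character(),
  Loyalty_Program  = col_character(),
  Purchase_Date    = col_date()
)

# Reading with an explicit specification silences the message and makes
# the import fail loudly if a column no longer parses as expected.
read_purchases <- function(path) {
  read_csv(path, col_types = purchase_types)
}
```

With this in place, `data <- read_purchases('purchase_behavior_data.csv')` would replace the plain `read_csv()` call.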
head(data)
## # A tibble: 6 × 9
##   Customer_ID   Age Gender Region Product_Category Purchase_Amount
##   <chr>       <dbl> <chr>  <chr>  <chr>                      <dbl>
## 1 CUST0001       56 Male   East   Books                      148.
## 2 CUST0002       69 Other  West   Home Goods                  67.9
## 3 CUST0003       46 Other  South  Books                      351.
## 4 CUST0004       32 Female North  Clothing                   318.
## 5 CUST0005       60 Female East   Clothing                   440.
## 6 CUST0006       25 Male   North  Beauty                     370.
## # ℹ 3 more variables: Payment_Method <chr>, Loyalty_Program <chr>,
## #   Purchase_Date <date>
# Convert Product_Category to factor if it's not already
data$Product_Category <- as.factor(data$Product_Category)
# A formula with the factor on the left (`Product_Category ~ Purchase_Amount`) does not
# compare amounts across categories in `plot()`. Put the numeric variable on the left
# and use boxplot() or ggplot2 instead.
boxplot(Purchase_Amount ~ Product_Category, data = data,
        main = "Purchase Amount by Product Category",
        xlab = "Product Category", ylab = "Purchase Amount",
        col = "lightblue", las = 2)
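The boxplot shows the median and spread per category graphically; the same quantities can be tabulated. A rough sketch in base R, assuming a data frame with `Product_Category` and `Purchase_Amount` columns as above (`category_summary()` is a hypothetical helper, not part of the original analysis):

```r
# Row count, median, and interquartile range of Purchase_Amount for each
# Product_Category. Missing amounts are ignored for the median and IQR.
category_summary <- function(df) {
  by_category <- split(df$Purchase_Amount, df$Product_Category)
  data.frame(
    Product_Category = names(by_category),
    n      = unname(vapply(by_category, length, integer(1))),
    median = unname(vapply(by_category, median, numeric(1), na.rm = TRUE)),
    iqr    = unname(vapply(by_category, IQR, numeric(1), na.rm = TRUE)),
    row.names = NULL
  )
}
```

Calling `category_summary(data)` after the factor conversion should return one row per category, in the order of the factor levels.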
# Answers
library(ggplot2)
ggplot(data, aes(x = Purchase_Amount)) +
  geom_histogram(binwidth = 20, fill = "steelblue", color = "black") +
  labs(title = "Distribution of Purchase Amounts",
       x = "Purchase Amount", y = "Frequency")
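The histogram uses a fixed `binwidth = 20`. If a data-driven bin width is preferred, one common rule of thumb is the Freedman-Diaconis rule, 2 * IQR / n^(1/3). The sketch below is an alternative, not the choice made in this analysis, and `fd_binwidth()` is a hypothetical helper name:

```r
# Freedman-Diaconis bin width: 2 * IQR(x) / n^(1/3), computed on the
# non-missing values of x.
fd_binwidth <- function(x) {
  x <- x[!is.na(x)]
  2 * IQR(x) / length(x)^(1 / 3)
}
```

It could be passed straight to the plot, for example `geom_histogram(binwidth = fd_binwidth(data$Purchase_Amount))`, and compared against the fixed width of 20.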