Data Description

Column

  1. Transaction date: refers to the date when the transaction took place from April, 2021 to April 23 (2 years)
  2. Post date: refers to the date when the transaction was processed and posted to your account.
  3. Description: provides details about the transaction, such as the name of the merchant or service provider.
  4. Location: indicates where the transaction took place, and amount is the cost of the transaction.
  5. Category: can help you to organize and track your spending. such as groceries, utilities, entertainment, etc.
# A tibble: 6 × 6
  `Trans Date`        `Post Date`         Description   Location Amount Category
  <dttm>              <dttm>              <chr>         <chr>     <dbl> <chr>   
1 2021-04-30 00:00:00 2021-04-30 00:00:00 MEIJER # 068… MI        32.6  Gasoline
2 2021-05-01 00:00:00 2021-05-01 00:00:00 INSTACART SU… CA         9.99 Superma…
3 2021-05-02 00:00:00 2021-05-02 00:00:00 INSTACART 88… CA        19.3  Superma…
4 2021-05-08 00:00:00 2021-05-08 00:00:00 INSTACART 88… CA        25.1  Superma…
5 2021-05-09 00:00:00 2021-05-09 00:00:00 WWW.ELS.EDU … NJ       250    Educati…
6 2021-05-11 00:00:00 2021-05-11 00:00:00 TRANSCRIPT S… IL        10    Services

Column

The density plot gives us a good overview of the distribution of transaction amounts, which can be helpful in identifying any unusual transactions or spending patterns. We can observe that the majority of transactions fall in the range of $0-$200, with a small number of transactions above $500.

Density plot

Further Analysis page 1

Column

The monthly spending plot depicts credit card expenditure over the months. We can infer from the visualization that spending increased in the first quarter of 2023, with the highest spending occurring in March.

Monthly Spending

Column

The chart indicates that the highest spending occurred in the Restaurants and Groceries categories, with Utilities and Gas/Fuel coming in a close second.

Bar chart

Further Analysis page 2

Column

From the map, we can see that the states with the highest credit card spending are California, Texas, and New York, as they appear in the darkest shade of blue. On the other hand, the states with the lowest spending are in north East region as they appear in the lightest shade of blue.

Additionally, we can see that there is a general trend of higher spending in the eastern and western regions of the United States, with lower spending in the central region.

Map

Further Analysis page 3

Column

Overall, there seems to be a consistent upward trend in transaction volume over time, with occasional dips in volume.

There are a few significant spikes in transaction volume, such as around mid-2019 and mid-2020, which may indicate some special events or promotions that led to increased transactions during those periods.

There is a clear seasonal pattern in transaction volume, with higher volume during the holiday season (around November and December) and lower volume during the summer months (around June to August).

There may be some weekly patterns in transaction volume as well, with lower volume on weekends and higher volume during weekdays, although this is not as pronounced as the seasonal pattern.

Column

Conclusion

  1. Credit card spending tends to increase during the holiday season, particularly in the months of November and December.
  2. Overall, there seems to be an upward trend in credit card spending over time, with some seasonal fluctuations.
  3. The majority of credit card transactions tend to be for amounts under $100.
  4. There are a few large transactions throughout the year that greatly affect the total spending for a given month.
  5. There is some variability in the number of credit card transactions per month, but this variability is not as pronounced as the variability in the total spending amounts.
---
title: " Credit Card Usage Visulization"
author: " Madhav Shukla"
date: "04-25-2023"
output: 
  flexdashboard::flex_dashboard:
    orientation: columns
    vertical_layout: fill
    source: embed
---

```{r setup, include=FALSE}
library(flexdashboard)
library(readxl)
raw_data <- read_excel("C:/Users/madha/OneDrive/Desktop/To Do/Schools/Harrisburg university/MS-Analytics/512-Data Visulization/Credit card Spend Analyser project/Discover-AllAvailable-20230412.xlsx")

# Load the necessary libraries
library(tidyverse)
library(lubridate)
library(ggplot2)
library(scales)
library(dplyr)


###Data cleaning 

# Filter out negative amounts that are payments from bank
clean_data <- raw_data %>%
  filter(Amount >= 0)

# arrange dates in MM-dd-yyyy format
library(lubridate)
library(dplyr)

# clean_data <- clean_data %>%
#   select(`Trans Date`, `Post Date`, Description, Location, Amount, Category)
# 
# clean_data <- clean_data %>% 
#   mutate(Trans_Date = as.Date(`Trans Date`, format = "%m/%d/%Y")) %>%
#   mutate(Post_Date = as.Date(`Post Date`, format = "%m/%d/%Y"))

```

Data Description
===================================== 

Column {data-width=300}
-----------------------------------------------------------------------

1. Transaction date: refers to the date when the transaction took place from April, 2021 to April 23 (2 years)
2. Post date: refers to the date when the transaction was processed and posted to your account.
3. Description: provides details about the transaction, such as the name of the merchant or service provider.
4. Location: indicates where the transaction took place, and amount is the cost of the transaction.
5. Category: can help you to organize and track your spending. such as groceries, utilities, entertainment, etc.\n


```{r}
head(clean_data, )
```

Column {data-width=350}
-----------------------------------------------------------------------
The density plot gives us a good overview of the distribution of transaction amounts, which can be helpful in identifying any unusual transactions or spending patterns. We can observe that the majority of transactions fall in the range of $0-$200, with a small number of transactions above $500.

### Density plot

```{r, warning=FALSE}
library(ggplot2)
library(scales)
library(plotly)

ggplotly(
  ggplot(clean_data, aes(x = Amount)) +
    geom_density(aes(y = after_stat(count)), fill = "black", alpha = 0.05) +
    scale_x_continuous(labels = dollar_format(prefix = "$")) +
    labs(title = "Density Plot of Credit Card Transaction Amounts",
         x = "Transaction Amount",
         y = "Number of Occurrences")
)
# 
# ggplot(clean_data, aes(x = Amount)) +
#   geom_density(aes(y = after_stat(count)), fill = "blue", alpha = 0.05) +
#   scale_x_continuous(labels = dollar_format(prefix = "$")) +
#   labs(title = "Density Plot of Credit Card Transaction Amounts",
#        x = "Transaction Amount",
#        y = "Number of Occurrences")
```

Further Analysis page 1
===================================== 

Column {data-width=500}
-----------------------------------------------------------------------

The monthly spending plot depicts credit card expenditure over the months. We can infer from the visualization that spending increased in the first quarter of 2023, with the highest spending occurring in March.

### Monthly Spending

```{r}

library(dplyr)
library(ggplot2)
library(scales)
library(plotly)

monthly_spending <- clean_data %>%
  mutate(month = format(`Trans Date`, "%b %Y")) %>%
  group_by(month) %>%
  summarize(total_spending = sum(Amount)) %>%
  arrange(as.Date(paste0("01 ", month), format = "%d %b %Y"))

p <- ggplot(monthly_spending, aes(x = month, y = total_spending)) +
  geom_bar(stat = "identity", fill = "blue", alpha = 0.5) +
  scale_x_discrete(limits = monthly_spending$month) +
  scale_y_continuous(labels = dollar_format(prefix = "$")) +
  labs(title = "Monthly Credit Card Spending",
       x = "Month",
       y = "Total Spending") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

ggplotly(p, tooltip = c("x", "y")) 
```


Column {data-width=500}
-----------------------------------------------------------------------

The chart indicates that the highest spending occurred in the Restaurants and Groceries categories, with Utilities and Gas/Fuel coming in a close second.

### Bar chart 


```{r}
# Create the Bar Chart
barchart <- clean_data %>%
  group_by(Category) %>%
  summarise(Total_Spending = sum(Amount),
            Average_Spending = mean(Amount),
            Number_of_Transactions = n_distinct(Description)) %>%
  mutate(Total_Spending = round(Total_Spending, 2),
         Average_Spending = round(Average_Spending, 2),
         Percent_of_Total = (Total_Spending / sum(Total_Spending)) * 100) %>%
  arrange(Total_Spending) %>%
  mutate(Category = factor(Category, levels = Category))

# Visualize the data
ggplot(barchart, aes(x = Category, y = Total_Spending, fill = Percent_of_Total)) +
  geom_bar(stat = "identity") +
  scale_fill_gradient(low = "lightblue", high = "darkblue") +
  geom_text(aes(label = paste0("$", Total_Spending, "\n", round(Percent_of_Total, 1), "%")),
            vjust = -0.5, size = 3, color = "black") +
  labs(title = "Credit Card Usage by Category",
       x = "Category",
       y = "Total Spending") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
```



Further Analysis page 2
===================================== 


Column {data-width=550}
-----------------------------------------------------------------------

From the map, we can see that the states with the highest credit card spending are California, Texas, and New York, as they appear in the darkest shade of blue. On the other hand, the states with the lowest spending are in north East region as they appear in the lightest shade of blue.

Additionally, we can see that there is a general trend of higher spending in the eastern and western regions of the United States, with lower spending in the central region.


### Map


```{r, warning=FALSE}
# library(ggplot2)
# library(maps)
# library(scales)
# 
# # Generate map of the United States
# states <- map_data("state")
# # Create lookup table for state abbreviations
# state_abbr <- data.frame(state = tolower(state.name), abb = state.abb)
# # Join state and state_abbr data frames
# state_join <- left_join(states, state_abbr, by = c("region" = "state"))
# 
# # Merge with clean_data by location
# map_data <- left_join(state_join, clean_data, by = c("abb" = "Location"))
# 
# # Merge with spending data by state name
# state_spend <- map_data %>%
#   group_by(region) %>%
#   summarize(Total_Spending = sum(Amount))
# 
# # Join with map_data by state name
# Final <- left_join(map_data, state_spend, by = c("region" = "region"))

# # Create map visualization
# ggplot(Final, aes(x = long, y = lat,  fill = Total_Spending)) +
#   geom_polygon(color = "white", linewidth = 0.1) +
#   coord_map() +
#   scale_fill_gradient(low = "lightblue", high = "darkblue", name = "Total Spending", label = dollar_format()) +
#   ggtitle("Total Credit Card Spending by State") +
#   theme(plot.title = element_text(hjust = 0.5))
library(ggplot2)
library(maps)
library(plotly)

# Generate map of the United States
states <- map_data("state")
# Create lookup table for state abbreviations
state_abbr <- data.frame(state = tolower(state.name), abb = state.abb)
# Join state and state_abbr data frames
state_join <- left_join(states, state_abbr, by = c("region" = "state"))

# Merge with clean_data by location
map_data <- left_join(state_join, clean_data, by = c("abb" = "Location"))

# Create plot with transaction points
p <- ggplot(map_data, aes(x = long, y = lat, color = Amount)) +
  geom_point() +
  coord_map() +
  scale_color_gradient(low = "lightblue", high = "darkblue", name = "Transaction Amount") +
  ggtitle("Credit Card Transactions by Location") +
  theme(plot.title = element_text(hjust = 0.5))

# Make plot interactive
ggplotly(p)

```






```{r, include=FALSE}
# library(caret)
# 
# 
# 
# # Split data into train and test sets
# set.seed(123)
# trainIndex <- createDataPartition(clean_data$Category, p = 0.8, list = FALSE)
# train_data <- clean_data[trainIndex, ]
# test_data <- clean_data[-trainIndex, ]
# 
# 
# library(randomForest)
# 
# # Train a random forest model using the train dataset
# rf_model <- train(Category ~ Amount + Description, data = train_data, method = "rf")
# 
# # Use the model to predict Category for the test dataset
# rf_pred <- predict(rf_model, newdata = test_data)
# 
# # Re-level the "Description" variable in new data
# new_data$Description <- factor(new_data$Description, levels = levels(clean_data$Description))
# 
# # Predict using the model and new data
# new_data$Prediction <- predict(model, newdata = new_data)
# 
# # Check the predicted values
# new_data$Prediction
# 
# # Evaluate the accuracy of the model using the test dataset
# confusionMatrix(rf_pred, test_data$Category)

```


<!-- Column {data-width=650} -->
<!-- ----------------------------------------------------------------------- -->
<!-- ```{r, include=FALSE}  -->

<!-- # Create a linear regression model -->
<!-- model <- lm(Amount ~ `Trans Date`, data = clean_data) -->

<!-- # Summarize the model -->
<!-- summary(model) -->

<!-- # Generate predictions for the next 30 days -->
<!-- new_dates <- seq(as.Date("2023-04-24"), by = "day", length.out = 30) -->
<!-- new_data <- data.frame(`Trans Date` = new_dates) -->
<!-- new_data$Amount <- predict(model, newdata = new_data) -->

<!-- # Print the predicted data -->
<!-- new_data -->

<!-- ``` -->


<!-- ### Expense Prediction Linear Model  -->

<!-- ```{r} -->
<!-- # Create a line chart of predicted transaction amounts -->
<!-- library(ggplot2) -->

<!-- ggplot(new_data, aes(x = `Trans Date`, y = Amount)) + -->
<!--   geom_line(color = "blue") + -->
<!--   scale_x_date(date_breaks = "1 week", date_labels = "%b %d") + -->
<!--   scale_y_continuous(labels = dollar_format(prefix = "$")) + -->
<!--   labs(title = "Predicted Credit Card Expenses for the Next 30 Days", -->
<!--        x = "Transaction Date", -->
<!--        y = "Transaction Amount") -->

<!-- ``` -->

Further Analysis page 3
===================================== 


Column {data-width=550}
-----------------------------------------------------------------------
Overall, there seems to be a consistent upward trend in transaction volume over time, with occasional dips in volume.

There are a few significant spikes in transaction volume, such as around mid-2019 and mid-2020, which may indicate some special events or promotions that led to increased transactions during those periods.

There is a clear seasonal pattern in transaction volume, with higher volume during the holiday season (around November and December) and lower volume during the summer months (around June to August).

There may be some weekly patterns in transaction volume as well, with lower volume on weekends and higher volume during weekdays, although this is not as pronounced as the seasonal pattern.


Column {data-width=550}
-----------------------------------------------------------------------

```{r}
# Load necessary libraries
library(dplyr)
library(ggplot2)
library(plotly)

# Create a summary of transactions by date
transactions_by_date <- clean_data %>%
  group_by(`Trans Date`) %>%
  summarize(trans_count = n())

# Create a line plot of transaction volume over time
ggplotly(
  ggplot(transactions_by_date, aes(x = `Trans Date`, y = trans_count)) +
    geom_line(color = "blue") +
    labs(title = "Transaction Volume over Time",
         x = "Transaction Date",
         y = "Transaction Count")
)

```

<!-- # ```{r} -->
<!-- # # Load required libraries -->
<!-- # library(dplyr) -->
<!-- # library(stringr) -->
<!-- #  -->
<!-- # # Create Merchant column using regular expression regex -->
<!-- # raw_data <- raw_data %>% -->
<!-- #   mutate(Merchant = str_extract(Description, "(?<=at\\s).*")) -->
<!-- #  -->
<!-- # # Group by Merchant and calculate total spending -->
<!-- # merchant_totals <- raw_data %>% -->
<!-- #   group_by(Merchant) %>% -->
<!-- #   summarize(Total_Spending = sum(Amount)) -->
<!-- #  -->
<!-- # # Sort by descending order of Total_Spending -->
<!-- # merchant_totals <- merchant_totals %>% -->
<!-- #   arrange(desc(Total_Spending)) -->
<!-- #  -->
<!-- # # Visualize top 10 Merchants by Total_Spending -->
<!-- # library(ggplot2) -->
<!-- #  -->
<!-- # ggplot(data = head(merchant_totals, 10), aes(x = reorder(Merchant, Total_Spending), y = Total_Spending)) + -->
<!-- #   geom_bar(stat = "identity") + -->
<!-- #   coord_flip() + -->
<!-- #   labs(x = "Merchant", y = "Total Spending", title = "Top 10 Merchants by Total Spending") -->
<!-- #  -->
<!-- # ``` -->

<!-- # ```{r} -->
<!-- # # Load required libraries -->
<!-- # library(dplyr) -->
<!-- # library(stringr) -->
<!-- #  -->
<!-- # # Create Credit_Balance column -->
<!-- # raw_data <- raw_data %>% -->
<!-- #   mutate(Date = as.Date(`Post_Date`, format = "%Y-%m-%d"), -->
<!-- #          Credit_Balance = Credit_Limit - cumsum(ifelse(str_detect(Description, "PAYMENT"), -1*Amount, Amount))) -->
<!-- #  -->
<!-- #  -->
<!-- # # Create Credit_Utilization column -->
<!-- # raw_data <- raw_data %>% -->
<!-- #   mutate(Credit_Utilization = round(Credit_Balance / Credit_Limit * 100, 2)) -->
<!-- #  -->
<!-- # # Group by Trans_Date and calculate average Credit_Utilization -->
<!-- # credit_utilization_by_date <- raw_data %>% -->
<!-- #   group_by(Trans_Date) %>% -->
<!-- #   summarize(Average_Credit_Utilization = mean(Credit_Utilization)) -->
<!-- #  -->
<!-- # # Visualize Credit_Utilization over time -->
<!-- # library(ggplot2) -->
<!-- #  -->
<!-- # ggplot(data = credit_utilization_by_date, aes(x = Trans_Date, y = Average_Credit_Utilization)) + -->
<!-- #   geom_line() + -->
<!-- #   labs(x = "Transaction Date", y = "Credit Utilization (%)", title = "Credit Utilization Over Time") -->

<!-- ``` -->


Conclusion
===================================== 
1. Credit card spending tends to increase during the holiday season, particularly in the months of November and December.
2. Overall, there seems to be an upward trend in credit card spending over time, with some seasonal fluctuations.
3. The majority of credit card transactions tend to be for amounts under $100.
4. There are a few large transactions throughout the year that greatly affect the total spending for a given month.
5. There is some variability in the number of credit card transactions per month, but this variability is not as pronounced as the variability in the total spending amounts.