Overview of the Credit Card transaction Analysis

CC_Transactions

CC_Transactions


Lab for ANLY512: Data Visualization

I was struggling keeping up with payments for my Discover Creditcard and wanted to analyse where I overspent my budget and will be analysing transactions for past 24month to pin point the high spending month and narrowing down the data with help of Vizualisation.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected 1) What is the total spending by month in 2016 and 2017? 2) Which category costed the most in 2016 and 2017? 3) For Merchandise, what is the spending by month in 2016 and 2017? 4) In February and December, what is the spending in each category? 5) Compared to 2016,2017 and 2018 what Are the differences?

Data preparation

Sample raw data in 2016
Transaction_date Posted_date Description Amount Category Transaction_Month Transaction_Year
2016-05-11 2016-05-13 HERTZ CAR RENTAL PORTLAND OR 43.37 Travel/ Entertainment 5 2016
2016-05-18 2016-05-18 SEARS.COM 09300 W DES MOINES IA 320.74 Merchandise 5 2016
2016-05-21 2016-05-21 AMAZON MKTPLACE PMTS AMZN.COM/BILLWAK771B007XAL 14.99 Merchandise 5 2016
2016-05-24 2016-05-24 INTERNET PAYMENT - THANK YOU -1000.00 Payments and Credits 5 2016
2016-05-27 2016-05-27 THAI LUANG RESTAUR HERNDON VA 15.00 Restaurants 5 2016
2016-05-30 2016-05-30 SHELL 57543025803 THORNBURG VA 45.00 Gasoline 5 2016
2016-06-02 2016-06-02 IL TESORO WASHINGTON DC 15.09 Restaurants 6 2016
2016-06-04 2016-06-04 PHARAOHS CAFE STERLING VA 15.90 Restaurants 6 2016
2016-06-04 2016-06-04 SEARS.COM 09300 W DES MOINES IA 4.21 Merchandise 6 2016
2016-06-04 2016-06-04 SHALIMAR RESTAURAN STERLING VA 11.65 Restaurants 6 2016
2016-06-04 2016-06-04 TGI FRIDAYS #0098 HERNDON VA 45.00 Restaurants 6 2016
2016-06-05 2016-06-05 HAIR CUTTERY #1850 HERNDON VA 21.00 Services 6 2016
2016-06-06 2016-06-06 FAN MAE WOR60031739579 HERNDON VA 7.92 Restaurants 6 2016
2016-06-06 2016-06-06 PAYPAL *CARDCASH CARDCA 402-935-7733 CA 72.25 Services 6 2016
2016-06-09 2016-06-09 BESTBUYCOM6738060 888-237289 MN 243.75 Merchandise 6 2016
Sample raw data in 2017
Transaction_date Posted_date Description Amount Category Transaction_Month Transaction_Year
169 2017-01-04 2017-01-04 AMAZON MKTPLACE PMTS AMAZON MKTPLAWA7BOEJV871UC 2.34 Merchandise 1 2017
170 2017-01-06 2017-01-06 APL*APPLEONLINESTOREUS 800-676-2775 CA40052NNN W460948243F 586.74 Merchandise 1 2017
171 2017-01-13 2017-01-13 INTERNET PAYMENT - THANK YOU -250.00 Payments and Credits 1 2017
172 2017-01-13 2017-01-13 SUNOCO 0154747001 HERNDON VA 26.00 Gasoline 1 2017
173 2017-01-14 2017-01-14 ADNAN HALAL MEAT & GROCE GLEN ALLEN VA 11.76 Supermarkets 1 2017
174 2017-01-14 2017-01-14 PETER CHANG CAFE GLEN ALLEN VA 33.00 Restaurants 1 2017
175 2017-02-03 2017-02-03 AMAZON.COM AMAZON.COM WA5WW5FM314PJ 56.31 Merchandise 2 2017
176 2017-02-07 2017-02-07 BESTBUYCOM795935003257 888-BESTBUY MN 49.95 Merchandise 2 2017
177 2017-02-07 2017-02-07 INTERNET PAYMENT - THANK YOU -620.16 Payments and Credits 2 2017
178 2017-02-08 2017-02-08 AMAZON MKTPLACE PMTS AMZN.COM/BILLWAVUG0FWPP53K 12.90 Merchandise 2 2017
179 2017-02-13 2017-02-13 CASHBACK BONUS REDEMPTION PYMT/STMT CRDT -30.39 Awards and Rebate Credits 2 2017
180 2017-02-14 2017-02-14 APL*APPLEONLINESTOREUS 800-676-2775 CA40057D0X W465439806F 538.00 Merchandise 2 2017
181 2017-02-22 2017-02-22 AMAZON MKTPLACE PMTS AMZN.COM/BILLWAE353MEL1C0D 30.94 Merchandise 2 2017
182 2017-02-22 2017-02-22 INTERNET PAYMENT - THANK YOU -570.46 Payments and Credits 2 2017
183 2017-02-23 2017-02-23 APL*APPLEONLINESTOREUS 800-676-2775 CA400584O9 B489167402H 783.00 Merchandise 2 2017

Q1: What is the total spending by month in 2016 and 2017?


Since Month is categorical and Amount is continous, and I was aimed to compare the total mounth in each month, bar chart was used to summarize the data. I summarized the total amount spent in each month and usd bar chart display the numbers. From the plot we could find that, average spending over month is around 800 dollars. Total spending in February and December are the most.

Q2: Which category costed the most in 2016 and 2017?


Due to the same reason, bar chart was used here. The name of each category is long and can not be displayed at the bottom, so a rotated bar chart was created. Here We see the category I spend the most money is Merchandise, arount $10000, which is much higher than the other categories. The second category is Travel/Entertainemnt, around $2600.

Q3: For Merchandise, what is the spending by month in 2016 and 2017?


We see that for most months, the spending on Merchandise was less than 1000 dollars. However, in Feb’2017 and Dec’2017, it is 4 times higher than the other months.

Q4: In march and November, what is the spending in each category?


From the plot we can see that, still, the most spending is Merchandise. Besides, in November, the second spending is Travel/Entertainment. I checked detailed information. The auto insurance was renewed in March and Thanksgiving holiday was in November, which caused the majority of the spending.

Q5: Compared to 2017, what is the difference


Since the purpose was to compare the spending between 2016,2017 and 2018, a grouped bar chart was used. From the plot we can see that, overall spending in 2017 is much less than that in 2016, especially Merchandise. The main reason is that during 2016, the most purchases were made by discover card.

---
title: "ANLY 512 - Dashboarding Lab"
subtitle: "Credit Card Transactions"
author: "Nikhil Siddavanahalli Prakash"
date: "`r Sys.Date()`"
output: 
  flexdashboard::flex_dashboard:
    storyboard: true
    social: menu
    source: embed
    orientation: columns
    vertical_layout: fill
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
if(!require(flexdashboard))
{install.packages("flexdashboard") 
  library(flexdashboard)}
if(!require(knitr))
{install.packages("knitr") 
  library(knitr)}
if(!require(ggplot2))
{install.packages("ggplot2") 
  library(ggplot2)}
if(!require(tidyverse))
{install.packages("tidyverse") 
  library(tidyverse)}
if(!require(dplyr))
{install.packages("dplyr") 
  library(dplyr)}
if(!require(zoo))
{install.packages("zoo") 
  library(zoo)}
if(!require(lubridate))
{install.packages("lubridate") 
  library(lubridate)}
if(!require(readxl))
{install.packages("readxl") 
  library(readxl)}
```
###Overview of the Credit Card transaction Analysis
![CC_Transactions](/Users/nprak/Desktop/Logo.png)

***
Lab for ANLY512: Data Visualization

I was struggling keeping up with payments for my Discover Creditcard and wanted to analyse where I overspent my budget and will be analysing transactions for past 24month to pin point the high spending month and narrowing down the 
data with help of Vizualisation.

The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected 
1) What is the total spending by month in 2016 and 2017?
2) Which category costed the most in 2016 and 2017?
3) For Merchandise, what is the spending by month in 2016 and 2017?
4) In February and December, what is the spending in each category?
5) Compared to 2016,2017 and 2018 what Are the differences?

### Data preparation

```{r}
Discover_AllAvailable_20180514 <- read_excel("C:/Users/nprak/Desktop/Harrisburg Courses/ANLY_512/Discover-AllAvailable-20180514.xlsx")
View(Discover_AllAvailable_20180514)
ANLY_512<- Discover_AllAvailable_20180514
ANLY_512<- data.frame(cbind(ANLY_512,month(ANLY_512$`Trans. date`),year(ANLY_512$`Trans. date`)))
colnames(ANLY_512) = c("Transaction_date","Posted_date","Description","Amount","Category","Transaction_Month","Transaction_Year")
ANLY_512_2016 <- subset(ANLY_512, ANLY_512$Transaction_Year=='2016')
kable(ANLY_512_2016[1:15,], caption="Sample raw data in 2016")
ANLY_512_2017 <- subset(ANLY_512, ANLY_512$Transaction_Year=='2017')
kable(ANLY_512_2017[1:15,], caption="Sample raw data in 2017")
```

***

- There are 7 variables for analysis:

- Tran. Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent	
- Category
- Year: 2016, 2017 and 2018


  
###Q1: What is the total spending by month in 2016 and 2017?

```{r}
# plot

Plot_2016<- ggplot(ANLY_512_2016, aes(x = ANLY_512_2016$Transaction_Month, y=ANLY_512_2016$Amount)) + 
     geom_bar(stat = "identity", fill = "Red")+
     scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
     labs(title = "Spending per Month for 2016", x = "Month", y = "Amount") +
     theme_minimal()
Plot_2016
Plot_2017<- ggplot(ANLY_512_2017, aes(x = ANLY_512_2017$Transaction_Month, y=ANLY_512_2017$Amount)) + 
     geom_bar(stat = "identity", fill = "Blue")+
     scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
     labs(title = "Spending per Month for 2017", x = "Month", y = "Amount") +
     theme_minimal()
Plot_2017
```

***
Since Month is categorical and Amount is continous, and I was aimed to compare the total mounth in each month, bar chart was used to summarize the data.
I summarized the total amount spent in each month and usd bar chart display the numbers.
From the plot we could find that, average spending over month is around 800 dollars. Total spending in February and December are the most.

###Q2: Which category costed the most in 2016 and 2017? 

```{r}
library(ggplot2)
ggplot(ANLY_512, aes(x = ANLY_512$Category, y=ANLY_512$Amount, fill=ANLY_512$Category)) + 
     geom_bar(stat = "identity", fill = "Blue")+
     labs(title = "Spending per Category", x = "Category", y = "Amount") +
     coord_flip()
```

***
Due to the same reason, bar chart was used here. The name of each category is long and can not be displayed at the bottom, so a rotated bar chart was created.
Here We see the category I spend the most money is Merchandise, arount $10000, which is much higher than the other categories. 
The second category is Travel/Entertainemnt, around $2600.

###Q3: For Merchandise, what is the spending by month in 2016 and 2017?

```{r}
 Merchandise_16<- subset(ANLY_512_2016, ANLY_512_2016$Category=='Merchandise')
 Merchandise_17<- subset(ANLY_512_2017, ANLY_512_2017$Category=='Merchandise')
 Merchandise_16_Plot<- ggplot(Merchandise_16, aes(x = Merchandise_16$Transaction_Month, y=Merchandise_16$Amount)) + 
     geom_bar(stat = "identity", fill = "Red")+
     scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
     labs(title = "Spending per Month for 2016", x = "Month", y = "Amount") +
     theme_minimal()
 Merchandise_16_Plot
 Merchandise_17_Plot<- ggplot(Merchandise_17, aes(x = Merchandise_17$Transaction_Month, y=Merchandise_17$Amount)) + 
     geom_bar(stat = "identity", fill = "Blue")+
     scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
     labs(title = "Spending per Month for 2017", x = "Month", y = "Amount") +
     theme_minimal()
 Merchandise_17_Plot
```

***
We see that for most months, the spending on Merchandise was less than 1000 dollars. However, in Feb'2017 and Dec'2017, it is 4 times higher than the other months.

###Q4: In march and November, what is the spending in each category?

```{r}
 Merchandise_17_Feb<- subset(ANLY_512_2017, ANLY_512_2017$Transaction_Month=='2')
 Merchandise_17_Dec<- subset(ANLY_512_2017, ANLY_512_2017$Transaction_Month=='12')
 Merchandise_17_Feb_Plot <-ggplot(Merchandise_17_Feb, aes(x = Merchandise_17_Feb$Category, y=Merchandise_17_Feb$Amount, fill=Merchandise_17_Feb$Category)) + 
     geom_bar(stat = "identity", fill = "Blue")+
     labs(title = "Spending per Category", x = "Category", y = "Amount") +
     coord_flip()
 Merchandise_17_Feb_Plot
 Merchandise_17_Dec_Plot <-ggplot(Merchandise_17_Dec, aes(x = Merchandise_17_Dec$Category, y=Merchandise_17_Dec$Amount, fill=Merchandise_17_Dec$Category)) + 
          geom_bar(stat = "identity", fill = "Gold")+
          labs(title = "Spending per Category", x = "Category", y = "Amount") +
          coord_flip()
 Merchandise_17_Dec_Plot
```


***
From the plot we can see that, still, the most spending is Merchandise. Besides, in November, the second spending is Travel/Entertainment.
I checked detailed information. The auto insurance was renewed in March and Thanksgiving holiday was in November, which caused the majority of the spending.

###Q5: Compared to 2017, what is the difference

```{r}
ggplot(data=ANLY_512, aes(x=ANLY_512$Category, y=ANLY_512$Amount)) +   
     geom_bar(aes(fill = factor(ANLY_512$Transaction_Year)), position = "dodge", stat="identity") +
     labs(title = "Spending per Category", x = "Category", y = "Amount") +
     theme(legend.position = "Right") +
     scale_fill_discrete(name="Year") +
     theme(legend.position = "bottom") +
     coord_flip()
```


***
Since the purpose was to compare the spending between 2016,2017 and 2018, a grouped bar chart was used.
From the plot we can see that, overall spending in 2017 is much less than that in 2016, especially Merchandise. 
The main reason is that during 2016, the most purchases were made by discover card.