CC_Transactions
Lab for ANLY512: Data Visualization
I was struggling keeping up with payments for my Discover Creditcard and wanted to analyse where I overspent my budget and will be analysing transactions for past 24month to pin point the high spending month and narrowing down the data with help of Vizualisation.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected 1) What is the total spending by month in 2016 and 2017? 2) Which category costed the most in 2016 and 2017? 3) For Merchandise, what is the spending by month in 2016 and 2017? 4) In February and December, what is the spending in each category? 5) Compared to 2016,2017 and 2018 what Are the differences?
| Transaction_date | Posted_date | Description | Amount | Category | Transaction_Month | Transaction_Year |
|---|---|---|---|---|---|---|
| 2016-05-11 | 2016-05-13 | HERTZ CAR RENTAL PORTLAND OR | 43.37 | Travel/ Entertainment | 5 | 2016 |
| 2016-05-18 | 2016-05-18 | SEARS.COM 09300 W DES MOINES IA | 320.74 | Merchandise | 5 | 2016 |
| 2016-05-21 | 2016-05-21 | AMAZON MKTPLACE PMTS AMZN.COM/BILLWAK771B007XAL | 14.99 | Merchandise | 5 | 2016 |
| 2016-05-24 | 2016-05-24 | INTERNET PAYMENT - THANK YOU | -1000.00 | Payments and Credits | 5 | 2016 |
| 2016-05-27 | 2016-05-27 | THAI LUANG RESTAUR HERNDON VA | 15.00 | Restaurants | 5 | 2016 |
| 2016-05-30 | 2016-05-30 | SHELL 57543025803 THORNBURG VA | 45.00 | Gasoline | 5 | 2016 |
| 2016-06-02 | 2016-06-02 | IL TESORO WASHINGTON DC | 15.09 | Restaurants | 6 | 2016 |
| 2016-06-04 | 2016-06-04 | PHARAOHS CAFE STERLING VA | 15.90 | Restaurants | 6 | 2016 |
| 2016-06-04 | 2016-06-04 | SEARS.COM 09300 W DES MOINES IA | 4.21 | Merchandise | 6 | 2016 |
| 2016-06-04 | 2016-06-04 | SHALIMAR RESTAURAN STERLING VA | 11.65 | Restaurants | 6 | 2016 |
| 2016-06-04 | 2016-06-04 | TGI FRIDAYS #0098 HERNDON VA | 45.00 | Restaurants | 6 | 2016 |
| 2016-06-05 | 2016-06-05 | HAIR CUTTERY #1850 HERNDON VA | 21.00 | Services | 6 | 2016 |
| 2016-06-06 | 2016-06-06 | FAN MAE WOR60031739579 HERNDON VA | 7.92 | Restaurants | 6 | 2016 |
| 2016-06-06 | 2016-06-06 | PAYPAL *CARDCASH CARDCA 402-935-7733 CA | 72.25 | Services | 6 | 2016 |
| 2016-06-09 | 2016-06-09 | BESTBUYCOM6738060 888-237289 MN | 243.75 | Merchandise | 6 | 2016 |
| Transaction_date | Posted_date | Description | Amount | Category | Transaction_Month | Transaction_Year | |
|---|---|---|---|---|---|---|---|
| 169 | 2017-01-04 | 2017-01-04 | AMAZON MKTPLACE PMTS AMAZON MKTPLAWA7BOEJV871UC | 2.34 | Merchandise | 1 | 2017 |
| 170 | 2017-01-06 | 2017-01-06 | APL*APPLEONLINESTOREUS 800-676-2775 CA40052NNN W460948243F | 586.74 | Merchandise | 1 | 2017 |
| 171 | 2017-01-13 | 2017-01-13 | INTERNET PAYMENT - THANK YOU | -250.00 | Payments and Credits | 1 | 2017 |
| 172 | 2017-01-13 | 2017-01-13 | SUNOCO 0154747001 HERNDON VA | 26.00 | Gasoline | 1 | 2017 |
| 173 | 2017-01-14 | 2017-01-14 | ADNAN HALAL MEAT & GROCE GLEN ALLEN VA | 11.76 | Supermarkets | 1 | 2017 |
| 174 | 2017-01-14 | 2017-01-14 | PETER CHANG CAFE GLEN ALLEN VA | 33.00 | Restaurants | 1 | 2017 |
| 175 | 2017-02-03 | 2017-02-03 | AMAZON.COM AMAZON.COM WA5WW5FM314PJ | 56.31 | Merchandise | 2 | 2017 |
| 176 | 2017-02-07 | 2017-02-07 | BESTBUYCOM795935003257 888-BESTBUY MN | 49.95 | Merchandise | 2 | 2017 |
| 177 | 2017-02-07 | 2017-02-07 | INTERNET PAYMENT - THANK YOU | -620.16 | Payments and Credits | 2 | 2017 |
| 178 | 2017-02-08 | 2017-02-08 | AMAZON MKTPLACE PMTS AMZN.COM/BILLWAVUG0FWPP53K | 12.90 | Merchandise | 2 | 2017 |
| 179 | 2017-02-13 | 2017-02-13 | CASHBACK BONUS REDEMPTION PYMT/STMT CRDT | -30.39 | Awards and Rebate Credits | 2 | 2017 |
| 180 | 2017-02-14 | 2017-02-14 | APL*APPLEONLINESTOREUS 800-676-2775 CA40057D0X W465439806F | 538.00 | Merchandise | 2 | 2017 |
| 181 | 2017-02-22 | 2017-02-22 | AMAZON MKTPLACE PMTS AMZN.COM/BILLWAE353MEL1C0D | 30.94 | Merchandise | 2 | 2017 |
| 182 | 2017-02-22 | 2017-02-22 | INTERNET PAYMENT - THANK YOU | -570.46 | Payments and Credits | 2 | 2017 |
| 183 | 2017-02-23 | 2017-02-23 | APL*APPLEONLINESTOREUS 800-676-2775 CA400584O9 B489167402H | 783.00 | Merchandise | 2 | 2017 |
There are 7 variables for analysis:
Year: 2016, 2017 and 2018
Since Month is categorical and Amount is continous, and I was aimed to compare the total mounth in each month, bar chart was used to summarize the data. I summarized the total amount spent in each month and usd bar chart display the numbers. From the plot we could find that, average spending over month is around 800 dollars. Total spending in February and December are the most.
Due to the same reason, bar chart was used here. The name of each category is long and can not be displayed at the bottom, so a rotated bar chart was created. Here We see the category I spend the most money is Merchandise, arount $10000, which is much higher than the other categories. The second category is Travel/Entertainemnt, around $2600.
We see that for most months, the spending on Merchandise was less than 1000 dollars. However, in Feb’2017 and Dec’2017, it is 4 times higher than the other months.
From the plot we can see that, still, the most spending is Merchandise. Besides, in November, the second spending is Travel/Entertainment. I checked detailed information. The auto insurance was renewed in March and Thanksgiving holiday was in November, which caused the majority of the spending.
Since the purpose was to compare the spending between 2016,2017 and 2018, a grouped bar chart was used. From the plot we can see that, overall spending in 2017 is much less than that in 2016, especially Merchandise. The main reason is that during 2016, the most purchases were made by discover card.
---
title: "ANLY 512 - Dashboarding Lab"
subtitle: "Credit Card Transactions"
author: "Nikhil Siddavanahalli Prakash"
date: "`r Sys.Date()`"
output:
flexdashboard::flex_dashboard:
storyboard: true
social: menu
source: embed
orientation: columns
vertical_layout: fill
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
if(!require(flexdashboard))
{install.packages("flexdashboard")
library(flexdashboard)}
if(!require(knitr))
{install.packages("knitr")
library(knitr)}
if(!require(ggplot2))
{install.packages("ggplot2")
library(ggplot2)}
if(!require(tidyverse))
{install.packages("tidyverse")
library(tidyverse)}
if(!require(dplyr))
{install.packages("dplyr")
library(dplyr)}
if(!require(zoo))
{install.packages("zoo")
library(zoo)}
if(!require(lubridate))
{install.packages("lubridate")
library(lubridate)}
if(!require(readxl))
{install.packages("readxl")
library(readxl)}
```
###Overview of the Credit Card transaction Analysis

***
Lab for ANLY512: Data Visualization
I was struggling keeping up with payments for my Discover Creditcard and wanted to analyse where I overspent my budget and will be analysing transactions for past 24month to pin point the high spending month and narrowing down the
data with help of Vizualisation.
The goal of the project is to collect, analyze and visualize the data using the tools and methods covered in class. Additionally, using the data-driven approach, I will create a summary which answers the following questions based on the data collected
1) What is the total spending by month in 2016 and 2017?
2) Which category costed the most in 2016 and 2017?
3) For Merchandise, what is the spending by month in 2016 and 2017?
4) In February and December, what is the spending in each category?
5) Compared to 2016,2017 and 2018 what Are the differences?
### Data preparation
```{r}
Discover_AllAvailable_20180514 <- read_excel("C:/Users/nprak/Desktop/Harrisburg Courses/ANLY_512/Discover-AllAvailable-20180514.xlsx")
View(Discover_AllAvailable_20180514)
ANLY_512<- Discover_AllAvailable_20180514
ANLY_512<- data.frame(cbind(ANLY_512,month(ANLY_512$`Trans. date`),year(ANLY_512$`Trans. date`)))
colnames(ANLY_512) = c("Transaction_date","Posted_date","Description","Amount","Category","Transaction_Month","Transaction_Year")
ANLY_512_2016 <- subset(ANLY_512, ANLY_512$Transaction_Year=='2016')
kable(ANLY_512_2016[1:15,], caption="Sample raw data in 2016")
ANLY_512_2017 <- subset(ANLY_512, ANLY_512$Transaction_Year=='2017')
kable(ANLY_512_2017[1:15,], caption="Sample raw data in 2017")
```
***
- There are 7 variables for analysis:
- Tran. Date: Transaction Date
- Post Date
- Month
- Description: Detailed Description of purchases
- Amount: The total amount spent
- Category
- Year: 2016, 2017 and 2018
###Q1: What is the total spending by month in 2016 and 2017?
```{r}
# plot
Plot_2016<- ggplot(ANLY_512_2016, aes(x = ANLY_512_2016$Transaction_Month, y=ANLY_512_2016$Amount)) +
geom_bar(stat = "identity", fill = "Red")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Spending per Month for 2016", x = "Month", y = "Amount") +
theme_minimal()
Plot_2016
Plot_2017<- ggplot(ANLY_512_2017, aes(x = ANLY_512_2017$Transaction_Month, y=ANLY_512_2017$Amount)) +
geom_bar(stat = "identity", fill = "Blue")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Spending per Month for 2017", x = "Month", y = "Amount") +
theme_minimal()
Plot_2017
```
***
Since Month is categorical and Amount is continous, and I was aimed to compare the total mounth in each month, bar chart was used to summarize the data.
I summarized the total amount spent in each month and usd bar chart display the numbers.
From the plot we could find that, average spending over month is around 800 dollars. Total spending in February and December are the most.
###Q2: Which category costed the most in 2016 and 2017?
```{r}
library(ggplot2)
ggplot(ANLY_512, aes(x = ANLY_512$Category, y=ANLY_512$Amount, fill=ANLY_512$Category)) +
geom_bar(stat = "identity", fill = "Blue")+
labs(title = "Spending per Category", x = "Category", y = "Amount") +
coord_flip()
```
***
Due to the same reason, bar chart was used here. The name of each category is long and can not be displayed at the bottom, so a rotated bar chart was created.
Here We see the category I spend the most money is Merchandise, arount $10000, which is much higher than the other categories.
The second category is Travel/Entertainemnt, around $2600.
###Q3: For Merchandise, what is the spending by month in 2016 and 2017?
```{r}
Merchandise_16<- subset(ANLY_512_2016, ANLY_512_2016$Category=='Merchandise')
Merchandise_17<- subset(ANLY_512_2017, ANLY_512_2017$Category=='Merchandise')
Merchandise_16_Plot<- ggplot(Merchandise_16, aes(x = Merchandise_16$Transaction_Month, y=Merchandise_16$Amount)) +
geom_bar(stat = "identity", fill = "Red")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Spending per Month for 2016", x = "Month", y = "Amount") +
theme_minimal()
Merchandise_16_Plot
Merchandise_17_Plot<- ggplot(Merchandise_17, aes(x = Merchandise_17$Transaction_Month, y=Merchandise_17$Amount)) +
geom_bar(stat = "identity", fill = "Blue")+
scale_x_discrete(limits=c("Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec")) +
labs(title = "Spending per Month for 2017", x = "Month", y = "Amount") +
theme_minimal()
Merchandise_17_Plot
```
***
We see that for most months, the spending on Merchandise was less than 1000 dollars. However, in Feb'2017 and Dec'2017, it is 4 times higher than the other months.
###Q4: In march and November, what is the spending in each category?
```{r}
Merchandise_17_Feb<- subset(ANLY_512_2017, ANLY_512_2017$Transaction_Month=='2')
Merchandise_17_Dec<- subset(ANLY_512_2017, ANLY_512_2017$Transaction_Month=='12')
Merchandise_17_Feb_Plot <-ggplot(Merchandise_17_Feb, aes(x = Merchandise_17_Feb$Category, y=Merchandise_17_Feb$Amount, fill=Merchandise_17_Feb$Category)) +
geom_bar(stat = "identity", fill = "Blue")+
labs(title = "Spending per Category", x = "Category", y = "Amount") +
coord_flip()
Merchandise_17_Feb_Plot
Merchandise_17_Dec_Plot <-ggplot(Merchandise_17_Dec, aes(x = Merchandise_17_Dec$Category, y=Merchandise_17_Dec$Amount, fill=Merchandise_17_Dec$Category)) +
geom_bar(stat = "identity", fill = "Gold")+
labs(title = "Spending per Category", x = "Category", y = "Amount") +
coord_flip()
Merchandise_17_Dec_Plot
```
***
From the plot we can see that, still, the most spending is Merchandise. Besides, in November, the second spending is Travel/Entertainment.
I checked detailed information. The auto insurance was renewed in March and Thanksgiving holiday was in November, which caused the majority of the spending.
###Q5: Compared to 2017, what is the difference
```{r}
ggplot(data=ANLY_512, aes(x=ANLY_512$Category, y=ANLY_512$Amount)) +
geom_bar(aes(fill = factor(ANLY_512$Transaction_Year)), position = "dodge", stat="identity") +
labs(title = "Spending per Category", x = "Category", y = "Amount") +
theme(legend.position = "Right") +
scale_fill_discrete(name="Year") +
theme(legend.position = "bottom") +
coord_flip()
```
***
Since the purpose was to compare the spending between 2016,2017 and 2018, a grouped bar chart was used.
From the plot we can see that, overall spending in 2017 is much less than that in 2016, especially Merchandise.
The main reason is that during 2016, the most purchases were made by discover card.