Introduction

This project presents an in-depth analysis of UPI (Unified Payments Interface) transactions using R programming. The objective is to explore customer transaction behavior, payment trends, and banking performance using real-world financial data.

With the rapid growth of digital payments in India, UPI has become one of the most widely used payment systems. This analysis focuses on understanding how users interact with UPI platforms by studying transaction amounts, frequency, time patterns, merchant activity, and demographic insights.

The dataset contains approximately 20,000 transactions with features such as transaction date and time, amount, payment mode, city, customer age, gender, bank details, and transaction status.

Through data cleaning, feature engineering, and visualization techniques using ggplot2 and dplyr, this project uncovers meaningful business insights such as peak transaction hours, top-performing merchants, customer spending behavior, and regional transaction trends.

The final output includes multiple visualizations and an interactive plot to help better understand digital payment behavior and support data-driven decision-making.
library(readxl)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(scales)
library(hms)
## 
## Attaching package: 'hms'
## The following object is masked from 'package:lubridate':
## 
##     hms
library(tidyr)

#2.Import Dataset

upi <- read_excel("UPI Transactions.xlsx")

#3. Data Cleaning

str(upi)
## tibble [20,000 × 20] (S3: tbl_df/tbl/data.frame)
##  $ TransactionID        : chr [1:20000] "TXN00001" "TXN00002" "TXN00003" "TXN00004" ...
##  $ TransactionDate      : POSIXct[1:20000], format: "2024-02-02" "2024-03-03" ...
##  $ Amount               : num [1:20000] 272 1065 144 613 743 ...
##  $ BankNameSent         : chr [1:20000] "SBI Bank" "ICICI Bank" "Axis Bank" "HDFC Bank" ...
##  $ BankNameReceived     : chr [1:20000] "HDFC Bank" "SBI Bank" "Axis Bank" "ICICI Bank" ...
##  $ RemainingBalance     : num [1:20000] 5557 9753 7597 2328 1137 ...
##  $ City                 : chr [1:20000] "Delhi" "Bangalore" "Hyderabad" "Mumbai" ...
##  $ Gender               : chr [1:20000] "Female" "Male" "Female" "Male" ...
##  $ TransactionType      : chr [1:20000] "Transfer" "Payment" "Transfer" "Payment" ...
##  $ Status               : chr [1:20000] "Success" "Success" "Success" "Success" ...
##  $ TransactionTime      : POSIXct[1:20000], format: "1899-12-31 17:12:14" "1899-12-31 11:15:02" ...
##  $ DeviceType           : chr [1:20000] "Tablet" "Laptop" "Mobile" "Tablet" ...
##  $ PaymentMethod        : chr [1:20000] "Phone Number" "QR Code" "UPI ID" "Phone Number" ...
##  $ MerchantName         : chr [1:20000] "Amazon" "Zomato" "Swiggy" "IRCTC" ...
##  $ Purpose              : chr [1:20000] "Food" "Travel" "Bill Payment" "Others" ...
##  $ CustomerAge          : num [1:20000] 21 22 23 24 25 26 27 28 29 30 ...
##  $ PaymentMode          : chr [1:20000] "Scheduled" "Instant" "Scheduled" "Instant" ...
##  $ Currency             : chr [1:20000] "USD" "EUR" "GBP" "INR" ...
##  $ CustomerAccountNumber: num [1:20000] 1.23e+11 1.23e+11 1.23e+11 1.23e+11 1.23e+11 ...
##  $ MerchantAccountNumber: num [1:20000] 9.88e+11 9.88e+11 9.88e+11 9.88e+11 9.88e+11 ...
summary(upi)
##    TransactionID   TransactionDate                   Amount       
##  Length   :20000   Min.   :2024-01-01 00:00:00   Min.   :   0.05  
##  N.unique :20000   1st Qu.:2024-04-02 00:00:00   1st Qu.: 498.07  
##  N.blank  :    0   Median :2024-06-30 00:00:00   Median : 983.39  
##  Min.nchar:    8   Mean   :2024-06-30 19:36:37   Mean   : 993.61  
##  Max.nchar:    8   3rd Qu.:2024-09-27 00:00:00   3rd Qu.:1494.38  
##                    Max.   :2024-12-30 00:00:00   Max.   :1999.87  
##     BankNameSent    BankNameReceived RemainingBalance         City      
##  Length   :20000   Length   :20000   Min.   :   0.53   Length   :20000  
##  N.unique :    4   N.unique :    4   1st Qu.:2528.49   N.unique :    4  
##  N.blank  :    0   N.blank  :    0   Median :5024.52   N.blank  :    0  
##  Min.nchar:    8   Min.nchar:    8   Mean   :5020.54   Min.nchar:    5  
##  Max.nchar:   10   Max.nchar:   10   3rd Qu.:7531.24   Max.nchar:    9  
##                                      Max.   :9999.47                    
##        Gender       TransactionType        Status     
##  Length   :20000   Length   :20000   Length   :20000  
##  N.unique :    2   N.unique :    2   N.unique :    2  
##  N.blank  :    0   N.blank  :    0   N.blank  :    0  
##  Min.nchar:    4   Min.nchar:    7   Min.nchar:    6  
##  Max.nchar:    6   Max.nchar:    8   Max.nchar:    7  
##                                                       
##  TransactionTime                   DeviceType      PaymentMethod  
##  Min.   :1899-12-31 00:00:01   Length   :20000   Length   :20000  
##  1st Qu.:1899-12-31 05:51:04   N.unique :    3   N.unique :    3  
##  Median :1899-12-31 11:51:29   N.blank  :    0   N.blank  :    0  
##  Mean   :1899-12-31 11:54:57   Min.nchar:    6   Min.nchar:    6  
##  3rd Qu.:1899-12-31 17:54:33   Max.nchar:    6   Max.nchar:   12  
##  Max.   :1899-12-31 23:59:54                                      
##     MerchantName        Purpose       CustomerAge       PaymentMode   
##  Length   :20000   Length   :20000   Min.   :20.00   Length   :20000  
##  N.unique :    5   N.unique :    5   1st Qu.:29.75   N.unique :    2  
##  N.blank  :    0   N.blank  :    0   Median :39.50   N.blank  :    0  
##  Min.nchar:    5   Min.nchar:    4   Mean   :39.50   Min.nchar:    7  
##  Max.nchar:    8   Max.nchar:   12   3rd Qu.:49.25   Max.nchar:    9  
##                                      Max.   :59.00                    
##       Currency     CustomerAccountNumber MerchantAccountNumber
##  Length   :20000   Min.   :1.235e+11     Min.   :9.877e+11    
##  N.unique :    4   1st Qu.:1.235e+11     1st Qu.:9.877e+11    
##  N.blank  :    0   Median :1.235e+11     Median :9.877e+11    
##  Min.nchar:    3   Mean   :1.235e+11     Mean   :9.877e+11    
##  Max.nchar:    3   3rd Qu.:1.235e+11     3rd Qu.:9.877e+11    
##                    Max.   :1.235e+11     Max.   :9.877e+11
# Convert Date & Time
upi$TransactionDate <- as.Date(upi$TransactionDate)
upi$TransactionTime <- as_hms(upi$TransactionTime)

# Remove duplicates
upi <- distinct(upi)

# Missing values check
colSums(is.na(upi))
##         TransactionID       TransactionDate                Amount 
##                     0                     0                     0 
##          BankNameSent      BankNameReceived      RemainingBalance 
##                     0                     0                     0 
##                  City                Gender       TransactionType 
##                     0                     0                     0 
##                Status       TransactionTime            DeviceType 
##                     0                     0                     0 
##         PaymentMethod          MerchantName               Purpose 
##                     0                     0                     0 
##           CustomerAge           PaymentMode              Currency 
##                     0                     0                     0 
## CustomerAccountNumber MerchantAccountNumber 
##                     0                     0
#4. Feature Engineering

upi <- upi %>%
  mutate(
    Month = month(TransactionDate, label = TRUE),
    Hour = hour(TransactionTime),
    
    AgeGroup = case_when(
      CustomerAge < 25 ~ "18-24",
      CustomerAge < 35 ~ "25-34",
      CustomerAge < 45 ~ "35-44",
      TRUE ~ "45+"
    )
  )

### 8 Professional Visualizations ###

# Visualization 1 — Monthly Transaction Trend

# Type: Bar Chart

ggplot(upi, aes(Month)) +
  geom_bar(fill="#1f77b4") +
  labs(
    title="Monthly UPI Transaction Volume",
    x="Month",
    y="Number of Transactions"
  ) +
  theme_minimal()

# Visualization 2 — Distribution of Transaction Amounts

# Type: Histogram

ggplot(upi, aes(Amount)) +
  geom_histogram(fill="#2ca02c", bins=30) +
  labs(
    title="Distribution of UPI Transaction Amounts",
    x="Transaction Amount",
    y="Frequency"
  ) +
  theme_minimal()

# Visualization 3 — Transaction Amount by Payment Mode

# Type: Box Plot

ggplot(upi, aes(PaymentMode, Amount, fill=PaymentMode)) +
  geom_boxplot() +
  labs(
    title="Transaction Amount by Payment Mode",
    x="Payment Mode",
    y="Amount"
  ) +
  theme_minimal()

# Visualization 4 — Peak Transaction Hours

# Type: Line Chart

upi %>%
  count(Hour) %>%
  ggplot(aes(Hour, n)) +
  geom_line(color="red", linewidth=1.2) +
  geom_point() +
  labs(
    title="Peak UPI Transaction Hours",
    x="Hour of Day",
    y="Transactions"
  ) +
  theme_minimal()

# Visualization 5 — Transaction Status by Bank

# Type: Stacked Bar Chart

upi %>%
  count(BankNameSent, Status) %>%
  ggplot(aes(BankNameSent, n, fill=Status)) +
  geom_col() +
  coord_flip() +
  labs(
    title="Transaction Status by Sending Bank",
    x="Bank",
    y="Transactions"
  ) +
  theme_minimal()

# Visualization 6 — Merchant vs Revenue

# Type: Horizontal Bar Chart

upi %>%
  group_by(MerchantName) %>%
  summarise(TotalRevenue=sum(Amount)) %>%
  slice_max(TotalRevenue, n=10) %>%
  ggplot(aes(reorder(MerchantName, TotalRevenue), TotalRevenue)) +
  geom_col(fill="purple") +
  coord_flip() +
  labs(
    title="Top 10 Merchants by Revenue",
    x="Merchant",
    y="Revenue"
  ) +
  theme_minimal()

# Visualization 7 — Heatmap of City vs Status

# Type: Heatmap

upi %>%
  count(City, Status) %>%
  ggplot(aes(City, Status, fill=n)) +
  geom_tile() +
  labs(
    title="Transaction Status by City",
    x="City",
    y="Status"
  ) +
  theme_minimal()

# Visualization 8 — Interactive Plotly Chart

# Type: Interactive Scatter Plot

p <- ggplot(upi, aes(CustomerAge, Amount, color=Gender)) +
  geom_point(alpha=0.5) +
  labs(
    title="Interactive Age vs Transaction Amount",
    x="Customer Age",
    y="Transaction Amount"
  ) +
  theme_minimal()

ggplotly(p)
# Average Transaction by Age Group

upi %>%
  group_by(AgeGroup) %>%
  summarise(AvgAmount = mean(Amount)) %>%
  ggplot(aes(AgeGroup, AvgAmount, fill = AgeGroup)) +
  geom_col()

# City-wise Transaction Heatmap

upi %>%
  count(City, Status) %>%
  ggplot(aes(City, Status, fill = n)) +
  geom_tile()

# Merchant Revenue Analysis

upi %>%
  group_by(MerchantName) %>%
  summarise(TotalRevenue = sum(Amount)) %>%
  arrange(desc(TotalRevenue))
## # A tibble: 5 × 2
##   MerchantName TotalRevenue
##   <chr>               <dbl>
## 1 Zomato           3998156.
## 2 Flipkart         3996787.
## 3 IRCTC            3986339.
## 4 Swiggy           3945854.
## 5 Amazon           3945139.
Conclusion

This analysis of UPI transactions provides valuable insights into digital payment behavior and user spending patterns. The study highlights how UPI usage varies across time, customer demographics, payment modes, cities, and merchants.

Key findings from the analysis include:
- Transactions are highest during peak daytime hours, showing strong daily usage patterns.
- Younger age groups (18–34) are the most active users of UPI services.
- Transaction amounts vary significantly across different payment modes.
- A small group of merchants contributes to a large share of total transaction value.
- Certain banks show variations in transaction success and failure rates.
- City-wise analysis reveals differences in transaction volume and status distribution.

Overall, this project demonstrates how data analysis and visualization using R can uncover meaningful business insights from financial transaction data. These insights can help improve customer experience, optimize payment systems, and support better decision-making in the digital payments ecosystem.