
## Load the following libraries

Packages loaded: dplyr, tidyr, ggplot2, mice, and corrplot (version 0.95). R warned that each package was built under R version 4.4.2.

Masking notices: dplyr masks filter and lag from stats, and intersect, setdiff, setequal, and union from base. mice masks filter from stats, and cbind and rbind from base.
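
The loading step itself is not shown in the output. A minimal sketch of one way to attach these five packages while keeping the startup chatter out of the report is given below; the check-before-attach pattern is an assumption, not necessarily what the original chunk did.

```r
# Packages named in the startup messages above
pkgs <- c("dplyr", "tidyr", "ggplot2", "mice", "corrplot")

# requireNamespace() returns FALSE instead of stopping when a package is absent
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach whatever is available, silencing version warnings and masking notices
for (p in pkgs[installed]) {
  suppressWarnings(suppressPackageStartupMessages(
    library(p, character.only = TRUE)
  ))
}
```

Because both dplyr and mice mask `filter`, calling `dplyr::filter()` or `stats::filter()` with the explicit prefix avoids ambiguity about which one is used.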

## Set the working directory
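
The report does not show which folder was used. The sketch below uses a hypothetical folder name under a temporary location so that it runs anywhere; in practice the path would point at the folder holding the dataset.

```r
# Hypothetical project folder (placeholder name, created under tempdir())
project_dir <- file.path(tempdir(), "transactions_analysis")
dir.create(project_dir, showWarnings = FALSE, recursive = TRUE)

# setwd() returns the previous working directory invisibly,
# which makes it easy to switch back later with setwd(old_dir)
old_dir <- setwd(project_dir)
basename(getwd())
```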

## Inspect the dataset

  TransactionID CustomerID ProductID Quantity    PaymentMethod TransactionDate
1             1        207        14        5              UPI      12/28/2023
2             2        253        15        1 Cash on Delivery       4/17/2023
3             3        110        19        5 Cash on Delivery       1/17/2023
4             4        256         1        3       Debit Card      10/23/2023
5             5        274         2        5       Debit Card       6/30/2023
6             6         52        15        2      Credit Card       6/14/2023
  ProductCategory  Price Rating TotalAmount Age Gender    Location
1        Clothing  33.36    4.2      166.80  50   Male     Houston
2     Electronics 389.03    2.2      389.03  19 Female     Chicago
3        Clothing 145.89    1.7      729.45  37 Female     Houston
4 Home Appliances 215.02    2.7      645.06  50   Male Los Angeles
5 Home Appliances 325.48    3.4     1627.40  24   Male    New York
6     Electronics 389.03    2.2      778.06  50   Male     Houston
  MembershipStatus
1            Basic
2          Premium
3            Basic
4          Premium
5          Premium
6          Premium
'data.frame':   300 obs. of  14 variables:
 $ TransactionID   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID      : int  207 253 110 256 274 52 191 165 18 169 ...
 $ ProductID       : int  14 15 19 1 2 15 10 5 5 17 ...
 $ Quantity        : int  5 1 5 3 5 2 1 5 4 2 ...
 $ PaymentMethod   : chr  "UPI" "Cash on Delivery" "Cash on Delivery" "Debit Card" ...
 $ TransactionDate : chr  "12/28/2023" "4/17/2023" "1/17/2023" "10/23/2023" ...
 $ ProductCategory : chr  "Clothing" "Electronics" "Clothing" "Home Appliances" ...
 $ Price           : num  33.4 389 145.9 215 325.5 ...
 $ Rating          : num  4.2 2.2 1.7 2.7 3.4 2.2 3.7 1.4 1.4 1.9 ...
 $ TotalAmount     : num  167 389 729 645 1627 ...
 $ Age             : int  50 19 37 50 24 50 40 37 25 53 ...
 $ Gender          : chr  "Male" "Female" "Female" "Male" ...
 $ Location        : chr  "Houston" "Chicago" "Houston" "Los Angeles" ...
 $ MembershipStatus: chr  "Basic" "Premium" "Basic" "Premium" ...
 TransactionID      CustomerID      ProductID        Quantity    
 Min.   :  1.00   Min.   :  4.0   Min.   : 1.00   Min.   :1.000  
 1st Qu.: 75.75   1st Qu.: 77.0   1st Qu.: 6.00   1st Qu.:2.000  
 Median :150.50   Median :156.5   Median :11.00   Median :3.000  
 Mean   :150.50   Mean   :154.8   Mean   :10.66   Mean   :3.127  
 3rd Qu.:225.25   3rd Qu.:229.2   3rd Qu.:15.00   3rd Qu.:5.000  
 Max.   :300.00   Max.   :299.0   Max.   :20.00   Max.   :5.000  
                                                                 
 PaymentMethod      TransactionDate    ProductCategory        Price       
 Length:300         Length:300         Length:300         Min.   : 33.36  
 Class :character   Class :character   Class :character   1st Qu.:145.89  
 Mode  :character   Mode  :character   Mode  :character   Median :215.02  
                                                          Mean   :253.52  
                                                          3rd Qu.:350.44  
                                                          Max.   :466.49  
                                                                          
     Rating       TotalAmount           Age           Gender         
 Min.   :1.400   Min.   :  33.36   Min.   :18.00   Length:300        
 1st Qu.:2.100   1st Qu.: 350.44   1st Qu.:30.00   Class :character  
 Median :2.600   Median : 662.55   Median :42.00   Mode  :character  
 Mean   :2.928   Mean   : 803.12   Mean   :41.79                     
 3rd Qu.:3.700   3rd Qu.:1134.50   3rd Qu.:52.25                     
 Max.   :5.000   Max.   :2332.45   Max.   :65.00                     
                                   NA's   :8                         
   Location         MembershipStatus  
 Length:300         Length:300        
 Class :character   Class :character  
 Mode  :character   Mode  :character  
                                      
                                      
                                      
                                      
   TransactionID       CustomerID        ProductID         Quantity 
               0                0                0                0 
   PaymentMethod  TransactionDate  ProductCategory            Price 
               0                0                0                0 
          Rating      TotalAmount              Age           Gender 
               0                0                8                0 
        Location MembershipStatus 
               0                0 
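
The four outputs above look like the results of `head()`, `str()`, `summary()`, and a per-column count of missing values. As a sketch, the same sequence can be run on a small data frame built from the six rows printed above (selected columns only). The object name `toy` is hypothetical, and one Age value is blanked out purely to make the missing-value count non-trivial.

```r
# First six rows of selected columns, copied from the head() output above
toy <- data.frame(
  TransactionID = 1:6,
  Quantity      = c(5, 1, 5, 3, 5, 2),
  Price         = c(33.36, 389.03, 145.89, 215.02, 325.48, 389.03),
  TotalAmount   = c(166.80, 389.03, 729.45, 645.06, 1627.40, 778.06),
  Age           = c(50, 19, 37, 50, 24, 50),
  Gender        = c("Male", "Female", "Female", "Male", "Male", "Male")
)
toy$Age[3] <- NA  # illustration only: this value is not missing in the source

head(toy)      # first rows
str(toy)       # column types
summary(toy)   # per-column summaries, including NA counts for numeric columns

# Missing values per column
na_counts <- colSums(is.na(toy))
na_counts
```

In the full dataset the only column with missing values is Age, with 8 of 300 rows.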

## Detect outliers using the IQR method
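
The report does not show which columns were checked or which multiplier was used. The sketch below assumes the conventional rule of flagging values more than 1.5 times the IQR below the first quartile or above the third quartile; the helper name `flag_iqr_outliers` and the sample vector are hypothetical.

```r
# Flag values outside [Q1 - k * IQR, Q3 + k * IQR]; k = 1.5 is an assumption
flag_iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  x < q[1] - k * spread | x > q[2] + k * spread
}

values <- c(1, 2, 3, 4, 5, 100)   # made-up numbers with one obvious outlier
is_out <- flag_iqr_outliers(values)
values[is_out]

# One common follow-up: replace flagged values with NA and drop them later
cleaned <- replace(values, which(is_out), NA)
cleaned
```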

## Remove rows with NA values after outlier handling
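
A minimal sketch of this step with `na.omit()`, using a made-up data frame. `na.omit()` records the dropped row positions in an `na.action` attribute, which is the attribute visible in the `str()` output later in this report.

```r
# Made-up data: two rows have a missing Age
toy <- data.frame(
  id    = 1:5,
  Age   = c(50, NA, 37, 50, NA),
  Price = c(33.36, 389.03, 145.89, 215.02, 325.48)
)

complete <- na.omit(toy)   # keeps only rows with no NA in any column
nrow(complete)

# Positions of the rows that were dropped
dropped <- attr(complete, "na.action")
as.integer(dropped)
```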

## Histogram for Age
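
The plot itself did not survive in this output. The sketch below draws a comparable histogram with base graphics from the ten Age values visible in the `str()` output; the original may well have used ggplot2, and the bin width of 10 years is an assumption.

```r
# The ten Age values shown in the str() output
age <- c(50, 19, 37, 50, 24, 50, 40, 37, 25, 53)

# Bins of 10 years spanning the observed range of 18 to 65
h <- hist(
  age,
  breaks = seq(10, 70, by = 10),
  main   = "Distribution of Age",
  xlab   = "Age (years)",
  col    = "grey80"
)

# hist() also returns the bin counts, which is handy for checking the plot
h$counts
```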

## Boxplot for Price
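
The plot is likewise missing here. As a sketch, a base-graphics boxplot of the six Price values from the `head()` output is shown below; the original may have used ggplot2 instead.

```r
# The six Price values shown in the head() output
price <- c(33.36, 389.03, 145.89, 215.02, 325.48, 389.03)

b <- boxplot(price, horizontal = TRUE, main = "Price", xlab = "Price")

# boxplot() returns the five plotted statistics and any points drawn as outliers
b$stats
b$out
```

Points listed in `b$out` are those beyond 1.5 times the box length from the hinges, which is closely related to the IQR rule used for outlier detection.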


Call:
lm(formula = TotalAmount ~ Age + Price + Quantity + Rating, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-493.37  -98.03    8.40   99.71  447.75 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) -725.30851   52.37741 -13.848   <2e-16 ***
Age            0.08560    0.74111   0.116    0.908    
Price          3.14353    0.08184  38.412   <2e-16 ***
Quantity     238.95715    6.71914  35.564   <2e-16 ***
Rating        -8.46592    9.18348  -0.922    0.357    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 166.5 on 291 degrees of freedom
Multiple R-squared:  0.9076,    Adjusted R-squared:  0.9063 
F-statistic: 714.4 on 4 and 291 DF,  p-value: < 2.2e-16
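
The output above comes from the call shown in its `Call:` line. In the six rows printed by `head()`, TotalAmount equals Price times Quantity (for example, 33.36 times 5 is 166.80), so the additive model appears to be approximating a product, which would be consistent with Price and Quantity dominating while Age and Rating contribute little. The sketch below fits the same formula to simulated data built on that assumption; the data frame `toy` and all of its values are synthetic.

```r
set.seed(1)
n <- 100

# Synthetic predictors drawn from roughly the ranges shown by summary()
toy <- data.frame(
  Age      = sample(18:65, n, replace = TRUE),
  Price    = round(runif(n, 33, 467), 2),
  Quantity = sample(1:5, n, replace = TRUE),
  Rating   = round(runif(n, 1.4, 5), 1)
)
toy$TotalAmount <- toy$Price * toy$Quantity   # assumed relationship

fit <- lm(TotalAmount ~ Age + Price + Quantity + Rating, data = toy)

coef(summary(fit))       # estimates, standard errors, t values, p values
summary(fit)$r.squared   # share of variance explained
```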

'data.frame':   283 obs. of  14 variables:
 $ TransactionID   : int  1 2 3 4 5 6 7 8 9 10 ...
 $ CustomerID      : int  207 253 110 256 274 52 191 165 18 169 ...
 $ ProductID       : int  14 15 19 1 2 15 10 5 5 17 ...
 $ Quantity        : int  5 1 5 3 5 2 1 5 4 2 ...
 $ PaymentMethod   : chr  "UPI" "Cash on Delivery" "Cash on Delivery" "Debit Card" ...
 $ TransactionDate : chr  "12/28/2023" "4/17/2023" "1/17/2023" "10/23/2023" ...
 $ ProductCategory : chr  "Clothing" "Electronics" "Clothing" "Home Appliances" ...
 $ Price           : num  33.4 389 145.9 215 325.5 ...
 $ Rating          : num  4.2 2.2 1.7 2.7 3.4 2.2 3.7 1.4 1.4 1.9 ...
 $ TotalAmount     : num  167 389 729 645 1627 ...
 $ Age             : num  50 19 37 50 24 50 40 37 25 53 ...
 $ Gender          : chr  "Male" "Female" "Female" "Male" ...
 $ Location        : chr  "Houston" "Chicago" "Houston" "Los Angeles" ...
 $ MembershipStatus: chr  "Basic" "Premium" "Basic" "Premium" ...
 - attr(*, "na.action")= 'omit' Named int [1:4] 90 220 244 255
  ..- attr(*, "names")= chr [1:4] "90" "220" "244" "255"

    Two Sample t-test

data:  TotalAmount by Gender
t = -2.448, df = 281, p-value = 0.01498
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -286.44204  -31.10043
sample estimates:
mean in group Female   mean in group Male 
            696.0371             854.8084 

    Welch Two Sample t-test

data:  TotalAmount by Gender
t = -2.4726, df = 274.83, p-value = 0.01402
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
 -285.1822  -32.3603
sample estimates:
mean in group Female   mean in group Male 
            696.0371             854.8084 
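
The two outputs above differ only in the variance assumption: the first pools the group variances (`var.equal = TRUE`), the second is the Welch test, which is the default in `t.test()`. A sketch on synthetic data follows; the group sizes, means, and spreads are made up.

```r
set.seed(42)

# Synthetic spending data for two groups of unequal size
toy <- data.frame(
  Gender      = rep(c("Female", "Male"), times = c(40, 60)),
  TotalAmount = c(rnorm(40, mean = 700, sd = 500),
                  rnorm(60, mean = 850, sd = 550))
)

# Student's t-test: assumes equal variances, df = n1 + n2 - 2
student <- t.test(TotalAmount ~ Gender, data = toy, var.equal = TRUE)

# Welch's t-test: no equal-variance assumption, df estimated from the data
welch <- t.test(TotalAmount ~ Gender, data = toy)

student$parameter
welch$parameter
```

When the two versions give nearly the same result, as they do in the report, the equal-variance assumption makes little practical difference.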

    Pearson's Chi-squared test

data:  table(data$ProductCategory, data$PaymentMethod)
X-squared = 2.2394, df = 6, p-value = 0.8964
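
The test above is run on a contingency table of ProductCategory against PaymentMethod. Its 6 degrees of freedom are consistent with a 3 by 4 table, since df = (rows - 1) times (columns - 1). The sketch below reproduces the call on synthetic data, using the category and payment labels that appear in the `head()` output; the sampled values are made up.

```r
set.seed(7)

categories <- c("Clothing", "Electronics", "Home Appliances")
payments   <- c("UPI", "Cash on Delivery", "Debit Card", "Credit Card")

# Synthetic, independently sampled labels
toy <- data.frame(
  ProductCategory = factor(sample(categories, 300, replace = TRUE),
                           levels = categories),
  PaymentMethod   = factor(sample(payments, 300, replace = TRUE),
                           levels = payments)
)

tab <- table(toy$ProductCategory, toy$PaymentMethod)
res <- chisq.test(tab)

res$parameter       # degrees of freedom
min(res$expected)   # the approximation is usually trusted when this is at least 5
```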

                  Df   Sum Sq Mean Sq F value Pr(>F)
MembershipStatus   2   451973  225986   0.762  0.467
Residuals        293 86840140  296383               
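
The table above is a one-way ANOVA with MembershipStatus as the grouping factor. The response variable is not shown; TotalAmount is assumed in the sketch below. The 2 degrees of freedom imply three membership levels, although only Basic and Premium appear in the printed rows, so the third level name used here is a placeholder and the data are synthetic.

```r
set.seed(3)

# "Other" is a placeholder for the third level, which is not shown in the report
levels_ms <- c("Basic", "Premium", "Other")

toy <- data.frame(
  MembershipStatus = factor(sample(levels_ms, 150, replace = TRUE),
                            levels = levels_ms),
  TotalAmount      = rnorm(150, mean = 800, sd = 540)
)

fit <- aov(TotalAmount ~ MembershipStatus, data = toy)
tab <- summary(fit)[[1]]   # the ANOVA table as a data frame
tab
tab$Df                     # groups - 1, then observations - groups
```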


               Female       Male 
0.04391892 0.42229730 0.53378378
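
The proportions above appear to come from `prop.table()` applied to a table of Gender. The first column has no label, which suggests that about 4.4% of rows hold an empty string rather than Female or Male. A sketch with a made-up vector shows how such a blank level arises and one possible way to treat it as missing.

```r
# Made-up labels, including two empty strings
gender <- c("Male", "Female", "", "Male", "Female", "Male", "", "Male")

# Empty strings are counted as their own unnamed category
props <- prop.table(table(gender))
props

# Recoding "" to NA drops it from the table (table() ignores NA by default)
gender_clean <- replace(gender, gender == "", NA)
props_clean  <- prop.table(table(gender_clean))
props_clean
```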