## Load the following libraries
Warning: package 'dplyr' was built under R version 4.4.2
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
filter, lag
The following objects are masked from 'package:base':
intersect, setdiff, setequal, union
Warning: package 'tidyr' was built under R version 4.4.2
Warning: package 'ggplot2' was built under R version 4.4.2
Warning: package 'mice' was built under R version 4.4.2
Attaching package: 'mice'
The following object is masked from 'package:stats':
filter
The following objects are masked from 'package:base':
cbind, rbind
Warning: package 'corrplot' was built under R version 4.4.2
corrplot 0.95 loaded
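The startup messages above indicate that dplyr, tidyr, ggplot2, mice and corrplot were attached. The loading chunk itself is not shown, so the following is only a sketch of how those packages could be attached without the masking notes and "built under R version" warnings. It assumes nothing beyond the five package names that appear in the messages; `pkgs` and `loaded` are illustrative names.

```r
# Packages named in the startup messages above.
pkgs <- c("dplyr", "tidyr", "ggplot2", "mice", "corrplot")

# Try to attach each package quietly. require() returns TRUE on success and
# FALSE if the package is not installed, so nothing stops with an error here.
loaded <- vapply(pkgs, function(p) {
  suppressPackageStartupMessages(
    suppressWarnings(require(p, character.only = TRUE, quietly = TRUE))
  )
}, logical(1))

# Names of any packages that still need install.packages().
names(loaded)[!loaded]
```

Silencing the messages keeps the report readable, but the masking notes are worth remembering: after mice is attached, `filter()` may no longer refer to the dplyr version, so calling `dplyr::filter()` explicitly is the safer habit.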
# Set a working directory
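The import code is not shown in this output. A minimal sketch of what this step might look like, assuming the data are stored in a CSV file: the file name `transactions.csv` and the helper `read_transactions()` are hypothetical, and a call such as `setwd()` would normally precede the read if the file lives outside the current directory.

```r
# Hypothetical file name; the report does not show the actual path.
csv_path <- "transactions.csv"

# Read a CSV and keep text columns as character, which matches the chr
# columns reported by str() in this report.
read_transactions <- function(path) {
  read.csv(path, stringsAsFactors = FALSE)
}

# Only run the inspection steps when the file is actually present.
if (file.exists(csv_path)) {
  data <- read_transactions(csv_path)
  print(head(data))     # first six rows
  str(data)             # column types and dimensions
  print(summary(data))  # per-column summaries, including NA counts
}
```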
  TransactionID CustomerID ProductID Quantity    PaymentMethod TransactionDate
1             1        207        14        5              UPI      12/28/2023
2             2        253        15        1 Cash on Delivery       4/17/2023
3             3        110        19        5 Cash on Delivery       1/17/2023
4             4        256         1        3       Debit Card      10/23/2023
5             5        274         2        5       Debit Card       6/30/2023
6             6         52        15        2      Credit Card       6/14/2023
  ProductCategory  Price Rating TotalAmount Age Gender    Location
1        Clothing  33.36    4.2      166.80  50   Male     Houston
2     Electronics 389.03    2.2      389.03  19 Female     Chicago
3        Clothing 145.89    1.7      729.45  37 Female     Houston
4 Home Appliances 215.02    2.7      645.06  50   Male Los Angeles
5 Home Appliances 325.48    3.4     1627.40  24   Male    New York
6     Electronics 389.03    2.2      778.06  50   Male     Houston
  MembershipStatus
1            Basic
2          Premium
3            Basic
4          Premium
5          Premium
6          Premium
'data.frame': 300 obs. of 14 variables:
$ TransactionID : int 1 2 3 4 5 6 7 8 9 10 ...
$ CustomerID : int 207 253 110 256 274 52 191 165 18 169 ...
$ ProductID : int 14 15 19 1 2 15 10 5 5 17 ...
$ Quantity : int 5 1 5 3 5 2 1 5 4 2 ...
$ PaymentMethod : chr "UPI" "Cash on Delivery" "Cash on Delivery" "Debit Card" ...
$ TransactionDate : chr "12/28/2023" "4/17/2023" "1/17/2023" "10/23/2023" ...
$ ProductCategory : chr "Clothing" "Electronics" "Clothing" "Home Appliances" ...
$ Price : num 33.4 389 145.9 215 325.5 ...
$ Rating : num 4.2 2.2 1.7 2.7 3.4 2.2 3.7 1.4 1.4 1.9 ...
$ TotalAmount : num 167 389 729 645 1627 ...
$ Age : int 50 19 37 50 24 50 40 37 25 53 ...
$ Gender : chr "Male" "Female" "Female" "Male" ...
$ Location : chr "Houston" "Chicago" "Houston" "Los Angeles" ...
$ MembershipStatus: chr "Basic" "Premium" "Basic" "Premium" ...
 TransactionID      CustomerID      ProductID        Quantity
 Min.   :  1.00   Min.   :  4.0   Min.   : 1.00   Min.   :1.000
 1st Qu.: 75.75   1st Qu.: 77.0   1st Qu.: 6.00   1st Qu.:2.000
 Median :150.50   Median :156.5   Median :11.00   Median :3.000
 Mean   :150.50   Mean   :154.8   Mean   :10.66   Mean   :3.127
 3rd Qu.:225.25   3rd Qu.:229.2   3rd Qu.:15.00   3rd Qu.:5.000
 Max.   :300.00   Max.   :299.0   Max.   :20.00   Max.   :5.000
 PaymentMethod      TransactionDate    ProductCategory        Price
 Length:300         Length:300         Length:300         Min.   : 33.36
 Class :character   Class :character   Class :character   1st Qu.:145.89
 Mode  :character   Mode  :character   Mode  :character   Median :215.02
                                                          Mean   :253.52
                                                          3rd Qu.:350.44
                                                          Max.   :466.49
     Rating       TotalAmount           Age           Gender
 Min.   :1.400   Min.   :  33.36   Min.   :18.00   Length:300
 1st Qu.:2.100   1st Qu.: 350.44   1st Qu.:30.00   Class :character
 Median :2.600   Median : 662.55   Median :42.00   Mode  :character
 Mean   :2.928   Mean   : 803.12   Mean   :41.79
 3rd Qu.:3.700   3rd Qu.:1134.50   3rd Qu.:52.25
 Max.   :5.000   Max.   :2332.45   Max.   :65.00
                                   NA's   :8
   Location         MembershipStatus
 Length:300         Length:300
 Class :character   Class :character
 Mode  :character   Mode  :character
   TransactionID       CustomerID        ProductID         Quantity
               0                0                0                0
   PaymentMethod  TransactionDate  ProductCategory            Price
               0                0                0                0
          Rating      TotalAmount              Age           Gender
               0                0                8                0
        Location MembershipStatus
               0                0
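The per-column counts above (eight missing values, all in Age) are the kind of result that `colSums(is.na(...))` produces. The call used in the report is not shown, so this is a sketch on a toy data frame: the Age and Rating values are the six printed by `head()`, with one Age value blanked purely for illustration.

```r
# Toy data: Age and Rating from the six head() rows.
# The second Age value is set to NA for illustration only.
toy <- data.frame(
  Age    = c(50, NA, 37, 50, 24, 50),
  Rating = c(4.2, 2.2, 1.7, 2.7, 3.4, 2.2)
)

# is.na() gives a logical matrix; summing each column counts the NAs.
na_counts <- colSums(is.na(toy))
na_counts

# Rows that contain at least one missing value.
which(!complete.cases(toy))
```

Note that `is.na()` only detects true `NA` values. Empty strings in character columns are not counted, so a column can show 0 here and still contain blank entries.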
# Boxplot for Price
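The plot itself did not survive in this output. A minimal base R sketch of how such a boxplot could be drawn follows; it uses only the six Price values printed by `head()`, standing in for the full `data$Price` column, and the title and axis label are assumptions.

```r
# Price values from the six rows printed by head(); stand-in for data$Price.
prices <- c(33.36, 389.03, 145.89, 215.02, 325.48, 389.03)

# Draw the boxplot. Points more than 1.5 * IQR beyond the hinges
# would be drawn individually as outliers.
boxplot(prices, main = "Boxplot for Price", ylab = "Price")

# The same summary without plotting.
price_stats <- boxplot.stats(prices)
price_stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
price_stats$out    # values flagged as outliers (none among these six)
```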
Call:
lm(formula = TotalAmount ~ Age + Price + Quantity + Rating, data = data)
Residuals:
Min 1Q Median 3Q Max
-493.37 -98.03 8.40 99.71 447.75
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -725.30851   52.37741 -13.848   <2e-16 ***
Age            0.08560    0.74111   0.116    0.908
Price          3.14353    0.08184  38.412   <2e-16 ***
Quantity     238.95715    6.71914  35.564   <2e-16 ***
Rating        -8.46592    9.18348  -0.922    0.357
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 166.5 on 291 degrees of freedom
Multiple R-squared: 0.9076, Adjusted R-squared: 0.9063
F-statistic: 714.4 on 4 and 291 DF, p-value: < 2.2e-16
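Two things can be read off this output. The 291 residual degrees of freedom plus five estimated coefficients mean 296 rows entered the fit. In the six rows printed by `head()`, TotalAmount is exactly Price times Quantity, which would explain why only Price and Quantity are significant. Whether that identity holds for every row is not shown, so the sketch below checks it only on those six rows. `fit_total()` is a hypothetical wrapper around the model call from the output.

```r
# The six rows printed by head(), restricted to the model variables.
six <- data.frame(
  Age         = c(50, 19, 37, 50, 24, 50),
  Price       = c(33.36, 389.03, 145.89, 215.02, 325.48, 389.03),
  Quantity    = c(5, 1, 5, 3, 5, 2),
  Rating      = c(4.2, 2.2, 1.7, 2.7, 3.4, 2.2),
  TotalAmount = c(166.80, 389.03, 729.45, 645.06, 1627.40, 778.06)
)

# Check whether TotalAmount equals Price * Quantity (up to rounding error).
product_matches <- isTRUE(all.equal(six$TotalAmount, six$Price * six$Quantity))
product_matches

# The model formula from the output above, wrapped so it can be applied
# to the full data frame: summary(fit_total(data)).
fit_total <- function(df) {
  lm(TotalAmount ~ Age + Price + Quantity + Rating, data = df)
}
```

If the identity does hold throughout, the additive model is approximating a product, and the high R-squared says more about how TotalAmount was constructed than about customer behaviour.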
'data.frame': 283 obs. of 14 variables:
$ TransactionID : int 1 2 3 4 5 6 7 8 9 10 ...
$ CustomerID : int 207 253 110 256 274 52 191 165 18 169 ...
$ ProductID : int 14 15 19 1 2 15 10 5 5 17 ...
$ Quantity : int 5 1 5 3 5 2 1 5 4 2 ...
$ PaymentMethod : chr "UPI" "Cash on Delivery" "Cash on Delivery" "Debit Card" ...
$ TransactionDate : chr "12/28/2023" "4/17/2023" "1/17/2023" "10/23/2023" ...
$ ProductCategory : chr "Clothing" "Electronics" "Clothing" "Home Appliances" ...
$ Price : num 33.4 389 145.9 215 325.5 ...
$ Rating : num 4.2 2.2 1.7 2.7 3.4 2.2 3.7 1.4 1.4 1.9 ...
$ TotalAmount : num 167 389 729 645 1627 ...
$ Age : num 50 19 37 50 24 50 40 37 25 53 ...
$ Gender : chr "Male" "Female" "Female" "Male" ...
$ Location : chr "Houston" "Chicago" "Houston" "Los Angeles" ...
$ MembershipStatus: chr "Basic" "Premium" "Basic" "Premium" ...
- attr(*, "na.action")= 'omit' Named int [1:4] 90 220 244 255
..- attr(*, "names")= chr [1:4] "90" "220" "244" "255"
Two Sample t-test
data: TotalAmount by Gender
t = -2.448, df = 281, p-value = 0.01498
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-286.44204 -31.10043
sample estimates:
mean in group Female mean in group Male
696.0371 854.8084
Welch Two Sample t-test
data: TotalAmount by Gender
t = -2.4726, df = 274.83, p-value = 0.01402
alternative hypothesis: true difference in means between group Female and group Male is not equal to 0
95 percent confidence interval:
-285.1822 -32.3603
sample estimates:
mean in group Female mean in group Male
696.0371 854.8084
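The two outputs above differ only in the variance assumption: the first pools the variance of both groups (281 degrees of freedom), the second is the Welch version that R uses by default (274.83 degrees of freedom). A sketch of both calls follows, run on the six `head()` rows only, so the numbers will not match the report.

```r
# TotalAmount and Gender from the six rows printed by head().
six <- data.frame(
  TotalAmount = c(166.80, 389.03, 729.45, 645.06, 1627.40, 778.06),
  Gender      = c("Male", "Female", "Female", "Male", "Male", "Male")
)

# Student's t-test: assumes equal variances in both groups.
student <- t.test(TotalAmount ~ Gender, data = six, var.equal = TRUE)

# Welch's t-test: the default, does not assume equal variances.
welch <- t.test(TotalAmount ~ Gender, data = six)

student$parameter  # degrees of freedom: n - 2
welch$parameter    # Welch-Satterthwaite approximation, usually non-integer
```

Both tests report the same group means. When the two p-values are as close as in the report (0.01498 and 0.01402), the equal-variance assumption makes little practical difference, and the Welch result is the more conservative one to quote.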
Pearson's Chi-squared test
data: table(data$ProductCategory, data$PaymentMethod)
X-squared = 2.2394, df = 6, p-value = 0.8964
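The 6 degrees of freedom are consistent with a 3 x 4 contingency table, since (3 - 1) * (4 - 1) = 6. The sketch below builds such a table from the six `head()` rows and runs the same test. With so few rows the expected counts are far too small for the approximation to be trusted, so the warning is suppressed and the result only illustrates the mechanics.

```r
# ProductCategory and PaymentMethod from the six rows printed by head().
category <- c("Clothing", "Electronics", "Clothing",
              "Home Appliances", "Home Appliances", "Electronics")
payment  <- c("UPI", "Cash on Delivery", "Cash on Delivery",
              "Debit Card", "Debit Card", "Credit Card")

# Cross-tabulate the two categorical variables.
tab <- table(category, payment)

# Pearson's chi-squared test of independence. The small-sample warning is
# expected for six rows and is suppressed here.
res <- suppressWarnings(chisq.test(tab))

res$parameter  # degrees of freedom: (rows - 1) * (columns - 1)
res$expected   # expected counts; a common rule of thumb wants these >= 5
```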
                  Df   Sum Sq Mean Sq F value Pr(>F)
MembershipStatus   2   451973  225986   0.762  0.467
Residuals        293 86840140  296383
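The table above is what `summary()` prints for a one-way ANOVA of TotalAmount on MembershipStatus. Its 2 degrees of freedom imply three membership levels in the analysed data, although only Basic and Premium appear in the six `head()` rows. The model call is not shown in the report, so the following is a sketch using `aov()` on those six rows, which gives 1 degree of freedom for the factor.

```r
# TotalAmount and MembershipStatus from the six rows printed by head().
six <- data.frame(
  TotalAmount      = c(166.80, 389.03, 729.45, 645.06, 1627.40, 778.06),
  MembershipStatus = c("Basic", "Premium", "Basic", "Premium", "Premium", "Premium")
)

# One-way ANOVA: does mean TotalAmount differ between membership levels?
fit_aov <- aov(TotalAmount ~ MembershipStatus, data = six)
summary(fit_aov)

# Degrees of freedom: (number of levels - 1) and (rows - number of levels).
aov_df <- summary(fit_aov)[[1]]$Df
aov_df
```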
               Female       Male
0.04391892 0.42229730 0.53378378
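The first proportion above has no label. A likely explanation, stated here as an assumption, is that some Gender entries are empty strings, which `table()` counts as a category of their own. The sketch below reproduces that effect on the six Gender values from `head()` plus one blank entry added for illustration, then recodes blanks as `NA` before tabulating.

```r
# Gender from the six head() rows, plus one empty string added for illustration.
gender <- c("Male", "Female", "Female", "Male", "Male", "Male", "")

# The empty string shows up as an unlabelled first category.
props <- prop.table(table(gender))
props

# Recode blanks as missing so they drop out of the table.
gender_clean <- gender
gender_clean[gender_clean == ""] <- NA
clean_props <- prop.table(table(gender_clean))
clean_props
```

Blank strings are not detected by `is.na()`, which would be consistent with the missing-value count of 0 reported for Gender earlier in this output.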