Dataset

I selected this dataset, blood_transfusion, from Kaggle. It comes from the donor database of the Blood Transfusion Service Center in Hsin-Chu City, Taiwan, which was used to demonstrate the RFMTC marketing model (a modified version of RFM). Roughly every three months, the center sends its blood transfusion service bus to a university in Hsin-Chu City to collect donated blood. The dataset contains 748 donors. For each one it records R (Recency: months since last donation), F (Frequency: total number of donations), M (Monetary: total blood donated, in c.c.), T (Time: months since first donation), and a binary variable indicating whether the donor gave blood in March 2007 (1 = donated, 0 = did not donate). Source: https://www.kaggle.com/datasets/whenamancodes/blood-transfusion-dataset/data
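The description calls the outcome a 0/1 variable. However, in the copy used in this section the `Class` column holds the strings "donated" and "not donated" (see the `head()` output further down). Below is a minimal sketch of recoding it to 0/1. It assumes those are the only two labels. The vector is just the `Class` values from the first ten rows printed later, and `class_binary` is a hypothetical name.

```r
# Class values copied from the first ten rows of head(blood_transfusion, 10)
class_labels <- c("donated", "donated", "donated", "donated", "not donated",
                  "not donated", "donated", "not donated", "donated", "donated")

# Assumption: "donated" and "not donated" are the only labels present
stopifnot(all(class_labels %in% c("donated", "not donated")))

# Recode to the 1/0 convention from the dataset description (1 = donated)
class_binary <- ifelse(class_labels == "donated", 1L, 0L)

table(class_binary)
```

On the full table, the same idea would be `as.integer(blood_transfusion$Class == "donated")`. With dplyr loaded, it could also go inside `mutate()`.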

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

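What does each column represent? Which variables are included? The five columns are the RFMTC variables described above: Recency, Frequency, Monetary, Time, and Class, which records whether the donor gave blood in March 2007.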

colnames(blood_transfusion)
## [1] "Recency"   "Frequency" "Monetary"  "Time"      "Class"
ncol(blood_transfusion)
## [1] 5
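`colnames()` lists the names but not what they mean. A small lookup table, built by hand from the dataset description at the top of this section, can sit alongside the data. Here `column_info` is a hypothetical name and is not part of the dataset.

```r
# One row per column of blood_transfusion, with meanings taken from the
# dataset description at the top of this section
column_info <- data.frame(
  column  = c("Recency", "Frequency", "Monetary", "Time", "Class"),
  meaning = c("Months since last donation",
              "Total number of donations",
              "Total blood donated",
              "Months since first donation",
              "Donated blood in March 2007?"),
  unit    = c("months", "count", "c.c.", "months", "donated / not donated"),
  stringsAsFactors = FALSE
)

# Left-align text columns for readability
print(column_info, right = FALSE)
```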

What does each row represent? How many rows/observations? Each row is one donor, so the number of rows is the number of donors.

nrow(blood_transfusion)
## [1] 748
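Because each row is one donor, a couple of consistency rules can be checked row by row. Frequency should be at least 1, and Time (months since first donation) should be at least Recency (months since last donation). Below is a minimal sketch on the ten rows shown by `head()` further down. On the full data, the same comparisons would be applied to the columns of blood_transfusion directly.

```r
# Recency, Frequency and Time from the first ten rows of head(blood_transfusion, 10)
donors <- data.frame(
  Recency   = c(2, 0, 1, 2, 1, 4, 2, 1, 2, 5),
  Frequency = c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46),
  Time      = c(98, 28, 35, 45, 77, 4, 14, 35, 22, 98)
)

# Every donor in the table has donated at least once...
at_least_one <- donors$Frequency >= 1
# ...and the first donation cannot be more recent than the last one
time_after_recency <- donors$Time >= donors$Recency

# TRUE only if every row passes both rules
all(at_least_one & time_after_recency)
```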

Brief descriptive analytics: the proportion of missing values in each column

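# Proportion of missing values in each column
# (named missing_prop to avoid masking base R's missing() function)
missing_prop <- colSums(is.na(blood_transfusion)) / nrow(blood_transfusion)
missing_prop
##   Recency Frequency  Monetary      Time     Class 
##         0         0         0         0         0
# Total number of missing cells in the whole table
sum(is.na(blood_transfusion))
## [1] 0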
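`is.na()` returns TRUE/FALSE, so the mean of each column of that logical matrix is already the missing-value proportion. This means `colMeans(is.na(x))` gives the same result as `colSums(is.na(x)) / nrow(x)`. The sketch below uses a small made-up data frame, since blood_transfusion has no NAs to show. Column `a` has one NA out of four values (0.25), and column `b` has two (0.5).

```r
# Hypothetical toy data with a few NAs, used only to illustrate the calculation
toy <- data.frame(a = c(1, NA, 3, 4),
                  b = c("x", NA, NA, "y"),
                  stringsAsFactors = FALSE)

# Proportion of missing values per column, computed two equivalent ways
prop_sum  <- colSums(is.na(toy)) / nrow(toy)
prop_mean <- colMeans(is.na(toy))

prop_mean
```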

What are the dimensions of this data?

dim(blood_transfusion)
## [1] 748   5

Check out the first 10 rows

head(blood_transfusion, 10)
## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1       2        50    12500    98 donated    
##  2       0        13     3250    28 donated    
##  3       1        16     4000    35 donated    
##  4       2        20     5000    45 donated    
##  5       1        24     6000    77 not donated
##  6       4         4     1000     4 not donated
##  7       2         7     1750    14 donated    
##  8       1        12     3000    35 not donated
##  9       2         9     2250    22 donated    
## 10       5        46    11500    98 donated
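In every row above, Monetary is exactly 250 times Frequency (for example, 50 donations and 12500 c.c. in row 1). This suggests each donation is recorded as 250 c.c. If that holds for all 748 rows, Monetary carries no information beyond Frequency. The two would be perfectly collinear, which matters for any model that uses both. Here is a quick check on the ten rows shown:

```r
# Frequency and Monetary from the first ten rows of head(blood_transfusion, 10)
frequency <- c(50, 13, 16, 20, 24, 4, 7, 12, 9, 46)
monetary  <- c(12500, 3250, 4000, 5000, 6000, 1000, 1750, 3000, 2250, 11500)

# Blood per donation implied by each row
per_donation <- monetary / frequency
unique(per_donation)
```

On the full data, `all(blood_transfusion$Monetary == 250 * blood_transfusion$Frequency)` would confirm or refute the pattern.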

Check out the last 10 rows

tail(blood_transfusion, 10)
## # A tibble: 10 × 5
##    Recency Frequency Monetary  Time Class      
##      <dbl>     <dbl>    <dbl> <dbl> <chr>      
##  1      23         1      250    23 not donated
##  2      23         4     1000    52 not donated
##  3      23         1      250    23 not donated
##  4      23         7     1750    88 not donated
##  5      16         3      750    86 not donated
##  6      23         2      500    38 not donated
##  7      21         2      500    52 not donated
##  8      23         3      750    62 not donated
##  9      39         1      250    39 not donated
## 10      72         1      250    72 not donated
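Rows 1 and 3 of the tail are identical: Recency 23, Frequency 1, Monetary 250, Time 23, not donated. The table has no donor ID column, so identical rows cannot be told apart and may simply be different donors with the same history. For that reason, `duplicated()` is better used here to count repeats than to drop them. Below is a minimal sketch on the ten rows shown above.

```r
# The last ten rows of tail(blood_transfusion, 10)
last_rows <- data.frame(
  Recency   = c(23, 23, 23, 23, 16, 23, 21, 23, 39, 72),
  Frequency = c(1, 4, 1, 7, 3, 2, 2, 3, 1, 1),
  Monetary  = c(250, 1000, 250, 1750, 750, 500, 500, 750, 250, 250),
  Time      = c(23, 52, 23, 88, 86, 38, 52, 62, 39, 72),
  Class     = rep("not donated", 10),
  stringsAsFactors = FALSE
)

# duplicated() flags each row that exactly repeats an earlier row
dup <- duplicated(last_rows)
which(dup)
```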