Contact Information:

Email: osei-akoto.kwarteng@nau.edu
Telephone: +1 928 221 1809

Introduction

Consumer behavior concerns how people decide to spend their available resources on the consumption of different products and services (Jisana 2014). Gabbott and Hogg (2016) explained consumer behavior as the set of behaviors that consumers engage in when searching for, purchasing, using, evaluating, or disposing of products and services in order to satisfy their needs. The act of consumption is an integral part of our daily lives, and this is true whether we have large disposable income or not (Priest, Carter, and Statt 2013; Schiffman et al., n.d.).
This study discusses consumer behavior using descriptive and inferential statistics. The study conceptualizes consumer behavior as the actions individuals take when purchasing and using products and services, with the expectation of satisfying needs. A consumer is a person who buys for personal consumption, or who buys to meet the collective needs of a family or household.
This study used data from Kaggle. Analyses were performed using R, a free statistical language with specialized tools (packages) that was developed by statisticians for reproducible data analysis. For easier comprehension and interpretation, this study assumes that the reader has basic knowledge of R, has installed R and RStudio on a computer (e.g., Windows or Mac), and has loaded basic R packages such as tidyverse. The tidyverse package helps in transforming and presenting data. It is open source, meaning free to use, and is constantly being modified and improved (Wickham et al. 2019a). The tidyverse includes core packages that are widely used in data analysis: ggplot2 for data visualization, dplyr for data manipulation, tidyr for data tidying, readr for reading rectangular data such as csv / tsv / fwf, purrr for working with functions and vectors, tibble for data frames, stringr for working with strings, lubridate for working with dates and times, and forcats for working with factors / categorical data (Wickham et al. 2019b).

Objectives

The following objectives guided this study:
1. Explore revenue generated.
2. Understand the relationship between price and revenue generated.
3. Examine how quarter of the year influences revenue generated.
4. Assess the relationship between quantity ordered and revenue generated.
5. Explore the relationship between day of the week and quantity ordered.

Methodology

The data used in this study was downloaded from Kaggle. The sample sales data, henceforth referred to as the data, has 2,823 rows and 25 columns of detailed observations on consumer orders. The data was downloaded in xlsx format and imported into R for analysis and visualization.
The analysis proceeded in the following steps: data importation, overview (glimpse) of the imported data, handling of missing data, data transformation, and analysis.

Results and discussion

Importing data into R

First, the data was imported into R using the readxl package. The readxl package lets users load Excel spreadsheet files with three functions: read_excel(), which reads files with either xls or xlsx extensions; read_xls(), which reads files with xls extensions; and read_xlsx(), which reads files with xlsx extensions. All three functions return a tibble. A tibble is a modern form of the R data frame whose printed output gives an overview of the data: the number of rows and columns, the first 10 rows, as many columns as fit on screen, and each column's type (i.e., numeric, character, date-time, logical, etc.) (Wickham and Wickham 2016; Wickham et al. 2019c; Wright 2023). The data used in this study had an xlsx extension, so it could be imported with either read_excel() or read_xlsx(). The file was imported using read_excel(). The first argument to this function is the path to the file. The na argument lists the strings that should be treated as missing, so that all of them are read in as NA. The result was assigned to the object data. Specifically, the following code was used to import the xlsx file into R using the readxl package:

#import data into R (read_excel() comes from the readxl package)
library(readxl)
data <- read_excel("sample.xlsx", na = c("", "N/A", "NA", "Missing"))

Overview of data

It is considered good practice to inspect the dimensions and structure of any data set after importing it into R. In this study, the following functions were used to explore the data: view(), glimpse(), str(), dim(), colnames(), and print(width = Inf). The view() function opens the object in a spreadsheet-style viewer in a new window, with filtering options for the variables of interest. The glimpse() function, available through the dplyr package in tidyverse, shows the number of rows and columns in the data along with each column's type (numeric, character, or date-time) and its first few values. The str() function displays the internal structure of an object: its class (data frame, tibble, matrix, etc.), the number of rows and columns, and the type of each column. The dim() function returns the number of rows and columns. The colnames() function returns the names of all columns in an object. Calling print() with width = Inf displays all columns of a tibble, but still only the first 10 rows. The columns in this data were of type double (numeric), character, or date-time (ORDERDATE).

#overview of data
glimpse(data)

## Rows: 2,823
## Columns: 25
## $ ORDERNUMBER      <dbl> 10107, 10121, 10134, 10145, 10159, 10168, 10180, 1018…
## $ QUANTITYORDERED  <dbl> 30, 34, 41, 45, 49, 36, 29, 48, 22, 41, 37, 23, 28, 3…
## $ PRICEEACH        <dbl> 95.70, 81.35, 94.74, 83.26, 100.00, 96.66, 86.13, 100…
## $ ORDERLINENUMBER  <dbl> 2, 5, 2, 6, 14, 1, 9, 1, 2, 14, 1, 7, 2, 2, 1, 6, 9, …
## $ SALES            <dbl> 2871.00, 2765.90, 3884.34, 3746.70, 5205.27, 3479.76,…
## $ ORDERDATE        <dttm> 2003-02-24, 2003-05-07, 2003-07-01, 2003-08-25, 2003…
## $ STATUS           <chr> "Shipped", "Shipped", "Shipped", "Shipped", "Shipped"…
## $ QTR_ID           <dbl> 1, 2, 3, 3, 4, 4, 4, 4, 4, 1, 1, 2, 2, 2, 3, 3, 3, 4,…
## $ MONTH_ID         <dbl> 2, 5, 7, 8, 10, 10, 11, 11, 12, 1, 2, 4, 5, 6, 7, 8, …
## $ YEAR_ID          <dbl> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003,…
## $ PRODUCTLINE      <chr> "Motorcycles", "Motorcycles", "Motorcycles", "Motorcy…
## $ MSRP             <dbl> 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 9…
## $ PRODUCTCODE      <chr> "S10_1678", "S10_1678", "S10_1678", "S10_1678", "S10_…
## $ CUSTOMERNAME     <chr> "Land of Toys Inc.", "Reims Collectables", "Lyon Souv…
## $ PHONE            <chr> "2125557818", "26.47.1555", "+33 1 46 62 7555", "6265…
## $ ADDRESSLINE1     <chr> "897 Long Airport Avenue", "59 rue de l'Abbaye", "27 …
## $ ADDRESSLINE2     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Level 3", "S…
## $ CITY             <chr> "NYC", "Reims", "Paris", "Pasadena", "San Francisco",…
## $ STATE            <chr> "NY", NA, NA, "CA", "CA", "CA", NA, NA, "CA", NA, "Vi…
## $ POSTALCODE       <chr> "10022", "51100", "75508", "90003", NA, "94217", "590…
## $ COUNTRY          <chr> "USA", "France", "France", "USA", "USA", "USA", "Fran…
## $ TERRITORY        <chr> NA, "EMEA", "EMEA", NA, NA, NA, "EMEA", "EMEA", NA, "…
## $ CONTACTLASTNAME  <chr> "Yu", "Henriot", "Da Cunha", "Young", "Brown", "Hiran…
## $ CONTACTFIRSTNAME <chr> "Kwai", "Paul", "Daniel", "Julie", "Julie", "Juri", "…
## $ DEALSIZE         <chr> "Small", "Small", "Medium", "Medium", "Medium", "Medi…
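
The other inspection functions described above can be tried on a small stand-in object. The following is a minimal sketch in base R, assuming a toy data frame (named toy here, a hypothetical name) built from three columns of the first three rows shown in the glimpse() output; it is not the full imported data.

```r
# Toy data frame: three columns from the first three rows of the glimpse() output
toy <- data.frame(
  ORDERNUMBER = c(10107, 10121, 10134),
  STATUS = c("Shipped", "Shipped", "Shipped"),
  SALES = c(2871.00, 2765.90, 3884.34),
  stringsAsFactors = FALSE
)

toy_dim <- dim(toy)              # number of rows, then number of columns
toy_names <- colnames(toy)       # column names, in order
toy_types <- sapply(toy, class)  # class of each column

print(toy_dim)
print(toy_names)
str(toy)                         # compact view of the structure and first values
```

On the real data, dim(data) would be expected to agree with the row and column counts that glimpse() reports.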

Missing data

Having a general overview of the data provides a roadmap for how the analysis should proceed. However, an overview alone does not reveal the full structure of a dataset; combining it with a summary of missing values gives a more in-depth understanding of the data. In this study, the strings listed in the na argument were converted to NA during import, so that every missing value is represented by a single marker. Summarizing both the missing values and the complete observations is key to understanding the sample. To find the total number of missing values, R provides is.na(), which can be paired with sum() to count the missing values in any data set or object.

#total number of missing values
sum(is.na(data))

## [1] 5157

Thus sum(is.na(data)) prints 5,157, meaning that 5,157 cells in data contain NA. Further, the number of rows that are complete cases (rows with no missing values) can be found with the complete.cases() function.

#total number of rows without NAs
sum(complete.cases(data))

## [1] 147

Summing the result of complete.cases() gives 147, meaning that 147 of the 2,823 rows are complete cases with no missing values. Similarly, the number of rows with at least one missing value is found by negating the result with !, that is, sum(!complete.cases(data)).

#total number of rows with at least an NA
sum(!complete.cases(data))

## [1] 2676

This prints 2,676, meaning there are 2,676 rows with at least one missing value.
Thus, the data in this study has 2,823 observations: 147 complete observations with no missing values, and 2,676 observations with at least one missing value.
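
The difference between counting missing cells and counting incomplete rows can be seen on a small example. The sketch below uses a hypothetical five-row data frame (toy, with illustrative values) and adds colSums(is.na()), a base R idiom not used in the study, to show which columns contribute the missing values.

```r
# Hypothetical toy data: illustrative values only, not the study's data
toy <- data.frame(
  city = c("NYC", "Reims", "Paris", "Pasadena", "Melbourne"),
  state = c("NY", NA, NA, "CA", "Victoria"),
  territory = c(NA, "EMEA", "EMEA", NA, "APAC"),
  stringsAsFactors = FALSE
)

na_cells <- sum(is.na(toy))                   # total number of NA cells
na_per_column <- colSums(is.na(toy))          # NA count for each column
complete_rows <- sum(complete.cases(toy))     # rows with no NA
incomplete_rows <- sum(!complete.cases(toy))  # rows with at least one NA

print(na_per_column)
print(c(cells = na_cells, complete = complete_rows, incomplete = incomplete_rows))
```

Complete and incomplete rows always add up to the number of rows, just as 147 and 2,676 add up to 2,823 in the study's data.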

Data transformation

Data transformation is the process of converting, structuring, and cleaning data into a usable format for easier manipulation and analysis (Osborne 2002). For easier analysis, column names were changed from uppercase to lowercase with clean_names() from the janitor package, which converts all column names to snake case. In addition, the ordernumber column was changed from numeric to character; the orderdate column was changed from date-time to date; the month_id column was changed from numeric to an ordered factor of month names; a new column, day_of_week, was created to store the day of the week of each orderdate as a name rather than a number; and the dealsize column was changed from character to a factor with three levels: Large, Medium, and Small. The qtr_id, year_id, country, productline, and status columns were also converted to factors. The following code was used to transform the data, and the result was saved back into the object data:

#clean and transform data
#clean_names() is from janitor; as_date(), month(), wday(), and quarter() are from lubridate
library(janitor)
library(lubridate)
data <- data %>%
  clean_names() %>%
  mutate(ordernumber = as.character(ordernumber),
         #orderdate is read in as date-time, so as_date() alone drops the time part
         orderdate = as_date(orderdate),
         month_id = month(orderdate, label = TRUE, abbr = FALSE),
         day_of_week = wday(orderdate, label = TRUE, abbr = FALSE),
         dealsize = factor(dealsize),
         #quarter labels derived from the order date (January to March = 1st Quarter, etc.)
         qtr_id = factor(quarter(orderdate), levels = 1:4,
                         labels = c("1st Quarter", "2nd Quarter", "3rd Quarter", "4th Quarter")),
         year_id = factor(year_id),
         country = factor(country),
         productline = factor(productline),
         status = factor(status))

#overview of data
glimpse(data)

## Rows: 2,823
## Columns: 26
## $ ordernumber      <chr> "10107", "10121", "10134", "10145", "10159", "10168",…
## $ quantityordered  <dbl> 30, 34, 41, 45, 49, 36, 29, 48, 22, 41, 37, 23, 28, 3…
## $ priceeach        <dbl> 95.70, 81.35, 94.74, 83.26, 100.00, 96.66, 86.13, 100…
## $ orderlinenumber  <dbl> 2, 5, 2, 6, 14, 1, 9, 1, 2, 14, 1, 7, 2, 2, 1, 6, 9, …
## $ sales            <dbl> 2871.00, 2765.90, 3884.34, 3746.70, 5205.27, 3479.76,…
## $ orderdate        <date> 2003-02-24, 2003-05-07, 2003-07-01, 2003-08-25, 2003…
## $ status           <fct> Shipped, Shipped, Shipped, Shipped, Shipped, Shipped,…
## $ qtr_id           <fct> 1st Quarter, 2nd Quarter, 3rd Quarter, 3rd Quarter, 4…
## $ month_id         <ord> February, May, July, August, October, October, Novemb…
## $ year_id          <fct> 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003, 2003,…
## $ productline      <fct> Motorcycles, Motorcycles, Motorcycles, Motorcycles, M…
## $ msrp             <dbl> 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 95, 9…
## $ productcode      <chr> "S10_1678", "S10_1678", "S10_1678", "S10_1678", "S10_…
## $ customername     <chr> "Land of Toys Inc.", "Reims Collectables", "Lyon Souv…
## $ phone            <chr> "2125557818", "26.47.1555", "+33 1 46 62 7555", "6265…
## $ addressline1     <chr> "897 Long Airport Avenue", "59 rue de l'Abbaye", "27 …
## $ addressline2     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "Level 3", "S…
## $ city             <chr> "NYC", "Reims", "Paris", "Pasadena", "San Francisco",…
## $ state            <chr> "NY", NA, NA, "CA", "CA", "CA", NA, NA, "CA", NA, "Vi…
## $ postalcode       <chr> "10022", "51100", "75508", "90003", NA, "94217", "590…
## $ country          <fct> USA, France, France, USA, USA, USA, France, Norway, U…
## $ territory        <chr> NA, "EMEA", "EMEA", NA, NA, NA, "EMEA", "EMEA", NA, "…
## $ contactlastname  <chr> "Yu", "Henriot", "Da Cunha", "Young", "Brown", "Hiran…
## $ contactfirstname <chr> "Kwai", "Paul", "Daniel", "Julie", "Julie", "Juri", "…
## $ dealsize         <fct> Small, Small, Medium, Medium, Medium, Medium, Small, …
## $ day_of_week      <ord> Monday, Wednesday, Tuesday, Monday, Friday, Tuesday, …
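
As a quick cross-check of the derived date columns, the quarter and weekday can also be computed in base R. This is only a sketch: it uses the first four order dates shown in the output above, and the object names (order_dates, quarter_labels, qtr, iso_weekday) are hypothetical. The ISO weekday number is used instead of day names so that the result does not depend on the locale.

```r
# First four order dates from the glimpse() output above
order_dates <- as.Date(c("2003-02-24", "2003-05-07", "2003-07-01", "2003-08-25"))

# quarters() returns "Q1" to "Q4"; map these onto the labels used in the study
quarter_labels <- c(Q1 = "1st Quarter", Q2 = "2nd Quarter",
                    Q3 = "3rd Quarter", Q4 = "4th Quarter")
qtr <- unname(quarter_labels[quarters(order_dates)])

# ISO weekday number: 1 = Monday, ..., 7 = Sunday
iso_weekday <- as.integer(format(order_dates, "%u"))

print(data.frame(order_dates, qtr, iso_weekday))
```

The four rows come out as Monday, Wednesday, Tuesday, and Monday, in the 1st, 2nd, 3rd, and 3rd quarters, which agrees with the day_of_week and qtr_id values printed above.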

Revenue generated

Revenue changed from year to year; however, revenue generated exceeded expected revenue, defined as MSRP (retail price) multiplied by quantity ordered. From 2003 to 2005, revenue dropped by 49%, which equated to over $1.7M. Although total revenue dropped by 49%, average revenue over the same period increased by 6.6% (i.e., from $3517 to $3748).
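
The percentage changes quoted above follow the usual formula, (new - old) / old * 100. A minimal sketch, assuming a helper function named pct_change (a hypothetical name, not part of the study's code) and the rounded averages reported in the text:

```r
# Percentage change from an old value to a new value
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Average revenue in 2003 and 2005, as reported (rounded to whole dollars)
avg_change <- pct_change(3517, 3748)
print(round(avg_change, 1))
```

With these rounded inputs the result is about 6.6, matching the reported increase in average revenue.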

#revenue generated
q1 <- data %>%
  select(sales, msrp, quantityordered, year_id) %>%
  mutate(sales_expected = msrp * quantityordered) %>%
  group_by(year_id) %>%
  summarise(revenue = round(sum(sales), digits = 2),
            expected_revenue = round(sum(sales_expected), digits = 2))

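#bar chart of revenue with a line overlay (plot_ly() and add_trace() are from plotly)
library(plotly)
plot_ly(
  q1,
  x = ~year_id,
  y = ~revenue,
  name = "Revenue Chart",
  type = "bar"
) %>%
  #add_trace() inherits the data from plot_ly(), so q1 is not passed again
  add_trace(
    x = ~year_id,
    y = ~revenue,
    type = "scatter",
    mode = "lines",
    name = "Revenue Line"
  )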

#mean plot of revenue
q11 <- data %>%
  select(sales, msrp, quantityordered, year_id) %>%
  mutate(sales_expected = msrp * quantityordered) %>%
  group_by(year_id) %>%
  summarise(n = n(),
            mean = mean(sales, na.rm = TRUE),
            sd = sd(sales, na.rm = TRUE),
            se = sd / sqrt(n),
            ci = qt(0.975, df = n - 1) * sd / sqrt(n))

plot <- ggplot(q11, aes(x = year_id, y = mean, group = 1)) +
  geom_point(size = 3) +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .1) +
  ggtitle("Mean Plot of Average Revenue by Year") +
  theme_classic()
ggplotly(plot)
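
The summary above computes both a standard error (se) and a 95% confidence half-width (ci), while the error bars are drawn with se. The sketch below shows how the two differ, using the first six sales values from the glimpse() output as a small illustrative sample; the object names are hypothetical, and the cross-check against t.test() is an addition for illustration.

```r
# First six sales values shown in the glimpse() output (illustrative sample)
sales <- c(2871.00, 2765.90, 3884.34, 3746.70, 5205.27, 3479.76)

n <- length(sales)
m <- mean(sales)
s <- sd(sales)
se <- s / sqrt(n)                     # standard error of the mean
ci <- qt(0.975, df = n - 1) * se      # half-width of the 95% confidence interval

se_bar <- c(lower = m - se, upper = m + se)
ci_bar <- c(lower = m - ci, upper = m + ci)

# t.test() reports the same 95% interval for the mean
ci_check <- as.numeric(t.test(sales)$conf.int)

print(rbind(se_bar, ci_bar))
```

Error bars of plus or minus one standard error are narrower than a 95% confidence interval, so substituting ci for se in geom_errorbar() would give wider bars.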

Further, across the 19 countries in which consumers were located, consumers from the USA accounted for 36% of revenue generated, over $3.6M. Spain was next with over $1.2M, a revenue share of 12%. France was third with 11%, or $1.1M. Australia followed with 6% (over $630,000), then the UK (5%, $478,000). These 5 countries made up about 70% of total revenue generated. The remaining 30% came from consumers in 14 other countries: Italy, Finland, Norway, Singapore, Canada, Denmark, Germany, Sweden, Austria, Japan, Belgium, Switzerland, Philippines, and Ireland. Denmark recorded the highest average revenue, followed by Switzerland, Sweden, Austria, and Singapore. These five countries ranked in the lower half (0-50th percentile) by total revenue generated, but in the top quartile (above the 75th percentile) by average revenue.

#average revenue generated by country
q2 <- data %>%
  select(sales, country) %>%
  group_by(country) %>%
  summarise(n = n(),
            sum = sum(sales),
            mean = mean(sales, na.rm = TRUE),
            sd = sd(sales, na.rm = TRUE),
            se = sd / sqrt(n),
            ci = qt(0.975, df = n - 1) * sd / sqrt(n)) %>%
  arrange(desc(sum))

#keep countries with average revenue above 3635, then take the five highest
q22 <- q2 %>%
  arrange(desc(mean)) %>%
  filter(mean > 3635)
#quantile(q22$mean, probs = seq(0, 1, 1/4))
q22 <- q22[1:5, ]

plot <- ggplot(q22, aes(x = country, y = mean, group = 1)) +
  geom_point(size = 3) +
  geom_line() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = .1) +
  ggtitle("Mean Plot of Average Revenue by Countries") +
  theme_classic()
ggplotly(plot)
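
The revenue shares discussed above are each country's total divided by the overall total, and the "about 70%" figure is a cumulative share. A minimal sketch of that calculation in base R, using hypothetical totals for four placeholder countries (A to D) instead of the study's figures:

```r
# Hypothetical revenue totals for four placeholder countries
revenue <- c(A = 500, B = 300, C = 150, D = 50)

share <- revenue / sum(revenue) * 100                # share of total revenue (%)
cum_share <- cumsum(sort(share, decreasing = TRUE))  # running total, largest first

print(round(share, 1))
print(round(cum_share, 1))
```

Applied to the sum column of q2, the same two lines would give each country's share and show how many countries are needed to reach a given percentage of revenue.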

Relationship between price and revenue

While revenue generated decreased by 49%, average price decreased by 0.42%, from $83.79 to $83.45. The Pearson correlation between price and revenue was 0.7, a positive correlation: higher prices tended to go with higher revenue, and vice versa. The Pearson product-moment correlation test between these two variables was significant, with a p value less than 0.05. This is consistent with a positive linear association between price and revenue.

#correlation between price and revenue
plot <- ggplot(data, aes(x = priceeach, y = sales)) +
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ x,
              geom = "smooth",
              linewidth = 2) +
  labs(title = "Correlation b/n price and revenue") +
  theme_classic() +
  #theme() comes after theme_classic(), which would otherwise reset these settings
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5))

ggplotly(plot)
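
The section reports a Pearson correlation and its significance test, while the code shown draws only the scatter plot. The following sketch shows how such a test can be run with cor.test() in base R. The price and revenue vectors are hypothetical toy values, chosen so the result is easy to verify by hand; they are not the study's data.

```r
# Hypothetical toy values, not taken from the sales data
price <- c(1, 2, 3, 4, 5, 6, 7, 8)
revenue <- c(2, 1, 4, 3, 6, 5, 8, 7)

# Pearson product-moment correlation test (the default method of cor.test())
res <- cor.test(price, revenue, method = "pearson")

r <- unname(res$estimate)   # correlation coefficient
p <- res$p.value            # p value for the null hypothesis of zero correlation

print(round(r, 3))
print(p < 0.05)
```

On the study's data, the equivalent call would presumably be cor.test(data$priceeach, data$sales), with the sign of the estimate giving the direction of the relationship and the p value its significance.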

Influences of quarter of the year on revenue

Quarter 4 generated more revenue than any other quarter in both 2003 and 2004. For this reason, the study used Student's t-test to compare mean revenue in the 4th quarter of 2003 with mean revenue in the 4th quarter of 2004. The test returned a p value greater than 0.05, so the null hypothesis of equal means could not be rejected: there was no significant difference in mean revenue between the two 4th quarters. Note that this test compares the 4th quarter across two years; it does not by itself test whether quarter of the year influences revenue. Further, in the box plots of revenue by quarter, the boxes overlapped with each other, suggesting no marked difference among quarters.

#total revenue by year and quarter
plot <- data %>%
  select(year_id, sales, qtr_id) %>%
  group_by(year_id, qtr_id) %>%
  #.groups = "drop" returns an ungrouped result and avoids the grouping message
  summarise(revenue = sum(sales), .groups = "drop") %>%
  ggplot(aes(x = year_id, y = revenue)) +
  geom_bar(aes(fill = qtr_id), position = "dodge", stat = "identity") +
  theme_classic()

ggplotly(plot)

#box plot of sales by quarter, one panel per year
box_plot <- ggplot(data, aes(x = qtr_id, y = sales, fill = qtr_id)) +
  geom_boxplot() +
  facet_wrap(~year_id, scales = "free") +
  ggtitle("Box Plot of revenue and quarter") +
  theme(axis.text.x = element_blank(),
        axis.ticks.x = element_blank(),
        axis.text.y = element_blank(),
        axis.ticks.y = element_blank())
ggplotly(box_plot)

#sales in the 4th quarter of 2003
qq1 <- data %>%
  select(sales, qtr_id, year_id) %>%
  filter(qtr_id == "4th Quarter" & year_id == "2003")

#sales in the 4th quarter of 2004
qq11 <- data %>%
  select(sales, qtr_id, year_id) %>%
  filter(qtr_id == "4th Quarter" & year_id == "2004")

#Student's t-test (equal variances assumed) comparing the two means
tt <- t.test(qq1$sales, qq11$sales, var.equal = TRUE)
alpha <- 0.05
if (tt$p.value < alpha) {
  print("Mean revenue differs significantly between 4th quarter 2003 and 4th quarter 2004")
} else {
  print("No significant difference in mean revenue between 4th quarter 2003 and 4th quarter 2004")
}

## [1] "No significant difference in mean revenue between 4th quarter 2003 and 4th quarter 2004"
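
The test above assumes equal variances in the two groups (var.equal = TRUE). Leaving that argument at its default gives Welch's t-test, which does not make this assumption. The sketch below compares the two on hypothetical toy samples (q4_a and q4_b, illustrative values only) and is not a re-analysis of the study's data.

```r
# Hypothetical toy samples standing in for two groups of sales values
q4_a <- c(10, 12, 11, 13, 9)
q4_b <- c(10.5, 11.5, 12, 9.5, 11)

pooled <- t.test(q4_a, q4_b, var.equal = TRUE)  # Student's t-test
welch <- t.test(q4_a, q4_b)                     # Welch's t-test (the default)

alpha <- 0.05
# p >= alpha: fail to reject the null hypothesis that the two means are equal
pooled_reject <- pooled$p.value < alpha
welch_reject <- welch$p.value < alpha

print(c(pooled_p = pooled$p.value, welch_p = welch$p.value))
print(c(pooled_df = unname(pooled$parameter), welch_df = unname(welch$parameter)))
```

In both versions, a p value above alpha means only that a difference in means was not detected; it is not evidence that the means are exactly equal.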

Relationship between quantity ordered and revenue generated

Quantity ordered and revenue generated had a significant relationship (p < 0.05) and a positive Pearson correlation coefficient of 0.6. In simple terms, this means that as quantity ordered increases, revenue generated also tends to increase.

#Pearson correlation test between quantity ordered and revenue
test <- cor.test(data$quantityordered, data$sales)
alpha <- 0.05
if (test$p.value < alpha) {
  print("Significant relationship between variables")
} else {
  print("No significant relationship between variables")
}

## [1] "Significant relationship between variables"

#the direction comes from the sign of the correlation estimate, not from the p value
if (test$estimate > 0) {
  print("There is a positive relationship")
} else {
  print("There is a negative relationship")
}

## [1] "There is a positive relationship"

plot <- ggplot(data, aes(x = quantityordered, y = sales)) +
  geom_point() +
  stat_smooth(method = "lm",
              formula = y ~ x,
              geom = "smooth",
              linewidth = 2) +
  labs(title = "Correlation b/n quantity ordered and revenue") +
  theme_classic() +
  #theme() comes after theme_classic(), which would otherwise reset these settings
  theme(legend.position = "none",
        plot.title = element_text(hjust = 0.5))

ggplotly(plot)

Relationship between day of the week and quantity ordered

A one-way analysis of variance (ANOVA) was used to test whether mean quantity ordered differs by day of the week. The study reported a p value less than 0.05 and read this as evidence that the group means are not all equal, that is, that day of the week has a significant relationship with quantity ordered. This reading is valid only when day of the week is the predictor in the model formula. Descriptively, consumers tended to order more on Fridays, Wednesdays, and Thursdays. These three days made up 60% of all quantities ordered during the week (over 59,000 units). Tuesdays and Mondays made up about 31% (over 31,000 units), and weekends (Saturdays and Sundays) made up 9% (over 8,600 units). Saturdays and Sundays were the days on which consumers ordered the least.

#one-way ANOVA of quantity ordered by day of the week
#the predictor is day_of_week; using factor(quantityordered) as the predictor would
#model the response on itself and give a perfect fit (zero residual sum of squares)
ee <- aov(quantityordered ~ day_of_week, data = data)
print(summary(ee))

plot <- ggplot(data) +
  aes(x = day_of_week, y = quantityordered, color = day_of_week) +
  geom_jitter() +
  labs(title = "Relationship between day of week and quantity ordered") +
  theme_classic() +
  theme(legend.position = "none") 

ggplotly(plot)

p2 <- data %>%
  select (day_of_week, quantityordered) %>%
  group_by(day_of_week) %>%
  summarise(sum = sum(quantityordered)) %>%
  arrange(desc(sum)) %>%
  ggplot(aes(x = reorder(day_of_week, -sum), y = sum)) +
  geom_bar(stat = "identity")
ggplotly(p2)
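
The structure of a one-way ANOVA with a grouping factor as the predictor can be illustrated on a small example. This is a minimal sketch in base R on hypothetical toy data (three days with four orders each, illustrative values only); the names toy, fit, tab, and p_value are assumptions introduced for the example.

```r
# Hypothetical toy data: quantity ordered on three days, four orders per day
toy <- data.frame(
  day = factor(rep(c("Mon", "Wed", "Sat"), each = 4),
               levels = c("Mon", "Wed", "Sat")),
  qty = c(30, 34, 32, 36, 40, 45, 42, 41, 20, 22, 25, 21)
)

# Response on the left of the formula, grouping factor on the right
fit <- aov(qty ~ day, data = toy)
tab <- summary(fit)[[1]]        # ANOVA table as a data frame
p_value <- tab[["Pr(>F)"]][1]   # p value for the day effect

print(tab)
print(p_value < 0.05)
```

The table has k - 1 degrees of freedom for the factor (here 2) and n - k for the residuals (here 9). A significant F test says that at least one group mean differs; a follow-up such as TukeyHSD(fit) would be needed to see which days differ.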

Conclusion

This study would benefit from further inferential analysis, which would make it possible to detect important differences between variables and relevant correlations (Marshall and Jonker 2011). Further, consumer segmentation could be applied to organize consumers into groups based on shared behaviors, characteristics, or preferences (Westad, Hersleth, and Lea 2004a). This is mostly done with the purpose of delivering more relevant experiences based on consumer data (Westad, Hersleth, and Lea 2004b; Sahmer, Vigneau, and Qannari 2006). In addition, predictive analysis could be employed to make predictions based on consumer data (Surendro 2019).

References

Gabbott, Mark, and Gill Hogg. 2016. “Consumer Behaviour.” In, 150–65. Routledge.

Jisana, T. K. 2014. “Consumer Behaviour Models: An Overview.” Sai Om Journal of Commerce & Management 1 (5): 34–43.

Marshall, Gill, and Leon Jonker. 2011. “An Introduction to Inferential Statistics: A Review and Practical Guide.” Radiography 17 (1): e1–6.

Osborne, Jason. 2002. “Notes on the Use of Data Transformations.” Practical Assessment, Research, and Evaluation 8 (1): 6.

Priest, Jane, Stephen Carter, and David A. Statt. 2013. “Consumer Behaviour.” Edinburgh Business School, Heriot-Watt University, UK.

Sahmer, Karin, Evelyne Vigneau, and El Mostafa Qannari. 2006. “A Cluster Approach to Analyze Preference Data: Choice of the Number of Clusters.” Food Quality and Preference 17 (3-4): 257–65.

Schiffman, Leon, David Bednall, Elizabeth Cowley, Aron O’Cass, Judith Watson, and Leslie Kanuk. n.d. Consumer Behaviour. Prentice Hall Australia. Citeseer.

Surendro, Kridanto. 2019. “Predictive Analytics for Predicting Customer Behavior.” In, 230–33. IEEE.

Westad, F., M. Hersleth, and P. Lea. 2004a. “Strategies for Consumer Segmentation with Applications on Preference Data.” Food Quality and Preference 15 (7-8): 681–87.

———. 2004b. “Strategies for Consumer Segmentation with Applications on Preference Data.” Food Quality and Preference 15 (7-8): 681–87.

Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, and Jim Hester. 2019a. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686.

———. 2019b. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686.

———. 2019c. “Welcome to the Tidyverse.” Journal of Open Source Software 4 (43): 1686.

Wickham, Hadley, and Hadley Wickham. 2016. Data Analysis. Springer.

Wright, Liam. 2023. “Many Models in R: A Tutorial.”

Applying statistical methods to understanding consumer data using sample sales data in R

Osei Akoto Kwarteng