Visualization Dashboard

Customer Analysis

Row
Fig1: Marketing Campaign (Customer: 2205)
Fig1: Average Number of Purchases

Row
Fig2: Marital status based on the level of education
Fig3: Average amount spent per household based on the number of kids
Fig4: Average amount spent per household based on the number of teens

Row
Fig5: Income vs. total amount spent based on Complain
Fig6: Income vs. total amount spent based on Response
Report: Data Visualization for a Food Company

Student Name: Nadesan Aneesha
Student Number: S3941371
URL:

Introduction

Data visualization combines scientific visualization, information visualization, and visual analytics, and covers the design and analysis of visual representations of data. In other words, data visualization is the analytic technique of communicating information through a graphical or visual medium. Consuming large data sets has never been straightforward. Some data sets are so large and so unstructured that it becomes practically impossible to discern anything useful from them, and this is where data visualization comes in. There may be thousands of entries, all in different formats, and the visualization has to be created from scratch, so the task is possible with data visualization but not simple.

A generalized data visualization involves different disciplines such as information technology, natural science, statistical analysis, graphics interaction, and geographic information. In this paper, the focus is on statistical analysis, also called visual analytics. Visual analytics is a new field that has evolved with the development of scientific visualization and information visualization. It emphasizes analytic reasoning through an interactive visual interface, which helps the analyst drill down into the data in any data visualization project.

The amount of information that humans gain through vision far exceeds that gained through any other sensory organ. Data visualization uses this natural human skill to make data processing and organization more efficient. Visualization can help humans deal with more complex information and retain it better in memory. In its simplest form, data visualization is a direct mapping from the data space to the graphics space, and this is where data visualization tools come in. The data visualization tool used in this paper is R. R will be used to visualize marketing data in order to study and analyze customer behavior and so maximize sales for a food company.
The objectives of the visual analytics in this paper are:
a) To determine how income is related to the total amount customers spend on buying products.
b) To determine whether complaints and customers' responses to offers during the campaign affect the amount customers spend on buying products.
c) To determine how the number of kids and teenagers in the household affects the average amount spent on buying products.
d) To visualize the data and offer advice and recommendations to the food company on how to improve sales and thus increase purchases.

Data and Method

The data was obtained from a public source on GitHub (Boaz, 2022) at https://github.com/nailson/ifood-data-business-analyst-test/blob/master/ifood_df.csv. It contains 2205 observations, one for each customer who provided data. The variables are described in the table below.

Variable              Description
AcceptedCmp1          1 if the customer accepted the offer during the first campaign, 0 otherwise
AcceptedCmp2          1 if the customer accepted the offer during the second campaign, 0 otherwise
AcceptedCmp3          1 if the customer accepted the offer during the third campaign, 0 otherwise
AcceptedCmp4          1 if the customer accepted the offer during the fourth campaign, 0 otherwise
AcceptedCmp5          1 if the customer accepted the offer during the fifth campaign, 0 otherwise
Response              1 if the customer accepted the offer during the last campaign, 0 otherwise
Complain              1 if the customer complained in the last 2 years
DtCustomer            The date the customer enrolled with the company
Education             The level of education of the customer
Marital               The marital status of the customer
Kidhome               The number of small children in the customer's household
Teenhome              The number of teenagers in the customer's household
Income                The yearly household income of the customer
MntFishProducts       Amount spent on fish products in the last 2 years
MntMeatProducts       Amount spent on meat products in the last 2 years
MntFruits             Amount spent on fruits in the last 2 years
MntSweetProducts      Amount spent on sweet products in the last 2 years
MntWines              Amount spent on wine in the last 2 years
MntGoldProds          Amount spent on gold products in the last 2 years
NumDealsPurchases     Number of purchases made with a discount
NumCatalogPurchases   Number of purchases made using the catalog
NumStorePurchases     Number of purchases made directly in stores
NumWebPurchases       Number of purchases made through the company's website
NumWebVisitsMonth     Number of visits to the company's website in the last month
Recency               Number of days since the last purchase
MntTotal              The total amount spent on fish, meat, fruit, sweet, and wine products

To enable easy analysis and achieve the desired objectives, the following preprocessing steps were carried out in R: creating a categorical variable (Marital) with 5 levels (Married, Widow, Together, Single, and Divorced); creating a categorical variable (Education) with 5 levels (2n_cycle, Basic, Graduation, Master, and PhD); creating a new numeric variable (Totpuchases) that sums the number of purchases across all purchasing options; and creating 2 data frames holding the average amount spent, one grouped by the number of kids per household and the other by the number of teenagers per household.
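The summing and grouping steps above can be sketched as follows. This is a minimal illustration in Python (standard library only), not the R code used for the report. The helper names add_total_purchases and average_spend_by are hypothetical, and the four rows are toy values, not records from ifood_df.csv.

```python
from collections import defaultdict
from statistics import mean

# Purchase-count columns listed in the variable table.
PURCHASE_COLUMNS = [
    "NumDealsPurchases",
    "NumCatalogPurchases",
    "NumStorePurchases",
    "NumWebPurchases",
]

# Toy rows for illustration only; these are NOT values from the data set.
rows = [
    {"Kidhome": 0, "Teenhome": 0, "MntTotal": 1200, "NumDealsPurchases": 1,
     "NumCatalogPurchases": 4, "NumStorePurchases": 8, "NumWebPurchases": 5},
    {"Kidhome": 0, "Teenhome": 1, "MntTotal": 800, "NumDealsPurchases": 2,
     "NumCatalogPurchases": 3, "NumStorePurchases": 6, "NumWebPurchases": 4},
    {"Kidhome": 1, "Teenhome": 0, "MntTotal": 100, "NumDealsPurchases": 2,
     "NumCatalogPurchases": 0, "NumStorePurchases": 3, "NumWebPurchases": 2},
    {"Kidhome": 1, "Teenhome": 1, "MntTotal": 300, "NumDealsPurchases": 3,
     "NumCatalogPurchases": 1, "NumStorePurchases": 4, "NumWebPurchases": 3},
]


def add_total_purchases(rows):
    """Return copies of the rows with a Totpuchases field added
    (the sum of purchases over all purchasing options)."""
    return [
        {**row, "Totpuchases": sum(row[col] for col in PURCHASE_COLUMNS)}
        for row in rows
    ]


def average_spend_by(rows, key):
    """Return {level: average MntTotal} for each level of `key`,
    e.g. key="Kidhome" or key="Teenhome"."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["MntTotal"])
    return {level: mean(values) for level, values in sorted(groups.items())}


with_totals = add_total_purchases(rows)
spend_by_kids = average_spend_by(rows, "Kidhome")
spend_by_teens = average_spend_by(rows, "Teenhome")
```

Each grouped result corresponds to one of the two data frames described above and holds one bar height per group for the bar charts.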
To meet the first objective, "To determine how income is related to the total amount of customer spending on buying products", two numerical variables, total amount spent and income, were plotted as a scatter plot combined with a slope line. The same visualization method was used for objective 2. For objective 3, bar charts were used to visualize the average amount spent based on the number of kids and teenagers in the household.
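The slope line drawn over the scatter plot can be illustrated numerically. The following is a minimal Python sketch (the report itself used R), assuming the slope line is an ordinary least-squares fit of total amount spent on income. The function name fit_line and the four data points are hypothetical and are not taken from the data set.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy points for illustration only (income, total amount spent).
income = [20000, 40000, 60000, 80000]
spent = [100, 500, 900, 1300]

slope, intercept = fit_line(income, spent)
```

A positive slope corresponds to an upward-sloping line over the scatter plot, that is, spending that rises with income.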

Visualizations and discussion

To plot the information shown in Figure 1, a stacked bar chart could have been used, stacking the levels of education on top of each other for each marital status. However, stacked bar charts make a visualization complex and difficult to comprehend (Hehman & Xie, 2021). Instead, a simple grouped bar chart was used. This type of visualization makes the information clearer and easier to comprehend, because humans judge values represented by length more accurately (Cleveland & McGill, 1985; Heer & Bostock, 2010). The audience, which is the management of the food company, will understand the information more easily from simple bar charts than from stacked bar charts.

Figure 1 indicates that the majority of the customers who buy food from the company are university graduates. Within this group, most are married, followed by customers who are single or living together. Customers with a master's degree or a Ph.D. form the second-largest groups, and within these groups married people are again the majority. Therefore, for the food company to increase sales more quickly, it needs to target customers who are university graduates or hold a master's degree or Ph.D., and who are married or single. This can be done through advertising and by creating products that target these groups of customers.

Figure 2 indicates that 2205 customers were involved in the collection of data for the food company. It also indicates that customers made an average of 14.9 purchases across the various purchasing options (catalog, website, discount, and in stores). This average is a warning sign for the food company, which therefore needs to improve sales to raise the average number of purchases made in the last 2 years. This can be done through advertisements, offering discounts, and creating affordable products.

To investigate how the number of kids and teenagers in the customer's household affects the average amount that a customer spends, bar charts were used (Figures 3 and 4). Pie charts could have been used here as well, but pie charts rely on area, arc length, and angle to present information as slices of a circle. Of these three parameters, humans are reasonably good at estimating only arc length, and poor at estimating angles and areas. Pie charts are discussed by Few and Edge (2007), but they should not be prioritized. Figure 3 shows that customers with kids at home spend considerably less on products from the food company. This may be due to many reasons, including cost and whether the products can also be consumed by kids. However, according to the data dictionary, the food company produces a variety of products, including fish products, so product diversity is met. The remaining problems may be the cost of the products and how well they match customer preferences.

Figure 4 indicates that parents with 1 or 2 teenagers spend more on average on products from the food company than parents with kids. The difference in the average amount spent between parents with and without teenagers is smaller than the difference between those with and without kids. There is an opportunity here: if the company chooses to put more effort into parents with teenagers at home, it is likely to achieve more sales more quickly.

The next step was to determine how income is related to the total amount the customer spent on products in the last 2 years. Both variables, income and total amount spent, take a large number of numerical values, and a line chart can be quite misleading given the variation in the data. Using a line chart is, therefore, not a good idea. Soma (2016) applied a line chart that follows each data point to visualize the relationship between two numerical variables, and the results were not impressive. In this scenario, a scatter plot, or a scatter plot combined with a slope line, does better (Hehman & Xie, 2021).

Figure 5 indicates that, in general, income and amount spent share a positive linear relationship. However, there is one instance where a customer earning less than $5,000 appears to have spent more than $1,500 on the company's products over 2 years. This is possible because the customer may not be the only earner in the household (they may be given money to spend by someone else in the household). There is also one instance where a customer earning more than $105,000 spent about $250 in 2 years. Without data visualization, the information shown in Figure 5 would have been hard to discern. Based on this visualization, the food company needs to focus more on creating affordable products to capture more sales from low earners.

The second objective could have called for stacked bar charts, because complaints and responses (the acceptance of the offer by the customer during the last campaign) are categorical variables and could easily be stacked. However, the use of scatter plots in Figures 5 and 6 was appropriate. Figure 5 shows that the food company is not receiving an alarming number of complaints, so the presence or absence of complaints does not affect sales much. The food company needs to keep the policy that is leading to few complaints. Figure 6, on the other hand, indicates that even though not many customers respond to offers during campaigns, those who respond appear to spend more on products than those who do not respond. This challenges the food company to devise ways to increase customer response to offers during campaigns. This can be done through adverts that target specific customers and through products made for specific customer segments.

Challenges

Animation generation methods, which are used for interactive visualizations, are only possible with time-varying data (Yu et al., 2010). Since the data used in this paper was not time-varying, animation methods were not applicable. Another challenge is that users differ in their choice of data visualization methods, which results in multiple data visualization tools and methods (Sadiku et al., 2016). Big data, whether structured or unstructured, is always a challenge when developing visualizations, because the diversity of the data must be considered during visualization (Li et al., 2015). The biggest challenge encountered in this paper was representing multiple dummy variables (more than 4) on the x-axis of a single bar chart in R. This problem was solved by collapsing the dummy variables into one categorical variable with multiple levels.
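Collapsing dummy variables into one categorical variable can be sketched as follows. This is a minimal Python illustration of the idea, not the R code used for the report. The column names (marital_Married and so on), the function name collapse_dummies, and the sample row are assumptions made for the example.

```python
def collapse_dummies(row, prefix):
    """Return the level whose dummy column (prefix + level) equals 1.

    Raises ValueError unless exactly one dummy with that prefix is set,
    since a row should belong to exactly one level of the category.
    """
    levels = [
        name[len(prefix):]
        for name, value in row.items()
        if name.startswith(prefix) and value == 1
    ]
    if len(levels) != 1:
        raise ValueError(
            f"expected exactly one active '{prefix}' dummy, found {len(levels)}"
        )
    return levels[0]


# Hypothetical row with one-hot (dummy) columns; not a record from the data set.
row = {
    "marital_Divorced": 0,
    "marital_Married": 1,
    "marital_Single": 0,
    "marital_Together": 0,
    "marital_Widow": 0,
    "education_Graduation": 0,
    "education_PhD": 1,
}

marital = collapse_dummies(row, "marital_")
education = collapse_dummies(row, "education_")
```

With one categorical value per customer, a single grouping variable can be placed on the x-axis of a bar chart instead of several separate 0/1 columns.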

References

Boaz, N. (2022, August 29). ifood-data-business-analyst-test. GitHub. https://github.com/nailson/ifood-data-business-analyst-test/blob/master/ifood_df.csv
Cleveland, W. S., & McGill, R. (1985). Graphical Perception and Graphical Methods for Analyzing Scientific Data. Science, 229(4716), 828–833. https://doi.org/10.1126/science.229.4716.828
Few, S., & Edge, P. (2007). Save the Pies for Dessert. https://courses.washington.edu/info424/2007/readings/Save%20the%20Pies%20for%20Dessert.pdf
Heer, J., & Bostock, M. (2010, April 15). Crowdsourcing graphical perception. Proceedings of the 28th International Conference on Human Factors in Computing Systems. https://doi.org/10.1145/1753326.1753357
Hehman, E., & Xie, S. Y. (2021). Doing Better Data Visualization. Advances in Methods and Practices in Psychological Science, 4(4), 251524592110453. https://doi.org/10.1177/25152459211045334
Li, X., Kuroda, A., Matsuzaki, H., & Nakajima, N. (2015, October 1). Advanced aggregate computation for large data visualization. IEEE Xplore. https://doi.org/10.1109/LDAV.2015.7348086
Sadiku, M., Shadare, A. E., Musa, S. M., Akujuobi, C. M., & Perry, R. (2016). Data visualization. International Journal of Engineering Research And Advanced Technology (IJERAT), 2(12), 11–16.
Yu, L., Lu, A., Ribarsky, W., & Chen, W. (2010). Automatic Animation for Time-Varying Data Visualization. Computer Graphics Forum, 29(7), 2271–2280. https://doi.org/10.1111/j.1467-8659.2010.01816.x