Introduction: Area of Interest

During my time at Xavier I have been exposed to a variety of data sets and marketing situations involving food of different sorts. Growing up, my mom worked for a company that supplied grocery stores and restaurants across North America, so food has always interested me. At Xavier I was able to take this interest to the next level: understanding why products are priced the way they are, whom they target, why they are promoted the way they are, and why they are placed in different areas.

Learning the tools to analyze problems at both a marketing level and a business analytics level has led me to value how data can answer questions about how to conduct business, especially around food. In this project I look at different types of fats and oils to understand how they are priced based on fat content, health rating, and several other variables. Fats and oils may not be the most spectacular products to analyze, but to me they are still interesting.

Packages Used:

| Package | Summary |
|---|---|
| tidyverse | The tidyverse collection of packages |
| DT | JavaScript-enabled data tables |
| knitr | R Markdown reporting |
| pander | Helps with generating summary tables |
| tidytext | Helps tokenize strings of words |
| DT | Helps to interface with websites |
| rvest | Assists in web scraping |
| httr | Helps in web scraping |
| PerformanceAnalytics | Used for correlation |
| stargazer | Helps create quality tables |
| stats | Used for regression |
| XML | Assists in scraping |

Data Collection:

The data comes from Kaggle and covers 42 different oils and fats. The data set contains several variables that are interesting to look at in relation to the price of each oil or fat. The data was downloaded, uploaded to OneDrive, and then read in here to create a usable data frame.

Download and reuse the data set using the link below:

https://myxavier-my.sharepoint.com/:x:/g/personal/drakec1_xavier_edu/ESqqwW2C6uNMk_QdukwIsNQBTZcf8ijIMhzM5AB2tPLkyQ?e=m0ZXVi
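The sketch below shows the "read in to create a usable data frame" step in Python. It assumes the sheet has been exported to CSV, uses the column names from the data dictionary below, and writes missing values as `NA`, as R does. The `SAMPLE_CSV` excerpt and the helpers `to_number` and `read_oils` are hypothetical. Only the values in the excerpt are real; they are copied from the price tables later in this report.

```python
import csv
import io

# Hypothetical CSV excerpt: column names follow the data dictionary, values are
# copied from the most/least expensive tables in this report.
SAMPLE_CSV = """Oil,UK retail cost per 100ml ($),retail markup,health rating
argan,17.48,3.59,-0.01
fish cod liver,16.94,69.04,0.59
suet,0.54,NA,-0.28
lard,0.50,3.30,-0.17
"""

NUMERIC_COLUMNS = ["UK retail cost per 100ml ($)", "retail markup", "health rating"]


def to_number(value):
    """Convert a CSV cell to float, treating 'NA' (R's missing marker) as None."""
    return None if value.strip() == "NA" else float(value)


def read_oils(text):
    """Parse CSV text into a list of dicts with the numeric columns converted."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        for column in NUMERIC_COLUMNS:
            record[column] = to_number(record[column])
        rows.append(record)
    return rows


oils = read_oils(SAMPLE_CSV)
print(len(oils), oils[2]["Oil"], oils[2]["retail markup"])  # 4 suet None
```

Keeping `NA` as `None` rather than `0` matters here. A markup of zero would pull averages down, while a missing markup should simply be left out.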

Data Dictionary:

There are a total of 42 observations and 17 variables.

| Variable Name | Variable Explained |
|---|---|
| index | Numbering of the oil |
| oil / fat | The oil or fat |
| type | The type of the oil or fat |
| taste strength index | Strength of taste |
| UK retail cost per 100ml ($) | Retail price per 100ml in the UK |
| retail markup | Percent retail markup / 100 |
| saturated % | Percent of saturated fats / 100 |
| polyunsaturated (omega 3 & 6) % | Percent of polyunsaturated fats / 100 |
| monounsaturated % | Percent of monounsaturated fats / 100 |
| other fat % | Percent of other fats / 100 |
| total reported fat content of oil % | Total percent fat of the oil |
| other things (water, protein etc.), % | Percent of non-fat content (water, protein, etc.) / 100 |
| omega 3 % | Percent of omega 3 / 100 |
| omega 6 % | Percent of omega 6 / 100 |
| health rating | Healthiness rating of the oil |
| keep in fridge? | Whether it needs refrigeration (Yes = 1, No = 0) |
| smoke point (C), average of sources | Average temperature at which the oil smokes, in degrees Celsius |
| Oil | Oil / fat and type combined |

Data Table:

This is a table created to easily view, search, sort, and filter the data.

Summary Statistics

These are some interesting Summary Statistics to look at before diving further into the data.

| Summary Statistic | Value |
|---|---|
| Average retail cost per 100ml ($) | 3.425 |
| Average percent total fat content | 98.48 |
| Average percent omega 3 & 6 content | 26.83 |
| Average health rating | 0.03619 |
| Average percent retail markup | 814.5 |
| Highest health rating | 1.87 |
| Lowest health rating | -0.65 |

Descriptive Analysis

Will there be a connection between the Health Rating of the Fat or Oil and the Retail Cost?

My hypothesis is that there will be a connection between the two, with price increasing as health rating increases. A ggplot scatter plot with a geom_smooth trend line is used to create the visualization.

What surprised me is that the graph does not show a clear connection between price and how healthy an oil is to consume. Price is low for most types of oils and fats; however, certain oils differ significantly from the rest in their health rating. This could be interesting to look at going forward.
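In ggplot2, `geom_smooth()` defaults to a loess curve when there are fewer than 1,000 points, while `geom_smooth(method = "lm")` draws an ordinary least-squares line instead. The sketch below shows what such a straight trend line computes. It applies the least-squares formula to the ten most expensive oils from the table further down, which is an illustrative subset rather than the full 42 oils. The helper name `least_squares_line` is hypothetical.

```python
# Health rating (x) and UK retail cost per 100ml (y) for the ten most
# expensive oils listed later in this report; an illustrative subset only.
health = [-0.01, 0.59, 0.08, 0.06, 0.06, 0.08, 0.07, -0.12, 0.06, -0.06]
price = [17.48, 16.94, 14.00, 11.20, 8.45, 7.59, 6.80, 5.40, 5.39, 4.45]


def least_squares_line(x, y):
    """Return (intercept, slope) of the ordinary least-squares fit y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


a, b = least_squares_line(health, price)
residuals = [yi - (a + b * xi) for xi, yi in zip(health, price)]
print(f"price = {a:.2f} + {b:.2f} * health rating")
```

The fitted line always passes through the point (mean health rating, mean price), and its residuals sum to zero. The sign of the slope is what a straight trend line on the graph shows.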

Is there a connection between the Taste Strength Index and the Retail Markup, leading to higher Retail Costs per 100ml?

My assumption is that there will be a connection: if oils with a higher taste strength index affect the flavor of food more, that would explain why they are sought after at higher prices.

There does not appear to be a strong connection between the taste strength index and the retail markup. However, this visualization did reveal that the high-end outlier in graph 1 carries a very large percent retail markup on top of an already high price, meaning it is sought after even at prices well above its true worth.

Is there a connection between the Oil Type Index and Cost per 100ml when looking at the Percent of Omega 3 & 6?

Judging by the previous graphs, I would not expect a connection between these variables. However, it would not shock me if oils higher in omega 3 and 6 were priced higher, since people buy items like these to replenish needed fats, which could create a price premium.

After analyzing the visualization, the omega 3 and 6 percentages within the oils do not appear to impact price significantly, although the trend suggests that the higher the omega content, the lower the expected price.

Is there a connection between the Oil Type Index and Retail Markup when looking at the Percent of Omega 3 and then looking at Percent of Omega 6?

I expect that one of the two Omega percents will play a larger role in contributing to the Retail Markup, as I assume one is more valuable than the other.

Looking at retail markup against the omega 3 and omega 6 percentages, neither appears connected to how much the price is marked up. To me, this means the value behind the product is not strongly tied to either attribute of the oil.

The tables below give one last view of which oils sell for higher prices.

Because most of the variables considered against price per 100ml showed little significance, I would assume that the value some of these oils provide does not stem from their concentrations of different fat types or their omega 3 and omega 6 percentages, but rather from their applicability to different uses.

10 Most Expensive Oils per 100ml

| Oil | UK retail cost per 100ml ($) | retail markup | health rating |
|---|---|---|---|
| argan | 17.48 | 3.59 | -0.01 |
| fish cod liver | 16.94 | 69.04 | 0.59 |
| cashew | 14.00 | 16.77 | 0.08 |
| pecan nut | 11.20 | 3.52 | 0.06 |
| apricot kernel | 8.45 | 2.53 | 0.06 |
| almond refined | 7.59 | 2.99 | 0.08 |
| macadamia | 6.80 | 11.79 | 0.07 |
| duck fat | 5.40 | 1.74 | -0.12 |
| wheat germ | 5.39 | 5.05 | 0.06 |
| pumpkin seed | 4.45 | 0.64 | -0.06 |

10 Least Expensive Oils per 100ml

| Oil | UK retail cost per 100ml ($) | retail markup | health rating |
|---|---|---|---|
| sunflower linoleic, hydrogenated | 0.25 | 0.60 | 0.00 |
| corn refined | 0.29 | 2.14 | -0.05 |
| cottonseed | 0.32 | 1.94 | -0.20 |
| margarine hard | 0.44 | 3.50 | 0.04 |
| lard | 0.50 | 3.30 | -0.17 |
| suet | 0.54 | NA | -0.28 |
| canola / rapeseed refined | 0.56 | 4.08 | 0.31 |
| margarine soft: canola, palm, palm kernel mix | 0.64 | 4.61 | 0.24 |
| rice bran | 0.64 | 4.89 | 0.07 |
| peanut refined | 0.72 | 1.63 | -0.05 |
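Lists like these come from sorting on price and keeping the first rows. In R this would typically be done with dplyr's `arrange()` and `head()`. The minimal Python sketch below uses a handful of the prices from the tables above. The helper name `most_and_least_expensive` is hypothetical.

```python
# (Oil -> UK retail cost per 100ml) pairs taken from the two tables above.
prices = {
    "argan": 17.48, "fish cod liver": 16.94, "cashew": 14.00,
    "pecan nut": 11.20, "sunflower linoleic, hydrogenated": 0.25,
    "corn refined": 0.29, "cottonseed": 0.32, "lard": 0.50,
}


def most_and_least_expensive(price_by_oil, n):
    """Return the n highest-priced and the n lowest-priced oils."""
    # Sort oil names by their price, highest first.
    ranked = sorted(price_by_oil, key=price_by_oil.get, reverse=True)
    # The cheapest oils are the tail of the ranking, read in reverse.
    return ranked[:n], ranked[::-1][:n]


top, bottom = most_and_least_expensive(prices, 3)
print("Most expensive:", top)
print("Least expensive:", bottom)
```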

Secondary Data Source

Data was scraped from Twitter using a developer account, with a Twitter token used to authenticate the requests. Once scraped and written to a CSV file, the data was uploaded to OneDrive, where it could be used. 500 tweets on argan oil and 500 tweets on lard were gathered and then combined into one data set to simplify the sentiment analysis.

How do people feel about the two different Oils/Fats?

Because both argan oil and lard are used in everyday life by a wide variety of people, I thought it would be interesting to see how people react when tweeting about them. Argan oil is generally priced higher than lard, and its chemical makeup does little to explain that higher price, so I would expect a higher ratio of positive to negative words for argan oil.

As can be seen above, how people feel about each product lines up with its price. Argan oil is priced much higher than lard, and the feedback in these tweets suggests that people are willing to pay that price.
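The word-level approach behind this kind of comparison works in three steps: split each tweet into words, match the words against a sentiment lexicon, and count positive and negative matches per product. The sketch below follows those steps. The lexicon and tweets are hypothetical stand-ins, not the scraped data. In R, tidytext's `unnest_tokens()` together with a lexicon such as Bing would play the same role. The helper name `sentiment_counts` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon standing in for a real sentiment lexicon
# (for example the Bing lexicon available through tidytext).
LEXICON = {"love": "positive", "amazing": "positive", "soft": "positive",
           "greasy": "negative", "gross": "negative", "bad": "negative"}

# Hypothetical tweets, each labelled with the product it was collected for.
tweets = [
    ("argan", "I love argan oil, my hair is so soft"),
    ("argan", "Argan oil is amazing"),
    ("lard", "Lard makes pie crust flaky but it feels greasy"),
    ("lard", "cooking with lard is not bad"),
]


def sentiment_counts(labelled_tweets, lexicon):
    """Tokenize each tweet and count positive/negative words per product."""
    counts = {}
    for product, text in labelled_tweets:
        tokens = re.findall(r"[a-z']+", text.lower())
        tally = counts.setdefault(product, Counter())
        for token in tokens:
            if token in lexicon:
                tally[lexicon[token]] += 1
    return counts


result = sentiment_counts(tweets, LEXICON)
for product, tally in result.items():
    print(product, dict(tally))
```

Note that "not bad" is counted as negative, because single-word matching ignores negation. This is worth keeping in mind when reading word-level sentiment counts.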

Predictive or Prescriptive Analysis

How Correlated Are Price and Health Rating?

Here I wanted to take a closer look at whether price and health rating move together; my assumption is that they will be strongly correlated.

## 
## Regression Results
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##             `UK retail cost per 100ml ($)`    
## -----------------------------------------------
## `health rating`            2.184 (1.766)       
## Constant                 3.341*** (0.687)      
## -----------------------------------------------
## Observations                    41             
## R2                             0.038           
## Adjusted R2                    0.013           
## Residual Std. Error       4.380 (df = 39)      
## F Statistic             1.530 (df = 1; 39)     
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01

The regression results show that price and health rating have very little to do with each other. The health rating coefficient (2.184) is not statistically significant, and health rating explains only about 4% of the variation in price (R2 = 0.038). This is consistent with the earlier analysis in this document.
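The numbers in the regression table can be checked by hand. With a single predictor, three relationships hold: the squared t statistic of the slope equals the F statistic, R2 equals F / (F + residual df), and the Pearson correlation is the square root of R2 with the sign of the slope. The short Python check below uses the values from the table above.

```python
from math import sqrt

# Values copied from the regression table above.
coef, std_err = 2.184, 1.766
f_stat, df_resid = 1.530, 39
r_squared = 0.038

# t statistic for the health rating slope.
t_stat = coef / std_err
# One-predictor model: F = R2 / ((1 - R2) / df), so R2 = F / (F + df).
r_squared_from_f = f_stat / (f_stat + df_resid)
# Magnitude of Pearson r; its sign follows the positive slope.
pearson_r = sqrt(r_squared)

print(f"t = {t_stat:.3f}, t^2 = {t_stat ** 2:.3f} (F = {f_stat})")
print(f"R2 from F = {r_squared_from_f:.3f}, Pearson r = {pearson_r:.3f}")
```

With 39 degrees of freedom, a t statistic of about 1.24 falls well short of the roughly 1.68 needed for significance at the 10% level. That is why the coefficient carries no stars, and a correlation of about 0.19 counts as weak.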

Conclusion

In closing, this document shows that oils and fats are not priced according to their contents, or at least not according to the contents included in this data. Instead, they appear to be priced by their rarity and their applicability across different facets of everyday life.