During my time at Xavier I have been exposed to a variety of data sets and marketing situations involving food of different sorts. Growing up, my mom worked for a company that supplied grocery stores and restaurants across North America, so food has always been of interest to me. At Xavier I was able to take this interest to the next level by understanding why products are priced the way they are, who they target, why they are promoted the way they are, and why they are placed in different areas.
Learning the tools to understand food on both a marketing level and a business analytics level has led me to really value the way that data can lead to valuable answers about how to conduct business, especially surrounding food. In this project I will be looking at different types of fats and oils, trying to understand how they are priced based on fat content, health rating, and several other variables. Fats and oils may not be the most spectacular product to analyze, and not every product within the food space will be, but to me they are still interesting.
| Package | Summary |
|---|---|
| tidyverse | The tidyverse collection of packages |
| DT | Javascript enabled data tables |
| knitr | RMarkdown reporting |
| pander | Helps with generating summary tables |
| tidytext | Helps tokenize strings of words |
| DT | Helps to interface with websites |
| rvest | Assists in webscraping |
| httr | Helps in webscraping |
| PerformanceAnalytics | Used for correlation |
| stargazer | Helps create quality tables |
| stats | Used for regression |
| XML | Assists in scraping |
The data comes from Kaggle. It covers 42 different oils and fats and includes several variables that will be interesting to look at in relation to the price of each oil or fat. The data was downloaded, uploaded to OneDrive, and then read in here to create a usable data frame.
Download and reuse the data set using the link below:
There are a total of 42 observations and 17 variables.
| Variable Name | Variable Explained |
|---|---|
| index | Numbering of the Oil |
| oil / fat | The Oil or Fat |
| type | The Type of the Oil or Fat |
| taste strength index | Strength of Taste |
| UK retail cost per 100ml ($) | Retail Price per 100ml in the UK |
| retail markup | Percent Retail Markup/100 |
| saturated % | Percent of Saturated Fats/100 |
| polyunsaturated (omega 3 & 6) % | Percent of Polyunsaturated Fats/100 |
| monounsaturated % | Percent of Monounsaturated Fats/100 |
| other fat % | Percent of Other Fats/100 |
| total reported fat content of oil % | Total Percent Fat of the Oil |
| other things (water, protein etc.), % | Percent of Additives/100 |
| omega 3 % | Percent of Omega 3/100 |
| omega 6 % | Percent of Omega 6/100 |
| health rating | Healthiness Rating of the Oil |
| keep in fridge? | Does it Need Refrigeration, Yes - 1, No - 0 |
| smoke point (C), average of sources | Average Temperature the Oil Smokes Celsius |
| Oil | Oil / Fat and Type Combined |
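As a minimal sketch of how the data could be read in and the combined Oil column built, the example below uses a tiny inline stand-in for the CSV instead of the real file. The three rows reuse names and prices from this report, but how each name is split between `oil / fat` and `type` is an assumption made for illustration, and the author's actual import code may differ.

```r
# Inline stand-in for the downloaded CSV. The three rows reuse names and prices
# from this report; the split between `oil / fat` and `type` is an assumption.
csv_text <- "oil / fat,type,UK retail cost per 100ml ($),health rating
argan,,17.48,-0.01
almond,refined,7.59,0.08
corn,refined,0.29,-0.05"

# check.names = FALSE keeps the original column names (spaces, symbols) intact.
oils <- read.csv(text = csv_text, check.names = FALSE, stringsAsFactors = FALSE)

# Combine name and type into one label; trimws() drops the trailing space
# left behind when `type` is empty.
oils$Oil <- trimws(paste(oils[["oil / fat"]], oils[["type"]]))

print(oils[, c("Oil", "UK retail cost per 100ml ($)", "health rating")])
```

Keeping the original column names means they have to be referenced with `[[ ]]` or backticks, which is consistent with the backticked names that appear in the regression output later in this report.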
This is a table created to easily view, search, sort, and filter the data.
These are some interesting Summary Statistics to look at before diving further into the data.
| Average Retail Cost per 100ml | Average Percent Total Fat Content |
|---|---|
| 3.425 | 98.48 |
| Average Percent Omega 3 & 6 Content | Average Health Rating |
|---|---|
| 26.83 | 0.03619 |
| Average Percent Retail Markup | Highest Health Rating | Lowest Health Rating |
|---|---|---|
| 814.5 | 1.87 | -0.65 |
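Summary statistics like these could be computed along the following lines. This is a sketch, assuming dplyr is available, and it only uses the ten lowest-priced rows listed later in this report, so its numbers will not match the full-data averages above. Because retail markup is stored as percent/100, the mean is multiplied by 100 to express it as a percent.

```r
library(dplyr)

# Ten lowest-priced oils and fats, copied from the table in this report.
cheapest <- tibble(
  `UK retail cost per 100ml ($)` = c(0.25, 0.29, 0.32, 0.44, 0.50,
                                     0.54, 0.56, 0.64, 0.64, 0.72),
  `retail markup` = c(0.60, 2.14, 1.94, 3.50, 3.30,
                      NA, 4.08, 4.61, 4.89, 1.63),
  `health rating` = c(0.00, -0.05, -0.20, 0.04, -0.17,
                      -0.28, 0.31, 0.24, 0.07, -0.05)
)

summary_stats <- cheapest %>%
  summarise(
    avg_cost       = mean(`UK retail cost per 100ml ($)`, na.rm = TRUE),
    # Markup is stored as percent/100, so scale it back to a percent.
    avg_markup_pct = mean(`retail markup`, na.rm = TRUE) * 100,
    max_health     = max(`health rating`, na.rm = TRUE),
    min_health     = min(`health rating`, na.rm = TRUE)
  )

print(summary_stats)
```

The `na.rm = TRUE` argument matters here because suet has no reported retail markup; without it the average markup would come back as NA.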
Will there be a connection between the Health Rating of the Fat or Oil and the Retail Cost?
My hypothesis is that there will be a connection between the two, with an increase in price as health rating increases. A ggplot with a geom_smooth layer will be used to create the visualization.
What surprised me is that the graph does not show a clear connection between price and how healthy an oil or fat is to consume. Price is low for most types of oils and fats; however, certain oils vary significantly from the rest in terms of health rating. Going forward this could be interesting to look at.
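A plot of this kind could be sketched as follows. The data frame holds only the twenty rows listed in the price tables later in this report, the short column names are hypothetical, and the linear smoothing method is an assumption, since the report does not say which method was passed to geom_smooth.

```r
library(ggplot2)

# Twenty rows copied from the highest- and lowest-price tables in this report.
# Column names `health_rating` and `cost` are shortened for illustration.
oils <- data.frame(
  health_rating = c(-0.01, 0.59, 0.08, 0.06, 0.06, 0.08, 0.07, -0.12, 0.06, -0.06,
                    0.00, -0.05, -0.20, 0.04, -0.17, -0.28, 0.31, 0.24, 0.07, -0.05),
  cost = c(17.48, 16.94, 14.00, 11.20, 8.45, 7.59, 6.80, 5.40, 5.39, 4.45,
           0.25, 0.29, 0.32, 0.44, 0.50, 0.54, 0.56, 0.64, 0.64, 0.72)
)

# Scatter plot with a fitted trend line and its confidence band.
# method = "lm" is an assumption; the original plot may have used another smoother.
p <- ggplot(oils, aes(x = health_rating, y = cost)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Health rating", y = "UK retail cost per 100ml ($)")

print(p)
```

A nearly flat trend line with a wide confidence band would be one visual sign that the two variables are not strongly connected.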
Is there a connection between the Taste Strength Index and the Retail Markup, leading to higher Retail Costs per 100ml?
My assumption is that there will be a connection: if a higher taste strength index means the oil impacts the taste of food more, that would explain why it is sought after at higher prices.
It appears that there is not a strong connection between the taste strength index and the retail markup. However, this visualization did reveal that the high-end outlier in graph 1 carries a huge percent retail markup on top of a price that is high to begin with, meaning that it is sought after at very high prices above its true worth.
Is there a connection between the Oil Type Index and Cost per 100ml when looking at the Percent of Omega 3 & 6?
Judging by the previous graphs, I would not expect there to be a connection between the variables. However, it would not shock me if oils higher in Omega 3 and 6 were higher in price, as people purchase items such as these to replenish needed fats, which could result in a price premium.
After analyzing the visualization, it appears that the Omega 3 and 6 percentages within the oils do not impact the overall price significantly, although the higher the content, the lower the price tends to be.
Is there a connection between the Oil Type Index and Retail Markup when looking at the Percent of Omega 3 and then looking at Percent of Omega 6?
I expect that one of the two Omega percents will play a larger role in contributing to the Retail Markup, as I assume one is more valuable than the other.
Based on the Omega 3 and Omega 6 percentages, the retail markup does not look connected to either one, which to me means that the value behind the product is not strongly associated with either attribute of the oil.
The tables below give one last look at which oils sell for the highest and lowest prices.
Because the majority of the variables considered against the price per 100ml did not show much significance, I would assume that the value some of these oils provide does not stem from their concentration of different types of fats or their Omega 3 and Omega 6 percentages, but rather from their applicability for different uses.
| Oil | UK retail cost per 100ml ($) | retail markup | health rating |
|---|---|---|---|
| argan | 17.48 | 3.59 | -0.01 |
| fish cod liver | 16.94 | 69.04 | 0.59 |
| cashew | 14.00 | 16.77 | 0.08 |
| pecan nut | 11.20 | 3.52 | 0.06 |
| apricot kernel | 8.45 | 2.53 | 0.06 |
| almond refined | 7.59 | 2.99 | 0.08 |
| macadamia | 6.80 | 11.79 | 0.07 |
| duck fat | 5.40 | 1.74 | -0.12 |
| wheat germ | 5.39 | 5.05 | 0.06 |
| pumpkin seed | 4.45 | 0.64 | -0.06 |
| Oil | UK retail cost per 100ml ($) | retail markup | health rating |
|---|---|---|---|
| sunflower linoleic, hydrogenated | 0.25 | 0.60 | 0.00 |
| corn refined | 0.29 | 2.14 | -0.05 |
| cottonseed | 0.32 | 1.94 | -0.20 |
| margarine hard | 0.44 | 3.50 | 0.04 |
| lard | 0.50 | 3.30 | -0.17 |
| suet | 0.54 | NA | -0.28 |
| canola / rapeseed refined | 0.56 | 4.08 | 0.31 |
| margarine soft: canola, palm, palm kernel mix | 0.64 | 4.61 | 0.24 |
| rice bran | 0.64 | 4.89 | 0.07 |
| peanut refined | 0.72 | 1.63 | -0.05 |
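Tables like the two above could be produced by sorting on price and keeping the first rows in each direction. The sketch below assumes dplyr and uses six rows from the tables, with a hypothetical short column name `cost`; it is not necessarily how the original tables were generated.

```r
library(dplyr)

# Six rows taken from the tables above; `cost` is a shortened, hypothetical name
# for the UK retail cost per 100ml column.
oils <- tibble(
  Oil  = c("argan", "fish cod liver", "cashew", "lard", "corn refined", "cottonseed"),
  cost = c(17.48, 16.94, 14.00, 0.50, 0.29, 0.32)
)

# Highest-priced oils: sort descending and keep the first rows.
most_expensive <- oils %>% arrange(desc(cost)) %>% head(2)

# Lowest-priced oils: sort ascending and keep the first rows.
least_expensive <- oils %>% arrange(cost) %>% head(2)

print(most_expensive)
print(least_expensive)
```

Using `arrange()` with `head()` always returns exactly the requested number of rows, even when two oils share the same price, as rice bran and the soft margarine mix do in the table above.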
Data was scraped from Twitter using a developer account and a Twitter token. Once scraped and written to a CSV, the data was uploaded to OneDrive, at which point it could be used. 500 tweets on Argan Oil and 500 tweets on Lard were gathered and then combined into one set for ease of use in the sentiment analysis.
How do people feel about the two different Oils/Fats?
Considering that both Argan Oil and Lard are used in everyday life by a wide variety of people, I thought it would be interesting to see how people reacted when tweeting about them. With Argan Oil generally priced higher than Lard, and not much in its chemical makeup to explain the higher price, I would expect the Argan Oil tweets to contain a higher number of positive words relative to negative ones.
As can be seen above, how people feel about each product lines up with its price. Argan Oil is priced much higher than Lard, and the feedback within these tweets suggests that people are willing to pay that price.
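A word-level sentiment count of this kind could be sketched with tidytext as shown below. The four tweets are invented placeholders, not scraped data, and the choice of the Bing lexicon is an assumption, since the report does not name the lexicon that was used.

```r
library(dplyr)
library(tidytext)

# Invented placeholder tweets for illustration only; these are NOT from the
# scraped data set described in this report.
tweets <- tibble(
  topic = c("Argan Oil", "Argan Oil", "Lard", "Lard"),
  text  = c("I love argan oil it is great for hair",
            "argan oil is amazing but shipping was awful",
            "lard is bad for you",
            "I hate cooking with lard")
)

sentiment_counts <- tweets %>%
  unnest_tokens(word, text) %>%                           # one row per word
  inner_join(get_sentiments("bing"), by = "word") %>%     # keep words found in the lexicon
  count(topic, sentiment)                                 # positive/negative totals per topic

print(sentiment_counts)
```

Words that do not appear in the lexicon are dropped by the inner join, so the counts reflect only words the lexicon labels as positive or negative.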
How correlated are Price and Health Rating?
Here I wanted to take a closer look at whether price and health rating coincide. My assumption is that they will be heavily correlated.
##
## Regression Results
## ===============================================
##                         Dependent variable:
##                     ---------------------------
##                     `UK retail cost per 100ml ($)`
## -----------------------------------------------
## `health rating`            2.184 (1.766)
## Constant                 3.341*** (0.687)
## -----------------------------------------------
## Observations                    41
## R2                             0.038
## Adjusted R2                    0.013
## Residual Std. Error       4.380 (df = 39)
## F Statistic             1.530 (df = 1; 39)
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
After viewing the regression results, it appears that price and health rating do not have much to do with each other at all. The health rating coefficient (2.184, standard error 1.766) is not statistically significant, and the R2 of 0.038 means health rating explains very little of the variation in price. This makes sense given the previous analysis in this document.
In closing, this document shows that oils and fats are not priced based on their contents, or at least not the contents included in this data. Instead, they appear to be priced by their rarity and their applicability across different facets of everyday life.
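A regression table in this format could be produced with `lm()` and stargazer, roughly as sketched here. The data frame contains only the twenty rows listed in the price tables of this report, with hypothetical short column names, so the estimates it prints will differ from the 41-observation results above. The `single.row = TRUE` setting is an assumption based on the layout of the printed table.

```r
library(stargazer)

# Twenty rows copied from the highest- and lowest-price tables in this report.
# This is a subset, so the fitted values will not match the full-data results.
oils <- data.frame(
  health_rating = c(-0.01, 0.59, 0.08, 0.06, 0.06, 0.08, 0.07, -0.12, 0.06, -0.06,
                    0.00, -0.05, -0.20, 0.04, -0.17, -0.28, 0.31, 0.24, 0.07, -0.05),
  cost = c(17.48, 16.94, 14.00, 11.20, 8.45, 7.59, 6.80, 5.40, 5.39, 4.45,
           0.25, 0.29, 0.32, 0.44, 0.50, 0.54, 0.56, 0.64, 0.64, 0.72)
)

# Simple linear regression of price on health rating.
fit <- lm(cost ~ health_rating, data = oils)

# Plain-text table with each coefficient and its standard error on one row.
stargazer(fit, type = "text", single.row = TRUE, title = "Regression Results")
```

In the printed table, a coefficient without stars is not significant at any of the levels listed in the note line.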