The Ice Cream data set was produced through an observational study in the United States between 18 March 1951 and 11 July 1953. Observations were recorded every four weeks over this period of a little more than two years. Each observation is described by four variables: ice-cream consumption per head (pints), ambient temperature (degrees Fahrenheit), price of ice cream (US dollars) and average weekly family income (US dollars). Because little information accompanies the data set, many questions arose about how it was collected.
These data appear to have been collected for economic research on consumer demand for ice cream, specifically in America. This project may therefore be of interest to ice-cream manufacturers, as it explores whether the price of ice cream can be predicted from its popularity at different times of the year.
Unfortunately, we do not have access to the original book. As the data were only collected every four weeks over a roughly two-year period, confounding economic factors affecting consumption, price and weekly income during that time may limit their reliability. Nevertheless, the data were published in a technical bulletin of Michigan State University, which lends them a degree of credibility.
icecream = read.csv("C:/Users/Owner/Downloads/Icecream.csv")
summary(icecream)
## X cons income price
## Min. : 1.00 Min. :0.2560 Min. :76.00 Min. :0.2600
## 1st Qu.: 8.25 1st Qu.:0.3113 1st Qu.:79.25 1st Qu.:0.2685
## Median :15.50 Median :0.3515 Median :83.50 Median :0.2770
## Mean :15.50 Mean :0.3594 Mean :84.60 Mean :0.2753
## 3rd Qu.:22.75 3rd Qu.:0.3912 3rd Qu.:89.25 3rd Qu.:0.2815
## Max. :30.00 Max. :0.5480 Max. :96.00 Max. :0.2920
## temp
## Min. :24.00
## 1st Qu.:32.25
## Median :49.50
## Mean :49.10
## 3rd Qu.:63.75
## Max. :72.00
## Initial Data Analysis (IDA)
head(icecream)
## X cons income price temp
## 1 1 0.386 78 0.270 41
## 2 2 0.374 79 0.282 56
## 3 3 0.393 81 0.277 63
## 4 4 0.425 80 0.280 68
## 5 5 0.406 76 0.272 69
## 6 6 0.344 78 0.262 65
## Size of data
dim(icecream)
## [1] 30 5
## R's classification of data
class(icecream)
## [1] "data.frame"
## R's classification of variables
str(icecream)
## 'data.frame': 30 obs. of 5 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ cons : num 0.386 0.374 0.393 0.425 0.406 0.344 0.327 0.288 0.269 0.256 ...
## $ income: int 78 79 81 80 76 78 82 79 76 79 ...
## $ price : num 0.27 0.282 0.277 0.28 0.272 0.262 0.275 0.267 0.265 0.277 ...
## $ temp : int 41 56 63 68 69 65 61 47 32 24 ...
plot(icecream$price,icecream$cons)
The correlation between the price of ice cream and its consumption is weak, especially compared to the relationship between temperature and consumption. The highest price in the data set was $0.292, which was relatively expensive for ice cream in the early 1950s. At this point, consumption was only 0.319 pints per head. Generally, though, consumption values are spread fairly evenly regardless of the price. This suggests that other variables have a greater impact on the consumption of ice cream.
plot(icecream$income,icecream$cons)
Like price, income appears to have little or no effect on the consumption of ice cream. The scatter plot shows no clear correlation or trend between the two variables. This suggests that, of the variables recorded, temperature is the main factor influencing ice-cream consumption.
plot(icecream$temp,icecream$cons)
There is a strong positive correlation between the consumption of ice cream and the temperature, as seen in the upward trend of the scatter plot. For example, when the temperature was at its minimum of 24°F, consumption was only 0.256 pints per head, whereas at the maximum of 72°F consumption reached 0.470. Given that temperature is the only variable that shows a clear relationship with consumption, it appears to be the main influencing factor on ice-cream consumption among the variables recorded.
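To put numbers on the three visual comparisons above, the correlation of each explanatory variable with consumption can be computed in one pass. The sketch below is only an illustration and not part of the original analysis: `sample_rows` is a hypothetical stand-in built from the six rows printed by `head(icecream)`, so its correlations will not match those of the full 30-row data set. Replacing `sample_rows` with `icecream` would give the real values.

```r
# Stand-in data: the six rows shown by head(icecream) earlier.
# (Assumption: used only so the example runs without the CSV file.)
sample_rows <- data.frame(
  cons   = c(0.386, 0.374, 0.393, 0.425, 0.406, 0.344),
  income = c(78, 79, 81, 80, 76, 78),
  price  = c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262),
  temp   = c(41, 56, 63, 68, 69, 65)
)

# Pearson correlation of each explanatory variable with consumption.
predictors <- c("price", "income", "temp")
cons_cor <- sapply(predictors, function(v) cor(sample_rows[[v]], sample_rows$cons))

# Sort by absolute strength so the most closely related variable comes first.
cons_cor <- cons_cor[order(abs(cons_cor), decreasing = TRUE)]
print(round(cons_cor, 3))
```

Sorting by absolute value keeps strong negative relationships from being ranked below weak positive ones.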
Based on a very simple economic outlook, we would generally expect the price of a product to reflect its level of consumption. When consumption is higher, we would expect a larger supply on average and subsequently lower prices for the product, in this case ice cream, and vice versa. However, when the price of ice cream is plotted against its consumption per head in a scatter plot, it is difficult to discern any relationship.
library(ggplot2)
cons = icecream$cons
price = icecream$price
ggplot(icecream, aes(x=cons, y= price))+ geom_point() + labs(title = "Price vs Consumption of Icecream", x ="Icecream consumption per head (pints)", y ="Icecream price ($US)")
The correlation function gives a numerical summary of the linear relationship and can indicate an association that is hard to judge visually. Based on the reasoning above, we would expect a negative value.
cor(cons,price)
## [1] -0.259594
The correlation coefficient is weakly negative (about -0.26), which is not a good sign for predictability. Fitting a linear model in R gives an estimated linear relationship between the two variables, along with a confidence interval based on the data provided. We can then use the predict function to see what price the model gives for a given consumption. We will use an example consumption of 0.5 pints per head.
model = lm(price ~cons, icecream)
summary(model)
##
## Call:
## lm(formula = price ~ cons, data = icecream)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.0138080 -0.0066783 0.0001715 0.0064037 0.0153690
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.287132 0.008452 33.973 <2e-16 ***
## cons -0.032917 0.023142 -1.422 0.166
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.008199 on 28 degrees of freedom
## Multiple R-squared: 0.06739, Adjusted R-squared: 0.03408
## F-statistic: 2.023 on 1 and 28 DF, p-value: 0.166
cons1 = data.frame(cons = 0.5)
predict(model, cons1, interval = "confidence")
## fit lwr upr
## 1 0.2706729 0.2633378 0.2780081
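As a sanity check, the fitted value can be reproduced by hand from the coefficients in the model summary. This is a minimal sketch: the coefficients are typed in from the printed output (so they carry its rounding), and `predict_price` is a hypothetical helper name introduced only for illustration.

```r
# Coefficients copied from the summary(model) output (rounded as printed).
intercept <- 0.287132
slope     <- -0.032917

# Hypothetical helper: the fitted line price = intercept + slope * cons.
predict_price <- function(cons) intercept + slope * cons

# Point prediction at a consumption of 0.5 pints per head.
fit_at_half_pint <- predict_price(0.5)
print(round(fit_at_half_pint, 4))
```

The result agrees with the `fit` column from `predict` to four decimal places. The `lwr` and `upr` bounds cannot be recovered this way, because they also depend on the residual standard error and the spread of the observed consumption values.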
library(ggplot2)
ggplot(icecream, aes(x=cons, y= price))+
geom_point() +
labs(title = "Price vs Consumption of Icecream", x ="Icecream consumption per head (pints)", y ="Icecream price ($US)")+geom_smooth(method = "lm")
From the above example it is clear that predicting price solely from consumption, a single variable that is only weakly correlated with price in this data set, is not a reliable method. The model explains under 7% of the variation in price (multiple R-squared of 0.067), the slope is not statistically significant (p = 0.166), and many data points fall outside the confidence band around the fitted line. This does not mean that the data are unreliable, as there are most likely many confounding variables that affect the price of ice cream, for example the price of dairy or labour. Although there could be a negative trend, which would support our initial basic economic outlook, much more data over a larger time frame and a broader range of products would be needed to support any meaningful conclusion on the relationship between price and consumption.
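Two figures in the model summary can be tied back to numbers already seen. In a regression with a single predictor, the multiple R-squared equals the squared correlation coefficient, and the t value of the slope is its estimate divided by its standard error. The sketch below uses values typed in from the printed output, so small rounding differences are possible.

```r
# Values copied from the output of cor(cons, price) and summary(model).
r         <- -0.259594   # correlation between consumption and price
estimate  <- -0.032917   # slope estimate for cons
std_error <-  0.023142   # standard error of the slope

# Share of the variation in price explained by consumption.
r_squared <- r^2

# Test statistic for the null hypothesis that the slope is zero.
t_value <- estimate / std_error

print(round(r_squared, 5))
print(round(t_value, 3))
```

Both values match the summary (0.06739 and -1.422), which confirms that consumption accounts for only about 7% of the variation in price.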
How much does ice-cream cost?
hist(icecream$price, main = "Ice-cream Price", xlab = "Price per pint (US Dollars)")
abline(v = mean(icecream$price), col = "green")
abline(v = median(icecream$price), col = "purple")
The mean price, $0.2753, is shown in green and the median, $0.277, is shown in purple.
Price alone doesn’t tell us how affordable ice cream was; we need to compare it to income. To do this, we introduce a new variable.
ratio = icecream$price/icecream$income*100
This represents the price of ice cream as a percentage of income. The lower the number the more affordable ice cream can be said to be.
mean(ratio)
## [1] 0.3271337
On average, one pint of ice cream cost 0.33% of weekly family income. As this is well under 1% of weekly income, we can conclude that a pint of ice cream was reasonably affordable.
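To make the ratio concrete, it can be worked through for the first few observations. This is a small illustration only: `price_head` and `income_head` are hypothetical names holding the six rows printed by `head(icecream)`, not the full data set.

```r
# Price and weekly income for the six rows shown by head(icecream).
price_head  <- c(0.270, 0.282, 0.277, 0.280, 0.272, 0.262)
income_head <- c(78, 79, 81, 80, 76, 78)

# Price of one pint as a percentage of weekly family income.
ratio_sample <- price_head / income_head * 100
print(round(ratio_sample, 3))
```

For the first row this gives 0.270 / 78 * 100, or about 0.346%. All six of these early values are above the full-sample mean of 0.327.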
ratio = icecream$price/icecream$income*100
plot(ratio)
There is a clear decline over time in the price of ice cream as a proportion of income. In other words, ice cream gradually got more affordable over the course of the study. But why?
First, let’s look at income.
plot(icecream$income)
cor(icecream$X, icecream$income)
## [1] 0.8447886
This number is quite close to 1 which indicates a strong positive correlation. That is, income increased gradually during the course of this study.
Now, consider price.
plot(icecream$price)
While income increased over time, the scatterplot of ice cream price doesn’t reveal any clear trend. Again, we can use the correlation command to measure this.
cor(icecream$X, icecream$price)
## [1] -0.06831566
The closeness of this result to 0 indicates limited to no correlation.
sd(icecream$price)
## [1] 0.008342455
Further, the standard deviation of ice-cream price was only 0.008, indicating that the price of ice cream doesn’t vary much in this dataset. Thus, the increase in affordability over time can be accounted for by the steadily increasing income.
The ice-cream data, collected over a little more than two years, were measured through four variables: consumption, price, temperature and income. They offer a snapshot of American consumer demand in the growing ice-cream market of the early 1950s.
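One way to judge whether a standard deviation of 0.008 is small is to express it relative to the mean, a quantity often called the coefficient of variation. The sketch below is an addition to the original analysis and uses the mean price as printed (rounded) in the earlier summary output.

```r
# Values copied from the output of sd(icecream$price) and summary(icecream).
price_sd   <- 0.008342455
price_mean <- 0.2753        # rounded to four decimals in the summary

# Standard deviation as a percentage of the mean price.
cv_percent <- price_sd / price_mean * 100
print(round(cv_percent, 2))
```

A typical deviation of about 3% of the mean price is consistent with the claim that price was nearly flat over the study period.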
Hildreth, C. and Lu, J. (1960). Demand relations with autocorrelated disturbances. East Lansing, Mich.: Michigan State University.
Prabhakaran, S. (2016). Linear Regression With R. [online] R-statistics.co. Available at: http://r-statistics.co/Linear-Regression.html [Accessed 18 Mar. 2019].
Thepeoplehistory.com. (2019). What Happened in 1951 including Pop Culture, Significant Events, Key Technology and Inventions. [online] Available at: http://www.thepeoplehistory.com/1951.html [Accessed 22 Mar. 2019].