*******************************************
By using statistical methods, it is possible to predict the quality of a wine, approximated by its price, with reasonable accuracy. Examining several factors across 25 vintages produces a model for the predicted selling price. Princeton professor Orley Ashenfelter was the first to carry out this analysis.
The output below shows the first five rows of the data, followed by its structure (25 observations of 7 variables). AGST is the average growing season temperature.
##   Year  Price WinterRain    AGST HarvestRain Age FrancePop
## 1 1952 7.4950        600 17.1167         160  31  43183.57
## 2 1953 8.0393        690 16.7333          80  30  43495.03
## 3 1955 7.6858        502 17.1500         130  28  44217.86
## 4 1957 6.9845        420 16.1333         110  26  45152.25
## 5 1958 6.7772        582 16.4167         187  25  45653.81
## 'data.frame':    25 obs. of  7 variables:
##  $ Year       : int  1952 1953 1955 1957 1958 1959 1960 1961 1962 1963 ...
##  $ Price      : num  7.5 8.04 7.69 6.98 6.78 ...
##  $ WinterRain : int  600 690 502 420 582 485 763 830 697 608 ...
##  $ AGST       : num  17.1 16.7 17.1 16.1 16.4 ...
##  $ HarvestRain: int  160 80 130 110 187 187 290 38 52 155 ...
##  $ Age        : int  31 30 28 26 25 24 23 22 21 20 ...
##  $ FrancePop  : num  43184 43495 44218 45152 45654 ...
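There is a positive linear relationship between price and the average growing season temperature (AGST). Their correlation of about 0.66 is the strongest correlation with Price of any variable in the table below.
Below are a scatterplot matrix and a correlation table for all variables. Age and Year are perfectly negatively correlated because Age is 1983 minus Year. FrancePop is also almost perfectly correlated with Year (0.994), so the models below use only Age from this group.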
##                    Year      Price   WinterRain        AGST HarvestRain
## Year         1.00000000 -0.4477679  0.016970024 -0.24691585  0.02800907
## Price       -0.44776786  1.0000000  0.136650547  0.65956286 -0.56332190
## WinterRain   0.01697002  0.1366505  1.000000000 -0.32109061 -0.27544085
## AGST        -0.24691585  0.6595629 -0.321090611  1.00000000 -0.06449593
## HarvestRain  0.02800907 -0.5633219 -0.275440854 -0.06449593  1.00000000
## Age         -1.00000000  0.4477679 -0.016970024  0.24691585 -0.02800907
## FrancePop    0.99448510 -0.4668616 -0.001621627 -0.25916227  0.04126439
##                     Age    FrancePop
## Year        -1.00000000  0.994485097
## Price        0.44776786 -0.466861641
## WinterRain  -0.01697002 -0.001621627
## AGST         0.24691585 -0.259162274
## HarvestRain -0.02800907  0.041264394
## Age          1.00000000 -0.994485097
## FrancePop   -0.99448510  1.000000000
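Each entry in the table is a Pearson correlation coefficient, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The table was presumably produced with R's `cor()`, which uses Pearson correlation by default. The sketch below is a minimal standard-library Python version of that formula. It uses only the rows visible in the printed output above, not the full 25 vintages, so its AGST/Price value is only illustrative and is not the 0.66 in the table. It does reproduce the exact −1 between Year and Age.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations and the two sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Values copied from the printed output above (visible rows only, not all 25).
year = [1952, 1953, 1955, 1957, 1958, 1959, 1960, 1961, 1962, 1963]
age = [31, 30, 28, 26, 25, 24, 23, 22, 21, 20]
agst = [17.1167, 16.7333, 17.1500, 16.1333, 16.4167]
price = [7.4950, 8.0393, 7.6858, 6.9845, 6.7772]

r_year_age = pearson(year, age)      # Age = 1983 - Year, so r is exactly -1
r_agst_price = pearson(agst, price)  # positive, but computed from 5 rows only

print(f"Year vs Age:   {r_year_age:.4f}")
print(f"AGST vs Price: {r_agst_price:.4f}")
```

On Python 3.10 and later, `statistics.correlation` from the standard library computes the same Pearson coefficient.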
Regression summary for Price ~ AGST + HarvestRain + Age (growing season temperature, harvest rainfall and age)
##
## Call:
## lm(formula = Price ~ AGST + HarvestRain + Age)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.66258 -0.22953 -0.00268  0.27236  0.49391
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.4778196  1.6274142  -0.908  0.37414
## AGST         0.5322922  0.0995343   5.348 2.65e-05 ***
## HarvestRain -0.0045386  0.0008757  -5.183 3.90e-05 ***
## Age          0.0250875  0.0087249   2.875  0.00905 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3186 on 21 degrees of freedom
## Multiple R-squared: 0.79, Adjusted R-squared: 0.76
## F-statistic: 26.34 on 3 and 21 DF, p-value: 2.596e-07
This model fares well, explaining about 79% of the variation in price (adjusted R-squared 0.76). Age (p ≈ 0.009) is less significant than AGST and HarvestRain, which both have p < 0.001.
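A fitted linear model predicts price as the intercept plus each coefficient times its predictor value. The sketch below applies the printed coefficient estimates to the first row of the data (the 1952 vintage). It is a minimal illustration that uses the estimates at the precision R printed them, not R's own `predict()`. The names `COEF_MODEL_1` and `predict_price` are hypothetical.

```python
# Coefficient estimates copied from the first regression summary above
# (Price ~ AGST + HarvestRain + Age), at the precision R printed them.
COEF_MODEL_1 = {
    "(Intercept)": -1.4778196,
    "AGST": 0.5322922,
    "HarvestRain": -0.0045386,
    "Age": 0.0250875,
}


def predict_price(coef, **predictors):
    """Return intercept + sum(coefficient * value) for a fitted linear model."""
    expected = set(coef) - {"(Intercept)"}
    if set(predictors) != expected:
        raise ValueError(f"expected predictors {sorted(expected)}")
    total = coef["(Intercept)"]
    for name, value in predictors.items():
        total += coef[name] * value
    return total


# First row of the data: the 1952 vintage.
predicted = predict_price(COEF_MODEL_1, AGST=17.1167, HarvestRain=160, Age=31)
actual = 7.4950
residual = actual - predicted  # residual = observed minus fitted

print(f"predicted {predicted:.4f}, actual {actual:.4f}, residual {residual:.4f}")
# → predicted 7.6848, actual 7.4950, residual -0.1898
```

The residual of about −0.19 falls inside the residual range printed in the summary (−0.66 to 0.49).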
Regression summary for Price ~ AGST + HarvestRain (growing season temperature and harvest rainfall)
##
## Call:
## lm(formula = Price ~ AGST + HarvestRain)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.88321 -0.19600  0.06178  0.15379  0.59722
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.20265    1.85443  -1.188 0.247585
## AGST         0.60262    0.11128   5.415 1.94e-05 ***
## HarvestRain -0.00457    0.00101  -4.525 0.000167 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3674 on 22 degrees of freedom
## Multiple R-squared: 0.7074, Adjusted R-squared: 0.6808
## F-statistic: 26.59 on 2 and 22 DF, p-value: 1.347e-06
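This simpler model explains less of the variation in price. Its R-squared is about 8 percentage points lower (0.7074 versus 0.79). Its adjusted R-squared is also lower (0.6808 versus 0.76), so Age contributes more than the penalty for one extra predictor. The advantage of this model is that it uses only the two weather variables and still explains about 70% of the variation in price.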
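Adjusted R-squared penalizes R-squared for the number of predictors p: adj R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The overall F-statistic can also be recovered from R² as F = (R²/p) / ((1 − R²)/(n − p − 1)). The sketch below plugs in the R-squared values printed in the two summaries, with n = 25. It is a minimal check of those formulas, not a refit of the models.

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f_statistic(r2, n, p):
    """Overall F-statistic on p and n - p - 1 degrees of freedom."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


N = 25  # observations in the data frame

# R-squared values and predictor counts as printed in the two summaries above.
models = {
    "AGST + HarvestRain + Age": (0.79, 3),
    "AGST + HarvestRain": (0.7074, 2),
}

for name, (r2, p) in models.items():
    adj = adjusted_r_squared(r2, N, p)
    f = overall_f_statistic(r2, N, p)
    print(f"{name:<26} R2={r2:.4f}  adj R2={adj:.4f}  "
          f"F={f:.2f} on {p} and {N - p - 1} DF")
```

This reproduces the printed adjusted R-squared values (0.76 and 0.6808) and the F-statistic of the two-variable model (26.59). For the first model it gives F = 26.33 rather than the printed 26.34, because the R-squared shown in the summary (0.79) is rounded.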
The dataset used in this analysis comes from the “MITx: 15.071x The Analytics Edge” course.