Statistical methods can predict the quality of a wine, as approximated by its price, with reasonable accuracy. Regressing price on weather and age variables for 25 vintages produces a model for the predicted selling price. Orley Ashenfelter, a Princeton professor, was the first person to do this analysis.



The following output shows the first five rows of the data, followed by the structure of the data.

##   Year  Price WinterRain    AGST HarvestRain Age FrancePop
## 1 1952 7.4950        600 17.1167         160  31  43183.57
## 2 1953 8.0393        690 16.7333          80  30  43495.03
## 3 1955 7.6858        502 17.1500         130  28  44217.86
## 4 1957 6.9845        420 16.1333         110  26  45152.25
## 5 1958 6.7772        582 16.4167         187  25  45653.81
## 'data.frame':    25 obs. of  7 variables:
##  $ Year       : int  1952 1953 1955 1957 1958 1959 1960 1961 1962 1963 ...
##  $ Price      : num  7.5 8.04 7.69 6.98 6.78 ...
##  $ WinterRain : int  600 690 502 420 582 485 763 830 697 608 ...
##  $ AGST       : num  17.1 16.7 17.1 16.1 16.4 ...
##  $ HarvestRain: int  160 80 130 110 187 187 290 38 52 155 ...
##  $ Age        : int  31 30 28 26 25 24 23 22 21 20 ...
##  $ FrancePop  : num  43184 43495 44218 45152 45654 ...
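
Output of this kind is typically produced with `head()` and `str()`. The sketch below rebuilds only the five rows printed above as a stand-in data frame (the name `wine_head` is an assumption for illustration); in the actual analysis the full 25-row data set would be loaded from the course file, whose name is not given here.

```r
# Stand-in data frame holding only the five rows printed above.
# The real data set has 25 observations; `wine_head` is a hypothetical name.
wine_head <- data.frame(
  Year        = c(1952L, 1953L, 1955L, 1957L, 1958L),
  Price       = c(7.4950, 8.0393, 7.6858, 6.9845, 6.7772),
  WinterRain  = c(600L, 690L, 502L, 420L, 582L),
  AGST        = c(17.1167, 16.7333, 17.1500, 16.1333, 16.4167),
  HarvestRain = c(160L, 80L, 130L, 110L, 187L),
  Age         = c(31L, 30L, 28L, 26L, 25L),
  FrancePop   = c(43183.57, 43495.03, 44217.86, 45152.25, 45653.81)
)

head(wine_head, 5)  # first five rows
str(wine_head)      # column names, types and leading values
```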



There is a positive linear relationship between the price of a bottle and the average temperature during its growing season (AGST); the correlation between the two is about 0.66.



Below are a scatterplot matrix and correlation table for all variables.

##                    Year      Price   WinterRain        AGST HarvestRain
## Year         1.00000000 -0.4477679  0.016970024 -0.24691585  0.02800907
## Price       -0.44776786  1.0000000  0.136650547  0.65956286 -0.56332190
## WinterRain   0.01697002  0.1366505  1.000000000 -0.32109061 -0.27544085
## AGST        -0.24691585  0.6595629 -0.321090611  1.00000000 -0.06449593
## HarvestRain  0.02800907 -0.5633219 -0.275440854 -0.06449593  1.00000000
## Age         -1.00000000  0.4477679 -0.016970024  0.24691585 -0.02800907
## FrancePop    0.99448510 -0.4668616 -0.001621627 -0.25916227  0.04126439
##                     Age    FrancePop
## Year        -1.00000000  0.994485097
## Price        0.44776786 -0.466861641
## WinterRain  -0.01697002 -0.001621627
## AGST         0.24691585 -0.259162274
## HarvestRain -0.02800907  0.041264394
## Age          1.00000000 -0.994485097
## FrancePop   -0.99448510  1.000000000
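
The correlation of exactly -1 between Year and Age in the table has a simple explanation: in the rows printed earlier, Year and Age always sum to 1983, so Age is an exact linear function of Year and the two carry the same information. FrancePop is also almost perfectly correlated with Year (0.994). The following minimal sketch checks the Year/Age relationship using only the five rows shown above, so it illustrates the point rather than reproducing the full table.

```r
# Year and Age values from the five rows printed earlier.
year <- c(1952, 1953, 1955, 1957, 1958)
age  <- c(31, 30, 28, 26, 25)

# Every vintage gives the same sum, so Age = 1983 - Year in these rows.
year_plus_age <- year + age
print(year_plus_age)

# An exact decreasing linear relationship gives a correlation of -1.
year_age_cor <- cor(year, age)
print(year_age_cor)
```

Because of this redundancy, only one of Year, Age and FrancePop is likely to be useful in a regression at a time, which is consistent with the models below using Age alone.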




Regression summary for Price ~ growing season temperature (AGST), harvest rainfall and age

## 
## Call:
## lm(formula = Price ~ AGST + HarvestRain + Age)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.66258 -0.22953 -0.00268  0.27236  0.49391 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.4778196  1.6274142  -0.908  0.37414    
## AGST         0.5322922  0.0995343   5.348 2.65e-05 ***
## HarvestRain -0.0045386  0.0008757  -5.183 3.90e-05 ***
## Age          0.0250875  0.0087249   2.875  0.00905 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3186 on 21 degrees of freedom
## Multiple R-squared:   0.79,  Adjusted R-squared:   0.76 
## F-statistic: 26.34 on 3 and 21 DF,  p-value: 2.596e-07


While this model fares well (R-squared of 0.79), Age is less significant than the other two variables (p = 0.009, compared with p < 0.0001 for AGST and HarvestRain).
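
To make the three-variable model concrete, the fitted coefficients can be applied by hand to one vintage. The sketch below uses the estimates from the summary above and the 1952 row from the data listing; the variable names are chosen for illustration, and the result is approximate because the coefficients are rounded as printed.

```r
# Coefficient estimates copied from the regression summary above.
coefs <- c(intercept   = -1.4778196,
           AGST        =  0.5322922,
           HarvestRain = -0.0045386,
           Age         =  0.0250875)

# Predictor values for the 1952 vintage, from the first data row.
vintage_1952 <- c(AGST = 17.1167, HarvestRain = 160, Age = 31)

# Linear prediction: intercept plus the sum of coefficient * value.
predicted <- coefs[["intercept"]] +
  sum(coefs[names(vintage_1952)] * vintage_1952)

actual   <- 7.4950              # observed Price for 1952
residual <- actual - predicted  # negative: the model overpredicts here

print(round(predicted, 2))  # about 7.68
print(round(residual, 2))   # about -0.19
```

A residual of about -0.19 sits well inside the residual range reported in the summary (-0.66 to 0.49).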



Regression summary for Price ~ growing season temperature (AGST) and harvest rainfall

## 
## Call:
## lm(formula = Price ~ AGST + HarvestRain)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.88321 -0.19600  0.06178  0.15379  0.59722 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.20265    1.85443  -1.188 0.247585    
## AGST         0.60262    0.11128   5.415 1.94e-05 ***
## HarvestRain -0.00457    0.00101  -4.525 0.000167 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3674 on 22 degrees of freedom
## Multiple R-squared:  0.7074, Adjusted R-squared:  0.6808 
## F-statistic: 26.59 on 2 and 22 DF,  p-value: 1.347e-06


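This model has a lower R-squared than the previous one (0.71 versus 0.79, a drop of about 8 percentage points), so it explains somewhat less of the variation in price, but it uses one fewer predictor and still keeps a reasonably high adjusted R-squared (0.68).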
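
Adjusted R-squared penalizes R-squared for the number of predictors, using 1 - (1 - R²)(n - 1)/(n - p - 1) for n observations and p predictors. As a check, the sketch below recomputes the adjusted values for both models from the R-squared figures reported in the two summaries; the helper name `adjusted_r2` is made up for this example.

```r
# Adjusted R-squared from R-squared, sample size n and predictor count p.
adjusted_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

n <- 25  # observations in the wine data

adj_three_vars <- adjusted_r2(0.79,   n, p = 3)  # AGST + HarvestRain + Age
adj_two_vars   <- adjusted_r2(0.7074, n, p = 2)  # AGST + HarvestRain

print(round(adj_three_vars, 2))  # 0.76, as reported in the first summary
print(round(adj_two_vars, 4))    # 0.6808, as reported in the second summary
```

Both R-squared and adjusted R-squared fall by about 8 percentage points when Age is dropped.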



The dataset used in this analysis comes from the “MITx: 15.071x The Analytics Edge” course.