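library(tidyverse)  # provides read_csv() and ggplot()
library(stargazer)  # provides stargazer() for the regression table
apples <- read_csv("apple.csv")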
## Rows: 660 Columns: 4
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (4): regprc, ecoprc, reglbs, ecolbs
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Below are some pointers about markdown syntax, which is useful for writing about your results.
Bold the things you really care about. Italicize the things you want to emphasize.
You can make a list:
Or a bulleted list:
You can even write some math: \(Y_i = \beta_1 + \beta_2 X_i + u_i\).
OLS picks the \(\hat{\beta}_1\) and \(\hat{\beta}_2\) that minimize RSS.
Centered equations require two dollar signs on each side:
\[\hat{\beta}_2 = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2}\]
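The slope formula above can be checked by hand. The sketch below is only an illustration: it uses simulated data (the variables `x` and `y` and the true coefficients are made up, not taken from the apples dataset) and compares the hand-computed estimates with what `lm()` returns. The intercept line uses the standard result \(\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}\).

```r
# Simulated data (hypothetical; not the apples dataset)
set.seed(1)
n <- 100
x <- runif(n, 0, 2)
y <- 2 - 0.8 * x + rnorm(n)

# Slope: sum of cross-deviations divided by sum of squared deviations in x
beta2_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through the point of means
beta1_hat <- mean(y) - beta2_hat * mean(x)

# Compare with the estimates from lm()
fit <- lm(y ~ x)
manual <- c(beta1_hat, beta2_hat)
all.equal(unname(coef(fit)), manual)
```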
The file contains data from an experimental survey. The survey presented participants with randomly determined prices for “eco-labeled” apples and regular apples and then asked how many eco-labeled and regular apples they would buy at those prices. For reference, eco-labeling helps consumers identify sustainably-produced (or “green”) products and helps firms command higher prices for their products. You will estimate the demand for eco-labeled and regular apples by running regressions of apple quantity on prices. The fact that the prices were randomly assigned means that the exogeneity assumption holds so long as both prices are included in the model.
Variables in the dataset:

- reglbs - Pounds of regular apples demanded
- ecolbs - Pounds of eco-labeled apples demanded
- regprc - Price of regular apples (per pound)
- ecoprc - Price of eco-labeled apples (per pound)
model1 <- lm(reglbs ~ regprc, data = apples)
summary(model1)
##
## Call:
## lm(formula = reglbs ~ regprc, data = apples)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.484 -1.277 -1.071 0.536 40.723
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.8896 0.4243 4.454 9.92e-06 ***
## regprc -0.6879 0.4632 -1.485 0.138
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.907 on 658 degrees of freedom
## Multiple R-squared: 0.00334, Adjusted R-squared: 0.001826
## F-statistic: 2.205 on 1 and 658 DF, p-value: 0.138
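The classical demand curve shows that quantity demanded falls as price rises. In our model the slope is negative (-0.6879): a one-unit increase in regprc (price) is associated with a 0.69-pound decrease in reglbs (quantity), so the estimate is consistent with the classical demand curve. Note, however, that the coefficient is not statistically significant at conventional levels (p = 0.138).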
model2 <- lm(ecolbs ~ ecoprc, data = apples)
summary(model2)
##
## Call:
## lm(formula = ecolbs ~ ecoprc, data = apples)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.889 -1.298 -0.467 0.533 40.618
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.3881 0.3717 6.426 2.52e-10 ***
## ecoprc -0.8452 0.3315 -2.550 0.011 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.515 on 658 degrees of freedom
## Multiple R-squared: 0.009783, Adjusted R-squared: 0.008279
## F-statistic: 6.501 on 1 and 658 DF, p-value: 0.01101
# graphing for me
ggplot(apples, aes(ecoprc,ecolbs)) +
geom_point() +
geom_smooth(method = "lm")
The slope for this model is also negative (-0.845) and thus also consistent with the classical demand curve; it is statistically significant at the 5% level (p = 0.011). The intercept of 2.39 means that at a price of 0 the model predicts demand of 2.39 pounds of eco-labeled apples.
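To make the intercept and slope concrete, the fitted line can be evaluated at a few prices. This is a small sketch that plugs in the coefficients reported for model2; the price points in `prices` are hypothetical values chosen only for illustration.

```r
# Coefficients reported in the summary(model2) output
b0 <- 2.3881   # intercept
b1 <- -0.8452  # slope on ecoprc

# Hypothetical price points (illustration only)
prices <- c(0, 0.5, 1, 1.5)

# Predicted pounds of eco-labeled apples at each price
predicted_lbs <- b0 + b1 * prices
data.frame(ecoprc = prices, predicted_ecolbs = predicted_lbs)
```

With the fitted model object available, `predict(model2, newdata = data.frame(ecoprc = prices))` should give the same values up to rounding of the coefficients.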
model3 <- lm(reglbs ~ regprc + ecoprc, data = apples)
summary(model3)
##
## Call:
## lm(formula = reglbs ~ regprc + ecoprc, data = apples)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.661 -1.278 -0.895 0.546 40.897
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.7187 0.4448 3.864 0.000123 ***
## regprc -1.5689 0.8318 -1.886 0.059723 .
## ecoprc 0.8771 0.6880 1.275 0.202823
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.906 on 657 degrees of freedom
## Multiple R-squared: 0.0058, Adjusted R-squared: 0.002773
## F-statistic: 1.916 on 2 and 657 DF, p-value: 0.148
Including both regprc and ecoprc in the regression of reglbs makes the estimated coefficient on regprc more negative: -1.5689, compared with -0.6879 previously. Since the coefficient on ecoprc is positive (0.8771), leaving ecoprc out pushes the regprc coefficient upward only if the two prices move together. This suggests that regprc and ecoprc are strongly positively correlated, which we can verify with the cor() function.
cor(apples$regprc, apples$ecoprc)
## [1] 0.8307587
A correlation of 0.83 indicates a strong positive linear relationship between the two prices, since a correlation always lies between -1 and 1. When one price is high, the other tends to be high as well.
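The link between the two regprc coefficients can be written as the omitted-variable identity: the slope from the short regression equals the slope from the long regression plus the coefficient on the omitted variable times the slope from regressing the omitted variable on the included one. The sketch below illustrates this on simulated data (the names `p_reg`, `p_eco`, `q_reg` and all true coefficients are hypothetical), and then backs out the auxiliary slope implied by the estimates reported above.

```r
# Simulated data (hypothetical): two positively correlated prices
set.seed(2)
n <- 500
p_eco <- runif(n, 0.5, 1.5)
p_reg <- 0.8 * p_eco + rnorm(n, sd = 0.1)
q_reg <- 2 - 1.5 * p_reg + 0.9 * p_eco + rnorm(n)

short_fit <- lm(q_reg ~ p_reg)          # omits p_eco
long_fit  <- lm(q_reg ~ p_reg + p_eco)  # includes both prices
aux_fit   <- lm(p_eco ~ p_reg)          # omitted price on included price

# Identity: short slope = long slope + (coef on omitted) * (auxiliary slope)
rebuilt <- coef(long_fit)[["p_reg"]] +
  coef(long_fit)[["p_eco"]] * coef(aux_fit)[["p_reg"]]
all.equal(coef(short_fit)[["p_reg"]], rebuilt)

# Same identity applied to the reported estimates from model1 and model3:
# solve for the implied slope of ecoprc on regprc
implied_aux_slope <- (-0.6879 - (-1.5689)) / 0.8771
implied_aux_slope
```

The implied auxiliary slope comes out at roughly 1, which fits with the strong positive correlation between the two prices.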
model4 <- lm(ecolbs ~ regprc + ecoprc, data = apples)
summary(model4)
##
## Call:
## lm(formula = ecolbs ~ regprc + ecoprc, data = apples)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.087 -1.087 -0.537 0.560 39.913
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.9653 0.3801 5.171 3.10e-07 ***
## regprc 3.0289 0.7108 4.261 2.33e-05 ***
## ecoprc -2.9265 0.5879 -4.978 8.23e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.483 on 657 degrees of freedom
## Multiple R-squared: 0.03641, Adjusted R-squared: 0.03348
## F-statistic: 12.41 on 2 and 657 DF, p-value: 5.107e-06
The \(R^2\) is 0.03641, meaning that regprc and ecoprc together explain only about 3.6% of the variation in ecolbs.
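For reference, \(R^2\) is one minus the ratio of the residual sum of squares to the total sum of squares of the dependent variable. The sketch below reproduces the value reported by `summary()` on simulated data; the variables `x1`, `x2`, `y` and their coefficients are hypothetical and stand in for the apples data.

```r
# Simulated data (hypothetical; a noisy outcome gives a low R-squared)
set.seed(3)
n <- 200
x1 <- runif(n)
x2 <- runif(n)
y <- 1 + 0.5 * x1 - 0.5 * x2 + rnorm(n, sd = 2)

fit <- lm(y ~ x1 + x2)

rss <- sum(resid(fit)^2)      # residual sum of squares
tss <- sum((y - mean(y))^2)   # total sum of squares of the outcome
r2_manual <- 1 - rss / tss

# Should match the value reported by summary()
all.equal(r2_manual, summary(fit)$r.squared)
```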
stargazer(model1, model2, model3, model4,
type = "html")
| Dependent variable: | reglbs (1) | ecolbs (2) | reglbs (3) | ecolbs (4) |
| --- | --- | --- | --- | --- |
| regprc | -0.688 | | -1.569* | 3.029*** |
| | (0.463) | | (0.832) | (0.711) |
| ecoprc | | -0.845** | 0.877 | -2.926*** |
| | | (0.331) | (0.688) | (0.588) |
| Constant | 1.890*** | 2.388*** | 1.719*** | 1.965*** |
| | (0.424) | (0.372) | (0.445) | (0.380) |
| Observations | 660 | 660 | 660 | 660 |
| R2 | 0.003 | 0.010 | 0.006 | 0.036 |
| Adjusted R2 | 0.002 | 0.008 | 0.003 | 0.033 |
| Residual Std. Error | 2.907 (df = 658) | 2.515 (df = 658) | 2.906 (df = 657) | 2.483 (df = 657) |
| F Statistic | 2.205 (df = 1; 658) | 6.501** (df = 1; 658) | 1.916 (df = 2; 657) | 12.414*** (df = 2; 657) |
| Note: | * p<0.1; ** p<0.05; *** p<0.01 | | | |
As we can see, model 2 (ecolbs on ecoprc) and model 4 (ecolbs on regprc and ecoprc) had the highest \(R^2\), with 0.010 and 0.036 respectively. Both models use ecolbs as the dependent variable, with either ecoprc alone or both ecoprc and regprc as the independent variables. It makes sense that \(R^2\) increased from model 2 to model 4: adding a regressor can never lower \(R^2\), and here regprc is a highly significant predictor of ecolbs.
We should also note that the coefficient on regprc in model 1 is not statistically significant (p-value ≥ 0.1), so regprc explains little of the variation in reglbs. Even though we added another variable (ecoprc) in model 3, the \(R^2\) went up only slightly (from 0.003 to 0.006), and it remained lower than in either model 2 or model 4.
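Two facts sit behind this comparison: \(R^2\) cannot fall when a regressor is added, while adjusted \(R^2\) penalizes the extra parameter. The sketch below illustrates both on simulated data (the variables `x1`, `noise_var`, and `y` are hypothetical) and then recomputes the adjusted \(R^2\) of model 4 from the numbers reported in its summary.

```r
# Simulated data (hypothetical): noise_var is unrelated to y by construction
set.seed(4)
n <- 200
x1 <- runif(n)
noise_var <- rnorm(n)
y <- 1 + 0.5 * x1 + rnorm(n)

small <- lm(y ~ x1)
large <- lm(y ~ x1 + noise_var)

r2_small <- summary(small)$r.squared
r2_large <- summary(large)$r.squared
r2_large >= r2_small  # adding a regressor never lowers R-squared

# Adjusted R-squared = 1 - (1 - R2) * (n - 1) / (n - k - 1), k = number of slopes
adj_manual <- 1 - (1 - r2_large) * (n - 1) / (n - 2 - 1)
all.equal(adj_manual, summary(large)$adj.r.squared)

# Same formula with the reported values for model4 (n = 660, k = 2)
adj_model4 <- 1 - (1 - 0.03641) * (660 - 1) / (660 - 2 - 1)
round(adj_model4, 5)
```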
confint(model4, level = 0.99)
## 0.5 % 99.5 %
## (Intercept) 0.9834318 2.947175
## regprc 1.1925995 4.865227
## ecoprc -4.4452828 -1.407650
The output shows the lower and upper bounds for each coefficient: the 99% confidence interval for regprc runs from 1.19 to 4.87, and for ecoprc from -4.45 to -1.41. This means that if we drew many samples and built an interval from each one in the same way, about 99% of those intervals would contain the true coefficient. Also, none of the confidence intervals includes 0, so each coefficient is statistically significant at the 1% level and we can reject the null hypothesis that it equals zero.
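The bounds from `confint()` can be reproduced by hand as estimate ± critical value × standard error, where the critical value comes from the t distribution with the residual degrees of freedom. The sketch below plugs in the estimates, standard errors, and degrees of freedom reported in the model4 summary; because those inputs are rounded, the results should agree with `confint()` only to about three decimal places.

```r
# Values reported in the summary(model4) output
est <- c(regprc = 3.0289, ecoprc = -2.9265)
se  <- c(regprc = 0.7108, ecoprc = 0.5879)
df  <- 657

# Two-sided 99% interval: 0.5% in each tail
t_crit <- qt(0.995, df)

lower <- est - t_crit * se
upper <- est + t_crit * se
round(cbind(lower, upper), 2)
```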