EC3133
A company wants to understand the relationship between advertising expenditure and sales.
| Month | Advertising | Sales |
|---|---|---|
| 1 | 2 | 4 |
| 2 | 3 | 5 |
| 3 | 5 | 7 |
| 4 | 7 | 10 |
| 5 | 9 | 15 |
For the simple linear model \(Y_i = \beta_0 + \beta_1 X_i + u_i\), the OLS estimators are:
\[ \hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} \]
\[ \hat{\beta}_0 = \bar{Y} - \hat{\beta}_1\bar{X} \]
# Enter the data from the table above
advertising_data <- data.frame(
  Month = 1:5,
  Advertising = c(2, 3, 5, 7, 9),
  Sales = c(4, 5, 7, 10, 15)
)
# Calculate means
X_bar <- mean(advertising_data$Advertising)
Y_bar <- mean(advertising_data$Sales)
# Calculate beta1
numerator <- sum((advertising_data$Advertising - X_bar) *
                 (advertising_data$Sales - Y_bar))
denominator <- sum((advertising_data$Advertising - X_bar)^2)
beta1_manual <- numerator / denominator
# Calculate beta0
beta0_manual <- Y_bar - beta1_manual * X_bar
# Print results
cat("Manual Calculation:\n")
cat("β₁ =", round(beta1_manual, 3), "\n")
cat("β₀ =", round(beta0_manual, 3), "\n")
## Manual Calculation:
## β₁ = 1.518
## β₀ = 0.305
# Use lm() to calculate OLS coefficients
ols_model <- lm(Sales ~ Advertising, data = advertising_data)
summary(ols_model)$coefficients
##             Estimate Std. Error  t value    Pr(>|t|)
## (Intercept) 0.304878  1.0435206 0.292163 0.789201842
## Advertising 1.518293  0.1800244 8.433816 0.003498189
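As a cross-check, the same coefficients can be obtained from the matrix form of OLS, \(\hat{\beta} = (X'X)^{-1}X'y\). The sketch below re-enters the data from the table so that it runs on its own; the names `X`, `y`, `beta_matrix`, and `beta_lm` are illustrative and do not come from the original notes.

```r
# Data re-entered from the advertising table (Month is not needed here)
advertising <- c(2, 3, 5, 7, 9)
sales <- c(4, 5, 7, 10, 15)

# Design matrix with a column of ones for the intercept
X <- cbind(Intercept = 1, Advertising = advertising)
y <- sales

# Solve the normal equations (X'X) beta = X'y instead of inverting X'X explicitly
beta_matrix <- drop(solve(crossprod(X), crossprod(X, y)))
print(round(beta_matrix, 6))

# Compare with lm(); all.equal() allows for floating-point error
beta_lm <- coef(lm(sales ~ advertising))
print(all.equal(unname(beta_matrix), unname(beta_lm)))
```

Solving the normal equations with `solve(A, b)` is generally preferable to computing the inverse of `X'X` and multiplying, although for a 2 x 2 system the difference is negligible.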
# Create scatter plot with regression line
library(ggplot2)
ggplot(advertising_data, aes(x = Advertising, y = Sales)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Sales vs. Advertising",
       x = "Advertising Expenditure",
       y = "Sales") +
  theme_minimal()
Estimating returns to education using parental education as an instrument.
| Individual | Education | Income | ParentalEd |
|---|---|---|---|
| 1 | 10 | 30 | 12 |
| 2 | 12 | 35 | 14 |
| 3 | 8 | 25 | 10 |
| 4 | 15 | 50 | 16 |
| 5 | 9 | 28 | 11 |
Two-Stage Least Squares (2SLS), with \(Y\) = Income, \(X\) = Education, and the instrument \(Z\) = ParentalEd:
Stage 1: Regress \(X\) on \(Z\) \[ X = \pi_0 + \pi_1Z + v \]
Stage 2: Regress \(Y\) on \(\hat{X}\) \[ Y = \beta_0 + \beta_1\hat{X} + u \]
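# Enter the data from the table above
education_data <- data.frame(Individual = 1:5, Education = c(10, 12, 8, 15, 9),
                             Income = c(30, 35, 25, 50, 28), ParentalEd = c(12, 14, 10, 16, 11))
# First stage regression
stage1_manual <- lm(Education ~ ParentalEd, data = education_data)
education_data$fitted_education_manual <- fitted(stage1_manual)
# Second stage regression (the coefficients are the 2SLS estimates, but the standard
# errors are not valid: the residuals use the fitted values rather than Education)
stage2_manual <- lm(Income ~ fitted_education_manual, data = education_data)
summary(stage2_manual)$coefficients
##                          Estimate Std. Error    t value    Pr(>|t|)
## (Intercept)             -3.428571  6.4163816 -0.5343466 0.630164652
## fitted_education_manual  3.428571  0.5791589  5.9199150 0.009629824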
# First stage
stage1 <- lm(Education ~ ParentalEd, data = education_data)
education_data$fitted_education <- fitted(stage1)
# Second stage
stage2 <- lm(Income ~ fitted_education, data = education_data)
summary(stage2)$coefficients
##                   Estimate Std. Error    t value    Pr(>|t|)
## (Intercept)      -3.428571  6.4163816 -0.5343466 0.630164652
## fitted_education  3.428571  0.5791589  5.9199150 0.009629824
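With one instrument and one endogenous regressor, the 2SLS slope equals the ratio \(\mathrm{cov}(Z, Y) / \mathrm{cov}(Z, X)\), which gives a quick check on the two-step calculation. The sketch below also computes a slope standard error from residuals based on the observed Education values, using an \(n - 2\) degrees-of-freedom correction (one common convention, assumed here). All variable names are illustrative.

```r
# Data re-entered from the education table
education   <- c(10, 12, 8, 15, 9)
income      <- c(30, 35, 25, 50, 28)
parental_ed <- c(12, 14, 10, 16, 11)
n <- length(income)

# IV slope as a ratio of covariances; intercept from the sample means
beta1_iv <- cov(parental_ed, income) / cov(parental_ed, education)
beta0_iv <- mean(income) - beta1_iv * mean(education)

# 2SLS residuals use the observed regressor, not the first-stage fitted values
resid_iv <- income - beta0_iv - beta1_iv * education
sigma2_iv <- sum(resid_iv^2) / (n - 2)

# Slope standard error: residual variance over the variation in the fitted values
education_hat <- fitted(lm(education ~ parental_ed))
se_beta1_iv <- sqrt(sigma2_iv / sum((education_hat - mean(education_hat))^2))

cat("IV slope:", round(beta1_iv, 6), "\n")
cat("IV intercept:", round(beta0_iv, 6), "\n")
cat("Slope SE based on 2SLS residuals:", se_beta1_iv, "\n")
```

With only five observations these standard errors are illustrative at best; the point is which residuals enter the calculation.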
Estimating the parameter of an exponential distribution for light bulb lifetimes.
| Bulb | Lifetime |
|---|---|
| 1 | 1000 |
| 2 | 1200 |
| 3 | 800 |
| 4 | 950 |
| 5 | 1100 |
For an exponential distribution with rate \(\lambda\), the maximum likelihood estimator is: \[ \hat{\lambda} = \frac{1}{\bar{y}} \]
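This estimator follows from maximizing the log-likelihood. A short derivation, assuming independent observations \(y_1, \dots, y_n\) with density \(f(y; \lambda) = \lambda e^{-\lambda y}\):

\[ \ell(\lambda) = \sum_{i=1}^{n} \log\left(\lambda e^{-\lambda y_i}\right) = n \log \lambda - \lambda \sum_{i=1}^{n} y_i \]
\[ \frac{d\ell}{d\lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} y_i = 0 \quad \Rightarrow \quad \hat{\lambda} = \frac{n}{\sum_{i=1}^{n} y_i} = \frac{1}{\bar{y}} \]

The second derivative, \(-n/\lambda^2\), is negative, so this is a maximum. For the five bulbs, \(\bar{y} = 5050/5 = 1010\), so \(\hat{\lambda} = 1/1010 \approx 0.00099\).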
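# Enter the data from the table above
bulb_data <- data.frame(Bulb = 1:5, Lifetime = c(1000, 1200, 800, 950, 1100))
# Manual calculation of lambda
lambda_manual <- 1/mean(bulb_data$Lifetime)
cat("Manual Calculation:\n")
cat("λ =", round(lambda_manual, 5), "\n")
## Manual Calculation:
## λ = 0.00099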
# Direct calculation using MLE
lambda_direct <- 1/mean(bulb_data$Lifetime)
cat("Direct Calculation:\n")
cat("λ =", round(lambda_direct, 5), "\n")
## Direct Calculation:
## λ = 0.00099
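Because the closed-form estimate is used in both calculations, a numerical maximization of the log-likelihood gives an independent check. The sketch below uses base R's `optimize()`; the search interval and the names `loglik` and `lambda_numeric` are assumptions made for this illustration.

```r
# Lifetimes re-entered from the bulb table
lifetime <- c(1000, 1200, 800, 950, 1100)

# Exponential log-likelihood: n * log(lambda) - lambda * sum(y)
loglik <- function(lambda) sum(dexp(lifetime, rate = lambda, log = TRUE))

# Search a bracket that comfortably contains 1 / mean(lifetime)
fit <- optimize(loglik, interval = c(1e-6, 1e-2), maximum = TRUE, tol = 1e-10)
lambda_numeric <- fit$maximum

cat("Numerical MLE:", lambda_numeric, "\n")
cat("Closed form:  ", 1 / mean(lifetime), "\n")
```

The two values should agree up to the optimizer's tolerance; a noticeable gap would suggest the interval does not bracket the maximum.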
# Plot histogram with fitted exponential density
# (after_stat(density) replaces the deprecated ..density.. notation)
ggplot(bulb_data, aes(x = Lifetime)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "lightblue", color = "black") +
  stat_function(fun = dexp, args = list(rate = lambda_direct), color = "red") +
  labs(title = "Light Bulb Lifetimes with Fitted Exponential Distribution",
       x = "Lifetime (hours)",
       y = "Density") +
  theme_minimal()
Estimating parameters of a normal distribution for height data.
| Individual | Height |
|---|---|
| 1 | 160 |
| 2 | 170 |
| 3 | 165 |
| 4 | 175 |
| 5 | 180 |
For a normal distribution, the method of moments estimators are:
\[ \hat{\mu} = \bar{Y} \] \[ \hat{\sigma}^2 = \frac{1}{n} \sum (Y_i - \bar{Y})^2 \]
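height_data <- data.frame(Individual = 1:5, Height = c(160, 170, 165, 175, 180))
# Manual calculation of mean and variance
mu_manual <- mean(height_data$Height)
# The method of moments divides by n; var() divides by n - 1 and would give 62.5
sigma2_manual <- mean((height_data$Height - mu_manual)^2)
cat("Manual Calculation:\nμ̂ =", mu_manual, "\nσ̂² =", sigma2_manual, "\n")
## Manual Calculation:
## μ̂ = 170
## σ̂² = 50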
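# Direct calculation using R functions
n <- nrow(height_data)
mu_direct <- mean(height_data$Height)
# var() divides by n - 1, so rescale to obtain the 1/n moment estimator
sigma2_direct <- var(height_data$Height) * (n - 1) / n
cat("Direct Calculation:\nμ̂ =", mu_direct, "\nσ̂² =", sigma2_direct, "\n")
## Direct Calculation:
## μ̂ = 170
## σ̂² = 50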
Now let’s plot the histogram with fitted normal density:
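A minimal sketch of such a plot, using base R graphics so that it runs without extra packages (a choice made here; the other plots in these notes use ggplot2). It re-enters the heights from the table and uses the \(1/n\) variance from the method of moments formula; the names `mu_hat` and `sigma_hat` and the axis limits are illustrative.

```r
# Heights re-entered from the table
height <- c(160, 170, 165, 175, 180)

# Method of moments estimates: sample mean and the 1/n variance
mu_hat <- mean(height)
sigma_hat <- sqrt(mean((height - mu_hat)^2))

# Histogram on the density scale, so the fitted curve is comparable
hist(height, freq = FALSE, breaks = 5, col = "lightblue", border = "black",
     xlim = c(140, 200),
     main = "Heights with Fitted Normal Distribution",
     xlab = "Height", ylab = "Density")

# Overlay the fitted normal density with mean mu_hat and sd sigma_hat
curve(dnorm(x, mean = mu_hat, sd = sigma_hat), add = TRUE, col = "red")
```

With only five observations the histogram is coarse, so the overlay is a visual aid rather than a goodness-of-fit check.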
Remember: