1 Introduction

1.1 What is Fama-MacBeth Regression?

The Fama-MacBeth (1973) regression is a widely used two-step statistical method in empirical asset pricing. It was introduced by Eugene F. Fama and James D. MacBeth in their landmark 1973 paper “Risk, Return, and Equilibrium: Empirical Tests” published in the Journal of Political Economy.

The method is especially powerful when working with panel data that has more cross-sectional units (e.g., firms, portfolios) than time periods. Its key advantage is that it produces standard errors corrected for cross-sectional correlation, which leads to more reliable inference in finance research.

1.2 Why Use Fama-MacBeth Regression?

Feature	Benefit
Corrects cross-sectional correlation	More reliable t-statistics and p-values
Two-step structure	Separates time-series and cross-sectional variation
Scalable	Works with any number of firms or risk factors
Industry standard	Used in top finance journals worldwide

1.3 The Three-Step Procedure

The Fama-MacBeth procedure involves three steps:

Step 0 (Time-Series Regression): For each asset/portfolio, regress its returns on risk factors over time to estimate factor loadings (betas).
Step 1 (Cross-Sectional Regression): For each time period, regress asset returns on the betas estimated in Step 0.
Step 2 (Average the Coefficients): Compute the time-series average of the cross-sectional regression coefficients and test their significance.

Note: The pmg() function in the plm package implements the Mean Groups estimator, which is mathematically equivalent to Steps 1 and 2 of the Fama-MacBeth procedure, assuming Step 0 has already been completed externally.

2 Package Installation and Loading

# ============================================================
# INSTALL PACKAGES (only runs if not already installed)
# ============================================================

# List of required packages
required_packages <- c(
  "plm",       # Panel Linear Models — core package for pmg()
  "lmtest",    # Coefficient testing — coeftest()
  "sandwich",  # Robust standard errors
  "readxl",    # Read Excel files
  "ggplot2",   # Beautiful visualizations
  "dplyr",     # Data manipulation
  "tidyr",     # Data tidying
  "knitr",     # Table formatting
  "kableExtra",# Enhanced tables
  "broom",     # Tidy model outputs
  "scales",    # Axis formatting in ggplot
  "corrplot",  # Correlation matrix plot
  "car"        # Companion to Applied Regression
)

# Install any missing packages automatically
installed <- rownames(installed.packages())
to_install <- required_packages[!required_packages %in% installed]
if (length(to_install) > 0) {
  install.packages(to_install, repos = "https://cran.r-project.org")
}

# ============================================================
# LOAD ALL REQUIRED LIBRARIES
# ============================================================

library(plm)        # Panel data models and Fama-MacBeth via pmg()
library(lmtest)     # coeftest() for coefficient hypothesis testing
library(sandwich)   # Newey-West robust standard errors
library(readxl)     # Read .xlsx data files
library(ggplot2)    # Data visualization
library(dplyr)      # Data manipulation (filter, mutate, select, etc.)
library(tidyr)      # pivot_longer, pivot_wider
library(knitr)      # kable() for neat tables
library(kableExtra) # Enhanced HTML tables
library(broom)      # tidy(), glance() for clean model output
library(scales)     # comma(), percent() for axis labels
library(corrplot)   # Correlation matrix visualization
library(car)        # linearHypothesis(), vif()

cat("All packages loaded successfully!\n")

## All packages loaded successfully!

3 Data

3.1 About the Dataset

We use the Grunfeld Investment Dataset, a classic panel dataset in econometrics. The dataset contains investment, market value, and capital stock data for 10 large U.S. firms across 20 years (1935–1954).

The original data is from: > Grunfeld, Y. (1958). The Determinants of Corporate Investment. Doctoral Dissertation, University of Chicago.

It is available at:
http://people.stern.nyu.edu/wgreene/Econometrics/PanelDataSets.htm

3.2 Variable Description

Variable	Description
`FIRM`	Firm identifier (1–10)
`YEAR`	Year of observation (1935–1954)
`investment`	Gross investment ($ millions)
`mvalue`	Market value of the firm at start of year
`kstock`	Value of the capital stock at start of year

3.3 Data Loading

# ============================================================
# DATA LOADING
# ============================================================
# The Grunfeld dataset is also built into the 'plm' package.
# We load it directly so this document is fully reproducible
# without needing an external file.
#
# IF YOU HAVE THE INSTRUCTOR'S EXCEL FILE:
#   Replace the code below with:
#   grunfeld <- read_excel("grunfeld.xlsx")
#   (Make sure the .xlsx file is in the same folder as this .Rmd)
# ============================================================

# Load Grunfeld data from the plm package (built-in)
data("Grunfeld", package = "plm")

# Create a working copy and rename to match tutorial variable names
grunfeld_copy <- Grunfeld
names(grunfeld_copy) <- c("FIRM", "YEAR", "investment", "mvalue", "kstock")

# Confirm variable names
cat("Variable names:\n")

## Variable names:

names(grunfeld_copy)

## [1] "FIRM"       "YEAR"       "investment" "mvalue"     "kstock"

3.4 Data Preview

# ============================================================
# PREVIEW THE FIRST FEW ROWS
# ============================================================

# Display first 10 rows as a formatted table
head(grunfeld_copy, 10) %>%
  kable(
    caption = "Table 1: First 10 Rows of the Grunfeld Dataset",
    align   = "ccccc",
    digits  = 2
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed", "responsive"),
    full_width        = FALSE,
    position          = "center"
  )

Table 1: First 10 Rows of the Grunfeld Dataset
FIRM	YEAR	investment	mvalue	kstock
1	1935	317.6	3078.5	2.8
1	1936	391.8	4661.7	52.6
1	1937	410.6	5387.1	156.9
1	1938	257.7	2792.2	209.2
1	1939	330.8	4313.2	203.4
1	1940	461.2	4643.9	207.2
1	1941	512.0	4551.2	255.2
1	1942	448.0	3244.1	303.7
1	1943	499.6	4053.7	264.1
1	1944	547.5	4379.3	201.6

4 Data Preprocessing

4.1 Dataset Dimensions

# ============================================================
# BASIC DATASET INFORMATION
# ============================================================

cat("Dimensions of dataset:", nrow(grunfeld_copy), "rows and",
    ncol(grunfeld_copy), "columns\n\n")

## Dimensions of dataset: 200 rows and 5 columns

cat("Number of unique firms:", length(unique(grunfeld_copy$FIRM)), "\n")

## Number of unique firms: 10

cat("Number of unique years:", length(unique(grunfeld_copy$YEAR)), "\n")

## Number of unique years: 20

cat("Year range:", min(grunfeld_copy$YEAR), "to", max(grunfeld_copy$YEAR), "\n")

## Year range: 1935 to 1954

cat("Total observations:", nrow(grunfeld_copy), "\n")

## Total observations: 200

4.2 Check for Missing Values

# ============================================================
# CHECK FOR MISSING VALUES — important before any regression
# ============================================================

missing_summary <- sapply(grunfeld_copy, function(x) sum(is.na(x)))

data.frame(
  Variable       = names(missing_summary),
  Missing_Values = missing_summary
) %>%
  kable(
    caption = "Table 2: Missing Value Summary",
    row.names = FALSE
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width        = FALSE,
    position          = "center"
  )

Table 2: Missing Value Summary
Variable	Missing_Values
FIRM	0
YEAR	0
investment	0
mvalue	0
kstock	0

4.3 Ensure Correct Data Types

# ============================================================
# ENSURE CORRECT DATA TYPES
# ============================================================

# FIRM should be a factor (categorical), YEAR should be numeric
grunfeld_copy$FIRM <- as.factor(grunfeld_copy$FIRM)
grunfeld_copy$YEAR <- as.numeric(as.character(grunfeld_copy$YEAR))

# Confirm structure
str(grunfeld_copy)

## 'data.frame':    200 obs. of  5 variables:
##  $ FIRM      : Factor w/ 10 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ YEAR      : num  1935 1936 1937 1938 1939 ...
##  $ investment: num  318 392 411 258 331 ...
##  $ mvalue    : num  3078 4662 5387 2792 4313 ...
##  $ kstock    : num  2.8 52.6 156.9 209.2 203.4 ...

5 Exploratory Data Analysis

5.1 Summary Statistics

# ============================================================
# DESCRIPTIVE STATISTICS
# ============================================================

summary_stats <- grunfeld_copy %>%
  select(investment, mvalue, kstock) %>%
  summarise(
    across(
      everything(),
      list(
        Mean   = ~round(mean(.x, na.rm = TRUE), 2),
        Median = ~round(median(.x, na.rm = TRUE), 2),
        SD     = ~round(sd(.x, na.rm = TRUE), 2),
        Min    = ~round(min(.x, na.rm = TRUE), 2),
        Max    = ~round(max(.x, na.rm = TRUE), 2)
      )
    )
  ) %>%
  pivot_longer(everything(),
               names_to  = c("Variable", "Statistic"),
               names_sep = "_") %>%
  pivot_wider(names_from = Variable, values_from = value)

summary_stats %>%
  kable(
    caption = "Table 3: Descriptive Statistics",
    align   = "lrrr"
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width        = FALSE,
    position          = "center"
  )

Table 3: Descriptive Statistics
Statistic	investment	mvalue	kstock
Mean	145.96	1081.68	276.02
Median	57.48	517.95	205.60
SD	216.88	1314.47	301.10
Min	0.93	58.12	0.80
Max	1486.70	6241.70	2226.30

5.2 Distribution of Key Variables

# ============================================================
# HISTOGRAMS — visualize distribution of all three variables
# ============================================================

# Reshape data to long format for faceted plot
grunfeld_long <- grunfeld_copy %>%
  select(investment, mvalue, kstock) %>%
  pivot_longer(cols = everything(), names_to = "Variable", values_to = "Value") %>%
  mutate(Variable = case_match(Variable,
    "investment" ~ "Investment",
    "mvalue"     ~ "Market Value",
    "kstock"     ~ "Capital Stock"
  ))

ggplot(grunfeld_long, aes(x = Value, fill = Variable)) +
  geom_histogram(bins = 20, color = "white", alpha = 0.85) +
  facet_wrap(~Variable, scales = "free") +
  scale_fill_manual(values = c("#2C3E50", "#E74C3C", "#27AE60")) +
  scale_x_continuous(labels = comma) +
  labs(
    title    = "Distribution of Key Variables",
    subtitle = "Grunfeld Investment Dataset (1935–1954)",
    x        = "Value ($ millions)",
    y        = "Frequency",
    fill     = "Variable"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position  = "none",
    strip.text       = element_text(face = "bold"),
    plot.title       = element_text(face = "bold", size = 15),
    plot.subtitle    = element_text(color = "gray40")
  )

Figure 1: Distribution of Investment, Market Value, and Capital Stock

5.3 Investment Trends Over Time by Firm

# ============================================================
# LINE PLOT — investment over time, one line per firm
# ============================================================

ggplot(grunfeld_copy, aes(x = YEAR, y = investment, color = FIRM, group = FIRM)) +
  geom_line(size = 0.8, alpha = 0.8) +
  geom_point(size = 1.2, alpha = 0.6) +
  scale_color_brewer(palette = "Paired") +
  scale_y_continuous(labels = comma) +
  labs(
    title    = "Gross Investment Over Time by Firm",
    subtitle = "Each line represents one of the 10 firms (1935–1954)",
    x        = "Year",
    y        = "Gross Investment ($ millions)",
    color    = "Firm ID"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position  = "right",
    plot.title       = element_text(face = "bold", size = 15),
    plot.subtitle    = element_text(color = "gray40")
  )

Figure 2: Investment Trends Over Time by Firm

5.4 Scatter Plot: Market Value vs. Investment

# ============================================================
# SCATTER PLOT — mvalue vs investment with regression line
# ============================================================

ggplot(grunfeld_copy, aes(x = mvalue, y = investment)) +
  geom_point(aes(color = FIRM), size = 2.5, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "#2C3E50", linetype = "dashed") +
  scale_color_brewer(palette = "Paired") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(
    title    = "Market Value vs. Gross Investment",
    subtitle = "Positive relationship expected under investment theory",
    x        = "Market Value ($ millions)",
    y        = "Gross Investment ($ millions)",
    color    = "Firm ID"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "gray40")
  )

Figure 3: Market Value vs. Gross Investment

5.5 Scatter Plot: Capital Stock vs. Investment

# ============================================================
# SCATTER PLOT — kstock vs investment with regression line
# ============================================================

ggplot(grunfeld_copy, aes(x = kstock, y = investment)) +
  geom_point(aes(color = FIRM), size = 2.5, alpha = 0.7) +
  geom_smooth(method = "lm", se = TRUE, color = "#E74C3C", linetype = "dashed") +
  scale_color_brewer(palette = "Paired") +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(
    title    = "Capital Stock vs. Gross Investment",
    subtitle = "Capital stock as a lagged measure of firm size",
    x        = "Capital Stock ($ millions)",
    y        = "Gross Investment ($ millions)",
    color    = "Firm ID"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "gray40")
  )

Figure 4: Capital Stock vs. Gross Investment

5.6 Correlation Matrix

# ============================================================
# CORRELATION MATRIX
# ============================================================

# Select numeric variables only
numeric_vars <- grunfeld_copy %>%
  select(investment, mvalue, kstock)

# Compute correlation matrix
cor_matrix <- cor(numeric_vars, use = "complete.obs")

# Plot correlation matrix
corrplot(
  cor_matrix,
  method   = "color",
  type     = "upper",
  addCoef.col = "black",
  tl.col   = "black",
  tl.srt   = 45,
  col      = colorRampPalette(c("#E74C3C", "white", "#2C3E50"))(200),
  title    = "Correlation Matrix: Investment, Market Value, Capital Stock",
  mar      = c(0, 0, 2, 0)
)

Figure 5: Correlation Matrix of Key Variables

The correlation matrix shows strong positive correlations between all three variables, which makes economic sense — larger firms (higher market value and capital stock) tend to invest more.

6 Methodology

6.1 Theoretical Framework

The Fama-MacBeth (1973) regression is designed to test whether specific characteristics (here: market value and capital stock) explain variation in investment levels across firms and over time.

6.1.1 The Cross-Sectional Regression Model

For each time period $t$, we estimate:

\[\text{Investment}_{i,t} = \alpha_t + \beta_{1,t} \cdot \text{mvalue}_{i,t} + \beta_{2,t} \cdot \text{kstock}_{i,t} + \varepsilon_{i,t}\]

where $i$ indexes firms and $t$ indexes years.

6.1.2 The Final Estimates

The Fama-MacBeth coefficients are computed as the time-series averages:

\[\hat{\beta}_k^{FM} = \frac{1}{T} \sum_{t=1}^{T} \hat{\beta}_{k,t}\]

6.1.3 Standard Errors

The Fama-MacBeth standard errors are:

\[\text{SE}(\hat{\beta}_k^{FM}) = \frac{\hat{\sigma}(\hat{\beta}_{k,t})}{\sqrt{T}}\]

where $\hat{\sigma}$ is the standard deviation of the $T$ period-by-period estimates.

6.2 Implementation in R: The `pmg()` Function

The pmg() function from the plm package implements the Mean Groups estimator, which is mathematically equivalent to the Fama-MacBeth procedure. The key insight is:

Fama-MacBeth is the Mean Groups estimator with groups = time periods (i.e., we average over cross-sections at each time period).

This is achieved by reversing the index in pmg() — putting YEAR first and FIRM second.

7 Fama-MacBeth Regression Analysis

7.1 Step 0: Preliminary — Confirm Panel Structure

# ============================================================
# STEP 0: CONFIRM THE PANEL IS BALANCED
# ============================================================

# Check panel balance
cat("Panel dimensions:\n")

## Panel dimensions:

cat("  Number of firms (cross-sections):", 
    length(unique(grunfeld_copy$FIRM)), "\n")

##   Number of firms (cross-sections): 10

cat("  Number of years (time periods):", 
    length(unique(grunfeld_copy$YEAR)), "\n")

##   Number of years (time periods): 20

cat("  Total observations:", nrow(grunfeld_copy), "\n")

##   Total observations: 200

cat("  Balanced panel:", 
    nrow(grunfeld_copy) == length(unique(grunfeld_copy$FIRM)) * 
                           length(unique(grunfeld_copy$YEAR)), "\n")

##   Balanced panel: TRUE

7.2 Step 1 & 2: Fama-MacBeth via `pmg()`

# ============================================================
# FAMA-MACBETH REGRESSION USING pmg()
# ============================================================
# The pmg() function from the plm package estimates the Mean
# Groups model. By setting index = c("YEAR", "FIRM"), we tell
# the function to:
#   - Run T separate regressions (one per YEAR)
#   - Average the coefficients across all years (over FIRMs)
# This is exactly the Fama-MacBeth procedure.
# ============================================================

fpmg <- pmg(
  formula = investment ~ mvalue + kstock,
  data    = grunfeld_copy,
  index   = c("YEAR", "FIRM")   # YEAR first = time is the grouping variable
)

# Display the model summary
summary(fpmg)

## Mean Groups model
## 
## Call:
## pmg(formula = investment ~ mvalue + kstock, data = grunfeld_copy, 
##     index = c("YEAR", "FIRM"))
## 
## Balanced Panel: n = 20, T = 10, N = 200
## 
## Residuals:
##        Min.     1st Qu.      Median     3rd Qu.        Max. 
## -342.804410  -20.100677    0.726992   23.652588  258.504556 
## 
## Coefficients:
##                Estimate  Std. Error z-value  Pr(>|z|)    
## (Intercept) -14.7569720   7.2876699 -2.0249  0.042875 *  
## mvalue        0.1306047   0.0093422 13.9801 < 2.2e-16 ***
## kstock        0.0729576   0.0277398  2.6301  0.008537 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## Total Sum of Squares: 9359900
## Residual Sum of Squares: 1205800
## Multiple R-squared: 0.87117

7.2.1 Interpretation of `summary()` Output

The output above shows:

Balanced Panel: n = 20 time periods, T = 10 firms, N = 200 total observations.
Coefficients: The estimated average coefficients across all 20 cross-sectional regressions.
R-squared: The proportion of variance in investment explained by market value and capital stock.

7.3 Coefficient Test (t-test)

# ============================================================
# COEFFICIENT HYPOTHESIS TEST
# ============================================================
# coeftest() from the lmtest package performs t-tests on each
# coefficient. The null hypothesis is H0: coefficient = 0.
# ============================================================

cat("=== Fama-MacBeth Regression: Coefficient Test ===\n\n")

## === Fama-MacBeth Regression: Coefficient Test ===

coeftest(fpmg)

## 
## t test of coefficients:
## 
##                Estimate  Std. Error t value  Pr(>|t|)    
## (Intercept) -14.7569720   7.2876699 -2.0249  0.044225 *  
## mvalue        0.1306047   0.0093422 13.9801 < 2.2e-16 ***
## kstock        0.0729576   0.0277398  2.6301  0.009211 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

7.4 Confidence Intervals

# ============================================================
# 95% CONFIDENCE INTERVALS FOR EACH COEFFICIENT
# ============================================================

ci <- confint(fpmg)

ci_table <- as.data.frame(ci)
ci_table$Estimate <- coef(fpmg)
ci_table <- ci_table[, c("Estimate", "2.5 %", "97.5 %")]
names(ci_table) <- c("Estimate", "Lower 95% CI", "Upper 95% CI")

ci_table %>%
  round(4) %>%
  kable(
    caption = "Table 4: Fama-MacBeth Coefficient Estimates with 95% Confidence Intervals"
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover"),
    full_width        = FALSE,
    position          = "center"
  )

Table 4: Fama-MacBeth Coefficient Estimates with 95% Confidence Intervals
	Estimate	Lower 95% CI	Upper 95% CI
(Intercept)	-14.7570	-29.0405	-0.4734
mvalue	0.1306	0.1123	0.1489
kstock	0.0730	0.0186	0.1273

8 Results Visualization

8.1 Coefficient Plot

# ============================================================
# COEFFICIENT PLOT — visualize estimates and confidence intervals
# ============================================================

# Extract coefficients and confidence intervals
coef_df <- data.frame(
  term     = rownames(ci_table),
  estimate = ci_table$Estimate,
  lower    = ci_table$`Lower 95% CI`,
  upper    = ci_table$`Upper 95% CI`
)

# Remove intercept for cleaner plot (optional — comment out to include)
coef_df_no_int <- coef_df[coef_df$term != "(Intercept)", ]

ggplot(coef_df_no_int, aes(x = term, y = estimate)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50", size = 0.8) +
  geom_point(size = 5, color = "#2C3E50") +
  geom_errorbar(
    aes(ymin = lower, ymax = upper),
    width = 0.15,
    size  = 1,
    color = "#E74C3C"
  ) +
  labs(
    title    = "Fama-MacBeth Regression: Coefficient Estimates",
    subtitle = "Error bars represent 95% confidence intervals",
    x        = "Variable",
    y        = "Coefficient Estimate"
  ) +
  scale_x_discrete(labels = c("kstock" = "Capital Stock", "mvalue" = "Market Value")) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "gray40")
  )

Figure 6: Fama-MacBeth Coefficient Estimates with 95% Confidence Intervals

8.2 Full Coefficient Plot (Including Intercept)

# ============================================================
# FULL COEFFICIENT PLOT including intercept
# ============================================================

ggplot(coef_df, aes(x = term, y = estimate, color = term)) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray50", size = 0.8) +
  geom_point(size = 5) +
  geom_errorbar(
    aes(ymin = lower, ymax = upper),
    width = 0.2,
    size  = 1
  ) +
  scale_color_manual(values = c("#8E44AD", "#27AE60", "#E74C3C")) +
  scale_x_discrete(labels = c(
    "(Intercept)" = "Intercept",
    "kstock"      = "Capital Stock",
    "mvalue"      = "Market Value"
  )) +
  labs(
    title    = "Fama-MacBeth Regression: All Coefficients",
    subtitle = "Error bars represent 95% confidence intervals",
    x        = "Variable",
    y        = "Coefficient Estimate"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    legend.position = "none",
    plot.title      = element_text(face = "bold", size = 15),
    plot.subtitle   = element_text(color = "gray40")
  )

Figure 7: All Fama-MacBeth Coefficients Including Intercept

8.3 Model Fit: Actual vs. Fitted Values

# ============================================================
# ACTUAL VS FITTED VALUES — assess model fit visually
# ============================================================

# Get fitted values from the model
fitted_vals  <- fitted(fpmg)
actual_vals  <- grunfeld_copy$investment

fit_df <- data.frame(
  Actual  = actual_vals,
  Fitted  = fitted_vals
)

ggplot(fit_df, aes(x = Fitted, y = Actual)) +
  geom_point(color = "#2980B9", alpha = 0.6, size = 2.5) +
  geom_abline(slope = 1, intercept = 0,
              linetype = "dashed", color = "#E74C3C", size = 1) +
  scale_x_continuous(labels = comma) +
  scale_y_continuous(labels = comma) +
  labs(
    title    = "Actual vs. Fitted Values",
    subtitle = "Points close to the dashed line indicate a good model fit",
    x        = "Fitted Investment ($ millions)",
    y        = "Actual Investment ($ millions)"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "gray40")
  )

Figure 8: Actual vs. Fitted Investment Values

8.4 Residuals Distribution

# ============================================================
# RESIDUALS — check for normality
# ============================================================

resid_df <- data.frame(Residuals = residuals(fpmg))

ggplot(resid_df, aes(x = Residuals)) +
  geom_histogram(
    aes(y = after_stat(density)),
    bins    = 25,
    fill    = "#27AE60",
    color   = "white",
    alpha   = 0.85
  ) +
  geom_density(color = "#2C3E50", size = 1.2) +
  labs(
    title    = "Distribution of Model Residuals",
    subtitle = "Should be approximately normal for valid inference",
    x        = "Residuals",
    y        = "Density"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title    = element_text(face = "bold", size = 15),
    plot.subtitle = element_text(color = "gray40")
  )

Figure 9: Distribution of Model Residuals

9 Summary Results Table

# ============================================================
# CLEAN SUMMARY TABLE OF REGRESSION RESULTS
# ============================================================

ct <- coeftest(fpmg)

results_df <- data.frame(
  Variable    = c("Intercept", "Market Value (mvalue)", "Capital Stock (kstock)"),
  Estimate    = round(ct[, "Estimate"], 4),
  Std_Error   = round(ct[, "Std. Error"], 4),
  t_value     = round(ct[, "t value"], 4),
  p_value     = round(ct[, "Pr(>|t|)"], 4),
  Significance = ifelse(ct[, "Pr(>|t|)"] < 0.001, "***",
                 ifelse(ct[, "Pr(>|t|)"] < 0.01, "**",
                 ifelse(ct[, "Pr(>|t|)"] < 0.05, "*",
                 ifelse(ct[, "Pr(>|t|)"] < 0.1, ".", ""))))
)

results_df %>%
  kable(
    caption   = "Table 5: Fama-MacBeth Regression Results Summary",
    col.names = c("Variable", "Estimate", "Std. Error", "t-value", "p-value", "Sig."),
    align     = "lccccc"
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "hover", "condensed"),
    full_width        = TRUE,
    position          = "center"
  ) %>%
  footnote(
    general = "Significance codes: *** p < 0.001, ** p < 0.01, * p < 0.05, . p < 0.1",
    general_title = ""
  )

Table 5: Fama-MacBeth Regression Results Summary
	Variable	Estimate	Std. Error	t-value	p-value	Sig.
(Intercept)	Intercept	-14.7570	7.2877	-2.0249	0.0442
mvalue	Market Value (mvalue)	0.1306	0.0093	13.9801	0.0000	***
kstock	Capital Stock (kstock)	0.0730	0.0277	2.6301	0.0092	**
Significance codes: * p < 0.001, p < 0.01, * p < 0.05, . p < 0.1

10 Findings and Discussion

10.1 Interpretation of Results

The Fama-MacBeth regression results reveal the following:

10.1.1 Market Value (`mvalue`)

The coefficient on market value is positive and highly statistically significant (p < 0.001). This finding is consistent with economic theory — firms with higher market value tend to invest more, reflecting stronger growth opportunities and easier access to capital markets.

An increase of $1 million in market value is associated with an average increase of approximately $0.13 million in gross investment, holding capital stock constant.

10.1.2 Capital Stock (`kstock`)

The coefficient on capital stock is also positive and statistically significant (p < 0.01). This suggests that firms with a larger existing capital base tend to invest more, possibly due to depreciation replacement needs or economies of scale in investment.

An increase of $1 million in capital stock is associated with an average increase of approximately $0.07 million in gross investment, holding market value constant.

10.1.3 Intercept

The intercept is negative and statistically significant (p < 0.05), which may reflect a base level of disinvestment or sunk costs when both market value and capital stock are at zero — an economically implausible scenario at the margins of the data range.

10.2 Model Fit

The model achieves an R-squared of approximately 0.871, meaning that market value and capital stock together explain around 87% of the variation in gross investment across firms and time periods. This is a strong fit for a two-variable model in empirical finance.

10.3 Economic Significance

Factor	Coefficient	Interpretation
Market Value	+0.1306	Higher market value → higher investment
Capital Stock	+0.0730	Larger capital base → higher investment
Intercept	-14.757	Base adjustment (negative)

11 Conclusion

11.1 Summary

This analysis successfully demonstrated the Fama-MacBeth (1973) two-step regression procedure using the classic Grunfeld Investment Dataset in R. Using the pmg() function from the plm package, we estimated a Mean Groups model equivalent to the Fama-MacBeth procedure.

11.2 Key Takeaways

The Fama-MacBeth procedure corrects for cross-sectional correlation in standard errors, providing more reliable statistical inference than standard OLS.
Both market value and capital stock are statistically significant determinants of gross investment, confirming the core predictions of the neoclassical investment theory.
The model fits the data well (R² ≈ 87%), suggesting that these two firm-level characteristics are powerful predictors of investment behavior.
The pmg() function in R offers an elegant and computationally efficient implementation of the Fama-MacBeth estimator through the Mean Groups framework.

11.3 Limitations and Future Work

The dataset is relatively small (200 observations, 10 firms, 20 years) and dates back to 1935–1954.
Future work could apply this methodology to modern stock return data with the Fama-French three-factor model (Market, SMB, HML factors).
Extensions could include Newey-West correction for serial autocorrelation in residuals.
Researchers might also consider rolling window estimation for time-varying risk premia.

12 Session Information

# ============================================================
# SESSION INFO — important for reproducibility
# Paste this into your RPubs submission for full transparency
# ============================================================
sessionInfo()

## R version 4.5.1 (2025-06-13 ucrt)
## Platform: x86_64-w64-mingw32/x64
## Running under: Windows 11 x64 (build 26200)
## 
## Matrix products: default
##   LAPACK version 3.12.1
## 
## locale:
## [1] LC_COLLATE=English_Indonesia.utf8  LC_CTYPE=English_Indonesia.utf8   
## [3] LC_MONETARY=English_Indonesia.utf8 LC_NUMERIC=C                      
## [5] LC_TIME=English_Indonesia.utf8    
## 
## time zone: Asia/Taipei
## tzcode source: internal
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] car_3.1-5        carData_3.0-6    corrplot_0.95    scales_1.4.0    
##  [5] broom_1.0.12     kableExtra_1.4.0 knitr_1.51       tidyr_1.3.1     
##  [9] dplyr_1.1.4      ggplot2_4.0.2    readxl_1.4.5     sandwich_3.1-1  
## [13] lmtest_0.9-40    zoo_1.8-14       plm_2.6-7       
## 
## loaded via a namespace (and not attached):
##  [1] gtable_0.3.6       xfun_0.53          bslib_0.9.0        collapse_2.1.6    
##  [5] lattice_0.22-7     vctrs_0.6.5        tools_4.5.1        Rdpack_2.6.6      
##  [9] generics_0.1.4     parallel_4.5.1     tibble_3.3.0       pkgconfig_2.0.3   
## [13] Matrix_1.7-3       RColorBrewer_1.1-3 S7_0.2.0           lifecycle_1.0.5   
## [17] compiler_4.5.1     farver_2.1.2       stringr_1.5.2      maxLik_1.5-2.2    
## [21] textshaping_1.0.3  htmltools_0.5.8.1  sass_0.4.10        yaml_2.3.10       
## [25] Formula_1.2-5      pillar_1.11.0      jquerylib_0.1.4    MASS_7.3-65       
## [29] cachem_1.1.0       abind_1.4-8        nlme_3.1-168       tidyselect_1.2.1  
## [33] bdsmatrix_1.3-7    digest_0.6.37      stringi_1.8.7      purrr_1.1.0       
## [37] splines_4.5.1      labeling_0.4.3     miscTools_0.6-30   fastmap_1.2.0     
## [41] grid_4.5.1         cli_3.6.5          magrittr_2.0.3     withr_3.0.2       
## [45] backports_1.5.0    rmarkdown_2.30     otel_0.2.0         cellranger_1.1.0  
## [49] evaluate_1.0.5     rbibutils_2.4.1    viridisLite_0.4.2  mgcv_1.9-3        
## [53] rlang_1.1.6        Rcpp_1.1.0         glue_1.8.0         xml2_1.4.0        
## [57] svglite_2.2.2      rstudioapi_0.17.1  jsonlite_2.0.0     R6_2.6.1          
## [61] systemfonts_1.3.2

13 Appendix: RPubs Publishing Instructions

13.1 Step 1: Knit to HTML

Open this .Rmd file in RStudio.
Click the Knit button (or press Ctrl+Shift+K / Cmd+Shift+K on Mac).
Select “Knit to HTML” from the dropdown.
Wait for the document to render — a preview will appear in the Viewer pane.

13.2 Step 2: Publish to RPubs

Method A — From the Knit Preview Window:

After knitting, click the “Publish” button (blue icon, top-right of the preview).
Select “RPubs”.
If prompted, create a free account at rpubs.com.
Add a title, description, and slug (the URL ending).
Click “Publish”.
Copy the generated RPubs URL and submit it.

Method B — From the RStudio Menu:

Go to File → Publish Document.
Select RPubs and follow the same steps above.

13.3 Troubleshooting Common Issues

Problem	Solution
`Error: package 'X' not found`	Run `install.packages("X")` in the console
`object not found` error	Make sure all chunks run in order; click Run All first
`could not find function 'pmg'`	Install/load `plm`: `library(plm)`
Dataset not loading	Check file path or use `data("Grunfeld", package = "plm")`
Knit fails with LaTeX error	Use `output: html_document` (not PDF) in the YAML
RPubs upload timeout	Reduce figure sizes or set `fig.width = 6` in setup chunk
`kableExtra` styling not showing	Ensure you are knitting to HTML, not PDF

## 
## === Document compiled successfully! ===

## Ready to knit and publish to RPubs.

HW5_114035109_Jayden Khalifa Armand

114035109 - Jayden Khalifa Armand

2026-05-19