Simulating Time-Series Cross-Sectional Data

Jonathan Kropko
February 13, 2017

Time-Series Cross-Sectional (TSCS) Data

TSCS data contain

  • N cases,
  • each observed repeatedly at T consecutive time points.

Variables may be related to each other over time in a way that is different from how they are related cross-sectionally.

The challenge is to simulate data while controlling BOTH the over-time and cross-sectional slopes.

Formulas

Within one time point, we want the slope to be \( \beta \). Within one case, we want the slope to be \( \gamma \). So we have the system of equations

\[ \begin{cases} y_{it} = \alpha_t + \beta x_{it} + \varepsilon_{it},\\ y_{it} = \alpha_i + \gamma x_{it} + \delta_{it}. \end{cases} \]

Setting the two right-hand sides equal eliminates \( y_{it} \). Provided \( \beta \neq \gamma \), the system solves to

\[ x_{it} = \frac{(\alpha_i - \alpha_t) + (\delta_{it} - \varepsilon_{it})}{\beta - \gamma} \]

and

\[ y_{it} = \frac{(\beta \alpha_i - \gamma \alpha_t) + (\beta \delta_{it} - \gamma \varepsilon_{it})}{\beta - \gamma}. \]
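The closed-form solutions above can be sketched directly in R. This is a minimal illustration under stated assumptions, not the code behind twsimdata(): the panel dimensions, the slope values, the normal distributions for the intercepts and errors, and all variable names are hypothetical choices made only for this example.

```r
set.seed(1)

# Assumed (hypothetical) settings for illustration only
N     <- 30     # number of cases
T_pts <- 30     # number of time points
beta  <- 1      # target slope within a time point
gamma <- -0.5   # target slope within a case (must differ from beta)

alpha_i <- rnorm(N)       # case-specific intercepts
alpha_t <- rnorm(T_pts)   # time-specific intercepts

# One row per case-time pair
sim <- expand.grid(case = 1:N, time = 1:T_pts)
sim$eps   <- rnorm(nrow(sim), sd = 0.1)   # error in the within-time equation
sim$delta <- rnorm(nrow(sim), sd = 0.1)   # error in the within-case equation

a_i <- alpha_i[sim$case]
a_t <- alpha_t[sim$time]

# Closed-form solutions for x and y
sim$x <- ((a_i - a_t) + (sim$delta - sim$eps)) / (beta - gamma)
sim$y <- ((beta * a_i - gamma * a_t) +
          (beta * sim$delta - gamma * sim$eps)) / (beta - gamma)

# Both original equations should hold for every row
resid_time <- sim$y - a_t - beta  * sim$x   # should reproduce eps
resid_case <- sim$y - a_i - gamma * sim$x   # should reproduce delta
```

If the algebra is right, resid_time matches sim$eps and resid_case matches sim$delta up to floating-point error, which confirms that the generated x and y satisfy both equations at once.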

The `twsimdata()` command

I authored a twsimdata() command to generate \( x_{it} \) and \( y_{it} \) from these equations. The data are saved in twdata.csv.

twdata <- read.csv("twdata.csv")
summary(twdata[,1:4])
      case           time            y                  x           
 Min.   : 1.0   Min.   : 1.0   Min.   :-1.66622   Min.   :-0.50828  
 1st Qu.: 8.0   1st Qu.: 8.0   1st Qu.:-0.51042   1st Qu.:-0.06943  
 Median :15.5   Median :15.5   Median :-0.03604   Median : 0.07913  
 Mean   :15.5   Mean   :15.5   Mean   :-0.04012   Mean   : 0.07839  
 3rd Qu.:23.0   3rd Qu.:23.0   3rd Qu.: 0.41392   3rd Qu.: 0.23062  
 Max.   :30.0   Max.   :30.0   Max.   : 1.85683   Max.   : 0.71479  
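
One way to inspect the two kinds of slope in a data set with case, time, x, and y columns is to fit a separate regression within each time point and within each case. The sketch below is only an illustration: slopes_by() is a hypothetical helper, and the toy panel is built without error terms (with made-up intercepts and slopes) so that the expected slopes are known exactly.

```r
# Hypothetical helper: OLS slope of y on x within each level of a grouping column
slopes_by <- function(df, group) {
  sapply(split(df, df[[group]]),
         function(d) unname(coef(lm(y ~ x, data = d))[2]))
}

# Toy noise-free panel built from the closed-form solutions (assumed values)
beta  <- 2    # slope within a time point
gamma <- -1   # slope within a case
toy <- expand.grid(case = 1:4, time = 1:5)
a_i <- c(0.5, -1, 2, 0.3)[toy$case]        # case intercepts
a_t <- c(1, 0, -0.7, 1.5, 0.2)[toy$time]   # time intercepts
toy$x <- (a_i - a_t) / (beta - gamma)
toy$y <- (beta * a_i - gamma * a_t) / (beta - gamma)

within_time <- slopes_by(toy, "time")   # one slope per time point
within_case <- slopes_by(toy, "case")   # one slope per case
```

Without error terms, every within-time slope equals beta and every within-case slope equals gamma. The same helper could be applied as slopes_by(twdata, "time") or slopes_by(twdata, "case"); with error terms present, the estimated slopes should be expected only to approximate the targets.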

The Scatterplot

[Figure: scatterplot of the simulated data]