X Wang - Assignment 1

PSCI 7108: Advanced Data Analysis III

1: Computing Environment

Install R and \( \LaTeX \).

2: Least Squares Regression

a: Using matrix notation, write a function that estimates the Least Squares coefficients and the \( \hat{\sigma^{2}} \) parameter given two inputs: a vector of dependent variables and a matrix of explanatory variables.

The matrix notation for Least Squares coefficients is: \( \hat{\beta}=(X'X)^{-1}X'\vec{y} \).

# A function that estimates Least Squares coefficients given X, a matrix of
# explanatory variables and y, a vector of dependent variables
betaCoeffs <- function(X, y) {
    return((solve(t(X) %*% X)) %*% (t(X) %*% y))
}

\( \hat{\sigma^{2}} \) is the variance of the residuals. Its matrix notation is: \( \hat{\sigma^{2}} = \large{\frac{\sum\epsilon^{2}}{n-k}} \), where \( \epsilon \) are the errors, \( n \) is the number of observations, and \( k \) is the number of explanatory variables. The denominator is the degrees of freedom.

# A function that estimates sigma-hat squared given X, y, and b, a vector of
# Least Squares coefficients
SigHatSq <- function(X, y, b) {
    return((sum((y - X %*% b)^2))/(nrow(X) - ncol(X)))
}


b: Using functions from (a), estimate the least squares coefficients and \( \hat{\sigma^{2}} \) parameter for the provided dataset. Include an intercept in the model as well. Report the results in the form of a \( \LaTeX \) table.

# Install and load libraries
install.packages("foreign")
## Error: trying to use CRAN without setting a mirror
library(foreign)
install.packages("micEcon")
## Error: trying to use CRAN without setting a mirror
library(micEcon)
## Loading required package: miscTools
install.packages("xtable")
## Error: trying to use CRAN without setting a mirror
library(xtable)

# Load and examine dataset
data1 <- read.dta("/Users/squishy/Dropbox/PSCI 7108/Assignment 1/assign1.dta")
str(data1)
## 'data.frame':    1000 obs. of  4 variables:
##  $ Y : num  6.82 16.24 25.76 38.62 10.95 ...
##  $ X1: num  0.886 3.609 0.862 4.969 -3.213 ...
##  $ X2: num  -10.42 -5.606 8.856 -1.961 0.879 ...
##  $ X3: num  5.28 5.11 5.06 10.09 5.59 ...
##  - attr(*, "datalabel")= chr ""
##  - attr(*, "time.stamp")= chr "19 Jan 2013 13:32"
##  - attr(*, "formats")= chr  "%10.0g" "%10.0g" "%10.0g" "%10.0g"
##  - attr(*, "types")= int  255 255 255 255
##  - attr(*, "val.labels")= chr  "" "" "" ""
##  - attr(*, "var.labels")= chr  "Y" "X1" "X2" "X3"
##  - attr(*, "version")= int 8

# Separate dependent and explanatory variables into matrices
Y.var <- as.matrix(data1[, 1])  # Dependent variable into vector
X.vars <- as.matrix(data1[, 2:4])  # 3 explanatory variables into matrix
X.vars <- insertCol(X.vars, 1, v = 1)  # Add a column of 1's for intercept B_0 

# Test functions for beta coefficients and sigma-hat squared
betas <- betaCoeffs(X.vars, Y.var)
sigmahat.sqrd <- SigHatSq(X.vars, Y.var, betas)
betas
##       [,1]
##    0.04335
## X1 1.98823
## X2 1.00338
## X3 3.00062
sigmahat.sqrd
## [1] 0.4542

# Check coefficients from betaCoeffs against those from lm function
data1_lm <- lm(data1$Y ~ data1$X1 + data1$X2 + data1$X3)
summary(data1_lm)  # lm estimates for betas match those above, lm estimate for sigma (residual standard error) is the square root of sigma-hat squared above  
## 
## Call:
## lm(formula = data1$Y ~ data1$X1 + data1$X2 + data1$X3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.1201 -0.4453 -0.0101  0.4420  2.0733 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.04335    0.04296    1.01     0.31    
## data1$X1     1.98823    0.00746  266.38   <2e-16 ***
## data1$X2     1.00338    0.00349  287.46   <2e-16 ***
## data1$X3     3.00062    0.00712  421.64   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.674 on 996 degrees of freedom
## Multiple R-squared:  0.997,  Adjusted R-squared:  0.997 
## F-statistic: 1.04e+05 on 3 and 996 DF,  p-value: <2e-16

# Print in LaTeX
betas.table <- xtable(betas)
print(betas.table, floating = F)
## % latex table generated in R 3.0.1 by xtable 1.7-1 package
## % Wed Sep 11 11:41:12 2013
## \begin{tabular}{rr}
##   \hline
##  & x \\ 
##   \hline
##  & 0.04 \\ 
##   X1 & 1.99 \\ 
##   X2 & 1.00 \\ 
##   X3 & 3.00 \\ 
##    \hline
## \end{tabular}

See \( \LaTeX \) document for results table.


c: Randomly draw 100 datasets with 50 observations each from the dataset in part (b) with replacement, and extract the mean and standard deviation for each of the four variables for each of the 100 datasets. How do they compare to the mean and standard deviation for the dataset overall? Report the results in the form of a \( \LaTeX \) table.

# Set variables and create matrix to save results
sample.obs <- 50  # Number of observations for each simulation
sims <- 100  # Number of simulations
obs <- nrow(data1)  # Counts total observations in dataset
sample.space <- 1:obs  # Creates the sample space from which sample() can draw from
results <- matrix(NA, nrow = sims, ncol = 8)  # Creates a results matrix in which to store the results of the simulations
colnames(results) <- c("Y Mean", "Y SD", "X1 Mean", "X1 SD", "X2 Mean", "X2 SD", 
    "X3 Mean", "X3 SD")  # Name columns of the matrix

# Set for loop
set.seed(23123)  # Sets random seed number
for (i in 1:sims) {
    random.draws <- sample(sample.space, sample.obs, replace = T)  # Stores 50 random numbers sample() draws from sample space of 1000, replaces draw each time
    subset <- subset(data1[random.draws, ])  # Subsets 50 observations from original dataset based on indices from random.draws
    results[i, 1] <- mean(subset[, 1])  # Saves mean of Y values from 50 observations into 'Y Mean'' column in results matrix
    results[i, 2] <- sd(subset[, 1])
    results[i, 3] <- mean(subset[, 2])
    results[i, 4] <- sd(subset[, 2])
    results[i, 5] <- mean(subset[, 3])
    results[i, 6] <- sd(subset[, 3])
    results[i, 7] <- mean(subset[, 4])
    results[i, 8] <- sd(subset[, 4])
}

# View simulations and find the mean
results
##        Y Mean   Y SD X1 Mean X1 SD  X2 Mean X2 SD X3 Mean X3 SD
##   [1,]  17.02 11.901  1.8615 2.492 -0.36936 6.573   4.536 3.034
##   [2,]  17.48 11.821  1.2866 3.222 -0.22769 6.045   4.991 3.050
##   [3,]  17.75 13.108  1.7072 3.064  0.72710 7.549   4.570 3.360
##   [4,]  19.27 10.986  1.7845 2.714  0.18357 6.370   5.146 3.067
##   [5,]  20.58 11.481  1.8131 3.371  0.14250 6.242   5.574 3.421
##   [6,]  20.77 12.544  1.8104 2.559  0.27961 5.997   5.601 3.085
##   [7,]  18.56 12.775  1.5416 2.801 -0.14716 6.600   5.196 2.703
##   [8,]  17.95 12.139  1.5876 2.965  1.47056 6.276   4.463 2.764
##   [9,]  14.85 12.409  0.6939 2.709 -0.37333 6.658   4.650 3.066
##  [10,]  15.22  9.854  0.6048 2.750 -0.38508 6.238   4.754 2.795
##  [11,]  17.24 11.960  1.3043 2.605 -0.91309 6.150   5.156 2.711
##  [12,]  17.44 10.743  1.6846 2.703  0.18954 4.888   4.600 3.079
##  [13,]  20.09 13.520  1.6476 2.919  0.21748 6.982   5.512 3.346
##  [14,]  16.58 10.820  1.7552 2.662 -1.22052 6.769   4.812 2.628
##  [15,]  21.13 11.432  1.4397 2.878  0.77817 6.005   5.832 3.034
##  [16,]  17.33  9.372  1.6418 2.975 -0.18790 5.788   4.742 3.531
##  [17,]  18.48 11.733  1.4698 3.333  0.58761 5.790   4.971 2.581
##  [18,]  21.05 12.346  1.9446 2.848  0.13384 6.107   5.695 3.172
##  [19,]  16.66 11.280  1.7448 2.927 -0.22726 5.914   4.430 2.655
##  [20,]  16.98 12.175  0.8356 3.077  0.20915 5.888   5.062 3.129
##  [21,]  20.99 13.215  1.4460 3.332  1.64696 5.733   5.461 3.405
##  [22,]  15.96  9.487  1.8300 3.189 -2.02406 5.573   4.768 2.550
##  [23,]  19.22 12.094  1.2012 2.743  0.22667 6.722   5.553 2.932
##  [24,]  15.52 11.126  1.0662 3.084 -0.60252 5.903   4.619 3.071
##  [25,]  17.91 13.529  0.9095 3.209 -0.22583 6.752   5.396 3.676
##  [26,]  18.54 11.756  1.9732 2.693 -1.16355 5.796   5.242 2.959
##  [27,]  16.59 11.527  1.4296 2.820 -1.30590 7.857   5.005 2.731
##  [28,]  20.28 13.820  1.9902 2.638 -0.41422 6.949   5.619 3.560
##  [29,]  16.40 13.593  0.8175 2.656 -0.71239 7.519   5.080 3.472
##  [30,]  17.47 10.845  1.7489 3.236  0.03586 6.816   4.609 3.033
##  [31,]  16.51 10.187  0.5944 2.320  0.09657 5.634   5.117 2.973
##  [32,]  16.45 12.062  1.3976 2.730 -1.39116 4.943   5.012 2.880
##  [33,]  17.85 10.763  1.6828 3.028  0.28990 5.062   4.728 2.478
##  [34,]  14.36  9.820  0.8604 2.616 -0.06828 4.815   4.230 2.515
##  [35,]  19.71  9.879  1.8419 3.013 -0.58851 5.742   5.528 2.600
##  [36,]  20.88 13.144  0.8034 2.936  2.19161 7.367   5.690 3.199
##  [37,]  18.31 10.204  1.9359 2.522  0.55849 5.378   4.635 2.935
##  [38,]  16.44 10.187  1.6611 3.020 -1.26196 5.048   4.759 2.811
##  [39,]  19.77 12.346  1.7400 2.411  0.76337 7.182   5.160 2.997
##  [40,]  14.47  9.558  1.2429 3.053 -1.98218 5.519   4.606 3.311
##  [41,]  18.44 10.053  1.1237 3.257 -0.52593 5.188   5.607 3.117
##  [42,]  19.08 11.715  0.7108 2.362  1.48267 5.870   5.342 2.871
##  [43,]  16.81 10.916  0.9511 2.434  0.61137 7.173   4.700 3.264
##  [44,]  19.64 11.864  1.7095 3.228 -0.18880 6.434   5.487 2.801
##  [45,]  22.00 13.270  1.8013 2.820 -0.20516 6.078   6.178 3.897
##  [46,]  20.33 11.056  1.9588 3.033 -0.42227 6.577   5.641 2.990
##  [47,]  20.17 11.890  1.2663 2.608  1.17607 6.399   5.482 2.998
##  [48,]  17.89 12.021  1.4650 2.703  0.33477 6.237   4.903 3.064
##  [49,]  17.07  9.739  1.4552 3.193 -0.37086 6.202   4.845 3.059
##  [50,]  20.30 11.214  1.5431 2.931 -0.81264 5.719   6.025 3.232
##  [51,]  17.12 13.491  1.1658 3.491 -0.78745 6.244   5.178 3.008
##  [52,]  20.21 10.340  1.9956 3.182 -0.78463 5.630   5.704 3.083
##  [53,]  19.37 11.027  1.1971 2.740  1.08740 6.116   5.249 2.746
##  [54,]  18.77 11.978  1.5581 3.062 -0.25144 5.642   5.300 3.163
##  [55,]  18.20 12.787  1.8424 3.051 -1.11257 6.114   5.236 2.761
##  [56,]  15.87 12.052  1.5562 2.693 -0.99649 5.980   4.592 2.772
##  [57,]  15.68  9.408  1.0464 2.933 -1.33600 5.946   4.992 2.519
##  [58,]  17.53 13.185  1.1767 3.101 -0.22347 6.299   5.104 2.949
##  [59,]  19.87 10.920  1.6988 3.007  1.20605 5.472   5.112 3.173
##  [60,]  18.45 12.983  1.7802 3.388  0.03161 5.565   4.950 3.270
##  [61,]  17.90 13.469  0.7492 2.657 -0.80450 6.306   5.719 3.510
##  [62,]  19.56 11.468  1.6709 3.104  1.10222 5.293   5.066 3.285
##  [63,]  15.31 11.045  0.8664 2.692 -0.79822 5.855   4.733 2.994
##  [64,]  20.05 12.217  1.1946 2.675  1.19247 6.915   5.464 2.928
##  [65,]  18.49 10.651  1.7479 2.729  0.37259 5.839   4.878 2.382
##  [66,]  17.87 13.263  1.6356 2.705 -1.81644 6.856   5.509 3.330
##  [67,]  16.17 14.138  0.9683 3.057 -0.12267 7.077   4.776 3.391
##  [68,]  18.61 13.998  1.7668 3.049 -0.04324 6.410   5.029 3.331
##  [69,]  20.97 13.534  1.9160 2.826 -0.38200 6.706   5.873 2.882
##  [70,]  16.60 12.389  1.4971 2.825  0.23691 6.335   4.432 3.280
##  [71,]  18.01 11.320  1.3549 3.089  0.84652 5.057   4.803 2.918
##  [72,]  19.80 10.774  2.0011 3.136 -0.14228 5.466   5.347 3.167
##  [73,]  19.05  8.524  1.5259 2.334 -0.26725 6.580   5.432 2.682
##  [74,]  17.41 11.656  1.3538 2.422 -0.31380 6.319   5.019 3.320
##  [75,]  17.42 13.237  1.7837 2.920 -0.35611 7.346   4.786 3.349
##  [76,]  20.97 14.383  1.5866 3.182  0.71396 6.201   5.683 3.035
##  [77,]  18.44 12.124  1.5824 2.544 -1.27225 5.925   5.440 2.922
##  [78,]  19.30 10.436  1.5555 2.730  1.00065 6.026   5.099 2.529
##  [79,]  18.54 14.151  1.6347 3.144 -0.32224 7.140   5.158 2.977
##  [80,]  16.03 10.861  1.1623 2.765  0.36574 5.948   4.464 2.914
##  [81,]  20.15 13.326  1.2275 3.122  0.44419 6.115   5.758 3.385
##  [82,]  19.11 12.238  1.3362 2.809  1.03699 6.666   5.160 3.045
##  [83,]  16.65 10.448  1.3724 2.609 -0.66671 5.670   4.823 3.260
##  [84,]  16.46 11.769  1.8061 2.946 -0.05209 6.232   4.319 3.051
##  [85,]  19.47 12.902  1.9352 2.748  0.39740 6.019   5.051 3.698
##  [86,]  19.04 10.918  1.1786 2.877  0.79153 6.385   5.312 3.360
##  [87,]  17.11 11.219  1.0956 2.701 -0.44443 6.570   5.140 3.339
##  [88,]  18.39 12.479  2.5083 2.990 -1.84014 5.744   5.030 3.153
##  [89,]  17.30 11.706  1.3355 3.159 -0.01876 5.219   4.849 3.036
##  [90,]  17.99 12.744  2.0388 3.154 -0.10282 6.369   4.684 2.927
##  [91,]  17.32  9.753  1.9726 2.428 -1.98578 6.312   5.110 2.261
##  [92,]  15.20 12.492  0.9629 2.418 -0.75902 5.287   4.706 3.535
##  [93,]  18.87 12.777  1.3912 2.660  0.04582 6.127   5.324 2.902
##  [94,]  17.35  9.302  1.2882 2.894  1.01190 6.237   4.559 2.418
##  [95,]  18.51 12.735  1.3942 3.098 -0.59210 6.030   5.429 2.714
##  [96,]  19.65 10.880  1.8229 2.460 -0.15868 6.663   5.343 2.781
##  [97,]  17.13 10.935  1.8492 3.267 -0.12092 6.395   4.464 2.372
##  [98,]  16.27 13.151  1.7720 2.919  0.83611 5.589   3.986 2.993
##  [99,]  17.87 12.131  1.6688 2.623 -0.31664 5.768   4.988 2.923
## [100,]  17.62 12.606  0.7154 3.068 -0.42315 5.292   5.485 3.235
apply(results, 2, mean)  # Finds mean of results by column
##  Y Mean    Y SD X1 Mean   X1 SD X2 Mean   X2 SD X3 Mean   X3 SD 
## 18.1285 11.7560  1.4756  2.8757 -0.1088  6.1431  5.0897  3.0255

# Find mean and standard of dataset
y.mean <- mean(data1[, 1])
y.sd <- sd(data1[, 1])
x1.mean <- mean(data1[, 2])
x1.sd <- sd(data1[, 2])
x2.mean <- mean(data1[, 3])
x2.sd <- sd(data1[, 3])
x3.mean <- mean(data1[, 4])
x3.sd <- sd(data1[, 4])
y.mean
## [1] 17.92
y.sd
## [1] 11.91
x1.mean
## [1] 1.49
x1.sd
## [1] 2.857
x2.mean
## [1] -0.07935
x2.sd
## [1] 6.133
x3.mean
## [1] 4.995
x3.sd
## [1] 3.008

See \( \LaTeX \) document for results table and comparison.