Econometrics

Econometrics is a branch of economics focused on the use of mathematics to answer economic questions. Econometrics mostly uses statistics as a tool to tackle the question at hand. For example, econometrics would be used to estimate the demand for a certain good. A linear regression is used to estimate the benefits and costs of education. Linear regressions are also used in finance to estimate the risk of an asset given certain explanatory variables (\(X\)’s). Some math concepts taught in econometrics are: statistics (linear regression, time series and cross-section estimation, panel data, testing how well the model explains the \(Y\) variable), calculus (derivatives, maximization or minimization, finding zeros), probability, and linear algebra. Luckily, today computer software like R, STATA, and Python makes learning econometrics much easier.

The basic way econometrics is used in analysis is: get data, identify the dependent and independent variables in the data, create a model, test how well the model explains the data, and explain the findings. For example, consider tracking sales for five stores of a corporation. The dependent variable \(Y\) would be sales. The independent variables \(X_1, X_2,\) etc. would be location, marketing campaign, number of customers through the years, cost, etc. With data on the dependent and independent variables, a model can be created to predict future sales. The model would also explain how much the independent variables affect sales at the five stores.

Notations
Generally speaking, capital letters represent the population and lower case letters represent the sample. For example, \(N\) refers to the population size and \(n\) refers to the sample size.
1. \(\alpha\) Significance level in hypothesis test. Acceptable probability of a Type I error. (1- \(\alpha\)) = confidence level.
2. \(\beta\) (in hypothesis test) the acceptable probability of Type II error. (1-\(\beta\)) is called the power of the test.
3. \(\mu\) Mean of a population.
4. \(\nu\), \(df\) Degrees of freedom.
5. \(\rho\) Linear correlation coefficient of a population.
6. \(\sigma\) Standard deviation of a population.
7. \(\sigma_{\bar{x}}\) Standard error of the mean.
8. \(\sigma^2\) The variance of the population.
9. \(\bar{x}\) Sample mean
10. \(X_i\) The population independent variable.
11. \(x_i\) The sample independent variable.
12. \(x_k\) The \(k\)-th independent variable.
13. \(\hat{y}\) The predicted (fitted) value of \(y\) from the sample.
14. \(N\) Population size, the number of elements in the population.
15. \(\beta_0\) The intercept constant in a regression line.
16. \(\beta_1\) The regression coefficient, slope, of a regression line.
17. \(n\) Sample size
18. \(k\) Number of \(x\) variables (regressors)

Statistics is commonly used in social science because direction and magnitude are important. Social scientists want to find out where a trend is going and how fast. For example, is GDP per capita increasing or decreasing? How fast it is increasing or decreasing is also important. Another example would be the effects of sugar on the brain: one group of volunteers would be given sugar and another would not. Time would be allowed for the sugar to kick in, and the volunteers would be given a set of tasks. The time it takes to complete the tasks would then be compared between the volunteers who took sugar and those who did not.

There are different types of data and different ways of analyzing the data. The three main type of data in econometrics are:

  1. Cross-Sectional Data: each observation is a new individual, firm, person, etc., with information at a point in time. For example, observing five individuals at a specific time. The observations need to be random samples.
  2. Time Series: the data have a separate observation for each time period.
  3. Panel Data: follows the same randomly sampled individuals over time; it is a mixture of cross-sectional and time series data.

The question of causality: one major task in econometrics is to establish causal effects. For example, does having an education make a person more productive? Remember, “correlation does not imply causation.”

Mathematics use in Econometrics

Econometrics uses mathematics to answer economic questions. Because of this, econometrics uses different types of mathematics: for example calculus, statistics (even though statistics is not a branch of mathematics), probability, linear algebra, etc. Econometrics mostly uses statistics, but it is not only statistics.

Summation Operator

The summation operator is widely used in econometrics. For example, the sample average is \(\bar{x} = \frac{1}{n}\sum_{i=1}^nx_i\). The deviation from the average is \(x_i - \bar{x}\), and the sum of deviations from the average, \(\sum_{i=1}^n(x_i - \bar{x})\), is always equal to 0.

The sum of squared deviations, \(\sum_{i=1}^n(x_i - \bar{x})^2\), is not equal to 0 (unless all observations are identical).

The covariance uses the sum \(\sum_{i=1}^n(x_i - \bar{x})(y_i - \bar{y}) = \sum_{i=1}^nx_iy_i-n\bar{x}\bar{y}\). The R function is cov().
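A quick sketch of these identities in R, using made-up numbers purely for illustration:

# Check the summation identities with made-up data
x = c(2, 4, 6, 8, 10)
y = c(1, 3, 2, 5, 4)
sum(x - mean(x))                            # sum of deviations from the average, equals 0
sum((x - mean(x))^2)                        # sum of squared deviations, not 0
sum((x - mean(x)) * (y - mean(y)))          # covariance sum
sum(x * y) - length(x) * mean(x) * mean(y)  # same number, using the identity above
cov(x, y)                                   # R divides the covariance sum by n - 1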

Linear functions

Linear functions play an important role in econometrics because they are simple to interpret and manipulate. If \(x\) and \(y\) are two variables, we say that \(y\) is a linear function of \(x\) when \(y = \beta_0 + \beta_1x\), where \(\beta_0\) and \(\beta_1\) are two parameters that describe this relationship (\(\Delta{y} = \beta_1\Delta{x}\)). Linear functions are convenient because we can impose ceteris paribus conditions and take partial derivatives much more easily. \(\beta_1\) is also called the partial effect of \(x_1\) on \(y\), because the partial effect involves holding other factors fixed.

Proportions and Percentage

Proportional change = \(\frac{x_1 - x_0}{x_0}\) or \(\frac{\Delta x}{x_0}\). The percentage change is the proportionate change in a variable multiplied by 100. A percentage point change is the change in a variable that is itself measured as a percentage.

Statistics

Statistics is the major tool used in econometrics.

A random variable is one that takes on numerical values and has an outcome that is determined by an experiment. For example, a coin flip with values 0 for heads and 1 for tails is a random variable whose outcome is determined by an experiment.

Expectation: the expected value. If \(X\) is a random variable, the expected value or expectation of \(X\), denoted \(E(X)\), \(\mu_X\), or \(\mu\), is a weighted average of all possible values of \(X\). The expected value is sometimes called the population mean, the population average. For example, the average height of a population is an expectation.
Properties of expectations:

  1. \(E(c) = c\) for any constant \(c\)
  2. For constants \(a\) and \(b\): \(E(aX + b) = aE(X) + b\)
  3. If \(k\) is a constant, \(E(kX) = kE(X)\)
  4. If \(X\) and \(Y\) are two random variables, \(E(X+Y) = E(X) + E(Y)\)
  5. If \(X\) and \(Y\) are two independent random variables, \(E(XY) = E(X)E(Y)\)

Variance
Variance measures how far \(X\) is from its expected value: \(Var(X) = E[(X - \mu)^2]\). In other words, it measures how far \(X\) tends to be from the average.
Properties of variance.

  1. \(Var(c) = 0\)
  2. \(Var(aX + b) = a^2Var(X)\), for any constants \(a\) and \(b\).
  3. \(Var(aX+bY) = a^2Var(X) + b^2Var(Y) + 2abCov(X,Y)\), for any constant \(a\) and \(b\).

The standard deviation is the square root of the variance. The standard deviation is related to the z-score.

Sample Variance:
\[s^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2\]

Population Variance:
\[\sigma^2 = \frac{1}{N} \sum_{i=1}^N (x_{i} - \mu)^2\]

# Sample variance of eruption durations from the built-in faithful dataset
duration = faithful$eruptions
var(duration)
## [1] 1.302728

The variance tells how spread out the data points are: it is the average squared distance of each data point from the mean. \(N\) is the population size and \(x_i\) are the data points.

Covariance
If the covariance is positive, the two variables tend to move in the same direction: when one is large, the other is likely to be large as well.
Properties of covariance:

  1. If \(X\) and \(Y\) are independent, then \(Cov(X, Y) = 0\)
  2. The magnitude of the covariance does not tell you very much; the sign is what matters.
  3. \(Cov(aX, bY) = abCov(X,Y)\)

Correlation

\(Corr(x,y) = \frac{Cov(x,y)}{sd(x)sd(y)}\)
Covariance and correlation measure the linear relationship between two random variables symmetrically.
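A small check of this formula in R (made-up numbers again):

# Correlation equals covariance divided by the product of standard deviations
x = c(2, 4, 6, 8, 10)
y = c(1, 3, 2, 5, 4)
cov(x, y) / (sd(x) * sd(y))  # manual correlation
cor(x, y)                    # same value from the built-in function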

Conditional Expectation
The expectation of \(Y\) given that \(X\) takes a specific value, written \(E(Y|x)\). A regression model is trying to explain the conditional expectation: we can compute the expected value of \(Y\) given that we know the outcome of \(X\), and as \(x\) changes the expectation of \(y\) changes. Conditional expectation is used to discuss causality.

Properties of expectations:

  1. \(E(c(X)|X) = c(X)\)
  2. For functions \(a(X)\) and \(b(X)\): \(E(a(X)Y + b(X)|X) = a(X)E(Y|X) + b(X)\)
  3. If \(X\) and \(Y\) are independent, then \(E(Y|X) = E(Y)\)
  4. \(E(E(Y|X)) = E(Y)\). This is the law of iterated expectations.
  5. If \(E(Y|X) = 0\), then \(Cov(X,Y) = 0\); in fact, every function of \(X\) is uncorrelated with \(Y\).

Conditional Variance: \(Var(Y|X = x) = E(Y^2|X = x) - [E(Y|X = x)]^2\)
It plays an important role in econometrics. For example, as income increases, the variability of savings increases.

Population
Population, parameters, and random sampling are all part of econometric analysis. Econometrics involves learning something about a population from a sample of that population. The first step is to identify the population and specify a model for the population relationship of interest. Such models involve probability distributions or features of probability distributions.

Properties

  1. Unbiasedness: the expectation of the estimator, \(E(W)\), equals the parameter it is estimating. If an estimator is unbiased, then its probability distribution has an expected value equal to the parameter it is supposed to be estimating. For example, the expectation of a die throw is 3.5, the average. \(E(\bar{Y}) = \mu\), where \(\bar{Y} =\frac1n \sum_{i=1}^{n}Y_i\) is the sample average viewed as an estimator.
    For evaluating estimation procedures, we study various properties of the probability distribution of the random variable W. The distribution of an estimator is often called its sampling distribution.
  2. The sampling variance of estimators.
    \(Var(W) = \sigma^2\)
    The second criterion is efficiency: the estimator with the smaller variance. The distribution’s bell curve would be flatter if the estimator is not efficient, i.e., if it has a bigger variance.

Simple Regression Model

The formula for a simple regression model is:
\[Y_i = \beta_0 + \beta_{1}X_i + u_i\]
The R command is the lm() function.

\(\beta_0\): the intercept of the regression model. In a simple regression it is the average \(Y\) when \(X\) = 0.
\(E(Y|X) = \beta_{0} + \beta_{1}X\): this shows the linear relation between \(X\) and \(Y\).
\(u_i\) is the error term.

Least squares and the method of moments give the same estimates in this model.
Know the equation for the residual \(\hat{u}\).
The method of moments is an alternative way of deriving the least squares estimator.
OLS regression line:
\[\hat{y} = \hat{\beta_{0}} + \hat{\beta_{1}}x\] The sum of deviations from the mean is equal to 0:
\[\sum_{i=1}^n (y_i-\bar{y})=0\] Covariance sum:
\[\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})\] Least squares: \[\sum_{i=1}^n (\hat{u}_i)^2 = \sum_{i=1}^n(y_i - \hat{\beta_0}-\hat{\beta_1}x_i)^2\] This is how \(\hat{\beta}_1\) gets estimated: the numerator is the covariance of \(x\) and \(y\) and the denominator is the variance of \(x\). The variance of \(x\) cannot be negative, and it cannot be zero. If the correlation is positive the covariance is positive, and likewise for negative.
\[\hat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2}\] One important point is that the denominator requires some variation in \(x\); it cannot be zero.
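These formulas can be checked directly in R; a small sketch with made-up data (the numbers are just for illustration), compared against lm():

# Estimate beta1 hat and beta0 hat by hand and compare with lm()
x = c(14, 16, 18, 19, 21, 22, 24)
y = c(679, 639, 669, 659, 629, 659, 634)
b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # covariance over variance of x
b0 = mean(y) - b1 * mean(x)                                     # intercept from the sample means
c(b0, b1)
coef(lm(y ~ x))  # same estimates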

OLS regression line also called sample regression function SRF:
\[\hat{y} = \hat{\beta_0} + \hat{\beta_1}x\]

Total Sum of Squares (SST)
\[\sum_{i=1}^n(y_i-\bar{y})^2\]

Explained Sum of Squares (SSE)
\[\sum_{i=1}^n(\hat{y}_i-\bar{y})^2\]

Residual Sum of Squares (SSR)
\[\sum_{i=1}^n\hat{u}_i^2\]
which is the same as
\[\sum_{i=1}^n(y_i-\hat{y}_i)^2\]
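A short sketch in R that computes SST, SSE, and SSR for a fitted model and checks that \(SST = SSE + SSR\) and \(R^2 = SSE/SST\) (made-up data):

# Decompose the total variation of y for a simple regression
x = c(14, 16, 18, 19, 21, 22, 24)
y = c(679, 639, 669, 659, 629, 659, 634)
fit = lm(y ~ x)
SST = sum((y - mean(y))^2)            # total sum of squares
SSE = sum((fitted(fit) - mean(y))^2)  # explained sum of squares
SSR = sum(residuals(fit)^2)           # residual sum of squares
c(SST, SSE + SSR)                     # SST = SSE + SSR
c(SSE / SST, summary(fit)$r.squared)  # R-squared two ways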

R Code

This R code creates a simple linear regression.

# Set up the data
STR = c(14, 16, 18, 19, 21, 22, 24)
TestScores = c(679, 639, 669, 659, 629, 659, 634)
linear_model = lm(TestScores ~ STR) # The lm is what creates the OLS regression.  

Plot the residuals

 plot(linear_model$fitted.values, # Here you put the model name
    linear_model$residuals, type = "p", 
     ylab="Residuals", xlab="Fitted Value", 
     main="Residuals vs Fitted Value") 
 abline(0, 0)

Plot the simple regression model.

# plot the data
plot(TestScores ~ STR,
     main = "Scatterplot of TestScore and STR", 
     xlab = "STR (X)",
     ylab = "Test Score (Y)",
     xlim = c(10, 30),
     ylim = c(600, 720))
abline(linear_model, col = "red") # Adds a line in the plot

par(mfrow = c(2,2)) # Sets 2x2 view of charts
plot(linear_model)

Multiple Regression Model

Hypothesis Testing

Hypothesis Testing is based on the idea that we can learn about a population from a sample taken from it.

Null hypothesis: a statement of the status quo. The null hypothesis refers to a specified value of the population parameter, not a sample statistic. The null hypothesis is denoted \(H_0\).

Type I error: the error of rejecting a null hypothesis when it is actually true. In other words, this is the error of accepting an alternative hypothesis (the real hypothesis of interest) when the results can be attributed to chance. The probability of committing a Type I error is called the significance level and is denoted by \(\alpha\).

Alternative hypothesis: one in which some difference or effect is expected; it is what you are trying to prove.
If the null hypothesis is true but we conclude that it is false, we commit a Type I error; if the null hypothesis is false but we conclude that it is true, we commit a Type II error. The probabilities of Type I and Type II errors are negatively related: lowering one raises the other. The probability of a Type I error is the significance level.

Type II error (false negative): the error of not rejecting a null hypothesis when the alternative hypothesis is the true state of nature.
The null hypothesis \(H_0: \beta_2 = 0\) means that once education and tenure have been accounted for, the number of years in the workforce (experience) has no effect on hourly wage.

\(\alpha\) is the significance level. \(\alpha\) is the probability that you reject the null hypothesis even when it is true; you have to allow some probability of making a mistake.

Linear Combination Testing
\[t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}\]
This is used to test \(H_0: \beta_1 = \beta_2\). The problem with the equation above is that we do not get the standard error of \(\hat{\beta}_1 - \hat{\beta}_2\) directly from the usual regression output.
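One practical way to carry out a linear-combination test in R is the linearHypothesis() function from the car package. The sketch below assumes a previously fitted model named wage.lm with regressors educ and exper (both the model and the variable names are hypothetical here):

library(car)  # linearHypothesis()
# Test H0: the coefficients on educ and exper are equal
# wage.lm is a hypothetical, previously fitted lm() model
linearHypothesis(wage.lm, "educ = exper")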

The hard part is obtaining the coefficient estimates, the standard errors, and the critical values.
The statistic used to test \(H_0: \beta_{3} = 0\) against any alternative is called the t statistic or t ratio; \(t_{educ}\) would be the t statistic for \(\hat{\beta}_{educ}\).
A sample value of \(\hat{\beta}_{3}\) very far from zero provides evidence against \(H_0: \beta_{3} = 0\).

Significance level: the probability of rejecting \(H_0\) when it is in fact true.
When the p-value is smaller than \(\alpha\), you reject the null hypothesis.

In order to reject \(H_0: \beta_j = 0\) in favor of \(H_1:\beta_j>0\), we need a sufficiently large positive t statistic; negative values of \(t_{\hat{\beta}_j}\) provide no evidence in favor of \(H_1\).

Small p-values are evidence against the null; large p-values provide little evidence against the null.
The economic significance or practical significance of a variable is related to the size and sign of \(\hat{\beta}_j\). Using a smaller significance level means that economic and statistical significance are more likely to coincide.

When there is more than one restriction, you cannot use a t-test; the F-test is used when there is more than one restriction.
\[log(wage) = \beta_0 + \beta_{1}educ + \beta_{2}exper + \beta_{3}tenure + u\]

The null hypothesis \(H_0:\beta_2 = 0\) means that, once education and tenure have been accounted for, the number of years in the workforce (\(exper\)) has no effect on hourly wage. If it is true, it implies that a person’s work history prior to the current employment does not affect wage. If \(\beta_2>0\), then prior work experience contributes to productivity, and hence to wage.

Gauss-Markov assumption

  1. Linearity: the model must be linear in parameters.
  2. Random sampling: the data must be randomly sampled from the population.
  3. Exogeneity: the regressors are not correlated with the error term.
  4. Non-collinearity: the regressors are not perfectly correlated with each other.
  5. Homoscedasticity: the variance of the error term is constant.

Without the homoscedasticity assumption the model still works, but the errors are heteroscedastic. There are ways of dealing with it.

  1. What is the formula for robust standard errors in the simple regression model? In the multiple regression model? Notice that the robust standard error estimator is only consistent, in the sense that it is only valid when the sample size is large enough.

  2. The robust F test can be computed using the LM test. The steps are:

  1. Run OLS on the restricted model and save the residuals, \(\hat{u}\).
  2. Regress each of the excluded variables on all of the included variables and save each set of residuals.
  3. Regress a variable defined to be equal to 1 on the products of \(\hat{u}\) with each set of residuals from step 2, without an intercept.
  4. The LM statistic is \(n - SSR_1\), where \(SSR_1\) is the sum of squared residuals from this final regression.

One important part of Chapter 8 is the homoscedasticity assumption. Without this assumption, the estimator will still be unbiased and consistent as long as the first four Gauss-Markov assumptions hold, but the usual inference methods will all fail.

\(Var(u|x_1, x_2, x_3) = \sigma^2\). This must be constant for homoskedasticity to hold. You use the BP test or the White test to check whether this is true.

When there is heteroskedasticity we must use robust standard errors. If possible, avoid a model with heteroskedasticity; use another model if you need the usual standard error tests. The heteroskedasticity-robust variance estimator for the slope in simple regression is
\(\frac{\sum_{i=1}^n (x_i - \bar{x})^2\hat{u}_{i}^2}{\left[\sum_{i=1}^n(x_i - \bar{x})^2\right]^2}\)
If OLS is the BLUE estimator, then the robust version is not the best.

R Code

Multiple Variable Regression Model

# Set up the data
STR = c(14, 16, 18, 19, 21, 22, 24)
TestScores = c(679, 639, 669, 659, 629, 659, 634)
GPA = c(3.7, 2.9, 2.0, 3.5, 3.0, 2.8, 3.2)
model = lm(TestScores ~ STR + GPA)

Results of the model

summary(model) # Results of the model
## 
## Call:
## lm(formula = TestScores ~ STR + GPA)
## 
## Residuals:
##       1       2       3       4       5       6       7 
##  11.560 -23.869  10.476   7.075 -17.673  15.067  -2.637 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  720.553     65.042  11.078 0.000378 ***
## STR           -3.194      2.227  -1.434 0.224817    
## GPA           -2.270     14.056  -0.162 0.879512    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 18.77 on 4 degrees of freedom
## Multiple R-squared:  0.3401, Adjusted R-squared:  0.0101 
## F-statistic: 1.031 on 2 and 4 DF,  p-value: 0.4355

Plotting in Three Dimensions

library(plotly)  # plot_ly() comes from the plotly package
# plot the data
p <- plot_ly(x = ~GPA, y = ~STR, z = ~TestScores, type = 'scatter3d', mode = 'markers')
p

Different Types of Models

When comparing a model with a logged dependent variable to one without, you cannot compare \(R^2\) or adjusted \(R^2\). The reason is that the \(Y\) variable must be in the same units for the comparison to be meaningful.

Models with interactions: to find the change in \(y\) given a change in \(x_1\), take the derivative of the formula and plug in the average of \(x_2\) for \(x_2\).

Models with Quadratics

Quadratic functions are also used to capture decreasing or increasing marginal effects.
When the coefficient on \(x\) is positive and the coefficient on \(x^2\) is negative, the quadratic has a parabolic shape. There is always a positive value of \(x\) where the effect of \(x\) on \(y\) is zero.
The turning point (the maximum of the function) is always achieved at the coefficient on \(x\) over twice the absolute value of the coefficient on \(x^2\). Quadratic functions can also be used to examine elasticities.
\[y = \beta_0 + \beta_1x + \beta_2x^2 + u\]

Models with quadratic \(x\) variables: to find the min or max of a quadratic equation, take the first derivative and set it to zero. Whether you find a minimum or a maximum is determined by the sign of \(\beta_2\), the coefficient on \(x^2\) (negative gives a maximum, positive gives a minimum).

Quadratic covariates: if \(\beta_1\) is positive and \(\beta_2\) is negative, the curve is concave; if \(\beta_1\) is negative and \(\beta_2\) is positive, the curve is convex. Then use the derivative formula below to find the max or min.

In the model above, we cannot interpret \(\beta_1\) alone as measuring the change in \(y\) when \(x_1\) changes; we need to take \(\beta_2\) into account.

\[ \frac{\Delta{\hat{y}}}{\Delta{\hat{x_1}}}= \hat{\beta_1}+2\hat{\beta_2}x_1\]
In the case where \(\beta_1\) is positive and \(\beta_2\) is negative, we can find the value of \(x_1\) at which the marginal effect of \(x_1\) on \(\hat{y}\) is zero (the turning point).
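A sketch of a quadratic regression in R using the built-in cars data (purely illustrative), including the marginal effect and the turning point:

# Quadratic model: allow the effect of speed on dist to change with speed
quad = lm(dist ~ speed + I(speed^2), data = cars)
b = coef(quad)
b["speed"] + 2 * b["I(speed^2)"] * 15  # marginal effect of speed evaluated at speed = 15
-b["speed"] / (2 * b["I(speed^2)"])    # turning point (a max or min depending on the sign of the squared term)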

Natural Logarithm

The difference of logs, \(\log(x_1) - \log(x_0)\), approximates the proportional change. When the difference between \(x_0\) and \(x_1\) is not too large the approximation works very well; when the change is big it does not work well. The elasticity of \(y\) with respect to \(x\) is the percentage change in \(y\) when \(x\) increases by 1%. Elasticity is not constant along a linear demand curve. To create a constant elasticity, use the log-log model:
\[log(y) = \beta_0 + \beta_1log(x)\]
Natural log functions often arise in empirical work. For the level-log model
\[y = \beta_0 + \beta_1log(x)\]
divide \(\beta_1\) by 100 to get the effect of a 1% increase in \(x\). For the log-level model
\[log(y) = \beta_0 + \beta_1x\] multiply \(\beta_1\) by 100 to get the approximate percentage change in \(y\).

\(log(price) = \beta_0 + \beta_1log(nox) + \beta_2rooms + u\). Recall that throughout the text \(log(x)\) is the natural log of \(x\). The coefficient \(\beta_1\) is the elasticity of price with respect to nox (pollution). The coefficient \(\beta_2\) is the change in \(log(price)\) when \(\Delta rooms = 1\); as we have seen many times, when multiplied by 100, this is the approximate percentage change in price. Recall that \(100 \cdot \beta_2\) is sometimes called the semi-elasticity of price with respect to rooms.
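A minimal constant-elasticity (log-log) sketch in R with the built-in cars data; the slope is read as an elasticity (this is only to illustrate the interpretation, not a serious model):

# Log-log model: the coefficient on log(speed) is the elasticity of dist with respect to speed
loglog = lm(log(dist) ~ log(speed), data = cars)
coef(loglog)["log(speed)"]  # a 1% increase in speed is associated with roughly this % change in dist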

Logistic Regression

Logistic regression is used to predict whether something is true or false (the top of the S curve corresponds to high probability and the bottom of the S curve to low probability). The dependent variable is categorical, i.e. 0/1 or yes/no. Another question it answers is whether there is a relationship between the independent variables and the categorical dependent variable.

The model gives the probability of the outcome; the observed outcome itself is 0 or 1.
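A small logistic regression sketch in R with the built-in mtcars data, where the 0/1 variable am (transmission type) stands in for any yes/no outcome:

# Logistic regression: model the probability that am = 1 as a function of mpg
logit = glm(am ~ mpg, data = mtcars, family = binomial)
summary(logit)
predict(logit, type = "response")[1:5]  # predicted probabilities, between 0 and 1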

Dummy Variables

Categorical data stands for things like: male or female, white or black, new customer or old customer, old or new, etc.

Dummy variables on the right-hand and left-hand side: a dummy variable on the right-hand side is one of your \(x\)’s. A dummy variable that is based on people’s decisions is something to question.

A dummy variable is a variable that takes the value 0 or 1 to take a categorical effect into account in the model. It separates groups in the model. When creating groups you need to set up a base group; the base group takes the value of 0 in the 0/1 dummy notation. Dummy variables can have interactions with one another, and dummy variables and continuous variables can also interact; this is done by multiplying the two variables, as in the sketch below.
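A sketch of a dummy variable and a dummy-continuous interaction in R, again with the built-in mtcars data (am is the 0/1 dummy, wt is continuous; factor() makes the base group explicit):

# Dummy variable and its interaction with a continuous variable
dummy.lm = lm(mpg ~ wt * factor(am), data = mtcars)  # expands to wt, the am dummy, and their interaction
summary(dummy.lm)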

Testing the Models

The overall significance test tells you whether your model has any explanatory power at all. In general we can test any set of linear restrictions.

Gauss-Markov assumption

  1. True model is linear in parameters.
  2. Random sampling.
  3. \(x\) is uncorrelated with \(u\).
  4. No linear dependence of regressors.
  5. Homoskedasticity and no autocorrelation in the residuals. Homoskedasticity means the variance of the error term is the same across all values of the independent variables (\(x\)’s).

\(R^2\)

\(R^2\) is between 0 and 1.
A high \(R^2\) means the model explains much of the variation in the \(Y\) variable and is a good in-sample predictor of the relationship between the \(x\) variables and \(y\). When comparing two models with the same dependent variable, the one with the higher \(R^2\) is the better predictor.

The total sample variation: a larger \(SST_j\) implies a smaller variance for the estimator; with larger variation in \(x_j\), \(\hat{\beta}_j\) will in general have a smaller variance. Linear relationships among the independent variables: a larger \(R_j^2\) (from regressing \(x_j\) on the other regressors) implies a larger variance for the estimator; as \((1-R_j^2)\) gets smaller, the variance of \(\hat{\beta}_j\) gets bigger. See the formula below.
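These pieces are summarized by the standard formula for the sampling variance of a slope estimator under the Gauss-Markov assumptions, where \(SST_j\) is the total sample variation in \(x_j\) and \(R_j^2\) is the \(R^2\) from regressing \(x_j\) on the other regressors:
\[Var(\hat{\beta}_j) = \frac{\sigma^2}{SST_j(1 - R_j^2)}\]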

A smaller, possibly misspecified model can have lower variance; in some cases, a little bias is not as bad when it comes with a small variance.

Adjusted \(R^2\): The only difference between \(R^2\) and adjusted \(R^2\) is: Adjusted \(R^2\) is adjusted for the number of predictors, independent variables, \(x_k\) in the model. This is one of the formulas for adjusted \(R^2\):

\[\bar{R}^2 = 1 - (1 - R^2) \frac{n-1} {n-p-1}\]

When \(R^2\) is small, SSR is large relative to SST.
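A quick sketch checking the adjusted \(R^2\) formula against R’s own output, using the built-in cars data (here \(p = 1\) predictor):

# Adjusted R-squared by hand versus summary()
fit = lm(dist ~ speed, data = cars)
r2 = summary(fit)$r.squared
n = nrow(cars); p = 1
1 - (1 - r2) * (n - 1) / (n - p - 1)  # manual adjusted R-squared
summary(fit)$adj.r.squared            # same value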

T-test

The t score is a ratio of the difference between two groups to the difference within the groups. The larger the t score, the more difference there is between groups; the smaller the t score, the more similarity there is between groups. The bigger the t-value, the more likely it is that the results are repeatable.

T-Test
\[\frac{\hat{\beta}_j - \beta_j} {se(\hat{\beta_j})} \sim t_{n-k-1}\]

t = (estimate - hypothesized value)/standard error.

We know \(\hat{\beta}_j\) once we estimate it from the sample. We do not know \(\beta_j\).

Alpha, \(\alpha\), is the significance level. Even when the null hypothesis is true we allow ourselves some probability of rejecting it; you have to allow yourself some probability of making a mistake. If you refuse any chance of a Type I error, then you will have a very large Type II error; if you refuse any chance of a Type II error, then you will have a large probability of a Type I error. \(\alpha\) is a chosen probability; it does not itself follow a t (or normal) distribution. The t-test can be on the right-hand side or the left-hand side: t-tests can be one-tailed or two-tailed, and \(H_1\) can be one-sided or two-sided. \(H_1: \beta_j > 0\) and \(H_1: \beta_j< 0\) are one-sided tests; \(H_1: \beta_j \neq 0\) is a two-sided test. For example, you may want to test whether \(\beta_1\) is equal to one against the alternative that \(\beta_1\) is less than one.

The t statistic only tests one parameter (one restriction) at a time. Robust t-tests work under heteroskedasticity, and so does a single-restriction F test; multiple restrictions require the (robust) F test.

Two-sided test: a two-sided t-test tests for deviations of \(\hat{\mu}\) from \(\mu\) in either direction.

In a two-sided test you split the significance level between the two tails. If \(|t|\) is larger than the critical value from the table, you reject the null hypothesis; if it is smaller, you fail to reject.
The t-test can be used even if the error term is not normally distributed, as long as the sample is large enough. The F test should not be used if the error term is not normally distributed, unless the sample is large. Example: an exclusion restriction, where we want to know whether a set of parameters are jointly equal to zero, e.g. whether education and experience are useless in explaining wage. You cannot use separate t-tests, because doing a t-test on each one would distort the overall significance level.

One Sided Test
\(H_1: \beta_{j}>0\): one way to write the alternative, stating that \(\beta_{j}\) is greater than 0.

One way to understand the null hypothesis is to imagine you are a judge: the innocent verdict goes in the null hypothesis. The thing you want to reject goes in the null hypothesis.

Linear Combination Testing

\(t = \frac{\hat{\beta}_1 - \hat{\beta_2}} {se(\hat{\beta_1} - \hat{\beta_2})}\). This is used to test \(H_0: \beta_1 = \beta_2\). The problem with the equation above is that we do not get the standard error of \(\hat{\beta}_1 - \hat{\beta}_2\) directly from the usual regression output.

The standard normal distribution is centered around 0.

F-Test

The F-test is used to test joint restrictions (and, in its simplest form, to compare the means of two populations). The F statistic has two degree-of-freedom parameters: \(F \sim F_{q,\,n-k-1}\), with numerator degrees of freedom \(q\) and denominator degrees of freedom \(n-k-1\). There is no one-tailed versus two-tailed issue: even though the F-test covers a two-sided alternative, only the upper tail is used, so never divide the p-value by two.

The F test can also be written in \(R^2\) form. The different ways of computing the F statistic give the same answer even though the formulas look different.

\[F = \frac{\frac{(SSR_r-SSR_{ur})}{q}}{\frac{SSR_{ur}}{(n-k-1)}}\]

F-Test with \(R^2\)
\[F = \frac{\frac{(R_{ur}^2 - R_r^2)}{q}}{\frac{(1-R_{ur}^2)}{df_{ur}}}\]

library(lmtest)    # waldtest()
library(sandwich)  # vcovHC()
# Heteroskedasticity-robust F-test.
waldtest(model1, model.unrestricted, vcov = vcovHC(model.unrestricted, type = "HC0"))
# F-test
anova(model1, model2)

P Value

A low p-value is good evidence against the null. The p-value is the smallest significance level at which the null hypothesis can be rejected; equivalently, it is the largest significance level at which the null hypothesis cannot be rejected.

Testing for Heteroskedasticity

# This example uses the car data in R
lmodel = lm(dist ~ speed, data = cars)
# Check data. x <-anova(model) will save data into x.
anova(lmodel)
## Analysis of Variance Table
## 
## Response: dist
##           Df Sum Sq Mean Sq F value   Pr(>F)    
## speed      1  21186 21185.5  89.567 1.49e-12 ***
## Residuals 48  11354   236.5                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

\(\hat{\sigma}\) is the standard error of the regression. Standard error: a measure of the statistical accuracy of an estimate, equal to the standard deviation of the theoretical distribution of a large population of such estimates. As the number of data points grows, \(\hat{\beta}\) gets closer to \(\beta\); this means that we can make our estimator arbitrarily close to \(\beta\) if we can collect as much data as we want.

Steps for the LM (Lagrange multiplier) test:

  1. Estimate the restricted model implied by your hypothesis and save the residuals \(\hat{u}\). Then treat the residuals as a new variable and regress them on all of the \(x\)’s; the LM statistic is \(n \cdot R^2\) from this regression.

If we find that the slope of the regression line is significantly different from zero, we conclude that there is a significant relationship between the independent and dependent variables. If there is a significant linear relationship between the independent variable \(x\) and the dependent variable \(y\), the slope will not equal zero.

Comparing the regular standard error with the robust standard error does not, by itself, tell you much.

When the model is homoskedastic, the robust (White) standard errors have a higher variance than the usual ones.

Robust standard errors: a technique to obtain consistent standard errors for the OLS coefficients under heteroscedasticity. Heteroskedasticity is a problem because OLS assumes that all residuals are drawn from a population that has a constant variance (homoskedasticity); to satisfy the regression assumptions and be able to trust the results, the residuals should have a constant variance. The White test is also used to check for heteroskedasticity.

Robust standard errors are called robust because they work whether the model is heteroskedastic or homoskedastic.

The robust LM test is used when you have heteroskedasticity.

  1. First step is to run the restricted model and save the residuals \(\hat{u}\).
  2. Regress each of the excluded variables on the included variables and save the residuals.
  3. Define a variable equal to 1 and regress it on the products of the residuals from steps 1 and 2; the LM statistic is \(n - SSR_1\) from this regression.
library(lmtest)    # coeftest()
library(sandwich)  # vcovHC()
# Robust standard error test (gpa.lm is a previously estimated lm model)
coeftest(gpa.lm, vcovHC(gpa.lm, type = "HC0"))
# The smaller the standard error, the less the spread and the more likely it is that any sample mean is close to the population mean. A small standard error is thus a good thing.

Under homoskedasticity, \(u\) is uncorrelated with any function of \(x\), from which we can conclude that \(u^2\) is uncorrelated with \(x\). When the data set is large enough, robust standard errors are going to work.

Heteroscedasticity does not cause the ordinary least squares coefficient estimates to be biased, although it does cause the ordinary least squares estimates of the variance (and thus the standard errors) of the coefficients to be biased. To test for heteroskedasticity, the Breusch-Pagan test is done. Heteroskedasticity does not affect \(R^2\) or adjusted \(R^2\).

Heteroskedasticity is the opposite of homoscedasticity. Homoscedasticity is an assumption for OLS estimation. Homoscedasticity assumes the variance of the errors should be consistent for all observations. Heteroskedasticity is the opposite and thus there are some remedies when heteroskedasticity occurs. Heteroskedasticity does not cause bias or inconsistency in the OLS estimators, but the usual standard errors and test statistics are no longer valid.

The Breusch-Pagan test, and the White test for special cases: both tests involve regressing the squared OLS residuals on either the independent variables or the fitted values.

library(lmtest)  # bptest()
# BP test
bptest(lmodel) # Breusch-Pagan test

If the p-value is less than the significance level of 0.05, we reject homoskedasticity.

OLS is no longer the best linear unbiased estimator (BLUE) in the presence of heteroskedasticity. The simple White-type robust variance estimator for \(\hat{\beta}_1\) is:

\[\frac{\sum_{i=1}^n(x_i - \bar{x})^2 \hat{u}_i^2}{SST_x^2}\]

Box-Cox Transformation: this transforms the \(Y\) variable toward a normal distribution, creating a new model.

library(caret)  # BoxCoxTrans()
# How to deal with heteroskedasticity: one way is to apply a Box-Cox transformation
newlmodel = BoxCoxTrans(cars$dist)
print(newlmodel)
## Box-Cox Transformation
## 
## 50 data points used to estimate Lambda
## 
## Input data summary:
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2.00   26.00   36.00   42.98   56.00  120.00 
## 
## Largest/Smallest: 60 
## Sample Skewness: 0.759 
## 
## Estimated Lambda: 0.5

Testing for Autocorrelation

# Checking for autocorrelation
acf(model$residuals) # Plots autocorrelation 

Chi-Square test

Multiple linear restrictions: if you have more than one restriction, the t-test cannot be used. Two or more restrictions cannot be tested by separate t-tests because the significance level (the probability of making a Type I error) is no longer controlled; you cannot check the t statistics separately. The F test is used for multiple restrictions (and the related LM statistic follows a chi-square distribution).


RESET

The RESET test is used to test which functional form is preferred, for example regular variables, log variables, or squared terms.

Process:

First model

\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + u\)

Estimate the following model:
\(y = \beta_0 + \beta_1x_1 + \beta_2x_2 + \delta_1\hat{y}^2 + \delta_2\hat{y}^3 + u\). After this model is estimated, test \(H_0: \delta_1 = \delta_2 = 0\) with an F test.
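The RESET test is available as resettest() in the lmtest package; a sketch using the cars model from earlier (the default powers 2 and 3 of the fitted values match the model above):

library(lmtest)  # resettest()
# RESET test: adds fitted(y)^2 and fitted(y)^3 and tests their joint significance
lmodel = lm(dist ~ speed, data = cars)
resettest(lmodel, power = 2:3, type = "fitted")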

Proxy variables: used when a variable needed for the regression cannot be observed. We then use a proxy variable to approximate the unobserved variable.

Lagged dependent variable: a method used to approximate an unobserved variable; it is an application of the basic idea behind the proxy variable method.

Measurement error: there are two types of measurement error, measurement error in the \(y\) variable and measurement error in the \(x\) variable. If the error is in the \(y\) variable, we typically assume the measurement error is independent of the \(x\) variables.

The endogeneity problem is when one of the independent variables is correlated with the error term. It arises when there is something related to your \(y\) variable that is also related to your \(x\) variable.

Time Series

Time series data is one possible outcome (realization) of a stochastic process. A static model only captures the contemporaneous effect of \(x\) on \(y\). The two basic time series regression models are the static model and the finite distributed lag (FDL) model.

The finite distributed lag model allows lagged \(x\) to have an effect on \(y\). We can also say that the static model is a special case of the finite distributed lag model. With the finite distributed lag model, we can trace out a distribution of the effect of \(x\) on \(y\) over time; this is the impulse response function in macroeconomics.

The assumptions are broadly the same as in the cross-sectional approach, but the second assumption (strict exogeneity) is much stronger than in the cross-sectional case. Serial correlation is very bad in the finite distributed lag (FDL) model.

Under these assumptions the Gauss-Markov theorem holds in time series. The additional assumptions needed for hypothesis testing are:

  1. Homoscedasticity
  2. No serial correlation

Dummy variables in time series are used to capture special events or particular periods.

In time series, it is important to identify trends. If the underlying data contain a trend but we do not control for it, it causes spurious regression. Seasonality is also important to identify in time series. The most common approach in time series regression is to use seasonally-adjusted data, but there are several other ways to deal with seasonality, for example dummy variables.

The characteristic that distinguishes time series from cross-sectional data is temporal ordering. For example, the data for 1970 immediately precede the data for 1971, and the past can affect the future. The cross-sectional view treats the data as drawn from a population, which will generally yield different values of the independent and dependent variables, e.g. education, experience, wage, etc.

Why can time series be seen as outcomes of random variables? We do not know what the Dow Jones Industrial Average will be at the close of the next trading day, and we do not know what the annual growth in output will be in Canada during the coming year. Since the outcomes of these variables are not foreknown, they should clearly be viewed as random variables.

Formally, a sequence of random variables indexed by time is called a stochastic process or a time series process. (“Stochastic” is a synonym for random.)

The sample size for a time series data set is the number of time periods over which we observe the variables of interest.

Example of the model: \(gfr\) against \(pe\).

\[gfr_t = \beta_0 + \beta_1pe_t + u_t\]

Use the finite distributed lag (FDL) model.

It allows one or more variables to affect \(y\) with a lag, for example with annual observations:

\[gfr_t = \alpha_0 + \delta_0pe_t + \delta_1pe_{t-1} + \delta_2pe_{t-2} + u_t\]

The model above is an FDL of order two. To interpret the coefficients, suppose \(pe\) is equal to a constant before time \(t\) and at \(t\) increases by one unit. The long-run propensity (LRP), the sum \(\delta_0 + \delta_1 + \delta_2\), is often of interest in distributed lag models.
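A sketch of estimating an FDL of order two in R on simulated data (the names gfr and pe mirror the example; the lag variables are built by hand):

# Finite distributed lag model with two lags, on simulated data
set.seed(1)
n   = 60
pe  = rnorm(n)
gfr = 85 + 0.10*pe + 0.05*c(NA, pe[-n]) + 0.02*c(NA, NA, pe[-c(n-1, n)]) + rnorm(n)
pe_l1 = c(NA, pe[-n])              # pe lagged one period
pe_l2 = c(NA, NA, pe[-c(n-1, n)])  # pe lagged two periods
fdl = lm(gfr ~ pe + pe_l1 + pe_l2) # rows with NA lags are dropped automatically
sum(coef(fdl)[2:4])                # long-run propensity: delta0 + delta1 + delta2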

Time series and the Gauss Markov assumptions:

  1. Variables can be correlated, but perfect correlation is ruled out.
  2. Second assumption: zero conditional mean. This means the error \(u_t\) cannot be correlated with the explanatory variables, and \(u\) must also be uncorrelated with past values. One example of past values being correlated with \(u\) would be a city adjusting the size of its police force based on past values of the murder rate; if this is the case, feedback from \(u\) to future \(z\) is always an issue.

Variables that are strictly exogenous cannot react to what has happened to y in the past. A factor such as the amount of rainfall in an agricultural production function satisfies this requirement: rainfall in any future year is not influenced by the output during the current or past years. But something like the amount of labor input might not be strictly exogenous, as it is chosen by the farmer, and the farmer may adjust the amount of labor based on last year’s yield.

The variance of the OLS estimators and the Gauss-Markov theorem.

  1. Homoskedasticity: when \(var(u|x)\) does depend on \(x\), it often depends on the explanatory variables at time \(t\), \(x_t\).
  2. No serial correlation: when this assumption is false, there is serial correlation or autocorrelation, because the errors are correlated across time.

The most popular stationary time series model is ARMA(p,q) model.

\[y_t = \alpha_0 + \alpha_1y_{t-1} + ... + \alpha_py_{t-p} + \epsilon_t + \beta_1\epsilon_{t-1} + ... + \beta_q\epsilon_{t-q}\]

\(\epsilon_t\) is white noise. The definition is similar to the assumption about error terms in cross-sectional approach.

One important assumption of ARMA(p,q) model is the data is stationary. If the data is not stationary and we use ARMA(p,q) model, we will not be able to obtain consistent estimators. The co-variance stationary condition is not the weakest or strongest stationary condition.

For the moving average part of the model, there are no restrictions on the parameters for the process to be stationary. For the autoregressive part of the model, we need the roots of the polynomial to be outside the unit circle.

  1. For models with an AR(1) component, it is ideal for the AR(1) coefficient to be smaller than 1 in absolute value. If the estimate is very close to the boundary or exceeds the boundary, this is a very important indicator that the underlying data is not stationary.
  2. For AR(2) components, ideally the sum of the AR(1) and AR(2) coefficients should be less than 1 in absolute value. If the estimates are very close to the boundary or exceed the boundary, this is an important indicator that the underlying data is not stationary.

One classical way to choose the orders \(p\) and \(q\) is the Box and Jenkins method. This method exploits the different patterns in the ACF and PACF for AR, MA, and ARMA models. In general, the Box and Jenkins method won’t be able to pin down the correct model, but it is still helpful for narrowing down the search.

After narrowing down the search for the optimal model, we usually use information criteria to determine which model is best. This idea is very similar to using adjusted R-squared to choose the best model: the model that provides the smallest AIC/BIC is the best model. The difference between the two criteria is the penalty term; AIC’s penalty term is smaller than BIC’s when the sample size is larger than 10.
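A sketch of fitting candidate ARMA models with arima() and comparing them by AIC/BIC, on simulated ARMA(1,1) data so the true model is known here:

# Simulate an ARMA(1,1) series and compare candidate models by information criteria
set.seed(123)
y = arima.sim(model = list(ar = 0.6, ma = 0.4), n = 200)
fit_ar1    = arima(y, order = c(1, 0, 0))  # AR(1)
fit_arma11 = arima(y, order = c(1, 0, 1))  # ARMA(1,1)
AIC(fit_ar1, fit_arma11)  # smaller AIC is preferred
BIC(fit_ar1, fit_arma11)  # BIC penalizes extra parameters more heavily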

The natural log function works the same way as in cross-sectional data.

Binary or dummy independent variables are also quite useful in time series applications. Since the unit of observation is time, a dummy variable represents whether, in each time period, a certain event has occurred. this is called event studies. Sometimes, multiple dummy variables are used. For example, if the event is the imposition of a new regulation that might affect a certain firm, we might include a dummy variable that is one for a few weeks before the regulation was publicly announced and a second dummy variable for a few weeks after the regulation was announced. The first dummy variable might detect the presence of inside information.

Index numbers: an index number typically aggregates a vast amount of information into a single quantity. Index numbers are used regularly in time series analysis.

Example of Lag

Starting model:

\[gfr_t = \alpha_0 + \delta_0pe_t + \delta_1pe_{t-1} + \delta_2pe_{t-2} + u_t\]

Change the starting model into:

\[y_t = \alpha_0 + \delta_0z_t + \delta_1z_{t-1} + \delta_2z_{t-2} + u_t\]

Assume \(z\) is a constant, \(c\), in all time periods before time \(t\). At \(t\), \(z\) increases by one unit to \(c + 1\).

Taylor rule: uses data from the past to determine the current interest rate.

In time series it is almost impossible to fully satisfy the Gauss-Markov assumptions.

The error term is the part of \(y\) that \(x\) cannot explain.

When the interest rate is reduced, we expect a reduction in output and lower inflation.

AR Model

Pure AR models depend on the lagged values of the series you are modeling to make forecasts. The AR part involves regressing the variable on its own lagged (i.e., past) values.

AR stands for Auto-Regressive model. For a data set that can be modeled by a pure AR model, we would see decay in the ACF and a cut-off in the PACF.

The ACF is geometric (decaying) and the PACF is significant up to \(p\) lags.

MA Model

MA stands for Moving Average model. The MA part involves modeling the error term as a linear combination of error terms occurring contemporaneously and at various times in the past. Pure MA models depend on the errors (residuals) of the previous forecasts to make current forecasts.

Their characteristics are a decay in the PACF but a cut-off in the ACF; the PACF is geometric.

ARMA Model

This is a combination of both AR and MA models.

A lot of United States macroeconomic series start after WWII.

Inflation and unemployment would have a negative relationship.

Stagflation is when inflation and unemployment have a positive relationship.

Rational expectations: we use all the available information to try to predict what will happen tomorrow.

Adaptive learning is when you adjust your behavior given new information.

Mixed models (ARMA) take into account both of the above factors when making predictions.

You use these models to tell whether there is serious autocorrelation; then you look at the ACF or PACF to see what type of process it is, and you address it by modeling the components that are highly autocorrelated.
AR and MA models are special cases of the ARMA model.

The yield curve differential is an important indicator for determining recessions. When the yield curve differential becomes negative (an inverted yield curve), it is a leading indicator of getting into a recession. If we can model the yield curve differential we can forecast recessions. It is a very noisy series.

ACF

ACF stands for the autocorrelation function. It is a way to see serial correlation when data change over time (time series). It can give an idea of whether or not pairs of observations show autocorrelation, but it cannot measure how large that autocorrelation is; for that you would use something like the Durbin-Watson test.
The ACF shows how \(y_t\) is correlated with its past: how this year’s value correlates with last year’s value, or with earlier values. The first spike in the ACF plot is not important and can be ignored, because it always equals 1 (\(y_t\) is always perfectly correlated with \(y_t\)). In this example, the ACF is decaying slowly over time: it is significantly positive up to lag seven (\(y_t\) and \(y_{t-7}\)), and the correlations become significant again, but negative, after lag seven.

Auto Correlation function ACF plot is a visual way to show serial correlation in data that changes over time (time series)

acf(yielddiff$spread)

PACF

The correlation between \(y_t\) and \(y_{t-k}\), conditional on the intermediate lags.

pacf(yielddiff$spread)

ACF Auto correlation function

For this example there is a decaying PACF; this suggests some MA component inside the data: after controlling for past observations, the errors are still correlated.

Since the first and second spikes are significant, there is some AR part in the model. In this example there are AR(1) and AR(2) components, so \(\alpha_1\) and \(\alpha_2\) should exist. We use these methods (ACF, PACF) to see which model would fit best.

ACF in an MA(1) model: the correlation between \(y_t\) and \(y_{t-1}\).

When two observations share a common error term in the model, they are correlated.

MA(2):

Plot the ACF and PACF; for an MA(2) process the ACF cuts off after two lags and the PACF decays.

ARMA(1,1)
The properties of an ARMA(1,1) process are a mixture of those of an AR(1) and an MA(1) process: the ACF would be geometric, and the PACF would also be geometric if serial correlation exists.

Autocorrelation function of an ARMA (p,q) process exhibits exponential decay toward zero.

NOTES
The free rider problem is the burden on a shared resource that is created by its use or overuse by people who aren’t paying their fair share for it or aren’t paying anything at all. The free rider problem can occur in any community, large or small

# Basic forecast.

library(forecast)  # forecast() comes from the forecast package

# is1 is a previously fitted time series model (e.g. from arima())
a = forecast(is1, h = 10)   # forecast 10 periods ahead
plot(a)

a = forecast(is1, h = 100)  # forecast 100 periods ahead
plot(a)

Panel Data

Panel Data Analysis is another method of finding relationships between the explanatory variables and the dependent variable. It is a mixture of time series and cross-sectional data. Panel analysis is used when the data set contains many individuals over a period of time. Individuals are denoted \(i = 1, 2, ..., N\) and time is denoted \(t = 1, 2, ..., T\). The data can have either many \(i\) or many \(t\). Data containing a large cross section of individuals, many \(i = 1, 2, ..., N\) and not many \(t = 1, 2, ..., T\), are called short panels. A long panel is the opposite: data containing many \(t = 1, 2, ..., T\) and not as many \(i = 1, 2, ..., N\). The \(i\) subscript therefore denotes the cross-section dimension whereas \(t\) denotes the time-series dimension. A balanced panel has NO missing data. Having a balanced panel with a fair amount of \(t\) and \(i\) is ideal because it makes the analysis simpler.

Pooling the data means treating the observations as one larger sample and controlling for the fact that some observations are from a different year, which is done with the addition of the \(y2010_i\) dummy variable: \(hprice_i = \beta_0 + \beta_1bdrms_i+ \theta{y2010}_i\).

Cross-sectional data is more of a snapshot of randomly selected individuals; panel data is more of a movie.

The general panel data model is written as:

\[y_{it} = \alpha_{it} + \beta_{it}x_{it} + u_{it}\]

  1. \(y_{it}\): the dependent variable, the variable being explained; its value for individual \(i\) at time \(t\).
  2. \(\alpha_{it}\): the intercept; in the basic model it is taken to be independent of \(i\) and \(t\).
  3. \(\beta_{it}\): the vector of slopes; in the basic model it is taken to be independent of \(i\) and \(t\).
  4. \(x_{it}\): the independent variable value for individual \(i\) at time \(t\); the data gathered that explain \(y_{it}\).
  5. \(\beta_{it}x_{it}\): the product of the slope vector and the regressors.
  6. \(u_{it}\): the error term.

The formula above is basically saying that \(y_{it}\) is explained by \(\alpha_{it} + \beta_{it}x_{it} + u_{it}\).

Panel data give more informative data, more variability, less collinearity among the variables, more degrees of freedom and more efficiency.

Time series data often suffer from a lot of multicollinearity.

Panel data are better at studying the dynamics of adjustment, for example job turnover and residential and income mobility.

Benefits from using panel data

There are benefits to using panel data instead of only cross-section or time series data. One of the benefits is controlling for individual heterogeneity. In panel data, the \(i = 1, 2, ..., N\) units (the individuals, countries, states, firms, etc.) are said to be heterogeneous; in time series and cross-section analysis this heterogeneity is typically not controlled for. With observations that span both time and individuals in a cross section, more information is available, giving more efficient estimates. Panel data can also help control for omitted variable bias.

We can control for individual heterogeneity: the differences across individuals.

We are able to track changes over time, such as policy changes. Panel data give more informative data and less collinearity among the variables; because the information is not coming from only one individual, the correlation problem is not as severe. There are more degrees of freedom and more efficiency, in the sense that we are using more information, for example to predict the change in capital against women’s contribution to the economy.

Time-series and cross-section studies that do not control for this heterogeneity run the risk of obtaining biased results. Panel data give more informative data, more variability, less collinearity among the variables, more degrees of freedom and more efficiency; time series is plagued with multicollinearity. Panel data are better able to study the dynamics of adjustment, which is good for economic studies of labor. Panel data are better able to identify and measure effects that are simply not detectable in pure cross-section or pure time-series data, and panel data models allow us to construct and test more complicated behavioral models than purely cross-section or time-series data. Biases resulting from aggregation over firms or individuals may be reduced or eliminated, since panel data provide information on individual behavior both across time and across individuals.

Panel data are better for studying dynamics of adjustment: cross-sectional distributions that look relatively stable can hide a multitude of changes.

Panel data are better for identifying and measuring effects that are simply not detectable in pure cross-section or pure time-series data, for example whether union membership increases or decreases wages. This can be better answered as we observe a worker moving from union to nonunion jobs or vice versa.

Limitations of panel data

Design and data collection problems, distortions from measurement errors, selectivity problems (self-selection), short time-series dimension, and cross-section dependence.

\(u_{it} = u_{i} + v_{it}\). \(u_i\) is time-invariant and accounts for any individual-specific effect that is not included in the regression, such as the individual’s unobserved ability. \(v_{it}\) varies with individuals and time and can be thought of as the usual disturbance in the regression.

\(y_{it}\) might measure the earnings of a household, while \(x_{it}\) may contain a set of variables like experience, education, union membership, sex, race, etc.

Examples of usage of panel analysis

Where is panel data used? Panel data can be used in macroeconomics when we want to analyze the GDP of many countries over many time periods. Other applications are: schooling and career transitions, marriage and fertility, training investments, child care usage, drug and alcohol use, and agricultural economics. There are also social studies that follow certain people through time, for example the Swedish study of household market and non-market activities and the Canadian Survey of Labor Income Dynamics, which follows the same participants for several years. Another study is the Labor Dynamics in Australia survey.

Public Capital Productivity

Gasoline Demand. Another application estimates a two-way fixed effects model to provide evidence of the importance of firms’ financing constraints in explaining the dramatic cycles in inventory investment. Using quarterly firm panel data, the authors conclude that cash flow is much more successful than cash stocks or coverage in explaining inventory investment across firm sizes, different inventory cycles and different manufacturing sectors.

The fixed effects estimator is also known as the covariance model, the within estimator, the individual dummy variable model, or the least squares dummy variable (LSDV) model.

Fixed Effects and Random Effects

The fixed effects null hypothesis (in the test for random effects) is that the variance of the random intercept error component equals zero. Fixed effects (FE): use it whenever you are only interested in analyzing the impact of variables that vary over time. FE explores the relationship between predictor and outcome variables within an entity (country, person, company, etc.). Each entity has its own individual characteristics that may or may not influence the predictor variables (for example, being male or female could influence the opinion toward a certain issue; the political system of a particular country could have some effect on trade or GDP; or the business practices of a company may influence its stock price). When using FE we assume that something within the individual may impact or bias the predictor or outcome variables and we need to control for this. This is the rationale behind the assumption of correlation between the entity’s error term and the predictor variables. FE removes the effect of those time-invariant characteristics so we can assess the net effect of the predictors on the outcome variable. Another important assumption of the FE model is that those time-invariant characteristics are unique to the individual and should not be correlated with other individual characteristics. Each entity is different, therefore the entity’s error term and the constant (which captures individual characteristics) should not be correlated with the others. If the error terms are correlated, then FE is not suitable, since inferences may not be correct, and you need to model that relationship (probably using random effects); this is the main rationale for the Hausman test.

Fixed effects can be estimated by regressing on a set of dummy variables, one for each of the N cross-section units; the fixed effects formula is the same, just with another term representing the cross-section units. Remember to drop one dummy (the base group). To test which specification performs better, the model with the dummies or the restricted model, use an F test: compare it to the critical value and decide whether or not to reject the null.

The key insight is that if the unobserved variable does not change over time, then any changes in the dependent variable must be due to influences other than these fixed characteristics. Fixed effects will not work well with data for which within-cluster variation is minimal, or for variables that change slowly over time.

Another way to see the fixed effects model is by using binary variables. So the equation for the fixed effects model becomes: \(Y_{it} = \beta_0 + \beta_1X_{1,it} +...+ \beta_kX_{k,it} + \gamma_2E_2 +...+ \gamma_nE_n + u_{it}\)

Where
\(Y_{it}\) is the dependent variable (DV), where \(i\) = entity and \(t\) = time; \(X_{k,it}\) represents the independent variables (IV); \(\beta_k\) is the coefficient for the IVs; \(u_{it}\) is the error term; \(E_n\) is the dummy for entity \(n\) (since they are binary dummies, you have \(n-1\) entities included in the model); and \(\gamma_2\) is the coefficient for the binary (entity) regressors.

The slope coefficient on \(X\) is the same from one entity to the next. The entity-specific intercepts (in eq. 1) and the binary regressors (in eq. 2) have the same source: the unobserved variable \(Z_i\) that varies across states but not over time.

Fixed effect factor: Data has been gathered from all the levels of the factor that are of interest.

Example of a fixed effect: the purpose of an experiment is to compare the effects of three specific dosages of a drug on the response. "Dosage" is the factor; the three specific dosages in the experiment are the levels; there is no intent to say anything about other dosages.
In this case the \(u_i\) are assumed to be fixed parameters to be estimated, and the remainder disturbances \(v_{it}\) are stochastic for all \(i\) and \(t\). The fixed effects model is an appropriate specification if we are focusing on a specific set of \(N\) units, say firms such as IBM and GE, or particular countries, states, or cities.
Joint significance of the fixed effects is tested with an F test (a Chow-type test).
Assumptions of the fixed effects model: 1. the slopes of the regression lines are the same across entities (countries, states); 2. the fixed effects capture entirely the time-constant omitted variables.

A disadvantage of fixed effects is that they wipe out explanatory variables that do not vary within an individual, such as gender or race, and we are often interested in the effect of these separate sources of individual heterogeneity. Some explanatory power (\(R^2\)) is also lost because those variables are removed from the \(X\)'s.

A random-effect factor has many possible levels; interest is in all possible levels, but only a random sample of levels is included in the data. The error term is composed of two parts: the traditional error and an individual-specific component. The random effects approach attempts to model the individual effects as draws from a probability distribution instead of removing them, and the model is estimated by generalized least squares (GLS). In the Hausman test, the null hypothesis is that the random effects estimates are consistent.

The \(u_i\) are independent of the \(v_{it}\), and the \(x_{it}\) are independent of both \(u_i\) and \(v_{it}\).
In the two-way error components model, \[u_{it} = u_i + \lambda_t + v_{it}\] where \(u_i\) denotes the unobservable individual effect, \(\lambda_t\) denotes the unobservable time effect, and \(v_{it}\) is the remainder stochastic disturbance term. \(\lambda_t\) accounts for any time-specific effect that is not included in the regression, for example a strike in a particular year that reduced output. A sketch of the random effects estimation and the Hausman test follows.
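A minimal sketch of the random effects (GLS) estimator and the Hausman test, again assuming the plm package and the illustrative Grunfeld panel:

# Illustrative sketch: random effects (GLS) and the Hausman test (plm package).
library(plm)
data("Grunfeld", package = "plm")

fixed  <- plm(inv ~ value + capital, data = Grunfeld, index = c("firm", "year"), model = "within")
random <- plm(inv ~ value + capital, data = Grunfeld, index = c("firm", "year"), model = "random")

summary(random)         # GLS estimates with the error components
phtest(fixed, random)   # Hausman test: rejecting the null suggests the random effects estimates are inconsistent, so prefer fixed effects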

Example: a large manufacturer of widgets is interested in studying the effect of machine operator on the quality of the final product. The researcher selects a random sample of operators from the large number of operators at the various facilities that manufacture the widgets. The factor is "operator." The analysis will not estimate the effect of each of the operators in the sample, but will instead estimate the variability attributable to the factor "operator."

A mixed effects model is a mixture of fixed and random effects; a sketch follows.
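A minimal sketch of a mixed effects model, assuming the lme4 package and, purely for illustration, the same Grunfeld data (fixed slopes for the regressors plus a random intercept per firm):

# Illustrative sketch: mixed effects model with lme4 (fixed slopes, random intercept per firm).
library(lme4)
data("Grunfeld", package = "plm")

mixed <- lmer(inv ~ value + capital + (1 | factor(firm)), data = Grunfeld)
summary(mixed)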

R Code

Getting data into R

There are different ways of getting data into R. The first and easiest way is to use the RStudio Import Dataset option.

The second way is to read the data in with code.

# Import data into the R environment (load it into the R console).
mydata <- read.csv("")    # Reads data from a csv file. Use $ to access selected columns.
mydata <- read_xlsx("")   # Reads xlsx files (requires the readxl package); many other formats are available.

Third method of getting data into R

# Another method of getting the data into R: define the variables directly from the data frame columns.
Y <- cbind(mydata$Benchmark)                       # dependent variable taken from the CSV columns
X <- cbind(mydata$stock, mydata$bond, mydata$X3)   # matrix of independent variables

Next, create a linear regression straight from the data file.

# One way of creating the linear regression straight from the csv file:
# replace the column names below with the correct Y and X variables.
olsreg <- lm(y_column ~ x1_column + x2_column, data = mydata)
# Or fit OLS on the variables defined above; X can be a matrix or selected columns.
olsreg <- lm(Y ~ X)

Checking the data

After uploading the data into R there are many ways of checking it. The list below shows a few common ways of checking the data in R.

# Checking the data and the fitted model
headTail(mydata)   # first and last rows of the data (requires the psych package)
str(mydata)        # structure of the data frame
summary(olsreg)    # coefficient estimates, standard errors, and fit statistics
anova(olsreg)      # analysis of variance table for the regression

Statistical Analysis

Descriptive Stats R Code
sum()      Returns the sum of all the values present in its arguments.
mean()     Computes the arithmetic mean.
geoMean()  Finds the geometric mean (not in base R).
var()      Finds the variance.
cor()      Finds the correlation.
cov()      Finds the covariance.
sd()       Finds the standard deviation.
range()    Returns the minimum and maximum of a data set.
min()      Finds the minimum of a data set.
max()      Finds the maximum of a data set.
median()   Finds the median.
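A small usage sketch of these functions on an arbitrary numeric vector (the values are made up for illustration):

# Illustrative descriptive statistics on a made-up vector.
x <- c(2, 4, 4, 6, 9)
sum(x); mean(x); var(x); sd(x)
cor(x, x^2); cov(x, x^2)
range(x); min(x); max(x); median(x)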

# Visualising the data
hist(Y)                                # histogram of any given variable
plot(X[, 1], Y)                        # scatter plot of the data
qqPlot(olsreg, distribution = "norm")  # QQ plot with studentized residuals (car package)
plot(density(resid(olsreg)))           # density plot of the residuals; shows skewness
plot(olsreg, which = 1:4)              # R shortcut for OLS diagnostic plots

# Outliers are unusual observations.
rstudent(olsreg)   # studentized residuals; absolute values larger than about 2 suggest outliers, which can then be examined or deleted

# Leverage is the ability of an observation to change the slope of the regression line; a common measure of leverage is the hat values.
hatvalues(olsreg)
plot(hatvalues(olsreg))

# Influence combines leverage and outlier status: how strongly a single observation affects the fit.
influence.measures(olsreg)   # dfbeta is the influence of an observation on the coefficients: the change in coefficient i caused by deleting a single observation
cooks.distance(olsreg)

# Other useful commands
glm()      # Generalized linear models
multinom() # Multinomial logit (nnet package)
optim()    # General-purpose optimizer

Plotting the Data

par(mfrow = c(2, 2))   # Splits the plotting window into a 2 x 2 grid of charts
plot(lmodel)           # Diagnostic plots for the previously fitted model

# Add the model's predictions to the data frame as a new column
cars <- cbind(cars, dist2 = predict(newlmodel, newdata = cars))
head(cars)

Create the new regression model

lmodel2 <- lm(dist2 ~ speed, data = cars)

Collinearity

# Collinearity diagnostics
colldiag(lmodel)   # condition index (perturb package); values larger than 30 indicate a problem

Examples

Returns on Education

\[wage = \beta_0 + \beta_1education + u\]
Can this be considered a causal effect? To find out, think about the relationship between the variables you control for and the variables you do not. The error term \(u\) holds other things that can affect wage: how much your parents make, or the fact that people who go to college tend to be more determined. You want to control for as much as possible and ask: is there any other factor I did not control for that may affect my \(y\)? Anything you ignore is held in the error term, and if it is related to the variables you do control for, you will not be able to get a ceteris paribus effect.

Education and Fertility

  1. Let \(kids\) denote the number of children ever born to a woman, and let \(educ\) denote years of education for the woman. A simple model relating fertility to years of education is
    \[kids = \beta_{0} + \beta_{1}educ + u\] where \(u\) is the unobserved error.
  1. What kinds of factors might be contained in \(u\)? Are these likely to be correlated with the level of education? The factors contained in \(u\) range from correlated with the level of education to uncorrelated. An example of a correlated factor is the woman's parents' education. An example of an uncorrelated factor would be whether the women in the sample are able to have children.
  2. Based on your answer in part (a), will a simple regression analysis uncover the ceteris paribus effect of education on fertility? Explain. A simple regression analysis would not uncover the ceteris paribus effect of education on fertility. The reason is that finding a causal effect between education and fertility is more complex: the model would need to account for more of the correlated and uncorrelated independent variables and, most importantly, the estimator needs to be unbiased.
  1. The following table contains the ACT scores and the GPA (grade point average) for eight college students. Grade point average is based on a four-point scale and has been rounded to one digit after the decimal.
StudentID = c(001, 002, 003, 004, 005, 006, 007, 008)
GPA = c(2.8, 3.4, 3.0, 3.5, 3.6, 3.0, 2.7, 3.7)
ACT = c(21, 24, 26, 27, 29, 25, 25, 30)
  1. Estimate the relationship between GPA and ACT using OLS; that is, obtain the intercept and slope estimates in the equation, \[GPA = \beta_0 + \beta_1ACT + u\] Comment on the direction of the relationship. Does the intercept have a useful interpretation here? Explain. How much higher is the GPA predicted to be if the ACT score is increased by five points? The direction is upward sloping. The intercept is 0.5681 and the slope is 0.1022. The intercept says that when ACT is 0, predicted GPA is 0.5681; this is not useful information because no ACT score in this data set is anywhere near 0. The slope is the parameter with useful information: if ACT increases by 5 points, predicted GPA increases by 0.1022 × 5 ≈ 0.51 points according to this model. The estimation call and its output are shown below.
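The output below can be reproduced with a call along these lines (the object name grade matches the one used later for the fitted values and residuals):

grade <- lm(GPA ~ ACT)   # simple regression of GPA on ACT using the vectors defined above
summary(grade)           # produces the output shown below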
## 
## Call:
## lm(formula = GPA ~ ACT)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.42308 -0.14863  0.06703  0.10742  0.37912 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  0.56813    0.92842   0.612   0.5630  
## ACT          0.10220    0.03569   2.863   0.0287 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2692 on 6 degrees of freedom
## Multiple R-squared:  0.5774, Adjusted R-squared:  0.507 
## F-statistic: 8.199 on 1 and 6 DF,  p-value: 0.02868
  1. Compute the fitted values and residuals for each observation, and verify that the residuals (approximately) sum to zero. Summing the residuals (and inspecting the Residuals vs Fitted plot) shows that they approximately sum to zero.
fitted = fitted(grade)
fitted
##        1        2        3        4        5        6        7        8 
## 2.714286 3.020879 3.225275 3.327473 3.531868 3.123077 3.123077 3.634066
res = resid(grade)
res
##           1           2           3           4           5           6 
##  0.08571429  0.37912088 -0.22527473  0.17252747  0.06813187 -0.12307692 
##           7           8 
## -0.42307692  0.06593407
sum(res)      # the residuals approximately sum to zero
par(mfrow = c(2, 2))
plot(grade)   # diagnostic plots, including Residuals vs Fitted

  1. What is the predicted value of GPA when ACT = 20? When ACT = 20, the predicted GPA is 0.5681 + 0.1022(20) = 2.6121 (see the sketch after this list).
  2. How much of the variation in GPA for these eight students is explained by ACT? Explain. The R-squared is 0.5774, meaning the independent variable in the model explains 57.74% of the variance of the dependent variable for these eight students.
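A short sketch for the last two parts, using the grade model estimated above:

predict(grade, newdata = data.frame(ACT = 20))   # predicted GPA when ACT = 20 (about 2.61)
summary(grade)$r.squared                         # share of the variation in GPA explained by ACT (about 0.5774)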
  1. Suppose you are interested in estimating the effect of hours spent in an SAT preparation course (\(hours\)) on total SAT score (\(sat\)). The population is all college-bound high school seniors for a particular year.
  1. Suppose you can only randomly sample students and record their SAT score (\(sat\)) and hours spent in the SAT preparation course (\(hours\)), and you construct the following population model, \[sat = \beta_0 + \beta_1hours + u\] List at least two factors contained in \(u\). Are these likely to have positive or negative correlation with hours? One factor in \(u\) would be stress level, taken here to be negatively correlated with hours, since more stress lowers the SAT score. Another is how efficient the student is at retaining information; this would have a positive correlation with hours because the better the student is at retaining information, the better he or she would score on the SAT. Other examples are GPA and how motivated the student is in the preparation course.
  2. In the equation from part (a), what should be the sign of \(\beta_1\) if the preparation course is effective? The sign of \(\beta_1\) should be positive if the preparation course is effective.
  3. In the equation from part (b), what is the interpretation of \(\beta_0\)? \(\beta_0\) is the intercept coefficient and represents the predicted SAT score when no hours are spent in the preparation course.

Computer Exercise

  1. The data set in CEOSAL2 contains information on chief executive officers for U.S. corporations. The variable salary is annual compensation, in thousands of dollars, and \(ceoten\) is prior number of years as company CEO.
  1. Find the average salary and the average tenure in the sample.

  2. How many CEOs are in their first year as CEO (that is, \(ceoten = 0\))? What is the longest tenure as a CEO?

  3. Plot a scatter plot of log(\(salary\)) against \(ceoten\), with log(\(salary\)) on the y-axis and \(ceoten\) on the x-axis, and add the fitted line on top of the scatter plot. Based on this, do you see a negative or positive relationship between log(\(salary\)) and \(ceoten\)? There is a positive relationship: the fitted slope is 0.009724, which is small but nonetheless positive.

  4. Estimate the simple regression model
    \[log(salary) = \beta_0 + \beta_1ceoten + u\] and report your results in the usual form. What is the (approximate) predicted percentage increase in salary given one more year as a CEO? One more year as CEO gives approximately 0.009724 × 100 ≈ 0.97% higher salary. Because salary is in log form, the approximate percentage increase is found by multiplying the slope by 100. An R sketch for this exercise follows.
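A sketch for this exercise, assuming the CEOSAL2 data is available through the wooldridge package (the package and object names here are assumptions):

# Illustrative sketch for the CEOSAL2 exercise (assumes the wooldridge package).
library(wooldridge)
data("ceosal2")

mean(ceosal2$salary)       # average salary, in thousands of dollars
mean(ceosal2$ceoten)       # average tenure, in years
sum(ceosal2$ceoten == 0)   # number of CEOs in their first year
max(ceosal2$ceoten)        # longest tenure as CEO

plot(log(salary) ~ ceoten, data = ceosal2)             # scatter plot
ceo_model <- lm(log(salary) ~ ceoten, data = ceosal2)  # simple regression
abline(ceo_model)                                      # fitted line on top of the scatter plot
summary(ceo_model)                                     # slope * 100 is the approximate % increase per year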

  1. Use the data in SLEEP75 from Biddle and Hamermesh (1990) to study whether there is a tradeoff between the time spent sleeping per week and the time spent in paid work. We could use either variable as the dependent variable. For concreteness, estimate the model \[sleep = \beta_0 + \beta_1totwrk + u\] where \(sleep\) is minutes spent sleeping at night per week and \(totwrk\) is total minutes worked during the week.
  1. Report your results in equation form along with the number of observations and \(R^2\). What does the intercept in this equation mean? The estimated equation is \(\widehat{sleep} = 3586.38 - 0.15075\,totwrk\). The intercept means that when \(totwrk\) is 0 (not working at all), predicted sleep is about 3,586 minutes per week.

  2. If \(totwrk\) increases by 2 hours, by how much is sleep estimated to fall? 120 × (-0.15075) = -18.09, so sleep is estimated to fall by about 18 minutes per week, ceteris paribus (see the sketch below).
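A sketch for this exercise, again assuming the data is available through the wooldridge package:

# Illustrative sketch for the SLEEP75 exercise (assumes the wooldridge package).
library(wooldridge)
data("sleep75")

sleep_model <- lm(sleep ~ totwrk, data = sleep75)
summary(sleep_model)                # intercept and slope in the usual form, with R-squared
nobs(sleep_model)                   # number of observations
120 * coef(sleep_model)["totwrk"]   # estimated change in sleep when totwrk rises by 2 hours (120 minutes)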

NOTES

# Check that the mean of the residuals is (approximately) zero
mean(model$residuals)   # pass the fitted model's residuals
# Durbin-Watson test for autocorrelation in the residuals
dwtest(model)           # requires the lmtest package; pass the fitted model