Author : Gianfranco David Chamorro Rodriguez
E-mail : gianfranco.chamorror@gmail.com
Channel : Video
This document only estimates the model by ordinary least squares (OLS), without evaluating the model's assumptions.
We use data from Table 5.5 of Damodar Gujarati's book Econometrics, Fifth Edition. The table gives the average annual salary of a public school teacher (in dollars) and public education spending per pupil (in dollars) for 1985, for the 50 states and the District of Columbia in the United States. To examine whether teacher salary is related to spending per pupil in public schools, the following model was suggested: \[\begin{equation*} Salary = \alpha + \beta_1 \; Spending + e \end{equation*}\]
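Writing \(Y_i\) for the salary and \(X_i\) for the spending of observation \(i\), we have:
\[\begin{equation} Y_i = \alpha + \beta_1 X_i + e_i \end{equation}\]
so the error for each observation is:
\[\begin{equation*} e_i = Y_i - \alpha - \beta_1 X_i \end{equation*}\]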
Recall that OLS chooses the coefficients that minimize the sum of squared vertical distances between the responses observed in the sample and the responses predicted by the model:
\[ \min_{\alpha,\,\beta_1} RSS = \min_{\alpha,\,\beta_1} \sum_{i=1}^{n}e_{i}^{2} \]
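To illustrate that OLS really is a minimization problem, the sketch below minimizes the RSS numerically with `optim()` and compares the result with `lm()`. It assumes simulated data, not the Gujarati data. The values `n = 51`, intercept 12, slope 3.3 and noise sd 2.3 (in thousands of dollars) are arbitrary illustrative choices. The gradient passed to `optim()` is made of the two partial derivatives used in the first-order conditions below.

```r
# Simulated data, NOT the Gujarati data (hypothetical values, thousands of dollars)
set.seed(123)
n <- 51
x <- runif(n, 2, 8)
y <- 12 + 3.3 * x + rnorm(n, sd = 2.3)

# Residual sum of squares as a function of par = c(alpha, beta1)
rss <- function(par) sum((y - par[1] - par[2] * x)^2)

# Gradient of RSS: the partial derivatives with respect to alpha and beta1
rss_grad <- function(par) {
  e <- y - par[1] - par[2] * x
  c(-2 * sum(e), -2 * sum(e * x))
}

# Minimize RSS numerically, starting from alpha = 0, beta1 = 0
opt <- optim(c(0, 0), rss, rss_grad, method = "BFGS")

# Closed-form OLS estimates from lm(), for comparison
ols <- unname(coef(lm(y ~ x)))
rbind(optim = opt$par, lm = ols)
```

The two rows should agree to several decimal places. The closed-form solution derived next avoids the iterative search entirely.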
First-order conditions: the partial derivatives of the objective function
\[ RSS = \sum_{i=1}^{n}e_{i}^{2}= \sum_{i=1}^{n} (Y_i - \alpha - \beta_1 X_i)^2 \]
with respect to the coefficients to be estimated must equal zero.
With respect to \(\alpha\):
\[ \dfrac{\partial RSS}{\partial \alpha} = 2 \sum_{i=1}^{n}( Y_i - \alpha - \beta_1 X_i) (-1) =0 \] \[ \sum_{i=1}^{n}Y_i - n\alpha - \beta_1 \sum_{i=1}^{n}X_i = 0 \]
Dividing by \(n\) and solving for \(\alpha\):
\[\begin{equation*} \hat{\alpha}_{OLS} = \bar{Y}-\hat{\beta}_1 \bar{X} \end{equation*}\]
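The condition for \(\alpha\) holds for any fixed slope: if \(\alpha\) is set to \(\bar{Y} - \beta_1\bar{X}\), the residuals sum to zero. The sketch below checks this numerically on simulated data. The data-generating values and the trial slopes are arbitrary assumptions, not taken from the source.

```r
# Simulated data, NOT the Gujarati data (hypothetical values, thousands of dollars)
set.seed(1)
x <- runif(51, 2, 8)
y <- 12 + 3.3 * x + rnorm(51, sd = 2.3)

# Arbitrary trial slopes; for each one, alpha comes from the first-order condition
beta1_try <- c(0, 1.5, 3.3, 10)
alpha_try <- mean(y) - beta1_try * mean(x)

# Sum of residuals for each (alpha, beta1) pair: all should be ~ 0
resid_sums <- mapply(function(a, b) sum(y - a - b * x), alpha_try, beta1_try)
resid_sums
```

Zero-sum residuals alone do not pin down the slope. That is why the second condition, for \(\beta_1\), is needed.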
With respect to \(\beta_1\):
\[ \dfrac{\partial RSS}{\partial \beta_1} = 2 \sum_{i=1}^{n}( Y_i - \alpha - \beta_1 X_i) (-X_i) =0 \]
Substituting \(\alpha = \bar{Y} - \beta_1 \bar{X}\) and using \(\bar{X}\sum_{i=1}^{n} X_i = n\bar{X}^2\):
\[ \sum_{i=1}^{n}Y_iX_i - (\bar{Y}- \beta_1 \bar{X} ) \sum_{i=1}^{n}X_i -\beta_1 \sum_{i=1}^{n} X_i^2 = 0 \] \[ \sum Y_iX_i - \bar{Y}\sum X_i + n\beta_1 \bar{X}^2 -\beta_1 \sum X_i^2 = 0 \] \[ \beta_1 \left(\sum X_i^2 - n\bar{X}^2\right)= \sum Y_iX_i - \bar{Y} \sum X_i \] \[ \hat{\beta}_1 = \frac{\sum Y_iX_i -\bar{Y} \sum X_i }{\sum X_i^2 -n\bar{X}^2 } = \frac{\sum Y_iX_i - n \bar{Y} \bar{X}}{\sum X_i^2 -n\bar{X}^2} \]
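Since \(\sum Y_iX_i - n\bar{Y}\bar{X} = \sum (X_i-\bar{X})(Y_i-\bar{Y})\) and \(\sum X_i^2 - n\bar{X}^2 = \sum (X_i-\bar{X})^2\), dividing the numerator and denominator by \(n-1\) expresses the slope as the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\):
\[\begin{equation*} \hat{\beta}_{1,OLS} = \frac{\widehat{\mathrm{Cov}}(X,Y)}{\widehat{\mathrm{Var}}(X)} \end{equation*}\]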
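The closed-form results can be checked directly in R. The sketch below uses simulated data (hypothetical values, not the Gujarati table). It computes the slope both from the sum formula and as `cov(x, y) / var(x)`, confirms that the residuals satisfy both first-order conditions, and compares the result with `lm()`.

```r
# Simulated data, NOT the Gujarati data (hypothetical values, thousands of dollars)
set.seed(1)
n <- 51
x <- runif(n, 2, 8)
y <- 12 + 3.3 * x + rnorm(n, sd = 2.3)

# Slope from the sum formula derived above
beta1_sums <- (sum(y * x) - n * mean(y) * mean(x)) / (sum(x^2) - n * mean(x)^2)
# Same slope as a covariance/variance ratio (the n - 1 denominators cancel)
beta1_cov <- cov(x, y) / var(x)
alpha_hat <- mean(y) - beta1_cov * mean(x)

# Residuals satisfy both first-order conditions (up to rounding error)
e <- y - alpha_hat - beta1_cov * x
c(sum_e = sum(e), sum_xe = sum(x * e))

# lm() returns the same estimates
rbind(by_hand = c(alpha_hat, beta1_cov), lm = unname(coef(lm(y ~ x))))
```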
# Load the data set from GitHub
file <- "https://raw.githubusercontent.com/Gianfrancocr27/Data-Set/main/gujarati55.csv"
datos <- read.csv(file=file, header=TRUE)
head(datos) # shows the first 6 rows
## salary spending
## 1 19583 3346
## 2 20263 3114
## 3 20325 3554
## 4 26800 4642
## 5 29470 4669
## 6 26610 4888
summary(datos) # Descriptive statistics of each variable
## salary spending
## Min. :18095 Min. :2297
## 1st Qu.:21495 1st Qu.:2974
## Median :23382 Median :3554
## Mean :24356 Mean :3697
## 3rd Qu.:26568 3rd Qu.:4082
## Max. :41480 Max. :8349
par(mfrow = c(2, 2)) # 2x2 layout: a histogram and a boxplot for each variable
hist(datos$spending, breaks = 5, ylab = "Frequency", main = "", xlab = "Spending", col="pink", border="blue")
hist(datos$salary, breaks = 5, ylab = "Frequency", main = "", xlab = "Salary", col="pink", border="blue")
mtext("Histogram",
side = 3,
line = - 2,
outer = TRUE)
mtext("Boxplot",
side=3,
line=-17,
outer=TRUE)
boxplot(datos$spending, horizontal = TRUE, xlab = "Spending", border = "black", whiskcol = "blue", col="pink")
boxplot(datos$salary, horizontal = TRUE, xlab = "Salary", border = "black", whiskcol = "blue", col="pink")
regresion <- lm(datos$salary ~ datos$spending) # lm() ("linear model") fits the linear model by OLS
regresion # Print the estimated coefficients
##
## Call:
## lm(formula = datos$salary ~ datos$spending)
##
## Coefficients:
## (Intercept) datos$spending
## 12129.371 3.308
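The estimated slope says that each additional dollar of spending per pupil is associated with about 3.31 dollars more in average teacher salary. The estimated intercept is about 12,129 dollars. As a worked example, the sketch below plugs a few spending levels into the fitted line. It uses the rounded coefficients printed above, and the spending values are hypothetical, chosen only for illustration.

```r
# Coefficients as printed above (rounded to the displayed digits)
alpha_hat <- 12129.371
beta1_hat <- 3.308

# Hypothetical spending levels in dollars per pupil, chosen only for illustration
spending <- c(3000, 4000, 5000)
salary_hat <- alpha_hat + beta1_hat * spending
salary_hat
## [1] 22053.37 25361.37 28669.37
```

Each extra 1,000 dollars of spending per pupil adds about 3,308 dollars to the predicted salary. In a full script one would normally use `predict()`. However, because `regresion` was fitted with `datos$salary ~ datos$spending`, `predict(regresion, newdata = ...)` ignores `newdata`. The model would first need to be refitted as `lm(salary ~ spending, data = datos)`.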
summary(regresion) # Coefficient table with standard errors and t-tests, plus R-squared and the F-test
##
## Call:
## lm(formula = datos$salary ~ datos$spending)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3848.0 -1844.6 -217.5 1660.0 5529.3
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.213e+04 1.197e+03 10.13 1.31e-13 ***
## datos$spending 3.308e+00 3.117e-01 10.61 2.71e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2325 on 49 degrees of freedom
## Multiple R-squared: 0.6968, Adjusted R-squared: 0.6906
## F-statistic: 112.6 on 1 and 49 DF, p-value: 2.707e-14
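The quantities in this summary follow from standard simple-regression formulas. The residual standard error is \(\hat\sigma = \sqrt{RSS/(n-2)}\). The slope's standard error is \(\hat\sigma/\sqrt{\sum (X_i-\bar{X})^2}\). Each t value is the estimate divided by its standard error, and \(R^2 = 1 - RSS/TSS\). The sketch below recomputes them by hand and compares them with `summary()`. It uses simulated data (hypothetical values, not the Gujarati table).

```r
# Simulated data, NOT the Gujarati data (hypothetical values, thousands of dollars)
set.seed(1)
n <- 51
x <- runif(n, 2, 8)
y <- 12 + 3.3 * x + rnorm(n, sd = 2.3)
fit <- lm(y ~ x)
s <- summary(fit)

e <- resid(fit)
sxx <- sum((x - mean(x))^2)
rss <- sum(e^2)
tss <- sum((y - mean(y))^2)

sigma_hat <- sqrt(rss / (n - 2))                        # residual standard error
se_alpha <- sigma_hat * sqrt(1 / n + mean(x)^2 / sxx)   # std. error of the intercept
se_beta1 <- sigma_hat / sqrt(sxx)                       # std. error of the slope
t_beta1 <- unname(coef(fit)[2]) / se_beta1              # t value of the slope
p_beta1 <- 2 * pt(abs(t_beta1), df = n - 2, lower.tail = FALSE)  # two-sided p-value
r2 <- 1 - rss / tss                                     # R-squared
adj_r2 <- 1 - (rss / (n - 2)) / (tss / (n - 1))         # adjusted R-squared

# Hand-computed vs summary(): each pair should match up to rounding error
rbind(sigma = c(sigma_hat, s$sigma),
      se_slope = c(se_beta1, s$coefficients[2, 2]),
      r_squared = c(r2, s$r.squared))
```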
anova(regresion) # Analysis of variance (ANOVA) table
## Analysis of Variance Table
##
## Response: datos$salary
## Df Sum Sq Mean Sq F value Pr(>F)
## datos$spending 1 608555015 608555015 112.6 2.707e-14 ***
## Residuals 49 264825250 5404597
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
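The ANOVA table and the model summary are linked through the sums of squares. The total sum of squares splits as \(TSS = ESS + RSS\), and \(F = (ESS/1)/(RSS/49)\). The remaining statistics are \(R^2 = ESS/TSS\) and \(\hat\sigma = \sqrt{RSS/49}\). The sketch below uses only the two sums of squares printed above and reproduces the F-statistic (112.6), R-squared (0.6968), adjusted R-squared (0.6906) and residual standard error (2325) from the summary output.

```r
# Sums of squares copied from the ANOVA table above
ess <- 608555015   # explained (regression) sum of squares, 1 df
rss <- 264825250   # residual sum of squares, 49 df
df_res <- 49
n <- df_res + 2    # 51 observations, 2 estimated coefficients

tss <- ess + rss                                  # total sum of squares
f_stat <- (ess / 1) / (rss / df_res)              # F = mean square regression / mean square residual
r2 <- ess / tss                                   # R-squared
adj_r2 <- 1 - (rss / df_res) / (tss / (n - 1))    # adjusted R-squared
sigma_hat <- sqrt(rss / df_res)                   # residual standard error
round(c(F = f_stat, R2 = r2, adjR2 = adj_r2, sigma = sigma_hat), 4)
```

In a simple regression the F-statistic also equals the square of the slope's t value: \(10.61^2 \approx 112.6\), up to rounding.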
par(mfrow = c(1, 1)) # Back to a single-panel layout
plot(datos$spending, datos$salary, xlab = "Spending", ylab = "Salary") # Salary against spending per pupil
abline(regresion, col = "blue") # Add the line estimated by the model
residuo <- resid(regresion) # Get the residuals of the model
plot(fitted(regresion), residuo) # Fitted values vs. residuals
abline(h = 0) # Horizontal reference line at zero
plot(density(residuo)) # Kernel density estimate of the residuals
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
jarque.bera.test(residuo) # Jarque-Bera normality test of the residuals
##
## Jarque Bera Test
##
## data: residuo
## X-squared = 2.1963, df = 2, p-value = 0.3335
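The Jarque-Bera statistic combines the sample skewness \(S\) and kurtosis \(K\) of the residuals: \(JB = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right)\). Under normality it is approximately \(\chi^2\) with 2 degrees of freedom. The sketch below uses this textbook formula, with moments computed by dividing by \(n\). To our understanding this matches what `tseries::jarque.bera.test` computes, but the helper `jarque_bera()` is hypothetical and is not part of any package. The last line recovers the p-value reported above (0.3335) from the statistic 2.1963.

```r
# Hypothetical helper: Jarque-Bera statistic from sample moments (divided by n)
jarque_bera <- function(e) {
  n <- length(e)
  d <- e - mean(e)
  m2 <- mean(d^2)
  m3 <- mean(d^3)
  m4 <- mean(d^4)
  skew <- m3 / m2^1.5    # sample skewness S
  kurt <- m4 / m2^2      # sample kurtosis K (equals 3 for a normal distribution)
  stat <- n / 6 * (skew^2 + (kurt - 3)^2 / 4)
  c(statistic = stat, p.value = pchisq(stat, df = 2, lower.tail = FALSE))
}

# Example on simulated normal data (not the model residuals)
set.seed(1)
jarque_bera(rnorm(51))

# p-value of the statistic reported above, chi-squared with 2 df
p_ref <- pchisq(2.1963, df = 2, lower.tail = FALSE)
p_ref
```

With a p-value of about 0.33, the test does not reject normality of the residuals at conventional significance levels.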
summary(residuo) # Summary statistics of the residuals
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3848.0 -1844.6 -217.5 0.0 1660.0 5529.3
qqnorm(residuo) # Normal Q-Q plot of the residuals
qqline(residuo) # Reference line through the first and third quartiles