This is a suggested solution to SOK-3020 Econometrics, Home assignment 1, part ii - Fall 2015. The text is found at: https://goo.gl/xEF9u8.

Exercise a)

The model we would like to estimate is: \(sales=\beta _{0}+\beta _{1}\times price\).

rm(list=ls())
price=c(0.5,1.35,0.79,1.71,1.38,1.22,1.03,1.84,1.73,1.62,0.76,1.79,1.57,1.27,0.96,0.52,0.64,1.05,0.72,0.75)
sales=c(181,33,91,13,34,47,73,11,15,20,91,13,22,34,74,164,129,55,107,119)
q=sales/price

m1=lm(sales~price);m1 # two codelines separated by ;
## 
## Call:
## lm(formula = sales ~ price)
## 
## Coefficients:
## (Intercept)        price  
##       193.5       -109.7

Let s = sales, p = price, and q = quantity, so that we have estimated: \(s=\beta _{0}+\beta _{1}\times p\).
However, since \(s=p \times q\), we are de facto estimating: \(p \times q =\beta _{0}+\beta _{1}\times p\).
Totally differentiating this expression gives: \(d(p \times q)= \beta _{1}\times dp\).
Applying the product rule to the left-hand side, we get: \(dp \times q + dq \times p= \beta _{1}\times dp\).
Dividing both sides by \(dp \times q\) gives \(1 + \frac{dq \times p}{dp \times q} = \frac{\beta _{1}}{q}\), so the elasticity is: \(\frac{dq \times p}{dp \times q} = \frac{\beta _{1}}{q} -1\).

Our numerical value of the elasticity is then:

(coef(m1)[2]/mean(q))-1
##     price 
## -2.224347
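
Because the elasticity \(\frac{\beta _{1}}{q} -1\) depends on q, the number above is the elasticity evaluated at the mean quantity. As a minimal sketch (the object names b1, elas_at_mean and elas_by_obs are our own, not part of the assignment), we can also evaluate it at each observed quantity:

```r
# Data copied from the exercise above
price <- c(0.5, 1.35, 0.79, 1.71, 1.38, 1.22, 1.03, 1.84, 1.73, 1.62,
           0.76, 1.79, 1.57, 1.27, 0.96, 0.52, 0.64, 1.05, 0.72, 0.75)
sales <- c(181, 33, 91, 13, 34, 47, 73, 11, 15, 20,
           91, 13, 22, 34, 74, 164, 129, 55, 107, 119)
q <- sales / price                        # implied quantity, since sales = price * quantity

b1 <- coef(lm(sales ~ price))[["price"]]  # estimated slope of the sales equation

elas_at_mean <- b1 / mean(q) - 1          # elasticity at the mean quantity (as in the text)
elas_by_obs  <- b1 / q - 1                # one elasticity per observed quantity

round(elas_at_mean, 4)
round(range(elas_by_obs), 2)
```

Since the estimated slope is negative and q is positive, every one of these elasticities is below -1, i.e. demand is elastic at all observed quantities under this model.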

Reading data from POE4 into R

We are going to use the br dataset for this assignment. All data in POE4 are accessible from their webpage.

The data is found here http://www.principlesofeconometrics.com/poe4/data/dat/br.dat.
When we browse this page, we notice that there are no variable names. The variable names are found in the data definition file, at: http://www.principlesofeconometrics.com/poe4/data/def/br.def.

This means that we need to read the data, AND assign variable names, as R only assigns variable names like V1, V2 by default when there are no variable names in the first row of the data.
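
The default naming is easy to see without downloading anything. The sketch below reads two short rows typed inline (illustrative values in the same whitespace-separated layout, not the full file) and then assigns names by hand:

```r
# Two illustrative rows with three columns, no header line
raw <- read.table(text = "66500 741 1\n66000 741 1", header = FALSE)

default_names <- names(raw)   # R falls back to V1, V2, V3
default_names

# Assign meaningful names afterwards
names(raw) <- c("price", "sqft", "Bedrooms")
raw
```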

We now have the following options for reading data.

Alternative 1

Read the data, and copy/paste the variable names from the data definition file into the colnames(br) function.

br <- read.table("http://www.principlesofeconometrics.com/poe4/data/dat/br.dat")
colnames(br) <- colnames(read.table(header=T,
  text="price sqft Bedrooms Baths Age Occupancy Pool Style Fireplace Waterfront DOM"))
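
A possible variation on the copy/paste step, shown only as a sketch: scan() can split the pasted line into a character vector directly. The names name_line and var_names are our own.

```r
# The variable names as pasted from the data definition file
name_line <- "price sqft Bedrooms Baths Age Occupancy Pool Style Fireplace Waterfront DOM"

# Split on whitespace into a character vector
var_names <- scan(text = name_line, what = character(), quiet = TRUE)
var_names

# Once br has been read as above, the names could be assigned with:
# colnames(br) <- var_names
```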

Alternative 2

Here we read the data first (same as above), but since all variable names are on the third line of the data definition file, we exploit this and read the variable names directly from the data definition file.

br <- read.table("http://www.principlesofeconometrics.com/poe4/data/dat/br.dat")
colnames(br) = paste(lapply(read.table("http://www.principlesofeconometrics.com/poe4/data/def/br.def",
                                        skip=2, nrows=1), as.character, sep=","))

These two alternatives only work if one inserts the filename br where appropriate in the R code above (4 places). This leads us to our next alternative, which uses the R function() function. We can write R code that takes the filename as input and then does all the rest automatically. If you look carefully, you will notice that the code pastes the filename into the URLs; otherwise it is similar to Alternative 2. The code is as follows:

poe4read <- function(x){
  fileName <- as.character(x)
  url01 <- paste0("http://www.principlesofeconometrics.com/poe4/data/dat/", fileName, ".dat")
  url02 <- paste0("http://www.principlesofeconometrics.com/poe4/data/def/", fileName, ".def")
  x <- read.table(url01, header=FALSE)
  # fetch the variable names from the third line of the definition file
  # and use them as column names for the data
  colnames(x) <- paste0(paste(lapply(read.table(url02, skip=2, nrows=1), as.character, sep=",")))
  x
}
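
To see what the two paste0() calls produce without contacting the server, here is a small sketch. The helper poe4_urls() is hypothetical and only mirrors the URL-building step of the function above:

```r
# Hypothetical helper: builds the data and definition URLs for a POE4 file name
poe4_urls <- function(fileName) {
  base <- "http://www.principlesofeconometrics.com/poe4/data/"
  c(dat = paste0(base, "dat/", fileName, ".dat"),
    def = paste0(base, "def/", fileName, ".def"))
}

poe4_urls("br")
poe4_urls("food")
```

Only the file name changes between calls, which is what makes the function reusable for any POE4 data set.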

To make it easy to use, I have saved the code in a file that is available online. We can then use the R function source() to read the function from the online file. This is alternative 3. It means that you do not have to run the whole function definition in your R session first; you only source() the function file.

Alternative 3

Note that the filename, in this case br, has to be between the " " in order for the function to work. The function name is poe4read().

rm(list=ls()) # start with clean sheets
source("http://ansatte.uit.no/oystein.myrland/poe4/source/poe4read.R")
br <- poe4read("br")
head(br)
##    price sqft Bedrooms Baths Age Occupancy Pool Style Fireplace Waterfront
## 1  66500  741        1     1  18         1    1     1         1          0
## 2  66000  741        1     1  18         2    1     1         0          0
## 3  68500  790        1     1  18         1    0     1         1          0
## 4 102000 2783        2     2  18         1    0     1         1          0
## 5  54000 1165        2     1  35         2    0     1         0          0
## 6 143000 2331        2     2  25         1    0     1         1          0
##   DOM
## 1   6
## 2  23
## 3   8
## 4  50
## 5 190
## 6  86
summary(br)
##      price              sqft         Bedrooms        Baths      
##  Min.   :  22000   Min.   : 662   Min.   :1.00   Min.   :1.000  
##  1st Qu.:  99000   1st Qu.:1604   1st Qu.:3.00   1st Qu.:2.000  
##  Median : 130000   Median :2186   Median :3.00   Median :2.000  
##  Mean   : 154863   Mean   :2326   Mean   :3.18   Mean   :1.973  
##  3rd Qu.: 170163   3rd Qu.:2800   3rd Qu.:4.00   3rd Qu.:2.000  
##  Max.   :1580000   Max.   :7897   Max.   :8.00   Max.   :5.000  
##       Age          Occupancy          Pool             Style       
##  Min.   : 1.00   Min.   :1.000   Min.   :0.00000   Min.   : 1.000  
##  1st Qu.: 5.00   1st Qu.:1.000   1st Qu.:0.00000   1st Qu.: 1.000  
##  Median :18.00   Median :2.000   Median :0.00000   Median : 1.000  
##  Mean   :19.57   Mean   :1.565   Mean   :0.07963   Mean   : 3.753  
##  3rd Qu.:25.00   3rd Qu.:2.000   3rd Qu.:0.00000   3rd Qu.: 7.000  
##  Max.   :80.00   Max.   :3.000   Max.   :1.00000   Max.   :11.000  
##    Fireplace       Waterfront           DOM        
##  Min.   :0.000   Min.   :0.00000   Min.   :  0.00  
##  1st Qu.:0.000   1st Qu.:0.00000   1st Qu.: 14.00  
##  Median :1.000   Median :0.00000   Median : 40.00  
##  Mean   :0.563   Mean   :0.07222   Mean   : 74.06  
##  3rd Qu.:1.000   3rd Qu.:0.00000   3rd Qu.:100.25  
##  Max.   :1.000   Max.   :1.00000   Max.   :728.00

By using this procedure, next time you need to download a file from POE4, all you need to do is to change the filename.

If, for example, you would like to download the food data from Chapter 2 (p. 49), you must write the following two lines of R code:

source("http://ansatte.uit.no/oystein.myrland/poe4/source/poe4read.R")
food <- poe4read("food")

A page of all the data file “names” in POE4 is found at: http://www.principlesofeconometrics.com/poe4/poe4def.htm.

Exercise b)

Load R packages.

suppressPackageStartupMessages(require(mosaic)||{install.packages("mosaic");require(mosaic)})
## [1] TRUE
suppressPackageStartupMessages(require(rockchalk)||{install.packages("rockchalk");require(rockchalk)})
## [1] TRUE
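
The same load-or-install logic can be wrapped in a small function if you prefer something more readable. This is only a sketch; the name load_or_install() is our own, and it is demonstrated on the base package stats so that it runs without downloading anything:

```r
# Hypothetical helper: install a package if it is missing, then attach it quietly
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  suppressPackageStartupMessages(library(pkg, character.only = TRUE))
  invisible(TRUE)
}

ok <- load_or_install("stats")   # stats ships with R, so nothing is installed here

# For this assignment one would call:
# load_or_install("mosaic"); load_or_install("rockchalk")
```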

A Quadratic model

We would like to estimate the model: \(price=\alpha _{0}+\alpha _{1}\times sqft^{2}+e\) (2.26)

m1=lm(price~I(sqft^2), data=br)
m1
## 
## Call:
## lm(formula = price ~ I(sqft^2), data = br)
## 
## Coefficients:
## (Intercept)    I(sqft^2)  
##   5.578e+04    1.542e-02

Note that we use the I() function, so that we do not have to create a new variable.
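
The sketch below illustrates why I() is needed, using a small made-up data set (toy is simulated, not the br data). Inside a formula, ^ is a formula operator, so sqft^2 without I() is read as just sqft:

```r
# Simulated toy data, for illustration only
set.seed(1)
toy <- data.frame(sqft = seq(1000, 3000, by = 250))
toy$price <- 50000 + 0.015 * toy$sqft^2 + rnorm(nrow(toy), sd = 1000)

fit_I <- lm(price ~ I(sqft^2), data = toy)    # I() protects the arithmetic meaning of ^

toy$sqft2 <- toy$sqft^2                        # the alternative: create a new variable
fit_new <- lm(price ~ sqft2, data = toy)

fit_wrong <- lm(price ~ sqft^2, data = toy)    # without I(): a plain linear model in sqft

all.equal(unname(coef(fit_I)), unname(coef(fit_new)))
names(coef(fit_wrong))
```

The first two fits give the same coefficients, while the third silently estimates a model that is linear in sqft.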

If we would like to change the print format from scientific notation to fixed (decimal) notation, we can use:

options("scipen"=100, "digits"=4)
m1
## 
## Call:
## lm(formula = price ~ I(sqft^2), data = br)
## 
## Coefficients:
## (Intercept)    I(sqft^2)  
##  55776.5656       0.0154

The estimated equation is: \(\widehat{price}=\) 55776.57 \(+\) 0.0154 \(sqft^{2}\).

To find the slope, we take the derivative of (2.26) wrt sqft and find (2.27). We create a function s1 of the slope based on the estimated model.

s1=function(x) {2*coef(m1)[2]*x}
curve(s1, 0,8000, main="Slope of model (2.26)", xlab="sqft", ylab="Slope, $ per sqft")

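So the slope for a 2000-sqft house is 61.6852, for a 4000-sqft house it is 123.3704, and for a 6000-sqft house it is 185.0556. That is, the predicted price of a 2000-sqft house increases by about $61.69 for one additional square foot.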
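
These slopes can be checked by hand. In the sketch below the coefficient a1 is not taken from the model object; it is back-calculated from the reported slope at 2000 sqft (an assumption made to keep the example self-contained), and slope_q is our own name:

```r
# alpha_1 hat, back-calculated from the reported slope 61.6852 at 2000 sqft
a1 <- 61.6852 / (2 * 2000)

# Slope of the quadratic model: d(price)/d(sqft) = 2 * alpha_1 * sqft
slope_q <- function(sqft) 2 * a1 * sqft

round(slope_q(c(2000, 4000, 6000)), 4)
```

The slope is linear in sqft, so doubling the size of the house doubles the marginal effect of an extra square foot.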

The elasticity (\(\varepsilon\)) is: \[ \hat{\varepsilon} =\widehat{slope} \times \frac{sqft}{\widehat{price}} =(2\times \hat{\alpha} _{1} \times sqft) \times \frac{sqft}{\widehat{price}} \]

However, since price is a function of sqft the elasticity estimate is: \[ \hat{\varepsilon} = (2\times \hat{\alpha} _{1} \times sqft) \times \frac{sqft}{\hat{\alpha} _{0}+\hat{\alpha} _{1}\times sqft^{2}} \]

First we find the predicted price as a function of sqft:

f1=makeFun(m1) # Predicted price as a function of sqft (makeFun() from mosaic package)
f1(2000) # e.g., predicted price when sqft=2000
##      1 
## 117462

Then we use this f1 function in an elasticity function e1:

e1= function(x) {2*coef(m1)[2]*x^2/f1(x)}
curve(e1, 0,8000, main="Elasticity as a function of sqft", xlab="sqft")

The elasticity for a 2000-sqft house is 1.0503 (predicted price is 117461.7714), for a 4000-sqft house it is 1.6313 (predicted price is 302517.3885), and for a 6000-sqft house it is 1.8174 (predicted price is 610943.4172).
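
As a rough check of these numbers, the sketch below recomputes the elasticities from the printed intercept and a slope coefficient back-calculated from the reported slope at 2000 sqft (both hard-coded here as assumptions, so small rounding differences are expected). The names pred_q and elas_q are our own:

```r
a0 <- 55776.5656              # intercept as printed in the output above
a1 <- 61.6852 / (2 * 2000)    # back-calculated from the reported slope at 2000 sqft

pred_q <- function(sqft) a0 + a1 * sqft^2               # predicted price
elas_q <- function(sqft) 2 * a1 * sqft^2 / pred_q(sqft) # elasticity

round(elas_q(c(2000, 4000, 6000)), 4)

# For very large houses the intercept becomes negligible
elas_q(1e6)
```

The last line shows that the elasticity in the quadratic model increases with sqft but stays below 2, which it approaches as sqft grows.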

To replicate Figure 2.14 we use the rockchalk package, and the plotCurves function.

plotCurves(m1, plotx = "sqft", main="Quadratic Relationship",
           xlab="Total square feet", ylab="Sale price, $")

A Log-Linear model

We would like to estimate the model: \(log(price)=\gamma _{0}+\gamma _{1}\times sqft+e\) (2.29)

The log transformation makes the distribution of price less skewed and more symmetric, as seen in Figure 2.16 (b).

histogram(~price, breaks=25, data=br, main="Fig. 2.16 (a)")

histogram(~log(price), breaks=25, data=br, main="Fig. 2.16 (b)")
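
The effect of the log transformation on skewness can also be put into numbers. The sketch below uses simulated right-skewed data (x is drawn from a log-normal distribution with arbitrarily chosen parameters, it is not the br data) and a simple moment-based skewness measure that we define ourselves:

```r
# Simulated right-skewed "prices", for illustration only
set.seed(123)
x <- rlnorm(1000, meanlog = 11.7, sdlog = 0.5)

# Moment-based sample skewness (0 for a symmetric distribution)
skew <- function(v) mean((v - mean(v))^3) / sd(v)^3

skew_raw <- skew(x)
skew_log <- skew(log(x))
c(raw = skew_raw, logged = skew_log)
```

The raw series is clearly right-skewed, while the logged series has skewness close to zero. The same function could be applied to br$price and log(br$price).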

The fitted log-linear model is:

m2=lm(log(price)~sqft, data=br)
m2
## 
## Call:
## lm(formula = log(price) ~ sqft, data = br)
## 
## Coefficients:
## (Intercept)         sqft  
##   10.838596     0.000411

The estimated equation is: \(log(\widehat{price})=\) 10.84 \(+\) 0.0004 \(sqft\).

This model is linear when log(price) is on the y-axis and sqft is on the x-axis, as seen in the following plot:

plotCurves(m2, plotx = "sqft", main="Log-Linear Relationship",
           xlab="Total square feet", ylab="log(Sale price, $)")

However, to get the price back into levels, we use the exponential function, exp():
\(\widehat{price}=\exp(10.84+0.0004\times sqft)\)

A plot of this non-linear function in the linear “world” can easily be made:

g <- function(x) {exp(coef(m2)[1]+coef(m2)[2]*x)}
curve(g, 0,8500, lwd=2, col="darkgreen", main="Log-Linear Relationship",
           xlab="Total square feet", ylab="Sale price, $")
points(br$sqft,br$price, col="aquamarine4")

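To find the slope, we take the derivative of (2.29) with respect to sqft. Since \(\widehat{price}=\exp(\hat{\gamma} _{0}+\hat{\gamma} _{1}\times sqft)\), the chain rule gives:
\[ \frac{d(\widehat {price})}{d(sqft)}=\hat{\gamma} _{1} \times \widehat{price}\]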

First we create a function f2 of the predicted log-price.

f2=makeFun(m2) # Predicted log-price as a function of sqft 
f2(2000) # e.g., predicted log(price|sqft=2000)
##     1 
## 11.66

We use this f2 function in a slope function s2 in predicted price levels:

s2=function(x) {coef(m2)[2]*exp(f2(x))}
curve(s2, 0,8000, main="Slope of model (2.29)", xlab="sqft", ylab="Slope, $ per sqft")

The slope of a 2000-sqft house is 47.6971, for a 4000-sqft house it is 108.5714, and for a 6000-sqft house it is 247.1378.

The elasticity (\(\varepsilon\)) is:
\[ \hat{\varepsilon} = \frac{d(\widehat {price})}{d(sqft)} \times \frac{sqft}{\widehat{price}}=\hat{\gamma} _{1} \times \widehat{price} \times \frac{sqft}{\widehat{price}}=\hat{\gamma} _{1} \times sqft \]

In R lingo the elasticity function e2 is written as:

e2= function(x) {coef(m2)[2]*x}
curve(e2, 0,8000, main="Elasticity as a function of sqft", xlab="sqft")

The elasticity for a 2000-sqft house is 0.8225 (predicted price is 115975.4749), for a 4000-sqft house it is 1.6451 (predicted price is 263991.3801), and for a 6000-sqft house it is 2.4676 (predicted price is 600915.3989).
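
The log-linear results can be collected in one small table. In the sketch below the intercept g0 is the printed value and the slope coefficient g1 is back-calculated from the reported elasticity at 2000 sqft (both hard-coded as assumptions, so the last digits differ slightly from the numbers above):

```r
g0 <- 10.838596          # intercept as printed in the output above
g1 <- 0.8225 / 2000      # back-calculated from the reported elasticity at 2000 sqft

sqft       <- c(2000, 4000, 6000)
price_hat  <- exp(g0 + g1 * sqft)   # predicted price in levels
slope      <- g1 * price_hat        # d(price)/d(sqft)
elasticity <- g1 * sqft             # slope * sqft / price_hat

data.frame(sqft,
           price_hat  = round(price_hat),
           slope      = round(slope, 1),
           elasticity = round(elasticity, 4))
```

Note that the elasticity here is linear in sqft and unbounded, while the slope grows with the predicted price.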