Econometrics Midterm Exam (UTS Ekonometrika)

Dhela Asafiani Agatha

March 18, 2024


Analyze the relationship between a company's sales revenue and its advertising expenditure, product price, future value, tax, and interest rate. Follow the instructions below:

a. Generate hypothetical data for 10002 observations.

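set.seed(45)  # For reproducibility
# Note: the task asks for 10002 observations, but this run uses n = 10009;
# the regression output below (10003 residual degrees of freedom) reflects 10009.
n <- 10009
expenditure <- runif(n, 1000, 5000)     # advertising expenditure
product_price <- runif(n, 10, 50)       # product price
future_value <- runif(n, 10000, 20000)  # future value
tax <- runif(n, 0.1, 0.3)               # tax rate, 10% to 30%
interest_rate <- runif(n, 0.01, 0.05)   # interest rate, 1% to 5%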

b. Create five independent variables: expenditure, product price, future value, tax, and interest rate.

data <- data.frame(expenditure, product_price, future_value, tax, interest_rate)

c. Generate a dependent variable, sales revenue, using a linear relationship with the independent variables.

beta <- c(2, -0.5, 3, -1000, -500)  # hypothetical coefficients, in the column order of data
# Linear data-generating process: intercept 50000 plus standard normal noise (sd = 1)
sales_revenue <- 50000 + expenditure * beta[1] + product_price * beta[2] + future_value * beta[3] +
  tax * beta[4] + interest_rate * beta[5] + rnorm(n)
data$sales_revenue <- sales_revenue

d. Fit a multiple regression model in which the dependent variable is regressed on the independent variables.

model <- lm(sales_revenue ~ expenditure + product_price + future_value + tax + interest_rate, data = data)
model
## 
## Call:
## lm(formula = sales_revenue ~ expenditure + product_price + future_value + 
##     tax + interest_rate, data = data)
## 
## Coefficients:
##   (Intercept)    expenditure  product_price   future_value            tax  
##       50000.0            2.0           -0.5            3.0        -1000.2  
## interest_rate  
##        -499.3

e. Print a summary of the regression results, which includes coefficients, standard errors, t-statistics, p-values, and R-squared.

summary(model)
## 
## Call:
## lm(formula = sales_revenue ~ expenditure + product_price + future_value + 
##     tax + interest_rate, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5459 -0.6639  0.0074  0.6695  3.5755 
## 
## Coefficients:
##                 Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)    5.000e+04  7.699e-02 649412.6   <2e-16 ***
## expenditure    2.000e+00  8.580e-06 233101.0   <2e-16 ***
## product_price -5.000e-01  8.558e-04   -584.3   <2e-16 ***
## future_value   3.000e+00  3.444e-06 871141.4   <2e-16 ***
## tax           -1.000e+03  1.710e-01  -5850.1   <2e-16 ***
## interest_rate -4.993e+02  8.588e-01   -581.4   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.992 on 10003 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.634e+11 on 5 and 10003 DF,  p-value: < 2.2e-16
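
The R-squared of 1 reported above follows from how the data were simulated: the noise term rnorm(n) has variance 1, while the variance contributed by the predictors is many orders of magnitude larger. The following is a minimal sketch of that calculation, assuming the uniform ranges and coefficients from parts a and c; the names unif_var, signal_var, and r2_theoretical are introduced here for illustration only.

```r
# Variance of a Uniform(a, b) variable: (b - a)^2 / 12
unif_var <- function(a, b) (b - a)^2 / 12

# Coefficients and ranges copied from the simulation in parts a and c
beta <- c(expenditure = 2, product_price = -0.5, future_value = 3,
          tax = -1000, interest_rate = -500)
pred_var <- c(expenditure   = unif_var(1000, 5000),
              product_price = unif_var(10, 50),
              future_value  = unif_var(10000, 20000),
              tax           = unif_var(0.1, 0.3),
              interest_rate = unif_var(0.01, 0.05))

# The predictors are drawn independently, so the variances of the terms add up
signal_var <- sum(beta^2 * pred_var)
noise_var  <- 1  # rnorm(n) has standard deviation 1

# Population R-squared implied by the data-generating process
r2_theoretical <- signal_var / (signal_var + noise_var)
print(signal_var)
print(r2_theoretical, digits = 12)
```

Because the implied R-squared differs from 1 only around the eighth decimal place, the printed summary rounds it to 1. A larger noise standard deviation in rnorm() would give a less extreme fit.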

f. Plot the residuals against the fitted values to check for heteroscedasticity (unequal variance) and nonlinearity.

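plot(fitted(model), residuals(model),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. fitted values")
abline(h = 0, col = "green")  # reference line at zero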
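
A visual check can be complemented by a numerical one. The sketch below computes the studentized (Koenker) Breusch-Pagan statistic in base R on two hypothetical datasets, one with constant error variance and one where the error standard deviation grows with x. The helper bp_manual and the simulated variables are assumptions made for this illustration; with its default settings, bptest() from lmtest should report the same statistic.

```r
set.seed(1)  # arbitrary seed for this illustration
n <- 2000
x <- runif(n, 1, 10)

# Two hypothetical responses: constant error variance vs. variance growing with x
y_const <- 1 + 2 * x + rnorm(n, sd = 3)
y_grow  <- 1 + 2 * x + rnorm(n, sd = x)

# Studentized Breusch-Pagan statistic: n * R^2 from regressing the squared
# residuals on the model's regressors, compared with a chi-squared distribution
bp_manual <- function(fit) {
  e2 <- residuals(fit)^2
  X  <- model.matrix(fit)[, -1, drop = FALSE]  # regressors without the intercept
  aux <- lm(e2 ~ X)
  stat <- length(e2) * summary(aux)$r.squared
  df <- ncol(X)
  c(statistic = stat, df = df,
    p.value = pchisq(stat, df = df, lower.tail = FALSE))
}

bp_const <- bp_manual(lm(y_const ~ x))
bp_grow  <- bp_manual(lm(y_grow ~ x))
print(bp_const)
print(bp_grow)
```

A small p-value points to error variance that changes with the regressors, which in the plot would appear as a funnel shape around the zero line.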

g. Plot diagnostic plots to further assess the assumptions of linear regression, including normality of residuals, constant variance, and absence of influential outliers.

par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(model)
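
The influence information shown in the Residuals vs Leverage panel can also be inspected numerically. This is a minimal sketch on a small hypothetical dataset, using the base R functions cooks.distance() and hatvalues(); the 4 / n cutoff is a common rule of thumb rather than a formal test, and the variable names are illustrative.

```r
set.seed(2)  # arbitrary seed for this illustration
n <- 500
x <- runif(n, 0, 10)
y <- 5 + 1.5 * x + rnorm(n)
fit <- lm(y ~ x)

cooks_d  <- cooks.distance(fit)  # influence of each observation on the fit
leverage <- hatvalues(fit)       # diagonal of the hat matrix

# Flag observations whose Cook's distance exceeds the 4 / n rule of thumb
flagged <- which(cooks_d > 4 / n)
print(length(flagged))
print(head(sort(cooks_d, decreasing = TRUE), 3))
print(summary(leverage))
```

Flagged observations are candidates for a closer look, not automatic deletions. With purely simulated data such as this, a handful of points will usually exceed the cutoff by chance.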

Investigate the factors influencing housing prices by following the instructions below:

a. Simulate a hypothetical dataset with 20002 observations containing variables such as house size, number of bedrooms, city (five cities), toll access (yes or no), age of the house, and price.

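# Note: the task asks for 20002 observations, but this run uses n = 20009
# (consistent with the 20001 residual degrees of freedom reported below).
n <- 20009
house_size <- runif(n, 800, 3500)  # square feet
num_bedrooms <- sample(1:5, n, replace = TRUE)
city <- sample(c("CityA", "CityB", "CityC", "CityD", "CityE"), n, replace = TRUE)
toll_access <- sample(c("yes", "no"), n, replace = TRUE)
age <- sample(1:100, n, replace = TRUE)  # years
# Price depends on size, bedrooms, toll access, and age; city has no effect in this simulation
price <- 100000 + house_size * 200 - num_bedrooms * 10000 +
  ifelse(toll_access == "yes", 15000, -10000) - age * 500 + rnorm(n, 0, 10000)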

data_housing <- data.frame(house_size, num_bedrooms, city, toll_access, age, price)

b. Fit a multiple regression model using the lm() function, where the price of the house is the dependent variable, and house size, number of bedrooms, city, and age are the independent variables.

model_housing <- lm(price ~ house_size + num_bedrooms + city + age, data = data_housing)

c. Convert the "city" and "toll access" variables to factors so that they are treated as categorical variables.

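# Note: model_housing was fitted before this conversion. lm() coerces character
# columns such as city to factors on its own, so the fitted model is unaffected.
data_housing$city <- as.factor(data_housing$city)
data_housing$toll_access <- as.factor(data_housing$toll_access)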
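
The sketch below illustrates, on a small hypothetical dataset, that lm() gives the same fit whether a categorical column is stored as character or as factor, and that an explicit factor makes it possible to choose the reference category with relevel(). The data frame d and its three cities are assumptions made for this example.

```r
set.seed(3)  # arbitrary seed for this illustration
n <- 300
city  <- sample(c("CityA", "CityB", "CityC"), n, replace = TRUE)
size  <- runif(n, 800, 3500)
price <- 100000 + 200 * size + rnorm(n, 0, 10000)
d <- data.frame(size, city, price)

# Fit with city as stored in the data frame
fit_chr <- lm(price ~ size + city, data = d)
print(names(coef(fit_chr)))

# Explicit conversion gives the same coefficients
d$city <- as.factor(d$city)
fit_fac <- lm(price ~ size + city, data = d)
print(all.equal(coef(fit_chr), coef(fit_fac)))

# A factor allows the reference level to be changed, here to CityC
d$city <- relevel(d$city, ref = "CityC")
fit_ref <- lm(price ~ size + city, data = d)
print(names(coef(fit_ref)))
```

Each city coefficient is a contrast against the reference level, so changing the reference changes which comparisons the summary table reports, while the fitted values stay the same.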

d. Summarize the fitted regression model to analyze the coefficients, standard errors, t-values, and p-values.

summary(model_housing)
## 
## Call:
## lm(formula = price ~ house_size + num_bedrooms + city + age, 
##     data = data_housing)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -49042 -12610    -95  12690  48286 
## 
## Coefficients:
##                Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)   1.028e+05  5.074e+02  202.616   <2e-16 ***
## house_size    2.001e+02  1.458e-01 1372.344   <2e-16 ***
## num_bedrooms -1.003e+04  8.049e+01 -124.607   <2e-16 ***
## cityCityB    -1.824e+02  3.610e+02   -0.505    0.613    
## cityCityC    -1.934e+02  3.569e+02   -0.542    0.588    
## cityCityD    -2.517e+02  3.576e+02   -0.704    0.481    
## cityCityE    -4.229e+01  3.571e+02   -0.118    0.906    
## age          -5.051e+02  3.944e+00 -128.059   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16050 on 20001 degrees of freedom
## Multiple R-squared:  0.9896, Adjusted R-squared:  0.9896 
## F-statistic: 2.728e+05 on 7 and 20001 DF,  p-value: < 2.2e-16
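
The residual standard error of about 16050 is well above the noise standard deviation of 10000 used in the simulation. A likely explanation is that toll_access shifts the price by +15000 or -10000 but is not a regressor in the model, so its effect ends up in the residuals. The following sketch, on a smaller hypothetical sample with the same price formula (city is left out because it has no effect in the simulation), compares the fit with and without toll_access.

```r
set.seed(4)  # arbitrary seed for this illustration
n <- 5000
house_size   <- runif(n, 800, 3500)
num_bedrooms <- sample(1:5, n, replace = TRUE)
toll_access  <- sample(c("yes", "no"), n, replace = TRUE)
age          <- sample(1:100, n, replace = TRUE)
price <- 100000 + house_size * 200 - num_bedrooms * 10000 +
  ifelse(toll_access == "yes", 15000, -10000) - age * 500 + rnorm(n, 0, 10000)
d <- data.frame(house_size, num_bedrooms, toll_access, age, price)

fit_without <- lm(price ~ house_size + num_bedrooms + age, data = d)
fit_with    <- lm(price ~ house_size + num_bedrooms + toll_access + age, data = d)

# Theory: omitting toll_access adds a +/-12500 shift to the errors, so the
# residual sd is about sqrt(10000^2 + 12500^2), close to 16008
print(sigma(fit_without))
# With toll_access included, the residual sd should be close to 10000
print(sigma(fit_with))
# The yes-vs-no difference in the simulation is 15000 - (-10000) = 25000
print(coef(fit_with)["toll_accessyes"])
print(anova(fit_without, fit_with))
```

Since toll_access is generated independently of the other predictors, omitting it does not bias the remaining slopes. It does inflate the residual variance and shifts the intercept by the average toll effect.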

e. Check for multicollinearity using the Variance Inflation Factor (VIF) to assess the correlation between independent variables.

library(car)
vif(model_housing)
##                  GVIF Df GVIF^(1/(2*Df))
## house_size   1.000306  1        1.000153
## num_bedrooms 1.000325  1        1.000163
## city         1.000622  4        1.000078
## age          1.000437  1        1.000218
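
For numeric predictors, the VIF of a variable equals 1 / (1 - R^2), where R^2 comes from regressing that variable on the remaining predictors. The sketch below reproduces this in base R on hypothetical data; vif_manual and rooms_total are names invented for the illustration, and the generalized VIF that car::vif() reports for multi-level factors such as city is not covered.

```r
set.seed(5)  # arbitrary seed for this illustration
n <- 5000
d <- data.frame(house_size   = runif(n, 800, 3500),
                num_bedrooms = sample(1:5, n, replace = TRUE),
                age          = sample(1:100, n, replace = TRUE))

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_manual <- function(data) {
  sapply(names(data), function(v) {
    f  <- reformulate(setdiff(names(data), v), response = v)
    r2 <- summary(lm(f, data = data))$r.squared
    1 / (1 - r2)
  })
}

# Independent predictors: all VIFs are close to 1
vif_indep <- vif_manual(d)
print(vif_indep)

# Adding a hypothetical predictor that is nearly a multiple of num_bedrooms
d2 <- d
d2$rooms_total <- 2 * d2$num_bedrooms + rnorm(n, sd = 0.5)
vif_corr <- vif_manual(d2)
print(vif_corr)
```

Values near 1, as in the output above, indicate that the predictors are essentially uncorrelated, which is expected here because each one was simulated independently.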

f. Perform diagnostic tests for heteroskedasticity using the Breusch-Pagan test and for linearity using the Rainbow test.

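library(lmtest)  # bptest(): Breusch-Pagan test
library(gvlma)   # gvlma(): global validation of linear model assumptions; its Link Function
                 # component is the linearity-related check reported below (the Rainbow test itself is not run here)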

bptest(model_housing)
## 
##  studentized Breusch-Pagan test
## 
## data:  model_housing
## BP = 2.7481, df = 7, p-value = 0.9073
gvlma(model_housing)
## 
## Call:
## lm(formula = price ~ house_size + num_bedrooms + city + age, 
##     data = data_housing)
## 
## Coefficients:
##  (Intercept)    house_size  num_bedrooms     cityCityB     cityCityC  
##    102800.28        200.09     -10028.98       -182.39       -193.38  
##    cityCityD     cityCityE           age  
##      -251.74        -42.29       -505.12  
## 
## 
## ASSESSMENT OF THE LINEAR MODEL ASSUMPTIONS
## USING THE GLOBAL TEST ON 4 DEGREES-OF-FREEDOM:
## Level of Significance =  0.05 
## 
## Call:
##  gvlma(x = model_housing) 
## 
##                        Value p-value                   Decision
## Global Stat        449.50753  0.0000 Assumptions NOT satisfied!
## Skewness             0.03277  0.8563    Assumptions acceptable.
## Kurtosis           449.13053  0.0000 Assumptions NOT satisfied!
## Link Function        0.01507  0.9023    Assumptions acceptable.
## Heteroscedasticity   0.32916  0.5662    Assumptions acceptable.
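
The task names the Rainbow test, whereas the code above relies on the Link Function component of gvlma() for linearity. As a sketch of how the Rainbow test could be run, the example below applies raintest() from lmtest to two hypothetical datasets, one truly linear and one with a quadratic term that the linear model misses. The data and object names are assumptions for this illustration, and no result for model_housing is claimed here.

```r
set.seed(6)  # arbitrary seed for this illustration
n <- 1000
d <- data.frame(x = runif(n, 0, 10))
d$y_lin  <- 1 + 2 * d$x + rnorm(n)                 # linear relationship
d$y_quad <- 1 + 2 * d$x + 0.5 * d$x^2 + rnorm(n)   # curvature ignored by a linear fit

if (requireNamespace("lmtest", quietly = TRUE)) {
  # order.by sorts the observations by x, so that the "middle" subsample
  # used by the Rainbow test is the middle of the x range
  rain_lin  <- lmtest::raintest(y_lin ~ x,  order.by = ~ x, data = d)
  rain_quad <- lmtest::raintest(y_quad ~ x, order.by = ~ x, data = d)
  print(rain_lin)
  print(rain_quad)
}
```

For cross-sectional data the ordering matters, because the test compares the fit on a central subsample with the fit on the full sample. For the housing model, a comparable call might be raintest(model_housing, order.by = ~ house_size, data = data_housing), which was not run for this report.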

g. Create diagnostic plots to assess the model's assumptions, including residual plots against fitted values, Q-Q plots of residuals, and plots of residuals against leverage.

par(mfrow = c(2, 2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(model_housing)
