rm(list=ls())
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.3     ✔ readr     2.1.4
## ✔ forcats   1.0.0     ✔ stringr   1.5.0
## ✔ ggplot2   3.4.3     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.0
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(car)
## Loading required package: carData
## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following object is masked from 'package:purrr':
## 
##     some
dat<-readxl::read_excel("/Users/mwilandhlovu/Desktop/ECONS2000/ECOM2000_final_sample_gva_mine.xlsx")

##Question 1:
Create (i) a time variable, t, that starts at 1 in the first period and increases by 1 every period,

dat$t <- seq_len(nrow(dat))  # 1, 2, ..., n (n = 166 in this sample)

and (ii) three dummy variables, \(D_{1,t}\), \(D_{2,t}\) and \(D_{3,t}\), which take the value 1 if observation t is in the corresponding quarter and 0 otherwise (e.g., \(D_{1,t} = 1\) if period t is in the 1st quarter and 0 otherwise).

# Quarter 1 is the omitted base category, so the three dummies cover quarters 2, 3 and 4.
dat$q2<-ifelse(dat$Quarter==2,1,0)
dat$q3<-ifelse(dat$Quarter==3,1,0)
dat$q4<-ifelse(dat$Quarter==4,1,0)
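
The same construction can be checked on a small made-up data frame before running it on the real spreadsheet. This is only a sketch: `toy` and its 12 rows are invented for illustration, and it assumes, as the code above does, that `Quarter` is coded 1 to 4.

```r
# Synthetic stand-in for the spreadsheet: 12 quarterly observations (made-up data).
toy <- data.frame(Quarter = rep(1:4, times = 3))

# Time trend: 1, 2, ..., n without hard-coding the sample size.
toy$t <- seq_len(nrow(toy))

# Quarter dummies, with quarter 1 left out as the base category.
toy$q2 <- as.integer(toy$Quarter == 2)
toy$q3 <- as.integer(toy$Quarter == 3)
toy$q4 <- as.integer(toy$Quarter == 4)

# Each row has at most one dummy equal to 1; quarter 1 rows have none.
head(toy, 5)
```

Leaving one quarter out avoids perfect collinearity with the intercept. Passing `factor(Quarter)` in the model formula should produce an equivalent set of dummies automatically.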

Then, estimate a linear trend model with seasonality. Provide a summary output from R.

# y_t = b0 + b1*t + b2*q2_t + b3*q3_t + b4*q4_t + e_t

lms<-lm(GVA_mine~t+q2+q3+q4,data = dat)
summary(lms)
## 
## Call:
## lm(formula = GVA_mine ~ t + q2 + q3 + q4, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4067.3 -1465.5  -808.5   381.2 10404.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1630.416    561.431   2.904  0.00420 ** 
## t            188.605      4.413  42.738  < 2e-16 ***
## q2           930.346    601.748   1.546  0.12405    
## q3          1617.913    598.140   2.705  0.00757 ** 
## q4          1286.450    598.156   2.151  0.03299 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2724 on 161 degrees of freedom
## Multiple R-squared:  0.9194, Adjusted R-squared:  0.9174 
## F-statistic: 458.9 on 4 and 161 DF,  p-value: < 2.2e-16

##Question 2: State the sample regression equation estimated in Question 1.

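\[ GVA_t = 1630.416 + 188.605\,t + 930.346\,q_{2,t} + 1617.913\,q_{3,t} + 1286.450\,q_{4,t} + e_t \]

where \(q_{2,t}\), \(q_{3,t}\) and \(q_{4,t}\) are the quarter dummies q2, q3 and q4, and quarter 1 is the base category.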
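
As a quick check on how the estimated equation is used, the sketch below plugs values into it by hand. The coefficients are copied from the `summary(lms)` output; the helper `fitted_gva()` and the example period (t = 10 in quarter 2) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(lms) output.
b <- c(intercept = 1630.416, t = 188.605, q2 = 930.346, q3 = 1617.913, q4 = 1286.450)

# Fitted GVA for period `t` falling in quarter `quarter` (1 to 4).
# Quarter 1 is the base category, so its seasonal shift is 0.
fitted_gva <- function(t, quarter) {
  seasonal <- c(0, b[["q2"]], b[["q3"]], b[["q4"]])[quarter]
  b[["intercept"]] + b[["t"]] * t + seasonal
}

# Hypothetical example: period 10 in quarter 2.
# 1630.416 + 188.605 * 10 + 930.346 = 4446.812
fitted_gva(t = 10, quarter = 2)
```

With the fitted model object available, `predict(lms, newdata = ...)` should give the same number up to rounding of the printed coefficients.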

##Question 3: Provide interpretations of the coefficients for t and the quarter dummies.

(i) Interpret the estimated coefficient for t. The coefficient on t is 188.605: holding the quarter dummies (q2, q3, q4) constant, GVA is estimated to increase by 188.605 on average each quarter.

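(ii) Interpret the estimated coefficients for D1, D2, and D3 (estimated here as q2, q3 and q4). Quarter 1 is the base category, so each dummy coefficient is the average difference in GVA between that quarter and quarter 1, holding t constant.

GVA is on average 930.346 higher in quarter 2, 1617.913 higher in quarter 3 and 1286.450 higher in quarter 4 than in quarter 1. The quarter 3 and quarter 4 differences are statistically significant at the 5% level (p = 0.00757 and p = 0.03299), while the quarter 2 difference is not (p = 0.12405).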
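
One way to read the dummy coefficients is as shifts of the intercept: each quarter gets its own intercept while sharing the same trend slope. The sketch below computes those implied intercepts from the printed estimates; the variable names are illustrative only.

```r
# Estimates copied from the regression output; quarter 1 is the base category.
intercept <- 1630.416
shift <- c(Q1 = 0, Q2 = 930.346, Q3 = 1617.913, Q4 = 1286.450)

# Implied intercept of the trend line in each quarter.
seasonal_intercept <- intercept + shift
seasonal_intercept

# Difference from quarter 1 recovers the dummy coefficients themselves.
seasonal_intercept - seasonal_intercept[["Q1"]]
```

The four trend lines are parallel, so the gap between any quarter and quarter 1 is the same in every year of the sample.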

##Question 4: Interpret the reported R‐square value and briefly comment on the adequacy of the model.

In this model the R-squared is 0.9194 (adjusted R-squared 0.9174), which means about 92% of the variation in GVA is explained by the linear trend and the quarterly dummies. The residual standard error is 2724, which initially looks like a high number, but it is small relative to the scale of the series: the estimated trend alone raises GVA by about 31,300 over the 166 quarters (188.605 × 166). On the basis of the high R-squared, the model appears adequate.
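
The link between the two reported R-squared values can be verified by hand. The sketch below applies the usual adjustment formula with n = 166 observations and k = 4 regressors, both read off the regression output (161 residual degrees of freedom plus 5 estimated coefficients).

```r
# Values taken from the summary(lms) output.
r2 <- 0.9194   # Multiple R-squared
n  <- 166      # observations
k  <- 4        # regressors excluding the intercept: t, q2, q3, q4

# Adjusted R-squared penalises R-squared for the number of regressors.
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
round(adj_r2, 4)  # 0.9174, matching the reported value
```

The two values are close because the sample is large relative to the number of regressors, so the penalty is small.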