Question 4(a)

First, the readxl package was loaded and the dataset was read into R from the Excel file.

library(readxl)
data <- read_excel("Hw1_SaleExpenseData.xlsx")
head(data)
## # A tibble: 6 × 2
##   AI_Usage_Hours Productivity_Score
##            <dbl>              <dbl>
## 1           7.18               68.3
## 2           4.48               69.8
## 3           4.15               62.0
## 4           4.15               73.3
## 5          13.9                85.6
## 6           8.60               70.5
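
As a quick sanity check on the loaded data, the sketch below rebuilds the six rows printed by head(data) at their printed precision and confirms that both columns are numeric and contain no missing values. The object name `preview` is hypothetical and the full dataset is not reproduced here; the same checks could be run on `data` itself.

```r
# Stand-in for the first six rows shown by head(data), at printed precision.
preview <- data.frame(
  AI_Usage_Hours     = c(7.18, 4.48, 4.15, 4.15, 13.9, 8.60),
  Productivity_Score = c(68.3, 69.8, 62.0, 73.3, 85.6, 70.5)
)

# Both columns should be numeric with no missing values before fitting a model.
all_numeric <- all(vapply(preview, is.numeric, logical(1)))
n_missing   <- sum(is.na(preview))

str(preview)
cat("All columns numeric:", all_numeric, "\n")
cat("Missing values:", n_missing, "\n")
```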

Question 4(b)

Next, a simple linear regression (SLR) model was estimated in R to examine the linear relationship between Productivity_Score (response variable) and AI_Usage_Hours (explanatory variable). I first set the column headers explicitly so that the names used in the model formula match the data.

names(data) <- c("AI_Usage_Hours", "Productivity_Score")
model <- lm(Productivity_Score ~ AI_Usage_Hours, data = data)
summary(model)
## 
## Call:
## lm(formula = Productivity_Score ~ AI_Usage_Hours, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.8494  -3.3299   0.0375   3.3017  15.3541 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     62.5539     0.2796  223.77   <2e-16 ***
## AI_Usage_Hours   1.6458     0.0374   44.01   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.974 on 998 degrees of freedom
## Multiple R-squared:   0.66,  Adjusted R-squared:  0.6596 
## F-statistic:  1937 on 1 and 998 DF,  p-value: < 2.2e-16
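
To make the fitted line concrete, the following sketch hard-codes the estimates reported in the summary output above (intercept 62.5539, slope 1.6458, slope standard error 0.0374, 998 residual degrees of freedom). The function name `predict_productivity` and the input of 10 hours are illustrative assumptions; with the fitted object available, `predict(model, newdata = ...)` and `confint(model)` would be the more direct route.

```r
# Estimates copied from the summary(model) output (rounded as printed).
intercept <- 62.5539
slope     <- 1.6458
slope_se  <- 0.0374
df_resid  <- 998

# Hypothetical helper: fitted Productivity_Score for a given AI_Usage_Hours.
predict_productivity <- function(hours) {
  intercept + slope * hours
}

# Fitted score at an illustrative 10 hours of AI usage.
fit_10 <- predict_productivity(10)

# Approximate 95% confidence interval for the slope from the t distribution.
t_crit   <- qt(0.975, df_resid)
slope_ci <- slope + c(-1, 1) * t_crit * slope_se

print(fit_10)
print(round(slope_ci, 2))
```

Under these rounded estimates, each additional hour of AI usage is associated with an increase of about 1.65 points in the fitted productivity score, and the model accounts for roughly 66% of the variance in Productivity_Score (Multiple R-squared of 0.66).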