First, the readxl package was loaded and the dataset was read into R from the Excel file.
library(readxl)
data <- read_excel("Hw1_SaleExpenseData.xlsx")
head(data)
## # A tibble: 6 × 2
##   AI_Usage_Hours Productivity_Score
##            <dbl>              <dbl>
## 1           7.18               68.3
## 2           4.48               69.8
## 3           4.15               62.0
## 4           4.15               73.3
## 5          13.9                85.6
## 6           8.60               70.5
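The preview suggests two numeric columns with no obvious gaps. A minimal sketch of how that could be checked before fitting is given here. It uses a small data frame built from the six printed rows as a stand-in for the full file; the names `preview` and `expected_cols` are illustrative assumptions and are not part of the original analysis.

```r
# Stand-in for the loaded data: the six rows printed by head(data)
preview <- data.frame(
  AI_Usage_Hours     = c(7.18, 4.48, 4.15, 4.15, 13.9, 8.60),
  Productivity_Score = c(68.3, 69.8, 62.0, 73.3, 85.6, 70.5)
)

# Column names that the model formula relies on
expected_cols <- c("AI_Usage_Hours", "Productivity_Score")

# Stop early if headers, types, or completeness are not as expected
stopifnot(
  identical(names(preview), expected_cols),
  all(vapply(preview, is.numeric, logical(1))),
  !anyNA(preview)
)

# Compact overview of column types and first values
str(preview)
```

Running the same `stopifnot()` call on the full `data` object would flag a header mismatch before any column is renamed by position.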
Next, I made sure that the column headers were correct and then estimated a simple linear regression (SLR) model in R to examine the linear relationship between Productivity_Score (response variable) and AI_Usage_Hours (explanatory variable).
names(data) <- c("AI_Usage_Hours", "Productivity_Score")
model <- lm(Productivity_Score ~ AI_Usage_Hours, data = data)
summary(model)
##
## Call:
## lm(formula = Productivity_Score ~ AI_Usage_Hours, data = data)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -14.8494  -3.3299   0.0375   3.3017  15.3541
##
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)     62.5539     0.2796  223.77   <2e-16 ***
## AI_Usage_Hours   1.6458     0.0374   44.01   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.974 on 998 degrees of freedom
## Multiple R-squared: 0.66, Adjusted R-squared: 0.6596
## F-statistic: 1937 on 1 and 998 DF, p-value: < 2.2e-16
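The fitted line implied by this output is Productivity_Score = 62.5539 + 1.6458 × AI_Usage_Hours. The sketch below shows how the reported numbers fit together, using only values copied from the summary above. The helper `predict_score()` and the example input of 10 hours are hypothetical and are not part of the original analysis.

```r
# Coefficients copied from the summary(model) output
b0 <- 62.5539   # intercept
b1 <- 1.6458    # slope for AI_Usage_Hours

# Hypothetical helper: fitted productivity score for a given number of hours
predict_score <- function(hours) b0 + b1 * hours

# Example: fitted score at 10 hours of AI usage (an illustrative input)
predict_score(10)

# Consistency checks that hold for any simple linear regression:
# the F-statistic equals the squared t value of the slope, and
# R-squared equals F / (F + residual degrees of freedom)
t_slope  <- 44.01
df_resid <- 998
f_stat   <- t_slope^2
r_sq     <- f_stat / (f_stat + df_resid)

round(f_stat)      # compare with the reported F-statistic
round(r_sq, 2)     # compare with the reported Multiple R-squared
```

When the `model` object is available, `predict(model, newdata = data.frame(AI_Usage_Hours = 10))` gives the same fitted value without retyping the coefficients.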