Linear regression is applied to one year of a company's monthly data to predict sales from expenses. A simple regression uses expenses as the only independent variable, and a multiple regression adds the month as a second predictor.

library(ggplot2)
library(knitr)

data <- read.csv("https://raw.githubusercontent.com/hovig/MSDS_CUNY/master/DATA605/full-year-budgeting.csv")
kable(head(data,5))
Month  Expense  Sales
    1     1000   9914
    2     4000  40487
    3     5000  54324
    4     4500  50044
    5     3000  34719
# Simple linear regression: Sales as a function of Expense
linear_regression <- lm(Sales ~ Expense, data = data)
summary(linear_regression)
## 
## Call:
## lm(formula = Sales ~ Expense, data = data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3385  -2097    258   1726   3034 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1383.4714  1255.2404   1.102    0.296    
## Expense       10.6222     0.1625  65.378 1.71e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2313 on 10 degrees of freedom
## Multiple R-squared:  0.9977, Adjusted R-squared:  0.9974 
## F-statistic:  4274 on 1 and 10 DF,  p-value: 1.707e-14
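Reading the coefficients off the summary above, the fitted line can be written out explicitly. The worked values below use the rounded estimates and the first row of the table (Expense = 1000, Sales = 9914), so they are approximate. Note that the intercept is not statistically significant (p = 0.296), while the Expense slope is.

```latex
\begin{align*}
\widehat{\text{Sales}} &= 1383.4714 + 10.6222 \times \text{Expense} \\
\widehat{\text{Sales}}\,\big|_{\text{Expense}=1000} &= 1383.4714 + 10.6222 \times 1000 \approx 12005.67 \\
e_1 &= 9914 - 12005.67 \approx -2091.67
\end{align*}
```

Under this fit, each additional unit of expense is associated with roughly 10.62 additional units of sales.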
# Multiple regression: add Month as a second predictor
multi_regression <- lm(Sales ~ Expense + Month, data = data)
summary(multi_regression)
## 
## Call:
## lm(formula = Sales ~ Expense + Month, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1793.73 -1558.33    -1.73  1374.19  1911.58 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -567.6098  1041.8836  -0.545  0.59913    
## Expense       10.3825     0.1328  78.159 4.65e-14 ***
## Month        541.3736   158.1660   3.423  0.00759 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1607 on 9 degrees of freedom
## Multiple R-squared:  0.999,  Adjusted R-squared:  0.9988 
## F-statistic:  4433 on 2 and 9 DF,  p-value: 3.368e-14
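One way to check whether adding Month is worthwhile is a nested-model F-test with `anova()`. The sketch below is illustrative only: the helper name `compare_models` is hypothetical, and the data frame `first_rows` just retypes the five rows printed by `head(data, 5)` so the block runs without downloading the CSV. On the full data set the call would be `compare_models(data)`.

```r
# Hypothetical helper (not part of the original analysis): fit the simple
# and the multiple model on the same data frame and compare them with a
# nested-model F-test.
compare_models <- function(df) {
  simple <- lm(Sales ~ Expense, data = df)
  multi  <- lm(Sales ~ Expense + Month, data = df)
  anova(simple, multi)
}

# Stand-in data: only the five rows shown by head(data, 5), retyped here
# so the example is self-contained. Results on these rows will differ
# from results on the full twelve months.
first_rows <- data.frame(
  Month   = 1:5,
  Expense = c(1000, 4000, 5000, 4500, 3000),
  Sales   = c(9914, 40487, 54324, 50044, 34719)
)

comparison <- compare_models(first_rows)
print(comparison)
```

Because the two models differ by a single term, the F statistic from this comparison on the full data should equal the square of the Month t value in the summary above (3.423^2, about 11.72), with the same p-value.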
# Residuals of the multiple regression plotted against Month
ggplot(data, aes(x = Month, y = resid(multi_regression))) +
  geom_point(size = 3, alpha = .4) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Month", y = "Residuals")

# Distribution of the residuals
hist(resid(multi_regression), main = "Histogram of Residuals")

# Normal Q-Q plot of the residuals
qqnorm(resid(multi_regression))
qqline(resid(multi_regression))
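The histogram and Q-Q plot are visual checks. A numeric complement, sketched here as an assumption rather than as part of the original analysis, is the Shapiro-Wilk test from base R. The helper name `residual_normality` is hypothetical, and `first_rows` again retypes only the five rows printed by `head(data, 5)` so the block runs on its own. On the full data the call would be `residual_normality(multi_regression)`.

```r
# Hypothetical helper: Shapiro-Wilk normality test on a model's residuals.
residual_normality <- function(model) {
  shapiro.test(resid(model))
}

# Stand-in data: the five rows shown by head(data, 5), retyped so the
# example is self-contained. The fit differs from the twelve-month model.
first_rows <- data.frame(
  Month   = 1:5,
  Expense = c(1000, 4000, 5000, 4500, 3000),
  Sales   = c(9914, 40487, 54324, 50044, 34719)
)

stand_in_model <- lm(Sales ~ Expense + Month, data = first_rows)
normality <- residual_normality(stand_in_model)
print(normality)
```

With only twelve observations the test has limited power, so a large p-value does not establish normality. It is best read alongside the Q-Q plot.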

Summary of Interpretations: