Regression

Part I. An accountant wishes to predict direct labor cost (y, in $100) on the basis of the batch size (x) of a product produced in a job shop. Data for 12 production runs are given in Table 6414-HW2-laborcost. By using R (or an appropriate software you prefer), answer Questions 1-5 and submit the relevant outputs.

1. Construct and submit a scatter plot of y versus x. Does a simple linear regression model seem appropriate here?

data= read.csv("D:/Georgia Tech/Regression/HW2/6414-HW2-laborcost.csv")

library("ggplot2")

plot(data$Bsize, data$Cost)

From the scatter plot, we can see that the relationship between the two variables is roughly linear. This suggests that a simple linear regression model may be appropriate for this data.

2. Fit the simple linear regression model using the method of least squares, i.e., find the least squares line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, by using the software (preferably R). Submit your solution (output).

library("tidyverse")

model <- lm(Cost ~ Bsize, data = data)

summary(model)

## 
## Call:
## lm(formula = Cost ~ Bsize, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.5351  -3.5462   0.4444   3.2786  15.4444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18.48751    4.67658   3.953  0.00272 ** 
## Bsize       10.14626    0.08662 117.134  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.642 on 10 degrees of freedom
## Multiple R-squared:  0.9993, Adjusted R-squared:  0.9992 
## F-statistic: 1.372e+04 on 1 and 10 DF,  p-value: < 2.2e-16

According to the summary, the least squares line is $\hat{y} = 18.49 + 10.15x$.

3. In plain English, interpret the meaning of the slope parameter $\beta_1$.

The slope parameter, also known as the coefficient of x, represents the change in the mean value of the dependent variable (y) for a one-unit change in the independent variable (x).

For this specific example, the estimated slope (10.15) is positive: for each one-unit increase in batch size (x), the mean direct labor cost (y) is expected to increase by 10.15 units, i.e. about $1,015, since y is measured in $100.
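The slope interpretation can be checked numerically. The following is a minimal sketch that hard-codes the coefficients from the summary(model) output; the helper name predict_cost and the two batch sizes are hypothetical, chosen only for illustration and not taken from the data set.

```r
# Coefficients copied from the summary(model) output.
b0 <- 18.48751   # intercept, in units of $100
b1 <- 10.14626   # slope, in units of $100 per unit of batch size

# Hypothetical helper: fitted mean labor cost (in $100) for batch size x.
predict_cost <- function(x) b0 + b1 * x

# Two illustrative batch sizes (assumed values, not from the data set).
x_small <- 10
x_large <- 11

# The difference in fitted cost for a one-unit increase equals the slope.
diff_units   <- predict_cost(x_large) - predict_cost(x_small)
diff_dollars <- diff_units * 100   # convert from $100 units to dollars

print(diff_units)
print(diff_dollars)
```

Whatever pair of batch sizes one unit apart is chosen, the difference in fitted cost is the slope itself, which is what "change in mean cost per unit of batch size" means.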

4. In plain English, interpret the meaning of the intercept $\beta_0$. Does it have a practical meaning here?

The intercept, also known as the constant term, represents the value of the dependent variable (y) when the independent variable (x) is equal to zero. In other words, it tells us the expected value of y when x is zero.

In this specific case, the estimated intercept is 18.49, so the fitted line predicts a direct labor cost of 18.49 (about $1,849, since y is measured in $100) when the batch size (x) is zero. However, this interpretation is meaningful only when the independent variable can actually take the value zero. A production run with a batch size of zero is not possible, so the intercept has no practical interpretation here.

The intercept ($\beta_0$) in a simple linear regression model represents the predicted value of the response variable (y) when the predictor variable (x) is equal to zero. In other words, it is the point where the least squares line crosses the y-axis.

In this case, the response variable is the direct labor cost and the predictor variable is the batch size (x) of a product produced in a job shop, so the intercept ($\beta_0$) can be interpreted as the expected direct labor cost when the batch size is zero. However, as it is impractical to have a batch size of zero in this scenario, the intercept does not have a practical meaning here.

5. Report the values of $\hat{\sigma}$, $\hat{\sigma}^2$, and SSE.

The residual standard error $\hat{\sigma}$ is reported directly in the output of the summary() function for the linear regression model. The estimate of the error variance $\hat{\sigma}^2$ and the sum of squared errors (SSE) follow from it or from the model residuals.

In this specific case, the residual standard error ($\hat{\sigma}$) is 8.642 on 10 degrees of freedom, the estimate of the error variance ($\hat{\sigma}^2$ = SSE/10) is 74.68, and the sum of squared errors (SSE) is 746.76.

The residual standard error measures the typical size of the residuals (prediction errors). It indicates roughly how far the observed values of y fall from the fitted regression line.

The estimate of the population variance of the errors ($\hat{\sigma}^2$) is a measure of the variability of the residuals around the regression line. It is an estimate of the variance of the error term in the population. The sum of squared errors (SSE) is a measure of the total deviation of the observed values from the predicted values. It is the sum of the squared differences between the observed values of y and the predicted values of y.

residuals <- model$residuals
SSE <- sum(residuals^2)
SSE

## [1] 746.7624

These values are used to assess how well the model fits: a smaller SSE or $\hat{\sigma}$ means the observations lie closer to the fitted line, and the same quantities allow candidate models for the same response to be compared.
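The three quantities are linked by $\hat{\sigma}^2 = SSE/(n-2)$ and $\hat{\sigma} = \sqrt{\hat{\sigma}^2}$. As a quick consistency check, the sketch below starts from the SSE printed above (746.7624) and the 12 production runs; the variable names are illustrative only.

```r
# Values taken from the output above.
sse <- 746.7624   # sum of squared residuals
n   <- 12         # number of production runs
df  <- n - 2      # two estimated parameters: intercept and slope

# Estimate of the error variance and the residual standard error.
sigma2_hat <- sse / df
sigma_hat  <- sqrt(sigma2_hat)

print(sigma2_hat)
print(sigma_hat)
```

The resulting residual standard error agrees with the 8.642 reported by summary(model). With the fitted model object available, sigma(model) and deviance(model) should return the residual standard error and the SSE directly.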

Part II. The mathematics department of a liberal arts college administered a 25-point placement test to assign appropriate math courses to incoming freshmen. The department believes that the test is a good predictor of a student’s final grade in its introductory course. A linear regression model that uses a student’s placement test score (x) to predict their final course grade (y) was studied, and the following summary quantities were obtained: $n = 15$, $\sum_{i=1}^{n} y_i = 1072$, $\sum_{i=1}^{n} y_i^2 = 79026$, $\sum_{i=1}^{n} x_i = 254$, $\sum_{i=1}^{n} x_i^2 = 4592$, and $\sum_{i=1}^{n} x_i y_i = 18867$. Answer the following questions.

6. Calculate the least squares estimates of the slope and the intercept.
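One way to approach Question 6 is through the standard formulas $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, where $S_{xx} = \sum x_i^2 - (\sum x_i)^2/n$ and $S_{xy} = \sum x_i y_i - (\sum x_i)(\sum y_i)/n$. The sketch below applies them to the summary quantities given in the problem; the variable names are assumptions made for illustration, and this is not a solution taken from the original report.

```r
# Summary quantities copied from the problem statement.
n      <- 15
sum_y  <- 1072
sum_y2 <- 79026   # not needed for the slope or the intercept
sum_x  <- 254
sum_x2 <- 4592
sum_xy <- 18867

# Corrected sum of squares for x and corrected cross-product.
Sxx <- sum_x2 - sum_x^2 / n
Sxy <- sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and the intercept.
b1 <- Sxy / Sxx
b0 <- sum_y / n - b1 * sum_x / n

print(c(intercept = b0, slope = b1))
```

Under these formulas the slope comes out at approximately 2.456 and the intercept at approximately 29.88.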

Regression_Hw1

Abhilasha

2023-01-25