R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks within the document.

Use Ctrl+Enter to run the code chunks on a PC. Use Command+Enter to run the code chunks on a Mac.

Load Packages

In this section, we install and load the necessary packages.
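The only package the code in this lab visibly calls is ggplot2 (through `ggplot()` and `geom_bar()`), so a minimal setup sketch might look like the following. That ggplot2 is the full list of required packages is an assumption; the original chunk is not shown.

```r
# Assumption: ggplot2 is the only add-on package this lab needs.
# Install it only if it is not already available, then attach it.
if (!requireNamespace("ggplot2", quietly = TRUE)) {
  install.packages("ggplot2")
}
library(ggplot2)
```

Guarding `install.packages()` with `requireNamespace()` avoids reinstalling the package every time the document is knit.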

Import Data

In this section, we import the necessary data for this lab.

Quality Control Case

Everybody seems to disagree about just why so many parts have to be fixed or thrown away after they are produced. Some say that it’s the temperature of the production process, which needs to be held constant (within a reasonable range). Others claim that it’s clearly the density of the product, and that if we could only produce a heavier material, the problems would disappear. Then there is Ole the site manager, who has been warning everyone forever to take care not to push the equipment beyond its limits. This problem would be the easiest to fix, simply by slowing down the production rate; however, this would increase costs. Unfortunately, rate is the only variable that the manager can control. Interestingly, many of the workers on the morning shift think that the problem is “those inexperienced workers in the afternoon,” who, curiously, feel the same way about the morning workers.

Ever since the factory was automated, with computer network communication and bar code readers at each station, data have been piling up. After taking the MGT585 class, you’ve finally decided to have a look. Your assistant aggregated the data by 4-hour blocks and then typed in the AM/PM variable. You found the following description of the variables:

temp: measures the temperature variability as a standard deviation during the time of measurement

density: indicates the density of the final product

rate: rate of production

am: 1 indicates morning and 0 afternoon

defect: average number of defects per 1000 produced

Do the following tasks and answer the questions below.

Task 1: data transformation, descriptive stats and visualization of ‘am’ variable

am is a categorical variable: 1 = morning and 0 = afternoon.

1. Convert am to a factor
2. Calculate the frequency distribution for am
3. Plot the relationship between am and defect using a bar chart

# Correct the type of am

quality$am <- as.factor(quality$am)

# descriptive stats for am: frequency distribution

summary(quality)
##       temp          density           rate       am         defect     
##  Min.   :0.970   Min.   :19.45   Min.   :177.7   0:15   Min.   : 0.00  
##  1st Qu.:1.765   1st Qu.:22.82   1st Qu.:221.3   1:15   1st Qu.:10.62  
##  Median :2.260   Median :24.61   Median :238.5          Median :26.15  
##  Mean   :2.203   Mean   :25.29   Mean   :236.5          Mean   :27.14  
##  3rd Qu.:2.720   3rd Qu.:27.48   3rd Qu.:253.9          3rd Qu.:43.85  
##  Max.   :3.020   Max.   :32.19   Max.   :281.9          Max.   :60.80

# use a bar chart to plot the relationship between am and defect
# (stat = "identity" stacks the values, so each bar is the total defect per shift)

ggplot(quality, aes(x = am, y = defect)) + 
  geom_bar(stat = "identity", fill = "blue")
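
`summary(quality)` reports the am counts alongside every other column. A minimal sketch of tabulating am on its own with `table()` and `prop.table()` follows. The vector `am` here is a stand-in rebuilt from the printed counts (15 per level), not the real column; with the actual data the argument would be `quality$am`.

```r
# Stand-in for quality$am, rebuilt from the counts printed by summary():
# 15 afternoon blocks (0) and 15 morning blocks (1). Row order is an assumption.
am <- factor(rep(c(0, 1), each = 15), levels = c(0, 1))

freq <- table(am)         # absolute frequencies per level
prop <- prop.table(freq)  # relative frequencies per level

print(freq)
print(prop)
```

Both levels have the same count, so the design is balanced across shifts.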

Question 1: What do the frequency distribution and the bar chart show?

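The frequency distribution shows that the data are balanced, with 15 four-hour blocks for each shift. Because the bar chart stacks the defect values, each bar shows the total defects for a shift, and the bar for am = 1 (morning) is the taller one: most of the product defects occur during the AM shift.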
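
With `stat = "identity"`, `geom_bar()` stacks the individual defect values, so each bar's height is the sum of defect within that shift rather than the mean. A rough sketch of those heights, using the 15 observations per shift from `summary(quality)` and the group means implied by the Task 2 regression output (16.92 for am = 0 and 16.92 + 20.44 for am = 1); the variable names are illustrative only:

```r
# Group sizes from summary(quality); group means from summary(reg_am).
# The means are rounded as printed, so the totals are approximate.
n_per_shift <- c(afternoon = 15, morning = 15)
mean_defect <- c(afternoon = 16.920, morning = 16.920 + 20.440)

# Height of each stacked bar = sum of the defect values in that shift.
bar_height <- n_per_shift * mean_defect
print(bar_height)
```

Since both shifts have the same number of observations, comparing totals here leads to the same ordering as comparing means.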

Task 2: Simple Regression Model with a Qualitative Predictor

Use lm() to run a regression analysis on am as X and defect as Y.

# Use lm() to run a regression analysis on am as X and defect as Y

reg_am <- lm(defect ~ am, data = quality)

summary(reg_am)
## 
## Call:
## lm(formula = defect ~ am, data = quality)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -35.86 -11.14   4.26  11.97  23.44 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   16.920      4.308   3.927  0.00051 ***
## am1           20.440      6.093   3.355  0.00229 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.69 on 28 degrees of freedom
## Multiple R-squared:  0.2867, Adjusted R-squared:  0.2612 
## F-statistic: 11.25 on 1 and 28 DF,  p-value: 0.002295

Question 2: How do you interpret the results? Interpret (1) the coefficient estimates, (2) the p-value for beta1, (3) R-squared, and (4) the p-value for the F-statistic.

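(1) The intercept is the estimated average for the reference level, am = 0 (afternoon): 16.92 defects per 1000 produced. The am1 coefficient says the morning shift (am = 1) averages 20.44 more, for an estimated 37.36 defects per 1000.

(2) The p-value for beta1 is 0.00229, well below 0.05, so the difference between the shifts is statistically significant.

(3) The R-squared is fairly low at 0.2867, which means shift alone explains about 29% of the variance in defects.

(4) The p-value for the F-statistic is 0.002295, which is below 0.05, so the model as a whole is statistically significant.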
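
The quantities in this answer can be recomputed from the printed regression table. The sketch below copies the rounded values from the `summary(reg_am)` output and derives the two group means, the t value and two-sided p-value for am1, and the F-statistic (which equals the squared t value when there is a single predictor). The variable names are illustrative, not part of the lab.

```r
# Values copied from the summary(reg_am) output above (rounded as printed).
b0       <- 16.920  # (Intercept): mean defects when am = 0 (afternoon)
b1       <- 20.440  # am1: morning mean minus afternoon mean
se_b1    <- 6.093   # standard error of am1
df_resid <- 28      # residual degrees of freedom

mean_afternoon <- b0       # am = 0
mean_morning   <- b0 + b1  # am = 1

t_b1   <- b1 / se_b1                         # t value for am1
p_b1   <- 2 * pt(-abs(t_b1), df = df_resid)  # two-sided p-value
f_stat <- t_b1^2                             # one predictor: F = t^2

cat("afternoon mean:", mean_afternoon, "\n")
cat("morning mean:  ", mean_morning, "\n")
cat("t value:", round(t_b1, 3), " p-value:", signif(p_b1, 3), "\n")
cat("F-statistic:", round(f_stat, 2), "\n")
```

Small differences from the printed table are expected because the inputs are already rounded.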

Question 3: Interestingly, many of the workers on the morning shift think that the problem is those inexperienced workers in the afternoon, who, curiously, feel the same way about the morning workers. Based on your regression analysis, do you think the claim by morning shift workers is true?

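The regression does not support the morning workers' claim. With am coded as 1 = morning, the positive am1 coefficient means the morning shift, not the afternoon shift, averages more defects. Other factors are also likely at play: R-squared is far from 1 (0.2867), so we cannot conclude that either shift is solely responsible for the defects.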

Task 3: Multiple Regression Model

Run a full model of multiple linear regression on temp, density, rate, am and interaction between rate and am.
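
Written out, the model requested here can be sketched as follows. The symbols $\beta_0, \dots, \beta_5$ and $\varepsilon$ are notation introduced for illustration and do not come from the lab handout.

```latex
\text{defect} = \beta_0 + \beta_1\,\text{temp} + \beta_2\,\text{density}
              + \beta_3\,\text{rate} + \beta_4\,\text{am}
              + \beta_5\,(\text{rate} \times \text{am}) + \varepsilon
```

With am = 0 (afternoon) the slope of rate is $\beta_3$; with am = 1 (morning) the slope becomes $\beta_3 + \beta_5$ and the intercept shifts by $\beta_4$.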

# Use lm() to run a multiple regression analysis 

reg_interaction <- lm(defect ~ temp + density + rate + am + rate:am, data = quality)

summary(reg_interaction)
## 
## Call:
## lm(formula = defect ~ temp + density + rate + am + rate:am, data = quality)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.494 -4.071 -1.736  3.812 16.055 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)  17.4929    65.8464   0.266  0.79277   
## temp         16.1557     7.9220   2.039  0.05257 . 
## density      -1.3264     1.4748  -0.899  0.37739   
## rate          0.0208     0.1233   0.169  0.86743   
## am1         -86.7220    27.6546  -3.136  0.00448 **
## rate:am1      0.3662     0.1135   3.227  0.00360 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.139 on 24 degrees of freedom
## Multiple R-squared:  0.9172, Adjusted R-squared:    0.9 
## F-statistic:  53.2 on 5 and 24 DF,  p-value: 3.326e-12

Question 4: How do you interpret the results?

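The p-value for the rate:am1 interaction is low (0.0036), which is strong evidence that the effect of production rate on defects differs between the shifts.

Holding the other predictors constant, a 1-unit increase in temp is associated with an estimated 16.16 more defects per 1000, although this effect is only marginally significant (p = 0.053). The rate coefficient (0.0208) is the effect of rate in the afternoon shift and is not significant (p = 0.867); in the morning shift the estimated effect of rate is 0.0208 + 0.3662 = 0.387 defects per 1000 for each unit of rate. The density coefficient is negative (-1.33) and not significant (p = 0.377). The am1 coefficient (-86.72) is the morning-minus-afternoon difference at a rate of 0, far outside the observed range of 177.7 to 281.9, so it must be read together with the interaction rather than on its own.

Both R-squared values are high (multiple 0.9172, adjusted 0.90), so the model accounts for about 90% of the variance in defects, and the F-test p-value (3.326e-12) shows that the model as a whole is significant.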
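
Because the model includes a rate:am interaction, the am1 and rate coefficients only make sense in combination. The sketch below uses the rounded coefficients printed by `summary(reg_interaction)` to compute the slope of rate in each shift, the predicted morning-minus-afternoon gap at a given rate, and the rate at which the two shifts are predicted to be equal. The helper `shift_gap()` and the other names are hypothetical, introduced only for this illustration.

```r
# Coefficients copied from summary(reg_interaction) (rounded as printed).
b_rate     <- 0.0208    # rate: slope of rate when am = 0 (afternoon)
b_am1      <- -86.7220  # am1: morning minus afternoon at rate = 0
b_rate_am1 <- 0.3662    # rate:am1: extra slope of rate in the morning

slope_afternoon <- b_rate
slope_morning   <- b_rate + b_rate_am1

# Predicted morning-minus-afternoon gap in defects at a given rate,
# holding temp and density fixed (hypothetical helper).
shift_gap <- function(rate) b_am1 + b_rate_am1 * rate

# Rate at which the two shifts are predicted to have equal defects.
crossover_rate <- -b_am1 / b_rate_am1

# Gap at the quartiles of rate printed by summary(quality).
rates <- c(q1 = 221.3, median = 238.5, q3 = 253.9)
print(round(shift_gap(rates), 2))
cat("slope of rate, afternoon:", slope_afternoon, "\n")
cat("slope of rate, morning:  ", slope_morning, "\n")
cat("crossover rate:", round(crossover_rate, 1), "\n")
```

Under these point estimates the morning shift is predicted to have fewer defects than the afternoon shift at low rates and more at high rates, with the crossover near the middle of the observed rate range. These are point estimates only and ignore the uncertainty in the coefficients.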

Question 5: What is your final recommendation to your manager Ole?

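The defects appear to be driven by temperature variability (an estimated 16.16 more defects per 1000 for each unit of temp, p = 0.053) and by production rate on the morning shift (the significant rate:am1 interaction), not by the PM workers. Because rate is the only variable Ole can control, the recommendation is to slow the production rate during the morning shift, weighing the reduction in defects against the higher cost.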