This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks.
Use Ctrl+Enter to run a code chunk on a PC. Use Command+Enter to run a code chunk on a Mac.
In this section, we install and load the necessary packages.
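The chunk for this step does not appear in the knitted output. A minimal sketch, assuming ggplot2 is the only non-base package required (it is the only one the code in this document calls) and that the CRAN mirror below is acceptable:

```r
# Assumption: ggplot2 is the only non-base package this lab uses
required_pkgs <- c("ggplot2")

for (pkg in required_pkgs) {
  # Install only when the package is not already available
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = "https://cloud.r-project.org")
  }
  # character.only = TRUE lets library() take the name from a variable
  library(pkg, character.only = TRUE)
}
```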
In this section, we import the necessary data for this lab.
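The import chunk is also not shown in the knitted output. The sketch below assumes the data are supplied as a CSV file. The helper `load_quality()` and the file name `quality.csv` are hypothetical, chosen only to match the `quality` data frame used in the rest of this document:

```r
# Hypothetical helper: read the lab data and check that the expected columns exist
load_quality <- function(path) {
  quality <- read.csv(path)
  expected <- c("temp", "density", "rate", "am", "defect")
  missing_cols <- setdiff(expected, names(quality))
  if (length(missing_cols) > 0) {
    stop("Missing columns: ", paste(missing_cols, collapse = ", "))
  }
  quality
}

# Usage (the file name is an assumption; substitute the file provided with the lab):
# quality <- load_quality("quality.csv")
# str(quality)
```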
Everybody seems to disagree about just why so many parts have to be fixed or thrown away after they are produced. Some say that it’s the temperature of the production process, which needs to be held constant (within a reasonable range). Others claim that it’s clearly the density of the product, and that if we could only produce a heavier material, the problems would disappear. Then there is Ole the site manager, who has been warning everyone forever to take care not to push the equipment beyond its limits. This problem would be the easiest to fix, simply by slowing down the production rate; however, this would increase costs. Unfortunately, rate is the only variable that the manager can control. Interestingly, many of the workers on the morning shift think that the problem is “those inexperienced workers in the afternoon,” who, curiously, feel the same way about the morning workers.
Ever since the factory was automated, with computer network communication and bar code readers at each station, data have been piling up. After taking the MGT585 class, you've finally decided to have a look. Your assistant aggregated the data by 4-hour blocks and then typed in the AM/PM variable. You found the following description of the variables:
temp: measures the temperature variability as a standard deviation during the time of measurement
density: indicates the density of the final product
rate: rate of production
am: 1 indicates morning and 0 afternoon
defect: average number of defects per 1000 produced
Do the following tasks and answer the questions below.
am is a categorical variable: 1 = morning and 0 = afternoon.
1. Convert am to a factor
2. Calculate the frequency distribution for am
3. Plot the relationship between am and defect using a bar chart
# Correct the type of am
quality$am <- as.factor(quality$am)
# descriptive stats for am: frequency distribution
summary(quality)
##       temp          density           rate       am         defect
##  Min.   :0.970   Min.   :19.45   Min.   :177.7   0:15   Min.   : 0.00
##  1st Qu.:1.765   1st Qu.:22.82   1st Qu.:221.3   1:15   1st Qu.:10.62
##  Median :2.260   Median :24.61   Median :238.5          Median :26.15
##  Mean   :2.203   Mean   :25.29   Mean   :236.5          Mean   :27.14
##  3rd Qu.:2.720   3rd Qu.:27.48   3rd Qu.:253.9          3rd Qu.:43.85
##  Max.   :3.020   Max.   :32.19   Max.   :281.9          Max.   :60.80
# use bar chart to plot relationship between am and defect
ggplot(quality, aes(x = am, y = defect)) +
  geom_bar(stat = "identity", fill = "blue")
Question 1: What do the frequency distribution and the bar chart show?
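The frequency distribution shows that the data are balanced, with 15 observations for each shift. Since am = 1 is the morning shift, the bar chart shows that most of the product defects occur during the AM shift, not the PM shift.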
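Because `geom_bar(stat = "identity")` stacks the individual defect values, each bar's height is the sum of defect within a shift rather than the mean. The sketch below tabulates the counts, totals, and means behind the chart. The helper name `shift_summary()` is hypothetical, and it assumes a data frame with the am and defect columns described above:

```r
# Hypothetical helper: per-shift count, total, and mean of defect
shift_summary <- function(df) {
  shift <- factor(df$am)
  data.frame(
    am    = levels(shift),
    n     = as.vector(table(shift)),                   # frequency distribution
    total = as.vector(tapply(df$defect, shift, sum)),  # bar height in the chart
    mean  = as.vector(tapply(df$defect, shift, mean))  # average defects per 1000
  )
}

# Usage, once the lab data are loaded:
# shift_summary(quality)
```

With equal counts per shift, comparing totals and comparing means lead to the same ranking of the shifts.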
Use lm() to run a regression analysis on am as X and defect as Y.
# Use lm() to run a regression analysis on am as X and defect as Y
reg_am <- lm(defect ~ am, data = quality)
summary(reg_am)
##
## Call:
## lm(formula = defect ~ am, data = quality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -35.86 -11.14 4.26 11.97 23.44
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 16.920 4.308 3.927 0.00051 ***
## am1 20.440 6.093 3.355 0.00229 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.69 on 28 degrees of freedom
## Multiple R-squared: 0.2867, Adjusted R-squared: 0.2612
## F-statistic: 11.25 on 1 and 28 DF, p-value: 0.002295
Question 2: How do you interpret the results? Interpret (1) the coefficient estimates, (2) the p-value for beta1, (3) R-squared, and (4) the p-value for the F-statistic.
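The intercept (16.92) is the estimated average number of defects per 1000 for the PM shift (am = 0). The am1 coefficient (20.44) is the estimated difference for the AM shift (am = 1), so the AM average is 16.92 + 20.44 = 37.36 defects per 1000.
The p-value for beta1 (0.00229) is well below 0.05, so the difference between the two shifts is statistically significant.
The R-squared is somewhat low at 0.2867, which means the model explains about 29% of the variance in defects.
The p-value of the F-statistic (0.002295) is below 0.05, so the model as a whole is statistically significant.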
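The arithmetic behind these interpretations can be checked directly from the printed output. This is a sketch using values copied from the `summary(reg_am)` output above, not a re-fit of the model. It relies on two standard facts for a regression with a single predictor: the F-statistic can be recovered from R-squared, and it equals the squared t-value of the slope (up to rounding of the printed values).

```r
# Values copied from the summary(reg_am) output
b0 <- 16.920        # intercept: mean defects per 1000 when am = 0 (afternoon)
b1 <- 20.440        # am1: morning mean minus afternoon mean
r2 <- 0.2867        # multiple R-squared
df_resid <- 28      # residual degrees of freedom
t_b1 <- 3.355       # t value for am1

mean_pm <- b0       # predicted mean for the afternoon shift
mean_am <- b0 + b1  # predicted mean for the morning shift

# F = (R^2 / k) / ((1 - R^2) / df_resid), with k = 1 predictor
f_from_r2 <- (r2 / 1) / ((1 - r2) / df_resid)

round(c(mean_pm = mean_pm, mean_am = mean_am,
        f_from_r2 = f_from_r2, t_squared = t_b1^2), 2)
```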
Question 3: Interestingly, many of the workers on the morning shift think that the problem is those inexperienced workers in the afternoon, who, curiously, feel the same way about the morning workers. Based on your regression analysis, do you think the claim by morning shift workers is true?
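No. The regression shows more defects in the AM shift (am = 1) than in the PM shift, so the morning workers' claim is not supported. In addition, R-squared is far from 1 (0.2867), so other factors are likely involved, and we cannot conclude that either shift is solely responsible for the defects.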
Run a full model of multiple linear regression on temp, density, rate, am and interaction between rate and am.
# Use lm() to run a multiple regression analysis
reg_interaction <- lm(defect ~ temp + density + rate + am + rate:am, data = quality)
summary(reg_interaction)
##
## Call:
## lm(formula = defect ~ temp + density + rate + am + rate:am, data = quality)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.494 -4.071 -1.736 3.812 16.055
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.4929 65.8464 0.266 0.79277
## temp 16.1557 7.9220 2.039 0.05257 .
## density -1.3264 1.4748 -0.899 0.37739
## rate 0.0208 0.1233 0.169 0.86743
## am1 -86.7220 27.6546 -3.136 0.00448 **
## rate:am1 0.3662 0.1135 3.227 0.00360 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.139 on 24 degrees of freedom
## Multiple R-squared: 0.9172, Adjusted R-squared: 0.9
## F-statistic: 53.2 on 5 and 24 DF, p-value: 3.326e-12
Question 4: How do you interpret the results?
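The p-value for the interaction rate×am1 is small (0.0036), so there is strong evidence that the effect of rate on defects differs between the two shifts.
The coefficient estimates suggest that, holding the other variables constant, a 1-unit increase in temp is associated with about 16.16 more defects per 1000, although this effect is only marginally significant (p = 0.0526). The rate coefficient (0.0208) is the slope for the PM shift (am = 0) and is not significant (p = 0.867). For the AM shift the slope is 0.0208 + 0.3662 = 0.387 defects per 1000 per unit of rate. The density coefficient (-1.3264) is not significant (p = 0.377). The am1 coefficient (-86.72) is the AM-minus-PM difference at a rate of 0, which lies far outside the observed range of rate (177.7 to 281.9), so on its own it does not mean that the AM shift produces fewer defects.
Both the multiple R-squared (0.9172) and the adjusted R-squared (0.9) are high, so the model accounts for roughly 90% of the variance in defects.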
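To make the interaction concrete, the sketch below computes the rate slope for each shift and the AM-minus-PM gap in predicted defects at a given rate, holding temp and density fixed. It uses coefficients copied from the `summary(reg_interaction)` output above. The names `shift_gap()` and `break_even` are hypothetical, introduced only for illustration.

```r
# Coefficients copied from the summary(reg_interaction) output
b_rate    <- 0.0208    # slope of rate when am = 0 (afternoon)
b_am1     <- -86.7220  # AM-minus-PM difference at rate = 0
b_rate_am <- 0.3662    # extra slope of rate when am = 1 (morning)

slope_pm <- b_rate              # defects per 1000 per unit of rate, afternoon
slope_am <- b_rate + b_rate_am  # defects per 1000 per unit of rate, morning

# AM-minus-PM difference in predicted defects at a given rate
# (temp and density cancel out because they are held equal)
shift_gap <- function(rate) b_am1 + b_rate_am * rate

# Rate at which the two shifts are predicted to have equal defects
break_even <- -b_am1 / b_rate_am

# Gap at the minimum, mean, and maximum observed rate
shift_gap(c(177.7, 236.5, 281.9))
```

Under these estimates the break-even rate is close to 237, near the mean observed rate: below it the morning shift is predicted to have fewer defects than the afternoon shift, and above it more.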
Question 5: What is your final recommendation to your manager Ole?
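The defects appear to be driven by process conditions and not by the PM workers. Higher temperature variability is associated with more defects (marginally significant), and during the morning shift a higher production rate is associated with more defects (significant interaction). Since rate is the only variable Ole can control, slowing the production rate during the morning shift is the most direct lever, at the cost noted in the problem description.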