#Data : Assembly times
assembly_times = c(28, 32, 29, 31, 30, 28, 33, 30, 31, 29, 30, 32, 31, 30, 29, 32, 33, 29, 31, 30, 32, 29, 31, 30)
#Null hypothesis: mu = 22
#Alternative hypothesis: mu != 22
t.test(assembly_times, mu=22, alternative = "two.sided")
##
## One Sample t-test
##
## data: assembly_times
## t = 28.592, df = 23, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 22
## 95 percent confidence interval:
## 29.80771 31.02562
## sample estimates:
## mean of x
## 30.41667
The null hypothesis is mu = 22. The test statistic is t = 28.592 with 23 degrees of freedom. The p-value is less than 2.2e-16, far below 0.05, so we reject the null hypothesis and conclude that the mean assembly time differs from 22.
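As a check on the t.test output above, the statistic can be recomputed by hand. This is a minimal sketch, assuming the usual one-sample formula t = (mean - mu0) / (s / sqrt(n)); the names mu0, se, t_stat and p_value are introduced here only for illustration.

```r
# Assembly times from the data above
assembly_times <- c(28, 32, 29, 31, 30, 28, 33, 30, 31, 29, 30, 32,
                    31, 30, 29, 32, 33, 29, 31, 30, 32, 29, 31, 30)

mu0 <- 22                                   # hypothesized mean under H0
n <- length(assembly_times)                 # sample size
se <- sd(assembly_times) / sqrt(n)          # standard error of the mean
t_stat <- (mean(assembly_times) - mu0) / se # one-sample t statistic
p_value <- 2 * pt(-abs(t_stat), df = n - 1) # two-sided p-value

c(t = t_stat, df = n - 1, p = p_value)
```

The values should agree with the t and df lines that t.test reports.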
Total customers surveyed: n = 500. Satisfied customers: x = 120. Expected satisfaction rate: 0.25. Significance level: 0.05. Null hypothesis: p = 0.25. Alternative hypothesis: p != 0.25.
x = 120 #number of satisfied customers
n = 500 # total customers surveyed
prop.test(x = x, n = n, p = 0.25, alternative = "two.sided", correct = FALSE)
##
## 1-sample proportions test without continuity correction
##
## data: x out of n, null probability 0.25
## X-squared = 0.26667, df = 1, p-value = 0.6056
## alternative hypothesis: true p is not equal to 0.25
## 95 percent confidence interval:
## 0.2046379 0.2793268
## sample estimates:
## p
## 0.24
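The null hypothesis is p = 0.25. The test statistic is X-squared = 0.26667 and the p-value is 0.6056. Since p > 0.05, we fail to reject the null hypothesis: the data are consistent with a satisfaction rate of 0.25.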
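The X-squared value from prop.test without continuity correction is the square of the usual one-sample z statistic. A minimal sketch of that relationship, assuming the normal approximation with the standard error computed under H0; p0, p_hat, z and chi_sq are illustrative names.

```r
x <- 120    # number of satisfied customers
n <- 500    # total customers surveyed
p0 <- 0.25  # proportion under the null hypothesis

p_hat <- x / n                               # sample proportion
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # z statistic using the H0 standard error
chi_sq <- z^2                                # matches X-squared when correct = FALSE
p_value <- 2 * pnorm(-abs(z))                # two-sided p-value

c(z = z, chi_sq = chi_sq, p = p_value)
```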
#Q3 Method A: 75, 80, 85, 78, 88, 82, 79, 81, 86, 90. Method B: 70, 85, 88, 76, 92, 80, 77, 82, 84, 89. Looking for a significant difference in the mean scores between the two methods.
A = c(75, 80, 85, 78, 88, 82, 79, 81, 86, 90)
B = c(70, 85, 88, 76, 92, 80, 77, 82, 84, 89)
#two-sample t-test (by default R runs Welch's test, which does not assume equal variances)
t.test(A, B, alternative = "two.sided")
##
## Welch Two Sample t-test
##
## data: A and B
## t = 0.038458, df = 16.186, p-value = 0.9698
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -5.407059 5.607059
## sample estimates:
## mean of x mean of y
## 82.4 82.3
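The null hypothesis is that the two methods have the same mean score. The test statistic is t = 0.038 and the p-value is 0.9698. Since p > 0.05, we fail to reject the null hypothesis: there is no evidence of a difference in mean scores between the methods. Failing to reject does not prove the null hypothesis is true.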
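The fractional degrees of freedom in the output come from the Welch-Satterthwaite approximation. A minimal sketch that reproduces the statistic and df by hand, assuming the standard Welch formulas; vA, vB, t_stat and df_welch are illustrative names.

```r
A <- c(75, 80, 85, 78, 88, 82, 79, 81, 86, 90)
B <- c(70, 85, 88, 76, 92, 80, 77, 82, 84, 89)

vA <- var(A) / length(A)  # squared standard error of mean(A)
vB <- var(B) / length(B)  # squared standard error of mean(B)

t_stat <- (mean(A) - mean(B)) / sqrt(vA + vB)  # Welch t statistic
# Welch-Satterthwaite degrees of freedom
df_welch <- (vA + vB)^2 /
  (vA^2 / (length(A) - 1) + vB^2 / (length(B) - 1))
p_value <- 2 * pt(-abs(t_stat), df = df_welch) # two-sided p-value

c(t = t_stat, df = df_welch, p = p_value)
```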
#Q4 For Line X: n1 = 400, x1 = 20. For Line Y: n2 = 450, x2 = 40. Looking to see whether the proportions of defective products p1 and p2 differ significantly. Null hypothesis: p1 = p2. Alternative hypothesis: p1 != p2.
x = c(20, 40) #number of defective units on each line
n = c(400, 450) #total units inspected on each line
#two-proportion test
prop.test(x = x, n = n, alternative = "two.sided", correct = FALSE)
##
## 2-sample test for equality of proportions without continuity correction
##
## data: x out of n
## X-squared = 4.8816, df = 1, p-value = 0.02714
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.07276411 -0.00501367
## sample estimates:
## prop 1 prop 2
## 0.05000000 0.08888889
The null hypothesis is p1 = p2. The test statistic is X-squared = 4.8816 and the p-value is 0.027. Since p < 0.05, we reject the null hypothesis and conclude that the defect proportions of the two lines differ.
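The two-proportion result can also be checked with the pooled z statistic, whose square equals X-squared when no continuity correction is applied. A minimal sketch, assuming the standard pooled-proportion formula; p_pool, se, z and chi_sq are illustrative names.

```r
x <- c(20, 40)    # defective units on Line X and Line Y
n <- c(400, 450)  # units inspected on Line X and Line Y

p_hat <- x / n                                  # sample proportions
p_pool <- sum(x) / sum(n)                       # pooled proportion under H0: p1 = p2
se <- sqrt(p_pool * (1 - p_pool) * sum(1 / n))  # pooled standard error
z <- (p_hat[1] - p_hat[2]) / se                 # two-proportion z statistic
chi_sq <- z^2                                   # matches X-squared when correct = FALSE
p_value <- 2 * pnorm(-abs(z))                   # two-sided p-value

c(z = z, chi_sq = chi_sq, p = p_value)
```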
#Q5 Material A: 45, 48, 41, 43, 50, 42, 44, 47, 42, 40, 49, 46, 43, 41, 48. Material B: 46, 49, 43, 45, 50, 44, 48, 46, 42, 45, 47, 43, 45, 49, 50. Material C: 47, 45, 42, 49, 46, 50, 48, 44, 41, 46, 49, 42, 45, 47, 48. H0: the mean tensile strength is the same for all materials. Ha: at least one mean is different.
A = c(45, 48, 41, 43, 50, 42, 44, 47, 42, 40, 49, 46, 43, 41, 48)
B = c(46, 49, 43, 45, 50, 44, 48, 46, 42, 45, 47, 43, 45, 49, 50)
C = c(47, 45, 42, 49, 46, 50, 48, 44, 41, 46, 49, 42, 45, 47, 48)
data = data.frame(strength = c(A, B, C), group = rep(c("A", "B", "C"), c(length(A), length(B), length(C))))
#one way ANOVA
model = aov(strength ~ group, data = data)
summary(model)
##             Df Sum Sq Mean Sq F value Pr(>F)
## group        2   20.8  10.422   1.257  0.295
## Residuals   42  348.3   8.292
The null hypothesis is that all three materials have the same mean tensile strength; the alternative hypothesis is that at least one mean differs from the others. The test statistic is F = 1.26 and the p-value is 0.295. Since p > 0.05, we fail to reject the null hypothesis.
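The F value in the ANOVA table is the ratio of the between-group mean square to the within-group mean square. A minimal sketch that rebuilds it from sums of squares, assuming the standard one-way ANOVA decomposition; ss_between, ss_within and f_stat are illustrative names.

```r
A <- c(45, 48, 41, 43, 50, 42, 44, 47, 42, 40, 49, 46, 43, 41, 48)
B <- c(46, 49, 43, 45, 50, 44, 48, 46, 42, 45, 47, 43, 45, 49, 50)
C <- c(47, 45, 42, 49, 46, 50, 48, 44, 41, 46, 49, 42, 45, 47, 48)

strength <- c(A, B, C)
group <- factor(rep(c("A", "B", "C"), times = c(length(A), length(B), length(C))))

fitted_means <- ave(strength, group)  # each observation's own group mean
ss_between <- sum((fitted_means - mean(strength))^2)  # between-group sum of squares
ss_within <- sum((strength - fitted_means)^2)         # residual sum of squares

df_between <- nlevels(group) - 1                # 2
df_within <- length(strength) - nlevels(group)  # 42
f_stat <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_stat, df_between, df_within, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```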
##Q6 Diameter (x) : 11.8, 11.9, 11.5, 11.7, 11.9, 11.6, 11.7, 11.8, 11.6, 11.5, 11.9, 11.8, 11.7, 11.6, 11.5, 11.8, 11.9, 11.6, 11.7, 11.8 Tensile Strength (y): 45, 48, 41, 43, 50, 42, 44, 47, 42, 40, 49, 46, 43, 41, 48, 52, 45, 43, 46, 49.
x = c(11.8, 11.9, 11.5, 11.7, 11.9, 11.6, 11.7, 11.8, 11.6, 11.5, 11.9, 11.8, 11.7, 11.6, 11.5, 11.8, 11.9, 11.6, 11.7, 11.8)
y = c(45, 48, 41, 43, 50, 42, 44, 47, 42, 40, 49, 46, 43, 41, 48, 52, 45, 43, 46, 49)
#simple linear regression
model = lm(y~x)
summary(model)
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.3078 -1.5982 -0.6081  0.7821  6.4118
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -151.599     48.781  -3.108 0.006075 **
## x             16.799      4.164   4.035 0.000777 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 18 degrees of freedom
## Multiple R-squared: 0.4749, Adjusted R-squared: 0.4457
## F-statistic: 16.28 on 1 and 18 DF, p-value: 0.0007775
#predicted tensile strength at x = 11.6
predict(model, newdata = data.frame(x=11.6))
## 1
## 43.26813
The intercept is -151.60 and the slope is 16.80, so the fitted regression equation is y = -151.60 + 16.80x. The p-value for the slope is 0.0007775, which is less than 0.05, so x is a significant predictor of y. For each one-unit increase in x, y increases by 16.80 on average. The predicted value at x = 11.6 is 43.26813.
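The slope and intercept reported by lm can be recovered from the sample covariance and variance. A minimal sketch, assuming the usual least-squares formulas for simple linear regression; slope, intercept and y_hat are illustrative names.

```r
x <- c(11.8, 11.9, 11.5, 11.7, 11.9, 11.6, 11.7, 11.8, 11.6, 11.5,
       11.9, 11.8, 11.7, 11.6, 11.5, 11.8, 11.9, 11.6, 11.7, 11.8)
y <- c(45, 48, 41, 43, 50, 42, 44, 47, 42, 40,
       49, 46, 43, 41, 48, 52, 45, 43, 46, 49)

slope <- cov(x, y) / var(x)             # least-squares slope
intercept <- mean(y) - slope * mean(x)  # fitted line passes through (mean(x), mean(y))
y_hat <- intercept + slope * 11.6       # prediction at x = 11.6

c(intercept = intercept, slope = slope, prediction = y_hat)
```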
#Q7 A p-chart would not be used to control the mean of a continuous measurement. P-charts monitor proportions (such as the fraction defective per sample), not continuous data; an X-bar chart is the usual choice for a continuous mean.
##Q8 We have 20 samples, each of size n = 50. Number of defective items in each sample: 2, 3, 4, 1, 2, 3, 5, 2, 3, 1, 2, 3, 4, 2, 3, 1, 5, 4, 2, 3
x = c(2, 3, 4, 1, 2, 3, 5, 2, 3, 1, 2, 3, 4, 2, 3,1, 5, 4, 2, 3)
#number of samples and sample size
n_samples = length (x)
sample_size = 50
#total defectives
total_defectives = sum(x)
#number of items inspected
total_items = n_samples * sample_size
#calculate the center line
p_bar = total_defectives / total_items
#print the center line
p_bar
## [1] 0.055
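Q8 only asks for the center line, but a p-chart is normally drawn with 3-sigma control limits around it. A minimal sketch of those limits, assuming the standard formula p_bar +/- 3 * sqrt(p_bar * (1 - p_bar) / n) with the lower limit truncated at 0; sigma_p, ucl, lcl and out_of_control are illustrative names added here.

```r
x <- c(2, 3, 4, 1, 2, 3, 5, 2, 3, 1, 2, 3, 4, 2, 3, 1, 5, 4, 2, 3)
sample_size <- 50

p_bar <- sum(x) / (length(x) * sample_size)         # center line
sigma_p <- sqrt(p_bar * (1 - p_bar) / sample_size)  # standard error of a sample proportion
ucl <- p_bar + 3 * sigma_p                          # upper control limit
lcl <- max(0, p_bar - 3 * sigma_p)                  # lower control limit, not below 0

p_i <- x / sample_size                              # proportion defective in each sample
out_of_control <- any(p_i > ucl | p_i < lcl)        # TRUE if any sample is outside the limits

c(center = p_bar, lcl = lcl, ucl = ucl)
out_of_control
```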
##Q9 Expected proportions: Category X = 30%, Category Y = 40%, Category Z = 30%. Observed counts: X = 40, Y = 55, Z = 55. H0: the data match the expected proportions. Ha: the data do not match the expected proportions. Test: chi-square goodness of fit.
#counts
observed = c(40, 55, 55)
#expected proportions
expected_prop = c(0.30, 0.40, 0.30)
#sample size
total = sum(observed)
#expected counts
expected = total * expected_prop
#chi-square test
chisq.test(observed, p=expected_prop)
##
## Chi-squared test for given probabilities
##
## data: observed
## X-squared = 3.1944, df = 2, p-value = 0.2025
In a chi-square goodness of fit test, the null hypothesis is that the data are consistent with the specified proportions. The test statistic is 3.19 with 2 degrees of freedom and the p-value is 0.2025. Since p > 0.05, we fail to reject the null hypothesis.
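The goodness-of-fit statistic is the sum of (observed - expected)^2 / expected over the categories. A minimal sketch of that calculation, assuming the standard Pearson formula with k - 1 degrees of freedom; chi_sq and deg_free are illustrative names.

```r
observed <- c(40, 55, 55)             # observed counts for X, Y, Z
expected_prop <- c(0.30, 0.40, 0.30)  # proportions under H0

expected <- sum(observed) * expected_prop           # expected counts: 45, 60, 45
chi_sq <- sum((observed - expected)^2 / expected)   # Pearson chi-square statistic
deg_free <- length(observed) - 1                    # k - 1 degrees of freedom
p_value <- pchisq(chi_sq, df = deg_free, lower.tail = FALSE)

c(chi_sq = chi_sq, df = deg_free, p = p_value)
```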
#Q10 We are testing whether there is an association between two categorical variables, so we use the Chi-Square Test of Independence. H0: there is no relationship between the material type and the presence of surface defects. Ha: there is a relationship between the material type and the presence of surface defects.