Libraries required for the solution. Loading tidyverse alone is enough, since dplyr, ggplot2 and readr are all attached as part of tidyverse.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
rm(list=ls())
srcFdr="D:\\D Drive\\Certificate Course\\Examination"
fileNm="Production_2024.csv"
srcFile=file.path(srcFdr,fileNm)   # build the full path to the CSV file
prd=read_csv(srcFile)
## Rows: 73 Columns: 2
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (2): PRDMTD, STRENGTH
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
summary(prd)
## PRDMTD STRENGTH
## Min. :1.000 Min. :-46.00
## 1st Qu.:1.000 1st Qu.: 92.22
## Median :2.000 Median : 95.59
## Mean :1.548 Mean : 95.62
## 3rd Qu.:2.000 3rd Qu.:100.03
## Max. :3.000 Max. :186.00
unique(prd$PRDMTD)
## [1] 1 2 3
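# Treat method code 3 as invalid (only 1 = Batch and 2 = Mass are kept)
# and negative strengths as invalid, then drop rows containing NA
prdCln=prd%>%mutate(PRDMTD=na_if(PRDMTD,3))%>%
mutate(STRENGTH=if_else(STRENGTH<0,NA,STRENGTH))%>%
mutate(PRDMTD=factor(PRDMTD,levels=c(1,2),labels=c("Batch","Mass")))%>%
drop_na(PRDMTD,STRENGTH)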
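To illustrate the cleaning step above, here is a minimal sketch that runs the same kind of pipeline on a small made-up tibble. The values are hypothetical and not taken from Production_2024.csv. It assumes, as the code above does, that method code 3 and negative strengths are invalid. It also passes `levels = c(1, 2)` to `factor()` explicitly, so the Batch/Mass labels stay tied to codes 1 and 2 even if one code were absent from the data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: code 3 is treated as an invalid method code and
# negative strengths as invalid measurements (same rules as above)
toy <- tibble(PRDMTD   = c(1, 2, 3, 1, 2),
              STRENGTH = c(95.2, -46, 98.1, 101.4, NA))

toyCln <- toy %>%
  mutate(PRDMTD   = factor(na_if(PRDMTD, 3), levels = c(1, 2),
                           labels = c("Batch", "Mass")),
         STRENGTH = if_else(STRENGTH < 0, NA_real_, STRENGTH)) %>%
  drop_na(PRDMTD, STRENGTH)   # drop rows with NA in either column

print(toyCln)
```

Only the two valid Batch rows survive. The negative strength, the invalid code 3 and the missing strength are all removed.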
summary(prdCln)
## PRDMTD STRENGTH
## Batch:34 Min. : 4.00
## Mass :36 1st Qu.: 92.73
## Median : 95.61
## Mean : 98.34
## 3rd Qu.:100.00
## Max. :186.00
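# Quartiles read off summary(prdCln) above; Tukey fences at 1.5 x IQR
Q1=92.73
Q3=100.00
iqrVal=Q3-Q1   # not named IQR, to avoid confusion with stats::IQR()
ll=Q1-1.5*iqrVal
ul=Q3+1.5*iqrVal
prdCln=prdCln%>% filter(STRENGTH>ll & STRENGTH<ul)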
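The quartiles above are typed in by hand from the rounded `summary()` output. A minimal sketch of computing the same Tukey fences directly with `quantile()` follows. It uses the default type 7, which is also what `summary()` reports. The vector `x` is hypothetical. For the exam data one would use `quantile(prdCln$STRENGTH, c(0.25, 0.75))` instead.

```r
# Hypothetical strength values (not the exam data), including two extremes
x <- c(4, 88.5, 91.2, 92.7, 94.0, 95.6, 96.3, 98.8, 100.0, 103.5, 186)

q      <- quantile(x, c(0.25, 0.75))  # type 7, as used by summary()
iqrVal <- unname(q[2] - q[1])         # equivalently IQR(x)
ll     <- unname(q[1]) - 1.5 * iqrVal # lower Tukey fence
ul     <- unname(q[2]) + 1.5 * iqrVal # upper Tukey fence

xKept <- x[x > ll & x < ul]           # same strict comparison as the filter above
print(c(ll = ll, ul = ul))
print(xKept)
```

Here the fences are 80.775 and 110.575, so only the values 4 and 186 are dropped.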
summary(prdCln)
## PRDMTD STRENGTH
## Batch:30 Min. : 83.88
## Mass :27 1st Qu.: 92.82
## Median : 95.18
## Mean : 95.89
## 3rd Qu.: 99.13
## Max. :109.26
prdStat=prdCln%>%group_by(PRDMTD)%>%
summarize(avg=mean(STRENGTH),
var=var(STRENGTH),
cnt=n())
prdStat
## # A tibble: 2 × 4
## PRDMTD avg var cnt
## <fct> <dbl> <dbl> <int>
## 1 Batch 95.0 8.03 30
## 2 Mass 96.8 43.0 27
prdCln%>%group_by(PRDMTD)%>%
summarize(var=sum((STRENGTH-mean(STRENGTH))^2)/n())
## # A tibble: 2 × 2
## PRDMTD var
## <fct> <dbl>
## 1 Batch 7.76
## 2 Mass 41.4
The variance values from (i) and (ii) differ: the sample variance returned by R's var() is larger than the value from the formula in (ii). This is because var() divides by \(n-1\), whereas the formula in (ii) divides by \(n\). The two are therefore related by \(s^2=\frac{n}{n-1}\hat\sigma^2\). For Batch, \(7.76\times 30/29\approx 8.03\), and for Mass, \(41.4\times 27/26\approx 43.0\). The correct formula for the sample variance is \(s^2=\frac{\sum_{i=1}^{n}(x_i-\bar x)^2}{n-1}\), since it is an unbiased estimator of the population variance.
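A short check of the relationship described above, using a small hypothetical vector rather than the exam data. It confirms that var() equals the sum of squared deviations divided by \(n-1\), and that rescaling the divide-by-\(n\) value by \(n/(n-1)\) recovers it.

```r
# Hypothetical example vector (not the exam data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

popVar <- sum((x - mean(x))^2) / n        # formula (ii): divide by n
smpVar <- sum((x - mean(x))^2) / (n - 1)  # unbiased sample variance: divide by n - 1

print(c(popVar = popVar, smpVar = smpVar, var = var(x)))
```

With a sum of squared deviations of 32, this gives 32/8 = 4 for formula (ii) and 32/7 (about 4.571) for var().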
# Box plot of strength per method; the large point marks each group mean
prdCln%>%ggplot(aes(x=PRDMTD,y=STRENGTH,fill = PRDMTD))+
geom_boxplot()+
stat_summary(fun = mean,geom = "point",size=3)+
labs(title="Strength by Production Method",
x="Production Method",
y="Strength",fill="Production Method")+
theme_minimal()
\(H_0\): There is no difference between the mean strength of method 1 (Batch) and the mean strength of method 2 (Mass), i.e. \(\mu_1=\mu_2\).
\(H_a\): The mean strength of method 1 and the mean strength of method 2 are different, i.e. \(\mu_1\ne\mu_2\).
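Two-sample \(t\)-test. The sample variance of strength for the Mass method (43.0) is more than 5 times the sample variance for the Batch method (8.03). The two population variances therefore cannot be assumed equal, so Welch's unequal-variance \(t\)-test is used. The \(t\) statistic is \(t=\frac{\bar{x}_m-\bar{x}_b}{\sqrt{\frac{s_m^2}{n_m}+\frac{s_b^2}{n_b}}}\), where \(\bar{x}\), \(s^2\) and \(n\) are the sample mean, sample variance and sample size of each method (\(m\) = Mass, \(b\) = Batch).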
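# Welch t statistic by hand from prdStat (row 1 = Batch, row 2 = Mass)
tStat=(prdStat$avg[2]-prdStat$avg[1])/sqrt(prdStat$var[2]/prdStat$cnt[2]+
prdStat$var[1]/prdStat$cnt[1])
# Same magnitude as the t from t.test() below, opposite sign: t.test() takes Batch minus Mass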
t.test(prdCln$STRENGTH~prdCln$PRDMTD,var.equal=FALSE)
##
## Welch Two Sample t-test
##
## data: prdCln$STRENGTH by prdCln$PRDMTD
## t = -1.3175, df = 34.602, p-value = 0.1963
## alternative hypothesis: true difference in means between group Batch and group Mass is not equal to 0
## 95 percent confidence interval:
## -4.5666770 0.9730275
## sample estimates:
## mean in group Batch mean in group Mass
## 95.03855 96.83538
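Since the p-value (0.1963) is greater than 0.05, and therefore also greater than 0.01, the null hypothesis cannot be rejected at either the 0.05 or the 0.01 significance level. The data do not show a significant difference in mean strength between the two methods.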
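The sketch below spells out what `t.test(..., var.equal = FALSE)` computes. It covers the Welch statistic from the formula above, the Welch-Satterthwaite degrees of freedom and the two-sided p-value from the \(t\) distribution. The helper name `welch()` and the simulated samples are hypothetical. Applied with `x` = Batch strengths and `y` = Mass strengths, it should reproduce the t, df and p-value reported above.

```r
# Hypothetical helper: Welch t statistic, Welch-Satterthwaite df, two-sided p
welch <- function(x, y) {
  vx <- var(x) / length(x)   # s_x^2 / n_x (var() divides by n - 1)
  vy <- var(y) / length(y)   # s_y^2 / n_y
  tStat <- (mean(x) - mean(y)) / sqrt(vx + vy)
  df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
  pTwo <- 2 * pt(-abs(tStat), df)   # two-sided p-value
  list(t = tStat, df = df, p = pTwo)
}

# Hypothetical samples (not the exam data)
set.seed(1)
a <- rnorm(30, mean = 95, sd = 3)
b <- rnorm(27, mean = 97, sd = 6.5)

res <- welch(a, b)
tt  <- t.test(a, b, var.equal = FALSE)
print(unlist(res))
```

The non-integer degrees of freedom (34.602 above) come from this approximation rather than from \(n_1+n_2-2\).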
\(H_0\): The mean strength of method 2 (Mass) is no better (no higher) than the mean strength of method 1 (Batch), i.e. \(\mu_2\le\mu_1\).
\(H_a\): The mean strength of method 2 is better (higher) than the mean strength of method 1, i.e. \(\mu_2>\mu_1\).
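Here the alternative hypothesis is one-sided, so a one-tailed Welch \(t\)-test is used. Because t.test() works with the difference Batch minus Mass (first factor level minus second), \(H_a:\mu_2>\mu_1\) corresponds to alternative="less".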
t.test(prdCln$STRENGTH~prdCln$PRDMTD,var.equal=FALSE,
alternative="less")
##
## Welch Two Sample t-test
##
## data: prdCln$STRENGTH by prdCln$PRDMTD
## t = -1.3175, df = 34.602, p-value = 0.09817
## alternative hypothesis: true difference in means between group Batch and group Mass is less than 0
## 95 percent confidence interval:
## -Inf 0.5081784
## sample estimates:
## mean in group Batch mean in group Mass
## 95.03855 96.83538
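The p-value is 0.09817. This is half the two-sided p-value (0.1963/2), because the observed \(t\) lies in the direction of \(H_a\). As 0.09817 < 0.1, the null hypothesis can be rejected at the 0.1 significance level, so the data support claim 2 at that level. It cannot be rejected at the 0.05 or 0.01 levels.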
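A small sketch of how the one-sided p-value relates to the two-sided one, on hypothetical simulated samples rather than the exam data. For `alternative = "less"`, the p-value is the lower-tail probability `pt(t, df)`. When \(t\) is negative, that probability is exactly half of the two-sided p-value.

```r
# Hypothetical samples (not the exam data)
set.seed(2)
batch <- rnorm(30, mean = 95, sd = 3)
mass  <- rnorm(27, mean = 97, sd = 6.5)

two <- t.test(batch, mass, var.equal = FALSE)
one <- t.test(batch, mass, var.equal = FALSE, alternative = "less")

# Lower-tail probability of the t distribution at the observed statistic
pLess <- pt(unname(one$statistic), df = unname(one$parameter))
print(c(two.sided = two$p.value, less = one$p.value, pLess = pLess))
```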
1 → 3
2 → 4
3 → 8
4 → 7
5 → 1
6 → 6
7 → 2
8 → 9
9 → 5
\(H_0\): The mean content of the soft drink bottles is greater than or equal to 67.6 fluid ounces (\(\mu\ge 67.6\)). \(H_a\): The mean content is less than 67.6 fluid ounces (\(\mu<67.6\)).
\(H_0\): The mean content of the soft drink bottles is 67.6 fluid ounces (\(\mu=67.6\)). \(H_a\): The mean content is not 67.6 fluid ounces (\(\mu\ne 67.6\)).
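No sample data are given for the soft drink question. The following is therefore only a sketch of how the two pairs of hypotheses above could be tested with a one-sample \(t\)-test. It assumes the population standard deviation is unknown and fill volumes are roughly normal. The `fill` values are entirely hypothetical. The first pair maps to `alternative = "less"` and the second to `alternative = "two.sided"`.

```r
# Hypothetical fill volumes in fluid ounces (not real data)
fill <- c(67.2, 67.5, 67.9, 67.1, 67.4, 67.8, 67.3, 67.6, 67.0, 67.5)

# First pair: H0 mu >= 67.6 vs Ha mu < 67.6 (one-sided)
lower <- t.test(fill, mu = 67.6, alternative = "less")
# Second pair: H0 mu = 67.6 vs Ha mu != 67.6 (two-sided)
both  <- t.test(fill, mu = 67.6, alternative = "two.sided")

print(c(one.sided = lower$p.value, two.sided = both$p.value))
```

The sample mean of this made-up data is 67.43, below 67.6, so the one-sided p-value is half the two-sided one.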