Q1

Data Import

t4q1 <- read.table("C:/Users/Wei Hao/Desktop/ST2137/Tutorials/Data/furniture.txt", header = TRUE)
attach(t4q1)

IQR and MAD

IQR(days)

## [1] 38.25

mad(days, constant=1)

## [1] 15.5

mad(days)

## [1] 22.9803

The three robust estimators are smaller than the s.d., which suggests that some large values in the data set have inflated the s.d. Gini’s mean difference tries to reduce the effect of the extreme values, but the result is not as good as the other two robust estimators of $\sigma$.

The variation is better measured by the MAD. The estimate for $\sigma$ based on the MAD is about $23$ (the scaled value $22.98$ returned by mad(days)). All these estimators are robust against outliers in the data set.
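The link between the two `mad()` calls can be checked by hand: with `constant=1` the function returns the raw median absolute deviation, and the default constant 1.4826 rescales it to estimate $\sigma$ for normal data (here $15.5 \times 1.4826 = 22.9803$). A minimal sketch follows, using a made-up vector `x` rather than the furniture data, and assuming the usual normal-theory divisor 1.349 for the IQR-based estimate:

```r
# Hypothetical sample with one large value; NOT the furniture data
x <- c(12, 15, 18, 21, 24, 27, 30, 95)

# Raw median absolute deviation, equivalent to mad(x, constant = 1)
raw_mad <- median(abs(x - median(x)))

# Scaled MAD, equivalent to the default mad(x); estimates sigma under normality
scaled_mad <- 1.4826 * raw_mad

# IQR-based estimate of sigma under normality (divisor 1.349 is an assumption
# of this sketch; it is not shown in the original solution)
iqr_sigma <- IQR(x) / 1.349

# Compare with the ordinary standard deviation, which the large value inflates
c(sd = sd(x), mad = scaled_mad, iqr = iqr_sigma)
```

On a sample like this, the single large value pulls the s.d. well above both robust estimates, which is the same pattern described for `days`.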

Q2

Data Import

t4q2 <- read.table("C:/Users/Wei Hao/Desktop/ST2137/Tutorials/Data/student.txt", header = TRUE)
attach(t4q2)

Labels & Frequency Counts

gendergp <- ifelse(gender=="F","Female","Male")
table(gendergp)

## gendergp
## Female   Male 
##     96    123

travelgp <- ifelse(travel=="Y","Yes","No")
table(travelgp)

## travelgp
##  No Yes 
##  53 166

drivelicgp <- ifelse(drivelic=="Y","Yes","No")
table(drivelicgp)

## drivelicgp
##  No Yes 
##  78 141

Testing For Independence Of Driving Licence And Gender

table(gendergp, drivelicgp)

##         drivelicgp
## gendergp No Yes
##   Female 36  60
##   Male   42  81

chisq.test(table(gendergp, drivelicgp))

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  table(gendergp, drivelicgp)
## X-squared = 0.13842, df = 1, p-value = 0.7099

Since p-value $=0.7099 > 0.05$, we do not reject the null hypothesis of independence: there is no significant evidence of an association between gender and holding a driving licence.
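To see where the reported statistic comes from, the Yates-corrected $\chi^2$ can be recomputed by hand from the printed counts. This is only a sketch: the matrix `tab` is retyped from the table output above, and the names `expected`, `x2_yates` and `p_value` are introduced here for illustration.

```r
# Counts copied from the printed gender-by-licence table
tab <- matrix(c(36, 60,
                42, 81),
              nrow = 2, byrow = TRUE,
              dimnames = list(gender = c("Female", "Male"),
                              drivelic = c("No", "Yes")))

# Expected counts under independence: (row total * column total) / n
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)

# Yates' correction subtracts 0.5 from each |observed - expected|
x2_yates <- sum((abs(tab - expected) - 0.5)^2 / expected)

# Upper-tail probability of a chi-square with (2 - 1) * (2 - 1) = 1 df
p_value <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

round(c(X_squared = x2_yates, p_value = p_value), 4)
```

The values should agree with the `chisq.test` output (X-squared = 0.13842, p-value = 0.7099).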

Creating Categorical Variable “wkhrgp”

wkhrgp <- character(length(workhour))
wkhrgp[workhour == 0] <- "None (0 hrs)"
wkhrgp[workhour > 0 & workhour < 20] <- "Some (1 - 19 hrs)"
wkhrgp[workhour >= 20] <- "Many (20 - 99 hrs)"
table(wkhrgp)

## wkhrgp
## Many (20 - 99 hrs)       None (0 hrs)  Some (1 - 19 hrs) 
##                 37                101                 81
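Because `wkhrgp` is a character vector, `table()` lists the groups alphabetically (Many, None, Some). One option, sketched below on a made-up `workhour` vector rather than the student data, is to convert the labels to a factor with explicit levels so that the groups print in their natural order:

```r
# Hypothetical work hours; NOT the student data
workhour <- c(0, 0, 5, 12, 19, 20, 35, 0)

# Same grouping rule as in the solution
wkhrgp <- character(length(workhour))
wkhrgp[workhour == 0] <- "None (0 hrs)"
wkhrgp[workhour > 0 & workhour < 20] <- "Some (1 - 19 hrs)"
wkhrgp[workhour >= 20] <- "Many (20 - 99 hrs)"

# Fix the display order of the groups: None, Some, Many
wkhrgp <- factor(wkhrgp,
                 levels = c("None (0 hrs)", "Some (1 - 19 hrs)", "Many (20 - 99 hrs)"))

table(wkhrgp)
```

The ordering affects only how the table is displayed; the $\chi^2$ statistic of a cross tabulation does not depend on the order of the rows.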

Cross Tabulation To Examine Relationship Between Travel Outside Asia & Work Hour Group

chisq.test(table(wkhrgp, travelgp))

## 
##  Pearson's Chi-squared test
## 
## data:  table(wkhrgp, travelgp)
## X-squared = 5.8444, df = 2, p-value = 0.05382

Since the observed $\chi^2$ value is $5.8444 < \chi_{0.05}^2 (2) = 5.9915$ (equivalently, p-value $= 0.0538 > 0.05$), we do not reject $H_0$ at the 5% level: there is no significant evidence of an association between travel and wkhrgp.
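The critical value and the p-value quoted in this comparison can be reproduced with the chi-square quantile and distribution functions. A short sketch, with the observed statistic typed in from the output above and the variable names chosen here for illustration:

```r
# Observed statistic and degrees of freedom, copied from the chisq.test output
x2_obs <- 5.8444
df <- 2          # (3 work-hour groups - 1) * (2 travel groups - 1)

# Upper 5% critical value of the chi-square distribution with 2 df
crit <- qchisq(0.95, df = df)

# p-value: probability of a value at least as large as the observed one
p_value <- pchisq(x2_obs, df = df, lower.tail = FALSE)

# Reject H0 at the 5% level only if the statistic exceeds the critical value
reject <- x2_obs > crit

round(c(critical = crit, p_value = p_value), 4)
```

Both routes give the same decision: the statistic falls just below the critical value, and the p-value sits just above 0.05.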

Q3

Table Of Values

v <- matrix(c(16, 24, 654, 306), ncol = 2, byrow = TRUE)

Chi-Square Test

With Continuity Correction

chisq.test(v, correct=TRUE)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  v
## X-squared = 12.496, df = 1, p-value = 0.0004079

Without Continuity Correction

chisq.test(v, correct=FALSE)

## 
##  Pearson's Chi-squared test
## 
## data:  v
## X-squared = 13.738, df = 1, p-value = 0.0002101

Since the observed $\chi^2$ value $=13.74 > \chi_{0.05}^2 (1) = 3.8415$ (or with p-value $= 0.0002$), we reject $H_0$ that “conform” and “shift” are independent.
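For a $2 \times 2$ table both statistics can also be obtained from the standard shortcut formula $\chi^2 = n(ad - bc)^2 / \{(a+b)(c+d)(a+c)(b+d)\}$, with $|ad - bc|$ reduced by $n/2$ for the continuity correction. The sketch below applies it to the same matrix `v`; the helper names `cross`, `denom`, `x2_plain` and `x2_yates` are introduced here for illustration.

```r
# Same 2 x 2 table as in the solution
v <- matrix(c(16, 24, 654, 306), ncol = 2, byrow = TRUE)

n <- sum(v)

# |ad - bc| for the table [a b; c d]
cross <- abs(v[1, 1] * v[2, 2] - v[1, 2] * v[2, 1])

# Product of the two row totals and the two column totals
denom <- prod(rowSums(v)) * prod(colSums(v))

# Pearson chi-square without and with Yates' continuity correction
x2_plain <- n * cross^2 / denom
x2_yates <- n * (cross - n / 2)^2 / denom

# Upper 5% critical value with 1 degree of freedom
crit <- qchisq(0.95, df = 1)

round(c(plain = x2_plain, yates = x2_yates, critical = crit), 4)
```

Both statistics exceed the critical value, so the decision to reject $H_0$ does not depend on whether the continuity correction is applied.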

Tutorial 4 Q1, Q2, Q3

Wei Hao Khoong

3 April 2019
