DataM: HW Exercise 0316 3
The IQ scores and behavioral problem scores of children at age 5 were examined depending on whether or not their mothers had suffered an episode of post-natal depression. The main questions of interest were:
(1) Did the two groups of children have different IQ and/or behavioral problems?
(2) Was there any evidence of a relationship between IQ and behavioral problems?
Source: Dr. C. Kumar, Institute of Psychiatry, London.
Load in the data set and check its structure
- Load in the data set and name it `dta`.
- Find the class of `dta`.
[1] "data.frame"
dta is a data frame.
- Check the structure of `dta`.
'data.frame': 94 obs. of 3 variables:
$ Dep: Factor w/ 2 levels "D","N": 2 2 2 2 1 2 2 2 2 2 ...
$ IQ : int 103 124 124 104 96 92 124 99 92 116 ...
$ BP : int 4 12 9 3 3 3 6 4 3 9 ...
`dta` is a data frame with 94 observations and 3 variables: `Dep`, `IQ`, and `BP`. `Dep` is a factor with two levels ("D" and "N"); `IQ` and `BP` are integer variables.
- Check the dimensions of `dta`.
[1] 94 3
dta has 94 rows and 3 columns.
- Find the names of the columns in `dta`.
[1] "Dep" "IQ" "BP"
- Check whether `BP`, a variable in `dta`, is a vector.
[1] TRUE
BP, a variable in dta, is a vector.
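The inspection steps above can be sketched as follows. Because this report does not give the data file name, the sketch rebuilds only the first ten rows from the `str()` preview shown earlier; `dta10` is a hypothetical stand-in for the full 94-row `dta`, so its dimensions differ from those reported.

```r
# First ten rows only, copied from the str() preview in this report.
# Assumption: dta10 is an illustrative stand-in, not the full data set.
dta10 <- data.frame(
  Dep = factor(c("N", "N", "N", "N", "D", "N", "N", "N", "N", "N"),
               levels = c("D", "N")),
  IQ  = c(103L, 124L, 124L, 104L, 96L, 92L, 124L, 99L, 92L, 116L),
  BP  = c(4L, 12L, 9L, 3L, 3L, 3L, 6L, 4L, 3L, 9L)
)

class(dta10)         # class of the object
str(dta10)           # compact view of column types and first values
dim(dta10)           # number of rows, then number of columns
names(dta10)         # column names
is.vector(dta10$BP)  # TRUE: an integer column with no extra attributes
is.vector(dta10$Dep) # FALSE: a factor carries levels and a class attribute
```

The last line shows why the check is worth running: `is.vector()` is `TRUE` only for objects without attributes other than names, so the factor column `Dep` does not qualify even though it is stored as integer codes.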
- Take the \(1^{st}\) row of `dta`. It is a slice of `dta`.
- Take the \(1^{st}\) to \(3^{rd}\) elements of `IQ`, a variable in `dta`. The result is a vector.
[1] 103 124 124
- Sort `dta` in ascending order of the variable `BP` and take the last 6 rows. In other words, this code selects the 6 rows with the highest `BP`.
- Sort `dta` in descending order of the variable `BP` and take the last 4 rows. In other words, this code selects the 4 rows with the lowest `BP`.
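A minimal sketch of the slicing and sorting steps, again on a ten-row stand-in rebuilt from the `str()` preview (the name `dta10` and the result names are assumptions for illustration, and the author's hidden code may have been written differently):

```r
# First ten rows only, copied from the str() preview in this report.
dta10 <- data.frame(
  Dep = factor(c("N", "N", "N", "N", "D", "N", "N", "N", "N", "N"),
               levels = c("D", "N")),
  IQ  = c(103L, 124L, 124L, 104L, 96L, 92L, 124L, 99L, 92L, 116L),
  BP  = c(4L, 12L, 9L, 3L, 3L, 3L, 6L, 4L, 3L, 9L)
)

first_row <- dta10[1, ]     # row slice: still a data frame with 3 columns
first_iq  <- dta10$IQ[1:3]  # element slice of one column: a plain vector

# Ascending sort by BP, then the last 6 rows are the 6 highest BP scores.
highest6 <- tail(dta10[order(dta10$BP), ], 6)

# Descending sort by BP, then the last 4 rows are the 4 lowest BP scores.
lowest4 <- tail(dta10[order(dta10$BP, decreasing = TRUE), ], 4)

first_row
first_iq
highest6
lowest4
```

`order()` returns the row positions that would sort the column, so indexing the data frame with it reorders whole rows rather than just the `BP` values.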
Data visualization
- Draw the histogram of `dta$IQ` with the x-axis label “IQ” and without a title.
- Draw box plots of `dta$BP` grouped by `dta$Dep`, with labels on the x-axis and y-axis.
Compared with participants in the non-depression group, participants in the depression group seemed to have higher levels of behavioral problems.
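The histogram and box plot calls are not shown in this report. One plausible form is sketched below on the ten-row stand-in rebuilt from the `str()` preview; the axis label strings for the box plot are assumptions, and with only ten rows (one in group D) the pictures will not match the full-data plots described above.

```r
# First ten rows only, copied from the str() preview in this report.
dta10 <- data.frame(
  Dep = factor(c("N", "N", "N", "N", "D", "N", "N", "N", "N", "N"),
               levels = c("D", "N")),
  IQ  = c(103L, 124L, 124L, 104L, 96L, 92L, 124L, 99L, 92L, 116L),
  BP  = c(4L, 12L, 9L, 3L, 3L, 3L, 6L, 4L, 3L, 9L)
)

# Histogram of IQ: x-axis labeled "IQ", empty string suppresses the title.
h <- hist(dta10$IQ, xlab = "IQ", main = "")

# Box plots of BP by depression group, with both axes labeled.
b <- boxplot(BP ~ Dep, data = dta10,
             xlab = "Maternal depression group",
             ylab = "Behavior problem score")
```

Both functions draw the plot and also return the computed summaries invisibly (`h$counts` holds the bin counts, `b$stats` the five-number summaries), which is handy for checking what was drawn.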
- Draw the scatter plot of `dta$IQ` against `dta$BP`, with different colors representing the groups of `Dep`. Set the point style and add a grid to the plot.
plot(IQ ~ BP, data = dta, pch = 20, col = dta$Dep,
xlab = "Behavior problem score", ylab = "IQ")
grid()
- Draw the scatter plot of `dta$BP` against `dta$IQ`, labeling each point with its `Dep` group. Add the regression lines of `BP` on `IQ` (y = `BP`, x = `IQ`), with a different line type for each `Dep` group.
plot(BP ~ IQ, data = dta, type = "n",
ylab = "Behavior problem score", xlab = "IQ")
text(dta$IQ, dta$BP, labels = dta$Dep, cex = 0.5)
abline(lm(BP ~ IQ, data = dta, subset = Dep == "D"))
abline(lm(BP ~ IQ, data = dta, subset = Dep == "N"), lty = 2)
Compared with the non-depression group, the depression group showed a more negative correlation (i.e., \(r<0\)) between IQ and behavioral problems.
Hypothesis testing
For question (1): Independent two-sample t-test
- Did the two groups of children have different IQ?
\(H_0: \mu_{IQ(D)} = \mu_{IQ(N)}\)
\(H_1: \mu_{IQ(D)} \neq \mu_{IQ(N)}\)
Welch Two Sample t-test
data: IQ by Dep
t = -1.6374, df = 15.53, p-value = 0.1216
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-26.926586 3.490299
sample estimates:
mean in group D mean in group N
101.0667 112.7848
Since \(p > \alpha = .05\), we fail to reject \(H_0\). The two groups of children did not differ significantly in IQ.
- Did the two groups of children have different behavioral problems?
\(H_0: \mu_{BP(D)} = \mu_{BP(N)}\)
\(H_1: \mu_{BP(D)} \neq \mu_{BP(N)}\)
Welch Two Sample t-test
data: BP by Dep
t = 1.4924, df = 17.14, p-value = 0.1538
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-0.6637017 3.8788916
sample estimates:
mean in group D mean in group N
7.000000 5.392405
Since \(p > \alpha = .05\), we fail to reject \(H_0\). The two groups of children did not differ significantly in behavioral problems.
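The two outputs above have the form produced by `t.test(IQ ~ Dep, data = dta)` and `t.test(BP ~ Dep, data = dta)`. Since the data file is not available here, the sketch below instead rechecks the reported p-values and confidence intervals from the printed group means, t statistics, and Welch degrees of freedom. The helper `welch_check()` is a hypothetical name introduced only for this illustration.

```r
# Recompute the p-value and 95% CI of a Welch t-test from its printed summary.
# Inputs are the rounded values shown in the output, so results agree with
# the report only up to rounding.
welch_check <- function(mean_d, mean_n, t, df, conf = 0.95) {
  diff <- mean_d - mean_n             # estimated difference in means (D - N)
  se   <- diff / t                    # standard error implied by t = diff / se
  crit <- qt(1 - (1 - conf) / 2, df)  # two-sided critical value
  list(diff = diff,
       se   = se,
       p    = 2 * pt(-abs(t), df),    # two-sided p-value
       ci   = diff + c(-1, 1) * crit * se)
}

iq <- welch_check(101.0667, 112.7848, t = -1.6374, df = 15.53)
bp <- welch_check(7.000000, 5.392405, t = 1.4924, df = 17.14)

iq$p   # about 0.1216
iq$ci  # about -26.93 and 3.49
bp$p   # about 0.1538
bp$ci  # about -0.66 and 3.88
```

Both intervals contain 0, which is the same conclusion as \(p > .05\): a two-sided test at level \(\alpha\) rejects exactly when the \(1-\alpha\) confidence interval excludes 0.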
For question (2): Correlation test and linear regression analysis
Was there any evidence of a relationship between IQ and behavioral problems?
Correlation test
\(H_0: \rho_{IQ, BP} = 0\)
\(H_1: \rho_{IQ, BP} \neq 0\)
Pearson's product-moment correlation
data: dta$IQ and dta$BP
t = -3.8088, df = 92, p-value = 0.0002518
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.5319037 -0.1798969
sample estimates:
cor
-0.3690615
Since \(p < \alpha =.05\), we reject \(H_0\). IQ is significantly correlated with behavioral problems.
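As a worked check of the output above, the t statistic, p-value, and confidence interval can be recovered from the sample correlation and the sample size alone. This sketch uses only the reported \(r = -0.3690615\) and \(n = 94\); the variable names are chosen for illustration.

```r
r <- -0.3690615  # sample correlation reported above
n <- 94          # number of children
df <- n - 2      # degrees of freedom of the test

# Test statistic for H0: rho = 0.
t_stat <- r * sqrt(df) / sqrt(1 - r^2)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(-abs(t_stat), df)

# 95% CI via the Fisher z transformation, then back-transformed.
z  <- atanh(r)
ci <- tanh(z + c(-1, 1) * qnorm(0.975) / sqrt(n - 3))

t_stat  # about -3.8088
p_val   # about 0.00025
ci      # about -0.532 and -0.180
```

The interval is not symmetric around \(r\) because it is built on the transformed scale, where the sampling distribution is close to normal, and then mapped back to the correlation scale.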
Regression analysis
\(H_0: \beta_{IQ} = 0\)
\(H_1: \beta_{IQ} \neq 0\)
Call:
lm(formula = BP ~ IQ, data = dta)
Residuals:
Min 1Q Median 3Q Max
-5.9828 -2.3564 -0.4111 2.1210 7.2399
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13.18280 2.00180 6.585 2.76e-09 ***
IQ -0.06792 0.01783 -3.809 0.000252 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.983 on 92 degrees of freedom
Multiple R-squared: 0.1362, Adjusted R-squared: 0.1268
F-statistic: 14.51 on 1 and 92 DF, p-value: 0.0002518
Since \(p < \alpha =.05\), we reject \(H_0\). IQ is significantly associated with behavioral problems.
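With a single predictor, the regression summary and the correlation test carry the same information. The sketch below, using only numbers printed in the two outputs, checks that \(R^2 = r^2\), that the F statistic is the square of the slope's t value, and shows how the fitted line turns an IQ score into a predicted behavior problem score. `predict_bp()` is a hypothetical helper, and IQ = 100 is an arbitrary example value.

```r
r <- -0.3690615      # correlation from the correlation test
n <- 94              # number of observations
b0 <- 13.18280       # reported intercept
b1 <- -0.06792       # reported slope for IQ
t_slope <- -3.809    # reported t value of the slope

r_squared     <- r^2                                     # about 0.1362
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - 2) # about 0.1268
f_stat        <- t_slope^2                               # about 14.51

# Predicted BP for a given IQ from the fitted line BP = b0 + b1 * IQ.
predict_bp <- function(iq) b0 + b1 * iq

predict_bp(100)                    # 6.3908
predict_bp(110) - predict_bp(100)  # -0.6792: ten IQ points lower BP by about 0.68
```

An \(R^2\) of about 0.14 means that IQ accounts for roughly 14% of the variance in behavioral problem scores, so the association is statistically significant but modest in size.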