This report uses the MacArthur-Bates Communicative Development Inventories (CDIs), a family of parent-report instruments on early language acquisition provided by Wordbank, to explore the relationship between language comprehension and production and gesture development in infants. The first section describes the characteristics of the sample used in the analysis. The second section explores the relationship between language comprehension and production and gesture development using a linear regression model and a beta regression model. Finally, the third section presents the results of a hierarchical logistic regression model used to investigate whether there is a relationship between children's demographic characteristics and the likelihood of producing gestures between 8 and 18 months.
For the current analysis, we focus on \(1{,}149\) native English-speaking infants (\(49.1\%\) female, \(72.4\%\) White) aged between 8 and 18 months (Mean = 13, SD = 2.8) (Table 1).
Table 1 Descriptive statistics for demographic information
| Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|
| age [integer] | | 11 distinct values | 0 (0.0%) |
| sex [factor] | | | 43 (3.7%) |
| birth_order [factor] | | | 102 (8.9%) |
| ethnicity [factor] | | | 82 (7.1%) |
| mom_ed [factor] | | | 81 (7.0%) |
Generated by summarytools 1.0.0 (R version 4.1.2)
2021-11-12
The data set includes total scores for child comprehension and production. These scores are based on parent reports of which words, from a representative sample covering many semantic categories (e.g., animal names, household items) and syntactic categories (e.g., action words, connectives), the child currently understands and says, respectively.
The data set also includes five sub-scales on gesture development: first gestures (\(12\) items), adult gestures (\(15\) items), parent gestures (\(13\) items), object gestures (\(17\) items) and games gestures (\(6\) items). Scores for each sub-scale were estimated using a unidimensional multiple-group IRT model based on the two-parameter logistic model (2PLM) for the binary item responses and the Graded Response Model (GRM) for the polytomous item responses. The 2PLM is a generalization of the Rasch model, which assumes that the probability of a positive response to item \(i\) depends only on the difference between child \(v\)'s trait level \(\theta_{v}\) and the difficulty of the item \(b_{i}\). In addition, the 2PLM postulates that, for every item, the association between this difference and the response probability depends on an item discrimination parameter \(a_{i}\). The discrimination parameter describes how strongly an item relates to the latent trait and, therefore, how well it discriminates between children with different trait levels compared with other items on the test:
\[ Pr(x_{vi} = 1 | \theta_{v}, b_{i}, a_{i}) = \frac{\exp(Da_{i}(\theta_{v}-b_{i}))}{1 + \exp(Da_{i}(\theta_{v}-b_{i}))} \]
where \(D\) is a scaling constant. The GRM, like the 2PLM, is a mathematical model for the probability that an individual will respond in a certain response category (\(k\)) or above on a particular item, and it is appropriate for polytomous, ordinal items. The GRM is specified as follows:
\[ Pr(x_{vi} \geq k | \theta_{v}, b_{ik}, a_{i}) = \frac{1}{1 + \exp(-a_{i}(\theta_{v}-b_{ik}))} \]
Here, \(\theta_{v}\) represents the latent score for child \(v\), \(a_{i}\) represents the discrimination parameter for item \(i\), and \(b_{ik}\) represents the location parameter for item \(i\) and score category \(k\). The discrimination parameter (\(a_{i}\)) indicates how well an item can distinguish between children with very similar latent abilities. The location parameter (\(b_{ik}\)) indicates how high a level of gesture ability, \(\theta_{v}\), a child needs in order to respond at or above category \(k\). Before estimating scores, item fit was screened: difficulty parameters were considered acceptable within the range \(-3.0\) to \(3.0\), and discrimination parameters within the range \(0.5\) to \(5.0\).
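The two response models above can be sketched in a few lines of Python. This is only an illustration of the formulas, not the estimation routine used for the report: the item parameters are hypothetical, and the scaling constant `D` is assumed to default to 1.

```python
import math

def p_2pl(theta, a, b, D=1.0):
    """Probability of a positive response to a binary item under the 2PLM.

    D is the scaling constant; D = 1.0 is an assumption made for this sketch.
    """
    return 1.0 / (1.0 + math.exp(-D * a * (theta - b)))

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under the GRM for ordered thresholds b_i1 < b_i2 < ..."""
    # Cumulative probabilities P(x >= k): 1 for the lowest category, 0 beyond the highest.
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cum += [0.0]
    # P(x = k) = P(x >= k) - P(x >= k + 1)
    return [cum[k] - cum[k + 1] for k in range(len(thresholds) + 1)]

# Hypothetical item parameters, chosen only for illustration.
print(p_2pl(theta=0.0, a=1.2, b=0.0))  # a child with theta equal to b has probability 0.5
print(grm_category_probs(theta=0.5, a=1.5, thresholds=[-1.0, 1.0]))
```

The GRM helper differences adjacent cumulative probabilities, so the category probabilities sum to one by construction.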
The sub-scale scores can be interpreted on a standard normal scale, where −1 and +1 are one standard deviation below and above the mean, respectively. Table 2 shows Cronbach's \(\alpha\) coefficient, a measure of internal consistency, for each gesture sub-scale. Except for the games gestures sub-scale, all gesture sub-scales showed acceptable internal consistency.
Furthermore, Figures 1 and 2 present the item characteristic curves for each item of the first gestures and object gestures sub-scales, which were used in the subsequent analysis.
| Sub-scale | No. of items | Cronbach's \(\alpha\) |
|---|---|---|
| Gestures adult | 15 | 0.885 |
| First Gestures | 12 | 0.852 |
| Gestures games | 6 | 0.593 |
| Gestures objects | 17 | 0.891 |
| Gestures parent | 13 | 0.885 |
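For readers unfamiliar with the coefficient, the following sketch computes Cronbach's \(\alpha\) from its standard definition, \(\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)\), where \(k\) is the number of items, \(s_i^2\) the variance of item \(i\), and \(s_T^2\) the variance of the total score. The response matrix is made up for illustration and is not taken from the Wordbank data.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-child item-score lists (equal length k)."""
    k = len(rows[0])
    # Sample variance of each item across children.
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total (sum) score.
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical binary responses of five children to three gesture items.
toy_responses = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [1, 1, 1],
    [0, 1, 0],
]
print(round(cronbach_alpha(toy_responses), 3))
```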
Figure 1 Item Characteristic curves for First Gestures Sub-scale, 2PL Model
Figure 2 Item Characteristic curves for Gestures Objects Sub-scale, 2PL Model
Table 3 shows descriptive information for the language acquisition scores (comprehension and production) and the gesture sub-scales. Production scores range from 0 to 376 (Mean = \(24.3\), SD = \(45.7\)), while comprehension scores range from 0 to 396 (Mean = \(114.9\), SD = \(94.5\)). Scores for the first gestures and object gestures sub-scales take values between \(-2.4\) and \(2.3\), with a mean of \(0\) and a standard deviation of approximately \(1\).
Table 3 Descriptive statistics for language acquisition and gesture development scales
| Variable | Stats / Values | Freqs (% of Valid) | Missing |
|---|---|---|---|
| comprehension [integer] | | 310 distinct values | 0 (0.0%) |
| production [integer] | | 145 distinct values | 0 (0.0%) |
| gestures_adult [numeric] | | 575 distinct values | 15 (1.3%) |
| gestures_first [numeric] | | 881 distinct values | 4 (0.3%) |
| gestures_games [numeric] | | 60 distinct values | 11 (1.0%) |
| gestures_objects [numeric] | | 634 distinct values | 11 (1.0%) |
| gestures_parent [numeric] | | 350 distinct values | 15 (1.3%) |
The figure below displays the distribution of child age, the language acquisition scales, and the gesture sub-scales used in the subsequent analysis. It shows that the distribution of language production scores is severely positively skewed (right-skewed); that is, most values are clustered at the low end of the scale, with a long tail to the right. Indeed, the skewness and kurtosis values were \(3.54\) and \(15.20\), respectively. According to George & Mallery (2010), skewness and kurtosis values between \(-2\) and \(+2\) are considered acceptable as evidence of a normal univariate distribution, which is a strong assumption for linear regression analysis. In contrast, the comprehension score has skewness and kurtosis within this range (\(0.94\) and \(0.084\), respectively).
Figure 3 Distribution of variables
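As a rough illustration of how such shape statistics are obtained, the sketch below computes moment-based skewness and excess kurtosis. It assumes the simple population-moment estimators and excess kurtosis (a normal distribution gives 0); the report's software may apply a bias correction, so values need not match exactly. The score vectors are invented.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    m4 = sum((x - mean) ** 4 for x in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Hypothetical scores: a symmetric sample and a right-skewed one.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [0, 0, 0, 0, 1, 2, 10]
print(skewness_and_excess_kurtosis(symmetric))
print(skewness_and_excess_kurtosis(right_skewed))
```

A sample with many low values and a few large ones, like the production scores, yields a clearly positive skewness.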
Finally, the correlation plot below provides a visual representation of the bivariate relationships between the variables included in the analysis. The scatterplots include a fitted linear regression line (red) and a local polynomial regression curve (green); the latter is a nonparametric method that relaxes the linearity assumption of conventional regression. The results suggest that the relationship between production and gestures could be non-linear. Furthermore, the variables included in the analysis were positively and moderately to fairly strongly correlated, with Pearson correlation coefficients ranging between \(0.493\) and \(0.773\).
Figure 3 Scatterplot and Correlation between variables
Table 4 presents the results of the linear regression models, which examine the degree to which age, first gestures, and object gestures predict language comprehension and production.
The results show that age, first gestures and object gestures are positive and significant predictors of both language acquisition indicators. On average, holding the other predictors constant, a one-month increase in child age is associated with increases of \(5.73\) and \(3.38\) points in comprehension and production, respectively, while a one-point increase (equal to one standard deviation) in the object gestures score is associated with increases of \(42.77\) and \(11.70\) points in comprehension and production, respectively. Furthermore, the explanatory variables explained \(58.1\%\) and \(32.4\%\) of the variance in the comprehension and production scales, respectively.
| | Comprehension | | | Production | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | std.Error | p-value | Estimates | std.Error | p-value |
| Intercept | 40.62 ** | 13.67 | 0.003 | -19.62 * | 8.41 | 0.020 |
| Age (Months) | 5.73 *** | 1.04 | <0.001 | 3.38 *** | 0.64 | <0.001 |
| First Gestures | 23.92 *** | 3.17 | <0.001 | 8.93 *** | 1.95 | <0.001 |
| Gestures Objects | 42.77 *** | 3.60 | <0.001 | 11.70 *** | 2.22 | <0.001 |
| Observations | 1137 | | | 1137 | | |
| R2 / R2 adjusted | 0.581 / 0.579 | | | 0.324 / 0.322 | | |
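To make the coefficients in Table 4 concrete, the following sketch plugs them into the fitted linear equations. The coefficient values are copied from the table; the function name and the example child are hypothetical.

```python
# Coefficients copied from Table 4 (linear regression models).
COEFS = {
    "comprehension": {"intercept": 40.62, "age": 5.73, "first": 23.92, "objects": 42.77},
    "production": {"intercept": -19.62, "age": 3.38, "first": 8.93, "objects": 11.70},
}

def predict(outcome, age_months, gestures_first=0.0, gestures_objects=0.0):
    """Predicted score from the fitted linear model for the given outcome."""
    c = COEFS[outcome]
    return (
        c["intercept"]
        + c["age"] * age_months
        + c["first"] * gestures_first
        + c["objects"] * gestures_objects
    )

# A 13-month-old with average gesture scores (both sub-scales at 0).
print(round(predict("comprehension", 13), 2))
print(round(predict("production", 13), 2))
```

For a child of average age (13 months) with average gesture scores, the predictions are about 115.1 and 24.3 points, close to the sample means of 114.9 and 24.3 reported for comprehension and production.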
Linear regression makes several assumptions about the data: linearity (the relationship between the predictors and the outcome is assumed to be linear), normality of the residuals, homoscedasticity (constant variance of the residuals), and independence of the residual errors. The figures below check whether these assumptions hold for the regression models estimated above.
The linearity assumption can be checked by inspecting the Residuals vs Fitted plot, where a horizontal line without distinct patterns indicates a linear relationship. In both regression models there is no pattern in the residual plot, suggesting that we can assume a linear relationship between the predictors and the outcome variables.
The normality of the residuals can be verified using the Normal Q-Q plot, in which the residuals should approximately follow a straight line. For the comprehension model, almost all the points fall approximately along this reference line, so we can assume normality. In the production model, however, the residuals do not follow a normal distribution. The assumption of homogeneity of variance is checked using the Scale-Location (or Spread-Location) plot: if the residuals are homoscedastic, they are spread equally along the range of fitted values. The residuals of the production model are heteroscedastic, because the plot shows that the variability of the residuals increases with the fitted value of the outcome, suggesting non-constant variance in the residual errors. Finally, the Residuals vs Leverage plot shows that there are extreme values that influence the estimation results. In sum, the results suggest that linear regression is not the most appropriate method for predicting the language acquisition scales, particularly language production.
Figure 4 Linear Regression Diagnostic plots. Results for language comprehension
Figure 5 Linear Regression Diagnostic plots. Results for language production
The class of beta regression models is an alternative approach for data that exhibit features such as heteroskedasticity or skewness. The beta regression model, introduced by Ferrari and Cribari-Neto (2004), is useful for modeling continuous variables \(y\) that take values in the open standard unit interval \((0, 1)\). It is based on the assumption that the dependent variable is beta-distributed and that its mean is related to a set of regressors through a linear predictor with unknown coefficients and a link function. If the variable \(y\) also takes the extreme values 0 and 1, a useful transformation in practice, proposed by Smithson and Verkuilen (2006), is \(\frac{y(n-1)+0.5}{n}\), where \(n\) is the sample size.
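The transformation can be sketched as follows. How the raw scores were mapped to the unit interval is not stated in this report, so the rescaling step (dividing by a maximum score) and the `max_score` value are assumptions made purely for illustration.

```python
def to_unit_interval(score, max_score):
    """Rescale a raw score to [0, 1]; max_score is a hypothetical scale maximum."""
    return score / max_score

def squeeze(y, n):
    """Smithson and Verkuilen (2006) transformation: maps [0, 1] into (0, 1)."""
    return (y * (n - 1) + 0.5) / n

n = 1137          # number of observations in the regression models
max_score = 396   # assumed maximum, used only for this example

for raw in (0, 198, 396):
    y = to_unit_interval(raw, max_score)
    print(raw, squeeze(y, n))
```

After the transformation, a score of 0 maps to a value slightly above 0 and the maximum maps to a value slightly below 1, so every observation lies strictly inside the unit interval.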
Tables 6 and 7 compare the linear and beta regression models for comprehension and production, respectively, while Figures 6 and 7 below show the diagnostic plots used to assess goodness of fit. The results indicate that beta regression is a better approach for modeling these variables, particularly language production, where the explained variance increases to \(55.3\%\) compared with the OLS model (adjusted \(R^2 = 32.2\%\)). In both models, the effects of age, first gestures and object gestures are positive and statistically significant.
| | Linear Regression | | | Beta-Regression | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | std.Error | p-value | Estimates | std.Error | p-value |
| Intercept | 40.62 ** | 13.67 | 0.003 | 0.18 *** | 0.03 | <0.001 |
| Age (Months) | 5.73 *** | 1.04 | <0.001 | 1.06 *** | 0.01 | <0.001 |
| First Gestures | 23.92 *** | 3.17 | <0.001 | 1.51 *** | 0.06 | <0.001 |
| Gestures Objects | 42.77 *** | 3.60 | <0.001 | 1.71 *** | 0.08 | <0.001 |
| Observations | 1137 | | | 1137 | | |
| R2 / R2 adjusted | 0.581 / 0.579 | | | 0.558 | | |
| AIC | 12597.375 | | | -1492.095 | | |
| | Linear Regression | | | Beta-Regression | | |
|---|---|---|---|---|---|---|
| Predictors | Estimates | std.Error | p-value | Estimates | std.Error | p-value |
| Intercept | -19.62 * | 8.41 | 0.020 | 0.02 *** | 0.00 | <0.001 |
| Age (Months) | 3.38 *** | 0.64 | <0.001 | 1.09 *** | 0.02 | <0.001 |
| First Gestures | 8.93 *** | 1.95 | <0.001 | 1.35 *** | 0.06 | <0.001 |
| Gestures Objects | 11.70 *** | 2.22 | <0.001 | 1.48 *** | 0.08 | <0.001 |
| Observations | 1137 | | | 1137 | | |
| R2 / R2 adjusted | 0.324 / 0.322 | | | 0.553 | | |
| AIC | 11493.654 | | | -5106.356 | | |
Figure 6 Beta Regression Diagnostic plots. Results for language comprehension
Figure 7 Beta Regression Diagnostic plots. Results for language production
The gestures that children produce early in development are related to the progress they make in language acquisition. Furthermore, once language has been mastered, children's gestures facilitate their learning of other concepts. This section presents the results of a hierarchical logistic regression model used to explore the relationship between children's sociodemographic characteristics and the likelihood of producing gestures between 8 and 18 months. In particular, the analysis takes into account that the gesture data have a hierarchical structure: gesture items are nested within children, that is, each child contributes repeated observations. Overall, the data set includes \(65{,}415\) gesture item responses (level 1) observed in \(1{,}149\) children (groups, or level 2). The following models were estimated:
Model 0: Logistic model without random effects. This model does not include any explanatory variable and assumes that gesture responses are independent. This assumption is inappropriate because the gesture items are repeated observations from each child.
\[ \ln\left(\frac{p(Y=1)}{1-p(Y=1)}\right)=\beta_{0} \]
Model 1: Logistic model with a random intercept. This model does not include any explanatory variable but adds a random intercept for children (\(\tau_{00}\)), meaning that the probability of producing gestures (\(\beta_{0}\)) is expected to vary between children. In the random-effects models, \(\sigma^{2}\) denotes the level-1 residual variance, which is fixed at \(\pi^{2}/3 \approx 3.29\) for the logistic distribution and is therefore not a term of the linear predictor.
\[ \ln\left(\frac{p(Y=1)}{1-p(Y=1)}\right)=\beta_{0} \]
\[ \beta_{0}= \gamma_{00} + \tau_{00} \]
Model 2: Logistic model with two random intercepts. This model includes two random intercepts, so the probability of producing gestures (\(\beta_{0}\)) is allowed to vary between children (\(\tau_{00}\)) and between types of gestures (\(\tau_{01}\)).
\[ \ln\left(\frac{p(Y=1)}{1-p(Y=1)}\right)=\beta_{0} \]
\[ \beta_{0}= \gamma_{00} + \tau_{00} + \tau_{01} \]
Model 3: Logistic model with two random intercepts and fixed effects. In addition to the random intercepts, this model includes explanatory variables at level 2 (age in months, mother's education (1 = college degree or above), and minority status (1 = non-White)) to explain differences between children in the probability of producing gestures (\(\beta_{0}\)).
\[ \ln\left(\frac{p(Y=1)}{1-p(Y=1)}\right)=\beta_{0} \]
\[ \beta_{0}= \gamma_{00} + \gamma_{01} \cdot age + \gamma_{02} \cdot momedu + \gamma_{03} \cdot minority + \tau_{00} + \tau_{01} \]
Table 10 presents a summary of the model estimates. Coefficients are presented as odds ratios (OR), a measure of the strength of association between an explanatory variable and an outcome variable. An OR \(>1\) means that higher values of the explanatory variable are associated with greater odds of the outcome, while an OR \(< 1\) means that they are associated with lower odds.
The intercept in Model 0 (an odds ratio below 1, that is, a negative coefficient on the log-odds scale) indicates that, in this sample, a gesture item is more likely not to be produced than to be produced. Indeed, \(53.77\%\) of the gesture items were not produced by the children in the sample.
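The link between the Model 0 intercept and the share of gestures not produced can be checked with a short calculation. The sketch below converts the intercept odds of 0.86 (taken from the summary table) into a probability; the helper name is hypothetical.

```python
def odds_to_probability(odds):
    """Convert odds p / (1 - p) back into the probability p."""
    return odds / (1.0 + odds)

intercept_odds = 0.86  # Model 0 intercept, reported as an odds ratio
p_produced = odds_to_probability(intercept_odds)
p_not_produced = 1.0 - p_produced

print(round(p_produced, 4))
print(round(p_not_produced, 4))
```

The result is a probability of roughly 0.46 of producing a gesture and 0.54 of not producing it, in line with the 53.77% reported above; the small difference comes from the rounding of the odds ratio.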
Model 1 examines whether the intercept varies from child to child; that is, it treats the probability of producing gestures as a random effect. The estimated variance of this effect (\(\tau_{00}\)) is \(1.18\). This random effect is statistically significant (the confidence interval for its standard deviation, .sig01 in the table below, excludes zero), so we conclude that the probability of producing gestures varies between children.
| | 2.5 % | 97.5 % |
|---|---|---|
| .sig01 | 1.0344090 | 1.1405162 |
| (Intercept) | -0.2708215 | -0.1340603 |
In Model 2, the two estimated random effects (\(\tau_{00} = 1.54\) for children and \(\tau_{01} = 0.78\) for type of gesture) are statistically significant (Table 9), which means that the probability of producing gestures varies between children and between types of gestures.
| | 2.5 % | 97.5 % |
|---|---|---|
| .sig01 | 1.1826768 | 1.3027953 |
| .sig02 | 0.5284464 | 1.9276818 |
| (Intercept) | -1.0560035 | 0.8507078 |
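One way to gauge how much of the variation is attributable to the grouping factors is the intraclass correlation on the latent logistic scale. The sketch below assumes the usual latent-variable formulation, in which the level-1 variance is \(\pi^{2}/3\); the variance components are those reported for Models 1 and 2, and the function name is hypothetical.

```python
import math

# Level-1 residual variance of the standard logistic distribution (about 3.29).
SIGMA2_LOGISTIC = math.pi ** 2 / 3

def icc(random_variances, residual_variance=SIGMA2_LOGISTIC):
    """Share of latent-scale variance due to the random intercepts."""
    between = sum(random_variances)
    return between / (between + residual_variance)

icc_model_1 = icc([1.18])        # children only
icc_model_2 = icc([1.54, 0.78])  # children and type of gesture

print(round(icc_model_1, 3))
print(round(icc_model_2, 3))
```

Under these assumptions the values are about 0.264 and 0.414, which coincide with the second figures reported in the R2 row of the model summary table for Models 1 and 2.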
Finally, Model 3 adds fixed effects at level 2 (child) to estimate whether there is a relationship between children's sociodemographic variables and the likelihood that a child will produce gestures. Age has a positive and significant effect, which means that the odds of producing gestures increase as children get older. The effect of mother's education is also significant; however, having a mother with a college degree or above is associated with lower odds of gesture production. The effect of minority status is not statistically significant. With respect to the mother's education level, the empirical evidence in this field has pointed to its influence on early language development. Nevertheless, this effect seems to be mediated by the linguistic input that the child receives and the quality of parental communication (e.g., direct speech, routines) (Serrat-Sellabona et al., 2021). It is therefore possible that the effect of mother's education is mediated by other variables that were not included in the analysis, or that it has a greater impact in later development than in the pre-linguistic stages of language acquisition.
| | Model 0 | | | Model 1 | | | Model 2 | | | Model 3 | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Predictors | Odds Ratios | std.Error | p-value | Odds Ratios | std.Error | p-value | Odds Ratios | std.Error | p-value | Odds Ratios | std.Error | p-value |
| Intercept | 0.86 *** | 0.01 | <0.001 | 0.82 *** | 0.03 | <0.001 | 0.90 | 0.36 | 0.794 | 0.01 *** | 0.00 | <0.001 |
| Age (Months) | | | | | | | | | | 1.40 *** | 0.01 | <0.001 |
| Mother Education (College degree or above) | | | | | | | | | | 0.90 * | 0.05 | 0.047 |
| Minority | | | | | | | | | | 1.04 | 0.06 | 0.477 |
| Random Effects | | | | | | | | | | | | |
| σ2 | | | | 3.29 | | | 3.29 | | | 3.29 | | |
| τ00 data_id | | | | 1.18 | | | 1.54 | | | 0.56 | | |
| τ00 type | | | | | | | 0.78 | | | 0.78 | | |
| N data_id | | | | 1044 | | | 1044 | | | 1044 | | |
| N type | | | | | | | 5 | | | 5 | | |
| Observations | 65415 | | | 65415 | | | 65415 | | | 65415 | | |
| R2 Tjur | 0.000 | | | 0.000 / 0.264 | | | 0.000 / 0.414 | | | 0.163 / 0.406 | | |
| AIC | 90314.091 | | | 79769.861 | | | 72304.183 | | | 71421.092 | | |
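The odds ratios of Model 3 can be read as percentage changes in the odds, as the sketch below illustrates. The odds ratios are taken from the summary table; the helper names are hypothetical, and compounding the age effect over several months assumes that the effect is constant on the log-odds scale, as the model specifies.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds for a one-unit increase in the predictor."""
    return (odds_ratio - 1.0) * 100.0

def cumulative_odds_ratio(odds_ratio, units):
    """Odds ratio implied by an increase of several units in the predictor."""
    return odds_ratio ** units

or_age = 1.40     # per additional month of age
or_mom_ed = 0.90  # mother with a college degree or above

print(round(percent_change_in_odds(or_age), 1))     # about +40% per month
print(round(percent_change_in_odds(or_mom_ed), 1))  # about -10%
print(round(cumulative_odds_ratio(or_age, 3), 2))   # three additional months
```

Each additional month of age therefore multiplies the odds of producing a gesture by 1.40, and a three-month difference multiplies them by about 2.74.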
Cribari-Neto, F. and Zeileis, A. (2010). Beta Regression in R. Journal of Statistical Software, 34(2).
George, D. and Mallery, P. (2010). SPSS for Windows Step by Step: A Simple Guide and Reference, 17.0 Update. 10th Edition, Pearson, Boston.
Serrat-Sellabona, E., Aguilar-Mediavilla, E., Sanz-Torrent, M., Andreu, L., Amadó, A., and Serra, M. (2021). Sociodemographic and Pre-Linguistic Factors in Early Vocabulary Acquisition. Children, 8, 206.
Smithson, M. and Verkuilen, J. (2006). A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54–71.