Note: These exercises are reproduced from Gries (2013)
Instructions
- Set your working directory with the setwd() function or via the RStudio menu
(Session >> Set Working Directory).
- Code chunks begin with three back ticks followed by {r} (```{r}) and end with three back ticks. You can add as much
space as you need within the chunks, but do not delete the back ticks or
otherwise modify the chunks in any way, or the file will cause errors
when compiled.
- To compile the file, click the Knit button. This will create an HTML file in your working
directory.
Data description
The file 04_exercises_01-09.csv contains data from a
corpus study on the alternation of particle placement that was
introduced in Section 1.3.1 of Gries (2013):
Construction Verb ~ Object ~ Particle
E.g. He picked [the book] up.
Construction Verb ~ Particle ~ Object
E.g. He picked up [the book].
It contains the following variables:
- CASE: the number of the data point
- MEDIUM: whether the example is from spoken or written
language
- CONSTRUCTION: which construction is used
(V_DO_PRT: verb-direct object-particle;
V_PRT_DO: verb-particle-direct object)
- DO_COMPLEXITY: how complex the direct object is (3
levels)
- DO_LENGTH_SYLL: how long the direct object is (in
syllables)
- PP: whether the verb-particle construction is followed by
a directional prepositional phrase (2 levels)
- DO_ANIMACY: whether the referent of the direct object is
animate or inanimate (2 levels)
- DO_CONCRETENESS: whether the referent of the direct
object is concrete or abstract (2 levels)
Load the file “04_exercises_01-09.csv” into a data frame
called vpcs and check whether the import was
successful.
setwd("~/Desktop/Satistics for linguistics/Exercise")
vpcs <- read.csv("04_exercises_01-09.csv", sep = "\t", header = TRUE, stringsAsFactors = TRUE)
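A quick way to confirm that a tab-separated import worked is to look at the dimensions, the first rows, and the column types. The following is only a sketch: it writes a tiny made-up file (the temporary path and all values are hypothetical, not the exercise data) and reads it back with the same arguments as above.

```r
# Hypothetical miniature file; the values are invented for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("CASE\tMEDIUM\tDO_LENGTH_SYLL",
             "1\tspoken\t2",
             "2\twritten\t5"), tmp)

demo <- read.csv(tmp, sep = "\t", header = TRUE, stringsAsFactors = TRUE)

dim(demo)            # number of rows and columns
head(demo)           # first rows, to see whether the columns were split correctly
sapply(demo, class)  # column types: integer vs. factor
```

If the separator were wrong, the whole header would typically end up in a single column, which dim() makes visible immediately.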
Print a summary of all of the variables in the data frame.
str(vpcs)
## 'data.frame': 200 obs. of 8 variables:
## $ CASE : int 1 2 3 4 5 6 7 8 9 10 ...
## $ MEDIUM : Factor w/ 2 levels "spoken","written": 1 1 1 1 1 1 1 1 1 1 ...
## $ CONSTRUCTION : Factor w/ 2 levels "V_DO_PRT","V_PRT_DO": 2 2 2 2 2 2 2 2 2 2 ...
## $ DO_COMPLEXITY : Factor w/ 3 levels "claus-modif",..: 3 3 2 2 3 2 3 3 2 2 ...
## $ DO_LENGTH_SYLL : int 2 2 3 8 2 4 4 2 8 5 ...
## $ PP : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 2 ...
## $ DO_ANIMACY : Factor w/ 2 levels "animate","inanimate": 2 2 2 2 2 2 2 2 2 2 ...
## $ DO_CONCRETENESS: Factor w/ 2 levels "abstract","concrete": 1 2 1 1 2 1 1 1 1 1 ...
Attach the data frame so that you can easily access the variables.
attach(vpcs)
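As a possible alternative to attaching, with() evaluates an expression inside a data frame without changing the search path. The sketch below uses a made-up data frame named demo (a hypothetical stand-in, not vpcs).

```r
# Made-up data frame standing in for vpcs (hypothetical values).
demo <- data.frame(
  CONSTRUCTION = factor(c("V_DO_PRT", "V_PRT_DO", "V_PRT_DO")),
  DO_LENGTH_SYLL = c(2L, 5L, 8L)
)

# with() looks up the column names in demo, so no attach() is needed.
mean_len <- with(demo, tapply(DO_LENGTH_SYLL, CONSTRUCTION, mean))
mean_len
```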
Answer:
H0 = The choice of construction is independent of whether the referent
of the direct object is abstract or concrete.
H1 = The choice of construction is not independent of whether the
referent of the direct object is abstract or concrete.
plot(CONSTRUCTION ~ DO_CONCRETENESS)
Peters.2001 <- table(CONSTRUCTION, DO_CONCRETENESS)
Peters.2001
## DO_CONCRETENESS
## CONSTRUCTION abstract concrete
## V_DO_PRT 34 66
## V_PRT_DO 61 39
plot(CONSTRUCTION ~ DO_CONCRETENESS)
test.Peters <- chisq.test(Peters.2001, correct=FALSE)
test.Peters
##
## Pearson's Chi-squared test
##
## data: Peters.2001
## X-squared = 14.617, df = 1, p-value = 0.0001318
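Answer: We reject H0: the p-value (0.0001318) is smaller than 0.05, so there is a significant association between the choice of construction and the concreteness of the referent of the direct object (X-squared = 14.617, df = 1). Concrete referents are more frequent in V_DO_PRT (66 vs. 34), abstract referents in V_PRT_DO (61 vs. 39).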
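To see where the test statistic comes from, the chi-squared value can be recomputed by hand from the observed frequencies in the table above. This is a minimal sketch; the object names obs, expd, chi_sq, p_val and phi are chosen here for illustration, and the effect size phi is an addition that is not part of the original analysis.

```r
# Observed frequencies copied from the table above (filled column by column).
obs <- matrix(c(34, 61, 66, 39), nrow = 2,
              dimnames = list(CONSTRUCTION = c("V_DO_PRT", "V_PRT_DO"),
                              DO_CONCRETENESS = c("abstract", "concrete")))

# Expected frequencies under independence: row total * column total / N.
expd <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson's chi-squared statistic without continuity correction.
chi_sq <- sum((obs - expd)^2 / expd)
p_val <- pchisq(chi_sq, df = 1, lower.tail = FALSE)

# Effect size for a 2 x 2 table (phi); an illustrative addition.
phi <- sqrt(chi_sq / sum(obs))

round(c(chi_sq = chi_sq, p = p_val, phi = phi), 4)
```

The statistic agrees with the X-squared value reported by chisq.test() above.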
Answer:
H0 = The lengths of the direct objects in verb-particle constructions are normally distributed.
H1 = The lengths of the direct objects in verb-particle constructions are not normally distributed.
hist(vpcs$DO_LENGTH_SYLL, main="DISTRIBUTION OF DIRECT OBJECTS IN VERB-PARTICLE CONSTRUCTION", xlab="Object Length", ylab="Frequency")
shapiro.test(DO_LENGTH_SYLL)
##
## Shapiro-Wilk normality test
##
## data: DO_LENGTH_SYLL
## W = 0.78094, p-value = 5.179e-16
Answer: We reject H0: the Shapiro-Wilk test is significant (W = 0.78094, p-value = 5.179e-16), so the lengths of the direct objects are not normally distributed.
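The decision rule for the Shapiro-Wilk test can be illustrated on simulated data. The sketch below uses made-up samples (not the exercise data): one right-skewed sample of counts, roughly in the spirit of syllable lengths, and one sample drawn from a normal distribution for comparison.

```r
set.seed(1)  # reproducible made-up data, not the exercise data

len_skewed <- rpois(200, lambda = 2) + 1     # right-skewed counts
len_normal <- rnorm(200, mean = 5, sd = 1)   # sample from a normal distribution

p_skewed <- shapiro.test(len_skewed)$p.value
p_normal <- shapiro.test(len_normal)$p.value

# A p-value below 0.05 means: reject H0 of normality.
c(skewed = p_skewed, normal = p_normal)
```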
Answer:
H0 = There is no correlation between the choice of construction and the
complexity of the direct object.
H1 = There is a correlation between the choice of construction and the
complexity of the direct object.
data.num <- table(CONSTRUCTION, DO_COMPLEXITY)
plot(CONSTRUCTION ~ DO_COMPLEXITY, xlab="Complexity of the direct object", ylab="Choice of construction")
fisher.test(data.num)
##
## Fisher's Exact Test for Count Data
##
## data: data.num
## p-value = 2.925e-15
## alternative hypothesis: two.sided
Answer: Since the p-value of Fisher's exact test (2.925e-15) is smaller than 0.05, we reject the null hypothesis and accept the alternative hypothesis: there is a significant correlation between the choice of construction and the complexity of the direct object.
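Whether Fisher's exact test or the chi-squared test is applied to a frequency table is often decided by looking at the expected frequencies; a common rule of thumb asks for expected frequencies of at least 5 for the chi-squared test. The sketch below applies that rule to a hypothetical 2 x 3 table: the counts and the level names a, b, c are invented, since the actual cross-tabulation is not shown above.

```r
# Hypothetical 2 x 3 table; counts and level names are invented for illustration.
tab <- matrix(c(30, 5, 40, 25, 3, 47), nrow = 2,
              dimnames = list(CONSTRUCTION = c("V_DO_PRT", "V_PRT_DO"),
                              DO_COMPLEXITY = c("a", "b", "c")))

# Expected frequencies under independence.
expd <- chisq.test(tab)$expected

# Rule of thumb: fall back to Fisher's exact test if any expected frequency is below 5.
use_fisher <- any(expd < 5)
res <- if (use_fisher) fisher.test(tab) else chisq.test(tab)

res$p.value < 0.05  # TRUE means: reject H0 of independence
```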
Answer:
H0 = The average lengths of the direct objects in the constructions V_PRT_DO
and V_DO_PRT are equal.
H1 = The average length of the direct objects is greater in construction
V_PRT_DO than in construction V_DO_PRT.
# mean length of the direct object (in syllables) for each construction
tapply(DO_LENGTH_SYLL, CONSTRUCTION, mean)
# notched boxplot of direct-object length for each construction
boxplot(DO_LENGTH_SYLL ~ CONSTRUCTION, notch = TRUE)
# rugs mark the individual lengths: V_DO_PRT on the left axis, V_PRT_DO on the right
rug(DO_LENGTH_SYLL[CONSTRUCTION == "V_DO_PRT"], side = 2)
rug(DO_LENGTH_SYLL[CONSTRUCTION == "V_PRT_DO"], side = 4)
tapply(CONSTRUCTION=="V_PRT_DO", DO_LENGTH_SYLL, mean)
## 1 2 3 4 5 6 7 8
## 0.0750000 0.3157895 0.4230769 0.4500000 0.7222222 0.7857143 1.0000000 0.7500000
## 9 10 11 12 13 14 15 16
## 0.8000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
## 17 18 19 31
## 1.0000000 1.0000000 1.0000000 1.0000000
tapply(CONSTRUCTION == "V_DO_PRT", DO_LENGTH_SYLL, mean)
## 1 2 3 4 5 6 7 8
## 0.9250000 0.6842105 0.5769231 0.5500000 0.2777778 0.2142857 0.0000000 0.2500000
## 9 10 11 12 13 14 15 16
## 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## 17 18 19 31
## 0.0000000 0.0000000 0.0000000 0.0000000
differences <- DO_LENGTH_SYLL[CONSTRUCTION=="V_PRT_DO"]-DO_LENGTH_SYLL[CONSTRUCTION!="V_PRT_DO"]
stripchart(differences, method="stack")
shapiro.test(differences)
##
## Shapiro-Wilk normality test
##
## data: differences
## W = 0.89451, p-value = 7.877e-07
Answer: The differences are not normally distributed: the p-value of the Shapiro-Wilk test (7.877e-07) is smaller than 0.05.
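Because the two constructions form independent samples, normality can also be checked separately for each group rather than on pairwise differences. The following sketch shows one way to do this on simulated data; the data frame demo and its values are made up and only stand in for vpcs.

```r
set.seed(42)  # reproducible made-up data, not the exercise data

demo <- data.frame(
  CONSTRUCTION = factor(rep(c("V_DO_PRT", "V_PRT_DO"), each = 50)),
  DO_LENGTH_SYLL = c(rpois(50, 2) + 1, rpois(50, 6) + 1)
)

# One Shapiro-Wilk p-value per construction.
p_by_group <- tapply(demo$DO_LENGTH_SYLL, demo$CONSTRUCTION,
                     function(x) shapiro.test(x)$p.value)
p_by_group
```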
fligner.test(DO_LENGTH_SYLL~CONSTRUCTION)
##
## Fligner-Killeen test of homogeneity of variances
##
## data: DO_LENGTH_SYLL by CONSTRUCTION
## Fligner-Killeen:med chi-squared = 44.174, df = 1, p-value = 3.004e-11
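Answer: The variances of the two constructions are not homogeneous: the p-value of the Fligner-Killeen test (3.004e-11) is smaller than 0.05, so the lengths of the direct objects vary significantly more in one construction than in the other.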
tapply(CONSTRUCTION=="V_PRT_DO", DO_LENGTH_SYLL,median)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 31
## 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
tapply(CONSTRUCTION=="V_PRT_DO", DO_LENGTH_SYLL,IQR)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0.00 1.00 1.00 1.00 0.75 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 17 18 19 31
## 0.00 0.00 0.00 0.00
tapply(CONSTRUCTION=="V_DO_PRT", DO_LENGTH_SYLL,median)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 31
## 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tapply(CONSTRUCTION=="V_DO_PRT", DO_LENGTH_SYLL,IQR)
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 0.00 1.00 1.00 1.00 0.75 0.00 0.00 0.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 17 18 19 31
## 0.00 0.00 0.00 0.00
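To compare the two constructions directly, the median and the interquartile range of the lengths can be computed per construction, with the length as the first argument of tapply() and the construction as the grouping factor. A minimal sketch on ten made-up values (the vectors len and con are hypothetical):

```r
# Made-up lengths and constructions, five data points per group.
len <- c(1, 2, 2, 3, 4, 3, 5, 6, 8, 12)
con <- factor(rep(c("V_DO_PRT", "V_PRT_DO"), each = 5))

med_by_con <- tapply(len, con, median)  # central tendency per construction
iqr_by_con <- tapply(len, con, IQR)     # dispersion per construction

med_by_con
iqr_by_con
```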
wilcox.test(DO_LENGTH_SYLL[CONSTRUCTION=="V_PRT_DO"], DO_LENGTH_SYLL[CONSTRUCTION=="V_DO_PRT"], paired=FALSE, correct=FALSE)
##
## Wilcoxon rank sum test
##
## data: DO_LENGTH_SYLL[CONSTRUCTION == "V_PRT_DO"] and DO_LENGTH_SYLL[CONSTRUCTION == "V_DO_PRT"]
## W = 8490.5, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
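Answer: We reject the null hypothesis: the p-value of the Wilcoxon rank sum test (< 2.2e-16) is smaller than 0.05. With 100 data points per construction, W = 8490.5 lies far above the value of 5000 expected under H0, so the direct objects are significantly longer in V_PRT_DO than in V_DO_PRT.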
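Since H1 is directional, the test could also be run one-tailed via the alternative argument of wilcox.test(). The sketch below does this on simulated data; the vectors prt_do and do_prt are made up and are not the exercise data. It also rescales W by the number of possible pairs, which gives the proportion of pairs in which the first group has the larger value (ties count half).

```r
set.seed(7)  # reproducible made-up data, not the exercise data

prt_do <- rpois(100, 6) + 1  # hypothetical lengths in V_PRT_DO
do_prt <- rpois(100, 2) + 1  # hypothetical lengths in V_DO_PRT

# One-tailed test: H1 says the first sample is shifted towards larger values.
res <- wilcox.test(prt_do, do_prt, alternative = "greater",
                   paired = FALSE, correct = FALSE)
res$p.value

# W divided by n1 * n2: proportion of pairs where prt_do exceeds do_prt.
prop_greater <- unname(res$statistic) / (length(prt_do) * length(do_prt))
prop_greater
```

Applied to the result above, W = 8490.5 with 100 data points per construction corresponds to a proportion of about 0.85.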