Note: These exercises are reproduced from Gries (2013).

Instructions

  1. Rename this file by replacing “LASTNAME” with your last name. This can be done via the RStudio menu (File >> Rename).
  2. Write your full name in the chunk above beside author:.
  3. Before beginning, it is good practice to create a directory that contains your R scripts (this file) as well as any data you will need (the “04_exercises_01-09.csv” file). This can be done in the console directly with the setwd() function or via the RStudio menu (Session >> Set Working Directory).
  4. Write R code to answer the questions below. The code should be written within the chunks provided for each question. These chunks begin with three back ticks and the letter r in curly brackets (```{r}) and end with three back ticks. You can add as much space as you need within the chunks but do not delete the back ticks or otherwise modify the chunks in any way or the file will cause errors when compiled.
  5. When you have answered all of the questions, click the Knit button. This will create an HTML file in your working directory.
  6. Upload the HTML file to Moodle.

Data description

The file 04_exercises_01-09.csv contains data from a corpus study on the alternation of particle placement that was introduced in Section 1.3.1 of Gries (2013):

Construction Verb ~ Object ~ Particle
E.g. He picked [the book] up.
Construction Verb ~ Particle ~ Object
E.g. He picked up [the book].

It contains the following variables:
- CASE: the number of the data point
- MEDIUM: whether the example is from spoken or written language
- CONSTRUCTION: which construction is used (V_DO_PRT: verb-direct object-particle; V_PRT_DO: verb-particle-direct object)
- DO_COMPLEXITY: how complex the direct object is (3 levels)
- DO_LENGTH_SYLL: how long the direct object is (in syllables)
- PP: whether the verb-particle construction is followed by a directional prepositional phrase (2 levels)
- DO_ANIMACY: whether the referent of the direct object is animate or inanimate (2 levels)
- DO_CONCRETENESS: whether the referent of the direct object is concrete or abstract (2 levels)

Load the file “04_exercises_01-09.csv” into a data frame called vpcs and check whether the import was successful.

setwd("~/Desktop/Satistics for linguistics/Exercise")
vpcs <- read.csv("04_exercises_01-09.csv", sep = "\t", header = TRUE, stringsAsFactors = TRUE)
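
A quick way to check an import is to look at the dimensions, the first rows, and the column types of the resulting data frame. The sketch below is self-contained and does not touch the exercise file: it writes a tiny tab-delimited file (its contents and the names `tmp` and `demo` are made up for illustration) and reads it back with the same arguments as above.

```r
# Hypothetical two-row file standing in for the real data
tmp <- tempfile(fileext = ".csv")
writeLines(c("CASE\tMEDIUM\tDO_LENGTH_SYLL",
             "1\tspoken\t2",
             "2\twritten\t5"), tmp)

demo <- read.csv(tmp, sep = "\t", header = TRUE, stringsAsFactors = TRUE)

dim(demo)    # number of rows and columns
head(demo)   # first rows of the data frame
str(demo)    # column types (MEDIUM should be a factor)
```

For the exercise data, the same three calls on `vpcs` should report 200 rows and 8 columns if the import worked.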

Print a summary of all of the variables in the data frame.

str(vpcs)
## 'data.frame':    200 obs. of  8 variables:
##  $ CASE           : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ MEDIUM         : Factor w/ 2 levels "spoken","written": 1 1 1 1 1 1 1 1 1 1 ...
##  $ CONSTRUCTION   : Factor w/ 2 levels "V_DO_PRT","V_PRT_DO": 2 2 2 2 2 2 2 2 2 2 ...
##  $ DO_COMPLEXITY  : Factor w/ 3 levels "claus-modif",..: 3 3 2 2 3 2 3 3 2 2 ...
##  $ DO_LENGTH_SYLL : int  2 2 3 8 2 4 4 2 8 5 ...
##  $ PP             : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 2 ...
##  $ DO_ANIMACY     : Factor w/ 2 levels "animate","inanimate": 2 2 2 2 2 2 2 2 2 2 ...
##  $ DO_CONCRETENESS: Factor w/ 2 levels "abstract","concrete": 1 2 1 1 2 1 1 1 1 1 ...

Attach the data frame so that you can easily access the variables.

attach(vpcs)

1 You want to test whether abstract and concrete referents are equally frequent as referents of direct objects in verb-particle constructions.

1.1 Formulate non-directional hypotheses.

Answer:
H0: Abstract and concrete referents are equally frequent as referents of direct objects in verb-particle constructions.
H1: Abstract and concrete referents are not equally frequent as referents of direct objects in verb-particle constructions.

1.2 Represent the data graphically

plot(CONSTRUCTION ~ DO_CONCRETENESS)

1.3 Compute the required statistical test

Peters.2001 <- table(CONSTRUCTION, DO_CONCRETENESS)
Peters.2001
##             DO_CONCRETENESS
## CONSTRUCTION abstract concrete
##     V_DO_PRT       34       66
##     V_PRT_DO       61       39
plot(CONSTRUCTION ~ DO_CONCRETENESS) 

test.Peters <- chisq.test(Peters.2001, correct=FALSE)
test.Peters
## 
##  Pearson's Chi-squared test
## 
## data:  Peters.2001
## X-squared = 14.617, df = 1, p-value = 0.0001318

1.4 Summarize the result(s) briefly.

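Answer: The chi-squared test in 1.3 tests whether CONSTRUCTION and DO_CONCRETENESS are independent. With X-squared = 14.617, df = 1, and p = 0.0001318 (below 0.05), independence is rejected: abstract direct objects occur more often with V_PRT_DO (61 vs. 34) and concrete ones more often with V_DO_PRT (66 vs. 39). This result does not by itself answer the question in 1.1, which concerns the overall frequencies of abstract (34 + 61 = 95) and concrete (66 + 39 = 105) referents.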
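
Whether the two levels of DO_CONCRETENESS are equally frequent can be checked with a chi-squared goodness-of-fit test on the one-dimensional frequency table. The sketch below does not read the data file; it re-enters the column totals of the table printed in 1.3. The object names `concreteness.counts` and `gof` are illustrative.

```r
# Column totals from the table in 1.3: 34 + 61 abstract, 66 + 39 concrete
concreteness.counts <- c(abstract = 34 + 61, concrete = 66 + 39)

# Given a plain vector and no p argument, chisq.test() runs a
# goodness-of-fit test against equal expected frequencies
gof <- chisq.test(concreteness.counts)

gof$expected   # expected counts under H0 (100 and 100)
gof$statistic  # (95 - 100)^2 / 100 + (105 - 100)^2 / 100 = 0.5
gof$p.value    # compare against 0.05
```

With the attached data frame, the equivalent call would be `chisq.test(table(DO_CONCRETENESS))`.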

2 You want to test whether the lengths of the direct objects in verb-particle constructions are normally distributed.

2.1 Formulate non-directional hypotheses.

Answer:

H0: The lengths of the direct objects in verb-particle constructions are normally distributed.
H1: The lengths of the direct objects in verb-particle constructions are not normally distributed.

2.2 Represent the data graphically

hist(vpcs$DO_LENGTH_SYLL, main="Distribution of direct object lengths in verb-particle constructions", xlab="Object length (syllables)", ylab="Frequency")

2.3 Compute the required statistical test

shapiro.test(DO_LENGTH_SYLL)
## 
##  Shapiro-Wilk normality test
## 
## data:  DO_LENGTH_SYLL
## W = 0.78094, p-value = 5.179e-16

2.4 Summarize the result(s) briefly.

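Answer: The Shapiro-Wilk test gives W = 0.78094 and p = 5.179e-16. Because p is far below 0.05, we reject H0: the lengths of the direct objects deviate significantly from a normal distribution.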

3 You want to test whether there is a correlation between the choice of construction and the complexity of the direct object.

3.1 Formulate non-directional hypotheses.

Answer:
H0: There is no correlation between the choice of construction and the complexity of the direct object.
H1: There is a correlation between the choice of construction and the complexity of the direct object.

3.2 Summarize the data numerically

data.num <- table(CONSTRUCTION, DO_COMPLEXITY)

3.3 Represent the data graphically

plot(CONSTRUCTION~DO_COMPLEXITY, xlab="Complexity of the direct object", ylab="Choice of construction")

3.4 Compute the required statistical test

fisher.test(data.num)
## 
##  Fisher's Exact Test for Count Data
## 
## data:  data.num
## p-value = 2.925e-15
## alternative hypothesis: two.sided

3.5 Summarize the result(s) briefly.

Answer: Fisher's exact test gives p = 2.925e-15, which is far below 0.05. We therefore reject the null hypothesis and accept the alternative hypothesis: there is a significant correlation between the choice of construction and the complexity of the direct object.
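
For a table with more than two columns, Fisher's exact test and the chi-squared test address the same null hypothesis of independence. The sketch below uses made-up counts (not the corpus data) in a 2 x 3 table named `toy` to show how the p-value and the expected frequencies can be inspected. The column labels `level1` to `level3` are placeholders, not the actual levels of DO_COMPLEXITY.

```r
# Hypothetical counts; matrix() fills column by column
toy <- matrix(c(30, 10,
                15, 15,
                 5, 25),
              nrow = 2,
              dimnames = list(CONSTRUCTION  = c("V_DO_PRT", "V_PRT_DO"),
                              DO_COMPLEXITY = c("level1", "level2", "level3")))

ft <- fisher.test(toy)   # exact test of independence
ct <- chisq.test(toy)    # chi-squared test of the same H0

ct$expected          # the chi-squared approximation assumes expected counts of about 5 or more
ct$parameter         # degrees of freedom: (2 - 1) * (3 - 1)
ft$p.value < 0.05    # TRUE means H0 (independence) is rejected
```

A p-value below the significance level leads to rejecting H0, that is, to concluding that the two variables are associated.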

4 You now also want to test whether the average lengths of the direct objects in both constructions differ. Given previous studies, you suspect that the construction V_Prt_DO has a larger mean length. Formulate hypotheses, represent the data graphically, compute the required statistical test, and summarize the result(s) briefly.

4.1 Formulate hypotheses

Answer:
H0: The average length of the direct objects is the same in the constructions V_Prt_DO and V_DO_Prt.
H1: The average length of the direct objects is larger in the construction V_Prt_DO than in the construction V_DO_Prt.

4.2 Represent the data graphically

# Boxplot of direct-object length (in syllables) for each construction
boxplot(DO_LENGTH_SYLL ~ CONSTRUCTION, notch = TRUE, xlab = "Construction", ylab = "Direct object length (syllables)")

# Optional: show the individual data points as rugs on the left and right axes
rug(jitter(DO_LENGTH_SYLL[CONSTRUCTION == "V_DO_PRT"]), side = 2)
rug(jitter(DO_LENGTH_SYLL[CONSTRUCTION == "V_PRT_DO"]), side = 4)

4.3 Are the data normally distributed (assumption 1)?

tapply(CONSTRUCTION=="V_PRT_DO", DO_LENGTH_SYLL, mean)
##         1         2         3         4         5         6         7         8 
## 0.0750000 0.3157895 0.4230769 0.4500000 0.7222222 0.7857143 1.0000000 0.7500000 
##         9        10        11        12        13        14        15        16 
## 0.8000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 
##        17        18        19        31 
## 1.0000000 1.0000000 1.0000000 1.0000000
tapply(CONSTRUCTION == "V_DO_PRT", DO_LENGTH_SYLL, mean)
##         1         2         3         4         5         6         7         8 
## 0.9250000 0.6842105 0.5769231 0.5500000 0.2777778 0.2142857 0.0000000 0.2500000 
##         9        10        11        12        13        14        15        16 
## 0.2000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 
##        17        18        19        31 
## 0.0000000 0.0000000 0.0000000 0.0000000
differences <- DO_LENGTH_SYLL[CONSTRUCTION=="V_PRT_DO"]-DO_LENGTH_SYLL[CONSTRUCTION!="V_PRT_DO"]
stripchart(differences, method="stack")

shapiro.test(differences)
## 
##  Shapiro-Wilk normality test
## 
## data:  differences
## W = 0.89451, p-value = 7.877e-07

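Answer: The Shapiro-Wilk test on the differences gives W = 0.89451 and p = 7.877e-07. Because p is below 0.05, the differences deviate significantly from a normal distribution, so the normality assumption is not met. Note that the two constructions are independent samples, so normality would ordinarily be checked within each construction rather than on element-wise differences.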
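
For two independent groups, normality is usually checked separately within each group. A minimal sketch with simulated data follows; the data frame `sim`, its columns, and the group labels are invented for illustration and do not come from the corpus file.

```r
set.seed(42)
sim <- data.frame(
  group  = factor(rep(c("A", "B"), each = 100)),
  length = c(rpois(100, lambda = 2) + 1,   # simulated shorter objects
             rpois(100, lambda = 6) + 1)   # simulated longer objects
)

# Run the Shapiro-Wilk test once per group and keep only the p-values
p.by.group <- tapply(sim$length, sim$group,
                     function(v) shapiro.test(v)$p.value)
p.by.group   # a p-value below 0.05 means normality is rejected for that group
```

With the attached data frame, the analogous call would be `tapply(DO_LENGTH_SYLL, CONSTRUCTION, shapiro.test)`.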

4.4 Is the variance the same in the two groups (assumption 2)?

fligner.test(DO_LENGTH_SYLL~CONSTRUCTION)
## 
##  Fligner-Killeen test of homogeneity of variances
## 
## data:  DO_LENGTH_SYLL by CONSTRUCTION
## Fligner-Killeen:med chi-squared = 44.174, df = 1, p-value = 3.004e-11

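Answer: The Fligner-Killeen test gives chi-squared = 44.174, df = 1, and p = 3.004e-11. Because p is far below 0.05, the variances of the two groups differ significantly, so the assumption of homogeneous variances is not met.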
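
The null hypothesis of the Fligner-Killeen test is that the group variances are equal, so a small p-value signals a violation of the assumption. The sketch below illustrates this with simulated data in which the spread differs by construction; `sim`, `group`, and `value` are hypothetical names.

```r
set.seed(7)
sim <- data.frame(
  group = factor(rep(c("A", "B"), each = 100)),
  value = c(rnorm(100, mean = 5, sd = 1),   # narrow spread
            rnorm(100, mean = 5, sd = 4))   # wide spread
)

tapply(sim$value, sim$group, var)             # descriptive check of the two variances
fk <- fligner.test(value ~ group, data = sim)
fk$p.value < 0.05                             # TRUE means the variances differ significantly
```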

4.5 Calculate an appropriate measure of central tendency and a measure of dispersion for the length of direct objects in the two constructions.

# The lengths are not normally distributed, so use the median and the interquartile range
# tapply() takes the values first and the grouping factor second
tapply(DO_LENGTH_SYLL, CONSTRUCTION, median)
tapply(DO_LENGTH_SYLL, CONSTRUCTION, IQR)
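
The argument order of `tapply()` matters: the first argument holds the values to summarise and the second the grouping factor. A toy example with invented numbers (the vectors `len` and `grp` are not from the data set) shows the intended pattern.

```r
len <- c(1, 2, 3, 10, 12, 14)                    # values to summarise
grp <- factor(c("A", "A", "A", "B", "B", "B"))   # grouping factor

med.by.grp <- tapply(len, grp, median)   # A: 2, B: 12
iqr.by.grp <- tapply(len, grp, IQR)      # A: 1, B: 2
med.by.grp
iqr.by.grp
```

Swapping the two arguments would instead summarise the grouping variable within each distinct length, which is not a measure of central tendency for the lengths.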

4.6 Choose an appropriate statistical test for your hypothesis according to the assumptions you tested above.

wilcox.test(DO_LENGTH_SYLL[CONSTRUCTION=="V_PRT_DO"], DO_LENGTH_SYLL[CONSTRUCTION=="V_DO_PRT"], paired=FALSE, correct=FALSE, alternative="greater")
## 
##  Wilcoxon rank sum test
## 
## data:  DO_LENGTH_SYLL[CONSTRUCTION == "V_PRT_DO"] and DO_LENGTH_SYLL[CONSTRUCTION == "V_DO_PRT"]
## W = 8490.5, p-value < 2.2e-16
## alternative hypothesis: true location shift is greater than 0

4.7 Summarize the results briefly

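Answer: The Wilcoxon rank sum test gives W = 8490.5 with p < 2.2e-16. W lies well above 5000, the value expected under H0 for two groups of 100, so the shift is in the predicted direction. Because p is far below 0.05, we reject H0: direct objects are significantly longer in V_PRT_DO than in V_DO_PRT.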
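
Because H1 in 4.1 is directional, the test can be run one-sided with `alternative = "greater"`. The sketch below uses simulated counts rather than the corpus data (the vectors `x` and `y` are hypothetical) to show that the W statistic is unchanged and that, under the normal approximation, the one-sided p-value is half the two-sided one when the shift is in the predicted direction.

```r
set.seed(3)
x <- rpois(100, lambda = 6) + 1   # simulated "longer" group
y <- rpois(100, lambda = 2) + 1   # simulated "shorter" group

# exact = FALSE requests the normal approximation explicitly (the data contain ties)
two.sided <- wilcox.test(x, y, paired = FALSE, exact = FALSE, correct = FALSE)
greater   <- wilcox.test(x, y, paired = FALSE, exact = FALSE, correct = FALSE,
                         alternative = "greater")

two.sided$statistic == greater$statistic            # same W in both calls
all.equal(two.sided$p.value, 2 * greater$p.value)   # one-sided p is half the two-sided p
```

The first vector passed to `wilcox.test()` is the one hypothesised to be larger, which is why the V_PRT_DO lengths come first in the exercise.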