Instructions:
1. Rename this file by replacing “LASTNAME” with your last name. This
can be done via the RStudio menu (File >> Rename).
2. Write your full name in the chunk above beside
author:.
3. Before beginning, it is good practice to create a directory that
contains your R script (this file) as well as any data you will need
(the “metadata_ICLEv2.csv” file), and to make it your working directory.
The working directory can be set in the console directly with the
setwd() function or via the RStudio menu
(Session >> Set Working Directory).
4. Write R code to answer the questions below. The code should be
written within the chunks provided for each question. These chunks begin
with three back ticks and the letter r in curly brackets
(```{r}) and end with three back ticks. You can add as much
space as you need within the chunks but do not delete the back ticks or
otherwise modify the chunks in any way or the file will cause errors
when compiled.
5. When you have answered all of the questions, click the
Knit button. This will create an HTML file in your working
directory.
6. Upload the HTML file to Moodle.
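Step 3 can be sketched in the console as follows. This is only an illustration: the folder name `Exercise` and its location inside a temporary directory are assumptions chosen so that the sketch runs on any machine; in practice you would point to your own folder.

```r
# Hypothetical project folder (placed in a temporary location for illustration)
project_dir <- file.path(tempdir(), "Exercise")
dir.create(project_dir, showWarnings = FALSE)  # create the folder if it is missing
old_wd <- setwd(project_dir)                   # make it the working directory
getwd()                                        # confirm where R now looks for files
setwd(old_wd)                                  # restore the previous working directory
```

Note that setwd() only switches to a directory that already exists; creating it is a separate step (dir.create() here, or your file manager).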
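ICLEv2 <- read.delim("~/Desktop/Exercise/metadata_ICLEv2.csv")  # read the metadata file
attach(ICLEv2)                # make the columns available by bare name
ICLEv2 <- data.frame(ICLEv2)  # read.delim() already returns a data frame
str(ICLEv2)                   # show the structure of the data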
## 'data.frame': 6085 obs. of 33 variables:
## $ file : chr "BGSU1001" "BGSU1002" "BGSU1003" "BGSU1004" ...
## $ corpus_version : chr "1" "1" "1" "1" ...
## $ subcorpus_code : chr "BG" "BG" "BG" "BG" ...
## $ subcorpus_name : chr "Bulgarian" "Bulgarian" "Bulgarian" "Bulgarian" ...
## $ subsubcorpus_code: chr "BGSU" "BGSU" "BGSU" "BGSU" ...
## $ title : chr "Some people say that in our modern world, dominated by science and technology and industrialisation, there is n"| __truncated__ "Most University degrees are theoretical and do not prepare us for the real life. Do you agree or disagree?" "Some people say that in our modern world, dominated by science and technology and industrialisation, there is n"| __truncated__ "Most University degrees are theoretical and do not prepare us for the real life. Do you agree or disagree?" ...
## $ tagged : chr "No" "No" "No" "No" ...
## $ type : chr "Argumentative" "Argumentative" "Argumentative" "Argumentative" ...
## $ length : int 500 502 779 522 580 577 580 525 373 325 ...
## $ conditions : chr "No Timing" "No Timing" "No Timing" "No Timing" ...
## $ reftools : chr "Yes" "Yes" "Yes" "Yes" ...
## $ exam : chr "No" "No" "No" "No" ...
## $ age : int 20 20 20 20 21 21 21 21 21 21 ...
## $ sex : chr "Female" "Female" "Female" "Female" ...
## $ country : chr "Bulgaria" "Bulgaria" "Bulgaria" "Bulgaria" ...
## $ llanguage : chr "Bulgarian" "Bulgarian" "Bulgarian" "Bulgarian" ...
## $ homelang1 : chr "Bulgarian" "Bulgarian" "Bulgarian" "Bulgarian" ...
## $ homelang2 : chr "None" "None" "None" "None" ...
## $ homelang3 : chr "None" "None" "None" "None" ...
## $ instit : chr "Code48" "Code48" "Code48" "Code48" ...
## $ schooleng : num 8 8 8 8 10 10 8 8 8 8 ...
## $ unieng : num 2 2 2 2 2 2 2 2 2 2 ...
## $ monthseng : num 0 0 0 0 0 0 0 0 0 0 ...
## $ olang1 : chr "Spanish" "Spanish" "German" "German" ...
## $ olang2 : chr "Russian" "Russian" "None" "None" ...
## $ olang3 : chr "None" "None" "None" "None" ...
## $ date : chr "13/06/96 00:00:00" "06/06/96 00:00:00" "06/06/96 00:00:00" "06/06/96 00:00:00" ...
## $ status : chr "Complete" "Complete" "Complete" "Complete" ...
## $ comments : chr "-" "-" "-" "-" ...
## $ active : int 1 1 1 1 1 1 1 1 1 1 ...
## $ interface1 : int 1 1 1 1 1 1 1 1 1 1 ...
## $ instit2 : chr "Bulgaria - Sofia University « St. Kliment Ohridski »" "Bulgaria - Sofia University « St. Kliment Ohridski »" "Bulgaria - Sofia University « St. Kliment Ohridski »" "Bulgaria - Sofia University « St. Kliment Ohridski »" ...
## $ title2 : chr "Some people say that in our modern world, dominated by science and technology and industrialisation, there is n"| __truncated__ "Most University degrees are theoretical and do not prepare us for the real life. Do you agree or disagree?" "Some people say that in our modern world, dominated by science and technology and industrialisation, there is n"| __truncated__ "Most University degrees are theoretical and do not prepare us for the real life. Do you agree or disagree?" ...
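The commands in this report rely on attach() so that columns such as length or sex can be used by their bare names. A minimal sketch of an alternative that does not need attach() is shown here; the data frame `toy` and its values are invented for illustration and merely stand in for ICLEv2.

```r
# Toy data frame standing in for ICLEv2 (values are invented)
toy <- data.frame(
  length = c(500L, 502L, 779L),
  sex = c("Female", "Female", "Male")
)
mean(toy$length)       # access a column explicitly with $
with(toy, table(sex))  # or evaluate an expression inside the data frame
```

Explicit access makes it clear which data frame a column belongs to, which helps once more than one data set is loaded.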
dim(ICLEv2)
## [1] 6085 33
Answer: 6085
dim(ICLEv2)
## [1] 6085 33
Answer: 33
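Since dim() returns both dimensions at once, each number can also be requested separately. A small sketch on an invented data frame (`toy` is hypothetical, not part of the exercise data):

```r
toy <- data.frame(id = 1:4, score = c(10, 12, 9, 15))  # invented values
dim(toy)   # both dimensions: rows first, then columns
nrow(toy)  # number of observations (rows)
ncol(toy)  # number of variables (columns)
```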
table(conditions)
## conditions
## No Timing Timed Unknown
## 3793 2051 241
Answer: The variable conditions is a nominal (categorical) variable.
tail(length)
## [1] 569 493 788 607 585 576
Answer: 607, 585, 576
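By default tail() prints the last six elements. If only the last three are wanted, the `n` argument can be set. The sketch below reuses the six values printed above as a stand-in vector `x` (a hypothetical name):

```r
x <- c(569, 493, 788, 607, 585, 576)  # the six values printed by tail(length)
tail(x, n = 3)                         # last three elements only
```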
table(exam)
## exam
## No Unknown Yes
## 3738 371 1976
Answer: 1976 texts were written under exam conditions.
table(country)
## country
## Austria Belgium Botswana Bulgaria China-Hong Kong
## 70 473 161 302 800
## China-Mainland Czech Republic Finland Germany Italy
## 179 241 391 302 401
## Japan Netherlands Norway Other Poland
## 366 109 317 48 363
## Russia South Africa Spain Sweden Switzerland
## 266 358 250 342 60
## Turkey Unknown
## 280 6
Answer: 401 texts come from Italy.
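Rather than reading a count off a long table, a single category can be extracted directly. A minimal sketch, using an invented vector `country_toy` in place of the real country column:

```r
country_toy <- c("Italy", "Spain", "Italy", "Japan", "Italy")  # invented values
counts <- table(country_toy)
counts[["Italy"]]            # count for one category, selected by name
sum(country_toy == "Italy")  # same count without building the table
```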
mean(length)
## [1] 616.7675
Answer: The mean text length is 616.7675.
sd(length)
## [1] 269.2484
Answer: The standard deviation of text length is 269.2484.
Answer:
hist(length)
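Answer: No. The “length” variable is not normally distributed: it is strongly right-skewed, with a maximum (4139) far above the third quartile (696). These extreme values pull the mean (616.8) above the median (554), so the median summarizes the variable better than the mean.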
summary(length)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 92.0 472.0 554.0 616.8 696.0 4139.0
mean(length)
## [1] 616.7675
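The effect of a few very long texts on the mean can be illustrated with a small invented vector (`lengths_toy` is hypothetical and not taken from the data):

```r
lengths_toy <- c(450, 500, 550, 600, 4000)  # invented values with one very long text
mean(lengths_toy)    # pulled upward by the single extreme value
median(lengths_toy)  # depends only on the middle of the sorted values
```

Here the mean is 1220 while the median is 550, which mirrors, in exaggerated form, the gap between the two in the summary above.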
prop.table(table(sex))*100
## sex
## Female Male Unknown
## 76.2037798 23.2046015 0.5916187
Answer: 23.20% of the texts in the dataset were written by males.
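The percentages can be rounded in the same call so that the reported figure can be read off directly. A sketch on invented values (`sex_toy` is a hypothetical stand-in for the sex column):

```r
sex_toy <- c("Female", "Female", "Male", "Female")  # invented values
pct <- round(prop.table(table(sex_toy)) * 100, 2)   # percentages, two decimals
pct
pct[["Male"]]                                       # a single category
```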
unique(age)
## [1] 20 21 22 23 19 -1 28 27 25 18 38 30 26 29 41 24 35 40 34 42 48 36 31 37 44
## [26] 43 49 46 45 33 32 54 50 55 17 39 47 51 56 53 66 71 61 57
Answer: The variable contains the value -1, which is not a possible age and presumably codes an unknown age.
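Impossible values can also be spotted without scanning every unique value. A minimal sketch with an invented vector (`age_toy` is hypothetical):

```r
age_toy <- c(20, 21, -1, 23, -1, 19)  # invented values
range(age_toy)    # a minimum below zero signals invalid entries
sum(age_toy < 0)  # how many entries are invalid
```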
Remember to remove the problematic values you discovered in Question 11.
a <- age
b <- which(a == -1)  # which elements of a are equal to -1?
ag <- a[-b]          # drop those elements
quantile(ag)
## 0% 25% 50% 75% 100%
## 17 20 21 23 71
IQR(ag)
## [1] 3
Answer: The interquartile range for age is 3.
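An alternative to dropping the -1 entries is to recode them as missing values, so that the vector keeps its original length and stays aligned with the other columns. This sketch assumes that -1 means "unknown", which the data do not state explicitly; `age_toy` and `age_clean` are hypothetical names.

```r
age_toy <- c(20, 21, -1, 23, -1, 19)  # invented values; -1 assumed to mean "unknown"
age_clean <- age_toy
age_clean[age_clean == -1] <- NA      # recode the placeholder as missing
sum(is.na(age_clean))                 # number of recoded values
IQR(age_clean, na.rm = TRUE)          # na.rm = TRUE skips the missing values
```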
which.max(table(ag))
## 20
## 4
table(ag==20)
##
## FALSE TRUE
## 4741 1106
Answer: The most frequent age is 20, and 1106 learners are that age.
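The most frequent value and its frequency can both be read from the same frequency table. A small sketch on invented values (`ag_toy` is hypothetical):

```r
ag_toy <- c(20, 21, 20, 22, 20, 21)  # invented values
freq <- table(ag_toy)
names(freq)[which.max(freq)]         # most frequent value, as a character string
max(freq)                            # how often it occurs
```

Note that which.max() returns the position of the largest count within the table, not the count itself, which is why max() is used for the frequency.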